
Dissertation
submitted to the Combined Faculties for the Natural Sciences and for Mathematics of the Ruperto-Carola University of Heidelberg, Germany,
for the degree of Doctor of Natural Sciences

presented by Diplom-Physiker Volker Hilsenstein, born in Mannheim, Germany
Oral examination: 5 May 2004

Design and Implementation of a Passive Stereo-Infrared Imaging System for the Surface Reconstruction of Water Waves

Referees: Prof. Dr. Bernd Jähne, Prof. Dr. Kurt Roth

Zusammenfassung

A quantitative description of the exchange processes between the ocean and the atmosphere requires an understanding of the influence of waves on these processes. This thesis presents a passive infrared stereo camera system for the reconstruction of wavy water surfaces. Since no light source below the water surface is required, the system is suitable for field deployment. First, existing techniques for wave visualization are reviewed and the main problems of stereo-photogrammetric reconstruction of water surfaces are identified: transparency, insufficient image texture and specular reflection. It is shown that imaging in the infrared wavelength range avoids many of these problems. After a review of infrared radiometry, the most important components of stereo-based surface reconstruction are explained: camera calibration, epipolar geometry and disparity search. The stereo camera system used in this work is then described. An experimental validation of the system was carried out at the Heidelberg wind-wave facility. Using several infrared stereo image sequences of water waves recorded there, it is shown that the instrument permits a dense reconstruction of the water surface. The accuracy of the method is assessed experimentally by reconstructing a flat water surface, which serves as a reference plane.
Abstract

To quantify air-sea exchange processes, an understanding of how they are influenced by water waves is necessary. This work presents a passive, infrared stereo-imaging system for the reconstruction of a wavy water surface. The system does not require a submerged light source, so it is suitable for field operation. The structure of the thesis is as follows. Previous work on water wave imaging is reviewed and the major problems with stereo-based reconstruction of water surfaces are identified: transparency, lack of texture and specular reflections. It is shown that many of these problems can be avoided by imaging at infrared wavelengths. Following a review of infrared radiometry, the key ingredients of surface reconstruction using the stereo principle are explained, including camera calibration, epipolar geometry and disparity estimation. A description of the stereo infrared camera system used in this work is given. An experimental validation of the system was performed at the Heidelberg wind-wave channel. Several stereo infrared image sequences of water waves recorded at this facility are used to demonstrate that a dense surface reconstruction of water waves is possible using this system. The accuracy of the reconstruction is experimentally assessed using a flat water surface as a reference plane.

Contents

1 Introduction
  1.1 Motivation
    1.1.1 Air-Sea Interactions and their Effects on Climate
    1.1.2 Factors Influencing Air-Sea Exchange Processes
    1.1.3 Heat Flux Measurements using Thermography
  1.2 Related work
    1.2.1 Stereo Measurements of Water Waves
    1.2.2 Slope Measurements of Waves
  1.3 Why is wave imaging difficult?
  1.4 Aim and own contribution
  1.5 Thesis Outline

2 Basics of Infrared Imaging
  2.1 Radiometry
    2.1.1 Definitions
    2.1.2 Electromagnetic Radiation of a Blackbody
    2.1.3 Emissive Properties of Real Surfaces
    2.1.4 Optical Properties of Water in the Infrared Region
  2.2 Infrared Detectors
    2.2.1 Types of detectors
    2.2.2 Sensitivity
    2.2.3 Detector Output and Temperature Measurements

3 Geometry
  3.1 Projective Geometry
    3.1.1 The Projective Plane P2
    3.1.2 The Projective Space P3
    3.1.3 Homographies
  3.2 Single View Geometry and Camera Models
    3.2.1 Pinhole camera model
    3.2.2 World Coordinate System
    3.2.3 CCD-type cameras
  3.3 Non-Linear Distortion
    3.3.1 Modeling Non-Linear Distortion
    3.3.2 Correcting for Non-Linear Distortion
  3.4 Estimation of Camera Parameters using Zhang's Method
    3.4.1 The Calibration Process
    3.4.2 Initial Guess through Closed-Form Solution
    3.4.3 Full Solution through Minimization of Geometric Error
  3.5 Two View Geometry
    3.5.1 Calibration of a Stereo Camera System
    3.5.2 Epipolar Constraint and the Fundamental Matrix
    3.5.3 Image Rectification
    3.5.4 Projective Distortion-Minimizing Rectification
    3.5.5 Triangulation

4 Solving the Correspondence Problem
  4.1 Classification of Matching Algorithms
    4.1.1 Feature-based Stereo Matching
    4.1.2 Area-Based Stereo Matching
  4.2 Prerequisites for Area-Based Matching
    4.2.1 Fronto-Parallel
    4.2.2 Lambertian Surface
    4.2.3 Opacity
    4.2.4 Texture
    4.2.5 Conclusions Regarding Wave Imaging
  4.3 Matching Algorithm
    4.3.1 Matching Score
    4.3.2 Efficient Implementation
    4.3.3 Computation of Disparity
    4.3.4 Sub-Pixel Refinement
    4.3.5 Validation of Matches
    4.3.6 Multi-Scale Approach

5 Image Pre- and Postprocessing
  5.1 Non-Uniformity Correction and Radiometric Calibration
  5.2 Outlier Removal
  5.3 Regularization: Filling in the Gaps
    5.3.1 Theory
    5.3.2 Filling in Defective Pixels
    5.3.3 Regularization of Disparity Estimates

6 Experimental Setup and Procedures
  6.1 Infrared Cameras
    6.1.1 Specifications
    6.1.2 Stereo Setup
  6.2 Blackbody
  6.3 Acquisition System
    6.3.1 Frame Grabber
    6.3.2 Camera Synchronization
    6.3.3 PC and RAID
  6.4 Geometric Calibration Target
  6.5 Aeolotron
  6.6 Experimental Procedure
    6.6.1 Radiometric Calibration Procedure
    6.6.2 Geometric Calibration Procedure
    6.6.3 Deployment
    6.6.4 Acquisition of Image Sequences

7 Results
  7.1 Radiometric Calibration and Non-Uniformity Correction
    7.1.1 Thermosensorik CMT Camera
    7.1.2 Amber Radiance Camera
  7.2 Geometric Camera Calibration
    7.2.1 Interior Parameters
    7.2.2 Exterior Parameters
  7.3 Rectification
  7.4 Disparity Estimation
    7.4.1 Test Image Results
    7.4.2 Multi-Scale Disparity Estimation
  7.5 Regularization
  7.6 Depth Reconstruction
  7.7 Measurements of Water Waves
  7.8 Discussion of Accuracy
    7.8.1 Assessing the Total System Accuracy

8 Conclusions
  8.1 Summary
  8.2 Discussion and Outlook

A Reconstruction Accuracy
  A.1 Factors Influencing the Reconstruction Accuracy
  A.2 Range Resolution of Triangulation
  A.3 Estimation of a Reference Plane
    A.3.1 Least-Squares Estimate
    A.3.2 Robust Estimate

B Estimating Planar Homographies
  B.1 Problem statement
  B.2 Initial Guess
  B.3 Minimizing Geometric Error

Index
Bibliography

1 Introduction

1.1 Motivation

1.1.1 Air-Sea Interactions and their Effects on Climate

In 2000, an international and interdisciplinary research program, the Surface Ocean Lower Atmosphere Study (Solas), was founded. The program's goal is "to achieve quantitative understanding of the key biogeochemical-physical interactions and feedbacks between the ocean and the atmosphere, and how this coupled system affects and is affected by climate and environmental change" [69]. These ocean-atmosphere interactions are being studied partly to understand whether the changes in the world's climate, the rise in the global mean temperature being the most prominent, can be attributed to human activities, and partly to predict the future effects of these activities on climate. As the first question has largely been answered (according to the third assessment report of the Intergovernmental Panel on Climate Change [53], "there is new and stronger evidence that most of the warming observed over the last 50 years is attributable to human activities"), the emphasis of current research is to monitor and better quantify the ongoing changes and to predict future changes in the world's climate system.
The anthropogenic release of carbon dioxide (CO2), as well as other greenhouse gases, into the atmosphere is known to be one of the major factors driving climate change. When deriving global budgets for atmospheric CO2 concentrations, the coupling between the atmosphere and the oceans has to be taken into account because the oceans are a major sink for CO2. The net flux of CO2 from the atmosphere into the oceans during the period from 1990 to 1999 was about 2 PgC per year (PgC stands for petagrams of carbon; one petagram equals 10^15 grams, or one gigaton), which is roughly one third of the yearly anthropogenic emissions into the atmosphere (see IPCC [53]). The temporal and regional variability of this CO2 air-sea flux is not well understood; the available data on this variability comes from measurements obtained over the last three decades, assimilated by Takahashi [97, 98]. One of the parameters influencing the magnitude of these fluxes is the transfer velocity across the air-sea interface. This transfer velocity quantifies the kinetics of gas exchange, and its value is in turn affected by many physical and biogeochemical processes such as diffusion, near-surface turbulence, surfactants and wave breaking.

Figure 1.1: The different length scales (height/depth) of processes occurring at the air-sea interface. Taken from [69].

The study of the processes influencing transport across the air-sea interface is one of the main foci of the Solas program. Figure 1.1 summarizes some of these processes together with their characteristic length scales. Close to the surface, waves of different scales and wave breaking play an important role in the exchange processes. A primary goal of the current studies is to derive a physically sound parameterization of the transfer velocity, based on observable physical and biogeochemical properties of the interface.
Such a parameterization can then be used to improve the predictive capabilities of climate models and, provided that the parameters can be observed by satellite remote sensing, to derive global maps of air-sea fluxes. Such a parameterization of the air-sea transfer velocity is of interest not only for the exchange of CO2, but also for other climate-relevant gases such as nitrous oxide (N2O) and dimethyl sulfide (DMS), as well as for the transport of heat and momentum. In the oceans, heat energy can be transported by oceanic circulations, thereby altering the global distribution of energy.

1.1.2 Factors Influencing Air-Sea Exchange Processes

To quantify the oceans' role as a sink for CO2, other gases, and heat, it is necessary to understand the exchange of gases across the air-sea interface. Note that the processes controlling air-sea transport are similar for gases, heat, and momentum; the following discussion is therefore independent of the transported substance. In this section, some of the terms used in the subsequent discussion are defined.

Boundary Layer

Air-sea exchange is driven by the concentration difference between the atmosphere and the ocean, and its kinetics are usually expressed in terms of a single number, the transfer velocity k. For most gases, as well as for heat, the transfer velocity is controlled by processes within a thin aqueous boundary layer. In this layer, molecular diffusion is the dominant mode of transport, because turbulent eddies cannot reach through the surface.

Surface Renewal Model

It has been observed that a range of processes, such as wind stress, waves, wave breaking, buoyancy fluxes and interactions with subsurface flows, can significantly enhance gas transfer. The common feature of these processes is that they create subsurface turbulence. Surface films, which reduce subsurface turbulence by damping waves and reducing the momentum input of wind stress, have the opposite effect and reduce air-sea transfer.
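The bulk flux relation behind this definition of k is not written out in the text; as a sketch in standard notation (the symbols j for the net flux, c_w for the bulk water concentration, c_a for the air concentration and α for the dimensionless solubility are chosen here for illustration), it reads

```latex
j = k \, \Delta c = k \left( c_w - \alpha \, c_a \right)
```

so that k has the dimension of a velocity and collects the entire kinetics of the boundary layer into a single number.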
A simple surface renewal model describes the effects of subsurface turbulence on the boundary layer very well. The model assumes that turbulent eddies randomly replace parts of the diffusion-controlled boundary layer with water from the well-mixed bulk. The higher concentration difference between the freshly exposed bulk water and the atmosphere enhances the transfer velocity. Figure 1.2 illustrates this model for the case of heat transfer. A net heat flux cools down the water surface and a temperature difference develops between the well-mixed bulk water and the diffusion-controlled boundary layer. The colder surface layer has a typical thickness on the order of half a millimeter and is referred to as the cool skin of the ocean [79]. Turbulent eddies randomly replace parts of this cool skin, bringing up warmer bulk water, which subsequently equilibrates its temperature by diffusion. At higher wind speeds, bubble entrainment due to wave breaking and sea spray further enhance gas transfer.

Figure 1.2: Surface renewal model. A heat flux j cools down the water surface. Directly below the surface, a cold water layer with a typical thickness of about 0.5 mm develops, the "cool skin of the ocean" [79]. Turbulent eddies randomly replace parts of this cool water layer with warmer fluid parcels from the well-mixed bulk. These warmer fluid parcels subsequently equilibrate their temperature by diffusion. (Image courtesy of C. Garbe, IWR Heidelberg)

Water Waves

As already mentioned, water waves play an important role in the spatial and temporal variability of gas transfer. Short capillary and capillary-gravity waves dissipate their energy by microscale breaking, creating subsurface turbulence. Larger gravity waves affect the spatial and temporal distribution of this microscale breaking, that is, breaking without entrainment of bubbles, by modulating the steepness and propagation velocity of the smaller waves.
At larger scales and higher wind speeds, the distribution of whitecaps and wave breaking with entrainment of bubbles is similarly modulated by long gravity waves and swell [23].

Parameterizations

Even with a perfect understanding of air-sea gas exchange, it would not be feasible to account for all the processes influencing the transfer velocity when modeling fluxes on a regional or global scale. Therefore, a parameterization of the transfer velocity, using only a few observable variables and based on sound physical principles, is desirable. Most parameterizations of the transfer velocity used in climate models are based on the single parameter wind speed (see Liss and Merlivat [70], Wanninkhof [104]). Part of the popularity of wind speed as a parameter is probably due to the fact that it can be measured easily. However, it is known that for a given wind speed, gas transfer velocities can fluctuate significantly in the presence of surfactants [30, 31]. Such surface films, either from anthropogenic sources or produced by marine plankton, are ubiquitous, so using wind speed as a single parameter for the transfer velocity appears questionable.

It has been suggested that wave slope might be a more suitable parameter to characterize sub-surface turbulence and its effect on gas exchange. Measurements by Jähne et al. [60] and Bock and Hara [9] have shown that a parameterization based on mean squared wave slope can be used to estimate the transfer velocity, independent of surface films. Mean squared wave slope also determines the intensity of microwave scattering, making it a parameter that can be remotely sensed by microwave scatterometry from airplanes or satellites [95].

1.1.3 Heat Flux Measurements using Thermography

Modern infrared cameras can image the temperature distribution on the water surface with high spatial and temporal resolution. The smallest temperature differences they can resolve typically lie around 20 mK.
This allows direct observation of surface renewal events at the air-sea interface, as the temperature difference between the cool skin layer and the warm bulk water is typically on the order of 100 mK (see figure 1.2). Such infrared cameras are therefore a powerful tool to study air-sea gas exchange. Typical examples of infrared images of a water surface, acquired at the Heidelberg Aeolotron (see section 6.5), are given in figure 1.3. In the first image of the sequence, the water is in thermal equilibrium with the airspace, which results in an almost uniform temperature. At the time this first image was taken, a net heat flux had already been created by flushing the airspace with dry air, but its effect is not yet visible. Wind stress from the air conditioning system and buoyancy fluxes due to density differences generate subsurface turbulence. This subsequently leads to surface renewal events bringing up warmer bulk water, which shows up as bright streaks. A complex temperature pattern develops, as seen in figure 1.3b-f. The total temperature difference between bright and dark regions in this image sequence is approximately 0.5 K.

Over the last decade, a number of image processing techniques have been developed that allow the estimation of the net heat flux and the transfer velocity directly from infrared image sequences. These methods are based on measuring the temporal decrease in temperature of fluid parcels at the water surface. To do this, the fluid parcels have to be tracked between image frames using motion estimation algorithms. Recent reviews are given by Haußecker et al. [46] and Garbe et al. [36]. Because the transport processes for heat and gases are essentially the same, the transfer velocities for gases can be calculated from the transfer velocity obtained for heat, using an appropriate scaling that corrects for the different diffusivities of heat and gases. This is known as Schmidt number scaling [see for example 46].
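Schmidt number scaling is not spelled out in the text; a common form scales the transfer velocity with the ratio of the tracers' Schmidt numbers raised to an exponent n (n = 1/2 is often used for a wavy surface, 2/3 for a smooth one). The sketch below uses illustrative numbers, not values from this thesis: a Prandtl number of water of about 7 and a Schmidt number of CO2 in sea water at 20 °C of about 660.

```python
def schmidt_scale(k_ref, sc_ref, sc_target, n=0.5):
    """Scale a transfer velocity k_ref, measured for one tracer with
    Schmidt number sc_ref (for heat: the Prandtl number), to a tracer
    with Schmidt number sc_target.  n = 1/2 is commonly used for a
    wavy (rough) surface, n = 2/3 for a smooth surface."""
    return k_ref * (sc_target / sc_ref) ** (-n)

# Illustrative values: heat transfer velocity of 1e-3 m/s,
# Pr(water) ~ 7, Sc(CO2 in sea water at 20 degC) ~ 660.
k_heat = 1.0e-3
k_co2 = schmidt_scale(k_heat, sc_ref=7.0, sc_target=660.0)
```

Because CO2 diffuses much more slowly than heat, the scaled gas transfer velocity comes out roughly an order of magnitude smaller than the heat transfer velocity.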
Figure 1.3: Infrared image sequence of a flat water surface acquired in the Heidelberg Aeolotron wind-wave facility. When the first image was captured, the water surface was still in thermal equilibrium with the airspace. As a heat flux was created by flushing the airspace with cool dry air, complex temperature patterns develop at the water surface, driven by subsurface turbulence (b-f). The time between consecutive images is about 1.5 s. Surface temperature is expressed in terms of grey value, with brighter values representing higher temperatures. The temperature difference between dark and light pixels is about 0.5 K. The imaged area measures approximately 18 cm × 13.5 cm.

1.2 Related work

This section gives a two-part summary of previous approaches to wave imaging. The first part reviews techniques that use stereo imaging to recover the wave surface topography, while the second part reviews techniques that measure the wave slope or statistical parameters thereof. Other reviews of wave imaging can be found in Redweik [76, chapter 2], with an emphasis on stereo imaging of sea swell, Jähne et al. [59], which focuses on imaging short wind waves, and Luhmann and Tecklenburg [72], whose focus is on reconstructing the shape of bow waves created by ships in a model tank.

1.2.1 Stereo Measurements of Water Waves

Stereo photogrammetry is a method that can be used to reconstruct the shape of objects given two images taken from different positions. It is based on the principle of triangulation depicted in figure 1.4: the position of a point P can be calculated by intersecting the lines of sight.

Figure 1.4: Triangulation: If a point P on a surface is imaged by two cameras with known optical centers C1 and C2, its position can be calculated by intersecting the two rays of view.

The idea of measuring the shape of water waves using stereo imaging is not new.
In fact, in 1903, shortly after Pulfrich built the Zeiss stereo-comparator in Jena (see figure 1.5) that made routine photogrammetric analysis of stereo image pairs feasible, Rottok and Kohlschütter [77] suggested using the newly available instrument to measure wave topography on the ocean. In the summer of 1904, Kohlschütter embarked on the S.M.S. Hyäne, equipped with a stereo camera setup designed by Pulfrich, for a cruise on the Kieler Förde. The primary task of the cruise was to evaluate the suitability of stereo photogrammetry for mapping shorelines, but image pairs of swell were collected as well. Although the measurements were apparently hampered by the fact that the alignment of the two photographic plates was distorted due to vibrations on the ship, Kohlschütter [68] notes that his experiments on stereo wave imaging produced "günstige Ergebnisse" (favorable results). Also in 1904, Laas, during a cruise on the Southern Atlantic Ocean, collected several stereo image pairs of water waves and was able to produce topographic maps of wave height; such a map is depicted in figure 1.6. The image analysis had to be done manually using a stereo-comparator, which is very time consuming and therefore hardly feasible for large numbers of image pairs.

Figure 1.5: Left: One of the cameras, mounted close to the bow aboard the S.M.S. Hyäne, which were used for the stereo photogrammetric measurements in 1904. The second camera (not shown) was located close to the stern. Right: The Zeiss stereo-comparator, which was used to analyze stereo image pairs. (Both images from [68]).

Schumacher used similar stereo camera setups on cruises aboard the S.M.S. Meteor in 1925–1927 [87] and aboard the Europa in 1939. Due to the outbreak of war, the results of these cruises were not published until 1950 [88].
In the early 1950s, attempts were made to obtain aerial stereo images of ocean waves, in France by the Institut Géographique National [18], and in the United States during the Stereo Wave Observation Project, SWOP [15]. The images were obtained from two cameras mounted on separate airplanes flying in parallel at the same height. To guarantee simultaneous exposure, the cameras were synchronized with a radio transmitter. Both attempts were largely failures: in the French project, the contrast of the images was not good enough to allow a proper analysis, and the SWOP team could evaluate only two of about one hundred image pairs taken.

In 1993, Redweik [76] evaluated whether correlation-based matching can be used for the automated analysis of stereo image pairs of waves. For a selected set of images acquired on a North Sea platform under favorable natural lighting conditions, without sun glint, he produced digital elevation models of the wave fields. He demonstrated that automated image matching was suited to the task and, within the uncertainties, produced results identical to a manual analysis on a stereo-planigraph.

Figure 1.6: Surface topography of large gravity waves reconstructed from a stereo image pair taken in the Southern Atlantic in 1904 by Laas.

In 2002, Santel et al. [78] used a stereo camera system to reconstruct the shape of water waves in the surf zone. They manually labeled a number of corresponding points that were used as seed points for a correlation-based matching algorithm. Correlation-based matching is feasible in this case because large parts of the water surface in the surf zone are covered with foam, which provides surface texture and has almost Lambertian reflection properties (see figure 1.8c,e and section 2.1.1). In the works described so far, the observed swell and wave fields were typically on the order of a hundred meters wide, with wave heights on the order of meters.
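The triangulation principle of figure 1.4, which underlies all of these stereo approaches, can be sketched numerically. Since two measured viewing rays almost never intersect exactly, a common choice, shown here purely as an illustration and not necessarily the method used later in this thesis, is the midpoint of the shortest segment between the two rays:

```python
import numpy as np

def triangulate_midpoint(c1, d1, c2, d2):
    """Given two camera centres c1, c2 and viewing directions d1, d2,
    return the midpoint of the shortest segment between the rays
    c1 + s*d1 and c2 + t*d2, where s and t minimize the squared
    distance between the two ray points."""
    d1 = d1 / np.linalg.norm(d1)
    d2 = d2 / np.linalg.norm(d2)
    r = c2 - c1
    # Normal equations for minimizing |c1 + s*d1 - c2 - t*d2|^2
    # (parallel rays would make A singular; ignored in this sketch).
    A = np.array([[d1 @ d1, -(d1 @ d2)],
                  [d1 @ d2, -(d2 @ d2)]])
    s, t = np.linalg.solve(A, np.array([d1 @ r, d2 @ r]))
    return 0.5 * ((c1 + s * d1) + (c2 + t * d2))

# Two cameras 2 m apart viewing a surface point 5 m away.
c1, c2 = np.array([-1.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0])
p = np.array([0.0, 0.0, 5.0])
estimate = triangulate_midpoint(c1, p - c1, c2, p - c2)  # recovers p
```

For exactly intersecting rays, as in this synthetic example, the midpoint coincides with the intersection point itself.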
Research on stereo imaging applied to short gravity and capillary wind waves has been performed by Shemdin et al. [91, 90] as well as Banner et al. [5], who produced directional wave spectra from stereo image pairs. However, a critical analysis reveals that their results may be unreliable, as they claim a much higher resolution than theoretically possible [6, 55, 59].

In the last decade, Waas [101, 102, 103] developed a combined wave height and slope gauge that uses the stereo principle to compute the average wave height. For this instrument, called the Reflective Stereo Slope Gauge, an LED light source, creating specular glints, is mounted next to each camera, and polarizing filters ensure that each camera only images light originating from the light source mounted near the other camera. The symmetric design of the system ensures that specular glints appearing in the two images correspond to the same surface patch, making these glints good features to match. The glint patterns are also used to obtain statistical data about the wave slope distribution. This system was later refined by Dieter [20, 21] and used for field measurements from Scripps' Pier in La Jolla, California. The disadvantage of this approach is that only the small fraction of the water surface covered by specular glints can be reconstructed. Furthermore, the system is sensitive to ambient light and can therefore only be operated at night.

Other researchers have used a different strategy altogether, actively altering the optical properties of the water surface by applying aluminum powder or other substances to it, thereby reducing its specularity [see for example 1, 73]. This was used, for example, for the study of hydrodynamic models of harbors.
However, exchange processes at the air-sea interface are strongly influenced by surface films, and on the open ocean the marker substance would have to be constantly resupplied, so this method cannot be used for the present application. Common to all previous approaches to stereo wave imaging is that they operate at visible wavelengths.

1.2.2 Slope Measurements of Waves

Another class of methods for obtaining information about the shape and motion of water waves is based on measuring the surface slope, using either the directional dependence of the light intensity reflected at the air-water interface or the directional dependence of the refracted light that passes through the interface.

Reflection-Based Slope Measurements

The seminal work on characterizing wave fields based on the reflection of light at the water surface dates back to 1954 and is due to Cox and Munk [16, 17]. They derived wave slope statistics from sun glint patterns, using the fact that the sun only covers a small solid angle of the sky, so the slope of facets that specularly reflect sunlight towards an observer is determined within a narrow range. Stilwell [96] later developed a generalization of this technique, which used an optical Fourier transform to generate wave energy spectra directly from photographs of the water surface obtained under uniform sky illumination.

In computer vision, a technique to reconstruct the shape of Lambertian surfaces based on reflected intensity is known as shape-from-shading [see 52]. Using this shape-from-shading approach, Schultz [85, 86] proposed and simulated a system consisting of four synchronized cameras that should allow a complete reconstruction of the surface slope under natural lighting conditions. One camera was pointed to the sky to map the hemispherical radiance distribution.
Based on this radiance map and the irradiance values obtained by the other three cameras, imaging the same patch of water from three different directions, the two components of surface slope could be estimated using a specular surface model. Presumably owing to the complexity of this proposed system, it was never implemented [55].

Refraction-Based Slope Measurements

The other class of wave slope imaging techniques is based on refraction at the air-water interface. The current state of the art for wave slope measurements in wind-wave tanks is the so-called Color Imaging Slope Gauge (CISG) [3, 4, 33, 61]. The key to this technique, depicted in figure 1.7, is an extended light source. The intensity of this light source is color coded, such that every point on the light source has a different color value. With this optical setup, each value of the surface slope corresponds to a unique color value that is recorded by the camera. This technique allows very accurate slope measurements and can resolve small capillary waves, as depicted in figure 1.7 on the right.

Based on the good performance of these instruments in laboratory settings, several refraction-based instruments have been built for use in the field. Klinke and Jähne [66, 67] used an imaging slope gauge on a freely drifting buoy, with cameras mounted above the water surface and a submerged array of light emitting diodes (LEDs) to create an intensity-coded area light source. Due to interference from ambient light, the use of the instrument is restricted to dusk and night. Bock and Hara [8] have employed the refractive technique using a submerged scanning laser and a position-sensitive photodiode for field measurements on a catamaran. The submerged laser assembly is smaller than the extended light source used by Klinke but can still interfere with the wave field. Both instruments have provided valuable data, but they also demonstrate a general problem with using the refractive technique for routine measurements.
As either the camera or (preferably) the light source must be submerged, such instruments can usually not be operated directly from a research vessel, but have to be mounted on a smaller external float, like a buoy or a catamaran. Therefore, the system has to be deployed separately, which is very labor intensive and can become difficult at high wind speeds or with an agitated sea. Furthermore, the whole setup has to be very rugged and watertight to withstand sea water. The submerged parts of such a system can also interfere with the larger gravity waves.

Figure 1.7: Color Imaging Slope Gauge. Left: optical setup. The light received by the camera depends on the slope of the water surface. Due to the color coded area light source, the slope can be calculated from the color value of an image point. Right: wave slope image of capillary waves obtained with a CISG. The imaged area is approximately 15 cm × 15 cm. (Image courtesy of Jochen Klinke, Scripps Institution of Oceanography)

1.3 Why is wave imaging difficult?

When analyzing the previous attempts, reviewed in the preceding section, to reconstruct the surface topography, surface slope and motion of water waves, it becomes apparent that the optical properties of water in the visible wavelength range pose severe problems under natural lighting conditions. The difficulties are best illustrated with a few images of water surfaces under different conditions, as shown in figure 1.8. Depending on view angle and illumination conditions, the following effects are observed:

Figure 1.8: Images of water waves taken under different illumination conditions and observation angles illustrate some of the major optical properties that make wave imaging difficult (see text).

• Specularity: Light reflection off the water surface is largely specular, or mirror-like. That means that the reflected intensity is very large when the
incidence angle equals the observation angle. In bright sunlight, this leads to sun glints originating from the narrow range of surface slopes that reflect direct sunlight, as depicted in figure 1.8a. For overcast sky conditions, the specularity leads to a situation as in figure 1.8b.

• Transparency: For observation angles close to the surface normal, water is largely transparent in the visible region of the electromagnetic spectrum (figure 1.8d). This becomes important when imaging small scale capillary waves in the field, as the observation angle has to be close to the surface normal in this case. Otherwise, the capillary waves are more likely to be occluded by larger gravity waves.

• Lack of texture: Smooth parts of the water surface are often mostly featureless (figure 1.8a,b,c).

• Large intensity variations: The specularity causes large variations in image irradiance, with some surface patches reflecting direct sunlight and other surface patches reflecting light from darker areas of the sky or showing upwelling light (figure 1.8a). Due to their limited dynamic range, most cameras cannot cover the whole range of observed irradiance with good intensity resolution.

• Non-uniform reflectance properties: Strongly aerated water, as found in breaking waves (figure 1.8c,e), has very different reflectance properties from non-aerated water. Due to multiple reflections at the interface of bubbles, breaking waves are to a good approximation Lambertian (see section 2.1.1).

Specularity, transparency and lack of texture pose a serious problem for stereo photogrammetry. To reconstruct the surface using triangulation, points corresponding to the same point on the surface have to be identified in images taken from different positions. Stereo matching algorithms usually fail to find such correspondences where the water surface appears untextured or transparent.
Specular reflection is view-angle dependent, so it can introduce a systematic error when finding correspondences. For example, a flat part of the water surface reflects radiation from different parts of the sky into the two cameras. The stereo algorithm will then try to match features in the sky, which do not correspond to the same point on the water surface. This situation is illustrated in figure 1.9. For a wavy surface, this effect can lead to a systematic bias in the estimated wave height as well as to missing correspondences, as is shown in Jähne et al. [59].

Figure 1.9: Due to the view-angle dependence of specular reflection, corresponding features seen in the two camera images originate not from the same patch on the water surface, but from the same region of the sky.

1.4 Aim and own contribution

Section 1.1 illustrated the need for simultaneous measurements of air-sea fluxes and the parameters, such as wave slope, that influence the transport across the interface. Specifically, combining heat flux measurements using thermographic imaging with wave measurements is desirable. In the laboratory, this can be achieved by combining gas and heat flux measurements with slope measurements using a Color Imaging Slope Gauge (see section 1.2.2, and for example [106]). However, existing field data from combined measurements of the water surface and air-sea fluxes are scarce, largely due to the difficulties with wave imaging. To help fill this gap, the aim of the present work was to design an instrument that can be used to reconstruct the shape and motion of short to medium gravity waves on the ocean, either from a fixed platform or from a research vessel. In this respect, the present work addresses some of the aims put forth in the science plan of the SOLAS program [69]. Based on the experience from previous work (discussed in section 1.2), the following features are desirable for such an instrument:

1.
The instrument should be field deployable for use off the boom of a ship during a research cruise, from a pier or from an offshore platform. Rigging should be easy and fast.

2. The system should be operable over a wide range of wind speeds and other meteorological conditions.

3. The instrument should be rugged enough to withstand the harsh operating conditions at sea.

4. The water waves should not be disturbed, for example by placing parts of the instrument underneath the surface.

5. The instrument should be as insensitive as possible to variations in natural illumination conditions.

The approach taken in this work is a stereo infrared imaging wave gauge, which consists of two infrared cameras and uses stereo photogrammetry to recover the surface shape of short gravity waves. In addition to wave reconstruction, the infrared image sequences acquired with this instrument can be used to estimate air-sea heat and gas fluxes, as described in section 1.1.3. Operating in the infrared wavelength region from 3–5 µm solves many of the problems of previous stereo wave imaging systems arising from the optical properties of water at visible wavelengths (mentioned in section 1.3), as follows:

• Textured: Infrared images of the water surface show a rich texture when a heat flux cools down the surface. An example was given in figure 1.3.

• Opaque: The maximum penetration depth of infrared radiation in water is less than 0.1 mm in the 3–5 µm range.

• Emissive: Every warm object radiates electromagnetic energy according to Planck's law (see section 2.1.2).

• Lambertian: In the 3–5 µm region, the brightness of the radiation emitted by a water surface changes only by about 5% for viewing angles up to 60° relative to the surface normal (see section 2.1.4).
The temperature patterns observed at the water surface whenever a heat flux is present create a rich texture in the infrared images, with features that can be used to match corresponding points in the stereo image pairs automatically (see section 4.2.4). Opacity and almost Lambertian surface characteristics ensure that these features are bona fide surface features over a large range of viewing angles. In contrast, the transparency and specularity of water in the visible region can give rise to image features that lie below the water surface or are view-dependent mirror images of surrounding features like clouds (see sections 4.2.3 to 4.2.4). The infrared radiation detected by the two cameras is emitted by the water surface itself, according to Planck's law (see section 2.1.2). Therefore, no light source is needed, which makes the infrared stereo wave gauge a passive remote sensing device. Although the present system has only been evaluated in the laboratory, monocular infrared image sequences of similarly good quality (high contrast and rich texture) were obtained on previous research cruises (CoOp 1995 and 1997 [7] and GasEx 2001) under different wind speed regimes, as reported by Schimpf [83]. However, the best results are obtained at dusk or at night, when the negative heat flux cooling the surface is usually strongest due to the reduced solar radiative flux. Stronger cooling makes for a larger temperature difference between the bulk water and the thermal skin at the surface, giving higher contrast in the thermal images. Measuring at night also avoids specular sun glints in the images. Sun glints can still be a problem during the day, although they are less severe than at visible wavelengths, where the maximum of the solar blackbody spectrum is located.
Previous experience with thermographic imaging at sea also shows that the infrared cameras can be made rugged enough to withstand and operate under harsh conditions by placing them in a protective casing, as described by Garbe [35, section 12.2]. The image sequences obtained from each of the two infrared cameras making up the stereo system can be used to analyze the heat flux at the water surface, using the techniques described by Schimpf et al. [84] and Garbe et al. [37]. In fact, the additional depth information obtained with the stereo camera setup should help to improve the accuracy of the heat flux and transfer velocity estimates obtained using the methods described in [37]. These methods rely on the accurate calculation of optical flow from the infrared image sequences. The wave motion changes the distance of the water surface from the camera, introducing a diverging or converging component into the optical flow. Using the depth information obtained from stereo imaging, it is possible to correct for this effect. In summary, the infrared stereo wave gauge presented in this thesis can be classified as an imaging, passive, close-range, remote-sensing device, where the individual attributes have the following meaning:

• Imaging: In contrast to point measurements, spatial variations can be recorded.

• Passive: No light source is required, in contrast to active remote sensing devices.

• Close-range: The distance of the object from the camera is only one order of magnitude larger than the baseline of the system.

• Remote-sensing: The instrument does not touch or interfere with the surface.

1.5 Thesis Outline

Chapter 2 introduces the basic concepts and terminology of radiometry, with an emphasis on infrared radiation. It is shown that a warm water surface radiates electromagnetic energy similarly to a blackbody and how, based on this radiation, infrared cameras can be used to measure the temperature distribution at the water surface.
The optical properties of water in the infrared region are also presented. Chapters 3 and 4 explain the foundations of stereo computer vision. As this work might be of interest to physical oceanographers without a background in computer vision, most of the algorithms used are stated explicitly. The problem of stereo computer vision can be divided into two main parts: the geometric concepts needed to extract metric information from image pairs, and the automatic matching of corresponding points across the two images, known as disparity estimation. The geometric aspects are the focus of chapter 3, where camera calibration and two-view geometry are presented using the notation of homogeneous coordinates, a useful concept from projective geometry. Disparity matching is the topic of chapter 4, where the matching algorithm used for the stereo infrared wave gauge is outlined, following a review of existing algorithms and their underlying model assumptions. The image sequences obtained from the infrared cameras cannot be used directly as input to the algorithms outlined in chapters 3 and 4, as they first have to be corrected for the different sensitivities of the sensor elements in the camera. Also, gaps in the images due to defective camera pixels must be interpolated. The output data of the stereo algorithm also contains gaps in image regions where no disparity estimate could be obtained. Both input and output images are interpolated in a similar manner, using a technique known as regularization, detailed in chapter 5. Chapter 5 also discusses some other well-established image processing methods used in this work, namely non-uniformity correction and outlier removal. The experimental setup and an overview of the experiments performed with the infrared stereo wave gauge at the Heidelberg wind-wave channel are detailed in chapter 6, and the results are presented in chapter 7.
The conclusion in chapter 8 contains a critical discussion of the results obtained and an outlook on possible future developments. For better readability, some mathematical derivations that would otherwise disrupt the text are presented in appendices. Appendix A discusses the factors that influence the accuracy of the 3D points reconstructed with the infrared stereo camera system. The theoretical limit on the range resolution achievable using triangulation is analyzed. To obtain an experimental assessment of the achievable accuracy, a method for robustly fitting a reference plane to the reconstructed points of a flat water surface is presented. Appendix B provides a detailed explanation of one of the steps involved in camera calibration, namely the estimation of a projective transform, or homography, from a set of point correspondences.

2 Basics of Infrared Imaging

This chapter discusses what can be seen in the images obtained by an infrared camera. When analyzing infrared images, the thermal radiation of warm bodies as described by Planck's law, the thermal and optical properties of the emitting surfaces and the environment, and the interaction of the radiation with the detector become important. The use of infrared imaging, or thermography, distinguishes this work from the majority of stereo computer vision applications, which usually operate in the visible wavelength region. The main difference is that in the infrared region of the electromagnetic spectrum, every object emits radiation (as will be seen in section 2.1.3). That is, every scene object is also a "light source", whereas at visible wavelengths, objects usually reflect or re-emit the radiation from a light source. In section 2.1.1, some radiometric terms needed for the subsequent sections are defined. The nomenclature and units follow the recommendations of the International Association of the Physical Sciences of the Ocean (IAPSO) (see, for example, [49]).
Section 2.1.2 reviews the physics governing the thermal emission of electromagnetic radiation for the idealized case of a blackbody and introduces Planck's law. The thermal emission and radiometric properties of real world objects are then addressed in section 2.1.3. Due to their importance for the application to wave imaging, the properties of water in the infrared region are stated in section 2.1.4; some of the properties of water in the visible region have already been mentioned in section 1.3. Specifically, the assumption that the water surface is approximately Lambertian in this wavelength region, which will be used for the stereo matching described in chapter 4, is justified. Section 2.2 describes the physical principles by which the cameras used in the present application detect infrared radiation and defines the parameter NE∆T that is commonly used to specify the thermal sensitivity of a detector. For current infrared imagers, the sensitivity shows strong variations between individual pixels, so a radiometric calibration must be performed, as explained in sections 2.2.3 and 5.1. Radiometric calibration also forms the basis for temperature measurements using an infrared detector.

2.1 Radiometry

2.1.1 Definitions

In this section, the definitions of several radiometric quantities are given, which provide the basis for the subsequent sections. The concept of a Lambertian surface is also introduced. For a treatment that goes beyond the bare definitions, the reader is referred to the summary by Haußecker [45]. The notation follows the recommendations of the International Association of the Physical Sciences of the Ocean (see [49]). Preceding the definition of the actual radiometric quantities, the terms spherical coordinates and solid angle are defined, as they are subsequently used to specify directional dependence. Table 2.1 summarizes the definitions and the units of the different radiometric quantities.
Spherical Coordinates

Using spherical coordinates, any point in three-dimensional space can be expressed by two angles θ, φ and a distance r from a point of reference, as shown in figure 2.1. To represent a direction, only the angles θ and φ are needed.

Figure 2.1: Spherical coordinates: In spherical coordinates a point R is represented by the angles φ, θ and the distance r. The directional dependence of the flux emanating from a surface patch dS is also expressed in terms of φ and θ.

Solid Angle

It is straightforward to extend the notion of an angle in two dimensions, defined as the length of a segment on the unit circle, to the three-dimensional case, where the solid angle is defined as the area of a surface patch on the unit sphere. This surface patch can be interpreted as the projection of an arbitrarily shaped patch onto the unit sphere, as depicted in figure 2.2. Since the solid angle is defined as a ratio of two areas, it is dimensionless. As this can lead to confusion, the artificial unit steradian [sr] is used to mark quantities with a directional dependence.

Figure 2.2: Solid angle: The surface patch S covers a solid angle Ω. The solid angle is defined as the fractional area of the surface of the unit sphere covered by the projection A of S.

Radiant Energy and Flux

The total radiant energy emitted or received by an object is denoted by Q and is measured in Joule [J]. The temporal derivative of radiant energy is the radiant flux or radiant power Φ, which has the unit Watt [W] = [J s⁻¹].

Exitance and Irradiance

Radiant exitance M is a measure of the radiative flux emitted per unit surface area. The corresponding quantity for incident (as opposed to emitted) flux is irradiance E. Both M and E are used to describe an extended area:

M = \frac{d\Phi_{emitted}}{dS} \qquad [\mathrm{W\,m^{-2}}]   (2.1)

E = \frac{d\Phi_{incident}}{dS} \qquad [\mathrm{W\,m^{-2}}]   (2.2)

In general, M and E are functions of the position on the surface: M(x), E(x).
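Since the solid angle is obtained by integrating dΩ = sin θ dθ dφ over the unit sphere, a short numerical check makes the definition concrete. The following Python sketch (an illustration only, not part of the thesis software; the function name is chosen here) verifies that a hemisphere subtends 2π sr and the full sphere 4π sr:

```python
import math

def solid_angle(theta_max, n=100_000):
    """Integrate dOmega = sin(theta) dtheta dphi over theta in [0, theta_max]
    and phi in [0, 2*pi], using the midpoint rule in theta."""
    dtheta = theta_max / n
    total = 0.0
    for i in range(n):
        theta = (i + 0.5) * dtheta
        total += math.sin(theta) * dtheta
    return 2.0 * math.pi * total

print(solid_angle(math.pi / 2))  # hemisphere: ~6.2832 sr = 2*pi
print(solid_angle(math.pi))      # full sphere: ~12.5664 sr = 4*pi
```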
Radiant Intensity

Radiant intensity I is defined as the radiant flux emitted per unit solid angle Ω from a point in space:

I = \frac{d\Phi_{emitted}}{d\Omega} \qquad [\mathrm{W\,sr^{-1}}]   (2.3)

Radiance

Radiance L is a function of both position and direction. It describes the radiant flux per foreshortened (projected) unit area per unit solid angle emanating from a given position in a given direction. The foreshortened area specifies a surface element perpendicular to the propagation direction of the radiation:

L = \frac{d\Phi}{d\Omega\,dS_\perp} = \frac{d\Phi}{d\Omega\,dS\cos\theta} \qquad [\mathrm{W\,m^{-2}\,sr^{-1}}]   (2.4)

If the radiance is known, the other radiometric quantities introduced above can easily be obtained from it by integration. For example, combining equations (2.1) and (2.3) with equation (2.4) yields the following relations:

L\cos\theta = \frac{dI}{dS}   (2.5)

L\cos\theta = \frac{dM}{d\Omega}   (2.6)

Integrating dM over the hemisphere surrounding the point of reference, and integrating dI over the surface patch S, yields

M(x) = \int_{hemisphere} L(x,\theta,\varphi)\cos\theta\,d\Omega = \int_0^{\pi/2}\!\!\int_0^{2\pi} L(x,\theta,\varphi)\cos\theta\sin\theta\,d\varphi\,d\theta   (2.7)

and

I(\theta,\varphi) = \int_S L(x,\theta,\varphi)\cos\theta\,dS,   (2.8)

respectively. Note that equation (2.8) is an extension of the original definition of intensity (2.3), which describes point sources only.

Lambertian Surfaces

A surface that looks equally "bright" from all directions is said to be Lambertian or ideal diffuse. More precisely, this means that its radiance is independent of view angle, that is, L(x,θ,φ) = L(x). From equation (2.8) it follows that for a Lambertian surface the angular distribution of intensity obeys the relation

I(\theta) = \int_S L(x)\cos\theta\,dS = I_0\cos\theta,   (2.9)

known as Lambert's cosine law. This is depicted in figure 2.3.

Figure 2.3: Lambert's cosine law: For a Lambertian surface the angular variation of emitted intensity follows the functional relationship I(θ) = I₀ cos θ, which is schematically represented by the lengths of the arrows.
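For a Lambertian surface, inserting a direction-independent radiance into equation (2.7) gives the well-known relation M = πL. A minimal numerical check of this hemisphere integral (an illustration only; not code from the thesis):

```python
import math

def exitance_lambertian(L, n=400):
    """Evaluate equation (2.7) for a Lambertian surface, i.e. L independent
    of direction: M = Int over the hemisphere of L cos(theta) dOmega."""
    dtheta = (math.pi / 2) / n
    s = 0.0
    for i in range(n):
        theta = (i + 0.5) * dtheta
        s += math.cos(theta) * math.sin(theta) * dtheta
    # the phi integration contributes a factor 2*pi
    return 2.0 * math.pi * L * s

# For constant radiance the integral gives M = pi * L
print(exitance_lambertian(1.0))   # ~3.14159
```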
The concept of a Lambertian surface is idealized, but some materials, like cotton cloth or matte paper at visible wavelengths, and (as is shown in section 2.1.4) water in the infrared region, are reasonably close to this ideal. The last example is important for the present application, since the correlation-based stereo matching described in chapter 4, like many computer vision algorithms, is based on the assumption that the observed object has Lambertian surface characteristics (see section 4.2.2). Lambert's cosine law appears, at first glance, to contradict the definition of a Lambertian surface as being equally "bright" from all view angles because of the cosine factor. However, when calculating the flux the detector receives from a given solid angle, the cosine factor is canceled by the foreshortening effect: for an oblique view angle θ, the observed surface patch is larger by a factor of 1/cos θ, as depicted in figure 2.4. Since radiance, by definition, accounts for this foreshortening effect, it is constant, independent of the angle θ. Note that in this discussion it was assumed that only the emitting surface is tilted, while the detector surface (CCD chip) remains perpendicular to the view direction. For a discussion of arbitrary source-detector geometry see, for example, Jähne et al. [57, section 5.2.1].

Figure 2.4: Effect of foreshortening: For a given solid angle, a detector receives flux from a larger area of a surface tilted by angle θ than from a surface perpendicular to the view direction at the same distance.

Spectral Distribution

The radiant energy Q is generally comprised of radiation at a spectrum of different wavelengths. To account for this, the spectral radiant energy Q_λ is defined as

Q_\lambda(\lambda) = \frac{dQ(\lambda)}{d\lambda} \qquad [\mathrm{J\,m^{-1}}],   (2.10)

which is the radiant energy within the infinitesimal wavelength interval [λ, λ + dλ].
For all of the radiometric quantities introduced above, a corresponding spectral quantity can be defined analogously to equation (2.10). These spectral quantities will hereafter be denoted with a subscript λ.

Photon Flux

The infrared cameras used for the present application are photon detectors. As will be shown in section 2.2.3, their detector output depends on the number of incident photons. Therefore, it is sometimes useful to formulate the radiometric quantities defined above in terms of the number of photons rather than in terms of energy. For the spectral quantities this can be done by dividing the energy-based radiometric quantities by the energy of a single photon at the given wavelength, E_photon = hc/λ, to obtain a photon count. These photon-based quantities are denoted with a subscript p. For example, the spectral exitance expressed in terms of the number of photons emitted per unit time and unit area is

M_{p,\lambda} = \frac{\lambda}{hc} M_\lambda.   (2.11)

Table 2.1: Summary of radiometric quantities and their units.

Quantity | Symbol | Unit | Definition
Radiant energy | Q | J | Total energy emitted by a source or received by a detector
Radiant flux | Φ | W | Total power emitted by a source or received by a detector
Radiant exitance | M | W m⁻² | Power emitted per unit surface area
Irradiance | E | W m⁻² | Power received at unit surface element
Radiant intensity | I | W sr⁻¹ | Power leaving a point on a surface into unit solid angle
Radiance | L | W m⁻² sr⁻¹ | Power leaving unit projected (i.e. perpendicular to direction of travel) surface area into unit solid angle

2.1.2 Electromagnetic Radiation of a Blackbody

In 1800, during his experiments with sunlight, Herschel discovered that there is some sort of "heat radiation" that is not tied to visible light. With a prism and a thermometer he observed this "heat radiation" close to the red part of the visible spectrum and named it "infrared radiation".
With Maxwell's theory of electromagnetism (1864), this radiation could be explained as the electromagnetic radiation emitted due to the acceleration of charged particles by thermal motion.

Planck's Law of Radiation

In 1900, Planck [74] derived an explanation for the spectral distribution of thermal radiation, based on the assumption that the emitter can only lose discrete quanta of energy. This marked the start of quantum mechanics and modern physics. Other than on temperature, the spectral distribution of electromagnetic radiation also depends on the physical properties of the emitting surface (see section 2.1.3). To address this, Kirchhoff introduced the concept of an ideal surface that neither transmits nor reflects, but absorbs all incident radiation, independent of wavelength and direction. Such a perfect absorber of radiation is called a blackbody. By the principle of conservation of energy, a blackbody must also be a perfect emitter of radiation. This is a particular case of Kirchhoff's law (see below). Planck's law for the spectral distribution of radiation by a blackbody is given by:

M_\lambda(T)\,d\lambda = \frac{2\pi h c^2}{\lambda^5}\left(e^{hc/(\lambda k_b T)} - 1\right)^{-1} d\lambda

M_\nu(T)\,d\nu = \frac{2\pi h \nu^3}{c^2}\left(e^{h\nu/(k_b T)} - 1\right)^{-1} d\nu,   (2.12)

where T is the absolute temperature in Kelvin [K], c = 3·10⁸ m s⁻¹ is the speed of light, k_b = 1.38·10⁻²³ J K⁻¹ is Boltzmann's constant and h = 6.62·10⁻³⁴ J s is Planck's constant. To switch between the formulations of equation (2.12) for wavelength λ and frequency ν, the relation dλ/dν = −c/ν² is used. In equation (2.12), Planck's law is stated for the spectral radiant exitance, in contrast to many physics textbooks that formulate Planck's law in terms of the spectral energy density ρ_λ(T), which has the unit [J m⁻⁴] and is related to the spectral radiant exitance by ρ_λ(T) = (4/c) M_λ(T).
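Equations (2.11) and (2.12) can be evaluated directly. The following sketch (illustrative only; function names are chosen here, constants rounded as in the text, and the 4 µm / 300 K example values are picked merely because they lie within the 3–5 µm camera band) computes the spectral radiant exitance and the corresponding photon exitance:

```python
import math

H = 6.626e-34    # Planck's constant [J s]
C = 2.998e8      # speed of light [m s^-1]
KB = 1.381e-23   # Boltzmann's constant [J K^-1]

def planck_exitance(lam, T):
    """Spectral radiant exitance of a blackbody, equation (2.12),
    in W per m^2 per metre of wavelength; lam in metres, T in Kelvin."""
    return 2.0 * math.pi * H * C**2 / lam**5 / math.expm1(H * C / (lam * KB * T))

def photon_exitance(lam, T):
    """Spectral photon exitance via equation (2.11): M_p = lam/(h*c) * M_lam,
    in photons per second per m^2 per metre of wavelength."""
    return lam / (H * C) * planck_exitance(lam, T)

# A surface near 300 K, evaluated at 4 um
print(planck_exitance(4e-6, 300.0))   # ~2.3e6 W m^-2 m^-1, i.e. ~2.3 W m^-2 um^-1
print(photon_exitance(4e-6, 300.0))   # the photon count is what the detector responds to
```

Note the use of `math.expm1`, which evaluates e^x − 1 accurately; for the exponents occurring at thermal infrared wavelengths a plain `exp(x) - 1` would also work.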
The spectral radiant exitance of a blackbody at different temperatures is shown in figure 2.5 (for simplicity, these spectral distributions will be referred to as blackbody curves). With increasing temperature, the maximum of the blackbody curve, λ_max, shifts to shorter wavelengths. If the temperature is high enough, radiation is also emitted in the visible wavelength range; this effect forms the basis of the incandescent lamp. Note that the blackbody curves for different temperatures do not intersect, meaning that the spectral radiant exitance increases monotonically with temperature at all wavelengths. This monotonic increase is important, as it makes it possible to measure the temperature of a blackbody surface from its radiant exitance within any spectral subregion of the blackbody curve, as explained in section 2.2.3.

Wien's Displacement Law

The temperature dependence of the wavelength λ_max, at which the blackbody curve has its maximum, is known as Wien's displacement law:

\lambda_{max} \cdot T = \mathrm{constant} = 2898\ \mathrm{\mu m \cdot K}.   (2.13)

Stefan-Boltzmann Law

The total radiant exitance of a blackbody is obtained by integrating equation (2.12) over all wavelengths, leading to the Stefan-Boltzmann law:

M_{total}(T) = \int_0^\infty M_\lambda(\lambda, T)\,d\lambda = \sigma T^4,   (2.14)

where σ = 5.67·10⁻⁸ W m⁻² K⁻⁴.

Kirchhoff's Law

As mentioned in the introduction to blackbodies, a perfect emitter is also a perfect absorber. In fact, for a surface in thermodynamic equilibrium there is a more general rule: its emissivity is always equal to its absorptivity, independent of direction, temperature and wavelength. This is known as Kirchhoff's law and is due to the conservation of energy. If there were an imbalance between absorptivity and emissivity, the surface would either lose energy to, or gain energy from, its environment, violating the assumption that it is in thermal equilibrium.

Figure 2.5: Blackbody curves at different temperatures.
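Both Wien's displacement law (2.13) and the Stefan-Boltzmann law (2.14) can be checked numerically against Planck's law (2.12). The sketch below (an illustration only; the integration limits and step count are arbitrary choices, not values from the thesis) integrates the 300 K blackbody curve and compares the result with σT⁴:

```python
import math

H, C, KB = 6.626e-34, 2.998e8, 1.381e-23   # [J s], [m/s], [J/K]
SIGMA = 5.670e-8                            # Stefan-Boltzmann constant [W m^-2 K^-4]

def planck(lam, T):
    """Spectral radiant exitance of a blackbody, equation (2.12)."""
    return 2.0 * math.pi * H * C**2 / lam**5 / math.expm1(H * C / (lam * KB * T))

def total_exitance(T, lam_min=1e-7, lam_max=1e-3, n=20_000):
    """Midpoint-rule integration of equation (2.12) over wavelength;
    the result should approach the Stefan-Boltzmann law (2.14)."""
    dlam = (lam_max - lam_min) / n
    return sum(planck(lam_min + (i + 0.5) * dlam, T) * dlam for i in range(n))

T = 300.0
print(total_exitance(T) / (SIGMA * T**4))   # ratio close to 1
print(2898.0 / T)                           # Wien's law (2.13): peak at 9.66 um
```

The finite wavelength interval from 0.1 µm to 1 mm captures essentially all of the emission at 300 K, which is why the ratio comes out very close to unity.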
2.1.3 Emissive Properties of Real Surfaces

Emission from real objects differs from blackbody emission, because the physical surface properties affect the angular distribution and the spectral characteristics of the emitted radiation. Furthermore, a real surface does not absorb all of the incident radiation, but reflects or transmits some fraction of it. Surface properties also change with temperature, complicating the situation further. The radiometric effects of the surface properties are described by the four parameters emissivity ε, absorptivity α, reflectivity ρ and transmissivity τ.

Emissivity is defined as the ratio of the radiation emitted by the surface in question to the radiation emitted by a blackbody at the same absolute temperature:

\varepsilon = \frac{M_{surface}(T)}{M_{blackbody}(T)}.   (2.15)

Absorptivity is defined as the fraction of the total irradiance absorbed by the surface:

\alpha = \frac{E_{absorbed}}{E_{total}}.   (2.16)

From Kirchhoff's law it follows that α = ε. Reflectivity is similarly defined as the fraction of the total irradiance reflected by the surface:

\rho = \frac{E_{reflected}}{E_{total}}.   (2.17)

Finally, transmissivity is a parameter describing transparent surfaces and is defined as the fraction of the total irradiance transmitted through the surface:

\tau = \frac{E_{transmitted}}{E_{total}}.   (2.18)

As indicated above, these four parameters generally have an angular as well as a spectral dependence, and for some objects they vary with temperature (e.g. the emissivity of glowing hot objects approaches 1). So, in general, these parameters are functions of wavelength and direction, for example in the case of emissivity ε = ε(λ, θ, φ, T). By conservation of energy, it follows that

\rho + \alpha + \tau = 1.   (2.19)

For all non-transparent surfaces and for sufficiently thick bulk material, in which all radiation will eventually be absorbed, this simplifies to

\alpha = \varepsilon = 1 - \rho.   (2.20)

The nomenclature used here reflects common usage in the infrared community, although it is slightly incorrect: "The emissivity, reflectivity and absorptivity are properties of ideal materials. Real world materials have defects, surface irregularities, and may contain a variety of trace materials. Real materials are characterized by emittance, reflectance, and absorptance. Unfortunately, the infrared community indiscriminately mixes emissivity with emittance. Since emissivity is used more often this book will use emissivity for real materials." Holst [51]

Non-blackbody surfaces can be classified according to their emissivity as follows:

• Greybodies have an emissivity ε < 1 that is independent of angle and wavelength. That is, their spectral emissivity differs from the blackbody curve only by a constant factor, namely ε. For a greybody, ρ = 1 − ε.

• Selective emitters are the most general case; their emissivity varies with wavelength.

Figure 2.6 compares the spectral radiant exitance of a blackbody, a greybody and a selective emitter at the same temperature.

2.1.4 Optical Properties of Water in the Infrared Region

This section states some optical properties of water at infrared wavelengths that are important for the present application. The values for the reflectivity and the penetration depth are derived from data on the complex refractive index of water in the infrared region compiled by Downing and Williams [22]. As this derivation is based on the Fresnel equations, these are briefly reviewed. It is shown that the reflectivity of water at infrared wavelengths is very small and depends only slightly on the angle of incidence. Therefore, for many practical purposes the water surface can be considered to be Lambertian.

Figure 2.6: Spectral radiant exitance for real objects.
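For a greybody, the relations above reduce to two one-line formulas. A small worked example (illustrative only; the emissivity value 0.97 is an assumed figure, consistent with the reflectivity of a few percent derived in section 2.1.4, not a number quoted in the text):

```python
SIGMA = 5.670e-8  # Stefan-Boltzmann constant [W m^-2 K^-4]

def greybody_exitance(T, eps):
    """Total exitance of a greybody: combining equation (2.15) with the
    Stefan-Boltzmann law (2.14) gives M = eps * sigma * T^4."""
    return eps * SIGMA * T**4

def reflectivity_opaque(eps):
    """Equation (2.20) for an opaque (non-transparent) surface: rho = 1 - eps."""
    return 1.0 - eps

# Assumed example: a 20 degC water surface with eps = 0.97
print(greybody_exitance(293.15, 0.97))   # total exitance [W m^-2]
print(reflectivity_opaque(0.97))         # fraction of irradiance reflected
```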
The spectra of a selective emitter (schematic) and a greybody with an emissivity of 0.5 are compared to a blackbody at the same temperature (100 °C).

Complex Index of Refraction

When an electromagnetic field is incident on the surface of a material, it exerts a force on particles with non-zero electric moments (such as electrons and polar molecules). These particles are thereby accelerated, re-emitting some of the incident radiation and absorbing some. The details of this interaction are fairly complex and depend, among other things, on the crystal structure, and the electronic and molecular excitation levels of the material. For bulk matter, the average effects of these processes are described by the dielectric function ε(λ), not to be confused with the emissivity, and the complex refractive index

N(λ) = n(λ) − i k(λ) (2.21)

derived from it [see e.g. 28, 64, 100]. The real and imaginary parts n(λ) and k(λ) of the complex refractive index for water at infrared wavelengths are displayed in figure 2.7.

Figure 2.7: Real part, n, and imaginary part, k, of the complex index of refraction for water at infrared wavelengths. From data collected and tabulated by Downing and Williams [22].

Reflectivity

The reflectivity of a material can be calculated from the index of refraction using the Fresnel equations:

ρ⊥(θ, ψ) = |(N cos θ − cos ψ) / (N cos θ + cos ψ)|² (2.22)

and

ρ∥(θ, ψ) = |(cos θ − N cos ψ) / (cos θ + N cos ψ)|², (2.23)

where the angles θ and ψ are defined as in figure 2.8, and ρ⊥ and ρ∥ are the reflectivities for radiation polarized perpendicular and parallel to the plane of incidence, respectively. The plane of incidence is the plane spanned by the propagation direction of the electromagnetic wave and the surface normal.

Figure 2.8: Incident flux, φ_i, reflected flux, φ_r, and transmitted (refracted) flux, φ_t, at the air-sea interface. The angle of incidence θ is equal to the angle of reflection. The angle of refraction is denoted ψ.

For unpolarized radiation, the reflectivity is obtained by taking the mean:

ρ = (ρ⊥ + ρ∥) / 2. (2.24)

The value of cos ψ can be derived using Snell’s law of refraction (see, for example [100]):

cos ψ = √(1 − sin²θ / N²). (2.25)

Although many textbooks only state equations (2.22) and (2.23) for a purely real refractive index N = n, they also hold if the imaginary part k is non-vanishing (see [40]). The angular dependence of the reflectivity of water in the infrared region, computed using equation (2.22), is displayed in figure 2.9. The real and imaginary parts of the refractive index used to calculate this dependence were obtained by averaging n(λ) and k(λ), as tabulated in Downing and Williams [22], over the 3−5 µm wavelength region. Note that the reflectivity is very small and almost constant for angles of up to 55°. Therefore, water at these wavelengths has surface properties that are similar to a Lambertian surface.

Figure 2.9: Angular dependence of the reflectivity of water obtained from equations (2.22), (2.24) and the complex index of refraction (2.21). The values were calculated with the real part and the imaginary part of the refractive index chosen as n = 1.384 and k = 0.0412, respectively. These values are the mean of n and k over the 3−5 µm interval.

Penetration Depth

If the imaginary part of the complex refractive index is nonzero, the radiative flux is attenuated in the medium. After a distance z in the medium, an initial flux Φ_0 is reduced to

Φ_z(λ) = Φ_0 e^(−β(λ) z), (2.26)

where β is the absorption coefficient. It is related to the imaginary part k of the refractive index by

β(λ) = 4π k(λ) / λ. (2.27)

The penetration depth, ζ(λ), defined as the distance in the medium over which the flux is reduced by a factor of e, is given by the inverse of β:

ζ(λ) = 1 / β(λ). (2.28)

For water in the 3−5 µm wavelength region, which is the region in which the cameras used for the present research are sensitive (see section 6.1), the penetration depth varies considerably, as can be seen in figure 2.10. However, for this application the important result is that the penetration depth in this region is always less than 100 µm. This means that if water is imaged in this wavelength range, only the temperature distribution at the surface is observed, fulfilling an important prerequisite for stereo matching (see section 4.2.3).

Figure 2.10: Spectral dependence of the penetration depth of infrared radiation in water. The penetration depth is lower than 100 µm throughout the 3−5 µm wavelength region. From data published by Downing and Williams [22].

This is in stark contrast to water in the visible part of the spectrum, where the transparency at normal incidence is quite high. Exact values in the visible spectrum vary considerably (see Hoejerslev [49]), depending on dissolved substances and suspended particles, but everyday experience shows that one can look through several meters of clear water and, for example, recognize the tiles at the bottom of a swimming pool.

2.2 Infrared Detectors

This section reviews the principles by which infrared radiation is detected and how the detector output can be used to sense temperature distributions remotely. Two common detector types, thermal detectors and photon detectors, are presented in section 2.2.1. The thermal sensitivity of infrared detectors is often specified in terms of the noise equivalent temperature difference, a performance parameter that is explained in section 2.2.2. Section 2.2.3 discusses how the detector output is related to the temperature and to the surface properties of the imaged object.
This discussion is for photon detectors, as the cameras used in this work (see section 6.1) are based on this detector type, but similar principles apply to thermal detectors.

2.2.1 Types of Detectors

There are several different types of infrared detectors. Imaging infrared detectors are often referred to as focal plane arrays, or FPAs. The most common types of FPAs are thermal detector arrays and photon detector arrays. Thermal detectors detect infrared radiation by secondary effects due to the heat deposited by the radiation incident on the detector. Typical examples of thermal detectors are bolometers. In contrast to thermal detectors, photon detectors detect infrared radiation directly, through charges created by incident photons.

Bolometers

Bolometers typically consist of two material types, one with high absorptivity and one with a temperature-dependent resistance. Incoming infrared radiation deposits energy in the highly absorptive material, such that heat is generated. This absorbed heat causes a change in the electrical resistance of the second material, which is in thermal contact with the absorber. The change in electrical resistance can be transformed into a voltage signal. For use in infrared imagers, a number of small bolometers are arranged to form a micro-bolometer array, where each bolometer accounts for one image pixel. Most commercially available micro-bolometer arrays are sensitive in the 8−14 µm wavelength region and can be operated without cooling.

Photon Detectors

Photon detectors directly transform incoming infrared radiation into electrical charges. The detector consists of a semiconductor, which has a band gap between the valence band and the conduction band. For infrared detectors, the semiconductor material is chosen such that the potential energy difference between these bands is on the same order of magnitude as the energy of infrared photons in the desired wavelength range.
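To make the band-gap argument concrete, the longest detectable wavelength follows from λ_c = hc/E_gap. The following is a minimal sketch; the band-gap value is an assumption, roughly that of cooled InSb, and `cutoff_wavelength` is a hypothetical helper, not part of any library.

```python
# Illustrative sketch: the cutoff wavelength of a photon detector follows
# from lambda_c = h*c / E_gap. The band-gap value below is an assumption,
# roughly that of InSb at cryogenic temperature.
H = 6.62607015e-34    # Planck constant, J s
C = 2.99792458e8      # speed of light, m/s
EV = 1.602176634e-19  # 1 eV in joules

def cutoff_wavelength(e_gap_ev):
    """Cutoff wavelength in metres for a band gap given in eV."""
    return H * C / (e_gap_ev * EV)

lam_c = cutoff_wavelength(0.23)   # assumed band gap of ~0.23 eV
print(lam_c * 1e6)                # roughly 5.4 um, i.e. MWIR sensitivity
```

A smaller band gap shifts the cutoff to longer wavelengths, which is why different detector materials cover the MWIR and LWIR bands.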
Incident infrared photons are absorbed and can excite electrons from the valence band into the conduction band, creating free electrical charges and thus a measurable electronic signal. The fraction η of photons that excite an electron into the conduction band is called the quantum efficiency of a detector and is generally a function of wavelength. As electrons can also be thermally excited into the conduction band, infrared photon detectors are cooled to improve the signal-to-noise ratio. Typical operating temperatures are in the range between 70 K and 77 K and are achieved by Stirling coolers or by using liquid nitrogen. Common detector materials are Indium Antimonide (InSb), Platinum Silicide (PtSi) and Mercury Cadmium Telluride (HgCdTe). Another group of photon detectors are quantum well infrared photodetectors, or QWIPs, which exploit transitions between quantized sub-bands. These are based on sandwiched layers of AlGaAs and GaAs. The layers form quantum wells, which give rise to quantized electron bands. The band gap between different states, and therefore the infrared wavelength region in which the detector is sensitive, can be tailored by varying the thickness of the different layers. Detectors based on InSb and PtSi typically operate in the 3−5 µm wavelength region, also called mid-wave infrared (MWIR), while HgCdTe-based and QWIP-based detectors are available for the 8−12 µm, or long-wave infrared (LWIR), wavelength region as well as for MWIR.

2.2.2 Sensitivity

The thermal sensitivity of an infrared camera is often specified in terms of the performance parameter NE∆T or NEDT, the noise equivalent temperature difference. This can be understood as follows: in a setup in which an infrared camera receives radiation from an ideal blackbody, assume that the detector output is expressed as a voltage V.
With such a setup, the NE∆T can be defined as the temperature difference ∆T of the blackbody that causes a difference in detector output ∆V of equal magnitude to the noise level V_noise of the detector. In other words, the NE∆T is the smallest temperature difference that can be resolved by the detector. It is important to note that the NE∆T is not a fixed performance parameter, but itself depends on the background temperature T_bg. This is because the difference in power radiated by the blackbody is not linearly related to the temperature difference but follows the Stefan-Boltzmann law ∆M = σ((T_bg + ∆T)⁴ − T_bg⁴). Moreover, the noise level also depends on the transmittance of the optics and the integration time of the sensor. Therefore, NE∆T is only meaningful as a performance parameter if the integration time and the background temperature are also specified with it.¹ The NE∆Ts for the cameras used in this work are listed in table 6.1 on page 102.

¹ Unfortunately, it is common that manufacturers don’t state at which integration time or frame rate their claimed NE∆T was measured. It is often safe to assume that the specified value does not hold for the shortest possible integration time.

2.2.3 Detector Output and Temperature Measurements

In this section, the dependence of the output of an infrared detector on the incident flux is analyzed. It is shown that, if a blackbody is imaged, the detector output increases monotonically with the temperature of the blackbody. Because the cameras used for the present work are photon detectors, the discussion of the detector output will focus on such detectors. However, for bolometer cameras the detector output can be derived in an analogous way by using the energy-based rather than the photon-based radiometric quantities.

Detector Output

The output of an infrared detector depends on the incident irradiance, the attenuation of the optical system and the response of the sensor.
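Two properties invoked in this chapter, the monotonic growth of the band-integrated signal with temperature and the background dependence of the response to a fixed ∆T discussed for the NE∆T above, can be illustrated with a minimal numerical sketch. The constant quantum efficiency and all numerical values are assumptions for illustration only.

```python
import numpy as np

# Hedged sketch: band-integrated blackbody photon exitance over 3-5 um,
# assuming a constant (hypothetical) quantum efficiency. The signal grows
# monotonically with temperature, and the increment produced by a fixed
# temperature step depends on the background temperature.
H = 6.62607015e-34    # Planck constant, J s
C = 2.99792458e8      # speed of light, m/s
KB = 1.380649e-23     # Boltzmann constant, J/K

def photon_exitance(lam, temp):
    """Planck spectral photon exitance (photons s^-1 m^-2 per metre of wavelength)."""
    return (2.0 * np.pi * C / lam ** 4) / np.expm1(H * C / (lam * KB * temp))

def band_signal(temp, eta=0.7, lam_min=3e-6, lam_max=5e-6, n=2000):
    """Simple Riemann-sum approximation of the band-integrated signal."""
    lam = np.linspace(lam_min, lam_max, n)
    dlam = (lam_max - lam_min) / (n - 1)
    return eta * np.sum(photon_exitance(lam, temp)) * dlam

signals = [band_signal(t) for t in (290.0, 295.0, 300.0, 305.0)]
assert all(a < b for a, b in zip(signals, signals[1:]))   # monotonic in T

# The same 20 mK step gives a larger signal difference on a warmer background:
assert band_signal(300.02) - band_signal(300.0) > band_signal(280.02) - band_signal(280.0)
```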
All these factors influencing the detector output may exhibit a spectral dependence, so that the output V_out is given as an integral over the spectral range λ_min . . . λ_max in which the detector is sensitive:

V_out = k ∫_{λ_min}^{λ_max} η(λ) E_{p,λ}(λ, T) dλ. (2.29)

In this integral, E_{p,λ}(λ, T) is the spectral incident irradiance, expressed in terms of the number of photons, and η(λ) is the quantum efficiency of the detector system, that is, the quantum efficiency of the sensor multiplied by the attenuation factor of the optical system. The effects of the electronic read-out circuit are represented by a constant factor k.

For a camera imaging a blackbody, the irradiance is given as the exitance of the blackbody, according to Planck’s law (2.12). As noted in section 2.1.2, the exitance of a blackbody increases monotonically with temperature in every spectral subregion. Assuming that the quantum efficiency of the detector is approximately constant over the spectral range λ_min . . . λ_max, the integral (2.29) also increases monotonically with temperature, resulting in a one-to-one correspondence between the blackbody temperature and the detector output. Therefore, it is possible to sense the temperature of a blackbody remotely with a radiometrically calibrated detector.

Radiometric Calibration

Radiometric calibration of a detector is performed by recording its output for several accurately known temperatures of an imaged blackbody source. As the sensitivity of each pixel on an infrared detector chip is different, such a calibration has to be performed on a per-pixel basis.

Figure 2.11: Effect of emissivity: The left image shows a color photograph of a bottle with a white sticker, aluminum-coated tape, and black duct tape. The bottle is filled with warm water. In the infrared image on the right, the white sticker and the aluminum tape appear darker because of their lower emissivity compared to the black tape and the bottle glass.

For a given
detector output, the temperature of a blackbody source can then be determined by interpolating between the different calibration points. In the present work, a polynomial is fitted through the calibration points to obtain a functional relationship between temperature and detector output, see sections 5.1 and 7.1.

Non-Blackbody Objects

For objects that are not blackbodies, the situation is more difficult. Because their emissivity is smaller than one, such objects do not emit infrared radiation as effectively as blackbodies. Moreover, they also reflect infrared radiation from their environment towards the detector. In such cases the detector output is given by

V_out = k ∫_{λ_min}^{λ_max} ε(λ) η(λ) M_{p,λ}^{obj}(T_obj) dλ + k ∫_{λ_min}^{λ_max} ρ(λ) η(λ) M_{p,λ}^{env}(T_env) dλ, (2.30)

where ε and ρ are the emissivity and the reflectivity of the object, respectively. Therefore, the exact emissivity of the object and the temperature of the environment, T_env, have to be known in order to perform absolute temperature measurements. The temperature of a blackbody that would produce the same grey-value as the imaged object with emissivity ε < 1 is often called the apparent temperature of that object.

The effect of the emissivity on the output of an infrared detector is demonstrated in figure 2.11, which displays images of a bottle of warm water acquired with a “normal” camera and an infrared camera. Although the bottle has a homogeneous surface temperature, the irradiance received by the infrared camera, and therefore the apparent temperature, differs for those parts of the bottle that are covered with tape. This can be attributed to the variation in emissivity between the different types of tape.

3 Stereo Computer Vision I: Geometry

This chapter introduces the geometric concepts necessary to perform a 3D reconstruction with a stereo camera setup.
Section 3.1 briefly outlines the concept of projective spaces and introduces homogeneous coordinates, since these allow the formulation of the imaging process in a linear framework. The focus of section 3.2 is a detailed analysis of the imaging process with a single camera. This analysis is presented first for a simple pinhole model in section 3.2.1 and is then extended to a more realistic model involving a CCD-type camera and lens distortion in sections 3.2.3 and 3.3. To describe a real camera with such a model, the values of the free model parameters have to be estimated for the camera. This process is called camera calibration and is addressed in section 3.4, where a calibration algorithm proposed by Zhang [108] is presented. Camera calibration is essential if metric information about a 3D scene is to be derived from 2D images of that scene. Section 3.5 reviews the imaging geometry for the stereo case, in which two cameras are used, with a presentation of the epipolar constraint in section 3.5.2. If a point in one image is given, this constraint can be used to restrict the search space for a corresponding image point in the second image to a line. The topic of sections 3.5.3 and 3.5.4 is rectification, that is, a projective transform of two stereo images that brings corresponding image points onto the same image scan-lines. Rectification is an elegant method to make use of the epipolar constraint during disparity estimation, discussed in chapter 4. Once both cameras of a stereo setup have been calibrated and correspondences between points in both images have been established, 3D reconstruction can be performed using triangulation, described in section 3.5.5. In this work, homogeneous vectors are distinguished from non-homogeneous vectors by a tilde. Vectors representing points in 3D space are denoted by upper-case letters whereas vectors representing image points are denoted by lower-case letters.
Table 3.1 gives examples and summarizes the notation for geometric entities used in this work.

Type: Example
2D image: I, J
Homogeneous 3-vector (point or line in P²): ã, b̃
Rectified homogeneous 3-vector: a, b
Homogeneous 4-vector (point or plane in P³): Ã, B̃
2-vector (point in R², e.g. image point): a, b
2-vector (undistorted image point with respect to principal point): â, b̂
2-vector (distorted image point): ă, b̆
3-vector (point in R³, e.g. scene point): A, B
Matrices: A, B

Table 3.1: Summary of the notation used in chapter 3.

3.1 Projective Geometry

In this section, homogeneous coordinates, a concept used in projective geometry, are introduced. Homogeneous coordinates are important for the present application, as they can be used to express the imaging process with a camera, which is a projection of the three-dimensional scene space onto a two-dimensional image plane, in a linear form. For an in-depth introduction to projective geometry in the context of computer vision, the reader is referred to the recent monographs by Faugeras and Luong [27] and Hartley and Zisserman [44].

3.1.1 The Projective Plane P²

The projective plane P² corresponds to the Euclidean plane R². However, in contrast to the Euclidean case, where a point in the plane is represented by a 2-vector (x, y)ᵀ in R², a point in the projective plane is represented by a 3-vector (x̃, ỹ, w̃)ᵀ in R³ without (0, 0, 0)ᵀ. This 3-vector is scale invariant, that is, for any factor α ≠ 0, (x̃, ỹ, w̃)ᵀ and (αx̃, αỹ, αw̃)ᵀ are equivalent and correspond to the same point in P². This scale-invariant representation is known as a homogeneous vector. As mentioned above, in this work, homogeneous vectors and elements of homogeneous vectors are denoted with a tilde. A Euclidean point x = (x, y)ᵀ can be represented in homogeneous notation by the projective point x̃ = (αx, αy, α)ᵀ.
It is common to choose α = 1 and to augment the Euclidean 2-vector by a 1 in the third coordinate when going from a Euclidean to a projective representation of a point. Going back from a projective representation to a Euclidean representation of a point is achieved by dividing out the scale factor, x = (x̃/w̃, ỹ/w̃)ᵀ, for w̃ ≠ 0. Projective points with w̃ = 0 are called points at infinity and have no counterpart in the Euclidean plane.

The use of homogeneous vectors in P² can be illustrated by considering a line on a plane. A line on the Euclidean plane can be represented by a 3-vector (a, b, c)ᵀ. A point x = (x, y)ᵀ lies on the line if ax + by + c = 0. Using homogeneous coordinates, this constraint defining the line can be expressed by an inner product:

l̃ᵀ x̃ = (a, b, c) (x, y, 1)ᵀ = 0. (3.1)

Note that the set of points x̃ fulfilling (3.1) does not change if the 3-vector (a, b, c)ᵀ is scaled by an arbitrary, non-zero factor α. Therefore lines in P² can also be represented as homogeneous 3-vectors. It is easy to show (see for example [44]) that the point of intersection x̃ of two lines l̃₁, l̃₂ in P² is given by their cross-product x̃ = l̃₁ × l̃₂. Similarly, the line l̃ connecting two points x̃₁, x̃₂ is given by their cross-product l̃ = x̃₁ × x̃₂. This interchangeability between points and lines is known as duality.

3.1.2 The Projective Space P³

Analogous to the extension of the Euclidean plane to the projective plane discussed in the previous section, it is possible to extend the concept of the Euclidean 3-space R³ to the projective space P³. In P³, points are represented by homogeneous 4-vectors in R⁴ without (0, 0, 0, 0)ᵀ. As in the 2-dimensional case, the Euclidean representation of a projective point X̃ = (X̃, Ỹ, Z̃, W̃)ᵀ can be obtained by dividing out the scale factor, X = (X̃/W̃, Ỹ/W̃, Z̃/W̃)ᵀ, for W̃ ≠ 0. Again, points with W̃ = 0 are called points at infinity and have no counterpart in the Euclidean space.
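The point-line duality of section 3.1.1 can be illustrated with a short sketch; the numerical values are a hypothetical example.

```python
import numpy as np

# Hypothetical example of duality in P^2: the line through two points and
# the intersection point of two lines are both given by cross products.
p1 = np.array([0.0, 0.0, 1.0])       # homogeneous form of the point (0, 0)
p2 = np.array([1.0, 1.0, 1.0])       # homogeneous form of the point (1, 1)
line = np.cross(p1, p2)              # the line y = x, as (a, b, c)

x_axis = np.array([0.0, 1.0, 0.0])   # the line y = 0
isect = np.cross(line, x_axis)       # homogeneous intersection point

# Both points satisfy the line equation of eq. (3.1):
assert abs(line @ p1) < 1e-12 and abs(line @ p2) < 1e-12

# Dividing out the scale factor recovers the Euclidean intersection (0, 0):
euclid = isect[:2] / isect[2]
print(euclid)
```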
A point X = (X, Y, Z)ᵀ in Euclidean space can be represented in the projective space by augmenting its vector with a one, that is X̃ = (X, Y, Z, 1)ᵀ. Planes in P³ are also represented by homogeneous 4-vectors, analogous to the role of lines in P². In this work, vectors and elements of vectors in the Euclidean space R³ and in the projective space P³ are denoted with capital letters to distinguish them from vectors in the Euclidean and projective planes R² and P².

3.1.3 Homographies

In projective geometry, non-singular, linear transformations in projective spaces are called homographies. A homography can be described by a non-singular matrix. For example, the transformation of a homogeneous vector in P² can be expressed as x̃′ = Hx̃, where H is a 3 × 3 matrix. Because homogeneous vectors are scale invariant, the transformation matrix is also only defined up to an arbitrary scale factor.

3.2 Single View Geometry and Camera Models

3.2.1 Pinhole Camera Model

Figure 3.1 illustrates a simple pinhole camera model that can be used to describe the central projection of points in 3-dimensional space onto a 2-dimensional focal plane or image plane. The image m of a point M in the scene is determined by the intersection of the image plane and the line connecting M with the center of projection or camera center C. One can define an orthonormal coordinate system, with its origin located at the camera center and base vectors X, Y, Z, such that the focal plane is perpendicular to the Z-direction. This coordinate system is called the camera coordinate system. In the focal plane, a 2-dimensional image coordinate system is defined by the base vectors u and v. The Z-direction is aligned with the view direction of the camera and is referred to as the principal axis. The intersection of the principal axis with the focal plane is called the principal point p₀ with the image coordinates (u₀, v₀)ᵀ.
The distance of the focal plane from the origin is called the focal length f. Denoting image coordinates relative to the principal point with a circumflex (ˆ), such that x̂ = x − p₀, the projection of a scene point M = (X, Y, Z)ᵀ can be calculated as

x̂ = (x̂, ŷ)ᵀ = (f X/Z, f Y/Z)ᵀ. (3.2)

The image coordinates are then given by

x = (x, y)ᵀ = (f X/Z + u₀, f Y/Z + v₀)ᵀ. (3.3)

The resulting image point x corresponds to the homogeneous point

x̃ = (f X/Z + u₀, f Y/Z + v₀, 1)ᵀ (3.4)

in the projective plane, which is equivalent to x̃ = (f X + Z u₀, f Y + Z v₀, Z)ᵀ due to the scale invariance of homogeneous vectors.

Figure 3.1: Pinhole camera model. The focal plane is located at a distance f from the center of projection C, which is the origin of the camera coordinate system X, Y, Z. The axis perpendicular to the focal plane is called the principal axis and intersects it at the principal point p₀. A coordinate system in the image plane is spanned by the vectors u, v. The image m of a point M in the scene is given by the intersection of the line connecting M and C with the image plane.

Therefore, the central projection of a point (X, Y, Z)ᵀ onto the image plane can be expressed by a linear mapping from P³ to P² as follows:

x̃ = (x̃, ỹ, w̃)ᵀ = [ f 0 u₀ 0 ; 0 f v₀ 0 ; 0 0 1 0 ] (X, Y, Z, 1)ᵀ. (3.5)

Equation (3.5) can be rewritten as

x̃ = K [1 | 0] X̃_camera (3.6)

using the 3×3 camera calibration matrix

K = [ f 0 u₀ ; 0 f v₀ ; 0 0 1 ]. (3.7)

The symbols 1 and 0 denote the 3×3 identity matrix and the 3×1 null vector, respectively.

3.2.2 World Coordinate System

In many cases, 3D scene points are expressed with respect to a world coordinate system that differs from the camera coordinate system described in the previous section.
The world coordinates of a point can be transformed into the camera coordinate system with the transformation

X_camera = R (X_world − C), (3.8)

where C is the position of the camera center in the world coordinate system and R is the 3×3 rotation matrix that aligns the axes of the world and the camera coordinate systems. The parameters that describe the six degrees of freedom for the translation and the rotation are called the exterior camera parameters. In homogeneous notation this coordinate transformation can be expressed as

X̃_camera = [ R −RC ; 0ᵀ 1 ] X̃_world. (3.9)

Introducing the translation vector

T = −RC, (3.10)

the projection of points expressed as homogeneous vectors in the world coordinate system can be written as

x̃ = K [R | T] X̃_world. (3.11)

With the 3×4 camera projection matrix P, defined as

P = K [R | T], (3.12)

equation (3.11) can be written in the concise form

x̃ = P X̃_world. (3.13)

Thus, the imaging process with a projective camera can be expressed as a linear transformation from P³ to P².

3.2.3 CCD-type Cameras

When using CCD-type cameras, the basic pinhole camera model has to be slightly modified. The term CCD-type is used here to refer to cameras with a detector that has a number of individual sensor elements arranged in a rectangular array. Factors such as the type of integrated circuit technology used and the electronic read-out mechanism of the detector are not important for this discussion¹, and so the results of this section also hold for the focal plane array detectors commonly used in infrared cameras and for CMOS cameras. For a CCD-type detector chip, the image coordinates are usually expressed in terms of pixel coordinates. This can be accounted for by multiplying the focal length f in the camera matrix K by an appropriate scale factor. This scale factor is determined by the center-to-center spacing of the pixels on the detector chip, the so-called pixel pitch.
As the horizontal pixel pitch p_x may differ from the vertical pixel pitch p_y for some detectors, the two directions have to be considered separately. With α_x = f/p_x and α_y = f/p_y, the camera calibration matrix for a CCD-type camera can be rewritten as

K = [ α_x s u₀ ; 0 α_y v₀ ; 0 0 1 ]. (3.14)

The parameter s is called skew and accounts for cases where the rows and columns of pixels on the detector chip are not perpendicular to each other. With modern cameras this is usually not a concern, so that one can assume that s = 0. The free parameters of K are commonly referred to as interior camera parameters.

3.3 Non-Linear Distortion

The pinhole model for CCD-type cameras is useful, as it allows a linear, and therefore simple, formulation of the imaging process. However, it is only an approximation, since real cameras and lenses exhibit distortion, which cannot be accounted for with a linear model. The effect of such non-linear distortion is demonstrated in figure 3.2, where the straight lines on a checkerboard appear curved in the image. Section 3.3.1 presents a non-linear model to describe lens distortion and section 3.3.2 provides an algorithm that makes it possible to “un-distort” images obtained with a real camera. The relation between world points and image coordinates in these undistorted images can then be used with the linear model, equation (3.11).

¹ Strictly speaking, the term CCD, short for charge-coupled device, only refers to the way in which the image information is stored and read out from the detector chip. Some imaging detectors use a different read-out mechanism and are therefore not CCD detectors in the strict sense of the original definition.

Figure 3.2: Due to non-linear distortion introduced by the lens, straight lines in the scene appear curved in this image taken with a consumer digital camera. For reference, straight lines are shown in red.
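Before distortion is added to the model, the linear projection of equations (3.11) to (3.13) can be sketched numerically; all camera parameters and the scene point below are made up for illustration.

```python
import numpy as np

# Minimal sketch of the linear pinhole projection x~ = K [R | T] X~,
# eqs. (3.11)-(3.13). All numerical values are illustrative assumptions.
f, u0, v0 = 500.0, 320.0, 240.0
K = np.array([[f, 0.0, u0],
              [0.0, f, v0],
              [0.0, 0.0, 1.0]])       # calibration matrix with zero skew
R = np.eye(3)                          # camera axes aligned with world axes
C = np.array([0.0, 0.0, -2.0])         # camera center 2 units behind origin
T = -R @ C                             # translation vector, eq. (3.10)

P = K @ np.hstack([R, T[:, None]])     # 3x4 projection matrix, eq. (3.12)

X_world = np.array([0.1, -0.2, 1.0, 1.0])   # homogeneous scene point
x_tilde = P @ X_world                       # eq. (3.13)
x = x_tilde[:2] / x_tilde[2]                # divide out the scale factor
print(x)                                    # pixel coordinates of the image point
```

The final division turns the homogeneous image point back into Euclidean pixel coordinates, exactly as described in section 3.1.1.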
3.3.1 Modeling Non-Linear Distortion

To account for image distortion caused by real lenses, a non-linear distortion term can be added to the undistorted coordinates calculated using the pinhole model. In this section, appropriate non-linear correction terms for modeling radial and tangential distortion are presented.

Most real lenses exhibit radial distortion, which causes a radial displacement between the image coordinate x, onto which a scene point X would be projected by an ideal linear camera, and the distorted coordinate x̆ of the projection actually observed. The center of this radial displacement is usually assumed to coincide with the principal point. Using undistorted image coordinates expressed relative to the principal point, û = u − u₀, v̂ = v − v₀, the radial displacement (δu_r, δv_r)ᵀ can be approximated by a series expansion

(δu_r, δv_r)ᵀ = (κ₁ r² + κ₂ r⁴ + . . .) (û, v̂)ᵀ, (3.15)

where

r = √(û² + v̂²) (3.16)

is the distance from the center of distortion. In the manufacturing process, it may happen that the elements of a lens are not perfectly centered on the optical axis. This causes a distortion that has both a radial and a tangential component. The radial component can be accounted for by (3.15), and the displacement due to tangential distortion can be modeled as

(δu_t, δv_t)ᵀ = (2τ₁ûv̂ + τ₂(r² + 2û²), τ₁(r² + 2v̂²) + 2τ₂ûv̂)ᵀ. (3.17)

Finally, the distorted image coordinate (ŭ, v̆)ᵀ is obtained by adding the radial and the tangential displacement vectors to the idealized, undistorted coordinate:

(ŭ, v̆)ᵀ = (u + δu_r + δu_t, v + δv_r + δv_t)ᵀ. (3.18)

This particular choice of parameters for modeling non-linear distortion is also used in [10, 47] and goes back to Brown [11, 12]. The distortion parameters are considered to be additional interior camera parameters, complementing the set of interior parameters of the camera calibration matrix K, described in section 3.2.3.
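The distortion model of equations (3.15) to (3.18) can be sketched as follows; the series is truncated after two radial coefficients, and the coefficient values in the usage example are illustrative, not calibrated ones.

```python
# Minimal sketch of the distortion model, eqs. (3.15)-(3.18), truncated
# after two radial (kappa) and two tangential (tau) coefficients.
def distort(u, v, u0, v0, kappa1, kappa2, tau1, tau2):
    """Map an undistorted pixel coordinate (u, v) to its distorted position."""
    u_hat, v_hat = u - u0, v - v0                 # relative to principal point
    r2 = u_hat ** 2 + v_hat ** 2                  # r^2, from eq. (3.16)
    radial = kappa1 * r2 + kappa2 * r2 ** 2       # truncated series, eq. (3.15)
    du_r, dv_r = radial * u_hat, radial * v_hat
    du_t = 2 * tau1 * u_hat * v_hat + tau2 * (r2 + 2 * u_hat ** 2)   # eq. (3.17)
    dv_t = tau1 * (r2 + 2 * v_hat ** 2) + 2 * tau2 * u_hat * v_hat
    return u + du_r + du_t, v + dv_r + dv_t       # eq. (3.18)

# With all coefficients zero the mapping is the identity:
assert distort(100.0, 50.0, 320.0, 240.0, 0, 0, 0, 0) == (100.0, 50.0)

# Barrel distortion (kappa1 < 0, illustrative value) pulls points towards
# the principal point:
u_d, v_d = distort(600.0, 240.0, 320.0, 240.0, -1e-7, 0, 0, 0)
assert u_d < 600.0
```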
3.3.2 Correcting for Non-Linear Distortion

In section 3.3.1, it was shown how to calculate the mapping between a 3D point and its real (distorted) image coordinates. In practice, the reverse problem usually arises: one acquires images with a lens system that introduces distortion, and would like to derive metric information from the images using the linear model of equation (3.13). Therefore, a mapping from the distorted coordinates (ŭ, v̆)ᵀ to the idealized undistorted coordinates (u, v)ᵀ is required before applying the pinhole model. If both tangential and radial distortion are included in the model, there is generally no analytical solution to this inverse mapping problem. However, by employing a look-up table approach, a corrected image can be calculated using algorithm 3.1.

Given: A distorted image I_dist.
1. Allocate image space I_corr to hold the corrected image (usually the same size as the original image).
2. Loop over all the image coordinates (u, v) in I_corr:
   (a) Calculate the distorted coordinate (ŭ, v̆).
   (b) Fill in image pixel (u, v) in I_corr with the intensity (color or grey-value) of pixel (ŭ, v̆) from I_dist (in general this will involve interpolation).

Algorithm 3.1: Algorithm for removing non-linear image distortion.

3.4 Estimation of Camera Parameters using Zhang’s Method

In the previous section, it was shown how a projective CCD camera model can be parameterized by a set of intrinsic and extrinsic parameters. In this section, it is shown how these parameters can be estimated for a given real camera. The process of estimating the camera parameters is usually called camera calibration. A variety of methods for camera calibration can be found in the literature; see [14] for an overview. The common idea underlying these methods is to derive constraints on the parameters using known correspondences between features in the 3D scene and features in the image.
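Algorithm 3.1 above can be sketched in code as follows. This is a minimal illustration: the `distort` mapping is assumed to be supplied (for example, an implementation of equations (3.15) to (3.18)), and nearest-neighbour lookup is used in place of proper interpolation.

```python
import numpy as np

# Minimal sketch of algorithm 3.1: for each pixel of the corrected image,
# look up the corresponding pixel of the distorted source image. A supplied
# distort() mapping and nearest-neighbour lookup are assumed for brevity.
def undistort_image(img_dist, distort, u0, v0):
    h, w = img_dist.shape
    img_corr = np.zeros_like(img_dist)            # step 1: allocate output
    for v in range(h):                            # step 2: loop over (u, v)
        for u in range(w):
            u_d, v_d = distort(float(u), float(v), u0, v0)   # step 2a
            ui, vi = int(round(u_d)), int(round(v_d))
            if 0 <= ui < w and 0 <= vi < h:       # stay inside source image
                img_corr[v, u] = img_dist[vi, ui] # step 2b (nearest neighbour)
    return img_corr

# With an identity "distortion" the image is unchanged:
img = np.arange(12.0).reshape(3, 4)
same = undistort_image(img, lambda u, v, u0, v0: (u, v), 2.0, 1.5)
assert np.array_equal(img, same)
```

In practice the per-pixel source coordinates would be precomputed once into a look-up table, since the distortion mapping does not change between frames.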
This can be done either through existing features in the 3D scene, such as straight lines and right angles, which are often found as architectural features, or through known feature points on a dedicated calibration target that is placed in the scene. For the present application, a calibration method proposed by Zhang [108] is used, in which a planar calibration target is imaged from several unknown orientations. The calibration target is a plane with a checkerboard pattern. The corners of the checkers are high-contrast feature points that can be located easily and precisely in the images, for example with a Harris corner detector [41]. This calibration method was chosen for its ease of use, keeping in mind that the calibration might eventually have to be performed on a ship, where other methods (such as moving a target on a linear positioner) might be less feasible. For a description of the calibration target used in the present application and the results of the calibration, see sections 6.4 and 7.2 respectively. The implementation of the algorithm used for this work is due to Bouguet [10]. The presentation in this section outlines the main steps of the calibration algorithm described in Zhang's paper.

3.4.1 The Calibration Process

For the calibration, a planar calibration target with easily identifiable point features (corners, edges) Q_i at known locations is needed. During the calibration process, images of this target are collected from several different orientations, either by moving the camera and leaving the target fixed or vice versa. This is illustrated in figure 3.3. The points q_i corresponding to Q_i are then extracted from the images and used for the calibration.

Figure 3.3: Calibration process. A planar calibration target, for example a checkerboard pattern, defines the world coordinate system. The positions of the checker corners Q_i with respect to this world coordinate system are known.
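The corner extraction mentioned above can be sketched with a minimal Harris-type detector. The numpy-only version below, with a box window standing in for the usual Gaussian and a hypothetical sensitivity k = 0.05, is an illustration, not the detector used in this work:

```python
import numpy as np

def harris_response(img, k=0.05, win=5):
    """Harris corner response R = det(M) - k * trace(M)^2, where M is the
    structure tensor of the image gradients, smoothed over a win x win box."""
    Iy, Ix = np.gradient(img.astype(float))
    kernel = np.ones(win) / win

    def smooth(a):
        # separable box filter standing in for the usual Gaussian window
        a = np.apply_along_axis(lambda m: np.convolve(m, kernel, mode="same"), 0, a)
        return np.apply_along_axis(lambda m: np.convolve(m, kernel, mode="same"), 1, a)

    Sxx, Syy, Sxy = smooth(Ix * Ix), smooth(Iy * Iy), smooth(Ix * Iy)
    return Sxx * Syy - Sxy**2 - k * (Sxx + Syy)**2

# a synthetic corner: one bright quadrant of a 20 x 20 image
img = np.zeros((20, 20))
img[10:, 10:] = 1.0
peak = np.unravel_index(np.argmax(harris_response(img)), img.shape)
```

The response is strongly positive only where gradients of both orientations meet, so the maximum lies near the quadrant corner; along pure edges it is negative, and in flat regions it vanishes. For checkerboard targets, the detected corners are typically refined to sub-pixel accuracy, a step omitted here.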
For the calibration, images of the calibration target are collected from several orientations, either by keeping the camera fixed and moving the calibration target or vice versa. For each orientation j of the calibration target, a set of exterior parameters, describing the rotation R_j and translation T_j between the world and camera coordinate systems, has to be estimated in addition to the interior parameters of the camera. The main calibration step, described in section 3.4.3, is an optimization that finds the set of interior camera parameters and exterior parameters with respect to the calibration plane that minimizes the re-projection error. The re-projection error is the distance between the observed image points q_i and the image points q̆_i calculated by projecting the known Q_i with the estimated camera parameters. To perform this optimization, it is necessary to have an initial guess for the camera parameters. Without this initial guess, it is likely that the optimization will converge to some local minimum of the re-projection error. An algebraic solution for finding this initial guess is presented in the following section 3.4.2.

3.4.2 Initial Guess through Closed-Form Solution

For the initial guess, only a linear pin-hole model without radial distortion is used. The following description of how to calculate the initial solution closely follows the steps outlined in Zhang's paper [108].

Notation and World Coordinate System

Throughout this section, the subscript j is used to denote different orientations of the calibration target and the subscript i is used to denote feature points on the calibration target. The world coordinate system is chosen such that the planar calibration target lies in the Z = 0 plane. The i-th feature point on the j-th orientation of the calibration target is denoted by the vector Q_ji = (X_ji, Y_ji, Z_ji)^T and its image is denoted by q_ji = (u_ji, v_ji)^T.
The Z_ji component of the feature point vector is, by definition of the world coordinate system, always zero, reflecting the fact that a planar calibration target is intrinsically two-dimensional.

Homography between the Model Plane and its Image

The feature points Q̃_ji lie on the model plane (that is, the calibration target) and their respective images q̃_ji also lie on a plane, namely the focal plane of the camera. Therefore, they are related by a planar homography, which is a projective transform of the projective plane P². This homography can be represented by a 3 × 3 matrix H_j:

    \tilde{q}_{ji} = \mu \begin{pmatrix} u_{ji} \\ v_{ji} \\ 1 \end{pmatrix} = H_j \begin{pmatrix} X_{ji} \\ Y_{ji} \\ 1 \end{pmatrix}    (3.19)

for some scale factor μ. If at least four correspondences between the feature points Q̃_ji and their images q̃_ji are established, the homography H_j can be calculated. The details of how this is done are given in appendix B.

Constraints on the Camera Parameters

For each relative orientation j between the calibration target and the camera, the homography H_j gives two constraints on the interior camera parameters. If at least three different homographies are obtained, the entire camera matrix can be determined. Once the interior camera parameters are determined, the exterior parameters for each orientation of the model plane can be extracted from H_j. To derive a relation between the homographies H_j and the camera parameters, consider the projection of the feature points Q̃_ji into the image (see equation (3.11))

    \tilde{q}_{ji} = K \, [R_j \,|\, T_j] \, \tilde{Q}_{ji},    (3.20)

with the camera matrix K and the exterior parameters R_j, T_j. Representing the rotation matrix R_j by its column vectors, this becomes

    \lambda \begin{pmatrix} u_{ji} \\ v_{ji} \\ 1 \end{pmatrix} = K \, [R_{j1} \; R_{j2} \; R_{j3} \; T_j] \begin{pmatrix} X_{ji} \\ Y_{ji} \\ Z_{ji} = 0 \\ 1 \end{pmatrix}.    (3.21)

Using the fact that, by definition of the world coordinate system, Z_ji = 0 for all feature points (the planar calibration target lies in the Z = 0 plane), the notation can be shortened by leaving out the Z coordinate:

    \lambda \begin{pmatrix} u_{ji} \\ v_{ji} \\ 1 \end{pmatrix} = K \, [R_{j1} \; R_{j2} \; T_j] \begin{pmatrix} X_{ji} \\ Y_{ji} \\ 1 \end{pmatrix},    (3.22)

representing the Q̃_ji by 3-vectors (X_ji, Y_ji, 1)^T. A comparison of the relation between the feature points and their images expressed by the planar homography, equation (3.19), with the same relation derived using the camera projection matrix, equation (3.22), shows that

    H_j = [H_{j1} \; H_{j2} \; H_{j3}] = \lambda K \, [R_{j1} \; R_{j2} \; T_j],    (3.23)

where the H_{jk}, k = 1…3, represent the column vectors of H_j. Again, the constant λ denotes some scale factor, though not necessarily the same λ as in (3.22). Using K^{-1}H_{j1,2} = λR_{j1,2} and the orthonormality of the rotation matrix R, that is R_k^T R_l = δ_{kl}, equation (3.23) leads to the following two constraints on the camera matrix K:

    H_{j1}^T K^{-T} K^{-1} H_{j2} = 0    (3.24)
    H_{j1}^T K^{-T} K^{-1} H_{j1} = H_{j2}^T K^{-T} K^{-1} H_{j2}.    (3.25)

The camera matrix K has five degrees of freedom. As each homography only gives two constraints, at least three homographies are needed to calculate the intrinsic parameters.

Closed-form solution

In this subsection, it is shown how the above constraints can be used to obtain a solution for K, given at least three homographies. Equations (3.24) and (3.25) involve the matrix product K^{-T}K^{-1}. For the calculations below, it is useful to express this product as

    B = K^{-T}K^{-1} \equiv \begin{pmatrix} B_{11} & B_{12} & B_{13} \\ B_{21} & B_{22} & B_{23} \\ B_{31} & B_{32} & B_{33} \end{pmatrix}.    (3.26)

B is symmetric and can therefore be represented by a 6-vector

    \boldsymbol{B} = (B_{11}, B_{12}, B_{22}, B_{13}, B_{23}, B_{33})^T.    (3.27)

If the k-th column vector of a given homography H_j is written as H_k = (H_{k1}, H_{k2}, H_{k3})^T, one can define a vector V_{kl} such that

    H_k^T B H_l = V_{kl}^T \boldsymbol{B}    (3.28)

with

    V_{kl} = \begin{pmatrix} H_{k1}H_{l1} \\ H_{k1}H_{l2} + H_{k2}H_{l1} \\ H_{k2}H_{l2} \\ H_{k3}H_{l1} + H_{k1}H_{l3} \\ H_{k3}H_{l2} + H_{k2}H_{l3} \\ H_{k3}H_{l3} \end{pmatrix}.    (3.29)

Using this definition, the constraints (3.24) and (3.25) can be recast in the form

    \begin{bmatrix} V_{12}^T \\ (V_{11} - V_{22})^T \end{bmatrix} \boldsymbol{B} = 0.    (3.30)

Given a single homography, this system is under-determined, but if n homographies corresponding to different orientations of the model plane are available, there are n equations of type (3.30). These n equations can be stacked, yielding the equation system

    V \boldsymbol{B} = 0    (3.31)

with the 2n × 6 matrix V. Leaving degenerate configurations aside, this system can be solved for B if at least three homographies are given. A least-squares solution of (3.31) can be obtained from the singular value decomposition of V, with B given as the right singular vector corresponding to the smallest singular value.

Once B is estimated, the intrinsic camera parameters can be calculated using equations (3.14) and (3.26), yielding

    v_0 = (B_{12}B_{13} - B_{11}B_{23}) / (B_{11}B_{22} - B_{12}^2)
    \lambda = B_{33} - \left( B_{13}^2 + v_0 (B_{12}B_{13} - B_{11}B_{23}) \right) / B_{11}
    \alpha_x = \sqrt{\lambda / B_{11}}
    \alpha_y = \sqrt{\lambda B_{11} / (B_{11}B_{22} - B_{12}^2)}
    s = -B_{12} \alpha_x^2 \alpha_y / \lambda
    u_0 = s v_0 / \alpha_y - B_{13} \alpha_x^2 / \lambda.    (3.32)

If K has been calculated, the exterior parameters follow from equations (3.14) and (3.23) as

    R_{j1} = \lambda K^{-1} H_{j1}, \quad R_{j2} = \lambda K^{-1} H_{j2}, \quad R_{j3} = R_{j1} \times R_{j2}, \quad T_j = \lambda K^{-1} H_{j3}.    (3.33)

3.4.3 Full Solution through Minimization of Geometric Error

The algebraic error minimized by solving equation (3.31) in a least-squares sense has no meaningful physical interpretation. This is because the matrix B contains parameters with different physical units, for example skew and focal length. Also, the solution obtained is only a linear solution that does not include the parameters of the distortion model described in section 3.3. The final calibration step is therefore a non-linear optimization, which varies the set of all parameters (exterior, intrinsic, distortion) to find the parameters that minimize a geometrically meaningful cost function.
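The closed-form initialization of section 3.4.2 can be sketched numerically. The following is an illustration of equations (3.29)–(3.32) on synthetic homographies, not the Bouguet implementation used in this work:

```python
import numpy as np

def v_kl(H, k, l):
    """Vector V_kl of equation (3.29) for columns k, l of H (0-based indices)."""
    a, b = H[:, k], H[:, l]
    return np.array([a[0]*b[0],
                     a[0]*b[1] + a[1]*b[0],
                     a[1]*b[1],
                     a[2]*b[0] + a[0]*b[2],
                     a[2]*b[1] + a[1]*b[2],
                     a[2]*b[2]])

def intrinsics_from_homographies(Hs):
    """Closed-form estimate of K from >= 3 homographies, equations (3.30)-(3.32)."""
    V = []
    for H in Hs:
        V.append(v_kl(H, 0, 1))                   # constraint (3.24)
        V.append(v_kl(H, 0, 0) - v_kl(H, 1, 1))   # constraint (3.25)
    _, _, Vt = np.linalg.svd(np.asarray(V))
    b = Vt[-1]
    b = b * np.sign(b[0])                         # B is only defined up to scale and sign
    B11, B12, B22, B13, B23, B33 = b
    v0 = (B12*B13 - B11*B23) / (B11*B22 - B12**2)
    lam = B33 - (B13**2 + v0*(B12*B13 - B11*B23)) / B11
    ax = np.sqrt(lam / B11)
    ay = np.sqrt(lam*B11 / (B11*B22 - B12**2))
    s = -B12 * ax**2 * ay / lam
    u0 = s*v0/ay - B13*ax**2 / lam
    return np.array([[ax, s, u0], [0.0, ay, v0], [0.0, 0.0, 1.0]])
```

On noise-free homographies H_j = K [R_{j1} R_{j2} T_j] with generic plane orientations, the routine recovers K essentially exactly; with real, noisy corner positions it only serves as the initial guess for the non-linear refinement of section 3.4.3.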
This cost function is

    \epsilon(K, \kappa, \tau, R_j, T_j, Q_i) = \sum_{j=1}^{n} \sum_{i=1}^{m} \left\| q_{ji} - \breve{q}(K, \kappa, \tau, R_j, T_j, Q_i) \right\|^2,    (3.34)

that is, the sum of squared differences between the image coordinates of the extracted grid corners q_ji and the coordinates calculated by re-projecting the known grid-corner positions Q_ji using the set of exterior, interior and distortion parameters. The outer sum runs over all n orientations j of the planar calibration target, and the inner sum runs over all m grid points i. The term ‖q_ji − q̆(K, κ, τ, R_j, T_j, Q_i)‖ is called the re-projection error for each point. It has a meaningful interpretation as the distance between an extracted point coordinate and the respective coordinate obtained by re-projection; therefore, ε is a geometrically meaningful error function to be minimized. In the present application, an implementation by Bouguet [10] was used, in which the optimization is performed with a gradient-descent algorithm (see for example [75]). As the optimization algorithm may become trapped in some irrelevant local minimum, it is important to have an initial estimate of the parameters that is close to the correct solution. To this end, the results obtained using the algebraic method described in section 3.4.2 were used as an initial guess for the interior and exterior camera parameters, and the initial values of the distortion parameters were set to zero.

3.5 Two-View Geometry

In this section, the geometry of a stereo system consisting of two cameras is reviewed. First, section 3.5.1 shows how the camera calibration method described in section 3.4 can be used to calibrate a stereo camera system and to determine the relative orientation of the two cameras in addition to their intrinsic parameters. Section 3.5.2 discusses the epipolar constraint, which is one of the most important geometric constraints in two-view geometry.
Given a point in one image, the epipolar constraint limits the possible locations of the corresponding point in the second image to a line. Using homogeneous coordinates, the epipolar constraint can be expressed elegantly in terms of the fundamental matrix. The epipolar constraint can be exploited to reduce the search space of candidate image points in stereo matching (see chapter 4). For this, it is often convenient to apply projective transformations to the original stereo image pair that align the epipolar lines with the image scan-lines. This so-called rectification process is described in sections 3.5.3 and 3.5.4.

3.5.1 Calibration of a Stereo Camera System

An important parameter describing a stereo camera system is the relative orientation of the two cameras with respect to each other. It is required for the calculation of the 3D position of a point by triangulation (see section 3.5.5) and, if known, allows a simple determination of the fundamental matrix (see section 3.5.2). The relative orientation of two cameras with respect to each other can easily be determined if the two cameras are geometrically calibrated with respect to the same world coordinate system. Therefore, the calibration process described in section 3.4.1 should be performed for both cameras together, using the same feature points and the same orientations of the reference plane.

The calibration procedure as described in section 3.4.1 uses a world coordinate system with its origin at one of the corners of the calibration target. For a stereo camera system, where usually only the relative position of the cameras is of interest, it is more appropriate to let the world coordinate system coincide with the camera coordinate system of one of the cameras. The position and orientation of the second camera is then expressed with respect to this camera system, as discussed below.
Assume that the rotation matrices R1, R2 and the translation vectors T1, T2 describe the exterior orientation of the two cameras with respect to the same world coordinate system, defined by the calibration pattern. This situation is depicted in figure 3.4.

Figure 3.4: Calibration of a stereo camera system. The projective centers of the two cameras comprising a stereo system are denoted by C1 and C2. The world coordinate frame, denoted by the subscript w, is defined by the calibration target. The exterior orientation of the two cameras with respect to this world frame is represented by the rotation matrices R1, R2 and the translation vectors T1, T2. The position of the second camera with respect to the camera coordinate system of the first camera can be calculated using equations (3.35) and (3.36) and is represented by R12 and T12.

The translation vectors T1, T2 are related to the positions of the two camera centers C1, C2 by T1 = −R1C1 and T2 = −R2C2, respectively (see equation (3.10)). The position of the second camera center with respect to the first camera center can be expressed by the vector sum C′2 = −C1 + C2. The corresponding translation vector is therefore

    T_{12} = -R_1 \left( R_1^{-1} T_1 - R_2^{-1} T_2 \right) = -T_1 + R_1 R_2^{-1} T_2.    (3.35)

The rotation of the camera coordinate system of the second camera with respect to the camera coordinate system of the first camera is obtained by two successive rotations:

• Rotating it into alignment with the world coordinate system (defined by the calibration target) using R_2^{-1}.
• Rotating it from the world coordinate system into alignment with the camera coordinate system of the first camera using R_1.

The combined rotation matrix is therefore

    R_{12} = R_1 R_2^{-1}.    (3.36)

As only the relative orientations of the two cameras and the calibration target are involved, it makes no difference whether the stereo camera system is moved and the calibration target remains fixed, or whether the stereo camera system remains fixed and the calibration target is moved. For each orientation j of the reference plane, a complete set of exterior parameters R_j1, T_j1, R_j2, T_j2 is obtained for both cameras. Each of these sets can be used to determine the relative orientation T12, R12 between the two cameras. Because of uncertainties, there will be some variation in the estimated relative orientations. The implementation of the stereo calibration used in the present work, due to Bouguet [10], therefore uses the component-wise median of all the translation vectors. As the entries of a rotation matrix are not all independent, one cannot simply take the median of each entry. Therefore, each rotation is expressed as a rotation vector, whose direction specifies the axis of rotation and whose magnitude specifies the angle of rotation. The median of each component of the rotation vector is then taken to obtain a "median rotation". As a final step, the orientation parameters are further refined using a non-linear optimization, as in section 3.4.3, that minimizes the re-projection error of the points in both images by varying the relative orientation and the interior parameters of both cameras.

3.5.2 Epipolar Constraint and the Fundamental Matrix

Epipolar Constraint

The diagram in figure 3.5 shows the geometry of two cameras imaging the same scene point P. Assuming that the image of P in one of the cameras is given, for example p1, the line connecting the camera center C1 and P is known. The image point p2, which is the projection of P into the other camera, must lie on the projection of the line connecting C1 and P.
This projection is a line in the image plane (named l2 in figure 3.5) and is called the epipolar line corresponding to p1. The configuration is symmetric: there is also an epipolar line l1 in the first camera corresponding to the image p2 of P in the second camera. Together with the two camera centers C1 and C2, every scene point P defines a plane, the so-called epipolar plane. The intersections of this plane with the two focal planes are precisely the epipolar lines corresponding to P. All epipolar planes contain the line that connects the two camera centers C1 and C2. This line intersects the focal planes at the so-called epipoles e1 and e2. In each image, all epipolar lines intersect at the epipole.

Figure 3.5: Two-view geometry. The scene point P is projected to the image points p1, p2 through the projective centers C1 and C2 respectively. The camera center C1 and the image point p1 define a line of sight. The projection of this line onto the image plane of the second camera forms the epipolar line l2 corresponding to the point p1. The image point p2 corresponding to the same scene point P as the image point p1 in the other view must lie on the epipolar line l2. For illustration, the projections of some points lying on the line connecting C1 to P onto the epipolar line are shown. The plane spanned by the point P and the two camera centers is called the epipolar plane. The projections of the two camera centers into the respective other image define the epipoles e1, e2.

The Fundamental Matrix and its Properties

The epipolar constraint finds its algebraic expression in the so-called fundamental matrix F, a 3 × 3 matrix of rank 2. Without derivation, this section lists some properties of the fundamental matrix (for a derivation see [27, 44]).
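These properties can be checked numerically. The sketch below builds F from known calibration parameters, as in equation (3.41), and verifies the epipolar constraint (3.39) for one synthetic point; the camera parameters are hypothetical values chosen for illustration:

```python
import numpy as np

def skew(t):
    """[T]_x of equation (3.42): skew(t) @ x equals the cross product T x X."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def fundamental_from_calibration(K1, K2, R, T):
    """Equation (3.41): F = K2^{-T} [T]_x R K1^{-1}."""
    return np.linalg.inv(K2).T @ skew(T) @ R @ np.linalg.inv(K1)

# hypothetical calibration: identical cameras, pure translation along x
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
R, T = np.eye(3), np.array([0.2, 0.0, 0.0])
F = fundamental_from_calibration(K, K, R, T)

# a scene point X and its two homogeneous images, x1 ~ K[1|0]X and x2 ~ K[R|T]X
X = np.array([0.3, -0.1, 4.0])
x1, x2 = K @ X, K @ (R @ X + T)
residual = x2 @ F @ x1   # epipolar constraint (3.39): should vanish
```

The residual vanishes for any scene point, since the constraint holds identically for images of the same point; for points off the epipolar line it is non-zero.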
Using the homogeneous notation for points and lines in the projective plane P², introduced in section 3.1.1, the fundamental matrix establishes a relationship between a point x̃1 in image I1 and its corresponding epipolar line l̃2 in image I2:

    \tilde{l}_2 = F \tilde{x}_1.    (3.37)

For a point x̃2 in image I2, the corresponding epipolar line l̃1 in image I1 can be calculated using the transpose of the fundamental matrix:

    \tilde{l}_1 = F^T \tilde{x}_2.    (3.38)

If the two points x̃1, x̃2 are images of the same scene point X, then x̃2 must lie on the epipolar line l̃2 corresponding to x̃1. It follows that

    \tilde{x}_2^T \tilde{l}_2 = \tilde{x}_2^T F \tilde{x}_1 = 0,    (3.39)

because, when using homogeneous coordinates, the inner product p̃^T l̃ is zero for every point p̃ on the line l̃. Because all epipolar lines in each image intersect at the epipole of their respective image, the epipoles are the null-spaces of F and its transpose:

    F \tilde{e}_1 = \boldsymbol{0}, \qquad F^T \tilde{e}_2 = \boldsymbol{0},    (3.40)

where 0 is the null-vector (0, 0, 0)^T.

Calculating the Fundamental Matrix

The fundamental matrix can be computed from point correspondences across the two images, for example using the algorithm outlined by Hartley [42]. If the intrinsic and extrinsic camera parameters of the stereo system are already known, for example as the result of a camera calibration process as outlined in section 3.4, the fundamental matrix can be derived from these parameters. Assuming that the camera coordinate system of the first camera coincides with the world coordinate system and that the relative position and orientation of the second camera are expressed by a translation vector T and a rotation matrix R, that is P1 = K1 [1 | 0], P2 = K2 [R | T], the fundamental matrix is given by

    F = K_2^{-T} [T]_\times R K_1^{-1}.    (3.41)

The notation [T]× stands for the matrix for which [T]× X = T × X for any X, that is

    [T]_\times = \begin{pmatrix} 0 & -T_3 & T_2 \\ T_3 & 0 & -T_1 \\ -T_2 & T_1 & 0 \end{pmatrix}.    (3.42)

3.5.3 Image Rectification

For a verging stereo camera setup like the one depicted in figure 3.5, the epipoles lie either in the image or at a finite distance from the image.
As the epipoles are the points where all epipolar lines intersect, this results in non-parallel epipolar lines, as depicted in figure 3.6.

Figure 3.6: Epipolar lines for a configuration where the epipole e lies within a finite distance of the image.

For the implementation of the stereo matching process described in chapter 4, it is more convenient to have parallel epipolar lines that coincide with the image scan-lines. This can be achieved if the images of both cameras lie in the same focal plane. For such a setup, the line connecting the two camera centers must be parallel to the common focal plane. Therefore the epipoles, which are the projections of the two camera centers into the respective other image, lie at infinity, resulting in parallel epipolar lines. However, it is often not practical to use such a setup, because the overlap of the fields of view is smaller than for a verging stereo camera setup (see also section 6.1.2). Instead of using a stereo setup with coplanar focal planes, one can use a verging stereo system and apply projective transformations that map the image points obtained with this setup onto two rectified images. These rectified images share a virtual image plane that is parallel to the baseline connecting the camera centers, as shown in figure 3.7. This process of re-projecting the images is known as rectification.

Figure 3.7: Rectification of stereo image pairs. The image points x1 and x2 of X are re-projected onto a common image plane, which is parallel to the baseline connecting the two camera centers C1 and C2. The rectified image points are denoted x̄1 and x̄2. In contrast to the original epipoles e1, e2, the epipoles corresponding to the new image planes lie at infinity, resulting in parallel epipolar lines (dashed).
Using homogeneous coordinates, the rectification can be performed by applying homographies H1 and H2 to all points in the original images I1 and I2 respectively:

    \bar{x}_1 = H_1 \tilde{x}_1 \quad \text{and} \quad \bar{x}_2 = H_2 \tilde{x}_2,    (3.43)

where x̄1, x̄2 are the image coordinates in the rectified images. This results in the following transformation of the epipolar constraint (3.39) for two points x̃1, x̃2 in images I1, I2 corresponding to the same scene point X̃:

    \tilde{x}_2^T F \tilde{x}_1 = 0 \quad \Rightarrow \quad \bar{x}_2^T \underbrace{H_2^{-T} F H_1^{-1}}_{\bar{F}} \bar{x}_1 = 0,    (3.44)

with F̄ being the fundamental matrix for the rectified image pair. For image rectification, the homographies H1, H2 have to be chosen such that the epipolar lines corresponding to F̄ are parallel and aligned with the image scan-lines. It is easy to verify that the fundamental matrix

    \bar{F} = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix}    (3.45)

fulfills these requirements. Its epipoles lie at e = (1, 0, 0)^T, which is a point at infinity; therefore the epipolar lines are parallel. Corresponding points, for which x̄_2^T F̄ x̄_1 = 0, also have the same y-coordinate, that is, they lie on the same scan-line.

3.5.4 Projective Distortion-Minimizing Rectification

Several algorithms for calculating the rectifying homographies, given the fundamental matrix or the projection matrices of the two cameras, have been proposed; see for example [25, 32, 38, 71]. The choice of homographies H1, H2 that fulfill

    \bar{F} = H_2^{-T} F H_1^{-1} = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix}    (3.46)

is not unique. A geometric explanation for this is that there are infinitely many planes that are parallel to the camera baseline. The particular choice of homographies can affect the quality of the rectified image pair, because the digital images have to be resampled when the projective transformations are applied. This is illustrated in figure 3.8, where a stereo pair is rectified with two different sets of homographies.
Clearly, the choice of homographies for the stereo pair in the middle is unfortunate, as some parts of the images are compressed, resulting in a loss of detail, whereas other parts of the images are stretched. For the present application, the algorithm by Loop and Zhang [71] has been implemented to calculate a set of homographies that minimizes a measure of distortion for the rectified images. The derivation of the algorithm is quite lengthy and can be found in the original paper by Loop and Zhang [71]; only the main steps of the algorithm are outlined here. In the following discussion, the homographies H1, H2 are represented by

    H = \begin{pmatrix} \tilde{u}^T \\ \tilde{v}^T \\ \tilde{w}^T \end{pmatrix} = \begin{pmatrix} u_a & u_b & u_c \\ v_a & v_b & v_c \\ w_a & w_b & 1 \end{pmatrix},    (3.47)

where w_c is set to 1, as homographies are only defined up to a scale factor. Indices 1 and 2, as in u_{a,1}, denote elements of H1 and H2 respectively; without indices, the discussion holds for both homographies.

Figure 3.8: Effect of the choice of rectifying homographies on the rectified image pairs. The top row shows the original image pair; the middle and bottom rows show rectified image pairs obtained with a bad and a good choice of rectifying homographies, respectively. A few sample points and their corresponding epipolar lines are shown in red.

Decomposition of Homographies

The main idea of the algorithm is to decompose the rectifying homographies into a projective part H_p and an affine part H_a, and to choose the free parameters of the projective part such that a cost function representing projective distortion is minimized. The affine transform is further subdivided into a similarity H_r, that is a rotation and a translation, and a shearing transform H_s. This results in the following two homographies:

    H_1 = H_{s,1} H_{r,1} H_{p,1} \quad \text{and} \quad H_2 = H_{s,2} H_{r,2} H_{p,2}.    (3.48)

The projective part H_p of the transforms is used to bring the epipoles to infinity. The similarity transform H_r subsequently rotates them into alignment with the horizontal image scan-lines, that is, the (1, 0, 0)^T direction.
One of the similarities is also used to translate one of the images vertically, such that corresponding points lie on the same scan-line in each image. After this step, the images are rectified. The final shearing transform H_s reduces the perspective distortion, if possible, by preserving the perpendicularity of lines and the aspect ratio. It is also used to adjust the scale of the rectified images.

Projective Part H_p and Distortion Minimization

The projective parts of the transform are used to map the epipoles to points at infinity and take the form

    H_p = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ w_a & w_b & 1 \end{pmatrix}.    (3.49)

H_p transforms a point x̃_i = (x_i, y_i, 1)^T to (x_i/γ_i, y_i/γ_i, 1)^T, where γ_i is given by the dot product of the third row of H_p and x̃_i:

    \gamma_i = (w_a, w_b, 1)\,(x_i, y_i, 1)^T.    (3.50)

If the γ_i were the same for all points x̃_i, there would be no projective distortion and the transform H_p would be affine. However, it is not possible to use an affine transform to map the epipole to infinity unless it is already there. Therefore the γ_i, which can be thought of as projective weight factors, will show some variation for different points x̃_i. The idea behind the distortion minimization used in [71] is to find the projective transform that maps the epipole to a point at infinity and is as close to an affine transform as possible. To decide whether a projective transform is close to affine, the variation of the γ_i over all image points is used as a criterion. This results in the cost function

    f(w_{a,1}, w_{b,1}, w_{a,2}, w_{b,2}) = \sum_{i=1}^{n_1} \left( \frac{\gamma_{i,1} - \gamma_{c,1}}{\gamma_{i,1}} \right)^2 + \sum_{i=1}^{n_2} \left( \frac{\gamma_{i,2} - \gamma_{c,2}}{\gamma_{i,2}} \right)^2,    (3.51)

where the first sum runs over all n_1 pixels of the first image and the second sum over all n_2 pixels of the second image. The weights γ_{c,1} and γ_{c,2} are the weights of the center pixels of images I1 and I2 respectively.
When minimizing this function, the fact that the parameters w_{a,1}, w_{b,1} and w_{a,2}, w_{b,2} are not fully independent, because they are related through the fundamental matrix, must be taken into account. The set of parameters that minimizes f, and therefore minimizes the variation of the weights γ_i over both images, yields the projective transforms that are closest to affine and therefore reduce distortion.

Similarity Transform

The similarity transform H_r is used to rotate the epipoles into alignment with the image scan-lines. Assuming that w_{a,1}, w_{b,1} and w_{a,2}, w_{b,2} have been determined by minimizing equation (3.51), Loop and Zhang arrive at the following results:

    H_{r,1} = \begin{pmatrix} F_{32} - w_{b,1}F_{33} & w_{a,1}F_{33} - F_{31} & 0 \\ F_{31} - w_{a,1}F_{33} & F_{32} - w_{b,1}F_{33} & F_{33} + v_{c,2} \\ 0 & 0 & 1 \end{pmatrix}    (3.52)

and

    H_{r,2} = \begin{pmatrix} w_{b,2}F_{33} - F_{23} & F_{13} - w_{a,2}F_{33} & 0 \\ w_{a,2}F_{33} - F_{13} & w_{b,2}F_{33} - F_{23} & v_{c,2} \\ 0 & 0 & 1 \end{pmatrix},    (3.53)

where the F_ij are the elements of the fundamental matrix. The parameter v_{c,2} should be chosen such that corresponding points lie on the same scan-line.

Shearing Transform

The final shearing transform is of the form

    H_s = \begin{pmatrix} \delta_1 & \delta_2 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.    (3.54)

It is used to reduce the projective distortion introduced by H_p. This is done by adjusting the free parameters δ1 and δ2 such that the perpendicularity of lines and the aspect ratio of each image are preserved. Given the points a, b, c, d in the original image as defined in figure 3.9 and their counterparts a′, b′, c′, d′ in the image transformed by H_r H_p, one can define the Euclidean 2-vectors h, v connecting the mid-points of the transformed image borders. With h̃ = (h^T, 0)^T and ṽ = (v^T, 0)^T, perpendicularity after applying the shearing transform is ensured if

    (H_s \tilde{h})^T (H_s \tilde{v}) = 0.    (3.55)

The original aspect ratio is preserved after the shearing transform if

    \frac{(H_s \tilde{h})^T (H_s \tilde{h})}{(H_s \tilde{v})^T (H_s \tilde{v})} = \frac{n_w^2}{n_h^2},    (3.56)

where n_w and n_h are the width and the height of the original image.
The values of δ1, δ2 for which H_s satisfies equations (3.55) and (3.56) are given in Loop and Zhang [71] as

    \delta_1 = \frac{n_h^2 h_y^2 + n_w^2 v_y^2}{n_w n_h (h_y v_x - h_x v_y)} \quad \text{and} \quad \delta_2 = \frac{n_h^2 h_x h_y + n_w^2 v_x v_y}{n_w n_h (h_x v_y - h_y v_x)}.    (3.57)

Figure 3.9: Definition of the mid-points a, b, c, d of the image borders in the original image (left) and their counterparts a′, b′, c′, d′ in the image transformed by H_r H_p (right).

The shearing transforms for both images can be calculated this way, yielding the rectifying homographies H1 = H_{s,1}H_{r,1}H_{p,1} and H2 = H_{s,2}H_{r,2}H_{p,2}. In general, the rectified images resulting from these homographies still have to be scaled. An appropriate scale factor can be chosen by requiring the combined area of the rectified images to be the same as that of the original image pair.

3.5.5 Triangulation

The goal of stereo imaging is usually to obtain a 3D reconstruction of an object, for example the water surface in the present application. To obtain the 3D coordinates of a point X, which projects to the image points x1 and x2 in the two images I1 and I2, one has to find the intersection of the two lines of sight that pass through the image points and their respective camera centers. In general, the positions of x1 and x2 are not known exactly, but are subject to uncertainties arising from noise or from errors in the search for corresponding points. Therefore, the lines of sight might not intersect, and an approximate solution must be sought. Several methods for choosing an approximate solution are summarized and assessed by Hartley and Sturm [43]. For the present application, a linear triangulation method, described below, was chosen. As demonstrated in [43], this method does not perform very well in cases where the camera projection matrices are only known up to a projective transform. As the camera projection matrices are known in this work, this is not a critical issue, and the use of the linear triangulation method is justified.
Linear Triangulation Method

By substituting the projection matrices P1, P2 of the two cameras and a pair of corresponding image points x̃1 = w1(x1, y1, 1)^T and x̃2 = w2(x2, y2, 1)^T into equation (3.13), one obtains

    \tilde{x}_1 = P_1 \tilde{X} = \begin{pmatrix} P_{1,a}^T \\ P_{1,b}^T \\ P_{1,c}^T \end{pmatrix} \begin{pmatrix} \tilde{X} \\ \tilde{Y} \\ \tilde{Z} \\ \tilde{W} \end{pmatrix}    (3.58)

and an analogous equation for x̃2 and P2. If the rows of the projection matrix are expressed by the 4-vectors P^T_{a…c} as above, this linear system yields the following three equations:

    w_1 x_1 = P_{1,a}^T \tilde{X}
    w_1 y_1 = P_{1,b}^T \tilde{X}
    w_1 = P_{1,c}^T \tilde{X}.    (3.59)

The third equation, for w1, can be substituted into the other two, yielding two equations in three unknowns. An analogous substitution can be applied to the equations corresponding to the second camera, and the resulting system of four linear equations can be arranged in matrix form as

    A \tilde{X} = \begin{pmatrix} x_1 P_{1,c}^T - P_{1,a}^T \\ y_1 P_{1,c}^T - P_{1,b}^T \\ x_2 P_{2,c}^T - P_{2,a}^T \\ y_2 P_{2,c}^T - P_{2,b}^T \end{pmatrix} \tilde{X} = 0.    (3.60)

The least-squares solution for X̃ yields the position of the scene point corresponding to x̃1, x̃2 in homogeneous coordinates. In the present implementation, this equation is solved by computing the eigenvector corresponding to the smallest eigenvalue of A^T A using the singular value decomposition (see for instance Golub and van Loan [39] or any other textbook on matrix computations). Finally, the Euclidean coordinates of the scene point corresponding to the image points x1, x2 are obtained as X = (X̃, Ỹ, Z̃)^T / W̃.

If, as in the present work, the stereo image pair is rectified by a pair of homographies H1, H2, the triangulation can be performed on pairs of rectified points. In this case, the original projection matrices have to be replaced by the effective projection matrices after rectification, namely P1r = H1P1 and P2r = H2P2.

4 Stereo Computer Vision II: Solving the Correspondence Problem

The concepts outlined in the previous chapter can be used to infer the 3D coordinates of a scene point given its image points in two different views.
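This inference is the linear triangulation of section 3.5.5; as a minimal sketch, equation (3.60) can be implemented and checked on synthetic data as follows (the camera parameters are hypothetical):

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear triangulation: build the system (3.60) and take the SVD
    least-squares solution, i.e. the right singular vector of A belonging
    to the smallest singular value."""
    A = np.vstack([x1[0] * P1[2] - P1[0],
                   x1[1] * P1[2] - P1[1],
                   x2[0] * P2[2] - P2[0],
                   x2[1] * P2[2] - P2[1]])
    _, _, Vt = np.linalg.svd(A)
    Xh = Vt[-1]                      # homogeneous solution (X, Y, Z, W)
    return Xh[:3] / Xh[3]            # Euclidean coordinates

# hypothetical setup: identical cameras, baseline along the x-axis
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.2], [0.0], [0.0]])])

def project(P, X):
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

X_true = np.array([0.1, -0.05, 4.0])
X_est = triangulate(P1, P2, project(P1, X_true), project(P2, X_true))
```

For noise-free image points, the system has an exact null vector and the scene point is recovered; with noisy correspondences, the SVD yields the algebraic least-squares compromise discussed above.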
In this chapter, the problem of how to identify a pair of image points p1, p2 corresponding to a common point P in the scene, given two images I1, I2 of that scene, is addressed. Such corresponding points are sometimes referred to as homologous points, and the problem of identifying them is known as the stereo correspondence problem. In the computer vision literature, a vast number of stereo matching algorithms for solving the correspondence problem can be found. Section 4.1 classifies these algorithms, which fall into two main categories: feature-based, briefly discussed in section 4.1.1, and area-based, analyzed in more detail in section 4.1.2. Based on the recent taxonomy of area-based stereo matching algorithms by Scharstein and Szeliski [82], it is shown how these algorithms can be broken up into a few characteristic building blocks. Area-based matching algorithms rest on the premise that the respective local neighborhoods around two homologous image points look similar. This premise can sometimes be violated. Therefore, section 4.2 reviews the prerequisites for area-based matching and shows that, for a water surface, they are better fulfilled for infrared images than for images acquired at visible wavelengths. The matching algorithm used in the present application and its implementation are presented in section 4.3. Note that the difference between the image coordinates of two homologous points is commonly referred to as disparity. Therefore, the process of matching corresponding points in stereo images is also known as disparity estimation.

4.1 Classification of Matching Algorithms

In order to motivate the choice of matching algorithm used in the present work, this section gives a brief overview of existing stereo algorithms. A more extensive review can be found in the recent paper by Brown et al. [13]. The paper by Dhond and Aggarwal [19], though outdated in some points, can still serve as a good introduction.
The work by Scharstein and Szeliski [82] focuses on area-based methods.

As mentioned above, stereo matching algorithms can be divided into two major categories. Feature-based algorithms seek correspondences only for salient image features, such as edges or corners, whereas area-based algorithms try to find correspondences for every image point. The focus of this overview is on area-based algorithms, as they allow a dense surface reconstruction, which is a requirement for the present application.

4.1.1 Feature-based Stereo Matching

In feature-based stereo matching, the complexity of the matching process is reduced by a pre-processing step in which prominent features like corners (points with a high grey-value gradient in all directions) and edges are extracted from the image. The matching is then performed on these features only. The reduction in the number of candidate points significantly speeds up the algorithm. Furthermore, the chance of ambiguous matches is reduced. One application of feature-based stereo matching in a field related to the study of water waves is stereo particle tracking velocimetry. This technique has been employed at the Institute for Environmental Physics, University of Heidelberg, to analyze the subsurface turbulence generated by water waves [see 24, 34, 105] and to study the effects of ship-induced turbulence on sediment transport in waterways [see 62, 65, 92, 94]. In both applications, moving particles (tracer or sediment particles) create streaks in the images of both cameras. These streaks are segmented from the background and matched, eventually leading to a 3D reconstruction of the particles' trajectories and velocities. Feature-based algorithms generate only a sparse depth map and are therefore not suitable for the present application.
4.1.2 Area-Based Stereo Matching

In area-based disparity matching, small area patches around a point in one image are compared to area patches around every point on the corresponding epipolar line (see section 3.5.2) in the other image, usually by computing the correlation of the image intensities (grey-values). Based on the assumption that the areas surrounding two image points corresponding to the same scene point are similar, points with a higher correlation score are assigned a higher probability of being the correct match. For this assumption to be justified, a number of conditions, discussed in detail in section 4.2, have to be met. With area-based matching, a match for every image pixel is sought, leading to a dense 3D reconstruction.

In the following paragraphs, the main building blocks of area-based algorithms are outlined. First, the representation of matches using a disparity map is defined. This is followed by a dissection of the area-based algorithms into three main steps based on the work by Scharstein and Szeliski [82].

Representation of Matches

It is common to represent matching points by their shift in image coordinates (from one image to the other), expressed in terms of a uni-valued disparity map D(x, y). This disparity map is defined by

$$
I_1(x, y) \;\overset{\text{matches}}{\longleftrightarrow}\; I_2(x + D(x, y), y).
\tag{4.1}
$$

This means that for every pixel coordinate (x, y) in image I1, the corresponding image point in image I2 can be found by adding the disparity D(x, y) to the x-coordinate. Disparity can therefore be interpreted as a shift in pixels. Note that one image of the stereo pair plays the role of the reference image with respect to which the disparity is defined. Here, I1 was arbitrarily chosen as the reference image; it is of course also possible to use I2 as the reference.
The disparity is only added to the x-coordinate, taking into account that the match must lie on the epipolar line (see section 3.5.2), and assuming that the epipolar lines have been aligned with the x-axis through a rectification step as described in section 3.5.3.

Algorithm Building Blocks

To provide a framework for their taxonomy of area-based matching algorithms, Scharstein and Szeliski [82] break down the algorithms into three major building blocks:

1. computation of a correlation score for each possible match,
2. aggregation of the matching score over local neighborhoods, and
3. computation, and possibly optimization, of the disparity.

Scharstein and Szeliski then classify the matching algorithms based on the approaches used to perform the above steps. Most stereo algorithms fit into this classification scheme, although many algorithms combine one or more of the steps into a single step for implementation reasons. For example, the algorithm chosen for the present application, described in section 4.3, combines the computation of the matching score with the aggregation step.

Matching Score

The first step of a stereo algorithm consists of calculating a similarity measure for every pair of candidate points. This similarity measure is often referred to as matching score or correlation score. Common matching scores used as a similarity measure include absolute differences (AD), squared differences (SD) and cross-correlation of image grey-values. The matching scores for all image pixels can be represented as a data volume C0(x, y, d). This data volume is referred to as the disparity space image [82]. As an example, for a matching score of absolute differences, the disparity space image is calculated as

$$
C_0(x, y, d) = \left| I_1(x, y) - I_2(x + d, y) \right|.
\tag{4.2}
$$

Depending on the image size and the range of disparities that is searched, the data volume C0(x, y, d) can take up large amounts of memory.
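As an illustration, the absolute-difference disparity space image of equation (4.2) can be built in a few lines of NumPy. This is a sketch, not the thesis implementation; treating pixels shifted in from outside the image as zero is one possible border convention:

```python
import numpy as np

def disparity_space_image(I1, I2, d_max):
    """Absolute-difference matching score C0(x, y, d) of eq. (4.2) for
    disparities d = 0 ... d_max, with I1 as the reference image.

    Pixels for which x + d falls outside I2 are compared against zero.
    Returns an array of shape (rows, cols, d_max + 1).
    """
    rows, cols = I1.shape
    C0 = np.zeros((rows, cols, d_max + 1))
    for d in range(d_max + 1):
        shifted = np.zeros((rows, cols))
        shifted[:, : cols - d] = I2[:, d:]       # I2(x + d, y)
        C0[:, :, d] = np.abs(I1 - shifted)
    return C0
```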
For efficiency reasons, many implementations, including the present application, therefore only store the matching scores for a single image line (see 4.3.2). The major difference between matching scores lies in their sensitivity to grey-value transformations (scaling and offset) and to outliers. However, the findings of Scharstein and Szeliski [82] and Faugeras et al. [26] suggest that the result of the stereo matching process is not strongly dependent on the particular choice of matching score.

Aggregation of Matching Score

In general, matches cannot be found by comparing only single pixels. This is because the grey-value is affected by camera noise and view angle, and because other, non-corresponding, pixels might have a similar grey-value. Therefore, it is necessary to consider pixel neighborhoods. For two pixels that are a correct match, it is likely that their respective neighboring pixels will also have a high degree of similarity. For two pixels that do not match, in contrast, the matching score of their respective neighboring pixels is likely to be lower. By taking the neighboring pixels into account, the comparison is effectively one of area patches and not of single pixels. A simple way to take the neighborhood into account is to sum up the matching score over a fixed window of support around each pixel. This can be done by convolving the disparity space image C0(x, y, d) with a filter mask w(x, y, d) that computes a local average, for example a rectangular correlation window (see 4.3.2) or a Gaussian. Mostly, the aggregation is only performed along the x- and y-coordinates, but some algorithms also aggregate over neighboring disparity values. The general case reads

$$
C(x, y, d) = w(x, y, d) * C_0(x, y, d).
\tag{4.3}
$$

More elaborate aggregation schemes use, for example, adaptive filter sizes, applying a larger filter in areas with little texture.
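A minimal sketch of the simple fixed-window aggregation, followed by a winner-take-all disparity selection, might look as follows. Note that for a difference-based cost such as (4.2) the best match minimizes the aggregated score, whereas for correlation-type scores it maximizes it; the function names and the zero-padded borders are my own choices:

```python
import numpy as np

def aggregate_box(C0, n, m):
    """Aggregate a matching-cost volume C0(x, y, d) over a fixed
    (2n+1) x (2m+1) rectangular window, i.e. eq. (4.3) with a box
    filter mask; borders are zero-padded.
    """
    rows, cols, num_d = C0.shape
    pad = np.zeros((rows + 2 * m, cols + 2 * n, num_d))
    pad[m : m + rows, n : n + cols] = C0
    C = np.zeros_like(C0)
    for j in range(2 * m + 1):          # sum shifted copies of the volume
        for i in range(2 * n + 1):      # instead of looping over pixels
            C += pad[j : j + rows, i : i + cols]
    return C

def pick_disparity_cost(C):
    """Winner-take-all disparity: for a difference-based cost the best
    match minimizes the aggregated score (for correlation scores it
    would be the argmax instead)."""
    return np.argmin(C, axis=2)
```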
Other algorithms, for example [109], use iterative aggregation schemes that apply, in effect, anisotropic smoothing (see, for example, [80]) to the disparity space image.

Computation and Optimization of Disparity

In the final step of the algorithm, the disparity for each pixel in the reference image is computed. A common approach, implemented in the present application, is to choose the value of d with the maximum correlation score:

$$
D(x, y) = \operatorname*{arg\,max}_{d} \, C(x, y, d).
\tag{4.4}
$$

In other approaches, the correlation score is used as part of a global cost function. Such a cost function can, for example, be used to enforce a smooth spatial variation of the disparity estimates, similar to the regularization technique described in section 5.3.3. The disparity map is then found as the solution that minimizes the overall cost function.

4.2 Prerequisites for Area-Based Matching

As mentioned in section 1.3, stereo matching algorithms can fail if the imaged surface has specular reflection characteristics. Area-based stereo matching is based on the assumption that the neighborhoods centered around homologous image points look similar. In addition to specular reflections, there are other situations in which this assumption is violated. Such situations arise when the image of a surface depends strongly on the view angle. This view angle dependence can cause patches centered around corresponding points to look dissimilar in the images of the two cameras. In this section, the assumptions about the scene properties underlying area-based stereo matching algorithms are reviewed in detail.

4.2.1 Fronto-Parallel

If a surface S is observed at a grazing angle, small changes of the viewing angle have a large effect on the size of the image of S. To quantify this effect, consider a small solid angle Ω (see section 2.1.1) as seen from a camera imaging the surface. This solid angle Ω corresponds to an image area aΩ.
The size of the surface patch in the 3D scene corresponding to aΩ will be denoted by A⊥ when S is perpendicular to the view direction (θ = 0). When the surface is tilted by an angle θ, the covered scene area A(θ) corresponding to the same fixed image area aΩ is enlarged by a factor of 1/cos θ:

$$
A(\theta) = \frac{A_\perp}{\cos\theta}.
\tag{4.5}
$$

The view-angle-dependent change in size of this area is given by the derivative

$$
\frac{dA(\theta)}{d\theta} = A(\theta)\tan\theta.
\tag{4.6}
$$

Therefore, at grazing angles (large θ), a small change in viewing angle will have a large foreshortening effect on the image, whereas for close to normal angles (small θ) such a change will not have a major effect. The effect of this view angle dependence on a stereo camera system is illustrated in figure 4.1, which depicts two cameras imaging a surface that is tilted with respect to the camera baseline. Due to foreshortening, the solid angle covered by the surface patch A is larger for the left camera than for the right camera. The dissimilarity in image size reduces the correlation between the image patches, resulting in a decreased performance of correlation-based disparity estimation. Therefore, the so-called fronto-parallel assumption underlies most correlation-based algorithms: it is assumed that the objects in the scene are imaged frontally and that their surfaces are oriented more or less parallel to the baseline connecting the two camera centers.

4.2.2 Lambertian Surface

In section 2.1.1, a Lambertian surface was defined as a surface for which the radiance is constant, independent of view angle. This is a property that is very desirable in stereo vision, because a non-Lambertian surface introduces a dependence of the grey-value on the view angle in addition to the foreshortening effect described above. In particular, specular surfaces cause problems, because they exhibit highlights when the angle of incidence of the light source is equal to the view angle.
These highlights are very distinct features, and a correlation-based algorithm will therefore often match them across the images. This leads to errors in the reconstruction, as the highlights observed from different view angles do not usually correspond to the same point on the surface. An example of the errors introduced by specular reflection is given in the introduction, see figure 1.9. For mirror-like surfaces, the disparity algorithm will match the reflection of the surroundings, which depends on the angle of incidence. This situation is depicted in figure 1.9 on page 20.

Figure 4.1: Foreshortening effect for a surface that is tilted with respect to the baseline. The solid angles Ω1 and Ω2 covered by the same surface patch A differ significantly in size.

4.2.3 Opacity

A fundamental assumption in correlation-based matching is that the surfaces of the objects being imaged are opaque. When transparent surfaces are present in the scene, the radiance received by the camera along a line of sight may be the cumulative radiance originating from several points along that line, as depicted in figure 4.2. For different view angles, the image of a point on a transparent surface will thus have different grey-values depending on what lies behind it on the particular line of sight. Therefore, correlation based on the grey-value will fail in such cases. In fact, independent of the matching approach, stereo vision cannot recover 3D structure in the general case involving transparent media. This problem can only be solved using a volume reconstruction technique such as tomography.

Figure 4.2: When transparent objects are present, the radiance received by the cameras C1 and C2 from the direction of point P is the sum of the contributions from several points along the line of sight.
4.2.4 Texture

Correlation-based matching is ineffective on textureless surfaces with a constant grey-value, because this leads to a constant matching score everywhere on the surface. In such a case, the assumption that a match is characterized by a maximum of the correlation score breaks down. Therefore, it is necessary that objects in the scene be textured; that is, that they show some spatial variation in grey-value. Whether the correlation score exhibits a single pronounced maximum depends on the statistical properties of the grey-value pattern. In particular, spatially periodic textures produce several local extrema in the matching score, which makes the determination of the true maximum difficult or impossible.

4.2.5 Summary and Conclusions Regarding Wave Imaging

Many of the assumptions about scene properties that underlie area-based stereo matching algorithms are clearly violated for water at visible wavelengths. Water in the visible region is transparent, has specular reflection properties, and does not have much surface texture. Examples of this were given in section 1.3. In contrast, water in the 3–5 µm infrared wavelength region exhibits the surface properties required for stereo matching; as shown in section 2.1.4, it is opaque and has a rich surface texture, demonstrated in figure 1.3 on page 12. Therefore, water in this wavelength region is ideally suited for stereo matching.

4.3 Multi-Scale, Area-Based Matching Algorithm using Shiftable Windows

Many disparity estimation algorithms are tailored to situations often found in computer vision applications, such as robot navigation or the reconstruction of architectural features. In these applications, the scene is usually comprised of multiple objects, giving rise to occlusions and surface discontinuities. Moreover, these objects often also have different surface properties.
This causes problems, as aggregation of the matching score across object boundaries tends to smooth out the discontinuities. Some algorithms address this issue and try to preserve sharp object boundaries by using adaptive filter windows or iterative aggregation of support, as mentioned in section 4.1.2. The situation encountered when reconstructing water waves using infrared imagery is more favorable, insofar as the water surface is a single "smooth" surface without sharp discontinuities. Given this situation, it is possible to use a comparatively simple algorithm with fixed window sizes for spatial aggregation of support and without a computationally expensive optimization stage. This section discusses the algorithm employed for the present application, which is an adaptation of an algorithm described by Faugeras et al. [26]. This algorithm was chosen because it is computationally efficient, easy to implement, and it produced good results. The computational efficiency stems from the fact that the computation of the matching score and the spatial aggregation are combined into a single step that is implemented using shiftable correlation windows, as described in section 4.3.2. The algorithm computes one disparity map using the first and one disparity map using the second image of the stereo pair as the reference image. Matches that are not consistent across the two maps are discarded to produce a final disparity map. This validation procedure is explained in section 4.3.5. Sub-pixel accuracy is achieved by interpolating the correlation score between discrete values of disparity with a parabola, as described in section 4.3.4. A simple multi-scale extension using image pyramids is presented in section 4.3.6. For this, the basic algorithm described in sections 4.3.1 to 4.3.5 is applied to several down-sampled versions of the stereo pair.
This has a similar effect to running the algorithm with differently-sized correlation windows, but it is computationally more efficient. The matches found on coarser scales can be used to fill in regions where a match cannot be found at the original resolution, for example due to lack of texture.

4.3.1 Matching Score

Faugeras et al. [26] describe several different choices for the matching score, and note that three out of the four correlation criteria they tested perform about equally well in cases where the grey-value distributions of the two images are similar. The difference between the correlation criteria they tested lies mainly in their sensitivity to differences in the grey-value histograms of the two images. In the present application, the radiometric calibration process and the non-uniformity correction, described in section 5.1, ensure that the grey-value distributions of the two images are matched, so that the particular choice of matching score is not critical. For this application, cross-correlation (matching score C2 in Faugeras et al. [26]) was used as the matching score:

$$
C(x, y, d) = \frac{\displaystyle\sum_{i,j} I_1(x + i, y + j) \cdot I_2(x + d + i, y + j)}
{\sqrt{\displaystyle\sum_{i,j} I_1(x + i, y + j)^2} \cdot \sqrt{\displaystyle\sum_{i,j} I_2(x + d + i, y + j)^2}}.
\tag{4.7}
$$

The summations run over a (2n + 1) · (2m + 1) rectangular window. This function takes higher values for patches that are better correlated. Note that the computation of equation (4.7) combines two of the steps outlined in section 4.1.2, namely the computation of the correlation score and the spatial aggregation. Here, spatial aggregation is controlled empirically by the selection of the window size, given by the two parameters m and n. When choosing this size, a balance has to be found between blurring details by choosing too big a window and not having a good enough signal-to-noise ratio in regions with low texture if the window is too small.
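A direct, deliberately inefficient evaluation of the cross-correlation score (4.7) for a single candidate match can serve as a reference implementation; this is a sketch that assumes the window lies fully inside both images:

```python
import numpy as np

def ncc_score(I1, I2, x, y, d, n, m):
    """Cross-correlation matching score of eq. (4.7) for one candidate
    match, evaluated over a (2n+1) x (2m+1) window around (x, y) in I1
    and (x + d, y) in I2. Assumes both windows lie inside the images.
    """
    w1 = I1[y - m : y + m + 1, x - n : x + n + 1]
    w2 = I2[y - m : y + m + 1, x + d - n : x + d + n + 1]
    num = np.sum(w1 * w2)
    den = np.sqrt(np.sum(w1 ** 2)) * np.sqrt(np.sum(w2 ** 2))
    return num / den
```

By the Cauchy-Schwarz inequality the score is bounded by 1, which is attained for identical window contents.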
4.3.2 Efficient Implementation using Shiftable Windows

Computation of the Correlation Score

To evaluate the correlation score (4.7), one has to iterate over the variables x, y and d. This involves many redundant multiplications if the sums in equation (4.7) are evaluated by explicitly iterating over all (2n + 1) · (2m + 1) elements of the rectangular window for each combination of x, y and d. This is very inefficient and leads to long processing times if correspondences are sought in large images and over a wide range of disparities. To obtain a computationally efficient implementation, redundant multiplications must be avoided. This section presents a simple strategy for doing so, using shiftable correlation windows.

In equation (4.7), the first sum under a square root in the denominator is independent of the disparity d. Therefore, this sum is constant for fixed x and y. As a constant factor is of no interest when searching for the maximum over d, one can consider the simplified matching score

$$
C(x, y, d) = \frac{\displaystyle\sum_{i,j} I_1(x + i, y + j) \cdot I_2(x + d + i, y + j)}
{\sqrt{\displaystyle\sum_{i,j} I_2(x + d + i, y + j)^2}}.
\tag{4.8}
$$

Next, the numerator and the denominator of expression (4.8) are considered separately:

$$
C(x, y, d) = \frac{N(x, y, d)}{\sqrt{M(x + d, y)}}
$$

with

$$
N(x, y, d) = \sum_{i,j} I_1(x + i, y + j) \cdot I_2(x + d + i, y + j) \quad\text{and}
\tag{4.9}
$$

$$
M(x, y) = \sum_{i,j} I_2(x + i, y + j)^2.
\tag{4.10}
$$

The strategy to avoid redundant computations is best explained using figure 4.3. For both the numerator and the denominator, first the elements of the product term along columns of length 2m + 1 starting in the first row are summed up. Next, starting in the first column, 2n + 1 of these column sums are summed to obtain the sum over a complete (2n + 1) × (2m + 1) window. Then, proceeding along the row, this window sum can be updated for each step by subtracting the column sum on the left and adding the next column on the right.
Figure 4.3: Efficient computation of the matching score using shiftable windows. Product terms from the next column/row are added and those from the first column/row are subtracted from the window sum when proceeding to the next column/row.

For the numerator N(x, y, d) this procedure leads to the following equations: Let P1(x, y, d) = I1(x, y) · I2(x + d, y) be a shorthand for the product term in the sum of the numerator, and let

$$
Q_1(x, 0, d) = \sum_{j=-m}^{m} P_1(x, j, d)
\tag{4.11}
$$

be the first column sum at position x. Once the column sums of the first row have been calculated for all x, the correlation window sum for the first pixel can be calculated as

$$
N(0, y, d) = \sum_{i=-n}^{n} Q_1(i, y, d).
\tag{4.12}
$$

All other pixels can then be processed by iterating along the rows and columns using the following update rules:

$$
\begin{aligned}
N(x + 1, y, d) &= N(x, y, d) - Q_1(x - n, y, d) + Q_1(x + n + 1, y, d) \\
Q_1(x, y + 1, d) &= Q_1(x, y, d) - P_1(x, y - m, d) + P_1(x, y + m + 1, d).
\end{aligned}
\tag{4.13}
$$

The same scheme can be applied to the denominator, yielding the following algorithm: With P2(x, y) = I2(x, y)², the column sum in the denominator is initialized as

$$
Q_2(x, 0) = \sum_{j=-m}^{m} P_2(x, j)
\tag{4.14}
$$

for each column. The row sum is similarly initialized as

$$
M(0, y) = \sum_{i=-n}^{n} Q_2(i, y).
\tag{4.15}
$$

The update rules for the denominator are

$$
\begin{aligned}
M(x + 1, y) &= M(x, y) - Q_2(x - n, y) + Q_2(x + n + 1, y) \\
Q_2(x, y + 1) &= Q_2(x, y) - P_2(x, y - m) + P_2(x, y + m + 1).
\end{aligned}
\tag{4.16}
$$

At the image borders, some of the sums above are not well defined, as they include pixel coordinates that lie outside the actual images. A common approach to deal with this is to pad the image borders with zeros, which is the method used in the present application. Other approaches, such as taking the value of the pixel closest to the border for padding, can also be used.
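The column- and window-sum updates of equations (4.11) to (4.16) can be sketched as follows for a generic 2D array of product terms (either P1 for a fixed disparity d, or P2), with zero padding at the borders as in the text; the function name is illustrative:

```python
import numpy as np

def window_sums(P, n, m):
    """Sum a 2D array of product terms P over a (2n+1) x (2m+1) window
    around every pixel using the incremental shiftable-window scheme:
    running column sums Q are updated down the rows, and the window sum
    is updated along the columns, so the cost per pixel is O(1) rather
    than O(n*m). Borders are zero-padded.
    """
    rows, cols = P.shape
    Ppad = np.zeros((rows + 2 * m + 1, cols + 2 * n + 1))
    Ppad[m : m + rows, n : n + cols] = P

    # Column sums: initialize in the first image row (cf. eq. 4.11),
    # then update downwards with one addition and one subtraction.
    Q = np.zeros_like(Ppad)
    Q[m] = Ppad[: 2 * m + 1].sum(axis=0)
    for y in range(m, m + rows - 1):
        Q[y + 1] = Q[y] - Ppad[y - m] + Ppad[y + m + 1]

    # Window sums: initialize in the first image column (cf. eq. 4.12),
    # then shift the window one column to the right per step.
    S = np.zeros((rows, cols))
    S[:, 0] = Q[m : m + rows, : 2 * n + 1].sum(axis=1)
    for x in range(cols - 1):
        S[:, x + 1] = (S[:, x]
                       - Q[m : m + rows, x]
                       + Q[m : m + rows, x + 2 * n + 1])
    return S
```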
4.3.3 Computation of Disparity

Once the correlation score has been calculated for a given image point (x0, y0)^T over the whole disparity search range dlow ... dhigh, the value of d that maximizes function (4.8) is chosen as the disparity value for this image point:

$$
D(x_0, y_0) = \operatorname*{arg\,max}_{d \in [d_{\text{low}} \dots d_{\text{high}}]} \frac{N(x_0, y_0, d)}{\sqrt{M(x_0 + d, y_0)}}.
\tag{4.17}
$$

4.3.4 Sub-Pixel Refinement

Using the above method, the correlation score is only computed for discrete, whole-numbered values of disparity. To estimate the disparity with sub-pixel precision, a parabola is used to interpolate between the discrete disparity values in the neighborhood of the detected maximum. Assuming that dmax is the discrete disparity value that maximizes (4.8) for a given pixel (x, y)^T, the interpolating parabola P(d) is determined by dmax and its neighboring points:

$$
\begin{aligned}
P(d_{\max} - 1) &= C(x, y, d_{\max} - 1) \\
P(d_{\max}) &= C(x, y, d_{\max}) \\
P(d_{\max} + 1) &= C(x, y, d_{\max} + 1).
\end{aligned}
\tag{4.18}
$$

The final sub-pixel disparity estimate is then given by the value of d at which P(d) attains its maximum; this value is easily computed using elementary calculus.

4.3.5 Validation of Matches

Left/Right Consistency Check

An effective test for checking the validity of the found matches can be applied by computing the correlation score (4.8) twice, using each of the images as the reference image in turn. Correct matches are expected to be consistent for both runs of the algorithm. In contrast, if a match is found to be inconsistent when the role of the reference image is reversed, it is labeled as erroneous and discarded. The positions of matches that fail the consistency check are marked in a binary image mask. The rationale for this consistency check is as follows: Assume that, for a given pixel p1 in image I1, the corresponding point in the other image cannot be found, for example due to occlusion.
The correlation algorithm will then, more or less at random, assign a pixel p2 in the second image I2 as a match. However, p2 might have a correct corresponding pixel in I1, and therefore it is not likely to be matched to p1 when the roles of the images are reversed. Note, however, that this consistency check cannot detect wrong matches caused by similar-looking pixels that do not correspond to the same surface point, a situation that can arise from specular highlights. Mathematically, this consistency criterion can be formulated as follows. Let D1 and D2 be the disparity maps obtained when using images I1 and I2 as the reference image, respectively. The consistency criterion then becomes

$$
D_1(x, y) \overset{!}{=} -D_2(x + D_1(x, y), y).
\tag{4.19}
$$

In this work, it was implemented such that all pixels for which

$$
\left| D_1(x, y) + D_2(x + D_1(x, y), y) \right| > T,
\tag{4.20}
$$

where T is a manually selected threshold, were discarded. The value of T was set to 0.5. As the consistency check is performed after the sub-pixel refinement, D1(x, y) is generally not an integer value; therefore, D2(x + D1(x, y), y) is linearly interpolated when evaluating equation (4.20).

Confidence Measure

In addition to the above-mentioned left/right consistency check, Faugeras et al. [26] propose a confidence measure associated with each disparity estimate. This confidence measure is based on how pronounced the maximum dmax of the correlation score is. For this, they suggest using the difference between the two largest local maxima as a measure. The rationale for choosing this measure is that, in a situation with several possible candidate matches, for example due to a periodic texture, the candidates are likely to have similar correlation scores. The confidence in such a match will therefore be low. However, it is unclear how this confidence measure could be properly integrated into a multi-scale approach, as described in the following section. Therefore, it was not used in the present work.
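The two refinement and validation steps of sections 4.3.4 and 4.3.5, the parabolic sub-pixel interpolation of (4.18) and the left/right consistency test of (4.20), can be sketched as follows; the handling of a degenerate flat score triple and of probes falling outside the image are my own conventions:

```python
import numpy as np

def subpixel_disparity(c_minus, c_max, c_plus, d_max):
    """Refine an integer disparity d_max via the parabola of eq. (4.18).

    c_minus, c_max, c_plus are the correlation scores at d_max - 1,
    d_max and d_max + 1; the vertex of the parabola through these
    three points gives the sub-pixel estimate.
    """
    denom = c_minus - 2.0 * c_max + c_plus
    if denom == 0.0:               # degenerate flat triple: keep d_max
        return float(d_max)
    return d_max + 0.5 * (c_minus - c_plus) / denom

def lr_consistency_mask(D1, D2, T=0.5):
    """Boolean mask of matches passing the left/right check of eq. (4.20).

    D1 and D2 are the disparity maps computed with I1 and I2 as the
    reference image; D2 is linearly interpolated at the non-integer
    position x + D1(x, y). Probes outside D2 are marked invalid.
    """
    rows, cols = D1.shape
    valid = np.zeros((rows, cols), dtype=bool)
    for y in range(rows):
        for x in range(cols):
            xm = x + D1[y, x]
            x0 = int(np.floor(xm))
            if x0 < 0 or x0 + 1 >= cols:
                continue
            w = xm - x0
            d2 = (1.0 - w) * D2[y, x0] + w * D2[y, x0 + 1]
            valid[y, x] = abs(D1[y, x] + d2) <= T
    return valid
```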
4.3.6 Multi-Scale Approach

Sometimes no correspondences can be found for some areas of an image because of noise or lack of texture. In these cases, it is often possible to find a match if the correlation is performed on a down-sampled version of the original image.

Gaussian Pyramids

Therefore, the algorithm described in the previous sections was extended to process the original image pair at multiple scales. For this, so-called Gaussian image pyramids (see, for example, Jähne [56]) were used. Starting at the original image resolution, a Gaussian pyramid is created by low-pass filtering the image and then sub-sampling it by taking the pixels of every other column in every other row. This is repeated several times to obtain images of different resolutions, which can be visualized as different levels of a pyramid, see figure 4.4. The low-pass filtering is required to avoid aliasing effects such as Moiré patterns when sub-sampling. In the present work, the low-pass filtering was implemented as a convolution of the image with a 5-tap binomial filter, 1/16 (1, 4, 6, 4, 1), before sub-sampling and another convolution with a 3-tap binomial filter, 1/4 (1, 2, 1), after sub-sampling. Both convolutions were performed in the x- and y-directions.

Figure 4.4: Gaussian pyramid. With a Gaussian pyramid, an image is represented at several scales. The original image is low-pass filtered and then down-sampled to obtain the next pyramid level. A schematic representation of this, in which pixels are represented by checkers, is given on the left. With each step up the pyramid, the number of pixels is reduced by a factor of four. On the right, the Gaussian pyramid obtained by down-sampling an infrared image of a water surface is shown. Image courtesy of Uwe Schimpf, University of Heidelberg.

Disparity at Different Scales

After Gaussian pyramids for both images of a stereo pair have been calculated, the correlation-based disparity estimation as
described in sections 4.3.1 to 4.3.5 is applied to each pyramid level. This yields several disparity maps D0, ..., Dlmax, where the pyramid level is denoted by the subscript and level 0 represents the original resolution. Note that this meaning of the subscript differs from the previous section, where it was used to denote which image served as the reference. The combined disparity map is obtained using algorithm 4.1.

    For each pixel (x, y):
        l = 0                                       // pyramid level
        While l ≤ lmax AND D_l(2^-l x, 2^-l y) is NOT valid:
            l = l + 1                               // next coarser level
        If l ≤ lmax:
            D_final(x, y) = 2^l D_l(2^-l x, 2^-l y)
        Else:                                       // no valid match at any scale
            D_final(x, y) = not valid
    Next pixel

Algorithm 4.1: Algorithm for combining disparity estimates at different pyramid levels.

The algorithm first tries to find a valid match at the highest resolution. If no valid match is found, it proceeds to the higher pyramid levels until a valid match is found or the highest pyramid level is reached. The pixel positions for which no valid match was found at any scale are stored in a binary image. Strictly, the disparity estimates from higher pyramid levels would have to be interpolated when they are used to fill in gaps at the lowest level. Without interpolation, one obtains some block artifacts, as a disparity estimate at a coarse scale fills in up to four pixels at the next finer scale with the same value. The results obtained with the stereo matching algorithm described in sections 4.3.1 to 4.3.6 on infrared image sequences of water waves are presented in sections 7.4 and 7.7.

5 Image Pre- and Postprocessing

This chapter presents image processing techniques for the non-uniformity correction of infrared images, for removing outliers from disparity maps and for filling in missing data.
The technique for filling in missing data, regularization, is used both for the pre-processing of the infrared images and for the post-processing of the disparity image sequences. For this reason, the treatment of both pre- and post-processing techniques is combined in this chapter. Radiometric calibration, briefly introduced in section 2.2.3, is used to correct for the non-uniform response of infrared detector chips. This pre-processing step, described in section 5.1, ensures that corresponding structures in two images of a stereo pair have similar grey-value distributions, a prerequisite for correlation-based disparity estimation. A simple technique for identifying outliers in measured data is presented in section 5.2. This method is used in a post-processing step to remove erroneous matches from estimated disparity maps. Missing data is interpolated using a so-called membrane model, discussed in section 5.3. This regularization method finds a data estimate that is controlled by a smoothness constraint and the available measurements. The theoretical background of the membrane model is presented in section 5.3.1. Section 5.3.2 explains how the membrane model can be applied to infrared images, where it is used to fill in missing pixels resulting from defective sensor elements. Section 5.3.3 discusses how the disparity estimates obtained with the stereo matching algorithm described in chapter 4 can be regularized. For this particular application, an extension of the membrane model to image sequences rather than individual frames is presented. With this extension, temporal smoothness can be enforced during the regularization process and the convergence of the iterative interpolation method is improved.

5.1 Non-Uniformity Correction and Radiometric Calibration

As mentioned in section 2.2.3, the detector response of an infrared focal plane array generally varies from sensor element to sensor element.
As a result, the grey-value images obtained with an infrared camera may have a non-uniform grey-value distribution even when a blackbody source with a spatially homogeneous temperature distribution is imaged. Sample images illustrating such a non-uniform sensor response are given in figure 7.1, page 114, and figure 7.2, page 115. The latter image shows that the spatial grey-value variations due to non-uniformity can be much larger than those attributable to the variation of the actual signal. Therefore, the non-uniformity has to be corrected for in order to obtain images that are suited for visual interpretation or correlation-based stereo matching. Radiometric calibration, introduced in section 2.2.3, establishes a relationship between the detector output and the incident irradiance. This relationship can be used for a non-uniformity correction by changing the grey-values in such a way that the new grey-value of each pixel is proportional to the equivalent blackbody temperature.

Polynomial Fit. For the radiometric calibration, it is assumed that n images of a blackbody at different temperatures T_1, ..., T_n are available, as described in section 2.2.3. To obtain a functional relationship between temperature and detector output, the dependence of the grey-value output on the temperature is modeled as a polynomial for each pixel. The following discussion is for a single pixel. The relationship between the detector output, expressed as grey-value G, and blackbody temperature T is modeled as

    T(G) = \sum_{i=0}^{m} a_i G^i,    (5.1)

where the polynomial is chosen as third order (m = 3), as suggested in [35]. The coefficients a_i are determined by a linear regression that minimizes the sum of the squared distances between equation (5.1), evaluated at the grey-values G_1, ..., G_n of the calibration points, and the blackbody temperatures T_1, ..., T_n at which these calibration points were obtained.
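The per-pixel fit of equation (5.1), together with the bad-pixel criterion of equation (5.2) discussed next, could be sketched as follows. This is an illustrative sketch assuming NumPy; the function names are my own, and the defect detection shown here covers only the mean-squared-distance criterion, not the minimum-slope test.

```python
import numpy as np

def fit_radiometric_calibration(grey_stack, temperatures, order=3):
    """Fit T(G) = a_0 + a_1 G + ... + a_m G^m (equation 5.1, m = 3 as in
    the text) independently for every pixel. grey_stack has shape
    (n, H, W): n blackbody images taken at the given temperatures.
    Returns per-pixel coefficients, highest power first (polyfit order)."""
    n, h, w = grey_stack.shape
    T = np.asarray(temperatures, dtype=float)
    coeffs = np.empty((h, w, order + 1))
    for yy in range(h):
        for xx in range(w):
            coeffs[yy, xx] = np.polyfit(grey_stack[:, yy, xx], T, order)
    return coeffs

def good_pixel_mask(grey_stack, temperatures, coeffs, mse_threshold):
    """Binary mask per equation (5.2): a pixel is good (1) if the mean
    squared distance between the fitted temperatures T(G_i) and the
    blackbody temperatures T_i stays below the threshold, bad (0)."""
    n, h, w = grey_stack.shape
    T = np.asarray(temperatures, dtype=float)
    mse = np.empty((h, w))
    for yy in range(h):
        for xx in range(w):
            fitted = np.polyval(coeffs[yy, xx], grey_stack[:, yy, xx])
            mse[yy, xx] = np.mean((fitted - T) ** 2)
    return (mse <= mse_threshold).astype(np.uint8)
```

Applying `np.polyval` with the fitted coefficients to a new image then yields the apparent temperatures that are used directly as corrected grey-values.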
Once the coefficients have been determined for every sensor pixel, the non-uniformity of the camera images can be corrected for by replacing the original grey-value with a grey-value that is proportional to the apparent temperature obtained using equation (5.1). In this work, the image intensities were represented as floating point values, such that the value of the apparent temperature is used directly as the corrected grey-value.

Identifying the Defective Pixels. On a typical infrared detector chip, there are usually some "dead" or "bad" pixels that either do not function at all, or that have a very noisy output signal. In the present work, the parameters and the quality of the radiometric fit are used as a criterion to identify and label these defective pixels. For sensor elements that produce a noisy or random output, it is likely that the deviation of the fit curve from the calibration points is larger than for a regular pixel. Pixels for which the mean squared distance between the fit curve and the calibration points exceeds a given threshold,

    \frac{1}{n} \sum_{i=1}^{n} (T(G_i) - T_i)^2 > D_{threshold},    (5.2)

were labeled as bad pixels, see section 7.1. Some defective sensor elements produce a constant grey-value output. These pixels can be identified by requiring a minimum slope of the fit curve. The positions of identified defective pixels are stored using a binary mask of the same size as the original image; good pixels are represented by a one and defective pixels are represented by a zero.

Adjustment of Image Intensities. The correlation score employed by the stereo matching algorithm used in this work, equation (4.8), is not normalized with respect to the grey-value of the two stereo images. Therefore, it is important that corresponding patches in the two images have a similar grey-value distribution. Ideally, if the water surface were a perfect blackbody, corresponding points in the two images would have the same grey-value after radiometric calibration.
In practice, this is not always the case. As the water surface is not a perfect blackbody and the spectral responses of the two cameras used for this work differ slightly, the apparent absolute temperatures of the water surface obtained with the two cameras may differ slightly as well. However, the variation of the apparent temperature at the surface (typically several 0.1 K) should be similar for both cameras. To obtain similar image intensities for corresponding points in both images, the mean of the apparent temperature over all pixels was calculated for both images and subtracted from the grey-values after radiometric calibration.

5.2 Outlier Removal

Despite the validation of disparity estimates, described in section 4.3.5, wrong matches occasionally occur in the disparity maps. Often, these wrong matches have a disparity that differs significantly from the mean of the correct matches. Using the simplistic assumption that the correct matches are normally distributed, outliers can be identified as follows: the mean disparity \langle D \rangle_\Omega over the whole image or a subregion \Omega is calculated as

    \langle D \rangle_\Omega = \frac{1}{n} \sum_\Omega D(x, y),    (5.3)

where the sum runs over all n pixels in \Omega. The standard deviation \sigma_\Omega is defined by

    \sigma_\Omega^2 = \frac{1}{n} \sum_\Omega (D(x, y) - \langle D \rangle_\Omega)^2.    (5.4)

Outliers can be removed by setting a threshold on the maximum distance, in multiples of \sigma_\Omega, that a disparity value may have from the mean. In this work, the threshold distance was empirically set to 2.3 \sigma_\Omega, and the subregions \Omega were chosen as the lines of the disparity images.

5.3 Regularization: Filling in the Gaps

When working with image data, a common problem is that the data contains gaps. Examples of such gaps are defective pixels in infrared images or invalid matches in disparity estimates. If the measured quantity is not completely random but has some spatial or temporal coherence, it is possible to fill in the gaps by interpolating between neighboring measurements.
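The line-by-line outlier test of section 5.2 can be sketched in a few lines of Python. This is an illustrative sketch assuming NumPy, with invalid matches marked as NaN; the function name is my own.

```python
import numpy as np

def remove_outliers(disparity, k=2.3):
    """Remove outliers line by line: within each image row (the
    subregion Omega of equations 5.3 and 5.4), disparities further than
    k standard deviations from the row mean are set to NaN. NaNs already
    present (invalid matches) are ignored when computing the statistics."""
    out = np.asarray(disparity, dtype=float).copy()
    for row in out:  # each row is a view into `out`
        valid = ~np.isnan(row)
        if valid.sum() < 2:
            continue
        mean = row[valid].mean()
        sigma = row[valid].std()
        row[valid & (np.abs(row - mean) > k * sigma)] = np.nan
    return out
```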
This section presents a regularization technique that can perform such an interpolation. The technique is physically motivated by the elastic properties of a membrane and leads to a partial differential equation. The theory of regularization is described in section 5.3.1, while sections 5.3.2 and 5.3.3 deal with the application of this technique to the above-mentioned problems of filling in gaps in the infrared images and in the disparity estimates, respectively.

5.3.1 Theory

This section briefly introduces regularization using the membrane model, in a manner similar to Spies [93]. A more extensive treatment of regularization is given by Tschumperle [99].

Problem Statement. Assume that the image D(x, y) contains measured data, and that the confidence in the measurements is given as \kappa(x, y), where higher values of \kappa represent a higher level of confidence. Image regions where measured data is missing completely have a value of \kappa = 0. The goal is to find a differentiable regularized estimate E(x, y) that varies smoothly everywhere and stays close to the available measurements. The required smoothness of E ensures that regions with missing data are interpolated and that noise is dampened.

Membrane Model. If the image D is thought of as a height field, with grey-values representing the height of each pixel, the regularized estimate E(x, y) can be pictured as the height field formed by a membrane which is "pulled" towards the data points. A higher confidence value results in a stronger pull towards the corresponding data point. Because of the elastic forces within the membrane, it cannot be stretched arbitrarily; that is, it cannot follow each pull. Therefore, its shape represents a balance between the competing demands of staying close to the data and varying smoothly. Mathematically, this membrane model is expressed by an energy functional that has contributions from the elastic energy and from the "pulling" forces of the measured data.
The regularized estimate E is given by the membrane shape that minimizes the energy functional

    \int_{Image} \underbrace{\kappa(x, y) \, (E(x, y) - D(x, y))^2}_{\text{data term}} + \underbrace{\alpha \left( (\partial_x E)^2 + (\partial_y E)^2 \right)}_{\text{smoothness term}} \, dx \, dy,    (5.5)

where the integrand is denoted L(E, \partial_x E, \partial_y E). In this functional, the first term accounts for the "pull" by the available data and the second term enforces smoothness by penalizing large derivatives. The balance between these terms is regulated by the ratio of the confidence measure \kappa and the elasticity \alpha, which controls the smoothness. From the calculus of variations, it is well known that the solution that minimizes the integral (5.5) satisfies the Euler-Lagrange equation

    \frac{\partial L}{\partial E} - \frac{d}{dx} \frac{\partial L}{\partial (\partial_x E)} - \frac{d}{dy} \frac{\partial L}{\partial (\partial_y E)} = 0,    (5.6)

which leads to the following partial differential equation:

    \kappa E - \kappa D - \alpha \Delta E = 0.    (5.7)

Here, \Delta = \partial_x^2 + \partial_y^2 is the well-known Laplace operator.

Discretization. For a discretization of equation (5.7), it is useful to note that a discrete Laplace operator can be approximated by a local mean, for example by a binomial filter mask, minus the central value: \Delta E \approx \langle E \rangle - E (for an explanation, see for example Jähne [56]). Substituting this approximation into (5.7) yields

    (\kappa + \alpha) E = \alpha \langle E \rangle + \kappa D,    (5.8)

which leads to an iterative solution of equation (5.7) using the following update rule:

    E_{n+1} = \frac{\alpha}{\kappa + \alpha} \langle E \rangle_n + \frac{\kappa D}{\kappa + \alpha}.    (5.9)

5.3.2 Filling in Defective Pixels

As a result of defective sensor elements, image data collected by infrared cameras contains gaps, which must be filled in. The membrane model can be used to fill in these holes in a pre-processing step. In the present application, the grey-values of the dead pixels are completely unknown and therefore have a confidence value of zero. In contrast, the grey-values of the good pixels are known with very high confidence, as ideally they should only be affected by noise. Therefore, the good pixels should have a value of \kappa that is much larger than \alpha.
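The iterative update rule (5.9) can be sketched compactly. This is an illustrative sketch assuming NumPy and SciPy; a 3x3 box average stands in for the binomial local mean used in the text, and the function name is my own. With \kappa much larger than \alpha at the good pixels, the update reduces to the smooth-and-reset procedure described for defective pixels.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def membrane_regularize(D, kappa, alpha=1.0, iterations=100):
    """Iterate E_{n+1} = alpha/(kappa+alpha) * <E>_n + kappa*D/(kappa+alpha)
    (equation 5.9). kappa = 0 marks missing data; gaps in D may be NaN
    and are seeded with a neutral value, since kappa is zero there.
    The local mean <E> is a 3x3 box average (binomial would do as well)."""
    kappa = np.asarray(kappa, dtype=float)
    Dc = np.where(kappa > 0, D, 0.0)  # gaps get a neutral start value
    E = Dc.copy()
    for _ in range(iterations):
        E = (alpha * uniform_filter(E, size=3) + kappa * Dc) / (kappa + alpha)
    return E
```

Because the update only "diffuses" information from high-confidence into low-confidence regions through the local mean, large gaps need many iterations, which is the convergence problem discussed in section 5.3.3.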
In other words, the grey-values of the good pixels remain fixed. This leads to a simplification of the iterative update rule (5.9):

    E_{n+1} = \langle E \rangle_n   for "dead" pixels (\kappa = 0)
    E_{n+1} = E_n                   for "good" pixels (large \kappa),    (5.10)

suggesting a two-step iterative procedure:

1. Smooth the whole image, using a filter that calculates a local mean, such as a Gaussian or binomial filter.
2. Replace the grey-values of the "good" pixels with their original grey-values.

Because the holes due to defective pixels are not very large, convergence is typically reached after a few iterations.

5.3.3 Regularization of Disparity Estimates

The membrane model, equation (5.5), also lends itself to the regularization of disparity estimates. Holes in disparity estimates can occur for a variety of reasons, such as occlusions, lack of texture or noisy images. There are two options for the confidence measure \kappa that regulates the influence of the data term: either a confidence measure obtained from the disparity matching algorithm, as mentioned in section 4.3.5, or a binary confidence measure, setting \kappa to one for disparity estimates satisfying the consistency check between left and right image (see section 4.3.5) and setting \kappa to zero for estimates that fail the consistency check. The regularization of the disparity data can be performed on a frame-by-frame basis using the membrane model. However, this method has several drawbacks. First, if the disparity data contains large holes, as is the case for some frames in the current work, the iterative solution for the membrane model as in equation (5.9) does not have good convergence properties. The iterative approach is essentially a "diffusion" of disparity data with high confidence into regions with low confidence. The problem is that diffusion is a relatively slow process, and therefore large holes take many iterations to be closed.
Second, the temporal change of the water surface is not abrupt but rather gradual, and this should be taken into account by the regularization procedure. Therefore, a temporal smoothness constraint should be included in addition to the spatial smoothness term in equation (5.5).

Temporal Smoothness Constraint. To see how the wave motion constrains the temporal change of disparity, the height of a simple sinusoidal wave

    h(t, x) = h_0 e^{i(\omega t - k x)}    (5.11)

can be examined, where h_0 is the amplitude of the wave, \omega is the frequency and k is the wave vector. Taking the temporal derivative leads to

    \frac{\partial h}{\partial t} = i h_0 \omega e^{i(\omega t - k x)} = i h_0 k c_k e^{i(\omega t - k x)},    (5.12)

with the wavenumber k and the propagation velocity c_k. The dependence of c_k on k is given by the so-called dispersion relation. For water waves, the dispersion relation can be found, for example, in Kinsman [63]. If the time between consecutive image frames is T_frame, equation (5.12) gives an upper bound for the difference between frames: the maximum change in amplitude is h_0 k c_k T_frame. The exact value of this upper bound depends on the wavenumbers present in the wave field.

Disparity Regularization of Wave Image Sequences. The upper bound on the temporal derivative of the wave height is, in effect, a physically based temporal smoothness constraint, which also limits the temporal change of stereo disparity. This temporal smoothness constraint can be taken into account by extending the membrane model of equation (5.5) in the time domain and applying it to a whole disparity image sequence rather than to individual frames. The temporal smoothness constraint is integrated into the model in the same way as the spatial smoothness constraint, leading to the following functional to be minimized:

    \int_{Image sequence} \kappa \, (E(x, y, t) - D(x, y, t))^2 + \underbrace{\alpha \left( (\partial_x E)^2 + (\partial_y E)^2 \right)}_{\text{spatial smoothness}} + \underbrace{\alpha_t (\partial_t E)^2}_{\text{temporal smoothness}} \, dx \, dy \, dt,    (5.13)

where the integrand is denoted L(E, \partial_x E, \partial_y E, \partial_t E). Analogous to equations (5.6) and (5.7), this minimization problem leads to the following partial differential equation:

    \kappa E - \kappa D - \alpha \Delta_{xy} E - \alpha_t \partial_t^2 E = 0.    (5.14)

This equation can be further simplified if one assumes that the factors controlling the spatial and temporal smoothness have the same value, that is \alpha = \alpha_t. By the same reasoning as in section 5.3.1, the regularized solution E(x, y, t) can be found using the same iterative update rule, equation (5.9). The only difference from the 2D case is that the local average is taken over a 3D neighborhood, with the third dimension being time. This iterative approach can also be used for \alpha \neq \alpha_t if the filter tap that is used to calculate the local average is adjusted appropriately. In the present application, using a 3D membrane model also helps to overcome the above-mentioned problem of slow convergence when large holes are present in the disparity data. The reason for the improved convergence is that holes which are large in the x- and y-direction are not necessarily large in the temporal direction as well. Therefore, the diffusion process can quickly fill in such holes with disparity information from previous and subsequent frames.

6 Experimental Setup and Procedures

This chapter gives a detailed description of the instruments used and the procedures followed for the experiments with the infrared stereo camera system at the Heidelberg Aeolotron facility. The infrared cameras that were used in this work are presented in section 6.1, and their integration into a stereo system is the topic of section 6.1.2. Before each deployment of the system, image sequences to be used for radiometric calibration were obtained with a laboratory blackbody, which is described in section 6.2. Section 6.3 introduces the different components of the acquisition and data storage system.
This includes subsection 6.3.2, which deals with the synchronization of the two cameras. Geometric calibration of the cameras was performed using a calibration target, presented in section 6.4, that was designed to have high-contrast feature points in the infrared region. An overview of the Heidelberg wind-wave facility Aeolotron, at which the experiments were performed, can be found in section 6.5. Finally, section 6.6 gives an account of the procedures that were followed to perform the experiments.

6.1 Infrared Cameras

For a stereo system, the use of two identical cameras is desirable. However, due to the high cost of infrared imagers, a stereo system with two different cameras (which were already on hand) was built. The cameras used were one Thermosensorik CMT 384 and one Raytheon Amber Radiance.

6.1.1 Specifications

Table 6.1 summarizes the manufacturers' specifications for these two cameras. Although the two cameras are based on different detector technologies, Indium Antimonide for the Amber Radiance camera and Cadmium Mercury Telluride for the Thermosensorik camera, they are both sensitive in approximately the same wavelength range (3-5 µm). They differ slightly in terms of their NE∆T, number of pixels and maximum frame rate. Both cameras were equipped with lenses of 50 mm focal length.

Factory specification      | Amber Radiance        | TS CMT 384
---------------------------|-----------------------|---------------------------
Detector type              | Indium Antimonide     | Cadmium Mercury Telluride
Spectral bandpass          | 3-5 µm                | 3.5-5 µm
Noise equivalent ∆T        | ≤ 25 mK at 300 K      | ≤ 20 mK at 293 K
Maximum frame rate         | 60 Hz                 | 150 Hz at 1 ms int. time
Number of pixels           | 256 × 256             | 384 × 288
Pixel size                 | unknown               | 20 × 20 µm²
Pixel pitch                | 38 × 38 µm²           | 24 × 24 µm²
Dynamic range (bits/pixel) | 12                    | 14
Dimensions W × H × D       | 112 × 183 × 262 mm³   | 130 × 150 × 260 mm³
Focal length               | 50 mm                 | 50 mm
Field-of-view              | 11.1° vert./horiz.    | 8.8° horiz., 6.0° vert.
Non-uniformity correction  | on-board, two-point   | not included
Dead pixel correction      | nearest neighbor      | not included

Table 6.1: Manufacturers' specifications of the Raytheon Amber Radiance and Thermosensorik CMT 384 infrared cameras.

Despite identical focal lengths, owing to the different aperture and detector sizes of the two cameras, their fields-of-view, that is the maximum horizontal and vertical angular extent viewed, were different. Another consequence of the difference in optics is a marked variation in depth-of-field, that is the distance range in which objects are well focused (see [57, sec. 4.6.2] for details). The Amber Radiance camera is equipped with an on-board non-uniformity correction. This non-uniformity correction is based on a radiometric calibration using a linear model, as described in section 5.1. Two reference temperatures are created internally by a small plate that is heated or cooled and placed in front of the detector chip. The Amber Radiance camera automatically fills in the defective pixels with the grey-value of a neighboring pixel. The model of the Thermosensorik camera used in this work is not equipped with an on-board non-uniformity correction.

6.1.2 Stereo Setup

To create a stereo setup, the two cameras were mounted together on a solid aluminum base plate with a thickness of 5 mm, as shown in figure 6.1. The horizontal baseline of this system is about 13 cm; as the position of the optical center with respect to the camera housing is not known exactly, its value is later determined through camera calibration (see section 7.2.2). Each camera was firmly attached to the plate with two screws. The verging angle β of the cameras was varied by sliding one of the attachment points along a slit holding one of the screws, as depicted in figure 6.2. Before deployment and geometric calibration (see section 3.4.1), the verging angle was adjusted to maximize the overlap of the two fields of view at the anticipated range, which was typically around 120 cm in these experiments. This maximization was done by visual inspection, using the PCB checkerboard pattern described in section 6.4 as a target.
This adjustment of the verging angle is sometimes referred to as boresight alignment (see [50]). The fact that the optical axes of the two cameras are not parallel for a non-zero verging angle does not pose a problem for the stereo matching, because the stereo images are rectified as described in section 3.5.3. During experimental operation, the main disadvantage of not having identical cameras was the difference in optics, specifically the difference in depth-of-field. When the Amber Radiance camera was focused at a distance of 90 cm, everything that was further away was also in focus (focus at ∞). The Thermosensorik camera, owing to its larger aperture, has a smaller depth-of-field, and when focused at the same distance of 90 cm, only objects in the range of 85-110 cm were well focused.

Figure 6.1: Infrared stereo camera setup: front (left) and top (right) views.

Figure 6.2: Stereo camera setup. The verging angle β can be adjusted to maximize the overlap (shaded area) of the fields of view for the anticipated distance to the water surface. This is done by sliding the attachment screws along a slit in the aluminum base plate.

6.2 Blackbody

For the radiometric calibration (see sections 2.2.3 and 5.1), a laboratory blackbody manufactured by Santa Barbara Infrared Inc. was used, that is, a thermal emitter that comes close to an ideal blackbody (see section 2.1.2). This laboratory blackbody has a highly emissive (emissivity 0.985 ± 0.014) surface whose temperature can be controlled over the range 10-60 °C. The temperature accuracy is the larger of 2.5 mK × (T − 25 °C) and 25 mK. For the experiments, radiometric calibration images were obtained over a temperature range of 19-25 °C, in which the absolute temperature accuracy of the blackbody is 25 mK. The specifications of the blackbody are listed in table 6.2, while results of the radiometric calibration are presented in section 7.1.
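The accuracy specification, the larger of 2.5 mK × (T − 25 °C) and 25 mK, can be written as a small helper. This is purely illustrative; the function name is my own, and the absolute value of the temperature offset is assumed, since the specification is symmetric about 25 °C.

```python
def blackbody_uncertainty_K(t_celsius):
    """Absolute temperature uncertainty of the blackbody in Kelvin:
    the larger of 2.5 mK * |T - 25 degC| and 25 mK (table 6.2).
    Assumes the offset term is meant as a magnitude."""
    return max(abs(2.5e-3 * (t_celsius - 25.0)), 25e-3)
```

Over the 19-25 °C range used for the calibration images, the offset term stays below 25 mK, which is why the text quotes a flat 25 mK accuracy there.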
6.3 Acquisition System

6.3.1 Frame Grabber

For image acquisition, two frame grabbers of the type microEnable 2, produced by Silicon Software GmbH, were used. The microEnable 2 frame grabber, shown in figure 6.3, is based on a modular design with a programmable logic chip (FPGA) at its core.

Parameter                | Factory specification
-------------------------|------------------------------------------
Temperature range        | 10-60 °C, absolute
Emissivity               | 0.985 ± 0.014 from 2-14 µm
Total system uncertainty | ± max of 2.5 mK · (T − 25 °C), 25 mK

Table 6.2: Santa Barbara Infrared Series 2000 Blackbody specifications.

Together with interchangeable camera port modules, the FPGA makes it possible to tailor the frame grabber to specific applications and allows some simple on-board image processing. The frame grabbers possess an additional I/O port module, which in the present application was used to route trigger signals for camera synchronization (see below). The image data is transferred to the RAM of the computer with direct memory access (DMA) via a PCI bus interface.

Figure 6.3: Silicon Software microEnable 2 frame grabbers: the right frame grabber is shown with the I/O port module at the upper right, which was used to route the trigger signal from one frame grabber to the other.

6.3.2 Camera Synchronization

When collecting stereo image sequences of moving objects, it is important that the two cameras be synchronized, because otherwise the disparity between the two views can be partly due to motion in the scene. In contrast to the Thermosensorik CMT camera, the Amber Radiance camera cannot be triggered externally (see section 6.1). Therefore, in these experiments, the trigger signal for the Thermosensorik CMT had to be derived from the frame start signal of the free-running Amber Radiance camera. After exposure of the detector chip, the camera sends the frame start signal to announce the start of the image data transmission to the attached frame grabber.
This frame grabber was programmed to generate a trigger signal from the frame start signal. Via an external cable, the generated trigger signal was routed to the second frame grabber, which then triggered the Thermosensorik camera. A schematic timing diagram of this setup is depicted in figure 6.4. The timing diagram shows that there is a time lag on the order of the exposure (or integration) time T_int of the Amber Radiance camera (1.8 ms during the experiments) before the start of exposure of the second camera. To work around this problem, attempts were made to use the frame grabber to delay the generated trigger signal by T_frame − T_int, where T_frame is the time between frames. This way the cameras would be exposed simultaneously, with their frame numbers differing by one. However, during the experiments it was found that the programmable delay did not operate as claimed by the manufacturer. Therefore the two cameras were exposed with a small time lag. For a discussion of the potential systematic error introduced by this delay, refer to section 7.8.1.

6.3.3 PC and RAID

Both frame grabbers were mounted in a standard PC equipped with a 433 MHz Pentium II processor, 1 GB of RAM and a 270 GB striping RAID hard disk array. The operating system used was Windows NT 4.0. For synchronous acquisition with both cameras recording at 60 frames per second, a data rate of

    (256 · 256 + 288 · 384) [pixel/frame] × 2 [Byte/pixel] × 60 [frames/s] = 20.60 [MByte/s]    (6.1)

must be handled, where the first term of the sum is the frame size of the Amber Radiance and the second that of the TS CMT. The RAID array has a nominal data rate of 50 MB/s, fast enough to write this data rate onto the hard disk in real time.
Nevertheless, an occasional loss of frames cannot be completely ruled out, especially during intermittent claims of system resources by other tasks running on the system.

Figure 6.4: Timing diagram: The Amber Radiance camera (AR) exposes the detector chip for a time span T_int, after which it signals the start of a new frame to the frame grabber (AR Frame Start) and begins transmission of data. After the rising edge of AR Frame Start, the frame grabber generates a trigger signal for the Thermosensorik camera, which then opens its shutter for T_int. Note that the time lag between the exposure of the two images is on the order of T_int. The diagram schematically shows the sequence of events and is not to scale.

The frame grabber embeds a sequential frame number into the image data stream, so that a loss of frames can easily be detected by searching for consecutive frame numbers that differ by more than one. However, stereo image sequences of up to approximately 30 seconds can instead be stored directly in the RAM of the PC, and subsequently be written onto the hard disk. When doing so, care has to be taken that the RAM buffer is allocated in such a way that the operating system does not try to swap the contents of the buffer onto the hard drive.¹ This "grab-to-RAM" approach was used for the experiments, and no loss of frames occurred.

¹ For Windows NT 4.0 this can be done by allocating the buffer with VirtualAlloc() and locking it in physical memory with VirtualLock().
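The frame-loss check based on the embedded sequential frame numbers amounts to a simple scan. This is an illustrative sketch; the function name is my own.

```python
def find_dropped_frames(frame_numbers):
    """Detect lost frames by scanning the embedded sequential frame
    numbers for consecutive entries that differ by more than one.
    Returns a list of (index, lost_count) pairs, where index is the
    position in the recorded sequence after which frames are missing."""
    drops = []
    for i in range(1, len(frame_numbers)):
        gap = frame_numbers[i] - frame_numbers[i - 1]
        if gap > 1:
            drops.append((i, gap - 1))  # gap - 1 frames were lost here
    return drops
```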
6.4 Geometric Calibration Target

For the geometric camera calibration process described in section 3.4.1, a planar calibration target with precisely located and easily identifiable feature points is required. As suggested by Bouguet [10], a checkerboard pattern was used for the present work. To facilitate the extraction of the corners from the calibration images, it is desirable to have a high contrast between the light and dark checkers. When imaging in the infrared region, this can be accomplished by using checkers made of two materials with different emissivity. Printed circuit boards (PCBs) are well suited for this; the copper has a high reflectance (low emittance), whereas the base material has a high emittance (see figure 6.5). When such a target is warmed, for example with a heat gun, the base-material checkers appear as bright squares in a thermal image because they are greybodies. In contrast, the copper checkers look darker, reflecting the cooler ambient temperatures like a mirror. Using PCBs has the benefit that calibration targets can quickly be produced with high accuracy (about 10 µm) and low cost by etching or milling. For this application, several targets with different checker dimensions were milled by Martin Vogel of the Physiologisches Institut Heidelberg. To prevent non-planar distortions due to bending or twisting, the boards were mounted onto sturdy flat plexiglass slabs. For the experiments, a target with a checker size of 22 × 22 mm² was used.

Figure 6.5: Left: Two camera calibration targets with checker sizes 22 × 22 mm² and 32 × 32 mm², respectively. The picture was taken at visible wavelengths. Right: The highly reflective copper and the highly emissive base material of the printed circuit board make for a good contrast in the infrared. This image was acquired with the Amber Radiance camera after heating the target with a heat gun.
6.5 Aeolotron

The experiments were performed at the wind-wave interaction facility Aeolotron¹, located at the Institute of Environmental Physics at the University of Heidelberg. The Aeolotron is designed to study air-sea gas exchange in a controlled environment. It is a gas-tight annular channel, the lower part of which is filled with water and the upper part of which is filled with air (see figure 6.6).

Figure 6.6: Left: cross-sectional view of the wind-wave channel. The aluminum-coated foil reduces thermal emission from the walls (photo from [54]). Right: side view through the "panoramic window" into the channel with wind waves.

Winds with speeds of up to 15 m/s can be produced by a ring of paddles circulating above the water surface. The wind stress on the water surface generates wind waves. The circular geometry gives a distinct advantage over linear channels, because it allows for a large fetch; that is, the wind can act on the water surface for a long time. Therefore, after some period of continuous wind, the generated wave field develops a stationary wavelength distribution, similar to the situation found on the ocean. In contrast, in linear wind-wave facilities the distribution of the observed wavelengths varies along the length of the channel. The channel has a mean circumference of 29.2 m, a width of 0.61 m and a height of 2.4 m. The water depth at the time of the experiment was 0.90 m. The channel is gas-tight, permitting the study of air-sea gas exchange by mass-balance methods. The air conditioning system provides independent control over humidity and air temperature. By warming or cooling, positive and negative heat fluxes can be created on the water surface. Changing the humidity affects the latent heat flux due to evaporation. The interior walls of the channel are covered with a gas-tight, aluminum-coated plastic foil.

¹ Named after Aeolus, the Greek god of the winds.
This coating has a high reflectivity and helps to reduce the thermal emission from the walls, thereby reducing the effect of the environment on the temperature measurements of the water surface. A more detailed description of the Aeolotron is given by Jähne [54, 58].

6.6 Experimental Procedure

In this section, the experimental procedures followed and the instrument settings used during the experiments are detailed. Figure 6.7 illustrates the different stages of the measurement procedure.

6.6.1 Radiometric Calibration Procedure

First, a set of image sequences to be used for radiometric calibration was acquired. This was done by placing each camera in front of the blackbody as shown in figure 6.7a. The blackbody temperature was varied over the range 19-25°C in steps of 0.2 K. At each temperature, an image sequence of 32 frames was acquired. The integration times were set to 1.8 ms for the Amber Radiance camera and 1.3 ms for the Thermosensorik camera; these integration times were also used for the subsequent acquisition of images of the water surface.

6.6.2 Geometric Calibration Procedure

Second, a series of stereo image pairs of the checkerboard calibration target was acquired for geometric calibration, as explained in section 3.4.1. For each image pair, the orientation of the checkerboard with respect to the camera was changed. To enhance the contrast in the infrared images, the target was warmed up using a heat gun (see figure 6.7b) before acquiring the images. The distance between the cameras and the calibration target was chosen to lie within ±10 cm of the anticipated distance between the cameras and the water surface, which was around 125 cm in these experiments (see figure 6.7c).

Figure 6.7: Experimental procedures: a) Acquisition of image sequences for radiometric calibration. b) Warming the calibration target with a heat gun to enhance image contrast.
c) Setup during acquisition of image sequences for geometric camera calibration. d) Mounted stereo system aimed at the water surface through an opening in the wind wave channel. Note that the calibration target shown in b) and c) differs from the one that was actually used for the experiments.

6.6.3 Deployment

After the acquisition of image sequences for radiometric and geometric calibration, the stereo system was mounted for deployment without changing the relative orientation of the cameras or the focus setting. As shown in figure 6.7d, the system was aimed at the water surface at a slightly oblique angle, through an opening in the Aeolotron. Using a folding rule, the distance of the cameras from the calm water surface was determined to be 125 ± 15 cm (the variation in the distance results from the oblique viewing angle).

6.6.4 Acquisition of Image Sequences

For different wind speeds, image sequences of the water surface were recorded. Each sequence consists of 512 frames recorded at a frame rate of 60 frames/second. To create a heat flux, cool dry air was let into the Aeolotron using the air-conditioning system. Before wind waves were created using the paddle ring, an image sequence of a flat water surface was recorded; the planar surface was used for assessing the accuracy of the stereo system (see section 7.8.1). Care was taken that the surface was truly calm by waiting for approximately two hours after filling the wave flume with water before the image sequences were recorded. Some frames of this flat water sequence are depicted in figure 1.3 on page 12. Because no heat flux was present at the beginning of this sequence, the first 100 frames were not usable for disparity matching.

The sequences acquired are listed in table 6.3. The numbering reflects the order in which the sequences were acquired and is used in section 7.7 for the presentation of the results.
The emphasis during the experiments was on obtaining sequences with different wave heights, which was achieved by varying the wind speed. The listed wind speeds are rough estimates and are only given to allow qualitative comparisons. The bulk water temperature was in the range 19.5°C to 21°C over the course of the experiments.

Sequence Number   Wind Speed
0 (flat water)    no wind
1                 medium (4-6 m/s)
2                 low (2-3 m/s)
3                 high (7-8 m/s)
4                 low (2-3 m/s)

Table 6.3: List of the infrared image sequences of water waves acquired.

7 Results

This chapter presents the results obtained with the stereo infrared camera system described in chapter 6 and the image processing algorithms described in chapters 3 to 5. Sections 7.1 to 7.6 document the results obtained for the individual processing steps: radiometric and geometric calibration, rectification, disparity matching, regularization and depth reconstruction. The results obtained for the actual application, namely the reconstruction of the water surface, are presented in section 7.7. An experimental assessment of the accuracy of the surface reconstruction is given in section 7.8.

7.1 Radiometric Calibration and Non-Uniformity Correction

Radiometric calibration and non-uniformity correction were performed for both infrared cameras. At temperatures in the range of 19-25°C, image sequences of the Santa Barbara Infrared laboratory blackbody were acquired as documented in section 6.6.1. To reduce the influence of image noise, the mean over 32 frames was calculated for each pixel at each temperature step. Using the mean images, a third-order polynomial fit as described in section 5.1 was performed. Pixels for which the squared standard deviation of the calibration points from the fit curve (see equation (5.2) on page 95) exceeded an empirically chosen threshold of 5·10⁻⁴ K² were marked as defective. This threshold corresponds to a standard deviation of √(32 · 5·10⁻⁴) K = 0.13 K from the fit curve for a single frame.
The factor of √32 accounts for the fact that the calibration points were not obtained from single image frames but from mean values calculated over sequences comprising 32 frames. Pixels whose grey-value remained constant independent of temperature were also marked as defective. The calibration curves thus obtained were used to apply a non-uniformity correction to the infrared image sequences by replacing the raw grey-value of each image pixel with the corresponding apparent blackbody temperature. Gaps in the images due to defective pixels were interpolated as described in section 5.3.2.

7.1.1 Thermosensorik CMT Camera

Figure 7.1 (left) shows one of the raw grey-value images collected with the Thermosensorik CMT camera to illustrate the need for a non-uniformity correction. This image was acquired with the camera imaging the surface of the Santa Barbara Infrared laboratory blackbody (see section 6.2) at a temperature of 24°C. Clearly, the uniform temperature distribution of the blackbody does not result in a uniform grey-value distribution in the raw image.

For several selected pixels, marked with red arrows in figure 7.1, the calibration points and the resulting fit curves describing the relationship between blackbody temperature and grey-value output are shown in figure 7.3. The calibration images were acquired with an integration time of 1.3 ms. Note that the grey-value level of the pixel at position (180,246) is lower than the grey-values of the other selected pixels by one order of magnitude. Also, the slope of the calibration curve is lower, resulting in a lower temperature resolution. This particular pixel was labeled as defective based on the deviation of the calibration points from the fit curve. The pixels of the Thermosensorik camera that were labeled as defective are displayed as a binary mask in figure 7.1 on the right. The number of bad pixels was 557, that is, 0.5% of the total number of pixels.
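The per-pixel calibration and defective-pixel flagging described above can be sketched as follows. This is an illustrative numpy sketch, not the thesis code; array shapes, the function name and the use of a population variance as the "squared standard deviation" are assumptions.

```python
import numpy as np

# Sketch: per-pixel radiometric calibration. For each pixel a third-order
# polynomial mapping grey-value -> blackbody temperature is fitted; pixels
# whose squared residual deviation exceeds a threshold, or whose grey-value
# never changes, are flagged as defective.

def calibrate_pixels(grey_means, temps, threshold_k2=5e-4):
    """grey_means: (n_temps, h, w) mean grey-values (averaged over 32 frames);
    temps: (n_temps,) blackbody temperatures in degrees C."""
    n, h, w = grey_means.shape
    coeffs = np.zeros((4, h, w))
    bad = np.zeros((h, w), dtype=bool)
    for i in range(h):
        for j in range(w):
            g = grey_means[:, i, j]
            if np.ptp(g) == 0:           # grey-value constant -> defective
                bad[i, j] = True
                continue
            c = np.polyfit(g, temps, 3)  # calibration curve T(g)
            coeffs[:, i, j] = c
            resid = temps - np.polyval(c, g)
            # squared standard deviation of calibration points from the fit
            bad[i, j] = resid.var() > threshold_k2
    return coeffs, bad
```

Applying the fitted polynomial to a raw grey-value then yields the apparent blackbody temperature used for the non-uniformity correction.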
Figure 7.2 shows the results of the non-uniformity correction for an image of a water surface. The defective pixels have been filled in using the membrane model (see section 5.3.2).

Figure 7.1: Left: Raw grey-value intensity image obtained with the Thermosensorik CMT camera imaging a laboratory blackbody at a temperature of 24°C. The calibration curves for the pixels marked with red arrows are depicted in figure 7.3. Right: Binary pixel mask for the Thermosensorik camera. Pixels labeled as defective are shown in black, good pixels are shown in grey.

Figure 7.2: Image of the temperature distribution on a water surface before (left) and after (right) non-uniformity correction and filling of gaps. The image was taken with the Thermosensorik CMT camera.

7.1.2 Amber Radiance Camera

For the Amber Radiance camera, the relationship between the grey-values of some selected pixels and the corresponding blackbody temperature is depicted in figure 7.4. The integration time was 1.8 ms. For the Amber Radiance camera, higher temperatures correspond to higher grey-values, in contrast to the Thermosensorik camera, for which higher temperatures correspond to lower grey-values.

When looking at the calibration curves of the Amber Radiance camera, the fact that this camera has an on-board non-uniformity correction as described in section 6.1 must be taken into account. The camera also has a mode of operation in which it outputs raw data, but it was operated using the on-board correction, as this allows immediate visual feedback on an external control monitor. This visual feedback facilitated operations such as focusing the camera on the water surface. An additional radiometric calibration was performed on top of the on-board non-uniformity correction for two reasons. First, the internal non-uniformity correction uses a temperature-controlled plate within the camera body to perform a linear two-point calibration.
Therefore, it does not take into account any non-uniformity introduced by the lens, such as vignetting¹. Second, the radiometric calibration makes it possible to translate the grey-value output into equivalent blackbody temperatures, making it easier to compare the outputs of the two cameras.

¹ The term vignetting refers to a reduced image intensity close to the borders of an image.

Figure 7.3: Sensitivity of selected pixels (marked in red in figure 7.1, left) for the Thermosensorik CMT camera. The mean grey-value for each pixel (averaged over 32 frames) is given as a function of the temperature of the imaged blackbody (calibration data points are marked +) together with the respective fit curve through the data points (solid lines). Note that the grey-values of the pixel at position (180,246) in the bottom graph are displayed at a different scale and differ from the grey-values of the other pixels by one order of magnitude.

Figure 7.4: Sensitivity of selected pixels for the Amber Radiance camera. Refer to figure 7.3 for an explanation of the axes.

Because the on-board electronics of the camera replaces the output of defective pixels with the grey-value of an immediate neighbor pixel, the method of identifying bad pixels based on the quality of the fit is not applicable for this instrument. However, despite the on-board nearest neighbor interpolation, 26 pixels were identified as bad pixels by the distance-from-fit criterion that was also used for the Thermosensorik camera, as described in section 7.1.1. These pixels are shown in figure 7.5. To fill in the intensity values of these 26 pixels, the regularization technique described in section 5.1 was used. For the other pixels, the nearest neighbor interpolation of the camera was relied upon.
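Membrane-style gap filling, as used above for the defective pixels, can be sketched as a Jacobi iteration of Laplace's equation with the valid pixels as fixed boundary data. This is an illustrative stand-in for the method of section 5.3.2, not the thesis implementation; names and the iteration count are assumptions.

```python
import numpy as np

# Sketch: fill defective pixels by repeatedly replacing them with the average
# of their four neighbours while valid pixels stay fixed (membrane model).

def fill_gaps(image, valid, n_iter=500):
    filled = np.where(valid, image, image[valid].mean())  # initial guess
    for _ in range(n_iter):
        avg = 0.25 * (np.roll(filled, 1, 0) + np.roll(filled, -1, 0) +
                      np.roll(filled, 1, 1) + np.roll(filled, -1, 1))
        filled = np.where(valid, image, avg)  # update only the gap pixels
    return filled
```

For isolated bad pixels this converges after very few iterations; for larger gaps the solution is the smooth membrane spanned by the surrounding valid pixels.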
Figure 7.5: Binary mask of the pixels of the Amber Radiance detector which, despite the on-board correction, were identified as bad pixels by the distance-from-fit criterion. Bad pixels are displayed in black, good pixels are displayed in grey.

7.2 Geometric Camera Calibration

This section presents the results of the geometric camera calibration obtained with the algorithm described in section 3.4. The implementation used is a modified version of the Camera Calibration Toolbox for Matlab, an open-source package provided by Bouguet [10]. Bouguet's implementation, however, uses a method for obtaining an initial guess for the camera parameters that differs from the method proposed by Zhang [107] and is not well documented. Therefore, Zhang's method of obtaining an initial guess, described in section 3.4.2, was implemented. However, it was found that the optimization step of the calibration (see section 3.4.3) is not very sensitive to the initial guess; for both methods of estimating the initial values, the optimization converged to the same result.

For the geometric calibration, a set of 38 stereo image pairs showing different orientations of the calibration target described in section 6.4 was acquired. This set of image pairs is depicted in figures 7.7 and 7.6.

Extraction of Grid-Corners

The positions of the grid-corners in the checkerboard images were located in a semi-automatic way. The four corners delimiting the outside edges of the checkerboard were selected manually for each calibration image by clicking with the mouse. Using the information about the four corner positions, the software calculated initial guesses for the grid-corner locations, which were subsequently refined using a sub-pixel corner estimation algorithm described by Harris and Stephens [41]. In some of the images, only a subset of the checkers was selected, because parts of the calibration target were not visible in the images or out of focus.
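The corner detector underlying the refinement step can be sketched as follows. This shows the generic Harris/Stephens corner response, not the thesis implementation or its sub-pixel stage; the gradient operator and box smoothing are simplified choices.

```python
import numpy as np

# Sketch: Harris/Stephens corner response. Large positive values of
# det(S) - k * trace(S)^2 of the smoothed structure tensor S mark corners.

def harris_response(img, k=0.04, win=1):
    gy, gx = np.gradient(img.astype(float))
    ixx, iyy, ixy = gx * gx, gy * gy, gx * gy

    def box(a):  # small box-window smoothing of the tensor entries
        out = np.zeros_like(a)
        for dy in range(-win, win + 1):
            for dx in range(-win, win + 1):
                out += np.roll(np.roll(a, dy, 0), dx, 1)
        return out / (2 * win + 1) ** 2

    sxx, syy, sxy = box(ixx), box(iyy), box(ixy)
    det = sxx * syy - sxy * sxy
    trace = sxx + syy
    return det - k * trace * trace
```

Applied to a checkerboard image, the response peaks at the checker corners, which is what makes the pattern well suited as a calibration target.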
Care was taken that the same subset of checkers was selected for corresponding images of the two cameras. The calibration was initially performed using all 38 image pairs. A close inspection of the re-projection errors (see section 7.2.1, in particular figure 7.10) for this initial calibration revealed that some of the images contained gross outliers due to inaccurate extraction of grid-corners. These images were manually deselected. The subsequent calibration was performed on the remaining 23 images, comprising 942 calibration points in total.

Figure 7.6: Calibration images taken with the Amber Radiance camera.

Figure 7.7: Calibration images taken with the Thermosensorik CMT camera.

7.2.1 Interior Parameters

The interior camera parameters obtained by the calibration algorithm are listed in table 7.1. The focal length is given both in terms of pixel dimensions αx,y and in millimeters fx,y. To calculate the focal length in millimeters, the αx,y were multiplied by the pixel pitch, that is, the center-to-center spacing of pixels on the detector chip. The pixel pitches are 38 µm for the Amber Radiance camera and 24 µm for the Thermosensorik camera (see table 6.1). The numerical errors associated with the parameters are derived from the inverse of the Hessian matrix used in the gradient-descent optimization of the re-projection error (see section 3.4.3) and are about three times the standard deviation.
Parameter                            Amber Radiance    TS CMT384
Focal length αx (pixel dimensions)   1405.5 ± 6.1      2314.9 ± 8.1
Focal length αy (pixel dimensions)   1405.6 ± 6.0      2316.6 ± 7.7
Focal length fx (mm)                 53.4 ± 0.2        55.6 ± 0.2
Focal length fy (mm)                 53.4 ± 0.2        55.6 ± 0.2
Principal point u0                   143.3 ± 7.5       275.0 ± 15.4
Principal point v0                   123.9 ± 8.6       118.7 ± 11.9
Skew                                 not estimated     not estimated
First-order radial distortion κ1     -0.189 ± 0.054    -0.456 ± 0.038
Tangential distortion τ1             not estimated     -0.0039 ± 0.0001
Tangential distortion τ2             not estimated     -0.0053 ± 0.0016
Pixel re-projection error ex         0.052             0.065
Pixel re-projection error ey         0.052             0.069

Table 7.1: Interior camera parameters determined by camera calibration.

Distortion Model

In a first run of the calibration algorithm (see section 3.4.3), the optimization minimized the re-projection error of a camera model that included parameters for both first-order radial distortion, κ1, and first-order tangential distortion, τ1,2. For the Thermosensorik camera, this model was adequate. However, for the Amber Radiance camera the tangential distortion parameters τ1,2 were found to be zero within the numerical uncertainties. Therefore, the camera parameters of the Amber Radiance were re-estimated using a distortion model that only included first-order radial distortion. For both cameras, the resulting displacement between undistorted and distorted image coordinates is illustrated in figure 7.8.

Figure 7.8: Camera distortion models for the Amber Radiance camera (top) and the Thermosensorik camera (bottom). The vectors show the displacement between distorted and corrected image coordinates (see equations (3.15)-(3.18) in section 3.3.1). The contour lines mark lines with equal magnitude of displacement.
The image center and the principal point are marked × and ◦, respectively.

Re-projection Error

Table 7.1 also lists the standard deviation of the re-projection error, defined in section 3.4.3, over all calibration points in x- and y-directions. As illustrated in figure 7.9, the re-projection error is the difference between the grid-corner positions extracted from the images (marked with a +) and the grid-corner positions calculated by re-projecting the known locations of the feature points on the calibration target into the image with the estimated camera parameters (marked with a ◦). The re-projection errors for all feature points used for calibration are plotted in figure 7.10.

Figure 7.9: Re-projection of grid-points for one of the calibration image pairs. The extracted grid corners are marked with red crosses and the grid-corner positions calculated by re-projection are marked with green circles. The axis labels denote the pixel coordinates. Left: Amber Radiance camera, right: Thermosensorik camera.

7.2.2 Exterior Parameters

The exterior parameters of the stereo setup obtained by the calibration process are listed in table 7.2 and are illustrated in figure 7.11, together with the estimated positions and orientations of the planar calibration target. The world coordinate system was chosen to coincide with the Amber Radiance camera coordinate system. The vector C_TS represents the position of the camera center of the Thermosensorik camera in this coordinate system. The rotation is represented in terms of a vector R, whose direction determines the axis of rotation and whose length specifies the rotation angle.
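The conversion from this axis-angle vector to a rotation matrix is given by the Rodrigues formula; the following sketch (illustrative, not the thesis code) applies it to the rotation vector listed in table 7.2 and reproduces the matrix in equation (7.1) up to rounding.

```python
import numpy as np

# Sketch: axis-angle (rotation vector) to rotation matrix via the
# Rodrigues formula R = I + sin(t) K + (1 - cos(t)) K^2.

def rodrigues(r):
    theta = np.linalg.norm(r)   # rotation angle = length of the vector
    if theta == 0:
        return np.eye(3)
    k = r / theta               # unit rotation axis
    K = np.array([[0, -k[2], k[1]],
                  [k[2], 0, -k[0]],
                  [-k[1], k[0], 0]])  # cross-product matrix of the axis
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

# Rotation vector of the stereo setup from table 7.2
R = rodrigues(np.array([0.002, 0.077, -0.0270]))
```

The result agrees with the rotation matrix quoted in equation (7.1) to within the precision of the printed values.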
This representation was used in the gradient-descent optimization, because it consists of only three elements, corresponding to the three independent degrees of freedom.

Figure 7.10: Re-projection error for the Amber Radiance camera (top) and Thermosensorik camera (bottom). For an explanation of the re-projection error, refer to section 3.4.3. The colors of the points correspond to different orientations of the calibration pattern as depicted in figure 7.11.

Figure 7.11: Exterior orientation obtained by the stereo calibration algorithm. The camera centers are at the origins of the small coordinate frames; AR and TS represent the Amber Radiance and the Thermosensorik infrared cameras, respectively. The field-of-view of each camera is represented by a red pyramid. The positions of the calibration target and the Thermosensorik camera are relative to the camera coordinate system of the Amber Radiance, which was chosen as the world coordinate system. The axis labels are in millimeters.

Parameter                                             Value
Rotation vector R                                     (0.002, 0.077, -0.0270)
Uncertainty of rotation vector                        (0.007, 0.008, 0.0003)
Translation (position of C_TS in world frame) [mm]    (129.7, 5.1, -18.5)
Uncertainty of C_TS [mm]                              (0.5, 0.2, 4.8)

Table 7.2: Exterior parameters of the stereo camera setup.

The same rotation can also be expressed in terms of a rotation matrix as

        (  0.9967   0.0271   0.0771 )
    R = ( -0.0269   0.9996  -0.0033 )                                    (7.1)
        ( -0.0772   0.0012   0.9970 )

Using these exterior parameters and the interior parameters presented in section 7.2.1, the projection matrices in the world coordinate system defined by the Amber Radiance camera are

            ( 1405.5   0        143.3   0 )
    P_RAD = ( 0        1405.6   123.9   0 )                              (7.2)
            ( 0        0        1       0 )

for the Amber Radiance camera and

           ( 2285.9   63.0     452.6   -295262.8 )
    P_TS = ( -71.48   2315.9   110.8   -9573.4   )                       (7.4)
           ( -0.077   0.0012   0.997   18.45     )

for the Thermosensorik camera. These matrices were calculated by inserting the camera parameters obtained from calibration into the definition of the camera projection matrix, equation (3.12). They describe the relationship between the coordinates of a 3D scene point in millimeters (augmented with a one to create a homogeneous 4-vector) and its homogeneous image coordinates in pixels.

7.3 Rectification

To facilitate the use of the epipolar constraint in the subsequent stereo matching step, the images were rectified using the algorithm described in section 3.5.4, making use of the exterior and interior parameters obtained from the camera calibration. The outcome of the rectification is illustrated in figure 7.12 for a stereo image pair of an infrared sequence¹ of a wavy water surface. Before applying the rectifying transforms, the non-linear distortion was removed from the images using algorithm 3.1.

A visual inspection of distinctive grey-value features in several of the rectified image pairs confirmed that the rectification works correctly; apart from small variations of typically less than 1 pixel, corresponding points were aligned on the same image scanline. The deviations that were noticed are not a result of the rectification process, but are due to the limited accuracy with which the camera parameters, and thus the epipolar geometry, are known. As a result of the rectification, the size ratio of the two images is changed.
The different scaling corrects for the different fields-of-view of the two cameras (see section 6.1.2); corresponding structures are now approximately the same size. The rectified left image, taken with the Amber Radiance camera, is enlarged with respect to its original size of 256×256 pixels. Its horizontal and vertical extent is now 350×350 pixels. Note that it does not fully fill a 350×350 pixel bitmap, as it is tilted. In contrast, the rectified right image, corresponding to the Thermosensorik camera, shrank from its original size of 388×284 pixels to a size that fits into a rectangular bitmap image of 310×238 pixels.

Because the rectified images do not fill a whole rectangular bitmap image, it is useful to express the scaling in terms of the total number of pixels. The 65536 original pixels of the Amber Radiance camera are mapped to 106323 pixels in the rectified images; this is an increase by a factor of 1.62. The 110592 original pixels of the Thermosensorik camera are mapped to 68065 pixels in the rectified images; this is a reduction by a factor of 0.615. Therefore, when disparity estimation is performed on the rectified images, one original pixel of the Amber Radiance camera is in effect compared to 1.62/0.615 = 2.63 original pixels of the Thermosensorik camera during the evaluation of the correlation score (equation (4.7)).

¹ The particular image pair is frame 1 of sequence 3 (see section 6.6.4).

Figure 7.12: Results of the stereo rectification. An infrared image pair of a wavy water surface is shown in the top row. The middle row shows the corresponding rectified stereo pair. Note that the rectified images are mirrored both vertically and horizontally, but this does not affect the subsequent processing. The bottom row presents line profiles of two corresponding lines, marked red in the rectified images.
The left and right columns correspond to images obtained with the Amber Radiance camera and the Thermosensorik camera, respectively.

Another negative effect of the different fields-of-view of the two cameras that is apparent in figure 7.12 is that many pixels cannot be used for disparity matching due to the limited overlap of the imaged areas.

7.4 Disparity Estimation

This section documents the results obtained with the stereo matching algorithm described in section 4.3. First, in section 7.4.1, the results for a test image pair are presented. Second, in section 7.4.2, the results for an infrared image pair of a wavy water surface are presented and used to analyze the multi-scale disparity estimation.

7.4.1 Test Image Results

The disparity algorithm was tested on a standard stereo test image pair, to check the correctness of the implementation and to allow for a comparison with other algorithms. The image pair, shown in figure 7.13, was taken from a standard test image series from the University of Tsukuba, which is available on the internet [81]. Ground truth disparity data for this image pair, with a resolution of one pixel, is shown on the bottom right of figure 7.13. The results obtained with the implementation of the algorithm described in section 4.3 are shown on the bottom left of figure 7.13. The matching was performed on three levels of a Gaussian image pyramid using correlation window sizes of 9×9, 7×7 and 5×5 pixels (from highest to lowest resolution). In figure 7.13, the disparity combined across scales is shown.

A comparison of the resulting disparity map with the ground truth data demonstrates that the algorithm produces reasonable disparity estimates. Problematic areas, where no matches can be found, are mostly located near object boundaries and can be attributed to occlusions.
A notable exception is the wrong match located left of the depicted camera, which is probably due to the repetitive pattern of the books on the shelf. Due to the fixed size of the correlation window, small structures, such as the legs of the tripod, are fattened, and disparity discontinuities are smoothed out.

The Tsukuba test image pair is not very representative of the situation encountered in the present work, because the depicted scene comprises many different objects, giving rise to depth discontinuities and occlusions. Also, the images have very low image noise, and the whole scene is in focus. In contrast, the images in the present work are of a single smooth surface, comparatively noisy and sometimes blurred.

Figure 7.13: Results of the disparity matching algorithm described in section 4.3 applied to a standard test image pair. Top: Stereo image pair from the University of Tsukuba image series. The size of the images is 384×288 pixels. Bottom: The disparity map obtained with the implemented algorithm is shown on the left. The disparities are with respect to the right image; brighter grey-values correspond to larger disparities. Pixels for which no correspondence was found are shown in white. The ground truth disparity, with disparity levels quantized in 1-pixel steps, is shown on the right. The Tsukuba stereo image pair and the ground truth data are available on the internet [81].

However, the Tsukuba image pair has been used by Scharstein and Szeliski [82], and therefore it allows comparison with a large number of other algorithms. Such a comparison reveals that the implementation yields results similar to those of other algorithms that do not employ global disparity optimization. Algorithms using global disparity optimization generally perform better in terms of preserving disparity discontinuities across object boundaries.
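Window-based matching along rectified scanlines, as used here, can be sketched as follows. This is a simplified single-scale illustration in the spirit of the correlation matching of section 4.3, not the thesis code; the left/right consistency check and sub-pixel refinement are omitted, and names and parameters are assumptions.

```python
import numpy as np

# Sketch: for each pixel of the left image, search along the same scanline of
# the right image and keep the disparity with the highest zero-mean normalized
# cross-correlation (NCC) score between the two correlation windows.

def ncc(a, b):
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return (a * b).sum() / denom if denom > 0 else 0.0

def match_scanlines(left, right, max_disp, half=2):
    h, w = left.shape
    disp = np.full((h, w), -1, dtype=int)  # -1 marks "no valid estimate"
    for y in range(half, h - half):
        for x in range(half + max_disp, w - half):
            ref = left[y - half:y + half + 1, x - half:x + half + 1]
            scores = [ncc(ref, right[y - half:y + half + 1,
                                     x - d - half:x - d + half + 1])
                      for d in range(max_disp + 1)]
            disp[y, x] = int(np.argmax(scores))
    return disp
```

On a synthetic pair in which the right image is the left image shifted by a known amount, the search recovers that shift as the disparity everywhere in the textured interior.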
7.4.2 Multi-Scale Disparity Estimation

This section presents the disparity estimate obtained for an infrared image pair¹ of a wavy water surface. The image pair is shown at three different levels of a Gaussian image pyramid in figure 7.14, together with the disparity maps computed at each level. As mentioned in section 7.3, the image area that is usable for disparity matching fits into a rectangular area of 310×238 pixels. Note that the image acquired with the Thermosensorik camera is slightly out of focus.

¹ The particular image pair used for this discussion is frame 88 of sequence 4 (see section 6.6.4).

The sizes of the correlation windows at each scale were determined empirically, such that a good balance between a dense disparity map and good resolution was achieved at each level. The correlation window sizes were 21×21, 11×11 and 9×9 pixels, in order from highest to lowest resolution. These window sizes are quite large compared to the window sizes used for the Tsukuba test image in the previous section. The need for large windows is probably attributable to image noise and the fact that the right image is slightly out of focus. These window sizes were also used to analyze the wave image sequences presented in section 7.7.

Judging by the density of the valid disparity estimates, the image regions with spatially finer texture, at the top left and bottom right of the infrared images, can be matched better, as would be expected. The holes in the disparity map obtained at the highest resolution can to some degree be filled in by the results at lower resolutions using algorithm 4.1 (page 90). This multi-scale approach yields the combined disparity shown at the bottom left of figure 7.14.
The bottom right of figure 7.14 shows the combined disparity after outliers have been removed using the method described in section 5.2; that is, for each scanline, disparity estimates that differ by more than 2.3 standard deviations from the mean disparity of that scanline were removed.

Figure 7.14: Disparity estimation at different scales, demonstrated for a stereo image pair of a water surface in the infrared region. The top three rows show the stereo image pair at different scales, together with the disparity estimates at each scale. For ease of comparison, the grey-values of disparities obtained at lower resolutions have been scaled to their equivalent disparity at the highest resolution. The combined disparity is shown on the bottom left. The final disparity map after outliers have been removed is shown on the bottom right. The range of disparities is 5 pixels to 25 pixels of displacement, with darker grey-values representing lower disparities. Points for which no valid disparity estimate was found are shown in white.

7.5 Regularization

Regularization of the image sequences was performed using the spatio-temporal membrane model described in section 5.3.3. To compute the local average, a 7×7×3 pixel Binomial filter mask was used. The filter mask was made larger in the spatial dimensions than in the time dimension to give spatial smoothness a higher weight than temporal smoothness. The elasticity parameter α, which controls the smoothness, was set to 0.6. The confidence value κ was set to zero for disparity estimates that failed the left/right consistency check (see section 4.3.5) or were marked as outliers (see section 5.2); otherwise, κ was set to one.

Figure 7.15 illustrates the results obtained with this regularization technique for a disparity image taken from water wave sequence 4 (see section 6.6.4). As can be seen, convergence is reached after a few iterations; there is little change between the 6th and the 9th iteration.

Figure 7.15: Regularization results obtained using the spatio-temporal membrane model (see section 5.3.3). The original disparity map is shown at the top left, together with regularized versions for different numbers of iterations of the update rule in equation (5.9).

Close inspection of figure 7.15 reveals that the interpolation of the larger gap in the bottom right quadrant of the image is not very smooth. This is because the gap, which is relatively large in the spatial directions, was filled in with disparity estimates from neighboring frames during the first iterations, as the Binomial filter also extends into the time dimension. The artifacts produced by the interpolation in the time direction can be attributed to the frame rate of 60 Hz at which the image sequences were acquired. This frame rate is not high enough to fulfill the sampling theorem (see Shannon [89]). Due to the resulting temporal aliasing, the discretization of the differential operators in equation (5.13), which leads to the iterative update rule in equation (5.9), is not correct. The artifacts can also be observed in the space-time images presented in section 7.7, figure 7.19.

For some frames, it was observed that even a single outlier that is erroneously labeled as correct (κ = 1) can severely affect the quality of the regularized solution if it is located within a larger area where no data is available. In such a case, the iterative procedure propagates the grey-value of the outlier to fill the gap.

7.6 Depth Reconstruction

Based on the disparity maps calculated for the infrared stereo image pairs, a 3D reconstruction of the water surface was performed using triangulation as described in section 3.5.5.
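Point-wise triangulation from a pair of calibrated projection matrices can be sketched with the standard linear (DLT) method. The matrices below are the calibrated P_RAD and P_TS quoted in section 7.2.2; the code itself is an illustrative sketch, not the thesis implementation.

```python
import numpy as np

# Sketch: linear triangulation. Each view contributes two rows of a homogeneous
# system A X = 0 (u*P[2]-P[0] and v*P[2]-P[1]); the 3D point is the null vector.

P_RAD = np.array([[1405.5, 0.0,    143.3, 0.0],
                  [0.0,    1405.6, 123.9, 0.0],
                  [0.0,    0.0,    1.0,   0.0]])
P_TS = np.array([[2285.9,  63.0,   452.6, -295262.8],
                 [-71.48,  2315.9, 110.8, -9573.4],
                 [-0.077,  0.0012, 0.997, 18.45]])

def project(P, X):
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

def triangulate(P1, P2, x1, x2):
    A = np.vstack([x1[0] * P1[2] - P1[0],
                   x1[1] * P1[2] - P1[1],
                   x2[0] * P2[2] - P2[0],
                   x2[1] * P2[2] - P2[1]])
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]           # null vector of A (smallest singular value)
    return X[:3] / X[3]  # dehomogenize

# Round trip: a scene point at roughly the measured water distance (~1250 mm)
X_true = np.array([10.0, -20.0, 1250.0])
X_rec = triangulate(P_RAD, P_TS, project(P_RAD, X_true), project(P_TS, X_true))
```

For exactly consistent image coordinates, the reconstructed point coincides with the original scene point; in practice, disparity noise translates into the depth uncertainty assessed in section 7.8.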
The reconstruction was performed point by point for each valid entry in the disparity image, yielding a list containing the 3D coordinates of the reconstructed points. Image sequences of the 3D reconstruction were produced by creating a scatter plot of the reconstructed points for each frame of the original stereo image sequence. An example of such a scatter plot, showing the reconstruction of a wavy water surface, is depicted in figure 7.16. Other visualization methods, which interpolate between the individual points to produce a continuous surface, were also evaluated; however, a small number of outliers can severely affect the visual quality of these methods, and therefore they were not found to be useful for this application.

Figure 7.16: 3D reconstruction of a wavy water surface, visualized as a scatter plot (axes x, y, z in mm). The coordinates are with respect to the world frame defined by the Amber Radiance camera. For better visibility, only a third of the total number of reconstructed points are shown. The particular image pair used to produce this example is frame 62 of sequence 3 (see section 6.6.4).

7.7 Measurements of Water Waves

In this section, the results obtained for image sequences of a wavy water surface are presented. Refer to section 6.6.4 for details about how these sequences were acquired. Because of their inherent three-dimensionality, image sequences are difficult to present on paper¹. To address this problem, space-time images are used in this section to depict the temporal change of image grey-values. Such a space-time image shows the temporal evolution only for a single scanline rather than for a whole image, to reduce the dimensionality. The velocity of a given object in the image sequence corresponds to the orientation of its grey-value trace in the space-time image.
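The construction of such a space-time image can be sketched in a few lines (Python; the tiny three-frame sequence is made-up data, with None marking pixels without a valid disparity estimate):

```python
def space_time_image(frames, row):
    """frames: list of 2D grids (lists of rows). Returns a 2D grid
    whose t-th row is the chosen scanline of frame t, so time runs
    from top to bottom as in figure 7.17."""
    return [frame[row] for frame in frames]

# Made-up 3-frame sequence of 2x3 disparity grids:
frames = [
    [[1, 2, 3], [4, 5, 6]],
    [[1, 2, 3], [7, None, 9]],
    [[1, 2, 3], [8, 9, 10]],
]
st = space_time_image(frames, row=1)
# st == [[4, 5, 6], [7, None, 9], [8, 9, 10]]
```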
For example, an object that does not move at all creates a vertical bar in a space-time image (see [56] for a detailed explanation of space-time images). Figure 7.17 shows the disparity estimates of four sequences recorded at different wind speeds as reported in section 6.6.4. The disparity matching was performed on three levels of a Gaussian image pyramid using the same correlation window sizes as in section 7.4.2. The image sequences are numbered according to table 6.3 on page 112. For each sequence, the disparity of a line is shown, with the time in seconds increasing from top to bottom. At the top of each space-time image, a sample disparity image frame² of the sequence is used to mark the position of the scanline shown. The disparity values are in pixels, and are encoded as grey-values. Again, pixels for which no valid disparity estimate was found are shown in white. For the sequences in figure 7.17, larger disparities correspond to points that are further away from the camera. Note that the grey-value scale used for sequence 3 differs from the scale used for the other sequences. In the space-time images of all four sequences, a distinct periodic structure, corresponding to a large gravity wave mode, is visible. This wave mode is more dominant in the image sequences 1 and 3, which were taken at higher wind speeds. In addition, some smaller periodic structures are visible. From their orientation in the space-time images, it can be observed that these structures propagate from right to left and are slower than the dominant wave mode. It is interesting to note that the holes in the disparity estimates do not occur randomly but mostly along oriented structures that move from right to left in the space-time images.
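To make the orientation-velocity relation concrete, a minimal numeric sketch (Python): the two-pixels-per-frame shift is a made-up example value, while the 0.4 mm lateral resolution and 60 Hz frame rate are values quoted elsewhere in this work.

```python
# A trace advancing `shift_px` pixels per frame in the space-time
# image corresponds to v = shift_px * pixel_size * frame_rate.
pixel_size = 0.4e-3   # m per pixel (lateral resolution of this setup)
frame_rate = 60.0     # Hz (acquisition rate of the sequences)
shift_px = 2.0        # made-up example: trace shifts 2 px per frame
v = shift_px * pixel_size * frame_rate  # ≈ 0.048 m/s
```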
¹The digital image sequences presented in this section can be found on the accompanying DVD or be requested from the author via email: [email protected]
²The particular frame numbers of the sample disparity images are frame 20 for sequences 1, 3 and 4, and frame 1 for sequence 2.

Figure 7.17: Space-time disparity images (time t in seconds against position x, for sequences 1-4). Refer to the text of section 7.7 for details.

Figure 7.18: Fraction of pixels for which a valid disparity estimate could be found, relative to the total number of usable pixels, plotted over time/frame number for sequences 1-4.

It is likely that these missing disparities arise from specular reflections. Their velocity is determined by the propagation velocity of areas with constant slope on the wave face, for a slope angle at which the radiation of some distant warm object is specularly reflected towards the camera. Despite the validation of matches using the left/right consistency check (see section 4.3.5) and outlier removal (see section 5.2), careful visual inspection of the space-time images reveals some disparity estimates that are likely to be faulty. In the actual image sequences, these faulty estimates appear more salient than in the printed space-time images. The wave motion should lead to continuous grey-value trajectories in the space-time images. Clearly, this is only the case for some of the larger structures. The finer structures appear to be discontinuous. This is caused by temporal aliasing:
the frame rate of the camera is not high enough to resolve the motion of the smaller structures. In addition, the amplitude of shorter waves is usually smaller. Therefore, they cause smaller variations of disparity that are not well resolved, given the limited spatial resolution of the cameras and the accuracy of the disparity estimation.

Figure 7.19: Space-time disparity images after regularization using the 3D membrane model described in section 5.3.3. The axes and sequence numbers are as in figure 7.17.

The fraction of pixels for which a valid disparity estimate was found, relative to the total number of “usable” pixels, was calculated for each frame. Figure 7.18 depicts this fraction for the four sequences. For each sequence, every pixel for which no valid disparity estimate was found in any of the 512 frames was marked as “unusable”, while all other pixels were marked as “usable”. The number of usable pixels was around 46 · 10³, varying by several hundred pixels for the individual sequences. It is apparent that the sequences which exhibit a higher variation of disparity, corresponding to larger wave heights and higher wind speeds, have a higher fraction of invalid matches. One factor that contributes to this observed effect is the following: for larger wave heights, the range of distances between the water surface and the camera becomes larger. As a result, the water surface moves out of focus if it comes too close to the Thermosensorik camera, which has a narrow depth-of-field (see section 6.1.2). The resulting blur in the infrared images results in a less pronounced correlation of corresponding image patches and consequently leads to a lower number of found matches. This explanation is also supported by the space-time image of sequence 3 in figure 7.17. In this sequence, many gaps in the disparity estimate occur when the disparity is low, that is, when the water surface is closer to the camera.
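The valid-pixel statistic of figure 7.18 can be sketched as follows (Python, with a made-up two-frame toy sequence, not the thesis code); a pixel counts as "usable" if it holds a valid estimate in at least one frame:

```python
def valid_fraction_per_frame(frames):
    """frames: list of 2D disparity grids, None = no valid estimate.
    Returns one fraction per frame: valid pixels / usable pixels,
    where a pixel is usable if it is valid in at least one frame."""
    rows, cols = len(frames[0]), len(frames[0][0])
    usable = [(r, c) for r in range(rows) for c in range(cols)
              if any(f[r][c] is not None for f in frames)]
    return [sum(1 for (r, c) in usable if f[r][c] is not None) / len(usable)
            for f in frames]

# Tiny 2-frame example: pixel (0, 0) is never valid -> unusable.
frames = [
    [[None, 7.0], [6.5, None]],
    [[None, 7.5], [6.0, 6.2]],
]
fractions = valid_fraction_per_frame(frames)
# fractions[0] = 2/3, fractions[1] = 1.0
```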
Visual inspection of the original infrared image sequences confirms this interpretation; the frames with a low number of correct matches were found to correlate with images that were out of focus. Another possible cause that may contribute to the observed dependence of the found matches on wave height is the higher velocity that is associated with larger gravity waves. Faster motion causes blurring because the exposure of the detector takes some time. In addition, the slight delay between the exposure of the two cameras (see section 6.3.2) becomes more critical. In figure 7.19, the space-time images of sequences 1 and 3 after applying regularization using the membrane model are shown. In these space-time images, the gaps in the data have been filled in. However, as described in section 7.5, the quality of the interpolation is not very good; many image artifacts are visible. Image sequences that show the motion of the recorded water surfaces in 3D as a series of scatter plots have been created from the disparity image sequences (these sequences can be found on the accompanying DVD). Due to the unsatisfactory quality of the regularization, the original disparity images, which contain gaps, were used to create these sequences.

7.8 Discussion of Accuracy

This section provides an analysis of the accuracy of the reconstructed 3D points that are obtained with the infrared stereo camera system described in this work. Part of this analysis is based on the theoretical considerations presented in appendix A.

7.8.1 Assessing the Total System Accuracy Using a Reference Plane

As explained in section A.2, one way of assessing the total system accuracy of a stereo setup experimentally is to reconstruct the surface of an object with known ground truth.
For the present application, no ground truth data for the wavy water surface was available for cross-validation, because an alternative instrument to measure the water surface in the wind-wave channel, a Color Imaging Slope Gauge (see Fuß [33]), was still under construction at the time of measurement. Instead, a stereo image sequence of a flat water surface was acquired to serve as ground truth. Some images of this sequence are depicted in figure 1.3 on page 12. For this image sequence, it is known that the imaged surface was level. Therefore, the reconstructed 3D points should lie in a plane. If the position and orientation of this plane are known with high accuracy, the distance of the reconstructed points from this plane can be used as a measure for the precision of the reconstruction.

Determining the Reference Plane

The accurate determination of the reference plane is based on the assumption that the disparity estimates are only affected by randomly distributed errors and not by systematic errors. This assumption is justified, as the only likely cause of systematic error, the time lag between the exposure of the two cameras (see section 6.3.2), does not play a significant role for a still surface (the motion of the temperature patterns on the surface can be neglected, as it is slow with respect to the time lag). With this assumption of randomly distributed errors, the accuracy of the disparity estimate can be improved by taking the mean over several frames. The reference plane was computed as follows:

• For each pixel, the mean disparity over 400 frames of the acquired flat water sequence (see section 6.6.4) was calculated. Note that for some pixels a disparity estimate was not available for every frame of the sequence. In these cases, the mean was taken only over the available disparity estimates.
The resulting disparity map is shown in figure 7.20, together with the standard deviation of the disparity for each pixel (over the frames included in calculating the mean). The mean of this standard deviation, taken over all pixels, is 0.55 pixels.

• To avoid potential boundary effects, 30 pixels at the edges of the mean disparity map were cropped using a morphological erosion operator (see, for example, [56]).

• For the cropped mean disparity map, a 3D reconstruction was performed using the triangulation method described in section 3.5.5. The reconstruction is shown in figure 7.23, top left. The total number of reconstructed points is 34046.

Figure 7.20: Left: Mean over 400 disparity maps calculated from a stereo image sequence of a flat water surface. The disparity is in pixels with respect to the left camera. The vertical gradient of the disparity reflects the fact that the camera was tilted with respect to the water surface. Right: Standard deviation of the disparity for each image point over 400 frames.

Figure 7.21: Disparity map for a single frame of the stereo image sequence of a flat water surface.

• A plane was robustly fitted to the reconstructed points using the RANSAC method, detailed in section A.3. The final consensus set of the RANSAC method comprised 97.4% of the total number of points. The homogeneous 4-vector representing the fit plane is P̃ref = (1.265, 4.774, −8.185, 10⁴)ᵀ. The standard deviation of the distance from the fit plane, calculated over all reconstructed points, was σd1 = 0.68 mm. The fit plane is depicted in figure 7.22, together with the fit points, and from a different perspective in figure 7.23, top right.
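The robust plane fit can be illustrated on synthetic data (Python). This is a sketch in the spirit of algorithm A.1 in appendix A, not the implementation used in this work: it fits the explicit plane form z = ax + by + c instead of the homogeneous 4-vector formulation, and it uses much smaller subsets than the actual fit (which used N = 10 runs with k = 1000 points and a 1.5 mm inlier threshold):

```python
import random

def fit_plane_lsq(pts):
    """Least-squares fit of z = a*x + b*y + c by solving the 3x3
    normal equations with Gaussian elimination."""
    M = [[0.0] * 3 for _ in range(3)]
    r = [0.0] * 3
    for x, y, z in pts:
        row = (x, y, 1.0)
        for i in range(3):
            for j in range(3):
                M[i][j] += row[i] * row[j]
            r[i] += row[i] * z
    A = [M[i] + [r[i]] for i in range(3)]
    for col in range(3):  # forward elimination with partial pivoting
        piv = max(range(col, 3), key=lambda k: abs(A[k][col]))
        A[col], A[piv] = A[piv], A[col]
        for k in range(col + 1, 3):
            f = A[k][col] / A[col][col]
            for j in range(col, 4):
                A[k][j] -= f * A[col][j]
    coeffs = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):  # back substitution
        s = sum(A[i][j] * coeffs[j] for j in range(i + 1, 3))
        coeffs[i] = (A[i][3] - s) / A[i][i]
    return coeffs

def ransac_plane(pts, n_runs=10, sample_size=20, thresh=1.5):
    """RANSAC: fit a plane to random subsets, keep the fit with the
    largest consensus set, then refit on that consensus set."""
    best = []
    for _ in range(n_runs):
        a, b, c = fit_plane_lsq(random.sample(pts, sample_size))
        consensus = [(x, y, z) for (x, y, z) in pts
                     if abs(z - (a * x + b * y + c)) < thresh]
        if len(consensus) > len(best):
            best = consensus
    return fit_plane_lsq(best)

random.seed(0)
# Synthetic points on the plane z = 0.1x - 0.2y + 1200 (mm), plus
# two gross outliers mimicking mismatched disparities.
pts = [(float(x), float(y), 0.1 * x - 0.2 * y + 1200.0)
       for x in range(-50, 51, 10) for y in range(-50, 51, 10)]
pts += [(0.0, 0.0, 1300.0), (20.0, 20.0, 900.0)]
a, b, c = ransac_plane(pts)
# a ≈ 0.1, b ≈ -0.2, c ≈ 1200
```

The two outliers are excluded from the consensus set, so the final least-squares refit recovers the generating plane; a plain least-squares fit over all points would be pulled away by them.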
Deviation from the Reference Plane

The so-obtained fit plane was then used as a reference against which the points reconstructed from single frames of the disparity (in contrast to the sequence mean) were compared. For example, figure 7.23, bottom left, shows the reconstruction corresponding to the disparity map shown in figure 7.21, which was derived from a single frame of the stereo image sequence¹. The plot on the bottom right of figure 7.23 illustrates the deviation from the fit plane of the reconstructed points from this frame. This distance from the fit plane was calculated for all points over all 400 frames of the sequence, that is, for a total number of 18.1 · 10⁶ points. The standard deviation of all points from the reference plane was σd2 = 3.25 mm.

¹The particular frame was number 12 of the flat water sequence.

Figure 7.22: Points reconstructed from the mean disparity, shown in blue, and the reference plane fitted to these points, shown in green. The coordinates (x, y, z in mm) are with respect to the coordinate system defined by the Amber Radiance camera as shown in figure 7.11.

Figure 7.23: Top left: Scatter plot of points reconstructed from the mean disparity shown in figure 7.20. Top right: Reference plane fitted through the reconstructed points on the top left using the method described in section A.3. Bottom left: Scatter plot of reconstructed points from a single disparity frame of the sequence of a flat water surface. The corresponding disparity map is shown in figure 7.21.
Bottom right: Deviation of the points on the bottom left from the reference plane. Except for the graphic at the bottom right, the coordinates are with respect to the coordinate system defined by the Amber Radiance camera as shown in figure 7.11. All axis labels are in millimeters. Note that the color bar only refers to the bottom right plot. For better visibility, the scatter plots show only a randomly selected 20-30% of the total number of reconstructed points.

The position of the reference plane is not known exactly (the standard deviation of the points used for the fit from the reference plane is σd1). This is accounted for by combining σd1 and σd2 to obtain the total uncertainty of the reconstruction

σd = √(σd1² + σd2²) = 3.32 mm. (7.5)

Note that equation (7.5) is strictly not correct, as σd1 and σd2 are not really independent.

Limitations

The assessment of the achievable accuracy using the reference plane has several limitations. First, the calculated deviation from the reference plane only includes inaccuracies in the direction normal to the plane. As the level water surface is not oriented perpendicular to the z-axis of the world coordinate frame, defined by the Amber Radiance camera, the estimated uncertainty mixes the lateral uncertainty, along the x- and y-axes, with the range uncertainty (see also figure A.2). However, the result is supported by the theoretical considerations on the range resolution achievable by triangulation presented in appendix A. For the given configuration of the stereo camera system, 10 cm on the water surface are sampled by roughly 250 pixels¹, corresponding to a lateral resolution of 0.4 mm per pixel. Assuming that the disparity estimates are accurate to half a pixel, and calculating the vergence angle based on the distance to the surface of 120 cm and the baseline of 13 cm, equation (A.1) yields a value of ∆d = 3.7 mm, which is in excellent agreement with the value σd = 3.3 mm obtained using the reference plane.
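Both numerical claims, the quadrature combination in equation (7.5) and the range-resolution estimate from equation (A.1), can be checked directly (Python; all input values are taken from the text):

```python
import math

# Equation (7.5): combine the two uncertainty contributions
# in quadrature.
sigma_d1, sigma_d2 = 0.68, 3.25           # mm, from the text
sigma_d = math.hypot(sigma_d1, sigma_d2)  # sqrt(s1^2 + s2^2)
print(round(sigma_d, 2))                  # 3.32 (mm)

# Equation (A.1): range resolution for the quoted configuration,
# baseline 13 cm, distance 120 cm, lateral resolution 0.4 mm/pixel,
# disparity accurate to half a pixel.
baseline, distance = 0.13, 1.20                   # m
alpha = 2 * math.atan((baseline / 2) / distance)  # vergence angle
dx = 0.5 * 0.4                                    # mm
dd = dx * 2 * math.cos(alpha / 2) / math.sin(alpha)
print(round(dd, 1))                               # 3.7 (mm)
```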
Second, the reference plane does not take potential systematic errors into account. A possible source of systematic error, which can affect the disparity estimate, is the small time lag between the exposure of the two cameras. This is not a concern for the image sequence of a still water surface, used to determine the reference plane. However, it will affect the image sequences of water waves. As an example, consider a wave with a wave number of 20 rad/m, equivalent to a wavelength of 31 cm. Such a wave has a phase velocity of 0.7 m s⁻¹. During a time lag of 1 ms, the wave travels over a distance of 0.7 mm. For the configuration of the stereo system used in this work, where a horizontal distance of 10 cm was sampled by roughly 250 pixels, a distance of 0.7 mm can cause a systematic error of 1.75 pixels in the disparity estimate. Effects such as motion blur or defocusing are also not accounted for by the reference plane approach.

¹It is difficult to give an exact value due to the different fields-of-view of the cameras.

For the acquired wave image sequences, the uncertainty of the reconstructed points will therefore be somewhat larger than the 3.3 mm obtained with the reference plane approach.

8 Conclusions

8.1 Summary

A stereo-infrared imaging system for the 3D reconstruction of a wavy water surface was designed, implemented and tested. The stereo-infrared wave gauge does not require a light source, that is, it is passive, and no parts of the system have to be submerged below the water surface. Therefore, it does not interfere with the wave motion and it can be used in the field, either from the boom of a ship or from a permanent structure, such as a pier. In contrast, instruments that require a submerged light source are considerably more difficult to deploy and are not suited for long-term operation from a permanent structure, as they are prone to damage from the environment.
The stereo correspondence problem was solved using a correlation-based disparity matching algorithm. Key requirements for correlation-based matching are that the imaged surfaces be opaque and have enough intensity variation, or texture. These requirements were met by operating in the 3−5 µm region of the infrared spectrum, where water is opaque and, if a heat flux is present, exhibits rich temperature patterns at the surface, which can be recorded by infrared cameras. For several infrared stereo image sequences of water waves at different wind speeds, acquired at the Aeolotron wind-wave flume in Heidelberg, it has been demonstrated that a dense reconstruction of the water surface can be obtained using this stereo system. The 3D reconstruction accuracy obtained with the infrared camera system was evaluated under realistic operating conditions, that is, with a similar image texture and image noise level as for the water wave sequences. To this end, a level water surface was used as a ground truth target. With this method, the accuracy of the system, given as the standard deviation of the reconstructed points from the fit plane, was found to be 3.3 mm.

8.2 Discussion and Outlook

The main contribution of this work is the use of infrared imaging to overcome some of the major problems associated with previous attempts (see sections 1.2 to 1.3) to use stereo photogrammetry for the reconstruction of wavy water surfaces. It was demonstrated that the shape of a water surface can be reconstructed based on stereo infrared image sequences using algorithms from computer vision. As the infrared imagery can also be used to determine the heat flux at the water surface using the methods described by Haußecker et al. [46] and Garbe et al. [36], the stereo infrared wave gauge can be used to investigate the influence of short gravity waves on air-sea exchange processes in the field.
The incorporation of a flexible technique for geometric camera calibration proposed by Zhang [108] permits quick adjustments of the system to adapt to different situations. The main limitations of the system described in this work are the high cost of infrared cameras and the achievable surface resolution. With a reconstruction uncertainty of 3.3 mm for the configuration used in this work, the instrument cannot resolve capillary waves, which have sub-millimeter amplitudes. Given that capillary waves contribute significantly to the mean squared slope of the water surface, this important parameter for air-sea exchange processes cannot be determined with the present system. Several measures could be taken to improve the performance of the instrument. First, the use of two identical cameras and lenses would have several benefits. As was shown in section 7.5, the current stereo camera system does not make efficient use of the available sensor pixels due to the limited overlap of the imaged areas. With two identical cameras, it should be possible to create a greater overlap and make better use of the available sensor area. As an illustration, consider a stereo setup with two cameras of the type Thermosensorik CMT 384. If the areas imaged by the two cameras were arranged to overlap by 80%, which should be possible using identical optics, the number of usable pixels would be around 90000, roughly twice as many as in the current configuration. Such a setup would have the additional benefit that the frame rate, which is currently limited to 60 Hz by the Amber Radiance camera, could be increased to 133 Hz. This would reduce the temporal aliasing effects observed in the image sequences, which are caused by the motion of shorter waves. As a consequence, the use of the temporal smoothness constraint for disparity regularization (see section 5.3.3) would be justified for finer spatial structures.
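The usable-pixel estimate can be checked with simple arithmetic (Python). Note that the 384 × 288 detector size assumed below is an illustrative assumption; the text only gives the camera model name, CMT 384:

```python
# Back-of-the-envelope check of the "around 90000 usable pixels"
# estimate. Assumption (not stated in the text): the CMT 384
# detector has 384 x 288 pixels.
width, height = 384, 288
overlap = 0.80                 # assumed overlap of the imaged areas
usable = width * height * overlap
print(int(usable))             # 88473, i.e. "around 90000"
```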
A potential source of systematic error, the time lag between the exposure of the two cameras, would also be eliminated for a system configuration using two Thermosensorik CMT cameras, because this camera model can be triggered externally. Further hardware improvements are to be expected due to the steady progress in infrared detector technology. As described in Hewett [48], infrared cameras are currently making their way into consumer applications, and detectors with a larger number of pixels and a higher sensitivity are likely to become available.

The use of a wider stereo baseline would improve the range resolution that is achievable using triangulation (see section A.2). However, due to the resulting greater difference in the view angle of the two cameras, this would also make the correlation-based disparity estimation more difficult. Disparity estimation algorithms have received increasing attention from computer vision researchers in recent years, as is documented in a review by Scharstein and Szeliski [82]. Much of the recent work in this field is directed towards preserving disparity discontinuities, which is not of much concern for the present application. However, most of the newer algorithms make use of spatial smoothness constraints by integrating the regularization into the disparity estimation step, which would be useful for the present application. The algorithm proposed by Alvarez et al. [2] is particularly interesting, because it makes use of the known epipolar geometry to formulate the smoothness constraint based on the curvature of the reconstructed surface rather than based on the curvature of the disparity. For the application of wave imaging, such an approach would make it easier to set the parameters controlling the spatial smoothness to meaningful values derived from physical constraints rather than determining them empirically.
More accurate disparity estimates should be possible through integrating the temporal smoothness constraint based on the wave motion (see section 5.3.3) into the stereo matching algorithm. However, a prerequisite for the successful use of a temporal smoothness constraint is stereo image sequences recorded with a higher frame rate, to avoid temporal aliasing. Apart from these technical improvements, the system must be tested in the field to check whether the results obtained in the laboratory can be reproduced. In this regard, experience with field operation of infrared cameras collected during several research cruises over the past years (Schimpf [83], personal communication) is very promising, as infrared image sequences of high quality, that is, with rich texture and few specular reflections, have been recorded for a wide range of environmental conditions. Work should also be directed towards improving the 3D visualization of the obtained results. For example, it is conceivable to use coloring to map spatially resolved heat flux estimates, computed using the image processing method proposed by Garbe et al. [37], onto the reconstructed surface. This would give previously unavailable insights into the spatial and temporal variations of air-sea exchange processes.

A Reconstruction Accuracy

In this appendix, methods to assess the accuracy of the 3D reconstruction obtained with the infrared stereo camera system are discussed. Section A.1 presents the different factors that affect the quality of the reconstruction and their interdependencies. In section A.2, the theoretical limit on the range resolution that can be achieved using triangulation due to the finite image resolution is analyzed. For the stereo camera system presented in this work, the achievable accuracy was assessed by experiment, using a flat water surface as a reference plane (see section 7.8.1).
Section A.3 of this appendix explains the technique used to estimate the orientation and position of this reference plane robustly, given a number of reconstructed points X̃₁ . . . X̃ₙ lying in this plane.

A.1 Factors Influencing the Reconstruction Accuracy

The accuracy of the 3D reconstruction depends on a number of factors, such as the resolution of the cameras, the stereo baseline, the accuracy of the camera calibration, and the quality of the disparity estimates. Some of these factors in turn depend on other factors, such as the noise level and the amount of texture in the stereo image pairs. These factors are often not independent; for example, the accuracy of the disparity estimates is affected by how well the epipolar geometry has been determined through geometric camera calibration. In figure A.1, some of the factors that influence the accuracy of the 3D reconstruction and their interdependencies are identified.

Figure A.1: Some of the factors influencing the accuracy of the 3D reconstruction and their interdependencies.

Adding to the difficulty of quantifying the uncertainty in the system is the fact that some of these factors, such as the image texture, are hard to express as a numerical value. Given the complexity of the interdependencies, deriving an analytical expression that links the final accuracy of the reconstruction to all of these factors is hardly feasible. However, it is possible to analyze individual links in this network of interdependencies. One such link, the influence of the camera resolution and the vergence angle of the stereo system on the achievable range resolution, is analyzed in section A.2. Another approach to obtain an assessment of the total system accuracy of the 3D reconstruction is by experiment. In order to do this, images of a scene containing a surface with known ground truth or marker points at known locations are recorded with the stereo camera system. The total system accuracy is then given by the
deviation of the 3D reconstruction from the known ground truth coordinates. For this experimental approach to be valid, it is important that the ground truth target is acquired under similar conditions as during the actual experiments. Specifically, the image texture and signal-to-noise ratio should be similar. In this work, a flat water surface was used as a ground truth target. Except for the lack of wind waves, it was recorded under exactly the same conditions as the sequences with water waves (see section 6.6.4). The accuracy is then given by the deviation of the reconstruction of the flat water surface from planarity. To calculate this deviation, a reference plane is robustly fitted to the reconstructed points, as described in section A.3.

A.2 Range Resolution of Triangulation

As was shown in section 3.2.1, an image point and the projective center of a camera define a line of sight. Due to the limited resolution of real cameras, the coordinates of a point in the image plane are not known exactly, but are known to lie within a given region around the estimated coordinates. In most cases, the size of this region is determined by the spacing of the sensor elements on the detector chip or, if sub-pixel matching is used, some fraction thereof. In contrast to a single point coordinate, an uncertainty region around the estimated point no longer defines a single line of sight. Instead, it defines a solid angle. Therefore, the range resolution of a 3D reconstruction is given by the volume defined by the intersection of the solid angles corresponding to the uncertainty regions of the image points used for triangulation. The exact size and shape of this intersection volume depends on the exterior orientation of the camera system and on the image coordinates and their uncertainties.
To obtain a rough estimate of how the range resolution ∆d of a point reconstructed by triangulation depends on the image resolution, one can consider the simplified model depicted in figure A.2. Instead of central projection, this model assumes parallel projection. The lateral resolution corresponding to a single pixel is assumed to be ∆x. It is apparent that the relationship between ∆x and ∆d depends on the vergence angle α of the camera system. Using similar triangles, the following relationship can be derived:

∆d = ∆x · 2 cos(α/2) / sin α; (A.1)

that is, ∆d decreases for larger values of α. However, the uncertainty ∆h in the direction perpendicular to ∆d increases for larger values of α. The ratio of the uncertainties is

∆h/∆d = tan(α/2). (A.2)

Figure A.2: Simple geometric model to calculate the uncertainty ∆d of a depth estimate obtained by triangulation, assuming parallel projection with a spatial uncertainty ∆x.

For a vergence angle of α = 90°, ∆d and ∆h are equal, minimizing the total uncertainty. For a more detailed analysis, one would have to take into account that the camera images are formed by central projection, and that the internal and exterior camera parameters are not known exactly. Also, in the present application, the two cameras were not identical and had different resolutions.

A.3 Estimation of a Reference Plane

In section 7.8.1, the 3D reconstruction of a flat water surface was used as a reference plane to assess the accuracy of the infrared stereo camera system. This section briefly describes the algorithm that was used to robustly estimate the parameters of the reference plane, that is, its position and orientation, given a number of reconstructed points X̃₁ . . . X̃ₙ lying in this plane.

A.3.1 Least-Squares Estimate

Planes and points in the projective space P3 are represented by homogeneous 4-vectors, as described in section 3.1.2.
Analogous to the case of points and lines in P², expressed in equation (3.1), a point X̃ in P³ lies on a plane P̃ if

    X̃ᵀ P̃ = 0 ,    (A.3)

that is, if the inner product of the 4-vectors representing the plane and the point is zero. The first three elements of the homogeneous vector P̃ define the surface normal of the plane, and therefore its orientation. The distance of the plane from the origin is determined by the ratio of the fourth element to the norm of the first three elements of the homogeneous vector. Given the homogeneous coordinates X̃₁ … X̃ₙ of n ≥ 3 points on a plane, the homogeneous vector P̃ representing the plane can be estimated by finding the least-squares solution of

    A P̃ = 0 ,    (A.4)

with

        ( X̃₁ᵀ )
    A = (  ⋮  ) .    (A.5)
        ( X̃ₙᵀ )

A.3.2 Robust Estimate

The least-squares solution of equation (A.4) is adequate if the X̃ᵢ differ from the true point coordinates only by added Gaussian noise with zero mean. In practice, some of the points will be outliers to the Gaussian error distribution, that is, they do not lie in the plane in the first place. For example, with a stereo system, outliers will occur if non-corresponding points are matched during disparity estimation. Such outliers can severely distort the least-squares solution of equation (A.4). In the present application, the reference plane has therefore been estimated using a so-called robust fit method, that is, a fit method which is insensitive to outliers. This method is based on the Random Sample Consensus (RANSAC) paradigm proposed by Fischler and Bolles [29] and consists of the steps outlined in algorithm A.1.

Given: a set of points {X̃₁, …, X̃ₙ}.
1. Randomly select a subset of k points out of {X̃₁, …, X̃ₙ}.
2. For this subset, solve equation (A.4) in a least-squares sense. This yields a solution P̃ᵢ for the plane.
3. Find all the points in the whole set {X̃₁, …, X̃ₙ} that lie within a fixed distance d of the estimated plane P̃ᵢ.
These points form the consensus set for the estimate P̃ᵢ.
4. Repeat the above steps N times.
5. Of the different solutions P̃ᵢ, select the one with the largest consensus set. Calculate the least-squares solution to equation (A.4) using all the points in this consensus set. This yields the final fit plane P̃.

Algorithm A.1: Robust fit of a plane to a set of points, based on the RANSAC paradigm [29].

To perform the third step of the algorithm, which is finding the points that lie within a distance d of the estimated plane, it is useful to apply to all points X̃₁, …, X̃ₙ a rotation and a translation that bring the estimated plane P̃ᵢ onto the Z = 0 plane. After this transformation, the consensus set is found by selecting the transformed points whose Z-coordinate lies in the interval [−d, d]. For the plane fit in the present application, described in section 7.8.1, the largest consensus set out of N = 10 runs with subsets of k = 1000 out of a total of 34046 points was used. Points at a distance of 1.5 mm or more from the plane were classified as outliers. The largest consensus set, which was used for the final least-squares estimate, comprised 33156 points, or 97.4% of the total.

B Estimating Planar Homographies

In order to perform the camera calibration procedure described in section 3.4.2, it is necessary to estimate a planar homography, that is, a projective transformation in P², from point correspondences. The estimation of a homography from point correspondences is a standard problem in computer vision, and appropriate algorithms are described, for example, in [44]. For completeness, this section gives a brief summary of how to estimate a homography.

B.1 Problem statement

Given a set of points with coordinates pᵢ = (px, py)ᵀ on a plane, and a set of corresponding points mᵢ = (mx, my)ᵀ that are related by a projective homography as

    m̃ᵢ = λ (mx, my, 1)ᵀ = H p̃ᵢ = H (px, py, 1)ᵀ ,    (B.1)

the problem is to estimate the matrix H describing the homography.
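The mapping in equation (B.1) can be made concrete with a short sketch; the matrix H below is an arbitrary, hypothetical homography chosen only for illustration:

```python
import numpy as np

# hypothetical homography for illustration (not from the calibration data)
H = np.array([[0.9, -0.1,  5.0],
              [0.2,  1.1, -3.0],
              [1e-3, 2e-3, 1.0]])

def apply_homography(H, p):
    """Map the inhomogeneous point p = (px, py) through H as in eq. (B.1)."""
    m = H @ np.array([p[0], p[1], 1.0])  # homogeneous image point
    return m[:2] / m[2]                  # divide out the free scale factor lambda

print(apply_homography(H, (0.0, 0.0)))   # the origin maps to (5, -3)
```

Each such correspondence constrains H, and four correspondences in general position suffice to determine it, which is what the linear estimate below exploits.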
For example, the points could correspond to two different images that are related by a projective transformation or, as in the present application, to points on a plane in 3D space and their images obtained with a projective camera. Leaving degenerate configurations aside, a solution for H can be found if at least four point correspondences are known, because each correspondence gives two constraints on the eight degrees of freedom of H (the nine matrix elements minus one, because projective transformations are only defined up to a scale factor). Similar to the estimation of the camera parameters in section 3.4, the estimation of the homography consists of two main steps: First, an initial solution is found by stacking the constraints on H imposed by the point correspondences and solving the resulting linear system. Second, this linear solution is used as an initial value for a non-linear optimization routine that minimizes a geometric error functional.

B.2 Initial Guess

Let the row vectors of H be H₁ᵀ, H₂ᵀ, H₃ᵀ. Then (B.1) becomes

    m̃ᵢ = (u, v, w)ᵀ = (H₁ᵀ p̃, H₂ᵀ p̃, H₃ᵀ p̃)ᵀ .    (B.2)

Without loss of generality, it can be assumed that the homogeneous vector m̃ is scaled such that w = 1. Since m̃ and H p̃ are equal only up to scale, the first component of (B.2) gives u = H₁ᵀ p̃ / H₃ᵀ p̃, which can be rearranged into the linear constraint

    p̃ᵀ H₁ − u p̃ᵀ H₃ = 0 .    (B.3)

Similarly, the second component yields

    p̃ᵀ H₂ − v p̃ᵀ H₃ = 0 .    (B.4)

Equations (B.3) and (B.4) are constraints on H and can be written in matrix form as

    ( p̃ᵀ   0ᵀ   −u p̃ᵀ )   ( H₁ )
    (                  ) · ( H₂ ) = 0 .    (B.5)
    ( 0ᵀ   p̃ᵀ   −v p̃ᵀ )   ( H₃ )

For each pair of corresponding points one obtains a linear system of the form (B.5). These systems can be stacked to obtain the equation

    L H = 0 ,    (B.6)

where L is a 2n × 9 matrix that contains two rows for each point correspondence, and where H = (H₁ᵀ, H₂ᵀ, H₃ᵀ)ᵀ is a 9-vector with the elements of H. Equation (B.6) can be solved for H if at least four point correspondences are known.
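A minimal sketch of this linear estimate, assuming exact correspondences generated from a hypothetical matrix H_true (the data normalization recommended by Hartley [42] is omitted for brevity):

```python
import numpy as np

def estimate_homography(p, m):
    """Linear estimate of H: stack the constraints (B.3)/(B.4) for each
    correspondence into the 2n x 9 matrix L of eq. (B.6) and take the
    singular vector of L belonging to its smallest singular value."""
    rows = []
    for (px, py), (u, v) in zip(p, m):
        pt = [px, py, 1.0]
        rows.append(pt + [0.0, 0.0, 0.0] + [-u * c for c in pt])
        rows.append([0.0, 0.0, 0.0] + pt + [-v * c for c in pt])
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    H = Vt[-1].reshape(3, 3)   # null-space direction of L, minimizing ||LH||
    return H / H[2, 2]         # fix the free overall scale

# consistency check against a known (hypothetical) homography
H_true = np.array([[1.2, 0.1, -4.0],
                   [0.0, 0.9,  2.0],
                   [1e-3, 0.0, 1.0]])
p = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 3.0)]
m = [(H_true @ [x, y, 1.0])[:2] / (H_true @ [x, y, 1.0])[2] for x, y in p]
H_est = estimate_homography(p, m)
```

With noise-free correspondences the smallest singular value of L is zero and the estimate recovers H_true exactly up to scale; with noisy measurements it minimizes only the algebraic error, which is why the non-linear refinement of section B.3 follows.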
In general, there will be no exact solution due to noise in the point measurements, but the solution that minimizes ‖LH‖ for a non-trivial H can be found by a singular value decomposition of L, as the singular vector corresponding to the smallest singular value (see, for example, Golub and van Loan [39]). As described by Hartley [42] for a similar type of estimation problem, the accuracy of the solution can be improved by a prior data normalization, which makes the matrix L numerically better conditioned.

B.3 Minimizing Geometric Error

The algebraic error functional ‖LH‖ that is minimized by solving the linear system (B.6) in a least-squares sense has no intuitively meaningful geometric interpretation. Therefore, it is common to use a non-linear optimization technique to minimize a geometrically meaningful error functional e(H). The linear solution of equation (B.6) is used as a starting value for the non-linear optimization routine. A suitable error functional is

    e(H) = Σᵢ (mᵢ − m̆ᵢ)² ,    (B.7)

where the m̆ᵢ are the points pᵢ transformed by H,

    m̆ᵢ = (1 / H₃ᵀ p̃ᵢ) (H₁ᵀ p̃ᵢ, H₂ᵀ p̃ᵢ)ᵀ .

This choice of e(H), called the transfer error, is the sum of the squared Euclidean distances between the original points pᵢ transformed by H and their corresponding points mᵢ as determined from the image.

The implementation of the camera calibration used in the present application, due to Bouguet [10], uses a gradient descent algorithm (see Press et al. [75]) to minimize the error functional (B.7).

Acknowledgements / Danksagung

Thanks go to …
Bernd Jähne for giving me the opportunity to work on this interesting topic.
Kurt Roth for acting as the second referee.
Richard Hartley, for the opportunity to do part of this work in Canberra.
All members of the “Windkanaler” and image processing groups for the good company and many interesting discussions.
All the proofreaders, especially Madeleine.
My parents.
Vielen Dank!
Index absorptivity, 34 accuracy of reconstruction, 138, 149 acquisition system, 104 Aeolotron, 11, 101, 109 algebraic error, 61, 157 apparent temperature, 94 apparent temperature, 44 area based stereo, 76 blackbody, 31 laboratory, 104 Santa Barbara Infrared, 104 bolometers, 41 Boltzmann’s constant, 32 boresight alignment, 103 calibration camera geometry, 56 geometric, results, 118 process (camera geometry), 57 radiometric, 25, 43, 93, 94 target, 108 camera calibration, 47, 56 camera calibration matrix, 51 camera center, 50 camera projection matrix, 52, 125 CCD-type cameras, 53 center of projection, 50 close-range, 23 Color Imaging Slope Gauge, 17 complex refractive index, 36 cool skin of the ocean, 9 correspondence problem, 75 cross-correlation, 84 dense stereo, 76 depth-of-field, 102 detector output, 43 dielectric constant, 36 dielectric function ²(λ), 36 disparity, 75 map, 77 sub-pixel estimation, 87 underlying assumptions, 79 disparity space image, 78 distortion correcting for, 55 non linear, 53, 54 radial, 54 tangential, 55 duality, 49 emissivity, 34 temperature measurements, 44 epipolar constraint, 62 epipolar line, 65, 76 epipolar plane, 65 Euler-Lagrange, 97 exterior camera parameters, 52, 122 feature-based stereo, 76 field-of-view, 102 focal length, 50 focal plane, 50 focal plane array, 41, 93 foreshortened, 28 FPA focal plane array, 41 frame grabber, 104 fronto-parallel, 79 fronto-parallel assumption, 80 fundamental matrix, 62, 66 Gaussian image pyramids, 89, 128 geometric error, 61, 157 grazing angles, 80 160 Index greenhouse gases, 7 greybody, 35 Herschel, William, 30 homogeneous, 47, 48 homography, 49, 58 estimation of, 155 homologous, 75, 79 IAPSO, 25 ideal diffuse, 29 image coordinate system, 50 image plane, 50 image pyramids, 83 imaging, 23 index of refraction, 36 infrared cameras, 101 Intergovernmental Panel on Climate Change, 7 interior camera parameters, 53, 55, 120 irradiance, 27 Kirchhoff’s Law, 34 Kirchhoff’s law, 31 Lambert’s cosine law, 29 
Lambertian assumption in matching, 80 surface, 29 Laplace, 97 membrane model, 93, 96, 97, 100 microEnable 2, 104 MWIR, 42 noise equivalent temperature difference, 40 non-uniformity correction, 93, 102 Opacity, 81 optical flow, 23 outlier removal, 95 passive, 23 remote sensing device, 22 penetration depth, 38 161 photon detector arrays, 41 photon detectors, 41 pinhole camera model, 50 pinhole model, 47 pixel pitch, 53 Planck, 31 Planck’s constant, 32 points at infinity, 49 principal axis, 50 quantum efficiency, 41 QWIP, 42 radial distortion, 54 Radiance, 28 radiant energy, 27 Radiant exitance, 27 radiant flux, 27 Radiant intensity, 28 radiant power, 27 Radiometric calibration, 93 radiometric calibration, 43, 94 radiometric quantities, 26 radiometry, 25 RAID, 106 RANSAC Random sample consensus, 153 Raytheon Amber Radiance, 101 re-projection error, 61, 122 rectification, 62, 67, 126 Reflective Stereo Slope Gauge, 16 reflectivity, 34 regularization, 93 remote-sensing, 23 representation of matches, 77 S.M.S. Hyäne, 13 Santa Barbara Infrared Inc., 104 Schmidt number scaling, 11 Selective emitters, 35 shape-from-shading, 17 shiftable windows, 83 skew, 53 162 solid angle, 26 spectral radiant energy, 30 spherical coordinates, 26 Stefan-Boltzmann law, 32 stereo area-based, 76 correspondence problem, 75 feature-based, 76 particle tracking velocimetry, 76 underlying assumptions, 79 stereo particle tracking velocimetry, 76 sub-pixel disparity, 87 synchronization of cameras, 106 tangential distortion, 55 temperature apparent, 44, 95 measurement, 44 texture, 130 assumption in matching, 82 thermal detector arrays, 41 Thermosensorik CMT 384, 101 transmissivity, 34 triangulation, 73 accuracy of, 138, 150, 151 figure, 13 Tsukuba, 128 verging angle, 103 vignetting, 115 Wien’s displacement law, 32 world coordinate system, 52, 122 Index Bibliography [1] L. P. Adams and J. D. Pos. Model harbour wave form studies by short range photogrammetry. Photogrammetric Record, 10(58):457–470, 1981. 
[2] L. Alvarez, R. Deriche, J. Sanchez, and J. Weickert. Dense disparity map estimation respecting image discontinuities: A PDE and scale-space based approach. Technical Report RR3874, INRIA, Sophia-Antipolis, France, January 2000. [3] G. Balschbach. Untersuchungen statistischer und geometrischer Eigenschaften von Windwellen und ihrer Wechselwirkung. PhD thesis, Universität Heidelberg, February 2000. [4] G. Balschbach, J. Klinke, and B. Jähne. Multichannel shape from shading techniques for moving specular surfaces. In Proceedings of ECCV 98, 1998. [5] M. L. Banner, I. S. F. Jones, and J. C. Trinder. Wave number spectra of short gravity waves. J. Fluid Mech., 198:321–344, 1989. [6] E. Bock. Personal communication, 2000. [7] E. Bock et al. Overview of the CoOP experiments: Physical and chemical measurements parameterizing air-sea heat exchange. In M. Donelan, editor, Gas Transfer at Water Surfaces. American Geophysical Union, 2002. [8] E. J. Bock and T. Hara. Optical measurements of capillary-gravity wave spectra using a scanning laser slope gauge. Journal of Atmospheric and Oceanic Technology, 12(2):395–403, April 1995. [9] E. J. Bock and T. Hara. Relationship between air-sea gas transfer and short wind waves. J. Geophys. Res., 104(25821), 1999. [10] J.-Y. Bouguet. Camera calibration toolbox for Matlab. http://www.vision.caltech.edu/~bouguetj/calib_doc/index.html, 2002. [11] D. C. Brown. Decentering distortion of lenses. Photogrammetric Engineering, 32(3):444–462, 1966. [12] D. C. Brown. Lens distortion for close-range photogrammetry. Photogrammetric Engineering, 37(8):855–866, 1971. [13] M. Z. Brown, D. Burschka, and G. D. Hager. Advances in computational stereo. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(8):993–1008, August 2003. [14] T. A. Clarke and J. G. Fryer. The development of camera calibration methods and models. Photogrammetric Record, 16(91):51–66, April 1998. [15] L. J. Cote.
The directional spectrum of a wind generated sea as determined from data obtained by the stereo wave observation project. Meteorological Papers, 2(6):1–88, 1960. [16] C. Cox and W. Munk. Measurements of the roughness of the sea surface from photographs of the sun's glitter. Journal of the Optical Society of America, 44(11):838–850, 1954. [17] C. Cox and W. Munk. Statistics of the sea surface derived from sun glitter. Journal of Marine Research, 13(2):198–227, 1954. [18] J. Cruset. Photogrammetric measurement of the sea swell. Photogrammetria, pages 122–125, 1953. [19] U. R. Dhond and J. K. Aggarwal. Structure from stereo - a review. IEEE Transactions on Systems, Man, and Cybernetics, 19(6):1489–1510, November/December 1989. [20] J. Dieter. Analysis of small ocean wind waves by image sequence analysis of specular reflections. PhD thesis, University of Heidelberg, Germany, May 1998. [21] J. Dieter, H. Lauer, and B. Jähne. Measurements of slope statistics on a wind driven water surface. In A. Gruen and H. Kahmen, editors, Optical 3D Measurement Techniques 4 - Applications in architecture, quality control, robotics, navigation, medical imaging and animation, pages 357–364, 1997. [22] H. D. Downing and D. Williams. Optical constants of water in the infrared. Journal of Geophysical Research, 80(12):1656–1661, 1975. [23] V. A. Dulov, V. N. Kudryavtsev, and A. N. Bolshakov. A field study of whitecap coverage and its modulations by energy containing surface waves. In M. Donelan, E. Saltzman, and R. Wanninkhof, editors, Gas Transfer at Water Surfaces. American Geophysical Union, 2002. [24] D. Engelmann, C. S. Garbe, M. Stöhr, and B. Jähne. Three dimensional flow dynamics beneath the air-water interface. In Proc. of the Symposium on the Wind-Driven Air-Sea Interface, page 181, Sydney, Australia, 1999. [25] O. Faugeras. Three-dimensional computer vision: A geometric viewpoint. MIT Press, 1993. [26] O. Faugeras, B. Hotz, et al.
Real time correlation-based stereo: algorithm, implementations and applications. Technical Report 2013, INRIA, August 1993. [27] O. Faugeras and Q.-T. Luong. The geometry of multiple images. MIT Press, 2001. [28] R. Feynman, R. B. Leighton, and M. Sands. The Feynman lectures on physics. Addison Wesley, 1964. [29] M. Fischler and R. Bolles. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6):381–395, 1981. [30] N. M. Frew. The role of organic films in air-sea gas exchange. In P. S. Liss and R. A. Duce, editors, The Sea Surface and Global Change, chapter 5, pages 121–171. Cambridge University Press, Cambridge, UK, 1997. [31] N. M. Frew, E. J. Bock, W. R. McGillis, A. V. Karachintsev, T. Hara, T. Münsterer, and B. Jähne. Variation of air-water gas transfer with wind stress and surface viscoelasticity. In B. Jähne and E. C. Monahan, editors, Air-Water Gas Transfer, selected papers from the Third International Symposium on Air-Water Gas Transfer, Hanau, 1995. [32] A. Fusiello, A. Trucco, and A. Verri. A compact algorithm for rectification of stereo pairs. Machine Vision and Applications, 12:16–22, 2000. [33] D. Fuß. Kombinierte Höhen- und Neigungsmessung von winderzeugten Wasserwellen (in preparation). PhD thesis, University of Heidelberg, Germany, 2004. [34] C. Garbe. Entwicklung eines Systems zur dreidimensionalen Particle Tracking Velocimetry mit Genauigkeitsuntersuchungen und Anwendung bei Messungen in einem Wind-Wellen Kanal. Master's thesis, University of Heidelberg, Heidelberg, Germany, 1998. [35] C. S. Garbe. Measuring heat exchange processes at the air-water interface from thermographic image sequence analysis. PhD thesis, University of Heidelberg, Heidelberg, Germany, December 2001. [36] C. S. Garbe, H. Haußecker, and B. Jähne. Measuring the sea surface heat flux and probability distribution of surface renewal events. In E. Saltzman, M.
Donelan, W. Drennan, and R. Wanninkhof, editors, Gas Transfer at Water Surfaces, Geophysical Monograph. American Geophysical Union, 2001. [37] C. S. Garbe, B. Jähne, and H. Haußecker. Measuring the sea surface heat flux and probability distribution of surface renewal events. In M. Donelan, E. Saltzman, and R. Wanninkhof, editors, Gas Transfer at Water Surfaces, pages 109–114. American Geophysical Union, 2002. [38] J. Gluckman and S. K. Nayar. Rectifying transformations that minimize resampling effects. In Proc. of IEEE Computer Vision and Pattern Recognition, Kauai, Hawaii, December 2001. [39] G. H. Golub and C. F. van Loan. Matrix computations. Johns Hopkins University Press, Baltimore and London, 3rd edition, 1996. [40] V. Haltrin. Fresnel reflection coefficient of very turbid waters. In Proceedings of the 5th International Conference: Remote Sensing for Marine and Coastal Environments, San Diego, volume II, 1998. [41] C. Harris and M. Stephens. A combined corner and edge detector. In Proceedings of the 4th Alvey Vision Conference, pages 147–151, 1988. [42] R. I. Hartley. In defence of the 8-point algorithm. In ICCV, Boston, June 1995. [43] R. I. Hartley and P. F. Sturm. Triangulation. Computer Vision and Image Understanding, 68(2):146–157, November 1997. [44] R. I. Hartley and A. Zisserman. Multiple view geometry in computer vision. Cambridge University Press, Cambridge, U.K., 2000. [45] H. Haußecker. Radiation. In B. Jähne, H. Haußecker, and P. Geißler, editors, Handbook of Computer Vision and Applications, volume 1, chapter 2, pages 7–35. Academic Press, San Diego, CA, 1999. [46] H. Haußecker, U. Schimpf, C. S. Garbe, and B. Jähne. Physics from IR image sequences: Quantitative analysis of transport models and parameters of air-sea gas transfer. In M. Donelan, E. Saltzman, W. Drennan, and R. Wanninkhof, editors, Gas Transfer at Water Surfaces, Geophysical Monograph. American Geophysical Union, 2001. [47] J. Heikkilä and O. Silven.
A four-step camera calibration procedure with implicit image correction. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1106–1112, 1997. [48] J. Hewett. Infrared cameras near consumer applications. Interview with Earl Lewis of FLIR systems. Opto and Laser Europe, pages 15–17, January 2004. [49] N. K. Hoejerslev. Optical properties of sea water. In Oceanography, volume 3 of Landolt-Börnstein: Numerical data and functional relationships in science and technology (New Series), page 383. Springer, 1986. [50] G. C. Holst. Testing and evaluation of infrared imaging systems. SPIE Optical Engineering Press, 2nd edition, 1998. [51] G. C. Holst. Common sense approach to thermal imaging. SPIE Optical Engineering Press, Bellingham, WA, 2000. [52] B. K. P. Horn. Robot vision. MIT Press, Cambridge, MA, 1986. [53] IPCC. Climate Change 2001: Synthesis report. A contribution of working groups I, II, and III to the third assessment report of the Intergovernmental Panel on Climate Change. Cambridge University Press, 2001. [54] B. Jähne. The Heidelberg Aeolotron air-sea interaction facility. CD-ROM Aeon Verlag, Hanau, 2000. [55] B. Jähne. Personal communication, 2000. [56] B. Jähne. Digital image processing. Springer-Verlag, Heidelberg, Germany, 5th edition, 2002. [57] B. Jähne, P. Geißler, and H. Haußecker. Handbook of computer vision and applications. Academic Press, San Diego, 1999. [58] B. Jähne, H. Haußecker, U. Schimpf, and G. Balschbach. The Heidelberg Aeolotron - a new facility for laboratory investigations of small scale air-sea interaction. In M. L. Banner, editor, The Wind-Driven Air-Sea Interface: Electromagnetic and Acoustic Sensing, Wave Dynamics and Turbulent Fluxes, Sydney, Australia, 1999. [59] B. Jähne, J. Klinke, and S. Waas. Imaging of short ocean wind waves: a critical theoretical review. Journal of the Optical Society of America, 11(8):2197–2209, 1994. [60] B. Jähne, K. O. Münnich, R. Bösinger, A. Dutzi, W. Huber, and P.
Libner. On the parameters influencing air-water gas exchange. Journal of Geophysical Research, 92(C2):1937–1949, 1987. [61] B. Jähne and K. Riemer. Two-dimensional wave number spectra of small-scale water surface waves. J. Geophys. Res., 95:11531–11646, 1990. [62] C. Janßen. Ein miniaturisiertes Endoskop-Stereomesssystem zur Strömungsvisualisierung in Kiesbetten. Master's thesis, University of Heidelberg, Germany, July 2000. [63] B. Kinsman. Wind waves. Prentice Hall, Englewood Cliffs, 1965. [64] C. Kittel. Introduction to solid state physics. Wiley & Sons, New York, 1971. [65] M. Klar. 3-D particle-tracking velocimetry applied to turbulent open-channel flow over a gravel layer. Master's thesis, University of Heidelberg, Germany, May 2001. [66] J. Klinke. Optical measurements of small-scale wind generated water surface waves in the laboratory and the field. PhD thesis, University of Heidelberg, Germany, 1996. [67] J. Klinke and B. Jähne. Measurements of short ocean waves during the MBL ARI West Coast experiment. In B. J. Monahan, editor, Air-Water Gas Transfer - Selected papers from the Third International Symposium on Air-Water Gas Transfer, pages 165–173, 1995. [68] E. Kohlschütter. Die Forschungsreise S.M.S. Planet II. Stereophotogrammetrische Aufnahmen. Annalen der Hydrographie und Maritimen Meteorologie, 34:220–227, 1906. [69] P. S. Liss et al., editors. Solas science plan and implementation strategy, April 2003. [70] P. S. Liss and L. Merlivat. Air-sea gas exchange rates: Introduction and synthesis. In P. Buat-Menard, editor, The role of air-sea exchange in geochemical cycling, pages 113–129. Reidel, Boston, MA, 1986. [71] C. Loop and Z. Zhang. Computing rectifying homographies for stereo vision. In IEEE Conference on Computer Vision and Pattern Recognition, volume 1, pages 125–131, Fort Collins, CO, June 1999. [72] T. Luhmann and W. Tecklenburg. Optische Messung der Wellentopographie.
Technical report, Fachhochschule Oldenburg, Institut für angewandte Photogrammetrie, 2000. [73] A. Mucha and B. Szczechowski. Photogrammetrische Vermessung des Wellengangs an hydrotechnischen Modellen. Jenaer Rundschau, (4), 1983. [74] M. Planck. Distribution of energy in the spectrum. Annalen der Physik, 4(3):553–563, 1901. [75] W. Press, S. Teukolsky, W. Vetterling, and B. Flannery. Numerical recipes in C. Cambridge University Press, Cambridge, MA, 2nd edition, 1992. [76] G. Redweik. Untersuchung zur Eignung der digitalen Bildzuordnung für die Ableitung von Seegangsparametern. PhD thesis, Fachbereich Vermessungswesen der Universität Hannover, 1993. [77] Rottok. Meereswellen-Beobachtungen. Annalen der Hydrographie und Maritimen Meteorologie, (8):329–341, 1903. [78] F. Santel, C. Heipke, K. S., and H. Wegmann. Image sequence matching for the determination of three-dimensional wave surfaces. In Proceedings of the ISPRS Commission V Symposium, volume XXXIV, pages 596–600, Corfu, Greece, September 2002. [79] P. M. Saunders. The temperature at the ocean-air interface. Journal of Atmospherical Sciences, 24(3):269–273, 1967. [80] H. Scharr and J. Weickert. An anisotropic diffusion algorithm with optimized rotation invariance. In DAGM, pages 460–467, Kiel, Germany, September 2000. [81] D. Scharstein and R. Szeliski. Stereo vision research page. http://www.middlebury.edu/stereo, Middlebury College, Middlebury, VT, USA, 2002. [82] D. Scharstein and R. Szeliski. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. International Journal of Computer Vision, 47(1):7–42, 2002. [83] U. Schimpf. Personal communication, 2003. [84] U. Schimpf, C. S. Garbe, and B. Jähne. Investigation of the transport processes across the sea-surface microlayer by infrared imagery. Journal of Geophysical Research, 2004. To appear. [85] H. Schultz. Specular surface stereo: a new method for retrieving the shape of a water surface. In L.
Estep, editor, Optics of the air-sea interface: theory and measurements, number 1749, pages 283–294, 1992. [86] H. Schultz. Shape reconstruction from multiple images of the ocean surface. Technical report, Department of Computer Science, University of Massachusetts, Amherst, 1994. [87] A. Schumacher. Die stereophotogrammetrischen Wellenaufnahmen der Deutschen Atlantischen Expedition. Ergänzungsheft 3 zur Zeitschrift der Gesellschaft für Erdkunde, pages 105–120, 1928. [88] A. Schumacher. Stereophotogrammetrische Wellenaufnahme mit schneller Bildfolge. Deutsche Hydrographische Zeitschrift, 3(1/3):78–82, 1950. [89] C. E. Shannon. The mathematical theory of information. University of Illinois Press, Urbana, IL, 1949. [90] O. H. Shemdin and H. M. Tran. Measuring short surface waves with stereophotography. Photogrammetric Eng. and Remote Sens., 93:311–316, 1992. [91] O. H. Shemdin, H. M. Tran, and S. C. Wu. Directional measurements of short ocean waves with stereophotography. J. Geophys. Res., 93:13891–13901, 1988. [92] H. Spies. Bewegungsdetektion und Geschwindigkeitsanalyse zur Untersuchung von Sedimentverlagerungen und Porenströmungen. Master's thesis, University of Heidelberg, Germany, 1998. [93] H. Spies. Analysing dynamic processes in range data sequences. PhD thesis, University of Heidelberg, Heidelberg, Germany, July 2001. [94] H. Spies, H. Haussecker, and H. J. Köhler. Material transport and structure changes at soil-water interfaces. In Geofilters, pages 91–97, Warsaw, Poland, 2000. [95] R. H. Stewart. Methods of satellite oceanography. University of California Press, Berkeley, 1985. [96] D. J. Stilwell. Directional energy spectra of the sea from photographs. J. Geophys. Res., 74:1974–1986, 1969. [97] T. Takahashi. Net sea-air CO2 flux over the global oceans: An improved estimate based on the sea-air pCO2 difference. In Proceedings of the 2nd International Symposium CO2 in the Oceans, Tsukuba, Japan, 1999.
Center for Global Environmental Research, National Institute for Environmental Studies. [98] T. Takahashi and S. C. Sutherland. Global sea-air CO2 flux based on climatological surface ocean pCO2, and seasonal biological and temperature effects. Deep Sea Research II, 49(1601), 2002. [99] D. Tschumperle. PDEs based regularization of multivalued images and applications. PhD thesis, Universite de Nice, Sophia Antipolis, France, 2002. [100] H. Vogel. Gerthsen Physik. Springer, 18th edition, 1995. [101] S. Waas. Entwicklung eines Verfahrens zur Messung kombinierter Höhen- u. Neigungsverteilungen von Wasserwellen. Master's thesis, University of Heidelberg, 1988. [102] S. Waas. Combined slope-height measurements of short wind waves: First results from field and laboratory measurements. In Optics of the Air-Sea Interface: Theory and Measurements, San Diego, 1992. SPIE's 1992 International Symposium. [103] S. Waas. Entwicklung eines feldgängigen optischen Meßsystems zur stereoskopischen Messung von Wasseroberflächenwellen. PhD thesis, Universität Heidelberg, 1992. [104] R. Wanninkhof. Relationship between gas exchange and wind speed over the ocean. Journal of Geophysical Research, 97(C5):7373–7382, 1992. [105] D. Wierzimok and B. Jähne. 3-dimensional particle tracking beneath a wind-stressed wavy water surface with image processing. In D. Wierzimok and B. Jähne, editors, 5th Int. Symp. Flow Visualization, Prague, August 1989. [106] C. J. Zappa, W. Asher, A. T. Jessup, J. Klinke, and S. Long. Effect of microscale wave breaking on air-water gas transfer. In M. Donelan, E. Saltzman, W. Drennan, and R. Wanninkhof, editors, Gas Transfer at Water Surfaces, Geophysical Monograph. American Geophysical Union, 2001. [107] Z. Zhang. Flexible camera calibration by viewing a plane from unknown orientations. In Proceedings of the ICCV, 1999. [108] Z. Zhang. A flexible new technique for camera calibration.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(11):1330–1334, November 2000. [109] C. L. Zitnick and T. Kanade. A cooperative algorithm for stereo matching and occlusion detection. PAMI, 22(7):675–684, July 2000.
