Inaugural-Dissertation zur Erlangung der Doktorwürde der Naturwissenschaftlich–Mathematischen Gesamtfakultät der Ruprecht–Karls–Universität Heidelberg

vorgelegt von Diplom–Informatikerin, Diplom–Mathematikerin Claudia Kondermann aus Bocholt

Tag der mündlichen Prüfung: 13. Juli 2009

Postprocessing and Restoration of Optical Flows

1. Gutachter: PD Dr. Christoph Garbe, Digitale Bildverarbeitung, Interdisziplinäres Zentrum für Wissenschaftliches Rechnen, Universität Heidelberg
2. Gutachter: Prof. Dr. Rudolf Mester, Visuelle Sensorik und Informationsverarbeitung, Universität Frankfurt

Abstract

The notion "optical flow" refers to the apparent motion in the image plane produced by the projection of the real 3D motion onto the 2D image plane. The thesis at hand addresses postprocessing and restoration methods for arbitrarily computed optical flow fields. Many motion estimators have been proposed during the last three decades, but all of them suffer from shortcomings in difficult situations. Hence, it is of utmost importance for any optical flow measurement technique to give a prediction of the quality and reliability of each individual flow vector. Yet, a sound, universally applicable, and statistically motivated confidence measure for optical flow measurements is still missing today. Based on such information, erroneous optical flow fields can be restored or improved by means of inpainting techniques. This thesis introduces three confidence measures, which evaluate the reliability of optical flow vectors. In contrast to previously employed methods, these confidence measures are based on learned motion models and are, thus, statistically motivated; they are independent of the original flow computation method and yield more accurate predictions of the quality of optical flow vectors. The thesis puts a second focus on the restoration of optical flow fields, where it transfers inpainting techniques from the restoration of images to the field of motion recovery.
Since the reconstruction process in the case of motion fields can use the image sequence as an additional source of information, a novel motion inpainting approach is proposed. It combines motion and image information in one functional and thus makes it possible to control the orientation of the reconstruction algorithm based on image edges.

Zusammenfassung

Der Begriff "Optischer Fluss" bezeichnet die scheinbare Bewegung auf der Bildebene, die durch die Projektion der realen 3D-Bewegung auf die 2D-Bildebene erzeugt wird. Die vorliegende Arbeit befasst sich mit der Nachbearbeitung und Wiederherstellung beliebig berechneter Flussfelder. Viele Bewegungsschätzer sind in den letzten drei Jahrzehnten vorgeschlagen worden, aber alle weisen in schwierigen Situationen Mängel auf. Deshalb ist es von höchster Wichtigkeit für jede Flussberechnungsmethode, eine Schätzung der Qualität und Zuverlässigkeit für jeden einzelnen Flussvektor anzugeben. Jedoch fehlt ein solides, allgemein anwendbares und statistisch motiviertes Konfidenzmaß für Flussberechnungen bis heute. Basierend auf diesen Informationen können mittels sogenannter "Inpainting"-Methoden fehlerhafte Flussfelder wiederhergestellt oder verbessert werden. Im Rahmen dieser Arbeit werden drei Konfidenzmaße vorgeschlagen, die die Zuverlässigkeit von Flussvektoren bewerten. Im Unterschied zu den bisher verwendeten Methoden basieren diese Konfidenzmaße auf gelernten Bewegungsmodellen und sind somit statistisch motiviert; sie sind unabhängig von der zu Grunde liegenden Flussberechnungsmethode und liefern genauere Vorhersagen über die Qualität der Flussvektoren. Ein zweiter Schwerpunkt der Arbeit liegt auf der Wiederherstellung von Flussfeldern. Sie überträgt "Inpainting"-Methoden von der Bildrestauration auf das Feld der Bewegungsrekonstruktion. Da der Rekonstruktionsprozess im Falle von Bewegungsfeldern zusätzlich die Bildsequenz als Informationsquelle benutzen kann, wird ein neuer Ansatz zur Rekonstruktion von Bewegungen vorgeschlagen.
Er kombiniert Bewegungs- und Bildinformationen in einem Funktional und erlaubt dadurch die Orientierung des Rekonstruktionsprozesses an Bildkanten.

Acknowledgements

During the last three years I have worked with people from different mathematical backgrounds, whose ideas and criticism were vital to this work. First of all I want to thank PD Dr. Christoph Garbe for offering me the PhD position despite the long waiting time, for introducing me to the world of optical flows, for inspiring discussions, his constant support and night shifts of paper revisions. I am indebted to Prof. Dr. Rudolf Mester, whose support and commitment in the field of statistics constitute a significant contribution to the success of this thesis. It was a pleasure to work with both of you. Many thanks also go to Prof. Dr. Martin Rumpf and Benjamin Berkels for fruitful discussions on the topic of optical flow restoration. Finally, I thank Daniel Kondermann for many hours of interesting discussions. Last but not least, I want to thank my colleagues for the great working environment at the IWR and the HCI and all the fun we have had during our joint free-time activities. Especially I want to thank Barbara Werner and my office neighbor Nikos Gianniotis for the many enlivening conversations and joyful hours we have spent together.

Contents

1 Introduction
  1.1 Introduction
  1.2 Thesis Outline
  1.3 Notation
2 Mathematical Preliminaries
  2.1 The Calculus of Variations
  2.2 Hypothesis Testing
  2.3 Best Linear Unbiased Estimators
  2.4 Intrinsic Dimensions
  2.5 Principal Component Analysis
  2.6 The Least Squares Method
3 Optical Flow Estimation
  3.1 Local Methods
  3.2 Global Methods
4 Error Analysis
  4.1 Introduction
  4.2 Discussion of the Angular Error
  4.3 The Joint Distribution of Optical Flow Estimation
  4.4 Experiments and Results
  4.5 Summary and Conclusion
5 Predictability and Situation Measures
  5.1 Introduction
  5.2 Classification of Situation Measures
  5.3 Experiments and Results
  5.4 Summary and Conclusion
6 Surface Measures
  6.1 Introduction
  6.2 Surface Measures
  6.3 Computational Issues
  6.4 Experiments and Results
  6.5 Summary and Conclusion
7 Statistical Confidence Estimation
  7.1 Introduction
  7.2 A Confidence Measure Based on Linear Subspace Projections
  7.3 A Statistical Confidence Measure
  7.4 Applicability of the Test
  7.5 Application to Sparse Vector Fields
  7.6 A Nonlinear Extension
  7.7 Results
  7.8 Summary and Conclusion
8 A Model Based Optical Flow Algorithm
  8.1 Introduction
  8.2 Parameter Estimation
  8.3 Confidence Estimation
  8.4 Integration of the Model into a Global Optical Flow Method
  8.5 Results
  8.6 Summary and Conclusion
9 The Restoration of Optical Flow Fields
  9.1 Introduction
  9.2 Diffusion Based Motion Inpainting
  9.3 TV Motion Inpainting
  9.4 Image Guided Motion Inpainting
  9.5 Experiments and Results
  9.6 Summary and Conclusion
10 Conclusions and Perspectives
  10.1 Summary and Conclusion
  10.2 Future Research

Chapter 1

Introduction

1.1 Introduction

Optical flow is the apparent motion in the image plane produced by the projection of the real 3D motion onto the 2D image plane.
Since the 1980s, optical flow has been an important subject of research in computer vision, and many optical flow estimators have been proposed so far. Despite these long-standing research activities, the equally important field of sound quality and reliability measures has often been ignored. A few measures called "confidence measures" have been proposed, yet they either merely estimate the complexity of the image sequence without taking the actual flow field into account at all, or they are derived from and, thus, depend on specific flow computation methods. To the best of our knowledge, no comprehensive approach has been published in the literature that assesses the quality of computed flow fields in a postprocessing step independent of the actual flow estimation procedure. Furthermore, none of these measures is statistically substantiated. If sound reliability information were available, motion restoration methods could be employed to reconstruct erroneous flow fields with lower average errors. Such methods for an automatic refinement of computed flow fields would be of high interest for many applications. In fact, it would also be possible to directly integrate confidence measures into optical flow computation methods, as they basically consist of additional constraints and knowledge on the flow field. Yet, if different constraints are combined, they may conflict with or contradict each other or drastically increase the computation time. Just as in the case of NP problems, it is simpler to verify the accuracy of an already computed solution than to find the solution itself.

Let D := Ω × [0, T] denote a spatio-temporal image domain, where Ω ⊆ R^d, d ∈ {2, 3}, stands for the spatial domain of the sequence and [0, T], T ∈ N, for its time interval. Let, furthermore, I refer to an image sequence defined as

  I : D → R.
(1.1)

Then the notion "optical flow" refers to the displacement field u of corresponding pixels in subsequent frames of the image sequence,

  u : D → R^d.   (1.2)

Confidence measures can be defined as mappings from the spatio-temporal image domain D, the image sequence I and a d-dimensional displacement vector to the interval of confidence [0, 1], where 1 stands for high and 0 for low confidence:

  ϕ : D × I × R^d → [0, 1].   (1.3)

Optical flow is employed in many applications today, such as medicine, control systems, data compression, robot navigation, as well as pedestrian and vehicle tracking. In medical applications it is often necessary to screen the movement of a patient over a short period of time or to compare images of an organ before and after treatment with medication. Since the patient and his heart move, the motion must be compensated before further processing steps can be taken. To this end, optical flow fields can be computed to register different frames. For example, in [91] the intensity of X-ray images is improved by screening the patient for a longer period of time followed by subsequent motion compensation. Especially in medical applications it is important to gain information on the accuracy of the motion field, since symptoms of diseases are often deduced from extremely small indications or color differences. Incorrect motion fields can easily produce such differences when subsequent frames are compared or registered. Furthermore, to fully register two frames, dense optical flow fields are necessary. Hence, confidence measures combined with an automatic motion restoration algorithm would be desirable in the field of medical imaging. Optical flow estimation is also important for physical applications and control systems. In [55] optical flow estimates are used to monitor the motion of flames in order to optimize the combustion process.
In [46, 45] the temperature change of the water surface, the heat flux, is estimated to analyze the air-sea gas transfer during the investigation of global climatic changes. For both applications accurate motion estimates are important for further processing steps. In the case of combustion monitoring, the heat of the process is regulated based on information drawn from the motion field. Erroneous optical flow measurements can, therefore, have undesired or even fatal effects on these control systems, such as a low effectiveness leading to unnecessarily high energy dissipation and operating expenses. In the case of the analysis of the heat flux, incorrect motion estimates can entail experimental errors and, thus, incorrect scientific conclusions. Depending on the control system or subsequent analysis steps, such as higher order derivatives of the flow field, dense optical flow fields may be required as well. Hence, accuracy measurements and restoration algorithms are also beneficial for physical applications. Another important field for optical flow is data compression, e.g. the compression of large video sequences [43]. If the first frame and the corresponding motion field are known, a rough version of the following frame can be obtained by means of warping. The resulting version of the following frame can be taken as a reference in the encoding stage. In this way, the required compensation information is reduced, which improves the compression ratio. For data compression accurate motion estimates are also a prerequisite, since they improve the quality of the computed reference frame and, thus, decrease the necessary compensation information. Furthermore, dense flow fields are required in order to compute the reference frame.
Reliable confidence measures together with motion inpainting algorithms for the subsequent restoration of the motion field could, therefore, yield significant improvements over currently used methods in the field of video compression, reducing compression artifacts at increasing compression ratios. In robot navigation optical flow fields are used for obstacle detection, collision avoidance and the trailing of moving objects, e.g. in [34]. The basic concept was inspired by the way bees navigate: they try to balance the amount of motion occurring on either side of them. If the robot wants to avoid obstacles, then it should turn away from the side that shows more motion in the optical flow field, since this indicates a possible approach to a stationary obstacle. Similarly, if a target is trailed, the robot should turn to the side that shows more motion. In this way, it keeps the target in focus. For robot applications accuracy measurements are also important to ensure the robot's safety and unproblematic navigation. Incorrect, possibly extremely large motion vectors in the flow field can lead to incorrect navigation commands, resulting in accidents, failures or uncontrolled behavior of the robot. Hence, confidence measures are important in this field. In pedestrian and vehicle tracking optical flow is often employed as well. One example is traffic monitoring in aerial video sequences, where low contrast and occlusions are difficult to handle [76]. Another example is pedestrian detection, e.g. [48], where principal component analysis and a boosting classifier are combined to distinguish between pedestrian motion and that of other objects such as cars. For applications as important as pedestrian detection, accuracy measurements are indispensable to ensure the safety of people.
Therefore, this thesis is dedicated to the analysis and classification of known confidence measures, to the proposition and evaluation of new confidence measures for optical flow fields, and to the automatic reconstruction of erroneous measurements by means of inpainting methods yielding dense, accurate flow fields.

1.2 Thesis Outline

After introducing the topic of optical flow and confidence estimation as well as motion restoration and its applications, I now give an overview of the remaining chapters of this thesis. As some of the methods used in this thesis require some mathematical background knowledge, Chapter 2 contains the necessary preliminaries. For the estimation of optical flow fields many approaches have been proposed since the 1980s. The ones important for this thesis are briefly outlined in Chapter 3. I distinguish between local optical flow methods, which are based on locally restricted image regions, and global methods, which solve the optical flow problem for the whole image sequence within one functional. In order to evaluate confidence measures, which predict the error of computed flow fields, I first introduce and analyze common error measurements for optical flow fields in Chapter 4. Such error measurements are of high importance for the understanding of strengths and weaknesses of optical flow algorithms as well as for scientific and industrial applications. However, their evaluation has mostly been limited to the indication of the average angular error and its variance for a small number of highly artificial test sequences. Hence, I also propose a new evaluation method, which conceives of the assessment of motion estimators as sampling from a joint probability distribution, namely the joint distribution of the true and the estimated flow, as well as local gray value neighborhoods. Marginals and conditionals of this distribution allow for a detailed assessment of optical flow algorithms.
After introducing common error measurements I come to the analysis of known confidence measures. I find that an important distinction has to be made between measures which only judge the complexity of the image sequence to obtain information on the accuracy of the flow field, and "real" confidence measures. Since the former measures do not consider the flow vectors at all, they are, in fact, insufficient for the task of reliability estimation. Such measures will be denoted by "situation measures", since they only analyze the complexity of the image sequence. In Chapter 5 I explain and classify these measures according to the intrinsic dimension of the image sequence they examine. I show that these measures can be successfully applied to the recognition of aperture problems, homogeneous regions and occlusions, as well as to detect locations where the flow vector can be computed reliably. In Chapter 6 I employ the notion of intrinsic dimensionality again to evaluate the accuracy of optical flow fields. However, I do not compute the intrinsic dimension of the image sequence. Instead, since in fact every optical flow method can be formulated as some kind of energy minimization task, I propose to analyze the intrinsic dimension of the energy surface produced by small variations of the computed flow vector. I formulate surface measures, which are able to identify unreliable parameter vectors as well as outliers, and can be used as situation or confidence measures. In this way, sparse but reliable motion fields with lower errors can be obtained. In Chapter 7 I suggest learned motion models as another option to evaluate the accuracy of an arbitrarily computed optical flow field. These models are obtained from learning algorithms applied to typical motion fields, e.g. ground truth flow fields, computed or synthetic fields. The model then contains information on common flow vector constellations, which can be used to examine computed flow fields.
I propose two different kinds of motion models, one based on linear subspace projections and the other based on a statistical hypothesis test. The resulting confidence measures are statistically motivated, generally applicable independent of the flow computation method, and achieve highly accurate results. They can be extended to a nonlinear version and to handle non-dense flow fields. As confidence measures are based on some kind of knowledge of what a correct flow field should be like, most confidence measures already incorporate the basic idea or constraints of an optical flow computation method. Both concepts, the optical flow algorithms and the confidence measures for their evaluation, are, therefore, closely related. Hence, starting out from the confidence measure proposed in Chapter 7, a new optical flow computation method is developed in Chapter 8. It is simple to implement, can easily be parallelized and yields highly accurate results. Finally, confidence measures can be used to improve optical flow fields. Depending on the estimated accuracy of a given flow vector, it can be removed from the flow field and reconstructed from its surrounding neighborhood in a subsequent step. Methods for the restoration of optical flow fields are introduced in Chapter 9. Here, the transfer of known inpainting methods to the reconstruction of vector fields is discussed. To further increase the quality of the restored field, I propose to include the image sequence information in the restoration process. The resulting functional directly combines motion and image information and makes it possible to control the impact of image edges on the motion field reconstruction. In fact, in case of jumps of the motion field, where the jump set coincides with an edge set of the underlying image intensity, an anisotropic TV-type functional acts as a prior in the inpainting model.
To conclude, I combine the most effective confidence measure proposed in Chapter 7 and the image guided motion inpainting approach proposed in Chapter 9 to automatically restore optical flow fields.

1.3 Notation

As before, let I : D → R denote the spatio-temporal image sequence and u : D → R^d the optical flow field. In case the ground truth flow is known, it will be denoted by g : D → R^d. Furthermore, let ||·|| denote the l2-norm, i.e. the square root of the sum of the squared vector/matrix components. xᵀ stands for the transpose of the vector x. The gradient is indicated by ∇z, where z ⊆ {x, y, t} indicates its direction. det(A) refers to the determinant of the matrix A, trace(A) denotes the trace of the matrix A. Sometimes vectors are augmented by an additional dimension, which is set to 1. To simplify the notation in these cases, let for any given vector v ∈ R^n

  ṽ = (v1, ..., vn, 1)ᵀ ∈ R^(n+1).   (1.4)

Sometimes I need to indicate the image sequence I warped according to a flow field u. Since the flow field is in general not integer valued, I use linear interpolation. The warped image sequence is indicated by Iw:

  Iw(x, y, t) = I(x + u1, y + u2, t + 1).   (1.5)

Let, furthermore, R denote the set of real numbers, N the set of natural numbers and Nn = {1, ..., n} the first n natural numbers.

Chapter 2

Mathematical Preliminaries

2.1 The Calculus of Variations

Let L be a real-valued mapping depending on five variables, which is twice continuously differentiable. The integral

  Φ(u) = ∫_0^l ∫_0^l L(x, y, u(x, y), u_x(x, y), u_y(x, y)) dy dx   (2.1)

is to be minimized over the set of all continuously differentiable mappings u : R² → R fulfilling the boundary value conditions

  u(0, y) = 0, u(l, y) = 0 ∀y ∈ R,
  u(x, 0) = 0, u(x, l) = 0 ∀x ∈ R.   (2.2)

The mapping u ↦ Φ(u) is called a functional, and the minimization of such functionals is handled by the calculus of variations. We assume that there exists a mapping u which minimizes the functional Φ(u).
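Before carrying out the minimization, it is instructive to see where the derivation leads for a concrete example: for the Dirichlet energy L = (u_x² + u_y²)/2 the resulting Euler-Lagrange equation is the Laplace equation. This can be checked symbolically; the following is a minimal sketch using SymPy's `euler_equations` helper (the concrete Lagrangian is an illustrative choice, not one used in the thesis):

```python
import sympy as sp
from sympy.calculus.euler import euler_equations

x, y = sp.symbols('x y')
u = sp.Function('u')(x, y)

# Dirichlet energy density: L(x, y, u, u_x, u_y) = (u_x^2 + u_y^2) / 2
L = (u.diff(x)**2 + u.diff(y)**2) / 2

# euler_equations forms dL/du - d/dx dL/du_x - d/dy dL/du_y = 0,
# which for this L is the Laplace equation -u_xx - u_yy = 0.
eq = euler_equations(L, [u], [x, y])[0]
print(eq)
```

Minimizers of the Dirichlet energy are thus harmonic functions, consistent with the general derivation carried out next.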
Let v : R² → R be an arbitrary, continuously differentiable mapping with boundary values equivalent to (2.2). Then the mapping

  ε ↦ Φ(u + εv) = ∫_0^l ∫_0^l L(x, y, u + εv, u_x + εv_x, u_y + εv_y) dy dx   (2.3)

is minimized for ε = 0. As this function is continuously differentiable, the derivative can be computed within the integral. Using partial integration and the conditions stated in (2.2) we obtain

  ∂/∂ε Φ(u + εv)|_{ε=0} = 0
  ⇔ ∫_0^l ∫_0^l ( ∂L/∂x ∂x/∂ε + ∂L/∂y ∂y/∂ε + ∂L/∂u ∂u/∂ε + ∂L/∂u_x ∂u_x/∂ε + ∂L/∂u_y ∂u_y/∂ε ) dy dx = 0,  where ∂x/∂ε = ∂y/∂ε = 0,
  ⇔ ∫_0^l ∫_0^l ( ∂L/∂u v + ∂L/∂u_x v_x + ∂L/∂u_y v_y ) dy dx = 0
  ⇔ ∫_0^l ∫_0^l ( ∂L/∂u v − ∂/∂x (∂L/∂u_x) v − ∂/∂y (∂L/∂u_y) v ) dy dx + ∫_0^l [ (∂L/∂u_x) v ]_{x=0}^{x=l} dy + ∫_0^l [ (∂L/∂u_y) v ]_{y=0}^{y=l} dx = 0,  where the boundary terms vanish by (2.2),
  ⇔ ∫_0^l ∫_0^l ( ∂L/∂u − ∂/∂x ∂L/∂u_x − ∂/∂y ∂L/∂u_y ) v dy dx = 0.   (2.4)

(In the above, ∂L/∂x denotes the derivative with respect to the first variable of L, ∂L/∂u the derivative with respect to the third variable of L, and so on. It does not denote the derivative with respect to the function u.)

Theorem 1. Let f be a continuous, real-valued mapping over the interval [0, l] × [0, l], and let

  ∫_0^l ∫_0^l f(x, y) v(x, y) dy dx = 0   (2.5)

for every continuously differentiable, real-valued mapping v fulfilling the boundary conditions equivalent to (2.2). Then it follows that f = 0 over [0, l] × [0, l].

Proof. Assume, on the contrary, that f does not vanish over [0, l] × [0, l]. Since f is continuous, there exists a point (x₀, y₀) ∈ [0, l] × [0, l] and an ε > 0 such that f(x, y) ≠ 0 for (x, y) ∈ [x₀ − ε, x₀ + ε] × [y₀ − ε, y₀ + ε]. Let first f(x, y) > 0. We now choose

  v(x, y) = { 2ε² − (x − x₀)² − (y − y₀)²,  (x, y) ∈ [x₀ − ε, x₀ + ε] × [y₀ − ε, y₀ + ε],
            { 0,  otherwise.

Then v is continuously differentiable and fulfills the assumed boundary conditions. It follows

  f(x, y) v(x, y) { > 0,  (x, y) ∈ [x₀ − ε, x₀ + ε] × [y₀ − ε, y₀ + ε],
                  { = 0,  otherwise.   (2.6)
Hence, we would obtain

  ∫_0^l ∫_0^l f(x, y) v(x, y) dy dx = ∫_{x₀−ε}^{x₀+ε} ∫_{y₀−ε}^{y₀+ε} f(x, y) v(x, y) dy dx > 0,   (2.7)

which contradicts the assumption. The case f(x₀, y₀) < 0 is handled in the same way. Hence, it follows that f(x, y) = 0 for (x, y) ∈ (0, l) × (0, l) and thus for all (x, y) ∈ [0, l] × [0, l]. This concludes the proof.

Based on this theorem we can now conclude from equation (2.4) that

  ∫_0^l ∫_0^l ( ∂L/∂u − ∂/∂x ∂L/∂u_x − ∂/∂y ∂L/∂u_y ) v dy dx = 0   (2.8)

implies

  ∂L/∂u − ∂/∂x ∂L/∂u_x − ∂/∂y ∂L/∂u_y = 0.   (2.9)

This partial differential equation is also called the Euler-Lagrange equation. Hence, the mapping u, which minimizes the functional in (2.1), can be obtained by solving the Euler-Lagrange equation in (2.9).

2.2 Hypothesis Testing

Hypothesis tests are used to judge if a given sample realization can stem from a hypothetical distribution (also called the null hypothesis). Let (H, H, W) be a statistical experiment, where H refers to the n-dimensional sample space, H to the σ-algebra over the sample space and W to a set of probability measures. Let, furthermore, Γ refer to a parameter space, which is partitioned into two sets Γ1 and Γ2. This partitioning of the parameter space corresponds to a partitioning of the set of probability measures W into W1 and W2, since each parameter in Γ defines one probability measure in W. The quintuple (H, H, W, W1, W2) is called a testing experiment. W1 is called the hypothesis, W2 the alternative. A hypothesis test decides for each possible realization of a sample X = (X1, ..., Xn) if the distribution of X can be described by a distribution in W1. Let (H, H, W, W1, W2) be a testing experiment, and let B denote the Borel σ-algebra. Then every (H, [0, 1] ∩ B)-measurable function φ : H → [0, 1] defines a hypothesis test. Here, φ(x) = 0 means that the hypothesis is accepted, whereas φ(x) = 1 means that the hypothesis is rejected.
  A := φ⁻¹(0) = {x ∈ H | φ(x) = 0}   (2.10)

is called the acceptance region of the test. If for a sample realization we have x ∈ A, then the hypothesis is not rejected.

  Aᶜ := φ⁻¹(1) = {x ∈ H | φ(x) = 1}   (2.11)

is called the critical or rejection region of the test. If for a sample realization we have x ∈ Aᶜ, then the hypothesis is rejected. The choice of acceptance and rejection region is crucial for the quality of the test.

2.2.1 The Quality of a Hypothesis Test

The quality of a hypothesis test is measured based on the probability of an incorrect decision.

Type 1 and Type 2 Errors

There are two types of incorrect decisions in a hypothesis test:
• Type 1 error: the hypothesis is rejected even though it is correct,
• Type 2 error: the hypothesis is not rejected even though it is incorrect.

Let there be a test problem with W1 = {P1} denoting the hypothesis and W2 = {P2} the alternative. Let A refer to the acceptance region and Aᶜ to the rejection region of a test φ. Then the probability of a type 1 error is P1(Aᶜ) and that of a type 2 error P2(A). Figure 2.1 demonstrates this relation for a simple example.

Quality Function and Power of a Test

Quantitative statements on the quality of a test can be obtained by means of the quality function. Let (H, H, W, W1, W2) be a testing experiment and φ : H → {0, 1} a hypothesis test. Then the function

  βφ : Γ → [0, 1],   (2.12)
  βφ(γ) = Eγ(φ) = ∫ φ dPγ   (2.13)

is called the quality function of φ. For each parameter γ of the parameter space Γ this function computes the expected value of the test, that is, the probability for the rejection of the hypothesis. Hence, it is important to design the test φ such that βφ(γ) is minimal for γ ∈ Γ1 and maximal for γ ∈ Γ2. The probability for a type 1 error can be determined by means of the restriction βφ|Γ1. To control the type 1 error, a significance level α is chosen for hypothesis tests.
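The interplay of significance level, type 1 error and power can be made concrete with a small simulation: for a one-sided Gaussian mean test with known variance, the empirical type 1 error matches the chosen α, and the power can be estimated by sampling from the alternative. A minimal NumPy sketch (the test problem and all parameter values are illustrative choices, not taken from the thesis):

```python
import numpy as np

# One-sided test of the hypothesis "mean = 0" against the alternative
# "mean = 1" for n i.i.d. Gaussian samples with known sigma (toy values).
alpha = 0.05
z_alpha = 1.6449                            # 95% quantile of the standard normal
n, sigma = 9, 1.0
threshold = z_alpha * sigma / np.sqrt(n)    # reject if the sample mean exceeds this

rng = np.random.default_rng(0)
xbar_h = rng.normal(0.0, sigma, size=(200_000, n)).mean(axis=1)  # hypothesis true
xbar_a = rng.normal(1.0, sigma, size=(200_000, n)).mean(axis=1)  # alternative true

type1 = (xbar_h > threshold).mean()   # empirical type 1 error, close to alpha
power = (xbar_a > threshold).mean()   # empirical power = 1 - type 2 error
print(type1, power)
```

Raising the threshold lowers the type 1 error but also the power, which is exactly the trade-off between βφ|Γ1 and βφ|Γ2 discussed above.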
Figure 2.1: Graphical representation of the probability of a type 1 error (P1(Aᶜ), blue) and of a type 2 error (P2(A), red); {P1} refers to the hypothesis with corresponding probability density f1, {P2} to the alternative with corresponding probability density f2.

The test φ is designed in such a way that the probability for a type 1 error is limited by α in case the hypothesis is true, that means

  Pγ{x ∈ H | φ(x) = 1} = βφ(γ) ≤ α,  γ ∈ Γ1.   (2.14)

Tests fulfilling (2.14) are called α-tests or tests with significance level α. An important quality criterion, which can also be computed from the quality function βφ, is the power of a test, which is defined as the restriction βφ|Γ2. It indicates the probability that the hypothesis is rejected in case the distribution underlying the data does not belong to the hypothesis, that is, the absence of a type 2 error.

2.2.2 p-Values

Definition 1. Let (H, H, W, W1, W2) refer to a testing experiment. For α ∈ (0, 1) let φα : H → {0, 1} be a deterministic test with significance level α, that means

  ∫_H φα dPγ ≤ α,  Pγ ∈ W1.   (2.15)

For α ∈ (0, 1) let Kα := {x ∈ H | φα(x) = 1} indicate the rejection region of φα. Furthermore, let

  0 < α1 < α2 < 1 ⇒ Kα1 ⊂ Kα2.   (2.16)

Then the p-value function Π : H → [0, 1] belonging to the family of tests (φα)α∈(0,1) is defined by

  Π(x) = inf{α | x ∈ Kα}.   (2.17)

Let x be a sample realization; then the value Π(x) is called the p-value of the sample x. It corresponds to the minimum significance level for which the sample x is rejected.

2.3 Best Linear Unbiased Estimators

Theorem 2. Let x be an n × 1 parameter vector that is to be estimated from an m × 1 vector y of observations. Let, furthermore, µx and µy denote the expected values of x and y, and let Cxx, Cxy, Cyx and Cyy denote the corresponding covariance matrices.
Then the best linear unbiased estimator (BLUE) for x given the observations y is

x̂ = µx + Cxy Cyy⁻¹ (y − µy),   (2.18)
Var(x̂) = Cxy Cyy⁻¹ Cyx,   (2.19)
Var(x − x̂) = Cxx − Cxy Cyy⁻¹ Cyx.   (2.20)

Proof. To prove this statement we introduce the following four theorems. Their proofs can be found in [61]. Let x and y denote two random variables. Then

Theorem 3. Var(Ax + b) = A Var(x) Aᵀ
Theorem 4. Cov(Ax + a, By + b) = A Cov(x, y) Bᵀ
Theorem 5. Cov(x − y, z) = Cov(x, z) − Cov(y, z)
Theorem 6. Var(x + y) = Var(x) + Cov(x, y) + Cov(y, x) + Var(y)

Let b ∈ Rⁿ. Then we want to estimate (as customary in most textbooks) an arbitrary linear function bᵀx, with estimator denoted (bᵀx)ˆ, such that the following assumptions hold:

Assumption 1. bᵀx is linearly predictable from y:
(bᵀx)ˆ = cᵀy + d,  c ∈ Rᵐ, d ∈ R.

Assumption 2. (bᵀx)ˆ is unbiased:
E(cᵀy + d) = E(bᵀx), which equals the condition cᵀµy + d − bᵀµx = 0.

Assumption 3. The squared error of the estimator is to be minimized:
Var((bᵀx)ˆ − bᵀx) → min.

From assumptions 3 and 1 together with theorems 3 and 4 it follows that

Var((bᵀx)ˆ − bᵀx) = Var(cᵀy + d − bᵀx)
= E((cᵀy + d − bᵀx − E(cᵀy + d − bᵀx))²)
= E(((cᵀy − E(cᵀy)) − (bᵀx − E(bᵀx)))²)
= E((cᵀy − E(cᵀy))² − 2(cᵀy − E(cᵀy))(bᵀx − E(bᵀx)) + (bᵀx − E(bᵀx))²)
= Var(cᵀy) − 2 Cov(cᵀy, bᵀx) + Var(bᵀx)
= cᵀCyy c − 2cᵀCyx b + bᵀCxx b → min.

To minimize this expression subject to assumption 2 we use the Lagrange multiplier −2λ,

w(c, d) := cᵀCyy c − 2cᵀCyx b + bᵀCxx b − 2λ(cᵀµy + d − bᵀµx),   (2.21)

and solve the system of equations

∂w(c, d)/∂d = −2λ = 0,
∂w(c, d)/∂c = 2Cyy c − 2Cyx b − 2λµy = 0,
∂w(c, d)/∂λ = cᵀµy + d − bᵀµx = 0.

It follows that

λ = 0,
c = Cyy⁻¹ Cyx b,   (2.22)
d = bᵀµx − cᵀµy.
(2.23)

Using these results and theorem 3, the variance of the estimation error comes down to

Var(bᵀx − (bᵀx)ˆ) = Var(bᵀx − cᵀy − d) = Var(bᵀx − cᵀy − bᵀµx + cᵀµy)
= bᵀCxx b − cᵀCyy c = bᵀCxx b − (Cyy⁻¹Cyx b)ᵀ Cyy (Cyy⁻¹Cyx b)
= bᵀCxx b − bᵀCxy Cyy⁻¹Cyx b = bᵀ(Cxx − Cxy Cyy⁻¹Cyx) b.   (2.24)

We now show that the linear estimator defined by (2.22) and (2.23) has minimum error. To this end, let ẑ = eᵀy + f, for e ∈ Rᵐ, f ∈ R, be any other linear estimator. Using theorem 6 and the fact that some expressions cancel to 0, we obtain for the variance

Var(ẑ − bᵀx) = Var(eᵀy + f − bᵀx) = Var(eᵀy + f − cᵀy − d + cᵀy + d − bᵀx)
= Var((e − c)ᵀy + f − d + cᵀy + d − bᵀx)
= Var((e − c)ᵀy + f − d) + Var(cᵀy + d − bᵀx).   (2.25)

Bearing in mind that the variance is non-negative, we finally obtain

Var(ẑ − bᵀx) ≥ Var(cᵀy + d − bᵀx) = Var((bᵀx)ˆ − bᵀx).   (2.26)

Hence, the error of the estimator (bᵀx)ˆ is minimal. We thus come to the following conclusions for the BLUE:

(bᵀx)ˆ = cᵀy + d = (Cyy⁻¹Cyx b)ᵀ y + bᵀµx − (Cyy⁻¹Cyx b)ᵀ µy
= (Cyy⁻¹Cyx b)ᵀ (y − µy) + bᵀµx = bᵀ(µx + Cxy Cyy⁻¹ (y − µy)).   (2.27)

Due to (2.24) the error of the estimator results in

Var(bᵀx − (bᵀx)ˆ) = bᵀ(Cxx − Cxy Cyy⁻¹Cyx) b   (2.28)

and is minimal due to (2.26). According to theorem 3 and the results in (2.22) and (2.23), its variance is

Var((bᵀx)ˆ) = Var(cᵀy + d) = cᵀ Var(y) c = (Cyy⁻¹Cyx b)ᵀ Cyy (Cyy⁻¹Cyx b) = bᵀCxy Cyy⁻¹Cyx b.   (2.29)

Finally, setting bᵀ = (1, 0, ..., 0), ..., (0, ..., 0, 1) in (2.27), (2.28) and (2.29), we obtain the original statements in theorem 2. The formulas for the estimator x̂ and Var(x − x̂) can also be obtained from the normal distribution conditioned on y. Here x̂ and Var(x − x̂) correspond to the expectation and covariance of the conditional normal distribution of x given the observation vector y.
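A minimal numerical sketch of the BLUE formulas (2.18) and (2.20); the joint covariance of (x, y) is synthetic and built as A·Aᵀ so that it is guaranteed to be positive semi-definite:

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a valid joint covariance of (x, y) as C = A·Aᵀ + 0.1·I (guaranteed PSD);
# x ∈ R², y ∈ R³ -- the dimensions and means are illustrative only.
A = rng.normal(size=(5, 5))
C = A @ A.T + 0.1 * np.eye(5)
Cxx, Cxy = C[:2, :2], C[:2, 2:]
Cyx, Cyy = C[2:, :2], C[2:, 2:]
mu_x, mu_y = np.array([1.0, -1.0]), np.array([0.5, 0.0, 2.0])

def blue(y):
    """Best linear unbiased estimate x̂ = µx + Cxy Cyy⁻¹ (y − µy)  (2.18)."""
    return mu_x + Cxy @ np.linalg.solve(Cyy, y - mu_y)

# Error covariance Var(x − x̂) = Cxx − Cxy Cyy⁻¹ Cyx  (2.20)
err_cov = Cxx - Cxy @ np.linalg.solve(Cyy, Cyx)
```

Observing y = µy returns the prior mean µx, and the error covariance is Cxx reduced by the positive semi-definite term Cxy Cyy⁻¹ Cyx: conditioning on y never increases the uncertainty about x.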
Figure 2.2: Triangular topology of the intrinsic dimensionality in barycentric coordinates, taken from [38].

2.4 Intrinsic Dimensions

According to [17] the notion "intrinsic dimension" is defined as follows: "a data set in n dimensions is said to have an intrinsic dimensionality equal to d if the data lies entirely within a d-dimensional subspace". It was first applied to image processing by Zetzsche and Barth in [101] in order to distinguish between edge-like and corner-like structures in an image. Such information can be used to identify reliable locations for optical flow computation, tracking and registration, e.g. corners in an image sequence. An equivalent definition of the "intrinsic dimension" of an image patch is based on its spectrum, assigning the identifier
• i0d, if the spectrum consists of a single point, which corresponds to a homogeneous image patch,
• i1d, if the spectrum is a line through the origin, which corresponds to an edge in the image patch,
• i2d otherwise, which corresponds to highly textured regions.

The intrinsic dimension of a given dataset can be computed as the rank of the structure tensor [16]. As, of course, the eigenvalues of the structure tensor are never exactly 0, thresholds have to be applied to obtain estimates of discrete intrinsic dimensions. Hence, since no situation exclusively belongs to only one of the intrinsic dimensions, Felsberg et al. [38] gave a continuous formulation of intrinsic dimensions smaller than and equal to two and showed that the underlying topology of the intrinsic dimension space corresponds to a triangle (see Figure 2.2). Their formulation has been used in [60] to analyze errors of local optical flow computation methods based on the intrinsic dimensionality of the underlying image.

Figure 2.3: Left: example for i2d: a point moving in 3d-space; the dimension of the subspace with constant intensity along the line is 1, thus yielding intrinsic dimension 2. Right: example for i1d: a moving edge leading to a 2d-plane of constant intensity, thus yielding intrinsic dimension 1.

In [11, 13] Barth introduced the intrinsic dimensionality of three-dimensional image sequences and applied it to motion estimation, especially for multiple and transparent motions. Provided that the assumption of constant brightness over time holds, the motion of a single point corresponds to a line of constant brightness in the image sequence volume. Intuitively speaking, the notion "intrinsic dimension" refers to the dimension of the examined image region (here three) minus the number of dimensions with the same intensity as the current pixel. Thus, the intrinsic dimension of locations in image sequences where motion is present is lower than or equal to two. In case of intrinsic dimension three the brightness constancy assumption is violated, which can be due to e.g. noise, occlusions or transparent structures, since the trajectory of constant intensity in the temporal dimension is intercepted. If unambiguous movement (e.g. of a corner) is present, a unique trajectory of the current pixel in temporal direction exists, which corresponds to intrinsic dimension two. If we have an aperture problem, there is one additional spatial direction with constant intensity, leading to intrinsic dimension one. A homogeneous region contains the same intensity in the two spatial and the temporal dimension, leading to intrinsic dimension zero. If no consistent movement exists, we have intrinsic dimension three. Figure 2.3 demonstrates this concept for intrinsic dimensions one and two. Occlusions play a special role here. If a motion vector with intrinsic dimension between zero and two is occluded, the temporal direction no longer contains constant intensity values. Consequently the vector gains one intrinsic dimension.
Thus, estimators for the intrinsic dimensionality cannot distinguish between a situation of a certain intrinsic dimension and an occluded situation of a lower intrinsic dimension. This leads to problems for confidence estimators which rely on intrinsic dimension estimates.

2.5 Principal Component Analysis

2.5.1 Origin and Objectives

The origins of Principal Component Analysis (PCA) reach back to the 1930s, when PCA was originally used in psychology to study intelligence. It is also called "Karhunen-Loève transform" and refers to a mathematical way of expressing an n-dimensional dataset in a new coordinate system that shows the properties of the data samples most clearly along the coordinate axes. Today PCA is used in a wide range of applications from computer science to psychology, biology, botany and medicine, wherever a reduction of dimensionality is needed. There are also applications where the dimensionality of the data is unknown and PCA is used to define as small a number of principal axes as possible. There are several goals of PCA:
• the reduction (or discovery) of the dimensionality of a dataset with many interrelated variables,
• the conservation of as much of the variation in the dataset as possible despite the compression,
• the reduction of noise and redundancy in the dataset,
• the emphasis of the variation in the dataset,
• the identification of new, underlying, explanatory variables.

In Figure 2.4 the original dataset in 2-dimensional space can be seen on the left within the original coordinate system. On the right the same dataset has been reexpressed using a "better" coordinate or basis system in order to achieve the goals of PCA. The variables have been decorrelated as much as possible. Most of the variance of the dataset can now be found along the axis y1.
To reduce the dimensionality of the data we could remove the axis y2 and would lose as little information as possible, since most of the variance is conserved on the other axis. To perform PCA a large sample dataset is needed. The question remains how to choose the new basis system for the representation of the original n-dimensional data samples in order to fulfill the goals stated above.

Figure 2.4: Principal Component Analysis for 2D-data.

2.5.2 Achieving the goals of PCA

The covariance of two random variables expresses the redundancy between these variables. A high value for an off-diagonal element of the covariance matrix indicates high correlation and, thus, redundancy between the corresponding variables. We show that the goals of PCA can be achieved by finding a way to diagonalize the covariance matrix. First, redundancy reduction is obtained, since the cross-covariances of all variables are zero after the diagonalization; no linear redundancy is left in the dataset. Furthermore, information on the variation of the variables can be obtained from the main diagonal of the covariance matrix after diagonalization. The higher the variance of a variable, the more information of the original dataset is contained in it and the more important is the variable. Noise reduction goes together with the goal of dimensionality reduction. Noise can usually be found in variables with low variance, whereas variables with higher variances represent important dynamics of the dataset. The latter are desirable components needed to discriminate between the different data samples, whereas the variables with lower variances, the noise, make the process of distinguishing between the data samples more difficult. These variables are removed during the process of dimensionality reduction. Hence, when reducing the dimensions of the dataset the noise is reduced at the same time.
Thus, all of the above stated goals of PCA can be achieved by the diagonalization of the covariance matrix.

2.5.3 Diagonalization of the Covariance Matrix using Eigenvector Decomposition

The assumption that the original basis and the desired principal components form orthonormal bases is necessary to find a rather simple solution for the diagonalization problem using linear algebra. Let X be the matrix containing the mean-adjusted data samples in its columns. Our goal is to find a matrix P with Y = PᵀX such that the covariance matrix D = 1/(n−1) Y Yᵀ of the resulting matrix Y is a diagonal matrix. Let M̄ refer to the row-wise mean vector of a given matrix M, and let C denote the covariance matrix of the original dataset matrix containing the samples in its columns. Then we can express the covariance matrix D after diagonalization in the following way:

D = 1/(n−1) (Y − Ȳ)(Y − Ȳ)ᵀ
  = 1/(n−1) (Pᵀ(X − X̄))(Pᵀ(X − X̄))ᵀ
  = Pᵀ [1/(n−1) (X − X̄)(X − X̄)ᵀ] P   (the bracketed term is C)
  = Pᵀ C P.

So, the final redefinition of the problem is to find a matrix P such that PᵀCP is diagonal. The following steps are based on several theorems from linear algebra:
• Every matrix multiplied by its transpose is symmetric.
• The eigenvectors of every symmetric matrix are orthogonal.
• For every orthogonal matrix M the inverse of this matrix is its transpose, so M⁻¹ = Mᵀ.
• Consequently, the transpose of every matrix S containing the normalized eigenvectors of a symmetric matrix M in its columns is equal to S⁻¹.

Hence, we know that every symmetric matrix M can be diagonalized by a matrix S containing the eigenvectors of M in its columns: S⁻¹MS = SᵀMS is a diagonal matrix. The covariance matrix C is symmetric. P must, therefore, contain the eigenvectors of C in its columns in order to diagonalize C. To find the eigenvectors of a symmetric matrix, e.g. Givens rotations can be used.
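A minimal numerical sketch of this diagonalization; the dataset is synthetic and numpy's symmetric eigensolver stands in for Givens rotations:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative dataset: 2 variables, 500 samples in the columns of X,
# deliberately correlated so that one principal axis dominates.
z = rng.normal(size=(2, 500))
X = np.array([[3.0, 0.0], [2.0, 0.5]]) @ z

Xc = X - X.mean(axis=1, keepdims=True)     # mean-adjust each row (variable)
C = (Xc @ Xc.T) / (Xc.shape[1] - 1)        # covariance matrix C

# Diagonalize C: eigh returns orthonormal eigenvectors in the columns of P
eigvals, P = np.linalg.eigh(C)

Y = P.T @ Xc                               # data expressed in the new basis
D = (Y @ Y.T) / (Y.shape[1] - 1)           # covariance after the transform
```

The resulting D equals diag(eigvals) up to rounding: all cross-covariances vanish, and the diagonal entries are the variances along the principal components.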
These eigenvectors of C are the new basis vectors, which we will use for the transformation of the data samples. D is the desired diagonal matrix, the new covariance matrix of the transformed dataset, containing the eigenvalues of C on the main diagonal. Since the eigenvalues on the main diagonal correspond to the variances of the single variables, the eigenvalues corresponding to the principal components can be used to determine their importance: they describe the variance of the dataset along the corresponding principal component, the eigenvector.

In order to reduce the dimensionality of the data to a meaningful subspace, the axes representing the least information (the smallest variance) of the dataset can be removed. These are the eigenvectors with the smallest eigenvalues. We can select the number of eigenvectors containing the fraction δ of the information of the original dataset by choosing k of the n eigenvectors, sorted by decreasing eigenvalue λi, such that

k = min{ r | (Σᵢ₌₁ʳ λi) / (Σᵢ₌₁ⁿ λi) ≥ δ }.   (2.30)

2.5.4 Data transformation

The following transformation is performed to express the original dataset in terms of the new basis system. Let X be the matrix of the original dataset with each single data vector (or sample image) arranged in one of the columns. Let P be the transformation matrix containing the new basis vectors in its columns. Then the transformation Pᵀ(X − X̄) = Y expresses the original dataset in the new coordinate system. Thus, Yi (the i-th column of Y) is the projection of the i-th data sample onto the basis given in the columns of P. To retransform the data from the eigenspace to the original sample space the inverse transformation X = P Y + X̄ is used.

2.6 The Least Squares Method

Let there be n data points (xi, yi), i ∈ Nn, forming an overdetermined system of equations with unknown parameters α and β:

yi = α + βxi,  i ∈ Nn.
(2.31)

To estimate α and β we want to minimize the l2-norm of the residual vector r:

F(α, β) := ‖r‖₂² = Σᵢ₌₁ⁿ rᵢ² = Σᵢ₌₁ⁿ (α + βxi − yi)².   (2.32)

To obtain the minimum we have to solve the following system of equations:

∂F(α, β)/∂α = Σᵢ₌₁ⁿ 2(α + βxi − yi) = 0,   (2.33)
∂F(α, β)/∂β = Σᵢ₌₁ⁿ 2(α + βxi − yi)xi = 0.   (2.34)

These equations are called normal equations and can be rewritten in the following way:

Σᵢ₌₁ⁿ (α + βxi) = Σᵢ₌₁ⁿ yi,   (2.35)
Σᵢ₌₁ⁿ (α + βxi)xi = Σᵢ₌₁ⁿ yi xi.   (2.36)

Equivalently, we can use matrices to rewrite these equations (writing Σ for Σᵢ₌₁ⁿ):

( n    Σxi  ) (α)   ( Σyi   )
( Σxi  Σxi² ) (β) = ( Σxiyi ).   (2.37)

It is obvious that the matrix of the normal equations is real-valued and symmetric. Let

A := (1 x1; ...; 1 xn) ∈ Rⁿˣ²,  y := (y1, ..., yn)ᵀ ∈ Rⁿ.   (2.38)

Then the system of normal equations in (2.37) can be written in the following way:

AᵀA (α, β)ᵀ = Aᵀy.   (2.39)

For the Hessian matrix H containing the second derivatives of F(α, β) we obtain

H = ( ∂²F/∂α²   ∂²F/∂α∂β )   ( 2n    2Σxi  )
    ( ∂²F/∂α∂β  ∂²F/∂β²  ) = ( 2Σxi  2Σxi² ) = 2AᵀA.   (2.40)

Furthermore, we have det(H) = det(2AᵀA) = 4 det(AᵀA) ≥ 0, and in case there is a pair i, j ∈ Nn of indices with xi ≠ xj, the determinant of H is positive. Thus, in this case the solution of the normal equations given in (2.39) yields the minimum residual solution for the parameters α and β.

Chapter 3 Optical Flow Estimation

Optical flow computation is usually based on the assumption that the brightness of a moving pixel remains constant over time. If x : [0, T] → R² describes the trajectory of a point of an object, we can model constant brightness as I(x(t), t) = const. A first order approximation yields the brightness constancy constraint equation (BCCE)

dI/dt = 0  ⇔  u · ∇x I + ∂I/∂t = 0,   (3.1)

where ∇ is the gradient operator with respect to the parameters given as indices.
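Looking back briefly at section 2.6, whose least squares method is used below to solve the local flow equations, the normal equations (2.39) can be sketched numerically (the data points are fabricated for the example):

```python
import numpy as np

# Fabricated data points (x_i, y_i) lying near the line y = 1.5 + 2x
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.4, 3.6, 5.5, 7.4, 9.6])

# Design matrix A from (2.38): a column of ones and the x_i
A = np.column_stack([np.ones_like(x), x])

# Solve the normal equations AᵀA (α, β)ᵀ = Aᵀy  (2.39)
alpha, beta = np.linalg.solve(A.T @ A, A.T @ y)
```

Since the xi are not all equal, det(AᵀA) > 0, so the 2×2 system has a unique solution, which coincides with the minimum-residual fit returned by a general least squares solver.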
3.1 Local Methods

In case of a local method, the optical flow is estimated for each pixel individually based on a specific flow model, which is only locally valid. Therefore, local methods are usually simple to implement and fast, as they can easily be parallelized. Furthermore, they have proven relatively robust to noise [56]. Yet, as only local image information is used for the computation, such methods often suffer from problems in regions where no or only little image information is available, such as homogeneous regions, or in case of aperture problems. In contrast, global methods solve the optical flow problem by minimizing an energy term formulated on the whole image domain.

3.1.1 The Lucas and Kanade Method

Lucas and Kanade [70] proposed to solve the optical flow problem by assuming brightness constancy and a constant flow within small neighborhoods. This leads to an overdetermined system of brightness constancy equations for each pixel,

A u = b,  with A := ∇x I and b := −∇t I   (3.2)

stacked row-wise over the pixels of the neighborhood. To solve this system of equations, the least squares method described in chapter 2 is used.

3.1.2 The Bigün Method

The problem with the method by Lucas and Kanade is that the least squares method assumes errors only on the right hand side of the equation system, i.e. in the observation vector, that is, in the temporal image gradient. Since errors also occur in the spatial image gradient, it would be more appropriate to minimize the error between the true and the measured spatio-temporal image gradients. This approach was proposed by Bigün [16] in 1991. Let D = [A, b] and r = [u, −1]ᵀ. Then the idea of the total least squares method is to solve the following optimization problem:

min ‖Dr‖₂²  s.t.  rᵀr = 1.   (3.3)

Using a Lagrange multiplier λ we obtain

L(r, λ) = ‖Dr‖₂² + λ(1 − rᵀr) = rᵀ DᵀD r + λ(1 − rᵀr),  with J := DᵀD,   (3.4)

and the system of equations

∂L/∂r = 2Jr − 2λr = 0,
∂L/∂λ = 1 − rᵀr = 0.
(3.5) (3.6)

Thus, the minimization of (3.3) reduces to an eigenvalue problem of the matrix J, which is called the structure tensor. As J = DᵀD is symmetric and positive semi-definite, its eigenvalues are real-valued and non-negative. Hence, to minimize (3.3) let r be a unit eigenvector of J with eigenvalue λ. Then we have

‖Dr‖₂² = rᵀJr = rᵀλr = λ.   (3.7)

Therefore, the minimum of (3.3) is obtained if r corresponds to the eigenvector of J with the smallest eigenvalue λ. To obtain the optical flow vector for the current pixel, the eigenvector has to be renormalized: it is divided by its last entry, which is removed afterwards.

3.1.3 Handling Covariances Caused by Derivative Filters

In general, one can say that established total least squares methods estimate the most likely corrections Ae and be to a given data matrix [A, b] perturbed by additive Gaussian noise, such that there exists a solution u with [A + Ae, b + be][u, −1]ᵀ = 0. In practice, regression imposes a more restrictive constraint, namely the existence of a solution x with [A + Ae]x = [b + be]. In addition, more complicated correlations arise canonically from the use of linear filters, e.g. derivative filters. In [4] we therefore propose a maximum likelihood estimator for regression in the general case of arbitrary positive definite covariance matrices. This leads to an unconstrained minimization of a multivariate polynomial, which can, in principle, be carried out by means of a Gröbner basis. There exist several extensions of the structure tensor method leading to more accurate results, for example the integration of brightness variations [50] and the consideration of outliers [6]. Other local methods have been proposed by Anandan [3], who uses a block matching approach to compute flow vectors, and Farnebäck [37], who introduces orientation tensors and a region segmentation method to obtain fast and accurate velocity estimates.
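A minimal sketch of both local estimators on synthetic per-pixel gradients; the gradient values are fabricated to satisfy the BCCE up to noise, whereas a real implementation would compute them with derivative filters:

```python
import numpy as np

rng = np.random.default_rng(2)
true_u = np.array([0.7, -0.3])   # fabricated ground truth flow

# Fabricated spatio-temporal gradients in a 5x5 neighborhood that satisfy
# the BCCE  Ix*u1 + Iy*u2 + It = 0 up to small noise.
A = rng.normal(size=(25, 2))                     # rows: (Ix, Iy) per pixel
b = A @ true_u + 0.01 * rng.normal(size=25)      # b = -It, see (3.2)

# Lucas-Kanade: least squares solution of A u = b  (3.2)
u_lk = np.linalg.solve(A.T @ A, A.T @ b)

# Bigün: total least squares via the structure tensor J = DᵀD  (3.4)
D = np.column_stack([A, b])                      # D = [A, b], r = [u, -1]ᵀ
J = D.T @ D
eigvals, eigvecs = np.linalg.eigh(J)             # ascending eigenvalues
r = eigvecs[:, 0]                                # smallest-eigenvalue vector
u_tls = r[:2] / (-r[2])                          # renormalize by last entry
```

On this well-conditioned, low-noise neighborhood both estimates recover the fabricated flow closely; the two methods differ mainly when the spatial gradients themselves are noisy.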
3.2 Global Methods

Global methods are formulated as energy optimization problems consisting of a data term and a regularization term. The data term usually enforces some constancy constraint, such as brightness or gradient constancy. The regularizers are used to obtain a spatio-temporal relation of neighboring flow estimates. In this way, information is 'transported' into regions where otherwise no or only little image information is available. This effect is called the filling-in effect of global methods. Advantages of global methods are that they yield dense flow fields and mostly avoid aperture problems. Disadvantages are that they are usually more complicated to implement than local methods and require a higher computational effort. Furthermore, they are more sensitive to noise [10]. Numerous global methods are known today; an overview can be found in [10, 74].

3.2.1 Horn and Schunck

The first global method was proposed in 1981 by Horn and Schunck. They minimized the following energy functional, consisting of the brightness constancy constraint (data term) and a simple smoothness assumption relating neighboring flow vectors (regularization term):

E(u) = ∫_Ω (Ix u1 + Iy u2 + It)² + λ (‖∇u1‖² + ‖∇u2‖²) dx dy.   (3.8)

The energy is minimized using the calculus of variations described in chapter 2, which leads to the following Euler-Lagrange equations:

(Ix u1 + Iy u2 + It) Ix − λ (u1xx + u1yy) = 0,   (3.9)
(Ix u1 + Iy u2 + It) Iy − λ (u2xx + u2yy) = 0.   (3.10)

To solve this linear system of equations, for example the conjugate gradient method or the Gauss-Seidel method in combination with successive overrelaxation (SOR) can be used.

3.2.2 Bruhn et al.

In [28], Bruhn et al. proposed the combined local global (CLG) method, which integrates the advantages of local methods into a global framework. There are two variants of this method: a linear one and a nonlinear one.
The nonlinear method explicitly allows for discontinuities in the flow field. Let Jρ(∇3I) = Kρ ∗ (∇3I)(∇3I)ᵀ denote the structure tensor, where Kρ∗ indicates convolution with a Gaussian kernel of standard deviation ρ, and let ũ denote the vector u with an additional last coordinate set to 1. Then the following energy is minimized in the linear case:

E(u) = ∫_Ω ũᵀ Jρ(∇3I) ũ + λ (‖∇u1‖² + ‖∇u2‖²) dx dy.   (3.11)

We obtain the following system of Euler-Lagrange equations (with Jkl the entries of Jρ(∇3I)):

J11 u1 + J12 u2 + J13 − λ (u1xx + u1yy) = 0,   (3.12)
J21 u1 + J22 u2 + J23 − λ (u2xx + u2yy) = 0,   (3.13)

which can again be solved using linear solvers such as SOR. In the nonlinear case, a nonlinear function ψ is introduced to handle outliers in the data term as well as in the regularization term:

E(u) = ∫_Ω ψ1(ũᵀ Jρ(∇3I) ũ) + λ ψ2(‖∇u1‖² + ‖∇u2‖²) dx dy.   (3.14)

An example for ψ is Charbonnier's function, which is convex in s:

ψ(s²) = 2β² √(1 + s²/β²).

In this case we obtain a nonlinear system of Euler-Lagrange equations:

ψ1'(ũᵀJũ)(J11 u1 + J12 u2 + J13) − λ div(ψ2'(‖∇u‖²) ∇u1) = 0,   (3.15)
ψ1'(ũᵀJũ)(J21 u1 + J22 u2 + J23) − λ div(ψ2'(‖∇u‖²) ∇u2) = 0.

This system can be solved using gradient descent methods.

Chapter 4 Error Analysis

4.1 Introduction

The estimation of optical flow has been a fundamental problem in computer vision ever since the pioneering work of Fennema and Thompson [39]. Over the last three decades, the importance of this problem has spawned the development of algorithms and extended their usage to a wide range of disciplines, ranging from robot vision to scientific and measurement applications. Along with the diversity of methods comes the need for a thorough quantitative evaluation of their accuracy and applicability to different scenes. In order to evaluate confidence measures, which predict the error of computed flow fields, commonly used error measures for optical flow fields first need to be introduced and analyzed.
Such error measures are of high importance for understanding the strengths and weaknesses of these algorithms, for scientific and industrial applications, and for confidence estimation. However, evaluation has mostly been limited to the indication of the average angular error and its variance for a small number of highly artificial test sequences. I propose to comprehend the evaluation of motion estimators as a sampling from a joint probability distribution, namely the joint distribution of the true and the estimated flow, as well as local gray value neighborhoods. Marginals and conditionals of this distribution allow for a detailed assessment of optical flow algorithms. For the ranking of motion estimators, a new indicator is suggested: a scalar measure which overcomes conceptual difficulties of the average angular error. The expressiveness of the proposed method is shown for five different flow estimators and five test sequences.

4.1.1 Motivation

Seminal work has been dedicated to the task of accuracy evaluation by Barron and Fleet [10], followed by Mitiche and Bouthemy [74], Stiller and Konrad [94], Kalkan et al. [59] and others. Expressive error measures, along with a detailed investigation of estimated flow fields in conjunction with the image sequences used in testing, help to better understand and judge the specific strengths and weaknesses of flow estimators. The choice of ground truth test sequences has tremendous influence on the significance of comparative studies. So far, most quantitative studies on optical flow are restricted to the indication of the average angular error [42] and its variance. This error is defined as the angle between two three-dimensional vectors, namely the true and the estimated displacement vector augmented to 3D by setting the third coordinate to one. Recently, Baker et al. [8] have suggested to report quantiles of the statistics of the endpoint error, i.e.
the length of the difference vector between the estimated and the true flow vector. Unlike relative error measures, which suffer from possible divisions by zero, the angular error and the endpoint error are well-defined. This advantage is complemented by the representation of the error in a single scalar value, which is usually computed by averaging over the whole flow field. A careful analysis reveals several crucial drawbacks of the angular error, which are summarized in section 4.2. A generalized view on the evaluation of optical flow estimators is proposed, which is essentially understood as a sampling from the joint PDF of the ground truth and computed flow field, as well as the gray value structure of local pixel neighborhoods. Relevant properties of the estimator can be derived as marginals, conditionals and scalar descriptors from this distribution. The more different test sequences are used to sample from the PDF, the more generally valid the statements on the accuracy of flow estimators become. By means of marginalization and conditioning of the PDF, the relation between error measures and local gray value structure can be assessed. E.g., one can ask for the distribution of a scalar error measure within homogeneous regions, or conversely for the gray value structure of local neighborhoods for a given error interval. The results of the operations on the PDF can be examined visually and quantitatively. Hence, new insights into the quality and drawbacks of a single estimator as well as quantitative comparisons between different estimators can be obtained. An important aim of comparative studies is the ranking of optical flow estimators. While complex descriptors of error distributions are useful to reveal detailed properties of the estimator, they are inadequate for ranking. Instead, a scalar indicator is desirable to impose an order on the estimators.
Based on the endpoint error, which naturally results from the PDF framework, the integral over the cumulative distribution function is computed in order to rank different methods. In the results section the quality of five well-established optical flow estimators is evaluated and compared. In this way, it is demonstrated how the proposed general framework can be applied to obtain all kinds of statements on the quality of an estimator based on a single PDF. In addition, results for known estimators are provided, against which the quality of new estimators can be compared in the future.

4.1.2 Related Work

The following measures have been proposed as measures of discrepancy between the ground truth vector g ∈ Rᵈ and the estimated flow vector u ∈ Rᵈ.

Angular Error

The most widely used error measure is the angular error,

Eα(g(x), u(x)) = (180/π) arccos( (ũ(x) · g̃(x)) / (‖ũ(x)‖ ‖g̃(x)‖) ),   (4.1)

which was suggested by Barron and Fleet [10] and dates back to prior work by Fleet and Jepson [42]. A brief discussion of the angular error can be found in [51].

Endpoint Error

The length of the difference vector between the true and the computed vector,

E2(g(x), u(x)) = ‖u(x) − g(x)‖,   (4.2)

was proposed by Otte and Nagel [80] and revived as the endpoint error by Baker et al. [8]. This measure discounts errors in regions of small flow.

Angle Error

The angle between the correct and the estimated flow vector,

Eφ(g(x), u(x)) = (180/π) arccos( (u(x) · g(x)) / (‖u(x)‖ ‖g(x)‖) ),   (4.3)

is usually referred to as the angle error. Since it does not take the error in length into account, it is usually indicated together with the magnitude error of the flow vector.

Magnitude Error

The magnitude error is defined as the absolute difference of the magnitudes,

Em(g(x), u(x)) = | ‖u(x)‖ − ‖g(x)‖ |.   (4.4)

This error measure does not account for errors in direction and, hence, is usually indicated together with the angle error.
Relative Magnitude Error

The relative magnitude error

Eµ(g(x), u(x)) = | ‖u(x)‖ − ‖g(x)‖ | / ‖g(x)‖   (4.5)

relates the absolute magnitude error to the length of the ground truth vector. Due to divisions by values close to zero this error measure is problematic for ground truth vectors of very small magnitude [73].

Error Normal to the Gradient

In order to measure how effectively an algorithm compensates for the aperture problem, Galvin et al. [44] propose to measure the error normal to the gradient,

E⊥(g(x), u(x)) = ‖(u(x) − g(x)) · f⊥(x)‖,  f⊥(x) = (−∂yI(x), ∂xI(x))ᵀ.   (4.6)

Gray Value Differences

Error measures based on the squared gray value differences between the original frame and the frame warped by the computed motion field have been proposed by Baker et al. [8]. The problem with these measures is their strong dependence on a frame interpolation algorithm. Furthermore, the error depends on the homogeneity of the scene, because any incorrect flow vector pointing to a location with identical gray value is considered correct.

4.1.3 Contribution

My contribution is threefold. First, due to several shortcomings of the angular error a generalized evaluation of optical flow methods is suggested, which is based on marginals and conditionals of the PDF comprising the ground truth flow, the computed flow and the gray value neighborhood. To compute the PDF as many test sequences as possible should be used to gain independence from the image sequence. In this way, many questions on the quality of estimators can be answered, e.g. questions concerning the best estimator in case of high velocities or typical gray value structures in case of high errors. Thus, an evaluation method is suggested which comprises previously used analysis methods and at the same time allows for various, more specific questions.
The results can be visually examined in order to gain new insights into special problems or advantages of estimators, but quantitative statements can be made as well to allow for comparisons between estimators.

Second, a scalar indicator is proposed to rank different optical flow methods. This indicator is based on the cumulative distribution function of the absolute error of the flow field.

Figure 4.1: Top: Angular error for (u1, u2)^T ∈ [−10, 10] × [−10, 10] for different ground truth lengths g = L · (1, 1)^T. Bottom: Angular error isocontours for increasing angle error ([0, π], horizontal axis) and relative magnitude error ([0, 2], vertical axis), for different ground truth lengths L.

And third, results are shown for the comparison of five known optical flow estimators within the proposed framework. These results can be understood as a starting point for the comparison between known and new estimators. Besides, interesting statements on specific questions such as the most reliable method in case of high velocities are made. This approach has been submitted [63].

4.2 Discussion of the Angular Error

The angular error has been used as a scalar indicator for the ranking of optical flow estimators.
However, given the accuracy of recent algorithms, conceptual shortcomings of this measure become apparent. This section is devoted to the discussion of these drawbacks.

The angular error depends non-linearly on the true magnitude. The top row of Figure 4.1 depicts the angular error for all kinds of flow vectors within the interval [−10, 10] × [−10, 10] for a given ground truth flow vector L · (1, 1)^T. Here L determines the length of the ground truth vector and is increased from left to right. It can be seen that, for a small magnitude L of the true displacement, the angular error is governed by the deviation in magnitude, because the error grows homogeneously in all directions. Conversely, for higher magnitudes L of the true displacement, the angle error dominates. This relation is non-linear and rather arbitrary.

The angular error is bounded for magnitude errors, not for angle errors. Let u(x) be a computed displacement vector, which is varied in length by multiplication with a factor L, i.e. its spatio-temporal vector is (L u1(x), L u2(x), 1)^T. Then the limit for L tending to infinity is given by

    lim_{L→∞} E_α(x) = lim_{L→∞} (180/π) arccos( (L u(x) · g(x) + 1) / (√(L² ‖u(x)‖² + 1) ‖g̃(x)‖) )
                     = (180/π) arccos( (u(x) · g(x)) / (‖u(x)‖ ‖g̃(x)‖) ) .

Figure 4.2: The angular error is bounded with respect to magnitude errors, but not with respect to angle errors.

On the right hand side of Figure 4.2 the angular error is shown for a true displacement vector (1, 1)^T and a computed vector with a) increasing angle error and b) increasing magnitude error.
The plot shows that the angular error is bounded for increasing deviations in magnitude, but not for deviations in angle. Hence, a vector of infinite length with zero angle error is seen as equally “correct” as a vector with thirty degrees angle error.

The angular error is not invariant against the sign of magnitude deviations. An estimated flow vector parallel to the true displacement but being too short gives rise to a different angular error than a parallel vector that overestimates the displacement by the same percentage (see bottom row of Figure 4.1). Yet, both cases should yield the same error value, since a vector that is 10% too short and a vector that is 10% too long are, in fact, equally “correct”.

The influences of the magnitude and angle error are interdependent. The bottom row of Figure 4.1 shows isocontours for angle errors around 90 degrees, which are nearly parallel to the vertical axis. This means that for angle errors around 90 degrees the magnitude error does not influence the angular error at all. For smaller and larger angle errors the influence of the magnitude error increases again. This problem is especially apparent for larger speeds of the ground truth vector. In contrast, for angle errors around 0 and 180 degrees the isocontours are almost parallel to the horizontal axis. This shows that for very small and very large angle errors the angular error almost exclusively depends on the magnitude error.

The average angular error and its variance are insufficient characteristics. Another problem when comparing the quality of different optical flow techniques is that papers in this area are mostly limited to the indication of the average angular error and its variance, the first and second order moments of the error distribution. For comparisons and rankings of different methods, the variance is often not even taken into account.
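The sign asymmetry is easy to verify numerically. A small sketch, assuming the usual spatio-temporal embedding (u1, u2, 1)^T of the angular error:

```python
import math

def angular_error(g, u):
    """Angular error in degrees with spatio-temporal vectors (u1, u2, 1)^T."""
    dot = u[0] * g[0] + u[1] * g[1] + 1.0
    nu = math.sqrt(u[0] ** 2 + u[1] ** 2 + 1.0)
    ng = math.sqrt(g[0] ** 2 + g[1] ** 2 + 1.0)
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / (nu * ng)))))

g = (1.0, 1.0)
too_short = (0.9, 0.9)   # 10% too short, parallel to g
too_long = (1.1, 1.1)    # 10% too long, parallel to g

# both vectors are "equally wrong", yet the angular errors differ:
# underestimation is penalized more strongly than overestimation
e_short = angular_error(g, too_short)
e_long = angular_error(g, too_long)
```

Running this gives two clearly different error values for the two equally wrong vectors, illustrating the lack of sign invariance discussed above.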
This is insufficient because specific problems or advantages are neglected, and a single outlier can distort the comparison.

Due to these shortcomings of the angular error, a new policy and error measure for the evaluation of optical flow estimators is proposed.

4.3 The Joint Distribution of Optical Flow Estimation

I regard the evaluation of optical flow estimates from a set of test sequences as sampling from a joint PDF, from which relevant properties of the estimator can be derived as marginals, conditionals and scalar descriptors. The application of any motion estimator to (possibly several) sequences of images results in a set of samples, one for each pixel, which are assumed independent:

    ∀x ∈ D : (g1(x), g2(x), u1(x), u2(x), J1(x), . . . , Jn(x)) ∈ R^(n+4) .    (4.7)

Herein, (g1(x), g2(x))^T and (u1(x), u2(x))^T shall denote the true and the estimated displacement field, respectively, and (J1(x), . . . , Jn(x))^T shall be the vector of gray values in a local neighborhood of the pixel x. These are samples from the PDF which would result from the application of the estimator to all sequences of images. Although this PDF is unavailable in practice, approximations based on the set of samples can be obtained by means of density estimation. To this end, exemplary results are shown for Parzen kernel density estimates [82]. Once an estimate of the PDF has been computed, marginal and conditional distributions as well as scalar descriptors derived from this PDF provide a generic and direct way of evaluating the quality of optical flow estimators.

4.3.1 Marginals and Conditionals

Image Sequences
The representativeness of the estimated PDF highly depends on the choice of test sequences. It is, therefore, important to consider the distribution of the true displacement accumulated over the set of test sequences, ideally in conjunction with spatio-temporal gray value context, namely to consider the marginal distribution of (g1 , g2 , J1 , . . .
, Jn ).

Based on this distribution, marginals and conditionals can be computed, which allow for answers to various kinds of questions concerning the quality of the optical flow estimator. To this end, a low-dimensional representation of the gray values of local neighborhoods can be obtained, for example, by PCA [57], by ICA [32], or by the local variance and local entropy [90].

Flow Field Discrepancy
The discrepancy between the true and the estimated flow field is a very important factor in the evaluation of optical flow estimators. A thorough quantitative analysis of this discrepancy is, therefore, indispensable. Let us consider the bivariate PDF of (u1 − g1, u2 − g2)^T, i.e. the distribution of the difference vector between the true and the estimated flow. Due to its fundamental importance, I recommend that estimates of this distribution be visualized (e.g. Figure 4.4) when analyzing the quality of optical flow estimators. In addition, the (bivariate) distribution of the magnitudes of the true and the estimated flow field is of interest (e.g. Figure 4.6), namely the marginal distribution of (‖u‖, ‖g‖). In this way, systematic errors for specific lengths can be observed for different optical flow methods. For special applications, e.g. driver assistance systems, the performance of the methods for very large flow magnitudes is of specific interest and can be investigated in this way.

Causes of Discrepancy
Once the discrepancy between the true and the estimated flow field has been quantified, a vital question is in which local context the highest errors occur. In fact, conditioning the joint PDF on a certain interval of the endpoint error (4.2) yields a PDF of local neighborhoods of gray values. Conversely, a distribution of the endpoint error is obtained by conditioning the joint PDF on an interval of a scalar texture descriptor (e.g., the local gray value entropy).
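A Parzen estimate of the bivariate deviation distribution can be sketched in a few lines. This is a minimal version with a Gaussian kernel and a hand-picked bandwidth; both are assumptions for illustration, not the thesis' exact settings:

```python
import math

def parzen_density(samples, x, y, h=0.05):
    """Parzen window estimate of a bivariate PDF at (x, y).

    samples: deviation vectors (u1 - g1, u2 - g2), one per pixel;
    h: Gaussian kernel bandwidth (assumed, to be tuned per data set)."""
    norm = 1.0 / (len(samples) * 2.0 * math.pi * h * h)
    total = 0.0
    for (dx, dy) in samples:
        total += math.exp(-((x - dx) ** 2 + (y - dy) ** 2) / (2.0 * h * h))
    return norm * total

# deviations of a hypothetical estimator, clustered near the origin
devs = [(0.01, -0.02), (0.0, 0.01), (-0.02, 0.0), (0.03, 0.02)]
# for an accurate estimator the estimated density peaks near (0, 0)
p_center = parzen_density(devs, 0.0, 0.0)
p_far = parzen_density(devs, 0.4, 0.4)
```

Evaluating the estimate on a regular grid yields exactly the kind of deviation plots shown later in Figure 4.4.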
In this way the distribution of the error in case of special gray value structures becomes apparent.

Ranking of Estimators
To obtain a scalar indicator suited for comparing different optical flow methods, I propose to compare the integral over the cumulative distribution function of the endpoint error (4.2). This scalar indicator is not bounded with respect to large magnitude errors, it is invariant against the sign of deviation in magnitude, it does not depend on the ground truth magnitude, and it is a statistical measure reflecting the error distribution in a more detailed way than the indication of average error and variance. Hence, this indicator evades the difficulties reported for the average angular error.

4.4 Experiments and Results

Five optical flow estimators and five standard test sequences are considered, and various information from the proposed joint PDF is derived. The following results are intended as a demonstration of the expressiveness of the proposed method rather than as a comprehensive assessment of recent motion estimators. The algorithms by Nir et al. [78], Farnebäck [37], Bruhn et al. [28] (2D linear CLG), Horn and Schunck [53] and Bigün et al. [16] (for some of them compare Chapter 3) are considered, which are applied to the Marble [80], Street [73], Office [73], Yosemite [52] and Rubber Whale [8] sequences. For the method of Nir et al., the estimated flow field for the Yosemite sequence was kindly provided by the authors. For the Farnebäck method the Matlab implementation provided online by the authors [35] is used. The other algorithms had to be re-implemented, and relevant parameters were chosen as follows. The regularization strength, the Gaussian pre-smoothing variance and the spatial integration of the structure tensor were jointly optimized for each sequence. The structure tensor was computed by means of isotropy-optimized 7×7×7 Scharr filters [89].
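The proposed scalar indicator, the integral over the empirical cdf of the endpoint error, can be sketched as follows. The cutoff and bin count are assumptions for illustration; a Riemann sum over equally spaced thresholds approximates the integral:

```python
def cdf_integral(errors, cutoff, bins=1000):
    """Integral over the empirical cdf of the endpoint error on [0, cutoff].

    A larger value means the error mass sits closer to zero,
    i.e. the estimator is more accurate."""
    n = len(errors)
    step = cutoff / bins
    total = 0.0
    for i in range(1, bins + 1):
        t = i * step
        cdf_t = sum(1 for e in errors if e <= t) / n  # empirical cdf at t
        total += cdf_t * step
    return total

# a hypothetical accurate and a hypothetical sloppy estimator
good = [0.02, 0.05, 0.01, 0.04]
bad = [0.5, 0.8, 0.3, 1.2]
# the better estimator attains the larger integral
```

The ordering induced by this quantity is what the tables in Figure 4.5 report alongside the average errors.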
Unless otherwise noted, results are accumulated over all considered test sequences. Note especially that for Nir’s method, results are restricted to the Yosemite sequence. Where errors are visualized by shading, the scale of the shading is logarithmic.

Comparing the Marginal Distributions of (u) and (g)
To begin with, by marginalizing over the gray values J1, . . . , Jn and either the estimated flow u or the true flow g, the nature of these flow fields becomes apparent and the two can be compared (see Figure 4.3). In fact, these distributions are quite similar, as can be expected from a rather accurate flow estimator such as Farnebäck’s. Deviations are mainly due to inaccurate results obtained on the Rubber Whale sequence.

Comparing the Marginal Distribution of (u − g)
In order to assess the accuracy precisely, I marginalize over the gray values J1, . . . , Jn and analyze the distribution of the deviations (d1, d2)^T = (u1 − g1, u2 − g2)^T. Figure 4.4 shows Parzen estimates of the x and y velocity deviations. For a perfect flow estimator, one would expect a single point at (0, 0). Farnebäck’s and Nir’s methods outperform the other algorithms, as their distributions are more centered than the others. It can be seen that the distributions of the deviations are similar for local (Farnebäck, Bigün) and global (Bruhn, Horn-Schunck) methods, especially when all test sequences are used. Local methods yield distributions that tend to look Gaussian and exhibit only a small bias in the vertical component. In contrast, global methods often yield systematic deviations in specific directions and exhibit a larger bias in the vertical component. These systematic errors for global methods may be due to regularization. Yet, for the Yosemite sequence the methods tend to underestimate the vertical component.
Furthermore, the CLG and Horn-Schunck methods mostly either underestimate the vertical or overestimate the horizontal component. The Nir method yields the most centered distribution but also shows systematic errors in specific directions.

Figure 4.3: a) Distribution of ground truth velocities for all test sequences, b) Distribution of flow estimates from Farnebäck’s method for all test sequences.

4.4.1 Comparing Scalar Descriptors

A numerical comparison of different flow estimators based on a single value is problematic, since it is hardly possible to include the most important characteristics of the flow field in one scalar value only. To avoid simple averaging of the error, I propose to integrate over the cumulative distribution function of the endpoint error (4.2). Figure 4.5 shows the cumulative distribution functions of the angular and the endpoint error based on all test sequences and based on the Yosemite sequence only. A bin size of 0.0005 was used to compute the cumulative distribution function. The corresponding tables in the same figure compare the integral over the cdf to the respective error measurement. The resulting order naturally corresponds to that inferred by the angular error, which again confirms the known quality ranking of these estimators. However, the relations between the results of different methods differ, e.g. when comparing the results for all sequences for the methods by Horn and Schunck and by Bigün. Here, the structure tensor result shows an angular error that exceeds that of the Horn-Schunck method by 2.79. In contrast, the integrals over the corresponding cdfs differ only marginally.
Figure 4.4: Parzen estimate of the distribution of the x and y velocity deviations for different flow estimators (Farnebäck, Bruhn et al., Horn & Schunck, Bigün et al., Nir et al.) over all test sequences (left) and only Yosemite (right).

[Figure 4.5 panels: “all: endpoint error cdf”, “Yosemite: endpoint error cdf”, “all: angular error cdf”, “Yosemite: angular error cdf”, each comparing Bruhn et al., Bigün et al., Farnebäck, Horn & Schunck and Nir et al.]
                         Yosemite                          all sequences
                 AAE    integral   AEE    integral     AAE    integral   AEE    integral
Nir et al.       0.82   38359.9    0.04   19922.2      –      –          –      –
Farnebäck        1.14   37764.6    0.06   19874.6      12.75  59407.7    1.81   17192.6
Bruhn et al.     1.68   36131.4    0.14   19711.1      6.65   68225.8    0.23   19542.3
Horn-Schunck     2.58   35127.9    0.16   19671.6      7.77   65760.2    0.27   19458.4
Bigün et al.     4.32   31882.0    0.40   19212.1      10.56  65632.9    0.63   19225.3

Figure 4.5: Cumulative distribution functions (cdf) of the angular and endpoint error for the compared flow methods and table displaying the average angular error (AAE), the average endpoint error (AEE) and the integral over the cdf for the Yosemite sequence and for all test sequences.

4.4.2 The Marginal Distribution of (‖u‖, ‖g‖)

To test optical flow algorithms for a bias in the magnitude of the estimated flow field, I marginalize over all gray values and estimate the joint distribution of the magnitudes of the true and the estimated displacement (Figure 4.6). Note that, for an ideal flow estimator, the density should be concentrated around the angle bisector. For none of the methods can a severe bias be observed for the Yosemite sequence. Yet, except for the Nir method, all estimators exhibit difficulties with large flows. This can be observed in Figure 4.6, because the scatter for larger ground truth magnitudes is so severe that the distribution becomes invisible. This is due to the latter methods linearizing the brightness constancy equation, which becomes less reliable for larger flow vectors. The vertical line for ground truth lengths around 1.25 in the Farnebäck result is probably due to the nature of the errors in this flow field, as these almost exclusively occur at the principal motion boundary, where the flow length is approximately 1.25.
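The magnitude comparison above amounts to a 2D histogram over (‖g‖, ‖u‖). A small sketch; the bin count and magnitude range are assumptions for illustration:

```python
import math

def magnitude_histogram(gt, est, max_len=5.0, bins=10):
    """Joint histogram of true vs. estimated flow magnitudes.

    An unbiased estimator concentrates its mass on the diagonal
    cells, i.e. around the angle bisector of the magnitude plane."""
    hist = [[0] * bins for _ in range(bins)]
    scale = bins / max_len
    for (g, u) in zip(gt, est):
        i = min(bins - 1, int(math.hypot(*g) * scale))
        j = min(bins - 1, int(math.hypot(*u) * scale))
        hist[i][j] += 1
    return hist

# a hypothetical estimator that reproduces the true magnitudes exactly
gt = [(1.0, 0.0), (0.0, 2.0), (3.0, 0.0)]
est = [(0.0, 1.0), (2.0, 0.0), (0.0, 3.0)]
h = magnitude_histogram(gt, est)
# all mass lies on the diagonal cells (2,2), (4,4) and (6,6)
```

Systematic over- or underestimation of specific lengths would show up as mass above or below the diagonal, which is exactly what Figure 4.6 visualizes.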
Figure 4.6: Distribution of the ground truth flow length (a) compared to the computed flow length (b–f) for all tested methods (Nir et al., Farnebäck, Bruhn et al., Horn-Schunck, Bigün et al.) on the Yosemite sequence. The results show that only Nir’s method yields reliable results for large displacements.

4.4.3 Conditioning on Error Intervals

In contrast to common accuracy measurement techniques, the proposed joint PDF allows for investigating the causes of discrepancy. I condition on a certain endpoint error interval and marginalize over the flow vectors, and thereby obtain the distribution of the corresponding gray value neighborhoods. As the result is multidimensional, PCA is applied in order to obtain a lower-dimensional subspace, which still preserves most of the variance of the data. The resulting eigencomponents correspond to the main axes of the PDF. Figure 4.7 shows the five eigencomponents associated with the highest variance, for four estimators based on the Marble sequence. The PDF was conditioned on very high (left) and very low (right) endpoint errors, respectively.
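Conditioning on an error interval and extracting principal components of the remaining gray value patches can be sketched as follows. This is a minimal version using NumPy’s SVD; the patch size, the error interval and the random test data are assumptions for illustration:

```python
import numpy as np

def eigencomponents(patches, errors, lo, hi, k=5):
    """Principal components of gray value neighborhoods whose
    endpoint error falls into the interval [lo, hi].

    patches: (N, d) array, one flattened neighborhood per pixel;
    errors:  (N,) array of endpoint errors;
    returns the k eigencomponents with the highest variance."""
    mask = (errors >= lo) & (errors <= hi)   # condition on the error interval
    sel = patches[mask]
    centered = sel - sel.mean(axis=0)        # PCA on the conditioned samples
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k]                            # rows are the eigencomponents

rng = np.random.default_rng(0)
patches = rng.normal(size=(100, 9))          # hypothetical 3x3 neighborhoods
errors = rng.uniform(0.0, 1.0, size=100)
comps = eigencomponents(patches, errors, 0.5, 1.0, k=5)
```

Reshaping each returned row back to the neighborhood size yields images like the eigencomponents displayed in Figure 4.7.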
It can be seen that the eigencomponents of the structure tensor method significantly differ from those of the other algorithms. The former clearly indicate that edges pose problems to the structure tensor, the reason being that these often coincide with discontinuities of the displacement field, which violate the assumption of constant flow within the integration area. In contrast, well-textured or smooth regions yield low error values. For the Farnebäck method the results are similar. Here, edges pose problems as well, whereas highly textured regions yield lower errors. The first three eigencomponents are even almost identical for the structure tensor method by Bigün and the Farnebäck method. The second and third eigencomponents of Farnebäck’s method as well as the second to fifth components of Bigün’s method hint at high errors in case of aperture problems. The CLG method by Bruhn et al. also shows problems in case of edges due to motion boundaries, which are oversmoothed by the regularization term. In contrast, for the Horn and Schunck method the eigencomponents of the PDF conditioned on high and low endpoint errors look very similar. This suggests that the gray value structure has only a minor influence on errors. This may be due to the regularization, which is independent of the image sequence in case of the Horn-Schunck method.

4.4.4 Conditioning on Gray Value Structures

Finally, I condition on the gray value structure and investigate the distribution of the endpoint error. Using the standard deviation of gray values in local neighborhoods as well as the local gray value entropy, bivariate densities relating these scalar descriptors of texture to the endpoint error (cf. Figure 4.8) are obtained. It becomes apparent that the endpoint error is larger for high gray value entropy and for small gray value standard deviations. That means that high-frequency structures as well as rather low-frequency regions tend to produce higher endpoint errors.
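The scalar texture descriptor used here, the local gray value entropy, can be sketched directly; the neighborhood size and histogram bin count are assumptions for illustration:

```python
import math
from collections import Counter

def local_entropy(values, bins=16):
    """Shannon entropy (in bits) of the gray value histogram
    of a local neighborhood; values are gray values in [0, 255]."""
    counts = Counter(int(v * bins / 256) for v in values)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# a flat patch carries no information, a textured one does
flat = [128] * 9
textured = [0, 32, 64, 96, 128, 160, 192, 224, 255]
e_flat = local_entropy(flat)       # 0 bits
e_tex = local_entropy(textured)
```

Accumulating pairs (entropy, endpoint error) over all pixels into a bivariate histogram then yields plots of the kind shown in Figure 4.8.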
Many of the statements made for single estimators have been known before. However, automatically computing qualitative and quantitative results at the same time within a unified framework over a number of test sequences is a new concept and will help to improve the analysis of optical flow estimators.

Figure 4.7: Eigencomponents with the highest variance of the PDF conditioned on high endpoint errors (left column) and low endpoint errors (right column) for four different estimators (Farnebäck, Bruhn et al., Horn & Schunck, Bigün et al.) based on the Marble sequence.

Figure 4.8: Left: Distribution of the neighborhood entropy and the endpoint error. Right: Distribution of the neighborhood gray value standard deviation and the endpoint error, both for the Nir method.

4.5 Summary and Conclusion

In this chapter I proposed to analyze the accuracy of optical flow estimators within a uniform framework by means of a single joint probability distribution over gray value neighborhoods, ground truth and computed flow fields. I have shown that, by means of marginalization and conditioning, interesting additional information on the quality and properties of optical flow estimators can be obtained from the suggested distribution. For example, the results showed that all estimators except the Nir method have difficulties with large displacements. Furthermore, principal components showing typical gray value structures in case of very high or very low endpoint errors were obtained.
By this statistical viewpoint, I hope to further improve optical flow evaluation methods by a) systematically analyzing certain problematic areas such as large displacements through conditioning and marginalization, b) making the evaluation of optical flow estimators independent of the underlying test sequences and c) offering an improved scalar error measure in order to overcome the described shortcomings of the average angular error. As all statistics can be computed on any data sets, this approach becomes highly adaptable to application-dependent requirements, e.g. car driving sequences.

The method was applied to the analysis of five exemplary optical flow estimators on five different test sequences. For each method the distribution of the velocity deviations was depicted, and the proposed scalar-valued indicator was computed, which allowed for a ranking of the analyzed estimators. Here, the order of the methods was similar to that inferred by the angular error, but not identical. By means of the proposed analysis method, many insights into the quality of the tested flow estimators could be obtained.

Chapter 5

Predictability and Situation Measures

5.1 Introduction

Even though many optical flow estimators exist, none of them solves the optical flow problem satisfactorily. In fact, all of these methods are prone to errors in specific situations. Hence, it is important to identify the unreliable flow vectors prior to further processing steps. To this end, situation and confidence measures can be used. Both types of measures assign a level of reliability to each single motion vector. Situation measures mostly take only the image sequence into account and judge the hypothetical complexity of estimating the optical flow accurately based on the image data. They can be defined as mappings from the image domain and the image intensity to the interval of confidence [0, 1], where 1 stands for high and 0 for low reliability:

    ϕ : D × I → [0, 1] .
(5.1)

In contrast, confidence measures evaluate the reliability of a given optical flow field and, thus, map from the image domain, the image intensity and the flow field to the interval [0, 1]:

    ϕ : D × I × R^d → [0, 1] .    (5.2)

There are several fields of application for situation and confidence measures. In general, both kinds of measures are important for every application that uses optical flow and whose results depend to any extent on the accuracy of the displacement field. There are mainly two large fields of application for optical flow methods:

• areas where an accurate result is important, such as medical applications [49], data compression [43] or particle image velocimetry [83],

• areas where real-time optical flow results are required, such as robot navigation [67] and pedestrian and vehicle tracking [47].

For both fields accuracy measurements are valuable: in applications requiring accuracy, the results can be improved afterwards, and in real-time applications less accurate and, thus, faster flow computation methods can be used. Reliable situation and confidence measures also provide valuable information for the improvement of optical flow methods themselves. For example, displacement vectors classified as unreliable could be left out of the result, leading to a sparser but more accurate displacement field. Here it might also make sense to ignore these regions completely during the flow field computation or to derive flow information from surrounding pixels, where the flow can be estimated reliably [92]. Another possibility to improve the field in global methods could be to let the parameter that controls the smoothness of the field depend on the confidence or situation. In this way, vectors with higher confidence could exert a stronger influence on vectors with low confidence, and the smoothing of vectors with high confidence could be reduced.
In this chapter I classify, analyze and compare situation measures based on the intrinsic dimension of the image sequence (see Chapter 2.4) they examine. These measures can be successfully applied to the recognition of aperture problems, homogeneous regions and occlusions as well as to the detection of locations where the flow vector can be computed reliably. Since these measures are affected by image noise, their stability for increasing noise levels is also investigated.

5.1.1 Motivation

As described before in Chapter 2.4, the intrinsic dimension of locally confined regions of the image sequence is important for the assessment of the accuracy of optical flow fields. The relationship between all situations based on their intrinsic dimension is shown in Figure 5.1. The first distinction, between intrinsic dimension two and three, is characterized by the existence of any spatial and/or temporal directed structure. If movement is present, there is always a temporally directed structure in the image marking the trajectory of the object. However, directed structures can in general also be spatially directed. In this case, intrinsic dimensions smaller than two are observed, which lead to aperture problems (edges or homogeneous regions). Therefore, movement can only be identified reliably in the presence of a temporal directed structure and the simultaneous absence of a spatial directed structure, that is, in the case of intrinsic dimension two. The problem with the assignment of reliability levels to specific intrinsic dimensions is the case of occlusion, which increases the intrinsic dimension by one, as explained in Chapter 2.4. In order to examine the degree of correctness of the flow field, the augmented intrinsic dimension of occluded structures is undesired, as it leads to ambiguous statements on the feasibility of accurate flow computation. For example, in the case of intrinsic dimension two, one either observes a uniquely defined translation and flow
vector or an occluded aperture problem. In the first case, the flow can be computed reliably, while in the second case it cannot. The exception of occluded regions with lower intrinsic dimension, therefore, interferes with the notion of reliability.

Figure 5.1: Tree and diagram illustrating the relation between all situations that appear in optical flow problems. The situations of augmented intrinsic dimensions due to occlusions are neglected for the sake of clarity.

Table 5.1 shows different situations occurring in image sequences and the corresponding intrinsic dimensions. As the intrinsic dimension is ambiguous due to occlusions, it is important to detect occlusions in order to resolve this situation.

    intrinsic dimension    image sequence situation
    i0d                    aperture problem (homogeneous region)
    i1d                    aperture problem (edge); occluded homogeneous region
    i2d                    unique translation of corners; occluded edge aperture problem
    i3d                    undirected or transparent structures; noise; occluded translation of corners

    Table 5.1: Typical situations in optical flow problems assigned to intrinsic dimensions.

Consequently, situation measures for the detection of the following six situations are compared:

• intrinsic dimension ≤ 2 (directed structures),
• intrinsic dimension 2 (temporal directed structures),
• intrinsic dimension ≤ 1 (aperture problems),
• intrinsic dimension 1 (edge aperture problems),
• intrinsic dimension 0 (homogeneous regions),
• occlusions (increased intrinsic dimension).

There are other situations as well that would be beneficial to detect, for example changes in lighting, which could be recognized via heavily averaged difference images.

5.1.2 Related Work

Related work on intrinsic dimensions has been described in the corresponding section in Chapter 2.4.
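The occlusion rule behind Table 5.1, that occlusion raises the intrinsic dimension by one, can be expressed as a tiny lookup. This is a sketch of the taxonomy only; the dictionary keys are mine, taken from the situations listed in the table:

```python
# base intrinsic dimension of typical situations (cf. Table 5.1)
BASE_ID = {
    "homogeneous region": 0,
    "edge": 1,
    "translation of corners": 2,
}

def intrinsic_dimension(situation, occluded=False):
    """Intrinsic dimension of a local situation; occlusion
    increases the intrinsic dimension by one (cf. Chapter 2.4)."""
    return BASE_ID[situation] + (1 if occluded else 0)
```

This makes the ambiguity explicit: an occluded homogeneous region and an edge both map to intrinsic dimension one, so the dimension alone cannot tell them apart.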
5.1.3 Contribution

Situation measures have been proposed in the literature before, but a summary and comparison has been missing. To fill this gap, I have collected and tested previously proposed situation measures. My contribution is threefold: First, I propose a classification scheme for known situation measures, which is based on intrinsic dimensionality. Second, new measures that I found missing are added. Third, the performance of the most important situation measures is compared.

5.2 Classification of Situation Measures

5.2.1 Data and Output Functions

To handle the situation measures in a unified way, each of them is subdivided into 1) a data function and 2) an output function. The data function is used to acquire the reliability data, for example the gradient of the image at a certain location. But this data alone cannot be used as a situation measure for the following two reasons: firstly, it does not range between 0 and 1, so comparability between different measures is not given, and secondly, the relation between data and situation is often inverse, which means that a large data value often corresponds to a low probability for the situation. Therefore, we need monotonic, non-negative output functions mapping any kind of data function values to situation values in the interval [0, 1]. The following four output functions are used to obtain situation measure values for data function values d:

(a) If the data function d is naturally bounded by a fixed range [min, max] (e.g. [0, 2π]), and there is no inverse relation between data and situation:

c1(d) := (d − min) / (max − min).   (5.3)

(b) If the data function d is naturally bounded by a fixed range [min, max], and there is an inverse relation between data and situation:

c2(d) := 1 − c1(d) = (max − d) / (max − min).   (5.4)
(c) If the data function d is not bounded by a fixed range (but non-negative, as assumed here), and there is no inverse relation between data and situation:

c3(d) := d² / (1 + d²).   (5.5)

(d) If the data function d is not bounded by a fixed range, and there is an inverse relation between data and situation:

c4(d) := 1 − c3(d) = 1 / (1 + d²).   (5.6)

5.2.2 Principal Currently Known Situation Measures

I now present all principal known situation measures, classified according to their intrinsic dimensionality. For each measure, a short explanation of the underlying concept as well as its data and output function is given.

Intrinsic Dimension ≤ 2 (Directed Structures)

This situation is present for any directed structure at a given location. These structures can be temporal or spatial. Thus, this is the most general of the situations based on the intrinsic dimension. The two following situation measures can be used for its detection.

structEv3

structEv3 stands for the minimum eigenvalue of the structure tensor (see Chapter 3). It is derived from [51] and is based on the following concept: The nearer to zero the smallest eigenvalue λ3 of the structure tensor is, the more likely a main direction of constant intensity exists within the integration area of the structure tensor. This is the case if movement at constant velocity (including velocity 0) is present, but dimensions of constant intensity also exist in the case of an aperture problem or within homogeneous regions. Hence, this measure is not able to distinguish between temporal directed structures due to movement and spatial directed structures due to aperture problems or homogeneous regions.

Data function: d = λ3
Output function: c4(d)

structCt

structCt stands for the total coherency measure of the structure tensor. It is based on the same idea as the previous measure structEv3 and is also derived from [51].
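As a small aside, the four output functions c1–c4 defined in Section 5.2.1 can be sketched in plain Python; this is a minimal illustration, with function names chosen here to match the text:

```python
def c1(d, lo, hi):
    """Eq. (5.3): data naturally bounded by [lo, hi], direct relation."""
    return (d - lo) / (hi - lo)

def c2(d, lo, hi):
    """Eq. (5.4): bounded data, inverse relation between data and situation."""
    return 1.0 - c1(d, lo, hi)

def c3(d):
    """Eq. (5.5): unbounded non-negative data, direct relation."""
    return d ** 2 / (1.0 + d ** 2)

def c4(d):
    """Eq. (5.6): unbounded non-negative data, inverse relation."""
    return 1.0 - c3(d)
```

All four map into [0, 1]; c3 and c4 approach their limits 1 and 0 only asymptotically for large data values.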
The advantage of this measure compared to the structEv3 measure is that it is able to distinguish between temporal directed structures due to motion and spatial directed structures due to homogeneous regions. However, structCt is not able to distinguish motion from edge aperture problems, since in both cases λ1 ≫ λ3. The data function is maximal if λ1 ≫ λ3 = 0 (in case of i2d or i1d) and minimal if λ1 = λ3 (in case of i0d or i3d).

Data function: d = ((λ1 − λ3)/(λ1 + λ3))² if λ1 ≠ 0, and d = 0 if λ1 = 0
Output function: c1(d)

Intrinsic Dimension 2 (Directed Temporal Structures)

This situation is present if the intrinsic dimension at a given position equals two, which means that either a structure directed only in the temporal direction or an occluded aperture problem exists. In the first case motion can be estimated reliably. A difficult task for these measures is the distinction between directed structures in the temporal and the spatial direction. The following measures have been applied to this task.

structMinors

The structMinors method stands for the structure tensor measure examining the existence of multiple motions in the same location. The basic idea underlying this measure has been mentioned by Barth [12]. I suggest using the minors of the structure tensor to derive four different expressions for the same motion vector. Ideally, these expressions should be identical in the case of a unique motion vector of intrinsic dimension two, but for other spatio-temporal patterns the calculated motion vectors u1, u2, u3 and u4 differ. Based on a chosen error measure e (see Section 4.1.2) for the comparison of these vectors, different situation measures can be defined.

Data function: d = Σ_{i=1}^{3} Σ_{j=i+1}^{4} e(ui, uj)
Output function: c4(d)

structCc

structCc stands for the corner measure of the structure tensor proposed in [51].
Its data function is defined as the difference between the total coherency measure (structCt) and the spatial coherency measure (structCs) data function. In this way, structCc returns high values in locations where structCt is large (which is the case for i2d and i1d structures) and structCs is small (which is the case for i3d, i2d and i0d structures). This means that structCc is only large for intrinsic dimension two, that is, in the case of directed temporal structures. It is bounded by the range [0, 1], since 0 ≤ Cs ≤ Ct ≤ 1.

Data function: d := Ct − Cs = ((λ1 − λ3)/(λ1 + λ3))² − ((λ1 − λ2)/(λ1 + λ2))²
Output function: c1(d)

structMultipleMotion

This measure is based on the structure tensor measure proposed by [75], which examines the existence of multiple motions in the same location. It relies on the assumption that for a reliable motion vector estimate a temporal but no spatial directed structure should exist. For the eigenvalues of the structure tensor J this means

λ1 ≥ λ2 ≫ λ3 = 0.   (5.7)

Therefore, the product K = λ1 λ2 λ3 of the eigenvalues of J is compared to the averaged diminished product of the eigenvalues

S = (λ2 λ3 + λ1 λ3 + λ1 λ2) / 3.   (5.8)

Here K = 0 indicates λ3 = 0 and, thus, that some directed temporal or spatial structure exists. In this case we have intrinsic dimension two, one or zero. In contrast, S = 0 means λ3 = λ2 = 0 for the two smallest eigenvalues and, thus, indicates an aperture problem of intrinsic dimension one or zero. Therefore, a reliable motion vector of intrinsic dimension two can be identified by a small value of K and a large value of S at the same time. To adjust scales, ∛K is compared to √S. Hence, in the case of intrinsic dimension two it follows that ∛K ≪ √S.
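Given the eigenvalues of the 3×3 structure tensor, the eigenvalue-based measures of this section can be sketched numerically; a minimal illustration assuming the tensor J has already been computed (see Chapter 3), with function names chosen here for clarity:

```python
import numpy as np

def eigenvalues(J):
    """Eigenvalues of the symmetric 3x3 structure tensor J,
    sorted descending: lambda_1 >= lambda_2 >= lambda_3."""
    return np.sort(np.linalg.eigvalsh(J))[::-1]

def struct_ev3(J):
    """structEv3: data function d = lambda_3, output function c4."""
    l3 = eigenvalues(J)[2]
    return 1.0 / (1.0 + l3 ** 2)

def struct_ct(J):
    """structCt: total coherency, already in [0, 1] (output c1)."""
    l1, _, l3 = eigenvalues(J)
    return ((l1 - l3) / (l1 + l3)) ** 2 if l1 != 0 else 0.0

def struct_multiple_motion(J):
    """structMultipleMotion: d = sqrt(S) - cbrt(K), output function c3."""
    l1, l2, l3 = eigenvalues(J)
    K = l1 * l2 * l3                          # product of eigenvalues
    S = (l2 * l3 + l1 * l3 + l1 * l2) / 3.0   # averaged pairwise products
    d = np.sqrt(S) - np.cbrt(K)
    return d ** 2 / (1.0 + d ** 2)            # output function c3
```

By the inequality of arithmetic and geometric means, S ≥ K^(2/3), so the data function √S − ∛K is indeed non-negative, as required by the output function c3.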
Data function: d = √S − ∛K
Output function: c3(d)

Intrinsic Dimension ≤ 1 (Aperture Problems)

In the situation of an aperture problem (edge or homogeneous region), which is defined by an intrinsic dimension smaller than or equal to 1, the local context is not sufficient to calculate a unique displacement vector. An example is a long edge of a rectangle moving downwards: from a local point of view, any flow vector with the correct vertical and an arbitrary horizontal component would be correct here. In the case of an edge, only the component orthogonal to the edge that causes the aperture problem can be estimated reliably, whereas in the case of homogeneous regions any flow vector is possible. To detect such situations the following situation measures can be used.

detHessian, evHessian, condHessian

These measures stand for different Hesse matrix measurements: the determinant, the smallest eigenvalue and the condition number. The Hesse matrix H is defined as

H = ( ∂xx I  ∂xy I ; ∂yx I  ∂yy I ).   (5.9)

Its condition number has been proposed in [97], but in [10] the determinant of the Hesse matrix has been found more reliable. The determinant can be expressed as the product of the eigenvalues γ1 and γ2 of H:

det(H) = γ1 γ2.   (5.10)

Since the curvature of a function can be approximated by its second derivative, the entries of the Hesse matrix describe the curvature of the sequence in different directions. This measure can be used especially for the identification of homogeneous regions and the aperture problem: H is a symmetric matrix. Therefore, its eigenvectors are orthonormal and form a basis of R². The curvature in a certain direction x can be computed as xᵀHx. If vi, i ∈ {1, 2}, is a normalized eigenvector of H with corresponding eigenvalue γi, then the curvature along this eigenvector can be computed as follows:

viᵀ H vi = viᵀ γi vi = γi.   (5.11)
For this reason the eigenvalues γi, i ∈ {1, 2}, are equal to the curvature along the main axes, the eigenvectors. Thus, aperture problems and homogeneous regions lead to det(H) = 0. For these reasons two different data functions for the recognition of intrinsic dimensions one and zero can be defined: one based on the smaller eigenvalue γ2, the other based on the determinant of H.

Data function: d1 := γ2, d2 := det(H)
Output function: c4(d)

LOGHessian

LOGHessian stands for the measure examining the curvature of the edge map of the image sequence. It has been proposed by Waxman, Wu and Bergholm in [98]. The idea is similar to the detHessian measure, but instead of using the image sequence directly, the authors use an edge map E of the image sequence convolved with a spatio-temporal Gaussian kernel, called the "activation profile" A. The edge map is computed by a convolution of a DOG filter with the sequence and the identification of the zero-crossings z:

E = z(DOG ∗ I),   (5.12)
A = G(σx, σy, σt) ∗ E.   (5.13)

Then, instead of the Hesse matrix of the image sequence, the Hesse matrix of the activation profile is computed and its determinant is used as data function.

Data function: d = det( ∂xx A  ∂xy A ; ∂yx A  ∂yy A )
Output function: c4(d)

SSDSurface

This measure is based on Anandan's proposal [3] to examine the SSD (sum of squared differences) surface. It is created by repeatedly modifying the flow vector at the current position and computing a new SSD value each time. These SSD values at the different modified locations make up the surface. If the minimum SSD value of the surface, Smin, is rather high, no good match in the SSD sense exists for the current vector. To detect aperture problems, the curvature of the surface along the maximum and the minimum principal axis, Cmax and Cmin, is computed. In homogeneous regions the curvature is low, leading to small values for Cmax and Cmin.
At edges the curvature along the maximum principal axis is high, whereas that along the minimum principal axis is low. Anandan defines the two measurements g1 and g2 in order to quantify these values:

g1 := Cmax / (k1 + k2 Smin + k3 Cmax),   (5.14)
g2 := Cmin / (k1 + k2 Smin + k3 Cmax),   (5.15)

where k1, k2 and k3 are constants: k1 prevents 0 in the denominator, k2 determines the punishment for high values of Smin, and k3 bounds the result to the range (0, 1/k3). Following Anandan, the parameters are chosen as k1 = 150, k2 = 1 and k3 = 0. The final data function d is defined as the product of g1 and g2.

Data function: d := g1 g2
Output function: c3(d)

SinghSurface

SinghSurface stands for the measure examining the velocity distribution at each position in the sequence. The measure has been proposed by Singh and can be found in [10]. It is based on a two-stage computation of SSD values using the previous and following frame combined with the displacement field in the positive and negative direction:

SSDs(x, y, t) = (I(x + u1, y + u2, t + 1) − I(x, y, t))² + (I(x − u1, y − u2, t − 1) − I(x, y, t))².   (5.17)

In this way, spurious minima due to noise or periodic texture are averaged out. The idea behind this measure is to calculate the SSDs surface for varying integer displacements for each motion vector u. The resulting surface is then converted to a probability distribution by

R(u1 + i, u2 + j) = exp(−k SSDs(u1 + i, u2 + j)),   (5.18)

where k = −ln(0.95)/min(SSDs(u)). From this distribution the velocity v = (v1, v2) can be obtained as its mean:

v1 = Σ_{i,j=1}^{n} R(u1 + i, u2 + j)(u1 + i) / Σ_{i,j=1}^{n} R(u1 + i, u2 + j),   (5.19)
v2 = Σ_{i,j=1}^{n} R(u1 + i, u2 + j)(u2 + j) / Σ_{i,j=1}^{n} R(u1 + i, u2 + j).   (5.20)

This only works well if the distribution is nearly symmetric about the true velocity and has few maxima. Therefore, the eigenvalues of the covariance matrix of this distribution are examined.
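The conversion of the SSD surface into a distribution, its mean velocity and the covariance eigenvalues can be sketched as follows; this is only an illustration under assumed conventions (a square window of integer offsets around the current flow vector, a strictly positive SSD minimum, and the c4 output function applied to the larger eigenvalue as below):

```python
import numpy as np

def singh_surface_measure(ssd, u=(0.0, 0.0)):
    """Sketch of Singh's SSD-surface measure.

    ssd: (2n+1)x(2n+1) array of SSD_s values for integer offsets (i, j)
    around the current flow vector u (hypothetical layout: row index =
    u1 offset, column index = u2 offset).  Returns the mean velocity
    (Eqs. 5.19/5.20) and a confidence value in (0, 1]."""
    n = ssd.shape[0] // 2
    k = -np.log(0.95) / ssd.min()              # k from Eq. (5.18)
    R = np.exp(-k * ssd)                       # probability surface
    off = np.arange(-n, n + 1)
    ii, jj = np.meshgrid(off, off, indexing="ij")
    x, y = u[0] + ii, u[1] + jj                # candidate displacements
    w = R / R.sum()                            # normalized distribution
    v = np.array([(w * x).sum(), (w * y).sum()])   # mean velocity
    dx, dy = x - v[0], y - v[1]
    cov = np.array([[(w * dx * dx).sum(), (w * dx * dy).sum()],
                    [(w * dx * dy).sum(), (w * dy * dy).sum()]])
    lam1 = np.linalg.eigvalsh(cov)[-1]         # larger eigenvalue
    return v, 1.0 / (1.0 + lam1 ** 2)          # output function c4
```

For a surface with a single sharp minimum the covariance is small and the confidence approaches 1; for flat or ridge-shaped surfaces at least one eigenvalue grows and the confidence drops.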
If both eigenvalues are large, we have a homogeneous region, since many displacements lead to high probabilities. If one eigenvalue is large and one small, we have an aperture problem, since the probability is high in many places along the axis corresponding to the larger eigenvalue. If both eigenvalues are small, we have a corner, where the displacement field can be computed reliably. Let λ1 denote the larger eigenvalue. Only if λ1 is small are both eigenvalues small and the measure reliable.

Data function: d := λ1
Output function: c4(d)

structCc

As explained before, this measure indicates the existence of a situation of intrinsic dimension two. By exchanging the output function for c2(d), the measure indicates the existence of situations of intrinsic dimension two and one. In the case of intrinsic dimension three the result can also be large, leading to incorrect statements.

Intrinsic Dimension 1 (Edge Aperture Problems)

Edges pose problems for flow computation methods, since they cause an aperture problem. Here, the motion vector is no longer uniquely defined from a local point of view and, thus, cannot be estimated reliably. Only one measure is known which distinguishes between aperture problems caused by edges and those caused by homogeneous regions.

structCs

structCs stands for the spatial coherency measure of the structure tensor mentioned in [51]. If we have an aperture problem and assume the brightness constancy equation holds, there are two directions of constant gray values: the temporal direction and the direction along the object that causes the aperture problem. Therefore, the two smallest eigenvalues λ2 ≥ λ3 of the structure tensor are equal to 0. This property can be measured by the spatial coherency data function structCs. It reaches its maximum for intrinsic dimension 1 (since λ1 ≫ λ2 = λ3 = 0). For all other types of motion it is smaller than 1 and, thus, bounded by [0, 1].
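The spatial coherency can be sketched analogously to the total coherency above; a minimal illustration assuming the structure tensor J is given and its eigenvalues are sorted descending:

```python
import numpy as np

def struct_cs(J):
    """structCs: spatial coherency of the 3x3 structure tensor J.
    The data function is already in [0, 1], so the output function c1
    reduces to the identity here."""
    l1, l2, _ = np.sort(np.linalg.eigvalsh(J))[::-1]
    return ((l1 - l2) / (l1 + l2)) ** 2 if l1 != 0 else 0.0
```

For an edge aperture problem (λ1 ≫ λ2 = λ3 = 0) the value approaches 1, while for a unique translation (λ1 ≈ λ2 ≫ λ3) it stays near 0.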
Data function: d = ((λ1 − λ2)/(λ1 + λ2))² if λ1 ≠ 0, and d = 0 if λ1 = 0
Output function: c1(d)

Intrinsic Dimension 0 (Homogeneous Regions)

Homogeneous regions pose problems for many flow computation methods due to the lack of image structure. From a local point of view, any displacement vector can be correct in these situations. The following measures can be used for their detection.

grad

The idea behind the gradient measure is that the displacement field can be computed the more reliably the more texture is contained in the image. There are different ways to compute the image gradient. Here, the central differences scheme (grad) and the forward differences scheme (gradFD) are employed.

Data function: d := ‖∇{x,y} I‖²
Output function: c4(d)

structTrace

structTrace stands for the trace of the structure tensor J (see Chapter 3). In homogeneous regions all image gradients tend to 0. The same applies to the structure tensor, its eigenvalues and, thus, the sum of its eigenvalues. Since the trace of a matrix is invariant under coordinate transformations, the sum of the eigenvalues of the structure tensor equals its trace.

Data function: d = trace(J)
Output function: c4(d)

Occlusion Detection

Occlusion situations pose difficulties for flow computation methods, since they contain pixels with undefined flow vectors. This situation is especially important for the detection of higher intrinsic dimensions caused by occlusion. Many investigations have been conducted in the field of occlusion detection. However, most of these methods are not applicable to situations where only a monocular image sequence and a flow field are given. Several approaches use stereo images and disparity maps in order to derive occlusion information [102]. Other techniques employ initialization images and compute difference images to detect occlusions [71]. There are also flow computation methods that integrate occlusion detection into global energy terms [1, 24].
Furthermore, there are several statistical approaches [69, 68] that are not considered here due to their complexity and variety. In my opinion, situation measures do not necessarily have to be based on the image sequence, but can just as well derive their information only from the flow field. Therefore, I suggest the use of regularizers of global optical flow computation methods as situation measures. Regularizers can be used to identify locations where the flow field does not correspond to the regularization model. In this way, for example, occlusions could be detected. Here, four classes of regularizers (image-driven, flow-driven, isotropic and anisotropic) defined in [99] are examined, plus the homogeneous regularizer used in [53], a space-time regularizer defined in [28] and a total variation regularizer proposed in [88]. I also propose a new type of regularizer I found missing in this collection: a purely temporal regularizer. To detect occlusion situations the following measures can be used.

homReg

homReg stands for the homogeneous regularizer. It has been used in [53] and examines the smoothness of the flow field u by computing its spatial gradient.

Data function: d := ‖∇{x,y} u1‖² + ‖∇{x,y} u2‖²
Output function: c3(d)

isoFlowReg

isoFlowReg stands for the isotropic flow-driven regularizer. It was proposed in [99] and examines the smoothness of the flow field, but allows for exceptions at flow edges. Let ψ(s²) be a differentiable and increasing function that is convex in s, e.g.

ψ(s²) = ε s² + (1 − ε) λ² √(1 + s²/λ²)   (5.21)

mentioned in [99] with λ = ε = 0.1.

Data function: d := ψ(‖∇{x,y} u1‖² + ‖∇{x,y} u2‖²)
Output function: c3(d)

tvReg

tvReg stands for the total variation regularizer. It was originally proposed by Rudin et al. in [88] and is a special case of the isotropic flow-driven regularizer with

ψ(s²) = √(s² + ε²).   (5.22)
anisoFlowReg

anisoFlowReg stands for the anisotropic flow-driven regularizer presented in [99]. It also examines the smoothness of the flow field except at flow edges. Here it only assumes smoothness along the edge, whereas no smoothness is assumed across the edge. Let φ be a matrix-valued function, which uses the function ψ defined as in isoFlowReg:

φ(U) := Σ_{i=1}^{2} ψ(λi) vi viᵀ.   (5.23)

Here U is a symmetric positive semidefinite matrix. Thus, it has two orthonormal eigenvectors vi, i ∈ {1, 2}, with corresponding eigenvalues λi. Due to the special choice of U in the data function, the eigenvalues specify the contrast of the image in the directions v1 and v2, respectively.

Data function: d := trace(φ(∇{x,y} u1 ∇{x,y} u1ᵀ + ∇{x,y} u2 ∇{x,y} u2ᵀ))
Output function: c3(d)

timeReg

Among the variety of regularizers, a purely temporal one, which assumes only temporal smoothness of the flow field at each pixel, was missing. Thus, it may be well suited for the detection of occlusions.

Data function: d := ‖∇t u1‖² + ‖∇t u2‖²
Output function: c3(d)

spaceTimeReg

spaceTimeReg was proposed in [28] and assumes temporal and spatial smoothness of the flow field at each pixel.

Data function: d := ‖∇{x,y,t} u1‖² + ‖∇{x,y,t} u2‖²
Output function: c3(d)

structMinors

In the case of occlusions the four different motion vectors computed from the minors of the structure tensor will differ. Thus, occlusions can be detected by high errors between these estimates. The measure has been described before and can be found in the section on directed temporal structures. To detect occlusions, the output function has to be changed to c3(d).

5.3 Experiments and Results

5.3.1 Comparison Technique

To examine the performance of the situation measures, an artificial sequence (Figure 5.2) containing various intrinsic dimensions in combination with occlusion has been generated.
It consists of four parts:

(a) noise as an example for undirected structures (i3d),
(b) a moving two-dimensional sine as an example for temporal directed structures (i2d),
(c) a moving one-dimensional sine as an example for an edge aperture problem (i1d),
(d) a homogeneous region (i0d).

To evaluate the effect of occlusions of different intrinsic dimensions on all measures, the lower half of the sequence is occluded by a sine pattern in the following frame (see Figure 5.2). Thus, we obtain four regions, one for each intrinsic dimension, each of which consists of an occluded and a non-occluded part. To test the situation measures, the detection accuracy for the corresponding intrinsic dimensions is evaluated. According to the situation and the dimension of the image region examined by the measures, each of the eight areas in Figure 5.2 is either to be detected as belonging to the situation or not. Measures not using temporal information are not influenced by occlusion in the next frame and, thus, have the same ground truth values for the upper and lower half of the sequence.

Figure 5.2: First and second frame of the artificial test sequence for the comparison of all situation measures. The lower half is occluded in the second frame, which increases the original intrinsic dimension by one in these regions.

To evaluate the performance of a given situation measure, let S denote the set of pixels within the situation, T the set of pixels outside, that means in all other situations, and gi ∈ {0, 1} the ground truth value at pixel i. This value gi is set to 0 for pixels in S and to 1 for pixels in T. Now the quality of the measures is expressed by two characteristic values: the average identification error within the situation (inSit) and that outside the situation (outSit):

inSit = Σ_{i∈S} |ci − gi| / |S|,   (5.24)
outSit = Σ_{i∈T} |ci − gi| / |T|,   (5.25)

where ci ∈ [0, 1] denotes the result of the situation measure at pixel i.
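These two error statistics can be sketched directly; a minimal illustration assuming the measure results, ground truth and situation mask are given as NumPy arrays:

```python
import numpy as np

def in_out_sit(c, g, in_situation):
    """Average identification errors inSit (Eq. 5.24) and outSit (Eq. 5.25).

    c: situation-measure values in [0, 1] per pixel
    g: ground truth values (0 for pixels in S, 1 for pixels in T)
    in_situation: boolean mask marking the pixel set S."""
    S = in_situation
    T = ~S
    in_sit = np.abs(c[S] - g[S]).mean()    # error within the situation
    out_sit = np.abs(c[T] - g[T]).mean()   # error outside the situation
    return in_sit, out_sit
```

A perfect measure yields inSit = outSit = 0; the comparison below ranks measures by the sum of the two values.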
Now different measures can be compared based on the sum of these values. The effect of image noise on the quality of the measures is especially considered in the evaluation and in the final decision on the best measure. In each figure the horizontal axis shows the current image noise level and the vertical axis shows the sum of the error measures inSit and outSit for the compared situation measures.

5.3.2 Results for Intrinsic Dimensions ≤ 2 (Directed Structures)

For this situation both considered situation measures yield comparable, moderate results. The results of the structEv3 method are slightly better than those of the structCt measure. The methods strongly depend on the size of the integration area of the structure tensor and the size of the filter mask for the derivatives. Both measures perform only moderately due to the fact that large parts of the noise (i3d) are recognized as directed structures as well, which leads to a high error outside the situation. The slightly lower quality of the structCt measure is due to the fact that this measure is not able to detect homogeneous regions as directed structures and, thus, suffers from a considerable error within the situation. Both methods are robust to noise, as shown in Figure 5.3.

Figure 5.3: Dependency of the directed structures situation measures on image noise.

5.3.3 Results for Intrinsic Dimension 2 (Directed Temporal Structures)

In this situation only directed temporal structures, that is, the two-dimensional sine part, should be recognized by the measures specializing in this situation. As all measures also use temporal information, occluded edge aperture problems have intrinsic dimension two as well and, thus, are also part of the i2d situation.
For the structMinors measure the amplitude error measure (structMinorsAmplitude), the angle error measure (structMinorsAngle) and the angular error measure (structMinorsAngular) defined in Section 4.1.2 are chosen for the comparison of the four computed flow vectors. The structMultipleMotion and the structMinorsAmplitude measure yield good results, whereas the results of the structMinorsAngle and the structMinorsAngular measure show high errors outside the situation. Figure 5.4 shows that the structMinorsAmplitude method is not at all robust towards noise and already yields high errors for a noise level of σ = 0.5, despite the integration area of the structure tensor. In contrast, the other measures are much less susceptible to noise. Therefore, one would favor the structMultipleMotion measure for this situation.

Figure 5.4: Dependency of the directed temporal structures situation measures on image noise.

5.3.4 Results for Intrinsic Dimension ≤ 1 (Aperture Problems)

For this situation most aperture problem measures yield almost optimal recognition rates within the situation, so here the average error outside the situation determines the quality of the measures. The dependency on image noise is very interesting for these measures, since many which show very good results if no noise is present already deteriorate dramatically for small noise levels. All of these measures are tested with noise levels of standard deviation σ ∈ [0, 5]. The results in Figure 5.5 show that measures using second image derivatives, such as the smallest eigenvalue of the Hessian, are highly dependent on the image noise level.
In contrast, the logHessian measure is rather noise resistant from smaller noise levels of σ = 0.5 on, due to the convolution of the edge map with a spatio-temporal Gaussian filter. The ssdSurface and the singhSurface measure are rather robust to noise as well, which is probably due to the fact that their decision is based on a larger set of measurements. I come to the conclusion that one would find the detHessian measure preferable.

Figure 5.5: Dependency of the aperture problem measures on image noise.

5.3.5 Results for Intrinsic Dimension 1 (Edge Aperture Problems)

For the situation of edge aperture problems just one situation measure, structCs, is known. Figure 5.6 shows that the structCs measure yields rather good results and is not influenced by the noise levels tested.

Figure 5.6: Dependency of the edge aperture problem measure structCs on image noise.

Figure 5.7: Dependency of the homogeneous regions measures on image noise.

5.3.6 Results for Intrinsic Dimension 0 (Homogeneous Regions)

The measures detecting homogeneous regions are compared in Figure 5.7. The curves show that the measures using image derivatives, the gradient measures, are much more susceptible to noise than the structTrace method, which relies on the structure tensor with its larger integration scale. The gradient measures yield good results if no noise is present, but already for a noise level of σ = 0.5 the structTrace method is superior.
The grad measure performs slightly better than the gradFD measure, since it is based on the central differences scheme, which averages two forward differences and, thus, makes the result less prone to noise. In contrast to the gradient measures, the structTrace measure remains stable for all higher noise levels and, thus, easily outperforms the other two measures. Therefore, one would favor the structTrace method for this situation.

5.3.7 Results for Occlusion

The measures detecting occlusions should be able to recognize the lower half of the four regions of the test sequence. Here the distinction into four regions is interesting, as the consequences of different occluded intrinsic dimensions can be examined. Since the regularizers depend on a computed flow field, two different flow fields are used: one field computed by the structure tensor method [16] and one field computed by the Horn-Schunck method [53]. The results differ considerably between the two flow computation methods. For the flow field calculated by the Horn-Schunck method the timeReg measure yields the best results, which are also rather robust to noise. In contrast, the structMinorsAngular method yields the best results for the structure tensor flow field. The influence of noise on the measures applied to the Horn-Schunck flow field is presented in Figure 5.8. For the structure tensor method the regularizer results are very similar, only about 0.3 higher. The diagram shows that the timeReg measure as well as the structMinorsAngular measure are rather robust to noise.

Figure 5.8: Dependency of the occlusion measures on image noise for a flow field computed by the Horn-Schunck method.
Therefore, one would favor the timeReg measure for Horn-Schunck flow fields and the structMinorsAngular measure for structure tensor flow fields. However, due to the high error rates, none of these measures is really effective.

5.3.8 Application to Real-World Sequences

To test the presented situation measures on real-world scenes, they are applied to the Marble sequence. The question I seek to answer is whether the situation measures that perform best on artificial sequences also yield good results on noisy real-world sequences. The Marble sequence (Figure 5.9a) is chosen since it contains many different situations, such as aperture problems of intrinsic dimension 1 (diagonal line on the table, parts of the flagstone edges) and 0 (background, parts of the marble blocks), directed temporal structures (the main part of the image) and occlusion (at the edges of the marble blocks). Figure 5.9b shows the result of the best measure for the examination of directed structures (structEv3) applied to the Marble sequence. Temporal directed structures as well as aperture problems, such as the edges of the marble blocks and homogeneous regions in the background, are recognized well. Occluded regions of intrinsic dimension two, which thus become intrinsic dimension three at the edges of the marble blocks, are partially recognized as non-directed structures. However, several pixels of the table pattern in the foreground are not recognized as directed structures even though a unique motion vector exists. The application of the best measure for the recognition of directed temporal structures, structMultipleMotion, to the Marble sequence is shown in Figure 5.9c. The measure seems to work quite well for real-world sequences, too, as aperture problems (e.g.
the diagonal line on the table, parts of the flagstone boundaries) and homogeneous regions (the background, larger patterns on the table) are detected as situations where no unique direction of motion exists. However, larger parts of the marble blocks with mainly homogeneous texture pose problems for this measure. In contrast, the parts of the block texture where darker structures appear are detected correctly. For the recognition of aperture problems the detHessian measure showed the best results if noise was present in the scene. For real-world sequences, however, the measure is very noisy. The result can be seen in Figure 5.9 d). It recognizes aperture problems such as the diagonal line on the table, the borders of the flagstones, the background and even some structures on the table. These structures are detected as well, because they form small but homogeneous regions. However, like the structMultipleMotion measure, this measure also detects large parts of the marble blocks as aperture problems, which is due to the homogeneity of the block texture. Hence, we can see that several measures depend on well-textured regions for correct decisions. The application of the only measure for the recognition of edge aperture problems of intrinsic dimension one, the structCs measure, to the Marble sequence yields the result in Figure 5.9 e). We can see that the measure is well able to detect regions with edge aperture problems, especially the flagstone border regions, the diagonal table line, the edges of the blocks and the table edge. However, false positives appear for small structures on the table, which locally appear as aperture problems. Figure 5.9 f) shows the result of the application of the best situation measure for the detection of homogeneous regions (structTrace) to the Marble sequence. We can see that the measure is well suited even for real-world sequences, since all larger homogeneous regions are detected.
Here as well, large parts of the marble block texture are classified as homogeneous. The undetected border of some homogeneous regions is due to the fact that the minimum distance from the edge of a homogeneous region for the current pixel to be detected is determined by the integration area of the structure tensor. This also determines the minimum size of a homogeneous region below which it cannot be detected. For the occlusion situation the timeReg measure is the only one to yield acceptable results, provided an artificial sequence and the Horn-Schunck method are used to calculate the flow field. However, the application to the Marble sequence shows that in fact none of the occlusion situations is detected. The same applies to the best measure for structure tensor fields, the structMinorsAngular measure. Here several of the marble blocks, the background and smaller parts of the table are recognized as occluded. Real occlusion situations are partly recognized, but by far the largest part of the detections does not correspond to occlusions.

Figure 5.9: Best situation measures for the detection of the compared situations applied to the real-world Marble sequence: a) Marble sequence, b) structEv3 (i2d, i1d, i0d), c) structMultipleMotion (i2d), d) detHessian (i1d, i0d), e) structCs (i1d), f) structTrace (i0d).

5.4 Summary and Conclusion

In this chapter I have summarized and examined typical situation measures, which are used to identify image sequence locations where a reliable optical flow estimation is difficult or impossible. For each situation a measure was identified that yields the best results and is still robust in the presence of noise. These measures are shown in Table 5.2.

id          situation                    best measure
i2d - i0d   directed structures          structEv3
i2d         temp. directed structures    structMultipleMotion
i1d - i0d   general aperture problems    detHessian
i1d         edge aperture problems       structCs
i0d         homogeneous regions          structTrace
i+1d        occlusion (HS-field)         timeReg
i+1d        occlusion (ST-field)         structMinorsAngular

Table 5.2: Best situation measure for each situation based on its intrinsic dimension.

The quality of the presented situation measures varies. If no noise is present in the scene, there is always a measure available that yields good or at least moderate results. For higher noise levels the results show that larger integration scales, like those of the structure tensor, or smoothing, as applied by the logHessian measure, partially remove the influence of noise on the measures and make them robust. In contrast, measures based on unsmoothed image gradients become unreliable already for small noise scales. Therefore, for almost all of the situations a measure based on the structure tensor has finally been chosen with respect to noise robustness. However, the application to the Marble sequence shows that several of the measures are not able to cope with real-world situations.

Chapter 6 Surface Measures

6.1 Introduction

In the previous chapter I have presented and compared the most important situation measures, classified according to their intrinsic dimensionality. Yet, since none of these measures accounts for the computed flow field, none of them is, in fact, qualified to estimate the accuracy of optical flow vectors. Instead of examining the intrinsic dimension of the image sequence, we therefore propose to examine the intrinsic dimension of the energy surface of optical flow estimators based on an arbitrary number of parameters. By means of surface functions I was able to simultaneously derive a situation and a confidence measure for optical flows.
I show that, based on the information of the proposed so-called “surface measures”, the average error of optical flow fields can be significantly reduced by a basic motion restoration algorithm.

6.1.1 Motivation

In general, all global and local optical flow computation methods can be formulated as a parameter optimization problem, which consists of the minimization of an energy. This is also true for many other image processing problems such as registration, segmentation, image restoration and reconstruction. Global methods minimize an energy consisting of a data term, ensuring e.g. brightness or gradient constancy along the flow trajectory, and a regularization term, which enforces the smoothness of the resulting flow field in order to obtain unique solutions [53, 25, 28, 78]. Local methods employ energy optimization by means of e.g. least squares or total least squares estimators [70, 16]. In several cases additional parameters such as intensity changes [20, 50] or various model parameters [78] are estimated as well. The aim of such energy optimization problems is to find parameters fulfilling the requirements expressed in the energy formulation as well as possible. Yet, despite recent progress in optical flow approaches, the algorithms are still facing difficult problems, which lead to errors in the resulting flow fields. Hence, if an optimum of the parameter optimization problem has been found by an optical flow algorithm, the question of the quality of this optimum and the corresponding optical flow parameters still remains. Based on the intrinsic dimension of surface functions, a new confidence and a new situation measure are simultaneously proposed. The only difference between both measures is the underlying flow field. In case a computed flow field is used, we obtain a confidence measure, which evaluates the accuracy of the flow vectors.
In case a zero flow field is used, we obtain a situation measure, which evaluates the feasibility of accurate flow estimation.

6.1.2 Related Work

Related work on intrinsic dimensions has been introduced in Chapter 2.4. To estimate the intrinsic dimension of the energy of the parameter optimization problem I propose surface functions. The notion “surface function” refers to a generalization of correlation surfaces to arbitrary energy functions. Correlation surfaces have widely been used for the detection of features in images. In terms of motion estimation, they have been applied to measure the similarity between regions in subsequent image frames in order to estimate the displacement between moving objects. To this end, a similarity measure relating different image intensities for alternative locations in the sequence is computed. Typical similarity measures are, for example, the sum of squared differences or the normalized cross correlation. Correlation surfaces have been applied, for example, by Anandan [3], who proposed a confidence measure to detect aperture problems in order to find the optimal scale for his hierarchical block matching algorithm for optical flow computation, by Rosenberg and Werman [84] to detect locations where motion cannot be represented by a Gaussian random variable, and by Irani and Anandan [54] to align images obtained from different sensors. Due to the dependence on image intensities, a drawback of correlation surfaces is their susceptibility to image noise. This can be overcome by choosing large correlation regions, which in turn leads to problems caused by occlusions and motion boundaries. Surface functions differ from correlation surfaces in one main aspect: they do not operate on the image, but on energies derived from parameter optimization methods.
Hence, they are not limited to simple intensity-based image similarity measures but depend on all parameters occurring in the optimization problem, such as the horizontal and vertical flow component and parameters estimating brightness changes. I want to analyze the intrinsic dimension of energy surfaces. So far, in image processing the intrinsic dimension of image sequences has naturally been restricted to values smaller than or equal to two [38] or three [11]. As we are dealing with energy surfaces and an arbitrarily large set of parameters, the intrinsic dimension of the surface functions can be arbitrarily large. Furthermore, as has been stated in [38], for images and energies there is, in fact, no situation that is exclusively of one intrinsic dimension. Instead, a probability can be assigned to each situation for it to be of a given intrinsic dimension. This led to a continuous formulation of the intrinsic dimension within the topology of a cone [38]. Therefore, to be able to apply the theory of intrinsic dimensions to the case of an arbitrary number d of parameters, a generalized continuous formulation of the intrinsic dimension up to dimension d is required.

6.1.3 Contribution

In previous approaches (see Chapter 5) only the intrinsic dimension of the image sequence has been used. To apply the theory of intrinsic dimensions to energy surface functions depending on an arbitrary number of parameters, I give a generalized continuous formulation of the intrinsic dimension up to dimension d, which can be represented as a simplex. In addition to the intrinsic dimension of the energy surface function, my formulation also comes with a simple method to detect outliers. For these, either the energy could not be optimized at all or the optimization process got stuck in a local optimum. From both sources of information, the intrinsic dimensionality of the energy surface function and the outlier detection, a new confidence measure can be derived.
In case a zero motion field instead of a computed motion field is used, we, in fact, obtain a situation measure, which allows for statements on the feasibility of accurate optical flow estimation. Part of this work was published in [62].

6.2 Surface Measures

6.2.1 Energy Functions

The computation of optical flow fields usually amounts to the solution of energy minimization problems. Based on arbitrary energy formulations I will now derive surface functions. Such energy formulations will be described by mappings c from the image domain, the image sequence and a parameter vector from the d-dimensional parameter space $\mathbb{R}^d$ to the set of non-negative real numbers:

$c : D \times I \times \mathbb{R}^d \to \mathbb{R}^+_0$.  (6.1)

The parameter vector $u \in \mathbb{R}^d$ usually consists of the horizontal and vertical flow field component together with any additional parameters in the flow computation problem. The set of such energy formulations will be denoted by C. These energies can be derived directly from the flow computation method, or they can be arbitrarily defined on the given flow field. Depending on the employed flow field (the computed flow field or an artificial zero flow field), either statements on the accuracy of the given flow field or on the feasibility of an accurate flow computation can be made. Examples of typical energy formulations appearing in global methods are derived from image invariants under the displacement field, e.g. the constancy of the brightness, the intensity, the gradient or the curvature at a given location x in the original image and the image after warping by means of the parameter vector u:

brightnessConst: $c(x, I, u) = (\nabla_{\{x,y\}} I \cdot u + \nabla_t I)^2 = 0$,
ssdConst: $c(x, I, u) = \|I(x) - I_w(x)\|_{l_2}^2 = 0$,
gradConst: $c(x, I, u) = \|\nabla_{\{x,y\}} I(x) - \nabla_{\{x,y\}} I_w(x)\|_{l_2}^2 = 0$,
hessConst: $c(x, I, u) = \|H(x) - H_w(x)\|_{l_2}^2 = 0$,

where $I_w$ and $H_w$ denote the image sequence and the Hessian of the image sequence, which are warped by the computed parameter vector u.
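To make the brightnessConst energy concrete, the sketch below evaluates it pointwise on a toy two-frame sequence. This is a minimal illustration, not the thesis' implementation: the function name `brightness_const_energy` is hypothetical, the derivatives are plain finite differences, and the flow field is stored as an (H, W, 2) array.

```python
import numpy as np

def brightness_const_energy(frame0, frame1, u):
    """Pointwise brightnessConst energy (grad I . u + dI/dt)^2.

    frame0, frame1: consecutive grey-value frames; u: flow field of
    shape (H, W, 2). Hypothetical sketch with plain finite differences;
    the thesis does not prescribe a particular discretization.
    """
    Iy, Ix = np.gradient(frame0)   # spatial derivatives (rows, columns)
    It = frame1 - frame0           # temporal derivative as frame difference
    residual = Ix * u[..., 0] + Iy * u[..., 1] + It
    return residual ** 2           # one energy value per pixel

# Toy example: a linear ramp shifted right by one pixel; the true flow
# (u = 1 in x) drives the energy to zero away from the wrap-around column.
frame0 = np.tile(np.arange(8, dtype=float), (8, 1))
frame1 = np.roll(frame0, 1, axis=1)
flow = np.zeros((8, 8, 2))
flow[..., 0] = 1.0
c = brightness_const_energy(frame0, frame1, flow)
print(np.abs(c[:, 1:]).max())      # ~0 for the correct flow
```

With a wrong flow vector the residual, and hence the energy, grows; it is exactly this dependence on u that the surface functions introduced below probe by perturbing the parameter vector.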
In order to obtain a bounded interval for these energies, they are mapped to the interval [0, 1] using the transformation $\frac{1}{1 + c(x, I(x), u)^2}$. Note that the energy minimum is turned into a maximum. To make the resulting energy function robust to image noise, it is scaled by multiplying it by a value $\kappa_\sigma \geq 1$ depending on the noise level $\sigma$ and cutting the result to the interval [0, 1].

6.2.2 Surface Functions

A surface function for a given d-dimensional parameter vector u reflects the variation of the energy $c \in C$ over the set of modifications of the current parameter vector:

$S_{x,u,c} : \mathbb{R}^d \to [0, 1], \quad S_{x,u,c}(p) := c(x, I(x), u + p)$.  (6.2)

It can be understood as an indicator for possible alternatives to the current parameter vector, as it shows the effect of slight parameter changes p on the given energy. If the parameter changes but the surface function $S_{x,u,c}(p)$ remains almost constantly high, a rather small reliability is assigned to this optimum, since neighboring parameters yield almost equally low energies. In such cases the surface function shows an aperture problem or a homogeneous region, which makes a reliable parameter optimization impossible without further information. In the case of occlusion, transparent structures and noise, the maximum of the surface function is usually small, indicating that no good estimate is possible at all. Such outliers can make a parameter estimation arbitrarily bad, for example in the case of least squares estimators, which are used in many local methods. Hence, the intuition is that the computed parameters in the optimum are reliable only if two requirements are fulfilled: (a) No other parameter constellation with a similar energy exists. If there are different constellations of parameters yielding very similar surface function values, the solution to the parameter optimization problem is not unique and, thus, unreliable. (b) The surface function is sufficiently high at the maximum.
If this is not the case, we either got stuck in a local energy minimum or, especially if the problem is convex, there is no satisfying solution to the parameter optimization problem. In this case we have come across an outlier, for which the energy cannot be optimized satisfactorily. In both cases, the optimum is unreliable. In contrast, a single surface function peak suggests a unique, reliable optimum. Hence, we can now investigate the quality of the energy optimum by examining the intrinsic dimension and the maximum of the surface function.

6.2.3 A Continuous Formulation of the Intrinsic Dimension as Simplex Structure

In the discrete formulation, the intrinsic dimension of the surface function at location x, $S_{x,u,c}$, corresponds to the dimension of the subspace of non-constant values. Let d be the number of parameters in the parameter optimization problem. We need to examine the variance of the surface function in the (d + 1)-dimensional space. To this end, the curvature along the main axes of the surface function is computed. Following Felsberg [38], I use a continuous formulation of the intrinsic dimensionality (see Figure 2.2 in Chapter 2.4). Let $t \in \mathbb{R}^+$ define a threshold indicating a very large curvature value, and let v stand for the d-dimensional curvature vector with its entries sorted in descending order. These entries are normalized to the range [0, 1] by

$v_i = \min\{\frac{v_i}{t}, 1\}, \quad i \in \mathbb{N}_d$.  (6.3)

Then each entry can be understood as a barycentric coordinate in the intrinsic dimension simplex, e.g. $v_1$ indicates the first coordinate

$(1 - v_1)\,\mathrm{i0d} + v_1\,\mathrm{i1d}$,  (6.4)

$v_2$ indicates the second coordinate

$(1 - v_2)\,\mathrm{i1d} + v_2\,\mathrm{i2d}$,  (6.5)

and so on. Due to $v_i \geq v_{i+1}$, the resulting coordinates always lie within the d-dimensional simplex defined by the vertices i0d, i1d, ..., idd. This approach can, therefore, be understood as a generalization of the triangle formulation by Felsberg [38].
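As an illustration, the following sketch clips the sorted curvature values as in Eq. (6.3) and chains the pairwise coordinates (6.4), (6.5), ... into a single barycentric point of the simplex. The multiplicative chaining rule and the function name are my assumptions for illustration; the text above fixes only the pairwise coordinates.

```python
import numpy as np

def intrinsic_dimension_coords(curvatures, t):
    """Barycentric coordinates in the intrinsic-dimension simplex.

    curvatures: curvature values along the main axes of a surface
    function; t: threshold for a very large curvature, see Eq. (6.3).
    ASSUMPTION: the pairwise coordinates (1 - v_i) and v_i are chained
    multiplicatively into one weight vector for i0d, i1d, ..., idd.
    """
    v = np.sort(np.abs(np.asarray(curvatures, float)))[::-1]  # descending
    v = np.minimum(v / t, 1.0)                                # Eq. (6.3)
    d = len(v)
    coords = np.empty(d + 1)        # weights for i0d, i1d, ..., idd
    prefix = 1.0
    for i in range(d):
        coords[i] = prefix * (1.0 - v[i])
        prefix *= v[i]
    coords[d] = prefix
    return coords                   # non-negative, sums to 1

# Two curvatures far above the threshold t: all mass on the i2d vertex.
print(intrinsic_dimension_coords([25.0, 12.0], t=10.0))
# One moderate, one vanishing curvature: mass split between i0d and i1d.
print(intrinsic_dimension_coords([5.0, 0.0], t=10.0))
```

Because the entries are sorted in descending order before clipping, the resulting weights always lie in the simplex spanned by i0d, ..., idd, matching the property stated above.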
6.2.4 Outlier Detection

After computing the intrinsic dimension vector corresponding to the surface function, I come to the second point, the detection of outliers. Outliers are locations in the image sequence where it is not possible to optimize the energy of the optical flow estimator. One often has to deal with optimization problems for which the result of the estimator can be arbitrarily bad already in case of a single outlier in the data, as is, for example, the case in least squares methods (see Section 2.6), which are used for local flow computation methods such as [70, 16]. Hence, it would be beneficial to detect outliers. A simple but effective method for the detection of outliers is to examine the maximum of the surface function, $S_0 \in [0, 1]$. If the value is sufficiently close to 1, the energy can be optimized by the corresponding parameter set; otherwise the optimization was unsuccessful. This can be the case if the energy cannot be optimized or if the optimization process got stuck in a local minimum. Neither case is desirable for reliable parameters. In optical flow estimation, especially the difficult situations of occlusions, severe noise, transparent structures or incoherent motion can, thus, be detected.

6.2.5 Surface Measures

Based on the proposed intrinsic dimension estimator and the outlier detection, a single function $\varphi$ can now be defined as confidence or situation measure. The situation where the d optimized parameters for the optical flow problem are reliable demands a high intrinsic dimension together with a high maximum value $S_0$ of the surface function. To combine the intrinsic dimension of the surface function and its maximum value $S_0$, $\varphi$ is defined in the following way:

$\varphi : D \times \mathbb{R}^d \to [0, 1], \quad \varphi(x, u) := \phi(S_{x,u,c})$.  (6.6)

The function $\phi$ derives the situation measure value based on properties of the surface function. It will be defined based on the following theoretical considerations.
The case of lower intrinsic dimensionality of the surface function in the optimum can be detected by a low minimum curvature value $v_n$. In this way, homogeneous regions and aperture problems in different dimensions of the energy surface are recognized. In the case of an outlier, the surface function yields a low maximum value $S_0$. Therefore, the value of the function $\phi$ should always be close to 1 if $v_n$ and $S_0$ are high. Let $\mathcal{S}$ be the set of surface functions defined in Equation (6.2). Then the function $\phi$ can be defined by

$\phi : \mathcal{S} \to [0, 1], \quad \phi(S_{x,u,c}) := S_0 \cdot \left(1 - \frac{1}{1 + \tau v_n^2}\right)$,  (6.7)

where $\tau \in \mathbb{R}^+$ is used to scale the influence of the intrinsic dimensionality on the value of $\phi$. Here $\tau$ was set to 60.

6.3 Computational Issues

The discretization of the surface function has a large influence on the quality of the estimation of the intrinsic dimension. To discretize a surface function $S_{x,u,c}$, a step size h and a fixed size w of the surface are used, where h is the distance between two surface points within every dimension and w is the number of surface points in all dimensions after discretization; e.g. h = 0.5, w = 13 yielded good results.

Figure 6.1: Discretized surface functions for a two-dimensional parameter space: a) i0d, b) i1d, c) i1d, d) i2d.

The variances can only be estimated reliably if h is chosen between 0 and 1 and if bicubic interpolation of the surface function is used. Further preprocessing steps have been applied to the surface functions to obtain good results: (a) Since we expect the correct parameter set to be similar to the estimated one, only those values of $S_{x,u,c}(p)$ with small $\|p\|_2$ can actually be considered as alternatives for the current parameter vector. Hence, the examination of the surface function is limited to the direct neighborhood of its origin. (b) Since the eigenvalues of the Hessian yield noisy curvature estimates, a robust curvature estimator is introduced.
It averages n curvature values along each principal axis using the filter mask $\frac{1}{n}(\underbrace{1 \cdots 1}_{n}\; {-2n}\; \underbrace{1 \cdots 1}_{n})$. (c) To estimate the intrinsic dimension of the surface function, only those parts of the image of $S_{x,u,c}$ are relevant which are close to the maximum $S_0$, since only these locations denote possible alternatives for the current parameter vector. (d) Locations that are separated from the origin of the surface function by a local minimum are likely to belong to other local minima of the original energy and, thus, should not influence the intrinsic dimension of the surface function. Hence, all surface function values that are separated from the origin by a local minimum are set to 0. Such local minima can, for example, be found by means of a simple flood fill algorithm with starting point at the origin. Typical discretized surface functions for a two-dimensional parameter space are shown in Figure 6.1.

6.4 Experiments and Results

6.4.1 Comparison to i2d Measures

To evaluate my results, I first compare the quality of the surface measures used as situation measures with a zero flow field to the previously known situation measures detecting i2d situations, and show that all surface measures, independent of the underlying energy function, perform better than the best previously proposed measures and are robust to noise as well.

Figure 6.2: Comparison of surface situation measures (SSM measures) based on different energy functions for the recognition of the i2d situation not due to occlusion to known situation measures for increasing noise levels (error rin + rout plotted against image noise for SSM-brightnessConst, SSM-gradConst, SSM-hessConst, SSM-laplaceConst, SSM-gradNormConst, SSM-hessNormConst, SSM-ssd, structMultMotion, structCc, structMinorsAngle and ssdSurface (Anandan)).
As test sequence I again use the synthetic sequence shown in Figure 5.2, as it contains every intrinsic dimension and their occlusion. In this way we can examine if the surface measures are able to recognize the situation of accurate flow estimability (intrinsic dimension two without outliers caused by occlusion). To obtain numerical results I use the same error measure consisting of the sum of the errors within and outside the situation, which was presented in Section 5.3. As no measures are known for the detection of i2d situations not due to occlusions, the proposed surface measures are compared to the best known measures for the i2d situation: structMultMotion derived from [75], structCc [51], Anandan's measure [3] and structMinorsAngle [12]. Figure 6.2 shows the error measure plotted against an increasing noise level σ ∈ [0, 5] in the test sequence. The proposed surface situation measures are labeled by the prefix “SSM” and an abbreviation of the energy function they are based on. We can see that the proposed surface measures generally perform better than the best previously proposed i2d measures for any underlying energy function c. All surface measures are robust to noise but depend on the robustness of the underlying energy function. The susceptibility to noise increases with the order of the derivatives in the energy function.

Figure 6.3: Top: Cropped Marble sequence regions with the result of the brightness constancy surface measure for the recognition of the accurate flow estimability situation (i2d not due to occlusions): a), b), c) texture of blocks (good estimability/i0d), d) diagonal table line (i1d), e) flagstones in the background (i1d, good estimability at corners), f) table (good estimability/i0d). Bottom: Office sequence with additional lens flare and result of the SSD constancy surface measure correctly identifying the occlusion.

However, the influence of noise on the surface measures is limited by
the robust curvature estimation along the principal axes.

6.4.2 Application to Other Test Sequences

For further validation of the surface situation measures, I apply them to standard test sequences. As no ground truth concerning the accurate flow estimability situation is available for these sequences, only a visual evaluation is feasible. Figure 6.3 a)-f) shows six different cropped regions of the Marble sequence, which has been used for the same purpose in the previous chapter, and the corresponding surface measure result based on the brightness constancy energy function. In Figure 6.3 a), b) and c) we can see the application of the surface measure to different textures. In a) and b) the marble blocks show only very little texture, which makes these regions unreliable for flow estimation. In contrast, most parts of the block texture in c) are classified as sufficient for a reliable flow computation. In d) and e) we can see examples of aperture problems (i1d). The diagonal line on the table as well as the edges of the flagstones in the background of the sequence are typical examples of this situation. Both are recognized well by the surface measure. The corners of the flagstones are correctly recognized as regions where the optical flow can be estimated reliably. The table region in f) is partially recognized as reliable and partially as i0d. This is due to the larger homogeneous regions in the table texture, as here the result depends on the size of the surface considered. If the whole surface function lies within a homogeneous region, the curvature along the main axes is 0 and, thus, so is the surface measure result. To demonstrate that surface measures can also detect outliers (e.g. occlusions), I use the cropped Office sequence [73] with an additional lens flare occluding part of the background in Figure 6.3 (this kind of lens flare often poses problems, e.g. in traffic scenes).
The brightness constancy surface measure detects this region.

6.4.3 Motion Inpainting Based on Situation Measures

To show an application for surface measures used as confidence measures based on the computed flow field, I reconstruct optical flows. First, a surface confidence measure map is used to sparsify flow fields calculated on four ground truth sequences (Marble, Yosemite, Street and Office) by the three-dimensional linear combined local-global (CLG) method by Bruhn et al. [28] and by the local structure tensor method by Bigün [16], both described in Chapter 3. Then motion inpainting is applied to the sparsified displacement fields in order to reconstruct the flow at pixels with low surface measure values. I demonstrate that the angular error [10] is reduced significantly by means of motion inpainting. Table 6.1 shows the average angular error and standard deviation over ten frames for the sparsification and reconstruction of the flow field based on the best previously proposed situation measure (structMultMotion) compared to the new surface measures. For sparsification, the flow field density optimal for motion inpainting with respect to the angular error is chosen. Concerning the quality of the proposed measures, we can draw several conclusions from the results presented in Table 6.1.

• The average angular error of the motion inpainting algorithm based on the surface measures is lower than the error we obtain based on the best previously proposed situation measure. Hence, using the surface measures we can make more reliable statements on the accuracy of the flow than by means of previous i2d situation measures.

• The average angular error after motion inpainting is lower than the original angular error for the CLG and the structure tensor method.
Thus, I conclude that the remaining flow vectors after sparsification contain the most relevant information of the original flow field, and that most other information is dispensable, even obstructive, for the computation of a 100% dense flow field.

CLG        original         sparsified      density   inpainting       best previous
Marble     3.88 ± 3.39      3.59 ± 3.03     70.6      3.87 ± 3.38      3.88 ± 3.39
Yosemite   4.13 ± 3.36      2.78 ± 2.24     20.7      3.85 ± 3.00      4.13 ± 3.36
Street     8.01 ± 15.47     2.77 ± 2.52     11.5      7.73 ± 16.23     7.99 ± 15.48
Office     3.74 ± 3.93      3.25 ± 4.80     26.7      3.59 ± 3.93      3.62 ± 3.91

ST         original         sparsified      density   inpainting       best previous
Marble     4.49 ± 6.49      2.96 ± 2.25     42.3      3.40 ± 3.56      3.88 ± 4.89
Yosemite   4.52 ± 10.10     2.90 ± 3.49     37.5      2.76 ± 3.94      4.23 ± 9.18
Street     5.97 ± 16.92     2.07 ± 5.61     34.6      4.95 ± 13.23     5.69 ± 16.47
Office     7.21 ± 11.82     2.59 ± 4.32     5.1       4.48 ± 4.49      6.35 ± 10.14

Table 6.1: Angular error for four test sequences for the original field, the sparsified field with given density (percentage), the result of motion inpainting based on the best surface measure and the result of motion inpainting based on the previously best situation measure (structMultMotion), averaged over ten frames for the CLG and the structure tensor (ST) method; the density of the sparsified field was chosen as optimal for motion inpainting.

• The table also indicates the average angular error for the sparsification of the flow field by means of the surface measures. Here I chose the sparsification density which has been found optimal for motion inpainting. The sparsification error is lower than the motion inpainting error and can be achieved if a dense flow field is not required.

• For both the CLG and the structure tensor method, the inpainting of the sparsified flow fields yields lower angular errors than the original methods for all test sequences.
The results of the local structure tensor method after motion inpainting are even superior to the original and the inpainted global CLG method in all cases but one. Therefore, I conclude that, in contrast to the accepted opinion which favors global methods over local methods if dense flow fields are required, the filling-in effect of global methods is not necessarily beneficial for obtaining an accurate dense flow field. Instead, local and global methods alike can lead to better results if motion inpainting in combination with surface measures for sparsification is employed. Here, local methods often even seem preferable.

6.5 Summary and Conclusion

I have proposed surface measures, which can be employed either as confidence measures to analyze the accuracy of a given flow field or as situation measures to estimate the feasibility of accurate optical flow computation. They can be applied in case of an arbitrary number of parameters. The proposed surface measures have proven robust to noise, yield better results than all previously proposed situation measures, also for real-world sequences, and contain the most relevant information for the reconstruction of the original flow field with even higher quality. Based on these measures, locally or globally computed flow fields were sparsified and the missing flow vectors were filled in by a basic motion inpainting algorithm. Tests have been conducted using the CLG method and the structure tensor method on four standard test sequences. For the chosen test sequences I conclude that the application of a postprocessing method to sparsified flow fields calculated with local or global methods yields better results than can be achieved by exploiting the filling-in effect of global methods.
Hence, in contrast to the accepted opinion, global methods are not always preferable to local methods if a dense flow field is required, because motion inpainting based only on reliable flow vectors can lead to superior results.

Chapter 7
Statistical Confidence Estimation

7.1 Introduction

Confidence measures evaluate the accuracy of a given flow vector and are, thus, indispensable to assess and increase the quality of optical flow fields. Using the information provided by confidence measures, the accuracy of the estimated flow field can be improved by integrating the confidence measure into the computation method or by postprocessing, e.g. by removing and reconstructing incorrect flow vectors. In this chapter I propose two statistical approaches and extensions to confidence estimation for optical flow fields.

7.1.1 Motivation

It is of utmost importance for any optical flow measurement technique to give a prediction of the quality and reliability of each individual flow vector. This was already asserted in 1994 in the landmark paper by Barron et al. [10], where the authors stated that "confidence measures are rarely addressed in literature" even though "they are crucial to the successful use of all [optical flow] techniques". Confidence measures map each individual flow vector to a value within the interval [0, 1], where 0 stands for no confidence and 1 for high confidence. In contrast to situation measures, they always use the optical flow field to assess its accuracy:

ϕ : D × I × R^d → [0, 1] .  (7.1)

An example can be seen in Figure 7.1, where the confidence is indicated continuously between the colors red (which stands for confidence 0) and green (which stands for confidence 1).

Figure 7.1: Color coded confidence computed for a structure tensor flow field and the Rubber Whale sequence. Green stands for high confidence and red for low confidence.
There are mainly four benefits of confidence measures:

(a) unreliable flow vectors can be identified before they cause harm to subsequent processing steps,
(b) corrupted optical flow regions can be identified and possibly recovered by model-based interpolation (also denoted as "inpainting"),
(c) existing optical flow methods can be improved, e.g. by integrating the confidence measure into variational approaches,
(d) fast, structurally simple optical flow methods in combination with a confidence measure can replace slow, complicated ones.

Yet, the confidence measures known today are inadequate for assessing the accuracy of optical flow fields for the following reasons:

(a) Many confidence measures are, in fact, situation measures, as they infer confidence values based only on the local structure of the image sequence without taking into account the computed flow field.
(b) Most confidence measures are directly derived from specific optical flow computation techniques and, thus, can only be applied to flow fields computed by this method. But if the same model is used for flow and confidence estimation, the confidence measure only verifies the restrictions already imposed by the flow computation model. Thus, errors often go undetected, as the flow obeys the model. Hence, I opt against using the same motion model for confidence estimation.
(c) None of the proposed measures is statistically motivated, despite the notion "confidence measure".

In this chapter we, therefore, propose a statistical confidence measure, which is generally applicable independently of the flow computation method. An additional benefit of our method is its adaptability to application-specific data, i.e. it exploits the fact that typical flow fields can differ considerably between applications.

7.1.2 Related Work

For optical flow estimators a thorough analysis of the errors in the estimated flow field is important. These errors have been analyzed by Fermüller et al.
[40]. To predict errors without ground truth, confidence measures are used. The number of previously proposed confidence measures for optical flow fields is limited. In addition to the comparison by Barron et al. [10], another comparison of different confidence measures was carried out by Bainbridge and Lane [7]. In the following I will present the confidence measures that have been proposed in the literature so far. Confidence measures can be classified based on two aspects: first, I distinguish between confidence measures that derive their information from the image sequence and those that derive it from the computed flow field. The second group is then subdivided into measures that depend on the flow computation method and those that are independent of it. Most of the measures rely on the image sequence only and are, thus, naturally independent of the flow computation method. These are, in fact, situation measures, which were described, classified and compared in Chapter 5. For comparison, in this chapter I use the three measures based on the structure tensor, structCt, structCs and structCc, as well as the image gradient grad. All of these measures are examples of confidence measures that assess the reliability of a given flow vector exclusively based on the input image sequence. The second group consists of measures taking into account the flow field, but these measures are derived from, and are thus limited to, specific flow computation methods. An example is the confidence measure proposed by Bruhn and Weickert [26] for variational optical flow methods, which computes the inverse of the variational energy remaining after optimization. In this way, locations are identified where the energy could not be minimized, e.g. where the model assumption of the method is not valid, such as in the vicinity of edges in case of homogeneous regularization.
Hence, their approach assigns a low confidence value to these locations. So far, there are no measures that take into account the flow without being restricted to its computation method. Previous work on optical flow statistics and statistical models mainly comprises the work by Roth and Black. In [85] they investigated the statistics of the horizontal and vertical velocities and found that the derivative statistics strongly resemble a Student t-distribution. They used these insights to formulate an optical flow prior for a Markov random field. In [96] Sun et al. learn statistical models of brightness constancy errors, higher-order constancy assumptions such as gradient constancy, and spatial properties of optical flow fields by means of random fields.

7.1.3 Contribution

The contribution of this chapter consists of two statistical confidence measures. The first is based on linear subspace projections and is mainly applicable to flow fields computed by local optical flow methods. It has been published in [65]. The second is a purely statistical confidence measure, which is generally applicable to any kind of computed flow field. It is based on a hypothesis test and can be extended to a nonlinear method. Furthermore, it can be adapted to deal with the sparse flow fields often occurring in applications such as traffic sequences. The linear confidence measure has been published in [66]. The measures are compared to previously used situation and confidence measures based on error quantile plots.

7.2 A Confidence Measure Based on Linear Subspace Projections

I first propose a new confidence measure that is adaptable to the current flow computation problem by means of unsupervised learning. In fact, the measure can be used for all optical flow fields that have been computed with no or minor smoothness assumptions.
Ground truth data, which is generally unavailable, is not required, as the model can be learned either from a set of ground truth flow fields or from a previously computed flow field. The linear subspace projection method has been applied to the estimation of optical flow before, directly by Black et al. in [23] and by means of Markov random fields by Roth and Black in [85]. In contrast to these approaches, where only spatial information is used, I extend the subspace method to include temporal information of the flow field and derive a new confidence measure. Since much of the information contained in a flow field only becomes apparent in the temporal domain, the inclusion of temporal information is indispensable. The concept of our confidence measure is based on the idea of learning typical displacement vector constellations within a local neighborhood. The resulting model consists of a set of basis flows, a linear subspace of the flow fields, that is sufficient to reconstruct 99% of the information contained in the flow fields. Displacement vectors that cannot be reconstructed by this model are considered unreliable. Hence, the reconstruction error is chosen as confidence measure. It performs better than previously proposed confidence measures and obtains a substantial gain in quality in several cases.

Figure 7.2: Examples of flow field patches from which the motion statistics are computed.

7.2.1 Training Data Selection

Instead of defining a motion model, statistical methods are used to learn the motion model directly from sample motion data. To draw conclusions on the accuracy of a flow vector, the surrounding flow field patch of a predefined size (see Figure 7.2) is examined. In the following, a spatio-temporal flow field patch is defined.
For n given image sequence locations (x_i, t_i) ∈ Ω × [0, T], a finite time interval [t_i − τ, t_i + τ], t_i ≥ τ, and a spatial neighborhood ω(x_i, t_i) ⊂ Ω of fixed size, let

S_i : ω(x_i, t_i) × [t_i − τ, t_i + τ] → R^2 , i ∈ N_n ,  (7.2)

denote n spatio-temporal flow field patch samples centered on (x_i, t_i). As each flow vector consists of a horizontal and a vertical component, the size p of a patch corresponds to twice the number of flow vectors contained in the spatio-temporal neighborhood, p := 2 |ω| (2τ + 1). Let s_i = vec(S_i) ∈ R^p, i ∈ N_n, denote the column-wise vectorization of the i-th sample flow field patch. The relation between indices in the original spatio-temporal sample flow field patch S_i and the vector s_i can then be described by a mapping

q : ω(x_i, t_i) × [t_i − τ, t_i + τ] × {1, 2} → N_p ,  (7.3)

where 1 or 2 indicates the horizontal or vertical component of each flow vector, respectively. To obtain statistical information on the accuracy, a probabilistic motion model is learned from training data, which can be ground truth flow fields, synthetic flow fields, computed flow fields or any other flow field that is considered suitable. If motion estimation is performed for an application domain where typical motion patterns are known a priori, the training data should of course reflect this. Yet, in general there is no need for prior knowledge on the type of motion occurring in the scene, as we still obtain results of high accuracy if no such prior knowledge is available. It is even possible to use the very flow field for which we want to compute the confidence as training data, i.e. finding outliers within a single data set. This leads to a very general approach, which allows for the incorporation of different levels of prior knowledge.
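As an illustration, the extraction and vectorization of one spatio-temporal patch might look as follows. This is a minimal numpy sketch, not the implementation used in this thesis: the (T, H, W, 2) array layout, the function name, and the C-order flattening (standing in for the index mapping q) are assumptions.

```python
import numpy as np

def vectorize_patch(flow, x, y, t, half_size=1, tau=1):
    """Extract the spatio-temporal flow patch centered on (x, y, t) and
    flatten it into a vector, analogous to s_i = vec(S_i).

    `flow` is assumed to have shape (T, H, W, 2), i.e. a horizontal and
    a vertical flow component for every pixel of every frame.
    """
    patch = flow[t - tau:t + tau + 1,
                 y - half_size:y + half_size + 1,
                 x - half_size:x + half_size + 1, :]
    # The vector has p = 2 * |omega| * (2*tau + 1) entries.
    return patch.reshape(-1)

# A 3x3 spatial neighborhood over 3 frames gives p = 2 * 9 * 3 = 54.
flow = np.zeros((5, 10, 10, 2))
s = vectorize_patch(flow, x=4, y=4, t=2)
```

The choice of flattening order is irrelevant for the learning step, as long as the same mapping q is used consistently for training and test patches.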
7.2.2 Symmetrization

In order to avoid any directional bias in the following prediction methods, it is important to apply all possible rotations and reflections to the training data, including time reversal. This means that the training flow field patches are rotated several times, the vectors are reflected on the horizontal and vertical axes, and the temporal direction of the flow field patch sample is inverted. In this way we obtain, as desired, a mean vector m = 0.

7.2.3 Learning the Motion Model

In order to learn the motion model, principal component analysis (PCA), described in Chapter 2.5.4, is used similar to [23]. By means of PCA (or, alternatively, robust PCA [33]) a new orthogonal basis system B = [b_1, ..., b_p] can be computed, within which the original sample fields are decorrelated. Let the basis components be sorted according to decreasing eigenvalues. Then the first k ≤ p eigenvectors with the largest eigenvalues contain most of the variance of the sample data, whereas the eigenvectors with small eigenvalues usually represent noise or errors in the sample flow fields and, thus, should be removed from the set. Hence, the first k basis components derived in this way span a linear k-dimensional subspace of the original sample data preserving most of the sample data information. Within this subspace the sample flow fields can be approximated by a linear combination of the first k principal components b_j, j ∈ N_k, and the sample mean m = (1/n) Σ_{j=1}^{n} s_j:

s_i = Σ_{j=1}^{k} α_j b_j + m + e ,  (7.4)

where e ∈ R^p denotes the approximation error. In order to select the number of eigenvectors containing the fraction δ ∈ (0, 1) of the information of the original data set, the value k is chosen based on the eigenvalues λ_i of the eigenvectors b_i, i ∈ N_p, according to (2.30), such that

k := min { j ∈ N_p | (Σ_{i=1}^{j} λ_i) / (Σ_{i=1}^{p} λ_i) ≥ δ } .  (7.5)

The linear subspace, thus, restricts possible solutions of the flow estimation problem to the subspace of typical motion patterns statistically learned from sample data. Examples of such typical motion patterns are presented in Figure 7.3. Using temporal information, the resulting eigenflows can represent complex temporal phenomena such as a direction change, a moving motion discontinuity or a moving divergence. With the eigenvectors ("eigenflows") any vectorized displacement vector neighborhood N_x centered on position x can now be approximately reconstructed by a linear combination of the k selected eigenflows using the reconstruction function r:

r(N_x, k) = Σ_{i=1}^{k} α_i b_i + m .  (7.6)

In order to obtain the coefficient vector α containing the eigenflow coefficients α_i, it is sufficient to project the sample neighborhood N_x into the linear subspace spanned by the eigenflows using the transformation

α = B^T (N_x − m) .  (7.7)

The linear combinations of the previously derived eigenflow vectors represent typical flow field neighborhood constellations. Depending on the training data, the information contained in the learned model varies. If ground truth flow fields are used, many sample sequences are necessary to cover most of the possible flow constellations. However, as only very few sequences with ground truth exist, the resulting eigenflows represent only an incomplete set of constellations. In contrast, it is possible to compute the flow for a given sequence and use exactly this computed flow as input for the unsupervised learning algorithm. In this way the resulting model will be well adapted to the current flow problem.
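The learning and reconstruction steps described above can be sketched in a few lines of numpy. This is a minimal illustration under the stated equations (7.4)–(7.7), not the thesis implementation; function names and the (n, p) sample layout are assumptions.

```python
import numpy as np

def learn_eigenflows(samples, delta=0.99):
    """Learn the PCA motion model from an (n, p) array of vectorized
    flow field patches: the sample mean m and the first k eigenvectors
    whose eigenvalues capture the fraction delta of the total variance,
    cf. equations (7.4) and (7.5)."""
    m = samples.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(samples, rowvar=False))
    order = np.argsort(eigvals)[::-1]            # decreasing eigenvalues
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = np.cumsum(eigvals) / np.sum(eigvals)
    k = int(np.searchsorted(ratio, delta)) + 1   # smallest k with ratio >= delta
    return m, eigvecs[:, :k]

def reconstruct(patch, m, B):
    """r(N_x, k): project a vectorized patch into the eigenflow subspace
    and back, cf. equations (7.6) and (7.7)."""
    alpha = B.T @ (patch - m)                    # eigenflow coefficients
    return B @ alpha + m
```

In practice the symmetrized training patches of section 7.2.2 would be stacked into `samples`; patches lying close to the learned subspace are then reconstructed almost exactly.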
However, if the flow computation method does not allow certain displacement vector constellations, such as rotations, the trained linear subspace will not be able to represent these constellations either, as all training samples are derived from the computed flow field. In both cases, whether we learn from insufficient ground truth flows or from incorrect computed flow fields, the problem persists that correct flow constellations cannot be reconstructed from the eigenflows.

7.2.4 A Confidence Measure from Eigenflows

To evaluate the confidence of a given flow vector, its validity within its spatio-temporal context has to be considered, that is, within its neighborhood N_x of flow vectors. Given a number of k model parameters, e.g. eigenflows, a confidence measure can be derived based on the assumption that displacement vectors are the more reliable within their neighborhood, the better these flow vector constellations can be reconstructed from the eigenflows.

Figure 7.3: Examples of eigenflows calculated from computed flow fields using spatial and temporal information. The inclusion of temporal information allows for the representation of complex temporal phenomena such as a flow direction change (top), a moving motion discontinuity (center) and a moving divergence (bottom).

The accuracy of a computed flow vector can in general be assessed based on a chosen error measure E (see section 4.1.2). Hence, the normalized reconstruction error of the flow vector serves as confidence measure:

ϕ(x, u) = 1 − E(u, r(N_x, k)) / max(E) .  (7.8)

The size of the neighborhood N_x has, of course, to be the same as for the eigenflows. Our proposed method may fail on rare occasions of untypical, but correct flows encountered in the image data.
These are singular events, which, in case of underrepresentation in the training data, may not be adequately incorporated into our basic PCA framework. A range of more refined algorithms has been developed in the field of statistical learning. Some of these might solve the problem of underrepresentation, such as multiclass PCA [77] or partial least squares regression.

7.3 A Statistical Confidence Measure

The second confidence measure I propose is a purely statistical measure based on a learned model of typical flow field patches. The model consists of the first and second order moments of the flow field patch distribution obtained from training data. To assess the accuracy of a given flow vector, a test statistic is formulated together with a hypothesis test (see section 2.2.2). Since the true distribution of the test statistic is unknown, an empirical distribution is estimated from training data. The p-value, the minimum significance level for which the current hypothesis is rejected (see section 2.2.2), expresses the confidence associated with the current vector. To evaluate the proposed method, the confidence is used to determine the order of sparsification of the flow field. In each sparsification step the average error of the remaining flow field is computed and compared to the optimal value and to other confidence measures. The results show that the proposed statistical method yields lower errors than common confidence measures for almost all test sequences and optical flow methods. Furthermore, I show that it can be extended into a nonlinear estimator, which further increases its accuracy, and that it can be modified to handle sparse flow fields.

7.3.1 Hypothesis Testing

For the statistical confidence measure I use the same training data selection process as for the previous confidence measure, including symmetrization (see sections 7.2.1 and 7.2.2).
To obtain the motion model, the first and second order moments of the flow field distribution, the empirical mean m and the covariance matrix C, are computed from the training data set, which contains the vectorized flow field patches in its columns. To assess the reliability of a given flow vector based on its neighborhood, the following hypothesis is tested:

H0: "The central flow vector of a given flow field patch follows the underlying conditional distribution given the remaining flow vectors of the patch."

Let D := Ω × [0, T] again denote the spatio-temporal image domain and V : D → R^p a p-dimensional real-valued random variable describing possible vectorized flow field patches. Testing the confidence of the central vector of a regarded flow patch boils down to specifying the conditional pdf of the central vector given the remainder of the flow patch, and comparing the candidate flow vector against this prediction, considering a metric induced by the conditional pdf. For a given image sequence location (x, y, t) ∈ D let v ∈ R^p correspond to the vectorized flow field patch centered on this location, and let

(i, j), i < j,  (7.9)

denote the line indices of v corresponding to the horizontal and vertical flow component of the central vector of the original patch. We partition v into two disjoint vectors, the central flow vector v_a and the "remainder" v_b of the regarded flow patch:

v_a = (v_i, v_j)^T ,  (7.10)
v_b = (v_1, ..., v_{i−1}, v_{i+1}, ..., v_{j−1}, v_{j+1}, ..., v_p)^T .

The mean vector m and the covariance matrix C are partitioned accordingly:

m = ( m_a )        C = ( C_aa  C_ab )
    ( m_b )            ( C_ba  C_bb )

The basic idea is now to predict the central vector of a flow field patch from its neighboring vectors and to evaluate the difference between the predicted vector and the actually measured vector in a hypothesis test.
As shown in Chapter 2.3, the best linear unbiased estimator (BLUE) of the central vector and its prediction error correspond to the first and second order moments of the conditional pdf, which are given by

v̂_a = m_a + C_ab C_bb^{-1} (v_b − m_b) = m_{a|b} ,  (7.11)
Var(v_a − v̂_a) = C_aa − C_ab C_bb^{-1} C_ba = C_{a|b} .  (7.12)

I stress that these first and second order moments of the conditional pdf are valid independently of the assumption of a normal distribution. The covariance matrix implies neither Gaussianity nor an elliptical shape of the pdf; next to the mean, it is an important characteristic of any distribution. To derive the test statistic, let

d_M : R^p → R_0^+ ,
d_M(v) = (v_a − m_{a|b})^T C_{a|b}^{-1} (v_a − m_{a|b})  (7.13)

denote the squared Mahalanobis distance between v_a and the mean vector m_{a|b} given the covariance matrix C_{a|b}. Even though we do not know the true conditional distribution, the squared Mahalanobis distance is chosen as test statistic for the following reasons:

• Since, according to Chapter 2.3, the best linear unbiased estimator for the central vector v_a given the remaining flow vectors in v_b corresponds to the conditional mean m_{a|b} of the learned distribution with covariance matrix C_{a|b}, the Mahalanobis distance can be understood as the weighted distance between the central vector of the patch and its prediction m_{a|b} from the surrounding field.

• The Mahalanobis distance is the optimal test statistic in case of a normally distributed conditional pdf of the central flow vector. This does not imply that the image data or the flow data are assumed to be normally distributed as well.

To carry out a hypothesis test (significance test), we have to determine quantiles of the distribution of the test statistic for the case that the null hypothesis is known to be true. To this end, the empirical cumulative distribution function G : R^+ → [0, 1] of the test statistic is computed from training data.
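The conditional moments (7.11), (7.12) and the test statistic (7.13) translate directly into code. The following is a minimal sketch under those equations; the function names and the index convention are illustrative, not the thesis implementation.

```python
import numpy as np

def conditional_moments(m, C, idx):
    """BLUE of the central vector given the rest of the patch: a
    predictor for the conditional mean m_{a|b} and the conditional
    covariance C_{a|b}, cf. equations (7.11) and (7.12). `idx` holds
    the indices (i, j) of the central vector's two components."""
    a = np.asarray(idx)
    b = np.setdiff1d(np.arange(len(m)), a)
    Caa, Cab = C[np.ix_(a, a)], C[np.ix_(a, b)]
    Cba, Cbb = C[np.ix_(b, a)], C[np.ix_(b, b)]

    def predict(v):
        # m_{a|b} = m_a + C_ab C_bb^{-1} (v_b - m_b)
        return m[a] + Cab @ np.linalg.solve(Cbb, v[b] - m[b])

    C_cond = Caa - Cab @ np.linalg.solve(Cbb, Cba)   # C_{a|b}
    return predict, C_cond

def mahalanobis_sq(v_a, m_cond, C_cond):
    """Squared Mahalanobis distance d_M(v), cf. equation (7.13)."""
    d = v_a - m_cond
    return float(d @ np.linalg.solve(C_cond, d))
```

Using `np.linalg.solve` instead of an explicit inverse keeps the computation numerically stable when C_bb is poorly conditioned.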
We obtain the empirical quantile function

G^{-1} : [0, 1] → R^+ ,  (7.14)
G^{-1}(q) = inf{x ∈ R | G(x) ≥ q} .  (7.15)

To finally examine the validity of H0, a hypothesis test is applied:

φ_α : R^p → {0, 1} ,  (7.16)
φ_α(v) = { 0, if d_M(v) ≤ G^{-1}(1 − α)
         { 1, otherwise ,  (7.17)

where φ_α(v) = 1 indicates the rejection of the hypothesis H0. Based on this hypothesis test we would obtain a binary confidence measure instead of a continuous mapping to the interval [0, 1]. Furthermore, it would be inconvenient to recompute the confidence measure each time the significance level α is modified. Therefore, I propose to use the concept of p-values, which was introduced by Fisher [41] and defined in section 2.2.2. A p-value function Π maps each sample vector to the minimum significance level α for which the hypothesis would still be rejected, i.e.

Π : R^p → [0, 1] ,  (7.18)
Π(v) = inf{α ∈ [0, 1] | φ_α(v) = 1} = inf{α ∈ [0, 1] | d_M(v) > G^{-1}(1 − α)} .

Hence, we finally obtain the following confidence measure:

ϕ : R^p → [0, 1] ,  (7.19)
ϕ(v) = Π(v) = inf{α ∈ [0, 1] | d_M(v) > G^{-1}(1 − α)} .

7.4 Applicability of the Test

One issue we have to cope with is the applicability of the proposed hypothesis test. In case of inconsistent flow field patches, where the central vector cannot be predicted reliably from the surrounding vectors, the result of the hypothesis test is unreliable. Reliability is only given for typical surrounding flow field patches. Hence, we can only compare the central vector to the prediction by the surrounding flow vectors and the learned model if the surrounding vectors themselves follow the model. In order to make the results independent of the average flow vector length of sample and test patches, the whole patch is normalized by dividing by its l2-norm.
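For an empirical cdf G estimated from training values of the test statistic, the p-value in (7.18) reduces to 1 − G(d_M(v)): the fraction of training statistics exceeding the observed distance. A minimal sketch of this step, with illustrative names:

```python
import numpy as np

def pvalue_confidence(d, train_stats):
    """Confidence phi(v) of equation (7.19): for the empirical cdf G of
    the squared Mahalanobis test statistic, the p-value of an observed
    value d is 1 - G(d), the minimum significance level at which H0 is
    rejected. Large distances therefore map to low confidence."""
    train_stats = np.sort(np.asarray(train_stats, dtype=float))
    # G(d): fraction of training statistics <= d
    G = np.searchsorted(train_stats, d, side="right") / len(train_stats)
    return 1.0 - G
```

Sorting the training statistics once allows the p-value of every flow vector to be obtained by a binary search, so changing the significance level α requires no recomputation.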
To detect locations where the confidence measure is unreliable, I propose a second hypothesis test, which examines the hypothesis

H1: "The flow vectors of the current flow field patch which surround the central vector follow the underlying distribution."

As reasoned before, I again choose the Mahalanobis distance as test statistic. The covariance matrix and mean vector can simply be computed by marginalizing the original flow field patch distribution over the variables indicated by i and j in (7.9). Since the underlying distribution of the surrounding flow vectors is again unknown, I then compute their empirical cumulative distribution function from sample data and obtain p-values in exactly the same way as for the test of H0. The result of the hypothesis test for H1 then yields information on the applicability of the hypothesis test for H0: in case of a low p-value for the H1 test, the result of the H0 test is not reliable. In this way, we finally arrive at a two-stage method. For any given flow field we estimate the cumulative distribution function of the central vector given the surrounding vectors of the patch, and we estimate the cumulative distribution function of the surrounding flow vectors themselves. Then the H1 test yields results on the applicability of the confidence test by judging the consistency of the surrounding flow vectors only. In a second step, the H0 test can be applied at locations identified as reliable by the H1 test in order to obtain statements on the reliability of the central flow vector given the surrounding ones.

7.5 Application to Sparse Vector Fields

Sparse vector fields often occur in applications. Since the proposed confidence measure is based on a distribution over the flow vectors of a complete patch, the approach has to be modified in order to allow for the confidence computation of sparse vector fields.
To compute the confidence at the current location, let k refer to the number of flow vectors existing in the current patch, and let w ∈ {0, 1}^p indicate whether the flow vector belonging to the current index is set in the current patch: if it is set, this is denoted by w_i = 1, otherwise the entry is set to 0. We project the learned distribution, described by its first and second order moments m and C, into the subspace corresponding to the vectors existing in the current flow field patch. This is done by multiplying C and m by a matrix P ∈ {0, 1}^{2k×p}, which is obtained from an identity matrix by removing every row with index i for which w_i = 0:

C′ = P C P^T ,  (7.20)
m′ = P m .  (7.21)

The resulting distribution is described by its moments C′ and m′. Again, we condition on the central vector v ∈ R^{2k} as described in equation (7.11) and obtain the moments C′_{a|b} and m′_{a|b} of the projected, conditional distribution. Based on this distribution, we can then compute the Mahalanobis distance for the central vector of the sparse flow field patch just as in the case of a dense flow field. To obtain p-values, we would have to estimate the distribution of the Mahalanobis distance function d′_M(v), the test statistic, for the current population of the flow field patch. Since there are 2^{p/2} possible populations of the flow field patch, just as many different distributions over the test statistic would have to be estimated. This is computationally infeasible. Hence, I propose to simply use the inverse of the Mahalanobis distance as confidence function in case of sparse vector fields:

ϕ(v) = 1 / (1 + d′_M(v)) .  (7.22)

7.6 A Nonlinear Extension

The previously proposed confidence measure relies on a linear prediction of the central vector of the flow field patch from its surrounding flow vectors. In order to obtain results of higher accuracy, the vector v is extended by nonlinear, polynomial combinations of flow vector components.
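The projection of the learned moments in (7.20), (7.21) and the inverse-distance confidence (7.22) can be sketched as follows; a minimal numpy illustration with assumed names, not the thesis implementation.

```python
import numpy as np

def project_model(m, C, w):
    """Restrict the learned moments to the entries present in a sparse
    patch, cf. equations (7.20) and (7.21): P keeps exactly the rows of
    the identity matrix whose indicator w_i is 1."""
    P = np.eye(len(m))[np.asarray(w, dtype=bool)]
    return P @ m, P @ C @ P.T

def sparse_confidence(d_sq):
    """Inverse-distance confidence of equation (7.22), where d_sq is
    the squared Mahalanobis distance of the projected model."""
    return 1.0 / (1.0 + d_sq)
```

Indexing the identity matrix with the boolean mask builds P without an explicit loop; conditioning then proceeds on the projected moments exactly as in the dense case.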
In this way, even more relations between the central flow vector and those in the surrounding patch can be represented by v. To limit the number of possible flow vector combinations, I restrict this approach to polynomials of degree three. Thus, higher powers of flow vector components are possible without excessive complexity and without losing the sign (as would be the case for degree two). To identify the most meaningful combinations of flow vectors, the normalized cross covariance Z is used. For two random variables X and Y, Z is defined as follows:

Z(X, Y) = Cov(X, Y) / ( sqrt(Var(X)) sqrt(Var(Y)) ) .  (7.23)

We estimate the normalized cross covariance matrix Z of the central vector of the patch and all degree-three polynomials computed from its surrounding vectors based on sample data taken from the Yosemite, Marble, Street, Office and Rubber Whale sequences.

Figure 7.4: Polynomials containing nonlinear combinations of the flow vector components in a 3 × 3 neighborhood, which are used to extend the input vector v. The rectangles represent the 3 × 3 neighborhood, and the numbers the powers of the components in the polynomial. The first and second rows describe the polynomials chosen to estimate the horizontal flow vector component, the third and fourth rows those chosen to estimate the vertical flow vector component. As the horizontal central flow vector is best described by horizontal flow components and the vertical central flow by vertical neighborhood components, only the corresponding dimension (horizontal in the first and second rows, vertical in the third and fourth rows) is indicated by the rectangles, as no interrelations between the two dimensions exist in the polynomials.

Then the polynomials with the highest normalized cross covariance are chosen for each of the two components, because they describe the central vector best.
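Equation (7.23) estimated from paired samples can be written in a few lines; candidate degree-three features would then be ranked by the magnitude of Z against the central component. A minimal sketch with illustrative names and toy data:

```python
import numpy as np

def normalized_cross_covariance(X, Y):
    """Z(X, Y) of equation (7.23), estimated from paired samples."""
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    cov = np.mean((X - X.mean()) * (Y - Y.mean()))
    return cov / np.sqrt(X.var() * Y.var())

# Toy ranking example (hypothetical data): compare a linear and a cubic
# feature of a neighbor component u against a central component c.
u = np.array([-2.0, -1.0, 0.5, 2.0])
c = u ** 3
scores = {name: abs(normalized_cross_covariance(c, f))
          for name, f in [("u", u), ("u^3", u ** 3)]}
```

Here the cubic feature attains |Z| = 1, since c is exactly u cubed; in the actual feature selection the best-scoring polynomials per component would be kept.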
The computation of the normalized cross covariance matrix clearly shows that the relations of the horizontal central vector are strongest with the horizontal components of its neighbors; that means that horizontal motion is best described by nonlinear combinations of horizontal neighboring flow vector components. The equivalent holds true for vertical motion. In case a 3 × 3 neighborhood is chosen, the polynomials in Figure 7.4 have been identified as having the most significant relation with the central vector. For testing the nonlinear confidence measure, I chose altogether 20 polynomials for each flow vector component.

7.7 Results

As there are several test sequences with ground truth data and numerous optical flow computation methods with different parameters each, it is impossible to present an extensive comparison between the proposed and previously known confidence measures. Hence, I will present results for a selection of typically used real and artificial sequences and flow computation methods. Here, the Yosemite, the Marble and the Rubber Whale sequence (from the Middlebury database [8]) are used. As optical flow computation methods, the local structure tensor method [16], the non-linear 2D multiresolution combined local global method (CLG) [28] as well as the methods proposed by Nir [78] and Farnebäck [37] are employed. To quantify the error e(x) ∈ R_0^+ of a given flow vector at image sequence location x ∈ D, the endpoint error and the angular error (see section 4.1.2) are used. The proposed approaches are compared to several of the situation and confidence measures described in previous sections. These are the three measures examining the intrinsic dimension of the image sequence by Haussecker and Spies [51] (strCt, strCs, strCc), the inverse of the energy of the global flow computation method by Bruhn et al. [26] (inverseEnergy), and the image gradient measure (grad), which is approximated by central differences.
Note that the inverse of the energy measure is only applicable to variational approaches and has, thus, not been applied to the flow fields computed by methods other than CLG. The Yosemite flow field by Nir et al. [78] was obtained directly from the authors; hence, no variational energy is available for the computation of the inverse energy confidence measure. In the following, the three approaches proposed in this chapter will be abbreviated by pcaRecon for the measure based on linear subspace projections, pVal for the linear statistical confidence measure and pValNonlin for its nonlinear extension. In order to numerically compare different measures I follow the comparison method suggested by Bruhn et al. in [26] called "sparsification", which is based on quantile plots. To this end, a specific fraction of the flow vectors (indicated on the horizontal axis in the following figures) is removed from the flow field in the order of increasing confidence, and the average error of the remaining flow field is computed. Hence, removing fraction 0 means that all flow vectors are taken into account, so the value corresponds to the average error over all flow vectors. Removing fraction 1 indicates that all flow vectors have been removed from the flow field, yielding average error 0. For some confidence measures, the average error even increases after removing a certain fraction of the flow field. This is the case if flow vectors with errors below the average error are removed instead of those with the highest errors. As a benchmark, I also calculate an "optimal confidence" c_{opt}, which reproduces the correct rank order of the flow vectors in terms of the chosen error measure e and, thus, indicates the optimal order for the sparsification of the flow field:

c_{opt}(x) = 1 - \frac{e(x)}{\max\{e(y) \mid y \in D\}} .   (7.24)

For the experiments the patch size was not optimized but kept constant at 3 × 3 × 1 for all test sequences, where 3 × 3 stands for the spatial and 1 for the temporal dimension.
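The sparsification procedure and the optimal confidence (7.24) can be sketched as follows (a minimal version with illustrative names):

```python
import numpy as np

def optimal_confidence(err):
    """c_opt(x) = 1 - e(x) / max_y e(y): reproduces the error rank order."""
    err = np.asarray(err, dtype=float)
    return 1.0 - err / err.max()

def sparsification_curve(err, conf, fractions):
    """Mean error of the remaining field after removing the given fraction
    of vectors in order of increasing confidence (fraction 1 -> error 0)."""
    err = np.asarray(err, dtype=float).ravel()
    conf = np.asarray(conf, dtype=float).ravel()
    order = np.argsort(conf)            # least confident vectors come first
    err_sorted = err[order]
    curve = []
    for f in fractions:
        n_remove = int(round(f * err.size))
        remaining = err_sorted[n_remove:]
        curve.append(remaining.mean() if remaining.size else 0.0)
    return np.array(curve)
```

With the optimal confidence as input, the curve decreases as fast as possible, since the vectors with the highest errors are removed first.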
Figure 7.5: Remaining mean error for a given fraction of removed flow vectors based on different patch sizes (3 × 3 × 1, 7 × 7 × 1, 11 × 11 × 1, 15 × 15 × 1, 21 × 21 × 1) for the proposed confidence measure (Farnebäck method on the Rubber Whale sequence, trained on ground truth data). The results show that the patch size chosen for the confidence measures is rather negligible.

Figure 7.6: Remaining mean error based on different training sequences (ground truth of Yosemite; the Yosemite, Marble, Street and Office sequences; PIV particle data; several sequences) for the proposed confidence measure (Farnebäck method on the Rubber Whale sequence for 3 × 3 × 1 patch size). The results show that the methods are hardly sensitive to the choice of training data.

The influence of this parameter is rather negligible as shown in Figure 7.5. Figure 7.6 shows that the performance of the confidence measures is also mostly independent of the training data. In case that ground truth or similar training data is used the performance is improved, but even particle sequence data yields results close to ground truth data. Quantile plots of the average flow field error for the state-of-the-art flow computation methods by Nir et al. [78] and Farnebäck et al. [37], the nonlinear 2D CLG method [28] and the structure tensor method [16] have been computed for the Rubber Whale sequence proposed in [8] as well as for the standard Yosemite and Marble test sequences. Selected results are shown in Figure 7.7. For nearly all test examples the results indicate that the remaining average error for almost all fractions of removed flow vectors is lowest for the proposed confidence measures.
As confidence measures are applied to remove only the flow vectors with the highest errors, the course of the curves is most important for small fractions of removed flow vectors and can in practice be neglected for larger fractions. The results indicate that at least one of the proposed confidence measures outperforms the previously employed measures for locally and globally computed optical flow fields on all test sequences. When looking at the result plots in Figure 7.7 it becomes apparent that the original and the nonlinear version of the proposed statistical measure do not perform equally well. In case the endpoint error is used as error measure, the nonlinear version yields mostly better results. In case the results are based on the angular error, the original linear confidence measure often performs better. The reason for this could be that the nonlinear polynomials amplify vector components of large magnitudes. These usually also yield larger endpoint errors, since this error measure is absolute, which means that it depends on the ground truth length. It should also be noted that for a flow field density of 90% the average error of the local structure tensor method is already lower than that of the CLG flow fields at 100% density on the Marble and Yosemite test sequences. If the CLG flow field is sparsified to 90% as well, the error of the structure tensor method is approximately equal to that of the CLG method for the Yosemite sequence and only half of it for the Marble sequence. Yet, the structure tensor approach only needs a fraction of the computation time of the CLG method and is much simpler to implement. Hence, for the local structure tensor method in two out of three cases I was able to obtain a flow field of 90% density of a quality level equal to or better than that of the CLG method by means of the proposed linear confidence measure, which clearly shows the benefit of the suggested approaches.
Figure 7.7 shows the following panels: a) CLG method, endpoint error, Yosemite sequence; b) Farnebäck method, endpoint error, Yosemite sequence; c) Farnebäck method, angular error, Yosemite sequence; d) Nir method, angular error, Yosemite sequence; e) Structure Tensor method, angular error, Marble sequence; f) Structure Tensor method, endpoint error, Marble sequence; g) Structure Tensor method, angular error, Rubber Whale sequence; h) Horn-Schunck method, endpoint error, Rubber Whale sequence.

Figure 7.7: Average error quantile plots based on different optical flow methods and error measures for the comparison of previous confidence measures (strCt,
strCs, strCc, grad, inverseEnergy) to the proposed methods (pcaRecon, pVal, pValNonlin) and the optimal confidence defined in (7.24) (optConf). The horizontal axis indicates the fraction of removed flow vectors, the vertical axis the mean error of the remaining flow field.

To graphically compare confidence measure results I use the structure tensor flow field computed on the Rubber Whale test sequence based on the angular error as example, as here the difference between the proposed confidence measures and the previously used ones is most evident. As the scale of confidence measures is not unique, I again only compare the order of removal of the flow vectors based on increasing confidence. Hence, each flow vector is assigned the time step of its removal from the field. The resulting orders for a selection of the confidence measures are shown in Figure 7.8.

Figure 7.8 (panels: a) optimal, b) pValNonlin, c) pcaRecon, d) pVal, e) strCc, f) strEv3): Sparsification order of flow vectors based on increasing confidence value for the structure tensor flow field on the Rubber Whale sequence based on the angular error. The proposed confidence measures (pValNonlin, pVal, pcaRecon) are closest to the optimal confidence.

To examine the results of the second hypothesis test examining H1, it is applied to three sample flow fields: the Nir flow field on the Yosemite sequence, the CLG flow field on the Marble sequence and the structure tensor flow field on the Rubber Whale sequence. What we expect is that flow edges and regions with large, inconsistent motion vector fields are detected as unreliable inputs for confidence estimation and, thus, for the H0 test. The results are shown in Figure 7.9 and confirm our expectations. Hence, the applicability test is suitable to detect inconsistent flow field patches, which prevent reliable confidence estimates. Finally, I show results for the sparse variant of the pVal confidence measure.
Figure 7.10 shows a car sequence with a sparse flow field and the color coded result of the confidence measure. Even though no ground truth is available and, thus, no numerical evaluation is possible, the results show that improbable flow vectors are marked by low confidence values.

7.8 Summary and Conclusion

In this chapter I have proposed three confidence measures. All three are based on the learned first and second order moments of the flow field patch distribution. The first measure assigns confidence values based on the distance between the original vector and its projection into a linear subspace computed by principal component analysis. The second measure carries out a statistical hypothesis test and uses p-values in order to assign confidence values. The third measure is an extension of the second measure, which integrates nonlinear interrelations between different flow vector components by means of polynomials. In combination with the hypothesis test I also proposed to examine the confidence measure's applicability. To this end I formulated an additional hypothesis test, which assesses the consistency of flow field patches. The results show that especially flow edges and regions with inconsistent flow are detected as unreliable for confidence estimation. In this way, I finally obtained a method consisting of two sequential hypothesis tests: the first test, H1, estimates the applicability of the following confidence test, H0, which is only reliable in case H1 yields high results. All measures and their extensions are generally applicable to arbitrarily computed optical flow fields. As the measures are based on the computation of motion statistics from sample data, they are to the best of my knowledge the first confidence measures for optical flows for which the notion "confidence measure" is in fact justified in a statistical sense. Slight changes in the algorithm also allow for an application to non-dense flows, which often occur, e.g., in traffic sequences.
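The subspace-projection idea behind the first measure can be sketched as follows; note that the mapping from projection distance to a confidence value is illustrative here — the exact normalization is the one defined in Chapter 7:

```python
import numpy as np

def pca_reconstruction_confidence(patch_vec, mean, basis):
    """Subspace-projection confidence (pcaRecon-style sketch): project the
    vectorized flow patch onto the learned principal subspace and turn the
    reconstruction distance into a confidence value in (0, 1].

    patch_vec : (p,) vectorized flow field patch
    mean      : (p,) learned mean patch m
    basis     : (p, k) matrix with orthonormal principal components b_1..b_k
    """
    centered = patch_vec - mean
    recon = basis @ (basis.T @ centered)   # projection onto span(b_1..b_k)
    dist = np.linalg.norm(centered - recon)
    return 1.0 / (1.0 + dist)              # illustrative mapping to (0, 1]
```

Patches lying inside the learned subspace get confidence 1; the confidence drops monotonically with the distance to the subspace.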
Results for locally and globally computed flow fields on ground truth test sequences based on different error measures show the superiority of the suggested method compared to previously employed confidence measures.

Figure 7.9 (panels: a) Nir flow, b) test applicability, c) CLG flow, d) test applicability, e) structure tensor flow, f) test applicability): Results of the H1 hypothesis test, which examines the applicability of the H0 test. Especially flow edges and difficult flow regions are detected. In these cases the result of the H0 test, and, thus, the computed confidence, is unreliable.

Figure 7.10: Result of the pVal confidence measure applied to a sparse flow field. a) Original car sequence with computed flow, b) Confidence measure result (green: high confidence, red: low confidence).

Chapter 8
A Model Based Optical Flow Algorithm

8.1 Introduction

Every confidence measure builds on some idea of what a correct optical flow field should look like. Therefore, confidence measures and optical flow estimators are highly related, and most confidence measures already contain constraints for a new optical flow computation method. In this chapter I employ the original idea used for the subspace projection based confidence measure in Chapter 7 to propose a new optical flow estimation method. The confidence measure is based on the idea that computed flow field patches can be expressed as a linear combination of typical, learned basis flows. Following the same line of thought, the coefficients of these basis flows can be estimated based on the brightness constancy constraint. In this way, we obtain a highly accurate optical flow estimation method, which can be easily implemented and parallelized.

8.1.1 Motivation

Optical flow refers to the displacement field between subsequent frames of an image sequence.
Methods for the computation of the optical flow usually trade off speed, accuracy and implementation effort. Local methods such as the Lucas/Kanade approach [70] or the structure tensor approach by Bigün [16] are fast and easy to implement but not very accurate. Global methods such as the methods by Horn and Schunck [53], Bruhn et al. [28], Brox et al. [25], Papenberg et al. [81] and Nir et al. [78] are much more accurate and can even be applied in realtime by means of multigrid methods [27], yet with considerable implementation effort. Farnebäck [36] proposed a local method which is accurate and fast, but also rather complex to implement due to its rule-based image segmentation scheme. Hence, in this chapter I propose to extend the method by Black et al. [23] in order to obtain a local optical flow method which ranges among the most accurate methods today and at the same time is fast, yet simple to implement. Furthermore, the suggested method relies on natural motion statistics and is, thus, adaptable to specific motion patterns occurring, e.g., in fluid dynamics or driver assistance systems. Finally, the original subspace projection confidence measure (Chapter 7) is directly inherent to this flow computation method and can easily be applied, yielding considerable improvements of the flow field.

8.1.2 Related Work

There are a number of local methods for optical flow computation today. Because the optical flow problem is underdetermined, all these methods involve additional assumptions on the structure of the motion, i.e. they are based on a model of the flow. Variational methods incorporate such models, e.g., in regularization terms [28, 81, 25]. Local methods explicitly model assumptions on the flow vectors within spatio-temporal neighborhoods. Lucas/Kanade [70] and Bigün [16] assume that the velocity is constant within a local neighborhood centered on the current pixel.
This leads to an overdetermined system of equations, which can be solved by the least squares (Section 2.6) or total least squares method. A more robust method for solving the overdetermined system, the least median of squares approach, has been proposed by Bab-Hadiashar and Suter [6]. Yet, the model of piecewise constant motion is not adequate for most image sequences. Hence, other methods assume more general models such as linear or affine models [58, 19, 36], local planar models [18] or physics based models [46, 50]. An overview can be found in [94]. Flow fields based on such models are often more accurate than those based on assumptions of constant flow. However, there are situations where more complex models would be necessary to compute accurate flow fields, e.g. situations with motion discontinuities and transparent motion, which have been addressed by Barth et al. [13]. General affine models have been integrated into a variational framework by Nir et al. [78]. In even more complex situations, learning motion models from given sample motion data is a way to obtain superior results. Roth and Black [85] employ a general learning based approach using Fields of Experts, which is integrated into a global optical flow method. Here, learning is not adapted to special image sequences. In contrast, Black et al. [23] as well as Yacoob and Davis [100] integrate adapted, learning based models into a local optical flow method. To learn these models, principal component analysis (PCA) is used. This leads to a nonlinear energy functional, which is linearized and minimized by means of a coarse-to-fine strategy and coordinate descent. However, the models employed are either purely spatial [23] or purely temporal [100].
Our approach differs in five main aspects from these two methods:

(a) Instead of formulating a non-linear energy functional, I obtain an overdetermined system of equations, which can be solved by established least squares methods instead of performing gradient descent.

(b) Spatio-temporal instead of purely spatial or temporal motion models are employed, which can represent complex motion patterns over time.

(c) I show highly accurate results comparable to Farnebäck's, but with much less effort, for test sequences typically used in optical flow computation.

(d) By means of a model-based confidence measure directly inherent to the flow computation method, the resulting flow field can be sparsified, yielding significantly lower angular errors. Via interpolation or inpainting after sparsification it is also possible to reconstruct a dense flow field with lower angular error [62].

(e) I additionally integrate the learned motion model into a global optical flow approach, which can be understood as a learning based extension of the method by Nir et al. [78].

Based on the simple structure tensor method, we end up with a simple, fast, parallelizable and accurate optical flow method, which is able to incorporate learned prior knowledge on special types of motion patterns and, thus, can be adapted to all kinds of motion estimation problems.

8.1.3 Contribution

In this chapter I use the original idea of the subspace projection confidence measure to formulate a new optical flow estimator. Following Nir [78], the optical flow vectors are not estimated explicitly. Instead, a parameter vector is estimated which contains the coefficients of the principal components spanning the learned subspace of typical motion constellations in flow field neighborhoods. Thus, every flow vector is expressed as a linear combination of learned basis flow field patches.
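The patch representation underlying this parameterization can be sketched as follows (illustrative names; B holds the principal components column-wise, m the learned mean, and the vectorization puts all horizontal components before all vertical ones):

```python
import numpy as np

def patch_from_coefficients(alpha, B, m):
    """Reconstruct a vectorized flow field patch as a linear combination of
    learned basis patches: u = B alpha + m. The first half of the vector
    holds horizontal, the second half vertical flow components."""
    return B @ alpha + m

def central_vector(u_patch, patch_side):
    """Pick the flow vector at the spatial patch center from the
    vectorized patch (single-frame patch of size patch_side x patch_side)."""
    half = u_patch.size // 2
    c = (patch_side * patch_side) // 2   # index of the central pixel
    return u_patch[c], u_patch[half + c]
```

The estimator described below only needs to recover alpha; the flow vector itself then follows from these two steps.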
In this way, spatio-temporal motion models learned from sample data can be employed to constrain the optical flow field. Using the brightness constancy constraint equation (3.1), we end up with an overdetermined system of equations, which can be solved efficiently. Part of this work has been submitted [64].

8.2 Parameter Estimation

I again use the learned motion model proposed in Chapter 7, consisting of the mean vector m and the principal components b_j, j \in \mathbb{N}_k, in (7.4) learned from sample data. Instead of estimating the optical flow itself, we want to estimate the corresponding coefficients \alpha_j, j \in \mathbb{N}_k, of the principal components, which implicitly define the displacement field for the neighborhood of the current pixel. In this way, the resulting optical flow is restricted to the learned subspace spanned by the k principal components. In the sense of Nir et al. [78] one could speak of an over-parameterized model, since we estimate k coefficients to obtain a two-dimensional flow vector, which is chosen from the center of the flow field patch defined by the corresponding parameters. To estimate the coefficients \alpha_j, j \in \mathbb{N}_k, we can solve an overdetermined system of equations. For a given flow vector and its spatio-temporal neighborhood I make two assumptions:

(a) The flow field patch can be represented as a linear combination of principal components,

(b) Each of the flow vectors within the patch fulfills the brightness constancy constraint equation (3.1) as explained in Chapter 3.

For a given pixel position (x, t) \in D let I_x, I_y, I_t \in \mathbb{R}^{p/2}, p = 2\,|\omega|\,(2\tau + 1), denote the vectorized image derivatives with respect to x, y and t within the spatio-temporal flow field patch \omega(x) \times [t - \tau, t + \tau] centered on (x, t). Then the assumptions (a) and (b) can be combined by substituting u in (3.1) by the linear combination of basis flows.
For each x' \in \omega(x), t' \in [t - \tau, t + \tau] we thus obtain one equation of the following form:

\begin{pmatrix} I_{x,r} \\ I_{y,r} \end{pmatrix}^T \cdot \begin{pmatrix} \sum_{j=1}^{k} \alpha_j b_{j,r} + m_r \\ \sum_{j=1}^{k} \alpha_j b_{j,s} + m_s \end{pmatrix} = -I_{t,r}   (8.1)

where r := q(x', t', 1), s := q(x', t', 2) (see 7.3). We can rewrite these p/2 equations in matrix formulation:

L\alpha = -d, \quad L \in \mathbb{R}^{p/2 \times k}, \; d \in \mathbb{R}^{p/2}, \; \alpha = (\alpha_1, \cdots, \alpha_k)^T   (8.2)

with L and d defined as follows. Under the assumption that the first p/2 entries of the principal components b_i, i \in \mathbb{N}_k, correspond to the horizontal components of each flow vector and the remaining entries to the vertical components, the j-th column of L, denoted by l_j, and the vector d are given by

l_j := \begin{pmatrix} I_{x,1}\, b_{j,1} + I_{y,1}\, b_{j,p/2+1} \\ \vdots \\ I_{x,p/2}\, b_{j,p/2} + I_{y,p/2}\, b_{j,p} \end{pmatrix}, \quad j \in \mathbb{N}_k,   (8.3)

d := \begin{pmatrix} I_{x,1}\, m_1 + I_{y,1}\, m_{p/2+1} + I_{t,1} \\ \vdots \\ I_{x,p/2}\, m_{p/2} + I_{y,p/2}\, m_p + I_{t,p/2} \end{pmatrix}.   (8.4)

Many methods are available to solve this overdetermined system of equations for the vector \alpha. One way is to use the simple and efficient least squares method (see Section 2.6)

\alpha = -(L^T L)^{-1} L^T d,   (8.5)

which already yields results of high accuracy. As the integration area of this method is large for spatially and/or temporally large principal components, the accuracy in the vicinity of motion boundaries may be lower. Yet, to handle outliers, the original least squares method is inappropriate. In such cases, I propose to use the more robust least median of squares approach by Rousseeuw [86, 87], applied to optical flow estimation by Bab-Hadiashar and Suter [6]. Here, the basic idea is to randomly and repeatedly choose a subset of equations from the original system that contains as many equations as unknowns and can, thus, be solved precisely. For each solution obtained from a random subset of equations the residual of the remaining equations is computed and, finally, the solution with the lowest residual median is chosen.
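Under the patch layout just described (first p/2 entries horizontal, remaining entries vertical), the construction of L and d and the least squares solution (8.2)-(8.5) can be sketched in NumPy as follows; names are illustrative, and the robust least-median-of-squares variant is omitted for brevity:

```python
import numpy as np

def estimate_patch_coefficients(Ix, Iy, It, B, m):
    """Estimate the coefficient vector alpha of the learned basis flows for
    one spatio-temporal patch (sketch of equations (8.2)-(8.5)).

    Ix, Iy, It : (p/2,) vectorized image derivatives inside the patch
    B          : (p, k) principal components; rows 0..p/2-1 hold horizontal,
                 rows p/2..p-1 vertical flow components
    m          : (p,) learned mean flow patch
    """
    half = Ix.size                                  # p/2 equations
    Bu, Bv = B[:half, :], B[half:, :]               # split flow components
    L = Ix[:, None] * Bu + Iy[:, None] * Bv         # (p/2, k), columns l_j
    d = Ix * m[:half] + Iy * m[half:] + It          # (p/2,)
    # least squares solution of L alpha = -d
    alpha, *_ = np.linalg.lstsq(L, -d, rcond=None)
    return alpha
```

In the test below the derivatives are built to fulfill brightness constancy for a constant flow (u, v) = (2, 0), so the single basis-patch coefficient must come out as 2.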
Independent of the method used to solve the overdetermined system of equations, the resulting parameter vector \alpha finally represents the optical flow within a spatio-temporal neighborhood, from which the central vector is chosen as displacement vector for the current image sequence location.

8.3 Confidence Estimation

Since the principal components have already been computed for the flow estimation method, they can be employed to estimate the confidence of the computed flow vectors using the linear subspace projection confidence measure described in Section 7.2.4. Based on this confidence measure, vectors with high errors can be removed from the flow field and, if a dense flow field is necessary, reconstructed afterwards as demonstrated in Chapter 9.

8.4 Integration of the Model into a Global Optical Flow Method

In 2005 Bruhn et al. [28] proposed to introduce the original structure tensor method into a global framework and came up with the very well-known CLG method. Since I proposed an extension of the structure tensor by introducing learned motion models, I will show that the proposed structure tensor extension can also be introduced into a global framework, thus extending the original CLG method to incorporate motion models and to estimate parameter vectors as done by Nir [78]. Let

\tilde{\alpha} = (\alpha_1, ..., \alpha_k, 1)^T,   (8.6)
\tilde{L} = (L \;\; d) \quad \text{(see (8.3))},   (8.7)
J = \tilde{L}^T \tilde{L}.   (8.8)

Then the following energy is minimized, which is similar to that proposed by Bruhn et al. [28]. In contrast to their method, I apply the regularizer to the parameters \alpha, as done by Nir et al. [78], instead of to the flow vectors themselves:

E(\alpha) = \int_D \psi_1(\tilde{\alpha}^T J \tilde{\alpha}) + \lambda\, \psi_2(\|\nabla\alpha\|^2)\; dx\, dy\, dt, \quad \lambda \in \mathbb{R}^+.   (8.9)

Here \psi_1 and \psi_2 stand for outlier functions, which allow for nonlinearities in the flow field, e.g. at motion boundaries. I use an outlier function proposed by Charbonnier et al. [31]:

\psi_i(s^2) = 2\beta_i^2 \sqrt{1 + \frac{s^2}{\beta_i^2}}, \quad i \in \{1, 2\}.   (8.10)

Via calculus of variations (see Chapter 2.1), this leads to k Euler-Lagrange equations, for r \in \mathbb{N}_k:

\psi_1'(\tilde{\alpha}^T J \tilde{\alpha}) \Big( \sum_{j=1}^{k} J_{r,j}\, \alpha_j + J_{r,k+1} \Big) - \lambda\, \mathrm{div}\big(\psi_2'(\|\nabla\alpha\|^2)\, \nabla\alpha_r\big) = 0.   (8.11)

The solution of this nonlinear system of equations corresponds to a minimum of the original energy functional. In order to compute derivatives, the central differences scheme is used for spatial derivatives and the forward differences scheme for temporal derivatives. To improve the computation of image derivatives the scene is first smoothed by means of a Gaussian filter. The importance of this step for more accurate results is discussed in [28]. For optimization the multiresolution scheme with warping proposed by Brox et al. [25] is used. This method has been formulated for flow fields, not for parameter maps as in this case. The scheme can be adapted to parameter maps by making two changes: First, to scale parameter maps between different pyramid levels, we cannot scale the parameters themselves, but we need to compute the flow corresponding to the parameters, then scale the flow and transform the scaled flow back to the parameter space. Let B := (b_1, ..., b_k); then the following function f scales the flow given by the parameter vector \alpha by a factor s:

f : \mathbb{R}^k \times \mathbb{R} \to \mathbb{R}^k,   (8.12)
f(\alpha, s) = B^T \big( s\,(B\alpha + m) - m \big).   (8.13)

Second, the initial guess for each level is not zero but the parameters \alpha_0 which correspond to a zero flow field at this level:

\alpha_0 = -B^T m.   (8.14)

With these two changes, the multiresolution scheme with warping can easily be applied to parameter maps as well. For each new pyramid level the current complete solution in the parameter space, p_0, is scaled to the current level by means of the function f. Then the image is warped by the flow corresponding to the scaled parameters. With \alpha_0 as initial guess, the optical flow problem is solved for the warped sequence in the parameter space on the current level of the pyramid.
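A minimal sketch of the scaling function f from (8.12)-(8.13) and the zero-flow initialization (8.14), assuming B has orthonormal columns as produced by PCA (function names are illustrative):

```python
import numpy as np

def scale_parameters(alpha, s, B, m):
    """f(alpha, s) = B^T (s (B alpha + m) - m): scale the flow encoded by
    the parameter vector alpha between pyramid levels. B is assumed to have
    orthonormal columns (principal components), m is the learned mean."""
    return B.T @ (s * (B @ alpha + m) - m)

def zero_flow_parameters(B, m):
    """alpha_0 = -B^T m: the parameter vector whose representable flow is
    closest to the zero flow field."""
    return -B.T @ m
```

Note that for s = 1 we get f(alpha, 1) = B^T B alpha = alpha, i.e. scaling by factor one leaves the parameters unchanged, as expected.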
Finally, the solution is scaled to the original size by the function f and added to the previous solution p_0. Then the procedure is repeated for the next level. More detailed information on pyramids with warping is given in [81].

8.5 Results

In this section I present results on the accuracy, efficiency and adaptability of the proposed optical flow method. For the implementation of the proposed local method, the filters optimized for optical flow by Scharr [89] are used to estimate derivatives. Usually, I use size 7 × 7 × 7 if the length of the image sequence permits it. Furthermore, all sequences are presmoothed by a Gaussian filter with spatial σ = 0.8. For the computation of the principal components, 5000 samples were randomly selected from the training sequences.

8.5.1 Accuracy

In order to evaluate the accuracy of the proposed optical flow method, several experiments were conducted on ground truth data. Here, the Yosemite, the Marble and the Rubber Whale sequence from the Middlebury dataset [8] as well as the Street and Office sequence [73] shown in Figure 8.1 were used.

Quality of the Estimator

A comparison of the angular error and standard deviation to previously proposed local and global optical flow methods for the Yosemite sequence is shown in Table 8.1. Among them is the method by Roth and Black [85], who also obtained very good results by learning statistical motion models from sample data and integrating this model into a variational approach. Yet, their algorithm is rather complex, and for the Yosemite sequence they trained their model on ground truth data. Hence, the proposed method is preferable due to accuracy, speed and straightforwardness. In fact, the results obtained are as accurate as those of the relatively involved method by Farnebäck [36], yet with a much lower standard deviation of 1.45 compared to 2.57. It is even more accurate than many global methods such as the combined local global method by Bruhn et al. [28].
Yet, in contrast to most other methods, it is also simple to implement and adaptable to different types of motion. This approach was, furthermore, applied to the Marble and Rubber Whale sequences.

Figure 8.1: One frame of the Yosemite, the Marble, the Rubber Whale, the Street and the Office sequence.

Comparison to Other Approaches

Method                                  | Ang. Err. | Std
Black & Anandan [22]                    | 4.46      | 4.21
Bruhn et al. (2d CLG linear) [28]       | 2.64      | 2.27
Black & Jepson [21]                     | 2.29      | 2.25
Ju et al. [58]                          | 2.16      | 2.0
Bab-Hadiashar & Suter [6]               | 1.97      | 1.96
Bruhn et al. (2d CLG non-linear) [28]   | 1.79      | 2.34
our method (trained on other gt)        | 1.53      | 1.69
Roth & Black [85] (trained on gt)       | 1.47      | 1.54
Bruhn et al. (3d CLG non-linear) [28]   | 1.46      | 1.50
our method (trained on HS)              | 1.45      | 1.47
Farnebäck [36]                          | 1.40      | 2.57
our method (trained on gt)              | 1.35      | 1.45
Farnebäck [37]                          | 1.14      | 2.14
Papenberg et al. [81]                   | 0.99      | 1.17
Nir et al. [78]                         | 0.85      | 1.18

Table 8.1: Comparison of the proposed model based local motion estimator to the angular error and standard deviation obtained by previously proposed local and global methods for the Yosemite sequence without clouds.

To obtain the results in Tables 8.2 and 8.3, spatial model sizes ω between 3 × 3 and 21 × 21, temporal model sizes τ between 0 and 3, and numbers of principal components between 2 and 10 were tested. Table 8.4 shows the parameters used (spatial model size, temporal model size, number of principal components) in order to obtain the results in Tables 8.2 and 8.3. The values indicate that large model sizes but only between 5 and 10 or even fewer principal components and, thus, coefficient parameters α_j are necessary to obtain good results.

Evaluation of the Confidence Measure

In the context of Tables 8.2 and 8.3, I will also refer to the effects of the confidence measure on the angular error of the remaining flow field.
The column titled "density" indicates the density of the remaining flow field, which is obtained by removing the flow vectors with lowest confidence values. The results in the table indicate that the high accuracy of the method is generally increased by the application of the confidence measure. Its computation is very simple, since the principal components have been computed before the actual motion estimation.

Sample data  | Density (%) | Yosemite     | Marble       | Rubber Whale
Ground truth | 100         | 1.35 ± 1.45  | 2.06 ± 3.64  | 7.87 ± 16.12
Ground truth | 90          | 1.26 ± 1.30  | 1.60 ± 2.78  | 5.30 ± 10.49
Ground truth | 80          | 1.14 ± 1.22  | 1.38 ± 2.52  | 4.45 ± 9.55
Ground truth | 70          | 1.06 ± 1.22  | 1.27 ± 2.36  | 4.20 ± 9.83
Computed     | 100         | 1.45 ± 1.47  | 2.32 ± 3.95  | 7.87 ± 16.14
Computed     | 90          | 1.32 ± 1.21  | 1.68 ± 2.60  | 5.31 ± 10.52
Computed     | 80          | 1.19 ± 1.05  | 1.30 ± 1.85  | 4.44 ± 9.52
Computed     | 70          | 1.11 ± 0.97  | 1.10 ± 1.45  | 4.20 ± 9.82
Other GT     | 100         | 1.53 ± 1.69  | 2.55 ± 4.25  | 7.85 ± 15.95
Other GT     | 90          | 1.37 ± 1.43  | 1.87 ± 2.73  | 5.24 ± 10.43
Other GT     | 80          | 1.24 ± 1.37  | 1.49 ± 2.05  | 4.36 ± 9.45
Other GT     | 70          | 1.15 ± 1.38  | 1.27 ± 1.65  | 4.12 ± 9.75

Table 8.2: Angular error and standard deviation for different sample data selections (ground truth flow of the same sequence, Horn-Schunck flow field for the same sequence, ground truth flows of other sequences) and densities after sparsification based on the confidence measure in [65] for the Yosemite, Marble and Rubber Whale sequence.

Sample data | Density (%) | Street       | Office
Other GT    | 100         | 4.99 ± 13.72 | 3.83 ± 4.98
Other GT    | 90          | 3.65 ± 8.38  | 3.35 ± 3.85
Other GT    | 80          | 3.04 ± 6.05  | 3.01 ± 3.38
Other GT    | 70          | 2.44 ± 4.52  | 2.75 ± 3.25

Table 8.3: Angular error and standard deviation for ground truth sample data taken from other sequences, and densities after sparsification based on the confidence measure in [65] for the Street and Office sequence.

An important issue to investigate is the dependency of the accuracy of the proposed
method on the choice of parameters for a) the sample data for the computation of the motion statistics PCA model, b) the size of the model and c) the number of principal components. Dependency on Sample Data For the investigation of the dependency on the sample data I used sample data taken from 114 8.5 Results Parameters Sequence Yosemite (ground truth) Yosemite (computed) Yosemite (other) Marble (ground truth) Marble (computed) Marble (other) Rubber Whale (ground truth) Rubber Whale (computed) Rubber Whale (other) Street (other) Office (other) ω 21 × 21 19 × 19 19 × 19 21 × 21 21 × 21 19 × 19 19 × 19 19 × 19 19 × 19 19 × 19 21 × 21 τ 3 2 3 3 5 7 0 0 1 3 1 k 7 6 10 9 7 6 2 2 2 2 5 Table 8.4: Model parameters (spatial model size ω, temporal model size τ , number of principal components k) used to obtain the results in Tables 8.2 and 8.3. Note that for the Rubber Whale sequence the ground truth flow only contains a single frame and, thus, limits τ to 0 in the training process. (a) the specific ground truth flow field, (b) a flow field computed by the Horn-Schunck method [53], (c) ground truth flow fields of other sequences. For the computation of the Horn-Schunck flow fields used for (b) a multiresolution approach was employed yielding angular errors of 2.21 ± 2.83 on the Yosemite sequence, 4.90 ± 2.74 on the Marble sequence and 12.05 ± 16.25 on the Rubber Whale sequence. For (c) the other ground truth sequences are the Marble, Street and Office sequence in case we want to estimate the optical flow of the Yosemite sequence, and the Yosemite, Street and Office sequence in case we want to estimate the optical flow of the Marble sequence, and from all other four sequences if we want to estimate the flow of the Rubber Whale sequence. These training sequences have been chosen because their ground truth flow fields are readily available. 
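The training of the motion statistics PCA model on such sample data can be sketched as follows. This is a hedged illustration of the general technique (PCA via SVD on vectorized spatial flow patches, u-component stacked before v-component), restricted to spatial patches (τ = 0) for brevity; the function names, the non-overlapping patch extraction, and all details beyond "PCA on flow patches" are my own assumptions, not the thesis implementation.

```python
import numpy as np

def learn_motion_model(flows, omega, k):
    """Learn a PCA motion model from sample flow fields.
    `flows` is a list of (H, W, 2) arrays; spatial patches of size
    omega x omega are vectorized (u-part first, then v-part) and the
    mean vector m and the top-k principal components are returned."""
    patches = []
    for f in flows:
        H, W, _ = f.shape
        for y in range(0, H - omega + 1, omega):
            for x in range(0, W - omega + 1, omega):
                p = f[y:y+omega, x:x+omega]
                patches.append(np.concatenate([p[..., 0].ravel(),
                                               p[..., 1].ravel()]))
    X = np.array(patches)
    mean = X.mean(axis=0)
    # principal components = right singular vectors of the centered data
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]          # mean vector m, eigenflows b_1 .. b_k

def project(patch_vec, mean, eigenflows):
    """Coefficients alpha_j of a flow patch in the learned model
    u ~ m + sum_j alpha_j b_j, plus the low-dimensional reconstruction."""
    alpha = eigenflows @ (patch_vec - mean)
    return alpha, mean + eigenflows.T @ alpha
```

Training once on sample flows and reusing the eigenflows for both estimation and confidence evaluation mirrors the workflow described above.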
Table 8.2 compares the angular error and standard deviation obtained for the three test sequences at the different flow field densities, depending on whether the PCA model is trained on ground truth data, on a computed Horn-Schunck flow field or on ground truth data of other sequences. From the results in Table 8.2 we can draw the conclusion that, as expected, the best results are usually obtained from accurate prior knowledge. However, even in the absence of such knowledge, learning from optical flow estimates (e.g. Horn & Schunck) yields results of almost equal quality. Even in the case of learning from generic motion fields, competitive results can still be obtained. For the Street and Office sequences the choice of training data in Table 8.3 is limited to ground truth sequences taken from other scenes.

Dependency on Model Size

Concerning the sensitivity of the method with respect to different model sizes and numbers of principal components, Tables 8.5 and 8.6 show the dependency of the method on the chosen size (ω, τ) of the PCA model, based on the Yosemite sequence for ground truth sample data and on the Marble sequence for Horn-Schunck data, respectively, using 7 principal components. The results suggest that larger model sizes yield lower angular errors.

Dependency on model size
ω \ τ     0              1              3
5 × 5     7.12 ± 12.76   4.72 ± 7.62    3.01 ± 3.55
9 × 9     3.93 ± 6.46    2.69 ± 3.49    2.12 ± 2.27
15 × 15   2.39 ± 3.01    1.81 ± 1.97    1.50 ± 1.70
21 × 21   1.79 ± 1.87    1.66 ± 1.57    1.35 ± 1.45

Table 8.5: Angular error and standard deviation for different spatio-temporal sizes (ω, τ) for the Yosemite sequence trained on ground truth data using 7 principal components.
Dependency on model size
ω \ τ     0              1              3
5 × 5     9.74 ± 14.10   6.68 ± 10.29   6.49 ± 9.76
9 × 9     6.61 ± 9.60    5.05 ± 8.11    3.42 ± 6.52
15 × 15   5.01 ± 5.85    3.24 ± 4.78    2.54 ± 5.12
21 × 21   4.10 ± 3.78    2.90 ± 3.24    2.42 ± 3.97

Table 8.6: Angular error and standard deviation for different spatio-temporal sizes (ω, τ) for the Marble sequence trained on Horn-Schunck data using 7 principal components.

Dependency on Numbers of Principal Components

Tables 8.7 and 8.8 show the results for different numbers of principal components for the Yosemite sequence trained on ground truth data and for the Marble sequence trained on Horn-Schunck data, for the model size ω = 21 × 21, τ = 3. The error values suggest that this number does not have much influence on the accuracy of the flow field if at least five or six components are chosen. This shows that most of the variance of the occurring motion patterns is already contained in the first five or six principal components.

Dependency on principal components
k   ang. err.      k    ang. err.
2   1.93 ± 2.07    7    1.35 ± 1.45
3   1.86 ± 2.06    8    1.36 ± 1.40
4   1.53 ± 1.53    9    1.35 ± 1.41
5   1.44 ± 1.56    10   1.36 ± 1.42
6   1.40 ± 1.53

Table 8.7: Angular error and standard deviation for different numbers of principal components for the Yosemite sequence trained on ground truth using a spatio-temporal model size of ω = 21 × 21, τ = 3.

Dependency on principal components
k   ang. err.      k    ang. err.
2   3.28 ± 4.99    7    2.32 ± 3.95
3   3.32 ± 4.95    8    2.34 ± 3.99
4   3.13 ± 4.57    9    2.35 ± 4.06
5   2.95 ± 4.34    10   2.35 ± 4.04
6   2.37 ± 3.99

Table 8.8: Angular error and standard deviation for different numbers of principal components for the Marble sequence trained on the Horn-Schunck result using a spatio-temporal model size of ω = 21 × 21, τ = 2.

Hence, I have shown that neither the choice of (reasonably general) sample data nor higher numbers of principal components have much influence on the results of the proposed optical flow method.
In contrast, the spatio-temporal model size is important to obtain high accuracy.

Graphical Results

The flow fields for all test sequences with 100 % density computed on ground truth data are shown in Figures 8.2 and 8.3. Apparently, the accuracy of the method is especially due to angular errors close to 0 in large regions without motion boundaries, e.g. the valley of the Yosemite sequence and the table of the Marble sequence. At motion boundaries the accuracy of the flow field is lower, especially for higher numbers of principal components.

Figure 8.2: HSV-coded ground truth flow fields (left) and result of the proposed estimator (right) for the Yosemite, the Marble and the Rubber Whale sequence based on motion statistics obtained from ground truth data.

Figure 8.3: HSV-coded ground truth flow fields (left) and result of the proposed estimator (right) for the Street and the Office sequence based on motion statistics obtained from generic ground truth data of other scenes.

Figure 8.4: Example of a principal component representing a horizontal motion boundary. Note the slight, non-sharp transition between the speeds on both sides of the boundary.

The reason for this lies in the fact that motion boundaries are not accurately learned by the PCA model if smooth flow field patches and motion boundaries are combined in the training data and if the location of the motion boundary varies within the flow field patch. Principal components representing edges then typically show a slight, non-sharp transition between the speeds on the two sides of the motion boundary. An example of such a principal component is shown in Figure 8.4. If such principal components are contained in the set of eigenflows, they dominate at motion boundaries due to their high coefficients.
Yet, since the learned motion boundaries are not sharp, the resulting computed motion boundary is not sharp either. Hence, the method can become inaccurate at motion boundaries. If such principal components are not included in the set of eigenflows, the least median of squares approach usually leads to sharp motion boundaries, since the pixels on one side of the motion boundary are considered outliers. Another reason for inaccurate estimates at motion boundaries are the derivative filters: they are spatially and temporally large and, thus, yield inaccurate image derivatives near motion boundaries, which in turn lead to inaccurate values in the linear system of equations (8.3). Figure 8.5 illustrates the different behavior of the proposed method near motion boundaries for different numbers of eigenvectors, compared to the original structure tensor method. In all cases the least median of squares method has been used to solve the system of equations. From Figure 8.5 we can deduce the following: since the third and fourth principal components contain edge structures, these components are mainly used to represent the edge in the case of four principal components (8.5 b). Hence, in the center the values of the third and fourth parameters are predominant. As the learned motion boundaries are not perfectly sharp, the transition is also visible in the resulting computed flow and parameter field.

Figure 8.5: Estimation result with the least median of squares method based on (a) two and (b) four eigenvectors, and (c) without a learned eigenvector model. Top: estimated parameters. Bottom: flow indicated by the parameters above. The average angular errors of the estimated flow fields are (a) 0.13, (b) 1.03 and (c) 0.27.
If only two principal components are used for the flow estimation (8.5 a), the edge cannot be represented by the model; hence, the least median of squares method considers pixels across the motion boundary as outliers and ensures a correct edge. On the other hand, if no motion model is used (8.5 c), we end up with the original structure tensor approach combined with the least median of squares method. The resulting motion field and the higher angular error show that the edge is not as clear as in the case where motion models are used (8.5 a). A drawback of the proposed method is that it has difficulties with very large displacements due to the linearization of the brightness constancy equation. This is a common problem for many optical flow methods relying on the linearized expression. Usually, the use of a multiscale, pyramid-based approach helps to solve this problem. The rather disappointing results for the Office sequence, whose ground truth flow field consists only of a divergence, are probably due to the limitations of the training data, which was taken from various sequences without divergences. Therefore, the largest errors appear at the center of the divergence of the flow field.

8.5.2 Efficiency

The proposed local optical flow method can be implemented efficiently for several reasons. First, the method only takes a limited local image region into account to estimate the displacement vector for each pixel. Hence, it requires only limited space and can be easily parallelized. Second, the computation of the PCA model can be carried out once before the estimation of the optical flow and can later be used for all kinds of sequences and for the confidence estimation. I will now analyze the speed of the suggested method more precisely. To compute the structure tensor we need to calculate several scalar products, which can be obtained by means of convolutions in order to avoid recalculations.
Let

v_i := b_i(1 : p/2)        (8.15)
w_i := b_i(p/2 + 1 : p)    (8.16)
m_a := m(1 : p/2)          (8.17)
m_b := m(p/2 + 1 : p)      (8.18)

for i ∈ N_k denote the horizontal and vertical components of the i-th eigenflow and of the mean vector. Let J = (L, d)^T (L, d) denote the structure tensor containing (k+1) × (k+1) entries. Then each entry J_ij of the structure tensor corresponds to a scalar product of the following form (as J is symmetric, only the upper triangle and the diagonal are indicated):

1 ≤ i ≤ j ≤ k:         J_ij = ⟨v_i ∗ I_x + w_i ∗ I_y , v_j ∗ I_x + w_j ∗ I_y⟩
1 ≤ i ≤ k, j = k + 1:  J_ij = ⟨v_i ∗ I_x + w_i ∗ I_y , I_t + m_a ∗ I_x + m_b ∗ I_y⟩
i = j = k + 1:         J_ij = ⟨I_t + m_a ∗ I_x + m_b ∗ I_y , I_t + m_a ∗ I_x + m_b ∗ I_y⟩

In the first case [1 ≤ i ≤ j ≤ k], (3k/2)(k + 1) convolutions of eigenvectors with image derivatives are necessary. In the second case [1 ≤ i ≤ k, j = k + 1], 6k convolutions are necessary, and in the third case [i = j = k + 1], 9 convolutions are necessary. All in all, this amounts to (3/2)k² + (15/2)k + 9 convolutions to compute the structure tensor for all image locations. This number can be reduced to (3/2)k² + (7/2)k + 1 convolutions if the training data is symmetrized in such a way that the mean vector computed by PCA equals the zero vector (see Section 7.2.2). For comparison, the original structure tensor method by Bigün for k parameters requires (1/2)k² + k + 1 convolutions. After computing the structure tensor, the proposed method as well as the original method by Bigün simply amounts to solving a (k+1) × (k+1) system of equations.

Figure 8.6 shows the computation times per pixel on a 2.4 GHz machine for increasing integration area sizes and numbers of eigenflows, computed on the Yosemite sequence containing 15 frames and based on the basic least squares method.

Figure 8.6: Computation times per pixel for increasing integration area sizes and increasing numbers of eigenflows.

We can conclude that for small integration areas and numbers of eigenflows the method works in near-realtime without further tuning.
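The convolution scheme above can be illustrated with a short sketch. It assumes precomputed derivative images I_x, I_y, I_t and eigenflow components already reshaped to ω × ω kernels, and it only forms the pointwise products; in the actual method these products would still be integrated (e.g. box- or Gaussian-filtered) over the local integration area. All names are hypothetical, and the sketch is mine, not the thesis code.

```python
import numpy as np
from scipy.ndimage import convolve

def model_structure_tensor(Ix, Iy, It, v, w, m_a, m_b):
    """Assemble the (k+1) x (k+1) structure tensor entries J_ij for every
    pixel.  v[i], w[i] are the omega x omega kernels of the i-th eigenflow's
    horizontal/vertical component; m_a, m_b those of the mean vector.
    Returns J with shape (k+1, k+1, H, W) of pointwise products, to be
    averaged over the integration area afterwards."""
    k = len(v)
    # k "eigenflow responses" plus the mean/I_t term d
    resp = [convolve(Ix, v[i]) + convolve(Iy, w[i]) for i in range(k)]
    resp.append(It + convolve(Ix, m_a) + convolve(Iy, m_b))
    J = np.empty((k + 1, k + 1) + Ix.shape)
    for i in range(k + 1):
        for j in range(i, k + 1):
            # symmetry: only the upper triangle is computed explicitly
            J[i, j] = J[j, i] = resp[i] * resp[j]
    return J
```

Computing the k + 1 response images once and reusing them for all entries is exactly what keeps the convolution count roughly quadratic in k rather than larger.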
For larger sizes and numbers of eigenflows the computation time increases approximately quadratically. Furthermore, the convolutions carried out to compute the structure tensor images take most of the computation time; the application of the model to the sequence is very fast. For comparison: the previously proposed method "Real-Time Optic Flow Computation with Variational Methods" by Bruhn et al. [27] takes 0.0094 ms per pixel on a 3.06 GHz machine. To speed up the approach, the method can be parallelized or implemented on GPUs similarly as described by Strzodka and Garbe [95].

8.5.3 Adaptability

The algorithm is adaptable to all kinds of scenes where typical, complex motion patterns need to be computed, such as in fluid dynamics or driver assistance systems in vehicles. Especially in such cases it is valuable to learn motion statistics from sample data. If no such prior knowledge on the type of motion is available, the algorithm already achieves results of high accuracy; the accuracy increases further if such prior knowledge is in fact available. Figure 8.7 shows spatial principal components computed on particle image velocimetry (PIV) test data, on the Yosemite sequence and on a motion boundary. The examples show that very different kinds of flow fields yield very different principal components. In this way, the algorithm can be adapted to special applications. The proposed method can easily be extended to three-dimensional optical flow problems and brightness changes [50].

Figure 8.7: Examples of spatial principal components computed on PIV data (top), the Yosemite sequence (center) and on a motion boundary (bottom).

Figure 8.8 shows results for PIV data from the PIV challenge [79]. As the displacements are extremely large in the upper right corner of the image, the image derivatives for the small particles are incorrect, leading to incorrect results due to the linearized brightness constancy constraint. Hence, the resolution of the image sequence is reduced by a factor of 4. To compute image derivatives, the 7 × 7 × 7 filters proposed by Scharr [89] are employed. Six eigenflows of size 15 × 15 × 1 are used to estimate the flow, and the sequence is preblurred by a Gaussian filter of spatial standard deviation 0.8. In this way, root mean squared (RMS) errors of 0.13 could be obtained, which lie within the range of typical PIV methods applied in the PIV challenge [93]. Figure 8.8 shows an example of such a sequence and the computed flow field.

Figure 8.8: Example sequence and computed flow field for PIV data.

8.5.4 Results for the Global Approach

To demonstrate the global approach, it is applied to the Yosemite and the Rubber Whale sequence. For the Yosemite sequence a Gaussian filter with σ = 0.7 is used. Furthermore, a 13 × 13 × 1 model is used to estimate 6 model parameters. The smoothing parameter was set to λ = 500. For the data term function ψ1, β1 = 10 is chosen, and for the regularizer β2 = 2. 200 linear iterations and 5 nonlinear steps on each of the 5 pyramid levels were used. I employed warping as described above, the central difference scheme for spatial discretization and forward differences for temporal discretization. The resulting flow field and the corresponding angular error can be found in Figure 8.9. An average angular error of 1.77 ± 1.61 could be obtained. For the second test sequence, the Rubber Whale sequence, the following parameters were used: Gaussian σ = 1.0, model size 5 × 5 × 1, 10 model parameters, λ = 500, β1 = 70 and β2 = 0.1. The flow field and angular error are depicted in Figure 8.9. The average angular error for this sequence is 9.13 ± 17.43.
The problem with the global approach is that its high memory complexity does not allow for large model sizes, especially not temporal ones. Hence, the results had to be limited to rather small, solely spatial optical flow models. Better results could probably be obtained with larger model sizes.

Figure 8.9: Flow and angular error computed on the Yosemite sequence (top) and on the Rubber Whale sequence (bottom) by means of the global approach described in Section 8.4.

8.6 Summary and Conclusion

I have presented a novel, local approach to optical flow estimation, which essentially extends the basic structure tensor method by incorporating prior knowledge by means of learned motion models. The proposed method yields results of high quality in the case of small or intermediate displacements. For large displacements the linearization of the brightness constancy constraint equation is no longer a valid approximation; here, a multiscale approach can be employed to obtain good results. For the Yosemite sequence, among the local methods the proposed approach obtains errors comparable to Farnebäck [36] with a much lower standard deviation. Among global methods it is more accurate than the non-linear 3d CLG method by Bruhn et al. [28] or the statistics based method by Roth and Black [85]. Furthermore, the proposed method is not only accurate, but also simple to implement, easily parallelizable and adaptable to special motion patterns in case such prior knowledge is available. Hence, especially if implementation or computation time is scarce (as is often the case in industrial applications) or if typical motion patterns exist in the scene, the proposed algorithm is a good choice. I have demonstrated that the method is robust with respect to the selection of sufficiently general sample data for the motion statistics and with respect to the other parameters as well.
Besides, the suggested local approach can be integrated into a global optical flow method based on learned motion models.

Chapter 9
The Restoration of Optical Flow Fields

9.1 Introduction

The previous chapters were dedicated to the analysis of computed optical flow fields. Situation measures can be used to make statements on the estimability of the optical flow vector based on the complexity of the image sequence. In contrast, confidence measures analyze the accuracy of an already computed flow field. Based on such indicators, incorrect vectors can be removed from given flow fields. In this chapter I propose methods for the restoration of the sparsified fields in order to obtain dense fields with a lower average error. In the results section I will show how flow fields can be automatically refined based on the combination of the nonlinear statistical confidence measure suggested in Chapter 7 and an image based inpainting approach proposed in this chapter.

9.1.1 Motivation

Many methods have been proposed to estimate motion in image sequences. Yet, in difficult situations such as multiple motions, aperture problems or occlusion boundaries, incorrect optical flow estimates often occur. These incorrect flow vectors can be detected and removed from the flow field, e.g. by means of confidence measures [26, 66]. But since many applications demand a dense flow field, it would be beneficial to reconstruct the missing vectors based on information from the surrounding flow field. A similar task has been addressed in the field of image reconstruction, where it is called "inpainting". The reconstruction of optical flow fields can be accomplished by a simple extension of these inpainting functionals for images, e.g. TV-inpainting on two-dimensional vector fields. However, these methods sometimes fail in situations where the course of the motion boundary is unclear, e.g.
if round motion boundaries or junctions occur. Since image edges often correspond to motion edges, the information drawn from the image sequence can be important for the reconstruction, especially in cases where the damaged vector field does not contain enough information to uniquely determine the course of motion boundaries. Hence, in the special case of optical flow, the image sequence provides a source of information in addition to the corrupted vector field, which can be used to guide the reconstruction process in ambiguous cases. So far, optical flow fields have sometimes been used for the reconstruction of images, e.g. in video completion; this time I use the image to reconstruct the optical flow field. The resulting functional is nonlinear and can be minimized by means of the finite element method. I compare the results to diffusion based and TV inpainting methods.

9.1.2 Related Work

For the reconstruction of images, inpainting is a widely used technique. The reconstruction of corrupted images was first proposed by Masnou and Morel [72] and named "disocclusion". The term "inpainting" was brought up by Bertalmio et al. in [15]. It refers to the art of restoring damaged paintings or, in the case of digital images, to the reconstruction of blank image domains based on image information outside the domain. The classical inpainting problem can be formulated as follows. Given an image I_0 : Ω → R and an inpainting domain G ⊂ Ω, one asks for a restored image intensity I : Ω → R such that I|_{Ω\G} = I_0 and I|_G is a suitable and regular extension of the image intensity I_0 outside G. The simplest inpainting model is based on the harmonic construction of I on G with boundary data I = I_0 on ∂G. This model is equivalent to the minimization of

E_L(I) = (1/2) ∫_G ‖∇I‖² dx    (9.1)

for given boundary data. The resulting intensity function I is smooth (even analytic) inside G, but it does not continue any edge-type singularity of I_0 prominent at the boundary ∂G.
To resolve this shortcoming, TV-type inpainting models have been proposed [29]. They are based on the functional

E_TV(I) = (1/2) ∫_G ‖∇I‖ dx,    (9.2)

which allows for steep transitions along edge contours. The resulting image intensity is a BV function and, thus, characterized by jumps along rectifiable edge contours. In [9] Ballester et al. proposed a variational approach based on the continuation of isophote lines. A variational approach based on the level set perimeter and mean curvature was presented by Ambrosio and Masnou in [2]. Other approaches have been proposed for image inpainting, e.g. curvature-driven diffusion inpainting suggested by Chan and Shen [30], or the restoration of motion fields for video reconstruction.

9.1.3 Contribution

In this chapter I address the restoration problem for locally corrupted optical flow fields. Often in practical applications the flow field, which is based on image derivatives, may be corrupted locally while the image data is still available. This available information has not previously been exploited for optical flow restoration. Thus, a novel anisotropic BV-type variational approach is proposed, where the anisotropy takes into account edge information of the underlying image sequence. To identify unreliable flow vectors, a confidence measure is used; this measure is taken into account as a weight in the functional. The method will be validated on test data and on real world motion sequences with given ground truth. Part of this work has been submitted [14].

Let ϕ(x) : R³ → [0, 1] denote the confidence function, which indicates the regions to be reconstructed. Let, furthermore, θ stand for the threshold applied to ϕ in order to identify the regions to be reconstructed, and let H stand for the Heaviside function.

9.2 Diffusion Based Motion Inpainting

A fast and simple way to inpaint a given motion field u is to smoothly reconstruct it by minimizing the gradient within the corrupted region.
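A minimal discrete sketch of this idea, before the continuous derivation: inside the corrupted region the flow is relaxed toward the average of its neighbors by Gauss-Seidel iterations with successive overrelaxation. The function name and parameters are my own, and the sketch assumes the corrupted region does not touch the image border.

```python
import numpy as np

def diffusion_inpaint(u0, mask, n_iter=2000, omega_sor=1.9):
    """Harmonic (diffusion based) inpainting of one flow component u0.
    mask is True where the flow is corrupted (confidence below theta);
    inside the mask the discrete Laplace equation u_xx + u_yy = 0 is
    solved by Gauss-Seidel iterations with overrelaxation (SOR)."""
    u = u0.copy()
    ys, xs = np.nonzero(mask)
    for _ in range(n_iter):
        for y, x in zip(ys, xs):
            # four-point Laplace stencil: relax toward the neighbor average
            nb = 0.25 * (u[y-1, x] + u[y+1, x] + u[y, x-1] + u[y, x+1])
            u[y, x] = (1 - omega_sor) * u[y, x] + omega_sor * nb
    return u
```

Each of the two flow components u1, u2 would be inpainted independently with this routine; pixels outside the mask are never touched, which realizes the boundary condition u = u0.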
This can be achieved by minimizing the following energy functional in a variational approach [62]:

min_u ∫_{Ω,t} (u(x) − u0(x))² H(ϕ(x) − θ) + λ ‖∇u(x)‖² (1 − H(ϕ(x) − θ)) dx .

To minimize the energy, the calculus of variations is used (see Chapter 2.1). As the region where ϕ(x) ≥ θ is preserved, we only have to compute the minimum over the remaining set of pixels, called G, that is, the minimum of the function

φ(u) = ∫_G L(u) dx = ∫_G ‖∇u(x)‖² dx = ∫_G u1x² + u1y² + u2x² + u2y² dx .    (9.3)

The Euler-Lagrange equations for i ∈ {1, 2} are obtained according to Chapter 2.1 in the following way:

∂L/∂ui − ∂/∂x (∂L/∂uix) − ∂/∂y (∂L/∂uiy) = 0  ⇔  −2 uixx − 2 uiyy = 0  ⇔  uixx + uiyy = 0 .

Hence, the minimization of the energy (9.3) in a variational approach leads to a set of linear partial differential equations, the Euler-Lagrange equations,

Δui(x) = uixx + uiyy = 0,   if ϕ(x) < θ,
u(x) = u0(x),               otherwise.    (9.4)

These equations are discretized using a finite differences scheme; the second derivatives are discretized with the four-point Laplace stencil. In this way we obtain a large linear system of equations, which can be solved using standard methods such as the conjugate gradient method or the Gauss-Seidel method with successive overrelaxation (SOR). Yet, as only the gradient of the motion field is minimized within the corrupted regions, diffusion based motion inpainting is not suitable to continue motion edges into the region to be reconstructed.

9.3 TV Motion Inpainting

As edges are not taken into account by the diffusion based approach, the total variation based denoising functional by Rudin, Osher and Fatemi [88] can be adapted to the inpainting of motion fields. The total variation of a motion field is denoted by

TV(u) = ∫_G ‖∇u(x)‖ dx = ∫_G √( u1x(x)² + u2x(x)² + u1y(x)² + u2y(x)² ) dx .

The space of functions of bounded variation (BV space) is defined as

BV(Ω) = {f | f ∈ L¹(Ω) and TV(f) < ∞} .    (9.5)
The BV space is a Banach space together with the BV norm

‖f‖_BV = ‖f‖_L¹ + TV(f) .    (9.6)

It allows jumps while retaining sufficient control over arbitrary oscillations. Hence, this space has often been chosen for the representation of images containing edges. As motion fields contain edges at every occlusion boundary, they can be represented as two-dimensional functions in BV space. I propose to minimize the following energy functional

min_u ∫_G (u(x) − u0(x))² H(ϕ(x) − θ) + λ ‖∇u(x)‖ (1 − H(ϕ(x) − θ)) dx ,

where the second term equals TV(u(x)) restricted to {x | ϕ(x) < θ}. Let

φ(u) = ∫_G L(u) dx = ∫_G ‖∇u(x)‖ dx = ∫_G √( u1x² + u1y² + u2x² + u2y² ) dx .    (9.7)

For brevity let

a := u1x² + u1y² + u2x² + u2y² .    (9.8)

The corresponding Euler-Lagrange equations for i ∈ {1, 2} are obtained in the following way:

∂L/∂ui − ∂/∂x (∂L/∂uix) − ∂/∂y (∂L/∂uiy) = 0
⇔ (−uixx √a + uix² uixx / √a) / a + (−uiyy √a + uiy² uiyy / √a) / a = 0
⇔ ( −(uixx + uiyy) a + uix² uixx + uiy² uiyy ) / √(a³) = 0 .    (9.9)

Equivalently, the minimization of the energy (9.7) leads to the following Euler-Lagrange equations, a set of non-linear partial differential equations:

div( ∇ui(x) / ‖∇u(x)‖ ) = 0,   if ϕ(x) < θ,
u(x) = u0(x),                  otherwise.    (9.10)

To solve these equations, they are discretized by common finite differences schemes. The Euler-Lagrange equation can be understood as the gradient in the functional space. Hence, for an artificially introduced time step t we can write

∂u/∂t = −∇L    (9.11)
(u^{n+1} − u^n) / dt = −∇L    (9.12)
u^{n+1} = u^n − dt ∇L .    (9.13)

Thus, to solve the system of nonlinear partial differential equations we can use the gradient descent method.

9.4 Image Guided Motion Inpainting

The TV motion inpainting approach is able to reconstruct flow field edges. However, the precise course of these edges is often unclear for larger destroyed regions; consider for example the edge of a circle. Such problems can be handled by integrating information from the image sequence, such as the gradient.
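The gradient descent (9.13) for the TV model of Section 9.3 can be sketched as follows. This is a hedged illustration, not the thesis code: the small eps that regularizes ‖∇u‖ away from zero, the use of `np.gradient` for the finite differences, and all names are my own additions.

```python
import numpy as np

def tv_inpaint(u0, mask, n_iter=500, dt=0.1, eps=1e-3):
    """TV inpainting of a two-component flow field u0 of shape (H, W, 2)
    by gradient descent: the descent direction of the TV energy is
    div(grad u_i / ||grad u||), applied only where mask is True."""
    u = u0.copy()
    for _ in range(n_iter):
        ux = np.gradient(u, axis=1)
        uy = np.gradient(u, axis=0)
        # joint norm over both components, regularized by eps
        norm = np.sqrt((ux**2 + uy**2).sum(axis=2, keepdims=True) + eps**2)
        div = np.gradient(ux / norm, axis=1) + np.gradient(uy / norm, axis=0)
        u[mask] += dt * div[mask]       # descent step inside the corrupted region
    return u
```

In contrast to the diffusion model, the 1/‖∇u‖ weight suppresses smoothing across strong flow gradients, which is what allows motion edges to be continued into the hole.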
I propose to minimize the following variational formulation:

φ(u) = min_u ∫_G (u(x) − u0(x))² H(ϕ(x) − θ) + λ β(∇I(x), Du(x)) (1 − H(ϕ(x) − θ)) dx ,

where Du stands for the Jacobian matrix of u and

g(s) = 1 / (1 + s²/µ²) ,
β(∇I(x), Du(x)) = g(|∇I(x)|) |Du(x)| + (1 − g(|∇I(x)|)) γ(∇I(x), Du(x)) ,
γ(∇I(x), Du(x)) = √( Σ_i ν² (n · ∇ui(x))² + (n⊥ · ∇ui(x))² ) ,
n = ∇I(x) / |∇I(x)| .

Figure 9.1 shows examples for the mapping g.

Figure 9.1: Examples for the mapping g, which controls the influence of the image gradient, for different values of the parameter µ.

The idea of the reconstruction term behind this formulation is the following: if the image gradient is low and, thus, g yields a value close to 1, we want to reconstruct the missing flow vectors isotropically by minimizing the Frobenius norm |Du| of the flow field gradient,

|Du| = √( |∇u1|² + |∇u2|² ) .    (9.14)

If the image gradient is high and, thus, g yields a value close to 0, we want to control the flow field's orientation along the image gradient. Hence, the gradient ∇ui, i ∈ {1, 2}, of the flow field components is separated into two parts by means of the scalar product with the normalized image gradient n and with the orthogonal direction n⊥, respectively. The smaller the parameter ν is chosen, the cheaper it is to assign the largest part of the flow gradient component to the first term, which means that the flow gradient is oriented along the image gradient. In this way, the anisotropy of the restoration process is controlled by the parameter ν. The value µ controls the strength of the image gradient necessary to influence the motion reconstruction process. It can approximately be understood as a weak "threshold" between high and low image gradients, so that only image gradients above roughly µ are taken into account. Hence, locally minimizing the prior β will favor sharp motion edges aligned with edges in the underlying image.
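To make the roles of g, n and ν concrete, the prior β can be evaluated pointwise as in the following sketch. The function name and the small eps guarding against a zero image gradient are assumptions of mine; the formulas otherwise follow the definitions above.

```python
import numpy as np

def beta_prior(grad_I, Du, mu=50.0, nu=0.1, eps=1e-8):
    """Evaluate the anisotropic prior beta(grad I, Du) at one pixel.
    grad_I: image gradient, shape (2,); Du: flow Jacobian, shape (2, 2),
    with row i equal to grad u_i.  Small nu makes flow edges along the
    image gradient cheap; mu is the soft threshold on |grad I|."""
    s = np.linalg.norm(grad_I)
    g = 1.0 / (1.0 + s**2 / mu**2)          # blending weight g(|grad I|)
    n = grad_I / (s + eps)                  # normalized image gradient
    n_perp = np.array([-n[1], n[0]])        # orthogonal direction
    frob = np.sqrt((Du**2).sum())           # isotropic part |Du|
    gamma = np.sqrt(sum(nu**2 * (n @ Du[i])**2 + (n_perp @ Du[i])**2
                        for i in range(2)))
    return g * frob + (1.0 - g) * gamma
```

For ν = 1 the anisotropic part γ coincides with |Du|, so β reduces to the plain TV prior, which matches the reduction proven below; for ν < 1 a flow gradient aligned with the image gradient is penalized less.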
Apart from edges, a usual TV prior is applied to the motion field. In particular for larger destroyed regions, this leads to an effective image based guidance in the reconstruction of motion edges. For ν values close to 1 there is no preference for any orientation of a motion edge, and we obtain the classical TV-type inpainting model on motion fields. This will be proven in the following. Due to orthonormality we know

n1² + n2² = (n1⊥)² + (n2⊥)² = 1 ,    (9.15)
(n1⊥ = −n2 ∧ n2⊥ = n1) ∨ (n1⊥ = n2 ∧ n2⊥ = −n1) .    (9.16)

Therefore, for ν = 1 we obtain

γ(∇I, Du) = √( Σ_i (n · ∇ui)² + (n⊥ · ∇ui)² )
          = √( Σ_i (n1² + (n1⊥)²) uix² + (n2² + (n2⊥)²) uiy² + 2 (n1 n2 + n1⊥ n2⊥) uix uiy )    (9.17)
          = √( Σ_i uix² + uiy² ) = |Du| ,

and, thus, β(∇I, Du) = |Du|. The functional derivative of the energy for i ∈ {1, 2} is computed as follows (with test function ϑ):

⟨∂ui γ(∇I, Du), ϑ⟩ = 1/(2 γ(∇I, Du)) ( 2 ν² (n · ∇ui)(n · ∇ϑ) + 2 (n⊥ · ∇ui)(n⊥ · ∇ϑ) )
                   = 1/γ(∇I, Du) ( ν² (n · ∇ui) n + (n⊥ · ∇ui) n⊥ ) · ∇ϑ ,

⟨∂ui β(∇I, Du), ϑ⟩ = g(|∇I|) ∇ui/|Du| · ∇ϑ + (1 − g(|∇I|)) ⟨∂ui γ(∇I, Du), ϑ⟩ ,

⟨∂ui φ(u), ϑ⟩ = ∫_{Ω,t} H(ϕ − θ)(ui − ui0) ϑ
              + λ (1 − H(ϕ − θ)) [ g(|∇I|) ∇ui/|Du| + (1 − g(|∇I|))/γ(∇I, Du) ( ν² (n · ∇ui) n + (n⊥ · ∇ui) n⊥ ) ] · ∇ϑ dx .    (9.18)

This weak formulation of the Euler-Lagrange equation can be used for the minimization of the energy by means of the finite element method, with gradient descent and Armijo step size control [5] to speed up the process.

9.5 Experiments and Results

9.5.1 Reconstruction of Artificial Motion Fields

To illustrate the image guided motion inpainting method, it is applied to the reconstruction of a corrupted rectangular and a corrupted circular motion field.
To this end, Figure 9.2 shows the color coded ground truth flow field on the left hand side (a), the red shape indicating the region to be reconstructed in the second image (b), the initialization of the image guided motion inpainting algorithm in the third image (c), and the result of the algorithm on the right hand side (d). The final results show that the reconstruction process was successful in retrieving the motion boundary along the edge of the circle. The following set of parameters was used: λ = 1, µ = 50 and ν = 0.1. For the circle 6200 iteration steps were necessary, for the rectangle 12200.

Figure 9.2: a) Ground truth flow field, b) Underlying image and corruption indicated by the red shape, c) Corrupted flow field, which serves as the initialization of the image guided motion inpainting algorithm, d) Restored flow field.

9.5.2 Reconstruction of Real World Motion Fields

After reconstructing artificial motion fields, I now turn to real world examples and reconstruct the motion field of a sequence taken from the Middlebury dataset [8]. Special attention will be paid to the effect of the parameters ν and µ on the reconstruction result. Figure 9.3 shows the Rubber Whale sequence with its corrupted regions marked by red shapes (a), the ground truth flow field (b), the result of the image guided reconstruction algorithm (c), and the angular error (d). The following set of parameters was used: λ = 1, µ = 1 and ν = 0.1. To investigate the effect of the parameter ν, let us take a closer look at two different regions in the scene: the upper left corner of the turning wheel on the left hand side and the flap of the box on the right hand side. At the upper side of the wheel the image contrast is low and, thus, makes reconstruction along image edges difficult.
Hence, the sensitivity of the method concerning the image gradient should be high, and the method's inclination to follow image edges should be large as well, which results in small values for µ and ν. At the flap of the box we have the opposite problem. The image contrast is large, but the motion boundary does, in fact, not follow the strong edge but the weaker edge above it. Hence, the inclination of the method to follow image edges should be reduced, which results in a higher value for ν. The effect of different parameter constellations for both regions is shown in Figure 9.4. The results demonstrate that for low ν values the wheel can be reconstructed quite well, but the motion field also follows the sharp edge of the box flap and yields errors in that part of the sequence. In contrast, for high ν values the box flap can be reconstructed well, but the wheel is reconstructed by a straight edge, which does not follow the contour of the wheel.

Figure 9.3: a) Original Rubber Whale frame, b) Ground truth flow field, c) Restored flow field, d) Angular error.

9.5.3 Comparison to Diffusion and TV Inpainting

To finally compare the image guided motion inpainting algorithm to the diffusion and TV inpainting methods, I apply them to the corrupted Marble sequence. Figure 9.5 shows the original corrupted sequence and the results of the diffusion based, the TV based and the image based motion inpainting methods. The results demonstrate that diffusion based motion inpainting is not able to reconstruct flow edges. In contrast, by means of TV motion inpainting flow edges can be reconstructed.
However, the lower right corner of the central marble block cannot be reconstructed without information drawn from the original image, because the exact course of the edges near the junction is unclear. Image based motion inpainting uses the image gradient information to correctly reconstruct the motion boundary of the central marble block as well. Here the following set of parameters was used: λ = 1, µ = 50 and ν = 0.1.

Figure 9.4: Upper row: results for different values of ν (ν = 0.01, 0.1, 0.5, 1.0) for µ = 50; lower row: results for different values of µ (µ = 1, 10, 50, 100) for ν = 0.1.

9.5.4 Reconstruction Based on Confidence Measures

Finally, the most effective confidence measure, the nonlinear statistical confidence measure, and the image guided motion inpainting approach are combined to improve given optical flow fields. The results of the nonlinear confidence measure (pValNonlin) are used as the confidence function ϕ, which indicates the reliability of the current flow vector. Figure 9.6 shows the original flow fields and their reconstructions. For the Rubber Whale sequence and the structure tensor flow field, a threshold of θ = 0.03 was applied to the pValNonlin confidence measure shown in a). By means of image based inpainting I was able to reduce the angular error from 11.18 ± 23.32 to 8.31 ± 16.18. For the Marble sequence the flow field was computed by Farnebäck's method. In this case, a threshold of θ = 0.19 was applied to the pValNonlin confidence measure. The image guided inpainting approach then reduced the angular error from 2.13 ± 3.21 to 1.93 ± 2.54. Thus, I have shown how optical flow fields can be refined automatically to obtain lower average errors.
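The two ingredients of this refinement loop, the angular error used for evaluation (in the standard spatio-temporal vector convention of Barron et al.) and the thresholding of the confidence function ϕ into an inpainting mask, can be sketched as follows (function names are mine):

```python
import numpy as np

def angular_error(u, v):
    """Average angular error (degrees) between flow fields u, v of shape
    (H, W, 2), computed between the vectors (u1, u2, 1) and (v1, v2, 1)."""
    num = u[..., 0] * v[..., 0] + u[..., 1] * v[..., 1] + 1.0
    den = np.sqrt((u[..., 0]**2 + u[..., 1]**2 + 1.0) *
                  (v[..., 0]**2 + v[..., 1]**2 + 1.0))
    ang = np.degrees(np.arccos(np.clip(num / den, -1.0, 1.0)))
    return ang.mean(), ang.std()

def reliability_mask(confidence, theta):
    """Flow vectors with confidence phi(x) >= theta are kept (True);
    the rest is declared corrupted and handed to the inpainting stage."""
    return confidence >= theta
```

The reported error pairs (e.g. 11.18 ± 23.32) are exactly the mean and standard deviation returned by such a routine over the flow field.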
Figure 9.5: Comparison of the proposed image guided inpainting algorithm to diffusion and TV inpainting; the numbers indicate the average angular error within the corrupted regions after reconstruction; a) Original Marble sequence with corruptions indicated by red rectangles, b) Reconstruction result of diffusion based motion inpainting (2.00 ± 3.87), c) Reconstruction result of TV based motion inpainting (0.93 ± 3.75), d) Reconstruction result of image guided motion inpainting (0.39 ± 1.38).

Figure 9.6: Reconstruction of the Rubber Whale and Marble flow fields computed by the structure tensor method and Farnebäck's method, respectively. a) Result of the nonlinear confidence measure (pValNonlin), b) Thresholded confidence, c) Original flow field (cropped to maximum flow length 4), d) Reconstruction of the original flow field after image guided motion inpainting.

9.6 Summary and Conclusion

Given an image sequence and an extracted underlying motion field together with a local measure of confidence for the motion estimation, I have proposed a variational approach for the restoration of the motion field. This restoration is vital for a number of applications requiring dense motion fields. Based on a confidence measure, regions of corrupted motion can be detected. The underlying image data is still available and reliable, and I make use of this information to improve the restoration of the motion field. The approach is based on an anisotropic TV-type functional, where the anisotropy takes into account edge information extracted from the underlying image data. The approach has been applied to test data and to two different real world optical flow problems.
The results are compared to diffusion based vector field inpainting and TV-type inpainting. I demonstrate that inpainting guided by the underlying intensity data outperforms purely flow driven approaches. I consider this a feasibility study for the coupling of motion field and image sequence data in variational inpainting approaches.

Chapter 10 Conclusions and Perspectives

10.1 Summary and Conclusion

In this thesis I have addressed the analysis and restoration of arbitrarily computed optical flow fields. Statements on the accuracy of flow vectors always require a given error definition. Therefore, Chapter 4 was dedicated to the analysis of known error measures for optical flow fields. Due to shortcomings of previous methods, I have derived a joint distribution of optical flow estimates, ground truth flow vectors and gray value neighborhoods, and obtained interesting statements from this distribution by means of marginalization and conditioning. For example, I observed that all estimators except the Nir method have difficulties with large displacements, and I obtained principal components showing typical gray value structures in case of very high or very low endpoint errors. I also proposed a statistically motivated scalar indicator suited for the ranking of different estimators. I applied the evaluation method to five known optical flow estimators. Based on this statistical evaluation and systematic analysis, I hope to further improve optical flow estimators and their analysis methods. Quality estimates for optical flow vectors can be obtained by means of confidence measures. In Chapter 5 I compiled the most important previously used confidence measures and classified them according to the intrinsic dimension of the image sequence they examine. In the end I was able to assign the most effective measure to each intrinsic dimension with respect to noise robustness.
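A common way to classify image regions by intrinsic dimension is an eigenvalue analysis of the structure tensor; the sketch below illustrates this idea with a plain 2×2 spatial tensor and an ad-hoc threshold (the measures compiled in Chapter 5 are more refined; names and the threshold are mine):

```python
import numpy as np

def structure_tensor(patch):
    """2x2 spatial structure tensor of an image patch, averaged over the
    patch (finite-difference gradients)."""
    gy, gx = np.gradient(patch.astype(float))
    return np.array([[np.mean(gx * gx), np.mean(gx * gy)],
                     [np.mean(gx * gy), np.mean(gy * gy)]])

def intrinsic_dimension(J, tau=1e-3):
    """Classify the intrinsic dimension from the eigenvalues of J:
    0: homogeneous region, 1: single orientation (aperture problem),
    2: enough structure for a unique local flow estimate."""
    ev = np.sort(np.linalg.eigvalsh(J))[::-1]      # descending eigenvalues
    return int(np.sum(ev > tau * max(ev[0], tau)))  # count significant ones
```

Homogeneous patches yield dimension 0, a single oriented structure yields 1, and two independent orientations yield 2.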
However, the measures only yielded acceptable results on artificial sequences, not on real-world sequences. Furthermore, they derive statements on the feasibility of accurate flow computation from the intrinsic dimension of the image sequence only. Thus, I found that none of these measures qualifies as a confidence measure, since they do not even consider the computed flow field. We therefore suggested distinguishing between situation and confidence measures. To account for this shortcoming, in Chapter 6 I proposed to instead analyze the intrinsic dimension of the energy surface after optimization in the flow computation process. The resulting measure also detects outliers and can either be employed as a confidence measure, in case the energy surface is obtained from a computed flow field, or as a situation measure, in case the energy surface is based on a zero flow field. In the latter case, the measure examines the feasibility of an accurate optical flow computation and yielded noise resistant detection results far above previously used situation measures for artificial and real-world sequences. The additional application of a simple motion inpainting algorithm led to impressive results, as angular error reductions of up to 38% were feasible by confidence estimation and subsequent motion restoration. Furthermore, I concluded for the employed test sequences that applying the suggested postprocessing method to sparsified flow fields calculated with local or global methods yielded better results than could be achieved by exploiting the filling-in effect of the original global methods. Hence, in contrast to the accepted opinion, global methods are not always preferable to local methods if a dense flow field is required, because motion inpainting based only on a set of reliable flow vectors led to superior results.
The confidence measures presented in Chapter 7 contribute to the automatic quality evaluation of optical flow fields. They are statistically motivated, since they are based on probability distributions learned from sample data. One of the three measures relies on principal component analysis models; the other two are formulated as best linear unbiased estimators based on linear or nonlinear data vectors. In this way, I was able to define confidence measures which take into account the flow field, can be applied to arbitrarily computed motion fields, and justifiably bear the notion "confidence measure" in a statistical sense. Slight changes in the algorithm even allow for applications to sparse flow fields, which often occur, e.g., in traffic scenes. Results for locally and globally computed flow fields on ground truth test sequences based on different error measures showed the superiority of the proposed method compared to previously employed confidence measures. Error reductions of up to 50% for the whole flow field were feasible by removing only 10% of the flow vectors indicated by the proposed nonlinear confidence measure. As every confidence measure contains some knowledge on correct flow fields, it usually contains the basic idea for a new optical flow estimator as well. Based on the linear subspace projection measure proposed in Chapter 7, I suggested a novel, local approach to optical flow estimation in Chapter 8, which essentially extends the basic structure tensor method by incorporating prior knowledge by means of learned motion models. Advantages of this approach, besides its high accuracy, are that it is simple to implement, easily parallelizable, and adaptable to specific types of motion in special applications. On the Yosemite sequence, it yields results comparable to Farnebäck's method, but with much less implementation effort, and obtains angular errors below those of the 3D combined local global (CLG) method by Bruhn et al.
[28] and the learning based approach by Roth and Black [85]. The last chapter, Chapter 9, was dedicated to the restoration of optical flow fields. Methods from image inpainting have been transferred to the reconstruction of motion fields. A simple approach is the diffusion based inpainting method, which is not able to continue motion boundaries. To solve this problem, the total variation based inpainting approach was employed. However, there are situations where the course of a motion boundary cannot be unambiguously deduced from the preserved neighboring motion field. To this end, the gradient of the image sequence was used to guide the reconstruction process. The image based motion inpainting approach allows for the definition of the minimum image edge strength and of the strength of the anisotropy. The results showed that by means of image guided inpainting, motion boundaries which could not be retrieved by TV inpainting were correctly restored. Finally, the combination of the nonlinear statistical confidence measure and the image guided inpainting method yielded an automatic optical flow field refinement routine, which significantly reduced the angular error of the test flow fields. Our results indicate that by identifying a set of reliable flow vectors and applying subsequent mathematical motion inpainting approaches, flow fields of lower angular errors can be obtained than by combining all model constraints in one functional.

10.2 Future Research

For future research, it would be interesting to further improve the proposed confidence measures. To this end, other motion models or different learning methods could be employed, e.g. advanced machine learning techniques. In this way, the motion models could also account for rare flow field constellations, which are underrepresented in the training data, and for improved representations of motion boundaries.
Furthermore, it could be rewarding to directly learn a confidence estimator based on sample data without having to define or learn motion models at all. Concerning the reconstruction of optical flow fields, a topic for future research would be the refinement of the image guided inpainting approach. Robustness and reliability of the approach might be improved based on a fully joint approach, where the motion field and the image sequence are jointly restored. A restoration in space-time would be promising as well. Furthermore, as shown in the results section of Chapter 9, one set of parameters is sometimes not suitable for the whole optical flow field. Hence, it would be beneficial to adapt these parameters to the requirements of the current situation. Furthermore, intrinsic dimension information of the energy surface could be used in optical flow estimators to define new regularization methods. For example, in case of flat energy surfaces, which correspond to intrinsic dimension zero, a homogeneous regularization model would be suitable, as many different displacement vectors lead to similar energies, which indicates low certainty. In contrast, for energy surfaces of intrinsic dimension one, smoothing of the flow field should only be applied in the direction along the aperture problem. In case of intrinsic dimension two, which indicates energy surfaces having a unique minimum, the computed flow vectors should have larger influence on their neighbors due to high certainty. Hence, regularization models or strengths could be adapted to the intrinsic dimension of the energy surface. Another rewarding issue would be a more thorough investigation of the model based global optical flow method proposed in Chapter 8. Here, due to memory complexity, merely small patch sizes could be tested, yielding only intermediate results. As already seen in the local approach, larger patch sizes might significantly reduce the flow field error.
In Chapter 5 we have seen that most of the situation measures yielding good or intermediate results for artificial test sequences fail for real-world applications. Hence, more sophisticated measures that are especially noise resistant would be another topic for future research.

Bibliography

[1] L. Alvarez, R. Deriche, T. Papadopoulo, and J. Sánchez. Symmetrical dense optical flow estimation with occlusion detection. In European Conference on Computer Vision (ECCV), pages 721–735, 2002.
[2] L. Ambrosio and S. Masnou. A direct variational approach to a problem arising in image reconstruction. Interfaces and Free Boundaries, 5:63–81, 2003.
[3] P. Anandan. A computational framework and an algorithm for the measurement of visual motion. International Journal of Computer Vision, 2:283–319, 1989.
[4] B. Andres, C. Kondermann, D. Kondermann, U. Köthe, F. Hamprecht, and C. Garbe. On errors-in-variables regression with arbitrary covariance and its application to optical flow estimation. In Proceedings of the International Conference on Computer Vision and Pattern Recognition (CVPR), pages 1–6, 2008.
[5] L. Armijo. Minimization of functions having Lipschitz continuous first derivatives. Pacific Journal of Mathematics, 16(1):1–3, 1966.
[6] A. Bab-Hadiashar and D. Suter. Robust optic flow computation. International Journal of Computer Vision (IJCV), 29(1):59–77, 1998.
[7] A. Bainbridge-Smith and R.G. Lane. Measuring confidence in optical flow estimation. IEEE Electronics Letters, 32(10):882–884, 1996.
[8] S. Baker, S. Roth, D. Scharstein, M. Black, J. Lewis, and R. Szeliski. A database and evaluation methodology for optical flow. In Proceedings of the International Conference on Computer Vision (ICCV), pages 1–8, 2007.
[9] C. Ballester, M. Bertalmio, V. Caselles, G. Sapiro, and J. Verdera. Filling-in by joint interpolation of vector fields and gray levels. IEEE Transactions on Image Processing, 10(8):1200–1211, 2001.
[10] J.L. Barron, D.J. Fleet, and S.
Beauchemin. Performance of optical flow techniques. International Journal of Computer Vision, 12(1):43–77, 1994.
[11] E. Barth. Bewegung als Intrinsische Geometrie von Bildfolgen. In Proceedings of the German Association for Pattern Recognition (DAGM), 1999.
[12] E. Barth. The minors of the structure tensor. In Proceedings of the German Association for Pattern Recognition (DAGM), 2000.
[13] E. Barth, I. Stuke, T. Aach, and C. Mota. Spatio-temporal motion estimation for transparency and occlusions. In Proceedings of the International Conference on Image Processing (ICIP), volume 3, pages 69–72, 2003.
[14] B. Berkels, C. Kondermann, C. Garbe, and M. Rumpf. Reconstructing optical flow fields by motion inpainting. In Proceedings of the Conference on Energy Minimization Methods in Computer Vision and Pattern Recognition (EMMCVPR), 2009.
[15] M. Bertalmio, G. Sapiro, V. Caselles, and C. Ballester. Image inpainting. In Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques, pages 417–424. ACM Press/Addison-Wesley Publishing Co., 2000.
[16] J. Bigün, G.H. Granlund, and J. Wiklund. Multidimensional orientation estimation with applications to texture analysis and optical flow. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(8):775–790, 1991.
[17] C. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, 1995.
[18] M. Black and A. Jepson. Estimating multiple independent motions in segmented images using parametric models with local deformations. In IEEE Workshop on Motion of Non-Rigid and Articulated Objects, 1994.
[19] M. Black and Y. Yacoob. Tracking and recognizing rigid and non-rigid facial motions using local parametric models of image motion. In Proceedings of the International Conference on Computer Vision (ICCV), 1995.
[20] M.J. Black, D.J. Fleet, and Y. Yacoob. Robustly estimating changes in image appearance. Computer Vision and Image Understanding, 78:8–31, 2000.
[21] M.J.
Black and A.D. Jepson. Estimating optical flow in segmented images using variable-order parametric models with local deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(10):972–986, 1996.
[22] M.J. Black and P. Anandan. The robust estimation of multiple motions: Parametric and piecewise-smooth flow fields. Computer Vision and Image Understanding, 63(1):75–104, January 1996.
[23] M.J. Black, Y. Yacoob, A. Jepson, and D. Fleet. Learning parameterized models of image motion. In Proceedings of the International Conference on Computer Vision and Pattern Recognition (CVPR), 1997.
[24] M. Bleyer, M. Gelautz, and C. Rhemann. Segmentation-based motion with occlusions using graph-cut optimization. In Symposium of the German Association for Pattern Recognition (DAGM), 2006.
[25] T. Brox, A. Bruhn, N. Papenberg, and J. Weickert. High accuracy optical flow estimation based on a theory for warping. In Proceedings of the European Conference on Computer Vision (ECCV), pages 25–36, 2004.
[26] A. Bruhn and J. Weickert. A Confidence Measure for Variational Optic Flow Methods, pages 283–298. Springer Netherlands, 2006.
[27] A. Bruhn, J. Weickert, C. Feddern, T. Kohlberger, and C. Schnörr. Real-time optic flow computation with variational methods. IEEE Transactions on Image Processing, 14(5):608–615, 2005.
[28] A. Bruhn, J. Weickert, and C. Schnörr. Lucas/Kanade meets Horn/Schunck: Combining local and global optic flow methods. International Journal of Computer Vision, 61(3):211–231, 2005.
[29] T.F. Chan and J. Shen. Mathematical models for local nontexture inpaintings. SIAM Journal on Applied Mathematics, 62:1019–1043, 2001.
[30] T.F. Chan and J. Shen. Non-texture inpainting by curvature-driven diffusions. Journal of Visual Communication and Image Representation, 12:436–449, 2001.
[31] P. Charbonnier, L. Blanc-Féraud, G. Aubert, and M. Barlaud. Two deterministic half-quadratic regularization algorithms for computed imaging.
In Proceedings of the International Conference on Image Processing (ICIP), pages 168–172. IEEE Computer Society, 1994.
[32] P. Comon. Independent component analysis, a new concept? Signal Processing, 36:287–314, 1994.
[33] F. De la Torre and M.J. Black. A framework for robust subspace learning. International Journal of Computer Vision, 54(1-3):117–142, 2003.
[34] A.P. Duchon, W.H. Warren, and L.P. Kaelbling. Ecological robotics. Adaptive Behavior, 6, 1994.
[35] G. Farnebäck. http://lsvn.lysator.liu.se/svnroot/spatial_domain_toolbox/trunk.
[36] G. Farnebäck. Fast and accurate motion estimation using orientation tensors and parametric motion models. In Proceedings of the International Conference on Pattern Recognition (ICPR), volume 1, pages 135–139, Barcelona, Spain, September 2000.
[37] G. Farnebäck. Very high accuracy velocity estimation using orientation tensors, parametric motion, and simultaneous segmentation of the motion field. In Proceedings of the International Conference on Computer Vision (ICCV), volume I, pages 171–177, Vancouver, Canada, July 2001.
[38] M. Felsberg, S. Kalkan, and N. Krüger. Continuous dimensionality characterization of image structures. Journal of Image and Vision Computing, 2008.
[39] C. Fennema and W. Thompson. Velocity determination in scenes containing several moving objects. Computer Graphics and Image Processing, 9:301–315, 1979.
[40] C. Fermüller, D. Shulman, and Y. Aloimonos. The statistics of optical flow. Journal of Computer Vision and Image Understanding, 82(1):1–32, 2001.
[41] R.A. Fisher. Statistical Methods for Research Workers. Oliver and Boyd, 1925.
[42] D. Fleet and A. Jepson. Computation of component image velocity from local phase information. International Journal of Computer Vision, 5:77–104, 1990.
[43] B. Furht, J. Greenberg, and R. Westwater. Motion Estimation Algorithms for Video Compression. Springer-Verlag GmbH, 1996.
[44] B. Galvin, B. McCane, K. Novins, D. Mason, and S. Mills.
Recovering motion fields: An analysis of eight optical flow algorithms. In Proceedings of the 1998 British Machine Vision Conference (BMVC), 1998.
[45] C.S. Garbe, U. Schimpf, and B. Jähne. A surface renewal model to analyze infrared image sequences of the ocean surface for the study of air-sea heat and gas exchange. Journal of Geophysical Research, 109:1–18, 2004.
[46] C.S. Garbe, H. Spies, and B. Jähne. Estimation of surface flow and net heat flux from infrared image sequences. Journal of Mathematical Imaging and Vision, 19(3):159–174, 2003.
[47] A. Giachetti, M. Campani, and V. Torre. The use of optical flow for road navigation. IEEE Transactions on Robotics and Automation, 14:34–48, 1998.
[48] D. Goel and T. Chen. Real-time pedestrian detection using eigenflow. In Proceedings of the International Conference on Image Processing (ICIP), volume 3, pages 229–232, 2007.
[49] N. Hata, A. Nabavi, W. Wells, S. Warfield, R. Kikinis, P. Black, and F. Jolesz. Three-dimensional optical flow method for measurement of volumetric brain deformation from intraoperative magnetic resonance images. Journal of Computer Assisted Tomography, 24:531–538, 2000.
[50] H. Haussecker and D. Fleet. Computing optical flow with physical models of brightness variation. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 23(6):661–673, 2001.
[51] H. Haussecker and H. Spies. Motion. In B. Jähne, H. Haussecker, and P. Geissler, editors, Handbook of Computer Vision and Applications, volume 2, chapter 13, pages 336–338. Academic Press, 1999.
[52] D. Heeger. Model for the extraction of image flow. Journal of the Optical Society of America, 4(8):1455–1471, 1987.
[53] B. Horn and B. Schunck. Determining optical flow. Artificial Intelligence, 17:185–204, 1981.
[54] M. Irani and P. Anandan. Robust multi-sensor image alignment. In Proceedings of the International Conference on Computer Vision (ICCV), pages 959–966, 1998.
[55] J. Scholz, T. Wiersbinski, P. Ruhnau, D. Kondermann, C.S. Garbe, R. Hain, and V. Beushausen. Double-pulse planar-LIF investigations using fluorescence motion analysis for mixture formation investigation. Experiments in Fluids, 45(4):583–593, 2008.
[56] B. Jähne. Digital Image Processing. Springer Verlag, 2002.
[57] I.T. Jolliffe. Principal Component Analysis. Springer, 1986.
[58] S.X. Ju, M.J. Black, and A.D. Jepson. Skin and bones: Multi-layer, locally affine, optical flow and regularization with transparency. In Proceedings of the International Conference on Computer Vision and Pattern Recognition (CVPR), 1996.
[59] S. Kalkan, D. Calow, M. Felsberg, F. Wörgötter, M. Lappe, and N. Krüger. Optic flow statistics and intrinsic dimensionality. In Proceedings of Brain Inspired Cognitive Systems, 2004.
[60] S. Kalkan, D. Calow, and F. Wörgötter. Local image structures and optic flow estimation. In Network: Computation in Neural Systems, 2005.
[61] K.R. Koch. Parameterschätzung und Hypothesentests in linearen Modellen. Ferd. Dümmlers Verlag, Bonn, 2004.
[62] C. Kondermann, D. Kondermann, and C. Garbe. Postprocessing of optical flows via surface measures and motion inpainting. In Pattern Recognition, volume 5096 of LNCS, pages 355–364. Springer, 2008.
[63] C. Kondermann, D. Kondermann, and C. Garbe. The evaluation of optical flow estimators. Submitted to Computer Vision and Image Understanding, 2009.
[64] C. Kondermann, D. Kondermann, and C. Garbe. Local optical flow estimation based on learned motion models. Submitted to IEEE Transactions on Image Processing, 2009.
[65] C. Kondermann, D. Kondermann, B. Jähne, and C. Garbe. An adaptive confidence measure for optical flows based on linear subspace projections. In Pattern Recognition, volume 4713 of LNCS, pages 132–141. Springer, 2007.
[66] C. Kondermann, R. Mester, and C. Garbe. A statistical confidence measure for optical flows.
In Proceedings of the European Conference on Computer Vision (ECCV), pages 290–301, 2008.
[67] B. Kröse, A. Dev, and F. Groen. Heading direction of a mobile robot from the optical flow. Image and Vision Computing, 18:415–424, 2000.
[68] S. Lee, S. Park, N. Cho, Y. Kanatsugu, and J. Park. Occlusion detection and stereo matching in a stochastic method. In Proceedings of the International Conference on Image Processing (ICIP), volume 1, pages 377–380, 2003.
[69] K. Lim, M. Chong, and A. Das. A new MRF model for robust estimate of occlusion and motion vector fields. In Proceedings of the International Conference on Image Processing (ICIP), volume 2, page 843, 1997.
[70] B. Lucas and T. Kanade. An iterative image registration technique with an application to stereo vision. In Proceedings of the 1981 DARPA Image Understanding Workshop, pages 121–130, 1981.
[71] A. Malla and R. Green. Real-time adaptation to unconstrained lighting for occlusion detection. In Image and Vision Computing New Zealand, 2005.
[72] S. Masnou and J. Morel. Level lines based disocclusion. In Proceedings of the International Conference on Image Processing (ICIP), volume 3, pages 259–263, 1998.
[73] B. McCane, K. Novins, D. Crannitch, and B. Galvin. On benchmarking optical flow. Computer Vision and Image Understanding, 84(1):126–143, 2001.
[74] A. Mitiche and P. Bouthemy. Computation and analysis of image motion: A synopsis of current problems and methods. International Journal of Computer Vision (IJCV), 19(1):29–55, 1996.
[75] C. Mota, I. Stuke, and E. Barth. Analytical solutions for multiple motions. In Proceedings of the International Conference on Image Processing (ICIP), 2001.
[76] F. Nejadasl, B. Gorte, and S. Hoogendoorn. Optical flow based vehicle tracking strengthened by statistical decisions. ISPRS Journal of Photogrammetry and Remote Sensing, 61:159–169, 2006.
[77] C. Nieuwenhuis and M. Yan. Knowledge based image enhancement using neural networks.
In Proceedings of the International Conference on Pattern Recognition (ICPR), pages 814–817, 2006.
[78] T. Nir, A.M. Bruckstein, and R. Kimmel. Over-parameterized variational optical flow. International Journal of Computer Vision, 76(2):205–216, June 2006.
[79] K. Okamoto, S. Nishio, T. Kobayashi, T. Saga, and K. Takehara. Evaluation of the 3D-PIV standard images (PIV-STD project). Journal of Visualization, 3(2):115–124, 2000.
[80] M. Otte and H. Nagel. Optical flow estimation: advances and comparisons. In Proceedings of the European Conference on Computer Vision (ECCV), pages 51–60, 1994.
[81] N. Papenberg, A. Bruhn, T. Brox, S. Didas, and J. Weickert. Highly accurate optic flow computation with theoretically justified warping. International Journal of Computer Vision, 67(2):141–158, 2006.
[82] E. Parzen. On the estimation of probability density functions. Annals of Mathematical Statistics, 33:1065–1076, 1962.
[83] M. Raffel, C. Willert, and J. Kompenhans. Postprocessing of PIV data. In Particle Image Velocimetry, chapter 6. Springer, 1998.
[84] A. Rosenberg and M. Werman. Representing local motion as a probability distribution matrix applied to object tracking. In Proceedings of the International Conference on Computer Vision and Pattern Recognition (CVPR), pages 654–659, 1997.
[85] S. Roth and M.J. Black. On the spatial statistics of optical flow. In Proceedings of the International Conference on Computer Vision (ICCV), volume 1, pages 42–49, 2005.
[86] P.J. Rousseeuw. Least median of squares regression. Journal of the American Statistical Association, (79):871–880, 1984.
[87] P.J. Rousseeuw and A.M. Leroy. Robust Regression and Outlier Detection. John Wiley, 1987.
[88] L.I. Rudin, S. Osher, and E. Fatemi. Nonlinear total variation based noise removal algorithms. Physica D, 60:259–268, 1992.
[89] H. Scharr. Optimal filters for extended optical flow. In Complex Motion, volume 3417 of Lecture Notes in Computer Science. Springer, 2004.
[90] C.E. Shannon.
A mathematical theory of communication. Bell System Technical Journal, 27:379–423,623–656, 1948. [91] A. Singh. Motion-compensated enhancement of medical image sequences. In Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, volume 1660, pages 288–298, 1992. [92] H. Spies and C. Garbe. Dense parameter fields from total least squares. In Proceedings of the German Association for Pattern Recognition (DAGM), 2002. [93] M. Stanislas, Okamoto K., C. Kaehler, and J. Westerveel. Main results of the second international PIV challenge. volume 39, pages 170–191, 2005. [94] C. Stiller and J. Konrad. Estimating motion in image sequences. IEEE Signal Processing Magazine, 16(4):70–91, 1999. [95] R. Strzodka and C. Garbe. Real-time motion estimation and visualization on graphics cards. In Proceedings of the conference on visualization, pages 545–552, 2004. [96] D. Sun, S. Roth, J.P. Lewis, and M.J. Black. Learning optical flow. In Proceedings of the European Conference on Computer Vision (ECCV), 2008. [97] S. Uras, F. Girosi, A. Verri, and V. Torre. A computational approach to motion perception. Journal of Biological Cybernetics, 60:79–97, 1988. [98] A. Waxman, J. Wu, and F. Bergholm. Convected activation profiles and receptive fields for real time measurement of short range visual motion. In Proceedings of the International Conference on Computer Vision and Pattern Recognition (CVPR), pages 717–723, 1988. [99] J. Weickert and C. Schnörr. A theoretical framework for convex regularizers in PDE-based computation of image motion. International Journal of Computer Vision, 45(3):245–264, 2001. [100] Y. Yacoob and L. Davis. Learned temporal models of image motion. In International Conference on Computer Vision, Proceedings, 1998. 154 Bibliography [101] C. Zetzsche and E. Barth. Fundamental limits of linear filters in the visual processing of two dimensional signals. Vision Research, 30(7):1111–1117, 1990. [102] C. Zitnick and T. Kanade. 
A cooperative algorithm for stereo matching and occlusion detection. IEEE Journal of Pattern Analysis and Machine Intelligence (PAMI), 22(7):675–684, 2000. 155

Download PDF

advertisement