Linköping Studies in Science and Technology
Thesis No. 755

Spatial Domain Methods for Orientation and Velocity Estimation

Gunnar Farnebäck

LIU-TEK-LIC-1999:13
Department of Electrical Engineering
Linköpings universitet, SE-581 83 Linköping, Sweden
Linköping, March 1999

© 1999 Gunnar Farnebäck
ISBN 91-7219-441-3
ISSN 0280-7971

To Lisa

Abstract

In this thesis, novel methods for estimation of orientation and velocity are presented. The methods are designed exclusively in the spatial domain. Two important concepts in the use of the spatial domain for signal processing are projections into subspaces, e.g. the subspace of second degree polynomials, and representations by frames, e.g. wavelets. It is shown how these concepts can be unified in a least squares framework for representation of finite dimensional vectors by bases, frames, subspace bases, and subspace frames. This framework is used to give a new derivation of Normalized Convolution, a method for signal analysis that takes uncertainty in signal values into account and also allows for spatial localization of the analysis functions. With the help of Normalized Convolution, a novel method for orientation estimation is developed. The method is based on projection onto second degree polynomials and the estimates are represented by orientation tensors. A new concept for orientation representation, orientation functionals, is introduced and it is shown that orientation tensors can be considered a special case of this representation. A very efficient implementation of the estimation method is presented and by evaluation on a test sequence it is demonstrated that the method performs excellently. Considering an image sequence as a spatiotemporal volume, velocity can be estimated from the orientations present in the volume.
Two novel methods for velocity estimation are presented, with the common idea of combining the orientation tensors over some region to estimate the velocity field according to a motion model, e.g. affine motion. The first method involves a simultaneous segmentation and velocity estimation algorithm to obtain appropriate regions. The second method is designed for computational efficiency and uses local neighborhoods instead of trying to obtain regions with coherent motion. By evaluation on the Yosemite sequence, it is shown that both methods give substantially more accurate results than previously published methods.

Acknowledgements

This thesis could never have been written without the contributions from a large number of people. In particular I want to thank the following persons:

Lisa, for love and patience.

All the people at the Computer Vision Laboratory, for providing a stimulating research environment and good friendship.

Professor Gösta Granlund, the head of the research group and my supervisor, for showing confidence in my work and letting me pursue my research ideas.

Dr. Klas Nordberg, for taking an active interest in my research from the first day and contributing with countless ideas and discussions.

Associate Professor Hans Knutsson, for a never-ending stream of ideas, some of which I have even been able to understand and make use of.

Björn Johansson and visiting Professor Todd Reed, for constructive criticism on the manuscript and many helpful suggestions.

Drs. Peter Hackman, Arne Enqvist, and Thomas Karlsson at the Department of Mathematics, for skillful teaching of undergraduate mathematics and for consultations on some of the mathematical details in the thesis.

Professor Lars Eldén, also at the Department of Mathematics, for help with the numerical aspects of the thesis.

Dr. Jörgen Karlholm, for much inspiration and a great knowledge of the relevant literature.

Johan Wiklund, for keeping the computers happy.
The Knut and Alice Wallenberg foundation, for funding the research within the WITAS project.

Contents

1 Introduction
  1.1 Motivation
  1.2 Organization
  1.3 Contributions
  1.4 Notation
2 A Unified Framework for Bases, Frames, Subspace Bases, and Subspace Frames
  2.1 Introduction
  2.2 Preliminaries
    2.2.1 Notation
    2.2.2 The Linear Equation System
    2.2.3 The Linear Least Squares Problem
    2.2.4 The Minimum Norm Problem
    2.2.5 The Singular Value Decomposition
    2.2.6 The Pseudo-Inverse
    2.2.7 The General Linear Least Squares Problem
    2.2.8 Numerical Aspects
  2.3 Representation by Sets of Vectors
    2.3.1 Notation
    2.3.2 Definitions
    2.3.3 Dual Vector Sets
    2.3.4 Representation by a Basis
    2.3.5 Representation by a Frame
    2.3.6 Representation by a Subspace Basis
    2.3.7 Representation by a Subspace Frame
    2.3.8 The Double Dual
    2.3.9 A Note on Notation
  2.4 Weighted Norms
    2.4.1 Notation
    2.4.2 The Weighted General Linear Least Squares Problem
    2.4.3 Representation by Vector Sets
    2.4.4 Dual Vector Sets
  2.5 Weighted Seminorms
    2.5.1 The Seminorm Weighted General Linear Least Squares Problem
    2.5.2 Representation by Vector Sets and Dual Vector Sets
3 Normalized Convolution
  3.1 Introduction
  3.2 Definition of Normalized Convolution
    3.2.1 Signal and Certainty
    3.2.2 Basis Functions and Applicability
    3.2.3 Definition
    3.2.4 Comments on the Definition
  3.3 Implementational Issues
  3.4 Example
  3.5 Output Certainty
  3.6 Normalized Differential Convolution
  3.7 Reduction to Ordinary Convolution
  3.8 Application Examples
    3.8.1 Normalized Averaging
    3.8.2 The Cubic Facet Model
  3.9 Choosing the Applicability
  3.10 Further Generalizations of Normalized Convolution
4 Orientation Estimation
  4.1 Introduction
  4.2 The Orientation Tensor
    4.2.1 Representation of Orientation for Simple Signals
    4.2.2 Estimation
    4.2.3 Interpretation for Non-Simple Signals
  4.3 Orientation Functionals
  4.4 Signal Model
  4.5 Construction of the Orientation Tensor
    4.5.1 Linear Neighborhoods
    4.5.2 Quadratic Neighborhoods
    4.5.3 General Neighborhoods
  4.6 Properties of the Estimated Tensor
  4.7 Fast Implementation
    4.7.1 Equivalent Correlation Kernels
    4.7.2 Cartesian Separability
  4.8 Computational Complexity
  4.9 Relation to First and Second Derivatives
  4.10 Evaluation
    4.10.1 The Importance of Isotropy
    4.10.2 Gaussian Applicabilities
    4.10.3 Choosing γ
    4.10.4 Best Results
  4.11 Possible Improvements
    4.11.1 Multiple Scales
    4.11.2 Different Radial Functions
    4.11.3 Additional Basis Functions
5 Velocity Estimation
  5.1 Introduction
  5.2 From Orientation to Motion
  5.3 Estimating a Parameterized Velocity Field
    5.3.1 Motion Models
    5.3.2 Cost Functions
    5.3.3 Parameter Estimation
  5.4 Simultaneous Segmentation and Velocity Estimation
    5.4.1 The Competitive Algorithm
    5.4.2 Candidate Regions
    5.4.3 Segmentation Algorithm
  5.5 A Fast Velocity Estimation Algorithm
  5.6 Evaluation
    5.6.1 Implementation and Performance
    5.6.2 Results for the Yosemite Sequence
    5.6.3 Results for the Diverging Tree Sequence
6 Future Research Directions
  6.1 Phase Functionals
  6.2 Adaptive Filtering
  6.3 Irregular Sampling
Appendices
  A A Matrix Inversion Lemma
  B Cartesian Separable and Isotropic Functions
  C Correlator Structure for Separable Normalized Convolution
  D Angular RMS Error
  E Removing the Isotropic Part of a 3D Tensor

Chapter 1

Introduction

1.1 Motivation

In this licentiate thesis, spatial domain methods for orientation and velocity estimation are developed, together with a solid framework for design of signal processing algorithms in the spatial domain.
It is more conventional for such methods to be designed, at least partially, in the Fourier domain. To understand why we wish to avoid the use of the Fourier domain altogether, it is necessary to have some background information. The theory and methods presented in this thesis are results of the research within the WITAS (Wallenberg laboratory for Information Technology for Autonomous Systems) project [57]. The goal of this project is to develop an autonomous flying vehicle and naturally the vision subsystem is an important component. Unfortunately the needed image processing has a tendency to be computationally very demanding and therefore it is of interest to find ways to reduce the amount of processing. One way to do this is to emulate biological vision by using foveally sampled images, i.e. having a higher sampling density in an area of interest and gradually lower sampling density further away. In contrast to the usual rectangular grids, this approach leads to highly irregular sampling patterns. Except for some very specific sampling patterns, e.g. the logarithmic polar [10, 42, 47, 58], the theory for irregularly sampled multidimensional signals is far less developed than the corresponding theory for regularly sampled signals. Some work has been done on the problem of reconstructing irregularly sampled bandlimited signals [16]. In contrast to the regular case this turns out to be quite complicated, one reason being that the Nyquist frequency varies spatially with the local sampling density. In fact the use of the Fourier domain in general, e.g. for filter design, becomes much more complicated, and for this reason we turn our attention to the spatial domain. So far all work has been restricted to regularly sampled signals, with adaptation of the methods to the irregularly sampled case as a major future research goal. Even without going to the irregularly sampled case, however, the spatial domain
approach has turned out to be successful, since the resulting methods are efficient and have excellent accuracy.

1.2 Organization

An important concept in the use of the spatial domain for signal processing is projection into subspaces, e.g. the subspace of second degree polynomials. Chapter 2 presents a unified framework for representations of finite dimensional vectors by bases, frames, subspace bases, and subspace frames. The basic idea is that all these representations by sets of vectors can be regarded as solutions to various least squares problems. Generalizations to weighted least squares problems are explored and dual vector sets are derived for efficient computation of the representations.

In chapter 3 the theory developed in the previous chapter is used to derive the method called Normalized Convolution. This method is a powerful tool for signal analysis in the spatial domain, being able to take uncertainties in the signal values into account and allowing spatial localization of the analysis functions.

Chapter 4 introduces orientation functionals for representation of orientation and it is shown that orientation tensors can be regarded as a special case of this concept. With the use of Normalized Convolution, a spatial domain method for estimation of orientation tensors, based on projection onto second degree polynomials, is developed. It is shown that, properly designed, this method can be implemented very efficiently, and evaluation on a test volume demonstrates that in practice it performs excellently.

In chapter 5 the orientation tensors from the previous chapter are utilized for velocity estimation. With the idea of estimating velocity over whole regions according to some motion model, two different algorithms are developed. The first one is a simultaneous segmentation and velocity estimation algorithm, while the second one gains in computational efficiency by disregarding the need for a proper segmentation into regions with coherent motion.
By evaluation on the Yosemite sequence it is shown that both algorithms are substantially more accurate than previously published methods for velocity estimation.

The thesis concludes with chapter 6 and a look at future research directions. It is sketched how orientation functionals can be extended to phase functionals, how the projection onto second degree polynomials can be employed for adaptive filtering, and how Normalized Convolution can be adapted to irregularly sampled signals. Since Normalized Convolution is the key tool for the orientation and velocity estimation algorithms, these will require only small amounts of additional work to be adapted to irregularly sampled signals. This chapter also explains how the cover image relates to the thesis.

1.3 Contributions

It is never easy to say for sure which ideas and methods are new and which have been published somewhere previously. The following is an attempt at listing the parts of the material that are original and more or less likely to be truly novel.

The main contribution in chapter 2 is the unification of the seemingly disparate concepts of frames and subspace bases in a least squares framework, together with bases and subspace frames. Other original ideas are the simultaneous weighting in both the signal and coefficient spaces for subspace frames, the full generalization of dual vector sets to the weighted norm case in section 2.4.4, and most of the results in section 2.5 on the weighted seminorm case. The concept of weighted linear combinations in section 2.4.4 may also be novel.

The method of Normalized Convolution in chapter 3 is certainly not original work. The primary contribution here is the presentation of the method. By taking advantage of the framework from chapter 2 to derive the method, the goal is to achieve greater clarity than in earlier presentations.
There are also some new contributions to the theory, such as parts of the discussion about output certainty in section 3.5, most of section 3.9, and all of section 3.10.

In chapter 4 everything is original except section 4.2 about the tensor representation of orientation and estimation of tensors by means of quadrature filter responses. The main contributions are the concept of orientation functionals in section 4.3, the method to estimate orientation tensors from the projection onto second degree polynomials in section 4.5, the efficient implementation of the estimation method in section 4.7, and the observation of the importance of isotropy in section 4.10.1.

The results on separable computation of Normalized Convolution in sections 5.5 and 4.8 are not limited to a polynomial basis but apply to any set of Cartesian separable basis functions and applicabilities. This makes it possible to perform the computations significantly more efficiently and is obviously an important contribution to the theory of Normalized Convolution.

Chapter 5 mostly contains original work too, with the exception of sections 5.2 and 5.3.1. The main contributions here are the methods for estimation of motion model parameters in section 5.3, the algorithm for simultaneous segmentation and velocity estimation in section 5.4, and the fast velocity estimation algorithm in section 5.5. Large parts of the material in chapter 5 were developed for my master's thesis and have been published in [13, 14]. The material in chapter 2 has been accepted for publication at the SCIA'99 conference [15].

1.4 Notation

Lowercase letters in boldface (v) are used for vectors and in matrix algebra contexts they are always column vectors. Uppercase letters in boldface (A) are used for matrices. The conjugate transpose of a matrix or a vector is denoted A^*. The transpose of a real matrix or vector is also denoted A^T. Complex conjugation without transpose is denoted \bar{v}.
The standard inner product between two vectors is written (u, v) or u^* v. The norm of a vector is induced from the inner product,

    \|v\| = \sqrt{v^* v}.    (1.1)

Weighted inner products are given by

    (u, v)_W = (Wu, Wv) = u^* W^* W v    (1.2)

and the induced weighted norms by

    \|v\|_W = \sqrt{(v, v)_W} = \sqrt{(Wv, Wv)} = \|Wv\|,    (1.3)

where W normally is a positive definite Hermitian matrix. In the case that it is only positive semidefinite we instead have a weighted seminorm. The norm of a matrix is assumed to be the Frobenius norm, \|A\|^2 = tr(A^* A), where the trace of a square matrix, tr M, is the sum of the diagonal elements. The pseudo-inverse of a matrix is denoted A^†. Somewhat nonstandard is the use of u · v to denote pointwise multiplication of the elements of two vectors. Finally \hat{v} is used to denote vectors of unit length and \tilde{v} is used for dual vectors. Additional notation is introduced where needed, e.g. f ⋆ g to denote unnormalized cross correlation in section 3.7.

Chapter 2

A Unified Framework for Bases, Frames, Subspace Bases, and Subspace Frames

2.1 Introduction

Frames and subspace bases, and of course bases, are well known concepts, which have been covered in several publications. Usually, however, they are treated as disparate entities. The idea behind this presentation of the material is to give a unified framework for bases, frames, and subspace bases, as well as the somewhat less known subspace frames. The basic idea is that the coefficients in the representation of a vector in terms of a frame, etc., can be described as solutions to various least squares problems. Using this to define what coefficients should be used, expressions for dual vector sets are derived. These results are then generalized to the case of weighted norms and finally also to the case of weighted seminorms. The presentation is restricted to finite dimensional vector spaces and relies heavily on matrix representations.
2.2 Preliminaries

To begin with, we review some basic concepts from (Numerical) Linear Algebra. All of these results are well known and can be found in any modern textbook on Numerical Linear Algebra, e.g. [19].

2.2.1 Notation

Let C^n be an n-dimensional complex vector space. Elements of this space are denoted by lowercase bold letters, e.g. v, indicating n × 1 column vectors. Uppercase bold letters, e.g. F, denote complex matrices. With C^n is associated the standard inner product, (f, g) = f^* g, where ^* denotes conjugate transpose, and the Euclidean norm, \|f\| = \sqrt{(f, f)}. In this section A is an n × m complex matrix, b ∈ C^n, and x ∈ C^m.

2.2.2 The Linear Equation System

The linear equation system

    Ax = b    (2.1)

has a unique solution

    x = A^{-1} b    (2.2)

if and only if A is square and non-singular. If the equation system is overdetermined it does in general not have a solution, and if it is underdetermined there is normally an infinite set of solutions. In these cases the equation system can be solved in a least squares and/or minimum norm sense, as discussed below.

2.2.3 The Linear Least Squares Problem

Assume that n ≥ m and that A is of rank m (full column rank). Then the equation Ax = b is not guaranteed to have a solution, and the best we can do is to minimize the residual error. The linear least squares problem

    \arg\min_{x \in C^m} \|Ax - b\|    (2.3)

has the unique solution

    x = (A^* A)^{-1} A^* b.    (2.4)

If A is rank deficient the solution is not unique, a case to which we return in section 2.2.7.

2.2.4 The Minimum Norm Problem

Assume that n ≤ m and that A is of rank n (full row rank). Then the equation Ax = b may have more than one solution, and to choose between them we take the one with minimum norm. The minimum norm problem

    \arg\min_{x \in S} \|x\|,  S = \{x ∈ C^m;\ Ax = b\}    (2.5)

has the unique solution

    x = A^* (A A^*)^{-1} b.    (2.6)

If A is rank deficient it is possible that there is no solution at all, a case to which we return in section 2.2.7.
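The two closed-form solutions (2.4) and (2.6) are easy to check numerically. The following NumPy sketch is not part of the thesis; the matrices are arbitrary illustrative examples.

```python
import numpy as np

# Overdetermined system: n > m, full column rank.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 2.0, 2.0])

# Least squares solution (2.4): x = (A* A)^{-1} A* b.
x_ls = np.linalg.solve(A.T @ A, A.T @ b)
# Agrees with a standard least squares solver.
assert np.allclose(x_ls, np.linalg.lstsq(A, b, rcond=None)[0])

# Underdetermined system: n < m, full row rank.
B = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
c = np.array([1.0, 1.0])

# Minimum norm solution (2.6): x = B* (B B*)^{-1} c.
x_mn = B.T @ np.linalg.solve(B @ B.T, c)
# It solves the system exactly; among all solutions it has smallest norm.
assert np.allclose(B @ x_mn, c)
```

Both formulas rely on the full-rank assumptions stated above; when these fail, the pseudo-inverse of section 2.2.6 takes over.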
2.2.5 The Singular Value Decomposition

An arbitrary matrix A of rank r can be factored by the Singular Value Decomposition, SVD, as

    A = U Σ V^*,    (2.7)

where U and V are unitary matrices, n × n and m × m respectively. Σ is a diagonal n × m matrix

    Σ = diag(σ_1, ..., σ_r, 0, ..., 0),    (2.8)

where σ_1, ..., σ_r are the non-zero singular values. The singular values are all real and σ_1 ≥ ... ≥ σ_r > 0. If A is of full rank we have r = min(n, m) and all singular values are non-zero.

2.2.6 The Pseudo-Inverse

The pseudo-inverse^1 A^† of any matrix A can be defined via the SVD given by (2.7) and (2.8) as

    A^† = V Σ^† U^*,    (2.9)

where Σ^† is a diagonal m × n matrix

    Σ^† = diag(1/σ_1, ..., 1/σ_r, 0, ..., 0).    (2.10)

We can notice that if A is of full rank and n ≥ m, then the pseudo-inverse can also be computed as

    A^† = (A^* A)^{-1} A^*    (2.11)

and if instead n ≤ m then

    A^† = A^* (A A^*)^{-1}.    (2.12)

If m = n then A is square and the condition of full rank becomes equivalent to non-singularity. It is obvious that both equations (2.11) and (2.12) reduce to

    A^† = A^{-1}    (2.13)

in this case. Regardless of rank conditions we have the following useful identities:

    (A^†)^† = A,    (2.14)
    (A^*)^† = (A^†)^*,    (2.15)
    A^† = (A^* A)^† A^*,    (2.16)
    A^† = A^* (A A^*)^†.    (2.17)

1 This pseudo-inverse is also known as the Moore-Penrose inverse.

2.2.7 The General Linear Least Squares Problem

The remaining case is when A is rank deficient. Then the equation Ax = b is not guaranteed to have a solution and there may be more than one x minimizing the residual error. This problem can be solved as a simultaneous least squares and minimum norm problem. The general (or rank deficient) linear least squares problem is stated as

    \arg\min_{x \in S} \|x\|,  S = \{x ∈ C^m;\ \|Ax - b\| \text{ is minimum}\},    (2.18)

i.e. among the least squares solutions, choose the one with minimum norm.
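The SVD construction (2.9)–(2.10) and the identities (2.14)–(2.17) can be verified numerically before proceeding; a NumPy sketch, not from the thesis, using an arbitrary rank-deficient example matrix.

```python
import numpy as np

# A rank-deficient 3x3 matrix: the third row is the sum of the first two.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0]])

U, s, Vh = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))          # numerical rank

# Pseudo-inverse via the SVD, as in (2.9)-(2.10).
Sigma_pinv = np.zeros((3, 3))
Sigma_pinv[:r, :r] = np.diag(1.0 / s[:r])
A_pinv = Vh.conj().T @ Sigma_pinv @ U.conj().T
assert np.allclose(A_pinv, np.linalg.pinv(A))

# The identities (2.14)-(2.17) hold regardless of rank conditions.
assert np.allclose(np.linalg.pinv(A_pinv), A)                 # (2.14)
assert np.allclose(np.linalg.pinv(A.T), A_pinv.T)             # (2.15), real A
assert np.allclose(A_pinv, np.linalg.pinv(A.T @ A) @ A.T)     # (2.16)
assert np.allclose(A_pinv, A.T @ np.linalg.pinv(A @ A.T))     # (2.17)
```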
Clearly this formulation contains both the ordinary linear least squares problem and the minimum norm problem as special cases. The unique solution is given in terms of the pseudo-inverse as

    x = A^† b.    (2.19)

Notice that by equations (2.11)–(2.13) this solution is consistent with (2.2), (2.4), and (2.6).

2.2.8 Numerical Aspects

Although the above results are most commonly found in books on Numerical Linear Algebra, only their algebraic properties are being discussed here. It should, however, be mentioned that e.g. equations (2.9) and (2.11) have numerical properties that differ significantly. The interested reader is referred to [5].

2.3 Representation by Sets of Vectors

If we have a set of vectors {f_k} ⊂ C^n and wish to represent^2 an arbitrary vector v as a linear combination

    v ∼ \sum_k c_k f_k    (2.20)

of the given set, how should the coefficients {c_k} be chosen? In general this question can be answered in terms of linear least squares problems.

2 Ideally we would like to have equality in equation (2.20) but that cannot always be obtained.

2.3.1 Notation

With the set of vectors, {f_k}_{k=1}^m ⊂ C^n, is associated an n × m matrix

    F = [f_1, f_2, ..., f_m],    (2.21)

which effectively is a reconstructing operator because multiplication with an m × 1 vector c, Fc, produces linear combinations of the vectors {f_k}. In terms of the reconstruction matrix, equation (2.20) can be rewritten as

    v ∼ Fc,    (2.22)

where the coefficients {c_k} have been collected in the vector c. The conjugate transpose of the reconstruction matrix, F^*, gives an analyzing operator because F^* x yields a vector containing the inner products between {f_k} and the vector x ∈ C^n.

                          spans C^n       does not span C^n
    linearly independent  basis           subspace basis
    linearly dependent    frame           subspace frame

    Table 2.1: Definitions

2.3.2 Definitions

Let {f_k} be a subset of C^n. If {f_k} spans C^n and is linearly independent it is called a basis.
If it spans C^n but is linearly dependent it is called a frame. If it is linearly independent but does not span C^n it is called a subspace basis. Finally, if it neither spans C^n, nor is linearly independent, it is called a subspace frame.^3 This relationship is depicted in table 2.1. If the properties of {f_k} are unknown or arbitrary we simply use set of vectors or vector set as a collective term.

3 The notation used here is somewhat nonstandard. See section 2.3.9 for a discussion.

2.3.3 Dual Vector Sets

We associate with a given vector set {f_k} the dual vector set {\tilde{f}_k}, characterized by the condition that for an arbitrary vector v the coefficients {c_k} in equation (2.20) are given as inner products between the dual vectors and v,

    c_k = (\tilde{f}_k, v) = \tilde{f}_k^* v.    (2.23)

This equation can be rewritten in terms of the reconstruction matrix \tilde{F} corresponding to {\tilde{f}_k} as

    c = \tilde{F}^* v.    (2.24)

The existence of the dual vector set is a nontrivial fact, which will be proved in the following sections for the various classes of vector sets.

2.3.4 Representation by a Basis

Let {f_k} be a basis. An arbitrary vector v can be written as a linear combination of the basis vectors, v = Fc, for a unique coefficient vector c.^4 Because F is invertible in the case of a basis, we immediately get

    c = F^{-1} v    (2.25)

and it is clear from comparison with equation (2.24) that \tilde{F} exists and is given by

    \tilde{F} = (F^{-1})^*.    (2.26)

In this very ideal case where the vector set is a basis, there is no need to state a least squares problem to find c or \tilde{F}. That this could indeed be done is discussed in section 2.3.7.

4 The coefficients {c_k} are of course also known as the coordinates for v with respect to the basis {f_k}.

2.3.5 Representation by a Frame

Let {f_k} be a frame. Because the frame spans C^n, an arbitrary vector v can still be written as a linear combination of the frame vectors, v = Fc.
This time, however, there are infinitely many coefficient vectors c satisfying the relation. To get a uniquely determined solution we add the requirement that c be of minimum norm. This is nothing but the minimum norm problem of section 2.2.4 and equation (2.6) gives the solution

    c = F^* (F F^*)^{-1} v.    (2.27)

Hence the dual frame exists and is given by

    \tilde{F} = (F F^*)^{-1} F.    (2.28)

2.3.6 Representation by a Subspace Basis

Let {f_k} be a subspace basis. In general, an arbitrary vector v cannot be written as a linear combination of the subspace basis vectors, v = Fc. Equality only holds for vectors v in the subspace spanned by {f_k}. Thus we have to settle for the c giving the closest vector v_0 = Fc in the subspace. Since the subspace basis vectors are linearly independent we have the linear least squares problem of section 2.2.3 with the solution given by equation (2.4) as

    c = (F^* F)^{-1} F^* v.    (2.29)

Hence the dual subspace basis exists and is given by

    \tilde{F} = F (F^* F)^{-1}.    (2.30)

Geometrically v_0 is the orthogonal projection of v onto the subspace.

2.3.7 Representation by a Subspace Frame

Let {f_k} be a subspace frame. In general, an arbitrary vector v cannot be written as a linear combination of the subspace frame vectors, v = Fc. Equality only holds for vectors v in the subspace spanned by {f_k}. Thus we have to settle for the c giving the closest vector v_0 = Fc in the subspace. Since the subspace frame vectors are linearly dependent there are also infinitely many c giving the same closest vector v_0, so to distinguish between these we choose the one with minimum norm. This is the general linear least squares problem of section 2.2.7 with the solution given by equation (2.19) as

    c = F^† v.    (2.31)

Hence the dual subspace frame exists and is given by

    \tilde{F} = (F^†)^*.    (2.32)

The subspace frame case is the most general case since all the other ones can be considered as special cases.
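The subspace frame case can be illustrated concretely. In the following NumPy sketch (not from the thesis; the vectors are an arbitrary example) two dependent vectors span a line in R^3, and c = F^† v picks the minimum norm coefficients among all that give the same closest vector v_0:

```python
import numpy as np

# A subspace frame in R^3: two parallel vectors spanning a line
# (linearly dependent, not spanning the whole space).
F = np.array([[1.0, 2.0],
              [1.0, 2.0],
              [0.0, 0.0]])   # columns are f_1 and f_2
v = np.array([1.0, 3.0, 5.0])

# (2.31): c = F† v gives the minimum-norm least squares coefficients.
c = np.linalg.pinv(F) @ v
v0 = F @ c

# v0 is the orthogonal projection of v onto span{f_k}.
assert np.allclose(v0, [2.0, 2.0, 0.0])

# Any coefficients differing by a null space vector of F give the
# same closest vector v0, but with larger norm.
c_alt = c + np.array([2.0, -1.0])    # (2, -1) lies in the null space of F
assert np.allclose(F @ c_alt, v0)
assert np.linalg.norm(c) < np.linalg.norm(c_alt)
```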
The only thing that happens to the general linear least squares problem formulated here is that sometimes there is an exact solution v = Fc, rendering the minimum residual error requirement superfluous, and sometimes there is a unique solution c, rendering the minimum norm requirement superfluous. Consequently the solution given by equation (2.32) subsumes all the other ones, which is in agreement with equations (2.11)–(2.13).

2.3.8 The Double Dual

The dual of {\tilde{f}_k} can be computed from equation (2.32), applied twice, together with (2.14) and (2.15):

    \tilde{\tilde{F}} = (\tilde{F}^†)^* = F^{†*†*} = F^{†**†} = F^{††} = F.    (2.33)

What this means is that if we know the inner products between v and {f_k} we can reconstruct v using the dual vectors. To summarize we have the two relations

    v ∼ F(\tilde{F}^* v) = \sum_k (\tilde{f}_k, v) f_k  and    (2.34)

    v ∼ \tilde{F}(F^* v) = \sum_k (f_k, v) \tilde{f}_k.    (2.35)

2.3.9 A Note on Notation

Usually a frame is defined by the frame condition,

    A\|v\|^2 ≤ \sum_k |(f_k, v)|^2 ≤ B\|v\|^2,    (2.36)

which must hold for some A > 0, some B < ∞, and all v ∈ C^n. In the finite dimensional setting used here the first inequality holds if and only if {f_k} spans all of C^n, and the second inequality is a triviality as soon as the number of frame vectors is finite. The difference between this definition and the one used in section 2.3.2 is that the bases are included in the set of frames. As we have seen that equation (2.28) is consistent with equation (2.26), the same convention could have been used here. The reason for not doing so is that the presentation would have become more involved.

Likewise, we may allow the subspace bases to span the whole C^n, making bases a special case. Indeed, as has already been discussed to some extent, if subspace frames are allowed to be linearly independent, and/or span the whole C^n, all the other cases can be considered special cases of subspace frames.
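The dual frame formula (2.28), the reconstruction relations (2.34)–(2.35), and the double dual property (2.33) can be checked together; a NumPy sketch, not from the thesis, with an arbitrary three-vector frame for R^2.

```python
import numpy as np

# A frame for R^2: three vectors spanning the plane (linearly dependent).
F = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])   # columns are the frame vectors f_k

# Dual frame (2.28): F~ = (F F*)^{-1} F.
F_dual = np.linalg.solve(F @ F.T, F)

# Equivalently, (2.32): F~ = (F†)*, which covers all four vector set classes.
assert np.allclose(F_dual, np.linalg.pinv(F).T)

# Reconstruction (2.34): v = F (F~* v) for any v, since the frame spans R^2.
v = np.array([3.0, -2.0])
c = F_dual.T @ v                  # minimum-norm coefficients
assert np.allclose(F @ c, v)

# Double dual (2.33): the dual of the dual set is the original set.
assert np.allclose(np.linalg.pinv(F_dual).T, F)
```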
2.4 Weighted Norms

An interesting generalization of the theory developed so far is to exchange the Euclidian norms used in all minimizations for weighted norms.

2.4.1 Notation

Let the weighting matrix W be an n × n positive definite Hermitian matrix. The weighted inner product (·, ·)W on Cⁿ is defined by

    (f, g)W = (Wf, Wg) = f∗W∗Wg = f∗W²g                                (2.37)

and the induced weighted norm ‖·‖W is given by

    ‖f‖W = √(f, f)W = √(Wf, Wf) = ‖Wf‖.                                (2.38)

In this section M and L denote weighting matrices for Cⁿ and Cᵐ respectively. The notation from previous sections carries over unchanged.

2.4.2 The Weighted General Linear Least Squares Problem

The weighted version of the general linear least squares problem is stated as

    arg min ‖x‖L,   S = {x ∈ Cᵐ; ‖Ax − b‖M is minimum}.                (2.39)
     x∈S

This problem can be reduced to its unweighted counterpart by introducing x′ = Lx, whereby equation (2.39) can be rewritten as

    arg min ‖x′‖,   S = {x′ ∈ Cᵐ; ‖MAL⁻¹x′ − Mb‖ is minimum}.          (2.40)
     x′∈S

The solution is given by equation (2.19) as

    x′ = (MAL⁻¹)†Mb,                                                   (2.41)

which after back-substitution yields

    x = L⁻¹(MAL⁻¹)†Mb.                                                 (2.42)

2.4.3 Representation by Vector Sets

Let {fk} ⊂ Cⁿ be any type of vector set. We want to represent an arbitrary vector v ∈ Cⁿ as a linear combination of the given vectors,

    v ∼ Fc,                                                            (2.43)

where the coefficient vector c is chosen so that

1. the distance between v₀ = Fc and v, ‖v₀ − v‖M, is smallest possible, and
2. the length of c, ‖c‖L, is minimized.

This is of course the weighted general linear least squares problem of the previous section, with the solution

    c = L⁻¹(MFL⁻¹)†Mv.                                                 (2.44)

From the geometry of the problem one would suspect that M should not influence the solution in the case of a basis or a frame, because the vectors span the whole space, so that v₀ equals v and the distance is zero regardless of norm. Likewise L should not influence the solution in the case of a basis or a subspace basis.
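The closed-form solution (2.42) can be sanity checked against a standard least squares routine. A minimal numpy sketch under illustrative random data; `weighted_lls` is a hypothetical helper name, not from the thesis:

```python
import numpy as np

rng = np.random.default_rng(1)

def weighted_lls(A, b, M, L):
    """Solution (2.42): minimize ||x||_L over all x minimizing ||Ax - b||_M."""
    Li = np.linalg.inv(L)
    return Li @ np.linalg.pinv(M @ A @ Li) @ (M @ b)

n, m = 6, 3
A = rng.standard_normal((n, m))
b = rng.standard_normal(n)
M = np.diag(rng.uniform(0.5, 2.0, n))    # positive definite weights
L = np.diag(rng.uniform(0.5, 2.0, m))

# Full column rank: the minimizer of ||Ax - b||_M is unique, so L has no
# influence and the result agrees with ordinary least squares on (MA, Mb).
x = weighted_lls(A, b, M, L)
x_ref = np.linalg.lstsq(M @ A, M @ b, rcond=None)[0]
assert np.allclose(x, x_ref)

# Rank-deficient A: all x2 + t*nullvec give the same residual, and the
# returned solution has the smallest L-weighted norm among them.
A2 = np.column_stack([A[:, 0], A[:, 1], A[:, 0] + A[:, 1]])
x2 = weighted_lls(A2, b, M, L)
nullvec = np.array([1.0, 1.0, -1.0])     # A2 @ nullvec = 0
for t in np.linspace(-1, 1, 11):
    assert np.linalg.norm(L @ x2) <= np.linalg.norm(L @ (x2 + t * nullvec)) + 1e-9
```

This also illustrates the remark above: with a full column rank A the weight L is irrelevant, while in the rank-deficient case it selects among the minimizers.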
That this is correct can easily be seen by applying the identities (2.11) – (2.13) to the solution (2.44). In the case of a frame we get

    c = L⁻¹(MFL⁻¹)∗((MFL⁻¹)(MFL⁻¹)∗)⁻¹Mv
      = L⁻²F∗M(MFL⁻²F∗M)⁻¹Mv                                           (2.45)
      = L⁻²F∗(FL⁻²F∗)⁻¹v,

in the case of a subspace basis

    c = L⁻¹((MFL⁻¹)∗(MFL⁻¹))⁻¹(MFL⁻¹)∗Mv
      = L⁻¹(L⁻¹F∗M²FL⁻¹)⁻¹L⁻¹F∗M²v                                     (2.46)
      = (F∗M²F)⁻¹F∗M²v,

and in the case of a basis

    c = L⁻¹(MFL⁻¹)⁻¹Mv = F⁻¹v.                                         (2.47)

2.4.4 Dual Vector Sets

It is not completely obvious how the concept of a dual vector set should be generalized to the weighted norm case. We would like to retain the symmetry relation from equation (2.33) and get correspondences to the representations (2.34) and (2.35). This can be accomplished by the weighted dual⁵

    F̃ = M⁻¹(L⁻¹F∗M)†L,                                                 (2.48)

which obeys the relations

    F̃̃ = F,                                                            (2.49)
    v ∼ FL⁻²F̃∗M²v,  and                                                (2.50)
    v ∼ F̃L⁻²F∗M²v.                                                     (2.51)

⁵ To be more precise we should say ML-weighted dual, denoted F̃ML. In the current context the extra index would only weigh down the notation, and has therefore been dropped.

Unfortunately the two latter relations are not as easily interpreted as (2.34) and (2.35). The situation simplifies a lot in the special case where L = I. Then we have

    F̃ = M⁻¹(F∗M)†,                                                     (2.52)

which can be rewritten by identity (2.17) as

    F̃ = F(F∗M²F)†.                                                     (2.53)

The two relations (2.50) and (2.51) can now be rewritten as

    v ∼ F(F̃∗M²v) = Σk (f̃k, v)M fk,  and                               (2.54)
    v ∼ F̃(F∗M²v) = Σk (fk, v)M f̃k.                                    (2.55)

Returning to the case of a general L, the factor L⁻² in (2.50) and (2.51) should be interpreted as a weighted linear combination, i.e. FL⁻²c would be an L⁻¹-weighted linear combination of the vectors {fk}, with the coefficients given by c, analogously to F∗M²v being the set of M-weighted inner products between {fk} and a vector v.
2.5 Weighted Seminorms

The final level of generalization to be addressed here is when the weighting matrices are allowed to be semidefinite, turning the norms into seminorms. This has fundamental consequences for the geometry of the problem. The primary difference is that with a (proper) seminorm not only the vector 0 has length zero, but a whole subspace has. This fact has to be taken into account with respect to the terms spanning and linear dependence.⁶

⁶ Specifically, if a set of otherwise linearly independent vectors has a linear combination of norm zero, we say that the vectors are effectively linearly dependent, since for all practical purposes they may as well have been.

2.5.1 The Seminorm Weighted General Linear Least Squares Problem

When M and L are allowed to be semidefinite⁷ the solution to equation (2.39) is given by Eldén in [12] as

    x = (I − (LP)†L)(MA)†Mb + P(I − (LP)†LP)z,                         (2.56)

where z is arbitrary and P is the projection

    P = I − (MA)†MA.                                                   (2.57)

⁷ M and L may in fact be completely arbitrary matrices of compatible sizes.

Furthermore the solution is unique if and only if

    N(MA) ∩ N(L) = {0},                                                (2.58)

where N(·) denotes the null space. When there are multiple solutions, the first term of (2.56) gives the solution with minimum Euclidian norm. If we make the restriction that only M may be semidefinite, the derivation in section 2.4.2 still holds and the solution is unique and given by equation (2.42) as

    x = L⁻¹(MAL⁻¹)†Mb.                                                 (2.59)

2.5.2 Representation by Vector Sets and Dual Vector Sets

Here we have exactly the same representation problem as in section 2.4.3, except that M and L may now be semidefinite. The consequence of M being semidefinite is that residual errors along some directions do not matter, while L being semidefinite means that certain linear combinations of the available vectors can be used for free.
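Eldén's solution (2.56) and the later special case c = (MF)†Mv from equation (2.64) can both be checked numerically. A minimal numpy sketch; the matrices and the missing-data pattern are illustrative assumptions, not from the thesis:

```python
import numpy as np

rng = np.random.default_rng(2)

n, m = 6, 4
A = rng.standard_normal((n, m))
A[:, 3] = A[:, 0]
A[5, 3] += 1.0                 # columns 0 and 3 differ only in the last sample
b = rng.standard_normal(n)

M = np.diag([1., 1., 1., 1., 1., 0.])   # seminorm: last sample has weight zero
L = np.diag([1., 1., 0., 0.])           # seminorm: only two coefficients cost

MA = M @ A
P = np.eye(m) - np.linalg.pinv(MA) @ MA        # projection onto N(MA), (2.57)
x = (np.eye(m) - np.linalg.pinv(L @ P) @ L) @ np.linalg.pinv(MA) @ (M @ b)

# The residual seminorm ||MAx - Mb|| is minimal...
assert np.allclose(MA @ x, MA @ (np.linalg.pinv(MA) @ (M @ b)))

# ...and ||Lx|| is minimal among all minimizers x + t*p with p in N(MA).
p = np.array([1., 0., 0., -1.])                # MA @ p = 0
for t in np.linspace(-2, 2, 9):
    assert np.linalg.norm(L @ x) <= np.linalg.norm(L @ (x + t * p)) + 1e-9

# The uniqueness condition (2.58) holds here, so the second term of (2.56)
# vanishes for every z.
z = rng.standard_normal(m)
assert np.allclose(P @ (np.eye(m) - np.linalg.pinv(L @ P) @ (L @ P)) @ z, 0.0)

# Special case (2.64): with L = I and a semidefinite M encoding missing
# samples, c = (MF)^dagger M v fits only the observed data.
t5 = np.arange(5, dtype=float)
F = np.column_stack([np.ones(5), t5, t5**2])   # basis {1, t, t^2}
Mv = np.diag([1., 1., 0., 1., 0.])             # samples 2 and 4 missing
v = np.array([1., 2., 99., 7., -99.])          # junk where the weight is zero
c = np.linalg.pinv(Mv @ F) @ (Mv @ v)
obs = [0, 1, 3]
c_ref = np.linalg.lstsq(F[obs], v[obs], rcond=None)[0]
assert np.allclose(c, c_ref)
```

The last check makes the seminorm interpretation concrete: residual errors in the zero-weight directions simply do not matter, so the junk values have no influence on the fit.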
When both are semidefinite it may happen that some linear combinations can be used freely without affecting the residual error. This causes an ambiguity in the choice of the coefficients c, which can be resolved by the additional requirement that among these solutions, c is chosen with minimum Euclidian norm. Then the solution is given by the first part of equation (2.56) as

    c = (I − (L(I − (MF)†MF))†L)(MF)†Mv.                               (2.60)

Since this expression is something of a mess we are not going to explore the possibilities of finding a dual vector set or analogues of the relations (2.50) and (2.51). Let us instead turn to the considerably simpler case where only M is allowed to be semidefinite. As noted in the previous section, we can now use the same solution as in the case with weighted norms, reducing the solution (2.60) to that given by equation (2.44),

    c = L⁻¹(MFL⁻¹)†Mv.                                                 (2.61)

Unfortunately we can no longer define the dual vector set by means of equation (2.48), due to the occurrence of an explicit inverse of M. Applying identity (2.16) to (2.61), however, we get

    c = L⁻¹(L⁻¹F∗M²FL⁻¹)†L⁻¹F∗M²v                                      (2.62)

and it follows that

    F̃ = FL⁻¹(L⁻¹F∗M²FL⁻¹)†L                                           (2.63)

yields a dual satisfying the relations (2.49) – (2.51). In the case that L = I this expression simplifies further to (2.53), just as for weighted norms. For future reference we also notice that (2.61) then reduces to

    c = (MF)†Mv.                                                       (2.64)

Chapter 3
Normalized Convolution

3.1 Introduction

Normalized Convolution is a method for signal analysis that takes uncertainties in signal values into account and at the same time allows spatial localization of possibly unlimited analysis functions. The method was primarily developed by Knutsson and Westin [38, 40, 60] and has later been described and/or used in e.g. [22, 35, 49, 50, 51, 59]. The conceptual basis for the method is the signal/certainty philosophy [20, 21, 36, 63], i.e.
separating the values of a signal from the certainty of the measurements. Most of the previous presentations of Normalized Convolution have been set primarily in a tensor algebra framework, with only some mention of the relations to least squares problems. Here we will skip the tensor algebra approach completely and instead use the framework developed in chapter 2 as the theoretical basis for deriving Normalized Convolution. Specifically, we will use the theory of subspace bases and the connections to least squares problems. Readers interested in the tensor algebra approach are referred to [38, 40, 59, 60]. Normalized Convolution can, for each neighborhood of the signal, geometrically be interpreted as a projection into a subspace which is spanned by the analysis functions. The projection is equivalent to a weighted least squares problem, where the weights are induced from the certainty of the signal and the desired localization of the analysis functions. The result of Normalized Convolution is at each signal point a set of expansion coefficients, one for each analysis function. While neither least squares fitting, localization of analysis functions, nor handling of uncertain data are in themselves novel ideas, the unique strength of Normalized Convolution is that it combines all of them simultaneously in a well structured and theoretically sound way. The method is a generally useful tool for signal analysis in the spatial domain, which formalizes and generalizes least squares techniques, e.g. the facet model [23, 24, 26], that have been used for a long time. In fact, the primary use of Normalized Convolution in the following chapters of this thesis is for filter design in the spatial domain.

3.2 Definition of Normalized Convolution

Before defining Normalized Convolution, it is necessary to become familiar with the terms signal, certainty, basis functions, and applicability, in the context of the method.
To begin with we assume that we have discrete signals, and explore the straightforward generalization to continuous signals in section 3.2.4.

3.2.1 Signal and Certainty

It is important to be aware that Normalized Convolution can be considered as a pointwise operation, or more strictly, as an operation on a neighborhood of each signal point. This is no different from ordinary convolution, where the convolution result at each point is effectively the inner product between the conjugated and reflected filter kernel and a neighborhood of the signal. Let f denote the whole signal while f denotes the neighborhood of a given point. It is assumed that the neighborhood is of finite size¹, so that f can be considered an element of a finite dimensional vector space Cⁿ. Regardless of the dimensionality of the space in which it is embedded², f is represented by an n × 1 column vector.³ Certainty is a measure of the confidence in the signal values at each point, given by non-negative real numbers. Let c denote the whole certainty field, while the n × 1 column vector c denotes the certainty of the signal values in f. Possible causes for uncertainty in signal values are, e.g., defective sensor elements, detected (but not corrected) transmission errors, and varying confidence in the results from previous processing. The most important, and rather ubiquitous, case of uncertainty, however, is missing data outside the border of the signal, so called edge effects. The problem is that for a signal of limited extent, the neighborhood of points close to the border will include points where no values are given. This has traditionally been handled in a number of different ways. The most common is to assume that the signal values are zero outside the border, which implicitly is done by standard convolution. Another way is to assume cyclical repetition of the signal values, which implicitly is done when convolution is computed in the frequency domain.
Yet another way is to extend with the values at the border. None of these is completely satisfactory, however. The correct way to do it, from a signal/certainty perspective, is to set the certainty for points outside the border to zero, while the signal value is left unspecified. It is obvious that certainty zero means missing data, but it is not so clear how positive values should be interpreted. An exact interpretation must be postponed to section 3.2.4, but of course a larger certainty corresponds to a higher confidence in the signal value. It may seem natural to limit certainty values to the range [0, 1], with 1 meaning full confidence, but this is not really necessary.

¹ This condition can be lifted, as discussed in section 3.2.4. For computational reasons, however, it is in practice always satisfied.
² E.g. dimensionality 2 for image data.
³ The elements of the vector are implicitly the coordinates relative to some orthonormal basis, typically a pixel basis.

3.2.2 Basis Functions and Applicability

The role of the basis functions is to give a local model for the signal. Each basis function has the size of the neighborhood mentioned above, i.e. it is an element of Cⁿ, represented by an n × 1 column vector bi. The set {bi}ᵐ₁ of basis functions is stored in an n × m matrix B,

    B = [b₁ b₂ … bₘ].                                                  (3.1)

Usually we have linearly independent basis functions, so that the vectors {bi} do constitute a basis for a subspace of Cⁿ. In most cases m is also much smaller than n. The applicability is a kind of “certainty” for the basis functions. Rather than being a measure of certainty or confidence, however, it indicates the significance or importance of each point in the neighborhood. Like the certainty, the applicability a is represented by an n × 1 vector with non-negative elements.
Points where the applicability is zero might as well be excluded from the neighborhood altogether, but for practical reasons it may be convenient to keep them. As for certainty it may seem natural to limit the applicability values to the range [0, 1], but there is really no reason to do this because the scaling of the values turns out to be of no significance. The basis functions may actually be defined for a larger domain than the neighborhood in question. They can in fact be unlimited, e.g. polynomials or complex exponentials, but values at points where the applicability is zero simply do not matter. This is an important role of the applicability, to enforce a spatial localization of the signal model. A more extensive discussion on the choice of applicability follows in section 3.9.

3.2.3 Definition

Let the n × n matrices Wa = diag(a), Wc = diag(c), and W² = WaWc.⁴ The operation of Normalized Convolution is at each signal point a question of representing a neighborhood of the signal, f, by the set of vectors {bi}, using the weighted norm (or seminorm) ‖·‖W in the signal space and the Euclidian norm in the coefficient space. The result of Normalized Convolution is at each point the set of coefficients r used in the vector set representation of the neighborhood. As we have seen in chapter 2, this can equivalently be stated as the seminorm weighted general linear least squares problem

    arg min ‖r‖,   S = {r ∈ Cᵐ; ‖Br − f‖W is minimum}.                 (3.2)
     r∈S

In the case that the basis functions are linearly independent with respect to the (semi)norm ‖·‖W, this can be simplified to the more ordinary weighted linear least squares problem

    arg min ‖Br − f‖W.                                                 (3.3)
     r∈Cᵐ

⁴ We set W² = WaWc to keep in line with the established notation. Letting W = WaWc would be equally valid, as long as a and c are interpreted accordingly, and somewhat more natural in the framework used here.

In any case the solution is given by equation (2.64) as

    r = (WB)†Wf.                                                       (3.4)

For various purposes it is useful to rewrite this formula. We start by expanding the pseudo-inverse in (3.4) by identity (2.16), leading to

    r = (B∗W²B)†B∗W²f,                                                 (3.5)

which can be interpreted in terms of inner products as

    r = [ (b1, b1)W  ...  (b1, bm)W ]† [ (b1, f)W ]
        [    ...     ...     ...    ]  [    ...   ]                    (3.6)
        [ (bm, b1)W  ...  (bm, bm)W ]  [ (bm, f)W ]

Replacing W² with WaWc and using the assumption that the vectors {bi} constitute a subspace basis with respect to the (semi)norm W, so that the pseudo-inverse in (3.5) and (3.6) can be replaced with an ordinary inverse, we get

    r = (B∗WaWcB)⁻¹B∗WaWcf                                             (3.7)

and with the convention that · denotes pointwise multiplication, we arrive at the expression⁵

    r = [ (a·c·b1, b1)  ...  (a·c·b1, bm) ]⁻¹ [ (a·c·b1, f) ]
        [     ...       ...      ...      ]   [     ...     ]          (3.8)
        [ (a·c·bm, b1)  ...  (a·c·bm, bm) ]   [ (a·c·bm, f) ]

⁵ This is almost the original formulation of Normalized Convolution. The final step is postponed until section 3.3.

3.2.4 Comments on the Definition

In previous formulations of Normalized Convolution, it has been assumed that the basis functions actually constitute a subspace basis, so that we have a unique solution to the linear least squares problem (3.3), given by (3.7) or (3.8). The problem with this assumption is that if we have a neighborhood with lots of missing data, it can happen that the basis functions effectively become linearly dependent in the seminorm given by W, so that the inverses in (3.7) and (3.8) do not exist. We can solve this problem by exchanging the inverses for pseudo-inverses, equations (3.5) and (3.6), which removes the ambiguity in the choice of resulting coefficients r by giving the solution to the more general linear least squares problem (3.2). This remedy is not without risks, however, since the mere fact that the basis functions turn linearly dependent indicates that the values of at least some of the coefficients may be very uncertain. More discussion on this follows in section 3.5. Taking proper care in the interpretation of the result, however, the pseudo-inverse solutions should be useful when the signal certainty is very low. They are also necessary in certain generalizations of Normalized Convolution, see section 3.10. To be able to use the framework from chapter 2 in deriving the expressions for Normalized Convolution, we restricted ourselves to the case of discrete signals and neighborhoods of finite size. When we have continuous signals and/or infinite neighborhoods we can still use (3.6) or (3.8) to define Normalized Convolution, simply by using an appropriate weighted inner product. The corresponding least squares problems are given by obvious modifications to (3.2) and (3.3). The geometrical interpretation of the least squares minimization is that the local neighborhood is projected into the subspace spanned by the basis functions, using a metric that depends on the certainty and the applicability. From the least squares formulation we can also get an exact interpretation of the certainty and the applicability. The certainty gives the relative importance of the signal values when doing the least squares fit, while the applicability gives the relative importance of the points in the neighborhood. Obviously a scaling of the certainty or applicability values does not change the least squares solution, so there is no reason to restrict these values to the range [0, 1].

3.3 Implementational Issues

While any of the expressions (3.4) – (3.8) can be used to compute Normalized Convolution, there are some differences with respect to computational complexity and numerical stability. Numerically, (3.4) is somewhat preferable to the other expressions, because in the rest of them values get squared, raising the condition numbers.
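The equivalence of the expressions (3.4) and (3.7) is easy to verify numerically. A minimal numpy sketch with illustrative random data, giving one sample certainty zero:

```python
import numpy as np

rng = np.random.default_rng(4)

n, m = 9, 3
B = rng.standard_normal((n, m))           # basis functions as columns
f = rng.standard_normal(n)                # signal neighborhood
a = rng.uniform(0.1, 1.0, n)              # applicability
c = rng.uniform(0.1, 1.0, n)
c[0] = 0.0                                # one missing sample

W = np.diag(np.sqrt(a * c))               # W^2 = Wa Wc

r1 = np.linalg.pinv(W @ B) @ (W @ f)                   # equation (3.4)
r2 = np.linalg.solve(B.T @ np.diag(a * c) @ B,
                     B.T @ (a * c * f))                # equation (3.7)
assert np.allclose(r1, r2)
```

Note that (3.4) works directly with W, whereas (3.7) works with the squared weights a·c, which is where the difference in conditioning comes from.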
Computationally, however, the computation of the pseudo-inverse is costly, and WB is typically significantly larger than B∗W²B. We rather wish to avoid the pseudo-inverses altogether, leaving us with (3.7) and (3.8). The inverses in these expressions are of course not computed explicitly, since there are more efficient methods to solve linear equation systems. In fact, the costly operation now is to compute the inner products in (3.8). Remembering that these computations have to be performed at each signal point, we can improve the expression somewhat by rewriting (3.8) as

    r = [ (a·b1·b̄1, c)  ...  (a·b1·b̄m, c) ]⁻¹ [ (a·b1, c·f) ]
        [     ...       ...      ...      ]   [     ...     ]          (3.9)
        [ (a·bm·b̄1, c)  ...  (a·bm·b̄m, c) ]   [ (a·bm, c·f) ]

where b̄i denotes complex conjugation of the basis functions. This is actually the original formulation of Normalized Convolution [38, 40, 60], although with different notation. By precomputing the quantities {a·bk·b̄l}, {a·bk}, and c·f, we can decrease the total number of multiplications at the cost of a small increase in storage requirements. To compute Normalized Convolution at all points of the signal we essentially have two strategies. The first is to compute all inner products and to solve the linear equation system for one point before continuing to the next point. The second is to compute each inner product for all points before continuing with the next inner product, and at the very last solve all the linear equation systems. The advantage of the latter approach is that the inner products can be computed as standard convolutions, an operation which is often available in heavily optimized form, possibly in hardware. The disadvantage is that large amounts of extra storage must be used, which even if it is available could lead to problems with respect to data locality and cache performance.
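The precomputation behind (3.9) can be sketched as follows: the quantities {a·bk·b̄l} and {a·bk} are formed once, and per signal point only the inner products with c and c·f remain. A minimal numpy sketch with illustrative real-valued data, so the conjugations are trivial:

```python
import numpy as np

rng = np.random.default_rng(5)

n, m = 9, 4
B = rng.standard_normal((n, m))
f = rng.standard_normal(n)
a = rng.uniform(0.1, 1.0, n)
c = rng.uniform(0.1, 1.0, n)

# Direct form (3.7).
G = B.T @ np.diag(a * c) @ B
r = np.linalg.solve(G, B.T @ (a * c * f))

# Form (3.9): precompute {a*bk*bl} and {a*bk} once; only inner products
# with c and c*f remain for each signal point.
ab_pairs = a[:, None, None] * B[:, :, None] * B[:, None, :]
ab = a[:, None] * B
G2 = np.einsum('ikl,i->kl', ab_pairs, c)               # entries (a*bk*bl, c)
h2 = ab.T @ (c * f)                                    # entries (a*bk, c*f)
assert np.allclose(np.linalg.solve(G2, h2), r)
```

Over a whole signal, the einsum contraction with c and the products with c·f become the standard convolutions (or correlations) mentioned in the second strategy above.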
Further discussion on how Normalized Convolution can be computed more efficiently in certain cases can be found in sections 3.7, 4.7, and 4.8.

3.4 Example

To give a small example, assume that we have a two-dimensional signal f, sampled at integer points, with an accompanying certainty field c, as defined below.

    f = [ 3 9 5 3 4 7 9 ]        c = [ 0 2 2 2 1 1 2 ]
        [ 7 2 1 1 6 3 6 ]            [ 2 1 1 2 0 1 2 ]
        [ 4 4 4 1 2 2 4 ]            [ 2 1 1 2 2 2 2 ]                 (3.10)
        [ 5 4 3 2 3 6 9 ]            [ 2 2 2 2 2 1 1 ]
        [ 8 6 7 8 6 3 9 ]            [ 2 2 1 1 2 0 0 ]

Let the local signal model be given by a polynomial basis, {1, x, y, x², xy, y²} (it is understood that the x variable increases from the left to the right, while the y variable increases going downwards), and an applicability of the form

    a = [ 1 2 1 ]
        [ 2 4 2 ]                                                      (3.11)
        [ 1 2 1 ]

The applicability fixes the size of the neighborhood, in this case 3 × 3 pixels, and gives a localization of the unlimited polynomial basis functions. Expressed as matrices, taking the points columnwise, we have

    B = [ 1 −1 −1  1  1  1 ]        a = [ 1 ]
        [ 1 −1  0  1  0  0 ]            [ 2 ]
        [ 1 −1  1  1 −1  1 ]            [ 1 ]
        [ 1  0 −1  0  0  1 ]            [ 2 ]
        [ 1  0  0  0  0  0 ]            [ 4 ]                          (3.12)
        [ 1  0  1  0  0  1 ]            [ 2 ]
        [ 1  1 −1  1 −1  1 ]            [ 1 ]
        [ 1  1  0  1  0  0 ]            [ 2 ]
        [ 1  1  1  1  1  1 ]            [ 1 ]

Assume that we wish to compute the result of Normalized Convolution at the marked point in the signal. Then the signal and certainty neighborhoods are represented by

    f = (1, 6, 3, 1, 2, 2, 2, 3, 6)ᵀ  and  c = (2, 0, 1, 2, 2, 2, 2, 2, 1)ᵀ.  (3.13)

Applying equation (3.7) we get the result

    r = (B∗WaWcB)⁻¹B∗WaWcf

      = [ 26  4 −2 10  0 14 ]⁻¹ [ 55 ]   [ 1.81 ]
        [  4 10  0  4 −2  0 ]   [ 17 ]   [ 0.72 ]
        [ −2  0 14 −2  0 −2 ]   [  7 ] = [ 0.86 ]                      (3.14)
        [ 10  4 −2 10  0  6 ]   [ 27 ]   [ 0.85 ]
        [  0 −2  0  0  6  0 ]   [  1 ]   [ 0.41 ]
        [ 14  0 −2  6  0 14 ]   [ 27 ]   [−0.12 ]

As we will see in chapter 4, with this choice of basis functions the resulting coefficients hold much information on the local orientation of the neighborhood.
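The example can be reproduced numerically. A short numpy sketch, with the neighborhoods entered in the same point order as the matrix B of (3.12):

```python
import numpy as np

# Points ordered as in the matrix B of (3.12).
coords = [(x, y) for x in (-1, 0, 1) for y in (-1, 0, 1)]
B = np.array([[1, x, y, x*x, x*y, y*y] for (x, y) in coords], dtype=float)

a = np.array([1, 2, 1, 2, 4, 2, 1, 2, 1], dtype=float)   # applicability (3.11)
f = np.array([1, 6, 3, 1, 2, 2, 2, 3, 6], dtype=float)   # signal neighborhood
c = np.array([2, 0, 1, 2, 2, 2, 2, 2, 1], dtype=float)   # certainty neighborhood

w = a * c
G = B.T @ (w[:, None] * B)           # B* Wa Wc B
h = B.T @ (w * f)                    # B* Wa Wc f
r = np.linalg.solve(G, h)            # equation (3.7)

assert np.allclose(G[0], [26, 4, -2, 10, 0, 14])
assert np.allclose(h, [55, 17, 7, 27, 1, 27])
assert np.allclose(np.round(r, 2), [1.81, 0.72, 0.86, 0.85, 0.41, -0.12])
```

The asserts reproduce the first row of the matrix in (3.14), the right hand side, and the resulting coefficients.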
To conclude this example, we reconstruct the projection of the signal, Br, and reshape it to a 3 × 3 neighborhood:

    [ 1.36 1.94 2.28 ]
    [ 0.83 1.81 2.55 ]                                                 (3.15)
    [ 1.99 3.38 4.52 ]

To get the result of Normalized Convolution at all points of the signal, we repeat the above process at each point.

3.5 Output Certainty

To be consistent with the signal/certainty philosophy, the result of Normalized Convolution should of course be accompanied by an output certainty. Unfortunately, this is for the most part an open problem. Factors that ought to influence the output certainty at a given point include

1. the amount of input certainty in the neighborhood,
2. the sensitivity of the result to noise, and
3. to which extent the signal can be described by the basis functions.

The sensitivity to noise is smallest when the basis functions are orthogonal⁶, because the resulting coefficients are essentially independent. Should two basis functions be almost parallel, on the other hand, they both tend to get relatively large coefficients, and input noise in a certain direction gets amplified. Two possible measures of output certainty have been published, by Westelius [59] and Karlholm [35] respectively. Westelius has used

    cout = (det G / det G₀)^(1/m),                                     (3.16)

while Karlholm has used

    cout = 1 / (‖G₀‖₂ ‖G⁻¹‖₂).                                         (3.17)

In both expressions we have

    G = B∗WaWcB  and  G₀ = B∗WaB,                                      (3.18)

where G₀ is the same as G if the certainty is identically one. Both these measures take points 1 and 2 above into account. A disadvantage, however, is that they give a single certainty value for all the resulting coefficients, which makes sense with respect to 1 but not with respect to the sensitivity issues. Clearly, if we have two basis functions that are nearly parallel, but the rest of them are orthogonal, we have good reason to mistrust the coefficients corresponding to the two almost parallel basis functions, but not necessarily the rest of the coefficients.
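Both measures are straightforward to compute from G and G₀. A minimal numpy sketch with illustrative random basis functions and applicability; `output_certainties` is a hypothetical helper name:

```python
import numpy as np

rng = np.random.default_rng(6)

n, m = 9, 3
B = rng.standard_normal((n, m))
a = rng.uniform(0.1, 1.0, n)

def output_certainties(c):
    """Westelius' measure (3.16) and Karlholm's measure (3.17)."""
    G = B.T @ np.diag(a * c) @ B          # B* Wa Wc B
    G0 = B.T @ np.diag(a) @ B             # B* Wa B (certainty identically one)
    c_w = (np.linalg.det(G) / np.linalg.det(G0)) ** (1.0 / B.shape[1])
    c_k = 1.0 / (np.linalg.norm(G0, 2) * np.linalg.norm(np.linalg.inv(G), 2))
    return c_w, c_k

full = output_certainties(np.ones(n))
weak = output_certainties(np.full(n, 0.25))

# Uniformly lower input certainty should not increase either measure;
# here it scales Westelius' measure by 0.25 and Karlholm's by 1/4.
assert weak[0] <= full[0] + 1e-12
assert weak[1] <= full[1] + 1e-12
```

With full certainty, Westelius' measure equals one and Karlholm's measure reduces to the reciprocal spectral condition number of G₀.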
A natural measure of how well the signal can be described by the basis functions is given by the residual error in (3.2) or (3.3),

    ‖Br − f‖W.                                                         (3.19)

In order to be used as a measure of output certainty, some normalization with respect to the amplitude of the signal and the input certainty should be performed. One thing to be aware of in the search for a good measure of output certainty is that it probably must depend on the application, or more precisely, on how the result is further processed.

⁶ Remember that orthogonality depends on the inner product, which in turn depends on the certainty and the applicability.

3.6 Normalized Differential Convolution

When doing signal analysis, it may be important to be invariant to certain irrelevant features. A typical example can be seen in chapter 4, where we want to estimate the local orientation of a multidimensional signal. It is clear that the local DC level gives no information about the orientation, but we cannot simply ignore it because it would affect the computation of the interesting features. The solution is to include the features to which we wish to be invariant in the signal model. This means that we expand the set of basis functions, but ignore the corresponding coefficients in the result. Since we do not care about some of the resulting coefficients, it may seem wasteful to use (3.7), which computes all of them. To avoid this we start by partitioning

    B = [B₁ B₂]  and  r = [r₁],                                        (3.20)
                          [r₂]

where B₁ contains the basis functions we are interested in, B₂ contains the basis functions to which we wish to be invariant, and r₁ and r₂ are the corresponding parts of the resulting coefficients.
Now we can rewrite (3.7) in partitioned form as

    [r₁]   [ B₁∗WaWcB₁  B₁∗WaWcB₂ ]⁻¹ [ B₁∗WaWcf ]
    [r₂] = [ B₂∗WaWcB₁  B₂∗WaWcB₂ ]   [ B₂∗WaWcf ]                     (3.21)

and continue the expansion with an explicit expression for the inverse of a partitioned matrix [34]:

    [ A  C ]⁻¹   [ A⁻¹ + E∆⁻¹F   −E∆⁻¹ ]
    [ C∗ B ]   = [    −∆⁻¹F        ∆⁻¹ ],                              (3.22)

    ∆ = B − C∗A⁻¹C,   E = A⁻¹C,   F = C∗A⁻¹.

The resulting algorithm is called Normalized Differential Convolution [35, 38, 40, 60, 61]. The primary advantage over the expression for Normalized Convolution is that we get smaller matrices to invert, but on the other hand we need to actually compute the inverses here⁷, instead of just solving a single linear equation system, and there are also more matrix multiplications to perform. It seems unlikely that it would be worth the extra complications to avoid computing the uninteresting coefficients, unless B₁ and B₂ contain only a single vector each, in which case the expression for Normalized Differential Convolution simplifies considerably. In the following chapters we use the basic Normalized Convolution, even if we are not interested in all the coefficients.

⁷ This is not quite true, since it is sufficient to compute factorizations that allow us to solve corresponding linear equation systems, but we need to solve several of these instead of just one.

3.7 Reduction to Ordinary Convolution

If we have the situation that the certainty field remains fixed while the signal varies, we can save a lot of work by precomputing the matrices

    B̃∗ = (B∗WaWcB)⁻¹B∗WaWc                                            (3.23)

at every point, at least if we can afford the extra storage necessary. A possible scenario for this situation is that we have a sensor array where we know that certain sensor elements are not working or give less reliable measurements. Another case, which is very common, is that we simply do not have any certainty information at all and can do no better than setting the certainty for all values to one.
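The partitioned computation of Normalized Differential Convolution, equations (3.21) – (3.22), can be checked against the full solution (3.7). A minimal numpy sketch with illustrative random data:

```python
import numpy as np

rng = np.random.default_rng(7)

n = 9
B1 = rng.standard_normal((n, 2))      # basis functions of interest
B2 = rng.standard_normal((n, 2))      # basis functions to be invariant to
B = np.hstack([B1, B2])
f = rng.standard_normal(n)
w = rng.uniform(0.1, 1.0, n)          # combined weights a*c
W2 = np.diag(w)

# Full Normalized Convolution, equation (3.7).
r = np.linalg.solve(B.T @ W2 @ B, B.T @ W2 @ f)

# Partitioned form (3.21) with the block inverse (3.22); only r1 is formed.
A = B1.T @ W2 @ B1
C = B1.T @ W2 @ B2
D = B2.T @ W2 @ B2
g1 = B1.T @ W2 @ f
g2 = B2.T @ W2 @ f
Ai = np.linalg.inv(A)
E = Ai @ C                             # E = A^-1 C
Delta = D - C.T @ Ai @ C               # Delta = B - C* A^-1 C
Di = np.linalg.inv(Delta)
r1 = (Ai + E @ Di @ E.T) @ g1 - E @ Di @ g2
assert np.allclose(r1, r[:2])
```

For real symmetric blocks, F = C∗A⁻¹ is simply the transpose of E, which the code exploits.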
Notice, however, that if the extent of the signal is limited, we have certainty zero outside the border. In this case we have the same certainty vector for many neighborhoods and only have to compute and store a small number of different B̃. As can be suspected from the notation, B̃ can be interpreted as a dual basis matrix. Unfortunately it is not the weighted dual subspace basis given by (2.53), because the resulting coefficients are computed by (b̃i, f) rather than by using the proper⁸ inner product (b̃i, f)W. We will still use the term dual vectors here, although somewhat improperly.

⁸ Proper with respect to the norm used in the minimization (3.3).

If we assume that we have constant certainty one, and restrict ourselves to computing Normalized Convolution for the part of the signal that is sufficiently far from the border, we can reduce Normalized Convolution to ordinary convolution. At each point the result can be computed as

    r = B̃∗f                                                           (3.24)

or coefficient by coefficient as

    ri = (b̃i, f).                                                      (3.25)

Extending these computations over all points under consideration, we can write

    ri(x) = (b̃i, Tx f),                                                (3.26)

where Tx is a translation operator, Tx f(u) = f(u + x). This expression can in turn be rewritten as a convolution

    ri(x) = (b̃ˇi ∗ f)(x),                                              (3.27)

where we let b̃ˇi denote the conjugated and reflected b̃i. The need to reflect and conjugate the dual basis functions in order to get convolution kernels is a complication that we would prefer to avoid. We can do this by replacing the convolution with an unnormalized cross correlation, using the notation from Bracewell [9],

    (g ⋆ h)(x) = Σu ḡ(u) h(u + x).                                     (3.28)

With this operation, (3.27) can be rewritten as

    ri(x) = (b̃i ⋆ f)(x).                                               (3.29)

The cross correlation is in fact a more natural operation to use in this context than ordinary convolution, since we are not much interested in the properties that otherwise give convolution an advantage. We have, e.g., no use for the property
that g ∗ h = h ∗ g, since we have a marked asymmetry between the signal and the basis functions. The ordinary convolution is, however, a much more well known operation, so while we will use the cross correlation further on, it is useful to remember that we get the corresponding convolution kernels simply by conjugating and reflecting the dual basis functions. To get a better understanding of the dual basis functions, we can rewrite (3.23), with Wc = I, as

    [b̃₁ b̃₂ … b̃ₘ] = [a·b₁ a·b₂ … a·bₘ] G⁻¹,                            (3.30)

where G = B∗WaB. Hence we obtain the duals as linear combinations of the basis functions bi, windowed by the applicability a. The role of G⁻¹ is to compensate for dependencies between the basis functions when they are not orthogonal. Notice that this includes non-orthogonalities caused by the windowing by a. A concrete example of dual basis functions can be found in section 4.7.1.

3.8 Application Examples

Applications where Normalized Convolution has been used include interpolation [40, 60, 64], frequency estimation [39], estimation of texture gradients [50], depth segmentation [49, 51], phase-based stereopsis and focus-of-attention control [59, 62], and orientation and motion estimation [35]. In the two latter applications, Normalized Convolution is utilized to compute quadrature filter responses on uncertain data.

3.8.1 Normalized Averaging

The most striking example is perhaps, despite its simplicity, normalized averaging to interpolate missing samples in an image. We illustrate this technique with a partial reconstruction of the example given in [22, 40, 60, 64]. In figure 3.1(a) the well-known Lena image has been degraded so that only 10% of the pixels remain.⁹ The remaining pixels have been selected randomly with uniform distribution from a 512 × 512 grayscale original.
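Before turning to the example, the reduction of section 3.7 — precomputed dual basis functions applied by cross correlation, equations (3.29) and (3.30) — can be checked in a minimal 1-D numpy sketch with an illustrative signal and applicability, and constant certainty one:

```python
import numpy as np

rng = np.random.default_rng(8)

signal = rng.standard_normal(50)
x = np.arange(-2, 3, dtype=float)
B = np.column_stack([np.ones(5), x, x**2])     # local model {1, x, x^2}
a = np.array([1.0, 2.0, 3.0, 2.0, 1.0])        # applicability

# Precomputed duals, equation (3.30): [b~_1 ... b~_m] = Wa B G^-1, Wc = I.
G = B.T @ np.diag(a) @ B
B_dual = np.diag(a) @ B @ np.linalg.inv(G)

# Far from the border, each coefficient is the cross correlation (3.29)
# of a dual basis function with the signal.
for k in range(10, 15):
    nb = signal[k - 2:k + 3]
    r_direct = np.linalg.solve(G, B.T @ (a * nb))   # equation (3.7)
    r_corr = B_dual.T @ nb                          # (b~_i ? f)(k)
    assert np.allclose(r_direct, r_corr)
```

For a whole signal, the inner products B_dual.T @ nb over all positions are exactly the sliding cross correlations of the dual functions with the signal.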
⁹ The removed pixels have been replaced by zeros, i.e. black. For illustration purposes, missing samples are rendered white in figures 3.1(a) and 3.3(a).

Standard convolution with a smoothing filter, given by figure 3.2(a), leads to a highly non-satisfactory result, figure 3.1(b), because no compensation is made for the variation in local sampling density. An ad hoc solution to this problem would be to divide the previous convolution result by the convolution between the smoothing filter and the certainty field, with the latter being an estimate of the local sampling density.

This idea can easily be formalized by means of Normalized Convolution. The signal and the certainty are already given. We use a single basis function, a constant one, and use the smoothing filter as the applicability.¹⁰ The result of this operation, figure 3.1(c), can be interpreted as a weighted and normalized average of the pixels present in the neighborhood, and is identical to the ad hoc solution above. In figure 3.1(d) we see the result of normalized averaging with a more localized applicability, given by figure 3.2(b).

Figure 3.1: Normalized averaging. (a) Degraded test image, only 10% of the pixels remain. (b) Standard convolution with smoothing filter. (c) Normalized averaging with applicability given by figure 3.2(a). (d) Normalized averaging with applicability given by figure 3.2(b).

Figure 3.2: Applicability functions used for normalized averaging. (a) a = cos²(πr/16) for r < 8, 0 otherwise. (b) a = 1 for r < 1, a = 0.5 r⁻³ for 1 ≤ r < 8, 0 otherwise.

Figure 3.3: Normalized averaging on an inhomogeneously sampled image. (a) Degraded test image, only 4% of the pixels remain. (b) Normalized averaging with applicability given by figure 3.2(b).
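Normalized averaging thus amounts to dividing two smoothing results: the applicability applied to the certainty-masked signal c·f, and the applicability applied to the certainty c alone. A minimal numpy sketch (the function names and the toy constant image are illustrative, not the thesis implementation):

```python
import numpy as np

def smooth(img, k):
    # separable smoothing: 1D convolution along rows, then along columns
    tmp = np.apply_along_axis(lambda v: np.convolve(v, k, mode='same'), 1, img)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode='same'), 0, tmp)

def normalized_average(f, c, k):
    # Normalized Convolution with the single basis function 1:
    # a weighted mean of the known samples in each neighborhood
    num = smooth(c * f, k)
    den = smooth(c, k)
    return np.where(den > 0, num / np.maximum(den, 1e-12), 0.0)

# toy example: a constant image where only a sixth of the samples are kept
f = np.full((8, 8), 5.0)
c = np.zeros((8, 8)); c[::2, ::3] = 1.0   # certainty: 1 = known, 0 = missing
k = np.array([1.0, 2.0, 1.0]) / 4.0       # small symmetric applicability
out = normalized_average(f, c, k)
# wherever a known sample falls inside the window, the constant 5 is recovered
```

Plain smoothing of c·f underestimates the signal wherever samples are missing; the division by the smoothed certainty is exactly the compensation for local sampling density described above.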
To expand on the example, we notice that instead of having a uniform distribution of the remaining pixels, it would be advantageous to have more samples in areas of high contrast. Figure 3.3(a) is such a test image, containing only 4% of the original pixels. The result of normalized averaging, with applicability given by figure 3.2(b), is shown in figure 3.3(b).

3.8.2 The Cubic Facet Model

In the cubic facet model [24], it is assumed that in each neighborhood of an image, the signal can be described by a cubic polynomial

f(x, y) = k₁ + k₂x + k₃y + k₄x² + k₅xy + k₆y² + k₇x³ + k₈x²y + k₉xy² + k₁₀y³.    (3.31)

The coefficients {kᵢ} are determined by a least squares fit within a square window of some size. A typical application of the cubic facet model is to estimate the image derivatives from the polynomial model and to use these to compute the curvature

κ = (fₓ² f_yy + f_y² fₓₓ − 2fₓ f_y fₓ_y) / (fₓ² + f_y²)^(3/2) = 2(k₂²k₆ + k₃²k₄ − k₂k₃k₅) / (k₂² + k₃²)^(3/2).    (3.32)

We see that the cubic facet model has much in common with Normalized Convolution, except that it lacks provision for certainty and applicability. Hence we can regard this model as a special case of Normalized Convolution, with third degree polynomials as basis functions, certainty identically one, and applicability identically one on a square window. We can also note that in the computation of the curvature by equation (3.32), some of the estimated coefficients are not used, which can be compared to the idea of Normalized Differential Convolution, section 3.6.

Facet models in general¹¹ can of course also be described in terms of Normalized Convolution, by changing the set of basis functions accordingly. Applications for the facet model include gradient edge detection, zero-crossing edge detection, image segmentation, line detection, corner detection, three-dimensional shape estimation from shading, and determination of optic flow [25].
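The facet fit itself is an ordinary unweighted least squares problem. A small numpy sketch, assuming a 5×5 window (the window size and function names are illustrative): the ten monomial coefficients of (3.31) are fitted with `np.linalg.lstsq`, and the curvature of (3.32) is then read off from k₂…k₆.

```python
import numpy as np

# monomial exponents of the cubic facet basis, in the order k1..k10 of (3.31)
powers = [(0,0), (1,0), (0,1), (2,0), (1,1), (0,2), (3,0), (2,1), (1,2), (0,3)]

n = 2                                    # hypothetical 5x5 window
t = np.arange(-n, n + 1, dtype=float)
X, Y = np.meshgrid(t, t)
B = np.stack([X**p * Y**q for p, q in powers], axis=-1).reshape(-1, 10)

def facet_fit(patch):
    # unweighted fit: certainty 1 and applicability 1 on the square window
    k, *_ = np.linalg.lstsq(B, patch.reshape(-1), rcond=None)
    return k

def curvature(k):
    # kappa from equation (3.32), using f_x = k2, f_y = k3 etc. at the origin
    k2, k3, k4, k5, k6 = k[1], k[2], k[3], k[4], k[5]
    return 2 * (k2**2 * k6 + k3**2 * k4 - k2 * k3 * k5) / (k2**2 + k3**2)**1.5

# a patch sampled from an exact cubic is reproduced exactly
patch = 1 + 2*X + 3*Y + 0.5*X**2 - X*Y + 0.25*Y**3
k = facet_fit(patch)
```

Since the applicability is identically one on the window, this is the special case of Normalized Convolution described above; replacing the implicit square window by an explicit weight turns it into the general form.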
By extension, the same approaches can be taken with Normalized Convolution and probably with better results, since the availability of the applicability mechanism allows better control of the process. As discussed in the following section, an appropriate choice of applicability is especially important if we want to estimate orientation.¹²

¹⁰ Notice that by equation (3.30), the equivalent correlation kernel when we have constant certainty is given by a multiple of the applicability, since we have only one basis function, which is a constant.
¹¹ With other basis functions than the cubic polynomials.
¹² Notice in particular the dramatic difference between a square applicability, implicitly used by the facet model, and a Gaussian applicability, in table 4.3.

3.9 Choosing the Applicability

The choice of applicability depends very much on the application. It is in fact all but impossible to give general guidelines. For most applications, however, it seems more or less unavoidable that we wish to give higher importance to points in the center of the neighborhood than to points farther away. Thus the applicability should be monotonically decreasing in all directions.

Another property to be aware of is isotropy. Unless a specific direction dependence is wanted, one had better take care to obtain an isotropic applicability. This is, in fact, of utmost importance in the orientation estimation algorithm presented in chapter 4, see in particular section 4.10.1.

If we look at a specific application, the normalized averaging from section 3.8.1, we can see a trade-off between excessive blurring with a wide applicability function and noise caused by the varying certainty with a narrow applicability. The motivation for the very narrow applicability in figure 3.2(a) is that we want to interpolate from values as close as possible to the point of interest and more or less ignore information farther away.
In other applications it is necessary to have a wider applicability, because we actually want to analyze a whole neighborhood, e.g. to estimate orientation. In these cases the size of the applicability is related to the scale of the analyzed features. Another reason for a wider applicability is to become less sensitive to signal noise.

3.10 Further Generalizations of Normalized Convolution

In the formulation of Normalized Convolution, it is traditionally assumed that the local signal model is spanned by a set of vectors constituting a subspace basis. As we have already discussed in section 3.2, this assumption is not without complications, since the vectors may effectively become linearly dependent in the seminorm given by W. This typically happens in areas with large amounts of missing data.

A first generalization is therefore to allow linearly dependent vectors in the signal model, i.e. exchanging the subspace basis for a subspace frame. Except that we lose the simplifications to the expressions (3.7) and (3.8), this case has already been covered by the presentation in section 3.2.

With a subspace frame instead of a subspace basis, another possible generalization is to use a weighted norm L in the coefficient space instead of the Euclidean norm, i.e. generalizing the seminorm weighted general linear least squares problem (3.2) somewhat to

arg min_{r∈S} ‖r‖_L,   S = {r ∈ Cᵐ ; ‖Br − f‖_W is minimum}.    (3.33)

If we require L to be positive definite, the solution is now given by (2.61) as

r = L⁻¹(WBL⁻¹)†Wf    (3.34)

or by (2.62) as

r = L⁻¹(L⁻¹B*W²BL⁻¹)†L⁻¹B*W²f.    (3.35)

If we allow L to be semidefinite we have to resort to the solution given by (2.60). If L is diagonal, the elements can be interpreted as the relative cost of using each subspace frame vector.
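Equation (3.34) can be spot-checked numerically. A small numpy sketch, assuming B has full column rank and L is symmetric positive definite: in that case the minimizer of ‖Br − f‖_W is unique, so the result must coincide with the ordinary weighted least squares solution whatever L is.

```python
import numpy as np

def nc_solution(B, f, W, L):
    # r = L^{-1} (W B L^{-1})^dagger W f,  equation (3.34)
    Li = np.linalg.inv(L)
    return Li @ np.linalg.pinv(W @ B @ Li) @ (W @ f)

rng = np.random.default_rng(1)
B = rng.standard_normal((20, 6))         # full column rank with probability 1
f = rng.standard_normal(20)
W = np.diag(rng.uniform(0.1, 1.0, 20))   # certainty/applicability weights
L = np.diag(rng.uniform(0.5, 2.0, 6))    # coefficient-space weighting

r = nc_solution(B, f, W, L)
# for full-rank B the weighted LS minimizer is unique, so r must solve the
# weighted normal equations  B^T W^2 B r = B^T W^2 f  regardless of L
r_ls = np.linalg.solve(B.T @ W @ W @ B, B.T @ W @ W @ f)
```

The interesting behavior of L appears only when the columns of WB are (nearly) dependent, in which case the pseudoinverse selects the minimum-‖·‖_L coefficient vector among the minimizers.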
The diagonal case is not very interesting, however, since the same effect could be achieved simply by an amplitude scaling of the frame vectors. A more general L allows varying the costs for specific linear combinations of the subspace frame vectors, leading to more interesting possibilities. That it would be pointless to introduce L in the case of a subspace basis is clear from section 2.4.3, since it would not affect the solution at all, unless we have the case where the seminorm W turns the basis vectors effectively linearly dependent.

Correspondingly, it does not make much sense to use a basis or a frame for the whole space as signal model, since in this case the weighting by W would be superfluous; the error to be minimized in (3.2) would be zero regardless of norm. Hence neither the certainty nor the applicability would make any difference to the solution.

Another generalization that could be solved by the framework from chapter 2 is to have a non-diagonal weight matrix W. Unfortunately we still have no good idea what interpretation this could have, but it is possible that the certainty part Wc could naturally be non-diagonal if the primary measurements of the signal were collected, e.g., in the frequency domain or as line integrals.

A different generalization, not covered by the framework from the previous chapter, is to replace the general linear least squares problem (3.2) with a simultaneous minimization of signal error and coefficient norm,

arg min_r  α‖Br − f‖_W + β‖r‖.    (3.36)

This approach could possibly be more robust when the basis functions are nearly linearly dependent, but we will not investigate it further here.

Chapter 4  Orientation Estimation

4.1 Introduction

Orientation is a feature that fundamentally distinguishes multidimensional signals from one-dimensional signals, since the concept lacks meaning in the latter case.
It is also a feature that is far from trivial both to represent and to estimate, as well as to define strictly for general signals. The one case where there is no question how the orientation should be defined is for non-constant simple signals, i.e. signals that can be written as

f(x) = h(xᵀn)    (4.1)

for some non-constant function h of one variable and some vector n. This means that the function is constant on all hyperplanes perpendicular to n, and we say that the signal is oriented in the direction of n. Notice, however, that n is not unique in (4.1), since we could replace it by any nonzero multiple. Even if we normalize n to get the unit directional vector n̂, there is still an ambiguity between n̂ and −n̂. This is one problem that must be addressed by the representation of orientation.

Of course the class of globally simple signals is too restricted to be of much use, so we need to generalize the definition of orientation to more arbitrary signals. To begin with we notice that we are usually not interested in a global orientation. In fact it is understood throughout the rest of this thesis that by “orientation” we mean “local orientation”, i.e. we only look at the signal behavior in some neighborhood of a point of interest. We can, however, still not rely on always having locally simple neighborhoods.

Unfortunately there is no obvious way to generalize the definition of orientation to non-simple signals. Assume for example that we have a signal composed as the sum of two simple signals with different orientations. Should the orientation now be some mean value of the two orientations, both orientations, or something else? To illustrate what kind of problems we have here, we take a closer look at two examples. Both are two-dimensional and we are interested in the orientation in a neighborhood of the origin.

1. Let f₁(x, y) = x and f₂(x, y) = y.
Now the sum f(x, y) = f₁(x, y) + f₂(x, y) = x + y is simple as well, oriented in the (1 1)ᵀ direction.

2. Let f₁(x, y) = x² and f₂(x, y) = y². This time the sum f(x, y) = f₁(x, y) + f₂(x, y) = x² + y² is perfectly isotropic and we cannot possibly prefer one direction over another.

In practice, exactly how we define the orientation of non-simple signals tends to be a consequence of how we choose to represent the orientation and what procedure we use to estimate it. In this presentation we represent orientation by tensors, essentially along the lines of Knutsson [22, 36], although with a somewhat different interpretation. The method used here for estimation of the orientation is completely novel and based on Normalized Convolution with a polynomial basis. A related concept is the inertia tensor by Bigün [4].

4.2 The Orientation Tensor

In this section we give an overview of Knutsson’s orientation tensor representation and estimation by means of quadrature filter responses [22, 36]. It should be noted that this estimation method is included only for reference and comparison. The estimation method used in this thesis is described in sections 4.4 and 4.5.

4.2.1 Representation of Orientation for Simple Signals

The orientation tensor is a representation of orientation that for N-dimensional signals takes the form of an N × N real symmetric matrix.¹ A simple signal in the direction n, as defined by equation (4.1), is represented by the tensor

T = An̂n̂ᵀ,    (4.2)

where A is some constant that may encode other information than orientation, such as certainty or local signal energy. It is obvious that this representation maps n̂ and −n̂ to the same tensor and that the orientation can be recovered from the eigensystem of T. By design the orientation tensor satisfies the following two conditions:

Invariance: The normalized tensor T̂ = T/‖T‖ does not depend on the function h in equation (4.1).

Equivariance: The orientation tensor locally preserves the angle metric of the original space, i.e.

‖δT̂‖ ∝ ‖δn̂‖.    (4.3)

The tensor norm used above is the Frobenius norm, ‖T‖² = tr(TᵀT).

¹ Symmetric matrices constitute a subclass of tensors. Readers who are more familiar with matrix algebra than with tensor algebra may safely substitute “symmetric matrix” for “tensor” throughout this chapter.

4.2.2 Estimation

The orientation tensor can be computed by means of the responses of a set of quadrature filters. Each quadrature filter is spherically separable and real in the Fourier domain,

Fₖ(u) = R(‖u‖)Dₖ(û),    (4.4)

where the radial function R can be chosen more or less arbitrarily, with design restrictions given by desired center frequency, bandwidth, locality, and scale. The directional function is given by

Dₖ(û) = (û · n̂ₖ)²  if û · n̂ₖ > 0,  and 0 otherwise,    (4.5)

where {n̂ₖ} is a set of direction vectors, usually evenly distributed in the signal space. It turns out that the minimum number of filters is 3 in 2D, 6 in 3D, and 12 in 4D. The orientation tensor is constructed from the magnitudes of the filter responses {qₖ} at each point by

T = Σₖ |qₖ| Mₖ,    (4.6)

where {Mₖ} are the duals of the outer product tensors {n̂ₖn̂ₖᵀ}.

4.2.3 Interpretation for Non-Simple Signals

The above construction is guaranteed to give a tensor as defined in equation (4.2) only for simple signals. For non-simple signals the tensor is analyzed by means of the eigenvalue decomposition, which can be written as

T = Σₖ λₖ êₖ êₖᵀ,    (4.7)

where λ₁ ≥ λ₂ ≥ … ≥ λ_N are the eigenvalues and {êₖ} are the corresponding eigenvectors. In 3D, e.g., this can be rewritten as

T = (λ₁ − λ₂)ê₁ê₁ᵀ + (λ₂ − λ₃)(ê₁ê₁ᵀ + ê₂ê₂ᵀ) + λ₃I.    (4.8)

The tensor is here represented as a linear combination of three tensors. The first corresponds to a simple neighborhood, i.e. locally planar, the second to a rank 2 neighborhood, i.e.
locally constant on lines, and the last term corresponds to an isotropic neighborhood, e.g. non-directed noise. For further details the reader is referred to [22].

4.3 Orientation Functionals

Here we take the view that the orientation tensor is an instance of the new concept of orientation functionals, defined below. Let U denote the set of unit vectors in Rᴺ,

U = {u ∈ Rᴺ ; ‖u‖ = 1}.    (4.9)

An orientation functional φ is a mapping

φ : U → R⁺ ∪ {0}    (4.10)

that to each direction vector assigns a non-negative real value. The value is interpreted as a measure of how well the signal locally is consistent with an orientation hypothesis in the given direction. Since we do not distinguish between two opposite directions, we require that φ be even, i.e. that

φ(−u) = φ(u),  all u ∈ U.    (4.11)

We also set some restrictions on the mapping from signal neighborhoods to the associated orientation functionals. The signal f is assumed to be expressed in a local coordinate system, so that we always discuss the local orientation at the origin.

1. Assume that the signal is rotated around the origin, so that f(x) is replaced by f̃(x) = f(Rx), where R is a rotation matrix. Then the orientation functional φ̃ associated to f̃ should relate to φ by φ̃(u) = φ(Ru), i.e. be rotated in the same way. This relation should also hold for other orthogonal matrices R, characterized by RᵀR = I. These matrices represent isometric transformations, which in addition to rotations also include reflections and combinations of rotation and reflection.

2. In directions along which the signal is constant, φ should be zero.

3. For a simple signal in the direction n̂, φ should have its maximum value for n̂ and −n̂. It should also decrease monotonically as the angle to the closer of these two directions increases.

4. If a constant is added to the signal, φ should not change, i.e. the orientation functional should be invariant to the DC level.

5.
If the signal is multiplied by a positive constant α, f̃(x) = αf(x), the new orientation functional should be proportional to the old one, φ̃(u) = βφ(u), where the positive constant β is not necessarily equal to α but should vary monotonically with α.

6. If the signal values are negated, φ should remain unchanged.

One might also think that the orientation functional should remain unchanged if the signal is scaled. This is, however, not the case. Orientation is a local property and a signal may look completely different at different scales. Thus we have the following non-requirement.

7. If the signal is uniformly scaled, f̃(x) = f(λx), |λ| ≠ 1, no additional restriction is set on the behavior of φ.

To transform an orientation tensor into an orientation functional, we simply use the construction

φ_T(u) = uᵀTu.    (4.12)

Hence the orientation tensors are the subclass of orientation functionals which are positive semidefinite² quadratic forms in u. Orientation functionals can in 2D and 3D be illustrated by polar plots, as shown in figure 4.1. For a generalization of the orientation functional concept, see section 6.1.

Figure 4.1: Polar plots of orientation functionals; (a) and (b) show φ(u) = uᵀTu for two example tensors T.

4.4 Signal Model

For estimation of orientation we use the assumption that local projection onto second degree polynomials gives sufficient information. Thus we have the local signal model, expressed in a local coordinate system,

f(x) ∼ xᵀAx + bᵀx + c,    (4.13)

where A is a symmetric matrix, b a vector and c a scalar. The coefficients of the model can be estimated in terms of Normalized Convolution with the basis functions

{1, x, y, x², y², xy}    (4.14)

for the 2D case, and obvious generalizations to higher dimensionalities.

² As it happens, the estimation method described in section 4.2.2 can sometimes yield an indefinite tensor. We will not consider that case further.
The relation between the coefficients {rᵢ} obtained from Normalized Convolution and the signal model (4.13) is straightforward. In e.g. 2D we have

c = r₁,   b = (r₂, r₃)ᵀ,   A = [r₄, r₆/2; r₆/2, r₅],    (4.15)

so that

xᵀAx + bᵀx + c = r₁ + r₂x + r₃y + r₄x² + r₅y² + r₆xy,  with x = (x, y)ᵀ.    (4.16)

The choice of applicability can in principle be made freely, but as we will see in sections 4.6 and 4.10 it is important that it is isotropic; in particular the Gaussian applicability turns out to be very useful, see section 4.7. Local orientation can be estimated for signal structures of different scale, and the size of the applicability is directly related to this scale.

We can notice that A captures information about the even part of the signal, excluding DC, that b captures information about the odd part of the signal, and that c varies with the local DC level. While the latter gives no information about orientation, it is necessary to include it in the signal model because otherwise the DC level would affect the computation of A.

Although the use of Normalized Convolution allows us to have signals with varying certainty, we are not going to explore this case in any depth, mainly limiting ourselves to border effects. As a consequence we can be certain that the basis functions are linearly independent³ and that we can use equation (3.8) to compute Normalized Convolution. As noted in section 3.2.4, this equation can also be used for continuous signals if we introduce a suitable inner product. Here we use the standard L² inner product

(f, g) = ∫_{Rᴺ} f(x)g(x) dx.    (4.17)

Although we do not limit ourselves to L² functions, it is assumed that all integrals are convergent, typically by requiring that the applicability has finite support or decreases exponentially, while the basis functions and the signals are bounded by some polynomial.
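With constant certainty, the projection onto the basis (4.14) reduces to a weighted least squares fit over the neighborhood. A minimal numpy sketch, assuming unit certainty, a Gaussian applicability, and a hypothetical 7×7 window (all names are illustrative):

```python
import numpy as np

n = 3                                   # neighborhood radius, 7x7 window
t = np.arange(-n, n + 1, dtype=float)
X, Y = np.meshgrid(t, t)                # X varies along columns, Y along rows
a = np.exp(-0.5 * (X**2 + Y**2))        # isotropic Gaussian applicability

# basis functions {1, x, y, x^2, y^2, xy} as columns of B
B = np.stack([np.ones_like(X), X, Y, X**2, Y**2, X * Y], axis=-1).reshape(-1, 6)
Wa = a.reshape(-1)                      # applicability as a diagonal weight

def project(f_patch):
    # r = (B^T Wa B)^{-1} B^T Wa f, cf. equation (3.8) with unit certainty
    G = B.T @ (Wa[:, None] * B)
    return np.linalg.solve(G, B.T @ (Wa * f_patch.reshape(-1)))

# a patch that is exactly a quadratic polynomial is reproduced exactly
f = 1 + 2*X + 3*Y + 4*X**2 + 5*Y**2 + 6*X*Y
r = project(f)
# r == [1, 2, 3, 4, 5, 6]: c = r[0], b = r[1:3], A = [[r[3], r[5]/2], [r[5]/2, r[4]]]
```

The trailing comment shows how the six coefficients map back to c, b, and A according to equation (4.15).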
³ Unless we are using an applicability of very small size, e.g. in 2D an applicability with only five non-zero values, which certainly is not sufficient for the six basis functions.

4.5 Construction of the Orientation Tensor

To find out how the orientation tensor should be constructed from the coefficients of the local signal model, we start by studying purely linear and quadratic neighborhoods.

4.5.1 Linear Neighborhoods

A linear neighborhood can always be written as

f(x) = bᵀx    (4.18)

for some vector b. Obviously this means that the signal is simple, with orientation given by b. It should be clear that we get a suitable orientation tensor from the construction

T = bbᵀ.    (4.19)

An illustration of a linear neighborhood in 2D is given in figure 4.2.

Figure 4.2: Linear neighborhood, f(x, y) = x + 2y.

4.5.2 Quadratic Neighborhoods

For quadratic neighborhoods,

f(x) = xᵀAx,    (4.20)

the situation is more complicated. These neighborhoods are simple if and only if A is of rank 1. To get an idea of how to deal with higher rank neighborhoods, we take a look at four different neighborhoods in 2D, depicted in figure 4.3. In (a) we have f(x, y) = x², a simple signal, so the orientation is clearly horizontal. In (b), where f(x, y) = x² + 0.5y², the horizontal direction still dominates, but less distinctly. In (c) we have the perfectly isotropic neighborhood f(x, y) = x² + y², where no direction can be preferred. The signal illustrated in (d), f(x, y) = x² − y², is more confusing. Although it can be argued that it is constant on the two lines y = ±x, this is not sufficient to consider it a simple signal in either direction. In fact we treat this signal too as completely isotropic, in a local orientation sense. Analogously to the linear case we get a suitable orientation tensor by the construction

T = AAᵀ.
(4.21)

Figure 4.3: Quadratic neighborhoods. (a) f(x, y) = x². (b) f(x, y) = x² + 0.5y². (c) f(x, y) = x² + y². (d) f(x, y) = x² − y².

The tensors corresponding to the quadratic neighborhoods in figure 4.3 are given in figure 4.4 together with their polar plots.

Figure 4.4: Tensors corresponding to quadratic neighborhoods. (a) T = [1, 0; 0, 0]. (b) T = [1, 0; 0, 0.25]. (c) T = [1, 0; 0, 1]. (d) T = [1, 0; 0, 1].

4.5.3 General Neighborhoods

For a general neighborhood we have the local signal model

f(x) ∼ xᵀAx + bᵀx + c.    (4.22)

Here we can add the tensors which would be obtained from the linear and quadratic components separately, i.e.

T = AAᵀ + γbbᵀ,    (4.23)

where γ is a non-negative weight factor. An example of a linear plus quadratic neighborhood is given in figure 4.5 together with the polar plot of the corresponding tensor for γ = 0.25. As we can see from the example, the proper value of γ depends on the scale at which we study the orientation. At a small scale the linear component should dominate, while at a large scale the quadratic part is more significant. In general the value of γ should decrease when the size of the neighborhood under consideration becomes larger. Further discussion on the choice of γ can be found in sections 4.6 and 4.10.3.

Figure 4.5: Linear plus quadratic neighborhood and corresponding tensor. (a) f(x, y) = 3x + y². (b) T = [2.25, 0; 0, 1].

4.6 Properties of the Estimated Tensor

Ideally we would like the estimated tensor to exactly satisfy the requirements of an orientation functional, listed in section 4.3.
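The construction (4.23) is a one-liner; as a sanity check, a small numpy sketch reproduces the tensor of figure 4.5 from the model coefficients of f(x, y) = 3x + y² with γ = 0.25:

```python
import numpy as np

def orientation_tensor(A, b, gamma):
    # T = A A^T + gamma b b^T, equation (4.23)
    return A @ A.T + gamma * np.outer(b, b)

# example of figure 4.5: f(x, y) = 3x + y^2
A = np.array([[0.0, 0.0],
              [0.0, 1.0]])   # quadratic part y^2
b = np.array([3.0, 0.0])     # linear part 3x
T = orientation_tensor(A, b, gamma=0.25)
# T == [[2.25, 0], [0, 1]], as in figure 4.5(b)
```

The dominant eigenvector of T here is the x axis, i.e. at this γ the linear component outweighs the quadratic one, consistent with the scale discussion above.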
This is indeed the case if we restrict ourselves to the ideal case of continuous signals with constant certainty and require that the applicability be isotropic.

To begin with we can notice that from the construction of the tensor, T = AAᵀ + γbbᵀ, it is clear that T is symmetric and positive semidefinite. Thus the corresponding functional φ_T(u) = uᵀTu maps to non-negative real values and is even, because φ_T(−u) = (−u)ᵀT(−u) = uᵀTu = φ_T(u).

To verify the numbered requirements in section 4.3 we assume that we have a signal f, a projection onto a quadratic polynomial according to equation (4.13), the corresponding tensor T given by equation (4.23), and the corresponding orientation functional φ(u) = uᵀTu.

1. If f is rotated to yield f̃(x) = f(Rx), the projection onto a quadratic polynomial is rotated similarly,

f̃(x) ∼ (Rx)ᵀA(Rx) + bᵀRx + c = xᵀ(RᵀAR)x + (Rᵀb)ᵀx + c.    (4.24)

This follows from the fact that the set of quadratic polynomials is closed under rotation and the assumptions that the certainty is constant (and thus isotropic) and that the applicability is isotropic. Now we get the tensor corresponding to f̃ by

T̃ = (RᵀAR)(RᵀAR)ᵀ + γ(Rᵀb)(Rᵀb)ᵀ = RᵀAAᵀR + γRᵀbbᵀR = RᵀTR.    (4.25)

From this it follows that

φ̃(u) = uᵀT̃u = uᵀRᵀTRu = (Ru)ᵀT(Ru) = φ(Ru).    (4.26)

The only property of R that we have used above is RRᵀ = I, so this derivation is equally valid for other isometric transformations.

2. Assume that f is constant along the first coordinate axis, and let u₁ be the corresponding direction vector. Then f does not depend on the first variable and neither does the projection onto a quadratic polynomial. Thus we have Au₁ = Aᵀu₁ = 0 and bᵀu₁ = 0, so that

φ(u₁) = u₁ᵀAAᵀu₁ + γu₁ᵀbbᵀu₁ = 0.    (4.27)

If f is constant along some other direction, the conclusion still follows because property 1 allows us to rotate this direction onto u₁.

3.
If f is N-dimensional and simple in the direction n̂, there is a set of N − 1 orthogonal directions along which it is constant. From property 2 it follows that these directions are eigenvectors of T corresponding to the eigenvalue zero, and as a consequence T is at most of rank one. Hence we have

T = αn̂n̂ᵀ,    (4.28)
φ(u) = αuᵀn̂n̂ᵀu = α(n̂ᵀu)² = α cos²θ,    (4.29)

for some non-negative α, where θ is the angle between n̂ and u.

4. If a constant is added to f, this only affects the value of c in the projection onto a quadratic polynomial. Since c is discarded in the construction of T, the tensor remains unchanged.

5. If the amplitude of the signal is multiplied by a constant α, f̃(x) = αf(x), the projection is multiplied by the same constant, i.e.

f̃(x) ∼ xᵀ(αA)x + (αb)ᵀx + αc.    (4.30)

Hence the new tensor is given by

T̃ = (αA)(αA)ᵀ + γ(αb)(αb)ᵀ = α²T    (4.31)

and the corresponding orientation functional by

φ̃(u) = α²φ(u).    (4.32)

6. If the signal values are negated, the tensor is unchanged. This follows from equation (4.31) with α = −1.

7. If the signal is uniformly scaled, f̃(x) = f(λx), things become more complicated. To begin with, if the signal is a quadratic polynomial, we have

f̃(x) = (λx)ᵀA(λx) + bᵀ(λx) + c = xᵀ(λ²A)x + (λb)ᵀx + c,    (4.33)

and the new orientation tensor is given by

T̃ = λ⁴AAᵀ + λ²γbbᵀ.    (4.34)

Hence the relative weight between the quadratic and linear parts of the tensor is altered. For a general signal the projection onto a polynomial may change arbitrarily, because it may look completely different at varying scales. In the case where the applicability is scaled identically with the signal, however, the projection is scaled according to equation (4.33), and to get a new tensor proportional to the old one we need to scale the weight factor γ by λ².

In practice, with discretized signals of limited extent, we cannot guarantee that all of these requirements be perfectly fulfilled.
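In the ideal setting, properties 1 and 7 follow from pure matrix algebra and can be spot-checked numerically (a small numpy verification of equations (4.25) and (4.34); the random coefficients are of course only illustrative):

```python
import numpy as np

def tensor(A, b, gamma):
    # T = A A^T + gamma b b^T, equation (4.23)
    return A @ A.T + gamma * np.outer(b, b)

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 2)); A = 0.5 * (A + A.T)   # symmetric quadratic part
b = rng.standard_normal(2)
gamma = 0.25
T = tensor(A, b, gamma)

# rotation: A -> R^T A R, b -> R^T b gives T -> R^T T R (equation 4.25)
th = np.pi / 5
R = np.array([[np.cos(th), -np.sin(th)], [np.sin(th), np.cos(th)]])
T_rot = tensor(R.T @ A @ R, R.T @ b, gamma)
assert np.allclose(T_rot, R.T @ T @ R)

# uniform scaling: A -> lam^2 A, b -> lam b (equation 4.33); scaling gamma
# by lam^2 makes the new tensor proportional (lam^4 times) to the old one
lam = 1.7
T_scaled = tensor(lam**2 * A, lam * b, lam**2 * gamma)
assert np.allclose(T_scaled, lam**4 * T)
```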
The primary reason is that we cannot even perform an arbitrary rotation of a discretized signal without introducing errors, so we cannot really hope to do any better with the orientation descriptor.

4.7 Fast Implementation

A drawback with Normalized Convolution is that it is computationally demanding. If we assume that the certainty is constant, however, it can be reduced to ordinary convolution or cross correlation⁴, as shown in section 3.7. To improve further on the computational complexity, it turns out that the resulting correlation kernels are Cartesian separable for suitable choices of applicability.

⁴ We prefer to use the latter operation, but remember that a real correlation kernel can always be transformed into a convolution kernel simply by reflecting it.

4.7.1 Equivalent Correlation Kernels

To find the equivalent correlation kernels, assuming constant certainty, we need to compute the dual basis functions according to equation (3.30),

(b̃₁  b̃₂  …  b̃ₘ) = (a·b₁  a·b₂  …  a·bₘ) G⁻¹,    (4.35)

where

G = [ (a·b₁, b₁)   …  (a·b₁, bₘ)
          ⋮        ⋱      ⋮
      (a·bₘ, b₁)   …  (a·bₘ, bₘ) ].    (4.36)

Now we get the coefficients in the signal model (4.13) by the cross correlation

rᵢ(x) = (b̃ᵢ ⋆ f)(x),    (4.37)

where we can skip i = 1, since the DC level is not used in the tensor construction.

To illustrate these equivalent correlation kernels, figure 4.6 shows the six basis functions for 2D. Figure 4.7 shows the corresponding dual basis functions for a Gaussian applicability, a(x) = e^(−0.5xᵀx), on a 9 × 9 grid. Figure 4.8, finally, shows the Fourier transforms of the equivalent correlation kernels.

4.7.2 Cartesian Separability

It turns out that all the correlation kernels in figure 4.7, except the uninteresting one corresponding to the constant basis function, have the property that they are Cartesian separable, i.e.
that they can each be decomposed as the outer product of two 1D kernels, one horizontal and one vertical. This means that each cross correlation can be computed by means of two consecutive 1D cross correlations, which computationally is significantly more efficient than a full 2D cross correlation. This advantage is even more important for higher dimensionalities than two.

If we have a Cartesian separable applicability, we can see that the products {a·bₖ} in equation (4.35) also have that property, because the basis functions obviously are Cartesian separable. This means that the polynomial coefficients, for signals of any dimensionality, can be computed solely by 1D correlations, since (4.35) and (4.37) together give us

r(x) = G⁻¹ [ ((a·b₁) ⋆ f)(x), …, ((a·bₘ) ⋆ f)(x) ]ᵀ.    (4.38)

The next step is to explore the structures of G and G⁻¹. It turns out that the structures become extremely simple if we restrict ourselves to applicabilities

Figure 4.6: Basis functions in 2D: (a) 1, (b) x, (c) y, (d) x², (e) y², (f) xy.

Figure 4.7: Dual basis functions in 2D: (a) 1, (b) x, (c) y, (d) x², (e) y², (f) xy.
which are even and identical along all axes, i.e. invariant under reflection and permutation of the axes. Then most of the inner products {(a·bk, bl)} in equation (4.36) become zero, since most of the products {a·bk·bl} are odd along at least one axis. In fact, the only non-zero inner products are {(a·bk, bk)}, {(a·b1, b_{x_i²})}, and {(a·b_{x_i²}, b_{x_j²})}. Thus we have the structure of G, illustrated in the 3D case with the basis ordered as {1, x, y, z, x², y², z², xy, xz, yz} and zero entries marked by ·,

        1   x   y   z   x²  y²  z²  xy  xz  yz
  1   ( a   ·   ·   ·   b   b   b   ·   ·   · )
  x   ( ·   b   ·   ·   ·   ·   ·   ·   ·   · )
  y   ( ·   ·   b   ·   ·   ·   ·   ·   ·   · )
  z   ( ·   ·   ·   b   ·   ·   ·   ·   ·   · )
  x²  ( b   ·   ·   ·   c   d   d   ·   ·   · )
  y²  ( b   ·   ·   ·   d   c   d   ·   ·   · )   = G.   (4.39)
  z²  ( b   ·   ·   ·   d   d   c   ·   ·   · )
  xy  ( ·   ·   ·   ·   ·   ·   ·   d   ·   · )
  xz  ( ·   ·   ·   ·   ·   ·   ·   ·   d   · )
  yz  ( ·   ·   ·   ·   ·   ·   ·   ·   ·   d )

Surprisingly enough we get an even simpler structure for the inverse (where the letters now denote new constant values),

        1   x   y   z   x²  y²  z²  xy  xz  yz
  1   ( a   ·   ·   ·   e   e   e   ·   ·   · )
  x   ( ·   b   ·   ·   ·   ·   ·   ·   ·   · )
  y   ( ·   ·   b   ·   ·   ·   ·   ·   ·   · )
  z   ( ·   ·   ·   b   ·   ·   ·   ·   ·   · )
  x²  ( e   ·   ·   ·   c   ·   ·   ·   ·   · )
  y²  ( e   ·   ·   ·   ·   c   ·   ·   ·   · )   = G⁻¹.   (4.40)
  z²  ( e   ·   ·   ·   ·   ·   c   ·   ·   · )
  xy  ( ·   ·   ·   ·   ·   ·   ·   d   ·   · )
  xz  ( ·   ·   ·   ·   ·   ·   ·   ·   d   · )
  yz  ( ·   ·   ·   ·   ·   ·   ·   ·   ·   d )

This result is proved in appendix A. Now we can see that all the dual basis functions except the first one⁵ are indeed separable. By (4.38) and (4.40) the duals can be written as

  b̃1 = a · (a b1 + e Σᵢ b_{x_i²}),
  b̃_{x_i} = b a · b_{x_i},
  b̃_{x_i²} = a · (e b1 + c b_{x_i²}) = β a · (b_{x_i²} − α b1),   (4.41)
  b̃_{x_i x_j} = d a · b_{x_i x_j},   i ≠ j,

where we can notice that the constant α turns out to have precisely the value which makes the DC response of those dual basis functions zero.

[Footnote 5: Notice that we always have the constant function as b1. Thus there is no ambiguity whether the subscript refers to the position number or the zeroth degree monomial.]

This should come as
no surprise since we included the constant one among the basis functions precisely for this purpose.

The final step to get an efficient correlator structure is to notice that the decompositions into 1D correlation kernels have a lot of common factors. Figure 4.9 shows how the correlations for 3D can be structured hierarchically in three levels, where the first level contains correlations in the x direction, the second in the y direction, and the third in the z direction. The results are the correlations {((a·bk) ⋆ f)(x)} and the desired polynomial coefficients are then computed from equation (4.38) using the precomputed G⁻¹ according to equation (4.40). It should be noted that only three different correlation kernels are in use, ax, ax·x, and ax·x² in the x direction and identical kernels in the other directions. These kernels are illustrated in figure 4.10. It should also be clear that this correlator structure straightforwardly can be extended to any dimensionality.

[Figure 4.9: Correlator structure; a three-level tree of 1D correlations, with the x-direction kernels {1, x, x²} at the top, y-direction kernels below them, and z-direction kernels at the bottom. There is understood to be an applicability factor in each box as well.]

This scheme can be improved somewhat by replacing x² with x² − α, doing the same for the other squares, and then removing the leftmost box of the bottom level. Additionally the remaining coefficients in G⁻¹ could be multiplied into the bottom level kernels to save another few multiplications. We do not consider these improvements in the complexity analysis in the following section.

Finally one may ask oneself whether there exist applicabilities which are simultaneously Cartesian separable and isotropic. Obviously the Gaussians, e^(−ρ xᵀx), satisfy these requirements.
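As a concrete illustration of the separable correlator idea, the following NumPy/SciPy sketch (in 2D; the grid size, Gaussian applicability, and test image are arbitrary choices) computes two of the correlations {((a·bk) ⋆ f)(x)} as an x pass followed by a y pass, and checks the results against direct correlation with the full 2D kernels:

```python
import numpy as np
from scipy.ndimage import correlate, correlate1d

n = 9
c = (np.arange(n) - n // 2).astype(float)
ax = np.exp(-0.5 * c**2)                        # 1D Gaussian applicability factor
kern = {'1': ax, 'x': ax * c, 'x2': ax * c**2}  # the three 1D kernels of figure 4.10
f = np.random.default_rng(0).standard_normal((32, 32))

# First level: correlations in the x direction (axis 0), shared by all products.
xpass = {k: correlate1d(f, w, axis=0) for k, w in kern.items()}
# Second level: correlations in the y direction on top of the x results.
r_xy = correlate1d(xpass['x'], kern['x'], axis=1)   # kernel (ax*x) outer (ay*y)
r_x2 = correlate1d(xpass['x2'], kern['1'], axis=1)  # kernel (ax*x^2) outer ay

# Same results as correlating with the full 2D kernels directly:
assert np.allclose(r_xy, correlate(f, np.outer(kern['x'], kern['x'])))
assert np.allclose(r_x2, correlate(f, np.outer(kern['x2'], kern['1'])))
```

Note how the x pass is computed once and reused by all kernels at the next level, which is exactly the saving exploited by the tree structure.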
Actually this is the only solution, which is proved in appendix B.

4.8 Computational Complexity

The computational complexity of the tensor estimation algorithm depends on a number of factors, such as

• the dimensionality d of the signal space,
• the size of the applicability per dimension, n,
• whether the certainty is assumed to be constant, and
• whether the applicability is separable and sufficiently symmetric.

[Figure 4.10: One dimensional correlation kernels; panels (a) ax·1, (b) ax·x, (c) ax·x².]

Here we consider the following four variations of the method:

Normalized Convolution (NC) The certainty is allowed to vary and the applicability is arbitrary. The polynomial coefficients are computed by equation (3.9), using the point by point strategy. It should be noticed that there are a number of duplicates among the quantities {a·bk·bl}, reducing the number of inner products that must be computed.

Correlation (C) The certainty is assumed to be constant (ignoring border effects) while the applicability is arbitrary. The polynomial coefficients are computed by equation (4.37).

Separable Correlation (SC) The certainty is assumed to be constant and the applicability to be Cartesian separable and sufficiently symmetric. The polynomial coefficients are computed by the correlator structure in figure 4.9.

Separable Normalized Convolution (SNC) With varying certainty but Cartesian separable applicability, all inner products in equation (3.9) can be computed as separable correlations. They can in fact even be computed hierarchically with a more complex variation of the structure in figure 4.9. The 2D case is illustrated in appendix C.
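The Correlation variant can be sketched directly from equations (4.35)–(4.37). In this NumPy sketch (grid size, Gaussian applicability, and test coefficients are arbitrary choices), correlating the dual basis kernels with a signal that happens to lie in the model space recovers its polynomial coefficients exactly at the center point, and the structural claims of section 4.7.2 can be verified as a byproduct:

```python
import numpy as np

n = 9
c = np.arange(n) - n // 2
x, y = np.meshgrid(c, c, indexing='ij')
a = np.exp(-0.5 * (x**2 + y**2)).ravel()        # Gaussian applicability
# The 2D basis {1, x, y, x^2, y^2, xy} as columns of B:
B = np.stack([np.ones(n * n), x.ravel(), y.ravel(),
              x.ravel()**2, y.ravel()**2, (x * y).ravel()], axis=1)

G = B.T @ (a[:, None] * B)                      # G_kl = (a*b_k, b_l), eq (4.36)
duals = (a[:, None] * B) @ np.linalg.inv(G)     # dual basis functions, eq (4.35)

# A signal drawn from the model over the neighborhood:
coeffs = np.array([2.0, 3.0, -1.0, 0.5, -1.5, 0.25])
f = B @ coeffs
r = duals.T @ f                                 # eq (4.37), evaluated at the center
assert np.allclose(r, coeffs)

# The duals of the quadratic basis functions have zero DC response, and the
# duals (here the xy one) are Cartesian separable, i.e. rank 1 as n x n kernels:
assert abs(duals[:, 3].sum()) < 1e-10
s = np.linalg.svd(duals[:, 5].reshape(n, n), compute_uv=False)
assert s[1] / s[0] < 1e-8
```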
Independent of method we always have m = (d+1)(d+2)/2 basis functions, and the computation of the tensor from the polynomial coefficients requires d(d+1)(d+2)/2 multiplications and slightly fewer additions per point. In general we count the complexity per computed tensor and only the number of multiplications involved; usually there is a slightly lower number of additions as well. This is consistent with the traditional count of coefficients for convolution kernels.

Without going into details we present the asymptotic complexities, for both d and n large, in table 4.1. Memory overhead should be multiplied by the number of tensors to be computed, to get the necessary size of temporary storage, measured in floating point values of the desired precision.

  Method   Time complexity       Memory overhead
  NC       (d⁴/24) nᵈ            0
  C        (d²/2) nᵈ             d²/2
  SC       (d³/6) n              d²/2
  SNC      (d⁵/120) n + d⁶/48    d⁴/24

  Table 4.1: Asymptotic complexities, d and n large, leading terms.

Usually, however, we are more interested in small values of d rather than in large values. A more detailed estimation of the complexity for 2D, 3D, and 4D can be found in table 4.2.

  Method   Time complexity                         Memory overhead
           d = 2        d = 3        d = 4         d = 2   d = 3   d = 4
  NC       21n² + 105   45n³ + 351   85n⁴ + 966    0       0       0
  C        5n² + 12     9n³ + 30     14n⁴ + 60     6       10      15
  SC       9n + 19      19n + 42     34n + 78      6       10      15
  SNC      29n + 105    74n + 351    159n + 966    21      45      85

  Table 4.2: Time complexity and memory overhead for 2D, 3D, and 4D.

The values relate to reasonably straightforward implementations of the methods and can likely be improved somewhat. The first term of the time complexities is the total number of coefficients involved in the correlation kernels, while the second is the count for the tensor construction from the correlation results. Included in the latter part for NC and SNC is the solution of an m × m symmetric positive definite equation system, estimated at m³/6 + 3m²/2 + m/3 operations [45].
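The leading terms of the SC rows in table 4.2 can be reproduced by counting kernels in the tree of figure 4.9: level k holds one length-n kernel per monomial of degree at most two in the first k variables, i.e. C(k+2, 2) kernels. (This counting argument is our own illustration of the table, not taken from the text.)

```python
from math import comb

def sc_kernel_coefficients(d, n):
    # One length-n 1D kernel per tree node; level k has comb(k + 2, 2) nodes.
    return n * sum(comb(k + 2, 2) for k in range(1, d + 1))

# Reproduces the leading terms 9n, 19n, 34n of the SC rows in table 4.2:
assert [sc_kernel_coefficients(d, 1) for d in (2, 3, 4)] == [9, 19, 34]
```

The sum behaves like d³/6 for large d, which matches the SC entry of table 4.1.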
To sum this analysis up, it is clear that Separable Correlation is by far the most efficient method. The restriction set on the applicability is no limitation, because those properties are desired anyway. The requirement of constant certainty is a problem, however, since such an assumption surely fails at least in a neighborhood of the borders, and the method is more than likely to yield significantly biased tensors there. Proper attention to the vanishing certainty outside the border is paid by the NC and SNC methods, which on the other hand have a high time complexity and a large memory overhead, respectively. A good solution for signals with constant certainty would be to use Separable Correlation for the inner part of the signal and Normalized Convolution or Separable Normalized Convolution for the border parts. It must be stressed, however, that while Normalized Convolution will reduce the negative impact of missing information outside the borders, it will certainly not remove it completely. The best solution is, as always, to keep away from those parts as much as possible.

4.9 Relation to First and Second Derivatives

By the Maclaurin expansion, a sufficiently differentiable signal can in a neighborhood of the origin be expanded as

  f(x) = f(0) + (∇f)ᵀx + ½ xᵀHx + O(‖x‖³),   (4.42)

where the gradient ∇f contains the first derivatives of f at the origin and the Hessian H contains the second derivatives,

       ( f_{x1}(0) )        ( f_{x1x1}(0)  ...  f_{x1xn}(0) )
  ∇f = (    ⋮      ),   H = (     ⋮         ⋱        ⋮      ).   (4.43)
       ( f_{xn}(0) )        ( f_{xnx1}(0)  ...  f_{xnxn}(0) )

Clearly this expansion looks identical to the signal model (4.13) with A = ½H, b = ∇f, and c = f(0). There is, however, conceptually an important difference. The Maclaurin expansion is intended to be correct for an infinitesimal neighborhood, while our signal model is intended to approximate the signal over a larger neighborhood, specified by the applicability.
The Maclaurin expansion also has the principal problem that the mathematical definition of derivatives requires signal values arbitrarily close to the origin, which are not available for discretized signals. Another complication is that perfect differentiation would be extremely sensitive to noise. One way to get around this, which also allows computing “derivatives at different scales”, is to first convolve the signal with a Gaussian,

  h = f ∗ g,   g(x) = e^(−xᵀx/(2σ²)),   (4.44)

and then differentiate the filtered signal h. By the laws of convolution, the partial derivatives of h can be computed as f convolved with the partial derivatives of g, which are known explicitly. In e.g. 2D we have

  h_x = f ∗ g_x,     g_x = −(x/σ²) g,
  h_y = f ∗ g_y,     g_y = −(y/σ²) g,
  h_xx = f ∗ g_xx,   g_xx = (x²/σ⁴ − 1/σ²) g,   (4.45)
  h_xy = f ∗ g_xy,   g_xy = (xy/σ⁴) g,
  h_yy = f ∗ g_yy,   g_yy = (y²/σ⁴ − 1/σ²) g,

and we can see that the structure of the partial derivatives of g agrees with the dual basis functions in (4.41) for Gaussian applicability. These functions are also illustrated in figure 4.7. We would like to stress, however, that this fact is purely coincidental and an effect of the special properties of the Gaussians. For other applicabilities we do not have this relation. Likewise we cannot compute the tensor from the responses of any filter set that implements some approximation of first and second derivatives and automatically assume that it will have all the good properties that we have derived for the orientation tensors. Still it may help the intuition to think of A and b in terms of image derivatives.

4.10 Evaluation

The tensor estimation algorithm has been evaluated on a 3D test volume consisting of concentric spherical shells. The volume is 64 × 64 × 64 and selected slices are displayed in figure 4.11. Except at the center of the volume the signal is locally planar and all possible orientations are present.
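The agreement between the Gaussian derivative filters of (4.45) and the duals of (4.41) is easy to confirm numerically; a NumPy sketch with σ = 1 on a 9 × 9 grid (both arbitrary choices):

```python
import numpy as np

n, sigma = 9, 1.0
c = np.arange(n) - n // 2
x, y = np.meshgrid(c, c, indexing='ij')
g = np.exp(-(x**2 + y**2) / (2 * sigma**2))     # Gaussian, also the applicability
a = g.ravel()
B = np.stack([np.ones(n * n), x.ravel(), y.ravel(),
              x.ravel()**2, y.ravel()**2, (x * y).ravel()], axis=1)
duals = (a[:, None] * B) @ np.linalg.inv(B.T @ (a[:, None] * B))

gx = (-x / sigma**2 * g).ravel()      # sampled g_x from (4.45)
gxy = (x * y / sigma**4 * g).ravel()  # sampled g_xy from (4.45)

def cos_sim(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# b~_x and b~_xy are proportional to g_x and g_xy (up to sign, since a
# correlation kernel is the reflection of the corresponding convolution kernel):
assert abs(cos_sim(duals[:, 1], gx)) > 1 - 1e-10
assert abs(cos_sim(duals[:, 5], gxy)) > 1 - 1e-10
```

As the text stresses, this exact proportionality is a special property of the Gaussian; repeating the check with another applicability breaks it.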
[Figure 4.11: Slices from the 64-cube test volume; panels (a) slice 5, (b) slice 14, (c) slice 21, (d) slice 32.]

[Figure 4.12: White noise added to slice 32 of the test volume; panels (a) SNR = 10 dB, (b) SNR = 0 dB.]

As in [2, 37] the performance of the tensors is measured by an angular RMS error

  ∆φ = arcsin √( (1/2L) Σ_{l=1}^{L} ‖x̂x̂ᵀ − ê1ê1ᵀ‖² ),   (4.46)

where x̂ is a unit vector in the correct orientation, ê1 is the eigenvector corresponding to the largest eigenvalue of the estimated tensor, and L is the number of points. To avoid border effects and irregularities at the center of the volume, the sum is only computed for points at a radius between 0.16 and 0.84, with respect to normalized coordinates. As is shown in appendix D, the angular RMS error can equivalently be written as

  ∆φ = arccos √( (1/L) Σ_{l=1}^{L} (x̂ᵀê1)² ).   (4.47)

Although the slices in figure 4.11 may give the impression that the volume contains structures at a wide range of scales, this is not the case from a 3D perspective. As can be seen from slice 32, the distance between two shells varies between about 3 and 6 pixels within the tested part of the volume. Hence it is possible to obtain very good performance by orientation estimation at a single scale. The algorithm has also been tested on degraded versions of the test volume, where white noise has been added to get a signal to noise ratio of 10 dB and 0 dB respectively. One slice of each of these is shown in figure 4.12.

4.10.1 The Importance of Isotropy

As we saw in section 4.6, isotropy is a theoretically important property of the applicability. To test this in practice a number of different applicabilities have been evaluated. The test set consists of:

• Cubes of four different sizes, with sides being 3, 5, 7, and 9 pixels wide.
• A sphere of radius 3.5 pixels.
• The same sphere but oversampled, i.e. sampled regularly at 10 × 10 points per pixel and then averaged.
  The result is a removal of jaggies at the edges and a more isotropic applicability.
• A 3D cone of radius 4 pixels.
• The same cone oversampled.
• A “tent” shape, 8 pixels wide, oversampled.
• A Gaussian with standard deviation 1.2, sampled at 9 × 9 × 9 points.

The first nine applicabilities are illustrated in figure 4.13 in form of their 2D counterparts. The Gaussian can be found in figure 4.14 (b). The results are listed in table 4.3, and we can see that the cube and tent shapes, which are highly anisotropic⁶, perform significantly worse than the more isotropic ones. This is of particular interest since the cube applicability corresponds to the naive use of an unweighted subspace projection; cf. the cubic facet model, discussed in section 3.8.2.

  Shape                  ∞        10 dB    0 dB
  Cube 3 × 3 × 3         3.74°    7.27°    24.06°
  Cube 5 × 5 × 5         13.50°   14.16°   18.48°
  Cube 7 × 7 × 7         22.99°   23.57°   27.30°
  Cube 9 × 9 × 9         30.22°   30.64°   33.62°
  Sphere                 6.69°    8.20°    15.34°
  Sphere, oversampled    0.85°    5.78°    14.30°
  Cone                   1.39°    6.10°    13.89°
  Cone, oversampled      0.28°    5.89°    14.13°
  Tent, oversampled      21.38°   21.86°   25.16°
  Gaussian, σ = 1.2      0.17°    3.53°    10.88°

  Table 4.3: Evaluation of different applicabilities.

The main reason that the cubes are anisotropic is that they extend farther into the corners than along the axes. The spheres and the cones eliminate this phenomenon by being cut off at some radius. Still there is a marked improvement in isotropy when these shapes are oversampled, which can clearly be seen in the results from the noise-free volume. The difference between the spheres and the cones is that the latter have a gradual decline in the importance given to points farther away from the center.

[Footnote 6: The 3 × 3 × 3 cube is actually too small to be significantly anisotropic.]
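The equivalence of the two error measures (4.46) and (4.47), proved in appendix D, follows from ‖x̂x̂ᵀ − ê1ê1ᵀ‖² = 2 − 2(x̂ᵀê1)² for unit vectors, and can be confirmed numerically; in this sketch random unit vectors stand in for the true and estimated orientations:

```python
import numpy as np

rng = np.random.default_rng(1)
L = 1000
xh = rng.standard_normal((L, 3))
xh /= np.linalg.norm(xh, axis=1, keepdims=True)   # true orientations x^
e1 = rng.standard_normal((L, 3))
e1 /= np.linalg.norm(e1, axis=1, keepdims=True)   # estimated orientations e^_1

frob2 = [np.sum((np.outer(u, u) - np.outer(v, v))**2) for u, v in zip(xh, e1)]
dphi_sin = np.arcsin(np.sqrt(np.sum(frob2) / (2 * L)))              # eq (4.46)
dphi_cos = np.arccos(np.sqrt(np.mean(np.sum(xh * e1, axis=1)**2)))  # eq (4.47)
assert np.isclose(dphi_sin, dphi_cos)
```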
[Figure 4.13: Applicabilities used to test orientation estimation; panels (a) Cube 3 × 3 × 3, (b) Cube 5 × 5 × 5, (c) Cube 7 × 7 × 7, (d) Cube 9 × 9 × 9, (e) Sphere, radius 3.5, (f) Sphere, oversampled, (g) Cone, radius 4, (h) Cone, oversampled, (i) Tent, oversampled.]

[Figure 4.14: Gaussian applicabilities with different standard deviations; panels (a) σ = 0.3, (b) σ = 1.2, (c) σ = 4.8.]

We can see that this makes a difference, primarily when there is no noise, but that the significance of isotropy is much larger can clearly be seen from the poor results of the tent shape. The Gaussian, finally, turns out to have superior performance, which is very fortunate considering that this shape is separable and therefore can be computed with the fast algorithm from section 4.7.2.

4.10.2 Gaussian Applicabilities

With Gaussian applicabilities there is only one parameter to vary, the standard deviation σ. The Gaussian must be truncated, however, and with the separable algorithm the truncation is implicitly made to a cube of some size. Figure 4.14 shows three Gaussians with widely varying standard deviations, truncated to a cube with side 9. There are three aspects to note with respect to the choice of σ:

1. The size of the applicability should match the scale of the structures we want to estimate orientation for.
2. For small applicabilities the estimation is typically more sensitive to noise than for larger ones.
3. If the standard deviation is large relative to the size to which the Gaussian is truncated, the contributions from the corners tend to make the applicability anisotropic, as is illustrated in figure 4.14 (c).
Fortunately the Gaussian decreases very fast sufficiently far from the origin, so with a proper choice of the truncation size the Gaussian remains very isotropic. The results are shown in figure 4.15. It is noteworthy that the anisotropic tendencies affect the performance of the algorithm, in the absence of noise, quite significantly already for σ about 1.5.

[Figure 4.15: Angular errors for Gaussians with varying standard deviations. The solid lines refer to 9 × 9 × 9 kernels while the dashed lines refer to 11 × 11 × 11 kernels. The three curve pairs are for 0, 10, and ∞ SNR respectively.]

4.10.3 Choosing γ

Another parameter in the tensor estimation algorithm is the relative weight γ for the linear and quadratic parts of the signal. In the previous experiments γ has been chosen reasonably, with only a small optimization effort. To see how the value of γ typically affects the performance we have varied γ for a fixed Gaussian applicability with optimal standard deviation, 1.06. The results are shown in figure 4.16. We can clearly see that neither the linear nor the quadratic part is very good on its own, but suitably weighted together they give much better results. We can also see that the linear part on its own works better than the quadratic part in the absence of noise, but that it is more noise sensitive. It is interesting to note here that the linear part, interpreted in terms of derivatives, essentially is the gradient, which is a classic means to estimate orientation.

4.10.4 Best Results

Table 4.4 lists the best results obtained for different sizes of the applicability. All computations have been made with the separable algorithm and σ and γ have

Figure 4.16: (a) Angular errors for varying γ values.
The three curves are for 0, 10, and ∞ SNR respectively. (b) A magnification of the results for the noise-free test volume.

been tuned for each applicability size, n.

  n    ∞       10 dB   0 dB     kernel coefficients   total operations
  3    0.87°   6.69°   23.18°   57                    99
  5    0.37°   3.51°   10.70°   85                    127
  7    0.15°   3.05°   10.26°   133                   175
  9    0.11°   3.03°   10.24°   171                   213
  11   0.11°   3.03°   10.24°   209                   251

  Table 4.4: Smallest angular errors for different kernel sizes.

The results for 9 × 9 × 9 applicabilities, and equivalently kernels of the same effective size, can readily be compared to the results given in [2, 37] for a sequential filter implementation of the quadrature filter based estimation algorithm described in section 4.2.2. As we can see in table 4.5, the algorithm proposed in this thesis performs quite favorably⁷ in the absence of noise while being somewhat more noise sensitive. Additionally it uses only half the number of kernel coefficients.

  SNR      Andersson, Wiklund, Knutsson   Farnebäck
           (345 coeff.)                   (171 coeff.)
  ∞        0.76°                          0.11°
  10 dB    3.02°                          3.03°
  0 dB     9.35°                          10.24°

  Table 4.5: Comparison with Andersson, Wiklund & Knutsson [2, 37].

[Footnote 7: To be fair it should be mentioned that the filters used in [2, 37] are claimed not to have been tuned at all for performance on the test volume. One would still guess that the available parameters have been chosen quite reasonably though.]

4.11 Possible Improvements

Although the orientation estimation algorithm has turned out to work very well, see also the results in section 5.6, there are still a number of areas where it could be improved, or at least designed differently.

4.11.1 Multiple Scales

The algorithm is quite selective with respect to the scale of the structures in the signal, which depending on the application may be either an advantage or a disadvantage. If it is necessary to estimate orientation over a large range of scales, the best solution probably is to compute a number of tensors at distinct scales and
subsequently combine them into a single tensor. Preliminary experiments indicate that it is sufficient to simply add the tensors together, taking the scaling relations of equation (4.34) into account.

4.11.2 Different Radial Functions

The basis functions can in 2D be written in polar coordinates as

  {1, ρ cos φ, ρ sin φ, ρ² cos² φ, ρ² sin² φ, ρ² cos φ sin φ}.   (4.48)

It is easy to show that it is only the angular functions that are essential for the rotation equivariance and other properties of the tensor. The radial functions may well be something else than ρ for the linear basis functions and ρ² for the quadratic ones. One possibility would be to have the radial function ρ for all the basis functions except the constant. As a consequence both parts of the tensor would scale equally if both signal and applicability are scaled, and there would be no need to adjust γ, cf. equation (4.34). Another possibility would be to try to obtain matching radial functions in the Fourier domain, which currently is not the case. One should be aware, however, that changing the radial functions would destroy the separability of the basis functions.

4.11.3 Additional Basis Functions

It would be conceivable to expand the signal model, equation (4.13), e.g. with higher degree polynomials, cf. the cubic facet model discussed in section 3.8.2. It is not obvious that this would actually improve anything, however, and it would certainly increase the computational complexity. To make the increased complexity worthwhile it would probably be necessary to find a way to ensure that the additional basis functions reduce the noise sensitivity of the algorithm, possibly by introducing some kind of redundancy.

Chapter 5
Velocity Estimation

5.1 Introduction

If an image sequence is considered as a spatiotemporal volume, it is possible to use the orientation information in the volume for estimation of the motion in the sequence.
In particular the tensor representation of orientation allows straightforward estimation of the motion, see e.g. [22, 28, 29, 30, 31, 32, 60] and section 5.2. The tensor can also be used more indirectly to provide constraints on the motion in order to estimate parameterized motion models, which is the basis for the methods developed in this chapter. Related approaches have also been used by Karlholm [35]. For overviews of other methods for motion estimation, the reader is referred to [3] and [11].

The algorithms presented in sections 5.3 and 5.4 were developed for my master’s thesis [13, 14], at that time using orientation tensors estimated by quadrature filters and with emphasis on segmentation of the motion field rather than on velocity estimation. The simplified algorithm in section 5.5 and the results in section 5.6 have not been published before. All these algorithms are novel and together with orientation tensors estimated by the algorithms from chapter 4 they give excellent performance, as demonstrated in section 5.6.

5.2 From Orientation to Motion

By stacking the frames of an image sequence onto each other we obtain a spatiotemporal image volume with two spatial dimensions and a third temporal dimension. It is easy to see that there is a strong correspondence between the motions in the image sequence and the orientations in the image volume. A moving line, e.g., is converted into a plane in the volume, and from the orientation of the plane we can recover the velocity component perpendicular to the line. The fact that we can only obtain the perpendicular component is a fundamental limitation known as the aperture problem; the parallel velocity component of a linear structure cannot be determined simply because it does not induce any change in the local signal. A moving point, on the other hand, is converted into a line in the volume, and from the direction of the line we can obtain the true motion.
In terms of orientation tensors, the first case corresponds to a rank 1 tensor, where the largest eigenvector gives the orientation of the planar structure, while the second case corresponds to a rank 2 tensor, where the smallest eigenvector gives the direction along the linear structure. More precisely, with the tensor T expressed by the eigenvalue decomposition as in equation (4.7),

  T = λ1 e1e1ᵀ + λ2 e2e2ᵀ + λ3 e3e3ᵀ,   (5.1)

the velocity in the two cases can be computed by, taken from [22],

  moving line case:
    v_normal = −x3 (x1 ξ̂1 + x2 ξ̂2) / (x1² + x2²),
    x1 = ê1 · ξ̂1,  x2 = ê1 · ξ̂2,  x3 = ê1 · t̂,   (5.2)

and

  moving point case:
    v = (x1 ξ̂1 + x2 ξ̂2) / x3,
    x1 = ê3 · ξ̂1,  x2 = ê3 · ξ̂2,  x3 = ê3 · t̂,   (5.3)

where ξ̂1 and ξ̂2 are the orthogonal unit vectors defining the image plane and t̂ is a unit vector in the time direction.

One problem with this approach to velocity estimation is that we at each point must decide whether we can compute true velocity or have to be content with the normal component. Another problem is robustness. The method is sensitive both to noise and to errors in the tensor estimation. A common method to increase the robustness is averaging of the tensors in a neighborhood of each point, discussed further in section 5.5.

5.3 Estimating a Parameterized Velocity Field

Rather than estimating the velocity from the tensors point by point, we assume that the velocity field over a region can be parameterized according to some motion model and we use all the tensors in the region to compute the parameters. To begin with we assume that we somehow have access to a region in the current frame of the sequence, within which the motion can be described, at least approximately, by some relatively simple parametric motion model.
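Equation (5.3) can be exercised on a synthetic tensor. A point translating with (v_x, v_y) traces a line with direction v = (v_x, v_y, 1)ᵀ; an idealized rank 2 tensor for it is T = I − v̂v̂ᵀ (an assumed stand-in — any tensor whose null space is spanned by v̂ would do), and the standard basis plays the role of ξ̂1, ξ̂2, t̂:

```python
import numpy as np

vx, vy = 1.5, -0.5
v = np.array([vx, vy, 1.0])
vhat = v / np.linalg.norm(v)
T = np.eye(3) - np.outer(vhat, vhat)   # idealized moving point tensor

w, E = np.linalg.eigh(T)               # eigenvalues in ascending order
e3 = E[:, 0]                           # eigenvector of the smallest eigenvalue
x1, x2, x3 = e3                        # components along xi^_1, xi^_2, t^
v_est = np.array([x1, x2]) / x3        # eq (5.3); the sign of e3 cancels
assert np.allclose(v_est, [vx, vy])
```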
5.3.1 Motion Models

The simplest possible motion model is to assume that the velocity is constant over the region,

  v_x(x, y) = a,
  v_y(x, y) = b,   (5.4)

where x and y are the spatial coordinates, v_x and v_y are the x and y components of the velocity, and a and b are the model parameters. Geometrically this motion model corresponds to objects undergoing a pure translation under orthographic projection. A more powerful alternative is the affine motion model,

  v_x(x, y) = ax + by + c,
  v_y(x, y) = dx + ey + f,   (5.5)

which applies to planar patches undergoing rigid body motion, i.e. translation plus rotation, under orthographic projection. To also account for a perspective projection we need the eight parameter motion model,

  v_x(x, y) = a1 + a2 x + a3 y + a7 x² + a8 xy,
  v_y(x, y) = a4 + a5 x + a6 y + a7 xy + a8 y².   (5.6)

The usefulness of these models does of course depend on the application, but it is useful to notice that sufficiently far away most surfaces can be approximated as planar, and if the distance to the scene is much larger than the variation in distance within the scene, perspective projection can be approximated by orthographic projection. More details on the derivation of these motion models can be found in [11]. Of course it would be possible to design other motion models for specific expected velocity fields, but we will only consider those listed above in this presentation. Requirements on the motion models in order to be useful with the methods developed in this chapter are given in section 5.3.3.

5.3.2 Cost Functions

A 2D velocity vector (v_x, v_y)ᵀ, measured in pixels per frame, can be extended to a 3D spatiotemporal directional vector v and a unit directional vector v̂ by

      ( v_x )
  v = ( v_y ),   v̂ = v / ‖v‖.   (5.7)
      (  1  )

Ideally, in the case of a constant translation¹, we obtain a spatiotemporal neighborhood which is constant in the v̂ direction.
By property 2 of section 4.3 we therefore have the constraint that

  φ_T(v̂) = v̂ᵀTv̂ = 0.   (5.8)

[Footnote 1: In the image plane.]

This is consistent with the discussion in section 5.2, since in the moving point case the smallest eigenvector has eigenvalue zero and is parallel to v̂, while in the moving line case all v̂ with the correct normal velocity component satisfy equation (5.8).

In a less ideal case, where the motion is not a translation, where the motion is a translation but it varies over time, or in the presence of noise, we typically have to deal with rank 3 tensors, meaning that the constraint (5.8) cannot be fulfilled. As discussed in section 4.3, the interpretation of the orientation functional φ_T is that the value in a given direction is a measure of how well the signal locally is consistent with an orientation hypothesis in that direction. In this case we are searching for directions along which the signal varies as little as possible and thus we wish to minimize v̂ᵀTv̂. Rewriting the tensor as

  T = λ1 e1e1ᵀ + λ2 e2e2ᵀ + λ3 e3e3ᵀ
    = (λ1 − λ3) e1e1ᵀ + (λ2 − λ3) e2e2ᵀ + λ3 I
    = T̃ + λ3 I,   (5.9)

where λ3 I is the isotropic part of the tensor, cf. section 4.2.3, we can see that

  v̂ᵀTv̂ = v̂ᵀT̃v̂ + λ3 v̂ᵀIv̂ = v̂ᵀT̃v̂ + λ3.   (5.10)

Thus it is clear that the isotropic part of the tensor can be removed without affecting the minimization problem, i.e. the minimum is obtained for the same directions. In fact it is necessary to remove it, because we will see that for computational reasons it is preferable to minimize an expression involving v rather than v̂.² Then we have the minimization of

  vᵀTv = vᵀT̃v + λ3 vᵀv,   (5.11)

which would be clearly biased against large velocities compared to (5.10). Hence we remove the isotropic part of the tensor in a preprocessing step to obtain an isotropy compensated tensor

  T̃ = T − λ3 I.
(5.12)

Notice that this operation does not require a full eigenvalue decomposition of T; it is sufficient to compute the smallest eigenvalue. An efficient algorithm for this is given in appendix E. To simplify the notation it is understood throughout the rest of the chapter that T denotes the preprocessed tensor. Now we can define two cost functions,

  d1(v, T) = vᵀTv,   (5.13)

  d2(v, T) = v̂ᵀTv̂ / tr T = vᵀTv / (‖v‖² tr T),   (5.14)

both giving a statement about to what extent a velocity hypothesis is consistent with a given tensor. A perfect match gives the value zero, while increasing values indicate increasing inconsistencies. The distinction between the two cost functions is that d1 is suitable for minimization over a region, while d2 is more useful for comparisons of the consistency of motion hypotheses at different points.

[Footnote 2: Notice that both problems are well-defined. In the latter case we have the constraint that v̂ be of unit length, while in the former case the last component of v has to be 1.]

5.3.3 Parameter Estimation

Assume now that we again have a region given and that we have a motion model that assigns velocities vi to each point of the region. By summing the costs at each point we obtain

  d_tot = Σi d1(vi, Ti),   (5.15)

giving a total cost for the motion model over the entire region. With a parameterized motion model the next step is to find the parameters that minimize d_tot. To explain this procedure we use the affine motion model (5.5), which can be rewritten as

      ( v_x )   ( a  b  c ) ( x )
  v = ( v_y ) = ( d  e  f ) ( y ) = S p,
      (  1  )   ( 0  0  1 ) ( 1 )

where

      ( x  y  1  0  0  0  0 )
  S = ( 0  0  0  x  y  1  0 ),   p = (a  b  c  d  e  f  1)ᵀ.   (5.16)
      ( 0  0  0  0  0  0  1 )
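A quick numerical check (parameter values arbitrary) that the factorization (5.16) reproduces the affine model (5.5):

```python
import numpy as np

a, b, c, d, e, f = 0.1, -0.2, 0.5, 0.3, 0.0, -1.0   # affine parameters
p = np.array([a, b, c, d, e, f, 1.0])

def S_affine(x, y):
    # S from equation (5.16)
    return np.array([[x, y, 1, 0, 0, 0, 0],
                     [0, 0, 0, x, y, 1, 0],
                     [0, 0, 0, 0, 0, 0, 1]], float)

x, y = 2.0, -3.0
v = S_affine(x, y) @ p
assert np.allclose(v, [a * x + b * y + c, d * x + e * y + f, 1.0])
```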
T p = p Qtot p, (5.18) 68 Velocity Estimation which should be minimized under the constraint that the last element of p be 1. 3 In order to do this we partition p and Qtot as µ ¶ µ ¶ Q̄ q p̄ p= , Qtot = , (5.20) 1 qT α turning (5.18) into dtot (p) = p̄T Q̄p̄ + p̄T q + qT p̄ + α. (5.21) If Q̄ is invertible we can complete the square to get dtot (p) = (p̄ + Q̄−1 q)T Q̄(p̄ + Q̄−1 q) + α − qT Q̄−1 q (5.22) and it is clear that the minimum value α − qT Q̄−1 q (5.23) p̄ = −Q̄−1 q. (5.24) is obtained for If Q̄ should happen to be singular, the minimum value α + qT p̄ is obtained for all solutions to the equation Q̄p̄ = −q (5.25) and to choose between the solutions some additional constraint is needed. One reasonable possibility is to require that the mean squared velocity over the region is taken as small as possible, i.e. minimizing ! Ã X X T T T T (5.26) S̄i S̄i p̄ = p̄T L2 p̄ = kp̄kL , p̄ S̄i S̄i p̄ = p̄ i i where S̄ is S with the last column removed. The solution to this problem can be found in section 2.4.2 or in section 2.5.1 if L should be semidefinite. In order to use this method of parameter estimation, the necessary and sufficient property of the motion model is that it is linear in its parameters. This property is demonstrated by equation (5.16) for the affine motion model. The corresponding matrices S and p for the constant velocity motion model (5.4) are given by a 1 0 0 (5.27) S = 0 1 0 , p = b , 1 0 0 1 3 Now it should be clear why we prefer to minimize an expression involving v T Tv rather than v̂T Tv̂. In the latter case equation 5.18 would be replaced by dtot (p) = X p T S T Ti S i p i pT ST i Si p i and the minimization problem would become substantially harder to solve. (5.19) 5.4 Simultaneous Segmentation and Velocity Estimation 69 and for the eight parameter motion model (5.6) by 1 x y 0 0 0 x2 xy S = 0 0 0 1 x y xy y 2 0 0 0 0 0 0 0 0 ¡ p = a1 a2 a3 a4 a5 a6 a7 0 0 , 1 a8 (5.28) 1 ¢T . 
(5.29) There are two important advantages to estimating the velocity over a whole region rather than point by point. The first advantage is that the effects of noise and inaccuracies in the tensor estimation typically are reduced significantly. The second advantage is that even if the aperture problem is present in some part of the region, information obtained from other parts can help to fill in the missing velocity component. There does remain a possibility that the motion field cannot be uniquely determined, but that requires the signal structures over the whole region to be oriented in such a way that the motion becomes ambiguous; a generalized aperture problem.4 This case is characterized by Q̄ becoming singular, so that equation (5.25) has multiple solutions. The secondary requirement to minimize the mean squared velocity generalizes the idea to compute only the normal velocity component in the case of the ordinary aperture problem. A disadvantage with velocity estimation over a whole region is that it is assumed that the true velocity field is at least reasonably consistent with the chosen motion model. A problem here is that even if we know, e.g. from the geometry of the scene, that the velocity field should be patchwise affine, we still need to obtain regions not covering patches with different motion parameters. There are many possible solutions to this problem, including graylevel segmentation and the ideal case of a priori knowledge of suitable regions. Another solution is given in the following section, where a simultaneous segmentation and velocity estimation algorithm is presented. A different alternative is to ignore the need for correct segmentation and instead simply average the Q matrices. This approach is detailed in section 5.5. 5.4 Simultaneous Segmentation and Velocity Estimation In this section we present an efficient algorithm for simultaneous segmentation and velocity estimation, only given an orientation tensor field for one frame. 
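The parameter estimation of section 5.3.3 reduces to accumulating Q_tot over the region and solving one linear system. As a concrete illustration (the thesis's own implementation was in Matlab/C; this Python sketch and the names `S_affine` and `estimate_affine` are illustrative, assuming ideal isotropy compensated tensors as input):

```python
import numpy as np

def S_affine(x, y):
    """S matrix of the affine motion model, eq (5.16): v = S p,
    with p = (a, b, c, d, e, f, 1)^T."""
    return np.array([[x, y, 1, 0, 0, 0, 0],
                     [0, 0, 0, x, y, 1, 0],
                     [0, 0, 0, 0, 0, 0, 1]], dtype=float)

def estimate_affine(points, tensors):
    """Minimize d_tot = sum_i d1(v_i, T_i) over the affine parameters,
    following eqs (5.17)-(5.24). tensors: isotropy compensated 3x3 tensors."""
    Qtot = np.zeros((7, 7))
    for (x, y), T in zip(points, tensors):
        S = S_affine(x, y)
        Qtot += S.T @ T @ S                  # Q_i = S_i^T T_i S_i, eq (5.17)
    Qbar, q = Qtot[:6, :6], Qtot[:6, 6]      # partition as in eq (5.20)
    pbar = np.linalg.solve(Qbar, -q)         # eq (5.24), assuming Qbar invertible
    p = np.append(pbar, 1.0)
    velocities = [S_affine(x, y) @ p for (x, y) in points]
    return p, velocities
```

For a singular Q̄ one would instead pick among the solutions of (5.25), e.g. the solution minimizing (5.26).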
The goal of the segmentation is to partition the image into a set of disjoint regions, so that each region is characterized by a coherent motion with respect to the chosen motion model. In this section a region R is defined to be a nonempty, connected set of pixels. The segmentation algorithm is based on a competitive region growing approach. The basic algorithm is first presented in abstract form.

Footnote 4: A nontrivial example of this generalized aperture problem is a signal consisting of concentric circles which simultaneously expand and rotate around their center. Only the radial velocity component can be recovered.

5.4.1 The Competitive Algorithm

To each region R is associated a cost function C_R(x), which is defined for all pixels in the image. Regions are extended by adding one pixel at a time. To preserve connectivity the new pixel must be adjacent to the region, and to preserve disjointedness it must not already be assigned to some other region. The new pixel is also chosen as cheap as possible. The details are as follows.

Let the border ∆R of region R be the set of unassigned pixels in the image which are adjacent to some pixel in R. For each region R, the candidate N(R) to be added to the region is the cheapest pixel bordering R, i.e.

$$N(R) = \arg\min_{x \in \Delta R} C_R(x). \qquad (5.30)$$

The corresponding minimum cost for adding the candidate to the region is denoted C_min(R). In the case of an empty border, N(R) is undefined and C_min(R) is infinite. Assuming that a number of regions {R_n} have in some way been obtained, the rest of the image is partitioned as follows.

1. Find the region R_i for which the cost to add a new pixel is the least, i.e. i = arg min_n C_min(R_n).
2. Add the cheapest pixel N(R_i) to R_i.
3. Repeat the first two steps until no unassigned pixels remain.

Notice that it does not matter what the actual values of the cost functions are. It is only relevant which of them is lowest.
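For concreteness, the abstract algorithm above can be sketched with a priority queue: keeping one global heap of (cost, region, pixel) candidates implements "find the region with the cheapest candidate" lazily. A sketch under the simplifying assumption that each region's cost function is a fixed array (in the thesis the costs come from d₂ and the region's current motion model, and are recomputed as the model is refitted):

```python
import heapq
import numpy as np

def competitive_growing(costs, seeds):
    """Sketch of the competitive algorithm (section 5.4.1).
    costs: list of 2D arrays, costs[n][y, x] = C_Rn at pixel (x, y);
    seeds: one (y, x) starting pixel per region.
    Returns a label array: -1 = unassigned, n = assigned to region n."""
    h, w = costs[0].shape
    labels = -np.ones((h, w), dtype=int)
    heap = []

    def push_neighbors(n, y, x):
        # Push the 4-connected unassigned neighbors as candidates of region n.
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            yy, xx = y + dy, x + dx
            if 0 <= yy < h and 0 <= xx < w and labels[yy, xx] < 0:
                heapq.heappush(heap, (costs[n][yy, xx], n, yy, xx))

    for n, (y, x) in enumerate(seeds):
        labels[y, x] = n
        push_neighbors(n, y, x)

    while heap:
        c, n, y, x = heapq.heappop(heap)  # globally cheapest candidate pixel
        if labels[y, x] >= 0:
            continue                      # already claimed by a cheaper region
        labels[y, x] = n
        push_neighbors(n, y, x)
    return labels
```

Stale heap entries are simply skipped when popped, so only the relative order of the costs matters, exactly as noted above.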
Hence the algorithm is called competitive.

5.4.2 Candidate Regions

A fundamental problem with the simultaneous segmentation and velocity estimation approach is that we typically need a segmentation in order to compute the motion model parameters, and we need motion models in order to partition the image into regions. Since we assume no a priori knowledge about the segmentation of the image, we use the concept of candidate regions to introduce preliminary regions into the algorithm.

To begin with we arbitrarily fill the image with a large number of overlapping rectangular candidate regions.⁵ For each candidate region we then compute the optimal motion model parameters as described in section 5.3. Obviously these rectangular regions are not at all adapted to the motion field of the image, and as a consequence the computed motion models are likely to be suboptimal. In order to improve the candidate regions we use a procedure called regrowing.

The regrowing procedure is the first application of the competitive algorithm. Regrowing is performed for one candidate region at a time, which means that there is no competition between different regions but rather between the pixels. To begin with the candidate region contains only one pixel, its starting point, which was also the center point of the initial rectangle. The cost function used is d₂ from equation (5.14), where v is the velocity given by the candidate region's current motion model. The competitive algorithm is then run until the candidate region has grown to a specified size. This size is called the candidate region size m₀ and is a design parameter of the segmentation algorithm.

Footnote 5: E.g. squares of size 21 × 21, with a distance of 4 pixels between the center points. The exact numbers are not critical.
The effect of the regrowing procedure is that the candidate region now consists of the m₀ connected pixels, starting from a fixed point, that are most consistent with the candidate region's motion model. When the candidate region has been regrown, new optimal parameters are computed. Each candidate region is regrown twice, a number which seems to be sufficient to obtain reasonably coherent regions.

5.4.3 Segmentation Algorithm

Having obtained candidate regions, the rest of the segmentation algorithm is a matter of alternately converting candidate regions into real regions and letting the latter grow. In contrast to the candidate regions, the real regions are not allowed to overlap but have to be disjoint. While the candidate regions are allowed to overlap each other, they must not overlap the real regions, which means that they have to be regrown from time to time, taking this restriction into account. To accommodate the inclusion of new regions, the competitive algorithm is extended to have the following steps, to be iterated as long as there are empty pixels left:

1. Regrow the candidate regions which are currently overlapping a real region. If a candidate region cannot be regrown to its full size, it is removed. The same thing happens when a candidate region's starting point becomes occupied by a real region. The cost of the most expensive included pixel is called the maximum cost of the candidate region.
2. Find the candidate region with the least maximum cost. This is the aspirant for inclusion among the real regions.
3. As in the competitive algorithm, find the cheapest pixel that may be added to one of the already existing real regions.
4. Compare the least maximum cost from step 2 with the cost of the cheapest pixel in step 3.
   (a) If the least maximum cost is smaller, raise the corresponding candidate region to the status of a real region.
   (b) Otherwise, add the cheapest pixel to the corresponding region.
In the first iteration there are no real regions yet, so the first thing that happens is that the best candidate region is transformed into the first real region. To see how the segmentation algorithm works, frame 12 of the flower garden sequence, illustrated in figure 5.1, has been segmented. In figure 5.2 we can see how the regions develop and how new regions are added.

[Figure 5.1: Selected frames from the flower garden sequence. (a) frame 6, (b) frame 12, (c) frame 18.]

While the comparison in step 4 can be made directly between the given values, it is beneficial to introduce a design parameter λ, with which the least maximum cost is multiplied before the comparison is made. The effect of λ is that for a large value, new regions are added only if it would be very expensive to enlarge the existing ones. This may be desired e.g. if the segmentation is intended for a video coding application, where excessive fragmentation into regions can be costly. A small λ value means that existing regions are enlarged only if there are pixels available that are very consistent with the motion models, which is preferable if we are more interested in the velocity field than in the segmentation. The difference in segmentation for varying λ values is illustrated in figure 5.3.

The regrowing of candidate regions in step 1 of the algorithm may seem prohibitively computationally expensive. In practice, though, it is reasonable to assume that the maximum cost always increases when a candidate region has to be regrown.⁶ Therefore it is sufficient to regrow candidate regions only when the least maximum cost is smaller than the cost of the cheapest pixel, and only a few of the top candidate regions need to be regrown. More details on the segmentation algorithm, a few variations, and a discussion on possible improvements can be found in [13].
An algorithm with basic elements in common with the competitive algorithm can be found in [1], where it is applied to grayscale segmentation.⁷ Initial inspiration for the development of this segmentation algorithm was given by the results in [54, 55, 56].

Footnote 6: This would have been strictly correct if the motion model parameters were not recomputed each time the candidate region is regrown.
Footnote 7: The competitive algorithm presented here was developed independently, although it is predated by the mentioned paper.

[Figure 5.2: Development of the regions in the segmentation algorithm, at 17%, 33%, 50%, 67%, 83%, and 100% coverage.]

[Figure 5.3: Segmentation results for λ = 0.1, 0.5, 2, 10, 50, and 500.]

5.5 A Fast Velocity Estimation Algorithm

To avoid the complexities of the segmentation algorithm we may also choose to completely ignore the need for segmentation into regions with coherent motion. Instead we minimize a weighted distance measure for a motion model around each point, i.e. equation (5.18) is replaced by

$$d_{\mathrm{tot}}(p) = \sum_i w_i\, d_1(v_i, T_i) = p^T \Big( \sum_i w_i Q_i \Big) p = p^T Q_{\mathrm{tot}}\, p, \qquad (5.31)$$

where the sum is taken over a neighborhood of the current point and the weights wᵢ are given by, e.g., a Gaussian. In effect this means that we convolve the quadratic forms Qᵢ over the image with the weight function, an operation that can be computed efficiently by separable convolution as soon as the weight function is separable. Another way to look at this operation is as an application of normalized averaging, see section 3.8.1, with the weight function as applicability.⁸
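For the constant velocity motion model the quadratic forms equal the tensors themselves, so (5.31) amounts to smoothing the tensor field componentwise and then solving (5.24) at every pixel. A minimal sketch, using plain Gaussian smoothing in place of normalized averaging (so border certainties are not handled); `fast_constant_velocity` is an illustrative name, not from the thesis:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def fast_constant_velocity(T_field, sigma_avg=3.5):
    """Fast velocity estimation sketch, constant velocity model (section 5.5).
    T_field: (h, w, 3, 3) isotropy compensated tensor field.
    With S = I the quadratic forms Q_i are the tensors themselves,
    so the averaging in eq (5.31) is just componentwise smoothing."""
    Q = np.empty_like(T_field)
    for i in range(3):
        for j in range(3):
            Q[..., i, j] = gaussian_filter(T_field[..., i, j], sigma_avg)
    # Partition each Q as in eq (5.20) and solve eq (5.24) per pixel:
    Qbar = Q[..., :2, :2]
    q = Q[..., :2, 2]
    return np.linalg.solve(Qbar, -q)   # (h, w, 2) velocity field
```

Solving the stacked 2 × 2 systems replaces the eigenvector computation of the plain tensor averaging approach, as discussed below.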
By taking this view we have the opportunity to set the certainty field to zero close to the borders, which is appropriate if we use the very fast separable correlation method from section 4.7 to compute the tensors, as that method gives incorrect results at the borders. The optimal parameters and the corresponding velocity estimates at each point are computed exactly as in section 5.3.3, and it also turns out that the minimum value (5.23) can be used as a confidence measure,⁹ since it indicates how well the local neighborhood is consistent with the motion model in use.

In the simplest case of a constant velocity motion model, we have S = I and hence the averaging of the quadratic forms Qᵢ reduces to an averaging of the tensor field. This is in fact a well known idea to improve the robustness of the tensors [60], but there are a few important differences. The first is that we do not average the original tensor field, but rather the isotropy compensated field. The second is that we compute the velocity by equation (5.24), solving an equation system, rather than by (5.2) or (5.3), which involve the computation of at least one eigenvector.

To summarize, the whole algorithm consists of the following five steps:

1. Compute the orientation tensor field for the frame, preferably using the separable correlation method for maximum computational efficiency.
2. Remove the isotropic part of the tensors.
3. Compute the quadratic forms Qᵢ = SᵢᵀTᵢSᵢ according to the chosen motion model.
4. Apply normalized averaging to the quadratic forms.
5. Solve for the optimal parameters pᵢ and compute the corresponding velocity estimates vᵢ = Sᵢpᵢ.

Footnote 8: Notice that this gives a somewhat different scaling of the results, especially at the borders.
Footnote 9: Actually it is a reversed confidence measure, since small values indicate high confidence.

5.6 Evaluation

The velocity estimation algorithms have been evaluated on two commonly used test sequences with known velocity fields: Lynn Quam's Yosemite sequence [27], figure 5.4, and David Fleet's diverging tree sequence [17], figure 5.5. Both sequences are synthetic but differ in that the Yosemite sequence is generated with the help of a digital terrain map and therefore has a motion field with depth variation and discontinuities at occlusion boundaries. The diverging tree sequence on the other hand is only a textured planar surface towards which the camera translates. Hence the motion field is very regular, but the lack of image detail leads to difficulties in the velocity estimation.

[Figure 5.4: Selected frames from the Yosemite sequence (frames 2, 9, 16) and the true velocity field corresponding to frame 9 (subsampled).]

[Figure 5.5: One frame (frame 20) from the diverging tree sequence and the corresponding true velocity field (subsampled).]

5.6.1 Implementation and Performance

All algorithms have been implemented in Matlab, with Normalized Convolution, ordinary convolution/correlation, the segmentation algorithm, and the solution of multiple equation systems implemented as C mex files, and the rest as highly vectorized Matlab code. Typical running times for the different algorithms on a Sun Ultra 60 are given below and relate to the computation of the velocity for one frame of the 252 × 316 Yosemite sequence.

Velocity estimation with the segmentation algorithm takes about 66 seconds, distributed as 1.6 seconds for tensor estimation with the separable convolution method, 37 seconds for estimation of the tensors along the border with the normalized convolution method,¹⁰ 0.5 seconds for isotropy compensation, and 27 seconds for the segmentation algorithm, most of which is spent on the construction of candidate regions.
Here the affine motion model is used, the effective size of the kernels in the tensor estimation is 9 × 9 × 9, and the candidate region size m₀ is 500.

The fast algorithm with the affine motion model, 11 × 11 × 11 tensors, and a 41 × 41 averaging kernel takes about 16 seconds. Of these, 1.8 seconds are spent on tensor estimation with the separable convolution method, 0.5 seconds on isotropy compensation, 0.3 seconds on computation of the quadratic forms, 8.6 seconds on normalized averaging, and 4.8 seconds on solving for the velocity.

Finally we have the fast algorithm with the constant velocity motion model, 9 × 9 × 9 tensors, and 15 × 15 normalized averaging. Here the running time is about 3.5 seconds, with 1.6 seconds for tensor estimation, 0.5 seconds for isotropy compensation, 1.2 seconds for normalized averaging, and 0.2 seconds for solving for the velocity.

Footnote 10: This can be reduced drastically by the use of separable normalized convolution, which has not been implemented yet. On the order of one second should be possible.

5.6.2 Results for the Yosemite Sequence

The accuracy of the velocity estimates has been measured using the average spatiotemporal angular error, arccos(v̂ₑₛₜᵀ v̂ₜᵣᵤₑ) [3]. In the Yosemite sequence the sky region is excluded from the error analysis.

To see how the various design parameters affect the results, we present a fairly detailed analysis. Parameters common to all algorithms are the values of σ and γ in the tensor estimation, cf. sections 4.10.2 and 4.10.3.¹¹ Additional parameters for the segmentation algorithm are the factor λ and the candidate region size m₀. The fast algorithm only adds the standard deviation σ_avg of the averaging kernel, which is chosen to be Gaussian. The kernel sizes used by the various algorithms are the same as in the discussion on running times.

For the segmentation algorithm, we begin by varying m₀ while having σ = 1.4, γ = 1/8, and λ = 0.06.
The results are shown in figure 5.6(a) and we can see that the errors vary between 1.25° and 1.45° in a rather unpredictable way. The main reason for this peculiar phenomenon is that the final partitioning into regions, as well as the motion model parameters, which are computed only from the initial pixels of each region, can change significantly with a small change in m₀. In figure 5.6(b)–(d) we plot the minimum, mean, and maximum values of the average angular errors over the interval 400 ≤ m₀ ≤ 600, while in turn varying σ, γ, and λ around the values given above.

While the sensitivity to the value of m₀ is disturbing, it turns out that this problem can be eliminated nicely at the cost of some extra computation. The solution is to estimate the velocity for a number of different values of m₀ and then simply average the estimates.¹² This has the double effect of both stabilizing and improving the estimates. Using the 11 evenly spaced values m₀ = 400, 420, ..., 600 we get an average angular error of 1.14° and a standard deviation of 2.14°. Picking 11 m₀ values randomly in the same interval, we consistently get average angular errors between 1.13° and 1.18°.

In figure 5.7(a)–(c) we see the results for the fast algorithm with the affine motion model, in turn varying σ, γ, and σ_avg around the point σ = 1.6, γ = 1/256, and σ_avg = 6.5. Of interest here is that a large part of the errors is due to discontinuities in the velocity field, especially along the horizon. It turns out that the confidence measure is rather successful in identifying the uncertain estimates. Sorting the estimates with respect to the confidence, we can compute average angular errors at different levels of coverage, shown in figure 5.7(d).
At 100% coverage we have an average angular error of 1.40° ± 2.57°, at 90% the error is 1.00° ± 1.09°, and at 70% it is 0.75° ± 0.73°.¹³

Using the constant velocity motion model instead of affine motion, we obtain average angular errors at different levels of coverage according to figure 5.8, for σ = 1.4, γ = 1/32, and σ_avg = 3.5. The errors increase to 1.94° ± 2.31° at 100% coverage, 1.61° ± 1.57° at 90%, and 1.43° ± 1.24° at 70%. These results can be compared to results for a similar simple method, reported by Karlholm in [35]. He uses orientation tensors estimated from quadrature filter responses at multiple scales. From these the isotropic part is removed and they are normalized with respect to the largest eigenvalue. Finally the squared tensors are averaged using a 21 × 21 Gaussian kernel with standard deviation 6, and the velocity is estimated from the smallest eigenvector as in equation (5.3). This gives average angular errors of 2.44° ± 2.06° at 90% coverage and 2.23° ± 1.94° at 70%, using the quotient λ₂/λ₁ as confidence measure.

It would also be conceivable to use the eight parameter motion model with the fast algorithm, but it turns out to give no better results than the affine motion model. In fact the results are slightly worse, probably due to model overfitting.¹⁴ Some statistics on the distribution of errors for the three evaluated methods are given in table 5.1. Comparison with previously published results, table 5.2, shows that the algorithms presented here are substantially more accurate than existing methods.

Footnote 11: We only consider Gaussian applicabilities. To some extent the kernel size may also be regarded as a design parameter, but its only effect is a trade-off between the computation time and the usable range for σ.
Footnote 12: Notice that we only need to rerun the segmentation algorithm. The tensor field can be reused. The running time is thus increased to about 5.5 minutes.
Footnote 13: a ± b is used here as shorthand for average error and standard deviation, but has no interpretation in terms of an interval.

[Figure 5.6: Average angular errors for the segmentation algorithm on the Yosemite sequence, while varying the design parameters.]

[Figure 5.7: (a)–(c): Average angular errors for the fast algorithm with the affine motion model on the Yosemite sequence, while varying the design parameters. (d): Average angular errors at different levels of coverage. The dashed lines are the corresponding standard deviations.]

[Figure 5.8: Average angular errors for the fast algorithm with the constant velocity motion model on the Yosemite sequence at different levels of coverage. The dashed line gives the corresponding standard deviations.]
                              segmentation   fast, affine   fast, constant
  Average error               1.14°          1.40°          1.94°
  Standard deviation          2.14°          2.57°          2.31°
  Proportion of estimates
  with errors below:
    < 0.5°                    32.0%          35.8%          14.1%
    < 1°                      64.4%          65.0%          39.7%
    < 2°                      87.8%          82.1%          70.5%
    < 3°                      94.0%          89.7%          83.4%
    < 5°                      98.0%          95.4%          92.8%
    < 10°                     99.7%          98.8%          98.6%

Table 5.1: Distribution of errors for the Yosemite sequence.

Footnote 14: The constant velocity motion model and the eight parameter motion model can of course be used with the segmentation algorithm too, but do not lead to any improvements for this sequence.

  Technique                 Average error   Standard deviation   Density
  Lucas & Kanade [41]       2.80°           3.82°                35%
  Uras et al. [52]          3.37°           3.37°                14.7%
  Fleet & Jepson [17]       2.97°           5.76°                34.1%
  Black & Anandan [6]       4.46°           4.21°                100%
  Black & Jepson [7]        2.29°           2.25°                100%
  Ju et al. [33]            2.16°           2.0°                 100%
  Karlholm [35]             2.06°           1.72°                100%
  segmentation              1.14°           2.14°                100%
  fast, affine              1.40°           2.57°                100%
  fast, affine              0.75°           0.73°                70%
  fast, constant            1.94°           2.31°                100%
  fast, constant            1.43°           1.24°                70%

Table 5.2: Comparison of error results for the Yosemite sequence. All errors are computed without the sky region. The compilation of the older results is due to Karlholm [35].

5.6.3 Results for the Diverging Tree Sequence

The diverging tree sequence is characterized by having a continuous velocity field, in contrast to the discontinuities in the Yosemite sequence. On the other hand there is less texture and there are large regions which are completely featureless. One result of these changed circumstances is that the confidence measure for the fast method turns ineffective, since the larger estimation errors are mainly caused by a lack of image detail rather than by incoherence in the local velocity field.¹⁵ Hence no results are given for partial levels of coverage. The segmentation algorithm, with σ = 1.25, γ = 1/8, λ = 0.25, and the velocity averaged over m₀ = 500, 520, ..., 700, gives an average angular error of 0.54° and a standard deviation of 0.28°.
The fast algorithm with the affine motion model and σ = 1.6, γ = 1/4, and σ_avg = 9.5 (51 × 51 Gaussian kernel) gives an average angular error of 0.56° ± 0.23°. The fast algorithm with the constant velocity motion model and σ = 1.1, γ = 1/32, and σ_avg = 1.5 results in an average angular error of 1.79° ± 1.34°. That the segmentation algorithm gives only marginally better results than the fast algorithm with the affine motion model is not surprising, given the lack of discontinuities in the velocity field. A comparison with other methods is given in table 5.3, listing a selection of the results reported in [3].

Footnote 15: Necessary information to detect this kind of uncertainty should be available in the tensor field.

  Technique                   Average error   Standard deviation   Density
  Horn & Schunck (modified)   2.55°           3.67°                100%
  Lucas & Kanade              1.65°           1.48°                24.3%
  Uras et al.                 3.83°           2.19°                60.2%
  Nagel                       2.94°           3.23°                100%
  Fleet & Jepson              0.73°           0.46°                28.2%
  segmentation                0.54°           0.28°                100%
  fast, affine                0.56°           0.23°                100%
  fast, constant              1.79°           1.34°                100%

Table 5.3: Comparison of error results for the diverging tree sequence.

Chapter 6
Future Research Directions

In this chapter we look at some further developments of the ideas and methods presented in the thesis. The first two sections expand the spatial domain framework to phase estimation and adaptive filtering respectively, while the last section outlines a strategy for adapting the framework to irregularly sampled signals. Since this is research in progress, the presentation is by necessity short and sketchy.

6.1 Phase Functionals

The concept of orientation functionals from section 4.3 can straightforwardly be extended to a novel and powerful representation of phase, or rather a combined orientation and phase representation. With U defined by equation (4.9), a phase functional is a mapping

$$\theta : U \longrightarrow \mathbb{C} \qquad (6.1)$$

that to each direction vector assigns a complex value.
The magnitude of the value is interpreted as a measure of the signal variation along the direction, while the argument of the value is interpreted as the local phase of the signal with respect to the direction. If we reverse the direction, the magnitude should be unchanged while the argument should be negated. Hence we require that θ be Hermitian, i.e. that

$$\theta(-u) = \theta(u)^*, \quad \text{all } u \in U. \qquad (6.2)$$

It is clear that by taking the magnitude¹ of a phase functional we obtain an orientation functional.

Footnote 1: Or the squared magnitude or something similar.

The method for estimation of orientation tensors in section 4.5 can be extended to estimation of phase after some preparations. To begin with we use a more liberal definition of signal phase than is usual. Instead of relating it to the phase of a sinusoidal, it is interpreted as some relation between the odd and even parts of the signal, DC component excluded. A similar approach to the definition of phase is taken by Nordberg in [43]. Table 6.1 lists the primary characteristics of the phase.

  arg φ    e^{i arg φ}   interpretation
  0        1             even, local maximum
  π/2      i             odd, decreasing
  π        −1            even, local minimum
  −π/2     −i            odd, increasing

Table 6.1: Interpretation of local phase.

With the signal model given by equation (4.13) it is clear that A represents the even part of the signal, excluding the DC component, while b represents the odd part. Thus it should be possible to construct a phase functional from A and b. Unfortunately we cannot use a quadratic form to represent phase, as we did with the tensor for orientation. The reason is that quadratic forms by necessity give even functionals, a property that is compatible with being Hermitian only if they are also real-valued. A way to get around this is to add one dimension to the representation and use a quadratic form with respect to

$$\tilde u = \begin{pmatrix} u \\ 1 \end{pmatrix}. \qquad (6.3)$$

By setting the phase tensor

$$P = -\begin{pmatrix} A & i\gamma b \\ i\gamma b^T & 0 \end{pmatrix}, \qquad (6.4)$$

where γ has a similar role to that in the construction of the orientation tensor, we obtain the phase functional

$$\theta_P(u) = \tilde u^T P \tilde u = -u^T A u - i 2\gamma b^T u. \qquad (6.5)$$

If we take the magnitude of this phase functional we obtain an orientation functional that is different from the orientation tensor in section 4.5, but it is interesting to notice that the latter appears in

$$\tilde u^T P P^* \tilde u = u^T (A A^T + \gamma^2 b b^T + \gamma^2 (b^T b) I) u, \qquad (6.6)$$

with only an extra isotropic term in the tensor.

6.2 Adaptive Filtering

The idea of adaptive filtering, as described in [22], is to apply a space-variant filter to a signal, where the filter shape at each point is determined by the local orientation. For this operation to be practically feasible, it is required that the different filter shapes can be constructed as linear combinations of a small set of space-invariant filters. Without going into the depth of the presentation in [22], we show in this section how projection onto second degree polynomials can be used for adaptive high pass filtering, with applications to directed noise and image degradation.

The key observation can be made from figure 4.8 on page 48, where we can see that the dual basis functions used to compute the projections onto the x² and y² basis functions are in fact directed high pass filters along the two axes.² Together with the dual basis function corresponding to xy, we can construct rotations of this high pass filter to any direction (α, β)ᵀ, α² + β² = 1, by noting that

$$(\alpha x + \beta y)^2 = \alpha^2 x^2 + 2\alpha\beta\, xy + \beta^2 y^2.$$

Hence directed high pass filters in any direction can be computed as linear combinations of three space-invariant filters in 2D, and of d(d+1)/2 filters in the d-dimensional case. Furthermore, with a Cartesian separable applicability,³ the responses of these filters can be computed very efficiently with one-dimensional correlations according to the scheme in section 4.7.
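As a quick numerical check of (6.4)–(6.6), the phase tensor and functional can be written out directly. A sketch (in practice A, b, and γ would come from the polynomial expansion of section 4.5; the names here are illustrative):

```python
import numpy as np

def phase_tensor(A, b, gamma):
    """Phase tensor P of eq (6.4) for a d-dimensional signal model.
    A: symmetric d x d matrix (even part), b: d-vector (odd part)."""
    d = len(b)
    P = np.zeros((d + 1, d + 1), dtype=complex)
    P[:d, :d] = -A
    P[:d, d] = -1j * gamma * b
    P[d, :d] = -1j * gamma * b
    return P

def theta(P, u):
    """Phase functional theta_P(u) = u~^T P u~, eq (6.5)."""
    ut = np.append(u, 1.0)
    return ut @ P @ ut
```

The Hermitian property (6.2) follows immediately: the real part −uᵀAu is even in u while the imaginary part −2γbᵀu is odd, so θ(−u) = θ(u)*.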
[Figure 6.1: Iterated adaptive filtering of white noise. (a) original white noise, (b) 1 iteration, (c) 2 iterations, (d) 5 iterations, (e) 10 iterations, (f) equalized.]

Footnote 2: Notice, however, that we had better negate these filters to avoid an unnecessary 180° phase shift.
Footnote 3: Obviously we also want the applicability to be isotropic, which means that it should be Gaussian.

Directed noise is obtained by applying adaptive filtering to white noise. The result is noise with a tendency to be oriented in the same way as the orientation field used to control the adaptive filtering. To increase the effect we can iterate this procedure a number of times. In figure 6.1 we adapt white noise to an orientation field defined so that the orientation at each point is the radius vector rotated by 10 degrees, giving the geometric interpretation of a spiral pattern. We can see that this pattern becomes more distinct with each iteration, although there is also a significant element of very low frequency noise. The latter can be almost completely eliminated by a local amplitude equalization, giving the final result in figure 6.1(f). The scale of the pattern is directly controlled by the standard deviation of the Gaussian applicability.

An amusing application of directed noise is to control the filters with the estimated orientation field of an actual image; a severe form of image degradation. In figure 6.2 the orientation field of the well-known Lena image has been computed, low pass filtered, and then used to control the directed noise process, with 20 iterations and amplitude equalization.⁴ Figure 6.3 shows the result of a more intricate variation of the same theme, involving multiple scales and separate processing of each color component. The full color image can be found on the cover of the thesis.

[Figure 6.2: Directed noise shaped to the orientation field of a real image.]
6.3 Irregular Sampling

In order to adapt the orientation and motion estimation methods to the irregularly sampled case, we make the following observations:

1. The motion estimation methods only depend on the computation of orientation tensors and, for the fast algorithm, on averaging or Normalized Convolution.⁵

2. Estimation of orientation tensors only depends on the projection onto a polynomial basis, which is implemented by means of Normalized Convolution.

3. As discussed in section 3.2.4, Normalized Convolution can be defined by equation (3.6) for the continuous case.

Thus the whole problem can be reduced to estimating weighted inner products from the samples of the signal.

⁴ Only the eigenvector corresponding to the largest eigenvalue of each tensor has been used to determine the orientation. Cf. figure 3.1 on page 28 for less degraded versions of the image.
⁵ The segmentation algorithm only needs a minor modification with respect to the connectivity used for the growing of regions.

Figure 6.3: A more complex example of image degradation. (a) original image, (b) degraded image. See also the cover image.

The aim is to develop a framework that applies to just about any sampling pattern, with special attention to foveal patterns, such as the one in figures 6.4 and 6.5. It is assumed that the samples are computed by integration of the signal over an area around each sample point, either unweighted over the region given by the Voronoi diagram or weighted with some, possibly overlapping, windowing functions. If we let f be the signal and \phi_k the sampling functions, we can express the sample values f_k as

f_k = (f, \phi_k) = \int f(x) \phi_k(x) \, dx.    (6.7)

What we need in order to compute Normalized Convolution is some method to estimate

(f, b_i)_w = \int f(x) w(x) b_i(x) \, dx    (6.8)

from the samples f_k. It is reasonable to assume that we can do some approximation

(f, b_i)_w \approx \sum_k c_k f_k    (6.9)

and the remaining question is how the \{c_k\} coefficients should be determined.
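One conceivable way to determine the coefficients, sketched here purely as an illustration and not as the method proposed in the thesis, is to require that (6.9) be exact for a small family of test functions and solve the resulting linear system in a least squares sense. All choices below (1D setting, Gaussian windowing functions, polynomial test family, integrals approximated by sums on a dense grid) are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(-3, 3, 2001)           # dense quadrature grid (1D toy case)
dt = t[1] - t[0]
xk = np.sort(rng.uniform(-3, 3, 30))   # irregular sample positions

# Hypothetical sampling functions: narrow Gaussians around each sample point,
# normalized to unit integral on the grid
phi = np.exp(-(t[None, :] - xk[:, None])**2 / (2 * 0.1**2))
phi /= phi.sum(axis=1, keepdims=True) * dt

w = np.exp(-t**2 / 2)   # applicability w(x)
b = t                   # basis function b_i(x) = x

# Require (6.9) to hold exactly for the test functions p_j = 1, x, ..., x^5:
# sum_k c_k (p_j, phi_k) = (p_j, w b_i) for every j
P = np.stack([t**j for j in range(6)])   # test functions on the grid
A = (P @ phi.T) * dt                     # A[j, k] = (p_j, phi_k)
g = (P @ (w * b)) * dt                   # g[j]    = (p_j, w b_i)
c, *_ = np.linalg.lstsq(A, g, rcond=None)

# Check on a signal inside the test family: f(x) = 1 + x - 0.5 x^2
f = 1 + t - 0.5 * t**2
fk = (phi @ f) * dt                      # sample values, equation (6.7)
exact = np.sum(f * w * b) * dt           # weighted inner product (6.8)
approx = c @ fk                          # approximation (6.9)
```

By construction the approximation is exact for any signal spanned by the test family; how well it behaves for general signals, and how to avoid the biased solutions mentioned below, is precisely the open question.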
This problem does not look too hard, but care must be taken to avoid biased solutions, cf. the negative effects of not fully isotropic applicabilities in section 4.10.1.

This approach is not completely without complications, however. The main one is that irregularly sampled data requires more care to deal with efficiently than traditional samples on a rectangular grid. One problem in particular is that the coefficients \{c_k\} above vary both with the basis function and with the spatial position, in contrast to the space-invariance in the regular case. Another related problem is that the method for fast computation by separable correlations cannot easily be adapted to the irregular case, if at all.

Figure 6.4: Irregular sampling pattern.

Figure 6.5: Magnifications of one peripheral part (a) and the central part (b) of the irregular sampling pattern, including Voronoi diagrams.

Appendices

A  A Matrix Inversion Lemma

To prove that the matrix G in equation (4.39), section 4.7.2, has an inverse given by equation (4.40), we need the following lemma:

Lemma A.1. Provided that a \neq 0, c \neq d, and ad = b^2, the (n+1) \times (n+1) matrix

M = \begin{pmatrix} a & b & b & \cdots & b \\ b & c & d & \cdots & d \\ b & d & c & \cdots & d \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ b & d & d & \cdots & c \end{pmatrix}    (A.1)

has an inverse of the form

M^{-1} = \begin{pmatrix} a' & e & e & \cdots & e \\ e & c' & 0 & \cdots & 0 \\ e & 0 & c' & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ e & 0 & 0 & \cdots & c' \end{pmatrix}.    (A.2)

Proof. Inserting (A.1) and (A.2) into the definition of the inverse, M M^{-1} = I, we get the five distinct equations

a a' + n b e = 1,    (A.3)
a e + b c' = 0,    (A.4)
b a' + c e + (n - 1) d e = 0,    (A.5)
b e + c c' = 1,    (A.6)
b e + d c' = 0,    (A.7)

which can easily be verified to have the explicit solution

a' = \frac{1}{a}\left(1 + \frac{nd}{c-d}\right), \quad c' = \frac{1}{c-d}, \quad e = -\frac{b}{a(c-d)} = -\frac{d}{b(c-d)}.    (A.8)

Since G can be partitioned into one diagonal block and one block with the structure given by M in lemma A.1, the stated structure of G^{-1} follows immediately if we can show that the conditions of the lemma are satisfied. The components of M are given by

a = (a \cdot b_1, b_1),    (A.9)
b = (a \cdot b_1, b_{x_i^2}),    (A.10)
c = (a \cdot b_{x_i^2}, b_{x_i^2}),    (A.11)
d = (a \cdot b_{x_i^2}, b_{x_j^2}),    (A.12)

so it is clear that a > 0. That c \neq d follows by necessity from the assumption that the basis functions are linearly independent, section 4.4. The final requirement that ad = b^2 relies on the condition set for the applicability¹

a(x) = a_1(x_1) a_1(x_2) \cdots a_1(x_N).    (A.13)

Now we have

a = \sum_{x_1,\ldots,x_N} a(x) = \prod_{k=1}^{N} \left(\sum_{x_k} a_1(x_k)\right) = \left(\sum_{x_1} a_1(x_1)\right)^N,    (A.14)

b = \sum_{x_1,\ldots,x_N} a(x) x_i^2 = \left(\sum_{x_i} a_1(x_i) x_i^2\right) \prod_{k \neq i} \left(\sum_{x_k} a_1(x_k)\right) = \left(\sum_{x_i} a_1(x_i) x_i^2\right) \left(\sum_{x_1} a_1(x_1)\right)^{N-1},    (A.15)

d = \sum_{x_1,\ldots,x_N} a(x) x_i^2 x_j^2 = \left(\sum_{x_i} a_1(x_i) x_i^2\right) \left(\sum_{x_j} a_1(x_j) x_j^2\right) \left(\sum_{x_1} a_1(x_1)\right)^{N-2},    (A.16)

and it is clear that ad = b^2.

¹ We apologize that the symbol a happens to be used in double contexts here and trust that the reader manages to keep them apart.

B  Cartesian Separable and Isotropic Functions

As we saw in section 4.7.2, a desirable property of the applicability is to simultaneously be Cartesian separable and isotropic. In this appendix we show that the only interesting class of functions having this property is the Gaussians.

Lemma B.1. Assume that f : R^N \to R, N \geq 2, is Cartesian separable,

f(x) = f_1(x_1) f_2(x_2) \cdots f_N(x_N), \quad \text{some } \{f_k : R \to R\}_{k=1}^N,    (B.1)

isotropic,

f(x) = g(x^T x), \quad \text{some } g : R^+ \cup \{0\} \to R,    (B.2)

and partially differentiable. Then f must be of the form

f(x) = A e^{C x^T x},    (B.3)

for some real constants A and C.

Proof. We first assume that f is zero for some x.
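Lemma A.1 can be checked numerically. The sketch below picks arbitrary values satisfying the conditions a \neq 0, c \neq d, ad = b^2 and compares a numerical inverse with the explicit solution (A.8):

```python
import numpy as np

# Parameters satisfying the conditions of lemma A.1: a != 0, c != d, ad = b^2
n, a, b, c, d = 4, 2.0, 2.0, 5.0, 2.0   # ad = 4 = b^2

# Build the (n+1) x (n+1) matrix M of equation (A.1)
M = np.full((n + 1, n + 1), d)
M[0, 0] = a
M[0, 1:] = M[1:, 0] = b
M[np.arange(1, n + 1), np.arange(1, n + 1)] = c

Minv = np.linalg.inv(M)

# Explicit solution (A.8) for the entries of the inverse
a2 = (1 + n * d / (c - d)) / a   # corner entry a'
c2 = 1 / (c - d)                 # diagonal entry c'
e = -b / (a * (c - d))           # first row/column entry e
```

The remaining entries of the inverse, outside the first row and column and off the diagonal, come out as zeros, confirming the structure (A.2).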
Then at least one factor in (B.1) is zero, and by varying the remaining coordinates it follows that

g(t) = 0, \quad \text{all } t \geq \alpha^2,    (B.4)

where \alpha is the value of the coordinate in the zero factor. By taking

x = \frac{\alpha}{\sqrt{N}} \begin{pmatrix} 1 & 1 & \ldots & 1 \end{pmatrix}^T    (B.5)

we can repeat the argument to get

g(t) = 0, \quad \text{all } t \geq \frac{\alpha^2}{N},    (B.6)

and continuing like this we find that g(t) = 0 for all t > 0. Since f is partially differentiable there cannot be a point discontinuity at the origin, so f must be identically zero. This is clearly a valid solution.

If instead f is nowhere zero we can compute the partial derivatives as

\frac{\frac{\partial}{\partial x_k} f(x)}{f(x)} = \frac{f_k'(x_k)}{f_k(x_k)} = 2 x_k \frac{g'(x^T x)}{g(x^T x)}, \quad k = 1, 2, \ldots, N.    (B.7)

Restricting ourselves to one of the hyper quadrants, so that all x_k \neq 0, we get

\frac{g'(x^T x)}{g(x^T x)} = \frac{f_1'(x_1)}{2 x_1 f_1(x_1)} = \frac{f_2'(x_2)}{2 x_2 f_2(x_2)} = \cdots = \frac{f_N'(x_N)}{2 x_N f_N(x_N)},    (B.8)

which is possible only if they all have a common constant value C. Hence we get g from the differential equation

g'(t) = C g(t)    (B.9)

with the solution

g(t) = A e^{Ct}.    (B.10)

It follows that f in each hyper quadrant must have the form

f(x) = A e^{C x^T x}    (B.11)

and in order to get isotropy, the constants must be the same for all hyper quadrants. The case A = 0 corresponds to the identically zero solution.

A weakness of this result is the condition that f be partially differentiable, which is not a natural requirement of an applicability function. If we remove this condition it is easy to find one new solution, which is zero everywhere except at the origin. What is not so easy to see, however, and quite counter-intuitive, is that there also exist solutions which are discontinuous and everywhere positive. To construct these we need another lemma.

Lemma B.2. There do exist discontinuous functions L : R \to R which are additive, i.e.

L(x + y) = L(x) + L(y), \quad x, y \in R.    (B.12)

Proof. See [48] or [18].

With L from the lemma we get a Cartesian separable and isotropic function by the construction

f(x) = e^{L(x^T x)}.    (B.13)
This function is very bizarre, however, because it has to be discontinuous at every point and unbounded in every neighborhood of every point. It is also completely useless as an applicability since it is unmeasurable, i.e. it cannot be integrated. To eliminate this kind of strange solutions it is sufficient to introduce some very weak regularity constraints² on f. Unfortunately the proofs become very technical if we want to have a bare minimum of regularity. Instead we explore what can be accomplished with a regularization approach.

Let the functions \phi_\sigma and \Phi_\sigma be normalized Gaussians with standard deviation \sigma in one and N dimensions respectively,

\phi_\sigma(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{x^2}{2\sigma^2}},    (B.14)

\Phi_\sigma(x) = \frac{1}{(2\pi)^{N/2}\sigma^N} e^{-\frac{x^T x}{2\sigma^2}} = \prod_{k=1}^N \phi_\sigma(x_k).    (B.15)

² See e.g. [18] and exercise 9.18 in [44].

We now make the reasonable assumption that f is regular enough to be convolved with Gaussians,³

f_\sigma(x) = (f * \Phi_\sigma)(x) = \int_{R^N} f(x - y) \Phi_\sigma(y) \, dy, \quad \sigma > 0.    (B.16)

The convolved functions retain the properties of f to be Cartesian separable and isotropic. The first property can be verified by

f_\sigma(x) = \int_{R^N} f_1(x_1 - y_1) \cdots f_N(x_N - y_N)\, \phi_\sigma(y_1) \cdots \phi_\sigma(y_N) \, dy = \prod_{k=1}^N \int_R f_k(x_k - y_k) \phi_\sigma(y_k) \, dy_k.    (B.17)

To show the second property we notice that isotropy is equivalent to rotation invariance, i.e. for an arbitrary rotation matrix R we have f(Rx) = f(x). Since the Gaussians are rotation invariant too, we have

f_\sigma(Rx) = \int_{R^N} f(Rx - y) \Phi_\sigma(y) \, dy = \int_{R^N} f(Rx - Ru) \Phi_\sigma(Ru) \, du = \int_{R^N} f(x - u) \Phi_\sigma(u) \, du = f_\sigma(x).    (B.18)

Another property that the convolved functions obtain is a high degree of regularity. Without making additional assumptions on the regularity of f, f_\sigma is guaranteed to be infinitely differentiable because \Phi_\sigma has that property. This means that lemma B.1 applies to f_\sigma, which therefore has to be a Gaussian.
To connect the convolved functions to f itself we notice that the Gaussians \Phi_\sigma approach the Dirac distribution as \sigma approaches zero; they become more and more concentrated to the origin. As a consequence the convolved functions f_\sigma approach f, and in the limit we find that f has to be a Gaussian, at least almost everywhere.⁴

Another way to reach this conclusion is to assume that f can be Fourier transformed. Then equation (B.16) turns into \hat{f}_\sigma(u) = \hat{f}(u) \hat{\Phi}_\sigma(u), so that \hat{f}(u) = \hat{f}_\sigma(u) / \hat{\Phi}_\sigma(u) is a quotient of two Gaussians and hence itself a Gaussian. Inverse Fourier transformation gives the desired result.

³ This mainly requires f to be locally integrable. The point is, however, that it would be useless as an applicability if it could not be convolved with Gaussians.
⁴ Almost everywhere is quite sufficient because the applicability is only used in integrals.

C  Correlator Structure for Separable Normalized Convolution

Figure C.1: Correlator structure for computation of 2D orientation tensors, using the separable normalized convolution method described in section 4.8. There is understood to be an applicability factor in each box as well.

D  Angular RMS Error

In this appendix we verify the equivalence between the expressions (4.46) and (4.47) for the angular RMS error in section 4.10.
Starting with

\Delta\phi = \arcsin \sqrt{\frac{1}{2L} \sum_{l=1}^L \|\hat{x}\hat{x}^T - \hat{e}_1\hat{e}_1^T\|^2},    (D.1)

we expand the squared Frobenius norm, using the relation \|T\|^2 = \mathrm{tr}(T^T T), to get

\|\hat{x}\hat{x}^T - \hat{e}_1\hat{e}_1^T\|^2 = \mathrm{tr}\left((\hat{x}\hat{x}^T - \hat{e}_1\hat{e}_1^T)^T (\hat{x}\hat{x}^T - \hat{e}_1\hat{e}_1^T)\right) = \mathrm{tr}\left(\hat{x}\hat{x}^T - (\hat{x}^T\hat{e}_1)\hat{x}\hat{e}_1^T - (\hat{e}_1^T\hat{x})\hat{e}_1\hat{x}^T + \hat{e}_1\hat{e}_1^T\right).    (D.2)

To simplify this expression we use the fact that the trace operator is linear and that \mathrm{tr}(ab^T) = \mathrm{tr}(b^T a). Thus we have

\|\hat{x}\hat{x}^T - \hat{e}_1\hat{e}_1^T\|^2 = 1 - (\hat{x}^T\hat{e}_1)^2 - (\hat{e}_1^T\hat{x})^2 + 1 = 2(1 - (\hat{x}^T\hat{e}_1)^2)    (D.3)

and continuing with the original expression,

\Delta\phi = \arcsin \sqrt{\frac{1}{2L} \sum_{l=1}^L 2(1 - (\hat{x}^T\hat{e}_1)^2)} = \arcsin \sqrt{1 - \frac{1}{L} \sum_{l=1}^L (\hat{x}^T\hat{e}_1)^2} = \arccos \sqrt{\frac{1}{L} \sum_{l=1}^L (\hat{x}^T\hat{e}_1)^2}.    (D.4)

E  Removing the Isotropic Part of a 3D Tensor

To remove the isotropic part of a 3D tensor we need to compute the smallest eigenvalue of a symmetric and positive semidefinite 3 \times 3 matrix T. There are a number of ways to do this, including inverse iterations and standard methods for eigenvalue factorization. Given the small size of the matrix in this case, however, we additionally have the option of computing the eigenvalues algebraically, since these can be expressed as the roots of the third degree characteristic polynomial, p(\lambda) = \det(T - \lambda I).

To find the solutions to z^3 + a z^2 + b z + c = 0 we first remove the quadratic term with the translation z = x - \frac{a}{3}, yielding

x^3 + p x + q = 0.    (E.1)

It is well known that the solutions to this equation can be given by Cardano's formula [53]:

D = \left(\frac{q}{2}\right)^2 + \left(\frac{p}{3}\right)^3,    (E.2)
u = \sqrt[3]{-\frac{q}{2} + \sqrt{D}},    (E.3)
v = \sqrt[3]{-\frac{q}{2} - \sqrt{D}},    (E.4)
x_1 = u + v,    (E.5)
x_2 = -\frac{u+v}{2} + \frac{u-v}{2}\, i\sqrt{3},    (E.6)
x_3 = -\frac{u+v}{2} - \frac{u-v}{2}\, i\sqrt{3}.    (E.7)

Unfortunately this formula leads to some complications in the choice of the complex cubic roots if D < 0, which happens exactly when we have three distinct real roots.
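The complication can be made concrete in code: the two cube roots cannot be chosen independently, but must be matched so that uv = -p/3. The sketch below implements (E.2)-(E.7) with complex arithmetic and resolves the branch ambiguity through that constraint; it is only an illustration of the formula, not part of the algorithm below:

```python
import cmath

def cardano(p, q):
    """Roots of x^3 + p x + q = 0 via Cardano's formula (E.2)-(E.7).
    The cube roots must satisfy u*v = -p/3; a naive principal-branch
    choice of both roots is exactly what goes wrong when D < 0."""
    D = (q / 2)**2 + (p / 3)**3
    u = (-q / 2 + cmath.sqrt(D))**(1 / 3)          # principal cube root
    # choose v so that u * v = -p / 3, resolving the branch ambiguity
    v = -p / (3 * u) if u != 0 else (-q / 2 - cmath.sqrt(D))**(1 / 3)
    w = complex(-0.5, 3**0.5 / 2)                  # primitive cube root of unity
    x1 = u + v
    x2 = w * u + w.conjugate() * v                 # equals (E.6)
    x3 = w.conjugate() * u + w * v                 # equals (E.7)
    return x1, x2, x3
```

With the constraint enforced, the case D < 0 yields v as the complex conjugate of u, so all three roots come out real up to rounding; the trigonometric approach that follows avoids complex arithmetic altogether.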
Since we have a symmetric and positive semidefinite matrix we know a priori that all eigenvalues are real and non-negative.⁵ A better approach in this case, still following [53], is to make the scaling x = \sqrt{-\frac{4p}{3}}\, y, leading to

4y^3 - 3y = \frac{3q}{p\sqrt{-\frac{4p}{3}}}.    (E.8)

Taking advantage of the identity \cos 3\alpha = 4\cos^3\alpha - 3\cos\alpha, we make the substitution y = \cos\alpha to get the equation

\cos 3\alpha = \frac{3q}{p\sqrt{-\frac{4p}{3}}},    (E.9)

where the right hand side is guaranteed to have an absolute value less than or equal to one if all the roots are indeed real. Hence it is clear that we obtain the three real solutions to (E.1) from

\beta = \sqrt{-\frac{4p}{3}},    (E.10)
\alpha = \frac{1}{3} \arccos \frac{3q}{p\beta},    (E.11)
x_1 = \beta \cos\alpha,    (E.12)
x_2 = \beta \cos\left(\alpha - \frac{2\pi}{3}\right),    (E.13)
x_3 = \beta \cos\left(\alpha + \frac{2\pi}{3}\right).    (E.14)

Furthermore, since we have 0 \leq \alpha \leq \frac{\pi}{3}, it follows that x_1 \geq x_2 \geq x_3.

⁵ For the following discussion it is sufficient that we have a symmetric matrix and thus real eigenvalues.

In terms of the tensor T, the above discussion leads to the following algorithm for removal of the isotropic part:

1. Remove the trace of T by computing

T' = T - \frac{\mathrm{tr}\, T}{3} I = \begin{pmatrix} a & d & e \\ d & b & f \\ e & f & c \end{pmatrix}.    (E.15)

This is equivalent to removing the quadratic term from the characteristic polynomial.

2. The eigenvalues of T' are now given as the solutions to x^3 + px + q = 0, where

p = ab + ac + bc - d^2 - e^2 - f^2,    (E.16)
q = af^2 + be^2 + cd^2 - 2def - abc.    (E.17)

3. Let

\beta = \sqrt{-\frac{4p}{3}},    (E.18)
\alpha = \frac{1}{3} \arccos \frac{3q}{p\beta},    (E.19)

so that the eigenvalues of T' are given by (E.12)-(E.14).

4. Let

x_3 = \beta \cos\left(\alpha + \frac{2\pi}{3}\right)    (E.20)

and compute the isotropy compensated tensor T'' as

T'' = T' - x_3 I.    (E.21)

It should be noted that this method may numerically be somewhat less accurate than standard methods for eigenvalue factorization.
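The four steps can be collected into a short numerical sketch (numpy used only for the elementary operations; the clipping of the arccosine argument is the safeguard against it marginally exceeding one in magnitude):

```python
import numpy as np

def remove_isotropic(T):
    """Subtract the smallest eigenvalue times I from a symmetric 3x3 tensor,
    using the trigonometric solution (E.10)-(E.14); steps 1-4 above."""
    # Step 1: remove the trace
    T0 = T - np.trace(T) / 3 * np.eye(3)
    a, b, c = T0[0, 0], T0[1, 1], T0[2, 2]
    d, e, f = T0[0, 1], T0[0, 2], T0[1, 2]
    # Step 2: coefficients of x^3 + p x + q, equations (E.16)-(E.17)
    p = a*b + a*c + b*c - d*d - e*e - f*f
    q = a*f*f + b*e*e + c*d*d - 2*d*e*f - a*b*c
    # For trace-free symmetric T0, p = -||T0||_F^2 / 2 <= 0,
    # with equality only when T0 is zero (already isotropy free)
    if p > -1e-12:
        return T0
    # Step 3: equations (E.18)-(E.19), clipping against rounding
    beta = np.sqrt(-4 * p / 3)
    alpha = np.arccos(np.clip(3 * q / (p * beta), -1.0, 1.0)) / 3
    # Step 4: smallest eigenvalue (E.20) and compensation (E.21)
    x3 = beta * np.cos(alpha + 2 * np.pi / 3)
    return T0 - x3 * np.eye(3)
```

Since T'' = T - \lambda_{\min}(T) I, the result can be compared directly against a standard symmetric eigenvalue routine.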
For the current application, however, this is not an issue at all.⁶

A slightly different formula for the eigenvalues of a real, symmetric 3 \times 3 matrix can be found in [46], and a closed form formula for eigenvectors as well as eigenvalues in [8].

⁶ Except making sure that the magnitude of the argument to the arccosine is not very slightly larger than one.

Bibliography

[1] R. Adams and L. Bischof. Seeded region growing. IEEE Transactions on Pattern Analysis and Machine Intelligence, 16(6):641–647, June 1994.
[2] M. Andersson, J. Wiklund, and H. Knutsson. Sequential Filter Trees for Efficient 2D 3D and 4D Orientation Estimation. Report LiTH-ISY-R-2070, ISY, SE-581 83 Linköping, Sweden, November 1998. URL: http://www.isy.liu.se/cvl/ScOut/TechRep/TechRep.html.
[3] J. L. Barron, D. J. Fleet, and S. S. Beauchemin. Performance of optical flow techniques. Int. J. of Computer Vision, 12(1):43–77, 1994.
[4] J. Bigün. Local Symmetry Features in Image Processing. PhD thesis, Linköping University, Sweden, 1988. Dissertation No 179, ISBN 91-7870-334-4.
[5] Å. Björck. Numerical Methods for Least Squares Problems. SIAM, Society for Industrial and Applied Mathematics, 1996.
[6] M. J. Black and P. Anandan. The robust estimation of multiple motions: Parametric and piecewise-smooth flow fields. Computer Vision and Image Understanding, 63(1):75–104, Jan. 1996.
[7] M. J. Black and A. Jepson. Estimating optical flow in segmented images using variable-order parametric models with local deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(10):972–986, 1996.
[8] A. W. Bojanczyk and A. Lutoborski. Computation of the Euler angles of a symmetric 3 × 3 matrix. SIAM J. Matrix Anal. Appl., 12(1):41–48, January 1991.
[9] R. Bracewell. The Fourier Transform and its Applications. McGraw-Hill, 2nd edition, 1986.
[10] K. Daniilidis and V. Krüger. Optical flow computation in the log-polar plane. In Proceedings 6th Int. Conf.
on Computer Analysis of Images and Patterns, pages 65–72. Springer-Verlag, Berlin, Germany, 1995.
[11] F. Dufaux and F. Moscheni. Segmentation-based motion estimation for second generation video coding techniques. In L. Torres and M. Kunt, editors, Video Coding: The Second Generation Approach, chapter 6, pages 219–263. Kluwer Academic Publishers, 1996.
[12] L. Eldén. A Weighted Pseudoinverse, Generalized Singular Values, and Constrained Least Squares Problems. BIT, 22:487–502, 1982.
[13] G. Farnebäck. Motion-based Segmentation of Image Sequences. Master's Thesis LiTH-ISY-EX-1596, Computer Vision Laboratory, SE-581 83 Linköping, Sweden, May 1996.
[14] G. Farnebäck. Motion-based segmentation of image sequences using orientation tensors. In Proceedings of the SSAB Symposium on Image Analysis, pages 31–35, Stockholm, March 1997. SSAB.
[15] G. Farnebäck. A unified framework for bases, frames, subspace bases, and subspace frames. In Proceedings of the 11th Scandinavian Conference on Image Analysis, Kangerlussuaq, Greenland, June 1999. SCIA. Accepted for publication.
[16] H. G. Feichtinger and K. H. Gröchenig. Theory and practice of irregular sampling. In J. Benedetto and M. Frazier, editors, Wavelets: Mathematics and Applications, pages 305–363. CRC Press, 1994.
[17] D. J. Fleet and A. D. Jepson. Computation of Component Image Velocity from Local Phase Information. Int. Journal of Computer Vision, 5(1):77–104, 1990.
[18] James Foran. Fundamentals of Real Analysis. M. Dekker, New York, 1991. ISBN 0-8247-8453-7.
[19] G. H. Golub and C. F. Van Loan. Matrix Computations. The Johns Hopkins University Press, second edition, 1989.
[20] G. H. Granlund. In search of a general picture processing operator. Computer Graphics and Image Processing, 8(2):155–178, 1978.
[21] G. H. Granlund and H. Knutsson. Contrast of structured and homogenous representations. In O. J. Braddick and A. C. Sleigh, editors, Physical and Biological Processing of Images, pages 282–303.
Springer Verlag, Berlin, 1983.
[22] G. H. Granlund and H. Knutsson. Signal Processing for Computer Vision. Kluwer Academic Publishers, 1995. ISBN 0-7923-9530-1.
[23] R. M. Haralick. Edge and region analysis for digital image data. Computer Graphics and Image Processing, 12(1):60–73, January 1980.
[24] R. M. Haralick. Digital step edges from zero crossing of second directional derivatives. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-6(1):58–68, January 1984.
[25] R. M. Haralick and L. G. Shapiro. Computer and robot vision, volume 2. Addison-Wesley, 1993.
[26] R. M. Haralick and L. Watson. A facet model for image data. Computer Graphics and Image Processing, 15(2):113–129, February 1981.
[27] D. J. Heeger. Model for the extraction of image flow. J. Opt. Soc. Am. A, 4(8):1455–1471, 1987.
[28] B. Jähne. Motion determination in space-time images. In Image Processing III, pages 147–152. SPIE Proceedings 1135, International Congress on Optical Science and Engineering, 1989.
[29] B. Jähne. Motion determination in space-time images. In O. Faugeras, editor, Computer Vision-ECCV90, pages 161–173. Springer-Verlag, 1990.
[30] B. Jähne. Digital Image Processing: Concepts, Algorithms and Scientific Applications. Springer Verlag, Berlin, Heidelberg, 1991.
[31] B. Jähne. Spatio-Temporal Image Processing: Theory and Scientific Applications. Springer Verlag, Berlin, Heidelberg, 1993. ISBN 3-540-57418-2.
[32] B. Jähne. Practical Handbook on Image Processing for Scientific Applications. CRC Press LLC, 1997. ISBN 0-8493-8906-2.
[33] S. X. Ju, M. J. Black, and A. D. Jepson. Skin and bones: Multi-layer, locally affine, optical flow and regularization with transparency. In Proceedings CVPR'96, pages 307–314. IEEE, 1996.
[34] T. Kailath. Linear Systems. Information and System Sciences Series. Prentice-Hall, Englewood Cliffs, N.J., 1980.
[35] J. Karlholm. Local Signal Models for Image Sequence Analysis.
PhD thesis, Linköping University, Sweden, SE-581 83 Linköping, Sweden, 1998. Dissertation No 536, ISBN 91-7219-220-8.
[36] H. Knutsson. Representing local structure using tensors. In The 6th Scandinavian Conference on Image Analysis, pages 244–251, Oulu, Finland, June 1989. Report LiTH-ISY-I-1019, Computer Vision Laboratory, Linköping University, Sweden, 1989.
[37] H. Knutsson and M. Andersson. Optimization of Sequential Filters. In Proceedings of the SSAB Symposium on Image Analysis, pages 87–90, Linköping, Sweden, March 1995. SSAB. LiTH-ISY-R-1797. URL: http://www.isy.liu.se/cvl/ScOut/TechRep/TechRep.html.
[38] H. Knutsson and C-F. Westin. Normalized and Differential Convolution: Methods for Interpolation and Filtering of Incomplete and Uncertain Data. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 515–523, New York City, USA, June 1993. IEEE.
[39] H. Knutsson, C-F. Westin, and G. H. Granlund. Local Multiscale Frequency and Bandwidth Estimation. In Proceedings of IEEE International Conference on Image Processing, pages 36–40, Austin, Texas, November 1994. IEEE.
[40] H. Knutsson, C-F. Westin, and C-J. Westelius. Filtering of Uncertain Irregularly Sampled Multidimensional Data. In Twenty-seventh Asilomar Conf. on Signals, Systems & Computers, pages 1301–1309, Pacific Grove, California, USA, November 1993. IEEE.
[41] B. Lucas and T. Kanade. An Iterative Image Registration Technique with Applications to Stereo Vision. In Proc. Darpa IU Workshop, pages 121–130, 1981.
[42] L. Massone, G. Sandini, and V. Tagliasco. "Form-invariant" topological mapping strategy for 2d shape recognition. Computer Vision, Graphics, and Image Processing, 30(2):169–188, May 1985.
[43] K. Nordberg. Signal Representation and Processing using Operator Groups. PhD thesis, Linköping University, Sweden, SE-581 83 Linköping, Sweden, 1995. Dissertation No 366, ISBN 91-7871-476-1.
[44] W. Rudin. Real and Complex Analysis.
McGraw-Hill, 3rd edition, 1987. ISBN 0-07-054234-1.
[45] H. R. Schwarz. Numerical Analysis: A Comprehensive Introduction. John Wiley & Sons Ltd., 1989. ISBN 0-471-92064-9.
[46] O. K. Smith. Eigenvalues of a symmetric 3 × 3 matrix. Communications of the ACM, 4(4):168, 1961.
[47] M. Tistarelli and G. Sandini. On the advantages of polar and log-polar mapping for direct estimation of time-to-impact from optical flow. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(4):401–410, April 1993.
[48] A. Torchinsky. Real Variables. Addison-Wesley, 1988. ISBN 0-201-15675-X.
[49] M. Ulvklo, G. Granlund, and H. Knutsson. Adaptive reconstruction using multiple views. In Proceedings of the IEEE Southwest Symposium on Image Analysis and Interpretation, pages 47–52, Tucson, Arizona, USA, April 1998. IEEE. LiTH-ISY-R-2046.
[50] M. Ulvklo, G. H. Granlund, and H. Knutsson. Texture Gradient in Sparse Texture Fields. In Proceedings of the 9th Scandinavian Conference on Image Analysis, pages 885–894, Uppsala, Sweden, June 1995. SCIA.
[51] M. Ulvklo, H. Knutsson, and G. Granlund. Depth segmentation and occluded scene reconstruction using ego-motion. In Proceedings of the SPIE Conference on Visual Information Processing, pages 112–123, Orlando, Florida, USA, April 1998. SPIE.
[52] S. Uras, F. Girosi, A. Verri, and V. Torre. A computational approach to motion perception. Biological Cybernetics, pages 79–97, 1988.
[53] B. L. v. d. Waerden. Algebra, volume 1. Springer-Verlag, Berlin, Göttingen, Heidelberg, 5 edition, 1960.
[54] J. Y. A. Wang and E. H. Adelson. Representing moving images with layers. IEEE Transactions on Image Processing Special Issue: Image Sequence Compression, 3(5):625–638, September 1994.
[55] J. Y. A. Wang and E. H. Adelson. Spatio-temporal segmentation of video data. In Proceedings of the SPIE Conference: Image and Video Processing II, volume 2182, pages 120–131, San Jose, California, February 1994.
[56] J. Y. A. Wang, E. H.
Adelson, and U. Y. Desai. Applying mid-level vision techniques for video data compression and manipulation. In Proceedings of the SPIE: Digital Video Compression on Personal Computers: Algorithms and Technologies, volume 2187, pages 116–127, San Jose, California, February 1994.
[57] WITAS web page. http://www.ida.liu.se/ext/witas/eng.html.
[58] C. F. R. Weiman and G. Chaikin. Logarithmic spiral grids for image processing and display. Computer Graphics and Image Processing, 11:197–226, 1979.
[59] C-J. Westelius. Focus of Attention and Gaze Control for Robot Vision. PhD thesis, Linköping University, Sweden, SE-581 83 Linköping, Sweden, 1995. Dissertation No 379, ISBN 91-7871-530-X.
[60] C-F. Westin. A Tensor Framework for Multidimensional Signal Processing. PhD thesis, Linköping University, Sweden, SE-581 83 Linköping, Sweden, 1994. Dissertation No 348, ISBN 91-7871-421-4.
[61] C-F. Westin, K. Nordberg, and H. Knutsson. On the Equivalence of Normalized Convolution and Normalized Differential Convolution. In Proceedings of IEEE International Conference on Acoustics, Speech, & Signal Processing, pages 457–460, Adelaide, Australia, April 1994. IEEE.
[62] C-F. Westin, C-J. Westelius, H. Knutsson, and G. Granlund. Attention Control for Robot Vision. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 726–733, San Francisco, California, June 1996. IEEE Computer Society Press.
[63] R. Wilson and H. Knutsson. Uncertainty and inference in the visual system. IEEE Transactions on Systems, Man and Cybernetics, 18(2):305–312, March/April 1988.
[64] R. Wilson and H. Knutsson. Seeing Things. Report LiTH-ISY-R-1468, Computer Vision Laboratory, SE-581 83 Linköping, Sweden, 1993.
