Linköping Studies in Science and Technology
Thesis No. 764

Single and Multiple Motion Field Estimation

Magnus Hemmendorff

LIU-TEK-LIC-1999:22
Department of Electrical Engineering
Linköpings universitet, SE-581 83 Linköping, Sweden
http://www.isy.liu.se
Linköping, April 1999

Single and Multiple Motion Field Estimation
© 1999 Magnus Hemmendorff
Department of Electrical Engineering
Linköpings universitet
SE-581 83 Linköping
Sweden
ISBN 91-7219-478-2
ISSN 0280-7971

Abstract

This thesis presents a framework for estimation of motion fields, both for single and multiple layers. All the methods have in common that they generate or use constraints on the local motion. Motion constraints are represented by vectors whose directions describe one component of the local motion and whose magnitudes indicate confidence.

Two novel methods for estimating these motion constraints are presented. Both methods take two images as input and apply orientation-sensitive quadrature filters. One method is similar to a gradient method applied on the phase from the complex filter outputs. The other method is based on novel results using canonical correlation presented in this thesis.

Parametric models, e.g. affine or FEM, are used to estimate motion from constraints on local motion. In order to estimate smooth fields for models with many parameters, cost functions on deformations are introduced.

Motions of multiple transparent layers are estimated by implicit or explicit clustering of motion constraints into groups. General issues and difficulties in the analysis of multiple motions are described. An extension of the known EM algorithm is presented, together with experimental results on multiple transparent layers with affine motions. Good accuracy in estimation allows reconstruction of layers using a backprojection algorithm. As an alternative to the EM algorithm, this thesis also introduces a method based on higher order tensors.
A result with potential applications in a number of different research fields is the extension of canonical correlation to handle complex variables. Correlation is maximized using a novel method that can handle singular covariance matrices.

Acknowledgements

Many people have been important for this thesis, and here follows an attempt to list those who have made the largest contributions.

Professor Hans Knutsson is my academic advisor. He has a wealth of ideas and a good intuition, and he has provided me with the embryos of many of the best results in this thesis.

Torbjörn Kronander, PhD, president of SECTRA-Imtec AB, is my industrial advisor, and despite being overly busy, he has had a great impact on the pace of the progress of this project. Together with my academic advisor, he invented the initial ideas for this project.

Mats Andersson, PhD, has almost served as an assistant academic advisor and has taken time to understand and discuss a major portion of this work in detail.

All the people at the Computer Vision Laboratory and its manager, Professor Gösta Granlund, have provided a friendly and stimulating research environment well above average. For example, Johan Wiklund maintains a very well working computer network. Gunnar Farnebäck's experience and LaTeX design of his licentiate thesis sped up my work. Magnus Borga provided me with unpublished details from his research on canonical correlation.

SECTRA-Imtec AB has provided 50% financial support, and I have spent half my time there to share the SECTRA spirit and widen my experience and knowledge. Thanks to SECTRA, I have been able to bring my research output into commercial applications of medical imaging. http://www.sectra.se

Among our partners at medical centers are Ásgrímur Ragnarsson and Torbjörn Andersson MD at Örebro Regional Hospital; Lars Thorelius MD and Erik Hellgren MD at Linköping University Hospital; and Anders Persson MD and Göran Iwar MD at Hudiksvall Hospital.
Research partners have an increasing influence on our work and future plans. Thanks to Lars Wigström at Linköping University Hospital. Also thanks to the Surgical Planning Laboratory at Harvard Medical School, in particular faculty members C-F Westin, PhD, and Professor Ron Kikinis, MD.

The Swedish National Board for Industrial and Technical Development (NUTEK) has provided 50% financial support for me and my colleague Mats Andersson. NUTEK has also provided partial support for Hans Knutsson and Johan Wiklund.

Contents

1 Introduction
  1.1 Motivation
    1.1.1 Cardiovascular Disease
  1.2 What is Digital Subtraction Angiography and why is Motion Compensation Needed?
    1.2.1 X-ray Angiography
    1.2.2 Image Subtraction
    1.2.3 Motions
    1.2.4 Pixel Shift by Hand
    1.2.5 Automatic Motion Compensation
    1.2.6 Objective of our Research
    1.2.7 Cardiac Angiography
    1.2.8 Interventional Angiography
    1.2.9 A Word about MR Angiography
  1.3 Notations
  1.4 Quadrature Filters

2 General Issues for Single Motion Fields
  2.1 Aperture Problem
    2.1.1 Failure of Separable Motion Estimation Algorithms
  2.2 Motion Constraints
  2.3 Warping Image to Estimate Large Motions with High Accuracy
    2.3.1 Conventional Iterative Refinement
    2.3.2 Compensate Constraint
    2.3.3 Iterative Refinement without Subpixel Warps

3 Parametric Motion Models
  3.1 Our Definition of Parametric Motion Models
    3.1.1 Finite Element Method (FEM)
  3.2 Model Based Motion Estimation
  3.3 Cost Functions
    3.3.1 Limit on Cost
    3.3.2 Designing Cost Functions
  3.4 Relation to Motion Estimation from Spatiotemporal Orientation Tensors
  3.5 Local-Global Affine Model
    3.5.1 Efficient Implementation of The Local-Global Affine Model

4 Estimation of Motion Constraints
  4.1 Existing Methods
    4.1.1 Intensity Conservation Gradient Method
    4.1.2 Point Matching
    4.1.3 Spatiotemporal Orientation Tensors
  4.2 Phase Based Quadrature Filter Method
    4.2.1 Motion Constraint Estimation
    4.2.2 Confidence Measure
    4.2.3 Multiple Scales and Iterative Refinement
  4.3 Experimental Results
    4.3.1 X-ray Angiography Images
    4.3.2 Synthetic Images
    4.3.3 Synthetic Images with Disturbance
  4.4 Future Development

5 General Problems in Multiple Motion Analysis
  5.1 Introduction
  5.2 Motion Constraints
  5.3 Correspondence Problems
    5.3.1 Minimal Number of Motion Constraints
    5.3.2 Problem: Correspondence Between Estimates in Different Parts of the Image
    5.3.3 Problem: Interframe Correspondence Between Estimates

6 Estimation of Multiple Motions
  6.1 Other Methods Considered
    6.1.1 Difficulties with Multiple Correlation Peaks
    6.1.2 Difficulties with Dominant Layers
  6.2 Estimation of Motion Constraints
  6.3 EM (modified)
    6.3.1 Review EM
    6.3.2 Derivation of EM Algorithm for Multiple Warps
    6.3.3 Evaluating Criteria for Optimum
    6.3.4 Iterative Search for Optimum
    6.3.5 The Probability Function
    6.3.6 Introducing Confidence Measure in the EM Algorithm
    6.3.7 Our Extensions to the EM Algorithm
    6.3.8 Convergence of Modified EM with Warp
  6.4 Reconstruction of Transparent Layers
    6.4.1 Improved Backprojection Algorithm
    6.4.2 Finding Correspondence between Motion Estimates from Different Frames
    6.4.3 Experimental Results
  6.5 Alternative Method for Two Mixed Motions
    6.5.1 Basic Idea
    6.5.2 Minimizing ε(a1, a2)
    6.5.3 Experimental Results

7 Canonical Correlation of Complex Variables
  7.1 Definition of Canonical Correlation of Complex Variables
  7.2 Maximizing Canonical Correlation
  7.3 Properties of the Canonical Correlation
  7.4 Maximization Using SVD
    7.4.1 Operations in Maximization
  7.5 Canonical Variates
  7.6 Equivalence with Borga's Solution

8 Motion Estimation using Canonical Correlation
  8.1 Operations Applied Locally in the Image
    8.1.1 Shifted Quadrature Filter Outputs
    8.1.2 Canonical Correlation
    8.1.3 Correlation of Filters
    8.1.4 Look Up Table (LUT)
    8.1.5 Motion Constraints from Correlation Data
  8.2 Fitting Motion Model to Data
  8.3 Choosing Patch Size
  8.4 Experimental Results
  8.5 Future Development
    8.5.1 Using Multiple Variates
    8.5.2 Other Filters than Quadrature Filters
    8.5.3 Reducing Patch Size

Appendix
  A Details for Chapter 7 on Canonical Correlation
    A.1 Failure to Compute Derivative with Respect to a Complex Variable
    A.2 Beginner's Example of Canonical Correlation
    A.3 Proof of Equation (7.9)
  B Variable Names
    B.1 Global Variable Names
    B.2 Local Variable Names in Chapter 3
    B.3 Local Variable Names in Chapter 4
    B.4 Local Variable Names in Chapter 5
    B.5 Local Variable Names in Chapter 6
    B.6 Local Variable Names in Chapter 7
    B.7 Local Variable Names in Chapter 8

Chapter 1

Introduction

1.1 Motivation

All the research presented in this thesis is dedicated to medical image processing and the diagnosis of cardiovascular disease, which is the leading killer throughout the industrial world. For example, according to the U.S. Department of Health and Human Services[24], more than 950,000 Americans die of cardiovascular disease each year, accounting for more than 40% of all deaths. About 57 million Americans, nearly one fourth of the U.S. population, live with some form of cardiovascular disease.

This thesis presents algorithms for motion analysis that are primarily intended for angiography, i.e. medical imaging of blood vessels. Some parts of this work are already used in a commercial product that has been delivered for clinical use. Other parts need further development before they can be turned into commercial applications. So far, our methods perform well at motion compensation for patients moving their extremities[16, 15]. The future goal is to handle the motions of a beating heart.

The motion estimation algorithms presented in this thesis are by no means limited to medical applications. Estimation of single motions is widely used and high accuracy is often crucial, e.g. in robotics and structure-from-motion applications. Multiple motion analysis is also an important field.
Our methods for estimating transparent motions may enable robotics applications to handle moving shadows and reflections in windows. Our algorithms are also able to handle motions of occluding objects, although some modifications may improve performance.

1.1.1 Cardiovascular Disease

A number of terms related to cardiovascular disease are listed here.

Thrombosis: Formation of a blood clot that blocks a vessel. Can often be dissolved by drugs.
Embolism: A clot in one part of the body breaks loose and blocks an artery in another part of the body.
Stenosis: Narrowing of a vessel. The blood sometimes finds a new way through smaller vessels.
Aneurysm: Swelling of a vessel. Often it looks like a balloon. Aneurysms that burst in the skull cause cerebral hemorrhage.
Perfusion: Blood flow through tissue.
Ischemia: Lack of oxygen in tissue. Often due to obstruction of the arterial blood supply.
Capillaries: Vessels in tissue that are too small to be seen individually. On angiography images with contrast agents, they can sometimes be seen as a cloud.
Infarct: Tissue death due to lack of oxygen.
Stroke: Damage to nerve cells in the brain due to lack of oxygen.

1.2 What is Digital Subtraction Angiography and why is Motion Compensation Needed?

Angiography is medical imaging of the vasculature (angio = blood [vessel]). In the past, angiography was only done using conventional X-ray and contrast agents. Today it is also widely accepted to use CT(1), and there is rapid progress in MR(2) angiography. Over the last years, more and more people seem to believe that MR is taking over a large portion of X-ray angiography. Despite the progress and the future potential of MR, X-ray remains the gold standard, to which MR is compared, and most people seem to believe that X-ray will remain indispensable even in the future.

(1) Computed Tomography (CT). X-ray images are taken from different angles by a rotating X-ray source. A computer calculates a 3D reconstruction.
(2) Magnetic Resonance (MR).
A combination of stationary and rotating magnetic fields is applied to the patient, making atomic nuclei spin in coherence. The echoes of the rotating field can be measured. MR equipment is expensive, but the total cost of using MR is not always higher than for X-ray.

1.2.1 X-ray Angiography

Figure 1.1: A number of images are taken during contrast injection. The patient is told not to move, but that might be difficult. (Diagram: X-ray source, patient with contrast agent, digital X-ray sensor and computer.)

Figure 1.2: Angiography sequence of a leg (excerpt).

The image sensor is usually an image intensifier tube with a CCD element at the output screen. Electronic sensors without intensifier tubes are coming. There are also image plates that are scanned by lasers and yield better image quality, but they cannot be used to acquire a sequence of images.

An ordinary frame rate in DSA is between 2 and 6 images per second. The frame rate is often higher in the beginning of a sequence and decreased when the contrast agent reaches the smaller and slower vessels. Diagnosis on the heart (angiocardiography) requires a much higher frame rate. An ordinary dose of contrast agent is 30 ml. It is injected by a long catheter directly into the vessel, upstream of the region to be examined.

Since blood cannot be distinguished from tissue in an X-ray image, a contrast agent is injected into an artery upstream of the region of interest. The injection is made using a catheter, i.e. a hose that is usually inserted through arteries in the groin. Iodine-based contrast agents have significantly higher X-ray attenuation than human tissue. This means that more of the X-rays are absorbed and fewer X-ray photons reach the sensor. The use of a contrast agent enables physicians to see the vessels. By taking multiple images during injection, it is also possible to see how the contrast agent propagates.
Unfortunately, it is often difficult to distinguish small vessels from other structures in the image. Despite the contrast agent, the images are usually dominated by bones, lungs and the slowly varying thickness of the patient. The remedy for this problem is image subtraction.

1.2.2 Image Subtraction

When subtracting the pixel values of two images, one taken before injection and the other taken after injection, only the vessels with contrast agent remain. Image subtraction is a simple, easy-to-understand and widely accepted method. In digital subtraction angiography (DSA), a reference image is taken before contrast is injected or reaches the region of interest. That reference image is then subtracted from all the images acquired after contrast injection.

Image subtraction is often a very good method. After image subtraction, nothing remains in the image except for the contrast agent. In addition, image subtraction is a safe method, and the risk of wrong diagnosis due to image subtraction is very small. Radiologists often have long experience and amazing skill in interpreting subtraction angiographies. The predecessor of DSA is subtraction angiography with photographic film, where one film is positive and the other is negative.

1.2.3 Motions

Image subtraction requires that nothing has moved between the times the images were acquired. No patient motion is allowed during image acquisition. Not surprisingly, this makes DSA almost impossible on the heart, intestines and other organs that keep moving all the time. More surprising is that motions cause problems even when the arms and legs are examined. When contrast is injected, the patient often feels a burning sensation and moves a little. Even if patients are fixated, they still move a little.

1.2.4 Pixel Shift by Hand

In conventional implementations of DSA, it is possible to compensate for motions by shifting the entire image a certain number (or fraction) of pixels.
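Such a global shift is easy to sketch. The snippet below is a toy illustration (not the thesis implementation): it shifts a frame by whole pixels with numpy, after which the shifted frame would be subtracted from the reference. A subpixel shift would additionally require interpolation, and note that np.roll wraps around at the borders, which a real system would handle by padding.

```python
import numpy as np

def pixel_shift(image, dy, dx):
    """Global (whole-frame) pixel shift by integer amounts, as in
    manual 'pixel shift' for DSA.  Borders wrap around in this toy
    version; subpixel shifts would need interpolation."""
    return np.roll(np.roll(image, dy, axis=0), dx, axis=1)

img = np.arange(16.0).reshape(4, 4)
shifted = pixel_shift(img, 0, 1)   # move everything one pixel to the right
```
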
This process, called pixel shift, must be done manually by medical staff. To save time, images with motions are often thrown away rather than being shifted. Besides the time required, the quality is often poor. Pixel shifts can only compensate for motions that are uniform over the image, but the motions often vary over the image. This means that pixel shifts cannot achieve good quality over the entire image simultaneously.

1.2.5 Automatic Motion Compensation

We have developed automatic motion compensation[16, 15] that is a substitute for manual pixel shift. The automatic motion compensation even works for images with rotations and deformations in the image plane. Our motion compensation is very accurate for ordinary motions, including rotations and deformations, and it does not matter if the motions are irregular over time. The algorithm is implemented on a dual processor Pentium-II workstation, where 1 second of processing time yields enough accuracy for most images of size 512x512. A whole sequence of images can be processed without user interaction.

At the time of writing, we have attended an oral presentation of another project that addresses the same problem but with different algorithms. Their article[32] is not yet available though.

1.2.6 Objective of our Research

Our research in the past is justified by the motion compensation for angiography, and the future goal is better angiography of a beating heart. Tracking the motions of the heart in 2-dimensional X-ray images is a very difficult task. Probably, we will see several generations of motion estimation algorithms before performance is good enough. For that reason, the focus of this thesis is to solve simpler problems of multiple motions. We do not claim that the algorithms for multiple motions work on real X-ray cardio images, but we hope that this research has led us closer to the solution of our specific problem. We also hope it is a step towards better analysis of multiple motions in general.
Some More Facts

Iodine-based contrast agents are no longer ionic. Ring-structure molecules are popular. Despite the development of better contrast agents, some patients still have allergic reactions and chronic kidney damage. A large portion of the patients have diabetes and thus extra sensitive kidneys. CO2 is an alternative to iodine-based contrast agents. CO2, which is almost transparent to X-rays, replaces the blood in the vessels and acts like a negative contrast. CO2 is dissolved in the blood and expired by the lungs in a one-pass fashion. [22]

1.2.7 Cardiac Angiography

Angiography on a beating heart is different from angiography on peripheral parts of the body. Due to the fast motions, a higher frame rate of 12-24 images per second is used. Today, there is no technique for motion compensation, and thus subtraction cannot be used. Often, angiocardiography is done with interventions, and many image sequences are acquired. This means large doses of both X-rays and contrast agents. Typical images are shown in figure 1.3.

Figure 1.3: Cardio sequence. Frames 25, 50, 75 and 125.

1.2.8 Interventional Angiography

A set of techniques, commonly called interventional angiography, is a cheap and simple alternative to surgery in the treatment of cardiovascular disease. Thromboses and stenoses can be punctured by a wire inside the catheter. Narrowed vessels can be widened by balloons that are temporarily inserted with the catheter and inflated to high pressure. After treatment with balloons, it might be necessary to insert a tube in the vessel for it to stay open. The tubes, called stents, are often made of a metal grid that expands to the correct size once it has been inserted at the right position. There are also stents that make vessels narrower, as a remedy for aneurysms or other kinds of pathological enlargement of vessels. These stents are like a hose inside the vessels. For example, sections of the aorta sometimes expand and get much too wide.
A stent is fastened upstream of the aneurysm and leads the blood past the aneurysm. The blood outside the stent coagulates and the aneurysm goes away. Other aneurysms can be treated by filling them with wire that makes the blood coagulate. Aneurysm in the brain is the leading cause of cerebral hemorrhage.

1.2.9 A Word about MR Angiography

Magnetic Resonance Angiography (MRA) has evolved rapidly over the last years. Several studies[25] indicate that MR angiography is already as good as X-ray. In addition, MR avoids problems with X-ray, such as harmful radiation. MRA can be performed without contrast agents, using velocity-sensitive measurements such as phase contrast (PC) or time of flight (TOF). In practice, contrast agents may, however, be necessary in most MRA studies, but the risks are less than in X-ray. Contrast agents for MRA are less harmful and are injected intravenously, usually in the arm. This is much simpler than X-ray catheterization, which is time consuming and requires precautions to prevent bleeding, thromboses, vessel trauma and infections.

Other advantages of MRA are abilities like 3D image acquisition. Among the disadvantages are slow image acquisition and inferior spatial resolution. Metallic implants cause image artifacts, e.g. the signal void around metallic stents looks like a stenosis. Interventional angiography requires that all tools are non-metallic. For safety, patients with pacemakers should not be exposed to magnetic resonance.

Today, most people seem to believe that MRA will substitute for X-ray in many situations, but to what extent is a controversial issue. Most predictions we have heard are partisan and range from "not more than today" to "almost always".

1.3 Notations

In appendix B, there is a list of the variable names in this thesis. This section is just an introduction to notations and style. Vectors, matrices and tensors are written in boldface. Matrices are upper case and vectors are lower case.
For example, boldface A is a matrix and boldface a is a vector. Vectors are always column vectors. Normal font A and a are scalars.

∇             gradient
I             identity matrix
A^T           superscript T denotes the transpose of a matrix
A^*           a star denotes the complex conjugate and transpose of a matrix
v^*           for scalars, a star is simply a complex conjugate
v = (vx, vy)^T   boldface v denotes image motion
x = (x, y)^T     boldface x always denotes a coordinate in the image
||u||         norm of the vector u
û = u/||u||   a hat denotes a normalized vector

1.4 Quadrature Filters

Chapters 4 and 8 use quadrature filters that are related to Gabor filter pairs. A filter is a quadrature filter[13] if its Fourier transform, F(u), has zero amplitude on one side of a hyperplane through the origin, i.e. there is a vector n̂ such that

    F(u) = 0  for all u such that  n̂^T u ≤ 0        (1.1)

In this thesis, n̂ is called the direction of the quadrature filter. We only use quadrature filters that are real in the Fourier domain. Note that quadrature filters must be complex in the spatial domain (since F(u) ≠ F(−u)). Quadrature filters can be optimized using a kernel generator, which produces efficient separable or sequential kernels [23, 2].

Figure 1.4: Quadrature filters in one and two dimensions (amplitude plotted against frequency, and against omega_x and omega_y). Both filters have their direction along the positive x-axis.

Chapter 2

General Issues for Single Motion Fields

This chapter is a discussion of issues in motion estimation in general. There are several existing methods, e.g. finding correlation peaks and point matching. In this thesis, the focus is on methods that use two images and first estimate constraints on the local motion and then fit a motion to these.

2.1 Aperture Problem

No matter how good a tracking algorithm is used, motions cannot be unambiguously estimated in an image that only contains structure in one orientation. This is known as the aperture problem. For example, we can think of a moving line, viewed through a small window.
Since we cannot see the line endings, it is impossible to estimate the motion component along the line. Only the orthogonal component can be estimated.

The aperture problem tells us to use big windows when estimating motions. Small windows rarely have structure in more than one orientation. How large a window is needed depends on how far we have to go in the image before the orientation changes. In some images, e.g. figure 2.1, it might be necessary to use the entire image to estimate the motion at a single coordinate. A big window may solve the aperture problem, but fails to estimate motions locally when the motions are not uniform over the image. Chapter 3 describes how to use global motion models to overcome the aperture problem while still estimating motions that are not pure translations.

2.1.1 Failure of Separable Motion Estimation Algorithms

It may seem plausible that an algorithm that estimates disparity along a scan line can be extended to track motions in the plane.

Figure 2.1: This image contains very little structure for estimating motions in the vertical direction. For sufficient accuracy, the entire image is needed. (X-ray image of a leg)

A first naive idea was to apply the stereo algorithm in both the horizontal and vertical directions. This would give one estimate of the motion in the x-direction and another estimate in the y-direction. Although this worked fairly well in some experiments, we abandoned this approach since there is a fundamental difference between stereo algorithms and motion algorithms. The stereo algorithm assumes it can find a match along the direction of search (usually the scanline). This assumption is valid for stereo images, but not for images with motions. Searching in one direction rarely yields a correct match. Thus, we might not find a match, or even worse, find a false match. As illustrated in figure 2.2, this method is even unaware of the aperture problem.
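The aperture problem has a simple linear-algebra reading once local motion constraints of the form c_x v_x + c_y v_y + c_t = 0 (introduced in section 2.2) are available. The sketch below uses illustrative numbers, not data from the thesis: constraints from two distinct orientations determine the translation, while constraints from a single orientation leave a singular system.

```python
import numpy as np

# Each row is a motion constraint (cx, cy, ct) with cx*vx + cy*vy + ct = 0.
# Two distinct orientations pin down the translation (vx, vy) uniquely:
good = np.array([[1.0, 0.0, -1.0],    # vertical structure:   vx = 1
                 [0.0, 1.0, -2.0]])   # horizontal structure: vy = 2
A, b = good[:, :2], -good[:, 2]
v = np.linalg.solve(A.T @ A, A.T @ b)        # normal equations

# With structure in only ONE orientation, all constraint normals are
# parallel and the normal-equation matrix becomes singular: this is
# the aperture problem in linear-algebra form.
bad = np.array([[1.0, 0.0, -1.0],
                [2.0, 0.0, -2.0]])           # same constraint, rescaled
Abad = bad[:, :2]
rank = np.linalg.matrix_rank(Abad.T @ Abad)  # rank 1 < 2: underdetermined
```
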
Figure 2.2: This figure shows what happens if we track a moving line by independently estimating the x- and y-components of the motion. The total estimate, v = (vx, vy)^T, is seriously wrong. In addition, this algorithm is unaware of the aperture problem and gives just one answer.

2.2 Motion Constraints

Throughout this thesis, we will use constraints on the local motion of the form

    cx vx + cy vy + ct = 0        (2.1)

where (vx, vy) is the local image motion and cx, cy and ct are coefficients estimated locally in the image. It is popular to use cx = dI/dx, cy = dI/dy and ct = dI/dt, where I(x, t) denotes the intensity of the image sequence. This method is commonly called the gradient method or optical flow[17]. A novel method of estimating motion constraints is presented in chapters 4 and 8.

If we use constraints from a single point, the motion (vx, vy) cannot be unambiguously determined due to the aperture problem, but by combining constraints over a larger region, the aperture problem is overcome. For the rest of the thesis, c will denote the vector

    c = (cx, cy, ct)^T        (2.2)

with the property that

    c^T (vx, vy, 1)^T = 0.        (2.3)

Note that scaling the constraint vector, c, does not change the constraint on the motion, eq. 2.1. We use the magnitude of the constraint vector to denote a confidence, i.e. a measure of how much we trust the estimate. Terminology will be sloppy, and the vector c itself is often called a motion constraint.

Example 2.2.1 (Intersecting Constraints): Assume the motion is a pure translation and that we have been able to estimate motion constraints, eq. (2.1), without errors. Then there is a unique motion (vx, vy) that satisfies all the constraints. As in figure 2.3, the motion can be found graphically by plotting all motion constraints in (vx, vy)-space. The intersection is the correct motion.

Figure 2.3: A number of constraints.
For pure translational motions, all constraints intersect at a common point in (vx, vy)-space.

This representation is trivial when there is only one motion. It will be used more for a better understanding of multiple motions. For other motions than pure translations, we may use parametric motion models as described in chapter 3. We then need to draw constraints in as many dimensions as there are parameters. The constraints are represented by hyperplanes that all intersect in a point corresponding to the motion.

2.3 Warping Image to Estimate Large Motions with High Accuracy

To estimate large motions with high accuracy, it is common to use a coarse-to-fine approach. Motion estimates from the coarse scale are used to warp the image, and the estimates can then be refined at a finer scale. For best accuracy, more than one iteration is done at each scale. This scheme is called iterative refinement[29].

One potential problem is that a good match at a coarse scale is not necessarily a good match at a finer scale. Another problem is the subpixel warp, which means resampling of the image. In imaging, unlike audio, resampling usually means degradation, since images are not perfectly bandlimited before sampling and cannot be reconstructed without error. There are several methods of interpolation, but we simply use bilinear interpolation for maximum locality and obtain images that look good to the human eye. However, even if images warped with bilinear interpolation look quite good to the human eye, they may not look good to motion estimation algorithms.

In section 2.3.3 we will present a method that avoids subpixel warps. In chapter 8, another method is presented where nothing is warped at all. Both these methods need to assume that rotations and deformations are so small that the image motion can locally be described as a translation.
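A subpixel warp with bilinear interpolation can be sketched in a few lines. This is a minimal illustration under our own simplifying choices (border samples are clamped; the thesis implementation is not shown):

```python
import numpy as np

def bilinear_warp(image, vy, vx):
    """Warp an image by a (possibly subpixel) displacement field using
    bilinear interpolation, sampling I(x + v(x)).  Out-of-range sample
    positions are clamped to the image border."""
    H, W = image.shape
    ys, xs = np.mgrid[0:H, 0:W]
    sy = np.clip(ys + vy, 0, H - 1)
    sx = np.clip(xs + vx, 0, W - 1)
    y0, x0 = np.floor(sy).astype(int), np.floor(sx).astype(int)
    y1, x1 = np.minimum(y0 + 1, H - 1), np.minimum(x0 + 1, W - 1)
    wy, wx = sy - y0, sx - x0
    return ((1 - wy) * (1 - wx) * image[y0, x0]
            + (1 - wy) * wx * image[y0, x1]
            + wy * (1 - wx) * image[y1, x0]
            + wy * wx * image[y1, x1])

img = np.arange(16.0).reshape(4, 4)
# Uniform half-pixel shift in x: each output pixel is the average of
# two horizontal neighbours.
warped = bilinear_warp(img, np.zeros((4, 4)), np.full((4, 4), 0.5))
```
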
2.3.1 Conventional Iterative Refinement

A conventional scheme for iteratively estimating large motions with good accuracy is presented in figure 2.4. After each iteration, the original images are warped, and the next iteration estimates the error left by the previous one. The error is supposed to converge to zero.

Figure 2.4: Iterative refinement for motion estimation from two image frames, I_A(x) and I_B(x). Estimated motions are used to warp the image so that only a small motion remains to be estimated in the next iteration.

2.3.2 Compensating the Constraint

It turns out that the approach of warping the image and estimating errors, as in figure 2.4, causes difficulties when estimating multiple motions, where the image is warped once for each of the motion layers. The major problem is the incompatibility between constraints computed from different warps, i.e. it is complicated to use constraints computed from one warp together with constraints from another warp. We will show how to compensate for the warp directly in the constraint. Let (w_x, w_y) denote the local warp and let (c̃_x, c̃_y, c̃_t) denote a motion constraint estimated from a warped image. That constraint is an estimate of the motion relative to the warp,

    c̃_x (v_x - w_x) + c̃_y (v_y - w_y) + c̃_t = 0.    (2.4)

Thus, the correct motion constraint vector is

    c = (c̃_x, c̃_y, c̃_t - c̃_x w_x - c̃_y w_y)^T.    (2.5)

2.3.3 Iterative Refinement without Subpixel Warps

Thanks to eq. (2.5), we can compute the correct constraint even if the warp is not exact. This enables warping without subpixel accuracy, where the local shifts can be rounded to integral pixels. An overview of the scheme is presented in figure 2.5. Note that unless the motion is a pure translation, it is no good to apply any spatial operations after warping with integral local shifts. In particular, we have to compute spatial gradients before warping.
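The compensation of eq. (2.5) only touches the temporal component of the constraint. A minimal sketch (the function name is ours, not from the thesis):

```python
import numpy as np

def compensate_constraint(c_tilde, w):
    """Eq. (2.5): turn a constraint estimated on a warped image into a
    constraint on the true motion; only the temporal component changes."""
    cx, cy, ct = c_tilde
    wx, wy = w
    return np.array([cx, cy, ct - cx * wx - cy * wy])

# Check: a constraint on the residual motion v - w, once compensated,
# is satisfied by the true motion v (eq. 2.3).
v = np.array([3.2, -0.7])
w = np.array([3.0, -1.0])
g = np.array([0.6, 0.8])                        # spatial gradient direction
c_tilde = np.array([g[0], g[1], -g @ (v - w)])  # satisfies eq. (2.4)
c = compensate_constraint(c_tilde, w)
residual = c @ np.array([v[0], v[1], 1.0])      # ~ 0
```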
This limits the method to images where deformations and rotations are so small that they can locally be regarded as translations. The method is fast, since the spatial filters(1) need not be applied in each iteration. A limitation is that it cannot be used in conjunction with all possible methods of estimating motion constraints, c. The motion constraint must be estimated in a separable fashion, where all spatial operations are performed before operations in the temporal direction. The phase-based method in chapter 4 and the conventional gradient method [17] satisfy this requirement. In the gradient method, there are no temporal operations before computing spatial derivatives and there are no spatial operations applied on the temporal derivatives.

Figure 2.5: Our scheme of warping. Instead of warping the image, a number of filter outputs are warped. Since the motion constraints, c, are compensated for the warp directly, it is not necessary to warp with subpixel accuracy (c.f. figure 2.4).

(1) We may want to use filters that are computationally expensive.

Chapter 3 Parametric Motion Models

In this chapter, we assume that a large number of constraints on the local motion are given (c.f. section 2.2), i.e.

    c_k^T v = 0  where  v = (v_x, v_y, 1)^T  and  k = 1, 2, 3, ...    (3.1)

Methods for computing these constraints are described in chapters 4 and 8. The focus of this chapter is how to compute the motion from these constraints, even if the motion is not a pure translation, i.e. the motion depends on the spatial position x,

    v = v(x).    (3.2)

Since the constraint vectors, c, are noisy, it does not make sense to fit a motion perfectly to these constraints. If we tried to fit a motion field to every single constraint, the resulting estimate would be very noisy. Therefore it is necessary to fit a smooth field to the constraints.
How smooth, and in which way, is application dependent. E.g. in an orthogonal projection of a planar surface, the projected image can only be subject to translations, rotations and elongations. If we do motion estimation on a planar surface, all estimated nonrigid deformations should be discarded. There are several different methods of fitting motions to a number of constraints. We think that in many articles [3] these methods are often associated and confused with particular methods for estimating the constraints on the local motion. We have found it simple to use methods where the motion is represented by a number of parameters, e.g. affine motion, which is described by six parameters. In this chapter, we present a general theory for parametric models where the local motion is linear with respect to the parameter vector.

3.1 Our Definition of Parametric Motion Models

A motion model describes how images move relative to one another. The motion is denoted v and describes how many pixels an object moves between two frames. The motion can be either a velocity or a displacement. In the case of two image frames I_A(x) and I_B(x) and no intensity variations, the image intensities are related as

    I_A(x) = I_B(x + v)  for all x.    (3.3)

Unless we have a pure translation, v is not constant over the image. Pure translation is simple, but not adequate in most applications, where tracked features are being distorted or rotated. A popular motion model is the affine transformation, which can handle scaling, rotation and elongations, i.e.

    v = [a_1 a_2; a_4 a_5] (x, y)^T + (a_3, a_6)^T.    (3.4)

Motion models can be designed in many ways. Just for fun, let's consider one more, the quadratic motion model,

    v = [a_7 a_8 a_9; a_10 a_11 a_12] (x², xy, y²)^T + [a_3 a_4; a_5 a_6] (x, y)^T + (a_1, a_2)^T.    (3.5)

We can spot a pattern. All the motion models considered so far can be written as a linear combination of basis functions. Given a set of basis functions, the motion is represented by a set of parameters, a_i.
This seems to be a useful and simple way of describing almost any motion model:

    v = Σ_{i=1}^{N} a_i k_i(x).    (3.6)

To simplify notation, we arrange the coefficients in a parameter vector

    a = (a_1, a_2, ..., a_N)^T    (3.7)

and the basis functions in a matrix

    K(x) = ( k_1(x)  k_2(x)  ...  k_N(x) )    (3.8)

and rewrite eq. (3.6) as a matrix multiplication instead of a sum,

    v = K(x) a.    (3.9)

For the rest of the thesis, boldface a denotes a vector of motion model parameters and K(x) is a matrix whose columns are basis functions.

Example 3.1.1 For the pure translation motion model, K(x) = I is the identity matrix, and for the affine motion model

    K(x) = [x y 1 0 0 0; 0 0 0 x y 1].    (3.10)

It is of course possible to swap the columns in K(x) or form new sets of basis functions.

3.1.1 Finite Element Method (FEM)

For computational efficiency in motion estimation, K(x) should be locally sparse. In other words, the basis functions should have small support, i.e. K_ij(x) = 0 except in a small region of the spatial domain. We might want to express the motion as a linear combination of bumps, or interpolation kernels. We have used bilinear interpolation kernels, which have small support and are continuous. Using interpolation kernels with small support is known as the finite element method. In particular, when we have bilinear interpolation kernels, solid mechanics people say we have linear elements or a first order method. To get a second order method, we must have interpolation kernels with continuous derivatives.

Figure 3.1: One of our favorite motion models used to be a deformable linear mesh. The more complicated the motions are, the more nodes are needed in the mesh. Each node corresponds to bilinear basis functions for horizontal and vertical motions.

The motion model presented here can be extended to describe motion over time, K(x, t). In case the motion is regular over time, a spatiotemporal model can improve accuracy.
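As a small sketch of the parametric representation v = K(x) a, eq. (3.9), with the affine basis of eq. (3.10) (the function name and parameter values are ours, chosen for illustration):

```python
import numpy as np

def K_affine(x, y):
    """Affine basis matrix of eq. (3.10):
    v_x = a1*x + a2*y + a3,  v_y = a4*x + a5*y + a6."""
    return np.array([[x, y, 1.0, 0.0, 0.0, 0.0],
                     [0.0, 0.0, 0.0, x, y, 1.0]])

# Parameter vector a = (a1, ..., a6): a slight expansion plus a shift.
a = np.array([0.1, 0.0, 2.0, 0.0, 0.1, -1.0])
v = K_affine(3.0, 4.0) @ a   # motion at the point (3, 4): (2.3, -0.6)
```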
An interesting model for cyclical heart motion [35] uses truncated Fourier series in the temporal direction and a finite element mesh in the spatial directions.

3.2 Model Based Motion Estimation

To simplify notation, the motion vector is extended with an extra entry that is always unity. For that reason, the K(x) matrix and the parameter vector a are also extended,

    v := (v^T, 1)^T,  a := (a^T, 1)^T  and  K(x) := [K(x) 0; 0^T 1].    (3.11)

This section describes how to estimate motion model parameters from motion constraints. In other words, we have a set of motion constraint vectors, c_k (c.f. section 2.2), and want to compute the best possible parameter vector, a, for the chosen motion model. For simplicity, we fit parameters in a least squares sense to constraints c^T v = 0 where v(x) = K(x) a. Remember that the magnitude of the constraint vector, c, is the confidence measure. Let x_k denote the spatial position of constraint c_k and define the following error measure, which should be minimized with respect to the motion model parameters:

    ε(a) = Σ_k (c_k^T v(x_k))² = Σ_k (c_k^T K(x_k) a)² = Σ_k a^T K(x_k)^T c_k c_k^T K(x_k) a = a^T Q a    (3.12)

where

    Q = Σ_k K(x_k)^T c_k c_k^T K(x_k).    (3.13)

Since the last entry in a is always one, the Q matrix is split into a submatrix, a vector and a scalar,

    Q = [Q̄ q; q^T q_0].    (3.14)

The error can then be expressed as

    ε(a) = a^T Q̄ a + 2 q^T a + q_0    (3.15)

and the motion model parameters are computed as

    a = -Q̄^{-1} q.    (3.16)

3.3 Cost Functions

Even if motions are complicated in a global view, they may be simple locally. Motion models with many parameters allow too irregular motions. This makes the motion estimates susceptible to noise and the aperture problem. The problem is even worse when using the EM algorithm in chapter 6, which gets lost in the first few iterations. It is also a problem when the basis functions in K(x) have small support and some regions suffer from the aperture problem. Our remedy is to discourage deformations by adding a cost function to the error measure ε(a) in eq. (3.12).
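Before turning to cost functions, the least squares fit of section 3.2, eqs. (3.13)-(3.16), can be sketched for the extended affine model. This is a minimal sketch with names of our own choosing; noise-free constraints from a known affine motion are fitted exactly:

```python
import numpy as np

def K_bar(x, y):
    """Extended affine basis of eq. (3.11), acting on (a, 1)."""
    K = np.zeros((3, 7))
    K[0, :3] = [x, y, 1.0]
    K[1, 3:6] = [x, y, 1.0]
    K[2, 6] = 1.0
    return K

def fit_affine(constraints):
    """Eqs. (3.13)-(3.16): accumulate Q and solve a = -Q_bar^{-1} q.
    constraints: list of (x, y, c) with c a 3-vector (c_x, c_y, c_t)."""
    Q = sum(K_bar(x, y).T @ np.outer(c, c) @ K_bar(x, y)
            for x, y, c in constraints)
    Q_bar, q = Q[:6, :6], Q[:6, 6]
    return -np.linalg.solve(Q_bar, q)

# Synthetic, error-free constraints from a known affine motion.
a_true = np.array([0.05, 0.01, 1.0, -0.02, 0.04, 0.5])
constraints = []
for i, (x, y) in enumerate([(0, 0), (2, 0), (0, 2), (2, 2), (3, 1), (1, 3)]):
    v = K_bar(x, y)[:2, :6] @ a_true
    for th in (0.4 * i, 0.4 * i + 1.1):       # varied gradient directions
        g = np.array([np.cos(th), np.sin(th)])
        constraints.append((x, y, np.array([g[0], g[1], -g @ v])))
a_est = fit_affine(constraints)               # recovers a_true
```

With varied constraint directions at well-spread positions, Q̄ is positive definite and the minimizer is unique.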
For simplicity, the cost function is a quadratic form, λ a^T P a, where P is a symmetric matrix with nonnegative eigenvalues. Instead of minimizing the error measure alone, we minimize the sum of the error measure and the cost on deformations,

    ε̃(a) = ε(a) + λ a^T P a = a^T (Q̄ + λP) a + 2 q^T a + q_0    (3.17)

where λ ≥ 0 is a scalar parameter that controls the stiffness and can be included in P, if you like. The larger λ, the more regularization. The reason for using a quadratic error measure is computational efficiency. Compared to not using a cost function, we only have to introduce a matrix addition in eq. (3.16),

    a = -(Q̄ + λP)^{-1} q.    (3.18)

There is no universal way to choose λ, but in one of our implementations it is proportional to the Frobenius norm of Q̄.

3.3.1 Limit on Cost

When using the EM algorithm in chapter 6, we avoid choosing λ explicitly. Instead we set a limit on the cost, i.e. we choose an upper limit on a^T P a and then solve for the smallest λ ≥ 0 that gives a motion estimate below the limit. First we try λ = 0, and if that does not pass the limit, a Newton-Raphson search is applied to a function that is zero at the cost limit,

    f(λ) = a(λ)^T P a(λ) - φ_0    (3.19)

where φ_0 is the upper limit. Newton-Raphson solves f(λ) = 0 in a number of iterations,

    λ_{n+1} = λ_n - f(λ_n) / f'(λ_n).    (3.20)

The derivative in the denominator is computed as (note that a = a(λ) is a function of λ)

    f'(λ) = 2 a^T P da/dλ = 2 a^T P (Q̄ + λP)^{-1} (-Pa) = -2 a^T P (Q̄ + λP)^{-1} Pa    (3.21)

where da/dλ = -(Q̄ + λP)^{-1} Pa was obtained by differentiating (Q̄ + λP) a = -q, which gives (Q̄ + λP) da/dλ + Pa = 0. The second derivative can be computed by differentiating again, i.e. (Q̄ + λP) d²a/dλ² + 2P da/dλ = 0. This gives d²a/dλ² = -2 (Q̄ + λP)^{-1} P da/dλ = 2 (Q̄ + λP)^{-1} P (Q̄ + λP)^{-1} Pa.

Theorem 3.3.1  f'(λ) ≤ 0 and f''(λ) ≥ 0 for all λ ≥ 0.

Proof: Note that Q̄ is symmetric, as it is defined. Without loss of generality, we can also assume P is symmetric. (Every cost function written with a nonsymmetric P can also be written with a symmetric P.)
Then it is obvious that f'(λ) ≤ 0, since (Q̄ + λP)^{-1} is positive definite. Next, with B = (Q̄ + λP)^{-1} Pa = -da/dλ,

    f''(λ) = 2 (da/dλ)^T P (da/dλ) + 2 a^T P d²a/dλ² = 2 B^T P B + 4 B^T P B = 6 B^T P B.    (3.22)

Since f''(λ) is a quadratic form in P, and P has non-negative eigenvalues, f''(λ) cannot be negative.

Thus, the sequence of costs a(λ_n)^T P a(λ_n) will decrease towards the limit, but never reach below it. To get below, we modified f(λ) by replacing φ_0 with some value just below the limit.

3.3.2 Designing Cost Functions

There is no universal way of designing cost functions. We have tried to design cost functions without much theory. It is easy when there are only a few parameters in our motion model, but it gets harder when there are more degrees of freedom. By mistake, we may forget to add a cost on deformations that should be forbidden. We have developed a method of designing cost functions for any motion model with many parameters. We have used it to design cost functions that make a deformable mesh behave locally like an affine transformation. The fundamental idea is to compare the estimated motions in a region with the closest possible affine transformation.

Example 3.3.1 To illustrate the approach of designing a cost function, let's look at an example that solves a different and much simpler problem. Assume we would like to measure the roughness of a signal s. Let s_lp denote a low pass filtered version thereof. We may define roughness = ‖s - s_lp‖. The cost is simply defined as the difference between the signal and the closest signal that is free from high frequency components.

The same idea is used when designing a cost on deformations. The cost on the motion is the difference to the closest motion without (local) non-affine deformations. To define the cost, we locally fit an affine model to the estimated motions. The cost is the square difference between the motion estimate and the affine model that we fit to the same estimates.
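The limited-cost search of section 3.3.1 can be sketched as follows. This is a minimal sketch with our own names; by theorem 3.3.1, the Newton iterates approach the cost limit from above:

```python
import numpy as np

def limited_cost_solution(Q_bar, q, P, limit, iters=30):
    """Section 3.3.1: find the smallest lambda >= 0 such that
    a(lam) = -(Q_bar + lam*P)^{-1} q satisfies a^T P a <= limit,
    via Newton-Raphson on f(lam) = a^T P a - limit (eqs. 3.19-3.21)."""
    a = -np.linalg.solve(Q_bar, q)
    if a @ P @ a <= limit:
        return 0.0, a                      # lambda = 0 already passes
    lam = 0.0
    for _ in range(iters):
        M = Q_bar + lam * P
        a = -np.linalg.solve(M, q)
        f = a @ P @ a - limit
        da = -np.linalg.solve(M, P @ a)    # da/dlambda
        lam -= f / (2 * a @ P @ da)        # eq. (3.20) with f' of eq. (3.21)
    return lam, -np.linalg.solve(Q_bar + lam * P, q)

# Toy example: the unregularized solution (2, 1) has cost 5; the search
# increases lambda until a^T P a reaches the limit 1.
Q_bar = np.diag([1.0, 4.0])
q = np.array([-2.0, -4.0])
P = np.eye(2)
lam, a = limited_cost_solution(Q_bar, q, P, limit=1.0)
```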
Let K̃(x) be the matrix containing the basis functions of the affine model and let ã be the affine parameters. The ã-parameters are locally computed from the estimated motion, v(x) = K(x) a:

    γ(a) = Σ_{all regions} min_ã ∫∫_{region} ‖K̃(x) ã - K(x) a‖² dx dy.    (3.23)

The regions must overlap, since this method does not put a cost on deformations at the region borders. Evaluating γ(a) into an explicit formula gives a quadratic cost function of a for each local region. These cost functions are summed into a global cost function that is also quadratic and can be applied as described in section 3.3. Note that the use of this method is not restricted to regularization for the finite element model. Also note that it does not have to be affine models that we locally impose. Instead of using affine motion models as a reference, we may use translational or quadratic motion models. It is even possible to use a mixture thereof, just by adding cost functions designed in different ways. For example, we can put a low cost on translations and a high cost on affine deformations.

3.4 Relation to Motion Estimation from Spatiotemporal Orientation Tensors

Knutsson and Granlund [13] have used 3D orientation tensors to estimate motion. Their tensors are 3x3 matrices that are estimated locally in the image. To overcome the aperture problem, it may be necessary to low pass filter the tensors. Pure translation can be estimated by summing tensors over the entire image. They suggest motions should be estimated by minimizing

    ε = v^T T v / (v^T v).    (3.24)

This is similar to our least squares method, described in section 3.2. In fact, our least squares fit is a minimization of

    ε = v^T T v  where  T = Σ c c^T.    (3.25)

Which of eq. (3.25) and eq. (3.24) gives the best motion estimate depends on how the tensors are estimated. Knutsson and Granlund use spatiotemporal filter banks to estimate the tensors. The motion is then estimated without warping the image.
For their tensors, one can assume that the angular error of (v_x, v_y, 1)^T is independent of the angle. For our tensors, which are estimated from warped images, we assume that the absolute error of (v_x, v_y) is independent of the size of the motion. To estimate affine motions, Farneback [9] expanded the tensors to size 7x7 before summing them together. His approach can be generalized to any of our motion models by replacing c_k c_k^T with T_k in eq. (3.13).

3.5 Local-Global Affine Model

In some applications the motion field consists of several complicated deformations. Initially, we used the finite element motion model, which models image motions as a mesh that deforms. This method becomes computationally expensive when the image is divided into many cells. Recall that it takes O(N³) operations to solve a linear equation system, where N denotes the number of unknowns. There are two unknowns for every node in the mesh, and the number of nodes is proportional to the square of the resolution. Thus, measured in the image resolution, we have an O(N⁶) algorithm. Another difficulty with the finite element method is designing the cost function, since it must depend on the image. The cost function should depend on the distribution of the magnitudes of the image or motion constraints. It is also an issue how it should behave at the borders of the image. We have images where the valid region can be a circular or rectangular region of the total image. Since the valid region is not known a priori, a new cost function needs to be computed for every image.

Instead of using a global parametric model with many parameters, we use local affine models with global smoothing. The remedy to the aperture problem is low pass filtering, not cost functions. Of course, we cannot estimate motions first and then low pass filter the motion vectors. Instead we low pass filter the coefficients of Q. Although the model is local, we use a global coordinate system for the affine parameters, to enable low pass filtering of coefficients.
The reader should convince him/herself that averaging equation system coefficients over the entire image is equivalent to a global affine model. Averaging over a region is equivalent to using an affine model in that region. Recall that averaging is equivalent to convolution with a kernel with a constant value. You might stop and ask what happens if we use some other non-negative kernel, e.g. a Gaussian. This is, in fact, equivalent to weighting the constraints differently in eq. (3.26). Usually, we are more interested in constraints in the near neighborhood than far away. In order to remedy the aperture problem, it is still necessary to let motion estimates in one corner of the image influence motion estimates in the opposite corner. In terms of formulas, we modify eq. (3.13) from global to local values of Q,

    Q(x) = Σ_k W(x - x_k)² K(x_k)^T c_k c_k^T K(x_k)    (3.26)

where W(x) is a windowing function, e.g.

    W(x) = 1 / (‖x‖^α + β)    (3.27)

where β determines the size of the local region. A large β will make the motion estimate more global. We recommend that the other parameter satisfies α > 2 (otherwise there is theoretically no locality, since ∫∫ W(x, y) dx dy diverges). Estimating motion locally rather than globally is of course more sensitive to noise and the lack of local structure. An interesting property of this method is that if the window is strictly positive everywhere, i.e. W(x) > 0 for all x, then the local Q(x) has the same rank as the global Q. All structure in the image contributes to all local matrices. This means that this method produces a motion estimate at every point in the image.

3.5.1 Efficient Implementation of the Local-Global Affine Model

Our implementation of the local affine motion model is almost as fast as the global affine motion model. The window function is implemented as a convolution with a low pass filter. First we compute a local version of Q(x) using a window function that is unity in a very small neighborhood and zero everywhere else.
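The separable low pass filtering of the field of Q matrices can be sketched as below. This is a minimal NumPy sketch, not the thesis implementation, and it omits the subsampling and the per-pixel solve of the affine parameters:

```python
import numpy as np

def smooth_Q_field(Qf, kernel):
    """Sketch of section 3.5.1: separable low pass filtering of a field
    of Q matrices. Qf has shape (rows, cols, 7, 7); every coefficient
    plane is convolved with the same 1-D kernel along both spatial axes."""
    out = Qf.astype(float)
    for axis in (0, 1):
        out = np.apply_along_axis(
            lambda s: np.convolve(s, kernel, mode='same'), axis, out)
    return out

# A constant field stays constant away from the borders when the kernel
# is normalized (kernel sums to one).
Qf = np.ones((5, 5, 7, 7))
sm = smooth_Q_field(Qf, np.array([0.25, 0.5, 0.25]))
```

After the smoothing, the affine parameters at each point are obtained from the local Q(x) exactly as in eq. (3.16).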
To save computations, subsampling(1) of the matrix field is applied at the same time. This field of matrices is convolved with the window function, which is modified a little to make it separable. For each point in the low pass filtered matrix field, the affine parameters are solved for. Since the matrix field was subsampled, we need to upsample the estimated motion field. The upsampling is done by bilinear interpolation.

(1) We recommend that the blocksize in the subsampling is significantly smaller than β in eq. (3.27).

Efficient implementation of the low pass filter, using separable and spread-out kernels, reduces the computational complexity from O(N⁶) for the finite element model to O(N³).

Chapter 4 Estimation of Motion Constraints

The focus of this chapter is a novel method for estimating constraints on the local motion, c, as defined in section 2.2. The input is two image frames and the output is a number of (possibly conflicting) constraints for each pixel. This method can be used in conjunction with the parametric motion models in chapter 3 and even for estimation of multiple motions in chapter 6.

4.1 Existing Methods

Before describing our method, we will briefly describe other existing motion estimation methods and argue why we do not use them.

4.1.1 Intensity Conservation Gradient Method

Traditional methods for optical flow are based on the assumption of intensity conservation over time. For X-ray images this is not valid, since (i) images in the same sequence have slightly different levels due to different X-ray exposure, (ii) contrast injection may darken the image, at least locally, and (iii) multiple layers may interfere. It may be possible to remedy these problems with prefiltering [27, 1], and advanced prefiltering can be similar to our method.
4.1.2 Point Matching

Another popular method is point matching, where a region in one image is matched to regions in another image using some correlation scheme. Some kind of correlation measure is computed, and the algorithm chooses the match that gives maximum correlation. An alternative to maximizing correlation is to minimize a dissimilarity measure. To speed up the matching, it is possible to use some gradient method or iterative search instead of explicitly computing the correlation for all possible shifts.

Due to the aperture problem, point matching methods are only suitable for matching regions of the image that have structure in more than one orientation, e.g. corners and line crossings. The features to track must first be found. For our medical images, point matching is not a good alternative. There are far fewer corners than edges. Thus, a point matching method would only use a fraction of the information in the images.

4.1.3 Spatiotemporal Orientation Tensors

Estimating image velocity using three dimensional filter banks has proven accurate [13, 9, 10, 21] in other applications. The idea is to consider a sequence of images as a three dimensional spatiotemporal volume in the variables x, y, t. This three dimensional thinking is the same as for the gradient method [17] of optical flow, but instead of computing gradients, a set of filters is used to measure local orientation, which corresponds to the three dimensional motion vector. The most successful method is probably [13, 21], based on a set of nine quadrature filters. The energies of each of the quadrature filter outputs are computed and combined into an orientation tensor. Apart from the high accuracy, the method is good at using all the information from both edges and corners. All the information is implicit in the tensors. The aperture problem is simply overcome by low pass filtering of tensors. All information, including certainty, can be extracted using eigenvector decomposition.
Unfortunately, spatiotemporal filtering approaches are not useful in our applications. The frame rate is too low and patient motions are too large and irregular over time. In terms of signal processing, we have severe aliasing due to the low temporal sampling rate. Thinking of the image sequence as a spatiotemporal volume is not helpful. Another reason for not using spatiotemporal filtering is that we want to estimate the displacement, not the velocity. If we used spatiotemporal filtering, we would get velocity vectors that had to be followed over time, which would result in accumulation of errors.

4.2 Phase Based Quadrature Filter Method

Using quadrature filter phase is a relatively common approach in stereo algorithms [33, 12]. The idea of using phase for motion estimation has previously been investigated by some researchers [8, 6, 11], but to our knowledge, nobody has tried this approach, which extends the accurate stereo algorithms to estimate relative motions from two image frames. Our method is almost a gradient-based method with nonlinear preprocessing of the images. To improve accuracy, a confidence measure has been added. The method presented in this thesis has been published both as an independent method [14] and in the context of an angiography application [15].

Definition 4.2.1 A filter is a quadrature filter [13] if its Fourier transform, F(u), is zero on one side of a hyperplane through the origin, i.e. there is a direction n̂ such that

    F(u) = 0  for all u with n̂^T u ≤ 0.    (4.1)

Quadrature filter outputs are closely related to analytic signals. Note that quadrature filters must be complex in the spatial domain. We only use filters that are real in the Fourier domain.

4.2.1 Motion Constraint Estimation

The input to the algorithm is two image frames, denoted I_A(x) and I_B(x), and the output is a number of motion constraints, c, at each pixel.
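The one-sided spectrum of definition 4.2.1 can be illustrated in 1-D: a filter that is real and band-pass in the Fourier domain and zero for non-positive frequencies gives a complex output whose phase difference between two frames encodes the shift. A sketch with toy parameters of our own choosing (the center frequency is an exact FFT bin, 32/256, to keep the example clean):

```python
import numpy as np

N = 256
u = np.fft.fftfreq(N)                      # frequencies in cycles/sample
u0 = 0.125                                 # filter center frequency
F = np.exp(-((u - u0) / 0.03) ** 2)        # real band-pass around u0
F[u <= 0] = 0.0                            # one-sided: quadrature property

shift = 3.0
n = np.arange(N)
IA = np.cos(2 * np.pi * u0 * n)
IB = np.cos(2 * np.pi * u0 * (n - shift))  # IA moved `shift` samples

qA = np.fft.ifft(np.fft.fft(IA) * F)       # complex filter outputs
qB = np.fft.ifft(np.fft.fft(IB) * F)

# The wrapped phase difference at any sample recovers the shift.
dphi = np.angle(qB[N // 2] * np.conj(qA[N // 2]))
shift_est = -dphi / (2 * np.pi * u0)
```

This is only a toy illustration of the quadrature property, not one of the thesis's 2-D direction-tuned filters.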
A number of quadrature filters are applied in parallel to each of the two image frames, producing the same number of filter outputs. The quadrature filters are tuned to different directions and frequency bands in order to split dissimilar features into different filter outputs, so that they do not interfere in the motion estimation. The quadrature filters also suppress undesired features like the DC value and high frequencies. Unlike the conventional gradient method, our method is not sensitive to low pass variations in image intensity, which are frequent in medical X-ray images, and in real world images where shadows and illumination vary.

The quadrature filters can be chosen to have different directions and different frequency bands, but all of our implementations have four filters in the same frequency band but in different directions, as shown in figure 4.1. These filters are denoted f_1(x), f_2(x), f_3(x) and f_4(x) and are tuned to 0, 45, 90 and 135 degrees. Both input images are convolved with each of the filters,

    q_{A,j}(x) = (f_j * I_A)(x)  and  q_{B,j}(x) = (f_j * I_B)(x)    (4.2)

where f_j(x) is a quadrature filter and I_A(x) and I_B(x) are the image intensities of the two frames, respectively. The phase is defined as the phase angle of the complex numbers,

    φ_{A,j}(x) = arg q_{A,j}(x)  and  φ_{B,j}(x) = arg q_{B,j}(x).    (4.3)

In all ensuing computations, we must remember that phase is always modulo 2π, but for readability we drop this in our formulas and notations. In most image points, the filter outputs are strongly dominated by one frequency, which makes the phase nearly linear in a local neighborhood. When the phase is linear, it can be represented by its value and gradient. Thus, a gradient method applied to the phase will be very accurate. Of course, the phase is not always linear in a local neighborhood, but that can be detected and reflected by a confidence measure. For each point in the image, and for each quadrature filter output, a constraint on the local motion is computed.
To simplify notation, we drop the index, j, of the quadrature filter:

    c = (c_x, c_y, c_t)^T = C · ( (1/2) ∂/∂x (φ_B + φ_A), (1/2) ∂/∂y (φ_B + φ_A), φ_B - φ_A )^T.    (4.4)

Figure 4.1: From image to motion constraint for one direction of quadrature filters. The quadrature filter outputs are complex values, but that would take colors to illustrate, so only phase images are shown. Note that phase is wrapped modulo 2π.

Since the phase is locally almost linear, the derivatives can be computed as a difference between two pixels. The motion constraint vector is the spatiotemporal gradient of the phase, weighted by the confidence measure, C, which will be introduced in the next section.

4.2.2 Confidence Measure

Using a confidence measure is necessary to give strong features precedence over weaker features and noise. In addition, it is necessary to avoid phase singularities [33, 20], which occur when two frequencies interfere in the filter output. These singularities must be discovered and treated as outliers. All this is done by assigning a confidence value to each constraint. Our confidence measure is inspired by the stereo disparity algorithm by Westelius [33], which in turn is inspired by [7]. It is a product of several factors, where the most important factor is the magnitude. Our confidence measure for magnitude may seem complicated at first glance. Apart from suppressing weak features, it is also sensitive to differences between the two frames. This reduces the influence of structure that only exists in one of the images, such as moving shadows, appearing objects and other features not moving according to the motion we estimate.
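Before the confidence factors are defined, the constraint computation of eq. (4.4) can be sketched on synthetic phase fields. This is a minimal sketch with our own helper names, the confidence factor C omitted, and explicit wrapping, since phase is only defined modulo 2π:

```python
import numpy as np

def wrap(p):
    """Wrap phase values to (-pi, pi]."""
    return np.angle(np.exp(1j * p))

def phase_constraint(phiA, phiB):
    """Eq. (4.4) without the confidence factor: c = (cx, cy, ct) with
    cx, cy the spatial derivatives of the average phase and
    ct = phiB - phiA. Derivatives use wrapped central differences."""
    avg = phiA + 0.5 * wrap(phiB - phiA)
    cx = 0.5 * wrap(np.roll(avg, -1, axis=1) - np.roll(avg, 1, axis=1))
    cy = 0.5 * wrap(np.roll(avg, -1, axis=0) - np.roll(avg, 1, axis=0))
    ct = wrap(phiB - phiA)
    return cx, cy, ct

# Synthetic, locally linear phase: a plane wave moving v = (2, 0.5) pixels.
y, x = np.mgrid[0:32, 0:32].astype(float)
k = (0.3, 0.2)                              # spatial frequency (rad/pixel)
phiA = wrap(k[0] * x + k[1] * y)
phiB = wrap(k[0] * (x - 2.0) + k[1] * (y - 0.5))
cx, cy, ct = phase_constraint(phiA, phiB)
# At any interior pixel, cx*vx + cy*vy + ct ~ 0 for v = (2, 0.5), eq. (2.1).
residual = cx[16, 16] * 2.0 + cy[16, 16] * 0.5 + ct[16, 16]
```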
    C_mag = |q_A|² |q_B|² / (|q_A|² + |q_B|²)^{3/2}    (4.5)

Other factors have been added to reflect whether the phase gradient is sound for the specific quadrature filter in use. Negative frequencies are illegal and indicate phase singularities [20, 33]:

    C_{freq>0} = 1 if n̂^T ∇φ > 0, 0 otherwise.    (4.6)

Our confidence measure is also sensitive to high frequencies, which may indicate an error in the filter output or signal a probability of negative frequencies wrapping around modulo 2π:

    C_{freq.wrap} = 1 if ‖∇φ‖ < ω_max, 0 otherwise    (4.7)

where ω_max is related to the upper cutoff frequency of the quadrature filter. Frequencies above this are probably false, and there is also an increased probability of wrap-around from negative frequencies. It might be better with a continuous drop-off in confidence, but this binary function is computationally efficient, since C = 0 can be represented by "NaN" in floating point arithmetic. We also guard against phase difference wrap-arounds:

    C_{phase.wrap} = 1 if |φ_B - φ_A| < φ_max, 0 otherwise.    (4.8)

When computing the frequency, it is also useful to check consistency between the two images in order to avoid features that only exist in one of the images:

    C_{freq.cons} = max(0, 1 - ‖∇φ_A - ∇φ_B‖² / (ω_max.diff (‖∇φ_A‖² + ‖∇φ_B‖²)))    (4.9)

where we have heuristically set ω_max.diff = 1. Finally, the total confidence is computed as the product of all the confidence measures, i.e.

    C = C_{freq>0} · C_{freq.wrap} · C_{phase.wrap} · C_mag · C_{freq.cons}.    (4.10)

4.2.3 Multiple Scales and Iterative Refinement

To estimate large motions with the best possible accuracy, we apply motion estimation iteratively in multiple scales. We begin at the coarsest scale in a low pass pyramid to compute a rough estimate. Then we warp the image, or its filter outputs, and do a new iteration at a finer scale. For best accuracy, we can do multiple iterations at each scale. When estimating a motion constraint from a warped image, we get a constraint on the motion relative to the warp.
Similarly, subsampling alters the estimated motion constraints to yield smaller motion estimates. It is, however, simple to compensate for the warp and the subsampling. Assume the image is warped (w_x, w_y) pixels and subsampled s octaves prior to estimation of a motion constraint, c̃ = (c̃_x, c̃_y, c̃_t). Then we have in fact estimated that

    c̃_x (v_x 2^{-s} - w_x) + c̃_y (v_y 2^{-s} - w_y) + c̃_t = 0.    (4.11)

Thus, the correct motion constraint is

    c = (c̃_x, c̃_y, 2^s (c̃_t - c̃_x w_x - c̃_y w_y))^T.    (4.12)

In order to avoid subpixel warps, the method in figure 2.5 is used.

4.3 Experimental Results

We have used the phase-based method on various image data, and it has always turned out to be advantageous compared to the conventional gradient method. One important application is motion compensation in sequences of medical X-ray images, digital subtraction angiography. The conventional gradient method fails to estimate motions accurately, due to the different DC levels in the frames and the motions of the injected contrast agent. Suppressing low frequencies helps a lot, but our phase-based method is still superior.

4.3.1 X-ray Angiography Images

Figures 4.2 - 4.5 show a comparison for a medical X-ray angiography sequence. Image subtraction is used to extract the vessels and remove the bones and tissue. We get much fewer motion artifacts when using phase-based motion estimation. Constraints over the image are integrated to fit a local-global deformable motion model [16] in a least squares sense. We have used four quadrature filters in different directions in conjunction with multiple scales and iterative refinement.

4.3.2 Synthetic Images

We have also compared the accuracy on images where the motions come from synthetic shifts. A real world test image has been shifted different amounts in different directions. To avoid influence from subpixel warps, the image has been subsampled after the warp.
One might expect the conventional gradient method to work quite well on these images, which have perfect intensity conservation between frames. Still, our phase-based method is more accurate, as shown in figure 4.6.

4.3.3 Synthetic Images with Disturbance

In angiography, the contrast injection causes disturbing changes in the image that may also disturb the motion estimation. We have made an experiment on synthetic images to show that our phase-based motion estimation is less susceptible to such disturbance than the conventional gradient method, often referred to as optical flow [17]. We have used synthetically shifted images to evaluate the accuracy when one of the image frames is disturbed. We have used a popular reference image, Lena 256x256, which has been shifted and then subsampled to hide artifacts due to subpixel shifts. The shifts are in all possible directions, and we have computed the average performance for all shifts of the same distance. As shown in figure 4.7, our novel method performs significantly better. Since we use iterative refinement, it is most relevant to study performance for shifts less than \sqrt{2}/2 \approx 0.7 pixels. After convergence, the warp reduces the motion to less than half a pixel in each of the x and y directions.

4.4 Future Development

The confidence measure in this thesis is designed without much theory or experiments. It might be possible to get better accuracy with application specific confidence measures. For instance, in some applications it may be more or less important to check consistency between frames. In general, it may be that the confidence measure factor on magnitude, eq. (4.5), should depend on the noise level. Instead of being linear in the magnitude, it could be a sigmoid function that gives almost equal confidence to all features that are well above the noise level.
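The suggested sigmoid replacement for the magnitude factor could look like the following sketch; the noise level parameter, the sharpness and the exact shape are assumptions, not part of the thesis.

```python
import numpy as np

def sigmoid_magnitude_confidence(magnitude, noise_level, sharpness=4.0):
    """Hypothetical replacement for a magnitude-linear confidence factor:
    features well above the noise level all get confidence close to 1,
    while features near or below the noise level are suppressed."""
    return 1.0 / (1.0 + np.exp(-sharpness * (magnitude / noise_level - 1.0)))
```

Any monotonic saturating function with a knee near the noise level would serve the same purpose; the logistic function is just the simplest choice.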
Figure 4.2: Original X-ray images.

Figure 4.3: Subtraction images, no motion compensation.

Figure 4.4: Subtraction images, motion compensation based on the conventional gradient method, after filtering out low frequencies.

Figure 4.5: Subtraction images, motion compensation based on our phase-based method. Note that there are fewer artifacts compared to figure 4.4. (The confidence measure differs [16] slightly from the text.)

Figure 4.6: The phase-based method is more accurate than the conventional gradient method. These plots show error in estimate (pixels) versus true shift (pixels) on images (Lena 256x256 and Debbie 128x128) that are shifted synthetically. One pass estimation, i.e. no iterative refinement.

Figure 4.7: The phase-based method is more accurate than the conventional gradient method. This figure shows error in estimate (pixels) versus true shift (pixels) on images (Lena 256x256) that are shifted synthetically (shifting Lena 512x512 before subsampling). One of the image frames has been disturbed by adding a transparent stripe across the image, in order to simulate a contrast bolus. (One pass estimation, i.e. no iterative refinement.)

Chapter 5

General Problems in Multiple Motion Analysis

5.1 Introduction

In estimation of multiple motions, there are a number of difficulties that are not present in estimation of a single motion. In the general case, estimation of multiple motions is a very difficult problem. All motions in the image need to be classified and clustered into an unknown number of fields described by unknown models. Even counting the number of motions in an image is a problem.
This requires some criterion to tell how different two motions must be before they are classified as two motions instead of one. In the algorithms presented in chapter 6, it is assumed that the number of motions is known a priori.

Multiple motion problems can be classified into transparent or occluding motions. The case of occluding motions is the most common in real world images, e.g. a scene of multiple opaque objects at different depths, moving at different image velocities. Our research is, however, primarily dedicated to the problem of transparent motions. In X-ray images, we get a projection of structure at different depths. The X-rays go through all parts and nothing is occluded. The logarithm of the image is thus the sum of the X-ray attenuation at all depths. Our approximate model of the human body is a set of transparent layers that move independently. For example, an approximate model of X-ray images of the heart might be four transparent layers: two layers are the front and back ribs, and the other two layers are the front and back walls of the heart.

5.2 Motion Constraints

In the estimation of single motion in chapter 4, we assumed that the structure in a medical image is primarily edges and only few corners. Thus, constraints on local motion, c, are a representation that holds almost all relevant information in the image. In the case of multiple layers, we assume that the motion of each layer can be described by motion constraints. For transparent layers, we also assume a sparse abundance of edges, and that small regions in the image are usually dominated by structure from only one layer. Under that assumption, it is possible to estimate constraints on the local motion. Each estimated motion constraint describes the motion of one layer, but we do not know which. An image can yield a million motion constraints that need to be explicitly or implicitly clustered into multiple layers.
5.3 Correspondence Problems

As we have already mentioned, there are correspondence problems when a large number of motion constraints are given from one image. Here follows a more detailed description of different types of correspondence problems.

5.3.1 Minimal Number of Motion Constraints

Generalized aperture problem [19]: Assume translation. In the case of one motion, it is enough to see how two edges move to estimate the motion. But in the case of two transparent motions, we need at least five edges or independent motion constraints. Figure 5.1 illustrates that four motion constraints are never enough to estimate the motion of two layers, due to ambiguous solutions. Adding a fifth constraint resolves the ambiguity, provided that one layer has three constraints and the other layer has two constraints.

Theorem 5.3.1 Assume there are M motion layers. The motion of each layer is represented by motion models with N parameters. Then we need at least MN + M - 1 motion constraints, of the type c_x v_x + c_y v_y + c_t = 0, to compute the motions of all the layers.

Proof: There are N parameters per motion layer. Thus there are MN unknown parameters. In addition, there are hidden unknowns telling which constraints belong to the same layer. Assume MN constraints are given. Then there would be \binom{MN}{N} points in parameter space where N constraints intersect. Only M of these correspond to a true motion. But if one extra constraint is added for all layers but one, i.e. a total of M - 1 new constraints, then there are exactly M - 1 points where N + 1 constraints intersect. All of these M - 1 points are true motions. The M:th motion is unambiguously given. As shown in figure 5.1, it is the only point of N constraints that remains after removal of the constraints belonging to the first M - 1 motions. So far, we have shown that MN + M - 1 motion constraints are enough. To complete the proof, we must also note that fewer constraints would cause ambiguities.
If there are fewer constraints, there are two cases. Case I: One motion has N - 1 constraints, or even fewer. Then it is impossible to solve for its N motion parameters. Case II: At least two motions have fewer than N + 1 constraints. The constraints of these motions will give \binom{2N}{N} intersections of N constraints. If N \geq 2, it is impossible to tell which of these correspond to the real motions.

Figure 5.1: To estimate two velocities, it is not enough to have four constraints, but five might be enough. The left figure shows four constraints and all possible solution cases. (Would you choose the circles, the squares or the diamonds?) With a fifth constraint, it is easy to realize that the circles represent the only solution. The third figure shows that five constraints are not always enough. (Panels, in the (v_x, v_y) plane: four constraints (always ambiguous); five constraints (enough); five constraints (not enough).)

The problem is in fact worse than described here, since motion constraint vectors can be linearly dependent and noisy. Motion constraints will never intersect exactly at a number of points corresponding to each layer. In practice, the abundance of motion constraints will be denser at some points, and it is hard to tell which constraint belongs to which layer.

5.3.2 Problem: Correspondence Between Estimates in Different Parts of the Image

Assume we have been able to locally estimate two or more motion vectors, \tilde{v}_1, \tilde{v}_2, \ldots, at every single pixel in an image. Then it remains to tell which motion vectors belong to the same layer or object. To illustrate the difficulties, we will study a case with ambiguous solutions. Figure 5.2 shows a field of two motion vectors and two possible solutions of splitting up the vectors into two smooth fields. Of course, it is unlikely that two motion fields will be equal along a path across the image. Although this will not happen in practice, we may get something quite close.
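The counting argument of Theorem 5.3.1 and figure 5.1 can be checked numerically for M = 2 translational motions (N = 2): with MN + M - 1 = 5 random constraints (three through one velocity, two through the other), exactly one point in velocity space has three concurrent constraint lines, while 2 + 2 = 4 constraints leave no such point. This is an illustrative sketch; the velocities and random directions are arbitrary.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

def constraint_through(v):
    """A random motion constraint line c_x v_x + c_y v_y + c_t = 0
    passing through the velocity v."""
    cx, cy = rng.normal(size=2)
    return np.array([cx, cy, -(cx * v[0] + cy * v[1])])

def intersection(c1, c2):
    """Intersection of two constraint lines in velocity space."""
    A = np.array([c1[:2], c2[:2]])
    return np.linalg.solve(A, -np.array([c1[2], c2[2]]))

def concurrent_points(lines):
    """Velocities where at least three of the given lines meet."""
    pts = []
    for i, j, k in itertools.combinations(range(len(lines)), 3):
        p = intersection(lines[i], lines[j])
        if abs(lines[k] @ np.append(p, 1.0)) < 1e-8:
            pts.append(p)
    return pts

v1, v2 = np.array([1.0, 2.0]), np.array([-1.0, 0.5])
# MN + M - 1 = 5 constraints: three through v1, two through v2
five = ([constraint_through(v1) for _ in range(3)]
        + [constraint_through(v2) for _ in range(2)])
# dropping one constraint of the first motion leaves 2 + 2 = 4
four = five[1:]
```

With `five`, the single point of three concurrent lines is v1, and removing those three lines leaves v2 as the only remaining intersection of two lines, exactly as in the proof.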
In practice, we will also have difficulties telling whether a motion field is continuous, due to noisy estimates and the fact that motion is not given at points between pixels.

Figure 5.2: Even though we have been able to estimate two velocity vectors at every point in the image (upper left), we cannot unambiguously tell which velocity vectors belong to the same layer. In this case, there are two possible continuous solutions. Which one would you choose? (Panels: both motions; v1 and v2 in case 1; v1 and v2 in case 2.)

5.3.3 Problem: Interframe Correspondence Between Estimates

Assume we have several frames. It is not enough to overcome all the previously mentioned problems, i.e. to estimate all motion vector fields for each frame. We still do not know which vector fields in two frames correspond to the same layers. This might be a problem even when motions are smooth over time. (We suggest that this problem should be solved by finding correspondence between the features in the images.)

Chapter 6

Estimation of Multiple Motions

Chapter 5 described the problems and difficulties in estimation of multiple motions. This chapter presents algorithms to overcome some of these problems and estimate motion fields of multiple layers. The primary focus is a modified version of the EM algorithm [19] for estimation of multiple motions.

6.1 Other Methods Considered

Before describing the successful part of our research, some other methods will be described briefly. We have considered a number of possible methods that we have decided not to use. Among them are explicit correlation and tracking of dominant layers. We cannot prove that they are inferior, but we describe problems that discourage further research.

6.1.1 Difficulties with Multiple Correlation Peaks

One of the methods we have considered is to explicitly correlate images with different shifts and find correlation peaks. As in estimation of a single motion, correlation is hard to extend to estimation of other motions than pure translations.
Subpixel accuracy requires that the image is shifted with subpixel accuracy prior to correlation. In estimation of multiple motions, it is often easy to find one of the motions as the highest peak. Finding the next peak is not as easy. It is like asking which is the second highest point in an area of mountains. Depending on who you ask, the answer is different. One person may say it is a rock two meters below the highest peak. Another person would claim it is a minor peak of the same mountain, just a hundred meters away. A third person would count nothing but another mountain, at least a kilometer away.

In a correlation map, the problem of defining criteria for finding a second peak is as difficult as in the real world. It is even worse, since the correlation is only computed at a finite resolution of shifts. The limited resolution of the images makes it meaningless to compute correlation for small subpixel shifts. If the difference between two layers is just a few pixels, it is likely that the two peaks merge into one. In order to estimate non-translational motions, local analysis is necessary, and then the second peak often drowns in the ridge of a higher peak.

6.1.2 Difficulties with Dominant Layers

Let us describe an approach that works in some of our experiments, but not well enough. It is based on the assumption that one layer may be much stronger than all the other layers. Under this assumption, we have been able to estimate motions of two transparent layers by first estimating the motion of the dominant layer. Motion estimation is done using the phase-based method in section 4.2. The confidence measure is designed to suppress motions that are large relative to the warp. In conjunction with iterative refinement, section 4.2.3, the motion estimate converges to the dominant layer, and outliers from the other layer are given low weight. When the motions of the dominant layer are known, it is possible to filter it away.
The removal of the dominant layer from the images is far from perfect, but in our experiments it has been good enough for the next step. After removal of the dominant layer from the images, it is straightforward to estimate the motion of the weaker layer. To improve accuracy, we have applied the above scheme iteratively. When both motions are known, the two layers are separated, section 6.4. In the next iteration, the reconstructed layers are then used as reference images in the motion estimation. If successful, the algorithm converges towards better reconstructed images and better motion estimates.

We have also been able to estimate motions in image sequences where the layers are virtually equally strong. This was done using a bootstrap version of the above scheme. In the first iteration, only two frames are used. Often, the motion estimate converges to either of the layers, although the accuracy is very poor. This layer is filtered out and used as a reference image in the motion estimation in the next iteration. Accuracy slowly gets better the more image frames that are used. The scheme is computationally expensive and suffers from problems with convergence. On our test images, it only works when motions are pure translations. It is also complicated to use multiple scales to estimate large motions, since different layers are dominant at different scales.

6.2 Estimation of Motion Constraints

The motion constraints, c, in this chapter are computed by the phase-based method we used for single motion estimation, section 4.2. It is possible to use other methods, but we have not tried that. If a small region in the image only contains structure from one layer, the estimated motion constraint will be accurate. Otherwise, in case there is structure from two layers at the same point, they may interfere and produce outliers.

Figure 6.1: Constraints from an image with two transparent layers, as in section 6.5.3.
Four directions of quadrature filters are used, yielding four constraints at each pixel. One layer appears stronger than the other.

The phase-based method is less susceptible to interference between layers than the conventional gradient method. The phase-based method is only sensitive to bandpass frequencies, and these are split up in different directions; thus dissimilar structure from different layers is less likely to interfere. The confidence measure is also designed to suppress matches of dissimilar structure. An example of constraints from two transparent layers is shown in figure 6.1.

6.3 EM (modified)

Of the methods we have tried, the EM algorithm [19] is probably the best. The EM algorithm is a general algorithm with applications beyond imaging. In our application, it is basically a kind of clustering algorithm, whose input is the mixture of all motion constraints from all layers. Motion constraints that are coherent are assumed to belong to the same layer. The EM algorithm is an iterative algorithm that starts from an initial guess of what the motions are and then does several iterations. A limitation is that the number of layers must be known a priori. Of course, no clustering algorithm for motion constraints is guaranteed to converge to the correct answer, since there might be ambiguities as described in section 5.3.2. In addition, it happens that the EM algorithm gets stuck in a local optimum.

6.3.1 Review of EM

When we have multiple motions, constraints intersect at different points in parameter space, corresponding to each of the motions. Estimating these motions is equivalent to finding the intersection points in parameter space. There seems to be no closed form solution to this problem, but it can be solved iteratively by the EM algorithm [19]. The EM algorithm is a clustering algorithm that iteratively applies two steps:

Expectation: Estimate the owner probabilities for each constraint, i.e.
the probabilities that a constraint belongs to a particular motion layer. (We will see that the owner probabilities depend on previously made motion estimates.)

Maximization: Estimate the motions, with constraints assigned to each of the motions according to the owner probabilities.

In the next iteration, the owner probabilities have changed, since the motion estimates are different. As already mentioned, the original version of the EM algorithm is only guaranteed to converge [26] to a local optimum, and we do not know whether it is a global optimum.

6.3.2 Derivation of the EM Algorithm for Multiple Warps

Jepson and Black [19] have used the EM algorithm on multiple motions, but their approach did not include warping images. As pointed out earlier, warping images is necessary to estimate large motions with the best possible accuracy. This is especially important when estimating transparent motions. If we did not warp, the constraints of a large displacement would be much weaker than those of a small displacement. The problem with warping multiple motions is that the image must be warped according to each of the estimated motions, producing multiple warped images. Here we will derive a simple extension of Jepson and Black's EM algorithm [19] that assigns different mixture probabilities to each of the warped images. A number of variable names need to be introduced, and it may help to keep an eye on the list in appendix B.

Let l denote the index of the warp. For each warp, l, we get a set of constraints c_{k,l}, where k is a joint index of spatial position and other indices such as quadrature filter direction. Also assume the correct motion model parameters for each of the motions are a_0, \ldots, a_N.

    for i = 1 to number_of_iterations_refinement {
        for j = 1 to number_of_motions {
            warp_image;
            compute_motion_constraints;
        }
        for j = 1 to number_of_EM_iterations {
            E_step;
            M_step;
        }
    }

Figure 6.2: The loops in our extended EM algorithm with multiple warps.
Temporarily disregard the possibility of bad estimates of the motion constraints. Under these conditions, the PDF^1 for observing the constraint c_{k,l} is

P(c_{k,l} \mid x_k, \{m_{n,l}\}, a_0, \ldots, a_N) = \sum_n m_{n,l}\, P(c_{k,l} \mid x_k, a_n)    (6.1)

where m_{n,l} is the probability of observing motion n in an image warped according to l. The PDF of observing our combination of constraints is the product of all PDFs for single constraints. By applying the logarithm, the product is converted to a sum.

\log \prod_{k,l} P(c_{k,l} \mid x_k, \{m_{n,l}\}, a_0, \ldots, a_N) = \sum_{k,l} \log P(c_{k,l} \mid x_k, \{m_{n,l}\}, a_0, \ldots, a_N)
= \sum_{k,l} \log \sum_n m_{n,l}\, P(c_{k,l} \mid x_k, a_n)    (6.2)

We want to find the global maximum of this function under the constraint that the mixture probabilities sum to 1,

\sum_n m_{n,l} = 1 \quad \forall\, l = 1, 2, \ldots    (6.3)

6.3.3 Evaluating Criteria for Optimum

We have just arrived at a well defined mathematical problem, i.e. to maximize the joint PDF, eq. (6.2), under the constraint eq. (6.3). To make clear what the mathematical problem is, let us write the equations of the optimization problem in a special form:

\max_{\{a_n\},\{m_{n,l}\}} \sum_{k,l} \log \sum_n m_{n,l}\, P(c_{k,l} \mid x_k, a_n)    (6.4)

where

\sum_n m_{n,l} - 1 = 0 \quad \forall\, l    (6.5)

^1 Probability Density Function (PDF)

Similarly to [19], we use Lagrange relaxation^2 to derive our version of the EM algorithm for warped images. Relaxation of eq. (6.5) gives the Lagrange function

L(\{a_n\}, \{m_{n,l}\}, \{\lambda_l\}) = \sum_{k,l} \log \sum_n m_{n,l}\, P(c_{k,l} \mid x_k, a_n) - \sum_l \lambda_l \left( \sum_n m_{n,l} - 1 \right)    (6.6)

where \{\lambda_l\} are the Lagrange multipliers.
According to Lagrange theory, the optimum is a saddle point of L(\{a_n\}, \{m_{n,l}\}, \{\lambda_l\}) and must satisfy^3

\frac{\partial}{\partial m_{n,l}} L(\{a_n\}, \{m_{n,l}\}, \{\lambda_l\}) = 0 \quad \forall\, n, l    (6.7)

\frac{\partial}{\partial \lambda_l} L(\{a_n\}, \{m_{n,l}\}, \{\lambda_l\}) = 0 \quad \forall\, l    (6.8)

\nabla_{a_n} L(\{a_n\}, \{m_{n,l}\}, \{\lambda_l\}) = 0 \quad \forall\, n    (6.9)

Evaluation of these equations yields

\sum_k \frac{P(c_{k,l} \mid x_k, a_n)}{\sum_{\tilde{n}} m_{\tilde{n},l}\, P(c_{k,l} \mid x_k, a_{\tilde{n}})} - \lambda_l = 0 \quad \forall\, n, l    (6.10)

\sum_n m_{n,l} - 1 = 0 \quad \forall\, l    (6.11)

\sum_{k,l} \frac{m_{n,l}\, \nabla_{a_n} P(c_{k,l} \mid x_k, a_n)}{\sum_{\tilde{n}} m_{\tilde{n},l}\, P(c_{k,l} \mid x_k, a_{\tilde{n}})} = 0 \quad \forall\, n    (6.12)

In order to further evaluate these equations, let us define the ownership probabilities

q_{nkl} = \frac{m_{n,l}\, P(c_{k,l} \mid x_k, a_n)}{\sum_{\tilde{n}} m_{\tilde{n},l}\, P(c_{k,l} \mid x_k, a_{\tilde{n}})}    (6.13)

Now, the equations can be written as

\sum_k q_{nkl} - \lambda_l\, m_{n,l} = 0 \quad \forall\, n, l    (6.14)

\sum_n m_{n,l} - 1 = 0 \quad \forall\, l    (6.15)

\sum_{k,l} q_{nkl}\, \nabla_{a_n} \log P(c_{k,l} \mid x_k, a_n) = 0 \quad \forall\, n    (6.16)

These are the equations that need to be satisfied at the optimum. They are solved iteratively, one at a time. Before describing the details in the next section, we will give an intuitive meaning to the owner probabilities, q_{nkl}. Note that they are defined for each combination of motion constraint and layer. After a closer look at eq. (6.13), it is clear that q_{nkl} is the probability that the constraint c_{k,l} belongs to the layer with index n. In particular, note that \sum_n q_{nkl} = 1.

^2 A common method in mathematics and optimization theory.
^3 Unfortunately, even a local optimum satisfies these equations.

6.3.4 Iterative Search for Optimum

The EM algorithm defines how to solve equations (6.14)-(6.16) iteratively, by solving one variable at a time using one equation. The first operation in each iteration is to compute the ownership probabilities for each pixel and layer. This is a straightforward computation using eq. (6.13). In the first iteration, we need an initial guess of the motion parameters, \{a_n\}, and the mixture probabilities, \{m_{n,l}\}. The second operation is to compute the motion parameters for each layer, \{a_n\}, using eq. (6.16).
Thanks to the probability function that will be defined in section 6.3.5, the motion estimation is the same least squares fit as in section 3.2. In order to prepare for the next iteration, the mixture probabilities, \{m_{n,l}\}, need to be updated using

m_{n,l} = \frac{\sum_k q_{nkl}}{\sum_{\tilde{n},k} q_{\tilde{n}kl}}    (6.17)

Then we go back and do some more iterations. The EM algorithm is guaranteed to converge to a local optimum [26].

6.3.5 The Probability Function

The probability density function defines the probability of observing a particular constraint at a particular spatial location, according to a particular motion model. For simplicity, we use a Gaussian PDF. It is simple because eq. (6.16) then yields the same equations as for the model based estimation in section 3.2, except that no confidence measure is used. In the next section we will tell how to get the confidence measure back. The probability of observing a particular constraint is a normal distribution with respect to the deviation according to a dissimilarity measure [19], d(c, v),

P(c \mid x, a) = P(c \mid v) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{d^2(c, v)}{2\sigma^2}\right)    (6.18)

where v = K(x)\, a. With some abuse of notation, we define the d-function

d^2(c, v) = \frac{(c_x v_x + c_y v_y + c_t)^2}{c_x^2 + c_y^2}    (6.19)

The denominator acts like a normalization of the vector (c_x, c_y), and the value of d(c, v) is the closest distance from the point (v_x, v_y) to the line c_x v_x + c_y v_y + c_t = 0. Note that our function does not assume larger deviations for large motions, in contrast to [19]. Our approach of warping the image gives the same absolute accuracy for arbitrarily large motions.

6.3.6 Introducing the Confidence Measure in the EM Algorithm

In the probability function defined in section 6.3.5, the confidence measure was removed due to the normalization in eq. (6.19). We believe the confidence should not affect the relative values of the owner probabilities. For example, a high confidence in a motion constraint does not mean that we are certain which layer it belongs to.
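The dissimilarity measure and probability function of section 6.3.5, eqs. (6.18)-(6.19), are compact enough to state directly. A sketch (the value of sigma is an assumption):

```python
import numpy as np

def d2(c, v):
    """Squared dissimilarity, eq. (6.19): squared distance from the
    velocity (vx, vy) to the constraint line cx*vx + cy*vy + ct = 0."""
    cx, cy, ct = c
    return (cx * v[0] + cy * v[1] + ct)**2 / (cx**2 + cy**2)

def pdf(c, v, sigma=1.0):
    """Gaussian probability of a constraint given a velocity, eq. (6.18)."""
    return np.exp(-d2(c, v) / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)
```

Note that d2 is invariant to a rescaling of the constraint vector, which is exactly the normalization that removes the confidence from the probability and motivates section 6.3.6.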
Without writing the equations again, we will tell how to derive the EM algorithm with a confidence measure. Let C^2 = c_x^2 + c_y^2 denote the squared confidence measure of a motion constraint. The confidence measure is introduced in eq. (6.4) by multiplying by C^2_{k,l} in the outer sum, in front of the "log". This confidence will follow through all of the derivation in section 6.3.3. In the end, eq. (6.14) and eq. (6.16) will be modified by simply replacing q_{nkl} with C^2_{k,l}\, q_{nkl}.

6.3.7 Our Extensions to the EM Algorithm

As we have pointed out, it is proven that the EM algorithm converges to a local optimum, but the risk of getting stuck in a local optimum is prohibitive when using motion models with many degrees of freedom. In our case, warping images makes convergence even more hazardous, since the image has finite size and we might get outside the boundary. In experiments applying the EM algorithm to images with two transparent layers and an affine motion model, the EM algorithm sometimes bails out already in the first iteration. Our remedy is to control the stiffness (section 3.3). In the first iteration, only translations are estimated. In the second iteration, we allow small affine deformations. For every iteration, we reduce the cost. We do several EM iterations for every time we warp the image. Other researchers [21] suggest simulated and deterministic annealing to avoid getting stuck in a local optimum. This would require too many iterations when we have many parameters. We have not tried that, since we have not had problems with local optima when using cost functions.

Another extension is to let the owner probabilities alter the certainties. If we are not sure which layer a constraint belongs to, its influence in the estimation of the motion parameters should be smaller. Outliers in motion constraints are more frequent when there are multiple layers, since structures corresponding to different layers sometimes interfere.
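To make the E- and M-steps concrete, here is a minimal sketch for two purely translational layers, with the confidence weighting C^2 q of section 6.3.6 but without warping, stiffness control or the outlier layer. The initial guesses, noise level and iteration count are assumptions for illustration only.

```python
import numpy as np

def em_two_translations(constraints, v_init, iters=50, sigma=0.3):
    """Minimal sketch of the E- and M-steps (eqs. 6.13, 6.16, 6.17) for
    two purely translational layers. constraints: rows (cx, cy, ct)."""
    c = np.asarray(constraints, dtype=float)
    C2 = c[:, 0]**2 + c[:, 1]**2                  # squared confidence
    cn = c / np.sqrt(C2)[:, None]                 # normalized lines
    v = [np.asarray(vi, dtype=float) for vi in v_init]
    m = np.array([0.5, 0.5])                      # mixture probabilities
    for _ in range(iters):
        # E-step, eq. (6.13): ownership probabilities per constraint
        d2 = np.stack([(cn[:, 0] * vn[0] + cn[:, 1] * vn[1] + cn[:, 2])**2
                       for vn in v])
        p = m[:, None] * np.exp(-d2 / (2 * sigma**2))
        q = p / p.sum(axis=0)
        # M-step, eq. (6.16): weighted least squares per layer,
        # with weights C^2 * q as in section 6.3.6
        for n in range(2):
            w = C2 * q[n]
            A = (w[:, None, None]
                 * np.einsum('ki,kj->kij', cn[:, :2], cn[:, :2])).sum(axis=0)
            b = -(w[:, None] * cn[:, :2] * cn[:, 2:3]).sum(axis=0)
            v[n] = np.linalg.solve(A, b)
        # eq. (6.17): update the mixture probabilities
        m = q.sum(axis=1) / q.sum()
    return v, m
```

With noiseless synthetic constraints and initial guesses reasonably close to the true velocities, this converges to the two motions; with poor initialization it can get stuck in a local optimum, exactly as discussed above.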
In our scheme, outliers are handled by introducing an extra motion layer that is supposed to own the outliers. This special layer has a probability function that is much wider than those of the other layers. This means that this layer owns all constraints that are far away from the closest estimated motion. How far is implicitly determined by specifying a value for the mixture probability, i.e. we want

m_{N,l} = \alpha \quad \forall\, l    (6.20)

where N is the index of the outlier layer and \alpha is a predefined constant that controls what fraction of constraints to consider as outliers. This is controlled by setting

P(c_{k,l} \mid x_k, a_N) = p_{outlier}    (6.21)

where p_{outlier} is determined so that eq. (6.20) holds. Since this probability function does not depend on its corresponding motion, there is no need to estimate the motion parameters of this layer.

6.3.8 Convergence of Modified EM with Warp

We have made modifications of the EM algorithm without proving convergence. Even if we assume that the modified EM algorithm always converges, this would not imply that the iterative refinement with image warps converges. It is important not to confuse the iterations of the EM algorithm with the iterations of image warps. The EM algorithm is in the inner loop and the warps are in the outer loop (see figure 6.2). For every iteration of iterative refinement, several EM iterations are performed. The proof of convergence applies only to the inner loop; convergence of the inner loop does not imply convergence of the outer loop.

6.4 Reconstruction of Transparent Layers

If the motions are known, it is possible to reconstruct the transparent layers, except for the very lowest frequencies, provided that the motions are unique and large enough not to interfere with the pixel resolution. A predecessor of our algorithm is to simply average along the trajectory of one motion [18].
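This averaging, combined with an error feedback step, can be sketched as a toy backprojection for two layers with known integer translations; circular shifts stand in for the warps, and all sizes and iteration counts are illustrative assumptions.

```python
import numpy as np

def shift(img, s):
    """Circular integer shift, standing in for a warp."""
    return np.roll(img, s, axis=(0, 1))

def reconstruct_two_layers(frames, shifts1, shifts2, iters=20):
    """Toy backprojection for two transparent layers with known integer
    translations per frame, where
    frames[t] = shift(layer1, shifts1[t]) + shift(layer2, shifts2[t])."""
    r1 = np.zeros_like(frames[0])
    r2 = np.zeros_like(frames[0])
    for _ in range(iters):
        # residual between input frames and current reconstruction,
        # averaged along the trajectory of layer 1, then fed back
        r1 += np.mean([shift(f - shift(r1, s1) - shift(r2, s2),
                             (-s1[0], -s1[1]))
                       for f, s1, s2 in zip(frames, shifts1, shifts2)],
                      axis=0)
        # the same correction for layer 2
        r2 += np.mean([shift(f - shift(r1, s1) - shift(r2, s2),
                             (-s2[0], -s2[1]))
                       for f, s1, s2 in zip(frames, shifts1, shifts2)],
                      axis=0)
    return r1, r2
```

Averaging the residual along one layer's trajectory sharpens that layer and blurs the other; feeding the error back makes the scheme iterative. The DC level remains ambiguous, consistent with the remark that the very lowest frequencies cannot be reconstructed.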
If we have many frames in the image sequence, the structure corresponding to this motion is sharpened and all other structure is blurred. We improve the image quality by estimating the errors and feeding them back. We arrive at an iterative backprojection algorithm, described by figure 6.3.

Figure 6.3: Reconstruction of motion layers using simple backprojection. (Block diagram: the original sequence is averaged along the motion trajectory to form reconstructed images; these are used to reconstruct the original sequence, which is subtracted from the input and fed back.)

6.4.1 Improved Backprojection Algorithm

In the simple backprojection, the feedback images are warped twice: first when fed back, and then when fed forward after being subtracted. The double warp degrades the image quality. Our way to overcome this problem is the scheme in figure 6.4, where the two warps are performed in one step as a single warp.

Figure 6.4: Reconstruction of transparent layers using a more sophisticated backprojection. (Block diagram: the reconstruction and the averaging along the trajectory are combined into a single step.)

6.4.2 Finding Correspondence between Motion Estimates from Different Frames

As pointed out in section 5.3, we need to know which motion vectors correspond over time. The way we have applied the EM algorithm does not give us that information. Comparing motion vectors over time does not work, since our motions are irregular over time. Our approach is to first reconstruct layers from only two frames. (There are no such problems if we only have two frames.) Although the image quality is bad, we have the two layers separated, and we can analyze how they correlate with frames later in the sequence (when warped with different motion estimates).

6.4.3 Experimental Results

So far, we have run our algorithms only on images with two layers that have been superimposed synthetically. Figure 6.5 shows a number of frames from a sequence of synthetically generated images.
The images have been generated by taking two still images of a heart, adding them together with random affine motions, and then cropping the valid region. The original heart images were thrown away, and the only input data to our algorithm was the sequence of 50 generated images.

Figure 6.5: Synthetic multiple motion field sequence containing two layers with independent random affine motions. Frames 10, 20, 30 and 40 in a sequence of 50 images.

Figure 6.6: Reconstructed images (cropped a few pixels to hide artifacts at the borders). Compare to figure 1.3.

Using the EM algorithm with our phase-based method, the motions of both layers were estimated under the assumption of affine motions. The original layers were reconstructed and are shown in figure 6.6. We get some artifacts at the borders. We did not save the true original images, but they can be seen in figure 1.3: one layer is the upper left part of one image in figure 1.3 and the other layer is the lower right part. The reconstruction of the details is fine, but the lowpass content is degraded and the DC is discarded.

After doing these experiments, we have seen a poster and an abstract on a project that also seems to address the same problem of separation of layers. Not many details related to our work are given, and the proceedings with the full article [28] are not yet available.

6.5 Alternative Method for Two Mixed Motions

In this section, we present an alternative to the EM algorithm for estimation of multiple motions. Compared to the EM algorithm, the same input data is used, but the computational time is usually shorter, since it is cheap to do many iterations once some initial computations are done.
Among the drawbacks of this method are that the influence of outliers seems too large, that there are problems with convergence, and that we have not yet devised any way to take advantage of multiple warps in order to estimate large motions with good accuracy. We have found references to an algorithm [30, 31] with some similarities. It uses higher order moments in 3D Fourier/Gabor transforms of spatiotemporal volumes and also leads to a minimization problem.

6.5.1 Basic Idea

Assume we have two motions, v1 and v2, described by parameter vectors a1 and a2 for some motion model defined by K(x). As defined in chapter 3,

$$\mathbf{v}_1 = K(\mathbf{x})\,\mathbf{a}_1 \quad \text{and} \quad \mathbf{v}_2 = K(\mathbf{x})\,\mathbf{a}_2 \qquad (6.22)$$

A large number of constraint vectors, $\mathbf{c}_k$, $k = 1, 2, 3, \ldots$, are given at spatial positions $\mathbf{x}_k$. Neglecting interference between the layers, the motion constraints are supposed to satisfy either $\mathbf{c}_k^T \tilde{\mathbf{v}}_1 = 0$ or $\mathbf{c}_k^T \tilde{\mathbf{v}}_2 = 0$, where

$$\tilde{\mathbf{v}} = \begin{pmatrix} \mathbf{v} \\ 1 \end{pmatrix} \qquad (6.23)$$

and

$$\mathbf{c}_k = \begin{pmatrix} c_{k,x} \\ c_{k,y} \\ c_{k,t} \end{pmatrix}. \qquad (6.24)$$

Let us now define an error measure similar to eq. (3.12), but for two motions,

$$\varepsilon(\mathbf{a}_1, \mathbf{a}_2) = \sum_k \left(\mathbf{c}_k^T \tilde{\mathbf{v}}_1(\mathbf{x}_k)\right)^2 \left(\mathbf{c}_k^T \tilde{\mathbf{v}}_2(\mathbf{x}_k)\right)^2
= \sum_k \left( (c_{k,x}\; c_{k,y})\, K(\mathbf{x}_k)\, \mathbf{a}_1 + c_{k,t} \right)^2 \left( (c_{k,x}\; c_{k,y})\, K(\mathbf{x}_k)\, \mathbf{a}_2 + c_{k,t} \right)^2 \qquad (6.25)$$

To simplify the notation, we introduce

$$\mathbf{b}_k = (c_{k,x}\; c_{k,y})\, K(\mathbf{x}_k) \qquad (6.26)$$

$$d_k = c_{k,t} \qquad (6.27)$$

and the expression becomes more readable,

$$\varepsilon(\mathbf{a}_1, \mathbf{a}_2) = \sum_k (\mathbf{b}_k \mathbf{a}_1 + d_k)^2 (\mathbf{b}_k \mathbf{a}_2 + d_k)^2
= \sum_k (\mathbf{a}_1^T \mathbf{b}_k^T \mathbf{b}_k \mathbf{a}_1 + 2 d_k \mathbf{b}_k \mathbf{a}_1 + d_k^2)(\mathbf{a}_2^T \mathbf{b}_k^T \mathbf{b}_k \mathbf{a}_2 + 2 d_k \mathbf{b}_k \mathbf{a}_2 + d_k^2). \qquad (6.28)$$

After further expansion of the product above, it turns out that the sum can be moved inside, leaving the unknown parameter vectors, a1 and a2, outside. We have to sum over outer products of up to four vectors, and we get multidimensional arrays of numbers, which are called tensors [34]. Readers not familiar with tensors can think of them as an extension of vectors and matrices to arbitrary dimensionality. Tensors come with a special tensor notation, where matrix-like products are written without an explicit multiplication sign.
Instead, the indices to multiply and sum over are written as subscripts on one factor and on the other factor.

$$\varepsilon(\mathbf{a}_1, \mathbf{a}_2) = T4_{ijkl}\, a_{1i} a_{1j} a_{2k} a_{2l}
+ 2\, T3_{ijk}\, a_{1i} a_{1j} a_{2k} + 2\, T3_{ijk}\, a_{1i} a_{2j} a_{2k}
+ T2_{ij}\, a_{1i} a_{1j} + 4\, T2_{ij}\, a_{1i} a_{2j} + T2_{ij}\, a_{2i} a_{2j}
+ 2\, T1_i\, a_{1i} + 2\, T1_i\, a_{2i} + T0 \qquad (6.29)$$

where T4, T3, T2, T1 and T0 are tensors with 4, 3, 2, 1 and 0 indices⁴. The tensors are formed by summing outer products⁵, and we get moments of up to fourth order of the elements derived from c:

$$T4 = \sum_k \mathbf{b}_k \otimes \mathbf{b}_k \otimes \mathbf{b}_k \otimes \mathbf{b}_k \qquad (6.30)$$

$$T3 = \sum_k d_k\, \mathbf{b}_k \otimes \mathbf{b}_k \otimes \mathbf{b}_k \qquad (6.31)$$

$$T2 = \sum_k d_k^2\, \mathbf{b}_k \otimes \mathbf{b}_k \qquad (6.32)$$

$$T1 = \sum_k d_k^3\, \mathbf{b}_k \qquad (6.33)$$

$$T0 = \sum_k d_k^4 \qquad (6.34)$$

⁴ Example: T2 has two indices and is a matrix, T1 has one index and is a vector. T0 has no index and is hence a scalar.
⁵ For example, the outer product of two column vectors, u ⊗ v = uvᵀ, is a matrix (or a tensor with two indices).

6.5.2 Minimizing ε(a1, a2)

In order to find the motions, ε(a1, a2) is minimized. In lack of references to better methods, a version of Newton's method is used to find stationary points, i.e. points where the gradient is zero. Briefly, the approach is a multidimensional version of the well known Newton-Raphson method applied to the gradient. Since all the tensors are fully symmetric, the gradient with respect to a1 is computed as

$$\left[\nabla_{\mathbf{a}_1} \varepsilon(\mathbf{a}_1, \mathbf{a}_2)\right]_i
= 2\, T4_{ijkl}\, a_{1j} a_{2k} a_{2l} + 4\, T3_{ijk}\, a_{1j} a_{2k} + 2\, T3_{ijk}\, a_{2j} a_{2k}
+ 2\, T2_{ij}\, a_{1j} + 4\, T2_{ij}\, a_{2j} + 2\, T1_i \qquad (6.35)$$

and the gradient with respect to a2 is computed in the same way, with the roles of a1 and a2 interchanged. With some abuse of notation, we treat the parameter pair as one vector and write the full gradient as $\nabla\varepsilon = \begin{pmatrix} \nabla_{\mathbf{a}_1}\varepsilon(\mathbf{a}_1, \mathbf{a}_2) \\ \nabla_{\mathbf{a}_2}\varepsilon(\mathbf{a}_1, \mathbf{a}_2) \end{pmatrix}$. With similar abuse of notation, the Hessian, i.e. the matrix of second derivatives, is given blockwise by

$$\frac{\partial^2 \varepsilon}{\partial a_{1i}\, \partial a_{1j}} = 2\, T4_{ijkl}\, a_{2k} a_{2l} + 4\, T3_{ijk}\, a_{2k} + 2\, T2_{ij}$$

$$\frac{\partial^2 \varepsilon}{\partial a_{1i}\, \partial a_{2j}} = 4\, T4_{ijkl}\, a_{1k} a_{2l} + 4\, T3_{ijk}\, a_{1k} + 4\, T3_{ijk}\, a_{2k} + 4\, T2_{ij}$$

$$\frac{\partial^2 \varepsilon}{\partial a_{2i}\, \partial a_{2j}} = 2\, T4_{ijkl}\, a_{1k} a_{1l} + 4\, T3_{ijk}\, a_{1k} + 2\, T2_{ij} \qquad (6.36)$$

where the remaining block is the transpose of the mixed block. The parameter vectors, a1 and a2, are computed iteratively using Newton's method.
Searching for stationary points,

$$\begin{pmatrix} \mathbf{a}_1 \\ \mathbf{a}_2 \end{pmatrix}^{(n+1)}
= \begin{pmatrix} \mathbf{a}_1 \\ \mathbf{a}_2 \end{pmatrix}^{(n)}
- \left[ \nabla^2 \varepsilon \right]^{-1} \nabla \varepsilon \;\Bigg|_{\left(\mathbf{a}_1,\, \mathbf{a}_2\right)^{(n)}} \qquad (6.37)$$

Convergence is often quite poor. In our experiments, the Newton search often converges to suboptimal solutions, usually a1 = a2. Our simple remedy for this problem is to use several starting points for the iteration. The Newton search is so fast⁶ that we can use hundreds of starting points. A Newton search that tends to bail out is canceled and we try the next starting point. The procedure is repeated until we have obtained a large number of sound estimates of a1 and a2. Then we choose the estimate with the smallest ε(a1, a2). Unfortunately, we can never be sure that we have found the optimum; all we can do is to increase the likelihood by using a large number of starting points. An alternative to this approach is simulated or deterministic annealing. Annealing, however, also suffers from the problem that you cannot be sure you have found the optimum.

6.5.3 Experimental Results

This alternative method has been implemented both for translational and affine motions. The accuracy seems not as good as that of the EM algorithm. Figure 6.7 shows results from experiments on synthetic images. Just like in figure 4.6 in chapter 4, the phase-based method is used to estimate constraints, and the image is not warped to improve accuracy. The same test images are used (Lena+Debbie, 128 pixels), but here they are superimposed with opposite motions. When estimating multiple motions, we get two motion estimates for each pixel, and it is hard to tell which estimate corresponds to which layer. In the evaluation of accuracy, the motion estimates are therefore sorted by comparison with the known motions; we do not consider this cheating. The motion constraints for this experiment, when the motions are (1, 1) pixels in each direction, are drawn in figure 6.1.

Figure 6.7: Accuracy of estimation of each of the two superimposed layers. (The error in the estimate, 0 to 0.3 pixels, is plotted versus the true shift, 0 to 2 pixels.)
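To make sections 6.5.1 and 6.5.2 concrete, here is a small numerical sketch for a purely translational motion model (K(x) = I, so that a_i = v_i): the moment tensors are accumulated with einsum, the tensor form of ε is checked against the direct product form, and a Newton iteration is run from a start close to the optimum. All names are hypothetical and the data is synthetic; this is only an illustration under the stated assumptions, not the thesis implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
v1_true = np.array([1.0, 0.5])       # the two true translations
v2_true = np.array([-0.7, 0.2])

# synthetic motion constraints: c = (n, -n.v), so that c^T (v; 1) = 0
b = rng.normal(size=(60, 2))         # b_k of eq. (6.26); here just directions n_k
d = np.concatenate([-b[:30] @ v1_true, -b[30:] @ v2_true])   # d_k of eq. (6.27)

# moment tensors, eqs. (6.30)-(6.34)
T4 = np.einsum('ki,kj,kl,km->ijlm', b, b, b, b)
T3 = np.einsum('k,ki,kj,kl->ijl', d, b, b, b)
T2 = np.einsum('k,ki,kj->ij', d**2, b, b)
T1 = np.einsum('k,ki->i', d**3, b)
T0 = np.sum(d**4)

def eps(a1, a2):                      # direct form, eq. (6.28)
    return np.sum((b @ a1 + d)**2 * (b @ a2 + d)**2)

def eps_tensor(a1, a2):               # tensor form, eq. (6.29)
    return (np.einsum('ijkl,i,j,k,l->', T4, a1, a1, a2, a2)
            + 2 * np.einsum('ijk,i,j,k->', T3, a1, a1, a2)
            + 2 * np.einsum('ijk,i,j,k->', T3, a1, a2, a2)
            + a1 @ T2 @ a1 + 4 * a1 @ T2 @ a2 + a2 @ T2 @ a2
            + 2 * T1 @ a1 + 2 * T1 @ a2 + T0)

def grad(a1, a2):                     # eq. (6.35) and its a2 counterpart
    g1 = (2 * np.einsum('ijkl,j,k,l->i', T4, a1, a2, a2)
          + 4 * np.einsum('ijk,j,k->i', T3, a1, a2)
          + 2 * np.einsum('ijk,j,k->i', T3, a2, a2)
          + 2 * T2 @ a1 + 4 * T2 @ a2 + 2 * T1)
    g2 = (2 * np.einsum('ijkl,j,k,l->i', T4, a2, a1, a1)
          + 4 * np.einsum('ijk,j,k->i', T3, a2, a1)
          + 2 * np.einsum('ijk,j,k->i', T3, a1, a1)
          + 2 * T2 @ a2 + 4 * T2 @ a1 + 2 * T1)
    return np.concatenate([g1, g2])

def hess(a1, a2):                     # blocks of eq. (6.36)
    H11 = (2 * np.einsum('ijkl,k,l->ij', T4, a2, a2)
           + 4 * np.einsum('ijk,k->ij', T3, a2) + 2 * T2)
    H22 = (2 * np.einsum('ijkl,k,l->ij', T4, a1, a1)
           + 4 * np.einsum('ijk,k->ij', T3, a1) + 2 * T2)
    H12 = (4 * np.einsum('ijkl,j,k->il', T4, a1, a2)
           + 4 * np.einsum('ijk,j->ik', T3, a1)
           + 4 * np.einsum('ijk,j->ik', T3, a2) + 4 * T2)
    return np.block([[H11, H12], [H12.T, H22]])

# one Newton search, eq. (6.37), started near the optimum
z = np.concatenate([v1_true, v2_true]) + 0.05 * rng.normal(size=4)
for _ in range(15):
    z = z - np.linalg.solve(hess(z[:2], z[2:]), grad(z[:2], z[2:]))
```

In a full implementation, the loop above would be repeated from many random starting points, keeping the candidate with the smallest ε, as described in the text.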
For small motions, it is hard to separate the layers. Evidently, one layer yields better estimates than the other.

⁶ Very fast compared to the EM algorithm.

Chapter 7

Canonical Correlation of Complex Variables

There is a well developed theory for canonical correlation analysis (CCA) of real variables, Borga [5]. Canonical correlation of complex variables has successfully been used in a stereo algorithm [5] without a theory for the complex case. This chapter introduces a novel way of maximizing canonical correlation, which is derived for complex variables. It is also shown to generate the same solution as Borga's [5] method, even for complex variables. Thus, Borga's method is proven to work even in the complex case. A major advantage of our novel method is its ability to handle singular covariance matrices.

This chapter is a theoretical study of canonical correlation in general. Since no images or motion vectors are involved, a number of variable names and notations can be reused for other purposes; for example, the vector v is not a motion vector here.

For complex matrices, conjugation and transposition are usually applied simultaneously. This is denoted by a superscript star (*), e.g. A*. A simple transpose is denoted by a superscript T, e.g. Aᵀ. Unfortunately, this chapter sometimes needs the simple complex conjugate without transposition. In lack of better notation, a simple conjugate is written as the combination of a star and a transpose, e.g. Aᵀ*.

Another commonly used notation is the operator for the expected value of a stochastic variable, E[·]. In practical applications, statistical data sets are limited, and we need to use estimates of the expected value. After verifying all the formulas in this chapter, it turns out that every E[·] operator can be substituted with a sum over all available data.
7.1 Definition of Canonical Correlation of Complex Variables

The notation and formulas are similar to Borga's PhD thesis [5], except for some variable names that would cause too much confusion in an image processing context. Assume we have two sets of stochastic variables organized in two vectors, z_A and z_B respectively. For each of the two vectors we construct linear combinations of the vector components,

$$z_A = \mathbf{w}_A^T \mathbf{z}_A \quad \text{and} \quad z_B = \mathbf{w}_B^T \mathbf{z}_B \qquad (7.1)$$

where w_A and w_B are vectors of linear combination coefficients. The canonical correlation is the correlation of these two linear combinations,

$$\rho = \frac{E[z_A^* z_B]}{\sqrt{E[z_A^* z_A]\, E[z_B^* z_B]}}
= \frac{E[(\mathbf{z}_A^T \mathbf{w}_A)^*\, \mathbf{z}_B^T \mathbf{w}_B]}{\sqrt{E[(\mathbf{z}_A^T \mathbf{w}_A)^*\, \mathbf{z}_A^T \mathbf{w}_A]\; E[(\mathbf{z}_B^T \mathbf{w}_B)^*\, \mathbf{z}_B^T \mathbf{w}_B]}}
= \frac{\mathbf{w}_A^* \mathbf{C}_{AB}\, \mathbf{w}_B}{\sqrt{\mathbf{w}_A^* \mathbf{C}_{AA}\, \mathbf{w}_A\;\; \mathbf{w}_B^* \mathbf{C}_{BB}\, \mathbf{w}_B}} \qquad (7.2)$$

where the covariance¹ matrices are

$$\mathbf{C}_{AA} = E[\mathbf{z}_A^{T*} \mathbf{z}_A^T], \quad \mathbf{C}_{AB} = E[\mathbf{z}_A^{T*} \mathbf{z}_B^T], \quad \mathbf{C}_{BB} = E[\mathbf{z}_B^{T*} \mathbf{z}_B^T] \qquad (7.3)$$

and w_A and w_B are chosen to maximize the correlation.

¹ Only true covariances if the expected value is zero.

7.2 Maximizing Canonical Correlation

The objective of canonical correlation analysis is to find the two linear combinations that yield maximum correlation, i.e. to maximize the correlation, ρ, with respect to w_A and w_B. In the complex case, where ρ is complex, the first issue is what to maximize, the absolute value or the real part. The following theorem implies that the absolute value and the real part can be maximized simultaneously.

Theorem 7.2.1

$$\max \Re \rho = \max |\rho| \qquad (7.4)$$

Proof: It is obvious that max ℜρ ≤ max |ρ|. It remains to show that max ℜρ ≥ max |ρ| always holds. Assume that we find w_A and w_B such that |ρ| is maximized but arg ρ ≠ 0. Then we can get a real canonical correlation with the same absolute value by multiplying w_A by e^{i arg ρ}.

At the maximum, the linear combination coefficients hold information about the dependence in the input data. In learning and adaptive filtering, these linear combinations can be applied to new input data for classification.
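Theorem 7.2.1 and its proof are easy to verify numerically. Below is a small sketch in which the expected values are replaced by sums over sample data, as discussed in the chapter introduction; the data and the weight vectors are arbitrary, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 500
# complex sample data (rows are samples); E[.] is replaced by sums over the data
ZA = rng.normal(size=(N, 3)) + 1j * rng.normal(size=(N, 3))
ZB = rng.normal(size=(N, 2)) + 1j * rng.normal(size=(N, 2))
ZB[:, 0] += ZA[:, 0]               # introduce some dependence between the sets

# covariance matrices of eq. (7.3), estimated as sums over samples
CAA, CAB, CBB = ZA.conj().T @ ZA, ZA.conj().T @ ZB, ZB.conj().T @ ZB

def corr(wA, wB):
    """Canonical correlation of eq. (7.2) for given coefficient vectors."""
    num = wA.conj() @ CAB @ wB
    den = np.sqrt((wA.conj() @ CAA @ wA).real * (wB.conj() @ CBB @ wB).real)
    return num / den

wA = rng.normal(size=3) + 1j * rng.normal(size=3)   # arbitrary coefficients
wB = rng.normal(size=2) + 1j * rng.normal(size=2)
rho = corr(wA, wB)
# theorem 7.2.1: rotating wA by exp(i arg rho) gives a real correlation |rho|
rho_rot = corr(np.exp(1j * np.angle(rho)) * wA, wB)
```

The rotated correlation is real and equal to |ρ|, and, by theorem 7.3.1, its magnitude never exceeds 1.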
The stereo and motion algorithms in chapter 8 analyze w_A and w_B directly to find the mutual dependence between the two images. A simple example of canonical correlation analysis is provided in appendix A.2.

7.3 Properties of the Canonical Correlation

Theorem 7.3.1

$$|\rho| = \frac{\left|\mathbf{w}_A^* \mathbf{C}_{AB}\, \mathbf{w}_B\right|}{\sqrt{\mathbf{w}_A^* \mathbf{C}_{AA}\, \mathbf{w}_A\;\; \mathbf{w}_B^* \mathbf{C}_{BB}\, \mathbf{w}_B}} \le 1 \qquad (7.5)$$

Proof: Note that E[z_A^* z_B] is a scalar product² of the stochastic variables z_A and z_B. Thus, it follows from the Cauchy-Schwarz inequality that the numerator is less than or equal to the denominator. In real world applications, the expected value, E[z_A^* z_B], is substituted with a sum over all available data, Σ_k z*_{A,k} z_{B,k}. This sum also meets the criteria for being a scalar product; thus, it still holds that |ρ| ≤ 1.

7.4 Maximization Using SVD

Borga [5] transforms the maximization of the canonical correlation into a generalized eigenvector problem. The formulas on page 68 of his dissertation [5] are only formulated for real parameter vectors, w_A and w_B. That proof does not hold in the complex case, since it is not possible to compute the derivative of a complex conjugate (see the note in appendix A.1). It may be possible to modify Borga's proof by differentiating with respect to the real and imaginary parts separately, but we do not present any such proof. Instead, we present a novel proof and a novel method that employs neither derivatives nor a generalized eigenvector problem. Our novel method of maximizing the canonical correlation works, unlike the scheme by Borga [5], even when the covariance matrices C_AA and C_BB are singular. We will also show that it is equivalent to Borga's method, even for the complex case of canonical correlation. Thus, we will have proved that Borga's method is valid for complex variables.
7.4.1 Operations in Maximization

Since C_AA and C_BB are Hermitian and positive semidefinite, we can perform the eigenvalue decompositions

$$\mathbf{C}_{AA} = \mathbf{Q}_A \mathbf{D}_A \mathbf{Q}_A^* \quad \text{and} \quad \mathbf{C}_{BB} = \mathbf{Q}_B \mathbf{D}_B \mathbf{Q}_B^* \qquad (7.6)$$

where Q_A and Q_B are unitary³ matrices and D_A, D_B are diagonal matrices whose eigenvalues are real and nonnegative. Note that one or more eigenvalues are zero in case C_AA or C_BB is singular. In practice, matrices are almost never exactly singular, just ill conditioned; therefore, it may be necessary to threshold the eigenvalues in D_A and D_B.

Define

$$\mathbf{v}_A = \mathbf{D}_A^{1/2} \mathbf{Q}_A^* \mathbf{w}_A \quad \text{and} \quad \mathbf{v}_B = \mathbf{D}_B^{1/2} \mathbf{Q}_B^* \mathbf{w}_B \qquad (7.7)$$

which is a conventional coordinate transformation in the nonsingular case. In the singular case, one or more elements of v_A or v_B are always zero. Let us also define a covariance matrix in these new coordinates,

$$\tilde{\mathbf{C}}_{AB} = \left(\mathbf{D}_A^\dagger\right)^{1/2} \mathbf{Q}_A^* \mathbf{C}_{AB} \mathbf{Q}_B \left(\mathbf{D}_B^\dagger\right)^{1/2} \qquad (7.8)$$

where † denotes the pseudo inverse⁴. With this coordinate transformation, the canonical correlation can be expressed in a simple form. Thanks to the relations between C_AA, C_AB and C_BB, the following equations are valid even when D_A and D_B are singular; for readability, this part of the proof is put in appendix A.3.

$$\rho = \frac{\mathbf{w}_A^* \mathbf{C}_{AB}\, \mathbf{w}_B}{\sqrt{\mathbf{w}_A^* \mathbf{C}_{AA}\, \mathbf{w}_A\;\; \mathbf{w}_B^* \mathbf{C}_{BB}\, \mathbf{w}_B}}
= \frac{\mathbf{w}_A^* \mathbf{Q}_A \mathbf{Q}_A^* \mathbf{C}_{AB} \mathbf{Q}_B \mathbf{Q}_B^* \mathbf{w}_B}{\sqrt{\mathbf{w}_A^* \mathbf{Q}_A \mathbf{D}_A \mathbf{Q}_A^* \mathbf{w}_A\;\; \mathbf{w}_B^* \mathbf{Q}_B \mathbf{D}_B \mathbf{Q}_B^* \mathbf{w}_B}}$$

$$= \frac{\mathbf{w}_A^* \mathbf{Q}_A \mathbf{D}_A^{1/2} \left(\mathbf{D}_A^\dagger\right)^{1/2} \mathbf{Q}_A^* \mathbf{C}_{AB} \mathbf{Q}_B \left(\mathbf{D}_B^\dagger\right)^{1/2} \mathbf{D}_B^{1/2} \mathbf{Q}_B^* \mathbf{w}_B}{\sqrt{\mathbf{w}_A^* \mathbf{Q}_A \mathbf{D}_A \mathbf{Q}_A^* \mathbf{w}_A\;\; \mathbf{w}_B^* \mathbf{Q}_B \mathbf{D}_B \mathbf{Q}_B^* \mathbf{w}_B}} \quad \text{(see appendix A.3)}$$

$$= \frac{\mathbf{v}_A^* \tilde{\mathbf{C}}_{AB}\, \mathbf{v}_B}{\sqrt{\mathbf{v}_A^* \mathbf{v}_A\; \mathbf{v}_B^* \mathbf{v}_B}} = \hat{\mathbf{v}}_A^* \tilde{\mathbf{C}}_{AB}\, \hat{\mathbf{v}}_B \qquad (7.9)$$

where v̂_A and v̂_B denote the normalized unit vectors of v_A and v_B. This expression for ρ is simple to maximize with respect to v̂_A and v̂_B. At first thought, one might worry about what happens in the singular case, where eq. (7.7) imposes the constraint that some elements of v_A and v_B have to be zero. These constraints are automatically satisfied at the maximum of eq. (7.9), since the forbidden subspaces are the same as the left and right nullspaces of C̃_AB.

² A scalar product in a complex vector space must conjugate one of the factors.
³ A matrix Q is unitary if its inverse is Q*, i.e. Q Q* = I.
To find the maximum, a singular value decomposition (SVD) is applied to C̃_AB,

$$\tilde{\mathbf{C}}_{AB}
= \begin{pmatrix} \mathbf{e}_1 & \mathbf{e}_2 & \mathbf{e}_3 & \cdots \end{pmatrix}
\begin{pmatrix} \sigma_1 & & \\ & \sigma_2 & \\ & & \ddots \end{pmatrix}
\begin{pmatrix} \mathbf{f}_1^* \\ \mathbf{f}_2^* \\ \vdots \end{pmatrix}
= \sum_k \sigma_k\, \mathbf{e}_k \mathbf{f}_k^* \qquad (7.10)$$

By convention, {e_i} and {f_i} are both sets of orthonormal vectors, and the singular values are real and sorted in descending order, i.e. σ1 ≥ σ2 ≥ σ3 ≥ ... ≥ 0. The maximum is obtained when

$$\hat{\mathbf{v}}_A = \mathbf{e}_1, \quad \hat{\mathbf{v}}_B = \mathbf{f}_1, \quad \rho = \sigma_1 \qquad (7.11)$$

Note that the SVD is not uniquely defined in case two or more singular values are equal. If the multiplicity of the largest singular value is greater than 1, the optimal v_A and v_B are not unique. Finally, w_A and w_B can be solved for using eq. (7.7). This solution is ambiguous in the singular case, but the pseudo inverse yields the smallest w_A and w_B:

$$\mathbf{w}_A = \mathbf{Q}_A \left(\mathbf{D}_A^\dagger\right)^{1/2} \mathbf{v}_A \quad \text{and} \quad \mathbf{w}_B = \mathbf{Q}_B \left(\mathbf{D}_B^\dagger\right)^{1/2} \mathbf{v}_B \qquad (7.12)$$

⁴ The pseudo inverse † of a diagonal matrix is simple: just invert each of the nonzero elements. For example, $\begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix}^\dagger = \begin{pmatrix} 0.5 & 0 \\ 0 & 0 \end{pmatrix}$.

7.5 Canonical Variates

We do not know of any good general definition of a canonical variate. Borga [5] provides a definition that depends on his maximization method. In this thesis, a different maximization method is used, and a different definition needs to be introduced. In section 7.6, this definition is proved to be equivalent to Borga's definition. In this thesis, the canonical variates are defined as the (suboptimal) solutions to the canonical correlation corresponding to the different singular values in the SVD, eq. (7.10). The variate of index k is what we get if we replace eq. (7.11) with

$$\hat{\mathbf{v}}_A = \mathbf{e}_k, \quad \hat{\mathbf{v}}_B = \mathbf{f}_k, \quad \rho = \sigma_k \qquad (7.13)$$

7.6 Equivalence with Borga's Solution

The objective of this section is to show that the CCA-SVD method gives the same solutions as Borga's [5] method, which transforms the maximization problem into a generalized eigenvector problem. The following equations are valid even in the complex and singular cases. Thus, the equivalence proof also confirms the validity of Borga's [5] method for complex variables.
Even the canonical variates are the same as in Borga's method. Recall the singular value decomposition in eq. (7.10), C̃_AB = Σ_k σ_k e_k f_k^*, and study what it means for the solutions of the following equation system:

$$\tilde{\mathbf{C}}_{AB}\, \hat{\mathbf{v}}_B = \rho\, \hat{\mathbf{v}}_A$$
$$\tilde{\mathbf{C}}_{AB}^*\, \hat{\mathbf{v}}_A = \rho\, \hat{\mathbf{v}}_B \qquad (7.14)$$

These equations are satisfied if and only if v̂_A, v̂_B and ρ are corresponding components in the SVD of C̃_AB. Or, to be exact, in case a singular value has multiplicity greater than 1, the solutions are linear combinations of SVD vectors with the same singular value. For readability, these linear combinations are not written out explicitly, but they are implicit, since the singular value decomposition is not unique.

$$\hat{\mathbf{v}}_A = \mathbf{e}_k, \quad \hat{\mathbf{v}}_B = \mathbf{f}_k, \quad \rho = \sigma_k, \quad k = 1, 2, 3, \ldots \qquad (7.15)$$

This means that the canonical variates computed by our novel SVD method are the only solutions to eq. (7.14). We want these equations in the w coordinates, as in Borga's thesis. Use eq. (7.7) to substitute v_A and v_B, and multiply the two equations by Q_A D_A^{1/2} and Q_B D_B^{1/2} respectively. Although we multiply by D_A^{1/2}, which might be singular, we retain equivalence with eq. (7.14) (since D_A^{1/2} and D_A have the same rank):

$$\mathbf{Q}_A \mathbf{D}_A^{1/2} \left(\mathbf{D}_A^\dagger\right)^{1/2} \mathbf{Q}_A^* \mathbf{C}_{AB} \mathbf{Q}_B \left(\mathbf{D}_B^\dagger\right)^{1/2} \mathbf{D}_B^{1/2} \mathbf{Q}_B^* \mathbf{w}_B = \rho\, \mathbf{Q}_A \mathbf{D}_A \mathbf{Q}_A^* \mathbf{w}_A \qquad (7.16)$$

$$\mathbf{Q}_B \mathbf{D}_B^{1/2} \left(\mathbf{D}_B^\dagger\right)^{1/2} \mathbf{Q}_B^* \mathbf{C}_{AB}^* \mathbf{Q}_A \left(\mathbf{D}_A^\dagger\right)^{1/2} \mathbf{D}_A^{1/2} \mathbf{Q}_A^* \mathbf{w}_A = \rho\, \mathbf{Q}_B \mathbf{D}_B \mathbf{Q}_B^* \mathbf{w}_B \qquad (7.17)$$

Eq. (A.2) makes most of the matrix products cancel out, and we arrive at the following expressions, which are equivalent to eq. (4.30) in Borga's thesis, except that the w vectors are not normalized:

$$\mathbf{C}_{AB}\, \mathbf{w}_B = \rho\, \mathbf{C}_{AA}\, \mathbf{w}_A \qquad (7.18)$$

$$\mathbf{C}_{AB}^*\, \mathbf{w}_A = \rho\, \mathbf{C}_{BB}\, \mathbf{w}_B \qquad (7.19)$$

We can normalize the vectors, provided that we multiply the right hand side of one equation by a scalar and the right hand side of the other by its inverse (cf. Borga's notation). Then we have exactly equation (4.30) in Borga's PhD thesis [5]. The singular values and vectors correspond to the canonical variates. Note that this proof holds even in the complex and singular cases.
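The whole procedure of sections 7.4.1 and 7.5 (eigendecomposition with thresholding, the coordinate change of eq. (7.7), SVD of C̃_AB, and back substitution via eq. (7.12)) can be sketched compactly. This is only an illustrative sketch under hypothetical names, with E[·] replaced by sums over the rows of two complex data matrices; the thesis implementation may differ in detail.

```python
import numpy as np

def cca_svd(ZA, ZB, tol=1e-3):
    """Maximize the canonical correlation via the eigen/SVD route of
    section 7.4, thresholding small eigenvalues so that singular or
    ill-conditioned covariance matrices are handled."""
    CAA, CAB, CBB = ZA.conj().T @ ZA, ZA.conj().T @ ZB, ZB.conj().T @ ZB
    dA, QA = np.linalg.eigh(CAA)              # CAA = QA diag(dA) QA*, eq. (7.6)
    dB, QB = np.linalg.eigh(CBB)
    sA = np.zeros_like(dA)                    # diagonal of (DA^+)^(1/2)
    sB = np.zeros_like(dB)
    keepA, keepB = dA > tol * dA.max(), dB > tol * dB.max()
    sA[keepA] = 1.0 / np.sqrt(dA[keepA])
    sB[keepB] = 1.0 / np.sqrt(dB[keepB])
    Ct = (sA[:, None] * QA.conj().T) @ CAB @ (QB * sB[None, :])   # eq. (7.8)
    U, S, Vh = np.linalg.svd(Ct)              # eq. (7.10)
    rho = S[0]                                # eq. (7.11)
    wA = QA @ (sA * U[:, 0])                  # eq. (7.12)
    wB = QB @ (sB * Vh[0].conj())
    return rho, wA, wB

rng = np.random.default_rng(3)
N = 400
ZA = rng.normal(size=(N, 4)) + 1j * rng.normal(size=(N, 4))
ZB = rng.normal(size=(N, 3)) + 1j * rng.normal(size=(N, 3))
ZB[:, 1] += 2 * ZA[:, 2]          # a shared component gives high correlation
ZA[:, 3] = ZA[:, 0]               # a duplicated column makes CAA singular
rho, wA, wB = cca_svd(ZA, ZB)
```

Substituting the recovered w_A and w_B back into eq. (7.2) reproduces ρ, and ρ stays in [0, 1] despite the singular C_AA.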
To emphasize that this is a generalized eigenvalue problem, let us write it in matrix form,

$$\begin{pmatrix} \mathbf{0} & \mathbf{C}_{AB} \\ \mathbf{C}_{AB}^* & \mathbf{0} \end{pmatrix}
\begin{pmatrix} \mathbf{w}_A \\ \mathbf{w}_B \end{pmatrix}
= \rho
\begin{pmatrix} \mathbf{C}_{AA} & \mathbf{0} \\ \mathbf{0} & \mathbf{C}_{BB} \end{pmatrix}
\begin{pmatrix} \mathbf{w}_A \\ \mathbf{w}_B \end{pmatrix} \qquad (7.20)$$

We do not recommend Borga's method when C_AA and C_BB are close to singular; experimental results⁵ indicate serious numerical problems.

⁵ Our motion algorithms using the Matlab function eig().

Chapter 8

Motion Estimation using Canonical Correlation

Canonical correlation has been successfully used for estimation of disparity in a stereo algorithm by Borga [5]. An important advantage of that method is its ability to handle depth discontinuities. Whereas conventional stereo algorithms smooth disparity estimates across discontinuities, Borga's algorithm responds with a distinct discontinuity. Experiments with transparent layers even demonstrate an ability to estimate multiple disparities at a single point in the image. It should be pointed out that there are other stereo algorithms that can handle depth discontinuities, e.g. Birchfield-Tomasi [4], which searches for single pixel correspondence.

One may wish for a motion estimation algorithm with the same advantages as Borga's stereo algorithm: in case of occlusion, motion discontinuities would be correctly estimated, and transparent layers would give multiple motion estimates at a single point. Unfortunately, motion estimation is more complicated, due to the generalized aperture problem described in chapter 5. It may still be possible to compute motion constraints that are not smoothed across discontinuities and not much degraded by interference between multiple layers. We have extended the stereo algorithm to estimate motion, but so far only for a single motion. It remains to explore its potential abilities in estimation of multiple motions.

8.1 Operations Applied Locally in the Image

The image is first convolved with a number of quadrature filters and then divided into patches, e.g. blocks of size 16x16 pixels.
The patch should be so small that the motion can be considered pure translation within the patch. Each patch, for each filter output, is processed independently to produce a motion constraint, c. This section describes these local operations.

Figure 8.1: From image to motion constraint, for one filter direction and one patch. (The images I_A(x) and I_B(x) are convolved with quadrature filters in directions 0, 45, 90 and 135 degrees; local covariance matrices C_AA, C_AB and C_BB of shifted filter outputs are computed; the canonical correlation is maximized, yielding coefficient vectors w_A and w_B; the corresponding linear combinations of shifted filters, f_A(x) and f_B(x), are cross correlated to give g(v), from which the constraint c = (c_x, c_y, c_t)ᵀ is computed.) Don't forget that all values in between are complex numbers. A look up table can speed up the computations.

8.1.1 Shifted Quadrature Filter Outputs

Each of the two original images is convolved with a number of quadrature filters, as defined in section 1.4. We have used filters in the directions 0, 45, 90 and 135 degrees. Since only one filter is used to compute one motion constraint, c, the direction is dropped from our notation; for readability, we let f(x) denote the quadrature filter of any direction. We have not tried filters with different center frequencies, but we believe it would improve performance.

$$q_A(\mathbf{x}) = (f * I_A)(\mathbf{x}) \quad \text{and} \quad q_B(\mathbf{x}) = (f * I_B)(\mathbf{x}) \qquad (8.1)$$

These filter outputs are shifted by a number of predefined shifts, s1, s2, s3, ..., and correlated. For example, in case the motion is exactly v = s3, then q_A(x) and q_B(x + s3) will correlate perfectly. In case v ≈ s3, we also get a high correlation magnitude, but the value is complex, with an argument almost proportional to the difference v − s3.
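This phase behaviour is easy to verify numerically. Below is a 1D sketch, assuming an idealized narrowband one-sided (analytic) filter constructed directly in the Fourier domain and an exact circular shift between the two signals; all names and parameter values are hypothetical, not the thesis filters.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 256
IA = rng.normal(size=N)
v = 1                                  # true translation: IB(x) = IA(x - v)
IB = np.roll(IA, v)

# narrowband one-sided (analytic) filter applied in the Fourier domain
u = 2 * np.pi * np.fft.fftfreq(N)      # angular frequency per sample
u0 = np.pi / 4                         # centre frequency (hypothetical value)
H = np.exp(-0.5 * ((u - u0) / 0.1) ** 2) * (u > 0)
qA = np.fft.ifft(np.fft.fft(IA) * H)   # complex filter outputs
qB = np.fft.ifft(np.fft.fft(IB) * H)

def corr_shift(s):
    """Correlate qA(x) with the shifted output qB(x + s) (circular)."""
    return np.sum(qA.conj() * np.roll(qB, -s))

r_match = corr_shift(v)    # shift equals the motion: real, positive correlation
r_off   = corr_shift(0)    # shift off by one pixel: argument ~ -u0 * (v - s)
```

When the shift matches the motion, the correlation is real and positive; when it is off by one pixel, the argument is roughly the centre frequency times the residual shift.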
This property is fundamental to the phase-based method in chapter 4. The method in this chapter is based on finding the linear combinations of shifted filter outputs,

$$\sum_i w_{Ai}\, q_A(\mathbf{x} + \mathbf{s}_i) \quad \text{and} \quad \sum_i w_{Bi}\, q_B(\mathbf{x} + \mathbf{s}_i), \qquad (8.2)$$

that have the highest possible correlation. The coefficients are complex, and we arrange them in vectors

$$\mathbf{w}_A = \begin{pmatrix} w_{A1} \\ w_{A2} \\ w_{A3} \\ \vdots \end{pmatrix} \quad \text{and} \quad \mathbf{w}_B = \begin{pmatrix} w_{B1} \\ w_{B2} \\ w_{B3} \\ \vdots \end{pmatrix}. \qquad (8.3)$$

8.1.2 Canonical Correlation

For each filter direction and each patch, canonical correlation is used to find the linear combinations of shifted filter outputs in eq. (8.2) that have maximum correlation. The patch region in the image is denoted N. The unknown coefficients of the linear combinations are organized in the vectors w_A and w_B. In terms of these notations, we want to maximize the following correlation, under the constraint that it is real and positive:

$$\rho = \max_{\mathbf{w}_A, \mathbf{w}_B}
\frac{\iint_N \left(\sum_i w_{Ai}\, q_A(\mathbf{x} + \mathbf{s}_i)\right)^* \sum_i w_{Bi}\, q_B(\mathbf{x} + \mathbf{s}_i)\, d\mathbf{x}}
{\sqrt{\iint_N \left|\sum_i w_{Ai}\, q_A(\mathbf{x} + \mathbf{s}_i)\right|^2 d\mathbf{x}\; \iint_N \left|\sum_i w_{Bi}\, q_B(\mathbf{x} + \mathbf{s}_i)\right|^2 d\mathbf{x}}}
= \max_{\mathbf{w}_A, \mathbf{w}_B} \frac{\mathbf{w}_A^* \mathbf{C}_{AB}\, \mathbf{w}_B}{\sqrt{\mathbf{w}_A^* \mathbf{C}_{AA}\, \mathbf{w}_A\;\; \mathbf{w}_B^* \mathbf{C}_{BB}\, \mathbf{w}_B}} \qquad (8.4)$$

Figure 8.2: The set of nine shifted quadrature filters, in directions 0, 45, 90 and 135 degrees, used in the experiments in section 8.4. (The shifts s1, ..., s9 form a 3x3 grid whose components are −2, 0 and +2 pixels.)

This is the form of canonical correlation where C_AA, C_AB and C_BB are covariance matrices. The element at row m and column n of each covariance matrix is computed as

$$C_{AA,mn} = \iint_N q_A(\mathbf{x} + \mathbf{s}_m)^*\, q_A(\mathbf{x} + \mathbf{s}_n)\, d\mathbf{x} \qquad (8.5)$$

$$C_{AB,mn} = \iint_N q_A(\mathbf{x} + \mathbf{s}_m)^*\, q_B(\mathbf{x} + \mathbf{s}_n)\, d\mathbf{x} \qquad (8.6)$$

$$C_{BB,mn} = \iint_N q_B(\mathbf{x} + \mathbf{s}_m)^*\, q_B(\mathbf{x} + \mathbf{s}_n)\, d\mathbf{x} \qquad (8.7)$$

The canonical correlation is maximized using the SVD-based method in chapter 7, which can handle singular covariance matrices. Matrices are virtually never singular, just ill conditioned, and therefore we threshold the eigenvalues of the covariance matrices in eq.
(7.6). The threshold should be much higher than what can be justified by floating point rounding errors. In order to reject weak features in the images, the threshold in our implementation is set to 1/1000 of the largest eigenvalue. The exact value of the threshold is probably not important and can vary by several orders of magnitude without significant changes in the motion estimates.

8.1.3 Correlation of Filters

Maximizing the canonical correlation means finding the linear combinations of filter outputs that yield maximum correlation. In the previous sections, we found the coefficient vectors, w_A and w_B, such that the maximum correlation is obtained for

$$\sum_i w_{Ai}\, (I_A * f)(\mathbf{x} + \mathbf{s}_i) \quad \text{and} \quad \sum_i w_{Bi}\, (I_B * f)(\mathbf{x} + \mathbf{s}_i). \qquad (8.8)$$

Thanks to the properties of convolution, it makes sense to study the linear combinations of the filters themselves,

$$f_A(\mathbf{x}) = \sum_i w_{Ai}\, f(\mathbf{x} + \mathbf{s}_i) \quad \text{and} \quad f_B(\mathbf{x}) = \sum_i w_{Bi}\, f(\mathbf{x} + \mathbf{s}_i), \qquad (8.9)$$

instead of the filter outputs. Convolving the images with these filters is the same as convolving the images with each of the original filters and then computing the linear combinations. In the sense of correlation, these are the best possible linear combinations of the original filters, and the motion can be estimated by analyzing them.

The filters obtained as linear combinations of quadrature filters in the same direction are also quadrature filters. This statement is obvious if we consider the filter summation in the Fourier domain: since all the added filters are zero in one half plane, the sum is also zero in that half plane.

Since the two images are similar except for a shift, i.e. I_A(x) = I_B(x + v), the computed filters should also be similar, except for an equally large shift in the opposite direction, f_A(x) = f_B(x − v). To find the correct motion, v, we analyze the cross correlation of the generated filters,

$$g(\mathbf{v}) = \iint f_A(\mathbf{x} + \mathbf{v})^*\, f_B(\mathbf{x})\, d\mathbf{x} \qquad (8.10)$$

In a perfect world, the cross correlation g(v) has a peak value where v is the image motion.
This peak value is real and positive, i.e. the phase crosses zero. In practice, the zero crossing of the phase does not perfectly coincide with the maximum amplitude. Just finding correlation peaks is of limited use in image regions that only have structure in one orientation, e.g. a straight line or edge. The phase is used since it is aware of the aperture problem, and we also believe that zero crossings are more accurate than the maximum amplitude. The phase of g(v) crosses zero along curves in (v_x, v_y)-space. Usually there are several curves, but the curve with the highest amplitude is probably the one corresponding to the image motion. How to analyze the cross correlation, g(v), is described in section 8.1.5. The next section describes how to compute g(v) using a look up table.

8.1.4 Look Up Table (LUT)

Since the generated filters, f_A(x) and f_B(x), are linear combinations of a set of original filters, their cross correlation, g(v), is a sum of cross correlations of the original shifted filters. The LUT is computed by explicitly shifting filters. For subpixel accuracy, the shifts are implemented as multiplications in the Fourier domain.

Figure 8.3: Zero crossings of the phase for all the patches in an image with affine motion. In each subplot, the zero crossings, arg g(v) = 0, are drawn for each of the filter directions. Most zero crossings are straight lines. Sometimes there are multiple false zero crossings. Since the motion is not pure translation, the intersections have different positions in different patches.

In matrix form, the generated filters can be expressed as

$$f_A(\mathbf{x}) = \sum_i w_{Ai}\, f(\mathbf{x} + \mathbf{s}_i)
= \begin{pmatrix} f(\mathbf{x} + \mathbf{s}_1) & \cdots & f(\mathbf{x} + \mathbf{s}_N) \end{pmatrix} \mathbf{w}_A. \qquad (8.11)$$

The cross correlation is then a product of the coefficient vectors from the canonical correlation and a matrix whose elements are cross correlations of the original filters.
$$g(\mathbf{v}) = \iint f_A(\mathbf{x} + \mathbf{v})^*\, f_B(\mathbf{x})\, d\mathbf{x}
= \mathbf{w}_A^* \left[ \iint
\begin{pmatrix} f(\mathbf{x} + \mathbf{v} + \mathbf{s}_1)^* \\ \vdots \\ f(\mathbf{x} + \mathbf{v} + \mathbf{s}_N)^* \end{pmatrix}
\begin{pmatrix} f(\mathbf{x} + \mathbf{s}_1) & \cdots & f(\mathbf{x} + \mathbf{s}_N) \end{pmatrix}
d\mathbf{x} \right] \mathbf{w}_B
= \mathbf{w}_A^*\, \mathbf{G}(\mathbf{v})\, \mathbf{w}_B \qquad (8.12)$$

where

$$\mathbf{G}(\mathbf{v}) = \iint
\begin{pmatrix} f(\mathbf{x} + \mathbf{v} + \mathbf{s}_1)^* \\ \vdots \\ f(\mathbf{x} + \mathbf{v} + \mathbf{s}_N)^* \end{pmatrix}
\begin{pmatrix} f(\mathbf{x} + \mathbf{s}_1) & \cdots & f(\mathbf{x} + \mathbf{s}_N) \end{pmatrix}
d\mathbf{x} \qquad (8.13)$$

G(v) is a look up table (LUT) that is precomputed for a number of different values of v. Since subpixel shifts are necessary, there is the question of which interpolation method to use. For this particular data, we have chosen phase shifts in the Fourier domain, since we are not worried about ringing in the spatial domain. In order to reduce the effects of circular shifts, the filters are zero padded at the borders before computing the FFT; zero padding is equivalent to denser sampling in the Fourier domain. For computational efficiency, Plancherel's formula may be used to compute the cross correlation directly in the Fourier domain, avoiding the inverse FFT:

$$G_{mn}(\mathbf{v}) = \iint f(\mathbf{x} + \mathbf{v} + \mathbf{s}_m)^*\, f(\mathbf{x} + \mathbf{s}_n)\, d\mathbf{x}
= \frac{1}{(2\pi)^2} \iint \left( F(\mathbf{u})\, e^{i \mathbf{u}^T (\mathbf{v} + \mathbf{s}_m)} \right)^* F(\mathbf{u})\, e^{i \mathbf{u}^T \mathbf{s}_n}\, d\mathbf{u}
= \frac{1}{(2\pi)^2} \iint |F(\mathbf{u})|^2\, e^{i \mathbf{u}^T (\mathbf{s}_n - \mathbf{s}_m - \mathbf{v})}\, d\mathbf{u} \qquad (8.14)$$

where F(u) is the Fourier transform of f(x).

In the next section, interpolation is used to compute g(v) for values of v that are not in the look up table. Bilinear interpolation is used, but not directly on the real and imaginary parts. Instead, the interpolation is done in a polar representation of the complex numbers, since the phase is more linear than the real and imaginary parts. This interpolation also enables us to compute derivatives of the phase.

8.1.5 Motion Constraints from Correlation Data

The motion to estimate, v, is assumed to lie along one of the zero crossings of the phase of the correlation map. This yields a nonlinear constraint on the local motion, arg g(v) = 0.
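As an aside on eq. (8.14), the Fourier-domain evaluation has a simple discrete analogue (circular shifts, DFT, discrete Parseval) that can be checked numerically. A sketch with a random complex array standing in for the filter and integer shifts only; all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(5)
N = 32
f = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))  # stand-in for a filter

def shift(a, s):
    """Circularly shifted a(x + s) for an integer shift s = (sx, sy)."""
    return np.roll(a, (-s[0], -s[1]), axis=(0, 1))

sm, sn, v = np.array([2, 0]), np.array([0, -2]), np.array([1, 1])

# direct spatial correlation, the integrand of eq. (8.14)
G_spatial = np.sum(shift(f, v + sm).conj() * shift(f, sn))

# the same quantity via the power spectrum (discrete Parseval / shift theorem)
F = np.fft.fft2(f)
kx = np.fft.fftfreq(N)[:, None]
ky = np.fft.fftfreq(N)[None, :]
phase = np.exp(2j * np.pi * (kx * (sn[0] - sm[0] - v[0])
                             + ky * (sn[1] - sm[1] - v[1])))
G_fourier = np.sum(np.abs(F) ** 2 * phase) / N**2
```

The two evaluations agree to rounding error; in the thesis setting, the Fourier route is what makes precomputing the LUT for many subpixel values of v cheap.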
In order to keep the computations reasonably simple, the nonlinear constraint is approximated by a linear motion constraint, cᵀṽ = 0, as defined in section 2.2, with notation as in previous chapters,

$$\tilde{\mathbf{v}} = \begin{pmatrix} \mathbf{v} \\ 1 \end{pmatrix} \quad \text{and} \quad \mathbf{c} = \begin{pmatrix} c_x \\ c_y \\ c_t \end{pmatrix}. \qquad (8.15)$$

Unfortunately, there are often multiple zero crossings of the phase, and the zero crossings are not along straight lines. For that reason, it is necessary to know roughly what the motion is before converting to a linear motion constraint. Assuming the motion is close to v0, we require that a linear motion constraint, c, has the property

$$C \arg g(\mathbf{v}) = \mathbf{c}^T \tilde{\mathbf{v}} + O(\|\mathbf{v} - \mathbf{v}_0\|^2). \qquad (8.16)$$

The solution is

$$\begin{pmatrix} c_x \\ c_y \end{pmatrix} = C\, \nabla \arg g(\mathbf{v}_0) \quad \text{and} \quad c_t = C \arg g(\mathbf{v}_0) - c_x v_{0,x} - c_y v_{0,y} \qquad (8.17)$$

where C is a confidence measure set to

$$C = \begin{cases} \left( \dfrac{\rho - \gamma_2}{\gamma_1 - \gamma_2} \right)^{\gamma_3} |g(\mathbf{v})|\, \|\nabla \arg g(\mathbf{v})\| & \text{if } \rho > \gamma_2 \\[2mm] 0 & \text{otherwise} \end{cases} \qquad (8.18)$$

where γ1 = 1.001, γ2 = 0.98 and γ3 = 1 are constants chosen by studying a few experiments. Note that the magnitude of the gradient of the phase enters both eq. (8.17) and eq. (8.18).

8.2 Fitting Motion Model to Data

The image is divided into patches, each of which yields as many motion constraints as there are directions of quadrature filters. These constraints are combined according to the theory of motion models in chapter 3 and produce a motion estimate. Instead of the iterative refinement with warping described in chapter 4, we iterate without warping the image: the motion constraints are recomputed with the updated motion as v0 in eq. (8.17) and figure 8.4.

8.3 Choosing Patch Size

A small patch often contains too little information. For the canonical correlation to be meaningful, a patch must contain at least as many pixels as the

Figure 8.4: Flow chart of our CCA-based motion estimation, regarded from a single patch (quadrature filtering, CCA, correlation of filters via the LUT, computation of c, and motion model fitting, iterated). Computation of motion constraints requires that the motion is known approximately.
Since we can only make a rough guess, a number of iterations is necessary.

number of shifts per filter. But even if there are fewer pixels, there is still a chance that the canonical correlation finds a good pair of linear combinations. A too large patch, on the other hand, will not reflect the local structure of the image, but rather its global distribution. Large patches also have problems with motions that are not pure translations. The error in the estimation of rotations is probably proportional to the patch size (for large patches), and a few experiments on affine motions suggest that the error is roughly a certain fraction of the motion variation within a single patch. In addition, the larger the patches are, the fewer they get, so a lot of information seems to be thrown away; more patches yield more motion constraints.

8.4 Experimental Results

Figure 8.5 shows the accuracy of motion estimation on an image with synthetically introduced motions. The famous test image Lena (512x512) has been shifted in several different directions and distances. The images have then been subsampled to 128 pixels in order to hide the artifacts introduced by the subpixel shifts. Thus, we have good test images of size 128x128 pixels for which we know the answer. The motion is estimated and the mean square deviation is plotted for each magnitude of shift. Since the look up table is only computed for shifts smaller than 2 pixels, it is impossible to estimate larger motions. In the experiment, the center frequency of the filter is roughly 1 rad/pixel¹, the patch size is 16x16 pixels and the filter outputs are shifted according to the s_i in figure 8.2.

¹ The filter is taken off the shelf; its internal name is orient8 in GOP.

Figure 8.5: Accuracy is very good for this synthetically shifted image, Lena128. The mean square error (0 to 0.01 pixels) is plotted versus the amount the image is shifted (0 to 2 pixels).
(Do not compare with figure 4.6, where no iterations are done.)

8.5 Future Development

The experimental results on synthetic images are in themselves a justification for the research we have done so far. Our future goal is still the estimation of multiple motions, which we have not yet tried. The difficulty compared to the stereo algorithm is again the general aperture problem described in chapter 5.

8.5.1 Using Multiple Variates

The canonical correlation analysis generates multiple canonical variates. Most of these canonical variates yield high correlation and similar cross correlations of the generated filters, g(v). It may be possible to use more variates than the first one, but so far we have not seen any significant improvement in experimental results.

8.5.2 Other Filters than Quadrature Filters

We have performed some experiments on replacing the quadrature filters by pairs of odd and even real filters. The purpose is to allow more degrees of freedom in the canonical correlation by permitting any linear combination of odd and even parts. Then f_A(x) and f_B(x) are sums of real filters, both even and odd. Thus, the generated filters are not quadrature filters, and the cross correlation must be done in a different way. In our experiments, we transformed the filters to the Fourier domain and multiplied one of them by e^{iθ}, where θ denotes the angle in the polar representation of the frequency in the Fourier domain. After cross correlation, the magnitude of g(v) is then zero at the motion. Unfortunately, zero crossings of the magnitude are harder to find than zero crossings of the phase. In particular, it gets harder to estimate large motions, since there are more zero crossings of the magnitude.

8.5.3 Reducing Patch Size

Maybe reducing the patch size helps in estimating non-translational motions and motions of multiple transparent layers. It may be possible to reduce the patch size if fewer shifts, s_i, are used. Maybe the set of shifts should depend on the direction of the filter.
It may also help to have different shifts for the two images in case the motion is roughly known, as in iterative refinement. In the extreme case, if only one shift is used for every image, we get something similar to the phase-based method of chapter 4 combined with warping. We suggest being careful with such approaches. For example, choosing shifts only along a line would mean that the motion is estimated in a separable fashion, and we are back at the mistake described in section 2.1.1.

Appendices

A Details for Chapter 7 on Canonical Correlation

A.1 Failure to Compute the Derivative with Respect to a Complex Variable

Calculus with complex variables often obeys the same rules as calculus with real variables. For that reason, it is easy to forget that the same rules do not always apply. Of interest for this chapter, we will show why it is not possible to compute the derivative of a complex conjugate. Let f(z) = conj(z). The derivative is defined as a limit that does not exist, since h is a complex number:

    f'(z) = lim_{|h|->0} ( f(z+h) - f(z) ) / h
          = lim_{|h|->0} ( conj(z+h) - conj(z) ) / h
          = lim_{|h|->0} conj(h) / h                          (A.1)

The quotient conj(h)/h has unit magnitude but depends on the direction from which h approaches zero, so the limit does not exist. Of course, it is still possible to split the complex variable into real and imaginary parts, z = a + ib, and then calculate the partial derivatives ∂f(a+ib)/∂a and ∂f(a+ib)/∂b.

A.2 Beginner's Example of Canonical Correlation

Assume X1, X2, X3, X4 are independent stochastic variables with zero mean and standard deviation σ = 1. Let

    z_A = (X1 + X2, X1 - X2, X3)^T    and    z_B = (X1, X4)^T

Note that the only variable that appears in both data sets is X1. For this simple case, it is obvious that the maximum correlation is between z_{A,1} + z_{A,2} and z_{B,1}. To verify the CCA algorithm, go through the formal computations.
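Before going through the computations by hand, the example can be cross-checked numerically. The sketch below (our own, using numpy) builds the exact covariance matrices for this example and recovers the canonical correlations by whitening both sides and taking an SVD, which is one standard way of solving the CCA problem:

```python
import numpy as np

# Exact covariance matrices for z_A = (X1+X2, X1-X2, X3) and z_B = (X1, X4),
# where X1..X4 are independent with zero mean and unit variance.
C_AA = np.diag([2.0, 2.0, 1.0])
C_AB = np.array([[1.0, 0.0],
                 [1.0, 0.0],
                 [0.0, 0.0]])
C_BB = np.eye(2)

# Whiten both sides (Cholesky-based inverse square roots) and take the SVD;
# the singular values are the canonical correlations.
Wa = np.linalg.inv(np.linalg.cholesky(C_AA))
Wb = np.linalg.inv(np.linalg.cholesky(C_BB))
rho = np.linalg.svd(Wa @ C_AB @ Wb.T, compute_uv=False)

print(rho)  # approximately [1, 0]
```

The leading canonical correlation is 1, which comes from X1 alone; as noted in the text, such perfect correlation is unusual in real-world applications.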
C_AA = E[z_A z_A^T]

     = E [ (X1+X2)(X1+X2)   (X1+X2)(X1-X2)   (X1+X2)X3 ]
         [ (X1-X2)(X1+X2)   (X1-X2)(X1-X2)   (X1-X2)X3 ]
         [ X3(X1+X2)        X3(X1-X2)        X3 X3     ]

     = [ 2 0 0 ]
       [ 0 2 0 ]
       [ 0 0 1 ]

C_AB = E[z_A z_B^T]

     = E [ (X1+X2)X1   (X1+X2)X4 ]     [ 1 0 ]
         [ (X1-X2)X1   (X1-X2)X4 ]  =  [ 1 0 ]
         [ X3 X1       X3 X4     ]     [ 0 0 ]

C_BB = E[z_B z_B^T] = E [ X1 X1   X1 X4 ]  =  [ 1 0 ]
                        [ X4 X1   X4 X4 ]     [ 0 1 ]

Maximization gives

    w_A = (1/sqrt(2)) (1, 1, 0)^T ,    w_B = (1, 0)^T    and    ρ = 1

This time ρ = 1, which means that the two linear combinations are always equal, except for a scalar factor. This is not normal in real-world applications, where usually no linear combination gives perfect correlation.

A.3 Proof of Equation (7.9)

This section is a proof that, given the variable definitions in chapter 7,

    D_A D_A† Q_A C_AB Q_B* D_B† D_B = Q_A C_AB Q_B*        (A.2)

In the singular case, D_A D_A† ≠ I and/or D_B† D_B ≠ I, but we will show that eq. (7.9) is still valid thanks to the relations between C_AA, C_AB and C_BB. It is enough to prove one half of the theorem,

    Q_A C_AB = D_A D_A† Q_A C_AB        (A.3)

The other part of the theorem, C_AB Q_B* = C_AB Q_B* D_B† D_B, can be proved in the same way. Before going into the core of the proof, note that D_A D_A† is equal to the identity matrix except in the positions where D_A is zero. For example, if

    D_A = [ 3.26   0      0 ]                  [ 1 0 0 ]
          [ 0      56.31  0 ]   then D_A D_A† = [ 0 1 0 ]
          [ 0      0      0 ]                  [ 0 0 0 ]

The definitions of these covariance matrices imply that the null spaces of C_AA and C_BB are also left and right null spaces of C_AB. Let us apply a coordinate transformation to obtain a form of the canonical correlation which is useful in the proof:

    ρ = ( w_A* Q_A* Q_A C_AB Q_B* Q_B w_B )
        / sqrt( (w_A* Q_A* D_A Q_A w_A) (w_B* Q_B* D_B Q_B w_B) )

      = ( u_A* Q_A C_AB Q_B* u_B )
        / sqrt( (u_A* D_A u_A) (u_B* D_B u_B) )                (A.4)

where

    u_A = Q_A w_A ,    u_B = Q_B w_B        (A.5)

Here comes the core of the proof. Pick arbitrary u_A and u_B and split the former into two parts,

    u_A∥ = D_A D_A† u_A ,    u_A⊥ = (I - D_A D_A†) u_A        (A.6)

Note that u_A∥ + u_A⊥ = u_A.
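The splitting of u_A rests on D_A D_A† acting as a projection onto the coordinates where D_A is nonzero. A small numpy sketch (ours) illustrates this with the example diagonal matrix from the text:

```python
import numpy as np

D_A = np.diag([3.26, 56.31, 0.0])
P = D_A @ np.linalg.pinv(D_A)      # identity except where D_A is zero

u = np.array([1.0, 2.0, 3.0])      # an arbitrary u_A
u_par = P @ u                      # component where D_A is nonzero
u_perp = (np.eye(3) - P) @ u       # component in the null space of D_A

assert np.allclose(P, np.diag([1.0, 1.0, 0.0]))
assert np.allclose(u_par + u_perp, u)
# The first denominator factor in eq. (A.8) vanishes for the perpendicular part:
assert np.isclose(u_perp @ D_A @ u_perp, 0.0)
```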
We will prove that the error in the numerator of eq. (A.4) is zero:

    ε = u_A* Q_A C_AB Q_B* u_B - u_A* D_A D_A† Q_A C_AB Q_B* u_B
      = u_A* (I - D_A D_A†) Q_A C_AB Q_B* u_B
      = (u_A∥ + u_A⊥)* (I - D_A D_A†) Q_A C_AB Q_B* u_B
      = u_A⊥* Q_A C_AB Q_B* u_B                                (A.7)

To prove that u_A⊥* Q_A C_AB Q_B* u_B is zero, we employ a simple trick. Study the correlation when the coefficients of the linear combinations are u_A⊥ and u_B:

    ρ(u_A⊥, u_B) = ( u_A⊥* Q_A C_AB Q_B* u_B )
                   / sqrt( (u_A⊥* D_A u_A⊥) (u_B* D_B u_B) )
                 = ε / ( 0 · sqrt(u_B* D_B u_B) )              (A.8)

The denominator is zero because u_A⊥ lies in the null space of D_A. Recall that the canonical correlation satisfies |ρ| ≤ 1, eq. (7.2.1). Thus, a zero in the denominator implies a zero in the numerator. Since ε = 0 for arbitrary u_A and u_B, eq. (A.3) is proven.

B Variable Names

All variable names that are used without immediate explanation are listed here. Most variable names are local to each chapter, but some are used throughout the thesis. The right column indicates where each variable is defined. An introduction to our style and notation (except for variable names) is provided in section 1.3.

B.1 Global Variable Names

The following notations are used in many chapters without immediate explanation at every occurrence.

    v = (v_x, v_y)^T        image motion                                       eq. (3.3)
    c = (c_x, c_y, c_t)^T   motion constraint, such that c^T v = 0             eq. (2.2)
    x = (x, y)^T            spatial position                                   eq. (3.2)
    x_k                     often the spatial coordinate of constraint c_k,    eq. (3.12)
                            with index k
    a                       vector of parameters for the motion model          eq. (3.7)
    K(x)                    matrix of basis functions for the motion model     eq. (3.8)
    I_A(x), I_B(x)          two images that are input to motion estimation     section 4.2.1

B.2 Local Variable Names in Chapter 3

    ṽ = (v^T, 1)^T          image motion with an extra element = 1             eq. (3.11)
    ã = (a^T, 1)^T          model parameter vector with an extra element = 1   eq. (3.11)
    K̃(x)                    matrix of basis functions with an extra element    eq. (3.11)
    ε(a)                    error when fitting the model                       eq. (3.12)
    k                       often index of a motion constraint (joint index    chapter 3
                            for spatial position, filter direction, etc.)
    Q                       symmetric matrix defining a quadratic form         eq.
(3.13)
    Q, q                    submatrix/vector of Q                              eq. (3.14)
    P                       matrix defining the cost function                  section 3.3
    [?]                     scalar multiplier of the cost                      eq. (3.18)

B.3 Local Variable Names in Chapter 4

    f_j(x)                  quadrature filter with index j                     eq. (4.2)
    n̂                       direction of the quadrature filter                 eq. (4.1)
    q_{A,j}(x), q_{B,j}(x)  outputs from the quadrature filter with index j,   eq. (4.2)
                            convolved with images A and B respectively
    φ_{A,j}(x)              phase computed from image A and filter j           eq. (4.3)
    C                       confidence in constraint c                         eq. (4.4), eq. (4.10)

B.4 Local Variable Names in Chapter 5

    M                       number of layers
    N                       number of parameters in the motion model

B.5 Local Variable Names in Chapter 6

This chapter uses the same notation as chapter 3, plus the following.

    a_n                     parameters describing the motion of layer n
    m_{n,l}                 mixture probability: the probability of observing
                            a constraint for layer n in a warped image with index l
    q_{nkl}                 owner probability: the probability that the particular
                            constraint c_{kl} belongs to layer n
    n                       often index of a motion layer
    k                       often index of a motion constraint c_k (joint index
                            for spatial position, filter direction, etc.)
    l                       often the index of an image warped according to the
                            estimated motion with index (n =) j
    P(X|Y)                  conditional probability density function of X when Y is known
    ∇_a                     gradient with respect to the variables in vector a
    d(c, v)                 distance from a motion to a given constraint       eq. (6.19)
    ε(a_1, a_2)             error measure in the alternative method            eq. (6.25)
    T4_{ijkl}, T3_{ijk},    tensors with 4, 3, 2, 1, 0 indices of fourth       eq. (6.30)
    T2_{ij}, T1_i, T0       moments of motion constraints

B.6 Local Variable Names in Chapter 7

    z_A, z_B                vectors of input data (stochastic variables)       section 7.1
    w_A, w_B                vectors with coefficients for linear combinations  eq. (7.1)
                            of the elements in z_A and z_B; the correlation
                            is maximized with respect to w_A and w_B
    z̃_A = w_A^T z_A         linear combination of stochastic variables         eq. (7.1)
    ρ                       canonical correlation                              eq. (7.2)
    C_AA, C_AB, C_BB        covariance matrices
    D_A, D_B                diagonal matrices in the eigenvalue
                            decompositions of C_AA and C_BB;
                            all elements are real and nonnegative              eq. (7.6)
    Q_A, Q_B                transformation matrices in the eigenvalue          eq. (7.7)
                            decompositions of C_AA and C_BB; complex
                            elements, unitary
    v_A, v_B                transformed vectors of w_A and w_B                 eq. (7.3)
    C̃_AB                    transformed covariance matrix C_AB                 eq. (7.8)
    v̂_A, v̂_B                normalized unit vectors of v_A and v_B             eq. (7.9)
    D†                      pseudo-inverse of matrix D                         eq. (7.8)
    σ_k, e_k, f_k           SVD of C̃_AB                                        eq. (7.10)

B.7 Local Variable Names in Chapter 8

    f(x)                    quadrature filter (one out of several in           section 8.1.1
                            different directions)
    q_A(x), q_B(x)          outputs from some quadrature filter f(x)           eq. (8.1)
                            applied to images I_A(x) and I_B(x)
    s_1, s_2, ...           shifts of the quadrature filter outputs when       section 8.1.1
                            computing covariance matrices
    w_A, w_B                coefficients for linear combinations               eq. (8.3)
    N                       patch region in the image                          section 8.1.2
    ρ                       canonical correlation                              eq. (8.4)
    C_AA, C_AB, C_BB        covariance matrices for canonical correlation,     section 8.1.2
                            as in chapter 7
    f_A(x), f_B(x)          generated filters; linear combinations of          eq. (8.9)
                            shifted original filters f(x)
    g(v)                    cross correlation of generated filters             eq. (8.10)
                            (complex value)
    G(v)                    look-up table (LUT); a matrix for each v           eq. (8.13)
