Linköping Studies in Science and Technology
Thesis No. 764

Single and Multiple Motion Field Estimation

Magnus Hemmendorff
LIU-TEK-LIC-1999:22
Department of Electrical Engineering
Linköpings universitet, SE-581 83 Linköping, Sweden
http://www.isy.liu.se
Linköping, April 1999
Single and Multiple Motion Field Estimation
© 1999 Magnus Hemmendorff
Department of Electrical Engineering
Linköpings universitet
SE-581 83 Linköping
Sweden
ISBN 91-7219-478-2
ISSN 0280-7971
Abstract

This thesis presents a framework for estimation of motion fields, both for single
and multiple layers. All the methods have in common that they generate or use
constraints on the local motion. Motion constraints are represented by vectors
whose directions describe one component of the local motion and whose magnitudes
indicate confidence.

Two novel methods for estimating these motion constraints are presented. Both
methods take two images as input and apply orientation sensitive quadrature
filters. One method is similar to a gradient method applied on the phase from the
complex filter outputs. The other method is based on novel results using canonical
correlation presented in this thesis.

Parametric models, e.g. affine or FEM, are used to estimate motion from
constraints on local motion. In order to estimate smooth fields for models with
many parameters, cost functions on deformations are introduced.

Motions of multiple transparent layers are estimated by implicit or explicit
clustering of motion constraints into groups. General issues and difficulties in
the analysis of multiple motions are described. An extension of the well-known EM
algorithm is presented together with experimental results on multiple transparent
layers with affine motions. Good accuracy in estimation allows reconstruction of
layers using a backprojection algorithm. As an alternative to the EM algorithm,
this thesis also introduces a method based on higher order tensors.

A result with potential applications in a number of different research fields
is the extension of canonical correlation to handle complex variables. Correlation
is maximized using a novel method that can handle singular covariance matrices.
Acknowledgements

Many people have been important for this thesis, and here follows an attempt to
list those who have made the largest contributions.

Professor Hans Knutsson is my academic advisor. He has a wealth of ideas and a
good intuition, and he has provided me with the embryos of many of the best
results in this thesis.

Torbjörn Kronander PhD, president of SECTRA-Imtec AB, is my industrial advisor,
and despite being overly busy he has had a great impact on the pace of this
project. Together with my academic advisor, he invented the initial ideas for
this project.

Mats Andersson PhD has almost served as an assistant academic advisor and has
taken the time to understand and discuss a major portion of this work in detail.

All the people at the Computer Vision Laboratory and its manager, Professor
Gösta Granlund, have provided a friendly and stimulating research environment
well above average. For example, Johan Wiklund maintains a very reliable
computer network. Gunnar Farnebäck's experience and LaTeX design for licentiate
theses sped up my work. Magnus Borga provided me with unpublished details from
his research on canonical correlation.

SECTRA-Imtec AB has provided 50% financial support, and I have spent half my
time there to share the SECTRA spirit and widen my experience and knowledge.
Thanks to SECTRA, I have been able to bring my research output into commercial
applications of medical imaging. http://www.sectra.se

Among our partners at medical centers are Asgrimur Ragnarsson and Torbjörn
Andersson MD at Örebro Regional Hospital; Lars Thorelius MD and Erik Hellgren
MD at Linköping University Hospital; and Anders Persson MD and Göran Iwar MD
at Hudiksvall Hospital.

Research partners have an increasing influence on our work and future plans.
Thanks to Lars Wigström at Linköping University Hospital. Also thanks to the
Surgical Planning Laboratory at Harvard Medical School, in particular faculty
members C-F Westin PhD and Professor Ron Kikinis MD.

The Swedish National Board for Industrial and Technical Development (NUTEK) has
provided 50% financial support for me and my colleague Mats Andersson. NUTEK
has also provided partial support for Hans Knutsson and Johan Wiklund.
Contents

1 Introduction
  1.1 Motivation
    1.1.1 Cardiovascular Disease
  1.2 What is Digital Subtraction Angiography and why is Motion Compensation
      Needed?
    1.2.1 X-ray Angiography
    1.2.2 Image Subtraction
    1.2.3 Motions
    1.2.4 Pixel Shift by Hand
    1.2.5 Automatic Motion Compensation
    1.2.6 Objective of our Research
    1.2.7 Cardiac Angiography
    1.2.8 Interventional Angiography
    1.2.9 A Word about MR Angiography
  1.3 Notations
  1.4 Quadrature Filters

2 General Issues for Single Motion Fields
  2.1 Aperture Problem
    2.1.1 Failure of Separable Motion Estimation Algorithms
  2.2 Motion Constraints
  2.3 Warping Image to Estimate Large Motions with High Accuracy
    2.3.1 Conventional Iterative Refinement
    2.3.2 Compensate Constraint
    2.3.3 Iterative Refinement without Subpixel Warps

3 Parametric Motion Models
  3.1 Our Definition of Parametric Motion Models
    3.1.1 Finite Element Method (FEM)
  3.2 Model Based Motion Estimation
  3.3 Cost Functions
    3.3.1 Limit on Cost
    3.3.2 Designing Cost Functions
  3.4 Relation to Motion Estimation from Spatiotemporal Orientation Tensors
  3.5 Local-Global Affine Model
    3.5.1 Efficient Implementation of The Local-Global Affine Model

4 Estimation of Motion Constraints
  4.1 Existing Methods
    4.1.1 Intensity Conservation Gradient Method
    4.1.2 Point Matching
    4.1.3 Spatiotemporal Orientation Tensors
  4.2 Phase Based Quadrature Filter Method
    4.2.1 Motion Constraint Estimation
    4.2.2 Confidence Measure
    4.2.3 Multiple Scales and Iterative Refinement
  4.3 Experimental Results
    4.3.1 X-ray Angiography Images
    4.3.2 Synthetic Images
    4.3.3 Synthetic Images with Disturbance
  4.4 Future Development

5 General Problems in Multiple Motion Analysis
  5.1 Introduction
  5.2 Motion Constraints
  5.3 Correspondence Problems
    5.3.1 Minimal Number of Motion Constraints
    5.3.2 Problem: Correspondence Between Estimates in Different Parts of the
          Image
    5.3.3 Problem: Interframe Correspondence Between Estimates

6 Estimation of Multiple Motions
  6.1 Other Methods Considered
    6.1.1 Difficulties with Multiple Correlation Peaks
    6.1.2 Difficulties with Dominant Layers
  6.2 Estimation of Motion Constraints
  6.3 EM (modified)
    6.3.1 Review EM
    6.3.2 Derivation of EM Algorithm for Multiple Warps
    6.3.3 Evaluating Criteria for Optimum
    6.3.4 Iterative Search for Optimum
    6.3.5 The Probability Function
    6.3.6 Introducing Confidence Measure in the EM Algorithm
    6.3.7 Our Extensions to the EM Algorithm
    6.3.8 Convergence of Modified EM with Warp
  6.4 Reconstruction of Transparent Layers
    6.4.1 Improved Backprojection Algorithm
    6.4.2 Finding Correspondence between Motion Estimates from Different Frames
    6.4.3 Experimental Results
  6.5 Alternative Method for Two Mixed Motions
    6.5.1 Basic Idea
    6.5.2 Minimizing ε(a1, a2)
    6.5.3 Experimental Results

7 Canonical Correlation of Complex Variables
  7.1 Definition of Canonical Correlation of Complex Variables
  7.2 Maximizing Canonical Correlation
  7.3 Properties of the Canonical Correlation
  7.4 Maximization Using SVD
    7.4.1 Operations in Maximization
  7.5 Canonical Variates
  7.6 Equivalence with Borga's Solution

8 Motion Estimation using Canonical Correlation
  8.1 Operations Applied Locally in the Image
    8.1.1 Shifted Quadrature Filter Outputs
    8.1.2 Canonical Correlation
    8.1.3 Correlation of Filters
    8.1.4 Look Up Table (LUT)
    8.1.5 Motion Constraints from Correlation Data
  8.2 Fitting Motion Model to Data
  8.3 Choosing Patch Size
  8.4 Experimental Results
  8.5 Future Development
    8.5.1 Using Multiple Variates
    8.5.2 Other Filters than Quadrature Filters
    8.5.3 Reducing Patch Size

Appendix
  A Details for Chapter 7 on Canonical Correlation
    A.1 Failure to Compute Derivative with Respect to a Complex Variable
    A.2 Beginner's Example of Canonical Correlation
    A.3 Proof of Equation (7.9)
  B Variable Names
    B.1 Global Variable Names
    B.2 Local Variable Names in Chapter 3
    B.3 Local Variable Names in Chapter 4
    B.4 Local Variable Names in Chapter 5
    B.5 Local Variable Names in Chapter 6
    B.6 Local Variable Names in Chapter 7
    B.7 Local Variable Names in Chapter 8
Chapter 1
Introduction
1.1 Motivation
All the research presented in this thesis is dedicated to medical image processing
and diagnosis of cardiovascular disease, which is the leading killer throughout
the industrial world. For example, according to the U.S. Department of Health and
Human Services[24], more than 950,000 Americans die of cardiovascular disease each
year, accounting for more than 40% of all deaths. About 57 million Americans,
nearly one fourth of the U.S. population, live with some form of cardiovascular
disease.

This thesis presents algorithms for motion analysis that are primarily intended
for angiography, i.e. medical imaging of blood vessels. Some parts of this work
are already used in a commercial product that has been delivered for clinical use.
Other parts need further development before they can be turned into commercial
applications. So far, our methods work well for motion compensation when patients
move their extremities[16, 15]. The future goal is to handle the motions of a
beating heart.

The motion estimation algorithms presented in this thesis are by no means limited
to medical applications. Estimation of single motions is widely used, and high
accuracy is often crucial, e.g. in robotics and structure-from-motion
applications. Multiple motion analysis is also an important field. Our methods for
estimating transparent motions may enable robotics applications to handle moving
shadows and reflections in windows. Our algorithms are also able to handle motions
of occluding objects, though some modifications may improve performance.
1.1.1 Cardiovascular Disease
A number of words related to cardiovascular disease are listed here.

Thrombosis: Formation of a blood clot that blocks a vessel. Can often be
dissolved by drugs.

Embolism: A clot in one part of the body can break loose and block an artery in
another part of the body.

Stenosis: Narrowing of a vessel. The blood sometimes finds a new way through
smaller vessels.

Aneurysm: Swelling of a vessel. Often it looks like a balloon. Aneurysms that
burst in the skull cause cerebral hemorrhage.

Perfusion: Blood flow through tissue.

Ischemia: Lack of oxygen in tissue. Often due to obstruction of the arterial
blood supply.

Capillaries: Vessels in tissue that are too small to be seen individually. On
angiography images with contrast agents, they can sometimes be seen as a cloud.

Infarct: Tissue death due to lack of oxygen.

Stroke: Damage to nerve cells in the brain due to lack of oxygen.
1.2 What is Digital Subtraction Angiography and
why is Motion Compensation Needed?
Angiography is medical imaging of the vasculature (angio = blood [vessel]). In the
past, angiography was only done using conventional X-ray and contrast agents.
Today it is also widely accepted to use CT[1], and there is rapid progress in
MR[2] angiography. Over the last years, more and more people seem to believe that
MR is taking over a large portion of X-ray angiography. Despite the progress and
the future potential of MR, X-ray remains the gold standard to which MR is
compared, and most people seem to believe that X-ray will be indispensable even
in the future.

[1] Computed Tomography (CT). X-ray images are taken from different angles by a
rotating X-ray source. A computer calculates a 3D reconstruction.
[2] Magnetic Resonance (MR). A combination of stationary and rotating magnetic
fields is applied to the patient. These make the nuclei of the atoms spin in
coherence. The echoes of the rotating field can be measured. MR equipment is
expensive, but the total cost of using MR is not always higher than for X-ray.
1.2.1 X-ray Angiography
(Diagram: X-ray source, patient, digital X-ray sensor, computer; speech bubble
"Don't move!")

Figure 1.1: A number of images are taken during contrast injection. The patient
is told not to move, but that might be difficult.
Figure 1.2: Angiography sequence of a leg (excerpt).
The image sensor is usually an image intensifier tube with a CCD element at the
output screen. Electronic sensors without intensifier tubes are coming. There are
also image plates that are scanned by lasers and yield better image quality, but
they cannot be used to acquire a sequence of images.
An ordinary frame rate in DSA is between 2 and 6 images per second. The frame
rate is often higher in the beginning of a sequence and decreased when the
contrast agent reaches the smaller and slower vessels. Diagnosis on the heart
(angiocardiography) requires a much higher frame rate.

An ordinary dose of contrast agent is 30 ml. It is injected by a long catheter
directly into a vessel, upstream of the region to be examined.
Since blood cannot be distinguished from tissue in an X-ray image, a contrast
agent is injected into an artery upstream of the region of interest. The
injection is made using a catheter, i.e. a hose that is usually inserted through
arteries in the groin. Iodine-based contrast agents have significantly higher
X-ray attenuation than human tissue. This means that more of the X-rays are
absorbed and fewer X-ray photons reach the sensor. The use of a contrast agent
enables medical staff to see the vessels. By taking multiple images during
injection, it is also possible to see how the contrast agent propagates.

Unfortunately, it is often difficult to distinguish small vessels from other
structures in the image. Despite the contrast agent, the images are usually
dominated by bones, lungs and the slowly varying thickness of the patient. The
remedy for this problem is image subtraction.
1.2.2 Image Subtraction
When subtracting the pixel values of two images, one taken before injection and
the other taken after injection, only the vessels with contrast agent remain.
Image subtraction is a simple, easy-to-understand and widely accepted method. In
digital subtraction angiography (DSA), a reference image is taken before contrast
is injected or reaches the region of interest. That reference image is then
subtracted from all the images acquired after contrast injection.

Image subtraction is often a very good method. After image subtraction, nothing
remains in the image except for the contrast agent. In addition, image
subtraction is a safe method, and the risk of a wrong diagnosis due to image
subtraction is very small. Radiologists often have long experience and amazing
skill in interpreting subtraction angiographies.

The predecessor of DSA is subtraction angiography with photographic film, where
one film is positive and the other is negative.
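As a minimal sketch of the subtraction step itself (toy NumPy arrays; real DSA
systems additionally apply logarithmic and display mappings, which are omitted
here):

```python
import numpy as np

def dsa_subtract(reference, frame):
    """Plain DSA: subtract a contrast frame from the pre-contrast reference.

    Everything that has not changed between the two exposures (bones, soft
    tissue) cancels; only the vessels filled with contrast agent remain.
    """
    reference = np.asarray(reference, dtype=np.float64)
    frame = np.asarray(frame, dtype=np.float64)
    return reference - frame

# Toy example: identical "anatomy" in both frames; the second frame has
# extra absorption (darker pixels) where the contrast agent is.
reference = np.full((4, 4), 100.0)
frame = reference.copy()
frame[1:3, 1:3] -= 40.0      # vessel with contrast agent absorbs X-rays
vessels = dsa_subtract(reference, frame)
```

The subtraction image is zero wherever nothing moved or changed, which is
exactly why patient motion between the two exposures is so damaging.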
1.2.3 Motions
Image subtraction requires that nothing has moved between the times when the
images were acquired. No patient motion is allowed during image acquisition. Not
surprisingly,
this makes DSA almost impossible on the heart, intestines and other organs that
keep moving all the time. More surprisingly, motions cause problems even when the
arms and legs are examined. When contrast is injected, the patient often feels a
burning sensation and moves a little. Even if patients are fixated, they still
move a little.
1.2.4 Pixel Shift by Hand
In conventional implementations of DSA, it is possible to compensate for motion
by shifting the entire image a certain number (or fraction) of pixels. This
process, called pixel shift, must be done manually by medical staff. To save
time, images with motion are often thrown away rather than shifted.

Besides the time required, the quality is often poor. Pixel shift can only
compensate motions that are uniform over the image, but the motions often vary
over the image. This means that pixel shift cannot achieve good quality over the
entire image simultaneously.
1.2.5 Automatic Motion Compensation
We have developed automatic motion compensation[16, 15] as a substitute for
manual pixel shift. The automatic motion compensation even works for images with
rotations and deformations in the image plane. Our motion compensation is very
accurate for ordinary motions, including rotations and deformations, and it does
not matter if the motions are irregular over time. The algorithm is implemented
on a dual processor Pentium II workstation, where 1 second of processing time
yields enough accuracy for most images of size 512x512. A whole sequence of
images can be processed without user interaction.

At the time of writing, we have attended an oral presentation of another project
that addresses the same problem but with different algorithms. Their article[32]
is not yet available, though.
1.2.6 Objective of our Research
Our research in the past is justified by motion compensation for angiography, and
the future goal is better angiography of a beating heart. Tracking the motions of
the heart in 2-dimensional X-ray images is a very difficult task. Probably, we
will see several generations of motion estimation algorithms before performance
is good enough. For that reason, the focus of this thesis is to solve simpler
problems of multiple motions. We do not claim that the algorithms for multiple
motions work on real X-ray cardio images, but we hope that this research has led
us closer to the solution of our specific problem. We also hope it is a step
towards better analysis of multiple motions in general.
Some More Facts

Iodine-based contrast agents are no longer ionic. Ring structure molecules are
popular. Despite the development of better contrast agents, some patients still
have allergic reactions and chronic kidney damage. A large portion of the
patients have diabetes and thus extra sensitive kidneys.

CO2 is an alternative to iodine-based contrast agents. CO2, which is almost
transparent to X-rays, replaces the blood in the vessels and acts like a negative
contrast. CO2 is dissolved in the blood and expired by the lungs in a one-pass
fashion. [22]
1.2.7 Cardiac Angiography
Angiography on a beating heart is different from angiography on peripheral parts
of the body. Due to the fast motions, a higher frame rate of 12-24 images per
second is used. Today, there is no technique for motion compensation, and thus
subtraction cannot be used. Often, angiocardiography is done with interventions,
and many image sequences are acquired. This means large doses of both X-rays and
contrast agent. Typical images are shown in figure 1.3.
Figure 1.3: Cardio sequence. Frames 25, 50, 75 and 125.
1.2.8 Interventional Angiography
A set of techniques, commonly called interventional angiography, is a cheap and
simple alternative to surgery in the treatment of cardiovascular disease.
Thromboses and stenoses can be punctured by a wire inside the catheter. Narrowed
vessels can be widened by balloons that are temporarily inserted with the
catheter and inflated to high pressure. After treatment with balloons, it might
be necessary to insert a tube in the vessel for it to stay open. The tubes,
called stents, are often made of a metal grid that expands to the correct size
once it has been inserted at the right position.

There are also stents that make vessels narrower, as a remedy for aneurysms or
other kinds of pathological enlargement of vessels. These stents are like a hose
inside the vessels. For example, sections of the aorta sometimes expand and get
much too wide. A stent is fastened upstream of the aneurysm and leads the blood
past the aneurysm. The blood outside the stent coagulates and the aneurysm goes
away. Other aneurysms can be treated by filling them with wire that makes the
blood coagulate. Aneurysm in the brain is the leading cause of cerebral
hemorrhage.
1.2.9 A Word about MR Angiography
Magnetic Resonance Angiography (MRA) has evolved rapidly over the last years.
Several studies[25] indicate that MR angiography is already as good as X-ray. In
addition, MR avoids problems with X-ray, such as harmful radiation. MRA can be
performed without contrast agents, using velocity sensitive measurements such as
phase contrast (PC) or time of flight (TOF). In practice, contrast agents may,
however, be necessary in most MRA studies, but the risks are less than in X-ray.
Contrast agents for MRA are less harmful and are injected intravenously, usually
in the arm. This is much simpler than in X-ray angiography, where arterial
catheterization is time consuming and requires precautions to prevent bleeding,
thromboses, vessel trauma and infections.

Other advantages of MRA are abilities like 3D image acquisition. Among the
disadvantages are slow image acquisition and inferior spatial resolution.
Metallic implants cause image artifacts, e.g. the signal void around metallic
stents looks like stenosis. Interventional angiography requires that all tools
are non-metallic. For safety, patients with pacemakers should not be exposed to
magnetic resonance.

Today, most people seem to believe that MRA will substitute for X-ray in many
situations, but to what extent is a controversial issue. Most predictions we have
heard are partisan and range from "not more than today" to "almost always".
1.3 Notations
Appendix B contains a list of the variable names in this thesis. This section is
just an introduction to the notation and style of this thesis.

Vectors, matrices and tensors are written in boldface. Matrices are upper case
and vectors are lower case. For example, boldface A is a matrix and boldface a
is a vector. Vectors are always column vectors. Normal font A and a are scalars.
∇                  gradient
I                  identity matrix
A^T                superscript T denotes matrix transpose
A^*                a star denotes complex conjugate and matrix transpose;
                   for scalars, a star is simply a complex conjugate
v = (v_x, v_y)^T   boldface v denotes image motion
x                  boldface x always denotes a coordinate in the image
||u||              norm of the vector u
û = u/||u||        a hat denotes a normalized vector
1.4 Quadrature Filters
Chapters 4 and 8 use quadrature filters that are related to Gabor filter pairs. A
filter is a quadrature filter[13] if its Fourier transform, F(u), has zero
amplitude on one side of a hyperplane through the origin, i.e. there is a vector
n̂ such that

    F(u) = 0  for all u with  n̂^T u ≤ 0                              (1.1)

In this thesis, n̂ is called the direction of the quadrature filter. We only use
quadrature filters that are real in the Fourier domain. Note that quadrature
filters must be complex in the spatial domain (since F(u) ≠ F(−u)).

Quadrature filters can be optimized using a kernel generator, which produces
efficient separable or sequential kernels [23, 2].
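To make definition (1.1) concrete, here is a one-dimensional numerical sketch.
The lognormal radial profile and all parameter values are illustrative choices,
not the exact kernels optimized in [23, 2]:

```python
import numpy as np

def quadrature_filter_1d(n=64, center=np.pi / 4, rel_bw=2.0):
    """One-sided (quadrature) filter designed in the Fourier domain.

    The frequency response is a lognormal bump over positive frequencies and
    exactly zero for all non-positive frequencies, so the spatial kernel is
    necessarily complex.
    """
    u = 2 * np.pi * np.fft.fftfreq(n)   # angular frequency axis in [-pi, pi)
    F = np.zeros(n)
    pos = u > 0
    # lognormal radial profile, a common choice for quadrature filters
    F[pos] = np.exp(-(4 / np.log(2))
                    * np.log(u[pos] / center) ** 2 / rel_bw ** 2)
    f = np.fft.ifft(F)                  # complex-valued spatial kernel
    return u, F, f

u, F, f = quadrature_filter_1d()
```

The filter is real in the Fourier domain, vanishes for n̂^T u ≤ 0 (here simply
u ≤ 0), and its spatial kernel has a significant imaginary part, illustrating
the remark above that F(u) ≠ F(−u) forces a complex kernel.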
Figure 1.4: Quadrature filters in one and two dimensions. Both filters have
their direction along the positive x-axis.
Chapter 2
General Issues for Single Motion Fields
This chapter is a discussion of issues in motion estimation in general. There are
several existing methods, e.g. finding correlation peaks and point matching. In
this thesis, the focus is on methods that use two images, first estimate
constraints on the local motion, and then fit a motion to these.
2.1 Aperture Problem
No matter how good a tracking algorithm is used, motion cannot be unambiguously
estimated in an image that only contains structure in one orientation. This is
known as the aperture problem. For example, we can think of a moving line viewed
through a small window. Since we cannot see the line endings, it is impossible to
estimate the motion component along the line. Only the orthogonal component can
be estimated.

The aperture problem tells us to use big windows when estimating motions. Small
windows rarely have structure in more than one orientation. How large the windows
must be depends on how far we have to go in the image before the orientation
changes. In some images, e.g. figure 2.1, it might be necessary to use the entire
image to estimate motion at a single coordinate.

A big window may solve the aperture problem, but it fails to estimate motions
locally when the motions are not uniform over the image. Chapter 3 describes how
to use global motion models to overcome the aperture problem while still being
able to estimate motions that are not pure translations.
2.1.1 Failure of Separable Motion Estimation Algorithms
It may seem plausible that an algorithm that estimates disparity along a scan
line can be extended to track motions in the plane. A first, naive idea was to
apply
Figure 2.1: This image contains very little structure for estimating motions in
the vertical direction. For sufficient accuracy, the entire image is needed.
(X-ray image of a leg.)
the stereo algorithm in both the horizontal and the vertical direction. This
would give one estimate of the motion in the x-direction and another estimate in
the y-direction.

Although this worked pretty well in some experiments, we abandoned this approach
since there is a fundamental difference between stereo algorithms and motion
algorithms. A stereo algorithm assumes it can find a match along the direction of
search (usually the scanline). This assumption is valid for stereo images, but
not for images with motions. Searching in one direction rarely yields a correct
match. Thus, we might not find a match, or even worse, find a false match. As
illustrated in figure 2.2, this method is even unaware of the aperture problem.
Figure 2.2: This figure shows what happens if we track a moving line by
independently estimating the x- and y-components of the motion. The total
estimate, v, is seriously bad. In addition, this algorithm is unaware of the
aperture problem and gives just one answer.
2.2 Motion Constraints
Throughout this thesis, we will use constraints on the local motion, like

    c_x v_x + c_y v_y + c_t = 0                                        (2.1)

where (v_x, v_y) is the local image motion and c_x, c_y and c_t are coefficients
estimated locally in the image. It is popular to use c_x = dI/dx, c_y = dI/dy and
c_t = dI/dt, where I(x, t) denotes the intensity of the image sequence. This
method is commonly called the gradient method or optical flow[17]. A novel method
of estimating motion constraints is presented in chapters 4 and 8.

If we use constraints from a single point, the motion (v_x, v_y) cannot be
unambiguously determined due to the aperture problem, but by combining
constraints over a larger region, the aperture problem is overcome.
For the rest of the thesis, c will denote the vector

    c = (c_x, c_y, c_t)^T                                              (2.2)

with the property that

    c^T (v_x, v_y, 1)^T = 0.                                           (2.3)

Note that scaling of the constraint vector, c, does not change the constraint on
the motion, eq. (2.1). We use the magnitude of the constraint vector to denote a
confidence, i.e. a measure of how much we trust the estimate. The terminology
will be sloppy, and the vector c itself is often called a motion constraint.
Example 2.2.1 Intersecting Constraints: Assume the motion is a pure translation
and that we have been able to estimate motion constraints, eq. (2.1), without
errors. Then there is a unique motion, (v_x, v_y), that satisfies all the
constraints. As in figure 2.3, the motion can be solved graphically by plotting
all motion constraints in (v_x, v_y)-space. The intersection is the correct
motion.

Figure 2.3: A number of constraints. For pure translational motions, all
constraints intersect at a common point in (v_x, v_y)-space.

This representation is trivial when there is only one motion. It will be used
more for better understanding of multiple motions.
For motions other than pure translations, we may use parametric motion models as
described in chapter 3. We then need to draw constraints in as many dimensions as
there are parameters. The constraints are represented by hyperplanes that all
intersect in a point corresponding to the motion.
2.3 Warping Image to Estimate Large Motions with High Accuracy
To estimate large motions with high accuracy, it is common to use a coarse-to-fine approach. Motion estimates from a coarse scale are used to warp the image, and the estimates can be refined at a finer scale. For best accuracy, more than one iteration is done at each scale. This scheme is called iterative refinement [29]. One potential problem is that a good match at a coarse scale is not necessarily a good match at a finer scale.
Another problem is the subpixel warp, which means resampling of the image. In imaging, unlike audio, resampling usually means degradation, since images are not perfectly bandlimited before sampling and cannot be reconstructed without error. There are several methods of interpolation, but we simply use bilinear interpolation for maximum locality and obtain images that look good to the human eye.
Even if images warped with bilinear interpolation look quite good to the human eye, they may not look good to motion estimation algorithms. In section 2.3.3 we will present a method that avoids subpixel warps. In chapter 8, another method is presented where nothing is warped at all. Both these methods need to assume that rotations and deformations are so small that the image motion locally can be described as a translation.
2.3.1 Conventional Iterative Refinement
A conventional scheme of iteratively estimating large motions with good accuracy is presented in figure 2.4. After each iteration, the original images are warped, and in the next iteration the error from the previous iteration is estimated. The error is supposed to converge to zero.
Figure 2.4: Iterative refinement for motion estimation from two image frames, IA(x) and IB(x). Estimated motions are used to warp the image so that only a small motion remains to be estimated in the next iteration. (Block diagram: IB(x) is warped, motion constraints c~ are computed from IA(x) and the warped image and fed to the motion model; the resulting motion v is accumulated and fed back to the warp.)
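The loop in figure 2.4 can be illustrated with a toy 1-D sketch under strong simplifying assumptions (none of this is the thesis implementation): the "image" is a synthetic smooth Gaussian signal, the motion is a pure shift, constraints come from the plain gradient method, and the warp uses linear interpolation.

```python
# Toy 1-D illustration of iterative refinement (figure 2.4), assuming a
# synthetic Gaussian signal, a pure shift d_true, and gradient-method
# constraints. Each iteration warps frame B back by the accumulated estimate
# and estimates the remaining residual motion.
import math

def sample(f, x):  # linear interpolation with clamping (the "warp")
    i = max(0, min(len(f) - 2, int(math.floor(x))))
    t = x - i
    return (1 - t) * f[i] + t * f[i + 1]

N, d_true = 64, 2.3
IA = [math.exp(-((x - 30.0) ** 2) / 50.0) for x in range(N)]           # frame A
IB = [math.exp(-((x - 30.0 - d_true) ** 2) / 50.0) for x in range(N)]  # frame B

d = 0.0
for _ in range(10):
    IBw = [sample(IB, x + d) for x in range(N)]  # warp B back by the estimate
    # Gradient-method constraints: cx = average spatial derivative, ct = IBw - IA
    num = den = 0.0
    for x in range(1, N - 1):
        cx = 0.25 * ((IA[x + 1] - IA[x - 1]) + (IBw[x + 1] - IBw[x - 1]))
        ct = IBw[x] - IA[x]
        num += cx * ct
        den += cx * cx
    d += -num / den  # accumulate the residual motion estimate

print(abs(d - d_true) < 0.1)
```

Each iteration estimates only the residual shift, so the linearization error of the gradient method shrinks as the accumulated estimate approaches the true shift.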
2.3.2 Compensate Constraint
It turns out that the approach of warping the image and estimating errors, as in figure 2.4, causes difficulties when estimating multiple motions and the image is warped for each of the motion layers. The major problem is the incompatibility between constraints computed from different warps, i.e. it is complicated to use constraints computed from one warp together with constraints from another warp.
We will show how to compensate for the warp directly in the constraint. Let (wx; wy) denote the local warp and let (c~x; c~y; c~t) denote a motion constraint estimated from a warped image. That constraint is an estimate of the motion relative to the warp,
c~x (vx - wx) + c~y (vy - wy) + c~t = 0.   (2.4)
Thus, the correct motion constraint vector is

c = (c~x; c~y; c~t - c~x wx - c~y wy)T   (2.5)
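Eq. (2.5) amounts to a one-line adjustment of the temporal component. A minimal sketch, with synthetic values chosen for the check:

```python
# Minimal sketch of eq. (2.5): compensating a constraint estimated from a
# warped image so that it constrains the true (unwarped) motion.

def compensate(c_tilde, w):
    """c_tilde = (cx, cy, ct) from the warped image; w = (wx, wy) local warp."""
    cx, cy, ct = c_tilde
    wx, wy = w
    return (cx, cy, ct - cx * wx - cy * wy)

# Synthetic check: true motion v = (3, 1), image warped by w = (2, 1), so the
# constraint estimated after warping sees only the residual motion (1, 0):
c_tilde = (1.0, 2.0, -1.0)        # satisfies cx*(vx-wx) + cy*(vy-wy) + ct = 0
cx, cy, ct = compensate(c_tilde, (2.0, 1.0))
print(cx * 3.0 + cy * 1.0 + ct)   # -> 0.0, the compensated constraint holds for v
```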
2.3.3 Iterative Refinement without Subpixel Warps
Thanks to eq. (2.5), we can compute the correct constraint even if the warp is not exact. This enables warping without subpixel accuracy, where the local shifts can be rounded to integral pixels. An overview of the scheme is presented in figure 2.5. Note that unless the motion is a pure translation, it is no good to apply any spatial operations after warping with integral local shifts. In particular, we have to compute the spatial gradient before warping. This limits the method to images where deformations and rotations are so small that they can locally be regarded as translations.
This method is fast since the spatial filters1 need not be applied in each iteration. A limitation is that it cannot be used in conjunction with all possible methods of estimating motion constraints, c. The motion constraint must be estimated in a separable fashion, where all spatial operations are performed before operations in the temporal direction. The phase-based method in chapter 4 and the conventional gradient method [17] satisfy this requirement. In the gradient method, there are no temporal operations before computing spatial derivatives and there are no spatial operations applied on the temporal derivatives.
Figure 2.5: Our scheme of warping. Instead of warping the image, a number of filter outputs are warped. Since the motion constraints, c, are compensated for the warp directly, it is not necessary to warp with subpixel accuracy (c.f. figure 2.4). (Block diagram: spatial operations are applied to IA and IB; the output for IB is warped with the local shifts w rounded to integers; temporal operations then produce c~, which is compensated for the warp to give the constraint c fed to the motion model.)
1 We may want to use filters that are computationally expensive.
Chapter 3
Parametric Motion Models
In this chapter, we assume a large number of constraints on the local motion are given (c.f. section 2.2), i.e.

ckT v = 0 where v = (vx; vy; 1)T and k = 1, 2, 3, ...   (3.1)
Methods for computing these constraints are described in chapters 4 and 8. The focus of this chapter is how to compute the motion from these constraints, even if the motion is not a pure translation, i.e. the motion depends on the spatial position x,

v = v(x).   (3.2)
Since the constraint vectors, c, are noisy, it does not make sense to fit a motion perfectly to these constraints. If we tried to fit a motion field to every single constraint, the resulting estimate would be very noisy. Therefore it is necessary to fit a smooth field to the constraints. How smooth, and in which way, is application dependent. E.g. in an orthogonal projection of a planar surface, the projection image can only be subject to translations, rotations and elongations. If we do motion estimation on a planar surface, all estimated nonrigid deformations should be discarded. There are several different methods of fitting motions to a number of constraints. We think that in many articles [3], these methods are often associated and confused with particular methods for estimating the constraints on the local motion.
We have found it simple to use methods where the motion is represented by a number of parameters, e.g. affine motion, which is described by six parameters. In this chapter, we present a general theory for parametric models where the local motion is linear with respect to the parameter vector.
3.1 Our Definition of Parametric Motion Models
A motion model describes how images move relative to one another. The motion
is denoted v and describes how many pixels an object moves between two frames.
The motion can either be velocity or displacement. In case of two image frames
IA (x) and IB (x) and no intensity variations, the image intensities are related as
IA(x) = IB(x + v) for all x   (3.3)
Unless we have pure translation, v is not constant over the image. Pure translation is simple, but not adequate in most applications, where tracked features are being distorted or rotated. A popular motion model is the affine transformation, which can handle scaling, rotation and elongations, i.e.

v = [ a1 a2 ; a4 a5 ] x + [ a3 ; a6 ]   (3.4)

where the rows of the 2x2 matrix are separated by semicolons.
Motion models can be designed in many ways. Just for fun, let's consider one more, the quadratic motion model,

v = [ a7 a8 a9 ; a10 a11 a12 ] (x2; xy; y2)T + [ a1 a2 ; a4 a5 ] (x; y)T + [ a3 ; a6 ]   (3.5)
We can spot a pattern. All the motion models considered so far can be written as a linear combination of basis functions. Given a set of basis functions, the motion is represented by a set of parameters, ai. This seems to be a useful and simple way of describing almost any motion model.

v = Σ_{i=1}^{N} ai ki(x)   (3.6)
To simplify notations, we arrange the coefficients in a parameter vector

a = (a1; a2; ...; aN)T   (3.7)

and the basis functions in a matrix

K(x) = ( k1(x) k2(x) ... kN(x) )   (3.8)

and rewrite eq. (3.6) as a matrix multiplication instead of a sum,

v = K(x) a.   (3.9)
For the rest of the thesis, boldface a denotes a vector of motion model parameters and K(x) is a matrix, whose columns are basis functions.
Example 3.1.1 For the pure translation motion model, K(x) = I is the identity matrix, and for the affine motion model

K(x) = ( x y 1 0 0 0 ; 0 0 0 x y 1 )   (3.10)

with the two rows separated by a semicolon.
It is of course possible to swap the columns in K(x) or form new sets of basis
functions.
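A small sketch of evaluating v = K(x) a (eqs. 3.9-3.10) for the affine model, with a synthetic parameter vector chosen for illustration:

```python
# Sketch of eqs. (3.9)-(3.10): evaluating the affine motion model v = K(x) a
# at a point, with K(x) = ( x y 1 0 0 0 ; 0 0 0 x y 1 ).

def K_affine(x, y):
    return [[x, y, 1.0, 0.0, 0.0, 0.0],
            [0.0, 0.0, 0.0, x, y, 1.0]]

def motion(x, y, a):
    K = K_affine(x, y)
    return [sum(Kr[i] * a[i] for i in range(6)) for Kr in K]

# Synthetic example: a pure 10% scaling about the origin, v = (0.1 x, 0.1 y)
a = [0.1, 0.0, 0.0, 0.0, 0.1, 0.0]
print(motion(5.0, -2.0, a))  # -> [0.5, -0.2]
```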
3.1.1 Finite Element Method (FEM)
For computational efficiency in motion estimation, K(x) should be locally sparse. In other words, the basis functions should have small support, i.e. Kij(x) = 0 except in a small region of the spatial domain. We might want to express the motion as a linear combination of bumps, or interpolation kernels. We have used bilinear interpolation kernels, which have small support and are continuous. Using interpolation kernels with small support is known as the finite element method. In particular, when we have bilinear interpolation kernels, solid mechanics people say we have linear elements, or a first order method. To get a second order method, we must have interpolation kernels with continuous derivatives.
Figure 3.1: One of our favorite motion models used to be a deformable linear mesh.
The more complicated motions are, the more nodes are needed in the mesh. Each
node corresponds to bilinear basis functions for horizontal and vertical motions.
The motion model presented here can be extended to describe motion over time, K(x; t). In case the motion is regular over time, a spatiotemporal model can improve accuracy. An interesting model for cyclical heart motion [35] uses truncated Fourier series in the temporal direction and a finite element mesh in the spatial directions.
3.2 Model Based Motion Estimation
To simplify notations, the motion vector is extended with an extra entry that is always unity. For that reason, the K(x) matrix and the parameter vector a are also extended,

v = (v; 1), a = (a; 1) and K(x) = ( K(x) 0 ; 0T 1 )   (3.11)

where we reuse the symbols v, a and K(x) for the extended quantities.
This section describes how to estimate motion model parameters from motion constraints. In other words, we have a set of motion constraint vectors, ck (c.f. section 2.2), and want to compute the best possible parameter vector, a, for the chosen motion model. For simplicity, we fit parameters in the least squares sense to constraints like cT v = 0, where v(x) = K(x) a. Remember that the magnitude of the constraint vector, c, is the confidence measure. Let xk denote the spatial position of constraint ck and define the following error measure, which should be minimized with respect to the motion model parameters,
ε(a) = Σ_k (ckT v(xk))2
     = Σ_k (ckT K(xk) a)2
     = Σ_k aT K(xk)T ck ckT K(xk) a   (3.12)
     = aT Q a

where

Q = Σ_k K(xk)T ck ckT K(xk)   (3.13)
Since the last entry in a is always one, the Q matrix is split into a submatrix, a vector and a scalar,

Q = ( Q q ; qT q0 ).   (3.14)

The error can be expressed as

ε(a) = aT Q a + 2 qT a + q0   (3.15)

and the motion model parameters are computed as

a = -Q^{-1} q.   (3.16)
3.3 Cost Functions
Even if motions are complicated in a global view, they may be simple locally. Motion models with many parameters allow too irregular motions. This makes the motion estimates susceptible to noise and the aperture problem. The problem is even worse when using the EM algorithm in chapter 6, which gets lost in the first few iterations. It is also a problem when the basis functions in K(x) have small support and some regions suffer from the aperture problem. Our remedy is to discourage deformations by adding a cost function to the error measure ε(a) in eq. (3.12). For simplicity, the cost function is a quadratic form λ aT P a, where P is a symmetric matrix with nonnegative eigenvalues. Instead of minimizing that error measure, we minimize the sum of the error measure and the cost on deformations,

ε~(a) = ε(a) + λ aT P a   (3.17)
      = aT (Q + λP) a + 2 qT a + q0
where λ ≥ 0 is a scalar parameter that controls the stiffness, and can be included in P, if you like. The larger λ, the more regularization. The reason for using a quadratic error measure is computational efficiency. Compared to not using a cost function, we only have to introduce a matrix addition in eq. (3.16),

a = -(Q + λP)^{-1} q.   (3.18)

There is no universal way to choose λ, but in one of our implementations it is proportional to the Frobenius norm of Q.
3.3.1 Limit on Cost
When using the EM algorithm in chapter 6, we avoid choosing λ explicitly. Instead we set a limit on the cost, i.e. we choose an upper limit on aT P a and then solve for the smallest λ ≥ 0 that gives a motion estimate below the limit. First we try λ = 0, and if that doesn't pass the limit, Newton-Raphson search is applied on a function that is zero at the cost limit,

f(λ) = aT(λ) P a(λ) - ε0   (3.19)

where ε0 is the upper limit. Newton-Raphson solves f(λ) = 0 in a number of iterations,

λ_{n+1} = λ_n - f(λ_n) / f'(λ_n)   (3.20)

The derivative in the denominator is computed as (note that a = a(λ) is a function of λ)

f'(λ) = 2 aT P da/dλ   (3.21)
      = 2 aT P (Q + λP)^{-1} (-Pa)
      = -2 aT P (Q + λP)^{-1} Pa

where da/dλ = -(Q + λP)^{-1} Pa was solved by differentiating (Q + λP) a = -q, which gives that (Q + λP) da/dλ + Pa = 0. The second derivative can be computed by differentiating again, i.e. (Q + λP) d2a/dλ2 + 2P da/dλ = 0. This gives d2a/dλ2 = -2(Q + λP)^{-1} P da/dλ = 2(Q + λP)^{-1} P (Q + λP)^{-1} Pa.
Theorem 3.3.1 f'(λ) ≤ 0 and f''(λ) ≥ 0 for all λ ≥ 0.

Proof: Note that Q is symmetric, as it is defined. Without loss of generality, we can also assume P is symmetric. (Every cost function written with a nonsymmetric P can also be written with a symmetric P.) Then it is obvious that f'(λ) ≤ 0, since (Q + λP)^{-1} is positive definite. Next,

f''(λ) = 2 (da/dλ)T P (da/dλ) + 2 aT P d2a/dλ2   (3.22)
       = 6 aT P (Q + λP)^{-1} P (Q + λP)^{-1} Pa
       = 6 ((Q + λP)^{-1} Pa)T P ((Q + λP)^{-1} Pa)

Since f''(λ) is a nonnegative multiple of a quadratic form in P, and P has non-negative eigenvalues, f''(λ) cannot be negative. Thus, the cost aT(λn) P a(λn) will decrease towards the limit, but never reach below it. To get below, we modified f(λ) by replacing ε0 with some value just below the limit.
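The Newton-Raphson search above can be sketched numerically. This is a hypothetical example, not the thesis code: the translation model with P = I is assumed, and Q, q and ε0 are made-up values; the 2x2 systems are solved directly.

```python
# Hypothetical sketch of section 3.3.1 for the translation model with P = I:
# find the smallest lambda >= 0 such that a(lambda) = -(Q + lambda*I)^{-1} q
# satisfies a^T P a = eps0, by Newton-Raphson on
# f(lambda) = a(lambda)^T P a(lambda) - eps0 (eqs. 3.19-3.21).

def solve2(A, b):  # solve a 2x2 system A x = b
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(A[1][1] * b[0] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - A[1][0] * b[0]) / det]

Q = [[2.0, 0.0], [0.0, 0.5]]  # assumed values, as if built via eq. (3.13)
q = [-2.0, -1.0]
eps0 = 1.0                    # upper limit on the cost a^T a

def a_of(lam):
    A = [[Q[0][0] + lam, Q[0][1]], [Q[1][0], Q[1][1] + lam]]
    return A, solve2(A, [-q[0], -q[1]])  # a = -(Q + lam I)^{-1} q

lam = 0.0
for _ in range(50):
    A, a = a_of(lam)
    f = a[0] ** 2 + a[1] ** 2 - eps0
    if abs(f) < 1e-12:
        break
    da = solve2(A, [-a[0], -a[1]])       # da/dlam = -(Q + lam I)^{-1} P a
    fprime = 2.0 * (a[0] * da[0] + a[1] * da[1])  # eq. (3.21) with P = I
    lam -= f / fprime

_, a = a_of(lam)
print(round(a[0] ** 2 + a[1] ** 2, 6))  # -> 1.0, the cost meets the limit
```

Since f is decreasing and convex (theorem 3.3.1), starting from λ = 0 the iterates increase monotonically towards the root, so the cost approaches the limit from above without overshooting.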
3.3.2 Designing Cost Functions
There is no universal way of designing cost functions. We have tried to design cost functions without much theory. It is easy when there are only a few parameters in our motion model, but gets harder when there are more degrees of freedom. By mistake, we may forget adding a cost on deformations that should be forbidden.
We have developed a method of designing cost functions for any motion model with many parameters. We have used it to design cost functions that make a deformable mesh locally behave like an affine transformation. The fundamental idea is to compare the estimated motions in a region with the closest possible affine transformation.
Example 3.3.1 To illustrate the approach of designing a cost function, let's look at an example that solves a different and much simpler problem. Assume we would measure the roughness of a signal s. Let slp denote a low pass filtered version thereof. We may define roughness = ||s - slp||. The cost is simply defined as the difference between the signal and the closest signal that is free from high frequency components. The same idea is used when designing a cost on deformations. The cost on the motion is the difference to the closest motion without non-affine deformations (locally).
To define the cost, we locally fit an affine model to the estimated motions. The cost is the square difference between the motion estimate and the affine model that we fit to the same estimates. Let K~(x) be the matrix including the basis functions of the affine model and let a~ be the affine parameters. These a~-parameters are locally computed from the estimated motion, v(x) = K(x) a.
γ(a) = Σ_{all regions} min_{a~} ∫∫_{region} || K~(x) a~ - K(x) a ||2 dx dy   (3.23)
The regions must overlap, since this method does not put a cost on deformations at the borders. Evaluating γ(a) into an explicit formula will give a quadratic cost function of a for each local region. These cost functions are summed into a global cost function that is also quadratic and can be applied as described in section 3.3. Note that the use of this method is not restricted to regularization for the finite element model. Also note that it does not have to be affine models that we locally impose. Instead of using affine motion models as a reference, we may use translational or quadratic motion models. It is even possible to use a mixture thereof, just by adding cost functions designed in different ways. For example, we can put a low cost on translations and a high cost on affine deformations.
3.4 Relation to Motion Estimation from Spatiotemporal Orientation Tensors
Knutsson and Granlund [13] have used 3D orientation tensors to estimate motion. Their tensors are 3x3 matrices that are estimated locally in the image. To overcome the aperture problem, it may be necessary to low pass filter the tensors. Pure translation can be estimated by summing tensors over the entire image. They suggest motions should be estimated by minimizing

ε = (vT T v) / (vT v)   (3.24)
This is similar to our least squares method, described in section 3.2. In fact, our least squares fit is a minimization of

ε = vT T v where T = Σ c cT   (3.25)
Which of eq. (3.25) and eq. (3.24) gives the best motion estimate depends on how the tensors are estimated. Knutsson and Granlund use spatiotemporal filter banks to estimate the tensors. The motion is estimated without warping the image. For their tensors, one can assume that the angular error of (vx; vy; 1)T is independent of angle. For our tensors, which are estimated from warped images, we assume that the absolute error of (vx; vy) is independent of the size of the motion.
To estimate affine motions, Farneback [9] expanded the tensors to size 7x7 before summing them together. His approach can be generalized to any of our motion models by replacing ck ckT with Tk in eq. (3.13).
3.5 Local-Global Affine Model
In some applications the motion field consists of several complicated deformations. Initially, we used the finite element motion model, which models image motions like a mesh that deforms. This method becomes computationally expensive when the image is divided into many cells. Recall that it takes O(N3) operations to solve a linear equation system, where N denotes the number of unknowns. There are two unknowns for every node in the mesh, and the number of nodes is proportional to the square of the resolution. Thus, in terms of resolution, we have an O(N6) algorithm. Another difficulty with the finite element method is to design the cost function, since it must depend on the image. The cost function should depend on the distribution of the magnitude of the image or motion constraints. It is also an issue how it should behave on the borders of the image. We have images where the valid region can be a circular or rectangular region of the total image. Since the valid region is not known a priori, a new cost function needs to be computed for every image.
Instead of using a global parametric model with many parameters, we use local affine models with global smoothing. The remedy to the aperture problem is low pass filtering, not cost functions. Of course, we cannot estimate motions first and then low pass filter the motion vectors. Instead we low pass filter the coefficients of Q. Although the model is local, we use a global coordinate system for the affine parameters to enable low pass filtering of the coefficients. The reader should convince him/herself that averaging equation system coefficients over the entire image is equivalent to a global affine model. Averaging over a region is equivalent to using an affine model in that region. Recall that averaging is equivalent to convolution with a kernel with a constant value. You might stop and ask what happens if we use some other non-negative kernel, e.g. a Gaussian. This is, in fact, equivalent to weighting the constraints differently in eq. (3.26). Usually, we are more interested in constraints in the near neighborhood than far away. In order to remedy the aperture problem, it is still necessary to let motion estimates in one corner of the image influence motion estimates in the opposite corner. In terms of formulas, we modify eq. (3.13) from global to local values of Q,
Q(x) = Σ_k W(x - xk)2 K(xk)T ck ckT K(xk)   (3.26)

where W(x) is a windowing function, e.g.

W(x) = 1 / (||x||^α + β)   (3.27)

where β determines the size of the local region. A large β will make the motion estimate more global. We recommend that the other parameter satisfies α > 2 (otherwise there is theoretically no locality, since locality requires ∫∫ W(x; y) dx dy < ∞).
Estimating motion locally rather than globally is of course more sensitive to noise and the lack of local structure. An interesting property of this method is that if the window is strictly positive everywhere, i.e. W(x) > 0 for all x, then the local Q(x) has the same rank as the global Q. All structure in the image contributes to all local matrices. This means that this method produces a motion estimate at every point in the image.
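The local-global idea, smooth the coefficients of the normal equations rather than the motion estimates, can be illustrated with a simplified 1-D sketch. Everything here is synthetic and assumed: a 1-D translation model stands in for the affine model, and a box window stands in for W(x)2 in eq. (3.26).

```python
# Simplified 1-D sketch of the local-global idea: smooth the coefficients of
# Q (c.f. eq. 3.26), then solve locally at each point. A box window stands in
# for W(x)^2; the constraints are synthetic.

def smooth(f, r):  # moving average with half-width r (clamped at borders)
    n = len(f)
    return [sum(f[max(0, i - r):min(n, i + r + 1)]) /
            (min(n, i + r + 1) - max(0, i - r)) for i in range(n)]

# 1-D gradient constraints cx*v + ct = 0 from a known motion field v(x) = 1.5:
N, v_true = 32, 1.5
cx = [1.0 + 0.5 * (i % 3) for i in range(N)]  # varying confidence/structure
ct = [-c * v_true for c in cx]

# Pointwise coefficients of Q and q, then low pass filtered:
Qxx = smooth([c * c for c in cx], 4)
qx = smooth([cx[i] * ct[i] for i in range(N)], 4)

v = [-qx[i] / Qxx[i] for i in range(N)]  # solve locally at each point
print(max(abs(vi - v_true) for vi in v) < 1e-9)  # -> True
```

Because the same window smooths both Qxx and qx, a constant true motion is recovered exactly, while spatially varying motions would be blended over the window extent.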
3.5.1 Efficient Implementation of The Local-Global Affine Model
Our implementation of the local affine motion model is almost as fast as the global affine motion model. The window function is implemented as a convolution with a low pass filter. First we compute a local version of Q(x) using a window function that is unity in a very small neighborhood and zero everywhere else. To save computations, subsampling1 of the matrix field is applied at the same time. This field of matrices is convolved with the window function. The window function is modified a little to be separable.
For each point in the low pass filtered matrix field, the affine parameters are solved. Since the matrix field was subsampled, we need to upsample the estimated motion. The upsampling is done by bilinear interpolation.
1 We recommend that the blocksize in the subsampling is significantly smaller than the window in eq. (3.27).
Efficient implementation of the low pass filter, using separable and spread kernels, reduces the computational complexity from O(N6) for the finite element model to O(N3).
Chapter 4
Estimation of Motion Constraints

The focus of this chapter is a novel method for estimation of constraints on the local motion, c, as defined in section 2.2. The input is two image frames and the output is a number of (possibly conflicting) constraints for each pixel. This method can be used in conjunction with the parametric motion models in chapter 3 and even for estimation of multiple motions in chapter 6.
4.1 Existing Methods
Before describing our method, we will briefly describe other existing motion estimation methods and argue why we do not use them.
4.1.1 Intensity Conservation Gradient Method
Traditional methods for optical flow are based on the assumption of intensity conservation over time. For X-ray images this is not valid, since (i) images in the same sequence have slightly different levels due to different X-ray exposure, (ii) contrast injection may darken the image, at least locally, and (iii) multiple layers may interfere. It may be possible to remedy these problems with prefiltering [27, 1], and advanced prefiltering can be similar to our method.
4.1.2 Point Matching
Another popular method is point matching, where a region in one image is matched to regions in another image, using some correlation scheme. Some kind of correlation measure is computed and the algorithm chooses the match that gives maximum correlation. An alternative to maximizing correlation is to minimize a dissimilarity measure. To speed up matching, it is possible to use some gradient method or iterative search instead of explicitly computing correlation for all possible shifts.
Due to the aperture problem, point matching methods are only suitable for matching regions in the image that have structure in more than one orientation, e.g. corners and line crossings. The features to track must first be found. For our medical images, point matching is not a good alternative. There are far fewer corners than edges. Thus, a point matching method would only use a fraction of the information in the images.
4.1.3 Spatiotemporal Orientation Tensors
Estimating image velocity using three dimensional filter banks has proven accurate [13, 9, 10, 21] in other applications. The idea is to consider a sequence of images as a three dimensional spatiotemporal volume in the variables x, y, t. This three dimensional thinking is the same as for the gradient method [17] of optical flow, but instead of computing gradients, a set of filters is used to measure local orientation, which corresponds to the three dimensional motion vector.
The most successful method is probably [13, 21] based on a set of nine quadrature filters. The energies of each of the quadrature filter outputs are computed and combined into an orientation tensor. Besides the high accuracy, the method is good at using all the information from both edges and corners. All the information is implicit in the tensors. The aperture problem is simply overcome by low pass filtering of the tensors. All information, including certainty, can be extracted using eigenvector decomposition.
Unfortunately, spatiotemporal filtering approaches are not useful in our applications. The frame rate is too low and patient motions are too large and irregular over time. In terms of signal processing, we have severe aliasing due to the low temporal sampling rate. Thinking of the image sequence as a spatiotemporal volume is not helpful. Another reason for not using spatiotemporal filtering is that we want to estimate the displacement, not the velocity. If we used spatiotemporal filtering, we would get velocity vectors that had to be followed over time, which would result in accumulation of errors.
4.2 Phase Based Quadrature Filter Method
Using quadrature filter phase is a relatively common approach in stereo algorithms [33, 12]. The idea of using phase for motion estimation has previously been investigated by some researchers [8, 6, 11], but to our knowledge, nobody has tried this approach, which extends the accurate stereo algorithms to estimate relative motions from two image frames. Our method is almost a gradient-based method with nonlinear preprocessing of the images. To improve accuracy, a confidence measure has been added. The method presented in this thesis has been published both as an independent method [14] and in the context of an angiography application [15].

Definition 4.2.1 A filter is a quadrature filter [13] if its Fourier transform, F(u), is zero on one side of a hyperplane through the origin, i.e. there is a direction n^ such that

F(u) = 0 for all n^T u ≤ 0   (4.1)
Quadrature filter outputs are closely related to analytic signals. Note that quadrature filters must be complex in the spatial domain. We only use filters that are real in the Fourier domain.
4.2.1 Motion Constraint Estimation
The input to the algorithm is two image frames, denoted IA(x) and IB(x), and the output is a number of motion constraints, c, at each pixel. A number of quadrature filters are applied in parallel on each of the two image frames, producing the same number of filter outputs. The quadrature filters are tuned in different directions and frequency bands to split dissimilar features into different filter outputs, so that they do not interfere in the motion estimation. The quadrature filters also suppress undesired features like the DC value and high frequencies. Unlike the conventional gradient method, our method is not sensitive to low pass variations in image intensity, which are frequent in medical X-ray images, or real world images where shadows and illumination vary.
The quadrature filters can be chosen to have different directions and different frequency bands, but all of our implementations have four filters in the same frequency band but in different directions, as shown in figure 4.1. These filters are denoted f1(x), f2(x), f3(x) and f4(x) and are tuned to 0, 45, 90 and 135 degrees.
Both the input images are convolved with each of the filters,

qA;j(x) = (fj * IA)(x) and qB;j(x) = (fj * IB)(x)   (4.2)

where fj(x) is a quadrature filter and IA(x) and IB(x) are the image intensities of the two frames respectively. The phase is defined as the phase angle of the complex numbers,

φA;j(x) = arg qA;j(x) and φB;j(x) = arg qB;j(x).   (4.3)
In all ensuing computations, we must remember that phase is always modulo 2π, but for readability we drop this in our formulas and notations. In most image points, the filter outputs are strongly dominated by one frequency, which makes the phase nearly linear in a local neighborhood. When the phase is linear, it can be represented by its value and gradient. Thus, a gradient method applied on the phase will be very accurate. Of course, the phase is not always linear in a local neighborhood, but that can be detected and reflected by a confidence measure.
For each point in the image, and for each quadrature filter output, a constraint on the local motion is computed. To simplify notations, we drop the index, j, of the quadrature filter.
c = (cx; cy; ct)T = C ( (1/2) ∂(φB + φA)/∂x ; (1/2) ∂(φB + φA)/∂y ; φB - φA )T   (4.4)
Figure 4.1: From image to motion constraint for one direction of the quadrature filters. (Block diagram: quadrature filters tuned to 0, 45, 90 and 135 degrees are applied to the images IA(x) and IB(x); phase derivatives d/dx, d/dy, d/dt and a magnitude-based confidence C are combined into the constraint c = (cx; cy; ct)T.) The quadrature filter outputs are complex values, but that would take colors to illustrate, so only phase images are shown. Note that phase is wrapped modulo 2π.
Since the phase is locally almost linear, the derivatives can be computed as a difference between two pixels. The motion constraint vector is the spatiotemporal gradient of the phase, weighted by the confidence measure, C, which will be introduced in the next section.
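The phase idea can be illustrated with a 1-D sketch. This is not the thesis implementation: instead of actual quadrature filtering, idealized complex filter outputs q(x) = exp(i w x) are assumed, with frame B shifted by d pixels. Computing phase differences as arg(q1 conj(q2)) sidesteps explicit unwrapping of the phase modulo 2π.

```python
# Illustrative 1-D sketch of the phase-based constraint (eq. 4.4 in 1-D),
# assuming idealized analytic filter outputs q(x) = exp(i*w*x) and a pure
# shift d between the frames. arg(q1 * conj(q2)) gives phase differences
# without explicit unwrapping.
import cmath

w, d = 0.3, 1.7                     # spatial frequency and true shift
qA = [cmath.exp(1j * w * x) for x in range(16)]
qB = [cmath.exp(1j * w * (x - d)) for x in range(16)]

x = 8
# Spatial derivative of (phiB + phiA)/2, as one-pixel phase differences:
cx = 0.5 * (cmath.phase(qA[x + 1] * qA[x].conjugate()) +
            cmath.phase(qB[x + 1] * qB[x].conjugate()))
# Temporal phase difference phiB - phiA:
ct = cmath.phase(qB[x] * qA[x].conjugate())

print(round(-ct / cx, 6))  # -> 1.7, the motion solving cx*v + ct = 0
```

For a single dominant frequency the phase is exactly linear, so the constraint recovers the shift exactly; real filter outputs deviate from this, which is what the confidence measure of the next section accounts for.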
4.2.2 Confidence Measure
Using a confidence measure is necessary to give strong features precedence over weaker features and noise. In addition, it is necessary to avoid phase singularities [33, 20], which occur when two frequencies interfere in the filter output. These singularities must be discovered and treated as outliers. All this is done by assigning a confidence value to each constraint. Our confidence measure is inspired by the stereo disparity algorithm by Westelius [33], which in turn is inspired by [7]. It is a product of several factors, where the most important feature is the magnitude.
Our confidence measure for magnitude may seem complicated at first glance. Besides suppressing weak features, it is also sensitive to differences between the two frames. This reduces the influence of structure that only exists in one of the images, such as moving shadows, appearing objects and other features not moving according to the motion we estimate.
Cmag = |qA|^2 |qB|^2 / (|qA|^2 + |qB|^2)^{3/2}   (4.5)
Other factors have been added to reflect whether the phase gradient is sound for the specific quadrature filter in use. Negative frequencies are illegal and indicate phase singularities [20, 33],

Cfreq>0 = 1 if n^T ∇φ > 0, and 0 otherwise.   (4.6)
Our confidence measure is also sensitive to high frequencies, which may indicate an error in the filter output or signal a probability of negative frequencies wrapping around modulo 2π,

Cfreq:wrap = 1 if ||∇φ|| < ωmax, and 0 otherwise.   (4.7)

where ωmax is related to the upper cutoff frequency of the quadrature filter. Frequencies above this are probably false and there is also an increased probability of wrap-around from negative frequencies. It might be better with a continuous drop off in confidence, but this binary function is computationally efficient, since C = 0 can be represented by "NaN" in floating point arithmetics. We also guard against phase difference wrap-arounds,
Cphase-wrap = 1 if |φB - φA| < φmax, and 0 otherwise.   (4.8)
When computing the frequency, it is also useful to check the consistency between the two images in order to avoid features that only exist in one of the images,

Cfreq:cons = max( 0, ωmax:diff - ||∇φA - ∇φB||2 / (||∇φA||2 + ||∇φB||2) )   (4.9)

where we have heuristically set ωmax:diff = 1.
Finally, the total confidence is computed as the product of all the confidence
measures, i.e.

C = C_freq>0 C_freq.wrap C_phase.wrap C_mag C_freq.cons   (4.10)
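As a concrete illustration only, the factors in eqs. (4.5)-(4.10) might be combined as in the Python sketch below. The threshold values (except ω_max.diff = 1), the use of the averaged phase gradient in eqs. (4.6)-(4.7), and the way the phase difference is computed from the filter outputs are assumptions made for this example, not details taken from the thesis.

```python
import numpy as np

def total_confidence(qA, qB, grad_phase_A, grad_phase_B, n_hat,
                     w_max=np.pi / 2, w_max_diff=1.0, dphase_max=np.pi / 2):
    """Illustrative combination of the confidence factors, eqs. (4.5)-(4.10).

    qA, qB         : complex quadrature filter outputs for frames A and B
    grad_phase_A/B : 2-vectors, spatial gradients of the filter phase
    n_hat          : unit vector in the filter's main direction
    """
    mA2 = np.abs(qA) ** 2
    mB2 = np.abs(qB) ** 2
    # Eq. (4.5): linear in magnitude; suppresses weak features and structure
    # that is strong in only one of the frames.
    c_mag = mA2 * mB2 / (mA2 + mB2) ** 1.5

    grad = 0.5 * (grad_phase_A + grad_phase_B)     # assumption: average gradient
    c_freq_pos = 1.0 if n_hat @ grad > 0 else 0.0               # eq. (4.6)
    c_freq_wrap = 1.0 if np.linalg.norm(grad) < w_max else 0.0  # eq. (4.7)

    dphase = np.angle(qB * np.conj(qA))            # phase difference B - A
    c_phase_wrap = 1.0 if abs(dphase) < dphase_max else 0.0     # eq. (4.8)

    rel_diff = (np.linalg.norm(grad_phase_A - grad_phase_B) ** 2
                / (np.linalg.norm(grad_phase_A) ** 2
                   + np.linalg.norm(grad_phase_B) ** 2))
    c_freq_cons = max(0.0, w_max_diff - rel_diff)               # eq. (4.9)

    # Eq. (4.10): the total confidence is the product of all factors.
    return c_freq_pos * c_freq_wrap * c_phase_wrap * c_mag * c_freq_cons
```

Note how a phase gradient pointing against the filter direction zeroes the total confidence, regardless of the magnitudes.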
4.2.3 Multiple Scales and Iterative Refinement
To estimate large motions with the best possible accuracy, we apply motion estimation
iteratively at multiple scales. We begin at the coarsest scale in a low pass pyramid
to compute a rough estimate. Then we warp the image, or its filter outputs, and
do a new iteration at a finer scale. For best accuracy, we can do multiple iterations
at each scale.
When estimating a motion constraint from a warped image, we get a constraint
on the motion relative to the warp. Similarly, subsampling alters the estimated
motion constraints to yield smaller motion estimates. It is, however, simple to
compensate for the warp and the subsampling. Assume the image is warped (w_x, w_y)
pixels and subsampled s octaves prior to estimation of a motion constraint, c̃ =
(c̃_x, c̃_y, c̃_t). Then we have in fact estimated that
c̃_x 2^{−s} (v_x − w_x) + c̃_y 2^{−s} (v_y − w_y) + c̃_t = 0.   (4.11)

Thus, the correct motion constraint is

c = ( c̃_x,  c̃_y,  2^s c̃_t − c̃_x w_x − c̃_y w_y )^T.   (4.12)
In order to avoid subpixel warps, the method in Figure 2.5 is used.
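Assuming the reconstruction of eq. (4.12) above, compensating a constraint for the warp and the subsampling is a one-line computation; the sketch below is illustrative only.

```python
def compensate_constraint(c_tilde, wx, wy, s):
    """Undo a (wx, wy)-pixel warp and s octaves of subsampling on a motion
    constraint estimated in the warped, subsampled image; see eq. (4.12)."""
    cx, cy, ct = c_tilde
    return (cx, cy, 2.0 ** s * ct - cx * wx - cy * wy)
```

For example, a constraint (1, 0, -0.5) measured after a (2, 0) warp and one octave of subsampling is compensated to (1, 0, -3), which the true motion (3, 1) satisfies.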
4.3 Experimental Results
We have used the phase-based method on various image data, and it has always
turned out advantageous compared to the conventional gradient method. One important
application is motion compensation in sequences of medical X-ray images, digital
subtraction angiography. The conventional gradient method fails to estimate motions
accurately, due to different DC levels in the frames and the motions of the injected
contrast agent. Suppressing low frequencies helps a lot, but our phase-based
method is still superior.
4.3.1 X-ray Angiography Images
Figures 4.2 - 4.5 show a comparison for a medical X-ray angiography sequence.
Image subtraction is used to extract the vessels and remove the bones and tissue.
We get much less motion artifacts when using phase-based motion estimation.
Constraints over the image are integrated to fit a local-global deformable motion
model [16] in the least squares sense. We have used four quadrature filters in different
directions in conjunction with multiple scales and iterative refinement.
4.3.2 Synthetic Images
We have also compared accuracy on images where the motions come from synthetic
shifts. A real world test image has been shifted different amounts in different
directions. To avoid influence from subpixel warps, the image has been subsampled
after the warp. One might expect the conventional gradient method to work quite
well on these images, which have perfect intensity conservation between frames.
But still, our phase-based method is more accurate, as shown in figure 4.6.
4.3.3 Synthetic Images with Disturbance
In angiography, the contrast injection causes disturbing changes in the image that
may also disturb the motion estimation. We have made an experiment on synthetic
images to show that our phase-based motion estimation is less susceptible to such
disturbance than the conventional gradient method, often referred to as optical
flow [17]. We have used synthetically shifted images to evaluate the accuracy when
one of the image frames is disturbed. We have used a popular reference image,
Lena 256x256, which has been shifted and then subsampled to hide artifacts due
to subpixel shifts. The shifts are in all possible directions and we have computed
the average performance for all shifts of the same distance.
As shown in figure 4.7, our novel method performs significantly better. Since
we use iterative refinement, it is most relevant to study performance for shifts less
than √2/2 ≈ 0.7 pixels. After convergence, the warp reduces the motion to something
less than half a pixel in each of the x and y directions.
4.4 Future Development
The confidence measure in this thesis is designed without much theory or experiments. It might be possible to get better accuracy with application specific
confidence measures. For instance, in some applications it may be more or less
important to check consistency between frames. In general, it may be that the confidence measure factor on magnitude, eq. (4.5), should depend on the noise level.
Instead of being linear in magnitude, it could be a sigmoid function that gives
almost equal confidence to all features that are well above the noise level.
Figure 4.2: Original X-ray images.
Figure 4.3: Subtraction images, no motion compensation.
Figure 4.4: Subtraction images, motion compensation based on the conventional
gradient method, after filtering out low frequencies.
Figure 4.5: Subtraction images, motion compensation based on our phase-based
method. Note that there are fewer artifacts compared to figure 4.4. (The confidence
measure differs [16] slightly from the text.)
Figure 4.6: The phase-based method is more accurate than the conventional
gradient method. These plots show the error in the estimate (pixels) versus the
true shift (pixels) for images (Lena 256x256 and Debbie 128x128) that are shifted
synthetically. One pass estimation, i.e. no iterative refinement.
Figure 4.7: The phase-based method is more accurate than the conventional gradient method. This plot shows the error in the estimate (pixels) versus the true shift
(pixels) for images (Lena 256x256) that are shifted synthetically (shifting Lena
512x512 before subsampling). One of the image frames has been disturbed by adding
a transparent stripe across the image, in order to simulate a contrast bolus. (One
pass estimation, i.e. no iterative refinement.)
Chapter 5
General Problems in
Multiple Motion Analysis
5.1 Introduction
In estimation of multiple motions, there are a number of difficulties that are not
present in estimation of a single motion. In the general case, estimation of multiple
motions is a very difficult problem. All motions in the image need to be classified
and clustered into an unknown number of fields described by unknown models.
Even counting the number of motions in an image is a problem. This requires
some criterion to tell how different two motions must be before they are classified
as two motions instead of one. In the algorithms presented in chapter 6, it is
assumed that the number of motions is known a priori.
Multiple motion problems can be classified into transparent or occluding motions. The case of occluding motions is the most common in real world images, e.g.
a scene of multiple opaque objects at different depths, moving at different image
velocities. The focus of our research is, however, primarily the problem of
transparent motions. In X-ray images, we get a projection of structure at different
depths. The X-rays go through all parts and nothing is occluded. The logarithm
of the image is thus the sum of the X-ray attenuation at all different depths. Our
approximation model of the human body is a set of transparent layers that move
independently. For example, an approximate model of X-ray images of the heart
might be four transparent layers. Two layers are the front and back ribs and the
other two layers are the front and back walls of the heart.
5.2 Motion Constraints
In the estimation of a single motion in chapter 4, we assumed that the structure
in a medical image is primarily edges and only few corners. Thus, constraints on
local motion, c, are a representation that holds almost all relevant information in
the image. In the case of multiple layers, we assume that the motion of each layer
can be described by motion constraints. For transparent layers, we also assume
a sparse abundance of edges and that small regions in the image are usually
dominated by structure from only one layer. Under that assumption, it is possible
to estimate constraints on the local motion. Each estimated motion constraint
describes the motion of one layer, but we do not know which. An image can yield
a million motion constraints that need to be explicitly or implicitly clustered
into multiple layers.
5.3 Correspondence Problems
As we have already mentioned, there are correspondence problems when a large number of motion constraints are given from one image. Here follows a more detailed
description of the different types of correspondence problems.
5.3.1 Minimal Number of Motion Constraints
Generalized Aperture Problem [19]: Assume translation. In the case of one
motion, it is enough to see how two edges move to estimate the motion. But in the
case of two transparent motions, we need at least five edges or independent motion
constraints. Figure 5.1 illustrates that four motion constraints are never enough
to estimate the motion of two layers, due to ambiguous solutions. Adding a fifth
constraint resolves the ambiguity, provided that one layer has three constraints
and the other layer has two constraints.
Theorem 5.3.1 Assume there are M motion layers. The motion of each layer is
represented by a motion model with N parameters. Then we need at least MN +
M − 1 motion constraints, of the type c_x v_x + c_y v_y + c_t = 0, to compute the motions
of all the layers.
Proof: There are N parameters per motion layer. Thus there are MN unknown parameters. In addition, there are hidden unknowns telling which constraints belong
to the same layer. Assume MN constraints are given; then there
would be (MN choose N) points
in parameter space where N constraints intersect. Only M
of these correspond to a true motion. But if one extra constraint is added
for all layers but one, i.e. a total of M − 1 new constraints, then there are exactly
M − 1 points where N + 1 constraints intersect. All of these M − 1 points are
true motions. The M:th motion is unambiguously given. As shown in figure 5.1,
it is the only point of N constraints that remains after removal of the constraints
belonging to the first M − 1 motions.
So far, we have shown that MN + M − 1 motion constraints are enough. To
complete the proof, we must also note that fewer constraints would cause ambiguities. If there are fewer constraints, there are two cases:
Case I: One motion has N − 1 constraints, or even fewer. It is impossible to solve
for the N motion parameters.
Case II: At least two motions have fewer than N + 1
constraints. The constraints of these motions will give (2N choose N) intersections of N
Figure 5.1: To estimate two velocities, it is not enough to have four constraints,
but five might be enough. The panels (axes v_x, v_y) show: four constraints (always
ambiguous), five constraints (enough), and five constraints (not enough). The left
figure shows four constraints and all possible solution cases. (Would you choose
the circles, the squares or the diamonds?) With a fifth constraint, it is easy to
realize that the circles represent the only solution. The third figure shows that
five constraints are not always enough.
constraints. If N ≥ 2, it is impossible to tell which of these correspond to the real
motions.
The problem is in fact worse than described here, since motion constraint
vectors can be linearly dependent and noisy. Motion constraints will never intersect exactly at a number of points corresponding to each layer. In practice, the
abundance of motion constraints will be denser at some points and it is hard to
tell which constraint belongs to which layer.
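The counting argument of Theorem 5.3.1 is easy to tabulate. The helper below is purely illustrative; the function names are our own.

```python
from math import comb

def min_constraints(M, N):
    """Minimal number of constraints of the type cx*vx + cy*vy + ct = 0
    needed for M motion layers with N model parameters each (Theorem 5.3.1)."""
    return M * N + M - 1

def candidate_intersections(M, N):
    """With only M*N constraints, the number of points in parameter space
    where N constraints can intersect; only M of them are true motions."""
    return comb(M * N, N)
```

For two translational layers (M = 2, N = 2), this gives 5 required constraints, while 4 constraints already produce comb(4, 2) = 6 candidate intersections, matching figure 5.1; two affine layers (N = 6) need 13 constraints.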
5.3.2 Problem: Correspondence Between Estimates in Different Parts of the Image
Assume we have been able to locally estimate two or more motion vectors, ṽ_1, ṽ_2, ...,
at every single pixel in an image. Then it remains to tell which motion vectors
belong to the same layer or object. To illustrate the difficulties, we will study
a case with ambiguous solutions. Figure 5.2 shows a field of two motion vectors
and two possible solutions of splitting up the vectors into two smooth fields. Of
course, it is unlikely that two motion fields will be equal along a path across the
image. Although this will not happen in practice, we may get something quite
close. In practice we will also have difficulties telling whether a motion field is continuous,
due to noisy estimates and the fact that motion is not given at points between pixels.
Figure 5.2: Although we have been able to estimate two velocity vectors at every
point in the image (upper left: both motions), we cannot unambiguously tell which
velocity vectors belong to the same layer. In this case, there are two possible
continuous solutions, shown as v_1 and v_2 for case 1 and case 2. Which one would
you choose?
5.3.3 Problem: Interframe Correspondence Between Estimates
Assume we have several frames. It is not enough to overcome all the previously
mentioned problems, i.e. to estimate all motion vector fields for each frame. We
still do not know which vector fields in the two frames correspond to the same
layers. This might be a problem even when motions are smooth over time. (We
suggest that this problem should be solved by finding correspondence between the
features in the images.)
Chapter 6
Estimation of Multiple
Motions
Chapter 5 described the problems and difficulties in estimation of multiple motions.
This chapter presents algorithms to overcome some of these problems and estimate
the motion fields of multiple layers. The primary focus is a modified version of the
EM algorithm [19] for estimation of multiple motions.
6.1 Other Methods Considered
Before describing the successful part of our research, some other methods will be
described briefly. We have considered a number of possible methods that we have
decided not to use. Among them are explicit correlation and tracking of dominant
layers. We cannot prove that they are inferior, but we describe problems that
discourage further research.
6.1.1 Difficulties with Multiple Correlation Peaks
One of the methods we have considered is to explicitly correlate images with
different shifts and find correlation peaks. As in estimation of a single motion,
correlation is hard to extend to estimation of other motions than pure translations.
Subpixel accuracy requires that the image is shifted with subpixel accuracy prior
to correlation.
In estimation of multiple motions, it is often easy to find one of the motions as
the highest peak. Finding the next peak is not as easy. It is like asking which is
the second highest point in an area of mountains. Depending on who you ask, the
answer is different. One person may say it is a rock two meters below the highest
peak. Another person would claim it is a minor peak of the same mountain, just
a hundred meters away. A third person would count nothing but another mountain,
at least a kilometer away.
In a correlation map, the problem of defining criteria for finding a second peak
is as difficult as in the real world. It is even worse, since the correlation is only
computed at a finite resolution of shifts. The limited resolution of the images makes
it meaningless to compute correlation for small subpixel shifts. If the difference
between two layers is just a few pixels, it is likely that the two peaks merge into
one. In order to estimate non-translational motions, local analysis is necessary,
and the second peak often drowns in the ridge of a higher peak.
6.1.2 Difficulties with Dominant Layers
Let us describe an approach that works in some of our experiments, but not well
enough. It is based on the assumption that one layer may be much stronger than
all the other layers. Under this assumption, we have been able to estimate the motions
of two transparent layers by first estimating the motion of the dominant layer. Motion
estimation is done using the phase-based method in section 4.2. The confidence
measure is designed to suppress motions that are large relative to the warp. In
conjunction with iterative refinement, section 4.2.3, the motion estimate converges
to the dominant layer and outliers from the other layer are given low weight.
When the motions of the dominant layer are known, it is possible to filter it away.
The removal of the dominant layer from the images is far from perfect, but in
our experiments it has been good enough for the next step. After removal of the
dominant layer from the images, it is straightforward to estimate the motion of
the weaker layer.
To improve accuracy, we have applied the above scheme iteratively. When both
motions are known, the two layers are separated, section 6.4. In the next iteration,
the reconstructed layers are then used as reference images in the motion estimation.
If successful, the algorithm converges towards better reconstructed images and better
motion estimates.
We have also been able to estimate motions in an image sequence where the
layers are virtually equally strong. This was done using a bootstrap version of
the above scheme. In the first iteration, only two frames are used. Often,
the motion estimate converges to either of the layers, although the accuracy is
awfully bad. This layer is filtered out and used as a reference image in the motion
estimation in the next iteration. Accuracy slowly gets better the more image
frames that are used. The scheme is computationally expensive and suffers from
convergence problems. On our test images, it only works when the motions are
pure translations. It is also complicated to use multiple scales to estimate large
motions, since different layers are dominant at different scales.
6.2 Estimation of Motion Constraints
The motion constraints, c, in this chapter are computed by the phase-based method
we used for single motion estimation, section 4.2. It is possible to use other
methods, but we have not tried that.
If a small region in the image only contains structure from one layer, the estimated motion constraint will be accurate. Otherwise, in case there is structure
from two layers at the same point, they may interfere and produce outliers. The
Figure 6.1: Constraints from an image with two transparent layers as in section 6.5.3. Four directions of quadrature filters are used, yielding four constraints
at each pixel. One layer appears stronger than the other.
phase-based method is less susceptible to interference between layers than the
conventional gradient method. The phase-based method is only sensitive to band
pass frequencies, and these are split up in different directions; thus, dissimilar
structure from different layers is less likely to interfere. The confidence
measure is also designed to suppress matches of dissimilar structure. An example
of constraints from two transparent layers is shown in figure 6.1.
6.3 EM (modified)
Out of the methods we have tried, the EM algorithm [19] is probably the best. The
EM algorithm is a general algorithm with applications beyond imaging. In our
application, it is basically a kind of clustering algorithm, whose input is the mixture
of all motion constraints from all layers. Motion constraints that are coherent are
assumed to belong to the same layer. The EM algorithm is an iterative algorithm
that starts from an initial guess of what the motions are and then does several iterations.
A limitation is that the number of layers must be known a priori. Of course, no
clustering algorithm for motion constraints is guaranteed to converge to the
correct answer, since there might be ambiguities as described in section 5.3.2. In
addition, it happens that the EM algorithm gets stuck in a local optimum.
6.3.1 Review of EM
When we have multiple motions, constraints intersect at different points in parameter space, corresponding to each of the motions. Estimating these motions is
equivalent to finding the intersection points in parameter space. There seems to
be no closed form solution to this problem, but it can be solved iteratively by the
EM algorithm [19]. The EM algorithm is a clustering algorithm that iteratively
applies two steps:
Expectation: Estimate the owner probabilities for each constraint, i.e. the
probabilities that a constraint belongs to a particular motion layer. (We will see
that the owner probabilities depend on previously made motion estimates.)
Maximization: Estimate the motions, with constraints assigned to each
of the motions according to the owner probabilities. In the next iteration, the owner
probabilities have changed, since the motion estimates are different. As already
mentioned, the original version of the EM algorithm is only guaranteed to converge [26]
to a local optimum, and we do not know whether that is a global optimum.
6.3.2 Derivation of EM Algorithm for Multiple Warps
Jepson and Black [19] have used the EM algorithm on multiple motions, but
their approach did not include warping images. As pointed out earlier, warping
images is necessary to estimate large motions with the best possible accuracy. This is
especially important when estimating transparent motions. If we did not
warp, the constraints of a large displacement would be much weaker than those of
a small displacement.
The problem with warping multiple motions is that the image must be warped
according to each of the estimated motions, producing multiple warped images.
Here we will derive a simple extension of Jepson and Black's EM algorithm [19] that
assigns different mixture probabilities to each of the warped images. A lot of
variable names need to be introduced, and it may help to keep an eye on the list
in appendix B.
Let l denote the index of the warp. For each warp, l, we get a set of constraints
c_{k,l}, where k is a joint index of spatial position and other indices, such as quadrature
filter direction. Also assume that the correct motion model parameters for each of
for i = 1 to number_of_iterations_refinement {
for j = 1 to number_of_motions {
warp_image;
compute_motion_constraints;
}
for j = 1 to number_of_EM_iterations {
E_step;
M_step;
}
}
Figure 6.2: The loops in our extended EM algorithm with multiple warps.
the motions are a_0, ..., a_N. Temporarily disregard the possibility of bad
estimates of motion constraints. Under these conditions, the PDF¹ for observing
the constraint c_{k,l} is

P(c_{k,l} | x_k, {m_{n,l}}, a_0, ..., a_N) = Σ_n m_{n,l} P(c_{k,l} | x_k, a_n)   (6.1)
where m_{n,l} is the probability of observing motion n in an image warped according to l. The PDF of observing our combination of constraints is the product
of all PDFs for single constraints. By applying the logarithm, the product is converted
into a sum:

log Π_{k,l} P(c_{k,l} | x_k, {m_{n,l}}, a_0, ..., a_N) = Σ_{k,l} log P(c_{k,l} | x_k, {m_{n,l}}, a_0, ..., a_N)
                                                     = Σ_{k,l} log Σ_n m_{n,l} P(c_{k,l} | x_k, a_n)   (6.2)
We want to find the global maximum of this function under the constraint that
the mixture probabilities sum to 1:

Σ_n m_{n,l} = 1   ∀ l = 1, 2, ...   (6.3)
6.3.3 Evaluating Criteria for Optimum
We have just arrived at a well defined mathematical problem, i.e. to maximize the
joint PDF, eq. (6.2), under the constraint eq. (6.3). To make clear what the mathematical
problem is, let us write the equations for the optimization problem in a special form:

max_{{a_n}, {m_{n,l}}} Σ_{k,l} log Σ_n m_{n,l} P(c_{k,l} | x_k, a_n)   (6.4)

where Σ_n m_{n,l} − 1 = 0   ∀ l   (6.5)

¹ Probability Density Function (PDF)
Similar to [19], we use Lagrange relaxation² to derive our version of the EM
algorithm for warped images. Relaxation of eq. (6.5) gives the Lagrange function

L({a_n}, {m_{n,l}}, {λ_l}) = Σ_{k,l} log Σ_n m_{n,l} P(c_{k,l} | x_k, a_n) − Σ_l λ_l ( Σ_n m_{n,l} − 1 )   (6.6)
where {λ_l} are the Lagrange multipliers. According to Lagrange theory, the optimum is a saddle point of L({a_n}, {m_{n,l}}, {λ_l}) and must satisfy³

∂L/∂m_{n,l} ({a_n}, {m_{n,l}}, {λ_l}) = 0   ∀ n, l   (6.7)
∂L/∂λ_l ({a_n}, {m_{n,l}}, {λ_l}) = 0   ∀ l   (6.8)
∇_{a_n} L({a_n}, {m_{n,l}}, {λ_l}) = 0   ∀ n   (6.9)

Evaluation of these equations yields

Σ_k P(c_{k,l} | x_k, a_n) / ( Σ_ñ m_{ñ,l} P(c_{k,l} | x_k, a_ñ) ) − λ_l = 0   ∀ n, l   (6.10)
Σ_n m_{n,l} − 1 = 0   ∀ l   (6.11)
Σ_{k,l} m_{n,l} ∇_a P(c_{k,l} | x_k, a_n) / ( Σ_ñ m_{ñ,l} P(c_{k,l} | x_k, a_ñ) ) = 0   ∀ n   (6.12)
In order to further evaluate these equations, let us define something called ownership
probabilities:

q_{nkl} = m_{n,l} P(c_{k,l} | x_k, a_n) / ( Σ_ñ m_{ñ,l} P(c_{k,l} | x_k, a_ñ) )   (6.13)
Now, the equations can be written as

Σ_k q_{nkl} − λ_l m_{n,l} = 0   ∀ n, l   (6.14)
Σ_n m_{n,l} − 1 = 0   ∀ l   (6.15)
Σ_{k,l} q_{nkl} ∇_a log P(c_{k,l} | x_k, a_n) = 0   ∀ n   (6.16)
These are the equations that need to be satisfied at the optimum. They are
solved iteratively, one at a time. Before describing the details in the next
section, we will give an intuitive meaning to the owner probabilities, q_{nkl}. Note
that they are defined for each combination of motion constraint and layer. After a
closer look at eq. (6.13), it is clear that q_{nkl} is the probability that the constraint
c_{k,l} belongs to the layer with index n. In particular, note that Σ_n q_{nkl} = 1.

² A common method in mathematics and optimization theory.
³ Unfortunately, even a local optimum satisfies these equations.
6.3.4 Iterative Search for Optimum
The EM algorithm defines how to solve equations (6.14)-(6.16) iteratively, by
solving one variable at a time using one equation.
The first operation in each iteration is to compute the ownership probabilities
for each pixel and layer. This is a straightforward computation using eq. (6.13).
In the first iteration, we need an initial guess of the motion parameters, {a_n}, and
the mixture probabilities, {m_{n,l}}.
The second operation is to compute the motion parameters for each layer,
{a_n}, using eq. (6.16). Thanks to the probability function that will be defined in
section 6.3.5, the motion estimation is the same least squares fit as in section 3.2.
In order to prepare for the next iteration, the mixture probabilities, {m_{n,l}}, need
to be updated using

m_{n,l} = Σ_k q_{nkl} / Σ_{ñ,k} q_{ñkl}.   (6.17)

Then we go back and do some more iterations. The EM algorithm is guaranteed
to converge to a local optimum [26].
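To make the E- and M-steps concrete, here is a toy sketch for purely translational motion models and a single warp (so the index l is dropped). The Gaussian PDF and the distance follow eqs. (6.18)-(6.19) below; the initialization, the value of σ and the restriction to translations are assumptions made for illustration, not the thesis implementation, which fits general parametric models as in section 3.2.

```python
import numpy as np

def em_translations(constraints, M, iters=50, sigma=0.2, v0=None, rng_seed=0):
    """Cluster constraints (cx, cy, ct) into M translational motions (vx, vy)
    by iterating the E-step, eq. (6.13), and the M-step, eqs. (6.16)-(6.17)."""
    rng = np.random.default_rng(rng_seed)
    c = np.asarray(constraints, dtype=float)               # shape (K, 3)
    v = (np.asarray(v0, dtype=float) if v0 is not None
         else rng.normal(scale=0.5, size=(M, 2)))          # motion guesses
    m = np.full(M, 1.0 / M)                                # mixture probabilities
    norm2 = c[:, 0] ** 2 + c[:, 1] ** 2                    # |(cx, cy)|^2

    for _ in range(iters):
        # E-step: owner probabilities q[n, k], eq. (6.13), with the
        # Gaussian PDF of eq. (6.18) and the distance of eq. (6.19).
        d2 = np.array([(c[:, 0] * vn[0] + c[:, 1] * vn[1] + c[:, 2]) ** 2 / norm2
                       for vn in v])
        p = m[:, None] * np.exp(-d2 / (2.0 * sigma ** 2))
        q = p / p.sum(axis=0, keepdims=True)

        # M-step: eq. (6.16) reduces to a weighted least squares fit per motion.
        for n in range(M):
            w = q[n] / norm2
            lhs = (c[:, :2] * w[:, None]).T @ c[:, :2]
            rhs = -(c[:, :2] * w[:, None]).T @ c[:, 2]
            v[n] = np.linalg.solve(lhs, rhs)

        m = q.sum(axis=1) / q.sum()                        # eq. (6.17)
    return v, m
```

Constraints generated from two distinct translations are separated after a few iterations, provided the initial guess is not too far off; with a poor initialization the sketch can get stuck in a local optimum, exactly as discussed in the text.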
6.3.5 The Probability Function
The probability density function defines the probability of observing a particular
constraint at a particular spatial location, according to a particular motion model.
For simplicity, we use a Gaussian PDF. It is simple because eq. (6.16) then yields
the same equations as for the model based estimation in section 3.2, except
that no confidence measure is used. In the next section we will tell how to get the
confidence measure back.
The probability of observing a particular constraint is a normal distribution
with respect to the deviation according to a dissimilarity measure [19], d(c, v):

P(c | x, a) = P(c | v) = 1/(√(2π) σ) exp( −d^2(c, v) / (2σ^2) )   (6.18)

where v = K(x) a. With some abuse of notation, we define the d-function

d^2(c, v) = (c_x v_x + c_y v_y + c_t)^2 / (c_x^2 + c_y^2).   (6.19)
The denominator acts as a normalization of the vector (c_x, c_y), and the value of d(c, v)
is the closest distance from the point (v_x, v_y) to the line c_x v_x + c_y v_y + c_t = 0. Note
that our function does not assume larger deviations for large motions, in contrast
to [19]. Our approach of warping the image gives the same absolute accuracy for
arbitrarily large motions.
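The point-to-line interpretation of eq. (6.19) is easy to verify numerically; the small sketch below is illustrative only.

```python
def d2(c, v):
    """Squared dissimilarity of eq. (6.19): the squared distance from the
    velocity point v = (vx, vy) to the line cx*vx + cy*vy + ct = 0."""
    cx, cy, ct = c
    return (cx * v[0] + cy * v[1] + ct) ** 2 / (cx ** 2 + cy ** 2)
```

For the line v_x = 1, written as 1·v_x + 0·v_y − 1 = 0, the point (3, 5) lies at distance 2, and rescaling the constraint vector does not change that distance, which is exactly the effect of the normalizing denominator.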
6.3.6 Introducing a Confidence Measure in the EM Algorithm
In the probability function defined in section 6.3.5, the confidence measure was
removed due to the normalization in eq. (6.19). We believe the confidence should
not affect the relative values of the owner probabilities. For example, a high
confidence in the motion constraint does not mean that we are certain which layer
it belongs to.
Without writing the equations again, we will tell how to derive the EM algorithm with a confidence measure. Let C = c_x^2 + c_y^2 denote the confidence measure
of a motion constraint. The confidence measure should be introduced in eq. (6.4)
by multiplying by C_{k,l}^2 in the outer sum, in front of "log ...". This confidence will follow through all the derivations in section 6.3.3. In the end, eq. (6.14) and eq. (6.16)
will be modified by simply replacing q_{nkl} with C_{k,l}^2 q_{nkl}.
6.3.7 Our Extensions to the EM Algorithm
As we have pointed out, it is proved that the EM algorithm converges to a local
optimum, but the risk of getting stuck in a local optimum is prohibitive when using
motion models with many degrees of freedom. In our case, warping images makes
convergence even more hazardous, since the image has a finite size and we might
get outside the boundary. In experiments, when applying the EM algorithm to
images with two transparent layers and an affine motion model, the EM algorithm
sometimes bails out already in the first iteration. Our remedy for this is to control
the stiffness (section 3.3). In the first iteration, only translations are estimated.
In the second iteration, we allow small affine deformations. For every iteration, we
reduce the cost. We do several EM iterations for every time we warp the image.
Other researchers [21] suggest simulated and deterministic annealing to avoid
getting stuck in a local optimum. This would require too many iterations when we
have many parameters. We have not tried that, since we have not had problems
with local optima when using cost functions.
Another extension is to let the owner probabilities alter the certainties. If we are
not sure which layer a constraint belongs to, its influence in the estimation of the
motion parameters should be less.
Outliers in motion constraints are more frequent when there are multiple layers, since structures corresponding to different layers sometimes interfere. In our
scheme, outliers are handled by introducing an extra motion layer that is supposed to own the outliers. This special layer has a probability function that
is much wider than those of the other layers. This means that this layer owns all constraints that are far away from the closest estimated motion. How far is implicitly
determined by specifying a value for the mixture probability, i.e. we want that

m_{N,l} = α   ∀ l   (6.20)

where N is the index of the outlier motion model and α is a predefined constant that controls what
fraction of constraints to consider as outliers.
This is controlled by setting

P(c_{k,l} | x_k, a_N) = p_outlier   (6.21)

where p_outlier is determined so that eq. (6.20) holds.
Since this probability function does not depend on its corresponding motion,
there is no need to estimate the motion parameters of this layer.
6.3.8 Convergence of Modified EM with Warp
We have made modifications of the EM algorithm without proving convergence. Even if we assume that the modified EM algorithm always
converges, it would not imply that the iterative refinement with image warps converges.
It is important not to confuse the iterations of the EM algorithm with the
iterations of image warps. The EM algorithm is in the inner loop and the warps
are in the outer loop (see figure 6.2). For every iteration of iterative refinement,
several EM iterations are performed. The proof of convergence has nothing to do
with convergence of iterative refinement. Convergence of the inner loop does not
imply convergence of the outer loop.
6.4 Reconstruction of Transparent Layers
If the motions are known, it is possible to reconstruct the transparent layers, except for
the very lowest frequencies, provided that the motions are unique and big enough
not to interfere with the pixel resolution. A predecessor of our algorithm is to
simply average along the trajectory of one motion [18]. If we have many frames in
the image sequence, the structure corresponding to this motion is sharpened and
all other structure is blurred. We improve the image quality by estimating the
errors and feeding them back. We arrive at an iterative backprojection algorithm,
described by figure 6.3.
[Block diagram: the reconstructed images are subtracted from the original sequence,
the difference is averaged along the motion trajectory, and the result is fed back
to reconstruct the original sequence.]
Figure 6.3: Reconstruction of motion layers using simple backprojection.
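As a toy illustration of the backprojection loop in figure 6.3, the sketch below separates 1-D transparent layers moving with constant integer velocities. Circular shifts stand in for the image warps, the layer velocities are assumed known (as in the text), and the 1-D setting and iteration count are simplifications for the example. Note that only the lowest frequency (the mean) of each layer remains ambiguous, matching the remark above.

```python
import numpy as np

def reconstruct_layers(frames, shifts, n_iter=20):
    """Toy backprojection separating transparent 1-D layers with known,
    constant integer velocities (circular shifts stand in for warps)."""
    F, W = frames.shape
    M = len(shifts)
    layers = np.zeros((M, W))
    for _ in range(n_iter):
        for n in range(M):
            # Current reconstruction of every frame from the layer estimates.
            recon = np.zeros((F, W))
            for mm in range(M):
                for t in range(F):
                    recon[t] += np.roll(layers[mm], shifts[mm] * t)
            # Average the reconstruction error along layer n's motion
            # trajectory and feed it back (figure 6.3).
            err = frames - recon
            layers[n] += np.mean(
                [np.roll(err[t], -shifts[n] * t) for t in range(F)], axis=0)
    return layers
```

With two layers of velocities 0 and 1 pixel/frame and a full cycle of frames, the layers are recovered exactly up to a constant offset per layer.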
6.4.1 Improved Backprojection Algorithm
In the simple backprojection, the feedback images are warped two times: first
when fed back and then when fed forward after being subtracted. The
double warp degrades the image quality. Our way to overcome this problem is the
scheme in figure 6.4, where the two warps are performed in one step, as a single
warp.
[Flow chart: the original sequence is averaged along the motion trajectory; the reconstruction images are subtracted; in the feedback path, reconstruction and averaging along the trajectory are combined into a single warp.]
Figure 6.4: Reconstruction of transparent layers using a more sophisticated backprojection.
6.4.2 Finding Correspondence between Motion Estimates from Different Frames
As pointed out in section 5.3, we need to know which motion vectors correspond over time. The way we have applied the EM algorithm does not give us that information, and comparing motion vectors over time does not work, since our motions are irregular over time. (There is no such problem if we only have two frames.) Our approach is therefore to first reconstruct layers from only two frames. Although the image quality is bad, we have the two layers separated and can analyze how they correlate with later frames in the sequence (when warped with the different motion estimates).
6.4.3 Experimental Results
So far, we have run our algorithms only on images with two layers that have been superimposed synthetically. Figure 6.5 shows a number of frames from a sequence of synthetically generated images. The images were generated by taking two still images of a heart, adding them together with random affine motions, and then cropping the valid region. The original heart images were thrown away and the only input data to our algorithm was the sequence of 50 generated
Figure 6.5: Synthetic multiple motion field sequence containing two layers with independent random affine motions. Frames 10, 20, 30 and 40 in a sequence of 50 images.
Figure 6.6: Reconstructed images (cropped a few pixels to hide artifacts at the borders). Compare to figure 1.3.
images. Using the EM algorithm with our phase-based method, the motions of both layers were estimated under the assumption of affine motions. The original layers were reconstructed and are shown in figure 6.6. We get some artifacts at the borders, where the image quality is degraded.
The true original images were thrown away in order to avoid confusion, but they can be seen in figure 1.3. One layer is the upper left part of one image in figure 1.3 and the other layer is the lower right part. The reconstruction of the details is fine, but the lowpass content is degraded and the DC is discarded.
After doing these experiments, we have seen a poster and an abstract on a project that also seems to address the same problem of separation of layers. Few details related to our work are given, and the proceedings with the full article[28] are not yet available.
6.5 Alternative Method for Two Mixed Motions
In this section, we present an alternative to the EM algorithm for estimation of multiple motions. Compared to the EM algorithm, the same input data is used, but the computational time is usually shorter, since it is cheap to do many iterations once some initial computations are done. Among the drawbacks of this method are that the influence of outliers seems too large, that there are problems with convergence, and that we have not yet invented any method to take advantage of multiple warps in order to estimate large motions with good accuracy.
We have found references to an algorithm[30, 31] with some similarities. It uses higher order moments in 3D Fourier/Gabor transforms of spatiotemporal volumes and also yields a minimization problem.
6.5.1 Basic Idea
Assume we have two motions, v_1 and v_2, described by parameter vectors a_1 and a_2 for some motion model defined by K(x). As defined in chapter 3,

    v_1 = K(x) a_1   and   v_2 = K(x) a_2        (6.22)
A large number of constraint vectors, c_k, k = 1, 2, 3, ..., are given at spatial positions x_k. Neglecting interference between layers, the motion constraints are supposed to satisfy either

    c_k^T v_1 = 0   or   c_k^T v_2 = 0        (6.23)

where

    v_i = (v_i^T, 1)^T,  i = 1, 2,   and   c_k = (c_{k,x}, c_{k,y}, c_{k,t})^T.        (6.24)
Let's now define an error measure similar to eq. (3.12) but for two motions,

    ε(a_1, a_2) = Σ_k (c_k^T v_1(x_k))² (c_k^T v_2(x_k))²
                = Σ_k ( (c_{k,x} c_{k,y}) K(x_k) a_1 + c_{k,t} )² ( (c_{k,x} c_{k,y}) K(x_k) a_2 + c_{k,t} )²        (6.25)
To simplify notation, we introduce

    b_k = (c_{k,x} c_{k,y}) K(x_k)        (6.26)
    d_k = c_{k,t}        (6.27)
and the expression gets more readable,

    ε(a_1, a_2) = Σ_k (b_k a_1 + d_k)² (b_k a_2 + d_k)²
                = Σ_k (a_1^T b_k^T b_k a_1 + 2 d_k b_k a_1 + d_k²)(a_2^T b_k^T b_k a_2 + 2 d_k b_k a_2 + d_k²).        (6.28)
After further evaluation of the product above, it turns out that the sums over k can be moved inside, away from the unknown parameter vectors a_1 and a_2. We have to sum over outer products of up to four vectors, and we get multidimensional arrays of numbers called tensors[34]. Readers not familiar with tensors can think of them as an extension of vectors and matrices to arbitrary dimensionality. Tensors come with special tensor notations, where matrix-like products are written without an explicit product sign; instead, the indices to multiply and sum over appear as subscripts of both factors.

    ε(a_1, a_2) = T4_{ijkl} a_{1i} a_{1j} a_{2k} a_{2l}
                + 2 T3_{ijk} a_{1i} a_{1j} a_{2k} + 2 T3_{ijk} a_{1i} a_{2j} a_{2k}
                + T2_{ij} a_{1i} a_{1j} + 4 T2_{ij} a_{1i} a_{2j} + T2_{ij} a_{2i} a_{2j}
                + 2 T1_i a_{1i} + 2 T1_i a_{2i}
                + T0        (6.29)
where T4_{ijkl}, T3_{ijk}, T2_{ij}, T1_i and T0 are tensors with 4, 3, 2, 1 and 0 indices[4]. The tensors are formed by summing outer products[5], and we get the fourth order moments of the elements in c_k:

    T4 = Σ_k b_k ⊗ b_k ⊗ b_k ⊗ b_k        (6.30)
    T3 = Σ_k d_k b_k ⊗ b_k ⊗ b_k        (6.31)
    T2 = Σ_k d_k² b_k ⊗ b_k        (6.32)
    T1 = Σ_k d_k³ b_k        (6.33)
    T0 = Σ_k d_k⁴        (6.34)

[4] Example: T2 has two indices and is a matrix, T1 has one index and is a vector. T0 has no index and is hence a scalar.
[5] For example, the outer product of two column vectors, u ⊗ v = u v^T, is a matrix (or a tensor with two indices).
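To make the moment tensors concrete, here is a small numerical check of our own (not code from the thesis) that builds T4, ..., T0 with numpy.einsum and verifies that the tensor expression for ε agrees with the direct sum over constraints.

```python
import numpy as np

rng = np.random.default_rng(0)
K, P = 50, 3                      # number of constraints, parameters per motion
b = rng.standard_normal((K, P))   # row k holds b_k = (c_kx  c_ky) K(x_k)
d = rng.standard_normal(K)        # d_k = c_kt

# Moment tensors (6.30)-(6.34): sums of outer products weighted by powers of d_k.
T4 = np.einsum('ni,nj,nk,nl->ijkl', b, b, b, b)
T3 = np.einsum('n,ni,nj,nk->ijk', d, b, b, b)
T2 = np.einsum('n,ni,nj->ij', d**2, b, b)
T1 = d**3 @ b
T0 = np.sum(d**4)

def eps_tensor(a1, a2):
    """Error measure via the tensor form, eq. (6.29)."""
    return (np.einsum('ijkl,i,j,k,l->', T4, a1, a1, a2, a2)
            + 2 * np.einsum('ijk,i,j,k->', T3, a1, a1, a2)
            + 2 * np.einsum('ijk,i,j,k->', T3, a1, a2, a2)
            + np.einsum('ij,i,j->', T2, a1, a1)
            + 4 * np.einsum('ij,i,j->', T2, a1, a2)
            + np.einsum('ij,i,j->', T2, a2, a2)
            + 2 * T1 @ a1 + 2 * T1 @ a2 + T0)

def eps_direct(a1, a2):
    """Error measure via the explicit sum over constraints, eq. (6.25)/(6.28)."""
    return np.sum((b @ a1 + d) ** 2 * (b @ a2 + d) ** 2)
```

Once the tensors have been summed, evaluating ε no longer touches the individual constraints, which is why many iterations become cheap after the initial computations.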
6.5.2 Minimizing ε(a_1, a_2)
In order to find the motions, ε(a_1, a_2) is minimized. Lacking references for better methods, a version of Newton's method is used to find stationary points, i.e. points where the gradient is zero. Briefly, the approach is a multidimensional version of the well known Newton-Raphson method applied to the
gradient. The gradient with respect to a_1 is computed as

    ∇_{a1} ε(a_1, a_2) = 2 T4_{ijkl} a_{1j} a_{2k} a_{2l}
                       + 4 T3_{ijk} a_{1j} a_{2k} + 2 T3_{ijk} a_{2j} a_{2k}
                       + 2 T2_{ij} a_{1j} + 4 T2_{ij} a_{2j}
                       + 2 T1_i        (6.35)
and the gradient with respect to a_2 is computed in a similar way. With some abuse of tensor notation, we stack the two parameter vectors and write the gradient as

    ∇ε = ( ∇_{a1} ε(a_1, a_2) ; ∇_{a2} ε(a_1, a_2) ).

With similar abuse of notation, the Hessian, i.e. the matrix of second derivatives, is given by
the 2 × 2 block matrix with blocks

    ∂²ε/∂a_1∂a_1 = 2 T4_{ijkl} a_{2k} a_{2l} + 4 T3_{ijk} a_{2k} + 2 T2_{ij}
    ∂²ε/∂a_1∂a_2 = 4 T4_{ijkl} a_{1k} a_{2l} + 4 T3_{ijk} a_{1k} + 4 T3_{ijk} a_{2k} + 4 T2_{ij}
    ∂²ε/∂a_2∂a_2 = 2 T4_{ijkl} a_{1k} a_{1l} + 4 T3_{ijk} a_{1k} + 2 T2_{ij}        (6.36)

where the lower left block is the transpose of the upper right block (all the tensors are symmetric in their indices).
The parameter vectors a_1, a_2 are computed iteratively using Newton's method (a search for stationary points),

    (a_1; a_2)^{(n+1)} = (a_1; a_2)^{(n)} − [∇²ε((a_1; a_2)^{(n)})]^{-1} ∇ε((a_1; a_2)^{(n)}).        (6.37)
Convergence is often quite poor. In our experiments, the Newton search often converges to suboptimal solutions, usually a_1 = a_2. Our simple remedy to this problem is to use several start points for the iteration. The Newton search is so fast[6] that we can use hundreds of start points. A Newton search that tends to bail out is canceled, and we try the next start point. The procedure is repeated until we have a large number of sound estimates of a_1 and a_2. Then we choose the estimate with the smallest ε(a_1, a_2). Unfortunately, we can never be sure that we have found the optimum. All we can do is increase the likelihood by using a large number of start points.
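The multi-start Newton search can be sketched as follows. This is our own illustration (all names are ours), using a pure translation model so that a_i = v_i and K(x) = I; the gradient and Hessian follow eqs (6.35)-(6.36), and searches that blow up are canceled.

```python
import numpy as np

rng = np.random.default_rng(0)
P = 2                                                  # translation: v = a, K(x) = I
u1, u2 = np.array([1.0, 0.5]), np.array([-0.5, 1.0])   # true (synthetic) motions
b = rng.standard_normal((100, P))
b /= np.linalg.norm(b, axis=1, keepdims=True)          # constraint directions
d = np.where(np.arange(100) < 50, -(b @ u1), -(b @ u2))  # half per layer, noise-free

T4 = np.einsum('ni,nj,nk,nl->ijkl', b, b, b, b)
T3 = np.einsum('n,ni,nj,nk->ijk', d, b, b, b)
T2 = np.einsum('n,ni,nj->ij', d**2, b, b)
T1 = d**3 @ b

def eps(a):
    a1, a2 = a[:P], a[P:]
    return np.sum((b @ a1 + d) ** 2 * (b @ a2 + d) ** 2)

def grad(a):
    """Stacked gradient: eq. (6.35) and its a2 counterpart."""
    a1, a2 = a[:P], a[P:]
    g1 = (2 * np.einsum('ijkl,j,k,l->i', T4, a1, a2, a2)
          + 4 * np.einsum('ijk,j,k->i', T3, a1, a2)
          + 2 * np.einsum('ijk,j,k->i', T3, a2, a2)
          + 2 * T2 @ a1 + 4 * T2 @ a2 + 2 * T1)
    g2 = (2 * np.einsum('ijkl,j,k,l->i', T4, a2, a1, a1)
          + 4 * np.einsum('ijk,j,k->i', T3, a2, a1)
          + 2 * np.einsum('ijk,j,k->i', T3, a1, a1)
          + 2 * T2 @ a2 + 4 * T2 @ a1 + 2 * T1)
    return np.concatenate([g1, g2])

def hess(a):
    """Block Hessian, eq. (6.36); T3 and T4 are symmetric in all indices."""
    a1, a2 = a[:P], a[P:]
    H11 = 2 * np.einsum('ijkl,k,l->ij', T4, a2, a2) + 4 * np.einsum('ijk,k->ij', T3, a2) + 2 * T2
    H22 = 2 * np.einsum('ijkl,k,l->ij', T4, a1, a1) + 4 * np.einsum('ijk,k->ij', T3, a1) + 2 * T2
    H12 = (4 * np.einsum('ijkl,k,l->ij', T4, a1, a2)
           + 4 * np.einsum('ijk,k->ij', T3, a1)
           + 4 * np.einsum('ijk,k->ij', T3, a2) + 4 * T2)
    return np.block([[H11, H12], [H12.T, H22]])

best = None
for _ in range(100):                    # many cheap start points
    a = rng.uniform(-2, 2, 2 * P)
    for _ in range(50):                 # Newton iteration, eq. (6.37)
        try:
            a = a - np.linalg.solve(hess(a), grad(a))
        except np.linalg.LinAlgError:
            break
        if not np.all(np.isfinite(a)) or np.linalg.norm(a) > 1e3:
            break                       # search bailed out: cancel it
    if np.all(np.isfinite(a)) and (best is None or eps(a) < eps(best)):
        best = a
```

With exact, noise-free constraints, the global minimum ε = 0 is attained at (u1, u2) and at the swapped pair; the multi-start search usually finds one of them, but as noted above there is no guarantee.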
An alternative to this approach is simulated or deterministic annealing. Annealing also suffers from the problem that you cannot be sure you have found the optimum.
6.5.3 Experimental Results
This alternative method has been implemented both for translational and affine motions. The accuracy seems not as good as that of the EM algorithm. Figure 6.7 shows results from experiments on synthetic images. Just as in figure 4.6 in chapter 4, the phase-based method is used to estimate constraints and the image is not warped to improve accuracy. The same test images are used (Lena+Debbie128), but they are superimposed with opposite motions. When estimating multiple motions, we get two motion estimates for each pixel, and it is hard to tell which corresponds to which layer. In the evaluation of accuracy, the motion estimates are therefore sorted by comparison with the known motion, which we do not consider cheating. Motion constraints for this experiment, when the motions are (1, 1) pixels in each direction, are drawn in figure 6.1.
[Plot: error in estimate (pixels, 0 to 0.3) versus true shift (pixels, 0 to 2), one curve per layer.]
Figure 6.7: Accuracy of estimation of each of two superimposed layers. For small motions, it is hard to separate the layers. Evidently, one layer yields better estimates than the other.
[6] Very fast compared to the EM algorithm.
Chapter 7
Canonical Correlation of
Complex Variables.
There is a well developed theory for canonical correlation analysis (CCA) of real variables, Borga[5]. Canonical correlation of complex variables has successfully been used in a stereo algorithm[5] without a theory for the complex case. This chapter introduces a novel way of maximizing canonical correlation, which is derived for complex variables. It is also shown to generate the same solution as Borga's[5] method, even for complex variables. Thus, Borga's method is proven to work even in the complex case. A major advantage of our novel method is the ability to handle singular covariance matrices.
This chapter is a theoretical study of canonical correlation in general. Since no images or motion vectors are involved, a number of variable names and notations can be reused for other purposes. For example, the vector v is not the motion vector.
For complex matrices, conjugate and transpose are usually applied simultaneously. This is denoted by a superscript star (*), e.g. A*. A simple transpose is denoted by a superscript T, e.g. A^T. Unfortunately, this chapter also needs the plain complex conjugate without transpose. Lacking better notation, a simple conjugate is written as a combination of a star and a transpose, e.g. A*^T.
Another commonly used notation is the operator for the expectancy value of a stochastic variable, E[·]. In practical applications, statistical data sets are limited and we need to use estimates of the expectancy value. After having verified all the formulas in the chapter, it turns out that every E[·] operator can be substituted with a sum over all available data.
7.1 Definition of Canonical Correlation of Complex Variables
The notations and formulas are similar to Borga's PhD thesis[5], except for some variable names that would cause too much confusion in image processing. Assume we have two sets of stochastic variables organized in two vectors, z_A and z_B
respectively. For each of the two vectors we construct linear combinations of the vector components,

    z_A = w_A^T z_A   and   z_B = w_B^T z_B        (7.1)

where w_A and w_B are vectors of linear combination coefficients. The canonical correlation is the correlation of these two linear combinations,

    ρ = E[z_A* z_B] / √( E[z_A* z_A] E[z_B* z_B] )
      = E[(z_A^T w_A)* z_B^T w_B] / √( E[(z_A^T w_A)* z_A^T w_A] E[(z_B^T w_B)* z_B^T w_B] )
      = w_A* C_AB w_B / √( w_A* C_AA w_A · w_B* C_BB w_B )        (7.2)

where the covariance[1] matrices are

    C_AA = E[z_A*^T z_A^T],   C_AB = E[z_A*^T z_B^T],   C_BB = E[z_B*^T z_B^T]        (7.3)

and w_A and w_B are computed to maximize the correlation.
7.2 Maximizing Canonical Correlation
The objective in canonical correlation analysis is to find the two linear combinations that yield maximum correlation, i.e. to maximize the correlation ρ with respect to w_A and w_B. In the complex case, where ρ is complex, the first issue is what to maximize: the absolute value or the real part. The following theorem implies that the absolute value and the real part can be maximized simultaneously.

Theorem 7.2.1

    max ℜρ = max |ρ|        (7.4)

Proof: It is obvious that max ℜρ ≤ max |ρ|. It remains to show that max ℜρ ≥ max |ρ|. Assume that we find w_A and w_B such that |ρ| is maximized but arg ρ ≠ 0. Then we can get a real canonical correlation with the same absolute value by multiplying w_A by e^{i arg ρ}.
At the maximum, the linear combination coefficients hold information about dependence in the input data. In learning and adaptive filtering, these linear combinations can be applied to new input data for classification. The stereo and motion algorithms in chapter 8 analyze w_A and w_B directly to find mutual dependence between the two images. A simple example of canonical correlation analysis is provided in appendix A.2.
[1] Only a true covariance if the expectancy value is zero.
7.3 Properties of the Canonical Correlation
Theorem 7.3.1

    |ρ| = |w_A* C_AB w_B| / √( w_A* C_AA w_A · w_B* C_BB w_B ) ≤ 1        (7.5)

Proof: Note that E[z_A* z_B] is a scalar product[2] of the stochastic variables z_A and z_B. Thus, it follows from the Cauchy-Schwarz inequality that the numerator is less than or equal to the denominator. In real world applications, the expectancy value E[z_A* z_B] is substituted with a sum over all available data, Σ_k z_{A,k}* z_{B,k}. This sum also meets the criteria for being a scalar product. Thus, it still holds that |ρ| ≤ 1.
7.4 Maximization Using SVD
Borga[5] transforms the maximization of canonical correlation into a generalized eigenvector problem. The formulas on page 68 in his dissertation[5] are only formulated for real parameter vectors, w_A and w_B. That proof does not hold in the complex case, since it is not possible to compute the derivative with respect to a complex conjugate (see the note in appendix A.1). It may be possible to modify Borga's proof by differentiating with respect to the real and imaginary parts separately, but we do not present any such proof. Instead, we present a novel proof and a novel method that employs neither derivatives nor a generalized eigenvector problem.
Our novel method of maximizing the canonical correlation works, unlike the scheme by Borga[5], even when the covariance matrices C_AA and C_BB are singular. We will also show that it is equivalent to Borga's method, even for the complex case of canonical correlation. Thus, we have proved that Borga's method is valid for complex variables.
7.4.1 Operations in Maximization
Since C_AA and C_BB are Hermitian and positive semidefinite, we can do eigenvalue decompositions

    C_AA = Q_A D_A Q_A*   and   C_BB = Q_B D_B Q_B*        (7.6)

where Q_A and Q_B are unitary[3] matrices and D_A, D_B are diagonal matrices whose eigenvalues are real and nonnegative. Note that one or more eigenvalues are zero in case C_AA or C_BB is singular. In practice, matrices are almost never exactly singular, just ill conditioned. Therefore, it may be necessary to threshold the eigenvalues in D_A and D_B.

[2] A scalar product in a complex vector space must conjugate one of the factors.
[3] A matrix Q is unitary if its inverse is Q*, i.e. Q Q* = I.

Define

    v_A = D_A^{1/2} Q_A* w_A   and   v_B = D_B^{1/2} Q_B* w_B        (7.7)
which is a conventional coordinate transformation in the nonsingular case. In the singular case, one or more elements in v_A or v_B are always zero. Let's also define a covariance matrix in the transformed coordinates,

    C̃_AB = D_A^{†/2} Q_A* C_AB Q_B D_B^{†/2}        (7.8)

where † denotes the pseudo inverse[4].
With this coordinate transformation, the canonical correlation can be expressed in a simple form. Thanks to the relations between C_AA, C_AB and C_BB, the following equations are valid even when D_A and D_B are singular. For readability, the proof is put in appendix A.3.

    ρ = w_A* C_AB w_B / √( w_A* C_AA w_A · w_B* C_BB w_B )
      = w_A* Q_A Q_A* C_AB Q_B Q_B* w_B / √( w_A* Q_A D_A Q_A* w_A · w_B* Q_B D_B Q_B* w_B )
      = w_A* Q_A D_A^{1/2} D_A^{†/2} Q_A* C_AB Q_B D_B^{†/2} D_B^{1/2} Q_B* w_B / √( w_A* Q_A D_A Q_A* w_A · w_B* Q_B D_B Q_B* w_B )   (see appendix A.3)
      = v_A* C̃_AB v_B / √( v_A* v_A · v_B* v_B )
      = v̂_A* C̃_AB v̂_B        (7.9)

where v̂_A and v̂_B denote the unit vectors obtained by normalizing v_A and v_B. This expression for ρ is simple to maximize with respect to v̂_A and v̂_B. At first thought, one might worry about what happens in the singular case, where eq. (7.7) imposes the constraint that some elements of v_A and v_B have to be zero. These constraints are automatically satisfied at the maximum of eq. (7.9), since the forbidden subspaces are the same as the left and right nullspaces of C̃_AB. To find the maximum,
singular value decomposition (SVD) is applied to C̃_AB,

    C̃_AB = ( e_1 e_2 e_3 ... ) diag(σ_1, σ_2, σ_3, ...) ( f_1 f_2 f_3 ... )* = Σ_k σ_k e_k f_k*        (7.10)
By convention, {e_i} and {f_i} are both sets of orthonormal vectors. The singular values are real and sorted in descending order, i.e. σ_1 ≥ σ_2 ≥ σ_3 ≥ ... ≥ 0.

[4] The pseudo inverse of a diagonal matrix is simple: just invert each of the nonzero elements. For example, the pseudo inverse of diag(2, 0) is diag(0.5, 0).

The maximum is obtained when
    v̂_A = e_1,   v̂_B = f_1,   ρ = σ_1        (7.11)

Note that the SVD is not uniquely defined in case two or more singular values are equal. If the multiplicity of the largest singular value is greater than 1, the optimal v_A and v_B are not unique.
Finally, w_A and w_B can be solved for using eq. (7.7). This solution is ambiguous in the singular case, but the pseudo inverse yields the smallest w_A and w_B,

    w_A = Q_A D_A^{†/2} v_A   and   w_B = Q_B D_B^{†/2} v_B        (7.12)
7.5 Canonical Variates
We do not know any good definition of what a canonical variate is. Borga[5] provides a definition that depends on his maximization method. In this thesis, a different maximization method is used, and a different definition needs to be introduced. In section 7.6 this definition is proved to be equivalent to Borga's definition.
In this thesis, canonical variates are defined as the (suboptimal) solutions to the canonical correlation corresponding to the different singular values in the SVD, eq. (7.10). The variate of index k is what we get if we replace eq. (7.11) with

    v̂_A = e_k,   v̂_B = f_k,   ρ = σ_k        (7.13)
7.6 Equivalence with Borga's Solution
The objective of this section is to show that the CCA-SVD method gives the same solutions as Borga's[5] method, which transforms the maximization problem into a generalized eigenvector problem. The following equations are valid even for the complex and singular cases. Thus, the equivalence proof also confirms the validity of Borga's[5] method for complex variables. Even the canonical variates are the same as in Borga's method.
Remember the singular value decomposition in eq. (7.10), C̃_AB = Σ_k σ_k e_k f_k*, and study what it means for the solutions of the following equation system,

    C̃_AB v̂_B = ρ v̂_A
    C̃_AB* v̂_A = ρ v̂_B        (7.14)
These equations are satisfied if and only if v̂_A, v̂_B and ρ are corresponding components in the SVD of C̃_AB. Or, to be exact, in case a singular value has multiplicity greater than 1, the solutions are linear combinations of SVD vectors with the same singular value. For readability, the linear combinations are not written explicitly, but they are implicit since the singular value decomposition is not unique.

    v̂_A = e_k,   v̂_B = f_k,   ρ = σ_k,   k = 1, 2, 3, ...        (7.15)
This means that the canonical variates computed by our novel SVD method are the only solutions to eq. (7.14). We want these equations in the w-coordinates, as in Borga's thesis. Use eq. (7.7) to substitute v_A and v_B, and multiply the two equations by Q_A D_A^{1/2} and Q_B D_B^{1/2} respectively. Although we multiply with D_A^{1/2}, which might be singular, we retain equivalence with eq. (7.14), since D_A^{1/2} and D_A have the same rank.

    Q_A D_A^{1/2} D_A^{†/2} Q_A* C_AB Q_B D_B^{†/2} D_B^{1/2} Q_B* w_B = ρ Q_A D_A Q_A* w_A        (7.16)
    Q_B D_B^{1/2} D_B^{†/2} Q_B* C_AB* Q_A D_A^{†/2} D_A^{1/2} Q_A* w_A = ρ Q_B D_B Q_B* w_B        (7.17)
Thanks to eq. (A.2), most of the matrix products cancel out. We arrive at the following expressions, which are equivalent to eq. (4.30) in Borga's thesis except that the w vectors are not normalized,

    C_AB w_B = ρ C_AA w_A        (7.18)
    C_AB* w_A = ρ C_BB w_B        (7.19)
We can normalize the vectors, provided we multiply the right-hand side of one equation by a scale factor and the right-hand side of the other by its inverse (Borga's variable names). Then we have exactly equation (4.30) in Borga's PhD thesis[5]. The singular values and vectors correspond to the canonical variates. Note that this proof holds even in the complex and singular cases.
To emphasize that this is a generalized eigenvalue problem, let's write it in matrix form,

    ( ( 0      C_AB )       ( C_AA   0    ) ) ( w_A )     ( 0 )
    ( ( C_AB*  0    )  − ρ  ( 0      C_BB ) ) ( w_B )  =  ( 0 )        (7.20)
We do not recommend Borga's method when C_AA and C_BB are close to singular. Experimental results (our motion algorithms, using the Matlab function eig()) indicate serious numerical problems.
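In the nonsingular case, eqs (7.18)-(7.19) combine into an ordinary eigenvalue problem, C_AA^{-1} C_AB C_BB^{-1} C_AB* w_A = ρ² w_A, which gives a quick numerical check of the equivalence. This is our own sketch, with synthetic complex data.

```python
import numpy as np

rng = np.random.default_rng(0)
n, dim = 4000, 3

def crandn(*shape):
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

s = crandn(n)
zA = np.outer(crandn(dim), s) + crandn(dim, n)
zB = np.outer(crandn(dim), s) + crandn(dim, n)
CAA = zA @ zA.conj().T / n
CAB = zA @ zB.conj().T / n
CBB = zB @ zB.conj().T / n

# Route 1: singular values of the whitened cross covariance, eqs (7.8)-(7.10).
dA, QA = np.linalg.eigh(CAA)
dB, QB = np.linalg.eigh(CBB)
Ct = (QA / np.sqrt(dA)).conj().T @ CAB @ (QB / np.sqrt(dB))
sig = np.linalg.svd(Ct, compute_uv=False)      # descending

# Route 2: the combined generalized eigenproblem, eqs (7.18)-(7.19).
M = np.linalg.inv(CAA) @ CAB @ np.linalg.inv(CBB) @ CAB.conj().T
lam = np.sort(np.linalg.eigvals(M).real)[::-1]  # eigenvalues are rho^2
```

As noted above, the explicit inverses make this second route fragile when C_AA or C_BB is ill conditioned; the SVD route avoids them.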
Chapter 8
Motion Estimation using
Canonical Correlation
Canonical correlation has been successfully used for estimation of disparity in a stereo algorithm by Borga[5]. An important advantage of that method is the ability to handle depth discontinuities. Whereas conventional stereo algorithms smooth disparity estimates across discontinuities, Borga's algorithm responds with a distinct discontinuity. Experiments with transparent layers even demonstrate an ability to estimate multiple disparities at a single point in the image.
It should be pointed out that there are other stereo algorithms that can handle depth discontinuities, e.g. Birchfield-Tomasi[4], which searches for single pixel correspondence.
One may wish there were a motion estimation algorithm with the same advantages as Borga's stereo algorithm. In case of occlusion, one wishes that motion discontinuities would be correctly estimated. One may also wish that transparent layers would give multiple motion estimates at a single point. Unfortunately, motion estimation is more complicated, due to the generalized aperture problem described in chapter 5. It may still be possible to compute motion constraints that are not smoothed across discontinuities and not much degraded by interference of multiple layers.
We have extended the stereo algorithm to estimate motions, but so far only for one motion. It remains to explore its potential in estimation of multiple motions.
8.1 Operations Applied Locally in the Image.
The image is first convolved with a number of quadrature filters and then divided into patches, e.g. blocks of size 16x16 pixels. A patch should be small enough that the motion can be considered pure translation within it. Each of these patches, for each of these filter outputs, is processed independently to get a motion constraint, c. This section describes these local operations.
[Flow chart: images I_A(x) and I_B(x) → quadrature filters in directions 0, 45, 90 and 135 degrees → local covariances C_AA, C_AB, C_BB of shifted filter outputs → maximize canonical correlation → coefficient vectors w_A, w_B → linear combinations of shifted filters f_A(x), f_B(x) → cross correlation g(v) → motion constraint c = (c_x, c_y, c_t)^T.]
Figure 8.1: From image to motion constraint for one direction and one patch. Don't forget that all intermediate values are complex numbers. A look up table can speed up computations.
8.1.1 Shifted Quadrature Filter Outputs
Each of the two original images is convolved with a number of quadrature filters, as defined in section 1.4. We have used filters in directions 0, 45, 90 and 135 degrees. Since only one filter is used to compute one motion constraint, c, the direction is dropped from our notation; for readability, we let f(x) denote the quadrature filter of any direction. We have not tried filters with different center frequencies, but we believe it would improve performance.

    q_A(x) = (f * I_A)(x)   and   q_B(x) = (f * I_B)(x)        (8.1)
These filter outputs are shifted with a number of predefined shifts, s_1, s_2, s_3, ..., and correlated. For example, in case the motion is exactly v = s_3, then q_A(x) and q_B(x + s_3) will correlate perfectly. In case v ≈ s_3, we also get a high magnitude of correlation, but the value is complex with an argument almost proportional to the difference v − s_3. This property is fundamental to the phase-based method in chapter 4.
The method in this chapter is based on finding the linear combinations of shifted filter outputs,

    Σ_i w_Ai q_A(x + s_i)   and   Σ_i w_Bi q_B(x + s_i),        (8.2)

that have the highest possible correlation. The coefficients are complex and we arrange them in vectors,

    w_A = (w_A1, w_A2, w_A3, ...)^T   and   w_B = (w_B1, w_B2, w_B3, ...)^T.        (8.3)
8.1.2 Canonical Correlation
For each filter direction and each patch, canonical correlation is used to find the linear combinations of shifted filter outputs in eq. (8.2) that have maximum correlation. The patch region in the image is denoted N. The unknown coefficients in the linear combinations are organized in the vectors w_A and w_B. In terms of these notations, we want to maximize the following correlation under the constraint that it is real and positive,

    ρ = max_{w_A, w_B} ∫∫_N (Σ_i w_Ai q_A(x + s_i))* (Σ_i w_Bi q_B(x + s_i)) dx / √( ∫∫_N |Σ_i w_Ai q_A(x + s_i)|² dx · ∫∫_N |Σ_i w_Bi q_B(x + s_i)|² dx )
      = max_{w_A, w_B} w_A* C_AB w_B / √( w_A* C_AA w_A · w_B* C_BB w_B )        (8.4)
[Diagram: nine shift vectors s_1, ..., s_9, with each component in {−2, 0, +2} pixels.]
Figure 8.2: A set of shifted quadrature filters in directions 0, 45, 90 and 135 degrees that are used in the experiments in section 8.4.
This is the form of canonical correlation where C_AA, C_AB and C_BB are covariance matrices. The element at row m and column n of each covariance matrix is computed as

    C_AA,mn = ∫∫_N q_A(x + s_m)* q_A(x + s_n) dx        (8.5)
    C_AB,mn = ∫∫_N q_A(x + s_m)* q_B(x + s_n) dx        (8.6)
    C_BB,mn = ∫∫_N q_B(x + s_m)* q_B(x + s_n) dx        (8.7)
The canonical correlation is maximized using the SVD-based method from chapter 7, which can handle singular covariance matrices. In practice the matrices are virtually never singular, just ill conditioned, and therefore we threshold the eigenvalues of the covariance matrices in eq. (7.6). The threshold should be much higher than what can be justified by errors in floating point arithmetic. In order to reject weak features in the images, the threshold in our implementation is set to 1/1000 of the largest eigenvalue. The exact value of the threshold is probably not important and can vary by several orders of magnitude without significant changes in the motion estimates.
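As a concrete illustration of eqs (8.5)-(8.7), here is a 1D sketch of our own (the thesis works in 2D with proper quadrature filters; the analytic signal below is only a stand-in): the covariance matrices are plain sums of conjugate products of shifted filter outputs over the patch.

```python
import numpy as np

rng = np.random.default_rng(0)
signal = rng.standard_normal(256)
sigA, sigB = signal, np.roll(signal, 3)   # B is A moved by 3 pixels

def analytic(x):
    """1D analytic signal (positive frequencies only), standing in for a
    quadrature filter output."""
    X = np.fft.fft(x)
    h = len(x) // 2
    X[h + 1:] = 0
    X[1:h] *= 2
    return np.fft.ifft(X)

qA, qB = analytic(sigA), analytic(sigB)
shifts = [-2, 0, 2]
patch = slice(64, 128)                    # the patch region N

def cov(p, q):
    """C_mn = sum over the patch of p(x + s_m)* q(x + s_n), eqs (8.5)-(8.7)."""
    return np.array([[np.vdot(np.roll(p, -sm)[patch], np.roll(q, -sn)[patch])
                      for sn in shifts] for sm in shifts])

CAA, CAB, CBB = cov(qA, qA), cov(qA, qB), cov(qB, qB)
```

C_AA and C_BB are Hermitian and positive semidefinite by construction, so the eigenvalue thresholding from chapter 7 applies directly.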
8.1.3 Correlation of Filters
Maximization of canonical correlation means finding the linear combinations of filter outputs that yield maximum correlation. In the previous sections, we found the vectors of coefficients, w_A and w_B, such that the maximum correlation is obtained for

    Σ_i w_Ai (I_A * f)(x + s_i)   and   Σ_i w_Bi (I_B * f)(x + s_i).        (8.8)
Thanks to the properties of convolution, it makes sense to study the linear combinations of filters

    f_A(x) = Σ_i w_Ai f(x + s_i)   and   f_B(x) = Σ_i w_Bi f(x + s_i)        (8.9)

instead of the filter outputs. Convolving the images with these filters is the same as convolving the images with each of the original filters and then computing linear combinations. In the sense of correlation, these are the best possible linear combinations of the original filters. The motion can be estimated by analyzing these filters.
The filters obtained by linear combinations of quadrature filters in the same direction are also quadrature filters. This statement is obvious if we think of the filter summation in the Fourier domain: since all the added filters are zero in one half plane, the sum is also zero in that half plane.
Since the two images are similar except for a shift, i.e. I_A(x) = I_B(x + v), the computed filters should also be similar, except for an equally large shift in the opposite direction, f_A(x) = f_B(x − v). To find the correct motion, v, we analyze the cross correlation of the generated filters,

    g(v) = ∫∫ f_A(x + v)* f_B(x) dx        (8.10)

In a perfect world, the cross correlation g(v) has a peak value where v is the image motion. This peak value is real and positive, i.e. the phase crosses zero. In practice, the zero crossing of the phase does not perfectly coincide with the maximum amplitude. Just finding correlation peaks is of limited use in image regions that only have structure in one orientation, e.g. a straight line or edge. Phase is used since it is aware of the aperture problem. We also believe that zero crossings are more accurate than maximum amplitude. The phase of g(v) crosses zero along curves in (v_x, v_y)-space. Usually, there are several curves, but the curve with the highest amplitude is probably the one corresponding to the image motion. How to analyze the cross correlation g(v) is described in section 8.1.5. The next section describes how to compute g(v) using a look up table.
8.1.4 Look Up Table (LUT)
Since the generated filters f_A(x) and f_B(x) are linear combinations of a set of original filters, their cross correlation g(v) is a sum of cross correlations of the original shifted filters. The LUT is computed by explicitly shifting filters. For
Figure 8.3: Zero crossings of the phase for all the patches in an image with affine motion. In each subplot, the zero crossings, arg g(v) = 0, are drawn for each of the filter directions. Most zero crossings are straight lines; sometimes there are multiple false zero crossings. Since the motion is not pure translation, the intersections have different positions for different patches.
subpixel accuracy, the shifts are implemented as multiplications in the Fourier domain. In matrix form, the generated filters can be expressed as

    f_A(x) = Σ_i w_Ai f(x + s_i) = ( f(x + s_1) ... f(x + s_N) ) w_A.        (8.11)
The cross correlation is a product of the coefficient vectors from the canonical correlation and a matrix whose elements are cross correlations of the original filters,

    g(v) = ∫∫ f_A(x + v)* f_B(x) dx
         = ∫∫ [ ( f(x + v + s_1) ... f(x + v + s_N) ) w_A ]* ( f(x + s_1) ... f(x + s_N) ) w_B dx
         = w_A* G(v) w_B        (8.12)

where

    G(v) = ∫∫ ( f(x + v + s_1), ..., f(x + v + s_N) )*^T ( f(x + s_1) ... f(x + s_N) ) dx        (8.13)
G(v) is a look up table (LUT) that is precomputed for a number of different values of v. Since subpixel shifts are necessary, there is a question of which interpolation method to use. For this particular data, we have chosen phase shifts in the Fourier domain, since we are not worried about ringing in the spatial domain. In order to reduce the effects of circular shifts, the filters are zero padded at the borders before computing the FFT. Zero padding is equivalent to denser sampling in the Fourier domain. For computational efficiency, Plancherel's formula may be used to compute the cross correlation directly in the Fourier domain, avoiding the inverse FFT.
    G_mn(v) = ∫∫ f(x + v + s_m)* f(x + s_n) dx
            = (2π)^{-2} ∫∫ ( F(u) e^{iu^T(v + s_m)} )* F(u) e^{iu^T s_n} du
            = (2π)^{-2} ∫∫ |F(u)|² e^{iu^T(s_n − s_m − v)} du        (8.14)

where F(u) is the Fourier transform of f(x).
In the next section, interpolation is used to compute g(v) for values of v that are not in the look up table. Bilinear interpolation is used, but not directly on the real and imaginary parts. Instead, the interpolation is done on a polar representation of the complex numbers, because the phase is more linear than the real and imaginary parts. This interpolation also enables us to compute derivatives of the phase.
8.1.5 Motion Constraints from Correlation Data
The motion to estimate, v, is assumed to lie on one of the zero crossings of the phase of the correlation map. This yields a nonlinear constraint on the local motion, arg g(v) = 0.
In order to keep the computations reasonably simple, the nonlinear constraint is approximated by a linear motion constraint, c^T v = 0, as defined in section 2.2, with notations as in previous chapters,

    v = (v_x, v_y, 1)^T   and   c = (c_x, c_y, c_t)^T.        (8.15)
Unfortunately, there are often multiple zero crossings of the phase. In addition,
the zero crossings are not along straight lines. For that reason, it is necessary to
know roughly what the motion is before converting to a linear motion constraint.
Assuming the motion is close to v0 , we think that a linear motion constraint, c,
should have the following property
(8.16)
C arg g(v) = cT v + O(kv , v0 k2 ):
The solution is
c x
(8.17)
cy = C r arg g(v0 ) and ct = C arg g(v0 ) , cxv0;x , cy v0;y
where C is a condence measure set to
( 1
1
3
C = ( 1 , , 1 ,2 ) jg(v)j kr arg g(v)k if > 2
(8.18)
0
otherwise
where 1 = 1:001, 2 = 0:98 and 3 = 1 are constants chosen by studying a few
experiments. Note that the magnitude of the gradient of the phase is included in
both eq (8.17) and eq (8.18).
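The linearization in eq. (8.17) can be sketched in a few lines (our own illustration; `arg_g` and `grad_arg_g` stand in for values read off the correlation LUT, and the toy phase below is a made-up example). For a phase that really is linear in v, the resulting constraint is satisfied exactly at the true motion:

```python
import numpy as np

def linear_constraint(arg_g, grad_arg_g, v0, C=1.0):
    """Build the linear motion constraint c = (cx, cy, ct), following
    eq. (8.17): spatial components from the phase gradient at v0, and the
    temporal component from the phase value, re-anchored at v0."""
    cx, cy = C * grad_arg_g(v0)
    ct = C * arg_g(v0) - cx * v0[0] - cy * v0[1]
    return np.array([cx, cy, ct])

# Toy phase that is exactly linear: arg g(v) = a . (v - v_true). The
# constraint c^T (v, 1) should then vanish exactly at the true motion.
a = np.array([0.7, -0.3])
v_true = np.array([0.4, 0.1])
arg_g = lambda v: a @ (v - v_true)
grad_arg_g = lambda v: a

c = linear_constraint(arg_g, grad_arg_g, v0=np.array([0.0, 0.0]))
residual = c @ np.array([v_true[0], v_true[1], 1.0])   # ~ 0 at v_true
```

With a real LUT the phase is only approximately linear, which is why the rough estimate v_0 and the iterations of figure 8.4 are needed.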
8.2 Fitting Motion Model to Data
The image is divided into patches that each yield as many motion constraints as there are directions of quadrature filters. These constraints are combined according to the theory of motion models in chapter 3 and produce a motion estimate. Instead of iterative refinement as described in chapter 4, we iterate without warping the image: the motion constraints are recomputed for the updated motions, used as v_0 in eq. (8.17) and figure 8.4.
8.3 Choosing Patch Size
A small patch often contains too little information. For the canonical correlation to be meaningful, a patch must contain at least as many pixels as there are shifts per filter. But even if there are fewer pixels, there is still a chance that canonical correlation finds a good pair of linear combinations.

[Figure 8.4 here: block diagram with stages quadrature filter -> CCA -> correlate filters (LUT) -> compute c -> fit motion, and a feedback arrow from the fitted motion back to the constraint computation.] Figure 8.4: Flow chart of our CCA-based motion estimation, seen from a single patch. Computation of motion constraints requires that the motion is known approximately. Since we can only make a rough guess, a number of iterations is necessary.

A too large patch, on the other hand, will not reflect the local structure in the image; it will rather tend to reflect the global distribution. Large patches also have problems with motions that are not pure translations. The error in estimation of rotations is probably proportional to the patch size (for large patches). A few experiments on affine motions suggest that the error is roughly a certain fraction of the variations within a single patch.

In addition, the larger the patches are, the fewer they get, and thus a lot of information seems to be thrown away. More patches yield more motion constraints.
8.4 Experimental Results
Figure 8.5 shows the accuracy of motion estimation on an image with synthetically introduced motions. The famous test image Lena (512x512) has been shifted in several different directions and over several distances. The images have then been subsampled to 128x128 pixels in order to hide the artifacts introduced by the subpixel shifts. Thus we have good test images of size 128x128 pixels for which we know the true answer. The motion is estimated and the mean square deviation is plotted for each magnitude of shift. Since the lookup table is only computed for shifts smaller than 2 pixels, it is impossible to estimate larger motions.

In the experiment, the center frequency of the filter is roughly 1 rad/pixel (the filter is taken off the shelf, with internal name orient8 in GOP), the patch size is 16x16 pixels and the filter outputs are shifted according to s_i in figure 8.2.
[Figure 8.5 here: plot of error in estimate (pixels, 0 to 0.01) versus true shift (pixels, 0 to 2).] Figure 8.5: Accuracy is very good for this synthetically shifted image, Lena128. The mean square error is plotted versus the amount the image is shifted. (Do not compare with figure 4.6, where no iterations are done.)
8.5 Future Development
The experimental results on synthetic images are in themselves a justification for the research we have done so far. Our future goal is still the estimation of multiple motions, which we have not yet tried. The difficulty compared to the stereo algorithm is again the general aperture problem described in chapter 5.
8.5.1 Using Multiple Variates
The canonical correlation generates multiple canonical variates. Most of these canonical variates yield high correlation and similar cross correlations of the generated filters, g(v). It may be possible to use more variates than the first one, but so far we have not seen any significant improvement in experimental results.
8.5.2 Other Filters than Quadrature Filters
We have performed some experiments on replacing the quadrature filters by pairs of odd and even real filters. The purpose is to allow more degrees of freedom in the canonical correlation by allowing any linear combination of odd and even parts. Then f_A(x) and f_B(x) are sums of real filters that are both even and odd. Thus the generated filters are not quadrature filters, and the cross correlation must be done in a different way. In our experiments, we transform the filters to the Fourier domain and multiply one of them by e^{i\varphi}, where \varphi denotes the angle in the polar representation of the frequency in the Fourier domain. After cross correlation, the magnitude of g(v) is then zero at the motion. Unfortunately, zero crossings of the magnitude are harder to find than zero crossings of the phase. In particular, it gets harder to estimate large motions, since there are more zero crossings of the magnitude.
8.5.3 Reducing Patch Size
Maybe reducing the patch size would help in estimating non-translational motions and motions of multiple transparent layers. It may be possible to reduce the patch size if fewer shifts, s_i, are used. Maybe the set of shifts should depend on the direction of the filter. It may also help to have different shifts for the two images in case the motion is roughly known, as in iterative refinement. In the extreme case, if only one shift is used for every image, we get something similar to the phase-based method with warping in chapter 4.

We suggest being careful with such approaches. For example, choosing shifts only along a line would mean that the motion is estimated in a separable fashion, and we are back at the mistake described in section 2.1.1.
Appendices
A Details for Chapter 7 on Canonical Correlation
A.1 Failure to Compute Derivative with Respect to a Complex Variable
Calculus with complex variables often obeys the same rules as calculus with real variables. For that reason, it is easy to forget that the same rules do not always apply. Of interest for this chapter, we will show why it is not possible to compute the derivative of the complex conjugate. Let f(z) = \bar{z}; then the derivative is defined as a limit that does not exist, since h is a complex number:

f'(z) = \lim_{|h| \to 0} \frac{f(z+h) - f(z)}{h} = \lim_{|h| \to 0} \frac{\overline{(z+h)} - \bar{z}}{h} = \lim_{|h| \to 0} \frac{\bar{h}}{h}.    (A.1)

Of course, it is still possible to split the complex variable into real and imaginary parts, z = a + ib, and then calculate the partial derivatives \partial f(a+ib)/\partial a and \partial f(a+ib)/\partial b.
A.2 Beginner's Example of Canonical Correlation
Assume X_1, X_2, X_3, X_4 are independent stochastic variables with zero mean and standard deviation \sigma = 1. Let

z_A = \begin{pmatrix} X_1 + X_2 \\ X_1 - X_2 \\ X_3 \end{pmatrix} \quad \text{and} \quad z_B = \begin{pmatrix} X_1 \\ X_4 \end{pmatrix}.

Note that the only variable that appears in both data sets is X_1. For this simple case, it is obvious that the maximum correlation is between z_{A,1} + z_{A,2} and z_{B,1}.
To verify the CCA algorithm, go through the formal computations:

C_{AA} = E[z_A z_A^T] = E\begin{pmatrix} (X_1+X_2)^2 & (X_1+X_2)(X_1-X_2) & (X_1+X_2)X_3 \\ (X_1-X_2)(X_1+X_2) & (X_1-X_2)^2 & (X_1-X_2)X_3 \\ X_3(X_1+X_2) & X_3(X_1-X_2) & X_3^2 \end{pmatrix} = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{pmatrix}

C_{AB} = E[z_A z_B^T] = E\begin{pmatrix} (X_1+X_2)X_1 & (X_1+X_2)X_4 \\ (X_1-X_2)X_1 & (X_1-X_2)X_4 \\ X_3 X_1 & X_3 X_4 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 0 \end{pmatrix}

C_{BB} = E[z_B z_B^T] = E\begin{pmatrix} X_1 X_1 & X_1 X_4 \\ X_4 X_1 & X_4 X_4 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}

Maximization gives

w_A = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}, \quad w_B = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \quad \text{and} \quad \rho = 1.

This time \rho = 1, which means that the two linear combinations are always equal, except for a scalar factor. This is not typical in real-world applications, where it is common that no linear combination gives perfect correlation.
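The example can be verified numerically (our own sketch; the generalized-eigenvalue formulation below is a standard way of solving CCA, not necessarily the algorithm of chapter 7):

```python
import numpy as np

# Covariance matrices from the beginner's example above.
C_AA = np.diag([2.0, 2.0, 1.0])
C_AB = np.array([[1.0, 0.0],
                 [1.0, 0.0],
                 [0.0, 0.0]])
C_BB = np.eye(2)

# The squared canonical correlations are the eigenvalues of
# C_AA^{-1} C_AB C_BB^{-1} C_BA (a standard CCA result).
M = np.linalg.inv(C_AA) @ C_AB @ np.linalg.inv(C_BB) @ C_AB.T
rho = np.sqrt(np.max(np.linalg.eigvals(M).real))
```

The largest canonical correlation comes out as 1, matching the hand computation.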
A.3 Proof of Equation (7.9)
This section proves that, given the variable definitions in chapter 7,

D_A D_A^\dagger \, Q_A C_{AB} Q_B^* \, D_B^\dagger D_B = Q_A C_{AB} Q_B^*.    (A.2)

In the singular case, D_A D_A^\dagger \neq I and/or D_B^\dagger D_B \neq I, but we will show that eq. (7.9) is still valid thanks to the relations between C_{AA}, C_{AB} and C_{BB}. It is enough to prove one half of the theorem,

Q_A C_{AB} = D_A D_A^\dagger \, Q_A C_{AB}.    (A.3)

The other part of the theorem, C_{AB} Q_B^* = C_{AB} Q_B^* D_B^\dagger D_B, can be proved in the same way.
Before going into the core of the proof, note that D_A D_A^\dagger is equal to the identity matrix, except in the positions where D_A is zero. For example, if

D_A = \begin{pmatrix} 3.26 & 0 & 0 \\ 0 & 56.31 & 0 \\ 0 & 0 & 0 \end{pmatrix} \quad \text{then} \quad D_A D_A^\dagger = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}.

The definitions of these covariance matrices imply that null spaces of C_{AA} and C_{BB} are also left and right null spaces of C_{AB}. Let us apply a coordinate transformation to obtain a form of the canonical correlation which is useful in the proof:
\rho = \frac{w_A^* Q_A^* \, Q_A C_{AB} Q_B^* \, Q_B w_B}{\sqrt{w_A^* Q_A^* D_A Q_A w_A} \, \sqrt{w_B^* Q_B^* D_B Q_B w_B}} = \frac{u_A^* \, Q_A C_{AB} Q_B^* \, u_B}{\sqrt{u_A^* D_A u_A} \, \sqrt{u_B^* D_B u_B}}    (A.4)

where

u_A = Q_A w_A, \quad u_B = Q_B w_B.    (A.5)

Here comes the core of the proof. Pick arbitrary u_A and u_B and split the former into two parts,

u_{A\parallel} = D_A D_A^\dagger u_A, \quad u_{A\perp} = (I - D_A D_A^\dagger) u_A.    (A.6)

Note that u_{A\parallel} + u_{A\perp} = u_A. We will prove that the error in the numerator of eq. (A.4) is zero:

\varepsilon = u_A^* Q_A C_{AB} Q_B^* u_B - u_A^* D_A D_A^\dagger Q_A C_{AB} Q_B^* u_B
            = u_A^* (I - D_A D_A^\dagger) Q_A C_{AB} Q_B^* u_B
            = (u_{A\parallel} + u_{A\perp})^* (I - D_A D_A^\dagger) Q_A C_{AB} Q_B^* u_B
            = u_{A\perp}^* Q_A C_{AB} Q_B^* u_B.    (A.7)

To prove that u_{A\perp}^* Q_A C_{AB} is zero, we employ a simple trick: study the correlation when the coefficients of the linear combinations are u_{A\perp} and u_B,

\rho(u_{A\perp}, u_B) = \frac{u_{A\perp}^* \, Q_A C_{AB} Q_B^* \, u_B}{\sqrt{u_{A\perp}^* D_A u_{A\perp}} \, \sqrt{u_B^* D_B u_B}} = \frac{\varepsilon}{\sqrt{0} \, \sqrt{u_B^* D_B u_B}}.    (A.8)

Recall that the canonical correlation is bounded, |\rho| \le 1, eq. (7.2.1). Thus a zero in the denominator implies a zero in the numerator. Since \varepsilon = 0 for arbitrary u_A and u_B, eq. (A.3) is proven.
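The key facts used in the proof can be sanity checked numerically (our own check, with made-up matrices; Q_A is taken as the identity for simplicity): D_A D_A^\dagger is the identity except where D_A is zero, and it leaves Q_A C_{AB} unchanged whenever the rows of Q_A C_{AB} at the zero eigenvalues vanish, which is exactly eq. (A.3).

```python
import numpy as np

# Singular diagonal matrix as in the footnote example above.
D_A = np.diag([3.26, 56.31, 0.0])
P = D_A @ np.linalg.pinv(D_A)          # identity except where D_A is zero

Q_A = np.eye(3)                        # simplifying assumption (ours)
C_AB = np.array([[1.0, 0.0],
                 [1.0, 0.0],
                 [0.0, 0.0]])          # third row lies in the null space of D_A

lhs = P @ Q_A @ C_AB                   # should equal Q_A C_AB, eq. (A.3)
```
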
B Variable Names
All variable names that are used without immediate explanation are listed here.
Most variable names are local to each chapter, but some variables are used throughout the thesis. The right column indicates where each variable is defined. An introduction to our style and notation (except for variable names) is provided in section 1.3.
B.1 Global Variable Names
The following notations are used in many chapters without immediate explanation at every occurrence.
v = (v_x, v_y)^T         image motion                                      eq. (3.3)
c = (c_x, c_y, c_t)^T    motion constraint, such that c^T v = 0            eq. (2.2)
x = (x, y)^T             spatial position                                  eq. (3.2)
x_k                      often spatial coordinate of constraint c_k,       eq. (3.12)
                         with index k
a                        vector of parameters for motion model             eq. (3.7)
K(x)                     matrix of basis functions for motion model        eq. (3.8)
I_A(x), I_B(x)           two images that are input to motion estimation    section 4.2.1
B.2 Local Variable Names in Chapter 3
\bar{v} = (v; 1)         image motion with an extra element = 1            eq. (3.11)
\bar{a} = (a; 1)         model parameter vector with an extra element = 1  eq. (3.11)
\bar{K}(x)               matrix of basis functions with extra element      eq. (3.11)
\varepsilon(a)           error when fitting model                          eq. (3.12)
k                        often index of motion constraint (joint index     chapter 3
                         for spatial position, filter direction etc.)
Q                        symmetric matrix defining quadratic form          eq. (3.13)
\bar{Q}, q               submatrix/vector of Q                             eq. (3.14)
P                        matrix defining cost function                     section 3.3
\lambda                  scalar multiplier of cost                         eq. (3.18)
B.3 Local Variable Names in Chapter 4
f_j(x)                   quadrature filter with index j                    eq. (4.2)
\hat{n}                  direction of quadrature filter                    eq. (4.1)
q_{A,j}(x), q_{B,j}(x)   output from quadrature filter with index j,       eq. (4.2)
                         convolved with images A and B respectively
\varphi_{A,j}(x)         phase computed from image A and filter j          eq. (4.3)
C                        confidence in constraint c                        eq. (4.4),
                                                                           eq. (4.10)
B.4 Local Variable Names in Chapter 5
M                        number of layers
N                        number of parameters in motion model
B.5 Local Variable Names in Chapter 6
This chapter uses the same notations as in chapter 3 and the following.
a_n                      parameters describing motion of layer n
m_{n,l}                  mixture probability: the probability of observing
                         a constraint for layer n in a warped image with
                         index l
q_{nkl}                  owner probability: the probability that the
                         particular constraint c_{kl} belongs to layer n
n                        often index of motion layer
k                        often index of motion constraint c_k (joint index
                         for spatial position, filter direction etc.)
l                        often the index of the image warped according to
                         the estimated motion with index (n =) j
P(X|Y)                   conditional probability density function of X
                         when Y is known
\nabla_a                 gradient with respect to the variables in vector a
d(c, v)                  distance from a motion to a given constraint      eq. (6.19)
\varepsilon(a_1, a_2)    error measure in alternative method               eq. (6.25)
T4_{ijkl}, T3_{ijk},     tensors with 4, 3, 2, 1, 0 indices of fourth      eq. (6.30)
T2_{ij}, T1_i, T0        moments of motion constraints
B.6 Local Variable Names in Chapter 7
z_A, z_B                 vectors of input data (stochastic variables)      section 7.1
w_A, w_B                 vectors with coefficients for linear combination  eq. (7.1)
                         of the elements in z_A and z_B; the correlation
                         is maximized with respect to w_A and w_B
\tilde{z}_A, \tilde{z}_B linear combinations of stochastic variables,      eq. (7.1)
                         \tilde{z}_A = w_A^T z_A
\rho                     canonical correlation                             eq. (7.2)
C_AA, C_AB, C_BB         covariance matrices                               eq. (7.6)
D_A, D_B                 diagonal matrices in the eigenvalue               eq. (7.6)
                         decomposition of C_AA and C_BB; all elements
                         are real and nonnegative
Q_A, Q_B                 transformation matrices in the eigenvalue         eq. (7.7)
                         decomposition of C_AA and C_BB; complex
                         elements, unitary
v_A, v_B                 transformed vectors of w_A and w_B                eq. (7.3)
\tilde{C}_AB             transformed covariance matrix C_AB                eq. (7.8)
\hat{v}_A, \hat{v}_B     normalized unit vectors of v_A and v_B            eq. (7.9)
D^\dagger                pseudo inverse of matrix D                        eq. (7.8)
\sigma_k, e_k, f_k       SVD of \tilde{C}_AB                               eq. (7.10)
B.7 Local Variable Names in Chapter 8
f(x)                     quadrature filter (one out of several in          eq. (8.1)
                         different directions)
q_A(x), q_B(x)           outputs from some quadrature filter f(x)          section 8.1.1
                         applied on images I_A(x) and I_B(x)
s_1, s_2, ...            shifts of quadrature filter outputs when          section 8.1.1
                         computing covariance matrices
w_A, w_B                 coefficients for linear combination               eq. (8.3)
N                        patch region in the image                         section 8.1.2
\rho                     canonical correlation                             eq. (8.4)
C_AA, C_AB, C_BB         covariance matrices for canonical correlation     section 8.1.2
                         as in chapter 7
f_A(x), f_B(x)           generated filters; linear combinations of         eq. (8.9)
                         shifted original filters f(x)
g(v)                     cross correlation of generated filters            eq. (8.10)
                         (complex value)
G(v)                     lookup table (LUT); a matrix for each v           eq. (8.13)