Adaptive Multidimensional Filtering

Leif Haglund

To Christina, Lina and Maja
Abstract
This thesis contains a presentation and an analysis of adaptive filtering strategies
for multidimensional data. The size, shape and orientation of the filter are signal
controlled and thus adapted locally to each neighbourhood according to a predefined
model. The filter is constructed as a linear weighting of fixed oriented bandpass filters having the same shape but different orientations. The adaptive filtering methods have been tested on both real and synthesized test data in 2D (e.g. still images) and 3D (e.g. image sequences or volumes), with good results. In 4D (e.g. volume sequences), the algorithm is given in its mathematical form. The weighting coefficients are given by the inner products of a tensor representing the local structure of the data and the tensors representing the orientation of the filters.
The procedure and the filter design used in estimating the representation tensor are described. In 2D, the tensor contains information about the local energy, the optimal
orientation and a certainty of the orientation. In 3D, the information in the tensor
is the energy, the normal to the best fitting local plane and the tangent to the best
fitting line, and certainties of these orientations. In the case of time sequences, a
quantitative comparison of the proposed method and other (optical flow) algorithms
is presented.
The estimation of control information is made at different scales. There are two main reasons for this. First, a single filter has a limited pass band which may or may not be tuned to the differently sized objects to be described. Second, size or scale is a descriptive feature in its own right. Both of these require the integration of measurements from different scales. The increasing interest in wavelet theory supports the idea that a multiresolution approach is necessary. Hence the resulting adaptive filter will also adapt in size and to different orientations at different scales.
Acknowledgements
I would like to express my gratitude to those who have helped me in pursuing this
work:
To Gösta Granlund, Professor of Computer Vision at LiTH, for introducing me to the exciting field of computer vision and for giving me the opportunity to work in his group. His ideas about vision have flavored most of the algorithms in this thesis.
To Dr. Hans Knutsson, for always finding time for discussions and for sharing
his immense scientific knowledge. He has been a constant source of new ideas,
explanations and comments, without which this thesis would have been considerably
thinner. His comments regarding the presentation of the work in this thesis have also
improved the final result.
To all the members of the Computer Vision Laboratory for the stimulating and
friendly atmosphere in the group and for numerous comments and discussions on
scientific and philosophical subjects.
To Carl-Fredrik Westin for all the time he has spent reading drafts of this thesis.
His comments, regarding both scientific and editorial issues, have improved the final
result considerably.
To Tomas Landelius for his implementations of the applications in Chapter 8.
To Catharina Holmgren, Professor Roland Wilson, Dr. Mats Andersson, and
Dr. Håkan Bårman for proof-reading different parts of the manuscript.
To the Swedish National Board for Technical Development and Prometheus Sweden, for their financial support of this work.
Finally to somebody I haven’t seen much during the last hectic months, but who
means very much to me. Thank you, Christina, for your patience, love and support,
and most of all for just being. By taking the utmost care of the daily life and of our
two daughters Lina and Maja, you made it all possible.
Contents

1 Introduction
  1.1 Organization of the Thesis
  1.2 Previous Publications
  1.3 Notations

I Filter Design and Scale Analysis

2 Background

3 Filter Design and Wavelets
  3.1 Quadrature Filters
  3.2 Radial Filter Functions
    3.2.1 General Design Principles
    3.2.2 Wavelet Transform
    3.2.3 The Lognormal Filter
    3.2.4 Conclusion
  3.A Filter Optimization

4 On the Use of Phase in Scale Analysis
  4.1 Phase Estimation in Images
    4.1.1 Representation
    4.1.2 Experimental Results
  4.2 Feature Scale-Space Utilizing Phase
    4.2.1 Scale Space Clustering
    4.2.2 Spatial Frequency Estimation
    4.2.3 Conclusions
  4.A Gaussian Resampling of Images
    4.A.1 Subsampling
    4.A.2 Interpolation to a Higher Sampling Rate

II 2D — Adaptive Filtering

5 Adaptive Filtering of 2D Images
  5.1 Representation and Estimation of Orientation
    5.1.1 Orientation Representation
    5.1.2 Orientation Estimation
  5.2 Orientation Adaptive Filtering
    5.2.1 Implementation

6 Scale Adaptive Filtering
  6.1 The Algorithm
    6.1.1 Overview
    6.1.2 Generation of the Bandpass Pyramid
    6.1.3 Estimation of Orientation
    6.1.4 Consistency of Orientation
    6.1.5 Determination of Parameters
    6.1.6 Adaptive Filtering
    6.1.7 Reconstruction
  6.2 Results
  6.A Calculations of the Energy in the Bandpass Pyramid
  6.B The Transfer Function

III 3D — Adaptive Filtering

7 Orientation Representation and Estimation in 3D
  7.1 Orientation Representation
    7.1.1 Implementation of the Orientation Algorithm
    7.1.2 Evaluation of the Representation Tensor
    7.1.3 Accuracy of the Orientation Estimate
  7.2 Optical Flow Estimation
    7.2.1 Velocity Estimation
    7.2.2 Spatio-Temporal Channels
    7.2.3 Quantitative Error Measurements
  7.A Filtering of Interlaced Video Signals

8 Applications in Structure from Motion
  8.1 Extraction of Focus of Expansion
  8.2 Motion Stereo Algorithm
    8.2.1 Results

9 Adaptive Filtering of 3D Signals
  9.1 The Algorithm
  9.2 Implementation
  9.3 Results

IV 4D — Adaptive Filtering

10 Filtering of 4D Signals
  10.1 Orientation Representation and Estimation
  10.2 Adaptive Filtering of 4D Signals
Chapter 1
Introduction
The main theme of this thesis is to present an adaptive filtering structure based on
estimated parameters for local models. The models are based on the observation
that complex high dimensional signals locally have very few degrees of freedom.
For images it is believed that local one-dimensionality, i.e. a small neighbourhood
varying in just one orientation, is a generally useful model for vision systems.
The chosen approach is inspired and guided by previous work at the Computer
Vision Laboratory (CVL) at Linköping University, [39, 67, 40, 66, 97, 62, 14]. The
main line of research at CVL is the design of robust operations which can work
in a hierarchical, or modular, system. Each module of the system is designed to
produce estimates that can be used directly by other modules. Larger systems can
then be designed using these modules as building blocks. In multi-level systems the
information representation becomes crucial. A main guideline for the work at CVL
is that the behavior of the developed modules should be robust and that ordinary
signal processing tools, e.g. differentiation and averaging, should be meaningful to
apply. This implies that small changes in the input signal should not cause large
changes in the output, which may seem a simple rule, but it has profound implications. One consequence of the robustness rule is the use of tensors as a general way
of representing information, [59, 8, 94]. In some cases first-order tensors, i.e. vectors, may suffice. The norm of the tensor represents the strength of the represented event. The eigenvectors and eigenvalues of the tensor give a full representation of
the type and the certainty of the event. An important feature of this representation
is that the norm of the difference between two tensors will indicate how different
the represented events are.
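As a small illustration of this last property, the sketch below (hypothetical code, not from the thesis) builds 2D orientation tensors of the form T = ||T|| ê ê^T and compares events through the norm of the tensor difference; note that ê and -ê map to the same tensor, so the representation has no 180-degree ambiguity.

```python
import numpy as np

def orientation_tensor(strength, angle):
    """T = strength * e e^T with e = (cos a, sin a); e and -e give the
    same tensor, so opposite vectors represent the same orientation."""
    e = np.array([np.cos(angle), np.sin(angle)])
    return strength * np.outer(e, e)

T1 = orientation_tensor(1.0, 0.0)        # horizontal structure
T2 = orientation_tensor(1.0, np.pi)      # same structure, flipped vector
T3 = orientation_tensor(1.0, np.pi / 2)  # orthogonal structure

print(np.linalg.norm(T1 - T2))  # 0.0   -> identical events
print(np.linalg.norm(T1 - T3))  # ~1.41 -> maximally different orientation
```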
In this thesis the tensor representation is utilized for representing the control
information in an adaptive filtering strategy. The robustness and accuracy of the
estimated tensor in the case of 3D signal spaces is also evaluated.
1.1 Organization of the Thesis
The thesis is divided into four different parts. The first part presents the filters
used and also describes a scale analyzing scheme for 2D images. The following
three parts describe the adaptive filter algorithm for 2D, 3D and 4D signal spaces.
In each of these three parts the estimation of local orientation is described and
examples of other applications where the estimated features have been used are
given. The results are in a number of cases compared to results of other (widely)
used algorithms. Problems associated with signals of different dimensionality are in
many ways qualitatively different. It is, for this reason, natural to partition the
thesis depending on the dimensionality of the signal space. However, some basic
assumptions are the same for signals of any dimension, and some of the assumptions
involved in the 3D and 4D parts are presented in the 2D part.
Part I – Filter Design
In the first part the quadrature filters used in the estimation throughout the thesis are presented in a wavelet theory context, [87, 25]. An extension of the one-dimensional wavelet transform to two dimensions is suggested, giving a rotationally
unbiased output. The two-dimensional transform is used to perform scale analysis
of images, [100, 18, 70, 99, 73]. The phase from the quadrature filters, [67, 32],
is utilized in the scale analysis and a continuous representation of the phase for 2D
signals is given. Finally an application in texture analysis is described.
Part II – 2D Adaptive Filtering
In the second part the estimation of local orientation in 2D images, and its representation as a tensor field, is described, [59, 60]. The estimated orientation information
is used as contextual control of an orientation adaptive filter, [66]. Although the
adaptive filter has many degrees of freedom it can be constructed as a weighted
sum of the output from fixed filters. Due to the tensor representation the weighting
coefficients are calculated as a simple inner product of the tensor describing the
neighborhood and the tensors corresponding to the fixed filters, [65]. The orientation adaptive filter is then extended to work at different scales. Working at different scales also enables a better decision as to whether estimated local energy originates from noise or from the image signal. A scheme for estimating the noise level
in images is presented and used for parameter setting in the adaptive filtering. This
scheme has proven to work well on a wide variety of images.
Part III – 3D Adaptive Filtering
The third part contains the estimation of orientation for 3D signals, such as image
sequences or image volumes. The estimated orientation is used for optical flow
calculation and the results are compared to other algorithms. This information is
also used for focus of expansion estimation in the case of a translating camera. The
local orientation is represented as a 3 by 3 tensor, [59], which is used to adapt a 3D
filter in this direction, [65]. The adaptive filtering algorithm is shown to perform
very well, even in very noisy situations, on both image sequences and image volumes.
Part IV – 4D Adaptive Filtering
In the last part of the thesis a theoretical extension of the adaptive filtering algorithm
to 4D signal spaces is presented. The representation and estimation of orientation
of local structures in 4D are discussed. As in 3D, this information is used as control
for orientation adaptive filtering.
1.2 Previous Publications
This thesis is a compilation and an extension of work previously documented in the
following publications:
L. Haglund, H. Knutsson and G.H. Granlund.
On Phase Representation of Image Information.
The 6th Scandinavian Conference on Image Analysis pp 1082–1089
Oulu, Finland, June 1989
L. Haglund, H. Knutsson and G.H. Granlund.
Scale Analysis Using Phase Representation.
The 6th Scandinavian Conference on Image Analysis pp 1118–1125
Oulu, Finland, June 1989
J. Wiklund, L. Haglund, H. Knutsson and G. H. Granlund
Time Sequence Analysis Using Multi-Resolution Spatio-Temporal Filters
The 3rd International Workshop on Time-Varying Image Processing and Moving
Object Recognition pp 258–265
Florence, Italy, May 1989
L. Haglund, H. Bårman and H. Knutsson
Estimation of Velocity and Acceleration in Time Sequences
Theory and Applications of Image Analysis pp 223–236
Publisher: World Scientific Publishing Co, May 1992
H. Bårman, L. Haglund, H. Knutsson and G. H. Granlund
Estimation of Velocity, Acceleration and Disparity in Time Sequences
Proceedings of IEEE Workshop on Visual Motion pp 44–51
Princeton, NJ, USA, October 1991
H. Knutsson, L. Haglund, H. Bårman and G. H. Granlund
A Framework for Anisotropic Adaptive Filtering and Analysis of Image Sequences
and Volumes
Proceedings ICASSP-92 pp 469–472
San Francisco, CA, USA, March 1992
1.3 Notations
The following notations are used¹:
Lowercase letters are used to denote scalars, e.g. s.
Lowercase letters in boldface are used to denote vectors, e.g. v, while lowercase letters with subscripts are used for individual vector elements, e.g. vi. The norm of a vector v is denoted ||v||.
Uppercase letters in boldface are used for tensors of order 2 (and matrices), e.g. T. The elements are denoted with the uppercase letter and subscripts, e.g. Tij. The norm of a tensor T is denoted ||T||.
e Eigenvector.
λ Eigenvalue. λ1 is the largest.
n Filter direction in frequency domain.
q Filter output magnitude.
θ Phase angle.
q Filter output (with phase angle).
ω Frequency coordinate vector.
x Spatial coordinate vector.
ϕ Spatial orientation.
¹ Some exceptions from the general notation guide can be found; in these cases the variables are defined explicitly to avoid misunderstandings.
Part I
Filter Design and Scale Analysis
Chapter 2
Background
In the last few years wavelet theory and the wavelet transform have gained increasing attention, [56], in various signal processing fields. Wavelet theory could be said to
be a unification of many “old” ideas, and the purpose of this part of the thesis is to
reformulate some ideas in terms of this common framework.
Wavelet-like transforms, e.g. scale pyramids and scale-space, have been widely
used in the computer vision community over the last decade. The original reason for
using scale pyramids was to perform an efficient analysis, but in later years there has been increasing support for the idea that analysis at multiple scales or in scale-space could provide important means of image description on which model-based
image processing could be based. Many researchers have published work within the
field. To mention some of them, without claiming to be comprehensive:
• Marr, [78], introduced the concept of scale-pyramids as early as in the mid-seventies. The frequency difference between two adjacent levels in the pyramid was an octave or, in other words, a subsampling by a factor of 2 was used. The
result of this is a logarithmic sampling in the scale dimension. The inspiration
for this was that approximately the same behavior had been detected in biological vision, in the so-called frequency channel theory. He claimed that
visually interesting features must exist in a rather broad band of scales, i.e.
over many levels in the pyramid.
• Granlund, [39], suggested the use of a set of Cartesian separable Gabor-Gaussian
filters in different scales for orientation estimation in images. This type of filter has been extensively used in a number of different applications, and has lately been termed the Morlet wavelet.
• Knutsson, [67], introduced a polar separable lognormal filter function, see
also Section 3.2.3, obtaining a natural separation of scale and orientation. These
filters were used on several scales with typical applications being orientation
estimation, spatial frequency estimation and texture segmentation.
• Witkin, [100], extended the pyramid concept to a continuous so-called scale-space by using Gaussian filtering. The description he used was based on zero-crossings of the Laplacian of Gaussian filters. By varying the standard deviation of the Gaussian all scales will be covered by Laplacians. He stressed two things:
(1) Identity: Zero-crossings observed at different scales but lying on a common zero-contour, meaning that they correspond to a small scale change in scale-space, arise from the same underlying event.
(2) Localization: The true location of an event is its position in the finest
available scale.
• Koenderink, [70, 69], has, by using Gaussian filtering, shown that the classical heat equation relates scale changes to changes in the spatial dimensions.
He embedded the original image in a one parameter family of derived images,
where the parameter is resolution or scale. A study of this family makes it
possible to use all scales simultaneously, which is a requirement if no a priori
knowledge about the “right” resolution is available and it is desirable to retain
all possible structures.
• Pizer, [29], has implemented Koenderink's ideas in an impressive way, nearly at video rate. By using the symmetry axis transform in multiple resolutions
as description of images, good results have been achieved, mainly on medical
images.
• Lindeberg, [73], has theoretically extended Koenderink's ideas to discrete signals. He has also defined a representation, "the scale space primal sketch", for the relations between different scale levels. The theory is applied to extracting image structures and detecting at what scale they appear.
• Crowley, [23], bases his analysis on differences of Gaussians, using the so-called
DOLP transform. Following what he calls peaks and ridges in the family of
Laplacian images, which is the result of the DOLP transform, through many
scales, gives a description of the original image. The final description is taken
as the maximum over scale of each peak or ridge.
• Hoff and Olsen, [81, 51], are two examples of the use of scale analysis in stereo algorithms. The scale analysis is in this case used as guidance to solve the stereo correspondence problem. By tracking events, i.e. zero-crossings of a Laplacian, from coarse to fine scale, the computational complexity of the correspondence problem is reduced significantly.
• Wilson and Calway, [96], have extended the use of the short-time Fourier transform, with several window sizes, to images in the so-called Multiresolution
Fourier Transform. The transform is a general tool for image analysis in multiple scales and has for example been applied to curve segmentation.
In order to give some understanding of how images behave when they go through
a blurring process, a short description will be given. When resolution is decreased,
the images become less articulated because the extremes, bright and dark “blobs”,
disappear one after the other. This erosion of structure is a process that is similar
in each case and there are three main reasons for this behavior:
1. Two extrema float together and annihilate.
scale of interest      events
under 5 cm             gravel and pebbles
5 - 50 cm              bricks
50 cm - 5 m            windows and doors
5 - 50 m               separate houses
50 - 500 m             blocks
over 500 m             the whole town

Table 2.1: The correlation between scale and events in a picture of a town.
2. Two extrema float together and one “swallows” the other. The result is that
one large extremum continues in the blurring process.
3. An extremum and a saddle point float together and the extremum disappears.
Consequently each blob has a limited range in which it manifests itself. To
understand this, think about a picture of a town. In this picture different events
exist at different scales, see Table 2.1. The detection of, say, bricks should then be in the scale interval between 5 cm and 50 cm. Finer or coarser scales must not interfere
with the detection procedure.
The phase from the wavelet transforms has during the last few years gained
increasing interest in the computer vision community. The phase from Gabor-like
filters has been applied in a number of different applications [44, 45, 34, 35, 93, 72,
95, 19, 98, 16]. The output from a quadrature filter is an analytic function which
can be written in terms of an even real part and an odd imaginary part, as described
in Chapter 3. The analytical function can also be written as an amplitude and an
argument. The amplitude is a measure of the local energy (local both in spatial and
frequency domains) and the argument is the phase.
The motivations to use phase are:
• The phase is insensitive to both mean luminance and contrast.
• The phase is a continuous variable, i.e. it can measure changes much smaller
than the spatial quantization, giving subpixel accuracy, without subpixel representation of the image.
• The phase is also stable against scaling up to 20 %, [35].
• The phase is generally a very stable feature in scale space.
and the applications have, for example, been guidance of preattentive gaze control,
[93], disparity estimation, [72, 35, 95, 98], image velocity estimation, [34], and feature
modeling, [16, 19].
Among the motivations above, the last one might be the most important for the scale space clustering scheme in Section 4.2. In order to illustrate the phase behavior in scale space, a 1D line from the well-known "Lenna" test image
will be used, see Figure 2.1.
Figure 2.1: The signal used to illustrate the scale behavior, Figure 2.2, of the lognormal filter. (The plot shows grey level versus spatial position.)
Figure 2.2 contains isophase contours from a lognormal filter, Eq. 3.16. It describes the phase behavior of the lognormal filter in scale-space. This type of figure
will be named phaseogram. In the figure a continuous scale parameter has been
used. The scale has the same meaning as in the wavelet community. The phase is
stable over scale when the contours are vertical. It is clear that this is the case in
most of the phaseogram. There are, however, some points in the phaseogram where
the contours turn and become horizontal. These points are called singular points
[32] because they are singular points of the analytical function.
The reason for their appearance can intuitively be explained in the following
way. At low scales, i.e. high resolution, the phase cycle of the filter output has small
spatial support; on the other hand at high scales, i.e. low resolution, the spatial
support is much larger. If the signal has a limited support this means that the
number of cycles must decrease with increasing scale. In the phaseogram, Figure 2.2,
this can easily be seen by just looking at the variation of the density of the isophase
curves when going from low to high scales. The decrease in the number of cycles
occurs in these singular points.
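A phaseogram of this kind can be reproduced numerically. The following sketch (an illustration under assumptions, not the thesis implementation) filters a 1D signal with one-sided lognormal quadrature filters, Eq. 3.16, over a range of center frequencies and records the phase of the analytic output; plotting the rows against spatial position gives isophase contours like those in Figure 2.2, with the singular points appearing where the phase stops being stable over scale.

```python
import numpy as np

def lognormal(w, wi, B=1.5):
    """Lognormal radial function, Eq. 3.16, set to zero for w <= 0."""
    F = np.zeros_like(w)
    pos = w > 0
    F[pos] = np.exp(-4 * np.log(w[pos] / wi) ** 2 / (B ** 2 * np.log(2)))
    return F

def phaseogram(signal, centers):
    """Phase of the analytic (one-sided) filter output per scale.
    Rows correspond to scales, columns to spatial position."""
    N = len(signal)
    w = 2 * np.pi * np.fft.fftfreq(N)          # frequency axis in [-pi, pi[
    S = np.fft.fft(signal)
    out = np.empty((len(centers), N))
    for i, wi in enumerate(centers):
        q = np.fft.ifft(S * lognormal(w, wi))  # complex quadrature output
        out[i] = np.angle(q)                   # the local phase
    return out

x = np.linspace(0, 1, 256)
signal = np.sin(40 * x ** 2) + 0.5 * np.sin(60 * x)  # toy stand-in for an image line
centers = np.pi / 2 ** np.linspace(0.5, 4, 64)       # fine to coarse scales
pg = phaseogram(signal, centers)
```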
In Chapter 4 a continuous representation of the local phase in images is presented and described. An extension of the wavelet transform to 2D signal spaces, i.e. images, is suggested in Section 4.2, utilizing the phase of the transform. This
2D transform is used as a scale space clustering tool and applied as a spatial frequency estimator. In the application the filters are chosen to have constant relative
bandwidth. The bandwidth B = 1.5 can be motivated from physiological studies.
Figure 2.2: Isophase contours from a lognormal filter, plotted as scale versus spatial position. The broad, dark lines are due to phase wrap-around.
Measurements, both from adaptation experiments on human beings and directly on the striate cortex of macaque monkeys, have in mutual agreement shown nearly constant relative bandwidths in biological visual systems, see [28] and [38]. The measured bandwidths range from 1.5 to 2 octaves. The calculations in Section 3.2.3 for an octave based wavelet transform also show that this choice is mathematically acceptable.
Chapter 3
Filter Design and Wavelets
3.1 Quadrature Filters
It is well known that a real-valued image, $i(x, y)$, has a Hermitian Fourier transform, [17], $I(\omega_x, \omega_y)$, i.e.

$$I(\boldsymbol{\omega}) = I^*(-\boldsymbol{\omega}) \qquad (3.1)$$

where $*$ denotes the complex conjugate and $\boldsymbol{\omega}$ is the frequency vector. It follows from this property that the energy contribution is even:

$$|I(\boldsymbol{\omega})|^2 = |I(-\boldsymbol{\omega})|^2 = (\mathrm{Re}[I(\boldsymbol{\omega})])^2 + (\mathrm{Im}[I(\boldsymbol{\omega})])^2 \qquad (3.2)$$
From the evenness of the energy function, a measurement in a particular section
of the Fourier domain can be obtained by designing filters that only measure the
energy in one half plane. One way to perform the estimation is to use quadrature
filters.
A typical frequency function, $Q_k(\boldsymbol{\omega})$, for a quadrature filter, first suggested in [67], is

$$Q_k(\boldsymbol{\omega}) = \begin{cases} F(\omega)\,(\hat{\boldsymbol{\omega}} \cdot \hat{\mathbf{n}}_k)^{2A} & \text{if } \hat{\boldsymbol{\omega}} \cdot \hat{\mathbf{n}}_k \geq 0 \\ 0 & \text{otherwise} \end{cases} \qquad (3.3)$$

where
$\omega = \|\boldsymbol{\omega}\|$,
$A$ is a parameter which specifies the angular bandwidth and
$\hat{\mathbf{n}}_k$ is the main direction of the filter.
In more traditional terms of image processing the filter of Eq. 3.3 can be separated
into line and edge detectors. Eq. 3.2 shows that it is possible to estimate the energy
contribution separately from the real and the imaginary part of the local Fourier
transform. The real part of the transform corresponds to even functions, such as
lines, and the imaginary part corresponds to odd functions, such as edges. The
equation for quadrature filters, Eq. 3.3, can be written in terms of an odd part,
indexed o, and an even part, indexed e.
$$\begin{aligned}
Q_k(\boldsymbol{\omega}) &= H_{ke}(\boldsymbol{\omega}) + H_{ko}(\boldsymbol{\omega}) \\
H_{ke}(\boldsymbol{\omega}) &= H_{ke}(-\boldsymbol{\omega}) = \tfrac{1}{2} F(\omega)\,(\hat{\boldsymbol{\omega}} \cdot \hat{\mathbf{n}}_k)^{2A} \\
H_{ko}(\boldsymbol{\omega}) &= -H_{ko}(-\boldsymbol{\omega}) = \tfrac{1}{2} F(\omega)\,(\hat{\boldsymbol{\omega}} \cdot \hat{\mathbf{n}}_k)^{2A}\,\mathrm{sign}[\hat{\boldsymbol{\omega}} \cdot \hat{\mathbf{n}}_k]
\end{aligned} \qquad (3.4)$$
Using a quadrature approach to filter design has a number of advantages. The
principal one is the well-defined behavior in the common frequency band of the line
and edge detectors. This behavior may also be used to define a general signal phase
of multidimensional signals, see Chapter 4 and [67]. Another important advantage
is that “phase-independence” can be obtained in the output of different operations,
e.g. orientation estimation. This means that the output will be equally strong for
cosine (line)-like inputs as for sine (edge)-like inputs. There is also physiological
evidence for quadrature-like filters in biological vision systems, [76, 82, 83].
The reason for choosing polar separable filter functions is that an intuitive feeling
for the filter parameters can be obtained. The angular function is closely related to
the orientation of the filter and thus to the local orientation in the neighbourhood.
The radial function, on the other hand, determines at what scale the filter has its sensitivity. By making the filter functions polar separable, it is easy to change these aspects of the estimation independently, with easily predictable effects on the filter sensitivity.
The choice of angular function in Eq. 3.3 and Eq. 3.4 is due to the interpolation
and smoothness properties of trigonometric functions. The parameter A controls the
tuning of the orientation sensitivity. A larger value of A gives a narrower angular
bandwidth and thus the possibility to estimate the orientation more exactly, but this
at the cost of more filters. The condition which increases the number of filters is the
requirement of rotational invariance in the estimation. Rotational invariance means
here that the exactness and the strength of the estimation should be invariant under
rotation of the input. For example, with A = 1 in a two-dimensional image, at least
three quadrature filters are required to get an unbiased output. The relationship
between A and the number of filters, k, is k ≥ A + 1; the proof is found in [67]. This
requirement is in a similar context called rotational invariance [24].
The choice of the radial frequency function F(ω) is based on the scale of interest. F(ω)
has the character of a bandpass filter (see the next section for a more thorough
investigation).
Applying these filters as a convolution between their spatial transforms, $h_{ke}$ and $h_{ko}$, and the input image, $i(x, y)$, will form outputs $q_{ke}$ and $q_{ko}$:

$$\begin{aligned}
q_{ke}(x, y) &= h_{ke}(x, y) * i(x, y) \\
q_{ko}(x, y) &= -j \cdot h_{ko}(x, y) * i(x, y)
\end{aligned} \qquad (3.5)$$

where $j = \sqrt{-1}$.

The energy contribution from filter $k$, $E_k$, is given by

$$E_k = q_{ko}^2 + q_{ke}^2 \qquad (3.6)$$

By combining the outputs $q_k$ it is possible to design robust algorithms for orientation estimation [67].
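A minimal Fourier-domain construction of the filters in Eq. 3.3 and the energy measure in Eq. 3.6 could look as follows. This is a sketch with assumed parameters (A = 1, three filter directions, a lognormal radial function with the values used later in Section 3.2.3), not the thesis implementation.

```python
import numpy as np

def quadrature_bank(shape, directions, A=1, wi=np.pi / (2 * np.sqrt(2)), B=1.5):
    """2D quadrature filters, Eq. 3.3: a lognormal radial function times
    the angular factor (cos of angle to filter direction)^(2A), kept only
    on the half plane where the projection on n_k is non-negative."""
    wy, wx = np.meshgrid(2 * np.pi * np.fft.fftfreq(shape[0]),
                         2 * np.pi * np.fft.fftfreq(shape[1]), indexing='ij')
    w = np.sqrt(wx ** 2 + wy ** 2)
    w[0, 0] = 1e-9                              # avoid log(0) at DC
    radial = np.exp(-4 * np.log(w / wi) ** 2 / (B ** 2 * np.log(2)))
    filters = []
    for nk in directions:
        cosang = (wx * nk[0] + wy * nk[1]) / w  # cos of angle to n_k
        Q = radial * np.maximum(cosang, 0.0) ** (2 * A)
        Q[0, 0] = 0.0                           # zero DC-component
        filters.append(Q)
    return filters

# Three directions for A = 1, as in the text above.
dirs = [(np.cos(a), np.sin(a)) for a in (0.0, np.pi / 3, 2 * np.pi / 3)]
image = np.random.rand(128, 128)
I = np.fft.fft2(image)
for Q in quadrature_bank(image.shape, dirs):
    q = np.fft.ifft2(I * Q)   # complex output: even part + j * odd part
    E_k = np.abs(q) ** 2      # energy contribution, Eq. 3.6
```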
3.2 Radial Filter Functions
3.2.1 General Design Principles
When designing filters for image processing the uncertainty principle must be kept in mind, [67, 97]:

$$\Delta x \cdot \Delta \omega \geq \text{constant} \qquad (3.7)$$

where $x$ is the spatial variable and $\omega$ is the frequency variable. The spreads $\Delta\omega$ and $\Delta x$ are the standard deviations in the Fourier and the spatial domain respectively.
In order to minimize the simultaneous spread of the filters in both frequency and
spatial domains a Gaussian approach is preferable. This is, however, not possible
to combine with quadrature filters, [67]. The restrictions that must be fulfilled in
order to design small spatial quadrature filters, which still behave properly in the
Fourier domain, are
1. The DC-component in the Fourier domain must be zero.
2. The value at the filter border, ω = π, must be zero.
A consequence of the uncertainty principle is that a narrow filter in one domain
will be wide in the corresponding dual domain. A wide filter in the Fourier space is
thus appropriate to realize as a spatial convolution kernel, since only a small number
of coefficients are needed. On the other hand, a wide filter has a long tail which
may be hard to combine with the second demand above.
The reason for these restrictions is to avoid discontinuities in the odd part, or
in the edge detector, of the filter. A discontinuity in the frequency domain gives
rise to ringing in the spatial domain. An understanding of how these discontinuities
can appear can be gained by looking at Figure 3.1, which shows a well-behaved
quadrature filter plotted in the frequency domain. In Figure 3.2 a bad quadrature
filter is plotted. Neither the first nor the second requirement stated above is fulfilled.
Note the discontinuity in the odd part of the filter.
3.2.2 Wavelet Transform
Wavelet theory is a unification of similar ideas from different fields. In the mid-eighties, three French scientists, Morlet (geophysicist), Grossmann (theoretical physicist) and Meyer (mathematician), built a strong mathematical foundation based on the idea of looking at signals at different scales and various resolutions. In the field of computer vision, Mallat and Daubechies have established connections to discrete
signal processing, e.g. [25, 75].
The general reason for transforming or filtering a signal is to extract specific
features from the signal. This means that the filter, in the ideal case, should be
invariant to all other variations of the signal but the interesting feature, to which
it should be equivariant. Hence, the transformed or filtered signal should vary only
with the interesting feature and not with anything else.
Figure 3.1: Plot of a well-behaved quadrature filter, both as one filter and split into odd and even parts (plotted over $\omega \in [-2\pi, 2\pi]$). The periodicity of the Fourier domain corresponds to a discrete spatial domain.
Figure 3.2: Plot of a bad quadrature filter, both as one filter and split into odd and even parts (plotted over $\omega \in [-2\pi, 2\pi]$). Note the discontinuities in the odd part of the filter.
The Fourier transform is an example of a transform where the interesting feature
to extract is the frequency. The assumption is that the signal is stationary, e.g.
sine waves. If the signal is non-stationary, any abrupt change of the signal will be spread over the whole frequency axis and the spatial position of the discontinuity will be impossible to retrieve from the Fourier coefficients. The Fourier transform is
apparently not sufficient for analyzing such signals.
The Short Time Fourier Transform, or windowed Fourier transform, is one way
to modify the Fourier transform for better performance on non-stationary signals.
There are, however, many possibilities to choose the windowing function. One widely
used window function is the Gabor function, $G(\omega)$ or $g(x)$:

$$G(\omega) = e^{-(\omega - \omega_0)^2 / 2\sigma_\omega^2}, \qquad g(x) = e^{jx\omega_0}\, \frac{\sigma_\omega}{\sqrt{2\pi}}\, e^{-x^2 \sigma_\omega^2 / 2} \qquad (3.8)$$
Here the window function in the frequency domain is a shifted Gaussian and spatially a modulated Gaussian, giving a filter that is well concentrated in both domains, [37, 39].

Figure 3.3: Linear partitioning of the frequency domain, where the filter functions for two filters are indicated both spatially and in the frequency domain.
To have equal sensitivity for all frequencies there is an obvious need for several
filters. There are many ways of partitioning the frequency domain. Originally
Gabor suggested a uniform splitting of the time-frequency domain, see Figure 3.3
and Figure 3.4.
In wavelet theory, a logarithmic partitioning is desirable, see Figure 3.4. This is
achieved by scaling and translating a “mother wavelet”. This idea has been used
in vision for several years, e.g. Laplacian pyramids, [18], and the spatial frequency
channel theory of biological vision, [20].
The logarithmic partitioning is equivalent to using filters with constant relative bandwidth, or "constant-Q" analysis:

$$\frac{\Delta\omega}{\omega_0} = \text{constant} \qquad (3.9)$$
When the relation in Eq. 3.9 is fulfilled, the bandwidth, ∆ω, will vary with
the center frequency of the filter. In spite of the uncertainty principle, it is now
possible to get arbitrarily good time or space resolution at high frequencies, [99, 84].
The same holds for low frequencies, i.e. the frequency resolution at low frequencies can be made
arbitrarily good. The hypothesis is that high frequency bursts are of short duration,
while low frequency components have long duration. In images this hypothesis
implies that “interesting” features have fixed shape but unknown size. This is a more
realistic model than for example the global Fourier transform, where all frequencies
are supposed to have infinite support in space.
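For an octave based constant-Q bank, Eq. 3.9 means that the absolute bandwidth doubles with each doubling of the center frequency. A quick numerical check (illustrative values only; the 6 dB edges at $\omega_0 2^{\pm B/2}$ follow from a bandwidth of B octaves):

```python
# Constant-Q, octave based bank: relative bandwidth fixed in octaves,
# so the absolute passband width scales linearly with center frequency.
import math

B = 1.5                                            # relative bandwidth in octaves
centers = [math.pi / 2 ** n for n in range(1, 5)]  # pi/2, pi/4, pi/8, pi/16
for w0 in centers:
    lo, hi = w0 * 2 ** (-B / 2), w0 * 2 ** (B / 2)  # 6 dB band edges
    print(f"center {w0:.3f}: band [{lo:.3f}, {hi:.3f}], width {hi - lo:.3f}")
```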
Figure 3.4: Uniform (to the left) and logarithmic (to the right) splitting of the frequency domain.
Continuous Wavelet Transforms
In the continuous case the wavelet transform is constructed from a single function
$h(x)$ named the "mother wavelet":

$$h_{a,\tau}(x) = \frac{1}{\sqrt{a}}\, h\!\left(\frac{x - \tau}{a}\right) \qquad (3.10)$$

where
$a$ is a scale, or dilation, factor,
$\tau$ is the translation, and
$1/\sqrt{a}$ is a constant that is used for energy normalization.

The definition of the continuous wavelet transform (CWT) is then

$$CWT_h(\tau, a) = \frac{1}{\sqrt{a}} \int i(x)\, h^*\!\left(\frac{x - \tau}{a}\right) dx \qquad (3.11)$$
Note that both the wavelet transform and the short time Fourier transform can be
seen as Wigner-Ville distributions, [87].
The wavelet transform can be seen as projecting the signal onto a set of basis
functions, ha (x). From a mathematical point of view it would be preferable to
have an orthogonal basis to minimize redundancy and interference between different
wavelets. This assures that it is possible to reconstruct the signal from its transform
by summing up inner products.
$$i(x) = \int_{a>0} CWT(\tau, a)\, h_{a,\tau}(x)\, \frac{da\, d\tau}{a^2} \qquad (3.12)$$
Surprisingly enough this holds even though the wavelets ha,τ (x) are not orthogonal,
[25]. The requirements on the wavelets are
• $h(x)$ should be of finite energy, i.e. $\int h(x)\, h^*(x)\, dx < \infty$.
• $h(x)$ should be of bandpass type, meaning that the reconstruction scheme in Eq. 3.12 only recovers the signal energy of $i(x)$; the DC-component cannot be reconstructed.
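Numerically, Eq. 3.11 is a correlation for each scale and can be evaluated with FFTs. The sketch below uses a Morlet-type window as the mother wavelet; this particular wavelet and its parameters are assumptions for illustration, not prescribed by the text.

```python
import numpy as np

def cwt(signal, scales, w0=5.0):
    """Continuous wavelet transform, Eq. 3.11, via the FFT. Correlation
    with h((x - tau)/a)/sqrt(a) becomes sqrt(a) * conj(H(a*w)) * I(w)
    in the Fourier domain."""
    N = len(signal)
    w = 2 * np.pi * np.fft.fftfreq(N)
    I = np.fft.fft(signal)
    out = np.empty((len(scales), N), dtype=complex)
    for i, a in enumerate(scales):
        # Morlet-type mother wavelet in the Fourier domain (one-sided, bandpass)
        H = np.exp(-0.5 * (a * w - w0) ** 2) * (w > 0)
        out[i] = np.fft.ifft(I * np.conj(H) * np.sqrt(a))
    return out

sig = np.sin(2 * np.pi * np.arange(512) / 32)   # toy input
coeffs = cwt(sig, scales=2.0 ** np.arange(5))   # octave spaced scales
```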
Discrete Wavelet Transforms
A natural question to ask is if it is possible to find an orthogonal wavelet basis by
a careful sampling of a and τ . The answer is that it is possible but it depends
critically on the choice of h(x). There is a trade-off between the orthogonality and
the restrictions on the wavelet. If the redundancy is high (oversampling), there are
mild restrictions on h(x). But if the redundancy is small (close to critical sampling),
then the choice of wavelet is very constrained, [25].
The coefficients, $c_{mn}$, in the discrete wavelet transform (DWT) are given by

$$c_{mn} = \frac{1}{\sqrt{a_n}} \int i(x)\, h^*\!\left(\frac{x - b_{mn}}{a_n}\right) dx = \langle h_{mn}, i \rangle \qquad (3.13)$$

where $h_{mn}$ and the sampling points in the discrete wavelet transform are

$$h_{mn} = h\!\left(\frac{x - b_{mn}}{a_n}\right), \qquad a_n = e^{\alpha n}, \qquad b_{mn} = \beta m a_n \qquad (3.14)$$

where
$a_n$ is the dilation sampling, or the sampling by the wavelet in the frequency domain, and
$b_{mn}$ is the spatial sampling.

The sampling density is now decided by the parameters $\alpha$ and $\beta$.
To analyze the behavior of the DWT, the notion of a frame from Daubechies [25] is an appropriate concept. The frame-bounds, $A$ and $B$, of the wavelet transform are defined according to

$$A \|f\|^2 \leq \sum_{mn} |\langle h_{mn}, f \rangle|^2 = \sum_{mn} |c_{mn}|^2 \leq B \|f\|^2 \qquad (3.15)$$

independently of the signal $f \in L^2$. Then, $h_{mn}$ is a frame if $A > 0$ and $B < \infty$.
These frame-bounds balance the demand on the wavelet. If all the $h_{mn}$ are normalized the following terminology is used:

$A = B = 1$: orthonormal base.
$A = B > 1$: tight frame; the number $A$ is a measure of the redundancy.
$A \approx B$: snug frame; the inverse is well-behaved.
The signal to noise ratio, SNR, of a synthesis of the original signal from its wavelet
coefficients can be calculated from the frame-bounds, [25].
3.2.3 The Lognormal Filter
A function which is well concentrated in both domains, suggested in [67], is the
so-called lognormal function, which in the Fourier domain is given by
$$F(\omega) = e^{-4 \ln^2(\omega/\omega_i) / (B_{\mathrm{logn}}^2 \ln 2)} \qquad (3.16)$$

where
$\omega_i$ is the center frequency and
$B_{\mathrm{logn}}$ is the 6 dB sensitivity bandwidth in octaves.
This function is designed to meet several requirements:
• The DC-component should be zero.
• It should be possible to splice this function with zero sensitivity for negative frequencies. Note: not only is $F(0) = 0$, but so are all its derivatives.
• The relative bandwidth, see Eq. 3.9, should be independent of the center frequency, $\omega_i$.
• The function is self-similar, which enables it to be used as a "mother wavelet".
The drawback of the filter function is that its inverse Fourier transform is not known in closed form.
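Two of these properties, the self-similarity and the center-frequency-independent relative bandwidth, are easy to verify numerically (a sketch; grid and parameter values are arbitrary):

```python
import numpy as np

def F_logn(w, wi, B=1.5):
    """Lognormal function, Eq. 3.16, for w > 0."""
    return np.exp(-4 * np.log(w / wi) ** 2 / (B ** 2 * np.log(2)))

w = np.linspace(0.01, np.pi, 2000)

# Self-similarity: doubling the center frequency is a pure dilation of
# the frequency axis, which is what a "mother wavelet" requires.
print(np.allclose(F_logn(w, np.pi / 4), F_logn(w / 2, np.pi / 8)))  # True

# Constant relative bandwidth: the half-height (6 dB) points sit at
# wi * 2**(+-B/2), i.e. B octaves apart, for every center frequency wi.
for wi in (np.pi / 4, np.pi / 8):
    band = w[F_logn(w, wi) >= 0.5]
    print(np.log2(band[-1] / band[0]))   # ~1.5 octaves in both cases
```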
Theoretical Frame-Bounds for the Lognormal Function
In Daubechies [26, 25], an extensive calculation scheme for different kinds of wavelet
transforms is given. Following this scheme for the lognormal function, the frame-bounds A and B from Eq. 3.15 can be calculated.
The calculation of the frame-bounds involves an analysis in the Fourier domain
of the suggested “mother wavelet” in the context of the sampling density chosen.
In the case of the lognormal bandpass function this analysis is easiest and most
comprehensible to evaluate when performed in the frequency domain.
The first demand of a wavelet is that it should be of finite energy, or in the terms
of wavelets that the admissibility condition should be fulfilled, i.e.
$$c_h = 2\pi \int \frac{|H(\omega)|^2}{|\omega|}\, d\omega < \infty \qquad (3.17)$$

For the lognormal frequency function $F(\omega)$, this is easily calculated, giving the result

$$c_F = B_{\mathrm{logn}}\, \pi \sqrt{\frac{\pi \ln 2}{2}} \qquad (3.18)$$
In [26, 25], compact support of the Fourier transform of the "mother wavelet" in $[l, L]$ is assumed. Unfortunately, the lognormal function is not of compact support. In practical situations, however, the signals are discrete, i.e. they are sampled, with the Nyquist band limitation at $\pi$. This means that the Fourier domain is periodic and that the interval $[-\pi, 0[$ is the same as the interval $[\pi, 2\pi[$. Truncating the function at $\omega = \pi$, and choosing the interval $[l, L]$ as $[\varepsilon, 2\pi]$, where $\varepsilon > 0$, $\varepsilon \approx 0$, means that negative frequencies will not contribute any energy. Hence the compact support interval will be $[\varepsilon, 2\pi]$. To give an impression of the approximation caused by the truncation of the filter function: with the parameters $B_{\mathrm{logn}} = 1.5$ and $\omega_i = \pi/(2\sqrt{2})$ in Eq. 3.16, the frequency function has decayed to 1/16 of its top value at $\omega = \pi$. Thus the truncation will hardly introduce severe errors.
The calculation of the frame-bounds can now proceed by recalling Eq. 3.15 and rewriting it in terms of the sampling points, $a_n$ and $b_{mn}$:

$$A\|f\|^2 \leq \sum_m \sum_n |\langle a_n, b_{mn}; h \,|\, f\rangle|^2 = \sum_m \sum_n |\langle h_{mn} \,|\, f\rangle|^2 \leq B\|f\|^2 \qquad (3.19)$$

where

$$\sum_m |\langle a_n, b_{mn}; h \,|\, f\rangle|^2 = \sum_m \left| \int_l^L e^{j\omega\beta m}\, H(a_n\omega)\, F(\omega)\, d\omega \right|^2 \qquad (3.20)$$

By imposing

$$\beta = \frac{2\pi}{L - l} \approx 1$$

the spatial sampling density is locked to the scale sampling density, i.e.

$$b_{mn} = \frac{2\pi}{L - l}\, m\, a_n \approx m\, a_n$$

It is now possible to simplify Eq. 3.20 using the Parseval relation:

$$\sum_m |\langle a_n, b_{mn}; h \,|\, f\rangle|^2 = \sum_m \left| \int_l^L e^{j\omega 2\pi m/(L-l)}\, H(a_n\omega)\, F(\omega)\, d\omega \right|^2 = \int |H(a_n\omega)|^2\, |F(\omega)|^2\, d\omega \qquad (3.21)$$

It may be worth mentioning that in the above equations the Fourier domain is supposed to be continuous and the spatial domain discrete. The Parseval relation thus relates the sum $\sum_m |c_m|^2$ on the first line of Eq. 3.21 and the integral $\int |FH|^2\, d\omega$ on the second line.

Remembering that only positive frequencies are involved in the calculations, it is possible to define

$$FF(s) = F(e^s), \qquad HH(s) = H(e^s).$$

Substituting $\omega = e^s$ in Eq. 3.21, also remembering that $a_n = e^{n\alpha}$, gives

$$\sum_n \sum_m |\langle a_n, b_{mn}; h \,|\, f\rangle|^2 = \int e^s\, |FF(s)|^2 \left( \sum_n |HH(s + n\alpha)|^2 \right) ds \qquad (3.22)$$
Figure 3.5: The spatial lognormal filter realized as a 15 tap filter. The real part (solid) and the imaginary part (dashed) are plotted against spatial position.
This shows that

$$A = \inf_{s \in \mathbb{R}} \left[ \sum_n |HH(s + n\alpha)|^2 \right], \qquad B = \sup_{s \in \mathbb{R}} \left[ \sum_n |HH(s + n\alpha)|^2 \right]$$

since

$$A \int e^s |FF(s)|^2\, ds \;\leq\; \int e^s |FF(s)|^2 \left( \sum_n |HH(s + n\alpha)|^2 \right) ds \;\leq\; B \int e^s |FF(s)|^2\, ds$$
From this short review of the calculations from [26, 25], it is now possible to
proceed by investigating the frame-bounds for the lognormal frequency function.
Practical/Numerical Calculation of Frame-Bounds
As the analysis has been performed in a continuous Fourier domain, one must remember that the filters will be applied as spatial convolution kernels. These kernels
must in practical situations be implemented with a limited number of coefficients.
This is one reason for choosing functions which are smooth in both domains.
The lognormal filter has been realized as a spatial convolution kernel with 15 coefficients, which is plotted in both domains in Figures 3.5 and 3.6. The parameters for the lognormal filter are set to $B_{\mathrm{logn}} = 1.5$ and $\omega_i = \pi/(2\sqrt{2})$ in the figures.
Figure 3.6: The lognormal filter, both the ideal function (solid) and a transform of the realized convolution kernel from Figure 3.5 (dashed). The filters are plotted against a frequency axis and one period of the Fourier domain is shown.
The frame-bounds for the lognormal functions have been calculated numerically
following Eq. 3.22. Eq. 3.22 has been applied directly with the Fourier transform of
the 15 tap filter as the "mother wavelet". The frame-bounds, using 8 discrete octave based scales, i.e. $2^l$ where $l \in [-2, 5]$, are given as the maximum and minimum of the plot in Figure 3.7. For this specific case the frame-bounds are

$$B = 1.33 \quad \text{and} \quad A = 1.12 \qquad (3.23)$$
where the function is normalized with the energy.
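The numerical recipe behind such figures is short: sample $\sum_n |HH(s + n\alpha)|^2$ on a grid of $s$ and take its extrema, Eq. 3.22. The sketch below uses the π-truncated ideal lognormal function in place of the transform of the realized 15 tap kernel, and evaluates only over the band the eight scales are designed to cover, so its numbers will not match Eq. 3.23 exactly.

```python
import numpy as np

def H_logn(w, wi=np.pi / (2 * np.sqrt(2)), B=1.5):
    """Pi-truncated lognormal function (cf. Section 3.2.3)."""
    return np.exp(-4 * np.log(w / wi) ** 2 / (B ** 2 * np.log(2))) * (w <= np.pi)

def frame_bounds(H, n_scales=8, n0=-2, alpha=np.log(2)):
    """A and B as extrema over s of sum_n |HH(s + n*alpha)|^2, Eq. 3.22,
    with HH(s) = H(e^s) and octave scales a_n = 2^n, n in [n0, n0 + 7]."""
    s = np.linspace(np.log(np.pi / 64), np.log(np.pi), 4000)
    total = np.zeros_like(s)
    for n in range(n0, n0 + n_scales):
        total += np.abs(H(np.exp(s + n * alpha))) ** 2
    return total.min(), total.max()

A, B = frame_bounds(H_logn)
print(A, B)   # a snug frame: A close to B, compare Eq. 3.23
```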
This scheme has also been tested with another widely used filter function: the
Gabor function. In the wavelet community this is called the Morlet wavelet, i.e.
a Gabor function, Eq. 3.8, sampled with constant relative bandwidth. In image
processing this sampling scheme was proposed by Granlund [39] and has been widely
used, e.g. [84]. When implementing the Gabor function as a quadrature filter, fulfilling the demands in Section 3.1, the bandwidth, or standard deviation, of the
filter must be rather small. There is a trade-off between the bandwidth of the filter
and the demand that the filter should be zero, or at least small enough, at ω = 0
and at ω = π.
The 6 dB bandwidth Bg of a shifted Gaussian with standard deviation σω and
center frequency ω0 is approximated with
$$B_g = \log_2 \frac{\omega_0 + \sigma_\omega}{\omega_0 - \sigma_\omega} \qquad (3.24)$$

In practical situations $B_g$ will be in the interval 0.8–1.2.
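As a numerical illustration (with an assumed $\sigma_\omega$, not a value from the text): for $\omega_0 = \pi/(2\sqrt{2}) \approx 1.11$ and $\sigma_\omega = 0.35$, Eq. 3.24 gives $B_g = \log_2(1.46/0.76) \approx 0.94$ octaves, inside the stated interval.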
Figure 3.7: Illustration of how the frequency sensitivity of the lognormal filter, used in an octave based DWT, varies with frequency. The frame-bounds for the lognormal filter in Figure 3.5 are the maximum and minimum of this plot.
The experiment has been carried out with a Gabor filter with center frequency $\omega_0 = \pi/(2\sqrt{2})$ and the relative bandwidth $B_g = 1.2$, realized as a spatial 15 tap filter. The function used is plotted in Figure 3.8 and its Fourier transform is given in Figure 3.9 together with the ideal function.
Applying Eq. 3.22, see Figure 3.10, for an octave based wavelet transform and, as before, 8 discrete scale levels gives the frame-bounds for this Gabor function:

$$B = 1.64 \quad \text{and} \quad A = 1.16 \qquad (3.25)$$
The Scaling Function
So far nothing has been said about the scaling function, sometimes named the
“wavelets’ father”, [31], that must be incorporated in the subsampling procedure.
This function should, preferably, be orthogonal to the wavelets still remaining to
apply, e.g. an ideal lowpass filter. A more realistic scaling function is a Gaussian,
see Appendix 4.A, which has been applied and tested in the same way as above
giving the following results. These results are relevant only if the wavelet is applied
on a pyramid.
For the lognormal 15 tap filter in Figure 3.5 the result is that the frame-bounds over the total interval are

$$B = 1.30 \quad \text{and} \quad A = 0.68 \qquad (3.26)$$
Figure 3.8: The spatial Gabor filter realized as a 15 tap filter. The real part (solid) and the imaginary part (dashed) are plotted against spatial position.
Figure 3.9: The Gabor filter, both the ideal function (solid) and a transform of the realized convolution kernel from Figure 3.8 (dashed), plotted against frequency.
Figure 3.10: Illustration of how the frequency sensitivity of the Gabor filter varies when used in an octave based DWT. The frame-bounds for the Gabor filter in Figure 3.8 are the maximum and minimum of this plot.
but limiting the interval to a maximum of $\pi/\sqrt{2}$ gives the frame-bounds

$$B = 1.30 \quad \text{and} \quad A = 1.02 \qquad (3.27)$$
The result from Eq. 3.22 is given in Figure 3.11.
3.2.4 Conclusion
The calculations in this section have shown that an octave based bandpass pyramid
based on lognormal filter functions is appropriate to apply. The maximum variation
in sensitivity to different scales is approximately 20 %. This can be compared to
an octave based bandpass pyramid based on Gabor functions, where the variation
is approximately 40 %.
3.A Filter Optimization
This appendix is more or less a review of the filter optimization methods reported in [67].
The reason for using an optimization procedure in filter design is that the filter must
be realized with a limited number of coefficients. The optimization will then decide how
to choose these coefficients so that a prescribed ideal function is approximated as well as
possible, in some norm. Usually the ideal function is given in the Fourier domain and the
Figure 3.11: Illustration of the frequency sensitivity for the lognormal filter, Figure 3.5, with a Gaussian scaling function. Maximum and minimum of this function are the frame-bounds for the total DWT.
filter should be realized as a spatial convolution kernel. The Discrete Fourier Transform,
DFT, could then be used to get the convolution kernel. The larger the spatial kernel, the
better the approximation of the ideal function will be, but this is, of course, at the cost of
higher complexity in the computation, and also gives poorer spatial resolution.
Using a DFT implies that it is equally important to follow the ideal function, Fide,
at all frequencies, i.e. the DFT performs the following minimization if the spatial grid is
prescribed:
$$\min \sum_{\omega \in [-\pi..\pi[} \left| F_{\mathrm{ide}}(\omega) - \mathrm{DFT}[f_{\mathrm{spat}}(x)](\omega) \right|^2$$
meaning that fspat is given by an inverse DFT, IDFT, of the ideal function. Since fspat is
spatially limited the Fourier domain must be periodic, which is the reason for the interval
[−π..π[ in the optimization.
In many practical situations the assumption of equal importance for all frequencies
will not hold, e.g. the DC-component has an extremely high importance. Thus a more
realistic optimization would be
$$\min \sum_{\omega \in [-\pi..\pi[} w(\omega) \left| F_{\mathrm{ide}}(\omega) - \mathrm{DFT}[f_{\mathrm{spat}}(x)](\omega) \right|^2 \qquad (3.28)$$
where w(ω) is a weighting function. In this minimization it is possible to put in a priori
knowledge of both the signal spectrum and the noise spectrum.
Statistically, according to [67], images have a spectrum decaying somewhere between $1/\omega$ and $1/\omega^3$. This and the importance of the DC-component have been used when the
weighting function, w(ω), has been designed.
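On a dense frequency grid, Eq. 3.28 is a weighted linear least-squares problem in the kernel coefficients. The sketch below is one possible formulation, not the thesis code; the 1/ω²-type weight and the kernel size are assumptions for illustration.

```python
import numpy as np

def optimize_kernel(F_ide, n_taps=15, n_freq=1024):
    """Weighted LS fit of a spatial kernel to an ideal frequency response,
    Eq. 3.28: minimize sum_w w(omega) |F_ide - DFT[f_spat]|^2."""
    omega = np.linspace(-np.pi, np.pi, n_freq, endpoint=False)
    x = np.arange(n_taps) - n_taps // 2      # spatial tap positions
    M = np.exp(-1j * np.outer(omega, x))     # dense DFT matrix
    wgt = 1.0 / (omega ** 2 + 0.01)          # ~1/w^2 spectrum; heavy at DC
    W = np.sqrt(wgt)
    taps, *_ = np.linalg.lstsq(W[:, None] * M, W * F_ide(omega), rcond=None)
    return taps                              # complex spatial coefficients

def F_ide(w, wi=np.pi / (2 * np.sqrt(2)), B=1.5):
    """Pi-truncated lognormal quadrature target, cf. Eq. 3.16."""
    out = np.zeros_like(w)
    band = (w > 0) & (w <= np.pi)
    out[band] = np.exp(-4 * np.log(w[band] / wi) ** 2 / (B ** 2 * np.log(2)))
    return out

taps = optimize_kernel(F_ide)   # real part: even filter, imag part: odd filter
```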
It is also worth mentioning that a filter realized with a fixed number of spatial coefficients can be transformed to the Fourier domain and represented with a large number of
coefficients. The trick is to use zero padding in the spatial domain to get a better resolution
in the Fourier domain. This means that the ideal function, Fide , can be discretized with
a sufficient number of coefficients, even though the convolution kernel, fspat , is supposed
to be realized on a rather sparse grid.
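In numpy the zero padding trick is a single argument to the FFT (illustration):

```python
import numpy as np

kernel = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16   # 5 spatial coefficients
coarse = np.fft.fft(kernel)        # frequency response at 5 samples
fine = np.fft.fft(kernel, n=512)   # same response, densely sampled via padding
```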
Chapter 4
On the Use of Phase in Scale Analysis
4.1 Phase Estimation in Images
An extension of the phase concept from the well-known theory of one-dimensional
signals to multidimensional signals is not trivial. The main reason is that the ordering problem does not have a unique solution for a dimensionality of two and higher.
This implies that
a) the neighbourhood should be one-dimensional, i.e. vary in only one direction,
b) the direction in which the phase should be estimated must be predefined and
c) the resulting phase representation must be invariant to rotation of the input
signal.
The requirements a) and b) can be met by finding the one-dimensional structures
and extracting a phase independent orientation, [67], giving a rather straightforward
generalization. The estimation of orientation is further described in Section 5.1;
here it is just assumed to be accessible in a “double angle” representation, [39], see
Figure 4.1. The orientation measure is used to control in what direction the phase is
estimated and the phase is given by the responses from the quadrature filters used.
Further, it will be shown that a three-dimensional representation, involving phase
and orientation, is necessary to meet the requirement c).
Following this discussion, assume a one-dimensional structure and that a filter, $q_k$, is exactly aligned with the orientation of the one-dimensionality. It is then possible to define the phase of this filter, $\theta_{ph,k}$, from the quotient between the odd and the even filter results. The filter, $q_k$, could for example be a lognormal quadrature filter, Eq. 3.16:

$$q_k = q_{ke} + j q_{ko}, \qquad \theta_{ph,k} = \arg(q_k)$$
In order to give a better understanding of the phase concepts some particular
instances of phase angles are presented below and in Figure 4.2.
Figure 4.1: The orientation representation used, where a doubling of the argument produces a continuous mapping of the local orientation; see also Section 5.1.
• $\theta_{ph} = 0$: corresponds to a symmetric input, i.e. a line, which is bright in comparison with its surroundings.
• $\theta_{ph} = \pi$: corresponds to a symmetric input, i.e. a line, which is dark compared to its surroundings.
• $\theta_{ph} = \pm\pi/2$: corresponds to an antisymmetric input, i.e. an edge. The sign cannot be defined without the orientation of the edge.
Figure 4.2: Left: the phase as a line/edge detector. Right: illustration of a rectified phase, where the sign of the edge is discarded; the colors correspond to the colors in the result images.
This is, of course, just the phase from one filter at a particular orientation. To
get a general phase estimate that is equally sensitive to every orientation, all filters
must be involved in the overall estimate. One way of doing this is to sum the even
filters separately from the odd ones and take the quotient between the sums.
$$\theta_{ph} = \arg\!\left( \sum_k q_{ke},\; \sum_k q_{ko} \right)$$
This is possible if the summation is carefully done, since the signs of the results
from the odd filters are dependent on the main direction of the input. There is a
fundamental discontinuity in the phase estimate, or more precisely in the sign of edges, in images. Consider a dark circle on a brighter background. The sign of the
edge can for example be defined as positive, if white is to the right (x in Figure 4.3).
Start to walk around the circle, and the sign of the edge changes from positive to
negative when you have reached the opposite side (o in Figure 4.3). Somewhere
between these two points the edge has abruptly changed its sign. This discontinuity
is orientation dependent, so if the orientation is available simultaneously, the representation could be made continuous. The crucial factor in the summation of the
filter responses is to ensure that all filters change signs at the same orientation.
Figure 4.3: Circle illustrating the edge discontinuity (x and o mark the points referred to in the text).
Another way to define the phase is to use the orientation estimate to interpolate a quadrature filter in that direction. This will be the phase taken across a
one-dimensional structure inside the neighbourhood. The interpolation can be analytically solved for the even part, but for the odd part of the filter an optimization
criterion must be decided and solved numerically. Recalling Eq. 3.4, the even part of the filter has an angular variation $\cos^{2A}(\varphi - \varphi_k)$, where $\varphi_k$ is the main direction of the filter. As an example, put $A = 1$, $\varphi_k = k\pi/4$ and use four filters corresponding to $k = 0, 1, 2, 3$. To interpolate in an arbitrary direction, $\varphi_0$, the expected filter response looks like $\cos^2(\varphi - \varphi_0)$. This is readily obtained by simple trigonometric formulas:

$$\begin{aligned}
\cos^2(\varphi - \varphi_0) &= \tfrac{1}{2}\left(1 + \cos(2(\varphi - \varphi_0))\right) \\
&= \tfrac{1}{2}\left(1 + \cos 2\varphi \cos 2\varphi_0 - \sin 2\varphi \sin 2\varphi_0\right) \qquad (4.1) \\
&= \tfrac{1}{2}\left[1 + (\cos^2\varphi - \sin^2\varphi)\cos 2\varphi_0 - \left(\cos^2(\varphi - \tfrac{\pi}{4}) - \sin^2(\varphi - \tfrac{\pi}{4})\right)\sin 2\varphi_0\right]
\end{aligned}$$

Since

$$\sin^2\varphi = \cos^2(\varphi - \tfrac{\pi}{2}) \qquad \text{and} \qquad 0.5 \sum_k \cos^2(\varphi - \varphi_k) = 1,$$
all terms in Eq. 4.1 are among the four specified and fixed filters.
The odd part has an expected angular function $\cos^2(\varphi - \varphi_0)\,\mathrm{sign}[\cos(\varphi - \varphi_0)]$, Eq. 3.4, which cannot be interpolated from the four fixed odd filters with the same main directions as the even filters. Thus a least square error interpolation table has been calculated; another way of doing this interpolation is presented in [6]. The normalized errors from this numerical solution are no worse than 0.08, and the
The normalized errors from this numerical solution are no worse than 0.08, and the
maximum error is where the function value is smallest, for orientations right between
the fixed orientations. No large effect from these errors has been observed in the
calculation and processing of phase images.
The phase estimation after the interpolation is taken as the argument of the
complex number, qeϕ0 + jqoϕ0 , where qeϕ0 is the output from the interpolated even
filter and qoϕ0 is the output from the interpolated odd filter.
The second way of estimating phase has been chosen in the implementation, but
the difference in appearance compared to the first method is negligible. In the case
of a perfect one-dimensional signal there will be no difference at all between the
methods.
4.1.1 Representation
To make the phase description invariant to rotation of the input signal, a three-dimensional representation is necessary. The parameters involved in the representation are the energy, the phase angle and the orientation. The chosen representation
fulfills the three constraints, a), b) and c), from above. The representation builds
on a spherical coordinate system, see Figure 4.4, and was originally proposed by H.
Knutsson. In this system the coordinates correspond to

θ = θph        the phase
ϕ = ϕori/2     half the orientation estimate                                       (4.2)
r = E          the energy

where the energy measure can be chosen as the sum of the energies from each filter, i.e. √(Σk (qko² + qke²)), or as the energy from the interpolated filter, i.e. √(qeϕ0² + qoϕ0²).
The slight modification here from the common definition of a spherical coordinate
system is that θ ∈ [0, 2π[ and ϕ ∈ [0, π]. Although it is possible to rewrite it in the standard form, this notation is kept in order to get an appreciation of what is
happening in the chosen space when changing the input.
The representation fulfills the proposed requirements for a good representation,
since it is
• invariant to rotation of the input,
• possible to average, where the mean vector equals the mean of the signal without any drift.
The last point is obvious, but the first one needs some contemplation. The
Figure 4.4: A spherical coordinate system.
invariance can be proven by introducing

x = r sin θph cos ϕ
y = r sin θph sin ϕ                                                                (4.3)
z = r cos θph
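As a small illustration of Eqs. 4.2 and 4.3, the sketch below (with assumed argument names, not the thesis code) maps an energy, a phase angle and a double-angle orientation estimate to the three Cartesian components used in the result images.

    import numpy as np

    def phase_vector(energy, theta_ph, phi_ori):
        # r is the energy, theta the phase angle and phi half the
        # "double angle" orientation estimate (Eq. 4.2), mapped to
        # Cartesian coordinates according to Eq. 4.3.
        phi = phi_ori / 2.0
        return np.array([energy * np.sin(theta_ph) * np.cos(phi),
                         energy * np.sin(theta_ph) * np.sin(phi),
                         energy * np.cos(theta_ph)])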
The easiest way to define phase angle and orientation is according to Figure 4.5.
θph changes signs abruptly in one specific orientation, ϕori . This orientation can
be chosen as ϕori = 0 without loss of generality. By halving the “double angle”
representation of orientation, the discontinuity will occur not between ϕori = 0 and
ϕori = 2π, but between ϕ = 0 and ϕ = π. Let us then examine what happens in the
representation space in these orientations.
What happens when ϕ = 0: Figure 4.5 defines θph = +π/2, then

x = r sin(π/2) cos(0) = r
y = r sin(π/2) sin(0) = 0                                                          (4.4)
z = r cos(π/2) = 0
Now let us look at ϕ = π and θph = −π/2

x = r sin(−π/2) cos(π) = r
y = r sin(−π/2) sin(π) = 0                                                         (4.5)
z = r cos(−π/2) = 0
A popular way of describing the representation would be to say that the orientation changes its sign at the same place as the phase does, and that these changes
compensate each other.
Figure 4.5: Definition of phase, θph, and orientation, ϕori.
By noting that the phase estimate is insensitive to the orientation when θph = 0
or θph = π, meaning centers of lines or blobs, the examination of the discontinuity
points show that this space is invariant to rotation of an input edge or line.
4.1.2 Experimental Results
To complete the presentation of the three-dimensional phase description two examples of the representation on both a synthetic and a natural image will be presented
in Figure 4.6 and Figure 4.7 respectively. The images are divided into four sections.
The upper left part is the x-component, the upper right is the y-component, the
lower left is the z-component and the lower right part is the original image. Before
visualisation the components have been normalized to the interval between zero,
meaning black, and one, meaning white. In these images it is easy to see that the
chosen three-dimensional phase representation is continuous in each of its Cartesian
components.
The filter used for the radial part is the one described above. The angular function is

cos²(ϕ − ϕk)    where    ϕk = kπ/4,  k = 0, 1, 2, 3
4.2 Feature Scale-Space Utilizing Phase
This section describes a new algorithm which detects in what scale an event appears
and also in what scale it disappears. In this way the scale-space is subdivided into a
Figure 4.6: Example of the 3D-phase description of a natural image.
number of intervals. Within each scale interval a consistency check is performed to
get the certainty of the detection. High certainty means visually important features.
The lowpass pyramid that is calculated as a first step in the algorithm is similar to
the one Marr, [78], proposed. Hence the sampling in scale will be on octave basis.
The pyramid will typically contain images from 512x512, 256x256, . . . , down to
16x16.
It is shown that by using the three-dimensional phase representation of image
data from Section 4.1, it is possible to do both the splitting and the consistency
check in a simple manner. The scale levels between different events are detected
when a particular dot product becomes negative and the consistency will be a vector summation in the interval between these scales. The specific levels where a
split occurs will, of course, be contextually dependent. There will also be different
numbers of levels in different parts of the images.
To get scale invariance in the feature extraction, the chosen feature must be
estimated in all, or at least many, scales. The sample density in resolution has
been tested with both octave differences and half octave differences. Tests have
shown that nothing is gained by doubling the sample density. This is not surprising,
since the calculation of the frame-bounds in Section 3.2.3 indicates that almost all
information is captured in an octave based pyramid using lognormal filters. The
objective of the implementation is to estimate the phase of successively smaller
images from the lowpass pyramid, suggested in Appendix 4.A. By using exactly the
same set of quadrature filters at all size levels in the lowpass pyramid, an octave
based bandpass pyramid is obtained.
The filter function chosen is given in Eq. 3.16 with the center frequency ωi = π/(2√2) and the bandwidth B = 1.5, as in the frame
calculations of Section 3.2.3.
Figure 4.7: Example of the 3D-phase description of a synthesized image.
4.2.1 Scale Space Clustering
The phase will always be dependent on the scale of the filters used to estimate it.
Figure 4.8 shows the responses from filters with different frequency response applied
to an edge. The phase response will be independent of the frequency responses
of the filters if there is a scale invariant event, such as an edge or a line, and the
filter is centered upon this event. This clearly indicates that scale analysis used for
deciding signal phase and phase consistency over scale is an important tool in image
processing. The proposed representation has been implemented in a scale pyramid
and it has proved to be robust in rather noisy situations.
The description of scale space clustering is divided into two sections. The first
section concerns isolated events, where no clustering is necessary. The second section
is an extension of the first part in the sense that events need not necessarily be
isolated. In the second section the events are allowed to be objects within objects,
what in the sequel will be referred to as nested events. In this case the scale space
must be divided into different sections originating from different nesting levels.
Figure 4.8: Filter responses to a step edge from different filters (high, middle and low frequency).
Scale Analysis for Isolated Events
One main idea in the proposed algorithm is to do scale analysis as pixelwise operations through all scales. To be able to do this between different scale levels in a
simple manner the images must have the same size. The interpolation scheme proposed in Appendix 4.A can be applied, and afterwards the analysis will be restricted
to pixelwise operations.
The scheme for interpolation to a higher sampling rate described for grey scale
images can easily be extended to vector images, such as for example the result of
the phase estimation. The only difference is that the averaging must be a vector
averaging. Vector averaging can be performed as a convolution, in which the magnitudes of the vectors are weighted with the proposed interpolation function and the
summation is a vector summation. Applying a proper interpolation function at each
level of the phase pyramid results in a high tower, in which images at all scale levels
have the same size. An example of this is given in Figure 4.9, where three samples
of scales are presented. The lowpass image and the corresponding phase image are
displayed pairwise in Figure 4.9. The phase images are displayed in their rectified
form, meaning that green corresponds to 0◦ (locally bright areas), blue to ±90◦
(unsigned edges) and red to 180◦ (locally dark areas), see also Figure 4.2.
Several combinations of the images from different scales have been tested. One
combination is to find the scale level that has the highest energy. This is similar to
the algorithm in [23], but using phase representation over the whole image. It turns
out that this gives a discontinuous result image, since the chosen level can emanate
from drastically different scales in adjacent pixels. The maximum criterion is thus not a well-behaved combination over scale.
Another variation is to find the maximum pixelwise vector sum of phase images
of adjacent scale levels according to Eq. 4.6.
The sum, C, can be expressed in a Cartesian coordinate system as

C(x, y, z) = ( Σ_{l=i..j} xl , Σ_{l=i..j} yl , Σ_{l=i..j} zl )                     (4.6)

where x, y and z are defined in accordance with Eq. 4.3, and i and j are indices of the scale levels that contribute to the sum.
The vector sum can be regarded as a moving average of the samples in scale.
Hence the maximum over the sums gives a smoother function than merely taking
the maximum. The sum is not only taken for pairwise adjacent scale levels but
triples and quadruples and so on are likewise regarded. The lengths of the resultants are normalized so that the resulting length is independent of the number of
vectors forming the sum. The maximum of these vector resultants gives a good
representation of the original image, at least as long as the event is isolated. It was
then discovered that a vector sum involving all scale levels gives a slightly better
representation and is much simpler to implement.
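A minimal sketch of this total vector summation, assuming the phase vectors of all scale levels have already been interpolated to the same image size and stacked into a hypothetical array phase_tower of shape (levels, rows, cols, 3):

    import numpy as np

    def phase_consistency(phase_tower):
        # Pixelwise vector sum over all scale levels (Eq. 4.6 with the
        # indices i and j spanning the whole tower); the magnitude of
        # the resultant serves as the consistency measure.
        resultant = phase_tower.sum(axis=0)
        magnitude = np.linalg.norm(resultant, axis=-1)
        return resultant, magnitude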
The vector summation over scale as a consistency measure is possible because
of the chosen representation, where the length of the vector tells about energy, or
consistency, in the specific frequency band, and the angles are the phase angle and
Figure 4.9: Three samples of differently scaled images both as lowpass images and
represented as phase.
the orientation. If the operation makes the same “statement” in all scales, the
output magnitude will be very large. If, on the other hand, the estimation is not
the same for different scales the magnitude will go down, see Figure 4.10, which is
an example of a two-dimensional representation.
Figure 4.10: Sums of consistent and of inconsistent vectors.
To conclude, an example of the sum of phase vectors is given in Figure 4.11. The
left image is the total phase summation image of the image in Figure 4.9 and the
right one is a map of phase angles from color to grey scale. Note that almost all
information from the original image is represented in the phase summation image.
The drawback of this solution is the restriction to isolated events. In a case where
there are nested events, the result will be a combination of the phase vectors of all
objects. The result will not correspond to either of the underlying events but to
some kind of average of them. This average is, of course, not a good representation
of a nested event. To take an everyday example, it is like averaging two roads that
are situated to the left and to the right of a tree respectively. The average will be a
non-existent road straight through the tree.
Scale Analysis for Nested Events
To be able to handle nested events in a consistent way, a subdivision of the scale
pyramid, or actually of the scale high-tower, is necessary. This is based on how
events behave in a blurring process. As stated earlier, each item manifests itself in
a particular scale interval. If there is more than one object at a certain place, there
is more than one maximum in the scale dimension. To distinguish between these
events a subdivision is necessary.
To follow the reasoning consider the stylized image in Figure 4.12. The lines
Figure 4.11: A phase summation image in its rectified form, left, and mapped to a
grey scale image, right.
are the grey level intensity at different scales and the vectors indicate the phase
responses. Observe that the angles of the vectors normally change smoothly between
adjacent scale levels. The exception is when the small minimum in the middle is
“swallowed” by the larger maximum. Between level 2 and level 3 these particular
vectors turn from 180◦ (←) to 0◦ (→). The implication of this is that a split should
occur between these levels in the area of the small minimum. The small minimum
should be analyzed only in the two scale levels with the best resolution to avoid
interference with the maximum. The large maximum, on the other hand, should be
analyzed in level 3 and in more blurred versions of the image.
The subdivision of the scale pyramid is a splitting process. The essence of this
process is a dot product between phase vectors from two adjacent scale levels. If the
dot product is positive, then the event is said to exist in both scales, and no split
should occur. If it is negative, a split between these levels should take place. The
negative sign of the dot product implies that the phase vectors of the regarded scale
levels originate from different events. The proposed algorithm results in a nested
description of the image. The number of levels in the description is contextually
dependent and equals the number of nesting levels in the image. After the subdivision
of the pyramid each interval only contains one isolated event at any given location
of the image. This implies that the vector summation can be used as a consistency
measure.
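The splitting criterion can be sketched as follows (an illustration under the same hypothetical phase_tower layout as above, not the thesis implementation):

    import numpy as np

    def split_map(phase_tower):
        # Pixelwise dot product between the phase vectors of adjacent
        # scale levels; a negative product indicates that the two levels
        # originate from different events, so a split is placed there.
        dots = np.sum(phase_tower[:-1] * phase_tower[1:], axis=-1)
        return dots < 0  # (levels-1, rows, cols) boolean split map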
Some examples of the performance of the algorithm are shown in Figure 4.14 and
Figure 4.15. The images are split into four sections, where the upper left part is the
scale interval with the finest resolution. The upper right part is the next interval
Figure 4.12: A stylized example of a blurring process for nested events (levels 1 to 3).
with larger scale, the dark parts mean that there is only one scale interval in these
areas. The lower left shows if there are any third intervals in the images. Finally, the
lower right part indicates where in scale the splitting levels occur. The intensity indicates where the first split level is; low intensity means that the first split occurs at a rather high frequency level. The hue reflects whether there is any second split and on what level this takes place. The colors in the phase images correspond to phase angles without signs going from green to blue to red, meaning 0, π/2 and π angles. The intensities in these images are taken directly from the vector sum in their respective scale interval.
The subdivision of the pyramid is not only valid for phase estimations but can be
used as contextual control for other features. Orientation and spatial frequency are
two examples of such features. Hence it is possible to give a description of an image
event containing many aspects that are estimated in the correct scale interval.
4.2.2 Spatial Frequency Estimation
An algorithm for spatial frequency estimation will be described and results on synthesized images as well as on natural images will be presented. The algorithm
presented here uses the separate results from each level of the bandpass pyramid to
estimate spatial frequency. Spatial frequency is known to be a good discriminator
between different kinds of textures, [61, 16]. This property is further enhanced when
it is combined with a phase-gating process. The purpose of the gating is to ensure
that only neighborhoods in which spatial frequency is a good descriptor are utilized
Figure 4.13: The original test image.
in the classification.
A Spatial Frequency Algorithm and Representation
The chosen algorithm is intimately connected to the chosen representation. The
belief is that spatial frequency can be represented in a modular, or circular, manner.
The inspiration to this is the fact that one, and only one, color is perceived when
humans observe light with an arbitrarily complex (time) frequency distribution. The
color is mainly determined by the frequency component with the largest energy. If
all different frequencies contained in the perceived light are represented as vectors
pointing in a color circle, see Figure 4.16, and the lengths of the vectors reflect the
energy of the represented frequency, then the perceived color would be determined
by a vector sum of these vectors.
The key idea in the spatial frequency estimation is to decide the major frequency
in a given neighbourhood and, if many frequencies are detected simultaneously, to
lower the certainty. In the modular representation this is implicitly given. It will
be possible to do spatial averaging directly upon the result image. A conventional
linear representation of spatial frequency would not be easy to do averaging on, since
a drift towards the middle of the represented interval would occur. The parallel to
the decrease in confidence with increasing bandwidth in color spaces, is the change
of color to white, in which all (time) frequencies are “contained” and the saturation
equals zero.
To obtain the circular representation, a vector combination of the energies from
the levels of the phase pyramid, which is a bandpass pyramid, with fixed axes is
executed in accordance with Figure 4.17, where E6 corresponds to the energy from
Figure 4.14: A phase description of a synthesized image.
the lowest resolution, and E1 corresponds to the energy from the highest resolution. This example is from a pyramid with six levels, but the strategy can be used
with any number of levels. The only condition is that the axes of summation are
evenly spread over the unit circle. In Figure 4.17 there is also an intuitive interpretation of what different angles mean in the circular spatial frequency representation.
The translation of these angles to corresponding colors is also indicated to make
understanding of the result images easier.
The section between E1 and E6 in Figure 4.17 is due to the modular representation
and is a parallel to the so-called purple line in color perception. In the case of spatial
frequency the statements in this section mean mixed frequencies, both very high and
very low ones, in the same area.
The representation will be a complex number in accordance with the hierarchical scheme proposed in [39]. In a general case with a pyramid with N levels, where level 1 is the one with the lowest resolution, the mapping to spatial frequency, Fre(m, φ), looks like

Fre(m, φ) = Σ_{k=1..N} Ek exp(jπ(−1 + (2k − 1)/N))                                 (4.7)

where m is the confidence of the estimate and φ is the estimate.
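A sketch of Eq. 4.7 for one neighbourhood, with the band energies assumed to be ordered from the coarsest level (k = 1) to the finest (k = N):

    import numpy as np

    def spatial_frequency(energies):
        # Place the band energies on axes evenly spread over the unit
        # circle (Eq. 4.7) and sum them as complex numbers.
        N = len(energies)
        k = np.arange(1, N + 1)
        axes = np.exp(1j * np.pi * (-1.0 + (2.0 * k - 1.0) / N))
        fre = np.sum(np.asarray(energies, dtype=float) * axes)
        return np.abs(fre), np.angle(fre)  # confidence m, estimate phi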
To analyze the estimate, a plot has been calculated for a single frequency as input.
The plot can be found in Figure 4.18 with logarithmic frequency on the horizontal
axis. The upper part is a plot of the confidence, m, in Fre and the lower part is the
estimate, φ.
Figure 4.15: A phase description of a natural image.
Observe how linear the response is in an interval over 4 octaves. The possibility to extend the interval by normalizing the estimate with the response curve in
Figure 4.18 is apparent and it seems reasonable to get a linearized estimate over
6 octaves. The plot of the confidence is almost constant over the linear interval.
The maximum of the confidence for high frequencies comes from the fact that in
Figure 4.18 the band limitation of the original image is assumed to be caused by
an ideal lowpass filter. The filter with the highest peak frequency operates directly
upon the original image, while the other filters operate on images that have been
filtered with a Gaussian in the subsampling.
If spatial frequency is supposed to be used as a size descriptor, then the estimate must be object-oriented. Thus a problem that has not been mentioned yet in
connection with frequency estimation appears: the existence of nested events in the
image. To get a proper size description of an object, the estimation must be done
in the correct scale interval, and other scales must not interfere in the description.
This problem was solved in the case of phase estimation by a subdivision of the scale
pyramid. The solution can also be used in frequency estimation. The most natural
way is to use the very same subdivision. This implies that the index in Eq. 4.7
should be restricted to the considered scale interval, and the restriction should be
controlled by the subdivision of the phase pyramid. The description of the image
will then be both a phase estimate and a frequency estimate, originating from the
same scale interval.
The presentation will be concluded with some examples of results from the estimations, see Figure 4.19 and Figure 4.20. The images are, as in the case of phase
estimation, divided into four sections. The partitioning is the same as in the phase
images, Figure 4.14 and Figure 4.15. The difference is that the hue now means
spatial frequency. The colors that reflect the frequency, in order from low to high
Figure 4.16: An approximation of a psychological color space as a circle (gradually changing hue around the circle, from violet through blue, blue-green, green, yellow-green, yellow, orange and red to purple, with saturation increasing outward from white at the center).
frequencies, are orange, green and blue which also is indicated in Figure 4.17. The
mixed frequencies are displayed in a reddish hue. The intensity is a combination of
energy in the input and the bandwidth, high bandwidth lowers the intensity.
Phase-Gating
The phase-gating concept is a postulate that estimates from an arbitrary operator
or algorithm are valid only in particular places, where the relevance of the estimate
is high. To take an example in the context of frequency estimation, the size of an
object should be decided independently of how sharp the edges are, thus only the
even component of a quadrature filter should be involved in the frequency estimation.
First a test was performed on a test image containing ten circles with different radii,
Figure 4.21. The radii range from 4 pixels to 90 pixels, and the increase of the radii is half-octave based, i.e. the radius is increased by a factor of √2 from one circle to the next. The result from the phase gating is given in yellow in Figure 4.21, where the original image has been corrupted with white additive Gaussian noise¹. To
illustrate the performance of the proposed spatial frequency algorithm the mean of
the angles, φ from Eq. 4.7, inside each phase gated area is plotted in Figure 4.22
versus the radii of the circles.
The second experiment was to test the discrimination ability of the size descriptor for texture segmentation. The image to classify is the one in Figure 4.9 and the only
feature used is spatial frequency. Figure 4.20 shows the spatial frequency estimate
for the texture image to be classified. The scale interval with the highest consistency,
i.e. in the interval with the highest vector sum resultant in the phase pyramid, is
¹The standard deviation of the noise was 12.8 and the amplitude of the signal 153.
Figure 4.17: Left: A six level example of how the frequency algorithm works, with the energies E1 to E6 summed along fixed axes (separated by π/3) spread over the unit circle. Right: An intuitive explanation of what the angles mean in the circular spatial frequency representation.
used in the classification to simplify calculations.
By using phase-gating, the frequency estimate is decomposed into one "dark" and one "bright" frequency estimate, or one estimate for the background and one for the object. The bright frequency estimate is taken as a size descriptor of locally bright areas of the image. The phase from exactly the same scale interval as the frequency controls the division. Phase angles around 0◦, meaning locally bright areas, are masked upon the frequency estimate and thus a "bright" frequency estimate is created; the "dark" frequency estimate is created in a similar way. The construction of these images is defined in such a way that edge frequencies, where the phase angle is ±90◦, are neglected.
These two images of size 256x256 have, before being used as feature images in the classifier, been averaged with a 31x31 average filter to use larger areas in the calculation of the estimate. The averaging also spreads data from the center point of the objects outward, making the classification possible over the whole texture.
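The gating and averaging steps could be sketched like this (the π/4 tolerance around the bright and dark phase angles is an assumption for illustration; it is not specified in the text):

    import numpy as np
    from scipy.ndimage import uniform_filter

    def gated_frequency(freq_est, phase_angle, tol=np.pi / 4):
        # Keep the frequency estimate only where the phase marks a
        # locally bright (theta ~ 0) or locally dark (theta ~ pi) area;
        # edge responses (theta ~ +-pi/2) are neglected.
        bright = np.where(np.abs(phase_angle) < tol, freq_est, 0.0)
        dark = np.where(np.abs(np.abs(phase_angle) - np.pi) < tol,
                        freq_est, 0.0)
        # 31x31 averaging spreads the estimates over larger areas.
        return uniform_filter(bright, size=31), uniform_filter(dark, size=31)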
The classification scheme is as follows:
1. Train the classifier in some areas of textures. This is implemented as a painting
in the original image, see Figure 4.23. Different textures are painted in different
colors.
2. The classifier collects statistics under the training areas from the feature images.
3. A classification of the whole image is performed. The chosen classifier is a
Maximum-Likelihood classifier and the result is found in Figure 4.24.
The results prove that spatial frequency is a good texture descriptor, as also described in [61, 99, 16], at least if it is combined with phase-gating. The only area
that has been misclassified is the one in the upper left texture. In the misclassified
Figure 4.18: The characteristics of the proposed frequency estimation method. The lower part is confidence and the upper part is the estimate of spatial frequency plotted against a logarithmic frequency scale.
Figure 4.19: A spatial frequency description of a synthesized image.
area, one can see that the size of the pellets is the same as the size of the holes in
the sea-weed, and that the distance between the pellets is the same as the size of
the sea-weed. In this area, size is not a good discriminator, and the result shows
this as well.
4.2.3 Conclusions
This chapter has described a study of an extension of the wavelet transform to 2D
signals. The phase of the transform is utilized and a continuous representation of
the phase is presented. A new algorithm has been presented, which detects in what
scale an event appears and also in what scale it disappears. The scale space is in this
way subdivided into a number of intervals isolating individual events. Within each
scale interval a consistency check is performed to get the certainty of the detection.
It has been shown that by using a three-dimensional phase representation of image
data, it is possible to do both the subdivision and the consistency check in a simple
manner.
A classification example with a size descriptor from the proposed scale analysis
scheme has shown some of the possibilities with this approach.
Figure 4.20: A spatial frequency description of a natural image.
Figure 4.21: The original test image containing ten circles with some additive noise. The areas marked by the phase gating are given in yellow.
Figure 4.22: The result of the spatial frequency estimation on Figure 4.21 plotted versus the logarithm of the radii of the circles.
Figure 4.23: The training areas for the classifier.
Figure 4.24: Classification result of the texture image.
Appendix 4.A Gaussian Resampling of Images
To provide a simple interaction between different levels of the pyramid, it is preferable
that all images have the same size. To preserve spatial position, a combination of odd- and even-sized lowpass filters is required, such that a blob in a fine resolution image keeps
its center position if it goes through a subsampling followed by an interpolation to the
original size. The lowpass pyramid, which is built with the proposed algorithm, is the
traditional one, with one octave between each level. In other words, a subsampling from
an image of size 2N x 2N to 2N −1 x 2N −1 will occur between each level. The resampling
process is schematically illustrated in Figure 4.25.
Figure 4.25: Illustration of the subsampling and the resampling process (original image, LP-filtering, subsampled image, block expansion, filtering, image resampled to original size). The circles indicate center positions of Gaussian lowpass filters. The numbers refer to the outputs from these filters.
4.A.1 Subsampling
The subsampling procedure is explained by a general example. Start with an original image
of 512x512 pixels, which will be the bottom of the pyramid. To avoid aliasing effects in
the subsampling this image is filtered before the subsampling. The filter proposed here
has the following frequency function

G(ωx, ωy) = exp(−(ωx² + ωy²)/(2σ²))                                                (4.8)

The corresponding spatial function is

g(x, y) = √(2π) σ exp(−(x² + y²)σ²/2)                                              (4.9)

and the parameter σ = π/4 is chosen to minimize aliasing effects for a subsampling by a factor of two, and at the same time to preserve as much information as possible in the smaller image.
In the subsampling an even-sized filter is used, moving the sampling grid half a pixel from the usual grid, i.e. (−1/2, −1/2) is the origin for the filter. Filter position [i, j], where i and j are integers, will have the value

g(i − 1/2, j − 1/2)
The center of the filter is then defined in the middle of the four pixels that are connected
to the same pixel in the smaller image.
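A sketch of the subsampling step under these definitions (the 8x8 filter size is an assumption; the thesis does not fix it here):

    import numpy as np
    from scipy.signal import convolve2d

    def even_gaussian(size=8, sigma=np.pi / 4):
        # Even-sized spatial Gaussian, Eq. 4.9, sampled on the half-pixel
        # grid so that position [i, j] holds g(i - 1/2, j - 1/2).
        c = np.arange(size) - size / 2 + 0.5  # ..., -1.5, -0.5, 0.5, ...
        xx, yy = np.meshgrid(c, c)
        g = np.exp(-(xx**2 + yy**2) * sigma**2 / 2)
        return g / g.sum()  # normalized here to unit DC gain

    def subsample(image):
        lp = convolve2d(image, even_gaussian(), mode='same', boundary='symm')
        return lp[::2, ::2]  # keep every other row and column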
To continue our subsampling example: after the convolution with the filter, the image is resampled by a factor of two to a 256x256 image. The resampling is done by retaining only every other column and every other row in the filtered image. Almost all aliasing effects have been removed because of the filtering. The effects from the fact that Gaussians are not strictly bandlimited, and the choice of σ from above (the filter has a value of about 13.5 % of its top value at the frequency ω = π/2), are so small that they have not been detected in any, of many, test images. This 256x256 image will be the next level in the
lowpass pyramid.
The resampling from 512x512 down to 256x256 pixels moves the frequency spectrum
one octave so that ω512 = π/2 ←→ ω256 = π. It is therefore possible to apply the same
Gaussian lowpass filter on this image and resample it to a 128x128 image. The number
of operations in the convolution of the 256x256 image will be a factor of four less than it
was on the 512x512 image.
The result after the described procedure is a pyramid that contains successively smaller
Gaussian images of sizes 512x512, 256x256, . . . , 16x16. The largest image is not filtered
with a Gaussian, but it is assumed to be very near Gaussian, because it is more or less
blurred before it is sampled and therefore bandlimited, (hopefully Mother Nature does
band limitation in a Gaussian fashion).
4.A.2 Interpolation to a Higher Sampling Rate
The procedure for interpolation to a larger image can be divided into two steps:
1. Expand the smaller image, so that each pixel becomes four in the larger image.
2. To avoid block effects in the larger image, the spectrum has to be cut at ωlarger = π/2,
because higher frequencies cannot be represented in the smaller image. The filter
described below is applied for this purpose.
The first step is an ordinary block-expansion, and the filter used for interpolation, in
the second step, has the same frequency function as the filter used for subsampling, see
Eq. 4.8 and Eq. 4.9, but is spatially differently sampled. The usual sampling grid is used
in the interpolation scheme, i.e. g(0, 0) corresponds to the center of the filter and g(i, j)
to filter position [ i , j ] , where the size of the filter will be an odd number.
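A corresponding sketch of the two interpolation steps (block expansion followed by the odd-sized Gaussian; the 9x9 size is again an assumption):

    import numpy as np
    from scipy.signal import convolve2d

    def expand(image_small, size=9, sigma=np.pi / 4):
        # Step 1: block expansion, each pixel becomes four.
        big = np.kron(image_small, np.ones((2, 2)))
        # Step 2: odd-sized Gaussian on the ordinary grid, with g(0, 0)
        # at the center, cutting the spectrum at about pi/2.
        c = np.arange(size) - size // 2
        xx, yy = np.meshgrid(c, c)
        g = np.exp(-(xx**2 + yy**2) * sigma**2 / 2)
        return convolve2d(big, g / g.sum(), mode='same', boundary='symm')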
The combination of resampling and filtering is an approximation of a larger filter. If this
larger filter could be applied it should have an odd size to preserve spatial position. Thus
the restriction of preservation of position requires that the total filter, after a subsampling
followed by an interpolation to the original size of the image, must have an odd size. The
motivation for using even-sized filters is that an expansion from one image size to a larger
one can be performed as a convolution with an even-sized filter. An octave based block
expansion is the same as to add a column and a row between all old rows and old columns
followed by a convolution with a 2x2 filter with the value one in all four positions.
The filtering procedure for subsampling and interpolation to the original size is consequently a combination of a filter of even size (in the subsampling) with another filter of
even size (in the expansion), and finally a convolution with a filter of an odd size (in the
interpolation) which means that the total filter is an odd-sized filter as required.
Part II
2D — Adaptive Filtering
Chapter 5
Adaptive Filtering of 2D Images
For one-dimensional signals, adaptive filtering often means distinguishing between
signal and noise. After an estimation, the filter adapts to the signal in an optimal
way, in most cases according to a least squared error criterion. One example of this
category is the Wiener filter, e.g. [85]. This filter is directly extendable to higher
dimensions, e.g. images. However, the stationary 2D filter has a tendency to blur
image details, such as lines and edges, due to its lowpass character.
The idea of the presented adaptive filtering strategy is to enhance detectability,
both for a human observer and for algorithms, of features in an image and also to
reduce the noise level in the signal. Image enhancement techniques, where the size
of a smoothing filter is adjusted according to the signal and the filtering might be
anisotropic, i.e. directed along some prescribed orientation, are typical applications.
Examples of strategies, where the filter locally adapts to the signal are [66, 1, 5, 36,
21].
In this context a number of interesting parallels to biological systems become
relevant. The human visual system is much more sensitive to noise in spatially
broad, flat, low frequency regions than at edges. Thus an adaptive filtering scheme
should preferably perform smoothing over large regions in low frequency areas and
smoothing over small regions in high frequency areas, e.g. near edges. The effect of
noise in the vicinity of a directed feature depends on its orientation relative to the
feature. Noise in the same orientation may enhance detectability, and noise in the
perpendicular orientation will reduce it, [71, 79, 58].
This chapter will describe an idea and an implementation of an anisotropic filtering strategy. The similarities between the enhancement algorithm in [66] and the
presented tensor controlled adaptive filtering are discussed.
5.1 Representation and Estimation of Orientation
The fundamental idea behind this work is that local one-dimensionality, i.e. a neighbourhood with variations in just one orientation, is a generally useful model for vision systems. This means grey scale variations such as lines or edges, which are the
foundation of many image processing algorithms. These kinds of one-dimensional
structures are defined by two different properties; first, by their position, and second,
by their orientation. One way of estimating the first aspect is to use a local window
in the scene, i.e. the image, to decide the position. Inside this window, or neighborhood, a number of directed filters are applied and the filter outputs combined to
estimate the local oriented energy, [39, 67]. This oriented energy corresponds either
to an edge, a line or a mixture between such events. The algorithm described below
also gives a certainty measure that tells how well the neighbourhood is described by
the one-dimensional model.
The estimation of orientation can be achieved by splitting up the Fourier transform and measuring the energy contribution in the different sections. If an orientational field is to be used as contextual control of an adaptive filter, a continuous
orientation representation simplifies the procedure. This enables relaxation of orientation, i.e. using larger areas to update the certainty of the estimate, as simply as by local averaging in the representation space.
There is one representation constraint for orientation, which has not been mentioned. A line which is rotated π radians will have the same orientation as the original
line. An appropriate representation should naturally map both these lines to the
same statement. For edges there is a possibility to have a sign meaning brighter
to the right or to the left. For more complex fields than grey scale images it is
impossible to talk about the direction of border edges between areas. Considering
color, it is not possible to define that red < blue < green or something similar. The
only reasonable statement which can be made is that we have a borderline between
differently colored areas.
To handle directional data in a proper way the directional aspect must be treated
correctly. In the case of tangent directions without sign, this means that the data
will be distributed over an angle interval [ϕ0 , π + ϕ0 [. If the problem then is to find
the mean of a particular distribution, it is easy to show that the mean will depend on
ϕ0 . Consider for example a distribution that contains only two samples, the angles
ε and π − ε, where ε is a small value. If the interval is defined as [0, π[ the mean will
be π/2, but if the interval is defined as [−π/2, π/2[ the mean will be 0. This is of
course not acceptable, and there are statistical distributions that can handle these
problems, see [77] for an extensive overview.
A general representation of the orientation in image processing can be achieved
using a “double angle” representation, of the tangent direction, [39]. This means
that a line with the orientation ϕ will be mapped onto the same statement as a
line with orientation (ϕ + π) because 2ϕ = 2(ϕ + π), since the angular variation is
modulo 2π. The same holds for edges, where it is the same as ignoring the sign.
This vector representation can be used not only on grey scale images, but on every
type of images. The magnitude of the vector has been used as a confidence measure
of the actual estimate, which is represented by the argument of the vector, [40].
The main advantage with this representation is that orientation will be described in
a continuous way, making ordinary signal processing concepts, e.g. averaging and
differentiation, meaningful.
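The averaging behaviour of the double angle representation can be sketched as follows; note that the two samples ε and π − ε from the example above average to orientation 0, independently of the choice of interval:

    import numpy as np

    def mean_orientation(angles, weights=None):
        # Map each orientation phi (defined modulo pi) to a unit vector
        # at angle 2*phi, average the vectors, and halve the argument.
        z = np.exp(2j * np.asarray(angles, dtype=float))
        if weights is not None:
            z = z * np.asarray(weights, dtype=float)
        m = z.mean()
        return np.angle(m) / 2.0, np.abs(m)  # estimate and confidence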
Presenting features in a general form independent of the particular feature has
many advantages. One benefit of this vector representation is data visualization
using color. By letting the magnitude of a complex number control the intensity of
the pixels and letting the argument control the hue, an image is constructed that
creates an impression of the data information that is easy to interpret. High intensity
means high confidence in the estimate, while the color states the class membership.
In areas with high confidence the color will be clearly visible, while areas with low
certainty will be dark, and the color will be harder to discern.
5.1.1 Orientation Representation
The orientation estimates can be represented in a continuous way by tensors, [59].
These tensors are, for edges, in essence the scaled outer product of the gradient
direction, but on real 2D data there exist, of course, cases where there is more than
one orientation, i.e. the scalar function is not locally one-dimensional. Neglecting
this fact for a moment, the tensor in the ideal case of a strictly one-dimensional
neighbourhood can be written
T = (1/‖x‖) x xᵗ = (1/‖x‖) [ x1²   x1x2 ]
                            [ x1x2  x2²  ]                                         (5.1)

where x = (x1, x2)ᵗ is a vector pointing in the direction of the local gradient and ‖x‖ = √(x1² + x2²) is the norm of x.
The tensor, T, fulfills three basic requirements for a proper orientation representation, [68]:
1. The uniqueness requirement, i.e. there is just one representation point for
each orientation. This is not trivial, considering that the orientation of a line is
modulo π. This means that the orientation of a line with the gradient direction
x should be mapped onto the same point as a line with the orientation −x. This
ambiguity was resolved by Granlund [39] by doubling the angle of the gradient
orientation. The tensor mapping, Eq. 5.1, also fulfills this requirement since
the tensor is a quadratic form and the sign is thus neglected.
2. The uniform stretch, i.e. the representation must be equally sensitive to
orientation changes in all orientations. The implication of this is that the angle
metric of the orientation is preserved, i.e. similar events are represented “close
to each other” and the representation does not have discontinuities.
3. Polar separability, i.e. the estimated orientation should be represented independently of the strength of the signal, and the norm of the tensor should be invariant to rotation of the signal. The tensor norm is then possible to use as
a certainty measure not influencing the actual estimate.
5.1.2 Orientation Estimation
Recall from Chapter 3 that a real valued 2D function, e.g. an image, has a hermitian
Fourier spectrum, i.e. F (ω) = F ∗ (−ω), where ∗ denotes complex conjugate. Locally
one-dimensional signals, such as lines and edges, will have an energy distribution on
one single line in the Fourier domain.
The orientation algorithm then requires a number of precomputed quadrature
filters evenly spread in one half of the Fourier space. The minimum number of
quadrature filters required for orientation estimation in 2D is 3, [67], where the
filters are evenly distributed, i.e. the angular difference of two filters is π/3. In the
following these orientations are taken as 0, π/3 and 2π/3, or in vector notation:
n̂1 = 0.5 ( 2, 0 )ᵗ
n̂2 = 0.5 ( 1, √3 )ᵗ                                                                (5.2)
n̂3 = 0.5 ( −1, √3 )ᵗ
A quadrature filter designed with a suitable bandpass function, Hρ (ω), e.g. the
lognormal function described in Chapter 3, is given by:
Hk(ω) = Hρ(ω)(ω̂ · n̂k)²   if ω · n̂k > 0
Hk(ω) = 0                 otherwise                                                (5.3)

where ω = (ω1, ω2) is the frequency coordinate, ω̂ is a unit vector directed along ω, and ω = √(ω1² + ω2²). The spatial filter coefficients are found by a straightforward
2D-DFT or by use of an optimization technique, see Appendix 3.A. The resulting
filter is complex-valued, with the real part being even and the imaginary part being
odd. This procedure is used to obtain three complex-valued quadrature filters.
The tensor describing the neighbourhood is given by, [60]:

Te = (4/3) Σk qk (Nk − (1/4) I)                                                    (5.4)
where qk denotes the magnitude of the output from filter k and I is the unity tensor.
Nk = n̂k n̂tk denotes the direction of the filter expressed in the tensor representation,
i.e.
N1 = n̂1 n̂1ᵗ = (1/4) [ 4   0 ]
                     [ 0   0 ]

N2 = n̂2 n̂2ᵗ = (1/4) [ 1    √3 ]                                                   (5.5)
                     [ √3   3 ]

N3 = n̂3 n̂3ᵗ = (1/4) [ 1    −√3 ]
                     [ −√3   3 ]
Rewriting Eq. 5.4 in terms of an algorithm:
1. Convolve the input data with the three complex-valued filters, i.e. perform six
scalar convolutions.
2. Compute the magnitude of each complex-valued filter output by

qk = √(qke² + qko²)

where qke denotes the output of the real part of filter k and qko denotes the output of the imaginary part of filter k.
3. Compute the tensor Te by Eq. 5.4, i.e.

Te = (4/3) [ T11   T12 ]
           [ T12   T22 ]

where

T11 = q1 + A(q2 + q3) − S
T22 = B(q2 + q3) − S
T12 = C(q2 − q3)

for

S = (1/4) Σ_{k=1..3} qk ,   A = 1/4 ,   B = 3/4 ,   C = √3/4
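Written out in code, steps 2 and 3 might look like this sketch (array-valued filter magnitudes assumed):

    import numpy as np

    def estimate_tensor(q1, q2, q3):
        # Eq. 5.4 written out with S = (q1+q2+q3)/4, A = 1/4, B = 3/4,
        # C = sqrt(3)/4; q1..q3 are the quadrature filter magnitudes.
        S = (q1 + q2 + q3) / 4.0
        T11 = q1 + (q2 + q3) / 4.0 - S
        T22 = 3.0 * (q2 + q3) / 4.0 - S
        T12 = np.sqrt(3.0) * (q2 - q3) / 4.0
        return (4.0 / 3.0) * np.array([[T11, T12], [T12, T22]])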
By studying the distribution of the eigenvalues and eigenvectors of Te a description of the neighbourhood is obtained. Utilizing the fact that T (and Te ) has a
regular form, i.e. it is symmetric and positive semidefinite, the eigenvectors can be
written in a simpler form.
Let e1 and e2 be the eigenvectors to Te , where e1 has the largest eigenvalue and
e2 has the least. The directions of e1 and e2 are denoted by ϕ1 and ϕ2 respectively.
Then
2ϕ1 = 2 arg(e1x , e1y ) = arg(T11 − T22 , 2T12 )
2ϕ2 = 2 arg(e2x , e2y ) = arg(T22 − T11 , −2T12 )
(5.6)
It can be noted, [94], that the two eigenvectors in the double angle space will be
parallel, but with reversed signs. This follows from the fact that the eigenvectors of
symmetric tensors, or matrices, are perpendicular. In 2D a doubling of the argument
of perpendicular vectors necessarily means that they are oppositely directed.
In [39, 67] an orientation algorithm is suggested by combining filter outputs directly in a double angle fashion. Combining the filter outputs vectorially according to Eq. 5.7 gives an orientation estimate corresponding to Figure 4.1.
ori = (4/3) Σ_{k=1..3} qk e^{2i(k−1)π/3} = m e^{iϕori}                             (5.7)

Direct calculation shows that

ϕori = arg(ori) = 2ϕ1

The magnitude m in Eq. 5.7 will depend on
1. the energy in the considered neighbourhood, i.e. two neighborhoods that only differ in their local energies will have different magnitudes, and the difference in magnitude will be linear w.r.t. the RMS of the energy.
2. how well the neighbourhood fits into the one-dimensionality model. A perfectly
one-dimensional structure will give rise to a higher magnitude than for example
a cross.
The magnitude, m, is therefore referred to as a certainty measure, and the angle,
ϕori , is the orientation estimate expressed in “double angle” representation. The
division of the result from operations in one part corresponding to the certainty
and one part corresponding to the actual estimate is a fundamental issue, originally
discussed in [39].
The reason to use a tensor as the representation is the richer description of the
neighbourhood. The spectrum theorem, which is found in all standard textbooks of
linear algebra, e.g. [43], allows us to write Te as
Te = λ1 e1 e1ᵗ + λ2 e2 e2ᵗ                                                         (5.8)
where λ1 is the largest eigenvalue to Te and λ2 the least.
This means that not only the largest eigenvalue is given, describing the energy in the
dominant orientation of the neighbourhood, but also the least eigenvalue describing
the energy in the perpendicular direction.
The correspondence between the eigenvalues and the magnitude of ori follows
from the observation that e1 and e2 are oppositely directed in the double angle
space.
m = mag(ori) = (λ1 − λ2)                                                           (5.9)
By taking the norm of the tensor Te as the Frobenius norm, i.e.

‖Te‖²frob = Σij Tij² = Σi λi² = (2/3) [ Σk qk² + (1/3)((q1 − q2)² + (q1 − q3)² + (q2 − q3)²) ]        (5.10)
the energy in the tensor representation and in the vector representation will be equal
for an ideal input, i.e. a one-dimensional signal. The same holds also if the norm is
chosen as the maximum norm, i.e.

‖Te‖²max = λ1²                                                                     (5.11)

5.2 Orientation Adaptive Filtering
The purpose of this section is to describe how the estimated shape and orientation of Te can be used to adapt a filter so that it behaves in the same way as the signal. One
suitable function, F , is, according to Knutsson, Wilson and Granlund [66], a linear
weighting between an isotropic bandpass filter and an anisotropic filter having the
same bandpass function
F(ω) = F(ω, ϕ) = F(ω) [ g1(λ1, λ2)(ω̂ · e1)² + g2(λ1, λ2) ]                        (5.12)
where F (ω) is a bandpass function with high upper cut-off frequency, g1 is the
weighting function for the anisotropic part and g2 is the weighting function for the
isotropic part.
The first term in the sum will be a line filter in the direction of the eigenvector
corresponding to the largest eigenvalue. The second part will be an isotropic filter.
Typical choices of the weighting functions g1 and g2 are

g1(λ1, λ2) = (λ1 − λ2)/λ1
g2(λ1, λ2) = λ2/λ1                                                                 (5.13)
Choosing these functions will result in a linear weighting between the isotropic
and the anisotropic filter. It also implies that the maximum norm, Eq. 5.11, of the
tensor representation is the most suitable when the tensor should be used to adapt
an anisotropic filter.
Is it then possible to adapt a filter according to Eq. 5.12, from the tensor Te ?
Proposition 1
Assume the use of three polar separable filters having the same radial filter function, F(ω), and angular function (ω̂ · n̂k)², where the filter directions n̂k are given in Eq. 5.2. The adaptive filter of Eq. 5.12 is then given by, [64]

F(ω) = F(ω) (4/3) Σ_{k=1..3} T̂e • (Nk − (1/4) I)(ω̂ · n̂k)²                        (5.14)
T̂e is the normalized estimate; the choice of normalization will give different types of filter shapes, in other words it will affect the functions g1 and g2. If T̂e is normalized with λ1, the resulting filter will be interpolated using the chosen g1 and g2 from Eq. 5.13. I is the identity tensor and • symbolizes the tensor inner product defined according to Eq. 5.15

A • B = Σij Aij Bij                                                                (5.15)
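A sketch of this interpolation (Eqs. 5.14 and 5.15), computing the three weighting coefficients from a normalized control tensor:

    import numpy as np

    # Filter directions from Eq. 5.2.
    N_HATS = [np.array([1.0, 0.0]),
              np.array([0.5, np.sqrt(3.0) / 2.0]),
              np.array([-0.5, np.sqrt(3.0) / 2.0])]

    def interpolation_weights(T_hat):
        # Weight for filter k: (4/3) * T_hat . (N_k - I/4), where "."
        # is the elementwise tensor inner product of Eq. 5.15.
        return [(4.0 / 3.0) * np.sum(T_hat * (np.outer(n, n) - np.eye(2) / 4.0))
                for n in N_HATS]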
Proof of proposition 1

Using the spectrum theorem, Eq. 5.8, to divide the estimated structure of the neighbourhood in two parts, each one pointing in one direction, it is possible to write the projection as:

T̂e = (1/λ1) Te = e1 e1ᵗ + (λ2/λ1) e2 e2ᵗ                                          (5.16)
It is now possible to study each part of the tensor independently, starting with the first one. Setting e1 = (a, b)ᵗ, where √(a² + b²) = 1, leads to the control tensor

T′ = [ a²   ab ]
     [ ab   b² ]                                                                   (5.17)
The weighting coefficients in the interpolation are calculated according to Eq. 5.15:

(N1 − (1/4) I) • T′ = (1/4) [ 3   0 ] • [ a²   ab ] = (1/4)(3a² − b²)
                            [ 0  −1 ]   [ ab   b² ]

(N2 − (1/4) I) • T′ = (1/4) [ 0    √3 ] • [ a²   ab ] = (1/4)(2√3 ab + 2b²)
                            [ √3   2 ]    [ ab   b² ]

(N3 − (1/4) I) • T′ = (1/4) [ 0     −√3 ] • [ a²   ab ] = (1/4)(−2√3 ab + 2b²)
                            [ −√3    2 ]    [ ab   b² ]
Denoting the filter functions by cos²ϕ, cos²(ϕ − π/3) and cos²(ϕ − 2π/3) for filter number 1, 2 and 3 respectively leads to the following interpolated filter:

F′(ω) = F(ω) (4/3)(1/4) [ (3a² − b²) cos²ϕ + (2√3 ab + 2b²) cos²(ϕ − π/3) + (−2√3 ab + 2b²) cos²(ϕ − 2π/3) ]
      = F(ω) (1/2) (a² + b² + (a² − b²) cos 2ϕ + 2ab sin 2ϕ)                       (5.18)
Remembering that (a, b)ᵗ is an eigenvector to Te allows us to put (a, b)ᵗ = (cos ϕ0, sin ϕ0)ᵗ, which gives

F′(ω) = F(ω) (1/2) [ cos²ϕ0 + sin²ϕ0 + (cos²ϕ0 − sin²ϕ0) cos 2ϕ + 2 cos ϕ0 sin ϕ0 sin 2ϕ ]
      = F(ω) cos²(ϕ − ϕ0)                                                          (5.19)
Thus the first part in Eq. 5.16 will adapt a filter to the main orientation of the
structure found in the neighbourhood.
The second part of Eq. 5.16 will in the same way as above give rise to a filter perpendicular to the filter in Eq. 5.19. The corresponding eigenvector is e2 = (b, −a)ᵗ = (sin ϕ0, −cos ϕ0)ᵗ and the resulting filter will be

F′′(ω) = F(ω) sin²(ϕ − ϕ0)                                                         (5.20)
The total filter can now be written as

F(ω) = F′(ω) + (λ2/λ1) F′′(ω)
     = F(ω) [ cos²(ϕ − ϕ0) + (λ2/λ1) sin²(ϕ − ϕ0) ]                                (5.21)
     = F(ω) [ ((λ1 − λ2)/λ1) cos²(ϕ − ϕ0) + λ2/λ1 ]

which is the filter proposed in Eq. 5.12. ∎
In some applications the angular function cos²ϕ might be too broadly tuned. However, by iterating the same filter it is possible to get more narrowly tuned filters. The filter will get the angular variation cos^{2i}ϕ after i iterations. Iterations of the filter put great demands on the radial function since it will be iterated as well, see Eq. 5.24.
Figure 5.1: Overview of the orientation adaptive filtering algorithm (input image, estimation of local tensor representation, consistency of local tensor representation, adaptive filtering, output image).
To use the presented filtering strategy for noise reduction, a suitable lowpass
filter must be applied to keep the low frequency components, including DC, of the
original signal. The decision of where to use the bandpass filter is controlled by
the energy of the tensor according to the Frobenius norm ‖Te‖frob. The adaptive filter, Fada, could now be written as
Fada(ω) = FLP(ω) + √(‖Te‖frob) FHP(ω) (4/3) Σ_{k=1..3} T̂e • (Nk − (1/4) I)(ω̂ · n̂k)²        (5.22)
where FLP is the lowpass filter, see Eq. 5.23, and FHP is a bandpass filter with rather
high upper cut-off frequency, see Eq. 5.24.
Note that no eigenvalue decomposition is necessary in the interpolation strategy.
The largest eigenvalue must be available to calculate the maximum norm, Eq. 5.11,
but this is the only component needed. This is not critical in the case of a 2x2 tensor,
but for higher dimensional signals an eigenvalue decomposition is computationally
demanding.
5.2.1 Implementation
Implementing these ideas is straightforward. The implementation is a three-stage
operation which is summarized in the block scheme in Figure 5.1.
The estimation of the local information, represented in the tensor, is accomplished
with quadrature filters to get a smooth phase independent tensor field. Typically
rather broadband lognormal filters, Eq. 3.16, are used in the estimation. In the
result section the applied estimation filter has the bandwidth Blogn = 4 and the
center frequency ωi = π/4. These filter results are combined to a tensor according
to Eq. 5.4.
The continuity of the tensor representation is utilized when a tensor averaging
operation is applied to the estimates. This type of local averaging in the representation space originates from [67]. The averaging affects both the eigenvectors and
the relation of the eigenvalues. To give a few examples:
1. In a truly one-dimensional neighborhood, with some noise added, the averaging
will work as relaxation of the orientation with respect to a larger spatial region.
This means that rapid changes in the representation space are suppressed.
2. In the proximity of a corner the averaging will mainly change the shape of
the tensor. In this kind of neighborhoods there will exist two perpendicular
“one dimensional” tensors. Averaging of these gives rise to a tensor whose
eigenvalues are approximately equal, meaning that the shape of the tensor will
be isotropic.
In the first of these two cases, the implication is that the adaptive filter will have
one dominant orientation, sharpening the one-dimensional signal. In the second case
the adaptive filter will be isotropic and the corner, which has no unique orientation,
will be unaffected, i.e. the filter will be of allpass type.
In the adaptive filtering four separate filters are applied. The filters used are
one isotropic lowpass and three anisotropic bandpass filters with high upper cut-off
frequency. The filter function in the Fourier domain for the lowpass filter is given
according to Eq. 5.23, [66].
FLP(ω) = cos²(πω/1.8)   if ω ≤ 0.9
FLP(ω) = 0              if ω > 0.9                                                 (5.23)
The radial part of the three bandpass filters is given by Eq. 5.24.

FHP(ω) = 1 − cos²(πω/1.8)             if ω ≤ 0.9
FHP(ω) = 1                            if 0.9 < ω ≤ π − 0.9                         (5.24)
FHP(ω) = cos²(π(ω − π + 0.9)/1.8)     if π − 0.9 < ω ≤ π
while the angular function is, as said before, (ω̂ · n̂k)².
The values, 0.9 and 1.8, of the constants in Eq. 5.23 and 5.24 are chosen so that the filters can be realized on a 15x15 grid.
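The two radial functions can be sketched directly from Eqs. 5.23 and 5.24:

    import numpy as np

    def f_lp(w):
        # Lowpass radial function, Eq. 5.23.
        return np.where(w <= 0.9, np.cos(np.pi * w / 1.8) ** 2, 0.0)

    def f_hp(w):
        # Bandpass radial function, Eq. 5.24: complements f_lp below 0.9,
        # flat in the mid band, and rolls off towards pi.
        out = np.ones_like(np.asarray(w, dtype=float))
        out = np.where(w <= 0.9, 1.0 - np.cos(np.pi * w / 1.8) ** 2, out)
        out = np.where(w > np.pi - 0.9,
                       np.cos(np.pi * (w - np.pi + 0.9) / 1.8) ** 2, out)
        return out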
The sum of the isotropic bandpass filter, i.e. (2/3) Σ_{k=1..3} FHP(ω̂ · n̂k)², and the lowpass filter is flat in the frequency domain. This is a natural requirement in the design
phase of the filters. The reason is that with no a priori information about the image
to be filtered, all spatial frequencies must be regarded equally important. Another
reason is that the flatness is necessary if the filters are supposed to be used iteratively.
If there is a deviation in the spectrum, which otherwise is flat, the deviation will
grow larger if the filter is used iteratively.
To give an impression of what the filters look like for different control tensors,
five different interpolations in the Fourier domain have been plotted in Figure 5.2.
Figure 5.2: The filter shape for different relations between the eigenvalues λ1 and λ2 of the control tensor.
a) λ1 = 0, λ2 = 0    b) λ1 = 1/(2√2), λ2 = 1/(2√2)    c) λ1 = 1/√2, λ2 = 1/√2
d) λ1 = 1/2, λ2 = 0    e) λ1 = 1, λ2 = 0
Chapter 6
Scale Adaptive Filtering
An extension of the orientation adaptive filtering strategy was proposed in [46], where a bandpass or Laplacian pyramid [18] is generated and each level is filtered according to the anisotropic filtering strategy. In this chapter this strategy will be further described. The adaptive filter is constructed as a linear combination of three fixed
filters on each scale level. The three weighting coefficients are obtained from the
local energy, the local orientation, and the consistency of local orientation, all represented by 2x2 tensors. The orientation adaptive filtering is applied on each level of a
bandpass pyramid. The resulting filter is interpolated from all bandpass levels, and
spans over more than 6 octaves. It is possible to reconstruct an enhanced original
image from the filtered images. The performance of the reconstruction algorithm
displays two desirable but normally contradictory features, namely edge enhancement and an improvement of the signal-to-noise ratio. The results are satisfactory
on a wide variety of images from moderate signal-to-noise ratios to low, even lower
than 0 dB, SNR.
6.1 The Algorithm
Initially a brief overview of the algorithm following the implementation will be given.
Each part of the algorithm will then be explained in detail.
6.1.1 Overview
The overview follows the block scheme in Figure 6.1.
The first box is a more or less standard implementation of an octave-based difference of lowpass pyramid (DOLP), [18], meaning that the images in the bandpass, or Laplacian, pyramid fit together such that the sum of them reconstructs the original image. The octave-based pyramid has bandpass images with typical sizes 512², 256², ..., 32² and an image of size 16² containing the low frequency components including DC.

In the second box the local energy and the local orientation are estimated on each level of the pyramid. The feature extraction is performed by a set of quadrature filters, see Section 3.
Figure 6.1: Overview of the scale and orientation adaptive algorithm. [Block scheme: Input image → (1) difference of lowpass pyramid and subsampling → (2) estimation of local model → (3) consistency of local model → (4) adaptive determination of control parameters → (5) adaptive filtering → (6) interpolation of subsampled images and summation → Output image.]
The use of quadrature filters ensures that the local energy is independent of the phase of the signal, e.g. independent of whether it is a line or an edge, which is important for the smoothness of the output. The tensor representation of orientation, Section 5.1, is used because it provides a continuous representation space.
The continuity of the representation space is utilized in the third box, where the consistency, or relaxation, of orientation information is computed as tensor averaging of the orientation information. The relaxation lets larger spatial regions control the certainty of the orientation estimation. A novel feature of this algorithm is that the relaxation is not only made within a single scale level, but incorporates several levels.
In the fourth box the parameters to a nonlinear transfer function are determined,
interpreting the estimated information. The parameter setting is based on global
statistics of the local energy.
The first four boxes taken together are used to estimate a local model for each
neighborhood in the image.
The fifth and sixth boxes can be seen as a model based reconstruction of the original signal, increasing the signal-to-noise ratio, [66]. The proposed bandpass environment enables an efficient implementation of edge enhancement in edge areas, simply by putting more weight on the higher frequency levels in the reconstruction.
Figure 6.2: The frequency response of the Gaussian lowpass filters. [Plot: filter response versus logarithmic frequency.]

6.1.2 Generation of the Bandpass Pyramid
Computational efficiency is one reason for using subsampled images, and as pointed out in Section 3 it also provides a means to implement a wavelet transform. The simplicity of reconstructing the original image from the DOLP pyramid is maybe the major reason to use a bandpass pyramid instead of the lowpass pyramid used in Chapter 4. This type of scale pyramid is widely used for a number of computer vision and image processing tasks [18, 23].

In the current implementation a Gaussian approach is chosen. The filter used has a spatial standard deviation of 4/π, with one exception: the first applied filter has a standard deviation of 2/π. The filter with standard deviation 2/π could be used to generate a half-octave based pyramid, while the 4/π filter is better suited for an octave-based approach. The reason to use the Gaussian with the smaller spatial standard deviation first is that otherwise the bandpass level with the highest cut-off frequency, which actually is a highpass filter, collects too large a proportion of the signal energy, see Appendix 6.A. The frequency responses of the lowpass filters are plotted in Figure 6.2.
The bandpass pyramid is obtained by taking differences between adjacent lowpass filters. The frequency responses of the bandpass filters are plotted in Figure 6.3. The sum of the bandpass filters is also plotted in Figure 6.3, where it is easy to see why the reconstruction is so simple.
Figure 6.3: The difference of lowpass filters and the sum of the bandpass filters. [Plot: filter response versus logarithmic frequency.]
The Implementation
There are some aspects that must be taken into account when implementing subsampled, e.g. bandpass, pyramids. One will be treated below and some others are mentioned in Appendix 4.A. It is desirable that the reconstruction can be made using pixelwise operations. The pyramid is implemented as follows.
Define the Gaussian filters, g1(x) and g2(x), as

$$g_1(x) = \frac{\pi}{8}\, e^{-\pi^2 x^2/8}, \qquad g_2(x) = \frac{\pi}{32}\, e^{-\pi^2 x^2/32} \qquad (6.1)$$
where x = ||x||. The first level in the bandpass pyramid is easy to obtain since the sampling grid is the same as for the original image, i.e.

$$lp_1 = g_1 * org, \qquad bp_1 = org - lp_1$$

where org is the original image and * denotes convolution.
The second level should in the same way be calculated as

$$lp_2 = g_2 * lp_1 \qquad (6.2)$$
$$bp_2 = lp_1 - lp_2 \qquad (6.3)$$
and the original image is to be reconstructed as org = bp1 + bp2 + lp2. But the lowpass version, lp2, of the original is supposed to be subsampled in order to obtain a pyramid. However, the original image cannot generally be reconstructed using the subsampled image, ²↓lp2, of lp2. The superscript ²↓ indicates that the image has been subsampled by a factor 2 in each dimension, according to the scheme in Appendix 4.A. To ensure that the original can be exactly reconstructed, bp2 is given by

$$bp_2 = lp_1 - \,^{2\uparrow}(^{2\downarrow}lp_2) \qquad (6.4)$$

This means that before computing the difference, the image that is to be subsampled is first subsampled and then interpolated back to the original grid size. The superscript ²↑ indicates an enlargement of the image by a factor 2, using the interpolation process described in Appendix 4.A. The image is now partitioned into three images, bp1, bp2 and ²↓lp2, whose sum exactly reconstructs the original. Iterating this procedure gives the subsequent bandpass images.
In the resulting bandpass pyramid the two levels with the highest frequency content are represented on the same spatial sampling grid as the original image. The subsequent levels are subsampled by a factor of two in each dimension, for each level. The result after the transform is that the image is represented using about 7/3 of the number of pixels in the original image¹. Typically one highpass level, five bandpass levels, and one DC level have been used in the implementation.
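A minimal sketch of the pyramid generation and reconstruction (Eqs. 6.1-6.4) is given below. It substitutes scipy.ndimage routines for the thesis' own Gaussian kernels and the interpolation scheme of Appendix 4.A; the reconstruction is exact by construction regardless of that substitution, since each stored level is a difference against the resampled image that is actually kept:

    import numpy as np
    from scipy.ndimage import gaussian_filter, zoom

    def build_dolp(org, levels=3):
        sigma1, sigma2 = 2.0 / np.pi, 4.0 / np.pi   # spatial standard deviations
        bands = []
        lp = gaussian_filter(org, sigma1)           # first, narrower Gaussian
        bands.append(org - lp)                      # bp1, full resolution
        current = lp
        for _ in range(levels - 1):
            lp = gaussian_filter(current, sigma2)
            small = zoom(lp, 0.5, order=1)          # subsample by 2 (2-down)
            back = zoom(small, 2.0, order=1)[:current.shape[0],
                                             :current.shape[1]]
            bands.append(current - back)            # bp_k = lp - 2up(2down(lp))
            current = small
        return bands, current                       # bandpass levels + DC image

    def reconstruct(bands, dc):
        img = dc
        for bp in reversed(bands[1:]):              # coarse-to-fine summation
            img = bp + zoom(img, 2.0, order=1)[:bp.shape[0], :bp.shape[1]]
        return img + bands[0]

    org = np.random.rand(64, 64)
    bands, dc = build_dolp(org)
    assert np.allclose(reconstruct(bands, dc), org)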
6.1.3 Estimation of Orientation
The local orientation is estimated on each level in the bandpass pyramid. The estimation is performed by applying the same filter set on each level. Using an octave-based pyramid means that the effective center frequency of the filter set will be reduced by a factor of two for each level. The resulting pyramid is termed an orientation pyramid, where the representation for each pixel is a 2x2 tensor according to Eq. 5.4.
6.1.4 Consistency of Orientation
There are two common ways of separating the signal from the noise: spectral methods, e.g. edge focusing [12, 21], and spatial correlation methods. A method of the latter type was utilized in Chapter 5 where, on a single scale, the orientation estimate was relaxed with respect to its neighbors. In a multi-scale environment it is natural to enforce not only spatial consistency of orientation, but also scale consistency. The reason is that it is believed that visually interesting features must exist over several scales. It is straightforward to implement the relaxation as tensor averaging in the same sense as described in Chapter 5.
¹ One of the attractions of bandpass, and Laplacian, pyramids has been the coding potential inherent in this type of structure, [90]. However, the coding aspect is not the subject of the proposed algorithm.
Implementation
The extension of the averaging procedure compared to the procedure in Chapter 5 is that the averaging filter is three-dimensional, with scale as the third dimension. In the filter design a lot of heuristics have been used. The exact shape of the filter has proven to be non-critical, but some general design criteria can be given:

1. The filter should be relatively narrow in the scale direction, and only the current and lower resolution images should be used.

2. The filter should have roughly the same region of spatial support for each scale level involved.

3. A smooth function should be used.
The first point can be expounded by considering how hard it is to distinguish an image signal from noise, especially in the high frequency range. The support for visually interesting features should therefore come from lower resolutions, where the separation between signal and noise is likely to be better. The second and third points are built on heuristics and have proven to work satisfactorily in all experiments.
The implemented filter has a depth of 3, i.e. three different scale levels are
regarded in the relaxation. The filter planes, fp , are defined according to
$$f_1(x) = \frac{1}{2}\cdot\frac{\pi}{16}\, e^{-\pi^2 x^2/16} = \frac{1}{2}\,\mathrm{Gaussian}\!\left(\sigma = \frac{2\sqrt{2}}{\pi}\right) \text{ on a 9x9 grid}$$
$$f_2(x) = \frac{1}{3}\cdot\frac{\pi}{32}\, e^{-\pi^2 x^2/32} = \frac{1}{3}\,\mathrm{Gaussian}\!\left(\sigma = \frac{4}{\pi}\right) \text{ on a 7x7 grid}$$
$$f_3(x) = \frac{1}{6}\cdot\frac{\pi}{64}\, e^{-\pi^2 x^2/64} = \frac{1}{6}\,\mathrm{Gaussian}\!\left(\sigma = \frac{4\sqrt{2}}{\pi}\right) \text{ on a 5x5 grid} \qquad (6.5)$$
This filter is applied after the estimate images have been resampled to the finest
resolution image size involved in the filtering.
Note that this filtering does not change the sensitivity to high frequency components in the original data. It only means that rapid changes in the representation
of local structure are suppressed. (The original estimate was made by a filter set of
size 15x15.)
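A sketch of the scale relaxation with the three filter planes of Eq. 6.5 could look as follows; component-wise Gaussian filtering is assumed to be an acceptable stand-in for the exact 9x9, 7x7 and 5x5 kernels:

    import numpy as np
    from scipy.ndimage import gaussian_filter, zoom

    SIGMAS  = [2 * np.sqrt(2) / np.pi, 4 / np.pi, 4 * np.sqrt(2) / np.pi]
    WEIGHTS = [1 / 2, 1 / 3, 1 / 6]                  # plane weights of Eq. 6.5

    def relax_tensors(tensor_levels):
        """tensor_levels: list of (H_l, W_l, 2, 2) arrays, current level first,
        then the two next coarser levels (design criterion 1)."""
        h, w = tensor_levels[0].shape[:2]
        out = np.zeros((h, w, 2, 2))
        for plane, (wgt, sig) in enumerate(zip(WEIGHTS, SIGMAS)):
            lvl = tensor_levels[min(plane, len(tensor_levels) - 1)]
            for i in range(2):                       # filter each tensor
                for j in range(2):                   # component separately
                    comp = zoom(lvl[..., i, j],
                                (h / lvl.shape[0], w / lvl.shape[1]), order=1)
                    out[..., i, j] += wgt * gaussian_filter(comp, sig)
        return out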
6.1.5 Determination of Parameters
So far, the estimation of the local models has only incorporated linear operations. But in order to interpret the data a non-linear function is useful. The main reason is that different signal-to-noise ratios (SNR) require different interpretations of the orientation estimates.
The most important parameter to be estimated is the value of the parameter called a in Figure 6.4 and Appendix 6.B. The estimation of a in the transfer function, s(E), for the local energy is based on global statistics of the relaxed energy. On each scale level the pixels with the lowest energy will control the position of a.

Figure 6.4: Illustration of the transfer function s(E), see also Appendix 6.B. [s(E) rises from 0 to 1, with s(a) = 0.5.]

The strategy is based on the hypothesis of a uniform noise level over the image, and the assumption that at least 1% of the signal area is flat, i.e. has a signal energy
which is less than the noise energy. An estimation of the noise energy is achieved by
investigating the energy limit for these pixels. This assumption is supported by the
examination of a test on the “boat” image, Figure 6.8, with additive white noise.
The energy histograms from one single scale level are displayed in Figure 6.5. The
behavior is, as expected, that the pixels with the lowest energy will have higher
energy relative to the maximum in the images with low SNR.
To determine a, a consideration of the behavior of the noise energy compared with the signal energy in the bandpass transform is useful. Assuming that the noise is white, it follows that the noise energy in the bandpass pyramid increases approximately as ωl², where ωl is the center frequency for bandpass level l. As white noise has a flat energy spectrum, the noise energy in the bandpass levels will only depend on their area in the frequency domain. Using the proposed bandpass pyramid leads to a transform with constant relative bandwidth, B, and the area of each bandpass level, Figure 6.6, is proportional to

$$\text{Area} \propto \pi\left((\omega_l + B\omega_l)^2 - (\omega_l - B\omega_l)^2\right) = 4B\pi\omega_l^2$$

A more thorough calculation of the energy is given in Appendix 6.A.
On the other hand, stochastically distributed lines have an energy spectrum decreasing as 1/ω and stochastically distributed edges have a spectrum decreasing as 1/ω³, [67]. Taking 1/ω² as the mean for the sum of these stochastic distributions will then lead to an approximately flat signal distribution in the bandpass transform.

These considerations taken together make it adequate to use a quadratic weighting of the scale level, l, when going from the estimated noise energy, based on the histogram, to the setting of a. Typically the 1% value from the histogram will be weighted
Figure 6.5: Four histograms of the estimated RMS value of energy for the "boat" image with different signal-to-noise ratios on a single scale. The value of RMS1% is given in each histogram. The histograms are, as expected, shifted to the right, meaning toward higher energy, when the SNR decreases, while the signal energy is constant.
81
ωy
ωl
ωx
Figure 6.6: The area of a bandpass filter with center frequency ωl . The borders
illustrate the bandwidth of the filter.
dependent on the bandpass level l as

$$(c_1 l + c_2)^2 \qquad (6.6)$$

where c1 and c2 are constants that should be set depending on the application, see for example Eq. 6.7.
In the discussion above the noise is assumed to be white, but if there is other a priori knowledge of the noise distribution or the signal spectrum, this is easy to incorporate in the bandpass structure.
Implementation
In this implementation the RMS value is used instead of the energy to keep the numerical bounds in a manageable range. This means that the function suggested in Eq. 6.6 will not be quadratic but linear, and it has been implemented as:

$$a = \left(-\tfrac{3}{5}\,l + 4\tfrac{3}{5}\right)\mathrm{RMS}_{1\%} \qquad (6.7)$$

where RMS1% is the estimate of the RMS value of the noise from the histogram.
The transfer function, s(E), for outputs in the interval [0, 1] has mainly two degrees of freedom, the half-value, a, and the slope at a. The function is symmetric around a in the sense that it goes to zero as fast as it goes to 1. Once a is determined the slope also has to be settled. In this implementation, this has been done by fixing s(0) = 0 and using the symmetry of the function.

The drawback of the automatic parameter setting is that the noise level is supposed to be more or less constant over the image.

Figure 6.7: A plot in the Fourier domain of one of the used allpass filters with an angular variation of (ω̂ · n̂k)².

If this is not the case, the estimation of RMS1% must occur separately in different parts of the image. In each
of these parts different parameters will control s(E). However, in most situations a
uniform noise distribution is a useful model.
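Assuming the reconstruction of Eq. 6.7 above, the automatic setting of a is a one-liner; the percentile call here is a stand-in for reading the 1% limit off the energy histogram of one bandpass level:

    import numpy as np

    def half_value(rms_image, level):
        rms_1pct = np.percentile(rms_image, 1.0)   # noise estimate, flat areas
        return (-3.0 / 5.0 * level + 4.6) * rms_1pct   # Eq. 6.7

    rms = np.abs(np.random.randn(256, 256))        # hypothetical RMS image
    print(half_value(rms, level=1))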
6.1.6 Adaptive Filtering
The adaptive filtering from Section 5 is applied on each level of the bandpass pyramid. The contextual control parameters are obtained by passing the tensor averaged
data through the transfer function, s(E).
When the adaptive filtering is applied on a bandpass image there is no need to
incorporate a lowpass filter as described in Section 5.2, since the DC-component is
zero anyway. Another set of filters is then more appropriate, from a complexity as
well as a theoretical point of view. The filters used are as before, see Section 5,
three polar separable filters with angular function (ω̂ · n̂k )2 , where n̂k is the main
direction of each filter. The bandpass function of the filters can be adapted to the
bandpass environment where it is applied.
The ideal bandpass function is an allpass function, but this is not possible to
combine with the proposed angular function. However, it is possible to optimize
a filter from these specifications. The optimization procedure, [67], is the same as
described in Appendix 3.A, i.e. the weighted error is minimized in a least-squares sense. The weighting function is in this case the bandpass function from the pyramid
generation. The result is a filter that optimally approximates an allpass function
in the window of the current level, see Figure 6.7. The filter is realized on a 15x15
spatial grid.
Figure 6.8: The image used; the circle in the lower right part indicates where the filter in Figure 6.9 is taken.

The weighting functions determining how the anisotropic part of the adaptive filter should depend on the estimates are chosen in accordance with Eq. 5.13. The smooth decision
on how to apply the adaptive filter is based on the estimated local energy according
to the transfer function s(E) and setting of the parameter a.
6.1.7 Reconstruction
As mentioned above the reconstruction of the original image from the bandpass
pyramid is simple due to the construction of the pyramid. The sum of the filtered
bandpass images will produce a filtered original image.
To show the degrees of freedom in this filtering strategy, the filter around one
pixel in the original image, indicated by a circle in the lower right part of Figure 6.8,
is shown in Figure 6.9. The filter is given in the Fourier domain. In the center of
the image the DC-component is located, and in the low frequency area it is possible
to see that a vertical structure is detected, originating from the tree-trunk, and the
filter will adapt its orientation to this structure. At higher frequencies the filter
adapts to a horizontal structure originating from the tree-branch.
Edge (or line) enhancement is easy to do in this adaptive filter environment. On each scale level, the local model provides information about the reliability of the interesting features. High frequency amplification in edge areas can then simply be done by putting higher weights on these bandpass levels in the reconstruction.

The amplification of high frequency bands is inspired by the human visual system, where the same behavior has been detected, e.g. [30]. The amplification function has been
Figure 6.9: An example in the Fourier domain of an adapted filter.
chosen to be exponential with respect to a logarithmic frequency axis, i.e. linear on
a log-log plot of spatial frequency versus amplification.
Implementation
The reconstruction of the original image can be seen as the reverse of the generation of the pyramid. The filtered original is constructed by successive enlargements of the filtered bandpass images followed by a summation. The original is then:

$$org = bp_1 + bp_2 + \,^{2\uparrow}BP_{l>2}, \qquad \text{where} \qquad BP_{l>k} = bp_{k+1} + \,^{2\uparrow}BP_{l>k+1} \qquad (6.8)$$

Amplification of high frequencies on the used logarithmic scale was done in the experiments according to the following function:

$$bp_l = bp_l\, e^{\alpha(6-l)} \qquad (6.9)$$
where the bandpass pyramid has six levels and α is a constant controlling the amount of high frequency amplification. Typical values for α are in the interval [0, 0.2]. This means that the lowest frequency band will be unaffected and the highest frequency band will be amplified by a factor of e^{6α}. Note that this amplification takes place after the filtering, i.e. it will occur in areas where the signal energy has high frequency content, and noise in flat areas will consequently not be amplified.
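A sketch of the reconstruction with high frequency amplification (Eqs. 6.8-6.9) is given below, assuming the filtered levels are indexed l = 0 (the highpass level) through 5, with the DC image handled separately and left unamplified:

    import numpy as np
    from scipy.ndimage import zoom

    def reconstruct_amplified(bands, dc, alpha=0.1):
        # bands: highpass + bandpass levels, fine to coarse (l = 0..5);
        # dc is assumed to sit one factor of two below the coarsest band.
        amp = [bp * np.exp(alpha * (6 - l)) for l, bp in enumerate(bands)]
        img = dc
        for bp in reversed(amp[2:]):            # successive enlargements
            img = bp + zoom(img, 2.0, order=1)[:bp.shape[0], :bp.shape[1]]
        img = zoom(img, 2.0, order=1)[:amp[1].shape[0], :amp[1].shape[1]]
        return amp[0] + amp[1] + img            # two full-resolution levels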
6.2 Results
Extensive testing of the algorithm has been carried out on both real and synthetic images. Some of these results are given below. The images used in this section include a synthetic test image, see Figure 6.10, and the "boat" image from the USC image database. The images have been corrupted with white additive Gaussian noise to SNRs down to -10 dB. In the case of the "boat" image the performance is compared to the algorithm reported in [66], see Figure 6.16. Other images are a mammogram, i.e. an X-ray image of a female breast, in Figures 6.20-6.21, and a satellite image over Iceland, Figures 6.22-6.23. It is believed that the result images show the possibilities of the proposed scale and orientation adaptive filtering strategy without further comments.
The synthetic test image is designed to include all possible orientations and a
wide frequency range in order to test the sensitivity of the algorithm for these
different aspects. The image contains a frequency range that spans approximately 6
octaves, including the Nyquist frequency. As the pattern is circular the orientations
are equally distributed in each frequency band. The function generating the test
pattern is given by:

$$\text{pattern}(r) = \begin{cases} f(r) & 0 \le r < 0.992 \\ W(r)f(r) + 0.5\,(1 - W(r)) & 0.992 \le r < 1 \\ 0.5\,(1 - W(r)) & r > 1 \end{cases} \qquad (6.10)$$

where

$$f(r) = 0.5\,\left[1 + \cos\left(0.64\pi(e^{6r} - 1)\right)\right]$$

$$W(r) = \begin{cases} 1 & r < 0.992 \\ \cos^2(64\pi(r - 0.992)) & 0.992 \le r < 1 \\ 0 & r \ge 1 \end{cases}$$

and r is the distance from the center in normalized pixel units, e.g. r = √2 in the corners of the image.
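A sketch generating this pattern follows; the blending of the middle branch toward the constant 0.5 is taken from the reconstruction of Eq. 6.10 above, which is an assumption made over the garbled original:

    import numpy as np

    def test_pattern(size=512):
        y, x = np.mgrid[:size, :size] - (size - 1) / 2.0
        r = np.sqrt(x**2 + y**2) / ((size - 1) / 2.0)  # r = sqrt(2) in corners
        f = 0.5 * (1.0 + np.cos(0.64 * np.pi * (np.exp(6.0 * r) - 1.0)))
        w = np.where(r < 0.992, 1.0,
                     np.where(r < 1.0,
                              np.cos(64 * np.pi * (r - 0.992)) ** 2, 0.0))
        return w * f + 0.5 * (1.0 - w)         # blends smoothly to mid-gray

    img = test_pattern(512)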
Measuring the quality of an enhancement or reconstruction algorithm is an extremely complex task. The reason is that there exists no unifying theory for measuring image quality. The problem is aggravated by the fact that the quality of an image is determined by the not so well understood human visual system (HVS). The most commonly used measure to compare the quality of images is the SNR, i.e. calculating the mean of the squared difference between the original image and the reconstructed image. However, SNR is sensitive to all of the transforms that the HVS is insensitive to, that is translation, rotation, contrast stretch and scaling. As a consequence an image that is perceived as a "good quality" image can have a very poor SNR due to any of the above transforms.²
For historical reasons SNR tables have been calculated for the synthetic image
in Figure 6.10 and for the “boat” image. These are presented in Table 6.1 and
Table 6.2.
² An extreme case is when SNR is negative: then a constant output with a value equal to the correct mean of the signal will have an output SNR that is zero.
Input SNR   Output SNR
   ∞ dB      26.1 dB
  20 dB      22.7 dB
  10 dB      12.5 dB
   5 dB       8.2 dB
   0 dB       5.5 dB
  -5 dB       4.0 dB
 -10 dB       2.3 dB

Table 6.1: The result of the scale adaptive enhancement algorithm for the synthetic image in Figure 6.10.
Input SNR   Output SNR
   ∞ dB      24.1 dB
  20 dB      20.0 dB
  10 dB      15.8 dB
   5 dB      13.5 dB
   0 dB      11.2 dB
  -5 dB       8.9 dB
 -10 dB       6.0 dB

Table 6.2: The result of the scale adaptive enhancement algorithm for the "boat" image in Figure 6.14.
Figure 6.10: The synthetic test image with white Gaussian noise added to an SNR of 20 dB.
Figure 6.11: Adaptive filtering of the test image in Figure 6.10, the parameter α = 0
in Eq. 6.9.
Figure 6.12: Adaptive filtering of the test image Figure 6.10, the parameter α = 0.1
in Eq. 6.9.
Figure 6.13: Adaptive filtering of the test image Figure 6.10, the parameter α = 0.2
in Eq. 6.9.
Figure 6.14: The original “boat” image.
Figure 6.15: The original “boat” image with white Gaussian noise added, SNR=0dB.
Figure 6.16: Orientation adaptive filtering of Figure 6.15. This is the result from
the algorithm reported in [66] using one single scale.
Figure 6.17: Scale and orientation adaptive filtering of Figure 6.15, the parameter
α = 0.0 in Eq. 6.9.
Figure 6.18: Scale and orientation adaptive filtering of Figure 6.15, the parameter
α = 0.1 in Eq. 6.9.
Figure 6.19: Iteration of the scale and orientation adaptive filtering of Figure 6.15,
the parameter α = 0.1 in Eq. 6.9, for both of the filterings.
Figure 6.20: The original mammogram, left: The complete image, right: a part of
the image where there are micro calcifications.
Figure 6.21: Scale and orientation adaptive filtering of Figure 6.20. The micro
calcifications, the small bright dots, are now easier to detect.
Figure 6.22: The original satellite image of Iceland.
Figure 6.23: Scale and orientation adaptive filtering of Figure 6.22, the parameter
α = 0.2 in Eq. 6.9.
Appendices
6.A Calculations of the Energy in the Bandpass Pyramid
To verify that the noise energy in the bandpass filters increases approximately as ωl², the integral under each filter is calculated. The RMS of the energy in the bandpass pyramid is given by:

$$\mathrm{RMS} = \sqrt{\int_{-\pi}^{\pi}\!\int_{-\pi}^{\pi}\left(e^{-(\omega_x^2+\omega_y^2)/2\sigma_1^2} - e^{-(\omega_x^2+\omega_y^2)/2\sigma_2^2}\right)^2 d\omega_x\, d\omega_y}$$
$$= \sqrt{\pi\left(\sigma_1^2\,\mathrm{erf}^2(\pi/\sigma_1) + \sigma_2^2\,\mathrm{erf}^2(\pi/\sigma_2) - \frac{4\sigma_1^2\sigma_2^2}{\sigma_1^2+\sigma_2^2}\,\mathrm{erf}^2\!\left(\pi\sqrt{\frac{\sigma_1^2+\sigma_2^2}{2\sigma_1^2\sigma_2^2}}\right)\right)}$$
where erf is the error function defined as:

$$\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}}\int_0^x e^{-t^2}\, dt$$
The first bandpass level, bp1, must be calculated in a somewhat different manner since it is constructed as a difference between a flat filter and a Gaussian:

$$\mathrm{RMS} = \sqrt{\int_{-\pi}^{\pi}\!\int_{-\pi}^{\pi}\left(1 - e^{-(\omega_x^2+\omega_y^2)/2\sigma_1^2}\right)^2 d\omega_x\, d\omega_y}
= \sqrt{4\pi^2 + \pi\sigma_1^2\,\mathrm{erf}^2(\pi/\sigma_1) - 4\pi\sigma_1^2\,\mathrm{erf}^2\!\left(\frac{\pi}{\sqrt{2\sigma_1^2}}\right)}$$
The RMS of the energy, if the signal is white, is for each bandpass level given in Table 6.3.
The subsampling and resampling procedure, Eq 6.4, has been considered as a filtering
with a Gaussian, see Appendix 4.A. The model of the process is specified by the used
standard deviations, SDEV, for the Gaussians
Bandpass level   SDEV for the large Gaussian   SDEV for the small Gaussian
bp1              0                             π/2
bp2              π/2                           π/6
bp3              π/√20                         π/√148
bp4              π/√84                         π/√596
bp5              π/√340                        π/√2388
bp6              π/√1364                       π/√9556
As the bandpass levels are approximately octave based, this shows that the two upper
levels fit into the scheme. The subsampling has been considered as a Gaussian lowpass
filtering in accordance with the description in Appendix 4.A.
Bandpass level   BP1    BP2    BP3    BP4    BP5    BP6
RMS              4.35   2.33   1.01   0.49   0.24   0.12
Quotient           1.86   2.31   2.07   2.02   2.00

Table 6.3: The RMS of the energy for each bandpass level, and the quotient of the RMS between adjacent levels.
6.B The Transfer Function
The implemented transfer function, s(E), is defined according to the following equation, where a is the center of the interval according to Figure 6.4:

$$s(E) = \begin{cases} \frac{1}{2}\left(\frac{E}{a}\right)^2 & 0 \le E < a \\ 1 - \frac{1}{2}\left(2 - \frac{E}{a}\right)^2 & a \le E < 2a \\ 1 & E \ge 2a \end{cases}$$
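For reference, a direct transcription of the transfer function into numpy: quadratic near zero, symmetric around the half-value a, and saturating at one for E ≥ 2a:

    import numpy as np

    def s(E, a):
        E = np.asarray(E, dtype=float)
        low = 0.5 * (E / a) ** 2
        high = 1.0 - 0.5 * (2.0 - E / a) ** 2
        return np.where(E < a, low, np.where(E < 2.0 * a, high, 1.0))

    print(s([0.0, 0.5, 1.0, 1.5, 2.0, 3.0], a=1.0))
    # -> [0.    0.125 0.5   0.875 1.    1.   ]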
Part III
3D — Adaptive Filtering
Chapter 7
Orientation Representation and Estimation in 3D
7.1 Orientation Representation
The fast progress in the data acquisition area has led to a situation where the use
of 3D data is commonplace. Typical examples here are video sequences, ultrasonic
sequences, MR and CT data. There is therefore a strong need for methods that can
handle this type of data efficiently.
The local orientation estimates are in 3D, as well as in 2D, represented with tensors, [59]. This representation fulfills the three requirements from [68] that were also given in Section 5.1. Here the requirements are just repeated:
• Uniqueness, meaning a one-to-one mapping of orientations, reflecting the fact that plane orientations are defined modulo π.
• Uniform stretch, meaning equal sensitivity to changes in orientation for all
orientations.
• Polar separability, meaning that the estimate should be independent of the
strength of the signal.
A representation fulfilling these requirements will make ordinary operations, e.g. averaging and differentiation, in the representation space meaningful. This is often not
the case for other representations, see [60, 59] for a more complete discussion. This
representation supplies a richer description of the local neighbourhood compared to
representing local orientation with the gradient vector. The tensor representation
of the local orientation of a neighbourhood with one single orientation is given by

$$\mathbf{T} = \frac{1}{x}\begin{pmatrix} x_1^2 & x_1x_2 & x_1x_3 \\ x_1x_2 & x_2^2 & x_2x_3 \\ x_1x_3 & x_2x_3 & x_3^2 \end{pmatrix} \qquad (7.1)$$

where x = (x1, x2, x3) is directed as the normal vector to the plane of the neighbourhood and x = √(x1² + x2² + x3²). The magnitude of x is determined by the local energy distribution.
Figure 7.1: An icosahedron (one of the 5 Platonic polyhedra).
7.1.1 Implementation of the Orientation Algorithm
The orientation estimation algorithm requires a number of precomputed quadrature filters evenly spread in one half of the Fourier space. The number of filters in 3D must be greater than four, [59, 60]. Consider for example the case of four evenly spread filters, whose axes pass through the vertices of a cube centered around the origin in the Fourier domain. Then let the input to these filters have an energy contribution on a line in the Fourier domain and let this line pass through the center of two opposing faces of the cube. In this case all the filters will have the same output and the filter set cannot discern which pair of faces the line passes through. Thus the minimum number of filters must be greater than four. Spreading the filters evenly puts additional demands on the number of filters; in 3D the only possible numbers are 3, 4, 6 and 10. Hence the minimum number of quadrature filters required for orientation estimation in 3D is 6, where the filters are directed as the vertices of a hemi-icosahedron, see Figure 7.1:
$$\begin{aligned}
\hat n_1 &= c\,(\ a,\ 0,\ b\ )^t &\quad \hat n_2 &= c\,(-a,\ 0,\ b\ )^t &\quad \hat n_3 &= c\,(\ b,\ a,\ 0\ )^t \\
\hat n_4 &= c\,(\ b,\ -a,\ 0\ )^t &\quad \hat n_5 &= c\,(\ 0,\ b,\ a\ )^t &\quad \hat n_6 &= c\,(\ 0,\ b,\ -a\ )^t
\end{aligned} \qquad (7.2)$$

with

$$a = 2, \qquad b = 1 + \sqrt{5}, \qquad c = (10 + 2\sqrt{5})^{-1/2} \qquad (7.3)$$
A quadrature filter designed with your favorite bandpass function Fω(ω), e.g. the lognormal function, Eq. 3.16, is given in the frequency domain by:

$$F_k(\omega) = \begin{cases} F_\omega(\omega)\,(\hat\omega\cdot\hat n_k)^2 & \text{if } \omega\cdot\hat n_k > 0 \\ 0 & \text{otherwise} \end{cases} \qquad (7.4)$$
The spatial filter coefficients are found by a straightforward 3D DFT or by use of an optimization technique, Appendix 3.A. The resulting spatial filter is complex-valued. This procedure is used to obtain the six quadrature filters.
It is easy to implement the orientation algorithm with these precomputed filters [59]. The tensor describing the neighbourhood is given by:

$$\mathbf{T}^e = \sum_k q_k\left(\mathbf{N}_k - \frac{1}{5}\mathbf{I}\right) \qquad (7.5)$$

where qk again denotes the magnitude of the output from filter k, Nk = n̂k n̂kᵗ denotes the direction of the filter expressed in the tensor representation and I is the unity tensor. A less compact description of Eq. 7.5 is:
1. Convolve the input data with the six complex-valued filters, i.e. perform twelve scalar convolutions.

2. Compute the magnitude of the output of each complex-valued filter by

$$q_k = \sqrt{q_{ke}^2 + q_{ko}^2}$$

where qke denotes the filter output of the real part of filter k and qko denotes the filter output of the imaginary part of filter k.
3. Compute the tensor Te by Eq. 7.5, i.e.

$$\mathbf{T}^e = \begin{pmatrix} T_{11} & T_{12} & T_{13} \\ T_{12} & T_{22} & T_{23} \\ T_{13} & T_{23} & T_{33} \end{pmatrix}$$

where

$$\begin{aligned}
T_{11} &= A(q_1 + q_2) + B(q_3 + q_4) - S \\
T_{22} &= A(q_3 + q_4) + B(q_5 + q_6) - S \\
T_{33} &= A(q_5 + q_6) + B(q_1 + q_2) - S \\
T_{12} &= C(q_3 - q_4) \\
T_{13} &= C(q_1 - q_2) \\
T_{23} &= C(q_5 - q_6)
\end{aligned}$$

for

$$S = \frac{1}{5}\sum_{k=1}^{6} q_k, \qquad A = \frac{4}{10 + 2\sqrt{5}}, \qquad B = \frac{6 + 2\sqrt{5}}{10 + 2\sqrt{5}}, \qquad C = \frac{2 + 2\sqrt{5}}{10 + 2\sqrt{5}}$$
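A direct transcription of step 3 into numpy is straightforward; in the check at the end, equal magnitudes from all six filters give 0.8·I, a purely isotropic tensor, as expected since the sum of the six Nk equals 2I and S = 6/5:

    import numpy as np

    def orientation_tensor(q):
        """q: sequence of the six quadrature filter magnitudes q1..q6."""
        r5 = np.sqrt(5.0)
        A = 4.0 / (10.0 + 2.0 * r5)
        B = (6.0 + 2.0 * r5) / (10.0 + 2.0 * r5)
        C = (2.0 + 2.0 * r5) / (10.0 + 2.0 * r5)
        S = np.sum(q) / 5.0
        T11 = A * (q[0] + q[1]) + B * (q[2] + q[3]) - S
        T22 = A * (q[2] + q[3]) + B * (q[4] + q[5]) - S
        T33 = A * (q[4] + q[5]) + B * (q[0] + q[1]) - S
        T12, T13, T23 = C*(q[2] - q[3]), C*(q[0] - q[1]), C*(q[4] - q[5])
        return np.array([[T11, T12, T13],
                         [T12, T22, T23],
                         [T13, T23, T33]])

    print(orientation_tensor(np.ones(6)))   # -> 0.8 * identity (isotropic)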
7.1.2 Evaluation of the Representation Tensor
It is shown in [59] that the eigenvector corresponding to the largest eigenvalue of
Te is the normal vector of the plane best describing the neighbourhood. This
implies that an eigenvalue analysis is appropriate for evaluating the tensor. Below
the eigenvalue distribution and the corresponding tensor representation are given
for three particular cases of Te , where λ1 ≥ λ2 ≥ λ3 ≥ 0 are the eigenvalues in
decreasing order, and êi is the eigenvector corresponding to λi .
1. λ1 > 0; λ2 = λ3 = 0:  Te = λ1 ê1 ê1ᵗ
This case corresponds to a neighbourhood that is perfectly planar, i.e. is constant on planes in a given orientation. The orientation of the normal vectors to the planes is given by ê1.

2. λ1 = λ2 > 0; λ3 = 0:  Te = λ1 (I − ê3 ê3ᵗ)
This case corresponds to a neighbourhood that is constant on lines. The orientation of the lines is given by the eigenvector corresponding to the least eigenvalue, ê3.

3. λ1 = λ2 = λ3 > 0:  Te = λ1 I
This case corresponds to an isotropic neighbourhood, meaning that there exists energy in the neighbourhood but no orientation, e.g. in the case of noise.
The eigenvalues and eigenvectors are easily computed with standard methods such as the Jacobi method, e.g. [86]. Note that the spectral theorem states that all neighborhoods can be expressed as a linear combination of these three cases.
7.1.3 Accuracy of the Orientation Estimate
The performance of the algorithm is evaluated on two synthetic test patterns. The
first test pattern is generated to be locally planar with all directions equally represented. The second test pattern is designed to be locally constant on lines evenly
distributed in the volume. The patterns reflect case 1 and case 2 from above.
Accuracy of Estimation of Locally Planar Structures
The test pattern is a 64x64x64 volume that contains all possible 3D plane orientations for a wide frequency range. A traveler heading outward from the center and moving in a straight line would experience a sine wave with decreasing frequency. Four instances of the test pattern were used for the evaluation, one without noise and three with added Gaussian distributed noise, see also Figure 7.2. The volumes with noise have SNRs of 20 dB, 10 dB and 0 dB respectively, with SNR defined as:

$$\mathrm{SNR} = 20\log_{10}\frac{\mathrm{sdev(pattern)}}{\mathrm{sdev(noise)}} \qquad (7.6)$$
Figure 7.2: From left to right: without noise, 10 dB and 0 dB. From top to bottom:
slice 4, 18 and 32.
The estimated orientation tensor Te (with tensor elements Tᵉij) was compared with the ideal tensor, Tf (with tensor elements Tᶠij), for all points in the volume. The comparison was done with the error estimate:

$$\mathrm{err} = \sqrt{\frac{1}{N}\sum_{N}\sum_{ij}\left(\hat T^e_{ij} - \hat T^f_{ij}\right)^2} \qquad (7.7)$$

where the hat notation indicates that the tensor has been normalized in the Frobenius norm, i.e. Σij T̂ij² = 1, and N is the number of points involved in the error calculation. Since this is a synthetic test pattern it is possible to generate the correct tensor field, T̂f, as

$$\hat{\mathbf{T}}^f = \hat{\mathbf{x}}\hat{\mathbf{x}}^t \qquad (7.8)$$

where x is the coordinate vector. The results obtained with quadrature filters are compared with the results produced by gradient filters. Both quadrature filters and gradient filters are of size 7x7x7. The gradient estimate, (x1, x2, x3)ᵗ, is transformed into the tensor shape by scaling the outer product with 1/√(x1² + x2² + x3²). This gives a 'tensor' with a magnitude comparable with the orientation algorithm and enables the use of Eq. 7.7.
The results are given in Table 7.1. The clear advantage of the algorithm using
quadrature filters is what one could expect, since one should be able to acquire a
more robust estimate with twelve filters compared to three. The frequency responses
of the filters influence the noise suppression, and a more thorough comparison with a
variety of frequency functions is needed. An extensive examination is, however, not the issue here. The intention is to demonstrate that quadrature filters give accurate estimates of the 3D orientation.

SNR     Quadrature   Gradient
∞ dB      0.04         0.10
20 dB     0.06         0.30
10 dB     0.11         0.49
0 dB      0.32         0.61

Table 7.1: Comparison of quadrature and gradient filters using Eq. 7.7.
Note that it is possible to average the orientation tensor to get even more accurate
estimates. This is a virtue of the representation and does not depend upon the
filters used to achieve the initial estimates. Gradient filters in combination with this
representation and averaging are for example used in [15, 57].
Accuracy of Estimation of Locally Linear Structures
The maximum number of evenly distributed lines in 3D is 10, corresponding to the space diagonals of a dodecahedron, [22]. These have been generated from Eq. 7.9:

$$V(\mathbf{x}) = \sum_{k=1}^{10}\left(\frac{\mathbf{x}\cdot\hat d_k}{x}\right)^{2x^2} \qquad (7.9)$$

where V(x1, x2, x3) is the intensity in the voxel, x is the spatial coordinate, x = ||x|| = √(x1² + x2² + x3²) and d̂k is the normalized direction to one vertex of the dodecahedron. In essence this is a squared scalar product between the direction of the line and the direction of the actual position. Directly applying the scalar product would, however, produce a cone instead of the expected line. To compensate for this broadening of the line, the exponent x² is necessary. The result from Eq. 7.9 is rendered in Figure 7.3.
The test pattern used has the size 64x64x64. When doing the actual comparison the center part is omitted. The reason is that there is no unique orientation in the center, where all lines interfere with each other.

The line volume has been corrupted with white Gaussian distributed noise to different SNRs. SNR is in this case given by Eq. 7.6. In order to get meaningful noise estimates the standard deviations of the pattern and the noise are calculated only in the voxels where the signal exists.
The accuracy of the orientation estimate is calculated as suggested in Eq. 7.7. The correct tensors for each line are derived from the directions, d̂k, of the lines by the use of the equation in Section 7.1.2, paragraph 2:

$$\hat{\mathbf{T}}^f = \frac{1}{\sqrt{2}}\left(\mathbf{I} - \hat d_k\hat d_k^t\right) \qquad (7.10)$$
Figure 7.3: The test volume for the line case. The lines correspond to the space diagonals of a dodecahedron.
SNR     Error
∞ dB    0.11
20 dB   0.12
10 dB   0.13
5 dB    0.15
0 dB    0.19

Table 7.2: The result of the noise sensitivity test for the line case according to Eq. 7.7.
and the directions of the lines can be written as

$$\begin{aligned}
\hat d_1 &= k\,(\ d,\ 0,\ -b)^t &\quad \hat d_2 &= k\,(\ d,\ 0,\ b)^t \\
\hat d_3 &= k\,(\ b,\ d,\ 0)^t &\quad \hat d_4 &= k\,(-b,\ d,\ 0)^t \\
\hat d_5 &= k\,(\ 0,\ b,\ d)^t &\quad \hat d_6 &= k\,(\ 0,\ -b,\ d)^t \\
\hat d_7 &= k\,(\ f,\ -f,\ f)^t &\quad \hat d_8 &= k\,(\ f,\ f,\ f)^t \\
\hat d_9 &= k\,(-f,\ f,\ f)^t &\quad \hat d_{10} &= k\,(-f,\ -f,\ f)^t
\end{aligned} \qquad (7.11)$$

where

$$d = a + 2b, \qquad f = a + b, \qquad k = \frac{1}{\sqrt{3}\,(a + b)}$$
The constants a, b and c are given in Eq. 7.3. This is a symmetric formulation of the vertices of a dodecahedron. Since the structures in the test pattern would otherwise coincide with the frame slicing, it is better if the test volume is tilted with respect to the coordinate system of the filters. Then there will be a larger number of different test cases. Thus the test pattern has been rotated 30° around the x-axis followed by a rotation of 30° around the new y-axis.
The result from the error calculation, Eq. 7.7, for the quadrature filters is given in Table 7.2. One possible explanation for the rather large error for the original signal is the construction of the ideal tensor, T̂f. Centered on the line, T̂f from Eq. 7.10 holds, but a few voxels away the shape of the tensor should also reflect the variation across the lines, which T̂f does not do.
7.2 Optical Flow Estimation
A natural way of estimating optical flow in an image sequence is to estimate the 3D orientation in the sequence, as described above. Orientation in three dimensions (two spatial and one time dimension) contains information about both the spatial orientation
Figure 7.4: Overview of a hierarchical algorithm. [Hierarchy, bottom to top: Time Sequence, Orientation Estimate, Velocity Estimate, Principal Direction Estimate, Acceleration Estimate.]
and the optical flow. Over the years several other algorithms have been proposed
for time sequence analysis. Three different approaches can be distinguished in this
field.
• Matching: 2D image processing techniques are used on one frame at a time to extract feature points, curves, etc. Matching of the feature extraction results is used to find the corresponding parts in the neighboring frames [53, 54, 4].

• Derivatives: The solution to the optical flow equation is approximated by the use of gradient filters. The filters are 3D filters (two spatial dimensions + time) [80, 91, 57, 15].

• Signal Processing: The 3D data set is analyzed with tools such as the Fourier transform to design algorithms for the estimation of the planes originating from moving lines and the curves originating from moving points in the spatio-temporal data set [34, 2, 48].
The borderlines between these three methods are not always clear and it is possible
to use algorithms which incorporate aspects of all of the three approaches. This
section will not elaborate on the virtues and drawbacks of the different approaches.
The methods presented here can be considered to belong to the signal processing
approach.
The algorithm introduced here supplies estimates of velocity and can be extended
with acceleration estimates [9, 8]. Fig. 7.4 shows possible algorithm components to
indicate the hierarchical structure where the optical flow estimates are supposed to
function.
Figure 7.4 presents a general framework for the analysis of time sequences. Recent developments in human visual system models suggest the existence of frequency and orientation channels representing the local spatial, as well as the local spatio-temporal, image spectrum, [55, 3]. In the latter case an energy concentration in the local spectrum at a particular orientation and frequency channel corresponds to a velocity vector with a particular direction and coarseness range. Adelson and Bergen [2] as well as Watson and Ahumada [92] and others have proposed human visual motion sensing models based on the local spectra by means of the spatio-temporal filter responses using separable filters of Gabor type.
7.2.1 Velocity Estimation
If the signal to analyze is a time sequence, a plane corresponds to a moving line and a line corresponds to a moving point. The optical flow will be obtained by an eigenvalue analysis of the estimated representation tensor. The projection of the eigenvector corresponding to the largest eigenvalue onto the image plane will give the flow field. However, the so-called aperture problem will give rise to an unspecified velocity component, the component moving along the line. The aperture problem is a problem in all optical flow algorithms which rely on local operators. On the other hand, the aperture problem does not exist for moving points in the sequence. In this case of velocity estimation the correspondence between the energy in the spatial dimensions and the time dimension is used to get a correct velocity estimate.
By examining the relations between the eigenvalues in the orientation tensor it is possible to divide the optical flow estimation into different categories, [9, 47]. Depending on the category different strategies can be chosen, see Section 7.1.2. Case number two in Section 7.1.2, i.e. the line case, gives a correct estimation of the velocity in the image plane and is thus very important in the understanding of the motion. To give an illustration of this the synthetic test sequence, Figure 7.5¹, is used as an example. In Figure 7.6 the correct velocity field is given with arrows; white arrows correspond to the plane case and black arrows to the line case.
To do this division of different shapes of the tensor the following functions are chosen:

$$p_{plane} = \frac{\lambda_1 - \lambda_2}{\lambda_1} \qquad (7.12)$$
$$p_{line} = \frac{\lambda_2 - \lambda_3}{\lambda_1} \qquad (7.13)$$
$$p_{iso} = \frac{\lambda_3}{\lambda_1} \qquad (7.14)$$
These expressions can be seen as the probability for each case. In Figure 7.7 the
division is made by selecting the case having the highest probability.
The calculation of the optical flow is done using Eq. 7.15 for the plane case and
Eq. 7.16 for the line case. In neighborhoods classified as ‘isotropic’ no optical flow
is computed.

¹ A rotating and translating star together with a fair amount of Gaussian noise. The star is rotated 1.8° counter-clockwise around its center, and translates 0.5 pixel up and 1 pixel to the right between each frame.

Figure 7.5: One frame from the original sequence of the translating and rotating star, with white Gaussian noise added.

The 'true' optical flow in neighborhoods of the 'plane' type, such as moving lines, cannot be computed by optical flow algorithms using only local neighbourhood operations, as mentioned earlier. The optical flow is computed by

$$\mathbf{x} = \hat e_1, \qquad \mathbf{v}_{line} = \frac{-x_1 x_3\,\hat x_1 - x_2 x_3\,\hat x_2}{x_1^2 + x_2^2} \qquad (7.15)$$
where x̂1 and x̂2 are the orthogonal unit vectors defining the image plane.
The aperture problem does not exist for neighborhoods of the 'line' type, such as moving points. This makes them, as mentioned, very important for motion analysis. The optical flow is computed by

$$\mathbf{x} = \hat e_3, \qquad \mathbf{v}_{point} = \frac{x_1\,\hat x_1 + x_2\,\hat x_2}{x_3} \qquad (7.16)$$
In Figure 7.8 the result from Eq. 7.15, white arrows, and Eq. 7.16, black arrows,
is given.
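A sketch of the classification (Eqs. 7.12-7.14) and the flow computation (Eqs. 7.15-7.16) for a single tensor is given below; time is taken as the third coordinate, and the final check uses the ideal line-case tensor from Section 7.1.2 for a point moving one pixel/frame:

    import numpy as np

    def optical_flow(T):
        lam, E = np.linalg.eigh(T)          # eigenvalues in ascending order
        l1, l2, l3 = lam[2], lam[1], lam[0]
        p = {"plane": (l1 - l2) / l1, "line": (l2 - l3) / l1, "iso": l3 / l1}
        case = max(p, key=p.get)
        if case == "plane":                 # moving line: normal flow, Eq. 7.15
            x = E[:, 2]                     # eigenvector of largest eigenvalue
            v = np.array([-x[0]*x[2], -x[1]*x[2]]) / (x[0]**2 + x[1]**2)
        elif case == "line":                # moving point: true flow, Eq. 7.16
            x = E[:, 0]                     # eigenvector of smallest eigenvalue
            v = np.array([x[0], x[1]]) / x[2]
        else:
            v = None                        # isotropic: no flow computed
        return case, v

    # A point moving one pixel/frame along x traces a line in direction
    # (1, 0, 1); its ideal tensor is I - t t^T (case 2 of Section 7.1.2).
    t = np.array([1.0, 0.0, 1.0]) / np.sqrt(2.0)
    print(optical_flow(np.eye(3) - np.outer(t, t)))   # ('line', [1., 0.])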
7.2.2 Spatio-Temporal Channels
The human visual system has difficulties handling high spatial frequencies simultaneously with high temporal frequencies [7, 27]. This means that objects with high velocity cannot be seen sharply without tracking. A possible explanation for this is that the visual system performs an effective data reduction. The data reduction is made in such a way that high spatial frequencies can be handled if the temporal frequency is low, and vice versa. This strategy is possible to use in a computer vision model for time sequences.
The input image sequence is subsampled both spatially and temporally into different channels. In Table 7.3 the data content in the different channels relative to a reference sequence, ch00, is shown.

Figure 7.6: The correct velocity vectors from the star sequence. Black vectors correspond to the moving point case and white ones to the moving line case.

Figure 7.7: The subdivision into the different cases following Eq. 7.12 for the test sequence.

Figure 7.8: The result of the optical flow algorithm. Black vectors correspond to the moving point case and white ones to the moving line case.

The reference sequence has maximum resolution in all dimensions; typically this means a video signal of 50 Hz², height 576 and
width 720 pixels. The frequency difference between adjacent channels is one octave,
i.e. a subsampling factor of 2 is used.
The numbers in Table 7.3 indicate that large data reduction can be made by not
using the channels with high resolution in both spatial and temporal domains. For
instance, the channels on the diagonal together contain approximately 1/4 of the
data in the reference sequence (ch00).
As pointed out in Chapter 3 there is also a signal theoretical reason to use a
pyramid representation of the image. A single filter has a particular limited pass
band, both temporally and spatially, which may or may not be tuned to the different
features to describe. In Figure 7.9a the upper cut-off frequency for a spatio-temporal
quadrature filter set is indicated. The lower cut-off frequency is not plotted for the
sake of clarity. Only the first quadrant in the ωs , ωt plane is plotted. The use of this
filter set on different subsampled channels corresponds to using filters with different
center frequencies and constant relative bandwidth. Figure 7.9b indicates the upper
cut-off frequency when convolving the channels on the diagonal in Table 7.3 with
this filter set.
To avoid aliasing in the subsampling the sequence must be prefiltered with a lowpass filter. As the resulting channels are to be further processed, the design of the lowpass filter is critical. The estimation of optical flow from Eq. 7.15 and Eq. 7.16 utilizes the relationship between energies originating from spatial variations and from temporal variations. The lowpass filter used for anti-aliasing should therefore not influence this relationship.
² A strategy for filtering interlaced sequences is described in Appendix 7.A.

Relative data content (columns: spatial subsampling 1/8, 1/4, 1/2, 1; rows: temporal subsampling 1, 1/2, 1/4, 1/8):

1/64    1/16    1/4     1
1/128   1/32    1/8     1/2
1/256   1/64    1/16    1/4
1/512   1/128   1/32    1/8

Notation for the sequence channels (same layout):

ch30    ch20    ch10    ch00
ch31    ch21    ch11    ch01
ch32    ch22    ch12    ch02
ch33    ch23    ch13    ch03

Table 7.3: Data content and name convention for the different spatio-temporal channels.

Take ch10 as an example. This channel is subsampled by a factor of two in the spatial dimension and not subsampled in the time dimension. A natural way of
obtaining this channel is to apply a spatial Gaussian, as described in Appendix 4.A, prior to the subsampling. However, by such a filtering the relationship between the spatial and temporal domains is altered. It is obvious from Figure 7.10, where a stylized Gaussian is plotted, that the energy in the subsampled spatial domain is reduced due to the filtering.
The way to design the anti-aliasing filter for obtaining ch10 is as an elliptic spatio-temporal function, e.g. an elliptic Gaussian

$$G(\omega_s, \omega_t) = e^{-0.5\left((\omega_s/\sigma_s)^2 + (\omega_t/\sigma_t)^2\right)} \qquad (7.17)$$

where σs = π/4 and σt = π/2. Then the relations in the spatio-temporal domain are not affected. The reasoning is the same for designing the anti-aliasing filters for the other channels. The design is reduced to setting the values of σs and σt.
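A transcription of Eq. 7.17 evaluated on a frequency grid, with the σs and σt quoted for ch10 (wide in time, narrow in space, so the spatio-temporal energy relationship survives the spatial subsampling):

    import numpy as np

    def elliptic_gaussian(ws, wt, sigma_s=np.pi / 4, sigma_t=np.pi / 2):
        return np.exp(-0.5 * ((ws / sigma_s) ** 2 + (wt / sigma_t) ** 2))

    ws, wt = np.meshgrid(np.linspace(-np.pi, np.pi, 5),
                         np.linspace(-np.pi, np.pi, 5))
    print(elliptic_gaussian(ws, wt).round(3))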
7.2.3 Quantitative Error Measurements
The estimation error is computed in two different ways: first following an error measure from [34] given in Eq. 7.18, second by a histogram, where the correct magnitude of the image flow has been quantized into 100 different velocities. For each of these velocities the relative mean error is calculated. The result is that the sensitivity of the used filter set with respect to speed can be determined. The latter method will be used to examine the performance of the normal flow calculation in different spatio-temporal channels.
In order to make comparisons with other algorithms, the same angular measure of error is used as in [11, 34]. In [11] an extensive testing of several optical flow algorithms is performed using the following error measure between the correct vector ôf and an estimate ôe:

$$\psi_E = \arccos(\hat o_e \cdot \hat o_f) \qquad (7.18)$$
Figure 7.9: Cut-off frequency for a spatio-temporal filter. [Two panels in the (ωs, ωt)-plane; the second marks the diagonal channels Ch30, Ch21, Ch12, Ch03 together with Ch00.]

Figure 7.10: Illustration of a Gaussian filtering; a subsampling will move the frequency π/2 to the maximum frequency π.
where the optical flow vector v = (v1, v2) is expressed as a 3D orientation vector

$$\hat o \equiv \frac{(v_1, v_2, st)^t}{\sqrt{v_1^2 + v_2^2 + st^2}} \qquad (7.19)$$

where st = 1 in [34, 11].
This error measure can be further expounded by considering the behavior if the absolute error, ∆v, is small, see Figure 7.11. Differentiating the angle α in Figure 7.11 with respect to the absolute error will give the correspondence to ψE:

$$\Delta v = \sqrt{(v_{1e} - v_{1f})^2 + (v_{2e} - v_{2f})^2} \qquad (7.20)$$
$$\alpha = \arctan\left(\frac{v}{st}\right) \qquad (7.21)$$
$$\psi_E = \Delta\alpha = \frac{st\,\Delta v}{st^2 + v^2} \qquad (7.22)$$

Figure 7.11: Illustration of the angular error suggested in [34]. [A right triangle with legs st and v; ∆v and the angles α and ψE are marked.]
This angular measure handles small velocities without the amplification inherent in
the commonly used relative measure of vector differences.
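A transcription of the angular measure (Eqs. 7.18-7.19) into numpy, with st = 1 as in [34, 11]:

    import numpy as np

    def angular_error(ve, vf, st=1.0):
        """Angle in degrees between the 3D orientation vectors of an
        estimated flow ve and a correct flow vf."""
        def orient(v):
            o = np.array([v[0], v[1], st])
            return o / np.linalg.norm(o)
        dot = np.clip(orient(ve) @ orient(vf), -1.0, 1.0)
        return np.degrees(np.arccos(dot))

    print(angular_error([1.0, 0.0], [1.0, 0.1]))   # a few degrees of error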
Results
The error estimates obtained by applying Eq. 7.18 are calculated for two of the sequences in the examination in [10], i.e. the translating tree and the diverging tree. One frame from each of the original sequences, with the correct velocity fields superimposed, can be found in Figure 7.12.
The results of the proposed method, based on eigenvalue decomposition according to Eq. 7.16, depend highly on the way the estimate is averaged in the representation space. The smooth motion fields in the test sequences enable the use of large averaging filters without destroying the information. The values from the calculation of Eq. 7.18 are given in Table 7.4, where three different sizes of a Gaussian lowpass filter were used. As can be seen in the density column in Table 7.4, not all pixels in the sequence contribute to the error calculation. This is due to the aperture problem; the pixels that have been used fulfill the following demands:

$$\|\mathbf{T}\|_{frob} \ge 0.1 \qquad \text{and} \qquad p_{line} \ge 0.5$$
Figure 7.12: Two frames from the tree sequences. To the left the motion field from the translating sequence is superimposed and to the right the motion field from the diverging sequence is superimposed (further information about these sequences can be found in [33]).
Translating tree:

Size of averaging filter   Average error   Standard Deviation   Density
7x7x7                      1.61°           0.94°                11.3 %
15x15x15                   0.98°           0.51°                17.7 %
21x21x21                   0.85°           0.42°                21.1 %

Diverging tree:

Size of averaging filter   Average error   Standard Deviation   Density
7x7x7                      2.93°           2.10°                 8.5 %
15x15x15                   1.93°           1.28°                13.7 %
21x21x21                   1.77°           1.14°                17.7 %

Table 7.4: The result of the error calculation from the velocity estimation algorithm based on eigenvalue decomposition of the representation tensor. The upper table shows the results from the translating tree sequence and the lower the results from the diverging tree sequence.
where p_line is defined in Eq. 7.13. Other values of these thresholds give different values in the table. In general one can state that the higher the density of the estimate, the higher the error will be. This indicates that some kind of regularisation, see e.g. [13], using the pixels with the highest certainty, might be a useful postprocessing step to get a motion field over the whole image.
To compare the values in Table 7.4 with other algorithms, two tables from [11, 10] are given in Tables 7.5-7.6. The names in these tables refer to the authors of the articles from which Barron et al. have implemented the algorithms, see [52, 74, 91, 4, 49, 34]. The decrease of the error value in Table 7.4 with the size of the averaging filters shows how extremely difficult it is to do an impartial sensitivity examination.
The error estimate for the histogram method is calculated for the test pattern in Figure 7.2, which has also been used in the performance examination of the quadrature filters, see Table 7.1. The correct image velocities have been calculated for this
Technique                       Average error   Standard Deviation   Density
Horn and Schunck                38.99°          27.84°               100 %
Lucas and Kanade (λ2 ≥ 1.0)     1.75°           1.43°                40.8 %
Lucas and Kanade (λ2 ≥ 5.0)     1.12°           0.82°                13.6 %
Uras et al. (unthresholded)     0.71°           0.81°                100 %
Uras et al. (det(H) ≥ 1.0)      0.47°           0.29°                41.7 %
Anadan                          4.54°           2.98°                100 %
Heeger                          4.79°           2.39°                13.8 %
Fleet and Jepson (τ = 2.5)      0.36°           0.41°                76.0 %
Fleet and Jepson (τ = 1.25)     0.23°           0.20°                51.5 %

Table 7.5: The results from [11, 10] of different algorithms on the translating tree sequence. (The table is reprinted by courtesy of David Fleet.)
Technique                       Average error   Standard Deviation   Density
Horn and Schunck                12.77°          12.00°               100 %
Lucas and Kanade (λ2 ≥ 1.0)     3.04°           2.53°                49.4 %
Lucas and Kanade (λ2 ≥ 5.0)     2.32°           1.84°                24.8 %
Uras et al. (unthresholded)     5.11°           3.96°                100 %
Uras et al. (det(H) ≥ 1.0)      4.05°           2.26°                56.5 %
Anadan                          8.23°           6.17°                100 %
Heeger                          4.95°           3.09°                73.8 %
Fleet and Jepson (τ = 2.5)      1.24°           0.72°                64.3 %
Fleet and Jepson (τ = 1.25)     1.10°           0.53°                49.7 %

Table 7.6: The results from [11, 10] of different algorithms on the diverging tree sequence. (The table is reprinted by courtesy of David Fleet.)
pattern. Then the optical flow according to Eq. 7.15 has been estimated for four different spatio-temporal channels, ch00, ch01, ch10 and ch11, of the test volume. Prior to the estimation the pattern was corrupted with white Gaussian additive noise to an SNR of 10 dB. The correct image velocities are quantized into 100 'boxes' in the interval [0, 5.5] pixels/frame. For each of these 'boxes' the sum of the absolute velocity errors and the number of voxels involved in the error sum are calculated. In Figure 7.13 the relative mean error for each 'box' is plotted.
The behavior of the spatio-temporal channels is that ch01 has a better performance for low velocities, while ch10 has better performance for higher velocities. ch00 and ch11 have, compared to each other, approximately the same performance for different velocities. This is no surprise since the maximum velocity sensitivities for these channels are

ch00   1 pixel/frame
ch01   0.5 pixel/frame
ch10   2 pixel/frame
ch11   1 pixel/frame

which is also indicated in Figure 7.9.
The histograms in Figure 7.13 also show that a combination of different spatio-temporal channels allows a relative error well below 10 % for a wide range of image velocities. It should then be considered that the input pattern was corrupted to a signal-to-noise ratio of 10 dB.
Figure 7.13: The relative mean error for different image velocities and four different spatio-temporal channels. The results are compiled from estimations of the test pattern in Figure 7.2, with SNR = 10 dB. [Four panels, ch00, ch01, ch10 and ch11; axes: relative error versus correct image velocity in pixels/frame.]
Figure 7.14: Sampling grid for interlaced and reference sequences respectively. [Lines of frames 1-6 plotted in the (t, y)-plane for the interlaced grid and for the reference sequence (ch00).]
Appendix

7.A Filtering of Interlaced Video Signals
Ordinary video equipment produces images where the sampling grid is different from the commonly used grid for digital images. This is due to the interlacing defined in the video standard. In Figure 7.14, the sampling grids for the different lines are plotted against time. In the horizontal direction (x-direction), the sampling density is the same as the density for the t- and y-directions in the reference sequence. It can be seen that the interlaced signal contains half the number of sampling points compared to the reference sequence, ch00.
To be able to perform feature extraction by using standard filters, the sequence should
consist of non-interlaced images. The easiest way to convert the sequence is to put the
odd and the even field from each frame in a separate image. The resulting sequence then
gets maximum resolution in time, but the resolution in the y-direction is only half of the
maximum resolution. Because of the full resolution in the x-direction, the pixels are not
square shaped any longer. Furthermore, the same line in two consecutive fields does not
cover the same area in the scene. This gives aliasing if for instance a one pixel wide
horizontal line appears in the scene.
A more accurate way is to resample the interlaced sequence so that the different resolution channels can be created. Figure 7.15 shows the frequency domains corresponding to the interlaced and the ordinary sampling grids. The rhomb indicates the Nyquist frequency for the interlaced signal, and the square that for the reference sequence. Because the rhomb covers only a subarea of the square, it follows that it is impossible to achieve the resolution of ch00 from the video signal. Without any aliasing it is possible to get ch11, i.e. half the resolution both spatially and temporally. This corresponds to the shaded square inside the rhomb in Figure 7.15b. See also the results in Chapter 9, where a complete reconstruction of ch00 from the interlaced original is given in Figures 9.10-9.11.
Figure 7.15: Nyquist frequencies in the (ωt, ωy)-plane for the two grids; a) interlaced, b) reference.

From Figure 7.15a it is clear that the interlaced signal has maximum resolution in the y-direction when the temporal frequency is zero, i.e. the scene is static. Furthermore, the
video signal has maximum temporal resolution when the spatial frequency is zero, that is,
when the scene is uniform.
These two cases are quite degenerate, but similar degeneration has been found in the mammalian visual system (x- and y-cells) [27], i.e. high spatial frequency can only be detected if the temporal frequency is low and vice versa.
To extract more of the information inside the rhomb in Figure 7.15 while simultaneously avoiding aliasing, two more filters have been implemented. These filters correspond to ch02 and ch20. They are indicated with dashed rectangles in Figure 7.15b.
The frequency function of the filters is a Gaussian, G:

G(ωx, ωy, ωt) = e^(−(1/2)(ωx²/σx² + ωy²/σy² + ωt²/σt²))
The choices of σx, σy, σt and the sizes of the filters for the different channels are:

ch11:  σx = π/4   σy = π/4   σt = π/4   size 7 × 7 × 7
ch20:  σx = π/8   σy = π/8   σt = π/2   size 13 × 13 × 4
ch02:  σx = π/2   σy = π/2   σt = π/8   size 4 × 4 × 13
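As an illustration of how such a channel filter could be generated on a sampled frequency grid, the following is a minimal sketch; the function gaussian_channel and the channels dictionary are illustrative names, not part of the original implementation:

```python
import numpy as np

def gaussian_channel(shape, sigmas):
    # Sampled frequency response G(wx, wy, wt) for one channel.
    grids = np.meshgrid(*[np.fft.fftfreq(s) * 2 * np.pi for s in shape],
                        indexing='ij')
    return np.exp(-0.5 * sum((g / s) ** 2 for g, s in zip(grids, sigmas)))

# (sigma_x, sigma_y, sigma_t) and kernel size per channel, as listed above.
channels = {'ch11': ((np.pi/4, np.pi/4, np.pi/4), (7, 7, 7)),
            'ch20': ((np.pi/8, np.pi/8, np.pi/2), (13, 13, 4)),
            'ch02': ((np.pi/2, np.pi/2, np.pi/8), (4, 4, 13))}
G = gaussian_channel((64, 64, 64), channels['ch11'][0])
```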
The filters are designed on an ordinary sampling grid, as in Figure 7.14b, but applied on an interlaced grid, Figure 7.14a. Since the interlaced grid is sparse, only about half of the points in the filter kernels are needed. Further subsampling is performed on an ordinary grid, and therefore standard filters can be used.
Chapter 8
Applications in Structure from Motion
The purpose of this chapter is to exemplify how the information captured in the 3D orientation tensor can be used for further processing. Therefore, not all details of the algorithms used will be presented. Two applications relevant for the estimation of structure from motion will be described: first a technique for estimating the focus of expansion for a translating camera, and second an application of motion stereo.
8.1 Extraction of Focus of Expansion
The problem of finding the Focus of Expansion (FOE) for a moving camera is one step in the extraction of the scene geometry from the image flow, [50, 41]. Figure 8.1 presents the geometrical structure of the problem and the notation.
Figure 8.1: Geometry of the projected motion field for a translating camera.
Using simple geometry and the notation from Figure 8.1, where f is the focal length and Z is the depth, the following relationship can be established between R, the projection of a translating point, P, in the scene onto the (X, Y)-plane, and r, the projection of the same point onto the image plane (x, y):

r/f = R/Z    (8.1)
Taking the derivative of Eq. 8.1 yields:

f dR/dt = Z dr/dt + r dZ/dt    (8.2)
Denoting the velocity of a point P in the scene by

dP/dt = (dR(X, Y)/dt, dZ/dt) = (Vx, Vy, Vz)

and the velocity of a point r in the image by

dr(x, y)/dt = (vx, vy),

and inserting the derivatives of R, r and Z in Eq. 8.2 results in

f (Vx, Vy) = Z (vx, vy) + Vz (x, y)

⇒  vx = (1/Z)(f Vx − Vz x),   vy = (1/Z)(f Vy − Vz y)    (8.3)
For a pure translation of the camera, features in the image move towards or away
from a single point in the image, the FOE. This means that the velocity at the FOE
is zero. Inserting this condition into Eq. 8.3 gives the image coordinates of the FOE,
r0 = (x0, y0)ᵗ:

x0 = f Vx / Vz ,   y0 = f Vy / Vz

⇒  (x0 − x)/vx = (y0 − y)/vy = Z/Vz

which implies

(x − x0) vy − (y − y0) vx = 0    (8.4)
Using velocity estimates from more than two points not suffering from the aperture problem (the line case from Section 7.2.1), an overdetermined linear equation system in x0 and y0, formed by N equations of the form of Eq. 8.4, is to be solved.
A r0 = b ,   where

A = ( −vy1   vx1
       ...
      −vyN   vxN ) ,      b = ( −x1 vy1 + y1 vx1
                                 ...
                                −xN vyN + yN vxN )    (8.5)
Figure 8.2: Error in the calculation of the FOE, scale in pixels. The results are for 22 frames of the diverging tree sequence, where the original image size is 150x150 pixels.

A least squares solution to Eq. 8.5 can be obtained by finding the pseudo-inverse, A+, of the equation system matrix A. It can be noted that it is simple to incorporate certainties in the solution of the equation system, by just multiplying each equation by a certainty factor, [86].
A⁺ = (AᵀA)⁻¹Aᵀ
The solution can now be written:
r0 = A⁺b = (AᵀA)⁻¹Aᵀb
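As an illustration, a minimal numerical sketch of this least squares estimation; the function estimate_foe and its interface are assumptions for the purpose of the example, and the certainty weighting follows the remark above:

```python
import numpy as np

def estimate_foe(x, y, vx, vy, certainty=None):
    # Solve Eq. 8.5 in the least squares sense for the FOE r0 = (x0, y0).
    A = np.column_stack((-vy, vx))
    b = -x * vy + y * vx
    if certainty is not None:        # weight each equation, cf. [86]
        A = A * certainty[:, None]
        b = b * certainty
    r0, *_ = np.linalg.lstsq(A, b, rcond=None)
    return r0
```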
An experiment using 22 frames from an image sequence containing a camera translation towards a tilted picture (see Figure 7.12), with about 86 velocity estimates from each frame, gave the result shown in Figure 8.2, indicating an error of less than 1%. The correct answer is r0 = (0, 0)ᵗ.
Figure 8.3: A line in two different frames, frame (i) and frame (i+1), with the real flow and the estimated flow.
8.2 Motion Stereo Algorithm
Motion stereo is a natural application of the optical flow algorithm described in Section 7.2.1. Under the assumption that the scene is static during the translation of the camera, the image velocity is, at least in principle, inversely proportional to the depth in the scene. If the focus of expansion is known or estimated, the direction of the image flow induced by the camera translation is known. The magnitude of this flow will, however, depend on the depth of the feature point.
To estimate the depth from motion the following ideas are utilized. Figure 8.3 illustrates that the optical flow algorithm will, due to the aperture problem, underestimate the flow, i.e. overestimate the disparity, in the case of a moving line. Knowing the direction of the image flow, vfoe, from the position of the FOE enables a compensation for this overestimation. The orientation of the line is used to compute the disparity, d, from the flow component projected onto the normal vector of the line:

dline = (vfoe · vline) / |vline|²    (8.6)
Moving points, e.g. the end points of the line in Figure 8.3, do not suffer from the aperture problem, and the disparity is computed from the optical flow by:

dpoint = 1/vpoint    (8.7)
The partitioning into the 'point case' and the 'line case' is performed according to Eq. 7.12.
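A minimal sketch of how the two disparity cases could be computed per estimate, assuming vfoe is a unit vector in the known flow direction; the function name and interface are illustrative:

```python
import numpy as np

def disparity(v, v_foe_dir, is_point):
    # Disparity from one optical flow estimate v (2-vector), Eqs. 8.6-8.7.
    # v_foe_dir: unit vector in the flow direction given by the FOE.
    if is_point:
        return 1.0 / np.linalg.norm(v)              # point case, Eq. 8.7
    return np.dot(v_foe_dir, v) / np.dot(v, v)      # line case, Eq. 8.6
```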
8.2.1 Results
Two sequences, the diverging tree and the translating tree, have been used for
quantitative evaluation of the proposed method. The correct depth in the scene is
given in Figure 8.4, where the intensity is proportional to the closeness to the camera.
The result from the method, in the points where it is meaningful to calculate the
image velocity, is given in Figure 8.5.

                   ERRdepth   σerr
Diverging tree     -0.03      0.07
Translating tree   -0.02      0.06

Table 8.1: The relative depth error for the translating and diverging tree sequences.

Figure 8.4: The correct depth for the translating tree (left) and for the diverging tree (right).

The error is calculated as the mean of the relative depth error, i.e.

Δd = (dest − dcorrect) / dcorrect

ERRdepth = (1/N) Σ_{i=1}^{N} Δdi

and the standard deviation, σerr, of the errors as

σerr = sqrt( (1/(N−1)) Σ_{i=1}^{N} (Δdi − ERRdepth)² )
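These two statistics are straightforward to compute from the estimated and correct depth maps; a minimal sketch (the function name is illustrative):

```python
import numpy as np

def depth_error_stats(d_est, d_correct):
    # Mean relative depth error and its standard deviation, as defined above.
    delta = (d_est - d_correct) / d_correct
    err_depth = delta.mean()
    sigma_err = np.sqrt(((delta - err_depth) ** 2).sum() / (delta.size - 1))
    return err_depth, sigma_err
```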
The performance of the depth estimation for one frame in the translating tree sequence and for one frame in the diverging tree sequence is given in Table 8.1. The correct depths, in these particular frames, are in the interval [11.3, 16.4] for the diverging tree, and in the interval [11.3, 14.7] for the translating tree.
A qualitative result from this method is given in Figure 8.7. The original sequence
originates from the image sequence database at Sarnoff Labs, Princeton, USA and
the camera is translated perpendicular to the line of sight. One frame from the
original sequence is shown in Figure 8.6.
Figure 8.5: The estimated depth for the translating tree (left) and for the diverging
tree (right).
Figure 8.6: One frame from the Sarnoff tree sequence.
Figure 8.7: The result from the motion stereo algorithm, bright means close to the
camera and dark far away.
Chapter 9
Adaptive Filtering of 3D Signals
In this chapter the extension of the 2D orientation adaptive filtering scheme from Section 5.2 to 3D signal spaces is described. The 3D scheme utilizes the tensor representation from Section 7.1, which ensures a simple control of an adaptive filter for image sequences or other data volumes. The adaptivity of the filter consists of a signal controlled variation of shape and orientation. The shape of the filter has 3 degrees of freedom and can be said to span the range line - plane - sphere. The shape of the filter is rotation invariant and the filter can have any orientation. Algorithms for adaptive, or steerable, 3D filters have not been the subject of many publications, but there are a few examples, [6, 36, 42].
The basic idea in the 3D case is the same as in 2D, Section 5.2, i.e. to take advantage of the spatial relations in the signal. By, for example, lowpass filtering along an estimated signal structure, the detectability, both for a human observer and for further processing, can be considerably enhanced. The location, orientation and velocity of lines and edges in the sequence or volume are estimated according to the method in Section 7.1. This information is used as contextual control to construct an adaptive filter. The adaptive filter is constructed as a linear combination of 6 fixed filters having the same shape but different orientations.

The method has been tested both on synthetic patterns and on actual recorded sequences. The results, presented for a wide variety of volumes in Section 9.3, are indeed satisfying.
9.1 The Algorithm
The estimated local structure and orientation from Te , Eq. 7.5, can be used to
adapt a filter to the signal. The desired filter containing the degrees of freedom for
an orientation adaptive filter can be written as:
F(ω) = Fω(ω) [ g1(λ1, λ2, λ3)(ω̂ · ê1)² + g2(λ1, λ2, λ3)(1 − (ω̂ · ê3)²) + g3(λ1, λ2, λ3) ]    (9.1)
where Fω (ω) is a bandpass function and
g1 is a weighting function for a spatial plane filter in the direction of the eigenvector
corresponding to the largest eigenvalue, ê1 , of Te .
g2 is a weighting function for a spatial line filter in the direction of the eigenvector
corresponding to the least eigenvalue, ê3 .
g3 is a weighting function for an isotropic filter.
A direct extension of the 2D filter, see Eq. 5.12-5.13, yields the following weighting
functions, gi .
g1(λ1, λ2, λ3) = (λ1 − λ2)/λ1
g2(λ1, λ2, λ3) = (λ2 − λ3)/λ1        (9.2)
g3(λ1, λ2, λ3) = λ3/λ1
These choices give a filter following the signal structure as it is estimated by the
representation tensor. The above choice also follows the probabilities from Eq. 7.12
of how the optical flow should be interpreted.
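A minimal sketch of how these weighting functions could be computed from an estimated tensor; the function name is illustrative and the eigenvalues are assumed sorted in decreasing order:

```python
import numpy as np

def weights_3d(Te):
    # Weighting functions of Eq. 9.2, computed from one 3x3 tensor Te.
    lam3, lam2, lam1 = np.linalg.eigvalsh(Te)      # ascending order
    g1 = (lam1 - lam2) / lam1                      # plane weight
    g2 = (lam2 - lam3) / lam1                      # line weight
    g3 = lam3 / lam1                               # isotropic weight
    return g1, g2, g3
```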
Is it now possible to extend the 2D scheme from Proposition 1 in Section 5.2, using the weighting functions from Eq. 9.2?
Proposition 2
Assume six polar separable filters having the same radial frequency function, Fω (ω),
and angular functions (ω̂ · n̂k )2 . The directions, n̂k , are the same as in the estimation
procedure, Eq. 7.2. Then the interpolation can be written, [63]:
F(ω) = (5/4) Fω(ω) Σ_{k=1}^{6} T̂e • (Nk − (1/5) I)(ω̂ · n̂k)²    (9.3)
where T̂e is the normalized estimate and • symbolizes the tensor inner product
defined in Eq. 5.15.
Proof of Proposition 2:
From the spectrum theorem it is possible to split the estimated tensor in three parts,

T̂e = (1/λ1) Te = ê1ê1ᵗ + (λ2/λ1) ê2ê2ᵗ + (λ3/λ1) ê3ê3ᵗ    (9.4)
each one corresponding to one orientation. The spectrum theorem shows that it is
enough to investigate the interpolation in Eq. 9.3 for a tensor corresponding to one
orientation.
A representation tensor, T0, originating from a one-dimensional input, ê1 = (u1, u2, u3)ᵗ, is

T0 = ( u1²   u1u2  u1u3
       u1u2  u2²   u2u3    (9.5)
       u1u3  u2u3  u3² )
where sqrt(u1² + u2² + u3²) = 1. Expanding the filter orientation tensors, Nk = n̂k n̂kᵗ, yields

(rows separated by semicolons)

N1 = c² ( a², 0, ab ; 0, 0, 0 ; ab, 0, b² )
N2 = c² ( a², 0, −ab ; 0, 0, 0 ; −ab, 0, b² )
N3 = c² ( b², ab, 0 ; ab, a², 0 ; 0, 0, 0 )
N4 = c² ( b², −ab, 0 ; −ab, a², 0 ; 0, 0, 0 )    (9.6)
N5 = c² ( 0, 0, 0 ; 0, b², ab ; 0, ab, a² )
N6 = c² ( 0, 0, 0 ; 0, b², −ab ; 0, −ab, a² )
where the constants a, b and c are given in Eq. 7.3.
Next the tensor inner products, T0 • (Nk − (1/5)I), are calculated:

T0 • (N1 − (1/5)I) = c²(u1²a² + u3²b² + 2u1u3ab) − (1/5)(u1² + u2² + u3²)
T0 • (N2 − (1/5)I) = c²(u1²a² + u3²b² − 2u1u3ab) − (1/5)(u1² + u2² + u3²)
T0 • (N3 − (1/5)I) = c²(u1²b² + u2²a² + 2u1u2ab) − (1/5)(u1² + u2² + u3²)
T0 • (N4 − (1/5)I) = c²(u1²b² + u2²a² − 2u1u2ab) − (1/5)(u1² + u2² + u3²)    (9.7)
T0 • (N5 − (1/5)I) = c²(u2²b² + u3²a² + 2u2u3ab) − (1/5)(u1² + u2² + u3²)
T0 • (N6 − (1/5)I) = c²(u2²b² + u3²a² − 2u2u3ab) − (1/5)(u1² + u2² + u3²)
To further investigate the adaptive filter, the filter outputs (ω̂ · n̂k)² are expressed in Cartesian coordinates, ω̂ = (ω1, ω2, ω3)ᵗ/ω:

(ω̂ · n̂1)² = c²ω⁻²(a²ω1² + 2abω1ω3 + b²ω3²)
(ω̂ · n̂2)² = c²ω⁻²(a²ω1² − 2abω1ω3 + b²ω3²)
(ω̂ · n̂3)² = c²ω⁻²(b²ω1² + 2abω1ω2 + a²ω2²)
(ω̂ · n̂4)² = c²ω⁻²(b²ω1² − 2abω1ω2 + a²ω2²)    (9.8)
(ω̂ · n̂5)² = c²ω⁻²(b²ω2² + 2abω2ω3 + a²ω3²)
(ω̂ · n̂6)² = c²ω⁻²(b²ω2² − 2abω2ω3 + a²ω3²)
Now it is possible to express the interpolation from Eq. 9.3
Σ_{k=1}^{6} T0 • (Nk − (1/5)I)(ω̂ · n̂k)² =
2c⁴ω⁻² [ u1²((a⁴ + b⁴)ω1² + a²b²ω2² + a²b²ω3²) +
         u2²(a²b²ω1² + (a⁴ + b⁴)ω2² + a²b²ω3²) +
         u3²(a²b²ω1² + a²b²ω2² + (a⁴ + b⁴)ω3²) +    (9.9)
         4a²b²(u1u2ω1ω2 + u1u3ω1ω3 + u2u3ω2ω3) ] −
(2/5)c²ω⁻²(a² + b²)(ω1² + ω2² + ω3²)(u1² + u2² + u3²)

In the simplification of this expression some relations between the constants a, b and c are helpful:

c⁻² = a² + b²
b² − a² = ab        (9.10)
b⁴ + a⁴ = 3a²b²
(b² + a²)² = 5a²b²
Rearranging terms in Eq. 9.9 and utilizing the relations in Eq. 9.10 gives
Σ_{k=1}^{6} T0 • (Nk − (1/5)I)(ω̂ · n̂k)² =
2c⁴ω⁻² [ 2a²b²(ω1²u1² + ω2²u2² + ω3²u3²) +
         4a²b²(u1u2ω1ω2 + u1u3ω1ω3 + u2u3ω2ω3) +
         a²b²(ω1² + ω2² + ω3²)(u1² + u2² + u3²) ] −    (9.11)
(2/5)c²ω⁻²(a² + b²)(ω1² + ω2² + ω3²)(u1² + u2² + u3²) =
(4/5) ω⁻²(ω · ê1)² = (4/5)(ω̂ · ê1)²
which is, except for a constant, the filter proposed for a one-dimensional tensor in the direction of ê1.
By combining the proof above with the spectrum theorem, Eq. 9.4, the total filter shape can be calculated:

F(ω) = (5/4) Fω(ω) Σ_{k=1}^{6} Te • (Nk − (1/5)I)(ω̂ · n̂k)² =
Fω(ω) [ λ1(ω̂ · ê1)² + λ2(ω̂ · ê2)² + λ3(ω̂ · ê3)² ] =    (9.12)
Fω(ω) [ (λ1 − λ2)(ω̂ · ê1)² + (λ2 − λ3)(1 − (ω̂ · ê3)²) + λ3 ]
By normalizing the tensor in the maximum norm, i.e. with the largest eigenvalue, λ1, the adaptive filter behaves as proposed in Proposition 2. □
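As a sanity check, the identity in Eq. 9.11 can be verified numerically. The sketch below assumes a = 1 and b = (1 + √5)/2, which satisfy the relations in Eq. 9.10, and the six filter directions implied by the tensors in Eq. 9.6:

```python
import numpy as np

# Constants satisfying the relations in Eq. 9.10 (a = 1 gives b = golden ratio).
a = 1.0
b = (1 + np.sqrt(5)) / 2
c = 1 / np.sqrt(a**2 + b**2)

# The six filter directions implied by the tensors in Eq. 9.6.
n = c * np.array([[a, 0,  b], [a, 0, -b],
                  [b, a,  0], [b, -a, 0],
                  [0, b,  a], [0, b, -a]])

rng = np.random.default_rng(0)
e1 = rng.normal(size=3); e1 /= np.linalg.norm(e1)    # random orientation
w = rng.normal(size=3); w /= np.linalg.norm(w)       # random frequency direction

T0 = np.outer(e1, e1)                                # rank-one tensor, Eq. 9.5
lhs = sum(np.tensordot(T0, np.outer(nk, nk) - np.eye(3) / 5) * np.dot(w, nk)**2
          for nk in n)
print(np.isclose(lhs, 0.8 * np.dot(w, e1)**2))       # True: (4/5)(w_hat . e1)^2
```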
If the adaptive filter is to be used for enhancement, or reconstruction, of degraded volumes or sequences, an appropriate lowpass filter, FLP(ω), see Eq. 9.17, must be added. The reason is that the low frequencies, including DC, should not be lost in the process, since these are important in such applications. The final filter, Fada(ω), can now be written as

Fada(ω) = FLP(ω) + sqrt(‖Te‖frob) Fω(ω) (5/4) Σ_{k=1}^{6} T̂e • (Nk − (1/5)I)(ω̂ · n̂k)²    (9.13)
where the weighting of whether the adaptive bandpass filter should be a part of the output is controlled by the energy, according to the Frobenius norm. This energy measure can also be mapped through a function, e.g. a sigmoid such as s(E) from Appendix 6.B, in order for the adaptive filter to function in different noise situations.
Note that a complete eigenvalue decomposition of the representation tensor, Te, is not necessary for adapting the filter. When normalizing the tensor in the maximum norm, only the largest eigenvalue, λ1, is needed. It is even possible to approximate the adaptive filter of Eq. 9.3 using only the Frobenius norm and the trace of Te. By normalizing Te with the trace, i.e. with

trace(T) = Σᵢ Tᵢᵢ = T11 + T22 + T33 = Σᵢ λᵢ
none of the eigenvalues need to be calculated. Using the trace for normalizing the tensor is the same as changing the weighting functions, gi, in the adaptive filtering. The functions will in this case be

g1(λ1, λ2, λ3) = (λ1 − λ2)/(λ1 + λ2 + λ3)
g2(λ1, λ2, λ3) = (λ2 − λ3)/(λ1 + λ2 + λ3)    (9.14)
g3(λ1, λ2, λ3) = λ3/(λ1 + λ2 + λ3)
Compared to the weighting functions in Eq. 9.2, the line filter, weighted by g2, will be given smaller weights. However, the difference in appearance of the filtered sequences is hard to discern. The results in Section 9.3 are filtered with an adaptive filter where the representation tensor is normalized in the maximum norm, whereby 'line-like' and 'plane-like' structures are given equal importance. If there is a need to minimize the computational complexity of the adaptation, the normalization by the trace can be chosen.
To illustrate the possibilities and degrees of freedom of the adaptive filter, Eq. 9.13, some specific cases are plotted as isosurfaces in Figure 9.1.
TOP:           λ1 = 0        λ2 = 0        λ3 = 0   i.e. a lowpass filter
MIDDLE LEFT:   λ1 = 0.5      λ2 = 0        λ3 = 0   i.e. a spatial plane filter
MIDDLE RIGHT:  λ1 = 1        λ2 = 0        λ3 = 0   i.e. a spatial plane filter
BOTTOM LEFT:   λ1 = 1/(2√2)  λ2 = 1/(2√2)  λ3 = 0   i.e. a spatial line filter
BOTTOM RIGHT:  λ1 = 1/√2     λ2 = 1/√2     λ3 = 0   i.e. a spatial line filter

Table 9.1: The eigenvalues and the interpretation of the isosurfaces in Figure 9.1.
Figure 9.1: Isosurfaces of adaptive filters in the Fourier domain illustrating the
filter shapes for different relations of the eigenvalues of Te , see Table 9.1. Compare
this figure with Figure 5.2 to get the filter functions.
9.2 Implementation
The implementation of the 3D algorithm falls naturally into the same three parts
as in the 2D case:
1. Estimation of context information.
2. Relaxation of the context information.
3. Adaptive filtering of the original image.
Estimation of Context Information
Considering the estimation of the contextual information described in Section 7.1, the quadrature filters used are defined according to

Qk(ω) = f(ω)(ω̂ · n̂k)²  if ω · n̂k ≥ 0,   and 0 otherwise    (9.15)

where f(ω) is a lognormal function from Eq. 3.16. All parameters used in the filter design, such as the center frequency and the bandwidth of the filter above, are chosen so that the corresponding spatial convolution kernel can be realized, typically as a 9x9x9 filter, with negligible deviation from the specified frequency response. The center frequency is ωi = π/(2√2) and the bandwidth B = 2.
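A minimal sketch of how such a quadrature filter could be sampled in the frequency domain; the exact form of the lognormal function is an assumption here, since Eq. 3.16 is not restated in this chapter:

```python
import numpy as np

def lognormal(w, wi=np.pi / (2 * np.sqrt(2)), B=2.0):
    # Radial lognormal function; assumed standard form of Eq. 3.16.
    out = np.zeros_like(w)
    nz = w > 0
    out[nz] = np.exp(-(4.0 / (B**2 * np.log(2))) * np.log(w[nz] / wi)**2)
    return out

def quadrature_filter(shape, n_k):
    # Frequency response of one quadrature filter according to Eq. 9.15.
    axes = [np.fft.fftfreq(s) * 2 * np.pi for s in shape]
    W = np.stack(np.meshgrid(*axes, indexing='ij'))   # shape (3, X, Y, T)
    wnorm = np.sqrt((W**2).sum(axis=0))
    proj = np.tensordot(n_k, W, axes=1)               # omega . n_k
    wsafe = np.where(wnorm > 0, wnorm, 1.0)
    ang = (proj / wsafe)**2                           # (omega_hat . n_k)^2
    return np.where(proj >= 0, lognormal(wnorm) * ang, 0.0)
```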
Relaxation of Context Information
An important property of the tensor representation described above is that averaging of such tensors is a meaningful operation [59]. It is thus possible to increase
robustness by convolving the tensor field with an averaging filter. The filter used
in the experiments has been a truncated isotropic Gaussian filter, gLP(x), of size 7x7x7 with a standard deviation of 1.3 (in inter-voxel distances). Note that this filtering does not change the sensitivity to high frequency components in the original data; it only means that rapid changes in the representation of local structure are suppressed. This relaxed tensor,

Trel = gLP(x) ∗ Te

where ∗ denotes convolution, is then used for a more robust control of the adaptive filter.
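A minimal sketch of the relaxation step, here using componentwise Gaussian smoothing as an approximation of the truncated 7x7x7 kernel; the use of scipy is an assumption of the sketch, not the original implementation:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def relax_tensor_field(Te, sigma=1.3):
    # T_rel = g_LP * T_e: smooth each of the nine tensor components
    # independently, which is well defined since averaging of these
    # tensors is a meaningful operation.
    Trel = np.empty_like(Te)
    for i in range(3):
        for j in range(3):
            Trel[..., i, j] = gaussian_filter(Te[..., i, j], sigma=sigma,
                                              truncate=2.5)  # about 7 taps/axis
    return Trel
```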
Adaptive Filtering
The filters are defined with the angular function (ω̂ · n̂k)² and the radial frequency function Fω(ω) defined as

Fω(ω) = sin²(πω/2)         for 0 < ω < 1
        1                   for 1 < ω < π − 1
        sin²(π(π − ω)/2)    for π − 1 < ω < π    (9.16)
        0                   for π < ω
Input SNR   Output SNR
∞ dB        21.0 dB
20 dB       21.0 dB
10 dB       18.5 dB
0 dB        9.8 dB

Table 9.2: The result of the 3D enhancement algorithm for the volume in Figure 9.2.
This function is chosen to give a flat frequency response, so that the filter can be used iteratively. The proposed contextual control, Eq. 9.3, will be a combination of Trel and the convolutions with the filters specified by Eq. 9.16. To apply the adaptive filter for noise reduction, the bandpass filter must be combined with a corresponding lowpass filter defined by:

FLP(ω) = cos²(πω/2)  for 0 < ω < 1,   and 0 for 1 < ω    (9.17)
The filters FLP(ω) and Fω(ω)(ω̂ · n̂k)² are applied as 9x9x9 convolution kernels. Denoting the output from the lowpass filter hLP, and the outputs from the directed bandpass filters hk, the final filter output is given as:

hLP + sqrt(‖Te‖frob) Σk ak hk    (9.18)

where ak is

ak = (5/4) T̂e • (Nk − (1/5)I)    (9.19)
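A minimal sketch of this synthesis step; the array shapes and the function name are assumptions of the example:

```python
import numpy as np

def adaptive_filter_output(h_lp, h, Te, N):
    # Synthesis according to Eqs. 9.18-9.19 (sketch).
    # h_lp: lowpass output, shape (X, Y, T); h: six bandpass outputs,
    # shape (6, X, Y, T); Te: tensor field, shape (X, Y, T, 3, 3);
    # N: the six filter orientation tensors, shape (6, 3, 3).
    frob = np.sqrt((Te ** 2).sum(axis=(-2, -1)))           # Frobenius norm
    lam1 = np.linalg.eigvalsh(Te)[..., -1]                 # largest eigenvalue
    That = Te / np.maximum(lam1, 1e-12)[..., None, None]   # max-norm normalized
    out = h_lp.astype(float).copy()
    for k in range(6):
        a_k = 1.25 * np.einsum('...ij,ij->...', That, N[k] - np.eye(3) / 5)
        out += np.sqrt(frob) * a_k * h[k]
    return out
```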
9.3 Results
The algorithm has been tested extensively on both synthetic and real image volumes and sequences. In this section some results are presented.

First, the test volume used in the performance examination in Section 7.1.3 is considered. Sixteen slices of the original volume are shown in Figure 9.2. The original pattern has been corrupted with white additive Gaussian noise to a signal-to-noise ratio, calculated from the standard deviations of the signal and the noise according to Eq. 7.6, of 0 dB, see Figure 9.3. The same slices from the enhanced volume are shown in Figure 9.4. An improvement of about 10 dB in SNR has been achieved. To illustrate the noise performance of the algorithm, an SNR table has been compiled in Table 9.2. (See also the comments about SNR as a quality measure in Section 6.2.)
Figure 9.2: Sixteen slices from the original test pattern.
The second test pattern has also been used in the accuracy estimation and is rendered in Figure 7.3. The volume has been corrupted with white Gaussian noise to an SNR, according to Eq. 7.6, of -4 dB. Sixteen slices from the original, the corrupted, and the filtered volumes are shown in Figures 9.5-9.7. In this case the strength of the spatio-temporal approach is clearly demonstrated.
Figure 9.3: Sixteen slices from the test pattern corrupted with white additive Gaussian noise, SNR = 0 dB.
Figure 9.4: Result from 3D enhancement of the test pattern in Figure 9.3.
Figure 9.5: Sixteen slices from the original test pattern.
The third example is an ultrasonic sequence of a beating heart. (Tomas Gustavsson, CTH, Sweden is acknowledged for this sequence.) Figure 9.8 contains four frames from the original sequence. As can be seen, the sequence is corrupted by a great deal of measurement noise, which is significantly reduced in the enhanced images, Figure 9.9.

The fourth example is a reconstruction of an interlaced video sequence, Figures 9.10-9.11. The estimation of the orientation tensor is obtained according to the scheme described in Appendix 7.A. Two different adaptive filters are then applied, one designed with its center on the existing sampling points in the interlaced sequence and one designed with its center on the new sampling points, see Figure 7.14.
Figure 9.6: Sixteen slices from the test pattern corrupted with white additive Gaussian noise, SNR = -4 dB.
Figure 9.7: Result from 3D enhancement of the test pattern in Figure 9.6.
Figure 9.8: Four frames from an ultrasound image sequence of a beating heart, original.
Figure 9.9: Result from 3D enhancement of the ultrasound sequence, Figure 9.8.
Figure 9.10: A zoomed frame from an interlaced sequence.
Figure 9.11: The reconstruction using 3D adaptive filtering of the interlaced sequence in Figure 9.10.
Part IV
4D — Adaptive Filtering
Chapter 10
Filtering of 4D Signals
Volume sequences, i.e. 4D signals, have been obtainable from special devices for some years, [88, 89]. The progress in image producing devices, such as Magnetic Resonance (MR) cameras and positron cameras, will in the near future lead to a situation where 4D signals are common. Today it is, for example, possible to produce volume sequences in ordinary MR devices. In such situations the human eye has problems interpreting the data, and machine vision will have a great potential to guide the user and interpret the signals. Handling 4D data puts hard demands on the computers used. Today's peak performance is perhaps a bit too slow for everyday use, but the challenge will probably be met by the hardware development.

Extending the orientation algorithm and the adaptive filtering strategy from 2D and 3D signals to 4D is therefore natural.
10.1 Orientation Representation and Estimation
Extending the proposed tensor representation from 2D and 3D to 4D is straightforward, [60]. The representation tensor originating from a one-dimensional input, x = (x1, x2, x3, x4)ᵗ, is given by:

T = x̂x̂ᵗ = (1/‖x‖²) ( x1²   x1x2  x1x3  x1x4
                      x1x2  x2²   x2x3  x2x4
                      x1x3  x2x3  x3²   x3x4    (10.1)
                      x1x4  x2x4  x3x4  x4² )
To estimate the tensor, the number of filters should exceed the number of faces of a 4D cube, i.e. it should be greater than 8. The number of filters is also restricted by the demand that they should be distributed in accordance with the vertices of a regular polytope. This leaves the 24-cell [22] as the only alternative. (Computational complexity makes the 120-cell and the 600-cell unrealistic alternatives.)
The 12 filter directions are given in Cartesian coordinates by

n̂1 = c (1, 1, 0, 0)ᵗ      n̂2 = c (1, −1, 0, 0)ᵗ
n̂3 = c (1, 0, 1, 0)ᵗ      n̂4 = c (1, 0, −1, 0)ᵗ
n̂5 = c (1, 0, 0, 1)ᵗ      n̂6 = c (1, 0, 0, −1)ᵗ
n̂7 = c (0, 1, 1, 0)ᵗ      n̂8 = c (0, 1, −1, 0)ᵗ    (10.2)
n̂9 = c (0, 1, 0, 1)ᵗ      n̂10 = c (0, 1, 0, −1)ᵗ
n̂11 = c (0, 0, 1, 1)ᵗ     n̂12 = c (0, 0, 1, −1)ᵗ

where c = 1/√2.
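The directions can be generated programmatically: for each pair of coordinate axes, one direction with equal signs and one with opposite signs. A small sketch:

```python
import numpy as np
from itertools import combinations

# The twelve directions of Eq. 10.2: for each pair of coordinate axes,
# one direction with equal signs and one with opposite signs.
c = 1 / np.sqrt(2)
directions = np.array([c * (np.eye(4)[i] + s * np.eye(4)[j])
                       for i, j in combinations(range(4), 2)
                       for s in (+1, -1)])           # shape (12, 4), unit length
```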
The filters to use are quadrature filters defined by

Fk(ω) = Fω(ω)(ω̂ · n̂k)²  if ω · n̂k > 0,   and Fk(ω) = 0 otherwise    (10.3)
where Fω (ω) is a bandpass filter.
Implementing the algorithm is as easy in a 4D signal space as in the lower dimensional signal spaces, i.e. the tensor is estimated as, [60, 8]

Te = Σ_{k=1}^{12} qk (Nk − (1/6) I)    (10.4)

where qk is the filter output from filter number k.
When implementing Eq. 10.4, the following scheme is used:

1. Convolve the input data with the twelve complex-valued filters, i.e. perform twenty-four scalar convolutions.

2. Compute the magnitude of each complex-valued filter output by

qk = sqrt(qke² + qko²)

where qke denotes the filter output of the real part of filter k and qko denotes the filter output of the imaginary part of filter k.
3. Compute the tensor Te by Eq. 7.5, i.e.

Te = ( T11  T12  T13  T14
       T12  T22  T23  T24
       T13  T23  T33  T34
       T14  T24  T34  T44 )

where

T11 = c²(q1 + q2 + q3 + q4 + q5 + q6) − S
T22 = c²(q1 + q2 + q7 + q8 + q9 + q10) − S
T33 = c²(q3 + q4 + q7 + q8 + q11 + q12) − S
T44 = c²(q5 + q6 + q9 + q10 + q11 + q12) − S
T12 = c²(q1 − q2)
T13 = c²(q3 − q4)
T14 = c²(q5 − q6)
T23 = c²(q7 − q8)
T24 = c²(q9 − q10)
T34 = c²(q11 − q12)

and where c² = 1/2 and S = (1/6) Σ_{k=1}^{12} qk.
This algorithm has been implemented and tested on synthetic data, [8], but the process is quite painful, due to limitations in the hardware. The tests indicate, though, that the algorithm gives robust and correct results.
10.2 Adaptive Filtering of 4D Signals
In this section the extension of the proposed orientation adaptive algorithm from 2D, Section 5.2, and 3D, Chapter 9, to 4D signal spaces is given. It is not obvious that the scheme holds for 4D signal spaces, since in both 2D and 3D each filter direction has the same scalar product with all the others, i.e. they are equally far from each of the others. However, in 4D there are two different scalar products, i.e. the angular distance from one filter direction to another can take two different values. The interpolation scheme will be shown to behave in the same way in 4D as in the lower dimensional signal spaces, and can be extended in a straightforward manner.
The estimated local structure and orientation from Te, Eq. 7.5, can be used to adapt a filter to the signal. The desired filter is:

F(ω) = Fω(ω) [ g1(λ1, λ2, λ3, λ4)(ω̂ · ê1)² +
               g2(λ1, λ2, λ3, λ4)((ω̂ · ê1)² + (ω̂ · ê2)²) +
               g3(λ1, λ2, λ3, λ4)(1 − (ω̂ · ê4)²) +    (10.5)
               g4(λ1, λ2, λ3, λ4) ]
where Fω(ω) is a bandpass function, with a rather high cut-off frequency or designed using a priori knowledge of the signal and noise spectra, and

g1 is a weighting function for a spatial volume filter in the direction of the eigenvector corresponding to the largest eigenvalue, ê1, of Te.

g2 is a weighting function for a spatial plane filter in the directions of the eigenvectors corresponding to the two largest eigenvalues, ê1 and ê2, of Te.
g3 is a weighting function for a spatial line filter in the direction of the eigenvector
corresponding to the least eigenvalue, ê4 .
g4 is a weighting function for an isotropic filter.
A direct extension of the 2D and 3D schemes is obtained by choosing the weighting functions, gi, as:

g1(λ1, λ2, λ3, λ4) = (λ1 − λ2)/λ1
g2(λ1, λ2, λ3, λ4) = (λ2 − λ3)/λ1
g3(λ1, λ2, λ3, λ4) = (λ3 − λ4)/λ1    (10.6)
g4(λ1, λ2, λ3, λ4) = λ4/λ1
Proposition 3
Assume twelve polar separable filters having the same radial frequency function,
Fω (ω), and angular functions (ω̂ · n̂k )2 . The directions, n̂k , are the same as in the
estimation procedure, Eq. 10.2. Then the desired interpolation is given by:
F(ω) = Fω(ω) Σ_{k=1}^{12} T̂e • (Nk − (1/6) I)(ω̂ · n̂k)²    (10.7)
where T̂e is the normalized estimate and • symbolizes the tensor inner product defined in
Eq. 5.15.
Proof of Proposition 3:
Using the spectrum theorem for the positive semi-definite tensor Te yields

T̂e = (1/λ1) Te = ê1ê1ᵗ + (λ2/λ1) ê2ê2ᵗ + (λ3/λ1) ê3ê3ᵗ + (λ4/λ1) ê4ê4ᵗ    (10.8)
showing that it is enough to investigate the orientation interpolation in Eq. 10.7 for a tensor corresponding to one orientation. An estimation tensor, T0, from a one-dimensional input, ê1 = (u1, u2, u3, u4)ᵗ, is

T0 = ( u1²   u1u2  u1u3  u1u4
       u1u2  u2²   u2u3  u2u4
       u1u3  u2u3  u3²   u3u4    (10.9)
       u1u4  u2u4  u3u4  u4² )

where ‖u‖ = sqrt(u1² + u2² + u3² + u4²) = 1.
The filter orientation tensors, Nk = n̂k n̂kᵗ, are given by (rows separated by semicolons):

N1  = c² ( 1, 1, 0, 0 ; 1, 1, 0, 0 ; 0, 0, 0, 0 ; 0, 0, 0, 0 )
N2  = c² ( 1, −1, 0, 0 ; −1, 1, 0, 0 ; 0, 0, 0, 0 ; 0, 0, 0, 0 )
N3  = c² ( 1, 0, 1, 0 ; 0, 0, 0, 0 ; 1, 0, 1, 0 ; 0, 0, 0, 0 )
N4  = c² ( 1, 0, −1, 0 ; 0, 0, 0, 0 ; −1, 0, 1, 0 ; 0, 0, 0, 0 )
N5  = c² ( 1, 0, 0, 1 ; 0, 0, 0, 0 ; 0, 0, 0, 0 ; 1, 0, 0, 1 )
N6  = c² ( 1, 0, 0, −1 ; 0, 0, 0, 0 ; 0, 0, 0, 0 ; −1, 0, 0, 1 )
N7  = c² ( 0, 0, 0, 0 ; 0, 1, 1, 0 ; 0, 1, 1, 0 ; 0, 0, 0, 0 )    (10.10)
N8  = c² ( 0, 0, 0, 0 ; 0, 1, −1, 0 ; 0, −1, 1, 0 ; 0, 0, 0, 0 )
N9  = c² ( 0, 0, 0, 0 ; 0, 1, 0, 1 ; 0, 0, 0, 0 ; 0, 1, 0, 1 )
N10 = c² ( 0, 0, 0, 0 ; 0, 1, 0, −1 ; 0, 0, 0, 0 ; 0, −1, 0, 1 )
N11 = c² ( 0, 0, 0, 0 ; 0, 0, 0, 0 ; 0, 0, 1, 1 ; 0, 0, 1, 1 )
N12 = c² ( 0, 0, 0, 0 ; 0, 0, 0, 0 ; 0, 0, 1, −1 ; 0, 0, −1, 1 )
Calculating the tensor inner products T0 • (Nk − (1/6)I) gives

T0 • (N1 − (1/6)I)  = c²(u1² + 2u1u2 + u2²) − (1/6) Σ_{k=1}^{4} uk²
T0 • (N2 − (1/6)I)  = c²(u1² − 2u1u2 + u2²) − (1/6) Σ_{k=1}^{4} uk²
T0 • (N3 − (1/6)I)  = c²(u1² + 2u1u3 + u3²) − (1/6) Σ_{k=1}^{4} uk²
T0 • (N4 − (1/6)I)  = c²(u1² − 2u1u3 + u3²) − (1/6) Σ_{k=1}^{4} uk²
T0 • (N5 − (1/6)I)  = c²(u1² + 2u1u4 + u4²) − (1/6) Σ_{k=1}^{4} uk²
T0 • (N6 − (1/6)I)  = c²(u1² − 2u1u4 + u4²) − (1/6) Σ_{k=1}^{4} uk²    (10.11)
T0 • (N7 − (1/6)I)  = c²(u2² + 2u2u3 + u3²) − (1/6) Σ_{k=1}^{4} uk²
T0 • (N8 − (1/6)I)  = c²(u2² − 2u2u3 + u3²) − (1/6) Σ_{k=1}^{4} uk²
T0 • (N9 − (1/6)I)  = c²(u2² + 2u2u4 + u4²) − (1/6) Σ_{k=1}^{4} uk²
T0 • (N10 − (1/6)I) = c²(u2² − 2u2u4 + u4²) − (1/6) Σ_{k=1}^{4} uk²
T0 • (N11 − (1/6)I) = c²(u3² + 2u3u4 + u4²) − (1/6) Σ_{k=1}^{4} uk²
T0 • (N12 − (1/6)I) = c²(u3² − 2u3u4 + u4²) − (1/6) Σ_{k=1}^{4} uk²
The filter outputs, with ω̂ = (ω1, ω2, ω3, ω4)ᵗ/ω, are:

(ω̂ · n̂1)²  = c²ω⁻²(ω1² + 2ω1ω2 + ω2²)
(ω̂ · n̂2)²  = c²ω⁻²(ω1² − 2ω1ω2 + ω2²)
(ω̂ · n̂3)²  = c²ω⁻²(ω1² + 2ω1ω3 + ω3²)
(ω̂ · n̂4)²  = c²ω⁻²(ω1² − 2ω1ω3 + ω3²)
(ω̂ · n̂5)²  = c²ω⁻²(ω1² + 2ω1ω4 + ω4²)
(ω̂ · n̂6)²  = c²ω⁻²(ω1² − 2ω1ω4 + ω4²)    (10.12)
(ω̂ · n̂7)²  = c²ω⁻²(ω2² + 2ω2ω3 + ω3²)
(ω̂ · n̂8)²  = c²ω⁻²(ω2² − 2ω2ω3 + ω3²)
(ω̂ · n̂9)²  = c²ω⁻²(ω2² + 2ω2ω4 + ω4²)
(ω̂ · n̂10)² = c²ω⁻²(ω2² − 2ω2ω4 + ω4²)
(ω̂ · n̂11)² = c²ω⁻²(ω3² + 2ω3ω4 + ω4²)
(ω̂ · n̂12)² = c²ω⁻²(ω3² − 2ω3ω4 + ω4²)
Summing the filter outputs weighted by the tensor inner products now gives

Σ_{k=1}^{12} T0 • (Nk − (1/6)I)(ω̂ · n̂k)² =
2c⁴ω⁻² [ u1²(3ω1² + ω2² + ω3² + ω4²) + u2²(ω1² + 3ω2² + ω3² + ω4²) +
         u3²(ω1² + ω2² + 3ω3² + ω4²) + u4²(ω1² + ω2² + ω3² + 3ω4²) +
         4(u1u2ω1ω2 + u1u3ω1ω3 + u1u4ω1ω4 +    (10.13)
            u2u3ω2ω3 + u2u4ω2ω4 + u3u4ω3ω4) ] −
(1/6)c²ω⁻²(u1² + u2² + u3² + u4²)(6ω1² + 6ω2² + 6ω3² + 6ω4²) =
ω⁻²(u1ω1 + u2ω2 + u3ω3 + u4ω4)² = (ω̂ · ê1)²
proving that the orientation adaptivity algorithm works for 4D signal spaces in the
same way as in 2D and 3D signal spaces.
By applying the spectrum theorem in the same way as in the 2D and 3D proofs, the interpolation from all the filters gives the total filter:

F(ω) = Fω(ω) Σ_{k=1}^{12} Te • (Nk − (1/6)I)(ω̂ · n̂k)² =
Fω(ω) [ λ1(ω̂ · ê1)² + λ2(ω̂ · ê2)² + λ3(ω̂ · ê3)² + λ4(ω̂ · ê4)² ] =    (10.14)
Fω(ω) [ (λ1 − λ2)(ω̂ · ê1)² + (λ2 − λ3)((ω̂ · ê1)² + (ω̂ · ê2)²) +
        (λ3 − λ4)(1 − (ω̂ · ê4)²) + λ4 ]

Normalizing the control tensor, Te, in the maximum norm gives the desired filter from Eq. 10.5 and Eq. 10.6. □
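As with Proposition 2, the identity in Eq. 10.13 is easy to verify numerically; a small sketch using the directions from Eq. 10.2:

```python
import numpy as np
from itertools import combinations

# Numerical check of Eq. 10.13 for a random orientation and frequency direction.
c = 1 / np.sqrt(2)
n = np.array([c * (np.eye(4)[i] + s * np.eye(4)[j])
              for i, j in combinations(range(4), 2) for s in (+1, -1)])

rng = np.random.default_rng(1)
e1 = rng.normal(size=4); e1 /= np.linalg.norm(e1)
w = rng.normal(size=4); w /= np.linalg.norm(w)

T0 = np.outer(e1, e1)
lhs = sum(np.tensordot(T0, np.outer(nk, nk) - np.eye(4) / 6) * np.dot(w, nk)**2
          for nk in n)
print(np.isclose(lhs, np.dot(w, e1)**2))   # True: the sum equals (w_hat . e1)^2
```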
It would have been truly interesting to test this strategy on real data, but that
will be a question for the future.
Bibliography
[1] J. F. Abramatic and L. M. Silverman. Non-stationary linear restoration of
noisy images. In Proceedings IEEE Conf. on Decision and Control, pages
92–99, Fort Lauderdale, FL., 1979.
[2] E. H. Adelson and J. R. Bergen. Spatiotemporal energy models for the perception of motion. Jour. of the Opt. Soc. of America, 2:284–299, 1985.
[3] E. H. Adelson and J. A. Movshon. Phenomenal coherence of moving gratings.
Nature, 300:523–525, 1982.
[4] P. Anandan. A computational framework and an algorithm for the measurement
of visual motion. Int. J. of Computer Vision, 2:283–310, 1989.
[5] M. Andersson and H. Knutsson. Orientation estimation in ambiguous neighbourhoods. In Proceedings of SCIA91, Aalborg, Denmark, 1991.
[6] M. T. Andersson. Controllable Multidimensional Filters in Low Level Computer Vision. PhD thesis, Linköping University, Sweden, S–581 83 Linköping,
Sweden, September 1992. Dissertation No 282, ISBN 91–7870–981–4.
[7] M. A. Arbib and A. Hanson, editors. Vision, Brain and Cooperative Computation, pages 187–207. MIT Press, 1987.
[8] H. Bårman. Hierarchical Curvature Estimation in Computer Vision. PhD
thesis, Linköping University, Sweden, S–581 83 Linköping, Sweden, September
1991. Dissertation No 253, ISBN 91–7870–797–8.
[9] H. Bårman, L. Haglund, H. Knutsson, and G. H. Granlund. Estimation of
velocity, acceleration and disparity in time sequences. In Proceedings of IEEE
Workshop on Visual Motion, pages 44–51, Princeton, NJ, USA, October 1991.
[10] J.L. Barron, D. J. Fleet, and S. S. Beauchemin. Performance of optical flow
techniques. Technical Report RPL–TR–9107, Robotics and Perception Laboratory, Queen’s University, Canada, 1992.
[11] J.L. Barron, D. J. Fleet, S. S. Beauchemin, and T. A. Burkitt. Performance
of optical flow techniques. In Proc. of the CVPR, pages 236–242, Champaign,
Illinois, USA, 1992. IEEE.
[12] F. Bergholm. Edge focusing. IEEE Trans. on Pattern Analysis and Machine
Intelligence, 9(6):726–741, 1987.
[13] M. Bertero, T. Poggio, and V. Torre. Ill-posed problems in early vision. Proc.
of the IEEE, 76(8):869–889, 1988.
[14] J. Bigün. Local Symmetry Features in Image Processing. PhD thesis, Linköping
University, Sweden, 1988. Dissertation No 179, ISBN 91–7870–334–4.
[15] J. Bigün, G. H. Granlund, and J. Wiklund. Multidimensional orientation:
texture analysis and optical flow. In Proceedings of the SSAB Symposium on
Image Analysis. SSAB, March 1991.
[16] A. C. Bovik, M. Clark, and W. S. Geisler. Multichannel texture analysis
using localized spatial filters. IEEE Trans.s on Pattern Analysis and Machine
Intelligence, 12(1):55–73, January 1990.
[17] R. Bracewell. The Fourier Transform and its Applications. McGraw-Hill, 2nd
edition, 1986.
[18] P. J. Burt and E. H. Adelson. The Laplacian pyramid as a compact image
code. IEEE Trans. Comm., 31:532–540, 1983.
[19] A. Calway. The Multiresolution Fourier Transform. PhD thesis, University of
Warwick, 1989.
[20] F. W. Campbell and J. G. Robson. Application of Fourier analysis to the
visibility of gratings. J. Physiol., 197:551–566, 1968.
[21] S. Clippingdale and R. Wilson. Quad-tree image estimation: a new image
model and its application to minimum mean square error image restoration.
In Proc. 5th Scand. Conf. on Image Anal., 1987.
[22] H. S. M. Coxeter. Introduction to Geometry. John Wiley & Sons, Inc., 1961.
[23] J. Crowley and A. Parker. A representation for shape based on peaks and
ridges in difference of low-pass transform. IEEE Trans. on PAMI, (6):156–
169, 1989.
[24] P.E. Danielsson and O. Seger. Rotation invariance in gradient and higher
derivative detectors. Computer Vision, Graphics, and Image Processing, 49(2),
February 1990.
[25] I. Daubechies. The wavelet transform, time-frequency localization and signal
analysis. IEEE Trans. on Information Theory, 36(5):961–1005, September
1990.
[26] I. Daubechies, A. Grossmann, and Y. Meyer. Painless nonorthogonal expansions. J. Math. Phys, 26(5):1271 – 1283, May 1986.
[27] Hugh Davson, editor. The Eye, volume 2A. Academic Press, New York, 2nd
edition, 1976.
[28] R. L. DeValois, D. G. Albrecht, and L. G. Thorell. Spatial frequency selectivity
of cells in macaque visual cortex. Vision Research, 22:549–559, 1982.
[29] S. M. Pizer et al. Hierarchical shape description via the multiresolution symmetric axis transform. IEEE Trans. on PAMI, 31:156–177, 1985.
[30] F. L. Van Nes and M. A. Bouman. Variation of contrast with luminance. J.
Opt. Soc. of America, 57:401–406, 1967.
[31] Marie Farge. Wavelet transforms and their applications to turbulence. Annu.
Rev. Fluid Mech., 24:395–457, 1992.
[32] D. J. Fleet and A. D. Jepson. Stability of phase information. In Proceedings
of IEEE Workshop on Visual Motion, pages 52–60, Princeton, USA, October
1991. IEEE, IEEE Society Press.
[33] David J. Fleet. Measurement of image velocity. Kluwer Academic Publishers,
1992. ISBN 0–7923–9198–5.
[34] David J. Fleet and Allan D. Jepson. Computation of Component Image Velocity from Local Phase Information. Int. Journal of Computer Vision, 5(1):77–
104, 1990.
[35] David J. Fleet, Allan D. Jepson, and Michael R. M. Jenkin. Phase-based
disparity measurement. CVGIP Image Understanding, 53(2):198–210, March
1991.
[36] W. T. Freeman and E. H. Adelson. The design and use of steerable filters for
image analysis, enhancement, and wavelet representation. Technical report,
Vision and Modeling Group, Media Lab., MIT, Cambridge, September 1990.
[37] D. Gabor. Theory of communication. Proc. Inst. Elec. Eng., 93(26):429–441,
1946.
[38] M. A. Georgeson and M. G. Harris. Spatial selectivity of contrast adaptation:
Models and data. Vision Research, 24:729–741, 1984.
[39] G. H. Granlund. In search of a general picture processing operator. Computer
Graphics and Image Processing, 8(2):155–178, 1978.
[40] G. H. Granlund and H. Knutsson. Contrast of structured and homogenous
representations. In O. J. Braddick and A. C. Sleigh, editors, Physical and
Biological Processing of Images, pages 282–303. Springer Verlag, Berlin, 1983.
[41] R. Guissin and S. Ullman. Direct Computation of the Focus of Expansion from
Velocity Field Measurements. In Proceedings of IEEE Workshop on Visual
Motion, pages 146–155, Princeton, USA, October 1991. IEEE, IEEE Society
Press.
[42] T. Gustavsson and S. Nivall. Adaptive spatio-temporal filtering of ultrasound
image sequences. In Proceedings of the SSAB Conf. on Image Analysis, pages
104–108, Gothenburg, Sweden, March 1989. SSAB.
[43] P. Hackman. Boken med kossan på, Läroplamflett i Linjär Algebra. LiTH,
Linköpings Universitet, 1983.
[44] L. Haglund, H. Knutsson, and G. H. Granlund. On phase representation of
image information. In The 6th Scandinavian Conference on Image Analysis,
pages 1082–1089, Oulu, Finland, June 1989.
[45] L. Haglund, H. Knutsson, and G. H. Granlund. Scale analysis using phase
representation. In The 6th Scandinavian Conference on Image Analysis, pages
1118–1125, Oulu, Finland, June 1989.
[46] L. Haglund, H. Knutsson, and G.H. Granlund. On scale and orientation adaptive filtering. In Proceedings of the SSAB Symposium on Image Analysis,
Uppsala, March 1992. SSAB.
[47] O. Hansen. On the use of Local Symmetries in Image Analysis and Computer
Vision. PhD thesis, Aalborg University, March 1992.
[48] D. J. Heeger. Optical flow from spatiotemporal filters. In First Int. Conf. on
Computer Vision, pages 181–190, London, June 1987.
[49] D. J. Heeger. Optical Flow Using Spatio-Temporal Filters. Int. Journal of
Computer Vision, 2(1):279–302, 1988.
[50] D. J. Heeger and A. D. Jepson. Subspace methods for recovering rigid motion
I: Algorithm and implementation. Int. Journal of Computer Vision, 7(2):95–
117, January 1992.
[51] W. Hoff and N. Ahuja. Depth from stereo. In Proc. of the fourth Scandinavian
Conf. on Image Analysis, 1985.
[52] B. K. P. Horn. Robot vision. The MIT Press, 1986.
[53] T. S. Huang, editor. Image Sequence Analysis. Information Sciences. Springer
Verlag, Berlin, 1981.
[54] T. S. Huang, editor. Image Sequence Processing and Dynamic Scene Analysis,
Berlin, June 1982. NATO Advanced Study Institute, Springer Verlag.
[55] D. H. Hubel and T. N. Wiesel. Receptive fields of single neurones in the cat’s
striate cortex. J. Physiol., 148:574–591, 1959.
[56] IEEE Trans. on Information Theory, March 1992. Special Issue on Wavelet
Transforms and Multiresolution Signal Analysis.
[57] Bernd Jähne. Motion determination in space-time images. In O. Faugeras,
editor, Computer Vision-ECCV90, pages 161–173. Springer-Verlag, 1990.
[58] P. E. King-Smith and J. J. Kulikowski. The detection and recognition of two
lines. Vision Research, 21:235–250, 1981.
[59] H. Knutsson. Representing local structure using tensors. In The 6th Scandinavian Conference on Image Analysis, pages 244–251, Oulu, Finland, June 1989.
Report LiTH–ISY–I–1019, Computer Vision Laboratory, Linköping University, Sweden, 1989.
[60] H. Knutsson, H. Bårman, and L. Haglund. Robust orientation estimation in
2d, 3d and 4d using tensors. In Proceedings of International Conference on
Automation, Robotics and Computer Vision, September 1992.
[61] H. Knutsson and G. H. Granlund. Texture analysis using two-dimensional
quadrature filters. In IEEE Computer Society Workshop on Computer Architecture for Pattern Analysis and Image Database Management - CAPAIDM,
Pasadena, October 1983.
[62] H. Knutsson and G. H. Granlund. Apparatus for determining the degree
of variation of a feature in a region of an image that is divided into discrete
picture elements. Swedish patent 8502569-0 (US-Patent 4.747.151, 1988), 1986.
[63] H. Knutsson, L. Haglund, H. Bårman, and G. H. Granlund. A framework for
anisotropic adaptive filtering and analysis of image sequences and volumes. In
Proceedings ICASSP-92, San Francisco, CA, USA, March 1992. IEEE.
[64] H. Knutsson, L. Haglund, and G. Granlund. Adaptive filtering of image sequences and volumes. In Proceedings of International Conference on Automation, Robotics and Computer Vision, September 1992.
[65] H. Knutsson, L. Haglund, and G. H. Granlund. Tensor field controlled image sequence enhancement. In Proceedings of the SSAB Symposium on Image Analysis, pages 163–167, Linköping, Sweden, March 1990. SSAB. Report
LiTH–ISY–I–1087, Linköping University, Sweden, 1990.
[66] H. Knutsson, R. Wilson, and G. H. Granlund. Anisotropic non-stationary
image estimation and its applications — part I: Restoration of noisy images.
IEEE Trans on Communications, COM–31(3):388–397, March 1983. Report
LiTH–ISY–I–0462, Linköping University, Sweden, 1981.
[67] Hans Knutsson. Filtering and Reconstruction in Image Processing. PhD thesis,
Linköping University, Sweden, 1982. Diss. No. 88.
[68] Hans Knutsson. Producing a continuous and distance preserving 5-D vector
representation of 3-D orientation. In IEEE Computer Society Workshop on
Computer Architecture for Pattern Analysis and Image Database Management
- CAPAIDM, pages 175–182, Miami Beach, Florida, November 1985. IEEE.
Report LiTH–ISY–I–0843, Linköping University, Sweden, 1986.
[69] J. J. Koenderink and A. J. van Doorn. Dynamic shape. Biological Cybernetics,
1986.
[70] J.J. Koenderink and A.J. van Doorn. The structure of images. Biological
Cybernetics, 50:363–370, 1984.
[71] J. J. Kulikowski and P. E. King-Smith. Spatial arrangement of line, edge
and grating detectors revealed by subthreshold summation. Vision Research,
13:1455–1478, 1973.
[72] K. Langley, T.J. Atherton, R.G. Wilson, and M.H.E Larcombe. Vertical and
horizontal disparities from phase. In O. Faugeras, editor, Computer VisionECCV90, pages 315–325. Springer-Verlag, April 1990.
[73] Tony Lindeberg. Discrete Scale-Space Theory and the Scale-Space Primal
Sketch. PhD thesis, Royal Institute of Technology, 1991.
[74] B. Lucas and T. Kanade. An Iterative Image Registration Technique with
Applications to Stereo Vision. In Proc. Darpa IU Workshop, pages 121–130,
1981.
[75] S. G. Mallat. A theory for multiresolution signal decomposition: the wavelet
representation. IEEE Trans. PaMI, 11:674–693, July 1989.
[76] S. Marcelja. Mathematical description of the responses of simple cortical cells.
Journal of Opt. American Society, 70, 1297–1300 1980.
[77] K. V. Mardia. Statistics of Directional Data. Academic Press, 1972.
[78] D. Marr, T. Poggio, and S. Ullman. Bandpass channels, zero-crossings, and
early visual information processing. Journal of Opt. American Society, 69,
914–916 1977.
[79] H. Mostafavi and D. J. Sakrison. Structure and properties of a single channel
in the human visual system. Vision Research, 16:957–968, 1976.
[80] H. H. Nagel. Constraints for the estimation of displacement vector fields from
an image sequence. In Proceedings of Int. Joint. Conf. on Artificial Intelligence
in West Germany, 1983.
[81] S. I. Olsen. Concurrent solution of the stereo correspondence problem and the
surface reconstruction problem. In Proc. of the Eighth Int. Conf. on Pattern
Recognition, Paris, pages 1038–1040, 1986.
[82] D. A. Pollen and S. F. Ronner. Spatial computation performed by simple and
complex cells in the visual cortex of the cat. Vision Research, 22:101–118,
1982.
[83] D. A. Pollen and S. F. Ronner. Visual cortical neurons as localized spatial
frequency filters. IEEE Trans. on Syst. Man Cybern., 13(5):907–915, 1983.
[84] M. Porat and Y. Y. Zeevi. The Generalized Gabor Scheme of Image Representation in Biological and Machine Vision. IEEE Trans. on PAMI, 10(4):452–
467, 1988.
[85] W. K. Pratt. Generalized Wiener filtering computation techniques. IEEE
Trans. Comput., C-21:297–303, July 1972.
[86] W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling. Numerical
Recipes. Cambridge University Press, 1986.
[87] O. Rioul and M. Vetterli. Wavelets and signal processing. IEEE Signal Processing Magazine, pages 14–38, October 1991.
[88] Richard A. Robb. The dynamic spatial reconstructor: An x-ray videofluoroscopic ct scanner for dynamic volume imaging of moving organs. IEEE
Trans.s on Medical Imaging, MI-1(1):22–33, July 1982.
[89] Richard A. Robb, Eric A. Hoffman, Lawrence J. Sinak, Lowell D. Harris, and
Erik L. Ritman. High-speed three-dimensional x-ray computed tomography:
The dynamic spatial reconstructor. Proceedings of the IEEE, 71(3):308–319,
March 1983.
[90] E. Simoncelli and E. H. Adelson. Subband transforms. In J. Woods, editor,
Subband Image Coding, chapter 4, pages 143–192. Kluwer, 1991.
[91] S. Uras, F. Girosi, A. Verri, and V. Torre. A computational approach to
motion perception. Biological Cybernetics, pages 79–97, 1988.
[92] A. B. Watson and Jr. A. J. Ahumada. Model of human visual-motion sensing.
Jour. of the opt. soc. of America A, 1(2):322–342, 1985.
[93] C-J Westelius. Preattentive gaze control for robot vision, June 1992. Thesis
No. 322, ISBN 91–7870–961–X.
[94] C-F Westin. Feature extraction based on a tensor image description, September 1991. Thesis No. 288, ISBN 91–7870–815–X.
[95] J. Wiklund, C-J Westelius, and H. Knutsson. Hierarchical phase based disparity estimation. Report LiTH–ISY–I–1327, Computer Vision Laboratory,
Linköping University, Sweden, 1992.
[96] R. Wilson, A. D. Calway, and E. R. S. Pearson. A Generalized Wavelet Transform for Fourier Analysis: The Multiresolution Fourier Transform and Its Application to Image and Audio Signal Analysis. IEEE Trans. on Information
Theory, 38(2):674–690, 1992.
[97] R. Wilson and G. H. Granlund. The uncertainty principle in image processing. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI–
6(6), November 1984. Report LiTH–ISY–I–0576, Computer Vision Laboratory,
Linköping University, Sweden, 1983.
[98] R. Wilson and H. Knutsson. A multiresolution stereopsis algorithm based on
the Gabor representation. In 3rd International Conference on Image Processing and Its Applications, pages 19–22, Warwick, Great Britain, July 1989. IEE.
ISBN 0 85296382 3 ISSN 0537–9989.
[99] R. Wilson and M. Spann. Image segmentation and uncertainty. Research
Studies Press, 1988.
[100] A. Witkin. Scale-space filtering. In 8th Int. Joint Conf. Artificial Intelligence,
pages 1019–1022, Karlsruhe, 1983.