Proceedings of the 13th Danish Conference on
Pattern Recognition and Image Analysis
(Mønstergenkendelse og Billedanalyse)
Ed. Søren I. Olsen
Technical Report no. 2004/10
ISSN: 0107-8283
DIKU
University of Copenhagen, Universitetsparken 1
DK-2100 Copenhagen, Denmark
TABLE OF CONTENTS:
Color Signal Processing. Reiner Lenz, Linköping University .... 3
Image Relighting: Getting the Sun to Set in an Image Taken at Noon. Claus B. Madsen, Rune Laursen .... 13
Using Mixtures of Gaussians to Compare Approaches to Signal Separation. Kaare B. Petersen .... 21
Stochastic differential equations in image warping. Bo Markussen .... 31
Probabilistic Model-based Background Subtraction. J. Anderson, T. Prehn, V. Krüger, A. Elgammal .... 38
Some Transitions of Extrema-Based Multi-Scale Singularity Trees. K. Somchaipeng, J. Sporring, S. Kreiborg, P. Johansen .... 48
A Test Statistic in the Complex Wishart Distribution and its Application to Change Detection in Polarimetric SAR Data. Knut Conradsen, Allan A. Nielsen, Jesper Schou, Henning Skriver .... 56
Using the LZW to Detect Primitives. Lars Reng .... 72
Medical image sequence coding. Mo Wu, Søren Forchhammer .... 77
Integrity Improvement in Classically Deformable Solids. Micky K. Christensen, Anders Fleron .... 87
The Thin Shell Tetrahedral Mesh. Kenny Erleben, Henrik Dohlmann .... 94
Talking Faces - a State Space Approach. Tue Lehn-Schiøler .... 103
Live Interpretation of Conductors' Beat Patterns. Declan Murphy .... 111
Testing for difference between two groups of functional neuroimaging experiments. Finn Å. Nielsen, Andrew C.N. Chen, Lars K. Hansen .... 121
An image based system to automatically and objectively score the degree of redness and scaling in psoriasis lesions. David Delgado, Bjarne Ersbøll, Jens M. Carstensen .... 130
Survey and Assessment of Advanced Feature Extraction Techniques and Tools for OE Applications. Mikael K. Sørensen, Michael S. Rasmussen, Henning Skriver, Peter Johansen, Arthur Pece, Jesper H. Thygesen .... 138
Automatic change detection for validation of digital map databases. Brian P. Olsen, Thomas Knudsen .... 148
Spatial-temporal Disambiguation of Multi-modal Descriptors. Norbert Krüger, Florentin Wörgötter .... 154
Color Signal Processing
Reiner Lenz
Department of Science and Technology
Linköping University
SE-60174 Norrköping
Sweden
email:[email protected]
Abstract
In this paper we describe some of our recent investigations in the field of color signal processing. We will describe the application of group theoretical methods and illustrate them with some results from low-level filter design, statistical properties of color signal spaces and the construction of invariants.
1 Introduction
Everybody knows what color is. But as a scientific concept color is not at all well-defined. This is perhaps natural since "color" has been of interest in such different scientific disciplines as art, physics, chemistry, biology, mathematics, psychology, physiology and technology, to name only a few. We are well aware of this, but here we choose to restrict ourselves to a very narrow definition of what color is. In the first part we identify color simply with the measurement vectors that are produced by an (RGB or multi-channel) camera. In the rest of the paper we define color (or rather the color signal) as a function of wavelength that has non-negative function values. This signal processing approach completely ignores all perceptual aspects of the problem (color scientists would thus probably argue that this paper is not at all about color, since in their view color is in the brain of the observer).
Group theory, on the other hand, originates partly in the study of symmetries. Roughly, one can say that symmetry is a property that is preserved under the application of a set of rules. A simple example: a circle is a geometric figure that is preserved under all 2-D rotations around its center. Or one could also say that the rotations are the transformations that preserve circles. Group theoretical methods have had an enormous impact on theoretical physics and mathematics, and the purpose of this paper is to show how they can be used to solve problems involving color. We will not go into technical details but try to give a flavor of the wide range of color related problems that can be investigated with group theoretical tools.
We start in the first section with a description of how the (geometrical) symmetries of the sampling grid and the (permutational) symmetries of the color channels can be used to design low-level filters (corresponding to receptive fields in human color vision). Then we show why spaces of color signals have a conical structure, and we illustrate this with an investigation of databases of daylight spectra. The final topic is the application of the Lie-theory of differential equations to color problems. We will show that transformation groups define systems of differential equations in a natural way. Solving these differential equations can, for example, be used to construct invariants. Using the Lie-theory of differential equations gives a number of advantages. One of them is the availability of symbolic packages like Maple that can solve these problems completely automatically.
2 PCA without data - Harmonic Analysis of RGB images
Group theoretically defined transforms are important tools in many low-level digital signal processing applications. The most widely used examples are the DFT and the FFT, but wavelet transforms can also be understood in terms of group theory. The example we describe here is the transform based on the dihedral groups [14]. The case of greatest use in applications concerns RGB images on a square grid. Other important cases are RGB vectors on hexagonal grids and four-channel sensors on square or hexagonal grids. Potential applications are digital signal processors in digital cameras with varying sampling and spectral resolution properties.
Consider a square sampling grid of the plane. The transformations that map this grid into itself form a group, the dihedral group D(4). It consists of all rotations by 90k degrees (k = 0, 1, 2, 3) and all reflections around the symmetry axes of a square. It has eight elements. The dihedral group D(6) consists of all transformations that preserve the hexagonal sampling grid. The group of all permutations of three elements is S(3) and consists of six elements. The direct product G = D(4) × S(3) consists of all pairs g = (d, π) where d is an element in D(4) and π ∈ S(3). This product forms a group that has 48 elements. The group elements g operate on RGB vectors defined on square grids by applying the dihedral part d to the grid and the permutation part π to the spectral bands. A typical operation would be a 90-degree rotation combined with a flipping of the R and B channels. Since these transformations are linear we can describe them (given a basis) by a matrix. We will also use g to denote this matrix. The same symbol will thus be used for the group element, the transformation and the matrix. What is meant should be clear from the context.
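As a concrete check of the group structure, the 48 elements of G = D(4) × S(3) can be enumerated directly (a minimal sketch; the matrix encoding of the dihedral elements and the tuple encoding of the channel permutations are our own illustrative choices):

```python
import numpy as np
from itertools import permutations

# D(4): rotations by 90k degrees plus reflections, encoded as 2x2 matrices.
R = np.array([[0, -1], [1, 0]])              # 90-degree rotation
F = np.array([[1, 0], [0, -1]])              # a reflection
rotations = [np.linalg.matrix_power(R, k) for k in range(4)]
D4 = rotations + [r @ F for r in rotations]  # eight elements in total

# S(3): the six permutations of the three color channels.
S3 = list(permutations(range(3)))

# Direct product G = D(4) x S(3): pairs (d, pi), 8 * 6 = 48 elements.
G = [(d, pi) for d in D4 for pi in S3]
print(len(G))  # 48
```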
The second concept of interest in our context is Principal Component Analysis as Least Mean Squared Error (LMSE) approximation. We start with a stochastic process of signal vectors s of dimension N. In our case this could be RGB vectors on an n × n window collected in an N = 3n^2-dimensional signal vector s. Now change the basis in the signal space and project all signal vectors s onto the first M basis vectors in the new system (with M << N), thus creating a new signal ŝ. Since the information on s contained in the last N − M components is lost by the projection, this results in an error for each signal. Selecting the new basis such that the mean approximation error is minimal (in the L2-sense) is known as Principal Component Analysis (PCA). It can be shown that the basis to choose consists of the eigenvectors of the correlation matrix C = E(ss^T) of the original stochastic process belonging to the M largest eigenvalues (E(·) denotes the expectation operator).
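The two PCA steps can be sketched as follows; the synthetic data and all variable names below are illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, n_samples = 12, 3, 5000

# Synthetic stochastic process of signal vectors s (one per row).
S = rng.standard_normal((n_samples, N)) @ rng.standard_normal((N, N))

# (1) Correlation matrix C = E(ss^T), estimated by an average over samples.
C = S.T @ S / n_samples

# (2) Eigenvectors of C; keep the M belonging to the largest eigenvalues.
w, V = np.linalg.eigh(C)          # eigenvalues in ascending order
basis = V[:, ::-1][:, :M]         # top-M principal directions

# Projecting onto the first M basis vectors gives the LMSE rank-M approximation.
S_hat = (S @ basis) @ basis.T
mse = float(np.mean((S - S_hat) ** 2))
print(mse)
```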
PCA thus requires, in general, two operations: (1) compute the correlation matrix C of the process and (2) compute the eigenvectors of C. In the following we will show under what conditions we can make general statements about the eigenvectors without actually computing the correlation matrix and its eigenvectors.
We start by selecting a subset of points on the given grid that is closed under the dihedral transformations. At each point we have a measurement vector. Given a group element g = (d, π) we apply the geometric transformation d to the grid points and the permutation π to the vector components. The resulting operation is described by a permutation matrix g that has exactly one element of value one in each column and row. These permutation matrices have the property that the transpose equals the inverse: g^T = g^{-1}. Now assume that each such transformed signal gs has the same probability as the original signal s. Then the correlation matrix has the property:

C = E(ss^T) = E((gs)(gs)^T) = g E(ss^T) g^T = g C g^T = g E(ss^T) g^{-1} = g C g^{-1}

where the equality E(ss^T) = E((gs)(gs)^T) holds because the transformation g merely reorders the original set of signals.
As a result we find that each group element g delivers one constraint C = E(ss^T) = g E(ss^T) g^{-1} = g C g^{-1} for the correlation matrix. Such a matrix C is called a G-symmetric matrix. The properties of G-symmetric matrices are well-known from the theory of group representations, and the main results are the following:
• There is one matrix M that block-diagonalizes all G-symmetric matrices C
• The size of the blocks in the diagonal depends only on the group
• The matrix M can be generated automatically
The detailed description of these facts can be found in [14] and the references mentioned there.
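A small numerical check of the G-symmetry constraint, here with D(4) acting on the four corner points of a square and a single channel (a simplified stand-in for the full D(4) × S(3) action; the construction is ours):

```python
import numpy as np

# Corner points of a square: a point set closed under all D(4) operations.
pts = [(1, 1), (-1, 1), (-1, -1), (1, -1)]

# Build the eight D(4) elements as 4x4 permutation matrices on these points.
R = np.array([[0, -1], [1, 0]])
F = np.array([[1, 0], [0, -1]])
mats = [np.linalg.matrix_power(R, k) @ f
        for k in range(4) for f in (np.eye(2, dtype=int), F)]
perms = []
for m in mats:
    P = np.zeros((4, 4))
    for j, p in enumerate(pts):
        q = tuple(int(v) for v in m @ np.array(p))
        P[pts.index(q), j] = 1
    perms.append(P)

# Symmetrizing a correlation matrix over the group, C_sym = (1/|G|) sum_g g C g^T,
# yields a G-symmetric matrix: g C_sym g^T = C_sym for every group element g.
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
C = A @ A.T
C_sym = sum(P @ C @ P.T for P in perms) / len(perms)
assert all(np.allclose(P @ C_sym @ P.T, C_sym) for P in perms)
```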
2.1 Experimental results
The implementation of the block-diagonalization is similar to the case described in [14] for the dihedral groups. The similarity of the correlation matrix C and its block-diagonalization M^T C M depends of course on the degree to which the stochastic process really is G-symmetric. To test the usefulness of the method we did the following experiment: from a large image database containing more than 600 000 images we selected 10 000 random images and, in each image, two random patches of size 16 by 16 pixels. The signal vector thus had dimension 16 × 16 × 3 = 768. Applying the general theory we find that in the ideal case the transformed correlation matrix M^T C M has the block structure listed in Table 1. Note that this result holds for all correlation matrices of this size with this type of symmetry.

Table 1. Block structure of transformed correlation matrix

Representation No.   Dimension   Block size
 1                   1            36
 2                   1             0
 3                   2            72
 4                   1            28
 5                   1             0
 6                   2            56
 7                   1            36
 8                   1             0
 9                   2            72
10                   1            28
11                   1             0
12                   2            56
13                   2           128
14                   2             0
15                   4           256

Table 2. Trace(C) compared with accumulated sum of eigenvalues

Eigenvector Number   Original    Block-Diagonal Approximation
 1                   0.915898    0.914900
 2                   0.945646    0.945642
 3                   0.954611    0.954600
 4                   0.963074    0.963057
 5                   0.967766    0.967745
 6                   0.970658    0.970605
 7                   0.973146    0.973117
 8                   0.975438    0.975403
 9                   0.976639    0.976601
10                   0.977770    0.977733
Figure 2.1 shows the original correlation matrix, the absolute value of the block-diagonal correlation matrix M^T C M, and the absolute value of M^T C M with the upper left corner masked out to visualize the remaining entries.

[Figure 2.1: (a) Original correlation matrix C; (b) Block-diagonal matrix M^T C M; (c) Masked block-diagonal matrix M^T C M]
One way to evaluate the loss of information introduced by the block-diagonalization is to compare the eigenvalues of the original principal components with those of the block-diagonal approximation. In Table 2 the ratios of the accumulated sums of the eigenvalues to the trace are listed for the original matrix and the block-diagonal one. Further inspection shows that each of the blocks has a special transformation property under the elements of the transformation group, leading to filter functions of a certain type. Some filters are thus invariant under all dihedral operations (spatially homogeneous) while others are related to edge-type filters, etc.
Finally we want to point out that the block-diagonal filters allow more efficient implementations due to their symmetry properties, and that, given the domain, they can be derived automatically using symbolic mathematical software like Maple.
3 PCA and the geometry of the space of color signals
One attempt to understand the properties of biological systems (and to build technological systems) relies on the hypothesis that successful systems are adapted to the structure of the space of signals they are analyzing (see [1, 2, 4, 18] for a few examples and [19] for a collection of articles on natural stimulus statistics). If this is true, then it is of interest to investigate properties of signal spaces that are often analyzed by natural or artificial systems.
Here we start with the observation that in many interesting cases the signals of interest can only assume non-negative values. A typical example is illuminant spectra. Here the signals s are functions of the wavelength variable λ, and s(λ) is the number of photons of wavelength λ. By definition s(λ) ≥ 0 for all λ. Another example from multi-channel color processing is reflectance spectra r, where r(λ) describes the probability that a photon of wavelength λ will be reflected from a point. When an illumination spectrum s hits an object point with reflection spectrum r then (in the simplest model) the spectrum reflected from that scene point is given by c = s · r, and c is often called a color signal. The color signal, too, is by definition a non-negative function. The main result we will describe here is the proof that all components of the first eigenfunction of a stochastic process of spectra have the same sign. It can therefore be chosen to have only non-negative entries. This has been observed in many empirical investigations of databases of spectra, but to our knowledge it has never been pointed out that it follows from the Perron-Frobenius theory of non-negative matrices and its generalization, the Krein-Rutman theorem.
We will then illustrate a few consequences of this fact for databases of daylight spectra. We will show that these spaces of daylight spectra have a conical geometry and that the group of Lorentz transformations provides a rich toolbox to investigate problems related to changes of daylight properties.
3.1 Perron-Frobenius and Krein-Rutman Theory
For finite dimensional signal vectors the non-negativity of the first eigenvector follows easily from the Perron-Frobenius theory of non-negative matrices. We will therefore give a brief overview and refer the interested reader to Chapter 13 in [6] for more details.

Definition 1
1. A matrix C is non-negative if all elements are non-negative.
2. A matrix C is positive if all elements are positive.
3. A matrix C is reducible if there is a permutation matrix P such that

P^{-1} C P = P^T C P = [ C_1    0   ]
                       [ C_21   C_2 ]    (1)

A matrix that is not reducible is called irreducible.

Note the difference between non-negative (positive) matrices and non-negative (positive) definite matrices. The first are defined by properties of the elements of the matrix whereas the latter are defined via bilinear products! Note also that in the definition of reducibility we require the transformation matrix P to be a permutation matrix. For a permutation matrix P the transpose P^T of the matrix is its inverse P^{-1}.
One result of the Perron-Frobenius theory of non-negative matrices is the following theorem of Perron (see [6]):

Theorem 1 [Perron] A positive matrix has a real, maximal, positive eigenvalue r. This eigenvalue is a simple root of the characteristic equation and the corresponding eigenvector has only positive elements.
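Perron's theorem is easy to verify numerically; the strictly positive matrix below is random illustrative data, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.random((6, 6)) + 0.1              # a strictly positive matrix

w, V = np.linalg.eig(A)
k = int(np.argmax(w.real))                 # Perron root = spectral radius
r = w[k]
v = V[:, k].real
v = v * np.sign(v[0])                      # eigenvectors are defined up to sign

assert abs(r.imag) < 1e-9 and r.real > 0   # real, positive, maximal
assert np.all(v > 0)                       # all components of one sign
```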
In the context of stochastic processes we will also need the following result about the normal form of non-negative matrices:

Theorem 2 For a non-negative matrix C we can find a permutation matrix P such that:

C̃ = P^T C P =
  [ C_1        0          ...  0          0          ...  0   ]
  [ 0          C_2        ...  0          0          ...  0   ]
  [ ...        ...        ...  ...        ...        ...  ... ]
  [ 0          0          ...  C_g        0          ...  0   ]
  [ C_{g+1,1}  C_{g+1,2}  ...  C_{g+1,g}  C_{g+1}    ...  0   ]
  [ ...        ...        ...  ...        ...        ...  ... ]
  [ C_{s,1}    C_{s,2}    ...  C_{s,g}    C_{s,g+1}  ...  C_s ]    (2)

C̃ is the normal form of C. The diagonal blocks C_i are irreducible and, for i > g, at least one of the matrices C_{i,1}, C_{i,2}, ..., C_{i,i-1} is different from zero.
For stochastic processes of color signals we can now argue as follows: color signals have non-negative values, so the correlation matrix C is non-negative. We can find a permutation matrix to bring C into normal form, and since C is also symmetric we find that C is block-diagonal with irreducible blocks. A stochastic process is thus the direct sum of uncorrelated subprocesses where each subprocess has a strictly positive first eigenvector. In the following we will therefore only consider processes with strictly positive first eigenvectors. Many of the results of the Perron-Frobenius theory also hold in the more general case where the finite signals are replaced by elements in a Hilbert space and the correlation matrix by an operator. These results are generally known as Krein-Rutman theory (for details see [3], pp. 2129).
3.2 Geometry of Spaces of Daylight Spectra
Understanding the properties of spectral distributions of daylight and their evolution at different sites under varying atmospheric conditions is an active research area. Many efforts to measure and model daylight spectra originate in the 1960s [10, 11, 17, 23, 24]. The SMARTS model [9] is one example of recent tools to compute clear sky spectral irradiance from a description of atmospheric conditions, time and solar geometry.
In the following we consider spaces of daylight spectra as subsets of vector spaces. We use PCA to approximate the spectra with linear combinations of three principal components. We saw above that the first eigenvector is positive. Therefore the other eigenvectors must have negative contributions somewhere (because of the orthogonality of the eigenvectors). From this we conclude that (given a fixed value for the coefficient of the first eigenvector) the coefficients of the higher order eigenvectors must be bounded: otherwise some negative contribution of such an eigenvector would exceed the corresponding positive contribution from the first eigenvector and the resulting linear combination would be negative, which is impossible since the spectra are non-negative everywhere. The daylight spectra are thus located in a solid cone-like region of the vector space. This allows us to introduce a conical structure on the coordinate vector space of spectra. For all spectra in the set we have (after a suitable re-scaling of the basis vectors):

σ_0^2 − σ_1^2 − σ_2^2 > 0;   σ_0 ≥ 0    (3)
where σ_k is the k-th coefficient in the PCA expansion. A conical projection in the space of spectral coordinates is defined as the following perspective projection:

x = σ_1/σ_0;   y = σ_2/σ_0    (4)
A spectrum in this space is thus characterized by σ_0 (related to the intensity) and the coordinate z = x + iy (related to the chromaticity). The points z = x + iy lie on the open unit disk of the complex plane. In the following we will use the terms "intensity" and "chromaticity" in the sense defined here. They are convenient terms but are, of course, different from their meaning in traditional color science. We characterize a sequence of spectra by its sequence of projected points on the open unit disk.
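The chain of arguments above (non-negative spectra, a one-signed first eigenvector, the perspective projection) can be illustrated with synthetic spectra; all quantities below are made-up stand-ins for measured daylight data:

```python
import numpy as np

rng = np.random.default_rng(3)
lam = np.linspace(380, 780, 81)

# Synthetic non-negative "spectra": random positive mixtures of smooth bumps.
centers = rng.uniform(400, 760, (500, 3))
spectra = sum(
    rng.random((500, 1)) * np.exp(-0.5 * ((lam - centers[:, [i]]) / 60.0) ** 2)
    for i in range(3))

C = spectra.T @ spectra / len(spectra)       # non-negative correlation matrix
w, V = np.linalg.eigh(C)
b = V[:, ::-1]                               # eigenvectors, largest eigenvalue first

# Perron-Frobenius: the first eigenvector is one-signed; orient it positive.
b0 = b[:, 0] * np.sign(b[0, 0])
assert np.all(b0 > 0)

# Coefficients sigma_k and the conical (perspective) projection of Eq. (4).
sigma = spectra @ b[:, :3]
sigma[:, 0] *= np.sign(b[0, 0])              # keep sigma_0 > 0
x, y = sigma[:, 1] / sigma[:, 0], sigma[:, 2] / sigma[:, 0]
print(float(np.max(np.hypot(x, y))))         # bounded: the spectra lie in a cone
```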
It is known that the open unit disk is a two-dimensional Poincaré model of hyperbolic geometry, and its isometry transformation group is SU(1,1). SU(1,1) is the group consisting of 2 × 2 complex matrices of the following form (the bar denotes complex conjugation):

M = [ a   b ]
    [ b̄   ā ] ,   a, b ∈ C, |a|^2 − |b|^2 = 1    (5)

An element M ∈ SU(1,1) acts as the fractional linear transformation on a point z:

w = M⟨z⟩ = (az + b) / (b̄z + ā)    (6)
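That this action preserves the open unit disk is easy to check numerically; the particular parametrization of the SU(1,1) element below is our own:

```python
import numpy as np

def su11_element(b, phase):
    """An SU(1,1) matrix [[a, b], [conj(b), conj(a)]] with |a|^2 - |b|^2 = 1."""
    a = np.sqrt(1.0 + abs(b) ** 2) * np.exp(1j * phase)
    return np.array([[a, b], [np.conj(b), np.conj(a)]])

def act(M, z):
    """Fractional linear action of Eq. (6): w = (a z + b) / (conj(b) z + conj(a))."""
    a, b = M[0, 0], M[0, 1]
    return (a * z + b) / (np.conj(b) * z + np.conj(a))

M = su11_element(0.7 - 0.2j, 0.9)
theta = np.linspace(0.0, 2.0 * np.pi, 100)
z = 0.5 * np.exp(1j * theta)               # points inside the unit disk
w = act(M, z)

assert abs(abs(M[0, 0]) ** 2 - abs(M[0, 1]) ** 2 - 1.0) < 1e-12
assert np.all(np.abs(w) < 1.0)             # the open unit disk is preserved
```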
Special subgroups of SU(1,1), known as one-parameter subgroups M(t), are given by group elements that are functions of a real value t, having the following properties:

M(t_1 + t_2) = M(t_1) M(t_2)  for all t_1, t_2 ∈ R;   M(0) = E = identity matrix    (7)

For a one-parameter subgroup M(t) we introduce its infinitesimal generator, represented by the matrix X:

X = dM(t)/dt |_{t=0} = lim_{t→0} (M(t) − E)/t    (8)
The infinitesimal matrices X representing one-parameter subgroups M(t) ∈ SU(1,1) form the Lie algebra su(1,1), which consists of 2 × 2 complex matrices of the form:

X = [ iγ    β  ]
    [ β̄   −iγ ] ,   γ ∈ R, β ∈ C    (9)

The Lie algebra su(1,1) forms a three-dimensional vector space [20], spanned by the basis (J_k):

J_1 = [ 0  1 ]      J_2 = [ 0  −i ]      J_3 = [ i   0 ]
      [ 1  0 ] ;          [ i   0 ] ;          [ 0  −i ]    (10)
Each infinitesimal matrix X ∈ su(1,1) of a one-parameter subgroup M(t) ∈ SU(1,1) thus has a coordinate vector specified by the three real numbers ξ_1, ξ_2 and ξ_3: X = ξ_1 J_1 + ξ_2 J_2 + ξ_3 J_3.
Given a start point z(0) on the unit disk together with a one-parameter subgroup M(t), we define an SU(1,1) curve as the following function of t:

z(t) = M(t)⟨z(0)⟩ = e^{tX}⟨z(0)⟩;   t ∈ R, z(t) on the unit disk    (11)
The SU(1,1) curves are straight lines in the three-dimensional Lie algebra space su(1,1); the estimation of input chromaticity sequences by SU(1,1) curves can thus be considered a linearization. The Lie algebra su(1,1) hence provides a powerful tool to linearize problems involving chromaticity sequences. We developed several methods [15] to compute the group parameters of SU(1,1) curves from input data sequences. These methods are used to study the properties of sequences of daylight spectra.
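An SU(1,1) curve per Eq. (11) can be sketched directly; the coefficients ξ_k and the start point below are arbitrary illustrative values, and the matrix exponential is computed by a plain power series:

```python
import numpy as np

# Basis of su(1,1) as in Eq. (10).
J1 = np.array([[0, 1], [1, 0]], dtype=complex)
J2 = np.array([[0, -1j], [1j, 0]])
J3 = np.array([[1j, 0], [0, -1j]])

def expm(X, terms=40):
    """Matrix exponential by truncated power series (fine for small 2x2 X)."""
    out, term = np.eye(2, dtype=complex), np.eye(2, dtype=complex)
    for n in range(1, terms):
        term = term @ X / n
        out = out + term
    return out

X = 0.3 * J1 - 0.1 * J2 + 0.5 * J3       # X = xi_1 J1 + xi_2 J2 + xi_3 J3
z0 = 0.2 + 0.1j                           # start point on the unit disk

# SU(1,1) curve z(t) = M(t)<z(0)> with M(t) = exp(tX), Eq. (11).
for t in np.linspace(0.0, 2.0, 5):
    M = expm(t * X)
    a, b = M[0, 0], M[0, 1]
    z = (a * z0 + b) / (np.conj(b) * z0 + np.conj(a))
    assert abs(z) < 1                     # the curve stays on the unit disk
    assert abs(abs(a) ** 2 - abs(b) ** 2 - 1) < 1e-9   # M(t) is in SU(1,1)
```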
3.3 An Illustration
We investigated two different types of daylight spectra:
• A database of 21871 daylight spectra measured at the same location (Norrköping, Sweden) from June 16th, 1992 to
July 7th, 1993; between 5:10 and 19:01 (Local time); in 5nm steps from 380nm to 780nm, and
• Daylight spectra sequences generated by the simulation program SMARTS2 [9]. The SMARTS model accepts as its
input the Sun position and atmospheric parameters including: Angström beta, precipitable water, ozone, and surface
pressure. The wavelength range of the generated spectra was 380nm to 780nm in 1nm steps. In a series of experiments, sequences of spectra were generated by changing a single parameter in its allowed range while keeping the others fixed to the default values. In another experiment a large set of simulated daylight spectra with all feasible combinations of parameters was also created to simulate the investigated space of daylight spectra.
Figure 3.3 shows the results of the SU(1,1) estimations of chromaticity sequences generated by SMARTS2. In one experiment we generated spectra with different values of the Angström beta and precipitable water parameters. They can be seen as the images of a grid in the (Angström beta, precipitable water) plane under the SMARTS2 mapping. The chromaticity sequences corresponding to single-parameter (beta or water) changes are then created by grouping all chromaticity points having the same value of the other parameter. We hold the value of the precipitable water parameter constant and estimate the one-parameter group as the Angström beta varies. The group coordinates characterizing these estimated SU(1,1) curves are not constant but form a linear function of the setting of the other parameter. In Figure 3.3 we show the computed group coordinates of estimated SU(1,1) curves, varying as functions of the other parameter's settings.

[Figure 3.3: (d) Group coordinates from varying water values; (e) SU(1,1) curve estimating changing Sun position]

In another example we generated different daylight spectra by varying the zenith angle with all other parameters set to default values. This corresponds to a sequence of daylight spectra with different positions of the Sun. It can also be considered a time sequence of daylight spectra. The same SU(1,1) curve can be used to estimate both sequences of equally spaced zenith angle changes and equally spaced time changes, but the group parameters t describing the locations of points on the curve are different for each sequence.
In all the experiments described above, we found that SU(1,1) curves provide good estimates for all SMARTS2 spectra sequences generated by changing any single parameter. The method also provides good approximations when the remaining parameters are set to other reasonable values different from the defaults.
4 Lie Theory, Differential Equations and Invariants
One advantage of the framework described above is the connection to the Lie-theory of differential equations. This connection is provided by the one-parameter groups defined in the last section. Assume we have a one-parameter group M(t) describing the chromaticity changes of an illumination source. One example could be a sunrise or sunset described by the time-varying sequence of spectra in the experiment mentioned in the last section. Given a chromaticity parameter z(0), this group generates a sequence of time-varying chromaticity parameters z(t) = M(t)⟨z(0)⟩ as defined in Eq. (11). Choosing an intensity parameter, one can reconstruct the coefficients in the PCA decomposition, and by computing the linear combination with the principal components we obtain a sequence of time-varying illumination spectra l(t, λ), where λ is the wavelength parameter and t is the group parameter. If we now model the camera as a linear system described by the matrix K, then the time-changing pixel vectors p(t) for a given object point are given by p(t) = K l(t, λ). Since the chromaticity transformations form a one-parameter group, it is possible to define the Lie-derivative operator d/dt with respect to the group parameter t. This construction can then be used to investigate properties of measurement sequences like the time-varying pixel values recorded by one channel of a color camera.
Another application where these tools can be used is the construction of invariants. With the help of the Lie-theory of differential equations it is possible to give an overview of all invariants and to construct them automatically using programs like Maple. We will illustrate this with the construction of invariants for the dichromatic reflection model.
Table 3. Subgroups of the general linear group of 2 × 2 matrices

Name                  Matrix Group                            Lie-Algebra (infinitesimal element)
Rotation              [ cos(τ)  sin(τ) ; −sin(τ)  cos(τ) ]    X_1 = [ 0  1 ; −1  0 ]
Uniform Scaling       [ e^σ  0 ; 0  e^{−σ} ]                  X_2 = [ 1  0 ; 0  −1 ]
Non-Uniform Scaling   [ e^σ  0 ; 0  1 ]                       X_3 = [ 1  0 ; 0  0 ]
Shear                 [ 1  α ; 0  1 ]                         X_4 = [ 0  1 ; 0  0 ]
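The generators in Table 3 can be checked numerically. In particular, the Lie bracket of the shear and rotation generators yields a scaling generator, so the span of the three is closed under the bracket; this anticipates the closure argument given below the table (the small span-membership helper is our own):

```python
import numpy as np

def bracket(X, Y):
    """Lie product XY - YX."""
    return X @ Y - Y @ X

# Rotation and shear generators as in Table 3.
X1 = np.array([[0, 1], [-1, 0]])
X4 = np.array([[0, 1], [0, 0]])

# The Lie product of shear and rotation is a scaling generator:
Z = bracket(X4, X1)                       # equals diag(-1, 1)

# The span of {X1, X4, Z} is closed under the bracket, so shears and
# rotations generate a three-parameter Lie-algebra that includes scalings.
basis = [X1, X4, Z]
A = np.stack([b.ravel() for b in basis], axis=1).astype(float)

def in_span(Y):
    c = np.linalg.lstsq(A, Y.ravel().astype(float), rcond=None)[0]
    return np.allclose(A @ c, Y.ravel())

assert np.array_equal(Z, np.diag([-1, 1]))
assert all(in_span(bracket(a, b)) for a in basis for b in basis)
```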
In many applications, for example color image segmentation, color object recognition, etc., the main interest is the physical content of the objects in the scene. Deriving features which are robust to image capturing conditions such as illumination changes, highlights, shadows and geometry changes is a crucial step in such applications. The interaction between light and objects in the scene is very complicated, and sophisticated models such as radiative transfer theory or Monte-Carlo simulation methods are needed to provide realistic results. The complexity of these models (and the fact that many key components are unknown) prevents their application in object recognition. Previous studies of color invariance are, therefore, mostly based on simpler semi-empirical models such as the Dichromatic Reflection Model [21], or the model proposed by Kubelka and Munk [13], together with many additional assumptions [5, 7, 8, 12, 22].
We already mentioned that it is very difficult to describe in detail what happens when light strikes a surface: some of the light will be reflected at the interface, producing interface reflection, while another part will transfer into the medium, undergoing absorption, scattering, and emission. The Dichromatic Reflection Model (see [21]) describes the relation between the incoming light and the reflected light as a mixture of the light reflected at the material surface and the light reflected from the material body. The model assumes that the light L(x, λ) (of wavelength λ) reflected from a point x of the surface can be decomposed into two additive components, an interface (specular) reflectance and a body (diffuse) reflectance:

L(x, λ) = m_S(x) R_S(λ) E(λ) + m_D(x) R_D(λ) E(λ)    (12)

An interpretation is that the geometric properties at x (such as the angles of incident and remitted light) determine the weights m_S(x) and m_D(x). The relations between the spectral properties of the illumination and the specular and diffuse reflections depend on the properties of the material and are given by the products R_S(λ)E(λ) and R_D(λ)E(λ), where E(λ) is the spectral power distribution of the incident light. The sensor values C_n(x) measured at pixel x in the image using N filters with spectral sensitivities f_1(λ), ..., f_N(λ) are given by the following integral over the visible spectrum:

C_n(x) = ∫ f_n(λ) [m_S(x) R_S(λ) E(λ) + m_D(x) R_D(λ) E(λ)] dλ = m_S(x) S_n + m_D(x) D_n    (13)
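Equation (13) can be sketched numerically; all spectra, filters and parameter values below are illustrative stand-ins, not from the paper:

```python
import numpy as np

lam = np.linspace(380.0, 780.0, 401)
dl = lam[1] - lam[0]

def gauss(mu, s):
    return np.exp(-0.5 * ((lam - mu) / s) ** 2)

# Hypothetical spectra: illumination E, specular/diffuse reflectances R_S, R_D,
# and N = 3 sensor filters f_n (all shapes chosen only for illustration).
E = 1.0 + 0.5 * gauss(550, 120)
RS = 0.9 * np.ones_like(lam)
RD = 0.6 * gauss(600, 90)
filters = [gauss(mu, 40) for mu in (450, 550, 650)]

def sensor_values(mS, mD):
    """C_n(x) = integral of f_n(lam) [mS R_S E + mD R_D E] d lam, Eq. (13)."""
    L = mS * RS * E + mD * RD * E
    return np.array([np.sum(f * L) * dl for f in filters])

# Linearity in (mS, mD): C(x) = mS * S + mD * D with fixed channel vectors.
Svec, Dvec = sensor_values(1.0, 0.0), sensor_values(0.0, 1.0)
assert np.allclose(sensor_values(0.3, 0.7), 0.3 * Svec + 0.7 * Dvec)
```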
Assume that two object points belong to the same material. They then have identical reflectance functions, and the only difference between the corresponding pixels in the image originates in their geometrical properties. For these two neighboring pixels x_1 and x_2 and one fixed channel n we then have:

[ C_n(x_1) ]   [ m_S(x_1)  m_D(x_1) ] [ S_n ]     [ S_n ]
[ C_n(x_2) ] = [ m_S(x_2)  m_D(x_2) ] [ D_n ] = M [ D_n ]    (14)
In the framework of transformation groups we see that the matrix M operates on the 2-D vectors (S_n, D_n)^T. In the group theoretical approach it is now natural to consider groups of matrices M operating on the vectors (S_n, D_n)^T and to construct invariants for such groups. Examples of relevant groups of matrices M are listed in Table 3.
The subgroups in Table 3 form one-parameter groups with infinitesimal elements X_k. It is now natural to consider all linear combinations of one, two, three or four of the infinitesimal elements X_k, k = 1, ..., 4. Combining X_2 and X_3 we can thus generate the two-parameter group of scalings, and the full group is a four-parameter group generated by all linear combinations of all four infinitesimal matrices. General Lie-theory, however, requires considering not only linear combinations but also the Lie-product of two elements X, Y, defined as XY − YX. A vector space that is closed under the Lie-product is called a Lie-algebra, and its vector-space dimension is the dimension of the Lie-algebra. Computing this Lie product for the shear matrix X_4 and the rotation matrix X_1, we find that the subgroup generated by the shear and rotation operations must also include the scaling matrices. Closure with regard to the Lie-product implies that the two one-parameter groups of shears and rotations generate the three-parameter Lie-algebra of shears, rotations and uniform scalings. We have now shown how the group of non-singular 2 × 2 matrices (and its subgroups) operates on measurement vectors generated by one sensor type at two different positions. Next we will generalize this to the case where we measure N channels simultaneously at each pixel.
The most common example of such a generalization corresponds to a transition from intensity images to RGB images. Since we separated the spectral properties from the non-spectral parameters, we see that the transformation matrix M is the same for all channels. Therefore we obtain in the general case the transformation group:

[ C(x_1) ]     [ S ]
[ C(x_2) ] = M [ D ]    (15)

Here C(x_1), C(x_2), S, D are N-dimensional vectors. The group now operates on the space R^{2N}. A general result from Lie-theory (see [16]) shows that this implies that there are 2N − k functionally independent invariants, where k is the dimension of the Lie-algebra. For the case of RGB images and the full matrix group we have 2 · 3 − 4 = 2 invariants. The general theory not only gives the number of functionally independent invariants but also provides an algorithm for constructing them. A simple Maple program gives the following solution (with C(x_k) = (r_k, g_k, b_k)^T):

f = F1( (−g_2 b_1 + b_2 g_1)/(r_1 g_2 − r_2 g_1), (b_2 r_1 − r_2 b_1)/(r_1 g_2 − r_2 g_1) )    (16)
If we only start with rotations and shearing then we start with two variables, but because of the Lie-structure we have k = 3 and there are three independent invariants:

    f = F1( r1 g2 − r2 g1, b2 r1 − r2 b1, −(−g2 b1 + b2 g1) / (r1 g2 − r2 g1) )
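The invariance of the ratios in eq. (16) can be verified numerically. The sketch below applies a random non-singular 2 × 2 matrix channel-wise to a pair of RGB vectors and checks that both ratios are unchanged:

```python
import numpy as np

rng = np.random.default_rng(1)

def invariants(C1, C2):
    # The two functionally independent invariants of eq. (16),
    # with C(x1) = (r1, g1, b1) and C(x2) = (r2, g2, b2).
    r1, g1, b1 = C1
    r2, g2, b2 = C2
    d = r1 * g2 - r2 * g1
    return np.array([(-g2 * b1 + b2 * g1) / d, (b2 * r1 - r2 * b1) / d])

C1, C2 = rng.random(3), rng.random(3)
M = rng.random((2, 2))             # an (almost surely) non-singular matrix
D1 = M[0, 0] * C1 + M[0, 1] * C2   # M acts channel-wise on the vector pair
D2 = M[1, 0] * C1 + M[1, 1] * C2
print(np.allclose(invariants(C1, C2), invariants(D1, D2)))   # True
```

The check works because every bilinear form of the type r1 g2 − r2 g1 is simply multiplied by det(M) under the action, so the ratios cancel the determinant.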
5 Conclusions
We used a few examples to illustrate the possible usage of group theoretical methods to solve color related problems. They show that it may well be worth investing the time and effort to learn the tools of group representations and Lie-theory. The advantages are at least twofold: (1) this may lead to new insights into the fundamental properties of spaces of color signals, and (2) group theory provides tools and algorithms to solve difficult problems with the help of programs like Maple.
References
[1] J. J. Atick and N. Redlich. What does the retina know about natural scenes? Neural Computation, 4:449–572, 1992.
[2] H. B. Barlow. The coding of sensory messages. In W. H. Thorpe and O. L. Zangwill, editors, Current Problems in Animal Behaviour,
pages 331–360. Cambridge University Press, 1961.
[3] N. Dunford and J. T. Schwartz. Linear Operators, Part III: Spectral Operators. Interscience Publishers, New York, 1988.
[4] D. J. Field. Relations between the statistics of natural images and the response properties of cortical cells. Journal of the Optical Society of America A, 4(12):2379–2394, December 1987.
[5] G. D. Finlayson and G. Schaefer. Solving for colour constancy using a constrained dichromatic reflection model. International
Journal of Computer Vision, 42(3):127–144, 2001.
[6] F. R. Gantmacher. Matrizentheorie. Springer Verlag, Berlin, Heidelberg, New York, Tokyo, 1986.
[7] J. M. Geusebroek, R. van den Boomgaard, A. W. M. Smeulders, and H. Geerts. Color invariance. IEEE Trans. Pattern Anal. Machine
Intell., 23(12):1338–1350, 2001.
[8] T. Gevers and A. W. M. Smeulders. Pictoseek: Combining color and shape invariant features for image retrieval. IEEE Trans. on
Image Processing, 9(1):102–119, 2000.
[9] C. Gueymard, “Smarts, a simple model of the atmospheric radiative transfer of sunshine: Algorithms and performance assessment,”
Tech. Rep. FSEC-PF-270-95, Florida Solar Energy Center, (1995).
[10] S. T. Henderson and D. Hodgkiss, “The spectral energy distribution of daylight,” J. Appl. Phys., 14, pg. 125–131, (1963).
[11] D. B. Judd, D. L. MacAdam, and G. Wyszecki, “Spectral distribution of typical daylight as a function of correlated colour temperature,” Journal of the Optical Society of America A, 54, pg. 1031–1041, (1964).
[12] G. J. Klinker. A Physical Approach to Color Image Understanding. A K Peters Ltd., 1993.
[13] P. Kubelka and F. Munk. Ein beitrag zur optik der farbanstriche. Zeitschrift für Technische Physik, 11a:593–601, 1931.
[14] R. Lenz. Investigation of receptive fields using representations of dihedral groups. Journal of Visual Communication and Image
Representation, 6(3):209–227, September 1995.
[15] R. Lenz, T. H. Bui, and J. Hernández-Andrés. One-parameter subgroups and the chromaticity properties of time-changing illumination spectra. In Proc. SPIE-2003, Color Imaging VIII, Vol 5008, pages 237–248, 2003.
[16] P. J. Olver. Applications of Lie Groups to Differential Equations. Springer, New York, 1986.
[17] Z. Pan, G. Healey, and D. Slater, “Global spectral irradiance variability and material discrimination at Boulder, Colorado,” Journal of the Optical Society of America A, 20, 3, pg. 513–521, (2003).
[18] C. A. Párraga, G. Brelstaff, T. Troscianko, and I. R. Moorehead. Color and luminance information in natural scenes. J. Opt. Soc.
Am. A, 15(3):563–569, March 1998.
[19] P. Reinagel and S. Laughlin. Natural stimulus statistics. Network: Computation in Neural Systems, 12(3):237–240, 2001.
[20] D. Sattinger and O. Weaver. Lie Groups and Algebras with Applications to Physics, Geometry and Mechanics, volume 61 of Applied
Mathematical Sciences. Springer, 1986.
[21] S. A. Shafer. Using color to separate re ection components. Color Research and Application, 10(4):210–218, 1985.
[22] H. M. G. Stokman. Robust Photometric Invariance in Machine Color Vision. PhD thesis, Intelligent Sensory Information Systems
group, University of Amsterdam, Nov. 2000.
[23] G. T. Winch, M. C. Boshoff, C. J. Kok, and A. G. du Toit, “Spectroradiometric and colorimetric characteristics of daylight in the southern hemisphere: Pretoria, South Africa,” Journal of the Optical Society of America A, 56, pg. 456–464, (1966).
[24] G. Wyszecki and W. Stiles. Color Science. Wiley & Sons, London, England, 2 edition, 1982.
Image Relighting: Getting the Sun to Set in an Image Taken at Noon
Claus B. Madsen and Rune Laursen
Laboratory of Computer Vision and Media Technology
Aalborg University, Aalborg, Denmark
cbm/[email protected]
Abstract
Image relighting is a special visual effect
which promises to have many important practical applications. Image relighting is essentially the process of, given
one or more images of some scene, computing what that
scene would look like under some other (arbitrary) lighting conditions, e.g., changing positions and colors of light
sources. Image relighting can for example be used for
interior light design. This paper describes an approach
to image relighting which can be implemented to run in
real-time by utilizing graphics hardware, as opposed to
other state-of-the-art approaches which at best run at a
few frames per second.
1 Introduction
This paper addresses the subject of developing relighting techniques, i.e. techniques allowing a user to completely alter the lighting conditions in an image of a real scene. Specifically, the paper focuses on techniques providing real-time relighting functionality, thus enabling the user to interactively change lighting conditions and get "instant" visual feedback. Figure 1 provides an example of the kind of relighting this paper addresses. It should
be noted that the work presented here presumes the availability of three things: 1) an original image of the scene,
2) a 3D model of the scene, and 3) a model of the lighting conditions in the scene at the time the original image
is acquired. We will return to ways in which the last two
pieces of knowledge can be obtained.
Conceptually image relighting in this manner is a two
step process. In the first step all effects of the original
lighting conditions are removed, e.g., highlights, shadows,
and differences in shading across surfaces due to varying
light incidence angles. In the second step the scene is subjected to some arbitrary new lighting conditions and the
appearance of the scene in these conditions is computed.
The second step thus ”adds” new highlights, shadows and
shading etc. Of these two steps the former is the tricky
one, while the latter can be performed using any preferred
rendering technique, e.g., ray tracing, radiosity or standard hardware accelerated approaches. Which rendering
technique is employed depends on the preferred balance
between rendering speed and accuracy in handling various lighting phenomena. In order to achieve true real-time
performance we have chosen to use a hardware accelerated approach for step 2, thus sacrificing certain global
illumination phenomena.
In our approach step 1 is achieved by a computational
approach which requires, as stated above, a 3D model
of the scene and a model of the original lighting conditions. Alternatively, one could in principle acquire fronto-parallel digital images of the surfaces in the scene under perfectly diffuse, white-balanced lighting conditions
and use these images as textures on the 3D model, which
is subsequently rendered under novel lighting conditions
(step 2). This would be a mechanical or image acquisition
approach to step 1, but in reality acquiring such ’clean’
textures devoid of lighting effects is not practical for general scenes.
The contributions in this work lie in the specific manner
in which the operations performed in steps 1 and 2 are carried out. With the approach described here the two steps
can be combined such that the image relighting becomes
a matter of modulating the original image on a pixel by
pixel basis with a ”relighting map”. The relighting map
can be computed in real-time using standard techniques,
Figure 1: Left: original image acquired outdoors on a sunny day. Right: same scene as left but this image is a
simulation of what the scene would look like if the position of the sun were different. The work in this paper enables
such changes in lighting conditions to be performed in real-time.
and the modulation can also easily be performed in real-time. Thus, our approach has two advantages: 1) it is directly designed for real-time performance, and 2) the original image is used directly and therefore the final image is not subject to filtering and/or aliasing effects involved in
doing reprojections of textures mapped to a 3D model of
the scene.
This paper represents the current state of work in
progress and will only address the problem for scenes
with perfectly diffuse reflectance properties. The paper is
organized as follows. In section 2, we present an overview
of our approach and show how the relighting effects are
achieved. Section 3 then describes related work. Section 4 describes our approach in more detail, followed by
section 5 giving some practical details behind the initial
experiments we have performed to validate the proposed
approach. Section 6 discusses central aspects of the work
and points to future research. Finally section 7 offers conclusions.
2 Overview of approach

Prior to describing the proposed approach in a more technically rigorous manner, this section attempts to provide the reader with an intuitive understanding of the issues involved and of the process behind our technique.

The approach requires different types of input information. First of all, an image of the original scene is required. Secondly, a 3D model of the scene must be available, and the original image must be calibrated to the 3D model such that every pixel corresponds to a known 3D point in the scene model. Third, the original lighting conditions in the scene must be known, i.e., we need to know the sources of light in the scene and their relative intensities.

The 3D model can be obtained in many different ways [7], e.g., by reconstruction from multiple images using approaches such as [8], by Image-Based Modelling, e.g. [4], or by laser range scanning. Alternatively, the scene can be measured and a model constructed manually. The latter is the approach employed for our experimental results, i.e., we have measured the scene, constructed crude polygonal models of the objects, and then calibrated the camera to the 3D model using manually established 2D to 3D point correspondences. Figure 2 shows the scene model used for the relighting illustrated in figure 1.

Figure 2: 3D model corresponding to the scene shown in figure 1. The image illustrates the model as a depth map, i.e., intensity is a measure of distance from the camera. We have used one plane for the tiled ground plane, a plane for the brick wall on the right, six quadrilaterals for the speaker on the left, and two quads and one triangle for the calibration object in the center.

The required knowledge of the original lighting conditions can most easily be acquired using the popular light probe approach, i.e., by taking high dynamic range images of a reflective sphere placed in the scene, [2, 5]. Alternatively, light source positions, sizes and powers can be measured manually as done in [6], or semi-automatically using multiple images as in [12]. For the experimental results in this paper we have done it manually in a manner described in section 5.

In this paper we will limit ourselves to discussing the case of scenes containing surfaces with perfectly diffuse reflectance properties.

Each pixel in the original image is a measurement of the radiance (in the three RGB bands) from a unique 3D point in the scene in the direction of the viewpoint. Thus the original image is a 2D radiance map, L(u, v), where u and v are the image coordinates. Because we have the 3D model and knowledge of the original lighting conditions it is trivial¹ to compute the amount of light arriving at the same 3D points in the scene, i.e., it is possible to construct an irradiance map, E(u, v). When the irradiance map is computed using the known original scene lighting conditions we will call it E_orig(u, v). Conversely, when the irradiance map is computed using some arbitrary different relighting conditions it will be denoted E_relight(u, v).

¹ Irradiance computation is trivial provided global illumination issues (irradiance contributions from diffuse reflections) are disregarded. If this is not a fair assumption the work by Yu et al., [12], can be used to compute these contributions.

For purely diffuse Bidirectional Reflectance Distribution Functions (BRDFs) there is a linear relationship between radiance and irradiance (radiance is proportional to diffuse albedo times irradiance). Therefore diffuse scenes can be very simply relit by dividing the radiance map with the original irradiance map, and then multiplying with the relighting irradiance map. Using the introduced terminology: L_relit(u, v) = L(u, v) · E_relight(u, v) / E_orig(u, v). Figure 3 shows original and relighting irradiance maps corresponding to the relighting example in figure 1.

Section 1 presented the general concept of relighting as a two step process: 1) removing original light from the image, and 2) adding new light. In the described diffuse case, step 1 is represented by the division by E_orig(u, v), whereas step 2 is performed by multiplying the result from step 1 by the relight irradiance, E_relight(u, v). Step 1 is a once-only process as it only involves elements that do not change over time (original image and original irradiance map). Step 2 has to be re-performed constantly in response to the user's changes in the desired lighting conditions for the relit scene, so E_relight(u, v) must in general be re-computed for every frame.

Step 1 could be pre-computed and the result stored in an image which is then subsequently modulated by the real-time computed relighting irradiance map. Our approach does not do this; we have designed a solution which embeds the normalization with the original irradiance into the computation of the relighting irradiance map. Thus, at run-time we actually compute the relight/original light ratio E_relight(u, v) / E_orig(u, v) directly. This "ratio map" is then used for modulating the original image. By doing this we avoid the non-trivial implementation of a real-time texture division operation.

Computing the ratio map is described in detail in section 4. The idea is based on the observation that if one renders a radiance image of an all-white perfectly diffuse 3D model of the scene under the chosen relighting conditions, this image is identical to the required irradiance map, E_relight(u, v). If instead we set the reflectances of the 3D model to be proportional to the inverse of the original irradiances, then the rendered image automatically becomes the desired relight/original irradiance ratio map.
Figure 3: Left: irradiance map, E_orig(u, v), corresponding to the original lighting conditions of the scene shown in figure 1. Center: irradiance map, E_relight(u, v), corresponding to lighting conditions where the dominant light source, the sun, has changed location relative to the scene. These are the lighting conditions valid for the relit image in figure 1. Right: illustrates the "relighting factor", i.e., the ratio of relighting irradiance to original irradiance, E_relight(u, v) / E_orig(u, v). Every pixel in the original radiance map is modulated with its corresponding pixel in this map. Dark means "light is removed" from original pixels, and bright means "light is added" to original pixels.
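In code, the diffuse relighting described above reduces to a per-pixel multiply and divide. A minimal CPU sketch with numpy (the paper's actual implementation performs this modulation on graphics hardware):

```python
import numpy as np

def relight_diffuse(original, E_orig, E_relight, eps=1e-6):
    # Per-pixel modulation: relit = original * E_relight / E_orig.
    # eps guards against division by zero in fully dark regions.
    return original * (E_relight / np.maximum(E_orig, eps))

# Toy single-channel example: doubling the irradiance at every pixel
# doubles the radiance, since radiance is albedo times irradiance.
L = np.array([[0.2, 0.4], [0.6, 0.8]])
relit = relight_diffuse(L, np.ones((2, 2)), 2.0 * np.ones((2, 2)))
print(relit)
```

For RGB images the same operation is applied to each of the three color bands.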
3 Related work
There is a small amount of closely related work in the
literature. First of all Yu et al., [13], demonstrated how
they could acquire reflectance properties of architectural
scenes by taking at least two images of each surface of
the objects under different lighting conditions. The recovered 3D model combined with the estimated reflectance
parameters could then be used to render the scene under changing lighting conditions. The focus of this work
is entirely on parameter recovery and relighting is by no
means done in real-time.
Similarly, inverse global illumination was proposed by
Yu et al., [12] for recovery of reflectance parameters for
indoor scenes using multiple images of each surface from
different viewpoints. Again this work focuses on reflectance parameter estimation, and relighting is done using RADIANCE, [10], which again is far from real-time.
The most closely related work is that of Loscos et al.,
[6]. This work also enables a user to change the lighting
conditions in an image of a scene in an interactive manner, but this work is centered around a radiosity method
for irradiance computations. Therefore, the method performs at a few frames per second when the only lighting changes performed are intensity adjustments. If the
number of sources or their positions are changed updating takes on the order of 10 seconds.
The work in [6] also employs texture modulation for efficient relighting, and the modulation texture (irradiance map) is computed using radiosity. We have chosen to focus specifically on true real-time performance
and therefore the computation of the relighting irradiance
maps does not account for global illumination phenomena
such as color bleeding. Nevertheless, with the work currently being done in Pre-computed Radiance Transfer and
Photon Mapping, real-time global illumination is coming
closer and closer to reality, and our approach can readily
be combined with such efficient global illumination techniques.
4 The perfectly diffuse case

For perfectly diffuse reflectors the relationship between incident irradiance, E, and radiance, L, in any direction is given by the diffuse albedo, ρ:

    L = ρ E                                                    (1)

The original image (radiance map), L(u, v), provides us with measured radiances from a dense set of 3D points in the scene, and these points are known since we assume the camera is calibrated to the scene. In general the diffuse albedo and the irradiance vary for every point in the scene, so the relationship between radiance maps and irradiance maps becomes:

    L(u, v) = ρ(u, v) E(u, v)                                  (2)

Here, ρ(u, v) is the "albedo map". When doing relighting the albedo stays constant; the only thing that changes is the irradiance at each scene point. Therefore, the radiance map of the relit image/scene can be expressed as:

    L_relit(u, v) = L(u, v) · E_relight(u, v) / E_orig(u, v)   (3)

Eq. 3 simply shows that the relit image can be computed by modulating the original image with a ratio of two irradiance maps: the relight irradiance map, E_relight(u, v), corresponding to the user's desired (new) scene lighting conditions, and the original irradiance map, E_orig(u, v). The key element in our approach is a technique for computing this ratio map in real-time and using it for modulation of the original image.

4.1 Computing the irradiance ratio map

How can we efficiently compute the irradiance ratio map? First, let us describe how simply the relighting irradiance map can be computed using standard local illumination techniques (specifically we will use the Phong lighting model of OpenGL, a description of which may be found in books such as [1, 11]). Rendering an image of a scene using the Phong lighting model results in a radiance from a 3D point which can be formulated as (disregarding specular reflection):

    L = k_a E_a + k_d Σ_i E_i cos θ_i                          (4)

Here L is the radiance from a 3D point in the direction of the viewpoint. k_a and k_d are the ambient and diffuse reflectances, respectively (eq. 4 is to be evaluated for each of the three RGB colors). E_a is the ambient irradiance at the 3D point. E_i is the irradiance at the point caused by the ith point light source (θ_i is the angle between the surface normal and the direction vector to the ith light source at the 3D point), and the sum runs over the number of light sources.

If we set the ambient and diffuse reflectances equal, k_a = k_d = ρ, eq. 4 changes to:

    L = ρ (E_a + Σ_i E_i cos θ_i)                              (5)

Eq. 5 states that when rendering with OpenGL Phong lighting, the radiance from a point equals the reflectance at the point times the total irradiance (ambient plus sum of individual point source contributions) at the point. Thus, by setting unit reflectances, the radiance equals the irradiance. This may be self-evident but is important because it shows that we can use the graphics card's efficient lighting computation capabilities to produce the irradiance maps needed for relighting.

That is, if we render the 3D scene model from a viewpoint corresponding to the original image, and if all surfaces in the rendered 3D model have unit reflectances, then the resulting image is an irradiance map. This means that relighting irradiance maps, E_relight(u, v), for any user desired lighting conditions can be rendered simply by rendering a diffuse, all-white 3D scene model under the chosen lighting conditions.

To actually do relighting we not only needed real-time computation of E_relight(u, v), but we needed the lighting ratio map, E_relight(u, v) / E_orig(u, v). This is accomplished by setting the reflectances of points in the 3D model to the inverse of the original irradiance at each point, 1 / E_orig.

To summarize, the light ratio maps are generated by doing the following rendering using hardware acceleration:

1. upload the 3D scene model to the graphics card
2. set the ambient and diffuse RGB reflectances of all vertices to the inverse of the original irradiance at that 3D point
3. set up the desired lighting conditions consisting of ambient and point source contributions
4. render the model to a texture using a viewport corresponding to the camera in the original image
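For illustration, the rendering steps above can be mimicked on the CPU with a diffuse-only local lighting model. A hedged numpy sketch (hypothetical helper, not the paper's hardware OpenGL implementation):

```python
import numpy as np

def irradiance(points, normals, E_ambient, lights):
    # Diffuse-only local illumination: ambient term plus a clamped
    # cosine and inverse-square contribution per point light source.
    E = np.full(len(points), E_ambient)
    for light_pos, power in lights:
        offset = light_pos - points
        dist = np.linalg.norm(offset, axis=1)
        direction = offset / dist[:, None]
        cos_theta = np.clip(np.sum(normals * direction, axis=1), 0.0, None)
        E += power * cos_theta / dist ** 2
    return E

# One upward-facing surface point; the "sun" moves from overhead to the side.
pts = np.array([[0.0, 0.0, 0.0]])
nrm = np.array([[0.0, 1.0, 0.0]])
E_orig = irradiance(pts, nrm, 0.1, [(np.array([0.0, 2.0, 0.0]), 4.0)])
E_new = irradiance(pts, nrm, 0.1, [(np.array([2.0, 2.0, 0.0]), 4.0)])
ratio = E_new / E_orig   # < 1 here: light is "removed" at this point
```

Evaluating this per pixel over the calibrated 3D model yields exactly the ratio map used to modulate the original image.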
4.2 Practical issues
In the previous section we described how to use hardware
accelerated local lighting rendering to produce irradiance
ratio maps for modulating the original image. With this approach there are really no limits to how much the lighting conditions in the scene can be altered.
We are presently implementing the proposed technique, but all images in this paper were produced by a non-real-time simulation of the presented approach. Figure 4 shows what the original scene looks like with a (non-existing) light source in the very center of the scene.
For the ongoing implementation of the real-time version the only real issue to contemplate is the resolution of
the 3D model of the scene. In order to properly capture
gradients in the original irradiances of the scene the resolution of the 3D model has to be high at such gradients.
We are working on designing methods for adaptive subdivision based on evaluating irradiance differences between
3D model vertex locations. If differences are too high the
surface is subdivided.
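The subdivision criterion described above can be sketched as a simple recursive procedure (illustrative only; the function names and the midpoint split into four triangles are our assumptions):

```python
def subdivide(tri, irradiance_at, threshold, max_depth=4):
    # Split a triangle recursively while the irradiance difference
    # between its vertices exceeds the threshold.
    a, b, c = tri
    E = [irradiance_at(v) for v in tri]
    if max_depth == 0 or max(E) - min(E) <= threshold:
        return [tri]

    def mid(p, q):
        return tuple((p[i] + q[i]) / 2.0 for i in range(3))

    ab, bc, ca = mid(a, b), mid(b, c), mid(c, a)
    out = []
    for child in [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]:
        out += subdivide(child, irradiance_at, threshold, max_depth - 1)
    return out

# Irradiance varying linearly along x: the coarse triangle is kept when
# the variation is below the threshold and split when it exceeds it.
tri = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
coarse = subdivide(tri, lambda v: v[0], threshold=2.0)
fine = subdivide(tri, lambda v: v[0], threshold=0.6)
```

A vertex-difference test like this cannot detect a hard shadow edge crossing the interior of a large triangle, which is one reason the resolution question remains open in the text above.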
Computing original irradiances to be used inversely as reflectances of the (subdivided) 3D scene model is an off-line process which can be done using any preferred rendering technique, for example Monte Carlo ray tracing to
enable proper handling of area light sources. This is especially possible if a high dynamic range light probe image
of the scene is available, because then an Image-Based
Lighting approach, [2, 3], can be used to compute accurate irradiances which properly handle soft shadows in the
original image.
In the on-line stage, when rendering the 3D model with
the assigned re ectances, cast shadows are important for
proper irradiance computation. For this we propose to use
a shadow volume approach to detecting shadowed areas.
5 Experiments
As mentioned previously the images shown in this paper
are produced using a simulation of the presented technique. The original image was acquired with a standard
5 mega pixel digital camera. The scene was measured
manually and a simple 3D model of it was constructed (as
described in section 2).
The camera was calibrated to the 3D model using man-
ually established 2D to 3D point correspondences, and the
estimation of internal and external camera parameters was
done using an approach from [9].
The original lighting conditions were modelled as a
combination of a point light source (the sun) and an ambient term (the sky). The position of the sun relative to the
scene model was determined by orienting the calibration
object such that sun rays were parallel to the xz-plane,
thus fixing the sun's y-coordinate to zero. The x and z
coordinates were then found by measuring the length of
a shadow cast by an object of known height. The RGB
intensities of the blue ambient sky light were determined
from the image colors of the white paper of the calibration object in areas not exposed to sunlight. By comparing
RGB values of calibration object cardboard in shadow and
in direct light the relative intensities between ambient and
sun light were determined, (taking the cosine fall-off for
diffuse reflection into account for the sun point source).
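The shadow-length construction amounts to a single arctangent. A small sketch in a generic 2D convention (horizontal/vertical components rather than the paper's specific x, y, z axes, which are an assumption of our illustration):

```python
import math

def sun_direction(object_height, shadow_length):
    # Elevation follows from tan(elevation) = height / shadow length.
    # Returns a unit vector (horizontal, vertical) toward the sun in
    # the vertical plane containing the shadow.
    elevation = math.atan2(object_height, shadow_length)
    return math.cos(elevation), math.sin(elevation)

# A 1 m object casting a 1 m shadow puts the sun 45 degrees up:
h, v = sun_direction(1.0, 1.0)
```

Scaling the resulting unit direction by an assumed sun distance gives the point-light position used in the lighting model.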
It should be made very clear that the original lighting conditions modelled as described above are extremely
crude and this was only done to get quick results. In the
future we will use light probe images.
For computing the original irradiance map a simple ray tracing approach was implemented which considers local illumination only, by casting primary rays plus shadow feelers. The original irradiance map was computed in image resolution. The relighting examples given in the paper
basically involve changing the location of the sun source.
Given some desired sun position the simple raytracer was
used to render a relighting irradiance map, again in image resolution. The relighting irradiance map was divided
by the original irradiance map and the result multiplied
with the original image to complete the diffuse relighting
process.
6 Discussion

In this section we will briefly discuss some important
points in relation to our proposed method.
Using our approach it is possible to employ arbitrarily
complex and accurate computations of the original irradiance map. This is an off-line, once-only computation
the results of which are used to set the reflectances of
the 3D scene model subsequently used for relighting. We
believe handling area light sources to be very important,
Figure 4: Left: irradiance map corresponding to a user defined lighting environment with a weak ambient term and a point light source in the middle of the scene. Right: resulting relit image.
even for outdoor images, since shadows due to sun light actually do have noticeable penumbra regions. Similarly, we believe taking global illumination phenomena (indirect light) into account is important, especially for indoor scenes, where reflections from other surfaces may be a significant irradiance contribution for a given surface.

Conversely, for the actual on-line, real-time rendering of irradiances during interactive relighting we have here proposed a straightforward local illumination approach. Yet, the basic approach of using the 3D scene model, normalized with original irradiances, can be used in conjunction with any lighting algorithm, depending on how accurate one desires the result to be.

Throughout this paper we have assumed scenes to consist entirely of Lambertian materials. Our approach actually does generalize nicely to scenes with glossy BRDFs. It requires an additional rendering pass in the real-time relighting process, in order first to modulate the original image with the diffuse part of the relighting/original irradiance ratio map and subsequently add the specular radiance part. Figure 5 demonstrates the effect of adding a specular component to the surfaces during relighting.

7 Conclusions

We have described an approach to image/scene relighting which, based on a 3D model of the scene and knowledge of the original lighting conditions, can compute the appearance of the scene under any arbitrary new lighting conditions, including changing the number of light sources, their positions and radiant powers.

The main contribution of the work is the fact that the approach is directly designed for real-time performance, enabling a user to get instant visual feedback upon having changed the parameters of the lighting environment. A smaller contribution lies in the idea of performing the normalization with the original irradiance by appropriately setting the reflectances of the 3D model used for real-time irradiance computations. This allows the approach to operate directly on the original image, rather than computing the albedo map off-line and modulating it at run-time.

An important aspect of the proposed approach is that relighting is performed as a modulation of the original image. It is believed that doing relighting in this manner is superior to an approach where textures extracted from the image are mapped to the scene geometry and reprojected at run-time, because the multiple re-sampling steps involved will cause the resulting image to be blurred.
Figure 5: A specular reflection component has been added to each surface during relighting to illustrate the possibility of playing with the reflectance properties.

Acknowledgments

This research is funded in part by the BENOGO project under the European Commission IST program (IST-2001-39184), and in part by the ARTHUR project (IST-2000-28559). This support is gratefully acknowledged.

References

[1] S. R. Buss. 3-D Computer Graphics – A Mathematical Introduction with OpenGL. Cambridge University Press, 2003.
[2] P. Debevec. Rendering synthetic objects into real scenes: Bridging traditional and image-based graphics with global illumination and high dynamic range photography. In Proceedings: SIGGRAPH 1998, Orlando, Florida, USA, July 1998.
[3] P. Debevec. Tutorial: Image-based lighting. IEEE Computer Graphics and Applications, pages 26–34, March/April 2002.
[4] P. Debevec, C. Taylor, and J. Malik. Modelling and rendering architecture from photographs: a hybrid geometry and image-based approach. In Proceedings: SIGGRAPH 1996, New Orleans, LA, USA, pages 11–20, August 1996.
[5] S. Gibson, J. Cook, T. Howard, and R. Hubbold. Rapid shadow generation in real-world lighting environments. In Proceedings: EuroGraphics Symposium on Rendering, Leuven, Belgium, June 2003.
[6] C. Loscos, G. Drettakis, and L. Robert. Interactive virtual relighting of real scenes. IEEE Transactions on Visualization and Computer Graphics, 6(4):289–305, October-December 2000.
[7] M. M. Oliveira. Image-based modelling and rendering: A survey. RITA - Revista de Informatica Teorica e Aplicada, 9(2):37–66, October 2002. Brazilian journal, but paper is in English.
[8] M. Pollefeys, R. Koch, and L. Van Gool. Self-calibration and metric reconstruction in spite of varying and unknown internal camera parameters. International Journal of Computer Vision, 32(1):7–25, 1999.
[9] E. Trucco and A. Verri. Introductory Techniques for 3D Computer Vision. Prentice Hall, 1998.
[10] G. J. Ward. The radiance lighting simulation and rendering system. In Proceedings: SIGGRAPH 1994, pages 459–472, July 1994.
[11] A. Watt and F. Policarpo. 3D Games: Real-Time Rendering and Software Technology, volume 1. Addison-Wesley, 2001.
[12] Y. Yu, P. Debevec, J. Malik, and T. Hawkins. Inverse global illumination: Recovering reflectance models of real scenes from photographs. In Proceedings: SIGGRAPH 1999, Los Angeles, California, USA, pages 215–224, August 1999.
[13] Y. Yu and J. Malik. Recovering photometric properties of architectural scenes from photographs. In Proceedings: SIGGRAPH 1998, Orlando, Florida, USA, pages 207–217, July 1998.
Using Mixtures of Gaussians
to Compare Approaches to Signal Separation
Kaare Brandt Petersen
Technical University of Denmark
1 Introduction
In the signal separation technique called Independent Component Analysis (ICA), one refers to a problem as being "square" if there are as many measurements as sources to separate, and "overcomplete" or "under-determined" if there are more sources than measurements. While square ICA is thoroughly investigated
through many different approaches with well understood differences and similarities, the overcomplete case still poses a difficult problem. Below is a short overview of some of the more interesting or illustrative approaches.
In 1999, Hagai Attias presented in [1] a Maximum Likelihood approach assuming the generative model x = As + ε. In this approach a model distribution
is constructed and using Mixture of Gaussians as priors makes it possible to
complete the relevant integrals and obtain a closed form expression for the
distribution over x. The model distribution is approximated to the data distribution through the Kullback Leibler divergence. This approach is extremely
flexible while still having appealing analytical properties and the only real drawback is the bad scaling behavior: A sum over K D must be computed, which can
be rather larger for e.g. image data.
In 2000, Lewicki and Sejnowski presented in [5] a Maximum Likelihood approach assuming the generative model x = As + ε. In this approach, the log likelihood of A is Taylor expanded to second order around the maximum a posteriori estimate of the sources, i.e. one approximates the likelihood with a gaussian, and the sources are in turn estimated using the updated estimate of A. In this approach we see the difficulty that most overcomplete techniques try to work around: the likelihood, or some other suitable cost function, contains some integral involving the source prior and is therefore in general hard to solve. The approach of Lewicki and Sejnowski substitutes the integral with a second order expansion, and we shall see other possibilities in the following.
In 2001, Girolami presented in [2] a Maximum Likelihood approach assuming the generative model x = As + ε. In this approach, Girolami assumes Laplacian priors, and the integral associated with the log likelihood is approximated through a variational scheme: the Laplacian priors are reformulated in dual space, which provides a lower bound of the likelihood to be optimized. The drawback of this approach is that the trick only works for Laplacian priors and that the algorithm optimizes a lower bound instead of the log likelihood itself.
1
22
In 2002, Hojen-Sorensen, Winther and Hansen presented in [4] the so-called "Mean Field Approach" assuming the generative model x = As + ε. The parameters A and possibly the noise covariance W are estimated through maximum likelihood assuming knowledge of the mean values of the sources, and vice versa. That is, the integral of the log likelihood translates into mean values of the sources, which are approximated with estimates of the mean values obtained from Mean Field theory. The nice feature of this approach is that we substitute a very complicated integral with an easier non-linear equation, and that one can do this for any prior. The problem, of course, is that although Mean Field estimates are fairly accurate, they are still approximations.
Also in 2002, Shriki, Sompolinsky and Lee presented in [7] an interesting variant of Infomax on the filtering model ŝ = Wx. In the setup y = g(Wx), Shriki et al. obtain a relation between p(x) and p(y) by assuming a noisy relation y = g(Wx) + ε and letting the noise go to zero. The main problem is that the limit is not properly taken care of, and that the certainty of the result therefore is doubtful.
And finally in 2003, Teh, Welling, Osindero and Hinton presented in [8] the Energy Based Model on the filtering model ŝ = Wx. Through a setup inspired by physics, a model distribution for x is constructed and adjusted to be as close to the data distribution as possible through a Kullback-Leibler divergence. Again the authors are faced with an intractable integral, and this time it is approximated with the so-called "n-step Learning", which is a Hybrid Monte Carlo technique using n steps in the estimation of the integral. The remarkable claim of the paper is that very few steps, such as n = 1 or n = 3, often will be sufficient to obtain overall convergence of the algorithm.
A common feature of most of the approaches is that they assume a noisy model and obtain a likelihood which involves a difficult integral, which is then approximated in some way or another. The exception to this is the approach of Hagai Attias, but one can then argue that assuming the priors to be mixtures of gaussians is either a restriction or an approximation.
1.1 This Paper
Thus, Independent Component Analysis (ICA) can be performed by a vast range of different methods. These can differ from each other by assumed properties such as noise or time-correlation, but also by the fundamental issue of whether they attempt to estimate the generative mixing matrix, denoted A, or a filtering matrix, denoted W. In the case of the same number of observations and sources, the square case, most if not all of these methods can be proven to be equivalent. But in the overcomplete case, where the number of observations is smaller than the number of sources, their differences become apparent and it is not easy to compare the results of generative and filtering approaches.
This paper makes an attempt to compare the result of two different methods
in the overcomplete case: The Maximum Likelihood (ML), which estimates the
generative A, and the Energy Based Models (EBM) which seek to estimate a
filtering matrix W. This is done by assuming the priors to be centered mixtures
of gaussians which makes it possible to compare the optimization schemes. This
approach, with respect to ML, is closely related to the work of Hagai Attias in
[Figure 1 shows three contour plots over the unit square, panels titled "General MoG", "SMoG (Spherical)" and "IMoG (Independent)".]
Figure 1: Contour plots of probability densities in 2D source space: a general MoG, a Spherical MoG (SMoG) and an Independent MoG (IMoG). The IMoG is the only one of the three which in fact is independent in its variables.
[1], but where Attias assumes completely general mixtures of gaussians and a noisy mixture model, the mixtures of gaussians in this paper are assumed centered for simplicity and, more importantly, the limit of zero noise is derived in order to be able to compare with the noiseless EBM.
The structure of the paper is as follows: In Sec 2, we introduce the reader
to the mixture of gaussians to be used, and in Sec 3 we apply the mixtures of
gaussians to the EBM and the ML. In Sec 4 we compare the results found and
finally in Sec 5, we make a short summary.
2 Mixture of Gaussians
In this paper we make extensive use of the family of distributions known as
mixtures of gaussians (MoG). To clear up some common misconceptions about
MoG we introduce the general MoG and then discuss two important subsets of
distributions. The density of a D-dimensional centered MoG is
$$p(s) = \sum_{\kappa} \frac{\rho_\kappa}{\sqrt{|2\pi D_\kappa|}} \exp\Big(-\frac{1}{2}\, s^T D_\kappa^{-1} s\Big) \qquad (1)$$
where the weights ρ_κ sum to one. The matrix D_κ can be any positive definite matrix, but is in this context often assumed diagonal, D_κ = diag(σ²_{1κ}, σ²_{2κ}, ..., σ²_{Dκ}).
The marginal distribution for each s_i is itself a mixture of gaussians, s_i ∼ Σ_κ ρ_κ N(0, σ²_{iκ}), but note about the joint distribution that even if the coordinates s_i in each component are independent, i.e. if the D_κ are all diagonal, the coordinates s_i are not in general independent in the bigger joint distribution p(s).
One subset of interest is the MoG constructed as a product of D 1-dimensional mixtures of gaussians, and therefore independent by construction. Since the variables are independent, we call this kind of MoG an Independent Mixture of Gaussians (IMoG). Denoting the parameters of the i'th marginal by Σ_κ ρ_{iκ} N(0, σ²_{iκ}), the density of the joint distribution is given by
$$p(s) = \sum_{\{k\}} \frac{\tilde\rho_k}{\sqrt{|2\pi D_k|}} \exp\Big(-\frac{1}{2}\, s^T D_k^{-1} s\Big) \qquad (2)$$
The symbol {k} denotes all combinations of the vector k having length D and consisting of integers from 1 to K, $D_k = \mathrm{diag}(\sigma^2_{1k_1}, \sigma^2_{2k_2}, \ldots, \sigma^2_{Dk_D})$, and $\tilde\rho_k$ is defined by $\tilde\rho_k = \prod_{i=1}^{D} \rho_{ik_i}$.
Another subset of interest is the MoG where the matrices are not only diagonal, but all some constant times the identity, D_κ = σ_κ² I. In this case the density is spherically symmetric, and we call this kind of distribution a Spherical Mixture of Gaussians (SMoG).
As is obvious from Eq. 2, an IMoG is itself a MoG, but the reverse is not true in general. As visualized in Figure 1, the density contour of a MoG with diagonal covariances is some weighted sum of axis-aligned ellipsoids. The lengths of the axes of the ellipsoids correspond to the variances of the gaussian components, i.e. the diagonal elements of the covariance matrices. This is also true for the product of 1-dimensional MoGs, but in this case all combinations of the marginal variances are present as axes of some ellipsoid. Thus, knowing the density of a MoG with diagonal covariances, its variables are independent if and only if all combinations of axis lengths are present. Therefore, since this by definition is not the case for SMoG, SMoG and IMoG must be disjoint subsets of MoG.
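These subsets can be probed numerically. The following sketch (Python/NumPy; the helper name `mog_density` is ours, not from the paper) evaluates the density of Eq. (1) for a two-component SMoG and verifies that, unlike a general diagonal MoG, it is invariant under rotations of the source space:

```python
import numpy as np

def mog_density(s, weights, covs):
    """Centered mixture-of-gaussians density, Eq. (1):
    p(s) = sum_k rho_k / sqrt(|2 pi D_k|) * exp(-0.5 s^T D_k^{-1} s)."""
    p = 0.0
    for rho, D in zip(weights, covs):
        quad = s @ np.linalg.inv(D) @ s
        p += rho * np.exp(-0.5 * quad) / np.sqrt(np.linalg.det(2 * np.pi * D))
    return p

# Two-component SMoG in D = 2: both covariances are multiples of the identity.
weights = [0.5, 0.5]
covs = [0.5 * np.eye(2), 2.0 * np.eye(2)]

s = np.array([0.3, -0.7])
theta = 1.1  # any rotation leaves a spherically symmetric density unchanged
R = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
p_s, p_rot = mog_density(s, weights, covs), mog_density(R @ s, weights, covs)
```

Replacing `covs` by diagonal but non-spherical matrices breaks the rotation invariance, which is exactly the distinction between SMoG and the general diagonal MoG.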
3 Models and Derivation
We now use the MoG as source priors for two different approaches to ICA:
Energy Based Models (EBM) and Maximum Likelihood (ML).
We consider a situation in which D sources s_t are mixed into a set of M measurements x_t, expressed by the equation x_t = A s_t. The signals are N time steps long and can be arranged into the matrices S and X, such that the mixing of all N vectors can be expressed in one equation, X = AS. The sources are white, and since EBM is restricted to square and overcomplete mixing, we assume M ≤ D.
3.1 Energy Based Models
The EBM method, presented in [8], aims at demixing the measurements X by filtering with W in the traditional way, Ŝ = WX. The fact that this does not produce independent estimated sources in the overcomplete case is discussed in more detail in Sec 4. The filtering coefficients are determined through the construction of a model distribution p_W(x) which is approximated to the data distribution
$$p_0(x) = \frac{1}{N}\sum_{t=1}^{N} \delta(x - x_t) \qquad (3)$$
through the Kullback-Leibler divergence. The model distribution is constructed in the following way: an energy is defined by E(x; W) = −ln p_s(Wx), which ensures that choosing W such that the resulting estimated sources are not too unlikely is rewarded through low energy levels, while unlikely source values are penalized with high energy levels. Other definitions of energy could be made, but this one is especially appealing due to its calculational properties. The
energy E is used in a Gibbs distribution, i.e.
$$p_W(x) = \frac{e^{-E(x;W)}}{Z(W)} = \frac{p_s(Wx)}{\int p_s(Wx)\, dx} \qquad (4)$$
which is chosen as our model distribution. In [8], Teh et al. make no assumption on the prior, which makes the normalization part more difficult. To deal with this they use so-called n-step learning, a variant of a Hybrid Monte Carlo approach, to approximately estimate the normalization part of the optimization. In this paper we instead choose the prior to be a MoG, which enables us to calculate the integral of Eq. 4 and obtain a closed form expression for the model distribution. The result is
$$p_W(x) = \sum_\kappa \gamma_\kappa\, \frac{\exp(-\frac{1}{2}\, x^T W^T D_\kappa^{-1} W x)}{\sqrt{|2\pi (W^T D_\kappa^{-1} W)^{-1}|}} \qquad (5)$$
where γ_κ in general depends on W and D_κ for all κ through the following relation: setting $\xi_\kappa = \big(|2\pi(W^T D_\kappa^{-1} W)^{-1}| \,/\, |2\pi D_\kappa|\big)^{1/2}$, we can write $\gamma_\kappa = \rho_\kappa \xi_\kappa / \sum_\kappa \rho_\kappa \xi_\kappa$. Note that in the square case ξ_κ = 1 and therefore γ_κ = ρ_κ, and in the case of SMoG priors we obtain $\xi_\kappa = (2\pi/\sigma_\kappa^2)^{(D-M)/2}$. The estimated optimal filtering matrix Ŵ is determined by
$$\hat W = \arg\min_W \mathrm{KL}(p_W \,\|\, p_0)$$
in which the gradient of the KL-divergence can be calculated analytically when
the prior is chosen to be mixtures of gaussians or some other analytically appealing distribution.
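The weights γ_κ are cheap to compute once the prior is a diagonal MoG. As a numerical illustration (NumPy; `ebm_weights` is our own helper, not part of [8]), the sketch below confirms the two statements above: γ_κ = ρ_κ in the square case, and γ_κ ∝ ρ_κ(1/σ_κ²)^((D−M)/2) for SMoG priors:

```python
import numpy as np

def ebm_weights(W, rho, covs):
    """EBM mixture weights of Eq. (5):
    xi_k    = (|2 pi (W^T D_k^{-1} W)^{-1}| / |2 pi D_k|)^{1/2}
    gamma_k = rho_k xi_k / sum_k rho_k xi_k."""
    xi = np.array([np.sqrt(np.linalg.det(2*np.pi*np.linalg.inv(W.T @ np.linalg.inv(D) @ W))
                           / np.linalg.det(2*np.pi*D)) for D in covs])
    g = np.asarray(rho) * xi
    return g / g.sum()

rng = np.random.default_rng(0)

# Square case: any kappa-independent factor cancels in the normalization, so gamma = rho.
W_sq = rng.standard_normal((3, 3))
covs_sq = [np.diag(rng.uniform(0.5, 2.0, 3)) for _ in range(2)]
g_sq = ebm_weights(W_sq, [0.3, 0.7], covs_sq)

# Overcomplete SMoG case (D = 4 sources, M = 2 measurements, W is D x M).
D_dim, M = 4, 2
W = rng.standard_normal((D_dim, M))
sig2 = np.array([1.0, 2.0])
g_oc = ebm_weights(W, [0.5, 0.5], [s2 * np.eye(D_dim) for s2 in sig2])
```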
3.2 Maximum Likelihood
In the ML setup presented here we assume a generative model X = AS + Γ,
where we, in order to be able to deal with the overcomplete case, have added
white gaussian noise, Γ. In the end we let the noise variance go to zero to obtain
the noiseless result.
Assuming white gaussian noise and the prior on the sources ps (s) to be a
MoG with covariance matrices Dκ and weights ρκ , we can complete the integration and write the distribution over x as
$$p(x) = \int p(x|s)\, p(s)\, ds = \sum_\kappa \rho_\kappa\, \frac{\sqrt{|2\pi\Phi_\kappa^{-1}|}\; \exp(-\frac{1}{2}\, x^T \Psi_\kappa x)}{\sqrt{|2\pi\Sigma|}\, \sqrt{|2\pi D_\kappa|}}$$
where $\Sigma = \sigma^2 I$ is the noise covariance matrix, $\Phi_\kappa = A^T \Sigma^{-1} A + D_\kappa^{-1}$ and $\Psi_\kappa = \Sigma^{-1} - \Sigma^{-1} A \Phi_\kappa^{-1} A^T \Sigma^{-1}$. We now want to consider the limit $\sigma^2 \to 0$, but we need to do this with great care, since otherwise crucial details will vanish in the approximation. Using the Woodbury identity, singular value decomposition and some very good approximations (see the appendix for details), we obtain the following limits
the following limits
|2πΦ−1
κ |/
Ψκ
→
|2πΣ| →
5
(ADκ AT )−1
wκ / |AAT |
26
Here $w_\kappa$ is a constant with respect to $\sigma^2$, but depends on $D_\kappa$ and on A through the unique orthogonal matrix V which fulfills the eigenvalue equation $A^T A V = V \Lambda$, such that the eigenvalues in Λ are decreasing in size down the diagonal. The expression for $w_\kappa$ is $w_\kappa = \prod_{i=M+1}^{D} \big(2\pi/(V^T D_\kappa^{-1} V)_{ii}\big)^{1/2}$. With this limit taken care of, we can write the maximum likelihood expression for p(x) as
p(x) as
pA (x) =
κ
ρκ
exp(− 12 xT (ADκ AT )−1 x)
|2πADκ AT |
(6)
where the weights somewhat surprisingly turn out to be the same as those of the prior (see the appendix for details). The estimated generative mixing matrix Â is the matrix maximizing the log likelihood
$$\hat A = \arg\max_A \ln P(X|A)$$
Note that we have not estimated the sources in this process, only the generative
mixing matrix.
4 Comparing EBM and ML
Now we compare the EBM and ML approaches derived in the previous section and discuss the significance of the differences and similarities. In the square case, we obtain total equivalence of all expressions by setting $W = A^{-1}$, and thus, not surprisingly, we can conclude, as Teh et al. do in [8], that the two approaches are equivalent when the number of observations equals the number of sources. Therefore the discussion and comparison in this section is almost entirely concerned with the overcomplete case.
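The square-case equivalence is easy to verify directly: with W = A⁻¹ the covariances (WᵀD_κ⁻¹W)⁻¹ and AD_κAᵀ coincide and the EBM weights reduce to ρ_κ, so Eq. (5) and Eq. (6) define the same density. A numerical sketch (NumPy; the helper names are ours):

```python
import numpy as np

def p_A(x, A, rho, covs):
    """ML density of Eq. (6): a MoG with covariances A D_k A^T and prior weights rho_k."""
    p = 0.0
    for r, D in zip(rho, covs):
        C = A @ D @ A.T
        p += r * np.exp(-0.5 * x @ np.linalg.inv(C) @ x) / np.sqrt(np.linalg.det(2*np.pi*C))
    return p

def p_W(x, W, rho, covs):
    """EBM density of Eq. (5) with gamma_k = rho_k xi_k / sum_k rho_k xi_k."""
    Cs = [np.linalg.inv(W.T @ np.linalg.inv(D) @ W) for D in covs]
    xi = np.array([np.sqrt(np.linalg.det(2*np.pi*C) / np.linalg.det(2*np.pi*D))
                   for C, D in zip(Cs, covs)])
    gamma = np.asarray(rho) * xi
    gamma = gamma / gamma.sum()
    p = 0.0
    for g, C in zip(gamma, Cs):
        p += g * np.exp(-0.5 * x @ np.linalg.inv(C) @ x) / np.sqrt(np.linalg.det(2*np.pi*C))
    return p

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))                       # square mixing matrix
covs = [np.diag(rng.uniform(0.5, 2.0, 3)) for _ in range(2)]
x = rng.standard_normal(3)
pa = p_A(x, A, [0.4, 0.6], covs)
pw = p_W(x, np.linalg.inv(A), [0.4, 0.6], covs)       # W = A^{-1}
```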
In the overcomplete case it is not obvious how one should compare results on the generative matrix A and the filtering matrix W. The filtering approach does not retrieve the original sources, since for any matrix W we have WA_g ≠ I because of the dimensionality: it is impossible to construct D × M-dimensional orthogonal matrices when D > M. And we cannot in general compare the filter matrix W with the pseudo-inverse of A, since this is not the optimal solution in all cases [5]. But using a MoG as prior, both the model distribution p_W(x) of the EBM and the likelihood p_A(x) of the ML become MoGs with parameters which must be estimated to fit a common data set X. In fact we end up with two optimizations which look rather similar
$$0 = \frac{\partial}{\partial W}\sum_{t=1}^{N} \ln p_W(x_t), \qquad 0 = \frac{\partial}{\partial A}\sum_{t=1}^{N} \ln p_A(x_t)$$
The similarity is to some extent both genuine and deceptive: both distributions are MoGs, but the dependencies of the weights and covariances on W and A are different. In this section we compare EBM and ML by comparing the covariances and weights of their MoGs.
[Figure 2 shows four curves over the range 0 to 2, labeled "Square", "d = 1", "d = 2" and "d = 10".]
Figure 2: The weights for EBM and ML. The plot demonstrates for SMoG priors that the differences in weights increase dramatically when the setting gets strongly overcomplete. (See the text for details on the plots.)
4.1 Spherical MoG
We now consider the special case where the priors are SMoG, i.e. the variances are given as D_κ = σ_κ² I. In this case the sources are not assumed to be independent, which is interesting in its own right, but it also serves as a clear example of properties which hold for the more general cases as well. When we assume the priors to be SMoG, the model distribution simplifies significantly,
$$p_W(x) = \sum_\kappa \gamma_\kappa\, \frac{\exp(-\frac{1}{2}\, x^T W^T D_\kappa^{-1} W x)}{\sqrt{|2\pi (W^T D_\kappa^{-1} W)^{-1}|}}, \qquad \gamma_\kappa = \frac{\rho_\kappa (1/\sigma_\kappa^2)^d}{\sum_\kappa \rho_\kappa (1/\sigma_\kappa^2)^d} \qquad (7)$$
where d = (D − M)/2.
The Covariances
The covariances of EBM and ML can in fact become equal in the case of SMoG. The equation setting the covariances equal,
$$(W^T D_\kappa^{-1} W)^{-1} = A D_\kappa A^T \quad \forall \kappa \qquad (8)$$
translates into $W^T W A A^T = I$, which is fulfilled for $W = A^+$, where $A^+$ denotes the (Moore-Penrose) pseudo-inverse of the matrix A. When A has full rank, the pseudo-inverse is given by $A^+ = A^T (A A^T)^{-1}$. But the equation is also fulfilled for any matrix $W = U A^+$, where U is orthogonal, and thus there is an entire family of matrices which would make the covariances of the EBM equal to the covariances of the ML for a given A. Conversely, for any W we can choose $A = W^+$ to obtain the same covariances, and in this sense the two approaches have equal flexibility with respect to adjusting the covariances to the data. This is evident in the 4 × 2 example in Fig 3 a) and b).
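This family of solutions can be checked numerically for a spherical prior component, mirroring the 4 × 2 setting of Fig 3 (NumPy sketch; the particular matrices are random stand-ins):

```python
import numpy as np

rng = np.random.default_rng(2)
M, D = 2, 4
A = rng.standard_normal((M, D))        # generative mixing: M measurements, D sources
W = np.linalg.pinv(A)                  # Moore-Penrose pseudo-inverse, D x M
Dk = 1.7 * np.eye(D)                   # one spherical prior component, D_k = sigma_k^2 I

lhs = np.linalg.inv(W.T @ np.linalg.inv(Dk) @ W)   # (W^T D_k^{-1} W)^{-1}
rhs = A @ Dk @ A.T                                  # A D_k A^T

# The same holds for W' = U A^+ with U orthogonal (here a random rotation).
U, _ = np.linalg.qr(rng.standard_normal((D, D)))
W2 = U @ W
lhs2 = np.linalg.inv(W2.T @ np.linalg.inv(Dk) @ W2)
```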
[Figure 3 shows four scatter plots on axes from −8 to 8, panels titled "Gen Ag", "Ps-inv A+g", "Est A (ML)" and "Est W (EBM)".]
Figure 3: The covariances in case of SMoG priors for different settings. Plot a) Generative A. Plot b) Pseudo-inverse of the generative A. Plot c) Estimated A. Plot d) Estimated W.
The Weights
The weights can be compared by examining the ratio $\gamma_\kappa/\rho_\kappa$, and as we shall see, this ratio differs strongly from 1 in most cases. From the expression for $\gamma_\kappa$ in Eq. 7, we see that the constraint making the weights of ML and EBM equal is
$$\frac{(1/\sigma_\kappa^2)^d}{\sum_\kappa \rho_\kappa (1/\sigma_\kappa^2)^d} \stackrel{?}{=} 1 \quad \forall \kappa$$
which is clearly impossible when the variances σ_κ² are different for different κ. Thus, in the SMoG case, the weights of EBM and ML cannot be equal, and furthermore the ratio γ_κ/ρ_κ becomes relatively large for those κ where σ_κ² is very small, and vice versa.
Fig 2 is a general illustration of this. Here we assume a SMoG prior with two components which has σ₁² = 1 and ρ₁ = ρ₂ = 0.5, and let σ₂² vary between 0 and 2. The resulting ratio γ₂/ρ₂ is shown in Fig 2: the x-axis is σ₂², the y-axis is γ₂/ρ₂, and the 4 curves are plotted for different degrees of overcompleteness, d = 0 (square) and d = 1, 2, 10 (overcomplete). Clearly, only in the square case and for σ₂² values close to 1 is the ratio reasonably close to 1.
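The curves of Fig 2 follow directly from Eq. (7). A small sketch (NumPy; the helper name `weight_ratio` is ours) computes γ₂/ρ₂ for the two-component SMoG prior used in the figure:

```python
import numpy as np

def weight_ratio(sigma2_2, d, sigma2_1=1.0, rho=(0.5, 0.5)):
    """gamma_2 / rho_2 for a two-component SMoG prior, from Eq. (7),
    with d = (D - M) / 2 the degree of overcompleteness."""
    xi = np.array([(1.0 / sigma2_1) ** d, (1.0 / sigma2_2) ** d])
    gamma = np.asarray(rho) * xi
    gamma = gamma / gamma.sum()
    return gamma[1] / rho[1]

r_square = weight_ratio(0.5, d=0)   # square case: the ratio is 1 regardless of sigma_2^2
r_small  = weight_ratio(0.5, d=2)   # small variance, overcomplete: ratio > 1
r_large  = weight_ratio(2.0, d=2)   # large variance, overcomplete: ratio < 1
```

The small-variance component is boosted and the large-variance component suppressed, which is the bias discussed below.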
Another more specific example is shown in Fig 3 c) and d), which shows the estimated covariances for ML and EBM in a 4 × 2 case. In this example the effect of the enhanced weight on smaller covariances is clear: when the weight of the smaller covariance is strong, the points far from the origin are considered extreme, and the covariances are expanded accordingly. Thus, for this reason, EBM seems to favor larger covariances compared to ML.
5 Summary and Acknowledgements
In conclusion, the use of Mixtures of Gaussians made it possible to compare Maximum Likelihood with Energy Based Models. The results show that in the overcomplete case with Spherical Mixtures of Gaussians as priors, the Energy Based Model is biased toward larger covariances compared to Maximum Likelihood. One can show that this effect is also present for IMoG priors, though not nearly as strong.
Finally I need to give due credit: This paper is closely related to the result
of earlier work together with Jiucang Hao and Te-Won Lee from University of
California San Diego (UCSD).
A Details of the Calculations
When calculating the limit of $\Psi_\kappa$, we first insert the definition $\Sigma = \sigma^2 I$ and simplify the expression into
$$\Psi_\kappa = \frac{1}{\sigma^2}\Big(I - A(A^T A + \sigma^2 D_\kappa^{-1})^{-1} A^T\Big)$$
Now defining $Q = A D_\kappa A^T / \sigma^2$, we can use the Woodbury identity for inverse matrices and obtain
$$A(A^T A + \sigma^2 D_\kappa^{-1})^{-1} A^T = Q - Q(I + Q)^{-1} Q$$
Since Q is symmetric and very large compared to I, the right hand side can be approximated by $I - Q^{-1}$. To see this, write Q as $Q = V\Lambda V^T/\sigma^2$ for an orthogonal V and diagonal Λ, and remember the identity $x - x^2/(1+x) = 1 - 1/(1+x)$ to obtain
$$I - V\,\mathrm{diag}\big(1/(1 + \Lambda_{ii}/\sigma^2)\big)V^T \cong I - V\,\mathrm{diag}\big(1/(\Lambda_{ii}/\sigma^2)\big)V^T = I - Q^{-1}.$$
Inserting this into the equation containing $\Psi_\kappa$ gives the desired result.
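The quality of this large-Q approximation is easy to confirm numerically for a small noise variance (NumPy sketch; the sizes and matrices are arbitrary stand-ins):

```python
import numpy as np

rng = np.random.default_rng(3)
M, D = 3, 5
A = rng.standard_normal((M, D))
Dk = np.diag(rng.uniform(0.5, 2.0, D))
sigma2 = 1e-6                              # small noise variance, so Q is very large
Q = A @ Dk @ A.T / sigma2                  # Q = A D_k A^T / sigma^2
I = np.eye(M)

exact = Q - Q @ np.linalg.inv(I + Q) @ Q   # Woodbury form of A (A^T A + s^2 D_k^{-1})^{-1} A^T
approx = I - np.linalg.inv(Q)              # the large-Q approximation used above
err = np.max(np.abs(exact - approx))
```

Algebraically `exact` equals I − (I + Q)⁻¹, so the discrepancy against I − Q⁻¹ shrinks like Q⁻² as σ² → 0.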
When calculating the limit of the fraction containing the determinant $|2\pi\Phi_\kappa^{-1}|$, we use the fact that since $A^T A$ is symmetric there exists an orthogonal V such that $A^T A = V\Lambda V^T$. Since |V| = 1 we get
$$|2\pi\Phi_\kappa^{-1}| \,/\, |2\pi\Sigma| = (2\pi\sigma^2)^{D-M} \,/\, |\Lambda + \sigma^2 V^T D^{-1} V|$$
Since $\sigma^2$ is assumed arbitrarily small, only the diagonal of the sum of matrices will contribute significantly to the determinant. And since furthermore $A^T A$ has rank M, the first M elements of the diagonal matrix will dominate together with the remaining D − M factors:
$$|\Lambda + \sigma^2 V^T D^{-1} V| \cong \prod_{i=1}^{M} \Lambda_{ii} \prod_{j=M+1}^{D} (\sigma^2 V^T D^{-1} V)_{jj}$$
Inserting this into the fraction above gives the desired result.
The coefficients $\alpha_\kappa$ have the structure $\alpha_\kappa = \rho_\kappa w_\kappa \sqrt{|2\pi A D_\kappa A^T|} \,/\, \big(\sqrt{|A A^T|}\cdot\sqrt{|2\pi D_\kappa|}\big)$. In the square case much of the difficulty of taking the noiseless limit disappears, $w_\kappa = 1$, and we easily obtain $\alpha_\kappa = \rho_\kappa$. In the case of SMoG, setting $D_\kappa = \sigma_\kappa^2 I$, we obtain $w_\kappa = (2\pi\sigma_\kappa^2)^{(D-M)/2}$ and therefore $\alpha_\kappa = \rho_\kappa$. Supported by numerical results, we conjecture that this also holds in the general overcomplete case.
References
[1] H. Attias, Independent Factor Analysis, Neural Computation 11, 803-851,
1999.
[2] M. Girolami, A Variational Method for Learning Sparse and Overcomplete
Representations, Neural Computation, 13(11), pp 2517 - 2532, 2001.
[3] G. E. Hinton, Training products of Experts by minimizing contrastive divergence, Neural Computation, 14(8):1771-1800, 2002.
[4] P. Hojen-Sorensen, O. Winther, L. K. Hansen, Mean-Field Approaches to
Independent Component Analysis, Neural Computation Volume 14, Issue
4, April 2002.
[5] M. S. Lewicki, T. J. Sejnowski, Learning Overcomplete Representations,
Neural Computation, 12(2):337-65, 2000.
[6] R. M. Neal, Probabilistic Inference Using Markov Chain Monte Carlo Methods, Technical Report CRG-TR-93-1, Department of Computer Science, University of Toronto, 1993.
[7] O. Shriki, H. Sompolinsky, D. D. Lee, An information maximization approach to overcomplete and recurrent representations, Neural Info. Proc. Sys. 13 (2001).
[8] Y. W. Teh, M. Welling, S. Osindero, G. Hinton, "Energy-Based Models for Sparse Overcomplete Representations", The Journal of Machine Learning Research - Special Issue on Independent Components Analysis, guest edited by Te-Won Lee, Jean-François Cardoso, Erkki Oja and Shun-ichi Amari, 2003.
Stochastic differential equations in image warping
Bo Markussen
Department of Computer Science
University of Copenhagen, DK-2100 Copenhagen
email: [email protected]
Abstract
This paper is concerned with image warping as applied in e.g. non-rigid registration. We use the theory of stochastic differential equations to give a precise mathematical description of the Brownian warps model proposed by Nielsen et al. Furthermore we introduce a renormalization technique allowing for Bayesian inference of the warp flow given e.g. the observation of matching landmarks in two images. As a final result we show that the maximum a posteriori estimator is equivalent to the minimal energy estimator derived by Joshi & Miller using flows given by deterministic differential equations.
1 Introduction
Image warping is the diffeomorphic transformation of the domain on which an image is represented. A typical application is the registration problem. Suppose for instance that several images of the same object have been taken with different devices. Each device records different properties of the object, and we wish to combine the different images into a joint description of the object. In practice the images are often not aligned, whence it would be improper simply to stack them on top of each other. However, if we are allowed to deform the images (imagine the images being recorded on pieces of rubber, and that we may deform the pieces of rubber in any possible way without introducing folds or tears), then it might be possible to achieve alignment. In some cases the needed transformations are non-rigid, leading to a non-rigid registration problem. This paper is concerned with a probabilistic description of this problem. In particular we consider the statistical problem of finding the maximum a posteriori estimator of the warps given a set of matching landmarks. Landmarks are points which are identifiable in all the images. We hence know that these landmarks should be matched by the
warps. Given this information we seek to estimate the remaining registration. A related application would be the estimation of the image flow given a sequence of matched landmarks in a sequence of images.
This is a purely theoretical paper. The results are threefold: 1) a stochastic differential equation description of the least committed prior for Brownian warps proposed in [7], 2) the introduction of a mathematically rigorous renormalization technique allowing the computation of a maximum a posteriori estimator, and 3) the equivalence of the maximum a posteriori and the minimum energy warp found in [4]. These results have recently been presented in the paper [1] at the second workshop on Generative Model Based Vision at the IEEE conference on Computer Vision and Pattern Recognition.
Mathematically a warp between two d-dimensional images (in applications
we typically have d = 2 or d = 3) is a diffeomorphism φ : Rd → Rd . Points
x ∈ Rd are usually understood as column vectors x = {xi }i=1,...,d . Similarly, the
coordinates of a matrix a ∈ Rd×d are denoted by a = {aij }i,j=1,...,d . The transpose
of a vector/matrix A is denoted by A∗ , probabilistic expectation is denoted by E,
and the set of natural numbers is denoted by N.
2 The first principle model
In accordance with the physical "piece of rubber" analogy it is natural to realize the warping as a dynamical process. At time t = 0 we have the original image, and at time t = T we have the final image. At a time point 0 < t < T we have an intermediate warp. We thus introduce a doubly indexed family of warps $\phi(\cdot, s, t) : \mathbb R^d \to \mathbb R^d$ with 0 ≤ s ≤ t ≤ T. The warp $\phi(\cdot, s, t)$ gives the warping between times s and t. The physical interpretation implies the flow properties
$$\phi(x, s, s) = x, \qquad \phi(x, s, t) = \phi\big(\phi(x, s, r), r, t\big).$$
The next step is to realize $\phi(\cdot, s, t)$ as a stochastic process. The axiom stated below was proposed in [7] and parallels the assumptions leading to the Gaussian distribution and the Brownian motion. A Brownian motion is a stochastic process B with independent Gaussian distributed increments $B(t) - B(s) \sim \mathcal N(0, t - s)$. The differential dB(t) can be interpreted as Gaussian white noise of order $\sqrt{dt}$.
Axiom. The diffeomorphisms φ(·, th−1 , th ) are stochastically independent for every partition 0 = t0 < t1 < · · · < tH = T , and the distribution of φ(·, s, t)
depends only on the time length t − s.
The Axiom together with the flow property implies a central limit theorem for the Jacobians $\{\partial\phi_i(x)/\partial x_j\}_{i,j=1,\ldots,d}$. These Jacobians were employed in [7] for
the landmark matching problem. Unfortunately the limiting distribution is very difficult to analyze, and has to our knowledge only been described in dimension d = 2 in [3]. The underlying probabilistic model can be rigorously defined using stochastic calculus. The physical interpretation of Eq. (3) given below is that the "piece of rubber" is influenced by infinitely many random forces modelled by Brownian motions $B_n$, n ∈ N. The n'th force works in the direction $f_n(x)$ at the spatial position x and has size $dB_n(t)$ at time t. We will use stochastic integrals such as $\int_0^T X(t)\, dB(t)$, which loosely speaking is the limit of the incremental sums
$$\sum_{h=1}^{H} X(t_{h-1})\big(B(t_h) - B(t_{h-1})\big).$$
However, some care has to be taken. In particular there is a distinction between the so-called Itô and Stratonovich integrals. The essential reference for the results employed in this paper is the monograph [5], but be warned, this monograph can be quite hard to read. Let $\circ dB_n(t)$ denote the Stratonovich differential at time t.
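The Itô/Stratonovich distinction shows up already in the incremental sums above: evaluating the integrand at the left endpoint gives the Itô integral, evaluating it at the interval midpoint gives the Stratonovich integral, and for ∫₀ᵀ B(t) dB(t) the two limits differ by T/2. A simulation sketch (NumPy; not part of the paper):

```python
import numpy as np

rng = np.random.default_rng(4)
T, H = 1.0, 200_000
dB = rng.standard_normal(H) * np.sqrt(T / H)   # increments B(t_h) - B(t_{h-1})
B = np.concatenate([[0.0], np.cumsum(dB)])     # B(t_0), ..., B(t_H), with B(0) = 0

# Ito sum: integrand at the left endpoint t_{h-1}.
ito = np.sum(B[:-1] * dB)
# Stratonovich sum: integrand averaged over the two endpoints.
strat = np.sum(0.5 * (B[:-1] + B[1:]) * dB)

# Known closed forms: Ito gives (B_T^2 - T)/2, Stratonovich gives B_T^2 / 2.
```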
Theorem 1. In addition to the Axiom, assume for every $x, y \in \mathbb R^d$ and $s \in [0, T]$ the existence of the limits
$$\frac{E[\phi(x,s,t) - x]}{t-s} \;\xrightarrow[t \to s,\; t > s]{}\; b(x), \qquad \frac{E\big[(\phi(x,s,t) - x)(\phi(y,s,t) - y)^*\big]}{t-s} \;\xrightarrow[t \to s,\; t > s]{}\; a(x,y)$$
and functions $f_0$ and $f_n$, n ∈ N, such that
$$a(x,y) = \sum_{n \in \mathbb N} f_n(x)\, f_n(y)^* \in \mathbb R^{d \times d}, \qquad (1)$$
$$b(x) = f_0(x) + \frac{1}{2}\Big\{\sum_{j=1}^{d} \frac{\partial a_{ij}(x,y)}{\partial x_j}\Big|_{y=x}\Big\}_{i=1,\ldots,d} \in \mathbb R^d. \qquad (2)$$
Under some additional regularity conditions there exist stochastically independent Brownian motions $B_n$, n ∈ N, such that $\phi(x, s, t)$ solves the stochastic differential equation given in integral form by
$$\phi(x,s,t) = x + \int_s^t f_0\big(\phi(x,s,u)\big)\, du + \sum_{n \in \mathbb N} \int_s^t f_n\big(\phi(x,s,u)\big) \circ dB_n(u). \qquad (3)$$
Note that the statistical properties of the flow of diffeomorphisms $\phi(x, s, t)$ are encoded by the infinitesimal mean and covariance b(x) and a(x, y), and that the actual appearance of the flow is encoded by the Brownian motions $B_n$. In the remainder of this paper the functions b(x) and a(x, y) are assumed to be known a priori.
3 The maximum a posteriori estimator
The idea behind the regularized maximum a posteriori estimator is to approximate the infinite dimensional Brownian motion $B = \{B_n(t)\}_{n \in \mathbb N,\, t \in [0,T]}$ by the finite dimensional random variable
$$B^N = \Big\{B_n\big(\tfrac{mT}{N}\big)\Big\}_{n,m=1,\ldots,N} \in \mathbb R^{N \times N}.$$
The $N^2$-dimensional Gaussian vector $B^N$ has probability density
$$p_N(B^N) = \prod_{n=1}^{N} \prod_{m=1}^{N} \sqrt{\frac{N}{2\pi T}}\, \exp\bigg(-\frac{N\big(B_n(t_m^N) - B_n(t_{m-1}^N)\big)^2}{2T}\bigg).$$
For $\tilde B \in C^1([0,T]; \mathbb R^{\mathbb N})$ we define the renormalized Brownian density as the limit
$$\tilde p(\tilde B) = \lim_{N \to \infty} \frac{p_N(\tilde B^N)}{p_N(0)} = \exp\bigg(-\frac{1}{2} \sum_{n \in \mathbb N} \int_0^T \Big|\frac{\partial \tilde B_n(t)}{\partial t}\Big|^2 dt\bigg). \qquad (4)$$
Let $(x_l, y_l) \in \mathbb R^d \times \mathbb R^d$, l ∈ L, be a given set of landmarks. Let the approximative flow $\phi_N(\cdot, t_m, t_k)$ with $t_m = \frac{mT}{N}$ and the approximative maximum a posteriori estimator $\hat B^N$ be defined by
$$\phi_N(x, t_m, t_k) = x + \sum_{h=m}^{k-1} f_0\big(\phi_N(x, t_m, t_h)\big)\frac{T}{N} + \sum_{n=1}^{N} \sum_{h=m}^{k-1} f_n\big(\phi_N(x, t_m, t_h)\big)\big(B_n(t_{h+1}) - B_n(t_h)\big),$$
$$\hat B^N = \arg\max\Big\{p_N(B^N) \;\Big|\; B^N \in \mathbb R^{N \times N} : \phi_N(x_l, 0, T) = y_l \text{ for } l \in L\Big\},$$
the renormalized flow $\tilde\phi(x, s, t)$ and the renormalized maximum a posteriori estimator $\hat B$ be defined by
$$\tilde\phi(x, s, t) = x + \int_s^t f_0\big(\tilde\phi(x, s, u)\big)\, du + \sum_{n \in \mathbb N} \int_s^t f_n\big(\tilde\phi(x, s, u)\big)\, d\tilde B_n(u),$$
$$\hat B = \arg\max\Big\{\tilde p(\tilde B) \;\Big|\; \tilde B \in C^1([0,T]; \mathbb R^{\mathbb N}) : \tilde\phi(x_l, 0, T) = y_l \text{ for } l \in L\Big\},$$
and the maximum a posteriori estimator $\hat\phi(\cdot, s, t)$ be defined by the ordinary differential equation
$$\hat\phi(x, s, t) = x + \int_s^t f_0\big(\hat\phi(x, s, u)\big)\, du + \sum_{n \in \mathbb N} \int_s^t f_n\big(\hat\phi(x, s, u)\big)\, d\hat B_n(u). \qquad (5)$$
Then it can be shown that $\phi_N(x, s, t)$ converges to $\phi(x, s, t)$ and that $\hat B^N$ converges to $\hat B$ as N → ∞. These convergence results indeed justify calling $\hat\phi(\cdot, s, t)$ the maximum a posteriori estimator. The main novelty of the paper [1] was the awareness of the need to consider the convergence of $\hat B^N$ to $\hat B$. The relations between these properties are gathered in the following diagram:
[Diagram: the objects B, $\hat B^N$, $\hat B$ and the flows $\phi_N(\cdot, s, t)$, $\phi(\cdot, s, t)$, $\tilde\phi(\cdot, s, t)$, $\hat\phi(\cdot, s, t)$, connected by arrows labelled Approximation, Convergence, Regularization, Renormalization, MAP, SDE and ODE.]
Observe that the discretizations done in this section are only an intermediate step in the theoretical analysis. However, the differential equation for the flow $\tilde\phi(\cdot, s, t)$ will typically be discretized for the numerical implementation. Moreover, working directly with the covariance function a(x, y) instead of the functions $f_n(x)$, we can circumvent the infinite sum over n ∈ N in Eq. (5). For these results, a specific algorithm for the landmark matching problem and a discussion of the choice of the functions b(x) and a(x, y), see [1].
4 Interpretation of energy minimization approaches
The theory of differential geometry gives that for every diffeomorphism φ there exists a velocity field $v : \mathbb R^d \times [0, T] \to \mathbb R^d$ such that $\phi(x) = \phi(x, 0, T)$ for the solution to the transport equation
$$\phi(x, s, t) = x + \int_s^t v\big(\phi(x, s, u), u\big)\, du. \qquad (6)$$
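Numerically, the transport equation (6) can be solved by a forward Euler scheme, and the flow property φ(x, s, t) = φ(φ(x, s, r), r, t) is inherited by the discretization when the time grids match. A minimal sketch (NumPy; the rotation field is just a toy choice of v):

```python
import numpy as np

def flow(x, v, s, t, steps=1000):
    """Euler discretization of Eq. (6): phi(x,s,t) = x + int_s^t v(phi(x,s,u), u) du."""
    du = (t - s) / steps
    phi, u = np.array(x, dtype=float), s
    for _ in range(steps):
        phi = phi + v(phi, u) * du
        u += du
    return phi

# Toy velocity field: rigid rotation, v(x, t) = R x with R skew-symmetric,
# so phi(x, 0, t) rotates x by the angle t.
R = np.array([[0.0, -1.0], [1.0, 0.0]])
v = lambda x, t: R @ x

x0 = np.array([1.0, 0.0])
full = flow(x0, v, 0.0, np.pi / 2, steps=1000)
half = flow(flow(x0, v, 0.0, np.pi / 4, steps=500), v, np.pi / 4, np.pi / 2, steps=500)
```

Because the two half-flows use the same step size as the full flow, the composed result reproduces the flow property up to floating point, while the endpoint approaches the exact quarter rotation of x0 as the step count grows.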
An alternative approach for warp estimation is to define some energy functional over velocity fields and choose the minimal energy velocity field fitting the landmarks, cf. [4] and [2]. Given some differential operator L, e.g. the Laplacian, the paper [4] proposes to use the energy functional
$$E(v) = \int_0^T \int_{\mathbb R^d} \|L\, v(x, t)\|^2\, dx\, dt.$$
For vanishing $f_0(x)$ the velocity field corresponding to the landmark paths $\{\tilde\phi(x_l, 0, t)\}_{l \in L}$ is given by
$$v(x, t) = \sum_{k,l \in L} a\big(x, \tilde\phi(x_l, 0, t)\big)\, \Big[\big\{a\big(\tilde\phi(x_k, 0, t), \tilde\phi(x_l, 0, t)\big)\big\}_{k,l \in L}\Big]^{-1}_{lk}\, \frac{\partial\tilde\phi(x_k, 0, t)}{\partial t}.$$
If the infinitesimal covariance a(x, y) equals the Green's function for the square of the differential operator L, i.e. $L^* L\, a(x, y) = \delta_{x=y}$, then
$$L^* L\, v(x, t) = \sum_{k,l \in L} \delta_{x = \tilde\phi(x_l, 0, t)}\, \Big[\big\{a\big(\tilde\phi(x_k, 0, t), \tilde\phi(x_l, 0, t)\big)\big\}_{k,l \in L}\Big]^{-1}_{lk}\, \frac{\partial\tilde\phi(x_k, 0, t)}{\partial t}$$
and it is shown in [1] that the energy functional $E(\tilde v)$ equals minus the logarithm of the renormalized Brownian density for $\tilde B$. In this way the maximum a posteriori estimator equals the minimal energy estimator. This result gives 1) a probabilistic interpretation of the energy minimization methods proposed in [4] and [2], and 2) the "statistical equivalence" of the energy minimization methods to the Bayesian approach proposed in [7].
Acknowledgement: I am grateful to Peter Johansen and Mads Nielsen for
introducing me to these intriguing problems.
References
[1] B. Markussen, “A Statistical Approach to Large Deformation Diffeomorphisms”, second workshop on Generative Model Based Vision at the 22nd IEEE conference on
Computer Vision and Pattern Recognition, Washington D.C., 2004.
[2] V. Camion and L. Younes, “Geodesic Interpolating Splines,” Third International
Workshop on Energy Minimization Methods in Computer Vision and Pattern Recognition, EMMCVPR 2001, LNCS 2134, pp. 513–527, 2001.
[3] A. D. Jackson, B. Lautrup, P. Johansen, and M. Nielsen, “Products of Random Matrices,” Phys. Rev. E, Vol. 66, article 66124, 2002.
[4] S. C. Joshi and M. I. Miller, “Landmark Matching via Large Deformation Diffeomorphisms,” IEEE Trans. Image Processing, Vol. 9, pp. 1357–1370, 2000.
[5] H. Kunita, Stochastic Flows and Stochastic Differential Equations, Cambridge University Press, 1990.
[6] M. I. Miller and L. Younes, “Group Actions, Homeomorphisms, and Matching: A
General Framework,” Int. J. Computer Vision, Vol. 41, pp. 61–84, 2002.
[7] M. Nielsen, P. Johansen, A. D. Jackson, and B. Lautrup, “Brownian Warps: A Least
Committed Prior for Non-rigid Registration,” Fifth International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI 2002, LNCS
2489, pp. 557–564, 2002.
[8] C. J. Twining and S. Marsland, “Constructing Diffeomorphic Representations of Non-rigid Registrations of Medical Images,” 18th International Conference on Information Processing in Medical Imaging, IPMI 2003, LNCS 2732, pp. 413–425, 2003.
Probabilistic Model-based Background
Subtraction
J. Anderson1 , T. Prehn1 , V. Krüger1 , and A. Elgammal2
1
Aalborg University Esbjerg
Niels Bohrs Vej 8
6700 Esbjerg
Denmark
2
Dept. of Computer Science
Rutgers, the State University of New Jersey
110 Frelinghuysen Road
Piscataway, NJ 08854-8019
USA
Abstract. Usually, background subtraction is approached as a pixel-based process: the output is a (possibly thresholded) image where each pixel reflects, independently of its neighboring pixels, the likelihood of belonging to a foreground object. What this neglects is the correlation between pixels. In this paper we introduce a model-based background subtraction approach which incorporates prior knowledge of pixel correlations for clearer and better results. Model knowledge is learned from suitable training video data, and the data is stored for fast access in a hierarchical manner. Bayesian propagation over time is used for proper model selection and tracking during model-based background subtraction. Bayesian propagation is attractive in our application as it allows us to deal with uncertainties during tracking. The system runs in real time and was extensively tested on suitable outdoor video data.
1 Introduction
Companies and scientists work on vision systems that are expected to work in real-world scenarios. Car companies work, e.g., on road-sign and pedestrian detection, and, due to the threat of terrorism, biometric vision systems and surveillance applications are under development.
All these approaches work well in controlled environments; e.g., attempts to recognize humans by their face and gait have proven very successful in a lab environment. However, in uncontrolled environments, such as outdoor scenarios, the approaches fail disgracefully; e.g., gait recognition rates drop from 99% to merely 30%. This is mainly due to low-quality video data, the often small number of pixels on target, and visual distractors such as shadows and strong illumination variations.
What is needed are special feature extraction techniques that are robust to outdoor distortions and that can cope with low-quality video data. One of the most common feature extraction techniques in surveillance applications is background subtraction (BGS) [5; 3; 9; 6]. BGS approaches assume a stable camera. They are able to learn a background as well as possible local image variations of it, thus generating a background model even of non-rigid background objects. During application the model is compared with novel video images, and pixels are marked according to the belief that they fit the background model.
Generally, BGS methods have the following drawbacks:
1. BGS techniques are able to detect “interesting” image areas, i.e., image areas that are sufficiently different from the learned background model. Thus, BGS approaches are able to detect, e.g., a person and the shadow that he/she casts. However, BGS approaches are not able to distinguish between a foreground object and its shadow.
2. Very often, the same object causes a different output when the scenario changes: e.g., the BGS output for a person walking on green grass or gray concrete may be different.
In this paper we present a Model-based Background Subtraction (MBGS) method that learns not only a background model but also a foreground model. The “classical” background subtraction detects the regions of interest, while the foreground models are applied to the classical BGS output to “clean up” possible noise. To reach a maximum of robustness, we omit any thresholding and apply statistical matching techniques to the likelihood measures of the BGS. Having the recent gait recognition attempts in mind, we have applied the following limitations to our discussion (however, the methodology is general enough that we do not see any loss of generality in these limitations):
1. We consider only humans as objects and ignore objects that look different
from humans.
2. We limit our discussion to silhouettes of humans as they deliver a fairly
clothing-independent description of an individual.
The Model-based Background Subtraction System (MBGS System) consists
of a learning part to learn possible foreground objects and an MBGS part, where
the output of a classical BGS is verified using the previously trained foreground
object knowledge.
Learning and representing the foreground knowledge (here silhouettes of humans) is not straightforward due to the absence of a suitable vector space. One possibility is to describe the data in a hierarchical manner, using a suitable metric and a suitable representation of the dynamics between the silhouettes. We interpret the silhouettes as probability density functions S(X, Y) where each pixel describes the likelihood of belonging either to foreground or to background. To compare these “probabilistic” silhouettes we use the Kullback-Leibler distance; k-means clustering is used to cluster similar ones. Similar approaches for hierarchical contour representation can be found in [4; 14].
In the second part, we again consider the contours as densities over spatial
coordinates and use normalized correlation to compute the similarity between
the silhouette density and the one computed in the input image. Tracking and silhouette selection are done using Bayesian propagation over time. Bayesian propagation can be applied directly since we are dealing with densities, and it has the advantage that it considers the uncertainty in the estimation of the tracking parameters and the silhouette selection. The densities in the Bayesian propagation are approximated using an enhancement of the well-known Condensation method [7]. A similar enhancement of the Condensation method has been applied in video-based face recognition [12].
The remainder of this paper is organized as follows: In Sec. 2 we introduce the learning approaches. The actual BGS method is discussed in Sec. 3. We discuss our experimental results in Sec. 4 and conclude with final remarks in Sec. 5.
2 Learning and Representation of Foreground Objects
In order to make use of foreground model knowledge, the MBGS system needs
to be able to:
– learn a model representation for possibly a number of different and non-rigid
objects from video data and
– quickly access the proper model information during application of the MBGS.
Our main idea is the following: Apply the classical BGS to a scenario that is controlled in a manner that facilitates the learning process. In our case, since we want to learn silhouettes of humans, this means that only humans are visible in the scene during training and that the background variations are kept as small as possible to minimize distortions. Then, use this video data to learn the proper model knowledge.
After the application of a classical BGS, mean-shift tracking [1] allows us to extract from the BGS output data a sequence of small image patches containing the silhouette, centered. This procedure is the same as the one presented in [10], however with the difference that here we do not threshold the BGS output but use probabilistic silhouettes (instead of binary ones as in [10]), which still contain for each silhouette pixel the belief of being a foreground pixel.
To organize these data we first normalize the exemplars w.r.t. scale and position. Then we use, similar to [4], a combination of tree structuring and k-means clustering. We use a top-down approach: The first level is the root of the hierarchy, which contains all the exemplars. The second level is constructed by using k-means clustering to cluster the exemplars from the root. The third level is constructed by clustering each cluster from the second level, again using k-means; see Fig. 1 for an example. The k-means clustering uses the Kullback-Leibler divergence measure, which measures the similarity between two density functions p and q:

  KLDist(p, q) = \int p(x) \log \frac{p(x)}{q(x)} \, dx.   (1)
KLDist(p, q) is non-negative, and zero only if p and q coincide.
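For discretized silhouette images, Eq. (1) becomes a sum over pixels. A small sketch; the floor value that guards against log-of-zero on empty pixels is our assumption, not a detail given in the paper.

```python
import numpy as np

def kl_dist(p, q, eps=1e-12):
    """Discrete Kullback-Leibler distance between two silhouette densities."""
    p = np.clip(np.asarray(p, dtype=float), eps, None)
    q = np.clip(np.asarray(q, dtype=float), eps, None)
    p, q = p / p.sum(), q / q.sum()                # renormalize after clipping
    return float(np.sum(p * np.log(p / q)))

# Non-negative, and zero exactly when the two densities coincide.
a = np.array([[0.2, 0.3],
              [0.1, 0.4]])
b = np.full((2, 2), 0.25)
```

Note that the distance is not symmetric in p and q, which is why the paper only requires a (not necessarily symmetric) metric space.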
Fig. 1. An example of our clustering approach: 30 exemplars with K = 3; the algorithm stops after reaching 3 levels.
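The top-down construction (root holds all exemplars; each level re-clusters every node with k-means) can be sketched as below. For brevity this sketch clusters flattened silhouettes with plain Euclidean k-means, which stands in for the KL-based clustering; substituting the Kullback-Leibler distance of Eq. (1) changes only the distance and center computations.

```python
import numpy as np

def kmeans(data, k, iters=20, seed=0):
    """Plain Lloyd k-means (Euclidean stand-in for the KL-based clustering)."""
    rng = np.random.default_rng(seed)
    centers = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(iters):
        dist = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = data[labels == j].mean(axis=0)
    return labels, centers

def build_tree(data, k=3, depth=2):
    """Top-down hierarchy: each node stores its center and re-clustered children."""
    node = {"center": data.mean(axis=0), "size": len(data), "children": []}
    if depth == 0 or len(data) <= k:
        return node
    labels, _ = kmeans(data, k)
    for j in range(k):
        members = data[labels == j]
        if len(members) > 0:
            node["children"].append(build_tree(members, k, depth - 1))
    return node

# 30 hypothetical flattened 8x8 silhouettes, K = 3, three levels (root + 2).
exemplars = np.random.default_rng(42).random((30, 64))
tree = build_tree(exemplars, k=3, depth=2)
```

The per-node centers play the role of the prototypes that direct the search to finer levels.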
See Fig. 2 for an example clustering of a training sequence of a single individual (as training data, we chose here non-optimal data for better visualization).
Fig. 2. The five images show the cluster centers computed from a training sequence of a single individual.
The tree structure facilitates a fast search of exemplars along the tree vertices, and the cluster centers are either used to apply MBGS on a coarse level or used as prototypes, as in [4], to direct the search to a finer level in the hierarchy. Once the tree is constructed, we generate a Markov transition matrix:
Assuming that the change over time from one silhouette to the next can be understood as a first-order Markov process, the Markov transition matrix M^l_{ij} describes the transition probability of silhouette s_j following silhouette s_i at level l in the hierarchy. During MBGS application, particle filtering [8; 11; 2] will be used to find the proper silhouette (see Sec. 3). The propagation of silhouettes over time (see Sec. 3) is non-trivial, as silhouettes do not form a vector space. However, what is sufficient is a (not necessarily symmetric) metric space, i.e., given a silhouette s_i, only silhouettes that are close according to a given (not necessarily symmetric) metric need to be considered. In the tree structure similar contours are clustered, which facilitates the propagation process. The Markov transition matrix M_{ij}, on the other hand, directly describes the transition likelihoods between clusters.
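Estimating such a transition matrix from training sequences reduces to counting consecutive silhouette (or cluster) indices and normalizing the rows; a minimal sketch with hypothetical index sequences:

```python
import numpy as np

def transition_matrix(sequences, n_states):
    """Row-normalized first-order Markov counts: M[i, j] = P(s_j follows s_i)."""
    M = np.zeros((n_states, n_states))
    for seq in sequences:
        for a, b in zip(seq[:-1], seq[1:]):        # consecutive silhouette indices
            M[a, b] += 1.0
    rows = M.sum(axis=1, keepdims=True)
    rows[rows == 0.0] = 1.0                        # unseen states keep zero rows
    return M / rows

# Hypothetical training index sequences over three clusters.
M = transition_matrix([[0, 1, 2, 1], [1, 2, 0]], n_states=3)
```

Marginalizing a row over the silhouettes of a cluster gives the cluster-selection likelihood used later during propagation.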
3 Applying Background Subtraction and Recognizing Foreground Objects
The MBGS system is built as an extension to a pixel-based BGS approach. It uses foreground models to define likely correlations between neighboring pixels in the output P(x) of the BGS application.
Each pixel in the image P (x) contains a value in the range [0, 1], where 1
indicates the highest probability of a pixel being a foreground pixel. A model in
the hierarchy can be chosen and deformed according to a 4-D vector
  θ = [i, s, x, y],   (2)
where x and y denote the position of the silhouette in the image P , s its scale,
and i is a natural number that refers to a silhouette in the hierarchy.
The “matching” is done by normalized correlation between a model silhouette density, parameterized according to a deformation vector θt, and the appropriate region of interest in the BGS image Pt(x), appropriately normalized.
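One plausible reading of this score (our interpretation of "appropriately normalized", which the text does not spell out) is zero-mean, unit-norm correlation between the deformed model density and the ROI:

```python
import numpy as np

def normalized_correlation(model, roi, eps=1e-12):
    """Zero-mean, unit-norm correlation between a silhouette density and an ROI."""
    m = model - model.mean()
    r = roi - roi.mean()
    denom = np.sqrt((m * m).sum() * (r * r).sum()) + eps
    return float((m * r).sum() / denom)

patch = np.random.default_rng(0).random((32, 16))
same = normalized_correlation(patch, patch)        # perfect match
flipped = normalized_correlation(patch, -patch)    # inverted pattern
```

The score lies in [-1, 1] and is invariant to affine intensity changes of either input, which makes it usable directly as a likelihood ingredient.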
In order to find at each time step t the most likely θt in the image Pt(x), we use Bayesian propagation over time:

  p(θ_t | P_1, P_2, ..., P_t) ≡ p_t(α_t, i_t) = p(P_t | α_t, i_t) \sum_{i_{t-1}} \int_{α_{t-1}} p(α_t, i_t | α_{t-1}, i_{t-1}) \, p_{t-1}(α_{t-1}, i_{t-1}) \, dα_{t-1}   (3)

with αt = [s, x, y]t. The probability images are denoted by capital "Pt" while lowercase "p" denotes density functions: p(Pt | αt, it) denotes the likelihood of the observation Pt given the parameters αt and it, while pt(αt, it) denotes the prior at time t. We approximate the posterior density p(θt | P1, P2, ..., Pt) with a sequential Monte Carlo method [2; 7; 11; 13].
Using Bayesian propagation allows us to take into account the uncertainty in the estimated parameters.
Monte Carlo methods use random samples for the approximation of a density
function. Our MBGS system uses separate sample sets for each object in the
input image. A new sample set is constructed every time a new object in the
video image matches sufficiently well.
As the diffusion density p(αt, it | αt−1, it−1) in Eq. 3 we use the Brownian motion model, due to the absence of a better one. For the propagation of the position and scale parameters x, y, and s, this is straightforward. For the propagation of the silhouette we use the following strategy: The likelihood for selecting a silhouette from a certain silhouette cluster in the hierarchy is computed from the Markov transition matrix M by marginalizing over the silhouettes in that particular cluster. Within a cluster, the new silhouette is then chosen randomly. The reason for this is that our training data is too limited, so that the Markov transition matrix M appeared to be specific to the training videos.
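The propagation of Eq. (3) with a Condensation-style sequential Monte Carlo filter amounts to a resample, diffuse, reweight loop. Below is a stripped-down sketch over the continuous part αt = [s, x, y] only (the silhouette index it and the cluster-marginal selection are omitted), with a hypothetical likelihood standing in for the normalized-correlation score.

```python
import numpy as np

rng = np.random.default_rng(1)

def particle_step(particles, weights, likelihood, noise_scale=0.05):
    """One Condensation-style step: resample by weight, Brownian diffusion, reweight."""
    n = len(particles)
    idx = rng.choice(n, size=n, p=weights)                     # resampling
    moved = particles[idx] + rng.normal(0.0, noise_scale, particles[idx].shape)
    w = np.array([likelihood(a) for a in moved])               # p(P_t | alpha_t)
    return moved, w / w.sum()

# Hypothetical likelihood peaked at alpha* = (scale, x, y) = (1.0, 0.5, 0.5).
target = np.array([1.0, 0.5, 0.5])
like = lambda a: np.exp(-20.0 * np.sum((a - target) ** 2))

particles = rng.uniform(0.0, 1.5, size=(500, 3))
weights = np.full(500, 1.0 / 500)
for _ in range(30):
    particles, weights = particle_step(particles, weights, like)
estimate = weights @ particles                                  # posterior mean
```

The Gaussian noise added after resampling plays the role of the Brownian diffusion density above.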
4 Experiments
In this section we present the experimental results with our MBGS implementation. We have carried out a large number of experiments with both online and offline video data. The experiments have shown that the drawbacks of the classical BGS approach that were mentioned in Section 1 can be remedied with MBGS:
1. Because shadows are not part of the model information, they are classified as background by the implemented MBGS approach. In fact, most non-model object types will be classified as background, and therefore MBGS allows for effective object-type filtering.
2. The output presented by the MBGS does not vary even when the scenario changes significantly, and objects are always detected completely even when significantly covered up by other objects such as bushes or lamp posts.
The verification is done systematically by comparing the AUE MBGS system with two previously developed pixel-based state-of-the-art background subtraction approaches: one is the non-parametric approach developed at Maryland (UMD BGS) [3]; the second one (AUE BGS) was previously developed at AUE and utilizes an alternative image noise filtering approach. The comparison is done by eyesight due to the absence of ground truth. We have tested live feeds as well as videos from a database of 50 videos with different scenarios such as parking-lot scenes and park scenes. The videos contained both side views and frontal views of the person.3 In Figure 3 one sees a scenario in one of the test movies with a pedestrian walking behind trees, thereby at times being occluded. The output of the two pixel-based approaches is shown in the lower part of the figure. One can notice that the shadow cast by the pedestrian is classified as foreground by the pixel-based approaches but as background by the model-based approach.
Figure 4 shows the same scenario, but this time a scene is shown where the pedestrian is heavily occluded. The occlusion causes the pedestrian to more or less disappear when the pixel-based approaches are used.
The scenario in Figure 5 (see also the provided video) shows two pedestrians walking towards each other, thereby crossing behind lamp posts and a statue. When processing this scenario, a combination of image filtering and background variation appears, in which the silhouettes of the pixel-based approaches are too heavily distorted to still be identified as a person. Also, both pixel-based approaches severely distort the contours of the pedestrians. By only inspecting the pixel-based results, it is hard to tell that the foreground objects are actually pedestrians.
The system has been tested on a 2 GHz Pentium with Linux. We use RGB videos with an image size of 320 × 240 pixels. With only a single person to track, the system needs, with 350 particles, ≈ 50 ms/frame: 25 ms/frame were used by the classical BGS and 25 ms/frame for the matching.
3 We have provided the most representative videos that document the limits of our approach.
Fig. 3. Comparison of BGS approaches on the shadow issue.
5 Conclusion
The presented model-based background subtraction system combines classical background subtraction with model knowledge of foreground objects. The model knowledge is applied not to a binary BGS image but to the “likelihood image”, i.e. an image where each pixel value represents a confidence of belonging either to the foreground or the background. This approach considerably increases robustness, as these likelihoods can also be understood as uncertainties, which is exploited for the tracking and silhouette selection process. Also, the propagation of densities avoids the need for selecting thresholds (e.g. for binarization of the image P) or for maximization. Thresholds are only used for visualization purposes and otherwise for the detection of a new human in the field of view.
In the above application we have chosen silhouettes of humans, but we believe that this choice is without loss of generality, since even different object types fit into the tree structure.
The presented experiments were carried out with only a single individual in the database. We have also experimented with different individuals (and thus varying contours), but the output was unstable w.r.t. the choice of the individual. This is under further investigation, and the use of our approach for gait recognition is future research.
Fig. 4. Comparison of BGS approaches under heavy occlusion.
References
1. Dorin Comaniciu, Visvanathan Ramesh, and Peter Meer. Real-time tracking of
non-rigid objects using mean shift. In Proc. IEEE Conf. on Computer Vision and
Pattern Recognition, volume 2, pages 142–149, Hilton Head Island, SC, June 13-15,
2000.
2. A. Doucet, S. Godsill, and C. Andrieu. On sequential Monte Carlo sampling methods for Bayesian filtering. Statistics and Computing, 10:197–209, 2000.
3. A. Elgammal and L. Davis. Probabilistic framework for segmenting people under
occlusion. In ICCV, ICCV01, 2001.
4. D. Gavrila and V. Philomin. Real-time object detection for “smart” vehicles. In Proc. Int. Conf. on Computer Vision, pages 87–93, Corfu, Greece, 1999.
5. I. Haritaoglu, D. Harwood, and L. Davis. W4S: A real-time system for detecting and tracking people in 2.5D. In Proc. European Conf. on Computer Vision, Freiburg, Germany, June 1-5, 1998.
Fig. 5. Comparison of BGS approaches on the low-contrast issue.
6. T. Horprasert, D. Harwood, and L.S. Davis. A statistical approach for real-time
robust background subtraction and shadow detection. In Proceedings of IEEE
ICCV’99 FRAME-RATE Workshop, 1999.
7. M. Isard and A. Blake. Condensation – conditional density propagation for visual
tracking. Int. J. of Computer Vision, 1998.
8. M. Isard and A. Blake. Condensation – conditional density propagation for visual
tracking. Int. J. of Computer Vision, 29:5–28, 1998.
9. Yuri A. Ivanov, Aaron F. Bobick, and John Liu. Fast lighting independent background subtraction. Int. J. of Computer Vision, 37(2):199–207, 2000.
10. A. Kale, A.N. Rajagopalan, N. Cuntoor, V. Krueger, and R. Chellappa. Identification of humans using gait. IEEE Trans. Image Processing, 2004, to be published.
11. G. Kitagawa. Monte Carlo filter and smoother for non-Gaussian nonlinear state space models. J. Computational and Graphical Statistics, 5:1–25, 1996.
12. V. Krueger and S. Zhou. Exemplar-based face recognition from video. In Proc.
European Conf. on Computer Vision, Copenhagen, Denmark, June 27-31, 2002.
13. J.S. Liu and R. Chen. Sequential Monte Carlo methods for dynamic systems. Journal of the American Statistical Association, 93:1031–1041, 1998.
14. K. Toyama and A. Blake. Probabilistic tracking in a metric space. In Proc. Int.
Conf. on Computer Vision, volume 2, pages 50–59, Vancouver, Canada, 9-12 July,
2001.
[Pages 48–55: "Some Transitions of Extrema-Based Multi-Scale Singularity Trees", K. Somchaipeng, J. Sporring, S. Kreiborg, P. Johansen. The text of this paper is not recoverable in this copy owing to a corrupted font encoding.]
A Test Statistic in the Complex Wishart Distribution
and Its Application to Change Detection in
Polarimetric SAR Data
Knut Conradsen, Allan Aasbjerg Nielsen, Jesper Schou, and Henning Skriver
Abstract—When working with multilook fully polarimetric synthetic aperture radar (SAR) data, an appropriate way of representing the backscattered signal consists of the so-called covariance
matrix. For each pixel, this is a 3 × 3 Hermitian positive definite
matrix that follows a complex Wishart distribution. Based on this
distribution, a test statistic for equality of two such matrices and an
associated asymptotic probability for obtaining a smaller value of
the test statistic are derived and applied successfully to change detection in polarimetric SAR data. In a case study, EMISAR L-band
data from April 17, 1998 and May 20, 1998 covering agricultural
fields near Foulum, Denmark are used. Multilook full covariance
matrix data, azimuthal symmetric data, covariance matrix diagonal-only data, and horizontal–horizontal (HH), vertical–vertical
(VV), or horizontal–vertical (HV) data alone can be used. If applied
to HH, VV, or HV data alone, the derived test statistic reduces to
the well-known gamma likelihood-ratio test statistic. The derived
test statistic and the associated significance value can be applied as
a line or edge detector in fully polarimetric SAR data also.
Index Terms—Covariance matrix test statistic, EMISAR, radar
applications, radar polarimetry, remote sensing change detection.
I. INTRODUCTION
DUE TO ITS all-weather mapping capability independently of, for instance, cloud cover, synthetic aperture
radar (SAR) data hold a strong potential, for example, for
change detection studies in remote sensing applications. In
this paper, multitemporal SAR images of agricultural fields
are used to demonstrate a new change detection method for
polarimetric SAR data. It is well known that the development
of different crops over time causes changes in the backscatter.
The radar backscattering is sensitive to the dielectric properties
of the vegetation and the soil, to the plant structure (i.e., the
size, shape, and orientation distributions of the scatterers), to
the surface roughness, and to the canopy structure (e.g., row
direction and spacing and cover fraction) [1], [2].
The polarimetric SAR measures the amplitude and phase
of backscattered signals in four combinations of the linear
receive and transmit polarizations: horizontal–horizontal (HH),
Manuscript received June 8, 2001; revised October 17, 2002. This work was
supported in part by The Danish National Research Councils in part under the
Earth Observation Programme and in part under the European Space Agency
Follow-On Research Programme. The EMISAR data acquisitions and part of the
data processing were supported by the Danish National Research Foundation.
K. Conradsen and A. A. Nielsen are with IMM, Informatics and Mathematical Modelling, Technical University of Denmark, DK-2800 Lyngby, Denmark
(e-mail: [email protected]).
J. Schou and H. Skriver are with EMI, Section of Electromagnetic Systems,
Technical University of Denmark (Ørsted DTU), DK-2800 Lyngby, Denmark.
Digital Object Identifier 10.1109/TGRS.2002.808066
horizontal–vertical (HV), vertical–horizontal (VH), and vertical–vertical (VV). These signals form the complex scattering
matrix that relates the incident and the scattered electric fields
[3]. The inherent speckle in the SAR data can be reduced by
spatial averaging at the expense of loss of spatial resolution. In
this so-called multilook case, a more appropriate representation
of the backscattered signal is the covariance matrix in which
the average properties of a group of resolution cells can be
expressed in a single matrix. The average covariance matrix is
defined as [3]

$\langle C\rangle = \begin{bmatrix} \langle S_{hh}S_{hh}^{*}\rangle & \langle S_{hh}S_{hv}^{*}\rangle & \langle S_{hh}S_{vv}^{*}\rangle \\ \langle S_{hv}S_{hh}^{*}\rangle & \langle S_{hv}S_{hv}^{*}\rangle & \langle S_{hv}S_{vv}^{*}\rangle \\ \langle S_{vv}S_{hh}^{*}\rangle & \langle S_{vv}S_{hv}^{*}\rangle & \langle S_{vv}S_{vv}^{*}\rangle \end{bmatrix}$    (1)

where $\langle\cdot\rangle$ denotes ensemble averaging; $^{*}$ denotes complex conjugation; and $S_{rt}$ is the complex scattering amplitude for receive polarization $r$ and transmit polarization $t$ ($r$ and $t$ are either $h$ for horizontal or $v$ for vertical). Reciprocity, which normally applies to natural targets, gives $S_{hv} = S_{vh}$ (in the backscattering direction using the backscattering alignment convention [3]) and results in the covariance matrix (1) with rank 3. $\langle C\rangle$
the covariance matrix containing both co- and cross-polarized
scattering matrix elements often contain little information. For
randomly distributed targets with azimuthal symmetry, these elements are zero [5].
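The ensemble average in (1) is, in practice, a spatial average over looks. A minimal sketch of forming the multilook covariance matrix from single-look scattering amplitudes (the samples and the number of looks below are invented purely for illustration):

```python
import numpy as np

def multilook_covariance(s_hh, s_hv, s_vv):
    """Multilook covariance matrix (1): each argument is a 1-D array of
    complex single-look scattering amplitudes for one channel."""
    s = np.stack([s_hh, s_hv, s_vv])        # 3 x L matrix of scattering vectors
    # <s s^H>: average of the outer products over the L looks
    return (s @ s.conj().T) / s.shape[1]

# Invented single-look samples (L = 13 looks) purely for illustration.
rng = np.random.default_rng(0)
L = 13
s_hh, s_hv, s_vv = (rng.normal(size=L) + 1j * rng.normal(size=L)
                    for _ in range(3))
C = multilook_covariance(s_hh, s_hv, s_vv)  # 3 x 3, Hermitian positive definite
```

With at least three looks the estimate is generically of full rank, matching the rank-3 matrix discussed above.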
In this paper, a test statistic for equality of two complex covariance matrices and an associated asymptotic probability measure for obtaining a smaller value of the test statistic are derived and applied to change detection in fully polarimetric SAR
data. In [6], a change detection scheme based on canonical correlations analysis is applied to scalar EMISAR data (see also
[7]–[10]).
If used with HH, VV, or HV data only, the test statistic reduces to the well-known test statistic for equality of the scale
parameters in two gamma distributions.
The derived test statistic and the associated significance measure can be applied as a line or edge detector in fully polarimetric SAR data also [11].
Section II sketches important aspects of the complex
Gaussian and Wishart distributions, the likelihood-ratio test
statistic in the complex Wishart distribution, and the associated
significance measure. Section III gives a case study in which
data from the Danish airborne EMISAR [12], [13] are used.
Section IV discusses the results from the case study, and
Section V concludes.
II. THEORY

This section describes the complex normal and Wishart distributions and the likelihood-ratio test for equality of two complex Wishart matrices. For a more thorough description, see the Appendix.

A. Complex Normal Distribution

We say that a $p$-dimensional random complex vector $X$ follows a complex multivariate normal distribution with mean $\mu$ and dispersion matrix $\Sigma$, i.e.,

$X \in N_C(\mu, \Sigma)$    (2)

if the frequency function is

$f(x) = \frac{1}{\pi^{p}|\Sigma|}\exp\left(-(x-\mu)^{H}\Sigma^{-1}(x-\mu)\right)$    (3)

where $|\cdot|$ denotes the determinant; $\mathrm{tr}$ denotes the trace of a matrix; and $^{H}$ denotes complex conjugation ($^{*}$) and transpose ($^{T}$).

B. Complex Wishart Distribution

We say that a Hermitian positive definite random $p \times p$ matrix $X$ follows a complex Wishart distribution, i.e.,

$X \in W_C(p, n, \Sigma)$    (4)

if the frequency function is

$f_X(X) = \frac{1}{\Gamma_p(n)}\frac{|X|^{n-p}}{|\Sigma|^{n}}\exp\left(-\mathrm{tr}(\Sigma^{-1}X)\right)$    (5)

where

$\Gamma_p(n) = \pi^{p(p-1)/2}\prod_{j=1}^{p}\Gamma(n-j+1).$    (6)

The frequency function is defined for $X$ positive definite. If $X$ and $Y$ are independent and both follow complex Wishart distributions

$X \in W_C(p, n, \Sigma) \quad \text{and} \quad Y \in W_C(p, m, \Sigma)$    (7)

then their sum also follows a complex Wishart distribution

$X + Y \in W_C(p, n+m, \Sigma).$    (8)

C. Test for Equality of Two Complex Wishart Matrices

Let the independent $p \times p$ Hermitian positive definite matrices $X$ and $Y$ be complex Wishart distributed, i.e., $X \in W_C(p, n, \Sigma_x)$ and $Y \in W_C(p, m, \Sigma_y)$. We consider the null hypothesis $H_0: \Sigma_x = \Sigma_y$, which states that the two matrices are equal, against the alternative hypothesis $H_1: \Sigma_x \neq \Sigma_y$.

In general, suppose that the observations on which we shall base our test have joint density $f(z; \theta)$, where $\theta$ is the set of parameters of the probability function that has generated the data. Then $H_0$ states that $\theta \in \Theta_0$, where $\Theta_0$ is a subset of the set $\Theta$ of all possible $\theta$; $H_1$ states that $\theta \in \Theta_1$, where $\Theta_0$ and $\Theta_1$ are disjoint, and often $\Theta_1 = \Theta \setminus \Theta_0$. The likelihood ratio

$Q = \frac{\max_{\theta \in \Theta_0} L(\theta)}{\max_{\theta \in \Theta} L(\theta)}$    (9)

where $L$ is the likelihood function, rejects $H_0$ for small values.

If $H_0$ is true (in statistical parlance "under $H_0$"), then in our case $\Sigma_x = \Sigma_y = \Sigma$ with $X + Y \in W_C(p, n+m, \Sigma)$. The likelihood-ratio test statistic becomes

$Q = \frac{L(\hat{\Sigma})}{L(\hat{\Sigma}_x, \hat{\Sigma}_y)}.$    (10)

Here

$L(\Sigma_x, \Sigma_y) = f(X; \Sigma_x)\, f(Y; \Sigma_y)$    (11)

and the ML estimates are $\hat{\Sigma}_x = X/n$ and $\hat{\Sigma}_y = Y/m$ in the full parameter set and $\hat{\Sigma} = (X+Y)/(n+m)$ under $H_0$; note that $\hat{\Sigma}^{-1}(X+Y) = (n+m)I$, where $I$ is the identity matrix, and that similar expressions are valid for $\hat{\Sigma}_x^{-1}X$ and $\hat{\Sigma}_y^{-1}Y$. For the numerator of $Q$ we get

$L(\hat{\Sigma}) = \frac{|X|^{n-p}|Y|^{m-p}}{\Gamma_p(n)\Gamma_p(m)}\frac{(n+m)^{p(n+m)}}{|X+Y|^{n+m}}\, e^{-p(n+m)}$    (12)

and for the denominator

$f(X; \hat{\Sigma}_x) = \frac{|X|^{n-p}}{\Gamma_p(n)}\frac{n^{pn}}{|X|^{n}}\, e^{-pn}$    (13)

and

$f(Y; \hat{\Sigma}_y) = \frac{|Y|^{m-p}}{\Gamma_p(m)}\frac{m^{pm}}{|Y|^{m}}\, e^{-pm}.$    (14)

This leads to the desired likelihood-ratio test statistic

$Q = \frac{(n+m)^{p(n+m)}}{n^{pn} m^{pm}}\frac{|X|^{n}|Y|^{m}}{|X+Y|^{n+m}}.$    (15)

If $n = m$, which is typically the case for change detection, we get

$Q = 2^{2pn}\frac{|X|^{n}|Y|^{n}}{|X+Y|^{2n}}.$    (16)

If

$\rho = 1 - \frac{2p^{2}-1}{6p}\left(\frac{1}{n} + \frac{1}{m} - \frac{1}{n+m}\right)$    (17)

and

$\omega_2 = -\frac{p^{2}}{4}\left(1 - \frac{1}{\rho}\right)^{2} + \frac{p^{2}(p^{2}-1)}{24}\left(\frac{1}{n^{2}} + \frac{1}{m^{2}} - \frac{1}{(n+m)^{2}}\right)\frac{1}{\rho^{2}}$    (18)

then the probability of finding a smaller value of $-2\rho\ln Q$ is

$P\{-2\rho\ln Q \le z\} \simeq P\{\chi^{2}(p^{2}) \le z\} + \omega_2\left[P\{\chi^{2}(p^{2}+4) \le z\} - P\{\chi^{2}(p^{2}) \le z\}\right].$    (19)

For covariance matrix data, $p = 3$. For HH, HV, or VV data, $p = 1$. In the latter case, $X$ and $Y$ are therefore scalars $x$ and $y$, and $Q$ reduces to

$Q = \frac{(n+m)^{n+m}}{n^{n} m^{m}}\frac{x^{n} y^{m}}{(x+y)^{n+m}}$    (20)

which is equivalent to the well-known likelihood-ratio test statistic for the equality of two gamma parameters [14], [15] (see the Appendix).

Fig. 1 shows $\rho$ and $\omega_2$ as functions of the number of looks for $n = m$ and $p = 3$.

D. Azimuthal Symmetry

By swapping first rows and then columns two and three in $\langle C\rangle$ in (1), we obtain in the azimuthal symmetry case

$X = \begin{bmatrix} X_1 & 0 \\ 0 & X_2 \end{bmatrix}$    (21)

where $X_1$ is $p_1 \times p_1$ (here $2 \times 2$) and $X_2$ is $p_2 \times p_2$ (here $1 \times 1$). This matrix is not Wishart distributed. We now consider $X_1 \in W_C(p_1, n, \Sigma_{x1})$, $X_2 \in W_C(p_2, n, \Sigma_{x2})$, $Y_1 \in W_C(p_1, m, \Sigma_{y1})$, and $Y_2 \in W_C(p_2, m, \Sigma_{y2})$, and we assume that $X_1$, $X_2$, $Y_1$, and $Y_2$ are mutually independent. We want to test the hypothesis $H_0: \Sigma_{x1} = \Sigma_{y1}$ and $\Sigma_{x2} = \Sigma_{y2}$ against all alternatives. We have the likelihood function

$L = f(X_1; \Sigma_{x1})\, f(X_2; \Sigma_{x2})\, f(Y_1; \Sigma_{y1})\, f(Y_2; \Sigma_{y2}).$    (22)

The likelihood-ratio test statistic becomes

$Q = Q_1 Q_2 = \frac{(n+m)^{p(n+m)}}{n^{pn} m^{pm}}\frac{|X|^{n}|Y|^{m}}{|X+Y|^{n+m}}$    (23)

where the latter equality is due to the fact that the determinant of a block diagonal matrix is the product of the determinants of the diagonal blocks, i.e., we get the same test statistic as in the full covariance matrix case. In this case $p = p_1 + p_2$. If

$f = p_1^{2} + p_2^{2}$    (24)

$\rho = 1 - \frac{2(p_1^{3} + p_2^{3}) - (p_1 + p_2)}{6f}\left(\frac{1}{n} + \frac{1}{m} - \frac{1}{n+m}\right)$    (25)

and

$\omega_2 = -\frac{f}{4}\left(1 - \frac{1}{\rho}\right)^{2} + \frac{p_1^{2}(p_1^{2}-1) + p_2^{2}(p_2^{2}-1)}{24}\left(\frac{1}{n^{2}} + \frac{1}{m^{2}} - \frac{1}{(n+m)^{2}}\right)\frac{1}{\rho^{2}}$    (26)

then the probability of finding a smaller value of $-2\rho\ln Q$ is

$P\{-2\rho\ln Q \le z\} \simeq P\{\chi^{2}(f) \le z\} + \omega_2\left[P\{\chi^{2}(f+4) \le z\} - P\{\chi^{2}(f) \le z\}\right].$    (27)
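To make the recipe behind (15)–(19) concrete, here is a small numerical sketch of the test in pure Python/NumPy. The example matrices and the 13-look setting are illustrative only, and the χ² CDF is computed from the standard lower incomplete-gamma series rather than a statistics library:

```python
import numpy as np
from math import exp, lgamma, log

def chi2_cdf(k, z):
    """P{chi^2(k) <= z} via the regularized lower incomplete gamma series."""
    if z <= 0:
        return 0.0
    a, x = k / 2.0, z / 2.0
    term = total = 1.0 / a
    n = 1
    while term > 1e-16 * total:
        term *= x / (a + n)
        total += term
        n += 1
    return exp(a * log(x) - x - lgamma(a)) * total

def wishart_change_test(X, Y, n, m):
    """Return -2*rho*ln Q of (15), (17) and the probability (19) of a
    smaller value; X and Y are the n- and m-look Wishart matrices."""
    p = X.shape[0]
    lndet = lambda A: np.linalg.slogdet(A)[1]
    lnq = (p * ((n + m) * log(n + m) - n * log(n) - m * log(m))
           + n * lndet(X) + m * lndet(Y) - (n + m) * lndet(X + Y))
    rho = 1 - (2 * p**2 - 1) / (6 * p) * (1 / n + 1 / m - 1 / (n + m))
    w2 = (-p**2 / 4 * (1 - 1 / rho)**2
          + p**2 * (p**2 - 1) / 24
          * (1 / n**2 + 1 / m**2 - 1 / (n + m)**2) / rho**2)
    z = -2 * rho * lnq
    prob = chi2_cdf(p**2, z) + w2 * (chi2_cdf(p**2 + 4, z) - chi2_cdf(p**2, z))
    return z, prob

# Illustrative 13-look example: no change versus a change in the HH power.
n = m = 13
Sx = np.diag([1.0, 0.2, 0.5])
z_same, p_same = wishart_change_test(n * Sx, m * Sx, n, m)
z_diff, p_diff = wishart_change_test(n * Sx, m * np.diag([2.0, 0.2, 0.5]), n, m)
```

For identical matrices, $Q = 1$ so the statistic is zero; the larger the change, the larger $-2\rho\ln Q$ and the larger the probability of a smaller value.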
Fig. 1. $\rho$ and $\omega_2$ as functions of the number of looks for $n = m$ and $p = 3$.
III. DATA
To illustrate the change detection capability of the derived
test statistic, EMISAR and ground data from an agricultural test
site at the Research Center Foulum located in Central Jutland,
Denmark are used. Agricultural fields have been selected for the
analysis because of the large change in the polarimetric properties for such areas, due to the development of the crops with
time. Polarimetric parameters of agricultural crops have previously been analyzed from this area [2].
A. SAR Data and Calibration
The EMISAR system is a fully polarimetric airborne
SAR system, and it operates at two frequencies: C-band
(5.3-GHz/5.7-cm wavelength) and L-band (1.25-GHz/24-cm
wavelength) [11], [12]. The SAR system is normally operated from an altitude of approximately 12 500 m; the spatial
resolution is 2 m $\times$ 2 m (one-look); the ground range swath
is approximately 12 km; and typical incidence angles range
from 35° to 60°. The processed data from this system are
fully calibrated by means of an advanced internal calibration
system. The radiometric calibration is better than 0.5 dB, and
the channel imbalance is less than 0.5 dB in amplitude and
5° in phase [12]. The cross-polarization contamination is
generally suppressed by more than 30 dB. The stability of the
system is very important in the change detection scheme set up
in this paper.
A large number of acquisitions with both the C- and L-band
polarimetric SAR from 1994 to 1999 exist for the agricultural
test site. To illustrate the change detection capability of the derived test statistic, L-band data from April 17, 1998 and May
Fig. 2. L-band EMISAR image of the test area acquired on April 17, 1998, 5210 m $\times$ 5120 m.
20, 1998 have been used. The two EMISAR images are shown
in Figs. 2 and 3, as color composites of the HH (green), HV (actually the complex addition of HV and VH, red), and VV (blue)
channels. The HH and VV channels are stretched linearly between 30 dB and 0 dB. The HV channel is stretched linearly
between 36 dB and 6 dB. The area is relatively flat, and corrections of the local incidence angle due to terrain slope are not
critical in this study, since the acquisition geometry for the two
acquisitions are almost identical, and therefore the correction
has not been carried out. The geometrical coregistration is, however, very important in a change detection application, where
two images are compared on a pixel-by-pixel basis. The polarimetric images were registered to a digital elevation model generated from interferometric airborne data acquired by EMISAR.
The registration was carried out by combining a model of the
imaging geometry with a few ground control points, and the images were registered to one another with a root mean-square accuracy of better than one pixel [13]. In the study, 13-look covariance matrix data with a 5 m $\times$ 5 m pixel spacing are used.
B. Test Site
The area contains a number of agricultural fields of different
sizes with different crops. The lengthy, dark blue feature in the
upper left corner of Figs. 2 and 3 is a lake, while the bright
Fig. 3. L-band EMISAR image of the test area acquired on May 20, 1998, 5210 m $\times$ 5120 m.
greenish areas seen especially in the lower part of the images are
forests (primarily coniferous forests). In the April acquisition,
the colors of the agricultural fields are predominantly bluish,
due to the larger VV- than HH-backscatter coefficient for the
bare fields for the spring crops and the sparsely vegetated fields
for the winter crops. For the May acquisition, the spring crops
are mainly in the tillering stage, and the winter crops are at the
end of the stem elongation stage, in the boot stage, or at the
beginning of heading, depending on the crop type.
A number of test areas have been selected for quantitative
analysis of the test statistic. These areas are outlined in Figs. 2
and 3, and the development stage and the height of the vegetation are shown in Table I for reference. Five spring crop fields
have been used, i.e., one beet field, one pea field, two spring
barley fields, and one oats field. All spring crop fields are bare
at the April acquisition with the surface being relatively smooth
due to sowing or harrowing. At the May acquisition, the beet
field is still bare, whereas the other fields have some relatively
dense and low vegetation. The three winter crop fields, i.e., two
winter wheat fields and one rye field, have low vegetation for
the April acquisition and relatively dense and high vegetation
for the May acquisition. Finally, a common spruce field, which
is virtually unchanged between the two acquisitions, is used in
the investigation.
61
TABLE I
DEVELOPMENT STAGES AND VEGETATION HEIGHTS (IN PARENTHESES)
Fig. 5. (a) Correlation coefficient and (b) phase difference between HH and VV for the test areas shown in Figs. 2 and 3 for L-band in
April and in May.
Fig. 4. (a) $\sigma^{0}_{hh}$, (b) $\sigma^{0}_{vv}$, and (c) $\sigma^{0}_{hv}$ backscatter coefficients for the test areas shown in Figs. 2 and 3 for L-band in April and in May.

IV. RESULTS AND DISCUSSION

In Section IV-A, polarimetric parameters for the fields used in the quantitative evaluation will be presented and discussed to provide the background for interpretation of the test statistic results. The results for the test statistic are presented and discussed in Section IV-B.

A. Polarimetric Parameters
The polarimetric parameters used to describe the selected fields are standard parameters derived from the covariance matrix (1) [2]: the $\sigma^{0}_{hh}$, $\sigma^{0}_{vv}$, and $\sigma^{0}_{hv}$ backscatter coefficients, where the $\sigma^{0}_{vv}$ backscatter coefficient is slightly less dependent on the incidence angle than the $\sigma^{0}_{hh}$ backscatter coefficient [2]; the correlation coefficient $\rho_{hhvv}$ and the phase difference $\phi_{hhvv}$ of the HH and VV components, which contain important information about the scattering mechanisms; and the co- and cross-polarized polarization signatures, which are graphical representations of the polarimetric properties [2], [3]. Fig. 4 shows the $\sigma^{0}_{hh}$, $\sigma^{0}_{vv}$, and $\sigma^{0}_{hv}$ backscatter coefficients for the various test fields and for both the April and the May acquisitions. Correspondingly, $\rho_{hhvv}$ and $\phi_{hhvv}$ are shown in Fig. 5, and the polarization responses are shown in Fig. 6.
1) Spring Crops: All spring crops (beets, peas, spring barley 1 and 2, oats) show classical behavior for rough surface scattering for the April acquisition, i.e., high $\rho_{hhvv}$ [Fig. 5(a)], small $\phi_{hhvv}$ [Fig. 5(b)], low $\sigma^{0}_{hv}$ backscatter [Fig. 4(c)], larger $\sigma^{0}_{vv}$ than $\sigma^{0}_{hh}$ backscatter [Fig. 4(a) and (b)], and textbook examples of surface scattering polarization responses (which are therefore not shown here). The actual backscatter level from the surface is, of course, controlled by the soil moisture and the surface roughness of the individual fields, and we observe rather weak backscatter from the spring barley 1 and the oats fields (Fig. 4) for the April acquisition due to very smooth surfaces.
The beets field also shows rough surface behavior for the May acquisition [Fig. 6(a)]. The pea field shows some volume scattering behavior for the May acquisition, due to the sparse vegetation, i.e., the $\rho_{hhvv}$ [Fig. 5(a)] has decreased, and the pedestal of the polarization response has increased [Fig. 6(b)]. This effect is
Fig. 6. Polarization signatures for the test areas shown in Fig. 3 for the L-band acquisition in May. An orientation angle of 0° corresponds to HH backscatter; the left-hand signature is the copolarized signature, whereas the right-hand signature is the cross-polarized signature.
even more pronounced for the spring barley and the oats fields, due to a more dense vegetation [Figs. 5(a) and 6(c) and (d)]. For the latter fields, a large $\phi_{hhvv}$ is observed too [Fig. 5(b)], and a pronounced double-bounce response is observed, especially for
oats [Fig. 6(c) and (d)]. The double-bounce scattering is most
likely caused by penetration through the vegetation, scattering
from the ground surface, and scattering from the vegetation,
or vice versa. This phenomenon has previously been observed
early in the growing season for winter crops [2].
2) Winter Crops: The backscatter coefficients from the
winter crops are, in general, larger than from the spring crops
(Fig. 4), due to the contribution from the volume scattering.
The behavior of the winter wheat and the rye fields resembles
surface scattering for the April acquisition (Fig. 5), indicating
penetration through the vegetation and a large surface scattering component. The cross-polarized backscatter is, however,
somewhat larger than for surface scattering, due to the volume
scattering contribution [Fig. 4(c)]. The backscattering from the
winter wheat 1 field is significantly larger than from the winter
wheat 2 field for both acquisitions (Fig. 4). The reason is that
the sowing direction for the winter wheat 1 field is exactly
perpendicular to the radar look direction. For the May acquisition, the winter wheat fields also show some double-bounce
scattering behavior [Figs. 5(b) and 6(e) and (f)]. The rye field
shows virtually no change in the polarimetric parameters
between the two acquisitions, except for some increase in the
backscatter (Fig. 4). The coniferous forest area shows pronounced volume scattering behavior for both acquisitions, i.e.,
Fig. 7. Logarithm of the test statistic ln Q (16) for the images shown in Figs. 2 and 3 in the assumed azimuthally symmetric case.
small $\rho_{hhvv}$ [Fig. 5(a)], small $\phi_{hhvv}$ [Fig. 5(b)], strong $\sigma^{0}_{hv}$ backscatter [Fig. 4(c)], and a large pedestal for the polarization responses [Fig. 6(h)].
B. Test Statistic
Figs. 7 and 8 show “−ln Q” (16) for the azimuthally symmetric case and the diagonal-element-only case, respectively, for the two images shown in Figs. 2 and 3. The test statistic is inverted to show areas with large change as bright areas and areas with small change as dark areas. Consequently, when “large values of the test statistic” is mentioned below, it means large values of “−ln Q” and vice versa for small values. We observe
that especially the forest areas appear very dark, indicating virtually no change between the two acquisitions. For the agricultural fields, the results range from dark (no change) to bright
(large change) areas depending on the crop type. Fig. 9(a) shows the average “−ln Q” for the test areas outlined in the previous sections in the following four different cases:
1) using only the VV channel;
2) using only the three diagonal elements of the covariance
matrix;
3) using the covariance matrix but assuming azimuthal
symmetry;
4) using the full covariance matrix.
Furthermore, Fig. 9(b) shows the average probability of finding a larger value of “−2 ln Q” (derived from (19) and Theorem 6 in the Appendix) for the four cases mentioned above. Fig. 9(b) also
indicates the 5% and the 1% significance levels, and the regions
with probabilities lower than these levels are the regions where we
will typically reject the hypothesis of equal covariance matrices
(or VV channels) at the two points in time, i.e., these are regions
with major change between the two data acquisitions.
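For case 2) above, Q factors into three single-channel criteria of the form (20), so “−ln Q” for the diagonal case is a sum over the HH, HV, and VV powers. A minimal sketch (the 13-look channel powers below are invented for illustration):

```python
from math import log

def gamma_lnq(x, y, n, m):
    """ln Q of the single-channel criterion (20) for powers x and y."""
    return ((n + m) * log(n + m) - n * log(n) - m * log(m)
            + n * log(x) + m * log(y) - (n + m) * log(x + y))

def diagonal_lnq(diag1, diag2, n, m):
    """Diagonal-only case: Q factors over the channels, so ln Q is the
    sum of the single-channel terms for the HH, HV, and VV powers."""
    return sum(gamma_lnq(x, y, n, m) for x, y in zip(diag1, diag2))

# Invented 13-look channel powers (HH, HV, VV) at the two dates.
april = [0.05, 0.01, 0.08]
may = [0.06, 0.03, 0.07]
change = -diagonal_lnq(april, may, 13, 13)  # large values indicate change
```

Since Q is a likelihood ratio, ln Q is at most zero, so the inverted statistic is nonnegative and grows with the amount of change.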
Fig. 8. Logarithm of the test statistic ln Q (16) for the test areas shown in Figs. 2 and 3 in the diagonal case.
Figs. 10 and 11 show in white, for the images in Figs. 2 and 3, where the hypothesis of equal covariance matrices has been rejected at a 1% significance level for the azimuthally symmetric
case and the diagonal case, respectively. Clearly, we observe detection of changes in the azimuthally symmetric case that have
not been detected in the diagonal case, as well as improved detection in the azimuthally symmetric case of changes that already to some extent have been detected in the diagonal case.
In general, the test statistic for the full covariance matrix is
only slightly larger than that for the assumed azimuthally symmetric case [Fig. 9(a)]. We may conclude that the additional information added by the co- and cross-elements of the covariance matrix is small. Also, the change detection potential of the
single VV channel is seen to be much less than for the other
three cases. Therefore, the discussion below will concentrate on
comparing the results for two cases: the diagonal case, where
only the three diagonal backscatter coefficient elements of the
covariance matrix are used, and the polarimetric case, where
azimuthal symmetry is assumed (i.e., all the co- and cross-polarization elements are zero).
1) Similar Polarimetric Parameters: The two regions with
virtually no change between the acquisitions but with different dominating scattering mechanisms, i.e., beets (surface scattering) and coniferous forest (volume scattering), both show small values of the test statistic and no significant difference
between the diagonal and the polarimetric case. It is not
possible to reject the hypothesis of equal covariance matrices
at a 5% significance level for any of the regions [Fig. 9(b)].
The rye field also has very similar polarimetric parameters for the two acquisitions, except for the increase in backscatter, as mentioned above, and the test statistics for the diagonal and the polarimetric cases are relatively close [Figs. 7–9(a)]. The hypothesis of equal
covariance matrices is rejected at a 5% significance level for
both cases. Consequently, in these cases with relatively similar
polarimetric parameters, the new test statistic for polarimetric data performs equally well as the nonpolarimetric test statistic.

2) Similar Backscatter Coefficients, Large Difference for $\rho_{hhvv}$ and/or $\phi_{hhvv}$: Three fields have very similar backscatter coefficients for the two acquisitions, whereas a large difference between $\rho_{hhvv}$ and/or $\phi_{hhvv}$ exists between the two acquisitions, i.e., the pea field (where $\rho_{hhvv}$ decreases between the two acquisitions, due to the sparse vegetation cover at the May acquisition), the winter wheat 2 field (where $\phi_{hhvv}$ changes significantly between the two acquisitions), and the winter wheat 1 field (where both $\rho_{hhvv}$ and $\phi_{hhvv}$ change significantly). A significantly larger test statistic is observed in the polarimetric case than in the diagonal case [Figs. 7–9(a)] for all three fields. Also, it is not possible to reject the hypothesis of equal covariance matrices at a 5% significance level for any of the three fields in the diagonal case [Fig. 9(b)]. On the other hand, the hypothesis is rejected in the polarimetric case at a 1% significance level for all three fields [Fig. 9(b)]. Thus, the results clearly show that the new polarimetric test statistic is very sensitive to differences in the full polarimetric information contained in the covariance matrix.

3) Large Difference for All Polarimetric Parameters: Finally, three regions show significant changes in all polarimetric parameters, i.e., the backscatter coefficients, the $\rho_{hhvv}$, and the $\phi_{hhvv}$: the two spring barley fields and the oats field, which have a smooth bare surface at the April acquisition and a relatively dense vegetation cover at the May acquisition. For the spring barley 2 and the oats fields, we see a medium test statistic in the diagonal case, whereas a much larger test statistic is observed for the polarimetric case [Figs. 7–9(a)]. The spring barley 1 field has a very large change in the backscatter coefficients, due to the relatively smooth surface at the April acquisition (Fig. 4), and we observe a very large test statistic for both the diagonal and the polarimetric cases [Figs. 7–9(a)]. The two spring barley fields have almost the same $\rho_{hhvv}$ and $\phi_{hhvv}$ (Fig. 5), whereas the change in the backscatter coefficients is largest for the spring barley 1 field, as mentioned above. This difference is clearly important for both test statistics, where the test statistic for both the diagonal and the polarimetric case is much larger for the spring barley 1 field than for the spring barley 2 field. The hypothesis of equal covariance matrices is rejected for all three fields at the 5% significance level in both the diagonal and the polarimetric case. This is also the case at the 1% significance level, except for the spring barley 2 field in the diagonal case. Consequently, even when large changes in the backscatter coefficients ensure detection with a nonpolarimetric method, the addition of polarimetric information improves the detection of changes with the new polarimetric test statistic.

Fig. 9. (a) Average “−ln Q” for the test areas shown in Figs. 2 and 3 in four different cases: 1) using only the VV channel; 2) using only the three diagonal elements of the covariance matrix; 3) using the covariance matrix but assuming azimuthal symmetry; and 4) using the full covariance matrix. (b) Average probability of finding a larger value of “−2 ln Q” (derived from (19) and Theorem 6 in the Appendix) for the same four cases.

V. CONCLUSIONS
In this paper, a test statistic for equality of two complex
Wishart distributed covariance matrices and an associated
asymptotic probability measure for obtaining a smaller value
of the test statistic have been derived. The test statistic provides
a unique opportunity to develop optimal algorithms for change
detection, edge detection, line detection, segmentation, etc.,
for polarimetric SAR images. Such algorithms have previously
been based on results of applying algorithms to the single
channels and subsequently combining these results using some
kind of fusion operator.
As a demonstration of the potential of the new test statistic,
the derived measures have been applied to change detection in
fully polarimetric SAR data for a test area with primarily agricultural fields and forest stands where two images acquired with
approximately one-month interval have been used. In the case
with areas with only small change in the polarimetric parameters between the two acquisitions, the new test statistic for
polarimetric data performs equally well as the nonpolarimetric
test statistic. When the backscatter coefficients are virtually unchanged, but either the phase and/or the correlation coefficient
between the HH and VV polarizations have changed, the results
clearly show that the new polarimetric test statistic is much more
sensitive to the differences than test statistics based only on the
backscatter coefficients. Also, in the case where all parameters
in the covariance matrix have changed between the two acquisitions, the new test statistic shows improved change detection capability. Consequently, the results show clearly that the
new test statistic offers improved change detection capability for
fully polarimetric SAR data.
APPENDIX
ANALYSIS OF COMPLEX WISHART MATRICES
In change detection and edge detection in polarimetric SAR
data, it is useful to be able to compare two complex Wishart
distributed matrices.
Fig. 10. Rejection of the hypothesis of equal covariance matrices at the 1% level for the assumed azimuthally symmetric case (white: rejection; black: acceptance).
Most of the standard literature in multivariate analysis only
contains references to the real case (e.g., see [16]). This does not
mean, however, that results for the complex case do not exist. In
[4], the relevant class of complex distributions is introduced, and
[17] completed much of the work, either giving results or (indirectly) pointing out how results may be obtained. It is, however,
not straightforward to deduce the relevant formulas from their
work.
In [18], many of the necessary formulas are deduced in an
elegant way using the fact that the problem is invariant under
a group of linear transformations. The notation chosen is not
straightforward however. Some of their results appear in [19],
an unpublished thesis in Danish, including results on comparing
covariance matrices. In [20], the theory for linear and graphical
models in multivariate complex normal models is covered.
Since the formulas for the distribution of the likelihood ratio
do not seem to be available and since no authors seem to have
treated the so-called block diagonal case, we have chosen to give
a rather thorough description of the necessary results.
We start with a short introduction to the complex normal
and the complex Wishart distributions. We then compare two
gamma distributions, which is the one-dimensional test often used in the radar community. Then we give a straightforward
(brute force) derivation of the likelihood-ratio criterion for
testing the equality in the complex case. Then, we describe
the so-called block diagonal case, which among other things
covers the case known in the radar community as the azimuthal
symmetric case and the total independence case.
After quoting results from [21] on asymptotic distributions,
we establish the necessary results on moments by brute force
integration. By straightforward but rather cumbersome calculations, the results in [21] yield the desired results. Alternatively,
one may use results in [20] and derive expressions involving the
product of beta distributed random variables.
Fig. 11. Rejection of the hypothesis of equal covariance matrices at the 1% level for the diagonal case (white: rejection; black: acceptance).
A. Complex Normal Distribution

Following [4], we say that a $p$-dimensional random complex vector $X$ follows a complex multivariate normal distribution with mean $\mu$ and dispersion matrix $\Sigma$, i.e.,

$X \in N_C(\mu, \Sigma)$    (28)

if the frequency function is

$f(x) = \frac{1}{\pi^{p}|\Sigma|}\exp\left(-(x-\mu)^{H}\Sigma^{-1}(x-\mu)\right)$    (29)

where $\Sigma$ is Hermitian positive definite and of the form

$\Sigma = E\{(X-\mu)(X-\mu)^{H}\} = \begin{bmatrix} \sigma_{11} & \cdots & \sigma_{1p} \\ \vdots & \ddots & \vdots \\ \sigma_{p1} & \cdots & \sigma_{pp} \end{bmatrix}.$    (30)

In other words, we have for $i, j = 1, \ldots, p$

$\sigma_{ij} = E\{(X_i - \mu_i)\overline{(X_j - \mu_j)}\}$    (31)

$\sigma_{ji} = \overline{\sigma_{ij}}.$    (32)

B. Complex Wishart Distribution

We say that a Hermitian positive definite random $p \times p$ matrix $X$ follows a complex Wishart distribution, i.e.,

$X \in W_C(p, n, \Sigma)$    (33)
if the frequency function is

$f_X(X) = \frac{1}{\Gamma_p(n)}\frac{|X|^{n-p}}{|\Sigma|^{n}}\exp\left(-\mathrm{tr}(\Sigma^{-1}X)\right)$    (34)

where

$\Gamma_p(n) = \pi^{p(p-1)/2}\prod_{j=1}^{p}\Gamma(n-j+1).$    (35)

If confusion concerning $X$ or $Y$ may arise, we write $\Sigma_x$ rather than $\Sigma$. The frequency function is defined for $X$ positive definite, and in evaluating integrals the volume element becomes

$dX = \prod_{i=1}^{p} dx_{ii} \prod_{i<j} d\,\mathrm{Re}\{x_{ij}\}\, d\,\mathrm{Im}\{x_{ij}\}$    (36)

where $\mathrm{Re}$ and $\mathrm{Im}$ denote real and imaginary parts. For further description and useful results on Jacobians, etc., see [4] and [17]. It is emphasized that the formulas for the complex normal and Wishart distributions differ from their real counterparts.

Estimation of $\Sigma$ from a normal sample follows in Theorem 1.

Theorem 1: Let $X_1, \ldots, X_n$ be independent, complex normal random variables, i.e.,

$X_i \in N_C(0, \Sigma), \quad i = 1, \ldots, n.$    (37)

Then the maximum-likelihood (ML) estimator for $\Sigma$ is

$\hat{\Sigma} = \frac{1}{n}\sum_{i=1}^{n} X_i X_i^{H}$    (38)

and $n\hat{\Sigma}$ is Wishart distributed

$n\hat{\Sigma} \in W_C(p, n, \Sigma).$    (39)

Proof: See [4].

From Theorem 1, we easily obtain Theorem 2.

Theorem 2: Let $X$ and $Y$ be independent Wishart distributed matrices

$X \in W_C(p, n, \Sigma)$    (40)

$Y \in W_C(p, m, \Sigma).$    (41)

Then the sum will again be Wishart distributed, i.e.,

$X + Y \in W_C(p, n+m, \Sigma).$    (42)

Proof: Straightforward.

C. Test on Equality of Two Gamma Parameters

Let the independent random variables $x$ and $y$ (which are real scalars) be gamma distributed with shape parameters $n$ and $m$ and scale parameters $\beta_x$ and $\beta_y$. The frequency function for $x$ is

$f(x; \beta_x) = \frac{1}{\Gamma(n)\beta_x^{n}}\, x^{n-1} e^{-x/\beta_x}$    (43)

and similarly for $y$. The likelihood function for the parameters ($\beta_x$, $\beta_y$) thus becomes

$L(\beta_x, \beta_y) = f(x; \beta_x)\, f(y; \beta_y)$    (44)

and under the hypothesis $H_0: \beta_x = \beta_y = \beta$, we obtain

$L(\beta) = f(x; \beta)\, f(y; \beta).$    (45)

Taking the derivatives of the log likelihoods and setting them equal to zero, we obtain the ML estimates

$\hat{\beta}_x = x/n$    (46)

$\hat{\beta}_y = y/m$    (47)

$\hat{\beta} = (x+y)/(n+m).$    (48)

Therefore the likelihood-ratio test statistic becomes

$Q = \frac{L(\hat{\beta})}{L(\hat{\beta}_x, \hat{\beta}_y)} = \frac{(n+m)^{n+m}}{n^{n} m^{m}}\frac{x^{n} y^{m}}{(x+y)^{n+m}}.$    (49)

The critical region is given by

$Q \le k_{\alpha}.$    (50)

Straightforward calculations show that this critical region is of the form

$\frac{y}{x} \le c_1 \quad \text{or} \quad \frac{y}{x} \ge c_2.$    (51)

Since $(y/m)/(x/n)$ under the null hypothesis is distributed like Fisher's $F$, i.e.,

$\frac{y/m}{x/n} \in F(2m, 2n)$    (52)

$c_1$ and $c_2$ may be determined by means of quantiles in the $F$-distribution.

D. Test on Equality of Two Complex Wishart Matrices

We consider independent Wishart distributed matrices

$X \in W_C(p, n, \Sigma_x)$    (53)

$Y \in W_C(p, m, \Sigma_y)$    (54)

and wish to test the hypothesis

$H_0: \Sigma_x = \Sigma_y$    (55)
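The F-distribution result for the gamma test lends itself to a quick Monte Carlo check: under the null hypothesis the scaled ratio of the two channel powers should have the mean of an F variate with 2m and 2n degrees of freedom, namely 2n/(2n − 2). The sample size and seed below are arbitrary:

```python
import numpy as np

# Under H0 the 13-look channel powers x and y are gamma distributed with
# shapes n and m and a common scale; (y/m)/(x/n) then follows F(2m, 2n),
# whose mean is 2n/(2n - 2).
rng = np.random.default_rng(42)
n = m = 13
x = rng.gamma(shape=n, scale=1.0, size=200_000)
y = rng.gamma(shape=m, scale=1.0, size=200_000)
ratio = (y / m) / (x / n)
f_mean = (2 * n) / (2 * n - 2)  # theoretical mean, 13/12 for 13 looks
```

The empirical mean of `ratio` should agree with `f_mean` to within Monte Carlo error.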
against all alternatives. We have the likelihood functions

$L(\Sigma_x) = \frac{1}{\Gamma_p(n)}\frac{|X|^{n-p}}{|\Sigma_x|^{n}}\exp\left(-\mathrm{tr}(\Sigma_x^{-1}X)\right)$    (56)

$L(\Sigma_y) = \frac{1}{\Gamma_p(m)}\frac{|Y|^{m-p}}{|\Sigma_y|^{m}}\exp\left(-\mathrm{tr}(\Sigma_y^{-1}Y)\right).$    (57)

The ML estimates are

$\hat{\Sigma}_x = \frac{1}{n}X$    (58)

$\hat{\Sigma}_y = \frac{1}{m}Y$    (59)

and under $H_0$

$\hat{\Sigma} = \frac{1}{n+m}(X+Y).$    (60)

Therefore, the likelihood-ratio test statistic becomes

$Q = \frac{(n+m)^{p(n+m)}}{n^{pn} m^{pm}}\frac{|X|^{n}|Y|^{m}}{|X+Y|^{n+m}}.$    (61)

Thus, we get the critical region

$Q \le k_{\alpha}.$    (62)

E. Tests in the Block Diagonal Case

In some applications (e.g., remote sensing), there are several independent Wishart matrices in each observation. They will often be arranged in a block diagonal structure like

$X = \begin{bmatrix} X_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & X_k \end{bmatrix}$    (63)

where $X_j \in W_C(p_j, n, \Sigma_{xj})$, $j = 1, \ldots, k$. This covers the so-called azimuthal symmetric case and the case with independence of co- and cross-polarized signals. If we define

$p = p_1 + \cdots + p_k$    (64)

it is important to note that $X$ is not Wishart distributed $W_C(p, n, \Sigma_x)$. We now consider similar partitionings of $\Sigma_x$, $\Sigma_y$, and $Y$, i.e., we have independent random matrices

$X_j \in W_C(p_j, n, \Sigma_{xj})$    (65)

$Y_j \in W_C(p_j, m, \Sigma_{yj})$    (66)

for $j = 1, \ldots, k$. We want to test the hypothesis

$H_0: \Sigma_{x1} = \Sigma_{y1}, \ldots, \Sigma_{xk} = \Sigma_{yk}$    (67)

against all alternatives. The likelihood-ratio criterion becomes the product of the criteria, i.e.,

$Q = \prod_{j=1}^{k} Q_j.$    (68)

Since the determinant of a block diagonal matrix is the product of the determinants of the diagonal elements, we obtain

$Q = \frac{(n+m)^{p(n+m)}}{n^{pn} m^{pm}}\frac{|X|^{n}|Y|^{m}}{|X+Y|^{n+m}}$    (69)

i.e., the same result as in the general case [see (61)]. Note, however, that the distribution has changed, since $X$ and $Y$ are no longer Wishart distributed.

F. Large Sample Distribution Theory

In [21] (as quoted in [16]), a general asymptotic expansion of the distribution of a random variable whose moments are certain functions of gamma functions has been developed. We state the main result as a theorem, and we shall use it in determining the (asymptotic) distribution of the likelihood-ratio criterion.

Theorem 3: Let the random variable $W$ ($0 \le W \le 1$) have the $h$th moment

$E\{W^{h}\} = K\left(\frac{\prod_{j=1}^{b} y_j^{y_j}}{\prod_{k=1}^{a} x_k^{x_k}}\right)^{h}\frac{\prod_{k=1}^{a}\Gamma(x_k(1+h)+\xi_k)}{\prod_{j=1}^{b}\Gamma(y_j(1+h)+\eta_j)}, \quad \sum_{k=1}^{a} x_k = \sum_{j=1}^{b} y_j$    (70)

where $K$ is a constant so that $E\{W^{0}\} = 1$. For an arbitrary $\rho$ we set

$\beta_k = (1-\rho)x_k + \xi_k$    (71)

$\epsilon_j = (1-\rho)y_j + \eta_j.$    (72)

The first three Bernoulli polynomials are denoted

$B_1(x) = x - \frac{1}{2}$    (73)

$B_2(x) = x^{2} - x + \frac{1}{6}$    (74)

$B_3(x) = x^{3} - \frac{3}{2}x^{2} + \frac{1}{2}x.$    (75)

We define

$f = -2\left[\sum_{k=1}^{a}\xi_k - \sum_{j=1}^{b}\eta_j - \frac{a-b}{2}\right]$    (76)

and set

$\omega_r = \frac{(-1)^{r+1}}{r(r+1)}\left[\sum_{k=1}^{a}\frac{B_{r+1}(\beta_k)}{(\rho x_k)^{r}} - \sum_{j=1}^{b}\frac{B_{r+1}(\epsilon_j)}{(\rho y_j)^{r}}\right].$    (77)

If we select $\rho$ so that $\omega_1 = 0$, we have

$P\{-2\rho\ln W \le z\} \simeq P\{\chi^{2}(f) \le z\} + \omega_2\left[P\{\chi^{2}(f+4) \le z\} - P\{\chi^{2}(f) \le z\}\right].$    (78)

Proof: See [16].
70
Theorem 4: Let
(91)
(79)
(80)
(92)
be independent complex Wishart distributed matrices. Then for
and
(81)
we have
Proof: The joint
frequency function of
. Therefore
(
)
(82)
(93)
is
Then the asymptotic distribution of the likelihood-ratio criterion
is given by
(83)
(94)
and evaluation of this integral gives the desired result.
By means of the two previous theorems, we are now able to
state the important result on the (asymptotic) distribution of the
likelihood-ratio criterion.
Theorem 5: We consider the likelihood-ratio criterion
(84)
and define
(85)
(86)
(87)
Then
(88)
Proof: Omitted, straightforward but cumbersome calculations.
We now address the block diagonal case and state the main
result in Theorem 6.
Theorem 6: Let the situation be as described in Section A–E,
and define for
(89)
(90)
Proof: Omitted, straightforward but cumbersome calculations.
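As a numerical illustration of Theorem 5 (a sketch under our own naming, not code from the paper), the statistic $-2\rho\ln Q$ can be computed directly from the two matrix sums; the $\omega_2$ correction term is dropped here, so the result should only be compared against $\chi^2(p^2)$ quantiles as a first-order approximation:

```python
import numpy as np

def wishart_test_statistic(X, Y, n, m):
    """Return -2*rho*ln(Q) from Theorem 5 for testing equality of two
    complex Wishart matrices X ~ W_C(p, n, .), Y ~ W_C(p, m, .).
    Compare against chi-square quantiles with p**2 degrees of freedom
    (e.g. 16.92 for p = 3 at the 5% level)."""
    p = X.shape[0]
    ln_q = (p * (n + m) * np.log(n + m) - p * n * np.log(n) - p * m * np.log(m)
            + n * np.log(abs(np.linalg.det(X)))
            + m * np.log(abs(np.linalg.det(Y)))
            - (n + m) * np.log(abs(np.linalg.det(X + Y))))
    rho = 1 - (2 * p**2 - 1) / (6 * p) * (1 / n + 1 / m - 1 / (n + m))
    return -2 * rho * ln_q
```

For $X = Y$ the statistic is exactly zero; strongly different matrices push it far into the tail of the $\chi^2$ distribution.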
REFERENCES
[1] F. T. Ulaby, R. K. Moore, and A. K. Fung, Microwave Remote Sensing:
Active and Passive. Norwood, MA: Artech House, 1986, vol. 3.
[2] H. Skriver, M. T. Svendsen, and A. G. Thomsen, “Multitemporal L- and
C-band polarimetric signatures of crops," IEEE Trans. Geosci. Remote
Sensing, vol. 37, pp. 2413–2429, Sept. 1999.
[3] J. J. van Zyl and F. T. Ulaby, “Scattering matrix representation for simple
targets,” in Radar Polarimetry for Geoscience Applications, F. T. Ulaby
and C. Elachi, Eds. Norwood, MA: Artech House, 1990.
[4] N. R. Goodman, "Statistical analysis based on a certain multivariate
complex Gaussian distribution (An introduction),” Ann. Math. Stat., vol.
34, pp. 152–177, 1963.
[5] S. V. Nghiem, S. H. Yueh, R. Kwok, and F. K. Li, “Symmetry properties
in polarimetric remote sensing,” Radio Sci., vol. 27, no. 5, pp. 693–711,
1992.
[6] A. A. Nielsen, R. Larsen, and H. Skriver, “Change detection in bi-temporal EMISAR data from Kalø, Denmark, by means of canonical correlations analysis,” in Proc. 3rd Int. Airborne Remote Sensing Conf. and
Exhibition, Copenhagen, Denmark, July 7–10, 1997.
[7] A. A. Nielsen, “Change detection in multispectral bi-temporal spatial
data using orthogonal transformations,” in Proc. 8th Austral-Asian
Sensing Conf., Canberra ACT, Australia, Mar. 25–29, 1996.
[8] A. A. Nielsen and K. Conradsen, “Multivariate alteration detection
(MAD) in multispectral, bi-temporal image data: A new approach to
change detection studies,” Dept. Mathematical Modelling, Tech. Univ.
Denmark, Lyngby, Denmark, Tech. Rep. 1997-11, 1997.
[9] A. A. Nielsen, K. Conradsen, and J. J. Simpson, “Multivariate alteration
detection (MAD) and MAF post-processing in multispectral, bi-temporal image data: New approaches to change detection studies,” Remote
Sens. Environ., vol. 19, pp. 1–19, 1998.
[10] A. A. Nielsen, “Multi-channel remote sensing data and orthogonal
transformations for change detection,” in Machine Vision and Advanced Image Processing in Remote Sensing, I. Kanellopoulos, G. G.
Wilkinson, and T. Moons, Eds. Berlin, Germany: Springer, 1999, pp.
37–48.
[11] J. Schou, H. Skriver, K. Conradsen, and A. A. Nielsen, “CFAR edge
detector for polarimetric SAR images,” IEEE Trans. Geosci. Remote
Sensing, vol. 41, pp. 20–32, Jan. 2003.
[12] S. N. Madsen, E. L. Christensen, N. Skou, and J. Dall, “The Danish SAR
system: Design and initial tests,” IEEE Trans. Geosci. Remote Sensing,
vol. 29, pp. 417–476, May 1991.
[13] E. L. Christensen, N. Skou, J. Dall, K. Woelders, J. H. Jørgensen, J.
Granholm, and S. N. Madsen, "EMISAR: An absolutely calibrated polarimetric L- and C-band SAR," IEEE Trans. Geosci. Remote Sensing,
vol. 36, pp. 1852–1865, Nov. 1998.
[14] R. Touzi, A. Lopes, and P. Bousquet, "A statistical and geometrical edge
detector for SAR images,” IEEE Trans. Geosci. Remote Sensing, vol. 26,
pp. 764–773, Nov. 1988.
[15] A. Lopes, E. Nezry, R. Touzi, and H. Laur, “Structure detection and statistical adaptive speckle filtering in SAR images,” Int. J. Remote Sens.,
vol. 13, no. 9, pp. 1735–1758, 1993.
[16] T. W. Anderson, An Introduction to Multivariate Statistical Analysis,
2nd ed. New York: Wiley, 1984.
[17] C. G. Khatri, “Classical statistical analysis based on a certain multivariate complex Gaussian distribution,” Ann. Math. Stat., vol. 36, pp.
98–114, 1965.
[18] S. A. Andersson, H. K. Brøns, and S. T. Jensen, “Distribution of eigenvalues in multivariate statistical analysis,” Inst. Mathematical Statistics,
Univ. Copenhagen, Copenhagen, Denmark, 1982.
[19] G. Gabrielsen, "Fordeling af egenværdier i reelle, komplekse og kvaternion Wishart fordelinger," M.S. thesis (in Danish), Inst. Mathematical
Statistics, Univ. Copenhagen, Copenhagen, Denmark, 1975.
[20] H. H. Andersen, M. Højbjerre, D. Sørensen, and P. S. Eriksen, Linear
and Graphical Models for the Multivariate Complex Normal Distribution. Berlin, Germany: Springer-Verlag, 1995, vol. 101, Lecture Notes
in Statistics.
[21] G. E. P. Box, “A general distribution theory for a class of likelihood
criteria,” Biometrika, vol. 36, pp. 317–346, 1949.
Knut Conradsen received the M.S. degree from
the Department of Mathematics, University of
Copenhagen, Copenhagen, Denmark, in 1970.
He is currently a Professor with Informatics and Mathematical Modelling, Technical University
of Denmark (DTU), Lyngby, Denmark. Since 1995,
he has been Deputy Rector (or Vice President) of
DTU. His main research interest is the application of
statistics and statistical models to real-life problems.
He has worked on many national and international
projects on the application of statistical methods
in a wide range of applications, recently mainly in remote sensing. He has
also conducted extensive studies in the development and application of
mathematical and statistical methods concerning spatial, multi/hyperspectral,
and multitemporal data, including both optical and radar data.
Allan Aasbjerg Nielsen received the M.S. degree
from the Department of Electrophysics, Technical
University of Denmark, Lyngby, Denmark, in 1978,
and the Ph.D. degree from Informatics and Mathematical Modelling (IMM), Technical University of
Denmark, in 1994.
He is currently an Associate Professor with IMM, working in IMM's Section for Geoinformatics. He was with the Danish Defense Research Establishment from 1977 to 1978, and worked on energy conservation in housing at the Thermal Insulation Laboratory, Technical University of Denmark, from 1978 to 1985. Since 1985, he has been with the Section for Image Analysis, IMM. Since then, he has worked on several national and
international projects on the development, implementation, and application
of statistical methods and remote sensing in mineral exploration, mapping,
geology, environment, oceanography, geodesy, and agriculture funded by
industry, the European Union, Danida (the Danish International Development
Agency), and the Danish National Research Councils.
Jesper Schou received the M.S. degree in
engineering and the Ph.D. degree in electrical engineering from the Technical University of Denmark,
Lyngby, Denmark, in 1997 and 2001, respectively.
His primary research interests during his Ph.D. studies were image analysis and the processing of
polarimetric SAR data, including filtering, structure
detection, and segmentation.
He is currently working with 3-D scanner systems
and related 3-D image analysis algorithms.
Henning Skriver received the M.S. degree and the
Ph.D. degree from the Technical University of Denmark, Lyngby, in 1983 and 1989, respectively, both
in electrical engineering.
He has been with the Section of Electromagnetic
Systems (EMI), Department Ørsted, Technical
University of Denmark (DTU), Lyngby, since 1983,
where he is currently an Associate Professor. His
work has been concerned primarily with various
topics related to the utilization of SAR data for
different applications. From 1983 to 1992, his main
area of interest was retrieval of sea ice parameters from SAR data, including
SAR data from ERS-1. Since 1992, he has covered different aspects of land
applications of SAR data, such as forestry in the MAESTRO-1 project, and
agricultural and environmental applications using both satellite SAR data and
data from the Danish airborne polarimetric SAR, EMISAR. His interests also
include various methods for processing of SAR data, such as SAR image simulation, SAR image filtering, speckle statistics, texture analysis, segmentation,
calibration, and polarimetric analysis. He is currently a Project Manager for a
project concerning the use of SAR data for cartographic mapping.
Using the LZW to Detect Primitives

Lars Reng
Laboratory of Computer Vision and Media Technology,
Aalborg University

I. Abstract

This paper is the result of some early attempts to find methods for detection of primitives. The primitive problem is viewed in a simple general case that should apply to almost any domain. A hypothesis stating that the LZW algorithm can be used for solving such a problem in a simple domain is made. Some of the problem areas are discussed, and possible solutions suggested. Several test scenarios are mentioned, but due to lack of final test results no final conclusions can be made. The paper aims not at creating a landmark in this area, but is instead intended as a basis for discussion and creative exchange of ideas.
II. Introduction

The use of speech primitives has in recent years resulted in major improvements in the area of speech recognition. The primitives are not only used in speech recognition, but have also resulted in better speech synthesizers. Due to this success, great efforts are now being made in researching if the primitive concept can be copied to other areas of automatic recognition and synthesizer development.

This paper describes a hypothesis discovered during the earliest stages of a project aiming at developing motion primitives for describing upper-body body-language. As a pilot research project, the task of deriving primitives will be attempted on a number of simple data sequences. The goal is to find a faster and smarter way to derive the primitives than using a simple but slower exhaustive search method.

Some types of compression algorithms rely on recurring patterns in the input data sequence, and actually build a dictionary containing the patterns they find. Using parts of such an algorithm might speed up the analysis process of much bigger and more complex sequences. It has therefore been decided to compare the patterns found by both a simple exhaustive search and such an algorithm.

The algorithm chosen for this test is the LZW¹ [3], which is probably the most famous modified version of the landmark setting LZ² algorithms. This paper intends to test the hypothesis that the LZW, which will be described in more detail later in this paper, can find the primitives of a test sequence given by a normal string. The paper will also try to cover some of the problems/concerns that the LZW could cause.
III. The Simple Exhaustive Search

In order to evaluate the results of the LZW algorithms, all training data are also analyzed by a simple exhaustive search. The exhaustive search should be able to find the number of occurrences for all possible sub-sequence combinations, and thereby supply a 'ground truth' for a given sequence. The runtime for the exhaustive search is also used for comparison, since one of the incentives for trying to use an LZ algorithm was to decrease the runtime.

The exhaustive search will run through the input string (S) for each window-size³ of (1..n), counting the occurrences of every sub-string. Figure 1, seen below, shows an example of the exhaustive search to illustrate the principle and allow a comparison with the figures illustrating the LZ principles. The input string (S) has a value of (001010210210212021021200), and the input sequence is analyzed for patterns of size = 2.

¹ Lempel, Ziv, and Welch.
² The two famous LZ77 and LZ78 made by Lempel and Ziv.
0 0 1 0 1 0 2 1 0 2 1 0 2 1 2 0 2 1 0 2 1 2 0 0
0 0 = 1
0 1 = 2
1 0 = 1+1
Figure 1: Exhaustive search.
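The counting scheme above can be sketched in a few lines of Python (function name ours); it slides a window of each size over the string and tallies every sub-string, overlaps included:

```python
from collections import Counter

def exhaustive_count(s, max_w):
    """Count occurrences of every sub-string of sizes 1..max_w in s,
    sliding a window one position at a time (overlaps allowed)."""
    counts = Counter()
    for w in range(1, max_w + 1):
        for i in range(len(s) - w + 1):
            counts[s[i:i + w]] += 1
    return counts
```

Figure 1 shows a snapshot during the scan; a full scan of the example string gives, e.g., five occurrences of '10'.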
IV. The LZ Algorithms

The original LZ77 and LZ78 algorithms were developed and published by Lempel and Ziv back in 1977 and 1978 [1],[2]. They belong to a group of compression algorithms known as 'adaptive dictionary' algorithms. The dictionary coding technique relies upon correlation between parts of data, and is specifically built to detect recurring patterns. It is this characteristic that we hope to benefit from, and not the algorithm's ability to compress data.

The first following parts of this paper will describe the LZ77, LZ78 and LZW in more detail, and give a simple example to show the basic principles of these algorithms. To a reader already experienced with these techniques, these parts will bring no new information.

IV - i. The LZ77 Algorithm

Even though the LZ77 is considered to be an 'adaptive dictionary algorithm', it does in fact not create a dictionary. Instead it uses a 'sliding window' technique, as depicted in Figure 2, to find the recurring patterns and compress the data sequence in a single pass. The sliding window consists of two parts, a lookahead buffer which contains the part of the sequence that has not yet been encoded, and the search buffer which contains the last part of the sequence that has already been encoded.

search buffer    lookahead buffer
001010 210210 212021 021200
window

Figure 2: Sliding window.

The size of each buffer can be set to any given value. A small buffer size will result in a fast but very poor compression, whereas a big buffer size will result in a better compression but also a longer run-time. The size of the search buffer is often bigger than the size of the lookahead buffer, since the lookahead buffer is always encoded from the leftmost value, while a match can be found anywhere in the search buffer. The algorithm starts by filling the lookahead buffer with the first values of the input sequence, while the search buffer is filled with a predefined value (i.e. zeros). The leftmost values of the lookahead buffer are then compared to the values of the search buffer, so that the biggest possible match between the leftmost values of the lookahead buffer is matched to a similar sub-string beginning⁴ in the search buffer. If a match is found, the position of the search-match counted from the left of the search buffer is appended to the output string with the size of the match, and the value right after the match in the lookahead buffer (the first none-matched value). If no match is found, the position and length are set to zero.

Figure 3 shows an example taken from the original LZ77 paper⁵. The input sequence = (001010210210212021021200…), the search and lookahead buffer size = (9).

³ The value of (n) has in principle only the size of the input string as limit, but will in practice be set to a much smaller value.
000000000 001010210 2102120...   C1 = 22 02 1
000000001 010210210 2120210...   C2 = 21 10 2
000010102 102102120 21021200     C3 = 20 21 2
210210212 021021200              C4 = 02 22 0

Figure 3: LZ77 encoding.
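As a sketch of the window mechanics (our own code, not the authors' implementation), an LZ77-style encoder can emit (position, length, next-symbol) triples, with the search buffer pre-filled with zeros and matches allowed to run into the lookahead buffer:

```python
def lz77_encode(s, search_size=9, look_size=9, pad="0"):
    """Sketch of LZ77: emit (position, length, next symbol) triples.
    The search buffer is pre-filled with a predefined value (zeros),
    and a match is the longest prefix of the lookahead buffer found
    anywhere in the search buffer (it may extend into the lookahead)."""
    out = []
    data = pad * search_size + s          # prepend the zero-filled search buffer
    i = search_size                       # start of the lookahead buffer
    while i < len(data):
        best_len, best_pos = 0, 0
        for j in range(i - search_size, i):           # candidate match starts
            length = 0
            while (length < look_size - 1 and i + length < len(data) - 1
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_pos = length, j - (i - search_size)
        out.append((best_pos, best_len, data[i + best_len]))
        i += best_len + 1
    return out
```

On the example string the first triple is (0, 2, '1'): a match of length 2 at position 0 of the zero-filled search buffer, followed by the literal '1', matching C1 in Figure 3 (there written in ternary).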
Several modified versions of the LZ77 have been made. Some of these combine the LZ77 with Huffman encoding, while others alter certain parts to improve the compression when working on a certain type of data. One of the improvements, the LZSS by Storer and Szymanski, avoids adding the first none-matched value to the output unless no match is found. Another change made in LZSS is that the first none-matched value is part of the next match.

IV - ii. The LZ78 Algorithm

The LZ78 is a real adaptive dictionary-based compression algorithm, and has no sliding windows. It builds a dictionary based on the input sequence, and its only output is a pointer to the longest matching dictionary entry, and the first none-matching element. If no match is found, the output consists of an index value of zero and the first none-matching element. The output pair is also added to the dictionary as a new entry, unless the index is zero. In that case only the none-matching element is added.

Figure 4 shows an example of a LZ78 encoding. (Input sequence = 001212121021012101221011.)

⁴ The match may extend into the lookahead buffer.
⁵ Note that all values are written in ternary numbers (radix-3).
Dictionary:

#   entry   phrase      Output
1   0       0           (0, 0)
2   1+1     01          (1, 1)
3   2       2           (0, 2)
4   1       1           (0, 1)
5   3+1     21          (3, 1)
6   5+0     210         (5, 0)
7   6+1     2101        (6, 1)
8   7+2     21012       (7, 2)
9   7+1     21011       (7, 1)

Figure 4: LZ78 encoding.
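The encoding of Figure 4 can be reproduced with a short LZ78 sketch (function name ours):

```python
def lz78_encode(s):
    """Sketch of LZ78: emit (index, symbol) pairs and grow a dictionary.
    Index 0 means 'no match'; pair (i, c) denotes dictionary phrase i
    extended with symbol c, and each emitted pair becomes a new entry."""
    dictionary = {}          # phrase -> entry number (1-based)
    out = []
    phrase = ""
    for c in s:
        if phrase + c in dictionary:
            phrase += c                      # keep extending the match
        else:
            out.append((dictionary.get(phrase, 0), c))
            dictionary[phrase + c] = len(dictionary) + 1
            phrase = ""
    if phrase:                               # flush a trailing match
        out.append((dictionary[phrase], ""))
    return out
```

Running it on the example input yields exactly the nine (index, symbol) pairs of Figure 4.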
The LZ78 has a few weaknesses; one of them is the size of the dictionary. If the input sequence is very large and contains a high level of variation, the dictionary might also become extremely large. This problem can be solved by either starting over on a new dictionary or simply stopping the addition of new entries. Another problem is that the LZ78, like the LZ77, always adds the first none-matched value to the output, and jumps over it when beginning to match the remaining part of the sequence. One of the many modified versions of the LZ78 (the LZW algorithm) takes care of this problem.

IV - iii. The LZW Algorithm

The LZW, introduced by Terry Welch in 1984 [3], is a modified version of the LZ78. Where the LZ78 adds both a pointer to a dictionary entry and the first none-matched value to the output, the LZW applies the LZSS principle by only adding entry pointers. Without any other changes, it would be impossible to decode the output signal; so to compensate for the lack of sequence values, the dictionary is initialized with all the basic symbols of the input alphabet. The LZW also uses the LZSS principle of not jumping over the first none-matched value when conducting the new match. Besides these few alterations, the LZW works just like the LZ78.
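A minimal LZW sketch along the lines just described (our own code): alphabet-initialized dictionary, index-only output, and the none-matched symbol reused as the start of the next match:

```python
def lzw_encode(s, alphabet=("0", "1", "2")):
    """Sketch of LZW: the dictionary starts with the input alphabet,
    only entry indices are output, and the first none-matched symbol
    begins the next match instead of being skipped."""
    dictionary = {sym: i for i, sym in enumerate(alphabet)}
    out = []
    phrase = s[0]
    for c in s[1:]:
        if phrase + c in dictionary:
            phrase += c
        else:
            out.append(dictionary[phrase])
            dictionary[phrase + c] = len(dictionary)
            phrase = c                       # none-matched symbol is reused
    out.append(dictionary[phrase])
    return out
```

For example, "0010" over the ternary alphabet encodes to the index list [0, 0, 1, 0], growing entries "00", "01" and "10" along the way.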

V. Using the LZW as a Search Algorithm

The LZW algorithm is, like the rest of the LZ algorithms, designed to compress an input signal as much as possible, and not to keep track of how many times the same pattern occurs in the input sequence. This might be a major problem when trying to use the algorithm for analyzing a sequence to find primitives. Each time a match is made, a new entry is also added to the dictionary. This means that even though the LZW detects the right recurring pattern, it will at some point also have an entry of that pattern with many different symbols at the end, and therefore most certainly choose to match the (pattern+symbol) from the input sequence to a similar (pattern+symbol) in the dictionary. This has the unfortunate outcome that the same pattern is matched with many different names (entries) and therefore not recognized as being the same recurring pattern.

The way the standard LZW is designed, it might also have a lot of similar entries that only differ in the first symbol; see the example in Figure 5. These similar sub-strings will not even be matched once, and can therefore be completely missed as potential primitives. The last and final problem is that a recurring pattern might be divided into two different entries, as a result of the ever expanding dictionary. This problem might be the hardest to counteract, since it requires a reassembling of the two pairs of successive entries.
344 668 8
544 668 8
1 446 688
71 446 688

Figure 5: Unrecognizable entries.
V - i. Extracting the "Lost" Sub-Strings

The LZW is a modified version of the LZ78, with some improved abilities. The LZW is designed for compressing data and not for analyzing a sequence to detect primitives, so modifying the LZW even further might improve it quite a bit.

If a small search for similar sub-strings were conducted on the entries of the dictionary, this should solve the problem of different names for similar patterns, since the number of occurrences for each entry containing the sub-string can be added to get the total number of occurrences of the sub-string.

Other LZW problems might also be solved by adding extra functionality, taking into consideration that it might have unfortunate effects on the algorithm's run-time.
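The suggested post-search can be sketched as follows; `entry_counts` is an assumed extension, a per-entry match counter that the standard LZW does not keep:

```python
def substring_occurrences(entry_counts, sub):
    """Sum the use counts of all dictionary entries containing `sub`.
    `entry_counts` maps each dictionary phrase to the number of times
    it was matched (an assumed extension of the standard LZW)."""
    total = 0
    for phrase, count in entry_counts.items():
        if sub in phrase:
            total += count * phrase.count(sub)  # an entry may contain it several times
    return total
```

With hypothetical counts {"210": 2, "2101": 1, "0": 3}, the sub-string "21" would total 3 occurrences even though it was matched under different entry names.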
V - ii. Comparing the LZW and Exhaustive Search Algorithms

The paper should have included an extensive comparison between the exhaustive search approach and the modified versions of the LZW. Test results were, however, not available before the deadline of this paper. These results are instead scheduled to be presented at the DSAGM conference in August.

VI. Discussion and Conclusion

Even though a final conclusion of this experiment is impossible due to the lack of final test results, some conclusions might be attempted nevertheless. This paper is the result of the very first steps of a project aiming at finding a set of primitives to describe upper-body body-language. The preliminary testing has highlighted many problems/concerns towards using an algorithm built for a somewhat different task.

The goal of the experiment is not to get complete statistics of how many occurrences of how many recurrent patterns are included in the input sequence, but instead to see if the LZW can be used to find primitives. The final test result might surprisingly show that the LZW is superior to the exhaustive search in selecting the 'right' entries.

VII. References

[1] Ziv, J., and Lempel, A. A Universal Algorithm for Sequential Data Compression. IEEE Transactions on Information Theory 23 (1977), 337-343.
[2] Ziv, J., and Lempel, A. Compression of Individual Sequences via Variable-Rate Coding. IEEE Transactions on Information Theory 24 (1978), 530-536.
[3] Welch, Terry. A Technique for High-Performance Data Compression. Computer, June 1984.
Medical image sequence coding
Mo Wu and Søren Forchhammer
Research Center COM, Technical University of Denmark
Abstract:
Wavelet-based methods have become popular for two-dimensional (2D) image compression. They have also been used in 3D image sequence compression, especially for medical image sequence compression. Our contribution in this paper is to present a method to compress 4D functional Magnetic Resonance Imaging (fMRI) data (16 bits per pixel) in a 3D setting. In order to achieve lossless compression and improve the compression ratio, different reversible integer to integer wavelet transforms have been tried out. In the entropy coding part, we use context based adaptive arithmetic coding and extend the 2D contexts to 3D by including the information from the slices in the third dimension. The compression ratio can be increased by the new 3D context. A Head Object Oriented Lossless Compression (HOOLC) for 4D fMRI data is also introduced, which can also be applied to other types of objects in lossless compression.
Index Terms - fMRI, wavelet based image coding, medical image compression, lossless
compression, adaptive arithmetic coding, HOLC, HOOLC.
1. Introduction
Medical ultrasound data, Computed Tomography (CT) data and functional Magnetic Resonance Imaging (fMRI) data may have very large volumes and can only be transmitted efficiently by using compression. They comprise tremendous volumes of images with a high level of noise as well as a high correlation between them. Using wavelet based image sequence coding can decrease the volume of the data significantly and also provide progressive coding. Based on the lifting scheme [6], integer to integer wavelet transforms can be constructed [5], making efficient lossless compression possible, e.g. for medical image compression.
The fMRI data we consider is 4D data. In the 3D spatial domain, it is composed of slices at different positions. Including time as the fourth dimension, the data become 4D. Functional MR data has its own properties compared with e.g. video sequences. It has much more noise than normal video. The noise comes from the capturing and processing of the acquisition device. So even in areas of the image which are supposed to be (almost) zero, there may be unexpected values. Viewing the images with the naked eye, we sometimes cannot tell the difference between two slices at different times. Motion prediction [12], the popular technique used in video technology, is therefore not suitable for this kind of data. Lossless compression is very important for medical applications, because the subtle differences can give us information about the patients. On the other hand, many bits are most likely of little importance. So, for medical data compression, traditional methods need to be revised. Firstly we use 3D wavelet based compression for the 4D dataset (64×64×49×32), which has been reformatted to a 3D setting (448×448×32) by tiling the slices at different positions but the same time into one slice [see Fig. 1]. Here the data is 2 bytes per pixel. Thereafter optimized context based coding will be used for the 4D data compression.
Figure 1: 4D fMRI medical image sequence data in 3D representation
A popular wavelet-based embedded image coder is Set Partitioning In Hierarchical Trees (SPIHT) [10], by Said and Pearlman, which improves upon the Embedded Zerotree Wavelet (EZW) [11]. The coding efficiency comes from exploiting interband dependencies between the wavelet coefficients. It assumes that the significance status of a region in a subband is highly correlated with that of its parent node. As a consequence, if the status of descendant subband coefficients does not depend on the lower bands' status, the coding cost will be higher. 3D SPIHT [19] was developed for 3D data compression.
Another popular coder is the context based layered zero coding [13], which is the core element of the JPEG2000 [1] image compression standard. Compared with the zerotree structure coding [10], [11], it concentrates on pursuing the dependency inside the individual subbands. The context of the zero coding will be refined for the 3D structure.

The type of data is important information when performing the compression. We can use this extra information to identify the region of interest in the images and preprocess the data to improve the compression ratio.
The rest of this paper is organized as follows: the 3D packet integer to integer wavelet transform is covered in Section 2. Section 3 describes 3D context based adaptive arithmetic block coding, concentrating on how to extend the 2D context to 3D. Section 4 presents the object based 3D wavelet coding, which combines the 3D contexts with object information to increase compression of the 4D datasets. Experimental results, in both lossy and lossless cases, are given in Section 5. Conclusions are drawn in Section 6.
2. 3D packet integer to integer wavelet transform
Integer to integer lifting is used to construct reversible 1D wavelet transforms. 3D packet [3]
integer to integer transform is constructed based on the 1D wavelet transforms.
2.1. Reversible integer to integer wavelet transform
The traditional (discrete) wavelet transform decomposes a sequence of data into subbands by using low pass and high pass filters. The subbands can be decomposed further to de-correlate the data.
The original medical data is represented by integer numbers (or finite precision). By using normal wavelet transforms, such as the biorthogonal Daubechies 9/7, the output wavelet coefficients will be real numbers, which cannot be coded efficiently without loss. For medical applications, even minor errors in the medical data may result in serious problems. Thus, efficient lossless compression requires an integer to integer wavelet transform. The lifting technique is a good way to construct integer to integer wavelet transforms [5], [6].
Every lifting step can be rounded to integer representation, and the process remains reversible. Thus the wavelet transform can be converted to an integer transform if it is constructed from integer lifting steps. There are many integer transforms [14-16] which can be used. Here we use the filters: (2, 2), (2, 4), (4, 2), (4, 4), (6, 2), (2+2, 2), (2/6), (9/7-F) and (9/7).

These lifting scheme filters do not include the scaling part of the forward transform, except for the (9/7). The scaling part [3] can be split into two integer lifting steps while keeping the transform reversible. The scaling is used to make the transform unitary, but it is also possible to do the transform without this part in lossless compression, making the transformation simpler and more robust towards lossy conditions, because the more integer lifting steps there are, the more the (rounding) error will spread from the code to the reconstructed images. For example, the lowest band of a 4 level 2D dyadic wavelet transform [9] combined with a 2 level packet transform [3] in the third dimension will experience 2 times scaling in the third dimension and 4 times scaling in the other two dimensions (see below), so there would be 2×2×4=16 extra integer lifting steps. The rounding process at each step will degrade the quality of the reconstructed images if truncating for lossy coding.
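As an illustration of such a reversible lifting pair, the following is a sketch (our own code, not the authors') of one level of the (2, 2) filter, i.e. the LeGall 5/3 transform; the rounding inside each lifting step is exactly what makes the integer transform invertible:

```python
def lift_53_forward(x):
    """One level of the reversible (2, 2) integer wavelet transform
    (LeGall 5/3) via lifting; even-length integer input assumed."""
    s, d = list(x[0::2]), list(x[1::2])           # split into even/odd samples
    d = [d[i] - (s[i] + s[min(i + 1, len(s) - 1)]) // 2
         for i in range(len(d))]                  # predict step (high band)
    s = [s[i] + (d[max(i - 1, 0)] + d[i] + 2) // 4
         for i in range(len(s))]                  # update step (low band)
    return s, d

def lift_53_inverse(s, d):
    """Undo the lifting steps in reverse order with identical rounding."""
    s = [s[i] - (d[max(i - 1, 0)] + d[i] + 2) // 4 for i in range(len(s))]
    d = [d[i] + (s[i] + s[min(i + 1, len(s) - 1)]) // 2 for i in range(len(d))]
    x = [0] * (len(s) + len(d))
    x[0::2], x[1::2] = s, d
    return x
```

Because the inverse applies the same rounded expressions in reverse order, reconstruction is exact for any integer input, which is the property the lossless mode relies on.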
2.2. 3-D packet wavelet transform
We do the spatial 2D dyadic wavelet transform for each slice and then do the wavelet transform in the third dimension. This differs from the 3D dyadic wavelet transform if the number of decomposition levels in the spatial domain is more than one. The order of the steps in the transformation is shown in Fig. 2.
Figure 2: The order of the row, column and time transforms in the three dimensions for: (a) 3D dyadic transformation and (b) 3D packet transform.
Different wavelets are good at decorrelating different textures. The relationship between neighboring pixels in the spatial domain is different from that in the time domain, so the wavelet used for the spatial domain can differ from the one used for the time domain.
3. 3D Context based adaptive arithmetic block coding
In the 3D context based adaptive arithmetic block coding, we did the entropy coding for the
wavelet coefficients based on the Zero Coding (ZC) primitive, Sign Coding primitive (SC)
and Magnitude Refinement primitive (MR), which are the core of the Embedded Block
Coding by Optimized Truncation (EBCOT) [13] technique for image coding. The basic
coding contexts have been extended to 3D by including the status from the front slice and
back slice and the most current status from the front point in the third dimension. The main
target is finding the best contexts to achieve a higher compression ratio.
3.1. Zero coding, sign coding, magnitude refinement
We did the entropy coding for the wavelet coefficients by using the ZC, SC and MR primitives and thereafter included context information from the front slice and the back slice. There are 17 contexts in total for the normal 2D context model.
3.2. Extension to 3-D contexts
Based on the 2D context model, we extend the contexts to 3D by including the information
from the front slice and the back slice. We use a Normalized Least-Mean-Square (NLMS)
algorithm [2] (see Fig. 3) to train α_i, 1 ≤ i ≤ 18, i ∈ Z, to predict the wavelet coefficient
from its 18 neighbors in these two slices. The estimator [3] is defined as

Δ = Σ_{ω_i ∈ S} α_i ω_i,

where the ω_i are the wavelet coefficients in the front and back slices.
[Figure: block diagram of the NLMS adaptive linear prediction filter, with input w(n), adaptive weights α(n), prediction d̂(n), and error e(n) fed back to the NLMS update]
Figure 3: Adaptive filter structure for linear prediction. w(n) are the wavelet
coefficients; d(n) is the current wavelet coefficient we need to predict; d̂(n) is
the predicted value; e(n) is the prediction error.
α_{k+1} = α_k + μ e_k ω_k / ( Σ_{m=1}^{L} ω²_{m,k} )
We train α_i for the different subbands in the third dimension and thereafter simply use the
mean over the subbands as α_i. Linear prediction implicitly assumes that the signal is
stationary; unfortunately, this is not the case, so α_i does not converge to one value but
fluctuates around some value. The mean value is therefore used for an approximate division
of the original contexts. The estimator is quantized to 2 or 4 intervals; when we further
extend the 3D contexts, we use only 2 intervals. This prediction is only used to extend the
ZC contexts, because it has almost no relationship with MR and none with SC.
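The NLMS training loop can be sketched like this (a toy version on synthetic data; the real 18-neighbour layout, the step size μ, and the per-subband handling are not specified here and are placeholders):

```python
import numpy as np

def nlms_train(neighbours, targets, mu=0.5, eps=1e-8):
    # Normalized LMS: predict each coefficient d(n) as a weighted sum of its
    # 18 neighbours in the front/back slices, with the update
    # alpha <- alpha + mu * e * w / ||w||^2  (cf. the update rule above).
    alpha = np.zeros(neighbours.shape[1])
    for w, d in zip(neighbours, targets):
        e = d - alpha @ w                        # prediction error e(n)
        alpha += mu * e * w / (w @ w + eps)      # normalized gradient step
    return alpha

rng = np.random.default_rng(0)
true_alpha = rng.normal(size=18)                 # hypothetical "true" weights
W = rng.normal(size=(2000, 18))                  # 18 neighbours per sample
d = W @ true_alpha                               # noiseless linear targets
alpha = nlms_train(W, d)
print(np.allclose(alpha, true_alpha, atol=1e-2)) # True: weights converge
```

On real, non-stationary wavelet data the weights would keep fluctuating, which is exactly why the text averages them over the subbands instead of relying on convergence.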
3.3. Scan order in coding the blocks
In the coding process, we use a block coding method, but without making each block's coding
independent of the other blocks: this lets us both extend the contexts to 3D and use
information from the other blocks. The more information is available, the more accurately the
coded value can be predicted.
So, in this condition, we want to initialize the counts of the statistics for adaptive arithmetic
coding by inheriting the counts from the most similar neighbor block. One way to evaluate
the similarity is by the Kullback-Leibler distance [7] [8] between the coded block and its
neighbour block. The blocks that have the smallest distance have the most similarity of their
statistics. The average Kullback-Leibler distance D for this model is defined as:
D = [ Σ_c ( n₀^(c) + n₁^(c) ) d_c ] / [ Σ_c ( n₀^(c) + n₁^(c) ) ]

d_c = Σ_{k=0}^{1} q_k^(c) log₂( q_k^(c) / p_k^(c) )

p_k, q_k = ( n_k + δ ) / ( n₀ + n₁ + 2δ )
where d_c is the Kullback-Leibler distance for one context, with label c and a discrete
distribution, k = 0, 1. The probability function of the first block is p_k^(c); for the second
one it is q_k^(c). The statistics of the first block are passed on to the second block, which
uses them to do the arithmetic coding. Thus, the distance D we calculate for predicting the
arithmetic coding performance uses the counts ( n₀^(c) and n₁^(c) ) of every context in the
second block as weights in the average Kullback-Leibler distance, to get a more accurate
prediction. From the definition of the distance, we can see it is always nonnegative,
and equals zero if the probability distributions of all contexts are the same.
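The weighted average Kullback-Leibler distance between the context statistics of two blocks can be sketched as follows (δ = 0.5 is a placeholder smoothing constant; the paper does not state its value):

```python
import numpy as np

def avg_kl_distance(counts_p, counts_q, delta=0.5):
    # counts_*: arrays of shape (num_contexts, 2) holding (n0, n1) per context.
    # Per-context KL distance d_c between the smoothed binary distributions,
    # weighted by how often each context occurs in the second block.
    p = (counts_p + delta) / (counts_p.sum(axis=1, keepdims=True) + 2 * delta)
    q = (counts_q + delta) / (counts_q.sum(axis=1, keepdims=True) + 2 * delta)
    d_c = (q * np.log2(q / p)).sum(axis=1)       # KL(q || p) per context
    weights = counts_q.sum(axis=1)               # (n0 + n1) of the second block
    return (weights * d_c).sum() / weights.sum()

a = np.array([[90, 10], [40, 60]])
b = np.array([[85, 15], [45, 55]])               # similar statistics -> small D
c = np.array([[10, 90], [70, 30]])               # dissimilar statistics -> larger D
print(avg_kl_distance(a, b) < avg_kl_distance(a, c))   # True
```

Choosing the neighbour block with the smallest D as the source of inherited counts is exactly the selection rule described in the text.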
Experiments show that the distance between neighbouring blocks in the same subband in
the third dimension (time) is the smallest, followed by the neighbouring block in the same
spatial subband. This is easy to understand from another point of view: the medical image
sequences have high correlation between neighbouring slices at different positions, but even
higher between neighbouring frames in time. The decomposition in the time dimension turns
this into multi-resolution subbands, and the stillness of the frames makes the wavelet
coefficients in the same band generally similar.
In the same 3D subband, neighbouring blocks in the third dimension’s (time) direction are
coded first, thereafter we move to the next neighbouring block in the 2D spatial domain to
finish coding all the blocks in the same way. After finishing the coding of one 3D subband,
we move to the next subband along the direction shown in Figure 4. After finishing
coding the first packet, we move to the second one and use the same scan order as in the
first one. After we finish scanning all the points once, the next round for the next
bit-plane starts. Scanning in this way also supports progressive coding.
Figure 4: The scan order in block coding
3.4. Initialization of the contexts
We optimize the method of initializing the counts of the statistics table by comparing the
resulting code lengths. In order to keep the adaptive arithmetic coding [9], [17] adaptive and
localized for small regions, we use the same statistics table within the same subband, i.e.,
within the 3D spatial subband including at least one slice. The counts of a new subband at the
same resolution are initialized by scaling the old counts from the former subband. They are
calculated as follows,
new count = [ old count × λ / number of blocks in the last subband ]
λ is a factor controlling the volume of the initialized counts.
In subbands of a different resolution or a different bit-plane, the counts of the new statistics
are forced to zero.
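A minimal sketch of this count initialization (whether the original rounds or floors inside the brackets is not clear from the text; this version floors and clamps to at least one, since adaptive arithmetic coder counts must stay positive):

```python
import math

def init_counts(old_counts, num_blocks_last_subband, lam=1.0):
    # Scale the accumulated counts of the previous subband down to roughly one
    # block's worth before coding the next subband at the same resolution.
    # lam controls the volume (weight) of the inherited statistics.
    return [max(1, math.floor(c * lam / num_blocks_last_subband))
            for c in old_counts]

print(init_counts([800, 240], num_blocks_last_subband=16))   # [50, 15]
```

A larger λ makes the coder trust the inherited statistics more strongly; λ and the clamp-to-one are assumptions of this sketch.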
3.5. Further extension of the 3-D context
As we have stated before, the average Kullback-Leibler distance between neighbouring slices
in the time direction is small. Besides, pixels coded at one bit-plane can also be used as
information in coding the same position in the next slice. The status of a pixel at one
bit-plane has two types. We can thereby extend the number of zero coding contexts to
9×2×2 = 36. This information can also be used to extend the magnitude refinement and sign
coding contexts, giving 3×2 = 6 and 5×2 = 10 contexts, respectively.
4. Object oriented 3D context based coding
The functional MRI (fMRI) data we used consists of image slices of a human head. The head
object is located at the center of the images and has a blurred boundary between object and
background. There is a lot of noise in the background. In medical treatment, the head object
is the most important for diagnosis, so compression may be increased if only the head object
information is coded. But we should find a fully safe way to strip the background without
removing any part of the head. We call the lossless compression of the head object Head
Object Lossless Compression (HOLC). Another object oriented method is lossless
compression of the whole dataset, but using different statistics for different objects. We call
it Head Object Oriented Lossless Compression (HOOLC).
4.1. HOLC
To demonstrate the concept we use a simple (bounding) rectangle shape to mask the head
object. In deciding the size of the rectangle, we use different thresholds to test the acceptable
boundary between the head and the background. Because the gray level of the bone is much
larger than that of the margin, it is easy to find the boundary in slices at the back of the head,
but difficult in slices at the front of the face. Thus we test different thresholds until it is safe
to force the margin to zero. We code the preprocessed dataset and combine it with the mask
information known from the preprocessing. The mask is processed by transforming it from
the 2D spatial domain to the 2D wavelet domain according to the 2D dyadic decomposition
level. Because the wavelet decomposition lowers the resolution scale, it is easy to predict the
boundary in the wavelet domain without actually applying the wavelet transform to the mask.
The transformed mask is only used to differentiate the boundary in the wavelet domain, and
the head object is coded separately from the almost-zero background (the head edge spreads
into the background due to the long filters and the higher decomposition levels).
An ellipse-shaped mask has also been tested for HOLC. It can be very effective for some
slices using a high threshold, but it cuts off part of the head object in other slices. To make it
safe for all slices, the ellipse would have to be very large, and the area of the object would
become larger than with the simple rectangle mask. So it was not used for the HOLC method.
4.2. HOOLC
In HOOLC mode, the ellipse mask is a good choice, because almost all head slices have
shapes similar to an ellipse, so it can match the boundary more accurately than a rectangle.
Because both the head object and the background are compressed, mistakes in masking the
head are tolerable. Also, the ellipse shape is very easy to code, with only one center point
coordinate and two axis length values, and slices at the same position share the same mask
values.
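A sketch of the ellipse mask and its resolution-halved wavelet-domain approximation (all parameter values here are hypothetical; or-ing the four neighbours when downsampling errs on the safe side, growing the mask rather than cutting into the head):

```python
import numpy as np

def ellipse_mask(shape, center, axes):
    # Boolean head mask: True inside the ellipse, which is fully described by
    # one centre coordinate and two axis lengths (the only parameters to code).
    yy, xx = np.mgrid[:shape[0], :shape[1]]
    cy, cx = center
    ay, ax = axes
    return ((yy - cy) / ay) ** 2 + ((xx - cx) / ax) ** 2 <= 1.0

def mask_at_level(mask, levels):
    # Approximate the mask in the wavelet domain by halving its resolution per
    # decomposition level, without actually running a wavelet transform.
    for _ in range(levels):
        mask = (mask[::2, ::2] | mask[1::2, ::2]
                | mask[::2, 1::2] | mask[1::2, 1::2])
    return mask

m = ellipse_mask((448, 448), center=(224, 224), axes=(200, 160))
print(m.shape, mask_at_level(m, 4).shape)   # (448, 448) (28, 28)
```

After four decomposition levels the 448×448 mask shrinks to 28×28, matching the lower-resolution subbands it has to separate.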
By this method, we separate the statistics of the two objects. Because the textures of the two
objects differ, their statistics can be very different, and the clear classification can increase
the compression ratio. To get the highest compression ratio, a good threshold must be chosen
to separate the different objects, or in other words, the different textures. The same coding
method is applied as in HOLC, except for the preprocessing. The ellipse mask and its
transformed version (see Figure 5) are used for separating the coding content.
Figure 5: One slice of the image sequence, the ellipse mask with threshold equal to 50, and
the mask after four levels of decomposition
So HOOLC fully supports lossless compression by compressing the head object and the
background separately. HOOLC first codes the head object and then the background, which
adds an extra dimension of progressiveness to traditional wavelet based coding. HOOLC can
also support lossless reconstruction of the head object with almost lossless reconstruction of
the background. This flexibility is useful for doctors, for example in telemedicine [18].
5. Experiment results
In the experiments, lossless compression for a dataset of dimensions 448×448×32 (16 bits per
pixel) was performed. The data is larger than the normal 8 bit data sets, and the maximum
value requires 11 bits to be represented. As the data set is mainly used for medical analysis
and treatment, we concentrate on testing lossless compression (2D context, 3D context and
HOOLC) and lossy compression (HOLC) with different reversible wavelet transforms. The
lossy compression results were compared with 3D-SPIHT. For HOOLC, an optimal threshold
was determined for the (2, 2) reversible wavelet transform.
5.1. Lossless compression and HOLC with different filters
The lossless compression performances of the proposed coders were evaluated with different
integer filters. The results [see Table 1] show that extending the 2D context to 3D can
increase the lossless compression ratio. The best filter for the head fMRI sequence is 2/6 and
then (2, 4), (2, 2). The hybrid filters can also give us good results, and 2D 2/6 combined with
(2, 4) has the best performance in the 2D context coding. As mentioned before, HOLC is only
lossless for the head object. The result of HOLC shows that many bits have been used to code
the noisy background.
Table 1: Lossless compression results with different encoders and reversible filters
Filter                 2D context   3D context   HOOLC (T=50)   HOLC (T=100)
(2,2)                  4.393        4.334        4.305          2.686
(2,4)                  4.386        4.329        4.299          2.734
(4,2)                  4.445        4.386        4.356          2.837
(2+2,2)                4.439        4.384        4.354          2.816
(4,4)                  4.421        4.362        4.332          2.843
(6,2)                  4.471        4.413        4.383          2.892
9/7                    4.428        4.355        4.324          2.903
9/7-F                  4.419        4.346        4.315          2.872
2/6                    4.384        4.292        4.262          2.585
2D(2,2) - 3rdD(2,4)    4.387        4.330        4.300          2.685
2D(2,2) - 3rdD 2/6     4.397        4.327        4.298          2.681
2D(2,4) - 3rdD(2,2)    4.391        4.333        4.303          2.735
2D(2,4) - 3rdD 2/6     4.396        4.326        4.297          2.730
2D 2/6 - 3rdD(2,2)     4.378        4.299        4.270          2.590
2D 2/6 - 3rdD(2,4)     4.373        4.294        4.265          2.589

5.2. Threshold choosing in lossless 3D object based medical image compression
In HOOLC, different thresholds from 10 to 140 have been tested; the results are shown in
Figure 6. When the threshold equals 50, we get the highest compression ratio, and the
mask increases the compression ratio by classifying the different textures of the head object
and the background.
Figure 6: Lossless compression results with different thresholds, using the (2,2) reversible
wavelet transform
Figure 7: Rate-distortion comparison between 3D context adaptive arithmetic coding and
3D-SPIHT in lossy compression
But the mask is larger than the clear boundary of the head. One explanation is that near the
boundary there are still some areas having a texture similar to that of the head but with a
lower gray level.
5.3. Lossy compression
Lossy compression was constructed based on the 3D context coding, with a real number to
real number wavelet transform, which is also constructed by lifting steps [6] and scaled to
make it unitary. The results were compared with 3D-SPIHT (see Figure 7); the performance
is better than 3D-SPIHT. At high bit-rates the performance is also better, but the gain is less
obvious than at low bit-rates. The integer to integer transform based coding can also support
progressive reconstruction, but its rate-distortion performance is not as good as that of the
real number to real number wavelet transform, because of rounding errors at every step. If
integer to integer transform based lossy compression is needed, filters with fewer lifting
steps are preferable.
6. Conclusion
This paper gives an overview of four evolutionary 3D wavelet based coders for 4D medical
image sequences, going from 2D contexts to 3D contexts and then to object based coders.
The performance was improved and tested after the refinement of the contexts and the object
classification. The 2/6 filter has the best performance in compressing fMRI data of the head,
and hybrid filters can also be used. The lossy compression performance is better than that of
3D-SPIHT. A good extension from 2D contexts to 3D has been given. The Kullback-Leibler
distance was used to explain the similarity between the different statistics.
HOLC is a coder supporting a region of interest (ROI). HOOLC is a safe way to compress
fMRI image data of a head, and it can increase the compression ratio. It also provides an
extra dimension of progressive reconstruction, focusing on the object first and then the
background. This can increase the compression ratio by mildly discarding some unusable
bits related to the background.
References
[1] D. S. Taubman and M. W. Marcellin, JPEG2000: Image Compression Fundamentals,
Standards and Practice. Kluwer Academic Publishers, 2002, p. 356.
[2] S. C. Douglas, "A family of normalized LMS algorithms," IEEE Signal Processing
Letters, vol. 1, no. 3, March 1994.
[3] Z. Xiong and X. Wu, "Lossy-to-lossless compression of medical volumetric data using
three-dimensional integer wavelet transforms," IEEE Transactions on Medical Imaging,
vol. 22, no. 3, March 2003.
[4] M. Antonini, M. Barlaud, P. Mathieu, and I. Daubechies, "Image coding using wavelet
transform," IEEE Transactions on Image Processing, vol. 1, no. 2, pp. 205-220, 1992.
[5] M. D. Adams and F. Kossentini, "Reversible integer-to-integer wavelet transforms for
image compression: performance evaluation and analysis," IEEE Transactions on Image
Processing, vol. 9, no. 6, pp. 1010-1024, June 2000.
[6] I. Daubechies and W. Sweldens, "Factoring wavelet transforms into lifting steps,"
Journal of Fourier Analysis and Applications, vol. 4, no. 3, pp. 247-269, 1998.
[7] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York: Wiley, 1991.
[8] H. Qian, "Relative entropy: free energy associated with equilibrium fluctuations and
nonequilibrium deviations," 8 July 2000.
[9] K. Sayood, Introduction to Data Compression, second edition.
[10] A. Said and W. A. Pearlman, "A new fast and efficient image codec based on set
partitioning in hierarchical trees," IEEE Transactions on Circuits and Systems for Video
Technology, pp. 243-250, June 1996.
[11] J. Shapiro, "Embedded image coding using zerotrees of wavelet coefficients," IEEE
Transactions on Signal Processing, vol. 41, pp. 3445-3463, December 1993.
[12] L. Lu, Z. Wang, and A. C. Bovik, "Adaptive frame prediction for foveation scalable
video coding."
[13] D. Taubman, "High performance scalable image compression with EBCOT," IEEE
Transactions on Image Processing, vol. 9, pp. 1158-1170, July 2000.
[14] R. C. Calderbank, I. Daubechies, W. Sweldens, and B.-L. Yeo, "Wavelet transforms that
map integers to integers," Applied and Computational Harmonic Analysis, vol. 5,
pp. 332-369, 1998.
[15] A. Bilgin, G. Zweig, and M. W. Marcellin, "Three-dimensional image compression with
integer wavelet transforms," Applied Optics, vol. 39, pp. 1799-1814, April 2000.
[16] W. Sweldens, "The lifting scheme: a construction of second generation wavelets,"
SIAM Journal on Mathematical Analysis, vol. 29, pp. 511-546, 1997.
[17] G. Langdon, "An introduction to arithmetic coding," IBM Journal of Research and
Development, vol. 28, pp. 135-149, 1984.
[18] Telemedicine Today Magazine, Westlake Village, CA.
[19] B.-J. Kim, Z. Xiong, and W. A. Pearlman, "Low bit-rate scalable video coding with 3-D
set partitioning in hierarchical trees (3-D SPIHT)," IEEE Transactions on Circuits and
Systems for Video Technology, vol. 10, pp. 1365-1374, December 2000.
Integrity Improvements in Classically Deformable Solids
Micky K. Christensen*
Anders Fleron†
Department of Computer Science
University of Copenhagen
DIKU
Abstract
The physically-based model for simulating elastically
deformable objects, presented by Terzopoulos et al. in 1987,
has significant problems with handling solids. We give a
plausible explanation for the reasons of instability that lead
to implosion of deformable solids. We propose an extension
to the original model, with improvements that result in a
model of increased stability. Comparisons are made with the
original method to illustrate this point. The improved model
is suitable for interactive simulations of deformable solids,
with only a small constant decrease of performance.
Keywords: Physics-based animation, simulation, dynamics,
deformation, real time, deformable solids, continuum elastic
model, constraints, variational calculus, finite differences
1 Introduction
Figure 1 Large deformation results in convincing material buckling
Deformable models seem to have gained increasing interest
during the last years. Part of this success comes from a
desire to interact with objects that resemble those in real
life, which all seem to be deformable at some level. The fact
that CPUs and GPUs today are both advanced and powerful
makes it possible to simulate and animate deformable
bodies interactively. This paper builds on work previously
done by Christensen and Fleron [1], which focused on an
implementation of a method for simulating elastically
deformable objects, first put forward by Terzopoulos et al. in
1987 [9]. The implementation could simulate various types
of deformable surfaces convincingly, but several instability
issues were noticed with regard to simulating deformable
solids. As the authors of the original method claim the model
can simulate deformable curves, surfaces, and solids, we
will look into why the method was unable to simulate
deformable solids satisfactorily. At first sight it seemed that
the solids did have a hard time keeping their integrity.
Integrity preservation is important to give a realistic
impression of a deformable solid. We will try to solve the
instabilities using concepts from the original framework,
and propose an improved model that is able to simulate
solids, as well as surfaces, without a significant decrease of
the overall performance.
In section 2 we revisit the theory of elastically deformable
models, with focus on solids, to give an overview of the
method. The theory serves as a foundation for
understanding the following sections. Section 3 reveals and
explains the instabilities of the original model. Sections 4
and 5 focus on solutions and improvements of the local and
global problems, respectively. In section 6 we present the
results of adding the developed improvements to the model.
Comparison between the original and improved model will be
performed visually.
2 Elastically Deformable Solids
At the basis of the theory of deformable models lies elasticity
theory. From this physical theory Terzopoulos et al. have
extrapolated a model that governs the movements of
elastically deformable bodies [9]. In this paper we will
concentrate on the theory of deformable solids.
A point in a solid is described by the intrinsic coordinates
a = [a₁, a₂, a₃]. A deformable solid is thought of as having a
natural rest state, where no elastic energy is inherent. The
rest state is described by r⁰(a) = [r₁⁰(a), r₂⁰(a), r₃⁰(a)],
where the positional vector function r of the object is
defined in Euclidean 3-space. When the solid is deformed, it
takes on a different shape than its rest shape, and distances
between nearby points are either stretched or compressed
with the deformation. This ultimately creates elastic energy
that results in internal forces that will seek to minimize the
elasticity.
The deformation will evolve over time and as such, it can
be described by the time-varying vector function
r (a, t ) = [r1 (a, t ) , r2 (a, t ) , r3 (a, t )] . The evolving deformation
is independent of the rigid body motion of the solid. The
equations governing the motion of particles in a deformable
solid are obtained from Newtonian mechanics, and given by
∂/∂t( μ ∂r/∂t ) + H ∂r/∂t + δF(r)/δr = f(r, t),   (1)

* [email protected]
† [email protected]
where r(a, t) is the position of the particle a at time t,
μ(a) is the mass density at particle a, and H(a) is the
damping density. The right hand side represents the sum of
externally applied forces at time t. The third term on the left
hand side of (1) is called a variational derivative, and this is
where the internal elastic energy is dealt with. F(r) is called
a functional and it measures the potential energy that builds
up when the body is deformed.

2.1 Deformation Energy

A method is needed to measure the deformation energies
that arise when a solid deforms. For this task, we use the
area of differential geometry. It is convenient to look at arc-lengths
on curves, running along the intrinsic directions of
the solid. A way of measuring the directions is specified by
what is known as the first fundamental form. This form takes
on a local view, and looks at changes in length between
particles in a small neighborhood of a particle. When looking
at the first fundamental form, or the metric tensor, we get an
idea of the distances between particles. For a deformable
solid the metric tensor is

G_ij( r(a) ) = ∂r/∂a_i · ∂r/∂a_j,   1 ≤ i, j ≤ 3,   (2)

which is a symmetric 3 × 3 tensor. The diagonal of the
tensor represents length measurements along the
coordinate directions from the particle in question. The off-diagonal
elements represent angle measurements between
the coordinate directions, which resist shearing within the
local particle structure.
When measuring deformation energy in a solid, we are
interested in looking at the change of the shape, with
respect to the natural rest shape. This rest shape is
described by the superscript 0 such that

G⁰_ij( r⁰(a) ) = ∂r⁰/∂a_i · ∂r⁰/∂a_j.   (3)

By using the weighted Hilbert-Schmidt matrix norm of the
difference between the metric tensors, a simplified way of
describing the energy of deformation becomes

F(r) = ∫_Ω Σ_{i,j=1}^{3} I_ij ( G_ij − G⁰_ij )² da₁ da₂ da₃,   (4)

where Ω is the domain of the deformable solid and I is a
user defined tensor that weights each of the coefficients of
the metric. For a single point the energy function is simply

S = Σ_{i,j=1}^{3} I_ij ( G_ij − G⁰_ij )².   (5)

The elastic energies that occur when a solid is deformed
come from the stretching and compressing of the particles in
the body. To discover the natural behavior of a deformed
solid seeking towards its rest state, a term that minimizes
the deformation energy is desirable. Variational calculus can
be applied to find such a minimizing term, which is described
by the Euler-Lagrange differential equations [5]. For a
problem defined by a function of three independent variables
and their first derivatives, the equations become

∂F/∂y − d/dx₁( ∂F/∂y_{x₁} ) − d/dx₂( ∂F/∂y_{x₂} ) − d/dx₃( ∂F/∂y_{x₃} ) = 0.   (6)

The symbol F is the function describing the problem, and
F will be minimized with respect to the symbol y, which in
our case is the positional vector field r. Substituting (5) into
F in (6) and executing the differentiations yields a neatly
compact expression that minimizes the given problem

δS/δr = − Σ_{i,j=1}^{3} ∂/∂a_i ( B_ij r_{a_j} ),   (7)

with

B_ij = I_ij ( r_{a_i} · r_{a_j} − G⁰_ij ).   (8)

The alpha tensors (8) represent the comparison between the
deformed state and the rest state of the solid. When an
element in an alpha tensor becomes positive, it means that
the corresponding constraint has been stretched and that it
wants to shrink. Likewise, when an element becomes
negative, the constraint has been compressed and it wants
to grow.

2.2 Discretization

The deformable model is continuous in the intrinsic
coordinates. To allow an implementation of deformable
solids, the model is discretized into a unit 3D grid structure,
representing the particles which will make up a solid. The
grid will have three principal directions called l, m, and n.
Particles in the grid are uniformly distributed with spacings in
each of the three directions, given by the symbols h₁, h₂,
and h₃. The number of particles in each of the directions are
designated L, M, and N. Each particle property will be
discretized using an index [l, m, n], which will return the
property value of that particular grid entry, e.g. the particle
positions will be described by the discrete vector function
r[l, m, n].
The model requires that derivatives are calculated in the
intrinsic directions of the object. For this purpose we use
finite difference operations across the grid, to achieve the
desired derivative approximations [2]. An approximation to
the first derivative in the m-direction can for example be
obtained by use of either a forward or backward difference
operator. Given an arbitrary grid function u[l, m, n] these
approximations are

D₂⁺(u) = h₂⁻¹ ( u[l, m+1, n] − u[l, m, n] )   (9)

and

D₂⁻(u) = h₂⁻¹ ( u[l, m, n] − u[l, m−1, n] ),   (10)

where the superscript symbols, + and −, represent forward
and backward operators, respectively.
Using the difference operators, it is possible to discretize
(7), by replacing the derivatives with the corresponding
difference operators. The discrete equation for the elastic
force e becomes

e[l, m, n] = − Σ_{i,j=1}^{3} D_i⁻(p)[l, m, n],   (11)

with

p[l, m, n] = B_ij[l, m, n] D_j⁺(r)[l, m, n],   (12)

and the B tensor field is also discretized using finite
differencing,

B_ij[l, m, n] = I_ij[l, m, n] ( D_i⁺(r)[l, m, n] · D_j⁺(r)[l, m, n] − G⁰_ij[l, m, n] ).   (13)
Problems at the boundaries of the grid will occur because no
information is readily available about the derivatives in these
places. A natural boundary condition can be created by
setting to zero all forward difference operators that reach
outside the grid [9].
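The forward and backward difference operators with this natural boundary condition can be sketched as follows (a hypothetical NumPy version; demonstrated in 1-D but written for grids of any dimension):

```python
import numpy as np

def d_forward(u, axis, h=1.0):
    # Forward difference D+ with the natural boundary condition: differences
    # that would reach outside the grid are set to zero.
    d = np.zeros_like(u, dtype=float)
    src = np.diff(u, axis=axis) / h
    sl = [slice(None)] * u.ndim
    sl[axis] = slice(0, u.shape[axis] - 1)
    d[tuple(sl)] = src
    return d

def d_backward(u, axis, h=1.0):
    # Backward difference D-: D-(u)[k] = (u[k] - u[k-1]) / h, zero at k = 0.
    d = np.zeros_like(u, dtype=float)
    src = np.diff(u, axis=axis) / h
    sl = [slice(None)] * u.ndim
    sl[axis] = slice(1, u.shape[axis])
    d[tuple(sl)] = src
    return d

u = np.arange(5.0)              # a toy 1-D grid function
print(d_forward(u, 0))          # [1. 1. 1. 1. 0.]
print(d_backward(u, 0))         # [0. 1. 1. 1. 1.]
```

Applying D− after D+, as in the discrete elastic force above, then automatically respects the boundary on both sides of the grid.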
To solve equations for all particles at the same time, the
values in the positional grid r and in the energy grid e can
be unwrapped into LMN-dimensional vectors r and e.
With these vectors, the entire system of equations can be
written as

e = K(r) r,   (14)

where K(r) is an LMN × LMN sized matrix, called the
stiffness matrix, which has desirable computational
properties such as sparseness and bandedness. A
discussion on how to assemble the stiffness matrix K can
be found in [1].
We introduce the diagonal LMN × LMN mass matrix M and
damping matrix C, assembled from the corresponding
discrete values of μ[l, m, n] and H[l, m, n], respectively. The
equations of the elastically deformable model (1) can now
be expressed in grid vector form, by the coupled system of
second order ordinary differential equations

M ∂²r/∂t² + C ∂r/∂t + K(r) r = f.   (15)

2.3 Numerical Integration

To evolve the deformable model over time, a semi-implicit
time integration scheme will be employed. The time interval
that the model will evolve in is subdivided into time steps of
equal size Δt. Using central differences, the time
derivatives are approximated by

∂²r/∂t² ≈ ( r_{t+Δt} − 2 r_t + r_{t−Δt} ) / Δt²,   ∂r/∂t ≈ ( r_{t+Δt} − r_{t−Δt} ) / (2Δt).   (16)

By inserting these derivatives into (15) we convert the
nonlinear system into the system of linear equations

A_t r_{t+Δt} = g_t,   (17)

where A is called the system matrix,

A_t = K(r_t) + (1/Δt²) M + (1/(2Δt)) C,   (18)

and g is called the effective force vector,

g_t = f_t + ( (1/Δt²) M + (1/(2Δt)) C ) r_t + ( (1/Δt) M − (1/2) C ) v_t,   (19)

with the velocity estimated by the normal first order
backwards difference

v_t = (1/Δt) ( r_t − r_{t−Δt} ).   (20)

The new positions are then found as r_{t+Δt} = A_t⁻¹ g_t.
With these equations in hand it is possible to implement real
time dynamic simulations of deformable solids. The desirable
properties of the system matrix indicate that a specialized
solution method, such as the conjugate gradient method [8],
can be utilized.

Figure 2 The surface patch will collapse to a curve, when a particle
crosses the opposite diagonal

3 Instabilities

The instabilities we will address are not numerical in nature,
such as the chosen integration method, timesteps, etc. We
will turn the interest towards instabilities regarding the
structure of the underlying constraints in the model of elastic
solids. The problems can, more or less, be identified as
boundary issues, as the integrity instabilities arise from the
boundaries of the discrete 3D grid. Real life deformable
objects are held together by strong forces at a very small
scale. Similarly, we could solve the integrity problems, if we
could get away with using extremely high values in the
underlying weighting tensors. The reason why we must
disregard this option is because it will require us to integrate
numerically using "infinitely" small timesteps. As we are
interested in interactive simulations, we seek to use large
timesteps, thus to simply increase the values of the tensors
is not a desired option.
Differential geometry [7] is used as a tool to measure
deformation of an elastic body, in comparison with its resting
shape. For solids, the first fundamental forms, or 3 × 3
metric tensors, are sufficient to distinguish between the
shapes of two bodies. However, the metric tensor of a solid
(2) is not nearly sufficient as a tool to compute the
complex particle movements of a deformed solid, seeking
90
towards its resting shape. The discrete components of (2),
disregarding the diagonal, describe the cosine to the angle
between adjacent directions multiplied by the product of the
lengths,
v · w = |v| |w| cos θ,   0 ≤ θ ≤ π.   (21)
The angle between two vectors does not depend on their
mutual orientation, as verified by the domain of θ in (21).
This is the main reason why we will get internal structure
instabilities. The case of surface patches is depicted in
Figure 2. The bold lines and the angle between them form
the natural condition. If particle A is moved towards B , as
in case 1, the angular elasticity constraints will force the
particle back to its resting condition, in the direction
indicated by the arrow. The behavior of the angular
constraint force is defined ambiguously when the angle
between vectors is either 0 or 180 degrees. The latter is
shown in case 2. If the particle reaches beyond the opposite
diagonal, as in case 3, the elasticity will be strengthened and
push particle A into B . This is clearly a problem, as it will
reduce the surface into a curve. The original model [9]
suffers from this instability.
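The ambiguity can be verified directly: the first fundamental form of a patch and of its mirror image are identical, so the metric-based deformation energy cannot penalize the fold (hypothetical vectors, chosen only for illustration):

```python
import numpy as np

def patch_metric(va, vb):
    # First-fundamental-form entries for two edge vectors of a surface patch:
    # squared lengths on the diagonal, the dot product (angle term) off it.
    return np.array([[va @ va, va @ vb],
                     [va @ vb, vb @ vb]])

va = np.array([1.0, 0.0])
rest = patch_metric(va, np.array([0.0, 1.0]))    # rest state: 90 degree angle
bent = patch_metric(va, np.array([0.0, -1.0]))   # patch folded over the edge
print(np.array_equal(rest, bent))                # True: the metric cannot
                                                 # distinguish the folded state
```

Since the energy is a function of the metric alone, the folded configuration carries no penalty, which is exactly the case-3 collapse described above.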
The instabilities for solids get even worse. Expand the
square to a cube and focus on a particle in the grid: not
only does the particle spawn surface instabilities for the 3
directional patches, there is also nothing that prevents the
cube from collapsing over the space diagonals. The inability
to preserve volume will ruin the integrity of the solid.
In short, the original elastic model is insufficient to
simulate deformable solids. Based on the modeling concept
by Terzopoulos et al [9], we will remodel the method, to
render it more suitable for simulating elastically deformable
solids.
4 Integrity Improvements

To handle integrity instabilities, we are going to design an extension to the original model that improves its ability to prevent the grid cubes from collapsing. Basically, the extension will be done by adding new constraints to the model, gathered into a new tensor we call the Spatial Diagonal Metric, or SDM. To make things a little easier, we start out by describing the analogy for surfaces.

4.1 Extended Metric

For a surface, the original tension constraints on a given particle are the four constraints given by its 2 × 2 metric tensor G. As with solids, when i = j then Gij defines the length constraints along the coordinate directions m and n, while when i ≠ j then Gij represents an expression of the same explicit angular constraint between the two directions. The idea is to replace the dual explicit angular constraint with two diagonal length constraints. These constraints will reach from the particle at [m, n] to the diagonally opposite particles at [m + 1, n + 1] and [m − 1, n + 1], as depicted in Figure 3. Besides working as diagonal length constraints, they will also implicitly work as angular constraints that together can account for all 360 degrees. The directions along these new constraints will be considered as new directions, called ad1 and ad2.

Figure 3 The dual explicit angular constraint is replaced by two new diagonal constraints that will define the angular constraints implicitly

When the model is extended with the new diagonal constraints, variational calculus must be utilized for the expression of the minimizing elastic force. Writing out (5) for the case of surfaces, with focus on the new directions, yields

S = ⋯ + η12 (r_ad1 · r_ad1 − G⁰12)² + η21 (r_ad2 · r_ad2 − G⁰21)² + ⋯,  (22)

where the elements G⁰12 and G⁰21 of the rest state tensor now hold the natural state of the new diagonal constraints. With the new directions, ad1 and ad2, in hand, the Euler-Lagrange equation changes to

δE(r)/δr = − ∂/∂a1 (∂S/∂r_a1) − ∂/∂a2 (∂S/∂r_a2) − ∂/∂ad1 (∂S/∂r_ad1) − ∂/∂ad2 (∂S/∂r_ad2).  (23)

Using variational calculus on the terms from (22) results in the following additions to the elastic force e,

e[m, n] = ⋯ − D⁻d1(p)[m, n] − D⁻d2(q)[m, n],  (24)

where

p[m, n] = η12[m, n] D⁺d1(r)[m, n]
q[m, n] = η21[m, n] D⁺d2(r)[m, n].  (25)
Notice that new difference operators arise with the new
directions in the discretization of the extended metric. These
operators can be considered to be just like the operators in
the original directions. For example, the new first order
forward difference operators become
D⁺d1(u) = (u[m + 1, n + 1] − u[m, n]) / hd1

D⁺d2(u) = (u[m − 1, n + 1] − u[m, n]) / hd2,  (26)

where hd1 = hd2 = √(h1² + h2²) is the grid distance in both
diagonal directions. The square root is computationally
expensive, but it can easily be avoided in any calculation
involving the new difference operators, or simply be precomputed as the value does not change at runtime.
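As a sketch of the operators in (26), the diagonal forward differences can be implemented with simple array slicing; the linear test field below is our own and only serves to check the diagonal slopes:

```python
import numpy as np

h1 = h2 = 1.0
hd = np.hypot(h1, h2)  # sqrt(h1^2 + h2^2), precomputed once and reused

def D_d1(u):
    # Forward difference toward [m + 1, n + 1], cf. (26).
    return (u[1:, 1:] - u[:-1, :-1]) / hd

def D_d2(u):
    # Forward difference toward [m - 1, n + 1], cf. (26).
    return (u[:-1, 1:] - u[1:, :-1]) / hd

# Linear field u[m, n] = h1*m + h2*n, whose diagonal slopes are known exactly.
m, n = np.mgrid[0:5, 0:5]
u = h1 * m + h2 * n
```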
We have shown how to extend the metric tensor, and we call the extended metric the Metrix. To give a deformable solid a stronger integrity, the Metrix can easily be applied to solids as well. The 3 × 3 metric tensor for a solid contains
three length constraints along its diagonal. The remaining six
elements represent angular constraints between the original
coordinate directions, and these can be replaced by the
appropriate elements of the Metrix.
Figure 4 Spatial length constraints for solids. On the left it is
shown how the four constraints reach from the center. On the right
it is shown how the constraint contribution from four particles on
one cube side renders symmetric behavior
4.2
Spatial Diagonal Metric
To implement volume preservation, we introduce another
new idea called the Spatial Diagonal Metric, which is a 2 × 2
tensor. The four elements will represent four new length
constraints that will be spatially diagonal, meaning they will
span grid cubes volumetrically. The four new directions can be chosen in three different ways, and we have simply chosen them to reach from [l, m, n] to the four particles at [l − 1, m + 1, n + 1], [l − 1, m − 1, n + 1], [l + 1, m + 1, n + 1], and [l + 1, m − 1, n + 1]. No matter how the four length constraints are chosen to span the cubes, their contributions will end up covering a grid cube symmetrically, as depicted in Figure 4.
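That the chosen directions cover a grid cube symmetrically can be checked mechanically: together with their opposite directions, the four offsets hit all eight space diagonals of a cube. A small sketch:

```python
# The four chosen spatial diagonal offsets from [l, m, n], as in the text.
chosen = [(-1, +1, +1), (-1, -1, +1), (+1, +1, +1), (+1, -1, +1)]

# A cube has eight space-diagonal directions in total.
all_diagonals = {(a, b, c) for a in (-1, +1) for b in (-1, +1) for c in (-1, +1)}

# Each length constraint acts along both its offset and the opposite
# direction, so the four constraints cover all eight diagonals.
covered = set()
for off in chosen:
    covered.add(off)
    covered.add(tuple(-x for x in off))
```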
The new SDM tensor that represents the volume will be
called V . It follows the same design ideas as the Metrix,
and is defined as
V = ⎡ D⁺v1(r) · D⁺v1(r)   D⁺v2(r) · D⁺v2(r) ⎤
    ⎣ D⁺v3(r) · D⁺v3(r)   D⁺v4(r) · D⁺v4(r) ⎦ ,  (27)

where D⁺v1, . . . , D⁺v4 are the four new first order forward difference operators along the new spatial directions.

Figure 5 Discrete central differences bind adjacent grid cubes together

5 Global Implosions

With the SDM and Metrix contributions added to the elasticity term, the model of deformable solids can prevent the shape of the grid cubes from collapsing locally. This is an important improvement towards keeping the integrity of a deformable solid intact. Another integrity issue still exists, however, that renders a solid unable to prevent implosions. In this matter, we define an implosion as a grid cube entering one of its adjacent grid cubes. Implosions can happen upon large deformations, which are typically caused by heavy external forces, e.g. reactions to collisions and aggressive user interactions. Global implosions can also be regarded as internal self-intersections, thus self-intersection tests can be utilized as a tool to prevent implosions.

Common methods to avoid self-intersection include surrounding each grid cube with an axis-aligned bounding box, or AABB, and arranging the AABBs into a bounding volume hierarchy, or BVH tree, that can be updated efficiently as the body deforms [4]. A recent paper introduces image-space techniques that can be implemented on the GPU, to allow performance-friendly detection of self-intersections and collisions between deformable bodies [6]. Although a method for handling self-intersection can be chosen to perform reasonably, it will still decrease the overall performance.

What we seek is a mechanism that somehow binds adjacent grid cubes together, in such a way that if implosions occur, we can disperse self-intersecting cubes. This is not a method that can prevent self-intersections, but it can restore the integrity of the solid upon implosions. We can reuse what we have been working with so far, and thus reduce the computational cost and memory use significantly, compared to the extra load we would introduce into the system if we had used a BVH algorithm.

We introduce a new Pillar tensor P, which is based upon the discrete metric tensor G, but extended to use first order central difference operators. For reasons of clarity we will limit P to use only the length constraints

P[l, m, n] = ⎡ D1(r)²   0        0      ⎤
             ⎢ 0        D2(r)²   0      ⎥ ,  (28)
             ⎣ 0        0        D3(r)² ⎦

where

D1(u)[l, m, n] = (u(l + 1, m, n) − u(l − 1, m, n)) / (2 h1),
D2(u)[l, m, n] = (u(l, m + 1, n) − u(l, m − 1, n)) / (2 h2),  (29)
D3(u)[l, m, n] = (u(l, m, n + 1) − u(l, m, n − 1)) / (2 h3).
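As a sketch of the operators in (29), the central differences can be implemented with array slicing over the interior grid points; the linear test field is our own and only checks the slopes. Note how each value couples particles two cells apart, which is what binds adjacent grid cubes:

```python
import numpy as np

h1 = h2 = h3 = 1.0

def D1(u):
    # Central difference along l, cf. (29); defined on interior points.
    return (u[2:, :, :] - u[:-2, :, :]) / (2 * h1)

def D2(u):
    # Central difference along m.
    return (u[:, 2:, :] - u[:, :-2, :]) / (2 * h2)

def D3(u):
    # Central difference along n.
    return (u[:, :, 2:] - u[:, :, :-2]) / (2 * h3)

# Linear field with known slopes 3, 2 and -1 along l, m and n.
l, m, n = np.mgrid[0:5, 0:5, 0:5]
u = 3.0 * l + 2.0 * m - 1.0 * n
```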
The effect of using central difference operators is a very convincing way of binding adjacent grid cubes together, see Figure 5. As every grid particle is extended with the Pillar tensor, the combined range of P will overlap all grid cubes.
To further strengthen the prevention of implosion, the
Pillar tensor can easily be extended to use a central
difference Metrix tensor.
6 Results
We have extended the previous implementation of the elastically deformable models [1] to support the Metrix, SDM, and Pillar contributions when simulating deformable solids. The implementation can be found in [3]. The deformable solids respond naturally to external forces, e.g. gravity and viscosity. The user can interact with the solids, for example by constraining particles to a fixed position and performing pulling operations using spring forces. We have also implemented a scaling mechanism, which allows a user to control the overall uniform strength of the tensors. Adjusting the strength scaling interactively can vary the stiffness of a deformable solid in real time, for example simulating the effect of a solid being inflated.
Experiments have revealed that the effects of the Metrix and SDM do not always succeed satisfactorily in moving particles back to their natural locations. In some situations new energy equilibria arise unnaturally. With the help of the supported visual debugger we have realized that different constraints can work against each other, such that the sum of the constraint contributions is zero. To counteract this problem, we have squared the force of the constraints in the SDM (and the Metrix), to make sure they prevail.
To give a reasonable review of how the improved model
solves the presented integrity instabilities, we will perform
visual comparisons between the original model and the new
improved one. In Figure 6, still frames from a small box that is influenced by gravity and collides with a plane are compared frame to frame between the models. Due to the
lack of volume preservation, the constraints of the original
model simply cannot keep the shape of the cube. In Figure
7, we have recreated a configuration from [1], comparing
two rubber balls with different particle mass. The picture on
the left is taken from [1], where the metrics fail to maintain
the integrity of the right ball, thus the ball collapses on itself.
The picture on the right is simulated using the same
parameters but with the improved model, and the integrity of
the ball is now strong enough to stay solid. In Figure 8, a test
of how well the models can recover from a sudden large
deformation is performed.
The improved model enables simulations of situations that
are impossible with the original model. In Figure 1, a soft
solid is depicted. The solid has been constrained to the
ground, and in three of the top corners. Pulling the last
corner downwards results in a large deformation and
renders convincing material buckling. In Figure 9, the true
strength of the Pillar tensor is illustrated, showing an effect
of inflation. In Figure 10, some pudding is constrained to the
ground, and being twisted by its top face. The sides of the
deformable cube skew as is expected of a soft body like
pudding. In Figure 11, a large water lily is deformed upon resting on two pearls. The improved model does a great job of keeping the water lily fluffy.
Figure 6 A small box is influenced by gravity and collides with a plane. The three stills on the left illustrate the original model, and on the right the frames from the new improved model are shown
Figure 7 Rubber balls. The left frame illustrates the situation from
the old model where the right ball is unable to maintain its integrity.
In the right frame the same situation is depicted, but simulated
using the improved model
Figure 8 A wooden box is heavily lifted in one corner. Original vs.
improved model, on the left and right frame, respectively
Figure 9 Constraint strength is increased interactively and yields the effect of inflation
Figure 11 Fluffy water lily modeled using an ellipsoid solid
Figure 10 Twisting the pudding renders skewing
7 Conclusion

The original model for deformable solids, presented in [9], turned out to be insufficient for the sake of realism. Even extremely modest external forces applied to the bodies would ruin their integrity, as can be verified on the left side of Figures 6 to 8. Clearly some extensions to the model were needed. In this paper we have presented three improvements to the model that give deformable solids a much better ability to keep their original integrity, and thus the ability to handle much larger deformations without collapsing. With the new Metrix, SDM, and Pillar additions to the model, the overall strength of the internal elastic constraints is increased. This reinforces the impression that the elastic bodies are actually solids.

We have shown that the improvements to the original model greatly increase the usability of the method for simulating deformable solids. As the new contributions influence the same particles as the original model, the system matrix A is still of size LMN × LMN, and it still possesses its original pleasant properties that allow a relaxation-based solver to invert it using only a few iterations. The Metrix replaces the metric tensor, thus it is only the SDM and Pillar additions that are new to the calculations, and they can be computed in constant time for each particle.

Newer papers on deformable solids state the importance of preserving length, surface area, and volume. What we have done with the elastically deformable model is precisely to strengthen the surface preservation using the Metrix, and to implement the missing volume preservation using the SDM.

References

[1] M. K. Christensen and A. Fleron (2004), "Implementation of Deformable Objects," Department of Computer Science, University of Copenhagen, DIKU
[2] D. Eberly (2003), "Derivative Approximation by Finite Differences," Magic Software, Inc., January 21, 2003
[3] K. Erleben, H. Dohlmann, J. Sporring, and K. Henriksen (2003), "The OpenTissue Project," Department of Computer Science, University of Copenhagen, DIKU, November 2003, http://www.diku.dk/forskning/image/research/opentissue/
[4] K. Erleben, J. Sporring, K. Henriksen, and H. Dohlmann (2004), "Physics-based Animation and Simulation," DIKU
[5] H. Goldstein, C. P. Poole, and J. L. Safko (2002), "Classical Mechanics," Third Edition, Addison-Wesley
[6] B. Heidelberger, M. Teschner, and M. Gross (2004), "Detection of Collisions and Self-collisions Using Image-space Techniques," Proc. WSCG'04, University of West Bohemia, Czech Republic, pp. 145-152
[7] J. J. Koenderink (1990), "Solid Shape," MIT Press
[8] J. R. Shewchuk (1994), "An Introduction to the Conjugate Gradient Method Without the Agonizing Pain," Carnegie Mellon University
[9] D. Terzopoulos, J. C. Platt, A. H. Barr, and K. Fleischer (1987), "Elastically Deformable Models," Computer Graphics, Volume 21, Number 4, July 1987, pp. 205-214
The Thin Shell Tetrahedral Mesh
Kenny Erleben∗ and Henrik Dohlmann†
Department of Computer Science, University of Copenhagen, Denmark
Abstract
Tetrahedral meshes are often used for simulation of deformable objects. Unlike the engineering disciplines, graphics is biased towards stable, robust and fast methods instead of accuracy. In that spirit we present in this paper an approach for building a thin inward shell of the surface of an object. The goal is to devise a simple and fast algorithm that is nevertheless capable of building a topologically sound tetrahedral mesh. The tetrahedral mesh can be used with several different simulation methods, such as the finite element method (FEM).
The main contribution of this paper is a novel tetrahedral mesh generation method, based on surface extrusion and prism tesselation.
CR Categories: I.3.5 [Computer Graphics]: Computational Geometry and Object Modeling—Physically based modeling; I.3.7
[Computer Graphics]: Three-Dimensional Graphics and Realism—
Animation;
Keywords: Tetrahedral Mesh, Erosion, Extrusion, Tesselation,
Shell, Prism
1 Introduction
Given a 3D polygonal model created by a 3D artist, it is often a challenge to create a spatial structure for simulating a deformable object. Besides, polygonal models for visually pleasing pictures tend to be highly tessellated. Thus, even if they do not pose any “errors”, creating a tetrahedral mesh directly from the polygonal model tends to create an enormous amount of tetrahedra. To achieve real time performance, one seeks a coarser tetrahedral mesh. These are the kinds of problems we aim to solve in this paper.
Given a twofold boundary representation of an object, as a connected triangular mesh (a watertight surface), the tetrahedral mesh is built by extruding each triangle inward, that is, in the direction opposite to the triangle normal. Thus, for each triangle a prism is generated. The result is a volumetric mesh consisting of connected prisms. These prisms can now be tessellated into tetrahedra, thereby creating the first layer of the thin shell tetrahedral mesh. Succeeding layers can be created by recursively applying this approach. Figure 1 illustrates the basic idea. Although the overall idea is simple, the approach is not without problems. Polygonal models are seldom twofold, but suffer from all kinds of degeneracies. The idea we have illustrated is obviously capable of handling an open boundary, but cases where edges share more than two neighboring faces, or self-loop edges, are clearly unwanted, since the generated prisms would overlap each other or degenerate into zero-volume prisms.
The prism generation is reminiscent of an erosion operation with a spherical structural element (mathematical morphology). The radius of the sphere corresponds to the extrusion length. It is well known that working directly on the Brep [Sethian 1999] is fast and simple, but topological problems arise easily, such as swallowtails.
In case the given triangle mesh is not a proper mesh, one can apply a mesh reconstruction algorithm [Nooruddin and Turk 2003].
∗ [email protected][email protected]
Figure 1: The basic algorithm for generating a thin tetrahedral shell
from a boundary representation.
We will require the following properties of the prisms making up a thin shell layer:
• No two prisms may intersect each other (neighbors are allowed to touch each other at their common faces).
• No prism may collapse to zero volume or be turned inside out (equivalently, the signed volume is always positive).
• All prisms must be convex.
Unfortunately, even if we are given a perfect connected twofold triangle mesh, we can get into trouble if we make the inward extrusion too big. This is illustrated in Figure 2. Here, the large extrusion length causes prisms B and C to become non-convex. Furthermore, A and D, B and D, C and A, and B and C are overlapping. Fortunately, these degenerate and unwanted prisms can be avoided if the extrusion is made smaller. Thus, we seek an upper bound on how far we can extrude the triangle faces inward without causing degenerate prisms.
A publicly available implementation of the described algorithm
can be found in [OpenTissue 2004].
Existing tetrahedral mesh generation methods create an initial, blockified tetrahedral mesh from a voxelization or signed distance map. Afterwards, nodes are iteratively repositioned, while subsampling tetrahedra to improve mesh quality [Mueller and Teschner 2003; Persson and Strang 2004; Molino et al. 2004]. Our approach differs from these mainly in being surface-based.
Figure 2: Degenerate prisms result from a too big inward extrusion.

2 Inward Extrusion

As a preprocessing step, we compute the pseudo normals for all vertices (the angle-weighted normals [Aanæs and Bærentzen 2003]). These will indicate the direction along which a vertex will be extruded.

Given a triangle consisting of three vertices p1, p2 and p3, with angle weighted normals n1, n2 and n3, the inward extruded prism is defined by the six corner points:

q1(ε) = p1 − n1 ε
q2(ε) = p2 − n2 ε
q3(ε) = p3 − n3 ε.

The extrusion length is given by ε > 0. Notation is illustrated in Figure 3. By requiring ε to be strictly positive, all generated prisms must have non-zero volume, as long as the prism is not turned inside out. We therefore seek a robust way to determine an upper bound on ε, such that all generated prisms will be valid.

Figure 3: The six corner points defining a prism, and pseudo normals yielding extrusion directions.

The direction of the normal of the extruded face, nq, can be found from q1, q2, and q3, using the cross product:

nq(ε) = (q2(ε) − q1(ε)) × (q3(ε) − q1(ε)).

This is a second order polynomial in ε,

nq(ε) = (q2(ε) − q1(ε)) × (q3(ε) − q1(ε))
      = ((p2 − n2 ε) − (p1 − n1 ε)) × ((p3 − n3 ε) − (p1 − n1 ε))
      = ((p2 − p1) + (n1 − n2) ε) × ((p3 − p1) + (n1 − n3) ε)
      = ((p2 − p1) × (p3 − p1))                                      [ = c ]
      + ((p2 − p1) × (n1 − n3) + (n1 − n2) × (p3 − p1)) ε            [ = b ε ]
      + ((n1 − n2) × (n1 − n3)) ε²                                   [ = a ε² ]
      = a ε² + b ε + c.

Observe that c ≠ 0, since its magnitude is equal to twice the area of the triangle being extruded.

To ensure convexity, the dot product of the direction of the normal of the extruded face, nq, with the pseudo normals, n1, n2, and n3, must always be positive. That is,

n1 · nq(ε) > 0
n2 · nq(ε) > 0
n3 · nq(ε) > 0.

This yields the following system of constraints,

⎡ n1 · a   n1 · b   n1 · c ⎤ ⎡ ε² ⎤
⎢ n2 · a   n2 · b   n2 · c ⎥ ⎢ ε  ⎥ > 0.
⎣ n3 · a   n3 · b   n3 · c ⎦ ⎣ 1  ⎦

We solve for the smallest positive ε fulfilling the system of constraints. That is, each row represents the coefficients of a second order polynomial in ε, thus for each row we find the two roots of the corresponding polynomial. The three rows yield a total of 6 roots. If no positive root exists, then ε = ∞; otherwise ε is set equal to the smallest positive root.

Observe that the third column of the coefficient matrix is always positive (by the property of the angle weighted normals). The first column can be interpreted as an indication of whether the corresponding extrusion line is trying to “shrink” (< 0) or “enlarge” (> 0) the extruded face. The middle column is difficult to interpret. As far as we can tell, it resembles an indication of the skewness of the resulting prism.

In fact, the three convexity constraints ensure that no neighboring prisms will intersect each other, nor will a prism turn inside out (i.e. flipping the extruded face opposite the original face).

The maximum extrusion length for the entire layer can be found by iterating over each prism. For the i'th prism the extrusion length εi is computed. The maximum extrusion length of the layer is found as

ε = min { ε0, . . . , εn−1 }.

Afterwards, it is a simple matter to compute the actual extrusion and generate the prisms.
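The computation of the per-prism upper bound can be sketched directly from the coefficients a, b and c above. The helper function and the two test triangles below are our own construction, not code from the paper:

```python
import numpy as np

def max_extrusion(p1, p2, p3, n1, n2, n3):
    """Smallest positive root of (ni . a) e^2 + (ni . b) e + (ni . c) = 0
    over the three pseudo normals; infinity if no positive root exists."""
    a = np.cross(n1 - n2, n1 - n3)
    b = np.cross(p2 - p1, n1 - n3) + np.cross(n1 - n2, p3 - p1)
    c = np.cross(p2 - p1, p3 - p1)
    best = np.inf
    for n in (n1, n2, n3):
        A, B, C = n @ a, n @ b, n @ c
        if abs(A) < 1e-12:                       # degenerate (linear) row
            roots = [-C / B] if abs(B) > 1e-12 else []
        else:
            disc = B * B - 4 * A * C
            if disc < 0:
                roots = []
            else:
                s = np.sqrt(disc)
                roots = [(-B + s) / (2 * A), (-B - s) / (2 * A)]
        best = min([best] + [r for r in roots if r > 0])
    return best

# A flat triangle with identical pseudo normals can be extruded indefinitely.
p1, p2, p3 = np.zeros(3), np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])
n_flat = np.array([0.0, 0.0, 1.0])
e_flat = max_extrusion(p1, p2, p3, n_flat, n_flat, n_flat)

# Pseudo normals tilted towards the centroid make the face shrink as it is
# extruded, collapsing it to a point at e = 1.
g = (p1 + p2 + p3) / 3.0
tilt = np.array([0.0, 0.0, 1.0])
e_tilt = max_extrusion(p1, p2, p3, (p1 - g) + tilt, (p2 - g) + tilt, (p3 - g) + tilt)
```

In the second case each constraint polynomial becomes proportional to (ε − 1)², so the smallest positive root, and hence the bound, is exactly 1.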
3 Prism Generation

The technique in the previous section guarantees that prisms will be convex, and that no intersections occur between neighboring prisms. However, degenerate prisms can still occur whenever the extruded face collapses to a line or to a point. These are shown in Figure 4. These degenerate cases must be marked, such that the following tessellation can take them into consideration.

Figure 4: Point-degenerate and line-degenerate prisms.

The problem is how to detect these degenerate cases. If no upper bound was found on the extrusion length for a prism, we can trivially reject the prism. However, if an upper extrusion bound was computed, the prism might degenerate. As a first step in marking the degenerate prisms, we iterate over all those prisms where an upper bound was computed. If the computed upper bound for a prism is equal to the extrusion length, then it might be a degenerate prism. For each of these possibly degenerate prisms we first test whether

||nq(ε)|| ≤ γ,

where γ is a user specified threshold to counter numerical precision problems. If this criterion is fulfilled, we clearly know that we are dealing with a degenerate prism.

The degenerate prisms can be classified as either point-degenerate or line-degenerate, as shown in Figure 4. The line-degenerate case is given by the criterion

(q2(ε) − q1(ε)) = 0  or  (q3(ε) − q1(ε)) = 0.

If this criterion is not fulfilled we have a point-degenerate case. A pseudo code version of the algorithm is shown in Figure 5.

algorithm markDegeneratePrism(ε, γ)
  for each prism p do
    if ε_p = ε then
      if ||nq(ε)|| ≤ γ then
        if (q2(ε) − q1(ε)) = 0 or (q3(ε) − q1(ε)) = 0 then
          mark p as line-degenerate
        else
          mark p as point-degenerate
        end if
      end if
    end if
  next p
End algorithm

Figure 5: Pseudo code for marking degenerate prisms.

The degenerate cases do not only influence the prism tessellation (which we will treat in the next section). If another thin shell layer is to be generated, then the original mesh faces can no longer be used. A simple 2D case is shown in Figure 6. Notice that the bold red faces were used when creating layer 0, but they vanish when layer 1 is created. Therefore, if another layer is to be generated, a new connected triangular mesh must be formed from the extruded faces of the non-degenerate prisms.

Figure 6: Degenerate cases affect succeeding layers.

In Figure 7 the surface mesh generation algorithm is shown in pseudo code.
4 Prism Tessellation
For non-degenerate prisms, having 6 corners, the minimum number of tetrahedra we can tesselate a prism into is 3. This is shown in Figure 8. The problem with this approach is that the extruded sides of the prism will be triangulated. One therefore has to ensure that the tessellations of neighboring prisms agree with each other. This is illustrated in Figure 9. As seen in Figure 9, it becomes a global combinatorial problem to match the tesselation of neighbors against each other.

Figure 8: Prism tessellated into 3 tetrahedra.

Figure 9: Tessellation of neighboring prisms must be consistent.

If we use the centroid of the prism as the apex for creating tetrahedra, then a prism can be tesselated into eight tetrahedra, as shown in Figure 10. With this approach the tesselation is no longer a global problem, since the tesselation of each prism side can be chosen independently of the others.

algorithm build-surface(Mesh : M)
  for each prism p do
    for i = 1, 2, 3 do
      if qi ∉ M then
        add qi to M
      end if
    next i
    if p not degenerate then
      add face q1, q2, q3 to M
    end if
  next p
End algorithm

Figure 7: Pseudo code for generating the surface mesh for the next shell layer.

Both approaches deal nicely with the point-degenerate case. Since a point-degenerate prism is already a tetrahedron, there is no need to tesselate it. However, since the point-degenerate case only has triangular sides, it can never be a direct neighbor of a non-degenerate prism. It can only be a neighbor of other point-degenerate cases or line-degenerate cases, which have both rectangular and triangular sides, as shown in Figure 11.

Figure 11: Nice vs. bad line case.

In the following we disregard degeneracies and consider the three-tetrahedra tesselation strategy, because it is more attractive due to the lower tetrahedra count.
A prism can be tesselated into three tetrahedra in 6 different ways. In order to classify the 6 types of tesselation, we mark the rectangular sides of a prism as falling (F) or rising (R). The edge type depends on whether the tesselation edge is falling or rising as we travel along the extruded prism face in counterclockwise order; see Figure 12. We observe that the three-tetrahedra tesselation strategy will always have two prism sides of the same type, and the last side of the opposite type. Thus, we can only have 6 different patterns, as shown in Table 1. The consistency requirement implies that if one side of a prism is marked as F, then the neighboring prism will have marked the same side as R. In short, no neighboring prisms will have a side marked with the same type.
A simple tesselation example is shown in Figure 13. Here a tetrahedron mesh is being tesselated. The tetrahedron has been cut up and laid out in 2D. Triangle edges correspond to rectangular sides of prisms. Let us apply a brute-force strategy to the combinatorial problem of the tesselation as follows: we start at a single prism and choose one of the 6 tesselation types. Then we visit the neighboring prisms and choose a tesselation type that agrees with the immediate neighbor prisms, which have already been tesselated. This is a breadth first traversal over the prisms.

Figure 12: Classification of prism sides as falling (F) or rising (R).
The method is not fail-safe, since inconsistencies can arise, as shown in Figure 14. Here, the middle prism is the last prism to be visited by the traversal. Clearly, it is impossible to assign a tesselation type to the prism, since all three sides would have to have the same type. We can repair the inconsistency by picking one of the neighboring prisms and flipping the type of its shared edge. This action will not change the type of any of the edges marked with arrows in Figure 14. Therefore, the repairing action does not cause a rippling effect through the prisms, and inconsistencies are not introduced at other places.

Fixing inconsistency in this way is attractive, since it offers a local solution to a global problem. However, sometimes we might end up in a dead-lock where no local solution can be found, as is shown in the top of Figure 15.
This time the drawing in the figure resembles a small local view of a larger mesh. Notice that none of the edges shared with the inconsistent prism can be flipped without creating an inconsistent neighboring prism. The problem is that all the edges marked with arrows are of the same type.

The solution to the problem is shown in the bottom of Figure 15. We let the inconsistency ripple as water waves over to neighboring prisms, in a search for a single prism where an edge flip does not give rise to a new inconsistency. When such a prism is encountered, we track the trajectory of the ripple wave-front back to the originating inconsistent prism, and flip all shared edges lying on this path. In Figure 16 we show the result of the rippling. Notice that two edges are flipped; these are the edges lying on the path to the prism that could be flipped. Also notice that all edges marked with arrows are unaffected by the rippling action. This property ensures that the rippling action will not cause inconsistencies in any prisms elsewhere in the mesh.
A pseudo code version of the tesselation-pattern-finding algorithm is shown in Figure 17. Our proposed tesselation pattern algorithm
Figure 10: Prism tessellated into eight tetrahedra.

F R R
R F F
R F R
F R F
R R F
F F R

Table 1: The 6 three-tetrahedra tesselation types.
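The pattern count in Table 1 can be verified by brute force: of the 2³ = 8 ways of marking three sides, only the two uniform markings FFF and RRR are excluded. A quick sketch:

```python
import itertools

# All markings of the three rectangular prism sides as falling/rising.
all_markings = list(itertools.product("FR", repeat=3))

# Valid three-tetrahedra patterns have two sides of one type and one of
# the other, i.e. everything except the all-F and all-R markings.
patterns = [m for m in all_markings if len(set(m)) == 2]
```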
Figure 13: Tesselation example. A simple 3D mesh (a tetrahedron) has been cut up and laid down in 2D. Triangles correspond to prisms in the thin shell.

Figure 14: Inconsistent tesselation example. The middle prism would have the same type on all sides, which is illegal.

Figure 15: The top picture shows a dead-locked inconsistent tesselation. The bottom picture shows that the inconsistency problem has been propagated to neighboring prisms further away.
is an ad-hoc solution for the problem at hand. We do not have a formal proof that it is always possible to find a consistent pattern of rising and falling tesselation edges.
5 Results
We have implemented the extrusion length computation and the tesselation-pattern algorithm. Currently the implementation detects degenerate prisms, but does not tesselate them.

In our test cases we have chosen 14 meshes of increasing size, which were all scaled to fit within the unit cube. A single layer shell was computed, given a user specified maximum extrusion length. In Table 2, performance statistics are listed, together with polygon counts, extrusion lengths and rippling action information. The timings for a single layer construction are cheap, and appear to scale linearly with mesh size. The resulting tetrahedral meshes are visualized in Figure 18. As seen in Table 2, the test cases diku, teapot, propeller, funnel, cow, and bend have a surprisingly small extrusion limit. The remaining test cases show excellent extrusion limits. Figure 19 shows the prisms corresponding to the minimum extrusion limit. Notice that in all cases where the limit is unexpectedly small, the limit is caused by small faces or long slivers on sharp ridges. Figure 20 shows cut-through views of the cylinder, pointy, tube, sphere, teapot, funnel, bowl, and torus meshes. Notice how thin the teapot and funnel are.
Figure 16: The rippling solution to the dead-locked case shown in
Figure 15.
Figure 18: Tesselation results of the 14 meshes.
Figure 19: Prisms marked with blue have minimum extrusion limit.
Figure 20: Various cut-through views of a few selected meshes, illustrating the thin shell.
algorithm tesselation-pattern()
  Queue Q
  push first prism onto Q
  while Q not empty do
    Prism p = pop(Q)
    mark p as visited
    if no neighbor of p is tesselated then
      pick a random pattern for p
    else if there exists a pattern consistent with the neighbors then
      assign that consistent pattern to p
    else
      if there exists a neighbor that can be flipped then
        flip the edge type of the edge shared with p
        assign a consistent pattern to p
      else
        perform rippling
      end if
    end if
    for all unvisited neighbors n of p do
      push(Q, n)
    next n
  end while
end algorithm
Figure 17: Pseudo code for determining tesselation pattern.
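The breadth-first structure of Figure 17 can be sketched as runnable code. The sketch below is illustrative only: the data structures (edge ids, neighbor lists) are assumed, patterns are represented directly by the rising/falling type of each prism's three edges, and the flip and rippling repair steps of the paper are omitted (a dead-lock simply raises an error here).

```python
from collections import deque

def assign_patterns(edges_of, neighbors_of, start=0):
    """Greedy breadth-first assignment of rising/falling ('R'/'F')
    tesselation edge types, in the spirit of Figure 17.

    edges_of[p]     -> the three edge ids bounding prism p
    neighbors_of[p] -> prisms sharing an edge with p
    A prism's pattern is legal unless all three of its edges carry
    the same type (the inconsistent case of Figure 14).
    """
    edge_type = {}                      # shared edge id -> 'R' or 'F'
    visited = set()
    Q = deque([start])
    while Q:
        p = Q.popleft()
        if p in visited:
            continue
        visited.add(p)
        es = edges_of[p]
        known = [edge_type[e] for e in es if e in edge_type]
        free = [e for e in es if e not in edge_type]
        for i, e in enumerate(free):
            # default to 'R', but flip the last free edge if the
            # prism would otherwise become all-'R'
            if i == len(free) - 1 and all(t == 'R' for t in known):
                edge_type[e] = 'F'
            else:
                edge_type[e] = 'R'
        if len({edge_type[e] for e in es}) == 1:
            raise ValueError(f"dead-lock at prism {p}: flip/rippling needed")
        for n in neighbors_of[p]:
            if n not in visited:
                Q.append(n)
    return edge_type
```

On a strip of three prisms this assigns every prism a mixed pattern; cyclic meshes may still dead-lock, which is exactly where the flip and rippling actions of the paper would take over.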
6
Discussion
We have omitted the problem of shell layers overlapping from opposite sides. In Figure 20 the problem is seen in the case of the
bowl mesh. Our solution to this problem has been to ignore it; the
user can always choose a smaller maximum extrusion length.
Degenerate prisms were ignored, in the sense that we are able to
detect when they occur, but our tesselation pattern algorithm is not yet
capable of handling them. No degenerate prisms were generated
for any of the examples in our results section.
The global computation of the extrusion length works fairly well
for some meshes, but for others a surprisingly small extrusion length
is found. Long slivers and small faces lying close to sharp ridges are
the reason for this phenomenon, as can be seen in Figure 19. Thus,
we conclude, not surprisingly, that the algorithm is highly dependent both on the shape of the object and on the tesselation of
the object surface. It appears that a good uniform tesselation works
best. Abrupt tesselation with large aspect ratios results in small extrusion lengths. Mesh reconstruction [Nooruddin and Turk 2003]
could be used as a preprocessing step to create a more suitable tesselation.
Another avenue for circumventing the problem of a small global
extrusion might be to investigate the possibility of non-global
extrusion lengths, i.e. an extrusion length that varies over the
mesh, adapting itself to the local maximum length without
causing degenerate prisms. We believe this is an interesting idea
and leave it for future work.
Our results indicate that our tesselation pattern algorithm works:
we have not yet encountered an unsolvable problem. We believe
this shows that the combinatorial problem of finding the tesselation
pattern is at least solvable in practice. From a theoretical viewpoint, a proof of existence would be very interesting, and we leave
this for future work.
7
Conclusion
In this paper we have presented preliminary results, showing that it
is possible to generate a thin shell, without any topological errors.
mesh        |F|   |R|   δ       ε       time (secs.)
box           12   0    0.1     0.866   0
cylinder      48   0    0.1     0.431   0
pointy        96   0    0.083   0.083   0
diku         288   0    0.004   0.004   0
tube         512   0    0.1     0.174   0.01
sphere       760   0    0.1     0.476   0.01
teapot      1056   1    0.001   0.001   0.01
propeller   1200   2    0.004   0.004   0.02
funnel      1280   1    0.003   0.003   0.029
cow         1500   0    0.001   0.001   0.02
bend        1604   0    0.006   0.006   0.029
bowl        2680   0    0.017   0.017   0.04
torus       3072   0    0.099   0.099   0.059
knot        5760   0    0.1     0.102   0.089
Table 2: Performance statistics on the 14 test cases. In all
cases the end user requested a shell thickness of 0.1. The δ column
shows the actual shell thickness produced. The ε column shows
the extrusion limit. The |F| column gives the face count of the
mesh. The |R| column gives the number of times the ripple action
was invoked. The zero entries in the time column indicate that the
duration was not measurable by the timing method.
As pointed out in the previous section, there are many unsolved
issues to be dealt with.
Our motivation for this work was to create a volumetric mesh
with a low tetrahedra count for animation purposes. Due to the early
stage of this work, we have not yet validated whether our approach
is useful for animation.
References
Aanæs, H., and Bærentzen, J. A. 2003. Pseudo-normals for signed
distance computation. In Proceedings of Vision, Modeling, and Visualization.
Molino, N., Bridson, R., Teran, J., and Fedkiw, R. 2004. Adaptive
physics based tetrahedral mesh generation using level sets. (in review).
Mueller, M., and Teschner, M. 2003. Volumetric meshes for real-time medical simulations. In Proc. BVM, 279–283.
Nooruddin, F., and Turk, G. 2003. Simplification and repair of polygonal models using volumetric techniques. IEEE Transactions on Visualization and Computer Graphics 9, 2, 191–205.
OpenTissue, 2004. http://www.diku.dk/forskning/image/research/opentissue/.
Persson, P.-O., and Strang, G. 2004. A simple mesh generator in
Matlab. SIAM Review 46, 2 (June), 329–345.
Sethian, J. A. 1999. Level Set Methods and Fast Marching Methods:
Evolving Interfaces in Computational Geometry, Fluid Mechanics, Computer Vision, and Materials Science. Cambridge University Press. Cambridge Monographs on Applied and Computational Mathematics.
Talking Faces – a State Space Approach
Tue Lehn-Schiøler
The Technical University of Denmark
Informatics and Mathematical Modelling
Email: [email protected]
July 5, 2004
Abstract
In this paper a system that transforms speech waveforms into animated
faces is proposed. The system relies on continuous state space models
to perform the mapping; this makes it possible to ensure video with no
sudden jumps and allows continuous control of the parameters in ’face
space’.
The performance of the system is critically dependent on the number of
hidden variables: with too few variables the model cannot represent the data,
and with too many, overfitting is noticed.
To create a photo-realistic image an Active Appearance Model is used.
Simulations are performed on recordings of 3–5 sec. video sequences with
sentences from the Timit database. From a subjective point of view the
model is able to construct an image sequence from an unknown noisy
speech sequence even though the number of training examples is limited.
1
Introduction
The motivation for transforming a speech signal into lip movements is at least
threefold. Firstly, the language synchronization of movies often leaves the actor’s
mouth moving during silence or the other way around, which looks rather
unnatural. If it were possible to manipulate the face of the actor to match the
actual speech, it would be much more pleasant to view synchronized movies
(and a lot easier to make cartoons). Secondly, even with increasing bandwidth,
sending images via a cell phone is quite expensive; therefore, a system that
allows single images to be sent and models the face in between would be useful.
The technique would also make it possible for hearing-impaired people to lip
read over the phone. If the person at the other end does not have a camera
on her phone, a model image can be used to display the facial movements.
Thirdly, when producing agents on a computer (like Windows Office’s Mr. Clips),
it would make communication more plausible if the agent could interact with
lip movements corresponding to the (automatically generated) speech.
The idea of extracting phonemes or similar high-level features from the
speech signal before performing the mapping to the mouth position has been
widely used in the lip-sync community. Goldenthal [9] suggested a system called
”Face Me!”. He extracts phonemes using Statistical Trajectory Modeling. Each
phoneme is then associated with a mouth position (keyframe). In Mike Talk
[6], phonemes are generated from text and then mapped onto keyframes; however, in this system trajectories linking all possible keyframes are calculated in
advance, thus making the video more seamless. In ”Video Rewrite” [2] phonemes
are again extracted from the speech, in this case using Hidden Markov Models.
Each triphone (three consecutive phonemes) has a mouth sequence associated
with it. The sequences are selected from training data; if a triphone does
not have a matching mouth sequence in the training data, the closest available
sequence is selected. Once the sequence of mouth movements has been determined, the mouth is mapped back onto a background face of the speaker. Other
authors have proposed methods based on modeling of phonemes by correlational
HMM’s [21] or neural networks [10].
Methods where speech is mapped directly to facial movements are not quite as
popular as phoneme-based methods. However, in ’Picture my voice’ [13], a time
dependent neural network maps directly from 11 × 13 Mel Frequency Cepstral
Coefficients (MFCC) as input to 37 facial control parameters. The training
output is provided by a phoneme-to-animation mapping, but the trained network
does not make use of the phoneme representation. Also Brand [1] has proposed a
method based on (entropic) HMM’s where speech is mapped directly to images.
In [17] Nakamura presents an overview of methods using HMM’s; the first, MAP-V, converts speech into the most likely HMM state sequence and then uses a table
lookup to convert into visual parameters. In an extended version, MAP-EM, the
visual parameters are estimated using the EM algorithm. Methods that do not
rely on phoneme extraction have the advantage that they can be trained to work
on all languages, and that they are able to map non-speech sounds like yawning
and laughing.
There are certain inherent difficulties in mapping from speech to mouth positions; an analysis of these can be found in [7]. The most profound is the confusion between visual and auditive information. The mouth positions of sounds
like /b/, /p/ and /m/ or /k/, /n/ and /g/ cannot be distinguished even though
the sounds can. Similarly, the sounds of /m/ and /n/ or /b/ and /v/ are very
similar even though the mouth positions are completely different. This is perhaps
best illustrated by the famous experiment by McGurk [16]. Thus, when mapping from speech to facial movements, one cannot hope for a perfect result
simply because it is very difficult to distinguish whether a ”ba” or a ”ga” was
spoken.
The rest of this paper is organized in three sections: section 2 focuses on
feature extraction in sound and images, in section 3 the model is described,
and finally experimental results are presented in section 4.
2
Feature extraction
Many different approaches have been taken to the extraction of sound features. If
the sound is generated directly from text, phonemes can be extracted directly
and there is no need to process the sound track [6]. However, when a direct
mapping is performed one can choose from a variety of features. A non-exhaustive
list of possibilities includes Perceptual Linear Prediction or J-Rasta-PLP as in
[1, 5], harmonics of the Discrete Fourier Transform as in [15], Linear Prediction
Coefficients as in [12], or Mel Frequency Cepstral Coefficients [9, 10, 13, 17]. In
this work the sound is split into 25 blocks per second (the same as the image
frame rate) and 13 MFCC features are extracted from each block.

Figure 1: Facial feature points¹.

To extract features from the images an Active Appearance Model (AAM) [3] is used. The
use of this model for lipreading has previously been studied by Matthews et al.
[14]. AAM’s are also useful for low-bandwidth transmission of facial expressions
[20]. In this work an implementation by Mikkel B. Stegmann [19] is used. For
the extraction, a suitable subset of images in the training set is selected and
annotated with points according to the MPEG-4 facial animation standard (Fig.
1). Using these annotations a 14-parameter model of the face is created. Thus,
with 14 parameters it is possible to create a photo-realistic image of any facial
expression seen in the training set. Once the AAM is created, the model is
used to track the lip movements in the image sequences; at each point the 14
parameters are picked up. In Fig. 2 the result of the tracking is shown for a
single representative image.
3
Model
In this work the mapping from sound to images is performed by a Kalman
filter. The implementation makes use of the toolbox written by Kevin Murphy
(http://www.ai.mit.edu/~murphyk/Software).
¹ Image from www.research.att.com/projects/AnimatedHead
Figure 2: Image with automatically extracted feature points. The facial feature
points are selected from the MPEG-4 standard
Normally, HMM’s or a neural network is used for the mapping. In the case
of HMM’s, a series of models is created, each one trained on a specific subset
of the data. At each time step the model with the highest likelihood wins and is
then responsible for producing the image. In this work the entire sequence is
considered at once and only a single state space model is trained. In the case of the
Kalman filter the model setup is as follows:

x_k = A x_{k-1} + n^x_k    (1)
s_k = B x_k + n^s_k    (2)
i_k = C x_k + n^i_k    (3)
In this setting, i_k denotes the image features at time k, s_k the sound features, and
x_k a hidden variable without physical meaning; x can be thought of as some
kind of brain activity controlling what is said. Each equation has an i.i.d. Gaussian
noise component n added to it.
During training both sound and image features are known, and the two
observation equations can be collected into one:

( s_k )   ( B )         ( n^s_k )
( i_k ) = ( C ) x_k  +  ( n^i_k )    (4)
By using the EM algorithm [4, 8] on the training data, all parameters
{A, B, C, Σx, Σs, Σi} can be found, where the Σ’s are the diagonal covariance matrices
of the noise components.
When a new sound sequence arrives, Kalman filtering (or smoothing) can
be applied to equations (1, 2) to obtain the hidden state x. Given x, the corresponding image features can be obtained by multiplication, i_k = C x_k. If the
intermediate smoothing variables are available, the variance of i_k can also be
calculated.
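As an illustration of equations (1)–(3) and the filtering step, the prediction of image features from sound can be sketched in a scalar version. This is a toy sketch only: the actual system uses matrices and EM-estimated parameters, whereas the parameter values below are made up for illustration.

```python
def kalman_filter_images(sounds, A, B, C, q, r, m0=0.0, P0=1.0):
    """Filter the sound observations s_k (eqs. 1-2) to estimate the
    hidden state x_k, then read out image features i_k = C x_k (eq. 3).
    Scalar version: q and r are the process and observation noise
    variances; m0, P0 are the prior mean and variance of x."""
    m, P = m0, P0
    images = []
    for s in sounds:
        # predict: x_k = A x_{k-1} + n_x
        m, P = A * m, A * A * P + q
        # update with the sound observation s_k = B x_k + n_s
        K = P * B / (B * B * P + r)          # Kalman gain
        m, P = m + K * (s - B * m), (1.0 - K * B) * P
        images.append(C * m)                  # i_k = C x_k
    return images
```

With small noise variances the filtered state quickly locks onto the true trajectory, so the predicted image features track C times the true hidden state.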
4
Results
The data used is taken from the VidTIMIT database [18]. The database contains
recordings of a large number of people, each uttering ten different sentences while
facing the camera. The sound recordings are degraded by fan noise from the
recording PC. In this work a single female speaker is selected; thus 10 different
sentences are used, nine for training and one for testing.
To find the dimension of the hidden state (x), the optimal parameters for
the KF were found for varying dimensions. For each model the likelihood on the
training and test sequences was calculated; the results are shown in Fig. 3 and
Fig. 4.
The test likelihood provides a statistical measure of the quality of the model
and provides a way of comparing models. This allows comparison between
different modeling approaches. Unfortunately the likelihood is not necessarily a
good measure of the quality of a model’s predictions. If the distributions in the
model are broad, i.e. the model has high uncertainty, it can describe data well
but is not a good generative model.
Looking at the results in Fig. 3 and Fig. 4 it is seen that the likelihood of the
KF has a peak on the test data around 40 hidden states. In Fig. 5 snapshots from
the KF sequence are provided for visual inspection; the entire sequence is available
at http://www.imm.dtu.dk/~tls/code/facedemo.php, where other demos can also
be found.
No precise metric exists for the evaluation of synthesized lip sequences. The distance between facial points in the true and the predicted image would be one
way; another would be to measure the distance between the predicted feature vector and the feature vector extracted from the true image. However,
the ultimate evaluation of faces can only be provided by human interpretation. Unfortunately it is difficult to get an objective measure this way. One
possibility would be to have a hearing-impaired person lipread the generated
sequence; another to let people try to guess which sequence was real and which
was computer generated. Unfortunately, such tests are time and labor demanding. Furthermore, these subjective tests do not provide an error function that
can be optimized directly.
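The first of these proposals, the distance between facial points, can be made concrete as a simple mean-Euclidean-distance score. This is only an illustrative sketch; the paper does not define such a metric, and the function name is hypothetical.

```python
import math

def mean_landmark_error(true_points, pred_points):
    """Mean Euclidean distance between corresponding facial feature
    points of the true and the predicted image (lower is better).
    Points are (x, y) tuples in the same coordinate frame."""
    assert len(true_points) == len(pred_points)
    total = 0.0
    for (tx, ty), (px, py) in zip(true_points, pred_points):
        total += math.hypot(tx - px, ty - py)
    return total / len(true_points)
```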
5
Conclusion
A speech-to-face mapping system relying on state space models is proposed. The
system makes it possible to easily train a unique face model that can be used
to transform speech into facial movements. The training set must contain all
sounds and corresponding face gestures, but there are no language or phonetic
requirements on what the model can handle.
In this work a linear model is used, but future work will investigate the use
of nonlinear models by means of particle filtering and Markov Chain Monte Carlo
methods. Other future work includes extracting emotions from the speech and
mapping them to the face.
Figure 3: The likelihood evaluated on the training data. The Kalman filter is
able to utilize the extra dimension to improve the training result.
Figure 4: The likelihood evaluated on the test data. The Kalman filter improves
performance as more hidden dimensions are added, and overfitting is seen for
high numbers of hidden states.
(a)
(b)
(c)
Figure 5: Characteristic images taken from the test sequence when using the
Kalman filter. The predicted face is to the left and the true face to the right.
References
[1] Matthew Brand. Voice puppetry. In Proceedings of the 26th annual conference on Computer graphics and interactive techniques, pages 21–28. ACM
Press/Addison-Wesley Publishing Co., 1999.
[2] Christoph Bregler, Michele Covell, and Malcolm Slaney. Video rewrite:
driving visual speech with audio. In Proceedings of the 24th annual conference on Computer graphics and interactive techniques, pages 353–360.
ACM Press/Addison-Wesley Publishing Co., 1997.
[3] T.F. Cootes, G.J. Edwards, and C.J. Taylor. Active appearance models.
Proc. European Conference on Computer Vision, 2:484–498, 1998.
[4] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from
incomplete data via the EM algorithm. JRSSB, 39:1–38, 1977.
[5] S. Dupont and J. Luettin. Audio-visual speech modelling for continuous
speech recognition. IEEE Transactions on Multimedia, 2000.
[6] T. Ezzat and T. Poggio. Mike talk: a talking facial display based on
morphing visemes. Proc. Computer Animation IEEE Computer Society,
pages 96–102, 1998.
[7] F. Lavagetto. Converting speech into lip movements: A multimedia telephone for hard of hearing people. IEEE Trans. on Rehabilitation Engineering, 3(1), 1995.
[8] Z. Ghahramani and G.E. Hinton. Parameter estimation for linear dynamical systems. Technical report, 1996. University of Toronto, CRG-TR-96-2.
[9] William Goldenthal, Keith Waters, Thong Van Jean-Manuel, and Oren
Glickman. Driving synthetic mouth gestures: Phonetic recognition for
faceme! In Proc. Eurospeech ’97, pages 1995–1998, Rhodes, Greece, 1997.
[10] Pengyu Hong, Zhen Wen, and Thomas S. Huang. Speech driven face animation. In Igor S. Pandzic and Robert Forchheimer, editors, MPEG-4
Facial Animation: The Standard, Implementation and Applications. Wiley, Europe, July 2002.
[11] T. Lehn-Schioler, L. K. Hansen, and J. Larsen. Mapping from speech to images using continuous state space models. In Joint AMI/PASCAL/IM2/M4
Workshop on Multimodal Interaction and Related Machine Learning Algorithms, April 2004.
[12] J. P. Lewis. Automated lip-sync: Background and techniques. J. Visualization and Computer Animation, 2, 1991.
[13] Dominic W. Massaro, Jonas Beskow, Michael M. Cohen, Christopher L.
Fry, and Tony Rodriguez. Picture my voice: Audio to visual speech synthesis using artificial neural networks. Proc. AVSP 99, 1999.
[14] I. Matthews, T.F. Cootes, J.A. Bangham, S. Cox, and R. Harvey. Extraction of visual features for lipreading. Pattern Analysis and Machine
Intelligence, IEEE Transactions on, 24(2):198 –213, 2002.
[15] David F. McAllister, Robert D. Rodman, Donald L. Bitzer, and Andrew S.
Freeman. Speaker independence in automated lip-sync for audio-video communication. Comput. Netw. ISDN Syst., 30(20-21):1975–1980, 1998.
[16] H. McGurk and J. W. MacDonald. Hearing lips and seeing voices. Nature,
264:746–748, 1976.
[17] Satoshi Nakamura. Statistical multimodal integration for audio-visual
speech processing. IEEE Transactions on Neural Networks, 13(4), july
2002.
[18] C. Sanderson and K. K. Paliwal. Polynomial features for robust face authentication. Proceedings of International Conference on Image Processing,
3:997–1000, 2002.
[19] M. B. Stegmann. Analysis and segmentation of face images using point
annotations and linear subspace techniques. Technical report, Informatics and Mathematical Modelling, Technical University of Denmark, DTU,
Richard Petersens Plads, Building 321, DK-2800 Kgs. Lyngby, August
2002. http://www.imm.dtu.dk/pubdb/p.php?922.
[20] B.J. Theobald, S.M. Kruse, J.A. Bangham, and G.C. Cawley. Towards a
low bandwidth talking face using appearance models. Image and Vision
Computing, 21(13-14):1117–1124, 2003.
[21] Jay J. Williams and Aggelos K. Katsaggelos. An hmm-based speech-tovideo synthesizer, 1998.
Live Interpretation of Conductors’ Beat Patterns
Declan Murphy
Computer Science Department
University of Copenhagen
[email protected]
Abstract

A method is presented for following and interpreting the conductor’s gestures known as beat patterns. Taking knowledge of the expected beat pattern and observed baton coordinates as input, the system keeps track of where the baton is in relation to the beat pattern. Tempo, dynamic level and registration updates are made on the fly according to deviations of the observed coordinates from those expected based on the score. A suitable mathematical model of the conducting process is constructed, and an algorithm for following and interpreting the execution of the beat patterns is developed.

1 Introduction

The work of this paper builds upon a previous computer vision technique to track an unadorned conductor’s baton [6] and takes a step towards a more complete conducting system. The camera-view location of the baton’s tip is available in real time from the tracker, so the next task is to be able to follow and interpret the conductor’s beat pattern as executed by the user. Deviations of the user’s conducting from that expected for the music are to modulate the playing tempo, dynamics and articulation.

Rudolf [9] elucidated the structure of the conductor’s beat patterns and how they have a sort of grammar. The task in this sense is to parse the execution of the beat patterns as they unfold on the fly in order to extract the expressive information. A symbolic understanding of the conductor’s gestures is attainable by correlating the tracked motion with the standard conductors’ beat patterns, some of which are shown in Fig. 1.

While it ought to be possible to recognize all beat patterns (allowing for some delay) without any a priori expectation of which particular pattern it should be [10, 2], this approach would be neither necessary nor realistic. In practice, both the musicians and the conductor are quite familiar with the score before a performance. They all have a fair idea of how the music ought to turn out and of what to expect from each other. It is simply inconceivable that a real conductor would issue a 3-beat pattern where a 4-beat one would have been appropriate: such a degree of inconsistency would just never work in practice. The musicians do not need to be told how many beats are in a bar! Even if they did, it would be too late by the time they got the message. It is rather how the particular beat pattern is executed that conveys the important information, and accordingly it is the deviation of the executed beat pattern from what would have been expected that is to be recognized, not the beat pattern per se. It is therefore reasonable to annotate a score file with timing, dynamic and articulation indications, just as a real musician’s score is, rather than using only raw MIDI values.

The conductor has a conception of an ideal performance interpretation, to which s/he instructs the players to keep as close as possible. This is done by issuing instructions to compensate for any drift from this ideal, in order that the performance should tend to return to a presumed state of perfection. For example, if the conductor considers that the musicians are playing too quickly, then a beat pattern will be executed more slowly than would otherwise have been expected. It is important to understand that the role of the conductor is not to play the orchestra as such, but instead to modulate their playing towards a coherent ideal interpretation. There are no absolute rules or values to be measured: it is instead a question of establishing expectation and recognizing subsequent deviation from it. (On a cognitive science note, as discussed in [5, Ch. 6], it may be argued at length that this process of establishing and maintaining expectation while
Den 13. Danske Konference i Mønstergenkendelse og Billedanalyse, København, 19–20./8-04
deviating from it is what music is all about.
It seems appropriate then that the meta-process
of modulating performance should also be of this
nature.)
The model then involves a representation of the
score with beat pattern annotations, and a system
for recognising the deviations from the expected
tempo and volume as indicated by the sampled
baton locations. These deviation data can then be
used to modify live generated output.
2 Nature of the Task
Even though the general form of the beat pattern
may be known beforehand, the actual performed
beat pattern will in general deviate from this
curve’s exact shape, size, registration and rotation.
Some of this deviation is perfectly natural and
should not have any bearing on the output, but
of course the whole point is that certain other
deviations are significant. It becomes important
to be able to recognize when a drift from the
abstract pattern is just the natural variation of this
particular instance of it and when it is a message
to alter musical playback. The difficulty is
compounded by the fact that this recognition
must take place in real time and that the
beat-pattern interpreter only receives periodic
updates of the tip’s position: just isolated sample
points with which to determine a great deal of
information on the fly. To make matters even
worse, the score values for the nominal expected
size and speed of the beat patterns change, e.g., as
the music gets softer and slows down. In practice,
the beat pattern follower can only rely on receiving
sample point updates at the relatively coarse rate
of 15 per second. Figure 2 illustrates how this
coarseness introduces temporal uncertainty.
The simulation and recognition of cues will not
be attempted here. It may safely be expected that
a programmed computer will respond without fail
at the appropriate time: it will not forget or get lost
just because the previous thirty-seven bars have
been rests, for example. The expressive timing of
a voice’s entry is relevant though, and this may be
dealt with as a special case of articulation.
2.1 Previous Considerations
Fifteen samples per second would not be good
enough at all for satisfactory temporal resolution
if the baton were to play the music. The resolution
of MIDI Time Code (MTC) is 1–2 ms, for example.
Figure 2: Illustrating the inherent temporal
uncertainty of following a beat pattern with
only periodic updates. In most beat patterns,
the first beat occurs at the bottom of a down
stroke such as in the template on the left. In
the scenario on the right, the last few points
are marked by red crosses and the point marked
“Now” has just occurred. The task involves
determining the beat point as accurately as
possible, giving rise to the question of where
to expect the next sample. It could be that the
tip is still traveling downwards as suggested in
green; it could be that the tip has just bottomed
out as suggested in blue; it could be that the tip
will bottom out before the next sample arrives
as in orange. The system cannot predict the
future with certainty, but yet it will most likely
be too late if it waits until the next sample
arrives.
Since conducting operates by regulating at a supra-note level, this ought to be just about good enough.
If the sample rate were much lower there
would be a pathological loss of information: the
system would break down. There is a lower
bound specified by the Nyquist sampling theorem
(a corollary of Shannon’s far-reaching theorem
about entropy in information theory [11]), but
greater resolution is required for determining the
current location along the beat pattern and for
simultaneously recognizing deformations of the
template.
Some considerations which were instrumental
in formulating the template and designing the
following algorithm were:
[Figure 1 shows six 4-beat patterns: light staccato, full staccato, neutral legato, legato poco espressivo, legato mezzo espressivo and legato molto espressivo; the legend marks points the baton passes through without stopping, points where the baton stops, tense controlled movement, the field of beating, quick flicks and bouncing.]
Figure 1: The range of (more or less) standard 4-beat patterns ranging from light staccato to molto espressivo according to Rudolf [9]. Note the increasing lack of synch points.
1. The major phases of all of the beat patterns
may be recognized by the tip’s rough position
q and approximations to its first and second
time derivatives q̇, q̈. If there were on-board
accelerometers, or if numerical computation
of derivatives were not inherently unstable
([1], [5, §A.1.2]), then this would be a
preferred way to proceed. It would offer
automatic accommodation of individual style
and freestyle, morphing between levels of
expressivity, ease of encoding and efficient
computation.
2. If the spatial match is very good, then
the tempo should be adjusted to match the
conductor.
3. If q lies significantly outside of the expected
curve, signal a crescendo; if q lies inside, a
diminuendo.
4. Extrema may be used to decisively update
scale, translation and tempo. In between
extrema, there must be anticipation in
order for the system concept to work.
This anticipation should be based on the
expectation derived from the score file, viz.
crescendo, ritardando, etc.
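The first consideration relies on approximating q̇ and q̈ from the 15 Hz samples. A central-difference sketch is shown below; it is illustrative only, since the instability of numerical differentiation noted above is exactly why raw differences like these would need smoothing in practice.

```python
def derivatives(q, dt):
    """Central-difference estimates of velocity and acceleration for a
    sequence of 2D tip positions q sampled every dt seconds.
    Returns (qdot, qddot) for the interior samples q[1:-1]."""
    qdot, qddot = [], []
    for k in range(1, len(q) - 1):
        (x0, y0), (x1, y1), (x2, y2) = q[k - 1], q[k], q[k + 1]
        # first derivative: (q[k+1] - q[k-1]) / (2 dt)
        qdot.append(((x2 - x0) / (2 * dt), (y2 - y0) / (2 * dt)))
        # second derivative: (q[k+1] - 2 q[k] + q[k-1]) / dt^2
        qddot.append(((x2 - 2 * x1 + x0) / dt ** 2,
                      (y2 - 2 * y1 + y0) / dt ** 2))
    return qdot, qddot
```

At dt = 1/15 s even small pixel noise in q is amplified by 1/dt² ≈ 225 in q̈, which illustrates why the position-plus-derivatives route was not preferred.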
2.2 Clarification of the Task
The system is given:
• snapshot samples of q(ti ) ∈ R2 , the tip in
flight, for some discrete time values ti ,
• a score file annotated with indications of beat
patterns, dynamics, tempo, etc.,
• a template of the beat pattern as a parametric
continuous closed curve p in R2 .
The system must calculate:
• q’s parametric position in the template in
order to determine the conductor’s tempo,
• updates of the scale factor of the template in
order to determine the conductor’s dynamics,
• updates of the translation of the template in
order to continue to match the conductor’s
execution with the standard template
successfully and accurately.
The rotation of the template shall be assumed
to be fixed. The camera can simply be rotated if
necessary, but even this simple once-off calibration
should not normally be necessary. People do not
normally lose their sense of up and down, so
rotation matching is considered not worth
implementing. Without rotation, any drift or
translation of the conductor’s beating may be
completely described by a vector b ∈ R2.
2.3 Representation of the Template
The first approach considered was to encode each
of the standard beat patterns (from Rudolf [9]) as
normalized continuous closed parametric curves p :
[0, 1) → R2 constructed out of simple polynomials
as splines. To this would be added a time curve
κ : [0, 1) → [0, 1) such that the composition p ◦ κ
maps at the same rate as the gesture.
This idea of standardizing the beat patterns
with simple polynomials was replaced by tailored
splines constructed from recordings of the user’s
execution of the beat pattern. Tailored splines were
preferred for a couple of reasons. Firstly, there is
a natural variation of style of execution between
conductors, between text books on the subject, and
between pieces with the same conductor. Secondly,
although the first approach was more elegant from
an analytic point of view, the recorded splines
are more practical from a pragmatic engineering
perspective. This approach also allows the user to
pre-program special articulation, such as off-beat
entry cues, with relative ease.
The convenience of considering the template’s
spatial p and temporal κ dimensions separately was
retained for several reasons:
• much more smooth and accurate splines are
attained in regions of low baton movement
(otherwise spurious points linger in a small
neighbourhood which should only have one
point),
• invertability is not feasible otherwise (since
some patterns require the baton to hold
steady) and the guaranteed invertability of
κ allows a more efficient implementation of
locally inverting p ◦ κ,
• direct independent scaling of size (amplitude)
and speed (tempo),
• easier analytic simulation,
• easier analysis and adjustment of recorded
templates.
p : [0, 1) → R2 is continuous, and p is piecewise C1. For each beat pattern template, there exist points ρ0, . . . , ρnr−1 ∈ [0, 1) such that p is not differentiable at κ(ρj), 0 ≤ j < nr. These r = {ρj, 0 ≤ j < nr} are the most visually salient points of the beat pattern to the human eye, called "rough" points in this paper. There also exist σ0, . . . , σns−1 ∈ [0, 1) such that, for all σj ∈ s = {σj, 0 ≤ j < ns}, a beat point occurs at p ∘ κ(σj) and either σj ∈ r, ṗx(κ(σj)) = 0 or ṗy(κ(σj)) = 0, where p(t) = (px(t), py(t)) and the dot denotes the time derivative as usual. In other words, a beat occurs at p(κ(σj)), a point that may be located accurately in both space and time. The σj ∈ s will be called "synch" points. Figure 3 shows a legato-espressivo 4-beat pattern with its rough points and synch points marked.

Figure 3: The most salient points to the eye are the non-differentiable ones ρ0, ρ1, ρ2. They are also the points about whose location the system can be most sure. Beats number 2, 3 occur at turning points or alternatively at zero-crossings of the x-axis; beats number 1, 4 occur at local minima of y and are therefore synch points.
3
Model and Algorithm
3.1
Model
Each beat pattern has its own template {p, κ, r, s} where r = {ρ0, . . . , ρnr−1} is the set of rough points and s = {σ0, . . . , σns−1} are the synch points. The template curve p is centered about the origin 0 ∈ R2, which corresponds to the intersection of the two axes which define the conductor's field of beating. It is also normalized with respect to the templates for the other beat patterns to mf. The template time function κ is both continuous and bijective on [0, 1). It is proved in [5, §A.1.1] that κ must therefore be monotonic. Without loss of generality, κ(0) = 0 and lim t→1 κ(t) = 1. r = {t ∈ [0, 1) : p ∘ κ is not C1 at t} is a finite set (by virtue of the shape of the beat patterns), so without loss of generality its elements can be ordered such that j < k implies ρj < ρk, 0 ≤ j, k < nr. The set s shall be ordered similarly.
The normalized template is to be scaled and translated to fit to the incoming data, so a point p(κ(t)) for some κ(t) will have the general form ap + b. a ∈ (0, ∞) scales the template closed curve p[0, 1), centered about the origin, as a multiplicative scalar. b ∈ R2 translates the scaled template as an additive vector. The remaining variable in the model is the current tempo m ∈ [0, ∞), where one bar lasts m seconds so that, for the example in Fig. 4, if there are nominally 60 crochet beats per minute and two such beats per bar then m = 2. The expected location of q at time t can now be written as

    Et(q) = a p(κ((t mod m(t)) / m(t))) + b.

The algorithm consists of comparing the observed value q(ti) with Eti(q)|ti−1 at each time step, and updating m, a, b such that q(ti) ≈ Eti(q)|ti. This procedure can now be stated formally.

Figure 4: The musician readable score file for the simulation.

3.2
Have
The system is equipped with the following information:
• p : [0, 1) → R2 : t ↦ (px(t), py(t)), limt→1 p(t) = p(0): piecewise differentiable, continuous, closed, parametric curve in R2.
• κ : [0, 1) → [0, 1), κ(0) = 0: continuous, piecewise differentiable, bijective time spline.
• r = {ρ0, . . . , ρnr−1} ⊂ [0, 1) with ρj < ρk for 0 ≤ j < k < nr. p is differentiable on the interval κ(ρj, ρj+1) but p is non-differentiable at the point κ(ρj). p is also differentiable on κ(0, ρ0) and κ(ρnr−1, 1).
• s = {σ0, . . . , σns−1} ⊂ [0, 1) with σj < σk for 0 ≤ j < k < ns. ∀ σj ∈ s, p(κ(σj)) is a beat point and either σj ∈ r, ṗx(κ(σj)) = 0 or ṗy(κ(σj)) = 0.
• ms(ti), as(ti) ∈ R+: the nominal score values for tempo and dynamics at time ti.
• mc(ti−1), ac(ti−1) ∈ R+, bc(ti−1) ∈ R2: the previous conductor's values for tempo, dynamics and offset respectively.
3.3
Know
The trace of hypothetical neutral conducting, following only the score, is given in the model by

    s(t) = as(t) (p ∘ κ)((t mod ms(t)) / ms(t)).

The expected location of the conductor's trace at time ti is based on the most recent values of the tempo, dynamics and offset according to the conductor, mc(ti−1), ac(ti−1), bc(ti−1). This information is combined with the values from the score for the current time, to effect any changes. Let

    ā(ti) = ac(ti−1) as(ti) / as(ti−1),   m̄(ti) = mc(ti−1) ms(ti) / ms(ti−1).   (1)

Then the locus of q(τ) as calculated at ti is modelled as

    c̄ti(τ) = ā(ti) (p ∘ κ)((τ mod m̄(ti)) / m̄(ti)) + bc(ti−1),   (2)

and the best estimation of q at ti is given by c̄ti(ti).
3.4
Want
For every time step ti, the algorithm has to calculate the most plausible values for mc(ti), ac(ti), bc(ti) such that, with these new values, q(ti) lies on (or very nearly on) the modelled conductor's trace. This may be stated as cti(ti) ≈ q(ti) where

    cti(τ) = ac(ti) (p ∘ κ)((τ mod mc(ti)) / mc(ti)) + bc(ti).   (3)

In other words, c̄ti(ti) are the expected points using the latest information from the last frame and the current nominal score values, and cti(ti) are the adjusted points.
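As a concrete illustration of Eqn 2, the modelled locus is just the template evaluated at bar-relative time, scaled and offset. A minimal Python sketch, with toy stand-ins for the splines p and κ (the names and the toy curves are mine, not taken from the paper's implementation):

```python
import math

def p(t):
    """Stand-in spatial template: a closed curve on [0, 1) (illustrative only)."""
    return (math.cos(2 * math.pi * t), math.sin(2 * math.pi * t))

def kappa(t):
    """Stand-in time spline: continuous, bijective on [0, 1), kappa(0) = 0."""
    return t * t  # monotonic on [0, 1)

def expected_point(tau, m_bar, a_bar, b):
    """Eqn 2: c̄(tau) = a_bar * (p ∘ kappa)((tau mod m_bar) / m_bar) + b."""
    s = (tau % m_bar) / m_bar          # bar-relative time in [0, 1)
    x, y = p(kappa(s))
    return (a_bar * x + b[0], a_bar * y + b[1])
```

At each frame the observed baton position q(ti) would be compared against expected_point(ti, ...) before updating the tempo, amplitude and offset.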
Figure 5: Updating the tempo mc(ti), the first task is to locate tq. In the case on the left, q(ti) is away from any rough points and so c̄ti(tq) may safely be determined as the closest point on c̄ti(τ) in a neighbourhood of q(ti). On the right however, q(ti) is close to c̄ti(ρj) so that c̄ti(tq) must be found in a neighbourhood of c̄ti(ti). Note that c̄ti(tq) is not the closest point to q on c̄ti(τ) marked "×", which would be much further away from c̄ti(ti) in time.

Figure 6: Adjusting the amplitude so that q(ti) ∈ cti[0, 1). Since q(ti) lies in advance of c̄ti(ti), cti(ti) is scaled via mc(ti) to be closer to q(ti).

3.5
First Step: Try to Align Tempo
In the simple case, only the tempo needs to be updated slightly. The first task is to find tq such that, within a certain interval (a, b),

    ‖q(ti) − c̄ti(tq)‖2 = min_{a≤τ≤b} ‖q(ti) − c̄ti(τ)‖2.   (4)

If q is not in the vicinity of a rough point, i.e., ∀ ρ ∈ r, ‖q(ti) − c̄ti(ρ)‖2 > a dr for an implementation-dependent constant dr ∈ R+, then (a, b) might just as well be [0, 1), except for numerical efficiency. If, say, ‖q(ti) − c̄ti(ρj)‖2 ≤ a dr for some 0 ≤ j < nr, then let t̂i = (ti mod m̄(ti)) / m̄(ti) and choose j′ ∈ {j, j − 1 mod nr} such that ρj′ < t̂i < ρj′+1 mod nr, where the choice of j′ depends on whether or not c̄ti(ti) has passed c̄ti(ρj) yet in this cycle. Now a, b are chosen such that (a, b) ⊂ (ρj′, ρj′+1 mod nr) as illustrated in Fig. 5.
3.6
Basic Case: Tempo Only
Now that tq has been calculated, the remaining task is to adjust mc(ti) so as to tend to align tq and ti. The distance in time which has to be aligned is factored over the length of a beat at tempo m̄(ti). Let l be the denominator of the time signature for the current bar, and as before let

    t̂i = (ti mod m̄(ti)) / m̄(ti).   (5a)

Then the updated tempo is

    mc(ti) = m̄(ti) / (m̄(ti) + (tq − t̂i) l).   (5b)

3.7
Next Case: Tempo and Amplitude
The criterion for needing to adjust the amplitude ac(ti) is that q is too far away from the expected locus, formally that ‖q(ti) − c̄ti(tq)‖2 ≥ da log(ā(ti) + 1) for an implementation-dependent constant da ∈ R+. If this is not the case then the amplitude is simply updated as expected by putting ac(ti) = ā(ti).
The task is to adjust ac(ti) by the least amount such that q(ti) ∈ cti[0, 1). This can be achieved without undue inefficiency by the technique known as bracketing and bisecting. Figure 6 sketches the scenario. The prior adjustment of mc(ti) carries through to the new scale and does not need to be recalculated.
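Transcribing Eqns 5a-5b as printed, the tempo update amounts to a couple of lines; a sketch (variable names are mine):

```python
def update_tempo(t_i, t_q, m_bar, l):
    """Eqns 5a-5b as printed: adjust the tempo m_c so as to tend to align
    t_q and t_i; l is the denominator of the current time signature."""
    t_hat = (t_i % m_bar) / m_bar                 # Eqn 5a: bar-relative time
    return m_bar / (m_bar + (t_q - t_hat) * l)    # Eqn 5b
```

Note that when tq lies ahead of t̂i the denominator grows, shrinking mc(ti): the bar becomes shorter in seconds, i.e. playback speeds up to catch up with the conductor.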
3.8
Full Case: Offset Adjustment
When the conductor's time passes through a rough point ρ ∈ r, any necessary adjustment of the registration b is made. Formally the condition is cti(ρ) ∈ cti(ti, ti+1) for some ρ ∈ r, but the implementation should perform the test in [0, 1) rather than along a curve in R2 for reasons of both accuracy and efficiency. Let

    t′i = (ti mod mc(ti)) / mc(ti)   and   t′i+1 = (ti+1 mod mc(ti)) / mc(ti).   (6a)

Then the condition is that

    ∃ ρ ∈ r such that ρ ∈ (t′i, t′i+1)            if t′i < t′i+1,
                      ρ ∈ (t′i, 1) ∪ [0, t′i+1)   if t′i > t′i+1.   (6b)
(6b)
The hypothesis here is that rough points will
always be correctly placed spatially. Say that
ρj ∈ r satisfies the above condition (Eqn 6b),
and let ρj = ρj−1 mod nr . First the amplitude
ac (ti+1 tρj ) is set such as to satisfy
/
/
/
/
/ac (tρj )p κ(ρj ) − ac (tρj )p κ(ρj ) /
2
/
/
/
/
= /q(tρj ) − q(tρj )/ , (7)
2
which reduces to a somewhat messy but readily
solvable quadratic in ac (tρj ).
Any necessary
updating of the score amplitude value as is
propagated through the time steps from tρj to tρj
by ā in Eqns 1, 2, 3 and is implicit in Eqn 7 as such.
It only remains to set b which is given simply by
rearranging ap + b = q:
bc (ti+1 tρj ) = q(tρj ) − ac (tρj )p κ(ρj ) . (8)
In particular, this procedure ensures that the x-component of ct(τ) is set on the upbeat-downstroke of each beat pattern.
The case for the conductor's time passing through a synch point is treated analogously except that the hypothesis only assumes reliable spatialization along the axis that has the extremum. Say that σj ∈ s satisfies the above condition in Eqn 6b and that ṗy(κ(σj)) = 0. Let

    sy = {σ ∈ s : (d/dt) py(κ(σ)) = 0}

and let

    ρj′ = max{ρ ∈ r ∪ sy : ρ < σj}  if it exists,
    ρj′ = max{r ∪ sy}               otherwise.

Then the equation for calculating the amplitude ac(tσj) becomes

    ac(tσj) py(κ(σj)) − ac(tρj′) py(κ(ρj′)) = qy(tσj) − qy(tρj′),   (9)

where q(t) = (qx(t), qy(t)), instead of Eqn 7.
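At a rough point the registration update of Eqn 8 is a direct rearrangement of ap + b = q, applied componentwise; a sketch under the hypothesis that rough points are correctly placed spatially (names are illustrative):

```python
def update_offset(q_rho, a_c, p_kappa_rho):
    """Eqn 8: b_c = q(t_rho) - a_c * p(kappa(rho)), componentwise in R^2."""
    return (q_rho[0] - a_c * p_kappa_rho[0],
            q_rho[1] - a_c * p_kappa_rho[1])
```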
4
Implementation
The composition function p◦κ needs to be inverted
at selected points in a given neighbourhood in R2
in order to determine tq from Eqn 4. It is actually
a form of bracketing and bisecting which was also
used for determining ac (ti ) in §3.7. Because of
the low image resolution used, it takes only a
few iterations to establish the result in the
implementation. For most beat patterns, p is not
a simple curve, i.e., it crosses itself and is not
injective, but it is injective on the interval (a, b)
of Eqn 4. The time spline κ does not need to
be inverted, only the spatial spline p. By setting constants κa, κb ∈ [0, 1) so that

    κa = κ((a mod m̄(ti)) / m̄(ti)),   κb = κ((b mod m̄(ti)) / m̄(ti)),

only half the amount of computation is required in order to calculate c̄ti between κa and κb in Eqn 2.
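The local inversion on the interval (a, b) of Eqn 4 can be sketched as a bracket-and-bisect search; the ternary-search variant below is one possible reading, assuming (as the text states for the interval in question) that the curve is injective there and the distance to q has a single minimum:

```python
def nearest_parameter(q, curve, a, b, iters=20):
    """Locate t_q in (a, b) minimizing |q - curve(t)| by bracketing and
    bisecting; assumes a single minimum of the distance on (a, b)."""
    def dist2(t):
        x, y = curve(t)
        return (x - q[0]) ** 2 + (y - q[1]) ** 2
    lo, hi = a, b
    for _ in range(iters):          # a handful of iterations suffices
        m1 = lo + (hi - lo) / 3     # at low image resolution
        m2 = hi - (hi - lo) / 3
        if dist2(m1) < dist2(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2
```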
The model and algorithm were implemented in
C in order to maximize real-time performance
by minimizing the overhead of processing time.
C was also chosen for portability reasons: this
code was developed in particular to be directly
incorporated into the PatternPlay system [5, 4] or
to be encapsulated as an EyesWeb module [12],
both systems being platforms for investigating the
use of gestural expression for making music. The
code is open source and could provide the basis
for many other conducting or gesture recognition
systems. It is available from [7] and on the CD-ROM accompanying [5].
In order to make up a complete basic conducting
system, the remaining system components are: a
baton tracker [6], a facility to record the user’s
own beat patterns, a sequencer for live output and
a module for imposing performance patterns onto
the score file. The module for recording the user’s
beat patterns has to automatically locate the rough
points, mark the synch points, and integrate them
into a score file database. The module for imposing
performance patterns is implemented by means of
a general framework for structural navigation and
manipulation based on topos theory [3]. The full
system is described in [5, §16.3.2].
4.1
Simulation
The algorithm presented in §3 was, at least in its finer details, worked out by building a simulation. The level of detail is somewhat less than a full implementation, but there should be no new problems in principle scaling up to a full system. A hypothetical beat pattern p(t) was constructed simply by

    p(t) = (2t − 4t²)i + 2tj   for 0 ≤ t ≤ 1/2,
    p(t) = 2(1 − t)j           for 1/2 ≤ t < 1,   (10)
where i, j are the usual basis vectors in R2 . This
pattern along with some simulation data is plotted
in Fig. 7. The associated time spline κ is plotted in
Fig. 8. Figure 4 shows the musician-readable score
for which this simulation was run.
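The pseudo pattern of Eqn 10 is straightforward to reproduce; a sketch:

```python
def p_sim(t):
    """Eqn 10: hypothetical beat pattern on [0, 1), returned as the (x, y)
    components along the usual basis vectors i, j."""
    if t <= 0.5:
        return (2 * t - 4 * t * t, 2 * t)   # (2t - 4t^2) i + 2t j
    return (0.0, 2 * (1 - t))               # 2(1 - t) j
```

The two branches meet at t = 1/2, where both give (0, 1), and the curve closes since p(t) tends to (0, 0) = p(0) as t approaches 1.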
Articulation could be handled by matching beat
pattern attributes to performance parameters. It is
not incorporated into the work of this paper as it is
envisaged that the user would manually make these
associations, whether they be global for an entire
piece or once off for a single bar. Cues for expressive
timing of voice entry could also be implemented as
special cases of articulation.
Figure 7: A plot of the simulation pseudo beat pattern as given by Eqn 10 and some early developmental fitting results.
4.2
Discussion
The recurring pattern of

    f : (m, t) ↦ (t mod m(t)) / m(t)

in several of the equations behaves well in that f is onto [0, 1). f(m, t)'s points of discontinuity as t → nm for any n ∈ N match κ(τ)'s point of discontinuity when τ → 1, which together result in a seamless continuous closed curve p ∘ κ ∘ f in R2. As long as t ≠ nm for all n ∈ N, then

    ∂f/∂m (m, t) = −(t mod m) / m²,

which is suitable for the sizes and ratio of t and m. The fact that ∂f/∂m depends on t is not as bad as it may sound at first: it is q which is used to adjust m; m does not adjust t as such.

Figure 8: The time spline κ used for the simulation. This is a plot of actual spline interpolation values calculated by the implementation.

It may of course happen that ms, as change discontinuously depending on the score, which will suddenly change the system's behaviour for constant user input. However, as well as the conductor modulating the music output, there is an additional feedback loop consisting of the generated music telling the conductor what is going on with the performance: the conductor knows the instant there should be any such change and behaves accordingly. In this way, even discontinuities arising from the score should not pose any problem to an alert conductor.
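The wraparound behaviour of f discussed above is easy to check numerically; a trivial sketch:

```python
def f(m, t):
    """f(m, t) = (t mod m) / m, onto [0, 1); its jumps at t = n*m line up
    with kappa's jump as tau -> 1, so p ∘ kappa ∘ f closes seamlessly."""
    return (t % m) / m
```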
The method presented is one approach to the
problem, but other strategies are quite possible.
As mentioned at the outset, there is a certain
unavoidable uncertainty inherent in the nature of
the task. The approach presented here is chosen
as the one considered to be the most promising
of success and musically meaningful, at least until
further user feedback becomes available.
In particular, if reliable acceleration information
were available (discussed in [5, §A.1.2]) then a
different or indeed complementary method might
well prove fruitful. In-between a beat pattern's
rough points, the motion is predominantly either
up, down, left, right, or changing direction. The
motion through the beat pattern may be parsed by
knowing its approximate i, j velocity components.
These ṗx, ṗy also give the slope, which may be used
as a fitting criterion; the motion can thus also be
parsed in terms of its slope and direction.
The hypotheses used as registration criteria in
§3.8 are not guaranteed to hold. It may be that
they are hallmarks of good style; it may be that
they vary according to a conductor’s style. Such
data is not to be found in the literature. Upon
reflection and without further data, they seem to
be at least as strong as any rival hypothesis. Such
quantities as ȧc, ḃ, |b| and ṁ could be monitored
and used to regulate any adjustments, but it is
and used to regulate any adjustments, but it is
impossible in general to discern the user’s intention
from drift or noise with certainty on the fly.
One aspect of conducting which is not modelled
is how different ensembles generally require
different conducting technique from the same
conductor, and how the same ensemble may require
(or at least receive) a different style from different
conductors, by virtue of their different musical
background, preferred style, social character and
mutual rapport. A modelling of a certain sense of
personality might give the system a more human
feel, but that is left for future research.
The lack of any absolute reference frame or
absolute mensurable values for free gestures poses
a problem if the task should become playing of
music. The degree of articulateness necessary for
skillful playing of a musical instrument seems to
require both haptic and aural feedback. In terms
of using gestures captured by computer vision to
control music, this problem has been circumvented
by focusing on expectation instead. Indeed the very
lack of preciseness of the beat points of the more
expressive and rubato beat patterns is presumably
quite deliberate: to allow rubato. Rubato means
“stolen” in Italian, the semantics being that there
should be local expressive timing deviations over a
steady underlying pulse. One can steal a moment
from the start of a phrase boundary in order to stall
just before it, for example. Figure 1 shows a range
of 4-beat patterns increasing in both expressivity
and rubato.
4.3
Filtering
One of the problems encountered with previous
related work on conducting audio files [8] was
that sudden changes in audio playback rate are
distracting to the listener whereas slow changes
seem unresponsive to the user. Ideally one would
like a system which can overlook the human lack of
precision from one beat to the next but yet respond
instantly if, e.g., the user suddenly stops. A routine
for calculating the magnitude of a local deviation
was envisaged. This magnitude could be used to
reduce or stop the low pass filtering of the beat
timing.
These issues turn out to be less problematic at
the note level. The adjustment of the timing takes
place in relation to the length of the last beat
(Eqn 5b). This means that the music playback
can comfortably come to an unexpected complete
halt in one beat, a response time which rivals
an alert musician.
The distracting attributes
arising from audio manipulation such as frequency
shift, disproportionate note attack, time scaling
irrespective of note length, etc. are not a problem
when the playback is generated. When it is only
the sustain part of a note which is scaled then the
time scaling is perceived as being more natural.
Anyway, generating the music in response to the
conductor is more realistic than post-processing
a recording which already has one particular
interpretation stamped on it: the audio conductor
must undo a pre-existing interpretation before
asserting another, instead of working directly from
the score.
5
Summary
A method for live following and interpreting of
conductors’ beat patterns is presented. Before
play starts, the system has been provided with a
score which is annotated with the conductor’s own
expected neutral execution of the beat patterns.
During play, the system receives a live stream of the
baton’s coordinates. At each sample time-frame, an
updated value is output of how advanced along the
Den 13. Danske Konference i Mønstergenkendelse og Billedanalyse, København, 19–20./8-04
120
current beat pattern the conductor is, which can
be used to regulate sequenced playback accordingly.
Values are also output for the current dynamic level
and any specially marked cues.
The baton’s progress through the given beat
pattern, and more importantly any meaningful
deviation from the expected beat pattern, is
calculated by trying to fit each sample received to
the latest beat-pattern curve. At every time step,
this curve is updated in terms of tempo, dynamics
and registration. This update is based upon the
best estimation of these parameters as indicated
by the conductor in relation to their nominal score
values.
A suitable computational representation of the
beat patterns is devised under consideration of the
nature of the conducting task and of the variability
of the movement from one part of a beat pattern
to another and of the various beat patterns from
each other. Independent piecewise differentiable
splines are used for the curves through space
and time. The alignment of the expected and
indicated tempo is performed by locally inverting
the composition of the two splines. Significant drift
in or out of the curve is aligned by updating the
amplitude accordingly. Registration adjustment is
accomplished at the visually salient “rough” and
“synch” points.
The method is implemented as part of a complete
computer music conducting system, with the
source code released under the GPL [7].
References
[1] Francis Begnaud Hildebrand. Introduction to
Numerical Analysis. McGraw-Hill, New York,
second edition, 1974.
[2] M. Lee, G. Garnett, and D. Wessel.
An Adaptive Conductor Follower.
In
A. Strange, editor, Proceedings of the
International Computer Music Conference,
pages 454–455, San Francisco, USA, 1992.
ICMA.
[3] Guerino Mazzola et al. The Topos of Music:
Geometric Logic of Concepts, Theory, and
Performance. Birkhäuser Verlag, 2002. In
collaboration with Stefan Göller and Stefan
Müller.
[4] Declan Murphy. Pattern Play. In Alan
Smaill, editor, Additional Proceedings of
the Second International Conference on
10
Music and Artificial Intelligence, On-line
technical report series of the Division
of Informatics, University of Edinburgh,
Edinburgh, Scotland, UK, September 2002.
http://dream.dai.ed.ac.uk/group/
smaill/icmai/b06.pdf.
[5] Declan Murphy. Expressive Manipulation of
Musical Structure: From Gesture Tracking to
Analysis of Music. Ph.D. thesis, Copenhagen
University, 2003. In submission.
[6] Declan Murphy.
Tracking a Conductor’s
Baton. In Søren Olsen, editor, Proceedings
of the 12th Danish Conference on Pattern
Recognition and Image Analysis, volume
2003/05 of DIKU report series, pages 59–
66, Copenhagen, Denmark, August 2003.
DSAGM, HCØ Tryk.
[7] Declan Murphy. Beat Pattern Recognition.
Anonymous FTP, May 2004.
ftp://ftp.diku.dk/diku/users/declan/
beat ptn/.
[8] Declan Murphy, Tue Haste Andersen, and
Kristoffer Jensen. Conducting Audio Files
via Computer Vision. In Proceedings of the
Fifth International Gesture Workshop, LNAI,
Genoa, Italy, April 2003. Springer. In press.
[9] Max Rudolf. The Grammar of Conducting:
A Comprehensive Guide to Baton Technique
and Interpretation. Macmillan, third edition,
1993. Prepared with Michael Stern.
[10] D. Rumelhart and J. McClelland. Parallel
Distributed Processing. The MIT Press, 1986.
Two volumes.
[11] Claude Elwood Shannon. A Mathematical
Theory of Communication.
Bell System
Technical Journal, 1948.
[12] The InfoMus Lab, DIST, University of Genoa.
The EyesWeb Project. WWW, October 2003.
http://musart.dist.unige.it/
sito inglese/research/r current/
eyesweb.html.
Testing for difference between two groups of
functional neuroimaging experiments
Finn Årup Nielsen∗†, Andrew C. N. Chen‡, Lars Kai Hansen†
July 5, 2004
Abstract
We describe a meta-analytic method that tests for the difference between
two groups of functional neuroimaging experiments. We use kernel density estimation in three-dimensional brain space to convert points representing focal
brain activations into a voxel-based representation. We find the maximum in
the subtraction between two probability densities and compare its value against
a resampling distribution obtained by permuting the labels of the two groups.
As such it appears as a general method for comparing the local intensity of two
non-stationary spatial point processes. The method is applied on data from
thermal pain studies where “hot pain” and “cold pain” form the two groups.
1
Introduction
Human functional neuroimaging examines the relationship between cognitive functions and brain areas with positron emission tomography (PET) or magnetic resonance imaging (MRI) brain scanners. Experiments typically investigate a specific
brain function and determine its “activation” in the brain volume. This is usually
done by scanning multiple subjects while they are under two different conditions (e.g.,
“activation” and “rest”). Statistical analysis of the scans, often employing the general linear model, results in a statistical parametric image volume that is summarized
by the significant local maxima (Friston et al., 1995). These local maxima are presented in scientific articles by their 3-dimensional coordinates and, e.g., their z-score
or p-value.
Before the statistical analysis the brain scans are spatially normalized to a standard brain atlas, the so-called “Talairach atlas” (Talairach and Tournoux, 1988).
This allows the 3-dimensional coordinates of the local maxima — the “Talairach coordinates” — to be compared across studies.
For meta-analysis we should use the statistical parametric image volume for optimal results. Although neuroimaging databases that contain such data are beginning to appear, e.g., the fMRI Data Center (Van Horn et al., 2001) and NeuroGenerator (Roland et al., 2001), the image volumes are typically not available and
we have to resort to the Talairach coordinates.
The access to the Talairach coordinates is made easier when they are represented in
a database. Two such databases exist: The BrainMap database (Fox and Lancaster,
1994; Fox and Lancaster, 2002) and the Brede database (Nielsen, 2003). A number of
∗ Neurobiology
Research Unit, Rigshospitalet, Copenhagen
and Mathematical Modeling, Technical University of Denmark, Lyngby
‡ Center of Sensory-Motor Interaction, Aalborg University, Aalborg
† Informatics
1
122
studies have modeled the distribution of the Talairach coordinates in these databases,
e.g., (Fox et al., 1997; Nielsen and Hansen, 2002).
If the Talairach coordinates are restricted to a specific area their distribution
may be approximated with a Gaussian distribution and inference can be made with
“parametric models” (Fox et al., 1997) and, e.g., Hotelling's T² can be employed
to test for difference between two groups of coordinates (Christoff and Gabrieli,
2000; Nielsen et al., 2004). However, in many cases the distribution of the Talairach
coordinates will have several spatial modes, and therefore it has been suggested to
use Gaussian mixture models (Nielsen, 2001) or kernel density estimators (Nielsen
and Hansen, 2002; Turkeltaub et al., 2002; Chein et al., 2002; Wager et al., 2003).
The statistical analysis is usually performed in a mass-univariate setting where the
number of statistical tests corresponds to the number of voxels in the volume. This
results in a massive multiple comparison problem that is most often countered by
employing random field theory for the statistical inference (Cao and Worsley, 2001).
But permutation tests can also be used by constructing the null distribution for the
maximum statistic, where the maximum is taken across all the voxels in the statistical
parametric image (Holmes et al., 1996; Nichols and Holmes, 2001).
Below we will describe a meta-analytic method that uses permutation tests together with a maximum statistic and kernel density estimation to give a statistical
value for the difference between two groups of Talairach coordinates, thereby testing whether two groups of functional neuroimaging experiments are different. We will use
Talairach coordinates from hot and cold pain experiments. The pain modality is of
special interest for our particular method since it typically causes a spatially multimodal activation pattern where several distinct brain regions are involved: the thalamus,
the somatosensory cortex, insula and anterior cingulate cortex (Ingvar, 1999).
2
Method
A probability density volume is constructed by convolving a local maximum l (in the
following called “location”) positioned in Talairach space at xl with a 3-dimensional
Gaussian kernel with isotropic variance (Nielsen and Hansen, 2002; Turkeltaub et al.,
2002; Chein et al., 2002).
    p(x|l) = (2πσ²)^(−3/2) exp( −(x − xl)ᵀ(x − xl) / (2σ²) ),   (1)
where we select the kernel width to σ = 1 cm. When we construct the probability
density corresponding to a group of experiments p(x|g) we combine the contributions
from all the individual locations associated with the group (i.e., all the locations in
all the experiments associated with the group)
    p(x|g) = Σ_{l∈g} p(x|l) p(l|g),   (2)
where the prior is simply set to p(l|g) = 1/|Lg |, i.e., inversely proportional to the
number of locations in the g group. The continuous probability density is converted
to a vector by sampling it in a regular grid
    vg ≡ p(x|g).   (3)

In the present application we use a coarse grid of (8 mm)³, resulting in 7752 voxels.
As a statistic for the difference between two volumes (v1 and v2 ) we simply use the
subtraction performed separately for each voxel
    t = v1 − v2.   (4)
(a) Hot pain
(b) Cold pain
Figure 1: Visualization of the Talairach coordinates from hot pain and cold pain
studies in a corner cube environment (Rehm et al., 1998). The glyphs are colored
according to the experiment they belong to and are projected onto the walls. The blue
curves are the outline of the brain. The thick red lines are the axes of the Talairach
atlas and the view point makes the upper left part of the back of the brain the closest
point. The units on the axes are centimeters.
To counter the multiple comparison problem, a null distribution uses the maximum value across voxels

    t = max_i (t_i).   (5)
The null distribution of this maximum statistic is established by permutation: a distribution is built up by permuting the assignment of experiments to the two groups, resulting in two new groups v1∗ and v2∗, each comprising the same number of experiments as the original two groups. Thus the null distribution of the maximum
statistic appears as

    t∗ = max_i (v∗1,i − v∗2,i).   (6)
The permutation is randomized sufficiently many times (n = 1, . . . , N) to generate a “smooth” and stable distribution. We may assign a conservative p-value to the ith voxel by counting the proportion of times its statistic ti is exceeded by the N values of the permutation maximum statistic

    Pi = (1/N) Σ_{n=1}^{N} |ti < t∗n| .   (7)
The p-values enable us to choose a statistically based threshold in the subtraction
image.
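The permutation scheme of Eqs. (4)–(7) can be sketched as follows. This is an illustration, not the Brede Toolbox code; for simplicity each group vector is taken as the mean of its experiments' density vectors, whereas the paper pools all locations with a uniform prior:

```python
import numpy as np

def max_stat_pvalues(group1, group2, n_perm=1000, rng=None):
    """Voxel-wise p-values for t = v1 - v2 (Eq. 4) under the permutation
    null of the maximum statistic (Eqs. 5-7).

    group1, group2: (n1, V) and (n2, V) arrays with one density vector
    per experiment; the group volume v_g is their mean here.
    """
    rng = np.random.default_rng(rng)
    n1 = group1.shape[0]
    pooled = np.vstack([group1, group2])
    t = group1.mean(axis=0) - group2.mean(axis=0)       # Eq. (4)
    t_max_null = np.empty(n_perm)
    for n in range(n_perm):
        idx = rng.permutation(pooled.shape[0])          # reassign experiments
        t_star = pooled[idx[:n1]].mean(axis=0) - pooled[idx[n1:]].mean(axis=0)
        t_max_null[n] = t_star.max()                    # Eq. (6)
    # Eq. (7): share of permutation maxima exceeding each voxel's statistic
    p = (t[None, :] < t_max_null[:, None]).mean(axis=0)
    return p, t
```

The resulting P_i then provide the statistically based threshold for the subtraction image.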
To demonstrate the method we invoke data from thermal pain studies, where the
two groups are hot and cold pain. Such studies will typically apply a 45–50 °C or 0–5 °C stimulus to the subjects. The studies were added to the Brede database
Hot pain experiments:
 1. WOEXT: 183 - Hot pain (Tracey et al., 2000)
 2. WOEXT: 186 - Attended heat pain on right hand (Brooks et al., 2002)
 3. WOEXT: 187 - Distracted heat pain on right hand (Brooks et al., 2002)
 4. WOEXT: 188 - Attended heat pain on left hand (Brooks et al., 2002)
 5. WOEXT: 189 - Distracted heat pain on left hand (Brooks et al., 2002)
 6. WOEXT: 217 - Hot pain in right hand (Craig et al., 1996)
 7. WOEXT: 225 - Hot pain on left hand (group 1) (Becerra et al., 1999)
 8. WOEXT: 227 - Hot pain on left hand (group 2) (Becerra et al., 1999)
 9. WOEXT: 230 - Painful heat on right fingers (Gelnar et al., 1999)
10. WOEXT: 233 - Hot pain on right hand in rest, mental imagery and hypnosis (Faymonville et al., 2000)
11. WOEXT: 234 - Hot pain on right hand in rest and mental imagery (Faymonville et al., 2000)
12. WOEXT: 235 - Hot pain on right hand during hypnosis (Faymonville et al., 2000)
13. WOEXT: 237 - Interaction between hypnosis and hot pain on right hand (Faymonville et al., 2000)
14. WOEXT: 238 - Correlated with pain ratings in hot pain on right hand in rest, mental imagery and hypnosis (Faymonville et al., 2000)
15. WOEXT: 240 - Interaction between hypnosis and pain ratings in hot pain on right hand (Faymonville et al., 2000)
16. WOEXT: 245 - Heat pain on right arm (Tölle et al., 1999)
17. WOEXT: 246 - Positive correlation with pain threshold (Tölle et al., 1999)
18. WOEXT: 248 - Correlation with pain intensity (Tölle et al., 1999)
19. WOEXT: 249 - Correlation with pain unpleasantness (Tölle et al., 1999)
20. WOEXT: 298 - Early phase heat pain (Casey et al., 2001)
21. WOEXT: 299 - Late phase heat pain (Casey et al., 2001)
22. WOEXT: 312 - Heat pain on left hand (Vogt et al., 1996)
23. WOEXT: 314 - Heat pain on left volar forearm (Adler et al., 1996)
24. WOEXT: 319 - Heat pain on left arm (Casey et al., 1996)

Cold pain experiments:
 1. WOEXT: 182 - Cold pain (Tracey et al., 2000)
 2. WOEXT: 184 - Cold pain in left hand (Petrovic et al., 2000)
 3. WOEXT: 213 - Cold pain in right hand (Craig et al., 1996)
 4. WOEXT: 263 - Cold pain on right foot (Frankenstein et al., 2001)
 5. WOEXT: 264 - Cold pain on right foot masked by silent word reading (Frankenstein et al., 2001)
 6. WOEXT: 265 - Silent word reading while cold pain on right foot (Frankenstein et al., 2001)
 7. WOEXT: 266 - Cold pain versus cold pain with silent word reading (Frankenstein et al., 2001)
 8. WOEXT: 320 - Cold pain on left arm (Casey et al., 1996)

Table 1: List of included hot and cold pain experiments.
[Figure 2 here: two histogram panels, "Hot pain" (frequency up to about 100) and "Cold pain" (frequency up to about 200), over maximum statistics from 1000 to 11000.]
Figure 2: Empirical histograms of the maximum statistics t* after 1000 permutations. The thick red lines indicate the maxima for the hot and cold pain statistics t_hot and t_cold.
(Nielsen, 2003). Slight variations among the studies appear in the application of the
Talairach atlases, and locations that conform to the so-called MNI space are adjusted
before entry (Brett, 1999). All the locations from the pain experiments are shown
in two panels in figure 1, where the color indicates the experiments the locations
originate from. Table 1 lists all the included 24 hot and 8 cold pain experiments. The
list is automatically generated from the information in the Brede database. Note that the pain stimulus is induced under varying contexts and at different sites on the body, and that some studies contribute several experiments; e.g., a study by Faymonville et al. contributes 6 experiments.
Both the statistics for hot and cold pain are considered, simply by reversing the subtraction (or, in the practical implementation, by finding the minimum)
t_hot = max_i(v_hot,i - v_cold,i)   (8)
t_cold = max_i(v_cold,i - v_hot,i).   (9)
Many of the operations performed in our method described above are implemented
in the Brede Neuroinformatics Toolbox (Nielsen and Hansen, 2000).
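As the text notes, both one-sided statistics of Eqs. (8) and (9) can be read off a single subtraction image via the maximum and the negated minimum; a minimal sketch:

```python
import numpy as np

def directional_max_stats(v_hot, v_cold):
    """t_hot and t_cold of Eqs. (8)-(9) from one subtraction image."""
    d = v_hot - v_cold
    # max(v_hot - v_cold) and max(v_cold - v_hot) = -min(v_hot - v_cold)
    return d.max(), -d.min()
```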
3 Results and discussion
Figure 2 displays the empirical histograms of the null distributions of the maximum
statistics t∗ . The thick red lines indicate the maximum statistics of the comparisons of
Figure 3: Results from the permutation test in a corner cube environment. The red isosurfaces are for hot pain t_hot and the light blue isosurfaces are for cold pain t_cold, based on very low thresholds at P = 0.95.
interest t_hot and t_cold. They show that our method does not find any large differences between hot and cold pain: the statistics of both hot and cold pain fall in the middle of the null distributions. The distribution for the cold pain has a heavier tail than the hot pain distribution. This is caused by the smaller number of experiments in the cold pain group.
Figure 3 shows isosurfaces at a very liberal threshold in the subtraction images for hot pain t_hot and cold pain t_cold. The largest difference between the pain modalities appears in the right hemisphere, but only with very weak support: for cold pain the most "significant" voxel has a p-value of P ≈ 0.25.
We have previously proposed a method that uses a database of experiments to
generate a null distribution of the correlation coefficient between two volumes (Nielsen
and Hansen, 2004a; Nielsen and Hansen, 2004b). That method requires a database of dissimilar experiments to build up a null distribution, e.g., a pain experiment is compared with a memory or language experiment. The method we present in this contribution does not need this extra data, but relies only on data from the two groups that are being compared. Furthermore, our previous method is a global method performing an omnibus test for the entire volume and the entire set of locations, while our new method allows us to make inference at the voxel level. This is also an advantage over Hotelling's T^2 test. However, to gain sufficient statistical power, our presented method does require many experiments in each group.
4 Acknowledgment
Jørgen Kold is acknowledged for collection of the data. Finn Årup Nielsen is supported
by the Villum Kann Rasmussen Foundation.
References
Adler, L. J., Gyulai, F. E., Diehl, D. J., Mintun, M. A., Winter, P. M., and Firestone,
L. L. (1996). Regional brain activity changes associated with fentanyl analgesia
elucidated by positron emission tomography. Anesthesia & Analgesia, 84(1):120–
126.
Becerra, L. R., Breiter, H. C., Stojanovic, M., Fishman, S., Edwards, A., Comite,
A. R., Gonzalez, R. G., and Borsook, D. (1999). Human brain activation under
controlled thermal stimulation and habituation to noxious heat: An fMRI study.
Magnetic Resonance in Medicine, 41(5):1044–1057.
Brett, M. (1999). The MNI brain and the Talairach atlas. http://www.mrc-cbu.cam.ac.uk/Imaging/mnispace.html. Accessed 2003 March 17.
Brooks, J. C. W., Nurmikko, T. J., Bimson, W. E., Singh, K. D., and Roberts,
N. (2002). fMRI of thermal pain: effects of stimulus laterality and attention.
NeuroImage, 15(2):293–301.
Cao, J. and Worsley, K. J. (2001). Applications of random fields in human brain
mapping. In Moore, M., editor, Spatial Statistics: Methodological Aspects and
Applications, volume 159 of Lecture notes in Statistics, chapter 8, pages 170–182.
Springer, New York.
Casey, K. L., Minoshima, S., Morrow, T. J., and Koeppe, R. A. (1996). Comparison
of human cerebral activation patterns during cutaneous warmth, heat pain, and
deep cold pain. Journal of Neurophysiology, 76(1):571–581.
Casey, K. L., Morrow, T. J., Lorenz, J., and Minoshima, S. (2001). Temporal and spatial dynamics of human forebrain activity during heat pain: analysis by positron
emission tomography. Journal of Neurophysiology, 85(2):951–959.
Chein, J. M., Fissell, K., Jacobs, S., and Fiez, J. A. (2002). Functional heterogeneity within Broca's area during verbal working memory. Physiology & Behavior, 77(4-5):635–639.
Christoff, K. and Gabrieli, J. D. E. (2000). The frontopolar cortex and human cognition: Evidence for a rostrocaudal hierarchical organization within the human prefrontal cortex. Psychobiology, 28(2):168–186.
Craig, A. D., Reiman, E. M., Evans, A., and Bushnell, M. C. (1996). Functional
imaging of an illusion of pain. Nature, 384(6606):258–260.
Faymonville, M. E., Laureys, S., Degueldre, C., Del Fiore, G., Luxen, A., Franck, G.,
Lamy, M., and Maquet, P. (2000). Neural mechanisms of antinociceptive effects
of hypnosis. Anesthesiology, 92(5):1257–1267.
Fox, P. T. and Lancaster, J. L. (1994). Neuroscience on the net. Science, 266(5187):994–996.
Fox, P. T. and Lancaster, J. L. (2002). Mapping context and content: the BrainMap
model. Nature Reviews Neuroscience, 3(4):319–321.
Fox, P. T., Lancaster, J. L., Parsons, L. M., Xiong, J.-H., and Zamarripa, F. (1997).
Functional volumes modeling: Theory and preliminary assessment. Human Brain
Mapping, 5(4):306–311.
Frankenstein, U. N., Richter, W., McIntyre, M. C., and Remy, F. (2001). Distraction modulates anterior cingulate gyrus activations during the cold pressor test.
NeuroImage, 14(4):827–836.
Friston, K. J., Holmes, A. P., Worsley, K. J., Poline, J.-B., Frith, C. D., and Frackowiak, R. S. J. (1995). Statistical parametric maps in functional imaging: A
general linear approach. Human Brain Mapping, 2:189–210.
Gelnar, P. A., Krauss, B. R., Sheehe, P. R., Szeverenyi, N. M., and Apkarian, A. V.
(1999). A comparative fMRI study of cortical representations for thermal painful,
vibrotactile, and motor performance tasks. NeuroImage, 10(4):460–482.
Holmes, A. P., Blair, R. C., Watson, J. D. G., and Ford, I. (1996). Non-parametric
analysis of statistic images from functional mapping experiments. Journal of
Cerebral Blood Flow and Metabolism, 16(1):7–22.
Ingvar, M. (1999). Pain and functional imaging. Philosophical Transactions of the
Royal Society of London. Series B, Biological Sciences, 354(1387):1347–1358.
Nichols, T. E. and Holmes, A. P. (2001). Nonparametric permutation tests for PET
functional neuroimaging experiments: A primer with examples. Human Brain
Mapping, 15(1):1–25.
Nielsen, F. Å. (2001). Neuroinformatics in Functional Neuroimaging. PhD thesis, Informatics and Mathematical Modelling, Technical University of Denmark, Lyngby, Denmark.
Nielsen, F. Å. (2003). The Brede database: a small database for functional neuroimaging. NeuroImage, 19(2). Presented at the 9th International Conference
on Functional Mapping of the Human Brain, June 19–22, 2003, New York, NY.
Available on CD-Rom.
Nielsen, F. Å., Balslev, D., and Hansen, L. K. (2004). Mining posterior cingulate.
NeuroImage, 22. Presented at the 10th Annual Meeting of the Organization
for Human Brain Mapping, June 14–17, 2004, Budapest, Hungary. Available on
CD-ROM.
Nielsen, F. Å. and Hansen, L. K. (2000). Experiences with Matlab and VRML in
functional neuroimaging visualizations. In Klasky, S. and Thorpe, S., editors,
VDE2000 - Visualization Development Environments, Workshop Proceedings,
Princeton, New Jersey, USA, April 27–28, 2000, pages 76–81, Princeton, New
Jersey. Princeton Plasma Physics Laboratory.
Nielsen, F. Å. and Hansen, L. K. (2002). Modeling of activation data in the
BrainMapTM database: Detection of outliers. Human Brain Mapping, 15(3):146–
156.
Nielsen, F. Å. and Hansen, L. K. (2004a). Assessing the reproducibility in sets of
Talairach coordinates. NeuroImage, 22. Presented at the 10th Annual Meeting
of the Organization for Human Brain Mapping, June 14–17, 2004, Budapest,
Hungary. Available on CD-ROM.
Nielsen, F. Å. and Hansen, L. K. (2004b). Finding related functional neuroimaging
volumes. Artificial Intelligence in Medicine, 30(2):141–151.
Petrovic, P., Petersson, K. M., Ghatan, P. H., Stone-Elander, S., and Ingvar, M.
(2000). Pain-related cerebral activation is altered by a distracting cognitive task.
Pain, 85(1-2):19–30.
Rehm, K., Lakshminarayan, K., Frutiger, S. A., Schaper, K. A., Sumners, D. L.,
Strother, S. C., Anderson, J. R., and Rottenberg, D. A. (1998). A symbolic
environment for visualizing activated foci in functional neuroimaging datasets.
Medical Image Analysis, 2(3):215–226.
Roland, P., Svensson, G., Lindeberg, T., Risch, T., Baumann, P., Dehmel, A., Frederiksson, J., Halldorson, H., Forsberg, L., Young, J., and Zilles, K. (2001).
A database generator for human brain imaging. Trends in Neuroscience,
24(10):562–564.
Talairach, J. and Tournoux, P. (1988). Co-planar Stereotaxic Atlas of the Human
Brain. Thieme Medical Publisher Inc, New York.
Tölle, T. R., Kaufmann, T., Siessmeier, T., Lautenbacher, S., Berthele, A., Munz, F., Zieglgänsberger, W., Willoch, F., Schwaiger, M., Conrad, B., and Bartenstein, P. (1999). Region-specific encoding of sensory and affective components of pain in the human brain: a positron emission tomography correlation analysis. Annals of Neurology, 45(1):40–47.
Tracey, I., Becerra, L., Chang, I., Breiter, H., Jenkins, L., Borsook, D., and Gonzalez, R. G. (2000). Noxious hot and cold stimulation produce common patterns
of brain activation in humans: a functional magnetic resonance imaging study.
Neuroscience Letters, 288(2):159–162.
Turkeltaub, P. E., Eden, G. F., Jones, K. M., and Zeffiro, T. A. (2002). Meta-analysis
of the functional neuroanatomy of single-word reading: method and validation.
NeuroImage, 16(3 part 1):765–780.
Van Horn, J. D., Grethe, J. S., Kostelec, P., Woodward, J. B., Aslam, J. A., Rus, D.,
Rockmore, D., and Gazzaniga, M. S. (2001). The functional magnetic resonance
imaging data center (fMRIDC): the challenges and rewards of large-scale databasing of neuroimaging studies. Philosophical Transactions of the Royal Society of
London, Series B, Biological Sciences, 356(1412):1323–1339.
Vogt, B. A., Derbyshire, S., and Jones, A. K. P. (1996). Pain processing in four
regions of human cingulate cortex localized with co-registered PET and MR
imaging. European Journal of Neuroscience, 8(7):1461–1473.
Wager, T. D., Phan, K. L., Liberzon, I., and Taylor, S. F. (2003). Valence, gender,
and lateralization of functional brain anatomy in emotion: a meta-analysis of
findings from neuroimaging. NeuroImage, 19(3):513–531.
AN IMAGE BASED SYSTEM TO AUTOMATICALLY AND OBJECTIVELY SCORE THE DEGREE OF REDNESS AND SCALING IN PSORIASIS LESIONS
David Delgado
Informatics and Mathematical Modelling,
Denmark
email: [email protected]
Bjarne Ersbøll
IMM,
Denmark
email: [email protected]
Jens Michael Carstensen
IMM
Denmark
email: [email protected]
Abstract
In this work, a combined statistical and image analysis method to automatically evaluate the
severity of scaling in psoriasis lesions is proposed. The method separates the different regions of
the disease in the image and scores the degree of scaling based on the properties of these areas.
The proposed method provides a solution to one of the present problems in dermatology: the lack of
suitable methods to assess the lesion and to evaluate the changes during the treatment. An experiment
over a collection of psoriasis images is conducted to test the performance of the method. Results
show that the obtained scores are highly correlated with scores made by doctors. This and the fact
that the obtained measures are continuous indicate the proposed method is a suitable tool to evaluate
the lesion and to track the evolution of dermatological diseases.
Keywords: psoriasis, exploratory data analysis, segmentation, decision trees, classification
1 Introduction
One of the main problems in the treatment of dermatological diseases is the difficulty of tracking the evolution of the disease. Patients visit their physician several times so that the evolution of the disease can be monitored. However, because no objective methods to summarize the lesion exist, physicians make scorings and take notes to document the actual condition of the patient. A drawback of this approach is its dependency on the individual physician.
The advances in image analysis during the last decade have led to the development of different methods to deal with related problems in the dermatological field. Engström [1] assessed the effect of a new enzymatic debrider by observing the evolution of the lesion area and the lesion color; these measurements were obtained from digitized photographs analyzed with a computer. Later, Hansen [2] developed an image system that included calibration to increase the quality of the images. The system diagnoses burns and pressure ulcers in animals, but the possibility of use in humans was mentioned. In a recent paper, Hillebrand [3] used computer analysis of high-resolution digital images to compare the skin condition of a group of females.
In this work, a method to objectively score the degree of scaling and redness in psoriasis is proposed. The method performs a hierarchical segmentation to isolate the different structures present in the image: normal skin, red area and scales. Different values are obtained from these areas and are used to approximate the doctors' scorings.
The dermatologists Lone Skov and Bo Bang of Gentofte Hospital of Denmark and the anonymous patients are gratefully
acknowledged for their collaboration during the image acquisition sessions.
2 Segmentation of the areas present in the lesion
Psoriasis is a dermatological disease characterized by red, thickened areas with silvery scales [4]. In order to score the degree of scaling and redness in psoriasis, the first step is to segment the different areas in the lesion. The wide variety of forms and levels of severity that psoriasis can exhibit makes this task highly complex.
2.1 Segmentation of the lesion
The segmentation of the disease with respect to the healthy area is based on the assumption that, under a suitable projection, both the normal skin and the lesion approximately follow Gaussian distributions. This assumption was supported by an exploratory data analysis of a small set of psoriasis images where several projections were considered. Furthermore, a principal component analysis [5] and an independent component analysis [6] on a dataset of 115 images indicated that the difference between the green and the blue band exhibits good contrast for discriminating between the lesion and the normal skin. The distribution of this difference approximately follows a linear mixture of two Gaussians. Estimating their means and variances makes it possible to identify the lesion by means of discriminant analysis. The parameters of the Gaussians were estimated according to Taxt [7]. Figure 1 shows the segmentation of the lesion.
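A numpy-only sketch of this step: fitting the two-component Gaussian mixture to the green-minus-blue pixel values by EM and classifying pixels by discriminant analysis. The estimator of Taxt [7] may differ in detail, and the assumption that the lesion is the component with the larger mean is ours:

```python
import numpy as np

def fit_two_gaussians(x, n_iter=100):
    """EM for a 1-D mixture of two Gaussians fitted to the green-minus-blue
    values x of all pixels. Returns (weights, means, variances)."""
    mu = np.percentile(x, [25, 75]).astype(float)    # crude initialisation
    var = np.full(2, x.var())
    w = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibility of each component for each pixel
        dens = w / np.sqrt(2 * np.pi * var) * np.exp(-(x[:, None] - mu) ** 2 / (2 * var))
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixture parameters
        nk = r.sum(axis=0)
        w = nk / x.size
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return w, mu, var

def lesion_mask(gb_diff, w, mu, var):
    """Assign each pixel to the more probable component; the component with
    the larger mean is taken as lesion (an assumption about the contrast)."""
    dens = w / np.sqrt(2 * np.pi * var) * np.exp(-(gb_diff[..., None] - mu) ** 2 / (2 * var))
    return dens.argmax(axis=-1) == mu.argmax()
```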
2.2 Extracting the scales
Segmentation of the scales is complicated by the fact that scales may or may not appear in the image.
If they appear they may range from a few spots to a large area. Moreover, non-uniformity of the areas
with redness (ranging from red to brown) makes the task even harder. This variability implies that
the lesion has to be considered in small areas where the change in redness is not signi cant. This
can be accomplished with watersheds [8] to mark the different scales and then locally use a clustering
algorithm [9] to segment them. This approach requires specifying the number of watersheds. In this
work, the number of watersheds is determined in two steps. First a new image is created based in the
watershed regions. Each watershed area is replaced by the minimum value of this area. This new image is
then thresholded and the watersheds with values less than the threshold are the areas where the scales are
detected. The method was tested on a set of psoriasis images and it demonstrated a good performance.
However, the method had dif culties with some images that had problems during acquisition (especially
shadows), so the number of watersheds was not found correctly. To solve this problem, the number of
watersheds was x ed visually by a tuning parameter. The blue band was used to nd the watersheds
because a canonical analysis had shown that this band is the best to separate the scales from the red area.
Figure 2 displays the segmentation of the scales.
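The region-minimum thresholding step described above can be sketched as follows, assuming the watershed labels come from a standard implementation of the immersion algorithm [8]:

```python
import numpy as np

def scale_regions(blue, labels, threshold):
    """Replace each watershed region by its minimum blue-band value and keep
    the regions falling below the threshold, as described in the text.

    blue:   2-D blue-band image.
    labels: 2-D integer watershed labels (from any watershed routine).
    Returns a boolean mask of candidate scale regions."""
    out = np.zeros_like(blue, dtype=float)
    for lab in np.unique(labels):
        region = labels == lab
        out[region] = blue[region].min()    # region-minimum image
    return out < threshold                  # regions where scales are detected
```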
2.3 Scoring the disease
Once the different areas have been segmented, a decision tree is created to automatically score the degree
of scaling in the different images, approximating the scorings made by the physicians. Three variables
are used as input to the model: the area of the scaling, the ratio between the area of scaling and the area
of the lesion, and the ratio between the area of scaling and the area of redness. The whole procedure is
shown in Figure 3.
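Since the area of the scaling alone turns out to explain the physicians' scoring (Section 3.1), the fitted tree effectively reduces to thresholding on that area. A sketch using the cut points legible in Figure 4; the leaf values are only partially legible, so the mapping below is a reconstruction, not the paper's exact tree:

```python
def score_scaling(area):
    """Score the degree of scaling from the scaling area alone, using the
    thresholds visible in Figure 4 (467, 2935, 32073 pixels)."""
    if area < 467:
        return 0.92        # mildest lesions (leaf value from the figure)
    if area < 2935:
        return 2
    if area < 32073:
        return 3
    return 4               # hypothetical top score; this leaf is not legible
```

With the feature values of Figure 3 (area 3713), this reproduces the automated score of 3 shown there.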
For the evaluation of the redness, a canonical discriminant analysis of the differences between the mean spectral values in the red and healthy areas suggests approximating the physicians' scorings with a clustering method such as, e.g., K-nearest neighbours.
[Figure 3 here: flow diagram — Patient → Image Acquisition (capture image) → Segmentation → Feature Extraction (area with scaling: 3713; area scaling/area lesion: 0.089; area scaling/area red: 0.098) → Scoring (automated scoring: 3; doctor's scoring: 3).]
Figure 3: Diagram of the method.
[Figure 4 here. Left: decision tree with splits at Area < 2935, Area < 467 and Area < 32073. Right: scatter plot of scaling area (0 to 5 x 10^4) against physicians' scores 0–3.]
Figure 4: Left: Decision tree for the scoring given the parameters of the segmentation. Right: Dependency of lesion area on physicians' scoring of the lesions.
3 Experiments
Two experiments are conducted to test the accuracy of the proposed method for scoring the scaling and redness in psoriasis lesions.
3.1 First experiment: Scoring the degree of scaling
In collaboration with the dermatological department of Gentofte Hospital in Denmark an experiment was conducted. The goal of the experiment was to objectively score the severity of the scaling in psoriasis images. To accomplish this goal, a set of 46 psoriasis images was selected from a database of psoriasis images collected from different patients. The physicians' scores for these images were also available. The images were selected to cover the maximal possible diversity. The different areas of each image were extracted according to the procedure described in the previous sections, and the three summary values mentioned above were obtained. A cross-validation process was used to build 23 decision trees; each tree was built on 44 data points and tested on the remaining two. Results showed that the first variable, the area of the scaling, is enough to explain the physicians' scoring. The automated scoring has proven reliable, and on several occasions even revealed mistakes in the physicians' scores. In these cases, the physicians were asked to re-score their previous judgements, and in all cases the assessment was changed.
Figure 4 (left) shows the final tree generated using all the points. Figure 4 (right) plots the area of the scaling versus the physicians' scoring.
3.2 Second experiment: Classifying the severity of redness
The second experiment aims to assess the possibility of automatically scoring the degree of redness of the lesion. To achieve this goal, a set of 77 psoriasis lesion images was selected from a data set with 175 images. The selected images do not present shadows, scars or other elements that could spoil the result of the experiment. The severity of redness for each image was scored by the physicians in order to have a reference measure. The different areas in the chosen images were segmented according to the procedure described previously. The mean of the tri-chromatic bands was calculated in the healthy and in the red skin area for each image. The difference of these two means proved to be a good feature for evaluating the lesion.
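This feature extraction amounts to a per-band mean difference; a sketch with our own function name:

```python
import numpy as np

def redness_features(image, red_mask, healthy_mask):
    """Per-band difference between the mean colour of the red area and of the
    healthy skin: the three-dimensional feature used for the redness score.

    image: (H, W, 3) RGB image; masks: boolean (H, W) arrays."""
    red_mean = image[red_mask].mean(axis=0)         # mean RGB over lesion
    healthy_mean = image[healthy_mask].mean(axis=0)  # mean RGB over healthy skin
    return red_mean - healthy_mean                   # shape (3,)
```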
[Figure 5 here. Top: 3-D scatter plot of the three per-band mean colour differences. Bottom: the same data after canonical discriminant analysis.]
Figure 5: Top: 3D plot of the variables considered to classify the redness. Bottom: Result of applying a canonical discriminant analysis to the three selected variables to classify the redness.
Figure 5 (top) shows a 3D plot of the differences, where the different symbols represent different physicians' scorings. The presence of three well-defined groups indicates the possibility of classifying the redness using a clustering algorithm. This suggestion becomes even clearer when a canonical discriminant analysis is applied, as can be seen in Figure 5 (bottom).
4 Summary and conclusion
In this work, a procedure to evaluate the severity of the scaling and redness in psoriasis has been developed. The method automatically separates the different parts of the lesion and extracts several parameters. In certain difficult cases, such as uneven illumination, it has been noticed that allowing manual interaction increases the accuracy notably.
The method provides objective measures that avoid the dependence on the individual physician in the tracking of dermatological diseases. It has been shown that one of the provided measures is highly correlated with the doctors' scoring. Together with the other two measures, we expect to be able to provide a better lesion description.
References
[1] Engström N., Hansson F., Hellgren L., Tomas J., Nordin B., Vincent J. and Wahlberg A. Computerized Wound Image Analysis. In Pathogenesis of Wound and Biomaterial-Associated Infections, Springer-Verlag, p 189-193, 1990.
[2] Hansen G., Sparrow E., Kokate J., Leland K., Iaizzo P. Wound Status Evaluation using Color Image Processing. IEEE Transactions on Medical Imaging, vol. 16, no. 1, February 1997.
[3] Hillebrand G., Miyamoto K., Schnell B., Ichihashi M., Shinkura R., Akiba S. Quantitative evaluation of skin condition in an epidemiological survey of females living in northern versus southern Japan. Journal of Dermatological Science, vol. 27, p 42-52, 2001.
[4] Camisa C. Handbook of Psoriasis. Blackwell Science, 1998.
[5] Johnson R., Wichern D. Applied Multivariate Statistical Analysis. Chapter 8. Prentice-Hall, 1995.
[6] Hyvärinen A., Karhunen J., Oja E. Independent Component Analysis. Wiley, 2001.
[7] Taxt T., Hjort L., Eikvik L. Statistical Classification using a Linear Mixture of two Multi-normal Probability Densities. Pattern Recognition Letters, 12 (1991) 731-737.
[8] Vincent L., Soille P. Watersheds in digital spaces: an efficient algorithm based on immersion simulations. IEEE Trans. Pattern Anal. Mach. Intell., 13 (6) (1991) 593-598.
[9] Sonka M., Hlavac V., Boyle R. Image Processing, Analysis, and Machine Vision. Brooks/Cole, p 128-130, 1999.
Survey and Assessment of Advanced Feature Extraction Techniques and Tools for EO Applications

Mikael Kamp Sørensen a*, Michael Schultz Rasmussen a, Henning Skriver b, Peter Johansen c, Arthur Pece c, and Jesper Høyerup Thygesen d

a GRAS, c/o Inst. of Geography, Univ. of Copenhagen, Øster Voldg. 10, 1350 Copenhagen, Denmark
b Technical Univ. of Denmark, Ørsted-DTU, 2800 Kgs. Lyngby, Denmark
c Dept. of Comp. Science, Univ. of Copenhagen, Universitetsparken 1, 2100 Copenhagen, Denmark
d ROVSING A/S, Dyregårdsvej 2, 2740 Skovlunde, Denmark
ABSTRACT

In addition to the substantial amounts of available Earth Observation (EO) data, there is currently an increasing trend towards the acquisition of larger and larger EO data and image quantities from single satellites or missions, with multiple, higher-resolution sensors and with more frequent revisiting. More sophisticated algorithms and techniques than those largely in use today are required to exploit this rapidly growing wealth of data and images to a fuller extent.

The project "Survey and Assessment of Advanced Feature Extraction Techniques and Tools for EO Applications" (SURF), funded by the European Space Agency (ESA), will address these issues. The objective of SURF is to provide an overview of the current state-of-the-art Methods within feature extraction and manipulation for EO applications and to identify scenarios and related architectures for exploitation of the most promising EO feature extraction Methods. The task is to identify the most promising Methods to extract pertinent information from EO data on environment, natural resources and security issues. SURF aims at listing existing Methods with the final goal of identifying the three most promising Methods to be implemented in prototype solutions. The work includes the development of the concept for the evaluation and rating of Methods relative to the users' needs for information, the maturity and novelty of the Methods, the potential for fusing data, and the operational feasibility. Special emphasis will be placed on the exploitation of state-of-the-art image processing, pattern recognition and classification techniques.

Keywords: automatic feature extraction, classification, survey, methods, EO, remote sensing, data fusion
IN TR O D U C TIO N
The globalarchives of rem ote sensing data are constantly increasing due to a num ber of factors.First,the am ountof operationalrem ote sensing satellites is continuously grow ing.Second,the spatialand spectralresolutions of the new sensors are constantly im proving due to technologicaladvances,resulting in exponentially larger data volum es.Finally,the
data transm ission capabilities ofm odern satellites resultin vastly increased data transfers.
H ow ever,even though the use and exploitation of EO data is increasing,the vastdata archives are notbeing fully exploited. Ithas been estim ated thatonly a sm allpercentage of the rem ote sensing data archived annually are actually being utilized by end users.Partof the reason is thatm ostrem ote sensing applications depend on im age processing carried
out by experts. Even though the basic im age processing such as geom etric rectification and radiom etric calibration is
*
m [email protected] gras.ku.dk;phone +45 35 32 41 75;fax +45 35 32 25 01;w w w .gras.ku.dk
139
largely perform ed autom atically for m ost rem ote sensing products, the conversion from im age data to the them atic inform ation required by end users is a constraining factorforthe furtherproliferation ofrem ote sensing applications.
A num ber of initiatives have already focused on autom atic feature extraction. Through the Inform ation Society Technologies (IST)program m e,the European U nion seeks to m ature technologicaladvances.A n exam ple ofsuch a projectis
Satellite Based Inform ation System on CoastalA reas and Lakes (SISCA L) (SISCA L,2003),thatw illprovide near-real
tim e data on the m arine environm ent.Exam ples of data m ining projects include the ESA projectK now ledge D riven Inform ation M ining in Rem ote Sensing Im age A rchives (K IM ).
H ow ever,there is stilla need for developing autom atic feature extraction m ethods thatw illm ake rem ote sensing inform ation readily available to a w ide range of end users.The “user” is considered as a broad category from professionals to
the generalpublic,from the environm entalscientistto the analystw orking in a county or a citizen seeking inform ation
through the Internet.For this reason EO data products m ustbe exploitable in a user-friendly w ay through a clear understanding and description ofthe inform ation and through digitaldata form ats thatare easy to handle.
Most end users work with thematic features that change over time, and the growing archives of remote sensing data further encourage and facilitate the analysis of time series. Change detection and trend analysis are important techniques that are able to match the user requirements within global change and environmental monitoring.
It is important for the exploitation of remote sensing data to try to identify novel methods. An interesting perspective is the combined use of data from different sensors applying data fusion methods. In the context of the current project, data fusion may either refer to a specific data fusion algorithm (e.g. wavelets, PCA) combining two or more different data sources, or it may simply refer to the integrated use of two different image sources. Furthermore, it is the intention to take a broad view of possible candidate methods, not only within classic remote sensing sciences, but equally within related fields such as image processing as explored in medicine and in the studies of materials and minerals using microscopes.
The project "Survey and Assessment of Advanced Feature Extraction Techniques and Tools for EO Applications" (SURF), funded by the European Space Agency (ESA), will deal with these issues. The main objectives of the SURF project are to obtain:
- an up-to-date picture of advanced Methods for the support of Earth Observation applications through Feature Extraction,
- the identification of possible scenarios and related architectures for the exploitation of the most promising EO Feature Extraction Methods,
- the implementation of three Methods using real image data from three cases.
The objectives will be achieved through a survey, evaluation and ranking of Methods applicable to the EO domain. For the three most promising Methods, prototypes will be developed for evaluation using image data.
In the SURF project, a Method (with capital M) is defined as a way of solving a problem within a given context. A Method may be a specific algorithm or technique, or it may be understood in a broader sense, such as a complete software package. Feature extraction can be defined as the conversion from image data to thematic information. In the SURF project, the intention is to support the user in terms of pre-processing by automating the complex part of the image processing: identifying features, patterns, trends, changes etc., and hereby making this information readily available.
The approach to the screening of EO research status and trends will start with the identification of thematic areas where EO data have demonstrated their usefulness. Thematic areas where EO data are expected to provide important information in the future will equally be included.
Most scientists maintain the idea of their work becoming useful to society in one form or another, and the objective of the present paper is to demonstrate how scientific results are evaluated for possible real-world use and hereby to propose a set of important criteria. This will be done by presenting the strategy for the evaluation and the selection of Methods as applied within the SURF project. The work will be placed in context through the description and evaluation of three candidate Methods: entropy based prediction, NDVI trend map from time series, and knowledge-based classification of SAR data.
METHODOLOGY
In order to meet the objectives mentioned above, the survey started with a broad approach, listing the main thematic areas, the applications and the Methods. In the consecutive steps, the list of candidate Methods will be progressively narrowed down, and in the end three Methods will be selected for prototype implementation.
At the application level, user considerations are most important. If there is no or little user interest in an application, no further effort will be devoted towards that particular application. At the Method level, the technical considerations are central. If a certain Method is very difficult to implement for any reason, it will get a low rating and will not be considered for prototyping. Table 1 provides an example of the relation between thematic areas, applications and Methods.
Table 1. Example of the relation between thematic areas, applications and Methods.

Thematic area: Environmental monitoring
  Thematic sub-area: Environmental degradation
    Application: Monitoring deforestation
      Methods: Regression tree classifier; Multivariate Alteration Detection and MAF; Hyper-columns using Gabor Filters
    Application: Mapping desertification
      Methods: NDVI trend analysis; Context-based object oriented classification
The rating and selection of the most promising Methods will be carried out in close dialog with end users, the scientific community, as well as with ESA.
STEP ONE: SELECTION OF THEMATIC AREAS AND APPLICATIONS
An initial assessment of the user needs within each thematic application is made, considering the following factors:
a) Users' need for information – The users' need for information within the category is evaluated. Applications that are not considered useful will result in a low score.
b) Provision of information relative to other sources – If the information potentially provided from EO data can easily and cost-effectively be obtained through other data sources (non-EO data), it should result in a low score. On the other hand, if no other means than EO data is available for obtaining the information, or if complementary data can be achieved using remote sensing data, a high score should be given.
c) Users' need for automation – This category evaluates the users' need for automation in a given area. The need will be high when working with operational topics or in areas with large amounts of data. Automation within EO data processing may equally be able to provide users with products or semi-products that can be automatically extracted from EO data during the pre-processing and hereby become more easily available and less costly to the users. For ad-hoc applications the need for automation may be less pronounced.
The assessment of user needs is first carried out by the consortium and rated according to the three categories mentioned above. This will be followed by a dialog with users, user communities or their representatives to get feedback on the rating and selection, and hereby allow adjustment of the ratings.
During the initial phase of the project, five overall thematic areas were identified. Each thematic area is characterized by a number of sub-thematic areas, and within each of these sub-areas, a number of thematic applications are found. The identification of thematic areas and applications is mainly based on the literature and knowledge of existing operational applications.
Table 2 shows the preliminary list of thematic areas. For each sub-area, the relevance of data categories or possible data fusion scenarios is indicated.
Table 2: List of the five thematic areas and related applications. For each application, relevance is indicated for four data categories: optical data, radar data, data fusion and time series.

Environmental monitoring
  Environmental degradation
    Land cover and land use change
    Eutrophication and oxygen depletion of the marine environment
    Oil spill monitoring
Global change
  Changes in vegetation productivity (carbon pools)
  Land cover changes
  Changes in (patterns of) Sea Surface Temperature (SST)
  Changes in marine Chlorophyll a concentrations
Security
  Sea-ice monitoring and warning
  Flooding, mapping and forecast
  Identifying forest or savannah fires
  Meteorology
  Volcanic activity
Policy and Planning
  Carbon accounting
  Land cover mapping and changes
  Agricultural subsistence and control
  Cartography
Resource Assessment
  Geology
  Fisheries
  Forestry
  Fresh Water
STEP TWO: LISTING OF METHODS
Based on the list of thematic areas and applications, a large number of feature extraction Methods were identified, based on the experience of the consortium as well as an initial literature search. During this phase, Methods are identified without any further evaluation. An example of Methods listed within a given application is found in Table 3.
Table 3. Example of Methods for Monitoring Deforestation
- RGB-NDVI differencing
- Multivariate Alteration Detection / Maximum Autocorrelation Factors
- The VTT auto change Method
- Iterative fitting of Principal Component Axis
- Regression tree classifier
- PCA Differencing
- Decision tree fuzzy classification
- Hyper-columns using Gabor Filters
- Classifications using space-time segmentation
- Context-based object oriented classification
- Neural networks
- Entropy prediction
- Knowledge based classification
STEP THREE: EVALUATION OF METHODS WITHIN THE MOST IMPORTANT APPLICATIONS
For each Method, an evaluation of the technical considerations is carried out. This project phase will be subjected to three iterations in order to secure a thorough evaluation, where new inputs and additional information can be added in the process.
The following categories are used to rate each Method within promising applications:
a) Maturity of the Method in terms of technical and EO compliance – This parameter considers how robust the Method is. One criterion is the amount and generality of publications available on the Method, preferably from different sources and with examples of their implementation. Also, if the Method has been implemented in commercial software packages or operational projects, or if it is routinely used by various institutions, it may be considered mature. The Method in question does not necessarily need to have a history in EO applications in order to be mature; maturity may also refer to well-established applications in other areas of, say, pattern recognition.
b) Novelty of the Method – The Method should be first published or reported recently to get a high score. It does not have to be an innovative algorithm; the novelty may equally consist in combining well-established Methods or in applying well-known Methods in an innovative manner.
c) Operational perspectives (timing, effort, computational requirements, cost etc.) – The perspectives for implementing the Method will be evaluated here. A Method that is computationally demanding (e.g. extensive iterative processes) will be given a low score if it restricts the timely output of results (for instance oil spill mapping, where the results have to be delivered in near real time). If the Method is capable of delivering timely outputs within a reasonable financial limit, a high score should be given.
d) Accuracy of the Method in relation to the specific application – A high absolute accuracy gives a high score. If the relative accuracy is high (providing a correct assessment of the spatial distribution), the Method can be given a high score even if the absolute accuracy is poor. An example is chlorophyll a concentration monitoring, where the absolute accuracy is poor, but the product meets the needs of environmental monitoring institutions to obtain an overview of the spatial distribution of algae.
e) Potential use of the Method in other areas – If a Method is extremely specific and can only be applied within a single or a few limited areas, a low score should be given. On the other hand, if the Method is of a general nature that allows for its application in other thematic areas, a high score should be given. The number of cross-references will be a useful indicator (cross-references indicate whether the same, or nearly the same, Method is listed in other applications).
f) Perspectives for automation – If a Method can be run completely automatically, a top score should be given. If a substantial amount of manual work is required, a low score should be given. For Methods where some user interaction is required (e.g. specification of constants etc.), a moderate score is appropriate.
g) SURF feasibility – The feasibility of implementing a given Method within the SURF project will be assessed separately. If a Method is extremely demanding to implement, it will not be possible under the current project.
After the evaluation of more than 50 Methods, approximately 20 Methods will be selected based on the evaluation scores listed above. Selected users and scientists (or groups thereof) will be consulted for comments and suggestions on the ratings and decisions being made.
STEP FOUR: DETAILED ANALYSIS OF SELECTED METHODS
The 20 Methods selected in step three will be narrowed down to three Methods that will be used for prototype development. The process of selecting these three Methods will involve a detailed description of the 20 candidate Methods. The feasibility of the Methods will be assessed from the literature and through contact with the scientific community. Pilot and test cases will be studied with special attention to the potential level of generalisation. User requirements will be analysed or confirmed through dialogue with users, and the technical aspects will be evaluated as well. ESA will actively take part in the selection process.
STEP FIVE: PROTOTYPE DEVELOPMENT
For the three most promising Methods, working prototypes will be implemented. As some of the Methods may never have been tested in the EO field, the evaluation of prototypes working on EO data is an important step in order to acquire specific working knowledge of the new Methods. The prototype implementation will as far as possible be based on existing available software libraries and COTS.
Criteria for evaluation of the prototypes will be established as a subset of the criteria for the ranking of Methods in step three. The operational perspectives, the Methods' accuracy, and the level of automation can be tested by running the prototypes on prepared data sets.
Based on the combined knowledge of Methods and the users' needs, realistic exploitation scenarios will be described. This includes an assessment of the quality of the information to be provided, along with an assessment of the required human resources and the necessary hard- and software.
METHOD EXAMPLES
In the following section, three examples of candidate Methods are described, and Table 4 shows how these were rated.
ENTROPY BASED PREDICTION
The purpose is to detect unexpected changes. These could be the possible occurrence of patches of deforestation or changes in agricultural practice (e.g. from fallow to cultivation). The Method may even be used to clean up clouded pixels in a time series. Which events are unexpected is a question of prior knowledge of events in the area. The information may in the simplest case be a direct prediction of change per pixel, but can equally well be a prediction of a possible new classification scenario.
The principle is to collect statistics on the time sequences at each pixel. Shannon (1948) used this Method to quantify information. In a database, an account is made of all occurrences of temporal sequences of events. Shannon showed how it would be possible to distinguish between different languages by counting the frequencies of letters in sentences from the language, the frequencies of all consecutive letters, of all triples of letters, etc. The present Method is a modern implementation of his Method. Shannon showed that his prediction Method would be as precise as a program which knew in advance the transition table of a Markov source which generated the sentences of the language. Lempel and Ziv (1986) showed how to use this principle efficiently for data compression. The principle in data compression is that the program estimates the probability of each symbol before encoding it. If the symbol occurs often, a short code word is used, and if it occurs only infrequently, not much harm is done by using a long code word. This coding principle is widely used for text compression in modern personal computers.
When applied to images, statistics per pixel are collected, either as a result of a classification, a spatial filter or the quantization of spectral values. The program then decides if an event is unexpected in the following way. From a query to the database it gets information about the time and location where a similar sequence of events has occurred before. This way it can make use of the prior information about events in the area. There may be several occasions at which similar phenomena have occurred. The program chooses the longest identical sequence of events. Once it has obtained a time and a place where a similar event has happened, it makes the prediction that the next event to occur is the event which took place at the previous time and location where the optimal match was found.
A surprising aspect is that even though keeping track of all sequences of all lengths seems a titanic job, there is an algorithm which makes it possible to use only one quantum of computation independent of how long the sequence of images is. This means that, given a sufficiently fast computer, the Method will work in real time. Furthermore, the Method will only need one quantum of storage space for each image it receives. It is not necessary to use increasingly more space for collecting the statistics. This is equally surprising, since there is no limit on the length of the sequences which the database keeps track of. The data structure used in the database is called a suffix tree, and the algorithm was invented by Weiner (1973) quite some years ago. Few read Weiner's paper; a newer version was proposed by Ukkonen (1992).
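The longest-match prediction rule described above can be sketched as follows. This is a minimal illustration, not the SURF prototype: the suffix-tree bookkeeping is replaced by a naive string search, and the single-character class labels are hypothetical.

```python
def predict_next(history):
    """Predict the next symbol for one pixel from its event history.

    Finds the longest proper suffix of `history` that also occurs
    earlier in `history`, and returns the symbol that followed that
    earlier occurrence (the longest-match prediction rule).
    Returns None if no suffix recurs.  A suffix tree (Weiner 1973,
    Ukkonen 1992) supports the same query without rescanning.
    """
    n = len(history)
    for length in range(n - 1, 0, -1):      # try the longest suffix first
        suffix = history[n - length:]
        pos = history.find(suffix)          # earliest occurrence
        if pos + length < n:                # it occurred strictly earlier
            return history[pos + length]    # symbol that followed it
    return None

# A pixel classified as forest ("f") for years: the recurring history
# predicts "f", so an observed clearing ("c") is flagged as unexpected.
history = "ffffffff"
predicted = predict_next(history)           # "f"
observed = "c"
unexpected = predicted is not None and predicted != observed
```

The naive search above rescans the whole history per prediction; the suffix-tree variant cited in the text performs the equivalent lookup with a constant amount of work per new image, which is what makes real-time operation plausible.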
NDVI TREND MAP
Time series of NDVI data derived from EO data are used to identify long-term trends in vegetation productivity. NOAA AVHRR data have been used, where the integration of NDVI during the growing period serves as an estimate of the annual Net Primary Production (NPP). This Method has been applied in dry areas (Rigina and Rasmussen, 2003; Tottrup and Rasmussen, submitted) as well as globally (Nemani et al., 2003). Degrading conditions are identified when the integrated NDVI is consistently decreasing. The information is used by ministries of environment in sub-Saharan Africa as a support to the implementation of the UN Rio conventions on desertification and climate change. The information serves as reference data on the state of the environment, as well as being used to identify areas subject to specific actions to mitigate consistently degrading environmental conditions.
The Method is based on a few principles. A number of authors have demonstrated the existence of a relationship between NDVI and NPP (Tucker et al., 1986; Prince, 1991; Rasmussen, 1992) using the Light Use Efficiency (LUE) model (Kumar & Monteith, 1981). NDVI can be used to assess the fraction of absorbed photosynthetically active radiation (fAPAR), and the NPP can be calculated as:

NPP = e · fAPAR_NDVI · PAR

where e is the efficiency of transforming energy into dry matter, PAR is the photosynthetically active radiation, and fAPAR_NDVI is the fAPAR estimated using NDVI from EO data. PAR can be obtained from EO data or meteorological data, and e can be assumed constant for limited areas and during limited time, or values can be determined as a function of ecosystems and different stress parameters. This is the approach applied for the MODIS NPP product (Justice et al., 1998).
From a time series of annual maps of NPP as a function of integrated NDVI, a per-pixel trend map can be computed, where the value of the slope is assigned to each pixel in the image. This will show the overall conditions, and areas with consistently decreasing NPP can be identified. The Method was evaluated and further developed in Senegal, and results showed that areas experiencing soil exhaustion as well as areas where enabling conditions were prevailing were both captured from a time series of 19 years of Pathfinder AVHRR NDVI data (Tottrup & Rasmussen, submitted).
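The per-pixel trend computation can be sketched as below. The array layout and the closed-form least-squares slope are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def trend_map(npp_series):
    """Per-pixel linear trend of annual NPP.

    npp_series: array of shape (n_years, rows, cols), one annual NPP
    (or integrated NDVI) map per year.  Returns a (rows, cols) array
    with the least-squares slope per pixel; consistently negative
    slopes point at potentially degrading conditions.
    """
    n_years = npp_series.shape[0]
    t = np.arange(n_years, dtype=float)
    t_c = t - t.mean()                      # centred time axis
    y_c = npp_series - npp_series.mean(axis=0)
    # slope = cov(t, y) / var(t), evaluated for all pixels at once
    slope = (t_c[:, None, None] * y_c).sum(axis=0) / (t_c ** 2).sum()
    return slope

# 19 annual maps, as in the Senegal study: every pixel here loses
# productivity linearly, so every slope comes out negative.
years = 19
series = np.linspace(1.0, 0.5, years)[:, None, None] * np.ones((years, 2, 2))
slopes = trend_map(series)
```

Computing all pixel slopes with one broadcast expression, rather than looping over pixels, is what keeps this tractable for full AVHRR scenes.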
KNOWLEDGE-BASED CLASSIFICATION OF SAR DATA
The Method can be used to map changes within land cover, land use and vegetation. More specifically, detailed information on land cover can be applied in relation to regional and global general circulation models to calculate the fluxes of water, energy and carbon. A knowledge-based, hierarchical classification scheme has been proposed for land-cover classification into relatively broad classes using Synthetic Aperture Radar (SAR) data. The concept has been demonstrated using airborne and spaceborne polarimetric SAR data (Pierce et al., 1994 and 1998) as well as combinations of SAR data from two satellites at different frequencies (C- and L-band) (Dobson et al., 1995 and 1996).
Radar backscatter is sensitive to two types of scene parameters: geometrical and dielectrical properties. The geometrical properties refer to the structure of the vegetation and to the roughness of the bare surface or the surface under the vegetation. For the vegetation, the structure parameters important for radar backscatter are the size distribution, the orientation distribution, and the densities of the vegetation elements (i.e. stems, branches, and leaves). The dielectric properties are determined by the water content, the phase of the water and the densities of the scattering elements. Based on these sensitivities it is possible to discriminate between different types of vegetation with different structural and dielectrical properties using SAR data. The sensitivity of the radar to these parameters varies with radar frequency and polarisation. Hence, it is possible to resolve potential ambiguities in the discrimination between different vegetation classes using multi-frequency and/or polarimetric SAR data.
In Dobson et al. (1995 and 1996) and Pierce et al. (1994 and 1998), knowledge about the scattering properties of vegetation as well as electromagnetic model results were used to derive a hierarchical classification scheme. First, in the so-called Level I classifier, the images are classified into three very broad classes: surface, low vegetation and tall vegetation. The classification accuracy of this classifier is found to be very good, with accuracies close to 100%. Based on the output of this classification, a Level II classifier is then applied. This classifier may for instance classify the tall vegetation into different forest types, and the low vegetation into different crop types. Classification accuracies above 90% were normally obtained for the overall classification scheme.

The above-mentioned classification scheme is relatively robust in terms of, for instance, time of the year and geographical location, because it is based on knowledge and model results. The classification scheme must, however, to some degree be adapted to the data available and to the requirements for the land-cover classes. After the manual setup of the algorithm, it is very simple and fast.
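The two-level structure can be sketched as a pair of rule functions. The channel names, class labels and thresholds below are hypothetical placeholders, not the published decision rules of Dobson et al. or Pierce et al.

```python
def level1(backscatter):
    """Level I: assign a pixel to one of the three broad classes.

    `backscatter` maps polarisation channels (here "hh", "vv", "hv")
    to backscatter intensities; thresholds are placeholders only.
    """
    if backscatter["hv"] < 0.05:     # weak cross-pol: little volume scattering
        return "surface"
    if backscatter["hv"] < 0.20:
        return "low vegetation"
    return "tall vegetation"

def level2(broad_class, backscatter):
    """Level II: refine each broad class with its own rule set."""
    if broad_class == "tall vegetation":
        return "coniferous" if backscatter["hh"] > backscatter["vv"] else "deciduous"
    if broad_class == "low vegetation":
        return "row crop" if backscatter["hh"] > 0.10 else "grass"
    return "bare surface"

def classify(backscatter):
    # the hierarchy: broad Level I decision first, then refinement
    return level2(level1(backscatter), backscatter)

pixel = {"hh": 0.30, "vv": 0.25, "hv": 0.22}
label = classify(pixel)   # tall vegetation, refined to "coniferous"
```

The point of the hierarchy is that the near-perfect Level I decision confines each Level II rule set to a small, physically homogeneous group of classes, which is why the per-class rules can stay this simple.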
Table 4: Ratings of three different Methods. Each Method was rated on a one-to-five star scale for the technical considerations: maturity in terms of technical and EO compliance, novelty, operational perspectives (timing, effort, computational requirements, cost etc.), accuracy in relation to the specific application, potential use in other areas, perspectives for automation, and feasibility of implementing the Method within SURF (resources). The accompanying comments were:

Entropy based prediction:
- Maturity: the Method has not been used before on images; it is widely used for text processing.
- Operational perspectives: somewhat uncertain. In principle, the Method can be implemented to run in time proportional to the length of the sequence, but it will take some care. A prototype program was run some years ago on an image (32x32 pixels) with sequences of length 96.
- Accuracy: cannot be estimated due to the novelty of the Method.
- SURF feasibility: a prototype could be implemented by the team, but not a real-time version.

NDVI Trend Map:
- Maturity: mature and used; must be assessed for the area considered. Demonstration studies have been made, and a number of independent science teams currently work with the Method.
- Operational perspectives: very good; the Method has been demonstrated.
- Accuracy: alternatives are less accurate and will not cover the entire area being monitored.
- Potential use in other areas: may be used within different vegetation types such as forest, and for long-term changes in land surface temperature.
- Perspectives for automation: most relevant, because the Method is data demanding.
- SURF feasibility: the team has experience developing and implementing the Method.

Knowledge-based classification of SAR data:
- Maturity: the Method is robust because it is based on physical modelling and knowledge of backscatter.
- Operational perspectives: when the classification scheme is set up, the Method is extremely simple and fast, and the computational requirements are small; good perspectives in handling large data volumes.
- Accuracy: normally the classification is into broad classes; the actual accuracy depends on the data available (parameters and acquisition time).
- Potential use in other areas: the Method is already used in other applications.
- Perspectives for automation: the setup of the classification scheme is done manually, and then the classification is fully automatic.
- SURF feasibility: relatively simple Method without any computational complexity; most of the work lies in the setup of the classification scheme.
User Considerations: each Method was likewise rated on a one-to-five star scale for the application associated with it. The accompanying comments were:

Entropy based prediction (application: mapping deforestation):
- The users' need for information: strong need, especially in countries where conventional statistics are not reliable.
- Potential provision of information relative to the effort of obtaining it through other sources: EO data are the only feasible information source providing homogeneous data at the regional level.
- Users' need for automation: large data volumes necessitate a high degree of automation.

NDVI Trend Map (application: mapping desertification):
- The users' need for information: needed for monitoring the state of the environment and possible degradation of natural resources.
- Potential provision of information relative to other sources: EO data are the only feasible information source providing homogeneous data at the regional level.
- Users' need for automation: large data volumes necessitate a high degree of automation.

Knowledge-based classification of SAR data (application: mapping land cover and land use changes):
- The users' need for information: planning and natural resource management.
- Potential provision of information relative to other sources: homogeneous and repetitive information from EO data.
- Users' need for automation: assistance to compile the vast database of EO data is needed.
CONCLUSION
A strategy has been presented for a survey and assessment of candidate EO Methods to be implemented in automatic or semi-automatic feature extraction. The thematic areas and subsequent applications where EO data can support users with important information have been listed based on the literature and existing knowledge. These are rated according to the users' potential need for information. Only thematic areas and applications where the users' interest can be justified have been selected for further consideration. For each application, different Methods are listed and a first assessment is made in order to reduce the number of Methods to approximately 20. Examples are presented, including the corresponding ratings. This rating of the Methods is to a large extent subjective; however, it is assessed that it can be used for documentation and a first selection. An alternative would be to use a decision model attributing different weights to the different ratings and using conditional selections. In this case it will be essential to guarantee the independence between the team responsible for the development of the decision model and the team responsible for the rating of the Methods. Furthermore, evaluator bias should be assessed through cross-checks.
The next step in the process will be the in-depth feasibility study of the 20 selected Methods. The results of the SURF project will become publicly available mid 2004 at ESRIN's web pages at earth.esa.int/rtd/Projects.
ACKNOWLEDGEMENTS
The SURF Consortium consists of ROVSING A/S, Geographic Resource Analysis and Science Ltd. (GRAS), Technical University of Denmark (Ørsted-DTU), and the Department of Computer Science, University of Copenhagen. The SURF project is funded by the European Space Agency (ESA), ESRIN contract no. 17127/03/I-OL, with Alessandro Ciarlo as the technical officer for SURF.
REFERENCES
Dobson, M.C., F.T. Ulaby, and L.E. Pierce, 1995, Land-Cover Classification and Estimation of Terrain Attributes Using Synthetic Aperture Radar, Remote Sensing of Environment, 51, 199-214.

Dobson, M.C., L.E. Pierce, and F.T. Ulaby, 1996, Knowledge-Based Land-Cover Classification Using ERS-1/JERS-1 SAR Composites, IEEE Transactions on Geoscience and Remote Sensing, 34, 83-99.

Justice, C.O., Vermote, E., Townshend, J.R.G., DeFries, R.S., Roy, D.P., Hall, D.K., Salomonson, V.V., Privette, J.L., Riggs, G., Strahler, A.H., et al., 1998, The Moderate Resolution Imaging Spectroradiometer (MODIS): Land Remote Sensing for Global Change Research. IEEE Transactions on Geoscience and Remote Sensing, 36, 1228-49.

Kumar, K., Monteith, J.L., 1981, Remote Sensing of Crop Growth. In Plants and the Daylight Spectrum, edited by Smith, H. (London: Academic Press), pp. 133-44.

Lempel, A. & Ziv, J., 1986, Compression of 2-dimensional data. IEEE Transactions on Information Theory, 32, 2-8.

Nemani, R.R., Keeling, C.D., Hashimoto, H., Jolly, W.M., Piper, S.C., Tucker, C.J., Myneni, R.B., Running, S.W., 2003, Climate-driven increases in global terrestrial net primary production from 1982 to 1999. Science, 300, 1560-3.

Pierce, L.E., F.T. Ulaby, K. Sarabandi, and M.C. Dobson, 1994, Knowledge-Based Classification of Polarimetric SAR Images, IEEE Transactions on Geoscience and Remote Sensing, 32, 1081-1086.

Pierce, L.E., K.M. Bergen, M.C. Dobson, and F.T. Ulaby, 1998, Multitemporal Land-Cover Classification Using SIR-C/X-SAR Imagery, Remote Sensing of Environment, 64, 20-33.

Prince, S.D., 1991, A model of regional primary production for use with coarse resolution satellite data. International Journal of Remote Sensing, 12, 1313-30.

Rasmussen, M.S., 1992, Assessment of millet yields and production in northern Burkina Faso using integrated NDVI from the AVHRR. International Journal of Remote Sensing, 13, 3431-42.

Rigina, O. and Rasmussen, M.S., 2003, Using trend line and principal component analysis to study vegetation changes in Senegal 1986-1999 from AVHRR NDVI 8 km data. Danish Journal of Geography, 103, 31-42.

Shannon, C.E., 1948, A Mathematical Theory of Communication. Bell System Technical Journal, 27, 379-423.

SISCAL, 2003, URL: www.SISCAL.NET, August 2003.

Tottrup, C. and Rasmussen, M.S., submitted, Mapping long-term changes in crop productivity in Senegal through trend analysis of time series of remote sensing data. Submitted to: Agriculture, Ecosystems and Environment.

Tucker, C.J., Justice, C.O., Prince, S.D., 1986, Monitoring the grasslands of the Sahel 1984-1985. International Journal of Remote Sensing, 7, 1571-81.

Ukkonen, E., 1992, Constructing suffix trees on-line in linear time. In J. van Leeuwen, editor, Algorithms, Software, Architecture. Information Processing 92, Volume 1, pages 484-492. Elsevier.

Weiner, P., 1973, Linear pattern matching algorithms. In 14th Annual Symposium on Switching and Automata Theory, pages 1-11, The University of Iowa, 15-17 October 1973. IEEE.
AUTOMATIC CHANGE DETECTION FOR VALIDATION OF DIGITAL MAP DATABASES
Brian Pilemann Olsen
Thomas Knudsen
National Survey and Cadastre, Denmark, 8 Rentemestervej, DK–2400 Copenhagen NV, Denmark, {bpo|thk}@kms.dk
KEY WORDS: Photogrammetry, Remote Sensing, Change Detection, Classification, Automation, High Resolution, Infra-red, DEM/DTM
ABSTRACT
In almost all areas of our society there is an increasing need for up-to-date digital map databases. Traditionally, different manual, labour-intensive and hence costly methods have been used for map updating, with the change detection being by far the most complex and expensive part. In this paper an automatic change detection method is presented and evaluated. The method only considers changes in the buildings theme, but it can be extended to other object classes. The aim is the development of an efficient change detection procedure for database maintenance in a production environment. The method is based on classification principles and combines an unsupervised and a supervised classification in order to determine the spectral response of the building class and thus locate potential buildings. The result is then refined by a height filter. The method is evaluated on building registrations from the Danish TOP10DK map database. The test case presented in the paper is from a residential suburban area. The method detects almost all changes due to demolished buildings, whereas only a small fraction of the new buildings is detected, primarily because a very special roofing material was used. The method leads to a number of false alarms, which to a large degree can be eliminated by refinements of the algorithm or by introduction of additional information, e.g. infra-red images or texture measures.
1 INTRODUCTION

As can be seen from figure 1, buildings are often highly diverse when it comes to size, form and spectral signature. They are therefore hard to describe by spectral information only. Adding height information to the process, e.g. in the form of digital surface models, may improve the separation of buildings from other objects having a similar spectral response.

When introducing automatic change detection procedures, the aim is to detect at least the same percentage of factual changes as a manual operator is capable of. It is, on the other hand, acceptable if the change detection procedure introduces false alarms, as long as they are few, since they can easily be rejected during the actual 3D object registration.
The digital topographic map database TOP10DK is the primary topographic product of the National Survey and Cadastre, Denmark (Kort & Matrikelstyrelsen, KMS). The development and update of TOP10DK is based on aerial photogrammetry: one fifth of Denmark is photographed each spring before foliation, resulting in approx. 1200 photos (about 400 GB of image data).
Map updating can be carried out by a complete remapping of the area for each revision cycle, but much work can be saved by detecting changes from the previous version of the map database and concentrating on these areas of change. Change detection for topographical mapping is, on the other hand, not a simple task. Although the intention is always to carry out the photo flights at approximately the same time of year, the natural, inter-annual variations of the vegetation coverage are of a magnitude that hides the (primarily human-generated) changes sought for. Furthermore, it is almost impossible to take the photos at the same geographical position and with the same attitude as in the previous photo campaign. This means that the change detection must be carried out by comparing a new image directly with the existing map database, rather than by a simpler image-to-image comparison.
In this paper an automatic procedure for change detection concentrating on buildings, which are important mapping objects, is presented. The next step in the update process, the actual 3D object registration, is not considered here; this subject has recently been treated extensively by Niederöst (2003) and Süweg (2003).
Figure 1: Buildings are typically highly diverse and spectrally ill-defined when considered as a single group. The last image (lower right) is a building as it is represented in a DSM.
1.1 Related work
Other European countries, e.g. Germany, Switzerland, and the Netherlands, have also established and completed digital map databases with national coverage in the past few years. The national mapping agencies in these countries therefore face the same problem as the KMS, and projects with similar aims considering automatic or semi-automatic map updating have been established.

In Germany the project for updating the ATKIS database focuses on registration of more generic surface types (settlement, grassland, street, water etc.) (Petzold and Walter, 1999, Walter and Fritsch, 2000). The method for change detection uses supervised classification, with training areas automatically generated using the existing registrations in the ATKIS map database (Walter, 2000). Experiments combining multi-spectral images (RGB, colour infra-red, CIR) with height information and reduction of the information to generic surface types have shown that it is possible to perform automatic change detection with a satisfactory accuracy (Petzold and Walter, 1999, Petzold, 2000) (note, however, that the accuracy requirements for ATKIS are somewhat lower than for TOP10DK (Kort & Matrikelstyrelsen, 2001, AVLBD, 1988)). The change detection leads to a "change map"
where the generic objects are divided into three classes: no change, possible change, and change.
In the Swiss ATOMI project, aerial colour photos, a high-resolution Digital Elevation Model (DEM) and a Digital Surface Model (DSM) are used, aiming at the enhancement of the planimetric accuracy of the 2D VECTOR25 database (Eidenbenz et al., 2000, Niederöst, 2003). The surface model is generated by auto-correlation in aerial photographs at a scale of 1:10,000 and is used as the primary data source. Image information (RGB/CIR) is primarily used to discern man-made objects from natural objects (buildings vs. vegetation).
Data from the digital multi-spectral camera High Resolution Stereo Camera, Airborne (HRSC-A) (Neukum, 1999) are evaluated and used within the Dutch project (Asperen, 1996, Hoffmann et al., 2000). The HRSC-A data set includes high-resolution (15 cm) spectral data (RGB and CIR) and an automatically generated high-resolution surface model from stereo matching.
A new change detection project within the framework of EuroSDR is about to start later this year. The emphasis is on development of methods for localising changes in land cover from very high resolution imagery, the integration of change maps in the updating process, and finally comparison of different methods for change detection (EuroSDR, 2004).
2 DATA
The change detection procedure presented in section 3 below is evaluated using datasets mainly associated with the development and updating of the Danish TOP10DK topographical map database.
2.1 RGB images
For the establishment and update of TOP10DK, traditional RGB aerial photographs have been used. All images are taken from an altitude of approximately 3800 m, leading to a scale of 1:25,000. Each image covers an area of 6 km by 6 km and has a forward lap of 60 percent and a side lap of 20 percent. As part of the production workflow the photos are scanned at a resolution of 21 µm, leading to 350 MB of data and a spatial pixel resolution of 0.5 m at ground level. The photos were taken as part of a flight campaign in April 2000.
2.2 Digital Surface Model (DSM)
As described by Knudsen and Olsen (2003), it is very difficult to locate changes in the building layer using single aerial images, i.e. using only spectral information in combination with size and form considerations. Therefore a high-resolution digital surface model (DSM) with a grid size of 1 meter, covering a test area in Lyngby, north of Copenhagen, has been generated to facilitate the building detection. The dataset was collected and made available for these studies by the Danish engineering and mapping company COWI. Data were collected in May-June 2001 using the TOPOSYS1 system (Toposys, 2004, Baltsavias, 1999), which records only the first responses of the laser pulse. The expected height accuracy is approximately 0.15 m.
2.3 Digital Map Database
The building theme from TOP10DK has been selected as target for the update procedure. TOP10DK is a fully 3D map database, including 51 object types (building, lake, highway . . . ) organised in 8 classes (traffic, water . . . ). The precision of the database is better than 1 meter both horizontally and vertically. For change detection in the building layer, only new buildings larger than 25 m² and changes of building size larger than 10 m² are considered.
3 METHOD
The method presented is a revision of a method described by Olsen et al. (2002) and Knudsen and Olsen (2003). It is based on classification principles, using existing object registrations in the map database as training areas in order to determine the characteristics of the different classes used to search for and build the object model. As it is very difficult to generate an unambiguous object model for buildings using only spectral information, the revised method also incorporates height information in the form of high-resolution DSM data, e.g. from LIDAR or photogrammetric auto-correlation, to distinguish objects on the terrain from objects above the terrain.
3.1 The method step by step

The method, which consists of three steps (preparation, classification, and detection), is outlined in figure 2.
Two major assumptions have to be fulfilled for the change detection procedure to be successful:

(1) The number of changes in a given class (e.g. building) must be much smaller than the number of objects used to describe the class. This is valid for most urban areas.

(2) New objects must share the spectral characteristics of the existing objects used to generate the object model. This is often the case, as only a small number of roofing materials is in common use.
3.1.1 Preparation: The preparation consists of a data fusion step to bring the data sets into a common reference frame and a preprocessing step where various enhancement methods are applied to the data to prepare them for the change detection procedure.
Data fusion: As objects from the existing digital map database are to be used as training areas for the determination of the class characteristics, image data (raster) and the map database (vector) must be co-registered. Generally, co-registration can be done either by registration of the image data to the map database or by registration of the map database to the image data.
The most common method is registration of image data to the map database. However, this method has the disadvantage that most image data types (aerial photos) have to be re-sampled as rectified images or orthophotos. For the data sets to fit completely to each other, a high-precision elevation model including a description of man-made objects (buildings, bridges etc.) must be available (i.e. a Digital Surface Model, DSM).
Another way is to project the map database directly to the other data sets, e.g. onto the aerial images using the basic photogrammetric equations (Kraus, 1993). For this to work, a database with (X, Y, Z) coordinates and the orientation parameters for the aerial images have to be available. This method leads to the most precise co-registration and eliminates any resampling of image data.
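The projection of 3D map coordinates into an aerial image uses the standard collinearity equations (Kraus, 1993). A minimal sketch, in which the exterior orientation (projection centre X0, rotation matrix R) and the focal length f are illustrative values rather than parameters from the paper:

```python
import numpy as np

def project_point(X, X0, R, f):
    """Collinearity equations: transform object point X into the camera
    frame, d = R^T (X - X0), then apply the perspective division to get
    image-plane coordinates (x, y)."""
    d = R.T @ (np.asarray(X, float) - np.asarray(X0, float))
    x = -f * d[0] / d[2]
    y = -f * d[1] / d[2]
    return x, y

# illustrative vertical photo from 3800 m with a 152 mm lens
# (3800 m / 0.152 m = 25,000, matching the 1:25,000 scale of section 2.1):
# a point 100 m east of the nadir maps to 100 m / 25,000 = 4 mm
x, y = project_point(X=(100.0, 0.0, 0.0),
                     X0=(0.0, 0.0, 3800.0),
                     R=np.eye(3),
                     f=0.152)
```

In practice R is built from the photo's measured attitude angles; the identity matrix here simply models an exactly vertical photograph.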
Preprocessing: Various algorithms are applied to the data sets to prepare them for the change detection process. The three most important processes are: (1) calculation of NDVI (Normalised Difference Vegetation Index) images if colour infra-red (CIR) photos are available; (2) generation of a normalized Digital Surface Model (nDSM); and (3) evaluation of training areas.
NDVI is calculated as NDVI = (ir − red)/(ir + red); NDVI is well suited for distinguishing vegetated areas from man-made objects.
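As a concrete illustration of the formula, a per-pixel NDVI computation might look as follows (a sketch; the band values and the vegetation threshold of 0.1 follow section 3.1.1, the epsilon guard is our addition):

```python
import numpy as np

def ndvi(ir, red, eps=1e-9):
    """NDVI = (ir - red) / (ir + red), computed per pixel.
    eps guards against division by zero in completely dark pixels."""
    ir = np.asarray(ir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (ir - red) / (ir + red + eps)

# vegetation reflects strongly in the near infra-red,
# man-made surfaces usually do not:
veg = ndvi([200.0], [50.0])    # clearly above the 0.1 threshold
roof = ndvi([80.0], [90.0])    # below the threshold
```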
An nDSM only includes objects which stand above the terrain, and it can be calculated using a Digital Terrain Model (DTM): nDSM = DSM − DTM. If a DTM is not available, it must be estimated from the DSM. A very simple method for DTM estimation using grey-tone morphology is described by Weidner and Förstner (1995) and used in these tests. First a minimum filtering of the DSM is performed using a flat structuring element B (with a given size and form). In this way the minimum height in the area determined by the structuring element is assigned to the origin of the structuring element (pixel). This minimum filtering is followed by a maximum filtering, using the same flat structuring element. Performing the two steps in the described order equals a morphological opening, z̄ = z ◦ B, and leads to an estimation or approximation of the topographic surface, the DTM. In order to eliminate all elements above terrain (buildings), the size of the structuring element must be chosen in such a way that it is never completely contained in a building. The size depends on the area to be processed. If a priori information concerning existing building sizes in the area is available, the size can be fixed using this information. In the test presented in this paper the size of B is fixed to 25 m. The process is illustrated in figure 3.
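The DTM estimation described above can be sketched with a naive (unoptimised) grey-tone opening; the toy DSM, grid size and element size below are illustrative, not the paper's data:

```python
import numpy as np

def grey_opening(dsm, size):
    """Morphological opening with a flat square structuring element:
    a minimum filter (erosion) followed by a maximum filter (dilation).
    Objects narrower than the element (e.g. buildings) are removed,
    which approximates the terrain surface (DTM)."""
    def sliding(img, op):
        pad = size // 2
        padded = np.pad(img, pad, mode='edge')
        out = np.empty(img.shape)
        for i in range(img.shape[0]):
            for j in range(img.shape[1]):
                out[i, j] = op(padded[i:i + size, j:j + size])
        return out
    eroded = sliding(np.asarray(dsm, float), np.min)   # minimum filtering
    return sliding(eroded, np.max)                     # maximum filtering

# toy DSM: flat terrain at 10 m with a 3x3 "building" rising 8 m;
# a 5x5 element is never fully contained in the building, so the
# opening removes it and nDSM = DSM - DTM isolates the building
dsm = np.full((9, 9), 10.0)
dsm[3:6, 3:6] = 18.0
dtm = grey_opening(dsm, size=5)
ndsm = dsm - dtm
```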
Figure 3: nDSM creation using an artificial DTM. UL: DSM. UM: estimated DTM, z̄ = z ◦ B. UR: nDSM = DSM − DTM. LL: DSM profile. LM: DTM profile. LR: nDSM profile. All profiles follow the red line in the DSM, DTM and nDSM respectively.
Figure 2: Change detection workflow; cf. section 3 for description
Validation of the training areas is done using the estimated nDSM and/or the NDVI image. A mask of objects above terrain can be generated by thresholding the nDSM at a height of e.g. 2.5 meters. With this mask, areas registered as buildings in the existing map database which no longer stand above terrain are filtered out. Objects covered by vegetation can be eliminated using the NDVI mask (if available), as they can be detected as areas with NDVI ≥ 0.1. The result of the validation is a refined building mask, holding only the buildings which are most likely still buildings.
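The validation step can be sketched as two boolean masks combined with the building registrations (the thresholds follow the text; the array names and toy values are hypothetical):

```python
import numpy as np

def refine_building_mask(registered, ndsm, ndvi=None,
                         height_thresh=2.5, ndvi_thresh=0.1):
    """Keep only registered building pixels that still stand above the
    terrain; if an NDVI image is available, additionally drop pixels
    that look like vegetation (NDVI >= 0.1)."""
    mask = registered & (ndsm >= height_thresh)
    if ndvi is not None:
        mask = mask & (ndvi < ndvi_thresh)
    return mask

registered = np.array([True, True, True])  # three "building" pixels
ndsm = np.array([5.0, 1.0, 6.0])           # 2nd no longer stands above terrain
ndvi = np.array([0.02, 0.02, 0.4])         # 3rd is covered by vegetation
refined = refine_building_mask(registered, ndsm, ndvi)
```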
3.1.2 Classification: The first step in the classification part is to perform a clustering process. As stated by e.g. Kressler and Steinnocher (1996), some classes (e.g. buildings) have to be subdivided into more unique subclasses, as they are spectrally highly diverse. This task is handled by splitting the group of pixels registered as buildings by the building mask into smaller and more unique sub-classes using a simple migrating-means clustering process. The algorithm is based on the ISODATA algorithm (Ball and Hall, 1965), and the number of sub-classes is automatically determined by the algorithm in order to make a best fit to the input dataset.
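A greatly simplified sketch of the migrating-means idea (a plain k-means-style iteration; the ISODATA variant used by the authors additionally splits and merges clusters so that the number of sub-classes is chosen automatically, which is omitted here):

```python
import numpy as np

def migrating_means(pixels, k, iters=20):
    """Migrating means: repeatedly assign each pixel to the nearest
    cluster centre and migrate each centre to the mean of its members.
    Centres are initialised deterministically by spreading them over
    the pixels sorted by brightness (a simplification)."""
    pixels = np.asarray(pixels, float)
    order = np.argsort(np.linalg.norm(pixels, axis=1))
    init = order[np.linspace(0, len(pixels) - 1, k).astype(int)]
    centres = pixels[init].copy()
    for _ in range(iters):
        dist = np.linalg.norm(pixels[:, None, :] - centres[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centres[c] = pixels[labels == c].mean(axis=0)
    return centres, labels

# two spectrally distinct "roof" populations in RGB space
pixels = np.vstack([np.full((5, 3), 30.0), np.full((5, 3), 200.0)])
centres, labels = migrating_means(pixels, k=2)
```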
The clustering process is followed by the actual classification. The sub-classes, which are spectrally more uniform than the base building class, are used (either alone or in combination with other class descriptions, e.g. water, roads, forest, grassland etc.) to perform a Mahalanobis classification of the entire image. This causes all pixels in the image to be assigned to the class having the smallest Mahalanobis distance from the pixel value to the class (Richards and Jia, 1999). Threshold values, all depending on the class characteristics, are used to assign pixels with a distance too far from the closest class to a garbage class.
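The Mahalanobis classification with a garbage class can be sketched as follows (the class statistics and distance threshold are illustrative values, not the paper's; Richards and Jia (1999) describe the distance itself):

```python
import numpy as np

def mahalanobis_classify(pixels, means, covs, max_dist=3.0):
    """Assign each pixel to the class with the smallest Mahalanobis
    distance; pixels farther than max_dist from every class mean go
    to a garbage class, labelled -1."""
    pixels = np.atleast_2d(np.asarray(pixels, float))
    dists = []
    for mu, cov in zip(means, covs):
        diff = pixels - np.asarray(mu, float)
        inv = np.linalg.inv(np.asarray(cov, float))
        # per-pixel quadratic form diff^T inv diff
        d2 = np.einsum('ij,jk,ik->i', diff, inv, diff)
        dists.append(np.sqrt(d2))
    dists = np.array(dists)                # shape: (n_classes, n_pixels)
    labels = dists.argmin(axis=0)
    labels[dists.min(axis=0) > max_dist] = -1
    return labels

means = [np.array([0.0, 0.0]), np.array([10.0, 10.0])]
covs = [np.eye(2), np.eye(2)]
labels = mahalanobis_classify([[0.5, 0.0], [10.0, 9.0], [100.0, 100.0]],
                              means, covs)
```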
The two successive steps are run a number of times as part of an iteration process. This is done mainly because the result of the ISODATA algorithm is strongly dependent on the position of the initial cluster centres. After this iteration process it is possible to accept pixels identified as buildings a specific number of times. In the case study presented below, all pixels which are classified as a building one or more times are considered to be a "building", leading to an image holding pixels with values of either zero or one: zero indicates no building and one indicates a potential building pixel.

Using the nDSM, the image holding potential buildings is filtered in order to extract only objects (pixels) which stand above the terrain.
3.1.3 Change detection: First a change map is computed by a pixel-by-pixel comparison of the existing map database (in a raster version) to the classification result. Since the change map includes all potential changes in the building layer, it includes noise in the form of single pixels, and some false alarms due to misclassification. The single pixels are removed using morphological opening. The remaining change pixels are segmented, and pixel clusters smaller than the detection requirement (e.g. 25 m² = 25 pixels in the TOP10DK case) and/or not fulfilling the size and shape specifications for buildings are removed from the dataset, leading to a reduction of the false alarms and the final change map.
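The comparison and the area filtering can be sketched as follows (the morphological opening and the shape tests from the text are omitted; 1 pixel corresponds to 1 m² here, so min_area=25 matches the 25 m² requirement; the toy rasters are hypothetical):

```python
import numpy as np

def change_map(map_raster, detected, min_area=25):
    """Pixel-by-pixel comparison of the rasterised map database with the
    classification result, then removal of 4-connected change regions
    smaller than min_area pixels."""
    changes = map_raster != detected
    out = np.zeros_like(changes)
    seen = np.zeros_like(changes)
    rows, cols = changes.shape
    for r in range(rows):
        for c in range(cols):
            if changes[r, c] and not seen[r, c]:
                stack, region = [(r, c)], []   # flood-fill one region
                seen[r, c] = True
                while stack:
                    i, j = stack.pop()
                    region.append((i, j))
                    for ni, nj in ((i + 1, j), (i - 1, j),
                                   (i, j + 1), (i, j - 1)):
                        if (0 <= ni < rows and 0 <= nj < cols
                                and changes[ni, nj] and not seen[ni, nj]):
                            seen[ni, nj] = True
                            stack.append((ni, nj))
                if len(region) >= min_area:    # keep plausible buildings only
                    for i, j in region:
                        out[i, j] = True
    return out

# toy example: one 6x6 new "building" (36 px, kept) and one
# 2x2 noise blob (4 px, rejected)
db = np.zeros((20, 20), dtype=int)
cls = db.copy()
cls[1:7, 1:7] = 1
cls[15:17, 15:17] = 1
result = change_map(db, cls)
```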
4 CASE STUDY

The procedure is tested on the data described in section 2. The latest update of the TOP10DK database was carried out five years before the photos were taken.
4.1 Test area

The test area used for evaluation is situated in Kgs. Lyngby, a suburb 15 km north of Copenhagen. The area contains many different types of buildings and houses, since it includes a small industrial area; a cemetery; a church; a small train station; large strip buildings; and a gasoline station. The looks and shapes of the buildings as well as their heights differ a lot. Vegetated areas take up a large part of the area, and since the area also includes a highway, two bridges (one for pedestrians and one for cars) and a railroad, this causes a very special terrain structure. The area is also characterized by the fact that many changes have taken place since the establishment of the TOP10DK database.

Area: approximately 700 m × 500 m. Lower left corner (E, N) = (718450, 6187050); upper right corner (E, N) = (719150, 6187550). (E, N) coordinates are given in UTM zone 32. All images (RGB, TOP10DK and DSM) are 500 rows by 700 columns, and approximately 70 buildings are included in the area. 72 registrations are included in the existing map database. 12 new houses have been built since the last revision and 14 have been demolished.

4.2 Test data

All datasets are subsamples from larger datasets, and they are brought into the same geographical reference by orthorectification of the aerial photographs using an existing digital terrain model (DTM) with a grid spacing of 20 m.

4.3 Results

The results are visualised in figure 4. The first image shows the RGB image with the existing map database superimposed in yellow colour. It can be seen that a lot of development has taken place since the last revision of the map database. This is most pronounced in the right part of the image, where 7 buildings have been demolished and 10 new buildings with blue roofing material have been built.

The second image from the top shows the result after the classification step. White pixels indicate potential buildings, and as can be seen, large areas are misclassified: vegetation and roads are classified as potential buildings.

The third image shows the result after the height filtering. It can be seen that all roads are now removed. Some vegetated areas still remain as potential buildings, though.

The last image shows the RGB image with the changes found by the automatic change detection algorithm superimposed in yellow. The results are summarised in table 1.

                        Factual   Detected
  Demolished buildings    14        12
  New buildings           12         2
  Changes                 26        14
  False alarms             –        45

Table 1: Statistical results

Approximately 50 percent of the factual changes in the test area have been detected by the algorithm.

5 DISCUSSION

Most success is experienced in the group of demolished buildings, where only two demolished buildings have not been detected. This is caused by the method used for change detection, where the detected buildings are compared directly to the existing registrations on a pixel-wise basis. The two demolished buildings which are not detected are positioned in the upper right area of the test area (marked by a red circle), and as can be seen, a new building has been built exactly at the position where the two old buildings were. Due to the comparison method, neither the new nor the demolished buildings are "highlighted" by the algorithm. Only two new buildings have been detected by the algorithm. The reason for the poor result is that 9 out of the twelve new buildings have completely different spectral responses than the existing buildings in the area, as they are either blue (6 buildings) or still not finished (3). As one of the two hypotheses regarding the change detection procedure is not fulfilled, the algorithm is expected to fail.
45 false alarms (3 times the number of factual changes found) are generated. Of those, 26 are located in vegetated areas, 3 are bridges (above terrain), and 16 are caused by existing buildings which apparently have not been re-detected by the algorithm. If infra-red images are available, the majority of the false alarms can be eliminated using the NDVI or by calculation of textural figures using the DSM, as it can be expected that the texture of forested areas differs from that of buildings. The 3 false alarms caused by bridges can only be eliminated by the use of a more "clever" algorithm for nDSM generation. Looking more into the false alarms caused by buildings not re-detected, it can be seen that a large proportion of those buildings are actually detected (figure 4, examples marked by green circles), but these detections are eliminated in a later stage of the change detection algorithm as part of the noise reduction. Refining the noise reduction method may lead to more existing buildings being "re-detected". One of the false alarms (shown by a white circle in the upper right corner of the area) is a factual difference but not a change, since it is a roof covering a gasoline station, and such roofs are not to be registered in the TOP10DK database according to the map specification. Such false alarms can only be verified by a human operator.
6 CONCLUSION
The method presented shows reasonable performance when detecting demolished buildings (12 out of 14 are detected), whereas the number of new buildings detected is poor (only 2 out of 12). There is a clear reason for this: the new buildings do not share the spectral response of the existing buildings in the area, so one of the hypotheses behind the algorithm is not fulfilled. The algorithm introduces a fairly large number of false alarms (3 times the number of factual changes detected). Most of these false alarms can be eliminated by refining some of the processing steps in the algorithm (noise reduction, DSM generation) or by introduction of additional information, e.g. infra-red images or textural measures.
Acknowledgements: We would like to thank the engineering and
mapping company, COWI, for letting us use their Digital Surface
Model.
REFERENCES
Asperen, P. V., 1996. Digital updates of the Dutch topographic
service. In: International Archives of Photogrammetry and Remote Sensing 31(B4), ISPRS, pp. 891–900.
AVLBD, 1988. Amtliches Topographisch-Kartographisches Informationssystem (ATKIS). http://www.adv.online.de/english/products/atkis.htm.
Ball, G. H. and Hall, D. J., 1965. A novel method of data analysis and pattern classification. Technical report, Stanford Research Institute, Menlo Park, CA.
Baltsavias, E. P., 1999. Airborne laser scanning: existing systems and firms and other resources. ISPRS Journal of Photogrammetry and Remote Sensing 54(2-3), pp. 164–198.
Eidenbenz, C., Kaeser, C. and Baltsavias, E., 2000. ATOMI - automated reconstruction of topographic objects from aerial images
using vectorized map information. In: International Archives of
Photogrammetry and Remote Sensing, Vol. XXXIII (B3), ISPRS,
pp. 462–471.
Figure 4: Results from the change detection algorithm.
EuroSDR, 2004. EuroSDR – Change detection project for consideration. http://www.eurosdr.org/2002/research/ResPoss.asp?ResPosID=27&#pos.
Hoffmann, A., van der Vegt, J. W. and Lehmann, F., 2000. Towards automated map updating: is it feasible with new data acquisition and processing techniques? In: International Archives of Photogrammetry and Remote Sensing, Vol. XXXIII (B2), ISPRS, pp. 295–302.
Knudsen, T. and Olsen, B. P., 2003. Automated change detection
for updates of digital map databases. Photogrammetric Engineering & Remote Sensing 69(11), pp. 1289–1296.
Kort & Matrikelstyrelsen, 2001. TOP10DK. Geometrisk registrering. Technical report, National Survey and Cadastre, Denmark, Rentemestervej 8, 2400 København NV, Denmark.
Kraus, K., 1993. Photogrammetry. Vol. 1, 4 edn, Dümmler, Bonn,
Germany.
Kressler, F. and Steinnocher, K., 1996. Change detection in urban areas using satellite images and spectral mixture analysis. In: International Archives of Remote Sensing and Photogrammetry, Vol. 31 (B7), ISPRS, pp. 379–383.
Neukum, G., 1999. The airborne HRSC-A: Performance results and application potential. Photogrammetrische Woche, pp. 83–88.
Niederöst, M., 2003. Detection and Reconstruction of Buildings for Automated Map Updating. PhD thesis, Institut für Geodäsie und Photogrammetrie, Eidgenössische Technische Hochschule (ETH), 8093 Zürich.
Olsen, B. P., Knudsen, T. and Frederiksen, P., 2002. Digital change detection for map database update. In: J. Chen
and J. Jiang (eds), Integrated Systems for Spatial Data Production, Custodian and Decision Support, Vol. XXXIV (2), ISPRS,
pp. 357–363.
Petzold, B., 2000. Revision of topographic databases by satellite
images – experiences and expectations. pp. 15–23.
Petzold, B. and Walter, V., 1999. Revision of topographic databases by satellite images. In: M. Schroeder, K. Jacobsen, G. Konecny and C. Heipke (eds), Sensors and mapping from space 1999, ISPRS, Hanover, Germany, p. 9.
Richards, J. A. and Jia, X., 1999. Remote Sensing and Digital
Image Analysis: an introduction. 3 edn, Springer, Berlin, Germany.
Süweg, I., 2003. Reconstruction of 3D Building Models from
Aerial Images and Maps. Publications on Geodesy, Netherlands
Geodetic Commision, Delft, the Netherlands.
Toposys, 2004. Toposys1 specification. http://www.toposys.de/.
Walter, V., 2000. Automatic change detection in GIS databases based on classification of multispectral data. In: International Archives of Photogrammetry and Remote Sensing, Vol. XXXIII (B4), ISPRS, pp. 1138–1145.
Walter, V. and Fritsch, D., 2000. Automated revision of GIS databases. In: L. K.-J. et al. (eds), Proceedings of the Eighth ACM Symposium on Advances in Geographic Information Systems, ACM, pp. 129–134.
Weidner, U. and Förstner, W., 1995. Towards automatic building extraction from high resolution digital elevation models. ISPRS Journal of Photogrammetry & Remote Sensing 50(4), pp. 38–49.
Spatial-temporal Disambiguation of Multi-modal Descriptors

Norbert Krüger¹ and Florentin Wörgötter²

¹ Computer Science, Aalborg University Esbjerg, 6705 Esbjerg, Denmark, [email protected]
² Computational Neuroscience, University of Stirling, Stirling FK9 4LA, Scotland, UK, [email protected]
Abstract

In this paper, we describe a new kind of visual representation in terms of local multi-modal Primitives and its role as the initiator of a disambiguation process based on regularities in visual data. Our Primitives can be characterized by three properties: (1) They describe visual information by different modalities. (2) They are essentially adaptable according to the spatial and temporal context. (3) They give a condensed representation of local image structure that makes use of meaningful attributes.

Our Primitives are motivated by human visual processing. They are functional abstractions of hypercolumns in visual cortex [1, 2, 3]. The efficient and generic coding of visual information allows for a wide range of applications. For example, they have been used to investigate the multi-modal character of Gestalt laws in natural scenes [4], to code multi-modal stereo matching, and to investigate the role of different visual modalities for stereo [5] and in a combination of grouping and stereo [6]. In this paper, we use the regularity of Rigid Body Motion to acquire reliable 3D information in a spatial-temporal disambiguation process.
1 Introduction
The aim of this work is to compute reliable feature maps from natural scenes. We believe that to establish artificial systems that perform reliable actions we need such reliable features. These can only be computed through integration across the spatial and temporal context and across visual modalities, since local feature extraction is necessarily ambiguous [7, 8]. The European Project ECOVISION [9] focuses exactly on this issue, and the work described here is a central pillar of this ongoing project. In this paper, we describe a new kind of image representation in terms of local multi-modal Primitives (see fig. 1). These Primitives can be characterized by three properties:
Figure 1: A: Image sequence and frame. B: Schematic representation of the multi-modal Primitives. C: Extracted Primitives.

Multi-modality: Different domains that describe different kinds of structures in visual data are well established in human vision and computer vision. For example, a local edge can be analyzed by local feature attributes such as orientation or energy in certain frequency bands. In addition, we can distinguish between line and step-edge like structures (contrast transition). Furthermore, color can be associated to the edge. This image patch also changes in time due to ego-motion or object motion. Therefore, time-specific features such as a 2D velocity vector (optic flow) can be associated to this image patch. Finally, by using stereo, information about the 3D origin of an image patch can be computed. In this work we define local multi-modal Primitives that realize these multi-modal relations. The modalities, in addition to the usually applied semantic parameters position and orientation, are contrast transition, color and optic flow (see fig. 1).
Condensation: Integration of information requires communication between Primitives expressing spatial [4, 5] and temporal dependencies [10]. This communication necessarily comes at a certain cost, which can be reduced by limiting the amount of information transferred from one place to the other, i.e., by reducing the bandwidth. Therefore we are after a compression of data: essentially, we need less than 5% of the pixel data of a local image patch to code a Primitive that represents such a patch. However, condensation means more than compression of data, since communication and memorization require more than a mere reduction of information: we want to reduce the amount of information within an image patch while preserving perceptually relevant information. This leads to meaningful descriptors such as our attributes position, orientation, contrast transition, color and optic flow. In [4], we have also shown that these descriptors (in particular when jointly applied) allow for strong mutual prediction that can be related to classical Gestalt laws.
Adaptability: Since the interpretation of local image patches in terms of the above-mentioned attributes, as well as classifications such as 'edgeness' or 'junctionness', is necessarily ambiguous when based on local processing, stable interpretations can only be achieved through integration by making use of contextual information [7]. Therefore, all attributes of our Primitives are equipped with a confidence, essentially adaptable according to contextual information, expressing the reliability of the attribute. Furthermore, the feature attributes themselves adapt according to the context (see section 3). Adaptation occurs by means of recurrent processes in which predictions based on statistical and deterministic regularities disambiguate the locally extracted and therefore necessarily ambiguous data. The instantiation of these processes becomes feasible because of the condensed representation, which leads to a manageable amount of meaningful higher-order relations of visual events.
In section 2, we describe the Primitive attributes and their extraction. In section 3, we refer to applications of our Primitives for contextual integration, where we will especially focus on spatial-temporal disambiguation.
2 Multi-modal Primitives
In addition to the position x, we compute the following semantic attributes and associate them to our Primitives (see also fig. 1).
Intrinsic Dimension: Local patches in natural images can be associated to specific
local sub-structures, such as homogeneous patches, edges, corners, or textures. Over
the last decades, sub-domains of Computer Vision have extracted and analysed such
sub-structures.
The intrinsic dimension (see, e.g., [11]) has proven to be a suitable descriptor that
distinguishes such sub-structures. Homogeneous image patches have an intrinsic dimension of zero (i0D); edge-like structures are intrinsically 1-dimensional (i1D) while
junctions and most textures have an intrinsic dimension of two (i2D). A continuous
definition of intrinsic dimensionality has been given in [12, 13]. There, it has also
been shown that the topological structure of intrinsic dimension essentially has the
form of a triangle. This triangular structure can be used to associate 3 confidences
(ci0D , ci1D , ci2D ) to homogenous-ness, edge–ness, or junction–ness (see also [14]).
This association of confidences to visual attributes is a general design principle
in our system. These confidences as well as the attributes themselves are subject to
contextual integration via recurrent processes. Aspects with associated low confidences
have a minor influence in the recurrent processes or can be disregarded.
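As an illustration only (not the continuous formulation of [12, 13]), reading the three confidences off the triangle can be sketched as barycentric coordinates of a point inside it; the vertex placement below is an arbitrary assumption made for this sketch:

```python
import numpy as np

# Hypothetical placement of the triangle vertices for i0D, i1D, i2D
V = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])

def id_confidences(p):
    """Barycentric coordinates of point p inside the triangle,
    interpreted as (c_i0D, c_i1D, c_i2D)."""
    T = np.column_stack((V[0] - V[2], V[1] - V[2]))
    w = np.linalg.solve(T, np.asarray(p, dtype=float) - V[2])
    return np.array([w[0], w[1], 1.0 - w[0] - w[1]])
```

A point at a vertex yields full confidence for that sub-structure, while the centroid yields equal confidences for all three.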
Orientation: The local orientation associated to the image patch is described by θ.
The computation of the orientation θ is based on a rotation invariant quadrature filter,
which is derived from the concept of the monogenic signal [15]. Considered in polar
coordinates, the monogenic signal performs a split of identity [15]: it decomposes an
intrinsically one-dimensional signal into intensity information (amplitude), orientation
information, and phase information (contrast transition). These features are pointwise
mutually orthogonal. The intensity information can be interpreted as an indicator for
the likelihood of the presence of a certain structure with a certain orientation and a certain contrast transition (phase, see below). Orientation estimation is further improved
by interpolating across the orientation information of the whole image patch to achieve
a more reliable estimate.
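The Riesz-transform construction behind the monogenic signal [15] can be sketched with a minimal FFT-based implementation; this is our own illustration, not the system's filter, and it omits the band-pass filtering a proper quadrature filter would include:

```python
import numpy as np

def monogenic(f):
    """Pointwise amplitude, orientation and phase of an image f,
    via an FFT-based Riesz transform (minimal sketch)."""
    F = np.fft.fft2(f)
    u = np.fft.fftfreq(f.shape[0])[:, None]
    v = np.fft.fftfreq(f.shape[1])[None, :]
    mag = np.sqrt(u ** 2 + v ** 2)
    mag[0, 0] = 1.0  # avoid division by zero at the DC component
    r1 = np.real(np.fft.ifft2(-1j * u / mag * F))
    r2 = np.real(np.fft.ifft2(-1j * v / mag * F))
    amplitude = np.sqrt(f ** 2 + r1 ** 2 + r2 ** 2)
    orientation = np.arctan2(r2, r1)
    phase = np.arctan2(np.sqrt(r1 ** 2 + r2 ** 2), f)
    return amplitude, orientation, phase
```

For an intrinsically one-dimensional signal such as a sinusoidal grating, the split of identity is visible directly: the amplitude is constant, the orientation is the grating normal, and the phase traces the local symmetry.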
Contrast transition: The contrast transition is coded in the phase ϕ of the applied filter [15]. The phase codes the local symmetry: for example, a bright line on a dark background has phase 0, while a bright/dark edge has phase −π/2 (see fig. 2 (left)). Therefore,
Figure 2: Left: Different grey-level structures (such as line and step-edge structures) can be associated to different phase values. Right: For line-like structures the colour on the line is important, whereas it is irrelevant for step edges. Consequently, there is a different coding for these different structures: the line indicating the border of the street is first coded as two step-edges and then, with greater distance, as a line.
the line that marks the border of the street is represented as a line or as two edges, depending on the distance (see fig. 2 (right)). In the case of object boundaries, the phase represents a description of the transition between object and background.
Color: Color (cl , cm , cr ) is processed by integrating over the image patch in accordance with its edge structure (i.e., integrating separately over the left and right side of the edge, as well as over a middle strip in the case of a line structure). For the boundary edge of a moving object, the color on at least one side of the edge is expected to be stable, since (in contrast to the phase) it represents a description of the object itself. Note that the coding of color depends on the phase: in the case of a line structure (i.e., ϕ ≈ 0 or ϕ ≈ π), a color for the middle stripe is coded in addition to the colors on the left and right side.
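The left/middle/right integration can be sketched as follows; this is our own minimal illustration, and the convention that theta denotes the angle of the edge normal, as well as the one-pixel strip width, are assumptions:

```python
import numpy as np

def edge_colors(patch, theta):
    """Average color on each side of an oriented edge through the
    patch centre, plus the middle strip (illustrative sketch)."""
    h, w, _ = patch.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # signed distance of each pixel to the line through the centre,
    # with theta the angle of the edge normal (assumed convention)
    d = (xs - (w - 1) / 2) * np.cos(theta) + (ys - (h - 1) / 2) * np.sin(theta)
    cl = patch[d < -0.5].mean(axis=0)           # one side of the edge
    cm = patch[np.abs(d) <= 0.5].mean(axis=0)   # middle strip (line structures)
    cr = patch[d > 0.5].mean(axis=0)            # other side
    return cl, cm, cr
```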
Optic Flow: Optic flow is associated to a Primitive by a vector o. A large variety of algorithms exists to compute the local displacement in image sequences. In [16], these are divided into four classes: differential techniques, region-based matching, energy-based methods and phase-based techniques. After some comparison we decided to use the well-known optic flow technique [17]. This algorithm is a differential technique in which, in addition to the standard gradient constraint equation, an anisotropic smoothing term leads to better flow estimates at edges (for details see [17, 16, 18]).
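The gradient constraint equation shared by all differential techniques can be illustrated with a plain per-patch least-squares solve; note that [17] additionally imposes an oriented (anisotropic) smoothness term, which this sketch omits:

```python
import numpy as np

def flow_lsq(Ix, Iy, It):
    """Least-squares solution of the gradient constraint
    Ix*u + Iy*v + It = 0 over a patch (plain differential technique)."""
    A = np.column_stack((Ix.ravel(), Iy.ravel()))
    b = -It.ravel()
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v
```

Given noise-free derivatives that are consistent with a single displacement, the patch-wise solve recovers that displacement exactly; the anisotropic smoothing of [17] matters precisely where this assumption breaks down, e.g. at motion boundaries.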
Stereo: An image patch also describes a certain region of 3D space, and therefore 3D attributes such as a 3D position and a 3D direction can be associated to it (Primitives carrying these attributes are in the following called 3D-Primitives). In [5] we have defined a stereo similarity function that makes use of multiple modalities to enhance matching performance. We could show
Figure 3: a) Left and right image of a stereo frame. b) Front view of extracted spatial primitives (note that this representation has been extracted over multiple frames). The spatial primitives carry, besides the actual depth value computed from the disparity, also information about 3D orientation, phase and colour. c) Side view of the representation. The structure of the street as well as the two trees are clearly visible.
that it is the joint use of the different modalities that gives the best results, and we could compute weights expressing the importance of each modality [5]. The stereo information is coded by the 3D position and the 3D orientation.
Note that the result of the Primitive representation is not merely a disparity map: it is a representation in which the ‘symbols’ carry, besides the depth information, also information about other semantic aspects (see figure 3b,c).
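Schematically, such a multi-modal similarity can be sketched as a weighted combination of per-modality distances; the distance measures and weights below are illustrative placeholders, not the functions or values derived in [5]:

```python
import numpy as np

def stereo_similarity(p1, p2, weights):
    """Weighted multi-modal similarity between two Primitives given as
    dicts with keys 'theta', 'phi', 'color' (illustrative sketch)."""
    # angular distances wrapped to [0, pi], normalised to [0, 1]
    d_ori = abs(np.angle(np.exp(1j * (p1['theta'] - p2['theta'])))) / np.pi
    d_ph = abs(np.angle(np.exp(1j * (p1['phi'] - p2['phi'])))) / np.pi
    # color distance normalised by the diagonal of the unit RGB cube
    d_col = np.linalg.norm(np.array(p1['color']) - np.array(p2['color'])) / np.sqrt(3)
    d = np.array([d_ori, d_ph, d_col])
    return 1.0 - float(np.dot(weights, d) / np.sum(weights))
```

Identical Primitives score 1, and any disagreement in orientation, phase or color lowers the score in proportion to its weight.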
We end up with a parametric description of a Primitive as
E = (x, θ, ϕ, (cl , cm , cr ), o, (ci0D , ci1D , ci2D )).
In addition, there exist confidences ci , i ∈ {ϕ, cl , cm , cr , o}, that code the reliability of the specific sub-aspects and that are also subject to contextual adaptation. Furthermore, information about the underlying 3D structure is associated to each Primitive.
Although a typical image patch represented by one of our Primitives contains 3 × 12 × 12 = 432 values (3 color values for each pixel in a 12 × 12 patch), a Primitive has fewer than 20 parameters. Therefore, the Primitives condense the image information by more than 95%. This condensation is a crucial property of our Primitives that allows us to represent meaningful information in a directly accessible and compressed way.
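The stated condensation figure can be verified directly:

```python
# Condensation achieved by one Primitive: 3 color channels on a
# 12x12 patch versus fewer than 20 Primitive parameters.
raw_values = 3 * 12 * 12        # 432 values per patch
primitive_params = 20           # upper bound stated in the text
condensation = 1 - primitive_params / raw_values
print(f"{condensation:.1%}")    # about 95.4%, i.e. more than 95%
```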
Figure 4: a) Accumulation scheme for spatio-temporal integration. b) Top view of the scene in figure 3a after applying the accumulation scheme and eliminating primitives by thresholding over confidences. c) Accumulated representation projected to an image (white line segments represent reliable primitives while dark ones represent unreliable ones). d) Top view as in b) but without thresholding.
The visual modalities coded in our Primitives are processed in early stages of visual
processing. Hubel and Wiesel investigated the structure of the first stage of cortical
processing that is located in an area called ‘striate cortex’ [1, 2]. The striate cortex
is organized in a retinotopic map that has a specific repetitively occurring pattern of
substructures called hyper-columns. Hyper-columns themselves contain so-called orientation columns and blobs. However, it is not only orientation that is processed in an orientation column: the cells are also sensitive to additional attributes such as color, contrast transition, local motion and disparity (see [19]). Specific responses to junction-like structures could also be measured [20]. Therefore, it is believed that basic local feature descriptions, similar to the feature attributes coded in our Primitives, are processed in V1.
3 Integration of Contextual Information
Early visual processing does not perform local image processing only. As mentioned above, there is extensive communication within visual brain areas as well as across these areas. In this communication process, the locally necessarily ambiguous data becomes disambiguated in recurrent processes that are based on regularities in visual data. In this sense the Primitives must be understood as initiators of a disambiguation process that makes use of contextual knowledge. We now briefly point to applications of the Primitives in such processes.
We have applied this image representation in different contexts. Firstly, the
Primitives can be subject to a purely spatial contextual modification. We define links
between Primitives based on a statistical criterion in [4]. In [6], we apply this linkage
structure to improve stereo processing by demanding correspondences that preserve
links across groups in the left and right image.
Secondly, we stabilize features according to the temporal context. By making use
of the motion of an object to predict feature occurrences across frames we can stabilize
stereo processing by modifying the confidences according to the temporal context. For
example, in figure 4 the regularity Rigid Body Motion has been used to code predictions of 3D-Primitive occurrences across frames. Schematically, the method depicted in figure 4a) is applied: assuming a certain occurrence of a primitive in the first frame (4a,i) and having an estimate of the underlying 3D structure, we are able, based on the motion between frames, to predict the occurrence of the primitive in the next frame. In case our prediction becomes verified, we increase the confidence associated to the primitive (see figure 4b). After a couple of iterations the reliable estimates can be easily detected by their associated confidences. Note that we can also improve the estimate of the primitive's parameters by interpolating between our prediction and the actually found feature, as schematically shown in figure 4a.
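A minimal sketch of such an accumulation step, assuming a simple linear update rule (the text describes the scheme only qualitatively, so the rule and gain below are our own placeholders):

```python
def accumulate(confidence, verified, gain=0.3):
    """One recurrent accumulation step (hypothetical update rule):
    move the confidence towards 1 when the predicted primitive is
    found again in the next frame, towards 0 otherwise."""
    target = 1.0 if verified else 0.0
    return confidence + gain * (target - confidence)

def interpolate(predicted, measured, weight=0.5):
    """Blend a predicted and a measured attribute value
    (e.g. a coordinate of the 3D position)."""
    return (1 - weight) * predicted + weight * measured
```

Repeatedly verified Primitives then accumulate high confidences while spurious ones decay, which is exactly the thresholding behaviour visible in figure 4b versus 4d.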
Wrong 3D-Primitives lead to predictions that cannot be verified in consecutive frames, which then results in a lowering of the associated confidences (shown in dark colour in figure 4c). After a couple of frames, the 3D-Primitives caused by wrong correspondences can be easily detected by their low confidences. While in figure 4d all hypotheses in the process are shown (in a top view of the scene), in figure 4b only Primitives that have proven to be reliable are displayed. Accordingly, in figure 4b, the general structure of the scene is clearly visible.
4 Conclusion
We have introduced a novel kind of image representation in terms of visual Primitives. These Primitives are multi-modal and give a condensed and meaningful description of a scene. Our Primitives are used as the first stage in an artificial visual system; the confidences associated to the Primitive attributes adapt according to the spatial and temporal context and in this way stabilize the locally unreliable feature extraction.
References
[1] D.H. Hubel and T.N. Wiesel, “Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex,” J. Physiology, vol. 160, pp. 106–154,
1962.
[2] D.H. Hubel and T.N. Wiesel, “Anatomical demonstration of columns in the monkey striate cortex,” Nature, vol. 221, pp. 747–750, 1969.
[3] N. Krüger, M. Lappe, and F. Wörgötter, “Biologically motivated multi-modal
processing of visual primitives,” The Interdisciplinary Journal of Artificial Intelligence and the Simulation of Behaviour, vol. 1, no. 5, 2004.
[4] N. Krüger and F. Wörgötter, “Multi modal estimation of collinearity and parallelism in natural image sequences,” Network: Computation in Neural Systems,
vol. 13, pp. 553–576, 2002.
[5] N. Krüger and M. Felsberg, “An explicit and compact coding of geometric and
structural information applied to stereo matching,” Pattern Recognition Letters,
vol. 25, no. 8, pp. 849–863, 2004.
[6] N. Pugeault, F. Wörgötter, and N. Krüger, “A non-local stereo similarity based
on collinear groups,” Fourth International ICSC Symposium on ENGINEERING
OF INTELLIGENT SYSTEMS, 2004.
[7] Y. Aloimonos and D. Shulman, Integration of Visual Modules — An extension of
the Marr Paradigm, Academic Press, London, 1989.
[8] N. Krüger and F. Wörgötter, “Statistical and deterministic regularities: Utilisation
of motion and grouping in biological and artificial visual systems,” Advances in
Imaging and Electron Physics, vol. 131, 2004.
[9] ECOVISION, “Artificial visual systems based on early-cognitive cortical processing (EU–Project),” http://www.pspc.dibe.unige.it/ecovision/project.html, 2003.
[10] N. Krüger, M. Ackermann, and G. Sommer, “Accumulation of object representations utilizing interaction of robot action and perception,” Knowledge Based
Systems, vol. 15, pp. 111–118, 2002.
[11] C. Zetzsche and E. Barth, “Fundamental limits of linear filters in the visual processing of two dimensional signals,” Vision Research, vol. 30, 1990.
[12] N. Krüger and M. Felsberg, “A continuous formulation of intrinsic dimension,”
Proceedings of the British Machine Vision Conference, 2003.
[13] M. Felsberg and N. Krüger, “A probabilistic definition of intrinsic dimensionality
for images,” Pattern Recognition, 24th DAGM Symposium, 2003.
[14] S. Kalkan, D. Calow, M. Felsberg, F. Wörgötter, M. Lappe, and N. Krüger, “Optic
flow statistics and intrinsic dimensionality,” Proceedings of the ’Early Cognitive
Vision Workshop’ on the Isle of Skye (Scotland), 2004.
[15] M. Felsberg and G. Sommer, “The monogenic signal,” IEEE Transactions on
Signal Processing, vol. 49, no. 12, pp. 3136–3144, December 2001.
[16] J.L. Barron, D.J. Fleet, and S.S. Beauchemin, “Performance of optical flow techniques,” International Journal of Computer Vision, vol. 12, no. 1, pp. 43–77,
1994.
[17] H.-H. Nagel and W. Enkelmann, “An investigation of smoothness constraints
for the estimation of displacement vector fields from image sequences,” IEEE
Transactions on Pattern Analysis and Machine Intelligence, vol. 8, pp. 565–593,
1986.
[18] T. Brox, A. Bruhn, N. Papenberg, and J. Weickert, “High accuracy optical flow
estimates based on a theory for warping,” Proceedings of the ECCV 2004, vol. ?,
pp. ??, 2004.
[19] R.H. Wurtz and E.R. Kandel, “Perception of motion, depth and form,” in Principles of Neural Science (4th edition), E.R. Kandel, J.H. Schwartz, and T.M. Jessell, Eds., pp. 548–571. 2000.
[20] I.A. Shevelev, N.A. Lazareva, A.S. Tikhomirov, and G.A. Sharev, “Sensitivity to
cross–like figures in the cat striate neurons,” Neuroscience, vol. 61, pp. 965–973,
1995.