Proceedings of the 13th Danish Conference on Pattern Recognition and Image Analysis
Ed. Søren I. Olsen
Technical Report no. 2004/10, ISSN: 0107-8283
DIKU, University of Copenhagen, Universitetsparken 1, DK-2100 Copenhagen, Denmark

TABLE OF CONTENTS

Color Signal Processing (Professor Reiner Lenz, Linköping University) ... 3
Image Relighting: Getting the Sun to Set in an Image Taken at Noon (Claus B. Madsen, Rune Laursen) ... 13
Using Mixtures of Gaussians to Compare Approaches to Signal Separation (Kaare B. Petersen) ... 21
Stochastic differential equations in image warping (Bo Markussen) ... 31
Probabilistic Model-based Background Subtraction (J. Anderson, T. Prehn, V. Krüger, A. Elgammal) ... 38
Some Transitions of Extrema-Based Multi-Scale Singularity Trees (K. Somchaipeng, J. Sporring, S. Kreiborg, P. Johansen) ... 48
A Test Statistic in the Complex Wishart Distribution and its Application to Change Detection in Polarimetric SAR Data (Knut Conradsen, Allan A. Nielsen, Jesper Schou, Henning Skriver) ... 56
Using the LZW to Detect Primitives (Lars Reng) ... 72
Medical image sequence coding (Mo Wu, Søren Forchhammer) ... 77
Integrity Improvement in Classically Deformable Solids (Micky K. Christensen, Anders Fleron) ... 87
The Thin Shell Tetrahedral Mesh (Kenny Erleben, Henrik Dohlmann) ... 94
Talking Faces - a State Space Approach (Tue Lehn-Schiøler) ... 103
Live Interpretation of Conductors' Beat Patterns (Declan Murphy) ... 111
Testing for difference between two groups of functional neuroimaging experiments (Finn Å. Nielsen, Andrew C. N. Chen, Lars K. Hansen) ... 121
An image based system to automatically and objectively score the degree of redness and scaling in psoriasis lesions (David Delgado, Bjarne Ersbøll, Jens M. Carstensen) ... 130
Survey and Assessment of Advanced Feature Extraction Techniques and Tools for OE Applications (Mikael K. Sørensen, Michael S. Rasmussen, Henning Skriver, Peter Johansen, Arthur Pece, Jesper H. Thygesen) ... 138
Automatic change detection for validation of digital map databases (Brian P. Olsen, Thomas Knudsen) ... 148
Spatial-temporal Disambiguation of Multi-modal Descriptors (Norbert Krüger, Florentin Wörgötter) ... 154

Color Signal Processing

Reiner Lenz
Department of Science and Technology, Linköping University
SE-60174 Norrköping, Sweden
email: [email protected]

Abstract

In this paper we describe some of our recent investigations in the field of color signal processing. We describe the application of group theoretical methods and illustrate them with some results from low-level filter design, the statistical properties of color signal spaces, and the construction of invariants.

1 Introduction

Everybody knows what color is. But as a scientific concept color is not at all well-defined. This is perhaps natural, since "color" has been of interest in such different scientific disciplines as art, physics, chemistry, biology, mathematics, psychology, physiology and technology, to name only a few. We are very well aware of this, but here we choose to restrict ourselves to a very narrow definition of what color is. In the first part we identify color simply with the measurement vectors that are produced by an (RGB or multi-channel) camera. In the rest of the paper we define color (or rather the color signal) as a function of wavelength that has non-negative function values. This signal processing approach completely ignores all perceptual aspects of the problem (color scientists would thus probably argue that this paper is not about color at all, since in their view color is in the brain of the observer). Group theory, on the other hand, originates partly in the study of symmetries. Roughly, one can say that symmetry is a property that is preserved under the application of a set of rules. A simple example: a circle is a geometric figure that is preserved under all 2-D rotations around the center of the circle. Or one could also say that the rotations are the transformations that preserve the circles.
Group theoretical methods have had an enormous impact on theoretical physics and mathematics, and the purpose of this paper is to show how they can be used to solve problems involving color. We will not go into technical details but try to give a flavor of the wide range of color related problems that can be investigated with group theoretical tools. We start in the first section with a description of how the (geometrical) symmetries of the sampling grid and the (permutational) symmetries of the color channels can be used to design low-level filters (corresponding to receptive fields in human color vision). Then we show why spaces of color signals have a conical structure, and we illustrate this with an investigation of databases of daylight spectra. The final topic is the application of the Lie-theory of differential equations to color problems. We will show that transformation groups define systems of differential equations in a natural way. Solving these differential equations can, for example, be used to construct invariants. Using the Lie-theory of differential equations gives a number of advantages. One of them is the availability of symbolic packages like Maple that can solve these problems completely automatically.

2 PCA without data - Harmonic Analysis of RGB images

Group theoretically defined transforms are important tools in many low-level digital signal processing applications. The most widely used examples are the DFT and the FFT, but wavelet transforms can also be understood in terms of group theory. The example we describe here is the transform based on the dihedral groups [14]. The case that is of greatest use in applications concerns RGB images on a square grid. Other important cases are RGB vectors on hexagonal grids and four-channel sensors on square or hexagonal grids. Potential applications are digital signal processors in digital cameras with varying sampling and spectral resolution properties. Consider a square sampling grid of the plane.
The transformations that map this grid into itself form a group, the dihedral group D(4). It consists of all rotations with angle 90k degrees (k = 0, 1, 2, 3) and all reflections around the symmetry axes of a square. It has eight elements. The dihedral group D(6) consists of all transformations that preserve the hexagonal sampling grid. The group of all permutations of three elements is S(3) and consists of six elements. The direct product G = D(4) × S(3) consists of all pairs g = (d, π) where d is an element in D(4) and π ∈ S(3). This product forms a group that has 48 elements. The group elements g operate on RGB vectors defined on square grids by application of the dihedral part d to the grid and the permutation part π to the spectral bands. A typical operation would be a 90-degree rotation combined with a flipping of the R and B channels. Since these transformations are linear we can describe them (given a basis) by a matrix. We will also use g to denote this matrix. The same symbol will thus be used for the group element, the transformation and the matrix. What is meant should be clear from the context. The second concept of interest in our context is Principal Component Analysis as Least Mean Squared Error (LMSE) approximation. We start with a stochastic process of signal vectors s of dimension N. In our case this could be RGB vectors on an n × n window collected in an N = 3n²-dimensional signal vector s. Now change the basis in the signal space and project all signal vectors s onto the first M basis vectors in the new system (with M << N), thus creating a new signal ŝ. Since the information on s contained in the last N − M components is lost by the projection, this will result in an error for each signal. Selecting the new basis such that the mean approximation error is minimal (in the L2-sense) is known as Principal Component Analysis (PCA).
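As a concrete sketch (our own illustration, not code from the paper), the 48 elements of G = D(4) × S(3) can be realized as permutation matrices acting on a channel-interleaved signal vector; the grid size n = 4 and the interleaving convention are assumptions of this sketch:

```python
import numpy as np
from itertools import permutations

def dihedral_actions(n):
    """The 8 elements of D(4) on an n-by-n grid: the four rotations
    by 90k degrees and their mirror images."""
    idx = np.arange(n * n).reshape(n, n)
    actions = []
    for k in range(4):
        r = np.rot90(idx, k)
        actions.append(r)
        actions.append(np.fliplr(r))
    return actions

def group_matrices(n=4):
    """Permutation matrices for G = D(4) x S(3) acting on a
    3*n*n-dimensional signal vector (n*n grid, 3 color channels,
    channel-interleaved: index = 3*position + channel)."""
    N = 3 * n * n
    mats = []
    for d in dihedral_actions(n):
        for pi in permutations(range(3)):
            g = np.zeros((N, N))
            for p_new, p_old in enumerate(d.ravel()):
                for c_new, c_old in enumerate(pi):
                    g[3 * p_new + c_new, 3 * p_old + c_old] = 1.0
            mats.append(g)
    return mats

mats = group_matrices(4)
print(len(mats))  # 48 group elements
g = mats[7]
print(np.allclose(g.T @ g, np.eye(g.shape[0])))  # transpose = inverse
```

Each matrix has exactly one unit entry per row and column, so the transpose-equals-inverse property used below holds by construction.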
It can be shown that the basis to choose consists of the eigenvectors of the correlation matrix C = E(ss^T) of the original stochastic process belonging to the M largest eigenvalues (E(·) denotes the expectation operator). PCA thus requires, in general, two operations: (1) compute the correlation matrix C of the process and (2) compute the eigenvectors of the correlation matrix C. In the following we will show under what conditions we can make general statements about the eigenvectors without actually computing the correlation matrix and the eigenvectors. We start by selecting a subset of points on the given grid that is closed under the operations of the dihedral transformations. At each location point we have a measurement vector. Given a group element g = (d, π) we apply the geometric transformation d to the grid points and the permutation π to the vector components. The resulting operation is described by a permutation matrix g that has exactly one element of value one in each column and row. These permutation matrices have the property that the transpose is equal to the inverse: g^T = g^{-1}. Now assume that each such transformed signal gs has the same probability as the original signal s. Then we get for the correlation matrix the property that:

   C = E(ss^T) = E((gs)(gs)^T) = g E(ss^T) g^T = g C g^T = g E(ss^T) g^{-1} = g C g^{-1}

where the equality E(ss^T) = E((gs)(gs)^T) holds because the transformation g merely results in a reordering of the original set of signals. As a result we find that each group element g delivers one constraint C = g C g^{-1} for the correlation matrix. Such a matrix C is called a G-symmetric matrix.
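One way to see the constraint C = gCg^{-1} in action is to average an arbitrary symmetric matrix over the group (a Reynolds average); the result satisfies every constraint at once. A minimal sketch with D(4) alone on a 2 × 2 grid and synthetic data (our own construction, not the paper's experiment):

```python
import numpy as np

# D(4) acting on a 2x2 grid with positions 0 1 / 2 3
idx = np.arange(4).reshape(2, 2)
perms = []
for k in range(4):
    r = np.rot90(idx, k)
    perms.append(r.ravel())
    perms.append(np.fliplr(r).ravel())

def perm_matrix(p):
    g = np.zeros((4, 4))
    g[np.arange(4), p] = 1.0   # (g s)[i] = s[p[i]]
    return g

G = [perm_matrix(p) for p in perms]   # 8 group elements

# If gs is as likely as s, the correlation matrix equals its own
# group average; averaging enforces the constraint for any input.
rng = np.random.default_rng(0)
A = rng.random((4, 4))
C = A @ A.T                            # a symmetric "correlation" matrix
C_sym = sum(g @ C @ g.T for g in G) / len(G)

# C_sym satisfies C = g C g^{-1} for every g in G:
print(all(np.allclose(g @ C_sym @ g.T, C_sym) for g in G))  # True
```

The invariance follows because left- and right-multiplying the average by a fixed group element only permutes the terms of the sum.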
The properties of G-symmetric matrices are well known from the theory of group representations, and the main results are the following:

• There is one matrix M that block-diagonalizes all G-symmetric matrices C
• The size of the blocks in the diagonal depends only on the group
• The matrix M can be generated automatically

The detailed description of these facts can be found in [14] and the references mentioned there.

2.1 Experimental results

The implementation of the block-diagonalization is similar to the case described in [14] for the dihedral groups. The similarity of the correlation matrix C and its block-diagonalization M^T C M depends of course on the degree to which the stochastic process really is G-symmetric. To test the usefulness of the method we did the following experiment: from a large image database containing more than 600 000 images we selected 10 000 random images, and in each image randomly 2 patches of size 16 by 16 pixels. The signal vector thus had dimension 16 × 16 × 3 = 768. Applying the general theory we find that in the ideal case the transformed correlation matrix M^T C M has the block structure listed in Table 1. Note that this result holds for all correlation matrices with this type of symmetry of this size.

Representation No.   Dimension   Block size
 1                   1            36
 2                   1             0
 3                   2            72
 4                   1            28
 5                   1             0
 6                   2            56
 7                   1            36
 8                   1             0
 9                   2            72
10                   1            28
11                   1             0
12                   2            56
13                   2           128
14                   2             0
15                   4           256

Table 1. Block structure of the transformed correlation matrix

Eigenvector Number   Original    Block-Diagonal Approximation
 1                   0.915898    0.914900
 2                   0.945646    0.945642
 3                   0.954611    0.954600
 4                   0.963074    0.963057
 5                   0.967766    0.967745
 6                   0.970658    0.970605
 7                   0.973146    0.973117
 8                   0.975438    0.975403
 9                   0.976639    0.976601
10                   0.977770    0.977733

Table 2. Trace(C) compared with the accumulated sum of eigenvalues
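In our reading, the quantity reported in Table 2 is the accumulated sum of the leading eigenvalues divided by Trace(C). For any correlation matrix this ratio is monotone and tends to 1, which a small synthetic check (our own illustration) makes explicit:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.random((1000, 20))                  # synthetic signal vectors
C = X.T @ X / len(X)                        # their correlation matrix

vals = np.linalg.eigvalsh(C)[::-1]          # eigenvalues, descending
ratio = np.cumsum(vals) / np.trace(C)       # the quantity in Table 2

print(np.all(np.diff(ratio) >= 0))          # True: monotone increasing
print(np.isclose(ratio[-1], 1.0))           # True: trace = sum of eigenvalues
```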
Figure 2.1 shows the original correlation matrix, the absolute value of the block-diagonal correlation matrix M^T C M, and the absolute value of the block-diagonal correlation matrix M^T C M where we have masked out the upper left corner to visualize the remaining entries.

[Figure 2.1: (a) Original correlation matrix C, (b) Block-diagonal matrix M^T C M, (c) Masked block-diagonal matrix M^T C M]

One way to evaluate the loss of information introduced by the block-diagonalization is to compare the eigenvalues of the original principal components and the block-diagonal approximations. In Table 2 the ratios of the accumulated sums of the eigenvalues to the trace are listed for the original matrix and the block-diagonal approximation. Further inspection shows that each of the blocks has a special transformation property under the elements of the transformation group, leading to filter functions of a certain type. Some filters are thus invariant under all dihedral operations (spatially homogeneous) while others are related to edge-type filters etc. Finally we want to point out that the block-diagonal filters allow more efficient implementations due to their symmetry properties, and that, given the domain, they can be automatically derived using symbolic mathematical software like Maple.

3 PCA and the geometry of the space of color signals

One attempt to understand the properties of biological systems (and to build technological systems) relies on the hypothesis that successful systems are adapted to the structure of the space of signals they are analyzing (see [1, 2, 4, 18] for a few examples and [19] for a collection of articles on natural stimulus statistics). If this is true, then it is of interest to investigate properties of signal spaces that are often analyzed by natural or artificial systems. Here we start with the observation that in many interesting cases the signals of interest can only assume non-negative values. A typical example are illuminant spectra.
Here the signals s are functions of the wavelength variable λ, and s(λ) is the number of photons of wavelength λ. By definition s(λ) ≥ 0, ∀λ. Another example from multi-channel color processing are reflectance spectra r, where r(λ) describes the probability that a photon of wavelength λ will be reflected from a point. When an illumination spectrum s hits an object point with reflectance spectrum r then (in the simplest model) the spectrum that is reflected from that scene point is given by c = s · r, and c is often called a color signal. The color signal, too, is by definition a non-negative function. The main result we describe here is the proof that all components of the first eigenfunction of a stochastic process of spectra have the same sign. It can therefore be chosen to have only non-negative entries. This has been observed in many empirical investigations of databases of spectra, but to our knowledge it has never been pointed out that this follows from the Perron-Frobenius theory of non-negative matrices and its generalization, the Krein-Rutman theorem. We will then illustrate a few consequences of this fact for databases of daylight spectra. We will show that it follows that these spaces of daylight spectra have a conical geometry and that the group of Lorentz transformations provides a rich toolbox to investigate problems related to changes of daylight properties.

3.1 Perron-Frobenius and Krein-Rutman Theory

For finite dimensional signal vectors the non-negativity of the first eigenvector follows easily from the Perron-Frobenius theory of non-negative matrices. We will therefore give a brief overview and refer the interested reader to Chapter 13 in [6] for more details.

Definition 1
1. A matrix C is non-negative if all elements are non-negative.
2. A matrix C is positive if all elements are positive.
3. A matrix C is reducible if there is a permutation matrix P such that

      P^{-1} C P = P^T C P = ( C_1    0
                               C_21   C_2 )                           (1)

A matrix that is not reducible is called irreducible.
Note the difference between non-negative (positive) matrices and non-negative (positive) definite matrices. The first are defined by properties of the elements of the matrix, whereas the latter are defined via bilinear products! Note also that for positive (non-negative) matrices we require the transformation matrix P to be a permutation matrix. For a permutation matrix P the transpose P^T of the matrix is its inverse P^{-1}. One result of the Perron-Frobenius theory of non-negative matrices is the following theorem of Perron (see [6]):

Theorem 1 [Perron] A positive matrix has a real, maximal, positive eigenvalue r. This eigenvalue is a simple root of the characteristic equation and the corresponding eigenvector has only positive elements.

In the context of stochastic processes we will also need the following result about the normal form of non-negative matrices:

Theorem 2 For a non-negative matrix C we can find a permutation matrix P such that

      C̃ = P^T C P = ( C_1        0          ...  0          0           ...  0
                       0          C_2        ...  0          0           ...  0
                       ...        ...        ...  ...        ...         ...  ...
                       0          0          ...  C_g        0           ...  0
                       C_{g+1,1}  C_{g+1,2}  ...  C_{g+1,g}  C_{g+1}     ...  0
                       ...        ...        ...  ...        ...         ...  ...
                       C_{s1}     C_{s2}     ...  C_{s,g}    C_{s,g+1}   ...  C_s )      (2)

is the normal form of C. The matrices C_i are irreducible, and for each i > g at least one of the matrices C_{i,1}, C_{i,2}, ..., C_{i,i-1} is different from zero.

For stochastic processes of color signals we can now argue as follows: color signals have non-negative values, and the correlation matrix C is therefore non-negative. We can find a permutation matrix to bring C into normal form, and since C is also symmetric we find that C is block-diagonal with irreducible blocks. A stochastic process is thus the direct sum of uncorrelated subprocesses where each of the subprocesses has a strictly positive first eigenvector. In the following we will therefore only consider processes with strictly positive first eigenvectors.
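The sign property is easy to check numerically. The sketch below uses random non-negative "spectra" (purely synthetic data, our own illustration): the resulting correlation matrix is positive, so by Perron's theorem the eigenvector of the largest eigenvalue has components of a single sign:

```python
import numpy as np

rng = np.random.default_rng(1)

# 500 synthetic non-negative signals sampled at 31 wavelength bands
spectra = rng.random((500, 31))
C = spectra.T @ spectra / len(spectra)   # elementwise positive correlation matrix

vals, vecs = np.linalg.eigh(C)           # eigenvalues in ascending order
v1 = vecs[:, -1]                         # eigenvector of the largest eigenvalue

# Perron's theorem: all components share one sign, so v1 can be
# chosen to have only non-negative entries.
print(np.all(v1 > 0) or np.all(v1 < 0))  # True
```

Note that `eigh` fixes the eigenvector only up to sign, hence the test against both orientations.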
Many of the results of the Perron-Frobenius theory also hold in the more general case where the finite signals are replaced by elements in a Hilbert space and the correlation matrix by an operator. These results are generally known as Krein-Rutman theory (for details see [3], pp. 2129).

3.2 Geometry of Spaces of Daylight Spectra

Understanding the properties of spectral distributions of daylight and their evolution at different sites with varying atmospheric conditions is an active research area. Many efforts to measure and model daylight spectra originate in the 60s [10, 11, 17, 23, 24]. The SMARTS model [9] is one example of recent tools to compute clear-sky spectral irradiance from a description of atmospheric conditions, time and solar geometry. In the following we consider spaces of daylight spectra as subsets of vector spaces. We use PCA to approximate the spectra with linear combinations of three principal components. We saw above that the first eigenvector is positive. Therefore the other eigenvectors must have negative contributions somewhere (because of the orthogonality of the eigenvectors). From this we conclude that (given a fixed value for the coefficient of the first eigenvector) the coefficients of the higher order eigenvectors must be bounded: otherwise some negative contribution of such an eigenvector would be larger than the corresponding positive contribution from the first eigenvector, and the resulting linear combination would be negative, which is impossible since the spectra are non-negative everywhere. The daylight spectra are thus located in a solid cone-like region of the vector space. This allows us to introduce the conical structure of the coordinate vector space of spectra. For all spectra in the set we have (after a suitable re-scaling of the basis vectors):

      σ0² − σ1² − σ2² > 0;   σ0 ≥ 0                                   (3)

where σk is the k-th coefficient in the PCA expansion.
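A small synthetic experiment (random non-negative spectra, our own construction) illustrates the argument behind Eq. (3): σ0 is positive for every spectrum, and after re-scaling the first basis vector the cone condition holds for the whole set:

```python
import numpy as np

rng = np.random.default_rng(2)
spectra = rng.random((500, 31))               # synthetic non-negative spectra
C = spectra.T @ spectra / len(spectra)

vals, vecs = np.linalg.eigh(C)
basis = vecs[:, ::-1][:, :3].copy()           # first three eigenvectors
if basis[:, 0].sum() < 0:                     # fix the sign of the first one
    basis[:, 0] *= -1

sigma0, sigma1, sigma2 = (spectra @ basis).T  # PCA coefficients per spectrum
print(np.all(sigma0 > 0))                     # True: all spectra on one side

# Re-scaling the first basis vector by 1/alpha multiplies sigma_0 by
# alpha; choose alpha large enough that Eq. (3) holds for every spectrum.
alpha = 1.1 * np.max(np.hypot(sigma1, sigma2) / sigma0)
sigma0 = alpha * sigma0
print(np.all(sigma0**2 - sigma1**2 - sigma2**2 > 0))   # True

# The ratios sigma_1/sigma_0, sigma_2/sigma_0 then lie in the unit disk.
x, y = sigma1 / sigma0, sigma2 / sigma0
print(np.all(x**2 + y**2 < 1))                # True
```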
A conical projection in the space of spectral coordinates is defined as the following perspective projection:

      x = σ1/σ0;   y = σ2/σ0                                          (4)

A spectrum in this space is thus characterized by σ0 (related to the intensity) and the coordinate z = x + iy (related to the chromaticity). The points z = x + iy lie on the open unit disk of the complex plane. In the following we will use the terms "intensity" and "chromaticity" in the sense defined here. They are convenient terms but are, of course, different from their meaning in traditional color science. We characterize a sequence of spectra by its sequence of projected points on the open unit disk. It is known that the open unit disk is a two dimensional Poincaré model of hyperbolic geometry, and its isometry transformation group is SU(1,1), the group consisting of 2 × 2 complex matrices of the following form:

      M = ( a   b
            b̄   ā ) :  a, b ∈ C, |a|² − |b|² = 1                      (5)

An element M ∈ SU(1,1) acts as the fractional linear transformation on a point z:

      w = M⟨z⟩ = (az + b) / (b̄z + ā)                                  (6)

Special subgroups of SU(1,1), known as one-parameter subgroups M(t), are given by group elements that are functions of the real value t, having the following properties:

      M(t1 + t2) = M(t1)M(t2);  ∀t1, t2 ∈ R                           (7)
      M(0) = E = identity matrix

For a one-parameter subgroup M(t) we introduce its infinitesimal generator, represented by the matrix X:

      X = dM(t)/dt |_{t=0} = lim_{t→0} (M(t) − E)/t                   (8)

The infinitesimal matrices X representing one-parameter subgroups M(t) ∈ SU(1,1) form the Lie algebra su(1,1), which consists of 2 × 2 complex matrices of the form:

      ( iγ   β
        β̄   −iγ ) :  γ ∈ R, β ∈ C                                    (9)

The Lie algebra su(1,1) forms a three-dimensional vector space [20], spanned by the basis (J_k):

      J_1 = ( 0  1       J_2 = ( 0  −i       J_3 = ( i   0
              1  0 )            i    0 )             0  −i )          (10)

Each infinitesimal matrix X ∈ su(1,1) of a one-parameter subgroup M(t) ∈ SU(1,1) thus has a coordinate vector specified by the three real numbers ξ1, ξ2 and ξ3: X = ξ1 J_1 + ξ2 J_2 + ξ3 J_3.
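For matrices in su(1,1) the exponential has a closed form, since X² = (|β|² − γ²)E, so the series splits into even and odd parts. The sketch below (with hypothetical coordinates ξ chosen only for illustration) verifies that exp(tX) has the SU(1,1) form of Eq. (5), satisfies the one-parameter property of Eq. (7), and maps the open unit disk into itself:

```python
import numpy as np

J1 = np.array([[0, 1], [1, 0]], dtype=complex)
J2 = np.array([[0, -1j], [1j, 0]])
J3 = np.array([[1j, 0], [0, -1j]])

def exp_su11(X, t):
    """Matrix exponential exp(tX) for X in su(1,1).
    Uses X^2 = (|beta|^2 - gamma^2) * identity, so the series closes."""
    kappa = complex(X[0, 1] * X[1, 0] + X[0, 0] ** 2)  # = |beta|^2 - gamma^2
    mu = np.sqrt(kappa)
    if abs(mu) < 1e-12:
        return np.eye(2, dtype=complex) + t * X
    return np.cosh(mu * t) * np.eye(2) + (np.sinh(mu * t) / mu) * X

# generator with hypothetical coordinates xi = (0.3, -0.2, 0.5)
X = 0.3 * J1 - 0.2 * J2 + 0.5 * J3
M = exp_su11(X, 1.7)

a, b = M[0, 0], M[0, 1]
print(np.allclose(M, [[a, b], [np.conj(b), np.conj(a)]]))    # SU(1,1) form, Eq. (5)
print(np.isclose(abs(a) ** 2 - abs(b) ** 2, 1.0))            # |a|^2 - |b|^2 = 1
print(np.allclose(exp_su11(X, 0.4) @ exp_su11(X, 1.3), M))   # Eq. (7)

z = 0.25 + 0.1j                       # a point on the open unit disk
w = (a * z + b) / (np.conj(b) * z + np.conj(a))
print(abs(w) < 1)                     # the disk is preserved
```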
Given a start point z(0) on the unit disk together with a one-parameter subgroup M(t), we define an SU(1,1) curve as the following function of t:

      z(t) = M(t)⟨z(0)⟩ = e^{tX}⟨z(0)⟩;  t ∈ R, z(t) on the unit disk          (11)

The SU(1,1) curves are straight lines in the three dimensional Lie algebra space su(1,1); the estimation of input chromaticity sequences by SU(1,1) curves can therefore be considered a linearization. The Lie algebra su(1,1) provides a powerful tool to linearize problems involving chromaticity sequences. We developed several methods [15] to compute the group parameters of SU(1,1) curves from input data sequences. These methods are used to study the properties of sequences of daylight spectra.

3.3 An Illustration

We investigated two different types of daylight spectra:

• A database of 21871 daylight spectra measured at the same location (Norrköping, Sweden) from June 16th, 1992 to July 7th, 1993; between 5:10 and 19:01 (local time); in 5 nm steps from 380 nm to 780 nm, and
• Daylight spectra sequences generated by the simulation program SMARTS2 [9].

The SMARTS model accepts as its input the Sun position and atmospheric parameters including: Ångström beta, precipitable water, ozone, and surface pressure. The wavelength range of the generated spectra was 380 nm to 780 nm in 1 nm steps. In a series of experiments, sequences of spectra were generated by changing a single parameter over its allowed range while keeping the others fixed at the default values. In another experiment a large set of simulated daylight spectra with all the feasible combinations of parameters was also created to sample the investigated space of daylight spectra. Figure 3.3 shows the results of the SU(1,1) estimations of chromaticity sequences generated by SMARTS2. In one experiment we generated spectra with different values of the Ångström beta and precipitable water parameters. They can be seen as the images of a grid in the (Ångström beta, precipitable water) plane under the SMARTS2 mapping.
The chromaticity sequences corresponding to single-parameter (beta or water) changes are then created by grouping all chromaticity points having the same value of the other parameter. We keep the value of the precipitable water parameter constant and estimate the one-parameter group as the Ångström beta varies. The group coordinates characterizing these estimated SU(1,1) curves are not constant but form a linear function of the setting of the other parameter. In Figure 3.3 we show the computed group coordinates of the estimated SU(1,1) curves, varying as functions of the other parameter's settings.

[Figure 3.3: (d) Group coordinates from varying water values, (e) SU(1,1) curve estimating changing Sun position]

In another example we generated different daylight spectra by varying the zenith angle with all other parameters set to default values. This corresponds to a sequence of daylight spectra with different positions of the Sun. It can also be considered a time sequence of daylight spectra. The same SU(1,1) curve can be used to estimate both sequences of equally spaced zenith angle changes and equally spaced time changes, but the group parameters t describing the locations of points on the curve are different for each sequence. In all the experiments described above we found that SU(1,1) curves provide good estimates for all SMARTS2 spectra sequences generated by changing any single parameter. The method also provides good approximations when the remaining parameters are set to other reasonable values different from the default values.

4 Lie Theory, Differential equations and Invariants

One advantage of the framework described above is the connection to the Lie-theory of differential equations. This connection is provided by the one-parameter groups defined in the last section. Assume we have a one-parameter group M(t) describing the chromaticity changes of an illumination source.
One example could be a sunrise or sunset described by the time-varying sequence of spectra in the experiment mentioned in the last section. Given a chromaticity parameter z(0), this group generates a sequence of time-varying chromaticity parameters z(t) = M(t)⟨z(0)⟩ as defined in Eq. (11). Choosing an intensity parameter, one can reconstruct the coefficients in the PCA decomposition, and by computing the linear combination with the principal components we obtain a sequence of time-varying illumination spectra l(t, λ), where λ is the wavelength parameter and t is the group parameter. If we now model the camera as a linear system described by the matrix K, then the time-changing pixel vectors p(t) for a given object point are given by p(t) = K l(t, λ). Since the chromaticity transformations form a one-parameter group it is possible to define the Lie-derivative operator d/dt with respect to the group parameter t. This construction can then be used to investigate properties of measurement sequences like the time-varying pixel values recorded by one channel of a color camera. Another application where these tools can be used is the construction of invariants. With the help of the Lie-theory of differential equations it is possible to give an overview over all invariants and to construct these invariants automatically using programs like Maple. We will illustrate this with the construction of invariants for the dichromatic reflection model.

Name                  Matrix group                 Lie-algebra (infinitesimal element)
Rotation              (  cos(τ)  sin(τ)            X_1 = (  0  1
                        −sin(τ)  cos(τ) )                  −1  0 )
Uniform scaling       ( e^σ   0                    X_2 = ( 1  0
                        0     e^σ )                        0  1 )
Non-uniform scaling   ( e^σ   0                    X_3 = ( 1  0
                        0     e^{−σ} )                     0  −1 )
Shear                 ( 1  α                       X_4 = ( 0  1
                        0  1 )                             0  0 )

Table 3. Subgroups of the general linear group of 2 × 2 matrices

In many applications, for example in color image segmentation, color object recognition etc., the main interest is the physical content of the objects in the scene.
Deriving features which are robust to image capturing conditions such as illumination changes, highlights, shadows and geometry changes is a crucial step in such applications. The interaction between light and the objects in the scene is very complicated, and sophisticated models such as radiative transfer theory or Monte Carlo simulation methods are needed to provide realistic results. The complexity of these models (and the fact that many key components are unknown) prevents their application in object recognition. Previous studies of color invariance are therefore mostly based on simpler semi-empirical models such as the Dichromatic Reflection Model [21] or the model proposed by Kubelka and Munk [13], together with many additional assumptions [5, 7, 8, 12, 22]. We already mentioned that it is very difficult to describe in detail what happens when light strikes a surface: some of the light will be reflected at the interface, producing interface reflection, while another part will transfer into the medium, undergoing absorption, scattering and emission. The Dichromatic Reflection Model (see [21]) describes the relation between the incoming light and the reflected light as a mixture of the light reflected at the material surface and the light reflected from the material body. The model assumes that the light L(x, λ) (of wavelength λ) reflected from a point x of the surface can be decomposed into two additive components: an interface (specular) reflectance and a body (diffuse) reflectance:

      L(x, λ) = m_S(x) R_S(λ) E(λ) + m_D(x) R_D(λ) E(λ)               (12)

An interpretation is that the geometric properties at x (such as the angle of the incident light, the angle of the remitted light etc.) determine the weights m_S(x) and m_D(x). The relations between the spectral properties of the illumination and the specular and diffuse reflections depend on the properties of the material and are given by R_S(λ)E(λ) and R_D(λ)E(λ), where E(λ) is the spectral power distribution of the incident light.
The measured sensor values C_n(x) at pixel x in the image, using N filters with spectral sensitivities f_1(λ), ..., f_N(λ), are given by the following integral over the visible spectrum:

      C_n(x) = ∫ f_n(λ) [m_S(x) R_S(λ) E(λ) + m_D(x) R_D(λ) E(λ)] dλ
             = m_S(x) S_n + m_D(x) D_n                                (13)

Assume that two object points belong to the same material. They therefore have identical reflectance functions, and the only difference between the corresponding pixels in the image originates in their geometrical properties. For these two neighboring pixels x_1 and x_2 and one fixed channel n we then have:

      (C_n(x_1), C_n(x_2))^T = ( m_S(x_1)  m_D(x_1)
                                 m_S(x_2)  m_D(x_2) ) (S_n, D_n)^T = M (S_n, D_n)^T      (14)

In the framework of transformation groups we see that the matrix M operates on the 2-D vectors (S_n, D_n)^T. In the group theoretical approach it is now natural to consider groups of matrices M operating on the vectors (S_n, D_n)^T and to construct invariants for such groups. Examples of relevant groups of matrices M are listed in Table 3. The subgroups in Table 3 form one-parameter groups with infinitesimal elements X_k. It is now natural to consider all linear combinations of one, two, three or four of the infinitesimal elements X_k, k = 1, ..., 4. Combining X_2 and X_3 we can thus generate the two-parameter group of scalings, and the full group is a four-parameter group generated by all linear combinations of all four infinitesimal matrices. General Lie-theory requires, however, not only to consider linear combinations but also the Lie-product of two elements X, Y defined as XY − YX. A vector space that is closed under the Lie-product is called a Lie-algebra, and its vector-space dimension is the dimension of the Lie-algebra. Computing this Lie-product for the shear matrix X_4 and the rotation matrix X_1 we find that the subgroup generated by the shear and the rotation operations must also include the scaling matrices.
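Both claims, the Lie-product computation and the existence of invariants for the group in Eq. (14), can be checked numerically. The sketch below (synthetic S, D and random matrices M, our own illustration) verifies that [X_1, X_4] is a scaling generator and that ratios of 2 × 2 minors of the stacked pixel matrix do not depend on M:

```python
import numpy as np

# Infinitesimal generators from Table 3
X1 = np.array([[0, 1], [-1, 0]])    # rotation
X3 = np.array([[1, 0], [0, -1]])    # non-uniform scaling
X4 = np.array([[0, 1], [0, 0]])     # shear

bracket = lambda X, Y: X @ Y - Y @ X

# The Lie-product of rotation and shear is a scaling generator:
print(np.array_equal(bracket(X1, X4), X3))   # True

# Invariants for the dichromatic model: stack the two pixels of
# Eq. (14) into a 2xN matrix C = M [S; D]. Any 2x2 minor of C equals
# det(M) times the corresponding minor of [S; D], so ratios of
# minors do not depend on the geometry-dependent matrix M.
rng = np.random.default_rng(3)

def minor_ratios(C):
    (r1, g1, b1), (r2, g2, b2) = C
    d = r1 * g2 - r2 * g1
    return (b2 * g1 - g2 * b1) / d, (b2 * r1 - r2 * b1) / d

S, D = rng.random(3), rng.random(3)            # synthetic channel responses
C_a = rng.random((2, 2)) @ np.vstack([S, D])   # one viewing geometry
C_b = rng.random((2, 2)) @ np.vstack([S, D])   # another geometry
print(np.allclose(minor_ratios(C_a), minor_ratios(C_b)))  # True
```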
Closure with regard to the Lie-product implies that the two one-parameter groups of shears and rotations generate the three-parameter Lie-algebra of shears, rotations and non-uniform scalings. We have now shown how the group of non-singular 2 × 2 matrices (and its subgroups) operates on measurement vectors generated by one sensor type at two different positions. Next we generalize this to the case where we measure N channels simultaneously at each pixel. The most common example of such a generalization corresponds to a transition from intensity- to RGB-images. Since we separated the spectral properties and the non-spectral parameters, we see that the transformation matrix M is the same for all channels. Therefore we obtain in the general case the transformation group:

      ( C(x_1)          ( S
        C(x_2) )  =  M    D )                                         (15)

Here C(x_1), C(x_2), S, D are N-dimensional vectors. The group now operates on the space R^{2N}. A general result from Lie-theory (see [16]) shows that this implies that there are 2N − k functionally independent invariants, where k is the dimension of the Lie-algebra. For the case of RGB images and the full matrix group we have 2 · 3 − 4 = 2 invariants. The general theory does not only give the number of functionally independent invariants but also provides an algorithm for constructing them. A simple Maple program gives the following solutions (with C(x_k) = (r_k, g_k, b_k)^T):

      F_1 = (b_2 g_1 − g_2 b_1) / (r_1 g_2 − r_2 g_1),
      F_2 = (b_2 r_1 − r_2 b_1) / (r_1 g_2 − r_2 g_1)                 (16)

If we start with only rotations and shears then we begin with two one-parameter groups, but because of the Lie-structure we have k = 3 and there are 2 · 3 − 3 = 3 independent invariants:

      f = F( r_1 g_2 − r_2 g_1,  b_2 r_1 − r_2 b_1,  (b_2 g_1 − g_2 b_1)/(r_1 g_2 − r_2 g_1) )

5 Conclusions

We used a few examples to illustrate the possible usage of group theoretical methods to solve color related problems. They show that it may well be worth the time and effort to learn the tools of group representations and Lie-theory.
The advantages are, at least, twofold: (1) this may lead to new insights into the fundamental properties of spaces of color signals, and (2) group theory provides tools and algorithms to solve difficult problems with the help of programs like Maple.

References

[1] J. J. Atick and N. Redlich. What does the retina know about natural scenes? Neural Computation, 4:449–572, 1992.
[2] H. B. Barlow. The coding of sensory messages. In W. H. Thorpe and O. L. Zangwill, editors, Current Problems in Animal Behaviour, pages 331–360. Cambridge University Press, 1961.
[3] N. Dunford and J. T. Schwartz. Part III, Spectral Operators. Interscience Publishers, New York, 1988.
[4] D. J. Field. Relations between the statistics of natural images and the response properties of cortical cells. Journal of the Optical Society of America A, 4(12):2379–2394, December 1987.
[5] G. D. Finlayson and G. Schaefer. Solving for colour constancy using a constrained dichromatic reflection model. International Journal of Computer Vision, 42(3):127–144, 2001.
[6] F. R. Gantmacher. Matrizentheorie. Springer Verlag, Berlin, Heidelberg, New York, Tokyo, 1986.
[7] J. M. Geusebroek, R. van den Boomgaard, A. W. M. Smeulders, and H. Geerts. Color invariance. IEEE Trans. Pattern Anal. Machine Intell., 23(12):1338–1350, 2001.
[8] T. Gevers and A. W. M. Smeulders. Pictoseek: Combining color and shape invariant features for image retrieval. IEEE Trans. on Image Processing, 9(1):102–119, 2000.
[9] C. Gueymard. SMARTS, a simple model of the atmospheric radiative transfer of sunshine: Algorithms and performance assessment. Tech. Rep. FSEC-PF-270-95, Florida Solar Energy Center, 1995.
[10] S. T. Henderson and D. Hodgkiss. The spectral energy distribution of daylight. J. Appl. Phys., 14:125–131, 1963.
[11] D. B. Judd, D. L. MacAdam, and G. Wyszecki. Spectral distribution of typical daylight as a function of correlated colour temperature. Journal of the Optical Society of America, 54:1031–1041, 1964.
[12] G. J. Klinker.
A Physical Approach to Color Image Understanding. A K Peters Ltd., 1993.
[13] P. Kubelka and F. Munk. Ein Beitrag zur Optik der Farbanstriche. Zeitschrift für Technische Physik, 11a:593–601, 1931.
[14] R. Lenz. Investigation of receptive fields using representations of dihedral groups. Journal of Visual Communication and Image Representation, 6(3):209–227, September 1995.
[15] R. Lenz, T. H. Bui, and J. Hernández-Andrés. One-parameter subgroups and the chromaticity properties of time-changing illumination spectra. In Proc. SPIE-2003, Color Imaging VIII, volume 5008, pages 237–248, 2003.
[16] P. J. Olver. Applications of Lie Groups to Differential Equations. Springer, New York, 1986.
[17] Z. Pan, G. Healey, and D. Slater. Global spectral irradiance variability and material discrimination at Boulder, Colorado. Journal of the Optical Society of America A, 20(3):513–521, 2003.
[18] C. A. Párraga, G. Brelstaff, T. Troscianko, and I. R. Moorehead. Color and luminance information in natural scenes. J. Opt. Soc. Am. A, 15(3):563–569, March 1998.
[19] P. Reinagel and S. Laughlin. Natural stimulus statistics. Network: Computation in Neural Systems, 12(3):237–240, 2001.
[20] D. Sattinger and O. Weaver. Lie Groups and Algebras with Applications to Physics, Geometry and Mechanics, volume 61 of Applied Mathematical Sciences. Springer, 1986.
[21] S. A. Shafer. Using color to separate reflection components. Color Research and Application, 10(4):210–218, 1985.
[22] H. M. G. Stokman. Robust Photometric Invariance in Machine Color Vision. PhD thesis, Intelligent Sensory Information Systems group, University of Amsterdam, Nov. 2000.
[23] G. T. Winch, M. C. Boshoff, C. J. Kok, and A. G. du Toit. Spectroradiometric and colorimetric characteristics of daylight in the southern hemisphere: Pretoria, South Africa. Journal of the Optical Society of America, 56:456–464, 1966.
[24] G. Wyszecki and W. Stiles. Color Science. Wiley & Sons, London, England, 2nd edition, 1982.
Image Relighting: Getting the Sun to Set in an Image Taken at Noon

Claus B. Madsen and Rune Laursen
Laboratory of Computer Vision and Media Technology
Aalborg University, Aalborg, Denmark
cbm/[email protected]

Abstract

Image relighting is a unique visual effect which promises to have many important practical applications. Image relighting is essentially the process of, given one or more images of some scene, computing what that scene would look like under some other (arbitrary) lighting conditions, e.g., changed positions and colors of light sources. Image relighting can for example be used for interior light design. This paper describes an approach to image relighting which can be implemented to run in real-time by utilizing graphics hardware, as opposed to other state-of-the-art approaches which at best run at a few frames per second.

1 Introduction

This paper addresses the subject of developing relighting techniques, i.e. techniques allowing a user to completely alter the lighting conditions in an image of a real scene. Specifically, the paper focuses on techniques providing real-time relighting functionality, thus enabling the user to interactively change lighting conditions and get "instant" visual feedback. Figure 1 provides an example of the kind of relighting this paper addresses. It should be noted that the work presented here presumes the availability of three things: 1) an original image of the scene, 2) a 3D model of the scene, and 3) a model of the lighting conditions in the scene at the time the original image is acquired. We will return to ways in which the last two pieces of knowledge can be obtained. Conceptually, image relighting in this manner is a two-step process. In the first step all effects of the original lighting conditions are removed, e.g., highlights, shadows, and differences in shading across surfaces due to varying light incidence angles.
In the second step the scene is subjected to some arbitrary new lighting conditions and the appearance of the scene under these conditions is computed. The second step thus "adds" new highlights, shadows, shading etc. Of these two steps the former is the tricky one, while the latter can be performed using any preferred rendering technique, e.g., ray tracing, radiosity or standard hardware-accelerated approaches. Which rendering technique is employed depends on the preferred balance between rendering speed and accuracy in handling various lighting phenomena. In order to achieve true real-time performance we have chosen to use a hardware-accelerated approach for step 2, thus sacrificing certain global illumination phenomena. In our approach step 1 is achieved by a computational approach which requires, as stated above, a 3D model of the scene and a model of the original lighting conditions. Alternatively, one could in principle acquire fronto-parallel digital images of the surfaces in the scene under perfectly diffuse, white-balanced lighting conditions and use these images as textures on the 3D model, which is subsequently rendered under novel lighting conditions (step 2). This would be a mechanical or image-acquisition approach to step 1, but in reality acquiring such 'clean' textures devoid of lighting effects is not practical for general scenes.

The contributions of this work lie in the specific manner in which the operations performed in steps 1 and 2 are carried out. With the approach described here the two steps can be combined such that image relighting becomes a matter of modulating the original image on a pixel-by-pixel basis with a "relighting map". The relighting map can be computed in real-time using standard techniques, and the modulation can also easily be performed in real-time.

Figure 1: Left: original image acquired outdoors on a sunny day. Right: same scene as left, but this image is a simulation of what the scene would look like if the position of the sun were different. The work in this paper enables such changes in lighting conditions to be performed in real-time.

Thus, our approach has two advantages: 1) it is directly designed for real-time performance, and 2) the original image is used directly, and the final image is therefore not subject to filtering and/or aliasing effects involved in reprojecting textures mapped to a 3D model of the scene. This paper represents the current state of work in progress and will only address the problem for scenes with perfectly diffuse reflectance properties.

The paper is organized as follows. In section 2, we present an overview of our approach and show how the relighting effects are achieved. Section 3 then describes related work. Section 4 describes our approach in more detail, followed by section 5 giving some practical details behind the initial experiments we have performed to validate the proposed approach. Section 6 discusses central aspects of the work and points to future research. Finally section 7 offers conclusions.

2 Overview of approach

Prior to describing the proposed approach in a more technically rigorous manner, this section attempts to provide the reader with an intuitive understanding of the issues involved and of the process behind our technique. The approach requires different types of input information. First of all, an image of the original scene is required. Secondly, a 3D model of the scene must be available, and the original image must be calibrated to the 3D model such that every pixel corresponds to a known 3D point in the scene model. Third, the original lighting conditions in the scene must be known, i.e., we need to know the sources of light in the scene and their relative intensities.

The 3D model can be obtained in many different ways [7], e.g., by reconstruction from multiple images using approaches such as [8], by Image-Based Modelling, e.g. [4], or by laser range scanning. Alternatively, the scene can be measured and a model constructed manually. The latter is the approach employed for our experimental results, i.e., we have measured the scene, constructed crude polygonal models of the objects, and then calibrated the camera to the 3D model using manually established 2D to 3D point correspondences. Figure 2 shows the scene model used for the relighting illustrated in figure 1.

Figure 2: 3D model corresponding to the scene shown in figure 1. The image illustrates the model as a depth map, i.e., intensity is a measure of distance from the camera. We have used one plane for the tiled ground plane, a plane for the brick wall on the right, six quadrilaterals for the speaker on the left, and two quads and one triangle for the calibration object in the center.

The required knowledge of the original lighting conditions can most easily be acquired using the popular light probe approach, i.e., by taking high dynamic range images of a reflective sphere placed in the scene, [2, 5]. Alternatively, light source positions, sizes and powers can be measured manually as done in [6], or semi-automatically using multiple images as in [12]. For the experimental results in this paper we have done it manually, in a manner described in section 5.

In this paper we will limit ourselves to discussing the case of scenes containing surfaces with perfectly diffuse reflectance properties. Each pixel in the original image is a measurement of the radiance (in the three RGB bands) from a unique 3D point in the scene in the direction of the viewpoint. Thus the original image is a 2D radiance map, B(u, v), where u and v are the image coordinates. Because we have the 3D model and knowledge of the original lighting conditions, it is trivial¹ to compute the amount of light arriving at the same 3D points in the scene, i.e., it is possible to construct an irradiance map, E(u, v). When the irradiance map is computed using the known original scene lighting conditions we will call it E_orig(u, v). Conversely, when the irradiance map is computed using some arbitrary different relighting conditions it will be denoted E_relit(u, v). For purely diffuse Bidirectional Reflectance Distribution Functions (BRDFs) there is a linear relationship between radiance and irradiance (radiance is proportional to diffuse albedo times irradiance). Therefore diffuse scenes can be very simply relit by dividing the radiance map by the original irradiance map, and then multiplying with the relighting irradiance map. Using the introduced terminology, Figure 3 shows original and relighting irradiance maps corresponding to the relighting example in figure 1.

¹Irradiance computation is trivial provided global illumination issues (irradiance contributions from diffuse reflections) are disregarded. If this is not a fair assumption, the work by Yu et al., [12], can be used to compute these contributions.

Section 1 presented the general concept of relighting as a two-step process: 1) removing the original light from the image, and 2) adding new light. In the described diffuse case, step 1 is represented by the division by E_orig, whereas step 2 is performed by multiplying the result from step 1 by the relighting irradiance, E_relit. Step 1 is a once-only process, as it only involves elements that do not change over time (the original image and the original irradiance map). Step 2 has to be re-performed constantly in response to the user's changes in the desired lighting conditions for the relit scene, so E_relit must in general be re-computed for every frame.

Step 1 could be pre-computed and the result stored in an image which is then subsequently modulated by the real-time computed relighting irradiance map. Our approach does not do this; we have designed a solution which embeds the normalization with the original irradiance into the computation of the relighting irradiance map. Thus, at run-time we actually compute the relight/original light ratio directly. This "ratio map" is then used for modulating the original image. By doing this we avoid the non-trivial implementation of a real-time texture division operation. Computing the ratio map is described in detail in section 4. The idea is based on the observation that if one renders a radiance image of an all-white, perfectly diffuse 3D model of the scene under the chosen relighting conditions, this image is identical to the required irradiance map, E_relit. If instead we set the reflectances of the 3D model to be proportional to the inverse of the original irradiances, then the rendered image automatically becomes the desired relight/original irradiance ratio map.

Figure 3: Left: irradiance map, E_orig, corresponding to the original lighting conditions of the scene shown in figure 1. Center: irradiance map, E_relit, corresponding to lighting conditions where the dominant light source, the sun, has changed location relative to the scene. These are the lighting conditions valid for the relit image in figure 1. Right: the "relighting factor", i.e., the ratio of relighting irradiance to original irradiance, E_relit/E_orig. Every pixel in the original radiance map is modulated with its corresponding pixel in this map. Dark means "light is removed" from original pixels, and bright means "light is added" to original pixels.

3 Related work

There is a small amount of closely related work in the literature. First of all, Yu et al., [13], demonstrated how they could acquire reflectance properties of architectural scenes by taking at least two images of each surface of the objects under different lighting conditions. The recovered 3D model combined with the estimated reflectance parameters could then be used to render the scene under changing lighting conditions. The focus of this work is entirely on parameter recovery, and relighting is by no means done in real-time.
Similarly, inverse global illumination was proposed by Yu et al., [12], for recovery of reflectance parameters for indoor scenes using multiple images of each surface from different viewpoints. Again this work focuses on reflectance parameter estimation, and relighting is done using RADIANCE, [10], which again is far from real-time.

The most closely related work is that of Loscos et al., [6]. This work also enables a user to change the lighting conditions in an image of a scene in an interactive manner, but it is centered around a radiosity method for irradiance computations. Therefore, the method performs at a few frames per second when the only lighting changes performed are intensity adjustments. If the number of sources or their positions are changed, updating takes on the order of 10 seconds. The work in [6] also employs texture modulation for efficient relighting, and the modulation texture (irradiance map) is computed using radiosity.

We have chosen to focus specifically on true real-time performance, and therefore the computation of the relighting irradiance maps does not account for global illumination phenomena such as color bleeding. Nevertheless, with the work currently being done in Pre-computed Radiance Transfer and Photon Mapping, real-time global illumination is coming closer and closer to reality, and our approach can readily be combined with such efficient global illumination techniques.

4 The perfectly diffuse case

For perfectly diffuse reflectors the relationship between incident irradiance, E, and radiance, B, in any direction is given by the diffuse albedo, ρ:

B = ρ E   (1)

The original image (radiance map), B(u, v), provides us with measured radiances from a dense set of 3D points in the scene, and these points are known since we assume the camera is calibrated to the scene. In general the diffuse albedo and the irradiance vary for every point in the scene, so the relationship between radiance maps and irradiance maps becomes:

B(u, v) = ρ(u, v) E(u, v)   (2)

Here ρ(u, v) is the "albedo map". When doing relighting the albedo stays constant; the only thing that changes is the irradiance at each scene point. Therefore, the radiance map of the relit image/scene can be expressed as:

B_relit(u, v) = B(u, v) · E_relit(u, v) / E_orig(u, v)   (3)

Eq. 3 simply shows that the relit image can be computed by modulating the original image with a ratio of two irradiance maps: the relighting irradiance map, E_relit, corresponding to the user's desired (new) lighting conditions, and the original irradiance map, E_orig. The key element in our approach is a technique for computing this ratio map in real-time and using it for modulation of the original image.

4.1 Computing the irradiance ratio map

How can we efficiently compute the irradiance ratio map? First, let us describe how simply the relighting irradiance map can be computed using standard local illumination techniques (specifically, we will use the Phong lighting model of OpenGL, a description of which may be found in books such as [1, 11]). Rendering an image of a scene using the Phong lighting model results in a radiance from a 3D point which can be formulated as (disregarding specular reflection):

B = ρ_a E_a + ρ_d Σ_{i=1}^{N} E_i cos θ_i   (4)

Here B is the radiance from a 3D point in the direction of the viewpoint. ρ_a and ρ_d are the ambient and diffuse reflectances, respectively (eq. 4 is to be evaluated for each of the three RGB colors). E_a is the ambient irradiance at the 3D point. E_i cos θ_i is the irradiance at the point caused by the ith point light source (θ_i is the angle between the surface normal and the direction vector to the ith light source at the 3D point). N is the number of light sources. If we set the ambient and diffuse reflectances equal, eq. (4) changes to:

B = ρ (E_a + Σ_{i=1}^{N} E_i cos θ_i)   (5)

Eq. 5 states that when rendering with OpenGL Phong lighting, the radiance from a point equals the reflectance at the point times the total irradiance (ambient plus sum of individual point source contributions) at the point. Thus, by setting unit reflectances, the radiance equals the irradiance. This may be self-evident, but it is important because it shows that we can use the graphics card's efficient lighting computation capabilities to produce the irradiance maps needed for relighting. That is, if we render the 3D scene model from a viewpoint corresponding to the original image, and if all surfaces in the rendered 3D model have unit reflectances, then the resulting image is an irradiance map. This means that relighting irradiance maps, E_relit, for any user-desired lighting conditions can be rendered simply by rendering a diffuse, all-white 3D scene model under the chosen lighting conditions.

To actually do relighting we not only need real-time computation of E_relit, we need the lighting ratio map, E_relit/E_orig. This is accomplished by setting the reflectances of points in the 3D model to the inverse of the original irradiance at that point, 1/E_orig. To summarize, the light ratio maps are generated by doing the following rendering using hardware acceleration:

1. upload the 3D scene model to the graphics card
2. set the ambient and diffuse RGB reflectances of all vertices to the inverse of the original irradiance at that 3D point
3. set up the desired lighting conditions, consisting of ambient and point source contributions
4. render the model to a texture using a viewport corresponding to the camera in the original image

4.2 Practical issues

In the previous section we described how to use hardware-accelerated local lighting rendering to produce irradiance ratio maps for modulating the original image. With this approach there are really no limits to how much the lighting conditions in the scene can be altered. We are presently implementing the proposed technique, but all images in this paper were produced by a non-real-time simulation of the presented approach. Figure 4 shows what the original scene looks like with a (non-existing) light source in the very center of the scene.
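The diffuse relighting pipeline of eqs. (3)–(5) can be sketched in a few lines of numpy. The scene below (a flat ground plane, an overhead "noon" sun and a low "sunset" sun) is a made-up stand-in for illustration, not the paper's actual setup, and shadowing is ignored:

```python
import numpy as np

def irradiance(normals, ambient, light_dir, light_irr):
    """Per-pixel irradiance: ambient plus one directional source with the
    cosine fall-off of eq. (4), evaluated at unit reflectance."""
    cos_theta = np.clip(normals @ light_dir, 0.0, None)  # shadowing ignored
    return ambient + light_irr * cos_theta

# Made-up example scene: an HxW map of unit surface normals.
H, W = 4, 4
normals = np.zeros((H, W, 3))
normals[..., 1] = 1.0                       # flat ground plane facing up

sun_orig  = np.array([0.0, 1.0, 0.0])       # original sun: overhead ("noon")
sun_relit = np.array([0.0, 0.5, 0.5])       # new sun: low in the sky
sun_relit /= np.linalg.norm(sun_relit)

E_orig  = irradiance(normals, 0.2, sun_orig,  1.0)
E_relit = irradiance(normals, 0.2, sun_relit, 1.0)

original = np.full((H, W), 0.8)             # stand-in radiance map B(u, v)
relit = original * E_relit / E_orig         # eq. (3): modulate by ratio map
```

In the paper the ratio map E_relit/E_orig is produced in a single hardware-accelerated rendering pass (steps 1–4 above, with 1/E_orig baked into the reflectances); the explicit division here is only for illustration.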
For the ongoing implementation of the real-time version, the only real issue to contemplate is the resolution of the 3D model of the scene. In order to properly capture gradients in the original irradiances of the scene, the resolution of the 3D model has to be high at such gradients. We are working on designing methods for adaptive subdivision based on evaluating irradiance differences between 3D model vertex locations. If differences are too high, the surface is subdivided. Computing the original irradiances to be used inversely as reflectances of the (subdivided) 3D scene model is an off-line process which can be done using any preferred rendering technique, for example Monte Carlo ray tracing to enable proper handling of area light sources. This is especially possible if a high dynamic range light probe image of the scene is available, because then an Image-Based Lighting approach, [2, 3], can be used to compute accurate irradiances which properly handle soft shadows in the original image. In the on-line stage, when rendering the 3D model with the assigned reflectances, cast shadows are important for proper irradiance computation. For this we propose to use a shadow volume approach to detecting shadowed areas.

5 Experiments

As mentioned previously, the images shown in this paper are produced using a simulation of the presented technique. The original image was acquired with a standard 5 megapixel digital camera. The scene was measured manually and a simple 3D model of it was constructed (as described in section 2). The camera was calibrated to the 3D model using manually established 2D to 3D point correspondences, and the estimation of internal and external camera parameters was done using an approach from [9]. The original lighting conditions were modelled as a combination of a point light source (the sun) and an ambient term (the sky).
The position of the sun relative to the scene model was determined by orienting the calibration object such that the sun rays were parallel to the xz-plane, thus fixing the sun's y-coordinate to zero. The x and z coordinates were then found by measuring the length of a shadow cast by an object of known height. The RGB intensities of the blue ambient sky light were determined from the image colors of the white paper of the calibration object in areas not exposed to sunlight. By comparing RGB values of the calibration object cardboard in shadow and in direct light, the relative intensities of ambient and sun light were determined (taking the cosine fall-off for diffuse reflection into account for the sun point source). It should be made very clear that the original lighting conditions modelled as described above are extremely crude; this was only done to get quick results. In the future we will use light probe images. For computing the original irradiance map a simple ray tracing approach was implemented which considers local illumination only, by casting primary rays plus shadow feelers. The original irradiance map was computed in image resolution. The relighting examples given in the paper basically involve changing the location of the sun source. Given some desired sun position, the simple ray tracer was used to render a relighting irradiance map, again in image resolution. The relighting irradiance map was divided by the original irradiance map and the result multiplied with the original image to complete the diffuse relighting process.

6 Discussion

In this section we will briefly discuss some important points in relation to our proposed method. Using our approach it is possible to employ arbitrarily complex and accurate computations of the original irradiance map. This is an off-line, once-only computation, the results of which are used to set the reflectances of the 3D scene model subsequently used for relighting.
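The manual sun-position procedure above can be sketched as follows. The function, the numbers, and the choice of z as the vertical axis are illustrative assumptions (the paper fixes the sun's y-coordinate to zero but does not spell out the axis convention): with the sun rays parallel to the xz-plane, the sun's elevation follows from an object's height and the length of its shadow.

```python
import math

def sun_direction_xz(object_height, shadow_length):
    """Unit vector from the scene towards the sun, assuming sun rays
    parallel to the xz-plane (sun y-coordinate fixed to zero, as in the
    paper) and z taken as the vertical axis (an assumed convention).
    Elevation angle: tan(elev) = object height / shadow length."""
    elev = math.atan2(object_height, shadow_length)
    # The sign of the x component depends on which way the shadow falls.
    return (math.cos(elev), 0.0, math.sin(elev))

# Illustrative numbers: a 0.30 m calibration object casting a 0.30 m
# shadow gives a 45-degree sun elevation.
d = sun_direction_xz(0.30, 0.30)
```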
We believe handling area light sources to be very important, even for outdoor images, since shadows due to sun light actually do have noticeable penumbra regions. Similarly, we believe taking global illumination phenomena (indirect light) into account is important, especially for indoor scenes, where reflections from other surfaces may be a significant irradiance contribution for a given surface.

Figure 4: Left: irradiance map corresponding to a user-defined lighting environment with a weak ambient term and a point light source in the middle of the scene. Right: resulting relit image.

Conversely, for the actual on-line, real-time rendering of irradiances during interactive relighting we have here proposed a straightforward local illumination approach. Yet, the basic approach of using the 3D scene model, normalized with original irradiances, can be used in conjunction with any lighting algorithm, depending on how accurate one desires the result to be.

Throughout this paper we have assumed scenes to consist entirely of Lambertian materials. Our approach actually does generalize nicely to scenes with glossy BRDFs. It requires an additional rendering pass in the real-time relighting process, in order first to modulate the original image with the diffuse part of the relighting/original irradiance ratio map and subsequently add the specular radiance part. Figure 5 demonstrates the effect of adding a specular component to the surfaces during relighting.

An important aspect of the proposed approach is that relighting is performed as a modulation of the original image. It is believed that doing relighting in this manner is superior to an approach where textures extracted from the image are mapped to the scene geometry and reprojected at run-time, because the multiple re-sampling steps involved will cause the resulting image to be blurred.

7 Conclusions

We have described an approach to image/scene relighting which, based on a 3D model of the scene and knowledge of the original lighting conditions, can compute the appearance of the scene under any arbitrary new lighting conditions, including changing the number of light sources, their positions and radiant powers. The main contribution of the work is the fact that the approach is directly designed for real-time performance, enabling a user to get instant visual feedback upon having changed the parameters of the lighting environment. A smaller contribution lies in the idea of performing the normalization with the original irradiance by appropriately setting the reflectances of the 3D model used for real-time irradiance computations. This allows the approach to operate directly on the original image, rather than computing the albedo map off-line and modulating it at run-time.

Figure 5: A specular reflection component has been added to each surface during relighting to illustrate the possibility of playing with the reflectance properties.

Acknowledgments

This research is funded in part by the BENOGO project under the European Commission IST program (IST-2001-39184), and in part by the ARTHUR project (IST-2000-28559). This support is gratefully acknowledged.

References

[1] S. R. Buss. 3-D Computer Graphics – A Mathematical Introduction with OpenGL. Cambridge University Press, 2003.
[2] P. Debevec. Rendering synthetic objects into real scenes: Bridging traditional and image-based graphics with global illumination and high dynamic range photography. In Proceedings: SIGGRAPH 1998, Orlando, Florida, USA, July 1998.
[3] P. Debevec. Tutorial: Image-based lighting. IEEE Computer Graphics and Applications, pages 26–34, March/April 2002.
[4] P. Debevec, C. Taylor, and J. Malik. Modelling and rendering architecture from photographs: a hybrid geometry and image-based approach. In Proceedings: SIGGRAPH 1996, New Orleans, LA, USA, pages 11–20, August 1996.
[5] S. Gibson, J. Cook, T. Howard, and R. Hubbold. Rapid shadow generation in real-world lighting environments. In Proceedings: EuroGraphics Symposium on Rendering, Leuven, Belgium, June 2003.
[6] C. Loscos, G. Drettakis, and L. Robert. Interactive virtual relighting of real scenes. IEEE Transactions on Visualization and Computer Graphics, 6(4):289–305, October-December 2000.
[7] M. M. Oliveira. Image-based modelling and rendering: A survey. RITA – Revista de Informatica Teorica e Aplicada, 9(2):37–66, October 2002. Brazilian journal, but paper is in English.
[8] M. Pollefeys, R. Koch, and L. Van Gool. Self-calibration and metric reconstruction in spite of varying and unknown internal camera parameters. International Journal of Computer Vision, 32(1):7–25, 1999.
[9] E. Trucco and A. Verri. Introductory Techniques for 3-D Computer Vision. Prentice Hall, 1998.
[10] G. J. Ward. The RADIANCE lighting simulation and rendering system. In Proceedings: SIGGRAPH 1994, pages 459–472, July 1994.
[11] A. Watt and F. Policarpo. 3D Games: Real-Time Rendering and Software Technology, volume 1. Addison-Wesley, 2001.
[12] Y. Yu, P. Debevec, J. Malik, and T. Hawkins. Inverse global illumination: Recovering reflectance models of real scenes from photographs. In Proceedings: SIGGRAPH 1999, Los Angeles, California, USA, pages 215–224, August 1999.
[13] Y. Yu and J. Malik. Recovering photometric properties of architectural scenes from photographs. In Proceedings: SIGGRAPH 1998, Orlando, Florida, USA, pages 207–217, July 1998.

Using Mixtures of Gaussians to Compare Approaches to Signal Separation

Kaare Brandt Petersen
Technical University of Denmark

1 Introduction

In the signal separation technique called Independent Component Analysis (ICA), one refers to a problem as being "square" if there are as many measurements as sources to separate, and "overcomplete" or "under-determined" if there are more sources than measurements.
While square ICA has been thoroughly investigated through many different approaches, with well understood differences and similarities, the overcomplete case still poses a difficult problem. Below is a short overview of some of the more interesting or illustrative approaches.

In 1999, Hagai Attias presented in [1] a Maximum Likelihood approach assuming the generative model x = As + ε. In this approach a model distribution is constructed, and using Mixtures of Gaussians as priors makes it possible to carry out the relevant integrals and obtain a closed form expression for the distribution over x. The model distribution is fitted to the data distribution through the Kullback-Leibler divergence. This approach is extremely flexible while still having appealing analytical properties, and the only real drawback is its bad scaling behavior: a sum over K^D terms must be computed, which can be rather large for e.g. image data.

In 2000, Lewicki and Sejnowski presented in [5] a Maximum Likelihood approach assuming the generative model x = As + ε. In this approach, the log likelihood of A is Taylor expanded to second order around the maximum a posteriori estimate of the sources, i.e. one approximates the likelihood with a Gaussian, and the sources are in turn estimated using the updated estimate of A. In this approach we see the difficulty that most overcomplete techniques are trying to work around: the likelihood, or some other suitable cost function, contains an integral involving the source prior and is therefore in general hard to solve. The approach of Lewicki and Sejnowski from 2000 substitutes the integral with a second order expansion, and we shall see other possibilities in the following.

In 2001, Girolami presented in [2] a Maximum Likelihood approach assuming the generative model x = As + ε.
In this approach, Girolami assumes Laplacian priors, and the integral associated with the log likelihood is approximated through a variational scheme: the Laplacian priors are reformulated in a dual space, which provides a lower bound on the likelihood to be optimized. The drawback of this approach is that the trick only works for Laplacian priors and that the algorithm optimizes a lower bound instead of the log likelihood itself.

In 2002, Hojen-Soerensen, Winther and Hansen presented in [4] the so-called "Mean Field Approach", assuming the generative model x = As + ε. The parameters A, and possibly the noise covariance, are estimated through maximum likelihood assuming knowledge of the mean values of the sources, and vice versa. That is, the integral of the log likelihood translates into mean values of the sources, which are approximated with estimates obtained from mean field theory. The nice feature of this approach is that a very complicated integral is substituted with an easier non-linear equation, and that one can do this for any prior. The problem, of course, is that although mean field estimates are fairly accurate, they are still approximations.

Also in 2002, Shriki, Sompolinsky and Lee presented in [7] an interesting variant of Infomax on the filtering model ŝ = Wx. In the setup y = g(Wx), Shriki et al. obtain a relation between p(x) and p(y) by assuming a noisy relation y = g(Wx) + ε and letting the noise go to zero. The main problem is that the limit is not treated with proper care, so the certainty of the result is doubtful.

And finally in 2003, Teh, Welling, Osindero and Hinton presented in [8] the Energy Based Model on the filtering model ŝ = Wx. Through a setup inspired by physics, a model distribution for x is constructed and adjusted to be as close to the data distribution as possible through a Kullback-Leibler divergence.
Again the authors are faced with an intractable integral, and this time it is approximated with so-called "n-step learning", a hybrid Monte Carlo technique using n steps in the estimation of the integral. The remarkable claim of the paper is that very few steps, such as n = 1 or n = 3, will often be sufficient to obtain overall convergence of the algorithm.

A common feature of most of the approaches is that they assume a noisy model and obtain a likelihood which involves a difficult integral, which is then approximated in some way or another. The exception is the approach of Hagai Attias, but one can then argue that assuming the priors to be mixtures of Gaussians is either a restriction or an approximation.

1.1 This Paper

Thus, Independent Component Analysis (ICA) can be performed by a vast range of different methods. These can differ from each other by assumed properties such as noise or time-correlation, but also by the fundamental issue of whether they attempt to estimate the generative mixing matrix, denoted A, or a filtering matrix, denoted W. In the case of the same number of observations and sources, the square case, most if not all of these methods can be proven to be equivalent. But in the overcomplete case, where the number of observations is smaller than the number of sources, their differences become apparent and it is not easy to compare the results of generative and filtering approaches. This paper makes an attempt to compare the results of two different methods in the overcomplete case: Maximum Likelihood (ML), which estimates the generative A, and Energy Based Models (EBM), which seek to estimate a filtering matrix W. This is done by assuming the priors to be centered mixtures of Gaussians, which makes it possible to compare the optimization schemes.
This approach, with respect to ML, is closely related to the work of Hagai Attias in [1], but where Attias assumes completely general mixtures of Gaussians and a noisy mixture model, the mixtures of Gaussians in this paper are assumed centered for simplicity and, more importantly, the limit of zero noise is derived in order to compare with the noiseless EBM.

Figure 1: Contour plots of probability densities in 2D source space: a general MoG, a Spherical MoG (SMoG) and an Independent MoG (IMoG); the IMoG is the only one of the three which is in fact independent in its variables.

The structure of the paper is as follows: In Sec 2 we introduce the mixtures of Gaussians to be used, in Sec 3 we apply the mixtures of Gaussians to the EBM and the ML, in Sec 4 we compare the results found, and finally in Sec 5 we give a short summary.

2 Mixture of Gaussians

In this paper we make extensive use of the family of distributions known as mixtures of Gaussians (MoG). To clear up some common misconceptions about MoG we introduce the general MoG and then discuss two important subsets of distributions. The density of a D-dimensional centered MoG is

p(s) = Σ_κ ρ_κ / √|2πD_κ| · exp(−½ sᵀ D_κ⁻¹ s)    (1)

where the weights ρ_κ sum to one. The matrix D_κ can be any positive definite matrix, but in this context it is often assumed diagonal, D_κ = diag(σ²_1κ, σ²_2κ, ..., σ²_Dκ). The marginal distribution for each s_i is itself a mixture of Gaussians, s_i ∼ Σ_κ ρ_κ N(0, σ²_iκ), but note about the joint distribution that even if the coordinates s_i in each component are independent, i.e. if the D_κ are all diagonal, the coordinates s_i are not in general independent in the full joint distribution p(s).
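The density in Eq. (1) can be sketched in a few lines; the weights, variances and dimension below are illustrative choices, not values from the paper.

```python
import numpy as np

def mog_density(s, weights, covs):
    """Centered mixture-of-Gaussians density at the point s (Eq. 1).

    weights -- the rho_kappa, summing to one
    covs    -- positive definite D x D matrices D_kappa
    """
    p = 0.0
    for rho, D in zip(weights, covs):
        norm = np.sqrt(np.linalg.det(2.0 * np.pi * D))
        p += rho * np.exp(-0.5 * s @ np.linalg.solve(D, s)) / norm
    return p

# Two diagonal components in D = 2 dimensions (illustrative values).
weights = [0.5, 0.5]
covs = [np.diag([1.0, 0.2]), np.diag([0.2, 1.0])]
density_at_origin = mog_density(np.zeros(2), weights, covs)
```

Even though each component here has a diagonal covariance, the mixture as a whole does not factorize over the coordinates, in line with the remark above.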
One subset of interest is the MoG constructed as a product of D one-dimensional mixtures of Gaussians, and therefore independent by construction. Since the variables are independent we call this kind of MoG Independent Mixtures of Gaussians (IMoG). Denoting the i-th marginal by Σ_κ ρ_iκ N(0, σ²_iκ), the density of the joint distribution is given by

p(s) = Σ_{k} ρ̃_k / √|2πD_k| · exp(−½ sᵀ D_k⁻¹ s)    (2)

The symbol {k} denotes all combinations of the vector k of length D, consisting of integers from 1 to K, D_k = diag(σ²_{1k_1}, σ²_{2k_2}, ..., σ²_{Dk_D}), and ρ̃_k is defined by ρ̃_k = Π_{i=1}^D ρ_{ik_i}. Another subset of interest is the MoG where the matrices are not only diagonal, but all a constant times the identity, D_κ = σ_κ² I. In this case the density is spherically symmetric, and we call this kind of distribution Spherical Mixtures of Gaussians (SMoG). As is obvious from Eq. 2, an IMoG is itself a MoG, but the reverse is not true in general. As visualized in Figure 1, the density contour of a MoG with diagonal covariances is a weighted sum of axis-aligned ellipsoids. The axis lengths of the ellipsoids correspond to the variances of the Gaussian components, i.e. the diagonal elements of the covariance matrices. This is also true for the product of one-dimensional MoGs, but in this case all combinations of the marginal variances are present as axis sets of some ellipsoid. Thus, knowing the density of a MoG with diagonal covariances, its variables are independent if and only if all combinations of axis lengths are present. Therefore, since this by definition is not the case for SMoG, SMoG and IMoG must be disjoint subsets of MoG.

3 Models and Derivation

We now use the MoG as source priors for two different approaches to ICA: Energy Based Models (EBM) and Maximum Likelihood (ML). We consider a situation in which D sources s_t are mixed into a set of M measurements x_t, expressed by the equation x_t = A s_t.
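The IMoG product construction above can be made explicit: D marginals with K components each unfold into a MoG with K^D components and weights ρ̃_k = Π_i ρ_{ik_i}, which is also the source of the K^D scaling problem mentioned in the introduction. The marginal weights below are illustrative.

```python
import itertools
import numpy as np

# Marginal mixture weights rho_{i,kappa} for D = 3 coordinates with
# K = 2 components each (illustrative values; each row sums to one).
rho = np.array([[0.3, 0.7],
                [0.5, 0.5],
                [0.9, 0.1]])
D, K = rho.shape

# Joint IMoG weights: one weight per combination k = (k_1, ..., k_D),
# rho_tilde_k = prod_i rho_{i, k_i} -- K**D components in total.
rho_tilde = {k: float(np.prod([rho[i, k[i]] for i in range(D)]))
             for k in itertools.product(range(K), repeat=D)}
```

The joint weights again sum to one, since summing the products over all combinations factorizes into the product of the marginal sums.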
The signals are N time steps long and can be arranged into matrices S and X, such that the mixing of all N vectors can be expressed in one equation, X = AS. The sources are white, and since EBM is restricted to square and overcomplete mixing, we assume M ≤ D.

3.1 Energy Based Models

The EBM method, presented in [8], aims at demixing the measurements X by a filtering with W in the traditional way, Ŝ = WX. That this, in the overcomplete case, does not produce independent estimated sources is discussed in more detail in Sec 4. The filtering coefficients are determined through the construction of a model distribution p_W(x), which is fitted to the data distribution

p₀(x) = (1/N) Σ_{t=1}^N δ(x − x_t)    (3)

through the Kullback-Leibler divergence. The model distribution is constructed in the following way: An energy is defined by E(x; W) = −ln p_s(Wx), which ensures that choosing W such that the resulting estimated sources are not too unlikely is rewarded through low energy levels, while unlikely source values are penalized with high energy levels. Other definitions of energy could be made, but this one is especially appealing due to its calculational properties. The energy E is used in a Gibbs distribution, i.e.

p_W(x) = e^{−E(x;W)} / Z(W) = p_s(Wx) / ∫ p_s(Wx) dx    (4)

which is chosen as our model distribution. In [8], Teh et al. make no assumption on the prior, which makes the normalization part more difficult. To deal with this they use so-called n-step learning, a variant of a hybrid Monte Carlo approach, to approximately estimate the normalization part of the optimization. In this paper we instead choose the prior to be a MoG, which enables us to calculate the integral of Eq. 4 and obtain a closed-form expression for the model distribution.
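As an aside, the energy construction E(x; W) = −ln p_s(Wx) can be sketched directly; the spherical MoG prior parameters and the 3 × 2 filter below are illustrative stand-ins, not values from the paper.

```python
import numpy as np

def log_prior_smog(s, weights=(0.5, 0.5), variances=(1.0, 0.1)):
    """Log-density of a spherical MoG source prior (illustrative parameters)."""
    d = len(s)
    comps = [rho * np.exp(-0.5 * (s @ s) / v) / (2.0 * np.pi * v) ** (d / 2.0)
             for rho, v in zip(weights, variances)]
    return np.log(sum(comps))

def energy(x, W):
    """EBM energy E(x; W) = -ln p_s(Wx)."""
    return -log_prior_smog(W @ x)

# D = 3 filters applied to M = 2 measurements (illustrative filter matrix).
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.5, 0.5]])

e_likely = energy(np.array([0.1, 0.0]), W)    # filtered sources near the prior mode
e_unlikely = energy(np.array([2.0, 2.0]), W)  # filtered sources far out in the tail
```

Likely filtered sources receive low energy and unlikely ones high energy, which is exactly what the Gibbs distribution in Eq. (4) rewards.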
The result is

p_W(x) = Σ_κ γ_κ exp(−½ xᵀ Wᵀ D_κ⁻¹ W x) / √|2π(Wᵀ D_κ⁻¹ W)⁻¹|    (5)

where γ_κ in general depends on W and on D_κ for all κ through the following relation: Setting ξ_κ = (|2π(Wᵀ D_κ⁻¹ W)⁻¹| / |2πD_κ|)^{1/2}, we can write γ_κ = ρ_κ ξ_κ / Σ_κ ρ_κ ξ_κ. Note that in the square case ξ_κ = 1 and therefore γ_κ = ρ_κ, and in the case of SMoG priors we obtain ξ_κ = (2π/σ_κ²)^{(D−M)/2}. The estimated optimal filtering matrix Ŵ is determined by Ŵ = arg min_W KL(p_W || p₀), in which the gradient of the KL-divergence can be calculated analytically when the prior is chosen to be a mixture of Gaussians or some other analytically appealing distribution.

3.2 Maximum Likelihood

In the ML setup presented here we assume a generative model X = AS + Γ, where we, in order to be able to deal with the overcomplete case, have added white Gaussian noise Γ. In the end we let the noise variance go to zero to obtain the noiseless result. Assuming white Gaussian noise and the prior on the sources p_s(s) to be a MoG with covariance matrices D_κ and weights ρ_κ, we can complete the integration and write the distribution over x as

p(x) = ∫ p(x|s) p(s) ds = Σ_κ ρ_κ √|2πΦ_κ⁻¹| / (√|2πΣ| √|2πD_κ|) · exp(−½ xᵀ Ψ_κ x)

where Σ = σ² I is the noise covariance matrix, Φ_κ = Aᵀ Σ⁻¹ A + D_κ⁻¹ and Ψ_κ = Σ⁻¹ − Σ⁻¹ A Φ_κ⁻¹ Aᵀ Σ⁻¹. We now want to consider the limit σ² → 0, but we need to do this with great care, since otherwise crucial details will vanish in the approximation. Using the Woodbury identity, singular value decomposition and some very good approximations (see the appendix for details), we obtain the following limits

Ψ_κ → (A D_κ Aᵀ)⁻¹,    √|2πΦ_κ⁻¹| / √|2πΣ| → w_κ / √|AAᵀ|

Here w_κ is a constant with respect to σ², but depends on D_κ and on A through the unique orthogonal matrix V which fulfills the eigenvalue equation AᵀAV = VΛ, such that the eigenvalues in Λ are decreasing in size down the diagonal. The expression for w_κ is w_κ = Π_{i=M+1}^D (2π / (Vᵀ D_κ⁻¹ V)_{ii})^{1/2}.
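The limit Ψ_κ → (A D_κ Aᵀ)⁻¹ can be checked numerically by inserting a small σ² into the definitions above; the 2 × 3 matrix A and the diagonal D_κ are illustrative.

```python
import numpy as np

# Illustrative overcomplete setting: M = 2 measurements, D = 3 sources.
A = np.array([[1.0, 0.0, 0.5],
              [0.0, 1.0, 0.5]])
D_kappa = np.diag([1.0, 0.5, 2.0])
M = A.shape[0]

def Psi(sigma2):
    """Psi_kappa = Sigma^-1 - Sigma^-1 A Phi_kappa^-1 A^T Sigma^-1."""
    Sigma_inv = np.eye(M) / sigma2
    Phi = A.T @ Sigma_inv @ A + np.linalg.inv(D_kappa)
    return Sigma_inv - Sigma_inv @ A @ np.linalg.inv(Phi) @ A.T @ Sigma_inv

limit = np.linalg.inv(A @ D_kappa @ A.T)   # the claimed zero-noise limit
gap = np.linalg.norm(Psi(1e-6) - limit)
```

As σ² shrinks, Ψ_κ approaches (A D_κ Aᵀ)⁻¹; note the cancellation of large terms in the definition, which is exactly why the paper warns that the limit must be taken with care.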
With this limit taken care of, we can write the noiseless expression for p(x) as

p_A(x) = Σ_κ ρ_κ exp(−½ xᵀ (A D_κ Aᵀ)⁻¹ x) / √|2π A D_κ Aᵀ|    (6)

where the weights somewhat surprisingly turn out to be the same as those of the prior (see the appendix for details). The estimated generative mixing matrix Â is the matrix maximizing the log likelihood, Â = arg max_A ln p(X|A). Note that we have not estimated the sources in this process, only the generative mixing matrix.

4 Comparing EBM and ML

We now compare the EBM and ML approaches derived in the previous section and discuss the significance of the differences and similarities. In the square case we obtain total equivalence of all expressions by setting W = A⁻¹, and thus, not surprisingly, we can conclude, as Teh et al. do in [8], that the two approaches are equivalent when the number of observations equals the number of sources. Therefore the discussion and comparison in this section is almost entirely concerned with the overcomplete case. In the overcomplete case it is not obvious how one should compare results on the generative matrix A and the filtering matrix W. The filtering approach does not retrieve the original sources, since for any matrix W we have WA_g ≠ I because of the dimensionality: it is impossible to construct D mutually orthogonal M-dimensional vectors when D > M. And we cannot in general compare the filter matrix W with the pseudo-inverse of A, since this is not the optimal solution in all cases [5]. But using a MoG as prior, both the model distribution p_W(x) of the EBM and the likelihood p_A(x) of the ML become MoGs with parameters which must be estimated to fit a common data set X. In fact we end up with two optimizations which look rather similar,

0 = Σ_{t=1}^N ∂ ln p_W(x_t)/∂W,    0 = Σ_{t=1}^N ∂ ln p_A(x_t)/∂A

The similarity is to some extent both genuine and deceptive: both distributions are MoGs, but the dependencies of the weights and covariances on W and A are different.
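One half of this comparison, the covariances, can already be probed numerically: for spherical component covariances, filtering with the Moore-Penrose pseudo-inverse W = A⁺ reproduces the ML covariances. The concrete 2 × 4 mixing matrix and variance below are illustrative.

```python
import numpy as np

# Illustrative overcomplete mixing: M = 2 measurements, D = 4 sources.
A = np.array([[1.0, -0.5, 0.3, 2.0],
              [0.2, 1.0, -1.0, 0.4]])
W = A.T @ np.linalg.inv(A @ A.T)   # Moore-Penrose pseudo-inverse A+ (D x M)

sigma2 = 0.7                       # one SMoG component variance (illustrative)
D_kappa = sigma2 * np.eye(A.shape[1])

ebm_cov = np.linalg.inv(W.T @ np.linalg.inv(D_kappa) @ W)   # (W^T D^-1 W)^-1
ml_cov = A @ D_kappa @ A.T                                   # A D A^T
```

The two covariances coincide exactly in this case, which is the covariance-equality condition analyzed for SMoG priors below.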
In this section we compare EBM and ML by comparing the covariances and weights of their MoGs.

Figure 2: The weights for EBM and ML. The plot demonstrates, for SMoG priors, that the differences in weights increase dramatically when the setting gets strongly overcomplete; curves are shown for the square case and for d = 1, 2, 10. (See the text for details.)

4.1 Spherical MoG

We now consider the special case where the priors are SMoG, i.e. the covariances are given by D_κ = σ_κ² I. In this case the sources are not assumed to be independent, which is interesting in its own right, but it also serves as a clear example of properties which hold for the more general cases as well. When we assume the priors to be SMoG, the model distribution simplifies significantly,

p_W(x) = Σ_κ γ_κ exp(−½ xᵀ Wᵀ D_κ⁻¹ W x) / √|2π(Wᵀ D_κ⁻¹ W)⁻¹|,    γ_κ = ρ_κ (1/σ_κ²)^d / Σ_κ ρ_κ (1/σ_κ²)^d    (7)

where d = (D − M)/2.

The Covariances. The covariances of EBM and ML can in fact become equal in the case of SMoG. The equation setting the covariances equal,

(Wᵀ D_κ⁻¹ W)⁻¹ = A D_κ Aᵀ    ∀κ    (8)

translates into WᵀW AAᵀ = I, which is fulfilled for W = A⁺, where A⁺ denotes the pseudo-inverse (Moore-Penrose) of the matrix A. When A has full rank, the pseudo-inverse is given by A⁺ = Aᵀ(AAᵀ)⁻¹. But the equation is also fulfilled for any matrix W = UA⁺, where U is orthogonal, and thus there is an entire family of matrices which make the covariances of the EBM equal to the covariances of the ML for a given A. Conversely, for any W we can choose A = W⁺ to obtain the same covariances, and in this sense the two approaches have equal flexibility with respect to adjusting the covariances to the data. This is evident in the 4 × 2 example in Fig 3 a) and b).

Figure 3: The covariances in case of SMoG priors for different settings. Plot a) The generative A_g.
Plot b) The pseudo-inverse A_g⁺ of the generative matrix. Plot c) The estimated A (ML). Plot d) The estimated W (EBM).

The Weights. The weights can be compared by examining the ratio γ_κ/ρ_κ, and as we shall see, this ratio differs strongly from 1 in most cases. From the expression for γ_κ in Eq. 7 we see that the constraint making the weights of ML and EBM equal is

(1/σ_κ²)^d / Σ_κ ρ_κ (1/σ_κ²)^d =? 1    ∀κ

which is clearly impossible when the variances σ_κ² are different for different κ. Thus, in the SMoG case, the weights of EBM and ML cannot be equal, and furthermore the ratio γ_κ/ρ_κ becomes relatively large for those κ where σ_κ² is very small, and vice versa. Fig 2 is a general illustration of this. Here we assume a SMoG prior with two components which has σ₁² = 1 and ρ₁ = ρ₂ = 0.5, and let σ₂² vary between 0 and 2. The resulting ratio γ₂/ρ₂ is shown in Fig 2: the x-axis is σ₂², the y-axis is γ₂/ρ₂, and the four curves are plotted for different degrees of overcompleteness, d = 0 (square) and d = 1, 2, 10 (overcomplete). Clearly, only in the square case and for σ₂² values close to 1 is the ratio reasonably close to 1. Another, more specific example is shown in Fig 3 c) and d), which shows the estimated covariances for ML and EBM in a 4 × 2 case. In this example the effect of the enhanced weight on smaller covariances is clear: when the weight of the smaller covariance is strong, the points far from the origin are considered extreme, and the covariances are expanded accordingly. Thus, for this reason, EBM seems to favor larger covariances compared to ML.

5 Summary and Acknowledgements

In conclusion, the use of mixtures of Gaussians made it possible to compare Maximum Likelihood with Energy Based Models. The results show that in the overcomplete case with Spherical Mixtures of Gaussians as priors, the Energy Based Model is biased toward larger covariances compared to Maximum Likelihood. One can show that this effect is also present for IMoG priors, though not nearly as strongly.
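The weight discrepancy described above is easy to reproduce: for a two-component SMoG prior with ρ₁ = ρ₂ = 0.5 and σ₁² = 1 (the setup behind Fig 2), the ratio γ₂/ρ₂ equals 1 only in the square case.

```python
import numpy as np

def gamma_over_rho2(sigma2_2, d, sigma2_1=1.0):
    """Ratio gamma_2 / rho_2 for a two-component SMoG prior with
    rho_1 = rho_2 = 0.5 and the Eq. 7 weights, d = (D - M) / 2."""
    rho = np.array([0.5, 0.5])
    xi = np.array([(1.0 / sigma2_1) ** d, (1.0 / sigma2_2) ** d])
    gamma = rho * xi / np.sum(rho * xi)
    return gamma[1] / rho[1]

ratio_square = gamma_over_rho2(0.5, d=0)   # square case: the weights agree
ratio_d2 = gamma_over_rho2(0.5, d=2)       # overcomplete: small variance inflated
```

With σ₂² = 0.5 and d = 2 the ratio is 1.6, i.e. the small-variance component is weighted 60% more strongly by the EBM weights than by the prior weights used by ML.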
Finally I need to give due credit: this paper is closely related to results of earlier work together with Jiucang Hao and Te-Won Lee from the University of California San Diego (UCSD).

A Details of the Calculations

When calculating the limit of Ψ_κ, we first insert the definition Σ = σ² I and simplify the expression into

Ψ_κ = (1/σ²) [I − A(AᵀA + σ² D_κ⁻¹)⁻¹ Aᵀ]

Now, defining Q = A D_κ Aᵀ / σ², we can use the Woodbury identity for inverse matrices and obtain

A(AᵀA + σ² D_κ⁻¹)⁻¹ Aᵀ = Q − Q(I + Q)⁻¹ Q

Since Q is symmetric and very large compared to I, the right-hand side can be approximated by I − Q⁻¹. To see this, write Q = VΛVᵀ/σ² for an orthogonal V and diagonal Λ, and use the identity x − x²/(1+x) = 1 − 1/(1+x) to obtain

Q − Q(I + Q)⁻¹ Q = I − V diag(1/(1 + Λ_ii/σ²)) Vᵀ ≅ I − V diag(1/(Λ_ii/σ²)) Vᵀ = I − Q⁻¹.

Inserting this into the expression for Ψ_κ gives the desired result.

When calculating the limit of the fraction containing the determinant |2πΦ_κ⁻¹|, we use the fact that since AᵀA is symmetric there exists an orthogonal V such that AᵀA = VΛVᵀ. Since |V| = 1 we get

|2πΦ_κ⁻¹| / |2πΣ| = (2πσ²)^{D−M} / |Λ + σ² Vᵀ D_κ⁻¹ V|

Since σ² is assumed arbitrarily small, only the diagonal of the sum of matrices contributes significantly to the determinant. And since furthermore AᵀA has rank M, the first M elements of the diagonal will dominate, together with the remaining D − M factors,

|Λ + σ² Vᵀ D_κ⁻¹ V| ≅ Π_{i=1}^M Λ_ii · Π_{j=M+1}^D (σ² Vᵀ D_κ⁻¹ V)_{jj}

Inserting this into the fraction above gives the desired result. The coefficients α_κ have the structure α_κ = ρ_κ w_κ √|2π A D_κ Aᵀ| / (√|AAᵀ| √|2πD_κ|). In the square case much of the difficulty of taking the noiseless limit disappears, w_κ = 1, and we easily obtain α_κ = ρ_κ. In the case of SMoG, setting D_κ = σ_κ² I, we obtain w_κ = (2πσ_κ²)^{(D−M)/2} and therefore α_κ = ρ_κ. Supported by numerical results, we conjecture that this also holds in the general overcomplete case.

References

[1] H.
Attias, Independent Factor Analysis, Neural Computation, 11, 803-851, 1999. [2] M. Girolami, A Variational Method for Learning Sparse and Overcomplete Representations, Neural Computation, 13(11), pp. 2517-2532, 2001. [3] G. E. Hinton, Training Products of Experts by Minimizing Contrastive Divergence, Neural Computation, 14(8):1771-1800, 2002. [4] P. Hojen-Sorensen, O. Winther, L. K. Hansen, Mean-Field Approaches to Independent Component Analysis, Neural Computation, 14(4), April 2002. [5] M. S. Lewicki, T. J. Sejnowski, Learning Overcomplete Representations, Neural Computation, 12(2):337-365, 2000. [6] R. M. Neal, Probabilistic Inference Using Markov Chain Monte Carlo Methods, Technical Report CRG-TR-93-1, Department of Computer Science, University of Toronto, 1993. [7] O. Shriki, H. Sompolinsky, D. D. Lee, An Information Maximization Approach to Overcomplete and Recurrent Representations, Neural Info. Proc. Sys. 13, 2001. [8] Y. W. Teh, M. Welling, S. Osindero, G. Hinton, "Energy-Based Models for Sparse Overcomplete Representations", The Journal of Machine Learning Research, Special Issue on Independent Components Analysis, guest edited by Te-Won Lee, Jean-François Cardoso, Erkki Oja and Shun-ichi Amari, 2003.

Stochastic differential equations in image warping
Bo Markussen
Department of Computer Science, University of Copenhagen, DK-2100 Copenhagen
email: [email protected]

Abstract. This paper is concerned with image warping as applied in e.g. non-rigid registration. We use the theory of stochastic differential equations to give a precise mathematical description of the Brownian warps model proposed by Nielsen et al. Furthermore we introduce a renormalization technique allowing for Bayesian inference of the warp flow given e.g. the observation of matching landmarks in two images.
As a final result we show that the maximum a posteriori estimator is equivalent to the minimal energy estimator derived by Joshi & Miller using flows given by deterministic differential equations.

1 Introduction

Image warping is the diffeomorphic transformation of the domain on which an image is represented. A typical application is the registration problem. Suppose for instance that several images of the same object have been taken with different devices. Each device records different properties of the object, and we wish to combine the different images into a joint description of the object. In practice the images are often not aligned, whence it would be improper simply to stack them on top of each other. However, if we are allowed to deform the images (imagine the images being recorded on pieces of rubber, and that we may deform the pieces of rubber in any possible way without introducing folds or tears), then it might be possible to achieve alignment. In some cases the needed transformations are non-rigid, leading to a non-rigid registration problem. This paper is concerned with a probabilistic description of this problem. In particular we consider the statistical problem of finding the maximum a posteriori estimator of the warps given a set of matching landmarks. Landmarks are points which are identifiable in all the images. We hence know that these landmarks should be matched by the warps. Given this information we seek to estimate the remaining registration. A related application would be the estimation of the image flow given a sequence of matched landmarks in a sequence of images. This will be a purely theoretical paper.
The results are threefold: 1) a stochastic differential equation description of the least committed prior for Brownian warps proposed in [7], 2) the introduction of a mathematically rigorous renormalization technique allowing the computation of a maximum a posteriori estimator, 3) equivalence of the maximum a posteriori and the minimum energy warp found in [4]. These results have recently been presented in the paper [1] at the Second Workshop on Generative Model Based Vision at the IEEE Conference on Computer Vision and Pattern Recognition. Mathematically, a warp between two d-dimensional images (in applications we typically have d = 2 or d = 3) is a diffeomorphism φ: R^d → R^d. Points x ∈ R^d are usually understood as column vectors x = {x_i}_{i=1,...,d}. Similarly, the coordinates of a matrix a ∈ R^{d×d} are denoted by a = {a_ij}_{i,j=1,...,d}. The transpose of a vector/matrix A is denoted by A*, probabilistic expectation is denoted by E, and the set of natural numbers is denoted by N.

2 The first principle model

In accordance with the physical "piece of rubber" analogy it is natural to realize the warping as a dynamical process. At time t = 0 we have the original image, and at time t = T we have the final image. At a time point 0 < t < T we have an intermediate warp. We thus introduce a doubly indexed family of warps φ(·, s, t): R^d → R^d with 0 ≤ s ≤ t ≤ T. The warp φ(·, s, t) gives the warping between times s and t. The physical interpretation implies the flow properties

φ(x, s, s) = x,    φ(x, s, t) = φ(φ(x, s, r), r, t)    for s ≤ r ≤ t.

The next step is to realize φ(·, s, t) as a stochastic process. The axiom stated below was proposed in [7] and parallels the assumptions leading to the Gaussian distribution and Brownian motion. A Brownian motion is a stochastic process B with independent Gaussian distributed increments B(t) − B(s) ∼ N(0, t − s). The differential dB(t) can be interpreted as Gaussian white noise of order √dt.

Axiom.
The diffeomorphisms φ(·, t_{h−1}, t_h) are stochastically independent for every partition 0 = t_0 < t_1 < ··· < t_H = T, and the distribution of φ(·, s, t) depends only on the time length t − s.

The Axiom together with the flow property implies a central limit theorem for the Jacobians {∂φ_i(x)/∂x_j}_{i,j=1,...,d}. These Jacobians were employed in [7] for the landmark matching problem. Unfortunately the limiting distribution is very difficult to analyze, and has to our knowledge only been described in dimension d = 2 in [3]. The underlying probabilistic model can be rigorously defined using stochastic calculus. The physical interpretation of Eq. (3) given below is that the "piece of rubber" is influenced by infinitely many random forces modelled by Brownian motions B_n, n ∈ N. The n'th force works in the direction f_n(x) at the spatial position x and has size dB_n(t) at time t. We will use stochastic integrals such as ∫₀^T X(t) dB(t), which loosely speaking is the limit of the incremental sums

Σ_{h=1}^H X(t_{h−1}) (B(t_h) − B(t_{h−1})).

However, some care has to be taken. In particular there is a distinction between the so-called Itô and Stratonovich integrals. The essential reference for the results employed in this paper is the monograph [5]. But be warned, this monograph can be quite hard to read. Let ∘dB_n(t) denote the Stratonovich differential at time t.

Theorem 1. In addition to the Axiom, assume for every x, y ∈ R^d and s ∈ [0, T] the existence of the limits

E[φ(x, s, t) − x] / (t − s) → b(x)    as t → s, t > s,
E[(φ(x, s, t) − x)(φ(y, s, t) − y)*] / (t − s) → a(x, y)    as t → s, t > s,

and functions f₀ and f_n, n ∈ N, such that

a(x, y) = Σ_{n∈N} f_n(x) f_n(y)* ∈ R^{d×d},    (1)
b(x) = f₀(x) + ½ {Σ_{j=1}^d ∂a_ij(x, y)/∂x_j |_{y=x}}_{i=1,...,d} ∈ R^d.    (2)

Under some additional regularity conditions there exist stochastically independent Brownian motions B_n, n ∈ N, such that φ(x, s, t) solves the stochastic differential equation given in integral form by

φ(x, s, t) = x + ∫_s^t f₀(φ(x, s, u)) du + Σ_{n∈N} ∫_s^t f_n(φ(x, s, u)) ∘dB_n(u).    (3)

Note that the statistical properties of the flow of diffeomorphisms φ(x, s, t) are encoded by the infinitesimal mean and covariance b(x) and a(x, y), and that the actual appearance of the flow is encoded by the Brownian motions B_n. In the remainder of this paper the functions b(x) and a(x, y) are assumed to be known a priori.

3 The maximum a posteriori estimator

The idea behind the regularized maximum a posteriori estimator is to approximate the infinite dimensional Brownian motion B = {B_n(t)}_{n∈N, t∈[0,T]} by the finite dimensional random variable

B_N = {B_n(mT/N)}_{n,m=1,...,N} ∈ R^{N×N}.

The N²-dimensional Gaussian vector B_N has probability density

p_N(B_N) = Π_{n=1}^N Π_{m=1}^N √(N/(2πT)) exp(−N (B_n(t_m^N) − B_n(t_{m−1}^N))² / (2T)).

For B ∈ C¹([0, T]; R^N) we define the renormalized Brownian density as the limit

p(B) = lim_{N→∞} p_N(B_N) = exp(−½ Σ_{n∈N} ∫₀^T |∂B_n(t)/∂t|² dt).    (4)

Let (x_l, y_l) ∈ R^d × R^d, l ∈ L, be a given set of landmarks. With t_m = mT/N, let the approximative flow φ_N(·, t_m, t_k) and the approximative maximum a posteriori estimator B̂^N be defined by

φ_N(x, t_m, t_k) = x + Σ_{h=m}^{k−1} f₀(φ_N(x, t_m, t_h)) T/N + Σ_{n=1}^N Σ_{h=m}^{k−1} f_n(φ_N(x, t_m, t_h)) (B_n(t_{h+1}) − B_n(t_h)),

B̂^N = arg max { p_N(B_N) | B_N ∈ R^{N×N} : φ_N(x_l, 0, T) = y_l for l ∈ L },

let the renormalized flow φ̄(·, s, t) and the renormalized maximum a posteriori estimator B̂ be defined by

φ̄(x, s, t) = x + ∫_s^t f₀(φ̄(x, s, u)) du + Σ_{n∈N} ∫_s^t f_n(φ̄(x, s, u)) dB_n(u),

B̂ = arg max { p(B) | B ∈ C¹([0, T]; R^N) : φ̄(x_l, 0, T) = y_l for l ∈ L },

and let the maximum a posteriori estimator φ̂(·, s, t) be defined by the ordinary differential equation

φ̂(x, s, t) = x + ∫_s^t f₀(φ̂(x, s, u)) du + Σ_{n∈N} ∫_s^t f_n(φ̂(x, s, u)) dB̂_n(u).    (5)

Then it can be shown that φ_N(x, s, t) converges to φ̄(x, s, t) and that B̂^N converges to B̂ as N → ∞. These convergence results indeed justify calling φ̂(·, s, t) the maximum a posteriori estimator. The main novelty of the paper [1] was the awareness of the need to consider the convergence of B̂^N to B̂.
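A minimal one-dimensional sketch of the discretized flow φ_N above: explicit Euler steps on a time grid with a single illustrative force f₁ instead of infinitely many, and Itô-style increments rather than the Stratonovich differential.

```python
import numpy as np

rng = np.random.default_rng(1)

def f0(x):
    return -0.1 * x          # drift term (illustrative choice)

def f1(x):
    return 0.2 * np.sin(x)   # spatial profile of the single random force (illustrative)

T, N = 1.0, 1000
dt = T / N
dB = rng.standard_normal(N) * np.sqrt(dt)   # increments B(t_{h+1}) - B(t_h)

def flow(x, h_start, h_end):
    """phi_N(x, t_{h_start}, t_{h_end}) stepped on the grid, reusing dB."""
    for h in range(h_start, h_end):
        x = x + f0(x) * dt + f1(x) * dB[h]
    return x

x0 = 0.7
direct = flow(x0, 0, N)
composed = flow(flow(x0, 0, N // 2), N // 2, N)
```

Because the same increments dB are reused, the discrete flow satisfies the flow property exactly, mirroring φ(x, s, t) = φ(φ(x, s, r), r, t).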
The relations between these properties are gathered in a commutative diagram in [1], connecting the discretized flow φ_N, the flows φ, φ̄ and φ̂, and the driving noises B, B̂^N and B̂ through approximation, convergence, renormalization, regularization, SDE/ODE and MAP steps.

Observe that the discretizations done in this section are only an intermediate step in the theoretical analysis. However, the differential equation for the flow φ̂(·, s, t) will typically be discretized for the numerical implementation. Moreover, working directly with the covariance function a(x, y) instead of the functions f_n(x), we can circumvent the infinite sum over n ∈ N in Eq. (5). For these results, a specific algorithm for the landmark matching problem and a discussion of the choice of the functions b(x) and a(x, y), see [1].

4 Interpretation of energy minimization approaches

The theory of differential geometry gives that for every diffeomorphism φ there exists a velocity field v: R^d × [0, T] → R^d such that φ(x) = φ(x, 0, T) for the solution to the transport equation

φ(x, s, t) = x + ∫_s^t v(φ(x, s, u), u) du.    (6)

An alternative approach to warp estimation is to define some energy functional over velocity fields and choose the minimal energy velocity field fitting the landmarks, cf. [4] and [2]. Given some differential operator L, e.g. the Laplacian, the paper [4] proposes to use the energy functional

E(v) = ∫₀^T ∫_{R^d} ‖L v(x, t)‖² dx dt.

For vanishing f₀(x) the velocity field corresponding to the landmark paths {φ̂(x_l, 0, t)}_{l∈L} is given by

v(x, t) = Σ_{k,l∈L} a(x, φ̂(x_l, 0, t)) [a(φ̂(x_k, 0, t), φ̂(x_l, 0, t))]⁻¹_{lk} ∂φ̂(x_k, 0, t)/∂t,

where [·]⁻¹_{lk} denotes the (l, k) block of the inverse of the block matrix {a(φ̂(x_k, 0, t), φ̂(x_l, 0, t))}_{k,l∈L}. If the infinitesimal covariance a(x, y) equals the Green's function for the square of the differential operator L, i.e.
L*L a(x, y) = δ_{x=y}, then

L*L v(x, t) = Σ_{k,l∈L} δ_{x=φ̂(x_l,0,t)} [a(φ̂(x_k, 0, t), φ̂(x_l, 0, t))]⁻¹_{lk} ∂φ̂(x_k, 0, t)/∂t

and it is shown in [1] that the energy functional E(v) equals minus the logarithm of the renormalized Brownian density for B̂. In this way the maximum a posteriori estimator equals the minimal energy estimator. This result gives 1) a probabilistic interpretation of the energy minimization methods proposed in [4] and [2], and 2) the "statistical equivalence" of the energy minimization methods to the Bayesian approach proposed in [7].

Acknowledgement: I am grateful to Peter Johansen and Mads Nielsen for introducing me to these intriguing problems.

References

[1] B. Markussen, "A Statistical Approach to Large Deformation Diffeomorphisms," Second Workshop on Generative Model Based Vision at the 22nd IEEE Conference on Computer Vision and Pattern Recognition, Washington D.C., 2004. [2] V. Camion and L. Younes, "Geodesic Interpolating Splines," Third International Workshop on Energy Minimization Methods in Computer Vision and Pattern Recognition, EMMCVPR 2001, LNCS 2134, pp. 513-527, 2001. [3] A. D. Jackson, B. Lautrup, P. Johansen, and M. Nielsen, "Products of Random Matrices," Phys. Rev. E, Vol. 66, article 66124, 2002. [4] S. C. Joshi and M. I. Miller, "Landmark Matching via Large Deformation Diffeomorphisms," IEEE Trans. Image Processing, Vol. 9, pp. 1357-1370, 2000. [5] H. Kunita, Stochastic Flows and Stochastic Differential Equations, Cambridge University Press, 1990. [6] M. I. Miller and L. Younes, "Group Actions, Homeomorphisms, and Matching: A General Framework," Int. J. Computer Vision, Vol. 41, pp. 61-84, 2002. [7] M. Nielsen, P. Johansen, A. D. Jackson, and B. Lautrup, "Brownian Warps: A Least Committed Prior for Non-rigid Registration," Fifth International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI 2002, LNCS 2489, pp. 557-564, 2002. [8] C. J. Twining and S.
Marsland, “Constructing Diffeomorphic Representations of Non-rigid Registrations of Medical Images,” 18th International Conference on Information Processing in Medical Imaging, IPMI 2003, LNCS 2732, pp. 413-425, 2003.

38 Probabilistic Model-based Background Subtraction

J. Anderson¹, T. Prehn¹, V. Krüger¹, and A. Elgammal²
¹ Aalborg University Esbjerg, Niels Bohrs Vej 8, 6700 Esbjerg, Denmark
² Dept. of Computer Science, Rutgers, the State University of New Jersey, 110 Frelinghuysen Road, Piscataway, NJ 08854-8019, USA

Abstract. Usually, background subtraction is approached as a pixel-based process, and the output is a (possibly thresholded) image where each pixel reflects, independently of its neighboring pixels, the likelihood of itself belonging to a foreground object. What is neglected thereby is the correlation between pixels. In this paper we introduce a model-based background subtraction approach which exploits prior knowledge of pixel correlations for clearer and better results. Model knowledge is learned from suitable training video data, and the data is stored for fast access in a hierarchical manner. Bayesian propagation over time is used for proper model selection and tracking during model-based background subtraction. Bayesian propagation is attractive in our application as it allows us to deal with uncertainties during tracking. The system runs in real time and was extensively tested on suitable outdoor video data.

1 Introduction

Companies and scientists work on vision systems that are expected to work in real-world scenarios. Car companies work, e.g., on road sign and pedestrian detection, and due to the threat of terrorism, biometric vision systems and surveillance applications are under development. All these approaches work well in controlled environments; e.g., attempts to recognize humans by their face and gait have proven very successful in a lab environment.
However, in uncontrolled environments, such as outdoor scenarios, the approaches fail disgracefully; e.g., the gait recognition rate drops from 99% to merely 30%. This is mainly due to low-quality video data, the often small number of pixels on target, and visual distractors such as shadows and strong illumination variations. What is needed are special feature extraction techniques that are robust to outdoor distortions and that can cope with low-quality video data. One of the most common feature extraction techniques in surveillance applications is background subtraction (BGS) [5; 3; 9; 6]. BGS approaches assume a stable camera. They are able to learn a background as well as possible local image variations of it, thus generating a background model even of non-rigid background objects. During application the model is compared with novel video images, and pixels are marked according to the belief that they fit the background model. Generally, BGS methods have the following drawbacks:

1. BGS techniques are able to detect “interesting” image areas, i.e., image areas that are sufficiently different from the learned background model. Thus, BGS approaches are able to detect, e.g., a person and the shadow that he/she throws. However, BGS approaches are not able to distinguish between a foreground object and its shadow.
2. Very often, the same object causes a different output when the scenario changes: e.g., the BGS output for a person walking on green grass or gray concrete may be different.

In this paper we present a Model-based Background Subtraction (MBGS) method that learns not only a background model but also a foreground model. The “classical” background subtraction detects the regions of interest, while the foreground models are applied to the classical BGS output to “clean up” possible noise. To reach a maximum of robustness, we omit any thresholding and apply statistical matching techniques to the likelihood measures of the BGS.
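For reference, the pixel-based baseline that MBGS extends can be sketched with a single Gaussian per pixel. The class below is an illustrative toy, not the actual AUE or UMD implementation; all names and parameters are invented for the sketch:

```python
import numpy as np

class PixelGaussianBGS:
    """Minimal pixel-based background subtraction: one Gaussian per pixel.

    Each pixel is treated independently of its neighbors; the output is a
    likelihood image where high values indicate probable foreground.
    """

    def __init__(self, shape, alpha=0.05, init_var=15.0**2):
        self.mean = np.zeros(shape)
        self.var = np.full(shape, init_var)
        self.alpha = alpha  # learning rate for the running background model

    def update(self, frame):
        """Update the background model and return a foreground belief image."""
        frame = frame.astype(float)
        # Squared Mahalanobis distance of each pixel to its background Gaussian.
        d2 = (frame - self.mean) ** 2 / self.var
        # Belief of being foreground: 1 - Gaussian membership.
        foreground = 1.0 - np.exp(-0.5 * d2)
        # Running update of mean and variance (exponential forgetting).
        self.mean = (1 - self.alpha) * self.mean + self.alpha * frame
        self.var = (1 - self.alpha) * self.var + self.alpha * (frame - self.mean) ** 2
        return foreground

bgs = PixelGaussianBGS((4, 4))
background = np.full((4, 4), 100.0)
for _ in range(50):          # learn a static background
    bgs.update(background)
frame = background.copy()
frame[1:3, 1:3] = 200.0      # a bright "object" appears
belief = bgs.update(frame)
print(belief[2, 2] > 0.9, belief[0, 0] < 0.1)
```

Because every pixel is handled in isolation, shadows and occlusions distort the output, which is precisely the property the model-based approach compensates for.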
With the recent gait recognition attempts in mind, we have applied the following limitations to our discussion (however, the methodology is general enough that we do not see any loss of generality in these limitations):

1. We consider only humans as objects and ignore objects that look different from humans.
2. We limit our discussion to silhouettes of humans, as they deliver a fairly clothing-independent description of an individual.

The Model-based Background Subtraction System (MBGS System) consists of a learning part to learn possible foreground objects and a MBGS part, where the output of a classical BGS is verified using the previously trained foreground object knowledge. Learning and representing the foreground knowledge (here silhouettes of humans) is not straightforward due to the absence of a suitable vector space. One possibility is to describe the data in a hierarchical manner, using a suitable metric and a suitable representation of the dynamics between the silhouettes. We interpret the silhouettes as probability density functions S(X, Y) where each pixel describes the likelihood of belonging either to foreground or to background. To compare these “probabilistic” silhouettes we use the Kullback-Leibler distance, and k-means clustering is used to cluster similar ones. Similar approaches for hierarchical contour representation can be found in [4; 14]. In the second part, we again consider the contours as densities over spatial coordinates and use normalized correlation to compute the similarity between the silhouette density and the computed one in the input image. Tracking and silhouette selection are done using Bayesian propagation over time. Bayesian propagation can be applied directly since we are dealing with densities, and it has the advantage that it considers the uncertainty in the estimation of the tracking parameters and the silhouette selection.
The densities in the Bayesian propagation are approximated using an enhancement of the well-known Condensation method [7]. A similar enhancement of the Condensation method has been applied in video-based face recognition [12]. The remainder of this paper is organized as follows: In Sec. 2 we introduce the learning approaches. The actual BGS method is discussed in Sec. 3. We discuss our experimental results in Sec. 4 and conclude with final remarks in Sec. 5.

2 Learning and Representation of Foreground Objects

In order to make use of foreground model knowledge, the MBGS system needs to be able to:

– learn a model representation for possibly a number of different and non-rigid objects from video data and
– quickly access the proper model information during application of the MBGS.

Our main idea is the following: Apply the classical BGS to a scenario that is controlled in a manner that facilitates the learning process. In our case, since we want to learn silhouettes of humans, this means that only humans are visible in the scene during training and that the background variations are kept as small as possible to minimize distortions. Then, use this video data to learn the proper model knowledge. After the application of a classical BGS, applying mean-shift tracking [1] allows us to extract from the BGS output data a sequence of small image patches containing, centered, the silhouette. This procedure is the same as the one presented in [10], however, with the difference that here we do not threshold the BGS output but use probabilistic silhouettes (instead of binary ones as in [10]) which still contain for each silhouette pixel the belief of it being a foreground pixel. To organize this data we first normalize the exemplars w.r.t. scale and position. Then we use, similar to [4], a combination of tree structuring and k-means clustering. We use a top-down approach: The first level is the root of the hierarchy, which contains all the exemplars.
Then the second level is constructed by using k-means clustering to cluster the exemplars from the root. The third level is constructed by clustering each cluster from the second level, again using k-means; see Fig. 1 for an example. The k-means clustering uses the Kullback-Leibler divergence, which measures the similarity between two density functions p and q:

    \mathrm{KLDist}(p, q) = \int p(x) \log \frac{p(x)}{q(x)} \, dx.    (1)

KLDist(p, q) is non-negative and zero only if p and q coincide.

Fig. 1. An example of our clustering approach: 30 exemplars with K=3; the algorithm stops after reaching 3 levels.

See Fig. 2 for an example clustering of a training sequence of a single individual (as training data, we chose here non-optimal data for better visualization). The tree structure facilitates a fast search of exemplars along the tree vertices,

Fig. 2. The five images show the cluster centers computed from a training sequence of a single individual.

and the cluster centers are either used to apply MBGS on a coarse level or they are used as prototypes, as in [4], to direct the search to a finer level in the hierarchy. Once the tree is constructed, we generate a Markov transition matrix: Assuming that the change over time from one silhouette to the next can be understood as a first-order Markov process, the Markov transition matrix M^l_{ij} describes the transition probability of silhouette s_j following silhouette s_i at level l in the hierarchy. During MBGS application particle filtering [8; 11; 2] will be used to find the proper silhouette (see Sec. 3). The propagation of silhouettes over time (see Sec. 3) is non-trivial, as silhouettes do not form a vector space. However, what is sufficient is a (not necessarily symmetric) metric space, i.e., given a silhouette s_i, only silhouettes that are close according to a given (not necessarily symmetric) metric need to be considered. In the tree structure similar contours are clustered, which facilitates the propagation process.
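The Kullback-Leibler comparison of Eq. (1) can be sketched on discretized silhouette densities; the toy 1-D densities below are illustrative only, standing in for the paper's pixel grids:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """Discrete Kullback-Leibler divergence KLDist(p, q) of Eq. (1),
    computed over grids interpreted as probability masses."""
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

# Three toy "probabilistic silhouettes" on a 1-D grid (illustrative only).
a = np.array([0.1, 0.8, 0.1])
b = np.array([0.15, 0.7, 0.15])   # similar shape to a
c = np.array([0.8, 0.1, 0.1])     # different shape

assert kl_divergence(a, a) < 1e-9                   # zero only when densities coincide
assert kl_divergence(a, b) < kl_divergence(a, c)    # similar silhouettes are closer
print(round(kl_divergence(a, b), 4), round(kl_divergence(a, c), 4))
```

Note the asymmetry: KLDist(p, q) ≠ KLDist(q, p) in general, which is consistent with the text only requiring a not necessarily symmetric metric.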
The Markov transition matrix M_{ij}, on the other hand, directly describes the transition likelihoods between clusters.

3 Applying Background Subtraction and Recognizing Foreground Objects

The MBGS system is built as an extension to a pixel-based BGS approach. It uses foreground models to define likely correlations between neighboring pixels in the output P(x) of the BGS application. Each pixel in the image P(x) contains a value in the range [0, 1], where 1 indicates the highest probability of a pixel being a foreground pixel. A model in the hierarchy can be chosen and deformed according to a 4-D vector

    θ = [i, s, x, y],    (2)

where x and y denote the position of the silhouette in the image P, s its scale, and i is a natural number that refers to a silhouette in the hierarchy. The “matching” is done by normalized correlation between a model silhouette density, parameterized according to a deformation vector θ_t, and the appropriate region of interest in the BGS image P_t(x), appropriately normalized. In order to find at each time step t the most likely θ_t in the image P_t(x), we use Bayesian propagation over time

    p(θ_t | P_1, P_2, \ldots, P_t) \equiv p_t(α_t, i_t) = p(P_t | α_t, i_t) \sum_{i_{t-1}} \int_{α_{t-1}} p(α_t, i_t | α_{t-1}, i_{t-1}) \, p_{t-1}(α_{t-1}, i_{t-1})    (3)

with α_t = [s, x, y]_t. The probability images are denoted by “P_t” while lowercase “p” denotes density functions: p(P_t | α_t, i_t) denotes the likelihood measure of observation P_t given the parameters α_t and i_t, while p_t(α_t, i_t) denotes the prior at time t. We approximate the posterior density p(θ_t | P_1, P_2, …, P_t) with a sequential Monte Carlo method [2; 7; 11; 13]. Using Bayesian propagation allows us to take into account the uncertainty in the estimated parameters. Monte Carlo methods use random samples for the approximation of a density function. Our MBGS system uses separate sample sets for each object in the input image. A new sample set is constructed every time a new object in the video image matches sufficiently well.
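A minimal sequential Monte Carlo (particle filter) cycle of the kind used to approximate Eq. (3) might look as follows. The sketch tracks only a 1-D position with a Gaussian observation model, whereas the paper's state is the 4-D vector θ = [i, s, x, y]; all numbers are toy values:

```python
import numpy as np

rng = np.random.default_rng(0)

def particle_filter_step(particles, weights, likelihood, sigma=1.0):
    """One predict-weight-resample cycle of a sequential Monte Carlo
    approximation to Eq. (3), for a 1-D state (illustrative sketch)."""
    # Predict: diffuse particles with a Brownian-motion model.
    particles = particles + rng.normal(0.0, sigma, size=particles.shape)
    # Weight: likelihood of the observation given each particle.
    weights = weights * likelihood(particles)
    weights /= weights.sum()
    # Resample proportionally to the weights.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))

# Toy observation model: true object position 10, Gaussian likelihood.
like = lambda x: np.exp(-0.5 * (x - 10.0) ** 2 / 4.0)
particles = rng.uniform(0.0, 20.0, size=500)
weights = np.full(500, 1.0 / 500)
for _ in range(10):
    particles, weights = particle_filter_step(particles, weights, like)
print(abs(particles.mean() - 10.0) < 1.5)
```

The particle cloud concentrates around the true position while retaining spread, which is how the uncertainty in the estimated parameters is represented.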
As the diffusion density p(α_t, i_t | α_{t-1}, i_{t-1}) in Eq. 3 we use a Brownian motion model due to the absence of a better one. For the propagation of the position and scale parameters x, y, and s, this is straightforward. For the propagation of the silhouette we use the following strategy: The likelihood for selecting a silhouette from a certain silhouette cluster in the hierarchy is computed from the Markov transition matrix M by marginalizing over the silhouettes in that particular cluster. Within a cluster, the new silhouette is then chosen randomly. The reason for this is that our training data is too limited, so the Markov transition matrix M appeared to be specific to the training videos.

4 Experiments

In this section we present the experimental results with our MBGS implementation. We have carried out a large number of experiments with both online and offline video data. The experiments have shown that the drawbacks of the classical BGS approach that were mentioned in Section 1 can be remedied with MBGS:

1. Because shadows are not part of the model information, they are classified as background by the implemented MBGS approach. In fact, most non-model object types will be classified as background, and therefore MBGS allows for effective object type filtering.
2. The output presented by the MBGS does not vary even when the scenario changes significantly, and objects are always detected completely even when significantly covered by other objects such as bushes or lamp posts.

The verification is done systematically by comparing the AUE MBGS system with two previously developed pixel-based state-of-the-art background subtraction approaches: One is the non-parametric approach developed at Maryland (UMD BGS) [3]. The second one (AUE BGS) was previously developed at AUE and utilizes an alternative image noise filtering approach. The comparison is done by eyesight due to the absence of ground truth.
We have tested live feeds as well as videos from a database of 50 videos with different scenarios such as parking-lot scenes and park scenes. The videos contained both side views and frontal views of the person.³ In Figure 3 one sees a scenario in one of the test movies with a pedestrian walking behind trees, thereby at times being occluded. The output of the two pixel-based approaches is shown in the lower part of the figure. One can notice that the shadow cast by the pedestrian is classified as foreground by the pixel-based approaches but as background by the model-based approach. Figure 4 shows the same scenario, but this time a scene is shown where the pedestrian is heavily occluded. The occlusion causes the pedestrian to more or less disappear when the pixel-based approaches are used. The scenario in Figure 5 (see also the provided video) shows two pedestrians walking towards each other, thereby crossing behind lamp posts and a statue. When processing this scenario, a combination of image filtering and background variation appears where the silhouettes of the pixel-based approaches are too heavily distorted to still be identified as a person. Both pixel-based approaches also severely distort the contours of the pedestrians. By inspecting only the pixel-based results, it is hard to tell that the foreground objects are actually pedestrians. The system has been tested on a 2 GHz Pentium with Linux. We use RGB videos with an image size of 320 × 240 pixels. With only a single person to track, the system needs ≈ 50 ms/frame with 350 particles; 25 ms/frame were used by the classical BGS and 25 ms/frame for the matching.

³ We have provided the most representative videos that document the limits of our approach.

Fig. 3. BGS approach comparison of the shadow issue.

5 Conclusion

The presented model-based background subtraction system combines classical background subtraction with model knowledge of foreground objects.
The model knowledge is not applied to a binary BGS image but to the “likelihood image,” i.e., an image where each pixel value represents a confidence of belonging either to the foreground or the background. This approach considerably increases robustness, as these likelihoods can also be understood as uncertainties, which is exploited in the tracking and silhouette selection process. Also, the propagation of densities avoids the need to select thresholds (e.g., for binarization of the image P) or to maximize. Thresholds are only used for visualization purposes and otherwise for the detection of a new human in the field of view. In the above application we have chosen silhouettes of humans, but we believe that this choice is without loss of generality, since even different object types fit into the tree structure. The presented experiments were carried out with only a single individual in the database. We have also experimented with different individuals (and thus varying contours), but the output was unstable w.r.t. the choice of the individual. This is under further investigation, and the use of our approach for gait recognition is future research.

Fig. 4. BGS approach comparison of heavy occlusion.

References

1. Dorin Comaniciu, Visvanathan Ramesh, and Peter Meer. Real-time tracking of non-rigid objects using mean shift. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition, volume 2, pages 142–149, Hilton Head Island, SC, June 13-15, 2000.
2. A. Doucet, S. Godsill, and C. Andrieu. On sequential Monte Carlo sampling methods for Bayesian filtering. Statistics and Computing, 10:197–209, 2000.
3. A. Elgammal and L. Davis. Probabilistic framework for segmenting people under occlusion. In Proc. Int. Conf. on Computer Vision, ICCV 2001, 2001.
4. D. Gavrila and V. Philomin. Real-time object detection for “smart” vehicles. In Proc. Int. Conf. on Computer Vision, pages 87–93, Corfu, Greece, 1999.
5. I. Haritaoglu, D. Harwood, and L. Davis.
W4S: A real-time system for detection and tracking of people in 2.5 D. In Proc. European Conf. on Computer Vision, Freiburg, Germany, June 1-5, 1998.

Fig. 5. BGS approach comparison of the low contrast issue.

6. T. Horprasert, D. Harwood, and L.S. Davis. A statistical approach for real-time robust background subtraction and shadow detection. In Proceedings of the IEEE ICCV’99 FRAME-RATE Workshop, 1999.
7. M. Isard and A. Blake. Condensation – conditional density propagation for visual tracking. Int. J. of Computer Vision, 1998.
8. M. Isard and A. Blake. Condensation – conditional density propagation for visual tracking. Int. J. of Computer Vision, 29:5–28, 1998.
9. Yuri A. Ivanov, Aaron F. Bobick, and John Liu. Fast lighting independent background subtraction. Int. J. of Computer Vision, 37(2):199–207, 2000.
10. A. Kale, A.N. Rajagopalan, N. Cuntoor, V. Krueger, and R. Chellappa. Identification of humans using gait. IEEE Trans. Image Processing, 2004. To be published.
11. G. Kitagawa. Monte Carlo filter and smoother for non-Gaussian nonlinear state space models. J. Computational and Graphical Statistics, 5:1–25, 1996.
12. V. Krueger and S. Zhou. Exemplar-based face recognition from video. In Proc. European Conf. on Computer Vision, Copenhagen, Denmark, June 27-31, 2002.
13. J.S. Liu and R. Chen. Sequential Monte Carlo for dynamic systems. Journal of the American Statistical Association, 93:1031–1041, 1998.
14. K. Toyama and A. Blake. Probabilistic tracking in a metric space. In Proc. Int. Conf. on Computer Vision, volume 2, pages 50–59, Vancouver, Canada, 9-12 July, 2001.
48 Some Transitions of Extrema-Based Multi-Scale Singularity Trees

K. Somchaipeng, J. Sporring, S. Kreiborg, P. Johansen

[Pages 48–55: the text of this paper was lost in extraction and is not recoverable.]
56 A Test Statistic in the Complex Wishart Distribution and Its Application to Change Detection in Polarimetric SAR Data

Knut Conradsen, Allan Aasbjerg Nielsen, Jesper Schou, and Henning Skriver

Abstract—When working with multilook fully polarimetric synthetic aperture radar (SAR) data, an appropriate way of representing the backscattered signal consists of the so-called covariance matrix. For each pixel, this is a 3 × 3 Hermitian positive definite matrix that follows a complex Wishart distribution. Based on this distribution, a test statistic for equality of two such matrices and an associated asymptotic probability for obtaining a smaller value of the test statistic are derived and applied successfully to change detection in polarimetric SAR data. In a case study, EMISAR L-band data from April 17, 1998 and May 20, 1998 covering agricultural fields near Foulum, Denmark, are used. Multilook full covariance matrix data, azimuthally symmetric data, covariance matrix diagonal-only data, and horizontal–horizontal (HH), vertical–vertical (VV), or horizontal–vertical (HV) data alone can be used. If applied to HH, VV, or HV data alone, the derived test statistic reduces to the well-known gamma likelihood-ratio test statistic. The derived test statistic and the associated significance value can also be applied as a line or edge detector in fully polarimetric SAR data.

Index Terms—Covariance matrix test statistic, EMISAR, radar applications, radar polarimetry, remote sensing change detection.

I. INTRODUCTION

DUE TO ITS all-weather mapping capability, independent of, for instance, cloud cover, synthetic aperture radar (SAR) data hold a strong potential, for example, for change detection studies in remote sensing applications. In this paper, multitemporal SAR images of agricultural fields are used to demonstrate a new change detection method for polarimetric SAR data. It is well known that the development of different crops over time causes changes in the backscatter.
The radar backscattering is sensitive to the dielectric properties of the vegetation and the soil, to the plant structure (i.e., the size, shape, and orientation distributions of the scatterers), to the surface roughness, and to the canopy structure (e.g., row direction and spacing and cover fraction) [1], [2]. The polarimetric SAR measures the amplitude and phase of backscattered signals in four combinations of the linear receive and transmit polarizations: horizontal–horizontal (HH), horizontal–vertical (HV), vertical–horizontal (VH), and vertical–vertical (VV). These signals form the complex scattering matrix that relates the incident and the scattered electric fields [3]. The inherent speckle in the SAR data can be reduced by spatial averaging at the expense of loss of spatial resolution. In this so-called multilook case, a more appropriate representation of the backscattered signal is the covariance matrix, in which the average properties of a group of resolution cells can be expressed in a single matrix.

Manuscript received June 8, 2001; revised October 17, 2002. This work was supported in part by The Danish National Research Councils under the Earth Observation Programme and in part under the European Space Agency Follow-On Research Programme. The EMISAR data acquisitions and part of the data processing were supported by the Danish National Research Foundation. K. Conradsen and A. A. Nielsen are with IMM, Informatics and Mathematical Modelling, Technical University of Denmark, DK-2800 Lyngby, Denmark (e-mail: [email protected]). J. Schou and H. Skriver are with EMI, Section of Electromagnetic Systems, Technical University of Denmark (Ørsted DTU), DK-2800 Lyngby, Denmark. Digital Object Identifier 10.1109/TGRS.2002.808066
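Assuming reciprocity, the multilook covariance representation described here can be sketched numerically; the simulated scattering vectors below are toy data, not EMISAR measurements:

```python
import numpy as np

rng = np.random.default_rng(42)

def multilook_covariance(scattering_vectors):
    """Average covariance matrix: the mean of s s^H over the looks,
    with s = [S_hh, S_hv, S_vv] (reciprocity S_hv = S_vh assumed)."""
    return np.mean([np.outer(s, s.conj()) for s in scattering_vectors], axis=0)

# Simulated looks: complex circular Gaussian scattering amplitudes (toy data).
n_looks = 16
looks = rng.normal(size=(n_looks, 3)) + 1j * rng.normal(size=(n_looks, 3))
C = multilook_covariance(looks)

assert C.shape == (3, 3)
assert np.allclose(C, C.conj().T)           # Hermitian
assert np.linalg.matrix_rank(C) == 3        # full rank with enough looks
print(np.all(np.linalg.eigvalsh(C) > 0))    # positive definite
```

A single look yields a rank-1 outer product; averaging several looks produces the Hermitian positive definite matrix that the Wishart-based test below operates on.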
The average covariance matrix is defined as [3]

$\langle C \rangle = \left\langle \begin{bmatrix} S_{hh}S_{hh}^{*} & S_{hh}S_{hv}^{*} & S_{hh}S_{vv}^{*} \\ S_{hv}S_{hh}^{*} & S_{hv}S_{hv}^{*} & S_{hv}S_{vv}^{*} \\ S_{vv}S_{hh}^{*} & S_{vv}S_{hv}^{*} & S_{vv}S_{vv}^{*} \end{bmatrix} \right\rangle$ (1)

where $\langle \cdot \rangle$ denotes ensemble averaging; $^{*}$ denotes complex conjugation; and $S_{rt}$ is the complex scattering amplitude for receive polarization $r$ and transmit polarization $t$ ($r$ and $t$ are either $h$ for horizontal or $v$ for vertical). Reciprocity, which normally applies to natural targets, gives $S_{hv} = S_{vh}$ (in the backscattering direction using the backscattering alignment convention [3]) and results in the covariance matrix (1) with rank 3. $\langle C \rangle$ follows a complex Wishart distribution [4]. The components in the covariance matrix containing both co- and cross-polarized scattering matrix elements often contain little information; for randomly distributed targets with azimuthal symmetry, these elements are zero [5].

In this paper, a test statistic for equality of two complex covariance matrices and an associated asymptotic probability measure for obtaining a smaller value of the test statistic are derived and applied to change detection in fully polarimetric SAR data. In [6], a change detection scheme based on canonical correlations analysis is applied to scalar EMISAR data (see also [7]–[10]). If used with HH, VV, or HV data only, the test statistic reduces to the well-known test statistic for equality of the scale parameters in two gamma distributions. The derived test statistic and the associated significance measure can also be applied as a line or edge detector in fully polarimetric SAR data [11].

Section II sketches important aspects of the complex Gaussian and Wishart distributions, the likelihood-ratio test statistic in the complex Wishart distribution, and the associated significance measure. Section III gives a case study in which data from the Danish airborne EMISAR [12], [13] are used. Section IV discusses the results from the case study, and Section V concludes.

II. THEORY

This section describes the complex normal and Wishart distributions and the likelihood-ratio test for equality of two complex Wishart matrices.
For a more thorough description, see the Appendix.

A. Complex Normal Distribution

We say that a $p$-dimensional random complex vector $Z$ follows a complex multivariate normal distribution with mean $\mu$ and dispersion matrix $\Sigma$, i.e.,

$Z \in N_C(\mu, \Sigma)$ (2)

if the frequency function is

$f(Z) = \frac{1}{\pi^p |\Sigma|} \exp\{-(Z-\mu)^H \Sigma^{-1} (Z-\mu)\}$ (3)

where $|\cdot|$ denotes the determinant; $\mathrm{tr}$ denotes the trace of a matrix; and $^H$ denotes complex conjugation ($^*$) and transpose ($^T$).

B. Complex Wishart Distribution

We say that a Hermitian positive definite random $p \times p$ matrix $X$ follows a complex Wishart distribution, i.e.,

$X \in W_C(p, n, \Sigma)$ (4)

if the frequency function is

$f(X) = \frac{|X|^{n-p}}{\Gamma_p(n) |\Sigma|^n} \exp\{-\mathrm{tr}(\Sigma^{-1} X)\}$ (5)

where

$\Gamma_p(n) = \pi^{p(p-1)/2} \prod_{j=1}^{p} \Gamma(n-j+1)$ (6)

The frequency function is defined for $X$ positive definite. If $X$ and $Y$ are independent and both follow complex Wishart distributions

$X \in W_C(p, n, \Sigma)$ and $Y \in W_C(p, m, \Sigma)$ (7)

then their sum also follows a complex Wishart distribution

$X + Y \in W_C(p, n+m, \Sigma)$ (8)

C. Test for Equality of Two Complex Wishart Matrices

Let the independent Hermitian positive definite $p \times p$ matrices $X$ and $Y$ be complex Wishart distributed, i.e., $X \in W_C(p, n, \Sigma_x)$ and $Y \in W_C(p, m, \Sigma_y)$. We consider the null hypothesis $H_0: \Sigma_x = \Sigma_y$, which states that the two matrices are equal, against the alternative hypothesis $H_1: \Sigma_x \neq \Sigma_y$. In general, suppose that the observations on which we shall base our test have joint density $f(z; \theta)$, where $\theta$ is a member of the set $\Theta$ of all possible parameters of the probability function that has generated the data. Then $H_0$ states that $\theta \in \Theta_0$ and $H_1$ states that $\theta \in \Theta_1$, where $\Theta_0$ is a subset of $\Theta$, $\Theta_0$ and $\Theta_1$ are disjoint, and often $\Theta_1 = \Theta \setminus \Theta_0$. The likelihood ratio

$Q = \frac{\max_{\theta \in \Theta_0} L(\theta)}{\max_{\theta \in \Theta} L(\theta)}$ (9)

where $L$ is the likelihood function, rejects $H_0$ for small values. If $H_0$ is true (in statistical parlance "under $H_0$"), then in our case $\hat{\Sigma} = (X+Y)/(n+m)$ with $X+Y \in W_C(p, n+m, \Sigma)$. Inserting the ML estimates $\hat{\Sigma}_x = X/n$, $\hat{\Sigma}_y = Y/m$, and $\hat{\Sigma} = (X+Y)/(n+m)$ into the likelihoods (10)–(14) leads to the desired likelihood-ratio test statistic

$Q = \frac{(n+m)^{p(n+m)}}{n^{pn} m^{pm}} \cdot \frac{|X|^n |Y|^m}{|X+Y|^{n+m}}$ (15)

If $n = m$, which is typically the case for change detection, we get

$Q = 2^{2pn} \frac{(|X| |Y|)^n}{|X+Y|^{2n}}$ (16)

If

$\rho = 1 - \frac{2p^2 - 1}{6p} \left( \frac{1}{n} + \frac{1}{m} - \frac{1}{n+m} \right)$ (17)

and

$\omega_2 = -\frac{p^2}{4} \left( 1 - \frac{1}{\rho} \right)^2 + \frac{p^2 (p^2 - 1)}{24} \left( \frac{1}{n^2} + \frac{1}{m^2} - \frac{1}{(n+m)^2} \right) \frac{1}{\rho^2}$ (18)

then the probability of finding a smaller value of $-2\rho \ln Q$ is

$P\{-2\rho \ln Q \leq z\} \simeq P\{\chi^2(p^2) \leq z\} + \omega_2 \left[ P\{\chi^2(p^2+4) \leq z\} - P\{\chi^2(p^2) \leq z\} \right]$ (19)

For full covariance matrix data, $p = 3$.
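The likelihood-ratio test for equality of two complex Wishart matrices (Section II-C) and its chi-square probability approximation can be sketched numerically as follows. This is a hedged illustration assuming the standard Box-approximation constants for this test (our reconstruction); the function name is ours, not the authors'.

```python
import numpy as np
from scipy.stats import chi2

def wishart_change_pvalue(X, Y, n, m, p=3):
    """-2*rho*ln(Q) for the equality test of two complex Wishart matrices and
    the approximate probability of observing a smaller value. A sketch with
    our reconstruction of the constants rho and omega2; not the authors' code."""
    lnQ = (p * ((n + m) * np.log(n + m) - n * np.log(n) - m * np.log(m))
           + n * np.log(np.abs(np.linalg.det(X)))
           + m * np.log(np.abs(np.linalg.det(Y)))
           - (n + m) * np.log(np.abs(np.linalg.det(X + Y))))
    rho = 1 - (2 * p**2 - 1) / (6 * p) * (1 / n + 1 / m - 1 / (n + m))
    omega2 = (-(p**2 / 4) * (1 - 1 / rho)**2
              + p**2 * (p**2 - 1) / 24
              * (1 / n**2 + 1 / m**2 - 1 / (n + m)**2) / rho**2)
    z = -2 * rho * lnQ
    f = p**2
    prob_smaller = chi2.cdf(z, f) + omega2 * (chi2.cdf(z, f + 4) - chi2.cdf(z, f))
    return z, prob_smaller

# toy example with 13 looks: identical matrices give z near 0 (no change)
rng = np.random.default_rng(1)
n = m = 13
A = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
Sigma = A @ A.conj().T + 3 * np.eye(3)     # Hermitian positive definite
X = n * Sigma
Y = m * Sigma
z, prob = wishart_change_pvalue(X, Y, n, m)                 # z ~ 0, prob ~ 0
z2, prob2 = wishart_change_pvalue(X, m * 5 * Sigma, n, m)   # strong change: large z
```

For equal matrices Q attains its maximum, so z is (near) zero; a strongly scaled second matrix drives z far into the chi-square tail.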
For HH, HV, or VV data alone, $p = 1$; $X$ and $Y$ are therefore scalars, and

$Q = \frac{(n+m)^{n+m}}{n^{n} m^{m}} \cdot \frac{X^{n} Y^{m}}{(X+Y)^{n+m}}$ (20)

which is equivalent to the well-known likelihood-ratio test statistic for the equality of two gamma parameters [14], [15] (see the Appendix). Fig. 1 shows $\rho$ and $\omega_2$ as functions of the number of looks for $n = m$ and $p = 3$.

D. Azimuthal Symmetry

By swapping first rows and then columns two and three in $\langle C \rangle$ in (1), we obtain in the azimuthal symmetry case the block diagonal matrix

$X = \begin{bmatrix} X_{1} & 0 \\ 0 & X_{2} \end{bmatrix}$ (21)

where $X_{1}$ contains the co-polarized elements (here 2 × 2), and $X_{2}$ contains the cross-polarized element (here 1 × 1). This matrix is not Wishart distributed. We now consider $X_{1} \in W_C(2, n, \Sigma_{x1})$, $X_{2} \in W_C(1, n, \Sigma_{x2})$, $Y_{1} \in W_C(2, m, \Sigma_{y1})$, and $Y_{2} \in W_C(1, m, \Sigma_{y2})$, and we assume that $X_{1}$, $X_{2}$, $Y_{1}$, and $Y_{2}$ are mutually independent. We want to test the hypothesis $\Sigma_{x1} = \Sigma_{y1}$ and $\Sigma_{x2} = \Sigma_{y2}$ against all alternatives. We have the likelihood function as the product of the likelihoods of the independent blocks (22). The likelihood-ratio test statistic becomes

$Q = Q_{1} Q_{2}$ (23)

where the latter equality is due to the fact that the determinant of a block diagonal matrix is the product of the determinants of the diagonal blocks, i.e., we get the same test statistic as in the full covariance matrix case. In this case, if we define $f$, $\rho$, and $\omega_{2}$ for the block diagonal case (24)–(26) as given by Theorem 6 in the Appendix (with $f = p_{1}^{2} + p_{2}^{2} = 5$ degrees of freedom), then the probability of finding a smaller value of $-2\rho \ln Q$ is again of the form (19) (27).

Fig. 1. $\rho$ and $\omega_{2}$ as functions of the number of looks for $n = m$ and $p = 3$.

III. DATA

To illustrate the change detection capability of the derived test statistic, EMISAR and ground data from an agricultural test site at the Research Center Foulum located in Central Jutland, Denmark, are used. Agricultural fields have been selected for the analysis because of the large change in the polarimetric properties for such areas, due to the development of the crops with time. Polarimetric parameters of agricultural crops from this area have previously been analyzed [2].

A. SAR Data and Calibration

The EMISAR system is a fully polarimetric airborne SAR system operating at two frequencies: C-band (5.3-GHz/5.7-cm wavelength) and L-band (1.25-GHz/24-cm wavelength) [12], [13].
The SAR system is normally operated from an altitude of approximately 12 500 m; the spatial resolution is 2 m × 2 m (one-look); the ground range swath is approximately 12 km; and typical incidence angles range from 35° to 60°. The processed data from this system are fully calibrated by means of an advanced internal calibration system. The radiometric calibration is better than 0.5 dB, and the channel imbalance is less than 0.5 dB in amplitude and 5° in phase [12]. The cross-polarization contamination is generally suppressed by more than 30 dB. The stability of the system is very important in the change detection scheme set up in this paper. A large number of acquisitions with both the C- and L-band polarimetric SAR from 1994 to 1999 exist for the agricultural test site. To illustrate the change detection capability of the derived test statistic, L-band data from April 17, 1998 and May 20, 1998 have been used. The two EMISAR images are shown in Figs. 2 and 3 as color composites of the HH (green), HV (actually the complex addition of HV and VH, red), and VV (blue) channels. The HH and VV channels are stretched linearly between −30 dB and 0 dB. The HV channel is stretched linearly between −36 dB and −6 dB.

Fig. 2. L-band EMISAR image of the test area acquired on April 17, 1998, 5210 m × 5120 m.

The area is relatively flat, and corrections of the local incidence angle due to terrain slope are not critical in this study, since the acquisition geometry for the two acquisitions is almost identical; therefore, this correction has not been carried out. The geometrical coregistration is, however, very important in a change detection application, where two images are compared on a pixel-by-pixel basis. The polarimetric images were registered to a digital elevation model generated from interferometric airborne data acquired by EMISAR.
The registration was carried out by combining a model of the imaging geometry with a few ground control points, and the images were registered to one another with a root-mean-square accuracy of better than one pixel [13]. In the study, 13-look covariance matrix data with a 5 m × 5 m pixel spacing are used.

B. Test Site

The area contains a number of agricultural fields of different sizes with different crops. The lengthy, dark blue feature in the upper left corner of Figs. 2 and 3 is a lake, while the bright greenish areas seen especially in the lower part of the images are forests (primarily coniferous forests).

Fig. 3. L-band EMISAR image of the test area acquired on May 20, 1998, 5210 m × 5120 m.

In the April acquisition, the colors of the agricultural fields are predominantly bluish, due to the larger VV- than HH-backscatter coefficient for the bare fields for the spring crops and the sparsely vegetated fields for the winter crops. For the May acquisition, the spring crops are mainly in the tillering stage, and the winter crops are at the end of the stem elongation stage, in the boot stage, or at the beginning of heading, depending on the crop type. A number of test areas have been selected for quantitative analysis of the test statistic. These areas are outlined in Figs. 2 and 3, and the development stage and the height of the vegetation are shown in Table I for reference. Five spring crop fields have been used, i.e., one beet field, one pea field, two spring barley fields, and one oats field. All spring crop fields are bare at the April acquisition with the surface being relatively smooth due to sowing or harrowing. At the May acquisition, the beet field is still bare, whereas the other fields have some relatively dense and low vegetation. The three winter crop fields, i.e., two winter wheat fields and one rye field, have low vegetation at the April acquisition and relatively dense and high vegetation at the May acquisition.
Finally, a common spruce field, which is virtually unchanged between the two acquisitions, is used in the investigation.

TABLE I. DEVELOPMENT STAGES AND VEGETATION HEIGHTS (IN PARENTHESES).

IV. RESULTS AND DISCUSSION

In Section IV-A, polarimetric parameters for the fields used in the quantitative evaluation are presented and discussed to provide the background for interpretation of the test statistic results. The results for the test statistic are presented and discussed in Section IV-B.

A. Polarimetric Parameters

The polarimetric parameters used to describe the selected fields are standard parameters derived from the covariance matrix (1) [2]: the HH, VV, and HV backscatter coefficients, where the HV backscatter coefficient is slightly less dependent on the incidence angle than the HH and VV backscatter coefficients [2]; the correlation coefficient and the phase difference of the HH and VV components, which contain important information about the scattering mechanisms; and the co- and cross-polarized polarization signatures, which are graphical representations of the polarimetric properties [2], [3]. Fig. 4 shows the HH, VV, and HV backscatter coefficients for the various test fields and for both the April and the May acquisitions. Correspondingly, the correlation coefficients and phase differences are shown in Fig. 5, and the polarization responses are shown in Fig. 6.

Fig. 4. (a) HH, (b) VV, and (c) HV backscatter coefficients for the test areas shown in Figs. 2 and 3 for L-band in April and in May.

Fig. 5. (a) Correlation coefficient and (b) phase difference between HH and VV for the test areas shown in Figs. 2 and 3 for L-band in April and in May.

1) Spring Crops: All spring crops (beets, peas, spring barley 1 and 2, oats) show classical behavior for rough surface scattering for the April acquisition, i.e., a high HH–VV correlation coefficient [Fig. 5(a)], a small phase difference [Fig. 5(b)], low cross-polarized backscatter [Fig. 4(c)], larger VV than HH backscatter [Fig. 4(a) and (b)], and textbook examples of surface scattering polarization responses (which are therefore not shown here).
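The standard parameters just listed (backscatter coefficients, HH–VV correlation coefficient, and phase difference) can be read directly off a covariance matrix. The following sketch assumes a lexicographic ordering [S_hh, S_hv, S_vv]; it is our illustration, not the authors' code.

```python
import numpy as np

def polarimetric_parameters(C):
    """Standard parameters from a 3x3 covariance matrix with assumed
    lexicographic ordering [S_hh, S_hv, S_vv] (an illustrative sketch;
    ordering and normalization are assumptions, not from the paper)."""
    sigma_hh = np.real(C[0, 0])            # backscatter powers (linear units)
    sigma_hv = np.real(C[1, 1])
    sigma_vv = np.real(C[2, 2])
    # complex correlation between the HH and VV components
    rho_hhvv = C[0, 2] / np.sqrt(sigma_hh * sigma_vv)
    return {
        "sigma_hh_dB": 10 * np.log10(sigma_hh),
        "sigma_hv_dB": 10 * np.log10(sigma_hv),
        "sigma_vv_dB": 10 * np.log10(sigma_vv),
        "corr_coeff": np.abs(rho_hhvv),
        "phase_diff_deg": np.degrees(np.angle(rho_hhvv)),
    }

# toy covariance matrix with strong HH-VV correlation and a 10 degree phase offset
C = np.array([[1.0, 0.0, 0.8 * np.exp(1j * np.radians(10.0))],
              [0.0, 0.2, 0.0],
              [0.8 * np.exp(-1j * np.radians(10.0)), 0.0, 1.0]])
params = polarimetric_parameters(C)
# params["corr_coeff"] is 0.8 and params["phase_diff_deg"] is 10.0 here
```

A high correlation coefficient with a small phase difference is the surface-scattering signature discussed in the text; a low correlation or a large phase difference points to volume or double-bounce scattering.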
The actual backscatter level from the surface is, of course, controlled by the soil moisture and the surface roughness of the individual fields, and we observe rather weak backscatter from the spring barley 1 and the oats fields (Fig. 4) for the April acquisition due to very smooth surfaces. The beets field also shows rough surface behavior for the May acquisition [Fig. 6(a)]. The pea field shows some volume scattering behavior for the May acquisition, due to the sparse vegetation, i.e., the correlation coefficient [Fig. 5(a)] has decreased, and the pedestal of the polarization response has increased [Fig. 6(b)]. This effect is even more pronounced for the spring barley and the oats fields, due to a more dense vegetation [Figs. 5(a) and 6(c) and (d)]. For the latter fields, a large phase difference is observed too [Fig. 5(b)], and a pronounced double-bounce response is observed, especially for oats [Fig. 6(c) and (d)]. The double-bounce scattering is most likely caused by penetration through the vegetation, scattering from the ground surface, and scattering from the vegetation, or vice versa. This phenomenon has previously been observed early in the growing season for winter crops [2].

Fig. 6. Polarization signatures for the test areas shown in Figs. 2 and 3 for the L-band acquisition in May. An orientation angle of 0° corresponds to HH backscatter; the left-hand signature is the copolarized signature, whereas the right-hand signature is the cross-polarized signature.

2) Winter Crops: The backscatter coefficients from the winter crops are, in general, larger than from the spring crops (Fig. 4), due to the contribution from the volume scattering. The behavior of the winter wheat and the rye fields resembles surface scattering for the April acquisition (Fig. 5), indicating penetration through the vegetation and a large surface scattering component. The cross-polarized backscatter is, however, somewhat larger than for surface scattering, due to the volume scattering contribution [Fig. 4(c)].
The backscattering from the winter wheat 1 field is significantly larger than from the winter wheat 2 field for both acquisitions (Fig. 4). The reason is that the sowing direction for the winter wheat 1 field is exactly perpendicular to the radar look direction. For the May acquisition, the winter wheat fields also show some double-bounce scattering behavior [Figs. 5(b) and 6(e) and (f)]. The rye field shows virtually no change in the polarimetric parameters between the two acquisitions, except for some increase in the backscatter (Fig. 4). The coniferous forest area shows pronounced volume scattering behavior for both acquisitions, i.e., a small correlation coefficient [Fig. 5(a)], a small phase difference [Fig. 5(b)], strong cross-polarized backscatter [Fig. 4(c)], and a large pedestal for the polarization responses [Fig. 6(h)].

Fig. 7. Logarithm of the test statistic, ln Q (16), for the images shown in Figs. 2 and 3 in the assumed azimuthally symmetric case.

B. Test Statistic

Figs. 7 and 8 show "−ln Q" (16) for the azimuthally symmetric case and the diagonal-element-only case, respectively, for the two images shown in Figs. 2 and 3. The test statistic is inverted to show areas with large change as bright areas and areas with small change as dark areas. Consequently, when "large values of the test statistic" is mentioned below, it means large values of "−ln Q," and vice versa for small values. We observe that especially the forest areas appear very dark, indicating virtually no change between the two acquisitions. For the agricultural fields, the results range from dark (no change) to bright (large change) areas depending on the crop type. Fig. 9(a) shows the average "−ln Q" for the test areas outlined in the previous sections in the following four different cases: 1) using only the VV channel; 2) using only the three diagonal elements of the covariance matrix; 3) using the covariance matrix but assuming azimuthal symmetry; 4) using the full covariance matrix. Furthermore, Fig.
9(b) shows the average probability of finding a larger value of "−2 ln Q" (derived from (19) and Theorem 6 in the Appendix) for the four cases mentioned above. Fig. 9(b) also indicates the 5% and the 1% significance levels, and the regions with probabilities lower than these levels are the regions where we will typically reject the hypothesis of equal covariance matrices (or VV channels) at the two points in time, i.e., these are regions with major change between the two data acquisitions.

Fig. 8. Logarithm of the test statistic, ln Q (16), for the test areas shown in Figs. 2 and 3 in the diagonal case.

Figs. 10 and 11 show in white, for the images in Figs. 2 and 3, where the hypothesis of equal covariance matrices has been rejected at a 1% significance level for the azimuthally symmetric case and the diagonal case, respectively. Clearly, we observe detection of changes in the azimuthally symmetric case that have not been detected in the diagonal case, as well as improved detection in the azimuthally symmetric case of changes that already to some extent have been detected in the diagonal case. In general, the test statistic for the full covariance matrix is only slightly larger than that for the assumed azimuthally symmetric case [Fig. 9(a)]. We may conclude that the additional information added by the correlations between the co- and cross-polarized elements of the covariance matrix is small. Also, the change detection potential of the single VV channel is seen to be much less than for the other three cases. Therefore, the discussion below will concentrate on comparing the results for two cases: the diagonal case, where only the three diagonal backscatter coefficient elements of the covariance matrix are used, and the polarimetric case, where azimuthal symmetry is assumed (i.e., the correlations between the co- and cross-polarized elements are zero).
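The decision rule used here, rejecting equality where the no-change probability falls below the 5% or 1% significance level, can be sketched per pixel as follows. The constants follow our reconstruction of the chi-square approximation in Section II; array and function names are ours.

```python
import numpy as np
from scipy.stats import chi2

def change_map(lnQ, n, m, p, alpha=0.01):
    """Binary change map from per-pixel ln(Q) values: flag pixels where the
    probability of finding a larger value of -2*ln(Q) (no-change probability)
    is below alpha. A sketch; constants are our reconstruction."""
    rho = 1 - (2 * p**2 - 1) / (6 * p) * (1 / n + 1 / m - 1 / (n + m))
    omega2 = (-(p**2 / 4) * (1 - 1 / rho)**2
              + p**2 * (p**2 - 1) / 24
              * (1 / n**2 + 1 / m**2 - 1 / (n + m)**2) / rho**2)
    z = -2 * rho * lnQ
    f = p**2
    prob_smaller = chi2.cdf(z, f) + omega2 * (chi2.cdf(z, f + 4) - chi2.cdf(z, f))
    prob_no_change = 1 - prob_smaller      # probability of finding a larger value
    return prob_no_change < alpha          # True (white) where change is detected

# toy example: lnQ = 0 means identical matrices (no change); a very negative
# lnQ means a tiny likelihood ratio (large change)
lnQ = np.array([0.0, -1.0, -30.0])
detected = change_map(lnQ, n=13, m=13, p=3, alpha=0.01)
```

With 13 looks and p = 3, only the strongly changed pixel (lnQ = −30) is flagged at the 1% level, mirroring how Figs. 10 and 11 are produced from the test statistic images.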
1) Similar Polarimetric Parameters: The two regions with virtually no change between the acquisitions but with different dominating scattering mechanisms, i.e., beets (surface scattering) and coniferous forest (volume scattering), show both small values of the test statistic and no significant difference between the diagonal and the polarimetric case. It is not possible to reject the hypothesis of equal covariance matrices at a 5% significance level for any of the regions [Fig. 9(b)]. The rye field also has very similar polarimetric parameters for the two acquisitions, except for the backscatter, as mentioned above, and the test statistics for the diagonal and the polarimetric cases are relatively close [Figs. 7–9(a)]. The hypothesis of equal covariance matrices is rejected at a 5% significance level for both cases. Consequently, in these cases with relatively similar polarimetric parameters, the new test statistic for polarimetric data performs equally well as the nonpolarimetric test statistic.

2) Similar Backscatter Coefficients, Large Difference for the Correlation Coefficient and/or the Phase Difference: Three fields have very similar backscatter coefficients for the two acquisitions, whereas a large difference between the correlation coefficients and/or the phase differences exists between the two acquisitions, i.e., the pea field (where the correlation coefficient decreases between the two acquisitions, due to the sparse vegetation cover at the May acquisition), the winter wheat 2 field (where the phase difference changes significantly between the two acquisitions), and the winter wheat 1 field (where both the correlation coefficient and the phase difference change significantly). A significantly larger test statistic is observed in the polarimetric case than in the diagonal case [Figs. 7–9(a)] for all three fields. Also, it is not possible to reject the hypothesis of equal covariance matrices at a 5% significance level for any of the three fields in the diagonal case [Fig. 9(b)]. On the other hand, the hypothesis is rejected in the polarimetric case at a 1% significance level for all three fields [Fig. 9(b)]. Thus, the results clearly show that the new polarimetric test statistic is very sensitive to differences in the full polarimetric information contained in the covariance matrix.

3) Large Difference for All Polarimetric Parameters: Finally, three regions show significant changes in all polarimetric parameters, i.e., the backscatter coefficients, the correlation coefficient, and the phase difference: the two spring barley fields and the oats field, which have a smooth bare surface at the April acquisition and a relatively dense vegetation cover at the May acquisition. For the spring barley 2 and the oats fields, we see a medium test statistic in the diagonal case, whereas a much larger test statistic is observed for the polarimetric case [Figs. 7–9(a)]. The spring barley 1 field has a very large change in the backscatter coefficients, due to the relatively smooth surface at the April acquisition (Fig. 4), and we observe a very large test statistic for both the diagonal and the polarimetric cases [Figs. 7–9(a)]. The two spring barley fields have almost the same correlation coefficients and phase differences (Fig. 5), whereas the change in the backscatter coefficients is largest for the spring barley 1 field, as mentioned above. This difference is clearly important for both test statistics: the test statistic for both the diagonal and the polarimetric case is much larger for the spring barley 1 field than for the spring barley 2 field. The hypothesis of equal covariance matrices is rejected for all three fields at the 5% significance level in both the diagonal and the polarimetric case. This is also the case at the 1% significance level, except for the spring barley 2 field in the diagonal case. Consequently, even when large changes in the backscatter coefficients ensure detection with a nonpolarimetric method, the addition of polarimetric information improves the detection of changes with the new polarimetric test statistic.

Fig. 9. (a) Average "−ln Q" for the test areas shown in Figs. 2 and 3 in four different cases: 1) using only the VV channel; 2) using only the three diagonal elements of the covariance matrix; 3) using the covariance matrix but assuming azimuthal symmetry; and 4) using the full covariance matrix. (b) Average probability of finding a larger value of "−2 ln Q" (derived from (19) and Theorem 6 in the Appendix) for the same four cases.

V. CONCLUSIONS

In this paper, a test statistic for equality of two complex Wishart distributed covariance matrices and an associated asymptotic probability measure for obtaining a smaller value of the test statistic have been derived. The test statistic provides a unique opportunity to develop optimal algorithms for change detection, edge detection, line detection, segmentation, etc., for polarimetric SAR images. Such algorithms have previously been based on results of applying algorithms to the single channels and subsequently combining these results using some kind of fusion operator. As a demonstration of the potential of the new test statistic, the derived measures have been applied to change detection in fully polarimetric SAR data for a test area with primarily agricultural fields and forest stands, where two images acquired with approximately a one-month interval have been used. In the case of areas with only small change in the polarimetric parameters between the two acquisitions, the new test statistic for polarimetric data performs equally well as the nonpolarimetric test statistic. When the backscatter coefficients are virtually unchanged, but either the phase difference and/or the correlation coefficient between the HH and VV polarizations have changed, the results clearly show that the new polarimetric test statistic is much more sensitive to the differences than test statistics based only on the backscatter coefficients.
Also, in the case where all parameters in the covariance matrix have changed between the two acquisitions, the new test statistic shows improved change detection capability. Consequently, the results show clearly that the new test statistic offers improved change detection capability for fully polarimetric SAR data.

APPENDIX
ANALYSIS OF COMPLEX WISHART MATRICES

In change detection and edge detection in polarimetric SAR data, it is useful to be able to compare two complex Wishart distributed matrices.

Fig. 10. Rejection of hypothesis of equal covariance matrices at 1% level for the assumed azimuthally symmetric case (white: rejection, black: acceptance).

Most of the standard literature in multivariate analysis only covers the real case (e.g., see [16]). This does not mean, however, that results for the complex case do not exist. In [4], the relevant class of complex distributions is introduced, and [17] completed much of the work, either giving results or (indirectly) pointing out how results may be obtained. It is, however, not straightforward to deduce the relevant formulas from their work. In [18], many of the necessary formulas are deduced in an elegant way using the fact that the problem is invariant under a group of linear transformations; the notation chosen is, however, not straightforward. Some of their results appear in [19], an unpublished thesis in Danish, including results on comparing covariance matrices. In [20], the theory for linear and graphical models in multivariate complex normal models is covered. Since the formulas for the distribution of the likelihood ratio do not seem to be available, and since no authors seem to have treated the so-called block diagonal case, we have chosen to give a rather thorough description of the necessary results. We start with a short introduction to the complex normal and the complex Wishart distributions.
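As a numerical illustration of the distributions introduced in the Appendix (our sketch, not part of the paper), a complex Wishart matrix can be simulated as a sum of outer products of zero-mean circular complex Gaussian vectors:

```python
import numpy as np

def sample_complex_wishart(Sigma, n, rng):
    """Simulate X as a sum of n outer products Z_i Z_i^H with Z_i ~ N_C(0, Sigma),
    so that X follows the complex Wishart distribution W_C(p, n, Sigma).
    A numerical illustration of the definitions; names are ours."""
    p = Sigma.shape[0]
    L = np.linalg.cholesky(Sigma)
    # circular complex Gaussian: real and imaginary parts each with variance 1/2
    w = (rng.normal(size=(p, n)) + 1j * rng.normal(size=(p, n))) / np.sqrt(2)
    Z = L @ w                      # columns are N_C(0, Sigma) draws
    return Z @ Z.conj().T          # p x p Hermitian positive definite

rng = np.random.default_rng(2)
Sigma = np.array([[2.0, 0.5 + 0.5j],
                  [0.5 - 0.5j, 1.0]])
n = 2000
X = sample_complex_wishart(Sigma, n, rng)
# E[X] = n * Sigma, so X / n approximates Sigma for large n
```

The simulated matrix is Hermitian positive definite, as required for the Wishart density to be defined, and its scaled mean recovers the dispersion matrix, matching Theorem 1.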
We then compare two gamma distributions, which is the one-dimensional test often used in the radar community. Then we give a straightforward (brute force) derivation of the likelihood-ratio criterion for testing the equality in the complex case. Then, we describe the so-called block diagonal case, which among other things covers the case known in the radar community as the azimuthal symmetric case and the total independence case. After quoting results from [21] on asymptotic distributions, we establish the necessary results on moments by brute force integration. By straightforward but rather cumbersome calculations, the results in [21] yield the desired results. Alternatively, one may use results in [20] and derive expressions involving the product of beta distributed random variables.

Fig. 11. Rejection of hypothesis of equal covariance matrices at 1% level for the diagonal case (white: rejection, black: acceptance).

A. Complex Normal Distribution

Following [4], we say that a $p$-dimensional random complex vector $Z$ follows a complex multivariate normal distribution with mean $\mu$ and dispersion matrix $\Sigma$, i.e.,

$Z \in N_C(\mu, \Sigma)$ (28)

if the frequency function is

$f(Z) = \frac{1}{\pi^p |\Sigma|} \exp\{-(Z-\mu)^H \Sigma^{-1} (Z-\mu)\}$ (29)

where $\Sigma = E[(Z-\mu)(Z-\mu)^H]$ is Hermitian positive definite and of the form

$\Sigma = \begin{bmatrix} \sigma_{11} & \cdots & \sigma_{1p} \\ \vdots & \ddots & \vdots \\ \sigma_{p1} & \cdots & \sigma_{pp} \end{bmatrix}$ (30)

In other words, we have for $i, j = 1, \ldots, p$

$\sigma_{ii} = E[|Z_i - \mu_i|^2]$ (31)

$\sigma_{ij} = E[(Z_i - \mu_i)\overline{(Z_j - \mu_j)}], \quad i \neq j$ (32)

B. Complex Wishart Distribution

We say that a Hermitian positive definite random $p \times p$ matrix $X$ follows a complex Wishart distribution, i.e.,

$X \in W_C(p, n, \Sigma)$ (33)

if the frequency function is

$f(X) = \frac{|X|^{n-p}}{\Gamma_p(n) |\Sigma|^n} \exp\{-\mathrm{tr}(\Sigma^{-1} X)\}$ (34)

where

$\Gamma_p(n) = \pi^{p(p-1)/2} \prod_{j=1}^{p} \Gamma(n-j+1)$ (35)

If confusion concerning the dimension may arise, we write $\Gamma_p$ rather than $\Gamma$. The frequency function is defined for $X$ positive definite, and in evaluating integrals the volume element becomes

$dX = \prod_i dx_{ii} \prod_{i<j} d(\mathrm{Re}\, x_{ij})\, d(\mathrm{Im}\, x_{ij})$ (36)

where Re and Im denote real and imaginary parts. For further description and useful results on Jacobians, etc., see [4] and [17]. It is emphasized that the formulas for the complex normal and Wishart distributions differ from their real counterparts. Estimation of $\Sigma$ from a normal sample follows in Theorem 1.

Theorem 1: Let $Z_1, \ldots, Z_n$ be independent, complex normal random variables, i.e.,

$Z_i \in N_C(\mu, \Sigma)$ (37)

Then the maximum-likelihood (ML) estimator for $\Sigma$ is

$\hat{\Sigma} = \frac{1}{n} \sum_{i=1}^{n} (Z_i - \mu)(Z_i - \mu)^H$ (38)

and $n\hat{\Sigma}$ is Wishart distributed

$n\hat{\Sigma} \in W_C(p, n, \Sigma)$ (39)

Proof: See [4].

From Theorem 1, we easily obtain Theorem 2.

Theorem 2: Let $X$ and $Y$ be independent Wishart distributed matrices

$X \in W_C(p, n, \Sigma)$ (40)

$Y \in W_C(p, m, \Sigma)$ (41)

Then the sum will again be Wishart distributed, i.e.,

$X + Y \in W_C(p, n+m, \Sigma)$ (42)

Proof: Straightforward.

C. Test on Equality of Two Gamma Parameters

Let the independent random variables $x$ and $y$ (which are real scalars) be gamma distributed, $x \in G(n, \alpha_x)$ and $y \in G(m, \alpha_y)$. The frequency function for $x$ is

$f(x) = \frac{x^{n-1}}{\Gamma(n)\, \alpha_x^{\,n}} \exp\{-x/\alpha_x\}$ (43)

and similarly for $y$. We wish to test the hypothesis $H_0: \alpha_x = \alpha_y$ against $H_1: \alpha_x \neq \alpha_y$. The likelihood function for the parameters $(\alpha_x, \alpha_y)$ thus becomes

$L(\alpha_x, \alpha_y) = f(x)\, f(y)$ (44)

and under the hypothesis $H_0$ ($\alpha_x = \alpha_y = \alpha$), we obtain

$L_0(\alpha) = \frac{x^{n-1} y^{m-1}}{\Gamma(n)\Gamma(m)\, \alpha^{n+m}} \exp\{-(x+y)/\alpha\}$ (45)

Taking the derivatives of the log likelihoods and setting them equal to zero, we obtain the ML estimates

$\hat{\alpha}_x = x/n$ (46)

$\hat{\alpha}_y = y/m$ (47)

$\hat{\alpha} = (x+y)/(n+m)$ (48)

Therefore the likelihood-ratio test statistic becomes

$Q = \frac{L_0(\hat{\alpha})}{L(\hat{\alpha}_x, \hat{\alpha}_y)} = \frac{(n+m)^{n+m}}{n^n m^m} \cdot \frac{x^n y^m}{(x+y)^{n+m}}$ (49)

The critical region is given by

$Q \leq c$ (50)

Straightforward calculations show that this critical region is of the form

$\frac{x/n}{y/m} \leq c_1 \quad \text{or} \quad \frac{x/n}{y/m} \geq c_2$ (51)

Since under the null hypothesis $(x/n)/(y/m)$ is distributed like Fisher's $F$, i.e.,

$\frac{x/n}{y/m} \in F(2n, 2m)$ (52)

$c_1$ and $c_2$ may be determined by means of quantiles in the $F$-distribution.

D. Test on Equality of Two Complex Wishart Matrices

We consider independent Wishart distributed matrices

$X \in W_C(p, n, \Sigma_x)$ (53)

$Y \in W_C(p, m, \Sigma_y)$ (54)

and wish to test the hypothesis

$H_0: \Sigma_x = \Sigma_y \quad \text{against} \quad H_1: \Sigma_x \neq \Sigma_y$ (55)

We have the likelihood functions

$L(\Sigma_x, \Sigma_y) = f(X; n, \Sigma_x)\, f(Y; m, \Sigma_y)$ (56)

and under the hypothesis $H_0$ ($\Sigma_x = \Sigma_y = \Sigma$)

$L_0(\Sigma) = f(X; n, \Sigma)\, f(Y; m, \Sigma)$ (57)

The ML estimates are

$\hat{\Sigma}_x = X/n$ (58)

$\hat{\Sigma}_y = Y/m$ (59)

$\hat{\Sigma} = (X+Y)/(n+m)$ (60)

Therefore, the likelihood-ratio test statistic becomes

$Q = \frac{L_0(\hat{\Sigma})}{L(\hat{\Sigma}_x, \hat{\Sigma}_y)} = \frac{(n+m)^{p(n+m)}}{n^{pn} m^{pm}} \cdot \frac{|X|^n |Y|^m}{|X+Y|^{n+m}}$ (61)

Thus, we get the critical region

$Q \leq c$ (62)

E. Tests in the Block Diagonal Case

In some applications (e.g., remote sensing), there are several independent Wishart matrices in each observation. They will often be arranged in a block diagonal structure like

$\Sigma = \begin{bmatrix} \Sigma_1 & & 0 \\ & \ddots & \\ 0 & & \Sigma_k \end{bmatrix}$ (63)

where $\Sigma_j$ is $p_j \times p_j$ and $p = p_1 + \cdots + p_k$. This covers the so-called azimuthal symmetric case and the case with independence of co- and cross-polarized signals. If we define

$X = \begin{bmatrix} X_1 & & 0 \\ & \ddots & \\ 0 & & X_k \end{bmatrix}$ (64)

it is important to note that $X$ is not Wishart distributed. We now consider similar partitionings of $X$ and $Y$, i.e., we have independent random matrices

$X_j \in W_C(p_j, n, \Sigma_{x,j})$ (65)

$Y_j \in W_C(p_j, m, \Sigma_{y,j})$ (66)

for $j = 1, \ldots, k$. We want to test the hypothesis

$H_0: \Sigma_{x,j} = \Sigma_{y,j}, \quad j = 1, \ldots, k$ (67)

against all alternatives. The likelihood-ratio criterion becomes the product of the individual criteria, i.e.,

$Q = \prod_{j=1}^{k} Q_j$ (68)

Since the determinant of a block diagonal matrix is the product of the determinants of the diagonal blocks, we obtain

$Q = \frac{(n+m)^{p(n+m)}}{n^{pn} m^{pm}} \cdot \frac{|X|^n |Y|^m}{|X+Y|^{n+m}}$ (69)

i.e., the same result as in the general case [see (61)]. Note, however, that the distribution has changed, since $X$ and $Y$ are no longer Wishart distributed.

F. Large Sample Distribution Theory

In [21] (as quoted in [16]), a general asymptotic expansion of the distribution of a random variable whose moments are certain functions of gamma functions has been developed. We state the main result as a theorem, and we shall use it in determining the (asymptotic) distribution of the likelihood-ratio criterion.

Theorem 3: Let the random variable $W$ ($0 \leq W \leq 1$) have the $h$th moment

$E[W^h] = K \left( \frac{\prod_j y_j^{\,y_j}}{\prod_k x_k^{\,x_k}} \right)^h \frac{\prod_k \Gamma(x_k(1+h) + \xi_k)}{\prod_j \Gamma(y_j(1+h) + \eta_j)}$ (70)

where $\sum_k x_k = \sum_j y_j$ and $K$ is a constant so that $E[W^0] = 1$. For an arbitrary $\rho$ we set

$\epsilon_k = (1-\rho)\, x_k + \xi_k$ (71)

$\delta_j = (1-\rho)\, y_j + \eta_j$ (72)

The first three Bernoulli polynomials are

$B_1(x) = x - \tfrac{1}{2}$ (73)

$B_2(x) = x^2 - x + \tfrac{1}{6}$ (74)

$B_3(x) = x^3 - \tfrac{3}{2} x^2 + \tfrac{1}{2} x$ (75)

We define the degrees of freedom $f$ and the coefficients $\omega_r$ from the $x_k$, $y_j$, $\xi_k$, $\eta_j$, and the Bernoulli polynomials (76), (77); the explicit expressions may be found in [16]. If we select $\rho$ so that $\omega_1 = 0$, we have

$P\{-2\rho \ln W \leq z\} \simeq P\{\chi^2(f) \leq z\} + \omega_2 \left[ P\{\chi^2(f+4) \leq z\} - P\{\chi^2(f) \leq z\} \right]$ (78)

Proof: See [16].

Theorem 4: Let

$X \in W_C(p, n, \Sigma)$ (79)

$Y \in W_C(p, m, \Sigma)$ (80)

be independent complex Wishart distributed matrices. Then for the likelihood-ratio criterion $Q$ in (61) and $h \geq 0$ (81) we have

$E[Q^h] = \left( \frac{(n+m)^{p(n+m)}}{n^{pn} m^{pm}} \right)^h \frac{\Gamma_p(n+m)}{\Gamma_p(n)\, \Gamma_p(m)} \cdot \frac{\Gamma_p(n(1+h))\, \Gamma_p(m(1+h))}{\Gamma_p((n+m)(1+h))}$ (82)

Proof: The joint frequency function of $X$ and $Y$ is the product of the two Wishart densities (83); the moment is obtained by integrating $Q^h$ against this density, and evaluation of this integral gives the desired result.

By means of the two previous theorems, we are now able to state the important result on the (asymptotic) distribution of the likelihood-ratio criterion.

Theorem 5: We consider the likelihood-ratio criterion $Q$ in (61) (84) and define

$f = p^2$ (85)

$\rho = 1 - \frac{2p^2 - 1}{6p} \left( \frac{1}{n} + \frac{1}{m} - \frac{1}{n+m} \right)$ (86)

$\omega_2 = -\frac{p^2}{4} \left( 1 - \frac{1}{\rho} \right)^2 + \frac{p^2(p^2-1)}{24} \left( \frac{1}{n^2} + \frac{1}{m^2} - \frac{1}{(n+m)^2} \right) \frac{1}{\rho^2}$ (87)

Then

$P\{-2\rho \ln Q \leq z\} \simeq P\{\chi^2(f) \leq z\} + \omega_2 \left[ P\{\chi^2(f+4) \leq z\} - P\{\chi^2(f) \leq z\} \right]$ (88)

Proof: Omitted; straightforward but cumbersome calculations.

We now address the block diagonal case and state the main result in Theorem 6.

Theorem 6: Let the situation be as described in Sections A–E, and define for $j = 1, \ldots, k$ the block sizes $p_j$ and

$f = \sum_{j=1}^{k} p_j^2$ (89)

with $\rho$ and $\omega_2$ defined analogously to (86) and (87), now with the sums running over the blocks (90)–(93); the explicit expressions follow from Theorem 3. Then the asymptotic distribution of the likelihood-ratio criterion is given by

$P\{-2\rho \ln Q \leq z\} \simeq P\{\chi^2(f) \leq z\} + \omega_2 \left[ P\{\chi^2(f+4) \leq z\} - P\{\chi^2(f) \leq z\} \right]$ (94)

Proof: Omitted; straightforward but cumbersome calculations.

REFERENCES

[1] F. T. Ulaby, R. K. Moore, and A. K. Fung, Microwave Remote Sensing: Active and Passive.
Norwood, MA: Artech House, 1986, vol. 3. [2] H. Skriver, M. T. Svendsen, and A. G. Thomsen, “Multitemporal L- and C-band polarimetric signatures of crops,” IEEE Trans. Geosci. Remote Sensing, vol. 37, pp. 2413–2429, Sept. 1999. [3] J. J. van Zyl and F. T. Ulaby, “Scattering matrix representation for simple targets,” in Radar Polarimetry for Geoscience Applications, F. T. Ulaby and C. Elachi, Eds. Norwood, MA: Artech House, 1990. [4] N. R. Goodman, “Statistical analysis based on a certain multivariate complex Gaussian distribution (an introduction),” Ann. Math. Stat., vol. 34, pp. 152–177, 1963. [5] S. V. Nghiem, S. H. Yueh, R. Kwok, and F. K. Li, “Symmetry properties in polarimetric remote sensing,” Radio Sci., vol. 27, no. 5, pp. 693–711, 1992. [6] A. A. Nielsen, R. Larsen, and H. Skriver, “Change detection in bi-temporal EMISAR data from Kalø, Denmark, by means of canonical correlations analysis,” in Proc. 3rd Int. Airborne Remote Sensing Conf. and Exhibition, Copenhagen, Denmark, July 7–10, 1997. [7] A. A. Nielsen, “Change detection in multispectral bi-temporal spatial data using orthogonal transformations,” in Proc. 8th Australasian Remote Sensing Conf., Canberra, ACT, Australia, Mar. 25–29, 1996. [8] A. A. Nielsen and K. Conradsen, “Multivariate alteration detection (MAD) in multispectral, bi-temporal image data: A new approach to change detection studies,” Dept. Mathematical Modelling, Tech. Univ. Denmark, Lyngby, Denmark, Tech. Rep. 1997-11, 1997. [9] A. A. Nielsen, K. Conradsen, and J. J. Simpson, “Multivariate alteration detection (MAD) and MAF post-processing in multispectral, bi-temporal image data: New approaches to change detection studies,” Remote Sens. Environ., vol. 64, pp. 1–19, 1998. [10] A. A. Nielsen, “Multi-channel remote sensing data and orthogonal transformations for change detection,” in Machine Vision and Advanced Image Processing in Remote Sensing, I. Kanellopoulos, G. G. Wilkinson, and T. Moons, Eds. Berlin, Germany: Springer, 1999, pp. 37–48.
[11] J. Schou, H. Skriver, K. Conradsen, and A. A. Nielsen, “CFAR edge detector for polarimetric SAR images,” IEEE Trans. Geosci. Remote Sensing, vol. 41, pp. 20–32, Jan. 2003.
[12] S. N. Madsen, E. L. Christensen, N. Skou, and J. Dall, “The Danish SAR system: Design and initial tests,” IEEE Trans. Geosci. Remote Sensing, vol. 29, pp. 417–476, May 1991.
[13] E. L. Christensen, N. Skou, J. Dall, K. Woelders, J. H. Jørgensen, J. Granholm, and S. N. Madsen, “EMISAR: An absolutely calibrated polarimetric L- and C-band SAR,” IEEE Trans. Geosci. Remote Sensing, vol. 36, pp. 1852–1865, Nov. 1998.
[14] R. Touzi, A. Lopes, and P. Bousquet, “A statistical and geometrical edge detector for SAR images,” IEEE Trans. Geosci. Remote Sensing, vol. 26, pp. 764–773, Nov. 1988.
[15] A. Lopes, E. Nezry, R. Touzi, and H. Laur, “Structure detection and statistical adaptive speckle filtering in SAR images,” Int. J. Remote Sens., vol. 13, no. 9, pp. 1735–1758, 1993.
[16] T. W. Anderson, An Introduction to Multivariate Statistical Analysis, 2nd ed. New York: Wiley, 1984.
[17] C. G. Khatri, “Classical statistical analysis based on a certain multivariate complex Gaussian distribution,” Ann. Math. Stat., vol. 36, pp. 98–114, 1965.
[18] S. A. Andersson, H. K. Brøns, and S. T. Jensen, “Distribution of eigenvalues in multivariate statistical analysis,” Inst. Mathematical Statistics, Univ. Copenhagen, Copenhagen, Denmark, 1982.
[19] G. Gabrielsen, “Fordeling af egenværdier i reelle, komplekse og kvaternion Wishart fordelinger” (Distribution of eigenvalues in real, complex, and quaternion Wishart distributions), M.S. thesis (in Danish), Inst. Mathematical Statistics, Univ. Copenhagen, Copenhagen, Denmark, 1975.
[20] H. H. Andersen, M. Højbjerre, D. Sørensen, and P. S. Eriksen, Linear and Graphical Models for the Multivariate Complex Normal Distribution. Berlin, Germany: Springer-Verlag, 1995, vol. 101, Lecture Notes in Statistics.
[21] G. E. P. Box, “A general distribution theory for a class of likelihood criteria,” Biometrika, vol. 36, pp. 317–346, 1949.
Knut Conradsen received the M.S. degree from the Department of Mathematics, University of Copenhagen, Copenhagen, Denmark, in 1970. He is currently a Professor with Informatics and Mathematical Modelling (IMM), Technical University of Denmark (DTU), Lyngby, Denmark. Since 1995, he has been Deputy Rector (or Vice President) of DTU. His main research interest is the application of statistics and statistical models to real-life problems. He has worked on many national and international projects on the application of statistical methods in a wide range of applications, recently mainly in remote sensing. He has also conducted extensive studies in the development and application of mathematical and statistical methods for spatial, multi/hyperspectral, and multitemporal data, including both optical and radar data.

Allan Aasbjerg Nielsen received the M.S. degree from the Department of Electrophysics, Technical University of Denmark, Lyngby, Denmark, in 1978, and the Ph.D. degree from Informatics and Mathematical Modelling (IMM), Technical University of Denmark, in 1994. He is currently an Associate Professor with IMM, working in IMM's Section for Geoinformatics. He was with the Danish Defense Research Establishment from 1977 to 1978 and worked on energy conservation in housing at the Thermal Insulation Laboratory, Technical University of Denmark, from 1978 to 1985. Since 1985, he has been with the Section for Image Analysis, IMM. He has worked on several national and international projects on the development, implementation, and application of statistical methods and remote sensing in mineral exploration, mapping, geology, environment, oceanography, geodesy, and agriculture, funded by industry, the European Union, Danida (the Danish International Development Agency), and the Danish National Research Councils.

Jesper Schou received the M.S. degree in engineering and the Ph.D.
degree in electrical engineering from the Technical University of Denmark, Lyngby, Denmark, in 1997 and 2001, respectively. His primary research interests during his Ph.D. studies were image analysis and the processing of polarimetric SAR data, including filtering, structure detection, and segmentation. He is currently working with 3-D scanner systems and related 3-D image analysis algorithms.

Henning Skriver received the M.S. and Ph.D. degrees from the Technical University of Denmark, Lyngby, in 1983 and 1989, respectively, both in electrical engineering. He has been with the Section of Electromagnetic Systems (EMI), Department Ørsted, Technical University of Denmark (DTU), Lyngby, since 1983, where he is currently an Associate Professor. His work has been concerned primarily with various topics related to the utilization of SAR data for different applications. From 1983 to 1992, his main area of interest was retrieval of sea ice parameters from SAR data, including SAR data from ERS-1. Since 1992, he has covered different aspects of land applications of SAR data, such as forestry in the MAESTRO-1 project, and agricultural and environmental applications using both satellite SAR data and data from the Danish airborne polarimetric SAR, EMISAR. His interests also include various methods for processing of SAR data, such as SAR image simulation, SAR image filtering, speckle statistics, texture analysis, segmentation, calibration, and polarimetric analysis. He is currently a Project Manager for a project concerning the use of SAR data for cartographic mapping.
Using the LZW to Detect Primitives
Lars Reng

[Pages 72–76: the text of this paper did not survive extraction; the character encoding is garbled. Recoverable fragments: an input sequence (001010210210212021021200) coded with an LZ77-style sliding window (search buffer / lookahead buffer) of size 2; four LZ77 codewords C1–C4; an LZ78 dictionary-coding example with dictionary phrases 0, 01, 2, 1, 21, 210, 2101, 21012, 21011 and outputs (0,0) (1,1) (0,2) (0,1) (3,1) (5,0) (6,1) (7,2) (7,1); a discussion comparing LZ77, LZ78, and LZW; and references to J. Ziv and A. Lempel, IEEE Trans. Inform. Theory, vol. 23 (1977), pp. 337–343, and vol. 24 (1978), pp. 530–536, and T. Welch, "A Technique for High-Performance Data Compression," IEEE Computer, 1984.]

Medical image sequence coding
Mo Wu and Søren Forchhammer
Research Center COM, Technical University of Denmark

Abstract: Wavelet-based methods have become popular for two-dimensional (2D) image compression. They have also been used for 3D image sequence compression, especially for medical image sequences. Our contribution in this paper is a method to compress 4D functional Magnetic Resonance Imaging (fMRI) data (16 bits per pixel) in a 3D setting. To achieve lossless compression and improve the compression ratio, different reversible integer-to-integer wavelet transforms have been tried out. In the entropy coding part, we use context-based adaptive arithmetic coding and extend the 2D contexts to 3D by including information from the slices in the third dimension. The new 3D contexts increase the compression ratio. A Head Object Oriented Lossless Compression (HOOLC) scheme for 4D fMRI data is also introduced, which can also be applied to other types of objects in lossless compression.

Index Terms - fMRI, wavelet based image coding, medical image compression, lossless compression, adaptive arithmetic coding, HOLC, HOOLC.

1. Introduction

Medical ultrasound data, Computed Tomography (CT) data and functional Magnetic Resonance Imaging (fMRI) data may have very large volumes and can only be transmitted efficiently by using compression.
They comprise tremendous volumes of images with a high level of noise as well as high correlation between the images. Wavelet-based image sequence coding can decrease the volume of the data significantly and also provides progressive coding. Based on the lifting scheme [6], integer-to-integer wavelet transforms can be constructed [5], making efficient lossless compression possible, e.g. for medical image compression. The fMRI data we consider is 4D data. In the 3D spatial domain, it is composed of slices at different positions; including time as the fourth dimension, the data become 4D. Functional MR data has its own properties compared with e.g. video sequences. It has much more noise than normal video; the noise comes from the capturing and processing in the acquisition device. So even in areas of the image which are supposed to be (almost) zero, there may be unexpected values. Viewing the images with the naked eye, we sometimes cannot tell the difference between two slices at different times. Motion prediction [12], the popular technique used in video coding, is not suitable under these conditions. Lossless compression is very important for medical applications, because subtle differences can give us information about the patients. But on the other hand, many bits are most likely of little importance. So, for medical data compression, traditional methods need to be revised. First we use 3D wavelet-based compression for the 4D dataset (64×64×49×32), which has been reformed to a 3D setting (448×448×32) by aligning the slices at different positions and the same time into one large slice [see Fig. 1]. Here the data is 2 bytes per pixel. Thereafter optimized context-based coding is used for the 4D data compression.

Figure 1: 4D fMRI medical image sequence data in 3D representation

A popular wavelet-based embedded image coder is Set Partitioning In Hierarchical Trees (SPIHT) [10], by Said and Pearlman, which improves upon the Embedded Zerotree Wavelet (EZW) [11].
The coding efficiency comes from exploiting interband dependencies between the wavelet coefficients. It assumes that the significance status of one region in a subband is highly correlated with that of its parent node. As a consequence, if the status of descendant subband coefficients does not depend on the status of the lower bands, the cost will be higher. 3D SPIHT [19] was developed for 3D data compression. Another popular coder is context-based layered zero coding [13], which is the core element of the JPEG2000 [1] image compression standard. Compared with zerotree-structured coding [10], [11], it concentrates on pursuing the dependencies inside the individual subbands. The contexts of the zero coding will be refined for the 3D structure. The type of data is important information when performing the compression: we can use this extra information to identify the region of interest in the images and preprocess the data to improve the compression ratio. The rest of this paper is organized as follows: The 3D packet integer to integer wavelet transform is covered in Section 2. Section 3 describes 3D context based adaptive arithmetic block coding, concentrating on how to extend the 2D contexts to 3D. Section 4 presents the object based 3D wavelet coding, which combines the 3D contexts with object information to increase compression of the 4D datasets. Experimental results, for both the lossy and the lossless case, are given in Section 5. Conclusions are drawn in Section 6.

2. 3D packet integer to integer wavelet transform

Integer to integer lifting is used to construct reversible 1D wavelet transforms. The 3D packet [3] integer to integer transform is constructed from the 1D wavelet transforms.

2.1. Reversible integer to integer wavelet transform

The traditional (discrete) wavelet transform decomposes a sequence of data into subbands by using low pass and high pass filters. The subbands can be further decomposed to decorrelate the data.
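One of the reversible transforms used later in this section, the (2, 2) integer lifting filter (5/3 type), makes the idea concrete. The sketch below (Python with NumPy, not the authors' code; names are ours) shows one decomposition level as a predict step producing the detail (high) band and an update step producing the smooth (low) band, both with integer rounding, and an inverse that undoes the rounding exactly.

```python
import numpy as np

def forward_22(x):
    """One level of the (2, 2) integer lifting transform (5/3 type).
    x: 1D integer array of even length. Returns (low, high) bands."""
    even, odd = x[0::2], x[1::2]
    # Predict: detail = odd sample minus rounded average of even neighbours.
    right = np.append(even[1:], even[-1])      # replicate edge sample
    high = odd - (even + right) // 2           # floor division keeps integers
    # Update: smooth = even sample plus rounded correction from the details.
    left = np.insert(high[:-1], 0, high[0])
    low = even + (left + high + 2) // 4
    return low, high

def inverse_22(low, high):
    """Exact inverse: undo the update, then the predict, then interleave."""
    left = np.insert(high[:-1], 0, high[0])
    even = low - (left + high + 2) // 4
    right = np.append(even[1:], even[-1])
    odd = high + (even + right) // 2
    x = np.empty(low.size + high.size, dtype=low.dtype)
    x[0::2], x[1::2] = even, odd
    return x

data = np.array([7, 3, 5, 5, 9, 12, 300, 298], dtype=np.int32)
low, high = forward_22(data)
assert np.array_equal(inverse_22(low, high), data)   # perfect reconstruction
```

Any transform built this way is lossless, because every rounded quantity is recomputed bit-exactly in the inverse; this is also why the non-integer scaling steps can be dropped for lossless coding, as discussed below.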
The original medical data is represented by integer numbers (of finite precision). Using normal wavelet transforms, such as the biorthogonal Daubechies 9/7, the output wavelet coefficients will be real numbers, which cannot be coded efficiently without loss. But for medical applications, even minor errors in the medical data may result in serious problems. Thus, efficient lossless compression requires an integer to integer wavelet transform. The lifting technique is a good way to construct integer to integer wavelet transforms [5], [6]. Every lifting step can be rounded to integer representation, and the process remains reversible. Thus the wavelet transform can be converted to an integer transform if it is constructed from integer lifting steps. There are many integer transforms [14-16] that can be used. Here we use the filters (2, 2), (2, 4), (4, 2), (4, 4), (6, 2), (2+2, 2), 2/6, 9/7-F and 9/7. These lifting scheme filters do not include the scaling part of the forward transform, except for 9/7. The scaling part [3] can be split into two integer lifting steps while keeping the transform reversible. The scaling is used to make the transform unitary, but the transform can also be applied without it in lossless compression, making the transformation simpler and more robust under lossy conditions, because the more integer lifting steps there are, the more the (rounding) error will spread from the code to the reconstructed images. For example, the lowest band of a 4-level 2D dyadic wavelet transform [9] combined with a 2-level packet transform [3] in the third dimension will experience 2 scalings in the third dimension and 4 scalings in the other two dimensions (see below), so the number of extra integer lifting steps will be 2×2×4 = 16. The rounding process at each step will degrade the quality of the reconstructed images when truncating for lossy coding.

2.2.
3-D packet wavelet transform

We perform the spatial 2D dyadic wavelet transform for each slice and then the wavelet transform in the third dimension. This differs from the 3D dyadic wavelet transform when the number of decomposition levels in the spatial domain is more than one. The order of the steps in the transformation is shown in Fig. 2.

Figure 2: The order of the wavelet transform in the three dimensions (row transform, column transform, time transform) for (a) the 3D dyadic transform and (b) the 3D packet transform

Different wavelets are good at decorrelating different textures. The relationship between neighboring pixels in the spatial domain is different from that in the time domain, so a different wavelet can be used for the spatial domain than for the time domain.

3. 3D Context based adaptive arithmetic block coding

In the 3D context based adaptive arithmetic block coding, we perform the entropy coding of the wavelet coefficients based on the Zero Coding (ZC), Sign Coding (SC) and Magnitude Refinement (MR) primitives, which are the core of the Embedded Block Coding with Optimized Truncation (EBCOT) [13] technique for image coding. The basic coding contexts are extended to 3D by including the status from the front slice and the back slice and the most current status from the front point in the third dimension. The main target is finding the contexts that achieve the highest compression ratio.

3.1. Zero coding, Sign coding, Magnitude refinement

We perform the entropy coding of the wavelet coefficients using the ZC, SC and MR primitives and thereafter include context information from the front slice and the back slice. There are 17 contexts in total in the normal 2D context model.

3.2. Extension to 3-D contexts

Based on the 2D context model, we extend the contexts to 3D by including the information from the front slice and the back slice.
We use a Normalized Least-Mean-Square (NLMS) algorithm [2] [see Fig. 3] to train α_i, 1 ≤ i ≤ 18, i ∈ Z, to predict a wavelet coefficient from its 18 neighbors in these two slices. The estimator [3] is defined as

Δ = Σ_{ω_i ∈ S} α_i ω_i,

where the ω_i are the wavelet coefficients in the front and the back slice.

Figure 3: Adaptive filter structure for linear prediction. w(n) are the wavelet coefficients; d(n) is the current wavelet coefficient we need to predict; d̂(n) is the predicted value; e(n) is the error in prediction.

The NLMS update of the weights is

α_{k+1} = α_k + μ e_k ω_k / Σ_{m=1}^{L} ω_{m,k}²

We train the α_i separately for the different subbands in the third dimension and thereafter simply use the mean over the subbands as α_i. Using linear prediction would (implicitly) assume the signal is stationary; unfortunately, this is not the case, and the α_i do not converge to one value but rather fluctuate around some value. So the mean value is used for an approximate division of the original contexts. The estimator is quantized into 2 or 4 intervals; when we further extend these 3D contexts, we use only 2 intervals. This prediction is only used to extend the zero coding contexts, because it has almost no relationship with magnitude refinement and none with the sign.

3.3. Scan order in coding the blocks

In the coding process, we use a block coding method, but without making the coding of each block independent of the other blocks, not only because we want to extend the contexts to 3D, but also because we can use information from the other blocks. If we have more information available, we can predict the coded value more accurately.
So, in this situation, we want to initialize the counts of the statistics for the adaptive arithmetic coding by inheriting the counts from the most similar neighbor block. One way to evaluate the similarity is the Kullback-Leibler distance [7], [8] between the coded block and its neighbour block: the blocks with the smallest distance have the most similar statistics. The average Kullback-Leibler distance D for this model is defined as:

D = Σ_c (n_0^(c) + n_1^(c)) d_c / Σ_c (n_0^(c) + n_1^(c))

d_c = Σ_{k=0}^{1} q_k^(c) log_2 ( q_k^(c) / p_k^(c) )

p_k, q_k = (n_k + δ) / (n_0 + n_1 + 2δ)

where d_c is the Kullback-Leibler distance for one context, with label c and a binary distribution, k = 0, 1. The probability function for the first block is p_k^(c); for the second one it is q_k^(c). The statistics of the first block are passed on to the second block, which uses them for the arithmetic coding. Thus, the distance D we calculate for predicting the arithmetic coding performance uses the counts n_0^(c) and n_1^(c) of every context in the second block as weights when averaging the per-context Kullback-Leibler distances, to get a more accurate prediction. From the definition of the distance, we can see that it is always nonnegative and equals zero if the probability distributions of all of the contexts are the same. Experiments show that the distance to the neighbouring block in the same subband in the third dimension (time) is the smallest, followed by the neighbour block in the same spatial subband. This is easy to understand from another point of view: the medical image sequences have high correlation between neighbour slices at different positions, but even higher correlation between neighbour frames in time. The decomposition in the time dimension turns it into multi-resolution subbands, but the stillness of the frames makes the wavelet coefficients in the same band generally the same.
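The count-weighted distance above can be computed directly from the per-context (n0, n1) counters of two blocks. A minimal sketch (illustrative Python, our names; δ is a smoothing constant for empty contexts, whose value the paper does not specify):

```python
import numpy as np

def avg_kl_distance(counts_p, counts_q, delta=0.5):
    """Average Kullback-Leibler distance D between two code blocks.
    counts_p, counts_q: (C, 2) arrays of per-context (n0, n1) counts
    for the first and second block; rows are contexts c, columns k = 0, 1."""
    p = (counts_p + delta) / (counts_p.sum(axis=1, keepdims=True) + 2 * delta)
    q = (counts_q + delta) / (counts_q.sum(axis=1, keepdims=True) + 2 * delta)
    d_c = np.sum(q * np.log2(q / p), axis=1)   # per-context distance d_c
    w = counts_q.sum(axis=1)                   # weights: counts of the 2nd block
    return float(np.sum(w * d_c) / np.sum(w))

a = np.array([[30, 2], [10, 10], [0, 5]])
b = np.array([[28, 4], [11, 9], [1, 4]])
assert avg_kl_distance(a, a) == 0.0            # identical statistics
assert avg_kl_distance(a, b) > 0.0             # distance is nonnegative
```

In the coder, this distance would be evaluated against each candidate neighbour block, and the statistics of the closest one inherited as the initial counts.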
In the same 3D subband, neighbouring blocks in the third (time) dimension are coded first; thereafter we move to the next neighbouring block in the 2D spatial domain and code all the blocks in the same way. After finishing the coding of one 3D subband, we move to the next subband along the direction shown in Figure 4. After finishing the first packet, we move to the second one and use the same scan order as in the first. After all points have been scanned for one pass, the next round for the next bit-plane starts. Scanning in this way also supports progressive coding.

Figure 4: The order in block coding

3.4. Initialization of the contexts

We optimize the method of initializing the counts of the statistics table by comparing the code lengths obtained in different ways. In order to make the adaptive arithmetic coding [9], [17] adaptive and localized for small regions, we use the same statistics table within one subband, that is, within a 3D spatial subband including at least one slice. The counts of a new subband at the same resolution are initialized with the scaled old counts from the former subband, calculated as

new count = ⌈ (old count × λ) / (number of blocks in the last subband) ⌉

where λ is a factor controlling the volume of the initialized counts. In subbands at a different resolution or a different bit-plane, the counts of the new statistics are forced to zero.

3.5. Further extension of the 3-D context

As stated before, the average Kullback-Leibler distance between neighbour slices in the time direction is small. Moreover, pixels already coded at one bit-plane can be used as information when coding the same position in the next slice. The status of a pixel at one bit-plane has two types, so we can extend the number of zero coding contexts to 9×2×2 = 36. This information can also be used to extend the magnitude refinement coding contexts and the sign coding contexts.
They will have 3×2 = 6 and 5×2 = 10 contexts, respectively.

4. Object oriented 3D context based coding

The fMRI data we used consists of image slices of a human head. The head object is located at the center of the images, and it has a blurred boundary between object and background. There is a lot of noise in the background. In medical treatment, the head object is the most important for diagnosis, so compression may be increased if only the head object information is coded. But we should find a fully safe way to strip the background without removing any part of the head. We call lossless compression of the head object Head Object Lossless Compression (HOLC). Another object oriented method is lossless compression of the whole dataset, but using different statistics for the different objects. We call it Head Object Oriented Lossless Compression (HOOLC).

4.1. HOLC

To demonstrate the concept, we use a simple (bounding) rectangle to mask the head object. In deciding the size of the rectangle, we use different thresholds to test the acceptable boundary between the head and the background. Because the gray level of the bone is much larger than that of the margin, it is easy to find the boundary in slices at the back of the head, but difficult in slices at the front of the face. Thus we test different thresholds until it is safe to force the margin to zero. We code the preprocessed dataset and combine it with the mask information known from the preprocessing. The mask information is processed by transforming the mask from the 2D spatial domain to the 2D wavelet domain according to the 2D dyadic decomposition level. Because the wavelet decompositions lower the resolution by known scales, it is easy to predict the boundary in the wavelet domain without actually applying the wavelet transform.
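The effect of the dyadic decompositions on a mask can be mimicked by plain 2×2 pooling, which is all the boundary prediction needs. A sketch (Python, our names; a real implementation would additionally dilate the mask to cover the support of longer filters):

```python
import numpy as np

def mask_in_wavelet_domain(mask, levels):
    """Follow a binary object mask down `levels` dyadic decompositions
    without applying the wavelet transform itself: a low-band sample is
    marked as object if any pixel in its 2x2 block was object."""
    m = mask.astype(bool)
    for _ in range(levels):
        # OR-pool non-overlapping 2x2 blocks (assumes even side lengths).
        m = m[0::2, 0::2] | m[0::2, 1::2] | m[1::2, 0::2] | m[1::2, 1::2]
    return m

full = np.zeros((16, 16), dtype=bool)
full[4:12, 4:12] = True                  # a centred square "object"
coarse = mask_in_wavelet_domain(full, 2)
assert coarse.shape == (4, 4)            # resolution halved twice
assert coarse[1, 1] and not coarse[0, 0]
```

The OR-pooling is deliberately conservative: it can only grow the object region at coarser scales, never cut into it, matching the requirement that no part of the head be stripped.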
The transformed mask is only used to differentiate the boundary in the wavelet domain, and the head object is coded separately from the almost-zero background (the head edge spreads into the background due to long filters and higher decomposition levels). An ellipse-shaped mask has also been tested for HOLC. It can be very effective for some slices using a high threshold, but it cuts off part of the head object in other slices. In order to make it safe for all the slices, the ellipse would have to be very large, and the area of the object would be larger than with the simple rectangular mask. So it was not used for the HOLC method.

4.2. HOOLC

In HOOLC mode, the ellipse mask is a good choice, because almost all the head slices have shapes similar to an ellipse, so it matches the boundary more accurately than a rectangle. Because both the head object and the background object are compressed, it is tolerable to make mistakes in masking the head. Also, the ellipse shape is very easy to code: only one center coordinate and two axis lengths are needed, and slices at the same position share the same values. By this method, we separate the statistics of the two objects. Because the textures of the two objects are different, their statistics can be very different, and the clear classification can increase the compression ratio. In order to get the highest compression ratio, a good threshold needs to be chosen to separate the different objects, or in other words, the different textures. The same coding method is applied as in HOLC except for the preprocessing. The ellipse mask and its transformed version [see Figure 5] are used for separating the coding content.

Figure 5: One slice of the image sequence, the ellipse mask with threshold equal to 50, and the mask after four levels of decomposition

So HOOLC can fully support lossless compression by compressing the head object and the background separately.
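The compactness of the ellipse description can be seen in code: a centre coordinate and two axis lengths fully determine the mask. A sketch (Python; the centre, axes, and image size below are made-up illustration values, not the paper's):

```python
import numpy as np

def ellipse_mask(shape, center, axes):
    """Boolean mask of an axis-aligned ellipse: True inside the head
    region, False in the background. Only `center` (cy, cx) and
    `axes` (ay, ax) need to be transmitted to the decoder."""
    rows, cols = np.ogrid[:shape[0], :shape[1]]
    cy, cx = center
    ay, ax = axes
    return ((rows - cy) / ay) ** 2 + ((cols - cx) / ax) ** 2 <= 1.0

mask = ellipse_mask((64, 64), center=(32, 32), axes=(28, 22))
img = np.random.default_rng(1).integers(0, 2048, size=(64, 64))
head, background = img[mask], img[~mask]       # two streams, two statistics
assert head.size + background.size == img.size
```

Each stream would then drive its own adaptive arithmetic-coding statistics, matching the HOOLC idea of keeping the two textures apart.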
HOOLC first codes the head object and then the background, so compared to traditional wavelet-based coding it provides an extra dimension of progressive coding. HOOLC can also support lossless reconstruction of the head object combined with almost lossless reconstruction of the background. This gives flexibility to the doctors, for example when used in telemedicine [18].

5. Experiment results

In the experiments, lossless compression of a dataset of dimensions 448×448×32 (16 bits per pixel) was performed. The dynamic range is larger than in normal 8-bit data sets; the maximum value requires 11 bits to be represented. As the data set is mainly used for medical analysis and treatment, we concentrate on testing lossless compression (2D context, 3D context and HOOLC) and HOLC with different reversible wavelet transforms. The lossy compression results were compared with 3D-SPIHT. For HOOLC, an optimal threshold was determined for the (2, 2) reversible wavelet transform.

5.1. Lossless compression and HOLC with different filters

The lossless compression performance of the proposed coders was evaluated with different integer filters. The results [see Table 1] show that extending the 2D contexts to 3D increases the lossless compression ratio. The best filter for the head fMRI sequence is 2/6, followed by (2, 4) and (2, 2). The hybrid filters can also give good results, and 2D 2/6 combined with (2, 4) in the third dimension has the best performance in the 2D context coding. As mentioned before, HOLC is only lossless for the head object; its results show that many bits are otherwise used to code the noisy background.

Table 1: Lossless compression results with different encoders and reversible filters

Filter                   | 2D context | 3D context | HOOLC (T=50) | HOLC (T=100)
(2, 2)                   | 4.393      | 4.334      | 4.305        | 2.686
(2, 4)                   | 4.386      | 4.329      | 4.299        | 2.734
(4, 2)                   | 4.445      | 4.386      | 4.356        | 2.837
(2+2, 2)                 | 4.439      | 4.384      | 4.354        | 2.816
(4, 4)                   | 4.421      | 4.362      | 4.332        | 2.843
(6, 2)                   | 4.471      | 4.413      | 4.383        | 2.892
9/7                      | 4.428      | 4.355      | 4.324        | 2.903
9/7-F                    | 4.419      | 4.346      | 4.315        | 2.872
2/6                      | 4.384      | 4.292      | 4.262        | 2.585
2D (2, 2) - 3rd D (2, 4) | 4.387      | 4.330      | 4.300        | 2.685
2D (2, 2) - 3rd D 2/6    | 4.397      | 4.327      | 4.298        | 2.681
2D (2, 4) - 3rd D (2, 2) | 4.391      | 4.333      | 4.303        | 2.735
2D (2, 4) - 3rd D 2/6    | 4.396      | 4.326      | 4.297        | 2.730
2D 2/6 - 3rd D (2, 2)    | 4.378      | 4.299      | 4.270        | 2.590
2D 2/6 - 3rd D (2, 4)    | 4.373      | 4.294      | 4.265        | 2.589

5.2. Threshold choice in lossless 3D object based medical image compression

In HOOLC, different thresholds from 10 to 140 have been tested; the result is shown in Figure 6. The highest compression ratio is obtained when the threshold equals 50, and the mask can increase the compression ratio by classifying the different textures of the head object and the background.

Figure 6: Lossless compression results with different thresholds, using the (2, 2) reversible wavelet transform

Figure 7: Rate-distortion comparison between 3D context adaptive arithmetic coding and 3D-SPIHT in lossy compression

But this mask is larger than the clear boundary of the head. One explanation is that near the boundary there are still some areas having a texture similar to that of the head but with a lower gray scale.

5.3. Lossy compression

Lossy compression was constructed based on the 3D context coding with a real-to-real wavelet transform, which is also constructed by lifting steps [6] and has scaling to make it unitary. The results were compared with 3D-SPIHT [see Figure 7]; the performance is better than 3D-SPIHT. At high bit-rates, the performance is also better than 3D-SPIHT, but the gain is not as obvious as at low bit-rates.
The integer-to-integer transform based coding can also support progressive reconstruction, but its rate-distortion performance is worse than that of the real-to-real wavelet transform, because of rounding errors at every lifting step. If integer-to-integer transform based lossy compression is needed, filters with fewer lifting steps are preferable.

6. Conclusion

This paper gives an overview of four successively refined 3D wavelet-based coders for 4D medical image sequences, going from a 2D context to a 3D context and then to object-based coders. The performance improves with each refinement of the contexts and the object classification. Filter 2/6 has the best performance in compressing head fMRI data, and hybrid filters can also be used. The lossy compression performance is better than that of 3D-SPIHT. A sound extension from 2D contexts to 3D has been given, and the Kullback-Leibler distance was used to explain the similarity between the different statistics. HOLC is a coder supporting region-of-interest (ROI) coding. HOOLC is a safe way to compress fMRI head data and it increases the compression ratio. It also provides an extra dimension for progressive reconstruction, focusing on the object first and then the background, and it can increase the compression ratio further by discarding some unusable bits in the background.

References

[1] D. S. Taubman and M. W. Marcellin, JPEG2000: Image Compression Fundamentals, Standards and Practice, p. 356.
[2] S. C. Douglas, "A family of normalized LMS algorithms," IEEE Signal Processing Letters, vol. 1, no. 3, March 1994.
[3] Z. Xiong and X. Wu, "Lossy-to-lossless compression of medical volumetric data using three-dimensional integer wavelet transforms," IEEE Transactions on Medical Imaging, vol. 22, no. 3, March 2003.
[4] M. Antonini, M. Barlaud, P. Mathieu, and I. Daubechies, "Image coding using wavelet transform," IEEE Trans. on Image Processing, vol. 1, no. 2, pp. 205-220, 1992.
[5] M. D. Adams and F. Kossentini, "Reversible integer-to-integer wavelet transforms for image compression: performance evaluation and analysis," IEEE Trans. on Image Processing, vol. 9, no. 6, pp. 1010-1024, June 2000.
[6] I. Daubechies and W. Sweldens, "Factoring wavelet transforms into lifting steps," J. Fourier Anal. Appl., vol. 4, no. 3, pp. 247-269, 1998.
[7] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York: Wiley, 1991.
[8] H. Qian, "Relative entropy: free energy associated with equilibrium fluctuations and nonequilibrium deviations," 8 Jul 2000.
[9] K. Sayood, Introduction to Data Compression, second edition.
[10] A. Said and W. A. Pearlman, "A new fast and efficient coder based on set partitioning in hierarchical trees," IEEE Transactions on Circuits and Systems for Video Technology, pp. 243-250, June 1996.
[11] J. Shapiro, "Embedded image coding using zerotrees of wavelet coefficients," IEEE Trans. Signal Processing, vol. 41, pp. 3445-3463, Dec. 1993.
[12] L. Lu, Z. Wang, and A. C. Bovik, "Adaptive frame prediction for foveation scalable video coding."
[13] D. Taubman, "High performance scalable image compression with EBCOT," IEEE Trans. Image Processing, vol. 9, pp. 1158-1170, July 2000.
[14] R. C. Calderbank, I. Daubechies, W. Sweldens, and B.-L. Yeo, "Wavelet transforms that map integers to integers," J. Appl. Comput. Harmon. Anal., vol. 5, pp. 332-369, 1998.
[15] A. Bilgin, G. Zweig, and M. W. Marcellin, "Three-dimensional image compression using integer wavelet transforms," Appl. Opt.: Inform. Proc., vol. 39, pp. 1799-1814, Apr. 2000.
[16] W. Sweldens, "The lifting scheme: a construction of second generation wavelets," SIAM J. Math. Anal., vol. 29, pp. 511-546, 1997.
[17] "Telemedicine Today Magazine," Westlake Village, CA. — see [18] below. [17] G. Langdon, "An introduction to arithmetic coding," IBM J. Res. Dev., vol. 28, pp. 135-149, 1984.
[18] "Telemedicine Today Magazine," Westlake Village, CA.
[19] B.-J. Kim, Z. Xiong, and W. A.
Pearlman, "Low bit-rate scalable video coding with 3-D set partitioning in hierarchical trees (3-D SPIHT)," IEEE Trans. Circuits Syst. Video Technol., vol. 10, pp. 1365-1374, Dec. 2000.

Integrity Improvements in Classically Deformable Solids

Micky K. Christensen    Anders Fleron
Department of Computer Science, University of Copenhagen, DIKU

Abstract

The physically-based model for simulating elastically deformable objects, presented by Terzopoulos et al. in 1987, has significant problems with handling solids. We give a plausible explanation of the causes of the instability that leads to implosion of deformable solids. We propose an extension of the original model, with improvements that result in increased stability. Comparisons are made with the original method to illustrate this point. The improved model is suitable for interactive simulations of deformable solids, at only a small constant decrease in performance.

Keywords: Physics-based animation, simulation, dynamics, deformation, real time, deformable solids, continuum elastic model, constraints, variational calculus, finite differences

1 Introduction

Figure 1: Large deformation results in convincing material buckling.

Deformable models have gained increasing interest in recent years. Part of this success comes from a desire to interact with objects that resemble those in real life, all of which seem to be deformable at some level. The fact that CPUs and GPUs today are both advanced and powerful makes it possible to simulate and animate deformable bodies interactively. This paper builds on previous work by Christensen and Fleron [1], which focused on an implementation of a method for simulating elastically deformable objects, first put forward by Terzopoulos et al. in 1987 [9]. The implementation could simulate various types of deformable surfaces convincingly, but several instability issues were noticed with regard to simulating deformable solids.
As the authors of the original method claim that the model can simulate deformable curves, surfaces, and solids, we look into why the method was unable to simulate deformable solids satisfactorily. At first sight it seemed that the solids had a hard time keeping their integrity. Integrity preservation is important to give a realistic impression of a deformable solid. We try to solve the instabilities using concepts from the original framework, and propose an improved model that is able to simulate solids, as well as surfaces, without a significant decrease in overall performance. In section 2 we revisit the theory of elastically deformable models, with focus on solids, to give an overview of the method. The theory serves as a foundation for understanding the following sections. Section 3 reveals and explains the instabilities of the original model. Sections 4 and 5 focus on solutions to the local and global problems, respectively. In section 6 we present the results of adding the developed improvements to the model; comparison between the original and improved model is performed visually.

2 Elastically Deformable Solids

At the basis of the theory of deformable models lies elasticity theory. From this physical theory, Terzopoulos et al. have extrapolated a model that governs the movements of elastically deformable bodies [9]. In this paper we concentrate on the theory of deformable solids. A point in a solid is described by the intrinsic coordinates a = [a_1, a_2, a_3]. A deformable solid is thought of as having a natural rest state, in which no elastic energy is inherent. The rest state is described by r^0(a) = [r_1^0(a), r_2^0(a), r_3^0(a)], where the positional vector function r of the object is defined in Euclidean 3-space. When the solid is deformed, it takes on a different shape than its rest shape, and distances between nearby points are either stretched or compressed with the deformation.
This ultimately creates elastic energy, resulting in internal forces that seek to minimize the elastic energy. The deformation evolves over time and can therefore be described by the time-varying vector function r(a, t) = [r_1(a, t), r_2(a, t), r_3(a, t)]. The evolving deformation is independent of the rigid body motion of the solid. The equations governing the motion of particles in a deformable solid are obtained from Newtonian mechanics and are given by

  \frac{\partial}{\partial t}\Big( \mu \frac{\partial r}{\partial t} \Big) + H \frac{\partial r}{\partial t} + \frac{\delta F(r)}{\delta r} = f(r, t),   (1)

where r(a, t) is the position of the particle a at time t, \mu(a) is the mass density at particle a, and H(a) is the damping density. The right hand side represents the sum of externally applied forces at time t. The third term on the left hand side of (1) is called a variational derivative, and this is where the internal elastic energy is dealt with. F(r) is called a functional, and it measures the potential energy that builds up when the body is deformed.

2.1 Deformation Energy

A method is needed to measure the deformation energies that arise when a solid deforms. For this task we use the area of differential geometry. It is convenient to look at arc lengths of curves running along the intrinsic directions of the solid. A way of measuring the directions is specified by what is known as the first fundamental form. This form takes on a local view, and looks at changes in length between particles in a small neighborhood of a particle. The first fundamental form, or metric tensor, thus gives an idea of the distances between particles. For a deformable solid the metric tensor is

  G_{ij}(r(a)) = \frac{\partial r}{\partial a_i} \cdot \frac{\partial r}{\partial a_j}, \quad 1 \le i, j \le 3,   (2)

which is a symmetric 3 x 3 tensor. The diagonal of the tensor represents length measurements along the coordinate directions from the particle in question. The off-diagonal elements represent angle measurements between the coordinate directions, which resist shearing within the local particle structure. When measuring deformation energy in a solid, we are interested in the change of the shape with respect to the natural rest shape. This rest shape is described by the superscript 0, such that

  G^0_{ij}(r(a)) = \frac{\partial r^0}{\partial a_i} \cdot \frac{\partial r^0}{\partial a_j}.   (3)

By using the weighted Hilbert-Schmidt matrix norm of the difference between the metric tensors, a simplified way of describing the energy of deformation becomes

  F(r) = \int_{\Omega} \sum_{i,j=1}^{3} \eta_{ij} \big( G_{ij} - G^0_{ij} \big)^2 \, da_1\, da_2\, da_3,   (4)

where \Omega is the domain of the deformable solid and \eta is a user defined tensor that weights each of the coefficients of the metric. For a single point the energy function is simply

  S = \sum_{i,j=1}^{3} \eta_{ij} \big( G_{ij} - G^0_{ij} \big)^2.   (5)

The elastic energies that occur when a solid is deformed come from the stretching and compression of the particles in the body. To capture the natural behavior of a deformed solid seeking towards its rest state, a term that minimizes the deformation energy is needed. Variational calculus can be applied to find such a minimizing term, which is described by the Euler-Lagrange differential equations [5]. For a problem defined by a function of three independent variables and their first derivatives, the equations become

  \frac{\partial F}{\partial y} - \frac{d}{dx_1}\frac{\partial F}{\partial y_{x_1}} - \frac{d}{dx_2}\frac{\partial F}{\partial y_{x_2}} - \frac{d}{dx_3}\frac{\partial F}{\partial y_{x_3}} = 0.   (6)

The symbol F is the function describing the problem, and F will be minimized with respect to the symbol y, which in our case is the positional vector field r. Substituting (5) into F in (6) and executing the differentiations yields a neatly compact expression that minimizes the given problem,

  \frac{\delta S}{\delta r} = - \sum_{i,j=1}^{3} \frac{\partial}{\partial a_i} \big( B_{ij}\, r_{a_j} \big),   (7)

with

  B_{ij} = \eta_{ij} \big( r_{a_i} \cdot r_{a_j} - G^0_{ij} \big).   (8)

The tensors B_{ij} in (8) represent the comparison between the deformed state and the rest state of the solid. When an element becomes positive, the corresponding constraint has been stretched and wants to shrink. Likewise, when an element becomes negative, the constraint has been compressed and wants to grow.

2.2 Discretization

The deformable model is continuous in the intrinsic coordinates. To allow an implementation of deformable solids, the model is discretized into a unit 3D grid structure representing the particles that make up a solid. The grid has three principal directions called l, m, and n. Particles in the grid are uniformly distributed, with spacings in the three directions given by the symbols h_1, h_2, and h_3, and particle counts designated L, M, and N. Each particle property is discretized using an index [l, m, n], which returns the property value of that particular grid entry; e.g. the particle positions are described by the discrete vector function r[l, m, n]. The model requires that derivatives are calculated in the intrinsic directions of the object. For this purpose we use finite difference operations across the grid to achieve the desired derivative approximations [2]. An approximation to the first derivative in the m-direction, for example, can be obtained with either a forward or a backward difference operator. Given an arbitrary grid function u[l, m, n], these approximations are

  D^+_2(u) = \frac{1}{h_2} \big( u[l, m+1, n] - u[l, m, n] \big)   (9)

and

  D^-_2(u) = \frac{1}{h_2} \big( u[l, m, n] - u[l, m-1, n] \big),   (10)

where the superscript symbols + and - represent forward and backward operators, respectively. Using the difference operators it is possible to discretize (7) by replacing the derivatives with the corresponding difference operators. The discrete equation for the elastic force e becomes

  e[l, m, n] = - \sum_{i,j=1}^{3} D^-_i(p)[l, m, n],   (11)

where

  p[l, m, n] = B_{ij}[l, m, n] \, D^+_j(r)[l, m, n],   (12)

and the B tensor field is also discretized using finite differencing,

  B_{ij}[l, m, n] = \eta_{ij}[l, m, n] \big( D^+_i(r)[l, m, n] \cdot D^+_j(r)[l, m, n] - G^0_{ij}[l, m, n] \big).   (13)
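The grid difference operators (9)-(10) are straightforward to realize; a small NumPy sketch (the function names are ours), where differences whose stencil would leave the grid are set to zero, matching the natural boundary condition adopted for the model:

```python
import numpy as np

def d_plus(u, axis, h):
    """Forward difference D+ along one grid axis; entries whose stencil
    would reach outside the grid are left at zero (natural boundary)."""
    d = np.zeros_like(u, dtype=float)
    n = u.shape[axis]
    head = [slice(None)] * u.ndim
    head[axis] = slice(0, n - 1)
    tail = [slice(None)] * u.ndim
    tail[axis] = slice(1, n)
    d[tuple(head)] = (u[tuple(tail)] - u[tuple(head)]) / h
    return d

def d_minus(u, axis, h):
    """Backward difference D-; zero where the stencil leaves the grid."""
    d = np.zeros_like(u, dtype=float)
    n = u.shape[axis]
    head = [slice(None)] * u.ndim
    head[axis] = slice(0, n - 1)
    tail = [slice(None)] * u.ndim
    tail[axis] = slice(1, n)
    d[tuple(tail)] = (u[tuple(tail)] - u[tuple(head)]) / h
    return d

# a linear ramp along the m-axis has unit slope wherever the stencil fits
u = np.tile(np.arange(5.0).reshape(1, 5, 1), (3, 1, 3))
assert np.allclose(d_plus(u, axis=1, h=1.0)[:, :4, :], 1.0)
assert np.allclose(d_minus(u, axis=1, h=1.0)[:, 1:, :], 1.0)
```

Applying `d_plus` to each component of the position grid gives the discrete directional derivatives needed in (12)-(13).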
Problems occur at the boundaries of the grid, because no information is readily available about the derivatives there. A natural boundary condition can be created by setting to zero all forward difference operators that reach outside the grid [9]. To solve the equations for all particles at the same time, the values in the positional grid r and in the energy grid e can be unwrapped into LMN-dimensional vectors r and e. With these vectors, the entire system of equations can be written as

  e = K(r)\, r,   (14)

where K(r) is an LMN x LMN matrix called the stiffness matrix, which has desirable computational properties such as sparseness and bandedness. A discussion of how to assemble the stiffness matrix K can be found in [1]. We introduce the diagonal LMN x LMN mass matrix M and damping matrix C, assembled from the corresponding discrete values of \mu[l, m, n] and H[l, m, n], respectively. The equations of the elastically deformable model (1) can now be expressed in grid vector form by the coupled system of second order ordinary differential equations

  M \frac{\partial^2 r}{\partial t^2} + C \frac{\partial r}{\partial t} + K(r)\, r = f.   (15)

2.3 Numerical Integration

To evolve the deformable model over time, a semi-implicit time integration scheme is employed. The time interval in which the model evolves is subdivided into time steps of equal size \Delta t. Using central differences, the time derivatives are approximated by

  \frac{\partial^2 r}{\partial t^2} \approx \frac{r_{t+\Delta t} - 2 r_t + r_{t-\Delta t}}{\Delta t^2}, \qquad \frac{\partial r}{\partial t} \approx \frac{r_{t+\Delta t} - r_{t-\Delta t}}{2 \Delta t}.   (16)

By inserting these derivatives into (15), we obtain the system matrix

  A_t = K(r_t) + \frac{1}{\Delta t^2} M + \frac{1}{2 \Delta t} C   (17)

and convert the nonlinear system into the system of linear equations

  A_t\, r_{t+\Delta t} = g_t,   (18)

where g is called the effective force vector,

  g_t = f_t + \Big( \frac{1}{\Delta t^2} M + \frac{1}{2 \Delta t} C \Big) r_t + \Big( \frac{1}{\Delta t} M - \frac{1}{2} C \Big) v_t,   (19)

with the velocity estimated by the normal first order backward difference

  v_t = \frac{1}{\Delta t} \big( r_t - r_{t-\Delta t} \big).   (20)

The new positions then follow as r_{t+\Delta t} = A_t^{-1} g_t. With these equations in hand it is possible to implement real time dynamic simulations of deformable solids. The desirable properties of the system matrix indicate that a specialized solution method, such as the conjugate gradient method [8], can be utilized.

3 Instabilities

Figure 2: The surface patch will collapse to a curve when a particle crosses the opposite diagonal.

The instabilities we will address are not numerical in nature, such as the chosen integration method, time steps, etc. We turn our interest towards instabilities in the structure of the underlying constraints in the model of elastic solids. The problems can, more or less, be identified as boundary issues, as the integrity instabilities arise from the boundaries of the discrete 3D grid. Real life deformable objects are held together by strong forces at a very small scale. Similarly, we could solve the integrity problems if we could get away with using extremely high values in the underlying weighting tensors. The reason why we must disregard this option is that it would require us to integrate numerically with "infinitely" small time steps. As we are interested in interactive simulations, we seek to use large time steps, so simply increasing the values of the tensors is not a desired option. Differential geometry [7] is used as a tool to measure the deformation of an elastic body in comparison with its resting shape. For solids, the first fundamental forms, or 3 x 3 metric tensors, are sufficient to distinguish between the shapes of two bodies. However, the metric tensor of a solid (2) is not nearly sufficient as a tool to compute the complex particle movements of a deformed solid seeking towards its resting shape. The discrete components of (2), disregarding the diagonal, describe the cosine of the angle between adjacent directions multiplied by the product of the lengths,

  v \cdot w = |v|\,|w| \cos \theta, \quad 0 \le \theta \le \pi.   (21)
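The ambiguity behind this instability is easy to check numerically: since cos(theta) = cos(-theta), the off-diagonal dot products in (2) cannot tell a configuration from its mirror image. A tiny sketch with hypothetical direction vectors:

```python
# Two coordinate directions of a surface patch, and the same pair after
# the patch has flipped across the diagonal: the angle changes sign, but
# the dot product -- and hence the off-diagonal metric entry -- does not.
def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

v = (1.0, 0.0)            # direction along m
w = (0.6, 0.8)            # direction along n
w_flipped = (0.6, -0.8)   # mirrored configuration past the diagonal

assert dot(v, w) == dot(v, w_flipped)   # the energy term sees no difference
```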
The angle between two vectors is thus independent of their mutual orientation, as seen from the domain of \theta in (21). This is the main reason why we get internal structure instabilities. The case of surface patches is depicted in Figure 2. The bold lines and the angle between them form the natural condition. If particle A is moved towards B, as in case 1, the angular elasticity constraints will force the particle back to its resting condition, in the direction indicated by the arrow. The behavior of the angular constraint force is ambiguously defined when the angle between the vectors is either 0 or 180 degrees; the latter is shown in case 2. If the particle reaches beyond the opposite diagonal, as in case 3, the elasticity will be strengthened and push particle A into B. This is clearly a problem, as it will reduce the surface to a curve. The original model [9] suffers from this instability. The instabilities for solids are even worse. Expand the square to a cube and focus on a particle in the grid: not only does the particle spawn surface instabilities for the three directional patches, there is also nothing that prevents the cube from collapsing over the space diagonals. The inability to preserve volume will ruin the integrity of the solid. In short, the original elastic model is insufficient for simulating deformable solids. Based on the modeling concept by Terzopoulos et al. [9], we will remodel the method to render it more suitable for simulating elastically deformable solids.

4 Integrity Improvements

To handle the integrity instabilities, we design an extension of the original model that improves its ability to prevent collapsing of the grid cubes. Basically, the extension adds new constraints to the model, gathered into a new tensor we call the Spatial Diagonal Metric, or SDM. To make things a little easier, we start out by describing the analogy for surfaces.

4.1 Extended Metric

Figure 3: The dual explicit angular constraint is replaced by two new diagonal constraints that define the angular constraints implicitly.

For a surface, the original tension constraints on a given particle are the four constraints given by its 2 x 2 metric tensor G. As with solids, when i = j, G_{ij} defines the length constraints along the coordinate directions m and n, while when i != j, G_{ij} represents an expression of the same explicit angular constraint between the two directions. The idea is to replace the dual explicit angular constraint with two diagonal length constraints. These constraints reach from the particle at [m, n] to the diagonally opposite particles at [m+1, n+1] and [m-1, n+1], as depicted in Figure 3. Besides working as diagonal length constraints, they also work implicitly as angular constraints that together can account for all 360 degrees. The directions along these new constraints are considered new directions, called a_{d1} and a_{d2}. When the model is extended with the new diagonal constraints, variational calculus must be utilized for the expression of the minimizing elastic force. Writing out (5) for the case of surfaces, with focus on the new directions, yields

  S = \cdots + \eta_{12} \big( r_{a_{d1}} \cdot r_{a_{d1}} - G^0_{12} \big)^2 + \eta_{21} \big( r_{a_{d2}} \cdot r_{a_{d2}} - G^0_{21} \big)^2 + \cdots,   (22)

where the elements G^0_{12} and G^0_{21} of the rest state tensor now hold the natural state of the new diagonal constraints. With the new directions a_{d1} and a_{d2} in hand, the Euler-Lagrange equation changes to

  \frac{\delta S}{\delta r} = \frac{\partial S}{\partial r} - \frac{\partial}{\partial a_1}\frac{\partial S}{\partial r_{a_1}} - \frac{\partial}{\partial a_2}\frac{\partial S}{\partial r_{a_2}} - \frac{\partial}{\partial a_{d1}}\frac{\partial S}{\partial r_{a_{d1}}} - \frac{\partial}{\partial a_{d2}}\frac{\partial S}{\partial r_{a_{d2}}}.   (23)

Using variational calculus on the terms from (22) results in the following additions to the elastic force e:

  e[m, n] = \cdots - D^-_{d1}(p)[m, n] - D^-_{d2}(q)[m, n],   (24)

where

  p[m, n] = \eta_{12}[m, n] \, D^+_{d1}(r)[m, n], \qquad q[m, n] = \eta_{21}[m, n] \, D^+_{d2}(r)[m, n].   (25)

Notice that new difference operators arise with the new directions in the discretization of the extended metric. These operators can be treated just like the operators in the original directions. For example, the new first order forward difference operators become

  D^+_{d1}(u) = \frac{u[m+1, n+1] - u[m, n]}{h_{d1}}, \qquad D^+_{d2}(u) = \frac{u[m-1, n+1] - u[m, n]}{h_{d2}},   (26)

where h_{d1} = h_{d2} = \sqrt{h_1^2 + h_2^2} is the grid distance in both diagonal directions. The square root is computationally expensive, but it can easily be avoided in any calculation involving the new difference operators, or simply precomputed, as the value does not change at runtime. We have shown how to extend the metric tensor, and we call the extended metric the Metrix. To give a deformable solid stronger integrity, the Metrix can easily be applied to solids. The 3 x 3 metric tensor for a solid contains three length constraints along its diagonal. The remaining six elements represent angular constraints between the original coordinate directions, and these can be replaced by the appropriate elements of the Metrix.

4.2 Spatial Diagonal Metric

Figure 4: Spatial length constraints for solids. On the left it is shown how the four constraints reach from the center. On the right it is shown how the constraint contributions from four particles on one cube side render symmetric behavior.

To implement volume preservation, we introduce another new idea called the Spatial Diagonal Metric, which is a 2 x 2 tensor. The four elements represent four new length constraints that are spatially diagonal, meaning they span grid cubes volumetrically. The four new directions can be chosen in three ways, and we have simply chosen them to reach from [l, m, n] to the four particles at [l-1, m+1, n+1],
[l-1, m-1, n+1], [l+1, m+1, n+1], and [l+1, m-1, n+1]. No matter how the four length constraints are chosen to span the cubes, their contributions end up covering a grid cube symmetrically, as depicted in Figure 4. The new SDM tensor that represents the volume is called V. It follows the same design ideas as the Metrix, and is defined as

  V = \begin{bmatrix} D^+_{v1} \cdot D^+_{v1} & D^+_{v2} \cdot D^+_{v2} \\ D^+_{v3} \cdot D^+_{v3} & D^+_{v4} \cdot D^+_{v4} \end{bmatrix},   (27)

where D^+_{v1..4} are the four new first order forward difference operators along the new spatial directions.

5 Global Implosions

With the SDM and Metrix contributions added to the elasticity term, the model of a deformable solid can prevent the grid cubes from collapsing locally. This is an important improvement towards keeping the integrity of a deformable solid intact. Another integrity issue still exists, however: the solid is unable to prevent implosions. We define an implosion as the event of a grid cube entering one of its adjacent grid cubes. Implosions can happen upon large deformations, which are typically caused by heavy external forces, e.g. reactions to collisions and aggressive user interactions. Global implosions can also be seen as internal self-intersections, so self-intersection tests can be utilized as a tool to prevent implosions. Common methods to avoid self-intersection include surrounding each grid cube with an axis aligned bounding box, or AABB, and arranging the AABBs into a bounding volume hierarchy, or BVH tree, that can be updated efficiently as the body deforms [4]. A recent paper introduces image-space techniques that can be implemented on the GPU, to allow performance friendly detection of self-intersections and collisions between deformable bodies [6]. Although a method for handling self-intersection can be chosen to perform reasonably, it will still decrease the overall performance.

What we seek is a mechanism that binds adjacent grid cubes together in such a way that if implosions occur, we can disperse the self-intersecting cubes. This is not a method that can prevent self-intersections, but it can restore the integrity of the solid upon implosion. We can reuse what we have been working with so far, and thus reduce the computational cost and memory use significantly, compared to the extra load we would introduce into the system if we had used a BVH algorithm. We introduce a new Pillar tensor P, which is based upon the discrete metric tensor G, but extended to use first order central difference operators. For reasons of clarity we limit P to the length constraints only,

  P[l, m, n] = \begin{bmatrix} \tilde{D}_1(r)^2 & 0 & 0 \\ 0 & \tilde{D}_2(r)^2 & 0 \\ 0 & 0 & \tilde{D}_3(r)^2 \end{bmatrix},   (28)

where

  \tilde{D}_1(u)[l, m, n] = \frac{1}{2 h_1} \big( u[l+1, m, n] - u[l-1, m, n] \big),
  \tilde{D}_2(u)[l, m, n] = \frac{1}{2 h_2} \big( u[l, m+1, n] - u[l, m-1, n] \big),   (29)
  \tilde{D}_3(u)[l, m, n] = \frac{1}{2 h_3} \big( u[l, m, n+1] - u[l, m, n-1] \big).

Figure 5: Discrete central differences bind adjacent grid cubes together.

The effect of using central difference operators is a very convincing way of binding adjacent grid cubes together, see Figure 5. As every grid particle is extended with the Pillar tensor, the combined range of P overlaps all grid cubes. To further strengthen the prevention of implosion, the Pillar tensor can easily be extended to use a central difference Metrix tensor.

6 Results

We have extended the previous implementation of the elastically deformable models [1] to support the Metrix, SDM, and Pillar contributions when simulating deformable solids. The implementation can be found in [3]. The deformable solids respond naturally to external forces, e.g. gravity and viscosity. The user can interact with the solids, for instance by constraining particles to a fixed position and performing pulling operations using spring forces. We have also implemented a scaling mechanism, which allows a user to control the overall uniform strength of the tensors.
Adjusting the strength scaling interactively can vary the stiffness of a deformable solid in real time, for example simulating the effect of a solid being inflated. Experiments have revealed that the effects of the Metrix and SDM do not always succeed satisfactorily in moving particles back to their natural locations. In some situations, new energy equilibria arise unnaturally. With the help of the supported visual debugger we have realized that different constraints can work against each other, so that the sum of the constraint contributions is zero. To counteract this problem, we have squared the force of the constraints in the SDM (and the Metrix) to make sure they prevail. To give a reasonable review of how the improved model solves the presented integrity instabilities, we perform visual comparisons between the original model and the new improved one. In Figure 6, still frames of a small box that is influenced by gravity and collides with a plane are compared frame by frame between the models. Due to the lack of volume preservation, the constraints of the original model simply cannot keep the shape of the cube. In Figure 7, we have recreated a configuration from [1], comparing two rubber balls with different particle masses. The picture on the left is taken from [1], where the metrics fail to maintain the integrity of the right ball, so the ball collapses on itself. The picture on the right is simulated using the same parameters but with the improved model, and the integrity of the ball is now strong enough for it to stay solid. In Figure 8, we test how well the models recover from a sudden large deformation. The improved model enables simulations of situations that are impossible with the original model. In Figure 1, a soft solid is depicted. The solid has been constrained to the ground and in three of the top corners. Pulling the last corner downwards results in a large deformation and renders convincing material buckling.
In Figure 9, the true strength of the Pillar tensor is illustrated, showing an effect of inflation. In Figure 10, some pudding is constrained to the ground and twisted by its top face. The sides of the deformable cube skew, as is expected of a soft body like pudding. In Figure 11, a large water lily is deformed by resting on two pearls. The improved model does a great job of keeping the water lily fluffy.

Figure 6: A small box is influenced by gravity and collides with a plane. The three stills on the left illustrate the original model; on the right the frames from the new improved model are shown.

Figure 7: Rubber balls. The left frame illustrates the situation from the old model, where the right ball is unable to maintain its integrity. The right frame depicts the same situation, simulated using the improved model.

Figure 8: A wooden box is heavily lifted in one corner. Original vs. improved model, in the left and right frame, respectively.

Figure 9: Constraint strength is increased interactively and yields the effect of inflation.

Figure 10: Twisting the pudding renders skewing.

Figure 11: Fluffy water lily modeled using an ellipsoid solid.

References

[1] M. K. Christensen and A. Fleron (2004), "Implementation of Deformable Objects," Department of Computer Science, University of Copenhagen, DIKU.
[2] D. Eberly (2003), "Derivative Approximation by Finite Differences," Magic Software, Inc., January 21, 2003.
[3] K. Erleben, H. Dohlmann, J. Sporring, and K. Henriksen (2003), "The OpenTissue project," Department of Computer Science, University of Copenhagen, DIKU, November 2003, http://www.diku.dk/forskning/image/research/opentissue/
[4] K. Erleben, J. Sporring, K. Henriksen, and H. Dohlmann (2004), "Physics-based Animation and Simulation," DIKU.
[5] H. Goldstein, C. P. Poole, and J. L. Safko (2002), "Classical Mechanics," Third Edition, Addison-Wesley.
[6] B. Heidelberger, M. Teschner, and M. Gross (2004), "Detection of Collisions and Self-collisions Using Image-space Techniques," Proc. WSCG'04, University of West Bohemia, Czech Republic, pp. 145-152.
[7] J. J. Koenderink (1990), "Solid Shape," MIT Press.
[8] J. R. Shewchuk (1994), "An Introduction to the Conjugate Gradient Method Without the Agonizing Pain," Carnegie Mellon University.
[9] D. Terzopoulos, J. C. Platt, A. H. Barr, and K. Fleischer (1987), "Elastically deformable models," Computer Graphics, vol. 21, no. 4, July 1987, pp. 205-214.

7 Conclusion

The original model for deformable solids, presented in [9], turned out to be insufficient for the sake of realism. Even extremely modest external forces applied to the bodies would ruin their integrity, as can be verified on the left side of Figures 6 to 8. Clearly some extensions to the model were needed. In this paper we have presented three improvements that give deformable solids a much better ability to keep their original integrity, and thus the ability to handle much larger deformations without collapsing. With the new Metrix, SDM, and Pillar additions to the model, the overall strength of the internal elastic constraints is increased. This reinforces the impression that the elastic bodies are actually solids. We have shown that the improvements to the original model greatly increase the usability of the method for simulating deformable solids. As the new contributions influence the same particles as the original model, the system matrix A is still of size LMN x LMN, and it still possesses its original pleasant properties that allow a relaxation based solver to invert it using only a few iterations. The Metrix replaces the metric tensor, so only the SDM and Pillar additions are new to the calculations, and they can be computed in constant time for each particle. Newer papers on deformable solids state the importance of preservation of length, surface area, and volume.
What we have done with the elastically deformable model is precisely to strengthen the surface preservation using the Metrix, and to implement the missing volume preservation using the SDM. 7 94 The Thin Shell Tetrahedral Mesh Kenny Erleben∗and Henrik Dohlmann† Department of Computer Science, University of Copenhagen, Denmark Abstract Tetrahedral meshes are often used for simulation of deformable objects. Unlike engineering disciplines graphics is biased towards stable, robust and fast methods, instead of accuracy. In that spirit we present in this paper an approach for building a thin inward shell of the surface of an object. The goal is to device a simple and fast algorithm yet capable of building a topologically sound tetrahedral mesh. The tetrahedral mesh can be used with several different simulation methods, such as the nite element method (FEM). The main contribution of this paper is a novel tetrahedral mesh generation method, based on surface extrusion and prism tesselation. Twofold Brep Inward Extrusion CR Categories: I.3.5 [Computer Graphics]: Computational Geometry and Object Modeling—Physically based modeling; I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism— Animation; Keywords: Tetrahedral Mesh, Erosion, Extrusion, Tesselation, Shell, Prism 1 Tesselating Prisms Introduction Given a 3D polygonal model created by a 3D artist, it is often a challenge to create a spatial structure for simulating a deformable object. Besides, polygonal models for visual pleasing pictures tends to be highly tessellated. Thus, even if they do not pose any “errors”, creating a tetrahedral mesh directly from the polygonal model, tends to create an enormous amount of tetrahedra. To achieve real time performance, one seeks a more coarse tetrahedral mesh. These are the kind of problems we aim to solve in this paper. 
Given a twofold boundary representation of an object, as a connected triangular mesh (a watertight surface), the tetrahedral mesh is built by extruding each triangle inward, that is, in the opposite direction of the triangle normal. Thus, for each triangle a prism is generated. The result is a volumetric mesh consisting of connected prisms. These prisms can now be tessellated into tetrahedra, thereby creating the first layer of the thin shell tetrahedral mesh. Succeeding layers can be created by recursively applying this approach. Figure 1 illustrates the basic idea.

Although the overall idea is simple, the approach is not without problems. Polygonal models are seldom twofold, but suffer from all kinds of degeneracies. The idea we have illustrated is obviously capable of handling an open boundary, but cases where edges share more than two neighboring faces, or self-loop edges, are clearly unwanted, since the generated prisms will overlap each other or degenerate into zero-volume prisms. The prism generation is reminiscent of an erosion operation with a spherical structural element (mathematical morphology). The radius of the sphere corresponds to the extrusion length. It is well known that working directly on the Brep [Sethian 1999] is fast and simple, but topological problems arise easily, such as swallow tails. In case the given triangle mesh is not a proper mesh, one can apply a mesh reconstruction algorithm [Nooruddin and Turk 2003].

∗ [email protected]
† [email protected]

Figure 1: The basic algorithm for generating a thin tetrahedral shell from a boundary representation.

We will require the following properties of the prisms making up a thin shell layer:

• No two prisms may intersect each other (neighbors are allowed to touch at their common faces).
• No prism may collapse to zero volume or be turned inside out (equivalently, the signed volume is always positive).
• All prisms must be convex.

Unfortunately, even if we are given a perfect connected twofold triangle mesh, we can get into trouble if we make the inward extrusion too big. This is illustrated in Figure 2. Here, the large extrusion length causes prisms B and C to become non-convex. Furthermore, A and D, B and D, C and A, and B and C are overlapping. Fortunately, these degenerate and unwanted prisms can be avoided if the extrusion is made smaller. Thus, we seek an upper bound on how far we can extrude the triangle faces inward without causing degenerate prisms.

Figure 2: Degenerate prisms result from a too big inward extrusion.

A publicly available implementation of the described algorithm can be found in [OpenTissue 2004].

Existing tetrahedral mesh generation methods create an initial, blockified tetrahedral mesh from a voxelization or signed distance map. Afterwards, nodes are iteratively repositioned, while subsampling tetrahedra to improve mesh quality [Mueller and Teschner 2003; Persson and Strang 2004; Molino et al. 2004]. Our approach differs from these mainly in being surface-based.

2 Inward Extrusion

As a preprocessing step, we compute the pseudo normals for all vertices (the angle-weighted normals [Aanæs and Bærentzen 2003]). These indicate the direction along which a vertex will be extruded. Given a triangle consisting of three vertices p1, p2 and p3, with angle-weighted normals n1, n2 and n3, the inward extruded prism is defined by the six corner points p1, p2, p3 and

q1(ε) = p1 − n1 ε
q2(ε) = p2 − n2 ε
q3(ε) = p3 − n3 ε.

The extrusion length is given by ε > 0. Notation is illustrated in Figure 3. By requiring ε to be strictly positive, all generated prisms must have non-zero volume, as long as the prism is not turned inside out. We therefore seek a robust way to determine an upper bound on ε, such that all generated prisms will be valid.

Figure 3: The six corner points defining a prism, and pseudo normals yielding extrusion directions.

The direction of the normal of the extruded face, nq, can be found from q1, q2, and q3, using the cross product:

nq(ε) = (q2(ε) − q1(ε)) × (q3(ε) − q1(ε)).

This is a second order polynomial in ε:

nq(ε) = (q2(ε) − q1(ε)) × (q3(ε) − q1(ε))
      = ((p2 − n2 ε) − (p1 − n1 ε)) × ((p3 − n3 ε) − (p1 − n1 ε))
      = ((p2 − p1) + (n1 − n2) ε) × ((p3 − p1) + (n1 − n3) ε)
      = ((n1 − n2) × (n1 − n3)) ε² + ((p2 − p1) × (n1 − n3) + (n1 − n2) × (p3 − p1)) ε + ((p2 − p1) × (p3 − p1))
      = a ε² + b ε + c,

where a = (n1 − n2) × (n1 − n3), b = (p2 − p1) × (n1 − n3) + (n1 − n2) × (p3 − p1), and c = (p2 − p1) × (p3 − p1). Observe that c ≠ 0, since its magnitude is equal to twice the area of the triangle being extruded.

To ensure convexity, the dot product of the direction of the normal of the extruded face, nq, with the pseudo normals, n1, n2, and n3, must always be positive. That is

n1 · nq(ε) > 0
n2 · nq(ε) > 0
n3 · nq(ε) > 0.

This yields the following system of constraints:

⎡ n1·a  n1·b  n1·c ⎤ ⎡ ε² ⎤
⎢ n2·a  n2·b  n2·c ⎥ ⎢ ε  ⎥ > 0.
⎣ n3·a  n3·b  n3·c ⎦ ⎣ 1  ⎦

We solve for the smallest positive ε fulfilling the system of constraints. That is, each row represents the coefficients of a second order polynomial in ε, so for each row we find the two roots of the corresponding polynomial. The three rows yield a total of 6 roots. If no positive root exists, then ε = ∞; otherwise ε is set equal to the smallest positive root. Observe that the third column of the coefficient matrix is always positive (by the property of the angle-weighted normals). The first column can be interpreted as an indication of whether the corresponding extrusion line is trying to "shrink" (< 0) or "enlarge" (> 0) the extruded face. The middle column is difficult to interpret. As far as we can tell, it resembles an indication of the skewness of the resulting prism. In fact, the three convexity constraints ensure that no neighboring prisms will intersect each other, nor will a prism turn inside out (i.e.
flipping the extruded face opposite the original face).

The maximum extrusion length for the entire layer can be found by iterating over each prism. For the i'th prism the extrusion length εi is computed. The maximum extrusion length of the layer is found as

ε = min{ε0, . . . , εn−1}.

Afterwards, it is a simple matter to compute the actual extrusion and generate the prisms.

3 Prism Generation

The technique in the previous section guarantees that prisms will be convex, and that no intersections occur between neighboring prisms. However, degenerate prisms can still occur whenever the extruded face collapses to a line or to a point. These are shown in Figure 4. These degenerate cases must be marked, such that the following tessellation can take them into consideration.

Figure 4: Point-degenerate and line-degenerate prisms.

The problem is how to detect these degenerate cases. If no upper bound was found on the extrusion length for a prism, we can trivially reject the prism. However, if an upper extrusion bound was computed, the prism might degenerate. As a first step in marking the degenerate prisms, we iterate over all those prisms where an upper bound was computed. If the computed upper bound for a prism is equal to the extrusion length, then it might be a degenerate prism. For each of these possibly degenerate prisms we first test whether ||nq(ε)|| ≤ γ, where γ is a user-specified threshold to counter numerical precision problems. If this criterion is fulfilled, we know that we are dealing with a degenerate prism.

algorithm markDegeneratePrism(ε, γ)
  for each prism p do
    if εp = ε then
      if ||nq(ε)|| ≤ γ then
        if (q2(ε) − q1(ε)) = 0 or (q3(ε) − q1(ε)) = 0 then
          mark p as line-degenerate
        else
          mark p as point-degenerate
        end if
      end if
    end if
  next p
End algorithm

Figure 5: Pseudo code for marking degenerate prisms.

Figure 6: Degenerate cases affect succeeding layers.
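The per-prism upper bound of Section 2 (the smallest positive root over the three row quadratics) can be sketched numerically as follows. This is a minimal sketch in plain Python; the vector helpers and the function name `max_extrusion` are ours, and a production version would need more careful handling of near-degenerate quadratics:

```python
import math

def cross(u, v):
    return (u[1]*v[2] - u[2]*v[1],
            u[2]*v[0] - u[0]*v[2],
            u[0]*v[1] - u[1]*v[0])

def dot(u, v):
    return u[0]*v[0] + u[1]*v[1] + u[2]*v[2]

def sub(u, v):
    return (u[0]-v[0], u[1]-v[1], u[2]-v[2])

def max_extrusion(p1, p2, p3, n1, n2, n3):
    """Upper bound on the extrusion length eps for one prism.

    a, b, c are the coefficient vectors of n_q(eps) = a eps^2 + b eps + c;
    each convexity constraint n_i . n_q(eps) > 0 is a scalar quadratic, and
    the bound is the smallest positive root over all three rows (infinity
    when no positive root exists)."""
    a = cross(sub(n1, n2), sub(n1, n3))
    b = tuple(x + y for x, y in zip(cross(sub(p2, p1), sub(n1, n3)),
                                    cross(sub(n1, n2), sub(p3, p1))))
    c = cross(sub(p2, p1), sub(p3, p1))
    bound = math.inf
    for n in (n1, n2, n3):
        A, B, C = dot(n, a), dot(n, b), dot(n, c)
        if abs(A) < 1e-12:            # row degenerates to a linear constraint
            roots = [-C / B] if abs(B) > 1e-12 else []
        else:
            D = B*B - 4*A*C
            if D < 0:
                continue              # no real root for this row
            sq = math.sqrt(D)
            roots = [(-B - sq) / (2*A), (-B + sq) / (2*A)]
        for r in roots:
            if r > 0:
                bound = min(bound, r)
    return bound
```

For a triangle whose three pseudo normals all point toward a common apex, the face collapses to that apex and the bound is exactly the collapse distance; for a triangle extruded straight down a constant normal, the face never degenerates and the bound is infinite.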
The degenerate prisms can be classified as either point-degenerate or line-degenerate, as shown in Figure 4. The line-degenerate case is given by the criterion (q2(ε) − q1(ε)) = 0 or (q3(ε) − q1(ε)) = 0. If this criterion is not fulfilled, we have a point-degenerate case. A pseudo code version of the algorithm is shown in Figure 5.

The degenerate cases do not only influence the prism tessellation (which we treat in the next section). If another thin shell layer is to be generated, then the original mesh faces can no longer be used. A simple 2D case is shown in Figure 6. Notice that the bold red faces were used when creating layer 0, but they vanish when layer 1 is created. Therefore, if another layer should be generated, a new connected triangular mesh must be formed from the extruded faces of the non-degenerate prisms. In Figure 7 the surface mesh generation algorithm is shown in pseudo code.

algorithm build-surface(Mesh M)
  for each prism p do
    for i = 1, 2, 3 do
      if qi ∉ M then
        add qi to M
      end if
    next i
    if p not degenerate then
      add face q1, q2, q3 to M
    end if
  next p
End algorithm

Figure 7: Pseudo code for generating the surface mesh for the next shell layer.

4 Prism Tessellation

For non-degenerate prisms, having 6 corners, the minimum number of tetrahedra a prism can be tessellated into is 3. This is shown in Figure 8. The problem with this approach is that the extruded sides of the prism will be triangulated. One therefore has to ensure that the tessellations of neighboring prisms agree with each other. This is illustrated in Figure 9. As seen in Figure 9, it becomes a global combinatorial problem to match the tessellation of neighbors against each other. If we use the centroid of the prism as the apex for creating tetrahedra, then a prism can be tessellated into eight tetrahedra, as shown in Figure 10. With this approach the tessellation is no longer a global problem, since the tessellation of each prism side can be chosen independently of the others.

Figure 8: Prism tessellated into 3 tetrahedra.

Figure 9: Tessellation of neighboring prisms must be consistent.

Both approaches deal nicely with the point-degenerate case. Since a point-degenerate prism is already a tetrahedron, there is no need to tessellate it. However, since the point-degenerate case only has triangular sides, it can never be a direct neighbor of a non-degenerate prism. It can only neighbor other point-degenerate cases or line-degenerate cases, which have both rectangular and triangular sides, as shown in Figure 11.

Figure 11: Nice vs. bad line case.

In the following we disregard degeneracies and consider the three-tetrahedra tessellation strategy, because it is more attractive due to the lower tetrahedra count. A prism can be tessellated into three tetrahedra in 6 different ways. In order to classify the 6 types of tessellation, we mark the rectangular sides of a prism as falling (F) or rising (R). The side type depends on whether the tessellation edge is falling or rising as we travel along the extruded prism face in counter-clockwise order; see Figure 12. We observe that the three-tetrahedra tessellation strategy will always have two prism sides of the same type, and the last side of the opposite type. Thus, we can only have 6 different patterns, as shown in Table 1. The consistency requirement implies that if one side of a prism is marked as F, then the neighboring prism will have marked the same side as R. In short, no two neighboring prisms will have a shared side marked with the same type.

Figure 12: Classification of prism sides as falling (F) or rising (R).

A simple tessellation example is shown in Figure 13. Here a tetrahedron mesh is being tessellated. The tetrahedron has been cut up and laid out in 2D. Triangle edges correspond to rectangular sides of prisms. Let us apply a brute-force strategy to the combinatorial problem of the tessellation as follows: We start at a single prism and choose one of the 6 tessellation types. Then we visit the neighboring prisms and choose a tessellation type that agrees with the immediate neighbor prisms which have already been tessellated. This is a breadth-first traversal over the prisms.

The method is not fail-safe, since inconsistency can arise, as shown in Figure 14. Here, the middle prism is the last prism to be visited by the traversal. Clearly, it is impossible to assign a tessellation type to this prism, since all three sides would need the same type. We can repair the inconsistency by picking one of the neighboring prisms and flipping the type of its shared edge. This action will not change the type of any of the edges marked with arrows in Figure 14. Therefore, the repairing action does not cause a rippling effect through the prisms, and inconsistencies are not introduced at other places. Fixing inconsistency in this way is attractive, since it offers a local solution to a global problem.

However, sometimes we might end up in a dead-lock where no local solution can be found, as shown in the top of Figure 15. This time the drawing in the figure represents a small local view of a larger mesh. Notice that none of the edges shared with the inconsistent prism can be flipped without creating an inconsistent neighboring prism. The problem is that all the edges marked with arrows are of the same type. The solution to the problem is shown in the bottom of Figure 15. We let the inconsistency ripple like water waves over to neighboring prisms, in a search for a single prism where an edge flip does not give rise to a new inconsistency. When such a prism is encountered, we track the trajectory of the ripple wave-front back to the originating inconsistent prism, and flip all shared edges lying on this path. In Figure 16 we show the result of the rippling. Notice that two edges are flipped.
These are the edges lying on the path to the prism that could be flipped. Also notice that all edges marked with arrows are unaffected by the rippling action. This property ensures that the rippling action will not cause inconsistencies in any prisms elsewhere in the mesh. A pseudo code version of the tessellation-pattern-finding algorithm is shown in Figure 17. Our proposed tessellation pattern algorithm is an ad-hoc solution for the problem at hand. We do not have a formal proof stating that it is always possible to find a consistent pattern of rising and falling tessellation edges.

Figure 10: Prism tessellated into eight tetrahedra.

Side 1:  F  R  R  R  F  F
Side 2:  R  F  R  F  R  F
Side 3:  R  R  F  F  F  R

Table 1: The 6 three-tetrahedra tessellation types.

5 Results

We have implemented the extrusion length computation and the tessellation-pattern algorithm. Currently the implementation detects degenerate prisms, but does not tessellate them. In our test cases we have chosen 14 meshes of increasing size, all scaled to be within the unit cube. A single-layer shell was computed, given a user-specified maximum extrusion length. In Table 2, performance statistics are listed, together with polygon counts, extrusion lengths and rippling action information. The timings for a single layer construction are cheap, and appear to scale linearly with mesh size. The resulting tetrahedral meshes are visualized in Figure 18.

Figure 13: Tessellation example. A simple 3D mesh (a tetrahedron) has been cut up and laid down in 2D. Triangles correspond to prisms in the thin shell.

Figure 14: Inconsistent tessellation example. The middle prism would have the same type on all sides, which is illegal.

Figure 15: The top picture shows a dead-locked inconsistent tessellation. The bottom picture shows that the inconsistency problem has been propagated to neighboring prisms further away.
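The pattern count and the neighbor-consistency rule behind Table 1 can be checked mechanically. A small sketch in plain Python (the function names are ours): of the 2³ possible markings of a prism's three sides, only those with two sides of one type and one of the opposite are realizable by a three-tetrahedra split, and a shared side must carry opposite types on its two prisms:

```python
from itertools import product

def valid_patterns():
    """The three-tetrahedra side markings: all F/R triples except the two
    all-equal ones, leaving the 6 patterns of Table 1."""
    return [p for p in product('FR', repeat=3)
            if not (p[0] == p[1] == p[2])]

def consistent(side_a, side_b):
    """Neighboring prisms must mark their shared side with opposite types."""
    return side_a != side_b
```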
As seen in Table 2, the test cases diku, teapot, propeller, funnel, cow, and bend have a surprisingly small extrusion limit. The remaining test cases show excellent extrusion limits. Figure 19 shows the prisms corresponding to the minimum extrusion limit. Notice that in all cases where the limit is unexpectedly small, the limit is caused by small faces or long slivers on sharp ridges. Figure 20 shows cut-through views of the cylinder, pointy, tube, sphere, teapot, funnel, bowl, and torus meshes. Notice how thin the teapot and funnel are.

Figure 16: The rippling solution to the dead-locked case shown in Figure 15.

Figure 18: Tessellation results of the 14 meshes.

Figure 19: Prisms marked with blue have minimum extrusion limit.

Figure 20: Various cut-through views of a few selected meshes, illustrating the thin shell.

algorithm tesselation-pattern()
  Queue Q
  Push first prism onto Q
  while Q not empty do
    Prism p = pop(Q)
    mark p as visited
    if neighbors are not tessellated then
      pick random pattern for p
    else if a consistent pattern with neighbors exists then
      assign consistent pattern to p
    else if a neighbor exists that can be flipped then
      flip edge type of shared edge with p
      assign consistent pattern to p
    else
      perform rippling
    end if
    for all unvisited neighbors n of p do
      push(Q, n)
    next n
  end while
End algorithm

Figure 17: Pseudo code for determining the tessellation pattern.

6 Discussion

We have omitted the problem of shell layers overlapping from opposite sides. In Figure 20 the problem is seen in the case of the bowl mesh. Our solution to the problem has been to ignore it: the user can always choose a smaller maximum extrusion length. Degenerate prisms were also ignored, in the sense that we are able to detect when they occur, but our tessellation pattern algorithm is not yet capable of handling them. No degenerate prisms were generated for any of the examples in our results section.
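The breadth-first search of Figure 17 can be sketched as follows. This is a simplified sketch in plain Python with a mesh encoding of our own (each shared edge stores the type seen by the first prism of its pair; the other prism sees the opposite), covering the traversal and the single-edge flip repair but omitting the rippling pass:

```python
from collections import deque

def flip(t):
    """Opposite side type: Falling <-> Rising."""
    return 'R' if t == 'F' else 'F'

def assign_patterns(faces_of_edge, edges_of_prism, start):
    # edges_of_prism: prism id -> its three side (edge) ids.
    # faces_of_edge:  edge id  -> the pair of prisms sharing that side.
    # types[e] is the type seen by the first prism of the pair; the second
    # prism automatically sees the opposite type (consistency rule).
    types = {}

    def seen(prism, edge):
        t = types[edge]
        return t if faces_of_edge[edge][0] == prism else flip(t)

    queue, visited = deque([start]), {start}
    while queue:
        p = queue.popleft()
        unset = [e for e in edges_of_prism[p] if e not in types]
        views = [seen(p, e) for e in edges_of_prism[p] if e in types]
        for e in unset:
            # if every side fixed so far has one type, pick the opposite,
            # so the prism never ends up with three equal sides
            want = flip(views[0]) if views and len(set(views)) == 1 else 'F'
            types[e] = want if faces_of_edge[e][0] == p else flip(want)
            views.append(want)
        if len(set(views)) == 1:
            # dead-lock: flip one shared edge (the paper's local repair;
            # the full rippling pass is not implemented in this sketch)
            types[edges_of_prism[p][0]] = flip(types[edges_of_prism[p][0]])
        for e in edges_of_prism[p]:
            for n in faces_of_edge[e]:
                if n not in visited:
                    visited.add(n)
                    queue.append(n)
    return types
```

On the cut-open tetrahedron of the example in Figure 13 (four prisms, six shared sides), the last prism visited can hit the all-equal dead-lock, and the single flip is enough to repair it.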
The global computation of the extrusion length works fairly well for some meshes, but for others a surprisingly small extrusion length is found. Long slivers and small faces lying close to sharp ridges are the reason for this phenomenon, as can be seen in Figure 19. Thus, we conclude that, not surprisingly, the algorithm is highly dependent both on the shape of the object and on the tessellation of the object surface. It appears that a good uniform tessellation works best. Abrupt tessellation with large aspect ratios results in small extrusion lengths. Mesh reconstruction [Nooruddin and Turk 2003] could be used as a preprocessing step to create a more suitable tessellation. Another avenue for circumventing the problem of a small global extrusion might be to investigate the possibility of non-global extrusion lengths, i.e. a varying extrusion length over the mesh, adapting itself to take the local maximum length without causing degenerate prisms. We believe this is an interesting thought and leave it for future work.

Our results indicate that our tessellation pattern algorithm works: we have not yet encountered an unsolvable problem. We believe this shows that the combinatorial problem of finding the tessellation pattern is at least solvable in practice. From a theoretical viewpoint, a proof of existence would be very interesting, and we leave this for future work.

7 Conclusion

In this paper we have presented preliminary results, showing that it is possible to generate a thin shell without any topological errors.

mesh       |F|    |R|  δ      ε      time (secs.)
box        12     0    0.1    0.866  0
cylinder   48     0    0.1    0.431  0
pointy     96     0    0.083  0.083  0
diku       288    0    0.004  0.004  0
tube       512    0    0.1    0.174  0.01
sphere     760    0    0.1    0.476  0.01
teapot     1056   1    0.001  0.001  0.01
propeller  1200   2    0.004  0.004  0.02
funnel     1280   1    0.003  0.003  0.029
cow        1500   0    0.001  0.001  0.02
bend       1604   0    0.006  0.006  0.029
bowl       2680   0    0.017  0.017  0.04
torus      3072   0    0.099  0.099  0.059
knot       5760   0    0.1    0.102  0.089

Table 2: Performance statistics for the 14 test cases. In all cases the end user requested a shell thickness of 0.1. The δ-column shows the actual shell thickness produced. The ε-column shows the extrusion limit. The |F|-column gives the face count of the meshes. The |R|-column gives the number of times the ripple action was invoked. The zero entries in the time column indicate that the duration was not measurable by the timing method.

As pointed out in the previous section, there are many unsolved issues to be dealt with. Our motivation for this work was to create a volumetric mesh with a low tetrahedra count for animation purposes. Due to the early stage of this work, we have not yet validated whether our approach is useful for animation.

References

Aanæs, H., and Bærentzen, J. A. 2003. Pseudo-normals for signed distance computation. In Proceedings of Vision, Modeling, and Visualization.

Molino, N., Bridson, R., Teran, J., and Fedkiw, R. 2004. Adaptive physics based tetrahedral mesh generation using level sets. (in review).

Mueller, M., and Teschner, M. 2003. Volumetric meshes for real-time medical simulations. In Proc. BVM, 279–283.

Nooruddin, F., and Turk, G. 2003. Simplification and repair of polygonal models using volumetric techniques. IEEE Transactions on Visualization and Computer Graphics 9, 2, 191–205.

OpenTissue, 2004. http://www.diku.dk/forskning/image/research/opentissue/

Persson, P.-O., and Strang, G. 2004. A simple mesh generator in MATLAB. SIAM Review 46, 2 (June), 329–345.

Sethian, J. A. 1999. Level Set Methods and Fast Marching Methods: Evolving Interfaces in Computational Geometry, Fluid Mechanics, Computer Vision, and Materials Science. Cambridge University Press, Cambridge Monographs on Applied and Computational Mathematics.
Talking Faces – a State Space Approach

Tue Lehn-Schiøler
The Technical University of Denmark
Informatics and Mathematical Modelling
Email: [email protected]
July 5, 2004

Abstract

In this paper a system that transforms speech waveforms into animated faces is proposed. The system relies on continuous state space models to perform the mapping; this makes it possible to ensure video with no sudden jumps and allows continuous control of the parameters in 'face space'. The performance of the system is critically dependent on the number of hidden variables: with too few variables the model cannot represent the data, and with too many, overfitting is observed. To create a photo-realistic image an Active Appearance Model is used. Simulations are performed on recordings of 3-5 sec. video sequences with sentences from the Timit database. From a subjective point of view, the model is able to construct an image sequence from an unknown noisy speech sequence, even though the number of training examples is limited.

1 Introduction

The motivation for transforming a speech signal into lip movements is at least threefold. Firstly, the language synchronization of movies often leaves the actor's mouth moving while there is silence, or the other way around, which looks rather unnatural. If it were possible to manipulate the face of the actor to match the actual speech, it would be much more pleasant to view synchronized movies (and a lot easier to make cartoons). Secondly, even with increasing bandwidth, sending images via a cell phone is quite expensive; therefore, a system that allows single images to be sent and models the face in between would be useful. The technique would also make it possible for hearing impaired people to lip read over the phone. If the person at the other end does not have a camera on her phone, a model image can be used to display the facial movements. Thirdly, when producing agents on a computer (like Windows Office Mr.
clips) it would make communication more plausible if the agent could interact with lip movements corresponding to the (automatically generated) speech.

The idea of extracting phonemes or similar high-level features from the speech signal before performing the mapping to the mouth position has been widely used in the lip-sync community. Goldenthal [9] suggested a system called "Face Me!". He extracts phonemes using Statistical Trajectory Modeling. Each phoneme is then associated with a mouth position (keyframe). In Mike Talk [6], phonemes are generated from text and then mapped onto keyframes; however, in this system trajectories linking all possible keyframes are calculated in advance, thus making the video more seamless. In "Video Rewrite" [2], phonemes are again extracted from the speech, in this case using Hidden Markov Models. Each triphone (three consecutive phonemes) has a mouth sequence associated with it. The sequences are selected from training data; if a triphone does not have a matching mouth sequence in the training data, the closest available sequence is selected. Once the sequence of mouth movements has been determined, the mouth is mapped back onto a background face of the speaker. Other authors have proposed methods based on modeling of phonemes by correlational HMM's [21] or neural networks [10].

Methods where speech is mapped directly to facial movement are not quite as popular as phoneme-based methods. However, in 'Picture My Voice' [13], a time-dependent neural network maps directly from 11 × 13 Mel Frequency Cepstral Coefficients (MFCC) as input to 37 facial control parameters. The training output is provided by a phoneme-to-animation mapping, but the trained network does not make use of the phoneme representation. Also Brand [1] has proposed a method based on (entropic) HMM's where speech is mapped directly to images.
In [17] Nakamura presents an overview of methods using HMM's; the first, MAPV, converts speech into the most likely HMM state sequence and then uses a table lookup to convert it into visual parameters. In an extended version, MAP-EM, the visual parameters are estimated using the EM algorithm. Methods that do not rely on phoneme extraction have the advantage that they can be trained to work on all languages, and that they are able to map non-speech sounds like yawning and laughing.

There are certain inherent difficulties in mapping from speech to mouth positions; an analysis of these can be found in [7]. The most profound is the confusion between visual and auditive information. The mouth positions of sounds like /b/, /p/ and /m/ or /k/, /n/ and /g/ cannot be distinguished even though the sounds can. Similarly, the sounds of /m/ and /n/ or /b/ and /v/ are very similar even though the mouth positions are completely different. This is perhaps best illustrated by the famous experiment by McGurk [16]. Thus, when mapping from speech to facial movements, one cannot hope to get a perfect result, simply because it is very difficult to distinguish whether a "ba" or a "ga" was spoken.

The rest of this paper is organized in three sections: section 2 focuses on feature extraction in sound and images, in section 3 the model is described, and finally experimental results are presented in section 4.

2 Feature extraction

Many different approaches have been taken to the extraction of sound features. If the sound is generated directly from text, phonemes can be extracted directly and there is no need to process the sound track [6]. However, when a direct mapping is performed, one can choose from a variety of features. A non-complete list of possibilities includes Perceptual Linear Prediction or J-Rasta-PLP as in [1, 5], Harmonics of the Discrete Fourier Transform as in [15], Linear Prediction Coefficients as in [12], or Mel Frequency Cepstral Coefficients [9, 10, 13, 17].
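All of these block-based front-ends start by slicing the waveform into fixed-length frames before computing per-block features. As a toy illustration (plain Python; the function name and data layout are ours, and the per-block feature computation itself is omitted), a frame splitter aligned to a video frame rate might look like:

```python
def frame_audio(samples, sample_rate, fps=25):
    """Split a waveform into fixed blocks, one per video frame.

    This is only the framing step; a feature front-end (e.g. MFCC) would
    then compute a small feature vector from each returned block."""
    block = sample_rate // fps           # samples per video frame
    n_full = len(samples) // block       # drop the trailing partial block
    return [samples[i * block:(i + 1) * block] for i in range(n_full)]
```

At 16 kHz and 25 frames per second each block holds 640 samples, so the audio features line up one-to-one with the video frames.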
In this work the sound is split into 25 blocks per second (the same as the image frame rate) and 13 MFCC features are extracted from each block.

To extract features from the images an Active Appearance Model (AAM) [3] is used. The use of this model for lipreading has previously been studied by Mathews et al. [14]. AAM's are also useful for low bandwidth transmission of facial expressions [20]. In this work an implementation by Mikkel B. Stegman [19] is used. For the extraction, a suitable subset of images in the training set is selected and annotated with points according to the MPEG-4 facial animation standard (Fig. 1). Using these annotations a 14-parameter model of the face is created. Thus, with 14 parameters it is possible to create a photo-realistic image of any facial expression seen in the training set. Once the AAM is created, the model is used to track the lip movements in the image sequences; at each point the 14 parameters are picked up. In Fig. 2 the result of the tracking is shown for a single representative image.

Figure 1: Facial feature points.¹

Figure 2: Image with automatically extracted feature points. The facial feature points are selected from the MPEG-4 standard.

3 Model

In this work the mapping from sound to images is performed by a Kalman filter. The implementation makes use of the toolbox written by Kevin Murphy (http://www.ai.mit.edu/~murphyk/Software). Normally, HMM's or a neural network is used for the mapping. In the case of HMM's, a series of models is created, each one trained on a specific subset of data. At each time step, the model that has the highest likelihood wins and is then responsible for producing the image. In this work the entire sequence is considered at once and only a single state space model is trained.

¹ Image from www.research.att.com/projects/AnimatedHead
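The single state-space model can be illustrated with a one-dimensional toy version. A minimal sketch in plain Python with scalar, hypothetical parameters (in the paper A, B, C are matrices and, together with the noise covariances, are learned by EM; here they are simply given), showing the filter recursions that infer the hidden state from sound features and then predict the image features:

```python
def kalman_filter_1d(s, A, B, C, q, r):
    """Scalar analogue of the model: x_k = A x_{k-1} + noise (variance q),
    s_k = B x_k + noise (variance r).  Returns the filtered states and the
    predicted image features i_k = C x_k (observation map without noise)."""
    x, P = 0.0, 1.0                      # prior mean and variance of x
    xs = []
    for sk in s:
        # time update (state equation)
        x_pred = A * x
        P_pred = A * P * A + q
        # measurement update (sound observation equation)
        K = P_pred * B / (B * P_pred * B + r)   # Kalman gain
        x = x_pred + K * (sk - B * x_pred)
        P = (1.0 - K * B) * P_pred
        xs.append(x)
    return xs, [C * xk for xk in xs]
```

With a nearly noise-free sound observation the filtered state locks onto the input, and the image-feature prediction is just the observation map applied to it.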
In the case of the Kalman filter the model is set up as follows:

xk = A xk−1 + nxk    (1)
sk = B xk + nsk      (2)
ik = C xk + nik      (3)

In this setting ik is the image features at time k, sk is the sound features and xk is a hidden variable without physical meaning. x can be thought of as some kind of brain activity controlling what is said. Each equation has an i.i.d. Gaussian noise component n added to it. During training both sound and image features are known, and the two observation equations can be collected in one:

[sk]   [B]        [nsk]
[ik] = [C] xk +   [nik]    (4)

By using the EM algorithm [4, 8] on the training data, all parameters {A, B, C, Σx, Σs, Σi} can be found. The Σ's are the diagonal covariance matrices of the noise components. When a new sound sequence arrives, Kalman filtering (or smoothing) can be applied to equations (1, 2) to obtain the hidden state x. Given x, the corresponding image features can be obtained by multiplication, ik = C xk. If the intermediate smoothing variables are available, the variance of ik can also be calculated.

4 Results

The data used are taken from the vidtimit database [18]. The database contains recordings of a large number of people, each uttering ten different sentences while facing the camera. The sound recordings are degraded by fan noise from the recording PC. In this work a single female speaker is selected; thus 10 different sentences are used, nine for training and one for testing.

To find the dimension of the hidden state (x), the optimal parameters for the KF were found for varying dimensions. For each model the likelihood on the training and test sequences was calculated; the result is shown in Fig. 3 and Fig. 4. The test likelihood provides a statistical measure of the quality of the model and provides a way of comparing models. This allows comparison between different model approaches. Unfortunately the likelihood is not necessarily a good measure of the quality of a model prediction. If the distributions in the model are broad, i.e.
the model has high uncertainty, it can describe data well but is not a good generative model. Looking at the results in Fig. 3 and Fig. 4, it is seen that the likelihood of the KF on the test data has a peak around 40 hidden states. In Fig. 5 snapshots from the KF sequence are provided for visual inspection; the entire sequence is available at http://www.imm.dtu.dk/~tls/code/facedemo.php, where other demos can also be found.

No precise metric exists for evaluation of synthesized lip sequences. The distance between facial points in the true and the predicted image would be one way; another would be to measure the distance between the predicted feature vector and the feature vector extracted from the true image. However, the ultimate evaluation of faces can only be provided by human interpretation. Unfortunately it is difficult to get an objective measure this way. One possibility would be to get a hearing-impaired person to lipread the generated sequence; another to let people try to guess which sequence was real and which was computer generated. Unfortunately, such tests are time- and labor-demanding. Furthermore, these subjective tests do not provide an error function that can be optimized directly.

5 Conclusion

A speech-to-face mapping system relying on state space models is proposed. The system makes it possible to easily train a unique face model that can be used to transform speech into facial movements. The training set must contain all sounds and corresponding face gestures, but there are no language or phonetic requirements to what the model can handle. In this work a linear model is used, but future work will investigate the use of nonlinear models by using particle filtering and Markov Chain Monte Carlo methods. Other future work includes extracting emotions from the speech and mapping them to the face.

Figure 3: The likelihood evaluated on the training data. The Kalman filter is able to utilize the extra dimensions to improve the training result.
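In the scalar case, the filtering step of the model above reduces to a few lines. A minimal sketch in pure Python (one-dimensional state and observations, with fixed made-up parameters standing in for the EM-trained vector-valued ones of the paper):

```python
import random

# One-dimensional sketch of the state space model of Eqns (1)-(3):
#   x_k = A x_{k-1} + n^x_k   (hidden state)
#   s_k = B x_k + n^s_k       (sound observation)
#   i_k = C x_k               (image prediction)
# A, B, C and the variances are invented here; the paper learns
# the vector-valued parameters with EM.
A, B, C = 0.95, 1.0, 2.0
var_x, var_s = 0.01, 0.05

def kalman_predict_images(sounds):
    """Filter the sound stream with a scalar Kalman filter to estimate
    the hidden state x_k, then map it to image features as i_k = C x_k."""
    x_est, p_est = 0.0, 1.0            # state mean and variance
    images = []
    for s in sounds:
        # Time update (Eqn 1).
        x_pred = A * x_est
        p_pred = A * A * p_est + var_x
        # Measurement update with the sound observation (Eqn 2).
        gain = p_pred * B / (B * B * p_pred + var_s)
        x_est = x_pred + gain * (s - B * x_pred)
        p_est = (1.0 - gain * B) * p_pred
        images.append(C * x_est)       # predicted image feature (Eqn 3)
    return images

# Simulate a sound track from the model and reconstruct image features.
random.seed(0)
true_x, sounds = 1.0, []
for _ in range(200):
    true_x = A * true_x + random.gauss(0.0, var_x ** 0.5)
    sounds.append(B * true_x + random.gauss(0.0, var_s ** 0.5))
imgs = kalman_predict_images(sounds)
```

The predicted image track follows C times the hidden state closely; smoothing (a backward pass) would tighten the estimates further, at the cost of losing the online property.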
Figure 4: The likelihood evaluated on the test data. The Kalman filter improves performance as more hidden dimensions are added, and overfitting is seen for high numbers of hidden states.

Figure 5: Characteristic images taken from the test sequence when using the Kalman filter. The predicted face is to the left and the true face to the right.

References

[1] Matthew Brand. Voice puppetry. In Proceedings of the 26th annual conference on Computer graphics and interactive techniques, pages 21-28. ACM Press/Addison-Wesley Publishing Co., 1999.
[2] Christoph Bregler, Michele Covell, and Malcolm Slaney. Video rewrite: driving visual speech with audio. In Proceedings of the 24th annual conference on Computer graphics and interactive techniques, pages 353-360. ACM Press/Addison-Wesley Publishing Co., 1997.
[3] T.F. Cootes, G.J. Edwards, and C.J. Taylor. Active appearance models. Proc. European Conference on Computer Vision, 2:484-498, 1998.
[4] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. JRSSB, 39:1-38, 1977.
[5] S. Dupont and J. Luettin. Audio-visual speech modelling for continuous speech recognition. IEEE Transactions on Multimedia, 2000.
[6] T. Ezzat and T. Poggio. Mike talk: a talking facial display based on morphing visemes. Proc. Computer Animation, IEEE Computer Society, pages 96-102, 1998.
[7] F. Lavagetto. Converting speech into lip movements: A multimedia telephone for hard of hearing people. IEEE Trans. on Rehabilitation Engineering, 3(1), 1995.
[8] Z. Ghahramani and G.E. Hinton. Parameter estimation for linear dynamical systems. Technical report CRG-TR-96-2, University of Toronto, 1996.
[9] William Goldenthal, Keith Waters, Jean-Manuel Van Thong, and Oren Glickman. Driving synthetic mouth gestures: Phonetic recognition for FaceMe! In Proc. Eurospeech '97, pages 1995-1998, Rhodes, Greece, 1997.
[10] Pengyu Hong, Zhen Wen, and Thomas S. Huang. Speech driven face animation. In Igor S.
Pandzic and Robert Forchheimer, editors, MPEG-4 Facial Animation: The Standard, Implementation and Applications. Wiley, Europe, July 2002.
[11] T. Lehn-Schiøler, L. K. Hansen, and J. Larsen. Mapping from speech to images using continuous state space models. In Joint AMI/PASCAL/IM2/M4 Workshop on Multimodal Interaction and Related Machine Learning Algorithms, April 2004.
[12] J. P. Lewis. Automated lip-sync: Background and techniques. J. Visualization and Computer Animation, 2, 1991.
[13] Dominic W. Massaro, Jonas Beskow, Michael M. Cohen, Christopher L. Fry, and Tony Rodriguez. Picture my voice: Audio to visual speech synthesis using artificial neural networks. Proc. AVSP '99, 1999.
[14] I. Matthews, T.F. Cootes, J.A. Bangham, S. Cox, and R. Harvey. Extraction of visual features for lipreading. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(2):198-213, 2002.
[15] David F. McAllister, Robert D. Rodman, Donald L. Bitzer, and Andrew S. Freeman. Speaker independence in automated lip-sync for audio-video communication. Comput. Netw. ISDN Syst., 30(20-21):1975-1980, 1998.
[16] H. McGurk and J. W. MacDonald. Hearing lips and seeing voices. Nature, 264:746-748, 1976.
[17] Satoshi Nakamura. Statistical multimodal integration for audio-visual speech processing. IEEE Transactions on Neural Networks, 13(4), July 2002.
[18] C. Sanderson and K. K. Paliwal. Polynomial features for robust face authentication. Proceedings of International Conference on Image Processing, 3:997-1000, 2002.
[19] M. B. Stegmann. Analysis and segmentation of face images using point annotations and linear subspace techniques. Technical report, Informatics and Mathematical Modelling, Technical University of Denmark, DTU, Richard Petersens Plads, Building 321, DK-2800 Kgs. Lyngby, August 2002. http://www.imm.dtu.dk/pubdb/p.php?922.
[20] B.J. Theobald, S.M. Kruse, J.A. Bangham, and G.C. Cawley. Towards a low bandwidth talking face using appearance models.
Image and Vision Computing, 21(13-14):1117-1124, 2003.
[21] Jay J. Williams and Aggelos K. Katsaggelos. An HMM-based speech-to-video synthesizer, 1998.

Live Interpretation of Conductors' Beat Patterns

Declan Murphy
Computer Science Department
University of Copenhagen
[email protected]

Abstract

A method is presented for following and interpreting the conductor's gestures known as beat patterns. Taking knowledge of the expected beat pattern and observed baton coordinates as input, the system keeps track of where the baton is in relation to the beat pattern. Tempo, dynamic level and registration updates are made on the fly according to deviations of the observed coordinates from those expected based on the score. A suitable mathematical model of the conducting process is constructed, and an algorithm for following and interpreting the execution of the beat patterns is developed.

1 Introduction

The work of this paper builds upon a previous computer vision technique to track an unadorned conductor's baton [6] and takes a step towards a more complete conducting system. The camera-view location of the baton's tip is available in real time from the tracker, so the next task is to be able to follow and interpret the conductor's beat pattern as executed by the user. Deviations of the user's conducting from that expected for the music are to modulate the playing tempo, dynamics and articulation.

Rudolf [9] elucidated the structure of the conductor's beat patterns and how they have a sort of grammar. The task in this sense is to parse the execution of the beat patterns as they unfold on the fly in order to extract the expressive information. A symbolic understanding of the conductor's gestures is attainable by correlating the tracked motion with the standard conductors' beat patterns, some of which are shown in Fig. 1. While it ought to be possible to recognize all beat patterns (allowing for some delay) without any a priori expectation of which particular pattern it should be [10, 2], this approach would be neither necessary nor realistic. In practice, both the musicians and the conductor are quite familiar with the score before a performance. They all have a fair idea of how the music ought to turn out and of what to expect from each other. It is simply inconceivable that a real conductor would issue a 3-beat pattern where a 4-beat one would have been appropriate: such a degree of inconsistency would just never work in practice. The musicians do not need to be told how many beats are in a bar! Even if they did, it would be too late by the time they got the message. It is rather how the particular beat pattern is executed that conveys the important information, and accordingly it is the deviation of the executed beat pattern from what would have been expected that is to be recognized, not the beat pattern per se. It is therefore reasonable to annotate a score file with timing, dynamic and articulation indications, just as a real musician's score is, rather than using only raw MIDI values.

The conductor has a conception of an ideal performance interpretation, to which s/he instructs the players to keep as close as possible. This is done by issuing instructions to compensate for any drift from this ideal, in order that the performance should tend to return to a presumed state of perfection. For example, if the conductor considers that the musicians are playing too quickly, then a beat pattern will be executed more slowly than would otherwise have been expected. It is important to understand that the role of the conductor is not to play the orchestra as such, but instead to modulate their playing towards a coherent ideal interpretation. There are no absolute rules or values to be measured: it is instead a question of establishing expectation and recognizing subsequent deviation from it. (On a cognitive science note, as discussed in [5, Ch.
6], it may be argued at length that this process of establishing and maintaining expectation while deviating from it is what music is all about. It seems appropriate then that the meta-process of modulating performance should also be of this nature.)

Den 13. Danske Konference i Mønstergenkendelse og Billedanalyse, København, 19-20./8-04

The model then involves a representation of the score with beat pattern annotations, and a system for recognising the deviations from the expected tempo and volume as indicated by the sampled baton locations. These deviation data can then be used to modify live generated output.

2 Nature of the Task

Even though the general form of the beat pattern may be known beforehand, the actual performed beat pattern will in general deviate from this curve's exact shape, size, registration and rotation. Some of this deviation is perfectly natural and should not have any bearing on the output, but of course the whole point is that certain other deviations are significant. It becomes important to be able to recognize when a drift from the abstract pattern is just the natural variation of this particular instance of it and when it is a message to alter musical playback. The difficulty of this recognition is compounded by the fact that it must take place in real time and that the beat-pattern interpreter only receives periodic updates of the tip's position: just isolated sample points with which to determine a great deal of information on the fly. To make matters even worse, the score values for the nominal expected size and speed of the beat patterns change, e.g., as the music gets softer and slows down. In practice, the beat pattern follower can only rely on receiving sample point updates at the relatively coarse rate of 15 per second. Figure 2 illustrates how this coarseness introduces temporal uncertainty. The simulation and recognition of cues will not be attempted here.
It may safely be expected that a programmed computer will respond without fail at the appropriate time: it will not forget or get lost just because the previous thirty-seven bars have been rests, for example. The expressive timing of a voice's entry is relevant though, and this may be dealt with as a special case of articulation.

2.1 Previous Considerations

Fifteen samples per second would not be good enough at all for satisfactory temporal resolution if the baton were to play the music. The resolution of MIDI Time Code (MTC) is 1-2 ms, for example. Since conducting operates by regulating at a supra-note level, however, this ought to be just about good enough. If the sample rate were much lower then there would be pathological loss of information: the system would break down. There is a lower bound specified by the Nyquist sampling theorem (a corollary of Shannon's far-reaching theorem about entropy in information theory [11]), but greater resolution is required for determining the current location along the beat pattern and for simultaneously recognizing deformations of the template.

Figure 2: Illustrating the inherent temporal uncertainty of following a beat pattern with only periodic updates. In most beat patterns, the first beat occurs at the bottom of a down stroke, such as in the template on the left. In the scenario on the right, the last few points are marked by red crosses and the point marked "Now" has just occurred. The task involves determining the beat point as accurately as possible, giving rise to the question of where to expect the next sample. It could be that the tip is still traveling downwards as suggested in green; it could be that the tip has just bottomed out as suggested in blue; it could be that the tip will bottom out before the next sample arrives as in orange. The system cannot predict the future with certainty, but yet it will most likely be too late if it waits until the next sample arrives.
Some considerations which were instrumental in formulating the template and designing the following algorithm were:

1. The major phases of all of the beat patterns may be recognized by the tip's rough position q and approximations to its first and second time derivatives q̇, q̈. If there were on-board accelerometers, or if numerical computation of derivatives were not inherently unstable ([1], [5, §A.1.2]), then this would be a preferred way to proceed. It would offer automatic accommodation of individual style and freestyle, morphing between levels of expressivity, ease of encoding and efficient computation.

2. If the spatial match is very good, then the tempo should be adjusted to match the conductor.

3. If q lies significantly outside of the expected curve, signal a crescendo; if q lies inside, a diminuendo.

4. Extrema may be used to decisively update scale, translation and tempo. In-between extrema, there must be anticipation in order for the system concept to work. This anticipation should be based on the expectation derived from the score file, viz. crescendo, ritardando, etc.

Figure 1: The range of (more or less) standard 4-beat patterns, ranging from light-staccato to molto-espressivo, according to Rudolf [9]. Note the increasing lack of synch points.

2.2 Clarification of the Task

The system is given:

• snapshot samples of q(ti) ∈ R², the tip in flight, for some discrete time values ti,
• a score file annotated with indications of beat patterns, dynamics, tempo, etc.,
• a template of the beat pattern as a parametric continuous closed curve p in R².
The system must calculate:

• q's parametric position in the template, in order to determine the conductor's tempo,
• updates of the scale factor of the template, in order to determine the conductor's dynamics,
• updates of the translation of the template, in order to continue to match the conductor's execution with the standard template successfully and accurately.

The rotation of the template shall be assumed to be fixed. The camera can simply be rotated if necessary, but even this simple once-off calibration should not normally be necessary. People do not normally lose their sense of up and down, so rotation matching is considered to be not worth implementing. Without rotation, any drift or translation of the conductor's beating may be completely described by a vector b ∈ R².

2.3 Representation of the Template

The first approach considered was to encode each of the standard beat patterns (from Rudolf [9]) as normalized continuous closed parametric curves p : [0, 1) → R², constructed out of simple polynomials as splines. To this would be added a time curve κ : [0, 1) → [0, 1) such that the composition p ∘ κ maps at the same rate as the gesture. This idea of standardizing the beat patterns with simple polynomials was replaced by tailored splines constructed from recordings of the user's execution of the beat pattern. Tailored splines were preferred for a couple of reasons. Firstly, there is a natural variation of style of execution between conductors, between text books on the subject, and between pieces with the same conductor. Secondly, although the first approach was more elegant from an analytic point of view, the recorded splines are more practical from a pragmatic engineering perspective. This approach also allows the user to pre-program special articulation, such as off-beat entry cues, with relative ease.
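The split into a spatial curve p and a time map κ can be realized from one recorded bar of baton points as follows (a pure-Python sketch with linear interpolation standing in for the spline fitting; the sample points are invented):

```python
import math

def build_template(points):
    """Split one recorded bar of baton points (sampled uniformly in
    time over [0,1)) into a spatial curve p, parametrized by
    normalized arc length, and a time map kappa from normalized time
    to arc-length parameter.  Linear interpolation stands in for the
    tailored splines of the paper."""
    # Cumulative arc length along the recorded polyline.
    d = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        d.append(d[-1] + math.hypot(x1 - x0, y1 - y0))
    total = d[-1]
    u = [di / total for di in d]          # arc-length parameter in [0,1]

    def kappa(t):
        """Normalized time -> arc-length parameter (monotonic);
        flat where the baton holds steady."""
        pos = t * (len(points) - 1)
        k = min(int(pos), len(points) - 2)
        f = pos - k
        return u[k] + f * (u[k + 1] - u[k])

    def p(s):
        """Arc-length parameter -> point on the curve."""
        for k in range(len(u) - 1):
            if u[k] <= s <= u[k + 1]:
                f = 0.0 if u[k + 1] == u[k] else (s - u[k]) / (u[k + 1] - u[k])
                (x0, y0), (x1, y1) = points[k], points[k + 1]
                return (x0 + f * (x1 - x0), y0 + f * (y1 - y0))
        return points[-1]

    return p, kappa

# A crude down-up stroke recorded at eight sample times, with the
# baton lingering near the bottom (dense time samples, no movement).
pts = [(0.0, 1.0), (0.0, 0.5), (0.0, 0.1), (0.0, 0.0),
       (0.0, 0.0), (0.0, 0.1), (0.0, 0.5), (0.0, 1.0)]
p, kappa = build_template(pts)
```

Parametrizing p by arc length is what keeps the spatial curve free of the spurious lingering points mentioned above: the duplicated sample at the bottom collapses to a flat region of κ rather than a cluster of points on p.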
The convenience of considering the template's spatial p and temporal κ dimensions separately was retained for several reasons:

• much more smooth and accurate splines are attained in regions of low baton movement (otherwise spurious points linger in a small neighbourhood which should only have one point),
• invertibility is not feasible otherwise (since some patterns require the baton to hold steady), and the guaranteed invertibility of κ allows a more efficient implementation of locally inverting p ∘ κ,
• direct independent scaling of size (amplitude) and speed (tempo),
• easier analytic simulation,
• easier analysis and adjustment of recorded templates.

p : [0, 1) → R² is continuous, and p is piecewise C¹. For each beat pattern template, there exist points ρ0, ..., ρnr−1 ∈ [0, 1) such that p is not differentiable at κ(ρj), 0 ≤ j < nr. These r = {ρj : 0 ≤ j < nr} are the most visually salient points of the beat pattern to the human eye, called "rough" points in this paper. There also exist σ0, ..., σns−1 ∈ [0, 1) such that, for all σj ∈ s = {σj : 0 ≤ j < ns}, a beat point occurs at p ∘ κ(σj) and either σj ∈ r, ṗx(κ(σj)) = 0 or ṗy(κ(σj)) = 0, where p(t) = (px(t), py(t)) and the dot denotes the time derivative as usual. In other words, a beat occurs at p(κ(σj)) that may be located accurately in both space and time. The σj ∈ s will be called "synch" points. Figure 3 shows a legato-espressivo 4-beat pattern with its rough points and synch points marked.

Figure 3: The most salient points to the eye are the non-differentiable ones ρ0, ρ1, ρ2. They are also the points about whose location the system can be the most sure. Beats number 2, 3 occur at turning points, or alternatively at zero-crossings of the x-axis; beats number 1, 4 occur at local minima of y and are therefore synch points.

3 Model and Algorithm

3.1 Model

Each beat pattern has its own template {p, κ, r, s} where r = {ρ0, . . .
, ρnr−1} is the set of rough points and s = {σ0, ..., σns−1} is the set of synch points. The template curve p is centered about the origin 0 ∈ R², which corresponds to the intersection of the two axes which define the conductor's field of beating. It is also normalized with respect to the templates for the other beat patterns to mf. The template time function κ is both continuous and bijective on [0, 1); it is proved in [5, §A.1.1] that κ must therefore be monotonic. Without loss of generality, κ(0) = 0 and lim(t→1) κ(t) = 1. r = {t ∈ [0, 1) : p ∘ κ is not differentiable at t} is a finite set (by virtue of the shape of the beat patterns), so without loss of generality they can be ordered such that j < k implies ρj < ρk, 0 ≤ j, k < nr. The set s shall be ordered similarly.

The normalized template is to be scaled and translated to fit to the incoming data, so a point p(κ(t)) for some κ(t) will have the general form a p + b. a ∈ (0, ∞) scales the template closed curve p[0, 1), centered about the origin, as a multiplicative scalar. b ∈ R² translates the scaled template as an additive vector. The remaining variable in the model is the current tempo m ∈ [0, ∞), where one bar lasts m seconds so that, for the example in Fig. 4, if there are nominally 60 crochet beats per minute and two such beats per bar then m = 2. The expected location of q at time t can now be written as

    E_t(q) = a p(κ((t mod m(t)) / m(t))) + b.

The algorithm consists of comparing the observed value q(ti) with E_ti(q)|ti−1 at each time step, and updating m, a, b such that q(ti) ≈ E_ti(q)|ti. This procedure can now be stated formally.

Figure 4: The musician-readable score file for the simulation.

3.2 Have

The system is equipped with the following information:

• p : [0, 1) → R², t ↦ (px(t), py(t)), lim(t→1) p(t) = p(0): a piecewise differentiable, continuous, closed, parametric curve in R².
• κ : [0, 1) → [0, 1), κ(0) = 0: a continuous, piecewise differentiable, bijective time spline.
• r = {ρ0, ..., ρnr−1} ⊂ [0, 1) with ρj < ρk for 0 ≤ j < k < nr. p is differentiable on the interval κ(ρj, ρj+1) but p is not differentiable at the point κ(ρj). p is also differentiable on κ(0, ρ0) and κ(ρnr−1, 1).
• s = {σ0, ..., σns−1} ⊂ [0, 1) with σj < σk for 0 ≤ j < k < ns. For all σj ∈ s, p(κ(σj)) is a beat point and either σj ∈ r, ṗx(κ(σj)) = 0 or ṗy(κ(σj)) = 0.
• ms(ti), as(ti) ∈ R+: the nominal score values for tempo and dynamics at time ti.
• mc(ti−1), ac(ti−1) ∈ R+, bc(ti−1) ∈ R²: the previous conductor's values for tempo, dynamics and offset respectively.

3.3 Know

The trace of hypothetical neutral conducting, following only the score, is given in the model by

    s(t) = as(t) (p ∘ κ)((t mod ms(t)) / ms(t)).

The expected location of the conductor's trace at time ti is based on the most recent values of the tempo, dynamics and offset according to the conductor, mc(ti−1), ac(ti−1), bc(ti−1). This information is combined with the values from the score for the current time, to effect any changes. Let

    ā(ti) = ac(ti−1) as(ti) / as(ti−1),    m̄(ti) = mc(ti−1) ms(ti) / ms(ti−1).    (1)

Then the locus of q(τ) as calculated at ti is modelled as

    c̄_ti(τ) = ā(ti) (p ∘ κ)((τ mod m̄(ti)) / m̄(ti)) + bc(ti−1),    (2)

and the best estimation of q at ti is given by c̄_ti(ti).

3.4 Want

For every time step ti, the algorithm has to calculate the most plausible values for mc(ti), ac(ti), bc(ti) such that, with these new values, q(ti) lies on (or very nearly on) the modelled conductor's trace. This may be stated by c_ti(ti) ≈ q(ti) where

    c_ti(τ) = ac(ti) (p ∘ κ)((τ mod mc(ti)) / mc(ti)) + bc(ti).    (3)

In other words, c̄_ti(ti) are the expected points using the latest information from the last frame and the current nominal score values, and c_ti(ti) are the adjusted points.

3.5 First Step: Try to Align Tempo

In the simple case, only the tempo needs to be updated slightly. The first task is to find tq such that, within a certain interval (a, b),

    ‖q(ti) − c̄_ti(tq)‖₂ = min over a ≤ τ ≤ b of ‖q(ti) − c̄_ti(τ)‖₂.    (4)

If q is not in the vicinity of a rough point, i.e., ∀ρ ∈ r, ‖q(ti) − c̄_ti(ρ)‖₂ > a dr for an implementation-dependent constant dr ∈ R+, then (a, b) might just as well be [0, 1), except for numerical efficiency. If, say, ‖q(ti) − c̄_ti(ρj)‖₂ ≤ a dr for some 0 ≤ j < nr, then let t̂i = (ti mod m̄(ti)) / m̄(ti) and choose j′ ∈ {j, j − 1 mod nr} such that ρj′ < t̂i < ρ(j′+1 mod nr), where the choice of j′ depends on whether or not c̄_ti(ti) has passed c̄_ti(ρj) yet in this cycle. Now a, b are chosen such that (a, b) ⊂ (ρj′, ρ(j′+1 mod nr)), as illustrated in Fig. 5.

Figure 5: Updating the tempo mc(ti), the first task is to locate tq. In the case on the left, q(ti) is away from any rough points and so c̄_ti(tq) may safely be determined as the closest point on c̄_ti(τ) in a neighbourhood of q(ti). On the right, however, q(ti) is close to c̄_ti(ρj), so that c̄_ti(tq) must be found in a neighbourhood of c̄_ti(ti). Note that c̄_ti(tq) is not the closest point to q on c̄_ti(τ), marked "×", which would be much further away from c̄_ti(ti) in time.

3.6 Basic Case: Tempo Only

Now that tq has been calculated, the remaining task is to adjust mc(ti) so as to tend to align tq and ti. The distance in time which has to be aligned is factored over the length of a beat at tempo m̄(ti). Let l be the denominator of the time signature for the current bar, and as before let

    t̂i = (ti mod m̄(ti)) / m̄(ti).    (5a)

Then the updated tempo is

    mc(ti) = m̄(ti) / (m̄(ti) + (tq − t̂i) l).    (5b)

3.7 Next Case: Tempo and Amplitude

The criterion for needing to adjust the amplitude ac(ti) is that q is too far away from the expected locus, formally that ‖q(ti) − c̄_ti(tq)‖₂ ≥ da log(ā(ti) + 1) for an implementation-dependent constant da ∈ R+. If this is not the case then the amplitude is simply updated as expected by putting ac(ti) = ā(ti). Otherwise the task is to adjust ac(ti) by the least amount such that q(ti) ∈ c_ti[0, 1). This can be achieved without undue inefficiency by the technique known as bracketing and bisecting. Figure 6 sketches the scenario. The prior adjustment of mc(ti) carries through to the new scale and does not need to be recalculated.

Figure 6: Adjusting the amplitude so that q(ti) ∈ c_ti[0, 1). Since q(ti) lies in advance of c̄_ti(ti), c_ti(ti) is scaled via mc(ti) to be closer to q(ti).

3.8 Full Case: Offset Adjustment

When the conductor's time passes through a rough point ρ ∈ r, then any necessary adjustment of the registration b is made. Formally the condition is c_ti(ρ) ∈ c_ti(ti, ti+1) for some ρ ∈ r, but the implementation should perform the test in [0, 1) rather than along a curve in R², for reasons of both accuracy and efficiency. Let

    t′i = (ti mod mc(ti)) / mc(ti)  and  t′i+1 = (ti+1 mod mc(ti)) / mc(ti).    (6a)

Then the condition is that

    ∃ ρ ∈ r such that ρ ∈ (t′i, t′i+1) if t′i < t′i+1, or ρ ∈ (t′i, 1) ∪ [0, t′i+1) if t′i > t′i+1.    (6b)

The hypothesis here is that rough points will always be correctly placed spatially. Say that ρj ∈ r satisfies the above condition (Eqn 6b), and let ρj′ = ρ(j−1 mod nr). First the amplitude at tρj, ac(tρj), is set such as to satisfy

    ‖ac(tρj) p(κ(ρj)) − ac(tρj′) p(κ(ρj′))‖₂ = ‖q(tρj) − q(tρj′)‖₂,    (7)

which reduces to a somewhat messy but readily solvable quadratic in ac(tρj).
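The tempo-alignment update of Eqns (5a, 5b) is a one-liner once tq is known. A sketch with invented numeric values (m̄, ti and tq are illustrative; l = 2 as in a 2/2 bar):

```python
def update_tempo(m_bar, t_i, t_q, l):
    """Eqns (5a, 5b) as printed in the paper: nudge the tempo so that
    the conductor's parametric position t_q and the normalized clock
    position tend to align.  l is the time signature denominator."""
    t_hat = (t_i % m_bar) / m_bar                 # (5a)
    return m_bar / (m_bar + (t_q - t_hat) * l)    # (5b)

# The conductor slightly ahead of, exactly on, and slightly behind
# the expected bar position (t_hat = 0.25 for these arguments):
m_ahead = update_tempo(m_bar=2.0, t_i=2.5, t_q=0.30, l=2)
m_on = update_tempo(m_bar=2.0, t_i=2.5, t_q=0.25, l=2)
m_behind = update_tempo(m_bar=2.0, t_i=2.5, t_q=0.20, l=2)
```

A conductor ahead of the expectation (tq > t̂) yields a smaller value than one on time, and a conductor behind yields a larger one, i.e. the bar shortens or lengthens in the direction that closes the gap.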
Any necessary updating of the score amplitude value as is propagated through the time steps from tρj′ to tρj by ā in Eqns 1, 2, 3, and is implicit in Eqn 7 as such. It only remains to set b, which is given simply by rearranging a p + b = q:

    bc(tρj) = q(tρj) − ac(tρj) p(κ(ρj)).    (8)

In particular, this procedure ensures that the x-component of c_t(τ) is set on the upbeat-downstroke of each beat pattern.

The case for the conductor's time passing through a synch point is treated analogously, except that the hypothesis only assumes reliable spatialization along the axis that has the extremum. Say that σj ∈ s satisfies the above condition in Eqn 6b and that ṗy(κ(σj)) = 0. Let

    sy = {σ ∈ s : (d/dt) py(κ(σ)) = 0}

and let

    ρj′ = max{ρ ∈ r ∪ sy : ρ < σj} if it exists, and ρj′ = max{r ∪ sy} otherwise.

Then the equation for calculating the amplitude ac(tσj) becomes

    ac(tσj) py(κ(σj)) − ac(tρj′) py(κ(ρj′)) = qy(tσj) − qy(tρj′),    (9)

where q(t) = (qx(t), qy(t)), instead of Eqn 7.

4 Implementation

The composition function p ∘ κ needs to be inverted at selected points in a given neighbourhood in R² in order to determine tq from Eqn 4. This is actually a form of bracketing and bisecting, which was also used for determining ac(ti) in §3.7. Because of the low image resolution used, it takes only a few iterations to establish the result in the implementation. For most beat patterns, p is not a simple curve, i.e., it crosses itself and is not injective, but it is injective on the interval (a, b) of Eqn 4. The time spline κ does not need to be inverted, only the spatial spline p. By setting constants κa, κb ∈ [0, 1) so that

    κa = κ((a mod m̄(ti)) / m̄(ti)),    κb = κ((b mod m̄(ti)) / m̄(ti)),

only half the amount of computation is required in order to calculate c̄_ti between κa and κb in Eqn 2. The model and algorithm were implemented in C in order to maximize real-time performance by minimizing the overhead of processing time.
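The bracketing-and-bisecting search for tq can be sketched as follows: a coarse scan brackets the nearest template point to q, and the bracket is then shrunk iteratively (a pure-Python sketch with a ternary-style shrinking step; the stand-in curve is a circle, not a real beat pattern):

```python
import math

def nearest_param(curve, q, a, b, coarse=16, iters=20):
    """Find t in [a, b] minimizing the distance from curve(t) to q.
    A coarse scan brackets the minimum, then the bracket is bisected
    down around it -- a sketch of the 'bracketing and bisecting' of
    Sec. 4, assuming the distance is unimodal on the bracket."""
    def dist2(t):
        x, y = curve(t)
        return (x - q[0]) ** 2 + (y - q[1]) ** 2
    # Bracket: evaluate on a coarse grid and keep the best cell.
    ts = [a + (b - a) * k / coarse for k in range(coarse + 1)]
    best = min(range(coarse + 1), key=lambda k: dist2(ts[k]))
    lo = ts[max(best - 1, 0)]
    hi = ts[min(best + 1, coarse)]
    # Shrink the bracket around the minimum.
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if dist2(m1) < dist2(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2.0

# Stand-in closed curve: a unit circle traversed once over [0, 1).
circle = lambda t: (math.cos(2 * math.pi * t), math.sin(2 * math.pi * t))
tq = nearest_param(circle, (0.0, 1.1), 0.0, 1.0)   # nearest to the top
```

With a handful of coarse cells and a few shrinking iterations, the cost per frame is tiny, which is consistent with the remark above that only a few iterations are needed at the image resolution used.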
C was also chosen for portability reasons: this code was developed in particular to be directly incorporated into the PatternPlay system [5, 4] or to be encapsulated as an EyesWeb module [12], both systems being platforms for investigating the use of gestural expression for making music. The code is open source and could provide the basis for many other conducting or gesture recognition systems. It is available from [7] and on the CD-ROM accompanying [5].

In order to make up a complete basic conducting system, the remaining system components are: a baton tracker [6], a facility to record the user's own beat patterns, a sequencer for live output and a module for imposing performance patterns onto the score file. The module for recording the user's beat patterns has to automatically locate the rough points, mark the synch points, and integrate them into a score file database. The module for imposing performance patterns is implemented by means of a general framework for structural navigation and manipulation based on topos theory [3]. The full system is described in [5, §16.3.2].

4.1 Simulation

The algorithm presented in §3 was, at least in its finer details, worked out by building a simulation. The level of detail is somewhat less than a full implementation, but there should be no new problems in principle in scaling up to a full system. A hypothetical beat pattern p(t) was constructed simply by

    p(t) = (2t − 4t²) i + 2t j    for 0 ≤ t ≤ 1/2,
    p(t) = 2(1 − t) j             for 1/2 ≤ t < 1,    (10)

where i, j are the usual basis vectors in R². This pattern, along with some simulation data, is plotted in Fig. 7. The associated time spline κ is plotted in Fig. 8. Figure 4 shows the musician-readable score for which this simulation was run.
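Equation (10) is easy to reproduce; a sketch that samples the pseudo beat pattern (the plotting of Fig. 7 itself is omitted):

```python
def p(t):
    """The hypothetical beat pattern of Eqn (10): an arcing stroke
    for the first half of the bar, then a straight vertical segment
    back to the origin."""
    if 0.0 <= t <= 0.5:
        return (2 * t - 4 * t * t, 2 * t)   # (2t - 4t^2) i + 2t j
    return (0.0, 2 * (1 - t))               # 2(1 - t) j

# Sample the closed curve over one bar, t in [0, 1).
samples = [p(k / 100.0) for k in range(100)]
```

The curve starts at the origin, reaches (0, 1) at t = 1/2, and returns to the origin as t approaches 1, so the closure condition lim p(t) = p(0) of §3.2 holds.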
Articulation could be handled by matching beat pattern attributes to performance parameters. It is not incorporated into the work of this paper as it is envisaged that the user would manually make these associations, whether they be global for an entire piece or once oﬀ for a single bar. Cues for expressive timing of voice entry could also be implemented as special cases of articulation. 0.6 0.5 0.4 0.3 0.2 0.1 0 -0.1 0 0.1 0.2 0.3 0.4 0.6 0.5 0.7 0.8 0.9 Figure 7: A plot of the simulation pseudo beat pattern as given by Eqn 10 and some early developmental ﬁtting results. 1 "kappaspline.dat" 0.9 0.8 0.7 0.6 0.5 0.4 4.2 Discussion 0.3 The recurring pattern of f : (m, t) → 0.2 t mod m(t) m(t) 0.1 0 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 in several of the equations behaves well in that f is onto [0, 1). f (m, t)’s points of discontinuity as t → nm for any n ∈ N match κ(τ )’s point of discontinuity when τ → 1, which together result in a seamless continuous closed curve p ◦ κ ◦ f in R2 . As long as t = nm for all n ∈ N, then Figure 8: The time spline κ used for the simulation. This is a plot of actual spline interpolation values calculated by the implementation. t mod m ∂f (m, t) = − m2 ∂m will suddenly change the systems behaviour for constant user input. However, as well as the conductor modulating the music output, there is an additional feedback loop consisting of the generated music telling the conductor what is going on with the performance: the conductor knows the instant there should be any such change and behaves accordingly. In this way, even discontinuities arising from the score should not pose any problem which is suitable for the sizes and ratio of t and m. The fact that ∂f /∂m depends on t is not as bad as it may sound at ﬁrst: it is q which is used to adjust m, m does not adjust t as such. It may of course happen that ms , as change discontinuously depending on the score, which Den 13. 
to an alert conductor. The method presented is one approach to the problem, but other strategies are quite possible. As mentioned at the outset, there is a certain unavoidable uncertainty inherent in the nature of the task. The approach presented here was chosen as the one considered most promising of success and musically meaningful, at least until further user feedback becomes available. In particular, if reliable acceleration information were available (discussed in [5, §A.1.2]) then a different or indeed complementary method might well prove fruitful. In between a beat pattern's rough points, the motion is predominantly either up, down, left, right, or changing direction. The motion through the beat pattern may be parsed by knowing its approximate i, j velocity components. These ṗx, ṗy also give the slope, which may also be used as a fitting criterion. The motion can thus also be parsed in terms of its slope and direction. The hypotheses used as registration criteria in §3.8 are not guaranteed to hold. It may be that they are hallmarks of good style; it may be that they vary according to a conductor's style. Such data is not to be found in the literature. Upon reflection and without further data, they seem to be at least as strong as any rival hypothesis. Such quantities as ȧ, ċ, ḃ, |b| and ṁ could be monitored and used to regulate any adjustments, but it is impossible in general to discern the user's intention from drift or noise with certainty on the fly. One aspect of conducting which is not modelled is how different ensembles generally require different conducting technique from the same conductor, and how the same ensemble may require (or at least receive) a different style from different conductors, by virtue of their different musical background, preferred style, social character and mutual rapport.
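The phase map f of §4.2 above is small enough to write down directly; the following is a minimal sketch (not the paper's implementation), with the derivative valid away from t = nm:

```python
# Sketch of the phase map f(m, t) = (t mod m) / m and its partial
# derivative in m, as discussed in Section 4.2.

def f(m, t):
    """Fraction of the current beat already elapsed; onto [0, 1)."""
    return (t % m) / m

def df_dm(m, t):
    """d f / d m = -(t mod m) / m**2, for t not a multiple of m."""
    return -(t % m) / (m * m)

# As t approaches a multiple of m from below, f approaches 1 and
# then wraps to 0, matching kappa's discontinuity at tau -> 1.
print(f(0.5, 0.25), df_dm(2.0, 1.0))  # -> 0.5 -0.25
```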
Modelling a certain sense of personality might give the system a more human feel, but that is left for future research. The lack of any absolute reference frame or of absolute mensurable values for free gestures poses a problem if the task should become the playing of music. The degree of articulateness necessary for skillful playing of a musical instrument seems to require both haptic and aural feedback. In terms of using gestures captured by computer vision to control music, this problem has been circumvented by focusing on expectation instead. Indeed, the very lack of preciseness of the beat points of the more expressive and rubato beat patterns is presumably quite deliberate: to allow rubato. Rubato means "stolen" in Italian, the semantics being that there should be local expressive timing deviations over a steady underlying pulse. One can steal a moment from the start of a phrase boundary in order to stall just before it, for example. Figure 1 shows a range of 4-beat patterns increasing in both expressivity and rubato.

4.3 Filtering

One of the problems encountered with previous related work on conducting audio files [8] was that sudden changes in audio playback rate are distracting to the listener, whereas slow changes seem unresponsive to the user. Ideally one would like a system which can overlook the human lack of precision from one beat to the next, yet respond instantly if, e.g., the user suddenly stops. A routine for calculating the magnitude of a local deviation was envisaged. This magnitude could be used to reduce or stop the low-pass filtering of the beat timing. These issues turn out to be less problematic at the note level. The adjustment of the timing takes place in relation to the length of the last beat (Eqn 5b). This means that the music playback can comfortably come to an unexpected complete halt within one beat, a response time which rivals that of an alert musician.
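One possible form of the envisaged deviation gating is sketched below (an assumption, not the paper's routine): an exponential smoother whose blending weight collapses when the local deviation is large, so that beat-to-beat jitter is filtered while a sudden stop is tracked almost immediately.

```python
# Sketch (assumed form): deviation-gated low-pass filtering of the
# beat timing. 'jitter' is an assumed scale for normal imprecision.

def smooth_tempo(prev, sample, jitter=0.05):
    """Blend a new tempo sample into the running estimate."""
    deviation = abs(sample - prev)
    # Small deviation -> heavy smoothing; large deviation -> none.
    alpha = min(1.0, deviation / (4 * jitter))
    return prev + alpha * (sample - prev)

tempo = 1.0
for s in (1.02, 0.98, 1.01):        # ordinary jitter: barely moves
    tempo = smooth_tempo(tempo, s)
stopped = smooth_tempo(tempo, 0.0)  # sudden stop: tracked at once
```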
The distracting attributes arising from audio manipulation, such as frequency shift, disproportionate note attack, time scaling irrespective of note length, etc., are not a problem when the playback is generated. When it is only the sustain part of a note which is scaled, the time scaling is perceived as being more natural. In any case, generating the music in response to the conductor is more realistic than post-processing a recording which already has one particular interpretation stamped on it: the audio conductor must undo a pre-existing interpretation before asserting another, instead of working directly from the score.

5 Summary

A method for live following and interpreting of conductors' beat patterns is presented. Before play starts, the system has been provided with a score which is annotated with the conductor's own expected neutral execution of the beat patterns. During play, the system receives a live stream of the baton's coordinates. At each sample time-frame, an updated value is output of how advanced along the current beat pattern the conductor is, which can be used to regulate sequenced playback accordingly. Values are also output for the current dynamic level and any specially marked cues. The baton's progress through the given beat pattern, and more importantly any meaningful deviation from the expected beat pattern, is calculated by trying to fit each sample received to the latest beat-pattern curve. At every time step, this curve is updated in terms of tempo, dynamics and registration. This update is based upon the best estimate of these parameters as indicated by the conductor in relation to their nominal score values.
A suitable computational representation of the beat patterns is devised in consideration of the nature of the conducting task, of the variability of the movement from one part of a beat pattern to another, and of the variability of the various beat patterns from each other. Independent piecewise differentiable splines are used for the curves through space and time. The alignment of the expected and indicated tempo is performed by locally inverting the composition of the two splines. Significant drift in or out of the curve is aligned by updating the amplitude accordingly. Registration adjustment is accomplished at the visually salient "rough" and "synch" points. The method is implemented in a complete computer music conducting system released as GPL source code [7].

References

[1] Francis Begnaud Hildebrand. Introduction to Numerical Analysis. McGraw-Hill, New York, second edition, 1974.

[2] M. Lee, G. Garnett, and D. Wessel. An Adaptive Conductor Follower. In A. Strange, editor, Proceedings of the International Computer Music Conference, pages 454–455, San Francisco, USA, 1992. ICMA.

[3] Guerino Mazzola et al. The Topos of Music: Geometric Logic of Concepts, Theory, and Performance. Birkhäuser Verlag, 2002. In collaboration with Stefan Göller and Stefan Müller.

[4] Declan Murphy. Pattern Play. In Alan Smaill, editor, Additional Proceedings of the Second International Conference on Music and Artificial Intelligence, On-line technical report series of the Division of Informatics, University of Edinburgh, Edinburgh, Scotland, UK, September 2002. http://dream.dai.ed.ac.uk/group/smaill/icmai/b06.pdf.

[5] Declan Murphy. Expressive Manipulation of Musical Structure: From Gesture Tracking to Analysis of Music. PhD thesis, Copenhagen University, 2003. In submission.

[6] Declan Murphy. Tracking a Conductor's Baton.
In Søren Olsen, editor, Proceedings of the 12th Danish Conference on Pattern Recognition and Image Analysis, volume 2003/05 of DIKU report series, pages 59–66, Copenhagen, Denmark, August 2003. DSAGM, HCØ Tryk.

[7] Declan Murphy. Beat Pattern Recognition. Anonymous FTP, May 2004. ftp://ftp.diku.dk/diku/users/declan/beat_ptn/.

[8] Declan Murphy, Tue Haste Andersen, and Kristoffer Jensen. Conducting Audio Files via Computer Vision. In Proceedings of the Fifth International Gesture Workshop, LNAI, Genoa, Italy, April 2003. Springer. In press.

[9] Max Rudolf. The Grammar of Conducting: A Comprehensive Guide to Baton Technique and Interpretation. Macmillan, third edition, 1993. Prepared with Michael Stern.

[10] D. Rumelhart and J. McClelland. Parallel Distributed Processing. The MIT Press, 1986. Two volumes.

[11] Claude Elwood Shannon. A Mathematical Theory of Communication. Bell System Technical Journal, 1948.

[12] The InfoMus Lab, DIST, University of Genoa. The EyesWeb Project. WWW, October 2003. http://musart.dist.unige.it/sito_inglese/research/r_current/eyesweb.html.

Testing for difference between two groups of functional neuroimaging experiments

Finn Årup Nielsen∗†, Andrew C. N. Chen‡, Lars Kai Hansen†

July 5, 2004

Abstract

We describe a meta-analytic method that tests for the difference between two groups of functional neuroimaging experiments. We use kernel density estimation in three-dimensional brain space to convert points representing focal brain activations into a voxel-based representation. We find the maximum in the subtraction between two probability densities and compare its value against a resampling distribution obtained by permuting the labels of the two groups. As such it appears to be a general method for comparing the local intensity of two non-stationary spatial point processes. The method is applied to data from thermal pain studies where "hot pain" and "cold pain" form the two groups.
1 Introduction

Human functional neuroimaging examines the relationship between cognitive functions and brain areas with positron emission tomography (PET) or magnetic resonance imaging (MRI) brain scanners. Experiments typically investigate a specific brain function and determine its "activation" in the brain volume. This is usually done by scanning multiple subjects while they are under two different conditions (e.g., "activation" and "rest"). Statistical analysis of the scans, often employing the general linear model, results in a statistical parametric image volume that is summarized by its significant local maxima (Friston et al., 1995). These local maxima are presented in scientific articles by their 3-dimensional coordinates and, e.g., their z-score or p-value. Before the statistical analysis the brain scans are spatially normalized to a standard brain atlas, the so-called "Talairach atlas" (Talairach and Tournoux, 1988). This allows the 3-dimensional coordinates of the local maxima (the "Talairach coordinates") to be compared across studies. For optimal results a meta-analysis should use the statistical parametric image volumes themselves. Although neuroimaging databases containing such data are beginning to appear, e.g., the fMRI Data Center (Van Horn et al., 2001) and NeuroGenerator (Roland et al., 2001), the image volumes are typically not available and we have to resort to the Talairach coordinates. Access to the Talairach coordinates is made easier when they are represented in a database. Two such databases exist: the BrainMap database (Fox and Lancaster, 1994; Fox and Lancaster, 2002) and the Brede database (Nielsen, 2003).
∗ Neurobiology Research Unit, Rigshospitalet, Copenhagen
† Informatics and Mathematical Modeling, Technical University of Denmark, Lyngby
‡ Center of Sensory-Motor Interaction, Aalborg University, Aalborg

A number of studies have modeled the distribution of the Talairach coordinates in these databases, e.g., (Fox et al., 1997; Nielsen and Hansen, 2002). If the Talairach coordinates are restricted to a specific area, their distribution may be approximated with a Gaussian distribution and inference can be made with "parametric models" (Fox et al., 1997); e.g., Hotelling's T² can be employed to test for difference between two groups of coordinates (Christoff and Gabrieli, 2000; Nielsen et al., 2004). However, in many cases the distribution of the Talairach coordinates will have several spatial modes, and it has therefore been suggested to use Gaussian mixture models (Nielsen, 2001) or kernel density estimators (Nielsen and Hansen, 2002; Turkeltaub et al., 2002; Chein et al., 2002; Wager et al., 2003). The statistical analysis is usually performed in a mass-univariate setting where the number of statistical tests corresponds to the number of voxels in the volume. This results in a massive multiple comparison problem that is most often countered by employing random field theory for the statistical inference (Cao and Worsley, 2001). But permutation tests can also be used, by constructing the null distribution for the maximum statistic, where the maximum is taken across all the voxels in the statistical parametric image (Holmes et al., 1996; Nichols and Holmes, 2001). Below we describe a meta-analytic method that uses permutation tests together with the maximum statistic and kernel density estimation to give a statistical value for the difference between two groups of Talairach coordinates, thereby testing whether two groups of functional neuroimaging experiments are different. We will use Talairach coordinates from hot and cold pain experiments.
The pain modality is of special interest for our particular method since it typically causes a spatially multimodal activation pattern where several distinct brain regions are involved: the thalamus, the somatosensory cortex, the insula and the anterior cingulate cortex (Ingvar, 1999).

2 Method

A probability density volume is constructed by convolving a local maximum l (in the following called a "location") positioned in Talairach space at x_l with a 3-dimensional Gaussian kernel with isotropic variance (Nielsen and Hansen, 2002; Turkeltaub et al., 2002; Chein et al., 2002):

    p(x|l) = (2πσ²)^(−3/2) exp( −(x − x_l)ᵀ(x − x_l) / (2σ²) ),    (1)

where we select the kernel width σ = 1 cm. When we construct the probability density corresponding to a group of experiments, p(x|g), we combine the contributions from all the individual locations associated with the group (i.e., all the locations in all the experiments associated with the group):

    p(x|g) = Σ_{l∈g} p(x|l) p(l|g),    (2)

where the prior is simply set to p(l|g) = 1/|L_g|, i.e., inversely proportional to the number of locations in the group g. The continuous probability density is converted to a vector by sampling it in a regular grid:

    v_g ≡ p(x|g).    (3)

In the present application we use a coarse grid of (8 mm)³ voxels, resulting in 7752 voxels. As a statistic for the difference between two volumes (v₁ and v₂) we simply use the subtraction performed separately for each voxel:

    t = v₁ − v₂.    (4)

Figure 1: Visualization of the Talairach coordinates from (a) hot pain and (b) cold pain studies in a corner cube environment (Rehm et al., 1998). The glyphs are colored according to the experiment they belong to and are projected onto the walls. The blue curves are the outline of the brain. The thick red lines are the axes of the Talairach atlas, and the view point makes the upper left part of the back of the brain the closest point. The units on the axes are centimeters.
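Eqns 1–3 can be sketched in a few lines; the following is an illustrative stand-in (not the Brede Toolbox implementation), with toy grid and location values:

```python
# Sketch of Eqns 1-3: a kernel density volume over a regular grid
# from a list of 3-D locations, with prior p(l|g) = 1/|L_g|.
import math

SIGMA = 10.0  # kernel width: 1 cm, with coordinates in mm
NORM = (2 * math.pi * SIGMA ** 2) ** -1.5

def density_volume(locations, grid):
    """v_g: p(x|g) sampled at each grid point."""
    prior = 1.0 / len(locations)
    volume = []
    for x in grid:
        total = 0.0
        for xl in locations:
            d2 = sum((a - b) ** 2 for a, b in zip(x, xl))
            total += NORM * math.exp(-d2 / (2 * SIGMA ** 2))
        volume.append(prior * total)
    return volume

# Toy check: an 8 mm-spaced line of voxels around a single location.
grid = [(x, 0.0, 0.0) for x in (-8.0, 0.0, 8.0)]
v = density_volume([(0.0, 0.0, 0.0)], grid)
```

As expected, the sampled density peaks at the voxel containing the location and falls off symmetrically to either side.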
To counter the multiple comparison problem, the null distribution uses the maximum value across voxels:

    t = max_i (t_i).    (5)

The null distribution of this maximum statistic is established by permutation: a distribution is built up by permuting the assignment of experiments to the two groups, resulting in two new groups v₁* and v₂* that each comprise the same number of experiments as the original two groups. Thus the null distribution of the maximum statistic appears as

    t* = max_i (v*_1,i − v*_2,i).    (6)

The permutation is randomized sufficiently many times (n = 1, …, N) to generate a "smooth" and stable distribution. We may assign a conservative p-value to the ith voxel by counting the proportion of the N permutation maximum statistics that exceed its statistic t_i:

    P_i = (1/N) Σ_{n=1}^{N} [t_i < t*_n].    (7)

The p-values enable us to choose a statistically based threshold in the subtraction image. To demonstrate the method we invoke data from thermal pain studies, where the two groups are hot and cold pain. Such studies will typically apply a 45–50 °C or 0–5 °C stimulus to the subjects.
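The permutation scheme of Eqns 5–7 can be sketched as follows. This is an assumed data layout, not the paper's code: each experiment is a density volume given as a list of voxel values, a group's volume is taken here as the mean over its experiments, and only the p-value of the observed maximum (rather than the full per-voxel P_i map) is returned:

```python
# Sketch: permutation null distribution of the maximum statistic.
import random

def max_stat(group1, group2):
    """t = max_i (v1_i - v2_i) for the two group-mean volumes."""
    v1 = [sum(col) / len(group1) for col in zip(*group1)]
    v2 = [sum(col) / len(group2) for col in zip(*group2)]
    return max(a - b for a, b in zip(v1, v2))

def permutation_p(group1, group2, n_perm=1000, seed=0):
    """Conservative p: fraction of permuted maxima above the observed."""
    rng = random.Random(seed)
    observed = max_stat(group1, group2)
    pooled = list(group1) + list(group2)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # relabel experiments at random
        if max_stat(pooled[:len(group1)], pooled[len(group1):]) > observed:
            exceed += 1
    return exceed / n_perm

hot = [[1.0, 0.0], [1.0, 0.0]]   # two toy "hot" experiments
cold = [[0.0, 1.0], [0.0, 1.0]]  # two toy "cold" experiments
print(permutation_p(hot, cold, n_perm=200))
```

With the perfectly separated toy groups above, no permuted maximum can exceed the observed one, so the returned p-value is 0.0.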
The studies were added to the Brede database (Nielsen, 2003). Slight variations among the studies appear in the application of the Talairach atlases, and locations that conform to the so-called MNI space are adjusted before entry (Brett, 1999). All the locations from the pain experiments are shown in two panels in figure 1, where the color indicates the experiments the locations originate from. Table 1 lists all the included 24 hot and 8 cold pain experiments. The list is automatically generated from the information in the Brede database. Note that the pain stimulus is induced under varying contexts and at different sites on the body, and some studies contribute with several experiments, e.g., a study by Faymonville et al. contributes with 6 experiments.

Hot pain:
 1. WOEXT: 183 - Hot pain (Tracey et al., 2000)
 2. WOEXT: 186 - Attended heat pain on right hand (Brooks et al., 2002)
 3. WOEXT: 187 - Distracted heat pain on right hand (Brooks et al., 2002)
 4. WOEXT: 188 - Attended heat pain on left hand (Brooks et al., 2002)
 5. WOEXT: 189 - Distracted heat pain on left hand (Brooks et al., 2002)
 6. WOEXT: 217 - Hot pain in right hand (Craig et al., 1996)
 7. WOEXT: 225 - Hot pain on left hand (group 1) (Becerra et al., 1999)
 8. WOEXT: 227 - Hot pain on left hand (group 2) (Becerra et al., 1999)
 9. WOEXT: 230 - Painful heat on right fingers (Gelnar et al., 1999)
10. WOEXT: 233 - Hot pain on right hand in rest, mental imagery and hypnosis (Faymonville et al., 2000)
11. WOEXT: 234 - Hot pain on right hand in rest and mental imagery (Faymonville et al., 2000)
12. WOEXT: 235 - Hot pain on right hand during hypnosis (Faymonville et al., 2000)
13. WOEXT: 237 - Interaction between hypnosis and hot pain on right hand (Faymonville et al., 2000)
14. WOEXT: 238 - Correlated with pain ratings in hot pain on right hand in rest, mental imagery and hypnosis (Faymonville et al., 2000)
15. WOEXT: 240 - Interaction between hypnosis and pain ratings in hot pain on right hand (Faymonville et al., 2000)
16. WOEXT: 245 - Heat pain on right arm (Tölle et al., 1999)
17. WOEXT: 246 - Positive correlation with pain threshold (Tölle et al., 1999)
18. WOEXT: 248 - Correlation with pain intensity (Tölle et al., 1999)
19. WOEXT: 249 - Correlation with pain unpleasantness (Tölle et al., 1999)
20. WOEXT: 298 - Early phase heat pain (Casey et al., 2001)
21. WOEXT: 299 - Late phase heat pain (Casey et al., 2001)
22. WOEXT: 312 - Heat pain on left hand (Vogt et al., 1996)
23. WOEXT: 314 - Heat pain on left volar forearm (Adler et al., 1996)
24. WOEXT: 319 - Heat pain on left arm (Casey et al., 1996)

Cold pain:
 1. WOEXT: 182 - Cold pain (Tracey et al., 2000)
 2. WOEXT: 184 - Cold pain in left hand (Petrovic et al., 2000)
 3. WOEXT: 213 - Cold pain in right hand (Craig et al., 1996)
 4. WOEXT: 263 - Cold pain on right foot (Frankenstein et al., 2001)
 5. WOEXT: 264 - Cold pain on right foot masked by silent word reading (Frankenstein et al., 2001)
 6. WOEXT: 265 - Silent word reading while cold pain on right foot (Frankenstein et al., 2001)
 7. WOEXT: 266 - Cold pain versus cold pain with silent word reading (Frankenstein et al., 2001)
 8. WOEXT: 320 - Cold pain on left arm (Casey et al., 1996)

Table 1: List of included hot and cold pain experiments.

Figure 2: Empirical histograms of the maximum statistics t* after 1000 permutations. The thick red lines indicate the maxima for the hot and cold pain statistics t_hot and t_cold.

Both the statistics for hot and cold pain are considered, simply by reversing the subtraction (or actually, in the practical implementation, by finding the minimum):

    t_hot = max_i (v_hot,i − v_cold,i),    (8)
    t_cold = max_i (v_cold,i − v_hot,i).    (9)

Many of the operations performed in our method described above are implemented in the Brede Neuroinformatics Toolbox (Nielsen and Hansen, 2000).
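The remark about finding the minimum amounts to computing both one-sided statistics from a single subtraction pass; a minimal sketch with toy voxel values:

```python
# Sketch of Eqns 8-9: t_cold is just minus the minimum of the
# hot-minus-cold difference image.
v_hot = [0.2, 0.5, 0.1]   # toy voxel densities
v_cold = [0.3, 0.1, 0.1]

diff = [h - c for h, c in zip(v_hot, v_cold)]
t_hot = max(diff)     # Eqn 8
t_cold = -min(diff)   # Eqn 9
```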
3 Results and discussion

Figure 2 displays the empirical histograms of the null distributions of the maximum statistics t*. The thick red lines indicate the maximum statistics of the comparisons of interest, t_hot and t_cold. They show that our method does not find any large differences between hot and cold pain: the statistics of both hot and cold pain fall in the middle of the null distributions. The distribution for cold pain has a heavier tail than the hot pain distribution. This is caused by the fewer experiments that form the cold pain group.

Figure 3: Results from the permutation test in a corner cube environment. The red isosurfaces are for hot pain t_hot and the light blue isosurfaces are for cold pain t_cold, based on very low thresholds at P = 0.95.

Figure 3 shows isosurfaces with a very liberal threshold in the subtraction images of hot pain t_hot and cold pain t_cold. The first difference between the pain modalities appears in the right hemisphere, but only with very low support: for cold pain the most "significant" voxel has a p-value of P ≈ 0.25. We have previously proposed a method that uses a database of experiments to generate a null distribution of the correlation coefficient between two volumes (Nielsen and Hansen, 2004a; Nielsen and Hansen, 2004b). That method requires a database of dissimilar experiments to build up a null hypothesis, e.g., a pain experiment is compared with a memory or language experiment. The method we present in this contribution does not need this extra data, but relies only on data from the two groups that are being compared. Furthermore, our previous method is a global method performing an omnibus test for the entire volume and the entire set of locations, while our new method allows us to make inference at the voxel level. This is also an advantage over the Hotelling's T² test. However, to gain sufficient statistical power, the presented method does require that there are many experiments in each group.
4 Acknowledgment

Jørgen Kold is acknowledged for collection of the data. Finn Årup Nielsen is supported by the Villum Kann Rasmussen Foundation.

References

Adler, L. J., Gyulai, F. E., Diehl, D. J., Mintun, M. A., Winter, P. M., and Firestone, L. L. (1996). Regional brain activity changes associated with fentanyl analgesia elucidated by positron emission tomography. Anesthesia & Analgesia, 84(1):120–126.

Becerra, L. R., Breiter, H. C., Stojanovic, M., Fishman, S., Edwards, A., Comite, A. R., Gonzalez, R. G., and Borsook, D. (1999). Human brain activation under controlled thermal stimulation and habituation to noxious heat: An fMRI study. Magnetic Resonance in Medicine, 41(5):1044–1057.

Brett, M. (1999). The MNI brain and the Talairach atlas. http://www.mrc-cbu.cam.ac.uk/Imaging/mnispace.html. Accessed 2003 March 17.

Brooks, J. C. W., Nurmikko, T. J., Bimson, W. E., Singh, K. D., and Roberts, N. (2002). fMRI of thermal pain: effects of stimulus laterality and attention. NeuroImage, 15(2):293–301.

Cao, J. and Worsley, K. J. (2001). Applications of random fields in human brain mapping. In Moore, M., editor, Spatial Statistics: Methodological Aspects and Applications, volume 159 of Lecture Notes in Statistics, chapter 8, pages 170–182. Springer, New York.

Casey, K. L., Minoshima, S., Morrow, T. J., and Koeppe, R. A. (1996). Comparison of human cerebral activation patterns during cutaneous warmth, heat pain, and deep cold pain. Journal of Neurophysiology, 76(1):571–581.

Casey, K. L., Morrow, T. J., Lorenz, J., and Minoshima, S. (2001). Temporal and spatial dynamics of human forebrain activity during heat pain: analysis by positron emission tomography. Journal of Neurophysiology, 85(2):951–959.

Chein, J. M., Fissell, K., Jacobs, S., and Fiez, J. A. (2002). Functional heterogeneity within Broca's area during verbal working memory. Physiology & Behavior, 77(4-5):635–639.

Christoff, K. and Gabrieli, J. D. E. (2000).
The frontopolar cortex and human cognition: Evidence for a rostrocaudal hierarchical organization within the human prefrontal cortex. Psychobiology, 28(2):168–186.

Craig, A. D., Reiman, E. M., Evans, A., and Bushnell, M. C. (1996). Functional imaging of an illusion of pain. Nature, 384(6606):258–260.

Faymonville, M. E., Laureys, S., Degueldre, C., Del Fiore, G., Luxen, A., Franck, G., Lamy, M., and Maquet, P. (2000). Neural mechanisms of antinociceptive effects of hypnosis. Anesthesiology, 92(5):1257–1267.

Fox, P. T. and Lancaster, J. L. (1994). Neuroscience on the net. Science, 266(5187):994–996.

Fox, P. T. and Lancaster, J. L. (2002). Mapping context and content: the BrainMap model. Nature Reviews Neuroscience, 3(4):319–321.

Fox, P. T., Lancaster, J. L., Parsons, L. M., Xiong, J.-H., and Zamarripa, F. (1997). Functional volumes modeling: Theory and preliminary assessment. Human Brain Mapping, 5(4):306–311.

Frankenstein, U. N., Richter, W., McIntyre, M. C., and Remy, F. (2001). Distraction modulates anterior cingulate gyrus activations during the cold pressor test. NeuroImage, 14(4):827–836.

Friston, K. J., Holmes, A. P., Worsley, K. J., Poline, J.-B., Frith, C. D., and Frackowiak, R. S. J. (1995). Statistical parametric maps in functional imaging: A general linear approach. Human Brain Mapping, 2:189–210.

Gelnar, P. A., Krauss, B. R., Sheehe, P. R., Szeverenyi, N. M., and Apkarian, A. V. (1999). A comparative fMRI study of cortical representations for thermal painful, vibrotactile, and motor performance tasks. NeuroImage, 10(4):460–482.

Holmes, A. P., Blair, R. C., Watson, J. D. G., and Ford, I. (1996). Non-parametric analysis of statistic images from functional mapping experiments. Journal of Cerebral Blood Flow and Metabolism, 16(1):7–22.

Ingvar, M. (1999). Pain and functional imaging. Philosophical Transactions of the Royal Society of London, Series B, Biological Sciences, 354(1387):1347–1358.

Nichols, T. E. and Holmes, A. P. (2001).
Nonparametric permutation tests for PET functional neuroimaging experiments: A primer with examples. Human Brain Mapping, 15(1):1–25.

Nielsen, F. Å. (2001). Neuroinformatics in Functional Neuroimaging. PhD thesis, Informatics and Mathematical Modelling, Technical University of Denmark, Lyngby, Denmark.

Nielsen, F. Å. (2003). The Brede database: a small database for functional neuroimaging. NeuroImage, 19(2). Presented at the 9th International Conference on Functional Mapping of the Human Brain, June 19–22, 2003, New York, NY. Available on CD-ROM.

Nielsen, F. Å., Balslev, D., and Hansen, L. K. (2004). Mining posterior cingulate. NeuroImage, 22. Presented at the 10th Annual Meeting of the Organization for Human Brain Mapping, June 14–17, 2004, Budapest, Hungary. Available on CD-ROM.

Nielsen, F. Å. and Hansen, L. K. (2000). Experiences with Matlab and VRML in functional neuroimaging visualizations. In Klasky, S. and Thorpe, S., editors, VDE2000 - Visualization Development Environments, Workshop Proceedings, Princeton, New Jersey, USA, April 27–28, 2000, pages 76–81, Princeton, New Jersey. Princeton Plasma Physics Laboratory.

Nielsen, F. Å. and Hansen, L. K. (2002). Modeling of activation data in the BrainMap™ database: Detection of outliers. Human Brain Mapping, 15(3):146–156.

Nielsen, F. Å. and Hansen, L. K. (2004a). Assessing the reproducibility in sets of Talairach coordinates. NeuroImage, 22. Presented at the 10th Annual Meeting of the Organization for Human Brain Mapping, June 14–17, 2004, Budapest, Hungary. Available on CD-ROM.

Nielsen, F. Å. and Hansen, L. K. (2004b). Finding related functional neuroimaging volumes. Artificial Intelligence in Medicine, 30(2):141–151.

Petrovic, P., Petersson, K. M., Ghatan, P. H., Stone-Elander, S., and Ingvar, M. (2000). Pain-related cerebral activation is altered by a distracting cognitive task. Pain, 85(1-2):19–30.

Rehm, K., Lakshminarayan, K., Frutiger, S. A., Schaper, K. A., Sumners, D. L., Strother, S.
C., Anderson, J. R., and Rottenberg, D. A. (1998). A symbolic environment for visualizing activated foci in functional neuroimaging datasets. Medical Image Analysis, 2(3):215–226.

Roland, P., Svensson, G., Lindeberg, T., Risch, T., Baumann, P., Dehmel, A., Frederiksson, J., Halldorson, H., Forsberg, L., Young, J., and Zilles, K. (2001). A database generator for human brain imaging. Trends in Neuroscience, 24(10):562–564.

Talairach, J. and Tournoux, P. (1988). Co-planar Stereotaxic Atlas of the Human Brain. Thieme Medical Publisher Inc, New York.

Tölle, T. R., Kaufmann, T., Siessmeier, T., Lautenbacher, S., Berthele, A., Munz, F., Zieglgänsberger, W., Willoch, F., Schwaiger, M., Conrad, B., and Bartenstein, P. (1999). Region-specific encoding of sensory and affective components of pain in the human brain: a positron emission tomography correlation analysis. Annals of Neurology, 45(1):40–47.

Tracey, I., Becerra, L., Chang, I., Breiter, H., Jenkins, L., Borsook, D., and Gonzalez, R. G. (2000). Noxious hot and cold stimulation produce common patterns of brain activation in humans: a functional magnetic resonance imaging study. Neuroscience Letters, 288(2):159–162.

Turkeltaub, P. E., Eden, G. F., Jones, K. M., and Zeffiro, T. A. (2002). Meta-analysis of the functional neuroanatomy of single-word reading: method and validation. NeuroImage, 16(3 part 1):765–780.

Van Horn, J. D., Grethe, J. S., Kostelec, P., Woodward, J. B., Aslam, J. A., Rus, D., Rockmore, D., and Gazzaniga, M. S. (2001). The functional magnetic resonance imaging data center (fMRIDC): the challenges and rewards of large-scale databasing of neuroimaging studies. Philosophical Transactions of the Royal Society of London, Series B, Biological Sciences, 356(1412):1323–1339.

Vogt, B. A., Derbyshire, S., and Jones, A. K. P. (1996). Pain processing in four regions of human cingulate cortex localized with co-registered PET and MR imaging. European Journal of Neuroscience, 8(7):1461–1473.

Wager, T. D., Phan, K.
L., Liberzon, I., and Taylor, S. F. (2003). Valence, gender, and lateralization of functional brain anatomy in emotion: a meta-analysis of findings from neuroimaging. NeuroImage, 19(3):513–531.

AN IMAGE BASED SYSTEM TO AUTOMATICALLY AND OBJECTIVELY SCORE THE DEGREE OF REDNESS AND SCALING IN PSORIASIS LESIONS

David Delgado, Informatics and Mathematical Modelling, Denmark, email: [email protected]
Bjarne Ersbøll, IMM, Denmark, email: [email protected]
Jens Michael Carstensen, IMM, Denmark, email: [email protected]

Abstract

In this work, a combined statistical and image analysis method to automatically evaluate the severity of scaling in psoriasis lesions is proposed. The method separates the different regions of the disease in the image and scores the degree of scaling based on the properties of these areas. The proposed method provides a solution to one of the present problems in dermatology: the lack of suitable methods to assess the lesion and to evaluate the changes during the treatment. An experiment over a collection of psoriasis images is conducted to test the performance of the method. Results show that the obtained scores are highly correlated with scores made by doctors. This, and the fact that the obtained measures are continuous, indicates that the proposed method is a suitable tool to evaluate the lesion and to track the evolution of dermatological diseases.

Keywords: psoriasis, exploratory data analysis, segmentation, decision trees, classification

1 Introduction

One of the main problems in the treatment of dermatological diseases is the difficulty of tracking the evolution of the disease. Patients visit their physician several times so that the evolution of the disease can be monitored. However, because no objective methods to summarize the lesion exist, physicians make scorings and take notes to document the actual condition of the patient. A drawback of this method is its dependency on the individual physician.
The advances in image analysis during the last decade have led to the development of different methods to deal with related problems in the dermatological field. Engström [1] observed the effect of a new enzymatic debrider by following the evolution of the lesion area and the lesion color; these measurements were obtained from digitized photographs analyzed with a computer. Later, Hansen [2] developed an image system that included calibration to increase the quality of the images. The system diagnoses burns and pressure ulcers in animals, but the possibility of using it in humans was mentioned. In a recent paper, Hillebrand [3] used computer analysis of high resolution digital images to compare the skin condition of a group of females. In this work, a method to objectively score the degree of scaling and redness in psoriasis is proposed. The method performs a hierarchical segmentation to isolate the different structures present in the image: normal skin, red area and scales. Different values are obtained from these areas, and they are used to approximate the doctors' scorings. The dermatologists Lone Skov and Bo Bang of Gentofte Hospital of Denmark and the anonymous patients are gratefully acknowledged for their collaboration during the image acquisition sessions.

2 Segmentation of the areas present in the lesion

Psoriasis is a dermatological disease characterized by red, thickened areas with silvery scales [4]. In order to score the degree of scales and redness in psoriasis, the first step is to segment the different areas in the lesion. The wide variety of forms and different levels of severity that psoriasis can exhibit makes this task highly complex.
2.1 Segmentation of the lesion
The segmentation of the diseased area from the healthy area is based on the assumption that, under a suitable projection, both the normal skin and the lesion are approximately Gaussian distributed. This assumption was supported by an exploratory data analysis of a small set of psoriasis images in which several projections were considered. Furthermore, a principal component analysis [5] and an independent component analysis [6] on a dataset of 115 images indicated that the difference between the green and the blue band gives good contrast for discriminating between the lesion and the normal skin. The distribution of this difference approximately follows a mixture of two Gaussians. Estimating their means and variances makes it possible to identify the lesion by means of discriminant analysis. The parameters of the Gaussians were estimated according to Taxt [7]. Figure 1 shows the segmentation of the lesion.

2.2 Extracting the scales
Segmentation of the scales is complicated by the fact that scales may or may not appear in the image; if they appear, they may range from a few spots to a large area. Moreover, the non-uniformity of the areas with redness (ranging from red to brown) makes the task even harder. This variability implies that the lesion has to be considered in small areas where the change in redness is not significant. This can be accomplished by using watersheds [8] to mark the different scales and then a local clustering algorithm [9] to segment them. The approach requires specifying the number of watersheds. In this work, the number of watersheds is determined in two steps. First, a new image is created based on the watershed regions, in which each watershed area is replaced by the minimum value of that area. This new image is then thresholded, and the watersheds with values less than the threshold are the areas where scales are detected.
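The two-step selection just described (replace each watershed region by its minimum value, then threshold) can be sketched in plain Python. `values` and `labels` are assumed to be flattened lists of pixel intensities and watershed region labels; in practice the labelling would come from a watershed transform of the image:

```python
def select_scale_regions(values, labels, threshold):
    """Replace each watershed region by its minimum pixel value and
    keep the regions whose minimum falls below the threshold; these
    are the areas where scales are detected."""
    region_min = {}
    for v, lab in zip(values, labels):
        region_min[lab] = min(v, region_min.get(lab, v))
    return {lab for lab, m in region_min.items() if m < threshold}
```

This is an illustrative sketch only; the paper additionally allows the number of watersheds to be tuned visually when acquisition artifacts such as shadows are present.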
The method was tested on a set of psoriasis images and demonstrated good performance. However, it had difficulties with some images that suffered from acquisition problems (especially shadows), for which the number of watersheds was not found correctly. To solve this problem, the number of watersheds was fixed visually via a tuning parameter. The blue band was used to find the watersheds, because a canonical analysis had shown that this band best separates the scales from the red area. Figure 2 displays the segmentation of the scales.

2.3 Scoring the disease
Once the different areas have been segmented, a decision tree is created to automatically score the degree of scaling in the different images, approximating the scorings made by the physicians. Three variables are used as input to the model: the area of the scaling, the ratio between the area of scaling and the area of the lesion, and the ratio between the area of scaling and the area of redness. The whole procedure is shown in Figure 3. For the evaluation of redness, a canonical discriminant analysis of the differences between the mean spectral values in the red and the healthy areas suggests approximating the physicians' scorings with a clustering method, e.g. K-nearest neighbour.

Figure 3: Diagram of the method (image acquisition, segmentation, feature extraction, scoring).

Figure 4: Left: Decision tree for the scoring given the parameters of the segmentation.
Right: Dependency of lesion area on physicians' scoring of the lesions.

3 Experiments
Two experiments are conducted to test the accuracy of the proposed method for scoring the scaling and redness in psoriasis lesions.

3.1 First experiment: Scoring the degree of scaling
An experiment was conducted in collaboration with the dermatological department of Gentofte Hospital in Denmark. The goal of the experiment is to objectively score the severity of the scaling in psoriasis images. To accomplish this, a set of 46 psoriasis images was selected from a database of psoriasis images collected from different patients; the physicians' scores of these images were also available. The images were selected to cover the maximal possible diversity. The different areas of each image were extracted according to the procedure described in the previous sections, and the three summary values mentioned above were obtained. A cross-validation process was used to build 23 decision trees, each built from 44 data points and tested on the remaining two. Results showed that the first variable, the area of the scaling, is enough to explain the physicians' scoring. The automated scoring has proven reliable, and on several occasions it even revealed mistakes in the physicians' scores; in these cases, the physicians were asked to re-score their previous judgements, and in all cases the assessment was changed. Figure 4 (left) shows the final tree generated using all the points. Figure 4 (right) plots the area of the scaling versus the physicians' scoring.

3.2 Second experiment: Classifying the severity of redness
The second experiment assesses the possibility of automatically scoring the degree of redness of the lesion. To achieve this, a set of 77 psoriasis lesion images was selected from a data set of 175 images.
The selected images do not present shadows, scars or other elements that could spoil the result of the experiment. The severity of redness of each image was scored by the physicians in order to have a reference measure. The different areas in the chosen images were segmented according to the procedure described previously. The mean of the tri-chromatic bands was calculated in the healthy and in the red skin area of each image. The difference of these two means proved to be a good feature for evaluating the lesion.

Figure 5: Up: 3D plot of the variables considered to classify the redness. Down: Result of applying a canonical discriminant analysis to the three selected variables to classify the redness.

Figure 5 (up) shows a 3D plot of the differences, where the different symbols represent different physicians' scorings. The presence of three well-defined groups indicates the possibility of classifying the redness using a clustering algorithm. This is even clearer when a canonical discriminant analysis is applied, as can be seen in Figure 5 (down).

4 Summary and conclusion
In this work, a procedure to evaluate the severity of the scaling and redness in psoriasis has been developed. The method automatically separates the different parts of the lesion and extracts different parameters. In certain difficult cases, such as uneven illumination, it has been noticed that allowing manual interaction increases the accuracy notably. The method provides objective measures that avoid the dependence on the individual physician in the tracking of dermatological diseases. It has been shown that one of the provided measures is highly correlated with the doctors' scoring. Together with the other two measures, we expect to be able to provide a better lesion description.

References
[1] Engström N., Hansson F.
, Hellgren L., Tomas J., Nordin B., Vincent J. and Wahlberg A. Computerized Wound Image Analysis. In: Pathogenesis of Wound and Biomaterial-Associated Infections, Springer-Verlag, pp. 189-193, 1990.
[2] Hansen G., Sparrow E., Kokate J., Leland K., Iaizzo P. Wound Status Evaluation using Color Image Processing. IEEE Transactions on Medical Imaging, vol. 16, no. 1, February 1997.
[3] Hillebrand G., Miyamoto K., Schnell B., Ichihashi M., Shinkura R., Akiba S. Quantitative evaluation of skin condition in an epidemiological survey of females living in northern versus southern Japan. Journal of Dermatological Science, vol. 27, pp. 42-52, 2001.
[4] Camisa C. Handbook of Psoriasis. Blackwell Science, 1998.
[5] Johnson R., Wichern D. Applied Multivariate Statistical Analysis, Chapter 8. Prentice-Hall, 1995.
[6] Hyvärinen A., Karhunen J., Oja E. Independent Component Analysis. Wiley, 2001.
[7] Taxt T., Hjort L., Eikvik L. Statistical Classification using a Linear Mixture of two Multi-normal Probability Densities. Pattern Recognition Letters 12 (1991) 731-737.
[8] Vincent L., Soille P. Watersheds in digital spaces: an efficient algorithm based on immersion simulations. IEEE Trans. Pattern Anal. Mach. Intell. 13 (6) (1991) 593-598.
[9] Sonka M., Hlavac V., Boyle R. Image Processing, Analysis, and Machine Vision, pp. 128-130, 1999.

Survey and Assessment of Advanced Feature Extraction Techniques and Tools for EO Applications

Mikael Kamp Sørensen (a*), Michael Schultz Rasmussen (a), Henning Skriver (b), Peter Johansen (c), Arthur Pece (c), and Jesper Høyerup Thygesen (d)
(a) GRAS, c/o Inst. of Geography, Univ. of Copenhagen, Øster Voldg. 10, 1350 Copenhagen, Denmark
(b) Technical Univ. of Denmark, Ørsted-DTU, 2800 Kgs. Lyngby, Denmark
(c) Dept. of Comp. Science, Univ. of Copenhagen, Universitetsparken 1, 2100 Copenhagen, Denmark
(d) ROVSING A/S, Dyregårdsvej 2, 2740 Skovlunde, Denmark

ABSTRACT
In addition to the substantial amounts of available Earth Observation (EO) data, there is currently an increasing trend towards the acquisition of larger and larger EO data and image quantities from single satellites or missions, with multiple, higher-resolution sensors and with more frequent revisiting. More sophisticated algorithms and techniques than those largely in use today are required to exploit this rapidly growing wealth of data and images to a fuller extent. The project "Survey and Assessment of Advanced Feature Extraction Techniques and Tools for EO Applications" (SURF), funded by the European Space Agency (ESA), will address these issues. The objective of SURF is to provide an overview of the current state-of-the-art Methods within feature extraction and manipulation for EO applications, and to identify scenarios and related architectures for the exploitation of the most promising EO feature extraction Methods. The task is to identify the most promising Methods to extract pertinent information from EO data on environment, natural resources and security issues. SURF aims at listing existing Methods with the final goal of identifying the three most promising Methods to be implemented in prototype solutions. The work includes the development of a concept for the evaluation and rating of Methods relative to the users' needs for information, the maturity and novelty of the Methods, the potential for fusing data, and the operational feasibility. Special emphasis will be placed on the exploitation of state-of-the-art image processing, pattern recognition and classification techniques.
Keywords: automatic feature extraction, classification, survey, methods, EO, remote sensing, data fusion

INTRODUCTION
The global archives of remote sensing data are constantly increasing due to a number of factors. First, the number of operational remote sensing satellites is continuously growing. Second, the spatial and spectral resolutions of the new sensors are constantly improving due to technological advances, resulting in exponentially larger data volumes. Finally, the data transmission capabilities of modern satellites result in vastly increased data transfers. However, even though the use and exploitation of EO data is increasing, the vast data archives are not being fully exploited. It has been estimated that only a small percentage of the remote sensing data archived annually is actually utilized by end users. Part of the reason is that most remote sensing applications depend on image processing carried out by experts. Even though basic image processing such as geometric rectification and radiometric calibration is largely performed automatically for most remote sensing products, the conversion from image data to the thematic information required by end users is a constraining factor for the further proliferation of remote sensing applications. A number of initiatives have already focused on automatic feature extraction. Through the Information Society Technologies (IST) programme, the European Union seeks to mature technological advances. An example of such a project is Satellite Based Information System on Coastal Areas and Lakes (SISCAL) (SISCAL, 2003), which will provide near-real-time data on the marine environment. Examples of data mining projects include the ESA project Knowledge Driven Information Mining in Remote Sensing Image Archives (KIM).

* [email protected] gras.ku.dk; phone +45 35 32 41 75; fax +45 35 32 25 01; www.gras.ku.dk
However, there is still a need for developing automatic feature extraction methods that will make remote sensing information readily available to a wide range of end users. The "user" is considered a broad category, from professionals to the general public, from the environmental scientist to the analyst working in a county, or a citizen seeking information through the Internet. For this reason, EO data products must be exploitable in a user-friendly way, through a clear understanding and description of the information and through digital data formats that are easy to handle. Most end users work with thematic features that change over time, and the growing archives of remote sensing data further encourage and facilitate the analysis of time series. Change detection and trend analysis are important techniques that are able to match the user requirements within global change and environmental monitoring. It is important for the exploitation of remote sensing data to try to identify novel methods. An interesting perspective is the combined use of data from different sensors applying data fusion methods. In the context of the current project, data fusion may either refer to a specific data fusion algorithm (e.g. wavelets, PCA) combining two or more different data sources, or it may simply refer to the integrated use of two different image sources. Furthermore, it is the intention to take a broad view of possible candidate methods, not only within classic remote sensing sciences, but equally within related fields such as image processing as explored in medicine and in the study of materials and minerals using microscopes.
The project "Survey and Assessment of Advanced Feature Extraction Techniques and Tools for EO Applications" (SURF), funded by the European Space Agency (ESA), will deal with these issues. The main objectives of the SURF project are to obtain:
- an up-to-date picture of advanced Methods for the support of Earth Observation applications through feature extraction;
- the identification of possible scenarios and related architectures for the exploitation of the most promising EO feature extraction Methods;
- the implementation of three Methods using real image data from three cases.
The objectives will be achieved through a survey, evaluation and ranking of Methods applicable to the EO domain. For the three most promising Methods, prototypes will be developed for evaluation using image data. In the SURF project, a Method (with capital M) is defined as a way of solving a problem within a given context. A Method may be a specific algorithm or technique, or it may be understood in a broader sense, such as a complete software package. Feature extraction can be defined as the conversion from image data to thematic information. In the SURF project, the intention is to support the user in terms of pre-processing by automating the complex part of the image processing, identifying features, patterns, trends, changes etc., and hereby making this information readily available. The approach to the screening of EO research status and trends will start with the identification of thematic areas where EO data have demonstrated their usefulness. Thematic areas where EO data are expected to provide important information in the future will equally be included.
Most scientists maintain the idea of their work becoming useful to society in one form or another, and the objective of the present paper is to demonstrate how scientific results are evaluated for possible real-world use, and hereby to propose a set of important criteria. This will be done by presenting the strategy for the evaluation and selection of Methods within the SURF project. The work will be placed in context through the description and evaluation of three candidate Methods: entropy-based prediction, NDVI trend maps from time series, and knowledge-based classification of SAR data.

METHODOLOGY
In order to meet the objectives mentioned above, the survey started with a broad approach, listing the main thematic areas, applications and Methods. In the consecutive steps, the list of candidate Methods will be progressively narrowed down, and in the end three Methods will be selected for prototype implementation. At the application level, user considerations are most important: if there is no or little user interest in an application, no further effort will be devoted to that particular application. At the Method level, the technical considerations are central: if a certain Method is very difficult to implement for any reason, it will get a low rating and will not be considered for prototyping. Table 1 provides an example of the relation between thematic areas, applications and Methods.

Table 1. Example of the relation between thematic areas, applications and Methods.
Thematic area: Environmental monitoring
Thematic sub-area: Environmental degradation
- Application: Monitoring deforestation — Methods: Regression tree classifier; Multivariate Alteration Detection and MAF; Hyper-columns using Gabor filters
- Application: Mapping desertification — Methods: NDVI trend analysis; Context-based object-oriented classification

The rating and selection of the most promising Methods will be carried out in close dialog with end users, the scientific community, as well as with ESA.

STEP ONE: SELECTION OF THEMATIC AREAS AND APPLICATIONS
An initial assessment of the user needs within each thematic application is made, considering the following factors:
a) Users' need for information – The users' need for information within the category is evaluated. Applications that are not considered useful will result in a low score.
b) Provision of information relative to other sources – If the information potentially provided from EO data can easily and cost-effectively be obtained through other (non-EO) data sources, it should result in a low score. On the other hand, if no other means than EO data is available for obtaining the information, or if complementary data can be obtained using remote sensing data, a high score should be given.
c) Users' need for automation – This category evaluates the users' need for automation in a given area. The need will be high when working with operational topics or in areas with large amounts of data. Automation within EO data processing may equally be able to provide users with products or semi-products that can be automatically extracted from EO data during the pre-processing, and hereby become more easily available and less costly to the users. For ad-hoc applications, the need for automation may be less pronounced.
The assessment of user needs is first carried out by the consortium and rated according to the three categories mentioned above. This will be followed by a dialog with users, user communities or representatives to get feedback on the rating and selection, and hereby allow adjustment of the ratings. During the initial phase of the project, five overall thematic areas were identified. Each thematic area is characterized by a number of sub-thematic areas, and within each of these sub-areas, a number of thematic applications are found. The identification of thematic areas and applications is mainly based on the literature and knowledge of existing operational applications. Table 2 shows the preliminary list of thematic areas. For each sub-area, the relevance of data categories or possible data fusion scenarios is indicated.

Table 2: List of the five thematic areas and related applications. For each application, the table marks the relevance of optical data, radar data, data fusion and time series.
- Environmental monitoring: environmental degradation; land cover and land use change; eutrophication and oxygen depletion of the marine environment; oil spill monitoring
- Global change: changes in vegetation productivity (carbon pools); land cover changes; changes in (patterns of) Sea Surface Temperature (SST); changes in marine chlorophyll a concentrations
- Security: sea-ice monitoring and warning; flooding, mapping and forecast; identifying forest or savannah fires; meteorology; volcanic activity
- Policy and Planning: carbon accounting; land cover mapping and changes; agricultural subsistence and control; cartography
- Resource Assessment: geology; fisheries; forestry; fresh water

STEP TWO: LISTING OF METHODS
Based on the list of thematic areas and applications, a large number of feature extraction Methods were identified, based on the experience of the consortium as well as an initial literature
search. During this phase, Methods are identified without any further evaluation. An example of Methods listed within a given application is found in Table 3.

Table 3. Example of Methods for monitoring deforestation:
- RGB-NDVI differencing
- Multivariate Alteration Detection / Maximum Autocorrelation Factors
- The VTT auto-change Method
- Iterative fitting of Principal Component Axis
- Regression tree classifier
- PCA differencing
- Decision tree fuzzy classification
- Hyper-columns using Gabor filters
- Classifications using space-time segmentation
- Context-based object-oriented classification
- Neural networks
- Entropy prediction
- Knowledge-based classification

STEP THREE: EVALUATION OF METHODS WITHIN THE MOST IMPORTANT APPLICATIONS
For each Method, an evaluation of the technical considerations is carried out. This project phase will be subjected to three iterations in order to secure a thorough evaluation, where new inputs and additional information can be added in the process. The following categories are used to rate each Method within promising applications:
a) Maturity of the Method in terms of technical and EO compliance – This parameter considers how robust the Method is. One criterion is the amount and generality of publications available on the Method, preferably from different sources and with examples of their implementation. Also, if the Method has been implemented in commercial software packages or operational projects, or if it is routinely used by various institutions, it may be considered mature. The Method in question does not necessarily need to have a history in EO applications in order to be mature; maturity may also refer to well-established applications in other areas of, say, pattern recognition.
b) Novelty of the Method – The Method should have been first published or reported recently to get a high score. It does not have to be an innovative algorithm; the novelty may equally consist in combining well-established Methods or applying well-known Methods in an innovative manner.
c) Operational perspectives (timing, effort, computational requirements, cost etc.) – The perspectives for implementing the Method will be evaluated here. A Method that is computationally demanding (e.g. extensive iterative processes) will be given a low score if it restricts the timely output of results (for instance oil spill mapping, where the results have to be delivered in near real time). If the Method is capable of delivering timely outputs within a reasonable financial limit, a high score should be given.
d) Accuracy of the Method in relation to the specific application – A high absolute accuracy gives a high score. If the relative accuracy is high (providing a correct assessment of the spatial distribution), the Method can be given a high score even if the absolute accuracy is poor. An example is chlorophyll a concentration monitoring, where the absolute accuracy is poor, but the product meets the needs of environmental monitoring institutions for an overview of the spatial distribution of algae.
e) Potential use of the Method in other areas – If a Method is extremely specific and can only be applied within a single or a few areas, a low score should be given. On the other hand, if the Method is of a general nature that allows for its application in other thematic areas, a high score should be given. The number of cross-references will be a useful indicator (cross-references indicate whether the same, or nearly the same, Method is listed in other applications).
f) Perspectives for automation – If a Method can be run completely automatically, a top score should be given. If a substantial amount of manual work is required, a low score should be given. For Methods where some user interaction is required (e.g. specification of constants), a moderate score is appropriate.
g) SURF feasibility – The feasibility of implementing a given Method within the SURF project will be assessed separately. If a Method is extremely demanding to implement, it will not be possible under the current project.
After the evaluation of more than 50 Methods, approximately 20 Methods will be selected based on the evaluation scores listed above. Selected users and scientists (or groups thereof) will be consulted to make comments and suggestions on the ratings and decisions being made.

STEP FOUR: DETAILED ANALYSIS OF SELECTED METHODS
The 20 Methods selected in step three will be narrowed down to the three Methods that will be used for prototype development. The process of selecting these three Methods will involve a detailed description of the 20 candidate Methods. The feasibility of the Methods will be assessed from the literature and through contact with the scientific community. Pilot and test cases will be studied with special attention to the potential level of generalisation. User requirements will be analysed or confirmed through dialogue with users, and the technical aspects will be evaluated. ESA will actively take part in the selection process.

STEP FIVE: PROTOTYPE DEVELOPMENT
For the three most promising Methods, working prototypes will be implemented. As some of the Methods may never have been tested in the EO field, the evaluation of prototypes working on EO data is an important step in order to acquire specific working knowledge of the new Methods. The prototype implementation will, as far as possible, be based on existing available software libraries and COTS.
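As an illustration of how per-criterion scores such as a)–g) above might be combined into a single rating, a weighted mean can be used. This is a hypothetical sketch only: the paper does not specify an aggregation rule, and the criterion names and weights below are invented for illustration:

```python
def rate_method(scores, weights):
    """Combine per-criterion scores (e.g. 1-5) into one rating using a
    weighted mean; a Method missing any criterion cannot be rated."""
    if set(scores) != set(weights):
        raise ValueError("every criterion needs both a score and a weight")
    total = sum(weights.values())
    return sum(scores[c] * weights[c] for c in scores) / total

# Hypothetical criteria, weights and scores, for illustration only
weights = {"maturity": 2, "novelty": 1, "operational": 2,
           "accuracy": 2, "generality": 1, "automation": 1, "feasibility": 2}
scores = {"maturity": 4, "novelty": 3, "operational": 5,
          "accuracy": 3, "generality": 4, "automation": 5, "feasibility": 4}
rating = rate_method(scores, weights)
```

In practice the ratings would of course be adjusted through the user and ESA dialog described above rather than computed mechanically.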
Criteria for the evaluation of the prototypes will be established as a subset of the criteria for the ranking of Methods in step three. The operational perspectives, the Methods' accuracy, and the level of automation can be tested by running the prototypes on prepared data sets. Based on the combined knowledge of Methods and the users' needs, realistic exploitation scenarios will be described. This includes an assessment of the quality of the information to be provided, along with an assessment of the required human resources and the necessary hard- and software.

METHOD EXAMPLES
In the following section, three examples of candidate Methods are described, and Table 4 shows how these were rated.

ENTROPY-BASED PREDICTION
The purpose is to detect unexpected changes. These could be the possible occurrence of patches of deforestation or changes in agricultural practice (e.g. from fallow to cultivation). The Method may even be used to clean up clouded pixels in a time series. Which events are unexpected is a question of prior knowledge of events in the area. The information may in the simplest case be a direct prediction of change per pixel, but it can equally well be a prediction of a possible new classification scenario. The principle is to collect statistics on the time sequences at each pixel. Shannon (1948) used this Method to quantify information. In a database, an account is made of all occurrences of temporal sequences of events. Shannon showed how it is possible to distinguish between different languages by counting the frequencies of letters in sentences from the language, the frequencies of all pairs of consecutive letters, of all triples of letters, etc. The present Method is a modern implementation of his Method. Shannon showed that his prediction Method would be as precise as a program which knew in advance the transition table of the Markov source that generated the sentences of the language.
Lempel and Ziv (1986) showed how to use this principle efficiently for data compression. The principle in data compression is that the program estimates the probability of each symbol before encoding it. If the symbol occurs often, a short code word is used, and if it occurs only infrequently, not much harm is done by using a long code word. This coding principle is widely used for text compression in modern personal computers. When applied to images, statistics per pixel are collected, either as a result of a classification, a spatial filter, or the quantization of spectral values. The program then decides whether an event is unexpected in the following way. From a query to the database, it gets information about the time and location where a similar previous sequence of events has occurred. In this way it can make use of the prior information about events in the area. There may be several occasions at which similar phenomena have occurred; the program chooses the longest identical sequence of events. Once it has obtained a time and a place where a similar event has happened, it predicts that the next event to occur is the event which took place at the previous time and location where the optimal match was found. A surprising aspect is that even if keeping track of all sequences of all lengths seems a titanic job, there is an algorithm which makes it possible to use only one quantum of computation, independent of how long the sequence of images is.
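The longest-match prediction rule can be sketched naively in plain Python for a single pixel's event history. This quadratic scan is illustrative only; the constant-work-per-image property of the actual Method relies on the suffix-tree data structure:

```python
def predict_next(history):
    """Predict the next event at a pixel: find the longest suffix of
    the history that also occurs earlier, and return the event that
    followed that earlier occurrence."""
    n = len(history)
    for length in range(n - 1, 0, -1):
        suffix = history[n - length:]
        # scan for the most recent earlier occurrence of this suffix
        for start in range(n - length - 1, -1, -1):
            if history[start:start + length] == suffix:
                return history[start + length]
    return None  # no repeated context: no prediction possible
```

For the history a, b, c, a, b, c, a, b the longest repeated suffix is a, b, c, a, b, which was earlier followed by c, so c is predicted; an observed event differing from the prediction is flagged as unexpected.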
This means that, given a sufficiently fast computer, the Method will work in real time. Furthermore, the Method only needs one quantum of storage space for each image it receives; it is not necessary to use ever more space for collecting the statistics. This is equally surprising, since there is no limit on the length of the sequences which the database keeps track of. The data structure used in the database is called a suffix tree, and the algorithm was invented by Weiner (1973) quite some years ago. Few read Weiner's paper; a newer version was proposed by Ukkonen (1992).

NDVI TREND MAP
Time series of NDVI data derived from EO data are used to identify long-term trends in vegetation productivity. NOAA AVHRR data have been used, where the integral of NDVI over the growing period serves as an estimate of the annual Net Primary Production (NPP). This Method has been applied in dry areas (Rigina and Rasmussen, 2003; Tottrup and Rasmussen, submitted) as well as globally (Nemani et al., 2003). Degrading conditions are identified where the integrated NDVI is consistently decreasing. The information is used by ministries of environment in sub-Saharan Africa as a support to the implementation of the UN Rio conventions on desertification and climate change. It serves as reference data on the state of the environment, as well as being used to identify areas subject to specific actions to mitigate consistently degrading environmental conditions.
The Method is based on a few principles. A number of authors have demonstrated the existence of a relationship between NDVI and NPP (Tucker et al., 1986; Prince, 1991; Rasmussen, 1992) using the Light Use Efficiency (LUE) model (Kumar & Monteith, 1981). NDVI can be used to assess the fraction of absorbed photosynthetically active radiation (fAPAR), and the NPP can be calculated as:

NPP = e · fAPAR_NDVI · PAR

where e is the efficiency of transforming energy into dry matter and PAR is the photosynthetically active radiation. fAPAR_NDVI is the fAPAR estimated using NDVI from EO data. PAR can be obtained from EO data or meteorological data, and e can be assumed constant for limited areas and during limited time, or values can be determined as a function of ecosystems and different stress parameters. This is the approach applied for the MODIS NPP product (Justice et al., 1998). From a time series of annual maps of NPP as a function of integrated NDVI, a per-pixel trend map can be computed, where the value of the slope is assigned to each pixel in the image. This will show the overall conditions, and areas with consistently decreasing NPP can be identified. The Method was evaluated and further developed in Senegal, and results showed that areas experiencing soil exhaustion as well as areas where enabling conditions were prevailing were both captured from a time series of 19 years of Pathfinder AVHRR NDVI data (Tottrup & Rasmussen, submitted).
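The per-pixel trend computation can be sketched with NumPy. The constant light-use efficiency, the array shapes and the function name are illustrative assumptions, not the SURF implementation:

```python
import numpy as np

def npp_trend_map(fapar_ndvi, par, e=1.0):
    """Annual NPP per pixel, NPP = e * fAPAR_NDVI * PAR, followed by the
    least-squares slope of NPP against time for every pixel.

    fapar_ndvi : (T, H, W) fAPAR estimated from integrated NDVI, one map per year
    par        : (T, H, W) photosynthetically active radiation
    e          : light-use efficiency, assumed constant here
    """
    npp = e * fapar_ndvi * par
    t = np.arange(npp.shape[0], dtype=float)
    t_c = t - t.mean()                     # centred time axis
    # per-pixel slope: sum_t t_c * (npp_t - mean(npp)) / sum_t t_c^2
    slope = np.tensordot(t_c, npp - npp.mean(axis=0), axes=(0, 0)) / (t_c ** 2).sum()
    return slope                           # (H, W); consistently negative = degrading
```

Pixels with a consistently negative slope would then be flagged as potentially degrading, as described above.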
KNOWLEDGE-BASED CLASSIFICATION OF SAR DATA

The Method can be used to map changes within land cover, land use and vegetation. More specifically, detailed land-cover information can be applied in relation to regional and global general circulation models to calculate the fluxes of water, energy and carbon. A knowledge-based, hierarchical classification scheme has been proposed for land-cover classification into relatively broad classes using Synthetic Aperture Radar (SAR) data. The concept has been demonstrated using airborne and spaceborne polarimetric SAR data (Pierce et al., 1994 and 1998) as well as combinations of SAR data from two satellites at different frequencies (C- and L-band) (Dobson et al., 1995 and 1996). Radar backscatter is sensitive to two types of scene parameters: geometrical and dielectrical properties. The geometrical properties refer to the structure of the vegetation and to the roughness of the bare surface or the surface under the vegetation. For the vegetation, the structure parameters important for radar backscatter are the size distribution, the orientation distribution, and the densities of the vegetation elements (i.e. stems, branches, and leaves). The dielectric properties are determined by the water content, the phase of the water and the densities of the scattering elements. Based on these sensitivities it is possible to discriminate between different types of vegetation with different structural and dielectrical properties using SAR data. The sensitivity of the radar to these parameters varies with radar frequency and polarisation. Hence, it is possible to resolve potential ambiguities in the discrimination between different vegetation classes using multi-frequency and/or polarimetric SAR data. In Dobson et al. (1995 and 1996) and Pierce et al. (1994 and 1998), knowledge about the scattering properties of vegetation as well as electromagnetic model results were used to derive a hierarchical classification scheme.
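A hierarchical scheme of this kind can be sketched as rule-based thresholding. The channel names and dB thresholds below are purely illustrative guesses, not the published values of Dobson et al. or Pierce et al.:

```python
def classify_pixel(db):
    """Hierarchical (Level I / Level II) classification sketch for one pixel.

    `db` maps polarimetric channels (hypothetical keys 'C-HV', 'L-HV')
    to backscatter in dB; every threshold here is an illustrative guess."""
    # Level I: three broad classes
    if db['L-HV'] > -12.0:                 # strong volume scattering at L-band
        level1 = 'tall vegetation'
    elif db['C-HV'] > -18.0:
        level1 = 'low vegetation'
    else:
        level1 = 'surface'
    # Level II: refine within each broad class separately
    if level1 == 'tall vegetation':
        level2 = 'deciduous forest' if db['C-HV'] > -10.0 else 'coniferous forest'
    elif level1 == 'low vegetation':
        level2 = 'crops'
    else:
        level2 = 'bare or smooth surface'
    return level1, level2
```

The point of the hierarchy is that each Level II rule only has to separate classes within one broad Level I class, which keeps the individual decisions simple and robust.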
First, in the so-called Level I classifier the images are classified into 3 very broad classes: surface, low vegetation and tall vegetation. The classification accuracy of this classifier is found to be very good, with accuracies close to 100%. Based on the output of this classification, a Level II classifier is then applied. This classifier may for instance classify the tall vegetation into different forest types, and the low vegetation into different crop types. Classification accuracies for the overall classification scheme of above 90% were normally obtained. The above-mentioned classification scheme is relatively robust in terms of, for instance, time of the year and geographical location, because it is based on knowledge and model results. The classification scheme must, however, to some degree be adapted to the data available and to the requirements for the land-cover classes. After the manual setup of the algorithm, it is very simple and fast.

Table 4: Ratings of three different Methods

Technical Considerations | Entropy based prediction | NDVI Trend Map | Knowledge-based classification of SAR data
Maturity of the Method in terms of technical and EO compliance | ** (Has not been used before on EO images; widely used for text processing.) | **** (Mature and used. Demonstration studies have been made and a number of independent science teams currently work with the Method.) | **** (The Method is robust because it is based on physical modelling and knowledge of backscatter.)
Novelty of the Method | ***** | *** | **
Operational perspectives (timing, effort, computational requirements, cost etc.) | ** (Somewhat uncertain. In principle the Method can be implemented to run in time proportional to the length of the sequence, but it will take some care. A prototype program was run some years ago on 32×32-pixel images with sequences of length 96.) | ***** (Relatively simple Method without any computational complexity.) | ***** (When the classification scheme is set up, the Method is extremely simple and fast, and the computational requirements are small. Good perspectives in handling large data volumes.)
Accuracy of the Method in relation to the specific application | ** (The accuracy cannot be estimated due to the novelty of this Method.) | **** (Must be assessed for the area considered. Alternatives are less accurate and will not cover the entire area being monitored.) | **** (Normally the classification is into broad classes. The actual accuracy depends on the data available (parameters and acquisition time).)
Potential use of the Method in other areas | **** | **** (May be used within different vegetation types such as forest; long-term changes in land surface temperature.) | ***** (Method already used in other applications.)
Perspectives for automation | **** (Very good perspectives.) | ***** (Has been demonstrated.) | **** (The setup of the classification scheme is done manually, and then the classification is fully automatic.)
Feasibility of implementing the Method within SURF (resources) | ** (A prototype could be implemented by the team, but not a real-time version.) | ***** (Most relevant because the Method is data demanding. The team has experience developing and implementing the Method.) | *** (Most of the work is included in the setup of the classification scheme.)

User Considerations | Entropy based prediction (Application: Mapping deforestation) | NDVI Trend Map (Application: Mapping desertification) | Knowledge-based classification of SAR data (Application: Mapping land cover and land use changes)
The user's need for information | ***** (Strong need, especially in countries where conventional statistics are not reliable.) | **** (Needed for monitoring the state of the environment and possible degradation of natural resources.) | **** (Planning and natural resource management.)
The potential provision of information relative to the effort of the user to obtain this information through other sources | ***** (EO data the only feasible information source providing homogeneous data at the regional level.) | **** (EO data the only feasible information source providing homogeneous data at the regional level.) | **** (Homogeneous and repetitive information from EO data.)
User's need for automation | **** (Large data volumes necessitate a high degree of automation.) | **** (Large data volumes necessitate a high degree of automation.) | **** (Assistance to compile the vast database of EO data needed.)

CONCLUSION

A strategy has been presented for a survey and assessment of candidate EO Methods to be implemented in automatic or semi-automatic feature extraction. The thematic areas and subsequent applications where EO data can support users with important information have been listed based on the literature and existing knowledge. These are rated according to the users' potential need for information. Only thematic areas and applications where the user's interest can be justified have been selected for further consideration. For each application, different Methods are listed and a first assessment is made in order to reduce the number of Methods to approximately 20. Examples are presented, including the corresponding ratings. This rating of the Methods is to a large extent subjective; however, it is assessed that it can be used for documentation and a first selection. An alternative would be to use a decision model attributing different weights to the different ratings and using conditional selections. In this case it will be essential to guarantee the independence between the team responsible for the development of the decision model and the team responsible for the rating of the Methods. Furthermore, evaluator bias should be assessed through cross-checks.
The next step in the process will be the in-depth feasibility study of the 20 selected Methods. The results of the SURF project will become publicly available at ESRIN's web pages at earth.esa.int/rtd/Projects mid 2004.

ACKNOWLEDGEMENTS

The SURF Consortium consists of ROVSING A/S, Geographic Resource Analysis and Science Ltd. (GRAS), Technical University of Denmark, Ørsted-DTU Department, and Department of Computer Science, University of Copenhagen. The SURF project is funded by the European Space Agency (ESA), ESRIN contract no. 17127/03/I-OL, with Alessandro Ciarlo as the technical officer for SURF.

REFERENCES

Dobson, M.C., F.T. Ulaby, and L.E. Pierce, 1995, Land-Cover Classification and Estimation of Terrain Attributes Using Synthetic Aperture Radar, Remote Sensing of the Environment, 51, 199-214.
Dobson, M.C., L.E. Pierce, and F.T. Ulaby, 1996, Knowledge-Based Land-Cover Classification Using ERS-1/JERS-1 SAR Composites, IEEE Transactions on Geoscience and Remote Sensing, 34, 83-99.
Justice, C.O., Vermote, E., Townshend, J.R.G., DeFries, R.S., Roy, D.P., Hall, D.K., Salomonson, V.V., Privette, J.L., Riggs, G., Strahler, A.H., et al., 1998, The Moderate Resolution Imaging Spectroradiometer (MODIS): Land Remote Sensing for Global Change Research, IEEE Transactions on Geoscience and Remote Sensing, 36, 1228-49.
Kumar, K., Monteith, J.L., 1981, Remote Sensing of Crop Growth. In Plants and the Daylight Spectrum, edited by Smith, H. (London: Academic Press), pp. 133-44.
Lempel, A. & Ziv, J., 1986, Compression of 2-dimensional data, IEEE Transactions on Information Theory, 32, 2-8.
Nemani, R.R., Keeling, C.D., Hashimoto, H., Jolly, W.M., Piper, S.C., Tucker, C.J., Myneni, R.B., Running, S.W., 2003, Climate-driven increases in global terrestrial net primary production from 1982 to 1999, Science, 300, 1560-3.
Pierce, L.E., F.T. Ulaby, K. Sarabandi, and M.C. Dobson, 1994, Knowledge-Based Classification of Polarimetric SAR Images, IEEE Transactions on Geoscience and Remote Sensing, 32, 1081-1086.
Pierce, L.E., K.M. Bergen, M.C. Dobson, and F.T. Ulaby, 1998, Multitemporal Land-Cover Classification Using SIR-C/X-SAR Imagery, Remote Sensing of the Environment, 64, 20-33.
Prince, S.D., 1991, A model of regional primary production for use with coarse resolution satellite data, International Journal of Remote Sensing, 12, 1313-30.
Rasmussen, M.S., 1992, Assessment of millet yields and production in northern Burkina Faso using integrated NDVI from the AVHRR, International Journal of Remote Sensing, 13, 3431-42.
Rigina, O. and Rasmussen, M.S., 2003, Using trend line and principal component analysis to study vegetation changes in Senegal 1986-1999 from AVHRR NDVI 8 km data, Danish Journal of Geography, 103, 31-42.
Shannon, C.E., 1948, A Mathematical Theory of Communication, Bell Systems Technical Journal, 27, 379-423.
SISCAL, 2003, URL: www.SISCAL.NET, August 2003.
Tottrup, C. and Rasmussen, M.S., submitted, Mapping long-term changes in crop productivity in Senegal through trend analysis of time series of remote sensing data, submitted to Agriculture, Ecosystems and Environment.
Tucker, C.J., Justice, C.O., Prince, S.D., 1986, Monitoring the grasslands of the Sahel 1984-1985, International Journal of Remote Sensing, 7, 1571-81.
Ukkonen, E., 1992, Constructing suffix trees on-line in linear time. In J. v. Leeuwen, editor, Algorithms, Software, Architecture: Information Processing 92, Volume 1, pages 484-492, Elsevier.
Weiner, P., 1973, Linear pattern matching algorithms. In 14th Annual Symposium on Switching and Automata Theory, pages 1-11, The University of Iowa, 15-17 October 1973, IEEE.
AUTOMATIC CHANGE DETECTION FOR VALIDATION OF DIGITAL MAP DATABASES

Brian Pilemann Olsen, Thomas Knudsen
National Survey and Cadastre, Denmark, 8 Rentemestervej, DK-2400 Copenhagen NV, Denmark, {bpo|thk}@kms.dk

KEY WORDS: Photogrammetry, Remote Sensing, Change Detection, Classification, Automation, High Resolution, Infra-red, DEM/DTM

ABSTRACT

In almost all areas of our society there is an increasing need for up-to-date digital map databases. Traditionally, different manual, labour-intensive and hence costly methods have been used for map updating, with the change detection being by far the most complex and expensive part. In this paper an automatic change detection method is presented and evaluated. The method only considers changes in the buildings theme, but it can be extended to other object classes. The aim is the development of an efficient change detection procedure for database maintenance in a production environment. The method is based on classification principles and combines an unsupervised and a supervised classification in order to determine the spectral response of the building class and thus locate potential buildings. The result is refined by a height filter. The method is evaluated on building registrations from the Danish TOP10DK map database. The test case presented in the paper is from a residential suburban area. The method detects almost all changes due to demolished buildings, whereas only a smaller part of the new buildings are detected. This is primarily due to the use of a very special roofing material. The method leads to a number of false alarms, which to a large degree can be eliminated by refinements of the algorithm or by introduction of additional information, e.g. infra-red images or texture measures.

1 INTRODUCTION

The digital topographic map database TOP10DK is the primary topographic product of the National Survey and Cadastre—Denmark (Kort & Matrikelstyrelsen, KMS). The development and update of TOP10DK is based on aerial photogrammetry: one fifth of Denmark is photographed each spring before foliation, resulting in approx. 1200 photos (about 400 GB of image data). Map updating can be carried out by a complete remapping of the area for each revision cycle, but much work can be saved by detecting changes from the previous version of the map database and concentrating on these areas of change. Change detection for topographical mapping is on the other hand not a simple task: although the intention is always to carry out the photo flights at approximately the same time of year, the natural, interannual variations of the vegetation coverage are of a magnitude that hides the (primarily human-generated) changes sought for. Furthermore, it is almost impossible to take the photos at the same geographical position and with the same attitude as in the previous photo campaign. This means that the change detection must be carried out by comparing a new image directly with the existing map database, rather than by a simpler image-to-image comparison. In this paper an automatic procedure for change detection concentrating on buildings, which are important mapping objects, is presented.

As can be seen from figure 1, buildings are often highly diverse when it comes to size, form and spectral signature. They are therefore hard to describe by spectral information only. Adding height information to the process, e.g. in the form of digital surface models, may improve the distinguishing of buildings from other objects having a similar spectral response. When introducing automatic change detection procedures, the aim is to detect at least the same percentage of factual changes as a manual operator is capable of. It is, on the other hand, acceptable if the change detection procedure introduces false alarms, as long as they are few, since they can easily be rejected during the actual 3D object registration.
The next step in the update process, the actual 3D object registration, is not considered here. This subject has recently been treated extensively by Niederöst (2003) and Süweg (2003).

Figure 1: Buildings are typically highly diverse and spectrally ill-defined when considered as a single group. The last image (lower right) is a building as it is represented in a DSM.

1.1 Related work

Other European countries, e.g. Germany, Switzerland, and the Netherlands, have also established and completed digital map databases with national coverage in the past few years. The national mapping agencies in these countries therefore face the same problem as KMS, and projects with similar aims considering automatic or semi-automatic map updating have been established. In Germany the project for updating the ATKIS database focuses on registration of more generic surface types (settlement, grassland, street, water etc.) (Petzold and Walter, 1999; Walter and Fritsch, 2000). The method for change detection uses supervised classification, with training areas automatically generated using the existing registrations in the ATKIS map database (Walter, 2000). Experiments combining multi-spectral images (RGB, colour infra-red, CIR) with height information and reduction of the information to generic surface types have shown that it is possible to perform automatic change detection with a satisfactory accuracy (Petzold and Walter, 1999; Petzold, 2000) (note, however, that the accuracy requirements for ATKIS are somewhat lower than for TOP10DK (Kort & Matrikelstyrelsen, 2001; AVLBD, 1988)). The change detection leads to a "change map" where the generic objects are divided in three classes: no change, possible change and change.
In the Swiss ATOMI project, aerial colour photos, a high resolution Digital Elevation Model (DEM) and a Digital Surface Model (DSM) are used, aiming at the enhancement of the planimetric accuracy of the 2D VECTOR25 database (Eidenbenz et al., 2000; Niederöst, 2003). The surface model is generated by auto-correlation in aerial photographs at the scale of 1:10,000 and is used as the primary data source. Image information (RGB/CIR) is primarily used to discern man-made objects from natural objects (buildings vs. vegetation). Data from the digital multi-spectral camera High Resolution Stereo Camera—Airborne, HRSC-A (Neukum, 1999), are evaluated and used within the Dutch project (Asperen, 1996; Hoffmann et al., 2000). The HRSC-A data set includes high-resolution (15 cm) spectral data (RGB and CIR) and an automatically generated high resolution surface model from stereo matching. A new change detection project within the framework of EuroSDR is about to start up later this year. The emphasis is on development of methods for localising changes in land cover from very high resolution imagery, the integration of change maps in the updating process and finally comparison of different methods for change detection (EuroSDR, 2004).

2 DATA

The change detection procedure presented in section 3 below is evaluated using datasets mainly associated with the development and updating of the Danish TOP10DK topographical map database.

2.1 RGB images

For the establishment and update of TOP10DK, traditional RGB aerial photographs have been used. All images are taken from an altitude of approximately 3800 m, leading to a scale of 1:25,000. Each image covers an area of 6 km by 6 km and has a forward lap of 60 percent and a side lap of 20 percent. As part of the production workflow the photos are scanned at a resolution of 21 µm, leading to 350 MB of data and a spatial pixel resolution of 0.5 m at ground level. The photos were taken as part of a flight campaign in April 2000.
2.2 Digital Surface Model (DSM)

As was described by Knudsen and Olsen (2003), it is very difficult to locate changes in the building layer using single aerial images, and hence only spectral information in combination with size and form considerations. Therefore a high resolution digital surface model (DSM) with a grid size of 1 meter, covering a test area in Lyngby, north of Copenhagen, has been generated to facilitate the building detection. The dataset was collected and made available for these studies by the Danish engineering and mapping company COWI. Data were collected in May-June 2001 using the TopoSys 1 system (Toposys, 2004; Baltsavias, 1999), which records only first returns of the pulse. The expected height accuracy is approximately 0.15 m.

2.3 Digital Map Database

The building theme from TOP10DK has been selected as target for the update procedure. TOP10DK is a fully 3D map database, including 51 object types (building, lake, highway ...) organised in 8 classes (traffic, water ...). The precision of the database is better than 1 meter both horizontally and vertically. For change detection in the building layer, only new buildings larger than 25 m² and changes of building size larger than 10 m² are considered.

3 METHOD

The method presented is a revision of a method described by Olsen et al. (2002) and Knudsen and Olsen (2003). It is based on classification principles, using existing object registrations in the map database as training areas in order to determine the characteristics of the different classes used to search for and build the object model. As it is very difficult to generate an unambiguous object model for buildings using only spectral information, the revised method also incorporates height information in the form of high resolution DSM data, e.g. from LIDAR or photogrammetric auto-correlation, to distinguish objects in terrain from objects above terrain.
3.1 The method step by step

The method, which consists of three steps, preparation, classification, and detection, is outlined in figure 2. Two major assumptions have to be fulfilled for the change detection procedure to be successful: (1) the number of changes in a given class (e.g. building) must be much smaller than the number of objects used to describe the class. This is valid for most urban areas. (2) New objects must share the spectral characteristics of the existing objects used to generate the object model. This is often the case, as only a small number of roofing materials is in common use.

3.1.1 Preparation: The preparation consists of a data fusion step, to bring the data sets into a common reference frame, and a preprocessing step, where various enhancement methods are applied to the data to prepare them for the change detection procedure. Data fusion: as objects from the existing digital map database are to be used as training areas for the determination of the class characteristics, image data (raster) and the map database (vector) must be co-registered. Generally, co-registration can be done either by registration of the image data to the map database or by registration of the map database to the image data. The most used method is registration of image data to the map database. However, this method has the disadvantage that most image data types (aerial photos) have to be re-sampled as rectified images or orthophotos. For the data sets to fit completely to each other, a high precision elevation model including a description of man-made objects (buildings, bridges etc.) must be available (i.e. a Digital Surface Model, DSM). Another way is to project the map database directly onto the other data sets, e.g. onto the aerial images using the basic photogrammetric equations (Kraus, 1993). For this to work, a database with (X, Y, Z) coordinates and the orientation parameters for the aerial images have to be available.
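Projecting a 3D map point into an aerial image rests on the collinearity equations. A minimal sketch, with the interior orientation reduced to a single focal length and all parameter names our own:

```python
import numpy as np

def project_to_image(point, R, X0, f):
    """Project a 3-D ground point into image coordinates via the
    collinearity equations (cf. Kraus, 1993).

    point : (X, Y, Z) ground coordinates
    R     : 3x3 rotation matrix of the exterior orientation
    X0    : projection centre (X0, Y0, Z0)
    f     : focal length (same units as the returned image coordinates)
    """
    d = R @ (np.asarray(point, float) - np.asarray(X0, float))
    # the camera looks down the negative z-axis of its own frame
    x = -f * d[0] / d[2]
    y = -f * d[1] / d[2]
    return x, y
```

With the (X, Y, Z) building vertices from the map database and the images' orientation parameters, this maps each database object directly into pixel coordinates, avoiding any resampling of the imagery.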
This method leads to the most precise co-registration and eliminates any resampling of image data. Preprocessing: various algorithms are applied to the data sets to prepare them for the change detection process. The three most important processes are: (1) calculation of NDVI (Normalised Difference Vegetation Index) images, if colour infra-red (CIR) photos are available; (2) generation of a normalised Digital Surface Model (nDSM); and (3) evaluation of training areas.

NDVI is calculated as NDVI = (ir − red)/(ir + red); NDVI is well suited for distinguishing vegetated areas from man-made objects. An nDSM only includes objects which stand above terrain, and it can be calculated using a Digital Terrain Model (DTM): nDSM = DSM − DTM. If a DTM is not available, it must be estimated from the DSM. A very simple method for DTM estimation using grey tone morphology is described by Weidner and Förstner (1995) and used in these tests. First a minimum filtering of the DSM is performed using a flat structuring element B (with a given size and form). In this way the minimum height in the area determined by the structuring element is assigned to the origin of the structuring element (pixel). This minimum filtering is followed by a maximum filtering, using the same flat structuring element. Performing the two steps in the described order equals a morphological opening, z̄ = z ◦ B, and leads to an estimation or approximation of the topographic surface, the DTM. In order to eliminate all elements above terrain (buildings), the size of the structuring element must be chosen in such a way that it is not completely contained in a building. The size depends on the area to be processed. If a priori information concerning existing building sizes in the area is available, the size can be fixed using this information. In the test presented in this paper the size of B is fixed to 25 m.
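The NDVI and nDSM steps can be sketched with plain NumPy; the window-based min/max filtering below stands in for grey-tone erosion and dilation, with sizes in pixels (matching a 1 m grid):

```python
import numpy as np

def ndvi(ir, red, eps=1e-9):
    """Normalised Difference Vegetation Index, (ir - red) / (ir + red)."""
    return (ir - red) / (ir + red + eps)

def _window_filter(img, size, func):
    """Apply func (np.min or np.max) over a flat size x size structuring element."""
    pad = size // 2
    padded = np.pad(img, pad, mode='edge')
    out = np.empty(img.shape)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = func(padded[i:i + size, j:j + size])
    return out

def ndsm(dsm, size=25):
    """nDSM = DSM - DTM, with the DTM approximated by the grey-tone opening
    z_bar = z o B (minimum filtering followed by maximum filtering with the
    same flat structuring element B), per Weidner and Foerstner (1995)."""
    dtm = _window_filter(_window_filter(dsm, size, np.min), size, np.max)
    return dsm - dtm
```

As long as the structuring element is larger than any building footprint, the opening removes the buildings from the surface, so subtracting the result from the DSM leaves only the objects standing above terrain.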
The process is illustrated in figure 3.

Figure 2: Change detection workflow—cf. section 3 for description.

Figure 3: nDSM creation using an artificial DTM. UL: DSM. UM: estimated DTM, z̄ = z ◦ B. UR: nDSM = DSM − DTM. LL: DSM profile. LM: DTM profile. LR: nDSM profile. All profiles follow the red line in the DSM, DTM and nDSM respectively.

Validation of the training areas is done using the estimated nDSM and/or the NDVI image. An objects-above-terrain mask can be generated using a height threshold (e.g. 2.5 meters) in the nDSM. With this mask, areas registered as buildings in the existing map database which no longer stand above terrain are filtered out. Objects covered by vegetation can be eliminated using the NDVI mask (if available), as they can be detected as areas with NDVI ≥ 0.1. The result of the validation is a refined building mask, holding only the buildings which are most likely still buildings.

3.1.2 Classification: The first step in the classification part is to perform a clustering process. As stated by e.g. Kressler and Steinnocher (1996), some classes (e.g. buildings) have to be subdivided into more unique subclasses, as they are spectrally highly diverse. This task is handled by splitting up the group of pixels registered as buildings by the building mask into smaller and more unique sub-classes using a simple migrating-means clustering process. The algorithm is based on the ISODATA algorithm (Ball and Hall, 1965), and the number of sub-classes is automatically determined by the algorithm in order to make a best fit to the input dataset. The clustering process is followed by an actual classification. The sub-classes, which are spectrally more uniform than the base building class, are used (either alone or in combination with other class descriptions, e.g. water, roads, forest, grassland etc.) to perform a Mahalanobis classification of the entire image.
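The Mahalanobis assignment, including a garbage class for pixels far from every class, might be sketched as follows (the threshold value and the per-class statistics are illustrative):

```python
import numpy as np

def mahalanobis_classify(pixels, class_stats, threshold=3.0):
    """Assign each pixel to the class with the smallest Mahalanobis distance;
    pixels farther than `threshold` (in distance units) from every class
    fall into a garbage class, labelled -1.

    pixels      : (N, B) array of B-band pixel values
    class_stats : list of (mean, covariance) pairs, one per (sub-)class
    """
    dist2 = np.empty((pixels.shape[0], len(class_stats)))
    for c, (mu, cov) in enumerate(class_stats):
        diff = pixels - np.asarray(mu)
        inv = np.linalg.inv(cov)
        # squared Mahalanobis distance, vectorised over pixels
        dist2[:, c] = np.einsum('nb,bc,nc->n', diff, inv, diff)
    labels = dist2.argmin(axis=1)
    labels[dist2.min(axis=1) > threshold ** 2] = -1   # garbage class
    return labels
```

Using the covariance of each sub-class makes the distance scale with that class's own spectral spread, which is what lets the spectrally tight sub-classes produced by the clustering compete on equal terms.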
This causes all pixels in the image to be assigned to the class having the smallest Mahalanobis distance from the pixel value to the class (Richards and Jia, 1999). Threshold values, all depending on the class characteristics, are used to assign pixels with a distance too far from the closest class to a garbage class. The two successive steps are run a number of times as part of an iteration process. This is done mainly because the result of the ISODATA algorithm is strongly dependent on the position of the initial cluster centres. After this iteration process it is possible to accept pixels identified as buildings a specific number of times. In the case study presented below, all pixels which are classified as a building one or more times are considered to be a "building", leading to an image holding pixels with values of either zero or one: zero indicates no building and one indicates a potential building pixel. Using the nDSM, the image holding potential buildings is filtered in order to extract only objects (pixels) which stand above terrain.

3.1.3 Change detection: First a change map is computed by a pixel-by-pixel comparison of the existing map database (in a raster version) to the classification result. Since the change map includes all potential changes in the building layer, it includes noise in the form of single pixels, and some false alarms due to misclassification. The single pixels are removed using morphological opening. The remaining change pixels are segmented, and pixel clusters smaller than the detection requirement (e.g. 25 m² = 25 pixels in the TOP10DK case) and/or not fulfilling the size and shape specifications for buildings are removed from the dataset, leading to a reduction of the false alarms and the final change map.

4 CASE STUDY

The procedure is tested on the data described in section 2. The latest update of the TOP10DK database was carried out five years before the photos were taken. (E, N) coordinates are given in UTM zone 32.
4.1 Test area

The test area used for evaluation is situated in Kgs. Lyngby, a suburb 15 km north of Copenhagen. The area contains many different types of buildings and houses, since it includes a small industrial area, a cemetery, a church, a small train station, large strip buildings, and a gasoline station. The looks and shapes of the buildings as well as their heights differ a lot. Vegetated areas take up a large part of the area, and since the area also includes a highway, two bridges (one for pedestrians and one for cars) and a railroad, this causes a very special terrain structure. The area is also characterized by the fact that many changes have taken place since the establishment of the TOP10DK database. Area: approximately 700 m × 500 m. Lower left corner (E, N) = (718450, 6187050). Upper right corner (E, N) = (719150, 6187550). All images (RGB, TOP10DK and DSM) are 500 rows by 700 columns, and approximately 70 buildings are included in the area. 72 registrations are included in the existing map database. 12 new houses have been built since the last revision and 14 have been demolished.

4.2 Test data

All datasets are subsamples from larger datasets, and they are brought into the same geographical reference by orthorectification of the aerial photographs using an existing digital terrain model (DTM) with a grid spacing of 20 m.

4.3 Results

The results are visualised in figure 4. The first image shows the RGB image with the existing map database superimposed in yellow. It can be seen that a lot of development has taken place since the last revision of the map database. This is most pronounced in the right part of the image, where 7 buildings have been demolished and 10 new buildings with blue roofing material have been built. The second image from the top shows the result after the classification step. White pixels indicate potential buildings, and as can be seen, large areas are misclassified: vegetation and roads are classified as potential buildings. The third image shows the result after the height filtering. It can be seen that all roads are now removed. Some vegetated areas still remain as potential buildings, though. The last image shows the RGB image with the changes found by the automatic change detection algorithm superimposed in yellow. The results are summarised in table 1.

Table 1: Statistical results

                      Factual  Detected
Demolished buildings    14       12
New buildings           12        2
Changes                 26       14
False alarms             -       45

Approximately 50 percent of the factual changes in the test area have been detected by the algorithm.

5 DISCUSSION

Most success is experienced in the group of demolished buildings, where only two demolished buildings have not been detected. This is caused by the method used for change detection, where the detected buildings are compared directly to the existing registrations on a pixel-wise basis. The two demolished buildings which are not detected are positioned in the upper right area of the test area (marked by a red circle). As can be seen, a new building has been built at exactly the same position as the two old buildings. Due to the comparison method, neither the new nor the demolished buildings are "highlighted" by the algorithm. Only two new buildings have been detected by the algorithm. The reason for the poor result is that 9 out of the twelve new buildings have completely different spectral responses than the existing buildings in the area, as they are either blue (6 buildings) or still not finished (3). As one of the two hypotheses regarding the change detection procedure is not fulfilled, the algorithm is expected to fail. 45 false alarms (3 times the factual changes found) are generated.
Of those, 26 are located in vegetated areas, 3 are bridges (above terrain), and 16 are caused by existing buildings which apparently have not been re-detected by the algorithm. If infra-red images are available, the majority of the false alarms can be eliminated using the NDVI, or by calculation of textural figures using the DSM, as it can be expected that the texture of forested areas differs from that of buildings. The 3 false alarms caused by bridges can only be eliminated by the use of a more “clever” algorithm for nDSM generation. Looking more into the false alarms caused by buildings not re-detected, it can be seen that a large proportion of those buildings are actually detected (figure 4, examples marked by green circles), but these detections are eliminated in a later stage of the change detection algorithm, as part of the noise reduction. Refining the noise reduction method may lead to more existing buildings being “re-detected”. One of the false alarms (shown by a white circle in the upper right corner of the area) is a factual difference but not a change, since it is a roof covering a gasoline station, and such roofs are not to be registered in the TOP10DK database according to the map specification. Such false alarms can only be verified by a human operator.

6 CONCLUSION

The method presented shows reasonable performance when detecting demolished buildings (12 out of 14 are detected), whereas the number of new buildings detected is poor (only 2 out of 12). There is a reason for this: the new buildings do not share the spectral response of the existing buildings in the area, so one of the hypotheses of the algorithm is not fulfilled. The algorithm introduces a fairly large number of false alarms (3 times the number of factual changes detected). Most of these false alarms can be eliminated by refining some of the processing steps in the algorithm (noise reduction, DSM generation) or by introduction of additional information, e.g. infra-red images or textural measures.
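The NDVI-based suppression of vegetation false alarms proposed in the discussion above could, given a near-infrared band, look roughly like this. The threshold is an illustrative assumption, not a value from the paper.

```python
import numpy as np

def vegetation_mask(nir, red, threshold=0.3):
    """Flag vegetated pixels via the Normalised Difference Vegetation
    Index, NDVI = (NIR - R) / (NIR + R); change-map pixels under the
    mask could then be discarded as likely vegetation false alarms."""
    ndvi = (nir - red) / (nir + red + 1e-9)
    return ndvi > threshold
```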
Acknowledgements: We would like to thank the engineering and mapping company COWI for letting us use their Digital Surface Model.

REFERENCES

Asperen, P. V., 1996. Digital updates of the Dutch topographic service. In: International Archives of Photogrammetry and Remote Sensing 31(B4), ISPRS, pp. 891–900.

AVLBD, 1988. Amtliches topographisch-kartographisches Informationssystem (ATKIS). http://www.adv.online.de/english/products/atkis.htm.

Ball, G. H. and Hall, D. J., 1965. A novel method of data analysis and pattern classification. Technical report, Stanford Research Institute, Menlo Park, CA.

Baltsavias, E. P., 1999. Airborne laser scanner: existing systems and firms and other resources. ISPRS Journal of Photogrammetry and Remote Sensing 54(2-3), pp. 164–198.

Eidenbenz, C., Kaeser, C. and Baltsavias, E., 2000. ATOMI – automated reconstruction of topographic objects from aerial images using vectorized map information. In: International Archives of Photogrammetry and Remote Sensing, Vol. XXXIII (B3), ISPRS, pp. 462–471.

[Figure 4: Results from the change detection algorithm.]

EuroSDR, 2004. EuroSDR – Change detection project for consideration. http://www.eurosdr.org/2002/research/ResPoss.asp?ResPosID=27&#pos.

Hoffmann, A., van der Vegt, J. W. and Lehmann, F., 2000. Towards automated map updating: is it feasible with new data acquisition and processing techniques? In: International Archives of Photogrammetry and Remote Sensing, Vol. XXXIII (B2), ISPRS, pp. 295–302.

Knudsen, T. and Olsen, B. P., 2003. Automated change detection for updates of digital map databases. Photogrammetric Engineering & Remote Sensing 69(11), pp. 1289–1296.

Kort & Matrikelstyrelsen, 2001. TOP10DK. Geometrisk registrering. Technical report, National Survey and Cadastre—Denmark, Rentemestervej 8, 2400 København NV, Denmark.

Kraus, K., 1993. Photogrammetry. Vol. 1, 4 edn, Dümmler, Bonn, Germany.

Kressler, F. and Steinnocher, K., 1996.
Change detection in urban areas using satellite images and spectral mixture analysis. In: International Archives of Remote Sensing and Photogrammetry, Vol. 31, number B7, ISPRS, pp. 379–383.

Neukum, G., 1999. The airborne HRSC-A: Performance results and application potential. Photogrammetrische Woche, pp. 83–88.

Niederöst, M., 2003. Detection and Reconstruction of Buildings for Automated Map Updating. PhD thesis, Institut für Geodäsie und Photogrammetrie, Eidgenössische Technische Hochschule, 8093 Zürich.

Olsen, B. P., Knudsen, T. and Frederiksen, P., 2002. Digital change detection for map database update. In: J. Chen and J. Jiang (eds), Integrated Systems for Spatial Data Production, Custodian and Decision Support, Vol. XXXIV (2), ISPRS, pp. 357–363.

Petzold, B., 2000. Revision of topographic databases by satellite images – experiences and expectations. pp. 15–23.

Petzold, B. and Walter, V., 1999. Revision of topographic databases by satellite images. In: M. Schroeder, K. Jacobsen, G. Konechy and C. Heipke (eds), Sensors and mapping from space 1999, ISPRS, Hanover, Germany, p. 9.

Richards, J. A. and Jia, X., 1999. Remote Sensing and Digital Image Analysis: an introduction. 3 edn, Springer, Berlin, Germany.

Süweg, I., 2003. Reconstruction of 3D Building Models from Aerial Images and Maps. Publications on Geodesy, Netherlands Geodetic Commission, Delft, the Netherlands.

Toposys, 2004. Toposys1 specification. http://www.toposys.de/.

Walter, V., 2000. Automatic change detection in GIS databases based on classification of multispectral data. In: International Archives of Photogrammetry and Remote Sensing, Vol. XXXIII (B4), ISPRS, pp. 1138–1145.

Walter, V. and Fritsch, D., 2000. Automated revision of GIS databases. In: L. K.-J. et al. (ed.), Proceedings of the Eighth ACM Symposium on Advances in Geographic Information Systems, ACM, pp. 129–134.

Weidner, U. and Förstner, W., 1995. Towards automatic building extraction from high resolution digital elevation models.
ISPRS Journal of Photogrammetry & Remote Sensing 50(4), pp. 38–49.

Spatial-temporal Disambiguation of Multi-modal Descriptors

Norbert Krüger (1), Florentin Wörgötter (2)
(1) Computer Science, Aalborg University Esbjerg, 6705 Esbjerg, [email protected]
(2) Computational Neuroscience, University of Stirling, Stirling FK9 4LA, Scotland, UK, [email protected]

Abstract

In this paper, we describe a new kind of visual representation in terms of local multi-modal Primitives and its role as initiator of a disambiguation process based on regularities in visual data. Our Primitives can be characterized by three properties: (1) they describe visual information by different modalities; (2) they are essentially adaptable according to the spatial and temporal context; (3) they give a condensed representation of local image structure that makes use of meaningful attributes. Our Primitives are motivated by human visual processing: they are functional abstractions of hypercolumns in visual cortex [1, 2, 3]. The efficient and generic coding of visual information allows for a wide range of applications. For example, they have been used to investigate the multi-modal character of Gestalt laws in natural scenes [4], for multi-modal stereo matching, and to investigate the role of different visual modalities for stereo [5] and in a combination of grouping and stereo [6]. In this paper, we use the regularity of Rigid Body Motion to acquire reliable 3D information in a spatial-temporal disambiguation process.

1 Introduction

The aim of this work is to compute reliable feature maps from natural scenes. We believe that to establish artificial systems that perform reliable actions we need such reliable features. These can only be computed through integration across the spatial and temporal context and across visual modalities, since local feature extraction is necessarily ambiguous [7, 8].
The European Project ECOVISION [9] focuses exactly on this issue, and the work described here is a central pillar of this ongoing project. In this paper, we describe a new kind of image representation in terms of local multi-modal Primitives (see fig. 1). These Primitives can be characterized by three properties:

Multi-modality: Different domains that describe different kinds of structures in visual data are well established in human vision and computer vision. For example, a local edge can be analyzed by local feature attributes such as orientation or energy in certain frequency bands. In addition, we can distinguish between line and step-edge like structures (contrast transition). Furthermore, color can be associated to the edge. This image patch also changes in time due to ego-motion or object motion.

[Figure 1: A: Image sequence and frame. B: Schematic representation of the multi-modal Primitives. C: Extracted Primitives.]

Therefore, time-specific features such as a 2D velocity vector (optic flow) can be associated to this image patch. Finally, by using stereo, information about the 3D origin of an image patch can be computed. In this work we define local multi-modal Primitives that realize these multi-modal relations. The modalities, in addition to the usually applied semantic parameters position and orientation, are contrast transition, color and optic flow (see fig. 1).

Condensation: Integration of information requires communication between Primitives expressing spatial [4, 5] and temporal dependencies [10]. This communication necessarily has to be paid for with a certain cost. This cost can be reduced by limiting the amount of information transferred from one place to the other, i.e., by reducing the bandwidth. Therefore we are after a compression of data. Essentially we need less than 5% of the pixel data of a local image patch to code a Primitive that represents such a patch.
However, condensation means more than compression of data, since communication and memorization require more than a reduction of information: we want to reduce the amount of information within an image patch while preserving perceptually relevant information. This leads to meaningful descriptors such as our attributes position, orientation, contrast transition, color and optic flow. In [4], we have also shown that these descriptors (in particular when jointly applied) allow for strong mutual prediction that can be related to classical Gestalt laws.

Adaptability: Since the interpretation of local image patches in terms of the above-mentioned attributes, as well as classifications such as ‘edgeness’ or ‘junctionness’, is necessarily ambiguous when based on local processing, stable interpretations can only be achieved through integration by making use of contextual information [7]. Therefore, all attributes of our Primitives are equipped with a confidence that is essentially adaptable according to contextual information, expressing the reliability of this attribute. Furthermore, the feature attributes themselves adapt according to the context (see section 3). Adaptation occurs by means of recurrent processes in which predictions based on statistical and deterministic regularities disambiguate the locally extracted and therefore necessarily ambiguous data. The instantiation of these processes becomes feasible because of the condensed representation, which leads to a manageable amount of meaningful higher order relations of visual events.

In section 2, we describe the Primitive attributes and their extraction. In section 3, we refer to applications of our Primitives for contextual integration, where we especially focus on spatial-temporal disambiguation.

2 Multi-modal Primitives

In addition to the position x, we compute the following semantic attributes and associate them to our Primitives (see also fig. 1).
Intrinsic Dimension: Local patches in natural images can be associated to specific local sub-structures, such as homogeneous patches, edges, corners, or textures. Over the last decades, sub-domains of Computer Vision have extracted and analysed such sub-structures. The intrinsic dimension (see, e.g., [11]) has proven to be a suitable descriptor that distinguishes such sub-structures. Homogeneous image patches have an intrinsic dimension of zero (i0D); edge-like structures are intrinsically 1-dimensional (i1D), while junctions and most textures have an intrinsic dimension of two (i2D). A continuous definition of intrinsic dimensionality has been given in [12, 13]. There, it has also been shown that the topological structure of intrinsic dimension essentially has the form of a triangle. This triangular structure can be used to associate three confidences (ci0D, ci1D, ci2D) to homogeneous-ness, edge-ness, or junction-ness (see also [14]). This association of confidences to visual attributes is a general design principle in our system. These confidences, as well as the attributes themselves, are subject to contextual integration via recurrent processes. Aspects with low associated confidences have a minor influence in the recurrent processes or can be disregarded.

Orientation: The local orientation associated to the image patch is described by θ. The computation of the orientation θ is based on a rotation invariant quadrature filter, which is derived from the concept of the monogenic signal [15]. Considered in polar coordinates, the monogenic signal performs a split of identity [15]: it decomposes an intrinsically one-dimensional signal into intensity information (amplitude), orientation information, and phase information (contrast transition). These features are pointwise mutually orthogonal.
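Returning to the confidence triple (ci0D, ci1D, ci2D) above, the following sketch derives soft confidences from the eigenvalues of a local structure tensor. This is a simplified stand-in, not the continuous triangle-based formulation of [12, 13]; the smoothing constant and the normalization are illustrative assumptions.

```python
import numpy as np

def id_confidences(l1, l2, s=1.0):
    """Soft confidences for homogeneous-ness (i0D), edge-ness (i1D)
    and junction-ness (i2D), from structure-tensor eigenvalues
    l1 >= l2 >= 0 of a local image patch."""
    e = l1 + l2 + 1e-9
    c0 = np.exp(-e / s)   # hardly any variation: homogeneous
    c1 = (l1 - l2) / e    # one dominant direction: edge-like
    c2 = l2 / e           # variation in both directions: junction-like
    z = c0 + c1 + c2
    return c0 / z, c1 / z, c2 / z
```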
The intensity information can be interpreted as an indicator for the likelihood of the presence of a certain structure with a certain orientation and a certain contrast transition (phase, see below). Orientation estimation is further improved by interpolating across the orientation information of the whole image patch to achieve a more reliable estimate.

Contrast transition: The contrast transition is coded in the phase ϕ of the applied filter [15]. The phase codes the local symmetry; for example, a bright line on a dark background has phase 0, while a bright/dark edge has phase −π/2 (see fig. 2, left).

[Figure 2: Left: Different grey level structures (such as line and step-edge structures) can be associated to different phase values. Right: For line-like structures the colour on the line is important, which is irrelevant for step edges. Consequently there is a different coding for these different structures: the line indicating the border of the street is first coded as two step-edges and then, with greater distance, as a line.]

Therefore, the line that marks the border of the street is represented as a line or as two edges depending on the distance (see fig. 2, right). In the case of object boundaries, the phase represents a description of the transition between object and background.

Color: Color (cl, cm, cr) is processed by integrating over image patches in coincidence with their edge structure (i.e., integrating separately over the left and right side of the edge, as well as a middle strip in case of a line structure). In case of a boundary edge of a moving object, at least the color at one side of the edge is expected to be stable, since (in contrast to the phase) it represents a description of the object. Note that the coding of colour depends on the phase: in case of a line structure (i.e., ϕ ≈ 0 or ϕ ≈ π), besides the colour on the left and right side, also a colour for the middle stripe is shown.

Optic Flow: Optic flow is associated to a primitive by a vector o.
There exists a large variety of algorithms that compute the local displacement in image sequences; [16] divided them into four classes: differential techniques, region-based matching, energy-based methods and phase-based techniques. After some comparison we decided to use the well-known optic flow technique of [17]. This algorithm is a differential technique in which, in addition to the standard gradient constraint equation, an anisotropic smoothing term is supposed to lead to better flow estimation at edges (for details see [17, 16, 18]).

Stereo: An image patch also describes a certain region of 3D space, and therefore 3D attributes such as a 3D position and a 3D direction can be associated (in the following called 3D-Primitives). In [5] we have defined a stereo similarity function that makes use of multiple modalities to enhance matching performance.

[Figure 3: a) Left and right image from a stereo frame. b) Front view of extracted spatial primitives (note that this representation has been extracted over multiple frames). The spatial primitives carry, besides the actual depth value computable from the disparity, also information about 3D orientation, phase and colour. c) Side view of the representation. The structure of the street as well as the two trees are clearly visible.]

We could show that it is the joint use of different modalities that gives the best results, and we could compute weights for the importance of the different modalities [5]. The stereo information is coded by the 3D position and the 3D orientation. Note that the result of the primitive representation is actually not only a disparity map: it is a representation in which the ‘symbols’ carry, besides the depth information, also information about other semantic aspects (see figure 3b,c). We end up with a parametric description of a Primitive as E = (x, θ, ϕ, (cl, cm, cr), o, (ci0D, ci1D, ci2D)).
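The parametric description E above can be counted out in a small sketch. The field layout is an illustrative assumption (2 position, 1 orientation, 1 phase, 3 × 3 colour, 2 flow and 3 intrinsic-dimension values), giving fewer than 20 parameters versus 3 × 12 × 12 = 432 raw values for a 12 × 12 colour patch, consistent with the condensation figure stated below.

```python
from dataclasses import dataclass
from typing import Tuple

RGB = Tuple[float, float, float]

@dataclass
class Primitive:
    """Sketch of the parametric description
    E = (x, theta, phi, (cl, cm, cr), o, (ci0D, ci1D, ci2D))."""
    x: Tuple[float, float]               # 2D image position
    theta: float                         # local orientation
    phi: float                           # phase (contrast transition)
    colors: Tuple[RGB, RGB, RGB]         # left / middle / right colour
    o: Tuple[float, float]               # optic-flow vector
    id_conf: Tuple[float, float, float]  # (ci0D, ci1D, ci2D)

N_PARAMS = 2 + 1 + 1 + 3 * 3 + 2 + 3      # 18 parameters in total
N_PIXELS = 3 * 12 * 12                    # 432 raw pixel values
CONDENSATION = 1.0 - N_PARAMS / N_PIXELS  # fraction of data removed
```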
In addition, there exist confidences ci, i ∈ {ϕ, cl, cm, cr, o}, that code the reliability of the specific sub-aspects and that are also subject to contextual adaptation. In addition, information about the underlying 3D structure is associated to a primitive. Although a usual image patch that is represented by our Primitives has a dimension of 3 × 12 × 12 = 432 values (3 color values for each pixel in a 12 × 12 patch), the output of our Primitives has less than 20 parameters. Therefore, the Primitives condense the image information by more than 95%. This condensation is a crucial property of our Primitives that allows us to represent meaningful information in a directly accessible and compressed way.

[Figure 4: a) Accumulation scheme for spatial-temporal integration. b) Top view of the scene in figure 3a after applying the accumulation scheme and elimination of primitives by thresholding over confidences. c) Accumulated representation projected to an image (white line segments represent reliable primitives while dark ones represent unreliable ones). d) Top view as in b) but without thresholding.]

The visual modalities coded in our Primitives are processed in early stages of visual processing. Hubel and Wiesel investigated the structure of the first stage of cortical processing, located in an area called ‘striate cortex’ [1, 2]. The striate cortex is organized in a retinotopic map that has a specific, repetitively occurring pattern of sub-structures called hyper-columns. Hyper-columns themselves contain so-called orientation columns and blobs. However, it is not only orientation that is processed in an orientation column: the cells are also sensitive to additional attributes such as disparity, color, contrast transition and local motion (see [19]). Also, specific responses to junction-like structures could be measured [20].
Therefore, it is believed that in V1 basic local feature descriptions are processed, similar to the feature attributes coded in our Primitives.

3 Integration of Contextual Information and Conclusion

Early visual processing is not only local image processing. As mentioned above, extensive communication occurs within visual brain areas as well as across these areas. In this communication process the local, necessarily ambiguous data becomes disambiguated in recurrent processes that are based on regularities in visual data. In this sense the Primitives must be understood as initiators of a disambiguation process that makes use of contextual knowledge. We now briefly point to the application of the Primitives in such processes.

We have used this image representation in different contexts. Firstly, the Primitives can be subject to a purely spatial contextual modification: we define links between Primitives based on a statistical criterion in [4], and in [6] we apply this linkage structure to improve stereo processing by demanding correspondences that preserve links across groups in the left and right image. Secondly, we stabilize features according to the temporal context. By making use of the motion of an object to predict feature occurrences across frames, we can stabilize stereo processing by modifying the confidences according to the temporal context. For example, in figure 4 the regularity Rigid Body Motion has been used to code predictions of 3D-Primitive occurrences across frames. Schematically, the method depicted in figure 4a is applied: assuming a certain occurrence of a primitive in the first frame and having an estimate of the underlying 3D structure, we are able, based on the motion between frames, to predict the occurrence of the primitive in the next temporal frame. In case our prediction becomes verified, we increase the confidence associated to the primitive (see figure 4b).
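The accumulation step just described can be sketched as follows: a 3D-Primitive is transported by the rigid body motion (rotation R, translation t), and its confidence is raised or lowered depending on whether the prediction is verified in the next frame. The gain and verification radius are illustrative assumptions, not values from the paper.

```python
import numpy as np

def predict(X, R, t):
    """Predicted 3D position of a primitive in the next frame under
    a rigid body motion with rotation R and translation t."""
    return R @ X + t

def accumulate(conf, predicted, observed, radius=0.5, gain=0.1):
    """Raise the confidence if an observed primitive verifies the
    prediction, lower it otherwise (clamped to [0, 1])."""
    verified = (observed is not None
                and np.linalg.norm(predicted - observed) < radius)
    return min(1.0, conf + gain) if verified else max(0.0, conf - gain)
```

Iterating this over several frames drives confidences of consistently re-observed primitives towards 1 and those of wrong stereo correspondences towards 0, which matches the thresholding behaviour shown in figure 4.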
After a couple of iterations the reliable estimates can be easily detected by their associated confidences. Note that we can also improve the estimates of the parameters by interpolating between our prediction and the actually found feature, as schematically shown in figure 4a. Wrong 3D primitives lead to predictions that cannot be verified in consecutive frames, which then results in a lowering of the associated confidences (shown in dark colour in figure 4c). After a couple of frames the 3D-Primitives caused by wrong correspondences can be easily detected by their low confidences. While in figure 4d all hypotheses in the process are shown (in a top view of the scene), in figure 4b only Primitives that have been proven to be reliable are displayed. Accordingly, in figure 4, the general structure of the scene is clearly visible.

4 Conclusion

We have introduced a novel kind of image representation in terms of visual Primitives. These Primitives are multi-modal and give a dense and meaningful description of a scene. Our Primitives are used as a first stage in an artificial visual system: confidences associated to our Primitive attributes adapt according to the spatial and temporal context and in this way stabilize the locally unreliable feature extraction.

References

[1] D.H. Hubel and T.N. Wiesel, “Receptive fields, binocular interaction and functional architecture in the cat's visual cortex,” J. Physiology, vol. 160, pp. 106–154, 1962.
[2] D.H. Hubel and T.N. Wiesel, “Anatomical demonstration of columns in the monkey striate cortex,” Nature, vol. 221, pp. 747–750, 1969.
[3] N. Krüger, M. Lappe, and F. Wörgötter, “Biologically motivated multi-modal processing of visual primitives,” The Interdisciplinary Journal of Artificial Intelligence and the Simulation of Behaviour, vol. 1, no. 5, 2004.
[4] N. Krüger and F. Wörgötter, “Multi modal estimation of collinearity and parallelism in natural image sequences,” Network: Computation in Neural Systems, vol. 13, pp. 553–576, 2002.
[5] N. Krüger and M. Felsberg, “An explicit and compact coding of geometric and structural information applied to stereo matching,” Pattern Recognition Letters, vol. 25, no. 8, pp. 849–863, 2004.
[6] N. Pugeault, F. Wörgötter, and N. Krüger, “A non-local stereo similarity based on collinear groups,” Fourth International ICSC Symposium on Engineering of Intelligent Systems, 2004.
[7] Y. Aloimonos and D. Shulman, Integration of Visual Modules — An extension of the Marr Paradigm, Academic Press, London, 1989.
[8] N. Krüger and F. Wörgötter, “Statistical and deterministic regularities: Utilisation of motion and grouping in biological and artificial visual systems,” Advances in Imaging and Electron Physics, vol. 131, 2004.
[9] ECOVISION, “Artificial visual systems based on early-cognitive cortical processing (EU Project),” http://www.pspc.dibe.unige.it/ecovision/project.html, 2003.
[10] N. Krüger, M. Ackermann, and G. Sommer, “Accumulation of object representations utilizing interaction of robot action and perception,” Knowledge Based Systems, vol. 15, pp. 111–118, 2002.
[11] C. Zetzsche and E. Barth, “Fundamental limits of linear filters in the visual processing of two dimensional signals,” Vision Research, vol. 30, 1990.
[12] N. Krüger and M. Felsberg, “A continuous formulation of intrinsic dimension,” Proceedings of the British Machine Vision Conference, 2003.
[13] M. Felsberg and N. Krüger, “A probabilistic definition of intrinsic dimensionality for images,” Pattern Recognition, 24th DAGM Symposium, 2003.
[14] S. Kalkan, D. Calow, M. Felsberg, F. Wörgötter, M. Lappe, and N. Krüger, “Optic flow statistics and intrinsic dimensionality,” Proceedings of the ‘Early Cognitive Vision Workshop’ on the Isle of Skye (Scotland), 2004.
[15] M. Felsberg and G. Sommer, “The monogenic signal,” IEEE Transactions on Signal Processing, vol. 49, no. 12, pp. 3136–3144, December 2001.
[16] J.L. Barron, D.J. Fleet, and S.S.
Beauchemin, “Performance of optical flow techniques,” International Journal of Computer Vision, vol. 12, no. 1, pp. 43–77, 1994.
[17] H.-H. Nagel and W. Enkelmann, “An investigation of smoothness constraints for the estimation of displacement vector fields from image sequences,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 8, pp. 565–593, 1986.
[18] T. Brox, A. Bruhn, N. Papenberg, and J. Weickert, “High accuracy optical flow estimates based on a theory for warping,” Proceedings of the ECCV 2004, vol. ?, pp. ??, 2004.
[19] R.H. Wurtz and E.R. Kandel, “Perception of motion, depth and form,” in Principles of Neural Science (4th edition), E.R. Kandel, J.H. Schwartz, and T.M. Jessell, Eds., pp. 548–571, 2000.
[20] I.A. Shevelev, N.A. Lazareva, A.S. Tikhomirov, and G.A. Sharev, “Sensitivity to cross–like figures in the cat striate neurons,” Neuroscience, vol. 61, pp. 965–973, 1995.
