# Motion and Structure. Application to Feature-Oriented Coding
As mentioned in Chapters 1 and 3, the apparent motion vector field observed within an image sequence, and the associated estimation methods, result from the projection of 3-D objects and of their 3-D motion onto the 2-D image plane. This projection operation, perspective or orthogonal depending on the projection system selected, creates ambiguities concerning the apparent 2-D motions perceived and, in addition, does not generate a compact representation of the motion information itself. In fact, if we take the example of a rigid 3-D body undergoing 3-D motion, this motion is wholly specified by a small number of parameters (generally six degrees of freedom) through the kinematic screw (translation + rotation) associated with the object and referenced in relation to an absolute fixed reference frame. This same 3-D motion observed through the 2-D apparent motion vector field is, on the other hand, much more complex to analyze and to represent. A more compact representation and a more effective estimation of complex motions which are not purely translational parallel to the image plane constitute the two essential arguments in favour of higher-level modelling of the motions and structures of the objects manipulated. All of the motion estimation techniques detailed in the preceding chapters limited themselves:
- to a local estimation by pixel, for which the representation of motion by its apparent motion vector $(\dot{x}, \dot{y})^t = (\frac{dx}{dt}, \frac{dy}{dt})^t = (u, v)^t$ (two translational components) is adequate. Clearly, it is impossible to talk about rotational motion of an object restricted to one pixel.
- to a global estimation of a translation vector $(u, v)^t$ by block (block matching) or by region. This representation of the apparent motion field only makes it possible to model and identify, per object (region, block, ...), a constant and purely translational motion parallel to the image plane, which constitutes a very restrictive class of the 3-D motions of an actual natural scene. Let us recall that in the case of sensor motions which are not purely translational parallel to the image plane, which is often the case in televisual scenes (tilt, panning, translations parallel to the optical axis, ...), the apparent motion vector field cannot be correctly represented on regions or blocks by a simple 2-D translation.
As far as modelling and identification of 3-D motion parameters are concerned, there are several possibilities. Firstly (Section 8.1), we recall the geometrical relations between 3-D motions, 3-D structures (i.e., the 3-D geometry of objects) and apparent 2-D motions in the case of the perspective projective system. The particular cases of the description of objects by planar facets and of low-order parametrized approximations of motion vector fields (1st order: affine models; 2nd order: quadratic models) are detailed in particular.
As far as the resolution methods and the application frameworks envisaged are concerned, we will present separately:
- the monocular case, where a unique sensor (possibly moving) perceives the dynamic scene and, through spatio-temporal observations, tries to recover both the motion information and that concerning the structure of objects. The applications within coding schemes concern compression methods ("second" generation with very low rates) or techniques of analysis/synthesis by extraction of high-level global primitives.
- the stereoscopic case, where several sensors (2 or even 3) simultaneously perceive the same dynamic scene, which makes it possible to identify, either in parallel or jointly, the structural and motion parameters of the 3-D objects which constitute the scene. Many studies have been carried out on stereo-motion cooperation within the field of Artificial Vision, primarily with the aim of 3-D reconstruction of objects or of robot navigation in complex environments. More recently, for 3-D TV or stereoscopic sequence dynamic restitution applications (CAD of 3-D objects, computer-assisted surgical operations, ...), these techniques have also been studied with the aim of improving image reconstruction quality after the analysis/synthesis phase or compression/decompression. Whilst still remaining at the heart of similar motion estimation schemes, the bi- or tri-nocular stereoscopic case makes it possible to enlarge the observation space and to solve some ambiguities in temporal occlusion regions.
Some simulation results of predictive coding schemes with motion compensation will be given, which make it possible to measure the performance of the associated estimators.
1 Models and descriptors of 3-D motions
1.1 Relations between 3-D motions and apparent motions
Let us recall the geometric relations which link the 3-D motion vector $\vec{V} = (\dot{X}, \dot{Y}, \dot{Z})^t$ of a point $(X, Y, Z)^t$ of the surface of an object in motion and its projection $(\dot{x}, \dot{y})^t = (u, v)^t$ on the image plane. We examine the case of the perspective projective system where

$$x = f\,\frac{X}{Z}, \qquad y = f\,\frac{Y}{Z} \qquad (1)$$

In order to simplify the notations, we will take the term $f$, which designates the ratio focal length/pixel size, as having a normalized value of 1.
The 3-D motion vector $\vec{V}$ can be expressed using the instantaneous translation vector $\vec{T}$ and the instantaneous rotation vector $\vec{\Omega}$ of the kinematic screw associated with the moving object [26], i.e.,

$$\vec{V} = \vec{T} + \vec{\Omega} \wedge \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} \qquad (2)$$
which is expressed, by components, as

$$\begin{bmatrix} \dot{X} \\ \dot{Y} \\ \dot{Z} \end{bmatrix} = \begin{bmatrix} T_X \\ T_Y \\ T_Z \end{bmatrix} + \begin{bmatrix} \Omega_Y Z - \Omega_Z Y \\ \Omega_Z X - \Omega_X Z \\ \Omega_X Y - \Omega_Y X \end{bmatrix} \qquad (3)$$
In the same way, the components of the apparent motion vector associated with the point $(x, y)$ in the image plane are defined, in the case of perspective projection, by

$$\begin{bmatrix} \dot{x} \\ \dot{y} \end{bmatrix} = \begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} \dfrac{\dot{X} - x\dot{Z}}{Z} \\[6pt] \dfrac{\dot{Y} - y\dot{Z}}{Z} \end{bmatrix} \qquad (4)$$

which, after replacement in Equation (4) of the expressions defined in Equation (3), gives

$$\begin{bmatrix} \dot{x} \\ \dot{y} \end{bmatrix} = \begin{bmatrix} \dfrac{T_X}{Z} + \Omega_Y - \dfrac{T_Z}{Z}\,x - \Omega_Z\,y - \Omega_X\,xy + \Omega_Y\,x^2 \\[6pt] \dfrac{T_Y}{Z} - \Omega_X - \dfrac{T_Z}{Z}\,y + \Omega_Z\,x + \Omega_Y\,xy - \Omega_X\,y^2 \end{bmatrix} \qquad (5)$$
The relations (5) are fully specified when the term $1/Z$ is also expressed as a function of the local pixel coordinates $(x, y)$. In order to retain a maximum quadratic order in Equation (5) as a function of the coordinates $(x, y)$, but particularly since the structural terms of a geometric surface of order greater than 1 are difficult to identify without bias on real images, a priori hypotheses concerning the regularity of surfaces are made. Then, if the term $Z$ (and, therefore, the term $1/Z$) is expressed by a first-order Taylor expansion,
$$Z = Z_0 + \left(\frac{\partial Z}{\partial X}\right)_0 X + \left(\frac{\partial Z}{\partial Y}\right)_0 Y + o^2(X, Y) = Z_0 + Z_1 X + Z_2 Y + o^2(X, Y) \qquad (6)$$

$$\frac{1}{Z} = \frac{1}{Z_0}\,(1 - Z_1 x - Z_2 y) + o^2(x, y) \qquad (7)$$

subsequently noted by

$$\frac{1}{Z} = n_X\,x + n_Y\,y + n_Z + o^2(x, y) \qquad (8)$$
with $(n_X, n_Y, n_Z)$ specifying the terms of the structure of the local surface, which is approximated here by a planar facet (Equation (6)) around $(X_0, Y_0, Z_0)$. Usually, the reference point selected will be the center of gravity of the region for which the planar facet approximation (6) or (8) is carried out, namely

$$(x_g, y_g)^t = \left(\frac{X_0}{Z_0},\, \frac{Y_0}{Z_0}\right)^t \qquad (9)$$
Equation (5), linking the apparent motion components $(\dot{x}, \dot{y})^t$ to the pixel coordinates, and the surface approximation carried out in (8) make it possible to establish a quadratic relation between $\vec{v} = (\dot{x}, \dot{y})^t$ and the coordinates of the point where this measurement is carried out:

$$\begin{bmatrix} \dot{x} \\ \dot{y} \end{bmatrix} = \begin{bmatrix} a_1 + a_2 x + a_3 y + a_7 xy + a_8 x^2 \\ a_4 + a_5 x + a_6 y + a_8 xy + a_7 y^2 \end{bmatrix} \qquad (10)$$

where

$$\begin{cases} a_1 = T_X n_Z + \Omega_Y \\ a_2 = T_X n_X - T_Z n_Z \\ a_3 = T_X n_Y - \Omega_Z \\ a_4 = T_Y n_Z - \Omega_X \\ a_5 = T_Y n_X + \Omega_Z \\ a_6 = T_Y n_Y - T_Z n_Z \\ a_7 = -T_Z n_Y - \Omega_X \\ a_8 = -T_Z n_X + \Omega_Y \end{cases} \qquad (11)$$
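As a numerical check, the coefficients (11) can be computed from an arbitrary kinematic screw and planar facet and compared with the direct projection formula (5); the sketch below (in Python, with purely illustrative parameter values) verifies that the two formulations of the field coincide at any image point.

```python
import numpy as np

# Illustrative (hypothetical) kinematic screw and planar-facet parameters:
# translation T, rotation Omega, and the structure terms of Equation (8).
TX, TY, TZ = 0.5, -0.2, 0.1
OX, OY, OZ = 0.01, -0.02, 0.03
nX, nY, nZ = 0.05, -0.03, 0.2      # 1/Z = nX*x + nY*y + nZ

# Coefficients of the quadratic model, Equation (11)
a1 = TX * nZ + OY
a2 = TX * nX - TZ * nZ
a3 = TX * nY - OZ
a4 = TY * nZ - OX
a5 = TY * nX + OZ
a6 = TY * nY - TZ * nZ
a7 = -TZ * nY - OX
a8 = -TZ * nX + OY

def flow_quadratic(x, y):
    """Apparent motion predicted by the 8-parameter model, Equation (10)."""
    u = a1 + a2 * x + a3 * y + a7 * x * y + a8 * x**2
    v = a4 + a5 * x + a6 * y + a8 * x * y + a7 * y**2
    return u, v

def flow_projected(x, y):
    """Apparent motion from Equation (5), with 1/Z given by Equation (8)."""
    invZ = nX * x + nY * y + nZ
    u = TX * invZ + OY - TZ * invZ * x - OZ * y - OX * x * y + OY * x**2
    v = TY * invZ - OX - TZ * invZ * y + OZ * x + OY * x * y - OX * y**2
    return u, v

# The two formulations agree at any image point (x, y)
for x, y in [(0.0, 0.0), (0.3, -0.4), (-1.0, 0.7)]:
    assert np.allclose(flow_quadratic(x, y), flow_projected(x, y))
```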
1.2.1 Justification of the linear approximation
Two sub-models of the local motion vector field can be introduced naturally from Equation (10).

1. a linear model (dim = 6), restricted to the motion parameters $(a_1, a_2, a_3, a_4, a_5, a_6)$.
This model is also called an affine model in so far as it makes it possible to identify an affine pixel-based transformation. In fact, if the pixel $p_{t+\Delta t} = (x_{t+\Delta t}, y_{t+\Delta t})^t$ is matched to the pixel $p_t = (x_t, y_t)^t$ by the affine relation

$$p_{t+\Delta t} = A\,p_t + B \qquad (12)$$

then

$$\begin{bmatrix} \dot{x} \\ \dot{y} \end{bmatrix} \simeq \frac{1}{\Delta t}\,(p_{t+\Delta t} - p_t) = \frac{1}{\Delta t}\left[(A - I)\begin{bmatrix} x_t \\ y_t \end{bmatrix} + B\right] \qquad (13)$$

and we again find the linear relation between the motion vector field and the pixel coordinates. An important consequence of this observation is that, when such a linear motion model is used, the properties of affine transformations will be used implicitly: in particular, let us mention the transformation of a linear segment into a linear segment, of a polygonal region into a polygonal region, and the preservation of convexity.
2. a quadratic model (dim = 8), using all the parameters $\{a_i\}_{i=1 \ldots 8}$ defined in Equation (10).
We will see that these models, even if they prove to be more complete, come up against two major problems. Firstly, it appears difficult to obtain an accurate estimation of the quadratic terms from previously estimated 2-D apparent motion measurements; moreover, the model described by Equation (10) is already a restrictive model compared to a general quadratic model which would contain six quadratic terms, and is only obtained under a first-order approximation of local surfaces and a rigid-motion hypothesis. Secondly, the use of a quadratic parametric model in motion compensation only brings minor improvements in the regions of complex motions and
can even prove to be less efficient than the use of a lower-order parametric model.
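The correspondence (12)-(13) between an affine pixel transformation and a linear flow field, together with the segment-to-segment property mentioned above, can be checked numerically; the following sketch uses an arbitrary small affine map chosen purely for illustration.

```python
import numpy as np

# A hypothetical affine pixel transformation p_{t+dt} = A p_t + B
# (A and B are arbitrary illustrative values: small rotation + divergence).
dt = 1.0
A = np.array([[1.02, -0.01],
              [0.01,  1.02]])
B = np.array([0.5, -0.3])

def displacement(p):
    """Exact displacement p_{t+dt} - p_t under the affine map (12)."""
    return A @ p + B - p

def linear_flow(p):
    """Velocity predicted by the linear model (13): (1/dt)((A - I) p + B)."""
    return ((A - np.eye(2)) @ p + B) / dt

# With dt = 1 the displacement and the linear flow coincide exactly;
# the field is linear in the pixel coordinates (x, y)
p = np.array([10.0, -4.0])
assert np.allclose(displacement(p), linear_flow(p) * dt)

# Affine maps send segments to segments: the image of a midpoint is the
# midpoint of the images (the convexity preservation noted in the text)
q = np.array([-2.0, 6.0])
mid = (p + q) / 2
assert np.allclose(A @ mid + B, ((A @ p + B) + (A @ q + B)) / 2)
```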
1.2.2 Illustration of particular cases of linear modelling
Case 1: If the instantaneous rotation vector $\vec{\Omega} = (\Omega_X, \Omega_Y, \Omega_Z)^t$ is equal to $(0, 0, \Omega_Z)^t$, that is to say when only rotations around the center of gravity of the region, with a rotation axis parallel to the optical axis, are allowed, then the development (10) becomes:

$$\begin{bmatrix} \dot{x} \\ \dot{y} \end{bmatrix} = \begin{bmatrix} T_{Xg} \\ T_{Yg} \end{bmatrix} + \begin{bmatrix} k & -\theta \\ \theta & k \end{bmatrix}\begin{bmatrix} x - x_g \\ y - y_g \end{bmatrix} + \begin{bmatrix} T_X n_X & T_X n_Y \\ T_Y n_X & T_Y n_Y \end{bmatrix}\begin{bmatrix} x - x_g \\ y - y_g \end{bmatrix} \qquad (14)$$

with
- $(T_{Xg}, T_{Yg})^t = (a_1, a_4)^t = (T_X n_Z, T_Y n_Z)^t$, the translation vector of the center of gravity of the region which, as we note, in relation to the 3-D translation components, is only defined up to a factor $n_Z$ (similarity factor on the $Z$ axis);
- $k = -T_Z n_Z$ and $\theta = \Omega_Z$, terms which are very often preponderant, in translation and rotation along the optical axis;
- the other terms constituting crossed motion and structure terms along the other axes.
Case 2: Simplified linear model (SLM model)
An even rougher form of modelling of the structural geometry of objects and regions consists of considering the scene as a succession of planar facets parallel to the image plane, in the same way as a z-buffer in computer graphics. This leads to $n_X = n_Y = 0$ and, consequently,

$$\begin{bmatrix} \dot{x} \\ \dot{y} \end{bmatrix} = \begin{bmatrix} T_{Xg} \\ T_{Yg} \end{bmatrix} + \begin{bmatrix} k & -\theta \\ \theta & k \end{bmatrix}\begin{bmatrix} x - x_g \\ y - y_g \end{bmatrix} \qquad (15)$$

The merit of this form of modelling is that it provides a compact representation (4 parameters) for the description of the field and a simple interpretation in terms of the 3-D motion components: $T_X$, $T_Y$, $T_Z$ and $\Omega_Z = \theta$.
Case 3: Constant model (CST model)
Finally, let us recall the case of the constant model, the restriction of the linear model solely to zero-order terms. This model, which is widely used in motion compensation by regions, nevertheless proves limited for identifying complex global 3-D motions.
1.3 Linear approximation of the motion vector field and choice of 2½-D descriptors
The analysis basis for specifying the geometry of the motion vector field as specified by Equation (10) is of course not unique. To convince ourselves of this, it is possible, through differential operators, to return to the general formulation of a vector field with, for example, linear geometry

$$\begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} u_g \\ v_g \end{bmatrix} + \begin{bmatrix} \dfrac{\partial u}{\partial x} & \dfrac{\partial u}{\partial y} \\[6pt] \dfrac{\partial v}{\partial x} & \dfrac{\partial v}{\partial y} \end{bmatrix}\begin{bmatrix} x - x_g \\ y - y_g \end{bmatrix} \qquad (16)$$
which corresponds to a first-order expansion of the field around the point $(x_g, y_g)$, or

$$\begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} u_g \\ v_g \end{bmatrix} + M \begin{bmatrix} x - x_g \\ y - y_g \end{bmatrix} \qquad (17)$$
François and Bouthemy [13], and Simard and Mailloux [44], recall that the matrix $M$ can be rewritten as:

$$M = \frac{1}{2}\,\mathrm{trace}(M)\,I + \frac{1}{2}(M - M^T) + \frac{1}{2}\left(M + M^T - \mathrm{trace}(M)\,I\right)$$
$$= \frac{1}{2}\,\mathrm{div}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + \frac{1}{2}\,\mathrm{rot}\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} + \frac{1}{2}\,\mathrm{hyp}_1\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} + \frac{1}{2}\,\mathrm{hyp}_2\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \qquad (18)$$
which makes it possible to introduce general differential operators for the description of a vector field (not necessarily linear) at each point $(x, y)$:

$$\begin{aligned} \text{divergence} &= \mathrm{div}(u, v) = \frac{\partial u}{\partial x} + \frac{\partial v}{\partial y} \\ \text{rotational} &= \mathrm{rot}(u, v) = \frac{\partial v}{\partial x} - \frac{\partial u}{\partial y} \\ \text{hyperbolic 1} &= \mathrm{hyp}_1(u, v) = \frac{\partial u}{\partial x} - \frac{\partial v}{\partial y} \\ \text{hyperbolic 2} &= \mathrm{hyp}_2(u, v) = \frac{\partial v}{\partial x} + \frac{\partial u}{\partial y} \end{aligned} \qquad (19)$$

Examples of synthetic fields are provided in Figure 1 and illustrate fairly well the physically interpretable nature of these differential descriptors.
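The descriptors (19) are straightforward to evaluate on a discrete field by finite differences; the sketch below (grid size and parameter values chosen for illustration) recovers div and rot on a synthetic field combining pure divergence and rotation, for which both hyperbolic terms vanish.

```python
import numpy as np

# Synthetic linear motion field on a grid: pure divergence plus rotation,
# u = d*x - r*y, v = r*x + d*y   (d = div/2, r = rot/2 in the sense of (20))
d, r = 0.05, 0.02
ys, xs = np.mgrid[-8:9, -8:9].astype(float)
u = d * xs - r * ys
v = r * xs + d * ys

# Differential descriptors of Equation (19) by finite differences
# (np.gradient returns the derivative along rows = y first, columns = x second)
du_dy, du_dx = np.gradient(u)
dv_dy, dv_dx = np.gradient(v)
div  = du_dx + dv_dy
rot  = dv_dx - du_dy
hyp1 = du_dx - dv_dy
hyp2 = dv_dx + du_dy

# For this field: div = 2d, rot = 2r, both hyperbolic terms are zero
assert np.allclose(div,  2 * d)
assert np.allclose(rot,  2 * r)
assert np.allclose(hyp1, 0.0)
assert np.allclose(hyp2, 0.0)
```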
Using these, we thus specify a linear-geometry motion vector field by

$$\begin{bmatrix} \dot{x} \\ \dot{y} \end{bmatrix} = \begin{bmatrix} u_g \\ v_g \end{bmatrix} + \frac{1}{2}\begin{bmatrix} \mathrm{div} + \mathrm{hyp}_1 & \mathrm{hyp}_2 - \mathrm{rot} \\ \mathrm{rot} + \mathrm{hyp}_2 & \mathrm{div} - \mathrm{hyp}_1 \end{bmatrix}\begin{bmatrix} x - x_g \\ y - y_g \end{bmatrix} \qquad (20)$$

The analogy with the affine decomposition model defined in Equation (10) makes it possible to define the change of basis between the two descriptor sets.
$$\begin{cases} a_1 = u_g = T_{xg} \\[2pt] a_2 = \dfrac{\mathrm{div} + \mathrm{hyp}_1}{2} \\[4pt] a_3 = \dfrac{\mathrm{hyp}_2 - \mathrm{rot}}{2} \\[4pt] a_4 = v_g = T_{yg} \\[2pt] a_5 = \dfrac{\mathrm{rot} + \mathrm{hyp}_2}{2} \\[4pt] a_6 = \dfrac{\mathrm{div} - \mathrm{hyp}_1}{2} \end{cases} \iff \begin{cases} u_g = a_1 \\ v_g = a_4 \\ \mathrm{div} = a_2 + a_6 \\ \mathrm{rot} = a_5 - a_3 \\ \mathrm{hyp}_1 = a_2 - a_6 \\ \mathrm{hyp}_2 = a_3 + a_5 \end{cases} \qquad (21)$$
According to the estimation method (evoked in Section 8.2) and the intended application (qualitative interpretation and/or use in motion compensation), it will be advisable to select whichever set of descriptors proves to be the most effective. Finally, let us stress that the particular case of the linear models defined by Equation (15) corresponds to the case in which the hyperbolic terms ($\mathrm{hyp}_1$ and $\mathrm{hyp}_2$) are disregarded, that is to say:

$$\begin{cases} a_2 = a_6 = \frac{1}{2}\,\mathrm{div} \\[2pt] a_3 = -a_5 = -\frac{1}{2}\,\mathrm{rot} \end{cases} \qquad (22)$$
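The change of basis (21) is a simple linear re-parametrization; a minimal sketch, with a round-trip check that the two descriptor sets carry the same information:

```python
# Change of basis of Equation (21) between the affine parameters
# (a1..a6) and the 2.5-D descriptors (ug, vg, div, rot, hyp1, hyp2).

def params_to_descriptors(a1, a2, a3, a4, a5, a6):
    return dict(ug=a1, vg=a4,
                div=a2 + a6, rot=a5 - a3,
                hyp1=a2 - a6, hyp2=a3 + a5)

def descriptors_to_params(ug, vg, div, rot, hyp1, hyp2):
    return (ug, (div + hyp1) / 2, (hyp2 - rot) / 2,
            vg, (rot + hyp2) / 2, (div - hyp1) / 2)

# Round trip: converting to descriptors and back recovers the parameters
a = (0.3, 0.05, -0.02, -0.1, 0.04, 0.01)
d = params_to_descriptors(*a)
assert all(abs(p - q) < 1e-12 for p, q in zip(descriptors_to_params(**d), a))
```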
1.4 Design and use of an apparent motion model hierarchy
Up until now, studies carried out in the field of motion estimation-compensation have only used a pre-defined motion model, without seeking to adapt it to the various motions present within the image. Let us note that, as a general rule, it is the region-constant model which is used. Now, as there are generally several different types of motions in a single natural image sequence, it would seem interesting to adapt the motion model to be identified locally, essentially for the following two reasons:
- the identification of a too simple motion model (for example a constant model) in a region in which the physically observed motions are complex (some sort of 3-D motion of a rigid body, for example) can only lead to poor reconstruction by motion compensation, or to an over-segmentation of the region (possibly down to pixel level) which is costly in terms of the volume of motion information to estimate and to transmit (see Figure 1).
Figure 1: Illustration of the effect of the selection of a model on segmentation: if a divergence model is used, the whole of the vector field constitutes a single homogeneous region; on the other hand, if a constant model is used, it is necessary to decompose the main region into several sub-regions (thus more descriptors are used), and that for a less effective result.
- the identification of a sophisticated motion model (for example a quadratic model) on a region in which a single simple motion can be observed (for example a 2-D translational motion parallel to the image plane) leads to a large estimation bias, including on the significant parameter sub-vector corresponding to the single motion which should naturally be identified. In fact, as we will establish in the next paragraph, the criterion to be minimized in the motion parameter vector estimation scheme is very often global, since it depends simultaneously on all the components of the motion vector to be identified. Thus the components which are not actually observable introduce a bias on the identification of the components of the true motion.
Naturally, paragraphs 8.1.2 and 8.1.3 introduced several motion models of increasing complexity. Figure 2 illustrates how these different models can be placed in a hierarchy from the simplest (zero motion) to the most complex. As in [8] and [39], we have included the possibility of introducing into the motion parameter vector to be identified an estimate of the illumination variation, considered as a potential source of temporal change in the intensity function. Once this model hierarchy has been identified (denoted by $M$), it is advisable to define the path strategy within this hierarchy. The introduction of the notion of local adaptivity of motion models signifies the choice, from amongst the set $M$, of the most "probable" model $\mu$ in the sense of a cost or performance criterion for the model. This cost function very often depends on:
- the error due to reconstruction by motion compensation associated with the model;
- the cost of representation (indeed of transmission, if the motion vector field is transmitted in accordance with the coding schemes considered) of the motion information (a parameter vector whose dimension varies depending on the model);
Figure 2: A model hierarchy, from null motion through constant motion, rotation, divergence and simplified linear motion up to linear motion and the equivalent affine motion model.
- the size of the region considered, in order to avoid an under- or over-segmentation of the image;
- the operational cost of the identification of the parameter vector.

It is easy to distinguish two broad methodologies for the effective use of the set $M$ of motion models:

1. Parallel approach: a test in parallel of all the motion models is carried out, region by region, in the sense of a MAP criterion, and the most effective model is selected. The clearly formalized mathematical framework of the statistical criteria based on information theory [40] makes it possible to solve this problem.
2. Sequential approach: this involves traversing the hierarchy of models $M$ in accordance with a pre-defined path, which can be either:
- from the simplest to the most complex model ("coarse-to-fine" approach);
- from the most complex to the simplest, by progressive suppression of components of the motion vector ("fine-to-coarse" approach);
- from an intermediate model of average complexity (for example the SLM model introduced in paragraph 8.1.2) to a more complex or simpler version.

For all these sequential approaches, the mathematical framework of hypothesis tests based on likelihood functions appears well adapted: two hypotheses will be tested by comparison with each other, for example in the sense of maximum likelihood:
Hypothesis H0: the motion of the current region corresponds to a motion model $\mu$;
Hypothesis H1: the motion of this same region corresponds to a slightly more complex motion model $(\mu + 1)$.
In conclusion, let us note that within the context of the use of such a motion model hierarchy, the representation of the motion information will consist of two information fields:
- the map of the models selected (one label $\mu$ per region);
- the motion parameter vector field itself. Let us also recall that the size of the vector varies depending on $\mu$.
2 Estimation methods in the monocular case
2.1 Estimation of the sensor motion of a static scene
Several motion estimation algorithms try, before or at the same time as the estimation of a dense motion information field (at all points or over all regions of the image), to estimate the sensor motion, in order to be able to identify not the relative motions between the camera and the objects, but the absolute motions of the objects in relation to a fixed reference.
A priori, the camera has freedom of motion throughout the six dimensions of a true motion (3-D translation and 3-D rotation). Under certain hypotheses (see [16], [50], [39]) involving, in particular, the relative remoteness of the objects present in the scene and the small angles of rotation during a panoramic motion of the sensor, the camera motions can be reduced to the following three classes:
- translations parallel to the image plane (including panning);
- translations perpendicular to the image plane (divergence), analytically equivalent to a change in focal length (zoom);
- rotations around the optical axis.
It can thus be seen that a simplified linear motion model (SLM model with $\Theta_{SLM} = (t_x, t_y, k, \theta)^t$), as introduced by Equation (15), makes it possible to identify such a sensor motion.
This sensor motion can be estimated directly by one of the methods introduced in the paragraph below. The entire image is then considered as a single region, whose center of gravity is the center of the image, itself identified with the projection of the optical center. Other quantitative information (localization of fixed objects in the scene, whose apparent motion is thus not due to the sensor motion alone) or qualitative information (known nature of the sensor motion model) can be injected easily into the algorithm, in order to ease and improve the estimation. A priori, such knowledge is rarely available in the case of communication services (contribution, distribution, storage services, etc.), contrary to applications which use "closed-loop" dynamic imagery, that is to say where information concerning the sensor motion is available from its own control (e.g., tele-monitoring, vision for robotics, etc.).
The results in Figures 3 to 7 illustrate the performance obtained when the sensor motion is taken into account, in terms of compactness of the motion representation and of the error due to reconstruction by motion compensation, in the limiting case in which only this sensor motion estimation is carried out.

Figure 3: (a) and (b), two original frames of the "Kiel harbour" sequence; (c) frame difference image with MSE = 922.5.
2.2 Estimation methods of motion descriptors for a moving scene
All the motion estimation methods (closely related to the aspects of motion-based segmentation, in the case of motion estimators by regions) were discussed in Chapter 3, essentially using the 2-D constant translation model $(t_x, t_y)$. Let us also recall that the following general classes of motion estimation were presented:
- translation of a 2-D region (of which the "block-matching" algorithm is an example);
- pel-recursive algorithms;
- iterative algorithms;
- analysis of spatio-temporal frequencies;
- parametric models.
Below we detail how these methods can be extended naturally to more complex parametric motion models (already presented in Section 3.3.2.5). However, two cases present themselves, depending on the existence or otherwise of a dense apparent motion vector field prior to the estimation of the parameters of more global models. We deal briefly with the case in which such a dense field pre-exists since, clearly, a complete algorithmic scheme, as much for coding as for analysis, will tend to dispense with the calculation of this dense field, sometimes very costly computationally, if it is not useful. Let us note,
Figure 4: (a) Identification of a global (camera) motion using a divergence motion model, (b) optical flow relative to the global motion, (c) differential flows, (d) motion-compensated frame difference image based only on the global motion (a), MSE = 56.3.
Figure 5: (a) and (b), two original frames of the "Interview" sequence.
however, that through the analytical relations detailed below, it is still possible to pass from a sparse field of motion descriptors to a dense apparent motion vector field and vice versa.
2.2.1 Estimation of a parametric model from a dense motion vector field
As we saw in Chapter 3, many methods make it possible to obtain a dense motion vector field. An illustration is provided below (Figure 8) with the Horn-Schunck algorithm [17]. The idea is to use this dense information in order to extract from it the parameters of a more global model (for example an affine or SLM model, as illustrated in Figure 9).
Figure 6: (a) Identification of a global (camera) motion using a constant motion model, (b) optical flow relative to the global motion, (c) differential flow.
Figure 7: (a) Frame difference image with MSE = 137.4, (b) motion-compensated frame difference image based only on the global motion, MSE = 100.9.
At this stage, we assume that we have a segmentation of the image into regions which are homogeneous in the motion sense. The parameters are obtained:
- by minimization of the mean square error between the initial dense field and the dense field derived from the parametric model ([15], [29], [16]); for example, let us consider an SLM model with parameters $\Theta_{SLM} = (t_x, t_y, k, \theta)^t$ for a region $R$ and an initial dense field noted $\{(u_i, v_i)\}$ for each pixel $\in R$ indexed by $i$ with coordinates $(x_i, y_i)$; the error to be minimized is then expressed as:

$$E^2 = \sum_{i \in R} (t_x + k x_i - \theta y_i - u_i)^2 + (t_y + k y_i + \theta x_i - v_i)^2 \qquad (23)$$
Figure 8: Example of an optical flow obtained by the Horn-Schunck method [17] on different areas where "pure" divergent, translational, rotational and affine flows have been synthesized.
The least mean squares resolution requires the inversion of a $4 \times 4$ matrix (for such an SLM model). Simplifications can be made [42] concerning the resolution of this system. The resolution equations provide the following parameter vector:

$$\begin{cases} t_x = \dfrac{1}{N_R}\left(\sum_i u_i - k \sum_i x_i + \theta \sum_i y_i\right) \\[8pt] t_y = \dfrac{1}{N_R}\left(\sum_i v_i - k \sum_i y_i - \theta \sum_i x_i\right) \\[8pt] k = \dfrac{\sum_i u_i x_i - \frac{1}{N_R}\sum_i u_i \sum_i x_i + \sum_i v_i y_i - \frac{1}{N_R}\sum_i v_i \sum_i y_i}{\sum_i x_i^2 - \frac{1}{N_R}\left(\sum_i x_i\right)^2 + \sum_i y_i^2 - \frac{1}{N_R}\left(\sum_i y_i\right)^2} \\[10pt] \theta = \dfrac{\sum_i v_i x_i - \frac{1}{N_R}\sum_i v_i \sum_i x_i - \sum_i u_i y_i + \frac{1}{N_R}\sum_i u_i \sum_i y_i}{\sum_i x_i^2 - \frac{1}{N_R}\left(\sum_i x_i\right)^2 + \sum_i y_i^2 - \frac{1}{N_R}\left(\sum_i y_i\right)^2} \end{cases} \qquad (24)$$

where $N_R$ denotes the number of pixels of the region $R$;
- by separate identification of the global translational motion and of the rotation/divergence in relation to the center of gravity of the region considered, by simple averaging of the local estimates [37]; the following global parameters are obtained:

$$\begin{cases} t_x = \dfrac{1}{N_R}\sum_i u_i \\[8pt] t_y = \dfrac{1}{N_R}\sum_i v_i \\[8pt] k = \dfrac{1}{N_R}\sum_i \dfrac{x'_i (u_i - t_x) + y'_i (v_i - t_y)}{x'^2_i + y'^2_i} \\[8pt] \theta = \dfrac{1}{N_R}\sum_i \dfrac{x'_i (v_i - t_y) - y'_i (u_i - t_x)}{x'^2_i + y'^2_i} \end{cases} \qquad (25)$$

where $(x'_i, y'_i)$ represents the coordinates relative to the center of gravity of the region considered.
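The minimisation of (23) can equally be carried out by stacking one pair of linear equations per pixel and solving the resulting least-squares system, which is equivalent to inverting the 4x4 normal matrix mentioned above; a sketch on a synthetic field with known (illustrative) parameters:

```python
import numpy as np

# Least-squares fit of the SLM model (tx, ty, k, theta) to a dense field
# {(ui, vi)} over a region, i.e. minimisation of E^2 in Equation (23).
rng = np.random.default_rng(0)
ys, xs = np.mgrid[-5:6, -5:6].astype(float)
x, y = xs.ravel(), ys.ravel()

# Synthetic dense field generated by known parameters (plus small noise)
tx, ty, k, theta = 1.0, -0.5, 0.03, 0.01
u = tx + k * x - theta * y + 1e-3 * rng.standard_normal(x.size)
v = ty + k * y + theta * x + 1e-3 * rng.standard_normal(x.size)

# Design matrix: each pixel contributes one row for u_i and one for v_i;
# the unknown vector is (tx, ty, k, theta)
rows_u = np.column_stack([np.ones_like(x), np.zeros_like(x), x, -y])
rows_v = np.column_stack([np.zeros_like(x), np.ones_like(x), y,  x])
Amat = np.vstack([rows_u, rows_v])
b = np.concatenate([u, v])
est, *_ = np.linalg.lstsq(Amat, b, rcond=None)

# The known parameters are recovered up to the noise level
assert np.allclose(est, [tx, ty, k, theta], atol=1e-2)
```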
Figure 9: Identification of the affine motion model descriptors on the four regions (velocity field obtained by using the system in Equation (25)).
2.2.2 Direct parametric estimation
Least mean square estimation. By extension of the methods introduced in Chapter 3 (paragraph 3.3.2.5), it is quite possible to envisage the introduction into the resolution scheme of a more complex model (e.g., here, an affine model). The resolution of the motion constraint equation is expressed as follows: for the region $R$, the optimal estimated motion $\Theta_R$ will be

$$\Theta_R = (a_1, a_2, a_3, a_4, a_5, a_6)^t = \arg\min_{\Theta} \sum_{p \in R} \left(I_x(p)\,u(\Theta) + I_y(p)\,v(\Theta) + I_t(p)\right)^2 \qquad (26)$$

with $u(\Theta) = a_1 + a_2 x + a_3 y$ and $v(\Theta) = a_4 + a_5 x + a_6 y$ (affine model). The least squares resolution is achieved by solving a linear system of six equations. Certain simplifications have been proposed [42], [15].
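The direct estimation (26) amounts to a linear least-squares problem with one motion-constraint row per pixel; the sketch below fabricates consistent gradients $I_x$, $I_y$, $I_t$ from a known affine motion (all values illustrative) and recovers the six parameters.

```python
import numpy as np

# Direct least-squares estimation of an affine model from the motion
# constraint equation (26): minimise sum (Ix*u(theta) + Iy*v(theta) + It)^2.
# Gradients here are synthetic; in practice Ix, Iy, It come from the images.
rng = np.random.default_rng(1)
n = 500
x = rng.uniform(-1, 1, n); y = rng.uniform(-1, 1, n)
Ix = rng.standard_normal(n); Iy = rng.standard_normal(n)

# Ground-truth affine motion used to fabricate It consistent with (26)
a_true = np.array([0.2, 0.05, -0.03, -0.1, 0.02, 0.04])
u = a_true[0] + a_true[1] * x + a_true[2] * y
v = a_true[3] + a_true[4] * x + a_true[5] * y
It = -(Ix * u + Iy * v)            # exact motion constraint

# One row per pixel: coefficients of (a1..a6) in Ix*u(theta) + Iy*v(theta)
M = np.column_stack([Ix, Ix * x, Ix * y, Iy, Iy * x, Iy * y])
a_est, *_ = np.linalg.lstsq(M, -It, rcond=None)
assert np.allclose(a_est, a_true, atol=1e-6)
```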
Estimation by a generalized gradient method (see Chapter 3).
Here we seek the solution minimizing the motion compensation mean square error across the whole of the region $R$ by a gradient optimization technique:

$$\hat{\Theta} = \arg\min_{\Theta} \sum_{p \in R} \mathrm{DFD}^2(p, \Theta) = \arg\min_{\Theta} \sum_{p(i,j) \in R} \left(I(i, j, k) - I(i - u(\Theta),\, j - v(\Theta),\, k - 1)\right)^2 \qquad (27)$$

The gradient algorithm ([35], [42], [37]) then generalizes to the following iterative estimation process:

$$\vec{\Theta}^{m+1} = \vec{\Theta}^m - \Gamma\,\frac{\vec{\nabla}^m}{N_R} \qquad (28)$$

with

$$\vec{\nabla}^m = \sum_{(i,j) \in R} \begin{bmatrix} \dfrac{\partial}{\partial a_1}\,\mathrm{DFD}^2(i, j, \vec{\Theta}^m) \\ \vdots \\ \dfrac{\partial}{\partial a_n}\,\mathrm{DFD}^2(i, j, \vec{\Theta}^m) \end{bmatrix}$$

where:
- $m$ designates the iteration index;
- $N_R$ the size of the region $R$;
- $\Gamma$ a gain matrix which can be either fixed or adaptive, full or limited to a diagonal matrix; the corrective term $\Gamma \vec{\nabla}^m$ between two iterations is taken in the direction of the gradient of each component.
In the case of an affine model where $\vec{\Theta} = (a_1, a_2, a_3, a_4, a_5, a_6)^t$, the estimate of $\vec{\Theta}$ is obtained iteratively by:

$$\vec{\Theta}^{m+1} = \vec{\Theta}^m - \sum_{(i,j) \in R} \Gamma\,\vec{\nabla}^m(i, j)\,\mathrm{DFD}((i, j), \Theta^m) \qquad (29)$$

with the displaced gradient vector $\vec{\nabla}^m$ equal to

$$\vec{\nabla}^m(i, j) = \begin{bmatrix} I_x(i - u(\Theta^m),\, j - v(\Theta^m),\, k - 1) \\ i\,I_x(i - u(\Theta^m),\, j - v(\Theta^m),\, k - 1) \\ j\,I_x(i - u(\Theta^m),\, j - v(\Theta^m),\, k - 1) \\ I_y(i - u(\Theta^m),\, j - v(\Theta^m),\, k - 1) \\ i\,I_y(i - u(\Theta^m),\, j - v(\Theta^m),\, k - 1) \\ j\,I_y(i - u(\Theta^m),\, j - v(\Theta^m),\, k - 1) \end{bmatrix} \qquad (30)$$
The gain matrix $\Gamma$ is taken diagonal in order to avoid interaction between the different descriptors; otherwise the corrective term $\Gamma \vec{\nabla}^m$ would not be taken in the direction of the gradient. Moreover, in practice, it is necessary to take account of the difference in scale and in physical magnitude which exists between the various components of the motion parameter vector $\vec{\Theta}$. Thus the "constant" parameters ($a_1$ and $a_4$) of an affine model will be allocated a larger gain than the other descriptors.
The estimation-segmentation link. The identification of the previous motion models requires the definition of a segmentation, either prior to, or concomitant with, the motion estimation phase itself, since this operates on a region $R$ of matched pixels.
Generally speaking, two approaches can be used:
1. the definition of a segmentation which is either arbitrary (decomposition of the image into blocks) or independent of motion (purely spatial segmentation, which has the major inconvenience of constituting an over-segmentation from the motion point of view). This segmentation can be either mono-grid, or based on a pyramid of information [15], [42], a quadtree splitting [39] or a splitting/merging into regions [6], [14].
In the case of a pyramidal structure, the elements of this structure inherit motion parameter vectors calculated at a coarser level, and a correction to this motion prediction is carried out by parametric estimation as described previously.
Segmentation into a quadtree allows the progressive decomposition of an image into smaller and smaller regions, making it possible first to identify the more global attributes and then to lead to the identification of local motions (even at pixel level, if the quadtree is complete) at the end of the estimation process. Clearly, a splitting criterion has to be defined; it can be based on the following hypothesis tests:
- test of a region's homogeneity
The test consists of comparing the motion homogeneity hypothesis (the region $R_0$ corresponds to a parametric model $\Theta_0$) with that of inhomogeneity (presence of several motions). Under Gaussian (and zero-mean) hypotheses concerning the associated error functions, the search for the maximum likelihood leads to testing the following estimated variance against a threshold:

$$\frac{1}{N_{R_0}} \sum_{(i,j) \in R_0} \mathrm{DFD}(i, j, \Theta_{R_0})^2 \;\gtrless\; \lambda \qquad (31)$$
- test of division of a region into $L$ sub-regions
In this context, the test consists of comparing the following hypotheses:
Hypothesis H0: the region $R_0$ corresponds to a unique parametric model.
Hypothesis H1: the region $R_0$ should be decomposed into sub-regions, on each of which a parametric model $\Theta_l$ must be identified.
Bouthemy and Santillana-Rivero [6] test the case in which the region divides into two sub-regions. Under the same hypotheses as previously mentioned, the likelihood test between the two hypotheses (hypotheses H0 and H1, associated with likelihood functions $f_0$ and $f_1$) leads to the following test:

$$\log \frac{f_1(\hat{\Theta}_1, \hat{\Theta}_2)}{f_0(\hat{\Theta}_0)} \;\underset{H_0}{\overset{H_1}{\gtrless}}\; \lambda \qquad (32)$$

and we obtain the following criterion:

$$N_{R_0} \log \sigma_0^2 - N_{R_1} \log \sigma_1^2 - N_{R_2} \log \sigma_2^2 \;\underset{H_0}{\overset{H_1}{\gtrless}}\; \lambda \qquad (33)$$

where:
- $N_{R_0}$, $N_{R_1}$, $N_{R_2}$ designate respectively the surfaces of the regions $R_0$, $R_1$, $R_2$;
- $\sigma_i^2 = \dfrac{1}{N_{R_i}} \displaystyle\sum_{p \in R_i} \mathrm{DFD}^2(p, \hat{\Theta}_i)$, i.e., after linearization,

$$\sigma_i^2 = \frac{1}{N_{R_i}} \sum_{p \in R_i} \left(I_x(p)\,u(\hat{\Theta}_i) + I_y(p)\,v(\hat{\Theta}_i) + I_t(p)\right)^2$$
2. Markovian models make it possible to specify effective observation-interaction models. François [14] thus defines a motion-based segmentation by a Markovian approach, using an energy function composed of two terms:
- one term favouring the identical labelling of two adjacent sites (region merging approach);
- one term seeking to maximize the likelihood of the observations given the labels (same formula as previously for $\sigma_i^2$).
A deterministic relaxation scheme makes it possible to propagate the labels.
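The split criterion (33) can be illustrated on synthetic residuals: under a constant model, a region containing two distinct motions yields a much larger criterion value than a homogeneous one (noise level and threshold below are illustrative).

```python
import numpy as np

# Sketch of the split test (33): residual variances under the best constant
# model play the role of the sigma_i^2 of the text.
rng = np.random.default_rng(2)

def residual_var(u):
    """Variance of the compensation residual under a constant model
    (the best constant motion is the mean of the observed vectors)."""
    return np.mean((u - u.mean())**2)

def split_criterion(u1, u2):
    u0 = np.concatenate([u1, u2])
    return (u0.size * np.log(residual_var(u0))
            - u1.size * np.log(residual_var(u1))
            - u2.size * np.log(residual_var(u2)))

noise = 0.1
# Two sub-regions with clearly different motions: splitting is worthwhile
a = 1.0 + noise * rng.standard_normal(200)
b = 3.0 + noise * rng.standard_normal(200)
# Homogeneous region: both halves share the same motion
c = 1.0 + noise * rng.standard_normal(200)
d = 1.0 + noise * rng.standard_normal(200)

assert split_criterion(a, b) > split_criterion(c, d)
assert split_criterion(a, b) > 100.0   # strongly in favour of hypothesis H1
```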
16
In conclusion, in the case of the use of a motion parametric model in a motion compensation scheme, it seems important :
to select the X
criterion to minimize as a direct function of the local compensation
errors, i.e.,
DFD(i j R)2
(ij )2R
to smooth out the motion parameters eld to achieve better compactness of presen-
tation.
to avoid the convergence of the estimation process towards local minima of the nonconvex functional to be minimized. The latter two constraints are simply resolved
by the introduction of a relaxation algorithm.
to proceed with a \coarse-to-ne" analysis in a pyramidal or progressive region
splitting sense.
Several authors [15], [16], [42], [22], [39] have adopted these principles and obtain interesting results from the point of view of both vector field regularity and motion compensation effectiveness. In Figure 9 we illustrate the example of the algorithm of [37], [38], which will serve as a basis for the results on real sequences in paragraph 8.2.4 and in Figures 10 and 11.
2.3 Model hierarchy
In the case where, for a given region R, a notion of adaptation of a model to the region is envisaged, it is best to define a selection criterion for the optimum model from the set of parametric models, written M. Two families of criteria can be used, depending on whether a sequential or parallel approach is desired (see Section 8.1.4).
1. Likelihood ratio
The procedure is identical to that described previously in the context of region splitting. It is a matter of testing, for the same estimation surface (the current region R), two hypotheses:
- Hypothesis H1: the use of a "complex" model $\theta_1 = (a_1, \ldots, a_r, \ldots, a_n)^t$;
- Hypothesis H0: the use of a "simple" model, a restriction to r parameters (r < n) of the previous model: $\theta_0 = (a_1, \ldots, a_r)^t$.
The model will be selected in accordance with the most probable hypothesis, by comparing with a threshold the likelihood ratio associated with the two hypotheses. Under certain hypotheses (see [14]), it has been shown that this ratio L can be written in the form

$$L = \frac{N_R}{2}\,\log(1 + W) = \log\frac{f_1}{f_0} \qquad (34)$$

where $f_0$ and $f_1$ are respectively the likelihood functions under hypotheses H0 and H1, and where W is proportional to a random variable following a Fisher distribution, which makes it possible, assuming the prior selection of an error probability (for example $\alpha = 0.05$), to fix a decision threshold for the hypothesis test. In many cases, for coding applications, the likelihood functions are relative to the motion-compensated mean square errors.
2. Statistical information criteria
In this context, it is possible to use Akaike's and Rissanen's statistical information criteria [40] which, for a given model, evaluate both its performance and its complexity. Generally speaking, these two criteria are expressed in the form:

$$\text{AKAIKE criterion:}\quad C = -2\log f(y/\theta) + 2\dim(\theta) \qquad (35)$$

$$\text{RISSANEN criterion:}\quad C = -2\log f(y/\theta) + 2\dim(\theta)\log N_R \qquad (36)$$

where $f(y/\theta)$ is the likelihood of y conditional on $\theta$. The first terms of these two criteria constitute the model performance measures (likelihood), whilst the second are penalization terms for complex models.
A practical implementation, in order to obtain motion compensation using a motion model hierarchy, was tested in [39] by using a measurement criterion derived from the Rissanen criterion and compatible with the function $(\frac{1}{N_R}\sum \mathrm{DFD}^2)$ to be minimized, already used in the vector estimation process. This criterion is expressed by:

$$C = \log\left(\frac{1}{N_R}\sum_{(i,j)\in R} \mathrm{DFD}^2((i,j),\theta)\right) + \mu\,\frac{r(\theta)}{N_R} \qquad (37)$$

where
- $\mu$ is a weighting coefficient (for example, $\mu = 0.1$);
- $r(\theta)$, the motion model encoding rate, represents the volume of binary information (in the entropic sense, for example) required to represent and transmit the parameter vector $\theta$.

If this criterion is applied to two motion models $\theta_1$ and $\theta_2$, then model 1 will be selected if

$$C_1 < C_2 \qquad (38)$$
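As an illustration, the criterion (37) can be evaluated for two competing models; the rate values, the weighting $\mu = 0.1$ and the residual statistics below are illustrative assumptions, not values from the text:

```python
import numpy as np

def criterion(dfd, rate_bits, mu=0.1):
    """Eq. (37): log of the mean squared DFD plus a weighted rate penalty."""
    n = len(dfd)
    return np.log(np.mean(dfd ** 2)) + mu * rate_bits / n

rng = np.random.default_rng(1)
n = 64 * 64                          # region surface N_R
dfd_const = rng.normal(0, 4.0, n)    # residuals of a 2-parameter model
dfd_affine = rng.normal(0, 3.9, n)   # residuals of a 6-parameter affine model

c_const = criterion(dfd_const, rate_bits=2 * 8)    # 2 parameters, 8 bits each
c_affine = criterion(dfd_affine, rate_bits=6 * 8)  # 6 parameters, 8 bits each
# The model with the smaller criterion is retained (Eq. (38))
print(c_const, c_affine)
```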
2.4 Estimation of 3-D motion
The estimation of 3-D motion based on image sequences can be carried out naturally using two distinct approaches. The first of these, called the two-stage method, consists of calculating these 3-D motions from a previously estimated 2-D apparent motion vector field. The second, called the direct method, attempts to evaluate these 3-D motions directly from the spatio-temporal derivatives of the intensity function. We describe these two general approaches below.
2.4.1 Two-stage estimation methods
This approach, which is similar to that evoked in paragraph 8.2.2 for the estimation of a 2½-D parametric model from a 2-D motion vector field, is based on the following scheme:
- stage 1: estimation of a 2-D displacement vector field, which will be sparse (discrete methods of matching 2-D primitives) or dense (differential methods), by one of the estimation methods described in Chapter 3;
- stage 2: by the equations linking the projected 2-D motions and the 3-D motions (see paragraph 8.1.1 in the case of a dense field), this second stage identifies the 3-D motion parameters based on the 2-D primitives field.
We will deal with the case of discrete methods in Section 8.3, since it is very similar to the problem of stereovision-motion cooperation on discrete primitives. Within the context of differential methods, many authors ([1], [45], [55]) pose the problem of the determination of motion and of the 3-D structure from apparent motion in the form of the minimization of a quadratic criterion based on the equations of the 2-D/3-D relations. Even under the hypothesis of the observation of rigid objects, Equation (5) shows that this optimization problem is non-linear.
As an example, in the case of differential methods, Adiv [1] breaks this estimation process down into two stages. The first of these consists of segmenting an apparent motion vector field (assumed to have been calculated previously) into regions corresponding to planar facets. The parametric motion models are thus the quadratic models defined by the equations at (11). The estimation technique is based on a generalized Hough transform: from Equation (5), the energy function $\Phi$ is defined by

$$\Phi = \sum_R \left[(u - \alpha - \gamma_T\, z)^2 + (v - \beta - \delta_T\, z)^2\right] \qquad (39)$$

with

$$\begin{cases}
\alpha = -xy\,\Omega_X + (1 + x^2)\,\Omega_Y - y\,\Omega_Z \\
\beta = -(1 + y^2)\,\Omega_X + xy\,\Omega_Y + x\,\Omega_Z \\
\gamma_T = \dfrac{T_X - x\,T_Z}{\|T\|} \\
\delta_T = \dfrac{T_Y - y\,T_Z}{\|T\|} \\
z = \dfrac{\|T\|}{Z}
\end{cases} \qquad (40)$$

which separates the terms involving the instantaneous translation vector $T = (T_X, T_Y, T_Z)^t$ from those involving the instantaneous rotation vector $\vec\Omega = (\Omega_X, \Omega_Y, \Omega_Z)^t$. Assuming stationarity of the energy function $\Phi$ with respect to the relative depth variable $z$ ($\frac{\partial\Phi}{\partial z} = 0$), we can deduce the optimum relative depth

$$z = \frac{(u - \alpha)\,\gamma_T + (v - \beta)\,\delta_T}{\gamma_T^2 + \delta_T^2} \qquad (41)$$

which, carried over into Equation (39), gives

$$\Phi = \sum_R \frac{\left[(u - \alpha)\,\delta_T - (v - \beta)\,\gamma_T\right]^2}{\gamma_T^2 + \delta_T^2} \qquad (42)$$

The unit vector $\frac{T}{\|T\|}$ can then be parameterized in an angular space $(\varphi, \psi)$ such that

$$\frac{T_X}{\|T\|} = \sin\varphi\cos\psi,\qquad \frac{T_Y}{\|T\|} = \sin\varphi\sin\psi,\qquad \frac{T_Z}{\|T\|} = \cos\varphi \qquad (43)$$

and the energy function $\Phi$ is then parameterized as $\Phi(\varphi, \psi, \vec\Omega)$. The generalized Hough transform makes it possible to calculate the optimum couple $(\vec\Omega, T)$ such that

$$(\vec\Omega, T) = \arg\min \Phi(\varphi, \psi, \vec\Omega) \qquad (44)$$
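A toy version of this Hough-style search over the translation direction can be sketched as follows; it is restricted to the pure-translation case ($\vec\Omega = 0$), with synthetic flow vectors and a coarse angular grid, all of which are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 300
x = rng.uniform(-0.5, 0.5, N)
y = rng.uniform(-0.5, 0.5, N)
Z = rng.uniform(2.0, 5.0, N)              # unknown relative depths
T_true = np.array([1.0, 0.5, 2.0])
T_true /= np.linalg.norm(T_true)

# Translational flow (Eq. (40) with Omega = 0): u = gamma_T z, v = delta_T z
u = (T_true[0] - x * T_true[2]) / Z
v = (T_true[1] - y * T_true[2]) / Z

def residual(phi, psi):
    """Depth-eliminated error (Eq. (42), Omega = 0) of direction (phi, psi)."""
    T = np.array([np.sin(phi) * np.cos(psi),
                  np.sin(phi) * np.sin(psi),
                  np.cos(phi)])
    g = T[0] - x * T[2]                   # gamma_T (||T|| = 1 here)
    d = T[1] - y * T[2]                   # delta_T
    return np.sum((u * d - v * g) ** 2 / (g ** 2 + d ** 2 + 1e-12))

# Coarse Hough-style accumulation over the (phi, psi) grid
phis = np.linspace(0.01, np.pi / 2, 60)
psis = np.linspace(0.0, np.pi / 2, 60)
scores = np.array([[residual(p, q) for q in psis] for p in phis])
i, j = np.unravel_index(scores.argmin(), scores.shape)
T_est = np.array([np.sin(phis[i]) * np.cos(psis[j]),
                  np.sin(phis[i]) * np.sin(psis[j]),
                  np.cos(phis[i])])
print(T_est)                              # close to T_true
```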
On completion of this first stage, a fusion of adjacent components corresponding to the same parametric transformation is carried out, using least squares criteria. The algorithm continues by iterative sequencing of these motion-structure parameter estimation procedures and of the grouping together of regions which correspond to a single transformation. Adiv [2] extends his work by raising the ambiguities inherent in the estimation of 3-D motion and depth; these ambiguities are essentially of two types:
- a single 2-D field can have several 3-D interpretations (non-uniqueness of the representation) [2], [5], [51];
- an estimation bias on the 2-D primitives field induces an estimation bias on the 3-D parameters, and often creates phenomena of instability in the estimations.
2.4.2 Direct estimation methods
These methods seek to mitigate the drawbacks mentioned previously by direct estimation of the parameters linked to the 3-D motions and structures, without previously estimated apparent motion fields. In this context, we again find extensions of estimation methods known in the 2-D case, such as recursive estimation methods extended to parametric motion models ([36], [9], [10], [39]), and iterative estimation methods based on the "brightness change equation", the motion constraint equation extended to the case of 3-D motions and particular 3-D structures (planar or quadratic surfaces, ...) [18], [33].
Dugelay and Pele [9], and Netravali and Salz [36], start off from the following three-stage approach:
- from the Equations (11) defining the relations between the apparent motion description parameters $A$, those of 3-D motion $C = (\vec\Omega, \vec T)^t$ and those of structure $K = (\frac{n_X}{n_Z}, \frac{n_Y}{n_Z}, 1)^t$,
- and from an initial vector or previous estimate $C^{n-1}, K^{n-1}$,
it is possible to repeat the following three stages:
- Stage 1: calculation of $A^{n-1}$ from the initial values $C^{n-1}, K^{n-1}$, using Equation (11);
- Stage 2: a differential method estimates a corrective term $\Delta A^{n-1}$ by a gradient algorithm, as follows (see Equations (28) to (30)):

$$\Delta A^{n-1} = -\varepsilon \sum_{p\in R} \mathrm{DFD}(p, A^{n-1}, t-1)\; \vec\nabla_A \mathrm{DFD}(p, A^{n-1}, t-1) \qquad (45)$$

- Stage 3: based on the system of Equations (11), calculation of the parameters $C^n$ and $K^n$ as a function of $(A^{n-1} + \Delta A^{n-1})$.

This system of 8 unknowns and 8 non-linear equations is solved by successive linearizations (Newton's method), for example.
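The successive-linearization step can be sketched with a generic Newton iteration; the 2x2 toy system below merely stands in for the 8-equation system of the text:

```python
import numpy as np

def newton(F, J, x0, tol=1e-10, max_iter=50):
    """Generic Newton iteration x <- x - J(x)^-1 F(x) by successive linearization."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        step = np.linalg.solve(J(x), F(x))
        x = x - step
        if np.linalg.norm(step) < tol:
            break
    return x

# Toy 2x2 non-linear system standing in for the 8x8 one of the text:
#   x^2 + y^2 = 4,   x * y = 1
F = lambda v: np.array([v[0] ** 2 + v[1] ** 2 - 4.0, v[0] * v[1] - 1.0])
J = lambda v: np.array([[2 * v[0], 2 * v[1]], [v[1], v[0]]])

sol = newton(F, J, [2.0, 0.3])
print(sol)   # a root satisfying both equations
```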
The second family of approaches ([18], [32], [33]) consists of starting from the hypothesis of the temporal invariance of the intensity function, expressed by the motion constraint equation

$$I_x u + I_y v + I_t = \vec\nabla I\cdot\vec v + I_t = 0 \qquad (46)$$

In vectorial form, Equation (4), deduced from the perspective projection model, can be expressed as

$$\vec v = \begin{bmatrix} u \\ v \\ 0 \end{bmatrix} = \frac{\vec z \wedge (\vec V \wedge \vec P)}{(\vec P\cdot\vec z)^2} \qquad (47)$$

where $\vec p = (x, y, 1)^t$, $\vec P = (X, Y, Z)^t$, $\vec V = (\dot X, \dot Y, \dot Z)^t$, $\vec v = (\dot x, \dot y, 0)^t$, and $\vec z$ is the unit vector along the optical axis, with $\vec p = \frac{\vec P}{\vec P\cdot\vec z}$. By substituting the expression of $\vec V$ (Equation (2)) into Equation (47), we obtain

$$\vec v = \vec z \wedge \left(\vec p \wedge \left(\vec p \wedge \vec\Omega + \frac{\vec T}{\vec P\cdot\vec z}\right)\right) \qquad (48)$$

The motion constraint equation (46), extended to the 3-D case, is then expressed as

$$\vec\nabla I\cdot\left(\vec z \wedge \left(\vec p \wedge \left(\vec p \wedge \vec\Omega + \frac{\vec T}{\vec P\cdot\vec z}\right)\right)\right) + I_t = 0 \qquad (49)$$

or, in more compact fashion, if $\vec s = (\vec\nabla I \wedge \vec z) \wedge \vec p$ and $\vec w = \vec s \wedge \vec p$, then Equation (49) becomes

$$\frac{\vec s\cdot\vec T}{\vec P\cdot\vec z} + \vec w\cdot\vec\Omega + I_t = 0 \qquad (50)$$

The resolution method often assumes a geometric structure model. For example, in the planar region case, the region of the 3-D points $\vec P$ is defined by $\{\vec P : \vec P\cdot\vec N = 1\}$, which is equivalent to $\vec p\cdot\vec N = \frac{1}{\vec P\cdot\vec z}$. The motion constraint equation then becomes

$$(\vec s\cdot\vec T)(\vec p\cdot\vec N) + \vec w\cdot\vec\Omega + I_t = 0 \qquad (51)$$

and the resolution in $(\vec T, \vec\Omega, \vec N)$ is carried out by iterative minimization of the functional

$$J = \iint_D \left((\vec s\cdot\vec T)(\vec p\cdot\vec N) + \vec w\cdot\vec\Omega + I_t\right)^2 dx\, dy \qquad (52)$$

These approaches are thus a direct extension of the iterative estimation method normally used in the 2-D case. Other region models have also been tried [33], such as quadratic patches, cylindrical surfaces, etc.
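To give the flavour of the resolution of (51)-(52): when the planar structure $\vec N$ is held fixed, the equation becomes linear in $(\vec T, \vec\Omega)$ and a least-squares solve suffices. The per-pixel quantities below are synthetic, and the alternation over $\vec N$ is omitted (illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
M = 500
p = np.column_stack([rng.uniform(-0.5, 0.5, (M, 2)), np.ones(M)])  # (x, y, 1)
grad = np.column_stack([rng.normal(0, 1, (M, 2)), np.zeros(M)])    # (Ix, Iy, 0)
z = np.array([0.0, 0.0, 1.0])                                      # optical axis

# s = (grad ^ z) ^ p  and  w = s ^ p, as in Eq. (50)
s = np.cross(np.cross(grad, z), p)
w = np.cross(s, p)

N_plane = np.array([0.1, -0.05, 0.5])     # planar structure, assumed known here
T_true = np.array([0.2, -0.1, 0.4])
Om_true = np.array([0.01, -0.02, 0.03])

# Synthesize I_t from Eq. (51), then recover (T, Omega) by least squares
It = -((s @ T_true) * (p @ N_plane) + w @ Om_true)
A = np.column_stack([s * (p @ N_plane)[:, None], w])
sol, *_ = np.linalg.lstsq(A, -It, rcond=None)
print(sol[:3], sol[3:])                   # recovered T and Omega
```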
2.5 Use of motion compensation in a predictive coding scheme
The use of parametric motion models within a predictive coding scheme with motion compensation (see Chapter 4 for an introductory description of these schemes) appears to be a natural extension of the usual case where a dense motion vector field compensates the image. As a matter of fact, as illustrated by Equation (10) in the context of a general quadratic model, if, for each region $R_m$ of the image, we have identified the motion parameter vector $\theta_m$ corresponding to the motion model, it is always possible to derive a dense apparent motion vector field from the $\{\theta_m\}$ and use it in a motion-compensated loop.

The prediction by motion compensation will be equal to

$$\hat I(i, j, k) = \tilde I(i - \hat u(\theta_m),\; j - \hat v(\theta_m),\; k - 1) \qquad (53)$$

for each pixel with coordinates $(i, j)^t$, where:
- $\tilde I$ indicates the previously reconstructed image;
- $\hat I$ indicates the current image to be predicted;
- $(\hat u, \hat v)$ is the dense field predicted from the field $\{(u, v)\}$ derived from the parameters $\{\theta_m\}$.

Because of the compact nature of the representation of the motion information by the $\{\theta_m\}$, it is this information that is usually transmitted, and in this case $\{(\hat u, \hat v)\}$ is selected as being the estimated field: $\hat u(i, j) = u(\theta_m)$ and $\hat v(i, j) = v(\theta_m)$ for each pixel $(i, j) \in R_m$.
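The prediction (53) can be sketched as follows for an affine model per region; the nearest-neighbour rounding, the clipping at the image border and the affine parameterization $\theta_m = (a, b, c, d, e, f)$ are illustrative choices:

```python
import numpy as np

def compensate(prev, labels, params):
    """Predict each pixel from the previous reconstructed image using the
    affine motion parameters of its region (integer-pel version of Eq. (53))."""
    H, W = prev.shape
    out = np.zeros_like(prev)
    jj, ii = np.meshgrid(np.arange(W), np.arange(H))
    for m, (a, b, c, d, e, f) in params.items():
        mask = labels == m
        u = a + b * ii[mask] + c * jj[mask]          # dense u(theta_m)
        v = d + e * ii[mask] + f * jj[mask]          # dense v(theta_m)
        src_i = np.clip(np.round(ii[mask] - u).astype(int), 0, H - 1)
        src_j = np.clip(np.round(jj[mask] - v).astype(int), 0, W - 1)
        out[mask] = prev[src_i, src_j]
    return out

prev = np.tile(np.arange(8.0), (8, 1))      # simple horizontal ramp image
labels = np.zeros((8, 8), dtype=int)
labels[:, 4:] = 1
params = {0: (0, 0, 0, 1, 0, 0),            # region 0: shift of 1 pixel in j
          1: (0, 0, 0, 0, 0, 0)}            # region 1: static
pred = compensate(prev, labels, params)
```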
Let us recall that, in such a scheme, the information transmitted has to be decomposed into four parts:
1. the image segmentation into N regions $\{R_m\}_{m=1,\ldots,N}$;
2. the type of model used for each region $R_m$;
3. the quantized motion parameter vector $\theta_m$ for each region $R_m$;
4. the quantized motion compensation error.
As far as the coding of the segmentation map is concerned, a compromise has to be found between the following two extreme cases:
1. an a priori known arbitrary segmentation, such as a block decomposition: the coding cost of such a segmentation is null;
2. a spatial segmentation adapted to each image: this entails an extensive coding cost, due to the irregularity of the edges obtained.
Binary coding schemes adapted to edges (for example Freeman codes) can be used, even if they may require a large bit rate to encode this map of contours. Quadtree decomposition allows good adaptation of the segmentation to the local contents of the image, at only a small coding cost, expressed by [24], [39], [41], [43]:

$$R_{quadtree} = \frac{4}{3}\,N_R - \frac{1}{3}\,N_{R_{init}} - N_{R_{min}} \qquad (54)$$

where $N_R$, $N_{R_{init}}$, $N_{R_{min}}$ designate respectively the number of regions within the final image after the quadtree decomposition, the number of regions within the initial image (initial grid) and the number of regions of minimal size (quadtree roots).
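Equation (54) is straightforward to evaluate; the region counts below are illustrative:

```python
def quadtree_rate(n_final, n_init, n_min):
    """Eq. (54): coding cost (in bits) of the quadtree segmentation map."""
    return (4.0 * n_final - n_init) / 3.0 - n_min

# e.g. a 16x16-block initial grid (1024 regions) refined to 1450 leaves,
# 100 of which reached the minimum block size:
print(quadtree_rate(1450, 1024, 100))   # 1492.0
```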
The coding cost of the label designating the motion model selected for the current region $R_m$ clearly only exists in the case of the use of a hierarchy of distinct motion models, and can be assessed by an entropy cost. The parameter vector $\theta_m$ is transmitted after quantization. Note that the various components of this vector do not require the same quantization accuracy: adapted quantizers must be designed for each component.

Finally, the coding of the prediction error by motion compensation again uses all the source-coding techniques (transform coding, entropy coding, ...), making it possible to decorrelate the information from a spatial or frequency point of view, and thus to reduce by as much the transmission cost of this information field. Figure 10 shows, applied to the so-called Interview image sequence, the motion-compensated error image when a motion-based quadtree segmentation is used. Moreover, the distortion vs. rate trade-off is assessed in Figure 11 for several linear scalar quantization versions of the motion-compensated errors.
Figure 10: Motion compensation of the "Interview" sequence using a "constant motion" model. (a) Motion-compensated differences: MSE=17.9, (b) quadtree segmentation (44 regions are not illustrated), (c) Reconstructed image, (d) Motion vector field.
2.6 Use of an analysis-synthesis coding approach
The estimation schemes previously described lend themselves well to the definition of schemes involving object-oriented coding by analysis-synthesis. The first work carried out in this field ([3], [11], [12], [20]) assumed an extensive knowledge of the nature of the objects manipulated, and restricted itself to a particular category of scenes, such as the
Figure 11: "Interview" sequence. Compression ratio and MSE for different values of the elementary quantization step for the motion-compensated errors.
motion of human faces (videophone services or video conferencing at the very low rates envisaged). In this case, the hypotheses of the preceding paragraphs, used to establish the relations between 3-D motion/structure and apparent 2-D motion, were valid: rigid objects decomposed into planar surfaces, small rotation angles, small depth variation between two successive images. Musmann et al. [30] and Hotter [19] develop such an analysis-synthesis object-oriented coding approach, using either 2-D motion estimation by linear regression methods, or 3-D estimation by prediction/verification methods. The general scheme of the approach is described in Figure 12. The sequence analysis phase concerns the extraction of three types of information:
- the shape of objects (regions);
- their motion;
- their texture.
Figure 12: Block diagram of an object-oriented analysis-synthesis coder (image analysis driven by a source model; coding, transmission and decoding of the motion, shape and texture parameters; image synthesis from a memory of object parameters).
Since these information fields are different in nature, a specific coding procedure is used for each of them. The shape information describes the outline of objects, and is naturally coded by contour coding techniques; only the temporal changes in shape are coded predictively. Motion information is also coded predictively, in relation to the motion parameters estimated for the same object in the previous image. Finally, the radiosity (texture) information can be compressed by hybrid coding techniques with motion compensation.

In conclusion, let us note that these analysis-synthesis coding approaches are often limited to the identification of 2½-D parametric motion models, without seeking the whole range of 3-D motion + structure parameters. Such a full range would make it possible to synthesize the scene not only from the true viewing angle at the current moment, but also from all intermediate relative sensor-object positions, which would make it possible to obtain efficient temporal or spatial interpolation schemes. This remains difficult to achieve, however, given the current levels of accuracy obtained on the 3-D structural parameters after identification, and given that these parameters are only known up to a relative depth factor. The stereovision-motion cooperation techniques dealt with in the next section can make it possible, in part, to overcome these disadvantages.
3 Motion estimation methods in the binocular case
3.1 Introduction
Unlike the monocular case, here we assume the availability of several stereoscopic sensors, making it possible to perceive, from several points of view and at different moments (stereoscopic sequences), the scene composed of 3-D objects undergoing 3-D motions. Various experimental contexts can be studied:
- number of sensors: at least two cameras, in order to allow the creation of a stereoscopic effect. This number can be greater (the case of trinocular vision, for example, has been explored), in order to facilitate the matching phase and to resolve certain ambiguities more easily.
- geometry of the stereoscopic system: most studies which have dealt with this algorithmic theme of stereo-motion cooperation use a stereoscopic system in which the cameras are set out in parallel in a single plane (i.e., the image planes are identical), which assumes focusing at an infinite depth, and where the geometric baseline separating the sensors is large (i.e., greater than the distance of about 65 mm corresponding to the human visual system). These choices are clearly incompatible with the optimal conditions for the quality of relief perception (see the paragraph concerning the use of these techniques in 3-D TV), for which the respect of different levels of conformity is conventionally introduced.
- calibration of the stereoscopic system: this procedure signifies the prior identification of the intrinsic parameters of each sensor (focal length, coordinates of the optical center, radial distortion factor, ... see Chapter 1), as well as of the extrinsic parameters linking, by a geometric screw $(R_{rl}, T_{rl})$ (3-D rotation + 3-D translation), the relative reference frames attached to each sensor ($l$ = "left" sensor, $r$ = "right" sensor in this paragraph).
This calibration phase enables:
- the establishment of the equations linking the 2-D pixel coordinates to the 3-D point coordinates:

$$\begin{bmatrix} Zx \\ Zy \\ Z \end{bmatrix} = \begin{bmatrix} F_x & 0 & x_c \\ 0 & F_y & y_c \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} \qquad (55)$$

where $(X, Y, Z)$ designate the coordinates of a 3-D point, $(x, y)$ the 2-D pixel coordinates, and $(x_c, y_c, F_x, F_y)$ the intrinsic parameters of the sensor (case of a perspective projection sensor model without radial distortion);

- the passage from the "left" coordinate reference frame to the "right" one, and vice versa:

$$\begin{bmatrix} X \\ Y \\ Z \end{bmatrix}_r = R_{rl} \begin{bmatrix} X \\ Y \\ Z \end{bmatrix}_l + T_{rl} \qquad (56)$$
- the definition of the epipoles: the right (resp. left) epipole is the projection onto the right (resp. left) image plane of the optical center of the left (resp. right) camera; epipolar lines, which pass through the epipoles, are associated in pairs between the two views. This epipolar geometry makes it possible to constrain analytically the geometry of the search window during the matching of primitives between the left and right images.
It is clear that in the absence of any calibration, only fairly rough heuristics can be used:
- selection of the optical center at the center of the image;
- focal parameters fixed without identification;
- search window limited to a number of pixels directly in the image plane, and hypothesis of horizontal epipolar lines.
These heuristic selections naturally introduce large sources of error into the motion and disparity estimation algorithms then used. Tamtaoui [47] carried out a study of the robustness of these algorithms in the face of such errors or inaccuracies in the calibration parameters.
Once these experimental selections have been made, the problem of 3-D or 2-D motion estimation in the context of stereoscopic sequences is posed in these terms: in the short term, at two successive moments $(t, t+1)$, as illustrated in Figure 13, we have four observation fields (in the binocular case dealt with here) of a 3-D primitive P moving in 3-D space, in the case of a rigid object according to the kinematic screw $\vec V = (\vec T, \vec\Omega)$; from these four observation fields, various 2-D, 2½-D or 3-D information fields can be identified:
- disparity fields ($\{\Delta_t\}$ at time t and $\{\Delta_{t+1}\}$ at time t+1, respectively), by standard primitive matching techniques;
- 2-D apparent motion vector fields ($\{\delta_l\} = \{\vec d_l\}$ on the left sequence and $\{\delta_r\} = \{\vec d_r\}$ on the right sequence, respectively), by use of a monocular 2-D apparent motion estimation algorithm;
- motion descriptor fields (resp. $\{\Theta_l\}$ and $\{\Theta_r\}$), dependent on a previously defined motion model;
- 3-D motion and structure parameter fields, by the monocular methods applied here to each stereoscopic sequence.
Figure 13: Stereo-motion observation space and associated identifiable information fields.
We will not go back over the estimation techniques for these various information fields, which have already been studied in Chapter 3 and at the beginning of this chapter in the monocular case. However, let us remember that the manipulated primitives can be of different levels:
- pixel primitives: the information fields are dense;
- contour or region primitives: the information fields are sparse.
Below we discuss more particularly the various sequencing or matching possibilities of these stereo-motion primitive estimation procedures; three approaches are distinguished. The first consists of identifying the 3-D motion of objects by temporal matching of 3-D primitives (the "stereo then 3-D motion" approach). The second consists of starting with the 2-D apparent motion fields, independently estimated in each stereoscopic sequence, and then recovering, by the stereoscopic relations, the 3-D motion and structure information fields (the "3-D motion then stereo" approach). Finally, the third approach, which is meant to be better adapted to the use of these motion estimation techniques in a coding context, carries out the joint estimation of the motion descriptor fields simultaneously in both stereoscopic sequences, respecting the constraints due to the intrinsic stereoscopic geometry (the "2-D, 2½-D stereo-constrained motion" approach).
3.2 3-D motion by matching 3-D primitives
This approach can be arranged as follows:
- Stage 1: after identification of a disparity field $\{\Delta_t\}$ (resp. $\{\Delta_{t+1}\}$) throughout the sequence, a depth map $\{Z_t(x, y)\}$ (resp. $\{Z_{t+1}(x, y)\}$) is produced for every stereoscopic couple of images.
- Stage 2: a matching phase for the 3-D primitives obtained from the successive depth maps is used.
- Stage 3: the instantaneous depth maps and the matching previously carried out make it possible to deduce the 3-D motions + structure of the manipulated primitives.

Several authors have studied this type of approach, trying to minimize the number of 3-D primitives to be matched. Leung and Huang [23], Netravali et al. [34], and Mitiche and Bouthemy [27] worked on 3-D pixel-based primitives: since theoretically three non-colinear points are enough to determine the 3-D motion of a rigid object, a sparse 3-D point depth map is first formulated by stereo-matching. A temporal matching on one of the stereoscopic sequences then makes it possible to identify the 3-D motion of these points. The raising of certain ambiguities is then effected by verifying, on the other stereoscopic sequence, a matching of the projected 3-D points. Kim and Aggarwal [21] base their approach on the joint extraction of depth maps on contour primitives, extracted by zero crossings of Laplacians, and on pixel-based primitives, extracted by the Moravec operator. A two-pass relaxation method (in order to ensure the symmetry of the temporal matching) is used to link the 3-D primitive maps of two successive images (t) and (t+1); the cost function for the relaxation procedure is based on the notion of motion invariants for rigid bodies, such as distance ratios or angles between primitives. Lingxiao et al. [25] present a method in which the estimation phases of the instantaneous rotation vector and of the translation are uncoupled. Firstly, the centroids of the pixel sets of the left and right views are superposed; on this new set of translated points, the calculation of the rotation vector $\vec\Omega$ is carried out by a least mean squares method in the case of a planar structure; finally, the translation vector $\vec T$ is deduced from Equation (2) itself.
Many other studies have introduced alternative algorithms to those described here. Due to the sparse nature of the processed primitive fields, these stereo-motion cooperation algorithms are intended more particularly for the reconstitution of 3-D objects, or as navigation aids for robots by dynamic stereoscopic vision [31], [49]. In stereoscopic sequence coding, it is still necessary to segment and interpret, in terms of motion and 3-D structures, a complete partition of the images, which makes the two complementary approaches developed below more attractive.
3.3 3-D motion based on 2-D motion elds
Another approach to the calculation of the 3-D motion and structure parameters is based on the combination of 2-D apparent motion fields previously and independently estimated on each of the stereoscopic sequences.
Mitiche [28] starts from the hypothesis of the observation of at least four 3-D points in the two stereoscopic sequences. Each point satisfies the equations

$$\begin{cases}
\begin{bmatrix} x_r & y_r & 1 \end{bmatrix} A \begin{bmatrix} x_l \\ y_l \\ 1 \end{bmatrix} = 0 \\[8pt]
\begin{bmatrix} u_r & v_r & 0 \end{bmatrix} A \begin{bmatrix} x_l \\ y_l \\ 1 \end{bmatrix} + \begin{bmatrix} x_r & y_r & 1 \end{bmatrix} A \begin{bmatrix} u_l \\ v_l \\ 0 \end{bmatrix} = 0
\end{cases} \qquad (57)$$

where A, a 3x3 matrix, depends only on the relative displacement $(R_{rl}, T_{rl})$ between the systems of coordinates linked to the stereoscopic cameras.
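The linear identification of A can be sketched on synthetic correspondences: both lines of (57) contribute one homogeneous equation per point, and the null vector of the stacked system recovers A up to scale. A few more than the theoretical four points are used here for numerical comfort, and all data are simulated:

```python
import numpy as np

rng = np.random.default_rng(4)
A_true = rng.normal(size=(3, 3))

def make_correspondence():
    """One matched point and its 2-D motions, consistent with Eq. (57)."""
    while True:
        pl = np.array([rng.uniform(-1, 1), rng.uniform(-1, 1), 1.0])
        line = A_true @ pl                # epipolar line of pl in the right view
        if abs(line[1]) > 0.3:            # avoid a degenerate division below
            break
    xr = rng.uniform(-1, 1)
    yr = -(line[0] * xr + line[2]) / line[1]   # place pr on the epipolar line
    pr = np.array([xr, yr, 1.0])
    ul, vl, ur = rng.normal(size=3)
    t2 = pr @ A_true @ np.array([ul, vl, 0.0])
    vr = -(ur * line[0] + t2) / line[1]        # second line of (57), solved for vr
    return pl, pr, np.array([ul, vl, 0.0]), np.array([ur, vr, 0.0])

rows = []
for _ in range(6):                        # 4 points suffice in theory
    pl, pr, ml, mr = make_correspondence()
    rows.append(np.outer(pr, pl).ravel())                      # pr^t A pl = 0
    rows.append((np.outer(mr, pl) + np.outer(pr, ml)).ravel())
_, _, Vt = np.linalg.svd(np.array(rows))
A_est = Vt[-1].reshape(3, 3)              # null vector = A up to scale
sign = np.sign(A_true.ravel() @ A_est.ravel())
A_norm = sign * A_est / np.linalg.norm(A_est)
print(A_norm)                             # proportional to A_true
```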
The identification of A (which represents 8 unknown variables after normalization) can be carried out by resolution of the linear system on the four observed points. By using the apparent motion field itself, this solves the problem of calibrating the stereoscopic system. For all other matched 2-D point sets, it will therefore be possible to recover the depth information by simple triangulation, and thus to obtain access to the 3-D kinematic screw $(\vec T, \vec\Omega)$ by resolution of the system of Equation (5) (linear in $\vec T$ and $\vec\Omega$), once this depth map is known.

Waxman et al. [53], [54] studied, in particular, the relations between the 2-D motion fields. They define the relative flow, or binocular difference flow, by
$$\vec d(x_l, y_l) = \vec d_r(x_l + \Delta(x_l, y_l),\; y_l) - \vec d_l(x_l, y_l) \qquad (58)$$

where $\Delta(x_l, y_l)$ designates the disparity measure obtained at the current point $(x_l, y_l)$ of the left view; in the case of parallel and aligned cameras (i.e., $Z_l = Z_r$ and $y_l = y_r$ at all points), it is expressed by

$$\Delta(x_l, y_l) = \frac{b}{Z_l(x_l, y_l)} \qquad (59)$$

where b measures the distance (baseline) between the two stereoscopic sensors.

Equation (5) is reformulated, by separating the terms linked with the instantaneous translation $\vec T$ from those linked with the rotation $\vec\Omega$, as

$$\vec d(x, y) = \begin{pmatrix} \dot x \\ \dot y \end{pmatrix} = \begin{pmatrix} u \\ v \end{pmatrix} = \frac{1}{Z(x, y)}\, A(x, y)\,\vec T + B(x, y)\,\vec\Omega \qquad (60)$$

From Equations (58) to (60), we deduce the following analytical relations between the disparity field, the relative flow components and the 3-D motion (in the case of aligned cameras):

$$\frac{u(x_l, y_l)}{\Delta(x_l, y_l)} = \frac{\Delta(x_l, y_l)}{b}\,T_Z + y_l\,\omega_X - x_l\,\omega_Y, \qquad \frac{v(x_l, y_l)}{\Delta(x_l, y_l)} = 0 \qquad (61)$$
If a planar structure hypothesis is used, i.e., $\frac{1}{Z(x, y)} = n_X\,x + n_Y\,y + n_Z$, then the relations between the 3-D motion + structure parameters, the disparity field and the relative flow field can be established simply by

$$\frac{u(x_l, y_l)}{\Delta(x_l, y_l)} = n_Z\,T_Z + (n_X\,T_Z - \omega_Y)\,x_l + (\omega_X + n_Y\,T_Z)\,y_l, \qquad \frac{v(x_l, y_l)}{\Delta(x_l, y_l)} = 0 \qquad (62)$$
In order to avoid bias in the estimation of the initial 2-D motion fields, the latter are filtered by adapted filters (radial flow filtering for the relative flow, second-order filtering for the fields themselves) [53]. The 3-D motion estimation method proceeds in accordance with the following principles:
- stage 1: estimation, segmentation and filtering of the 2-D apparent motion fields;
- stage 2: matching of primitives based on the coherence equations (62);
- stage 3: use of the disparity functions for the reconstitution of surfaces between the discontinuity regions detected during the monocular analysis (stage 1);
- stage 4: estimation of the 3-D motion parameters.
A temporal linking phase is also introduced, in order to allow "sub-pixel" accuracy in the estimated disparity field (by temporal interpolation) and tracking, along the temporal axis, of the discontinuity regions and matched segmented regions.
3.4 Joint motion estimation under stereoscopic constraints
In several applications, notably stereoscopic sequence coding, where 3-D reconstruction is not an aim, it is sometimes not necessary to go back as far as the estimation of explicit 3-D motion and structure parameters. A contrario, it appears interesting to carry out the 2-D or 2½-D motion descriptor estimation phases not independently on each stereoscopic sequence, but jointly, by introducing into the estimation schemes themselves stereoscopic constraints linking the two descriptor fields.

In the case where only dense 2-D primitive fields are estimated (disparity fields $\{\Delta_t\}$ and $\{\Delta_{t+1}\}$, and apparent motion fields $\{\delta_l\} = \{\vec d_l\}$, $\{\delta_r\} = \{\vec d_r\}$), an available coherence constraint for these fields is to impose, at each point of the image plane, the linear relation

$$\vec d_l + \vec\Delta_{t+1} - \vec d_r - \vec\Delta_t = 0 \qquad (63)$$

which consists of forcing the closure of the quadrilateral illustrated in Figure 13.
Such a relation makes it possible, knowing three of the information fields, to deduce the fourth; this is easily applied in the case where, the dense disparity fields being calculated on each stereoscopic pair, the knowledge of one motion field (for example on the left sequence) makes it possible to deduce the other field (on the right sequence). Tamtaoui and Labit [46] tested this estimation approach. It turns out that this constraint, which is too localized and too strong, notably in occlusion regions, can only provide an initial prediction of a field, which then has to be refined to obtain motion compensation results identical to the monocular case; obviously, this post-processing removes the previous stereoscopic constraint. Furthermore, this scheme remains very sensitive to the estimation bias of each of the information fields introduced.
An interesting alternative [46], [48] is to begin with a coherence equation linking the apparent motion fields $\vec d_l = (u_l, v_l)^t$ and $\vec d_r = (u_r, v_r)^t$ under stereoscopic constraints. This relation is established as follows: if

$$\begin{bmatrix} X \\ Y \\ Z \end{bmatrix}_r = R_{rl} \begin{bmatrix} X \\ Y \\ Z \end{bmatrix}_l + T_{rl} \qquad (64)$$

with $T_{rl} = (t_1, t_2, t_3)^t$ and $R_{rl} = (r_{ij})$, $i = 1, 2, 3$, $j = 1, 2, 3$, and if we assume that $Z_l = Z_r$ for all matched pixels (parallel cameras hypothesis), then it is possible to establish the following relation between the apparent 2-D motion fields:

$$\left(r_{21} - \frac{t_2}{t_1}\,r_{11}\right) u_l + \left(r_{22} - \frac{t_2}{t_1}\,r_{12}\right) v_l = -\frac{t_2}{t_1}\,u_r + v_r \qquad (65)$$

which can be put in the form $\alpha\,u_l + \beta\,v_l + \gamma\,u_r + \delta\,v_r = 0$ with

$$\begin{cases}
\alpha = \dfrac{r_{21}}{t_2} - \dfrac{r_{11}}{t_1} \\[4pt]
\beta = \dfrac{r_{22}}{t_2} - \dfrac{r_{12}}{t_1} \\[4pt]
\gamma = \dfrac{1}{t_1} \\[4pt]
\delta = -\dfrac{1}{t_2}
\end{cases} \qquad (66)$$

This is equivalent, in matrix form, to $C\cdot\Phi = 0$ with:
- $\Phi = (u_l, v_l, u_r, v_r)^t$ the motion vector linked to the two stereoscopic sequences;
- $C = (\alpha, \beta, \gamma, \delta)$ the coherence coefficients.
Tamtaoui and Labit [46] introduce this coherence equation within a pel-recursive type
estimation scheme, by minimization, using gradient techniques, of a quadratic reconstitution-error function $\Gamma$ linked
to the left and right sequences. Namely:

$$\Gamma(\Theta_{p_{lr}}) = \mathrm{DFD}^2(p_l, \vec{d}_l) + \mathrm{DFD}^2(p_r, \vec{d}_r) \qquad (67)$$

with $p_{lr}$ a couple of pixels $(p_l, p_r)$ matched together. The estimation algorithm is then
written:

$$\Theta^{k+1} = \Theta^k - P\, \nabla \Gamma(\Theta^k) \qquad (68)$$

with $P = I - C^t (C C^t)^{-1} C$. The matrix $P$ is the matrix of projection onto the coherence
space:

$$\left\{ \Theta \in \mathbb{R}^4 \;/\; C \cdot \Theta = 0 \right\} \qquad (69)$$
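A minimal sketch of the projected-gradient update of equation (68), with hypothetical coherence coefficients (in the actual scheme, $C$ comes from equation (66) and the gradient from the DFD terms of equation (67)):

```python
import numpy as np

# Hypothetical coherence row vector: C . Theta = 0 defines the
# coherence space of Eq. (69).
C = np.array([[0.02, -0.01, -0.01, 0.5]])  # shape (1, 4)

# Projection matrix of Eq. (68): P = I - C^t (C C^t)^{-1} C.
# P is symmetric and idempotent (P @ P == P): it projects onto
# the null space of C.
P = np.eye(4) - C.T @ np.linalg.inv(C @ C.T) @ C

def pel_recursive_step(theta, grad):
    """One update Theta^{k+1} = Theta^k - P grad Gamma(Theta^k)."""
    return theta - P @ grad

theta = np.zeros(4)                      # starts inside the coherence space
grad = np.array([0.3, -0.2, 0.1, 0.4])   # hypothetical gradient of Gamma
theta = pel_recursive_step(theta, grad)

print(C @ theta)  # ~0: the projected update never leaves C . Theta = 0
```

Since $CP = C - CC^t(CC^t)^{-1}C = 0$, every update direction is tangent to the coherence space, so the stereoscopic constraint is maintained at each iteration rather than enforced afterwards.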
This estimation technique (see Figure 16) compares favourably with independent monocular motion estimation techniques (see Figure 15) and with disparity estimation techniques (see Figure 14) used for compensation schemes.
Naturally, this approach on a dense field extends to region motion descriptor estimation
methods (see Figure 17) through the use of parametric motion models [47]. In addition to
the more global nature of these descriptors, such an approach appears more robust to
estimation bias on the disparity since, in this context, it is a matter of matching regions
and not points.
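As an illustration of such parametric region descriptors, a six-parameter affine motion model can be fitted to the motion vectors sampled over a region by least squares; this is a generic sketch, not the estimator of [47]:

```python
import numpy as np

def fit_affine_motion(pts, vecs):
    """Least-squares fit of the affine model
        u = a1 + a2*x + a3*y,   v = a4 + a5*x + a6*y
    to motion vectors `vecs` sampled at positions `pts` in a region."""
    A = np.column_stack([np.ones(len(pts)), pts[:, 0], pts[:, 1]])
    au, *_ = np.linalg.lstsq(A, vecs[:, 0], rcond=None)
    av, *_ = np.linalg.lstsq(A, vecs[:, 1], rcond=None)
    return np.concatenate([au, av])  # (a1, ..., a6)

# Synthetic region: sample an exact affine field and recover its parameters.
rng = np.random.default_rng(0)
pts = rng.uniform(0.0, 100.0, size=(50, 2))
true = np.array([1.0, 0.01, -0.02, -0.5, 0.0, 0.03])
u = true[0] + true[1] * pts[:, 0] + true[2] * pts[:, 1]
v = true[3] + true[4] * pts[:, 0] + true[5] * pts[:, 1]

params = fit_affine_motion(pts, np.column_stack([u, v]))
print(np.round(params, 4))  # close to `true`
```

Six parameters per region replace one vector per pixel, which is the compactness argument made for these descriptors.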
The results below illustrate the performances achieved using these joint estimation
algorithms, in terms of the quality of reconstitution after motion compensation and
the quality of the motion fields obtained.
Figure 14: (a) Reconstructed "Campagne" image using disparity compensation, (b) Corresponding disparity compensation errors (MSE=54.24)
3.5 Application to coding of stereoscopic sequences (3-D TV)
3.5.1 The general context of 3-D TV
As Figure 18 illustrates, a three-dimensional television system (3-D TV) consists of the following
elements:

a stereoscopic capture system (at least two cameras, calibrated or not);

a coder-decoder implementing a compression phase for transmission or storage of
stereoscopic sequences;

a 3-D display, for which various technologies exist: dual screens with polarizing filters,
glasses with synchronized shutters, lenticular-plate screens, etc.
The motion estimation algorithms using stereovision-motion cooperation, mentioned
in the previous paragraphs, integrate naturally into such an application context in order to analyze stereoscopic source sequences and code them by motion and/or disparity
compensation.
3.5.2 Stereoscopic sequence coding strategies
We remain within the context of compatible coding-decoding-restitution approaches, i.e.,
those which permit restoration of a monocular view if the receiver does not have a 3-D display.
Two definitions of compatibility can then be introduced (see Figure 19):
Figure 15: (a) Reconstructed "Campagne" image using motion compensation (Walker-Rao pel-recursive method), (b) Corresponding motion-compensated errors (MSE=7.92), (c) Motion vector field
1. in the first approach, we assume the coding of one of the stereoscopic sequences (for
example the left, as illustrated in Figure 19) by a standard monocular sequence
compression technique. The second sequence will be coded by:

disparity compensation [57] (example in Figure 14);

motion compensation [47], [10] (examples in Figures 15 to 17).

The second coding channel is thus used to transmit compensation errors and, if
necessary, i.e. if the disparity and motion information fields are used non-predictively in
the compensation scheme, these fields should also be transmitted.

In this case, an effective stereo-motion cooperation approach makes it possible:

to compare the two possible types of compensation;
Figure 16: (a) Reconstructed "Campagne" right image using joint coherent motion compensation on the two stereoscopic sequences, (b) Corresponding motion-compensated errors (MSE=3.73), (c) Motion vector field
to restrict the volume of information which represents these fields, by taking
account of the equations of geometric dependence which link them (the coherence equations described just before);

to minimize depth-perception artefacts, which are linked to an independent
view-to-view reconstitution by purely monocular approaches.
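The first of these points, the comparison between the two possible compensations, can be sketched per block as keeping whichever predictor gives the smaller squared compensation error (hypothetical data; a real coder would also weigh the cost of transmitting the corresponding side information):

```python
import numpy as np

def best_predictor(block, motion_pred, disparity_pred):
    """Per-block choice between motion and disparity compensation:
    keep the predictor with the smaller squared compensation error."""
    e_motion = float(np.sum((block - motion_pred) ** 2))
    e_disparity = float(np.sum((block - disparity_pred) ** 2))
    if e_motion <= e_disparity:
        return "motion", e_motion
    return "disparity", e_disparity

block = np.array([[10.0, 12.0], [11.0, 13.0]])  # current right-image block
motion_pred = block + 0.5     # motion-compensated prediction (error 1.0)
disparity_pred = block + 2.0  # disparity-compensated prediction (error 16.0)

choice, err = best_predictor(block, motion_pred, disparity_pred)
print(choice, err)  # -> motion 1.0
```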
2. the second approach appears as an attractive, but more difficult to achieve, extension of the previous notion of compatibility. Prior to any coding of the stereoscopic
sequences, a joint stereo-motion analysis is carried out. From this processing phase
are generated, on the one hand, a "compatible" monocular sequence, which can be
situated at an intermediate position between the viewpoints of the left and right
cameras, and, on the other hand, innovation information (identical in nature to the
compensation error information previously described) with regard to this compatible sequence.

Figure 17: (a) Reconstructed "Campagne" right image using joint coherent quadtree-based affine motion estimation on the two stereoscopic sequences, (b) Corresponding motion-compensated errors (MSE=15.16), (c) Motion vector field

Such an approach is well adapted to the case of the use of 3-D
motion+structure estimation methods which, once carried out, make it possible to
synthesize the 3-D scene perceived from all viewing angles. This coding strategy,
difficult because of the even more imprecise nature of the 3-D parameter estimations
obtained on true stereoscopic sequences, can be considered as a natural extension of
the Analysis-Synthesis or object-oriented coding approaches, described in paragraph
8.2.6 for simple objects.
Figure 18: General scheme of a 3-D TV system (stereoscopic source: two cameras and digitization; encoding; transmission; decoding; stereoscopic display, by a synchronized or polarized binocular system or a lenticular sheet)

Figure 19: Compatibility approach for transmission of stereoscopic image sequences (stereo analysis of the left and right sequences; coding and transmission of a compatible image and a residual; reconstruction of the compatible image and the stereo image)

References

[1] G. Adiv, "Determining three-dimensional motion and structure from optical flow generated by several moving objects", IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. PAMI-7, pp. 384-401, July 1985.
[2] G. Adiv, "Inherent ambiguities in recovering 3D motion and structures from a noisy flow field", IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. PAMI-11, pp. 477-489, May 1989.

[3] K. Aizawa, H. Harashima, and T. Saito, "Model-based analysis synthesis image coding (MBASIC) system for a person's face", Signal Processing: Image Communication, Vol. 1, pp. 139-152, 1989.
[4] P. Anandan, "A unified perspective on computational techniques for the measurement of visual motion", Proc. of the 1st Int. Conf. on Computer Vision, pp. 219-230, May 1987.

[5] J.L. Barron, A.D. Jepson, and J.K. Tsotsos, "The feasibility of motion and structure from noisy time-varying image velocity information", Int. Journal of Computer Vision, pp. 239-269, 1990.

[6] P. Bouthemy and J. Santillana-Rivero, "A hierarchical likelihood approach for region segmentation according to motion-based criteria", Proc. of the 1st Int. Conf. on Computer Vision, London, pp. 463-467, 1987.

[7] N. Diehl, "Object-oriented motion estimation and segmentation in image sequences", Signal Processing: Image Communication, Vol. 3, No. 1, pp. 23-56, 1991.

[8] E. Dubois, "Motion-compensated filtering of time-varying images", Multidimensional Systems and Signal Processing, No. 3, pp. 211-239, 1992.

[9] J.L. Dugelay and B. Choquet, "A 3D image analysis algorithm and stereoscopic television", Proc. of Festival Int. des Images 3D, Paris, Sept. 1991.

[10] J.L. Dugelay and D. Pele, "Motion and disparity analysis of a stereoscopic sequence: application to 3DTV encoding", European Conference on Signal Processing, EUSIPCO'92, Aug. 1992.

[11] R. Forchheimer and O. Fahlander, "Low bit rate coding through animation", Picture Coding Symposium, PCS'83, Davis, March 1983.

[12] R. Forchheimer, O. Fahlander, and T. Kronander, "A semantic approach to the transmission of face images", Picture Coding Symposium, PCS'84, Rennes, July 1984.

[13] E. Francois and P. Bouthemy, "The derivation of qualitative information in motion analysis", Proc. of the 1st European Conf. on Computer Vision, ECCV'90, pp. 226-230, 1990.

[14] E. Francois, Interprétation qualitative du mouvement à partir d'une séquence d'images, Ph.D. thesis, Université de Rennes-I, June 1991.

[15] R. Hartley, "Segmentation of optical flow fields by pyramid linking", Pattern Recognition Letters, Vol. 3, pp. 253-262, July 1985.

[16] M. Hötter, "Differential estimation of the global motion parameters zoom and pan", Signal Processing, Vol. 16, pp. 249-265, 1989.

[17] B.K.P. Horn and B. Schunck, "Determining optical flow", Artificial Intelligence, Vol. 17, pp. 185-203, 1981.

[18] B.K.P. Horn and J.R. Weldon, "Direct methods for recovering motion", Int. Journal of Computer Vision, Vol. 2, pp. 51-76, 1988.

[19] M. Hötter, "Object-oriented analysis-synthesis coding based on moving two-dimensional objects", Signal Processing: Image Communication, Vol. 2, pp. 409-429, 1990.
[20] M. Kanado, A. Koike and Y. Hatori, "Codings with knowledge-based analysis of motion pictures", Picture Coding Symposium, PCS'87, Stockholm, June 1987.

[21] Y.C. Kim and J.K. Aggarwal, "Determining object motion in a sequence of stereo images", IEEE Journal of Robotics and Automation, Vol. 3, No. 6, pp. 599-614, Dec. 1987.

[22] C. Labit and H. Nicolas, "Compact motion representation based on global features for semantic image sequence coding", Proc. of the SPIE Conf. on Visual Communication and Image Processing, VCIP'91, Vol. 2, pp. 697-709, Nov. 1991.

[23] M.K. Leung and T.S. Huang, "An integrated approach to 3D motion analysis and object recognition", IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. PAMI-13, No. 10, pp. 1075-1084, Oct. 1991.

[24] S.X. Li and M.H. Loew, "The quadcode and its arithmetic", Communications of the ACM, pp. 621-631, July 1987.

[25] L. Lingxiao, T.S. Huang et al., "Motion estimation from 3-D point sets with and without correspondences", Proc. of the Conf. on Computer Vision and Pattern Recognition, CVPR'86, pp. 194-201, 1986.

[26] H.C. Longuet-Higgins, "A computer algorithm for reconstructing a scene from two projections", Nature, Vol. 293, pp. 133-135, Sept. 1981.

[27] A. Mitiche and P. Bouthemy, "Tracking modelled objects using binocular images", Computer Vision, Graphics and Image Processing, Vol. 32, pp. 384-396, 1985.

[28] A. Mitiche, "On kineopsis and computation of structure and motion", IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. PAMI-8, No. 1, pp. 109-112, Jan. 1986.

[29] Y. Miyamoto and M. Ohta, "Global motion compensation for rotation and zooming image", Proc. of the Picture Coding Symposium, PCS'91, pp. 137-140, Sept. 1991.

[30] H.-G. Musmann, M. Hötter and J. Ostermann, "Object-oriented analysis-synthesis coding of moving images", Signal Processing: Image Communication, Vol. 1, pp. 117-138, 1989.

[31] N. Navab, Z. Zhang and O.D. Faugeras, "Tracking, motion and stereo", Proc. of the Scandinavian Conf. on Image Analysis, SCIA'91, pp. 98-105, 1991.

[32] S. Negahdaripour and A. Yuille, Direct passive navigation, I: analytical solutions for planes, AI Memo 863, MIT Artificial Intelligence Lab, August 1985.

[33] S. Negahdaripour and A. Yuille, "Direct passive navigation, II: analytical solutions for quadratic patches", Conf. on Computer Vision and Pattern Recognition, CVPR'88, pp. 404-410, 1988.

[34] A.N. Netravali, T.S. Huang et al., "Algebraic methods in 3D motion estimation from two-view point correspondences", Int. Journal of Imaging Systems and Technology, Vol. 1, pp. 78-99, 1989.
[35] A.N. Netravali and J.D. Robbins, "Motion compensated television coding: Part I", Bell Syst. Tech. Journal, Vol. 58, No. 3, pp. 631-670, March 1979.

[36] A.N. Netravali and J. Salz, "Algorithms for estimation of three-dimensional motion", AT&T Technical Journal, Vol. 64, No. 2, Feb. 1985.

[37] H. Nicolas and C. Labit, "Global motion identification for image sequence analysis and coding", Proc. of the Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP'91, Vol. 4, pp. 2825-2828, May 1991.

[38] H. Nicolas and C. Labit, "Region-based motion estimation using deterministic relaxation schemes for image sequence coding", Proc. of the Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP'92, Vol. 3, pp. 265-268, March 1992.

[39] H. Nicolas, Hiérarchie de modèles de mouvement et méthodes d'estimation associées. Application au codage de séquences d'images, Ph.D. thesis, Université de Rennes-I, Sept. 1992.

[40] J. Rissanen, "Modeling by shortest data description", Automatica, Vol. 14, pp. 465-472, 1986.

[41] H. Samet, "Quadtree from boundary codes", Communications of the ACM, pp. 163-170, March 1980.

[42] H. Sanson, "Motion affine models identification and application to television image coding", SPIE Conf. on Visual Communication and Image Processing, VCIP'91, Vol. 1605, pp. 570-581, Nov. 1991.

[43] J. Santillana-Rivero, P. Bouthemy and C. Labit, "Hierarchical motion-based image segmentation applied to HDTV", 2nd Int. Workshop on Signal Processing of HDTV, L'Aquila, March 1988.

[44] P.Y. Simard and G.E. Mailloux, "A projection operator for the restoration of divergence-free vector fields", IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. PAMI-10, No. 2, pp. 248-256, 1988.

[45] M. Subbarao and A.M. Waxman, "On the uniqueness of image flow solutions for planar surfaces in motion", Computer Vision, Graphics and Image Processing, Vol. 36, pp. 208-220, 1986.

[46] A. Tamtaoui and C. Labit, "Constrained disparity and motion estimators for 3DTV image sequence coding", Signal Processing: Image Communication, Vol. 4, pp. 45-54, 1991.

[47] A. Tamtaoui, Coopération stéréovision-mouvement pour la compression de séquences stéréoscopiques. Application à la télévision en relief (TV3D), Ph.D. thesis, Université de Rennes-I, Oct. 1992.

[48] A. Tamtaoui and C. Labit, "Constrained motion estimators for 3D sequence coding", Proc. of the European Conf. on Signal Processing, EUSIPCO'92, Brussels, Aug. 1992.
[49] M. Tistarelli, E. Grosso and G. Sandini, "Dynamic stereo in visual navigation", Proc. of the Conf. on Computer Vision and Pattern Recognition, CVPR'91, pp. 186-192, 1991.

[50] Y.T. Tse and R. Baker, "Global zoom/pan estimation and compensation for video compression", Proc. of the Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP'91, Vol. 4, pp. 2725-2728, May 1991.

[51] A. Verri, F. Girosi and V. Torre, "Mathematical properties of the two-dimensional motion field: from singular points to motion parameters", Journal of the Optical Soc. of Am., Vol. 6, No. 5, pp. 698-712, May 1989.

[52] A.M. Waxman and K. Wohn, "Contour evolution, neighborhood deformation, and global image flow: planar surfaces in motion", Int. Journal of Robotics Research, Vol. 4, No. 3, pp. 95-108, 1985.

[53] A.M. Waxman and S. Sinha, "Dynamic stereo: passive ranging to moving objects from relative image flows", IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. PAMI-8, No. 4, pp. 406-412, July 1986.

[54] A.M. Waxman and J.H. Duncan, "Binocular image flows: steps toward stereo-motion fusion", IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. PAMI-8, No. 6, pp. 715-729, Nov. 1986.

[55] A.M. Waxman and K. Wohn, "Image flow theory: a framework for 3D inference from time-varying imagery", Chapter 3 in Advances in Computer Vision, Erlbaum Associates Ed., London, pp. 164-224, 1988.

[56] S.F. Wu and J. Kittler, "A differential method for simultaneous estimation of rotation, change of scale and translation", Signal Processing: Image Communication, Vol. 2, pp. 69-80, 1990.

[57] M. Ziegler, "Disparity estimation using variable blocksize", Proc. of the 3rd COST 230 Workshop on 3DTV Signal Processing, Rennes, 1992.