Duality based optical flow algorithms with applications
University of Copenhagen prize thesis
Lars Lau Rakêt
January 14, 2013
Abstract
We consider the popular TV-L1 optical flow formulation and the so-called duality based algorithm for minimizing the TV-L1 energy. The original formulation is extended to allow for vector valued images, and minimization results are given. In addition we consider different definitions of total variation regularization, and related formulations of the optical flow problem that may be used with a duality based algorithm. We present a highly optimized algorithmic setup to estimate optical flows, and give five novel applications. The first application is registration of medical images, where X-ray images of different hands, taken using different imaging devices, are registered using a TV-L1 optical flow algorithm. We propose to regularize the input images, using sparsity enhancing regularization of the image gradient to improve registration results. The second application is registration of 2D chromatograms, where registration only has to be done in one of the two dimensions, resulting in a vector valued registration problem with values having several hundred dimensions. We propose a novel method for solving this problem, where instead of a vector valued data term, the different channels are coupled through the regularization. This results in a simple formulation of the problem that may be solved much more efficiently than the conventional coupling. In the third application of the TV-L1 optical flow algorithm we consider the problem of interpolating frames in an image sequence.
We propose to move the motion estimation from the surrounding frames directly to the unknown frame by parametrizing the optical flow objective function such that the interpolation assumption is directly modeled. This reparametrization is a powerful trick that results in a number of appealing properties; in particular the motion estimation becomes more robust to noise and large displacements, and the computational workload is more than halved compared to usual bidirectional methods. Finally we consider two applications of frame interpolation for distributed video coding. The first of these considers the use of depth data to improve interpolation, and the second considers using the information from partially decoded video frames to improve interpolation accuracy in high-motion video sequences.
Notes
This thesis was awarded the University of Copenhagen silver medal at the 2013 annual commemoration. A number of typos present in the original thesis have been corrected in the present version, and several references have been updated.
Lars Lau Rakêt
Contents

1 Introduction
2 Optical Flow
    2.1 Duality based optical flow
        2.1.1 TV-L1 optical flow
        2.1.2 Minimizing affine L1-L2 energies
        2.1.3 Alternative optical flow formulations
3 Applications
    Registration of structural images
    Registration of 2D chromatograms
    Image interpolation with a symmetric optical flow constraint
        Motion Compensated Frame Interpolation
        Reparametrizing Optical Flow for Interpolation
    Image interpolation using depth data
        Optical flow computation using brightness and depth
    Image interpolation using partially decoded frames
        Upsampling images from DCT coefficients
        Motion reestimation and interpolation
4 Conclusions and future directions
Chapter 1
Introduction
This thesis is an answer to the call for prize papers announced at the University of Copenhagen's annual commemoration 2011. In particular, it is an answer to the topic "Regularized energy methods in image analysis", proposed by the Department of Computer Science.
For the energy method in question, we consider the TV-L1 optical flow formulation, which has received a lot of attention in recent years. With the introduction of the so-called duality based method for minimizing this energy, Zach et al. (2007) opened the door to an entirely new way of estimating optical flow, one that has fundamentally changed the field.
While the method introduced by Zach et al. (2007) is powerful, the original formulation is somewhat limiting. We begin this thesis with a theoretical section, where we first review the original formulation. We then consider extensions to allow for vector valued images, which make it possible to estimate optical flows using color images. This extension was originally presented in Rakêt et al. (2011). We furthermore consider alternative definitions of the total variation term that is used for regularizing the results. A number of related formulations of the optical flow problem that fit into the duality based algorithm are reviewed, and in relation to this, we propose new data and regularization terms, and give directions on the minimization of the corresponding energies.
We finally end the theoretical chapter by presenting a highly optimized algorithmic setup to estimate optical flows, and give results for some of the presented algorithms on benchmark data from the Middlebury Optical Flow Database.
The second part of this thesis consists of five novel applications of optical flow. The first application is registration of medical images, where X-ray images of different hands, taken using different imaging devices, are registered using a TV-L1 optical flow algorithm. In addition we consider the use of sparsity enhancing regularization of the input images, in order to improve registration results.
The second application considers registration of 2D chromatograms. For this particular dataset, registration is only necessary in one of the two dimensions of the data. With a fixed second dimension we may consider this as a vector valued registration problem with values having several hundred dimensions. A novel method for solving this is proposed, where instead of coupling the different channels through the data term, the coupling is done through the regularization.
This results in a very simple formulation of the problem, which may in addition be solved much more efficiently than the conventional coupling. This method, which may be used on many types of data, was originally developed for the presented example in Rakêt & Markussen (2014), where the registration is used as a preprocessing step, prior to analysis of the dataset.
In the third application of the TV-L1 optical flow algorithm we consider the problem of interpolating unknown frames in an image sequence. We propose to move the motion estimation from the surrounding frames directly to the unknown frame, by parametrizing the optical flow objective function such that the interpolation assumption is directly modeled. This reparametrization is a powerful trick that results in a number of appealing properties; in particular the motion estimation becomes more robust to noise and large displacements, and the computational workload is more than halved compared to usual bidirectional methods. This method was originally presented in Rakêt et al. (2012).
Finally we consider two applications of frame interpolation meant to be used in distributed video coding setups. The first of these considers the use of depth data to improve interpolation quality. We show that including a standard asymmetric data term for the depth data with the symmetric data term presented in the previous application gives significantly better interpolation results than using either of the terms on their own. This application has been developed for the distributed video codec by Salmistraro, Rakêt, Zamarin, Ukhanova & Forchhammer.
For the second application of interpolation in a distributed video coding setup, we consider using the information from partially decoded video frames to improve accuracy in high-motion video sequences. We develop a method to generate rough estimates of the frame to be decoded in pixel domain based on the decoded information in transform domain. With these initial estimates we are able to use a TV-L1 optical flow method to fill in the fine details from the two known surrounding key frames. This method is used in a recently developed distributed video codec.
All results of this thesis are, unless otherwise mentioned, original and unpublished, and have been independently developed by the author.
Chapter 2
Optical Flow
The optical flow problem dates back to the works of Lucas & Kanade (1981) and Horn & Schunck (1981), which respectively proposed local and global resolution strategies. Given two images I_0 and I_1, the main problem in optical flow is defining a map v such that the difference between I_1 warped according to v and I_0,

    I_1(x + v(x)) - I_0(x),                                    (2.1)

is close to zero. Setting (2.1) equal to zero and solving is problematic in a number of ways. It is ill-posed; in the standard case we have one-dimensional brightness images, and for each point x we need to estimate the two components of v from a single equation. In addition the problem is highly nonlinear. To deal with this, (2.1) is typically linearized by means of its first order Taylor approximation in v. Local methods assume that the displacement v(x) is similar in a neighborhood of x, which typically gives enough linearly independent equations in the channels of v for proper estimation. In contrast, global optical flow methods typically use a pointwise data term based on the linearization of (2.1), but add a regularization term that penalizes erratic behavior of v, giving an energy that must be minimized in order to estimate v.
In the original formulation by Horn & Schunck (1981), optical flow is defined as "the distribution of apparent velocities of movement of brightness patterns in an image", which is directly compatible with (2.1). Rather than this original definition, optical flow is today often thought of as the projected scene flow (Barron et al. 1994), that is, the true motion of the objects in the scene as seen from the image plane.
Today the variational global approach to optical flow estimation is by far the method of choice for high accuracy optical flow algorithms, and judging from the authoritative Middlebury optical flow benchmark (Baker et al. 2011) the optical flow problem is essentially solved. The accuracy of optical flow algorithms has only increased marginally since 2010, when Xu et al. presented their estimation framework (Xu et al. 2012), with average endpoint errors of the estimated motion vectors that are typically less than one fifth of a pixel.
So why still consider this problem, if one can only hope to push accuracy on the second or third decimal on benchmark data? A number of prominent reasons come to mind. First, many of the top performing methods require large amounts of time (up to 10 hours) to compute a single displacement field for small resolution images. Secondly, it seems that almost all top performing methods are either very complex in their formulation, or rely on solving the optical flow problem using highly sophisticated setups. In addition many methods rely on 'tricks' (Sun et al. 2010), and proper tuning of a large number of parameters. Finally, it seems that most focus has been on a single benchmark dataset, which means that many methods are essentially tailored to the specific evaluation setup. The consequence of this is that only little of the work that has been put into solving the optical flow problem given by the Middlebury benchmark has actually been transferred to benefit related problems such as processing of video data or registration of medical images.
In this chapter we will review the so-called duality based optical flow method with a special focus on the TV-L1 optical flow formulation, which is both fast and has been used in many different applications, demonstrating its robustness.
2.1
Duality based optical flow
Given a domain T ⊆ R^d and a sequence of images I_t : T → R^k, I = (I_t)_{t∈𝒯} for suitable 𝒯, we want to estimate the optical flow v : T → R^d such that the motion matches the image sequence with respect to some measure. We will consider a variational approach where the flow v is estimated as a minimizer of an energy of the form

    E(v) = F(I, v) + G(v),                                     (2.2)

with F being a positive functional measuring data fidelity, and where G acts as a regularization term. Many energies of this type have been suggested throughout the years, and a large variety of solution methods exist (Horn & Schunck 1981, Zach et al. 2007, Zimmer et al. 2011). Here we will focus on a specific relaxation of the problem, and consider the minimization methods in this framework. The relaxed energy is obtained by introducing an auxiliary variable u, effectively splitting the minimization problem in two quadratically coupled problems,

    E(u, v) = F(I, v) + (1/2θ) ∫ ‖v(x) - u(x)‖² dx + G(u).     (2.3)

For θ → 0 the minimizers of (2.2) and (2.3) will clearly be the same, so the hope is that for θ small, a minimizer of the relaxed energy (2.3) will be close to a minimizer of the original energy (2.2). It may seem troublesome to introduce an auxiliary variable, since one has to iteratively solve the two energies

    E_1(v) = F(I, v) + (1/2θ) ∫ ‖v(x) - u(x)‖² dx,             (2.4)

    E_2(u) = (1/2θ) ∫ ‖v(x) - u(x)‖² dx + G(u),                (2.5)

and for a wide variety of choices of F and G, methods exist that directly target (relaxed) variants of the original energy (2.2). The energies (2.4) and (2.5) are however much easier to solve, and in a number of important cases the minimization problems may very easily be solved on massively parallel processors such as GPUs. Another positive feature is that data matching and regularization are done independently, so one may easily replace one without changing the minimization of the other, a fact that makes comparison of different types of energies uncomplicated and fair, since the minimization is done in a fully comparable framework.
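The alternating structure of the splitting scheme can be sketched in a few lines. The following Python sketch is illustrative only: the names `data_step` and `reg_step` are placeholders for a pointwise minimizer of E_1 and a minimizer of E_2 (e.g. a TV denoising routine), which are assumed to be supplied by the caller and to close over the image data and θ.

```python
import numpy as np

def estimate_flow(shape, data_step, reg_step, n_iters=10):
    """Skeleton of the quadratic splitting scheme for E(u, v).

    data_step(u) -> v : pointwise minimizer of the data energy E_1 for fixed u
    reg_step(v)  -> u : minimizer of the regularization energy E_2 for fixed v
    The two steps interact only through the quadratic coupling term, so one
    can be replaced without changing the minimization of the other.
    """
    u = np.zeros(shape + (2,))   # regularized flow field, one 2-vector per pixel
    for _ in range(n_iters):
        v = data_step(u)         # fit the (linearized) data term
        u = reg_step(v)          # regularize the matched flow
    return u
```

In a full implementation this loop would additionally be embedded in the coarse-to-fine warping scheme discussed below.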
2.1.1 TV-L1 optical flow

The by now classic duality based TV-L1 optical flow algorithm of Zach et al. (2007) uses an L1 norm for the data matching term F, and a vectorial total variation term for the regularization G, giving an energy of the form

    E(v) = ∫_T ‖R(v)(x)‖ dx + ∫_T ‖Dv(x)‖ dx,                  (2.6)

where R is the given constancy assumption, which is typically defined from the image data.
The TV-L1 formulation was originally proposed by Brox et al. (2004), who also described a modern implementation in detail, and gave a theoretical account of the choices. This algorithm marked a turning point with respect to optical flow accuracy, and also helped boost the performance of later algorithms. The estimation in Brox et al. (2004) is based on the Euler-Lagrange framework, which requires smooth functionals, and so the Euclidean norms in (2.6) are replaced with Charbonnier functions

    ‖·‖_ε = √(‖·‖² + ε²),                                      (2.7)

where ε is some small number. Zach et al. (2007) proposed to recover a minimizer of (2.6) by minimizing the two convex quadratically coupled problems described in (2.3). In the given formulation, they are
    E_1(v) = ∫_T ‖ρ(v)(x)‖ dx + (1/2θ) ∫_T ‖v(x) - u(x)‖² dx,  (2.8)

    E_2(u) = ∫_T ‖D_S u(x)‖ dx + (1/2θ) ∫_T ‖v(x) - u(x)‖² dx, (2.9)

where ρ is the linearization of a grayscale data fidelity term R, and the chosen total variation term is defined as the sum of the total variation over all channels,

    ∫_T ‖D_S u(x)‖ dx = Σ_{i=1}^{d} ∫_T ‖∇u_i(x)‖ dx.          (2.10)

The flow is recovered by iteratively minimizing these energies in a coarse-to-fine pyramid scheme for some small θ.
For one-dimensional images, using for example only the brightness, we get a linearized data term of the form ρ(v)(x) = a⊤v(x) + b, with a ∈ R^d and b ∈ R. The minimizer of E_1 can be computed using the results given in Zach et al. (2007). These results are replicated in general form in the following lemma.
Lemma 2.1.1. For ρ(v)(x) = a⊤v(x) + b, the minimizer of E_1 is given by

    v(x) = u(x) - π_{θ[-a,a]}(u(x) + (b/‖a‖²)a),               (2.11)

where π_{θ[-a,a]} is the projection onto the line segment joining the vectors -θa and θa,

    π_{θ[-a,a]}(u(x) + (b/‖a‖²)a) =
        -θa                if ρ(u)(x) < -θ‖a‖²,
        θa                 if ρ(u)(x) > θ‖a‖²,                 (2.12)
        (ρ(u)(x)/‖a‖²)a    if |ρ(u)(x)| ≤ θ‖a‖².
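This thresholding step vectorizes directly over the whole image. The sketch below is an illustrative NumPy implementation of (2.11)-(2.12); the array shapes and the small guard against division by zero are implementation choices, not part of the lemma.

```python
import numpy as np

def data_step(u, a, b, theta):
    """Pointwise minimizer of E_1 for the linearized residual rho(v) = a^T v + b.

    u     : (H, W, 2) current flow estimate
    a     : (H, W, 2) coefficients of the linearized residual (image gradient)
    b     : (H, W)    constant part of the linearized residual
    theta : coupling parameter
    """
    rho = np.sum(a * u, axis=-1) + b                # rho(u) at every pixel
    a2 = np.maximum(np.sum(a * a, axis=-1), 1e-9)   # ||a||^2, guarded against 0
    # The three cases of the projection in (2.12), expressed as a step length:
    step = np.where(rho < -theta * a2, theta,       # v = u + theta * a
           np.where(rho > theta * a2, -theta,       # v = u - theta * a
                    -rho / a2))                     # v = u - (rho/||a||^2) a
    return u + step[..., None] * a
```

After this step, pixels where |ρ(u)| ≤ θ‖a‖² satisfy ρ(v) = 0 exactly, while pixels with a large residual only move by the bounded step θa, which is what makes the L1 data term robust to outliers.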
The regularization energy (2.9) is elegantly minimized by the method of Chambolle (2004). The solution is reproduced in the following lemma.

Lemma 2.1.2 (Chambolle). The minimizer u of E_2 is given coordinatewise by u_i = v_i - θ∇·p_i for i = 1, ..., d, where p_i : T → R^d can be computed by the iterative fixed-point scheme

    p_i^{n+1} = (p_i^n + τ∇(∇·p_i^n - v_i/θ)) / (1 + τ‖∇(∇·p_i^n - v_i/θ)‖),

where τ ≤ 1/8.
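A direct implementation of this fixed-point scheme for a single channel might look as follows. This is a sketch: the forward/backward finite differences with Neumann boundary conditions are a common discretization choice, not prescribed by the lemma.

```python
import numpy as np

def grad(f):
    """Forward differences with Neumann boundary; output shape (H, W, 2)."""
    g = np.zeros(f.shape + (2,))
    g[:-1, :, 0] = f[1:, :] - f[:-1, :]
    g[:, :-1, 1] = f[:, 1:] - f[:, :-1]
    return g

def div(p):
    """Backward-difference divergence, the negative adjoint of grad."""
    d = np.zeros(p.shape[:2])
    d[:-1, :] += p[:-1, :, 0]
    d[1:, :] -= p[:-1, :, 0]
    d[:, :-1] += p[:, :-1, 1]
    d[:, 1:] -= p[:, :-1, 1]
    return d

def reg_step(v, theta, tau=0.125, n_iters=50):
    """Chambolle's scheme for one channel: argmin_u TV(u) + 1/(2 theta)||u - v||^2."""
    p = np.zeros(v.shape + (2,))
    for _ in range(n_iters):
        g = grad(div(p) - v / theta)
        norm = np.sqrt(np.sum(g * g, axis=-1, keepdims=True))
        p = (p + tau * g) / (1.0 + tau * norm)
    return v - theta * div(p)
```

For a flow field, this routine is simply applied to each channel u_i independently, which is exactly the channel-by-channel coupling of the TV_S regularizer (2.10).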
These lemmas provide an elegant and easily implementable solution to the relaxed optical flow problem given by (2.8) and (2.9). The given formulation is somewhat restrictive; for example, it does not allow for the use of vector valued images such as color images. In the next section, the problem of using an L1 norm for the data term with vector valued images is considered. The problem is analyzed from a convex analysis point of view, and a solution to the resulting energy (2.8) is given. These results were originally proposed in Rakêt et al. (2011).
2.1.2 Minimizing affine L1-L2 energies

Consider an L1-L2 energy of the following form,

    E_1(v) = ∫_T ‖Av(x) + b(x)‖ dx + (1/2θ) ∫_T ‖v(x) - u(x)‖² dx,  (2.13)

where A : R^d → R^k. Because no differential of v is involved, the minimization of (2.13) boils down to a pointwise minimization of a strictly convex cost function of the form

    f(v) = ‖Av + b‖ + ½‖v - u‖².                               (2.14)

In the following we present the tools used for solving the minimization problem (2.14). We first recall a few elements of convex analysis; the reader can refer to Ekeland & Temam (1999) for a complete introduction to convex analysis in both finite and infinite dimension. Here we will restrict ourselves to finite dimensional problems.
A function f : R^d → R is one-homogeneous if f(λx) = λf(x) for all λ > 0. For a one-homogeneous function, it is easily shown that its Legendre-Fenchel transform

    f*(x*) = sup_{x∈R^d} {⟨x, x*⟩ - f(x)}                      (2.15)

is the characteristic function of a closed convex set C of R^d,

    δ_C(x*) := f*(x*) = { 0 if x* ∈ C, +∞ otherwise.           (2.16)
The one-homogeneous functions that will interest us here are of the form f(x) = ‖Ax‖, where A : R^d → R^k is linear and ‖·‖ is the usual Euclidean norm of R^k. The computation of the associated Fenchel transform involves the Moore-Penrose pseudoinverse A† of A. We recall its construction.
The kernel (or nullspace) of A, denoted Ker A, is the vector subspace of the v ∈ R^d for which Av = 0. The image of A, denoted Im A, is the subspace of R^k reached by A. The orthogonal complement of Ker A is denoted Ker A⊥. Denote by ι the inclusion map Ker A⊥ → R^d and let π be the orthogonal projection R^k → Im A. It is well known that the composition map

    B = π ∘ A ∘ ι : Ker A⊥ → Im A                              (2.17)

is a linear isomorphism between Ker A⊥ and Im A. The Moore-Penrose pseudoinverse A† of A is defined as

    A† = ι ∘ B⁻¹ ∘ π.                                          (2.18)
With this, the following lemma provides the Legendre-Fenchel transform of f(x):

Lemma 2.1.3. The Legendre-Fenchel transform of x ↦ ‖Ax‖ is the characteristic function δ_C of the elliptic ball C given by the set of x's in R^d that satisfy the following conditions:

    A†Ax = x,                                                  (2.19)
    x⊤A†A†⊤x ≤ 1.                                              (2.20)

From the properties of pseudoinverses, the equality x = A†Ax means that x belongs to Ker A⊥. In fact, A†A is the orthogonal projection onto Ker A⊥. On this subspace, A†A†⊤ is positive definite and the inequality thus defines an elliptic ball.
The lemma will not be proven here, but we indicate how it can be done. In the case where A is the identity I_d of R^d, it is easily shown that C is the unit ball of R^d. The case where A is invertible follows easily, while the general case follows from the latter using the structure of the pseudoinverse (see Golub & van Loan (1989) for instance).
We can now state the main result, which allows us to generalize the TV-L1 algorithm from Zach et al. (2007) to calculate the optical flow between two vector valued images.
Proposition 2.1.4. The minimizer of the function f(v) = ‖Av + b‖ + ½‖v - u‖² is given as follows.

(i) In the case b ∉ Im A, f(v) is smooth. It can be minimized by usual methods.

(ii) In the case where b ∈ Im A, f(v), which fails to be smooth for v ∈ Ker A - A†b, reaches its unique minimum at

    v = u - π_{-C}(u + A†b),                                   (2.21)

where π_{-C} is the projection onto the convex set -C = {-x, x ∈ C}, with C as described in Lemma 2.1.3.
Proof. To see (i), write b as Ab_0 + b_1, with b_0 = A†b, Ab_0 being then the orthogonal projection of b onto Im A, while b_1 is the residual of the projection. The assumption of (i) implies that b_1 ≠ 0 is orthogonal to the image of A. One can then write

    ‖Av + b‖ = ‖A(v + b_0) + b_1‖ = √(‖A(v + b_0)‖² + ‖b_1‖²), (2.22)

which is always strictly positive as ‖b_1‖² > 0, and smoothness follows.
In the situation of (ii), since b ∈ Im A, we can do the substitution v ← v + A†b in (2.14), and the resulting function has the same form as a number of functions found in Chambolle (2004) and Chambolle & Pock (2011). We refer the reader to them for the computation of minimizers.
Proposition 2.1.4 generalizes Lemma 2.1.1 since, on one-dimensional spaces, elliptic balls are simply line segments. The next examples extend to multidimensional values.
Example 2.1.1. Consider the minimization problem

    arg min_v ( λ‖Av + b‖ + ½‖v - u‖² ),  λ > 0,               (2.23)

where A ∈ R^{k×2} and b ∈ Im A. If A has maximal rank (i.e. 2), then it is well known that the 2×2 matrix C = A†A†⊤ is symmetric and positive definite (Golub & van Loan 1989). The set C is then an elliptic disc determined by the eigenvectors and eigenvalues of C. The projection may be computed by the efficient algorithm described in Example 2.1.3, which has much better properties than the originally suggested method.
In the case where the matrix has two linearly dependent columns a ≠ 0 and ca, a series of straightforward calculations gives

    Ker A = Ry,  Ker A⊥ = Rx,  Im A = Ra,                      (2.24)

with x = (1/√(1+c²))(1, c)⊤ and y = (1/√(1+c²))(-c, 1)⊤ an orthonormal basis of R², and

    A†A†⊤ = 1/((1+c²)²‖a‖²) [ 1  c
                              c  c² ].                         (2.25)
If c = 0, the inequality (2.20) reads

    u_1²/‖a‖² ≤ 1  ⟺  -‖a‖ ≤ u_1 ≤ ‖a‖,  u = (u_1, u_2)⊤,      (2.26)

that is, a vertical strip, while the equality (2.19) simply says that u_2 = 0, thus C is the line segment

    [-‖a‖x, ‖a‖x] ⊂ R².                                        (2.27)

The case where c ≠ 0 is identical, and is obtained for instance by rotating the natural basis of R² to the basis (x, y).
Example 2.1.2. Consider again the minimization problem (2.23), this time assuming that b ∉ Im A. Using (2.22) we can rewrite the minimization problem as

    arg min_v ( λ√(‖A(v + b_0)‖² + ‖b_1‖²) + ½‖v - u‖² ),  λ > 0.  (2.28)

The minimizing v is found by solving the equation

    λ A⊤A(v + b_0)/‖Av + b‖ + v - u = 0,

which may be done by gradient descent or a (quasi-)Newton method.
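Since the objective in (2.28) is smooth and strongly convex, even plain gradient descent suffices. A minimal sketch (the step size and iteration count are illustrative choices, and the gradient is written in the equivalent form λA⊤(Av + b)/‖Av + b‖ + (v - u), using A⊤b_1 = 0):

```python
import numpy as np

def smooth_case_minimize(A, b, u, lam=1.0, lr=0.1, n_iters=500):
    """Minimize lam*||A v + b|| + 0.5*||v - u||^2 when b is not in Im A.

    In that case ||A v + b|| is bounded away from zero, the objective is
    smooth, and its gradient is lam * A^T(A v + b)/||A v + b|| + (v - u).
    """
    v = u.copy()
    for _ in range(n_iters):
        r = A @ v + b
        g = lam * (A.T @ r) / np.linalg.norm(r) + (v - u)
        v -= lr * g
    return v
```

A (quasi-)Newton method would converge in far fewer iterations, but since the problem is solved independently per pixel, the cheap first order update is often preferable on parallel hardware.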
Example 2.1.3. Consider the problem of projecting a point x_0 onto the ellipsoid given by C = {x ∈ R^k | x⊤Cx ≤ 1}, where x_0 ∉ C. The projection is the point x̂ given by

    x̂ = arg min_{x∈C} ‖x - x_0‖².

This problem can be solved by introducing a Lagrange multiplier ξ, giving the objective function

    f(x, ξ) = ‖x - x_0‖² + ξ(x⊤Cx - 1).

From the condition that

    ∂/∂x f(x, ξ) = 2(x - x_0) + 2ξCx = 0,

we get that

    x̂ = (ξC + I)⁻¹x_0.

However, we need to determine the value of the Lagrange multiplier ξ. Since we assumed that x_0 was outside the ellipsoid, we know that the projected point will lie on the boundary ∂C, that is, ξ is a root of

    G(ξ) = ((ξC + I)⁻¹x_0)⊤C(ξC + I)⁻¹x_0 - 1.                 (2.29)

We can use the following theorem due to Kiseliov (1994) to determine the correct value of ξ.
Theorem 2.1.5. The root ξ* of (2.29) is unique and can be found by the iterative Newton process

    ξ⁰ = 0,  ξ^{n+1} = ξ^n - G(ξ^n)/G′(ξ^n),

where ξ^n ↑ ξ*. The rate of convergence is quadratic.
Proof. Since we are assuming that x_0 ∉ C, we have

    G(0) = x_0⊤Cx_0 - 1 > 0,  lim_{ξ→∞} G(ξ) = -1 < 0,

which gives that a root exists in [0, ∞). Since

    0 = d/dξ [(ξC + I)⁻¹(ξC + I)] = (d/dξ (ξC + I)⁻¹)(ξC + I) + (ξC + I)⁻¹(d/dξ (ξC + I)),

we have that

    d/dξ (ξC + I)⁻¹ = -(ξC + I)⁻¹C(ξC + I)⁻¹ = -(ξC + I)⁻²C.

Using this we can differentiate G,

    G′(ξ) = -2x_0⊤(ξC + I)⁻³C²x_0,

and in addition

    G″(ξ) = 6x_0⊤(ξC + I)⁻⁴C³x_0,

where we have used the commutativity of C and (ξC + I)⁻¹. Since (ξC + I)⁻¹ has full rank for all ξ ≥ 0, we see that G′(ξ) < 0 and G″(ξ) > 0, so the solution ξ* is unique.

Kiseliov (1994) in addition gives a nonlinear version of the Newton process described in the above theorem, which is even more efficient. Compared to the added complexity of the implementation, the overall gain of using such an algorithm is limited, and we recommend the process described here.
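The Newton process of Theorem 2.1.5 is only a few lines of code. The following sketch assumes C is symmetric positive definite and x_0 lies strictly outside the ellipsoid, and implements (2.29) and the iteration directly; the explicit matrix inverses are for clarity, not efficiency.

```python
import numpy as np

def project_ellipsoid(C, x0, tol=1e-12, max_iter=100):
    """Project x0 onto {x : x^T C x <= 1} via Newton's method on G (Theorem 2.1.5)."""
    I = np.eye(C.shape[0])
    xi = 0.0
    for _ in range(max_iter):
        M = np.linalg.inv(xi * C + I)                  # (xi*C + I)^{-1}
        x = M @ x0                                     # candidate projection
        G = x @ C @ x - 1.0                            # constraint residual (2.29)
        dG = -2.0 * x0 @ (M @ M @ M) @ (C @ C) @ x0    # G'(xi), always negative
        xi_next = xi - G / dG
        if abs(xi_next - xi) < tol:
            xi = xi_next
            break
        xi = xi_next
    return np.linalg.inv(xi * C + I) @ x0
```

For the elliptic disc of Example 2.1.1 one takes C = A†A†⊤, so the projection needed in Proposition 2.1.4 reduces to this routine on a 2×2 matrix.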
2.1.3 Alternative optical flow formulations

In the wake of the algorithm by Zach et al. (2007), a large number of duality based or primal-dual methods emerged in optical flow estimation. The initial focus has mainly been on improving regularization. One line of work considers structure and motion adaptive regularization. Later work has gone further to consider full anisotropic regularization, where regularization directions are weighted differently by means of a diffusion tensor. In addition, the L1 norm used in the regularization of the original TV-L1 was replaced with a Huber norm that is smooth at the origin, thus eliminating the staircasing effect of the regularization. Nonlocal total variation regularization has also been considered ( 2010 ), where a low level image segmentation is integrated in the regularization. This in turn produces very sharp motion boundaries, and preserves small scale structures in the flow very well.
In addition to the refinement of regularization techniques, some work has been done on reformulating data terms. One approach is to minimize a sum of two L1 data terms for one-dimensional images. Recognizing the pointwise structure of many data terms, it has also been proposed to use brute-force minimization of the data fidelity energy (2.4) without linearizing the optical flow constraint (2.1). A number of more advanced pointwise data terms are considered in Steinbrücker et al. (2009b), but unfortunately the quality of the resulting flows is not as impressive as one could hope for. Truncated normalized cross correlation has also been used for the data term ( 2010 ). This data term is attractive because of its invariance to multiplicative illumination changes in the scene. It is however not defined pointwise, and thus needs a more complex minimization strategy. This is done by a second order approximation of the data term, in contrast to the usual first order approximation. Building on these ideas, Panin (2012) considers a mutual information data term. Although the benchmark optical flow results of this approach cannot compete with a highly optimized TV-L1 implementation (Wedel et al. 2009a), the algorithm shows impressive results under less optimal conditions such as noise and transformations of the values in one of the images to be registered.
In the following we will consider some examples of alternative data and regularization terms, and consider how they may be minimized. At the end we will consider other extensions.
Example 2.1.4 (L2 data term). Consider the quadratic cost function

    f(v) = (θ/2)‖Av + b‖² + ½‖v - u‖².

The cost function is clearly smooth and convex in v, so there is a unique minimizer to the problem that can be found as the solution to d/dv f(v) = 0. We readily get that

    d/dv f(v) = v - u + θA⊤(Av + b) = (I + θA⊤A)v - u + θA⊤b,

and the solution to the problem is

    v = (I + θA⊤A)⁻¹(u - θA⊤b).

For comparison purposes it may be interesting to rewrite this solution as

    v = u - θ(I + θA⊤A)⁻¹A⊤(Au + b).

Note that for k = 1, A⊤ = a, so the above formula becomes

    v = u - (θ(a⊤u + b)/(1 + θ‖a‖²)) a.                        (2.30)
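The closed form above is easy to verify numerically. A sketch of the pointwise solve, together with a check of the rewritten form:

```python
import numpy as np

def l2_data_step(A, b, u, theta):
    """Closed-form minimizer of (theta/2)||A v + b||^2 + 0.5||v - u||^2."""
    d = A.shape[1]
    return np.linalg.solve(np.eye(d) + theta * A.T @ A, u - theta * A.T @ b)
```

Unlike the L1 data term, this step has no thresholding behavior: every point is pulled towards the linearized constraint regardless of the size of the residual, which makes it less robust to outliers, but the cost is a fixed, small d×d solve per point.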
Example 2.1.5 (Charbonnier norm). While the approach for minimizing the vector valued data term in Section 2.1.2 is nice from a theoretical point of view, the actual implementation of the solution given in Proposition 2.1.4 is not very practical. The checks needed in order to determine which category the given point falls into, and the iterative procedures needed to project onto an ellipsoid, result in an algorithm that is hard to implement and quite slow. The original algorithm dealt with this somewhat inelegantly, by ensuring full rank of the matrix A by means of regularization, followed by projection of b onto Im A.
By replacing the Euclidean norm with the Charbonnier norm ‖·‖_ε given in (2.7), we may avoid the checks related to the cases of Proposition 2.1.4, and instead just perform iterative minimization at all points following Example 2.1.2.
Example 2.1.6 (Interval data term). Consider the following penalty function

    φ(x) = 1_{(-∞,c_1)}(x)(c_1 - x) + 1_{(c_2,∞)}(x)(x - c_2),

where c_1 ≤ 0 ≤ c_2 and 1 is the indicator function. This type of penalty may be an interesting data term when data is very noisy, or in general when a perfect data fit is not realizable over most of the image.
First consider a function of the form

    φ(x) + (x - y)².

If y ∈ [c_1, c_2], it is minimized by x = y. For y ≥ c_2 we want to minimize

    1_{(c_2,∞)}(x)(x - c_2) + (x - y)².

If (y - c_2) ≤ 1/2 the minimizer is x = c_2, and otherwise it is x = y - 1/2. A similar expression is found for y ≤ c_1, giving the final solution

    x =
        y + 1/2    if y ∈ (-∞, c_1 - 1/2),
        c_1        if y ∈ [c_1 - 1/2, c_1),
        y          if y ∈ [c_1, c_2],
        c_2        if y ∈ (c_2, c_2 + 1/2],
        y - 1/2    if y ∈ (c_2 + 1/2, ∞).

Consider now the function

    f(v) = φ(a⊤v + b) + ‖v - u‖².

Denoting ρ(v) = a⊤v + b, the solution can be found as

    v =
        u + ½a                       if ρ(u) ∈ (-∞, c_1 - ½‖a‖²),
        u - ((ρ(u) - c_1)/‖a‖²)a     if ρ(u) ∈ [c_1 - ½‖a‖², c_1),
        u                            if ρ(u) ∈ [c_1, c_2],
        u - ((ρ(u) - c_2)/‖a‖²)a     if ρ(u) ∈ (c_2, c_2 + ½‖a‖²],
        u - ½a                       if ρ(u) ∈ (c_2 + ½‖a‖², ∞).

We see that for c_1 = c_2 = 0, the solution is identical to the one given in Lemma 2.1.1. In addition it is interesting to note that similar calculations can be used to give an explicit solution to the truncated L1 data term, which has previously been minimized by brute force.
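Like the thresholding step of Lemma 2.1.1, this five-case solution vectorizes directly. The sketch below is an illustrative NumPy implementation; the array shapes and the guard against ‖a‖² = 0 are implementation choices.

```python
import numpy as np

def interval_data_step(u, a, b, c1, c2):
    """Pointwise minimizer of phi(a^T v + b) + ||v - u||^2 for the interval
    penalty phi with flat (zero-cost) region [c1, c2], c1 <= 0 <= c2."""
    rho = np.sum(a * u, axis=-1) + b                # rho(u)
    a2 = np.maximum(np.sum(a * a, axis=-1), 1e-9)   # ||a||^2
    step = np.where(rho < c1 - 0.5 * a2, 0.5,               # full step towards c1
           np.where(rho < c1, -(rho - c1) / a2,             # land exactly on c1
           np.where(rho <= c2, 0.0,                         # no penalty: keep u
           np.where(rho <= c2 + 0.5 * a2, -(rho - c2) / a2, # land exactly on c2
                    -0.5))))                                # full step towards c2
    return u + step[..., None] * a
```

Pixels whose residual already lies in [c_1, c_2] are left untouched, which is exactly the tolerance behavior that makes this data term attractive for very noisy data.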
Example 2.1.7 (Vectorial total variation). As already mentioned, the total variation of a vector valued function is not uniquely defined, and the different definitions will give results with different properties. In the following it is assumed that d = 2 for simplicity.
We have already introduced the channel-by-channel definition of the vectorial total variation (2.10), which is used in the original formulation by Zach et al. (2007). The canonical definition of vectorial total variation, which is also the definition used by the original TV-L1 algorithm of Brox et al. (2004), is

    ∫_T ‖D_F u(x)‖ dx = sup_{p∈P} ∫_T ⟨u(x), ∇·p(x)⟩ dx,       (2.31)

with p : R² → R^{2×2}, and where P = {p ∈ C_c¹(R², R^{2×2}) : ‖p‖_2 ≤ 1}. It is worth noting that the definition of Zach et al. (2007) corresponds to the requirement that ‖p‖_∞ ≤ 1 in P. If we assume that u is smooth, using integration by parts with proper boundary conditions yields that

    sup_{p∈P} ∫_T ⟨u(x), ∇·p(x)⟩ dx = sup_{p∈P} ∫_T Σ_{i=1}^{2} ⟨∇u_i(x), p_i(x)⟩ dx.  (2.32)

For ∇u ≠ 0 the supremum is found to be attained at

    p_i = ∇u_i/‖∇u‖,  ‖∇u‖ = √(‖∇u_1‖² + ‖∇u_2‖²),             (2.33)

and for ∇u = 0, p can be any function in P, which in turn means that for smooth u

    ∫_T ‖D_F u(x)‖ dx = ∫_T √(‖∇u_1(x)‖² + ‖∇u_2(x)‖²) dx = ∫_T ‖∇u(x)‖ dx.  (2.34)

This definition has some very nice properties. In particular it is rotationally invariant, and couples the channels by weighting the regularization differently across the different channels.
This definition is not directly compatible with Chambolle's algorithm (Lemma 2.1.2); however, one may use the following algorithm proposed by Bresson & Chan (2008), which is a direct extension.

Lemma 2.1.6. The minimizer u of (2.9), with the regularization given by D_F, is given by

    u = v - θ∇·p,                                              (2.35)

where p can be computed with the convergent semi-implicit gradient descent scheme

    p^{n+1} = (p^n + τ∇(∇·p^n - v/θ)) / (1 + τ‖∇(∇·p^n - v/θ)‖),  (2.36)

where τ ≤ 1/8 and p⁰ = 0.
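Compared to the channel-by-channel scheme of Lemma 2.1.2, the only difference is the normalization of the dual variable, which here uses the joint norm over all channels and derivative directions; this is what couples the channels. A sketch for a field v of shape (H, W, d), with the same illustrative finite-difference discretization as before:

```python
import numpy as np

def grad(f):
    """Forward differences with Neumann boundary; shape (H, W, 2)."""
    g = np.zeros(f.shape + (2,))
    g[:-1, :, 0] = f[1:, :] - f[:-1, :]
    g[:, :-1, 1] = f[:, 1:] - f[:, :-1]
    return g

def div(p):
    """Backward-difference divergence, the negative adjoint of grad."""
    d = np.zeros(p.shape[:2])
    d[:-1, :] += p[:-1, :, 0]
    d[1:, :] -= p[:-1, :, 0]
    d[:, :-1] += p[:, :-1, 1]
    d[:, 1:] -= p[:, :-1, 1]
    return d

def tv_f_step(v, theta, tau=0.125, n_iters=100):
    """Minimize TV_F(u) + 1/(2 theta)||u - v||^2 by the scheme of Lemma 2.1.6."""
    H, W, d = v.shape
    p = np.zeros((H, W, d, 2))
    for _ in range(n_iters):
        g = np.stack([grad(div(p[:, :, i]) - v[:, :, i] / theta)
                      for i in range(d)], axis=2)
        # joint norm over channels and directions: this couples the channels
        norm = np.sqrt(np.sum(g * g, axis=(2, 3), keepdims=True))
        p = (p + tau * g) / (1.0 + tau * norm)
    return np.stack([v[:, :, i] - theta * div(p[:, :, i])
                     for i in range(d)], axis=-1)
```

Replacing the joint norm by a per-channel norm recovers exactly the decoupled TV_S scheme, so the two regularizers can share almost all of their implementation.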
Recently Goldlücke et al. (2012) introduced an alternative definition of vectorial total variation. Assuming sufficient smoothness of u, this definition, which we will denote by

$$\int_T \|D_J u(x)\|\,dx,$$

corresponds to the integral over the largest singular value of the derivative matrix of u. This definition smooths in a single direction across channels, and thus does not suffer from the channel smearing effects of the two previously defined methods. The large number of color imaging examples given by Goldlücke et al. (2012) are very convincing, showing consistently better results of this method in different applications. For the minimization of the energy (2.9) we use the algorithm given in Goldlücke et al. (2012), which is directly comparable to the solutions given in the lemmas above.

In the following these three definitions will be denoted by TV_S, TV_F, and TV_J respectively.
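The three definitions differ only in which matrix norm is applied to the derivative matrix at each point: the sum of the channel gradient norms (TV_S), the Frobenius norm (TV_F), or the largest singular value (TV_J). A small illustrative sketch, where the function name and the 2×2 Jacobian convention are assumptions, not from the text:

```python
import numpy as np

def tv_pointwise(J):
    """Pointwise regularizer values for a 2x2 Jacobian
    J = [[du1/dx, du1/dy], [du2/dx, du2/dy]]."""
    J = np.asarray(J, dtype=float)
    tv_s = np.linalg.norm(J[0]) + np.linalg.norm(J[1])  # channel-by-channel sum
    tv_f = np.linalg.norm(J)                            # Frobenius coupling
    tv_j = np.linalg.norm(J, 2)                         # largest singular value
    return tv_s, tv_f, tv_j
```

By standard norm inequalities the three values are always ordered TV_J ≤ TV_F ≤ TV_S, so TV_J is the weakest (least smearing) of the three penalties.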
Example 2.1.8 (1-harmonic regularization). As we have seen so far, it is common practice in optical flow estimation to formulate the regularization of the optical flow by means of an L^p norm of the flow gradient. However, when considering the nature of optical flow fields, one realizes that this is perhaps not the natural type of regularization. An optical flow field describes the motion of a projected scene. Consider a scene where all motion is parallel to the camera plane, and objects are rigid and move in a single spatial direction. In this setup the displacement vectors of an object in the projected image will point in the same direction, but the magnitude of the flow vectors will vary depending on the distance of the particular part of the object to the camera. This suggests that one should regularize direction to a higher extent than magnitude.
Additional directional regularization has previously been proposed in the form of a 1-harmonic regularization term (Vese & Osher 2002) added to the original TV-L1 method. A decoupled directional representation of optical flow has also been considered, demonstrating good results on the Middlebury training data by completely decoupling the angular and magnitude components of the flow. The chosen representation does however increase the complexity of the formulation of the problem considerably. In addition both methods are quite slow, and they owe much of their precision to being built upon existing well-performing methods (Wedel et al. (2009a) and (2010) respectively). It turns out that one may combine the elements of the two mentioned methods in an elegant manner, which in addition has very attractive computational properties. The regularization term we propose to use is the following

$$G(v) = \alpha \int_T \left\|\nabla \frac{v(x)}{\|v(x)\|}\right\| dx + (1-\alpha)\int_T \left\|\nabla\|v(x)\|\right\| dx, \qquad 0 \le \alpha \le 1.$$

The above regularization term is similar to the decoupled representation just mentioned, in that it completely decouples magnitude and direction; however, instead of having to solve a constrained problem with an angular component, we use a 1-harmonic term as in Vese & Osher (2002).
We are now interested in minimizing an energy of the form

$$E(v) = F(v) + G(v).$$

Using the standard quadratic decoupling (2.3) will introduce a coupling of magnitude and direction in the regularization, and in order to avoid that, we propose the following decoupling

$$E_1(v) = F(v) + \frac{1}{2\theta}\int_T \|v(x) - u(x)\|^2\,dx,$$

$$E_2(u) = \frac{1}{2\theta_1}\int_T \big(\|v(x)\| - \|u(x)\|\big)^2\,dx + \frac{1}{2\theta_2}\int_T \left\|\frac{v(x)}{\|v(x)\|} - \frac{u(x)}{\|u(x)\|}\right\|^2 dx + G(u).$$

In this formulation magnitude and direction are independent, and representing flows this way will even allow for simultaneous minimization of the magnitude and regularization parts of the flow. The magnitude regularization just corresponds to one-dimensional total variation regularization, and may be minimized following Chambolle (2004). The directional regularization may be solved efficiently following the 1-harmonic approach of Vese & Osher (2002).
When considering the splitting scheme, it does not seem as elegant as the typical quadratic splitting, as it cannot directly be formulated as a single energy in two variables. For θ, θ₁, and θ₂ sufficiently small, however, the two solutions should converge to each other. In addition, if one considers the splitting as an iterative estimation process, it does make sense that minimization of the data term gives a solution with fundamentally different properties than the estimate produced by minimizing the regularization term. In this light it makes sense to treat the previous estimate differently in the two minimization problems. This also gives a good explanation for the widespread use and success of such splitting schemes.
Example 2.1.9 (Illumination and occlusion modeling). Illumination changes in image sequences present a major problem for conventional optical flow estimation. Changing light conditions from one frame to the next may render the data term completely unable to match objects in the scene. Another problem is the issue of occlusions, when an object (partly) disappears from one frame to the next. This will naturally lead to violations of the optical flow constraint.

In the TV-L1 optical flow setting, Chambolle & Pock (2011) proposed to model violations of the data term by adding a compensating term c(x) to the linearized optical flow constraint ρ(v)(x). Illumination changes are expected to affect the residual ρ(v)(x) similarly in connected regions, so Chambolle & Pock (2011) proposed to regularize c using total variation. This new illumination field may seamlessly be integrated in the algorithm in a similar fashion to what has been done so far: We split data and regularization of c using a quadratic term and minimize iteratively.

A similar method has been used for occlusion detection (2012). A compensating term c is added to the data fidelity functional; however, since only occlusions are modeled, a sparsity enhancing L⁰ regularization is proposed. In the presented setup this gives the following data energy

$$E_1(v, c) = \lambda\int_T \|\rho(v)(x) - c(x)\|\,dx + \frac{1}{2\theta}\int_T \|v(x) - u(x)\|^2\,dx + \beta\,\|c\|_{L^0},$$

where the L⁰ norm is defined as

$$\|c\|_{L^0} = \int_T \|c(x)\|_{\ell^0}\,d\mu(x), \qquad \|y\|_{\ell^0} = \begin{cases} 0 & \text{if } y = 0\\ 1 & \text{otherwise,}\end{cases}$$
with μ denoting the Hausdorff measure. In order to minimize the full energy using specific solvers, the L⁰ term is iteratively approximated by weighted L¹ terms in the original (2012) work. This does however seem somewhat unnecessary, as a closed-form pointwise solution for c is easily found. In the duality based setting v may be calculated as before, and the minimizer in c is given pointwise as

$$c(x) = \begin{cases}\rho(v)(x) & \text{if } \|\rho(v)(x)\| > \beta/\lambda\\ 0 & \text{otherwise.}\end{cases}$$

This means that we simply have a thresholding step: if the data residual ρ(v)(x) is too big, it is considered an occlusion, and motion in the area (which in principle is not defined) is fully determined by the regularization term.
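The pointwise minimization behind this thresholding can be sketched as follows; the parameter names `lam` and `beta` mirror the (reconstructed) weights on the data and sparsity terms and are illustrative:

```python
import numpy as np

def occlusion_compensation(rho, lam=50.0, beta=25.0):
    """Pointwise minimizer of lam*|rho - c| + beta*1{c != 0}:
    keep c = 0 unless the residual is large enough that paying the
    sparsity penalty beta is cheaper than the data cost lam*|rho|."""
    return np.where(np.abs(rho) > beta / lam, rho, 0.0)
```

Pixels with residual magnitude above β/λ are declared occluded (c swallows the residual); everywhere else the data term is left to drive the flow.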
2.2 Algorithm

This section will describe a general algorithmic framework for estimating different types of duality based optical flows from energies of the form (2.3). As already mentioned the duality based approach has good computational properties, because the solutions to the two sub-energies may be computed in parallel. This makes the algorithm perfectly suited for massively parallel processors. The structure of the algorithm is depicted in Algorithm 2.1.

    Data: Two images I_0 and I_1
    Result: The optical flow field v
    for l = l_max to 0 do                     // Pyramid levels
        Downsample the images I_0 and I_1 to current pyramid level
        for w = 0 to w_max do                 // Warping
            Compute v as the minimizer of E_1
            for i = 0 to i_max do             // Inner iterations
                Compute u as the minimizer of E_2
            end
        end
        Upscale v and u to next pyramid level
    end

Algorithm 2.1: Computation of duality based optical flow.
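The control flow of Algorithm 2.1 can be sketched as follows (here in Python for readability; the sub-energy solvers and resampling routines are caller-supplied placeholders, and the real implementation is in CUDA C):

```python
import numpy as np

def duality_based_flow(I0, I1, solve_E1, solve_E2, downsample, upscale,
                       l_max=70, w_max=90, i_max=20):
    """Skeleton of Algorithm 2.1. solve_E1/solve_E2 stand in for the
    minimizers of the two sub-energies; downsample/upscale stand in for the
    pyramid resampling described in the text."""
    v = u = None
    for level in range(l_max, -1, -1):          # pyramid levels, coarse to fine
        J0 = downsample(I0, level)
        J1 = downsample(I1, level)
        if v is None:
            v = np.zeros(J0.shape + (2,))       # start from the zero flow
            u = v.copy()
        for _ in range(w_max):                  # warps: relinearize the data term
            v = solve_E1(J0, J1, v, u)
            for _ in range(i_max):              # inner iterations on E_2
                u = solve_E2(v, u)
        if level > 0:
            v, u = upscale(v, level), upscale(u, level)
    return v
```

With identity resampling and trivial solvers, the skeleton performs exactly (l_max + 1) · w_max outer solves and i_max inner solves per warp, matching the nesting of Algorithm 2.1.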
The standard settings of the algorithm are described in the following. Unless specifically mentioned, these are the settings used in the calculations described in the rest of this thesis.

Pyramid An image pyramid is built, where on each level, prior to downsampling to the next pyramid level, the images are smoothed with a Gaussian function of standard deviation σ. The downsampling is done by means of linear interpolation. Evaluation at non-pixel positions in images is done by bicubic interpolation.

Warping At the beginning of each warp, the image I_1 is warped according to the current estimate v. If median filtering is used to remove outliers (it is not in the standard setting), it is performed on v prior to the warping of I_1.

Upscaling Flows are upscaled using linear interpolation, and their values are divided by the downscale factor of the pyramid, in order for vector lengths to match the current image size. This is followed by an application of a 3 × 3 median filter.
The standard parameters of the flow algorithm are given in Table 2.1.

Table 2.1: Standard parameters of the optical flow algorithm depicted in Algorithm 2.1.

    Parameter          value
    l_max              70
    downscale factor   0.95
    σ                  √2/4
    w_max              90
    i_max              20
    λ                  50
    θ                  0.2
The algorithm has been implemented in CUDA C in order to take advantage of the thousands of cores on modern GPUs.
2.3 Results

This section presents optical flow results for the TV-L1 algorithm using the implementation proposed in the previous section. In addition some of the extensions from Section 2.1 are considered and analyzed.
Choosing TV definition Example 2.1.7 introduced three different definitions of vectorial total variation. Using the algorithm described in the previous section, we can consider the difference in terms of accuracy of the resulting TV-L1 optical flow algorithm on the Middlebury Optical Flow Database training data (Baker et al. 2011). Figure 2.1 shows the average endpoint errors (AEEs) for the different definitions on the training sequences, as a function of λ. We see that generally the definitions that couple channels, TV_F and TV_J, perform better than the uncoupled regularization TV_S. Figure 2.2 shows the average performance over all test sequences, and we see an average difference in AEE of approximately 0.01 between using coupled or uncoupled regularization, for the respective optimal choices of λ. This may not seem like much, but since optical flow algorithms today have such high accuracies, an average improvement of this order will give significantly different rankings of the algorithms on for example the Middlebury ranking (as of January 14, 2013 the average difference between the two top ranking methods in terms of AEE, MDP-Flow2 and NN-Field, is 0.0025). Considering the coupled methods there seems to be little difference in the optimal area, and both choices seem reasonable. Based on this analysis we choose the definition TV_J of Goldlücke et al. (2012) as our standard method, because of its nice theoretical properties.
[Figure: per-sequence AEE curves as a function of λ; panels: Dimetrodon, Grove2, Grove3, Hydrangea, RubberWhale, Urban2, Urban3, Venus; legend: TV−S, TV−F, TV−J.]

Figure 2.1: Average endpoint errors for the three different definitions of total variation TV_S, TV_F, and TV_J on training data plotted as a function of λ.
Comparison of methods With the algorithmic setup in place, we can evaluate the accuracy of the method. For a baseline method we consider the presented algorithm with standard parameters, using a one-dimensional data term, and TV_J for regularization. Table 2.2 shows results for this method, compared to results of the original algorithm (2007), and an improved version of this algorithm presented by Wedel et al. (2009a), which among other things uses structure-texture decomposition of the input images to remove lighting artifacts, and uses intermediate median filtering. Finally we compare to the results presented in (2011), which are based on a color image data term. We see that despite its simplicity, the baseline method presented here gives slightly better average results than the other variants.
Table 2.2: Average endpoint error results for the Middlebury optical flow database training sequences for different variants of the TV-L1 optical flow algorithm. Baseline is the method following Algorithm 2.1 with standard parameters, using a 1D data term with grayscale images, and TV_J for regularization. The second method is the RGB algorithm presented in (2011). The third method is the original method (2007), with results taken from (2009a). The last column holds results of the so-called TV-L1-improved algorithm (Wedel et al. 2009a). Bold indicates the best result among the four.

    Sequence      baseline   RGB (2011)   original (2007)   improved (2009a)
    Dimetrodon    0.17       0.16         0.26              0.19
    Grove2        0.14       0.15         0.19              0.15
    Grove3        0.57       0.57         0.76              0.67
    Hydrangea     0.21       0.25         0.26              0.15
    RubberWhale   0.13       0.17         0.22              0.09
    Urban2        0.33       0.36         0.65              0.32
    Urban3        0.51       0.50         1.07              0.63
    Venus         0.31       0.49         0.48              0.26
    average       0.296      0.331        0.486             0.308
[Figure: average AEE over all sequences as a function of λ; legend: TV−S, TV−F, TV−J.]

Figure 2.2: Average AEE computed across all sequences in Figure 2.1.

This evaluation is of course not optimal, since it is based on training data; however, it seems to be a general fact that carefully optimized implementations are of great importance when computing optical flow.
Figure 2.3: Frames 10 and 11 of the Army sequence of the Middlebury Optical
Flow Database test set.
Smoothness and visualization of optical flows Consider the two frames from the Army sequence in Figure 2.3. The ground truth motion represented in color coding can be found in Figure 2.4. In this representation hue codes the direction of the flow vectors, and saturation indicates the length of the vectors. Black corresponds to occluded areas.

Figure 2.5 shows the estimates produced with the baseline method for three different values of λ. We see that with the standard choice of λ = 50 we are able to properly estimate most details. Increasing λ produces a less regular flow with more noise artifacts, and decreasing it produces flows where small structures are blurred out. In contrast, substituting the data term with the interval data term presented earlier, and varying the size of the interval where no penalty is given, produces another type of smooth flows. With this data term small variations are ignored in the minimization, and the estimation is driven by big differences that may be found at for example edges. This is evident from the results, where edges are preserved quite well, while the interiors of objects are very smooth.
Figure 2.4: Ground truth optical flow between the two frames from Figure 2.3. Flow vectors are coded according to the color legend in the lower right corner.

Figure 2.5: Optical flow between the two frames from Figure 2.3 computed with the baseline method for λ = 10, λ = 50, and λ = 100.

Figure 2.6: Optical flow between the two frames from Figure 2.3 computed with the interval data term for c₁ = 100 and c₂ = 0.02, 0.01, and 0.005, with everything else as in the baseline method.
Chapter 3
Applications
This chapter presents five novel applications of the optical flow methods reviewed so far. The first application considers registration of X-ray images of hands. The second application presents a specialized registration problem for 2D chromatograms, and introduces a novel solution strategy. The last three applications look into interpolation in image sequences. The first of these presents a general method where optical flow data terms are reparametrized to fit the interpolation assumptions, which turns out to produce superior results compared to conventional methods. The last two applications are related to distributed video coding. We consider interpolation when depth information of the scene is available, and finally we consider how to give new estimates of in-between frames using partially decoded information about the frame in question.
3.1 Registration of X-ray images
One of the major application areas of registration algorithms is medical imaging, where proper analysis of the data is often impossible without first registering the images. In this setting TV-L1 registration offers a number of advantages over typical methods, in particular the parallel nature of the algorithms and the associated speed when implemented on massively parallel processors, but also the robustness of the L1 norms used in both data and regularization terms. This makes TV-L1 a good choice for out-of-the-box registration for a large body of problems. On the downside, TV-L1 does not necessarily produce diffeomorphic registrations, and in applications where this is of importance, one may either try to manually enforce this behavior, or consider methods for diffeomorphic registration (see for example (2012) and references therein). Furthermore, if images are taken on completely different imaging devices, and cannot easily be brought on similar scales, one will prefer a data term suited for these types of problems. One type of data term that can handle such problems is mutual information, which has recently been considered in this setting. The TV-L1 optical flow method has previously been used for registration in (2007), where CT scans of lungs as well as brain MRI images were registered.

Consider the X-ray images of two different hands in Figure 3.1. Registering I_1 to I_0 is by no means a simple task: The two hands have significantly different bone structure; the contrast of the images and the amount of noise is different; and the two lead tags that are placed in the images represent information that should not be registered. This last issue may very well ruin the registration process for many types of methods. Undeterred by these facts, a registration using the standard settings with λ = 10 and one level of 3 × 3 median filtering has been performed. The registration results can be found in Figure 3.2. It can be seen that we get a quite decent registration, with the only big artifact being caused by the lead tag in I_1. However, this artifact only causes local changes, which do not propagate down to the hand, as can be seen in the deformation visualization. Note that the lead tag in I_0 does not cause any artifacts. This is due to the fact that the corresponding area in I_1 is homogeneous, which means that the updates caused by the data term in this area are essentially zero (cf. the thresholding scheme above).

3.1.1 Registration of structural images
The problem of registering the images in Figure 3.1 is in many ways different from the optical flow problem applied to video sequences. In video data, the same objects occur in consecutive pairs of images, and we want to use the fine details in the images to get correct correspondences. In the case of images of two different objects, a true one-to-one correspondence does not exist, and trying to match fine details may just amount to noise matching, both in terms of actual noise, and of person specific structure that may from a larger perspective be considered as inter-personal serially correlated noise.

This means that one may benefit from filtering or regularization of the images, thereby enhancing dominant structures and removing fine details. Such an approach contrasts with the successful structure-texture decomposition used in several optical flow algorithms (Wedel et al. 2009a, Sun et al. 2010), where the structural part of the images obtained from regularization is subtracted from the original image. This produces an image that mainly contains texture details. The structures that are removed often include shadows and general illumination changes, which will in turn give better estimates.
First we consider regularizing the image I_0 using total variation regularization

$$E_{\mathrm{ROF}}(I) = \lambda\int_T \|I(x) - I_0(x)\|^2\,dx + \int_T \|\nabla I(x)\|\,dx. \tag{3.1}$$

Furthermore, we consider a type of regularization that enhances sparsity even further, where the L¹ norm of the gradient is replaced by an L⁰ norm that penalizes all deviations from 0 equally

$$E_{L^0}(I) = \lambda\int_T \|I(x) - I_0(x)\|^2\,dx + \int_T \|\nabla I(x)\|_{\ell^0}\,d\mu(x), \tag{3.2}$$

where μ denotes the Hausdorff measure.
The ROF model may be minimized effectively using the dual algorithm of Chambolle (2004), and (3.2) may be computed using the method described in (2011). As opposed to many other types of regularization, these methods are edge preserving, and the L⁰ regularization of the gradient produces cartoon-like results with strong edges and large flat regions of zero gradient.

Figure 3.1: X-ray images I_0 and I_1 of two different hands. Sources are http://mrmackenzie.co.uk/2011/11/01/xraysinmedicine/ and http://images.suite101.com/460740_com_28_hand.jpg.

Figure 3.2: Registration of I_1 to I_0 using the standard TV-L1 algorithm with λ = 10 and one level of 3 × 3 median filtering. Panels: I_1 registered; deformation of coordinate system; difference between I_0 and registered I_1.

The regularization results of these methods, with regularization parameters respectively 10 and 200, can be found in Figure 3.3. The corresponding registrations, computed using the same parameters as before, are found in figures 3.4 and 3.5. The registrations look very similar to the ones found in Figure 3.2, but the regularized images tend to deform the coordinate system outside the hand to a greater extent than the non-regularized case. The reason for this is that the gradients driving the registration are small or zero inside the bones after regularization. This is desirable from a practical point of view, since the purpose of these types of registrations is typically to align bones in a similar coordinate system, as opposed to bending and deforming the interior of bones to produce a good match. A somewhat similar effect can also be achieved by using the interval data term from the previous chapter instead of the L1 norm in the optical flow algorithm.
Figure 3.3: Results of ROF and L⁰ regularization on the images from Figure 3.1. Panels: ROF regularized I_0; ROF regularized I_1; L⁰ regularized I_0; L⁰ regularized I_1.
Figure 3.4: Registration of I_1 to I_0 after ROF regularization, using the standard TV-L1 algorithm with λ = 10 and one level of 3 × 3 median filtering. Panels: I_1 registered; deformation of coordinate system; difference between I_0 and registered I_1.

Figure 3.5: Registration of I_1 to I_0 after L⁰ regularization, using the standard TV-L1 algorithm with λ = 10 and one level of 3 × 3 median filtering. Panels: I_1 registered; deformation of coordinate system; difference between I_0 and registered I_1.
Figure 3.6: Example of a chromatogram along with the absorbance (A.U.) curves corresponding to two fixed wavelengths.
3.2 Registration of 2D chromatograms
Chromatography is a process for separating mixtures. One use of chromatography is measuring relative proportions of analytes in a number of mixtures, to determine differences. An example of a 2D chromatogram is shown in Figure 3.6. The chromatograms we are considering have been generated using ultra-high performance liquid chromatography with diode-array detection (Petersen et al. 2011). The chromatograms consist of 209 wavelengths, each measured at 24,000 retention times. The subject of the analysis is rapeseed seedlings having been exposed to different levels of glyphosate (commonly known as Roundup®).

The images arising from this procedure will have shifts in retention time, but because of the experimental setup, no such shifts occur in the wavelength dimension. This means that we have a one-dimensional registration problem for a two-dimensional image.

Consider the single wavelength of four chromatograms shown in Figure 3.7. The retention time shifts are clearly visible. Furthermore there seems to be varying detector sensitivity, resulting in some of the curves consistently having higher peaks than others. Finally there are small variations that cannot be explained by the mentioned issues, and which can be ascribed to serially correlated effects and noise.
3.2.1 Registration algorithm
Given two chromatograms I_0, I_1 : T → ℝ of size k × n, where T = T_w × T_t, consider the problem of estimating the disparity v : T_t → T_t such that I_1(w, t + v(t)) is properly registered to I_0(w, t). Because of the varying detector sensitivity, a robust data term such as an L¹ norm is preferable.

From the point of view of the previous chapter, the natural formulation of the data term is as a vector valued problem. Let

$$I_i(t) = \begin{pmatrix} I_i(w_1, t)\\ \vdots\\ I_i(w_k, t)\end{pmatrix}.$$

Figure 3.7: A range of retention times for a single wavelength for four chromatograms.
The optical flow constraint may then be written as

$$I_1(t + v(t)) - I_0(t) = 0.$$

Linearizing this around a given estimate v⁰, we get the following system of equations

$$\underbrace{\partial_t I_1(t + v^0)}_{a}\,v(t)\;\underbrace{-\;\partial_t I_1(t + v^0)\,v^0 + I_1(t + v^0) - I_0(t)}_{b} = 0.$$

Considering an L¹ norm of this linearization of the data term, we see that case (ii) of the vector valued minimization proposition is very easily calculated; however, it seems unlikely that it will ever be the case that b ∈ Im a for just a moderate number of wavelengths. This means that we will almost surely be in the less attractive case (i) where we have to minimize by some iterative procedure.

A novel alternative for registering this dataset has been described in Rakêt & Markussen (2014). The idea is to treat the one-dimensional vector valued registration problem as a two-dimensional problem, and couple the different vector channels through the regularization rather than through the data term. The method is generally applicable, and works by posing a d-dimensional registration problem with data taking values in a k-dimensional space as a registration problem with one-dimensional data on a (d + 1)-dimensional domain. This is done by treating the vector channels as an added dimension to the domain. This way the regularization will be (d + 1)-dimensional, and by enforcing strong (or increasing) weight on the regularity across this new dimension, information is propagated between the different channels of the image to produce a registration that is homogeneous along the new dimension.
As described above, we start out by estimating disparities for each wavelength. In the given example we are interested in a robust L¹ norm for the data term. The robustness is important because of the varying detector sensitivity and serially correlated effects, where for example an L² norm may cause problems in relation to outliers. For regularization, we are interested in a term that, in addition to imposing regularity on the estimated disparities, regularizes across wavelengths. Since one must expect drifts in retention time to be continuous, the registration should be smooth, and therefore we will regularize using squared gradient magnitude instead of total variation. The energy to be minimized looks as follows

$$E(v) = \lambda\int_T \|I_1(w, t + v_w(t)) - I_0(w, t)\|\,dw\,dt + \int_T \|\nabla_{w,t}\, v_w(t)\|^2\,dw\,dt.$$
This functional is minimized following the methods described in Chapter 2, where the data term is iteratively approximated by its first-order Taylor approximation around the given estimate v_w⁰,

$$\rho(v)(w,t) = \partial_t I_1(w, t + v_w^0(t))\,\big(v_w(t) - v_w^0(t)\big) + I_1(w, t + v_w^0(t)) - I_0(w, t).$$

Furthermore data fidelity and regularization are decoupled by means of a quadratic proximity term

$$E(v, v^0) = \lambda\int_T \|\rho(v_w)(w,t)\|\,dw\,dt + \frac{1}{2\theta}\int_T \|v_w(t) - v_w^0(t)\|^2\,dw\,dt + \int_T \|\nabla_{w,t}\, v_w^0(t)\|^2\,dw\,dt,$$

where θ is sufficiently small. The minimizer in v_w is found pointwise to be

$$v_w(t) = v_w^0(t) + \begin{cases} \lambda\theta\,\partial_t I_1(w, t + v_w^0(t)) & \text{if } \rho(v^0)(w,t) < -\lambda\theta\,\partial_t I_1(w, t + v_w^0(t))^2\\[4pt] -\lambda\theta\,\partial_t I_1(w, t + v_w^0(t)) & \text{if } \rho(v^0)(w,t) > \lambda\theta\,\partial_t I_1(w, t + v_w^0(t))^2\\[4pt] -\dfrac{\rho(v^0)(w,t)}{\partial_t I_1(w, t + v_w^0(t))} & \text{otherwise.}\end{cases}$$
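The pointwise update can be sketched directly; the function below is an illustrative NumPy transcription of the three cases, assuming scalar disparities per wavelength:

```python
import numpy as np

def tvl1_threshold_step(rho, It, v0, lam, theta):
    """Pointwise minimizer of lam*|rho0 + It*(v - v0)| + (1/(2*theta))*(v - v0)^2,
    i.e. the standard TV-L1 thresholding step. rho holds the residual at v0,
    It the retention-time derivative of the warped image."""
    thresh = lam * theta * It**2
    safe_It = np.where(It == 0, 1.0, It)   # guard against division by zero
    step = np.where(rho < -thresh, lam * theta * It,
           np.where(rho > thresh, -lam * theta * It,
                    -rho / safe_It))
    # Where It == 0 the data term carries no information: keep v0 unchanged.
    step = np.where(It == 0, 0.0, step)
    return v0 + step
```

Note how a vanishing derivative leaves the estimate untouched, which is exactly the behavior exploited in the X-ray registration example, where homogeneous regions produce essentially zero data-term updates.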
The problem in v_w⁰ is just a standard Tikhonov regularization problem, and can easily be solved using standard methods. E is minimized iteratively in a coarse-to-fine manner, where the input images and the corresponding disparities are gradually upsampled in the retention time dimension, but the wavelength dimension is kept at its original size. Following Algorithm 2.1 we use l_max = 160 and a scaling factor between levels of 0.97, yielding a downsampling factor at the coarsest level of approximately 130. w_max = 100 warps are performed at each level, λ was set to 60, and θ was fixed at 0.1.
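As an illustration, the Tikhonov sub-problem can be solved with simple Jacobi iterations on its Euler-Lagrange equation. The anisotropic weight on the wavelength (row) direction, `w_weight`, and all values below are illustrative choices, not the thesis implementation:

```python
import numpy as np

def tikhonov_smooth(v_hat, theta=0.1, w_weight=10.0, n_iter=200):
    """Jacobi iterations for the Euler-Lagrange equation of
    (1/(2*theta))*||v - v_hat||^2 + sum of squared finite differences,
    with weight w_weight on the wavelength (row) direction."""
    v = v_hat.copy()
    for _ in range(n_iter):
        # Neighbour values with replicate (Neumann) boundary handling.
        up    = np.vstack([v[:1], v[:-1]])
        down  = np.vstack([v[1:], v[-1:]])
        left  = np.hstack([v[:, :1], v[:, :-1]])
        right = np.hstack([v[:, 1:], v[:, -1:]])
        num = v_hat / theta + 2 * w_weight * (up + down) + 2 * (left + right)
        den = 1 / theta + 4 * w_weight + 4
        v = num / den        # Jacobi update; system is diagonally dominant
    return v
```

Increasing `w_weight` propagates information between wavelengths, which is precisely the coupling mechanism used to obtain a registration that is homogeneous across the wavelength dimension.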
Figure 3.8 shows the individual wavelength registration curves (gray) of the described method, as well as the average (red), for two 2D chromatograms. The weighting of the wavelength dimension of the gradient in the Tikhonov regularization is respectively a factor 0 (i.e. registering each wavelength independently) and 10. As we can see, with the higher weight we are able to propagate information between wavelengths very well, and end up with a uniform result across wavelengths. Note in addition that the average curves are quite different in the two cases.

The average registration is used as the final single disparity v : T_t → T_t. The registration is then done by warping the chromatograms according to v for each wavelength. The result of the registration procedure on the data in Figure 3.7 can be found in Figure 3.9. We see that the data is very well aligned after registration.

Figure 3.8: Registration curves of individual wavelengths (gray) with the average registration plotted on top (red). Left: independent registration of wavelengths. Right: dependent registration of wavelengths.

Figure 3.9: The chromatograms from Figure 3.7 registered along retention time.
3.3 Image interpolation with a symmetric optical flow constraint

Frame interpolation is the process of creating intermediate images in a sequence of known images. The process has many uses, for example video post-processing and restoration, temporal upsampling in HDTVs to enhance the viewing experience, as well as a number of more technical applications, for example in video coding.

In this section we will review optical flow based frame rate upsampling, which performs interpolation along the motion trajectories. In particular, we will review the method presented in (2012a), where the optical flow energy is reparametrized such that it fits better to the given problem. The reparametrized energy has a symmetric data fidelity term that uses both surrounding frames as references. We show that one can improve modern frame interpolation methods substantially by this powerful generic trick, which can be incorporated in existing schemes without requiring major adaptations. We analyze the reparametrization, and show experimentally that it has a substantial effect on the stability and robustness of the interpolation process.

The idea of symmetrizing data matching terms to achieve better results has already established its usefulness in other areas. In image registration, Christensen & Johnson (2001) explored the benefit of penalizing consistency, by jointly estimating forward and backward transforms, and requiring that they were inverses of one another. A similar idea was applied to the optical flow problem in (2007a), imposing an additional consistency term. Later that same year, (2007b) proposed a reparametrization similar to the one derived here, in order to avoid a reference frame, and thereby increase flow consistency. However, they did not use the obtained symmetric flow directly, but interpolated flow values at pixel positions of a reference image in order to obtain a flow comparable to the standard asymmetric flow. Recently, Chen (2012) used a symmetric data term for surface velocity estimation, noting the property that motion vector lengths are halved, which in turn gives better handling of large displacements.

Apart from being algorithmically different, the difference between the justification for the reparametrization given here and the earlier justifications, for example that of Chen (2012), is that we have chosen the symmetric data fidelity term because it explicitly models the standard interpolation assumption, rather than improving some notion of consistency, or better handling large displacements. In turn this also means that we may use the estimated flows directly on the unknown frame, and thereby avoid the problems related to temporal warping. As we will show, the mentioned benefits are clearly reflected in the results. It is demonstrated that using a symmetric flow for interpolation is generally better than using either forward or backward flows or both.
3.3.1 Motion Compensated Frame Interpolation
Given two images $I_0$ and $I_1$ and an estimate of the (forward) optical flow $v_f$, we are interested in estimating the in-between image $I_{1/2}$ (the methods are easily extended to any in-between frame $I_t$, $t \in (0, 1)$). A simple approach is to assume that the motion vectors are linear through $I_{1/2}$, and then fill in $I_{1/2}$ using the computed flow. However, since $v_f$ is of subpixel accuracy, the points $x + \frac{1}{2}v_f(x)$ that are hit by the motion vectors are generally not pixel positions. This is often solved by warping the flow to the temporal position of the intermediate frame $I_{1/2}$, by which one defines a new flow $v_f^{1/2}$ from $I_{1/2}$ to $I_1$,
$$v_f^{1/2}\big(\operatorname{round}(x + \tfrac{1}{2}v_f(x))\big) = \tfrac{1}{2}v_f(x), \qquad (3.3)$$
where the round function rounds the argument to the nearest pixel position in the domain. There are some drawbacks to this approach. First, if the area around $x$ in $I_0$ is occluded in $I_1$, there are likely multiple flow candidates assigned to the point $\operatorname{round}(x + \frac{1}{2}v_f(x))$. In the converse situation, i.e. disocclusion from $I_0$ to $I_1$, there may be pixels that are not hit by a flow vector, thus leaving holes in the flow. While the first problem can be solved by choosing the candidate vector with the best data fit, that is, the candidate $v_f$ for which $\|I_1(x + v_f(x)) - I_0(x)\|$ is smallest, the solution to the problem of disocclusions is not that simple. Here we will simply fill the holes in the flow field by an outside-in filling strategy.
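The warping step (3.3), the best-data-fit resolution of occlusion conflicts, and the hole filling can be sketched as follows. This is a minimal NumPy illustration, not the thesis implementation: the function name, the brute-force loops, and the simple neighbor-averaging hole filling are all illustrative choices.

```python
import numpy as np

def warp_flow_to_center(I0, I1, vf):
    """Warp a forward flow vf (from I0 to I1) to the temporal position of
    the intermediate frame, following Eq. (3.3): each vector is halved and
    assigned to the rounded half-way point. Occlusion conflicts are resolved
    by the best data fit, and disocclusion holes are filled from neighbors."""
    H, W = I0.shape
    v_half = np.full((H, W, 2), np.nan)
    best_fit = np.full((H, W), np.inf)
    for y in range(H):
        for x in range(W):
            dy, dx = vf[y, x]
            cy = int(round(y + 0.5 * dy))       # rounded half-way position
            cx = int(round(x + 0.5 * dx))
            if not (0 <= cy < H and 0 <= cx < W):
                continue
            ty = min(max(int(round(y + dy)), 0), H - 1)
            tx = min(max(int(round(x + dx)), 0), W - 1)
            fit = abs(float(I1[ty, tx]) - float(I0[y, x]))
            if fit < best_fit[cy, cx]:          # keep best-fitting candidate
                best_fit[cy, cx] = fit
                v_half[cy, cx] = 0.5 * dy, 0.5 * dx
    while np.isnan(v_half).any():               # outside-in hole filling
        for y, x in np.argwhere(np.isnan(v_half[..., 0])):
            nbrs = [v_half[ny, nx]
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                    if 0 <= ny < H and 0 <= nx < W
                    and not np.isnan(v_half[ny, nx, 0])]
            if nbrs:
                v_half[y, x] = np.mean(nbrs, axis=0)
    return v_half
```

For example, a constant forward flow of two pixels yields a centered flow of one pixel everywhere, with the column left uncovered by the warp filled in from its neighbors.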
With a dense flow we can then interpolate $I_{1/2}$ using the forward scheme
$$I_{1/2}(x) = \frac{1}{2}\Big(I_0\big(x - v_f^{1/2}(x)\big) + I_1\big(x + v_f^{1/2}(x)\big)\Big), \qquad (3.4)$$
or consider the backward flow $v_b$ (i.e. the flow from $I_1$ to $I_0$) and use a backward scheme accordingly. We will in addition consider a bidirectional interpolation scheme, where the frame is interpolated as the average of the frames obtained by the forward and backward schemes.
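The forward scheme (3.4) can be sketched as follows (illustrative NumPy code with a simple bilinear sampler; names are assumptions, and the thesis GPU implementation is not reproduced here):

```python
import numpy as np

def sample(I, y, x):
    """Bilinear sampling of image I at (possibly subpixel) positions,
    with coordinates clamped to the image domain."""
    y = np.clip(y, 0, I.shape[0] - 1.0)
    x = np.clip(x, 0, I.shape[1] - 1.0)
    y0 = np.floor(y).astype(int)
    x0 = np.floor(x).astype(int)
    y1 = np.minimum(y0 + 1, I.shape[0] - 1)
    x1 = np.minimum(x0 + 1, I.shape[1] - 1)
    wy, wx = y - y0, x - x0
    return ((1 - wy) * (1 - wx) * I[y0, x0] + (1 - wy) * wx * I[y0, x1]
            + wy * (1 - wx) * I[y1, x0] + wy * wx * I[y1, x1])

def interpolate_forward(I0, I1, v_half):
    """Forward scheme of Eq. (3.4): sample I0 backward and I1 forward
    along the centered flow v_half (defined on the grid of I_1/2)."""
    H, W = I0.shape
    yy, xx = np.mgrid[0:H, 0:W].astype(float)
    dy, dx = v_half[..., 0], v_half[..., 1]
    return 0.5 * (sample(I0, yy - dy, xx - dx) + sample(I1, yy + dy, xx + dx))
```

A backward scheme is obtained analogously from $v_b$, and the bidirectional scheme simply averages the two results.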
One can refine the interpolation methods by estimating occluded regions and selectively interpolating from the correct frame. We will not pursue any occlusion reasoning here, but refer to Ayvaci et al. (2012).
3.3.2 Reparametrizing Optical Flow for Interpolation
The approach presented in the previous section is the standard procedure for frame interpolation and serves as the backbone of many algorithms (Baker et al. 2011, Huang et al. 2011, Keller et al. 2010, Werlberger et al. 2011). In this section we will reparametrize the original energy functional so that the recovered flow is better suited for interpolation purposes. The reparametrization turns out to be beneficial on a number of levels: it makes the temporal warping of the flow superfluous, eliminates the need to calculate flows in both directions, improves the handling of large motion, and increases overall robustness.
The original optical flow energy functional takes as its argument an optical flow $v$ that is defined on a continuous domain. In practice, however, we only observe images at discrete pixels, and the optical flow is typically only estimated at the points corresponding to the pixels in $I_0$. Since we assume that the intermediate
frame $I_{1/2}$ can be obtained by linearly following the flow vectors, we propose to reparametrize the data fidelity functional of the TV-L1 optical flow energy using this assumption, so that it is given as
$$\frac{1}{2}\int_T \big\|I_1(x + v_s(x)) - I_0(x - v_s(x))\big\|\,\mathrm{d}x. \qquad (3.5)$$
We note that in this parametrization the coordinates of the optical flow match those of the intermediate frame $I_{1/2}$, and using this data term will thus eliminate the need for warping of the flow, since interpolation can be done directly by a scheme similar to (3.4). Because the motion vectors of the symmetric flow $v_s$ are only half of those of the forward or backward flows, we need to halve the corresponding $\lambda$ to keep the comparison fair, which is the reason for the factor $\frac{1}{2}$.
Linearizing the data matching term (3.5) around $v_0$ gives
$$\rho(v_s) = I_1(\cdot + v_0) - I_0(\cdot - v_0) + \big(J_{I_1}(\cdot + v_0) + J_{I_0}(\cdot - v_0)\big)(v_s - v_0). \qquad (3.6)$$
For grayscale images the corresponding split energy is easily minimized using Lemma 1 and the $L^2$ minimization described in the corresponding proposition. The differences between this linearization (3.6) and the conventional linearization are that we now allow subpixel matching in both surrounding images, and instead of a single Jacobian we have a sum of two. Thinking of this linearization as a finite difference scheme corresponding to a linearized differential form of the data fidelity term (Horn & Schunck 1981), we see that the temporal derivative is represented by a central finite difference scheme, as opposed to the typical forward differences. In addition, the sum of the two Jacobians should make the estimation procedure more robust to noise, as the noise amplification caused by derivative estimation is now averaged over two frames, a fact that has previously been used heuristically to improve accuracy in asymmetric flow estimation (Wedel et al. 2009a). Finally, we note that the motion vectors will only have half the length of those obtained from the regular parametrization. This will make the method better suited to handle large displacements compared to traditional methods that only make use of a one-sided linearization.
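The linearized symmetric residual (3.6) for grayscale images can be sketched like this. It is an illustration under simplifying assumptions: bilinear warping and `np.gradient` central differences stand in for the thesis implementation, and all names are invented for the example.

```python
import numpy as np

def sample(I, y, x):
    """Bilinear sampling with clamped coordinates."""
    y = np.clip(y, 0, I.shape[0] - 1.0)
    x = np.clip(x, 0, I.shape[1] - 1.0)
    y0, x0 = np.floor(y).astype(int), np.floor(x).astype(int)
    y1 = np.minimum(y0 + 1, I.shape[0] - 1)
    x1 = np.minimum(x0 + 1, I.shape[1] - 1)
    wy, wx = y - y0, x - x0
    return ((1 - wy) * (1 - wx) * I[y0, x0] + (1 - wy) * wx * I[y0, x1]
            + wy * (1 - wx) * I[y1, x0] + wy * wx * I[y1, x1])

def symmetric_residual(I0, I1, v0, vs):
    """Linearized symmetric data term rho(v_s) of Eq. (3.6) for grayscale
    images: both frames are warped half-way (by +v0 and -v0), and the
    temporal difference is corrected by the sum of the two gradients."""
    H, W = I0.shape
    yy, xx = np.mgrid[0:H, 0:W].astype(float)
    I1w = sample(I1, yy + v0[..., 0], xx + v0[..., 1])
    I0w = sample(I0, yy - v0[..., 0], xx - v0[..., 1])
    g1y, g1x = np.gradient(I1w)   # Jacobian of warped I1 (grayscale case)
    g0y, g0x = np.gradient(I0w)   # Jacobian of warped I0
    dv = vs - v0
    return I1w - I0w + (g1y + g0y) * dv[..., 0] + (g1x + g0x) * dv[..., 1]
```

On a linear intensity ramp the two Jacobians are equal, and the residual reduces to twice the gradient times the flow update, illustrating the central-difference interpretation discussed above.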
3.3.3 Results
Motion compensated frame interpolation finds many uses, ranging from technical applications such as video coding (Girod et al. 2005, Luong et al. 2012) to disciplines like improving viewing experience (Keller et al. 2010) or restoration of historic material (Werlberger et al. 2011). For the former type of application the reconstruction quality in terms of quantitative measures is of great importance. For the latter types it is hard to devise specific measures of quality, as the human visual system is very tolerant to some types of errors, while it instantly notices other types.
For the results presented in the following, the optical flows have been computed using the algorithm described earlier with standard parameters, except: on each level $w_{\max} = 60$ warps are performed with $i_{\max} = 5$ inner iterations, using the total variation regularization $\mathrm{TV}_J$.
Figure 3.10: Performance (MAIE) for varying $\lambda$ on the four High-speed camera training sequences (Beanbags, DogDance, MiniCooper, Walking) from the Middlebury Optical Flow benchmark, for the backward, bidirectional, forward and symmetric methods.
As a first experiment we compare the four different types of interpolation suggested in the previous sections, on the four High-speed camera training sequences of the Middlebury Optical Flow benchmark (Baker et al. 2011). Figure 3.10 shows the effect of varying the data term weight $\lambda$ in terms of the mean absolute interpolation error (MAIE). We see that the symmetric flow outperforms the conventional approaches, and that it is typically less sensitive to the choice of $\lambda$. In particular, the difficult Beanbags sequence, which contains large displacements, is handled much better by the symmetric scheme.
By evaluation on the Middlebury training set it was found that $\lambda = 35$ gave the best overall performance for the symmetric flow, and that $\lambda = 20$ gave the best performance for the other three methods. These values will be used in the rest of the experiments presented in this section.
Consider as a second example the results of interpolation under noise presented in Figure 3.11. This figure shows the mean square interpolation error performance of the four methods on the Beanbags sequence with additive $\mathcal{N}(0, \sigma^2)$ noise. The improved robustness of the symmetric interpolation method is clearly visible from the gaps between its MAIE and those of the asymmetric methods, which increase as the standard deviation of the noise increases. In addition, we see that the variance of the MAIEs across the independent replications is significantly lower for the symmetric method than for the three other methods.
Now consider, as a third example, the frames given in Figure 3.12. The sequence has large displacements (> 35 pixels) and severe deformations, which makes the estimation of $I_{1/2}$ very difficult. Figure 3.13 shows the three different flows $v_f$, $v_b$ and $v_s$ along with the corresponding interpolated frames. Zoom-ins of details can be found in Figure 3.14. We see that the result generated by the
Figure 3.11: Mean square interpolation error performance under additive $\mathcal{N}(0, \sigma^2)$ noise for varying $\sigma$, for the backward, bidirectional, forward and symmetric methods. Results are for the Beanbags sequence, and are based on 10 independent replications.
symmetric flow is visually more pleasing than the ones produced by the forward and backward flows, a fact that is also clearly reflected in the MAIEs and root mean square interpolation errors (RMSIE).
Finally, let us compare the method to the current state of the art. Table 3.1 holds the RMSIEs for six sequences from the Middlebury Optical Flow benchmark, together with results for a number of methods. While the results cannot fully match those of the (2008) method, which gives significantly better results on three of the sequences, our method outperforms all other approaches, including recent and much more complex methods.
Real-time performance In the presented setup we only have to compute a single flow field between two images and fill in the intermediate frame from the trajectories. The runtime of the interpolation is dominated by the time it takes to compute the flow field, and at a slight cost in accuracy (5 pyramid levels with a scale factor of 2, 30 warps per level, and 1 level of median filtering) the flow fields can be computed in real time (~35 fps) for 640 × 480 images using an NVIDIA Tesla C2050 GPU. This in turn means that we can do real-time frame doubling of 30 fps video footage at a resolution of 640 × 480 pixels.
Figure 3.12: Frames 7 ($I_0$), 10 ($I_{1/2}$) and 13 ($I_1$) of the Mequon sequence.
Figure 3.13: Results for the Mequon sequence. Top row: color coded optical flows; bottom row: interpolation results. Forward: MAIE 8.72, RMSIE 20.30; backward: MAIE 8.85, RMSIE 20.09; symmetric: MAIE 8.23, RMSIE 19.02. Zoom-ins of details can be found in Figure 3.14.

Figure 3.14: Details of the interpolated Mequon frame from Figure 3.13 (ground truth, forward, backward, and symmetric).
Table 3.1: RMSIE for different Middlebury sequences. Bold indicates the best result. †Results are taken from the original publications. ‡Marked algorithms have not been implemented by their respective authors, but are based on alternative implementations from the Middlebury Optical Flow database.

Method                     Dimetrodon   Venus   Hydrangea   RubberWhale
Symmetric TV-L1            1.93         3.45    3.36        1.46
—†,‡                       1.95         3.63    —           —
—†                         1.93         —       —           —
—†                         1.78         2.88    2.57        1.59
Pyramid Lucas-Kanade†,‡    2.59         3.73    —           —
—                          3.06         5.33    —           —
—                          2.56         3.93    —           —
—                          2.49         3.67    —           —

Method             MiniCooper   Walking
Symmetric TV-L1    3.96         2.89
—                  4.55         3.97
Figure 3.15: Frame 90 of the Breakdancers sequence from Microsoft Research, with its corresponding depth map.
3.4 Image interpolation using depth data
In recent years, the 3DTV and 2D-plus-depth formats have seen increasing popularity. While conventional methods for frame interpolation usually work on a pair of outer images similar to the one to be interpolated, these hybrid formats offer some new challenges and opportunities.
Distributed video coding (Girod et al. 2005) provides an interesting application of frame interpolation, and variations of both standard and symmetric optical flow based methods have been used in this area (Huang et al. 2011, Rakêt et al. 2012b). In distributed video coding, a set of key frames is coded using
conventional coding, and intermediate frames are coded using Wyner-Ziv coding (Aaron et al. 2002). From the point of view of this application, the key frames can be considered given, and in order to decode the intermediate Wyner-Ziv frames, estimates to be used as side information in the decoding process must be generated. For the 2D-plus-depth format, one may code the depth map very
efficiently (Zamarin & Forchhammer 2012), and use the depth frames in addition to the given key frames. This is beneficial, since depth frames usually contain most of the information needed for proper motion estimation, as can be seen in Figure 3.15. The following sections describe a scheme that was devised for Salmistraro, Rakêt, Zamarin, Ukhanova & Forchhammer (2013). The converse scheme, where texture frames are used to model depth movement, has been investigated in Salmistraro, Zamarin, Rakêt & Forchhammer (2013).
3.4.1 Optical flow computation using brightness and depth
Denote by $I_{t-1}$, $I_{t+1}$ the two (brightness) key frames, which we want to use for interpolating the intermediate frame $I_t$. Furthermore, let two depth frames $D_t$ and $D_{t+1}$ be given. We can generalize the approach presented in Section 3.3.2, so that in addition to the symmetric data term (3.5) we also include an asymmetric term for the depth frames in the energy
$$E(v) = \lambda_1 \int_T \big\|I_{t+1}(x + v(x)) - I_{t-1}(x - v(x))\big\|\,\mathrm{d}x + \lambda_2 \int_T \big\|D_{t+1}(x + v(x)) - D_t(x)\big\|\,\mathrm{d}x + \int_T \|J v(x)\|\,\mathrm{d}x. \qquad (3.7)$$
With two data terms, this energy does not fit into any of the methods described so far unless $\lambda_1 = 0$ or $\lambda_2 = 0$. For a sum of two $L^1$ norms the solution may, however, be found explicitly by means of the results given earlier.
The advantage of the full formulation (3.7), as opposed to a purely symmetric data term ($\lambda_2 = 0$), is that we have a smaller temporal gap on the depth data, which means that we may recover some of the nonlinear motion between the two key frames. In addition, this smaller gap produces a better estimate, as the displacements are smaller. The outer key frames may then help getting the apparent motion right where the depth frames do not supply enough information, for example for texture, and for shadows that do not show up in the depth maps.
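A discrete evaluation of the combined energy (3.7) can be sketched as follows. This is an illustration only: nearest-neighbor warping and `np.gradient` stand in for proper subpixel sampling, pixel sums stand in for the integrals, and all names are assumptions.

```python
import numpy as np

def energy(I_prev, I_next, D_t, D_next, v, lam1, lam2):
    """Discrete version of Eq. (3.7): a symmetric brightness term, an
    asymmetric depth term, and a total variation penalty on the flow."""
    H, W = I_prev.shape
    yy, xx = np.mgrid[0:H, 0:W]

    def shift(I, dy, dx):  # nearest-neighbor warp with clamping
        y = np.clip(np.round(yy + dy).astype(int), 0, H - 1)
        x = np.clip(np.round(xx + dx).astype(int), 0, W - 1)
        return I[y, x]

    dy, dx = v[..., 0], v[..., 1]
    # Symmetric brightness term: I_{t+1}(x + v) vs. I_{t-1}(x - v).
    data_tex = np.abs(shift(I_next, dy, dx) - shift(I_prev, -dy, -dx)).sum()
    # Asymmetric depth term: D_{t+1}(x + v) vs. D_t(x).
    data_depth = np.abs(shift(D_next, dy, dx) - D_t).sum()
    # TV penalty: Frobenius norm of the flow Jacobian, summed over pixels.
    dyy, dyx = np.gradient(dy)
    dxy, dxx = np.gradient(dx)
    tv = np.sqrt(dyy**2 + dyx**2 + dxy**2 + dxx**2).sum()
    return lam1 * data_tex + lam2 * data_depth + tv
```

Setting `lam2 = 0` recovers a purely symmetric (T2T-style) energy, and `lam1 = 0` a purely depth-driven one, matching the special cases discussed in the results below.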
3.4.2 Interpolation
When interpolating a frame in an image sequence, we are interested in using information from both surrounding frames. Thus we compute the forward flow $v_f$ as described in the previous section, and a backward flow $v_b$, where we interchange $I_{t+1}$ and $I_{t-1}$ and replace $D_{t+1}$ with $D_{t-1}$.
The results are asymmetric because of the depth information, and thus a symmetric interpolation such as (3.4) should not be used. Instead we simply interpolate by
$$I_t(x) = \frac{1}{2}\Big(I_{t-1}(x + v_b(x)) + I_{t+1}(x + v_f(x))\Big),$$
where the subpixel locations are evaluated using bicubic interpolation.
3.4.3 Results
For the energy (3.7) it is natural to consider three distinct cases. These will be denoted by T2T ($\lambda_2 = 0$), D2T ($\lambda_1 = 0$), and DT2T ($\lambda_1 \neq 0$, $\lambda_2 \neq 0$).
The motion estimates are recovered following the algorithm with standard settings, except that $\ell_{\max} = 65$, $i_{\max} = 10$, and $\theta = 0.5$ for T2T and DT2T, while $\theta = 0.35$ for D2T. For the T2T method $\lambda_1$ was set to 40, for D2T $\lambda_2 = 30$, and $\lambda_1 = 5$, $\lambda_2 = 40$ for DT2T.
We evaluate the method on the sequences Breakdancers and Ballet from
Microsoft Research ( Zitnick et al. 2004 ), and Dancer from Nokia Research. We
use the central view of the three sequences, at 15 fps downsampled to CIF resolution.
In Table 3.2 we see that the symmetric interpolation T2T gives the worst results, and that D2T, which only uses depth, improves the average interpolation quality by 1.4 dB. Combining both the symmetric and depth information in DT2T, we gain an additional 0.8 dB.
Table 3.2: Average peak signal-to-noise ratio of interpolated frames for the three different methods for the first 100 frames of the sequences.

Method   Ballet   Breakdancers   Dancer
T2T      34.7     27.5           30.5
D2T      37.0     28.5           31.4
DT2T     38.0     29.0           32.3
Figure 3.16 shows an interesting example from the Dancer sequence. In this example there are large movements of shadows that are not visible in the depth images. The optical flows for the three methods, along with the interpolated frames, can be found in Figure 3.17. We see that the shadow on the wall is interpolated quite well by the T2T method, while the movement of the dancer is less well modeled due to the large temporal distance between the key frames. For the D2T method, the movements of the dancer are well captured, but no movement is identified in the wall area, and the interpolation is just the average of the outer frames. The DT2T method identifies both the movement of the dancer and the shadows, and gives a better overall result than the two other methods.
Figure 3.16: Frames 94, 95, and 96 and corresponding depth maps of the Dancer sequence from Nokia Research.
Figure 3.17: Estimated forward (top) and backward (middle) flow fields, along with the interpolation results (bottom) for T2T (PSNR 26.1), D2T (PSNR 27.1), and DT2T (PSNR 29.3).
3.5 Image interpolation using partially decoded frames
In this section we will consider the refinement of motion estimation and interpolation in a distributed video coding setup. We use transform domain coding with a discrete cosine transform (DCT) like transform. In this setup every decoded bit plane will produce affine constraints on the frame to be decoded, which can be used to refine the estimate of that frame. This section is based on methods developed for the codec described in Luong et al. (2012).
3.5.1 Initial frame interpolation
For the initial interpolation we are in the setting of Section 3.3.1, with two images $I_{t-1}$ and $I_{t+1}$. The interpolation is done by computing the forward, backward and symmetric interpolations presented in Section 3.3, and taking the final result to be the average of the three.
Figure 3.18: Frames 84, 85, and 86 of the Soccer sequence.

Figure 3.19: Forward (PSNR 20.4), backward (PSNR 20.3), and symmetric (PSNR 21.5) interpolation results and their average (PSNR 21.2), for frame 84 of the Soccer sequence.
A specialized coarse-to-fine pyramidal implementation of the above algorithm is used. Following the algorithm with standard settings, we have the following modifications: on each level we perform $w_{\max} = 30$ warps using $i_{\max} = 10$ iterations of the BM algorithm. To improve interpolation quality, $\rho$ has been weighted by the gradient magnitude $\|\nabla I_1(x + v_0)\| + 0.01$ (slightly shifted to avoid division by 0) in the minimization (Zimmer et al. 2011), which allows for more even step sizes in the estimation. With this modification, $\lambda$ was set to 3.
All in all this produces a more robust flow for interpolation, and combining the symmetric flow with the warped forward and backward flows, we propose to do the interpolation as
$$I_{1/2}(x) = \frac{1}{6}\Big(I_1\big(x + v_f^{1/2}(x)\big) + I_1\big(x - v_b^{1/2}(x)\big) + I_1\big(x + v_s(x)\big) + I_0\big(x - v_f^{1/2}(x)\big) + I_0\big(x + v_b^{1/2}(x)\big) + I_0\big(x - v_s(x)\big)\Big), \qquad (3.8)$$
so the interpolation is the average of the two surrounding frames warped to the center using the three different flows.
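The six-term average (3.8) can be sketched as follows (illustrative NumPy code; nearest-neighbor sampling stands in for the bicubic interpolation used in the thesis, and the names are assumptions):

```python
import numpy as np

def sample_nn(I, y, x):
    """Nearest-neighbor sampling with clamping (the thesis uses bicubic)."""
    y = np.clip(np.round(y).astype(int), 0, I.shape[0] - 1)
    x = np.clip(np.round(x).astype(int), 0, I.shape[1] - 1)
    return I[y, x]

def interpolate_3of(I0, I1, vf_half, vb_half, vs):
    """Six-term average of Eq. (3.8): both surrounding frames warped to
    the center with the warped forward, warped backward and symmetric flows."""
    H, W = I0.shape
    yy, xx = np.mgrid[0:H, 0:W].astype(float)

    def w(I, v, sign):
        return sample_nn(I, yy + sign * v[..., 0], xx + sign * v[..., 1])

    return (w(I1, vf_half, +1) + w(I1, vb_half, -1) + w(I1, vs, +1)
            + w(I0, vf_half, -1) + w(I0, vb_half, +1) + w(I0, vs, -1)) / 6.0
```

Note how the signs mirror Eq. (3.8): each flow is applied with opposite signs to the two surrounding frames, so the six samples all land on the temporal center.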
We evaluate the interpolation (3.8), which we will denote 3OF, on the test sequences (QCIF, 15 fps) Coastguard QP=26, Foreman QP=25, Hall QP=24 and Soccer QP=25, where we interpolate every other frame and compare to the overlapped block motion compensation (OBMC) method from Huang & Forchhammer (2005) and a TV-L1 optical flow (OF) method. The results can be found in Table 3.3, and it can be seen that the proposed method outperforms OBMC and OF on all sequences, with an average increase in PSNR of 1.26 dB over OBMC and 2.25 dB over OF.
Table 3.3: Average PSNR across the 74 interpolated frames for the four test sequences.

Sequence     OBMC    OF      3OF
Coastguard   31.83   30.92   32.71
Foreman      29.26   29.28   30.19
Hall         36.46   32.28   37.25
Soccer       21.30   22.43   23.75
With initial estimates produced by 3OF we may decode the so-called Wyner-Ziv frames. Decoding is done one bit plane at a time, going from the most significant to the least significant bit in the transform domain. In the next section we consider how one may link this information to the frame to be decoded in the pixel domain.
3.5.2 Upsampling images from DCT coefficients
The frames to be decoded have been transformed using the following DCT-like transform on every $4 \times 4$ image patch $I$:
$$\mathrm{DCT}_0(I) = (C I C^\top) \circ E,$$
where $\circ$ denotes pointwise multiplication, and
$$C = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{pmatrix}, \qquad E = \begin{pmatrix} \tfrac{1}{4} & \tfrac{1}{\sqrt{10}} & \tfrac{1}{4} & \tfrac{1}{\sqrt{10}} \\ \tfrac{1}{\sqrt{10}} & \tfrac{2}{5} & \tfrac{1}{\sqrt{10}} & \tfrac{2}{5} \\ \tfrac{1}{4} & \tfrac{1}{\sqrt{10}} & \tfrac{1}{4} & \tfrac{1}{\sqrt{10}} \\ \tfrac{1}{\sqrt{10}} & \tfrac{2}{5} & \tfrac{1}{\sqrt{10}} & \tfrac{2}{5} \end{pmatrix}.$$
We see that the DCT at a single point $(i, j)$ can be written as
$$\mathrm{DCT}_0(I)(i, j) = E_{ij}\,(C_i \otimes C_j)\,I_v =: a^\top I_v,$$
where $C_i$ denotes the $i$th row of $C$ and $I_v$ is the vectorization
$$I_v = (I_{00}, I_{10}, \dots, I_{33})^\top.$$
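The transform and its single-coefficient form can be checked numerically. The sketch below uses invented names; note that with the column-major vectorization $(I_{00}, I_{10}, \dots, I_{33})$, the Kronecker factors come out in the order `kron(C[j], C[i])`.

```python
import numpy as np

# The 4x4 DCT-like transform matrices from the text.
C = np.array([[1, 1, 1, 1],
              [2, 1, -1, -2],
              [1, -1, -1, 1],
              [1, -2, 2, -1]], dtype=float)
s = np.array([0.5, np.sqrt(2.0 / 5.0), 0.5, np.sqrt(2.0 / 5.0)])
E = np.outer(s, s)   # entries 1/4, 1/sqrt(10) and 2/5 as in the text

def dct0(I):
    """DCT0(I) = (C I C^T) pointwise-multiplied by E, on a 4x4 patch."""
    return (C @ I @ C.T) * E

def coeff_vector(i, j):
    """Row vector a with DCT0(I)[i, j] = a @ I_v, for the column-major
    vectorization I_v = (I00, I10, ..., I33)."""
    return E[i, j] * np.kron(C[j], C[i])
```

In particular, `coeff_vector(0, 0)` is the DC row $\frac{1}{4}(1, \dots, 1)$ used in Example 3.5.1 below.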
Decoding is done bit plane wise for the image in question, and gives a set of intervals $d_1 \pm c_1, \dots, d_k \pm c_k$ (with corresponding DCT vectors $a_1, \dots, a_k$) wherein the DCT coefficients lie. To reconstruct a $4 \times 4$ image patch $I$ we can use the fact that the vectorized patch should fulfill
$$A I_v \in [d_1 - c_1, d_1 + c_1] \times \dots \times [d_k - c_k, d_k + c_k], \qquad A = \begin{pmatrix} a_1 \\ \vdots \\ a_k \end{pmatrix}.$$
A simple solution for reconstructing a $4 \times 4$ patch consists in ignoring the interval widths and taking the estimate $\tilde{I}_v$ as the solution to
$$A \tilde{I}_v = (d_1, \dots, d_k)^\top$$
with the least Euclidean norm, which is simply given by applying the Moore-Penrose pseudoinverse to the midpoint vector:
$$\tilde{I}_v = A^\dagger (d_1, \dots, d_k)^\top.$$
Example 3.5.1. Given only the DC coefficient $d_1 = \mathrm{DCT}_0(I)(0, 0)$, we have $A = \frac{1}{4}(1 \cdots 1)$ and $A^\dagger = \frac{1}{4}(1 \cdots 1)^\top$, so that $\tilde{I}_v = A^\dagger d_1 = \frac{d_1}{4}(1 \cdots 1)^\top$; in other words we fill in the patch in pixel domain by its average.
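Example 3.5.1 can be verified directly (illustrative NumPy code; the patch used here is a hypothetical stand-in):

```python
import numpy as np

# With only the DC coefficient decoded, the least-norm reconstruction
# A^+ d_1 fills the patch with its average value.
A = 0.25 * np.ones((1, 16))         # DC row of the constraint matrix
A_pinv = np.linalg.pinv(A)          # here equal to 1/4 (1, ..., 1)^T

patch = np.arange(16.0).reshape(4, 4)   # hypothetical ground-truth patch
d1 = (A @ patch.flatten()).item()       # DC midpoint = sum(patch) / 4
recon = (A_pinv * d1).reshape(4, 4)     # every pixel equals the patch mean
```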
Example 3.5.2. Given an estimate $I_v$ subject to the constraint specified by a vector $a$, we can first check if the estimate is admissible by checking whether $|a^\top I_v - d| \le c$. If so, we will not do anything. When this is not the case we can project the solution onto the planar strip given by the constraint. Assuming that $a^\top I_v - d > c$, we want to compute the orthogonal projection of $I_v$ onto $\{x : a^\top x - d = c\}$, which is given by
$$I_v - \frac{a^\top I_v - (d + c)}{a^\top a}\,a.$$
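Example 3.5.2 in code (a minimal sketch; the function name is an assumption):

```python
import numpy as np

def project_to_strip(Iv, a, d, c):
    """If |a @ Iv - d| <= c the estimate is admissible and returned
    unchanged; otherwise it is orthogonally projected onto the nearer
    bounding hyperplane of the strip {x : |a @ x - d| <= c}."""
    r = a @ Iv - d
    if abs(r) <= c:
        return Iv
    target = d + c if r > c else d - c
    return Iv - (a @ Iv - target) / (a @ a) * a
```

After projection the constraint holds with equality on the violated side, which is exactly the behavior needed for the alternating projections used below.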
Upsampling using global regularization Consider now the case of an entire image $I$, with $4 \times 4$ patches $I_1, \dots, I_n$ and corresponding DCT intervals $D_1, \dots, D_n$. We can consider the problem of reconstructing all patches subject to a global roughness penalty. This means that the reconstructed patches interact to give a result that is regular across the entire image. In addition, the regularization will make the problem well-posed. A choice of regularization that is both simple and powerful is total variation. The problem may then be formulated as recovering $I$ as the minimizer of
$$\int_T \|\nabla I(x)\|\,\mathrm{d}x \quad \text{such that} \quad A I_i \in D_i \text{ for all } i. \qquad (3.9)$$
Direct minimization of this problem is not feasible, and instead we propose the following procedure. Starting out from an initial solution, we find a nearby solution with better regularity properties. The $4 \times 4$ blocks of the resulting regularized solution are projected onto the set of admissible solutions, and using this new initial solution we repeat the algorithm. In short, the algorithm is simply an iterative minimization of the energy
$$E(I) = \int \|I(x) - I_0(x)\|^2\,\mathrm{d}x + \lambda \int \|\nabla I(x)\|\,\mathrm{d}x, \qquad (3.10)$$
where $I_0$ is the orthogonal projection of the current solution onto the set of admissible solutions given by the constraints $A I_i \in D_i$. The orthogonal projection is computed using an alternating projections method with the result from Example 3.5.2.
As previously mentioned, the decoding of a frame is done from the most significant bit plane to the least significant bit plane. In the following we will denote the corresponding constraints on the DCT coefficients of the decoded bands by DC, AC1, AC2, etc., and use data from the decoding process of the codec. Figure 3.20 shows an example of how to convert the DC to an estimate of the unknown image in pixel domain. The particular structure of the DC allows for easy manipulation to create an initial estimate of the frame in question. Using only the midpoints of the given intervals, the solution with least Euclidean norm (Example 3.5.1) is a blocky nearest-neighbor-like interpolation of the image. A smoother estimate may be found by using bicubic interpolation on the same midpoints, which will add some global regularity to the estimate, and finally the quality of this solution may in general be improved by projecting the bicubic solution onto the space of admissible solutions given by the known coefficients.
This final estimate will be used as the initial guess when minimizing (3.10). Figure 3.21 shows the results of the presented algorithm with an increasing number of decoded bands. For the algorithm we have iterated (3.10) with $\lambda = 50$, and the projection onto the set of admissible solutions is done by 20 iterations of alternating projections over all individual constraints. While the characteristics of total variation smoothing are clearly visible, we see that already at four decoded bands (out of 16 in total) the estimates look quite decent.
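The alternation between smoothing (3.10) and projection can be sketched as follows. This is an illustrative gradient-descent version with a smoothed TV term; the thesis uses a dedicated TV minimization algorithm, and the function and parameter names here are assumptions.

```python
import numpy as np

def tv_regularize(I_init, project, lam=50.0, outer=5, inner=30, tau=0.01):
    """Alternate between (i) projecting the current estimate onto the
    admissible set (the `project` callback, e.g. alternating projections
    over the decoded DCT constraints) and (ii) gradient descent on the
    energy (3.10) with a smoothed total variation term."""
    I = I_init.astype(float).copy()
    for _ in range(outer):
        I0 = project(I)                # admissible starting point
        J = I0.copy()
        for _ in range(inner):         # descend ||J - I0||^2 + lam * TV(J)
            gy, gx = np.gradient(J)
            mag = np.sqrt(gy ** 2 + gx ** 2 + 1e-6)   # smoothed |grad J|
            div = np.gradient(gy / mag, axis=0) + np.gradient(gx / mag, axis=1)
            J -= tau * (2.0 * (J - I0) - lam * div)
        I = J
    return I
```

The `project` callback would apply the strip projection of Example 3.5.2 block by block; with the identity projection the scheme reduces to plain TV smoothing toward the current estimate.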
Figure 3.20: Frame 81 of the Soccer sequence (ground truth) and reconstructions of the frame based on the DC: upscaled DC midpoint (PSNR 24.4), bicubic upscaling (PSNR 24.5), and projected bicubic upscaling (PSNR 25.7).
Figure 3.21: Total variation upscaling of frame 81 of the Soccer sequence using an increasing number of decoded bands: DC (PSNR 26.3), AC1 (PSNR 28.2), AC2 (PSNR 30.4), and AC3 (PSNR 31.1).
3.5.3 Motion re-estimation and interpolation
The upsampling method described above provides a link that allows one to use the information contained in decoded bands in the transform domain for full resolution images in the pixel domain. This new information is different from the depth maps used in the interpolation process in Section 3.4, since it is just an overly smooth estimate of the frame to be interpolated. The goal is then to fill in the fine details from the outer frames onto this smooth estimate.
It turns out that this detail mapping may be done remarkably well by using a slightly modified TV-L1 optical flow algorithm. With an estimate $\hat{I}_t$ of the frame in question produced by the total variation minimization proposed in the previous section, motion can be estimated directly between the estimate and the corresponding key frames $I_{t-1}$, $I_{t+1}$. This reduction of the temporal gap may increase the accuracy of the motion estimation, but more importantly, it eliminates the need for the assumption that motion is linear in-between key frames. The main difficulty is that the estimated frame may be a very coarse approximation of the real solution, in particular when only the DC coefficients have been decoded. To address this, we will use a specialized smoothing strategy prior to downsampling images in the image pyramid. At pyramid level $\ell$ with downsampling factor $\eta$, the Gaussian smoothing compared to the full resolution images has standard deviation $0.5 \cdot \eta^{-\max(\ell, \ell_0)}$, where $\ell_0$ is some given level. This means that from level $\ell_0$ and down we will have a fixed total standard deviation compared to the full resolution images, and thus smooth out the finer details at these levels. This smoothing makes it possible to properly estimate motion to the generated estimates. Apart from this specialized downsampling
and smoothing strategy, we use the algorithm with $i_{\max} = 15$ and otherwise standard parameters. Table 3.4 gives the levels $\ell_0$ and $\lambda$ values used for the different numbers of decoded bands. Figure 3.22 shows the results of this procedure, where details from the key frames are mapped onto the estimates from Figure 3.21.
Table 3.4: Parameters used for re-estimation with varying number of decoded bands.

Highest decoded band   $\ell_0$   $\lambda$
DC                     25         15
AC1                    15         15
AC2                    15         40
AC3                    15         60
Figure 3.22: Details mapped onto the estimates from Figure 3.21 using optical flow from the surrounding frames: DC (PSNR 27.7), AC1 (PSNR 30.2), AC2 (PSNR 32.2), and AC3 (PSNR 32.6).
The average PSNR values for the four test sequences can be found in Table 3.5. We see that the results after re-estimation are significantly better for the dynamic sequences Foreman and Soccer, while the initial estimates provide better results for the almost static Coastguard and Hall sequences. The parameters in Table 3.4 have been chosen to achieve this, since a proper multi-hypothesis decoding scheme (Huang et al. 2011) may be used to fuse the best parts of the different estimates into a single superior estimate. Dynamic sequences generally present a problem in distributed video coding with respect to side information generation (Huang et al. 2011, Rakêt et al. 2012b), which means that improvements such as those seen in Table 3.5 will in the end be the ones that deliver the main bitrate savings in the coding process, compared to conventional methods.
Table 3.5: Average PSNR across the 74 interpolated frames for the four test sequences.

Sequence     Initial   DC      AC1     AC2     AC3
Coastguard   32.71     27.74   27.78   33.01   34.02
Foreman      30.19     33.74   34.69   36.04   36.61
Hall         37.25     31.43   33.19   36.22   36.67
Soccer       23.75     29.96   31.49   34.23   34.93
Chapter 4
Conclusions and future directions
This thesis has described the duality based TV-L1 optical flow method and variations hereof. Theoretical work generalizing the original formulation of Zach et al. (2007) has been presented, and a highly optimized algorithmic setup was described. In addition, five novel applications were given. We considered registration of medical images and 2D chromatograms, as well as three examples of frame interpolation in video sequences.
It has been demonstrated that the TV-L1 optical flow algorithm is able to produce good results on benchmark data, and that the robustness of the formulation allows it to be used successfully in a wide range of applications. This robustness is a very important feature of the algorithm, as small benchmark datasets have a tendency to pull development toward algorithms that work mainly on the specific examples available (Austvoll 2005). While the Middlebury benchmark has been built as a response to optical flow algorithms typically only being evaluated on (and overfitted to) the Yosemite and Tree sequences, the examples available are hardly realistic situations, and good performance on this benchmark does not guarantee good performance in other applications. Recently, new benchmark data for optical flow evaluation has been presented (Geiger et al. 2012, Butler et al. 2012), and hopefully these data will help give rise to new robust optical flow methods.
The most obvious direction of future research is to properly investigate the different extensions presented in Chapter 2, consider how they interact, and then use this knowledge to build an algorithm with even better performance than the one presented here.
Another question that has not been considered here is the estimation of parameters. Very few successful methods for doing this exist. Zimmer et al. (2011) proposed the so-called optimal prediction principle, where multiple flows with different parameters were estimated, and the one with the best predictive qualities (with respect to data fit in a subsequent frame) was chosen. A similar idea was considered in Rakêt (2012) to give a locally varying field of $\lambda$ values. While these methods have been demonstrated to work well, they are somewhat heuristic. A general and efficient parameter estimation framework would greatly benefit optical flow estimation, as this would remove the need to
tune parameters for specific applications.
An additional point of future investigation is the symmetric interpolation results presented in Section 3.3. The interpolation quality may of course be improved by using a symmetric data term with a more advanced optical flow method, but this is perhaps not the most exciting research direction. If the goal is to improve viewing experience, a spatial regularization of the interpolated frames could probably improve the perceived quality. Spatial regularization may be done by means of total variation (Keller et al. 2010, Werlberger et al. 2011) or by edge enhancing diffusion (Weickert 1994). The latter has been shown to have very good interpolation properties in other areas, and applying it to frame interpolation would likely produce very good and robust results. To improve reconstruction quality one could in addition do occlusion reasoning and selectively interpolate from the non-occluded frame, or compute motion trajectories over several frames (Volz et al. 2011) and use this information for interpolation.
Finally, an important point of future research is the approach to the vector valued problem described earlier, where the coupling of channels is moved to the regularization term instead of the data term. These uncoupled data terms are somewhat similar to the robustification of Bruhn & Weickert (2005), who decomposed the original coupling of brightness and gradient suggested in Brox et al. (2004), but kept as a strict requirement that the flows stayed the same (as was the case with brightness and depth in (3.7)). This was taken further to decouple the HSV color channels in Zimmer et al. (2011). It would be interesting to consider how the proposed method compares to the usual data term coupling, in particular whether it adds any robustness to the estimation, since it allows for a much simpler and more efficient solution than the conventional coupling.
Bibliography
Aaron, A., Zhang, R. & Girod, B. (2002), Wyner-Ziv coding of motion video, in 'Signals, Systems and Computers, 2002. Conference Record of the Thirty-Sixth Asilomar Conference on', Vol. 1, pp. 240–244.
Adato, Y., Zickler, T. & Ben-Shahar, O. (2011), 'A polar representation of motion and implications for optical flow', Computer Vision and Pattern Recognition, IEEE Computer Society Conference on, pp. 1145–1152.
& Sánchez, J. (2007b), Symmetric optical flow, in R. Díaz, F. Pichler & A. Arencibia, eds, 'Computer Aided Systems Theory – EUROCAST 2007', Vol. 4739 of Lecture Notes in Computer Science, Springer, pp. 676–683.
Alvarez, L., Deriche, R., Papadopoulo, T. & Sánchez, J. (2007a), 'Symmetrical dense optical flow estimation with occlusions detection', International Journal of Computer Vision 75, 371–385.
Austvoll, I. (2005), A study of the Yosemite sequence used as a test sequence for estimation of optical flow, in H. Kalviainen, J. Parkkinen & A. Kaarna, eds,
‘Image Analysis’, Vol. 3540 of Lecture Notes in Computer Science, Springer
Berlin Heidelberg, pp. 659–668.
Ayvaci, A., Raptis, M. & Soatto, S. (2012), ‘Sparse occlusion detection with optical flow’, International Journal of Computer Vision 97, 322–338.
Baker, S., Scharstein, D., Lewis, J. P., Roth, S., Black, M. J. & Szeliski, R. (2011), 'A database and evaluation methodology for optical flow', International Journal of Computer Vision 92(1), 1–31.
Barron, J., Fleet, D. & Beauchemin, S. (1994), ‘Performance of optical flow techniques’, International Journal of Computer Vision 12, 43–77.
Black, M. J. & Anandan, P. (1996), ‘The robust estimation of multiple motions:
Parametric and piecewisesmooth flow fields’, Computer Vision and Image
Understanding 63(1), 75–104.
Bresson, X. & Chan, T. (2008), ‘Fast dual minimization of the vectorial total variation norm and application to color image processing’, Inverse Problems and Imaging 2(4), 455–484.
Brox, T., Bruhn, A., Papenberg, N. & Weickert, J. (2004), High accuracy optical flow estimation based on a theory for warping, in T. Pajdla & J. Matas, eds, 'Computer Vision – ECCV 2004', Vol. 3024 of Lecture Notes in Computer Science, Springer Berlin Heidelberg, pp. 25–36.
Bruhn, A., Weickert, J. & Schnörr, C. (2005), 'Lucas/Kanade meets Horn/Schunck: Combining local and global optic flow methods', International Journal of Computer Vision 61, 211–231.
Butler, D., Wulff, J., Stanley, G. & Black, M. (2012), A naturalistic open source movie for optical flow evaluation, in A. W. Fitzgibbon, S. Lazebnik, P. Perona,
Y. Sato & C. Schmid, eds, ‘Computer Vision – ECCV 2012’, Vol. 7577 of
Lecture Notes in Computer Science, Springer Berlin Heidelberg, pp. 611–625.
Chambolle, A. (2004), ‘An algorithm for total variation minimization and applications’, Journal of Mathematical Imaging and Vision 20, 89–97.
Chambolle, A. & Pock, T. (2011), 'A first-order primal-dual algorithm for convex problems with applications to imaging', Journal of Mathematical Imaging and Vision 40, 120–145.
Chen, K. & Lorenz, D. (2011), ‘Image sequence interpolation using optimal control’, Journal of Mathematical Imaging and Vision 41, 222–238.
Chen, W. (2012), 'Surface velocity estimation from satellite imagery using displaced frame central difference equation', IEEE Transactions on Geoscience and Remote Sensing 50(7), 2791–2801.
Christensen, G. E. & Johnson, H. J. (2001), ‘Consistent image registration’,
IEEE Transactions on Medical Imaging 20(7), 568–582.
Ekeland, I. & Temam, R. (1999), Convex Analysis and Variational Problems,
SIAM.
Gai, J. & Stevenson, R. (2010), Optical flow estimation with p-harmonic regularization, in 'Image Processing (ICIP), 2010 17th IEEE International Conference on', pp. 1969–1972.
Galić, I., Weickert, J., Welk, M., Bruhn, A., Belyaev, A. & Seidel, H.-P. (2008), 'Image compression with anisotropic diffusion', Journal of Mathematical Imaging and Vision 31, 255–269.
Geiger, A., Lenz, P. & Urtasun, R. (2012), Are we ready for autonomous driving? The KITTI vision benchmark suite, in 'Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on', pp. 3354–3361.
Ghodstinat, M., Bruhn, A. & Weickert, J. (2009), Deinterlacing with motion-compensated anisotropic diffusion, in D. Cremers, B. Rosenhahn, A. L. Yuille & F. R. Schmidt, eds, 'Statistical and Geometrical Approaches to Visual Motion Analysis', Springer-Verlag, Berlin, Heidelberg, pp. 91–106.
Girod, B., Aaron, A., Rane, S. & Rebollo-Monedero, D. (2005), 'Distributed video coding', Proceedings of the IEEE 93(1), 71–83.
total variation which arises from geometric measure theory’, SIAM Journal on Imaging Sciences 5(2), 537–563.
Golub, G. & van Loan, C. (1989), Matrix Computations, The Johns Hopkins University Press, Baltimore, Maryland.
Herbst, E., Seitz, S. & Baker, S. (2009), Occlusion reasoning for temporal interpolation using optical flow, Technical Report UW-CSE-09-08-01, Department of Computer Science and Engineering, University of Washington.
Horn, B. K. P. & Schunck, B. G. (1981), ‘Determining optical flow’, Artificial
Intelligence 17, 185–203.
Huang, X. & Forchhammer, S. (2012), 'Cross-band noise model refinement for transform domain Wyner-Ziv video coding', Signal Processing: Image Communication 27, 16–30.
Huang, X., Rakêt, L. L., Luong, H. V., Nielsen, M., Lauze, F. & Forchhammer, S. (2011), Multi-hypothesis transform domain Wyner-Ziv video coding including optical flow, in 'Multimedia Signal Processing (MMSP), 2011 IEEE 13th International Workshop on', pp. 1–6.
Keller, S., Lauze, F. & Nielsen, M. (2010), Temporal super resolution using variational methods, in M. Mrak, M. Grgic & M. Kunt, eds, 'High-Quality Visual Experience: Creation, Processing and Interactivity of High-Resolution and High-Dimensional Video Signals', Springer Berlin Heidelberg, pp. 275–296.
Kiseliov, Y. (1994), ‘Algorithms of projection of a point onto an ellipsoid’,
Lithuanian Mathematical Journal 34, 141–159.
Lucas, B. D. & Kanade, T. (1981), An iterative image registration technique with an application to stereo vision, in ‘Proceedings of the 7th International
Joint Conference on Artificial Intelligence (IJCAI ’81)’, pp. 674–679.
Luong, H. V., Rakêt, L. L., Huang, X. & Forchhammer, S. (2012), 'Side information and noise learning for distributed video coding using optical flow and clustering', Image Processing, IEEE Transactions on 21(12), 4782–4796.
Luong, H. V., Rakêt, L. L., Salmistraro, M. & Forchhammer, S. (2013), Motion and reconstruction re-estimation for distributed video coding. (in preparation).
Panin, G. (2012), Mutual information for multi-modal, discontinuity-preserving image registration, in G. Bebis, R. Boyle, B. Parvin, D. Koracin, C. Fowlkes, S. Wang, M.-H. Choi, S. Mantler, J. Schulze, D. Acevedo, K. Mueller & M. Papka, eds, 'Advances in Visual Computing', Vol. 7432 of Lecture Notes in Computer Science, Springer Berlin Heidelberg, pp. 70–81.
Papenberg, N., Bruhn, A., Brox, T., Didas, S. & Weickert, J. (2006), ‘Highly accurate optical flow computations with theoretically justified warping’, International Journal of Computer Vision 67(2), 141–158.
Petersen, I. L., Tomasi, G., Sørensen, H., Boll, E. S., Hansen, H. C. B. & Christensen, J. H. (2011), 'The use of environmental metabolomics to determine glyphosate level of exposure in rapeseed (Brassica napus L.) seedlings', Environmental Pollution 159(10), 3071–3077.
Pock, T., Urschler, M., Zach, C., Beichel, R. & Bischof, H. (2007), A duality based algorithm for TV-L1 optical flow image registration, in N. Ayache, S. Ourselin & A. Maeder, eds, 'Medical Image Computing and Computer Assisted Intervention – MICCAI 2007', Vol. 4792 of Lecture Notes in Computer Science, Springer Berlin Heidelberg, pp. 511–518.
Rakêt, L. L. (2012), Local smoothness for global optical flow, in 'Image Processing (ICIP), 2012 19th IEEE International Conference on', pp. 1–4.
Rakêt, L. L. & Markussen, B. (2014), 'Approximate inference for spatial functional data on massively parallel processors', Computational Statistics & Data Analysis 72, 227–240.
Rakêt, L. L. & Nielsen, M. (2012), A splitting algorithm for directional regularization and sparsification, in 'Pattern Recognition (ICPR), 2012 21st International Conference on', pp. 3094–3098.
Rakêt, L. L., Roholm, L., Bruhn, A. & Weickert, J. (2012a), Motion compensated frame interpolation with a symmetric optical flow constraint, in G. Bebis, R. Boyle, B. Parvin, D. Koracin, C. Fowlkes, S. Wang, M.-H. Choi, S. Mantler, J. Schulze, D. Acevedo, K. Mueller & M. Papka, eds, 'Advances in Visual Computing', Vol. 7431 of Lecture Notes in Computer Science, Springer Berlin Heidelberg, pp. 447–457.
Rakêt, L. L., Roholm, L., Nielsen, M. & Lauze, F. (2011), TV-L1 optical flow for vector valued images, in Y. Boykov, F. Kahl, V. Lempitsky & F. Schmidt, eds, 'Energy Minimization Methods in Computer Vision and Pattern Recognition', Vol. 6819 of Lecture Notes in Computer Science, Springer, pp. 329–343.
Rakêt, L. L., Søgaard, J., Salmistraro, M., Luong, H. V. & Forchhammer, S. (2012b), Exploiting the error-correcting capabilities of low density parity check codes in distributed video coding using optical flow, in 'Proceedings of SPIE, the International Society for Optical Engineering', Vol. 8499, SPIE – International Society for Optical Engineering.
Rudin, L. I., Osher, S. & Fatemi, E. (1992), 'Nonlinear total variation based noise removal algorithms', Physica D 60, 259–268.
Salmistraro, M., Rakêt, L. L., Zamarin, M., Ukhanova, A. & Forchhammer, S. (2013), Texture side information generation for distributed coding of video-plus-depth, in 'Image Processing, 2013. ICIP '13. 2013 International Conference on'.
Salmistraro, M., Zamarin, M., Rakêt, L. L. & Forchhammer, S. (2013), Distributed multi-hypothesis coding of depth maps using texture motion information and optical flow, in 'Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on', pp. 1685–1689.
Sommer, S., Lauze, F., Nielsen, M. & Pennec, X. (2012), 'Sparse multi-scale diffeomorphic registration: The kernel bundle framework', Journal of Mathematical Imaging and Vision pp. 1–17.
flow computation without warping, in 'Computer Vision, 2009 IEEE 12th International Conference on', pp. 1609–1614.
ational optic flow estimation, in M. A. Magnor, B. Rosenhahn & H. Theisel, eds, 'Proceedings of the Vision, Modeling, and Visualization Workshop 2009, November 16–18, 2009, Braunschweig, Germany', DNB, pp. 155–164.
Stich, T., Linz, C., Albuquerque, G. & Magnor, M. (2008), ‘View and time interpolation in image space’, Computer Graphics Forum 27(7), 1781–1787.
Sun, D., Roth, S. & Black, M. J. (2010), Secrets of optical flow estimation and their principles, in 'Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on', pp. 2432–2439.
Vese, L. A. & Osher, S. J. (2002), 'Numerical methods for p-harmonic flows and applications to image processing', SIAM Journal on Numerical Analysis 40(6), 2085–2104.
Volz, S., Bruhn, A., Valgaerts, L. & Zimmer, H. (2011), Modeling temporal coherence for optical flow, in 'Computer Vision (ICCV), 2011 IEEE International Conference on', pp. 1116–1123.
Wedel, A., Cremers, D., Pock, T. & Bischof, H. (2009b), Structure and motion-adaptive regularization for high accuracy optic flow, in 'Computer Vision, 2009 IEEE 12th International Conference on', pp. 1663–1668.
Wedel, A., Pock, T., Braun, J., Franke, U. & Cremers, D. (2008), Duality TV-L1 flow with fundamental matrix prior, in 'Image and Vision Computing New Zealand', Auckland, New Zealand.
Wedel, A., Pock, T., Zach, C., Bischof, H. & Cremers, D. (2009a), An improved algorithm for TV-L1 optical flow, in D. Cremers, B. Rosenhahn, A. Yuille & F. Schmidt, eds, 'Statistical and Geometrical Approaches to Visual Motion Analysis', Vol. 5604 of Lecture Notes in Computer Science, Springer, pp. 23–45.
Weickert, J. (1994), Theoretical foundations of anisotropic diffusion in image processing, in W. G. Kropatsch, R. Klette & F. Solina, eds, 'Theoretical Foundations of Computer Vision', Vol. 11 of Computing Supplement, Springer, pp. 221–236.
Werlberger, M., Pock, T. & Bischof, H. (2010), Motion estimation with non-local total variation regularization, in 'Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on', pp. 2464–2471.
Werlberger, M., Pock, T., Unger, M. & Bischof, H. (2011), Optical flow guided TV-L1 video interpolation and restoration, in Y. Boykov, F. Kahl, V. Lempitsky & F. Schmidt, eds, 'Energy Minimization Methods in Computer Vision and Pattern Recognition', Vol. 6819 of Lecture Notes in Computer Science, Springer, pp. 273–286.
Werlberger, M., Trobin, W., Pock, T., Wedel, A., Cremers, D. & Bischof, H. (2009), Anisotropic Huber-L1 optical flow, in 'Proceedings of the British Machine Vision Conference (BMVC)', London, UK.
Xu, L., Jia, J. & Matsushita, Y. (2010), Motion detail preserving optical flow estimation, in 'Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on', pp. 1293–1300.
Xu, L., Jia, J. & Matsushita, Y. (2012), 'Motion detail preserving optical flow estimation', Pattern Analysis and Machine Intelligence, IEEE Transactions on 34(9), 1744–1757.
Xu, L., Lu, C., Xu, Y. & Jia, J. (2011), 'Image smoothing via L0 gradient minimization', ACM Transactions on Graphics (SIGGRAPH Asia) 30(4).
Zach, C., Pock, T. & Bischof, H. (2007), A duality based approach for realtime TV-L1 optical flow, in F. Hamprecht, C. Schnörr & B. Jähne, eds, 'Pattern Recognition', Vol. 4713 of Lecture Notes in Computer Science, Springer, pp. 214–223.
Zamarin, M. & Forchhammer, S. (2012), Lossless compression of stereo disparity maps for 3D, in 'Multimedia and Expo Workshops (ICMEW), 2012 IEEE International Conference on', pp. 617–622.
Zimmer, H., Bruhn, A. & Weickert, J. (2011), ‘Optic flow in harmony’, International Journal of Computer Vision 93, 368–388.
Zitnick, C., Jojic, N. & Kang, S. B. (2005), Consistent segmentation for optical flow estimation, in 'Computer Vision, 2005. ICCV 2005. Tenth IEEE International Conference on', Vol. 2, pp. 1308–1315.
Zitnick, L., Kang, S., Uyttendaele, M., Winder, S. & Szeliski, R. (2004), 'High-quality video view interpolation using a layered representation', ACM Transactions on Graphics 23(3), 600–608.