Duality based optical flow algorithms with applications

University of Copenhagen

University of Copenhagen prize thesis

Lars Lau Rakêt

January 14, 2013

Abstract

We consider the popular TV-L1 optical flow formulation, and the so-called duality based algorithm for minimizing the TV-L1 energy. The original formulation is extended to allow for vector valued images, and minimization results are given. In addition we consider different definitions of total variation regularization, and related formulations of the optical flow problem that may be used with a duality based algorithm. We present a highly optimized algorithmic setup to estimate optical flows, and give five novel applications. The first application is registration of medical images, where X-ray images of different hands, taken using different imaging devices, are registered using a TV-L1 optical flow algorithm. We propose to regularize the input images, using sparsity enhancing regularization of the image gradient to improve registration results. The second application is registration of 2D chromatograms, where registration only has to be done in one of the two dimensions, resulting in a vector valued registration problem with values having several hundred dimensions. We propose a novel method for solving this problem, where instead of a vector valued data term, the different channels are coupled through the regularization. This results in a simple formulation of the problem that may be solved much more efficiently than the conventional coupling. In the third application of the TV-L1 optical flow algorithm we consider the problem of interpolating frames in an image sequence.

We propose to move the motion estimation from the surrounding frames directly to the unknown frame by parametrizing the optical flow objective function such that the interpolation assumption is directly modeled. This reparametrization is a powerful trick that results in a number of appealing properties; in particular the motion estimation becomes more robust to noise and large displacements, and the computational workload is more than halved compared to usual bidirectional methods. Finally we consider two applications of frame interpolation for distributed video coding. The first of these considers the use of depth data to improve interpolation, and the second considers using the information from partially decoded video frames to improve interpolation accuracy in high-motion video sequences.

Notes

This thesis was awarded the University of Copenhagen silver medal at the 2013 annual commemoration. A number of typos present in the original thesis have been corrected in the present version, and several references have been updated.

Lars Lau Rakêt

Contents

1 Introduction 1
2 Optical Flow 3
  2.1 Duality based optical flow 4
    2.1.1 TV-L1 optical flow 5
    2.1.2 Minimizing affine L1-L2 energies 6
    2.1.3 Alternative optical flow formulations 10
  2.2 Algorithm 16
  2.3 Results 17
3 Applications 22
  3.1 Registration of X-ray images 22
    3.1.1 Registration of structural images 23
  3.2 Registration of 2D chromatograms 28
    3.2.1 Registration algorithm 28
  3.3 Image interpolation with a symmetric optical flow constraint 32
    3.3.1 Motion Compensated Frame Interpolation 33
    3.3.2 Reparametrizing Optical Flow for Interpolation 33
    3.3.3 Results 34
  3.4 Image interpolation using depth data 39
    3.4.1 Optical flow computation using brightness and depth 39
    3.4.2 Interpolation 40
    3.4.3 Results 40
  3.5 Image interpolation using partially decoded frames 43
    3.5.1 Initial frame interpolation 43
    3.5.2 Upsampling images from DCT coefficients 44
    3.5.3 Motion reestimation and interpolation 47
4 Conclusions and future directions 49
Bibliography 51

Chapter 1

Introduction

This thesis is an answer to the call for prize papers announced at the University of Copenhagen's annual commemoration 2011. In particular, it is an answer to the topic "Regularized energy methods in image analysis", proposed by the Department of Computer Science.

For the energy method in question, we consider the TV-L1 optical flow formulation, which has received a lot of attention in recent years. With the introduction of the so-called duality based method for minimizing this energy, Zach et al. (2007) opened the door to an entirely new way of estimating optical flow, one that has fundamentally changed the field.

While the method introduced by Zach et al. (2007) is powerful, the original formulation is somewhat limiting. We begin this thesis with a theoretical section, where we first review the original formulation. We then consider extensions to allow for vector valued images, which will make it possible to estimate optical flows using color images. This extension was originally presented in Rakêt et al. (2011). We furthermore consider alternative definitions of the total variation term that is used for regularizing the results. A number of related formulations of the optical flow problem that fit into the duality based algorithm are reviewed, and in relation to this, we propose new data and regularization terms, and give directions on the minimization of the corresponding energies.

We finally end the theoretical chapter by presenting a highly optimized algorithmic setup to estimate optical flows, and give results for some of the presented algorithms on benchmark data from the Middlebury Optical Flow Database (Baker et al. 2011).

The second part of this thesis consists of five novel applications of optical flow. The first application is registration of medical images, where X-ray images of different hands, taken using different imaging devices, are registered using a TV-L1 optical flow algorithm. In addition we consider the use of sparsity enhancing regularization of the input images, in order to improve registration results.

The second application considers registration of 2D chromatograms. For this particular dataset, registration is only necessary in one of the two dimensions of the data. With a fixed second dimension we may consider this as a vector valued registration problem with values having several hundred dimensions. A novel method for solving this is proposed, where instead of coupling the different channels through the data term, the coupling is done through the regularization. This results in a very simple formulation of the problem, which may in addition be solved much more efficiently than the conventional coupling. This method, which may be used on many types of data, was originally developed for the presented example in Rakêt & Markussen (2014), where the registration is used as a preprocessing step, prior to analysis of the dataset.

In the third application of the TV-L1 optical flow algorithm we consider the problem of interpolating unknown frames in an image sequence. We propose to move the motion estimation from the surrounding frames directly to the unknown frame, by parametrizing the optical flow objective function such that the interpolation assumption is directly modeled. This reparametrization is a powerful trick that results in a number of appealing properties; in particular the motion estimation becomes more robust to noise and large displacements, and the computational workload is more than halved compared to usual bidirectional methods. This method was originally presented in Rakêt et al. (2012a).

Finally we consider two applications of frame interpolation meant to be used in distributed video coding setups. The first of these considers the use of depth data to improve interpolation quality. We show that including a standard asymmetric data term for the depth data along with the symmetric data term presented in the previous application gives significantly better interpolation results than using either of the terms on their own. This application has been developed for the distributed video codec by Salmistraro, Rakêt, Zamarin, Ukhanova & Forchhammer (2013).

For the second application of interpolation in a distributed video coding setup, we consider using the information from partially decoded video frames to improve accuracy in high-motion video sequences. We develop a method to generate rough estimates of the frame to be decoded in the pixel domain, based on the decoded information in the transform domain. With these initial estimates we are able to use a TV-L1 optical flow method to fill in the fine details from the two known surrounding key frames. This method is used in the distributed video codec described in Luong et al. (2013), and has resulted in a significant bitrate saving compared to the current state-of-the-art codec SING (Luong et al. 2012).

All results of this thesis are, unless otherwise mentioned, original and unpublished, and have been independently developed by the author.


Chapter 2

Optical Flow

The optical flow problem dates back to the works of Lucas & Kanade (1981) and Horn & Schunck (1981), which respectively proposed local and global resolution strategies. Given two images $I_0$ and $I_1$, the main problem in optical flow is defining a map $v$ such that the difference between $I_1$ warped according to $v$ and $I_0$,

$$I_1(x + v(x)) - I_0(x), \tag{2.1}$$

is close to zero. Requiring (2.1) to equal zero is problematic in a number of ways. It is ill-posed; in the standard case we have one-dimensional brightness images, and for each point $x$ we need to estimate the two components of $v$ from a single equation. In addition the problem is highly non-linear. To deal with these problems, the term (2.1) is typically linearized by means of its first order Taylor approximation in $v$. Local methods assume that the displacement $v(x)$ is similar in a neighborhood of $x$, which typically gives enough linearly independent equations in the channels of $v$ for proper estimation. In contrast, global optical flow methods typically use a pointwise data term based on the linearization of (2.1), but add a regularization term that penalizes erratic behavior of $v$, giving an energy that must be minimized in order to estimate $v$.
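The linearization can be illustrated concretely. In the sketch below (illustrative code, not from the thesis), the coefficients of the linearized residual $\rho(v)(x) = a(x)^\top v(x) + b(x)$, with $a = \nabla I_1$ and $b = I_1 - I_0$, are assembled with finite differences and checked on a ramp image translated by one pixel:

```python
import numpy as np

def linearized_data_term(I0, I1):
    """Coefficients of rho(v)(x) = a(x)^T v(x) + b(x), the first order
    Taylor expansion of I1(x + v(x)) - I0(x) around v = 0."""
    ay, ax = np.gradient(I1)               # spatial gradient of I1
    b = I1 - I0                            # temporal difference
    return np.stack([ax, ay], axis=-1), b

# Toy check: a horizontal ramp translated by one pixel in x.
I1 = np.tile(np.arange(16, dtype=float), (8, 1))
I0 = I1 - 1.0                              # I0 is I1 shifted by v = (-1, 0)
a, b = linearized_data_term(I0, I1)
rho = a @ np.array([-1.0, 0.0]) + b        # residual at the true flow
print(np.allclose(rho, 0.0))               # True
```

On this linear image the Taylor expansion is exact, so the residual vanishes identically at the true displacement.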

In the original formulation by Horn & Schunck (1981), optical flow is defined as "the distribution of apparent velocities of movement of brightness patterns in an image", which is directly compatible with (2.1) for grayscale images. Rather than this original definition, optical flow is today often thought of as the projected scene flow (Barron et al. 1994), that is, the true motion of the objects in the scene as seen from the image plane.

Today the variational global approach to optical flow estimation is by far the method of choice for high accuracy optical flow algorithms, and judging from the authoritative Middlebury optical flow benchmark (Baker et al. 2011), the optical flow problem is essentially solved. The accuracy of optical flow algorithms has only increased marginally since 2010, when Xu et al. (2010) presented their estimation framework (Xu et al. 2012), with average endpoint errors of the estimated motion vectors that are typically less than one fifth of a pixel.

So why still consider this problem, if one can only hope to push accuracy in the second or third decimal on benchmark data? A number of prominent reasons come to mind. First, many of the top performing methods require large amounts of time (up to 10 hours) to compute a single displacement field for small resolution images. Secondly, it seems that almost all top performing methods are either very complex in their formulation, or rely on solving the optical flow problem using highly sophisticated setups. In addition many methods rely on 'tricks' (Sun et al. 2010), and proper tuning of a large number of parameters. Finally, it seems that most focus has been on a single benchmark dataset, which means that many methods are essentially tailored to the specific evaluation setup. The consequence of this is that only little of the work that has been put into solving the optical flow problem given by the Middlebury benchmark has actually been transferred to possibly benefit related problems such as processing of video data or registration of medical images.

In this chapter we will review the so-called duality based optical flow method with a special focus on the TV-L1 optical flow formulation, which is both fast and has been used in many different applications, demonstrating its robustness.

2.1 Duality based optical flow

Given a domain $T \subseteq \mathbb{R}^d$ and a sequence of images $I_t : T \to \mathbb{R}^k$, $I = (I_t)_{t\in\mathcal{T}}$ for suitable $\mathcal{T}$, we want to estimate the optical flow $v : T \to \mathbb{R}^d$ such that the motion matches the image sequence with respect to some measure. We will consider a variational approach where the flow $v$ is estimated as a minimizer of an energy of the form

$$E(v) = F(I, v) + G(v), \tag{2.2}$$

with $F$ being a positive functional measuring data fidelity, and where $G$ acts as a regularization term. Many energies of this type have been suggested throughout the years, and a large variety of solution methods exist (Horn & Schunck 1981, Papenberg et al. 2006, Zach et al. 2007, Zimmer et al. 2011). Here we will focus on a specific relaxation of the problem, and consider the minimization methods in this framework. The relaxed energy is obtained by introducing an auxiliary variable, effectively splitting the minimization problem into two quadratically coupled problems,

$$E(u, v) = F(I, v) + \frac{1}{2\theta}\int \|v(x) - u(x)\|^2\,\mathrm{d}x + G(u). \tag{2.3}$$

For $\theta \to 0$ a minimizer of (2.2) and (2.3) will clearly be the same, so the hope is that for $\theta$ small, a minimizer of the relaxed energy (2.3) will be close to a minimizer of the original energy (2.2). It may seem troublesome to introduce an auxiliary variable, since one has to iteratively solve the two energies

$$E_1(v) = F(I, v) + \frac{1}{2\theta}\int \|v(x) - u(x)\|^2\,\mathrm{d}x, \tag{2.4}$$

$$E_2(u) = \frac{1}{2\theta}\int \|v(x) - u(x)\|^2\,\mathrm{d}x + G(u), \tag{2.5}$$

and for a wide variety of choices of $F$ and $G$, methods exist that directly target (relaxed) variants of the original energy (2.2). The splitting on the other hand also has a number of advantages. Typically the two sub-problems (2.4) and (2.5) are much easier to solve, and in a number of important cases the minimization problems may very easily be solved on massively parallel processors such as GPUs. Another positive feature is that data-matching and regularization are done independently, so one may easily replace one without changing the minimization of the other, a fact that makes comparison of different types of energies uncomplicated and fair, since the minimization is done in a fully comparable framework.
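The splitting can be seen at work on the smallest possible example. The scalar toy below (purely illustrative, not from the text) uses $F = |v - 1|$ and $G = v^2$, for which both sub-problems have closed form minimizers; alternating them drives $v$ toward $1/2$, the minimizer of the original energy:

```python
import numpy as np

# Scalar toy version of the quadratic splitting (2.3):
#   original: E(v)    = |v - 1| + v^2
#   relaxed:  E(u, v) = |v - 1| + (v - u)^2 / (2*theta) + u^2
def alternate(theta, iters=50000):
    u = v = 0.0
    for _ in range(iters):
        v = u + np.clip(1.0 - u, -theta, theta)   # E_1-step: shrinkage toward 1
        u = v / (1.0 + 2.0 * theta)               # E_2-step: quadratic solve
    return u, v

u, v = alternate(theta=1e-3)
print(abs(v - 0.5) < 0.01)   # True: close to the minimizer of the original energy
```

Note the trade-off hinted at in the text: the smaller $\theta$ is, the tighter the coupling, but the more alternations are needed to converge.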

2.1.1 TV-L1 optical flow

The by now classic duality based TV-L1 optical flow algorithm of Zach et al. (2007) uses an $L^1$ norm for the data matching term $F$, and a vectorial total variation term for the regularization $G$, giving an energy of the form

$$E(v) = \lambda\int_T \|R(v)(x)\|\,\mathrm{d}x + \int_T \|Dv(x)\|\,\mathrm{d}x, \tag{2.6}$$

where $R$ is the given constancy assumption, which is typically defined from some variant of (2.1).

The TV-L1 formulation was originally proposed by Brox et al. (2004), who also described a modern implementation in detail, and gave a theoretical account for the choices. This algorithm marked a turning point with respect to optical flow accuracy, and also helped boost the performance of later algorithms. The estimation in Brox et al. (2004) is based on the Euler-Lagrange framework, which requires smooth functionals, and so the Euclidean norms in (2.6) are replaced with Charbonnier functions

$$\|\cdot\|_\varepsilon = \sqrt{\|\cdot\|^2 + \varepsilon^2}, \tag{2.7}$$

where $\varepsilon$ is some small number.

Zach et al. (2007) proposed to recover a minimizer of (2.6) by iteratively minimizing the two convex quadratically coupled problems described in (2.3). In the given formulation, they are

$$E_1(v) = \lambda\int_T \|\rho(v)(x)\|\,\mathrm{d}x + \frac{1}{2\theta}\int_T \|v(x) - u(x)\|^2\,\mathrm{d}x, \tag{2.8}$$

$$E_2(u) = \int_T \|D_S u(x)\|\,\mathrm{d}x + \frac{1}{2\theta}\int_T \|v(x) - u(x)\|^2\,\mathrm{d}x, \tag{2.9}$$

where $\rho$ is the linearization of a grayscale data fidelity term $R$, and the chosen total variation term is defined as the sum of the total variation over all channels,

$$\int_T \|D_S u(x)\|\,\mathrm{d}x = \sum_{i=1}^{d}\int_T \|\nabla u_i(x)\|\,\mathrm{d}x. \tag{2.10}$$

The flow is recovered by iteratively minimizing these energies in a coarse-to-fine pyramid scheme for some small $\theta$.

For one-dimensional images, using for example only the brightness, we get a linearized data term of the form $\rho(v)(x) = a^\top v(x) + b$, with $a \in \mathbb{R}^d$ and $b \in \mathbb{R}$. The minimizer of $E_1$ can be computed using the results given in Zach et al. (2007). These results are replicated in general form in the following lemma.

Lemma 2.1.1. For $\rho(v)(x) = a^\top v(x) + b$, the minimizer of $E_1$ is given by

$$v(x) = u(x) - \pi_{\theta\lambda[-a,a]}\!\left(u(x) + \frac{b}{\|a\|^2}a\right), \tag{2.11}$$

where $\pi_{\theta\lambda[-a,a]}$ is the projection onto the line segment joining the vectors $-\theta\lambda a$ and $\theta\lambda a$,

$$\pi_{\theta\lambda[-a,a]}\!\left(u(x) + \frac{b}{\|a\|^2}a\right) = \begin{cases}-\theta\lambda a & \text{if } \rho(u)(x) < -\theta\lambda\|a\|^2,\\ \theta\lambda a & \text{if } \rho(u)(x) > \theta\lambda\|a\|^2,\\ \dfrac{\rho(u)(x)}{\|a\|^2}a & \text{if } |\rho(u)(x)| \le \theta\lambda\|a\|^2.\end{cases} \tag{2.12}$$
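The thresholding of Lemma 2.1.1 is cheap to apply pointwise. The following sketch (illustrative, with the product $\theta\lambda$ passed as a single constant `tl`) applies (2.11)-(2.12) at one location:

```python
import numpy as np

def e1_minimizer(u, a, b, tl):
    """Pointwise minimizer of E_1 at one location (Lemma 2.1.1).
    u, a : vectors in R^d; b : scalar; tl : the product theta*lambda."""
    rho = a @ u + b                    # linearized residual at u
    n2 = a @ a
    if rho < -tl * n2:
        step = -tl * a                 # saturated case
    elif rho > tl * n2:
        step = tl * a                  # saturated case
    else:
        step = (rho / n2) * a          # interior case: residual cancelled
    return u - step

u = np.array([0.2, -0.1]); a = np.array([1.0, 2.0]); b = 0.05
v = e1_minimizer(u, a, b, tl=0.3)
print(np.isclose(a @ v + b, 0.0))      # True: interior case zeroes the residual
```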

The regularization energy (2.9) is elegantly minimized by the method of Chambolle (2004). The solution is reproduced in the following lemma.

Lemma 2.1.2 (Chambolle). The minimizer $u$ of $E_2$ is given coordinatewise by $u_i = v_i - \theta\,\nabla\cdot p_i$ for $i = 1, \ldots, d$, where $p_i : T \to \mathbb{R}^d$ can be computed by the iterative fixed-point scheme

$$p_i^{n+1} = \frac{\theta p_i^n + \tau\nabla(\theta\,\nabla\cdot p_i^n - v_i)}{\theta + \tau\,|\nabla(\theta\,\nabla\cdot p_i^n - v_i)|}.$$
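A minimal single-channel sketch of the fixed-point scheme (an illustrative discretization with forward-difference gradient and its adjoint divergence; `tau = 0.125` follows Chambolle's step-size bound, and all names are ours, not the thesis's):

```python
import numpy as np

def grad(f):                                   # forward differences
    gx = np.zeros_like(f); gy = np.zeros_like(f)
    gx[:, :-1] = f[:, 1:] - f[:, :-1]
    gy[:-1, :] = f[1:, :] - f[:-1, :]
    return np.stack([gx, gy])

def div(p):                                    # discrete divergence, -grad^T
    px, py = p
    dx = np.zeros_like(px); dy = np.zeros_like(py)
    dx[:, 0] = px[:, 0]; dx[:, 1:-1] = px[:, 1:-1] - px[:, :-2]; dx[:, -1] = -px[:, -2]
    dy[0, :] = py[0, :]; dy[1:-1, :] = py[1:-1, :] - py[:-2, :]; dy[-1, :] = -py[-2, :]
    return dx + dy

def chambolle(v, theta, tau=0.125, iters=200):
    """Fixed-point scheme of Lemma 2.1.2 for a single channel v_i."""
    p = np.zeros((2,) + v.shape)
    for _ in range(iters):
        g = grad(theta * div(p) - v)
        p = (p + (tau / theta) * g) / (1.0 + (tau / theta) * np.sqrt((g ** 2).sum(0)))
    return v - theta * div(p)

v = np.random.default_rng(0).standard_normal((16, 16))
u = chambolle(v, theta=0.5)
print(u.shape, np.var(u) < np.var(v))   # (16, 16) True: the noise is smoothed
```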

These lemmas provide an elegant and easily implementable solution to the relaxed optical flow problem given by (2.8) and (2.9). On the other hand, the given formulation is somewhat restrictive; for example it does not allow for the use of vector valued images such as color images. In the next section, the problem of using an $L^1$ norm for the data term with vector valued images is considered. The problem is analyzed from a convex analysis point of view, and a solution to the resulting energy (2.8) is given. These results were originally proposed by Rakêt et al. (2011).

2.1.2 Minimizing affine L1-L2 energies

Consider an L1-L2 energy of the following form,

$$E_1(v) = \lambda\int_T \|Av(x) + b(x)\|\,\mathrm{d}x + \frac{1}{2}\int_T \|v(x) - u(x)\|^2\,\mathrm{d}x, \tag{2.13}$$

where $A : \mathbb{R}^d \to \mathbb{R}^k$ is linear. Because no differential of $v$ is involved, the minimization of (2.13) boils down to a pointwise minimization of a strictly convex cost function of the form

$$f(v) = \lambda\|Av + b\| + \frac{1}{2}\|v - u\|^2. \tag{2.14}$$

In the following we present the tools used for solving the minimization problem (2.14). We recall first a few elements of convex analysis; the reader can refer to Ekeland & Temam (1999) for a complete introduction to convex analysis in both finite and infinite dimension. Here we will restrict ourselves to finite dimensional problems.

A function $f : \mathbb{R}^d \to \mathbb{R}$ is one-homogeneous if $f(\lambda x) = \lambda f(x)$ for all $\lambda > 0$. For a one-homogeneous function, it is easily shown that its Legendre-Fenchel transform

$$f^*(x^*) = \sup_{x\in\mathbb{R}^d}\{\langle x, x^*\rangle - f(x)\} \tag{2.15}$$

is the characteristic function of a closed convex set $C$ of $\mathbb{R}^d$,

$$\delta_C(x^*) := f^*(x^*) = \begin{cases}0 & \text{if } x^*\in C,\\ +\infty & \text{otherwise.}\end{cases} \tag{2.16}$$

The one-homogeneous functions that will interest us here are of the form $f(x) = \|Ax\|$, where $A : \mathbb{R}^d \to \mathbb{R}^k$ is linear, and $\|\cdot\|$ is the usual Euclidean norm of $\mathbb{R}^k$. The computation of the associated Fenchel transform involves the Moore-Penrose pseudoinverse $A^\dagger$ of $A$. We recall its construction.

The kernel (or null-space) of $A$, denoted $\mathrm{Ker}\,A$, is the vector subspace of the $v \in \mathbb{R}^d$ for which $Av = 0$. The image of $A$, denoted $\mathrm{Im}\,A$, is the subspace of $\mathbb{R}^k$ reached by $A$. The orthogonal complement of $\mathrm{Ker}\,A$ is denoted $\mathrm{Ker}\,A^\perp$. Denote by $\iota$ the inclusion map $\mathrm{Ker}\,A^\perp \to \mathbb{R}^d$ and let $\pi$ be the orthogonal projection $\mathbb{R}^k \to \mathrm{Im}\,A$. It is well known that the composition map

$$B = \pi\circ A\circ\iota : \mathrm{Ker}\,A^\perp \to \mathrm{Im}\,A \tag{2.17}$$

is a linear isomorphism between $\mathrm{Ker}\,A^\perp$ and $\mathrm{Im}\,A$. The Moore-Penrose pseudoinverse $A^\dagger$ of $A$ is defined as

$$A^\dagger = \iota\circ B^{-1}\circ\pi. \tag{2.18}$$
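The construction $A^\dagger = \iota\circ B^{-1}\circ\pi$ can be checked numerically. In the sketch below (illustrative, not from the text), orthonormal bases for $\mathrm{Ker}\,A^\perp$ and $\mathrm{Im}\,A$ are taken from an SVD, which is one convenient implementation choice:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3)) @ np.diag([1.0, 2.0, 0.0])   # a rank 2 map R^3 -> R^4

# Orthonormal bases: columns of V span Ker(A)-perp, columns of U span Im(A).
U_full, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))              # numerical rank
V = Vt[:r].T                            # represents iota : Ker(A)-perp -> R^3
U = U_full[:, :r]                       # represents pi   : R^4 -> Im(A)

B = U.T @ A @ V                         # B = pi . A . iota, an r x r isomorphism
A_dagger = V @ np.linalg.inv(B) @ U.T   # A-dagger = iota . B^{-1} . pi

print(np.allclose(A_dagger, np.linalg.pinv(A)))   # True
```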

With this, the following lemma provides the Legendre-Fenchel transform of $f(x)$:

Lemma 2.1.3. The Legendre-Fenchel transform of $x \mapsto \|Ax\|$ is the characteristic function $\delta_C$ of the elliptic ball $C$ given by the set of $x$'s in $\mathbb{R}^d$ that satisfy the following conditions:

$$A^\dagger Ax = x, \tag{2.19}$$

$$x^\top A^\dagger A^{\dagger\top}x \le 1. \tag{2.20}$$

From the properties of pseudoinverses, the equality $x = A^\dagger Ax$ means that $x$ belongs to $\mathrm{Ker}\,A^\perp$. In fact, $A^\dagger A$ is the orthogonal projection onto $\mathrm{Ker}\,A^\perp$. On this subspace, $A^\dagger A^{\dagger\top}$ is positive definite and the inequality thus defines an elliptic ball.

The lemma will not be proven here, but we indicate how it can be done. In the case where $A$ is the identity $I_d$ of $\mathbb{R}^d$, it is easily shown that $C$ is the unit ball of $\mathbb{R}^d$. The case where $A$ is invertible follows easily, while the general case follows from the latter using the structure of the pseudoinverse (see Golub & van Loan (1989) for instance).
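As a sanity check of Lemma 2.1.3 (our addition, using the standard fact, not stated explicitly above, that $C = \{A^\top y : \|y\|\le 1\}$), one can sample such points and verify that they satisfy (2.19) and (2.20):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 2))
A_dag = np.linalg.pinv(A)

ok = True
for _ in range(100):
    y = rng.standard_normal(3)
    y /= max(1.0, np.linalg.norm(y))                # enforce ||y|| <= 1
    xs = A.T @ y                                    # a point of C
    ok &= np.allclose(A_dag @ A @ xs, xs)           # (2.19): xs in Ker(A)-perp
    ok &= xs @ (A_dag @ A_dag.T) @ xs <= 1 + 1e-9   # (2.20): inside the ball
print(ok)   # True
```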

We can now state the main result, which allows us to generalize the TV-L1 algorithm of Zach et al. (2007) to calculate the optical flow between two vector valued images.

Proposition 2.1.4. The minimizer of the function $f(v) = \lambda\|Av + b\| + \frac{1}{2}\|v - u\|^2$ is given as follows.

(i) In the case $b \notin \mathrm{Im}\,A$, $f(v)$ is smooth. It can be minimized by usual methods.

(ii) In the case where $b \in \mathrm{Im}\,A$, $f(v)$, which fails to be smooth for $v \in \mathrm{Ker}\,A - A^\dagger b$, reaches its unique minimum at

$$v = u - \pi_{\lambda C}\!\left(u + A^\dagger b\right), \tag{2.21}$$

where $\pi_{\lambda C}$ is the projection onto the convex set $\lambda C = \{\lambda x,\ x \in C\}$, with $C$ as described in Lemma 2.1.3.
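Formula (2.21) can be checked against brute force in the simplest instance $A = I$, where the elliptic ball $C$ is the unit ball and the projection just rescales to norm at most $\lambda$ (an illustrative check, not part of the text):

```python
import numpy as np

u = np.array([0.8, -0.4]); b = np.array([0.3, 0.1]); lam = 0.25

def proj_ball(z, r):                       # projection onto {x : ||x|| <= r}
    n = np.linalg.norm(z)
    return z if n <= r else (r / n) * z

v_closed = u - proj_ball(u + b, lam)       # formula (2.21) with A = I, A-dagger b = b

# Brute-force minimization of f over a grid around the solution
f = lambda v: lam * np.linalg.norm(v + b) + 0.5 * np.sum((v - u) ** 2)
grid = np.linspace(-2.0, 2.0, 201)
V = np.stack(np.meshgrid(grid, grid), axis=-1).reshape(-1, 2)
v_grid = V[np.argmin([f(v) for v in V])]
print(np.allclose(v_closed, v_grid, atol=0.03))   # agree up to the grid spacing
```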

Proof. To see (i), write $b$ as $Ab_0 + b_1$, with $b_0 = A^\dagger b$, $Ab_0$ being then the orthogonal projection of $b$ onto $\mathrm{Im}\,A$, while $b_1$ is the residual of the projection. The assumption of (i) implies that $b_1 \neq 0$ is orthogonal to the image of $A$. One can then write

$$\|Av + b\| = \|A(v + b_0) + b_1\| = \sqrt{\|A(v + b_0)\|^2 + \|b_1\|^2}, \tag{2.22}$$

which is always strictly positive as $\|b_1\|^2 > 0$, and smoothness follows.

In the situation of (ii), since $b \in \mathrm{Im}\,A$, we can do the substitution $v \leftarrow v + A^\dagger b$ in the function (2.14), and the resulting function has the same form as a number of functions found in Chambolle (2004) and Chambolle & Pock (2011). We refer the reader to them for the computation of minimizers.

Proposition 2.1.4 generalizes Lemma 2.1.1 since, on one-dimensional spaces, elliptic balls are simply line segments. The next examples extend to multidimensional values.

Example 2.1.1. Consider the minimization problem

$$\arg\min_v\left(\lambda\|Av + b\| + \frac{1}{2}\|v - u\|^2\right), \quad \lambda > 0, \tag{2.23}$$

where $A \in \mathbb{R}^{k\times 2}$ and $b \in \mathrm{Im}\,A$. If $A$ has maximal rank (i.e. 2), then it is well known that the $2\times 2$ matrix $C = A^\dagger A^{\dagger\top}$ is symmetric and positive definite (Golub & van Loan 1989). The set $C$ is then an elliptic disc determined by the eigenvectors and eigenvalues of the matrix $C$. The projection may be computed by the efficient algorithm described in Example 2.1.3, which has much better properties than the method originally suggested in Rakêt et al. (2011).

In the case where the matrix has two linearly dependent columns $a \neq 0$ and $ca$, a series of straightforward calculations gives

$$\mathrm{Ker}\,A = \mathbb{R}y,\qquad \mathrm{Ker}\,A^\perp = \mathbb{R}x,\qquad \mathrm{Im}\,A = \mathbb{R}a, \tag{2.24}$$

with $x = \frac{1}{\sqrt{1+c^2}}(1, c)^\top$ and $y = \frac{1}{\sqrt{1+c^2}}(-c, 1)^\top$ an orthonormal basis of $\mathbb{R}^2$, and

$$A^\dagger A^{\dagger\top} = \frac{1}{(1+c^2)^2\|a\|^2}\begin{pmatrix}1 & c\\ c & c^2\end{pmatrix}. \tag{2.25}$$

If $c = 0$, the inequality (2.20) from Lemma 2.1.3 just amounts to

$$\frac{u_1^2}{\|a\|^2}\le 1 \iff -\|a\| \le u_1 \le \|a\|,\qquad u = (u_1, u_2)^\top, \tag{2.26}$$

that is, a vertical strip, while the equality (2.19) in Lemma 2.1.3 simply says that $u_2 = 0$; thus $C$ is the line segment

$$[-\|a\|x, \|a\|x]\subset\mathbb{R}^2. \tag{2.27}$$

The case where $c \neq 0$ is identical, and obtained for instance by rotating the natural basis of $\mathbb{R}^2$ to the basis $(x, y)$.
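A quick numerical check of (2.25) for an arbitrary rank-one matrix with columns $a$ and $ca$ (illustrative values, chosen by us):

```python
import numpy as np

a = np.array([2.0, 1.0, -1.0]); c = 1.5
A = np.column_stack([a, c * a])                   # two linearly dependent columns

lhs = np.linalg.pinv(A) @ np.linalg.pinv(A).T     # A-dagger A-dagger^T, numerically
rhs = np.array([[1.0, c], [c, c * c]]) / ((1 + c * c) ** 2 * (a @ a))   # (2.25)
print(np.allclose(lhs, rhs))   # True
```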

Example 2.1.2. Consider again the minimization problem (2.23), but this time assuming that $b \notin \mathrm{Im}\,A$. Using (2.22) we can rewrite the minimization problem as

$$\arg\min_v\left(\lambda\sqrt{\|A(v + b_0)\|^2 + \|b_1\|^2} + \frac{1}{2}\|v - u\|^2\right),\quad \lambda > 0. \tag{2.28}$$

The minimizing $v$ is found by solving the equation

$$\lambda\frac{A^\top A(v + b_0)}{\|Av + b\|} + v - u = 0,$$

which may be done by gradient descent or a (quasi-)Newton method.
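A plain gradient descent sketch for this smooth case (illustrative matrices, step size, and iteration count, all chosen by us):

```python
import numpy as np

A = np.array([[1.0, 0.0], [0.0, 2.0], [0.0, 0.0]])
b = np.array([0.1, -0.2, 0.5])                  # third component is not in Im(A)
u = np.array([1.0, -1.0]); lam = 0.3

b0 = np.linalg.pinv(A) @ b                      # b = A b0 + b1 with b1 orthogonal

def grad_f(v):                                  # gradient of the smooth objective
    return lam * A.T @ A @ (v + b0) / np.linalg.norm(A @ v + b) + v - u

v = u.copy()
for _ in range(2000):                           # plain gradient descent
    v -= 0.1 * grad_f(v)

print(np.linalg.norm(grad_f(v)) < 1e-8)         # True: stationarity reached
```

Since $\|b_1\| > 0$ the denominator is bounded away from zero, so the objective is smooth and strongly convex, and the simple fixed step converges.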

Example 2.1.3. Consider the problem of projecting a point $x_0$ onto the ellipsoid given by $C = \{x \in \mathbb{R}^k \mid x^\top Cx \le 1\}$, where $x_0 \notin C$, that is, computing $\hat{x}$ given by

$$\hat{x} = \arg\min_{x\in C}\|x - x_0\|^2.$$

This problem can be solved by introducing a Lagrange multiplier $\xi$, giving the objective function

$$f(x, \xi) = \|x - x_0\|^2 + \xi(x^\top Cx - 1).$$

From the condition that

$$\frac{\partial}{\partial x}f(x, \xi) = 2(x - x_0) + 2\xi Cx = 0,$$

we get that

$$\hat{x} = (\xi C + I)^{-1}x_0.$$

However, we need to determine the value of the Lagrange multiplier $\xi$. Since we assumed that $x_0$ was outside the ellipsoid, we know that the projected point will lie on the boundary $\partial C$ of the ellipsoid, that is, $\xi$ is a root of

$$G(\xi) = \left((\xi C + I)^{-1}x_0\right)^\top C(\xi C + I)^{-1}x_0 - 1. \tag{2.29}$$

We can use the following theorem due to Kiseliov (1994) to determine the correct value of $\xi$.

Theorem 2.1.5. The root $\xi^*$ of (2.29) is unique and can be found by the iterative Newton process

$$\xi_0 = 0,\qquad \xi_{n+1} = \xi_n - \frac{G(\xi_n)}{G'(\xi_n)},$$

where $\xi_n \uparrow \xi^*$. The rate of convergence is quadratic.

Proof. Since we are assuming that $x_0 \notin C$,

$$G(0) = x_0^\top Cx_0 - 1 > 0,\qquad \lim_{\xi\to\infty}G(\xi) = -1 < 0,$$

which gives that a root exists in $[0, \infty)$. Since

$$0 = \frac{\mathrm{d}}{\mathrm{d}\xi}\left((\xi C + I)^{-1}(\xi C + I)\right) = \left(\frac{\mathrm{d}}{\mathrm{d}\xi}(\xi C + I)^{-1}\right)(\xi C + I) + (\xi C + I)^{-1}\left(\frac{\mathrm{d}}{\mathrm{d}\xi}(\xi C + I)\right),$$

we have that

$$\frac{\mathrm{d}}{\mathrm{d}\xi}(\xi C + I)^{-1} = -(\xi C + I)^{-1}C(\xi C + I)^{-1} = -(\xi C + I)^{-2}C.$$

Using this we can differentiate $G$,

$$G'(\xi) = -2x_0^\top(\xi C + I)^{-3}C^2x_0,$$

and in addition

$$G''(\xi) = 6x_0^\top(\xi C + I)^{-4}C^3x_0,$$

where we have used the commutativity of $C$ and $(\xi C + I)^{-1}$. Since $(\xi C + I)^{-1}$ has full rank for all $\xi \ge 0$, we see that $G'(\xi) < 0$ and $G''(\xi) > 0$, so the solution $\xi^*$ is unique.

Kiseliov (1994) in addition gives a non-linear version of the Newton process described in the above theorem, which is even more efficient. Compared to the added complexity of the implementation, the overall gain of using such an algorithm is limited, and we recommend the process described here.
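A direct implementation sketch of the Newton process of Theorem 2.1.5 (illustrative; a fixed iteration count is used rather than a convergence criterion):

```python
import numpy as np

def project_ellipsoid(x0, C, iters=50):
    """Newton process of Theorem 2.1.5: project x0 onto {x : x^T C x <= 1}."""
    I = np.eye(len(x0)); xi = 0.0
    for _ in range(iters):
        M = np.linalg.inv(xi * C + I)
        G = x0 @ M @ C @ M @ x0 - 1.0                          # equation (2.29)
        dG = -2.0 * x0 @ np.linalg.matrix_power(M, 3) @ C @ C @ x0
        xi = xi - G / dG
    return np.linalg.inv(xi * C + I) @ x0

C = np.diag([4.0, 1.0])                    # the ellipse x1^2/0.25 + x2^2 <= 1
x0 = np.array([2.0, 2.0])                  # a point outside the ellipse
x = project_ellipsoid(x0, C)
print(np.isclose(x @ C @ x, 1.0))          # True: projection lies on the boundary
```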

2.1.3 Alternative optical flow formulations

In the wake of the algorithm by Zach et al. (2007), a large number of duality based or primal-dual methods emerged in optical flow estimation. The initial focus was mainly on improving regularization. Wedel et al. (2009b) consider structure and motion adaptive regularization. Werlberger et al. (2009) have gone further to consider full anisotropic regularization, where regularization directions are weighted differently by means of a diffusion tensor. In addition, the $L^1$ norm used in the regularization of the original TV-L1 was replaced with a Huber norm that is smooth at the origin, thus eliminating the staircasing effect of the regularization. In Werlberger et al. (2010) non-local total variation is considered, where a low level image segmentation is integrated in the regularization. This in turn produces very sharp motion boundaries, and preserves small scale structures in the flow very well.

In addition to the refinement of regularization techniques, some work has been done on reformulating data terms. Wedel et al. (2008) show how to minimize a sum of two $L^1$ data terms for one-dimensional images. Recognizing the pointwise structure of many data terms, Steinbrücker et al. (2009a) proposed to use brute-force minimization of the data fidelity energy (2.4), without linearizing the optical flow constraint (2.1). A number of more advanced pointwise data terms are considered in Steinbrücker et al. (2009b), but unfortunately the quality of the resulting flows is not as impressive as one could hope for. Werlberger et al. (2010) use truncated normalized cross correlation for their data term. This data term is attractive because of its invariance to multiplicative illumination changes in the scene. It is however not defined pointwise, and thus needs a more complex minimization strategy. This is done by a second order approximation of the data term, in contrast to the usual first order approximation. Building on these ideas, Panin (2012) considers a mutual information data term. Although the benchmark optical flow results of Panin (2012) cannot compete with a highly optimized TV-L1 implementation (Wedel et al. 2009a), the algorithm shows impressive results under less optimal conditions such as noise and transformations of the values in one of the images to be registered.

In the following we will consider some examples of alternative data and regularization terms, and consider how they may be minimized. At the end we will consider other extensions.

Example 2.1.4 (L2 data term). Consider the cost function

$$f(v) = \frac{\lambda}{2}\|Av + b\|^2 + \frac{1}{2\theta}\|v - u\|^2.$$

The cost function is clearly smooth and convex in $v$, so there is a unique minimizer to the problem that can be found as the solution to $\frac{\mathrm{d}}{\mathrm{d}v}f(v) = 0$. Multiplying the gradient by $\theta$, we readily get that

$$\theta\frac{\mathrm{d}}{\mathrm{d}v}f(v) = v - u + \theta\lambda A^\top(Av + b) = (I + \theta\lambda A^\top A)v - u + \theta\lambda A^\top b,$$

and the solution to the problem is

$$v = (I + \theta\lambda A^\top A)^{-1}(u - \theta\lambda A^\top b).$$

For comparison purposes it may be interesting to rewrite this solution as

$$v = u - \theta\lambda(I + \theta\lambda A^\top A)^{-1}A^\top(Au + b).$$

Note that for $k = 1$, $A^\top = a$, so the above formula becomes

$$v = u - \frac{\theta\lambda(a^\top u + b)}{1 + \theta\lambda\|a\|^2}a. \tag{2.30}$$
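The closed form amounts to one small linear solve per point. A quick check (illustrative values) that the gradient of the $L^2$ objective vanishes at the solution:

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 3)); b = rng.standard_normal(4)
u = rng.standard_normal(3); theta, lam = 0.3, 2.0

tl = theta * lam
v = np.linalg.solve(np.eye(3) + tl * A.T @ A, u - tl * A.T @ b)   # closed form

# Gradient of f(v) = (lam/2)||Av+b||^2 + ||v-u||^2/(2 theta) at the solution:
grad = lam * A.T @ (A @ v + b) + (v - u) / theta
print(np.allclose(grad, 0.0))   # True
```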

Example 2.1.5 (Charbonnier norm). While the approach for minimizing the vector valued data term in Section 2.1.2 is nice from a theoretical point of view, the actual implementation of the solution given in Proposition 2.1.4 is not very practical. The checks needed in order to determine which category the given point falls into, and the iterative procedures needed to project onto an ellipsoid, or for the solution of (2.28), result in an algorithm that is hard to implement and quite slow. The original algorithm presented in Rakêt et al. (2011) dealt with this somewhat inelegantly, by ensuring full rank of the matrix $A$ by means of regularization, followed by projection of $b$ onto $\mathrm{Im}\,A$.

By replacing the Euclidean norm with the Charbonnier norm $\|\cdot\|_\varepsilon$ given in (2.7), we may avoid the checks related to the cases of Proposition 2.1.4, and instead just perform iterative minimization at all points following Example 2.1.2.

Example 2.1.6 (Interval data term). Consider the following penalty function

'(x) =

1

(

1,c

1

)

(x)(x c

1

) + 1

(c

2

,

1)

(x)(x c

2

) where c

1

 0  c

2 and 1 is the indicator function. This type of penalty may be an interesting data term when data is very noisy, or in general when a perfect data fit is not realizable over most of the image.

First consider a function of the form

'(x) + (x y)

2

.

If y 2 [c

1

, c

2

], it is minimized by x = y. For y c

2 we want to minimize

1

(c

2

, 1)

(x)(x c

2

) + (x y)

2

.

If (y c

2

)

1 /

2 the minimizer is x = c

2

, and otherwise it is x = y 1 /

2

. A similar expression is found for y

 c

1

, giving the final solution

$$x = \begin{cases} y + \tfrac{1}{2} & \text{if } y \in (-\infty,\, c_1 - \tfrac{1}{2}) \\ c_1 & \text{if } y \in [c_1 - \tfrac{1}{2},\, c_1) \\ y & \text{if } y \in [c_1,\, c_2] \\ c_2 & \text{if } y \in (c_2,\, c_2 + \tfrac{1}{2}] \\ y - \tfrac{1}{2} & \text{if } y \in (c_2 + \tfrac{1}{2},\, \infty) \end{cases}$$

Consider now the function f(v) = φ(a⊤v + b) + ‖v − u‖². Denoting ρ(v) = a⊤v + b, the solution can be found as

$$v = \begin{cases} u + \tfrac{1}{2}\,a & \text{if } \rho(u) \in (-\infty,\, c_1 - \tfrac{1}{2}\|a\|^2) \\ u - (\rho(u) - c_1)\,a/\|a\|^2 & \text{if } \rho(u) \in [c_1 - \tfrac{1}{2}\|a\|^2,\, c_1) \\ u & \text{if } \rho(u) \in [c_1,\, c_2] \\ u - (\rho(u) - c_2)\,a/\|a\|^2 & \text{if } \rho(u) \in (c_2,\, c_2 + \tfrac{1}{2}\|a\|^2] \\ u - \tfrac{1}{2}\,a & \text{if } \rho(u) \in (c_2 + \tfrac{1}{2}\|a\|^2,\, \infty) \end{cases}$$

We see that for c₁ = c₂ = 0, the solution is identical to the one given in Lemma 2.1.1. In addition it is interesting to note that similar calculations can be used to give an explicit solution for the truncated L¹ data term, which was minimized by brute force by Steinbrücker et al. (2009b).
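The five cases above are cheap to evaluate pointwise. The sketch below (illustrative names, and assuming the normalization f(v) = φ(a⊤v + b) + ‖v − u‖² used above) returns the minimizer directly:

```python
import numpy as np

def interval_prox(u, a, b, c1, c2):
    """Pointwise minimizer of f(v) = phi(a^T v + b) + ||v - u||^2 for the
    interval penalty phi of Example 2.1.6 (a sketch, not the thesis code).
    u, a: arrays of shape (d,); b, c1, c2: scalars with c1 <= 0 <= c2."""
    rho = a @ u + b
    na2 = np.dot(a, a)
    if rho < c1 - 0.5 * na2:
        return u + 0.5 * a
    if rho < c1:                      # project onto the hyperplane a^T v + b = c1
        return u - (rho - c1) / na2 * a
    if rho <= c2:                     # inside the interval: data term inactive
        return u
    if rho <= c2 + 0.5 * na2:         # project onto the hyperplane a^T v + b = c2
        return u - (rho - c2) / na2 * a
    return u - 0.5 * a
```

Since f is convex, the branch conditions partition the whole real line of residuals, and the returned point is the global minimizer.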

Example 2.1.7 (Vectorial total variation). As already mentioned the total variation of a vector valued function is not uniquely defined, and the different definitions will give results with different properties. In the following it is assumed that d = 2 for simplicity.

We have already introduced the channel-by-channel definition of the vectorial total variation (2.10), which is used in the original formulation by Zach et al. (2007). The canonical definition of vectorial total variation, which is also the definition used by the original TV-L1 algorithm of Brox et al. (2004), is

$$\int_T \|D_F u(x)\|\,dx = \sup_{p \in P} \int_T \langle u(x),\, \nabla\cdot p(x)\rangle\,dx \tag{2.31}$$

with p : ℝ² → ℝ^{2×2}, and where P = {p ∈ C¹_c(ℝ², ℝ^{2×2}) : ‖p‖₂ ≤ 1}. It is worth noting that the definition of Zach et al. (2007) corresponds to the requirement that ‖p‖_∞ ≤ 1 in P. If we assume that u is smooth, using integration by parts with proper boundary conditions yields that

$$\sup_{p \in P} \int_T \langle u(x),\, \nabla\cdot p(x)\rangle\,dx = \sup_{p \in P} \int_T \sum_{i=1}^{2} \langle \nabla u_i(x),\, p_i(x)\rangle\,dx. \tag{2.32}$$

For ∇u ≠ 0 the supremum is found to be

$$p_i = \frac{\nabla u_i}{\|\nabla u\|}, \qquad \|\nabla u\| = \sqrt{\|\nabla u_1\|^2 + \|\nabla u_2\|^2}, \tag{2.33}$$

and for ∇u = 0, p can be any function in P, which in turn means that for smooth u

$$\int \|Du(x)\|\,dx = \int \sqrt{\|\nabla u_1(x)\|^2 + \|\nabla u_2(x)\|^2}\,dx = \int \|\nabla u(x)\|\,dx. \tag{2.34}$$

This definition has some very nice properties. In particular it is rotationally invariant, and couples the channels by weighting the regularization differently across the different channels.

This definition is not directly compatible with Chambolle's algorithm (Lemma 2.1.2); however, one may use the following algorithm proposed by Bresson & Chan (2008), which is a direct extension.

Lemma 2.1.6. The minimizer u of (2.9) is given by

$$u = v - \theta\, \nabla\cdot p, \tag{2.35}$$

which can be solved with the convergent semi-implicit gradient descent scheme

$$p^{n+1} = \frac{p^n + \tau\, \nabla(\nabla\cdot p^n - v/\theta)}{1 + \tau\, \|\nabla(\nabla\cdot p^n - v/\theta)\|}, \tag{2.36}$$

where τ ≤ 1/8 and p⁰ = 0.
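For a single channel the scheme of Lemma 2.1.6 reduces to Chambolle's projection algorithm. A minimal NumPy sketch of the iteration (2.36), with illustrative function names and a simple forward/backward-difference discretization:

```python
import numpy as np

def grad(u):
    """Forward differences with Neumann boundary; returns shape (2, H, W)."""
    gx = np.zeros_like(u)
    gy = np.zeros_like(u)
    gx[:, :-1] = u[:, 1:] - u[:, :-1]
    gy[:-1, :] = u[1:, :] - u[:-1, :]
    return np.stack([gx, gy])

def div(p):
    """Backward-difference divergence, the (negative) adjoint of grad."""
    px, py = p
    dx = np.zeros_like(px)
    dy = np.zeros_like(py)
    dx[:, 0] = px[:, 0]
    dx[:, 1:] = px[:, 1:] - px[:, :-1]
    dy[0, :] = py[0, :]
    dy[1:, :] = py[1:, :] - py[:-1, :]
    return dx + dy

def tv_denoise(v, theta, n_iter=200, tau=0.125):
    """Semi-implicit scheme (2.36) for a single channel; for vector valued
    data the same iteration runs on stacked channels with a joint norm."""
    p = np.zeros((2,) + v.shape)
    for _ in range(n_iter):
        g = grad(div(p) - v / theta)
        norm = np.sqrt((g ** 2).sum(axis=0, keepdims=True))
        p = (p + tau * g) / (1.0 + tau * norm)
    return v - theta * div(p)
```

The step size tau = 1/8 is the largest value covered by the convergence guarantee quoted in the lemma.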

Recently Goldlücke et al. (2012) introduced an alternative definition of vectorial total variation. Assuming sufficient smoothness of u, this definition, which we will denote by

$$\int_T \|D_J u(x)\|\,dx,$$

corresponds to the integral over the largest singular value of the derivative matrix of u. This definition smooths in a single direction across channels, and thus does not suffer from the channel smearing effects of the two previously defined methods. The large number of examples in color imaging given by Goldlücke et al. (2012) are very convincing, showing consistently better results for this method in different applications. For the minimization of the energy (2.9) with this definition one may use the algorithm given in Goldlücke et al. (2012), which is directly comparable to the solutions given in Lemmas 2.1.2 and 2.1.6.

In the following these three definitions will be denoted by TV_S, TV_F, and TV_J respectively.
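To make the distinction between the three definitions concrete, their discrete counterparts can be computed directly from forward differences of a flow field. This is an illustrative sketch, not the thesis implementation:

```python
import numpy as np

def vectorial_tv(v, kind="J"):
    """Discrete TV_S (channel-by-channel), TV_F (Frobenius coupling) and
    TV_J (largest singular value) of a flow field v of shape (2, H, W)."""
    # forward differences: D[c, j] is the j-th derivative of channel c
    D = np.zeros((2, 2) + v.shape[1:])
    D[:, 0, :, :-1] = v[:, :, 1:] - v[:, :, :-1]
    D[:, 1, :-1, :] = v[:, 1:, :] - v[:, :-1, :]
    if kind == "S":   # sum of per-channel gradient norms
        return np.sqrt((D ** 2).sum(axis=1)).sum()
    if kind == "F":   # Frobenius norm of the 2x2 Jacobian at each pixel
        return np.sqrt((D ** 2).sum(axis=(0, 1))).sum()
    # kind == "J": largest singular value of the Jacobian at each pixel
    J = np.moveaxis(D, (0, 1), (2, 3))          # shape (H, W, 2, 2)
    return np.linalg.svd(J, compute_uv=False)[..., 0].sum()
```

For any field the pointwise inequalities σ₁ ≤ ‖·‖_F ≤ √2 σ₁ and ‖·‖_F ≤ (sum of row norms) carry over to the sums, so TV_J ≤ TV_F ≤ TV_S always holds in this discretization.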

Example 2.1.8 (1-harmonic regularization). As we have seen so far, it is common practice in optical flow estimation to formulate the regularization of the optical flow by means of an L^p norm of the flow gradient. However, when considering the nature of optical flow fields, one realizes that this is perhaps not the most natural type of regularization. An optical flow field describes the motion of a projected scene. Consider a scene where all motion is parallel to the camera plane, and objects are rigid and move in a single spatial direction. In this setup the displacement vectors of an object in the projected image will point in the same direction, but the magnitude of the flow vectors will vary depending on the distance of the particular part of the object to the camera. This suggests that one should regularize direction to a higher extent than magnitude.

Additional directional regularization has been proposed by Gai & Stevenson (2010) in the form of a 1-harmonic regularization term (Vese & Osher 2002) added to the original TV-L1 method of Wedel et al. (2009a). A fully polar representation of optical flow was considered by Adato et al. (2011), who demonstrate good results on the Middlebury training data by completely decoupling the angular and magnitude components of the flow. The chosen representation does however increase the complexity of the formulation of the problem considerably. In addition both methods are quite slow, and they owe much of their precision to being built upon existing well-performing methods (Wedel et al. (2009a) and Sun et al. (2010) respectively). It turns out that one may combine the elements of the two mentioned methods in an elegant manner, which in addition has very attractive computational properties. The regularization term we propose to use is the following

$$G(v) = \alpha \int_T \big\|\nabla \|v(x)\|\big\|\,dx + (1-\alpha)\int_T \Big\|\nabla \frac{v(x)}{\|v(x)\|}\Big\|\,dx, \qquad 0 \le \alpha \le 1.$$

The above regularization term is similar to the one of Adato et al. (2011), since it completely decouples magnitude and direction; however, instead of having to solve a constrained problem with an angular component, we use a 1-harmonic term as in Gai & Stevenson (2010).

We are now interested in minimizing an energy of the form E(v) = F(v) + G(v).

Using the standard quadratic decoupling (2.3) will introduce a coupling of magnitude and direction in the regularization, and in order to avoid that, we propose the following decoupling

$$E_1(v) = F(v) + \frac{1}{2\theta}\int_T \|v(x) - u(x)\|^2\,dx,$$

$$E_2(u) = \frac{1}{2\theta_1}\int_T \big(\|v(x)\| - \|u(x)\|\big)^2\,dx + \frac{1}{2\theta_2}\int_T \Big\|\frac{v(x)}{\|v(x)\|} - \frac{u(x)}{\|u(x)\|}\Big\|^2\,dx + G(u).$$

In this formulation magnitude and direction are independent, and representing flows this way will even allow for simultaneous minimization of the magnitude and regularization parts of the flow. The magnitude regularization just corresponds to one-dimensional total variation regularization, and may be minimized following Chambolle (2004). The directional regularization may be solved efficiently following Rakêt & Nielsen (2012).

When considering the splitting scheme, it does not seem as elegant as the typical quadratic splitting, as it cannot directly be formulated as a single energy in two variables. For θ, θ₁, and θ₂ sufficiently small, however, the two solutions should converge to each other. In addition, if one considers the splitting as an iterative estimation process, it makes sense that minimization of the data term gives a solution with fundamentally different properties than the estimate produced by minimizing the regularization term. In this light it makes sense to treat the previous estimate differently in the two minimization problems. This also gives a good explanation for the widespread use and success of intermediate median filtering in duality based optical flow estimation (Wedel et al. 2009a, Sun et al. 2010).

Example 2.1.9 (Illumination and occlusion modeling). Illumination changes in image sequences present a major problem for conventional optical flow estimation. Changing light conditions from one frame to the next may render the data term completely unable to match objects in the scene. Another problem is the issue of occlusions, when an object (partly) disappears from one frame to the next. This will naturally lead to violations of the optical flow constraint.

In the TV-L1 optical flow setting, Chambolle & Pock (2011) proposed to model violations of the data term by adding a compensating term c(x) to the linearized optical flow constraint ρ(v)(x). Illumination changes are expected to affect the residual ρ(v)(x) similarly in connected regions, so Chambolle & Pock (2011) proposed to regularize c using total variation. This new illumination field may seamlessly be integrated into the algorithm in a similar fashion to what has been done so far: we split data and regularization of c using a quadratic term, and minimize iteratively.

A similar method has been used for occlusion detection by Ayvaci et al. (2012). A compensating term c is added to the data fidelity functional; however, since only occlusions are modeled, a sparsity enhancing L⁰ regularization is proposed. In the presented setup this gives the following data energy

$$E_1(v, c) = \lambda\int_T \|\rho(v)(x) - c(x)\|\,dx + \frac{1}{2\theta}\int_T \|v(x) - u(x)\|^2\,dx + \beta\,\|c\|_{L^0},$$

where the L⁰ norm is defined as

$$\|c\|_{L^0} = \int_T \|c(x)\|_{\ell^0}\,d\mu(x), \qquad \|y\|_{\ell^0} = \begin{cases} 0 & \text{if } y = 0 \\ 1 & \text{otherwise} \end{cases},$$

with μ denoting the Hausdorff measure. In order to minimize the full energy using specific solvers, Ayvaci et al. (2012) iteratively approximate the L⁰ term by weighted L¹ terms. This does however seem somewhat unnecessary, as a closed-form pointwise solution for c is easily found. In the duality based setting v may be calculated using Lemma 2.1.1, and the minimizer in c is given pointwise as

$$c(x) = \begin{cases} \rho(v)(x) & \text{if } \|\rho(v)(x)\| > \beta/\lambda \\ 0 & \text{otherwise} \end{cases}.$$

This means that we simply have a thresholding step: if the data residual ρ(v)(x) is too big, it is considered an occlusion, and motion in the area (which in principle is not defined) is fully determined by the regularization term.
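The pointwise minimization over c is a hard-thresholding step. A sketch in NumPy, assuming a weight lam on the L¹ data term and a weight beta on the L⁰ term (these symbol names are this sketch's assumption):

```python
import numpy as np

def occlusion_compensation(rho, lam, beta):
    """Pointwise minimizer of lam*|rho - c| + beta*||c||_0 over c.
    Residuals whose magnitude exceeds beta/lam are cheaper to explain as
    occlusion (pay beta once) than to fit (pay lam*|rho|), so c absorbs
    them; everywhere else c = 0 and the data term stays active."""
    return np.where(np.abs(rho) > beta / lam, rho, 0.0)
```

The occluded set is then simply {x : c(x) ≠ 0}, with no iterative reweighting required.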

2.2

Algorithm

This section describes a general algorithmic framework for estimating different types of duality based optical flows from energies of the form (2.3). As already mentioned, the duality based approach has good computational properties, because the two sub-energies may be minimized pointwise in parallel. This makes the algorithm perfectly suited for massively parallel processors.

The structure of the algorithm is depicted in Algorithm 2.1.

Data: Two images I₀ and I₁
Result: The optical flow field v
for ℓ = ℓ_max to 0 do                          // Pyramid levels
    Downsample the images I₀ and I₁ to current pyramid level
    for w = 0 to w_max do                      // Warping
        Compute v as the minimizer of E₁ (2.4)
        for i = 0 to i_max do                  // Inner iterations
            Compute u as the minimizer of E₂ (2.5)
        end
    end
    Upscale v and u to next pyramid level
end

Algorithm 2.1: Computation of duality based optical flow.
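A minimal sketch of the control flow of Algorithm 2.1. The E₁ and E₂ solvers are supplied as callables since they depend on the chosen data term and TV definition; all names here are illustrative, and the nearest-neighbour resize stands in for the linear interpolation of the actual implementation:

```python
import numpy as np

def resize(img, shape):
    """Nearest-neighbour resize; a stand-in for linear interpolation."""
    r = (np.arange(shape[0]) * img.shape[0] / shape[0]).astype(int)
    c = (np.arange(shape[1]) * img.shape[1] / shape[1]).astype(int)
    return img[np.ix_(r, c)]

def duality_based_flow(I0, I1, solve_E1, solve_E2,
                       levels=3, scale=0.5, n_warps=5, n_inner=10):
    """Skeleton of Algorithm 2.1: coarse-to-fine pyramid, with warping
    iterations alternating the data step (E1) and the regularization step
    (E2). solve_E1(J0, J1, u) -> v and solve_E2(v) -> u are supplied by
    the caller to match the chosen data term and TV definition."""
    shapes = [tuple(max(2, int(round(s * scale ** l))) for s in I0.shape)
              for l in range(levels, -1, -1)]          # coarse to fine
    u = np.zeros((2,) + shapes[0])
    for k, shape in enumerate(shapes):
        if k > 0:  # upscale the flow and rescale vector lengths
            u = np.stack([resize(u[0], shape), resize(u[1], shape)]) / scale
        J0, J1 = resize(I0, shape), resize(I1, shape)
        for _ in range(n_warps):                       # warping
            v = solve_E1(J0, J1, u)                    # data step, pointwise
            for _ in range(n_inner):                   # inner iterations
                u = solve_E2(v)                        # regularization step
    return v
```

Because both steps touch each pixel independently (or with only local neighbour access), every loop body maps directly onto a GPU kernel, which is what the CUDA C implementation mentioned below exploits.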

The standard settings of the algorithm are described in the following. Unless specifically mentioned, these are the settings used in the calculations described in the rest of this thesis.

Pyramid An image pyramid is built, where on each level, prior to downsampling to the next pyramid level, the images are smoothed with a Gaussian function of standard deviation σ. The downsampling is done by means of linear interpolation. Evaluation at non-pixel positions in images is done by bicubic interpolation.


Warping At the beginning of each warp, the image I₁ is warped according to the current estimate v. If median filtering is used to remove outliers (it is not in the standard setting), it is performed on v prior to warping of I₁.

Upscaling Flows are upscaled using linear interpolation, and their values are divided by the downscale factor of the pyramid, in order for vector lengths to match the current image size. This is followed by an application of a 3 ⇥ 3 median filter.

The standard parameters of the flow algorithm are given in Table 2.1.

Table 2.1: Standard parameters of the optical flow algorithm depicted in Algorithm 2.1

Parameter         value
ℓ_max             70
downscale factor  0.95
σ                 √2/4
w_max             90
i_max             20
λ                 50
θ                 0.2

The algorithm has been implemented in CUDA C in order to take advantage of the thousands of cores on modern GPUs.

2.3

Results

This section presents optical flow results for the TV-L1 algorithm using the implementation proposed in the previous section. In addition some of the extensions from Section 2.1.3 are considered and analyzed.

Choosing TV definition Example 2.1.7 introduced three different definitions of vectorial total variation. Using the algorithm described in the previous section, we can consider the difference in terms of accuracy of the resulting TV-L1 optical flow algorithm on the Middlebury Optical Flow Database training data (Baker et al. 2011). Figure 2.1 shows the average endpoint errors (AEEs) for the different definitions on the training sequences, as a function of λ. We see that generally the definitions that couple channels, TV_F and TV_J, perform better than the uncoupled regularization TV_S. Figure 2.2 shows the average performance over all test sequences, and we see an average difference in AEE of approximately 0.01 between using coupled or uncoupled regularization, for the respective optimal choices of λ. This may not seem like much, but since optical flow algorithms today attain such high accuracy, an average improvement of this order will give significantly different rankings of the algorithms on for example the Middlebury ranking (as of January 14, 2013 the average difference between the two top ranking methods in terms of AEE, MDP-Flow2 and NN-Field, is 0.0025). Considering the coupled methods there seems to be little difference in the optimal area, and both choices seem reasonable. Based on this analysis we choose the definition TV_J of Goldlücke et al. (2012) as our standard method, because of its nice theoretical properties.


Figure 2.1: Average endpoint errors for the three different definitions of total variation TV_S, TV_F, and TV_J on training data, plotted as a function of λ.

Comparison of methods With the algorithmic setup in place, we can evaluate the accuracy of the method. As a baseline method we consider the presented algorithm with standard parameters, using a one-dimensional data term, and TV_J for regularization. Table 2.2 shows results for this method, compared to results of the original algorithm by Zach et al. (2007), and an improved version of this algorithm presented by Wedel et al. (2009a), which among other things uses structure-texture decomposition of the input images to remove lighting artifacts, and intermediate median filtering. Finally we compare to the results presented in Rakêt et al. (2011), which are based on a color image data term solved by Proposition 2.1.4.

We see that despite its simplicity, the baseline method presented here gives slightly better average results than the ones of Wedel et al. (2009a) and


Table 2.2: Average endpoint error results for the Middlebury optical flow database training sequences for different variants of the TV-L1 optical flow algorithm. Baseline is the method following Algorithm 2.1 with standard parameters, using a 1D data term with grayscale images, and TV_J for regularization. The second method is the RGB algorithm presented in Rakêt et al. (2011). The third method is the original method of Zach et al. (2007), with results taken from Wedel et al. (2009a). The last column holds results of the so-called TV-L1-improved algorithm (Wedel et al. 2009a). An asterisk indicates the best result among the four.

              baseline  Rakêt et al.  Zach et al.  Wedel et al.
Dimetrodon      0.13       0.17         0.22         0.09*
Grove2          0.33       0.36         0.65         0.32*
Grove3          0.51       0.50*        1.07         0.63
Hydrangea       0.31       0.49         0.48         0.26*
RubberWhale     0.17       0.16*        0.26         0.19
Urban2          0.14*      0.15         0.19         0.15
Urban3          0.57*      0.57*        0.76         0.67
Venus           0.21       0.25         0.26         0.15*
average         0.296*     0.331        0.486        0.308

Figure 2.2: Average AEE computed across all sequences in Figure 2.1.

Rakêt et al. (2011). This evaluation is of course not optimal, since it is based on training data; however, it seems to be a general fact that carefully optimized implementations are of great importance when computing optical flow.

Figure 2.3: Frames 10 and 11 of the Army sequence of the Middlebury Optical

Flow Database test set.

Smoothness and visualization of optical flows Consider the two frames from the Army sequence in Figure 2.3. The ground truth motion represented in color coding can be found in Figure 2.4. In this representation hue codes the direction of the flow vectors, and saturation indicates the length of the vectors. Black corresponds to occluded areas.

Figure 2.5 shows the estimates produced with the baseline method for three different values of λ. We see that with the standard choice of λ = 50 we are able to properly estimate most details. Increasing λ produces a less regular flow with more noise artifacts, and decreasing it produces flows where small structures are blurred out. In contrast, substituting the data term with the interval data term presented in Example 2.1.6, and varying the size of the interval where no penalty is given, produces another type of smooth flows. With this data term small variations are ignored in the minimization, and the estimation is driven by big differences that may be found at for example edges. This is evident from the results, where edges are preserved quite well, while the interiors of objects are very smooth.

Figure 2.4: Ground truth optical flow between the two frames from Figure

2.3

.

Flow vectors are coded according to the color legend in the lower right corner.

λ = 10    λ = 50    λ = 100

Figure 2.5: Optical flow between the two frames from Figure

2.3

with baseline method.

λ = 100, −c₁ = c₂ = 0.02    λ = 100, −c₁ = c₂ = 0.01    λ = 100, −c₁ = c₂ = 0.005

Figure 2.6: Optical flow between the two frames from Figure

2.3

with interval data term (Example

2.1.6

), and everything else as in the baseline method.


Chapter 3

Applications

This chapter presents five novel applications of the optical flow methods reviewed so far. The first application considers registration of X-ray images of hands. The second application presents a specialized registration problem for

2D chromatograms, and introduces a novel solution strategy. The last three applications look into interpolation in image sequences. The first of these presents a general method where optical flow data terms are reparametrized to fit the interpolation assumptions, which turns out to produce superior results compared to conventional methods. The last two applications are related to distributed video coding. We consider interpolation when depth information of the scene is available, and finally we consider how to give new estimates of in-between frames using partially decoded information about the frame in question.

3.1

Registration of X-ray images

One of the major application areas of registration algorithms is medical imaging, where proper analysis of the data is often impossible without first registering the images. In this setting TV-L1 registration offers a number of advantages over typical methods: in particular the parallel nature of the algorithms, and the associated speed when implemented on massively parallel processors, but also the robustness of the L1 norms used in both data and regularization terms. This makes TV-L1 a good choice for out-of-the-box registration for a large body of problems. On the downside, TV-L1 does not necessarily produce diffeomorphic registrations, and in applications where this is of importance, one may either try to manually enforce this behavior, or consider methods for diffeomorphic registration (see for example Sommer et al. (2012) and references therein).

Furthermore, if images are taken on completely different imaging devices, and cannot easily be brought to similar scales, one will prefer a data term suited for these types of problems. One type of data term that can handle such problems is mutual information, which has recently been considered by Panin (2012). The TV-L1 optical flow method has previously been used for registration in Pock et al. (2007), where CT scans of lungs as well as brain MRI images were registered.

Consider the X-ray images of two different hands in Figure 3.1. Registering I₁ to I₀ is by no means a simple task: the two hands have significantly different bone structure; the contrast of the images and the amount of noise differ; and the two lead tags placed in the images represent information that should not be registered. This last issue may very well ruin the registration process for many types of methods. Undeterred by these facts, a registration using the standard settings with λ = 10 and one level of 3 × 3 median filtering has been performed. The registration results can be found in Figure 3.2. We get a quite decent registration, with the only big artifact being caused by the lead tag in I₁. However, this artifact only causes local changes, which do not propagate down to the hand, as can be seen in the deformation visualization. Note that the lead tag in I₀ does not cause any artifacts. This is because the corresponding area in I₁ is homogeneous, which means that the updates caused by the data term in this area are essentially zero (Lemma 2.1.1).

3.1.1

Registration of structural images

The problem of registering the images in Figure 3.1 is in many ways different from the optical flow problem applied to video sequences. In video data, the same objects occur in consecutive pairs of images, and we want to use the fine details in the images to get correct correspondences. In the case of images of two different objects, a true one-to-one correspondence does not exist, and trying to match fine details may just amount to noise matching, both in terms of actual noise and person specific structure that may from a larger perspective be considered as inter-personal serially correlated noise.

This means that one may benefit from filtering or regularization of the images, thereby enhancing dominant structures and removing fine details. Such an approach contrasts with the successful structure-texture decomposition used in several optical flow algorithms (Wedel et al. 2009a, Sun et al. 2010), where the structural part of the image obtained from regularization is subtracted from the original image. This produces an image that mainly contains texture details. The structures that are removed often include shadows and general illumination changes, which will in turn give better estimates.

Here we will consider two types of regularization: first, ROF regularization (Rudin et al. 1992), where we regularize the observed image I₀ using total variation

$$E_{\mathrm{ROF}}(I) = \int_T \|I(x) - I_0(x)\|^2\,dx + \lambda\int_T \|\nabla I(x)\|\,dx. \tag{3.1}$$

Furthermore, we consider a type of regularization that enhances sparsity even further, where the L¹ norm of the gradient is replaced by an L⁰ norm that penalizes all deviations from 0 equally

$$E_{L^0}(I) = \int_T \|I(x) - I_0(x)\|^2\,dx + \lambda\int_T \|\nabla I(x)\|_{\ell^0}\,d\mu(x), \tag{3.2}$$

where μ denotes the Hausdorff measure.

The ROF model may be minimized effectively using Lemma 2.1.2, and the minimizer of (3.2) may be computed using the method described in Xu et al. (2011). As opposed to many other types of regularization, these methods are edge preserving, and the L⁰ regularization of the gradient produces cartoon-like

I₀    I₁

Figure 3.1: X-ray images of two different hands. Sources are http://mrmackenzie.co.uk/2011/11/01/x-rays-in-medicine/ and http://images.suite101.com/460740_com_28_hand.jpg.

I₁ registered    Deformation of coordinate system    Difference between I₀ and registered I₁

Figure 3.2: Registration of I₁ to I₀ using the standard TV-L1 algorithm with λ = 10 and one level of 3 × 3 median filtering.

results with strong edges and large flat regions of zero gradient. The regularization results of these methods with respectively λ = 10 and λ = 200 can be found in Figure 3.3. The corresponding registrations, computed using the same parameters as before, are found in Figures 3.4 and 3.5. While the results look very similar to the ones found in Figure 3.2, we see that the results for the regularized images tend to deform the coordinate system outside the hand to a greater extent than the non-regularized case. The reason for this is that the gradients driving the registration are small or zero inside the bones after regularization. This is desirable from a practical point of view, since the purpose of these types of registrations is typically to align bones in a similar coordinate system, as opposed to bending and deforming the interior of bones to produce a good match. A somewhat similar effect can also be achieved by using the interval data term from Example 2.1.6 instead of the L¹ norm in the optical flow algorithm.


ROF regularized I₀    ROF regularized I₁

L⁰ regularized I₀    L⁰ regularized I₁

Figure 3.3: Results of ROF and L⁰ regularization on the images from Figure 3.1.

I₁ registered    Deformation of coordinate system    Difference between I₀ and registered I₁

Figure 3.4: Registration of I₁ to I₀ after ROF regularization using the standard TV-L1 algorithm with λ = 10 and one level of 3 × 3 median filtering.

I₁ registered    Deformation of coordinate system    Difference between I₀ and registered I₁

Figure 3.5: Registration of I₁ to I₀ after L⁰ regularization using the standard TV-L1 algorithm with λ = 10 and one level of 3 × 3 median filtering.

Figure 3.6: Example of a chromatogram along with the absorbance (A.U.) curves corresponding to two fixed wavelengths.

3.2

Registration of 2D chromatograms

Chromatography is a process for separating mixtures. One use of chromatography is measuring relative proportions of analytes in a number of mixtures, to determine differences. An example of a 2D chromatogram is shown in Figure 3.6. The chromatograms we are considering have been generated using ultra-high-performance liquid chromatography with diode-array detection (Petersen et al. 2011). The chromatograms consist of 209 wavelengths, each measured at 24,000 retention times. The subject of the analysis is rapeseed seedlings having been exposed to different levels of glyphosate (commonly known as Roundup®).

The images arising from this procedure will have shifts in retention time, but because of the experimental setup, no such shifts occur in the wavelength dimension. This means that we have a one-dimensional registration problem for a two-dimensional image.

Consider the single wavelength of four chromatograms shown in Figure 3.7. The retention time shifts are clearly visible. Furthermore there seems to be varying detector sensitivity, resulting in some of the curves consistently having higher peaks than others. Finally there are small variations that cannot be explained by the mentioned issues, and which can be ascribed to serially correlated effects and noise.

3.2.1

Registration algorithm

Given two chromatograms I₀, I₁ : T → ℝ of size k × n, where T = T_w × T_t, consider the problem of estimating the disparity v : T_t → T_t such that I₁(w, t + v(t)) is properly registered to I₀(w, t). Because of the varying detector sensitivity, a robust data term such as an L¹ norm is preferable.

the natural formulation of the data term is as a vector valued problem. Let

I i

(t) =

0

I i

(w

1

, t)

.

..

1

I i

(w k

, t)

28

Figure 3.7: A range of retention times for a single wavelength for four chromatograms.

The optical flow constraint may then be written as

$$I_1(t + v(t)) - I_0(t) = 0.$$

Linearizing this around a given estimate v⁰, we get the following system of equations

$$\underbrace{\partial_t I_1(t + v^0)}_{a}\, v(t) \underbrace{{}- \partial_t I_1(t + v^0)\,v^0 + I_1(t + v^0) - I_0(t)}_{b} = 0.$$

Considering an L¹ norm of this linearized data term, we see that case (ii) of Proposition 2.1.4 is very easily calculated; however, it seems unlikely that b ∈ Im a will ever hold for even a moderate number of wavelengths. This means that we will almost surely be in the less attractive case (i), where we have to minimize by some iterative procedure.

A novel alternative for registering this dataset has been described in Rakêt & Markussen (2014). The idea is to treat the one-dimensional vector valued registration problem as a two-dimensional problem, and couple the different vector channels through the regularization rather than through the data term. The method is generally applicable, and works by posing a d-dimensional registration problem with data taking values in a k-dimensional space as a registration problem with one-dimensional data values on a (d + 1)-dimensional domain. This is done by treating the vector channels as an added dimension to the domain. This way the regularization will be (d + 1)-dimensional, and by enforcing strong (or increasing) weight on the regularity across this new dimension, information is propagated between the different channels of the image to produce a registration that is homogeneous along the new dimension.

As described above, we start out by estimating disparities for each wavelength. In the given example we are interested in a robust L¹ norm for the data term. The robustness is important because of the varying detector sensitivity and serially correlated effects, where for example an L² norm may cause problems in relation to outliers. For regularization, we are interested in a term that, in addition to imposing regularity on the estimated disparities, regularizes across wavelengths. Since one must expect drifts in retention time to be continuous,

the registration should be smooth, and therefore we will regularize using squared gradient magnitude instead of total variation. The energy to be minimized looks as follows

$$E(v) = \lambda\int_T \|I_1(w, t + v_w(t)) - I_0(w, t)\|\,dw\,dt + \int_T \|\nabla_{w,t}\, v_w(t)\|^2\,dw\,dt.$$

This functional is minimized following the methods described in Chapter 2, where the data term is iteratively approximated by its first-order Taylor approximation around the given estimate v⁰_w,

$$\rho(v)(w, t) = \partial_t I_1(w, t + v^0_w(t))\,(v_w(t) - v^0_w(t)) + I_1(w, t + v^0_w(t)) - I_0(w, t).$$

Furthermore, data fidelity and regularization are decoupled by means of a quadratic proximity term

$$E(v, v^0) = \lambda\int_T \|\rho(v_w)(w, t)\|\,dw\,dt + \frac{1}{2\theta}\int_T \|v_w(t) - v^0_w(t)\|^2\,dt + \int_T \|\nabla_{w,t}\, v^0_w(t)\|^2\,dt,$$

where θ is sufficiently small. Using Proposition 2.1.4, the pointwise solution in v_w is found to be

$$v_w(t) = v^0_w(t) + \begin{cases} \lambda\theta\, \partial_t I_1(w, t + v^0_w(t)) & \text{if } \rho(v^0)(w,t) < -\lambda\theta\, |\partial_t I_1(w, t + v^0_w(t))|^2 \\[4pt] -\lambda\theta\, \partial_t I_1(w, t + v^0_w(t)) & \text{if } \rho(v^0)(w,t) > \lambda\theta\, |\partial_t I_1(w, t + v^0_w(t))|^2 \\[4pt] -\dfrac{\rho(v^0)(w,t)}{\partial_t I_1(w, t + v^0_w(t))} & \text{if } |\rho(v^0)(w,t)| \le \lambda\theta\, |\partial_t I_1(w, t + v^0_w(t))|^2 \end{cases}$$
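The thresholding above is applied independently at every (wavelength, retention time) sample, which is what makes the data step fast. A vectorized NumPy sketch, with illustrative names and the warped image and its retention-time derivative assumed precomputed:

```python
import numpy as np

def data_step(I0, I1w, dI1w, v0, lam, theta):
    """Pointwise thresholding solution in v_w for the linearized L1 data
    term, applied to all (wavelength, retention-time) samples at once.
    I0: target chromatogram; I1w: I1 warped by v0; dI1w: retention-time
    derivative of the warped I1; v0: current disparity broadcast to the
    same (k, n) shape. A sketch, not the thesis implementation."""
    rho = I1w - I0                          # residual rho(v0) at v = v0
    th = lam * theta * dI1w ** 2            # per-sample threshold
    shift = lam * theta * dI1w
    safe = np.where(dI1w == 0.0, 1.0, dI1w) # avoid division by zero
    return np.where(rho < -th, v0 + shift,
           np.where(rho > th, v0 - shift, v0 - rho / safe))
```

Samples with zero derivative get th = 0 and fall through to v0, matching the observation made earlier that homogeneous regions produce essentially no data-term updates.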

The problem in v^0_w is just a standard Tikhonov regularization problem, and can easily be solved using standard methods. E is minimized iteratively in a coarse-to-fine manner, where the input images and the corresponding disparities are gradually upsampled in the retention time dimension, while the wavelength dimension is kept at its original size. Following Algorithm 2.1 we use \ell_max = 160 and a scaling factor between levels of 0.97, yielding a downsampling factor at the coarsest level of approximately 130. w_max = 100 warps are performed at each level, \lambda was set to 60, and \theta was fixed at 0.1.
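The Tikhonov subproblem in v^0 amounts to one linear solve. A minimal 1D sketch for a single wavelength channel, solving (I + 2\theta D^T D)u = v with D the forward difference matrix (the function name and the dense solve are illustrative only; real code would use a banded or FFT-based solver):

```python
import numpy as np

def tikhonov_smooth(v, theta):
    """Solve min_u 1/(2*theta)*||u - v||^2 + ||D u||^2 for a 1D signal v,
    i.e. the normal equations (I + 2*theta*D^T D) u = v."""
    n = len(v)
    D = np.diff(np.eye(n), axis=0)        # (n-1, n) forward differences
    A = np.eye(n) + 2 * theta * D.T @ D
    return np.linalg.solve(A, v)
```

Constant signals are fixed points of the smoother (their gradient penalty is zero), while oscillating signals are damped.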

Figure 3.8 shows the individual wavelength registration curves (gray) of the described method, as well as the average (red), for two 2D chromatograms. The weighting of the wavelength dimension of the gradient in the Tikhonov regularization is respectively a factor 0 (i.e. registering each wavelength independently) and 10. As we can see, with the higher weight we are able to propagate information between wavelengths very well, and end up with a uniform result across wavelengths. Note in addition that the average curves are quite different in the two cases.

The average registration is used as the final single disparity v : T_t \to T_t. The registration was then done by warping the chromatograms according to v for each wavelength. The result of the registration procedure on the data in Figure 3.7 can be found in Figure 3.9. We see that the data is very well aligned after registration.


Figure 3.8: Registration curves of individual wavelengths (gray) with the average registration plotted on top (red), for independent and for dependent registration of wavelengths.

Figure 3.9: The chromatograms from Figure 3.7 registered along retention time.


3.3 Image interpolation with a symmetric optical flow constraint

Frame interpolation is the process of creating intermediate images in a sequence of known images. The process has many uses, for example video post-processing and restoration, temporal upsampling in HDTVs to enhance the viewing experience, as well as a number of more technical applications, for example in video coding (Girod et al. 2005, Huang et al. 2011).

In this section we will review optical flow based frame rate upsampling, which performs interpolation along the motion trajectories. In particular, we will review the method presented in Rakêt et al. (2012a), where the optical flow energy is reparametrized such that it fits better to the given problem. The reparametrized energy has a symmetric data fidelity term that uses both surrounding frames as references. We show that one can improve modern frame interpolation methods substantially by this powerful generic trick, which can be incorporated in existing schemes without major adaptations. We analyze the reparametrization, and show experimentally that it has a substantial effect on the stability and robustness of the interpolation process.

The idea of symmetrizing data matching terms to achieve better results has already established its usefulness in other areas. In image registration, Christensen & Johnson (2001) explored the benefit of penalizing inconsistency, by jointly estimating forward and backward transforms and requiring that they be inverses of one another. A similar idea was applied to the optical flow problem by Alvarez et al. (2007a), who imposed an additional consistency term. Later that same year, Alvarez et al. (2007b) proposed a reparametrization similar to the one derived here, in order to avoid a reference frame and thereby increase flow consistency. However, they did not use the obtained symmetric flow directly, but interpolated flow values at pixel positions of a reference image in order to obtain a flow comparable to the standard asymmetric flow. Recently, Chen (2012) used a symmetric data term for surface velocity estimation, noting the property that motion vector length is halved, which in turn gives better handling of large displacements.

Apart from being algorithmically different, the difference between the justification for the reparametrization given here and the justifications of Alvarez et al. (2007b) and Chen (2012) is that we have chosen the symmetric data fidelity term because it explicitly models the standard interpolation assumption, rather than improving some notion of consistency or better handling large displacements. In turn this also means that we may use the estimated flows directly on the unknown frame, and thereby avoid the problems related to temporal warping. As we will show, the mentioned benefits are clearly reflected in the results. It is demonstrated that using a symmetric flow for interpolation is generally better than using either forward or backward flows or both.


3.3.1 Motion Compensated Frame Interpolation

Given two images I_0 and I_1 and an estimate of the (forward) optical flow v_f, we are interested in estimating the in-between image I_{1/2} (the methods are easily extended to any in-between frame I_t, t \in (0, 1)). A simple approach is to assume that the motion vectors are linear through I_{1/2} and then fill in I_{1/2} using the computed flow. However, since v_f is of sub-pixel accuracy, the points x + \frac{1}{2}v_f(x) that are hit by the motion vectors are generally not pixel positions. This is often solved by warping the flow to the temporal position of the intermediate frame I_{1/2} (Baker et al. 2011, Herbst et al. 2009, Werlberger et al. 2011), in which one defines a new flow v_f^{1/2} from I_{1/2} to I_1,

v_f^{1/2}(\mathrm{round}(x + \tfrac{1}{2} v_f(x))) = \tfrac{1}{2} v_f(x),   (3.3)

where the round function rounds the argument to the nearest pixel position in the domain. There are some drawbacks to this approach. First, if the area around x in I_0 is occluded in I_1, there are likely multiple flow candidates assigned to the point \mathrm{round}(x + \frac{1}{2}v_f(x)). In the converse situation, i.e. dis-occlusion from I_0 to I_1, there may be pixels that are not hit by a flow vector, thus leaving holes in the flow. While the first problem can be solved by choosing the candidate vector with the best data fit, that is, the candidate v_f for which \|I_1(x + v_f(x)) - I_0(x)\| is smallest, the solution to the problem of dis-occlusions is not that simple. Here we will simply fill the holes in the flow field by an outside-in filling strategy.
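A minimal sketch of this forward warping of the flow, eq. (3.3), is given below. For brevity, conflicting candidates simply overwrite each other instead of being selected by data fit, and holes are filled with zeros rather than outside-in; the array conventions and the function name are illustrative:

```python
import numpy as np

def warp_flow_to_half(v_f):
    """Splat half the forward flow onto the pixel grid of the intermediate
    frame: v_half[round(x + v_f(x)/2)] = v_f(x)/2, cf. eq. (3.3)."""
    h, w, _ = v_f.shape
    v_half = np.full((h, w, 2), np.nan)            # nan marks holes
    ys, xs = np.mgrid[0:h, 0:w]
    ty = np.clip(np.round(ys + 0.5 * v_f[..., 0]).astype(int), 0, h - 1)
    tx = np.clip(np.round(xs + 0.5 * v_f[..., 1]).astype(int), 0, w - 1)
    v_half[ty, tx] = 0.5 * v_f                     # last writer wins
    return np.nan_to_num(v_half)                   # crude hole filling
```

A full implementation would keep, per target pixel, the candidate with the smallest data residual, and fill remaining holes from the outside in.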

With a dense flow we can then interpolate I_{1/2} using the forward scheme

I_{1/2}(x) = \frac{1}{2}\left( I_0(x - v_f^{1/2}(x)) + I_1(x + v_f^{1/2}(x)) \right),   (3.4)

or consider the backward flow v_b (i.e. the flow from I_1 to I_0) and use a backward scheme accordingly. We will in addition consider a bidirectional interpolation scheme where the frame is interpolated as the average of the frames obtained by the forward and backward schemes.
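The forward scheme (3.4) then amounts to sampling both surrounding images along the half-flow. A minimal sketch with nearest-neighbour sampling (the text uses sub-pixel accurate sampling; names are illustrative):

```python
import numpy as np

def interpolate_half(I0, I1, v_half):
    """Forward interpolation scheme: average of I0 sampled at x - v_half
    and I1 sampled at x + v_half (nearest-neighbour for simplicity)."""
    h, w = I0.shape
    ys, xs = np.mgrid[0:h, 0:w]
    y0 = np.clip(np.round(ys - v_half[..., 0]).astype(int), 0, h - 1)
    x0 = np.clip(np.round(xs - v_half[..., 1]).astype(int), 0, w - 1)
    y1 = np.clip(np.round(ys + v_half[..., 0]).astype(int), 0, h - 1)
    x1 = np.clip(np.round(xs + v_half[..., 1]).astype(int), 0, w - 1)
    return 0.5 * (I0[y0, x0] + I1[y1, x1])
```

The backward and bidirectional schemes follow by exchanging the roles of the two frames and averaging the two results.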

One can sophisticate the interpolation methods by estimating occluded regions and selectively interpolating from the correct frame. We will not pursue any occlusion reasoning here, but refer to Herbst et al. (2009) and Stich et al. (2008) for details.

3.3.2 Reparametrizing Optical Flow for Interpolation

The approach presented in the previous section is the standard procedure for frame interpolation, and serves as the backbone of many algorithms (Baker et al. 2011, Huang et al. 2011, Keller et al. 2010, Werlberger et al. 2011). In this section we will reparametrize the original energy functional so that the recovered flow is better suited for interpolation purposes. The reparametrization turns out to be beneficial on a number of levels: it makes the temporal warping of the flow superfluous, eliminates the need to calculate flows in both directions, improves the handling of large motion, and increases overall robustness.

The original optical flow energy functional takes as its argument an optical flow v that is defined on a continuous domain. In practice, however, we only observe images at discrete pixels, and the optical flow is typically only estimated at the points corresponding to the pixels in I_0. Since we assume that the intermediate frame I_{1/2} can be obtained by linearly following the flow vectors, we propose to reparametrize the data fidelity functional of the TV-L^1 optical flow energy using this assumption, so that it is given as

\frac{1}{2} \int_T \|I_1(x + v_s(x)) - I_0(x - v_s(x))\| \,dx.   (3.5)

We note that in this parametrization the coordinates of the optical flow match those of the intermediate frame I_{1/2}, and using this data term will thus eliminate the need for warping of the flow, since interpolation can be done directly, similarly to (3.4). Because the motion vectors of the symmetric flow v_s are only half as long as those of the forward or backward flows, we need to halve the corresponding \lambda to keep the comparison fair, which is the reason for the factor 1/2.

Linearizing the data matching term (3.5) around v^0 gives

\rho(v_s) = I_1(\cdot + v^0) - I_0(\cdot - v^0) + \left(J_{I_1}(\cdot + v^0) + J_{I_0}(\cdot - v^0)\right)(v_s - v^0),   (3.6)

which is of the form (2.14). For grayscale images the corresponding split energy term (2.8) is easily minimized using Lemma 2.1.1, and in general using the L^1-L^2 minimization described in Proposition 2.1.4.

The differences between (3.6) and the conventional linearization are that we now allow sub-pixel matching in both surrounding images, and that instead of a single Jacobian we have a sum of two. Thinking of this linearization as a finite difference scheme corresponding to a linearized differential form of the data fidelity term (Horn & Schunck 1981), we see that the temporal derivative is represented by a central finite difference scheme, as opposed to the typical forward differences. In addition, the sum of the two Jacobians should make the estimation procedure more robust to noise, as the noise amplification caused by derivative estimation is now averaged over two frames, a fact that has previously been used heuristically to improve accuracy in asymmetric flow estimation (Wedel et al. 2009a). Finally, we note that the motion vectors will only have half the length of the ones obtained from the regular parametrization. This makes the method better suited to handle large displacements compared to traditional methods that only make use of a one-sided linearization.
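For 1D grayscale signals, the symmetric residual (3.6) can be sketched as follows, with np.interp standing in for sub-pixel warping (names are illustrative):

```python
import numpy as np

def symmetric_residual(I0, I1, v0, v):
    """Linearized symmetric data term for 1D signals:
    rho = I1(x + v0) - I0(x - v0) + (I1'(x + v0) + I0'(x - v0)) * (v - v0)."""
    x = np.arange(len(I0), dtype=float)
    dI0, dI1 = np.gradient(I0), np.gradient(I1)
    warped1 = np.interp(x + v0, x, I1)    # I1 warped by +v0
    warped0 = np.interp(x - v0, x, I0)    # I0 warped by -v0
    deriv = np.interp(x + v0, x, dI1) + np.interp(x - v0, x, dI0)
    return warped1 - warped0 + deriv * (v - v0)
```

Note the sum of the two derivative terms: for identical linear ramps the effective slope is twice the single-frame slope, which is exactly the central-difference behavior described above.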

3.3.3 Results

Motion compensated frame interpolation finds many uses, ranging from the more technical applications such as video coding (Girod et al. 2005, Huang et al. 2011, Luong et al. 2012) to disciplines like improving the viewing experience (Keller et al. 2010) or restoration of historic material (Werlberger et al. 2011). For the former type of application the reconstruction quality in terms of quantitative measures is of great importance. For the latter types it is hard to devise specific measures of quality, as the human visual system is very tolerant to some types of errors, while it instantly notices other types.

For the results presented in the following, the optical flows have been computed using the algorithm illustrated in Algorithm 2.1 with standard parameters, except that on each level w_max = 60 warps are performed with i_max = 5 inner iterations, using the total variation TV_J.


Figure 3.10: Performance for varying \lambda on the four High-speed camera training sequences (Beanbags, DogDance, MiniCooper, Walking) from the Middlebury Optical Flow benchmark, shown for the backward, bidirectional, forward, and symmetric methods.

As a first experiment we compare the four different types of interpolation suggested in the previous sections on the four High-speed camera training sequences of the Middlebury Optical Flow benchmark (Baker et al. 2011). Figure 3.10 shows the effect of varying the data term weight \lambda in terms of the mean absolute interpolation error (MAIE). We see that the symmetric flow outperforms the conventional approaches, and that it is typically less sensitive to the choice of \lambda. In particular we see that the difficult Beanbags sequence, which contains large displacements, is handled much better by the symmetric scheme. By evaluation on the Middlebury training set it was found that \lambda = 35 gave the best overall performance for the symmetric flow, and that \lambda = 20 gave the best performance for the other three methods. These values will be used in the rest of the experiments presented in this section.

Consider as a second example the results of interpolation under noise presented in Figure 3.11. This figure shows the interpolation error of the four methods on the Beanbags sequence with additive N(0, \sigma^2) noise. The improved robustness of the symmetric interpolation method is clearly visible from the distances between its MAIEs and those of the asymmetric methods, which increase as the standard deviation of the noise increases. In addition we see that the variance of the MAIEs across the independent replications is significantly lower for the symmetric method than for the three other methods.

Now consider, as a third example, the frames given in Figure 3.12. The sequence has large displacements (> 35 pixels) and severe deformations, which makes the estimation of I_{1/2} very difficult. Figure 3.13 shows the three different flows v_f, v_b and v_s along with the corresponding interpolated frames. Zoom-ins of details can be found in Figure 3.14. We see that the result generated by the


Figure 3.11: Mean square interpolation error performance under additive N(0, \sigma^2) noise for varying \sigma, for the backward, bidirectional, forward, and symmetric methods. Results are for the Beanbags sequence, and are based on 10 independent replications.

symmetric flow is visually more pleasing than the ones produced by the forward and backward flows, a fact that is also clearly reflected in the MAIEs and root mean square interpolation errors (RMSIE).

Finally, let us compare the method to some methods of the current state-of-the-art. Table 3.1 holds the RMSIEs for six sequences from the Middlebury Optical Flow benchmark and results for a number of methods. While the results cannot fully match those of Stich et al. (2008), which gives significantly better results on 3 of the sequences, our method outperforms all other approaches, including the recent and much more complex methods of Chen & Lorenz (2011) and Werlberger et al. (2011).

Real-time performance. In the presented setup we only have to compute a single flow field between two images and fill in the intermediate frame from the trajectories. The runtime of the interpolation is dominated by the time it takes to compute the flow field, and at a slight cost in accuracy (5 pyramid levels with a scale factor of 2, 30 warps per level, and 1 level of median filtering) the flow fields can be computed in real time (~35 fps) for 640 x 480 images using an NVIDIA Tesla C2050 GPU. This in turn means that we can do real-time frame doubling of 30 fps video footage at a resolution of 640 x 480 pixels.

Figure 3.12: Frames 7 (I_0), 10 (I_{1/2}) and 13 (I_1) of the Mequon sequence.


Figure 3.13: Results for the Mequon sequence. Top row: color coded optical flows; bottom row: interpolation results (forward: MAIE 8.72, RMSIE 20.30; backward: MAIE 8.85, RMSIE 20.09; symmetric: MAIE 8.23, RMSIE 19.02). Zoom-ins of details can be found in Figure 3.14.

Figure 3.14: Details of the interpolated Mequon frame from Figure 3.13 (ground truth, forward, backward, symmetric).


Table 3.1: RMSIE for different Middlebury sequences. Bold indicates the best result. ‡ Results are taken from Stich et al. (2008). † Marked algorithms have not been implemented by their respective authors, but are based on alternative implementations on the Middlebury Optical Flow database.

Method                      Dimetrodon
Symmetric TV-L^1            1.93
Chen & Lorenz (2011)        1.95
Werlberger et al. (2011)    1.93
Stich et al. (2008)         1.78
Bruhn et al. (2005)†,‡      2.59
Zitnick et al. (2005)       3.06
Black & Anandan (1996)      2.56
Pyramid Lukas-Kanade†,‡     2.49

Method                      MiniCooper   Walking
Symmetric TV-L^1            3.96         2.89
Werlberger et al. (2011)    4.55         3.97


Figure 3.15: Frame 90 of the Breakdancers sequence from Microsoft Research, with its corresponding depth map.

3.4 Image interpolation using depth data

In recent years, the 3DTV and 2D-plus-depth formats have seen increasing popularity. While conventional methods for frame interpolation usually work on a pair of outer images similar to the one to be interpolated, these hybrid formats offer some new challenges and opportunities.

Distributed video coding (Girod et al. 2005) provides an interesting application of frame interpolation, and variations of both standard and symmetric optical flow based methods have been used in this area (Huang et al. 2011, Rakêt et al. 2012b). In distributed video coding, a set of key frames is coded using conventional coding, and intermediate frames are coded using Wyner-Ziv coding (Aaron et al. 2002). From the point of view of this application, the key frames can be considered given, and in order to decode the intermediate Wyner-Ziv frames, estimates to be used as side information in the decoding process must be generated. For the 2D-plus-depth format, one may code the depth map very efficiently (Zamarin & Forchhammer 2012), and use the depth frames in addition to the given key frames. This is beneficial, since depth frames usually contain most of the information needed for proper motion estimation, as can be seen in Figure 3.15. The following sections describe a scheme that was devised for Salmistraro, Rakêt, Zamarin, Ukhanova & Forchhammer (2013). The converse scheme, where texture frames are used to model depth movement, has been investigated in Salmistraro, Zamarin, Rakêt & Forchhammer (2013).

3.4.1 Optical flow computation using brightness and depth

Denote by I_{t-1}, I_{t+1} the two (brightness) key frames, which we want to use for interpolating the intermediate frame I_t. Furthermore, let two depth frames D_t and D_{t+1} be given. We can generalize the approach presented in Section 3.3, so that in addition to the symmetric data term (3.5) we also include an asymmetric

term for the depth frames to the energy

E(v) = \lambda_1 \int_T \|I_{t+1}(x + v(x)) - I_{t-1}(x - v(x))\| \,dx + \lambda_2 \int_T \|D_{t+1}(x + v(x)) - D_t(x)\| \,dx + \int_T \|Jv(x)\| \,dx.   (3.7)

With two data terms, this energy does not fit into any of the methods described so far unless \lambda_1 = 0 or \lambda_2 = 0. For a sum of two L^1 norms, the solution may however be found explicitly, by means of the results in Wedel et al. (2008).

The advantage of the full formulation (3.7), as opposed to a purely symmetric data term (\lambda_2 = 0), is that we have a smaller temporal gap for the depth data, which means that we may recover some of the non-linear motion between the two key frames. In addition, this smaller gap produces a better estimate, as the displacements are smaller. The outer key frames may then help to get the apparent motion right where the depth frames do not supply enough information. This may for example be texture, and shadows that do not show up in the depth maps.

3.4.2 Interpolation

When interpolating a frame in an image sequence, we are interested in using information from both surrounding frames. Thus we compute the forward flow v_f as described in the previous section, and a backward flow v_b, where we interchange I_{t+1} and I_{t-1} and replace D_{t+1} with D_{t-1} in (3.7).

The resulting flows are asymmetric because of the depth information, and thus a symmetric interpolation such as (3.4) should not be used. Instead we simply interpolate by

I_t(x) = \frac{1}{2}\left( I_{t-1}(x + v_b(x)) + I_{t+1}(x + v_f(x)) \right),

where the sub-pixel locations are evaluated using bicubic interpolation.

3.4.3 Results

With the energy (3.7) it is natural to consider three distinct cases. These will be denoted T2T (\lambda_2 = 0), D2T (\lambda_1 = 0), and DT2T (\lambda_1 \neq 0, \lambda_2 \neq 0). The motion estimates are recovered following Algorithm 2.1 with standard settings, except that \ell_max = 65, i_max = 10, and \theta = 0.5 for T2T and DT2T, while \theta = 0.35 for D2T. For the T2T method \lambda_1 was set to 40, for D2T \lambda_2 = 30, and \lambda_1 = 5, \lambda_2 = 40 for DT2T.

We evaluate the method on the sequences Breakdancers and Ballet from Microsoft Research (Zitnick et al. 2004), and Dancer from Nokia Research. We use the central view of the three sequences, at 15 fps, downsampled to CIF resolution.

In Table 3.2 we see that the symmetric interpolation T2T gives the worst results, and that D2T, which only uses depth, improves the average interpolation quality by 1.4 dB. Combining both the symmetric and depth information in DT2T, we gain an additional 0.8 dB.

Table 3.2: Average peak signal-to-noise ratio of interpolated frames for the three different methods for the first 100 frames of the sequences.

Method   Ballet   Breakdancers   Dancer
T2T      34.7     27.5           30.5
D2T      37.0     28.5           31.4
DT2T     38.0     29.0           32.3

Figure 3.16 shows an interesting example from the Dancer sequence. In this example there are large movements of shadows that are not visible in the depth images. The optical flows for the three methods, along with the interpolated frames, can be found in Figure 3.17. We see that the shadow on the wall is interpolated quite well by the T2T method, while the movement of the dancer is less well modeled due to the large temporal distance between the key frames. For the D2T method, the movements of the dancer are well captured, but no movement is identified in the wall area, and the interpolation is just the average of the outer frames. The DT2T method identifies both the movement of the dancer and the shadows, and gives a better overall result than the two other methods.

Figure 3.16: Frames 94, 95, and 96 and corresponding depth maps of the Dancer sequence from Nokia Research.


Figure 3.17: Estimated forward (top) and backward (middle) flow fields, along with the interpolation results (bottom) for T2T (PSNR 26.1), D2T (PSNR 27.1), and DT2T (PSNR 29.3).


3.5 Image interpolation using partially decoded frames

In this section we will consider the refinement of motion estimation and interpolation in a distributed video coding setup. We use transform domain coding with a discrete cosine transform (DCT) like transform. In this setup every decoded bit plane produces affine constraints on the frame to be decoded, which can be used to refine the estimate of that frame. This section is based on methods developed for the codec described in Luong et al. (2013).

3.5.1 Initial frame interpolation

For the initial interpolation we are in the setting of Section 3.3. We have two images I_{t-1} and I_{t+1}. The interpolation is done following Rakêt et al. (2012b), where the forward, backward and symmetric interpolations presented in Section 3.3 are computed, and the final result is taken to be the average of the three.

Figure 3.18: Frames 84, 85, and 86 of the Soccer sequence.

Figure 3.19: Forward (PSNR 20.4), backward (PSNR 20.3), and symmetric (PSNR 21.5) interpolation results and their average (PSNR 21.2), for frame 84 of the Soccer sequence.

A specialized coarse-to-fine pyramidal implementation of the above algorithm is used. Following Algorithm 2.1, we make the following modifications to the standard settings: on each level we perform w_max = 30 warps using i_max = 10 iterations of the BM algorithm of Goldlücke (2012). In order to improve interpolation quality, \rho has been weighted by the gradient magnitude \|\nabla I_1(x + v^0)\| + 0.01 (slightly shifted to avoid division by 0) in the minimization of (2.4) (Zimmer et al. 2011), which allows for more even step sizes in the estimation. With this modification, \lambda was set to 3.

All in all this produces a more robust flow for interpolation, and combining the symmetric flow with the warped forward and backward flows, we propose to do the interpolation as follows:

I_{1/2}(x) = \frac{1}{6}\big( I_1(x + v_f^{1/2}(x)) + I_1(x - v_b^{1/2}(x)) + I_1(x + v_s(x)) + I_0(x - v_f^{1/2}(x)) + I_0(x + v_b^{1/2}(x)) + I_0(x - v_s(x)) \big),   (3.8)

so the interpolation is the average of the two surrounding frames warped to the center using the three different flows.

We will evaluate (3.8), which we will denote 3OF, on the test sequences (QCIF, 15 fps) Coastguard QP=26, Foreman QP=25, Hall QP=24 and Soccer QP=25, where we interpolate every other frame and compare to the overlapped block motion compensation (OBMC) method from Huang & Forchhammer (2005) and the TV-L^1 optical flow (OF) method from Huang et al. (2011). The results can be found in Table 3.3, and it can be seen that the proposed method outperforms OBMC and OF on all sequences, with an average increase in PSNR of 1.26 dB over OBMC and 2.25 dB over OF.

Table 3.3: Average PSNR across the 74 interpolated frames for the four test sequences.

Sequence     OBMC    OF      3OF
Coastguard   31.83   30.92   32.71
Foreman      29.26   29.28   30.19
Hall         36.46   32.28   37.25
Soccer       21.30   22.43   23.75

With initial estimates produced by 3OF we may decode the so-called Wyner-Ziv frames. Decoding is done one bit plane at a time, going from the most significant to the least significant bit in the transform domain. In the next section we consider how one may link this information to the frame to be decoded in the pixel domain.

3.5.2 Upsampling images from DCT coefficients

The frames to be decoded have been transformed using the following DCT-like transform on every 4 x 4 image patch I:

\mathrm{DCT}_0(I) = (C I C^\top) \circ E,

where \circ denotes pointwise multiplication, and

C = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{pmatrix}, \qquad E = \begin{pmatrix} 1/4 & 1/\sqrt{10} & 1/4 & 1/\sqrt{10} \\ 1/\sqrt{10} & 2/5 & 1/\sqrt{10} & 2/5 \\ 1/4 & 1/\sqrt{10} & 1/4 & 1/\sqrt{10} \\ 1/\sqrt{10} & 2/5 & 1/\sqrt{10} & 2/5 \end{pmatrix}.

We see that the DCT coefficient at a single point (i, j) can be written as

E_{ij}\,(C_i \otimes C_j)\,I_v,

where C_i denotes the ith row of C and I_v is the vectorization

I_v = (I_{00}, I_{10}, \ldots, I_{33})^\top.
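The relation between the matrix form and the per-coefficient row vectors can be checked numerically. A minimal sketch, where the column-major flattening matches the vectorization I_v = (I_00, I_10, ..., I_33)^T, and the Kronecker factors are ordered accordingly (function names are illustrative):

```python
import numpy as np

C = np.array([[1, 1, 1, 1],
              [2, 1, -1, -2],
              [1, -1, -1, 1],
              [1, -2, 2, -1]], dtype=float)
s = 1 / np.sqrt(10)
E = np.array([[0.25, s, 0.25, s],
              [s, 0.4, s, 0.4],
              [0.25, s, 0.25, s],
              [s, 0.4, s, 0.4]])

def dct0(I):
    """DCT0(I) = (C I C^T) ∘ E on a 4x4 patch."""
    return (C @ I @ C.T) * E

def coefficient_vector(i, j):
    """Row vector a with a @ I.flatten(order='F') == dct0(I)[i, j]."""
    return E[i, j] * np.kron(C[j], C[i])
```

For a constant patch of ones the DC coefficient is 4, i.e. four times the patch mean, which is what makes the midpoint reconstruction in Example 3.5.1 come out as the average.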

Decoding is done bit plane wise for the image in question, and gives a set of intervals d_1 \pm c_1, \ldots, d_k \pm c_k (with corresponding DCT vectors a_1, \ldots, a_k) wherein the DCT coefficients lie. To reconstruct a 4 x 4 image patch I we can use the fact that the vectorized patch should fulfill

A I_v \in [d_1 - c_1, d_1 + c_1] \times \cdots \times [d_k - c_k, d_k + c_k], \qquad A = \begin{pmatrix} a_1 \\ \vdots \\ a_k \end{pmatrix}.

A simple solution for reconstructing a 4 x 4 patch consists in ignoring the interval widths, and computing the estimate \tilde{I}_v as the solution to

A \tilde{I}_v = \begin{pmatrix} d_1 \\ \vdots \\ d_k \end{pmatrix}

with the least Euclidean norm, which is simply given by applying the Moore-Penrose pseudoinverse to the midpoint vector:

\tilde{I}_v = A^+ \begin{pmatrix} d_1 \\ \vdots \\ d_k \end{pmatrix}.

Example 3.5.1. Suppose only the DC coefficient d_1 = \mathrm{DCT}_0(I)(0,0) is given. Then A = \frac{1}{4}(1 \cdots 1), A^+ = \frac{1}{4}(1 \cdots 1)^\top, and \tilde{I}_v = A^+ d_1 = \frac{d_1}{4}(1 \cdots 1)^\top; in other words, we fill in the patch in pixel domain by its average.
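The minimum-norm reconstruction is a single pseudoinverse application. A minimal sketch reproducing the DC example (the function name is illustrative):

```python
import numpy as np

def reconstruct_patch(A, d):
    """Minimum Euclidean norm solution of A I_v = d (interval midpoints)."""
    return np.linalg.pinv(A) @ d

# DC only: A is the scaled row of ones; with d1 = 4 the patch is
# filled with its average d1/4 = 1.
A = np.full((1, 16), 0.25)
patch = reconstruct_patch(A, np.array([4.0]))
```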

Example 3.5.2. Given an estimate I_v subject to the constraint specified by a vector a, we can first check whether the estimate is admissible, i.e. whether |a^\top I_v - d| \le c. If so, we do nothing. When this is not the case, we project the solution onto the planar strip given by the constraint. Assuming that a^\top I_v - d > c, we want to compute the orthogonal projection of I_v onto \{x : a^\top x - d = c\}, which is given by

I_v - \frac{a^\top I_v - (d + c)}{a^\top a}\, a.
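This projection can be sketched directly; clipping the residual handles both sides of the strip, as well as the admissible case, at once (the function name is illustrative):

```python
import numpy as np

def project_to_strip(Iv, a, d, c):
    """Orthogonal projection of Iv onto the strip {x : |a @ x - d| <= c}."""
    r = a @ Iv - d
    r_clipped = np.clip(r, -c, c)     # admissible part of the residual
    return Iv - (r - r_clipped) / (a @ a) * a
```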


Upsampling using global regularization. Consider now the case of an entire image I, with 4 x 4 patches I_1, \ldots, I_n and corresponding DCT intervals D_1, \ldots, D_n. We can consider the problem of reconstructing all patches subject to a global roughness penalty. This means that the reconstructed patches interact to give a result that is regular across the entire image. In addition, the regularization makes the problem well-posed. A choice of regularization that is both simple and powerful is total variation. The problem may then be formulated as recovering I as the minimizer of

\int_T \|\nabla I(x)\| \,dx \quad \text{such that} \quad A I_i \in D_i \ \text{for all } i.   (3.9)

Direct minimization of this problem is not feasible, and instead we propose the following procedure. Starting out from an initial solution, we find a nearby solution with better regularity properties, using the algorithm of Chambolle (2004). All 4 x 4 blocks of the resulting regularized solution are projected onto the set of admissible solutions, and using this new initial solution we repeat the procedure. In short, the algorithm is simply an iterative minimization of the energy

E(I) = \int \|I(x) - I^0(x)\|^2 \,dx + \lambda \int \|\nabla I(x)\| \,dx,   (3.10)

where I^0 is the orthogonal projection of the current solution onto the set of admissible solutions given by the constraints A I_i \in D_i. The orthogonal projection is computed using an alternating projections method with the result from Example 3.5.2.

3.5.2

As previously mentioned, the decoding of a frame is done from most significant bit plane to least significant bit plane. In the following we will denote the corresponding constraints on the DCT coefficients of the decoded bands by DC,

AC1, AC2, etc., and use data from decoding of the codec presented in

Luong et al.

( 2013 ).

Figure 3.20 shows an example of how to convert the DC to an estimate of the unknown image in pixel domain. The particular structure of the DC allows for easy manipulation to create an initial estimate of the frame in question. Using only the midpoints of the given intervals, the solution with least Euclidean norm (Example 3.5.1) is a blocky nearest-neighbor-like interpolation of the image. A smoother estimate may be found by using bicubic interpolation on the same midpoints, which adds some global regularity to the estimate, and finally the quality of this solution may in general be improved by projecting the bicubic solution onto the space of admissible solutions given by the known coefficients. This final estimate will be used as the initial guess when minimizing (3.10).

Figure 3.21 shows the results of the presented algorithm with an increasing number of decoded bands. For the algorithm we have iterated (3.10) 5000 times with \lambda = 50, and the projection onto the set of admissible solutions is done by 20 iterations of alternating projections over all individual constraints. While the characteristics of total variation smoothing are clearly visible, we see that already at four decoded bands (out of 16 in total), the estimates look quite decent.


Figure 3.20: Frame 81 of the Soccer sequence (ground truth) and reconstructions of the frame based on the DC: upscaled DC midpoint (PSNR 24.4), bicubic upscaling (PSNR 24.5), and projected bicubic upscaling (PSNR 25.7).

Figure 3.21: Total variation upscaling of frame 81 of the Soccer sequence using an increasing number of decoded bands: DC (PSNR 26.3), AC1 (PSNR 28.2), AC2 (PSNR 30.4), AC3 (PSNR 31.1).

3.5.3 Motion reestimation and interpolation

The upsampling method described above provides a link that allows one to use the information contained in decoded bands in the transform domain for full resolution images in the pixel domain. This new information is different from the depth maps used in the interpolation process in Section 3.4, in the sense that it is just an overly smooth estimate of the frame to be interpolated. The goal is then to fill in the fine details from the outer frames onto this smooth estimate. It turns out that this detail mapping may be done remarkably well by using a slightly modified TV-L^1 optical flow algorithm. With an estimate \hat{I}_t of the frame in question, produced by the total variation minimization proposed in the previous section, motion can be estimated directly between the estimate and the corresponding key frames I_{t-1}, I_{t+1}. This reduction of the temporal gap may increase the accuracy of the motion estimation, but more importantly, it eliminates the need for the assumption that motion is linear in-between key frames. The main difficulty is that the estimated frame may be a very coarse approximation of the real solution, in particular when only the DC coefficients have been decoded. To address this, we use a specialized smoothing strategy prior to downsampling the images in the image pyramid. At pyramid level \ell, the Gaussian smoothing compared to the full resolution images has standard deviation 0.5 \cdot \eta^{\max(\ell, \ell_0)}, where \eta is the downsampling factor between pyramid levels and \ell_0 is some given level. This means that from level \ell_0 and down we will have a fixed total standard deviation compared to the full resolution images, and thus smooth out the finer details at these levels. This smoothing makes it possible to properly estimate motion to the generated estimates. Apart from this specialized downsampling

and smoothing strategy, we use Algorithm

2.1

with i max standard parameters. Table

3.4

gives the levels `

0 and

= 15 and otherwise values used for the di↵erent number of decoded bands.
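In code, the level-dependent presmoothing rule reads as follows. The formula σ_ℓ = 0.5 · η^(−max(ℓ, ℓ₀)) and the symbol names are my reading of the text above; the per-level scale factor η = 0.95 is an assumed illustrative value.

```python
def pyramid_sigma(level, l0, eta=0.95):
    """Total Gaussian standard deviation, relative to the full
    resolution image, applied before downsampling to `level`.
    For levels finer than l0 the std is clamped to its value at l0,
    so fine details that the coarse transform-domain estimate cannot
    match are deliberately blurred away."""
    return 0.5 * eta ** (-max(level, l0))
```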

Figure 3.22 shows the results of this procedure, where details from the key frames are mapped onto the estimates from Figure 3.21.
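The detail mapping itself amounts to warping a key frame along the estimated flow onto the smooth estimate. A minimal sketch of such a backward warp (hypothetical helper, not the thesis code; the flow is assumed stored as per-pixel (dy, dx) displacements):

```python
import numpy as np
from scipy.ndimage import map_coordinates

def warp_details(key_frame, flow):
    """Backward-warp a key frame along an estimated flow field, so its
    fine details land on the grid of the smooth frame estimate.
    flow has shape (2, H, W): per-pixel (dy, dx) displacements."""
    H, W = key_frame.shape
    yy, xx = np.mgrid[0:H, 0:W].astype(np.float64)
    # Sample the key frame at the displaced positions (bilinear)
    coords = np.stack([yy + flow[0], xx + flow[1]])
    return map_coordinates(key_frame, coords, order=1, mode='nearest')
```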

Table 3.4: Parameters used for reestimation with varying number of decoded bands.

Highest decoded band    ℓ₀    λ
DC                      25    15
AC1                     15    15
AC2                     15    40
AC3                     15    60

[Figure: four panels: DC (PSNR 27.7); AC1 (PSNR 30.2); AC2 (PSNR 32.2); AC3 (PSNR 32.6).]

Figure 3.22: Details mapped to the estimates from Figure 3.21 using optical flow from surrounding frames.

The average PSNR values for the four test sequences can be found in Table 3.5. We see that results after reestimation are significantly better for the dynamic sequences Foreman and Soccer, while the initial estimates provide better results for the almost static Coastguard and Hall sequences. The parameters in Table 3.4 have been chosen to achieve this, since a proper multi-hypothesis decoding scheme (Huang et al. 2011) may be used to fuse the best parts of different estimates into a single superior estimate. Dynamic sequences generally present a problem in distributed video coding with respect to side information generation (Huang et al. 2011, Rakêt et al. 2012b), which means that improvements such as those seen in Table 3.5 will in the end be the ones that deliver the main bitrate savings in the coding process, compared to conventional methods.

Table 3.5: Average PSNR across the 74 interpolated frames for the four test sequences.

Sequence     Initial   DC      AC1     AC2     AC3
Coastguard   32.71     27.74   27.78   33.01   34.02
Foreman      30.19     33.74   34.69   36.04   36.61
Hall         37.25     31.43   33.19   36.22   36.67
Soccer       23.75     29.96   31.49   34.23   34.93


Chapter 4

Conclusions and future directions

This thesis has described the duality based TV-L1 optical flow method, and variations hereof. Theoretical work on generalizing the original formulation of Zach et al. (2007) has been presented, and a highly optimized algorithmic setup was described. In addition, five novel applications were given: we considered registration of medical images and 2D chromatograms, as well as three examples of frame interpolation in video sequences.

It has been demonstrated that the TV-L1 optical flow algorithm is able to produce good results on benchmark data, and that the robustness of the formulation allows it to be used successfully in a wide range of applications. This robustness is a very important feature of the algorithm, as small benchmark datasets have a tendency to pull development toward algorithms that work mainly on the specific examples available (Austvoll 2005). While the Middlebury benchmark was built as a response to optical flow algorithms typically only being evaluated on (and overfitted to) the Yosemite and Tree sequences, the examples available are hardly realistic situations, and good performance on this benchmark does not guarantee good performance in other applications. Recently, new benchmark data for optical flow evaluation have been presented (Geiger et al. 2012, Butler et al. 2012), and hopefully these data will help give rise to new robust optical flow methods.

The most obvious direction of future research is to properly investigate the different extensions presented in Chapter 2, consider how they interact, and then use this knowledge to build an algorithm with even better performance than what has been presented here.

Another question that has not been considered here is the estimation of parameters. Very few successful methods for doing this exist. Zimmer et al. (2011) proposed the so-called optimal prediction principle, where multiple flows with different parameters are estimated, and the one with the best predictive qualities (with respect to data fit in a subsequent frame) is chosen. A similar idea was considered in Rakêt (2012) to give a locally varying field of weight values. While these methods have been demonstrated to work well, they are somewhat heuristic. A general and efficient parameter estimation framework would greatly benefit optical flow estimation, as this would remove the need to tune parameters for specific applications.
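The optimal prediction principle can be sketched as a simple model selection loop. This is an illustrative simplification, not the procedure of Zimmer et al. (2011): `estimate_flow` is a hypothetical stand-in for any flow solver taking a weight parameter, and the prediction of the subsequent frame is reduced to warping the second frame along the estimated flow.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def backward_warp(img, flow):
    """Sample img at positions displaced by flow = (dy, dx)."""
    H, W = img.shape
    yy, xx = np.mgrid[0:H, 0:W].astype(np.float64)
    return map_coordinates(img, [yy + flow[0], xx + flow[1]],
                           order=1, mode='nearest')

def select_weight(estimate_flow, I0, I1, I2, weights):
    """Pick the weight whose estimated flow from I0 to I1 gives the
    best prediction of the subsequent frame I2 (lowest MSE)."""
    best, best_err = None, np.inf
    for w in weights:
        flow = estimate_flow(I0, I1, w)   # candidate flow for this weight
        pred = backward_warp(I1, flow)    # simplified prediction of I2
        err = np.mean((pred - I2) ** 2)
        if err < best_err:
            best, best_err = w, err
    return best
```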

An additional point of future investigation is the symmetric interpolation results presented in Section 3.3. The interpolation quality may of course be improved by using a symmetric data term with a more advanced optical flow method, but this is perhaps not the most exciting research direction. If the goal is to improve viewing experience, a spatial regularization of the interpolated frames could probably improve the perceived quality. Spatial regularization may be done by means of total variation (Keller et al. 2010, Werlberger et al. 2011) or by edge enhancing diffusion (Weickert 1994). The latter has been shown to have very good interpolation properties in other areas, and has been successfully used in image compression (Galić et al. 2008) as well as motion compensated deinterlacing (Ghodstinat et al. 2009). A proper generalization to frame interpolation would likely produce very good and robust results. To improve reconstruction quality one could in addition do occlusion reasoning and selectively interpolate from the non-occluded frame, or compute motion trajectories over several frames (Volz et al. 2011) and use this information for interpolation.

Finally, an important point of future research is the approach to the vector valued problem described in Section 3.2, where the coupling of channels is moved to the regularization term instead of the data term. These uncoupled data terms are somewhat similar to the robustification of Bruhn et al. (2005), who decomposed the original coupling of brightness and gradient suggested in Brox et al. (2004), but kept as a strict requirement that the flows stayed the same (as was the case with brightness and depth in (3.7)). This decoupling was taken further to decouple HSV color channels in Zimmer et al. (2011). It will be interesting to consider how the proposed method compares to the usual data term coupling, in particular whether it will add any robustness to the estimation, since it allows for a much simpler and more efficient solution than the one described in Proposition 2.1.4.


Bibliography

Aaron, A., Zhang, R. & Girod, B. (2002), Wyner-Ziv coding of motion video, in 'Signals, Systems and Computers, 2002. Conference Record of the Thirty-Sixth Asilomar Conference on', Vol. 1, pp. 240–244.

Adato, Y., Zickler, T. & Ben-Shahar, O. (2011), ‘A polar representation of motion and implications for optical flow’, Computer Vision and Pattern Recognition, IEEE Computer Society Conference on pp. 1145–1152.

& Sánchez, J. (2007b), Symmetric optical flow, in R. Díaz, F. Pichler & A. Arencibia, eds, 'Computer Aided Systems Theory – EUROCAST 2007', Vol. 4739 of Lecture Notes in Computer Science, Springer, pp. 676–683.

Alvarez, L., Deriche, R., Papadopoulo, T. & Sánchez, J. (2007a), 'Symmetrical dense optical flow estimation with occlusions detection', International Journal of Computer Vision 75, 371–385.

Austvoll, I. (2005), A study of the Yosemite sequence used as a test sequence for estimation of optical flow, in H. Kalviainen, J. Parkkinen & A. Kaarna, eds,

‘Image Analysis’, Vol. 3540 of Lecture Notes in Computer Science, Springer

Berlin Heidelberg, pp. 659–668.

Ayvaci, A., Raptis, M. & Soatto, S. (2012), ‘Sparse occlusion detection with optical flow’, International Journal of Computer Vision 97, 322–338.

Baker, S., Scharstein, D., Lewis, J. P., Roth, S., Black, M. J. & Szeliski, R. (2011), 'A database and evaluation methodology for optical flow', International Journal of Computer Vision 92(1), 1–31.

Barron, J., Fleet, D. & Beauchemin, S. (1994), ‘Performance of optical flow techniques’, International Journal of Computer Vision 12, 43–77.

Black, M. J. & Anandan, P. (1996), ‘The robust estimation of multiple motions:

Parametric and piecewise-smooth flow fields’, Computer Vision and Image

Understanding 63(1), 75–104.

Bresson, X. & Chan, T. (2008), ‘Fast dual minimization of the vectorial total variation norm and application to color image processing’, Inverse Problems and Imaging 2(4), 455–484.

Brox, T., Bruhn, A., Papenberg, N. & Weickert, J. (2004), High accuracy optical flow estimation based on a theory for warping, in T. Pajdla & J. Matas, eds, 'Computer Vision – ECCV 2004', Vol. 3024 of Lecture Notes in Computer Science, Springer Berlin Heidelberg, pp. 25–36.

Bruhn, A., Weickert, J. & Schnörr, C. (2005), 'Lucas/Kanade meets Horn/Schunck: Combining local and global optic flow methods', International Journal of Computer Vision 61, 211–231.

Butler, D., Wulff, J., Stanley, G. & Black, M. (2012), A naturalistic open source movie for optical flow evaluation, in A. W. Fitzgibbon, S. Lazebnik, P. Perona, Y. Sato & C. Schmid, eds, 'Computer Vision – ECCV 2012', Vol. 7577 of Lecture Notes in Computer Science, Springer Berlin Heidelberg, pp. 611–625.

Chambolle, A. (2004), ‘An algorithm for total variation minimization and applications’, Journal of Mathematical Imaging and Vision 20, 89–97.

Chambolle, A. & Pock, T. (2011), ‘A first-order primal-dual algorithm for convex problems with applications to imaging’, Journal of Mathematical Imaging and

Vision 40, 120–145.

Chen, K. & Lorenz, D. (2011), ‘Image sequence interpolation using optimal control’, Journal of Mathematical Imaging and Vision 41, 222–238.

Chen, W. (2012), 'Surface velocity estimation from satellite imagery using displaced frame central difference equation', IEEE Transactions on Geoscience and Remote Sensing 50(7), 2791–2801.

Christensen, G. E. & Johnson, H. J. (2001), ‘Consistent image registration’,

IEEE Transactions on Medical Imaging 20(7), 568–582.

Ekeland, I. & Temam, R. (1999), Convex Analysis and Variational Problems, SIAM.

Gai, J. & Stevenson, R. (2010), Optical flow estimation with p-harmonic regularization, in 'Image Processing (ICIP), 2010 17th IEEE International Conference on', pp. 1969–1972.

Galić, I., Weickert, J., Welk, M., Bruhn, A., Belyaev, A. & Seidel, H.-P. (2008), 'Image compression with anisotropic diffusion', Journal of Mathematical Imaging and Vision 31, 255–269.

Geiger, A., Lenz, P. & Urtasun, R. (2012), Are we ready for autonomous driving? The KITTI vision benchmark suite, in 'Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on', pp. 3354–3361.

Ghodstinat, M., Bruhn, A. & Weickert, J. (2009), Deinterlacing with motioncompensated anisotropic di↵usion, in D. Cremers, B. Rosenhahn, A. L. Yuille

& F. R. Schmidt, eds, ‘Statistical and Geometrical Approaches to Visual

Motion Analysis’, Springer-Verlag, Berlin, Heidelberg, pp. 91–106.

Girod, B., Aaron, A., Rane, S. & Rebollo-Monedero, D. (2005), 'Distributed video coding', Proceedings of the IEEE 93(1), 71–83.

Goldluecke, B., Strekalovskiy, E. & Cremers, D. (2012), 'The natural vectorial total variation which arises from geometric measure theory', SIAM Journal on Imaging Sciences 5(2), 537–563.


Golub, G. & van Loan, C. (1989), Matrix Computations, The Johns Hopkins University Press, Baltimore, Maryland.

Herbst, E., Seitz, S. & Baker, S. (2009), Occlusion reasoning for temporal interpolation using optical flow, Technical Report UW-CSE-09-08-01, Department of Computer Science and Engineering, University of Washington.

Horn, B. K. P. & Schunck, B. G. (1981), ‘Determining optical flow’, Artificial

Intelligence 17, 185–203.

Huang, X. & Forchhammer, S. (2012), 'Cross-band noise model refinement for transform domain Wyner-Ziv video coding', Signal Processing: Image Communication 27, 16–30.

Huang, X., Rakêt, L. L., Luong, H. V., Nielsen, M., Lauze, F. & Forchhammer, S. (2011), Multi-hypothesis transform domain Wyner-Ziv video coding including optical flow, in 'Multimedia Signal Processing (MMSP), 2011 IEEE 13th International Workshop on', pp. 1–6.

Keller, S., Lauze, F. & Nielsen, M. (2010), Temporal super resolution using variational methods, in M. Mrak, M. Grgic & M. Kunt, eds, ‘High-Quality

Visual Experience: Creation, Processing and Interactivity of High-Resolution and High-Dimensional Video Signals’, Springer Berlin Heidelberg, pp. 275–

296.

Kiseliov, Y. (1994), ‘Algorithms of projection of a point onto an ellipsoid’,

Lithuanian Mathematical Journal 34, 141–159.

Lucas, B. D. & Kanade, T. (1981), An iterative image registration technique with an application to stereo vision, in ‘Proceedings of the 7th International

Joint Conference on Artificial Intelligence (IJCAI ’81)’, pp. 674–679.

Luong, H. V., Rakêt, L. L., Huang, X. & Forchhammer, S. (2012), 'Side information and noise learning for distributed video coding using optical flow and clustering', Image Processing, IEEE Transactions on 21(12), 4782–4796.

Luong, H. V., Rakêt, L. L., Salmistraro, M. & Forchhammer, S. (2013), Motion and reconstruction reestimation for distributed video coding. (in preparation).

Panin, G. (2012), Mutual information for multi-modal, discontinuity-preserving image registration, in G. Bebis, R. Boyle, B. Parvin, D. Koracin, C. Fowlkes,

S. Wang, M.-H. Choi, S. Mantler, J. Schulze, D. Acevedo, K. Mueller &

M. Papka, eds, ‘Advances in Visual Computing’, Vol. 7432 of Lecture Notes in Computer Science, Springer Berlin Heidelberg, pp. 70–81.

Papenberg, N., Bruhn, A., Brox, T., Didas, S. & Weickert, J. (2006), ‘Highly accurate optical flow computations with theoretically justified warping’, International Journal of Computer Vision 67(2), 141–158.

Petersen, I. L., Tomasi, G., Sørensen, H., Boll, E. S., Hansen, H. C. B. &

Christensen, J. H. (2011), ‘The use of environmental metabolomics to determine glyphosate level of exposure in rapeseed (Brassica napus L.) seedlings’,

Environmental Pollution 159(10), 3071–3077.


Pock, T., Urschler, M., Zach, C., Beichel, R. & Bischof, H. (2007), A duality based algorithm for TV-L1-optical-flow image registration, in N. Ayache, S. Ourselin & A. Maeder, eds, 'Medical Image Computing and Computer-Assisted Intervention – MICCAI 2007', Vol. 4792 of Lecture Notes in Computer Science, Springer Berlin Heidelberg, pp. 511–518.

Rakêt, L. L. (2012), Local smoothness for global optical flow, in 'Image Processing (ICIP), 2012 19th IEEE International Conference on', pp. 1–4.

Rakêt, L. L. & Markussen, B. (2014), 'Approximate inference for spatial functional data on massively parallel processors', Computational Statistics & Data Analysis 72, 227–240.

Rakêt, L. L. & Nielsen, M. (2012), A splitting algorithm for directional regularization and sparsification, in 'Pattern Recognition (ICPR), 2012 21st International Conference on', pp. 3094–3098.

Rakêt, L. L., Roholm, L., Bruhn, A. & Weickert, J. (2012a), Motion compensated frame interpolation with a symmetric optical flow constraint, in

G. Bebis, R. Boyle, B. Parvin, D. Koracin, C. Fowlkes, S. Wang, M.-H. Choi,

S. Mantler, J. Schulze, D. Acevedo, K. Mueller & M. Papka, eds, ‘Advances in

Visual Computing’, Vol. 7431 of Lecture Notes in Computer Science, Springer

Berlin Heidelberg, pp. 447–457.

Rakêt, L. L., Roholm, L., Nielsen, M. & Lauze, F. (2011), TV-L1 optical flow for vector valued images, in Y. Boykov, F. Kahl, V. Lempitsky & F. Schmidt, eds,

‘Energy Minimization Methods in Computer Vision and Pattern Recognition’,

Vol. 6819 of Lecture Notes in Computer Science, Springer, pp. 329–343.

Rakêt, L. L., Søgaard, J., Salmistraro, M., Luong, H. V. & Forchhammer,

S. (2012b), Exploiting the error-correcting capabilities of low density parity check codes in distributed video coding using optical flow, in ‘Proceedings of SPIE, the International Society for Optical Engineering’, Vol. 8499, SPIE

– International Society for Optical Engineering.

Rudin, L. I., Osher, S. & Fatemi, E. (1992), ‘Nonlinear total variation based noise removal algorithms’, Phys. D 60, 259–268.

Salmistraro, M., Rakêt, L. L., Zamarin, M., Ukhanova, A. & Forchhammer, S. (2013), Texture side information generation for distributed coding of video-plus-depth, in 'Image Processing, 2013. ICIP '13. 2013 International Conference on'.

Salmistraro, M., Zamarin, M., Rakêt, L. L. & Forchhammer, S. (2013), Distributed multi-hypothesis coding of depth maps using texture motion information and optical flow, in 'Acoustics, Speech and Signal Processing (ICASSP),

2013 IEEE International Conference on’, pp. 1685–1689.

Sommer, S., Lauze, F., Nielsen, M. & Pennec, X. (2012), 'Sparse multi-scale diffeomorphic registration: The kernel bundle framework', Journal of Mathematical Imaging and Vision pp. 1–17.


Steinbrücker, F., Pock, T. & Cremers, D. (2009a), Large displacement optical flow computation without warping, in 'Computer Vision, 2009 IEEE 12th International Conference on', pp. 1609–1614.

Steinbrücker, F., Pock, T. & Cremers, D. (2009b), Advanced data terms for variational optic flow estimation, in M. A. Magnor, B. Rosenhahn & H. Theisel, eds, 'Proceedings of the Vision, Modeling, and Visualization Workshop 2009, November 16-18, 2009, Braunschweig, Germany', DNB, pp. 155–164.

Stich, T., Linz, C., Albuquerque, G. & Magnor, M. (2008), ‘View and time interpolation in image space’, Computer Graphics Forum 27(7), 1781–1787.

Sun, D., Roth, S. & Black, M. J. (2010), Secrets of optical flow estimation and their principles, in ‘Computer Vision and Pattern Recognition (CVPR), 2010

IEEE Conference on', pp. 2432–2439.

Vese, L. A. & Osher, S. J. (2002), ‘Numerical methods for p-harmonic flows and applications to image processing’, SIAM Journal on Numerical Analysis

40(6), 2085–2104.

Volz, S., Bruhn, A., Valgaerts, L. & Zimmer, H. (2011), Modeling temporal coherence for optical flow, in 'Computer Vision (ICCV), 2011 IEEE International Conference on', pp. 1116–1123.

Wedel, A., Cremers, D., Pock, T. & Bischof, H. (2009b), Structure- and motion-adaptive regularization for high accuracy optic flow, in 'Computer Vision, 2009 IEEE 12th International Conference on', pp. 1663–1668.

Wedel, A., Pock, T., Braun, J., Franke, U. & Cremers, D. (2008), Duality TV-L1 flow with fundamental matrix prior, in 'Image and Vision Computing New Zealand', Auckland, New Zealand.

Wedel, A., Pock, T., Zach, C., Bischof, H. & Cremers, D. (2009a), An improved algorithm for TV-L1 optical flow, in D. Cremers, B. Rosenhahn, A. Yuille &

F. Schmidt, eds, ‘Statistical and Geometrical Approaches to Visual Motion

Analysis’, Vol. 5064 of Lecture Notes in Computer Science, Springer, pp. 23–

45.

Weickert, J. (1994), Theoretical foundations of anisotropic diffusion in image processing, in W. G. Kropatsch, R. Klette & F. Solina, eds, 'Theoretical

Foundations of Computer Vision’, Vol. 11 of Computing Supplement, Springer, pp. 221–236.

Werlberger, M., Pock, T. & Bischof, H. (2010), Motion estimation with non-local total variation regularization, in ‘Computer Vision and Pattern Recognition

(CVPR), 2010 IEEE Conference on', pp. 2464–2471.

Werlberger, M., Pock, T., Unger, M. & Bischof, H. (2011), Optical flow guided TV-L1 video interpolation and restoration, in Y. Boykov, F. Kahl, V. Lempitsky & F. Schmidt, eds, 'Energy Minimization Methods in Computer Vision and Pattern Recognition', Vol. 6819 of Lecture Notes in Computer Science,

Springer, pp. 273–286.


Werlberger, M., Trobin, W., Pock, T., Wedel, A., Cremers, D. & Bischof, H.

(2009), Anisotropic Huber-L1 optical flow, in 'Proceedings of the British Machine Vision Conference (BMVC)', London, UK.

Xu, L., Jia, J. & Matsushita, Y. (2010), Motion detail preserving optical flow estimation, in ‘Computer Vision and Pattern Recognition (CVPR), 2010 IEEE

Conference on’, pp. 1293 –1300.

Xu, L., Jia, J. & Matsushita, Y. (2012), 'Motion detail preserving optical flow estimation', Pattern Analysis and Machine Intelligence, IEEE Transactions on 34(9), 1744–1757.

Xu, L., Lu, C., Xu, Y. & Jia, J. (2011), 'Image smoothing via L0 gradient minimization', ACM Transactions on Graphics (SIGGRAPH Asia) 30(4).

Zach, C., Pock, T. & Bischof, H. (2007), A duality based approach for realtime TV-L1 optical flow, in F. Hamprecht, C. Schnörr & B. Jähne, eds, 'Pattern Recognition', Vol. 4713 of Lecture Notes in Computer Science, Springer, pp. 214–223.

Zamarin, M. & Forchhammer, S. (2012), Lossless compression of stereo disparity maps for 3D, in ‘Multimedia and Expo Workshops (ICMEW), 2012 IEEE

International Conference on’, pp. 617 –622.

Zimmer, H., Bruhn, A. & Weickert, J. (2011), ‘Optic flow in harmony’, International Journal of Computer Vision 93, 368–388.

Zitnick, C., Jojic, N. & Kang, S. B. (2005), Consistent segmentation for optical flow estimation, in 'Computer Vision, 2005. ICCV 2005. Tenth IEEE International Conference on', Vol. 2, pp. 1308–1315.

Zitnick, L., Kang, S., Uyttendaele, M., Winder, S. & Szeliski, R. (2004), 'High-quality video view interpolation using a layered representation', ACM Transactions on Graphics 23(3), 600–608.
