Large-scale time parallelization for molecular dynamics problems

JOHANNES BULIN

Master of Science Thesis
Stockholm, Sweden 2013
Master’s Thesis in Scientific Computing (30 ECTS credits)
Master Programme in Scientific Computing (120 credits)
Royal Institute of Technology year 2013
Supervisor at KTH was Michael Schliephake
Examiner was Michael Hanke
TRITA-MAT-E 2013:44
ISRN-KTH/MAT/E--13/44--SE
Royal Institute of Technology
School of Engineering Sciences
KTH SCI
SE-100 44 Stockholm, Sweden
URL: www.kth.se/sci
Abstract
As modern supercomputers draw their power from the sheer
number of cores, an efficient parallelization of programs is
crucial for achieving good performance. When one tries to
solve differential equations in parallel this is usually done
by parallelizing the computation of one single time step.
As the speedup of such parallelization schemes is usually
limited, e.g. by the spatial size of the problem, additional
parallelization in time may be useful to achieve better scalability.
This thesis will introduce two well-known schemes for
time-parallelization, namely the waveform relaxation method and the parareal algorithm. These methods are then
applied to a molecular dynamics problem which is a useful
test example as the number of required time steps is high
while the number of unknowns is relatively low. Afterwards
it is investigated how these methods can be adapted to
large-scale computations.
Referat

Storskalig tidsparallelisering för molekyldynamik

Modern supercomputers use a large number of processors to achieve high performance. It is therefore necessary to parallelize programs in an efficient way. When solving differential equations, one usually parallelizes the computation of a single time step. The speedup of such programs is often limited, for example by the size of the problem. By additionally parallelizing in time, better scalability can be achieved.

This thesis presents two well-known algorithms for time parallelization: waveform relaxation and parareal. These methods are used to solve a molecular dynamics problem in which the time domain is large compared to the number of unknowns. Finally, some improvements that enable large-scale computations are investigated.
Definitions and abbreviations
• ODE – ordinary differential equation
• PDE – partial differential equation
• WR – waveform relaxation
• MD – molecular dynamics
• diag(A) describes the diagonal matrix that has the same entries on the diagonal as A.
• ẏ describes the first derivative in time of the function y and ÿ the second
derivative in time.
• P always denotes the number of available processors. When algorithms are discussed, p0, . . . , pP−1 are used to identify each of the P processors.
• CF describes the cost of the evaluation of the function F , i.e. the required
flops or the time on a specified system.
• Jf (y, t) denotes the Jacobian matrix of a given function f (y, t).
Contents

1 Introduction
2 Time parallelization methods
   2.1 Waveform relaxation method
      2.1.1 Algorithm
      2.1.2 Parallel waveform relaxation
      2.1.3 Numerical properties
      2.1.4 Comments
   2.2 Parareal algorithm
      2.2.1 Algorithm
      2.2.2 Numerical properties
   2.3 Solving the toyproblem in a time-parallel way
      2.3.1 Waveform relaxation
      2.3.2 Parareal algorithm
3 Introduction to molecular dynamics
   3.1 Force fields and potentials
   3.2 Time stepping
   3.3 The molecular dynamics problem
   3.4 Using time-parallel methods
      3.4.1 Waveform relaxation
      3.4.2 Parareal algorithm
4 Improved methods for the molecular dynamics problem
   4.1 Choice of the force splitting for the waveform relaxation
   4.2 Improving the parareal algorithm
      4.2.1 Using windowing
      4.2.2 Choice of the coarse operators
      4.2.3 A multilevel parareal algorithm
5 Evaluation
   5.1 Choice of the force splitting for the waveform relaxation
   5.2 Improving the parareal algorithm
      5.2.1 Using windowing
      5.2.2 Choice of the coarse operators
      5.2.3 A multilevel parareal algorithm
6 Conclusions
Bibliography
Chapter 1
Introduction
Due to the structure of modern (super-)computers, parallelization of algorithms
has become important in order to solve certain problems faster. As big compute
clusters can easily have several thousand cores it is important to write programs that
actually can use this large number of processors in an efficient way. In this thesis
the usefulness of parallelization in time for time-dependent differential equations
stemming from molecular dynamics will be investigated. The main focus is on the
behavior of these time-parallel methods for a large number of cores.
Parallelization means that one tries to adapt a program in such a way that several
processors cooperate in order to solve a given problem. Reasons to parallelize a
program can be limited memory on a single machine or the desire to solve problems
faster. In this thesis the focus is on the latter, i.e. how to make programs faster by
using more processors. Before we start we need to define two elementary terms:
Definition 1. For a given problem let T ∗ be the time that the fastest known serial
solver (i.e. on one processor) needs to solve the problem. Let furthermore TP be the
time that the currently investigated algorithm needs to solve the problem using P
processors. Then the speedup SP is defined as
$$S_P = \frac{T^*}{T_P} \qquad (1.1)$$

and the efficiency $E_P$ is given by

$$E_P = \frac{S_P}{P} = \frac{T^*}{P\,T_P}. \qquad (1.2)$$
A lot of different algorithms can be accelerated by using parallelization but with
very different results. One reason for this is that introducing parallelism usually requires some kind of communication between the processors. A very simple example
of this is the calculation of the sum of a large array a1 , . . . , an in parallel (see figure
1.1): Usually, the array is split into several parts and each part is assigned to a
single processor. Now, every processor calculates the sum of its assigned subarray.
Finally, these subtotals have to be merged together in order to get the final sum
of the whole array. To do this, the processors must exchange their local sums with
each other. The big problem is that communication tends to be very slow. If the
processors reside on different computers then the data that has to be exchanged
must travel through some kind of network which is slow compared to the computational speed of a processor. Furthermore, each communication has a startup time
(latency) before the actual data exchange starts. Therefore it is usually better to
send large chunks of data at once instead of sending a lot of smaller data packages.
Figure 1.1. Calculating the sum of an array on two processors
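To make the idea of figure 1.1 concrete, the following sketch sums an array in parallel using Python's multiprocessing module. This is only an illustration of the splitting and merging idea; the experiments later in this thesis use MPI on a supercomputer, and the function names here are chosen freely.

from multiprocessing import Pool

def partial_sum(chunk):
    # Every worker sums only its own part of the array.
    return sum(chunk)

def parallel_sum(a, num_procs=2):
    # Split the array into num_procs chunks, sum them in parallel and
    # merge the subtotals (the "communication" step in figure 1.1).
    chunk = (len(a) + num_procs - 1) // num_procs
    parts = [a[i:i + chunk] for i in range(0, len(a), chunk)]
    with Pool(num_procs) as pool:
        subtotals = pool.map(partial_sum, parts)
    return sum(subtotals)

if __name__ == "__main__":
    print(parallel_sum(list(range(1, 101))))     # prints 5050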
It is also quite common that parallel algorithms have some so called sequential parts that cannot be executed in parallel. These sequential parts limit the
maximal possible speedup as using more processors will not reduce the execution
time of those. These factors combined cause a quite typical behavior of many parallel algorithms that is shown in figure 1.2: One can see that the growth rate of
the speedup declines for a higher number of processors. Finally a point is reached
where using additional processors does not make the program run any faster, but
often even slower. The reason for this behavior is that the communication time is
negligible in the beginning as the workload per processor is high enough to hide the
communication cost. When the number of processors increases, the computational
workload per processor is usually decreasing. The communication time, however, is
sometimes even increasing when more processors are added. Therefore the communication part of the algorithm becomes the predominant part of the runtime. The
sequential part of the algorithm does also limit the speedup. The famous Amdahl’s
law states that the speedup is bounded by
$$S_P \le \frac{1}{s + \frac{1-s}{P}} \qquad (1.3)$$

where s is the fraction of the algorithm that cannot be executed in parallel.
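As a small numerical illustration of equation (1.3), the following sketch evaluates the Amdahl bound for a few processor counts; the sequential fraction s = 0.05 is an arbitrary example value.

def amdahl_bound(s, P):
    # Upper bound (1.3) on the speedup with P processors and sequential fraction s.
    return 1.0 / (s + (1.0 - s) / P)

for P in (10, 100, 1000):
    print(P, round(amdahl_bound(0.05, P), 1))   # with s = 5% the speedup stays below 20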
Figure 1.2. Example of a speedup curve
As this thesis will focus on the parallel solution of time-dependent differential equations, it is worth having a look at how these problems are parallelized. Probably the
most common way to do so is to parallelize the computation of one single time step.
If the differential equation has a spatial representation as it is the case for PDEs and
some ODEs then the problem can usually be decomposed in space in such a way that
the resulting sub-problems can be solved efficiently in parallel. Figure 1.3 shows a
very simple decomposition of the 2D domain Ω into several sub-domains Ωi which can be used to solve PDEs in parallel.

Figure 1.3. Simple decomposition in space; Ωi is assigned to pi

First, each processor pi is assigned to a part
Ωi of the full domain. When one uses explicit time-stepping methods and simple
differentiation stencils then the solution in each Ωi can be calculated using only the
points in Ωi and the boundary points of the neighbouring subdomains. Therefore it
is sufficient to assign Ωi to pi and communicate the points near the boundary to the
neighbouring processors in order to solve the problem. If implicit methods have to
be used, more complicated – and usually iterative – parallelization approaches have
to be used. Examples of such methods are parallel multi-grid methods[21], which
can solve the (non-)linear system of equations that has to be solved in each time
step using a multi-grid approach. A different idea is used for Schwarz methods[12]
which assign artificial boundary conditions to the sub-domains Ωi in figure 1.3. This
makes it possible to calculate the solution in these sub-domains in parallel. After
each iteration data is exchanged and the boundary conditions are updated until the
global solution has converged.
Unfortunately, parallelization in space cannot always be applied. Some properties of the problem like a small spatial dimension compared to the number of
processors can limit the speedup similar to the situation in figure 1.2. Furthermore,
it may actually be impossible to parallelize in space if the differential equation does
not have a spatial representation which is the case for some ODEs. An obvious
alternative would be to use parallelism on the level of the basic numerical methods. If the time-stepping for example corresponds to the solution of a linear system
Ax = b, this system can be solved using parallel linear equation solvers. Otherwise
one can exploit parallelism inside the time-stepping scheme itself, for example by using parallel Runge-Kutta methods or parallel prediction-correction algorithms[10].
The speedup that one can obtain with the two last-mentioned methods is usually
quite small[10][32].
An approach to overcome these problems might be to use parallelization in time.
This means that one tries to calculate the solution in several time steps at once,
compared to other methods where usually all processors try to calculate the solution
in a single time step. In some problems, the time-domain is much “bigger” than
the spatial domain, for example in molecular dynamics problems. For this kind of
problems it is common to have a quite low number of unknown functions but several
billion time steps. As soon as spatial or other kinds of parallelization do not yield
good results anymore it might be advantageous to parallelize in the time dimension.
Since the underlying physics is sequential in time, most problems are sequential in time as well, so it may be quite difficult to obtain reasonable speedups using this kind of parallelization.
In this thesis I will try to apply time parallelization to two differential equations
to see how efficient it actually is and if there are ways to improve the performance.
Both differential equations can be written as ODEs of the type
$$\dot y = f(y, t), \quad t \in [0, T] \qquad (1.4)$$
$$y(0) = y_0.$$
The first equation that will be solved is a simple linear toyproblem. The other
one is a non-linear equation which is derived from molecular dynamics. The term
molecular dynamics describes a group of mathematical simulations that are used
to predict the movement of atoms or molecules and corresponding quantities like
temperature. As already mentioned, these simulations require a lot of time steps
while the usefulness of spatial parallelization may be limited due to an insufficient
problem size. Furthermore, these problems are usually non-linear and are sensitive
to small deviations during the solution. Therefore it is reasonable to use these kinds
of equations as test problems instead of relying on relatively simple linear equations.
Time-parallel solvers have been investigated for at least 50 years[14] and have
already been successfully applied to certain problems. Unfortunately there are
very few papers available that investigate the scaling properties of time-parallel
algorithms for higher numbers of cores, i.e. more than 50 cores. The fastest
supercomputer in the beginning of 2013, the TITAN at the Oak Ridge National
Laboratory, has about 300,000 cores plus additional accelerators[3], so it would be
interesting to know how time-parallel methods behave for at least one hundred cores.
This thesis will therefore focus on the question of how time-parallel methods behave when such larger numbers of cores are used. To test the algorithms, the supercomputer
Lindgren at PDC[2] was used and computational time was provided by the CRESTA
project[1]. In the next chapter, two well-known time-parallel methods are presented.
As it will turn out, these methods do not become faster as soon as a certain number
of processors is used even if the time domain is sufficiently large. Therefore, some
work will be done in chapter 4 to find ways to allow scaling even after this threshold.
Chapter 2
Time parallelization methods
In the previous chapter time-parallel methods were motivated. This chapter will give
a short overview of these methods and will explain two algorithms in more detail.
The usual approach, i.e. using all available processors to calculate one single time
step, will be called sequential time-stepping in this chapter. Time-parallel methods
differ from these sequential schemes in the way that the solutions at several time
steps are calculated in parallel.
Several different algorithms that can be called time-parallel exist, for example
space-time-parallel multigrid methods. These methods have for example been used
in [19] to solve parabolic equations by using the multigrid technique in space and
time. This means that one does not use multigrid to compute the approximate solution yi in one single time step by solving a linear equation A(yi ) = b. Instead, one
calculates several time steps at once by adding the time as an additional dimension
to the multigrid solving step. Thus, the space-time-parallel multigrid has to solve
a bigger system of equations:
Ã(yi , . . . , yi+k ) = B̃.
A different approach has been presented in [34]: Here, implicit time-stepping is
used and the resulting equations that have to be solved in each time step are solved
using an iterative solver. The calculation of the next time step is started before the
iterative solver has finally converged which leads to a time-parallel algorithm.
For very special problem classes additional time-parallel methods exist, for example [29] for planetary movements, [20] for certain parabolic partial differential
equations and [35] for molecular dynamics problems¹. These special algorithms can
achieve quite good speedups, even for a higher number of processors. For general
methods the results have usually been quite sobering. The maximal achievable
speedup is usually quite small and only a handful of cores can be used usefully.
The algorithms that will be investigated here are the waveform relaxation method
and the parareal algorithm. These two algorithms were chosen because they are
¹ This algorithm does not use a purely mathematical approach to solve these problems in parallel. Instead, a database with a lot of previous molecular dynamics simulations is used to predict the state in the next time steps.
quite general, i.e. they can theoretically be applied to almost every type of ODE, of
course with varying success. This is important as the molecular dynamics problem
that is presented in the next chapter is lacking a lot of nice properties, like linearity or the possibility to easily create coarse-grid approximations. Furthermore,
both methods can in theory converge on very large time-intervals which might be
necessary as large-scale time-parallelization should be achieved.
These two methods shall be presented now. In order to verify the theory and
to show some features of the algorithm I will use a simple toyproblem. This first
example is a linear ODE that has been derived from the 2D heat equation using
finite differences. It has the form
$$\dot y = Ay + f(t), \quad t \in [0, T],\ y \in \mathbb{R}^{n^2} \qquad (2.1)$$
$$y(0) = 0$$

where n = 30 and A is the block tridiagonal matrix

$$A = (n+1)^2 \begin{pmatrix} B & I & & & \\ I & B & I & & \\ & \ddots & \ddots & \ddots & \\ & & I & B & I \\ & & & I & B \end{pmatrix} \in \mathbb{R}^{n^2 \times n^2}.$$

Here, I denotes the identity matrix of size n × n and B is defined as

$$B = \begin{pmatrix} -4 & 1 & & & \\ 1 & -4 & 1 & & \\ & \ddots & \ddots & \ddots & \\ & & 1 & -4 & 1 \\ & & & 1 & -4 \end{pmatrix} \in \mathbb{R}^{n \times n}.$$
The right-hand side f is chosen as the time-dependent vector²

$$f(t) = 10 \begin{pmatrix} \sin(75\pi t)\cos(6\pi(1/n^2)) \\ \sin(75\pi t)\cos(6\pi(2/n^2)) \\ \vdots \\ \sin(75\pi t)\cos(6\pi(n^2/n^2)) \end{pmatrix}.$$

² This right-hand side does not correspond to any real problem. It was simply chosen in such a way that the function values neither become too large nor converge to zero when t becomes larger. Otherwise calculations on bigger time intervals would be meaningless.

2.1 Waveform relaxation method

The waveform relaxation method is an algorithm that can be used to solve ODEs in an iterative way. It was first published in 1982[24] as a method to solve problems related to integrated circuits. Sometimes it is not seen as a pure time-parallel method
but as a method that exploits parallelism across the system[10]. It is presented
as a time-parallel method here because it is still possible to calculate several time
steps simultaneously in one iteration. Furthermore, it also makes it possible to solve ODEs in parallel that have no spatial representation.
2.1.1 Algorithm

The idea of the WR algorithm is to approximate the solution of equation (1.4) in an iterative fashion. It calculates the approximate solutions $y^{(k+1)}$ in each iteration k by solving the following series of modified problems

$$\dot y^{(k+1)} = F(y^{(k+1)}, y^{(k)}, t), \quad t \in [0, T] \qquad (2.2)$$
$$y^{(k+1)}(0) = y_0$$

instead of equation (1.4). The right-hand side F(x, y, t) has to be chosen such that F(y, y, t) = f(y, t) ∀y. If this holds true as well as a couple of other conditions that will be presented later, the iterates $y^{(i)}$ converge towards the actual solution $y^*$ of equation (1.4).
In order to implement the WR algorithm on a computer, one has to discretize
the time in the usual way, resulting in a number of discrete time points
0 = t0 < t1 < · · · < tm = T.
One also has to define the function values that should be approximated:
$$y_i^{(k)} \approx y^{(k)}(t_i).$$

Now one has to define the initial waveform $y_i^{(0)}\ \forall i$. The easiest approach to do this when no additional information is available is to set

$$y_i^{(0)} = y_0,$$

i.e. to choose a constant initial waveform. Then one can calculate the values $y_i^{(k+1)}$ using the already known values $y_i^{(k)}, \forall i$ and $y_j^{(k+1)}, j < i$. The whole WR algorithm is also summed up in algorithm 1.
Algorithm 1 Waveform relaxation method for solving equation (1.4)
  Choose appropriate function F(x, y, t)
  for i = 0, 1, . . . , m do
    y_i^(0) = y_0
  end for
  k ← 1
  while not converged do
    y_0^(k) = y_0
    for i = 1, . . . , m do
      Calculate y_i^(k) by applying a time-stepping scheme to equation (2.2)
    end for
    k ← k + 1
  end while

2.1.2 Parallel waveform relaxation

It is not completely obvious how the waveform relaxation can be parallelized – especially in a time-parallel fashion. The first step is similar to space-parallel methods: the components of $y_i^{(k)}\ \forall i$ are assigned to different processors. Now one can choose the modified right-hand side F in such a way that each processor can calculate its part of $y_i^{(k+1)}$ without requiring those components of $y_i^{(k+1)}$ that are located on another processor. This can for example be achieved by choosing

$$F(x, y, t) = \begin{pmatrix} F(x, y, t)_1 \\ F(x, y, t)_2 \\ \vdots \\ F(x, y, t)_n \end{pmatrix} \quad\text{and}\quad F(x, y, t)_i = f((y_1, \dots, y_{i-1}, x_i, y_{i+1}, \dots, y_n), t).$$
This results in the so called Jacobi waveform relaxation. By using this decoupling, each processor can calculate its part of $y_i^{(k+1)}$ on a part of the time domain (a so called window) without communicating with other processors. After each iteration, i.e. when $y_i^{(k+1)}$ has been calculated ∀i, all $y_i^{(k+1)}$ have to be distributed to the other processors in order to calculate $y_i^{(k+2)}$. Compared to normal space-parallel methods this has the advantage that the number of communications is lower, as data is exchanged once per iteration and not in every time step. This can be advantageous when the communication latency is prohibiting higher speedups. On the other hand this method usually needs several iterations to converge, so it can easily be much slower than space-parallel methods when the number of iterations is too high.
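The following sketch shows, under simplifying assumptions, how the Jacobi splitting could be realized for a small system. It runs the waveform relaxation serially on one window and uses an explicit Euler discretization of (2.2) only to keep the code short (the experiments in this thesis use the implicit midpoint rule); the function and argument names are chosen freely. In a parallel implementation each component, or block of components, would live on its own processor and the iterate of the previous sweep would be exchanged once per iteration.

import numpy as np

def jacobi_wr(f, y0, t_grid, max_iter=50, tol=1e-5):
    # Constant initial waveform y_i^(0) = y_0 on the whole window.
    m, n = len(t_grid), len(y0)
    y_prev = np.tile(np.asarray(y0, dtype=float), (m, 1))
    for k in range(max_iter):
        y_next = np.empty_like(y_prev)
        y_next[0] = y0
        for c in range(n):                    # each "processor" owns one component
            for i in range(1, m):
                dt = t_grid[i] - t_grid[i - 1]
                # Jacobi splitting: component c is taken from the new iterate,
                # all other components are frozen at the previous iterate.
                arg = y_prev[i - 1].copy()
                arg[c] = y_next[i - 1, c]
                y_next[i, c] = y_next[i - 1, c] + dt * f(arg, t_grid[i - 1])[c]
        if np.max(np.abs(y_next - y_prev)) <= tol:    # stopping criterion (2.8)
            return y_next, k + 1
        y_prev = y_next
    return y_prev, max_iter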
Remark 1. The WR algorithm is usually not beneficial for explicit methods. As
F must satisfy F (x, x, t) = f (x, t), each single iteration is at least as costly as
solving the problem sequentially when one ignores communication costs. For implicit
methods the WR method is more useful as F can be chosen in such a way that solving
the system of equations in each time step becomes cheaper. An example of that would
be to choose F as
$$F(x, y, t) = f(y) - \mathrm{diag}(J_f(y))(x - y)$$
if f is non-linear. This replaces the solution of a non-linear equation at each time
step by solving a linear system which is much cheaper. This kind of force splitting
is also called Waveform Newton.
2.1.3 Numerical properties
An important question is always if a numerical method actually converges to the
right solution. It is also interesting to know how fast the algorithm converges in
that case. It can actually be shown that the waveform relaxation method converges
superlinearly to the actual solution. A theorem and a corresponding proof can be
found for example in [8] and the theorem will briefly be recalled here:
Theorem 1. Let the function F (x, y, t) satisfy F (v, v, t) = f (v, t) ∀v where f is the
function from equation (1.4). Define also y ∗ as the exact solution of the problem in
equation (1.4). If F satisfies the Lipschitz condition
$$\|F(y^*, y^*, t) - F(\tilde x, \tilde y, t)\| \le K\|y^* - \tilde x\| + L\|y^* - \tilde y\|, \quad K, L \in \mathbb{R} \qquad (2.3)$$

for all t and $\tilde x, \tilde y \in \{y^{(k)}(t) : k = 0, \dots, \infty\}$ and if the differential equation (2.2) is solved exactly, then one can bound the error $e^{(k)} = y^* - y^{(k)}$ by:

$$\sup_{0 \le t \le T} \|e^{(k)}(t)\| \le \frac{(LT)^k}{k!}\, e^{KT} \sup_{0 \le t \le T} \|e^{(0)}(t)\| \qquad (2.4)$$
Usually it is of course not possible to solve the equation (2.2) exactly but one uses
numerical methods instead. Then it is necessary to know if the solution of the WR
algorithm converges towards the solution that one obtains by solving the original
problem (1.4) using the same time-stepping. For some time-stepping methods it has
been proved that this is the case, for example for forward and backward Euler[8]:
Theorem 2. Let $Y_1, \dots, Y_l$ be the solution at times $t_1, \dots, t_l$ that one obtains by using serial time-stepping with the forward Euler method. If the conditions from theorem 1 are satisfied, then the iterates $y_1^{(k)}, \dots, y_l^{(k)}$ of the forward Euler WR algorithm converge superlinearly to $Y_1, \dots, Y_l$. By defining $e_i^{(k)} = Y_i - y_i^{(k)}$ one obtains the convergence rate:

$$\sup_{i=1,\dots,l} \|e_i^{(k)}\| \le \left(\frac{\Delta t\, L}{1 + \Delta t\, K}\right)^k \binom{l-1}{k} (1 + \Delta t\, K)^{l-1} \sup_{i=1,\dots,l-k} \|e_i^{(0)}\| \qquad (2.5)$$

Now let $Z_1, \dots, Z_l$ be the solution that one obtains by using serial time-stepping with the backward Euler method. One also has to define $e_i^{(k)}$ as $e_i^{(k)} = Z_i - y_i^{(k)}$ this time. Assume that F fulfills the requirements in theorem 1 and that the step size in time $\Delta t$ also fulfills

$$\Delta t < \frac{1}{K + L}.$$

In that case the iterates $y_1^{(k)}, \dots, y_l^{(k)}$ of the backward Euler WR algorithm will converge linearly to $Z_1, \dots, Z_l$:

$$\sup_{i=1,\dots,l} \|e_i^{(k)}\| \le \left(\frac{\Delta t\, L}{1 - \Delta t\, K}\right)^k \binom{l+k-1}{k} (1 - \Delta t\, K)^{1-l} \sup_{i=1,\dots,l} \|e_i^{(0)}\|. \qquad (2.6)$$
Example 1. Recall the toy problem (2.1) with the function splitting
$$F(x, y, t) = (A - \mathrm{diag}(A))y + \mathrm{diag}(A)x + b(t) \qquad (2.7)$$

For this simple example it is easy to show that the necessary conditions for convergence according to theorem 1 are fulfilled:

$$\|F(y^*, y^*, t) - F(\tilde x, \tilde y, t)\| = \|\mathrm{diag}(A)(y^* - \tilde x) + (A - \mathrm{diag}(A))(y^* - \tilde y)\| \le \|\mathrm{diag}(A)\|\, \|y^* - \tilde x\| + \|A - \mathrm{diag}(A)\|\, \|y^* - \tilde y\|.$$
Therefore we get the Lipschitz constants K = ||diag(A)|| and L = ||A − diag(A)||.
If we are using the maximum norm here we can get K = 3600 and L = 3600. Using
theorem 1, we can easily get an upper bound for the error:
$$\sup_{0 \le t \le T} \|e^k(t)\|_\infty \le \frac{(3600T)^k}{k!}\, e^{3600T} \sup_{0 \le t \le T} \|e^0(t)\|_\infty.$$

In the figures 2.1 and 2.2 the terms $\frac{(3600T)^k}{k!} e^{3600T}$ are plotted with respect to k for two different values of T. It is clearly visible that the convergence rate is highly dependent on the value of T in this example. Large values of T lead to a very slow convergence as $e^{3600T}$ will become very large and $3600T \gg 1$.
2.1.4 Comments
One result of theorem 1 and the observations in example 1 is that the WR algorithm
may converge very slowly on large time intervals. Therefore one usually divides the
time domain into several so called windows and the differential equation is solved
on each window at a time. This not only improves performance but also limits the amount of memory that is needed, as all values $y_i^{(k-1)}$ have to be saved in order to calculate $y_i^{(k)}$.
to calculate yi .
It is also important to know how one should define a stopping criterion. The
stopping criterion that is used in my WR implementation is
$$\|y_i^{(k)}(t) - y_i^{(k-1)}(t)\| \le \varepsilon, \quad \forall t \in [0, T],\ \forall i. \qquad (2.8)$$
In the continuous case this choice can be motivated by the following theorem:
Theorem 3. Assume that the assumptions in theorem 1 are fulfilled and that all
y (k) are continuous. If the inequality in (2.8) holds true then we can bound the error
e(k) = y ∗ − y (k) by
$$\|e^{(k)}(t)\| \le L t \varepsilon\, e^{t(K+L)}$$
Figure 2.1. Evolution of $\frac{(3600T)^k}{k!} e^{3600T}$ for $T = 10^{-2}$
Proof.

$$\|e^{(k)}(t)\| = \left\| y_0 + \int_0^t F(y^*, y^*, s)\, ds - y_0 - \int_0^t F(y^{(k)}, y^{(k-1)}, s)\, ds \right\|$$
$$\le \int_0^t \left\| F(y^*, y^*, s) - F(y^{(k)}, y^{(k-1)}, s) \right\| ds$$
$$\le \int_0^t K\|e^{(k)}(s)\| + L\|e^{(k-1)}(s)\|\, ds$$
$$= \int_0^t K\|e^{(k)}(s)\| + L\|e^{(k-1)}(s) + e^{(k)}(s) - e^{(k)}(s)\|\, ds$$
$$\le \int_0^t K\|e^{(k)}(s)\| + L\left(\|e^{(k)}(s)\| + \|y_i^{(k)} - y_i^{(k-1)}\|\right) ds$$
$$\le L t \varepsilon + \int_0^t (K + L)\|e^{(k)}(s)\|\, ds$$

Applying Grönwall's lemma to this finally gives us

$$\|e^{(k)}(t)\| \le L t \varepsilon\, e^{t(K+L)}$$
Figure 2.2. Evolution of $\frac{(3600T)^k}{k!} e^{3600T}$ for $T = 10^{-3}$

2.2 Parareal algorithm
The parareal algorithm is a method that can be used to solve more or less arbitrary
differential equations in a time-parallel fashion. Originally presented by [26] in
2001, it “has received a lot of attention over the past few years”[14, p. 556] and “has
become quite popular among people involved in domain decomposition methods”[4,
p. 425].
2.2.1 Algorithm
The parareal algorithm can be motivated in several different ways, for example as
a multiple shooting method or as a multigrid method[14]. The basic idea is always
to divide the time domain [0, T ] into several blocks
T0 = [t0 , t1 ], T1 = [t1 , t2 ], . . . , Tn = [tn , tn+1 ]
with
0 = t0 < t1 < · · · < tn+1 = T.
Here we also assume that $t_{i+1} - t_i = \Delta T$ is constant ∀i. Then the parareal algorithm tries in each iteration k to approximate the initial values $U_j^k \approx y(t_j)$ in all time blocks $T_j$. To do this, a coarse and a fine time-stepping scheme are needed. The fine
time-stepping F should yield more accurate results than the coarse time-stepping
G, either by using smaller time steps or higher order methods. $F(U_j^k)$ and $G(U_j^k)$ should symbolize the numerical solution of

$$\dot y = f(y, t), \quad t \in [t_j, t_{j+1}]$$
$$y(t_j) = U_j^k$$

at time $t_{j+1}$ using the corresponding time-stepping scheme.
Figure 2.3. Parareal as a multiple-shooting method
One way to motivate the parareal algorithm is to see it as a multiple shooting
method as shown in [14]: One can divide the time-domain as explained above and
guess initial values Ui for each time block. By using these artificial initial values one
can solve the differential equations inside each time-block using the F propagator in
parallel. The problem here is the jumps in the solution at the block borders. Figure 2.3 shows this behaviour in the case that y in equation (1.4) is one-dimensional. To prohibit this, one can eliminate these jumps by enforcing that

$$\underbrace{\begin{pmatrix} U_0 - y_0 \\ U_1 - F(U_0) \\ U_2 - F(U_1) \\ \vdots \\ U_n - F(U_{n-1}) \end{pmatrix}}_{=:\, N(U)} = 0 \qquad (2.9)$$
which would require the solution of a nonlinear system of equations. This could
be done by using standard methods for nonlinear equations like Newton methods.
Applying the Newton method to this system yields
$$U^{k+1} = U^k - J_N(U^k)^{-1} N(U^k)$$

where $J_N(U^k)$ denotes the Jacobian matrix of $N(U^k)$. Due to the special form of equation (2.9) the term $J_N(U^k)^{-1} N(U^k)$ can be calculated explicitly and we get

$$U_{j+1}^{k+1} = F(U_j^k) + \frac{\partial F}{\partial U_j}(U_j^k)(U_j^{k+1} - U_j^k). \qquad (2.10)$$

Using Taylor expansion we obtain

$$\frac{\partial F}{\partial U_j}(U_j^k)(U_j^{k+1} - U_j^k) \approx F(U_j^{k+1}) - F(U_j^k)$$

which would result in the normal serial time-stepping when we plug this into equation (2.10). So one uses instead

$$\frac{\partial F}{\partial U_j}(U_j^k)(U_j^{k+1} - U_j^k) \approx F(U_j^{k+1}) - F(U_j^k) \approx G(U_j^{k+1}) - G(U_j^k)$$

which gives us the updates for the initial values in each parareal iteration:

$$U_j^{k+1} = F(U_{j-1}^k) - G(U_{j-1}^k) + G(U_{j-1}^{k+1}). \qquad (2.11)$$

The initial values $U_i^0$ are usually chosen as $U_i^0 = G(U_{i-1}^0)$, $i \ne 0$. Even though this initialization is purely sequential, it will not introduce additional costs to the algorithm compared to a simple initialization like $U_i^0 \equiv y_0$, ∀i (see [5]). By using this we can finalize the update formula as

$$U_0^0 = y_0, \quad U_i^0 = G(U_{i-1}^0)$$
$$U_0^{k+1} = y_0, \quad U_j^{k+1} = F(U_{j-1}^k) - G(U_{j-1}^k) + G(U_{j-1}^{k+1}). \qquad (2.12)$$

Compared to normal sequential time-stepping we have the advantage that all $F(U_{j-1}^k)$ can be calculated independently from each other, which makes the parallelization of this operation very simple. This update formula leads to the parareal algorithm in algorithm 2.

Remark 2. As for the WR algorithm we also need a stopping criterion for the parareal algorithm. According to [27], stopping criteria for the parareal algorithm have not been subject to a lot of research. A very simple and common stopping criterion that will also be used here is to stop the calculation as soon as

$$\|U_j^k - U_j^{k-1}\| < \varepsilon, \quad \forall j. \qquad (2.13)$$
Algorithm 2 Parareal framework for solving equation (1.4)
  Set U_0^0 = y_0
  for all processors p_i in parallel do
    p = i
    if i ≠ 0 then
      Receive U_p^0 from processor p_{i−1}
    end if
    g = G(U_p^0)
    U_{p+1}^0 = g
    if p ≠ P − 1 then
      Send U_{p+1}^0 to processor p_{i+1}
    end if
    while not converged do
      U_{p+1}^{k+1} = F(U_p^k)
      if p ≠ 0 and p − 1 has not converged then
        U_{p+1}^{k+1} = U_{p+1}^{k+1} − g
        Receive U_p^{k+1} from processor p_{i−1}
        g = G(U_p^{k+1})
        U_{p+1}^{k+1} = U_{p+1}^{k+1} + g
      end if
      if p ≠ P − 1 then
        Send U_{p+1}^{k+1} to processor p_{i+1}
      end if
    end while
  end for
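The following sketch emulates algorithm 2 serially, i.e. it produces the parareal iterates of equation (2.12) without the send and receive calls. The propagators coarse(u, t0, t1) and fine(u, t0, t1) are assumed to be supplied by the user and stand for G and F; all names are chosen freely for illustration.

import numpy as np

def parareal(coarse, fine, y0, t_pts, max_iter=50, tol=1e-5):
    # t_pts = [t_0, ..., t_P] are the borders of the P time blocks.
    P = len(t_pts) - 1
    U = [np.asarray(y0, dtype=float)]
    for j in range(P):                        # sequential coarse initialization U_i^0
        U.append(coarse(U[j], t_pts[j], t_pts[j + 1]))
    for k in range(max_iter):
        G_old = [coarse(U[j], t_pts[j], t_pts[j + 1]) for j in range(P)]
        F_old = [fine(U[j], t_pts[j], t_pts[j + 1]) for j in range(P)]  # parallel in practice
        U_new = [np.asarray(y0, dtype=float)]
        for j in range(P):                    # sequential correction sweep, eq. (2.12)
            U_new.append(coarse(U_new[j], t_pts[j], t_pts[j + 1]) + F_old[j] - G_old[j])
        if max(np.max(np.abs(U_new[j] - U[j])) for j in range(P + 1)) < tol:
            return U_new, k + 1               # stopping criterion (2.13)
        U = U_new
    return U, max_iter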
2.2.2 Numerical properties
Convergence
It is trivial to show that the initial values $U_j^k$ of time block $T_j$ in iteration k will equal the solution that is obtained by using F in a serial fashion if k ≥ j. Therefore
we know that the parareal algorithm will always converge. In order to obtain a
speedup bigger than 1, we have to show that the parareal algorithm can converge
in less than P iterations. One theorem that also covers non-linear right-hand sides
f can be found in [16] and [15] and will be briefly repeated here:
Theorem 4. Assume that F solves the underlying differential equation (1.4) exactly.
Assume also that the local truncation error of G is limited by $C_1 \Delta T^{p+1}$ when $\Delta T$ is small enough. If G also satisfies the Lipschitz condition

$$\|G(x) - G(y)\| \le (1 + C_2 \Delta T)\|x - y\|$$
then we can get the error bound
$$\sup_n \|y^*(t_n) - U_n^k\| \le \frac{(C_1 T)^k}{k!}\, e^{C_2(T - (n+1)\Delta T)}\, \Delta T^{pk} \sup_n \|y^*(t_n) - U_n^0\|. \qquad (2.14)$$
Not so much is known about the convergence of the parareal algorithm in the discrete
case, i.e. when F is not the exact solution but the output of a numerical method.
For linear one-dimensional problems the convergence can be studied as shown in
[14]. For higher-dimensional linear systems or non-linear differential equations no
general convergence results exist as far as I know. Therefore the convergence in the
discrete case must be evaluated numerically here.
Parallel efficiency
Now we will investigate which parallel efficiency we can achieve when we try to solve an ODE on the time-interval $[0, P\Delta T]$: When we look at equation (2.11) we see that each iteration of the parareal framework requires one evaluation of F and G³. We also have to note that each $U_j^k$ requires the calculation of $G(U_{j-1}^k)$. This serial dependence introduces an additional cost of $P\, C_G$ to the parareal algorithm. Assume that the cost $C_G$ of the evaluation of G can be written as

$$C_G = a\, C_F$$

where $C_F$ is the cost of F and $a \ll 1$. When one is ignoring communication costs and defines K as the maximum number of iterations we get the theoretical maximal speedup of[5]

$$S_P \le \frac{P\, C_F}{K(1+a)C_F + P a\, C_F} = \frac{P}{K(1+a) + P a} = \frac{1}{(K/P)(1+a) + a} \le \frac{1}{a}. \qquad (2.15)$$

This means that the efficiency is always lower than

$$E_P \le \frac{1}{K + (K + P)a}. \qquad (2.16)$$

This result is discouraging when one wants to use the parareal framework on a lot of cores. The speedup is bounded by 1/a and is probably even worse as we usually also have $K \ge 2$ unless we use a very accurate coarse solver G. Additionally, K will probably increase when the interval size $P\Delta T$ increases. This means that the parareal algorithm in this generic form is likely to be unsuitable for large-scale applications.

³ In the given equation G is evaluated two times, but one can save one evaluation as $G(U_{j-1}^k)$ has already been calculated in the previous iteration.
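A short sketch for evaluating the bounds (2.15) and (2.16) for assumed values of a, K and P follows; the numbers in the example call are only illustrative.

def parareal_speedup_bound(a, K, P):
    # Upper bound (2.15) on the speedup for cost ratio a = C_G / C_F,
    # K parareal iterations and P processors (communication ignored).
    return P / (K * (1.0 + a) + P * a)

def parareal_efficiency_bound(a, K, P):
    # Corresponding efficiency bound (2.16).
    return 1.0 / (K + (K + P) * a)

for P in (16, 128, 1024):
    print(P, round(parareal_speedup_bound(a=1/50, K=2, P=P), 1))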
2.3 Solving the toyproblem in a time-parallel way
Now these two algorithms should be used to solve the toyproblem in equation (2.1).
To solve this problem one usually applies implicit solvers to it as stability is an
issue here. The reference solution is calculated using the implicit midpoint rule
with step-size ∆t = 10−3 in a sequential fashion. As the problem size is very small
(900 unknowns), parallelization of one single time step is probably not very efficient in this case.
To verify this, the linear system of equations that has to be solved in each time
step was solved in parallel using a parallel solver from the PETSc toolkit[7]. Figure
2.4 shows that the program becomes only marginally faster when using more cores⁴.
To check the accuracy of the time-parallel methods the approximate solution at the final time $s_{end}$ was compared to the reference solution $r_{end}$. Then the relative error

$$\frac{\|r_{end} - s_{end}\|_\infty}{\|r_{end}\|_\infty} \qquad (2.17)$$

was investigated.
Figure 2.4. Achieved speedup using a parallel linear equation solver on the time interval [0, 100]

⁴ The bad performance on very few cores compared to the solution on one single core can be explained by the way PETSc solves linear systems. PETSc uses Krylov methods by default to solve linear systems. If more than one core is available, PETSc uses a different preconditioner which leads to slightly slower convergence in this case.
2.3.1 Waveform relaxation
In example 1 we have already predicted that the convergence of the WR algorithm
will be very slow if the length of one window is too large. Here we will solve the
equation on the time interval [0, 10] using several different window sizes. The time-stepping scheme that is used for solving the modified problems in equation (2.2) is
the same as for the reference solution, i.e. the implicit midpoint rule with step size
∆t = 10−3 . The modified right-hand side F (x, y, t) is the same as in equation (2.7).
As a stopping criterion
$$\|y_i^{(k)} - y_i^{(k-1)}\|_\infty \le 10^{-5}, \quad \forall i$$

was used. By using this stopping criterion, the relative error

$$\frac{\|w - s\|_\infty}{\|s\|_\infty}$$

was always lower than $5 \cdot 10^{-3}$. Here s stands for the reference solution at time T and w for the approximate solution that was obtained using the WR method.
Table 2.1 shows the average number of iterations per window that are necessary
to fulfill this stopping criterion. We can see that the necessary number of iterations
grows very fast for window lengths > 10−3 . The motivation to use the WR algorithm
was to lower the number of necessary communications, which would be advantageous
if the communication latency is limiting the speedup. In this case it does not work,
unfortunately. In order to do fewer communications, the number of iterations should
be much lower than the number of discrete time points in the window. Otherwise
one needs only one communication per iteration but due to the high number of
iterations one may require more communications in total than in the case where
only one single time step is parallelized.
window size    average number of iterations per window
1 · 10−3       11.4
4 · 10−3       25.0
10 · 10−3      48.7

Table 2.1. Average number of iterations for different window sizes
Now we can have a look at the speedup in the case where the window size is
10 · 10−3 . Figure 2.5 shows the speedup for the WR algorithm with respect to the
sequential reference solver, which is extremely low in this case. The reason for this
is of course the high number of iterations that were shown in table 2.1. On the
other hand we can show the positive effects of sending data only once per iteration:
Figure 2.6 shows the speedup of the WR algorithm when one uses the WR algorithm
on one processor as the reference solver. The achieved speedup is much better than
in the sequential case (figure 2.4) which probably means that the influence of the
communication latency became smaller.
Figure 2.5. Achieved speedup with respect to sequential time-stepping using the WR algorithm and window size 10 · 10−3
Finally one can say that the basic WR algorithm is unsuitable for these kinds of problems. This was already shown in example 1 and has also been noted in several
papers[20][22]. Some ideas have been proposed to improve the convergence of WR
methods for parabolic problems, for example by using a multigrid approach[20] or
by employing successive overrelaxation[22].
2.3.2 Parareal algorithm
For the parareal algorithm we choose the same solver for F as we did in the sequential
case, i.e. the implicit midpoint rule with step size ∆t = 10−3 . As a coarse propagator
G the implicit Euler method was chosen, one time with ∆t = 10−2 and the other
time with ∆t = 10−1 . In this case we get CG ≈ 81 CF when the step size of G is
1
10−2 and CG ≈ 50
CF for a step size of 10−1 .5 It is also easy to show that the
requirements in theorem 4 are fulfilled due to the linearity of the used solvers.
This time we choose the large time interval [0, 100] and look at the speedup
in figure 2.7. In the case when G uses the step size ∆t = 10−2 we obtain a quite
limited speedup which can easily be explained by the upper bound for the speedup
⁵ To solve the linear systems in each time step, linear equation solvers from the PETSc toolkit
were used. These solvers are not direct solvers but Krylov methods. It seems that the speed of
these solvers implicitly depends on the step size of the time-stepping scheme. Therefore using a
10 times bigger step size does not make the solver 10 times faster. Forcing PETSc to use direct
solvers makes the computation much slower.
Figure 2.6. Achieved speedup with respect to the WR algorithm on one processor using the WR algorithm and window size 10 · 10−3
that was obtained in equation (2.15). This equation limits the speedup SP by
$$S_P \le \frac{1}{a} \approx \frac{1}{1/8} = 8.$$

If we use G with step size $10^{-1}$ we may get a bigger speedup according
to equation (2.15) (SP ≤ 50). We can also see that this holds true by looking at
figure 2.8. It is worth noting that the parareal algorithm always converges after
two iterations for this simple toyproblem, seemingly independent of the step-size
for the coarse solver. Furthermore, the accuracy of the results is usually very good
with the relative error being around 1e − 4. For general problems the number of
iterations will probably increase when a less accurate coarse solver G is used but
this does not seem to be the case for this toyproblem.
Figure 2.7. Achieved speedup for the parareal algorithm with coarse time step 10−2
Figure 2.8. Achieved speedup for the parareal algorithm with coarse time step 10−1
Chapter 3
Introduction to molecular dynamics
Molecular dynamics is the term for a class of numerical simulations that try to
predict the motion of “particles” and corresponding physical properties like temperature or the structure of molecules. The particles in these calculations can be
• basic particles like protons, neutrons and electrons
• whole atoms or atom-groups
• bigger molecules or groups of molecules, for example proteins
• a combination of the three groups above where certain movements are simulated on a very fine scale while others are calculated on a coarse level.
Here I will briefly present the so called classical MD. This basically means that the
movement of particles can be derived from Newton’s laws of motion and not from
quantum mechanics. The basic idea behind MD is usually the same, independent
of the used scale: One tries to simulate the movement of some particles ρ1 , . . . , ρn
where each particle ρi has the coordinates xi (t), yi (t) (in the 2D case). Then the
basic equation for molecular dynamics is
$$M \ddot d = f(d, t), \quad t \in [0, T] \qquad (3.1)$$
$$d(0) = d_0, \quad \dot d(0) = v_0$$

where $d : \mathbb{R} \mapsto \mathbb{R}^{2n}$ is defined as

$$d(t) = (x_1(t), y_1(t), x_2(t), y_2(t), \dots, x_n(t), y_n(t))^T.$$
M is in this case a mass matrix which determines the mass of each particle ρi . The
function f (d, t) determines the interaction of the particles with each other as well
as additional external forces. In section 3.1 it will be explained how this function f can be chosen.
Equation (3.1) can be rewritten in such a way that it has the same form as the basic equation (1.4) by introducing $v_i^k(t) = \dot d_i^k(t)$. Then one obtains

$$\underbrace{\begin{pmatrix} \dot v \\ \dot d \end{pmatrix}}_{\dot y} = \underbrace{\begin{pmatrix} M^{-1} f(d, t) \\ v \end{pmatrix}}_{g(y)}, \quad t \in [0, T] \qquad (3.2)$$
$$d(0) = d_0, \quad v(0) = \dot d_0$$
where v describes the velocity of the particles. This allows us to apply the theory
from chapter 2 to these equations.
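As a small illustration of (3.2), the following sketch wraps a given force function f(d, t) and a diagonal mass matrix (stored as a vector m, an assumption made here for simplicity) into the first-order right-hand side g(y); the ordering y = (v, d) follows the equation above and the helper name is arbitrary.

import numpy as np

def make_first_order_rhs(f, m):
    # Wrap M d'' = f(d, t) as y' = g(y, t) with y = (v, d), cf. equation (3.2).
    def g(y, t):
        v, d = np.split(y, 2)
        return np.concatenate([f(d, t) / m, v])   # (v', d') = (M^{-1} f(d, t), v)
    return g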
3.1 Force fields and potentials
The big question is now how one should choose the right-hand side function f (d, t).
In theory one could solve MD problems with arbitrary precision by using Schrödinger’s equation[25]. This is impossible unless one has very special (and small)
problems as solving Schrödinger's equation is very expensive. Instead, a lot of approximations have been developed which are able to model some properties of the
molecular system. These approximations are usually sufficient to accurately approximate the solution for a certain class of problems.
In classical MD, the right-hand side f (d, t) can usually be written as[23]
$$f(d, t) = -\nabla V(d) \qquad (3.3)$$
where V is called the potential function and ∇V the force field. V describes the
potential energy in the system. By differentiating with respect to the particle positions, the actual force can be calculated. A lot of different force fields exists, some
popular examples are the AMBER and CHARMM force fields which are part of
several important molecular dynamics software packages.
One very simple potential that is often used for demonstration purposes is the
Lennard-Jones potential that can actually be used to simulate the behaviour of
noble gases. In that case the potential function is computed by calculating and
summing up all pairwise potentials

$$U(r_{ij}) = \alpha\varepsilon \left[ \left(\frac{\sigma}{r_{ij}}\right)^{12} - \left(\frac{\sigma}{r_{ij}}\right)^{6} \right] \qquad (3.4)$$

where $r_{ij}$ denotes the distance between the particles $\rho_i$ and $\rho_j$ and α, ε and σ are constants that have to be adapted to the actual problem. The whole potential function V(d) is in this case

$$V(d) = \sum_{i=1}^{n} \sum_{j=i+1}^{n} U(\|r_{ij}\|), \qquad r_{ij} = \begin{pmatrix} d_{2i-1} \\ d_{2i} \end{pmatrix} - \begin{pmatrix} d_{2j-1} \\ d_{2j} \end{pmatrix}. \qquad (3.5)$$
Figure 3.1. The function U(rij) with respect to rij, α = ε = σ = 1
By calculating −∇V one obtains the forces that act on each particle:
$$\begin{pmatrix} f_{x_i} \\ f_{y_i} \end{pmatrix} = \sum_{\substack{j=1 \\ j \ne i}}^{n} 24\varepsilon\, \frac{1}{r_{ij}^2} \left( \frac{\sigma^6}{r_{ij}^6} - 2\frac{\sigma^{12}}{r_{ij}^{12}} \right) \begin{pmatrix} x_j - x_i \\ y_j - y_i \end{pmatrix}. \qquad (3.6)$$
By looking at figure 3.1 or equation (3.6), one can see that particles that are far away from each other have almost no influence on each other. This also holds true for a couple of
other potentials which therefore are called short range potentials. In this case it
is common to ignore the interactions of two particles ρi and ρj if their distance is
larger than a certain cutoff-radius rcut . In that case equation (3.6) becomes
$$\begin{pmatrix} f_{x_i} \\ f_{y_i} \end{pmatrix} = \sum_{\substack{j=1 \\ j \ne i \\ r_{ij} < r_{cut}}}^{n} 24\varepsilon\, \frac{1}{r_{ij}^2} \left( \frac{\sigma^6}{r_{ij}^6} - 2\frac{\sigma^{12}}{r_{ij}^{12}} \right) \begin{pmatrix} x_j - x_i \\ y_j - y_i \end{pmatrix}. \qquad (3.7)$$
This cutoff-radius can also be used to parallelize such MD problems in space. To
calculate the force that acts on a certain particle ρi we need no longer all particles in
the domain but only a couple of other particles which are located near ρi . Therefore
one can divide the domain into several parts Ωi as shown in figure 3.2. All particles
inside one subdomain are assigned to one single processor. To calculate the forces
for the particles, only particles close to the subdomain boundaries (in the gray area)
have to be exchanged.
Figure 3.2. Spatial decomposition for short range potentials

3.2 Time stepping

The standard methods to solve the molecular dynamics problems (3.1) are Verlet integrators[31]. The classical velocity Verlet method calculates the position of the
particles di+1 and the corresponding velocities vi+1 at time ti+1 in the following
way[30]:
$$d_{i+1} = d_i + \Delta_t v_i + \frac{\Delta_t^2}{2} M^{-1} f(d_i, t_i) \qquad (3.8)$$
$$v_{i+1} = v_i + \frac{\Delta_t}{2} M^{-1}\left(f(d_i, t_i) + f(d_{i+1}, t_{i+1})\right).$$
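A single step of the velocity Verlet scheme (3.8) can be sketched as follows, assuming a diagonal mass matrix stored as a vector m so that M⁻¹f becomes an element-wise division; the function name is chosen freely.

def velocity_verlet_step(d, v, t, dt, f, m):
    # One step of (3.8); d, v and m are arrays of equal length.
    a_old = f(d, t) / m
    d_new = d + dt * v + 0.5 * dt ** 2 * a_old      # position update
    a_new = f(d_new, t + dt) / m
    v_new = v + 0.5 * dt * (a_old + a_new)          # velocity update
    return d_new, v_new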
The Verlet integrators are second-order explicit methods that also have the nice
property that they are symplectic. A symplectic integrator can be defined as follows:
“An integration method can be interpreted as a mapping in phase space
[...]. If the integration method is applied to a measurable set of points
in the phase space, this set is mapped to another measurable set in the
phase space. The integration method is called symplectic if the measure
of both of those sets is equal.”[18, p. 46]
Symplectic methods are usually used in MD because they preserve the energy of
a system very well compared to non-symplectic methods. Other examples of symplectic methods are the symplectic Euler method or the implicit midpoint rule[31].
A big problem that occurs in MD is instabilities. The time step for explicit methods is usually “limited by the smallest oscillation period that can be found in the simulated system”[33, p. 6543] – which can easily be about 10−15 s (femtoseconds) for some calculations. The time step can be increased slightly by using additional techniques like force splitting and freezing certain bonds (RATTLE/SHAKE), see also [31]. Implicit time-stepping could – in theory – allow almost arbitrarily large
step sizes which are only limited by accuracy considerations, not by stability. A
single implicit time step is of course much more expensive than an explicit step as
it requires the solution of a non-linear system.
The paper [31] has investigated the usefulness of implicit methods in molecular
dynamics. The authors basically came to the conclusion that implicit methods do
not allow much bigger timesteps because the necessary non-linear solvers do not
converge for bigger time steps as the non-linear system becomes ill-conditioned.
On the other hand the paper [33] managed to solve certain MD problems using an
implicit waveform relaxation method in a competitive time.
3.3 The molecular dynamics problem
To have a more meaningful problem that is also more difficult to solve from a numerical point of view, a problem from molecular dynamics has been chosen. This
example has been taken from [18, p. 157ff] and is simulating the evolution of a
crack in silver. The initial configuration is a mesh consisting of 4870 silver atoms
with a small crack in the domain as shown in figure 3.3.
Figure 3.3. Initial positions of the silver atoms; an additional force is applied to the
lower and upper border (white atoms)
Now a force is exerted on the particles on the lower and upper boundary which pulls the two sides away from each other. Due to these forces the crack
will open wider. The internal forces between the particles are modeled by a new
potential, which is called the Finnis-Sinclair potential. Unlike the Lennard-Jones
potential which was already presented it is a multibody potential which means that
the potential between the particles ρi and ρj does not only depend on ρi and ρj
but potentially also on some other particles. The potential function for the Finnis-
Sinclair potential is given by

$$V(d) = \varepsilon \sum_{i=1}^{n} \left[ \sum_{j=i+1}^{n} \left(\frac{\sigma}{r_{ij}}\right)^{12} - c\sqrt{S_i} \right]$$

where

$$S_i = \sum_{\substack{j=1 \\ j \ne i}}^{n} \left(\frac{\sigma}{r_{ij}}\right)^{6}.$$

By calculating −∇V we obtain the forces for each particle:

$$\begin{pmatrix} f_{x_i} \\ f_{y_i} \end{pmatrix} = \sum_{\substack{j=1 \\ j \ne i}}^{n} L_{ij} \begin{pmatrix} x_j - x_i \\ y_j - y_i \end{pmatrix} \qquad (3.9)$$

where

$$L_{ij} = -\varepsilon\, \frac{1}{r_{ij}^2} \left[ 12\left(\frac{\sigma}{r_{ij}}\right)^{12} - 3c\left(\frac{1}{\sqrt{S_i}} + \frac{1}{\sqrt{S_j}}\right)\left(\frac{\sigma}{r_{ij}}\right)^{6} \right].$$
The external force that is exerted on the particles on the lower and upper boundary is given by

$$f_{y_i}^{ext} = \begin{cases} g(y_{max} - y_i), & \rho_i \text{ lies on the upper boundary} \\ g(y_{min} - y_i), & \rho_i \text{ lies on the lower boundary} \\ 0, & \text{otherwise} \end{cases}$$
This causes the particles on the upper/lower boundary to be pulled up/down until
the y coordinate of the particle becomes ymax (upper boundary) or ymin (lower
boundary). We also use the cutoff-radius technique here to reduce the computational cost of each function evaluation. The used constants in this example are
also taken from [18, p. 157ff] and are ε = 1, σ = 1, c = 10.7, M = I, fext = 3,
ymax = 47.5, ymin = −0.5, g = 0.6 and rcut = 5.046875.
3.4 Using time-parallel methods
After doing some tests with time-parallel methods in the previous chapter we will
now apply these methods to the presented MD problem. There are not many papers
available where it was tried to parallelize MD problems in time. In the few cases
where this was done[33][6], very simple and small problems were solved. Therefore
it is difficult to predict how our more realistic problem will behave.
Before we start to solve the problem one thing has to be pointed out: MD
problems usually show chaotic behaviour, i.e. even a very small error will grow exponentially in time. Figure 3.4 shows the solution of the crack propagation problem
at time t = 50. Both solutions were calculated with the Verlet algorithm, using a
Figure 3.4. Solution at t = 100 with slightly modified initial conditions
step size of ∆t = 10−2 . The only difference was that in the initial condition the
velocity of two particles was exchanged. As one can see the pictures are slightly
different.
Due to this chaotic behaviour, comparing the positions or velocities of the particles with some reference solution is usually meaningless. To check the accuracy of
the time-parallel methods the following way is used:
“In molecular dynamics simulations, the evaluation of numerical methods (and determination of the quality of a numerical trajectory) must
be based on the magnitude of the observed energy drift. From one time
step to the next, the energy can fluctuate quite considerably in an MD
simulation, regardless of the method, and these local fluctuations are
generally larger in a low-order method than in a higher-order one, at a
given stepsize.”[9, p. 10]
This means that we have to calculate the total energy E(ρ1 , . . . , ρn ) in the system.
To calculate E, we need the kinetic energy Ekin (ρ1 , . . . , ρn ) and the potential energy
Epot (ρ1 , . . . , ρn ) in the system. These two values are given by[18]
$$E_{pot}(\rho_1, \dots, \rho_n) = V(\rho_1, \dots, \rho_n)$$
$$E_{kin}(\rho_1, \dots, \rho_n) = \frac{1}{2} \sum_{i=1}^{n} m_i v_i^2$$
where mi is the mass of particle ρi and vi is its velocity. Then one gets the total
energy by
$$E(\rho_1, \dots, \rho_n) = E_{pot}(\rho_1, \dots, \rho_n) + E_{kin}(\rho_1, \dots, \rho_n). \qquad (3.10)$$
Again a reference solution was calculated to determine the accuracy of the time-parallel methods. In this case this solution was obtained by using the Verlet algo-
rithm with step size 5·10−3 . As shown in figure 3.5, the energy remains more or less
constant after an initial phase. Therefore, we calculate the average energy Emean
on the time interval [40, 100]. To check the accuracy of other solvers, we compare
the total energy Ei at time ti with the reference energy Emean . Then we define the
maximal energy deviation Edev for this method by
Edev =
max
δEi =
20
40
i:ti ∈[40,100]
Emean − Ei E
i:ti ∈[40,100]
mean
max
(3.11)
5
x 10
E
−1.9412
−1.9413
0
60
80
100
t
Figure 3.5. Total energy when using a smaller step size (5 · 10−3 )
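The accuracy check of equations (3.10) and (3.11) can be sketched as follows; potential(pos) is assumed to evaluate V for one snapshot of positions, E_mean is passed in from the reference run, and all names are illustrative.

import numpy as np

def total_energy(pos, vel, m, potential):
    # E = E_pot + E_kin with E_kin = (1/2) * sum_i m_i * |v_i|^2, see (3.10).
    e_kin = 0.5 * np.sum(m * np.sum(vel ** 2, axis=1))
    return potential(pos) + e_kin

def energy_deviation(times, energies, e_mean, t_lo=40.0, t_hi=100.0):
    # Maximal relative deviation from the reference energy E_mean, see (3.11).
    mask = (times >= t_lo) & (times <= t_hi)
    return np.max(np.abs((e_mean - energies[mask]) / e_mean))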
If we look back at figure 3.4 we see that the final positions of the particles are different. Figure 3.6 on the other hand shows that the deviation from $E_{mean}$ is quite similar in both cases. In both cases we have that $E_{dev} \approx 5 \cdot 10^{-6}$.
3.4.1 Waveform relaxation
We will start with testing the WR algorithm. First, we have to choose a splitting
function F (x, y, t). Here the function
$$F(x, y, t) = f(y, t) + \tilde J_f(y, t)(x - y)$$
Figure 3.6. Deviation from the average energy for the unmodified (top) and the modified initial condition (bottom)
will be used. $\tilde J_f(y, t)$ is the block-diagonal Jacobian matrix of f that is defined by

$$\tilde J_f(y, t) = \begin{pmatrix} T_1(y) & & \\ & \ddots & \\ & & T_n(y) \end{pmatrix}$$
where $T_i(y) \in \mathbb{R}^{2\times 2}$ is given by

$$T_i(y) = \begin{pmatrix} \partial f_{2i}/\partial y_{2i} & \partial f_{2i}/\partial y_{2i+1} \\ \partial f_{2i+1}/\partial y_{2i} & \partial f_{2i+1}/\partial y_{2i+1} \end{pmatrix}.$$
Each block of the Jacobian matrix corresponds to one single atom whose position
is defined by its x and y coordinate. Therefore this force-splitting will be called
atom-wise Waveform Newton.
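The following Python sketch shows one possible way to build the per-atom 2×2 blocks T_i(y) and to evaluate the resulting splitting function. The finite-difference approximation of the blocks and the assumption that components 2i and 2i+1 of the state vector belong to atom i are illustrative choices, not the implementation used in this thesis.

import numpy as np

def atom_blocks(f, y, t, eps=1e-6):
    """Sketch: approximate the 2x2 blocks T_i(y) of the block-diagonal Jacobian
    by finite differences, assuming components 2i and 2i+1 belong to atom i."""
    n = y.size // 2
    f0 = f(y, t)
    blocks = np.zeros((n, 2, 2))
    for i in range(n):
        for a in range(2):                       # perturb component y_{2i+a}
            yp = y.copy()
            yp[2 * i + a] += eps
            df = (f(yp, t) - f0) / eps
            blocks[i, 0, a] = df[2 * i]          # d f_{2i}   / d y_{2i+a}
            blocks[i, 1, a] = df[2 * i + 1]      # d f_{2i+1} / d y_{2i+a}
    return blocks

def atom_wise_newton(f, x, y, t):
    """F(x, y, t) = f(y, t) + J~f(y, t)(x - y) with the block-diagonal Jacobian."""
    T = atom_blocks(f, y, t)
    d = (x - y).reshape(-1, 2)
    return f(y, t) + np.einsum('nij,nj->ni', T, d).reshape(-1)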
As explained in remark 1, using the WR algorithm with explicit time-stepping
is usually not very useful. Therefore the implicit midpoint rule was chosen as a
time-stepping scheme. The idea here is to use bigger step sizes in order to counter
the higher cost of the implicit time-stepping scheme as it was done in [33].
We start with analyzing the convergence rates of the WR algorithm. Here we
cannot get a theoretical convergence estimate as easily as in the previous chapter.
It is also easy to see that no force splitting will satisfy the Lipschitz condition in
theorem 1. This can be shown by choosing x̃ = ỹ in this theorem in such a way that
these vectors correspond to a system where two particles have the same position.
In that case we have that
F (x̃, ỹ, t) = F (x̃, x̃, t) = f (x̃, t) = ±∞
as we have to divide by rij = 0 in the force calculation.
To check if convergence is still possible, several combinations of step sizes and
window sizes were tested for the time interval [0, 1]. Only one core was used in
the beginning. The corresponding number of iterations per window are shown in
table 3.1. As a stopping criterion, the criterion in equation (2.8) was used with ε = 10⁻⁴.

step size    window size   average # of iterations per window     wall-clock time in s
1 · 10⁻²     1 · 10⁻²      4.0                                    1159
1 · 10⁻²     5 · 10⁻²      6.0                                    1041
1 · 10⁻²     10 · 10⁻²     8.0                                    1283
5 · 10⁻²     5 · 10⁻²      10.6                                   617
5 · 10⁻²     10 · 10⁻²     15.5                                   664
5 · 10⁻²     20 · 10⁻²     23.6                                   860
5 · 10⁻²     50 · 10⁻²     no convergence after 100 iterations    -
10 · 10⁻²    10 · 10⁻²     no convergence after 100 iterations    -

Table 3.1. Average number of iterations for different window sizes and step sizes

By comparing these results with the sequential Verlet algorithm, which
needed approximately 80 seconds to solve the problem on this time interval, we can see that the WR algorithm again performs quite badly. If we use small step and window sizes we get quite fast convergence. However, this does not help us, as each iteration is now much more costly: we are using implicit time-stepping and a more expensive right-hand side function F(x, y, t). When we try to use bigger step
sizes to counter the higher costs, the number of iterations also grows quite fast, or the solution does not converge at all. As the maximal possible window sizes are also quite small (≤ 10 · stepsize), we still have to exchange data after a few time steps. Tests also showed that even when one uses 100 cores the WR algorithm is only marginally faster than the Verlet algorithm on a single processor. Furthermore, the WR algorithm does not seem to conserve energy very well, with Edev being around 2 · 10⁻³ for a step size of 5 · 10⁻² and a window size of 5 · 10⁻².
3.4.2 Parareal algorithm
To test the parareal algorithm, the Verlet algorithm with step size 10⁻² was chosen as the fine solver F. As using a bigger step size leads to instabilities, the coarse solver G was chosen to be equal to F, except that the cutoff radius was halved in order to make G faster. This gives us approximately CF ≈ 4CG. When testing the parareal algorithm, the outcome was basically that one always has to do almost P iterations in order to fulfill the stopping criterion if P ≲ 150. This leads to runtimes that are even worse than for serial time stepping using only one processor. Further investigations showed that the reason for this behaviour seems to be the update step
$$U_j^{k+1} = \mathcal{F}(U_{j-1}^{k}) - \mathcal{G}(U_{j-1}^{k}) + \mathcal{G}(U_{j-1}^{k+1}).$$
If the size of one time block is quite large – which is the case if P is small – the values of $\mathcal{F}(U_{j-1}^{k})$, $\mathcal{G}(U_{j-1}^{k})$ and $\mathcal{G}(U_{j-1}^{k+1})$ may differ substantially. As $U_j^{k+1}$ is calculated by simply summing up these terms, it regularly happens that some particles in $U_j^{k+1}$
become too close to each other. By looking at equation (3.9) one can see that
small distances between particles lead to very strong repulsive forces which causes
subsequent applications of time-stepping schemes to become unstable. Therefore
the parareal algorithm does not converge in this case and only the observation
that the result of the parareal algorithm after P iterations equals the solution of
sequential time-stepping makes it possible to obtain the right solution in this case.
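A minimal serial sketch of the iteration discussed above is given below. It is written in Python only for clarity: the propagators G and F are placeholders for the halved-cutoff and the full Verlet solver, and the stopping test mimics the criterion in equation (2.8). It is meant to illustrate the structure of the update, not the parallel implementation used for the measurements.

import numpy as np

def parareal(y0, G, F, n_blocks, max_iter, tol=1e-4):
    """Minimal serial sketch of the parareal iteration (illustrative only)."""
    # Initial guess by a sequential coarse sweep.
    U = [y0]
    for j in range(n_blocks):
        U.append(G(U[j]))

    for k in range(max_iter):
        # Fine propagation of all block initial values; in the parallel
        # algorithm each of these calls is handled by one processor.
        F_vals = [F(U[j]) for j in range(n_blocks)]

        # Sequential correction sweep with the coarse propagator.
        U_new = [y0]
        for j in range(n_blocks):
            U_new.append(F_vals[j] - G(U[j]) + G(U_new[j]))

        # Stopping test in the spirit of equation (2.8).
        if max(np.linalg.norm(U_new[j] - U[j]) for j in range(1, n_blocks + 1)) < tol:
            return U_new
        U = U_new
    return U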
For higher numbers of processors the algorithm converges and we are at least
able to get a small speedup as shown in figure 3.7. Table 3.2 shows the number of
parareal iterations that were required for convergence. It also shows the theoretical
upper bound for the speedup that was derived in chapter 2 (equation (2.15)) and
compares it with the actual speedup. We can see that the algorithm came very
close to the theoretical speedup.
The energy conservation of the parareal algorithm was quite good in this case
with values Edev ≤ 6 · 10−6 . This is quite surprising, given that the parareal algorithm is not symplectic, even if F and G are symplectic[11].
Figure 3.7. Achieved speedup for the parareal algorithm
P     number of iterations   maximal theoretical speedup   achieved speedup
192   43                     1.9                           1.8
384   44                     2.5                           2.4
768   37                     3.2                           2.6

Table 3.2. Number of parareal iterations and achieved speedup, compared with the maximal theoretical speedup from equation (2.15), for different numbers of processors P
Chapter 4
Improved methods for the molecular dynamics problem
In chapter 2 the parareal algorithm and the WR method were presented. For the
MD problem in the previous chapter, both algorithms performed very badly and do not seem to work in their basic forms. This chapter will present some ideas for how
the algorithms can be improved. In the next chapter these improvements are tested
and evaluated.
The waveform relaxation suffers from the fact that the number of necessary
iterations is growing very fast when the window sizes become bigger. However, big
window sizes are required in order to use a high number of cores efficiently. Maybe
an even bigger problem is that the solution does not even converge when one uses
big time steps or window sizes. This makes it very unlikely that large-scale time
parallelization can be achieved using the WR method. Furthermore, using implicit
methods may not be useful for molecular dynamics as they do not allow much bigger
step sizes than explicit methods[31]. Because of this, the main focus of this chapter
will be on the parareal method which is more promising here.
The parareal algorithm in its basic form suffers from the fact that it becomes unstable and does not converge. We saw, however, that convergence is possible when the number of processors is high enough and the size of each time block is small enough. Even then, the obtained speedup is still relatively small. Therefore some components of the algorithm shall be investigated and a new multilevel parareal algorithm
is proposed here.
4.1 Choice of the force splitting for the waveform relaxation
The choice of the function splitting for the WR method is likely to have an influence on the speed of the method. Therefore three different force splittings will be
compared with respect to their rate of convergence and the overall speed of the
resulting method. We will use the Picard splitting which was probably the first
splitting that was used for WR-like methods. We will also test a diagonal Newton approach for the modified right-hand side. These two splittings will be compared to the splitting that was used in the previous chapter. The mentioned splittings are defined as follows (a short code sketch after the list illustrates all three):
• Picard iteration:
  $$F(x, y, t) = f(y, t)$$
• Diagonal Newton WR iteration:
  $$F(x, y, t) = f(y, t) + \mathrm{diag}(J_f(y, t))\,(x - y)$$
• The force splitting from chapter 3 (atom-wise Waveform Newton):
  $$F(x, y, t) = f(y, t) + \tilde{J}_f(y, t)\,(x - y)$$
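A compact sketch of the three splittings, written as Python factories, is shown below. The helpers jac(y, t), returning the full Jacobian, and jac_blocks(y, t), returning the per-atom 2×2 blocks, are assumed to exist and are only placeholders for whatever Jacobian evaluation is available.

import numpy as np

def picard_splitting(f):
    # F(x, y, t) = f(y, t)
    return lambda x, y, t: f(y, t)

def diagonal_newton_splitting(f, jac):
    # F(x, y, t) = f(y, t) + diag(J_f(y, t))(x - y); jac is an assumed helper
    def F(x, y, t):
        return f(y, t) + np.diag(jac(y, t)) * (x - y)
    return F

def atom_wise_newton_splitting(f, jac_blocks):
    # F(x, y, t) = f(y, t) + J~f(y, t)(x - y); jac_blocks returns shape (n, 2, 2)
    def F(x, y, t):
        d = (x - y).reshape(-1, 2)
        return f(y, t) + np.einsum('nij,nj->ni', jac_blocks(y, t), d).reshape(-1)
    return F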
4.2 Improving the parareal algorithm

4.2.1 Using windowing
When applying the parareal algorithm we saw that the algorithm became unstable
when the size of each time block is too large. Furthermore, theorem 4 suggests that
using smaller time blocks may be beneficial for convergence. Therefore it is often
not appropriate to apply the parareal method to the whole time domain. Instead,
one can solve the differential equation on a part of the time domain (on a window)
as shown in figure 4.1. As soon as the problem has been solved completely on
one window, the solution on the next window starts, using the final solution of the
previous window as the initial condition.
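The windowing idea can be summarized by the following short sketch, which reuses the serial parareal routine sketched in chapter 3; the window and block counts are illustrative parameters, not the values used in the experiments.

def parareal_with_windowing(y0, G, F, n_blocks, n_windows, max_iter, tol=1e-4):
    """Sketch: apply parareal window by window (figure 4.1, bottom)."""
    u = y0
    states_per_window = []
    for w in range(n_windows):
        # Solve one window of n_blocks time blocks starting from u,
        # using the serial parareal sketch from chapter 3.
        states = parareal(u, G, F, n_blocks, max_iter, tol)
        states_per_window.append(states)
        u = states[-1]          # final value becomes the next initial condition
    return states_per_window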
Figure 4.1. Top: Using parareal on the whole domain; Bottom: Using parareal with windowing
4.2.2 Choice of the coarse operators
The choice of the coarse operator in each parareal-like algorithm is very important
in order to obtain fast convergence. If the coarse operator is too inaccurate, a lot
of iterations are needed for convergence. If it is too expensive, on the other hand, it will also degrade the parallel efficiency. To construct coarse solvers
we basically have four different possibilities:
1. Use a bigger step-size than for the fine operator.
2. Use another time-stepping method that has a lower order and which is therefore faster.
3. Simplify the underlying physics.
4. Use an iterative solver and do only a limited number of iterations. This can
be done if the WR algorithm or a comparable iterative method is used for the
time-stepping.
Ideas 1 and 2 are unfortunately not very practical when one tries to solve MD
problems. Idea 1 will not work as one is usually already using the biggest possible
time step for which the system is stable. Using a bigger time step will therefore result
in an unstable method unless one uses implicit methods. But in [31] it was shown
that the non-linear equation solvers that are needed for implicit time-stepping do not
converge when the time steps become only slightly larger. Idea 2 will not work
either, as we are using the Verlet algorithm which is a quite cheap and explicit
time stepping scheme. Furthermore, the Verlet algorithm has much better stability
properties than for example the explicit Euler scheme. Constructing methods that
are faster than Verlet and similarly stable may therefore be very difficult.
The easiest way to simplify the underlying physics is to simply make the cutoff
radius smaller. This has of course some influence on the solution in the long term
but on shorter time scales the solutions are sufficiently similar. Therefore we will
test coarse solvers which use 0.5rcut and 0.25rcut as the cutoff radius. Another idea
is to use a completely different potential. Crack propagation has already been done
in [28] using the Lennard-Jones potential. For sufficiently small time intervals the
solution that one obtains using the Lennard-Jones potential may be close enough
to the original solution. Therefore the Lennard-Jones potential is used to construct
a coarse solver by setting ε = 1 and σ ≈ 0.89 in this potential¹. There are also
several different approaches available to construct coarse grained approximations.
Deriving these approximations is usually quite complex and would go beyond the
scope of this thesis, therefore this approach is not tested here.
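As an illustration of such a simplified-physics coarse solver, the sketch below evaluates Lennard-Jones forces with the fitted parameters ε = 1 and σ ≈ 0.89 and a freely chosen cutoff radius. The naive O(n²) pair loop and the value of rcut are assumptions made only for this example; a real coarse solver would of course reuse the existing neighbour handling.

import numpy as np

def lennard_jones_forces(pos, eps=1.0, sigma=0.89, rcut=2.0):
    """Sketch of a cheap coarse-model force evaluation (2D positions, shape (n, 2))."""
    n = pos.shape[0]
    forces = np.zeros_like(pos)
    for i in range(n):
        for j in range(i + 1, n):
            rij = pos[j] - pos[i]
            r = np.linalg.norm(rij)
            if 0.0 < r < rcut:
                sr6 = (sigma / r) ** 6
                # -dV/dr for V(r) = 4*eps*((sigma/r)^12 - (sigma/r)^6)
                fmag = 24.0 * eps * (2.0 * sr6 ** 2 - sr6) / r
                fij = fmag * rij / r          # force on particle j from particle i
                forces[j] += fij
                forces[i] -= fij
    return forces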
Finally, we use only a few iterations of the WR algorithm as the coarse solver.
In that case we hope that the solution obtained by the WR method is sufficiently
close to the actual solution.
¹ These values were obtained by fitting the Lennard-Jones forces to the Finnis-Sinclair forces.
4.2.3 A multilevel parareal algorithm
As shown at the end of chapter 2 (equation (2.15)), the speedup is limited by the speed of the coarse operator. Furthermore, it is quite probable that the number of parareal iterations will increase when we use a G which is less accurate. Thus we have a big problem now: We cannot use an arbitrarily coarse G to obtain a better bound for the speedup in equation (2.15), because this will probably require many more parareal iterations, which negates the positive effects of a faster G.
Therefore a multilevel parareal algorithm will be derived here. Figure 4.2 shows the two steps of the classical parareal algorithm: First one calculates $\mathcal{F}(U_{j-1}^{k})$ in parallel for all j. The second step is to sequentially apply the coarse solver G in order to calculate $U_j^{k+1}$ for all j. In the multilevel algorithm we do not apply G sequentially on the whole time domain. Instead, G is only applied sequentially on some parts of the domain. In order to propagate the solution from earlier time steps to the later ones, even coarser solvers should be used. Figure 4.3 illustrates the idea in the case that one uses three different solvers F, G and C. By doing this we hope to get convergence rates which are not much worse than for the classical parareal algorithm. If this is the case we can get much better performance on a lot of cores as the amount of sequential calculations is much lower now.
Figure 4.2. Sketch of the classical parareal algorithm, dashed lines represent dependencies
Figure 4.3. Sketch of the 3-grid parareal algorithm, dashed lines represent dependencies

The foundation for this algorithm is the work in [14] where parareal was motivated as a multigrid algorithm with only two levels. Here, this approach is used
to extend parareal to more levels. This makes it possible to introduce additional
parallelism into the algorithm. The possibility to construct multilevel parareal algorithms has already been pointed out for example in [14] but as far as I know there
are no papers available that deal with such an algorithm for non-linear ODEs. For
linear ODEs such an algorithm has been presented in [13] which showed quite similar
convergence rates for the classical and the multigrid parareal algorithm. A somewhat similar approach has also been used in [17] to create a multilevel time-parallel
predictor-corrector method for computational fluid dynamics.
Parareal as a 2-grid method
First, we will derive the classical parareal algorithm as a multigrid method: To do this we have to define a coarseness constant $c \in \mathbb{N}$, $c > 1$, equidistant time steps $t_0, t_1, \dots, t_{cm}$ and vectors $u_0, u_1, \dots, u_{cm}$. Furthermore, we need a fine solver F and a coarse solver G. $\mathcal{F}(u_i)$ denotes here the approximate solution of
$$\dot{y} = f(y, t), \quad t \in [t_i, t_{i+1}], \quad y(t_i) = u_i$$
at time $t_{i+1}$ using the F propagator. $\mathcal{G}(u_i)$ is defined in a similar way, namely as the approximate solution of
$$\dot{y} = f(y, t), \quad t \in [t_i, t_{i+c}], \quad y(t_i) = u_i$$
at time $t_{i+c}$ using the G propagator.
Similar to equation (2.9) we can now represent the sequential application of F to the basic equation (1.4) as a non-linear system of equations:
$$\underbrace{\begin{pmatrix} u_0 \\ u_1 - \mathcal{F}(u_0) \\ \vdots \\ u_{cm} - \mathcal{F}(u_{cm-1}) \end{pmatrix}}_{A(u)} = \underbrace{\begin{pmatrix} y_0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}}_{b} \tag{4.1}$$
This nonlinear system A(u) = b can easily be solved by using sequential time-stepping. In order to introduce parallelism, however, the system should now be solved by a non-linear multigrid approach called FAS (full approximation scheme). This means that one tries to approximate the exact solution $u^*$ by an iterative scheme. Here, $u_i^k$ will denote the approximation of $u_i^*$ in the k-th iteration. The general form of a 2-grid FAS iteration is:
1. Smoothing step:
   $$\tilde{u} = S(u^k, b) \tag{4.2}$$
2. Solve a coarse problem:
   $$\tilde{A}(U) = B \tag{4.3}$$
   where
   $$B = R(b - A(\tilde{u})) + \tilde{A}(R(\tilde{u})) \tag{4.4}$$
3. Correction step:
   $$u^{k+1} = \tilde{u} + P(U - R(\tilde{u})) \tag{4.5}$$
In this scheme, S is the so-called smoothing operator, R the restriction operator, P the prolongation operator and $\tilde{A}(U)$ is the coarse version of A(u). The paper [14] uses the following definitions to obtain the parareal scheme: The smoothing operation $\tilde{u} = S(u^k, b)$ is defined by
$$\tilde{u}_i = u_i^k, \qquad \exists j \in \mathbb{N}_0 : i = cj$$
$$\tilde{u}_i = \mathcal{F}(\tilde{u}_{i-1}) + b_i, \qquad \text{otherwise}$$
which means that sequential time-stepping is used inside a block of c time-points
while the initial values of each time block remain unchanged. The restriction and
prolongation operations are given by:
$$R(u) = \begin{pmatrix} u_0 \\ u_c \\ \vdots \\ u_{cm} \end{pmatrix}, \qquad P(U) = \begin{pmatrix} U_0 \\ 0_{c-1} \\ U_1 \\ 0_{c-1} \\ \vdots \\ 0_{c-1} \\ U_m \end{pmatrix},$$
where $0_{c-1}$ is the zero-vector of size $c - 1$. Finally we have to specify the coarse
problem $\tilde{A}(U)$. Using G instead of F, we define $\tilde{A}(U)$ as
$$\tilde{A}(U) = \begin{pmatrix} U_0 \\ U_1 - \mathcal{G}(U_0) \\ \vdots \\ U_m - \mathcal{G}(U_{m-1}) \end{pmatrix}. \tag{4.6}$$
The normal parareal algorithm solves the nonlinear equation $\tilde{A}(U) = B$ on the coarse grid exactly by applying G sequentially. Then one can show that each of these FAS iterations is equal to one parareal iteration:
First we have to calculate the vector B in the coarse equation $\tilde{A}(U) = B$:
$$B = R(b - A(\tilde{u})) + \tilde{A}(R(\tilde{u})) = \begin{pmatrix} b_0 - \tilde{u}_0 + \tilde{u}_0 \\ b_c - \tilde{u}_c + \mathcal{F}(\tilde{u}_{c-1}) + \tilde{u}_c - \mathcal{G}(\tilde{u}_0) \\ \vdots \\ b_{cm} - \tilde{u}_{cm} + \mathcal{F}(\tilde{u}_{cm-1}) + \tilde{u}_{cm} - \mathcal{G}(\tilde{u}_{c(m-1)}) \end{pmatrix} = \begin{pmatrix} b_0 \\ b_c + \mathcal{F}(\tilde{u}_{c-1}) - \mathcal{G}(\tilde{u}_0) \\ \vdots \\ b_{cm} + \mathcal{F}(\tilde{u}_{cm-1}) - \mathcal{G}(\tilde{u}_{c(m-1)}) \end{pmatrix}.$$
Solving $\tilde{A}(U) = B$ exactly by using the coarse operator G we get
$$U = \begin{pmatrix} B_0 \\ B_1 + \mathcal{G}(U_0) \\ \vdots \\ B_m + \mathcal{G}(U_{m-1}) \end{pmatrix} = \begin{pmatrix} b_0 \\ \mathcal{F}(\tilde{u}_{c-1}) - \mathcal{G}(\tilde{u}_0) + \mathcal{G}(U_0) + b_c \\ \vdots \\ \mathcal{F}(\tilde{u}_{cm-1}) - \mathcal{G}(\tilde{u}_{c(m-1)}) + \mathcal{G}(U_{m-1}) + b_{cm} \end{pmatrix}.$$
Doing the final step $u^{k+1} = \tilde{u} + P(U - R(\tilde{u}))$ yields:
$$u_0^{k+1} = b_0$$
$$u_i^{k+1} = \tilde{u}_i + P(U - R(\tilde{u}))_i = \tilde{u}_i = \mathcal{F}(\tilde{u}_{i-1}) + b_i, \qquad \nexists j \in \mathbb{N} : i = jc$$
$$u_i^{k+1} = \tilde{u}_i + P(U - R(\tilde{u}))_i = P(U)_i = U_{i/c} = \mathcal{F}(\tilde{u}_{i-1}) - \mathcal{G}(\tilde{u}_{i-c}) + \mathcal{G}(U_{i/c-1}) + b_i, \qquad \text{otherwise.}$$
To obtain the classical parareal algorithm we now have to set c = 1. In that case we also have that $b_0 = y_0$ and $b_i = 0$, $i \neq 0$. By using this information we get that $\tilde{u}_i = u_i^k$ and therefore
$$u_0^{k+1} = b_0$$
$$u_i^{k+1} = \tilde{u}_i + P(U - R(\tilde{u}))_i = P(U)_i = U_i = \mathcal{F}(\tilde{u}_{i-1}) - \mathcal{G}(\tilde{u}_{i-1}) + \mathcal{G}(U_{i-1}) = \mathcal{F}(u_{i-1}^k) - \mathcal{G}(u_{i-1}^k) + \mathcal{G}(u_{i-1}^{k+1}).$$
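To make the FAS view of parareal concrete, the following Python sketch performs one 2-grid iteration with the smoothing, restriction, prolongation and coarse-solve steps spelled out. The state representation and the propagators F and G are placeholders, and for c = 1 the routine reduces to the classical parareal update derived above.

def fas_parareal_iteration(u, b, F, G, c):
    """One 2-grid FAS iteration for A(u) = b (sketch).

    u : list of states u_0 .. u_{cm} (current iterate), b : right-hand side,
    F : fine propagator over one fine step, G : coarse propagator over one block,
    c : coarseness constant (c = 1 reproduces the classical parareal update).
    """
    n = len(u) - 1                                   # n = c*m fine time points
    # 1. Smoothing: sequential fine stepping inside each block; the block
    #    initial values u_{cj} remain unchanged.
    ut = [v.copy() for v in u]
    for i in range(1, n + 1):
        if i % c != 0:
            ut[i] = F(ut[i - 1]) + b[i]
    # 2. Coarse right-hand side B = R(b - A(ut)) + A~(R(ut)), cf. the derivation above.
    coarse_idx = list(range(0, n + 1, c))
    B = [b[0]] + [b[i] + F(ut[i - 1]) - G(ut[i - c]) for i in coarse_idx[1:]]
    # 3. Solve A~(U) = B exactly by sequential coarse stepping.
    U = [B[0]]
    for j in range(1, len(coarse_idx)):
        U.append(B[j] + G(U[j - 1]))
    # 4. Correction u^{k+1} = ut + P(U - R(ut)): inject the coarse values.
    for j, i in enumerate(coarse_idx):
        ut[i] = U[j]
    return ut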
Extension to a multigrid method
Now versions with more than two levels shall be derived. This means that the coarse
problem Ã(U ) = B in equation (4.3) will not be solved exactly. Instead, we provide
an initial guess for this coarser level and apply recursively one multigrid-parareal
iteration to this coarse problem.
First, we define the number of different levels by L where level 1 is the finest and
L the coarsest level. Then we have to define fine and coarse time-stepping schemes
Fl and Gl for each level 1, . . . , L − 1. We must also enforce that Fl = Gl−1 , l > 1,
i.e. the coarse propagator on level l must equal the fine propagator on level l + 1.
Furthermore, coarseness constants c1 , . . . , cL−1 for the levels are needed and we set
c1 = 1. Finally we denote the function A and the vector b on level l by Al and bl .
Now we are going to derive the update formulas for each level l where l < L − 1. On every level l we need an initial guess $U_i^{k,l}$, $i = 1, \dots, c_l m_l$, in each iteration k. Then we have to apply the smoothing operator $S(U^{k,l})$ to these initial values:
$$\tilde{U}_i^l = U_i^{k,l}, \qquad \exists j \in \mathbb{N}_0 : i = c_l j$$
$$\tilde{U}_i^l = \mathcal{F}_l(\tilde{U}_{i-1}^l) + b_i^l, \qquad \text{otherwise.}$$
By using this we can calculate the vector $b^{l+1}$ which is needed for the evaluation on the coarser grid:
$$b^{l+1} = R(b^l - A^l(\tilde{U}^l)) + A^{l+1}(R(\tilde{U}^l)) = \begin{pmatrix} b_0^l - \tilde{U}_0^l + \tilde{U}_0^l \\ b_{c_l}^l - \tilde{U}_{c_l}^l + \mathcal{F}_l(\tilde{U}_{c_l-1}^l) + \tilde{U}_{c_l}^l - \mathcal{G}_l(\tilde{U}_0^l) \\ \vdots \\ b_{c_l m_l}^l - \tilde{U}_{c_l m_l}^l + \mathcal{F}_l(\tilde{U}_{c_l m_l-1}^l) + \tilde{U}_{c_l m_l}^l - \mathcal{G}_l(\tilde{U}_{c_l(m_l-1)}^l) \end{pmatrix}$$
$$= \begin{pmatrix} b_0^l \\ b_{c_l}^l + \mathcal{F}_l(\tilde{U}_{c_l-1}^l) - \mathcal{G}_l(\tilde{U}_0^l) \\ \vdots \\ b_{c_l m_l}^l + \mathcal{F}_l(\tilde{U}_{c_l m_l-1}^l) - \mathcal{G}_l(\tilde{U}_{c_l(m_l-1)}^l) \end{pmatrix} = \begin{pmatrix} b_0^l \\ b_{c_l}^l + \mathcal{F}_l(\tilde{U}_{c_l-1}^l) - \mathcal{G}_l(U_0^{k,l}) \\ \vdots \\ b_{c_l m_l}^l + \mathcal{F}_l(\tilde{U}_{c_l m_l-1}^l) - \mathcal{G}_l(U_{c_l(m_l-1)}^{k,l}) \end{pmatrix}.$$
The value $b^{l+1}$ is now used on the next level as the coarse right-hand side, and $\tilde{U}_0^l, \tilde{U}_{c_l}^l, \dots, \tilde{U}_{c_l m_l}^l = U_0^{k,l}, U_{c_l}^{k,l}, \dots, U_{c_l m_l}^{k,l}$ are used as the initial condition on the coarse grid $U_0^{k,l+1}, U_1^{k,l+1}, \dots, U_{m_l}^{k,l+1}$. The evaluation on the coarse grid will return updated values $U_0^{k+1,l+1}, U_1^{k+1,l+1}, \dots, U_{m_l}^{k+1,l+1}$ which are now used to calculate the next iterate:
$$U_i^{k+1,l} = U_{i/c_l}^{k+1,l+1}, \qquad \exists j \in \mathbb{N} : i = c_l j$$
$$U_i^{k+1,l} = \tilde{U}_i^l = \mathcal{F}_l(\tilde{U}_{i-1}^l) + b_i^l = \mathcal{F}_l(U_{i-1}^{k,l}) + b_i^l, \qquad \exists j \in \mathbb{N} : i = c_l j + 1$$
$$U_i^{k+1,l} = \tilde{U}_i^l = \mathcal{F}_l(\tilde{U}_{i-1}^l) + b_i^l = \mathcal{F}_l(U_{i-1}^{k+1,l}) + b_i^l, \qquad \text{otherwise} \tag{4.7}$$
If we are on level L − 1 then the coarse problem $A^{l+1}(U) = b^{l+1}$ is solved exactly. If we use $U^{*,l+1}$ to describe the exact solution of the coarse problem we obtain
$$U_i^{k+1,l} = U_{i/c_l}^{*,l+1} = \mathcal{G}_l(U_{i-c_l}^{k+1,l}) + b_{i/c_l}^{l+1} = \mathcal{G}_l(U_{i-c_l}^{k+1,l}) + b_i^l + \mathcal{F}_l(U_{i-1}^{k,l}) - \mathcal{G}_l(U_{i-c_l}^{k,l}), \qquad \exists j \in \mathbb{N} : i = c_l j$$
$$U_i^{k+1,l} = \tilde{U}_i^l = \mathcal{F}_l(\tilde{U}_{i-1}^l) + b_i^l = \mathcal{F}_l(U_{i-1}^{k,l}) + b_i^l, \qquad \exists j \in \mathbb{N} : i = c_l j + 1$$
$$U_i^{k+1,l} = \tilde{U}_i^l = \mathcal{F}_l(\tilde{U}_{i-1}^l) + b_i^l = \mathcal{F}_l(U_{i-1}^{k+1,l}) + b_i^l, \qquad \text{otherwise} \tag{4.8}$$
Finally, these values are passed to the finer level where they will serve as the coarse
grid solution.
Parallel implementation of the multigrid method
The question now is in which way the processors should cooperate. The idea that
was used here is to assign a number of processors Pl to each level l. After choosing
the number of processors P1 for the finest level, the values Pl for the levels 2, . . . , L−1
were defined by
$$P_i = P_{i-1}\, c_{i-1}, \qquad i = 2, \dots, L - 1.$$
We will also denote the $P_l$ processors on level l by $p_0^l, \dots, p_{P_l-1}^l$. At the beginning of each iteration k all processors $p_i^l$ are supposed to have the vector $U_i^{k,l}$ locally available. The next thing that is done is to split the calculation of $b^{l+1}$ into two parts. On level l the parts
$$\tilde{b}_{i c_l}^l = b_{i c_l}^l + \mathcal{F}_l(\tilde{U}_{i c_l - 1}^l), \qquad \forall i = 0, \dots, m_l$$
are calculated. The calculation of the missing $\mathcal{G}_l(U_{(i-1)c_l}^{k,l}) = \mathcal{F}_{l+1}(U_{(i-1)c_l}^{k,l})$ is moved to the level l + 1 where this value is already known from the previous iteration. Due to this we can avoid the explicit calculation of $\mathcal{G}_l(U_{(i-1)c_l}^{k,l})$ on level l. By using this and the equations (4.7) and (4.8) we get the operations that each processor $p_i^l$ for l < L − 1 has to do. All different cases are listed below in algorithm 3:
Algorithm 3 Updates for processor $p_i^l$
if l = 1 then
    $\tilde{b}_{i+1}^{l+1} = \mathcal{F}_l(U_i^{k,l})$
else if l = L − 1 and ∃j ∈ ℕ : i + 1 = j c_l then
    $U_{i+1}^{k+1,l} = \mathcal{G}_l(U_{i-c_l+1}^{k+1,l}) + \mathcal{F}_l(U_i^{k+1,l}) - \mathcal{F}_l(U_i^{k,l}) - \mathcal{G}_l(U_{i-c_l+1}^{k,l}) + \tilde{b}_{i+1}^l$
else if c_l = 1 then
    $\tilde{b}_{(i+1)/c_l}^{l+1} = \mathcal{F}_l(U_i^{k,l}) - \mathcal{F}_l(U_i^{k,l}) + \tilde{b}_{i+1}^l = \tilde{b}_{i+1}^l$
else
    if ∃j ∈ ℕ : i + 1 = j c_l then
        $\tilde{b}_{(i+1)/c_l}^{l+1} = \mathcal{F}_l(U_i^{k+1,l}) - \mathcal{F}_l(U_i^{k,l}) + \tilde{b}_{i+1}^l$
    else if ∃j ∈ ℕ : i = j c_l then
        $U_{i+1}^{k+1,l} = \mathcal{F}_l(U_i^{k,l}) - \mathcal{F}_l(U_i^{k,l}) + \tilde{b}_{i+1}^l = \tilde{b}_{i+1}^l$
    else
        $U_{i+1}^{k+1,l} = \mathcal{F}_l(U_i^{k+1,l}) - \mathcal{F}_l(U_i^{k,l}) + \tilde{b}_{i+1}^l$
    end if
end if
If we use these updates and the fact that each processor $p_i^l$ holds $U_i^{k,l}$ we get the following parallel scheme:
Algorithm 4 Parallel multilevel parareal algorithm for processor $p_i^l$
Require: Initial value $U_i^{0,l}$
while not converged do
    Receive $U_i^{k+1,l}$ from $p_{i-1}^l$ if this value is needed by algorithm 3
    Receive $U_{i-c_l+1}^{k+1,l}$ from $p_{i-c_l}^l$ if this value is needed by algorithm 3
    Receive $\tilde{b}_{i+1}^l$ from $p_{c_{l-1} i}^{l-1}$ if l > 1
    Calculate the new values according to algorithm 3
    if ∄j : i + 1 = j c_l then
        Send $U_{i+1}^{k+1,l}$ to $p_{i+1}^l$
    else if l < L − 1 then
        Send $\tilde{b}_{(i+1)/c_l}^{l+1}$ to $p_{(i+1)/c_l - 1}^{l+1}$
    end if
    if l = L − 1 and ∃j : i + 1 = j c_l then
        Send $U_{i+1}^{k+1,l}$ to $p_{i+c_l}^l$
    end if
    if l < L − 1 and ∃j : i = j c_l then
        Receive $U_{i/c_l}^{k+1,l+1}$ from $p_{i/c_l}^{l+1}$
        $U_i^{k+1,l} = U_{i/c_l}^{k+1,l+1}$
    end if
end while
The stopping criterion was implemented in a similar fashion as for the classical
parareal algorithm. This time we simply apply the stopping criterion only to the
coarsest level. This means that we can stop the calculation as soon as
$$\|U_j^{k,L-1} - U_j^{k-1,L-1}\| < \varepsilon, \qquad \forall j.$$
Tests showed that ε has to be chosen as ε = 10−4 to achieve similar accuracy as
sequential time-stepping, i.e. Edev ≤ 6 · 10−6 .
Chapter 5
Evaluation
5.1 Choice of the force splitting for the waveform relaxation
Here we have a look at the three different force splittings that were presented in
the previous chapter. Again, we will test them by solving the MD problem on the
interval [0, 1] using the implicit midpoint rule as the time-stepping scheme.
Table 5.1 shows the average number of iterations per window and the corresponding wall-clock time on one processor. As we can see, none of these force splittings is satisfying. Using the Picard splitting, each iteration is quite cheap, as we do not have to calculate parts of the Jacobian matrix in contrast to the other two splittings. While being cheap, this version of the WR algorithm does not converge for bigger time steps. Instead, the solution becomes unstable for step sizes as small as 5 · 10⁻². The diagonal Newton approach on the other hand behaves very similarly to the atom-wise Newton splitting in chapter 3.
As the window sizes are still very small we do not gain much from using more
cores. Even if 100 cores are used, none of these force splittings is more than 20%
faster than the Verlet algorithm on one single processor. Furthermore we still have
the problem that the WR algorithm does not conserve the energy very well which
leads to very inaccurate results in the long run.
algorithm           step size    window size    average number of iterations    wall-clock time in s
Atom-wise Newton    1 · 10⁻²     1 · 10⁻²       4.0                             1159
Atom-wise Newton    1 · 10⁻²     5 · 10⁻²       6.0                             1041
Atom-wise Newton    1 · 10⁻²     10 · 10⁻²      8.0                             1283
Atom-wise Newton    5 · 10⁻²     5 · 10⁻²       10.6                            617
Atom-wise Newton    5 · 10⁻²     10 · 10⁻²      15.5                            664
Atom-wise Newton    5 · 10⁻²     20 · 10⁻²      23.6                            860
Atom-wise Newton    5 · 10⁻²     50 · 10⁻²      no convergence                  -
Atom-wise Newton    10 · 10⁻²    10 · 10⁻²      no convergence                  -
Picard              1 · 10⁻²     1 · 10⁻²       5.8                             479
Picard              1 · 10⁻²     5 · 10⁻²       11.6                            570
Picard              1 · 10⁻²     10 · 10⁻²      18                              808
Picard              5 · 10⁻²     5 · 10⁻²       unstable                        -
Picard              10 · 10⁻²    10 · 10⁻²      unstable                        -
Diagonal Newton     1 · 10⁻²     1 · 10⁻²       4.0                             1114
Diagonal Newton     1 · 10⁻²     5 · 10⁻²       6.0                             1000
Diagonal Newton     1 · 10⁻²     10 · 10⁻²      8.0                             1233
Diagonal Newton     5 · 10⁻²     5 · 10⁻²       11.0                            618
Diagonal Newton     5 · 10⁻²     10 · 10⁻²      16.2                            680
Diagonal Newton     5 · 10⁻²     20 · 10⁻²      25.2                            882
Diagonal Newton     5 · 10⁻²     50 · 10⁻²      unstable                        -
Diagonal Newton     10 · 10⁻²    10 · 10⁻²      no convergence                  -

Table 5.1. Average number of iterations for different window sizes and step sizes

5.2 Improving the parareal algorithm

5.2.1 Using windowing

Now the influence of windowing is investigated using the same solvers as in the evaluation in chapter 3. Here, the solution was not calculated on the whole time interval at once. Instead, windows of size P ∆T for varying ∆T were used. By looking at figure 5.1 we can see the positive influence of windowing on the convergence rate. This figure shows us that the number of iterations required to fulfill the stopping criterion becomes much smaller, which should give us better performance according to equation (2.15). As shown in figure 5.2 this is indeed the case, giving
us up to 50% better performance for P = 192. It also makes it possible to get
convergence if P is smaller.
5.2.2 Choice of the coarse operators
Here, several different coarse solvers G for the molecular dynamics problem will be
compared. These solvers have already been motivated in the previous chapter and
are:
• Using a smaller cutoff radius (0.5rcut and 0.25rcut )
• Using the Lennard-Jones potential
• Using only a few WR iterations. After looking at table 5.1 the Picard iteration
with step size 1 · 10−2 and window size 1 · 10−2 was chosen as it is the fastest
solver. The number of WR iterations will be limited to only two iterations
here. As this coarse solver would still be slower than F according to table 5.1
we will use a smaller cutoff radius 0.5rcut , too.
Figure 5.1. Necessary iterations for the parareal algorithm on each window for different ∆T and P = 192
We will also apply the windowing technique with ∆T = 10⁻², as these coarse operators may only work on a limited time window. Figure 5.3 shows the numbers of iterations that were necessary in each window for P = 48. We can see that the convergence is quite slow when we are using the Lennard-Jones potential, at least compared to the other cases. One can demonstrate that the solutions that one obtains by using the Lennard-Jones potential differ quite strongly from the solution using the Finnis-Sinclair potential. The difference will be larger if the used window size increases. If the Lennard-Jones potential is used, the parareal algorithm also becomes unstable if P ≳ 50, as the window size P · 10⁻² is larger in this case.
If we use one of the other coarse solvers the convergence is quite fast. But
the performance of the parareal algorithm does not only depend on the speed of
convergence but also on the cost of the coarse solver. As we can see in table 5.2 the
speedup is very small when we use the two WR iterations as a coarse solver even
though convergence is really fast. The reason for this is that the cost of this coarse
solver is very high and we almost have CF = CG in this case.
coarse solver               Speedup
smaller cutoff 0.5rcut      2.6
smaller cutoff 0.25rcut     5.5
Lennard-Jones               0.8
WR                          1.2

Table 5.2. Achieved speedups for different coarse time-stepping schemes and P = 48
Figure 5.2. Achieved speedup for the parareal algorithm for different ∆T
Figure 5.3. Necessary iterations for different coarse time-stepping schemes
This means that we basically only have the possibility to use smaller cutoff radii to obtain useful coarse solvers. We also cannot use cutoff radii smaller than 0.25rcut, as the resulting coarse solver becomes unstable in this case. For the cutoff radii 0.5rcut and 0.25rcut the speedup for different P is plotted in figure 5.4. By using the less accurate coarse operator with 0.25rcut we are able to obtain a speedup as high as 10, which is much more than the speedup of 2.6 that we achieved at the end of chapter 3.
Figure 5.4. Speedup for cutoff radii 0.5rcut and 0.25rcut
5.2.3 A multilevel parareal algorithm
Due to the lack of suitable coarse time-integrators, only the 3-grid method is used.
All used time-stepping schemes are based on the Verlet integrator with step size
10−2 . We will also use windowing with ∆T = 10−2 as in the previous section. The
first coarse operator G1 = F2 is the same as G in chapter 3, i.e. we use 0.5rcut as the
cutoff radius. The even coarser operator G2 has been constructed by halving the
cutoff radius again, as in the previous section about coarse operators. From now on
we will always compare the classical parareal algorithm on P cores with the 3-grid
algorithm using P1 = P cores on the fine level, i.e. 2P cores in total.
First we will have a look at the number of iterations when P = 192. Figure 5.5
shows us the number of iterations in the following four cases:
• Classical parareal algorithm using G = G1 , i.e. the coarse solver uses 0.5rcut
as the cutoff radius.
• Classical parareal algorithm using G = G2 , i.e. the coarse solver uses 0.25rcut
as the cutoff radius.
• Multilevel parareal algorithm using the solvers as specified above and the
constants c1 = 1 and c2 = 2.
• Multilevel parareal algorithm using the solvers as specified above and the
constants c1 = 1 and c2 = 8.
We can see that the convergence speed of the multilevel algorithm is in both cases
not faster than for the classical parareal algorithm using G2 as the coarse solver.
This was unfortunately the requirement for faster calculations and therefore it is
very likely that our algorithm will not be faster than the classical parareal algorithm.
This is shown in figure 5.6 where we compare the speedup of the multilevel algorithm
with the speedup of the classical parareal algorithm.
In that case the multilevel algorithm is slightly slower than the classical parareal
algorithm with G = G2 even though the number of iterations was very similar in
figure 5.5. A possible explanation is the higher number of communications that are
necessary in the multilevel algorithms as the processors on different levels have to
exchange data. We should also not forget that the multilevel algorithm here uses twice as many processors as the classical parareal algorithm.
Figure 5.5. Necessary iterations for the classical parareal algorithm for P = 192
Figure 5.6. Speedup for the multilevel and the classical parareal algorithm
Chapter 6
Conclusions
This thesis showed that it is possible to parallelize at least the previously presented MD problem in time. In contrast to a lot of other papers, my evaluation was not based on a very simple toy problem but on a more realistic example. The more promising way to achieve time-parallelization for this problem seems to be
the parareal algorithm which gave us a maximal speedup of 10 here. Even higher
speedups may be possible if one can create even faster coarse solvers G that do not
lead to instabilities.
As equation (2.15) limited the speedup for a high number of cores, a multilevel
parareal algorithm was derived to avoid this limit. This approach was not very
successful as the convergence was not fast enough to obtain a better speedup. Short
tests were also done with the Lorenz attractor using the setup in [16]. During these
tests only the convergence speed of the algorithms was analyzed and not the overall performance. These experiments showed that the multilevel algorithm with the coarse solver G_{L−1} can converge much faster than the classical parareal algorithm with coarse time-stepping G = G_{L−1} if and only if G is a very inaccurate approximation of F.
By investing more time into the development of even coarser solvers for this MD
problem one can test if this also holds true for this problem.
Future work may also include a convergence analysis for the multilevel parareal
algorithm as well as advanced scheduling strategies. It is also necessary to investigate how the parareal algorithm behaves when additional spatial parallelization
is used to increase the speed of the fine and coarse solvers. In this thesis F and G
were calculated using only one processor, which made their calculation slow. Using several cores will increase the speed of these time-stepping schemes, but the communication costs in the parareal algorithm will have a bigger influence when the cost
for the time-stepping declines. This is especially an issue in the multilevel parareal
algorithm which requires more communication between the processors.
The WR algorithm seems quite useless for this kind of problem. Slow convergence on bigger time intervals or no convergence at all makes this algorithm
non-competitive when compared to normal time-stepping. This may be different
for other kinds of MD problems as shown in [33]. In this paper the waveform
relaxation was shown to be almost as fast as sequential application of the Verlet
algorithm in cases where special harmonic potentials force the Verlet algorithm to
do very small time steps in order to retain stability.
We conclude that the parareal algorithm allows us to parallelize the solution
of the MD problem in time. Unfortunately, this algorithm is not really suited for
large-scale parallelization as the speedup is limited even if the time domain is very
large. The derived multilevel algorithm was not able to overcome this limit but it
might still be useful for other problems.
Bibliography
[1] CRESTA homepage. http://cresta-project.eu/.
[2] Lindgren specification. http://www.pdc.kth.se/resources/computers/lindgren/hardware.
[3] Titan homepage. http://www.olcf.ornl.gov/titan/.
[4] Pierluigi Amodio and Luigi Brugnano. Recent Advances in the Parallel Solution in Time of ODEs. AIP Conference Proceedings, 1048(1):867–870, 2008.
[5] Eric Aubanel. Scheduling of tasks in the parareal algorithm. Parallel Computing, 37(3):172–182, 2011.
[6] L. Baffico, S. Bernard, Y. Maday, G. Turinici, and G. Zérah. Parallel-in-time molecular-dynamics simulations. Phys. Rev. E, 66:057701, Nov 2002.
[7] Satish Balay, Jed Brown, Kris Buschelman, William D. Gropp, Dinesh Kaushik, Matthew G. Knepley, Lois Curfman McInnes, Barry F. Smith, and Hong Zhang. PETSc Web page, 2013. http://www.mcs.anl.gov/petsc.
[8] Morten Bjørhus. A note on the convergence of discretized dynamic iteration. BIT Numerical Mathematics, 35(2):291–296, 1995.
[9] Stephen D. Bond and Benedict J. Leimkuhler. Molecular dynamics and the accuracy of numerically computed averages, 2007.
[10] Kevin Burrage. Parallel methods for systems of ordinary differential equations, 1995.
[11] X. Dai, C. Le Bris, F. Legoll, and Y. Maday. Symmetric parareal algorithms for Hamiltonian systems. ArXiv e-prints, November 2010.
[12] Luca Formaggia, Marzio Sala, and Fausto Saleri. Domain decomposition techniques. In Are Magnus Bruaset and Aslak Tveito, editors, Numerical Solution of Partial Differential Equations on Parallel Computers, volume 51 of Lecture Notes in Computational Science and Engineering, pages 135–163. Springer Berlin Heidelberg, 2006.
[13] S. Friedhoff, R. D. Falgout, T. V. Kolev, S. MacLachlan, and J. B. Schroder. A multigrid-in-time algorithm for solving evolution equations in parallel. https://e-reports-ext.llnl.gov/pdf/705292.pdf, 2012.
[14] M. Gander and S. Vandewalle. Analysis of the parareal time-parallel time-integration method. SIAM Journal on Scientific Computing, 29(2):556–578, 2007.
[15] Martin Gander. New Convergence Results for the Parareal Algorithm Applied to ODEs and PDEs. Presentation slides for the DD16 meeting, found at: www.ddm.org/DD16/Talks/gander.pdf.
[16] Martin J. Gander and Ernst Hairer. Nonlinear convergence analysis for the parareal algorithm. In Ulrich Langer, Marco Discacciati, David E. Keyes, Olof B. Widlund, and Walter Zulehner, editors, Domain Decomposition Methods in Science and Engineering XVII, volume 60 of Lecture Notes in Computational Science and Engineering, pages 45–56. Springer Berlin Heidelberg, 2008.
[17] Izaskun Garrido, Barry Lee, Gunnar E. Fladmark, and Magne S. Espedal. Convergent iterative schemes for time parallelization. Mathematics of Computation, 75(255):1403–1428, 2006.
[18] M. Griebel, S. Knapek, and G. Zumbusch. Numerical Simulation in Molecular Dynamics: Numerics, Algorithms, Parallelization, Applications. Texts in Computational Science and Engineering. Springer, 2010.
[19] G. Horton and S. Vandewalle. A space-time multigrid method for parabolic partial differential equations. SIAM Journal on Scientific Computing, 16(4):848–864, 1995.
[20] G. Horton, S. Vandewalle, and P. Worley. An algorithm with polylog parallel complexity for solving parabolic partial differential equations. SIAM Journal on Scientific Computing, 16(3):531–541, 1995.
[21] Frank Hülsemann, Markus Kowarschik, Marcus Mohr, and Ulrich Rüde. Parallel geometric multigrid. In Are Magnus Bruaset and Aslak Tveito, editors, Numerical Solution of Partial Differential Equations on Parallel Computers, volume 51 of Lecture Notes in Computational Science and Engineering, pages 165–208. Springer Berlin Heidelberg, 2006.
[22] Jan Janssen and Stefan Vandewalle. On SOR Waveform Relaxation Methods. SIAM J. Numer. Anal., 34:2456–2481, 1997.
[23] Claude Le Bris. Computational chemistry from the perspective of numerical analysis. Acta Numerica, 14:363–444, 2005.
[24] E. Lelarasmee, A. E. Ruehli, and A. L. Sangiovanni-Vincentelli. The waveform relaxation method for time-domain analysis of large scale integrated circuits. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 1(3):131–145, July 1982.
[25] Erik R. Lindahl. Molecular dynamics simulations. In Andreas Kukol, editor, Molecular Modeling of Proteins, volume 443 of Methods in Molecular Biology, pages 3–23. Humana Press, 2008.
[26] Jacques-Louis Lions, Yvon Maday, and Gabriel Turinici. Résolution d'EDP par un schéma en temps «pararéel». Comptes Rendus de l'Académie des Sciences - Series I - Mathematics, 332(7):661–668, 2001.
[27] A. S. Nielsen. Feasibility study of the parareal algorithm. Master's thesis, Technical University of Denmark, DTU Informatics, 2012.
[28] Arthur Paskin, A. Gohar, and G. J. Dienes. Computer simulation of crack propagation. Phys. Rev. Lett., 44:940–943, Apr 1980.
[29] Prasenjit Saha, Joachim Stadel, and Scott Tremaine. A parallel integration method for solar system dynamics, 1997.
[30] A. Satoh. Introduction to Practice of Molecular Simulation: Molecular Dynamics, Monte Carlo, Brownian Dynamics, Lattice Boltzmann and Dissipative Particle Dynamics. Elsevier Insights. Elsevier Science, 2010.
[31] Nick Schafer and Dan Negrut. A quantitative assessment of the potential of implicit integration methods for molecular dynamics simulation. Journal of Computational and Nonlinear Dynamics, 5(3), 2010.
[32] P. J. van der Houwen, B. P. Sommeijer, and J. J. B. de Swart. Parallel predictor-corrector methods. Journal of Computational and Applied Mathematics, 66(1–2):53–71, 1996. Proceedings of the Sixth International Congress on Computational and Applied Mathematics.
[33] Haim Waisman and Jacob Fish. A space–time multilevel method for molecular dynamics simulations. Computer Methods in Applied Mechanics and Engineering, 195(44–47):6542–6559, 2006.
[34] David E. Womble. A time-stepping algorithm for parallel computers. SIAM J. Sci. Stat. Comput., 11(5):824–837, September 1990.
[35] Yanan Yu, Ashok Srinivasan, and Namas Chandra. Scalable time-parallelization of molecular dynamics simulations in nano mechanics. In International Conference on Parallel Processing, pages 119–126, 2006.