Dissertação de Doutoramento, "Hybrid Space and Time Domain Decomposition for Parallel Simulation of Unsteady Incompressible Fluid Flows", Instituto Superior Técnico, Universidade Técnica de Lisboa, 2007.
UNIVERSIDADE TÉCNICA DE LISBOA
INSTITUTO SUPERIOR TÉCNICO
Hybrid Space and Time Domain Decomposition
for Parallel Simulation of Unsteady
Incompressible Fluid Flows
Jorge Manuel Fernandes Trindade
(Mestre)
Dissertação para obtenção do Grau de Doutor em
Engenharia Mecânica
Orientador: Doutor José Carlos Fernandes Pereira
Júri
Presidente: Reitor da Universidade Técnica de Lisboa
Vogais: Doutor Paulo Jorge dos Santos Pimentel de Oliveira
Doutor José Carlos Fernandes Pereira
Doutor Fernando Manuel Coutinho Tavares Pinho
Doutor Pedro Jorge Martins Coelho
Doutor José Manuel da Silva Chaves Ribeiro Pereira
April 2007
Abstract
The thesis addresses the parallel calculation of unsteady, incompressible fluid flows
on a PC-cluster. The spatial domain decomposition is nowadays a standard technique to perform the parallel calculation of the Navier-Stokes equations. Research into an accurate, efficient and robust method for parallel-in-time calculations will extend the parallelization options in the context of CFD simulations. The solution is based on the iterative use of a coarse and a finer time-grid calculation in a predictor-corrector fashion. The extension of the parallel-in-time algorithm to hybrid time and space parallel calculations makes it possible to optimize the speed-up by choosing the domain to parallelize according to the dimensions of the problem and the number of processors available.
The discretization option for the incompressible, unsteady form of the Navier-Stokes equations is a common issue for both the spatial and the temporal parallel strategies. For second-order accurate finite-volume methods, the time derivative of the
volume-averaged velocity can be congruently replaced by the time derivative of the
cell center velocity. However, on a high-order formulation based on a projection
method, it is essential to include in the algorithm a high-order reconstruction step.
The fourth-order finite-volume numerical scheme uses the projection method for
decoupling velocity and pressure. The inclusion of a high-order step-by-step de-averaging process applied to the velocity field is a simple and effective method
to enhance the accuracy of a finite-volume code for CFD analysis.
Key-words: Parallel; Navier-Stokes; High-Order; Unsteady; Incompressible; Finite-Volume.
Resumo
O cálculo paralelo de escoamentos não-estacionários de fluidos incompressíveis num
"PC-cluster" constitui o assunto central da tese. A decomposição do domínio espacial é hoje prática corrente na solução paralela das equações de Navier-Stokes.
A investigação de um método baseado na decomposição do domínio temporal permitirá alargar o leque de opções de paralelização no contexto das simulações numéricas do escoamento de fluidos. A tese apresenta a aplicação de um algoritmo
de decomposição temporal para a solução das equações de Navier-Stokes no caso
incompressível e não-estacionário baseado na solução iterativa das equações em
duas malhas temporais.
A escolha da ordem de discretização das equações de Navier-Stokes é um assunto comum em ambas as estratégias de paralelização. Nos esquemas de volume-finito de segunda-ordem de precisão, a derivada temporal da velocidade média no
volume de controlo pode ser substituída pela derivada da velocidade pontual no
centro do volume de controlo. No entanto, para uma formulação de alta-ordem
baseada no método de projecção é necessário incluir no algoritmo um passo de reconstrução com alta-ordem de precisão. O esquema numérico de quarta-ordem usa
o método de projecção para desacoplar a velocidade e a pressão. A inclusão de um
processo de reconstrução do campo de velocidades em cada passo de cálculo é uma
forma de aumentar a precisão de um código de volume-finito para a investigação
de problemas através da mecânica de fluidos computacional.
Palavras-Chave: Cálculo Paralelo; Navier-Stokes; Alta-Ordem; Não-estacionário;
Incompressível; Volume-Finito.
Acknowledgments
This work was carried out at the Department of Mechanical Engineering at Instituto Superior Técnico, Universidade Técnica de Lisboa.
I would like to express my gratitude to my supervisor Prof. José Carlos Pereira
for all good advice and for encouraging me during my work.
I would also like to thank Prof. José Manuel C. Pereira for all the fruitful discussions and all my colleagues at LASEF for creating a stimulating working
atmosphere.
Finally, I would like to thank my family for their support and patience.
Nomenclature
Latin characters
b body force per unit mass
cv , cp specific heat at constant volume, pressure
CD drag coefficient
CL lift coefficient
g gravitational acceleration vector
h cavity width
k iteration counter
n unit vector normal to CS directed outwards
p pressure
P number of processors
S parallel speed-up
t time variable
T time domain length
t∗ non-dimensionalized time t∗ = tκ/h²
u horizontal velocity
u velocity vector
U non-dimensionalized horizontal velocity U = ρuh/µ
x horizontal space coordinate
X non-dimensionalized horizontal space coordinate X = x/h
y vertical space coordinate
Y non-dimensionalized vertical space coordinate Y = y/h
Greek characters
α diffusivity
β thermal coefficient of expansion
χ primitive variable for the velocity and temperature field
δt fine time-grid step size
∆t coarse time-grid step size
ε parallel efficiency
γ kinematic viscosity
Γ computing time
θ, θ0 temperature, reference temperature
Θ non-dimensionalized temperature Θ = (θ − θ0 )/(θX=0 − θX=1 )
κ heat conductivity
µ viscosity coefficient
ρ, ρ0 density, reference density
φ scalar
ω vorticity
Similarity parameters
N u Nusselt number
P r Prandtl number
Ra Rayleigh number
Re Reynolds number
St Strouhal number
Abbreviations
CFD Computational Fluid Dynamics
CFL Courant-Friedrichs-Lewy
CS Control surface
CV Control volume
MPI Message Passing Interface
PC Personal computer
PVM Parallel Virtual Machine
rms root mean square
Table of Contents
Abstract  iii
Resumo  v
Acknowledgments  vii
Nomenclature  ix

1 Introduction  1
  1.1 Parallel numerical simulation  1
  1.2 Spatial domain decomposition  6
  1.3 Temporal domain decomposition  10
  1.4 High-order projection methods  13
  1.5 Objectives and contributions  16
  1.6 Contents  17

2 Numerical methods  19
  2.1 Governing equations  19
  2.2 Solution of Navier-Stokes equations  22
  2.3 Spatial discretization  23
  2.4 Time advancement of momentum equations  26
  2.5 Pressure correction  27
  2.6 High-order finite-volume method  28
    2.6.1 Spatial discretization  28
    2.6.2 Time advancement of momentum equations  30
    2.6.3 Velocity de-averaging  33
  2.7 Summary  34

3 Space domain decomposition  37
  3.1 Parallel solution of systems of equations  37
  3.2 High-order finite-volume numerical simulations  43
    3.2.1 Two-dimensional Taylor vortex decay problem  43
    3.2.2 Counter-rotating vortices interaction  45
    3.2.3 Co-rotating vortices merging  49
  3.3 Summary  56

4 Time domain decomposition  59
  4.1 Numerical method  59
  4.2 Numerical stability  62
  4.3 Parallel-in-time numerical scheme accuracy  70
  4.4 Performance model  76
  4.5 Numerical experiments  78
    4.5.1 Taylor vortex  78
      4.5.1.1 Parallel-in-time results  78
      4.5.1.2 Comparison between the spatial and the temporal domain decomposition  84
    4.5.2 Shedding flow past a two-dimensional square cylinder in a channel  87
      4.5.2.1 Parallel-in-time simulation  87
      4.5.2.2 Evaluation of the proposed performance model  91
    4.5.3 Hybrid spatial and temporal domain decomposition  96
  4.6 Summary  102

5 Conclusions  105
  5.1 Summary  105
  5.2 Suggestions for future work  108

Bibliography  109

A De-averaging coefficients  117
List of Figures
1.1 PC-cluster current implementation.  2
1.2 1D (a), 2D (b) and 3D (c) strategies for the spatial domain decomposition.  8
1.3 Internal boundaries message exchange scheme.  9
2.1 Co-located grid (left) and staggered grid (right).  24
2.2 Labelling scheme for a 2D grid.  29
2.3 Labelling scheme for 3D face integral evaluation.  30
2.4 Nine-point stencil for pressure increment gradient operator discretization for the pressure Poisson equation for the two-dimensional case.  34
3.1 Parallel code flow chart for an incremental-pressure projection method (cycle A exists only for the explicit outer iteration coupling).  38
3.2 Dependence of the computing time on the number of processors (128 × 128 nodes mesh).  40
3.3 Dependence of the achieved speed-up on the number of processors (128 × 128 nodes mesh).  41
3.4 Dependence of the parallel efficiency on the number of processors (128 × 128 nodes mesh).  41
3.5 Dependence of the achieved speed-up on the number of processors for a 512 × 512 nodes mesh.  42
3.6 Dependence of the parallel efficiency on the number of processors for a 512 × 512 nodes mesh.  42
3.7 The computational domain considered for the two-dimensional Taylor vortex-decay problem.  44
3.8 Maximum error, L∞ norm, of u-velocity and pressure, at final computed time, dependence on the mesh refinement.  45
3.9 Initial conditions and computational domain for the two-dimensional viscous counter-rotating vortices test case.  46
3.10 Temporal evolution of the adimensionalized maximum vertical velocity component along a horizontal line going through the vortex center.  47
3.11 Comparison of the predicted vertical velocity component profile after 100 s for the 128 × 128 nodes grid with the finer grid solution.  48
3.12 Comparison of the predicted vertical velocity component profile after 100 s for the 256 × 256 nodes grid with the finer grid solution.  48
3.13 Initial vorticity contours.  49
3.14 Vorticity contours during the merging process at t = 2 s, t = 5 s, t = 6 s, t = 7 s and t = 8 s (second-order de-averaging on the left and fourth-order on the right side).  50
3.15 Comparison of the vorticity contours during the merging process simulation on a 150 × 150 nodes grid at t = 2 s, t = 5 s, t = 6 s, t = 7 s and t = 8 s (second-order de-averaging on the left and fourth-order on the center) with the reference solution on the 600 × 600 nodes grid (right side).  52
3.16 Comparison of the vorticity contours during the merging process simulation on a 300 × 300 nodes grid at t = 2 s, t = 5 s, t = 6 s, t = 7 s and t = 8 s (second-order de-averaging on the left and fourth-order on the center) with the reference solution on the 600 × 600 nodes grid (right side).  53
3.17 Vorticity contours during the merging process at t = 2 s, t = 5 s, t = 6 s, t = 7 s and t = 8 s for 300 × 300 (left) and 300 × 300 × 20 (right) nodes grids.  54
3.18 Vorticity contours during the merging process at t = 60 s, t = 220 s, t = 240 s, t = 260 s and t = 300 s (second-order de-averaging on the left and fourth-order on the right side).  55
3.19 Predicted vertical velocity component profile after the merging (t = 360 s).  56
4.1 Space-time decomposition parallel solver schematic diagram.  61
4.2 Propagating scalar front problem considered for numerical evaluation of the stability domain (solution for t = 1, u = 0.125 and α = 10⁻³, after two iterations).  64
4.3 Sequential coarse time-grid and parallel fine time-grid solutions.  66
4.4 Error dependence on the iteration number near the stability boundary (4th order Runge-Kutta scheme and diffusive criterion equal to 0.2).  67
4.5 Stability domain for the explicit Euler (a), Adams-Bashforth (b), Crank-Nicolson (c) and 4th order Runge-Kutta (d) schemes.  68
4.5 Stability domain for the explicit Euler (a), Adams-Bashforth (b), Crank-Nicolson (c) and 4th order Runge-Kutta (d) schemes (cont'd).  69
4.6 Initial condition (t = 0) and the iterative approximation to the exact solution at t = 2.  71
4.7 Dependence of sequential solution on the fine time-grid L2 error norm on the mesh resolution.  72
4.8 Error dependence on the spatial discretization and number of iterations using the implicit Euler and fourth-order Runge-Kutta schemes on the coarse and fine time-grids, respectively.  73
4.9 Error dependence on the spatial discretization and number of iterations using the Crank-Nicolson and fourth-order Runge-Kutta schemes on the coarse and fine time-grids, respectively.  74
4.10 Error dependence on the spatial discretization and number of iterations using the implicit Euler and Adams-Bashforth schemes on the coarse and fine time-grids, respectively.  75
4.11 Parallel-in-time solver schematic diagram.  76
4.12 Parallel-in-time speed-up prediction for two iterations.  78
4.13 Parallel-in-time computing time dependence on the number of iterations performed (32 × 32 nodes; δt = 4 × 10⁻³ s).  80
4.14 Parallel-in-time solution L1 norm error dependence on the number of iterations performed (32 × 32 nodes; δt = 4 × 10⁻³ s).  81
4.15 Dependence of maximum deviation between parallel-in-time and serial solutions on the number of iterations performed (32 × 32 nodes; δt = 4 × 10⁻³ s).  81
4.16 Computing time dependence on space resolution (2 iterations; δt = 4 × 10⁻³ s).  82
4.17 Speed-up and parallel efficiency dependence on space resolution (2 iterations; δt = 4 × 10⁻³ s).  83
4.18 Computing time dependence on the size of the finer time-grid increment (2 iterations on a 32 × 32 nodes mesh).  84
4.19 Spatial domain decomposition parallel efficiency and speed-up on a 64 × 64 nodes mesh and δt = 4 × 10⁻³ s.  86
4.20 Temporal domain decomposition parallel efficiency and speed-up on a 64 × 64 nodes mesh and δt = 4 × 10⁻³ s.  86
4.21 Parallel-in-time and domain decomposition methods parallel efficiency ratio on a 64 × 64 nodes mesh.  87
4.22 Flow configuration and grid.  88
4.23 Force coefficients for Re = 500.  90
4.24 Predicted vorticity contours and streamlines for Re = 500.  91
4.25 Lift coefficient and Strouhal number dependence on the number of iterations prescribed.  93
4.26 Vorticity contours after first (a) and second (b) iteration.  94
4.27 Comparison between predicted (lines) and verified (symbols, filled symbols are related to 3 iterations) efficiency of the parallel-in-time method for 2 and 3 iterations prescribed (Φ = M/P).  95
4.28 The dependence of the Nusselt number, at the heated wall, X = 0, and at the vertical center-line X = 1/2, on the non-dimensionalized time for Rayleigh number equal to 10³, 1.4 × 10⁴ and 1.4 × 10⁵.  98
4.29 The Nusselt number temporal evolution, at the vertical center line for Ra = 1.4 × 10⁵, dependence on the number of iterations performed.  99
4.30 The dependence of the number of iterations required for convergence on the Rayleigh number and on the number of time sub-domains.  99
4.31 Computing time required for simulations: a) 128 × 128 nodes; b) 32 × 32 nodes.  101
A.1 High-order 2-D de-averaging stencil for interior control volumes.  118
A.2 Cases considered for the use of the 2-D de-averaging coefficients.  118
A.3 3-D stencil and de-averaging coefficients for an interior cell.  120
A.4 3-D stencil and de-averaging coefficients for a boundary face cell.  121
A.5 3-D stencil and de-averaging coefficients for a boundary face cell near an edge.  122
A.6 3-D stencil and de-averaging coefficients for a boundary face cell near a vertex.  123
A.7 3-D stencil and de-averaging coefficients for a cell near a boundary face.  124
A.8 3-D stencil and de-averaging coefficients for a cell near a boundary face and edge.  125
A.9 3-D stencil and de-averaging coefficients for a cell near a boundary face and vertex.  126
A.10 3-D stencil and de-averaging coefficients for a boundary edge cell.  127
A.11 3-D stencil and de-averaging coefficients for a boundary vertex cell.  128
List of Tables
1.1 Configuration of individual machines.  3
4.1 Predicted values for St and CL rms.  89
A.1 De-averaging coefficients for a uniform 2-D cartesian spatial grid.  119
Chapter 1
Introduction
1.1 Parallel numerical simulation
Parallel architectures are increasingly attractive when examining current trends from a variety of perspectives: economic, technological and application demand [1]. For complex engineering problems, the demand for large memory and reasonable turn-around time exceeds what a regular sequential machine can provide. To this end, one can employ powerful supercomputers, which are very expensive for universities and small research groups.
Fortunately, due to the support of high-speed networking, we can construct high-performance systems based on personal computer (PC) components. The use
of multiple processors is an effective way to significantly speed-up the solution.
A popular and cost-effective approach to parallel computing is cluster computing
based, for example, on PCs running the Linux operating system.
The present Thesis addresses the parallel calculation of unsteady, incompressible flows on a PC-cluster, which is nowadays a common means to achieve the
computing power required by theoretical or applied fluid engineering problems.
LASEF (Laboratory and Simulation of Energy and Fluids) has experience in parallel processing for Computational Fluid Dynamics (CFD) and has, for almost two decades, used different configurations, starting with transputer-based processors [2]. Other research teams in Portugal have used parallel processing in CFD-related
problems, see e.g. P. Novo [3], P. Coelho [4, 5], L. Palma [6, 7] and a few others,
but the number is very limited compared with central European countries, the USA or
Japan. Therefore, a description of the hardware and software of the 24-node PC-cluster used to conduct the numerical experiments is included here. A PC-cluster
can be considered as a message-passing architecture parallel machine that provides
communications between processors as explicit I/O operations, with the limitation
that these communications are much slower than the processor. The effectiveness
of this approach depends on the communication network connecting the PCs. The
goal of the project was to construct a PC-cluster to provide the relatively inexpensive but high-performing computational capability required for the execution
of fluid flow simulations. A 24-unit cluster was successfully assembled, each node with one
Pentium IV 2.4 GHz processor. The system provides very stable operation with
almost 100% up-time and expected speed, and it is very heavily used by a large
number of users (about 15). Consequently, it was very important to integrate the
system properly so that it complies with the demands of a multi-user remote access
environment. All the major aspects of the PC-cluster current implementation are
schematically represented on Fig. 1.1.
Figure 1.1: PC-cluster current implementation.
A Pentium CPU was selected over other options because there are stable compilers that run under the Linux system; otherwise, one would have to purchase an expensive alternative operating system. No monitor is attached to any slave PC, since these are accessed only over the network, without direct user access. The operation of all the individual machines is monitored from a master console. Table 1.1
summarizes the configuration details of the individual machines.
Table 1.1: Configuration of individual machines.

              Master         Slaves
CPU           2.4 GHz        2.4 GHz
Memory        1 GB           1 GB
Hard Drive    65 + 200 GB    33 GB
Finally, it is worth mentioning a vital hardware-related issue of the PC-cluster,
the Fast Ethernet network setup. In order to coordinate the computational tasks
among all computer nodes, it is necessary to exchange a large amount of information during the computing process. High-speed communication among them is
imperative. One 24-port 100 Mbps Ethernet switch is currently used for the node
connection. The master PC is connected to the IST 10 Mbps Ethernet, allowing
the communication with the outside world.
The Linux operating system (Debian distribution, version 3.1) is currently used
for all computers of the cluster. The well-known stability, performance, excellent
software and troubleshooting support and documentation of this easy-to-use free
operating system made it a natural choice for the PC-cluster. All the codes were developed in the FORTRAN language using the Absoft Fortran 95 compiler, version 9.0 EP. The
MPICH version 2.1 libraries are currently used for inter-processor communication.
In parallel computation, the workload is partitioned into smaller pieces, which
are taken care of by a group of processors. Effective use of a PC-cluster requires
a proper distribution of the solution tasks among the available processing nodes.
A common approach in CFD is to decompose the spatial domain into a number of partitions and assign the partitions to different nodes, see e.g. [8, 9]. The
processing nodes execute the same CFD solver but on different space sub-domains.
The work should be partitioned evenly among the available computing nodes. It should
be done such that no node is overloaded while other nodes are waiting for work.
At the end of each numerical iteration, processors exchange intermediate results
at sub-domain boundaries. Minimizing communication between nodes is the key
for optimal speed-up. For CFD problems, evenly balanced workload and minimization of communication are dictated by the mesh partitioning. For structured
meshes, the domain decomposition is straightforward. For unstructured meshes, a
robust and effective algorithm to automatically divide the domain is a demanding
necessity.
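To make the even distribution of a structured mesh concrete, the following minimal sketch (in Python, purely for illustration; the thesis codes were written in FORTRAN) splits a one-dimensional range of cell indices into near-equal contiguous blocks, the building block of the 1D, 2D and 3D strategies discussed below. The function name and the mesh size are illustrative assumptions.

```python
# Minimal sketch: block partitioning of n_cells cells among n_procs processors so
# that no block differs from any other by more than one cell (balanced workload).
def block_partition(n_cells, n_procs):
    """Return a list of (start, end) index ranges, one contiguous block per processor."""
    base, rem = divmod(n_cells, n_procs)
    ranges, start = [], 0
    for p in range(n_procs):
        size = base + (1 if p < rem else 0)   # spread the remainder over the first ranks
        ranges.append((start, start + size))
        start += size
    return ranges

if __name__ == "__main__":
    # A 128-cell direction of a structured mesh split among 6 processors:
    print(block_partition(128, 6))
    # [(0, 22), (22, 44), (44, 65), (65, 86), (86, 107), (107, 128)]
```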
The comparison between different methods of decomposition requires parameters for measuring the performance of codes on parallel machines. The most
common performance metrics of parallel processing are the speed-up and the efficiency. The speed-up achieved by a parallel algorithm running on P processors is
defined as the ratio of the execution time on a single processor and the execution
time of the parallel algorithm on P processors. The parallel speed-up S(P ) can be
expressed as
S(P) = Γ1 / ΓP ,    (1.1)
where Γ1 and ΓP denote the execution time on one and P processors, respectively.
The parallel efficiency, ε(P), is given by:

ε(P) = S(P) / P .    (1.2)
The definition of sequential time can be based on either the execution time of the
parallel code on one processor, relative speed-up, or on the execution time of the
best sequential algorithm available, absolute speed-up. The difference in these two
definitions is that the relative speed-up deals with the inherent parallelism of the
parallel algorithm under consideration, while the absolute speed-up emphasizes
how much faster a problem can be solved with parallel processing when the chosen
algorithm is the best available. Practical considerations limit the usefulness of the
absolute speed-up and efficiency definitions. The option for the best algorithm
may depend on the problem size, hardware used, etc. A constraint on the parallel
performance is provided by Amdahl’s law [10], which states that no matter how
many processors are used in a parallel calculation, the parallel speed-up for an
application will be limited by the fraction of serial code present.
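The definitions in Eqs. (1.1) and (1.2) and the bound implied by Amdahl's law can be evaluated directly, as in the short sketch below; the timings and serial fraction used are hypothetical illustrative numbers, not measurements from the PC-cluster described above.

```python
# Illustrative evaluation of Eqs. (1.1)-(1.2) and of Amdahl's law.
def speedup(gamma_1, gamma_p):
    return gamma_1 / gamma_p                      # S(P) = Gamma_1 / Gamma_P

def efficiency(gamma_1, gamma_p, n_procs):
    return speedup(gamma_1, gamma_p) / n_procs    # eps(P) = S(P) / P

def amdahl_bound(serial_fraction, n_procs):
    # Amdahl's law: speed-up is limited by the serial fraction f_s of the code.
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_procs)

if __name__ == "__main__":
    # Hypothetical timings: 1000 s on one processor, 140 s on 8 processors.
    print(speedup(1000.0, 140.0))            # ~7.1
    print(efficiency(1000.0, 140.0, 8))      # ~0.89
    # With 5% serial code, 24 processors cannot give a speed-up beyond ~11.2.
    print(amdahl_bound(0.05, 24))
```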
The parallel efficiency can be expressed as a product of three factors: communication, numerical and load balancing,

ε(P) = εcom × εnum × εlb .    (1.3)
The numerical efficiency represents the increase in the number of operations required to fulfil the convergence criterion due to the changes in the algorithm required for its parallelization. The load balancing efficiency represents the time
some processors stay idle due to problem sizing differences on each processor. The
communication efficiency represents the time loss in a parallel computation due
to communication lag between processors during which computation cannot take
place. The communication time can be split into local and global,
Γcom = Γloc + Γglob ,    (1.4)
where Γloc and Γglob are the time spent on local and global communications, respectively. The difference between the two is that local communications run in
parallel, i.e. all processors are involved simultaneously in communication. During local communication, some processors send while others receive data. Global
communication is a limiting factor for massive parallelization because only a certain number of processors are involved in communication at any time between the
beginning and the end of data gathering or scattering.
Scalability is an important issue in parallel computing and becomes significant when solving large-scale problems with many processors. Scalability refers to
the ability of a parallel system including the hardware, software and application
to demonstrate a proportional increase in parallel speed-up with the number of
processors. Mesh partitioning has a dominant effect on the parallel scalability for
problems characterized by almost constant work per point. Poor load balancing
induces idleness at synchronization points. However, balancing work alone is not
sufficient. Communications must be balanced as well. These two objectives are
not entirely compatible. When the problem size is fixed, an increase in the number
of processors can begin to have a negative impact on parallel speed-up. The ratio
between the time of communication and the time of computation decreases when
the size of the problem increases, leading to an increased efficiency. Speed-up usually increases with increasing problem size because large data arrays can reduce
the fraction of serial code providing increased parallel speed-up.
In recent years, high-performance computing has become increasingly important to scientific advancement. Nowadays, it is recognized that computational power greater than presently available will be needed in order to address large-scale problems in industry as well as to improve our knowledge of complex scientific problems. Massive parallel technology such as GRID computing is expected to fulfil this
present need [11, 12]. GRID computing is a new distributed computing paradigm,
similar in spirit to the electric power grid. GRID computing is a form of distributed computing that involves coordinating and sharing computing, application,
data, storage, or network resources across dynamic and geographically dispersed
organizations. It provides scalable high-performance mechanisms for discovering
and negotiating access to geographically remote resources promising to change the
way organizations tackle complex computational problems. However, the vision of
large-scale resource sharing is not yet a reality in many areas - GRID computing
is an evolving area of computing, where standards and technology are still being
developed to enable this new paradigm in the near future, see e.g. [13, 14, 15].
1.2 Parallel simulation by spatial domain decomposition
CFD plays an important role in the research of physical processes associated with important engineering applications. Current challenges in computing incompressible
flows have been recently addressed by Kwak et al. in a review article [16]. Flow
solver codes and software tools have been developed to the point that many daily
fluid engineering problems can now be computed routinely. However, the predictive capability is still considered very limited, and prediction with accurate physics is yet to be accomplished. This will require the inclusion of not only fluid dynamics
modelling but also the modelling of other related quantities. These computations
will require not only large computing resources but also large data storage and
management technologies.
The solution of three-dimensional, unsteady, incompressible fluid flows, governed by the parabolic-elliptic nature of the Navier-Stokes and continuity system
of equations, usually requires a large amount of computer time due to long simulation times coupled with high-resolution meshes. Workstations, networks or multiprocessor systems with distributed memory have become a customary facility for
the solution of this sort of problem.
Most of the unsteady, incompressible calculations are performed with time-stepping algorithms and space domain decomposition. The time-stepping algorithms require the solution at one time instance before starting the calculation for the next time-step. This sequential-in-time procedure is complemented by the discretization in space, which uses the data parallelism or space domain decomposition technique. It is well known that when the spatial dimension of the problem is low, the parallel speed-up of the spatial domain decomposition method is limited and an increase in the number of processors further decreases the efficiency.
To perform parallel programming, a parallel library is needed to provide the
communication among the computer nodes in the network. There are two standards for distributed computing: the Parallel Virtual Machine (PVM) and the
Message Passing Interface (MPI). The features of both specifications were compared by Geist et al. [17]. PVM is built around the concept of a virtual machine,
a dynamic collection of potentially heterogeneous computational resources. Since the start of PVM development, portability and heterogeneity have been considered much more important than performance. Some performance sacrifice is made in favour of the flexibility to communicate across architectural boundaries. In contrast, MPI was focused on the message-passing task and explicitly states that resource management is out of the scope of MPI. In the framework of parallel calculation on a homogeneous PC-cluster, the MPI standard (MPI 1.2 specification
[18]) was chosen to develop the parallel Navier-Stokes solvers. MPI provides the following main features (a minimal usage sketch is given after the list):
- the ability to specify communication topologies;
- the ability to create derived data-types describing non-contiguous data;
- a large set of point-to-point communication routines;
- a large set of collective communication routines for communication among
groups of processes.
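As a minimal illustration of the communication topologies and point-to-point routines listed above, the following sketch uses the mpi4py Python bindings (the codes of this work use the FORTRAN MPI bindings instead) to build a 2-D Cartesian process grid and perform one neighbour exchange; all names and sizes are illustrative assumptions.

```python
# Minimal sketch: a 2-D Cartesian communicator and one point-to-point exchange.
# Run with, e.g.: mpiexec -n 4 python cart_demo.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
dims = MPI.Compute_dims(comm.Get_size(), 2)      # e.g. 4 processes -> [2, 2]
cart = comm.Create_cart(dims, periods=[False, False])
src, dst = cart.Shift(0, 1)                      # neighbours along the first direction

# Each process sends its rank to 'dst' and receives from 'src' (MPI.PROC_NULL at ends).
sendbuf = np.array([cart.Get_rank()], dtype="i")
recvbuf = np.array([-1], dtype="i")
cart.Sendrecv(sendbuf, dest=dst, recvbuf=recvbuf, source=src)
print("rank", cart.Get_rank(), "received", recvbuf[0], "from neighbour", src)
```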
The parallelization procedure is based on the grid partitioning technique or
data parallelism. The solution domain is subdivided into P non-overlapping sub-
domains and each sub-domain is assigned to one processor. The objective of domain decomposition is to balance the computational workload and memory occupancy of the processing nodes while keeping the inter-node communication as low as possible. The communication between nodes, which is the most important source of computational overhead, must be minimized.
In general, the partitioning method follows one of the following strategies (see Fig. 1.2):
- the one-directional partitioning of a three-dimensional computational space;
- the multi-directional partitioning, which decreases the amount of data exchanged in communications between processors.
Figure 1.2: 1D (a), 2D (b) and 3D (c) strategies for the spatial domain decomposition.
The one-directional partitioning is easier to program but has a long data communication time, because it requires a large amount of communication in the data exchange step. This penalty is particularly evident on parallel machines with high data communication latency. The multi-directional partitioning, with smaller message sizes, is commonly used and will be applied when possible.
Since the sub-domains do not overlap, each processor calculates variable values
that are not calculated by other processors. The calculation on control volumes
(CVs) along the sub-domain boundaries needs values from CVs allocated to neighbouring processors. This requires an overlap of storage in the case of a distributed
memory parallel computer, as indicated in Fig. 1.3. Each time a processor updates a variable needed by a neighbour processor, it is copied to the neighbour
processor’s memory. The number of CVs stored in the neighbouring processor is
dependent on the order of accuracy of the discretization scheme.
Figure 1.3: Internal boundaries message exchange scheme.
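The two-step exchange of the overlap (ghost) cells sketched in Fig. 1.3 can be written compactly as follows. This is a hedged one-dimensional illustration in Python/mpi4py with a one-cell-wide ghost layer (a higher-order scheme would simply widen the layer); it is not the FORTRAN routine used in the present work, and the array sizes are arbitrary.

```python
# Minimal sketch: exchange of a one-cell-wide ghost layer in a 1-D decomposition.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
left = rank - 1 if rank > 0 else MPI.PROC_NULL
right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

n_local = 8                        # interior cells owned by this processor
u = np.zeros(n_local + 2)          # one ghost cell on each side
u[1:-1] = rank                     # interior values, recognizable per processor

# Communication step 1: send last interior cell to the right, fill left ghost cell.
comm.Sendrecv(u[-2:-1], dest=right, recvbuf=u[0:1], source=left)
# Communication step 2: send first interior cell to the left, fill right ghost cell.
comm.Sendrecv(u[1:2], dest=left, recvbuf=u[-1:], source=right)
print("rank", rank, "ghost cells:", u[0], u[-1])
```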
In contrast to a sequential program, a parallel program needs to be well organized in task load and communication to maximize performance. Message passing routines, included in the MPI library, provide the organized structure for parallel computing. If this is not done carefully, the executing processors can spend much more time waiting for communication than computing, and thus the overall performance of the cluster will be unacceptable. A basic solution is to exploit so-called "non-blocking communication", which allows message passing to proceed simultaneously with computing.
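A sketch of this overlap of communication with computation is given below: the ghost values are posted with immediate (non-blocking) sends and receives, the interior cells that need no neighbour data are updated first, and the waits are issued only before the cells adjacent to the sub-domain boundary are updated. The array names and the simple averaging stencil are illustrative assumptions, not the scheme used in the thesis code.

```python
# Minimal sketch: overlap interior computation with non-blocking halo communication.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
left = rank - 1 if rank > 0 else MPI.PROC_NULL
right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

u = np.random.rand(12)                     # 10 interior cells plus two ghost cells
ghost_l, ghost_r = u[0:1].copy(), u[-1:].copy()

# Post the communication first ...
reqs = [comm.Isend(u[1:2], dest=left),    comm.Isend(u[-2:-1], dest=right),
        comm.Irecv(ghost_l, source=left), comm.Irecv(ghost_r, source=right)]

# ... update the interior cells that need no neighbour data while messages travel ...
u_new = u.copy()
u_new[2:-2] = 0.5 * (u[1:-3] + u[3:-1])    # simple averaging stencil, for illustration

# ... then wait and update only the cells adjacent to the sub-domain boundaries.
MPI.Request.Waitall(reqs)
u[0], u[-1] = ghost_l[0], ghost_r[0]
u_new[1] = 0.5 * (u[0] + u[2])
u_new[-2] = 0.5 * (u[-3] + u[-1])
```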
The performance of a parallel CFD code depends on several factors, including the characteristics of the numerical algorithm and the hardware. The parallel performance depends to a great extent on inter-processor communication, which is a hardware-dependent parameter. The numerical procedures used to solve the equations also have a significant contribution to the parallel performance. The discretization methods and the time-marching procedure are important parts of a CFD code. A higher-order numerical scheme usually requires more communication time in a parallel environment than a lower-order one. Implicit schemes allow larger time-steps than the easily parallelized explicit ones, but require more computational work at each time-step and their parallelization is sometimes cumbersome.
1.3 Parallel simulation by temporal domain decomposition
The parallelism in the time direction is not common in CFD. The usual approach
for setting up parallel CFD calculations is to divide the domain between processors
using the spatial domain decomposition. Algorithms that are sequential in time are
usually considered to solve parabolic and hyperbolic differential equations numerically. Space domain splitting and the allocation of each sub-domain to a processor
is the methodology usually used to perform the parallel computation of the governing fluid flow equations. However, it is well known that domain decomposition
techniques are not efficient when the spatial dimension of the problem is small and
a large number of processors are used. At each time step, the processors need to
exchange boundary variable values with processors holding adjacent sub-domains.
For a problem of fixed spatial size in a distributed memory parallel computer, the
communication/computation ratio increases with the number of processors. Although some overlapping between computation and communication is possible,
parallel efficiency and speed-up are drastically reduced.
Future massive parallel computer systems, such as GRID computing, will increase the number of processors available, allowing new boundaries for the problem dimensions to be solved. Consequently, parallel-in-time methods will have a high potential
application to reduce the computing time that nowadays is only achieved with the
standard spatial domain decomposition technique. Several parallel algorithms have
been developed in the past to decompose the temporal domain of the problem under
consideration [19, 20, 21, 22]. These methods range from space-time multigrid
methods to parallel time-stepping methods but their application for the solution
of real unsteady fluid flow problems did not become popular, see e.g. [23].
Recently, Lions et al. [24] presented a new approach that parallelizes, across the time domain of the problem under consideration, the solution for the temporal evolution of a parabolic system of equations. The new parallel method was called parareal because the main goal of the initial development was the real-time solution of a problem using a parallel structure. Some modifications of the original parareal algorithm have been introduced by Bal [25] to obtain better stability and performance. The method is based on the alternating use of a coarse global sequential solver with fine local parallel ones, and the calculation proceeds in an iterative prediction-correction fashion over the entire time domain of the problem. The calculation starts with a sequential solution along the time domain of the problem on a coarse time-grid and is followed by an iterative procedure using the coarse time-grid and a finer one. The predictor step is calculated sequentially on the coarse
time-grid and the correction step is based on a solution calculated in parallel using
the fine time-grid. This iterative procedure provides successive corrections for the
problem solution.
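The predictor-corrector structure just described can be written down for a scalar model problem. The sketch below applies a parareal-type iteration to du/dt = −λu, with an implicit Euler coarse propagator and an explicit Euler fine propagator on each time sub-domain; it is a serial emulation (the fine sweeps inside each iteration are what would run in parallel), and all parameter values are illustrative, not those used later in Chapter 4.

```python
# Minimal sketch of the coarse/fine predictor-corrector (parareal-type) iteration
# for du/dt = -lam * u on [0, T], split into N time sub-domains.
import numpy as np

lam, T, N, k_max = 2.0, 2.0, 10, 3
dT = T / N                              # coarse time-grid step (one per sub-domain)
m = 20                                  # fine steps per sub-domain
dt = dT / m                             # fine time-grid step

def coarse(u0):                         # one implicit Euler step of size dT
    return u0 / (1.0 + lam * dT)

def fine(u0):                           # m explicit Euler steps of size dt
    u = u0
    for _ in range(m):
        u -= dt * lam * u
    return u

U = np.zeros(N + 1); U[0] = 1.0
for n in range(N):                      # initial sequential coarse prediction
    U[n + 1] = coarse(U[n])

for k in range(k_max):                  # predictor-corrector iterations
    F = np.array([fine(U[n]) for n in range(N)])      # fine sweeps: parallel in time
    G_old = np.array([coarse(U[n]) for n in range(N)])
    U_new = np.zeros(N + 1); U_new[0] = U[0]
    for n in range(N):                  # sequential coarse correction sweep
        U_new[n + 1] = coarse(U_new[n]) + F[n] - G_old[n]
    U = U_new

print("parareal:", U[-1], " exact:", np.exp(-lam * T))
```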
Some applications of the parareal algorithm have already been successfully
performed. The parareal algorithm was applied to molecular-dynamics simulation [26] and quantum control [27], identifying some bottlenecks that contribute to the
low parallel efficiency of the method. The application of the method to solve structure, fluid and coupled fluid-structure model problems has also been considered feasible by Farhat and Chandesris [28]. Previous application of the parallel-in-time
method for the solution of the unsteady, incompressible Navier-Stokes equations
reported in Reference [29] indicates that the method can be a promising alternative
technique for long-time simulations on small spatial domains.
The stability of the parallel-in-time method has been addressed by Lions et al.
[24] for linear partial differential equations under simplified assumptions, such as
the knowledge of the exact form of the solution on the fine time-grid evolution. The
stability issue is also theoretically addressed by Farhat and Chandesris [28] and by
Staff and Rønquist [30] but, unfortunately, no general CFL-like criterion for the
conditional stability exists, even for the one-dimensional convection-diffusion equation. The theoretical investigation of the stability of the parallel-in-time algorithm
allowed important conclusions such as the observed reduction of the conditional
stability region of standard explicit schemes. The fulfilment of the stability criteria
for isolated fine- and coarse-time-grid sets of equations requires, in the absence of
a general criterion, the computation of the stability region numerically. Therefore,
numerical experiments were performed to investigate the stability domain when the
method is applied to the solution of the one-dimensional transport equation [31].
The conditional stability domain was predicted for several temporal discretization
schemes on the coarser time-grid simulating up to one hundred processors. For
the test case investigated, no reduction of the stability domain of the finite difference equations of the parallel-in-time method was detected for the ”unconditionally
stable” implicit three-level and Euler schemes. The other schemes considered under the parallel-in-time method (explicit Euler, Adams-Bashforth, Crank-Nicolson
and fourth-order Runge-Kutta) displayed important reductions on the standard
conditional stability domain for sequential calculations.
The extension of the parallel-in-time algorithm to hybrid time and space parallel calculations makes it possible to optimize the speed-up by choosing the domain to parallelize according to the problem dimensions under consideration and the number of processors available. A simplified performance model for the parallel-in-time method, taking into account several parameters that contribute to the
computing time required to perform a parallel-in-time calculation, is essential to
provide a parallel efficiency or speed-up prediction. The prediction of the theoretical parallel efficiency or speed-up is relevant to decide when to use parallel-in-time
for the solution of a specific problem. Another important issue related to the performance of the parallel-in-time method is the time spent by the communication
tasks required by the algorithm. The performance model is validated with solutions of the unsteady Navier-Stokes equations. The comparison of the observed
parallel-in-time performance for the solution of an unsteady, incompressible flow
problem on a PC-cluster with the performance model prediction will allow verification of the effect of the communication time on the parallel-in-time simulation efficiency.
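A generic estimate of the kind such a model produces is sketched below, assuming the parallel time is the initial coarse sweep plus, for each iteration, one sequential coarse sweep, one parallel fine sweep and the communication; this is only an illustrative parareal-type estimate with placeholder symbols, not the calibrated performance model derived in Chapter 4.

```python
# Hedged sketch of a parareal-type speed-up estimate; all symbols are generic
# placeholders, not the calibrated parameters of the performance model in Chapter 4.
def parallel_time(N, k, T_f, T_c, T_comm):
    # initial coarse sweep + k iterations of (coarse sweep + parallel fine sweep + comms)
    return N * T_c + k * (N * T_c + T_f + T_comm)

def speedup_estimate(N, k, T_f, T_c, T_comm):
    serial_time = N * T_f              # one fine sweep over the whole time domain
    return serial_time / parallel_time(N, k, T_f, T_c, T_comm)

if __name__ == "__main__":
    # Illustrative numbers: 16 time sub-domains, 2 iterations, coarse step 20x cheaper.
    print(speedup_estimate(N=16, k=2, T_f=10.0, T_c=0.5, T_comm=0.2))   # ~3.6
```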
It is believed that in the near future massive parallel computer systems will increase the number of processors available, allowing new boundaries for the problem dimensions. Consequently, time and hybrid (space and time) domain decomposition
methods will have a high potential application to reduce the computing time that
is nowadays achieved with the standard spatial domain decomposition technique.
This will have a positive impact not only on the solution of the partial differential equations that CFD deals with, but also on other areas of computer modelling for engineering
and science.
1.4 High-order projection methods
Previous sections introduced the methodologies required for parallel fluid flow simulation by spatial or temporal domain decomposition. The discretization of the
incompressible, unsteady form of the Navier-Stokes equations, which is introduced
in this section, is a common issue for both parallel strategies.
The numerical solution of Navier-Stokes equations, written in primitive variables, for unsteady, incompressible flows, faces numerical difficulties due to the
lack of a dedicated equation for the pressure temporal evolution. This problem is
commonly overcome, apart from pseudo-compressibility and vorticity-based methods, by a pressure-velocity coupling method such as the families of projection, fractional-step or operator-splitting methods, or dedicated algorithms such as PISO [32]. Pressure projection methods are usually preferred to artificial compressibility methods, except for pseudo-transient solutions used to reach a steady-state solution of interest, see [33].
The projection methods were introduced by the pioneering work of Chorin [34]
and Temam [35] and several variants of the original method have been presented.
All these variants are based on the decomposition of the equation operators. The momentum equations are first advanced, calculating an intermediate velocity that, in
general, will not satisfy mass conservation. After the solution of the pressure Poisson equation, the intermediate velocity field is then corrected in the second step to
enforce mass conservation. Many of the schemes use some kind of explicit approximation making the overall algorithm semi-implicit or explicit, see e.g. [34, 35].
Others use a semi-implicit approximation for the convective terms making the
algorithm unconditionally stable, see e.g. [36, 37, 38]. The use of an explicit treatment of convection fluxes and diffusion terms, even if it imposes severe restrictions
on allowed time-step size, requires less storage and computing time per time-step.
Consequently, when a detailed flow history is required, explicit methods are often
used instead of more stable implicit methods. Another advantage resulting from
the use of an explicit method is the straightforward and efficient parallelization.
For problems with large meshes, massive parallel computing offers an attractive
means to reduce the computing time.
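For reference, one explicit projection step of the Chorin type can be sketched as follows on a doubly periodic domain, using spectral derivatives and a Fourier pressure Poisson solve purely to keep the example short; this is a generic illustration of the predictor / Poisson / correction sequence, not the fourth-order finite-volume scheme developed in Chapter 2, and all parameter values are illustrative.

```python
# Minimal sketch of one explicit projection (Chorin-type) step on a 2-D periodic grid:
# predictor for an intermediate velocity, pressure Poisson solve, correction.
import numpy as np

n, nu, dt = 64, 1e-2, 1e-3
L = 2 * np.pi
x = np.linspace(0, L, n, endpoint=False)
X, Y = np.meshgrid(x, x, indexing="ij")
kx = np.fft.fftfreq(n, d=L / n) * 2 * np.pi
KX, KY = np.meshgrid(kx, kx, indexing="ij")
K2 = KX**2 + KY**2
K2_safe = K2.copy(); K2_safe[0, 0] = 1.0       # avoid division by zero (mean mode)

def ddx(f, k):                                  # spectral derivative along one axis
    return np.real(np.fft.ifft2(1j * k * np.fft.fft2(f)))

def lap(f):                                     # spectral Laplacian
    return np.real(np.fft.ifft2(-K2 * np.fft.fft2(f)))

u = np.sin(X) * np.cos(Y)                       # Taylor-Green initial velocity field
v = -np.cos(X) * np.sin(Y)

# 1) Predictor: intermediate velocity from convection and diffusion (explicit Euler).
u_star = u + dt * (-u * ddx(u, KX) - v * ddx(u, KY) + nu * lap(u))
v_star = v + dt * (-u * ddx(v, KX) - v * ddx(v, KY) + nu * lap(v))

# 2) Pressure Poisson equation: lap(p) = div(u*) / dt, solved in Fourier space.
div_star = ddx(u_star, KX) + ddx(v_star, KY)
p_hat = -np.fft.fft2(div_star / dt) / K2_safe
p_hat[0, 0] = 0.0                               # pressure defined up to a constant
p = np.real(np.fft.ifft2(p_hat))

# 3) Correction: project the intermediate velocity onto a divergence-free field.
u = u_star - dt * ddx(p, KX)
v = v_star - dt * ddx(p, KY)
print("max |div u| after projection:", np.abs(ddx(u, KX) + ddx(v, KY)).max())
```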
Numerous papers appeared in the literature over the past thirty years discussing
projection-type methods for solving the incompressible Navier-Stokes equations.
Many peculiarities related to the accuracy of the projection methods have been the focus of extensive research and discussion. Recurring difficulties encountered are the
proper choice of boundary conditions for the auxiliary variables in order to obtain
at least second-order accuracy in the computed solution and the formula for the
pressure correction at each time-step. E and Liu [39, 40] made a review analysing
several projection methods. They performed a normal-mode analysis of the semi-discrete (in time) Stokes equations employing the first-order Chorin's method and
the second-order incremental and non-incremental methods of Bell et al. [37] and
Kim and Moin [36], respectively. It revealed that since it is impossible to satisfy
the exact boundary condition for the pressure that follows from the semi-discrete equations, the pressure is polluted either by a spurious boundary layer around boundaries where Dirichlet boundary conditions are prescribed or by high-frequency
oscillations. Strikwerda and Lee [41] have also analysed the fractional step methods of Kim and Moin [36], van Kan [38] and Bell et al. [37], for the incompressible
Navier-Stokes equations. Their study shows that the pressure in any projection method can be at best first-order accurate because boundary conditions cannot be exactly satisfied in the projection step. Although both analyses are restricted to
implicit schemes for the time advancement step, the conclusions extend to explicit schemes. Brown et al. have also analysed the accuracy of several incremental pressure projection methods [42], identifying the inconsistencies that contribute to reducing the order of accuracy of the pressure. The first-order error appears as a boundary layer in the numerical results. Simple modifications of existing methods were
also presented to eliminate first-order errors in the computed pressure near solid
boundaries.
Despite the controversy related to the order of accuracy of the projection methods, high-order finite difference numerical schemes have been presented, see
e.g. [43, 44, 45], and the formal order of accuracy demonstrated through numerical
experiments. The advantages of using higher-order accurate methods for the solution of partial differential equations are well known. Spectral methods are widely
used for problems with periodic boundary conditions. Higher-order finite difference
methods also present significant advantages over lower-order methods. To achieve
the high accuracy demanded by some engineering simulations, high-order spatial discretizations have gained interest. Higher-order accurate methods have better resolution characteristics in their difference approximations. The resolution
characteristics as reviewed by Lele [46] are related to the accuracy with which the
difference approximation represents the exact result over the full range of length
scales that can be realized on a given mesh. Consequently, higher-order difference
methods should require fewer nodes per wavelength compared with the second-order scheme for a given accuracy.
In finite-volume formulations, the resulting high-order finite difference equations are constructed by increasing the spatial accuracy of the flux approximations [47]. For second-order accurate methods, the time derivative of the volume-averaged velocity can be congruently replaced by the time derivative of the cell center velocity. This procedure is also second-order accurate. In a high-order formulation, it is essential to proceed with a high-order reconstruction of the point-wise velocity field. The reconstruction should be performed, at least, with the
same order of accuracy of the other operators to keep the desired formal accuracy
of the numerical scheme.
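A one-dimensional illustration of such a reconstruction (de-averaging) step is given below: using the relation between a cell average and the cell-center point value, the point value is recovered to fourth-order accuracy from three neighbouring averages. The formula shown is the generic 1-D one, given here only to illustrate the idea; the 2-D and 3-D de-averaging coefficients actually employed in this work are those listed in Appendix A.

```python
# Minimal 1-D illustration of high-order "de-averaging": recovering point values at
# cell centers from cell averages, using
#   u_bar_i = u(x_i) + h^2/24 * u''(x_i) + O(h^4)
# so that u(x_i) ~= u_bar_i - (u_bar_{i+1} - 2 u_bar_i + u_bar_{i-1}) / 24.
import numpy as np

def convergence(n):
    h = 2 * np.pi / n
    faces = np.linspace(0, 2 * np.pi, n + 1)
    centers = 0.5 * (faces[:-1] + faces[1:])
    u_bar = (np.cos(faces[:-1]) - np.cos(faces[1:])) / h   # exact cell averages of sin(x)
    exact = np.sin(centers)
    err2 = np.abs(u_bar - exact).max()                     # average taken as point value
    u_pt = u_bar - (np.roll(u_bar, -1) - 2 * u_bar + np.roll(u_bar, 1)) / 24.0
    err4 = np.abs(u_pt - exact).max()                      # de-averaged point value
    return err2, err4

for n in (32, 64, 128):
    print(n, convergence(n))   # err2 drops ~4x per refinement, err4 drops ~16x
```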
A local average-based procedure was introduced by Denaro et al. [48] for
the solution of incompressible Navier-Stokes equations in the framework of Large
Eddy Simulations (LES). This approach was developed in References [49] and [50]
where a fourth-order finite-volume method, based on the approximate deconvolution of the integral Navier-Stokes equations, is presented. The deconvolved integral Navier-Stokes equations are solved after discretization in a co-located mesh
arrangement by means of a second-order accurate semi-implicit scheme for the
time integration and a pressure-free velocity-pressure decoupling. A deconvolution procedure was also proposed by Pereira et al. [51] in the context of a compact
fourth-order fully coupled finite-volume method, where the solution proceeds based
on variable cell averages and point-wise values are recovered at the end of the computation.
A projection-type fourth-order accurate numerical scheme to solve the integral
form of the incompressible Navier-Stokes equations can provide an efficient tool
for fluid flow simulations. An explicit formulation for time advancement, like the
fourth-order Runge-Kutta scheme, will contribute to an efficient parallel performance of the code.
1.5 Objectives and contributions
Unsteady fluid flow phenomena are important for a wide range of engineering
problems, and tools such as numerical simulation play a vital role in providing solutions. For some applications, such as feedback control processes, it would be beneficial to obtain multidimensional fluid flow solutions faster than real time. However, even without real-time applications in mind, reducing the computing time to solve unsteady flow problems is always beneficial, as it makes it possible to study increasingly large and complex problems.
The advent of an accurate, efficient and robust method for parallel-in-time calculations will extend the parallelization options in the context of CFD calculations. Not only could the parallel-in-time or time-domain decomposition option be selected, but also the hybridization of the time and space parallel strategies.
Another important issue related to the numerical simulation of unsteady incompressible flows is the use of high-order accurate methods. For a given accuracy,
high-order formulations require less storage and computing time per time-step than
a second-order scheme. However, for finite-volume formulations it is essential to
include a high-order reconstruction procedure to keep the desired accuracy of the
numerical scheme.
The main objectives of the work presented in the Thesis can be summarized as
follows:
i) to perform the parallel solution of the unsteady, incompressible Navier-Stokes
equations by the spatial, temporal or hybrid (simultaneous space and time)
domain decomposition;
ii) to analyse the consistency, stability, convergence, efficiency and robustness
of the temporal domain decomposition method;
iii) to develop a fourth-order accurate, in space and time, finite-volume numerical scheme for the solution of the unsteady, incompressible Navier-Stokes
equations.
Some of the work presented in the Thesis has also been published in References
[29, 31, 52, 53] where further details and applications can be found.
1.6 Contents
The present Thesis is divided into five Chapters. Chapter 1 introduces the work
performed and reported in the Thesis.
Chapter 2 describes the numerical methods used for the solution of the unsteady, incompressible Navier-Stokes equations in the framework of a spatial,
temporal or hybrid domain decomposition. The governing equations for an unsteady, incompressible fluid flow are presented in Section 2.1. Section 2.2 presents
the numerical methods adopted for the parallel solution of the governing equations.
The spatial and temporal discretization schemes considered for the parallel-in-time
fluid flow simulations are summarized in Sections 2.3 and 2.4, respectively. Section 2.5 is devoted to the pressure correction step of the projection method. A
fourth-order accurate finite-volume numerical scheme for the solution of the unsteady, incompressible Navier-Stokes equations is described with detail in Section
2.6. Section 2.7 summarizes the Chapter.
Chapter 3 describes the main topics related to the space domain decomposition method for parallel fluid flow simulation. The parallel solution of systems of
equations is discussed in Section 3.1. Fluid flow simulations performed with the
fourth-order accurate numerical scheme under the framework of the spatial domain
decomposition technique are included in Section 3.2 to analyse the accuracy of the
method. The two-dimensional Taylor vortex decay problem allows verifying the
accuracy of the numerical scheme. Strong non-linear test cases, the interaction
between co- and counter-rotating vortices, were also selected to verify the increase
of the numerical scheme accuracy promoted by the inclusion of a high-order de-averaging procedure. Detailed conclusions close the Chapter.
Chapter 4 is devoted to the presentation of the parallel-in-time method for the
solution of the unsteady, incompressible Navier-Stokes equations. Section 4.1 in-
cludes the detailed presentation of the numerical method applied to the solution of
the unsteady, incompressible Navier-Stokes equations. The stability of the method
is discussed in Section 4.2. The following Sections are devoted to the analysis of the accuracy of the method and to the proposal of a performance model. The Chapter closes with the
presentation of the results of numerical experiments in Section 4.5 and conclusions
in Section 4.6.
Chapter 5 closes the Thesis with summarizing conclusions in Section 5.1 and
suggestions for future work in Section 5.2.
Chapter 2
Numerical methods
This Chapter describes the numerical methods used for the solution of the unsteady, incompressible Navier-Stokes equations on the framework of the spatial,
temporal or hybrid domain decomposition techniques. The governing equations
for unsteady, incompressible fluid flows are presented in Section 2.1. The main
topics related to the numerical methods used for the solution of the unsteady, incompressible fluid flow governing equations are introduced in Section 2.2. Sections
2.3 and 2.4 include the spatial and temporal schemes considered for the discretization of the governing equations. Section 2.5 is devoted to the pressure correction
step of the projection method. A fourth-order accurate finite-volume numerical
scheme for the solution of the unsteady, incompressible Navier-Stokes equations
based on the projection method is derived in Section 2.6. The Chapter closes with a summary in Section 2.7.
2.1 Governing equations
The equations of conservation of mass and momentum describe the viscous flow
of a pure isothermal fluid. For an arbitrary control-volume, conservation of mass
requires that the rate of change of mass within the control-volume is equal to the
mass flux crossing the control-surface,

∂/∂t ∫_CV ρ dV = ∫_CS ρ u · n dS ,    (2.1)
where ρ, u, CV, CS and n denote the density, velocity, control-volume, control-surface and a unit vector normal to the control-surface and directed outwards,
respectively.
For incompressible flows, constant density, Eq. (2.1) reduces to:

∫_CS u · n dS = 0 .    (2.2)
The momentum conservation equation, derived from Newton's second law of motion, for an arbitrary control-volume CV is:

∂/∂t ∫_CV ρ u dV + ∫_CV ∇ · (ρ u u) dV = Σ F ,    (2.3)

where contributions to the summation Σ F come from forces acting at the surface of the control-volume and throughout the volume.
For incompressible Newtonian fluid flows, the momentum conservation equation, Eq. (2.3), is:

∂/∂t ∫_CV ρ u dV + ∫_CS ρ u (u · n) dS = − ∫_CS p n dS + ∫_CS µ ∇u · n dS + ∫_CV ρ b dV ,    (2.4)
denoting by µ the viscosity, b the body forces per unit mass and p the pressure.
For flows accompanied by heat transfer, an equation for energy conservation,
usually with the temperature θ as dependent variable, must be added to complete
the set of governing equations. Neglecting the work done by pressure and viscous
forces, the equation for temperature reduces to a scalar conservation equation. The
integral form of the equation describing conservation of a scalar quantity φ reads:
∂/∂t ∫_CV ρ φ dV + ∫_CS ρ φ u · n dS = Σ f_φ ,    (2.5)

where Σ f_φ represents transport of φ by mechanisms other than convection and
any sources or sinks of the scalar. Diffusive transport is described by a gradient
approximation. Fourier's law for heat diffusion reads:

f_θ^{diff} = ∫_CS κ ∇θ · n dS ,    (2.6)
where κ is the thermal conductivity. Therefore, neglecting the existence of sources
or sinks, the heat conservation equation is:
∂/∂t ∫_CV ρ c_p θ dV + ∫_CS ρ c_p θ u · n dS = ∫_CS κ ∇θ · n dS .    (2.7)
Considering constant specific heat and thermal conductivity, this equation can
be rewritten as:
∂/∂t ∫_CV θ dV + ∫_CS θ u · n dS = ∫_CS (κ/(ρ c_p)) ∇θ · n dS .    (2.8)
The finite-difference methods consider the differential form of the governing
equations that are obtained as follows. Applying the Gauss theorem to Eq. (2.2),
the surface integral may be replaced by a volume integral,
∫_CV ∇ · u dV = 0 .    (2.9)
Since Eq. (2.9) is valid for any size of the control-volume, it implies that:
∇ · u = 0 .    (2.10)
The vector form of the momentum equation, Eq. (2.4), is obtained applying
the Gauss divergence theorem to the convective and diffusive terms:

∂(ρu)/∂t + ∇ · (ρ u u) = −∇p + µ ∇ · (∇u) + ρ b .    (2.11)

The vector form of the temperature convection-diffusion equation is:

∂θ/∂t + ∇ · (θ u) = ∇ · ((κ/(ρ c_p)) ∇θ) .    (2.12)
Under the considered assumptions, in Cartesian coordinates and tensor notation the differential form of the governing equations for an unsteady, incompressible
viscous flow is:

∂u_i/∂x_i = 0 ,    (2.13)

∂(ρu_i)/∂t + ∂(ρu_i u_j)/∂x_j = −∂p/∂x_i + µ ∂²u_i/(∂x_j ∂x_j) + ρ b_i ,    (2.14)

∂θ/∂t + ∂(θu_j)/∂x_j = (κ/(ρ c_p)) ∂²θ/(∂x_j ∂x_j) ,    (2.15)
for mass, momentum and thermal energy conservation, respectively.
Considering the Boussinesq approximation, see e.g. Lesieur et al. [54], the
density is treated as a constant in the unsteady and convection terms and as a
variable in the body forces term. Assuming that the density varies linearly with
the temperature, the contribution for the body forces will be given by:
(ρ − ρ_0) g_i = −ρ_0 g_i β (θ − θ_0) ,    (2.16)
where gi is the ith component of the gravity acceleration, ρ0 stands for the density at the reference temperature θ0 and β is the thermal expansion coefficient.
When the only body force to be considered is the buoyancy force, this term of the
momentum equation is given by:
ρ b_i = −ρ_0 g_i β (θ − θ_0) .    (2.17)

2.2 Numerical solution of the Navier-Stokes equations
The lack of a dedicated pressure evolution equation in the governing set of equations is responsible for the major difficulties encountered in obtaining a time-accurate prediction for an incompressible, unsteady flow problem. When solving the unsteady, incompressible form of the Navier-Stokes equations, pressure provides the coupling between the momentum and mass conservation equations. This coupled system can be solved iteratively using methods such as SIMPLE [55] or PISO [32], or by the "divide and conquer" approach, which has different names under different modifications: fractional-step, operator splitting, projection method, etc. Both approaches
will be considered for the solution of the unsteady fluid flow problems included in
Chapter 4 to analyse the properties of the parallel-in-time decomposition method.
The SIMPLE formulation was introduced by Patankar and Spalding [56] and
described in detail by Patankar [55]. The acronym SIMPLE stands for Semi-Implicit Method for Pressure Linked Equations. The iterative procedure can be
interpreted as a pseudo-transient treatment of the governing equations to obtain
a steady-state solution. The SIMPLE procedure, although most suited to steady
problems, may be easily extended to unsteady problems. The SIMPLE method
will be considered for the parallel-in-time fluid flow simulations when larger
time-steps are required by the coarse time-grid predictor step.
The projection methods are based on the decomposition of the equation operators. The momentum equations are first updated, calculating an intermediate velocity that, in general, will not satisfy mass conservation. This intermediate velocity field is then corrected to enforce mass conservation and the pressure field is adjusted. The incremental pressure-correction scheme is used in the projection methods considered for the unsteady fluid flow simulations. The "old" pressure gradient is considered in the first step and then corrected in the second step. This procedure became popular after Van Kan [38], who proposed a second-order incremental
pressure-correction scheme.
Another distinctive feature of the numerical schemes devoted to the solution
of the Navier-Stokes equations is the way they treat the convective and diffusive
terms. The projection methods considered for the numerical simulations reported
in Chapters 3 and 4 range from explicit to implicit formulations. The numerical
scheme for each flow simulation included in Chapters 3 and 4 will be stated with
the definition of the test case.
2.3 Spatial discretization
Two types of grid layout, co-located and staggered grids, may be applied to discretize the appropriate set of equations, equations (2.13), (2.14) and (2.15) for
finite-difference formulations or equations (2.2), (2.4) and (2.8) for finite-volume
formulations. Both grid layouts, represented in Fig. 2.1 for the two-dimensional case, will be used for parallel-in-time flow simulation.
Figure 2.1: Co-located grid (left) and staggered grid (right).
In the co-located grid arrangement, all dependent variables are located at the
same physical location. The co-located arrangement appears to be more natural
and simple, allowing a small amount of interpolation when compared with the discretization on a staggered grid. The computer programming for staggered grids,
where the velocities are centered between the pressure locations, appears to be more complex than for a co-located arrangement due to the different indexing required by each velocity component. The main reason to use this arrangement is that it prevents "odd-even coupling" (sometimes referred to as "checkerboarding") between the
pressure and the velocity fields, which arises on co-located arrangements.
The ”odd–even decoupling problem” needs to be addressed when computing incompressible Navier–Stokes equations on a co-located grid. Various approaches
may be used to overcome the problem of ”odd–even decoupling” between the pressure and velocity fields. The technique of interpolating the cell-face velocities via
”momentum interpolation”, first proposed by Rhie and Chow [57], is a popular
scheme to achieve this. Another approach, among others, consists in filtering out the oscillations. When discretizing the governing equations on a co-located grid,
the solution adopted for this problem follows the method proposed by Ye et al.
[58]. For a finite-volume second-order discretization scheme, the method can be
outlined as follows:
- The face-center velocity is used to compute the convective flux from each cell
in the finite-volume discretization scheme;
- Following the time advance step, the intermediate face-center velocity is computed by interpolating the intermediate cell-center velocity;
- Once the pressure is obtained by solving the pressure Poisson equation, both
the cell-center and face-center velocities are updated separately;
- The updated face-center velocity is used to compute the convective flux at
the next time step.
In addition to developing a stable form of the discrete equations, the temporal
and spatial discretization schemes were selected in order to preserve a congruent
order of accuracy to the scheme. As will be described in Section 2.4, first-, second- and fourth-order accurate schemes were considered for the temporal advancement
of the momentum equations. Therefore, first-, second- and fourth-order accurate
schemes were considered for the spatial discretization.
The first-order ”upwind ” spatial discretization is employed together with the
first-order temporal implicit and explicit Euler schemes. The second-order central difference scheme is considered for the Adams-Bashforth, Crank-Nicolson and
three-level implicit time discretization schemes. An advantage of the central difference scheme over non-centered schemes is that it is relatively free of numerical
dissipation. The standard second-order accurate staggered grid finite-difference
scheme conserves mass, momentum and kinetic energy [59]. While this improves
the accuracy of the scheme, it can also lead to non-physical oscillations if an insufficient grid refinement is used [55]. For the steady one-dimensional convection-diffusion
equation, the central difference scheme will give realistic solutions as long as the
cell Peclet number,

Pe = u ∆h / γ ,    (2.18)
is kept less than two, where ∆h is the grid cell dimension and γ is the kinematic
diffusion coefficient. However, it is possible to obtain good results with P e > 2,
as long as the oscillations are significantly smaller than the other structures in
the flow. When necessary, the deferred correction method proposed by Khosla
and Rubin [60] was applied to the convection discretization to avoid non-physical oscillations.
The fourth-order accurate central difference discretization scheme is considered
with the fourth-order explicit Runge-Kutta time discretization in the framework
of a fourth-order accurate finite-volume numerical scheme that will be derived in
Section 2.6.
2.4 Time advancement of momentum equations
A fully explicit scheme for both convection and diffusion terms has the advantage
that no matrix inversion is required. However, all explicit methods require consideration of the general stability constraints from linear analysis. The Neumann
diffusive criterion links the time-step and the square of the grid size. The maximum usable time-step is proportional to the characteristic diffusion time, (∆h)^2/γ,
where ∆h is the minimum grid cell dimension and γ is the kinematic diffusion
coefficient. For convection terms the maximum time-step is proportional to the
characteristic convection time ∆h/u. This condition is usually described in terms
of the Courant-Friedrichs-Lewy number, CFL = u∆t/∆h.
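As a small illustration of how the two constraints are combined in practice, the admissible time-step can be estimated as in the following Python sketch; the safety factors are assumed values, not parameters taken from the present work.

def stable_time_step(dh, u_max, gamma, cfl_max=1.0, dif_max=0.5):
    # Illustrative time-step selection from the two linear-stability limits
    # quoted above: dt_conv ~ dh/u (CFL condition) and dt_diff ~ dh**2/gamma.
    # The safety factors cfl_max and dif_max are assumed, not prescribed here.
    dt_conv = cfl_max * dh / u_max
    dt_diff = dif_max * dh ** 2 / gamma
    return min(dt_conv, dt_diff)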
Denoting by H the discrete convection operator, G the discrete gradient operator for pressure and L the discrete diffusion operator, the following explicit and
implicit schemes are used in this work for the time advancement of the momentum
equations:
(i) Explicit methods:

- Explicit Euler scheme

(u^{n+1} − u^n)/∆t + H(u^n) = L(u^n) − G(p^{n−1/2})    (2.19)
- Adams-Bashforth scheme

(u^{n+1} − u^n)/∆t + [ (3/2) H(u^n) − (1/2) H(u^{n−1}) ] = [ (3/2) L(u^n) − (1/2) L(u^{n−1}) ] − G(p^{n−1/2})    (2.20)
- Fourth-order Runge-Kutta scheme
The fourth-order Runge-Kutta scheme will be described later in subsection 2.6.2.
(ii) Implicit methods:
- Implicit Euler scheme

(u^{n+1} − u^n)/∆t + H(u^{n+1}) = L(u^{n+1}) − G(p^{n−1/2})    (2.21)
- Crank-Nicolson scheme

(u^{n+1} − u^n)/∆t + [ (1/2) H(u^{n+1}) + (1/2) H(u^n) ] = [ (1/2) L(u^{n+1}) + (1/2) L(u^n) ] − G(p^{n−1/2})    (2.22)
- Three-level implicit scheme

(3u^{n+1} − 4u^n + u^{n−1})/(2∆t) + H(u^{n+1}) = L(u^{n+1}) − G(p^{n−1/2})    (2.23)
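As an illustration of how the schemes above are used, the following minimal Python sketch advances the momentum equations by one Adams-Bashforth step, Eq. (2.20); the discrete operators H, L and G are assumed to be supplied as callables by the surrounding solver.

def adams_bashforth_step(u_n, u_nm1, p_half, dt, H, L, G):
    # One explicit Adams-Bashforth advance of the momentum equations,
    # Eq. (2.20): u^{n+1} = u^n + dt [ 3/2 (L-H)(u^n) - 1/2 (L-H)(u^{n-1})
    #                                  - G(p^{n-1/2}) ].
    rhs = (1.5 * (L(u_n) - H(u_n))
           - 0.5 * (L(u_nm1) - H(u_nm1))
           - G(p_half))
    return u_n + dt * rhs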
2.5 Pressure correction
On a projection scheme, after the approximation of the velocity field, u∗ , obtained
by integration of the momentum equations using one of the schemes above, mass
conservation is enforced through a pressure correction step given by:
∫_CV (u^{n+1} − u^*)/∆t dV = − ∫_CV ∇p' dV ,    (2.24)

for a finite-volume discretization scheme, denoting by p' and u^* the pressure increment and the intermediate velocity, respectively.
The approximated velocity field is projected onto a subspace of divergence-free velocity fields, requiring that the final velocity field satisfies the integral mass
conservation equation:
∫_CS u^{n+1} · n dS = 0 .    (2.25)
The integral version of the pressure Poisson equation results in:

∫_CS ∇p' · n dS = (1/∆t) ∫_CS u^* · n dS .    (2.26)
The calculated pressure correction p' is then used to correct the velocity field,

u^{n+1} = u^* − ∆t ∇p' ,    (2.27)

and the pressure field,

p^{n+1/2} = p^{n−1/2} + p' .    (2.28)
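A minimal sketch of the pressure correction step, Eqs. (2.24) to (2.28), is given below for a doubly periodic, co-located grid with the Poisson equation solved spectrally. This is an illustration only; in the present work Eq. (2.26) is solved iteratively (Bi-CGStab/AZTEC) on staggered grids.

import numpy as np

def pressure_correction(u_star, v_star, p_old, dt, Lx, Ly):
    # Incremental pressure-correction step on a doubly periodic, co-located
    # grid, solved spectrally (an illustrative stand-in for the iterative
    # solution of Eq. (2.26) used in the thesis).
    ny, nx = u_star.shape
    kx = 2.0 * np.pi * np.fft.fftfreq(nx, d=Lx / nx)
    ky = 2.0 * np.pi * np.fft.fftfreq(ny, d=Ly / ny)
    KX, KY = np.meshgrid(kx, ky)
    k2 = KX ** 2 + KY ** 2
    k2[0, 0] = 1.0                                  # avoid dividing the mean mode

    u_hat, v_hat = np.fft.fft2(u_star), np.fft.fft2(v_star)
    div_hat = 1j * KX * u_hat + 1j * KY * v_hat     # divergence of u*
    dp_hat = -div_hat / (dt * k2)                   # lap(p') = div(u*)/dt, Eq. (2.26)
    dp_hat[0, 0] = 0.0                              # fix the pressure level

    u_new = np.real(np.fft.ifft2(u_hat - dt * 1j * KX * dp_hat))   # Eq. (2.27)
    v_new = np.real(np.fft.ifft2(v_hat - dt * 1j * KY * dp_hat))
    p_new = p_old + np.real(np.fft.ifft2(dp_hat))                  # Eq. (2.28)
    return u_new, v_new, p_new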
2.6 High-order finite-volume method
The integral form of the governing equations for an isothermal flow, Eq. (2.2)
and (2.4), is used to derive a fourth-order accurate finite-volume method. The
method can be outlined as follows. The governing equations are discretized on
a staggered cartesian uniform mesh. The time advancement of the momentum
equations is performed with the classical four-stage Runge-Kutta explicit scheme
and the convective and diffusive fluxes through control-volume faces are approximated by fourth-order accurate polynomial interpolation and Simpson’s rule of integration for high-order accuracy agreement. A high-order de-averaging procedure
is required to calculate the time derivative of the volume-averaged velocity congruently with the spatial and temporal discretization schemes. The de-averaging
coefficients calculation is based on the Taylor series expansion of the integrated
velocity values at cell and neighbourhood cells.
2.6.1 Spatial discretization
For one control-volume, the integral form of the momentum equation, Eq. (2.4),
reads:
∂/∂t ∫_CV u = Σ_{i∈S} L_i − Σ_{i∈S} H_i − G ,    (2.29)
where H_i and L_i stand for the convective and diffusive fluxes through the control-surfaces and G represents the pressure source term. For high-order approximation
of the convective and viscous fluxes through control-surfaces in Eq. (2.29), surface
integrals are calculated by the fourth-order accurate Simpson’s rule.
Figure 2.2: Labelling scheme for a 2D grid.
For the two-dimensional case, the integral over Se , see Fig. 2.2, is evaluated as:
∫_{S_e} f ds = (∆y/6) (f_ne + 4 f_e + f_se) ,    (2.30)
where f stands for the convective or diffusive approximations at the required locations. Therefore, the values of f are required at three locations, cell face-center
and two vertices, for each cell face. The variable values required for the convective
or diffusive approximations should be calculated with fourth-order accurate interpolation and derivatives in order to keep the fourth-order approximation of the
integral. The following expressions are used for fourth-order accurate evaluation
of the integrand in Eq. (2.30):
f_e = (27 f_P + 27 f_E − 3 f_W − 3 f_EE) / 48 ,    (2.31)

(∂f/∂x)_e = (27 f_E − 27 f_P + f_W − f_EE) / (24 ∆x) .    (2.32)
For the three-dimensional case, more values of f are required for a fourth-order
approximation of the face integral. The control-volume face integral is calculated
by:

∫_{S_e} f ds = (∆y ∆z / 36) [ f_end + f_esd + f_enu + f_esu + 4 (f_en + f_ed + f_es + f_eu) + 16 f_e ] ,    (2.33)
considering the notation indicated in Fig. 2.3. The required values on the
edges and vertices are obtained by fourth-order accurate interpolation as in the
two-dimensional case.
Figure 2.3: Labelling scheme for 3D face integral evaluation.
2.6.2 Time advancement of momentum equations
The time advancement of the momentum equations is performed with the classical explicit four-stage Runge-Kutta scheme. It is generally considered that the
fourth-order Runge-Kutta method provides a good balance of computational effort,
precision and storage requirement. As an explicit one-step multistage method, it
does not require any special procedure to start the calculation. The basic idea of
multistage methods is to create a weighted sum of corrections ∆χ to the solution at several stages within the time-step, χ^{n+1} = χ^n + C_1 ∆χ' + C_2 ∆χ'' + C_3 ∆χ''' + . . . The coefficients C_k are calculated by matching this expansion with the corresponding expansion by Taylor series for the desired order of accuracy. The Runge-Kutta
methods tend to be non-unique because a large number of parameters are considered during derivation. Considering the initial value problem,
dχ/dt = F(t, χ) ,    (2.34)
with prescribed initial condition χ(t = 0) = χ_0, the classical fourth-order Runge-Kutta algorithm is:

∆χ' = ∆t F(t^n, χ^n) ,    (2.35)

∆χ'' = ∆t F(t^n + ∆t/2, χ^n + ∆χ'/2) ,    (2.36)

∆χ''' = ∆t F(t^n + ∆t/2, χ^n + ∆χ''/2) ,    (2.37)

∆χ'''' = ∆t F(t^n + ∆t, χ^n + ∆χ''') ,    (2.38)

χ^{n+1} = χ^n + (1/6) (∆χ' + 2∆χ'' + 2∆χ''' + ∆χ'''') .    (2.39)
For a finite-volume formulation, it is required to perform the integration of the point-wise initial velocity field before starting the time-stepping calculation. After interpolation to the required positions on the control-surfaces, the following expression is used to approximate the integral for the two-dimensional case:

∫_CV u_i dV = (∆x ∆y / 36) [ (u_i)_en + (u_i)_es + (u_i)_wn + (u_i)_ws + 4 ((u_i)_e + (u_i)_w + (u_i)_n + (u_i)_s) + 16 (u_i)_P ] .    (2.40)
For the three-dimensional case, the integral is calculated by:

∫_CV u_i dV = (∆x ∆y ∆z / 6) [ (u_i)_n + (u_i)_s + (u_i)_w + (u_i)_e + (u_i)_u + (u_i)_d ] .    (2.41)
The application of the Runge-Kutta method for the time-stepping solution of the Navier-Stokes equations requires the evaluation of the convective and diffusive contributions at four stages during each time-step. Denoting by û the term ∫_Ω u on the LHS of Eq. (2.29), by F(t, u) the convective and diffusive terms on the RHS of the same equation and by u', u'', u''' and p', p'', p''' the velocity and pressure variable values after the first, second and third stages, the solution proceeds according to the projection method and the pressure increment option as follows:
Stage 1

û^* = û^n + (∆t/2) F(t^n, u^n) − (∆t/2) G(p^{n−1/2})    (2.42)

∇ · û^* = 0    (2.43)

û' = û^* − (∆t/2) G(p' − p^{n−1/2})    (2.44)

Stage 2

û^{**} = û^n + (∆t/2) F(t^{n+1/2}, u') − (∆t/2) G(p')    (2.45)

∇ · û^{**} = 0    (2.46)

û'' = û^{**} − (∆t/2) G(p'' − p')    (2.47)

Stage 3

û^{***} = û^n + ∆t F(t^{n+1/2}, u'') − ∆t G(p'')    (2.48)

∇ · û^{***} = 0    (2.49)

û''' = û^{***} − ∆t G(p''' − p'')    (2.50)

Stage 4

û^{****} = û^n + (∆t/6) F(t^n, u^n) + (∆t/3) F(t^{n+1/2}, u') + (∆t/3) F(t^{n+1/2}, u'') + (∆t/6) F(t^{n+1}, u''') − ∆t G(p''')    (2.51)

∇ · û^{****} = 0    (2.52)

û^{n+1} = û^{****} − ∆t G(p^{n+1/2} − p''')    (2.53)
After each time-advance stage, a pressure correction step is performed to impose
the mass conservation. The integrated intermediate velocity is used to calculate
mass fluxes for the source terms of the pressure Poisson equation, Eq. (2.26). For
a two-dimensional problem, a 25-point stencil is required to obtain consistent discretizations of the gradient and divergence operators in Eq. (2.26). This is a very expensive calculation for an unsteady time-stepping simulation, even applying a deferred correction approach. The gradient operator discretization was consequently performed with 9- and 13-point stencils for two-dimensional and three-dimensional domains, see Fig. 2.4 for the two-dimensional case. As will be verified in Chapter 3, the inconsistency introduced here did not deteriorate the formal order of
accuracy of the numerical scheme. The solution of the linear system of equations
resulting from the finite volume/difference analogue of Eq. (2.26) is performed by
the Bi-CGStab [61] algorithm included in the AZTEC library [62].
After the solution of the pressure Poisson equation, the new averaged divergence
free intermediate velocity field is calculated by correcting the intermediate velocity
field prediction with the gradient of the computed pressure correction. Finally, a
new point-wise velocity field is reconstructed by de-averaging the corrected velocity
field. The velocity de-averaging operation is described in the following section.
2.6.3 Velocity de-averaging
The new control-volume cell center velocity is calculated as a weighted sum of the
volume integrated velocity at cell and neighbourhood cells
u = (1/V) Σ_i C_i û_i ,    (2.54)
where V denotes the volume of each cell.
The Taylor series expansion of the integrated velocity values at cell and neighbourhood cells allows the calculation of the de-averaging coefficients, C_i, in Eq. (2.54).
The derivation of the de-averaging coefficients was performed with symbolic computing software (MATHEMATICA). Appendix A includes the de-averaging coefficients calculated for two- and three-dimensional cartesian uniform grids and
details of the calculation. The de-averaging coefficients were calculated to obtain sixth-order accuracy for cells far from the boundaries of the domain.
Figure 2.4: Nine-point stencil for the pressure increment gradient operator discretization for the pressure Poisson equation for the two-dimensional case.

Near the boundaries, only fourth-order accuracy is achieved due to the non-symmetric
stencil. The benefits to the accuracy of the numerical scheme resulting from the inclusion of the high-order de-averaging procedure will be verified through numerical experiments in Chapter 3.
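A one-dimensional analogue illustrates the weighted-sum form of Eq. (2.54): on a uniform grid the cell average and the cell-center value differ by (∆x^2/24) u'' + O(∆x^4), so a fourth-order de-averaging of the averaged field can be sketched as below. The actual sixth-order two- and three-dimensional coefficients are those derived in Appendix A, not the ones implied here.

import numpy as np

def deaverage_1d(u_bar):
    # 1D, fourth-order accurate de-averaging on a uniform grid:
    #   u_P ~ ubar_P - (ubar_W - 2 ubar_P + ubar_E) / 24 .
    # Boundary cells are left as the averaged values in this sketch.
    u = u_bar.copy()
    u[1:-1] = u_bar[1:-1] - (u_bar[:-2] - 2.0 * u_bar[1:-1] + u_bar[2:]) / 24.0
    return u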
2.7 Summary
The governing equations and the numerical methods used for the solution of the
unsteady, incompressible form of the Navier-Stokes equations on the framework of
a spatial, temporal or hybrid domain decomposition technique were presented in
this Chapter.
The first- and second-order schemes considered for the spatial and temporal discretization are widely used for CFD calculations. Consequently, these discretization schemes were briefly summarized.
A fourth-order accurate finite-volume numerical scheme was developed for the
solution of the unsteady, incompressible Navier-Stokes equations. The scheme
is based on the projection method. The time advancement of the momentum
equations is performed with the classical explicit four-stage Runge-Kutta scheme
and the convective and viscous fluxes through control-volume faces are approximated by Simpson’s rule and fourth-order accurate polynomial interpolation and
derivatives. The inclusion of a high-order integration/de-averaging procedure in a high-order finite-volume unsteady, incompressible flow solver is required to approximate the time derivative of the volume-averaged velocity congruently with
the accuracy of the other operators. The de-averaging coefficients calculation is
based on the Taylor series expansion of the volume-integrated velocity at cell and
neighbourhood cells.
Chapter 3
Space domain decomposition
This Chapter describes the main topics related to the space domain decomposition method for parallel fluid flow simulation. Section 3.1 is devoted to the analysis of the methods for the parallel solution of systems of equations. Section 3.2 includes numerical fluid flow simulations performed with a finite-volume fourth-order accurate numerical scheme under the framework of the spatial domain decomposition technique. The main purpose of the numerical experiments is to evaluate the accuracy increase resulting from the inclusion of a high-order de-averaging step in the numerical scheme. The Chapter closes with detailed conclusions.
3.1 Parallel solution of systems of equations
When the parallelization procedure is based on a spatial grid partitioning strategy,
the computational domain is partitioned and distributed to several sub-domains,
each one assigned to one processor. Each processor contains a sub-grid surrounded
by some "ghost" grid points that are duplicates of grid points held by other processors. At the start of each time-step, the calculation on all processors is synchronized
and message exchange is performed for grid points on partition boundaries. For
an explicit time advancement method, the parallelization of the calculation procedure for the intermediate velocity fields is straightforward. However, the pressure
correction step is a crucial issue for the efficient use of a parallel machine. Figure
3.1 shows schematically the flow chart for a single-step explicit time advancement
method.
Figure 3.1: Parallel code flow chart for an incremental-pressure projection method (cycle A exists only for the explicit outer iteration coupling).
The solution of the Poisson equation for pressure has the potential to degrade
the performance, i.e. the achieved speed-up, of a parallel algorithm because this equation requires global communication among the processors.
The most CPU-time consuming part of explicit projection methods is the solution
of a discrete pressure Poisson equation with Neumann-type boundary conditions
formulated to satisfy the continuity equation. More than 80% of the total computational time is usually consumed by the iterative solution of the Poisson equation.
There are basically three strategies for solving band-diagonal linear systems in parallel: direct methods, whereby data are exchanged at processor boundaries then
local solutions are obtained and joined back together; transpose methods, whereby
the data are rearranged to give each processor all of the data it needs to compute
a complete solution in serial; and iterative methods, whereby boundary data are
exchanged and an initial guess is then iterated to convergence.
The availability of reliable packages, like the AZTEC library [62], to numerically solve large linear systems in parallel makes the parallelization quickly achievable. The AZTEC library includes a number of Krylov iterative methods, such as
conjugate gradient (CG), generalized minimum residual (GMRES) and stabilized
bi-conjugate gradient (Bi-CGStab), to solve systems of equations. These Krylov
methods can be used in conjunction with various pre-conditioners such as polynomial or domain decomposition methods using LU or incomplete LU factorizations
within sub-domains.
In general, three different methods can be considered for the coupling of the sub-domains. With the first one, explicit outer iteration coupling, each sub-system of
equations is solved independently and data at the partition boundaries are exchanged after each outer iteration (cycle A in Fig. 3.1). With the second method,
explicit inner iteration coupling, data transfer is performed after each inner iteration, achieving a better coupling of sub-domains with more data exchange. If
the global system of equations is solved, sometimes designated as implicit inner
iteration coupling, a strong coupling is achieved with a large amount of data exchange. Since the coupling method also affects the convergence behaviour, the
comparative parallel performance with each one of the methods described above
is problem dependent [63].
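As an illustration of the iterative solution step, the sketch below solves a sparse linear system with SciPy's Bi-CGStab standing in for the AZTEC implementation used in this work; the assembled five-point Laplacian and the right-hand side are assumed data, purely for illustration.

import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import bicgstab

n = 64                                              # grid nodes per direction (assumed)
I = sp.identity(n)
T = sp.diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(n, n))
A = (sp.kron(I, T) + sp.kron(T, I)).tocsr()         # 2D five-point Laplacian stand-in

rhs = np.random.default_rng(0).standard_normal(n * n)
p, info = bicgstab(A, rhs)                          # info == 0 signals convergence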
Although problem dependent, some preliminary tests were performed to evaluate the performance of the solution with the Bi-CGStab algorithm [61] included in the AZTEC library and to compare it with the performance of a solver routine, written in FORTRAN for this purpose and also based on the Bi-CGStab algorithm. A parallel two-dimensional second-order accurate explicit code for the solution of the unsteady, incompressible Navier-Stokes equations was initially developed to conduct the tests. The flow configuration considered for the tests is the Taylor vortex-decay
problem [64]. The analytical solution for this flow is:
u(x, y, t) = − cos(x) sin(y) e^{−σt} ,    (3.1)

v(x, y, t) = sin(x) cos(y) e^{−σt} ,    (3.2)

p(x, y, t) = −(1/4) (cos(2x) + cos(2y)) e^{−2σt} ,    (3.3)
for σ = 2/Re. Computations were performed with Re = 1 considering the domain
0 ≤ x, y ≤ π. Initial and boundary conditions are prescribed according to Eqs. (3.1) to (3.3).
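The analytical fields of Eqs. (3.1) to (3.3), used to set the initial and boundary conditions and to measure discretization errors, can be evaluated with a few lines (a sketch only):

import numpy as np

def taylor_vortex(x, y, t, Re=1.0):
    # Analytical Taylor vortex-decay solution, Eqs. (3.1)-(3.3), with sigma = 2/Re.
    sigma = 2.0 / Re
    u = -np.cos(x) * np.sin(y) * np.exp(-sigma * t)
    v = np.sin(x) * np.cos(y) * np.exp(-sigma * t)
    p = -0.25 * (np.cos(2.0 * x) + np.cos(2.0 * y)) * np.exp(-2.0 * sigma * t)
    return u, v, p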
Figure 3.2: Dependence of the computing time on the number of processors (128 × 128 nodes mesh).
Figure 3.2 shows the dependence on the number of processors of the
computing time required for the solution of one time-step on a 128 × 128 nodes
mesh. In the Figure, local solution indicates an explicit outer iteration coupling
using the FORTRAN routine, local/AZTEC solution indicates an explicit outer
iteration coupling using the AZTEC library. The Figure shows also the computing
time required by the global solution method performed with the AZTEC library.
Figures 3.3 and 3.4 show the parallel speed-up and efficiency obtained with the
considered methodologies. Results showed that the calculation performed with the
AZTEC library could be up to about three times faster than the one performed
with the FORTRAN routine, even for a small number of mesh nodes. Consequently, the option of using the purpose-written FORTRAN routine to solve the systems of equations was discarded.
Another set of tests was conducted to compare the performance of the AZTEC solver for the explicit outer iteration coupling scheme and for the global solution on a more refined mesh. Figures 3.5 and 3.6 show the parallel speed-up and efficiency for a 512 × 512 nodes mesh. The tests allowed the conclusion that it is probably
advantageous to solve the systems of equations with the global solution method
and the AZTEC library for most of the cases.
Figure 3.3: Dependence of the achieved speed-up on the number of processors (128 × 128 nodes mesh).
Figure 3.4: Dependence of the parallel efficiency on the number of processors (128 × 128 nodes mesh).
Figure 3.5: Dependence of the achieved speed-up on the number of processors for a 512 × 512 nodes mesh.
Figure 3.6: Dependence of the parallel efficiency on the number of processors for a 512 × 512 nodes mesh.
3.2 High-order finite-volume numerical simulations
A fourth-order accurate finite-volume numerical scheme for the unsteady, incompressible form of the Navier-Stokes equations was derived in Section 2.6. The
spatial domain decomposition technique was applied to parallelize the solver, reducing the computing time. The present Section is devoted to the evaluation and analysis of the proposed numerical scheme. Three test cases are considered for that purpose:
(i) The Taylor vortex-decay problem has an analytical solution and allows evaluating the order of accuracy of the numerical scheme.
(ii) The second case is the simulation of the interaction between a pair of counter-rotating vortices.
(iii) Finally, the merging process resulting from the interaction between a pair of
co-rotating vortices constitutes the last test case.
The main purpose of the simulations is to verify the increase in the accuracy of the numerical scheme promoted by the inclusion of the high-order de-averaging procedure rather than to investigate the physics of wake vortices interaction.
3.2.1 Two-dimensional Taylor vortex decay problem
The two-dimensional Taylor vortex-decay problem [64] was used to evaluate the
accuracy of the presented numerical scheme. The analytical solution for this flow
is given by Eqs. (3.1) to (3.3). Computations were performed with Re = 1 until t = 0.35, considering the domain −(3/2)π ≤ x, y ≤ (3/2)π indicated in Fig. 3.7, for grid sizes from 16 × 16 up to 256 × 256 nodes. The time-step was set to 10^{−3} on the coarser mesh and the initial Courant number was kept constant for the refined meshes. Dirichlet boundary conditions are prescribed according to Eqs. (3.1) and (3.2). Simulations were also performed with the same parameters but considering the cell center point-wise velocity values as the averaged values for each control-volume. This is hereafter referred to as the second-order de-averaging procedure.
Figure 3.7: The computational domain considered for the two-dimensional Taylor vortex-decay problem.

The comparison between the results achieved with each formulation allows the evaluation of the influence on the
accuracy of the method resulting from the inclusion of the high-order de-averaging
procedure described in Section 2.6.
Figure 3.8 plots the maximum error, L∞ norm, of u-velocity and pressure at
the final computed time as a function of the mesh refinement. The Figure shows the important error reduction introduced by the inclusion of the high-order de-averaging procedure into the numerical scheme, allowing fourth-order accuracy in both
velocity and pressure. The numerical scheme becomes only second-order accurate
when the second-order de-averaging procedure is applied.
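The order of accuracy shown in Fig. 3.8 is obtained from the error norms on successive grids in the usual way; a trivial sketch, assuming the L∞ errors have already been computed:

import numpy as np

def observed_order(err_coarse, err_fine, refinement_ratio=2.0):
    # Observed order of accuracy from the maximum (L-infinity) errors on two
    # grids whose spacings differ by the given refinement ratio.
    return np.log(err_coarse / err_fine) / np.log(refinement_ratio)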
Figure 3.8: Maximum error, L∞ norm, of u-velocity and pressure at the final computed time: dependence on the mesh refinement.
3.2.2 Counter-rotating vortices interaction
The second test case considered is a pair of symmetric two-dimensional viscous
counter-rotating vortices. The initial flow field is created by the superposition of
two Lamb-Oseen vortices, i.e. axi-symmetric vortices with a Gaussian vorticity
distribution:
ω(r) = (Γ/(π a^2)) e^{−r^2/a^2} ,    (3.4)
where r is the radial distance from the center of each vortex, a the core radius and Γ
is the circulation. Symmetry boundary conditions are applied on the left boundary,
x = 0, and therefore only the right semi-plane is calculated. Velocities on the remaining boundaries are prescribed according to the superposition of the two vortices and kept constant during the simulation. The spatial domain [0, 100] m × [0, 100] m is discretized by a staggered cartesian uniform mesh.
Figure 3.9: Initial conditions and computational domain for the two-dimensional viscous counter-rotating vortices test case.
The core radius and the circulation of the vortices are set equal to 1 m and ±10 m^2 s^{−1}, respectively. The
kinematic viscosity is set to 2 × 10^{−5} m^2 s^{−1} and, consequently, the Reynolds number based on the vortex circulation is 5 × 10^5. The initial position of the vortex in the right semi-plane is x = 20 m, y = 50 m, and the simulations were performed up to time t = 100 s. Initial conditions and the computational domain are represented in Fig. 3.9.
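The initial field can be built directly from Eq. (3.4); a sketch for the counter-rotating pair with the parameters quoted above, where the mirror vortex at x = −20 m carries the opposite circulation:

import numpy as np

def lamb_oseen_pair(X, Y, a=1.0, gamma=10.0, x0=20.0, y0=50.0):
    # Vorticity of the counter-rotating pair obtained by superposing two
    # Lamb-Oseen vortices, Eq. (3.4); parameter values as quoted in the text.
    def omega(xc, yc, circ):
        r2 = (X - xc) ** 2 + (Y - yc) ** 2
        return circ / (np.pi * a ** 2) * np.exp(-r2 / a ** 2)
    return omega(x0, y0, gamma) + omega(-x0, y0, -gamma)

X, Y = np.meshgrid(np.linspace(0.0, 100.0, 256), np.linspace(0.0, 100.0, 256))
omega0 = lamb_oseen_pair(X, Y)          # initial vorticity on the right semi-plane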
Figure 3.10 shows the dependence of the temporal evolution of the maximum
adimensionalized vertical velocity component along a horizontal line going through
the vortex center on the spatial discretization, 128 × 128, 256 × 256 and 512 × 512
grid nodes, and on the order of the de-averaging procedure. As expected, the
velocity decay decreases with mesh refinement and for 128 × 128 and 256 × 256
spatial discretizations, the influence of the de-averaging scheme is important. The
temporal velocity decay calculated with the second-order de-averaging procedure
on the 512 × 512 spatial mesh size, not represented in the Figure, is very similar
to the one that is obtained with the fourth-order procedure.
Figure 3.10: Temporal evolution of the adimensionalized maximum vertical velocity component along a horizontal line going through the vortex center.

Figures 3.11 and 3.12 show the predicted vertical velocity component profile
after 100 s for 128 × 128 and 256 × 256 nodes meshes. The comparison with the
finer grid solution, 512 × 512 nodes, reveals the better accuracy of the velocity
profile predicted with the fourth-order method. Figures 3.10, 3.11 and 3.12 also
show that the difference between predictions obtained with the second- and the
fourth-order de-averaging procedures decreases with the mesh refinement.
Figure 3.11: Comparison of the predicted vertical velocity component profile after 100 s for the 128 × 128 nodes grid with the finer grid solution.
Figure 3.12: Comparison of the predicted vertical velocity component profile after 100 s for the 256 × 256 nodes grid with the finer grid solution.
3.2.3 Co-rotating vortices merging
A second flow example involving the interaction between vortices was simulated to
verify the influence of the increased accuracy of the fourth-order numerical scheme
on a merging process prediction.
A pair of two-dimensional Lamb-Oseen vortices with circulation Γ = 100 m^2 s^{−1}, core size a = 1.2 m and separated by b = 6 m, is superposed on the spatial domain [−30, 30] m × [−30, 30] m, large enough to avoid the influence of the boundaries on the vortices interaction. Velocities on the boundaries are prescribed according to the superposition of the two vortices and kept constant during the simulation. The kinematic viscosity is set to 10^{−1} m^2 s^{−1}. Figure 3.13 shows the vorticity contours
of the initial flow configuration.
Figure 3.13: Initial vorticity contours.
Firstly, calculations were performed considering a 600 × 600 nodes grid. It was verified that, for such a refined mesh, the results obtained with the second- and fourth-order schemes are the same, see Fig. 3.14.
This reference solution was then compared with results obtained with coarser spatial discretizations, 150 × 150 and 300 × 300 nodes, together with the second- and fourth-order schemes. The comparisons of the predicted vorticity contours at five time stages are plotted in Figs. 3.15 and 3.16. The inclusion of the high-order de-averaging scheme promoted an important accuracy increase on the coarser grid solution.
Figure 3.14: Vorticity contours during the merging process at t = 2 s, t = 5 s, t = 6 s, t = 7 s and t = 8 s (second-order de-averaging on the left and fourth-order on the right side).

Figure 3.15 shows that the predictions at t = 5 s and t = 6 s
denote important velocity differences in the vortex core as well as on the vortex
filaments at t = 7 s. The predictions obtained with the fourth-order de-averaging
are much closer to the reference solution obtained with a 600 × 600 nodes grid.
As expected, the differences between the reference solution and the intermediate mesh solution, 300 × 300 nodes, obtained with both numerical schemes are less detectable.
Temporal three-dimensional simulations were also performed applying periodic boundary conditions in the stream-wise direction for meshes comprising 150 × 150 × 12 and 300 × 300 × 24 nodes. No differences were detected between the two-dimensional and three-dimensional results for the same number of nodes on the
plane normal to the stream-wise direction. Figure 3.17 plots the vorticity contours
during the merging process for 300 × 300 and 300 × 300 × 20 nodes grids. One
should note that the prescribed viscosity is high and no perturbation was applied. Consequently, the flow configuration considered is not appropriate for observing the
elliptical instability, which is characterized by a three-dimensional deformation of
the vortex cores [65].
The differences between the simulations performed with the fourth-order and the second-order numerical schemes are more significant in the following similar two-dimensional merging flow example with different parameters. The circulation of
the Lamb-Oseen vortices is reduced to Γ = 10 m^2 s^{−1} and the core size to a = 1 m. The spatial domain, [−50, 50] m × [−50, 50] m, is discretized on a 512 × 512 nodes uniform grid. The kinematic viscosity is also reduced to 1 × 10^{−3} m^2 s^{−1}. The
reductions considered for the core size and the kinematic viscosity delay the merging process. Simulations with the second- and fourth-order numerical schemes
were performed and the predicted vorticity contours at five time stages, t = 60 s,
t = 220 s, t = 240 s, t = 260 s and t = 300 s, are plotted in Fig. 3.18. This flow
configuration allows showing that the better velocity conservation of the fourth-order accurate numerical scheme significantly improves the simulations. Figure 3.18 shows important differences in the vortex merging dynamics between the two simulations. After the conclusion of the merging, the vertical velocity component
profiles along a horizontal line through the vortex center, presented in Fig. 3.19,
are very similar because the small-scale interaction processes have finished and
consequently, the mesh refinement is now sufficient to achieve similar solutions
with both numerical schemes.
Figure 3.15: Comparison of the vorticity contours during the merging process simulation on a 150 × 150 nodes grid at t = 2 s, t = 5 s, t = 6 s, t = 7 s and t = 8 s (second-order de-averaging on the left and fourth-order on the center) with the reference solution on the 600 × 600 nodes grid (right side).
Figure 3.16: Comparison of the vorticity contours during the merging process simulation on a 300 × 300 nodes grid at t = 2 s, t = 5 s, t = 6 s, t = 7 s and t = 8 s (second-order de-averaging on the left and fourth-order on the center) with the reference solution on the 600 × 600 nodes grid (right side).
Figure 3.17: Vorticity contours during the merging process at t = 2 s, t = 5 s, t = 6 s, t = 7 s and t = 8 s for 300 × 300 (left) and 300 × 300 × 20 (right) nodes grids.
Figure 3.18: Vorticity contours during the merging process at t = 60 s, t = 220 s, t = 240 s, t = 260 s and t = 300 s (second-order de-averaging on the left and fourth-order on the right side).
Figure 3.19: Predicted vertical velocity component profile after the merging (t = 360 s).
3.3 Summary
The main topics related to the spatial domain decomposition method for parallel
simulation of incompressible, unsteady fluid flows were addressed in this Chapter.
For explicit projection schemes, the solution of the pressure Poisson equation is the
most time-consuming procedure and, consequently, determinant for achieving good
parallel performance. The methodology adopted for the parallel solution of the
system of equations was selected after numerical tests evaluating the performance
of several options for the coupling between sub-domains. The tests led to the conclusion that it is advantageous to solve the systems of equations with the AZTEC library and the global solution method.
The spatial domain decomposition technique was applied to parallelize the
fourth-order accurate finite-volume solver code for the unsteady, incompressible
form of the Navier-Stokes equations, presented in Chapter 2. The proposed fourth-order accurate finite-volume numerical scheme advances the momentum equations in time with the classical four-stage explicit Runge-Kutta scheme and approximates the convective and viscous fluxes through the control-volume faces by Simpson's rule and fourth-order accurate polynomial interpolation.
Three test cases were considered to evaluate and analyse the properties of the
proposed numerical scheme. The fourth-order accuracy of the numerical scheme
was verified with the two-dimensional Taylor vortex decay problem simulation.
When the time derivative of the cell center velocity replaces the time derivative of
the volume-averaged velocity, the global numerical scheme becomes second-order
accurate.
The second and third test cases comprise the simulation of the interaction
between a pair of counter- and co-rotating vortices, respectively. The main purpose
of the simulations was to verify the increase of the numerical scheme accuracy
promoted by the inclusion of the high-order de-averaging procedure rather than
to investigate the physics of wake vortices interaction. Flow simulations comprising
the interaction between Lamb-Oseen vortices provided the following conclusions:
(i) The inclusion of the high-order de-averaging scheme promotes an important
accuracy increase on the predictions based on coarser grids.
(ii) The inclusion of the high-order accurate de-averaging scheme significantly improves the velocity conservation.
(iii) The differences between the solutions obtained with and without the inclusion of the high-order de-averaging procedure vanish with mesh refinement
and for very refined meshes the predictions are identical.
A high-order finite-volume unsteady, incompressible flow solver provides better
simulation of small-scale vortex dynamics after the inclusion of a high-order de-averaging procedure to approximate the time derivative of the volume-averaged
velocity congruently with the accuracy of the other operators.
Chapter 4
Time domain decomposition
This Chapter presents the parallel-in-time method for unsteady, incompressible fluid flow simulation. Section 4.1 includes the detailed presentation
of the numerical method applied to the solution of the unsteady, incompressible
form of the Navier-Stokes equations. When solving the Navier-Stokes equations
in this predictor-corrector fashion, some problems may emerge due to the use of
two temporal grids. The stability of the method is evaluated and discussed in
Section 4.2. Another issue that requires some attention is the choice of the time
integration method used on each time-grid. Section 4.3 is devoted to the analysis of the
accuracy and convergence of the method. A performance model for the present
method is developed in Section 4.4. Section 4.5 includes the results of numerical
experiments. The laminar Taylor vortex-decay problem, the shedding flow behind
a square cylinder and the natural convection in a square cavity problem, where the
hybrid space-time parallelization was achieved, were selected to prove the ability of
the time-domain decomposition method to provide parallel fluid flow simulations.
The Chapter closes with the presentation of detailed conclusions in Section 4.6.
4.1 Numerical method
Considering an unsteady, incompressible Newtonian fluid flow, governed by Eqs. (2.10) to (2.12), let us assume that the spatial domain of the problem is divided into control volumes, provided in the present case by a cartesian orthogonal mesh. Standard numerical schemes, such as those described in Chapter 2, are used for spatial
and temporal discretization.
The parallel-in-time method requires two temporal grids. The present hybrid
formulation of the parallel-in-time algorithm is represented schematically in Fig. 4.1. The Figure shows the time domain decomposition into time-slices and
the entire space domain at each time-slice decomposed into spatial sub-domains.
The time interval [0, T ] of the problem under consideration is decomposed into
a sequence of L sub-domains of size ∆t = T/L, that will be called the coarse time-grid. For the present purpose, it is sufficient to consider that an operator G_∆t
performs the solution of the momentum and energy equations together with the
pressure correction Poisson equation for a single time-step ∆t. Another operator,
Fδt , is required for the parallel-in-time solution. This operator also corresponds
to the solution of the momentum and energy equations together with the pressure
correction Poisson equation for the time-span ∆t but uses a finer time-grid. The
operator Fδt corresponds to the sequential solution of M time-steps with size δt =
∆t/M , for some integer M . When more than one processor is allocated to each time
slice, the communication of the variable values on spatial sub-domain boundaries
is required as in the standard spatial domain decomposition technique.
Denoting by χ0 the initial velocity and temperature fields and by (χ1 , ..., χL )
the successive fields at ti = i ∆t, where i is the temporal sub-domain number, the
parallel solution proceeds as follows:
(i) Initialization
A coarse time-grid solution is obtained sequentially for the entire time domain
of the problem. The processors assigned to each time sub-domain solve the
spatial field for a single time-step, ∆t, and communicates the solution to the
processors assigned to the next time sub-domain,
χ_i^0 = G_∆t(χ_{i−1}^0) ,    χ_0^0 = χ_0 ,    (4.1)
for all time instances, i = 1, 2, ..., L, the superscript being the iteration
counter. The operator G∆t performs the solution of the momentum and
energy equations together with the pressure correction Poisson equation for
a single time-step ∆t. This sequential solution on the coarse time-grid requires correction, provided by the iterative scheme that follows.

Figure 4.1: Space-time decomposition parallel solver schematic diagram.
(ii) Iterative Procedure
The initial solution obtained previously is used to start an iterative procedure. Each iteration includes a parallel calculation using the fine time-grid
and a sequential one on the coarse time-grid. Firstly, the parallel calculation
is performed on the finer time-grid,
ψ_i^k = F_δt(χ_{i−1}^{k−1}) ,    χ_0^k = χ_0 ,    (4.2)
for 1 ≤ i ≤ L and k ≥ 1. The operator Fδt denotes the parallel solution of
the momentum and energy equations together with the pressure correction
Poisson equation on the finer time-grid for M time-steps of size δt from ti−1 to
t_i. Once the parallel solution on the finer time-grid is completed, the solution jumps
at ti are calculated by each processor, according to the difference between
the new solution calculated on the finer time-grid and the solution on the
coarse time-grid at the previous iteration,
S_i^k = ψ_i^k − χ_i^{k−1} .    (4.3)
The solution jumps are also evaluated in parallel because no information
is required from other time instances. Finally, a new sequential solution is
calculated. For 1 ≤ i ≤ L a solution is predicted using the coarse time-grid
solver,
χ̃_i^k = G_∆t(χ_{i−1}^k) ,    (4.4)

corrected by the solution jumps,

χ_i^k = χ̃_i^k + Σ_{l=1}^{k} S_i^l ,    (4.5)
and communicated to the processors that are assigned to the next time-step.
Time-marching sequential and parallel tasks are indicated in Fig. 4.1, showing that some overlap between them is possible, thus contributing to reducing the overhead of the sequential tasks. For each time-span, the iterative procedure starts
when the first initializing time-step calculation is completed. The parallel speed-up
achieved with the above described algorithm is strongly dependent on the number
of iterations performed and on the computational time spent on sequential tasks.
One should note that at iteration k the solution at time t = tk does not need
further corrections because the final solution is found.
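To make the structure of Eqs. (4.1)-(4.5) concrete, the following is a minimal serial sketch of the predictor-corrector cycle in Python. The states are assumed array-like, and coarse_step and fine_steps are generic placeholders standing in for the G∆t and Fδt operators (they are not the thesis solver); the fine sweeps and jump evaluations, written here as plain loops, are the parts executed in parallel in the actual method.

```python
def parallel_in_time(chi0, coarse_step, fine_steps, L, n_iter):
    """Serial sketch of the coarse/fine predictor-corrector cycle of Eqs. (4.1)-(4.5)."""
    # (i) Initialization: sequential coarse time-grid sweep, Eq. (4.1).
    chi = [chi0]
    for i in range(1, L + 1):
        chi.append(coarse_step(chi[i - 1]))

    cum_jump = [0.0] * (L + 1)              # running sums of the jumps S_i^l
    for _ in range(n_iter):
        # Fine time-grid sweeps, Eq. (4.2); each slice i is independent (parallel in practice).
        psi = [None] + [fine_steps(chi[i - 1]) for i in range(1, L + 1)]
        # Solution jumps, Eq. (4.3), accumulated for the correction of Eq. (4.5).
        for i in range(1, L + 1):
            cum_jump[i] = cum_jump[i] + (psi[i] - chi[i])
        # Sequential coarse prediction, Eq. (4.4), corrected by the jumps, Eq. (4.5).
        new_chi = [chi0]
        for i in range(1, L + 1):
            new_chi.append(coarse_step(new_chi[i - 1]) + cum_jump[i])
        chi = new_chi
    return chi
```

Since at iteration k the first k time-slices already carry their final values, a practical implementation could skip them; the sketch keeps the full loops for clarity.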
4.2 Numerical stability
Application of the classical linear stability analysis to this algorithm faces difficulties because there are two time-marching solutions on different time-grids used
in a predictor-corrector iterative fashion. Therefore, the stability domain for several time integration schemes was evaluated by numerical experiments using the
one-dimensional transport equation model,
\frac{\partial \phi}{\partial t} + u\,\frac{\partial \phi}{\partial x} - \alpha\,\frac{\partial^2 \phi}{\partial x^2} = 0, \qquad (4.6)
applied to solve the propagating scalar front problem indicated in Fig. 4.2. At
t = 0 a sharp front is located at x = 0. For subsequent time the front convects
to the right with a speed u and its profile loses sharpness under the influence of
the diffusivity α. Different numerical schemes were tested, comprising the three-level implicit scheme, the implicit and explicit Euler schemes, the Crank-Nicolson
scheme, the Adams-Bashforth scheme and the fourth-order Runge-Kutta scheme.
Spatial discretization of first-, second- and fourth-order accuracy was used together with the aforementioned temporal discretization schemes to provide a stable formulation for the finite-difference analogue of Eq. (4.6). The first-order "upwind" spatial discretization is used together with the first-order temporal schemes.
For the second-order temporal schemes, the spatial discretization is performed by
second-order central differences. The fourth-order Runge-Kutta scheme is used
together with the fourth-order central differences scheme.
The one-dimensional unsteady convection-diffusion problem will also be used to illustrate the application of the parallel-in-time algorithm. For the sake of simplicity, only the temporal domain decomposition is considered here. Firstly, the time domain of the problem under consideration is divided by P, the number of processors devoted to the simulation, which yields the coarse time-grid step size ∆t. The fine time-grid step size, δt, should be chosen so as to satisfy the stability constraints of the numerical scheme intended for the temporal evolution on this grid. Another issue related to the choice of this numerical scheme is that it should allow the solution accuracy to be improved. The implicit Euler scheme together with the "upwind" spatial discretization is considered here for the coarse time-grid. A ratio between the coarse and fine time-grid step sizes equal to 20 allows the use of the second-order accurate explicit Adams-Bashforth scheme for the fine time-grid integration, along with the second-order central differences scheme for the spatial discretization. Given these arbitrary choices, the
application of the parallel-in-time method consists of the following calculations.
Figure 4.2: Propagating scalar front problem considered for numerical evaluation of the stability domain (solution for t = 1, u = 0.125 and α = 10−3, after two iterations).

After discretization of Eq. (4.6), the resulting algebraic equation at node j for positive u reads:
\frac{\phi_j^{i+1} - \phi_j^{i}}{\Delta t} + \frac{u\left(\phi_j^{i+1} - \phi_{j-1}^{i+1}\right)}{\Delta x} - \frac{\alpha\left(\phi_{j-1}^{i+1} - 2\phi_j^{i+1} + \phi_{j+1}^{i+1}\right)}{\Delta x^2} = 0 \qquad (4.7)
where the superscript i refers to the final time of the temporal sub-domain assigned
to each processor.
The processor allocated to the first time-span solves Eq. (4.7) to the required
stopping criterion for the residuals, which corresponds to the coarse time-grid
prediction of the variable field at the final time-instance of the first time-span.
Then, the processor transmits this variable field to the processor assigned to the
following time-span. The processor receives and initializes the variable field with
this prediction and solves Eq. (4.7) for the next coarse grid time-step. This
procedure continues throughout the entire time-domain of the problem.
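As an illustration of the coarse time-grid step, Eq. (4.7), the sketch below assembles and solves the implicit Euler/upwind system for one coarse step in Python. It assumes a uniform grid, positive u, frozen end values as boundary data and a dense direct solve, whereas the thesis uses an iterative solver, so it is only a minimal stand-in.

```python
import numpy as np

def coarse_step_upwind(phi, u, alpha, dt, dx):
    """One implicit Euler step of Eq. (4.7): first-order upwind convection (u > 0)
    and central diffusion, solved as a linear system. End values are kept fixed
    as illustrative Dirichlet data; the thesis solves the system iteratively."""
    n = phi.size
    c, d = u * dt / dx, alpha * dt / dx**2
    A = np.zeros((n, n))
    A[0, 0] = A[n - 1, n - 1] = 1.0          # frozen boundary values
    for j in range(1, n - 1):
        A[j, j - 1] = -c - d                 # coefficient of phi_{j-1}^{i+1}
        A[j, j] = 1.0 + c + 2.0 * d          # coefficient of phi_j^{i+1}
        A[j, j + 1] = -d                     # coefficient of phi_{j+1}^{i+1}
    return np.linalg.solve(A, phi)
```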
After conclusion of the tasks comprised in the initializing step, each processor
is able to start the iteration. Each iteration begins with a fine time-grid solution. For each processor, this corresponds to solving sequentially, over 20 time-steps of size δt = ∆t/20, the algebraic equation resulting from the discretization of Eq. (4.6). Calculation
starts from the same local initial variable field, already considered for the coarse
time-grid prediction, see Fig. 4.3. The algebraic equation for the fine time-grid
evolution reads:
à ¡
¡
¢
¢!
m
m
m
φm+1
− φm
α φm
3 u φm
j
j
j+1 − φj−1
j−1 − 2φj + φj+1
+
−
−
δt
2
2∆x
∆x2
à ¡
¢
¡
¢!
m−1
m−1
m−1
α φm−1
+ φm−1
1 u φj+1 − φj−1
j−1 − 2φj
j+1
−
−
=0
(4.8)
2
2∆x
∆x2
where the superscript m refers to the time-step counter on the fine time-grid. This
fine time-grid calculation allows evaluating the solution jumps at the local final
time. This procedure, described by Eq. (4.3), only requires local values of the
variable and consequently, is also performed in parallel. The iteration concludes
with a sequential coarse time-grid calculation similar to the initializing procedure
with an important difference. The variable field prediction at final time-instance,
obtained previously by each processor on the coarse time-grid, is now corrected by
the solution jumps, applying Eq. (4.5), prior to the transmission to the processor
assigned to the next time-span. Repeating the same procedure for the first time-span obviously leads to the same solution at the local final time-instance. Therefore, the second iteration begins with the solution of the second coarse-grid time-span.
Figure 4.2 shows the solution of the propagating scalar front problem for t = 1,
u = 0.125 and α = 10−3 , after two iterations.
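A corresponding minimal sketch of the fine time-grid update, Eq. (4.8), is given below. The Adams-Bashforth step needs the two previous levels, so in practice the first fine step of each time-span has to be started with a one-step method, a detail omitted here; boundary nodes are simply left unchanged, which is an assumption of the sketch, not the thesis treatment.

```python
import numpy as np

def convection_diffusion_rhs(phi, u, alpha, dx):
    """Central-difference evaluation of u*dphi/dx - alpha*d2phi/dx2 at interior nodes."""
    f = np.zeros_like(phi)
    f[1:-1] = (u * (phi[2:] - phi[:-2]) / (2.0 * dx)
               - alpha * (phi[2:] - 2.0 * phi[1:-1] + phi[:-2]) / dx**2)
    return f

def fine_step_ab2(phi_m, phi_m1, u, alpha, dt, dx):
    """One Adams-Bashforth step of Eq. (4.8): phi^{m+1} = phi^m - dt*(3/2 f^m - 1/2 f^{m-1})."""
    return phi_m - dt * (1.5 * convection_diffusion_rhs(phi_m, u, alpha, dx)
                         - 0.5 * convection_diffusion_rhs(phi_m1, u, alpha, dx))
```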
Preliminary tests showed that the stability domain of the schemes decreases with an increasing number of processors and number of iterations. The numerical experiments were performed on a single processor but, following the described parallel-in-time algorithm, simulating the solution on one hundred processors. The stability domain was evaluated considering up to 5, 20 and 100 iterations, using the same numerical scheme for both time-grids.

A persistent increase in the error of the solution during the iterative procedure, leading to unrealistic solutions, was used to set the criterion for admissible pairs of the CFL number, CFL = u∆t/∆x, and the diffusive parameter, d = α∆t/∆x².
Figure 4.3: Sequential coarse time-grid and parallel fine time-grid solutions (coarse time-grid step size ∆t; fine time-grid step size δt).
Figure 4.4 shows an example in which the error increases dramatically by several orders of magnitude but the solution does not blow up and recovers its accuracy after a large number of iterations. These situations are case-dependent and not completely understood.
Figure 4.5 shows the conditional stability domain for the Crank-Nicolson, explicit Euler, Adams-Bashforth and fourth-order Runge-Kutta time integration schemes. The results of the numerical experiments show a stability domain reduction for the explicit schemes when compared with their standard conditional stability domain obtained by Fourier analysis [66]. No reduction of the stability domain was detected for the implicit "unconditionally stable" three-level and Euler schemes. In addition, no influence of the finer time-grid step size on the stability domain was observed.
Figure 4.4: Error dependence on the iteration number near the stability boundary (4th order Runge-Kutta scheme and diffusive criterion equal to 0.2).
Figure 4.5: Stability domain for the explicit Euler (a), Adams-Bashforth (b), Crank-Nicolson (c) and 4th order Runge-Kutta (d) schemes.
4.3 Accuracy of the parallel-in-time numerical scheme
The accuracy of the iterative algorithm for linear parabolic differential equations was addressed by Lions et al. [24] and Bal and Maday [25] considering the exact form of the solution on the fine time-grid, which corresponds to a simplification of the model. The iterative scheme is ideally of order m × (k + 1), where m is the accuracy order of the numerical scheme considered for the coarse time-grid solution and k is the iteration counter. It is important to analyse how the iterative numerical
scheme behaves with some spatial and temporal discretization schemes commonly
used for the solution of the Navier-Stokes equations.
For this purpose, the one-dimensional scalar transport equation, Eq. (4.6), is
applied to solve the propagating scalar pulse problem and analyse the accuracy of
the parallel-in-time method. The problem consists of the one-dimensional domain
from x = 0 to x = 2, through which fluid velocity is u = 0.25. The diffusivity was
set to 10−3 . The initial condition corresponds to a Gaussian wave pulse with peak
amplitude unity,
\phi(x, 0) = e^{-(x-u)^2 / 4\alpha}, \qquad (4.9)
considered centered at x = 0.25, see Fig. 4.6.
The time dependent solution for this problem [67],
\phi(x, t) = \frac{1}{\sqrt{1+t}}\, e^{-\left(x - u(1+t)\right)^2 / 4\alpha(1+t)}, \qquad (4.10)
allows the error of the numerical solution to be evaluated. The time domain considered for this purpose is T = 2. The numerical temporal discretization schemes range
from first- to fourth-order accurate. Spatial discretization of first-, second- and
fourth-order accuracy was used together with the temporal discretization schemes
to provide a stable formulation for the finite difference analogue of Eq. (4.6).
The first-order "upwind" spatial discretization is used together with the first-order temporal schemes (implicit and explicit Euler). For the second-order temporal schemes (Adams-Bashforth, Crank-Nicolson and three-level implicit), the second-order central differences scheme is applied for the spatial discretization, and the fourth-order Runge-Kutta explicit scheme is used together with the fourth-order central differences scheme.

Figure 4.6: Initial condition (t = 0) and the iterative approximation to the exact solution at t = 2.
The numerical tests were conducted considering spatial meshes ranging from nx = 50 to nx = 1600 mesh nodes, keeping the CFL number constant. Therefore, the number of processors used for the calculations depends on the spatial mesh, and P = nx/2 yields a CFL number equal to 5 × 10−1 and 5 × 10−2 on the coarse and fine time-grids, respectively. The number of time-steps on the finer time-grid was set to 10 × P to obtain the accurate solution on the finer time-grid required by the iterative procedure. The parallel-in-time calculations were performed on a single processor, according to the described algorithm, simulating the required number of processors (up to 800).
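For the error measurements reported below, a minimal sketch of the exact solution of Eq. (4.10) and of one common discrete L2 error norm is given here; the norm definition (root mean square of the nodal differences) is an assumption of the sketch, not necessarily the exact norm used in the figures.

```python
import numpy as np

def exact_pulse(x, t, u=0.25, alpha=1.0e-3):
    """Exact solution of Eq. (4.10) for the propagating Gaussian pulse."""
    return np.exp(-(x - u * (1.0 + t))**2 / (4.0 * alpha * (1.0 + t))) / np.sqrt(1.0 + t)

def l2_error(phi_num, x, t, u=0.25, alpha=1.0e-3):
    """Discrete L2 error norm taken as the root mean square of the nodal differences."""
    diff = phi_num - exact_pulse(x, t, u, alpha)
    return np.sqrt(np.mean(diff**2))
```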
Firstly, sequential calculations were performed to evaluate the accuracy of each
numerical scheme on the finer time-grid. Figure 4.7 shows the dependence of the L2 norm of the error on the mesh size for several temporal schemes, including also the first-, second- and fourth-order slopes. The figure shows that only the second- and fourth-order numerical schemes are within the asymptotic convergence region of the discretization.

Figure 4.7: Dependence of the sequential fine time-grid solution L2 error norm on the mesh resolution.
The dependence of the L2 norm of the error on the spatial discretization and
on the number of iterations performed is indicated in Fig. 4.8 to 4.10 for different
pairs of numerical schemes considered for the coarse and fine time-grid temporal
evolution. Among the schemes tested, the fourth-order Runge-Kutta scheme is
the best candidate for the fine time-grid integrator. Figure 4.6 displays the initial
condition (t = 0) and also the evolution of the solution during the iterative procedure when the first order implicit Euler scheme is used for the temporal evolution
on the coarse time-grid. The evolution of the error on the spatial discretization,
plotted in Fig. 4.8, shows that the order of accuracy increases with the number
of iterations but does not reach the formal convergence order (k + 1) because the first-order scheme is not in the asymptotic convergence region.

Figure 4.8: Error dependence on the spatial discretization and number of iterations using the implicit Euler and fourth-order Runge-Kutta schemes on the coarse and fine time-grids, respectively.
The same behaviour can be detected when using the second-order accurate Crank-Nicolson scheme for the coarse time-grid solution. However, after the second or third iteration, depending on the spatial discretization, no further improvement in the accuracy of the solution is obtained because the accuracy of the fine time-grid solution is already attained, see Fig. 4.9.
The second-order accurate Adams-Bashforth explicit scheme is also a good candidate for the fine time-grid solution because, although only second-order accurate, it is computationally less expensive than the
Runge-Kutta scheme. For the problem under consideration, the convergence is
achieved after four iterations when the Adams-Bashforth is used together with the
implicit Euler scheme on the coarse time-grid, see Fig. 4.10.

Figure 4.9: Error dependence on the spatial discretization and number of iterations using the Crank-Nicolson and fourth-order Runge-Kutta schemes on the coarse and fine time-grids, respectively.
The selection of the numerical schemes for the temporal evolution on each time-grid should consider, besides the stability constraints, that the finer time-grid numerical scheme must provide a higher order of convergence than the one used in
the coarse time-grid. When using the fourth-order accurate numerical scheme on
the fine time-grid, the convergence of the iterative parallel-in-time method requires
fewer iterations using a second-order accurate scheme for the coarse time-grid than
an unconditionally stable first-order scheme.
Figure 4.10: Error dependence on the spatial discretization and number of iterations using the implicit Euler and Adams-Bashforth schemes on the coarse and fine time-grids, respectively.

The speed-up obtained with the parallel-in-time method is strongly dependent on the number of iterations required to meet the convergence criterion. The number of iterations needed to acquire the accuracy of the fine-grid solution is obviously dependent on the nature of the numerical schemes applied. Considering the general theory related to the convergence of numerical schemes, one can approximate the number
of iterations required for convergence by the ratio between the orders of accuracy of the numerical schemes used for the fine and coarse time-grid solutions. This
prediction is clearly verified when using the second-order accurate Crank-Nicolson
and the fourth-order Runge-Kutta schemes for the coarse and fine time-grid solutions respectively. When using the formal first-order implicit Euler scheme for the
coarse time-grid solution, one must consider that the order of accuracy evaluated
in the interval between ∆x = 0.00125 and ∆x = 0.005 on the initial approximation
is only equal to 0.61. Therefore, the prediction of the number of iterations required
to achieve the accuracy of the Runge-Kutta or the Adams-Bashforth schemes fine
time-grid solution is 7 and 4, respectively.
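Reading the estimate as the ceiling of the ratio between the fine and coarse orders of accuracy reproduces the values quoted above; the rounding rule in the short check below is an assumption of the sketch.

```python
import math

def iterations_estimate(order_fine, order_coarse):
    """Estimated iterations to reach the fine-grid accuracy: ceil(fine order / coarse order)."""
    return math.ceil(order_fine / order_coarse)

print(iterations_estimate(4.0, 2.0))    # Crank-Nicolson coarse, Runge-Kutta fine      -> 2
print(iterations_estimate(4.0, 0.61))   # observed 0.61 coarse order, Runge-Kutta fine -> 7
print(iterations_estimate(2.0, 0.61))   # observed 0.61 coarse order, Adams-Bashforth  -> 4
```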
4.4 Performance model
Along with the number of iterations required for convergence, the parallel efficiency
of the method is also strongly dependent on the time spent by the communication
tasks required by the algorithm. The following is an attempt to derive theoretically
the parallel speed-up of the method presently used, even with some simplifying
assumptions, in order to evaluate the potential application of the technique for the
unsteady, incompressible Navier-Stokes equations. The comparison between the performance prediction and the speed-up effectively achieved with a PC-cluster computation will allow one to verify how relevant the time spent on communication tasks is for the efficiency of the parallel-in-time method.
Figure 4.11: Parallel-in-time solver schematic diagram (sequential solution on the coarse time-grid and parallel solution on the fine time-grid).
Figure 4.11 displays two iterations of the cycle. The computing time required
to perform the first iteration is denoted by T1 . One should note that after the
first iteration the solution at time t1 corresponds to the final solution at this time
instance. More generally, at iteration k the solution at time tk does not need
further iteration because the final solution is achieved. Considering the overlap
between sequential and parallel tasks, indicated in Fig. 4.11, and neglecting the
communication time, the total computing time of a parallel-in-time solution using
P processors, ΓP , can be predicted by:
\Gamma_P = \Gamma_{seq} + k\,\frac{\Gamma_{seq} + \Gamma_1}{P}, \qquad (4.11)
where Γseq denotes the computing time of the sequential solution on the coarse
time-grid, Γ1 the computing time on a single processor and k is the number of
iterations prescribed. The parallel speed-up, Eq. (1.1), of a computation based on
the time domain decomposition, can be predicted by:
S(P) = \frac{\Gamma_1}{\Gamma_{seq} + k\,\frac{\Gamma_{seq} + \Gamma_1}{P}}. \qquad (4.12)
Considering the same computing time for one time-step on the fine and coarse
time-grid, the maximum expected speed-up that can be achieved is approximated
by:
S(P) = \frac{M P}{P + k\,(1 + M)}, \qquad (4.13)
where M is the ratio between the coarse and the fine time-grid step sizes. Figure
4.12 shows the predicted speed-up of a parallel-in-time calculation, Eq. (4.13),
as a function of the number of processors and the ratio between the coarse and
the fine time-grid step sizes, when two iterations on the parallel-in-time algorithm
are required to meet the convergence criterion. The figure shows that the speed-up achieved for small time-step ratios is negligible. However, important parallel speed-up can be predicted for high time-step ratios. Similar conclusions can be derived for other numbers of iterations, k. For high time-step ratios, the speed-up always increases with the number of processors and is limited by P/k.
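A direct transcription of Eq. (4.13), useful for exploring the predicted speed-up before committing processors, is sketched below; the numbers in the usage lines are illustrative, not results from the thesis.

```python
def predicted_speedup(P, M, k):
    """Predicted parallel-in-time speed-up of Eq. (4.13): S = M*P / (P + k*(1 + M))."""
    return M * P / (P + k * (1.0 + M))

def predicted_efficiency(P, M, k):
    """Parallel efficiency E = S/P of the time domain decomposition."""
    return predicted_speedup(P, M, k) / P

# Illustrative use: 16 processors, coarse/fine step ratio M = 100, two iterations.
print(round(predicted_speedup(16, 100, 2), 1))      # ~7.3, below the P/k = 8 bound
print(round(predicted_efficiency(16, 100, 2), 2))   # ~0.46
```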
Figure 4.12: Parallel-in-time speed-up prediction for two iterations.

The behaviour of parallel-in-time processing is rather different from that of the space domain decomposition where, for a fixed spatial dimension of the problem, the speed-up has a limit imposed by the computation/communication time ratio and no further speed-up improvement can be achieved by increasing the number of processors involved. The penalty of the parallel-in-time method is inherent to the algorithm and, consequently, the efficiency comparison of the present method with the classical spatial domain decomposition method is adverse in most cases. However, despite the low efficiency obtained, important computing time reductions can be accomplished if a large number of processors is available and few iterations are required to converge the solution.
4.5 Numerical experiments

4.5.1 Taylor vortex

4.5.1.1 Parallel-in-time results
The two-dimensional Taylor vortex-decay problem [64] was used to evaluate the
numerical time-parallel technique applied to the integral form of the Navier-Stokes
and continuity equations. The analytical solution for this problem was already
presented by Eq. (3.1) to (3.3). Computations were performed with Re = 100
and the time domain considered is up to T = 40 s. The staggered spatial meshes
comprise 32 × 32 and 64 × 64 nodes to discretize the spatial domain 0 ≤ x, y ≤ π.
The time domain was decomposed into P sub-domains, where P is the number of processors (up to 16, because at this time only 16 processors were available on the PC-cluster), yielding ∆t = 40/P s, and the finer time-step was set equal to 4 × 10−3 s. First-order implicit and explicit schemes are used for the temporal evolution on the
coarse and on the finer time-grids, respectively. The Bi-CGSTAB [61] method is
used to solve the Poisson and the implicit momentum equations. The SIMPLE [55]
method is employed to provide the velocity and pressure fields correction during
each time iteration on the coarse time-grid. The standard projection method is
applied for the finer time-grid solution.
The numerical experiments performed suggest that the solution accuracy and
the computing time required by the parallel-in-time method depend on several
parameters:
(i) The number of iterations performed in the algorithm;
(ii) The spatial resolution;
(iii) The ratio of coarse to fine time-step sizes ∆t/δt;
and, for each of the above items, obviously also on the number of processors.
The dependence of the computing time and the solution accuracy on items (i), (ii)
and (iii) is presented in the following paragraphs.
Influence of the number of iterations. The parallel-in-time method involves two integrators, for the coarse and fine time-grids, that are used in an iterative fashion. Consequently, the computing time of the parallel-in-time solution depends on the number of iterations. Figure 4.13 shows the computing time versus the number of processors for the different numbers of iterations considered. The spatial mesh comprises 32 × 32 nodes. For this test case, the number of iterations is
prescribed (2, 3 or 4) and Figure 4.13 shows that the computing time obviously
decreases with the increase in the number of processors and increases with the
increase in the number of iterations.

Figure 4.13: Parallel-in-time computing time dependence on the number of iterations performed (32 × 32 nodes; δt = 4 × 10−3 s).

Figure 4.13 also shows the computing time for a non-parallel calculation performed with a fully explicit method using a time-step equal to the finer time-step of the parallel-in-time calculation. The computing
time required for the serial calculation is equal to 106 s. The computing time of
the parallel-in-time calculation was equal to 41 s with 16 processors and four iterations. The speed-up is rather low, equal to 4.1 with 16 processors and two iterations.
Concerning the accuracy of the method, Figure 4.14 shows the L1 error norm
of the calculated solution at t = 40 s as a function of the number of iterations
prescribed. An error in the velocity field approximately equal to 1 × 10−5 occurs. This error may be considered very small because the maximum value of
the velocity components is approximately equal to 0.45. Figure 4.15 shows that
the maximum deviation between the parallel-in-time and serial solutions decreases
with the increase in the number of iterations. For the present case, the use of four
iterations induces a maximum deviation smaller than 1 × 10−6 and consequently,
the parallel-in-time and serial solutions are virtually identical.
Figure 4.14: Parallel-in-time solution L1 norm error dependence on the number of iterations performed (32 × 32 nodes; δt = 4 × 10−3 s).

Figure 4.15: Dependence of maximum deviation between parallel-in-time and serial solutions on the number of iterations performed (32 × 32 nodes; δt = 4 × 10−3 s).
Influence of the spatial resolution. The temporal solution of any flow problem requires for each time step the solution of the dependent variables in the
discrete spatial domain. The use of an implicit method in the coarse time-grid
penalizes the computing time when the number of nodes increases in the spatial
mesh. Figure 4.16 shows the computing time required by the present parallel-in-time method for two spatial meshes, with 32 × 32 and 64 × 64 nodes. The
computing time of a sequential calculation is also shown in the Figure. For both
spatial discretizations considered, a substantial computing time saving is verified
when using the parallel-in-time method. This computing time saving increases
with the increase in the spatial dimension of the problem. Speed-up and parallel
efficiency for both spatial meshes considered are represented in Figure 4.17 showing
that higher speed-up is achieved on coarser spatial meshes.
Figure 4.16: Computing time dependence on space resolution (2 iterations; δt = 4 × 10−3 s).
Figure 4.17: Speed-up and parallel efficiency dependence on space resolution (2 iterations; δt = 4 × 10−3 s).
Influence of the ratio of coarse to fine time-step sizes. The computing
time of the parallel-in-time calculations can be split into two parts. One is the
time required for the sequential procedures of the algorithm and the other is
used in parallel calculations. The computing time saving depends on the ratio
between parallel and sequential computational efforts carried out during the iteration process. The selection of δt should correspond to the desired time resolution
in a serial computation.
Three time-grid increments were considered on the fine time-grid, δt1 = 4 × 10−2 s, δt2 = 4 × 10−3 s and δt3 = 4 × 10−4 s, to which correspond M × P = 10³, 10⁴ and 10⁵, respectively. Figure 4.18 shows the computing time as a function of the number of processors for the three finer time-grids considered. The computing time saving increases with the increase of the time-scale ratio (M).
Figure 4.18: Computing time dependence on the size of the finer time-grid increment (2 iterations on a 32 × 32 nodes mesh).
4.5.1.2 Comparison between the spatial and the temporal domain decomposition
The parallel-in-time method was compared with the standard spatial domain decomposition method. Calculations of the Taylor problem, on a 64 × 64 nodes mesh, were performed with the spatial domain decomposition method using up to sixteen processors. The temporal evolution of the momentum equations is performed
with the same fully explicit procedure that was applied for the fine time-grid on
the parallel-in-time calculation. An explicit outer iteration coupling was used during the solution of the Poisson equation. After each outer iteration, pressure and
velocities on partition boundaries are exchanged between processors until convergence is reached. The parallel efficiency rapidly decreases with the increase in
the number of processors due to the high ratio between the communication and
computation efforts, as shown in Fig. 4.19.
The definition of parallel efficiency and speed-up for parallel-in-time results
should correspond to the classical one developed for space-domain decomposition
calculations. In the present parallel-in-time method, the iterative use of two integrators, for the coarser and finer time-grids, causes a decrease in speed-up and efficiency. However, a large computing time reduction can still be achieved when compared with a single-processor calculation. Figure 4.20 shows the parallel efficiency
of the parallel-in-time calculations. The low efficiency of the spatial domain decomposition method is due to the low dimension of the problem, while the low efficiency
of the parallel-in-time calculations is inherent to the present method for the reasons explained above. Nevertheless, the efficiencies ratio, between parallel-in-time
and spatial domain decomposition methods, is important for the user because it
will help to select the domain, spatial or temporal, that should be parallelized.
Figure 4.21 shows the parallel efficiencies ratio between parallel-in-time and spatial domain decomposition methods. Figure 4.21 shows that when the number of
processors increases, the parallel-in-time method is more efficient than the spatial
domain decomposition. This result was expected because the parallel efficiency of
the spatial domain decomposition method decreases with an increasing number of
processors.
Figure 4.19: Spatial domain decomposition parallel efficiency and speed-up on a 64 × 64 nodes mesh and δt = 4 × 10−3 s.

Figure 4.20: Temporal domain decomposition parallel efficiency and speed-up on a 64 × 64 nodes mesh and δt = 4 × 10−3 s.

Figure 4.21: Parallel-in-time and domain decomposition methods parallel efficiency ratio on a 64 × 64 nodes mesh.
4.5.2 Shedding flow past a two-dimensional square cylinder in a channel
The numerical simulation of the flow past a two-dimensional square cylinder in a channel for Reynolds numbers equal to 500 and 1000 was selected to illustrate the application of the parallel-in-time method to a demanding, self-sustained unsteady flow problem. Flow around bluff bodies is characterized by the onset of periodic oscillations, the von Kármán vortex street, above a critical Reynolds number.
4.5.2.1 Parallel-in-time simulation
The flow configuration comprises the square cylinder, of width unity, symmetrically
confined in a channel with height and length equal to 4 and 24, respectively. The
blockage ratio is therefore 1/4 and the upstream face of the obstacle is at a distance
equal to 8.5 from the inlet.

Figure 4.22: Flow configuration and grid.

The flow is impulsively started at t = 0, with a uniform
flow prescribed at the inlet. At the outlet, the convective wave open boundary
condition was used for velocity components. In addition, no-slip conditions were
prescribed on walls. The numerical grid comprises 90 × 38 nodes to discretize the
spatial domain of 24 × 4. The local mesh Reynolds number, or Péclet number, is higher than 2 for the flow considered and, consequently, a deferred correction method was used for the convection discretization [60]. The deferred correction employed about 80% of central differences and the remaining 20% of upwind contribution. At each time-step calculation, the SIMPLE method [55] is used to correct the velocity and pressure fields, enforcing a divergence-free velocity field.
The three-level implicit and the Crank-Nicolson schemes were used on the
coarse and finer time-grids, respectively. The time-step sizes are equal to δt =
1 × 10−2 and ∆t = 0.25 in the fine and coarser time-grids, respectively. As the
number of processors available is insufficient to perform a parallel-in-time calculation covering the simulation time of several shedding periods, which would require 2000 processors for a time interval T equal to 500, time-blocks were considered. In this way, time-blocks of size P × ∆t are solved sequentially.
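A minimal sketch of this time-blocking is given below; the block boundaries are computed from T, ∆t and P only, and the parareal-type cycle described in Section 4.1 would then be applied to each block in turn (the call is left abstract here).

```python
def time_blocks(T, dt_coarse, P):
    """Split [0, T] into sequential time-blocks spanning P coarse steps each
    (the last block may be shorter). Each block is then solved with the
    parallel-in-time cycle, which is left abstract in this sketch."""
    span = P * dt_coarse
    blocks, t0 = [], 0.0
    while t0 < T - 1e-12:
        t1 = min(t0 + span, T)
        blocks.append((t0, t1))
        t0 = t1
    return blocks

# 16 processors and a coarse step of 0.25 cover T = 500 with 125 sequential blocks.
print(len(time_blocks(500.0, 0.25, 16)))   # -> 125
```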
Figure 4.23 shows the temporal evolution of the drag and lift coefficients for
Reynolds number equal to 500, where the onset of breaking flow can be observed at
t ≈ 60 after the impulsive start. The bifurcation of the solution was triggered by
numerical noise without any perturbation prescribed to initiate the vortex shedding
and in a similar fashion to the pure sequential simulations. The figure also shows the temporal evolution of the drag and lift coefficients when the periodic flow is established for the same Reynolds number.

Table 4.1: Predicted values for St and CL rms.

  Re      St      CL rms
  500     0.24    0.52
  1000    0.22    1.24
Table 4.1 shows the predicted values for the Strouhal number, St = f D/U0
where f is the shedding frequency and D is the square cylinder width, and for
the rms value of the lift coefficient, CL rms . The predictions are in reasonable
agreement with reference solution values reported by Davies et al. [68].
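One straightforward way to extract these quantities from a computed lift history is sketched below; the FFT-peak estimate of the shedding frequency and the rms of the fluctuating lift are common post-processing choices, not necessarily the exact procedure used for Table 4.1.

```python
import numpy as np

def strouhal_and_cl_rms(t, cl, D=1.0, U0=1.0):
    """Estimate St = f*D/U0 from the dominant FFT peak of a uniformly sampled,
    periodic lift-coefficient signal, and return it with the rms of the
    fluctuating lift. Post-processing choices are illustrative only."""
    cl = np.asarray(cl, dtype=float)
    fluct = cl - np.mean(cl)                      # remove the mean before the FFT
    dt = t[1] - t[0]
    spectrum = np.abs(np.fft.rfft(fluct))
    freqs = np.fft.rfftfreq(len(fluct), d=dt)
    f_shed = freqs[np.argmax(spectrum[1:]) + 1]   # skip the zero-frequency bin
    return f_shed * D / U0, np.sqrt(np.mean(fluct**2))
```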
Figure 4.24 shows predicted vorticity contours and streamlines for Reynolds
number equal to 500. Flow predictions were also obtained on a single processor
with the same parameters used in the finer time-grid of the parallel-in-time procedure. The results were virtually identical, showing that the time decomposition
procedure did not deteriorate the solution. Parallel speed-up for the simulations
performed with 16 processors and the parallel-in-time procedure was equal to 5.2
and 4.8 for Reynolds number equal to 500 and 1000, respectively.
Figure 4.23: Force coefficients for Re = 500 (CD and CL versus time).
Figure 4.24: Predicted vorticity contours and streamlines for Re = 500.
4.5.2.2 Evaluation of the proposed performance model
The above-described flow configuration for Re = 500 was also used to evaluate the parallel efficiency of the method and to compare it with the predictions of the
performance model derived in Section 4.4. The spatial domain is now discretized on
a staggered, uniform 151×26 nodes grid. The deferred correction scheme, blending
second-order central and upwind differences, removed the oscillations produced by
the central differences schemes on the mesh Reynolds numbers considered. The
temporal discretization was performed by the implicit Crank-Nicolson scheme on
both time-grids. The selection of the same numerical scheme for both temporal
grids approximates the assumption of equal computing time per time-step on the
fine and coarse time-grids that was considered when deriving the performance
model.
A preliminary simulation was performed on a single processor with a time-step size equal to 0.01. The bifurcation of the solution does not arise at the same time
on the single processor and on the parallel-in-time solutions. Consequently, the
comparison between the single-processor and parallel solutions is based on the fully established periodic vortex shedding mechanism, through the predictions for St and CL rms. The single-processor predictions for St and CL rms are equal to 0.255 and 0.362, respectively.
The same spatial grid, with 151 × 26 nodes, is used on the parallel calculations
evaluating the dependence of the solution on the number of iterations. Stability
constraints related to the numerical method imposed a coarse grid time-step size
equal to ∆t = 0.1. Time-blocks, each one corresponding to P processors handling a time-span P × ∆t, were solved sequentially. The fine time-grid step size is equal to δt = 0.01 and was kept equal to that of the single-processor calculation for comparison purposes.
Figure 4.25 plots the calculated St and rms value of the lift coefficient as a function
of the number of iterations performed in the parallel-in-time algorithm. Figure 4.25
shows that few iterations are required to obtain values of St and CLrms that are
virtually equal to those obtained with a single processor calculation.
The small number of iterations required for convergence is a consequence of the small time-step size on the coarse time-grid imposed by the stability restriction and, consequently, of the small dimension of each time-block calculated. Figure 4.26 shows the vorticity contours after the first and the second iteration of the parallel-in-time algorithm after the establishment of the periodic flow. Very small differences are detectable in those vorticity contours. For longer time-blocks, an increase in the number of iterations required for convergence is expected. However, the number of processors available for these experiments did not allow this assertion to be verified.
Other numerical experiments were performed to compare the achieved speed-up with the one predicted by the performance model. The sampling frequency, ∆t = 0.1, was kept unchanged in all simulations. Ten time-blocks, each one consisting of P time-steps, were calculated using different numbers of processors (1, 8, 10, 12, and 16) and different coarse-to-fine time-grid step size ratios (10,
100, 200 and 1000). A converged flow field solution after the establishment of the
periodic flow was used as the initial condition to start these calculations avoiding
the influence of the initial time-steps of an impulsive start from rest. The use of a prescribed number of time-blocks in the comparison, instead of a prescribed time interval, is necessary to allow the use of the above-mentioned cluster dimensions while maintaining the sampling frequency. Two and three iterations on the parallel-in-time algorithm were prescribed.

Figure 4.25: Lift coefficient and Strouhal number dependence on the number of iterations prescribed.

Figure 4.27 shows the comparison between the
verified parallel efficiency of the method and the predictions based on the performance model, Eq. (4.13). The predicted efficiency, which depends on M, P and k, was plotted as a function of Φ = M/P for each number of iterations prescribed on the parallel-in-time algorithm. Figure 4.27 shows good agreement between predicted
and verified parallel efficiency. More significant deviations are verified for small
values of Φ, when the computing time of the sequential coarse time-grid procedures
gives a more important contribution to the total computing time.
Figure 4.26: Vorticity contours after the first (a) and second (b) iteration.

Figure 4.27: Comparison between predicted (lines) and verified (symbols; filled symbols refer to 3 iterations) efficiency of the parallel-in-time method for 2 and 3 iterations prescribed (Φ = M/P).
4.5.3 Hybrid spatial and temporal domain decomposition
The test case considered here is the two-dimensional natural convection in a square
cavity flow problem. The two-dimensional, unsteady, incompressible flow in a
square cavity due to natural convection, governed by the Navier-Stokes and energy equations, is a demanding impulsively started fluid flow problem, suitable to
evaluate the main properties of the parallel-in-time method. The test case can be
described as follows. The lower and upper walls of the square cavity are insulated
and the fluid is considered initially at rest and at temperature θ0 . At time t = 0,
the temperatures θ0 + ∆θ and θ0 − ∆θ are prescribed and maintained at the left
and right-side walls, respectively.
Simulations for Prandtl number equal to 7 and Rayleigh number equal to 10³, 1.4 × 10⁴ and 1.4 × 10⁵ were performed on the PC-cluster with up to 24 nodes. The
non-dimensionalized temporal domain of the simulations is [0, 0.05]. The time-step
size on the coarser time-grid dictates the use of an implicit method. The prescribed
non-dimensionalized time-step size equal to 2 × 10−6 on the finer grid allows the
use of an explicit method.
The spatial domain is discretized on a uniform 32 × 32 nodes collocated mesh.
The convective and diffusive fluxes on the control-volume faces are evaluated with
second-order central differences together with the Adams-Bashforth scheme on the finer time-grid. The projection method is used to correct the pressure and the velocity fields. When a small number of processors is available, in the present case up to 24 processors, the integration method on the coarse time-grid should be implicit in order to satisfy the stability constraints. For the present problem, the implicit Euler scheme is used together with the first-order "upwind" spatial discretization
scheme. At each time-step, the SIMPLE [55] method was used to correct the
pressure and the velocity fields on the coarse time-grid solution. The stabilized biconjugate gradient method (Bi-CGStab) [61] included in the AZTEC library [62]
is used to solve the systems of equations arising from the Poisson and the implicit
momentum and energy equations.
The convergence criterion used to stop the iterative procedure of the parallel-in-time algorithm is based on the maximum value of the calculated solution jump, non-dimensionalized by the maximum value of the variable at the present time-instance. This convergence criterion is set to 5 × 10−3.
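The stopping test can be written compactly as below; the sketch assumes the jump and the current field are available as arrays, with the maxima taken over the local spatial field (a global reduction would be needed across spatial sub-domains).

```python
import numpy as np

def jump_converged(jump, chi_current, tol=5.0e-3):
    """Stopping test of the iterative cycle: the maximum solution jump,
    non-dimensionalized by the maximum value of the variable at the present
    time-instance, must fall below the prescribed tolerance."""
    return np.max(np.abs(jump)) / np.max(np.abs(chi_current)) <= tol
```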
Figure 4.28 shows the time-dependent Nusselt number,

Nu = \frac{1}{2} \int_0^1 \left( Pr\, U\, \Theta - \frac{\partial \Theta}{\partial X} \right)_X dY \qquad (4.14)
evaluated at the heated wall, X = 0, and at the vertical center-line of the cavity, X = 1/2, for Rayleigh number equal to 10³, 1.4 × 10⁴ and 1.4 × 10⁵. Simulations were performed on 24 processors, assigning each time-slice to a single processor. The Nusselt number temporal evolutions are in good agreement with reported reference solutions [69].
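A minimal sketch of the evaluation of Eq. (4.14) at a fixed X station is given below; it assumes that the velocity and temperature profiles and the wall-normal temperature gradient are already available along Y, and it uses the trapezoidal rule, which is an assumption of the sketch rather than the quadrature used in the code.

```python
import numpy as np

def nusselt(U, Theta, dTheta_dX, Y, Pr=7.0):
    """Evaluate Eq. (4.14) at a fixed X: Nu = 1/2 * int_0^1 (Pr*U*Theta - dTheta/dX) dY,
    with the 1-D profiles given along Y and integrated by the trapezoidal rule."""
    integrand = Pr * np.asarray(U) * np.asarray(Theta) - np.asarray(dTheta_dX)
    return 0.5 * np.trapz(integrand, Y)
```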
Figure 4.29 shows the evolution of the Nusselt number with time, at X = 1/2, for different numbers of iterations performed on the parallel-in-time algorithm, for Ra = 1.4 × 10⁵. From Fig. 4.29 one can deduce that the convergence of the parallel-in-time iterative solution is smooth. The number of iterations required for convergence depends on the prescribed Rayleigh number and on the number of time sub-domains, see Fig. 4.30. More iterations are required for higher Rayleigh number flows. The lowest Rayleigh number flow considered, Ra = 10³, has such a smooth time dependence that the number of iterations required for convergence remains constant, independently of the number of processors used. For higher Rayleigh number flows, the number of iterations decreases with the number of sub-domains. One should note that the number of iterations is limited by the number of time sub-domains and, consequently, for Ra = 1.4 × 10⁵ there is an increase in the number of iterations required to achieve the convergence criterion when the number of processors increases from 8 to 12.
The extension of the parallel-in-time algorithm to hybrid time and space parallel calculations makes it possible to optimise the speed-up by choosing the domain to parallelize according to the dimensions of the problem under consideration and the number of processors available. Calculations were performed for Ra = 1.4 × 10⁴ on two spatial grids, 32 × 32 and 128 × 128 nodes, keeping the Courant number constant on the fine time-grid. Both spatial meshes are too coarse for accurate, representative calculations. However, the main purpose of the present work is to apply the hybrid space- and time-domain decomposition technique to the solution of a problem. To consider finer spatial meshes, more processors would be necessary to perform a reasonable comparison between the time-domain and the
space-domain decomposition methods.

Figure 4.28: The dependence of the Nusselt number, at the heated wall, X = 0, and at the vertical center-line, X = 1/2, on the non-dimensionalized time for Rayleigh number equal to 10³, 1.4 × 10⁴ and 1.4 × 10⁵.

Figure 4.29: The Nusselt number temporal evolution at the vertical center-line for Ra = 1.4 × 10⁵ and its dependence on the number of iterations performed.

Figure 4.30: The dependence of the number of iterations required for convergence on the Rayleigh number and on the number of time sub-domains.

Several configurations of the space and time domain decomposition were tested using up to 24 processors. The limited
number of nodes of the PC-cluster only allowed testing a few topological configurations of processors allocated to the time and space domains.
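The admissible processor topologies can be enumerated with a trivial search such as the one below; the sketch only lists (time, space) sub-domain pairs that fit within the available processors and does not attempt to predict which split is fastest.

```python
def processor_splits(p_total):
    """List (time sub-domains, space sub-domains) pairs whose product does not
    exceed the available processors; illustrative enumeration only."""
    return [(nt, ns) for nt in range(1, p_total + 1)
                     for ns in range(1, p_total + 1) if nt * ns <= p_total]

# With 24 processors, the splits that use all of them are:
print([(nt, ns) for nt, ns in processor_splits(24) if nt * ns == 24])
# -> [(1, 24), (2, 12), (3, 8), (4, 6), (6, 4), (8, 3), (12, 2), (24, 1)]
```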
The wall-clock computing time as a function of the number of space and time sub-domains is presented in Fig. 4.31 a) for the 128 × 128 grid nodes. When only standard domain decomposition is used, the results reflect the well-known trend that, when the spatial dimension of the problem is large, the space domain decomposition is efficient and a meaningful computing time reduction
is achieved. For the present problem and the spatial mesh comprising 128 × 128
grid nodes, the time decomposition is not efficient because the computing-effort ratio between the parallel fine time-grid and the sequential coarse time-grid solutions constrains the performance of the method. The computing time is dictated by these sequential calculations in the iterative procedure because the number of outer iterations required in the coarse time-grid calculation is large. A smaller coarse-grid time-step size, and consequently more processors, would be necessary to increase the computing-effort ratio properly. When using the coarser space grid
of 32 × 32 nodes for the same problem, see Fig. 4.31 b), the space domain decomposition is not efficient. Computing time reduction can only be achieved by the
time domain decomposition method. The number of time sub-domains considered,
6 to 24, is sufficient to achieve a suitable computing effort ratio and the computing
time saving increases with the number of time sub-domains. The small number of
processors available limited the present hybrid space-time parallelization. Nevertheless, the expected computing time decrease with an increasing number of time sub-domains, for an optimal number of space sub-domains, can already be discerned in Fig. 4.31 a). From Fig. 4.31 b) it is possible to verify that the small
spatial dimension of the problem (32 × 32 nodes) prevents any benefit from the
hybrid parallelization.
Figure 4.31: Computing time required for the simulations: a) 128 × 128 nodes; b) 32 × 32 nodes.
4.6 Summary
A parallel-in-time method, based on the temporal domain decomposition, was
applied for the solution of the unsteady, incompressible Navier-Stokes equations
and extended to a hybrid spatial and temporal formulation.
The two-dimensional Taylor vortex-decay problem with Re = 100 was selected
to conduct a sensitivity analysis on some of the parameters influencing the parallel-in-time method. The following conclusions were derived:
(i) The parallel-in-time solution can require less computational time than the
single processor solution. Speed-up depends on several parameters that were
investigated.
(ii) The parallel-in-time computing time decreases with the number of processors and increases with the number of iterations required for convergence of
the iterative process, between the sequential coarse time-grid and the finer
parallel time-grid solutions.
(iii) The parallel efficiency of the parallel-in-time method increases with the decrease of the spatial dimension of the problem.
(iv) The parallel efficiency of the present method increases when the computational effort ratio, between fine and coarse time-grid integrators, increases.
The conditional stability domain was evaluated by numerical experiments for several numerical temporal discretization schemes applied on the coarser time-grid. For this purpose, calculations of the one-dimensional transport equation were performed simulating up to one hundred processors. No reduction of the stability domain was detected for the "unconditionally stable" implicit three-level and Euler schemes. The other schemes, explicit Euler, Adams-Bashforth, Crank-Nicolson and fourth-order Runge-Kutta, displayed important reductions of
their conditional stability domain for the test case investigated.
To evaluate the potential efficiency of the present method for fluid flow simulations, the convergence of this iterative method was analysed considering some
spatial and temporal discretization schemes commonly used for that purpose. The
parallel-in-time solution of the one-dimensional scalar transport equation allowed
some conclusions concerning the accuracy and convergence of the iterative method:
(i) The accuracy of the parallel-in-time solution increases with the number of iterations, according to the order of accuracy of the numerical schemes considered for the coarse and fine time-grids.
(ii) The number of iterations required for convergence can be estimated by the
ratio between the order of accuracy of the numerical schemes considered for
the fine and coarse time-grids.
Another important issue related to the performance of the parallel-in-time
method is the time spent by the communication tasks required by the algorithm. A
simplified performance model for the parallel-in-time method was proposed. The
flow past a two-dimensional square cylinder between parallel walls for Reynolds
number equal to 500 was selected to analyse through numerical experiments the
influence of the communication time on the parallel efficiency of the method. The
following conclusions could be derived:
(i) The agreement between the verified parallel efficiency in the numerical experiments and the one predicted by the theoretical model indicates that the communication overhead does not impose a critical limitation on the application of the present methodology for the unsteady, incompressible Navier-Stokes equations.
(ii) For large ratios between the coarse and fine time-step sizes, substantial computing time reduction can be expected.
The application of the parallel-in-time algorithm for the solution of the incompressible, unsteady Navier-Stokes equations was extended to a hybrid (space and
time) decomposition and the simulations agree with reference solutions for the
considered test cases. The numerical simulation of the two-dimensional, unsteady,
incompressible flow in a square cavity due to natural convection was selected to
illustrate the application of four computing strategies: sequential, space domain decomposition, time domain decomposition and hybrid space-time decomposition. When the space domain of the problem is small and the standard space domain decomposition method prevents any speed-up of the calculation, the parallel-in-time method allows the computing time to be reduced. The speed-up achieved with the present method is strongly dependent on the number of iterations required for convergence. An increasing number of time sub-domains decreases the number of iterations required for convergence and, consequently, contributes to increasing the computing time saving. The hybrid space-time calculations were successfully performed, but 24 processors were not enough to derive final conclusions. The results
suggest that the parallel-in-time methodology is promising when the temporal scale
of the problem under consideration is large and a large number of processors are
available.
Chapter 5
Conclusions
5.1 Summary
Computational fluid dynamics is currently one of the great challenges in supercomputing. The use of massively parallel processing systems is necessary to solve high-resolution problems cost-effectively. The number of nodes of parallel computers will naturally increase in the future, limited only by the largest number of linked computers, which is ultimately the World Wide Web with several hundred million processors. In such a scenario, a problem considered large nowadays may become a small one if a very large number of computer nodes is available. For unsteady fluid flow problems, one possible way to fully exploit the large number of nodes available in the future is the temporal domain decomposition method. Another relevant trend in CFD is the use of high-order methods. For a given solution accuracy, high-order methods require less memory (a smaller number of points) and in most cases can save computing time. To take advantage of the more accurate volume-averaged solution in higher-order formulations, it is essential to proceed with the reconstruction of the point-wise velocity field. This should be
done also on a higher-order basis to approximate congruently the time derivative
of the volume-averaged velocity with the accuracy of the other operators.
The spatial domain decomposition technique was applied to parallelize a finite-volume, fourth-order accurate solver code for the unsteady, incompressible form of the Navier-Stokes equations. The developed algorithm considers the time advancement of the momentum equations by the classical explicit four-stage Runge-Kutta
scheme, and the approximation of the convective and viscous fluxes at the control-volume faces by the fourth-order accurate Simpson's rule and polynomial interpolation. The calculation of the high-order de-averaging coefficients is based on the Taylor series expansion of the integrated velocity values at the cell and its neighbouring cells. The fourth-order global accuracy of the numerical scheme was verified by the two-dimensional Taylor vortex decay problem simulation. The global numerical scheme
becomes second-order accurate when the time derivative of the cell center velocity
substitutes the time derivative of the volume-averaged velocity.
The flow simulations of interacting vortices provided further conclusions.
The accuracy enhancement obtained by the inclusion of the fourth-order
de-averaging procedure was most pronounced for coarser spatial discretizations.
For the same level of accuracy, the proposed fourth-order de-averaging procedure requires fewer mesh nodes than the second-order one.
The differences between the solutions obtained with the proposed fourth-order and
the second-order de-averaging procedures vanish with mesh refinement.
A parallel-in-time method, based on temporal domain decomposition, was applied to the solution of the unsteady, incompressible Navier-Stokes equations.
Numerical experiments were selected to analyse the numerical properties of the
method. The solution of the non-linear fluid flow equations with the parallel-in-time method is a promising technique, but it is still at an early stage of development
and application. Among other properties, the stability, the theoretical parallel efficiency and
the robustness in dealing with complex non-linear unsteady flows were investigated.
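The predictor-corrector structure underlying the method can be illustrated on a scalar model equation rather than on the Navier-Stokes equations. The sketch below is a minimal serial emulation of a parareal-type iteration, assuming an explicit Euler coarse propagator and a classical RK4 fine propagator; in a parallel run each fine propagation would be assigned to a different processor. All functions and parameter values are illustrative assumptions.

import numpy as np

def f(u):
    # Model right-hand side: du/dt = -u (scalar, linear).
    return -u

def coarse(u, dt):
    # Coarse propagator over one time sub-domain: a single explicit Euler step.
    return u + dt * f(u)

def fine(u, dt, substeps=20):
    # Fine propagator: classical RK4 with several sub-steps.
    h = dt / substeps
    for _ in range(substeps):
        k1 = f(u)
        k2 = f(u + 0.5 * h * k1)
        k3 = f(u + 0.5 * h * k2)
        k4 = f(u + h * k3)
        u = u + h / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return u

def parareal(u0, T, n_sub, n_iter):
    # Coarse prediction followed by iterative fine corrections.
    dt = T / n_sub
    u = np.zeros(n_sub + 1)
    u[0] = u0
    for n in range(n_sub):                       # serial coarse prediction
        u[n + 1] = coarse(u[n], dt)
    for _ in range(n_iter):
        g_old = np.array([coarse(u[n], dt) for n in range(n_sub)])
        f_val = np.array([fine(u[n], dt) for n in range(n_sub)])   # parallel part
        for n in range(n_sub):                   # serial correction sweep
            u[n + 1] = coarse(u[n], dt) + f_val[n] - g_old[n]
    return u

u = parareal(u0=1.0, T=5.0, n_sub=10, n_iter=3)
print(u[-1], np.exp(-5.0))   # approaches the exact value exp(-5) as n_iter grows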
The conditional stability domain was evaluated by numerical experiments solving the one-dimensional scalar transport equation. For this purpose, various discretization schemes, from first- to fourth-order accurate, were applied for the coarse time-grid solution, considering up to one hundred processors.
No reduction of the stability domain was detected for the "unconditionally stable" implicit schemes. Other schemes, namely explicit Euler, Adams-Bashforth, Crank-Nicolson and fourth-order Runge-Kutta, displayed important reductions of their
conditional stability domain.
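The kind of scalar stability check underlying these experiments can be illustrated with the amplification factors of the individual time-integration schemes on the Dahlquist model equation du/dt = λu. The sketch below only locates the real-axis stability limit of each scheme; it does not reproduce the coupled two-grid iteration actually analysed in the thesis, which is what produces the reductions reported above.

import numpy as np

def amplification(z, scheme):
    # Amplification factor G(z), z = lambda*dt, for common one-step schemes.
    if scheme == "explicit_euler":
        return 1 + z
    if scheme == "implicit_euler":
        return 1 / (1 - z)
    if scheme == "crank_nicolson":
        return (1 + z / 2) / (1 - z / 2)
    if scheme == "rk4":
        return 1 + z + z**2 / 2 + z**3 / 6 + z**4 / 24
    raise ValueError(scheme)

# Real-axis stability limit: largest |z| with z < 0 such that |G(z)| <= 1.
for scheme in ("explicit_euler", "rk4", "crank_nicolson", "implicit_euler"):
    zs = -np.linspace(1e-3, 50.0, 50000)
    stable = np.abs(amplification(zs, scheme)) <= 1.0 + 1e-12
    limit = "unbounded" if stable.all() else f"{abs(zs[stable][-1]):.3f}"
    print(f"{scheme:15s} real-axis stability limit: {limit}")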
The prediction of the parallel efficiency, or speed-up, is relevant for deciding when
to use the parallel-in-time method for a specific problem. A theoretical performance model for the parallel-in-time method was derived, under some simplifying
assumptions, in order to evaluate the potential application of the technique. This
model takes into account several parameters that contribute to the computing time
required to perform a parallel-in-time calculation.
The number of iterations required for convergence plays an important role in
the efficiency of the parallel-in-time method. The one-dimensional scalar transport
equation was used to evaluate the potential efficiency of the method for fluid flow
simulations. The convergence of the iterative method was analysed for
spatial and temporal discretization schemes commonly used in CFD calculations.
The selection of the numerical schemes for the temporal evolution on each
time-grid, besides satisfying the stability constraints, should ensure that the
scheme used for the fine time-grid evolution provides a higher order of
convergence than the one used on the coarse time-grid. Following the general
theory on the convergence of numerical schemes, one can approximate the
number of iterations required for convergence by the ratio between the orders of
accuracy of the numerical schemes used for the fine and coarse time-grid solutions.
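These two observations can be combined in a deliberately simplified parareal-type cost model, sketched below: the iteration count is estimated as the ratio of the fine and coarse orders of accuracy, and the parallel time as serial coarse sweeps plus one distributed fine sweep per iteration plus a communication overhead. The model and all numbers are illustrative assumptions of this text, not the performance model derived in the thesis.

import math

def estimated_iterations(p_fine, p_coarse):
    # Iteration count approximated by the ratio of the orders of accuracy.
    return max(1, math.ceil(p_fine / p_coarse))

def estimated_speedup(n_sub, k, t_fine, t_coarse, t_comm):
    # Simplified parareal-type cost model (illustrative only).
    # t_fine, t_coarse: cost of propagating ONE time sub-domain on the fine
    # and coarse time-grids; t_comm: per-iteration communication cost.
    t_serial = n_sub * t_fine
    t_parallel = (k + 1) * n_sub * t_coarse + k * (t_fine + t_comm)
    return t_serial / t_parallel

k = estimated_iterations(p_fine=4, p_coarse=1)   # e.g. RK4 fine, explicit Euler coarse
print(k)                                         # 4
print(estimated_speedup(n_sub=24, k=k, t_fine=10.0, t_coarse=0.2, t_comm=0.5))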
The performance of the parallel-in-time method is also dependent on the time
spent by the communication tasks required by the algorithm. The performance
model was validated with solutions of the Navier-Stokes equations for a self-sustained unsteady flow problem. The comparison of the observed parallel-in-time
performance for the solution of a complex unsteady, incompressible flow problem
with the performance model prediction verifies that the communication time overhead of the parallel-in-time simulation does not prevent the
application of the method to fluid flow simulations.
The simulations of demanding unsteady flow problems led to the conclusion that
a few iterations are sufficient to obtain parallel-in-time solutions virtually identical to
those obtained on a single processor. Significant computing time savings can be
achieved for long-time simulations of problems with a small spatial domain.
It is believed that in the near future massively parallel computer systems will
increase the number of processors available, pushing back the limits on problem dimensions. Consequently, temporal or hybrid domain decomposition and
high-order finite-volume methods will have a high potential to reduce
the computing time of fluid flow simulations. This will have a positive impact on
the solution of the partial differential equations that CFD deals with, as well as on other
areas of computer modelling in engineering and science.
5.2 Suggestions for future work
The parallel-in-time solution of non-linear fluid flow equations is still at an early stage,
and many theoretical, numerical and practical topics need further investigation.
The main tasks in this area that should be addressed are:
(i) Investigate the stability of the algorithm including non-linear stability analysis.
(ii) Incorporate in the algorithm automatic options allowing coarser spatial meshes
for the coarse time-grid prediction step.
(iii) Investigate the extension of the algorithm to a multi-level formulation, in a
similar fashion to spatial multi-grid methods, to increase the performance
of the method.
Finally, general suggestions for future work are:
(i) Port the codes developed for parallel processing on PC-clusters to massive
Grid computing infrastructures.
(ii) Develop error estimation techniques to incorporate into h-p refinement
methodologies for high-order massively parallel calculations.
Appendix A
De-averaging coefficients
The derivation of the de-averaging coefficients was performed with symbolic computing software (MATHEMATICA). First, a generic function is considered on
the de-averaging finite-volume stencil. The truncated Taylor series expansion (up
to the required order) is then integrated over each control volume. The resulting
system of equations gives the de-averaging coefficients Ci of Eq. (2.54),
$$ u = \frac{1}{V} \sum_i C_i \, \hat{u}_i , $$
where V denotes the volume of each cell.
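The same derivation procedure can be sketched in one dimension with a different symbolic toolkit (SymPy instead of MATHEMATICA); the three-cell stencil and the fourth-order target below are illustrative assumptions, not the stencils of this appendix.

import sympy as sp

x, dx = sp.symbols("x dx", positive=True)
a = sp.symbols("a0:5")
# Generic smooth function represented by its truncated Taylor series about x = 0.
u = sum(a[n] * x**n / sp.factorial(n) for n in range(5))

def cell_average(centre):
    # Average of u over the control volume of width dx centred at 'centre'.
    return sp.integrate(u, (x, centre - dx / 2, centre + dx / 2)) / dx

u_hat = [cell_average(c) for c in (-dx, sp.Integer(0), dx)]

# Seek u(0) = sum_i C_i * u_hat_i, matching the Taylor terms in a0, a1, a2;
# the a3 term then cancels by symmetry, leaving an O(dx^4) residual.
C = sp.symbols("C0:3")
residual = sp.expand(sum(Ci * ui for Ci, ui in zip(C, u_hat)) - u.subs(x, 0))
eqs = [sp.Eq(residual.coeff(a[n]), 0) for n in range(3)]
print(sp.solve(eqs, C))   # expected: C0 = C2 = -1/24, C1 = 13/12 (i.e. 26/24)

In two and three dimensions the same procedure, applied to the larger stencils described below, yields the coefficients of Tab. A.1 and Figs. A.3 to A.11.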
The de-averaging coefficients were calculated to obtain sixth-order accuracy,
requiring 13 finite volume cells for the two-dimensional case, as indicated in Fig.
A.1 for cells far from the boundaries of the domain. Near the boundaries, only
fourth-order accuracy is achieved due to the non-symmetric stencil. The stencil
only includes interior cells. The coefficients for a uniform mesh are indicated
in Tab. A.1, according to the control-volume location indicated in Fig. A.2.
Coefficients of cell type 1 are considered for all interior control volumes.
Consideration of control volumes near other boundaries is straightforward.
Figure A.3 includes the stencil and the de-averaging coefficients for an interior
cell on a 3-D uniform mesh. Non-symmetric stencils and coefficients required to
perform the de-averaging near boundaries, where application of the symmetric
stencil is not possible, are indicated in Figs. A.4 to A.11.
Figure A.1: High-order 2-D de-averaging stencil for interior control volumes.
Figure A.2: Cases considered for the use of the 2-D de-averaging coefficients.
Table A.1: De-averaging coefficients for a uniform 2-D Cartesian spatial grid.
Figure A.3: 3-D stencil and de-averaging coefficients for an interior cell.
Figure A.4: 3-D stencil and de-averaging coefficients for a boundary face cell.
Figure A.5: 3-D stencil and de-averaging coefficients for a boundary face cell near an edge.
Figure A.6: 3-D stencil and de-averaging coefficients for a boundary face cell near a vertex.
Figure A.7: 3-D stencil and de-averaging coefficients for a cell near a boundary face.
Figure A.8: 3-D stencil and de-averaging coefficients for a cell near a boundary face and edge.
Figure A.9: 3-D stencil and de-averaging coefficients for a cell near a boundary face and vertex.
Figure A.10: 3-D stencil and de-averaging coefficients for a boundary edge cell.
Figure A.11: 3-D stencil and de-averaging coefficients for a boundary vertex cell.