PETSc-FEM: A General Purpose, Parallel, Multi-Physics FEM Program.
User’s Guide

by Mario Storti, Norberto Nigro, Rodrigo Paz, Lisandro Dalcín and Ezequiel López

Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC), Santa Fe, Argentina
http://www.cimec.org.ar/petscfem
((version "mstorti-v15-root-5-g62b6839 ’clean")
(date "Sun Apr 27 18:22:33 2008 -0300")
(processed-date "Sun Apr 27 18:22:36 2008 -0300"))
April 27, 2008
This is the documentation for PETSc-FEM (current version
mstorti-v15-root-5-g62b6839 ’clean), a general purpose, parallel, multi-physics
FEM program for CFD applications based on PETSc. PETSc-FEM
comprises both a library that allows the user to develop FEM (or FEM-like,
i.e. non-structured mesh oriented) programs, and a suite of application
programs. It is written in the C++ language with an OOP (Object Oriented
Programming) philosophy, but always keeping efficiency in mind.
Contents

1 LICENSE
2 GNU GENERAL PUBLIC LICENSE
  2.1 Preamble
  2.2 GNU General Public License. Terms and Conditions for Copying, Distribution and Modification
  2.3 Appendix: How to Apply These Terms to Your New Programs
3 The PETSc-FEM philosophy
  3.1 The three levels of interaction with PETSc-FEM
  3.2 The elemset concept
4 General layout of the user input data file
  4.1 Preprocessing the user input data file
  4.2 Internal preprocessing. The FileStack class
    4.2.1 Syntax
    4.2.2 Class internals
  4.3 Preprocessing with ePerl
    4.3.1 Basics of ePerl
    4.3.2 Variables
    4.3.3 Text expansion
    4.3.4 Conditional processing
    4.3.5 File inclusion
    4.3.6 Use of ePerl in makefiles
    4.3.7 ePerlini library
    4.3.8 Errors in ePerl processing
  4.4 General options
    4.4.1 Read mesh options
    4.4.2 Elemset options
    4.4.3 PFMat/IISDMat class options
  4.5 Emacs tools and tips for editing data files
    4.5.1 Installing PETSc-FEM mode
    4.5.2 Copying and pasting PETSc-FEM options
5 Text hash tables
  5.1 The elemset hash table of properties
  5.2 Text hash table inclusion
  5.3 The global table
  5.4 Reading strings directly from the hash table
  5.5 Reading with ‘get int’ and ‘get double’
  5.6 Per element properties table
  5.7 Taking values transparently from hash table or per element table
6 The general advective elemset
  6.1 Introduction to advective systems of equations
  6.2 Discretization of advective systems
  6.3 SUPG stabilization
  6.4 Shock capturing
  6.5 Creating a new advective system
  6.6 Flux function routine arguments
    6.6.1 Options
  6.7 The hydrology module
    6.7.1 Related Options
  6.8 The Hydrological Model (cont.)
  6.9 Subsurface Flow
  6.10 Surface Flow
    6.10.1 2D Saint-Venant Model
    6.10.2 1D Saint-Venant Model
    6.10.3 Kinematic Wave Model
  6.11 Boundary Conditions
    6.11.1 Boundary Conditions to simulate River-Aquifer Interactions/Coupling Term
    6.11.2 Initial Conditions. First, Second and Third Kind Boundary Conditions/Absorbent Boundary Condition
  6.12 Absorbing boundary conditions
    6.12.1 Linear absorbing boundary conditions
    6.12.2 Riemann based absorbing boundary conditions
    6.12.3 Absorbing boundary conditions based on last state
    6.12.4 Finite element setup
    6.12.5 Extrapolation from interiors
    6.12.6 Avoiding extrapolation
    6.12.7 Flux functions with enthalpy
    6.12.8 Absorbing boundary conditions available
7 The Navier-Stokes module
  7.1 LES implementation
    7.1.1 The wall elemset
    7.1.2 The mixed type boundary condition
    7.1.3 The van Driest damping factor. Programming notes
  7.2 Options
  7.3 Mesh movement
8 Tests and examples
  8.1 Flow in the annular region between two cylinders
  8.2 Flow in a square with periodic boundary conditions
  8.3 The oscillating plate problem
  8.4 Linear advection-diffusion in a rectangle
9 The FastMat2 matrix class
  9.1 Introduction
  9.2 Example
    9.2.1 Current Matrix view
    9.2.2 Set operations
    9.2.3 Dimension matching
    9.2.4 Automatic dimensioning
    9.2.5 Concatenation of operations
  9.3 Caching the addresses used in the operations
    9.3.1 Branching
    9.3.2 Loops executed a non constant number of times
    9.3.3 Masks can’t traverse branches
    9.3.4 Efficiency
  9.4 Synopsis of operations
    9.4.1 One-to-one operations
    9.4.2 In-place operations
    9.4.3 Generic “sum” operations (sum over indices)
    9.4.4 Sum operations over all indices
    9.4.5 Export/Import operations
    9.4.6 Static cache operations
10 Hooks
  10.1 Launching hooks. The hook list
  10.2 Dynamically loaded hooks
  10.3 Shell hook
  10.4 Shell hooks with “make”
11 Gatherers and embedded gatherers
  11.1 Dimensioning the values vector
  11.2 Embedded gatherers
  11.3 Automatic computation of layer connectivities
  11.4 Passing element contributions as per-element properties
  11.5 Parallel aspects
  11.6 Creating a gatherer
12 Generic load elemsets
  12.1 Linear generic load elemset
  12.2 Functional extensions of the elemset
  12.3 The flow reversal elemset
13 Visualization with DX
  13.1 Asynchronous/synchronous communication
  13.2 Building and loading the ExtProgImport module
  13.3 Inputs/outputs of the ExtProgImport module
  13.4 DX hook options
14 The “idmap” class
  14.1 Permutation matrices
  14.2 Permutation matrices in the FEM context
  14.3 A small example
  14.4 Inversion of the map
  14.5 Design and efficiency restrictions
  14.6 Implementation
  14.7 Block matrices
    14.7.1 Example
  14.8 Temporal dependent boundary conditions
    14.8.1 Built in temporal functions
    14.8.2 Implementation details
    14.8.3 How to add a new temporal function
    14.8.4 Dynamically loaded amplitude functions
    14.8.5 Use of prefixes
    14.8.6 Time like problems
15 The compute prof package
  15.1 MPI matrices in PETSc
  15.2 Profile determination
16 The PFMat class
  16.1 The PFMat abstract interface
  16.2 IISD solver
    16.2.1 Interface preconditioning
  16.3 Implementation details of the IISD solver
  16.4 Efficiency considerations
17 The DistMap class
  17.1 Abstract interface
  17.2 Implementation details
  17.3 Mesh refinement
    17.3.1 Symmetry group generator
    17.3.2 Canonical ordering
  17.4 Permutation tree
  17.5 Canonical ordering
  17.6 Object hashing
18 Synchronized buffer
  18.1 A more specialized class
19 Authors
20 Grants received
21 Symbols and Acronyms
  21.1 Acronyms
22 Symbols
1 LICENSE
The PETSc-FEM package is a library and application suite oriented to the Finite Element Method, based on PETSc. Copyright (C) 1999-2008, Mario Alberto Storti, Norberto Marcelo Nigro, Rodrigo R. Paz, Lisandro Dalcin, Ezequiel Lopez. Centro Internacional de Métodos Numéricos en Ingeniería (CIMEC-Argentina), Universidad Nacional
del Litoral (UNL-Argentina), Consejo Nacional de Investigaciones Científicas y Técnicas
(CONICET-Argentina).
This program is free software; you can redistribute it and/or modify it under the terms
of the GNU General Public License as published by the Free Software Foundation; either
version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this
program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330,
Boston, MA 02111-1307, USA.
2 GNU GENERAL PUBLIC LICENSE
Version 2, June 1991
Copyright (C) 1989, 1991 Free Software Foundation, Inc. 675 Mass Ave, Cambridge,
MA 02139, USA Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.
2.1 Preamble
The licenses for most software are designed to take away your freedom to share and change
it. By contrast, the GNU General Public License is intended to guarantee your freedom
to share and change free software–to make sure the software is free for all its users. This
General Public License applies to most of the Free Software Foundation’s software and
to any other program whose authors commit to using it. (Some other Free Software
Foundation software is covered by the GNU Library General Public License instead.) You
can apply it to your programs, too.
When we speak of free software, we are referring to freedom, not price. Our General
Public Licenses are designed to make sure that you have the freedom to distribute copies
of free software (and charge for this service if you wish), that you receive source code or
can get it if you want it, that you can change the software or use pieces of it in new free
programs; and that you know you can do these things.
To protect your rights, we need to make restrictions that forbid anyone to deny you
these rights or to ask you to surrender the rights. These restrictions translate to certain
responsibilities for you if you distribute copies of the software, or if you modify it.
For example, if you distribute copies of such a program, whether gratis or for a fee, you
must give the recipients all the rights that you have. You must make sure that they, too,
receive or can get the source code. And you must show them these terms so they know
their rights.
We protect your rights with two steps:
• copyright the software, and
• offer you this license which gives you legal permission to copy, distribute and/or
modify the software.
Also, for each author’s protection and ours, we want to make certain that everyone
understands that there is no warranty for this free software. If the software is modified
by someone else and passed on, we want its recipients to know that what they have is
not the original, so that any problems introduced by others will not reflect on the original
authors’ reputations.
Finally, any free program is threatened constantly by software patents. We wish to avoid
the danger that redistributors of a free program will individually obtain patent licenses,
in effect making the program proprietary. To prevent this, we have made it clear that any
patent must be licensed for everyone’s free use or not licensed at all.
The precise terms and conditions for copying, distribution and modification follow.
2.2 GNU General Public License. Terms and Conditions for Copying,
Distribution and Modification
0. This License applies to any program or other work which contains a notice placed by
the copyright holder saying it may be distributed under the terms of this General
Public License. The ”Program”, below, refers to any such program or work, and
a ”work based on the Program” means either the Program or any derivative work
under copyright law: that is to say, a work containing the Program or a portion of
it, either verbatim or with modifications and/or translated into another language.
(Hereinafter, translation is included without limitation in the term ”modification”.)
Each licensee is addressed as ”you”.
Activities other than copying, distribution and modification are not covered by this
License; they are outside its scope. The act of running the Program is not restricted,
and the output from the Program is covered only if its contents constitute a work
based on the Program (independent of having been made by running the Program).
Whether that is true depends on what the Program does.
1. You may copy and distribute verbatim copies of the Program’s source code as you
receive it, in any medium, provided that you conspicuously and appropriately publish
on each copy an appropriate copyright notice and disclaimer of warranty; keep intact
all the notices that refer to this License and to the absence of any warranty; and give
any other recipients of the Program a copy of this License along with the Program.
You may charge a fee for the physical act of transferring a copy, and you may at
your option offer warranty protection in exchange for a fee.
2. You may modify your copy or copies of the Program or any portion of it, thus forming a work based on the Program, and copy and distribute such modifications or
work under the terms of Section 1 above, provided that you also meet all of these
conditions:
a) You must cause the modified files to carry prominent notices stating that you
changed the files and the date of any change.
b) You must cause any work that you distribute or publish, that in whole or in part
contains or is derived from the Program or any part thereof, to be licensed as
a whole at no charge to all third parties under the terms of this License.
c) If the modified program normally reads commands interactively when run, you
must cause it, when started running for such interactive use in the most ordinary
way, to print or display an announcement including an appropriate copyright
notice and a notice that there is no warranty (or else, saying that you provide a
warranty) and that users may redistribute the program under these conditions,
and telling the user how to view a copy of this License. (Exception: if the
Program itself is interactive but does not normally print such an announcement,
your work based on the Program is not required to print an announcement.)
These requirements apply to the modified work as a whole. If identifiable sections
of that work are not derived from the Program, and can be reasonably considered
independent and separate works in themselves, then this License, and its terms, do
not apply to those sections when you distribute them as separate works. But when
you distribute the same sections as part of a whole which is a work based on the
Program, the distribution of the whole must be on the terms of this License, whose
permissions for other licensees extend to the entire whole, and thus to each and every
part regardless of who wrote it.
Thus, it is not the intent of this section to claim rights or contest your rights to
work written entirely by you; rather, the intent is to exercise the right to control the
distribution of derivative or collective works based on the Program.
In addition, mere aggregation of another work not based on the Program with the
Program (or with a work based on the Program) on a volume of a storage or distribution medium does not bring the other work under the scope of this License.
3. You may copy and distribute the Program (or a work based on it, under Section 2) in
object code or executable form under the terms of Sections 1 and 2 above provided
that you also do one of the following:
a) Accompany it with the complete corresponding machine-readable source code,
which must be distributed under the terms of Sections 1 and 2 above on a
medium customarily used for software interchange; or,
b) Accompany it with a written offer, valid for at least three years, to give any third
party, for a charge no more than your cost of physically performing source
distribution, a complete machine-readable copy of the corresponding source
code, to be distributed under the terms of Sections 1 and 2 above on a medium
customarily used for software interchange; or,
c) Accompany it with the information you received as to the offer to distribute
corresponding source code. (This alternative is allowed only for noncommercial
distribution and only if you received the program in object code or executable
form with such an offer, in accord with Subsection b above.)
The source code for a work means the preferred form of the work for making modifications to it. For an executable work, complete source code means all the source
code for all modules it contains, plus any associated interface definition files, plus
the scripts used to control compilation and installation of the executable. However,
as a special exception, the source code distributed need not include anything that is
normally distributed (in either source or binary form) with the major components
(compiler, kernel, and so on) of the operating system on which the executable runs,
unless that component itself accompanies the executable.
If distribution of executable or object code is made by offering access to copy from
a designated place, then offering equivalent access to copy the source code from the
same place counts as distribution of the source code, even though third parties are
not compelled to copy the source along with the object code.
4. You may not copy, modify, sublicense, or distribute the Program except as expressly
provided under this License. Any attempt otherwise to copy, modify, sublicense or
distribute the Program is void, and will automatically terminate your rights under
this License. However, parties who have received copies, or rights, from you under
this License will not have their licenses terminated so long as such parties remain in
full compliance.
5. You are not required to accept this License, since you have not signed it. However,
nothing else grants you permission to modify or distribute the Program or its derivative works. These actions are prohibited by law if you do not accept this License.
Therefore, by modifying or distributing the Program (or any work based on the Program), you indicate your acceptance of this License to do so, and all its terms and
conditions for copying, distributing or modifying the Program or works based on it.
6. Each time you redistribute the Program (or any work based on the Program), the
recipient automatically receives a license from the original licensor to copy, distribute
or modify the Program subject to these terms and conditions. You may not impose
any further restrictions on the recipients’ exercise of the rights granted herein. You
are not responsible for enforcing compliance by third parties to this License.
7. If, as a consequence of a court judgment or allegation of patent infringement or for any
other reason (not limited to patent issues), conditions are imposed on you (whether
by court order, agreement or otherwise) that contradict the conditions of this License,
they do not excuse you from the conditions of this License. If you cannot distribute
so as to satisfy simultaneously your obligations under this License and any other
pertinent obligations, then as a consequence you may not distribute the Program at
all. For example, if a patent license would not permit royalty-free redistribution of
the Program by all those who receive copies directly or indirectly through you, then
the only way you could satisfy both it and this License would be to refrain entirely
from distribution of the Program.
If any portion of this section is held invalid or unenforceable under any particular
circumstance, the balance of the section is intended to apply and the section as a
whole is intended to apply in other circumstances.
It is not the purpose of this section to induce you to infringe any patents or other
property right claims or to contest validity of any such claims; this section has the
sole purpose of protecting the integrity of the free software distribution system, which
is implemented by public license practices. Many people have made generous contributions to the wide range of software distributed through that system in reliance
on consistent application of that system; it is up to the author/donor to decide if
he or she is willing to distribute software through any other system and a licensee
cannot impose that choice.
This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License.
8. If the distribution and/or use of the Program is restricted in certain countries either
by patents or by copyrighted interfaces, the original copyright holder who places the
Program under this License may add an explicit geographical distribution limitation
excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such case, this License incorporates the limitation as if
written in the body of this License.
9. The Free Software Foundation may publish revised and/or new versions of the General
Public License from time to time. Such new versions will be similar in spirit to the
present version, but may differ in detail to address new problems or concerns.
Each version is given a distinguishing version number. If the Program specifies a
version number of this License which applies to it and ”any later version”, you have
the option of following the terms and conditions either of that version or of any later
version published by the Free Software Foundation. If the Program does not specify
a version number of this License, you may choose any version ever published by the
Free Software Foundation.
10. If you wish to incorporate parts of the Program into other free programs whose distribution conditions are different, write to the author to ask for permission. For
software which is copyrighted by the Free Software Foundation, write to the Free
Software Foundation; we sometimes make exceptions for this. Our decision will be
guided by the two goals of preserving the free status of all derivatives of our free
software and of promoting the sharing and reuse of software generally.
NO WARRANTY

11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE,
THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN
WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM ”AS IS” WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND
PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY
SERVICING, REPAIR OR CORRECTION.
12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN
WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO
MAY MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED
ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL,
SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF
THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT
LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE
OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF
THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF
SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
END OF TERMS AND CONDITIONS
2.3 Appendix: How to Apply These Terms to Your New Programs
If you develop a new program, and you want it to be of the greatest possible use to
the public, the best way to achieve this is to make it free software which everyone can
redistribute and change under these terms.
To do so, attach the following notices to the program. It is safest to attach them to the
start of each source file to most effectively convey the exclusion of warranty; and each file
should have at least the ”copyright” line and a pointer to where the full notice is found.
<one line to give the program’s name and a brief idea of what it does.>
Copyright (C) 19yy <name of author>
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 2 of the
License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
Also add information on how to contact you by electronic and paper mail.
If the program is interactive, make it output a short notice like this when it starts in an
interactive mode:
Gnomovision version 69, Copyright (C) 19yy name of author
Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type ‘show w’.
This is free software, and you are welcome to redistribute it
under certain conditions; type ‘show c’ for details.
The hypothetical commands ‘show w’ and ‘show c’ should show the appropriate parts
of the General Public License. Of course, the commands you use may be called something
other than ‘show w’ and ‘show c’; they could even be mouse-clicks or menu items–whatever
suits your program.
You should also get your employer (if you work as a programmer) or your school, if any,
to sign a ”copyright disclaimer” for the program, if necessary. Here is a sample; alter the
names:
Yoyodyne, Inc., hereby disclaims all copyright interest in the
program ‘Gnomovision’ (which makes passes at compilers) written by
James Hacker.

<signature of Ty Coon>, 1 April 1989
Ty Coon, President of Vice
This General Public License does not permit incorporating your program into proprietary programs. If your program is a subroutine library, you may consider it more useful
to permit linking proprietary applications with the library. If this is what you want to do,
use the GNU Library General Public License instead of this License.
3 The PETSc-FEM philosophy
3.1 The three levels of interaction with PETSc-FEM
As stated in the PETSc-FEM description, it is both a library and an application suite.
This means that some applications, such as Navier-Stokes, Euler (inviscid flow), shallow
water, general advective linear systems and the Laplace/Poisson equation, come bundled
with it, whereas the library is an abstract interface that allows people to write other
applications. Thus we distinguish between the “user”, whose interaction with PETSc-FEM
is limited to writing data files for the bundled applications, and the “application writer”,
that is, someone who uses the library to develop new applications. Usually, application
writers write a main() routine that uses calls to the PETSc-FEM library in order
to assemble PETSc vectors and matrices and performs algebraic operations among them
via calls to PETSc routines. In addition, they also have to code “element routines” that
compute vectors and matrices at the element level. PETSc-FEM is the code layer that
is in charge of assembling the individual contributions into the global vectors or matrices,
taking into account fixations, etc. Finally, there are the “PETSc-FEM programmers”,
that is, people who write code for the core library.
[Figure: block diagram. The data file (written by the user) feeds the application
code (main() and the element routines, written by the application writer), which
builds on the PETSc-FEM library and PETSc to produce the program output.]
Figure 1: Typical structure of a PETSc-FEM application
3.2 The elemset concept
PETSc-FEM is written in the C++ language and sticks to the Object-Oriented Program-
ming (OOP) philosophy. In OOP, data is stored in “objects” and the programmer accesses
them via an abstract interface. A first approach to writing a Finite Element program with
an OOP philosophy is to define each element or node as an object. However, this is both
time and memory consuming, since accessing each element or each node is performed by
passing through the whole code layer of the element or node class. As one of the primary
objectives of PETSc-FEM is efficiency, we solved this by defining the basic objects
as whole sets of elements or nodes that share almost all their properties, aside from element
connectivities or node coordinates. This is very common in CFD, where for each problem
almost all the elements share the same physical properties (viscosity, density, etc.) and
options (number of Gauss integration points, parameters for the FEM formulation, etc.).
Thus, for each problem the user defines a nodedata object, and one or several elemset
objects.
Each elemset groups elements of the same type, i.e. those for which residuals and matrices
are to be computed by the same routine. Usually, in CFD all the elements are processed
by the same routine, so one may ask what purpose several elemsets can serve. First,
some boundary conditions (constant flux or mixed boundary conditions for
heat conduction, absorbing boundary conditions for advective systems) may be imposed
more conveniently through an elemset abstraction. Also, several elemsets may be used
to reduce the space required for storing varying physical properties. If some physical
property is shared by all the elements, for instance viscosity or specific heat, then it is
defined once for the whole elemset. If the quantity varies continuously in the region, but
is known a priori, then it can be passed as a “per-element” property (see §5.6), but that
means storing a double (or the amount of memory needed for that property) for each
element. If it is not the same over the whole mesh, but is constant over large regions, then
it may be convenient to divide the mesh into several elemsets, such that the given property
has the same value over the elements of each elemset.
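The elemset idea can be sketched in C++ as follows (a minimal illustration with assumed names, not the actual PETSc-FEM classes): options shared by every element are stored once in the elemset, while only the connectivities are stored per element.

```cpp
#include <array>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Minimal illustration (hypothetical names): options shared by every
// element live once in the elemset, while only the connectivities are
// stored per element.
struct Elemset {
  std::string type;                         // e.g. "volume_euler"
  std::map<std::string, std::string> props; // shared options (viscosity, npg, ...)
  std::vector<std::array<int, 4>> icone;    // one connectivity row per element
};

// Storage for a shared property is O(1) per elemset, not O(nelem).
inline std::size_t shared_prop_entries(const Elemset &e) {
  return e.props.size(); // independent of e.icone.size()
}
```

With this layout, adding more elements grows only the connectivity table; the property table stays the same size.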
4 General layout of the user input data file
Input to application packages (like ns, advective or laplace) is fed via input data files,
usually with extension .dat. This file contains global options, node coordinates, element
connectivities, physical properties, fixations and boundary conditions, etc. Even if the
precise format of the file may change, it is worth describing its general layout.
The file is divided into sections: the table sections, the nodedata section, several elemset
sections, the fixa section and the constraint section.
Each section starts with the keyword identifying the section on a separate line, followed
by several parameters on the same line. Then come the lines that make up the section,
ending with a terminator of the form __END_<some-identifier>__, for instance __END_HASH__.
(Note that these terminators start and end with double underscores (__) while single
underscores are used to separate words inside the keyword). For instance, an elemset
section is of the form
elemset volume_euler 4
geometry cartesian2d
ndim 2
npg 4
chunk_size 1000
lumped_mass 1
shock_capturing 1
gamma 1.4
<... other element options go here >
__END_HASH__
1 2 81 80
2 3 82 81
3 4 83 82
4 5 84 83
5 6 85 84
6 7 86 85
7 8 87 86
8 9 88 87
<... more element connectivities follow here >
__END_ELEMSET__
In this example, the keyword elemset is followed by the parameters volume_euler, which
is the elemset type, and 4, which is the number of nodes connected to each element. A line
starting with props, if present, describes some per-element quantities (more on this later,
see §5.4). Then follows the assignment of values to parameters for the actual elemset; for
instance, the value 4 is assigned to the npg parameter, that is, the number of Gauss points.
The assignment of parameters ends with the __END_HASH__ terminator. Then follow the
element connectivities, one per line, ending with the terminator __END_ELEMSET__.
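The section layout described above can be sketched as a minimal reader (a simplified illustration under assumed names such as ParsedElemset and parse_elemset; the actual read_mesh() routine is more elaborate):

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Simplified sketch of reading one "elemset" section: option lines up
// to the __END_HASH__ terminator, then one connectivity row per line up
// to the __END_ELEMSET__ terminator.
struct ParsedElemset {
  std::vector<std::string> options;
  std::vector<std::vector<int>> elements;
};

inline ParsedElemset parse_elemset(std::istream &in) {
  ParsedElemset p;
  std::string line;
  // 1) parameter assignments, terminated by __END_HASH__
  while (std::getline(in, line) && line != "__END_HASH__")
    p.options.push_back(line);
  // 2) element connectivities, terminated by __END_ELEMSET__
  while (std::getline(in, line) && line != "__END_ELEMSET__") {
    std::istringstream row(line);
    std::vector<int> nodes;
    int n;
    while (row >> n) nodes.push_back(n);
    p.elements.push_back(nodes);
  }
  return p;
}
```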
4.1 Preprocessing the user input data file
It’s very handy to have some preprocessing capabilities when reading input files, for
instance file inclusion, if-else constructs, macro definitions, inline arithmetic, etc. Some
degree of preprocessing is performed inside PETSc-FEM; this includes file inclusion,
skipping comments, and continuation lines, and is described in section §4.2. Of course,
this internal preprocessing may be combined with any previous preprocessing package, such
as m4 or ePerl. In section §4.2 we describe internal preprocessing, while in section §4.3 we
describe preprocessing with ePerl.
The reason to have these two mechanisms of preprocessing is the following. Preprocessing
with ePerl (or m4, or other packages) is very powerful and well supported; however, the
mechanism is to create an intermediate file, and this file may be very large for large prob-
lems, so that some internal preprocessing, including at least file inclusion, is needed.
On the other hand, including all the preprocessing capabilities of a package like ePerl
is beyond the scope of PETSc-FEM, so we preferred to keep these two levels of
preprocessing. The idea is to have a small user data file where the strong capabilities of an
external preprocessing package may be used, while performing file inclusion of very large
files containing node coordinates, element connectivities and so on as the file is read,
avoiding the creation of large intermediate files. In addition, this allows the user to choose
the external preprocessing package.
4.2 Internal preprocessing. The FileStack class
This class allows reading from a set of linked files (the PETSc-FEM data file including node
coordinates, mesh connectivities, etc.) with some preprocessing capabilities. It supports
file inclusion, comments and continuation lines. The preprocessing capabilities supported
in this class are the minimal ones needed in order to treat very large files efficiently. More
advanced preprocessing capabilities may be added in a higher level layer written in Perl or
similar (perhaps ePerl).
4.2.1 Syntax
The syntax of comments and continuation lines is borrowed from Unix shells,
Comments: From the first appearance of “#” to the end of the line is considered a
comment.
Continuation lines: If a line ends in “\”, (i.e. if backslash is the last character in the line,
before newline “^J”) then the next line is appended to the previous one to form a
logical single line.
File inclusion: The directive
__INCLUDE__ path/to/file
inserts the contents of “file” in directory “path/to/” into this file at the current
point. Paths may be absolute or relative (we use “fopen” from “stdio”). File
inclusion may be recursive, up to the limit on the number of files that can be kept
open simultaneously by the system.
Echo control: The directives __ECHO_ON__, __ECHO_OFF__ control whether the lines
read from input should be copied to the output. Usually one may be interested in
copying some part of the input to the output in order to remember the parameters of
the run. As implemented so far, this feature is recursive, so that files included with
the internal preprocessing (i.e. with the __INCLUDE__ directive) will also be copied
to the output, unless you enclose the __INCLUDE__ line itself within an __ECHO_OFF__,
__ECHO_ON__ pair. For instance
__ECHO_ON__
global_options
...
# more options here
nsaverot 50
viscosity 13.3333333333333
weak_form 0
...
# and here
__END_HASH__
nodes 2 2 3
# do not echo coordinates
__ECHO_OFF__
__INCLUDE__ stabi.nod.tmp
__ECHO_ON__
__END_NODES__
...
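The comment and continuation-line rules above can be sketched as follows (an illustrative reimplementation, not the actual FileStack code):

```cpp
#include <string>

// Sketch of the two lexical rules (shell-like syntax): strip from '#'
// to the end of the line; a trailing backslash joins the next physical
// line into the same logical line.
inline std::string strip_comment(const std::string &line) {
  std::string::size_type pos = line.find('#');
  return (pos == std::string::npos) ? line : line.substr(0, pos);
}

inline bool has_continuation(const std::string &line) {
  return !line.empty() && line.back() == '\\';
}

// Join two physical lines into one logical line, dropping the backslash.
inline std::string join_continuation(const std::string &first,
                                     const std::string &next) {
  return first.substr(0, first.size() - 1) + next;
}
```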
4.2.2 Class internals
As its name suggests, the class is based on a stack of files. When a “get_line()” is issued,
a new line is read, comments are skipped and continuation lines are resolved. If the line is
an “__INCLUDE__” directive, then the current file is “pushed” onto the stack and the new
file is opened and becomes the current file.
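The push-on-include behavior can be sketched with a toy class (ToyFileStack is a hypothetical stand-in that reads from in-memory streams instead of real files):

```cpp
#include <map>
#include <memory>
#include <sstream>
#include <stack>
#include <string>

// Toy sketch of the FileStack idea: get_line() reads from the topmost
// stream; an __INCLUDE__ line pushes a new stream, which becomes the
// current one until it is exhausted. Streams stand in for files here.
class ToyFileStack {
public:
  // Test hook: contents of pretend "files", keyed by name.
  std::map<std::string, std::string> fs;

  void push(std::unique_ptr<std::istream> s) { streams.push(std::move(s)); }

  bool get_line(std::string &line) {
    while (!streams.empty()) {
      if (std::getline(*streams.top(), line)) {
        if (line.rfind("__INCLUDE__ ", 0) == 0) {
          // The real class would fopen() the named file here.
          push(open_named(line.substr(12)));
          continue;
        }
        return true;
      }
      streams.pop(); // end of included file: resume the previous one
    }
    return false;
  }

private:
  std::unique_ptr<std::istream> open_named(const std::string &name) {
    return std::unique_ptr<std::istream>(new std::istringstream(fs[name]));
  }
  std::stack<std::unique_ptr<std::istream>> streams;
};
```

Note how popping an exhausted stream transparently resumes the including file at the line after the directive.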
4.3 Preprocessing with ePerl
ePerl (for “embedded Perl”) is a package that allows inclusion of Perl commands within
text files. Perl is the “Practical Extraction and Report Language”, a scripting lan-
guage optimized for scanning arbitrary text files, extracting information from those
text files, and printing reports based on that information. For more information about
ePerl see http://www.engelschall.com/sw/eperl/, while for more information on Perl see
http://www.perl.com.
Describing the full possibilities of preprocessing with ePerl is far beyond the scope of this
manual. We will describe some basic techniques of use in the context of writing PETSc-
FEM user data files.
4.3.1 Basics of ePerl
ePerl allows you to embed Perl commands within text, enclosing them within <: and :>
delimiters, for instance
<: $pi = 2*atan2(1,0); $theta = sin($pi/4); :>
...
some text here
...
theta <: print $theta :>
beta <: print 4*$theta :>
results, after being processed by ePerl, in
...
some text here
...
theta 0.707106781186547
beta 2.82842712474619
The basic rules are
• Variables start with $ followed by a C-like identifier (alphanumeric plus underscore,
case sensitive), for instance $alpha or $my_variable.
• Statements end in a semicolon.
• The text inserted in place of the ePerl block is the output of the commands inside
the block. This output is done in Perl with the print() function, but in ePerl there
is a shortcut of the form <:=expression:>.
• Mathematical functions sin, cos, tan, exp, atan2(y,x) are available, as in C;
powers x^y are expressed as x**y (i.e. Fortran-like).
4.3.2 Variables
You can define variables to use later in different places, and also in mathematical
expressions
<: $Reynolds = 1000; $L = 2; $rho = 1.345;
$velocity=3.54; $Grashof=6.e4; :>
...
mu <:=$rho*$velocity*$L/$Reynolds:>
Nusselt <:=(($Reynolds*$Grashof)**0.25):>
which results in
...
mu 0.0095226
Nusselt 88.0111736793393
4.3.3 Text expansion
It is common to have several lines of text that have to be repeated several times, for
instance some options that have to be applied to several elemsets. The trick is to assign
the text to a variable via the “here-in document” <<EOT feature and then insert it in the
appropriate places, for instance
<: $common_options = <<EOT;
option1
value1
option2
value 2
EOT
_:>
...
elemset type1 4
props
<:=$common_options:>
option3 value3
...
__END_ELEMSET__
elemset type2 3
props
<:=$common_options:>
option4 value4
...
__END_ELEMSET__
that expands to
...
elemset type1 4
props
option1
value1
option2
value 2
option3 value3
...
__END_ELEMSET__
elemset type2 3
props
option1
value1
option2
value 2
option4 value4
...
__END_ELEMSET__
Note the use of the underscore just before the :> terminator in the first ePerl block.
This tells ePerl not to include the semicolon terminator (see the ePerl documentation
for further details). The terminator EOT stands for “End Of Text” and may be replaced by
any similar string. It must appear on a line by itself at the end of the text to be included.
4.3.4 Conditional processing
ePerl allows conditional processing, as with the C preprocessor, with #if-#else-#endif
constructs as in
<:$method = "iterative"; :>
...
#if $method eq "iterative"
maxits 100
tolerance 1e-2
#else
maxits 1
tolerance 1e-10
#endif
...
expands to
...
maxits 100
tolerance 1e-2
...
Also, lines starting with #c are discarded as comments. Conditional preprocessing and
comments are enabled with the “-P” flag, so make sure you have this flag enabled
when preprocessing the .epl file (probably in the Makefile). Note that PETSc-FEM
comments (those starting with the numeral “#”) may collide with the ePerl preprocessing
directives, so when commenting out lines in PETSc-FEM input files it is safer to
leave a space between the “#” character and the commented text
# commented text     (OK)
#commented text      (but dangerous!)
4.3.5 File inclusion
In addition to the inclusion allowed in the internal preprocessor via the __INCLUDE__
directive, ePerl has its own inclusion directive, for instance
some text
...
#include /home/mstorti/PETSC/petscfem/doc/options.txt
...
another text
and provided file options.txt contains
# File options.txt
option1 value1
option2 value2
then the previous block expands to
some text
...
# File options.txt
option1 value1
option2 value2
...
another text
Including with the internal preprocessing directive __INCLUDE__ has the advantage of not
creating an intermediate file. On the other hand, including with the ePerl directive, allows
recursive ePerl preprocessing and more versatility in defining the inclusion path (with the
@INC list, see Perl documentation).
4.3.6 Use of ePerl in makefiles
Usually user data files have extension .dat. When preprocessing with ePerl the convention
is to use .epl suffix for the file written by the user with ePerl commands, i.e. the files to
be preprocessed, and suffix .depl for the preprocessed file. A line in the Makefile of the
form
%.depl: %.epl
	eperl -P $< > $@
ensures the translation when needed.
4.3.7 ePerlini library
Some useful constants and functions are found in the file eperlini.pl. This may be
included in the user data file with the following line
<:require ’eperlini.pl’:>// # Initializes ePerl
It includes a definition for $PI (=π), trigonometric and hyperbolic functions, and others.
A common mistake when using preprocessing packages like ePerl is to manually edit
the preprocessed .depl file, instead of editing the .epl file. In order to avoid this we
write-protect the .depl file; for instance, the section in the Makefile is replaced by
%.depl: %.epl
	if [ -e $@ ] ; then chmod +w $@ ; rm $@ ; fi
	eperl -P $< > $@
	chmod -w $@
In addition, inclusion of the eperlini.pl library inserts the following comment in the
included file
# DON’T EDIT MANUALLY THIS FILE !!!
# This files automatically generated by ePerl from
# the corresponding ‘.epl’ file.
4.3.8 Errors in ePerl processing
If the preprocessing stage with ePerl gives some error (on STDERR), preprocessing is
stopped and no ePerl output is given. For instance, if you include a directive like
<:=atanh(2.):>
the output looks like
ePerl:Error: Perl runtime error (interpreter rc=255)
---- Contents of STDERR channel: ---------
atanh: argument x must be |x| < 1.
------------------------------------------
In such a case you have to fix the ePerl commands prior to any further debugging of the
PETSc-FEM run.
4.4 General options
The following options apply to all the modules.
4.4.1 Read mesh options
These options are read by the read_mesh() routine
• int additional_iprops (default=0):
int additional properties (used by the element routine) (found in file: readmesh.cpp)
• int additional_props (default=0):
Additional properties (used by the element routine) (found in file: readmesh.cpp)
• int check_dofmap_id (default=0):
Checks that the idmap has been correctly generated. (found in file: readmesh.cpp)
• int debug_element_partitioning (default=0):
Prints element partitioning. (found in file: readmesh.cpp)
• int local_store (default=0):
Defines a “locker” for each element (found in file: readmesh.cpp)
• int max_partgraph_vertices (default=INF):
Maximum number of vertices admissible while computing the partitioning graph.
(found in file: readmesh.cpp)
• string partitioning_method (default=metis):
Set partitioning method. May be set to metis , hitchhiking , nearest_neighbor
or random . (found in file: readmesh.cpp)
• int print_dofmap_id (default=0):
Prints the dofmap idmap object. (found in file: readmesh.cpp)
• int print_hostnames (default=0):
Print hostnames for nodes participating in this run (found in file: readmesh.cpp)
• int print_partitioning_statistics (default=0):
Print graph statistics (found in file: readmesh.cpp)
4.4.2 Elemset options
These options are used in the Elemset class
• int chunk_size (default=ELEM_CHUNK_SIZE):
Chunk size for the elemset. (found in file: elemset.cpp)
• int debug_compute_prof (default=0):
Debug the process of building the matrix profile. (found in file: elemset.cpp)
• int element_weight (default=1):
Element weight for the processor (found in file: elemset.cpp)
• double epsilon_fdj (default=EPSILON_FDJ):
The increment in the variables in order to compute the finite difference approximation to the Jacobian. Should be order epsilon=sqrt(precision)*(typical magnitude of
the variable). Normally, precision=1e-15 so that epsilon=1e-7*(typical magnitude
of the variable) (found in file: elemset.cpp)
• int print_local_chunk_size (default=0):
Print the local chunk size used for each elemset in each processor for each chunk.
(found in file: elemset.cpp)
• int report_assembly_time (default=0):
Debug the process of building the matrix profile. (found in file: elemset.cpp)
• int report_consumed_time (default=0):
Report consumed time for the elemset. Useful for building the table of weights per
processor. (found in file: elemset.cpp)
• int report_consumed_time_stat (default=0):
Print statistics about time spent in communication and residual evaluation (found
in file: elemset.cpp)
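The finite-difference Jacobian that epsilon_fdj controls can be sketched as follows (an illustrative implementation under assumed names, not PETSc-FEM code): each column j is approximated by perturbing unknown j by eps and differencing the residual.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Sketch of the finite-difference Jacobian approximation:
//   J[i][j] ~ (R_i(u + eps*e_j) - R_i(u)) / eps,
// with eps = sqrt(machine precision) * (typical magnitude of the
// variable), i.e. roughly 1e-7 * |u| for double precision.
using Residual = std::function<std::vector<double>(const std::vector<double> &)>;

inline std::vector<std::vector<double>>
fd_jacobian(const Residual &R, std::vector<double> u, double eps) {
  std::vector<double> r0 = R(u); // unperturbed residual
  std::size_t n = u.size(), m = r0.size();
  std::vector<std::vector<double>> J(m, std::vector<double>(n));
  for (std::size_t j = 0; j < n; ++j) {
    u[j] += eps;                // perturb one unknown
    std::vector<double> r = R(u);
    u[j] -= eps;                // restore it
    for (std::size_t i = 0; i < m; ++i)
      J[i][j] = (r[i] - r0[i]) / eps;
  }
  return J;
}
```

Too small an eps amplifies round-off, too large an eps amplifies truncation error, which is why the option defaults to the sqrt(precision) compromise.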
4.4.3 PFMat/IISDMat class options
These options are used in the PFMat class
• int iisd_subpart (default=1):
Number of subpartitions inside each processor. (found in file: iisdcr.cpp)
• int iisd_subpart_auto (default=0):
Choose automatically the number of subdomains so as to have approximately this
number of unknowns per subdomain. (found in file: iisdcr.cpp)
• int iisdmat_print_statistics (default=0):
Print dof statistics, number of dofs local and interface in each processor. (found in
file: iisdcr.cpp)
• double interface_full_preco_fill (default=1.):
The ILU fill to be used for the A_II problem if the ILU preconditioning is used
(found in file: iisdcr.cpp)
• int interface_full_preco_maxits (default=5):
Number of iters in solving the preconditioning for the interface problem when using
use_interface_full_preco . (found in file: iisdcr.cpp)
• string interface_full_preco_pc (default=jacobi):
Defines the preconditioning to be used for the solution of the diagonal interface
problem (not the Schur problem) (found in file: iisdcr.cpp)
• double interface_full_preco_relax_factor (default=1.):
The problem on the interface is solved with the Richardson method with few iterations
(normally 5). Richardson iteration may not converge in some cases and then we can
help convergence using a relaxation factor <1 (found in file: iisdcr.cpp)
• string local_solver (default=PETSc):
Chooses the local solver (may be ”PETSc” or ”SuperLU”) (found in file:
iisdcr.cpp)
• int max_partgraph_vertices_proc (default=INF):
The maximum number of vertices in the coarse mesh for sub-partitioning the dof
graph in the IISD matrix. (found in file: iisdcr.cpp)
• double pc_lu_fill (default=5.):
PETSc parameter related to the efficiency in growing the factored profile. (found in
file: iisdcr.cpp)
• int print_Schur_matrix (default=0):
Print the Schur matrix (don’t try this for big problems). (found in file: iisdcr.cpp)
• int print_interface_full_preco_conv (default=0):
Flags whether or not print the convergence when solving the preconditioning for
the interface problem when using use_interface_full_preco . (found in file:
iisdcr.cpp)
• int use_interface_full_preco (default=0):
Chooses the preconditioning operator. (found in file: iisdcr.cpp)
• int use_interface_full_preco_nlay (default=1):
Number of layers in the preconditioning band (should be nlay>=1 .) (found in file:
iisdcr.cpp)
• int asm_lblocks (default=1):
Chooses the number of local blocks in ASM (found in file: )
• int asm_overlap (default=1):
Chooses the overlap of blocks in ASM (found in file: )
• string asm_sub_ksp_type (default=preonly):
Chooses the preconditioning for block problems in ASM method. (found in file: )
• string asm_sub_preco_type (default=ilu):
Chooses the preconditioning for block problems in ASM method. (found in file: )
• string asm_type (default=restrict):
Chooses the restriction/extension type in ASM (found in file: )
• double atol (default=1e-6):
Absolute tolerance to solve the monolithic linear system (Newton linear subiteration). (found in file: )
• int compact_profile_graph_chunk_size (default=0):
Size of the chunk for the dynamic vector used in computing the matrix profile.
(found in file: )
• double dtol (default=1e+3):
Divergence tolerance to solve the monolithic linear system (Newton linear subiteration). (found in file: )
• string gmres_orthogonalization (default=modified_gram_schmidt):
Orthogonalization method used in conjunction with GMRES. May be
unmodified_gram_schmidt, modified_gram_schmidt or ir_orthog (default)
(iterative refinement). See the PETSc documentation. (found in file: )
• int Krylov_dim (default=50):
Krylov space dimension in solving the monolithic linear system (Newton linear subiteration) by GMRES. (found in file: )
• string KSP_method (default=gmres):
Defines the KSP method (found in file: )
• int maxits (default=Krylov_dim):
Maximum iteration number in solving the monolithic linear system (Newton linear
subiteration). (found in file: )
• string preco_side (default=<ksp-dependent>):
Uses right or left preconditioning. Default is right for GMRES. (found in file: )
• string preco_type (default=jacobi):
Chooses the preconditioning operator. (found in file: )
• int print_fsm_transition_info (default=0):
Print Finite State Machine transitions for PFPETScMat matrices. 1: print imme-
diately, 2: gather events (non-immediate printing). (found in file: )
• int print_internal_loop_conv (default=0):
Prints convergence in the solution of the GMRES iteration. (found in file: )
• double rtol (default=1e-3):
Relative tolerance to solve the monolithic linear system (Newton linear subiteration).
(found in file: )
• int use_compact_profile (default=LINK_GRAPH):
Chooses the representation of the profile graph. Possible values are: 0) adjacency
graph classes based on STL map+set; demands too much memory, CPU time OK. 1) based
on a dynamic vector of pairs of indices with resorting; demands too much CPU time,
RAM is OK. 2) for each vertex we keep a linked list of cells containing the adjacent
nodes; each insertion is O(m^2) where m is the average number of adjacent vertices.
This seems to be optimal for FEM connectivities. (found in file: )
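The linked-list representation (option 2 above) can be sketched as follows (AdjGraph is a hypothetical illustration, not the actual PETSc-FEM graph class):

```cpp
#include <cstddef>
#include <forward_list>
#include <iterator>
#include <vector>

// Sketch of representation 2: for each vertex, a linked list of
// adjacent nodes. insert() scans the list to reject duplicates, so
// inserting the m neighbors of a vertex costs O(m^2) comparisons,
// which is cheap for the small m typical of FEM connectivities.
class AdjGraph {
public:
  explicit AdjGraph(std::size_t nvert) : adj(nvert) {}

  void add_edge(int u, int v) { insert(u, v); insert(v, u); }

  std::size_t degree(int u) const {
    return std::distance(adj[u].begin(), adj[u].end());
  }

private:
  void insert(int u, int v) {
    for (int w : adj[u])
      if (w == v) return; // already adjacent: keep the list duplicate-free
    adj[u].push_front(v);
  }
  std::vector<std::forward_list<int>> adj;
};
```

Unlike the map+set variant, each adjacency cell costs only a value plus one pointer, which is where the memory savings come from.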
4.5 Emacs tools and tips for editing data files
GNU Emacs (http://www.gnu.org/software/emacs/) is a powerful text editor that
can colorize and indent text files according to the syntax of the language you are editing.
Emacs has “modes” for most languages (C/C++, Pascal, Fortran, Lisp, Perl, ...). We have
written a basic mode for PETSc-FEM data files that is distributed with PETSc-FEM in
the tools/petscfem.el Emacs Lisp file. Another mode that can serve for colorization is
the “shell-script” mode. In order to associate the .epl or .depl extensions to this mode,
add this to your ~/.emacs file
(setq auto-mode-alist
(cons ’("\\.\\(\\|d\\)epl$" . shell-script-mode) auto-mode-alist))
4.5.1 Installing PETSc-FEM mode
In order to use the mode, you have to copy the file tools/petscfem.el to somewhere
accessible to Emacs (/usr/share/emacs/site-lisp is a good candidate). There is also
a file tools/petscfem-init.el that contains some basic configuration; you can also copy
it, or insert its contents directly into your .emacs.
;; Load basic PETSc-FEM mode
(load-file "/path/to/petscfem/tools/petscfem.el")
;; Load ‘petscfem-init’ or insert directly its contents
;; below and configure
(load-file "/path/to/petscfem/tools/petscfem-init.el")
4.5.2 Copying and pasting PETSc-FEM options
PETSc-FEM modules have a lot of options, so fast access to all of them and to their
documentation is essential. The HTML documentation has a list of all of them at
the end of the user’s guide. For easier access there is also an info file (doc/options.info)
that has a page for each option. You can browse it with the standalone GNU
Info program or within Emacs with the info commands. In the latter case you have the
additional advantage that you can very easily find the documentation for a given option
with a couple of keystrokes and paste options from the manual into your PETSc-FEM
data file.
For jumping to the documentation for an option, put the cursor on the option and press
C-h C-i (that is <Ctrl-h><Ctrl-i>, this is the key-binding for info-lookup-symbol).
You will get in the minibuffer something like
Describe symbol (default print_internal_loop_conv):
If you press RET then you jump to the corresponding page in the info file. From there you
can browse the whole manual. Pressing s (Info-search) allows you to search for text in the
whole manual. When done, you can press q (Info-exit) or x (my-Info-bury-and-kill,
defined in tools/petscfem.el).
If you don’t know exactly what option you are looking for, then you can search in
the manual or launch info-lookup-symbol with the start of the command and then use
“completion” to finish writing the option.
If you want to copy some option from the info manual to the data file, then you can
use the usual keyboard or mouse copy-and-paste methods of Emacs. Also pressing c
(my-Info-lookup-copy-keyword) in the info manual copies the name of the currently
visited option to the “kill-ring”. Then, in the data file buffer press, as usual, C-y (yank)
to paste the last killed option. If you paste several options from the manual, then you can
navigate between them by pasting the last with C-y and then going back and forth in the
kill ring with M-y (yank-pop; usually M- stands for pressing the <Alt> or <Escape> key).
For more information, see the Emacs manual, especially the documentation for the info
and info-lookup modes.
5 Text hash tables
In many cases, options are passed from the user data file to the program in the form
of small databases which are called “text hash tables”. These consist of several lines,
each one starting with a keyword and followed by one or several values. When reading
this data in the read_mesh() routine, the program doesn’t know anything about either
the keywords or the appropriate values for these keywords. Only when the element is
processed by the element module via the assemble() function call, for instance when
assembling residuals or matrices, does the element routine written by the application
writer read values from this database and apply them to the elemset. The text hash table
is stored internally as a correspondence between the keys and the values, both in the form
of strings. The key is the first (space delimited) token and the value is the rest of the
string, from the end of the space separator to the end of the line. Usually the values are
strings like a “C” identifier (chunk_size for instance). As the values are stored in the
form of strings, almost any kind of value may be stored there, for instance
non_linear_method Newton   # string value
max_iter 10                # integer value
viscosity 1.2e-4           # double value
or lists and combinations of them. It’s up to the application writer to decode these values
at the element routine level. Several routines in the getprop package help in getting them
(see §5.4).
Some of these options are for internal use of the PETSc-FEM code; for instance
chunk_size sets the number of elements that are processed by the elemset routine at
a time.
In section §5.1, some specific properties of the elemset properties text hash table are
described.
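The key/value convention just described can be sketched as follows (a simplified stand-in for the actual TextHashTable class):

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Simplified stand-in for the text hash table reader: the key is the
// first space-delimited token, the value is the rest of the line, and
// both are kept as strings without any syntax checking.
inline std::map<std::string, std::string> read_text_hash(std::istream &in) {
  std::map<std::string, std::string> table;
  std::string line;
  while (std::getline(in, line) && line != "__END_HASH__") {
    std::string::size_type sp = line.find(' ');
    if (sp == std::string::npos) continue; // no value: skipped (simplification)
    table[line.substr(0, sp)] = line.substr(sp + 1);
  }
  return table;
}
```

Because the value is "everything after the first space", multi-word values such as a name come through unchanged.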
5.1 The elemset hash table of properties
A very common problem in FEM codes is how to pass element physical or numerical
properties (conductivities, viscosities, etc.) to the element routine. In PETSc-FEM
you can pass arbitrary “per elemset” quantities via an “elemset properties hash table”.
This comes after the elemset header as in the following example
elemset nsi_tet 4 fat 4   # elemset header
props                     # per element properties (to be explained later)
#
# Elemset properties hash table
#
name My Mesh
geometry cartesian2d
ndim 2
npg 4
couple_velocity 0
weak_form 1
# physical properties
Dt .1                     # time step
viscosity 1.
# next line flags the end of the properties hash table
__END_HASH__
# element connectivities
1 3 4 2
3 5 6 4
5 7 8 6
...
< more element connectivities here >
...
193 195 196 194
195 197 198 196
197 199 200 198
199 201 202 200
# next line flags the end of elemset connectivities
__END_ELEMSET__
In this example we define several properties, containing doubles (viscosity and Dt),
integers (ndim, npg, etc.) or strings (name). This hash table is stored in the form of
an associative array (“hash”) with a text key (the first non-blank token) and a value
composed of the rest of the line. In the previous example
Key: "name" → Value: "My Mesh"
Key: "ndim" → Value: "2"
Key: "viscosity" → Value: "1."
and so on. This hash table is read into an object of type “TextHashTable” without checking
whether these properties apply to the particular elemset or not. The values are stored
strictly as strings, so no check is performed on the syntax of entered double or integer
values.
5.2 Text hash table inclusion
Often, we have some physical or numerical parameter that is common to a set of elemsets,
for instance gravity in shallow water, or viscosity in Navier-Stokes. In this case, these
common properties can be put in a separate table with a table ... header, and included
in the elemsets with _table_include directives. The table ... sections (not associated
to a particular elemset) should preferably be placed before the elemset that references
them, for instance
table steel_properties
density 13.2
viscosity 0.001
<more steel properties here ...>
__END_HASH__
elemset nsi_tet 4
_table_include steel_properties
npg 8
weak_form 1
<more properties for this elemset here ...>
__END_HASH__
1 3 4 2
<more element connectivities here ...>
Text hash tables may be recursively included to any depth. The text hash table of an
elemset may be included in another elemset by referencing it by its name entry, for instance
elemset nsi_tet 4
name volume_elemset_1
geometry cartesian2d
npg 4
viscosity 0.23
<more properties here ...>
__END_HASH__
1 3 4 2
<more connectivities here ...>
__END_ELEMSET__
elemset nsi_tet 4
name volume_elemset_2
_table_include volume_elemset_1 # includes properties from the
# previous elemset
__END_HASH__
5 4 76 45
<more connectivities here ...>
__END_ELEMSET__
5.3 The global table
If one table is called global_options then all other hash tables inherit its properties,
without needing to include it explicitly. For instance, in this case
table global_options
viscosity 0.023
tau_fac 0.5
__END_HASH__
elemset nsi_tet 4
name elemset_1
__END_HASH__
5 4 76 45
<more connectivities here ...>
__END_ELEMSET__
the elemset elemset_1 gets the value 0.023 for viscosity from the global hash table. The
global_options table may include other tables as well.
Note: In previous versions of the code, the table keyword could be omitted for the
global_options table. For instance, the previous example could be entered as
global_options
viscosity 0.023
tau_fac 0.5
__END_HASH__
This usage is obsolete and will be removed in a future version.
5.4 Reading strings directly from the hash table
Once inside the element routine, values can be retrieved with routines from the
TextHashTable class, typically get_entry, for instance

char *geom;
thash->get_entry("geometry",geom);

This returns in "geom" a pointer to the internal value string (this is documented with the
TextHashTable class). You can then read values from it with routines from stdio (sscanf
and friends). You should not try to modify these strings, for instance with the strtok()
function. In that case, copy the string into a fresh one (and remember to delete it afterwards
to avoid a memory leak). In this way you can pass arbitrary information (strings, integers,
doubles) to the element routine.
5.5 Reading with ‘get int’ and ‘get double’
As element properties are, most of the time, either of integer or double type, two specific
routines are provided, "get_int" and "get_double", for instance

ierr = get_int(thash,"npg",&npg);

where the integer value is directly returned in "npg". You can also specify a default value
and read several values at once.
5.6 Per element properties table
Many applications need a mechanism for storing values per element, for instance when
dealing with physical properties that vary spatially in a continuous way. If the
property is piecewise constant, you can define an elemset for each constant patch,
but if it varies continuously you would need an elemset for each element, which
conspires against efficiency. We provide a mechanism to pass "per element" values to the
element routine. At the moment this is only possible for doubles. These properties are
specified on the same line as the connectivities, for instance
elemset nsi_tet 4
# elemset header
props
cond cp emiss
# name of properties to be defined "per element"
# Elemset properties hash table
geometry cartesian2d
ndim 2
npg 4
# physical properties
Dt .1
# time step
viscosity 1.
# the next line flags the end of the properties hash table
__END_HASH__
# element connectivities, physical properties per element
1 3 4 2 1.1 2.3 0.7
3 5 6 4 1.2 2.1 0.8
5 7 8 6 1.3 2.2 0.9
...
< more element connectivities and physical props. here >
...
193 195 196 194 1.5 2.0 0.8
195 197 198 196 1.6 2.1 0.7
197 199 200 198 1.7 2.2 0.6
199 201 202 200 1.8 2.3 0.5
# the next line flags the end of the elemset connectivities
__END_ELEMSET__
Here we define that properties "cond", "cp" and "emiss" are to be defined per element,
and we add the three properties to each element connectivity line. There are two ways to
access these properties. The corresponding values are stored in an array "elemprops"
of length nprops × nelem, where nprops is the number of "per element" properties (3 in
this example) and nelem is the number of elements in the mesh. Also, there is a macro
"ELEMPROPS(k,iprop)" that allows treating it as a matrix. So, you can access these values
with

cond  = ELEMPROPS(k,0);   // conductivity of element k
cp    = ELEMPROPS(k,1);   // specific heat of element k
emiss = ELEMPROPS(k,2);   // emissivity of element k
5.7 Taking values transparently from hash table or per element table
With the tools described so far you can access, on one hand, constant properties (the same
for the whole elemset) and, on the other, per-element properties. Now, given a property, you
have to decide whether it should be assumed constant over the whole elemset or whether it should
be taken "per element". The second is the more general case, but taking all the possible
properties as "per element" may consume too much core memory. There is then a
mechanism that allows the application writer to get physical properties without worrying about
whether the user has set them in the properties hash table or in the per-element properties
table.
First, the application writer reserves an array of doubles large enough to contain
all the needed properties. (This doesn't scale with mesh size, so you can be generous
here, or else use dynamic memory allocation.) Before entering the element
loop, the macro "DEFPROP(prop_name)" determines whether the property has been
passed by one or the other of the mechanisms. This information is stored in an
integer vector "elprpsindx[MAXPROP]". It also assigns a position in the array "propel",
so that "propel[prop_name_indx]" contains the given property. Then, once inside
the element loop, a call to the function "load_props" loads the appropriate values into
"propel[MAXPROP]", independently of how they have been defined. A typical call sequence
is as follows
// Maximum number of properties to be loaded via load_prop
#define MAXPROP 100
int iprop=0, elprpsindx[MAXPROP]; double propel[MAXPROP];
// determine which mechanism passes ‘conductivity’
DEFPROP(conductivity)
// conductivity is found (after calling load_props()) in
// propel[conductivity_indx]
#define COND (propel[conductivity_indx])
// Other properties
DEFPROP(propa)
#define PROPA (propel[propa_indx])
DEFPROP(propb)
#define PROPB (propel[propb_indx])
DEFPROP(propc)
#define PROPC (propel[propc_indx])
DEFPROP(propp)
#define PROPP (propel[propp_indx])
DEFPROP(propq)
#define PROPQ (propel[propq_indx])
// Total number of properties loaded
int nprops=iprop;
// Set error if maximum number of properties exceeded
assert(nprops<=MAXPROP);
// ... code ...
// loop over elements
for (int k=el_start; k<=el_last; k++) {
// check if this element is to be processed
if (!compute_this_elem(k,this,myrank,iter_mode)) continue;
// Load properties either from properties hash table or from
// per element properties table
load_props(propel,elprpsindx,nprops,&(ELEMPROPS(k,0)));
// ... more code ...
// use physical element property
veccontr += wpgdet * COND * dshapex.t() * dshapex;
// ...
First, we allocate 100 entries for the "elprpsindx" and "propel" arrays, and set the
counter "iprop" to 0. Then we call "DEFPROP" for properties "conductivity" and "propa"
through "propq". After this, we check that the maximum number of properties to be defined
is not exceeded and enter the element loop. After checking, as usual, whether the element needs
processing, we call "load_props" in order to effectively load the element properties into propel;
after this, we can use them as "propel[conductivity_indx]" and so on. The macro
shortcuts "COND", "PROPA", etc. are handy for this.
6 The general advective elemset
6.1 Introduction to advective systems of equations
Advective systems of equations are of the form

\[ \frac{\partial U}{\partial t} + \frac{\partial F_i(U)}{\partial x_i} = G \qquad (2) \]
where U is the "state vector" of the fluid in "conservative variables". Examples of these
are the inviscid fluid equations (the "Euler" or "gas dynamics" equations), the "shallow
water equations", and scalar advective systems that represent the transport of a scalar
property (like temperature or concentration of a component) by a moving fluid.
The conservative variables for the Euler equations are

\[ U = \begin{pmatrix} \rho \\ \rho u \\ \rho e \end{pmatrix} \qquad (3) \]

In general, U is a vector of ndof components. F_i(U) is the
"flux vector". It can be thought of as a matrix of ndof × ndim components. The row index
corresponds to a field value, whereas the column index is a spatial dimension. G is a
source vector. For the 2D or 3D Euler equations it is null, but if we consider 1D flow in a
tube of varying section, then it has a source term in the momentum equation, due to the
reaction on the wall. Also, for the shallow water equations, there is a reaction term in the
momentum balance equations if there is a varying bathymetry.
The relation of the flux vector with the state vector is the heart of the advective system.
In fact, the discretization of advective systems may be put in a completely abstract setting,
where the unique thing that varies from one system to another is the definition of the flux
function itself. The discretization of advective systems in PETSc-FEM has been done in
this way, so that it is easy to add other advective systems by only adding the new flux
function.
Applying the chain rule, and noting that the fluxes depend on position only through
their dependence on the state vector, we arrive at

\[ \frac{\partial F_i}{\partial x_i} = A_i \frac{\partial U}{\partial x_i} \qquad (4) \]

where

\[ A_i = \frac{\partial F_i}{\partial U} \qquad (5) \]
are the “jacobians of the advective fluxes”.
6.2 Discretization of advective systems
Using the Finite Element Method, with weight functions W_j(x) and interpolation functions
N_j(x), results in

\[ M\dot{U} + F(U) = G \qquad (6) \]
where

\[ U = \begin{pmatrix} U_1 \\ U_2 \\ \vdots \\ U_{N_{nod}} \end{pmatrix} \qquad (7) \]
so that U has Nd.o.f. = ndof × Nnod components.
M (of dimension N_{d.o.f.} × N_{d.o.f.}) is the mass matrix. If we look at it as an N_{nod} × N_{nod}
block matrix with blocks of size ndof × ndof, then the i,j block is

\[ M_{ij} = \int_\Omega N_i(x)\, N_j(x)\, dx \qquad (8) \]
The i-th blocks of the global flux and source vectors F and G are

\[ F_i = \int_\Omega N_i(x)\, \frac{\partial F_k}{\partial x_k}\, dx, \qquad
   G_i = \int_\Omega N_i(x)\, G(x)\, dx \qquad (9) \]
If the flux vector term is integrated by parts, then we have the "weak form"

\[ F_i = -\int_\Omega \frac{\partial N_i}{\partial x_k}\, F_k\, dx
        + \int_\Gamma N_i\, n_k F_k\, d\Gamma \qquad (10) \]
where Γ is the boundary of Ω. This formulation is the "Galerkin" or "centered" one. It is
equivalent to approximating first derivatives by centered differences in the Finite Difference
Method. It is well known that the Galerkin formulation leads to oscillations for advective
systems; this is solved by adding a "stabilizing term" to the discretized equations.
6.3 SUPG stabilization
In the SUPG ("Streamline Upwind/Petrov-Galerkin") formulation of Hughes et al., the
stabilized formulation is

\[ (M\dot{U} + F(U) - G)_j
   + \sum_e \int_{\Omega^e} (P_{SUPG})^e_j
     \left( \frac{\partial U}{\partial t}
          + \frac{\partial F_i}{\partial x_i} - G \right) dx = 0 \qquad (11) \]
where the whole expression corresponds to the j-th block of size ndof in the global equations.
Note that, as the added term is a "weighted residual" form of the residual (the term
in parentheses), the continuum solution is also a solution of these discrete equations – we
say that this is a "consistent formulation". P_{SUPG} is an ndof × ndof matrix, the SUPG
"perturbation function", usually defined as

\[ (P_{SUPG})^e_j = \tau^e A_i \frac{\partial N_j}{\partial x_i} \qquad (12) \]
where \tau^e is the "characteristic" or "intrinsic time" of the element, defined as

\[ \tau^e = \frac{h^e}{\|A\|} \qquad (13) \]
where h^e is the size of the element and \|A\| represents some norm of the vector of
jacobians. There is a variety of possibilities for computing both quantities. For instance,
h^e may be computed as the largest side of the element, or as the radius of the circle with
the same area. On the other hand, \|A\| may be computed as the maximum eigenvalue
over all the linear combinations of the form n_j A_j, with n_j a unit vector, i.e. the maximum
possible propagation velocity in the fluid, that is

\[ \|A\| = \max_{j=1,\dots,n_{dim}} \;\max_{k=1,\dots,n_{dof}} |\lambda_{jk}| \qquad (14) \]
where \lambda_{jk} is the k-th eigenvalue of the jacobian A_j. For the Euler equations, it
turns out that this corresponds to pressure waves propagating in the direction of the fluid and is
c + u, where c is the speed of sound and u the absolute value of the velocity. For the shallow
water equations, its value is u + \sqrt{gh}, where g is the gravity acceleration and h the local
water elevation with respect to the bottom.
6.4 Shock capturing
For problems with strong shocks (shock waves in Euler, or hydraulic jumps in shallow
water), the standard SUPG stabilizing term may not be sufficient. Then an additional
stabilizing term is added, so that the stabilized equations are now of the form
\[ (M\dot{U} + F(U) - G)_j
   + \sum_e \int_{\Omega^e} (P_{SUPG})^e_j
     \left( \frac{\partial U}{\partial t}
          + \frac{\partial F_i}{\partial x_i} - G \right) dx
   + \sum_e \int_{\Omega^e} \delta_{sc}\,
     \frac{\partial N_j}{\partial x_i}\frac{\partial U}{\partial x_i}\, dx = 0 \qquad (15) \]
Note that, in contrast with the SUPG term, the new, so-called shock capturing term is
no longer "consistent". \delta_{sc} is a scalar – the so-called "shock capturing parameter". Often,
when shock capturing is added, the amount of stabilization in the SUPG term is diminished
in order to compensate and not have an over-diffusive scheme. We will not go into the
details of these computations; refer to [2] for further details.
6.5 Creating a new advective system
New advective systems may be added to PETSc-FEM by defining only their flux function,
jacobians and other related quantities. This means that you don't need to code the details of the
numerical discretization. Follow these steps
1. Create the flux function in a file by itself, in the applications/advective
directory. (The best way is to start by copying one of the existing advective
systems, for instance ffeuler.cpp or ffshallw.cpp.) The arguments to flux
function routines are described in section 6.6. The name of the function has
to be of the form flux_fun_<system> where <system> identifies the new system.
We assume that you write the flux function flux_fun_new_adv_sys in the file
applications/advective/ffnadvs.cpp.
2. Add the file to the MYOBJS variable in the Makefile, for instance
MYOBJS = advective.o adv.o absorb.o ffeuler.o \
ffshallw.o ffadvec.o ffnadvs.o
3. Define the new derived classes volume_new_adv_sys and absorb_new_adv_sys by
adding a line at the end of the file applications/advective/advective.h as in
the following example.
// Add here declarations for further advective elemsets.
/// Euler equations for inviscid (Gas dynamics eqs.)
ADVECTIVE_ELEMSET(euler);
/// Shallow water equations.
ADVECTIVE_ELEMSET(shallow);
/// Advection of multiple scalar fields with a velocity field.
ADVECTIVE_ELEMSET(advec);
/// My new advective system
ADVECTIVE_ELEMSET(new_adv_sys);        // <- Add this line.
//
4. Recompile.
6.6 Flux function routine arguments
Currently, the interface is the following
typedef int FluxFunction(const RowVector &U,
int ndim,const Matrix &iJaco,
Matrix &H, Matrix &grad_H,
Matrix &flux,vector<Matrix *> A_jac,
Matrix &A_grad_U, Matrix &grad_U,
Matrix &G_source,Matrix &tau_supg, double &delta_sc,
double &lam_max,
TextHashTable *thash,double *propel,
void *user_data,int options)
(This may eventually change – in any case, if you are interested in adding a new advective
system, see the actual description in the advective.h file in the distribution.) The
meaning of these arguments is listed below. When the size is specified, it means that the
argument is a Newmat or FastMat matrix.
In some situations the flux function routine must compute only some of the required
values. For instance, when computing the contribution of the absorbing boundary elements
there is no need to compute the parameters regarding the stabilizing terms. This is
controlled with the parameter options, which can take the values DEFAULT, COMP_UPWIND
and COMP_SOURCE. In the list below, it is indicated under which conditions each specific
quantity must be computed.
• const RowVector &U (input, size ndof ) This is the state vector – you must return
the flux, jacobians and other quantities for this state vector.
• int ndim (input) The dimension of the space.
• const Matrix &iJaco (input, size ndim × ndim ) The jacobian of the master-to-element
coordinate mapping at the current Gauss point. This may be used in order to
calculate the characteristic size of the element.
• Matrix &H (input, size 1 × nH ) In the shallow water equations, the source term G depends
on the gradient of the depth H(x), and in the 1D Euler equations, on the area of
the tube section A(x). This is taken into account by PETSc-FEM by assuming
that the user enters in the nodedata section nu = ndim + nH quantities per node,
where the first ndim quantities are the node coordinates and the rest are assumed
to be nodal data that have to be passed to the flux function routine (together
with their gradient) in order to compute the source term.
• Matrix &grad_H (input, size ndim × nH ) the gradient of the quantities in H (see
previous entry).
• Matrix &flux (output, size ndof × ndim ) Each column is the vector Fj of fluxes for
each of the governing equations.
• vector<Matrix *> A_jac (output, size: a vector of ndim pointers to matrices of
ndof × ndof ). Each matrix is the jacobian matrix as defined by (5). To access the
jd jacobian matrix you may write (*A_jac[(jd)-1]). The macro AJAC(jd),
defined in advective.h expands to this.
• Matrix &A_grad_U (output, size ndof × 1, compute if options & COMP_UPWIND)
This is the accumulation term A_j (∂U/∂x_j) = ∂F_j/∂x_j.
• Matrix &grad_U (input, size ndim × ndof ) The gradient of the state vector.
• Matrix &tau_supg (output, size: either 1 × 1 or ndof × ndof , compute if
options & COMP_UPWIND). This is the τ intrinsic time scale – it may be either a
scalar or a matrix. Beware that even in the case where it is a scalar it must be
returned as a Newmat Matrix object of dimensions 1 × 1.
• double &delta_sc (output, double, compute if options & COMP_UPWIND) This is
the “shock capturing” parameter as described in 6.4. Set to 0 if no shock capturing
is added.
• double &lam_max (output, double, compute if options & COMP_UPWIND) The
maximum propagation speed. The expression is (14) but normally it may be
computed directly from the state vector. This is used to compute upwind, and also
the automatic and local time step.
• TextHashTable *thash (input) This is the TextHashTable of the elemset.
Physical and numerical parameters can be passed from the user input data file to
the flux function routine through this (specific heat, specific heat ratio, gravity...
for instance). Beware that the flux function routine is called at each Gauss point,
and decoding the table may be time-consuming, so if the properties are
constant over the whole mesh you can decode them once and keep the decoded data
in static variables.
• double *propel (input, double array, compute if options & COMP_UPWIND) This
is the table of per-element properties. Physical and numerical parameters that are
not constant over the elemset can be passed from the user input data file to the
flux function routine through this.
• void *user_data (input). Arbitrary information may be passed from the main
routine to the flux function through this pointer.
• int options (input, integer).
• Matrix &G_source (output, size ndof × 1, compute if options & COMP_SOURCE)
The source vector.
6.6.1 Options
General options:
• double Courant (default=0.6):
The Courant number. (found in file: adv.cpp)
• double Dt (default=0.):
Time step. (found in file: adv.cpp)
• double atol (default=1e-6):
Absolute tolerance when solving a consistent matrix (found in file: adv.cpp)
• int auto_time_step (default=1):
Chooses automatically the time step from the selected Courant number (found
in file: adv.cpp)
• int consistent_supg_matrix (default=0):
Uses consistent SUPG matrix for the temporal term or not. (found in file:
adv.cpp)
• double dtol (default=1e+3):
Divergence tolerance when solving a consistent matrix (found in file: adv.cpp)
• int local_time_step (default=1):
Chooses a time step that varies locally. (Only makes sense when looking for
steady state solutions. (found in file: adv.cpp)
• int maxits (default=150):
Maximum iterations when solving a consistent matrix (found in file: adv.cpp)
• int measure_performance (default=0):
Measure performance of the comp_mat_res jobinfo. (found in file: adv.cpp)
• int nfile (default=1):
Sets the number of files in the “rotary save” mechanism. (see 7.2) (found in
file: adv.cpp)
• int nrec (default=1000000):
Sets the number of states saved in a given file in the “rotary save” mechanism.
(see 7.2) (found in file: adv.cpp)
• int nsave (default=10):
Save state vector frequency (in steps) (found in file: adv.cpp)
• int nsaverot (default=100):
Save state vector frequency with the “rotary save” mechanism. (see 7.2) (found
in file: adv.cpp)
• int nstep (default=10000):
The number of time steps. (found in file: adv.cpp)
• int nstep_cpu_stat (default=10):
Frequency (in time steps) for output of CPU time statistics. (found in file: adv.cpp)
• int print_internal_loop_conv (default=0):
Prints the convergence history when solving a consistent matrix (found in file:
adv.cpp)
• int print_linear_system_and_stop (default=0):
After computing the linear system, prints the Jacobian and right hand side and
stops. (found in file: adv.cpp)
• double rtol (default=1e-3):
Relative tolerance when solving a consistent matrix (found in file: adv.cpp)
• string save_file (default=outvector.out):
Filename for saving the state vector. (found in file: adv.cpp)
• string save_file_pattern (default=outvector%d.out):
The pattern to generate the file name to save in for the rotary save mechanism.
(found in file: adv.cpp)
• double start_time (default=0.):
Counts time from here. (found in file: adv.cpp)
• double tol_mass (default=1e-3):
Tolerance when solving with the mass matrix. (found in file: adv.cpp)
Generic elemset “advecfm2”:
• double beta_supg (default=0.8):
Parameter controlling the amount of SUPG perturbation added to the mass
matrix: beta_supg=0 implies consistent Galerkin and beta_supg=1 implies full
consistent SUPG. (found in file: advecfm2.cpp)
• string geometry (default=cartesian2d):
Type of element geometry to define Gauss Point data (found in file:
advecfm2.cpp)
• int lumped_mass (default=1):
Use lumped mass. (found in file: advecfm2.cpp)
• int weak_form (default=1):
Use the weak form for the Galerkin part of the advective term. (found in file:
advecfm2.cpp)
Flux function “ffeulerfm2”: Euler eqs.
• double gamma (default=1.4):
The specific heat ratio. (found in file: ffeulerfm2.cpp)
• int shock_capturing (default=0):
Add shock-capturing term. (found in file: ffeulerfm2.cpp)
• double shock_capturing_threshold (default=0.1):
Add shock-capturing term if relative variation of variables inside the element
exceeds this. (found in file: ffeulerfm2.cpp)
• double tau_fac (default=1.):
Scale the SUPG upwind term. (found in file: ffeulerfm2.cpp)
Flux function “ffswfm2”: Shallow water eqs.
• double gravity (default=1.):
Acceleration of gravity (found in file: ffswfm2.cpp)
• int shock_capturing (default=0):
Add shock-capturing term. (found in file: ffswfm2.cpp)
• double shock_capturing_threshold (default=0.1):
Add shock-capturing term if relative variation of variables inside the element
exceeds this. (found in file: ffswfm2.cpp)
• double tau_fac (default=1.):
Scale the SUPG upwind term. (found in file: ffswfm2.cpp)
6.7 The hydrology module
Figure 2: Aquifer/stream system.
Figure 3: Aquifer/stream system. Transverse 2D view.
This module solves the problem of subsurface flow in a free aquifer, coupled with a
surface net of 1D streams. To model such a system, three elemsets must be used: an aquifer
Figure 4: Aquifer/stream system. Discretization.
system representing the subsurface aquifer, a stream elemset representing the 1D stream
and a stream_loss elemset representing the losses from the stream to the aquifer (or vice
versa); see figures 2 and 3.
The aquifer elemset is a 2D elemset with triangle or quadrangle elements (see figure 4).
A per-element property eta represents the height of the aquifer bottom to a given datum.
The corresponding unknown for each node is the piezometric height or the level of the
freatic surface at that point φ. On the other hand, the stream elemset represents a 1D
stream of water. It has its own nodes, separate from the aquifer nodes, whose coordinates
must coincide with some corresponding node in the aquifer. For instance, the triangular
aquifer element in the figure is connected to nodes n1, n2 and n3, while the stream element
is connected to nodes n4 and n5. n1 and n5 have the same coordinates (but different
unknowns), and also n2 and n4. A constant node field (a so-called “H-field”) represents
the stream bottom height h_b, with reference to the datum. So, normally, we have for
each node two coordinates and the stream bottom height (ndim=2 nu=3 ndof=1). The
unknown for these nodes is the height u of the stream free water surface with reference to
the stream bottom. The channel shape, and the friction model and coefficients, are entered via
properties described below. If the stream level is above the freatic aquifer level (h_b + u > φ)
then the stream loses water to the aquifer, and vice versa.
The equation for the aquifer, integrated in the vertical direction, is

\[ \frac{\partial}{\partial t}\bigl( S(\phi-\eta)\,\phi \bigr)
   = \nabla\cdot\bigl( K(\phi-\eta)\,\nabla\phi \bigr) + \sum G_a \qquad (16) \]

where S is the storativity, K is the hydraulic conductivity, and G_a is a source term due to
rain, losses from streams or other aquifers.
The equation for the stream, according to the “Kinematic Wave Model” (KWM) approach, is

\[ \frac{\partial A(u)}{\partial t} + \frac{\partial Q(A(u))}{\partial s} = G_s \qquad (17) \]

where u is the unknown field that represents the height of the water in the channel with
respect to the channel bottom, as a function of time and a linear arc coordinate s along the
stream; A is the transverse cross section of the stream and depends, through the geometry
of the channel, on the channel water height u. Q is the flow rate and, under the KWM
model, is a function of A only, through the friction law
\[ Q = \gamma A^m \qquad (18) \]

where \gamma = C_h S^{1/2} P^{-1/2} and m = 3/2 for the Chézy friction model, and
\gamma = \bar{a}\, n^{-1} S^{1/2} P^{-2/3} and m = 5/3 for the Manning model, where
S = (dh_b/ds) is the slope of the stream bottom, P is the wetted perimeter, and C_h,
\bar{a} and n are model constants. G_s represents the gain or loss of the stream, the main
component being the loss to the aquifer

\[ G_s = (P/R_f)\,(\phi - h_b - u) \qquad (19) \]
where R_f is the resistivity factor per unit arc length of the perimeter. The corresponding
gain to the aquifer is

\[ G_a = -G_s\, \delta_{\Gamma_s} \qquad (20) \]

where \Gamma_s represents the planar curve of the stream and \delta_{\Gamma_s} is a Dirac
delta distribution with a unit intensity per unit length, i.e.

\[ \int f(x)\, \delta_{\Gamma_s}\, d\Sigma = \int_0^L f(x(s))\, ds \qquad (21) \]
The stream_loss elemset represents this loss, and a typical discretization is shown in
figure 4. The stream loss element is connected to two nodes on the stream and two on the
aquifer and must be entered in that order in the element connectivity table, for instance
elemset stream_loss 4
<.... elemset properties...>
__END_HASH__
...
<n5> <n4> <n1> <n2>
...
__END_ELEMSET__
6.7.1 Related Options
• double a_bar (default=1.):
Unit conversion factor for Manning friction law. (found in file: )
• double B1 (default=<required>):
Width of the channel (found in file: )
• double Ch (default=<required>):
Chezy roughness coefficient (found in file: )
• double diameter (default=<required>):
Diameter of the channel (circular shape) (found in file: )
• string friction_law (default=string("undefined")):
Choose friction law, may be manning or chezy (found in file: )
• int impermeable (default=0):
Flag whether the element is impermeable (Rf → ∞) or not. (found in file: )
• double radius (default=<required>):
Radius of the channel (found in file: )
• double Rf (default=1.):
Resistivity (including perimeter) of the stream loss to the aquifer. (found in file: )
• double roughness (default=1.):
Roughness coefficient for the Manning formula (a.k.a. n) (found in file: )
• string shape (default=string("undefined")):
Choose channel section shape, may be circular, rectangular or triangular.
(found in file: )
• double wall_angle (default=<required>):
Width and height of the channel (found in file: )
• double wall_angle (default=<required>):
Aperture angle of channel (found in file: )
• double width (default=<required>):
Width of the channel (found in file: )
• double width_bottom (default=<required>):
Width of bottom of channel (found in file: )
6.8 The Hydrological Model (cont.).
The implemented code solves the problem of subsurface flow in a free aquifer, coupled with
a surface net of 2D or 1D streams (“2D Saint-Venant Model”, 2DSVM, “1D Saint-Venant
Model”, 1DSVM, and “Kinematic Wave Model”, KWM). To model such a system, three
element sets must be used: an aquifer system representing the subsurface aquifer, a 2D
or 1D (depending on the chosen model) stream element set representing the stream and
a 2D or 1D stream loss element set representing the losses from the stream to the aquifer
(or vice versa).
6.9 Subsurface Flow.
The aquifer element set consists of 2D linear triangle or quadrangle elements. A per-node property η
represents the height of the aquifer bottom with respect to a given datum. The corresponding unknown
for each node is the piezometric height, or the level of the freatic surface at that point, φ.
The equation for the aquifer, integrated in the vertical direction, is

\[ \frac{\partial}{\partial t}\bigl( S(\phi-\eta)\,\phi \bigr)
   = \nabla\cdot\bigl( K(\phi-\eta)\,\nabla\phi \bigr) + \sum G_a,
   \qquad \text{on } \Omega_{aq} \times (0,t], \qquad (22) \]
where Ωaq is the aquifer domain, S is the storativity, K is the hydraulic conductivity and
Ga is a source term, due to rain, losses from streams or other aquifers.
6.10 Surface Flow.
6.10.1 2D Saint-Venant Model.
The stream element set represents a 2D or 1D stream of water. It has its own nodes,
separate from the aquifer nodes, whose coordinates must coincide with those of some corresponding
node in the aquifer. A constant per-node field represents the stream bottom height h_b,
with reference to the datum. That is why, normally, we have two coordinates and the
stream bottom height for each node.
The equations for 2D Saint-Venant open channel flow are the well-known mass and
momentum conservation equations integrated in the vertical direction. If we write these
equations in conservation matrix form, we have

\[ \frac{\partial U}{\partial t} + \frac{\partial F_i(U)}{\partial x_i} = G(U),
   \quad i = 1, 2, \qquad \text{on } \Omega_{st} \times (0,t], \qquad (23) \]
where \Omega_{st} is the stream domain, U = (h, hw, hv)^T is the state vector, and the advective
flux functions in eq. (23) are

\[ F_1(U) = \Bigl( hw,\; hw^2 + g\frac{h^2}{2},\; hwv \Bigr)^T, \qquad
   F_2(U) = \Bigl( hv,\; hwv,\; hv^2 + g\frac{h^2}{2} \Bigr)^T, \qquad (24) \]
where h is the height of the water in the channel with respect to the channel bottom,
u = (w, v)^T is the velocity vector and g is the acceleration due to gravity. As in eq. (22),
G_s represents the gain (or loss) of the river; the source term is

\[ G(U) = \bigl( G_s,\;
   gh(S_{0x} - S_{fx}) + f_c hv + C_f \varpi_x|\varpi|,\;
   gh(S_{0y} - S_{fy}) - f_c hw + C_f \varpi_y|\varpi| \bigr)^T \qquad (25) \]
where S_0 is the bottom slope and S_f is the friction slope,

\[ S_{fx} = \frac{1}{C_h^2 h}\, w|\bar{u}|, \quad
   S_{fy} = \frac{1}{C_h^2 h}\, v|\bar{u}| \qquad \text{(Chézy model)}, \]
\[ S_{fx} = \frac{n^2}{h^{4/3}}\, w|\bar{u}|, \quad
   S_{fy} = \frac{n^2}{h^{4/3}}\, v|\bar{u}| \qquad \text{(Manning model)}, \qquad (26) \]
where C_h and n (the Manning roughness) are model constants. Generally, the effect of the
Coriolis force, related to the Coriolis factor f_c, must be taken into account in the case of great
lakes, wide rivers and estuaries. The Coriolis factor is given by f_c = 2\omega \sin\psi, where \omega is
the rotational rate of the earth and \psi is the latitude of the area under study. The free
surface stresses in eq. (25) are expressed as the product of a friction coefficient and
a quadratic form of the wind velocity \varpi = (\varpi_x, \varpi_y), with

\[ C_f = c_\varpi\, \frac{\rho_{air}}{\rho}, \qquad (27) \]

where

\[ c_\varpi = \begin{cases}
   1.25\times10^{-3}\, \varpi^{-1/5} & \text{if } |\varpi| < 1 \text{ m/s}, \\
   0.5\times10^{-3}\, \varpi^{1/2}   & \text{if } 1 \text{ m/s} \le |\varpi| < 15 \text{ m/s}, \\
   2.6\times10^{-3}                  & \text{if } |\varpi| \ge 15 \text{ m/s}.
   \end{cases} \qquad (28) \]
6.10.2 1D Saint-Venant Model.
When velocity variations over the channel cross section are neglected, the flow can be treated
as one-dimensional. The equations of mass and momentum conservation on a stream of variable
cross section (in conservation form) are

\[ \frac{\partial A(s,t)}{\partial t} + \frac{\partial Q(A(s,t))}{\partial s} = G_s(s,t), \]
\[ \frac{1}{A(s,t)}\frac{\partial Q}{\partial t}
   + \frac{1}{A(s,t)}\frac{\partial}{\partial s}\Bigl( \beta\, \frac{Q^2}{A(s,t)} \Bigr)
   + g(S_0 - S_f) + g\frac{\partial h}{\partial s}
   - \frac{c_\varpi}{A(s,t)}\, \varpi^2 \cos\alpha
   = \frac{q_t}{A(s,t)}\,(v - v_t),
   \qquad \text{on } \Omega_{st} \times (0,t], \qquad (29) \]
where A is the cross sectional area, Q is the discharge, G_s(s,t) represents the gain or
loss of the stream (i.e. the lateral inflow per unit length of channel), s is the arc length
along the channel, v = Q/A is the average velocity in the s-direction, v_t is the velocity component
in the s-direction of the lateral flow from tributaries, \beta = \frac{1}{v^2 A}\int u^2\, dA is the
Boussinesq coefficient (u being the flow velocity at a point), and \alpha is the wind direction
measured from a positive line tangent to s in the flow direction. The bottom shear stresses are
approximated by using the Chézy or Manning equations,
Sf = v² P(h) / (Ch² A(h)),              (Chézy model)
Sf = v² n² P^{4/3}(h) / (a² A^{4/3}(h)),  (Manning model)    (30)
where P is the wetted perimeter of the channel and a is a conversion factor (a = 1 for
metric units).
6.10.3 Kinematic Wave Model.
When friction and gravity effects dominate over inertia and pressure forces, and if we
neglect the stress due to wind blowing and the Coriolis term, the momentum equation
becomes

S = Sf,    (31)
and eq. (29) reduces to

∂A(h)/∂t + ∂Q(A(h))/∂s = Gs,    on Ω_st × (0, t],    (32)
where A depends, through the geometry of the channel, on the channel water height h.
The flow rate Q under the KWM model is a function of A only, through the friction law

Q = γ A^m,    (33)

where γ = Ch S^{1/2} P^{−1/2} and m = 3/2 for the Chézy friction model, and γ = a n^{−1} S^{1/2} P^{−2/3}
and m = 5/3 for the Manning model; S = (dh_b/ds) is the slope of the stream bottom.
6.11 Boundary Conditions.
6.11.1 Boundary Conditions to simulate River-Aquifer Interactions/Coupling Term.
The stream/aquifer interaction process occurs between a stream and its adjacent floodplain aquifer. The coupling term is not explicitly included in eq. (22) but is treated as
a boundary flux integral. At a nodal point we can write the coupling,
Gs = (P/Rf)(φ − hb − h),    (34)
where Gs represents the gain or loss of the stream (the main component being the loss
to the aquifer), and Rf is the resistivity factor per unit arc length of the perimeter. The
corresponding gain to the aquifer is
Ga = −Gs δΓs ,
(35)
where Γs represents the planar curve of the stream and δΓs is a Dirac’s delta distribution
with a unit intensity per unit length, i.e.
∫ f(x̄) δ_Γs dΣ = ∫_0^L f(x̄(s)) ds.    (36)
The stream loss element set represents this loss; a typical discretization is shown in
fig. 4. The stream loss element is connected to two nodes on the stream and two on the
aquifer. If the stream level is above the phreatic aquifer level (hb + h > φ), then the stream
loses water to the aquifer, and vice versa. Contrary to standard approaches, the coupling
term is incorporated through a boundary flux integral that arises naturally in the weak
form of the governing equations, rather than through a source term.
6.11.2 Initial Conditions. First, Second and Third Kind Boundary
Conditions/Absorbent Boundary Condition.
Groundwater flow. In the previous section, the equation that governs subsurface flow
was established. In order to obtain a well posed PDE problem, initial and boundary conditions must be superimposed on the flow domain and on its limits. The initial condition
for the groundwater problem is a constant hydraulic head in the whole region that obeys
levels observed in the basin history.
Now, consider a simply connected region Ω bounded by a closed curve ∂Ω such that
∂Ωφ ∪ ∂Ωσ ∪ ∂Ωφσ = ∂Ω. If the stream is partially penetrating and connected, in a
hydraulic sense, to the aquifer, we set
φ = φ0,    on ∂Ωφ × (0, t]
K(φ − η) ∂φ/∂n = σ0,    on ∂Ωσ × (0, t]
K(φ − η) ∂φ/∂n = C(φ − h),    on ∂Ωφσ × (0, t]    (37)
where φ0 is a given water head, σ0 is a given flux normal to the flux boundary ∂Ωσ, and C is
the conductance at the river/stream interface. If a fully penetrating stream is considered,

K(φ − η) ∂φ/∂n = C(φ − h),    on ∂Ωφσ × (0, t]    (38)

Finally, for a perched stream,

K(φ − η) ∂φ/∂n = C(hb − h),    on ∂Ωφσ × (0, t]    (39)
Surface Flow - Fluid Boundary. We recall that the type of flow in a stream or in
an open channel depends on the value of the Froude number Fr = |u|/c (where c = √(gh) is
the wave celerity); a flow is said to be
• fluvial, for |u| < c,
• torrential, for |u| > c.
Saint-Venant equations. Consider a Cauchy problem for the time-like variable
x_{dim+1} = t, where the solution is given in the subspace x_{dim+1} = t = 0 as U = U(x, t = 0)
and is determined at subsequent values of t. If the subspace t = 0 is bounded by a surface
∂Γ(x), then additional conditions have to be imposed on that surface at all values of t.
This defines an initial boundary value problem. A solution for the system of first-order
PDE's can be written as a superposition of wave-like solutions of the type corresponding
to the n eigenvalues of the matrix A_k n_k, where A_k = ∂F_k(U)/∂U, k = 1, ..., dim, and n
is the outward unit normal to the boundary edge:

U = Σ_{α=1}^{n} Ū_α e^{I(x·n − ω_α t)},    on Ω_st × Γ_st × (0, t]    (40)

where the summation extends over all eigenvalues λ_α.
As the problem is hyperbolic, n initial conditions for the Cauchy problem have to be
given to determine the solution. That is, as many conditions as unknowns must be
imposed at t = 0. For the initial boundary value problem, the n boundary conditions
have to be distributed along the limits at all values of t, according to the direction of
propagation of the corresponding waves. If the wave phase velocity, the α-eigenvalue
of A_k n_k (i.e. the k-wave projected in the interior normal direction n), is positive, the
information propagates into the domain. Hence, the number of conditions to be
imposed for the initial boundary value problem at a given point of ∂Γ is equal to the
number of positive eigenvalues of A_k n_k at that point. The total number of conditions
remains equal to the total number of eigenvalues (i.e. the order of the system). For the
treatment of the boundary conditions we will use the one-dimensional projected system
and consider the sign of the eigenvalues of A_k (u_n + c and u_n − c). We remark that if
n is the outward unit normal to the boundary edge, an inflow boundary corresponds to
u · n < 0 and an outflow one to u · n > 0.
Fluvial Boundary.
• inflow boundary: u specified and the depth h is extrapolated from interior points,
or vice versa.
• outflow boundary: depth h specified and velocity field extrapolated from interior
points, or vice versa.
Torrential Boundary.
• inflow boundary: u and the depth h are specified.
• outflow boundary: all variables are extrapolated from interior points.
Solid Wall Condition. We prescribe the simple slip condition over Γslip (⊂ Γst )
u·n=0
(41)
Kinematic Wave Model. The applicability of the kinematic wave as an approximation
to the dynamic wave was discussed in Rodríguez (1995) and, according to Lighthill and
Whitham, subcritical flow conditions favor the kinematic approach.
Since only one characteristic propagates into the domain, we can only specify the water
head, the channel section or the discharge at inflow boundaries (see eq. (33)).
6.12 Absorbing boundary conditions
Absorbing boundary conditions are a very useful feature for the solution of advective-diffusive
problems. They allow the user to put artificial boundaries closer to the region
of interest, and they also accelerate convergence to steady solutions, since they provide the
highest rate of energy dissipation through the boundaries.
In PETSc-FEM, once you write the flux function for a particular advective-diffusive
problem you get absorbing boundary conditions with little or no extra work. There are
basically two types of absorbing boundary conditions:
• Linear, based on the Jacobian of the flux function, assuming small perturbations
about a reference value.
• Based on Riemann invariants (these require the writer of the flux function to provide
the Riemann invariants).
6.12.1 Linear absorbing boundary conditions
Starting with the conservation form of an advective system (2), and assuming small perturbations about a mean fluid state, U(x, t) = U0 + U′(x, t), and no source term, we
obtain the linearized form

∂U′/∂t + A0 ∂U′/∂x = 0,    (42)
where we further assume that the flow depends only on x, the direction normal to the
boundary. Let S be the matrix of right eigenvectors of A0, so that

A0 S = S Λ    (43)

where Λ = diag{λ1, ..., λ_ndof} are the eigenvalues of A0. Assuming that the system is
hyperbolic, such a diagonal decomposition is possible for any state U0, with real
eigenvalues and a non-singular matrix S. Multiplying (42) on the left by S⁻¹ and defining
V = S⁻¹U′ we obtain a decoupled system of equations

∂V/∂t + Λ ∂V/∂x = 0.    (44)
Now, the equation for each “characteristic component” vj of V is a simple linear transport
equation with constant transport velocity λj,

∂vj/∂t + λj ∂vj/∂x = 0,    (45)

so that the absorbing boundary condition is almost evident. Assuming that we want
to solve the equation on the semiplane x ≥ 0, so that x = 0 is a boundary, the
corresponding absorbing boundary condition is

vj(0) = 0,    if λj ≥ 0 (ingoing boundary);
vj extrapolated from the interior,    otherwise (outgoing boundary).    (46)
This can be summarized as

Π⁺_V V0 = 0    (47)

where

Π⁺_V = diag{(1 + sign(λj))/2}    (48)

is the projection matrix onto the space of incoming waves. As Π⁺_V is a diagonal matrix,
with diagonal elements 1 or 0, it trivially satisfies the projection condition Π⁺_V Π⁺_V = Π⁺_V.
Coming back to the U basis, we obtain the following first-order, linear absorbing boundary condition

Π⁺_U(U0) (U(0) − U0) = 0,    (49)

where

Π±_U = S Π±_V S⁻¹.    (50)

This condition is perfectly absorbing for small amplitude waves around the state U0. The
main problem with it is that, as the limit state at the boundary U∞ is not known a priori,
we have to use some a priori chosen state U0 ≠ U∞; the projection matrix used, Π⁺_U(U0),
will then not be equal to Π⁺_U(U∞), and the condition will not be fully absorbing. We call U0
the reference state for the absorbing boundary condition. It can even happen that the
eigenvalues for the actual state at the boundary change sign with respect to the reference
state.
6.12.2 Riemann based absorbing boundary conditions
Let n⁺ (n⁻) be the number of incoming (outgoing) waves, i.e. the number of positive
(negative) eigenvalues of A0, and assume that the eigenvalues are ordered decreasingly,
i.e. λj ≥ λk if j < k, so that the positive eigenvalues are the first n⁺ ones. The boundary
conditions for the incoming waves (46) can be written as

lj·(U − U0) = 0,    j = 1, ..., n⁺    (51)

where lj is a row of S⁻¹, i.e. a “left eigenvector” of A0. If U is close to U0 we can write
(51) as

lj(U)·dU = 0,    j = 1, ..., n⁺.    (52)
If these differential forms were “exact differentials”, i.e. if

lj(U)·dU = dwj(U),    for all j = 1, ..., ndof,    (53)

for some functions wj(U), then we could impose as absorbing boundary conditions

wj(U) = wj(U0),    j = 1, ..., n⁺.    (54)
Let us call these wj functions “invariants”. As there are ndof invariants, we can take the
w variables as a new representation of the internal state. Note that, as (∂w/∂U) = S⁻¹,
and S is non-singular due to the hyperbolicity of the system, the correspondence between
w and U is one-to-one.
Assume that the value at the boundary reaches a steady limit value U∞, i.e.

U(0, t) → U∞,    for t → ∞.    (55)

If all the waves were incoming (n⁺ = ndof), then the set of boundary conditions (54)
would be a set of ndof non-linear equations on the value U∞. As the correspondence
U → w is one-to-one, the boundary conditions would mean w(U∞) = w(U0), and then
U∞ = U0. But if the number of incoming waves is n⁺ < ndof, then it could happen
that U∞ ≠ U0. In fact, for a given U0, the limit value U∞ would belong to an
n⁻-dimensional curvilinear manifold. Even if the limit state U∞ ≠ U0, the condition can be
proved to be perfectly absorbing, since, as U → U∞ at the boundary, we can expand each of the
conditions around U∞, and this results in an equation similar to (52) but centered
about U∞,

lj(U∞)·(U − U∞) = 0,    j = 1, ..., n⁺.    (56)
The problem is that in general the differentials are not exact. “Riemann invariants” are
functions that satisfy (53) under some restrictions on the flow. For instance, Riemann invariants can be computed for compressible gas flow if we assume that the flow is isentropic.
They are

w1 = s = log(p/ρ^γ),       λ1 = u,        (entropy);
w2,3 = u ± 2a/(γ − 1),     λ2,3 = u ± a,  (acoustic waves);    (57)
w4,5 = u·t̂1,2,             λ4,5 = u,      (vorticity waves).
Boundary conditions based on Riemann invariants are, then, absorbing in some particular
cases.
6.12.3 Absorbing boundary conditions based on last state
Another possibility is to linearize around the last state Uⁿ, i.e.

Π⁺_U(U(0, tⁿ)) (U(0, t^{n+1}) − U(0, tⁿ)) = 0.    (58)

This condition is always perfectly absorbing in the limit, because we are always linearizing
about a state that, in the limit, will tend to U∞, and it does not need the computation of
Riemann invariants (which could be unknown for certain problems). Also, this boundary
condition is fully absorbing even in the case of an inversion of the sense of propagation of
the waves.
The drawback is that we have no external control on the internal states, i.e. the limit
internal state does not depend on some external value (as the U0 used for the Riemann-based
absorbing boundary conditions), but on the internal state itself. That means, for
instance, that if the internal computational scheme tends to produce some error (due to
non-conservativity, or rounding errors), the internal state will drift constantly.
A good compromise may be to use Riemann-based (54) or linear absorbing boundary
conditions (49) at the inlet, and absorbing boundary conditions based on the last state (58)
at the outlet. As the error tends to propagate more intensely towards the outlet boundary,
it is preferable to use strongly absorbing boundary conditions there, whereas the linearly
absorbing or Riemann invariant boundary conditions upstream avoid the drift.
6.12.4 Finite element setup
Assume that the problem is 1D, with a constant mesh size h and time step ∆t, so that
nodes are located at positions x_k = kh, k = 0, ..., ∞. Let U_j^n be the state at node j, time
step n. FEM discretization of the system of equations with “natural” boundary conditions
leads to a system of the form

F_0(U_0^{n+1}, U_1^{n+1}) = R_0^{n+1}
F_1(U_0^{n+1}, U_1^{n+1}, U_2^{n+1}) = R_1^{n+1}
...
F_k(U_{k−1}^{n+1}, U_k^{n+1}, U_{k+1}^{n+1}) = R_k^{n+1}    (59)
...
where the F_k() are non-linear functions and the R_k possibly depend on the previous values
U_k^n. “Imposing boundary conditions” means to replace some of the equations in the first
row (k = 0) by other equations. Recall that, in order to balance the number of equations
and unknowns, we must specify which of the equations are discarded for each equation
that is added. For instance, when imposing conditions on primitive variables it is usual
to discard the energy equation if pressure is imposed, to discard the continuity equation
if density is imposed, and to discard the j-th component of the momentum equation if the
j-th component of velocity is imposed. On solid walls, the energy equation is discarded if
temperature is imposed. Some of these equation/unknown “pairings” are clearer than
others.
Thus, when imposing absorbing boundary conditions we have to specify not only the
new conditions, but also which equations are discarded. Note that this is not necessary
if all the variables are imposed at the boundary, for instance in a supersonic outlet. This
suggests generating appropriate values even for the outgoing waves, for instance by
extrapolation from the interior values.
6.12.5 Extrapolation from interiors
For a linear system in characteristic variables (45) we could replace all of the first row of
equations by

v_{j0}^{n+1} = 0,                        j = 1, ..., n⁺;
v_{j0}^{n+1} = Σ_{p=0}^{m} c_p v_{jp}^n,    j = n⁺ + 1, ..., ndof,    (60)
which can be put in matrix form as

Π⁺_V V_0^{n+1} = 0
Π⁻_V (V_0^{n+1} − Σ_{p=0}^{m} c_p V_p^n) = 0    (61)

where the c_p's are appropriate coefficients that provide an extrapolation of the value V_0^{n+1}
from the values V_p^n at the previous time step. Note that these represent 2 ndof equations
but, as Π±_V have rank n±, there are, in total, ndof linearly independent equations. As the
rows n⁺ + 1 ≤ j ≤ ndof in the first row of equations (corresponding to incoming waves) are
null, and vice versa for the outgoing waves, we can add both blocks of equations to obtain a
single set of ndof equations
Π⁺_V V_0^{n+1} + Π⁻_V (V_0^{n+1} − Σ_{p=0}^{m} c_p V_p^n) = 0    (62)
and, coming back to the U basis
Π⁺_U (U_0^{n+1} − U0) + Π⁻_U (U_0^{n+1} − Σ_{p=0}^{m} c_p U_p^n) = 0    (63)
The modified version of the FEM system of equations (59) that incorporates the absorbing
boundary conditions is, then,

Π⁺_U (U_0^{n+1} − U0) + Π⁻_U (U_0^{n+1} − Σ_{p=0}^{m} c_p U_p^n) = 0
F_1(U_0^{n+1}, U_1^{n+1}, U_2^{n+1}) = R_1^{n+1}
...
F_k(U_{k−1}^{n+1}, U_k^{n+1}, U_{k+1}^{n+1}) = R_k^{n+1}    (64)
...
6.12.6 Avoiding extrapolation
For linear systems, equation (59) is of the form

(U_0^{n+1} − U_0^n)/∆t + A (U_1^{n+1} − U_0^n)/h = 0;
(U_k^{n+1} − U_k^n)/∆t + A (U_{k+1}^{n+1} − U_{k−1}^n)/(2h) = 0,    k ≥ 1.    (65)
We have made a lot of simplifications here: no source or upwind terms, and a simple
discretization based on centered finite differences. Alternatively, it can be thought of as a
pure Galerkin FEM discretization with mass lumping. In the basis of the characteristic
variables V this can be written as
(V_0^{n+1} − V_0^n)/∆t + Λ (V_1^{n+1} − V_0^n)/h = 0;
(V_k^{n+1} − V_k^n)/∆t + Λ (V_{k+1}^{n+1} − V_{k−1}^n)/(2h) = 0,    k ≥ 1.    (66)

For the linear absorbing boundary conditions (49) we should impose

Π⁺_V(V_ref) (V_0 − V_ref) = 0,    (67)
while discarding the equations corresponding to the incoming waves in the first row of
(66). Here U_ref/V_ref is the state about which we make the linearization. This can be
done via Lagrange multipliers in the following way:
Π⁺_V(V_ref) (V_0 − V_ref) + Π⁻_V(V_ref) V_lm = 0,
(V_0^{n+1} − V_0^n)/∆t + Λ (V_1^{n+1} − V_0^n)/h + Π⁺_V V_lm = 0;    (68)
(V_k^{n+1} − V_k^n)/∆t + Λ (V_{k+1}^{n+1} − V_{k−1}^n)/(2h) = 0,    k ≥ 1,
where V_lm are the Lagrange multipliers for imposing the new conditions. Note that, if j
is an incoming wave (λ_j ≥ 0), then the equations are of the form

v_{j0} − v_{j,ref} = 0,
(v_{j0}^{n+1} − v_{j0}^n)/∆t + λ_j (v_{j1}^{n+1} − v_{j0}^n)/h + v_{j,lm} = 0,    (69)
(v_{jk}^{n+1} − v_{jk}^n)/∆t + λ_j (v_{j,k+1}^{n+1} − v_{j,k−1}^n)/(2h) = 0,    k ≥ 1.

Note that, due to the v_{j,lm} Lagrange multiplier, we can solve for the v_{jk} values from
the first and third rows; the value of the multiplier v_{j,lm} “adjusts” itself in order to relax the
equation in the second row.
On the other hand, for the outgoing waves (λ_j < 0), we have

v_{j,lm} = 0,
(v_{j0}^{n+1} − v_{j0}^n)/∆t + λ_j (v_{j1}^{n+1} − v_{j0}^n)/h = 0,    (70)
(v_{jk}^{n+1} − v_{jk}^n)/∆t + λ_j (v_{j,k+1}^{n+1} − v_{j,k−1}^n)/(2h) = 0,    k ≥ 1,

so that the solution coincides with the unmodified original FEM equation, and v_{j,lm} = 0.
Coming back to the U basis, we have

Π⁺_U(U_ref) (U_0 − U_ref) + Π⁻_U(U_ref) U_lm = 0,
(U_0^{n+1} − U_0^n)/∆t + A (U_1^{n+1} − U_0^n)/h + Π⁺_U U_lm = 0;    (71)
(U_k^{n+1} − U_k^n)/∆t + A (U_{k+1}^{n+1} − U_{k−1}^n)/(2h) = 0,    k ≥ 1.
And finally, coming back to the FEM equations (59),

Π⁺_U(U_ref) (U_0 − U_ref) + Π⁻_U(U_ref) U_lm = 0,
F_0(U_0^{n+1}, U_1^{n+1}) + Π⁺_U U_lm = R_0^{n+1}
F_1(U_0^{n+1}, U_1^{n+1}, U_2^{n+1}) = R_1^{n+1}
...
F_k(U_{k−1}^{n+1}, U_k^{n+1}, U_{k+1}^{n+1}) = R_k^{n+1}    (72)
...
In conclusion, in this setup we do not need to make extrapolations of the variables,
and then there is no need to have a structured line of nodes near the boundary. It is
only required to have an additional fictitious node at the boundary in order to hold the
Lagrange multiplier unknowns U_lm, and to add the absorbing boundary equation (first
row of (72)) for these nodes.
6.12.7 Flux functions with enthalpy.
When the flux function has an enthalpy term that is not the identity, the expressions
for the change of basis, and also the projectors, are somewhat modified. An advective-diffusive
system with a “generalized enthalpy function” H(U) is an extension of the form
(2) and can be written as

∂H(U)/∂t + ∂F_i(U)/∂x_i = 0.    (73)
The heat equation can be naturally put in this way. Also, the gas dynamics equations for
compressible flow can be put in this form if we write the equations in “conservative form”
but use the “primitive variables” U = [ρ, u, p]^T as the main variables for the code.
This has the advantage of using a conservative form of the equations and, at the same
time, allows an easy imposition of Dirichlet boundary conditions, which are normally set
in terms of the primitive variables. In this case U are the primitive variables, and the
generalized enthalpy H(U) is the vector of conservative variables. We define the generalized
“heat content matrix” Cp as

Cp = ∂H(U)/∂U    (74)
and (2) can be put in quasi-linear form as

Cp ∂U/∂t + A_i ∂U/∂x_i = 0.    (75)
Note that this can be brought to the quasi-linear form (42) (i.e., without the Cp) if we
multiply the equation on the left by Cp⁻¹ and define new flux Jacobians

Ã_i = Cp⁻¹ A_i,    (76)

so that, basically, the extension to systems with generalized enthalpy is to replace the
Jacobians by the modified Jacobians (76). The modified expression for the projectors is

Π±_U = Cp S Π±_V S⁻¹.    (77)
6.12.8 Absorbing boundary conditions available
At the moment of writing this, we have three possible combinations of boundary conditions.
Using extrapolation from the interior. This is the elemset <system>_abso. The
number of nodes per element nel must not be lower than 4. The first nel − 2 nodes are
used for a second order extrapolation of the outgoing wave. The (nel − 1)-th node is the
node with the Lagrange multipliers, and the nel-th node is used to set the reference value. For
instance, for an absorbing element of nel = 5 nodes, we would have 3 internal nodes, and
the data would look like this (see figure 5):
Figure 5: Absorbing element. (In the figure, nodes n1, n2, n3 lie on the outgoing wave
side of the absorbing element, n4 is labeled “reference state”, and n5 “lagrange multipliers”.)
elemset gasflow_abso 5
normal <nx> <ny>
__END_HASH__
<n1> <n2> <n3> <n4> <n5>
...
__END_ELEMSET__
end_elemsets
fixa
<n4> 1 <rho ref>
<n4> 2 <u ref>
<n4> 3 <v ref>
<n4> 4 <p ref>
__END_FIXA__
• Each element has 5 nodes; the first three are real nodes (i.e. not fictitious), numbered
from the boundary to the interior. The fourth node is reserved for the Lagrange multipliers,
and the fifth node is set to the reference value.
• The normal option is used to define the normal to the boundary. It can be set as a
constant vector per elemset (usually for plane boundaries, as in the example above),
or as a per-element value. In this last case we would have something like this
elemset gasflow_abso 5
props normal[2]
normal <nx> <ny>
__END_HASH__
<n1> <n2> <n3> <n4> <n5> <nx> <ny>
...
__END_ELEMSET__
The normal need not be entered with high precision. If the vector entered is
not exactly normal to the boundary, then the condition will be absorbing for waves
whose “group velocity vector” is parallel to this vector.
• Note the fixa section for the values of the reference node. In this case (gas
dynamics, elemset gasflow) we set the four fields (ndim = 2) to the reference values.
• The reference values can be made time dependent in an explicit way by using a
fixa_amplitude section instead of a fixa section.
• Using this absorbing boundary condition requires the flux function to have the
Riemman_Inv() method implemented. If this is not the case, then the program
stops with an error message.
• For the gasflow elemset: if the option linear_abso is set to false (0), then the
Riemann invariants for gas dynamics are used, and the state reference value is used
for computing the reference Riemann invariants. If linear_abso is set to true (1),
then the linear absorbing boundary conditions are imposed.
Not using extrapolation from the interior. This is the elemset <system>_abso2. The
number of nodes per element nel is 3. The first node is the node at the boundary. The
second node is the node with the Lagrange multipliers, and the third node is used to set the
reference value. The data would look like this (see figure 6):
elemset gasflow_abso2 3
normal <nx> <ny>
__END_HASH__
<n1> <n2> <n3>
...
__END_ELEMSET__
Figure 6: Absorbing element. (In the figure, n1 is the boundary node on the outgoing
wave side of the absorbing element, n2 is labeled “lagrange multipliers”, and n3
“reference state”.)
end_elemsets
fixa
<n3> 1 <rho ref>
<n3> 2 <u ref>
<n3> 3 <v ref>
<n3> 4 <p ref>
__END_FIXA__
• As before, the normal property is used for computing the direction normal to the
boundary.
• If the use_old_state_as_ref flag is set to true (1), then the reference state is taken
as the state of the boundary node at the previous time step. In this case the state of
the third node is ignored. On the other hand, if it is set to false (0), then the state
of the third node is used as the reference state.
• This type of boundary condition does not need the implementation of the Riemann
invariants method, but it needs the methods get_Cp() and get_Ajac().
7 The Navier-Stokes module
7.1 LES implementation
The Smagorinsky LES model for the Navier-Stokes module follows ... The implementation
under PETSc-FEM presents the following particularities:
• Wall boundary conditions are implemented as “mixed type”.
• The van Driest damping factor introduces non-localities, in the sense that the turbulent
viscosity at a volume element depends on the state of the fluid at a wall.
7.1.1 The wall elemset
Wall boundary conditions have been implemented via a wall elemset. This is a surface
element that computes, given the velocities at the nodes, the tractions corresponding to
these velocities, for a given law of wall. Also, the shear velocities are stored internally in the
element, so that the volume elements can get them and compute the van Driest damping
factor. This requires finding, for each volume element, the nearest wall element. This is
done before the time loop, with the ANN (Approximate Nearest Neighbor) library.
7.1.2 The mixed type boundary condition
The contribution to the momentum equations from the wall element is

R_ip = ∫_Σe t_p N_i dΣ    (78)

where R_ip is the contribution to the residual of the p-th momentum equation of node
i, N_i is the shape function of node i, and t_p are the tractions on the surface Σe of the
element. The wall law is in general of the form

u/u∗ = f(y⁺)    (79)

where u is the tangent velocity, u∗ = √(τ_w/ρ) the shear velocity, and y⁺ = y u∗/ν
the non-dimensional distance to the wall. We have several possibilities regarding the
positioning of the computational boundary. We first discuss the simplest, which is to set
the computational boundary at a fixed y⁺ position. Note that this means that the real
position of the boundary changes during the iteration. In this case (79) can be rewritten
as
τ_w = g(u) u    (80)

where

τ_w = g(u) u = ρ (u/f(y⁺))²    (81)

or

g(u) = ρ u / f(y⁺)².    (82)

The traction on the wall element is assumed to be parallel to the wall and in the opposite
direction to the velocity vector, that is,

t_p = −g(u) u_p.    (83)
Replacing in (78), the residual term is

R_ip = −∫_Σe g(u) u_p N_i dΣ.    (84)
The Jacobian of the residual with respect to the state variables, needed for the Newton-Raphson
algorithm, is

J_ip,jq = −∂R_ip/∂u_jq = ∫_Σe ∂(g(u) u_p)/∂u_jq N_i dΣ
        = ∫_Σe (g(u) ∂u_p/∂u_jq + g′(u) u_p ∂u/∂u_jq) N_i dΣ.    (85)
but

u_p = Σ_l u_lp N_l,    (86)

so that

∂u_p/∂u_jq = Σ_l (∂u_lp/∂u_jq) N_l = Σ_l δ_lj δ_pq N_l = δ_pq N_j.    (87)

Similarly,

u² = u_p u_p = u_lp N_l u_mp N_m,    (88)

and

2u ∂u/∂u_jq = 2 u_p ∂u_p/∂u_jq,    (89)

so that

∂u/∂u_jq = (u_q/u) N_j.    (90)

Replacing in (85),

J_ip,jq = ∫_Σe ( g(u) δ_pq + (g′(u)/u) u_p u_q ) N_i N_j dΣ.    (91)
7.1.3 The van Driest damping factor. Programming notes
This is a non-standard issue, since the computation of one volume element requires information
from other (wall) elements. First we compute the wall element that is associated
to each volume element: assemble() is called with jobinfo="build_nneighbor_tree".
This jobinfo is acknowledged only by the wall elemsets, which compute their geometrical
centers and put them in the data_pts STL array. Then, this is passed to the ANN package,
which computes the octree. All this is cached in the constructor of a WallData class. After
this, a call to assemble() with jobinfo="get_nearest_wall_element" is acknowledged
by all the volume elemsets, which compute for each volume element the nearest wall element.
This is stored as an “integer per element property” in the volume elemsets. In order to
reduce memory requirements, only an index in the data_pts array is stored. As several
wall elemsets may exist, an array of pair<int,elemset *> is used to store pointers to
the data_pts array, in order to know to which wall elemset a given index in data_pts
belongs.
vector<double> *data_pts_ = new vector<double>;
vector<ElemToPtr> *elemset_pointer = new vector<ElemToPtr>;
WallData *wall_data;
if (LES) {
  VOID_IT(argl);
  argl.arg_add(data_pts_,USER_DATA);
  argl.arg_add(elemset_pointer,USER_DATA);
  Elemset *elemset=NULL;
  argl.arg_add(elemset,USER_DATA);
  ierr = assemble(mesh,argl,dofmap,
                  "build_nneighbor_tree",&time); CHKERRA(ierr);
  PetscPrintf(PETSC_COMM_WORLD,"After nearest neighbor tree.\n");
  wall_data = new WallData(data_pts_,elemset_pointer,ndim);
  // Find nearest neighbor for each volume element
  VOID_IT(argl);
  argl.arg_add(wall_data,USER_DATA);
  ierr = assemble(mesh,argl,dofmap,"get_nearest_wall_element",
                  &time); CHKERRA(ierr);
}
In the jobinfo="build_nneighbor_tree" call to assemble() a loop over all the elements
in the elemset, ignoring which processor each belongs to, must be performed. Otherwise,
each processor loads in data_pts only the coordinates of the elements that belong to it.
A possible solution is, after the loop, to exchange the information among
the processors, but the simplest solution is to simply bypass the element selection with
compute_this_elem() with a call like

for (int k=el_start; k<=el_last; k++) {
  if (!(build_nneighbor_tree ||
        comp_shear_vel ||
        compute_this_elem(k,this,myrank,iter_mode))) continue;
  ...

That means that for jobinfo="build_nneighbor_tree" and "comp_shear_vel" the normal
element selection is bypassed.
7.2 Options
General options:
• int A_van_Driest (default=0):
If A_van_Driest=0 then the van Driest damping factor is not used (found in
file: ns.cpp)
• int activate_debug (default=0):
Activate debugging (found in file: ns.cpp)
• int activate_debug_memory_usage (default=0):
Activate report of memory usage (found in file: ns.cpp)
• int activate_debug_print (default=0):
Activate printing in debugging (found in file: ns.cpp)
• double alpha (default=1.):
Trapezoidal method parameter. alpha=1 : Backward Euler. alpha=0 :
Forward Euler. alpha=0.5 : Crank-Nicolson. (found in file: ns.cpp)
• double displ_factor (default=0.1):
Scales displacement for ALE-like mesh relocation. (found in file: ns.cpp)
• double Dt (default=0.):
The time step. (found in file: ns.cpp)
• int fractional_step (default=0):
Use fractional step or TET algorithm (found in file: ns.cpp)
• string fractional_step_solver_combo (default=iisd):
Solver combination for the fractional step method. May be iisd, lu,
global_gmres. (found in file: ns.cpp)
• int fractional_step_use_petsc_symm (default=1):
Fractional step uses symmetric matrices (only CG iterative KSP). (found in
file: ns.cpp)
• string gather_file (default=gather.out):
Print values in this file (found in file: ns.cpp)
• int LES (default=0):
Use the LES/Smagorinsky turbulence model. (found in file: ns.cpp)
• int measure_performance (default=0):
Measure performance of the ’comp mat res’ jobinfo. (found in file: ns.cpp)
• int ndim (default=3):
Dimension of the problem. (found in file: ns.cpp)
• vector<double> newton_relaxation_factor (default= (none)):
Relaxation parameter for Newton iteration. Several values may be entered in
the form
newton_relaxation_factor w1 n1 w2 n2 .... wn
which means: take relaxation factor w1 for the first n1 steps, w2 for the
following n2 steps, and so on until w_{n-1}; wn is taken for all subsequent
steps. Normally one takes a conservative (say 0.5) relaxation factor for the
first steps and then uses full Newton (i.e. w=1) for the rest. For instance, the
line
newton_relaxation_factor 0.5 3 1.
means: take w = 0.5 for the first 3 steps, and then use w = 1. (found in file:
ns.cpp)
• int nfile (default=1):
Sets the number of files in the “rotary save” mechanism. (see 7.2) (found in
file: ns.cpp)
• int ngather (default=0):
Number of “gathered” quantities. (found in file: ns.cpp)
• int nnwt (default=1):
Number of inner iterations for the global non-linear Newton problem. (found
in file: ns.cpp)
• int nrec (default=1000000):
Sets the number of states saved in a given file in the “rotary save” mechanism
(see 7.2). (found in file: ns.cpp)
• int nsave (default=10):
Sets the save frequency in iterations (found in file: ns.cpp)
• int nsaverot (default=100):
Sets the save frequency for the “rotary save” mechanism. Sometimes it is
interesting to save the state vector with a certain frequency in an “append”
manner, i.e. appending the state vector at the end of the file. However, this
poses the danger of storing too much data if the user performs a very long
run. The “rotary save” mechanism writes only a certain number of the most
recent states. The mechanism basically saves the state vector every nsaverot
steps, appending to a file. The name of the file is constructed from a
pattern set by the user via the save_file_pattern entry, by replacing %d by
the file number “à la” printf(). For instance, if save_file_pattern is set to
file%d.out then the state vectors are appended to file0.out. When the
number of written states reaches the nrec count, the file is rewound and the
saving continues from the start of the file. However, if nfile is greater than
one, then the states continue to be stored in file file1.out and so
on. When the number of files nfile is reached, the saving continues in file ’0’.
More precisely, the saving mechanism is described by the following
pseudo-code:
Read state vector from ‘initial_state’ file into x^0, n=0;
for (i=0; i<nstep; i++) {
  advance x^n to x^{n+1};
  if (n % nsaverot == 0) {
    j <- n/nsaverot;
    k <- j % nrec;
    l <- (j / nrec) % nfile;
    if (k==0) { rewind file l; }
    append state vector to file l;
  }
}
(found in file: ns.cpp)
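The pseudo-code above can be condensed into a small helper that decides, for a given time step, which file receives the state and whether that file must be rewound first (the names and the struct are ours; the wrap over nfile files follows the textual description):

```cpp
#include <cassert>

struct SaveAction { int file; bool rewind; }; // file == -1: nothing saved

// Rotary-save bookkeeping for time step i.
SaveAction rotary_save(int i, int nsaverot, int nrec, int nfile) {
  SaveAction a = { -1, false };
  if (i % nsaverot != 0) return a; // not a saving step
  int j = i / nsaverot;            // index of this save among all saves
  int k = j % nrec;                // record position inside the current file
  int l = (j / nrec) % nfile;      // file number, wrapping after nfile files
  a.file = l;
  a.rewind = (k == 0);             // first record in a file: rewind it
  return a;
}
```

For instance, with nsaverot=10, nrec=3 and nfile=2, step 30 starts (rewinds) file 1 and step 60 wraps back to file 0.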
• int nsome (default=10000):
Sets the save frequency in iterations for the “print some” mechanism. The
“print some” mechanism allows the user to store the variables of some set of
nodes with some frequency. The nodes are entered in a separate file whose
name is given by a print_some_file entry in the general options, one node
per line. The entry nsome indicates the frequency (in steps) at which the data
is saved and save_file_some the name of the file to save in. (found in file:
ns.cpp)
• int nstep (default=10000):
The number of time steps. (found in file: ns.cpp)
• int print_linear_system_and_stop (default=0):
After computing the linear system, solves it, prints the Jacobian, right hand
side and solution vector, and stops. (found in file: ns.cpp)
• int print_residual (default=0):
Print the residual each nsave steps. (found in file: ns.cpp)
• string print_some_file (default=<none>):
Name of file where to read the nodes for the “print some” feature. (found in
file: ns.cpp)
• int report_option_access (default=1):
Print, after execution, a report of the number of times each option was
accessed. Useful for detecting whether an option was used or not. (found in
file: ns.cpp)
• int reuse_mat (default=0):
Use fractional step or TET algorithm (found in file: ns.cpp)
• string save_file (default=outvector.out):
The name of the file to save the state vector. (found in file: ns.cpp)
• string save_file_pattern (default=outvector%d.out):
The pattern to generate the file name to save in for the rotary save
mechanism. (found in file: ns.cpp)
• string save_file_some (default=outvsome.out):
Name of file where to save node values for the “print some” feature. (found in
file: ns.cpp)
• int save_file_some_append (default=1):
Access mode to the “some” file. If 0 rewind file. If 1 append to previous
results. (found in file: ns.cpp)
• int solve_system (default=1):
Solve system before print_linear_system_and_stop (found in file: ns.cpp)
• string solver (default=petsc):
Type of solver. May be iisd or petsc . (found in file: ns.cpp)
• string solver_mom (default=petsc):
Type of solver for the projection and momentum steps (fractional-step). May
be iisd or petsc . (found in file: ns.cpp)
• double start_comp_time (default=0.):
Time to start computations (found in file: ns.cpp)
• string stdout_file (default=none):
If set, redirect output to this file. (found in file: ns.cpp)
• int steady (default=0):
Flag if steady solution or not (uses Dt=inf). If steady is set to 1, then the
computations are as if ∆t = ∞. The value of Dt is used for printing etc... If
Dt is not set and steady is set then Dt is set to one. (found in file: ns.cpp)
• int stop_mom (default=0):
After computing the linear system for the predictor/momentum step, prints the
right hand side and solution vector, and stops. (found in file: ns.cpp)
• int stop_on_step (default=1):
After computing the linear system for the predictor/momentum step, prints the
right hand side and solution vector, and stops. (found in file: ns.cpp)
• int stop_poi (default=0):
After computing the linear system for the Poisson step, prints the right hand
side and solution vector, and stops. (found in file: ns.cpp)
• int stop_prj (default=0):
After computing the linear system for the projection step, prints the right
hand side and solution vector, and stops. (found in file: ns.cpp)
• double tol_newton (default=1e-8):
Tolerance to solve the non-linear system (global Newton). (found in file:
ns.cpp)
• int update_jacobian_iters (default=1):
Update jacobian only until n-th Newton subiteration. Don’t update if null.
(found in file: ns.cpp)
• int update_jacobian_start_iters (default=INF):
Update jacobian each n-th Newton iteration (found in file: ns.cpp)
• int update_jacobian_start_steps (default=INF):
Update jacobian each n-th time step. (found in file: ns.cpp)
• int update_jacobian_steps (default=0):
Update jacobian each n-th time step. (found in file: ns.cpp)
• int update_wall_data (default=0):
If 0: compute wall_data info only once. Otherwise refresh each
update_wall_data steps. (found in file: ns.cpp)
• int use_iisd (default=0):
Use IISD (Interface Iterative Subdomain Direct) or not. (found in file:
ns.cpp)
• int verify_jacobian_with_numerical_one (default=0):
Computes jacobian of residuals and prints to a file. May serve to debug
computation of the analytic jacobians. (found in file: ns.cpp)
Elemset “nsitetlesfm2”:
• int ALE_flag (default=0):
Flag to turn on ALE computation (found in file: nsitetlesfm2.cpp)
• double A_van_Driest (default=0):
van Driest constant for the damping law. (found in file: nsitetlesfm2.cpp)
• double C_smag (default=0.18):
Smagorinsky constant. (found in file: nsitetlesfm2.cpp)
• double[ndim] G_body (default= null vector):
Vector of gravity acceleration (must be constant). (found in file:
nsitetlesfm2.cpp)
• int LES (default=0):
Add LES for this particular elemset. (found in file: nsitetlesfm2.cpp)
• double additional_tau_pspg (default=0.):
Add to the tau_pspg term, so that you can stabilize with a term
independently of h. (Mainly for debugging purposes). (found in file:
nsitetlesfm2.cpp)
• string axisymmetric (default=none):
Add axisymmetric version for this particular elemset. (found in file:
nsitetlesfm2.cpp)
• int cache_grad_div_u (default=0):
Cache grad_div_u matrix (found in file: nsitetlesfm2.cpp)
• string geometry (default=cartesian2d):
Type of element geometry to define Gauss Point data (found in file:
nsitetlesfm2.cpp)
• int indx_ALE_xold (default=1):
Pointer to old coordinates in nodedata array excluding the first ndim values
(found in file: nsitetlesfm2.cpp)
• double jacobian_factor (default=1.):
Scale the jacobian term. (found in file: nsitetlesfm2.cpp)
• int npg (default=none):
Number of Gauss points. (found in file: nsitetlesfm2.cpp)
• double pressure_control_coef (default=0.):
Add pressure controlling term. (found in file: nsitetlesfm2.cpp)
• int print_van_Driest (default=0):
print Van Driest factor (found in file: nsitetlesfm2.cpp)
• double residual_factor (default=1.):
Scale the residual term. (found in file: nsitetlesfm2.cpp)
• double rho (default=1.):
Density (found in file: nsitetlesfm2.cpp)
• double shock_capturing_factor (default=0):
Add shock-capturing term. (found in file: nsitetlesfm2.cpp)
• double tau_fac (default=1.):
Scale the SUPG and PSPG stabilization term. (found in file:
nsitetlesfm2.cpp)
• double tau_pspg_fac (default=1.):
Scales the PSPG stabilization term. (found in file: nsitetlesfm2.cpp)
• double temporal_stability_factor (default=0.):
Adjust the stability parameters, taking into account the time step. If the
steady option is in effect, (which is equivalent to ∆t = ∞) then
temporal_stability_factor is set to 0. (found in file: nsitetlesfm2.cpp)
• int weak_form (default=1):
Use a weak form for the gradient of pressure term. (found in file:
nsitetlesfm2.cpp)
Elemset “bcconv ns fm2”:
• string geometry (default=cartesian2d):
Type of element geometry to define Gauss Point data (found in file:
bccnsfm2.cpp)
• int weak_form (default=1):
Use a weak form for the gradient of pressure term. (found in file:
bccnsfm2.cpp)
Elemset “wall”:
• string geometry (default=cartesian2d):
Type of element geometry to define Gauss Point data (found in file:
wall.cpp)
• double rho (default=1.):
Density (found in file: wall.cpp)
• double y_wall_plus (default=25.):
The y + coordinate of the computational boundary (found in file: wall.cpp)
Elemset “ns sup g”:
This element imposes a linearized free surface boundary condition.
• int LES (default=0):
Add LES for this particular elemset. (found in file: nssupg.cpp)
• double free_surface_damp (default=0.):
Cdamp = free_surface_damp smooths the free surface by adding a Laplacian
filter. The equation of the free surface is

dη/dt = w,   (92)

where η is the free surface elevation and w is the velocity component normal
to the free surface. We modify this as follows:

C_eq dη/dt + C_lf η̄ − C_damp ∆η = w,   (93)

where η̄ is the average value of η on the free surface, C_lf =
free_surface_set_level_factor tries to keep the free surface level constant by
adding a term proportional to η̄, and C_eq = fs_eq_factor scales the equation
(see the documentation for those options). Note that if only the temporal
derivative and the Laplace term are present in (93), then the equation is a
heat equation. A null value of C_damp (which is the default) means no
filtering; a high value means strong filtering. (Warning: a high value may
result in instability.) C_damp has dimensions of L²/T (like a diffusivity).
One possibility is to scale it with mesh parameters like h²/∆t; another is to
scale it with h^1.5 g^0.5. Currently we use C_damp = C⁰_damp h^1.5 g^0.5 with
C⁰_damp ≈ 2. (found in file: nssupg.cpp)
• double free_surface_set_level_factor (default=0.):
This adds a Clf η¯ term in the free surface equation in order to have the total
meniscus volume constant. (found in file: nssupg.cpp)
• double fs_eq_factor (default=1.):
C_eq = fs_eq_factor (see the documentation for the free_surface_damp option)
is a factor that scales the free surface “rigidity”. C_eq = 1 (which is the
default) means no scaling; a zero value means infinitely rigid (as for an
infinite gravity). (found in file: nssupg.cpp)
• string geometry (default=cartesian2d):
Type of element geometry to define Gauss Point data (found in file:
nssupg.cpp)
• int npg (default=none):
Number of Gauss points. (found in file: nssupg.cpp)
7.3 Mesh movement
Sometimes one has a mesh (connectivity and node coordinates) and wants a mesh for
a slightly modified domain. If u is the displacement of the boundaries, then we can
pose the problem of finding a mesh topologically equal to the original one, but with a
slight displacement of the nodes so that the boundaries are in the new position. This is
termed “mesh relocation”. One way to do this is to solve an elasticity problem where the
displacements at the boundaries are the prescribed displacements of the boundary. This
problem has been included for convenience in the Navier-Stokes module, even if at this
stage it has little to do with the Navier-Stokes equations.
Once the displacement is computed by a standard finite element computation, we can
compute the new mesh node coordinates by simply adding the computed displacement
to the original node coordinates. The elastic constants can be chosen arbitrarily. If an
isotropic material is considered, then the only relevant non-dimensional parameter is
the Poisson ratio, which controls the incompressibility of the fictitious material. However, if
the distortion is too large, this simple linear strategy can break down, when some element
collapses and has a negative Jacobian. A simple idea to fix this is to somewhat stiffen
the elastic constants of the fictitious material in order to minimize the distortion of the
elements. Designing this nonlinear material behaviour should guarantee a unique solution
and a relatively easy way to compute the Jacobian. A possibility is to use a hyperelastic
material, i.e. to define an “energy functional” F(ε_ij) as a function of the strain tensor ε_ij. One
should guarantee that this functional is convex, and one should have an easy way to
compute its derivatives and second derivatives.
A related approach, implemented in PETSc-FEM, is to compute (for simplices) the
transformation from a regular master element to the actual element, and to define the
energy functional to be minimized as a function of the associated metric tensor. If
J_ij = (∂x_i/∂ξ_j), where the x_i are the spatial coordinates and the ξ_j the master
coordinates, then the metric tensor is

G_ij = (∂x_k/∂ξ_i) (∂x_k/∂ξ_j).   (94)

The Jacobian of the transformation may be computed by differentiating the interpolated
coordinates

x_i(ξ_j) = Σ_μ x_μi N_μ(ξ_j)   (95)

with respect to the master coordinates, so that

J_ij = ∂x_i/∂ξ_j = Σ_μ x_μi (∂N_μ/∂ξ_j).   (96)
We want the distortion function to be minimized to be a function of the eigenvalues of
the metric tensor G_ij. The eigenvectors v_q of G are such that

G_ij v_qj = λ_q v_qi,   (97)

so that

λ_q = v_qi G_ij v_qj   (98)

(no sum over q), since the v_q are orthonormal. Now, if the functional F is a function of
the element node coordinates through the eigenvalues λ_q, then we can compute the new
coordinates x′ from

∂F/∂x_μj = 0.   (99)
A possible distortion functional is

F({λ_q}) = (Π_q λ_q)^(−2/n_dim) Σ_qr (λ_q − λ_r)².   (100)

This functional has several nice features. It is minimal whenever all the eigenvalues are
equal (the distortion is minimal). It is non-dimensional, so that an isotropic dilatation
or contraction doesn’t produce a change in the functional. The non-linear problem (99)
can be solved by a Newton strategy, computing the first and second derivatives of
F with respect to the node displacements (the residual and Jacobian of the system of
equations) by finite difference approximations. However, this turns out to be too costly in
CPU time for the second derivatives, since each second derivative requires
three evaluations of the functional, and there are n_en (n_en + 1)/2 second derivatives
to compute (where n_en = n_dof n_el is the number of unknowns per element, n_el is
the number of nodes per element, and n_dof is the number of unknowns per node; here
n_dof = n_dim). Each evaluation of the functional amounts to the computation of the
metric tensor G and the solution of the associated eigenvalue problem, so that an
analytical expression for, at least, the first derivatives is desired. The derivatives of the
distortion functional can be computed as

∂F/∂x_μj = Σ_q (∂F/∂λ_q) (∂λ_q/∂x_μj),   (101)
and then the derivatives with respect to the eigenvalues can still be computed numerically,
while, as we will show, the derivatives of the eigenvalues with respect to the node
coordinates (which are the most expensive part) can be computed analytically. In this way,
we can compute the derivatives of the functional with the solution of only one eigenvalue
problem. The second derivatives can be computed similarly as

∂²F/∂x_μi ∂x_νj = (∂²F/∂λ_q ∂λ_r) (∂λ_q/∂x_μi) (∂λ_r/∂x_νj)
                + (∂F/∂λ_q) (∂²λ_q/∂x_μi ∂x_νj).   (102)
The first and second derivatives of F with respect to the eigenvalues are still computed
numerically, whilst the second derivatives of the eigenvalues can be computed by numerically
differentiating the first derivatives. This amounts to O(n_en) eigenvalue problems for
computing the first and second derivatives of the distortion functional (the residual and
Jacobian). This cost may be further reduced by noting that the eigenvalues are invariant
under rotations and translations, and simply scaled by a dilatation or contraction, so that,
of the n_en = n_dim (n_dim + 1) displacements, only n_dim (n_dim + 1)/2 − 1 really need
to be computed; but this is not implemented yet. For tetrahedra in 3D this implies a
reduction from 12 distortion functional computations to only 5.
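As a concrete illustration, the functional (100) can be sketched for a single 2D triangle. The function below (the name, and the choice of the unit right triangle as master element, are ours; PETSc-FEM uses a regular master element) builds J, forms the metric tensor as in (94), solves the 2×2 eigenvalue problem in closed form and evaluates F:

```cpp
#include <cmath>
#include <cassert>

// Distortion functional (100) for a 2D triangle, with the unit right
// triangle (0,0),(1,0),(0,1) taken as master element.  x and y hold the
// coordinates of the three nodes.
double distortion(const double x[3], const double y[3]) {
  // Jacobian J_ij = dx_i/dxi_j of the affine master->element map
  double J[2][2] = { { x[1]-x[0], x[2]-x[0] },
                     { y[1]-y[0], y[2]-y[0] } };
  // Metric tensor G = J^T J  (eq. 94)
  double g11 = J[0][0]*J[0][0] + J[1][0]*J[1][0];
  double g22 = J[0][1]*J[0][1] + J[1][1]*J[1][1];
  double g12 = J[0][0]*J[0][1] + J[1][0]*J[1][1];
  // Eigenvalues of the symmetric 2x2 tensor, in closed form
  double tr = g11 + g22, det = g11*g22 - g12*g12;
  double disc = std::sqrt(tr*tr/4.0 - det);
  double l1 = tr/2.0 + disc, l2 = tr/2.0 - disc;
  // F = (l1*l2)^(-2/ndim) * sum_{qr} (l_q - l_r)^2, with ndim = 2
  return 2.0*(l1 - l2)*(l1 - l2)/(l1*l2);
}
```

Note that the functional vanishes for the undistorted (master) element and is unchanged by a uniform scaling of the element, as stated above.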
We will now show how the derivatives of the eigenvalues are computed analytically. It
can be shown that

∂λ_q/∂x_μk = v_qi (∂G_ij/∂x_μk) v_qj.   (103)

Note that this is like differentiating (98), but keeping on the right hand side only the
change in the matrix G and discarding the rate of change of the eigenvectors. It can be
shown that the contribution of the other two terms is an antisymmetric matrix, so that
their contribution to the rate of change of the eigenvalue is null. Now, from (94),

∂G_ij/∂x_μl = J_li (∂N_μ/∂ξ_j) + J_lj (∂N_μ/∂ξ_i),   (104)

so that

∂λ_q/∂x_μl = 2 (v_qi J_li) (∂N_μ/∂ξ_j) v_qj,   (105)

which is the expression used.
8 Tests and examples
In this section we describe some examples that come with PETSc-FEM. They are
intentionally small and sometimes almost trivial, but they are very light-weight and can be run
in a short time (at most some minutes). They are located in the test directory and can be
run with the make tests command. The purpose of these tests is to check the correct
behaviour of PETSc-FEM, but they can also help users understand the program.
8.1 Flow in the annular region between two cylinders
File: sector.dat. This example tests periodic boundary conditions. The mesh is a
sector of a circular strip (2.72 < r < 4.48, 0 < θ < π/4). The governing
equation is ∆u = 0, where u is a velocity vector of two components. On the internal
radius we impose u = 0 and on the external radius u = t, where t is the tangent vector.
On the radii θ = 0 and θ = π/4 we impose periodic boundary conditions, so that the
problem is close to the flow in the region between two cylinders, with the internal
cylinder fixed and the external one rotating with velocity 1. The operator is not the
Stokes operator; however, in this case the flow for this operator is divergence free and
the gradient of pressure has no component in the angular direction, so that the solution
does coincide with the Stokes solution.
In the output of the test (sector.sal), we check that the solution is aligned with the
angular direction: ux = −uy at the outlet section (θ = π/4).
8.2 Flow in a square with periodic boundary conditions
File: lap_per.dat. This is similar to the example “sector.dat” but in the square region
−1 < x, y < +1. u is imposed on all sides, with |u| = 1 and directed tangentially
in the counter-clockwise sense, i.e. u = (0, ±1) at x = ±1 and u = (∓1, 0) at y = ±1. By
symmetry we can solve it in one quarter of the domain, i.e. in the square x, y > 0 (we could
also solve it in one eighth of the region, the triangle y > 0, y < x, x < 1).
8.3 The oscillating plate problem
File: oscplate.dat. This is the flow produced between two infinite plates when one is at
rest and the other oscillates with amplitude A and frequency ω (see figure 7). It serves
to test a problem with time-dependent boundary conditions. The problem is
one-dimensional and the resulting field is u = 0, p = cnst and v = v(x). We set the length
between the plates to L = 1 and model the problem with a strip
0 ≤ x ≤ 1, 0 ≤ y ≤ 0.1, setting periodic boundary conditions between y = 0.1 and y = 0.
At the plate at rest (x = L) we set u = 0, p = 0, and at the moving plate (x = 0)
we set u = ωA sin(ωt). The analytic solution can be found easily. The equation for v is

∂v/∂t = ν ∂²v/∂x²   (106)

with boundary conditions v(0) = Aω sin(ωt) and v(L) = 0. The solution can be found by
searching for solutions of the form

v = e^{iωt + λx}.   (107)

Replacing in (106) we arrive at the characteristic equation

iω = νλ²,   (108)

whose solutions are

λ = ± (1+i)/√2 · √(ω/ν) = ±λ₊.   (109)

Setting the boundary conditions, the solution is

v(x, t) = Im{ Aω e^{iωt} (e^{λ₊(x−L)} − e^{−λ₊(x−L)}) / (e^{−λ₊L} − e^{λ₊L}) }.   (110)
In the example we set ν = 100, ω = 2000π ≈ 6283. We choose to have 16 time steps in
one period, so that Dt = 6.25×10⁻⁵. The resulting velocity profile is compared with the
analytical one in figure 8.
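The analytic solution (110) can be evaluated directly with complex arithmetic; the following sketch (the function name is ours) reproduces the boundary conditions v(0, t) = Aω sin(ωt) and v(L, t) = 0:

```cpp
#include <complex>
#include <cmath>
#include <cassert>

// Analytic solution (110): velocity v(x,t) between the moving plate at
// x = 0 and the plate at rest at x = L.
double plate_velocity(double x, double t, double A, double omega,
                      double nu, double L) {
  const std::complex<double> I(0.0, 1.0);
  // lambda_+ = (1+i)/sqrt(2) * sqrt(omega/nu)  (eq. 109)
  std::complex<double> lp =
    (1.0 + I)/std::sqrt(2.0)*std::sqrt(omega/nu);
  std::complex<double> num =
    std::exp(lp*(x - L)) - std::exp(-lp*(x - L));
  std::complex<double> den = std::exp(-lp*L) - std::exp(lp*L);
  return std::imag(A*omega*std::exp(I*omega*t)*num/den);
}
```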
Figure 7: Oscillating plates.
Figure 8: Velocity profile for the oscillating plate problem (numeric vs. analytic).
8.4 Linear advection-diffusion in a rectangle
File: sine.epl. This is an example for testing the advdif module. The governing
equations are

∂φ/∂t + u ∂φ/∂x = D ∆φ,   in 0 ≤ x ≤ Lx, |y| < ∞, t > 0,
φ = A cos(ωt) cos(ky),   at x = 0, t > 0,
∂φ/∂n = 0,   at x = Lx, t > 0,   (111)
φ = 0,   at 0 ≤ x ≤ Lx, |y| < ∞, t = 0.
As the problem is periodic in the y direction we can restrict the analysis to one quarter
wavelength, i.e. if λy = 2π/k and Ly = λy/4, then the above problem is equivalent to

∂φ/∂t + u ∂φ/∂x = D ∆φ,   in 0 ≤ x ≤ Lx, 0 < y < Ly, t > 0,
φ = A cos(ωt) cos(ky),   at x = 0, t > 0,
∂φ/∂n = 0,   at x = Lx, t > 0,   (112)
φ = 0,   in 0 ≤ x ≤ Lx, 0 < y < Ly, t = 0,
φ = 0,   at y = Ly,
∂φ/∂n = 0,   at y = 0.
We can find the solution by proposing a solution of the form

φ ∝ e^{βx + iωt}.

Replacing in (112) we arrive at a characteristic equation for β of the form

iω + βu = D(β² − k²).   (113)

This is a quadratic equation in β and has two roots, say β_{1,2}. The general solution (for
the cos(ky) transverse mode) is

φ(x, y, t) = Re{ (c₁ e^{β₁x} + c₂ e^{β₂x}) e^{iωt} } cos(ky).
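The two roots of (113) can be obtained with the quadratic formula in complex arithmetic; the sketch below (names are ours) returns β₁ and β₂ for given u, D, ω and k:

```cpp
#include <complex>
#include <cmath>
#include <utility>
#include <cassert>

typedef std::complex<double> cplx;

// Roots beta_{1,2} of the characteristic equation (113),
//   i*omega + beta*u = D*(beta^2 - k^2),
// i.e. D*beta^2 - u*beta - (i*omega + D*k^2) = 0.
std::pair<cplx, cplx> char_roots(double u, double D, double omega, double k) {
  const cplx I(0.0, 1.0);
  cplx c = -(I*omega + D*k*k);                      // constant coefficient
  cplx disc = std::sqrt(cplx(u*u, 0.0) - 4.0*D*c);  // sqrt of discriminant
  return std::make_pair((u + disc)/(2.0*D), (u - disc)/(2.0*D));
}
```

Substituting either root back into (113) gives a residual at round-off level, which is a convenient sanity check.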
9 The FastMat2 matrix class
9.1 Introduction
Finite element codes usually have two levels of programming. In the outer level a large
vector describes the “state” of the physical system. Usually this vector has as many
entries as the number of nodes times the number of fields minus the number of fixations
(i.e. Dirichlet boundary conditions). This vector can be computed at once by assembling
the right hand side and the stiffness matrix in a linear problem, iterated in a non-linear
problem or updated at each time step through solution of a linear or non-linear system.
The point is that, at this outer level, you perform global assemble operations that build
this vector and matrices. At the inner level, you perform a loop over all the elements
in the mesh, compute the vector and matrix contributions of each element and assemble
them in the global vector/matrix. From one application to another, the strategy at the
outer level (linear/non-linear, steady/time-dependent, etc.) and the physics of the
problem that defines the FEM matrices and vectors may vary.
The FastMat2 matrix class has been designed to perform matrix computations
at the element level. It is assumed that there is an outer loop (usually the loop over
elements) that is executed many times, and that at each execution of the loop a series of
operations is performed with a rather reduced set of local vectors and matrices. There
are many matrix libraries, but often their performance degrades for small dimensions.
Sometimes performance is good for operations on whole matrices, but it degrades
for operations on a subset of the elements of the matrices, like columns, rows, or individual
elements. This is due to the fact that accessing a given element in the matrix implies a
certain number of arithmetic operations. Alternatively, we can copy the row or column into an
intermediate object, but then there is an overhead due to the copy operations.
The particularity of FastMat2 is that during the first execution of the loop the addresses of
the elements used in the operation are cached in an internal object, so that in the second
and subsequent executions of the loop the addresses are retrieved from the cache.
9.2 Example
Consider the following simple example. We are given a 2D finite element mesh composed of
triangles, i.e. an array xnod of 2 × Nnod doubles with the node coordinates and an array
icone with 3 × nelem elements with node connectivities. For each element 0 ≤ j < nelem
its nodes are stored at icone[3*j+k] for 0 ≤ k ≤ 2. We are required to compute the
maximum and minimum value of the area of the triangles. This is a computation
similar to those found in FEM analysis. For each element we have to load the node
coordinates in local vectors x1, x2 and x3, and compute the vectors along the sides of the
element, a = x2 − x1 and b = x3 − x1. The area of the element is then half the determinant
of the 2 × 2 matrix J formed by putting a and b as rows.
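Written in plain C++, without FastMat2, the per-element computation just described is simply:

```cpp
#include <cassert>

// Side vectors a = x2 - x1, b = x3 - x1; the element area is half the
// determinant of the 2x2 matrix with a and b as rows.
double triangle_area(const double x1[2], const double x2[2],
                     const double x3[2]) {
  double a0 = x2[0]-x1[0], a1 = x2[1]-x1[1]; // a = x2 - x1
  double b0 = x3[0]-x1[0], b1 = x3[1]-x1[1]; // b = x3 - x1
  return (a0*b1 - a1*b0)/2.0;                // det/2
}
```

The FastMat2 version that follows performs the same computation, but caches the element accesses across loop iterations.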
The FastMat2 code for the computation is like this:

Chrono chrono;
FastMat2 x(2,3,2),a(1,2),b(1,2),J(2,2,2);
FastMatCacheList cache_list;
FastMat2::activate_cache(&cache_list);
// Compute area of elements
chrono.start();
for (int ie=0; ie<nelem; ie++) {
  FastMat2::reset_cache();
  for (int k=1; k<=3; k++) {
    int node = ICONE(ie,k-1);
    x.ir(1,k).set(&XNOD(node-1,0)).rs();
  }
  x.rs();
  a.set(x.ir(1,2));
  a.rest(x.ir(1,1));
  b.set(x.ir(1,3));
  b.rest(x.ir(1,1));
  J.ir(1,1).set(a);
  J.ir(1,2).set(b);
  double area = J.rs().det()/2.;
  total_area += area;
  if (ie==0) {
    minarea = area;
    maxarea = area;
  }
  if (area>maxarea) maxarea=area;
  if (area<minarea) minarea=area;
}
printf("total_area %g, min area %g, max area %g, ratio: %g\n",
       total_area,minarea,maxarea,maxarea/minarea);
printf("Total area OK? : %s\n",
       (fabs(total_area-1)<1e-8 ? "YES" : "NOT"));
double cpu = chrono.elapsed();
FastMat2::print_count_statistics();
printf("CPU: %g, number of elements: %d\n"
       "rate: %g [sec/Me], %g Mflops\n",
       cpu,nelem,cpu*1e6/nelem,
       nelem*FastMat2::operation_count()/cpu/1e6);
FastMat2::void_cache();
FastMat2::deactivate_cache();
Calls to the static members (those starting with FastMat2::) are related to the
cache manipulation and will be discussed later. Matrices are dimensioned in line 2; the
first argument is the number of indices, and then the dimensions follow. For instance,
FastMat2 x(2,3,2) defines a matrix with 2 indices, ranging from 1 to 3 and from 1 to 2,
respectively. The rows of this matrix will store the coordinates of the nodes local to the
element. FastMat2 matrices may have any number of indices. Actually, the library is
compiled for a maximum number of indices (10 by default; this limit may be modified
by redefining the variable $max_arg in the script readlist.eperl and recompiling).
Matrices can also have zero dimensions, which stands for scalars.
9.2.1 Current Matrix view
In lines 9 to 12 the coordinates of the nodes are loaded in the matrix x. The underlying
philosophy in FastMat2 is that you can set “views” of a matrix without actually making
any copies of the underlying values. For instance, the operation x.ir(1,k) (for “index
restriction”) sets a view of x in which index 1 is restricted to take the value k, reducing
by one the number of dimensions of the matrix. As x has two indices, the operation
x.ir(1,k) gives a matrix of dimension one consisting of the k-th row of x. A call without
arguments like x.ir() cancels the restriction. Also, the function rs() (for “reset”)
cancels the current view.
9.2.2 Set operations
The operation a.set(x.ir(1,2)) copies the contents of the argument x.ir(1,2) in a.
Also we can use x.set(y) with y a Newmat matrix (Matrix y) or an array of doubles
(double *y).
9.2.3 Dimension matching
The x.set(y) operation requires that x and y have the same “viewed” dimensions. As
the .ir(1,2) operation restricts index 1 to the value 2, x.ir(1,2) is seen as a row vector
of size 2 and can then be copied to a. If the “viewed” dimensions don’t match, an error
is issued.
9.2.4 Automatic dimensioning
In the example, a has been dimensioned at line 2, but most operations perform the
dimensioning if the matrix has not already been dimensioned. For instance, if at line 2
we had declared FastMat2 a only, without specifying dimensions, then at line 14 the
matrix would be created and dimensioned, taking the dimensions from the argument. The same
applies to set(Matrix &) but not to set(double *), since in this last case the argument
(double *) does not carry information about its dimensions. Other operations that define
dimensions are products and contraction operations.
9.2.5 Concatenation of operations
Many operations return a reference to the matrix (return value FastMat2 &) so that
operations may be concatenated as in A.ir(1,k).ir(2,j).
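This is the usual C++ idiom of returning *this from each member function; a minimal sketch with an illustrative class (not FastMat2’s API):

```cpp
#include <cassert>

// Each operation returns a reference to the object itself, so that
// calls can be chained, as in m.set(2.0).scale(3.0).
class Mat {
  double v;
public:
  Mat() : v(0.0) {}
  Mat &set(double x)   { v = x;  return *this; }
  Mat &scale(double f) { v *= f; return *this; }
  double val() const   { return v; }
};
```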
9.3 Caching the adresses used in the operations
If caching is not used, the performance of the library is poor, typically one to two orders
of magnitude slower than the cached version. The cached version is very fast, in the
sense that almost all the CPU time is spent in performing multiplications and additions,
and negligible CPU time is spent in auxiliary operations.
The idea behind caches is that they are objects (class FastMatCache) that store the
addresses needed for the current operation. In the first pass through the body of the
loop, a cache object is created for each of the operations and stored in a list of class
FastMatCacheList. This list is basically an STL vector of pointers to cache objects.
When the body of the loop is executed the second and following times, the addresses do
not need to be recomputed; they are read from the cache instead. The use of the
cache is rather automatic and requires little intervention by the user, but in some cases
the position in the cache-list can get out of sync with respect to the execution of the
operations, and severe errors may occur.
The typical use can be seen in the example shown in section §9.2. First we have to
create a FastMatCacheList object, as shown in line 3, and activate it as in line 4.
The outer loop here is the loop over elements. For the second and following executions
of the body of the loop, you must “rewind” the cache-list; this is done in line 8 with the
FastMat2::reset_cache() operation (see figure 9).
The deactivate_cache() call at line 43 causes subsequent operations after this line
not to be cached. This call is required if FastMat2 operations will be executed afterwards;
otherwise there will be an error, since those later calls will not have a corresponding
cache in the cache-list.
The void_cache() call deletes the cache, reclaiming the space used by it. It is not
required, but it is a good idea in order to save memory. It may be required if the loop will
be entered again with other dimensions. For instance, a FEM loop may be called first for
triangles and then for quadrangles, and in these two calls the dimensions of the involved
matrices are different.
The general layout of a code section which uses caching is like this:

...
// Initialization. Not cached operations.
FastMatCacheList cache_list;           // Cache-list object
FastMat2::activate_cache(&cache_list); // activates the cache
for (j=0; j<N; j++) {                  // Outer loop. N very large number
  FastMat2::reset_cache();             // Rewinds the cache list
  op_1;
  op_2;
  op_3;
  ...
  op_N;                                // Cached operations
}                                      // End of outer loop
FastMat2::void_cache();                // free memory
FastMat2::deactivate_cache();          // deactivates cache
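The cache-list mechanism can be illustrated with a toy class (names and structure are ours, much simplified with respect to FastMatCache / FastMatCacheList): the first pass through the loop computes and stores per-operation data; later passes, after reset(), just replay the stored entries, assuming the same sequence of operations:

```cpp
#include <vector>
#include <cstddef>
#include <cassert>

struct CacheEntry { std::size_t offset; }; // e.g. a resolved address/offset

class CacheList {
  std::vector<CacheEntry> entries;
  std::size_t pos;   // position in the list for the current pass
  bool first_pass;
public:
  CacheList() : pos(0), first_pass(true) {}
  // "Rewind" at the top of the outer loop (cf. FastMat2::reset_cache()).
  void reset() { if (pos > 0) first_pass = false; pos = 0; }
  // The "expensive" address computation runs only on the first pass;
  // afterwards the stored entry is reused.  The caller must issue the
  // same sequence of operations on every pass.
  std::size_t get(std::size_t row, std::size_t ncols, std::size_t col) {
    if (first_pass) {
      CacheEntry e = { row*ncols + col };
      entries.push_back(e);
    }
    return entries[pos++].offset;
  }
  bool cached() const { return !first_pass; }
};
```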
9.3.1 Branching
Caching is straightforward if the sequence of operations is executed linearly, with the
same local variables, and without branching or looping. Special operations have to be
executed if there are branching conditions (if statements) that alter the order of execution
of the operations in the loop. For instance, consider an “if” sequence like this:
....
op_prev_1; // operations previous to the branch
Figure 9: Cache of operations for a linear segment of code.
op_prev_2;
...
op_prev_k;
if (<condition 0>) {
op_0_1;
op_0_2;
// conditional block for branch 0
...
op_0_N;
} else if (<condition 1>) {
op_1_1;
op_1_2;
// conditional block for branch 1
...
op_1_N;
} else {
op_e_1;
op_e_2;
// conditional block for the ‘else’ branch
...
op_e_N;
}
....
op_pos_1;
op_pos_2;
...
op_pos_k;
// operations posterior to the branch
If branch ’0’ is followed in the first execution of the block, then the cache list will look like
that shown in figure 11. When in a subsequent execution of the loop another branch is
Figure 10: Cache list produced when branch 0 is chosen.
chosen, say branch 1, then when trying to execute operation op_1_1 the library
will find in the cache list a cache corresponding to operation op_0_1.
This is solved by creating “branch points” in the cache list and choosing the appropriate
branch as shown in the following code (see figure 10):
Figure 11: Cache list produced when branch 0 is chosen.

....
op_prev_1;                     // operations previous to the branch
op_prev_2;
...
op_prev_k;
FastMat2::branch();
if (<condition 0>) {
  FastMat2::choose(0);
  op_0_1;                      // conditional block for branch 0
  op_0_2;
  ...
  op_0_N;
} else if (<condition 1>) {
  FastMat2::choose(1);
  op_1_1;                      // conditional block for branch 1
  op_1_2;
  ...
  op_1_N;
} else {
  FastMat2::choose(<N>);
  op_e_1;                      // conditional block for the 'else' branch
  op_e_2;
  ...
  op_e_N;
}
FastMat2::leave();
....
op_pos_1;                      // operations posterior to the branch
op_pos_2;
...
op_pos_k;
The branch() call tells the library that several branches will start from that point, and a
branch point is created. Then each conditional code block must start with

choose(<j>)

where <j> is a number that must be unique among all the branches. Finally, when leaving the branches, we must call leave() in order to tell the library that the mainstream
of the cache list is retaken. Branches can be nested to any level.
The branching calls are not needed if the "execution path" is the same for all executions
of the loop. This usually happens when the condition refers to some global option that is
uniform over all elements. For instance, if branch '0' corresponds to "include turbulence
model A" and branch '1' to model B, then the same branch is executed for all the elements
and there is no need to call the static functions.
9.3.2 Loops executed a non-constant number of times
Another special case is when there are loops inside the body of the outer loop. Note that
no special branching is needed in general if the loop is executed a fixed number of times,
since the sequence of operations is not altered from one execution to another. For instance
consider the following piece of code
// Case A. Inner code executed a fixed number of times
...
for (int k=0; k<N; k++) {      // N very large - outer loop
  block_before;
  for (int ll=0; ll<3; ll++) {
    inner_block;               // operations that act on the
                               // same matrices
  }
  block_after;
}
Then the cache list generated in the first execution of the loop will be
block_before
inner_block
inner_block
inner_block
block_after
and this is OK since the number of times inner_block is executed is always the same. If
the operations performed inside the loop are the same for all executions of the outer
loop, but are executed an irregular number of times, then we can use a sequence as
follows:
// Case B. Inner code executed a variable number of times
FastMatCachePosition cp1;
...
for (int k=0; k<N; k++) {      // N very large - outer loop
  block_before;
  FastMat2::get_cache_position(cp1);
  int n=irand(1,5);
  for (int ll=0; ll<n; ll++) {
    FastMat2::jump_to(cp1);
    inner_block;               // operations that act on the
                               // same matrices
  }
  FastMat2::resync_was_cached();
  block_after;
}
Here the number of times the inner block is executed may vary randomly from 1 to 5.
(irand(m,n) returns an integer randomly distributed between m and n.) The
FastMatCachePosition class objects store the position of the current computation in the
cache list, so that the call to jump_to() at the start of the loop resets the position in
the cache to the desired one. After leaving the loop we call resync_was_cached() in
order to resynchronize the cache list.
This is OK provided the inner loop is executed at least once in the first execution of the
outer loop. If it happens that in the first execution of the outer loop the inner loop is not
entered, then the cache list will contain only block_before, block_after, and when the inner
block is entered in subsequent executions of the loop an error will arise, since there
will be missing caches.
To fix this we have to combine this approach with branching, as follows:
FastMatCachePosition cp1;
for (int k=0; k<N; k++) {      // N very large - outer loop
  ....                         // previous block
  FastMat2::branch();          // allows conditional execution
  FastMat2::choose(0);
  FastMat2::get_cache_position(cp1);
  n=irand(0,5);
  if (k==0) n=0;               // This is the critical case:
                               // n=0 in the first execution
                               // of the loop.
  for (int ll=0; ll<n; ll++) {
    FastMat2::jump_to(cp1);
    ...                        // inner block
  }
  FastMat2::resync_was_cached();
  FastMat2::leave();
  ...                          // posterior block
}
Of course, if the number of times the inner loop is executed is very large, and the most
time-consuming part is the execution of this loop, then it may be convenient to choose
this loop as the "outer" one.
9.3.3 Masks can’t traverse branches
Another restriction is that, if branching is used, the mask that is active at a certain
FastMat2 cached operation must be the same independently of the path that the code
has followed. For instance, consider the following code:
FastMat2 a,b,c;
// resize and set 'a,b,c'
for (int j=0; j<N; j++) {
  FastMat2::branch();
  if (condition) {
    FastMat2::choose(0);
    a.is(...).ir(...);         // (B) operate on masked 'a'
  }
  FastMat2::leave();
  c.prod(a,b);                 // (A) Wrong! 'a' may or may not
                               // have the mask set
}
When the code reaches the prod() method at line (A), the block inside the if may or may
not have been executed, so that the mask set at line (B) may or may not be active at line (A).
This is clearly an error, and the safest way to avoid it is to always reset the masks at the
exit of a branched block, as in line (C) below:
FastMat2 a,b,c;
// resize and set 'a,b,c'
for (int j=0; j<N; j++) {
  FastMat2::branch();
  if (condition) {
    FastMat2::choose(0);
    a.is(...).ir(...);         // (B) operate on masked 'a'
    a.rs();                    // (C)
  }
  FastMat2::leave();
  c.prod(a,b);                 // (A) OK! 'a' has no mask set.
}
9.3.4 Efficiency
As we mentioned before, when caching is enabled there is a speed gain of ten to one
hundred times, and the library is very performant. Of course, the first execution of the loop is not
cached and represents an overhead that has to be amortized by executing the loop in
cached mode many times. The average speed increases with the number of executions of
the loop. The cut point, i.e. the number of executions of the loop for which
the execution speed falls to one half of the speed obtained for a very large number of executions,
is currently between 10 and 30, so that for loops executed more than 200 times the overhead
spent in building the caches is negligible.
Another issue is the memory required by the caches. First, there is some space required
by the caches themselves and, second, there is a copy of the addresses of the elements involved.
For instance, in an a.set(b) operation with a and b of size n×m, say, we have to store 2mn
addresses. Usually this overhead in memory is negligible, since the number of
variables and operations needed in the element routines is very small compared with
the size of the problem itself. However, some care must be taken when caching large inner
loops. For instance in code A, section §9.3.2, if the inner loop is executed a constant, but
very large, number M of times, then the amount of cache required is proportional to
M. Then, even if, as discussed before, no operations like those used in code B are required,
it may be advisable to spend some time inserting these calls in order to reduce cache
memory and overhead time. Again, in the limit of very large M, it may be more convenient
to choose this loop as the "outer" one.
9.4 Synopsis of operations
9.4.1 One-to-one operations
These are operations that take one FastMat2 argument as in
FastMat2& add(const FastMat2 & A). The operation is applied element-wise, from each
element of A to the corresponding element of *this.
The one-to-one operations implemented so far are:
• FastMat2& set(const FastMat2 & A) Copy matrix
• FastMat2& add(const FastMat2 & A) Add matrix
• FastMat2& rest(const FastMat2 & A) Subtract a matrix
• FastMat2& mult(const FastMat2 & A) Multiply (element by element) (like
Matlab .*).
• FastMat2& div(const FastMat2 & A) Divide matrix (element by element, like
Matlab ./).
• FastMat2& axpy(const FastMat2 & A, const double alpha) Axpy operation
(element by element): (*this) += alpha * A
9.4.2 In-place operations
These operations perform an action on all the elements of a matrix.
• FastMat2& set(const double val=0.) Sets all the elements of a matrix to a
constant value
• FastMat2& scale(const double val) Scales by a constant value
• FastMat2& add(const double val) Adds constant val
• FastMat2& fun(scalar_fun_t *function) Applies a function to all elements
• FastMat2& fun(scalar_fun_with_args_t *function, void *user_args)
Applies a function with optional arguments to all elements
9.4.3 Generic “sum” operations (sum over indices)
These operations perform some operation over all the values of a given index,
resulting in a matrix with fewer indices. It's a generalization of the
sum/max/min operations in Matlab, which perform the specified operation per column,
resulting in a row vector (one element per column). Here you specify a number of
integer arguments, in such a way that
• if the j-th integer argument is positive, it represents the position of that index in the
resulting matrix; otherwise,
• if the j-th argument is -1, then the specified operation (sum/max/min,
etc.) is performed over that index.
For instance, if we declare FastMat2 A(4,2,2,3,3), then B.sum(A,-1,2,1,-1) means

B_{ij} = \sum_{k=1..2,\, l=1..3} A_{kjil},   for i = 1..3, j = 1..2    (114)
These operations can be extended to any binary associative operation. So far we have
implemented the following:
• FastMat2& sum(const FastMat2 & A, const int m=0, ...) Sum over all
selected indices
• FastMat2& sum_square(const FastMat2 & A, const int m=0, ...) Sum of
squares over all selected indices
• FastMat2& sum_abs(const FastMat2 & A, const int m=0, ...) Sum of
absolute values over all selected indices
• FastMat2& min(const FastMat2 & A, const int m=0, ...) Minimum over all
selected indices
• FastMat2& max(const FastMat2 & A, const int m=0, ...) Maximum over
all selected indices
• FastMat2& min_abs(const FastMat2 & A, const int m=0, ...) Min of
absolute value over all selected indices
• FastMat2& max_abs(const FastMat2 & A, const int m=0, ...) Max of
absolute value over all selected indices
9.4.4 Sum operations over all indices
When the sum is over all indices the resulting matrix has zero dimensions, so that it is a
scalar. You can get this scalar by creating an auxiliary matrix (with zero dimensions)
and casting it with operator double(), as in
FastMat2 A(2,3,3),Z;
...
// assign elements to A
double a = double(Z.sum(A,-1,-1));
or using the get() function
double a = Z.sum(A,-1,-1).get();
without arguments, which returns a double. In addition, for each of the previously
mentioned "generic sum" functions there is a companion function that sums over all indices. The
name of this function is obtained by appending _all to the generic function name:
double a = A.sum_square_all();
The list of these functions is
• double sum_all() const Sum over all indices
• double sum_square_all() const Sum of squares over all indices
• double sum_abs_all() const Sum of absolute values over all indices
• double min_all() const Minimum over all indices
• double max_all() const Maximum over all indices
• double min_abs_all() const Minimum absolute value over all indices
• double max_abs_all() const Maximum absolute value over all indices
9.4.5 Export/Import operations
These routines allow converting matrices from or to arrays of doubles and Newmat
matrices:
• FastMat2& set(const Matrix & A) Copies from a Newmat matrix
• FastMat2& set(const double *a) Copies from an array of doubles
• const FastMat2& export(double *a) const Exports to a double vector
• FastMat2& export(double *a) Exports to a double vector
• const FastMat2& export(Matrix & A) const Exports to a Newmat matrix
9.4.6 Static cache operations
These routines control the use of the cache list.
• static void activate_cache(FastMatCacheList *cache_list_=NULL)
Activates use of the cache
• static void deactivate_cache(void) Deactivates use of the cache
• static void reset_cache(void) Resets the cache
• static void void_cache(void) Voids the cache
• static void branch(void) Creates a branch point
• static void choose(const int j) Follows a branch
• static void leave(void) Leaves the current branch
• static double operation_count(void) Computes the total number of
operations in the cache list
• static void print_count_statistics() Print statistics about the number of
operations of each type in the current cache list
10 Hooks
Hooks are functions that you pass to the program and, then, are executed at particular
points, called “hook-launching points”, in the execution of the program. The particular
hook-launching points may depend on the application but, in order to fix ideas, for the
Navier-Stokes module and the Advective-Diffusive module the standard hooks are:
• init: To be executed once, at the start of the program.
• time step pre: To be executed before the time step calculation.
• time step post: To be executed after the time step calculation.
• close: To be executed once, at the end of the program.
There are "built-in" hooks included in the modules, for instance the DX hook that is in
charge of communicating with the DX visualization program, or the "shell hook" that
allows you to execute shell commands; but you can also define your own hooks in a
C++ piece of code, compiled and dynamically loaded at run time. You
can do almost anything with your hooks; for instance you can compress the result files,
perform file manipulation, or launch visualization with other software like GMV. Also, hooks
are useful for communicating between different instances of PETSc-FEM. For instance, if
you want to couple an inviscid external flow with an internal viscous flow, then you can
run a PETSc-FEM instance for each region and perform the communication between the
different regions with hooks. The DX hook is explained in the DX section (see §13); here
we will explain how to write and use dynamically loaded hooks and the shell hook.
The hook concept has been borrowed from GNU Emacs.
10.1 Launching hooks. The hook list
In order to activate a hook you first have to add, for each hook, a pair of strings to the
hook_list, namely the type of hook and the name of the hook. The latter is a unique
identifier for that hook.
hook_list <hook-type-1> <hook-name-1> <hook-type-2> <hook-name-2> ...
For instance:
hook_list shell_hook compress    \
          dx_hook    my_dx_hook  \
          dl_hook    coupling_hook
Here we added a shell hook that will probably compress some files during execution, the
DX hook in order to visualize, and a dynamically linked hook that will couple the run
with another program. Each hook will then take its own options from special entries
in the options table.
The hook_list entry must be unique, so you have to group all your hooks in a
single hook_list entry. This is not limiting, because you can add as many hooks as you
want, but it is rather syntactically cumbersome, because you end up with a long string.
Also, it becomes difficult to comment out some hooks while keeping others.
The hooks are executed in the order in which you entered them in the hook list; in the
previous case, at init time, the init part of the compress hook is executed
before the init part of my_dx_hook, and finally the init part of coupling_hook.
10.2 Dynamically loaded hooks
The easiest way to code a dynamically loadable hook is with a class. You need to include the corresponding headers hook.h and dlhook.h, and the class may define the hook
functions for all, some or none of the hook-launching points. Consider for instance the
following "Hello world!" hook, that prints a message at the corresponding points in the
program.
#include <src/hook.h>
#include <src/dlhook.h>

class hello_world_hook {
public:
  void init(Mesh &mesh_a,Dofmap &dofmap,
            TextHashTableFilter *options,const char *name) {
    printf("Hello world! I'm in the \"init\" hook\n");
  }
  void time_step_pre(double time,int step) {
    printf("Hello world! I'm in the \"time_step_pre\" hook\n");
  }
  void time_step_post(double time,int step,
                      const vector<double> &gather_values) {
    printf("Hello world! I'm in the \"time_step_post\" hook\n");
  }
  void close() {
    printf("Hello world! I'm in the \"close\" hook\n");
  }
};

DL_GENERIC_HOOK(hello_world_hook);
You can use almost any conceivable C/C++ library within your hooks. Take into
account that the program may be run in a parallel environment; so, for instance, if you
are going to compress a certain file, then you should take care to do that only on the master
process by, e.g., enclosing the code in an if (!MY_RANK) { ... } construct.
10.3 Shell hook
The shell-hook allows the user to execute a certain action at the hook-launching points
by simply writing shell commands.
hook_list shell_hook <name>
<name> <shell-command>
For instance
hook_list shell_hook hello
hello "echo Hello world"
In this case, PETSc-FEM will issue the echo command at each of the launching points.
If you want to issue more complex commands, then perhaps it’s a better idea to bundle
them in a script and then execute the script from the hook:
hook_list shell_hook hello
hello "my_script_hook"
where you have previously written a my_script_hook script with something like
#!/bin/bash
## This is ‘my_script_hook’ file
echo Hello world
inside.
Probably you want to perform some actions depending on the stage you are in, so
you can pass the stage name to the command by including a %s output conversion token
in the command. For instance
hello "echo Hello world, stage %s"
Moreover, you can also have the time step currently executing and the current simulation
time by including %d and %f output conversions, for instance
hello "echo Hello world, stage %s, step %d, time %f"
The order is important! That is, the first argument is the stage (a C string), the second
the time step (an integer) and the last the simulation time (a double).
In fact, what PETSc-FEM basically does is to build a string with sprintf() and then
execute it with system(), like
sprintf(command,your_command,stage,step,time);
system(command);
(see the Glibc manual for more info about sprintf and system). If, for some
reason, you need to switch the order, then use a parameter number like
hello "echo Hello world, time %3$f, step %2$d, stage %1$s"
If you want to do different things depending on the stage, then you can write
something like this:
hook_list shell_hook hello
hello "my_script_hook"
and

#!/bin/bash
## File 'my_script_hook'
if [ "$1" == "init" ]
then
    echo "in init"
    ## Do more things in 'init' stage
    ## ....
elif [ "$1" == "pre" ]
then
    echo "in pre"
    ## Do more things in 'pre' stage
    ## ....
elif [ "$1" == "post" ]
then
    echo "in post"
    ## Do more things in 'post' stage
    ## ....
elif [ "$1" == "close" ]
then
    echo "in close"
    ## Do more things in 'close' stage
    ## ....
else
    ## Catch all. Should not enter here.
    echo "Don't know how to handle stage: $1"
fi
At the init and close hook-launching points the step number passed is -1 and -2, respectively, so you can also detect whether you are in a pre/post stage or in init/close by checking
this. The time passed is 0 in both cases.
10.4 Shell hooks with “make”
If no command is given, i.e. if you write
hook_list shell_hook hello
but don’t add the
hello <command>
line, then PETSc-FEM uses a standard command line like this
make petscfem_step=%2$d petscfem_time=%3$f hello_%1$s
so that it will execute make commands with targets hello_init, hello_pre, hello_post
and hello_close like
$ make petscfem_step=-1 petscfem_time=0. hello_init
$ make petscfem_step=1 petscfem_time=0.1 hello_pre
$ make petscfem_step=1 petscfem_time=0.1 hello_post
$ make petscfem_step=2 petscfem_time=0.2 hello_pre
$ make petscfem_step=2 petscfem_time=0.2 hello_post
$ make petscfem_step=3 petscfem_time=0.3 hello_pre
$ make petscfem_step=3 petscfem_time=0.3 hello_post
...
$ make petscfem_step=100 petscfem_time=10. hello_pre
$ make petscfem_step=100 petscfem_time=10. hello_post
$ make petscfem_step=-2 petscfem_time=0. hello_close
Inside the Makefile you can use the make variables $(petscfem_step) and
$(petscfem_time). For instance you can do the “Hello world” trick by adding the targets
# In Makefile
hello_init:
echo "In init"
## Do more things in ‘init’ stage
## ....
hello_pre:
echo "In pre"
## Do more things in ‘pre’ stage
## ....
hello_post:
echo "In post"
## Do more things in ‘post’ stage
## ....
hello_close:
echo "In close"
## Do more things in ‘close’ stage
## ....
For instance, I love to gzip my state files with a command like this
## In the PETSc-FEM data file
hook_list shell_hook compress
## In the Makefile
compress_init:
compress_pre:
compress_post:
for f in *.state.tmp ; do echo "gzipping $$f" ; gzip -f $$f ; done
compress_close:
11 Gatherers and embedded gatherers
Typically “gatherers” are reduction operators over the element sets, i.e. instead of assembling a matrix or vector, the gather operations produce some kind of global data, as for
instance
• The volume/surface/length of an elemset.
• The integral of some quantity (fields, coordinates, or functions of them) over the
element set, for instance total concentration, total energy, or total (linear or angular)
momentum.
• Same as above but depending also on gradients of the state variables, for instance
total elastic energy.
• Same as above but integrals of quantities on a manifold of a dimension lower than
the embedding space. In this case the integrand may involve also the normal to the
surface, for instance total flow of heat or volume rate through a surface.
Gatherers are implemented as elemsets; the only difference is that they typically only
process a special jobinfo named gather. This jobinfo is processed after the time step,
i.e. only after the Newton loop, on the converged values. This jobinfo task does not
assemble vectors or matrices, but a series of global values stored in a global C++ vector
called vector<double> gather_values. A typical call is as follows (taken from ns.cpp):
arglf.clear();
arglf.arg_add(&state,IN_VECTOR|USE_TIME_DATA);
arglf.arg_add(&state_old,IN_VECTOR|USE_TIME_DATA);
arglf.arg_add(&gather_values,VECTOR_ADD);
ierr = assemble(mesh,arglf,dofmap,"gather",&time_star);
CHKERRA(ierr);
Note that three arguments are passed to the gather assemble task: the old and new states,
and the vector of gathered values.
Many gatherer elemsets have the suffix integrator appended to their names, for instance visc_force_integrator or volume_integrator.
11.1 Dimensioning the values vector
In order to use the gatherers the user must first dimension appropriately the array of values with the global option ngather. Then, for each gatherer elemset the user
must set the options that select a contiguous range in this vector, namely gather_pos
and gather_length. The selected range goes from gather_pos to gather_pos+gather_length-1.
Then, for instance, if the (hypothetical) gatherer elemset momentum_integrator is supposed to compute the integral of the momentum (a 3-vector in 3D), then we could use it
as
global_options
...
ngather 3
__END_HASH__
elemset momentum_integrator 4
...
gather_pos 0
gather_length 3
data ./connectivity.dat
__END_HASH__
With this setup, the program will compute for each time step the integral of the momentum, and will print these three values on standard output. If a string is passed to the global
option gather_file, for instance
...
ngather 3
gather_file momentum.out
...
then, instead of reporting the gathered values on standard output they are printed on the
corresponding file. The file is opened and closed at each time step.
11.2 Embedded gatherers
If the elemset has a dimension lower than the embedding space, for instance a surface
embedded in 3D space, then computing the gradients of the variables on the surface can
not be done with the information on the surface only. This is not an uncommon situation,
for instance it happens when computing viscous forces of a Newtonian fluid on the skin
of a solid body. The gradient of the velocity field must be computed in order to compute
Figure 12: Embedded gatherer element
the stress on the skin. But if the velocity field is known only on the surface (think, for
simplicity, of the case of a plane surface) the normal component of the gradient cannot
be determined.
In order to solve this, a special class of gatherers has been developed, namely the
embedded_gatherer class. Typically such an elemset is a surface elemset
associated with a volume one. For instance (see figure 12) the user can have a fluid problem
with a volume elemset (in 2D) composed of cartesian2d elements and may want to compute the viscous
traction on the solid surface AB. For this, she adds an elemset visc_force_integrator
composed of six-node elements comprising three layers of segments parallel to the surface,
as, for instance, the element 3-4-8-9-13-14 (marked with a dashed line in the figure). These
special elements can be seen as layers of surface elements, parallel to the skin. With the
velocity values computed by the Navier-Stokes solver at these nodes, the gatherer elemset
can compute high precision approximations to the normal derivatives of the velocity at the
surface.
• Note that this requires the mesh to be somewhat structured near the surface.
However, it is usual to add a structured layer on the body skin in order to correctly
capture the boundary layer.
• Also, it requires the construction of the connectivities of these layers, which may be
cumbersome, but we will see later that this can be done automatically (see §11.3).
• The element sizes in the normal direction are not required to be equal,
i.e. the distances 1-6 and 6-11 are not required to be equal or similar. This is
important, because normally the layers of nodes are refined towards the surface, as
shown in the figure.
• The lines of nodes are not required to be normal to the surface.
• However, it is required that each row of nodes (for instance the row 1-6-11) lie
on a smooth curve. This is required since, in the process of computing the
normal derivatives, a Taylor expansion is computed in terms of these curves, so the
error depends on the higher derivatives (e.g. the curvature) of the line.
The typical invocation is as follows
elemset nsi_tet_les_full 4
geometry cartesian2d
__END_HASH__
1 2 7 6
6 7 12 11
2 3 8 7
...
__END__ELEMSET__
elemset visc_force_integrator 6
geometry line2quad
__END_HASH__
1 2 6 7 11 12
2 3 7 8 12 13
3 4 8 9 13 14
...
__END__ELEMSET__
Figure 13: Embedded gatherer elements in 3D
Note that the geometry is special: line2quad means that the surface elements are lines
and the corresponding volume elemset is composed of quads. The other two possibilities
implemented so far are tri2prism and quad2hexa (see figure 13).
The typical invocation is as follows:
elemset visc_force_integrator 9
geometry tri2prism
__END_HASH__
...
1 2 3 4 5 6 7 8 9
...
__END__ELEMSET__
and
elemset visc_force_integrator 12
geometry quad2hexa
__END_HASH__
...
1 2 3 4 5 6 7 8 9 10 11 12
...
__END__ELEMSET__
11.3 Automatic computation of layer connectivities
Sometimes it is somewhat cumbersome to compute the connectivities of the embedded gatherer elemset, since it involves finding the layers of nodes inside the
adjacent volume elements. This can be done automatically by PETSc-FEM if the
identify_volume_elements option is activated and the name of the adjacent volume elemset is
passed through the option volume_elemset, for instance
elemset nsi_tet_les_full 4
geometry cartesian2d
name viscous_fluid
__END_HASH__
1 2 7 6
6 7 12 11
2 3 8 7
...
__END__ELEMSET__
elemset visc_force_integrator 6
volume_elemset viscous_fluid
identify_volume_elements
geometry line2quad
__END_HASH__
1 2 1 1 1 1
2 3 1 1 1 1
3 4 1 1 1 1
...
__END__ELEMSET__
Note that the nodes in the inner layers are replaced by 1's. With this setting,
the code will inspect the connectivity of the viscous_fluid elemset, find the nodes
corresponding to the inner layers, and replace the 1's by the correct node numbers.
11.4 Passing element contributions as per-element properties
For some applications it is desirable to have the individual element contributions instead
of their sum. For instance, in a fluid-structure application involving a fluid and a
deformable solid, the integral of the forces alone does not suffice to compute the
evolution of the solid; the whole distribution of forces is needed as well. In this case
what is needed is a list of surface elements and the total force for each element.
There are two flavors of this feature. The simplest one is activated with
dump_props_to_file and simply stores the data in a file. The other possibility is activated with the pass_values_as_props option and stores the computed per-element
values in the per-element properties table. Subsequent parts of the code can grab this
information using the usual mechanism to query the per-element properties table.
All three mechanisms can be activated or deactivated independently of the others. In
summary:
• Computation of global values is activated with gather_length>0.
• Storing the per-element values in a file is activated with the dump_props_to_file option.
• Passing the per-element computed values to other sections of the code via the per-element table is activated with the pass_values_as_props option.
In the last two cases the number of values to be computed by the gatherer can be set via
the store_values_length option.
The summary of relevant options is
• pass_values_as_props activates the mechanism of passing per element computed
values as per-element properties.
• store_values_length is an integer indicating how many values are computed by
the gatherer. It defaults to gather_length. However, if the standard
gather_values mechanism is deactivated, then the number of computed values must be
passed through this option.
• store_in_property_name is a string that identifies the per-element property that
must be filled with the computed values. Of course, the user must define such
property with the appropriate size. The values in the connectivity table are irrelevant
(for instance null values) and will be overwritten by the gatherer.
• compute_densities: If compute_densities==0 then the value set in the per-element
property is the integral of the integrand over the element. That is, if the integrand
is φ, the computed value for the element is

value for element e = \int_{\Omega_e} \phi    (115)

Conversely, if compute_densities==1 the value set is the mean value density of
the integrand, i.e.

value for element e = \langle \phi \rangle_{\Omega_e} = \frac{1}{|\Omega_e|} \int_{\Omega_e} \phi    (116)
For instance, if the integrand φ is the heat flow through the surface, then
if compute_densities==0 the value set in the per-element property is the total
heat flow through the element (which has units of energy per unit time), whereas
if compute_densities==1 the value set is the mean heat flow density (which
has units of energy per unit time and unit surface). For the traction on a surface,
the passed value is the total force on the element in one case (units of force) and the
mean skin friction in the other (units of force per unit area). Of course, this option
has no effect on the values passed via the global gather_values vector.
• dump_props_to_file activates the mechanism of storing per-element computed values in a file.
• dump_props_file is the name of the file where the per-element values are written.
If it contains a %d format sequence, then it is replaced by the time step number (using
printf() and friends).
• dump_props_freq is the frequency at which the values are dumped to file.
11.5 Parallel aspects
Of course, PETSc-FEM is in charge of adding the contributions computed on different processors.
The result is the sum of contributions over all elements in all processors. The sum is available
not only in the master (rank=0) but in all processors (as with an MPI_Allreduce() call).
This is important since this global sum can also be used in hooks in order to perform
computations. For instance, a gatherer can be used to compute the force on a body, and
this force can be passed to a hook to compute the movement of the body.
11.6 Creating a gatherer
Typically a gatherer is created by deriving from the virtual class gatherer and implementing the set_pg_values() method which is in charge of computing the integrands.
class gatherer {
  // ...
public:
  // perform several checks and initialization
  void init();
  // add Gauss point contributions
  void set_pg_values(vector<double> &pg_values, FastMat2 &u,
                     FastMat2 &uold, FastMat2 &xpg, FastMat2 &Jaco,
                     double wpgdet, double time);
};
• The init() function may be used for initialization. The TGETOPTDEF(thash,....)
macro can be used for extracting options of the elemset. For instance,
TGETOPTDEF(thash,double,Young_modulus,0.);
• set_pg_values(...) is the main method; here the user computes the values
to be integrated. Its signature is
void set_pg_values(vector<double> &pg_values,FastMat2 &u,
FastMat2 &uold,FastMat2 &xpg,FastMat2 &n,
double wpgdet,double time);
The argument pg_values holds the values to be computed. (pg stands for Gauss
point, since usually this function is called by the gatherer class at the Gauss points of
the integration rule.) uold and u are the states at times tn and tn+1, respectively. xpg are
the coordinates of the Gauss point. n is the normal to the surface (only relevant if the
dimension of the element ndimel is equal to ndim-1). wpgdet is the area of the Gauss point
(i.e. the Gauss point weight times the Jacobian of the transformation to the master
element).
• void element_hook(int k) is called before processing the Gauss points of each
element, and so can be used to pre-compute quantities shared by all the Gauss points.
k is the element number.
• void clean() is called after processing all the elements and can be used to do some cleanup.
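The derive-and-override pattern can be sketched with plain containers standing in for FastMat2. The names gatherer_sketch and force_integrator are hypothetical, some arguments of the real signature are omitted, and the assumption that pressure is the last field is for illustration only.

```cpp
#include <cstddef>
#include <vector>

// Schematic only: the real PETSc-FEM base class receives FastMat2
// arguments; here the state u and the normal n are plain std::vector
// stand-ins and some arguments of the real signature are omitted.
struct gatherer_sketch {
  virtual ~gatherer_sketch() {}
  // called at each Gauss point to fill the integrand values; the caller
  // would then scale the result by wpgdet and accumulate
  virtual void set_pg_values(std::vector<double> &pg_values,
                             const std::vector<double> &u,
                             const std::vector<double> &n,
                             double wpgdet, double time) = 0;
};

// Hypothetical example: gather the traction integrand p*n on a surface,
// assuming (for illustration) that the pressure is the last field in u.
struct force_integrator : public gatherer_sketch {
  void set_pg_values(std::vector<double> &pg_values,
                     const std::vector<double> &u,
                     const std::vector<double> &n,
                     double, double) {
    double p = u.back();
    pg_values.resize(n.size());
    for (std::size_t d = 0; d < n.size(); d++) pg_values[d] = p*n[d];
  }
};
```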
12 Generic load elemsets
Generic load elemsets account for surface contributions which represent either constant terms
in the governing equations or a function of the state at the surface. Typical terms
that can be represented in this way are
• External heat loads (like a constant radiation load) in thermal problems: q = q¯
• A linear Newtonian cooling term q = −h(T − T∞ ).
• A nonlinear Newtonian cooling term q = f (T, T∞ ).
12.1 Linear generic load elemset
Figure 14: Generic load element (single layer)
In the simplest case the load is of the form

    q = −hU + q̄            (single layer)
    q = h(Uout − U) + q̄    (double layer)          (117)
This is implemented by the lin_gen_load elemset. This elemset may be “single-layer”
or “double-layer” (see figures 14 and 15). Double-layer elements can represent a lumped
thermal resistance, for instance a very thin layer of air inside a material of higher
conductivity. In the double-layer case the number of nodes is twice that of the single-layer
Figure 15: Generic load element (double layer)
case. Double-layer mode is activated either by including the double_layer option or if
the number of nodes is twice that specified by the geometry.
The options for the lin_gen_load elemset are
• int double_layer (default=0):
Whether there is a double or single layer of nodes (found in file: genload.cpp)
• string geometry (default=cartesian1d):
Type of element geometry to define Gauss Point data (found in file: genload.cpp)
• double[var_len] hfilm_coeff (default= no default):
Defines the coefficients for the film flux function. May be of length var_len=0 (no ∆T driven
load) or var_len=ndof*ndof, a full matrix relating the flux to ∆U. (found in
file: linhff.cpp)
• double[var_len] hfilm_source (default= no default):
Defines the constant source term for the generic load on surfaces. May be of length 0 (null
load) or ndof, which represents a given load per field. (found in file: linhff.cpp)
• int ndimel (default=ndim-1):
The dimension of the element (found in file: genload.cpp)
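A minimal sketch of the flux of equation (117), with hfilm_coeff taken as a full ndof×ndof matrix and hfilm_source as the per-field constant load. The function gen_load_flux and its calling convention are illustrative, not the PETSc-FEM API.

```cpp
#include <cstddef>
#include <vector>

// Sketch of the lin_gen_load flux (eq. 117): hfilm is an ndof x ndof
// matrix (row-major) and source is the constant load per field.  An
// empty U_out selects the single-layer form q = -h*U + qbar; otherwise
// the double-layer form q = h*(U_out - U_in) + qbar is used.
std::vector<double> gen_load_flux(const std::vector<double> &U_in,
                                  const std::vector<double> &U_out,
                                  const std::vector<double> &hfilm,
                                  const std::vector<double> &source) {
  std::size_t ndof = U_in.size();
  bool double_layer = !U_out.empty();
  std::vector<double> q(source);                // q = qbar ...
  for (std::size_t i = 0; i < ndof; i++)
    for (std::size_t j = 0; j < ndof; j++) {
      double dU = double_layer ? (U_out[j] - U_in[j]) : -U_in[j];
      q[i] += hfilm[i*ndof + j]*dU;             // ... + h*DeltaU
    }
  return q;
}
```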
12.2 Functional extensions of the elemset
The base class for generic load elemsets is GenLoad. Generic load elemsets can be extended
by the user by deriving from GenLoad; the most important task is to implement the method
q(u,flux,jac) for the single-layer case, and q(u_in,u_out,flux_in,flux_out,jac) in
the double-layer case.
12.3 The flow reversal elemset
This instantiation of GenLoad is useful for avoiding instabilities caused by inversion of the
flow at an outlet boundary. Assume that the Navier-Stokes module computes a velocity
field v and some scalar φ is being advected with this velocity field. On an outlet boundary
one usually leaves φ free, that is, no Dirichlet condition is imposed on φ. But
if the flow is reversed, that is v · n̂ < 0 (n̂ is the external normal to the boundary) at some
instant, then this boundary condition becomes ill-posed and the simulation may diverge.
One solution is to switch from Neumann to Dirichlet boundary conditions depending on
the sign of v · n̂, i.e.

    q = 0            if v · n̂ ≥ 0,
    q = h(φ̄ − φ)     if v · n̂ < 0,          (118)

where h is a film coefficient and φ̄ is the value to be imposed at the boundary when the
flow is reversed. Note that this amounts to switching to a Dirichlet condition by penalization.
For h large enough, the value of φ will converge to φ̄. However, if h is too large, this can
affect the conditioning of the linear system.
The options for this elemset are
• vector<double> coefs (default= empty vector): Penalization coefficients (artificial
film coefficients). The length of coefs and dofs must be the same. (found
in file: flowrev.cpp)
• vector<int> dofs (default= empty vector): Field indexes (base-1) for which
the flow reversal will be applied (found in file: flowrev.cpp)
• int ndim (default=0):
Dimension of the problem (found in file: flowrev.cpp)
• vector<double> refvals (default= empty vector): The reference values for
the unknown. The length of coefs and dofs must be the same. (found in file:
flowrev.cpp)
• int thermal_convection (default=0):
If this is set, then the code assumes that a thermal NS problem is being run, i.e.
vel_index=1 and the penalization term is added only on the temperature field (i.e.
dofs=[ndim+2]). Also the refval (found in file: flowrev.cpp)
• int vel_indx (default=1):
Field index where the components of the velocity index start (1-base). (found in file:
flowrev.cpp)
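The switching rule of equation (118), applied per penalized field, can be sketched as follows. flow_reversal_flux is a hypothetical name and the argument layout is illustrative only; dofs holds base-1 field indexes as in the elemset options.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Sketch of the flow-reversal penalty (eq. 118): for each penalized dof,
// add q = h*(refval - phi) only when the flow is incoming (v.n < 0).
std::vector<double> flow_reversal_flux(const std::vector<double> &state,
                                       const std::vector<double> &v,
                                       const std::vector<double> &normal,
                                       const std::vector<int> &dofs,
                                       const std::vector<double> &coefs,
                                       const std::vector<double> &refvals) {
  double vn = std::inner_product(v.begin(), v.end(), normal.begin(), 0.0);
  std::vector<double> q(state.size(), 0.0);
  if (vn >= 0.0) return q;                 // outgoing flow: leave free
  for (std::size_t k = 0; k < dofs.size(); k++) {
    int j = dofs[k] - 1;                   // convert to base-0
    q[j] = coefs[k]*(refvals[k] - state[j]);
  }
  return q;
}
```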
13 Visualization with DX
Data Explorer (http://www.opendx.org) is a system of tools and user interfaces for
visualizing scientific data. Originally an IBM commercial product, it has since been released
under the IBM Open Source License and is maintained by a group of volunteers (see the
URL above). Besides its impressive visualization capabilities, DX has many features
that make it an ideal visualization tool for PETSc-FEM.
• DX is Open Source with a License very close to the GPL (not completely compatible
though)
• It has a “Visual Program Editor” which makes it very configurable for different
modules.
• It has been linked to PETSc-FEM through sockets, which makes it possible to visualize a running job, even in the background.
• It has a scripting language.
If you want to visualize your results with DX you first have to download it from the
URL above and install it. Then you can pass your results to DX either by editing the
needed .dx files by hand or by using the ExtProgImport DX module. In order to use this last
option you have to
• Compile PETSc-FEM with the USE_SSL flag enabled (disabled by default). Also,
if you want DX to be able to communicate asynchronously with PETSc-FEM, you
have to compile with the USE_PTHREADS flag enabled (disabled by default).
• Load the dx_hook hook in PETSc-FEM and pass it some options.
• Build the dynamically loadable module (file dx/epimport) and load it in DX.
DX basic visualization units are Field objects, which are composed of three Array
objects called positions (node coordinates), connections (element connectivities) and
data (computed field values). At each time step, ExtProgImport exports two Group
objects,
• a Group of Arrays named output_array_list and
• a Group of Fields objects named output_fields_list.
These objects are generated as follows.
• For each Nodedata PETSc-FEM object a positions array is constructed; this results
in, say, nn array objects. (Currently PETSc-FEM has only one Nodedata object
(member name nodes), so that nn=1.)
• A data array is constructed from the current state vector (member name data).
• For each elemset, a connections array is constructed. You can disable the construction
for particular elemsets through the dx elemset option, and also select which nodes and
which DX interpolation geometry are used. This results in another ne connection arrays. The
member name of each array is based on the name option of the elemset. If this has
not been set, the name is set to the type of the elemset. If a collision occurs, a
suffix of the form _0, _1 is appended in order to make it unique. Options controlling
how the connection array is constructed can be consulted in §4.4.2.
The resulting nn+ne+1 arrays are grouped in a Group object and sent through the
output_array_list output tab. You can extract the individual components with the
Select DX module and build field objects. Also, a set of field objects is created automatically
and sent through the output_field_list output tab. Basically a field is constructed
for each possible combination of positions, connections and data objects. This may seem
a huge number of fields, but in fact, as the arrays are passed internally by pointers in DX,
the additional memory requirements are not large. (At the time of this writing, this is a
set of nn*ne=ne fields, since nn=1.) A name is generated automatically for each field.
Some information is sent to the DX “Message Window”. Also, it is very useful to put
Print modules downstream of the ExtProgImport module in order to see which arrays
and fields have been created.
The communication between DX and PETSc-FEM is done through a socket. PETSc-FEM
acts as a “server” whereas DX acts as a “client”. PETSc-FEM opens a “port” (option
dx_port), and DX connects to that port (the “port” input tab in the ExtProgImport
module). (Currently, the standard port for the DX/PETSc-FEM communication is 5314.)
DX can communicate with PETSc-FEM running in the background and even on another
machine (the “serverhost” input tab). At each time step, DX sends a request to PETSc-FEM,
which answers by sending back “arrays” and “fields”.
13.1 Asynchronous/synchronous communication
• In “synchronous” mode (steps>0) PETSc-FEM waits every steps time steps for a
connection from DX. Once DX connects to the port, PETSc-FEM transmits the required
data and resumes computation. This is the appropriate mode of communication when
generating a sequence of frames for a video with a DX sequencer, for instance. Note
that if you don’t use a sequencer then you have to arrange in some way for
ExtProgImport to awake and connect to PETSc-FEM, otherwise the update of the
visualization is not performed and the PETSc-FEM job stays stopped, waiting for the
connection.
• In “asynchronous” mode (steps=0), in contrast, PETSc-FEM checks the port
after computing each time step. If a DX client is trying to connect, it answers the
request and then resumes computing, otherwise it resumes computing immediately. This
is ideal for monitoring a job that is running in the background, for instance. Note that in
this case the interference with the PETSc-FEM job is minimal, since once PETSc-FEM
answers the request, it resumes processing automatically until a new connection
is requested.
The steps state variable is internal to PETSc-FEM. It can be set initially with a
dx_steps option line (1 by default). After that, it can be changed through the steps
input tab. However, note that the change doesn’t take effect until the next connection of
DX to PETSc-FEM. If you don’t want to change the internal state of the steps variable
then you can set it to NULL or -1.
13.2 Building and loading the ExtProgImport module
This module allows PETSc-FEM to exchange data with DX through a socket, using a
predefined protocol. The module is in the $(PETSCFEM_DIR)/dx directory of the PETSc-FEM
distribution. To build it, you have to compile the petscfem library first, and then
cd to the dx directory and run make. This should build the dx/epimport file, which
is a dynamically loadable module for DX. This file, together with dx/epimport.mdf
(a plain text file describing the inputs/outputs of the module, among other things),
are the files needed by DX in order to run the module.
To load this module in DX you can do it either at the moment of launching DX with
something like

$ dx -mdf /path/to/epimport.mdf

or from the dxexec window (menu File/Load Module Descriptions).
13.3 Inputs/outputs of the ExtProgImport module
• (input) steps; type integer; default 0; Description: Visualize every steps time
steps (0 means asynchronously).
• (input) serverhost; type string; default "localhost"; Description: Name of host
where the external program is running. May be also set in dot notation (e.g.
200.9.223.34 or myhost.mydomain.edu).
• (input) port; type integer; default 5314; Description: Port number.
• (input) options; type string; default NULL; Description: Options to be passed to
the external program
• (input) step; type integer; default -1; Description: Input for the sequencer. This value
is passed to the dx_hook hook, but is currently ignored by it. It could be used in the
future in order to synchronize the DX internal step number with the PETSc-FEM
one.
• (input) state_file; type string; default NULL; Description: An ASCII file from which
the state to be visualized is read.
• (output) output_array_list; type field; Description: Group object of imported
arrays
• (output) output_field_list; type field; Description: Group object of imported
fields
13.4 DX hook options
• int dx_auto_combine (default=0):
Auto generate states by combining elemsets with fields (found in file: dxhook.cpp)
• int dx_cache_coords (default=0):
Uses DX cache if possible in order to avoid sending the coordinates each time step
or frame. Use only if coordinates are not changing in your problem. (found in file:
dxhook.cpp)
• double dx_coords_scale_factor (default=1.):
Coefficient affecting the newly read displacements. The new coordinates are c0*x0+c*u,
where c0=dx_coords_scale_factor0, c=dx_coords_scale_factor, x0 are the
initial coordinates and u are the coordinates read from the given file. (found in file:
dxhook.cpp)
• double dx_coords_scale_factor0 (default=1.):
Coefficient affecting the original coordinates. See doc for dx_coords_scale_factor
(found in file: dxhook.cpp)
• int dx_do_make_command (default=0):
If true, then issue a make dx_step=<step> dx_make_command (found in file:
dxhook.cpp)
• string dx_node_coordinates (default=<none>):
Mesh where updated coordinates must be read (found in file: dxhook.cpp)
• int dx_port (default=5314):
TCP/IP port for communicating with DX ( 5000 < dx_port < 65536 ). (found in
file: dxhook.cpp)
• int dx_read_state_from_file (default=0):
Read states from file instead of computing them. Normally this is done to analyze
a previous run. If 1 the file is ASCII, if 2 then it is a binary file. In both cases
the order of the elements must be: u(1,1), u(1,2), u(1,3), u(1,ndof), u(2,1),
... u(nnod,ndof) where u(i,j) is the value of field j at node i. (found in file:
dxhook.cpp)
• string dx_split_state (default=):
Generates DX fields by combination of the input fields (found in file: dxhook.cpp)
• int dx_state_all_fields (default=!dx_split_state_flag):
Generates a DX field with the whole state (all ndof fields) (found in file: dxhook.cpp)
• int dx_steps (default=1):
Initial value for the steps parameter. (found in file: dxhook.cpp)
• int dx_stop_on_bad_file (default=0):
If dx_read_state_from_file is activated and can’t read from the given file, then
stop. Otherwise continue padding with zeros. (found in file: dxhook.cpp)
14 The “idmap” class
14.1 Permutation matrices
This class represents sparse matrices that are “close” to a permutation. Assume that the
n × n matrix Q is a permutation matrix, so that if y, x are vectors of dimension n and

    y = Qx          (119)

then

    y_j = x_P(j)          (120)

where P is the permutation associated with Q. For instance if P = {1, 2, 3, 4} → {2, 4, 3, 1}
then

    y = Qx = (x2, x4, x3, x1)ᵀ          (121)
This matrix can be efficiently stored as an array of integers ident[n] such that
ident[j − 1] = P(j). Also, the inverse permutation can be stored in another integer array
iident[n], so that both Q and Q⁻¹ can be stored at the cost of n integer values each.
Moreover, the multiplication and inversion operations y = Qx, y = Qᵀx, x = Q⁻¹y and
x = Q⁻ᵀy can all be done with n addressing operations.
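A sketch of the ident/iident storage just described; the class perm and its method names are illustrative, not the PETSc-FEM implementation.

```cpp
#include <cstddef>
#include <vector>

// A permutation P stored as ident/iident (0-based storage, 1-based P):
// ident[j-1] = P(j) and iident[k-1] = P^{-1}(k).  Both y = Q x and
// x = Q^{-1} y then cost n addressing operations.
struct perm {
  std::vector<int> ident, iident;
  explicit perm(const std::vector<int> &P) : ident(P), iident(P.size()) {
    for (std::size_t j = 0; j < P.size(); j++) iident[P[j] - 1] = (int)j + 1;
  }
  // y_j = x_{P(j)}   (y = Q x)
  std::vector<double> apply(const std::vector<double> &x) const {
    std::vector<double> y(x.size());
    for (std::size_t j = 0; j < x.size(); j++) y[j] = x[ident[j] - 1];
    return y;
  }
  // x_k = y_{P^{-1}(k)}   (x = Q^{-1} y)
  std::vector<double> apply_inv(const std::vector<double> &y) const {
    std::vector<double> x(y.size());
    for (std::size_t k = 0; k < y.size(); k++) x[k] = y[iident[k] - 1];
    return x;
  }
};
```

With P = {1, 2, 3, 4} → {2, 4, 3, 1} as in (121), apply() reproduces y = (x2, x4, x3, x1)ᵀ.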
14.2 Permutation matrices in the FEM context
Figure 16: Node/field to dof map. Example for NS or Euler in a duct.
Permutation matrices are very common in FEM applications for describing the relation
between node/field pairs and degrees of freedom (abbrev. “dof”s). For instance consider
an NS or Euler flow application in a mesh like that shown in figure 16. At each node we
have a set of unknown “fields” (both components of velocity and the pressure: u, v and p). In
a first description, we may arrange the vector of unknowns as

    U_NF = (u1, v1, p1, u2, v2, p2, . . . , u_Nnod, v_Nnod, p_Nnod)ᵀ          (122)

The length of this vector is Nnod × ndof and this may be called the “node/field” (hence
the NF subindex) description of the vector of unknowns. However, we cannot
take this vector as the vector of unknowns for actual computations due to a series of
facts:
• Not all of them are true unknowns, since some of them may be imposed to a given
value by a Dirichlet boundary condition.
• There may be some constraints between them: for instance, in structural mechanics,
two material points linked by a rigid bar, or a node that is constrained to move on
a surface or line. In CFD these constraints arise when periodic boundary conditions
are considered.
• Some reordering may be necessary, either for reducing the bandwidth if a direct solver
is used, or due to mesh partitioning if the problem is solved in parallel.
• Also, non-linear constraints may arise, for instance when considering absorbing
boundary conditions or non-linear restrictions.
So we assume that we have a relation

    U_NF = Q U + Q̄ Ū          (123)

where U is the “reduced” vector of unknowns, Ū represents the externally fixed values,
and Q, Q̄ are appropriate matrices. Some common cases follow.
Dirichlet boundary conditions: If the velocity components of node 1 are fixed, then we
have

    u1 = ū1
    v1 = v̄1          (124)

so that the corresponding row in Q is null and the row in Q̄ has a “1” in the entry
corresponding to the barred value.
Figure 17: Slip boundary condition
Slip boundary conditions: For the Euler equations, if node j is on a slip wall and n̂_j is
the corresponding normal, then the normal component of velocity is null and the
tangential component is free. (In 3D there are two free tangential components and
one prescribed normal component.) The normal component may be prescribed to
some non-null value if transpiration fluxes are to be imposed. The corresponding
equations may look like this:

    u_j = u_jt t_x + ū_jn n_x
    v_j = u_jt t_y + ū_jn n_y          (125)

where u_jt is the (free) tangential component and ū_jn the prescribed normal component.
Figure 18: Symmetrical arrangement of foils with specular symmetry.
Periodic boundary conditions: These kinds of b.c.’s are useful when treating geometries
with repetitive components, as is very common in rotating machinery. In some
cases this can be done with slip boundary conditions (see figure 18). Here the foils
are symmetric about their centerline, so that the whole geometry not only possesses
Figure 19: Non-symmetrical arrangement of foils.
rotational symmetry about angles multiple of 2π/n_foils but also reflection symmetry
about the centerline of a foil, like BC, and about the centerline between
two consecutive foils, like AB. In consequence it suffices to consider the domain ADEC with
inlet/outlet b.c.’s on DE and AC, and slip boundary conditions on AD
and EFGHC. On the foil (boundary FGH) the pressure should be free, while on AD,
EF and HC we can impose ∂p/∂n = 0. For Navier-Stokes we may impose solid
wall b.c.’s on the foil, null normal velocity component on AD, EF and HC, null
tangential shear stress (∂u_t/∂n = 0), and ∂p/∂n = 0.
However, if the foils are non-symmetric, or they are not disposed symmetrically about
a line passing through the rotation axis (see figure 19), then there are no predefined
streamlines; but given two corresponding points like j and j′ that are obtained through a
rotation of α = 2π/n_foils, we can impose

    u_j′ = cos α u_j + sin α v_j
    v_j′ = − sin α u_j + cos α v_j          (126)
    p_j′ = p_j
Absorbing boundary conditions: The basic concept behind these b.c.’s is to impose the
in-going components from outside (reference) values while leaving the outgoing
components free. If w = [u, v, p] is the state vector of the fluid at some node j on the
outlet boundary (see figure 16), then u = V⁻¹w are the eigen-components, where V
is the change of basis matrix. The absorbing b.c. may be written as

    w = V (w̄¹, w², w³)ᵀ          (127)

where w̄¹ is the in-going component (taken from the reference value w_ref) and the
other components w², w³ are free, i.e. they go in U.
Non-linear Dirichlet boundary conditions: In some cases, Dirichlet boundary conditions
are not expressed in terms of the state variables used in the computations, but in terms of
a non-linear combination of them instead. For instance, consider the transport of
moisture and heat through a porous medium, and choose the temperature T and moisture
content H as the state variables. On an external boundary, we impose that the
partial pressure of water should be equal to its external value Pw = Pw,atm. As the
partial pressure of water (which may be related to relative humidity) is a complex
non-linear function of T and H through the sorption isotherms of the porous medium
and the saturation pressure of water, this results in a non-linear link of the form
Pw(T, H) = Pw,atm. For the moment we consider only the linear relation, since the
non-linear case doesn’t fit in the representation (123). The non-linear case will be
considered later on.
14.3 A small example
Figure 20: A small example showing boundary conditions.
Consider the region shown in figure 20, composed of 8 triangle elements and 9 nodes.
We have 9 × 3 = 27 node/field values, but periodic boundary conditions from side 1-4-7 to
side 3-6-9 eliminate the 9 unknowns on these last three nodes. In addition, at the outlet
boundary (nodes 1-2-3) there is only one in-going component, so that the unknowns there
are only w_1², w_1³, w_2² and w_2³. On the other hand, on the inlet boundary u and v are
imposed, so the vector of unknowns is
    U1 = w_1²
    U2 = w_1³
    U3 = w_2²
    U4 = w_2³
    U5 = u4
    U6 = v4
    U7 = p4          (128)
    U8 = u5
    U9 = v5
    U10 = p5
    U11 = p7
    U12 = p8
while the prescribed values are
    Ū1 = w̄_1¹
    Ū2 = w̄_2¹
    Ū3 = u7          (129)
    Ū4 = v7
    Ū5 = u8
    Ū6 = v8
And the relation defining the matrices Q and Q̄ is

    u1 = V¹_11 Ū1 + V¹_12 U1 + V¹_13 U2
    v1 = V¹_21 Ū1 + V¹_22 U1 + V¹_23 U2
    p1 = V¹_31 Ū1 + V¹_32 U1 + V¹_33 U2
    u2 = V²_11 Ū2 + V²_12 U3 + V²_13 U4
    v2 = V²_21 Ū2 + V²_22 U3 + V²_23 U4
    p2 = V²_31 Ū2 + V²_32 U3 + V²_33 U4
    u3 = cos α (V¹_11 Ū1 + V¹_12 U1 + V¹_13 U2) + sin α (V¹_21 Ū1 + V¹_22 U1 + V¹_23 U2)
    v3 = − sin α (V¹_11 Ū1 + V¹_12 U1 + V¹_13 U2) + cos α (V¹_21 Ū1 + V¹_22 U1 + V¹_23 U2)
    p3 = V¹_31 Ū1 + V¹_32 U1 + V¹_33 U2
    u4 = U5
    v4 = U6
    p4 = U7
    u5 = U8          (130)
    v5 = U9
    p5 = U10
    u6 = cos α U5 + sin α U6
    v6 = − sin α U5 + cos α U6
    p6 = U7
    u7 = Ū3
    v7 = Ū4
    p7 = U11
    u8 = Ū5
    v8 = Ū6
    p8 = U12
    u9 = Ū3
    v9 = Ū4
    p9 = U11
As we can see, the boundary conditions result in sparse Q and Q̄ matrices. Moreover, in
real (large) problems most of the rows correspond to interior nodes (such as node 5 here),
so that Q is very close to a permutation matrix. If we think of accessing the elements of Q
by row, then we could store Q as a sparse matrix, but in this case we need on average two
integers (for pointers) and a double (for the value) per unknown, whereas a permutation
matrix can be stored with only one integer per unknown. One possibility is to think of
these matrices as permutations followed by a sequence of operations depending on each
kind of boundary condition, but it may also happen that several kinds of b.c.’s superpose,
as in the case of node 3 (periodic + absorbing) and node 9 (periodic + Dirichlet).
14.4 Inversion of the map
It is sometimes necessary to invert relation (123), i.e. given U_NF and Ū, to find U, for
instance when initializing a temporal problem. Of course, this may not be possible for
arbitrary U_NF and Ū, since Q is in general rectangular, but we assume that QᵀQ is
non-singular and solve (123) in a least-squares sense. Afterwards, we may verify whether
the given data (U_NF and Ū) is “compatible” by evaluating the residual of the equation.
This operation should be performed in O(N) operations.
14.5 Design and efficiency restrictions
The class of matrices representing Q and Q̄ should have the following characteristics.
• Be capable of storing arbitrary matrices.
• Be efficient for permutation-like matrices, i.e. require as storage (of the order of) 2
integers per unknown, with constant-time access by row and column.
14.6 Implementation
An arbitrary permutation P of N objects can be stored as two integer vectors ident[N]
and iident[N] such that if P(j) = k, then

    ident[j − 1] = k
    iident[k − 1] = j          (131)
We will consider a slight modification to this. A row is

void: if all its elements are null;
normal: if one element is one and the rest null (like in a permutation matrix);
special: otherwise.

Then we set

    ident[j − 1] = 0     if row j is void,
    ident[j − 1] = k     if row j is normal and the non-null element is in position k,          (132)
    ident[j − 1] = −1    if row j is special.
We now need a way to access the coefficients of the special rows. Rows are stored as a
“map”, in the STL sense, from integer values to doubles, i.e.
typedef map<int,double> row_t;
and we keep in class idmap a map from row indices to pointers to rows of the form
map<int,row_t*> row_map;
Thus, for a given j we can get the corresponding row by checking the value of ident[j − 1].
If the row is void or normal we return the corresponding row directly, and if it is special,
then we look up the pointer in row_map, which returns the desired row. Similar information is
stored by columns, but in this case it is not necessary to store the values for the special
rows, so that the columns are of the form
typedef set<int> col_t;
and class idmap contains a private member col map of the form:
map<int,col_t*> col_map;
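The void/normal/special encoding of (131)-(132) together with the row_map lookup can be sketched as follows; idmap_sketch is a simplified illustration (values stored directly rather than through pointers), not the real idmap class.

```cpp
#include <cstddef>
#include <map>
#include <vector>

typedef std::map<int,double> row_t;

// Sketch of the idmap storage: ident[j-1] is 0 (void row), k (normal row
// with its single 1 at column k), or -1 (special row kept in row_map).
struct idmap_sketch {
  std::vector<int> ident;
  std::map<int,row_t> row_map;     // 1-based row index -> coefficients
  // y = Q x using the encoded rows
  std::vector<double> apply(const std::vector<double> &x) const {
    std::vector<double> y(ident.size(), 0.0);
    for (std::size_t j = 0; j < ident.size(); j++) {
      int k = ident[j];
      if (k == 0) continue;                    // void row: y[j] stays 0
      else if (k > 0) y[j] = x[k - 1];         // normal row: one addressing op
      else {                                   // special row: walk the map
        std::map<int,row_t>::const_iterator it = row_map.find((int)j + 1);
        const row_t &row = it->second;
        for (row_t::const_iterator q = row.begin(); q != row.end(); ++q)
          y[j] += q->second * x[q->first - 1];
      }
    }
    return y;
  }
};
```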
14.7 Block matrices
When solving the linear relation (123) for U, it is a key point to take into account that most
of Q consists of independent blocks, so that the inversion may be done with a minimum
computational effort. Given a row index i and a column index j, we say that they are
“directly connected” if the element Qij is non-null. We then say that two row indices i, i′ are
“indirectly connected” if they are connected by a path i → j1 → i1 → j2 → . . . → jn → i′
where the arrow means “directly connected”. The definition applies also to pairs of column
indices and to row/column, column/row pairs. Indirect connection defines an equivalence
relation that splits the rows into disjoint sets.
14.7.1 Example:
Consider matrix
          col → 1  2  3  4  5  6  7  8  9 10
    row ↓  1    .  .  .  .  .  .  .  ∗  ∗  .
           2    .  .  .  .  .  .  .  .  .  .
           3    .  .  .  .  .  .  .  ∗  ∗  .
           4    .  .  .  .  .  .  .  .  .  ∗
    Q =    5    .  .  .  ∗  ∗  .  ∗  .  .  .          (133)
           6    .  .  .  ∗  ∗  .  ∗  .  .  .
           7    ∗  .  .  .  .  .  .  .  .  .
           8    .  .  .  .  .  ∗  .  .  .  .
           9    .  .  .  .  .  .  .  .  .  .
          10    .  .  .  .  .  .  .  ∗  ∗  .
where the asterisks denote non-null elements. Starting with row index 1, we see that its
corresponding connected set is rows 1, 3, 10 and columns 8, 9, so that the linear
transformation y = Qx can be represented as
    y2 = 0
    y9 = 0
    (y1, y3, y10)ᵀ = Q¹ (x8, x9)ᵀ
    y4 = Q^{4,10} x10          (134)
    (y5, y6)ᵀ = Q³ (x4, x5, x7)ᵀ
Thus Q decomposes into three blocks of sizes 3×2, 1×1, and 2×3, two void rows (2, 9) and two
void columns (2, 3). One basic operation of class idmap is to compute the rows connected
to a given column index.
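That basic operation can be sketched as a breadth-first traversal over the non-null pattern of Q; connected_block is a hypothetical helper, not the idmap API, and the pattern is stored here as a simple set of (row, col) pairs for clarity.

```cpp
#include <queue>
#include <set>
#include <utility>

// Sketch: given the non-null pattern of Q as a set of (row,col) pairs,
// collect the rows and columns indirectly connected to a starting column.
void connected_block(const std::set<std::pair<int,int> > &nz, int col0,
                     std::set<int> &rows, std::set<int> &cols) {
  std::queue<int> pending_cols;
  pending_cols.push(col0); cols.insert(col0);
  while (!pending_cols.empty()) {
    int c = pending_cols.front(); pending_cols.pop();
    for (std::set<std::pair<int,int> >::const_iterator q = nz.begin();
         q != nz.end(); ++q) {
      if (q->second == c && rows.insert(q->first).second) {
        // a new row was reached: enqueue all its columns
        for (std::set<std::pair<int,int> >::const_iterator r = nz.begin();
             r != nz.end(); ++r)
          if (r->first == q->first && cols.insert(r->second).second)
            pending_cols.push(r->second);
      }
    }
  }
}
```

On the pattern of matrix (133), starting from column 8 this recovers the first block: rows {1, 3, 10} and columns {8, 9}.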
14.8 Temporal dependent boundary conditions
Time-dependent boundary conditions are treated in PETSc-FEM assuming that the
boundary condition on a set of nodes J = {j1, j2, j3, . . . , jN} is of the form

    φ_j1(t) = a1 φ(t)
    φ_j2(t) = a2 φ(t)
    ...                  (135)
    φ_jN(t) = aN φ(t)
where the ak are spatial amplitudes and the function φ(t) is the temporal part of the
dependency. For instance, consider the solution of a problem of heat conduction coupled
with diffusion of a component in a rectangular region, as depicted in figure 21. The
boundary condition on side AD is of the form

    T(x, t) = 100 °C [4x(L − x)/L²] sin(5t)          (136)

where L is the distance AD.
The nodes on side AD are {1, 7, 13, 19, 25, 31, 37} and we assume that concentration and
temperature are fields 1 and 3 respectively. Here we can take
    aj = 4xj(L − xj)/L²,  for j in J
    φ(t) = 100 °C sin 5t          (137)
Boundary conditions depending on time, as in this example, are entered in PETSc-FEM
with a fixa_amplitude section. The general form is
fixa_amplitude <function>
<param1> <val1>
<param2> <val2>
Figure 21: Example with time dependent boundary conditions.
...
__END_HASH__
<node1> <dof1> <val1>
<node2> <dof2> <val2>
...
<nodeN> <dofN> <valN>
__END_FIXA__
For instance, for the example above it may look like this
fixa_amplitude sine
omega 5.
amplitude 1.
__END_HASH__
1  2 0.00000
7  2 0.55556
13 2 0.88889
19 2 1.00000
25 2 0.88889
31 2 0.55556
37 2 0.00000
__END_FIXA__
14.8.1 Built in temporal functions
The following temporal functions are currently available in PETSc-FEM:
ramp: Double ramp function. See figure 22.

    φ(t) = φ1                if t ≤ t1
    φ(t) = φ1 + s(t − t1)    if t1 < t < t2          (138)
    φ(t) = φ2                if t ≥ t2

where the slope s is

    s = (φ2 − φ1)/(t2 − t1)          (139)
The parameters are
• start_time=t1 . (default 0.) The starting time of the ramp.
• start_value=φ1 . (default 0.) The starting value of the ramp.
• slope=s. (default 0.) The slope in the ramp region.
• end_time=t2 . (default =start_time) The end time of the ramp.
• end_value=φ2 . (default =start_value) The end value of the ramp.
Only one of slope and end_value must be specified. If the end values are not
defined, then they are assumed to be equal to the starting ones. If the end time
is equal to the starting time, then the end time is taken as ∞, i.e. a single ramp
starting at start_time.
Figure 22: Ramp function
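The ramp evaluation of equations (138)-(139) can be sketched as follows, assuming t1 < t2 (the degenerate case end_time = start_time, where the ramp extends to infinity, is omitted); the function name ramp and its argument order are illustrative.

```cpp
// Sketch of the "ramp" temporal function (eqs. 138-139), assuming t1 < t2.
double ramp(double t, double t1, double phi1, double t2, double phi2) {
  if (t <= t1) return phi1;
  if (t >= t2) return phi2;
  double s = (phi2 - phi1)/(t2 - t1);   // slope, eq. (139)
  return phi1 + s*(t - t1);             // ramp region, eq. (138)
}
```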
smooth_ramp: Smooth double ramp function (hyperbolic tangent). See figure 23.

    φ(t) = (φ1 + φ0)/2 + ((φ1 − φ0)/2) tanh((t − ts)/τ)          (140)
This function is somewhat analogous to ramp in the sense that it goes from a starting
constant value to another final one, but smoothly. Beware that the start and end
values are reached only for t → ±∞.
• switch_time=ts. (default 0.) The time at which φ passes through the mean value
φ̄ between φ0 and φ1.
• time_scale=τ. (default: none) This roughly represents the duration of the
change from the starting value to the end value. During the interval from ts − τ
to ts + τ (i.e. a total duration of 2τ) φ goes from φ0 + 0.12∆φ to φ0 + 0.88∆φ.
• start_value=φ0. (default 0.) The limit value for t → −∞.
• end_value=φ1. (default 1.) The limit value for t → +∞.
Figure 23: Smooth ramp with the hyperbolic tangent function
sin: Trigonometric sine function
φ(t) = φ0 + A sin(ωt + ϕ)
(141)
Figure 24: Sine function.
The parameters are
• mean_val=φ0 . (default 0.)
• amplitude=A. (default 1.)
• omega=ω. (default: none)
• frequency=ω/2π. (default: none)
• period=T = 2π/ω. (default: none)
• phase=ϕ. (default 0.)
Only one of omega, frequency or period must be specified.
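The three mutually exclusive ways of giving the frequency all reduce to a single ω. A hedged sketch of that resolution (using a negative value to mean "not specified", which is purely a convention of this example):

```cpp
#include <cmath>
#include <cassert>

// Resolve omega from whichever of omega/frequency/period was given;
// a negative value means "not specified" (a convention of this sketch).
double resolve_omega(double omega, double frequency, double period) {
  int given = (omega > 0) + (frequency > 0) + (period > 0);
  assert(given == 1);  // only one of the three must be specified
  if (omega > 0) return omega;
  if (frequency > 0) return 2.0*M_PI*frequency;
  return 2.0*M_PI/period;  // omega = 2*pi/T
}
```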
cos: Trigonometric cosine function

φ(t) = φ0 + A cos(ωt + ϕ)
(142)

The parameters are the same as for sin (see section 14.8.1).
piecewise: Piecewise linear function This defines a piecewise linear interpolation for a given series of time instants tj (ordered such that tj < tj+1 ) and amplitudes φj (1 ≤ j ≤ n) entered by the user. The interpolation is

φ(t) = 0                                          ; if t < t1 or t > tn
       φk + ((φk+1 − φk)/(tk+1 − tk)) (t − tk)    ; if tk < t < tk+1
(143)

If final_time is defined, then the function is extended periodically with period tn − t1 .
The parameters are
• npoints (integer) The number n of points to be entered.
• t (n doubles) The instants tj
• f (n doubles) The function values φj
• final_time (double) The function is extended from tn to this instant with
period tn − t1 .
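The interpolation rule (143) is short enough to sketch in self-contained C++ (an illustration only; the names and the vector-based interface are not the PETSc-FEM API):

```cpp
#include <vector>
#include <cassert>

// Piecewise linear interpolation, eq. (143); zero outside [t_1, t_n].
double piecewise(double t, const std::vector<double> &tv,
                 const std::vector<double> &fv) {
  int n = tv.size();
  if (t < tv[0] || t > tv[n-1]) return 0.0;
  for (int k = 0; k < n-1; k++)
    if (t <= tv[k+1])          // t falls in [t_k, t_{k+1}]
      return fv[k] + (fv[k+1] - fv[k])/(tv[k+1] - tv[k])*(t - tv[k]);
  return fv[n-1];
}
```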
spline: Spline interpolated function This is similar to piecewise but data is
smoothly interpolated using splines.
• npoints (integer) The number n of points to be entered.
• t (n doubles) The instants tj
• f (n doubles) The function values φj
• final_time (double) The function is extended from tn to this instant with
period tn − t1 .
spline periodic: Spline interpolated periodic function Spline interpolation with the spline function may give poor results, especially if you try to smoothly match the beginning and end of the period. spline_periodic may give better results, at the cost of being restricted to entering the data function on an equally spaced grid (tj+1 − tj = ∆t = cnst).
• npoints (integer) The number n of points to be entered.
• period (double) The period T of the function.
• start_time (double, default = 0.) The first time instant t1 . The remaining
instants are then defined as tj = t1 + [(j − 1)/(n − 1)] T .
• f (n doubles) The function values φj at times {tj }.
Example: The lines

fixa_amplitude spline_periodic
period 0.2
npoints 5
start_time 0.333
ampl_vals 1
0
-1
0.
1.
Defines a function with period T = 0.2 that takes the values

φ(0.333 + nT ) = 1
φ(0.333 + (n + 1/4)T ) = 0
φ(0.333 + (n + 1/2)T ) = −1
φ(0.333 + (n + 3/4)T ) = 0
(144)

Actually, the resulting interpolated function is simply

φ(t) = cos(2π (t − 0.333)/T )
(145)
Implementation details: If we define the phase θ as

θ = 2π (t − t1 )/T
(146)

then φ(θ) is periodic with period 2π. We can decompose it in two even functions φ± (θ) as

φ+ (θ) = (φ(θ) + φ(−θ))/2
φ− (θ) = (φ(θ) − φ(−θ))/(2 sin θ)
(147)

so that

φ(θ) = φ+ (θ) + sin(θ) φ− (θ)
(148)
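The identity (147)–(148) holds for any 2π-periodic φ; it can be checked numerically (the particular φ below is just an illustrative choice, and θ must avoid multiples of π where sin θ = 0):

```cpp
#include <cmath>
#include <cassert>

// An arbitrary 2*pi-periodic test function.
double phi(double th) { return cos(th) + 0.5*sin(th) + 0.2*cos(2.0*th); }

// Verify phi(th) = phi_plus(th) + sin(th)*phi_minus(th), eqs. (147)-(148).
bool check_decomposition(double th) {
  double fp = (phi(th) + phi(-th))/2.0;            // phi_plus, even
  double fm = (phi(th) - phi(-th))/(2.0*sin(th));  // phi_minus, even
  return fabs(fp + sin(th)*fm - phi(th)) < 1e-12;
}
```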
As φ± are even functions, they may be put in terms of x = (1 − cos θ)/2. So the xj and φ±j are computed and, by construction, only one half of the values (say j = 1 to j = m = (n − 1)/2 + 1) are relevant. The values φ−0 and φ−m have to be computed specially, since sin(θj ) = 0 for them. If we take the limit

φ−0 = lim θ→0 (φ(θ) − φ(−θ))/(2 sin θ)
(149)

then, if linear interpolation is assumed in each interval, it can be shown that

(φ(θ) − φ(−θ))/2 = ((φ2 − φn−1 )/∆θ) θ,   for |θ| < ∆θ
(150)

Then

lim θ→0 (φ(θ) − φ(−θ))/(2 sin θ) = (φ2 − φn−1 )/(2∆θ)
(151)

We could then take this value for φ−0 ; however, this introduces some noise. For instance, if φ(θ) = sin(θ) then φ+ ≡ 0 and φ−j = 1 for j ≠ 1, m, and (151) gives

lim θ→0 (φ(θ) − φ(−θ))/(2 sin θ) = sin ∆θ/∆θ

We introduce, then, a “correction factor”

∆θ/sin ∆θ
(152)

so that we define

φ−0 = (φ2 − φn−1 )/(2 sin ∆θ)
(153)

Analogously, we define

φ−m = (φm−1 − φm+1 )/(2 sin ∆θ)
(154)
14.8.2 Implementation details
Temporal boundary conditions are stored in an object of class Amplitude. This object contains a key string identifying the function and a text hash table containing the parameters for that function (for instance omega→3.5, amplitude→2.3, etc...). Recall that each node/field pair may depend on some free degrees of freedom (those in U in (123)) and some others fixed (those in Ū). For those fixed, there is an array containing both an amplitude (i.e. a double) and a pointer to an Amplitude object. If the condition doesn't depend on time, then the pointer is null.
14.8.3 How to add a new temporal function
If none of the built-in time dependent functions fits your needs, then you can add your own temporal function. Suppose you want a function of the form

φ(t) = 0              ; t < 0
       t/(1 + t/τ )   ; t ≥ 0
(155)
Follow these steps
1. Give a name to it ending in _function. We will use my_own_function in the
following.
2. Declare it with a line like
AmplitudeFunction my_own_function;
(In the core PETSc-FEM this is done in dofmap.h.)
3. Write the definition. (You can find typical definitions in file tempfun.cpp). You can
use the macro
SGETOPTDEF(<type>,<name>,<default value>);
for retrieving values from the hash table defined by the user in the data file.
4. Register it in the temporal function table with a call such as the following, preferably after the call to Amplitude::initialize_function_table in the main(). For
instance
Amplitude::initialize_function_table();
Amplitude::add_entry("smooth_ramp",
&smooth_ramp_function); // <- add this line
(In the core PETSc-FEM this is done inside Amplitude::initialize_function_table.)
5. Use it in your data files as follows

amplitude_function my_own
tau 3.0
__END_HASH__
1 2 3.
...
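Stripped of the PETSc-FEM plumbing (the AmplitudeFunction signature, the SGETOPTDEF read of tau from the option table), the numerical content of (155) amounts to a couple of lines. A self-contained sketch, with illustrative names:

```cpp
#include <cassert>

// Saturating ramp, eq. (155): 0 for t < 0, t/(1 + t/tau) for t >= 0.
// In the real my_own_function, tau would be read with SGETOPTDEF.
double my_own(double t, double tau) {
  if (t < 0.0) return 0.0;
  return t/(1.0 + t/tau);
}
```

Note that the function tends to the asymptotic value τ for t → ∞.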
14.8.4 Dynamically loaded amplitude functions
Another possibility to add new amplitude functions is through loading functions at runtime. This avoids re-linkage and modification of the sources. (Note: you need to have the module compiled with the appropriate flag: either the compilation flag -DUSE_DLEF or the Makefile variable USE_DYNAMICALLY_LOADED_EXTENDED_FUNCTIONS = yes, set either in the Makefile.base or the ../Makefile.defs file.)
These amplitude functions are written in C++ in source files, say fun.cpp. The compiled files have extension .efn (from “extension function”) for instance fun.efn in this
example. The Makefile.base file provides the appropriate rules to generate the compiled
.efn functions from the source. For each amplitude you want to define you have to write
three functions
extern "C" void <prefix>init_fun(TextHashTable *thash,void *&fun_data);
extern "C" double <prefix>eval_fun(double t,void *fun_data);
extern "C" void <prefix>clear_fun(void *fun_data) ; // this is optional
Prefix may be an empty string, or a non-empty string ending in underscore. Use of
non-empty prefix is explained below. The macros INIT_FUN, EVAL_FUN and CLEAR_FUN
expand to these definitions. The second function, eval_fun(...), is the one that defines the relation between the time and the corresponding amplitude. The first one, init_fun(...), can be used to perform some initialization; normally it is called only once. The last one, clear_fun(...), is called after all calls to eval_fun(...). For simple functions you may not need the init and clear functions; for instance, if you want a linear ramp from 0 at t = 0 to 1 at t = 1, it can be done with a simple file fun0.cpp like this:
// File: fun0.cpp
#include <math.h>
#include <src/ampli.h>
INIT_FUN {}
EVAL_FUN {
if (t < 0) return 0.;
else if (t > 1.) return 1.;
else return t;
}
CLEAR_FUN {}
You then have to do a

$ make fun0.efn

that will create the fun0.efn shared object file. Dynamically loaded functions can be used with the fixa_amplitude dl_generic clause, giving the name of the file with the ext_filename option. For instance,
<... previous lines here ...>
fixa_amplitude dl_generic
ext_filename "./fun0.efn"
__END_HASH__
1 1 1.
<... more node/dof/val combinations here ...>
__END_FIXA__
You can also have dynamically loaded functions that use parameters loaded via the table of options at run time. For this, the TextHashTable * object entered by the user is passed to the init function. You can then read parameters from it, but in order for the “init” function to do anything useful it has to be able to pass data to the “eval” function. Normally you define a “struct” or class to hold the data, create it with new or malloc(), put the data in it, and then pass its address via the “fun data” pointer. Later, in subsequent calls, eval_fun(...) is passed the same pointer so that you can recover the information after a static cast. Of course, in order to avoid memory leaks you have to free the allocated memory somewhere; this is done in the clear_fun(...) function. For instance, the following file defines a ramp function where you can set the four values t0 , f0 , t1 , f1
#include <math.h>
#include <src/fem.h>
#include <src/getprop.h>
#include <src/ampli.h>
struct MyFunData {
double f0, f1, t0, t1, slope;
};
INIT_FUN {
int ierr;
// Read values from option table
TGETOPTDEF(thash,double,t0,0.);
TGETOPTDEF(thash,double,t1,1.);
TGETOPTDEF(thash,double,f0,0.);
TGETOPTDEF(thash,double,f1,1.);
// create struct data and set values
MyFunData *d = new MyFunData;
fun_data = d;
d->f0 = f0;
d->f1 = f1;
d->t0 = t0;
d->t1 = t1;
d->slope = (f1-f0)/(t1-t0);
}
EVAL_FUN {
MyFunData *d = (MyFunData *) fun_data;
// define ramp function
if (t < d->t0) return d->f0;
else if (t > d->t1) return d->f1;
else return d->f0 + d->slope *(t - d->t0);
}
CLEAR_FUN {
// clear allocated memory
MyFunData *d = (MyFunData *) fun_data;
delete d;
fun_data=NULL;
}
Another possibility, perhaps simpler, would be to use a global MyFunData object; but if several fixa_amplitude entries that use the same function are present, then the data created by the first entry would be overwritten by the second entry.
14.8.5 Use of prefixes
Several functions can be written in the same .cpp file using a prefix ending in underscore. For instance, if you want to define a Gaussian function, then you define functions with <prefix>=gaussian_, for instance
extern "C" void gaussian_init_fun(TextHashTable *thash,void *&fun_data);
extern "C" double gaussian_eval_fun(double t,void *fun_data);
extern "C" void gaussian_clear_fun(void *fun_data) ; // this is optional
The macros INIT_FUN1(name), EVAL_FUN1(name) and CLEAR_FUN1(name) do it for you,
where name is the prefix, with the underscore removed, for instance
<... headers ...>
struct MyGaussFunData {
<... your data here ...>
};
INIT_FUN1(gaussian) {
<... read data...>
MyGaussFunData *d = new MyGaussFunData;
fun_data = d;
<... set data in d ...>
}
EVAL_FUN1(gaussian) {
MyGaussFunData *d = (MyGaussFunData *) fun_data;
<... use data in d and define amplitude ...>
}
CLEAR_FUN1(gaussian) {
// clear allocated memory
MyGaussFunData *d = (MyGaussFunData *) fun_data;
delete d;
fun_data=NULL;
}
Finally, yet another approach is to have a “wrapper” class with methods, init and eval.
The macro DEFINE_EXTENDED_AMPLITUDE_FUNCTION(class_name) is in charge of creating
and destroying the object. For instance the following file fun3.cpp defines two functions
linramp and tanh_ramp.
// File fun3.cpp
#include <src/ampli.h>
class linramp {
public:
double f0, f1, t0, t1, slope;
void init(TextHashTable *thash);
double eval(double);
};
void linramp::init(TextHashTable *thash) {
int ierr;
TGETOPTDEF_ND(thash,double,t0,0.);
TGETOPTDEF_ND(thash,double,t1,1.);
TGETOPTDEF_ND(thash,double,f0,0.);
TGETOPTDEF_ND(thash,double,f1,1.);
slope = (f1-f0)/(t1-t0);
}
double linramp::eval(double t) {
if (t < t0) return f0;
else if (t > t1) return f1;
else return f0 + slope *(t - t0);
}
DEFINE_EXTENDED_AMPLITUDE_FUNCTION(linramp);
//---:---<*>---:---<*>---:---<*>---:---<*>---:---<*>---:---<*>---:
class tanh_ramp {
public:
double A, t0, delta, base;
void init(TextHashTable *thash);
double eval(double);
};
void tanh_ramp::init(TextHashTable *thash) {
int ierr;
TGETOPTDEF_ND(thash,double,base,0.);
TGETOPTDEF_ND(thash,double,A,1.);
TGETOPTDEF_ND(thash,double,t0,0.);
TGETOPTDEF_ND(thash,double,delta,1.);
assert(delta>0.);
}
double tanh_ramp::eval(double t) {
if (t < t0) return base;
else return base + A * tanh((t-t0)/delta);
}
DEFINE_EXTENDED_AMPLITUDE_FUNCTION(tanh_ramp);
14.8.6 Time like problems.
The mechanism of passing time to boundary conditions and elemsets may be used to pass any other kind of data, since it is passed as a generic pointer void *time_data. This may be used to treat other problems where an external parameter exists, for instance
Continuation problems For instance, a NS solution at high Reynolds number may be obtained by continuation in Re, and then we can use Re as the external (time-like) parameter. In general, we can perform continuation in a set of parameters, so that the time_data variable should be an array of scalars.
Multiple right hand sides Here the time like variable may be an integer: the number of
right hand side case.
15 The compute prof package
15.1 MPI matrices in PETSc
When defining a matrix in PETSc with “MatCreateMPIAIJ” you tell PETSc (see the
PETSc documentation)
• How many rows of the matrix live in the actual processor (“m”).
• For each row of this processor, how many non null elements it has in the diagonal block (“d_nnz[j]”).
• For each row of this processor, how many non null elements it has in the off-diagonal block (“o_nnz[j]”).
In PETSc-FEM these quantities are computed in two steps.
• Call assemble with “ijob=COMP_MAT_PROF” or “ijob=COMP_FDJ_PROF” and an appropriate “jobinfo” in order to tell the element routine which matrix you are computing. This returns a Libretto dynamic array “*da” containing the sparse structure of the given matrix.
• The function “compute_prof” takes “da” as argument and fills “d_nnz” and “o_nnz”.
15.2 Profile determination
Let us describe first how the sparse profile is determined in the sequential (one processor only) case. We have to determine, for each row index i, all the column indices j that have a non null matrix entry Aij . The amount of storage needed is a matter of concern here. It is almost sure that we will need at least one integer per non-null entry. A first attempt is to have a dynamic array for each i index and store there all the connected j indices. In typical applications we have O(10^4) to O(10^5) nodes per processor and the number of connected j indices ranges from 10 to several hundred. In order to avoid this large number of small dynamic arrays growing we store all the indices in a big array, behaving as a singly linked list. Each entry in the array is composed of two integers (“struct Node”), one with the column index and the other pointing to the next entry for this row. Consider, for instance, a matrix with the following sparse structure:

A = | ∗ ∗ ∗ |
    | . ∗ . |
    | ∗ . ∗ |
(156)
Matrix coefficients (i, j) are introduced in the following order: (1,1), (2,2), (1,2), (3,1), (1,3), (3,3). The dynamic array for this matrix results in

Pos. in dyn. array:      0     1     2     3     4      5     6     7      8
Contents of dyn. array: (1,3) (2,4) (1,6) (2,5) (0,-1) (3,7) (3,8) (0,-1) (0,-1)

Rows 1, 2 and 3 start at positions 0, 1 and 2; the terminators at positions 7, 4 and 8 mark the end of rows 1, 2 and 3, respectively.

Figure 25: Structure of darray
For all i, the i-th row starts in position i − 1. The first component of the pair holds the value of j and the second the position of the next connected j index. The sequence ends with a terminator (0, −1). In this way, the storage needed is two integers per coefficient, with an overhead of two integers per row for the terminator, and there is only one dynamic array growing at the same time.
To insert a new coefficient, say (i, j), we traverse the i-th row, checking whether the j coefficient is already present or not; if the terminator is found, j is inserted in its place, pointing to a new terminator, which is placed at the end of the dynamic array.
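The insertion scheme can be sketched with a small self-contained structure (illustrative only: the real implementation stores the pairs in a Libretto dynamic array, with 1-based rows and (0, −1) terminators; here we use 0-based positions and −1 sentinels):

```cpp
#include <vector>
#include <cassert>

struct Node { int col, next; };  // (column index j, position of next entry)

// Sparse profile of an n x n matrix; row i (0-based) starts at slot i.
struct Profile {
  std::vector<Node> da;
  Profile(int n) : da(n) {
    for (int i = 0; i < n; i++) { da[i].col = -1; da[i].next = -1; }
  }
  void insert(int i, int j) {
    int p = i;
    while (da[p].next != -1) {       // traverse row i
      if (da[p].col == j) return;    // j already present, nothing to do
      p = da[p].next;
    }
    da[p].col = j;                   // overwrite the terminator with j...
    da[p].next = (int)da.size();     // ...pointing to a new terminator
    Node term = { -1, -1 };
    da.push_back(term);              // appended at the end of the array
  }
  int row_len(int i) {               // number of nonzeros in row i
    int p = i, len = 0;
    while (da[p].next != -1) { len++; p = da[p].next; }
    return len;
  }
};
```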
16 The PFMat class
The PFMat class is a matrix class that acts either as a wrapper to the PETSc Mat or to other representations of matrix/solvers. Currently there is the basic PETSc class named PETScMat and a class called IISDMat (for “Interface Iterative – Sub-domain Direct” method) that has a special solver that solves the linear system by iterating over the interface nodes, while solving with a direct method in the sub-domain interiors (this is commonly referred to as the “Domain Decomposition Method”).
16.1 The PFMat abstract interface
The create(...) member should create the matrix from the matrix profile da and the dofmap. For the PETScMat matrix class it calls the old compute_prof routine, calculating the d_nnz and o_nnz arrays, and calling the MatCreate routine. For the IISD matrix it has to determine which dofs are in the local blocks and create the appropriate numbering.
The set_value member is equivalent to the MatSetValues routine and allows entering values in the matrix. For the IISDMat class it sets the value in the appropriate block (the ALL , AIL , ALI or AII PETSc matrices). In addition, for the ALL (“local-local”) block it has to “buffer” those values that are not in this processor (this can happen when dealing with periodic boundary conditions or bad partitionings, for instance an element that is connected to nodes that all belong to another processor; this last case is not the most common but it can happen).
Once you have filled the entries in the matrix you have to call the assembly_begin()
and assembly_end() members (as in PETSc).
The solve(...) member solves a linear system associated to the given operator.
The zero_entries() member is also the counterpart of the corresponding PETSc routine.
The build_sles() member creates internally the SLES needed by the solution (including the preconditioner). It takes as an argument a TextHashTable from which it takes a series of options. The destroy_sles() member has to be called afterwards, in order to destroy it (and free space). The monitor() member allows the user to redefine the convergence monitoring routine.
The view() member prints operator information to the output.
16.2 IISD solver
Let’s consider a mesh as in figure 26, partitioned such that a certain number of elements
and nodes belong to processor 0 and others to processor 1. We assume that one unknown
Figure 26: IISD decomposition by subdomains
is associated to each node and no Dirichlet boundary conditions are imposed, so that each node corresponds to one unknown. We split the unknowns in three disjoint subsets L1 , L2 and I such that the nodes in L1 are not connected to those in L2 (i.e. they do not share an element, and then the FEM matrix elements Aij and Aji with i ∈ L1 and j ∈ L2 are null). The matrix is split in blocks as follows

A = | ALL  ALI |        ALL = | A00   0  |
    | AIL  AII |              |  0   A11 |

ALI = | A0I |           AIL = | AI0  AI1 |
      | A1I |
(157)
Now consider the system of equations

Ax = b
(158)

which is split as

ALL xL + ALI xI = bL
AIL xL + AII xI = bI
(159)
Now consider eliminating xL from the first equation and replacing it in the second, so that we have an equation for xI

(AII − AIL ALL⁻¹ ALI ) xI = (bI − AIL ALL⁻¹ bL )
Ã xI = b̃I
(160)

We consider solving this system of equations by an iterative method such as GMRES, for instance. For such an iterative method, we only have to specify how to compute the modified right hand side b̃ and how to compute the matrix-vector product y = Ã x.
Computing the matrix product involves the following steps
1. Compute y = AII x
2. Compute w = ALI x
3. Solve ALL z = w for z
4. Compute v = AIL z
5. Add y ← y − v
involving three matrix products with matrices AII , AIL and ALI , and solving the system with ALL . As the matrix ALL has no elements connecting unknowns in different processors, the solution of this system may be computed very efficiently in parallel.
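With scalar stand-ins for the four blocks, the five steps above can be sketched as follows (a toy illustration only; in the real solver each block is a distributed sparse matrix and step 3 is a factorized direct solve with ALL):

```cpp
#include <cassert>

// y = (AII - AIL*inv(ALL)*ALI)*x, eq. (160), with scalar "blocks".
double schur_matvec(double ALL, double ALI, double AIL, double AII, double x) {
  double y = AII*x;   // 1. y = AII x
  double w = ALI*x;   // 2. w = ALI x
  double z = w/ALL;   // 3. solve ALL z = w
  double v = AIL*z;   // 4. v = AIL z
  return y - v;       // 5. y <- y - v
}
```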
16.2.1 Interface preconditioning
To improve convergence of the interface problem (160) some preconditioning can be introduced. As the interface matrix is never built, true Jacobi preconditioning (i.e. using the diagonal part of the Schur complement, P = diag Ã, where P is the preconditioning matrix) can not be used. However, we can use the diagonal part of the interface matrix (P = diag AII ) or even the whole interface matrix (P = AII ). Using the diagonal preconditioning helps to reduce bad conditioning due to refinement and inter-equation bad scaling. If the whole interface matrix is used (P = AII ) then the linear system AII w = x has to be solved at each iteration. This can be done with a direct solver or iteratively. In the 2D case the connectivity of the interface matrix is 1D or 1D like, so that the direct option is possible. However, in 3D the connectivity is 2D like, and the direct solver is much more expensive. In addition, the interface matrix is scattered among all processors. The iterative solution is much more appealing, since the fact that the matrix is scattered among processors is not a problem. In addition, the interface matrix is usually diagonally dominant, even when the whole matrix is not. For instance, for the Laplace equation in 2D the stencil of the interface matrix on a planar interface in a homogeneous grid of step h with bilinear quad elements is [−1 8 −1]/3h². Such a matrix has a condition number which is independent of h and is asymptotically κ(AII ) = 5/3. In the case of using triangular elements (by the way, this is equivalent to the finite difference case) the stencil is [−1 4 −1]/2h², whose condition number is κ(AII ) = 3. This low condition number also favors the use of an iterative method. However, a disadvantage of the iterative solution is that non-stationary iterative solvers (i.e., those where the next iteration xk+1 doesn't depend only on the previous one: xk+1 = f (xk ), like CG or GMRES) can not be nested inside another non-stationary method, unless the inner preconditioning loop is iterated to a very low error. This is because the conjugate gradient directions will lose orthogonality. But using a very low error bound for the preconditioning problem may be expensive, so that non-stationary iterative methods are discarded. Then, the use of Richardson iteration is suggested. Sometimes this can diverge and some under-relaxation must be used. Options controlling iteration for the preconditioning problem can be found in section § 4.4.3.
16.3 Implementation details of the IISD solver
• Currently unknowns and elements are partitioned in the same way as for the PETSc
solver. The best partitioning criteria could be different for this solver than for the
PETSc iterative solver.
Figure 27: IISD decomposition by subdomains. Actual decomposition.
• Selecting “interface” and “local” dof's: One strategy could be to mark as “interface” all dof's that are connected to a dof in another processor. However, this could lead to an “interface” dof set twice larger on average than the minimum needed. As the number of nodes in the “interface” set determines the size of the interface problem (160), it is clear that we should try to choose an interface set as small as possible.
In read_mesh() partitioning is done on the dual graph, i.e. on the elements. Nodes are then partitioned in the following way: a node that is connected to elements in different processors is assigned to the highest numbered processor. Referring to the mesh in figure 26 and with the same element partitioning, all nodes in the interface would belong to processor 1, as shown in figure 27.
Now, if a dof i is connected to a dof j on another processor, we mark as “interface” the dof that belongs to the highest numbered processor. So, in the mesh of figure 27 all dof's in the interface between element sub-domains are marked to belong to processor 1. The nodes in the shadowed strip belong to processor 0 and are connected to nodes in processor 1, but they are not marked as “interface” since they belong to the lowest numbered processor. Note that this strategy leads to an interface set of 4 nodes, whereas the simpler strategy mentioned first would lead to an interface set of 8 (i.e. including the nodes in the shadowed strip), which is two times larger.
Figure 28: Non local element contribution due to bad partitioning
• The IISDMat matrix object contains three MPI PETSc matrices for the ALI , AIL and AII blocks, and a sequential PETSc matrix in each processor for the local part of the ALL block. The ALL block must be defined as sequential because otherwise we couldn't factorize it with the LU solver of PETSc. However, this imposes the constraint that MatSetValues has to be called in each processor for the matrix that belongs to its block, i.e. elements in a given processor shouldn't contribute to ALL elements in other processors. Normally this is so, but for some reasons this condition may be violated. One is periodic boundary conditions and constraints in general (they are not taken into account for the partitioning). Another reason is very bad partitioning, which may arise in some not so common situations. Consider for instance figure 28. Due to bad partitioning a rather isolated element e belongs to processor 0, while being surrounded by elements in processor 1. Now, as nodes are assigned to the highest numbered processor of the elements connected to the node, nodes p, q and r are assigned to processor 1. But then, nodes q and r will belong to the local subset of processor 1 but will receive contributions from element e in processor 0. However, the solution is not to define these matrices as distributed PETSc matrices because, so far, PETSc doesn't support a distributed LU factorization. The solution we devised is to store those ALL contributions that belong to other processors in a temporary buffer, and afterwards to send those contributions to the correct processors directly with MPI messages. This is performed with the DistMatrix object A_LL_other.
16.4 Efficiency considerations
The uploading time of elements in PETSc matrices can be significantly reduced by using “block uploading”, i.e. uploading an array of values corresponding to a rectangular submatrix (not necessarily with contiguous indices) instead of uploading one element at a time. The following code snippets show both types of uploading.
// ELEMENT BY ELEMENT UPLOADING
// The global matrix
Mat A;
// row and column indices (both of length ‘nen’)
int *row_indx,*col_indx;
// Elementary matrix (size ‘nen*nen’)
double *Ae;
// ... define row_indx, col_indx and fill Ae
for (int j=0; j<nen; j++) {
  for (int k=0; k<nen; k++) {
    ierr = MatSetValue(A,row_indx[j],col_indx[k],Ae[j*nen+k],ADD_VALUES);
  }
}
// BLOCK UPLOADING
// ... same stuff as before ...
ierr = MatSetValues(A,nen,row_indx,nen,col_indx,Ae,ADD_VALUES);
In PETSc-FEM, the computed elemental matrices can be uploaded in the global matrices with both methods, as selected with the block_uploading global option (set to 1 by default, i.e. use block uploading). However, in some cases block uploading can actually be slower, due to the use of “masks”. A mask is a matrix of the same size as the elemental matrix with 0's or 1's indicating whether some coefficients are (structurally, i.e. not for a particular state) null. Moreover, the mask does not depend on the particular element; it is rather a property of the terms being evaluated in the Jacobian.
For instance, for the Navier-Stokes equations the Galerkin term only has non null coefficients for the velocity unknowns in the continuity equation, while the pressure gradient term only has coefficients for the pressure unknowns in the momentum equations. Both terms together have a mask as shown in figure 29. When the application writer codes such a term, he defines the mask. At the moment of uploading the elements, if block_uploading is in effect, then PETSc-FEM computes the “envelope” of the mask, i.e. the rectangular mask that contains the mask, in order to make just one call to MatSetValues. In this case, the envelope is just a matrix filled with 1's, so that block uploading pays the benefit of using the faster MatSetValues routine at the cost of loading many more coefficients than the original mask. In addition, the PETSc matrix will be bigger, with the corresponding increase in RAM demand and CPU time in computing factorizations (IISD solver) and matrix/vector products (PETSc solver). (In the future, such a combination of terms will be loaded more efficiently with two calls to MatSetValues.)
The conclusion is that, if the terms to be loaded have a very sparse structure but a dense envelope, then block uploading may be slower. (The worst case is a diagonal-like mask.) Note also that it's not sufficient to have a sparse structure in the elemental matrix; the application writer also has to compute and return the mask. Finally, note that you can always check whether block uploading is faster or slower by activating the time statistics for the elemset (the report_consumed_time option) and running a large example with both kinds of uploading.
Ae,mask =
              ux  uy  uz  p
momentum x  |  0   0   0  1 |
momentum y  |  0   0   0  1 |
momentum z  |  0   0   0  1 |
continuity  |  1   1   1  0 |

(The 1's in the last column come from the pressure gradient term in the momentum equations; the 1's in the last row come from the Galerkin term in the continuity equation.)

Figure 29: Example of mask for the continuity equation (Galerkin term) in the Navier-Stokes equations
17 The DistMap class
This class is an STL map<Key,Val> template container, where each process can insert values and, at a certain point, a scatter() call sends contributions to the corresponding processor.
17.1 Abstract interface
To insert or access values one uses the standard insert() member or the [] operator
of the basic class. The processor(iterator k) member (to be defined by the user of
the template) should return the process rank to which the pair (key,val) pointed by
k should belong. After calling the scatter() member by all processes, all entries are
sent to their processors. In order to send the data across processors, the user has to
define the size_of_pack() and pack() routines. The pack() and unpack() functions in
the BufferPack namespace can help to do that for basic types (i.e. those for which the
sizeof() operator and the libc memcpy() routine work). Finally the combine member
defines how new entries that have been sent from other processors have to be merged in
the local object.
17.2 Implementation details
When calling the scatter member, each entry in the basic container is scanned and, if it doesn't belong to the processor, it is packed (using the pack member) into a buffer to be sent to the corresponding processor. Once all the data to be sent is buffered, the entries are scanned again and the entries that have been buffered are deleted.
The buffers are sent to the other processors following a strategy preventing deadlock, in nproc − 1 stages, where nproc is the number of processors. In the k-th stage, data is sent from processor j to processor (j + k)%nproc , where % is the remainder operator as in the C language. In each stage, all processors other than the server first do an MPI_Recv() and afterwards an MPI_Send(), while the server does it in the other sense, i.e. first an MPI_Send() and afterwards an MPI_Recv(), initiating the process.
Figure 30: Simple scheduling algorithm for transferring data among processors (S = send, R = receive)
This is a rather inefficient strategy because at each stage all sending is tied to the previous one, making the whole process take ∼ nproc² T, where T is the time needed to send a typical individual buffer from one process to another.
Figure 31: Improved scheduling algorithm for transferring data among processors (E = exchange)
An improved strategy consists in a multilevel scheduling algorithm. Assume first
that nproc is even and divide the processors into the subsets S1 = {0, 1, . . . , nproc/2 − 1} and S2 =
{nproc/2, nproc/2 + 1, . . . , nproc − 1}. We first exchange all the information between processors
in S1 and those in S2 in nproc/2 stages. In stage 0 (see figure 31) processor 0 exchanges
data with nproc/2, 1 with nproc/2 + 1, . . . , and nproc/2 − 1 with nproc − 1, i.e. processor 0 calls
exchange(nproc/2) and processor nproc/2 calls exchange(0); the code of procedure exchange()
is shown below. In stage 1 processor 0 exchanges with nproc/2 + 1,
1 with nproc/2 + 2, . . . , nproc/2 − 2 with nproc − 1 and nproc/2 − 1 with nproc/2. In general, in stage k processor j exchanges
with processor (nproc/2) + (j + k)%(nproc/2). Note that in each stage all communications are
paired and can be performed simultaneously, so that each stage can be performed in time 2T
(sending plus receiving). These nproc/2 stages then take a total of 2(nproc/2)T = nproc T
seconds. Now all the communication between processors in S1 and processors in S2 has been performed; it
only remains to perform the communication among the processors within S1 and among those
within S2. But then we can apply the same idea recursively and divide S1 into two
subsets S11 and S12 of nproc/4 processors each (let's assume that the number of processors is a
power of 2, nproc = 2^m), with a required time of (nproc/2)T, and similarly for S2. Applying the idea recursively
we arrive at an estimate for the total time of

T(n) = [nproc + (nproc/2) + · · · + 1] T = (2nproc − 1) T ∼ O(2nproc T)    (161)

which is significantly better than the previous algorithm.
// Scheduling algorithm for exchanging data between processors.
// The two branches are ordered so that the paired calls cannot
// deadlock: the higher-ranked processor sends first, the
// lower-ranked one receives first.
void exchange(j) {
  if (myrank>j) {
    // send data to processor j ...
    // receive data from processor j ...
  } else {
    // receive data from processor j ...
    // send data to processor j ...
  }
}
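To make the pairing concrete, the following standalone sketch (not part of PETSc-FEM; the function names are illustrative) computes the communication partner of each processor in a given stage, for both the simple and the improved strategies:

```cpp
#include <cassert>

// Simple strategy: in stage k (k = 1..nproc-1) processor j
// sends its buffer to processor (j+k) % nproc.
int simple_partner(int j, int k, int nproc) {
  return (j + k) % nproc;
}

// Improved strategy, first round: in stage k processor j < nproc/2
// exchanges with (nproc/2) + (j+k) % (nproc/2); processors in the
// upper half get the inverse mapping, so all exchanges are paired.
int paired_partner(int j, int k, int nproc) {
  int h = nproc / 2;
  if (j < h) return h + (j + k) % h;
  for (int i = 0; i < h; i++)
    if (h + (i + k) % h == j) return i;
  return -1; // not reached for 0 <= j < nproc
}
```

Note that paired_partner(paired_partner(j,k,nproc),k,nproc) == j, i.e. in every stage the exchanges are properly paired and can all proceed simultaneously.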
17.3 Mesh refinement
[Warning: This is a work in progress.]
Figure 32: FEM mesh and graph representation
Figure 33: Geometrical objects: 5 nodes and an edge
We conceive a mesh as a graph, connecting nodes and other higher-dimension entities
based on these nodes. Consider for example five nodes (labeled from 0 to 4) and an
edge connecting two of these nodes, as in figure 33. In total, there are 6 “geometrical
objects” (5 nodes and the edge) in the figure. In order to identify higher-order objects
like the edge, we could add a new index for it, say index 5, but instead we can associate
the edge with the connected pair of node indices, (edge 2 3). Note that, if we consider
that the edge has no orientation, then the sequence must be considered as a “set”, i.e.
(edge 2 3) = (edge 3 2) (Lisp-like “S-exp” expressions will be used throughout in order
to describe objects), whereas if orientation matters, then (edge 2 3) ≠ (edge 3 2). We
could then define geometrical objects of type edge as unordered sequences of two node
indices, while the type ordered-edge is associated with ordered sequences of two nodes.
We say that the set of permutations that leaves edge objects invariant is ((0 1) (1 0)),
whereas for the oriented edge it is only the identity permutation ((0 1)). For larger objects,
the set of permutations that leaves invariant the node sequence that defines the object is
more complex than that; it doesn't reduce to the special case of ordered and unordered
sequences as for edges. Consider for instance the case of triangles. For an oriented triangle,
the set of permutations that leave the triangle invariant is ((0 1 2) (1 2 0) (2 0 1)), i.e.
the cyclic shifts of the node sequence, whereas for unordered objects
the permutations are the whole set ((0 1 2) (1 2 0) (2 0 1) (0 2 1) (1 0 2) (2 1 0)).
In the last case, the permutations that leave the object invariant are all
6 permutations of 3 elements, so that in this case we can say that unordered triangles
can be represented as unordered sets of three nodes. But for the oriented triangle
(a b c) is the same as (b c a), so that associating oriented triangles with plain ordered
sequences does not take into account the rotational symmetry.
17.3.1 Symmetry group generator
The set of permutations (perm SHAPE) that leave the geometrical object SHAPE invariant
is a “group”; that means that if two permutations p, q belong to (perm SHAPE), then the
composition pq, i.e. the permutation resulting from applying first q and then p, also belongs
to the group, as well as the product qp. In general, this is a finite group.
As a consequence, if we have two elements p and q of a group G, then pq, qp, p²q, pqp,
pq², ... all belong to G. Given a set of permutations S ⊂ G, the set pS formed by the left
product of all the elements of S with p belongs to G. The same applies to Sp, qS and
Sq. So, starting with the set S0 = {1}, where 1 is the identity permutation, we can
generate recursively a larger set of symmetries by forming S1 = S0 ∪ pS0 ∪ S0p ∪ qS0 ∪ S0q.
As the total number of permutations in the group is finite (at most n!, where n is the
number of nodes in the object), applying the relation above recursively ends with a
set of permutations H that is itself a group included in G. We call it the group
“generated” by p and q. This can be applied to any number of generators p, q, r, s, ....
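As an illustration of this closure procedure (a sketch with names of our own choosing, not the PETSc-FEM implementation), the group generated by a set of permutations can be computed by repeated left and right multiplication until no new elements appear:

```cpp
#include <set>
#include <vector>

typedef std::vector<int> perm_t;

// Composition pq: apply q first, then p.
perm_t compose(const perm_t &p, const perm_t &q) {
  perm_t r(p.size());
  for (int j = 0; j < (int)p.size(); j++) r[j] = p[q[j]];
  return r;
}

// Closure: the group generated by `gens', starting from the identity.
std::set<perm_t> generate(const std::vector<perm_t> &gens) {
  int n = (int)gens[0].size();
  perm_t id(n);
  for (int j = 0; j < n; j++) id[j] = j;
  std::set<perm_t> G;
  G.insert(id);
  bool grew = true;
  while (grew) {
    std::set<perm_t> next(G);
    for (std::set<perm_t>::const_iterator p = G.begin(); p != G.end(); ++p)
      for (int g = 0; g < (int)gens.size(); g++) {
        next.insert(compose(gens[g], *p)); // left product
        next.insert(compose(*p, gens[g])); // right product
      }
    grew = next.size() > G.size();
    G.swap(next);
  }
  return G;
}
```

For instance, generate({{1,2,0}}) yields the 3 rotations of the oriented triangle, while adding the inversion {0,2,1} as a second generator yields all 6 permutations.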
For instance, the symmetries of the oriented triangle can be generated with the permutation p = (1 2 0), since (2 0 1) can be generated as p². The symmetries of the
unoriented triangle can be derived from the generators (1 2 0) (rotation) and (0 2 1) (inversion). The most relevant geometrical objects and their symmetries are:
Figure 34: Generators for triangle
• Unoriented triangles are described by numbering their nodes in any way. Their
symmetries are all the permutations (6 = 3! in total) and are generated by a rotation
and an inversion.
• Oriented triangles are described by numbering their nodes in a specified direction.
Their symmetries are 3 in total, generated by a rotation.
• Unoriented quadrangles are described by numbering their nodes in clockwise or
counterclockwise rotation. Symmetry generators are (1 2 3 0) (rotation) and
(0 3 2 1) (inversion); there is a total of 8 permutations.
Figure 35: Generators for quadrangle
• Oriented quadrangles are identical to unoriented quadrangles but do not include the
inversion (4 rotations).
Figure 36: Generators for tetras
• Unoriented tetrahedra are described by numbering one of the faces, and then the
opposite node. Symmetries are generated by rotations of any two of the faces, for
instance (1 2 0 3) and (3 0 2 1), and an inversion (1 0 2 3) (24 permutations in
total).
• Oriented tetrahedra are identical to unoriented tetrahedra without the inversion (12 permutations in total).
Figure 37: Generators for hexas
• Unoriented hexahedra are described by numbering one of the faces as a quad, and then the
opposite face in correspondence with the first face. Symmetries are generated by 90°
rotations about any two non-opposite faces, for instance (1 2 3 0 5 6 7 4)
and (1 5 6 2 0 4 7 3), and an inversion, for instance (1 0 3 2 5 4 7 6) (48 permutations in total).
• Oriented hexahedra are identical to unoriented hexahedra without the inversion (24 permutations in total).
• Unoriented prisms are described by numbering one of the triangular faces as a triangle, and then the opposite face in correspondence with the first face. Symmetries
are generated by 180° rotations about any two of the quad faces ((4 3 5 1 0 2) for
instance), a 120° rotation about any of the triangular faces ((1 2 0 4 5 3) for instance)
and an inversion ((1 0 2 4 3 5) for instance). There are 12 permutations in total.
Figure 38: Generators for prisms
• Oriented prisms are identical to unoriented prisms without the inversion (6 permutations in total).
17.3.2 Canonical ordering
One of the most common operations when manipulating these geometrical objects is, given
two geometrical objects of the same type, to determine whether they represent the same
object. The brute-force solution is to apply all the permutations to one of them and check
whether one of the permuted node sequences coincides with the other.
17.4 Permutation tree
In order to make this comparison more efficient we can store all the permutations for a given shape in
a tree-like fashion. Consider, for instance, the oriented tetrahedral shape. Its permutations are the following:
(0 1 2 3)
(0 2 3 1)
(0 3 1 2)
(1 0 3 2)
(1 2 0 3)
(1 3 2 0)
(2 0 1 3)
(2 1 3 0)
(2 3 0 1)
(3 0 2 1)
(3 1 0 2)
(3 2 1 0)
We can describe the generation of a new node numbering in the following way. First, we
can take any of the nodes as the first node of the new numbering. This is seen from the
fact that all the indices (0 to 3) are present in the first column of the table above. If we
choose node 3 as the first index, then we can choose any of the remaining ones (0, 1 or 2) as the second node.
Once we choose the second index (say, for instance, 1) there is only one possibility
for the remaining two: the numbering (3 1 0 2). The remaining possibility (3 1 2 0)
would generate an inverted tetrahedron. Part of the tree is shown in figure 39. Every possible
permutation is a “path” in the tree from the root at level 0 to a leaf.
Figure 39: Tree representing all the permutations for the ordered tetra geometry.
Now, given two possible node orderings for a given geometrical object, and the tree that
describes the permutations for its shape, we simply have to follow the path that maps
one of them onto the other. If we arrive at an internal node with no possibility to
continue, then the geometrical objects are distinct. If we reach a leaf, then the objects are
the same.
17.5 Canonical ordering
Another possibility to determine whether two node orderings are congruent is to have a
uniquely determined ordering that can be computed from the node sequence itself. We
call this ordering the “canonical” ordering for the geometrical object. If this is possible,
then given two orderings we can bring both of them to the canonical ordering and then
compare them as plain sequences. One possibility is to take as canonical order the one
that gives the lowest node sequence, in lexicographical order.
This has the advantage that once the canonical ordering is known for both objects,
the comparison is very cheap. Also, it can be used as a comparison operator for sorting
geometrical objects, i.e. the order between two objects is the lexicographical order between
the two node sequences in their canonical form.
The canonical order can be computed efficiently using the permutation tree described
above.
17.6 Object hashing
Another useful technique for comparing objects is to compare first some scalar function
of the node indices. For instance, we can compute a “hash-sum” for the object go as

H(go) = ∑_{j=0}^{n−1} h((node go j))    (162)

where (node go j) is the j-th node of object go, H is the hash-sum for the geometrical
object go, and h is a “scalar hashing function”. A very simple possibility is to take
h as the identity, h(x) = x. In that case the hash of the object is simply the sum of its
node indices. The idea is that, even for this simple hashing function, we can compare
first the hash-sums of two objects to determine whether they may be equal. If there is a high
probability of the objects being unequal, then in most cases this simple check will suffice.
If the hash-sums are equal, then the full comparison procedure, as described above, must
be performed. Note that the hash-sum is (and must be) independent of the ordering.
In order to reduce the probability that unequal objects give the same hash-sum we
can devise better hashing functions. Two kinds of hashing functions will be discussed:
functions that hash a sequence of numbers taking into account
the ordering of the sequence (“hash-sequence functions”) and functions that hash the sequence in a way independent
of the ordering (“hash-sum functions”).
Consider first the hash-sequence functions. If we consider sequences of integers, then
the hash function should return a scalar value for each sequence of integers, in such a way
that if two sequences are distinct (i.e. they have distinct numbers, or the same numbers
in different order) they should have different hash values. If we restrict ourselves to 32-bit
integers, unsigned integers are bounded by 2^32 and there are 2^(32n) distinct sequences of
n integers, so that it is impossible to have such a hashing function. If for two distinct
sequences we have the same hash value, then we say that there is a “collision” in the
hashing process. We then try to have a hashing function that minimizes the number of
collisions. Suppose we consider the set of sequences of n 32-bit integers.
We have N = 2^(32n) sequences and M = 2^32 hash values, and in the best case we would
have N/M = 2^(32(n−1)) sequences for each hash value. If we generate m distinct (random)
sequences, with m ≪ N, then, if the hashing function is good, there is very little probability
of having collisions between them, and the probability of collision is almost as by chance, i.e.
m/M. Then the number of collisions is, approximately,
nbr. of collisions = ∑_{j=0}^{m} j/M ∝ m²/M    (163)
We consider the following sequence-hashing functions:
• Hasher (SVID rand48 functions). If we have some kind of pseudo-random generator of the form y = rand(s), where s is the seed, then we can build a hashing function
with the following pseudocode

int hash(int *x,int n) {
  int v = 0;
  for (int j=0; j<n; j++) {
    v = rand(f(v,x[j]));
  }
  return v;
}

where f(v,x) is some binary function that combines the values of the current state v
and the incoming sequence element x[j]. The rand48 hasher is based on the SVID random
functions shipped with the GNU C library (version 5.3.12
at the moment of writing this).
• FastHasher. This is based on a simple pseudo-random function of the form

int rand(int v,int x) {
  v ^= x;
  int y = (v+x)*(v+x); // i.e. (v+x) squared
  y = y % MAX;
  y = y ^ m;
  return y;
}

int hash(int *w,int n) {
  int v = c;
  for (int j=0; j<n; j++) {
    v = rand(v,w[j]);
  }
  return v;
}

where MAX = 2^32 and c = 0x238e1f29, m = 0x6b8b4567.
• MD5Hasher. This is based on the MD5 routines from RSA. This is an elaborate
algorithm that creates a 16-byte digest from a string of characters. We take as
hash value the first 4 bytes of this digest.
18 Synchronized buffer
One difficult task in parallel programming is printing from the slave nodes. In MPI, in
general, it is not guaranteed that printing from the nodes is possible, and in the MPICH
implementation output from the nodes gets all mixed and scrambled.
PETSc provides functions in order to facilitate this task. The user can call
PetscSynchronizedPrintf(...) as many times as desired on each node. The
output is concatenated on each node into a buffer, and then a collective call to
PetscSynchronizedFlush(...) flushes all the buffers in order to the standard output.
There is a similar function for files, PetscSynchronizedFPrintf(...), but it turns out
that the flushing of standard output and files is mixed. In addition, even in the case of
writing only to standard output, the output is not sorted properly.
The objective of the SyncBuffer<T> template class and the KeyedOutputBuffer class
is to have a synchronized output device that sorts the lines written by the nodes. The
idea is that one defines a class (say KeyedObject) that must support the following member
functions
class KeyedObject {
public:
// Default constructor
KeyedObject();
// Copy Ctor
KeyedObject(const KeyedObject &ko);
// Dtor.
~KeyedObject();
// Used for sorting the lines
friend int operator<(const KeyedObject& left,
const KeyedObject& right);
// Call back for the distributed container. Return
// size of the buffer needed to store this element.
int size_of_pack() const;
// Effectively packs the object into the buffer,
// upgrading the pointer.
void pack(char *&buff) const;
// Extracts the object from the buffer, upgrading the pointer.
void unpack(const char *& buff);
// Print the object
void print();
};
Then the user can load objects into the buffer, and finally call the flush() method to
dump the actual content of the buffer to the output. The flush() is equivalent to
• sending all objects to the server, deleting them from the original node,
• sorting them according to the operator<(...) defined, and
• calling print(...) on all of them on the server.
The underlying container is a list, so you can manipulate it with the standard list accessors, but the preferred way is to push elements with push_back(KeyedObject obj), or to push
a clean object with push_back() and then access it with back(). Note that
you must implement the copy constructor for the KeyedObject class. Objects with the
same key are not overwritten, i.e. if several elements with the same key are loaded on the
same or different processors, then all of them are printed to the output. As the sorting
algorithm used is stable, objects with the same key loaded in the same processor remain
in the same order as they were entered. Typical usage is as follows
#include <src/syncbuff.h>
class KeyedObject {
// define methods as declared above
};
SYNC_BUFFER_FUNCTIONS(KeyedObject);
int main() {
SyncBuffer<KeyedObject> sb;
// Insert objects
sb.push_back(obj1);
// ...
sb.push_back(obj2);
// flush the buffer.
sb.flush();
}
The macro SYNC_BUFFER_FUNCTIONS(...) takes as argument the name of the basic object
class and generates a series of wrapper functions. (This should be done in the templates
themselves, but due to current limitations in template specialization it has to be done through
macros.)
18.1 A more specialized class
A simpler class derived from SyncBuffer<...> has been written. This class is based
on SyncBuffer<...>, using as KeyedObject the class KeyedLine, which simply holds an
integer and a C-string. Typical usage is
#include <src/syncbuff.h>
#include <src/syncbuff2.h>
KeyedOutputBuffer kbuff;
AutoString s;
for ( ... ) {
s.clear();
// Load string ‘s’ with ‘cat_sprintf’ functions
// ....
// Load in buffer
kbuff.push(k,s);
}
kbuff.flush();
Also, you can directly use printf(...)-like semantics in the following way
#include <src/syncbuff.h>
#include <src/syncbuff2.h>
KeyedOutputBuffer kbuff;
int key;
for ( ... ) {
// Load internal string with ‘printf’ and ‘cat_printf’ functions
kbuff.printf("one line %d %d %d %f\n",i,j,k,a);
kbuff.cat_printf("other line %d %d %d\n",m,n,p);
// Push in buffer with key ‘q’. ‘push()’ also clears the
// internal string
kbuff.push(q);
}
kbuff.flush();
If you want to dump the buffer to another stream, a file for instance, then you have
to set the static FILE *KeyedLine::output field of the KeyedLine class. Also, the
static int KeyedLine::print_keys flag controls whether the key number is printed at
the beginning of the line. For instance, the following code sends output to the
output.dat file without key numbers.
FILE *out = fopen("output.dat","w");
KeyedLine::output = out;
KeyedLine::print_keys = 0;
kbuff.flush();
fclose(out);
Also, the member int KeyedOutputBuffer::sort_by_key (default 1) controls whether
to sort by keys prior to printing. You can set
kbuff.sort_by_key = 0;
if you want to disable sorting.
Some notes regarding usage of this class are:
• You can have several SyncBuffer<..>’s or KeyedOutputBuffer’s at the same time,
and you can flush(...) them independently.
• Memory usage: all items sent to the buffer with push() are kept in memory in a
temporary buffer. When flush() is called, all objects are sent to the master, printed,
and all buffers are cleared. So you must guarantee enough memory for
all these operations.
• Implementation details: data is sent from the nodes to the master with point-to-point
MPI operations, which is far more efficient than having all nodes write to a file via
NFS. Sorting of the objects by key on the master is done using the sort() algorithm
of the list<...> STL container, which takes O(N log N) operations.
19 Authors
Mario A. Storti (CIMEC) PETSc-FEM kernel, NS and AdvDif modules.
Norberto M. Nigro (CIMEC) PETSc-FEM kernel, NS and AdvDif modules, multi-phase
flow.
Rodrigo R. Paz (CIMEC) AdvDif module, hydrology module, compressible flow, fluid-structure interaction.
Lisandro Dalcín (CIMEC) PETSc-FEM kernel, Python scripting extension, linear algebra, preconditioners.
Ezequiel López (CIMEC) Mesh relocation algorithms (mesh-move).
Also, many people from the following institutions have contributed directly or indirectly to
the generation of this code:
CIMEC International Center for Computational Methods in Engineering, Santa Fe, Argentina http://www.cimec.org.ar
INTEC Instituto de Desarrollo Tecnológico para la Industria Química http://www.intec.unl.edu.ar
CONICET Consejo Nacional de Investigaciones Científicas y Técnicas http://www.conicet.gov.ar
UNL Universidad Nacional del Litoral http://www.unl.edu.ar
20 Grants received
This code has been developed with funding from the following grants:
23. Key: PICT-2006-VES. Code: PICT-1506/2006. Title: Cálculo distribuido en mecánica y multifísica computacional. Director: Mag. Victorio Sonzogni. Financing agency: FONCyT. Begin: 2008. End: 2010.
22. Key: PICTO-2004. Code: PICTO-23295/2004. Title: Integración de procesos del complejo suelo-agua-planta para una mejor planificación hídrica en la cuenca inferior del Río Salado. Director: Dr. Leticia Rodríguez. Financing agency: FONCyT. Begin: 2006. End: 2007.
21. Key: PIP-2005. Code: PIP-5271. Title: Mecánica Computacional en Problemas de Multifísica. Director: Dr. M.A. Storti. Financing agency: CONICET. Begin: 2005. End: 2008.
20. Key: CAI+D-05. Code: CAI+D 2005-10-64. Title: Métodos Numéricos para Resolución de Problemas Multifísica. Director: Dr. S. R. Idelsohn. Financing agency: Universidad Nacional del Litoral - UN Litoral. Begin: 2005. End: 2007.
19. Key: PROTIC. Code: PAV-127, Subproy 4. Title: PROTIC: Red para la Promoción de las Tecnologías de la Información y las Comunicaciones. Subproyecto 4: Centro Virtual de Computación de Alto Rendimiento. Director: Dres. Alejandro Cecatto, Guillermo Marshall. Financing agency: ANPCyT - FONCyT. Begin: 2005. End: 2006.
18. Key: LAMBDA. Code: PICT 12-14573/2003. Title: LAMBDA: Laboratorio virtual para el Análisis y simulación computacional de problemas Multifísicos Basados en ecuaciones Diferenciales Acopladas. Director: Dr. S. Idelsohn. Financing agency: ANPCyT - FONCyT. Begin: 2005. End: 2007.
17. Key: PME-CLUSTER. Code: PME-209. Title: Cluster del Litoral: Red de laboratorios para la resolución de problemas de la físico-matemática aplicados a la ingeniería. Director: Dr. S. Idelsohn. Financing agency: ANPCyT - FONCyT. Begin: 2004. End: 2005.
16. Key: CLUSTER-CHILE. Code: C-13680/4, Nro 23. Title: Cálculo paralelo en problemas de mecánica computacional a través del uso de una red de computadores personales. Director: Dres. Marcela Cruchaga, Norberto Nigro. Financing agency: Programa de Colaboración Científico-Académica entre Argentina, Brasil y Chile 2000-2001 (C-13680/4). Fundación Andes. Begin: 2000. End: 2001.
15. Key: PIP-PAR. Code: PIP 02552/2000. Title: Generación de recursos de cálculo paralelo para mecánica computacional. Director: V.E. Sonzogni. Financing agency: CONICET. Begin: 2000. End: 2002.
14. Key: CAI+D. Code: CAI+D-2000-43. Title: Desarrollo de algoritmos para cálculo paralelo. Director: Victorio Sonzogni. Financing agency: UNL. Begin: 2000. End: 2002.
13. Key: PROA. Code: PICT-6973/99. Title: Desarrollos en Mecánica Computacional utilizando técnicas de PRogramación Avanzada. Director: Sergio Idelsohn. Financing agency: ANPCyT. Begin: 2000. End: 2003.
12. Key: MELT. Code: PID-99/76. Title: MELT: Modelado de Emulsificación de metales en estado Líquido y sus efectos Termomecánicos. Director: A. Cardona. Financing agency: ANPCyT - FONCyT. Begin: 2000. End: 2002.
11. Key: FLAGS. Code: PID-99/74. Title: FLAGS: Simulación numérica en gran escala de la interrelación entre el FLujo de Aguas Superficiales y el FLujo de AGuas Subterráneas. Director: S. Idelsohn. Financing agency: ANPCyT - FONCyT. Begin: 2001. End: 2004.
10. Key: GERMEN-CFD. Code: PIP-0198/98. Title: Germen/CFD: GEneración de Recursos básicos para la aplicación de los Métodos Numéricos en dinámica de fluidos computacional. Director: M. Storti. Financing agency: CONICET. Begin: 1999. End: 2001.
9. Key: PEI-CFD. Code: PEI-231/97. Title: PEI 231 - CONICET. Diseño mecánico asistido por CFD. Director: N. Nigro. Financing agency: CONICET. Begin: 1998. End: 1999.
8. Key: PEI-NAVAL. Code: PEI-232/97. Title: PEI Nro. 232 - CONICET. Métodos Numéricos en Hidrodinámica Naval y Costera. Director: M. Storti. Financing agency: CONICET. Begin: 1998. End: 1999.
7. Key: SINUS-PIM-B. Title: Proyecto Alpha de la Comisión Europea. Sinus Pim B: Modelisation et Simulation Numeriques en Ingenierie Mecanique. Director: S.R. Idelsohn y V. Ruas (Univ. Paris VI, Laboratoire de Modelisation en Mecanique). Financing agency: CONICET - CEE. Begin: 1997. End: 1999.
6. Key: CMES. Title: Proyecto Alpha de la Comisión Europea. CMES: Computer Methods in Engineering Science. Director: S.R. Idelsohn y G. Beer (Institut für Baustatik, Graz, Austria). Financing agency: CONICET - CEE. Begin: 1997. End: 1999.
5. Key: TUCANO. Title: Proyecto Alpha de la Comisión Europea. TUCANO: Transatlantic University / Industry Cooperation. Director: S.R. Idelsohn y S. Mac Neill (Univ. de Birmingham). Financing agency: CONICET - CEE. Begin: 1997. End: 1999.
4. Key: CONICET-FNRS. Title: Convenio CONICET - Fonds National de la Recherche Scientifique (FNRS) entre el Grupo de Tecnología Mecánica del INTEC y el Laboratoire de Techniques Aéronautiques et Spatiales, Universidad de Lieja, Bélgica. Investigación en Mecánica Computacional. Director: A. Cardona y M. Géradin. Financing agency: CONICET - FNRS (Bélgica). Begin: 1996. End: 1997.
3. Key: CAI+D-94. Code: CAI+D 94-004-024. Title: Métodos Numéricos en Mecánica de Sólidos y Fluidos. Director: S. R. Idelsohn y A. Cardona. Financing agency: Universidad Nacional del Litoral - UNLit. Begin: 1994. End: 1995.
2. Key: GERMEN. Code: PICT-51. Title: FONCyT - PICT 51 GERMEN: GEneración de Recursos básicos para la aplicación de los MÉtodos Numéricos. Director: Dr. Sergio Idelsohn. Financing agency: FONCyT. Begin: 1998. End: 2001.
1. Key: COLA. Code: PID-026. Title: Simulación Numérica de Procesos de Colada Continua. Director: S. R. Idelsohn. Financing agency: SECYT-BID. Begin: 1996. End: 1999.
21 Symbols and Acronyms
21.1 Acronyms
CFD: Computational Fluid Dynamics
DX: IBM Data Explorer
ePerl: embedded Perl
FDM: Finite Difference Method
FEM: Finite Element Method
IISD: Interface Iterative – Sub-domain Direct method
KWM: Kinematic Wave Model (see 6.7)
LES: Large Eddy Simulation
MPI: Message Passing Interface
OOP: Object Oriented Programming
Perl: Practical Extraction and Report Language
PETSc: Portable, Extensible Toolkit for Scientific Computation
SUPG: Streamline Upwind/Petrov Galerkin
ANN: Approximate Nearest Neighbor problem. Also refers to the library developed by
David Mount and Sunil Arya (http://www.cs.umd.edu/~mount/ANN).
21.2 Symbols
• u, v = Streamwise and normal components of velocity.