Analysis of Adaptive Mesh Refinement for IMEX Discontinuous Galerkin
Solutions of the Compressible Euler Equations with Application to
Atmospheric Simulations
Michal A. Kopera∗, Francis X. Giraldo
Naval Postgraduate School, Department of Applied Mathematics, Monterey, CA 93940
Abstract
The resolutions of interest in atmospheric simulations require prohibitively large computational resources.
Adaptive mesh refinement (AMR) tries to mitigate this problem by putting high resolution in crucial areas of
the domain. We investigate the performance of a tree-based AMR algorithm for the high order discontinuous
Galerkin method on quadrilateral grids with non-conforming elements. We perform a detailed analysis of
the cost of AMR by comparing it with uniform reference simulations of two standard atmospheric test cases: the density current and the rising thermal bubble. The analysis shows up to 15 times speed-up of the AMR simulations, with the cost of mesh adaptation below 1% of the total runtime. We pay particular attention to
the implicit-explicit (IMEX) time integration methods and show that the ARK2 method is more robust with
respect to dynamically adapting meshes than BDF2. Preliminary analysis of preconditioning reveals that it
can be an important factor in the AMR overhead. Compiler optimizations provide a significant runtime reduction and positively affect the effectiveness of AMR, allowing for speed-ups greater than would follow from a simple performance model.
Keywords: adaptive mesh refinement, discontinuous Galerkin method, non-conforming mesh, IMEX,
compressible Euler equations, atmospheric simulations
1. Introduction
Atmospheric flows are characterized by a vast spectrum of spatial and temporal scales, from weather
fronts and planetary waves covering thousands of kilometers and lasting weeks, to turbulent motions at
the micro scale. Due to limitations in computational resources, we are not able to resolve all phenomena.
Most models assume a uniform mesh and therefore distribute computational resources uniformly across the
domain. The scales of motion in the atmosphere are not, however, distributed uniformly in either space or time. The goal of adaptive mesh refinement (AMR) is to focus the resolution of the mesh (and therefore
computational resources) where it is most required. Dynamic adaptation aims to follow the important
structures of the flow and modify the mesh as the simulation progresses, according to some refinement
criterion. An example of such a situation in the atmosphere is a hurricane, which is an event of significance
that is relatively localized within a global domain but traverses vast distances. Static adaptation, on the
other hand, refines the mesh once at the beginning of the simulation, which allows one to focus the computational resources on a particular area of interest in the domain. In this way one could well resolve
a certain part of the globe for which the weather forecast is to be performed, leaving the rest of the domain
at much coarser resolution. In this paper we focus on dynamic mesh refinement, which we demonstrate on two atmospheric test cases. All the methods that we discuss, however, are readily applicable to static
adaptation.
∗Corresponding author. Tel: +1 831-656-3247
Email addresses: [email protected] (Michal A. Kopera), [email protected] (Francis X. Giraldo)
In order to discretize the solution we use the discontinuous Galerkin (DG) method on quadrilateral
element grids. The DG method has gained a significant level of interest in recent years and there have been a number of efforts to apply it to hydrostatic [1] and nonhydrostatic [2] atmospheric flows. The method benefits from great data locality and high computational intensity, which helps it scale exceptionally well on a large number of processors. It also supports arbitrarily high order, which, among other benefits, allows for accurate handling of the non-conforming edge fluxes that can arise in the process of mesh adaptation.
AMR for DG has been successfully applied in various applications (e.g. shock capturing [3, 4], mantle convection [5]). Here we focus on atmospheric flows, particularly the dry dynamics governed by the Euler equations.
The question we would like to answer is: how does AMR benefit a simulation? To try to answer this
question we perform a detailed analysis of the cost of AMR, comparing it with the results of a uniformly refined simulation. To our knowledge, such a detailed study has not been performed previously. Moreover,
we seek to answer this question in light of the use of implicit-explicit (IMEX) time-integrators because for
real-world applications (such as weather and climate modeling) explicit time-integration is not feasible due
to the small time-step restriction of the fast acoustic waves.
The following subsections present a brief overview of the work done on AMR for atmospheric simulation
in general and specifically in conjunction with the DG method.
1.1. AMR in atmospheric simulations
Jablonowski [6] and Behrens [7] give a good overview of the state of adaptive atmospheric modeling.
The hurricane modeling community was the first to use nested grid techniques for their simulations. In this
approach fine scale grids were nested into large-scale coarse meshes and communication was allowed from
large to small scales [8, 9] or in both directions [10]. The nested grid can move with the feature it tracks, but
often some previous knowledge of the grid movement is required and the number of points remains constant
throughout the simulation.
Another example of a mesh modification technique which preserves the number of points and grid connectivity throughout the simulation is mesh stretching, i.e., changing the grid spacing by the use of transformation functions (also known as r-refinement). Examples of such an approach are presented in [11, 12]
and more recently in [13].
This paper focuses on an adaptive method which does not involve a moving mesh but rather refines the
elements dynamically in regions of particular interest. This technique is typically referred to as dynamic
AMR. The first atmospheric models involving AMR were developed by Skamarock and Klemp [14, 15].
They used the technique of Berger and Oliger [16] where a finite difference method is used to integrate the
dynamical equations first on a coarse and then on the finer grids. In order to determine the location of finer
grids a criterion based on truncation error estimation was used. The technique of Berger and Oliger [16]
and Berger and Colella [17] is referred to as block-structured AMR. It was later used in a number of studies
including LeVeque [18] and Nikiforakis [19].
The only operational weather model which uses dynamic AMR is OMEGA [20]. OMEGA is based on
the finite volume MPDATA scheme, originally developed by Smolarkiewicz [21], which uses unstructured
triangular meshes. The dynamic adaptation capabilities were implemented in the MPDATA model by Iselin
and coworkers [22]. Another example of AMR which uses triangular elements is the work of Giraldo [23]
who used the Lagrange-Galerkin method. In all of these approaches the mesh is obtained by a Delaunay
triangulation of the domain given some mesh size criteria. When the criterion indicates that the mesh should
be adjusted, the triangulation is performed on the entire domain, and the solution is projected from the old
mesh to the new one.
A different approach is presented in the work of Behrens [24, 25], where the triangular elements indicated
for refinement are subdivided making sure the conformity of the edges is preserved. The method allows for
local refinement without modifying the entire domain. It is well suited for triangular meshes; the same approach would be difficult for quadrilateral grids, however, as maintaining conformity of locally and dynamically refined quadrilaterals is more challenging. An example of static conforming quadrilateral mesh
refinement can be found in [26].
Quadrilateral meshes lend themselves easily to local refinement, provided that we allow non-conforming elements
in the grid. Examples of quadrilateral-based dynamic AMR methods for the shallow water equations can
be found in the work of Jablonowski [6]. St-Cyr et al. [27] compare an adaptive cubed-sphere grid spectral
element shallow water model with an adaptive finite volume method by [6] to investigate the applicability
of tree-based AMR algorithms to atmospheric models.
1.2. Element based Galerkin methods for atmospheric AMR
The use of high-order element-based Galerkin methods for AMR in atmospheric applications is a fairly new field
of study. These methods present a new set of challenges and possibilities for adaptive mesh refinement. By
expanding the solution in a basis of high order polynomials in each element, one can dynamically adjust the
order of these basis functions, which can differ across elements. This kind of approach is called p-refinement,
and an example of such a technique applied to the shallow water equations can be found in [28].
In the previous section we already mentioned element mesh refinement (so-called h-refinement), which focuses on refining the mesh while keeping the polynomial order constant across the elements. If we choose to allow non-conforming elements, the challenge in this approach is the appropriate treatment of the non-conforming faces. For the DG method one needs to compute the flux across the non-conforming faces.
The mathematical solution to this problem was proposed by Kopriva [29] as well as Maday et al. [30] who
formulated the mortar method for non-conforming elements. Examples of application of h-refinement to
atmospheric flows can be found in [31], where the spectral element method for geophysical and astrophysical
applications was used, or [27] where the comparison of a finite volume and spectral element AMR code for
the shallow water equations was performed. A recent study by Müller et al. [32] investigates the dynamic
adaptation of triangular conforming meshes and addresses the question of whether coarsening the mesh in
certain areas of the grid affects the solution in a significant way. Brdar et al. [33] compare two dynamical cores for numerical weather prediction and mention that the DUNE code, which uses the DG method, has AMR capabilities; however, no adaptive mesh examples are discussed in that paper.
The p- and h-refinement methods can be combined. Reference [34] shows the application of such an algorithm for the DG method using triangular, non-conforming elements applied to the shallow water equations. Another example for this set of equations is presented in [35], where an hp-adaptive DG method on
quadrilateral, non-conforming elements for global tsunami simulations is considered.
In this paper we focus on the quadrilateral, non-conforming DG method for the Euler equations and
provide an in-depth analysis of the performance of our implementation. To our knowledge this is the first work on tree-based non-conforming AMR for the nonhydrostatic atmosphere equations; furthermore, no previous work has been published on non-conforming AMR for high-order DG methods for these equations.
This paper is organized as follows: section 2 gives a brief overview of the equations we are solving, section 3 provides an outline of the DG method, and section 4 discusses the difference between conforming and non-conforming meshes. In section 5 we describe the details of the mesh adaptation algorithm, and in section 6 the handling of non-conforming edges. Finally, in section 7 we provide the outline of the test cases, followed by the discussion of the results in section 8. The paper is concluded in section 9 and supplemented with an Appendix, which describes in detail the formulation of the projection method.
2. Governing equations
Non-hydrostatic atmospheric dynamical processes in NUMA¹ are governed by the compressible Euler equations in conservative form, which use density $\rho$, momentum $\rho\mathbf{u}$ and density potential temperature $\rho\theta$ as state variables (see, e.g., [36] for other forms). We use the following equation set:
$$\frac{\partial \rho}{\partial t} + \nabla \cdot (\rho \mathbf{u}) = 0,$$
$$\frac{\partial \rho \mathbf{u}}{\partial t} + \nabla \cdot (\rho \mathbf{u} \otimes \mathbf{u} + P \mathbf{I}) = -\rho g \mathbf{k} + \nabla \cdot (\mu \rho \nabla \mathbf{u}),$$
$$\frac{\partial \rho \theta}{\partial t} + \nabla \cdot (\rho \theta \mathbf{u}) = \nabla \cdot (\mu \rho \nabla \theta), \quad (1)$$

¹NUMA is the name of our model and is an acronym for the Nonhydrostatic Unified Model of the Atmosphere.
where $\mathbf{u} = (u, w)^T$ is the velocity field, $\nabla = \left( \frac{\partial}{\partial x}, \frac{\partial}{\partial z} \right)^T$ is the gradient operator, $\otimes$ is the tensor product, $\mathbf{I}$ is the rank-2 identity matrix, $\mathbf{k} = (0, 1)^T$ is the directional vector that points along the $z$ direction, $g$ is the gravitational acceleration and $P$ is the pressure obtained from the equation of state
$$P = P_0 \left( \frac{R \rho \theta}{P_0} \right)^{c_p / c_v}. \quad (2)$$
The dynamical viscosity µ is varied among the test cases. Note that while this is not the mathematically
proper form of the true Navier-Stokes viscous stresses, it is sufficient for the chosen test cases as shown
in Giraldo and Restelli [36]. Additional terms requiring definition are the pressure at the lower boundary $P_0 = 1 \times 10^5$ Pa, the gas constant $R = c_p - c_v$ and the specific heats at constant pressure and volume, $c_p$ and $c_v$.
3. Discontinuous Galerkin method
Giraldo and Restelli [36] describe in detail the discretisation of Eq. (1) for the DG method. Here we
outline the weak formulation for completeness. Note that for brevity the analysis in this paper was conducted using the weak form only, although both strong and weak forms are implemented in the NUMA software with the non-conforming AMR algorithm.
To describe the DG method, we write Eq. (1) in vector form
$$\frac{\partial \mathbf{q}}{\partial t} + \nabla \cdot \mathbf{F} = \mathbf{S}(\mathbf{q}),$$
where $\mathbf{q} = (\rho, \mathbf{U}^T, \Theta)^T$ is the solution vector with $\mathbf{U} = \rho \mathbf{u}$ and $\Theta = \rho \theta$. $\mathbf{F}(\mathbf{q})$ is the flux tensor given by
$$\mathbf{F}(\mathbf{q}) = \begin{pmatrix} \mathbf{U} \\ \frac{\mathbf{U} \otimes \mathbf{U}}{\rho} + P \mathbf{I} - \mu \rho \nabla \frac{\mathbf{U}}{\rho} \\ \theta \mathbf{U} - \mu \nabla \Theta \end{pmatrix}, \quad (3)$$
and the source term $\mathbf{S}(\mathbf{q})$ given by
$$\mathbf{S}(\mathbf{q}) = \begin{pmatrix} 0 \\ -\rho g \mathbf{k} \\ 0 \end{pmatrix}. \quad (4)$$
We divide the computational domain $\Omega$ into a set of non-overlapping elements $\Omega_e$ such that $\Omega = \bigcup_{e=1}^{N_e} \Omega_e$.
We define the reference element $I = [-1, 1]^2$ and for each element $\Omega_e$ there exists a smooth transformation such that $I = F_{\Omega_e}(\Omega_e)$. Additionally, if $\Omega_e = \Omega_e(x, y)$ and $I = I(\xi, \eta)$ then $F_{\Omega_e}: (x, y) \to (\xi, \eta)$ and $F_{\Omega_e}^{-1}: (\xi, \eta) \to (x, y)$. We employ the notation $\mathbf{x} = (x, y)$ and $\boldsymbol{\xi} = (\xi, \eta)$. The Jacobian of this transformation is given by $\mathbf{J}_{\Omega_e} = \frac{dF_{\Omega_e}}{d\boldsymbol{\xi}}$ with determinant $J_{\Omega_e}$.
Let $\psi_k$ be a basis function of a space $P_N(I)$ of polynomials of degree $N$ or lower in $I$, where the index $k$ varies from 1 to $K = (N + 1)^2$. The tensor product structure of $I$ allows us to construct such a basis as
$$\psi_k = h_i(\xi) h_j(\eta),$$
where $\{h_i\}_{i=0}^{N}$ is a basis for $P_N([-1, 1])$ and index $k$ is uniquely associated with the pair $(i, j)$: $k = (i + 1) + (N + 1)j$. Let $\xi_i$ be the Legendre-Gauss-Lobatto (LGL) points defined as the roots of $(1 - \xi^2) P_N'(\xi) = 0$, where $P_N(\xi)$ is the $N$th order Legendre polynomial. The basis functions $h_i(\xi)$ are in fact the Lagrange polynomials associated with the LGL points $\xi_i$. Notice that associated with these points are the Gaussian quadrature weights
$$\omega_i = \frac{2}{N(N+1)} \frac{1}{\left[ P_N(\xi_i) \right]^2}.$$
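The LGL points and weights are easy to tabulate directly from these definitions. The sketch below (in Python with NumPy; NUMA itself is written in Fortran90, so this is purely illustrative) computes them for a given order N:

```python
import numpy as np
from numpy.polynomial import Legendre

def lgl_points_weights(N):
    """LGL points (roots of (1 - xi^2) P_N'(xi)) and the associated
    Gaussian quadrature weights omega_i = 2 / (N (N+1) [P_N(xi_i)]^2)."""
    PN = Legendre.basis(N)                        # Legendre polynomial P_N
    interior = np.sort(PN.deriv().roots().real)   # interior roots of P_N'
    xi = np.concatenate(([-1.0], interior, [1.0]))
    w = 2.0 / (N * (N + 1) * PN(xi) ** 2)
    return xi, w

xi, w = lgl_points_weights(5)
assert np.isclose(w.sum(), 2.0)   # the weights integrate 1 exactly on [-1, 1]
```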
Let $\mathbf{q}_N$ be the approximation of the solution vector $\mathbf{q}$ on the element $\Omega_e$ in the expansion basis $\psi$:
$$\mathbf{q}_N(\mathbf{x}, t)\big|_{\Omega_e} = \sum_{k=1}^{K} \psi_k\!\left( F_{\Omega_e}^{-1}(\mathbf{x}) \right) \mathbf{q}_k(t), \qquad e = 1, \ldots, N_e, \quad (5)$$
where we introduce the grid points $\mathbf{x}_k = F_{\Omega_e}^{-1}\left( (\xi_i, \eta_j) \right)$ and the grid point values $\mathbf{q}_k(t) = \mathbf{q}_N(\mathbf{x}_k, t)$. The computation of the derivatives of $\mathbf{q}_N$ gives
$$\frac{\partial \mathbf{q}_N}{\partial x}(\mathbf{x}, t)\Big|_{\Omega_e} = \sum_{k=1}^{K} \frac{d}{dx}\!\left[ \psi_k\!\left( F_{\Omega_e}^{-1}(\mathbf{x}) \right) \right] \mathbf{q}_k(t), \quad (6)$$
$$\frac{\partial \mathbf{q}_N}{\partial t}(\mathbf{x}, t)\Big|_{\Omega_e} = \sum_{k=1}^{K} \psi_k\!\left( F_{\Omega_e}^{-1}(\mathbf{x}) \right) \frac{d\mathbf{q}_k}{dt}(t). \quad (7)$$
Concerning the computation of integrals, the expansion defined by Eq. (5) yields
$$\int_{\Omega_e} \mathbf{q}_N(\mathbf{x}, t)\, d\mathbf{x} = \int_{I} \mathbf{q}_N(\mathbf{x}(\boldsymbol{\xi}), t)\, J_{\Omega_e}(\boldsymbol{\xi})\, d\boldsymbol{\xi} \simeq \sum_{i,j=0}^{N} \omega_i \omega_j \mathbf{q}_{k_{ij}}(t)\, J_{\Omega_e}(\xi_i, \eta_j). \quad (8)$$
With the definitions of the solution expansion and the operations of differentiation and integration in place, we can now formulate a DG representation of Eq. (1). Here we consider a nodal formulation with inexact integration as described in [36]. We start by multiplying Eq. (1) by a test function $\psi$ and integrating over an element $\Omega_e$:
$$\int_{\Omega_e} \psi \left( \frac{\partial \mathbf{q}_N^e}{\partial t} + \nabla \cdot \mathbf{F}(\mathbf{q}_N^e) \right) d\Omega_e = \int_{\Omega_e} \psi\, \mathbf{S}(\mathbf{q}_N^e)\, d\Omega_e, \quad (9)$$
where $\mathbf{q}_N^e$ denotes the degrees of freedom collocated in $\Omega_e$. Applying now integration by parts and introducing the numerical flux $\mathbf{F}^*$, the following problem is obtained: find $\mathbf{q}_N(\cdot, t) \in V_N^{DG}$ such that $\forall \Omega_e,\ e = 1, \ldots, N_e$
$$\int_{\Omega_e} \psi \frac{\partial \mathbf{q}_N^e}{\partial t}\, d\Omega_e + \int_{\Gamma_e} \psi\, \mathbf{n} \cdot \mathbf{F}^*(\mathbf{q}_N)\, d\Gamma_e - \int_{\Omega_e} \nabla \psi \cdot \mathbf{F}(\mathbf{q}_N)\, d\Omega_e = \int_{\Omega_e} \psi\, \mathbf{S}(\mathbf{q}_N^e)\, d\Omega_e, \qquad \forall \psi \in L^2(\Omega). \quad (10)$$
The coupling between neighboring elements is then recovered through the numerical flux $\mathbf{F}^*$, which is required to be a single-valued function on the interelement boundaries and the precise definition of which is given in [36].
By virtue of Eqs. (6), (7) and (8), Eq. (10) can be written in the matrix form
$$\frac{d\mathbf{q}^e}{dt} + \left( \widehat{M}^{s,e} \right)^T \mathbf{F}^*(\mathbf{q}) - \left( \widehat{D}^e \right)^T \mathbf{F}(\mathbf{q}^e) = \mathbf{S}(\mathbf{q}^e), \quad (11)$$
where $\widehat{M}^{s,e} = (M^e)^{-1} M^{s,e}$ and $\widehat{D}^e = (M^e)^{-1} D^e$. $M^e$, $M^{s,e}$ and $D^e$ are the local mass, boundary mass and differentiation matrices given by
$$M_{hk}^e = w_h \left| J_{\Omega_e}(\boldsymbol{\xi}_h) \right| \delta_{hk}, \qquad D_{hk}^e = w_h \left| J_{\Omega_e}(\boldsymbol{\xi}_h) \right| \nabla \psi_k(\mathbf{x}_h), \qquad M_{hk}^{s,e} = w_h^s \left| J_{\Omega_e}^s(\boldsymbol{\xi}_h) \right| \delta_{hk}\, \mathbf{n}(\mathbf{x}_h),$$
where $h, k = 1, \ldots, K$, $\delta_{hk}$ is the Kronecker delta, $\boldsymbol{\xi}_k = (\xi_i, \eta_j)$, $w_k = \omega_i \omega_j$, $w_k^s = \omega_i$ for $j = 0$ or $j = N$, and $w_k^s = \omega_j$ for $i = 0$ or $i = N$.
Note that this equation can be simplified to yield the following semi-discrete weak form
$$\frac{d\mathbf{q}_i^e}{dt} = \left( \widehat{D}_{ij}^e \right)^T \mathbf{F}_j^e + \mathbf{S}_i^e - \frac{w_i^{s,e} \left| J_i^{s,e} \right|}{w_i^e \left| J_i^e \right|} \left( \mathbf{n}_i \right)^T \mathbf{F}_i^*. \quad (12)$$
In Eq. (10) we notice that there is only one integral (Γe ) which couples the neighboring elements together.
It is this boundary integral that needs to be modified in order to handle non-conforming AMR. This is
explained in detail in Sec. 6.
4. Conforming vs non-conforming mesh
The first question that needs to be answered when constructing the AMR algorithm is whether non-conforming elements are allowed in the grid, that is, whether an edge can be shared by more than two elements. Alternatively, we can restrict ourselves to purely conforming meshes, where every edge is shared by two elements only. In the conforming case, the entire burden of handling the changing mesh falls on the
mesh adaptation algorithm, which has to make sure that the grid remains conforming after adaptation.
The upside of this approach is easy communication between neighboring elements. Since their edges are
conforming, the AMR does not introduce any additional complication to the DG method solver.
In contrast, in the non-conforming case the mesh adaptation algorithm is kept simple: we divide each
element marked for adaptation into a predefined number of children elements. In our case, we choose to
divide a 2D quadrilateral element into four children elements (from here on, we shall refer to quadrilaterals
as quads). This leads to a situation where, if only one of two neighbor elements is refined, the non-refined
neighbor shares an edge with two children elements. This requires the DG solver to be able to compute the
numerical flux on such a non-conforming edge. This approach shifts the burden from the mesh adaptation
algorithm to the solver side.
In this paper we present the non-conforming approach for a 2D quad-based mesh for the DG method². We believe that the added complication and increased cost related to the computation of fluxes through non-conforming edges is more than made up for by the simplified element refinement algorithm.
5. Mesh adaptation algorithm
5.1. Forest of quad-trees
We adopt the forest of quad-trees approach proposed by [37]. We generate an initial coarse mesh, which
has to represent the geometrical features of the domain. In Fig. 1a we present a simple two element initial
mesh. We call it the level 0 mesh, where each element is a root for a tree of children elements. If we
decide to refine element 1, we replace this element with four children, which belong to the level 1 mesh. We
represent it graphically on the right panel of Fig. 1b. Active elements (a set of elements which pave the
domain entirely) are marked in blue. Element number 1 is now inactive, replaced by the four newly created
elements 3, 4, 5 and 6.
If we further choose to refine element number 5, and thus introduce level 2 elements, we render this
element inactive and replace it with the four children 7, 8, 9 and 10. Our element tree is presented in Fig.
1c. Now elements 1 and 5 are no longer active while the active elements 2, 3, 4, 6, 7, 8, 9 and 10 form our
mesh.
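As a concrete illustration, a minimal sketch of this bookkeeping is given below (in Python; the class and field names are ours, not NUMA's). Refining an element deactivates it and attaches four children one level deeper, reproducing the trees of Fig. 1:

```python
class Quad:
    """One element (node) in a quad-tree; a leaf is an active element."""
    def __init__(self, label, level, parent=None):
        self.label = label            # unique element label
        self.level = level            # 0 for the initial coarse mesh
        self.parent = parent
        self.children = []            # four children once refined

    @property
    def active(self):
        return not self.children      # active elements pave the domain

class Forest:
    """Forest of quad-trees rooted in the level 0 coarse mesh."""
    def __init__(self, n_roots):
        self.roots = [Quad(label=i + 1, level=0) for i in range(n_roots)]
        self.next_label = n_roots + 1

    def refine(self, quad):
        """Replace an active quad with four children one level deeper."""
        assert quad.active
        for _ in range(4):
            quad.children.append(Quad(self.next_label, quad.level + 1, quad))
            self.next_label += 1

forest = Forest(n_roots=2)                   # the two level-0 elements of Fig. 1a
forest.refine(forest.roots[0])               # element 1 -> children 3, 4, 5, 6
forest.refine(forest.roots[0].children[2])   # element 5 -> children 7, 8, 9, 10
```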
5.2. Space filling curve
In the previous subsection’s examples we assigned numbers to the elements. Those numbers serve as unique element labels. In order to traverse all active elements in the mesh, we utilize the concept of the
space filling curve (SFC). To each active element we assign an index, which defines the element’s position
in the space filling array. In order to index all the active elements in the mesh, we search each quad-tree in
the forest for active elements (leaves) starting with the first root element (element 1). Since it is inactive,
we move to its first child. If the child element is active, we include it into the space filling curve and move
to the next child; if it is inactive we recursively traverse the sub-tree rooted in this element. After finishing
the search of one quad-tree, we move to the next level 0 root element.
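Continuing the hypothetical Forest sketch above, this traversal is a depth-first search over each tree; the order in which children are visited is what distinguishes the row-major curve of Fig. 2a from the counter-clockwise curve of Fig. 2b:

```python
def space_filling_curve(forest):
    """List all active elements (leaves), tree by tree, in traversal order."""
    sfc = []
    def visit(quad):
        if quad.active:
            sfc.append(quad)              # next position in the SFC array
        else:
            for child in quad.children:   # child ordering shapes the curve
                visit(child)
    for root in forest.roots:             # one quad-tree per level 0 element
        visit(root)
    return sfc

curve = space_filling_curve(forest)
print([q.label for q in curve])           # [3, 4, 7, 8, 9, 10, 6, 2]
```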
Figure 2 illustrates the space filling curve concept. Note that the same tree traversing technique can
yield different space filling curves, depending on the numbering of children elements. The numbering that
produced the curve in Fig. 2a imposes a row-major order of children elements, while the curve in Fig. 2b
was generated using counter-clockwise enumeration. The numbering is applied recursively, starting from the level 0 mesh and traversing the tree to its fullest depth. Therefore in the row-major order we first number the elements of the level 0 mesh as (1, 2). Next we move one level down and enumerate the children of element 1 (element 2 has no children), starting from the bottom left element and enumerating the children in the bottom row first; we then move to the second row and enumerate the remaining two elements. We repeat the recursive procedure until we have enumerated all the elements on all levels.

²It should be noted that doing this for the continuous Galerkin method is also possible but slightly more complicated. We shall report on this in a follow-up paper.

Figure 1: Unstructured grid organized into a forest of quad-trees.
In DG methods we prefer indexing the elements in such a way that adjacent elements are placed close
to each other in the space filling curve. This increases data locality, which in turn makes the computation
of fluxes more efficient. Data locality is particularly important in parallel implementations of the AMR algorithm; however, a full study of the influence of element numbering on the efficiency of the code exceeds the scope of this paper.
5.3. Element division technique
To keep the adaptation algorithm simple, and the non-conforming face handling as efficient as possible
(see Section 6.1), we require that each side of an element to be refined is divided in a 2:1 size ratio. This means
that we split a parent edge into two children edges of equal length. In NUMA we use general quadrilaterals,
possibly with curved edges, which could make splitting elements in physical (x, y) space difficult. Therefore
we perform the element splitting in the computational space (ξ, η) instead (see Fig. 3a).
In Section 3 we defined the transformation $F: (x, y) \to (\xi, \eta)$. The inverse mapping $F^{-1}: (\xi, \eta) \to (x, y)$ is simply the expansion of a variable in the polynomial basis $\psi$:
$$q(x_j, y_j) = \sum_{i=1}^{K} q_i \psi_i(\xi_j, \eta_j),$$
where $q$ is a variable defined in physical space and $q_i$ is the nodal value of the variable in computational space at node $i$ corresponding to a basis function $\psi_i$. $(\xi_j, \eta_j)$ are the coordinates of the $j$-th nodal point in computational space, which corresponds to a point $(x_j, y_j)$ in physical space.
We can treat the x and y coordinates of the nodal points as a variable across the element, which yields the expansions:
$$x_j \equiv x(\xi_j, \eta_j) = \sum_{i=1}^{K} x_i \psi_i(\xi_j, \eta_j), \qquad y_j \equiv y(\xi_j, \eta_j) = \sum_{i=1}^{K} y_i \psi_i(\xi_j, \eta_j). \quad (13)$$

Figure 2: Two different variations of the space filling curve.

Figure 3: (a) General shape quad division; (b) Projection of coordinates from parent to children: (red) dots represent nodal values of coordinates of the parent element, (green) crosses are the nodal coordinates we seek for one of the four children elements.
Figure 3b presents an example of a standard element with 16 nodal points (red dots). Each point has a nodal value of $x_i^p$ and $y_i^p$ and is characterized by coordinates $(\xi_i^p, \eta_i^p)$. If an element is marked for refinement, we split the standard element into four children of equal size (dashed lines). Next, using the parent nodal values of $(x_i, y_i)$ we find the nodal values of $(x, y)$ in the children elements (green crosses in Fig. 3). Since we have the $(\xi_i^p, \eta_i^p)$ coordinates of the parent nodal points, the division of the parent element into four children is very simple, and we can easily find the $(\xi, \eta)$ coordinates of the children nodal points. Following [29] we write:
$$\xi_i^{c(k)} = s \cdot \xi_i^p - o_\xi^{(k)}, \qquad \eta_i^{c(k)} = s \cdot \eta_i^p - o_\eta^{(k)}, \quad (14)$$
where $s = 1/2$ is a scale factor and $o_\xi^{(k)}$ and $o_\eta^{(k)}$ are the offsets corresponding to the variables $\xi$ and $\eta$ respectively for child $k$. For the lower left child element (depicted in Fig. 3b) $k = 1$, $o_\xi^{(k)} = 1/2$ and $o_\eta^{(k)} = 1/2$. For different children the offset values differ in sign.
Figure 4: 2:1 balanced mesh and ripple propagation problem.
We seek the values of $x$ and $y$ at the child nodal points (green crosses; here only one child is represented). Knowing the values of $(\xi^{c(k)}, \eta^{c(k)})$ at the children nodes, we can substitute them into the expansion (13) and find the coordinates of the children nodes in physical space. We write
$$x_i^{c(k)} = \sum_{j=1}^{K} x_j \psi_j(\xi_i^{c(k)}, \eta_i^{c(k)}), \qquad y_i^{c(k)} = \sum_{j=1}^{K} y_j \psi_j(\xi_i^{c(k)}, \eta_i^{c(k)}), \quad (15)$$
which can be represented in matrix form as
$$x_i^{c(k)} = L_{ij} x_j^p, \qquad y_i^{c(k)} = L_{ij} y_j^p, \quad (16)$$
where $L_{ij} = \psi_j(\xi_i^{c(k)}, \eta_i^{c(k)})$ is the interpolation matrix which holds the values of the parent basis functions $\psi_j$ at the children node coordinates $(\xi_i^{c(k)}, \eta_i^{c(k)})$.
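To make Eqs. (14)-(16) concrete, the sketch below (using the lgl_points_weights helper from the Section 3 sketch; N = 3 gives the 16 nodes of Fig. 3b, and k = 1 is the lower-left child) builds the interpolation matrix L and applies Eq. (16); for a straight-sided element it simply reproduces Eq. (14):

```python
import numpy as np

def lagrange_basis(nodes, x):
    """Values of the 1D Lagrange cardinal polynomials h_i(x) at a point x."""
    h = np.ones(len(nodes))
    for i, xi_i in enumerate(nodes):
        for j, xi_j in enumerate(nodes):
            if j != i:
                h[i] *= (x - xi_j) / (xi_i - xi_j)
    return h

def child_interp_matrix(xi, o_xi, o_eta, s=0.5):
    """L_ij = psi_j(xi_i^c, eta_i^c), with psi_k = h_i(xi) h_j(eta)."""
    Np = len(xi)
    L = np.zeros((Np * Np, Np * Np))
    for J in range(Np):                  # slow index: eta direction
        for I in range(Np):              # fast index: xi direction
            xc = s * xi[I] - o_xi        # child node in parent coords, Eq. (14)
            ec = s * xi[J] - o_eta
            L[J * Np + I, :] = np.outer(lagrange_basis(xi, ec),
                                        lagrange_basis(xi, xc)).ravel()
    return L

xi, _ = lgl_points_weights(3)                     # 16 nodes, as in Fig. 3b
L = child_interp_matrix(xi, o_xi=0.5, o_eta=0.5)  # lower-left child, k = 1
xp = np.tile(xi, len(xi))                         # parent nodal x-coordinates
xc = L @ xp                                       # Eq. (16): x^c = L x^p
assert np.allclose(xc, np.tile(0.5 * xi - 0.5, len(xi)))
```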
5.4. 2:1 balance
A mesh where each edge is shared by at most three elements is called a 2:1 balanced mesh, and an edge which is shared by two elements on one side and one element on the other is called a 2:1 balanced edge. When using the term "2:1 balanced", we assume that a 1:1 balanced edge (where an edge is shared by only one element on each side) is also a valid member of the 2:1 balanced mesh. The element refinement procedure described in Section 5.3 may lead to a situation where an element which has a 2:1 balanced edge and lies on the refined side of the edge is marked for refinement, while its neighbor across this edge is not. This would violate the 2:1 balance, causing the edge to be shared by a total of four elements (one on one side, and three on the other).
An example of such a situation is shown in Fig. 4a. Elements 3, 4, 7, 8, 9, 10, 6, 2 form a 2:1 balanced
mesh. Let us assume that element number 6 was marked for refinement. This would cause a conflict, where
the edge shared by elements 2, 4 and 6 is not 2:1 balanced anymore. In order to avoid this situation, a
special balancing procedure needs to be introduced.
In the situation presented in Fig. 4a, the solution of the 2:1 balance problem is to refine the element
2 first, before refining 6, even though 2 might not be originally marked for refinement. This would not
lead, at any time of this process, to violating the 2:1 balance rule. One might imagine, however, that for
a more complex mesh, refining element number 2 might also cause a conflict with other edges owned by 2.
Consider the situation depicted in Fig. 4b, where the mesh is initially 2:1 balanced. In this mesh there are two predefined levels of refinement. Element 19 is a 0th level element, elements 3, 4, 11, 12, 13, 14 are 1st
level elements and 7-10 and 15-18 are 2nd level. Let us assume we want to refine element number 18, which
would create 3rd level children elements. This would create a conflict with element 13, which is a 1st level
element. Therefore we refine element number 13 to the 2nd level, but this creates a conflict with the 0th
level element number 19. In order to refine 18, we need to refine element 19 first, then 13, and finally 18
in order to keep the 2:1 balance at all times. This causes some regions of the domain to be more refined than required by the refinement criterion - we did not initially intend to refine 13 or 19. Such a phenomenon is called the ripple effect, where refinement of one element can cause an entire area not directly neighboring the element in question to be refined [38].
It is easy to show that in the 2D case the ripple propagation is limited by the lowest level element in
the mesh. In a 2:1 balanced mesh the level difference between neighboring elements can be at most 1. The
conflict can occur only when refining an n-th level element which has a neighbor of level (n − 1). Therefore
we need to bring the (n − 1) level element to level n before refining the original element to level (n + 1). If
in turn the (n − 1) level element causes conflict with (n − 2) level element, we need to follow the balancing
procedure recursively. In the worst case scenario we will propagate the ripple down to 0th level element,
which by definition is a root of the element tree. Therefore by refining one n-th level element we may be
forced, in the worst case, to refine n other elements. This will cause 4n new elements to be created in areas possibly not indicated by the refinement criterion. 4n is typically a very small number, since the simulations shown in this work tend to have values n ≤ 5.
In the case of element coarsening, we adopt a different strategy. If coarsening of an element would cause
a conflict (consider a situation in Fig. 4b after refining all indicated elements, when we want to de-refine
element 19), we do not perform this operation. In order to keep the 2:1 balance we avoid propagating a
de-refinement ripple to higher levels. The rationale behind this strategy is that it is better to have more
refined elements than we need, rather than lack the resolution in the areas where it is necessary. This way
we ensure that we always have the appropriate level of refinement as indicated by the refinement criterion.
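Both rules can be sketched on top of the hypothetical Forest class from the Section 5.1 sketch (the neighbors_of lookup is mesh-specific and assumed given); refinement ripples recursively toward coarser neighbors, while coarsening is simply rejected if it would break the balance:

```python
def refine_balanced(forest, quad, neighbors_of):
    """Refine `quad`, first rippling refinement to any coarser neighbor so
    that every edge stays 2:1 balanced at all times."""
    for nbr in neighbors_of(quad):
        if nbr.active and nbr.level < quad.level:
            refine_balanced(forest, nbr, neighbors_of)  # ripple propagation
    forest.refine(quad)

def coarsen_if_balanced(parent, neighbors_of):
    """Coarsen four siblings back into `parent` only if no edge would break
    the 2:1 balance; de-refinement ripples are never propagated."""
    if not parent.children or not all(c.active for c in parent.children):
        return False
    for child in parent.children:
        for nbr in neighbors_of(child):
            if nbr.active and nbr.level > child.level:
                return False        # coarsening would violate 2:1 balance
    parent.children = []            # the parent becomes active again
    return True
```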
5.5. Refinement criterion
Our focus in this paper is the AMR machinery and its particular application to the DG method, therefore
we use a very simple mesh refinement criterion. First, we specify the quantity of interest (QOI). It can be
either a primitive variable like θ, u, w or ρ, or an expression derived from those variables (i.e. velocity
magnitude, absolute value of temperature fluctuation, etc.). We then choose a refinement threshold. If the maximum value of the QOI within an element exceeds the threshold, the element is marked for refinement. The maximum criterion can of course be replaced by a minimum criterion. It is also worthwhile to consider
a gradient, or other derivatives of primitive variables, as a QOI. Throughout this paper we use the potential
temperature perturbation as the quantity of interest.
The refinement criterion typically need not be evaluated every time-step, but rather every predefined
number of steps, depending on the particular problem. After each evaluation of the criterion for all active
elements in the grid, the balancing algorithm is run to eliminate possible conflicts.
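Evaluating the criterion therefore amounts to one reduction per active element, as the sketch below shows (the array layout is illustrative; since the density current perturbation is negative, we threshold its absolute value):

```python
import numpy as np

def mark_elements(theta_p, theta_t):
    """theta_p: one row of nodal QOI values per active element.
    Returns index lists of elements to refine and to coarsen."""
    qoi = np.max(np.abs(theta_p), axis=1)    # max |theta'| in each element
    to_refine = np.nonzero(qoi > theta_t)[0]
    to_coarsen = np.nonzero(qoi <= theta_t)[0]
    return to_refine, to_coarsen
```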
6. Handling of non-conforming edges
The previous section described the details of the mesh refinement algorithm. Here we will focus on the implementation of such an algorithm in a DG solver. Sections 6.1 and 6.2 describe the computation of the
flux for the DG method.
6.1. Projection onto 2:1 edges
In the DG method at every time-step we need to evaluate the numerical flux through all the element
edges. When allowing non-conforming elements in the mesh, one needs to address the problem of projecting
the data between two sides of a non-conforming edge. In our case the non-conformity is limited to 2:1
balanced edges, which makes the data exchange slightly easier than in a general non-conforming case.
Figure 5: Projection onto non-conforming edges: a) scatter from left parent edge to two children edges and b) gather from two
children edges to parent edge.
Consider the situation shown in Fig. 5a. The variable $\mathbf{q}^L$ from a parent edge is projected onto two children edges and becomes $\mathbf{q}^{L_1}$ and $\mathbf{q}^{L_2}$. In order to perform this scatter operation we design two projection matrices $P_{S_1}$ and $P_{S_2}$ such that
$$\mathbf{q}^{L_1} = P_{S_1} \mathbf{q}^L, \qquad \mathbf{q}^{L_2} = P_{S_2} \mathbf{q}^L. \quad (17)$$
Similarly, for the gather operation we need the matrices $P_{G_1}$ and $P_{G_2}$ which satisfy
$$\mathbf{q}^L = P_{G_1} \mathbf{q}^{L_1} + P_{G_2} \mathbf{q}^{L_2}.$$
The projection matrices are constructed using the integral projection technique [29], derived for different
size ratios and different polynomial orders in neighboring elements. In Appendix A we present the outline
of the method tailored to 2:1 balanced edges.
Due to the fact that we limit our non-conforming edges to 2:1 ratio only, the scatter and gather projection
matrices are the same for all edges and need to be computed only once, which makes the algorithm quite
simple and very efficient.
6.2. Flux computation
The flux computation algorithm applied in NUMA to handle non-conforming edges relies on the projection technique described in Section 6.1 and Appendix A. First, we scatter the variables from parent to children edges using the projection matrices $P_{S_1}$ and $P_{S_2}$. This way we have the necessary information on both sides of the children edges, just as in the regular conforming case. We compute the numerical flux on the children edges using the Rusanov flux (see [36] for details) and gather it back to the parent edge using the matrices $P_{G_1}$ and $P_{G_2}$. Finally, the flux is applied on both parent and children edges. Note that the Rusanov flux can be replaced with any other Riemann solver.
This algorithm ensures that the amount of numerical flux leaving one element is equal to the flux received
by the children elements on the other side of the non-conforming edge. It is worth noting, however, that
since we are dealing with the DG method, we allow a discontinuity between variables (and therefore flux)
at the children side of the interface. The projection represents the flux from both children elements as one
smooth function defined on the parent side of the interface. The parent flux is not point-wise identical to
the children fluxes, but we constrain the integral of the flux over the edge, which guarantees conservation.
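Schematically, with the projection matrices of Eq. (17) precomputed and a rusanov routine standing in for the Riemann solver of [36], one application of this algorithm to a single 2:1 edge could read:

```python
def nonconforming_edge_flux(qL, qR1, qR2, PS1, PS2, PG1, PG2, rusanov):
    """Flux on a parent edge (qL) facing two children edges (qR1, qR2)."""
    qL1 = PS1 @ qL               # scatter parent data to child edge 1
    qL2 = PS2 @ qL               # scatter parent data to child edge 2
    f1 = rusanov(qL1, qR1)       # conforming numerical flux on child edge 1
    f2 = rusanov(qL2, qR2)       # conforming numerical flux on child edge 2
    fL = PG1 @ f1 + PG2 @ f2     # gather: integral-preserving parent flux
    return fL, f1, f2            # applied on the parent and children sides
```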
6.3. 2D projection between parent and children elements
Once an element is refined (derefined), its data must be projected onto its children (parent) elements.
In order to perform these two operations we will use the 2D version of the integral projection technique
discussed in Section 6.1.

Figure 6: Projection between parent and children elements - 2D extension of integral projection for non-conforming edges.

Figure 6 shows schematically the projections from a parent element with coordinates $(\xi, \eta) \in [-1, 1]^2$ to four children, each with separate coordinates $(z_1^{(k)}, z_2^{(k)}) \in [-1, 1]^2$, where $k = 1, \ldots, 4$ enumerates the children elements. For this projection we construct the scatter matrix $P_{2D}^{S_k}$. The inverse operation is performed using the gather matrix $P_{2D}^{G_k}$. Detailed construction of both matrices is described in
Appendix A. Note that while the integral projection method works well for conserved quantities, it may not
be appropriate for all variables in the problem. It is sometimes better to interpolate or recompute certain
quantities, if possible. An example of such a situation is the gravity direction vector, which is defined completely by the input to the simulation and can therefore be recomputed for each new element in the mesh. In both test cases presented in this paper the gravity direction was k = (0, 1), and in some cases the projection operation caused inconsistencies on the order of the round-off error, which in turn adversely affected the solution.
7. Test cases
In order to test the AMR algorithm we run a selection of cases from the set presented in [36]. The set
consists of seven tests widely used for benchmarking of non-hydrostatic dynamical cores of numerical weather
prediction codes. For the purpose of benchmarking the AMR capabilities of our code we have picked two
scenarios. The density current and rising thermal bubble cases show the performance of the AMR algorithm
on a rapidly changing mesh. For both test cases we compare the adaptively refined simulation with a
uniformly refined simulation. Both cases are described in detail in the aforementioned paper. Here we
outline them for completeness.
7.1. Case 1: Density current
The case was first published in [39] and consists of a bubble of cold air dropped in a neutrally stratified atmosphere. The bubble eventually hits the lower boundary of the domain (a no-flux wall) and moves horizontally, shedding Kelvin-Helmholtz rotors. In order to obtain a grid-converged solution we apply an artificial viscosity $\mu = 75\ \mathrm{m}^2/\mathrm{s}$ (see [39]).
The initial condition is defined in terms of potential temperature:
$$\theta' = \begin{cases} 0 & \text{for } r > r_c, \\ \dfrac{\theta_c}{2} \left( 1 + \cos \dfrac{\pi r}{r_c} \right) & \text{for } r \le r_c, \end{cases} \quad (18)$$
where $\theta_c = -15$ K, $r = \sqrt{ \left( \frac{x - x_c}{x_r} \right)^2 + \left( \frac{z - z_c}{z_r} \right)^2 }$ and $r_c = 1$. The domain is defined as $(x, z) \in [0, 25600] \times [0, 6400]$ m with $t \in [0, 900]$ s, and the center of the bubble is at $(x_c, z_c) = (0, 3000)$ m with the size of the bubble defined by $(x_r, z_r) = (4000, 2000)$ m. The boundary conditions for all four boundaries are no-flux walls. The velocity field is initially set to zero everywhere.
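For reference, the initial perturbation of Eq. (18) can be sampled directly; the sketch below evaluates it on a uniform grid at the 40 m effective resolution used later (the sampling grid itself is illustrative only):

```python
import numpy as np

def theta_perturbation(x, z, theta_c=-15.0, xc=0.0, zc=3000.0,
                       xr=4000.0, zr=2000.0, rc=1.0):
    """Potential temperature perturbation of Eq. (18)."""
    r = np.sqrt(((x - xc) / xr) ** 2 + ((z - zc) / zr) ** 2)
    return np.where(r > rc, 0.0,
                    0.5 * theta_c * (1.0 + np.cos(np.pi * r / rc)))

x, z = np.meshgrid(np.linspace(0.0, 25600.0, 641),
                   np.linspace(0.0, 6400.0, 161))
theta0 = theta_perturbation(x, z)   # minimum value is theta_c at the center
```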
Figure 7: Snapshots of the solution and dynamically adaptive mesh for θt = 1.0 at different simulation times: (a) 1 s, (b) 300 s, (c) 600 s, and (d) 900 s.
7.2. Case 2: Rising thermal bubble
In this test case a warm bubble rises in a constant potential temperature atmosphere ($\bar\theta = 300$ K). As it rises, it deforms until it forms a mushroom shape. Initially, the air is at rest and in hydrostatic balance. The initial potential temperature perturbation is given by Eq. (18) with $\theta_c = 0.5$ K and $r_c = 250$ m. The domain has dimensions $(x, z) \in [0, 1000] \times [0, 1000]$ m and the bubble is positioned at $(x_c, z_c) = (500, 350)$ m. The boundary conditions on all sides are no-flux. The simulation runs until $t = 700$ s.
8. Results
8.1. Case 1 - Density Current
The density current simulation, defined in Section 7.1, was initialized with a coarse mesh consisting of
four elements (4×1 grid). We chose polynomial order N = 5 within each element. The mesh was then refined
uniformly to a specified maximum level of refinement. We chose the maximum level to be equal to 5, which
allowed for a uniformly fully refined mesh of 128 × 32 elements. This corresponds to an effective resolution
of 40 m, which is slightly below the resolution shown in [36] to yield a converged solution. By effective
resolution we mean the average distance between the nodal points within the element. The initial condition
was generated for this fully refined mesh, which formed an initial input to all the Case 1 simulations.
Figure 8: Element count as a function of simulation time for different θt threshold values.
Figure 9: Snapshots of the solution and dynamically adaptive mesh at T = 900 s for two different thresholds: (a) θt = 0.001 and (b) θt = 4.0.
In order to obtain a reference solution, we run the fully refined mesh without the AMR algorithm³. Next, we select five different refinement thresholds for the potential temperature perturbation, θt = [0.001, 0.1, 1.0, 2.0, 4.0]. For each θt the AMR algorithm adapts the mesh to the initial condition and continues modifying the grid every one second of simulation time, so that the areas of the domain where θ > θt are fully refined and the remaining elements are coarsened to the minimum resolution allowed by the 2:1 balance condition. Figure 7 shows snapshots of the mesh at different times for θt = 0.1.
Figure 8 shows the total number of elements in the grid for different values of θt over the simulation time. Notice the different behavior of the element count for high and low threshold values. The number of elements for high thresholds tends to level off more quickly as the simulation progresses. Low thresholds cause the element count to rise steadily with time, at least for the time frame considered for this test case. This is due to the fact that the low θt criterion not only requires the algorithm to refine the area around the moving structure, but also the near-wall wake, where the temperature is slightly decreased after the passing of the cold front. For further analysis, we define the element ratio $ER = N_{ref}/N_e$ to be the ratio of the number of elements in the reference simulation ($N_{ref} = 4096$) to the time-averaged number of elements in the AMR simulation ($N_e$). High ER corresponds to high-threshold AMR simulations.

³We must use a high-resolution simulation as the reference solution because this test case has no analytic solution.
Table I: Front location (L), minimum potential temperature perturbation (θmin) and simulation runtime (Ts).

       |            RK35              |            BDF2              |            ARK2
θt     | L [m]     θmin [K]   Ts [h]  | L [m]     θmin [K]   Ts [h]  | L [m]     θmin [K]   Ts [h]
ref.   | 14758.87  -8.90603   60.94   | 14758.70  -8.90630   6.13    | 14758.87  -8.90603   9.68
0.001  | 14758.79  -8.90606   9.78    | 14758.71  -8.90650   1.08    | 14758.79  -8.90609   1.57
0.1    | 14758.81  -8.90607   7.85    | 14758.74  -8.90661   0.89    | 14758.81  -8.90610   1.27
1.0    | 14758.86  -8.90619   6.24    | 14758.89  -8.90688   0.75    | 14758.86  -8.90622   1.03
2.0    | 14758.89  -8.90608   5.53    | 14758.98  -8.90690   0.71    | 14758.89  -8.90611   0.92
4.0    | 14758.80  -8.90555   4.11    | 14759.10  -8.90648   0.57    | 14758.80  -8.90558   0.71
8.1.1. Accuracy Analysis
In Fig. 9 two different AMR simulation results are presented. The top picture shows the potential
temperature perturbation field for a low threshold (θt = 0.001) simulation, while the bottom plot shows a
high threshold (θt = 4.0) result. The main features of the solution are the same in both cases. Even though
in the high threshold simulation the resolved region does not encompass the entire structure, the position of the front and the rotor structure look identical to the low threshold case. Differences can be noted in the wake of the front and in the far field. The high threshold mesh does not capture the wake well; therefore
this feature is not represented in the bottom plot.
Table I confirms that all simulations reproduce the main features of the solution well. The simulations
were run using an explicit 3rd-order 5-stage Runge-Kutta method (RK35), an IMEX second-order backward difference formula (BDF2) method and an IMEX second-order additive Runge-Kutta method (ARK2) - see [40] for details of these time-integrators. The front position was calculated as the location of the −1 K isotherm at the bottom wall. The values from the computational grid were interpolated using the visualization package Paraview to a very fine (0.01 m resolution) uniform grid, and the location of the isotherm was measured on that uniform grid. The front position error does not exceed 1 m for any of the methods. Interestingly, the front locations for the RK35 and ARK2 methods are identical. ARK2 also closely follows RK35 with regard to the minimum potential temperature perturbation, which suggests that ARK2 delivers accuracy superior to BDF2. The BDF2 results for both the front location and the minimum potential temperature perturbation are similar to, but slightly different from, those of RK35 and ARK2.
Figure 10: L2 error norms for AMR simulations using different time integrators: blue line with circles shows RK35 result;
green line with squares shows BDF2 result; red line with x markers shows ARK2 result.
Figure 10 shows the L2 normalized error norms for all AMR simulations plotted against the element ratio.
The norm was computed by comparing the potential temperature perturbation field of the AMR simulations
with a fully refined reference case with the same time-integration method (AMR explicit compared with
reference explicit - blue line with circle markers; AMR BDF2 compared with reference BDF2 - green line
with square markers; AMR ARK2 compared with reference ARK2 - red line with cross markers) using the
following formula:
$$L_2(Q, q) = \frac{\sum_{e,k} \left( Q_k^e - q_k^e \right)^2}{\sum_{e,k} \left( Q_k^e \right)^2}, \quad (19)$$
where $Q$ is the reference solution, $q$ is the AMR solution, $e$ is the index traversing the elements, and $k$ enumerates the nodal points within each element.
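A sketch of Eq. (19), assuming the reference and AMR nodal values have already been aligned on a common set of element and node indices:

```python
import numpy as np

def l2_error(Q, q):
    """Normalized L2 error norm of Eq. (19); Q and q hold the nodal values
    of the reference and AMR solutions, one row per element."""
    return np.sum((Q - q) ** 2) / np.sum(Q ** 2)
```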
The error for all the time integration methods is the same, which shows that AMR impacts the simulation accuracy equally regardless of the time integration scheme. For low ER simulations the $L_2$ error is very small and stays below $10^{-6}$. For high ER the error grows to $10^{-4}$. This indicates that for low ER cases the entire domain is adequately resolved, while for high ER simulations the unresolved far field impacts the global accuracy of the solution.
Overall, the error analysis shows that AMR can deliver an accurate result, with the level of accuracy
dependent on the refinement threshold. Even the high threshold AMR simulations can represent the main
features of the solution well. Regarding IMEX methods, ARK2 seems to deliver more accurate solutions
(i.e., solutions closer to the explicit RK35).
8.1.2. Performance Analysis
We examine the performance of three time integration methods in conjunction with an adaptive mesh:
the explicit RK35, IMEX BDF2, and IMEX ARK2 methods. While the explicit method proves to be simpler
to analyze, IMEX is the method of choice for all real-world applications, because it relaxes the explicit time
step constraint. For explicit simulations the time step ($\Delta t = 0.01$ s) was ten times smaller than for the IMEX
simulations. Figure 11d reflects that difference, as the IMEX simulations run nearly 10 times faster than
their explicit counterparts. The fastest method is the BDF2, while the ARK2 is just slightly slower running
with the same time step (although the ARK2 method can use a larger time-step than BDF2 due to a larger
stability region). The Courant number for the IMEX simulations was 1.6.
Since the AMR simulations have a lower element count, we expect them to run proportionally faster
than a fully refined reference simulation. The theoretical ideal speed-up is equal to the ratio of the number
of elements in the reference case to an average number of elements in the AMR simulations. We expect
that the AMR algorithm incurs an overhead for the evaluation of the refinement criterion, projection of
the solution between dynamically changing meshes and computation of non-conforming fluxes, therefore the
actual speed-up curve should never exceed the ideal profile. The difference between the ideal and actual
speed-up curves is a measure of the AMR overhead.
Figure 11 shows the speed-up of the AMR algorithm for different ER values for (a) explicit RK35, (b)
IMEX BDF2 and (c) all three time integration methods. The black solid line without markers represents the
ideal theoretical speed-up. The speed-up of explicit AMR simulations (blue solid line with circular markers
in Fig. 11a) is nearly ideal, which indicates that the cost of performing all the operations attributed to
AMR is negligible compared to the cost of time integration. Since in this case the refinement criterion was
evaluated every second (i.e., every 100 time-steps) we plot the speed-up of the simulation with refinement
check every time-step as the blue dashed line. The overhead due to frequent criterion evaluation is visible on the plot, but is still very small. This also indicates that the leading cost in the AMR overhead is the
evaluation of the criterion, since when we take away the majority of that cost (solid blue line with circular
markers) the speed-up curve overlaps with the ideal line. The other components of the overhead (mesh
manipulations, data projections, non-conforming flux computations) have a negligible cost.
In order to validate the choice of refinement frequency, we plot in Fig. 12 the time history of the element
count for two extreme refinement thresholds using refinement every time-step (black lines) and every one
second (100 time steps for explicit, or 10 time steps for IMEX simulations). Both curves overlap - only
small differences are visible for the high threshold simulation near time T = 680 s, which can be attributed to "blinking", that is, refining and de-refining the same mesh location every time step. In this case the less frequent refinement works to our advantage by removing this unwelcome feature.
Figure 11: Speed-up for (a) RK35, (b) BDF2 and (c) all time-integrators. Black solid line without markers indicates ideal
speed-up. Blue line with circles marks the RK35 results, green line with squares marks BDF2 and red line with crosses marks
ARK2. (a) Dashed line shows the speed-up in the case where the refinement criterion is evaluated at every iteration. (b)
The dashed line shows the speed-up with a constant number of solver iterations. (d) Wall clock time for all time integration
methods.
In Fig. 11b the green solid line with square markers represents simulations with IMEX BDF2 time
integration. Clearly the overhead incurred by adaptive simulations is much greater than in the explicit
case. The speed-up curve has a variable slope, which indicates that the overhead changes with the choice
of refinement threshold. The speed-up of AMR simulations with the IMEX ARK2 method (red line with cross markers in Fig. 11c) is much better than that of BDF2, but with a bigger overhead than RK35. Also, the slope for ARK2 seems to be less variable than for BDF2.
At the heart of our IMEX methods is the GMRES iterative solver. In Fig. 13 we show the average number
of GMRES iterations per time step as a function of ER for both BDF2 (green line with square markers) and
ARK2 (red line with cross markers). The point corresponding to ER = 1 is the reference simulation. For
BDF2 the average iteration count grows with ER, while for ARK2 it remains more or less constant. This
can explain the difference in the overhead incurred by AMR with those two time integration methods. To
investigate the matter further we run the BDF2 simulations with a prescribed number of GMRES iterations.
The result of this exercise is depicted by the dashed line in Fig. 11b. The nearly ideal speed-up is regained
by introducing a constant number of GMRES iterations for each simulation. Of course such a constraint
is artificial, as the GMRES algorithm automatically determines the number of iterations needed to satisfy
the solution accuracy. This means, however, that it is indeed the variable GMRES iteration count that is
preventing the AMR BDF2 simulations from achieving a good speed-up.
To investigate the reason for the higher average number of iterations per time step in the AMR BDF2
simulations we plot the time history of the iteration count for both the reference and θt = 0.001 simulations
in Fig. 14a. The top plot shows the GMRES iterations for the reference simulation - the number oscillates between 7 and 8 every time step. The bottom plot represents the AMR simulation and reveals frequent spikes in the iteration count.
Figure 12: Element number time history for two different refinement threshold settings (θt = 0.001 top line; θt = 4.0 bottom
line) evaluating the criterion every time step (black) and every 100 time steps (green) with RK35.
Figure 13: Average number of GMRES iterations per time step for reference and adaptive simulations.
A closer look at the iteration count compared with the changes in the number of elements in the mesh (Fig. 14b) shows that the spike in the iteration count occurs whenever there is a change in the mesh. This clearly implies that the AMR algorithm incurs an additional overhead with the IMEX BDF2 method because of an increased number of GMRES iterations.
On the other hand, such spikes do not occur for ARK2. The iteration count history in Fig. 14c is very similar for both the reference and θt = 0.001 simulations, although it is much more variable than for the BDF2 reference. The reason for the different behavior is that even though both BDF2 and ARK2 use the
same iterative solver, the system solved by GMRES actually differs between the two methods. While not
exhaustive, this result does seem to show that BDF2 is less robust than ARK2 with respect to the projections of the solution between meshes that are introduced by the AMR algorithm. One simple explanation is that BDF2 is a multi-step method (it requires the solution at two time-levels in addition to the right-hand-side vector) while ARK2 is a single-step multi-stage method (which requires the solution at only one previous time level, and all stages are built directly from it).
8.1.3. Mass conservation
An important measure of the quality of the discretization is mass conservation. In order to show that our non-conforming AMR implementation conserves mass as well as a conforming DG method does, we investigate the mass conservation error, which we define as:
$$M(t) = \frac{m(0) - m(t)}{m(0)}, \quad (20)$$
where $m(t) = \int_\Omega \rho(t)\, d\Omega$ is the total mass in the system at time $t$. The mass conservation error is therefore a normalized measure of the mass loss in the system over the simulation time.
Figure 14: Time history of GMRES iterations (k) for reference (top figure on each panel) and θt = 0.001 (bottom figure on each
panel) simulations for (a) BDF2 and (c) ARK2 simulations. Panel (b) contains the close-up of the time history of k between
t = 480s and 500s for the BDF2 simulation (top figure) and the time history of the element count (Ne on the bottom figure).
The plots of M for the
reference simulation and two different AMR simulations are presented in Figure 15. The top three panels
present the mass conservation error as a function of time for the reference simulation (Fig. 15a) and two
AMR simulations with different refinement thresholds (Fig. 15b and 15c). The M profile of the reference
simulation, after the initial transient behavior, is flat for all the time integration methods and stays below $10^{-13}$. The profiles for both AMR simulations are bounded from above by the reference mass conservation
level. We notice also that M varies slightly over time for the AMR simulations. In panel (d) we plot the
element count (Ne ) for the reference simulation and two AMR simulations. Note that the level of mass
conservation on panels (a), (b) and (c) is correlated with the element count for reference, θt = 0.001 and
θt = 4.0 simulations. The mass conservation error and the element count are both constant for the reference
simulation. For the AMR simulations M is growing whenever there is an increase in Ne . This can be
explained by the fact that more integration points mean more roundoff errors in mass computations, which
add up to create a larger mass conservation error although it remains far below the conservation error of
the high-resolution uniform simulation.
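The diagnostic itself only combines the quadrature rule of Eq. (8) with Eq. (20); a sketch (array names are illustrative) is given below.

```python
import numpy as np

def total_mass(rho, wq, detJ):
    """m = sum over all active elements and nodes of w_i w_j rho |J|,
    cf. Eq. (8); wq holds the tensor-product quadrature weights and detJ
    the Jacobian determinant at every nodal point."""
    return np.sum(wq * detJ * rho)

def mass_error(state0, state_t):
    """M(t) = (m(0) - m(t)) / m(0), Eq. (20); each state is a tuple
    (rho, wq, detJ) on the (possibly different) mesh at that time."""
    m0 = total_mass(*state0)
    return (m0 - total_mass(*state_t)) / m0
```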
8.1.4. Optimization remarks
The NUMA code is written in Fortran90, and its performance relies heavily on the compiler. All of the preceding analysis was conducted with compiler optimization turned off in order to investigate the algorithm itself. Figure 16a presents the decrease in runtime due to using the O3 (Intel) compiler optimization setting for both explicit and IMEX simulations. Use of optimization provides almost an order of magnitude faster code, which is a considerable benefit for any application. Figure 16b shows the speed-up analysis for the adaptive simulations using compiler optimization. Both explicit and IMEX speed-up curves (dashed lines) are above the theoretical ideal speed-up line. The IMEX time integrators benefitted more from optimization, as they provide higher speed-ups than the explicit method (even though BDF2 still suffers from a variable number of GMRES iterations). The biggest beneficiary (speed-up wise) of the compiler optimization is the ARK2 method. Also, the slope of all optimized curves is higher than the ideal slope. This indicates that the original assumption about the work load being proportional to the number of elements does not necessarily hold. The theoretical performance model of the algorithm does not incorporate possible optimizations that occur at the compiler level. We attribute these super-speed-ups to the optimization of memory accesses (e.g., prefetching), which plays a significant role when the problem size gets smaller (and fits in cache) due to the use of the AMR algorithm.
Figure 15: Mass conservation error for Case 1 for (a) reference simulation, (b) θt = 0.001 AMR simulation and (c) θt = 4.0
AMR simulation for three time integration methods. Panel (d) shows the time history of the element count for all three
simulations using all three methods. Results from different simulations with the same time-integration method overlap, which
is expected.
Table II: Timings (in seconds of runtime) and percentage breakdown of different AMR components for Case 1.

|                        | RK35, θt = 0.001 | RK35, θt = 4.0 | BDF2, θt = 0.001 | BDF2, θt = 4.0 | ARK2, θt = 0.001 | ARK2, θt = 4.0 |
|------------------------|------------------|----------------|------------------|----------------|------------------|----------------|
| total                  | 5292 (100%)      | 2128 (100%)    | 643.6 (100%)     | 297.0 (100%)   | 967.8 (100%)     | 403.1 (100%)   |
| volume integrals       | 2876.1 (54.3%)   | 1183.3 (55.6%) | 274.46 (42.6%)   | 135.2 (45.5%)  | 362.9 (37.5%)    | 153.7 (38.1%)  |
| face integrals         | 507.67 (9.59%)   | 182.19 (8.56%) | 66.59 (10.3%)    | 28.07 (9.45%)  | 80.1 (8.29%)     | 29.58 (7.34%)  |
| non-conforming faces   | 116.79 (2.21%)   | 90.94 (4.27%)  | 21.33 (3.31%)    | 20.98 (7.06%)  | 23.65 (2.44%)    | 20.14 (4.99%)  |
| AMR:                   | 6.030 (0.11%)    | 2.933 (0.14%)  | 6.291 (0.98%)    | 3.154 (1.06%)  | 6.025 (0.62%)    | 3.037 (0.75%)  |
| - criterion evaluation | 3.463            | 1.422          | 3.517            | 1.468          | 3.527            | 1.506          |
| - mesh manipulation    | 0.051            | 0.049          | 0.05             | 0.057          | 0.051            | 0.063          |
| - data projection      | 0.965            | 0.581          | 1.247            | 0.77           | 0.965            | 0.599          |
| - other                | 1.412            | 0.787          | 1.450            | 0.833          | 1.454            | 0.843          |
The IMEX time integrators benefited more from optimization, as they provide higher speed-ups than the
explicit method (even though BDF2 still suffers from a variable number of GMRES iterations). The biggest
beneficiary (speed-up wise) of the compiler optimization is the ARK2 method. Also, the slope of all
optimized curves is steeper than the ideal slope. This indicates that the original assumption of the work
load being proportional to the number of elements does not necessarily hold: the theoretical performance
model of the algorithm does not account for optimizations that occur at the compiler level. We attribute
these super-speed-ups to the optimization of memory accesses (e.g., prefetching), which plays a significant
role when the problem size becomes smaller (and fits in cache) due to the use of the AMR algorithm.
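To state the simple performance model explicitly (our paraphrase of the assumption above): if the work per time-step is proportional to the number of elements, the expected speed-up of an AMR run over the uniform reference is just the element ratio,
\[
S_{\text{ideal}} = \frac{T_{\text{uniform}}}{T_{\text{AMR}}} \approx \frac{N_e^{\text{uniform}}}{\langle N_e^{\text{AMR}} \rangle} = \text{ER},
\]
where $\langle N_e^{\text{AMR}} \rangle$ is the time-averaged element count of the adaptive run. Optimized speed-ups above this line therefore mean that the cost per element itself dropped as the problem shrank.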
8.1.5. AMR cost breakdown
In Table II we present the absolute runtime (in seconds) and a percentage share in the total simulation
time of selected parts of the code for three time integration methods and two refinement thresholds. We
list the total runtime, the cost of evaluating the volume integrals (the Ωe integrals in Eq. (10) apart from
the one containing the time derivative), conforming and non-conforming face integrals (both contribute to
the Γe integral in Eq. (10)) and the time spent in the AMR subroutine. We further provide a breakdown
of how much time was spent on particular AMR tasks such as the criterion evaluation, mesh manipulation,
or data projection.
Figure 16: (a) Wall clock time for explicit (blue circles), IMEX BDF2 (green squares) and IMEX ARK2 (red crosses) simulations
with (dashed line) and without (solid line) compiler optimization; (b) Speed-up of explicit (blue circles), IMEX BDF2 (green
squares) and IMEX ARK2 (red crosses) AMR simulations with (dashed lines) and without (solid lines) compiler optimization.
The category "other" contains operations like recomputing quantities such as the
mass matrix and gravity vector, which were not optimized for a changing mesh: instead of recomputing
those values for new elements only, we do it for all the elements in the new mesh. The overall time spent
in those subroutines is negligible compared to the total simulation time, so optimizing them would yield
a very small gain; it will, however, have to be addressed in the future.
We observe that the total share of the AMR in runtime is on the order of 1% for IMEX methods and a
factor of 10 less for the explicit time integration. All the timings for the AMR part are basically identical
for all three methods, which is expected as we call the AMR subroutine every 1s of the simulation time,
regardless of the time-step. We also note that the biggest part of the AMR time share is spent in evaluating
the refinement criterion, with the data projection cost being significantly smaller and the mesh manipulation
negligible. We disregard the cost of other operations, as explained above.
For the sake of argument, one could say that if we checked the refinement criterion every time step, the
cost of AMR would rise from 1% to 10%. That would be true only if the simulation required mesh
modifications every time step. As shown in Fig. 12, this is not the case here: even with a refinement check
every 10 (or 100 for RK35) time-steps we reproduce the same mesh behavior as for refinement every
time-step. Even then, a 10% cost would still be acceptable.
The evaluation of the flux over the non-conforming faces can also be considered an overhead of AMR;
however, it is more difficult to quantify how much of this cost can actually be attributed to AMR. As
explained in Sec. 6, evaluating the non-conforming flux involves not only computing the Rusanov flux on
two children faces, as in the regular conforming case, but also the projections from and to the parent face.
The cost of the operations on a non-conforming face is therefore larger than the cost of the two conforming
faces it replaces. On the other hand, the non-conforming face allows for decreasing the number of elements
in the domain, and therefore limits the number of expensive volume integral evaluations.
8.2. Case 2 - Rising thermal bubble
In a similar fashion to Case 1, we start the rising thermal bubble simulation by refining the level 0 mesh
of 2 × 2 elements uniformly by 5 levels. On the resulting mesh of 64 × 64 elements we generate the initial
condition described in Sec. 7.2. The polynomial order is again set to NP = 5, which gives an effective
resolution of 3.125 m. The smaller domain size and increased resolution, compared to Case 1, impose a
decreased time step. We choose ∆t = 0.025 s, which results in a Courant number of 4.7. This set-up
generates a problem of a similar size to Case 1, but with a significantly increased runtime due to the smaller
time-step. Also, the increased Courant number will put more strain on the GMRES solver, as the iteration
count will be significantly larger.
Figure 17: History of element count for different refinement threshold settings for Case 2.
Table III: Maximum potential temperature perturbation (θmax ), height of the bubble (Hb ) and simulation runtime (Ts ).
| θt    | RK35: Hb [m] | θmax [K] | Ts [h] | BDF2: Hb [m] | θmax [K] | Ts [h] | ARK2: Hb [m] | θmax [K] | Ts [h] |
|-------|--------------|----------|--------|--------------|----------|--------|--------------|----------|--------|
| ref.  | 961.09       | 0.46110  | 197.51 | 960.16       | 0.46124  | 91.29  | 961.27       | 0.46064  | 77.38  |
| 0.001 | 961.07       | 0.46110  | 58.51  | 960.75       | 0.46012  | 28.50  | 961.74       | 0.45993  | 23.64  |
| 0.01  | 961.05       | 0.46116  | 49.67  | 960.88       | 0.46031  | 25.79  | 961.81       | 0.46001  | 19.99  |
| 0.1   | 962.27       | 0.46049  | 35.48  | 962.15       | 0.45861  | 19.43  | 963.02       | 0.45853  | 14.98  |
| 0.35  | 968.92       | 0.45795  | 18.13  | 967.76       | 0.45551  | 10.22  | 968.34       | 0.45759  | 7.95   |
Figure 17 shows the time history of the element count for four different refinement thresholds. The
behavior of the mesh is similar to Case 1: initially the mesh does not change much, and the number of
elements starts growing at later times. For high thresholds this growth is eventually stopped (θt = 0.3 for
t > 600 s). The period of initial inactivity is longer by 100 s than for Case 1 and has a much larger share
of the total simulation time (over 50%).
8.2.1. Accuracy analysis
Figure 18 compares four potential temperature perturbation fields at t = 700 s for different refinement
thresholds. Figure 18a shows that the lowest threshold refines a large portion of the domain around the
bubble, including the interior of the mushroom bubble and the wake. This results in very smooth contour
lines. As the threshold rises, a smaller portion of the domain is refined and the contours become wavier,
indicating some instability. Note that in Figs. 18b and 18c the outer contours of the mushroom are refined
in the same way, but the mesh within the bubble is very different, which results in a wavier mushroom
pattern in Fig. 18c. This indicates that not only the strong gradient zone but also the internal mushroom
region is important for this case. The solution in Fig. 18d is the worst-case scenario, where the general
bubble shape is still recreated, but the solution is not smooth and some mesh imprinting can be seen at
the interfaces of big elements in the center of the mushroom. In this case only the areas with the highest
potential temperature perturbation are refined, and the lack of refinement in the rest of the bubble
significantly affects the solution in the refined region.
Table III summarizes the maximum potential temperature perturbation in the domain at time t = 700 s,
as well as the position (height) of the bubble at that time. The position of the bubble is defined as the
vertical coordinate of the top-most intersection of the θ = 0.1 K isoline with the x = 500 m central axis of
the bubble. Additionally, the runtime for all the cases using different time integration methods is presented
for comparison. The runtime was measured for simulations without compiler optimizations and preconditioning.
For all time integration methods, both the bubble height and θmax follow the reference solution closely
for low θt and diverge for larger values of the refinement threshold.
Figure 18: Potential temperature perturbation contours at t = 700s for (a) θt = 0.001, (b) θt = 0.01, (c) θt = 0.1, and (d)
θt = 0.35.
The simulation with θt = 0.01 matches the reference solution almost perfectly, but obtains this result in a
significantly shorter simulation time (about 25% of the reference runtime). In this case the IMEX methods
run with a significantly higher Courant number than in Case 1, which causes the reference simulations for
both BDF2 and ARK2 to differ slightly from the explicit one. Similarly, the AMR results for both ARK2
and BDF2 are not exactly the same as their explicit counterparts. The ARK2 method proves to be the
fastest of the three.
In Fig. 19 the L2 norms of the potential temperature perturbation error for the two IMEX methods are
presented. The norms were computed using the reference simulation for each method (BDF2 AMR
simulations compared with the BDF2 reference, and likewise for ARK2). Similarly to the Case 1 results,
the error does not differ much between methods, which indicates that AMR affects the accuracy of both
methods in the same way. The level of error is significantly higher, though, than in the density current
case. This is because the artificial viscosity applied to stabilize this flow is significantly smaller than for
the density current case: the viscosity for Case 1 is µ = 75 m²/s while for Case 2 it is µ = 0.1 m²/s. This
amount of viscosity is enough to stabilize the flow, but is barely enough to guarantee a smooth solution at
the effective resolution of ∆x = 3.125 m (see [32]). A more in-depth analysis of the rising thermal bubble
case with AMR will follow in an upcoming paper.
8.2.2. Performance analysis
Figure 20 shows the simulation runtime (Fig. 20a) and speed-up (Fig. 20b) of AMR simulations for
different refinement thresholds. Solid lines represent unoptimized simulations, while dashed lines show the
performance with compiler O3 optimizations.
Figure 19: L2 error norms for RK35 (blue line with circles), BDF2 (green line with squares) and ARK2 (red line with crosses)
for Case 2.
Figure 20: (a) Wall clock time as a function of average element count and (b) speed-up of AMR simulations as a function of
ER for different IMEX methods for Case 2. Green line with squares represents BDF2; red line with crosses shows the result for
ARK2; blue line with circles marks the timing of the explicit RK35 method. Dashed lines represent the code optimized using
compiler flags, solid lines show the unoptimized case. The solid black line shows the expected ideal speed-up.
As observed already in Table III, ARK2 was the fastest method
for this case for both optimized and unoptimized runs. The speed-up plot for the unoptimized simulations
looks similar to that for Case 1, where explicit RK35 obtains nearly ideal speed-up, followed by ARK2 and
BDF2. This time the difference between the IMEX methods is not as pronounced. The compiler-optimized
runs again show speed-ups exceeding expectations.
The greater speed of the ARK2 simulations can be credited to the increased Courant number (compared
to Case 1) and the absence of preconditioning. ARK2 is a two-stage method, where each stage performs
significantly fewer GMRES iterations than a single BDF2 time-step. The total number of iterations per
time step is presented in Fig. 21. Since the cost of GMRES is O(k²) (where k is the number of GMRES
iterations), that alone accounts for the increased runtime. On the other hand, the use of preconditioners
could significantly bring down the number of iterations for BDF2, while ARK2 may not benefit as much
due to the relatively small iteration count at each stage. We discuss the impact of preconditioners in the
following section.
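A one-line justification of the O(k²) cost quoted above (a standard property of GMRES, not specific to this code): at Arnoldi step $i$ the new Krylov vector must be orthogonalized against the $i$ existing basis vectors, so $k$ iterations require
\[
\sum_{i=1}^{k} i = \frac{k(k+1)}{2} = O(k^2)
\]
inner products in addition to $k$ matrix-vector products; doubling the iteration count therefore roughly quadruples the orthogonalization work.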
Figure 21 shows the average number of GMRES iterations for both IMEX methods. For ARK2 we
present the sum of the iterations from both implicit stages. In both cases AMR increases the iteration
count; however, the increase is much more pronounced for the BDF2 method. Contrary to the Case 1
result, this does not translate into a significant difference in the speed-up plots.
Figure 21: Average number of GMRES iterations as a function of ER for different IMEX methods for Case 2. Green line with
squares represents BDF2; red line with crosses shows the result for ARK2.

Looking at Fig. 22 we see that the spikes in the GMRES iteration count reported previously (for Case
1) are not as significant for Case 2. The iteration count for the BDF2 AMR simulation (bottom plot of Fig.
22a) is larger than for the reference case (top plot). The spikes, however, mainly occur while de-refining
the mesh (Fig. 22b) and do not contribute significantly to the overall average iteration count. Very small
spikes can be noted when the element count increases, but compared to the average iteration number this
increase is negligible. For ARK2 the AMR GMRES iteration time history (bottom plot of Fig. 22c) much
more closely follows the reference case (top plot of Fig. 22c), which again indicates that this method is
more robust with respect to the AMR algorithm.
8.2.3. Preconditioning
In any real application, IMEX methods are used in combination with a preconditioner in order to improve
the conditioning of the system and bring the GMRES iteration count down. Of course, the cost of
computing and applying the preconditioner has to be smaller than the cost saved by decreasing the iteration
count. An added complexity occurs in the AMR case: because the mesh keeps changing, the preconditioner
has to adapt to those changes. Here we present only a very preliminary discussion of the use of
preconditioners in AMR simulations. We apply the element-based spectrally-optimized approximate
inverse preconditioner described in [41].
Figure 23 presents the GMRES iteration count for both IMEX methods with preconditioning (dashed
line) and without (solid line). For both BDF2 and ARK2, the preconditioner brought the number of
iterations down by a similar factor. The decrease in the iteration count does not translate into a runtime
decrease equally well for both methods, which can be observed in Fig. 24a. For BDF2 the use of
preconditioning resulted in noticeably faster simulations, while for ARK2 the speed-up was not as
pronounced. Interestingly, the reference simulations benefited more from the preconditioner (the rightmost
points on the graph). This is due to the fact that without AMR the preconditioner has to be computed
only once, at the beginning of the simulation, and never needs to be recomputed. For AMR simulations,
recomputing the preconditioner adds a significant overhead. This is apparent in Fig. 24b, where the
speed-up lines for preconditioned simulations are significantly lower than the speed-up for the
non-preconditioned cases.

It is worth noting that this is just a preliminary study of the influence of preconditioning on AMR
simulations. At the moment we recompute the preconditioner each time the mesh is modified, even if only
one of the elements is changed. Further study will reveal whether it is possible to limit this overhead.
Additionally, this preconditioner was designed for the Schur-form CG method and is here applied to the
non-Schur form DG method, therefore it may not be as beneficial for the cases at hand. We will address
these issues in an upcoming paper.
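Since the preconditioner of [41] is element-based, one natural way to limit the recomputation overhead would be to cache the per-element blocks and rebuild only those invalidated by the last mesh adaptation. The sketch below is a hypothetical illustration of that policy, not the NUMA implementation; `build_element_block` and the element bookkeeping are placeholder names:

```python
import numpy as np

class ElementPreconditioner:
    """Hypothetical cache of element-local preconditioner blocks;
    rebuilds only the blocks invalidated by mesh adaptation."""

    def __init__(self, build_element_block):
        self.build = build_element_block   # elem_id -> local block (2D array)
        self.blocks = {}                   # cache keyed by element id

    def update(self, mesh_elements, changed):
        """Recompute blocks for new/changed elements; drop coarsened ones."""
        alive = set(mesh_elements)
        for eid in list(self.blocks):
            if eid not in alive:
                del self.blocks[eid]       # element was coarsened away
        for eid in mesh_elements:
            if eid in changed or eid not in self.blocks:
                self.blocks[eid] = self.build(eid)

    def apply(self, r, elem_dofs):
        """Apply the block preconditioner element by element: z_e = P_e r_e."""
        z = np.array(r, copy=True)
        for eid, dofs in elem_dofs.items():
            z[dofs] = self.blocks[eid] @ r[dofs]
        return z
```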
8.2.4. Mass conservation
Similarly to Case 1, the mass conservation error for the AMR simulations of Case 2 is bounded by the
error of the reference case. Figure 25 shows the same sensitivity of M to an increasing number of elements
over the simulation time.
Figure 22: Time history of GMRES iterations (k) for reference (top figure on each panel) and θt = 0.1 (bottom figure on each
panel) simulations for (a) BDF2 and (c) ARK2 simulation. Panel (b) contains the close-up of the time history of k between
t = 280s and 300s for BDF2 simulation (top figure) and the time history of the element count (Ne on the bottom figure).
Table IV: Corrected coefficients for the RK35 method.

| stage | α                  |                   |                   | β                 |
|-------|--------------------|-------------------|-------------------|-------------------|
| 1     | 1                  | 0                 | 0                 | 0.377268915331368 |
| 2     | 0                  | 1                 | 0                 | 0.377268915331368 |
| 3     | 0.355909775063326* | 0.644090224936674 | 0                 | 0.242995220537396 |
| 4     | 0.367933791638137  | 0.632066208361863 | 0                 | 0.238458932846290 |
| 5     | 0                  | 0.762406163401431 | 0.237593836598569 | 0.237593836598569 |

* marks the difference from [42].
All three time integration methods give consistent mass conservation behavior. During
the study an important property of the explicit RK35 method was noted, which extends to other time
integration methods: the coefficients of each stage of the Runge-Kutta method have to be consistent up to
the desired precision. If we aim for double precision, and are therefore looking for round-off errors at the
10⁻¹⁶ level, the coefficients have to be consistent up to the 16th decimal place. This is not the case for the
coefficients of the third stage of RK35 published in Ruuth [42]. In this study, a small inconsistency at the
last decimal place, probably the result of a roundoff error, led to a steady growth of the M norm. In Table
IV we provide consistent coefficients for the RK35 method, which fixed this steady growth.
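Assuming the usual Shu-Osher representation of SSP Runge-Kutta methods, in which consistency requires the α coefficients of every stage to sum to exactly 1, the condition can be verified directly in double precision. A minimal sketch using the Table IV values (the three α columns of each stage):

```python
import numpy as np

# alpha coefficients of the five RK35 stages from Table IV; the starred
# entry in the text is the corrected third-stage coefficient
alpha = np.array([
    [1.0,               0.0,               0.0],
    [0.0,               1.0,               0.0],
    [0.355909775063326, 0.644090224936674, 0.0],
    [0.367933791638137, 0.632066208361863, 0.0],
    [0.0,               0.762406163401431, 0.237593836598569],
])

for i, row in enumerate(alpha, start=1):
    # a nonzero residual at the 1e-16 level is the kind of inconsistency
    # that caused the steady growth of the M norm described above
    print(f"stage {i}: sum(alpha) - 1 = {row.sum() - 1.0:+.1e}")
```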
8.2.5. AMR cost breakdown
Compared to the results for Case 1, the share of the AMR cost for the IMEX methods in Case 2 is
significantly smaller. This is because the time integration became more expensive due to the increased
Courant number. Also, the criterion evaluation cost relative to the total AMR cost is much larger than in
the previous case, especially for the θt = 0.01 simulations. This is because we check the refinement
criterion every 10 iterations, which translates to 0.25 s of simulation time; we therefore check for
refinement more often than we actually perform any mesh modification. It is clear that the refinement
check frequency parameter could have been chosen more optimally. Note, however, that for this case the
cost of time integration is so high that even more frequent checks would not cause the cost of AMR to
exceed a few percent.
Figure 23: Average number of GMRES iterations as a function of ER for different IMEX methods for Case 2 with (dashed
line) and without preconditioning (solid line). Green line with squares represents BDF2; red line with crosses shows the result
for ARK2.
Figure 24: (a) Runtime and (b) speed-up of compiler optimized IMEX simulations for Case 2 with (dashed line) and without
preconditioning (solid line). Green line with squares represents BDF2; red line with crosses shows the result for ARK2.
9. Conclusion
We presented the details of the AMR algorithm for the discontinuous Galerkin version of the
Nonhydrostatic Unified Model of the Atmosphere (NUMA) and sought to analyze the benefit of AMR. To
this end, we compared the accuracy versus the speed-up of AMR with respect to uniform grid simulations.
The AMR algorithm was tested using a threshold-based criterion for both the density current and rising
thermal bubble cases. We investigated the performance of the AMR algorithm running with three time
integration methods: the explicit RK35 and two IMEX methods, BDF2 and ARK2. For all the methods
the accuracy of the result depends on the choice of the threshold for the refinement criterion. For higher
element ratios (ER) the criterion does not capture all the features of the solution, therefore the L2 error
for the density current case increases from 10⁻⁶ for low-threshold simulations to 10⁻⁴ for high-threshold
ones. The most significant features of the solution, such as the front position, minimum potential
temperature, and the general structure of the front, are captured correctly even for high-ER simulations.
The ARK2 method closely reproduces the result of RK35, while BDF2 agrees less closely.

For the rising thermal bubble case the general features of the solution are captured correctly by all the
simulations except the one with the highest threshold value. In that case the boundary of the bubble is
clearly disturbed and some mesh artifacts are present in the solution.
Figure 25: Mass conservation error for Case 2 for (a) reference simulation, (b) θt = 0.01 AMR simulation and (c) θt = 0.35
AMR simulation for three time integration methods. Panel (d) shows the time history of the element count for all three
simulations using all three methods. Results from different simulations with the same time-integration method overlap, which
is expected.
Table V: Timings (in seconds of runtime) and percentage breakdown of different AMR components for Case 2.

|                        | RK35, θt = 0.001 | RK35, θt = 4.0 | BDF2, θt = 0.001 | BDF2, θt = 4.0 | ARK2, θt = 0.001 | ARK2, θt = 4.0 |
|------------------------|------------------|----------------|------------------|----------------|------------------|----------------|
| total                  | 27994 (100%)     | 9710 (100%)    | 18938 (100%)     | 7154 (100%)    | 13245 (100%)     | 4687 (100%)    |
| volume integrals       | 13954 (49.8%)    | 4740 (48.8%)   | 4990 (26.3%)     | 1829 (25.6%)   | 4618 (34.9%)     | 1655 (35.3%)   |
| face integrals         | 2560 (9.14%)     | 678.9 (7%)     | 1540 (8.13%)     | 385.7 (5.39%)  | 1369 (10.3%)     | 342.6 (7.31%)  |
| non-conforming faces   | 677.7 (2.42%)    | 503.6 (5.19%)  | 535.4 (2.83%)    | 440.6 (6.16%)  | 460.3 (3.47%)    | 374.4 (7.99%)  |
| AMR:                   | 21.7 (0.08%)     | 8.5 (0.09%)    | 22.2 (0.12%)     | 8.49 (0.11%)   | 21.8 (0.16%)     | 8.29 (0.18%)   |
| - criterion evaluation | 16.8             | 5.9            | 16.8             | 5.9            | 16.8             | 5.93           |
| - mesh manipulation    | 0.15             | 0.16           | 0.15             | 0.16           | 0.15             | 0.15           |
| - data projection      | 1.76             | 0.86           | 2.39             | 1.13           | 1.93             | 0.9            |
| - other                | 2.5              | 1.11           | 2.7              | 1.16           | 2.81             | 1.22           |
It was discovered that not only does the high temperature gradient area at the boundary of the bubble
need to be refined, but the resolution of the interior of the bubble also plays a role. All the important
features, such as the position of the bubble and the maximum potential temperature, were captured very
accurately by the lower refinement thresholds at a fraction of the cost.
The performance of the algorithm was investigated by comparing the runtime and speed-up of different
AMR simulations. The explicit RK35 showed nearly perfect speed-up, which confirms that the AMR
algorithm is indeed very efficient. In the cases shown in this paper the overall cost of the adaptive mesh
algorithm is below 1% of the total runtime. The main component of the AMR overhead is the evaluation
of the refinement criterion, while the mesh manipulation and data projections have negligible cost. For
the density current test case (Case 1) with BDF2, an additional large overhead was the increased GMRES
iteration count caused by AMR. The BDF2 speed-up turned out to be the worst of all three methods.
Both IMEX methods had runtimes about an order of magnitude smaller than RK35. ARK2 was shown to
be slightly slower than BDF2 using the same time step, but with much better accuracy and improved
speed-up properties. The ARK2 method was shown to be more robust (less sensitive) with respect to the
changing mesh caused by the AMR algorithm. For the rising thermal bubble test case (Case 2), ARK2
was the fastest method due to the increased Courant number. The difference in the speed-up curves was
not as pronounced, but again ARK2 proved to have better properties and less overhead than BDF2.
In nonhydrostatic atmospheric applications (when using the fully compressible equations), IMEX methods
are an absolute necessity due to the very strict constraint imposed on the explicit time-step by acoustic
waves. This study shows that the performance of some IMEX methods can be affected by a dynamically
adaptive mesh. It is important not only to design the AMR algorithm in an efficient way, but also to
consider which time integration method to use in such applications. In our future investigations we will
pursue the ARK2 method due to its good performance properties and excellent accuracy.

The mass conservation error for both test cases and all the time integration methods was shown to be
bounded by the mass conservation error of the reference simulations. For adaptive simulations the mass
conservation was affected by the changing number of elements in the mesh.

A preliminary study of the influence of preconditioning on the AMR simulations showed that it is
another significant source of overhead, mainly because the preconditioner was recomputed after every
mesh modification. We will pursue possibilities for reducing this overhead in future work, since it may not
be necessary to always recompute the preconditioner.

Finally, the effect of compiler optimizations on the algorithm performance was investigated. The
optimizations provide a significant runtime reduction, as large as a factor of 10. Additionally, optimization
affects the speed-up of the AMR algorithm, as more efficient memory handling for smaller problems allows
for speed-ups greater than would follow from the simple performance model. In a sense the optimizations
more than make up for the overhead caused by AMR.
Acknowledgements
The authors gratefully acknowledge the support of the Office of Naval Research through program element PE-0602435N, the National Science Foundation (Division of Mathematical Sciences) through program
element 121670, and the Air Force Office of Scientific Research through the Computational Mathematics
program. The authors are grateful to Andreas Müller and Simone Marras for constructive discussions that
greatly contributed to this paper.
[1] R. Nair, H.-W. Choi, H. Tufo, Computational aspects of a scalable high-order discontinuous Galerkin atmospheric dynamical core, Comp. Fl. 38 (2) (2009) 309–319.
[2] J. Kelly, F. Giraldo, Continuous and discontinuous Galerkin methods for a scalable 3D nonhydrostatic atmospheric model: limited area mode, J. Comp. Phys. 231 (2) (2012) 7988–8008.
[3] R. Hartmann, P. Houston, Adaptive discontinuous Galerkin finite element methods for the compressible Euler equations, J. Comp. Phys. 183 (2) (2002) 508–532.
[4] P. Bastian, M. Blatt, A. Dedner, C. Engwer, R. Klöfkorn, R. Kornhuber, M. Ohlberger, O. Sander, A generic grid interface for parallel and adaptive scientific computing. Part II: Implementation and tests in DUNE, Computing 82 (2-3) (2008) 121–138.
[5] C. Burstedde, O. Ghattas, M. Gurnis, G. Stadler, E. Tan, T. Tu, L. C. Wilcox, S. Zhong, Scalable adaptive mantle convection simulation on petascale supercomputers, in: Proc. 2008 ACM/IEEE Supercomputing, IEEE Press, 2008, p. 62.
[6] C. Jablonowski, Adaptive grids in weather and climate modeling, Ph.D. thesis, The University of Michigan (2004).
[7] J. Behrens, Adaptive atmospheric modeling: key techniques in grid generation, data structures, and numerical operations with applications, Springer, 2006.
[8] G. W. Ley, R. L. Elsberry, Forecasts of typhoon Irma using a nested grid model, Mon. Wea. Rev. 104 (1976) 1154. doi:10.1175/1520-0493(1976)104<1154:FOTIUA>2.0.CO;2.
[9] Y. Kurihara, M. A. Bender, Use of a movable nested-mesh model for tracking a small vortex, Mon. Wea. Rev. 108 (1980) 1792–1809. doi:10.1175/1520-0493(1980)108<1792:UOAMNM>2.0.CO;2.
[10] D.-L. Zhang, H.-R. Chang, N. L. Seaman, T. T. Warner, J. M. Fritsch, A two-way interactive nesting procedure with variable terrain resolution, Mon. Wea. Rev. 114 (7) (1986) 1330–1339.
[11] G. S. Dietachmayer, K. K. Droegemeier, Application of continuous dynamic grid adaptation techniques to meteorological modeling. I: Basic formulation and accuracy, Mon. Wea. Rev. 120 (8) (1992) 1675–1706.
[12] J. M. Prusa, P. K. Smolarkiewicz, An all-scale anelastic model for geophysical flows: dynamic grid deformation, J. Comp. Phys. 190 (2) (2003) 601–622.
[13] C. J. Budd, W. Huang, R. D. Russell, Adaptivity with moving grids, Acta Numerica 18 (1) (2009) 111–241.
[14] W. Skamarock, J. Oliger, R. L. Street, Adaptive grid refinement for numerical weather prediction, J. Comp. Phys. 80 (1) (1989) 27–60.
[15] W. C. Skamarock, J. B. Klemp, Adaptive grid refinement for two-dimensional and three-dimensional non-hydrostatic atmospheric flow, Mon. Wea. Rev. 121 (3) (1993) 788–804.
[16] M. J. Berger, J. Oliger, Adaptive mesh refinement for hyperbolic partial differential equations, J. Comp. Phys. 53 (3) (1984) 484–512.
[17] M. J. Berger, P. Colella, Local adaptive mesh refinement for shock hydrodynamics, J. Comp. Phys. 82 (1) (1989) 64–84.
[18] R. J. LeVeque, Wave propagation algorithms for multidimensional hyperbolic systems, J. Comp. Phys. 131 (2) (1997) 327–353.
[19] N. Nikiforakis, AMR for global atmospheric modelling, in: Adaptive Mesh Refinement - Theory and Applications, Springer, 2005, pp. 505–526.
[20] D. P. Bacon, N. N. Ahmad, Z. Boybeyi, T. J. Dunn, M. S. Hall, P. C. Lee, R. A. Sarma, M. D. Turner, K. T. Waight III, S. H. Young, et al., A dynamically adapting weather and dispersion model: the operational multiscale environment model with grid adaptivity (OMEGA), Mon. Wea. Rev. 128 (7) (2000) 2044–2076.
[21] P. K. Smolarkiewicz, A fully multidimensional positive definite advection transport algorithm with small implicit diffusion, J. Comp. Phys. 54 (2) (1984) 325–362.
[22] J. P. Iselin, J. M. Prusa, W. J. Gutowski, Dynamic grid adaptation using the MPDATA scheme, Mon. Wea. Rev. 130 (4) (2002) 1026–1039.
[23] F. X. Giraldo, The Lagrange-Galerkin method for the two-dimensional shallow water equations on adaptive grids, Int. J. Numer. Meth. Fl. 33 (6) (2000) 789–832.
[24] J. Behrens, Atmospheric and ocean modeling with an adaptive finite element solver for the shallow-water equations, App. Num. Math. 26 (1) (1998) 217–226.
[25] J. Behrens, N. Rakowsky, W. Hiller, D. Handorf, M. Läuter, J. Päpke, K. Dethloff, amatos: Parallel adaptive mesh generator for atmospheric and oceanic simulation, Ocean Mod. 10 (1) (2005) 171–183.
[26] M. A. Taylor, A. Fournier, A compatible and conservative spectral element method on unstructured grids, J. Comp. Phys. 229 (17) (2010) 5879–5895.
[27] A. St-Cyr, C. Jablonowski, J. M. Dennis, H. M. Tufo, S. J. Thomas, A comparison of two shallow water models with non-conforming adaptive grids: classical tests, arXiv preprint physics/0702133.
[28] E. J. Kubatko, S. Bunya, C. Dawson, J. J. Westerink, Dynamic p-adaptive Runge-Kutta discontinuous Galerkin methods for the shallow water equations, Comp. Meth. Appl. Mech. Engng. 198 (21) (2009) 1766–1774.
[29] D. A. Kopriva, A conservative staggered-grid Chebyshev multidomain method for compressible flows. II: A semi-structured method, Tech. Rep. 2, NASA Contractor Report (Oct. 1996). doi:10.1006/jcph.1996.0225. URL http://linkinghub.elsevier.com/retrieve/pii/S0021999196902259
[30] Y. Maday, C. Mavriplis, A. T. Patera, Nonconforming mortar element methods: Application to spectral discretizations, Institute for Computer Applications in Science and Engineering, NASA Langley Research Center, 1988.
[31] D. Rosenberg, A. Fournier, P. Fischer, A. Pouquet, Geophysical-astrophysical spectral-element adaptive refinement (GASpAR): Object-oriented h-adaptive fluid dynamics simulation, J. Comp. Phys. 215 (1) (2006) 59–80.
[32] A. Mueller, J. Behrens, F. X. Giraldo, V. Wirth, An adaptive discontinuous Galerkin method for modeling atmospheric convection, J. Comp. Phys. 235 (1) (2012) 371–393.
[33] S. Brdar, M. Baldauf, A. Dedner, R. Klöfkorn, Comparison of dynamical cores for NWP models: comparison of COSMO and DUNE, Theor. Comp. Fl. Dyn. (2012) 1–20.
[34] C. Eskilsson, An hp-adaptive discontinuous Galerkin method for shallow water flows, Int. J. Numer. Meth. Fl. 67 (11) (2011) 1605–1623.
[35] S. Blaise, A. St-Cyr, A dynamic hp-adaptive discontinuous Galerkin method for shallow-water flows on the sphere with application to a global tsunami simulation, Mon. Wea. Rev. 140 (3) (2012) 978–996.
[36] F. Giraldo, M. Restelli, A study of spectral element and discontinuous Galerkin methods for the Navier-Stokes equations in non-hydrostatic mesoscale atmospheric modeling: equation sets and test cases, J. Comp. Phys. 227 (1) (2008) 3849–3877.
[37] C. Burstedde, O. Ghattas, M. Gurnis, T. Isaac, G. Stadler, T. Warburton, L. Wilcox, Extreme-scale AMR, Proc. 2010 ACM/IEEE Int. Conference for High Performance Computing, Networking, Storage and Analysis (1) (2010) 1–12.
[38] H. Sundar, R. S. Sampath, G. Biros, Bottom-up construction and 2:1 balance refinement of linear octrees in parallel, SIAM J. Sci. Comp. 30 (5) (2008) 2675–2708. doi:10.1137/070681727. URL http://epubs.siam.org/doi/abs/10.1137/070681727
[39] J. Straka, R. B. Wilhelmson, J. R. Anderson, K. K. Droegemeier, Numerical solutions of a non-linear density current: a benchmark solution and comparisons, Int. J. Numer. Meth. Fl. 17 (1993) 1–22.
[40] F. Giraldo, J. Kelly, E. Constantinescu, Implicit-explicit formulations for a 3D nonhydrostatic unified model of the atmosphere (NUMA), SIAM J. Sci. Comp. (1).
[41] L. E. Carr, C. F. Borges, F. X. Giraldo, An element-based spectrally-optimized approximate inverse preconditioner for the Euler equations, SIAM J. Sci. Comp. 34 (2012) B392–B420.
[42] S. Ruuth, Global optimization of explicit strong-stability-preserving Runge-Kutta methods, Math. Comp. 75 (253) (2006) 183–207.
Appendix A
1D projection
The integral projection method presented in this section was proposed by [29] for general h-p non-conforming elements. Here we describe the method applied to a specific h-non-conforming edge in 2:1 balance.
Let $\xi \in [-1, 1]$ denote the coordinate in the standard element space corresponding to the parent (left) side of the edge. Define $z^{(1)}, z^{(2)} \in [-1, 1]$ as the coordinates of the standard elements corresponding to the two children (right) sides of the edge. Let
\[
z^{(1)} = \frac{\xi - o^{(1)}}{s}, \qquad z^{(2)} = \frac{\xi - o^{(2)}}{s} \tag{21}
\]
be a map $\xi \rightarrow z^{(k)}$, $k = 1, 2$ from the parent space to the children spaces, where $o^{(k)}$ is the offset parameter for child $k$ and $s$ is the scale parameter. In our case $o^{(1)} = -0.5$, $o^{(2)} = 0.5$ and $s = 0.5$. The inverse mapping $z^{(k)} \rightarrow \xi$ is
\[
\xi = s \, z^{(k)} + o^{(k)}, \qquad k = 1, 2. \tag{22}
\]
We can now expand the variables using a polynomial basis as follows:
\[
q^{L}(\xi) = \sum_{j=0}^{N} q_j^{L} \psi_j(\xi), \tag{23}
\]
\[
q^{L_k}(z^{(k)}) = \sum_{j=0}^{N} q_j^{L_k} \psi_j(z^{(k)}), \qquad k = 1, 2. \tag{24}
\]
By substituting (22) into (23) we get
\[
q^{L}(z^{(k)}) = \sum_{j=0}^{N} q_j^{L} \psi_j(s \, z^{(k)} + o^{(k)}), \qquad k = 1, 2. \tag{25}
\]
In order to perform the projection from a parent side to the two children sides of the edge we require that for both children sides
\[
\int_{-1}^{1} \left( q^{L_k}(z^{(k)}) - q^{L}(z^{(k)}) \right) \psi_i(z^{(k)}) \, dz^{(k)} = 0, \qquad k = 1, 2. \tag{26}
\]
Substitution of (24) and (25) into (26) and rearranging yields
\[
\sum_{j=0}^{N} \left( \int_{-1}^{1} \psi_j(z^{(k)}) \psi_i(z^{(k)}) \, dz^{(k)} \right) q_j^{L_k}
- \sum_{j=0}^{N} \left( \int_{-1}^{1} \psi_j(s \, z^{(k)} + o^{(k)}) \psi_i(z^{(k)}) \, dz^{(k)} \right) q_j^{L} = 0. \tag{27}
\]
Since $z^{(k)} \in [-1, 1]$ regardless of $k$, we can write $z = z^{(k)}$, which simplifies the notation. The terms in brackets can be represented in matrix form as
\[
M_{ij} = \int_{-1}^{1} \psi_i(z) \psi_j(z) \, dz, \qquad
S^{(k)}_{ij} = \int_{-1}^{1} \psi_i(z) \psi_j(s \, z + o^{(k)}) \, dz, \qquad k = 1, 2, \tag{28}
\]
which simplifies equation (27) to
\[
M_{ij} q_j^{L_k} - S^{(k)}_{ij} q_j^{L} = 0, \qquad k = 1, 2. \tag{29}
\]
Note that $M_{ij}$ is the standard 1D mass matrix, which is easily invertible. If $PS^{k}_{ij} = M^{-1}_{il} S^{(k)}_{lj}$, then
\[
q_i^{L_k} = PS^{k}_{ij} q_j^{L}, \qquad k = 1, 2, \tag{30}
\]
and we call $PS^{k}$ the scatter projection matrix.
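To make the construction concrete, here is a minimal numerical sketch (ours, not the NUMA code) that builds $M$, $S^{(1)}$, $S^{(2)}$ and the scatter matrices $PS^{k}$ using a Legendre modal basis and Gauss-Legendre quadrature; any sufficiently accurate quadrature and any polynomial basis $\psi_j$ would do:

```python
import numpy as np
from numpy.polynomial.legendre import leggauss, Legendre

N = 5                    # polynomial order
s = 0.5                  # scale parameter
offsets = (-0.5, 0.5)    # o^(1), o^(2)

# Gauss-Legendre quadrature with N+1 points: exact for degree <= 2N+1,
# enough for the degree-2N integrands in Eq. (28)
z, w = leggauss(N + 1)

def psi(j, x):
    """Modal basis: the j-th Legendre polynomial."""
    return Legendre.basis(j)(x)

# mass matrix M_ij and scatter integrals S^(k)_ij, Eq. (28)
M = np.array([[np.sum(w * psi(i, z) * psi(j, z))
               for j in range(N + 1)] for i in range(N + 1)])
S = [np.array([[np.sum(w * psi(i, z) * psi(j, s * z + o))
                for j in range(N + 1)] for i in range(N + 1)])
     for o in offsets]

# scatter projection matrices PS^k = M^{-1} S^(k), Eq. (30)
PS = [np.linalg.solve(M, Sk) for Sk in S]
```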
Similarly, the gather projection from two children sides to a parent side is performed (see Fig. 5b). We require that on the parent side of the face
\[
\int_{-1}^{1} \left( q^{R}(\xi) - \tilde{q}^{R}(\xi) \right) \psi_i(\xi) \, d\xi = 0, \tag{31}
\]
where $q^{R}$ is the continuous projection of the variables $q^{R_1}$ and $q^{R_2}$ from the children sides to the parent side, and
\[
\tilde{q}^{R}(\xi) =
\begin{cases}
q^{R_1}(z^{(1)}) = q^{R_1}\!\left(\dfrac{\xi - o^{(1)}}{s}\right) & \text{for } -1 \le \xi \le 0^{-}, \\[2ex]
q^{R_2}(z^{(2)}) = q^{R_2}\!\left(\dfrac{\xi - o^{(2)}}{s}\right) & \text{for } 0^{+} \le \xi \le 1.
\end{cases} \tag{32}
\]
Note that $\tilde{q}^{R}(\xi)$ allows for a discontinuity at $\xi = 0$. Substituting (32) into (31) yields
\[
\int_{-1}^{0} \left( q^{R}(\xi) - q^{R_1}\!\left(\frac{\xi - o^{(1)}}{s}\right) \right) \psi_i(\xi) \, d\xi
+ \int_{0}^{1} \left( q^{R}(\xi) - q^{R_2}\!\left(\frac{\xi - o^{(2)}}{s}\right) \right) \psi_i(\xi) \, d\xi = 0.
\]
Using an expansion analogous to (24), (25) and rearranging we get
\[
\left( \int_{-1}^{1} \psi_i(\xi) \psi_j(\xi) \, d\xi \right) q_j^{R}
- \left( \int_{-1}^{0} \psi_i(\xi) \psi_j\!\left(\frac{\xi - o^{(1)}}{s}\right) d\xi \right) q_j^{R_1}
- \left( \int_{0}^{1} \psi_i(\xi) \psi_j\!\left(\frac{\xi - o^{(2)}}{s}\right) d\xi \right) q_j^{R_2} = 0.
\]
We introduce the variable change $\xi = s \, z + o^{(k)}$, $d\xi = s \, dz$ in the second and third integrals, which gives
\[
\left( \int_{-1}^{1} \psi_i(\xi) \psi_j(\xi) \, d\xi \right) q_j^{R}
- s \sum_{k=1}^{2} \left( \int_{-1}^{1} \psi_i(s \, z + o^{(k)}) \psi_j(z) \, dz \right) q_j^{R_k} = 0. \tag{33}
\]
Note that the term in brackets to the left of $q_j^{R_k}$ is the transpose of $S^{(k)}_{ij}$ defined in (28). We can write the integrals in matrix notation:
\[
M_{ij} q_j^{R} - s \sum_{k=1}^{2} \left( S^{(k)} \right)^{T}_{ij} q_j^{R_k} = 0, \tag{34}
\]
where $M_{ij}$ is the mass matrix as defined in (28). Finally, if we define $PG^{k}_{ij} = s \, M^{-1}_{il} \left( S^{(k)} \right)^{T}_{lj}$ to be the gather projection matrix, then
\[
q_i^{R} = \sum_{k=1}^{2} PG^{k}_{ij} q_j^{R_k}. \tag{35}
\]
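Continuing the sketch above (reusing M, S, s, and N from it), the gather matrices and a round-trip consistency check follow directly; for polynomial data, scattering to the children and gathering back should reproduce the parent coefficients to roundoff, since $\sum_k PG^{k} PS^{k} = I$ on the space of degree-$N$ polynomials:

```python
# gather projection matrices PG^k = s * M^{-1} (S^(k))^T, Eq. (35)
PG = [s * np.linalg.solve(M, Sk.T) for Sk in S]

# round-trip check: scatter parent coefficients to the children and
# gather them back; for polynomial data this recovers the parent exactly
qL = np.random.default_rng(0).standard_normal(N + 1)   # parent coefficients
children = [PSk @ qL for PSk in PS]                     # Eq. (30)
qR = sum(PGk @ qk for PGk, qk in zip(PG, children))     # Eq. (35)
print(np.max(np.abs(qR - qL)))                          # ~1e-15 (roundoff)
```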
2D projection
The 2D projection presented here is a two-dimensional extension of the integral projection method presented in the previous section.

Figure 6 presents the standard element corresponding to the parent element with coordinates $(\xi, \eta) \in [-1, 1]^2$ and four children, each with separate coordinates $(z_1^{(k)}, z_2^{(k)}) \in [-1, 1]^2$, where $k = 1, \ldots, 4$. We define a map:
\[
z_1^{(k)} = \frac{\xi - o_1^{(k)}}{s}, \qquad z_2^{(k)} = \frac{\eta - o_2^{(k)}}{s}, \tag{36}
\]
where $o_1^{(k)}$ and $o_2^{(k)}$ are offset parameters corresponding to each child element $k$ and coordinates $z_1$ and $z_2$. The inverse mapping is now
\[
\xi = s \, z_1^{(k)} + o_1^{(k)}, \tag{37}
\]
\[
\eta = s \, z_2^{(k)} + o_2^{(k)}, \tag{38}
\]
where $k = 1, \ldots, 4$, the scale parameter $s = 0.5$ and the offsets $o_i^{(k)} = \pm 0.5$ depending on the direction $i$ and element number $k$.
Each element has a polynomial basis in which we expand the projected variable,
\[
q^{P}(\xi, \eta) = \sum_{j=1}^{M_N} q_j^{P} \psi_j(\xi, \eta), \tag{39}
\]
\[
q^{C_k}(z_1^{(k)}, z_2^{(k)}) = \sum_{j=1}^{M_N} q_j^{C_k} \psi_j(z_1^{(k)}, z_2^{(k)}), \tag{40}
\]
where $q^{P}$ is the parent element variable, $q^{C_k}$ is the $k$-th child element variable projected from $q^{P}$, and $M_N$ is the number of nodal points in the element. Similarly as in Equation (25), we can substitute the inverse map (37), (38) into (39) and represent the parent variable in terms of the children coordinate systems:
\[
q^{P}(\xi, \eta) = q^{P}(z_1^{(k)}, z_2^{(k)}) = \sum_{j=1}^{M_N} q_j^{P} \psi_j(s \, z_1^{(k)} + o_1^{(k)}, \, s \, z_2^{(k)} + o_2^{(k)}), \qquad k = 1, \ldots, 4. \tag{41}
\]
For each $k = 1, \ldots, 4$ we require that
\[
\int_{-1}^{1}\!\int_{-1}^{1} \left( q^{C_k}(z_1^{(k)}, z_2^{(k)}) - q^{P}(z_1^{(k)}, z_2^{(k)}) \right) \psi_i(z_1^{(k)}, z_2^{(k)}) \, dz_1^{(k)} dz_2^{(k)} = 0. \tag{42}
\]
Substituting the expansions (40), (41), rearranging and employing matrix notation yields
\[
M_{ij} q_j^{C_k} - S^{(k)}_{ij} q_j^{P} = 0, \tag{43}
\]
where
\[
M_{ij} = \int_{-1}^{1}\!\int_{-1}^{1} \psi_j(z_1, z_2) \psi_i(z_1, z_2) \, dz_1 dz_2, \tag{44}
\]
\[
S^{(k)}_{ij} = \int_{-1}^{1}\!\int_{-1}^{1} \psi_j(s \, z_1 + o_1^{(k)}, \, s \, z_2 + o_2^{(k)}) \psi_i(z_1, z_2) \, dz_1 dz_2. \tag{45}
\]
The projection matrix is once again constructed by inverting the mass matrix and left-multiplying Eq. (43) by the inverse. This yields
\[
q_i^{C_k} = \left( PS^{k}_{2D} \right)_{ij} q_j^{P}, \qquad k = 1, \ldots, 4, \tag{46}
\]
where
\[
PS^{k}_{2D} = M^{-1} S^{(k)}. \tag{47}
\]
Similarly as in the case of the gather projection for the 1D non-conforming edge, the 2D gather projection matrix is constructed by multiplying the inverse of the mass matrix with the transpose of $S^{(k)}$:
\[
PG^{k}_{2D} = s^2 \, M^{-1} \left( S^{(k)} \right)^{T}, \tag{48}
\]
which yields
\[
q_i^{P} = \sum_{k=1}^{4} \left( PG^{k}_{2D} \right)_{ij} q_j^{C_k}. \tag{49}
\]
This approach is easily extendable to 3D projections of hexahedral elements, since the same tensor-product operations are applied in 1D, 2D, or 3D.
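For a tensor-product basis, $\psi_j(\xi, \eta) = \psi_{j_1}(\xi)\,\psi_{j_2}(\eta)$, the 2D matrices in (44), (45) factor into Kronecker products of the 1D matrices built earlier. The sketch below is our illustration of that construction; the pairing of child $k$ with the pair of 1D offset indices $(k_1, k_2)$ is an assumed convention that must match the mesh's child numbering:

```python
# 2D projection matrices as Kronecker products of the 1D ones, valid
# for a tensor-product basis psi_j(xi, eta) = psi_j1(xi) * psi_j2(eta).
# Reuses M, S, and s from the 1D sketch above.
M2D = np.kron(M, M)
S2D = {(k1, k2): np.kron(S[k1], S[k2])
       for k1 in range(2) for k2 in range(2)}

PS2D = {key: np.linalg.solve(M2D, Sk)                # Eq. (47)
        for key, Sk in S2D.items()}
PG2D = {key: s**2 * np.linalg.solve(M2D, Sk.T)       # Eq. (48), 2D Jacobian s^2
        for key, Sk in S2D.items()}
```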