Topics in mesh-based modelling and computer graphics

Erik Christopher Dyken
ACKNOWLEDGEMENTS
My work has been funded by the project “Mathematical Methods in Mesh-based Geometric Modelling”, which is part of the BeMatA program of the Norwegian Research
Council. I thank the Department of Informatics and the Centre of Mathematics for
Applications for hosting me and providing a stimulating environment. I also want to
use this opportunity to thank some of the people who have helped me with this thesis,
with contributions ranging from general inspiration to the trade of mathematical writing.
First of all, I thank Professor Michael S. Floater, my primary supervisor, for being
a most valued mentor who taught me the craft of scientific research. I also thank Professor Knut Mørken, my secondary supervisor, for all his support and encouragement.
Furthermore, I thank all my co-authors: Morten Dæhlen, Michael S. Floater, Martin
Reimers, Hans-Peter Seidel, Johan S. Seland, Thomas Sevaldrud, Christian Theobalt,
and Gernot Ziegler.
I thank Martin Reimers for promptly inviting me to joint research when I was a
fresh M.Sc.-student, which eventually resulted in my first published paper, Paper IV in
this thesis, as well as several subsequent papers. Also, big thanks to Johan S. Seland for
encouraging me to look into general processing on graphics processing units as well as
being an inspiring co-author. Furthermore, I thank Gernot Ziegler, always full of ideas
and a great source of inspiration, for our co-operation and for introducing me to the
HistoPyramid. Thanks go also to Knut Waagan for answering all my stupid questions
related to real analysis with a smile.
Furthermore, I particularly thank my family for all their support and encouragement, and,
last but not least, the greatest thanks go to my girlfriend Benedicte Haraldstad Frostad for
all her love, help, support, and encouragement.
April 15, 2008
Christopher Dyken
TABLE OF CONTENTS

Acknowledgements

Overview and Introduction
1  Parameterizing triangles and polygons
   1.1  Barycentric coordinates
   1.2  Mean value coordinates
   1.3  Transfinite mean value interpolation
2  Triangulations
   2.1  Voronoi diagrams
   2.2  Delaunay triangulations
   2.3  Constrained Delaunay triangulations
3  Triangle meshes
   3.1  Rendering and shading
   3.2  Triangular Bézier patches
   3.3  GPU-based tessellation of triangular patches
4  Iso-surfaces of scalar fields as triangle meshes
   4.1  The Marching Cubes algorithm
   4.2  Marching Cubes as a stream compaction-expansion process
5  Conclusion

Paper I: C. Dyken and M. S. Floater, Transfinite mean value interpolation
1  Introduction
2  Lagrange interpolation
   2.1  Interpolation on convex domains
   2.2  Interpolation on convex polygons
   2.3  The boundary integral formula
   2.4  Non-convex domains
   2.5  Bounds on φ
   2.6  Proof of interpolation
3  Differentiation
4  Hermite interpolation
5  A minimum principle
6  Domains with holes
7  Applications
   7.1  Smooth mappings
   7.2  A weight function for web-splines

Paper II: C. Dyken and M. S. Floater, Preferred directions for resolving the non-uniqueness of Delaunay triangulations
1  Introduction
2  Triangulating quadrilaterals
3  Triangulating convex polygons
4  Delaunay triangulations
5  Numerical implementation

Paper III: M. Dæhlen, T. Sevaldrud, and C. Dyken, Simultaneous curve simplification
1  Introduction
2  Refinement and decimation
3  Problem statement
4  Triangulation
5  Curve set decimation
   5.1  Nodes and vertices
   5.2  Removable vertices
   5.3  Weight calculation
6  Numerical results
7  Concluding remarks

Paper IV: C. Dyken and M. Reimers, Real-time linear silhouette enhancement
1  Introduction
2  Silhouettes
3  View dependent geometry
4  Implementational issues
5  Final remarks

Paper V: C. Dyken, M. Reimers, and J. Seland, Real-time GPU silhouette refinement using adaptively blended Bézier patches
1  Introduction
2  Previous and related work
3  Silhouettes of triangle meshes
4  Curved geometry
5  Adaptive tessellation
6  Implementation
   6.1  Silhouetteness calculation
   6.2  Histogram pyramid construction and extraction
   6.3  Rendering unrefined triangles
   6.4  Rendering refined triangles
   6.5  Normal and displacement mapping
7  Performance analysis
   7.1  Silhouette extraction on the GPU
   7.2  Benchmarks of the complete algorithm
8  Conclusion and future work

Paper VI: C. Dyken, M. Reimers, and J. Seland, Semi-uniform adaptive patch tessellation
1  Introduction
2  Related work
3  Semi-uniform adaptive patch tessellation
4  Implementation
   4.1  Building the render queues
   4.2  Rendering
   4.3  Optimizations
5  Performance analysis
6  Conclusion
7  Acknowledgments

Paper VII: C. Dyken, G. Ziegler, C. Theobalt, and H.-P. Seidel, High-speed marching cubes using Histogram Pyramids
1  Introduction
2  Previous and related work
3  HistoPyramids
   3.1  Construction
   3.2  Traversal
   3.3  Comments
4  Marching Cubes
   4.1  Mapping MC to stream and HistoPyramid processing
   4.2  Implementation details
   4.3  CUDA implementation
5  Performance analysis
6  Conclusion and future work

References
OVERVIEW AND INTRODUCTION
In virtually all of the natural sciences, complex geometries must be described mathematically. A prerequisite of numerical simulations is that the real-world problem to be solved is posed in terms manageable by a computer. Such a set of terms is provided by mesh-based geometric modelling, a conceptually simple yet powerful paradigm of geometric modelling.
The concept of mesh-based modelling is to use a large number of small, simple,
and mathematically well-defined surface pieces tied together in a mesh. As a whole,
this mesh of small, simple pieces can model large and complex geometries. Usually,
each surface piece is represented by a polygon, often a triangle or a quadrilateral, and
if these polygons fit together, the corresponding surface pieces fit together as well.
In addition, one can usually control how smooth the transition between adjacent
surface pieces is.
Mesh-based modelling is a vast field, as this thesis illustrates. All the
papers relate to mesh-based modelling, with topics ranging from transfinite interpolation to efficient rendering techniques on Graphics Processing Units (GPUs). By
introducing the topics of the papers, we hope to set the papers, to some extent, into a
common context.
1 PARAMETERIZING TRIANGLES AND POLYGONS
A geometric shape, like a line, a triangle, or a polygon, is a set of points. We can define
this set either implicitly or explicitly. The implicit form describes the shape as the set
of points that satisfies a set of conditions. For example, an iso-surface of a scalar field,
which we shall discuss in Section 4, is defined in that way. On the other hand, an
explicit form is a map that associates values from a parameter domain to points on the
geometric object.
A parameterization µ of the object S over the object D, that is µ : D → S,
“wraps” S over D such that we can specify positions on S in terms of positions on D.
For example, S can be a curve in some space while D is a subset of the real line.
1.1 BARYCENTRIC COORDINATES
Barycentric coordinates provide a simple and elegant method of parameterizing line
segments and polygons using sequences of non-negative weights that sum to one. Such
a sequence is known as a partition of unity, and we let Bn,

    Bn = { λ = (λ1, . . . , λn) ∈ Rn : λ1, . . . , λn ≥ 0, λ1 + · · · + λn = 1 },
denote the set of all such sequences of size n. Let p1, . . . , pn ∈ Rk be points in
k-dimensional Euclidean space, and let [·] denote the convex hull; that is, [p1] denotes
a point, [p1, p2] denotes a line segment, [p1, p2, p3] denotes a triangle, and so on.
The line segment L = [p1, p2] can be parameterized by µL : B2 → L, which is
the convex combination

    µL(λ) = λ1 p1 + λ2 p2,        (1.1)

where (λ1, λ2) are the barycentric coordinates with respect to L. For any λ ∈ B2,
a corresponding point x ∈ L is given by µL. Deducing the inverse map of (1.1) is
straightforward: the point x bisects L into two line segments, so µL⁻¹ : L → B2,

    µL⁻¹(x) = ( ‖p2 − x‖ / ‖p2 − p1‖ , ‖x − p1‖ / ‖p2 − p1‖ ),        (1.2)

is given by the lengths of the segments relative to the length of L.
With a triangle T = [p1, p2, p3], we can use the same approach if we replace the
use of lengths with the use of areas. Any point x in T can be specified as a convex
combination of the three corners, that is,

    µT(λ) = λ1 p1 + λ2 p2 + λ3 p3,        (1.3)

which yields the parameterization µT : B3 → T. The inverse mapping µT⁻¹ : T → B3
is given by the ratios of areas

    µT⁻¹(x) = ( area(x, p2, p3) / area(p1, p2, p3),  area(x, p1, p3) / area(p1, p2, p3),  area(x, p1, p2) / area(p1, p2, p3) ),        (1.4)

where area(a, b, c) is the area of the triangle [a, b, c].
Inverse barycentric mappings always exist and are unique for non-degenerate line
segments and triangles. Therefore, by combining the forward and inverse mappings we
can, for example, parameterize one triangle over another triangle. So, if S ⊂ R2 is a
triangle in the plane and T ⊂ R3 is a triangle in 3D space, we can define f : S → T
as the composition

    f(u) = µT ∘ µS⁻¹(u) = µT(µS⁻¹(u)),        (1.5)

which parameterizes T over S. The map f is in fact a linear interpolation, interpolating the corners of T over S.
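The maps (1.3)-(1.5) can be sketched in a few lines. The following Python sketch is our own illustration (the names `area`, `mu`, and `mu_inv` are not from the papers); it uses signed areas, substituting x for each corner in turn, so the three ratios in (1.4) come out positive exactly when x lies inside the triangle.

```python
# Barycentric maps for planar triangles: mu_inv computes the barycentric
# coordinates of x with respect to S via ratios of signed areas (1.4), and
# mu evaluates the convex combination (1.3) in a second triangle T.

def area(a, b, c):
    """Signed area of the planar triangle [a, b, c]."""
    return 0.5 * ((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1]))

def mu_inv(S, x):
    """Barycentric coordinates of x with respect to triangle S, cf. (1.4)."""
    p1, p2, p3 = S
    total = area(p1, p2, p3)
    return (area(x, p2, p3) / total,
            area(p1, x, p3) / total,
            area(p1, p2, x) / total)

def mu(T, lam):
    """Convex combination of the corners of T, cf. (1.3)."""
    q1, q2, q3 = T
    return tuple(lam[0] * u + lam[1] * v + lam[2] * w
                 for u, v, w in zip(q1, q2, q3))

def f(S, T, u):
    """Parameterize T over S by composing the two maps, cf. (1.5)."""
    return mu(T, mu_inv(S, u))
```

For example, mapping a point of a planar triangle S to the corresponding point of a triangle T in 3D space amounts to `f(S, T, u)`.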
1.2 MEAN VALUE COORDINATES
The generalization of the convex combination-based parameterization to arbitrary
n-gons P is straightforward, defining µn : Bn → P as

    µn(λ) = λ1 p1 + · · · + λn pn.        (1.6)
Figure 1.1: Circle for the Mean Value Theorem. Left: the polygonal case. Right: the arbitrary curve case.
Any partition of unity will give a point in the convex hull of p1, . . . , pn. However,
the tricky part is the inverse map. If we let P = [(0, 0), (1, 0), (1, 1), (0, 1)] be the
unit quadrilateral, we see that the two partitions of unity (1/2, 0, 1/2, 0) and (0, 1/2, 0, 1/2)
form convex combinations for the same point (1/2, 1/2). So we need a strategy that, for a
given point position, consistently chooses one of the possible partitions of unity. In
addition, the weights should depend smoothly on p1, . . . , pn.
The mean value coordinates [27] provide such a method of choosing a partition of
unity. Recall that a function u is harmonic if it satisfies the Laplace equation, ∆u = 0.
One property of harmonic functions is that they satisfy the Mean Value Theorem: for
any point x and any tiny circle Γ of radius r centered at x, we have

    u(x) = (1/(2πr)) ∫_Γ u(c) dc.        (1.7)

That is, for a harmonic function, the function value at x equals the average of the
function values along a tiny circle centered at x. The mean value coordinates define
µn⁻¹ by requiring that f satisfies the Mean Value Theorem, and in this sense f
approximates a harmonic function.
Thus, for a point x in a polygon P, we create a closed triangle fan with common
apex x, where the base of each triangle corresponds to an edge of P, see Figure 1.1,
left. We let F be the piecewise linear function over that triangle fan interpolating
f(p1), . . . , f(pn), and f(x). We enforce the Mean Value Theorem on F by inserting
F into (1.7), which lets us determine f(x). It can be shown [27] that µn⁻¹ : P → Bn
is

    µn⁻¹(x) = (1 / (w1 + · · · + wn)) (w1, . . . , wn),   wi = (tan(αi−1/2) + tan(αi/2)) / ‖pi − x‖,        (1.8)

where αi is the angle at x in the triangle [x, pi, pi+1].
The mean value coordinates have several nice properties; in particular, they provide
convex weights for all star-shaped polygons. Also, Floater and Hormann [28] show that
the mean value interpolation function can handle interpolation over arbitrary polygons
in a natural way.
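The weights (1.8) are straightforward to compute. The Python sketch below is our own illustration, not code from [27] or the papers; it assumes x lies strictly inside a polygon given in counter-clockwise order.

```python
import math

def mean_value_coords(poly, x):
    """Mean value coordinates (1.8) of a point x strictly inside the
    polygon poly, whose vertices are listed counter-clockwise."""
    n = len(poly)

    def alpha(i):
        # signed angle at x in the triangle [x, p_i, p_{i+1}]
        ax, ay = poly[i % n][0] - x[0], poly[i % n][1] - x[1]
        bx, by = poly[(i + 1) % n][0] - x[0], poly[(i + 1) % n][1] - x[1]
        return math.atan2(ax * by - ay * bx, ax * bx + ay * by)

    w = []
    for i in range(n):
        r = math.hypot(poly[i][0] - x[0], poly[i][1] - x[1])
        w.append((math.tan(alpha(i - 1) / 2) + math.tan(alpha(i) / 2)) / r)
    s = sum(w)
    return [wi / s for wi in w]
```

Since mean value coordinates are barycentric, feeding the resulting weights back into (1.6) reproduces x, and linear functions are reproduced exactly.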
1.3 TRANSFINITE MEAN VALUE INTERPOLATION
Similar to the line segment and triangle case, composing (1.6) and (1.8) yields a function that interpolates the corners of one polygon over another polygon. If we extend this
to arbitrary continuous functions over arbitrary curves in the plane we get transfinite
interpolation, where we do not interpolate at distinct points but over curves.
The basic form of transfinite interpolation is Lagrange transfinite interpolation:
given a set Ω ⊂ R2 in the plane, convex or non-convex and possibly with holes, and a
function f : ∂Ω → R, find a function g : Ω → R that interpolates f on ∂Ω.
We proceed as in Section 1.2, letting L(x, θ) be the semi-infinite line originating at
x in the direction θ, and p(x, θ) be the intersection of L(x, θ) and ∂Ω, see Figure 1.1,
right. Analogously to the polygonal case, we begin by enforcing the "radially linear"
function F,

    F(x + r(cos θ, sin θ)) = ((‖p(x, θ) − x‖ − r) / ‖p(x, θ) − x‖) g(x) + (r / ‖p(x, θ) − x‖) f(p(x, θ)),

to possess the Mean Value property by inserting the expression into (1.7). The unique
solution of that expression is given by

    g(x) = (1/φ(x)) ∫_0^2π f(p(x, θ)) / ‖p(x, θ) − x‖ dθ,   where   φ(x) = ∫_0^2π 1 / ‖p(x, θ) − x‖ dθ.
Ju, Schaefer and Warren [48] noticed that if a parametric representation c(t),
t ∈ [a, b], of ∂Ω is available, we can convert the integrals to integrals over the
parameter of c, that is,

    g(x) = (1/φ(x)) ∫_a^b ((c(t) − x) × c′(t)) / ‖c(t) − x‖³ f(c(t)) dt,   φ(x) = ∫_a^b ((c(t) − x) × c′(t)) / ‖c(t) − x‖³ dt.
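For a concrete boundary, the integrals for g and φ can be approximated by quadrature. The sketch below is our own illustration, not the papers' implementation: it takes Ω to be the unit disc, where the intersection p(x, θ) has a closed form, and discretizes θ with the midpoint rule. Linear boundary functions are then reproduced to machine precision, reflecting the linear precision of mean value interpolation.

```python
import math

def mv_interpolate_disc(f, x, samples=2000):
    """Transfinite mean value interpolant g(x) on the unit disc, with the
    boundary integrals for g and phi discretized over theta."""
    num = den = 0.0
    for k in range(samples):
        theta = 2.0 * math.pi * (k + 0.5) / samples
        d = (math.cos(theta), math.sin(theta))
        # intersection p(x, theta) of the ray from x in direction d
        # with the unit circle: solve ||x + t d|| = 1 for t > 0
        xd = x[0] * d[0] + x[1] * d[1]
        t = -xd + math.sqrt(xd * xd + 1.0 - (x[0] ** 2 + x[1] ** 2))
        p = (x[0] + t * d[0], x[1] + t * d[1])
        num += f(p) / t   # t equals ||p(x, theta) - x|| since d is a unit vector
        den += 1.0 / t
    return num / den
```

A general boundary curve would replace the ray-circle intersection with a ray-curve intersection, or use the parametric form of [48] directly.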
In Paper I, we show that the Lagrange mean value interpolant does, in fact, interpolate.
We also present a variant of Hermite transfinite interpolation that interpolates the
normal derivative: given Ω ⊂ R2, a convex or non-convex set in the plane, possibly
with holes, and a function f : ∂Ω → R, find g : Ω → R that interpolates f while
∂g/∂n matches ∂f/∂n on ∂Ω. The Hermite expression enables us to create parameterizations where we have better control over the shape of the parameter domain along
the boundary.
We also prove several properties of the mean value weight function φ: it is always
positive in the interior of Ω, it is bounded by the distance function, it has a constant
normal derivative, and it has no local minima on Ω. The weight function can be used in
applications where a distance-like function is needed, for example as a weight function
for the WEB-spline method [41].
2 TRIANGULATIONS
The triangle is a particularly attractive geometric primitive since there is a rigid relationship between its interior angles and the lengths of its sides. In addition, a triangle
is always well-defined as long as its three corners are not collinear. We call a set of
triangles a triangulation. Furthermore, a valid triangulation is a triangulation where
the intersection of two triangles is (i) empty, (ii) a common corner, or (iii) a common
edge. Given a set of measured data points, a valid triangulation can be used to span
the void between the points, both forming a surface from the discrete set of points and
introducing a spatial relationship between them.
Given a set of points, there are numerous ways of connecting them with triangles.
In general, long and thin triangles tend to be numerically less robust than more
equilateral ones, and that gives us a criterion for preferring one triangulation over
another. The Delaunay triangulation [16] is a triangulation that maximizes the
minimum interior angle over all triangles in the triangulation. The smallest interior
angle of a triangle is at most 60 degrees, attained exactly when the triangle is
equilateral. In that sense, the Delaunay triangulation is often regarded as a good choice.
The Delaunay triangulation enjoys several interesting properties (see e.g. [70, 39]), and
we begin by investigating one of them, namely the intimate relationship
between the Delaunay triangulation and the Voronoi diagram.
2.1 VORONOI DIAGRAMS
Voronoi diagrams, also known as Dirichlet tessellations and Thiessen regions, were
discussed as early as 1908 by Voronoi and 1850 by Dirichlet [70]. The Voronoi
diagram is defined from a set of points in the plane, called Voronoi sites. Each site has
an associated region called a Voronoi tile: the set of points in the plane that are closer
to that site than to any other site. That is, given the Voronoi sites p1, . . . , pn, the
Voronoi tile V(pi) of a site pi is

    V(pi) = { x : ‖pi − x‖ ≤ ‖pj − x‖ for all j ≠ i }.

Notice that the inequality is not strict, so a point equally distant from several sites
belongs to several tiles. The Voronoi diagram is the collection of Voronoi tiles and is,
thus, a partition of the plane.

Figure 1.2: The relationship between the Voronoi diagram and the Delaunay triangulation. Left: the Voronoi diagram of 32 sites randomly positioned in [0, 1] × [0, 1] and the Voronoi diagram of 5 co-circular sites. Right: the corresponding Delaunay triangulations of the two sets of Voronoi sites. There exist several Delaunay triangulations of the 5 co-circular sites; the red and green lines show two possibilities.

If the tiles of two Voronoi sites intersect, the sites are said to be Voronoi neighbours.
If the intersection is a point, they are weak neighbours, and if the intersection is a line
segment, they are strong neighbours.
The Voronoi diagrams of two sets of Voronoi sites are shown on the left of Figure 1.2. Since a Voronoi tile is the set of points closest to a site, it is a useful structure
in applications that query for the site closest to a given point, for example, the nearest
hospital or wireless base station. Another application is to partition responsibility
geographically, for example, between fire observation towers.
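By definition, asking which Voronoi tile contains a point is just a nearest-site query, which the following brute-force Python sketch (our illustration, not from the thesis) makes explicit.

```python
def voronoi_tile_of(sites, x):
    """Index of the Voronoi site whose tile contains the query point x.

    Brute force over all sites; ties (points on tile boundaries) resolve
    to the lowest index, matching the non-strict inequality in V(p_i)."""
    def dist2(p):
        return (p[0] - x[0]) ** 2 + (p[1] - x[1]) ** 2
    return min(range(len(sites)), key=lambda i: dist2(sites[i]))
```

Real applications with many sites or many queries would use a spatial search structure instead of a linear scan, but the definition of the tile is exactly this minimization.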
2.2 DELAUNAY TRIANGULATIONS
If we connect all strong Voronoi neighbours with an edge, we get the Delaunay
pre-triangulation [86] of the Voronoi sites. If no set of four or more sites is co-circular,
the pre-triangulation is the unique Delaunay triangulation. For each set of four or more
co-circular sites, we get a corresponding polygon in the triangulation; any triangulation
of these polygons transforms the pre-triangulation into a valid Delaunay triangulation.
Figure 1.2 shows the Delaunay triangulations of two sets of Voronoi sites. The
boundary of a Delaunay triangulation is the convex hull of the sites.
In some applications, uniqueness is an issue. For example, the image compression technique of [17] is based on encoding the image as a triangle mesh, and if the
triangulation is uniquely given by the point positions, we can avoid storing the
triangulation altogether. The Delaunay triangulation is unique if there is no set of
four or more co-circular points. However, in this image compression application, such
sets of co-circular points are quite common. In Paper II we propose a rule that, given
two directions that are neither parallel nor orthogonal, uniquely picks one of the
possible Delaunay triangulations of any point set.
If two triangles [a, b, c] and [c, b, d] in a valid triangulation share an edge and form
a strictly convex quadrilateral, we can replace these two triangles with the two triangles
[a, d, c] and [b, d, a] and still have a valid triangulation. This operation is called edge-swapping, since it amounts to swapping the diagonal of the quadrilateral formed by the
two triangles. Only the interior of the quadrilateral is influenced by an edge-swap, so it
is a local operation. In fact, any valid triangulation of a set of points can be
obtained from any other valid triangulation of the same set of points by a sequence of
edge-swaps [57].
One technique for building a Delaunay triangulation is Lawson's local optimization
procedure [58]. The procedure starts with an arbitrary valid triangulation of the points
and iteratively swaps edges as dictated by the "Delaunay criterion", a rule that selects
one of the two possible triangulations of a strictly convex quadrilateral Q. When the
procedure terminates and there are no more diagonals to swap, the triangulation is a
Delaunay triangulation. Thus, for a set of points with no subset of four or more
co-circular points, this local optimization procedure always terminates at a global optimum.
Lawson stated three equivalent forms of his criterion: (i) the max-min angle criterion: choose the triangulation of Q that maximizes the minimum interior angle of the
two resulting triangles; (ii) the circle criterion: choose the triangulation of Q such
that the circumcircle of each triangle contains no other vertices; (iii) the Voronoi
criterion: choose the triangulation of Q such that the diagonal of Q connects
vertices that are strong Voronoi neighbours. The rule given in Paper II can easily be
integrated into Lawson's optimization procedure by augmenting the circle criterion of
Cline and Renka [14]. That rule can also be carried out using integer arithmetic.
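The circle criterion reduces to the classical incircle predicate, the sign of a 3×3 determinant. The Python sketch below is our own illustration of the test that drives Lawson's swaps; robust implementations evaluate the determinant exactly, in the spirit of the integer-arithmetic remark above.

```python
def in_circumcircle(a, b, c, d):
    """True if d lies strictly inside the circumcircle of the
    counter-clockwise triangle [a, b, c]: the standard incircle
    predicate, positive determinant means inside."""
    m = [[a[0] - d[0], a[1] - d[1], (a[0] - d[0]) ** 2 + (a[1] - d[1]) ** 2],
         [b[0] - d[0], b[1] - d[1], (b[0] - d[0]) ** 2 + (b[1] - d[1]) ** 2],
         [c[0] - d[0], c[1] - d[1], (c[0] - d[0]) ** 2 + (c[1] - d[1]) ** 2]]
    det = (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
         - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
         + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
    return det > 0

def should_swap(a, b, c, d):
    """Circle criterion for triangles [a, b, c] and [c, b, d] sharing the
    edge [b, c]: swap the diagonal when d violates the empty-circumcircle
    property of the counter-clockwise triangle [a, b, c]."""
    return in_circumcircle(a, b, c, d)
```

Iterating `should_swap` over all interior diagonals until no swap fires is exactly Lawson's procedure.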
2.3 CONSTRAINED DELAUNAY TRIANGULATIONS
The constrained Delaunay triangulation (see e.g. [39]) is a generalization of the Delaunay triangulation where we enforce the presence of a set of predefined edges E
in the triangulation. The constrained Delaunay triangulation can be defined using the
max-min angle criterion in the following way: if every edge whose two adjacent
triangles form a convex quadrilateral either is in E or would not be swapped by the
max-min angle criterion, the triangulation is the constrained Delaunay triangulation.
Alternatively, the constrained Delaunay triangulation can be defined using the following modified circle criterion: the circumcircle of a triangle contains no vertices
that are visible from any of the three vertices that span the triangle, where a vertex a
is said to be visible from b if the line segment [a, b] does not intersect any edge in E.
However, the generalization of the Voronoi criterion is not straightforward, see [39].
Figure 1.3: Approximating smooth objects with triangles. The same torso-shape was approximated with 1000 triangles (left) and 2000 triangles (right).
The constrained Delaunay triangulation enables us to enforce elements to align
along specific features. That can be used to create a triangulation with a predefined
boundary, or it can be used to encode relationships between such cartographic features
as roads, rivers, lakes, and mountain ridges. In Paper III we represent cartographic
datasets with constrained Delaunay triangulations, and that representation enables us
to simplify cartographic features in such a way that no intersections are removed and
no new intersections are introduced, that is, the topology of the cartographic features
is maintained.
3 TRIANGLE MESHES
A valid triangulation in 3D space is often called a triangle mesh. The representation
of geometry by triangle meshes is useful for several reasons. First of all, modern
graphics hardware is designed particularly for rapid rendering of triangles, and thus
most real-time rendering applications represent geometry as triangle meshes
before subjecting it to the graphics pipeline. In addition, some computational geometry
algorithms, for example ray-surface intersection and spatial decomposition, are
usually easier to implement and more robust for triangle meshes than for other
representations. Thus, triangle meshes are fairly common in computational geometry
applications and, in particular, in off-line rendering applications.
An intrinsic problem of triangle meshes is that triangles are inherently flat, so
representing a curved surface with a triangle mesh results in a rather faceted
shape. The approximation error is reduced by increasing the number of triangles,
as shown in Figure 1.3. However, increasing the number of triangles also increases
the amount of geometry to process, and the sheer number of triangles quickly
becomes a computational burden.
As triangle meshes can rarely be folded out in the plane (and quite often it is
impossible to fold them out), we need some structure that keeps track of which triangle
is next to which. Such a structure is called the connectivity. If the triangulation is
valid and the triangles are given in terms of unique vertices, the connectivity is implicitly
given. However, to speed up queries, most applications augment their triangle data
structure with explicit connectivity information.
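Building connectivity from an indexed triangle list amounts to hashing edges; a minimal Python sketch (the names are ours, and a production mesh library would use a richer structure such as half-edges):

```python
from collections import defaultdict

def build_connectivity(triangles):
    """Map each undirected edge of an indexed triangle mesh to the list of
    triangles containing it; in a valid mesh every interior edge gets
    exactly two triangles and every boundary edge exactly one."""
    edge_to_tris = defaultdict(list)
    for t, (i, j, k) in enumerate(triangles):
        for e in ((i, j), (j, k), (k, i)):
            edge_to_tris[tuple(sorted(e))].append(t)
    return edge_to_tris

def neighbours(triangles, edge_to_tris, t):
    """Indices of the triangles sharing an edge with triangle t."""
    i, j, k = triangles[t]
    out = set()
    for e in ((i, j), (j, k), (k, i)):
        out.update(edge_to_tris[tuple(sorted(e))])
    out.discard(t)
    return out
```

With this table, edge-based queries such as "which triangle is across this edge" become constant-time lookups.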
It is often worthwhile to use a smooth higher-order representation, for example
cubic Bézier patches, which we introduce in Section 3.2, or subdivision surfaces,
for the internal representation of the geometry. We can then convert from that representation to a set of triangles on the fly, a conversion process called tessellation.
If we have a budget of triangles, an advanced tessellation scheme can
spend the triangle budget where it is most needed, and we investigate such schemes in
Papers IV, V, and VI. We also consider how to use the information provided for
shading, as described in Section 3.1, to replace triangles with cubic Bézier patches.
3.1 RENDERING AND SHADING
As mentioned in the introduction, computer graphics applications use triangle meshes
extensively. The process of converting a triangle to a set of screen-space fragments
(pixels not yet written to the screen) is called rasterization. Each fragment created is
subjected to a shading model that determines the colour of the fragment. Since each
triangle usually produces a significant number of fragments, this process is very
compute-intensive. Therefore, rasterization and shading are usually handled by graphics
hardware, which is designed particularly for that task. The increase in the computational
power of graphics hardware in recent years has allowed increasingly elaborate shading
models to be used.
A shading technique usually involves a lighting model, which estimates
how light and matter interact. A particularly popular lighting model is Bui-Tuong
Phong's, which consists of three terms: the ambient term is a constant that models
light reflected off other surfaces; the diffuse term is Lambert's cosine law for perfectly
diffuse surfaces; and the specular term is based on the law of reflection for perfectly
reflective mirrors.

Figure 1.4: Various shading models for the same geometry. From left: wireframe rendering, Phong shading, normal mapping, and parallax occlusion mapping.

Letting n be the surface normal, l the direction to the light source, and v the
direction to the viewer, the surface colour c is

    c = ca + cd max(n · l, 0) + cs max(v · r, 0)^α,        (1.9)

where r = 2(l · n)n − l is l reflected about n, and the constants ca, cd, cs, and α model
the characteristics of the material.
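Evaluating (1.9) is direct; the Python sketch below assumes unit vectors and uses illustrative placeholder values for the material constants ca, cd, cs, and α.

```python
def phong(n, l, v, ca=0.1, cd=0.6, cs=0.3, alpha=32.0):
    """Evaluate the Phong lighting model (1.9) for unit vectors n, l, v.
    The default material constants are illustrative placeholders."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    ln = dot(l, n)
    # r = 2(l . n)n - l, the light direction reflected about the normal
    r = tuple(2.0 * ln * ni - li for ni, li in zip(n, l))
    return ca + cd * max(ln, 0.0) + cs * max(dot(v, r), 0.0) ** alpha
```

A fragment shader evaluates exactly this expression per fragment, typically once per light source, with the three colour channels carrying their own material constants.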
The shading model specifies how often the lighting model should be evaluated. The
direct approach of evaluating the lighting model once per triangle and using that colour
to fill the entire triangle is called “flat shading”. This approach was used in Figure 1.3,
where we clearly see the faceted nature of the geometry. Increasing the number of
triangles makes the surface appear smoother. However, we can "fake" a smoother
surface without increasing the number of triangles by using a more elaborate shading
model that decouples the actual geometric surface normal from the shading normal
used for lighting calculations.
The Phong shading model assumes that a shading normal is given at each corner
of a triangle. The shading normal inside the triangle is given by the normalized linear
interpolation of the three corner shading normals. The interpolated shading normal is
fed to the Phong lighting model which yields the surface colour. When the shading
normals of a triangle mesh agree at the triangle corners, a coarse geometry, as shown
left in Figure 1.4, appears quite smooth when shaded with the Phong shading model,
as shown in the second image from the left in Figure 1.4.
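The normalized linear interpolation of the corner shading normals can be sketched as follows (a hypothetical helper, not code from the thesis):

```python
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return tuple(x / n for x in v)

def phong_shading_normal(n0, n1, n2, u, v):
    """Shading normal at the point with barycentric coordinates (u, v, 1-u-v):
    the normalized linear interpolation of the corner shading normals."""
    w = 1.0 - u - v
    lerp = tuple(u * a + v * b + w * c for a, b, c in zip(n0, n1, n2))
    return normalize(lerp)

# At the barycentre of three orthogonal corner normals:
n = phong_shading_normal((1, 0, 0), (0, 1, 0), (0, 0, 1), 1 / 3, 1 / 3)
```

The interpolated normal is then fed to the lighting model at every pixel.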
Instead of interpolating the shading normal, we can fetch it from a texture map,
which is an image wrapped over the object. That approach is called “normal mapping” and allows the surface to appear to have small bumps and creases, as shown in the third image from the left in Figure 1.4. Since the geometry is not altered, the small features do not occlude other small features, an artefact that is particularly visible at grazing angles. However, the parallax occlusion mapping technique remedies that to some extent, as shown in the right of Figure 1.4.
A shortcoming of all shading techniques is that none actually displaces the geometry; they only create the appearance of a displacement. That is usually not a problem, except along the silhouette of the object, since the silhouette shape of an object is directly given by the geometry. This is quite visible in Figure 1.4.
3.2 TRIANGULAR BÉZIER PATCHES
A considerable number of triangles is usually needed to model a curved surface convincingly, and those triangles do not necessarily introduce any increased detail. Thus, it is often convenient to work on a coarser representation using some set of smooth higher-order patches, which are converted to a set of triangles when needed. One fairly common approach is the triangular cubic Bézier patch.
The triangular cubic Bézier patch is flexible enough to handle inflections, is easy
to evaluate and refine, and the number of degrees of freedom is manageably low. The
triangular cubic Bézier patch S is defined in terms of the barycentric coordinates λ ∈ B3,

S(λ) = Σ_{l+m+n=3, l,m,n≥0} b_lmn B³_lmn(λ),    (1.10)

where B³_lmn(u, v, w) = (3!/(l! m! n!)) u^l v^m w^n are the triangular cubic Bernstein-Bézier polynomials, see e.g. [26]. The points b_lmn are the ten control points that define the geometry of the patch, in the same way the three corners of a triangle define the geometry of a triangle. How the control points of two adjacent patches are positioned relative to each other determines the smoothness of the composite surface over the common boundary.
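For concreteness, a direct Python evaluation of (1.10) might look like this (a sketch, not thesis code; control points are indexed by the triple (l, m, n)):

```python
from math import factorial

def bernstein3(l, m, n, u, v, w):
    """Triangular cubic Bernstein-Bezier polynomial B^3_lmn of (1.10)."""
    return (factorial(3) // (factorial(l) * factorial(m) * factorial(n))) \
        * u ** l * v ** m * w ** n

def eval_patch(b, u, v, w):
    """Evaluate S(lambda) = sum_{l+m+n=3} b_lmn B^3_lmn(lambda), where b maps
    index triples (l, m, n) to 3D control points and (u, v, w) are the
    barycentric coordinates."""
    p = (0.0, 0.0, 0.0)
    for (l, m, n), blmn in b.items():
        c = bernstein3(l, m, n, u, v, w)
        p = tuple(pi + c * bi for pi, bi in zip(p, blmn))
    return p

# A "flat" patch: control points placed affinely reproduce the base triangle
# with corners P = (0,0,0) at l=3, Q = (1,0,0) at m=3 and R = (0,1,0) at n=3.
indices = [(l, m, 3 - l - m) for l in range(4) for m in range(4 - l)]
b = {(l, m, n): (m / 3.0, n / 3.0, 0.0) for (l, m, n) in indices}
```

By the linear precision of the Bernstein basis, eval_patch(b, u, v, w) then equals u·P + v·Q + w·R.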
Quite often, we are not given a set of control points, but a coarse set of triangles (which we call base triangles) and a set of conditions that should be satisfied. For
example, the base triangles can be a coarse approximation to a dense set of points that
should be approximated in a least-squares sense. Or, it can be the case that the normal vector of the patches should coincide with predefined normal vectors given at the
corners of the base triangles. That latter case is particularly interesting in a rendering
context, since that set of conditions is exactly the data usually given to the shading model.
An approach to create a patch that satisfies the conditions given to the shading model for a triangle is Point-Normal triangles (PN-triangles) [92]. The control point at each of the three corners of the patch is defined by the corresponding corner vertex of the base triangle. Then, along each edge [p_i, p_j] of the base triangle we define a cubic Bézier curve C_ij,

C_ij(t) = p_i B³₀(t) + c_ij B³₁(t) + c_ji B³₂(t) + p_j B³₃(t),    (1.11)

where B³_i(t) = (3 choose i) t^i (1 − t)^(3−i) are the univariate Bernstein-Bézier polynomials, see e.g. [26]. Let n_i be the shading normal at p_i. We determine c_ij such that the tangent of C_ij at p_i lies in the tangent plane defined by p_i and n_i. To this end, we let c_ij be the projection of (2/3)p_i + (1/3)p_j onto the tangent plane defined by n_i and p_i, that is,

c_ij = (2/3)p_i + (1/3)p_j − (1/3)((p_j − p_i) · n_i) n_i.    (1.12)

Similarly, we let c_ji be defined by n_j and p_j. This decoupling of the two edge-ends is possible since the first derivative at one end of a Bézier curve is completely determined by the two control points at that end.
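The projection in (1.12) is straightforward to implement; a Python sketch (the helper name is hypothetical):

```python
def edge_control_point(pi, pj, ni):
    """PN-triangle edge control point c_ij of (1.12): the point
    (2 p_i + p_j)/3 projected onto the tangent plane through p_i
    with unit normal n_i."""
    d = sum((a - b) * c for a, b, c in zip(pj, pi, ni))  # (p_j - p_i) . n_i
    return tuple((2.0 * a + b) / 3.0 - (d / 3.0) * c
                 for a, b, c in zip(pi, pj, ni))

# Edge from p_i = (0,0,0) with normal n_i = (0,0,1) to p_j = (3,0,3):
cij = edge_control_point((0.0, 0.0, 0.0), (3.0, 0.0, 3.0), (0.0, 0.0, 1.0))
```

The result (1, 0, 0) lies in the tangent plane z = 0 through p_i, as required.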
Along feature edges, where a discontinuity in the shading is intended, the shading
normals of two adjacent triangles do not agree at a common corner. In that case, we
use the cross product of the two shading normals to determine a tangent direction, and
thus, the tangent at that end of the curve will lie in both tangent planes. Letting n_Si and n_Ti be the shading normals of the two adjacent triangles S and T at the common corner p_i, then

c_ij = p_i + (1/3)((p_j − p_i) · m_ij) m_ij,    m_ij = (n_Si × n_Ti) / ‖n_Si × n_Ti‖.
The three Bézier curves define nine of the ten control points of the patch. Vlachos et al. [92] define the last coefficient b₁₁₁ as

b₁₁₁ = (3/12)(c_ij + c_ji + c_jk + c_kj + c_ki + c_ik) − (1/6)(p_i + p_j + p_k).

In Paper V we propose to use

b₁₁₁ = (1/6)(c_ij + c_ji + c_jk + c_kj + c_ki + c_ik)

instead, which allows a significantly more efficient implementation at the cost of a slightly “flatter” patch.
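The two centre-point choices can be compared directly; a Python sketch (function names are illustrative), where c is the list of the six edge control points and p the three corners:

```python
def b111_vlachos(c, p):
    """Centre control point of Vlachos et al. [92]:
    (3/12) * (sum of six edge control points) - (1/6)(p_i + p_j + p_k)."""
    return tuple(0.25 * sum(ci[k] for ci in c) - sum(pi[k] for pi in p) / 6.0
                 for k in range(3))

def b111_paper_v(c):
    """Simpler choice proposed in Paper V: the average of the six edge
    control points (cheaper to evaluate, slightly flatter patch)."""
    return tuple(sum(ci[k] for ci in c) / 6.0 for k in range(3))

# For a flat base triangle both choices give the centroid of the triangle:
p = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 3.0, 0.0)]
c = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (2.0, 1.0, 0.0),
     (1.0, 2.0, 0.0), (0.0, 2.0, 0.0), (0.0, 1.0, 0.0)]
```

On curved input the two choices differ, the second pulling the centre slightly towards the plane of the edge control points.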
3.3 GPU-BASED TESSELLATION OF TRIANGULAR PATCHES
In principle, tessellation is quite straightforward: we divide the input patch into many small triangles, where the tessellation level controls the number of triangles we split a patch into. This set of small triangles is then used for subsequent computations or rendering. However, modern graphics hardware renders triangles so fast that actually transferring all the triangles to the GPU is the performance bottleneck in many applications. Therefore, it is of interest to let the GPU perform the tessellation; that way, only the control geometry of the patch has to be uploaded to the GPU.
Boubekeur and Schlick [8] proposed a novel approach to GPU-based tessellation.
Usually, all patches have the same parameter domain, so they create a parameter space
tessellation of a generic patch stored in a static VBO (memory on the graphics card). To
Figure 1.5: Real-time GPU silhouette refinement. A dynamic refinement (left) of a coarse
geometry (centre). Cracking between adjacent patches of different refinement levels (top right) is eliminated by use of the technique described in Paper V (bottom right).
render a patch, rendering of that generic parameter-space patch is triggered and they
let the vertex shader (the part of the graphics hardware that transforms all incoming
vertices, see [85, 75]) evaluate the incoming parameter positions yielding the evaluated
patch. Such an approach where all patches have the same tessellation level is called
uniform tessellation.
The abovementioned uniform tessellation scheme is very fast and efficient; however, it is still slower than straightforwardly rendering the base triangles, since every small triangle processed induces some computational overhead. Thus, it is of interest to let the tessellation level vary, using an increased triangle density only where
needed. Such schemes are called non-uniform tessellation schemes and they must deal
with several challenges. Most importantly, the geometry of two adjacent patches with
different tessellation levels must match along their common boundary. If they do not
match, small holes are introduced along the boundary, a phenomenon called cracking,
see Figure 1.5. Furthermore, to let tessellation levels change for a patch over time, we
must have means to continuously blend between the geometry of different tessellation
levels. Otherwise, we get sudden small jumps in the geometry known as popping.
We observe in Papers IV and V that shading usually hides the coarseness of geometry everywhere except along the silhouettes, and introduce the predicate silhouetteness to govern the tessellation level. The silhouetteness is a continuous version of the binary silhouette test (an edge is a silhouette edge if one of the adjacent triangles faces the camera while the other does not). The result of using that predicate with a non-uniform tessellation scheme is shown in the left of Figure 1.5.
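The binary test that silhouetteness relaxes can be sketched as follows (a CPU-side illustration, not the thesis's implementation; the continuous silhouetteness of Paper IV would replace the two sign tests by a smooth function of the same dot products):

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def is_silhouette_edge(n1, n2, midpoint, eye):
    """Binary silhouette test: the edge is a silhouette edge if one adjacent
    triangle (normal n1) faces the eye point while the other (n2) does not."""
    view = tuple(e - m for e, m in zip(eye, midpoint))
    return (dot(n1, view) > 0.0) != (dot(n2, view) > 0.0)

# Two triangles folded over an edge at the origin, eye above the fold:
front_back = is_silhouette_edge((0, 0, 1), (0, 0, -1), (0, 0, 0), (0, 0, 2))
```

A continuous predicate lets the tessellation level, and hence the geometry, blend smoothly as the silhouette moves over the mesh.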
An approach to non-uniform tessellation is given by Boubekeur and Schlick [9],
where they propose to use a pool of tessellations for each refinement level, with one
tessellation for every combination of adjacent patch tessellation levels. That scheme
removes cracking artefacts, but does not alleviate popping. Also, the sheer number
of tessellation level combinations grows rapidly with increased maximum refinement
level. That not only consumes memory but also restricts the use of instancing,
a method that allows particularly efficient rendering of large batches of patches with
identical tessellations.
In Paper V, we propose a method to rewrite a patch to a finer tessellation level
without changing the geometry. We use that to introduce continuous blends between
tessellation levels as well as forcing the geometry of patches to match along common boundaries. In principle, that should remove both cracking and popping artefacts.
However, only the geometry of adjacent tessellations matches; they do not match topologically edge-by-edge, and adjacent triangles must match edge-by-edge to guarantee a water-tight output from the rasterizer hardware [67, 80]. In Paper VI we address that issue, proposing a “snap-function” that modifies the parameter tessellations such that adjacent tessellations match edge-by-edge.
4 ISO-SURFACES OF SCALAR FIELDS AS TRIANGLE MESHES
A scalar field associates a scalar value to every point in a space. In 3D space, a scalar
field f is a function f : R3 → R that can, for example, model temperature, humidity,
pressure, gravity, or electric potential. In many applications, the scalar field is defined
as a discrete 3D grid of sample values. In medical imaging, such grids are the output
of MRI- and CT-scans, and geophysical surveys produce enormous grids of seismic
data. A scalar field is often assumed to be at least continuous; to obtain such a field from the samples, we can use a suitable reconstruction function, like the simple tri-linear interpolant, to fill the void in-between the sample values. That reconstruction function can again be used to construct numerical schemes for solving partial differential equations directly on the grid of values.
Often, we are interested in the iso-surface S of a scalar field f, which is the set of points where f equals a particular iso-value c, that is,

S = { p ∈ R³ : f(p) = c }.
That can, for example, describe the surface of an organ in a medical data set, or, as is the case for level set methods, represent the solution. In computer graphics, iso-surfaces of scalar fields are frequently used to model metamorphic objects like clouds
and smoke [23], shapes that are notoriously difficult to model explicitly.
Thus, it is of interest to both extract and visualize such iso-surfaces. One approach
to visualize a scalar field is volume ray-casting [23, 82]. The idea is to form a ray for
every pixel on the screen from the eye-point through the centre of that pixel and march
through the volume along that ray until we either intersect the iso-surface or exit the
volume. Translucency is easily accommodated and that can yield impressive images.
However, any change in viewport or in the scalar field provokes a new calculation for every ray; thus, the approach is quite compute-intensive. In addition, that approach has the drawback that the iso-surface is never extracted into an explicit form, which is often necessary for subsequent computations. Thus, a strategy is to extract the iso-surface directly, and the most common approach is probably the Marching Cubes algorithm [62] of Lorensen and Cline. That algorithm marches through the scalar values
and produces an approximation to the iso-surface as a compact set of triangles. Another
noteworthy similar approach is the Marching Tetrahedra algorithm, which is, however,
inferior for cubical grids due to the artefacts introduced when subdividing the cubical
cells into tetrahedra [13].
However, these 3D grids grow cubically with the sample density, and thus the sheer size of the dataset easily makes managing and processing the data quite a challenge. This is particularly true for applications that require interactive visualization of dynamically changing scalar fields. Therefore, and not unexpectedly, there has been a lot of research on volume data processing on GPUs, since GPUs are particularly designed for huge computational tasks with challenging memory bandwidth requirements, building on simple and massive parallelism instead of the CPU's more sophisticated serial processing. In Paper VII, we propose to formulate the Marching Cubes algorithm as a stream compaction-expansion process that can be efficiently implemented on the GPU using an extended version of the Histogram Pyramid algorithm [95].
4.1 THE MARCHING CUBES ALGORITHM
The input of the Marching Cubes algorithm is an m×n×k grid of scalar values. Letting each 2×2×2 subgrid form the corners of a cubical cell, we get an (m−1)×(n−1)×(k−1) grid of such cubes, as shown in the left of Figure 1.6. For each cube, we label each of
the eight corners as “inside” or “outside” the iso-surface by comparing the scalar values
at the corners with the iso-value. If one corner is inside the iso-surface while another is
outside, the edge connecting the two corners pierces the iso-surface. Triangulating all
the intersections of the iso-surface and the cube’s edges, we get an approximation of
the iso-surface inside that cube. Thus, we can march through those cubes one-by-one,
in any order, and emit the part of the iso-surface contained inside each cube.
The labelling of the corners defines the case of the cube, and, in total, there are
256 different cases. If we assume that every edge that pierces the iso-surface pierces
Figure 1.6: Left: The Marching Cubes algorithm marches through a 3D grid of scalar values,
where each 2×2×2-subgrid forms the corners of a cube. Right: The cube’s case is determined
by the scalar values at the corners. There are in total 256 cases where each case has a predefined
tessellation of edge-intersections.
it exactly once, the set of edge-intersections is completely determined by the case. We
can, therefore, create a pre-defined set of triangle tessellations, one for each of the 256
cases. That can, in principle, be reduced to 15 basic cases by symmetry [62], shown
in Figure 1.6, right. A few of the cases are ambiguous, but the ties can be consistently
resolved by adding some extra triangulations [66].
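The per-cube work is easy to state in code; a Python sketch of the two basic ingredients (the "inside" convention and table layout vary between implementations):

```python
def cube_case(values, iso):
    """8-bit Marching Cubes case index: bit i is set when corner i is
    inside the iso-surface (here taken to mean value below the iso-value)."""
    case = 0
    for i, v in enumerate(values):
        if v < iso:
            case |= 1 << i
    return case

def edge_intersection(p0, p1, v0, v1, iso):
    """Point where the iso-surface crosses the cube edge [p0, p1], using the
    tri-linear reconstruction, which is linear along an axis-aligned edge."""
    t = (iso - v0) / (v1 - v0)
    return tuple(a + t * (b - a) for a, b in zip(p0, p1))

# Only corner 0 inside: case 1, a single triangle clipping that corner.
case = cube_case([0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0], iso=0.5)
```

The case indexes the pre-defined tessellation table; each table entry lists which cube edges to intersect and how to connect the intersections into triangles.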
4.2 MARCHING CUBES AS A STREAM COMPACTION-EXPANSION PROCESS
In recent years, commodity graphics hardware has become increasingly programmable.
Graphics computations are highly data-parallel, so, as opposed to CPUs, GPUs are
designed for performing identical computations on large sets of data elements, utilizing
massive parallelism. The result is that, for suitable problems, the GPU can achieve a computational throughput several orders of magnitude larger than that of a CPU. Thus, General-Purpose computation on GPUs (GPGPU) has emerged as an active field of research, where the challenge is to formulate computations in a form suitable for processing by the GPU, see e.g. [34] for an introduction.
The stream computing paradigm is particularly well suited for GPGPU computations. The idea is to view the computation as a stream of input elements fed to a
computational kernel that produces output elements. An output element depends only
on the corresponding input element, therefore, the computations can be carried out in
any order, for example in parallel. The Marching Cubes algorithm can be formulated
as a stream computing algorithm by letting the cubes be the input elements and the triangles of the corresponding triangulations be the output elements. The computational
kernel determines the case of the cube, calculates the edge intersections, and emits the
corresponding geometry.
Since a cube produces between zero and five triangles, the number of output elements varies for each input element, and all the output elements should be written to a compact list of triangles. That problem is known as stream compaction and expansion, which needs a bit of care to be performed efficiently on parallel architectures. One approach is to use a prefix scan [35], which creates a table of offsets into the output stream,
so every input element knows where to write its output elements. Prefix scan produces the output stream by iterating over the input elements; however, the input to Marching Cubes is a volume, so the number of input elements is proportional to n³, while the output is a surface, with a number of output elements proportional to n². This suggests that it may be more efficient to iterate over the output elements when n grows large.
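A serial Python sketch of scan-based compaction-expansion (illustration only; on the GPU the scan itself runs in parallel):

```python
def exclusive_scan(counts):
    """Exclusive prefix sum: offsets[i] is where input element i writes its
    first output element; the final total is the length of the output."""
    offsets, total = [], 0
    for c in counts:
        offsets.append(total)
        total += c
    return offsets, total

def compact_expand(items, count, emit):
    """Stream compaction-expansion: each item produces count(item) outputs,
    scattered to the offsets computed by the scan."""
    counts = [count(x) for x in items]
    offsets, total = exclusive_scan(counts)
    out = [None] * total
    for x, off, c in zip(items, offsets, counts):
        for j in range(c):
            out[off + j] = emit(x, j)
    return out
```

For Marching Cubes, items would be the cubes, count(cube) the number of triangles of its case, and emit(cube, j) the j-th triangle.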
The HistoPyramid algorithm [95] is another approach to stream compaction and
expansion which produces the output stream by iterating over the output elements.
In Paper VII we propose to implement the Marching Cubes algorithm on the GPU using HistoPyramids, an approach that works on both the Shader Model 3.0 and Shader Model 4.0 generations of graphics hardware.
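The core of the HistoPyramid is easy to illustrate in one dimension (a CPU sketch under simplifying assumptions; the GPU version of [95] reduces 2×2 texels per level and performs the traversal in a shader):

```python
def build_histopyramid(counts):
    """Bottom level holds each input element's output count; each level
    above stores sums of adjacent pairs, so the single top value is the
    total number of output elements."""
    levels = [list(counts)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        if len(prev) % 2:              # pad odd levels with an empty slot
            prev = prev + [0]
        levels.append([prev[i] + prev[i + 1] for i in range(0, len(prev), 2)])
    return levels

def traverse(levels, k):
    """Descend from the top to find which input element produces output
    element k; returns (input index, local index within that element)."""
    i = 0
    for level in reversed(levels[:-1]):
        i *= 2                          # move to the left child
        if i < len(level) and k >= level[i]:
            k -= level[i]               # skip the left subtree's outputs
            i += 1                      # descend into the right child
    return i, k

levels = build_histopyramid([3, 0, 2])  # e.g. triangle counts per MC cube
```

Each output element k locates its producer independently of all others, which is what makes the extraction step embarrassingly parallel.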
5 CONCLUSION
We have given an overview and introduction to the topics of this thesis and outlined
the main contributions of the papers. All the papers deal with mesh-based geometric
modelling in some way or another.
The first paper of this thesis investigates transfinite mean value interpolation, where
we give sufficient conditions to guarantee interpolation. In addition, by deriving the
normal derivative of the interpolant, we construct a transfinite Hermite mean value
interpolant. We believe that the transfinite mean value interpolant is particularly useful
for creating smooth mappings between differently shaped sets in the plane, as well as
a helpful tool in various areas of applied mathematics.
Papers II and III deal with Delaunay triangulations. Delaunay triangulations are not
unique when four or more points are co-circular, but in Paper II we propose a simple
rule based on two preferred directions that uniquely selects one of all possible Delaunay
triangulations, which is, for example, useful when we use triangulations to compress
images [17]. In Paper III we propose a method for simultaneous simplification of a
set of piecewise linear curves in the plane without changing the topological relations
between the curves. The crux of our method is encoding the topological relationship
of the curves using a constrained Delaunay triangulation. The collection of curves can, for example, represent cartographic contours or road networks, from which our method can create consistent multi-resolution representations.
Papers IV through VI iterate on improving the rendered appearance of
coarse triangular meshes. The idea is to let the triangles with shading normals define
a set of triangular Bézier patches that are used to insert geometry along the silhouettes
where shading cannot hide the coarseness of geometry. In Paper IV, we introduce the
silhouetteness-predicate that can guide a tessellation scheme such that patches along
the silhouettes are densely tessellated. In Paper V we propose a different tessellation
scheme and show how the complete rendering pipeline can be implemented directly
on the GPU. A drawback of the tessellation scheme of Paper V is that the composite tessellations are not topologically watertight, for which we propose a simple remedy in Paper VI. The result is a simple scheme for non-uniform tessellation of patches which
is quite suited for instancing, and the performance analyses show that the approaches
are quite efficient.
Finally, in Paper VII we propose a fully GPU-based approach to Marching Cubes.
By formulating Marching Cubes as a stream compaction-expansion process, we show
how it can be efficiently implemented using HistoPyramids, which we extend to allow
arbitrary stream expansion. The result is an implementation that currently outperforms
all other known GPU-based iso-surface extraction algorithms, and should provide an
efficient tool for extracting iso-surfaces in real-time. An interesting direction for further
research would be to extend the approach to handle out-of-core datasets.
PAPER I:
TRANSFINITE MEAN VALUE INTERPOLATION
Christopher Dyken and Michael S. Floater
To appear in Computer Aided Geometric Design.
Abstract: Transfinite mean value interpolation has recently emerged as a simple
and robust way to interpolate a function f defined on the boundary of a planar
domain. In this paper we study basic properties of the interpolant, including sufficient conditions on the boundary of the domain to guarantee interpolation when
f is continuous. Then, by deriving the normal derivative of the interpolant and of
a mean value weight function, we construct a transfinite Hermite interpolant, and
discuss various applications.
1 INTRODUCTION
Transfinite interpolation means the construction of a function over a planar domain that
matches a given function on the boundary, and has various applications, notably in geometric modelling and finite element methods [78]. Transfinite mean value interpolation
has developed in a series of papers [27, 28, 31, 48]. In [27] barycentric coordinates over
triangles were generalized to star-shaped polygons, based on the mean value property
of harmonic functions. The motivation for these ‘mean value coordinates’ was to parameterize triangular meshes but they also give a method for interpolating piecewise
linear data defined on the boundary of a convex polygon. In [28] it was shown that these
mean value interpolants extend to any simple polygon and even sets of polygons, possibly nested, with application to image warping. In both [31] and [48] 3D coordinates
were similarly constructed for closed triangular meshes, and in [48] the coordinates
were used for mesh deformation. Moreover, in [48] the construction was carried out
over arbitrary curves and surfaces, not just polygons and polyhedra. Further work on
mean value coordinates and related topics can be found in [5, 30, 29, 47, 49, 56, 61, 94].
The purpose of this paper is to study and further develop mean value interpolation
over arbitrary curves in the plane, as proposed by Ju, Schaefer, and Warren [48]. There
are two main contributions. The first is the derivation of sufficient conditions on the
shape of the boundary that guarantee the interpolation property: conditions that ensure
that the mean value interpolant really is an interpolant. This has only previously been
Figure I.1: A convex domain.
shown for polygonal curves with piecewise linear data, in [28]. The second is the
construction of a Hermite interpolant, matching values and normal derivatives of a
given function on the boundary. The Hermite interpolant is constructed from a weight
function and two Lagrange interpolants, and requires finding their normal derivatives.
We complete the paper with applications to smooth mappings and the web-spline
method for solving PDEs.
2 LAGRANGE INTERPOLATION

2.1 INTERPOLATION ON CONVEX DOMAINS
Let Ω ⊂ R2 be open, bounded and convex. We start with the convexity assumption
because the definitions and analysis are easier. However, we make no assumption
about the smoothness of the boundary ∂Ω, nor do we demand strict convexity: three
points in ∂Ω can be collinear. Thus we allow Ω to be a convex polygon as well as a
circle, ellipse, and so on. For any point x = (x1 , x2 ) in Ω and any angle θ let L(x, θ)
denote the semi-infinite line that starts at x and whose angle from the x1 -axis is θ, let
p(x, θ) denote the unique point of intersection between L(x, θ) and ∂Ω, and let ρ(x, θ)
be the Euclidean distance ρ(x, θ) = ‖p(x, θ) − x‖; see Figure I.1. The intersection
point p(x, θ) depends on the curve ∂Ω, and sometimes it will help to indicate this
by writing p(x, θ; ∂Ω). In general, p(x, θ; C) will denote the intersection (assumed
unique) between L(x, θ) and any planar curve C and ρ(x, θ; C) the corresponding
distance.
Given some continuous function f : ∂Ω → R, our goal is to define a function
g : Ω → R that interpolates f . To do this, for each x ∈ Ω, we define g(x) by the
following property. If F : Ω → R is the linear radial polynomial, linear along each
line segment [x, y], y ∈ ∂Ω, with F (x) = g(x) and F (y) = f (y), then F should
satisfy the mean value property

F(x) = (1/(2πr)) ∫_Γ F(z) dz,    (I.1)
where Γ is any circle in Ω with centre x, and r is its radius. To find g(x), we write (I.1)
as

g(x) = (1/(2π)) ∫₀^{2π} F(x + r(cos θ, sin θ)) dθ,    (I.2)
and since

F(x + r(cos θ, sin θ)) = ((ρ(x, θ) − r)/ρ(x, θ)) g(x) + (r/ρ(x, θ)) f(p(x, θ)),    (I.3)
equation (I.2) reduces to

∫₀^{2π} (f(p(x, θ)) − g(x))/ρ(x, θ) dθ = 0,

whose unique solution is

g(x) = (∫₀^{2π} f(p(x, θ))/ρ(x, θ) dθ) / φ(x),    φ(x) = ∫₀^{2π} 1/ρ(x, θ) dθ.    (I.4)
Equation (I.4) expresses g(x) as a weighted average of the values of f around Ω. We
will show later that under reasonable conditions on ∂Ω, g interpolates f , i.e., that g
extends continuously to the boundary ∂Ω and equals f there. Thus, since F satisfies
the mean value property (I.1) at x, we call g the mean value interpolant to f. The interpolant g itself does not satisfy the mean value property and is not in general a harmonic function. But in the spirit of [33], we can view it as ‘pseudo-harmonic’ as it
shares some of the qualitative behaviour of harmonic functions, such as the maximum
principle. Also, similar to harmonic functions, the operator I, defined by g = I(f ), has
linear precision: if f : R2 → R is any linear function, f (x1 , x2 ) = ax1 + bx2 + c, then
I(f ) = f in Ω. This follows from the fact that, if f is linear and we let g(x) = f (x),
then F = f , and so F is linear and therefore trivially satisfies (I.1). Figure I.2 shows
two examples of mean value interpolants on a circular domain.
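On the unit disc, ρ(x, θ) has a closed form, so (I.4) can be evaluated by sampling the two angle integrals. A Python sketch (not code from the paper), which also illustrates the linear precision property numerically:

```python
import math

def mean_value_interpolant_disc(f, x, samples=2000):
    """Evaluate the mean value interpolant (I.4) on the open unit disc by
    the midpoint rule; f is defined on the unit circle."""
    num = den = 0.0
    for k in range(samples):
        theta = 2.0 * math.pi * (k + 0.5) / samples
        d = (math.cos(theta), math.sin(theta))
        # rho(x, theta): distance from x to the unit circle along direction d
        xd = x[0] * d[0] + x[1] * d[1]
        rho = -xd + math.sqrt(xd * xd + 1.0 - x[0] ** 2 - x[1] ** 2)
        p = (x[0] + rho * d[0], x[1] + rho * d[1])
        num += f(p) / rho
        den += 1.0 / rho
    return num / den

# Linear precision: for linear boundary data, g reproduces f inside the disc.
f = lambda p: 2.0 * p[0] - p[1] + 0.5
g = mean_value_interpolant_disc(f, (0.3, -0.2))
```

Here g agrees with f(0.3, −0.2) = 1.3 to quadrature accuracy, since the integrand is smooth and periodic.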
2.2 INTERPOLATION ON CONVEX POLYGONS
The construction of the mean value interpolant g was carried out in [27] in the special
case that Ω is a polygon and that f is linear along each edge of the polygon. In this
case g is a convex combination of the values of f at the vertices of the polygon. To see
this we prove
Figure I.2: Mean value interpolants.
Lemma I.1. Let e = [p0 , p1 ] be a line segment and let f : e → R be any linear
function. Let x be any point in the open half-plane lying to the left of the vector p1 −p0 .
Let θ0 < θ1 be the two angles such that p(x, θi; e) = pi, i = 0, 1, and let ρi = ‖pi − x‖. Then
∫_{θ0}^{θ1} f(p(x, θ; e))/ρ(x, θ; e) dθ = (f(p0)/ρ0 + f(p1)/ρ1) tan((θ1 − θ0)/2).    (I.5)
Proof. Similar to the approach of [27], since f is linear, we have, with p = p(x, θ; e),

f(p) = (A1/A) f(p0) + (A0/A) f(p1),    (I.6)

with A0, A1, A the triangle areas A0 = A([p0, x, p]), A1 = A([p, x, p1]), A = A([p0, x, p1]). Letting ρ = ρ(x, θ; e), by the sine rule,

A0/A = sin(θ − θ0)ρ / (sin(θ1 − θ0)ρ1),    A1/A = sin(θ1 − θ)ρ / (sin(θ1 − θ0)ρ0),

and putting these into (I.6), dividing by ρ, and integrating from θ0 to θ1 gives (I.5).
Since the function f ≡ 1 is linear, the lemma also shows that

∫_{θ0}^{θ1} 1/ρ(x, θ; e) dθ = (1/ρ0 + 1/ρ1) tan((θ1 − θ0)/2).
Together with (I.5), this implies that, if Ω is a convex polygon with vertices p0, p1, …, p_{n−1}, and, indexing modulo n, f is linear on each edge [p_i, p_{i+1}], then g in (I.4) reduces to

g(x) = (Σ_{i=0}^{n−1} w_i(x) f(p_i)) / φ(x),    φ(x) = Σ_{i=0}^{n−1} w_i(x),    (I.7)

where

w_i(x) := (tan(α_{i−1}(x)/2) + tan(α_i(x)/2)) / ρ_i(x),    (I.8)

and ρ_i(x) = ‖p_i − x‖ and α_i(x) is the angle at x of the triangle with vertices x, p_i, p_{i+1}. The functions

λ_i(x) := w_i(x) / Σ_{j=0}^{n−1} w_j(x),
were called mean value coordinates in [27]. By the linear precision property of I, since
both f(x) = x1 and f(x) = x2 are linear, we have

x = Σ_{i=0}^{n−1} λ_i(x) p_i,
which expresses x as a convex combination of the vertices pi . Thus, the coordinates
λi are a generalization of barycentric coordinates.
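A direct Python implementation of (I.7)–(I.8) (a sketch for x strictly inside a convex polygon, not code from the paper) lets one check the barycentric property numerically:

```python
import math

def mean_value_coordinates(x, pts):
    """Mean value coordinates lambda_i of x with respect to the polygon pts,
    computed from the weights w_i of (I.8), indexing modulo n."""
    n = len(pts)
    rho = [math.hypot(p[0] - x[0], p[1] - x[1]) for p in pts]
    ang = [math.atan2(p[1] - x[1], p[0] - x[0]) for p in pts]
    # alpha_i: angle at x of the triangle x, p_i, p_{i+1}
    alpha = [(ang[(i + 1) % n] - ang[i]) % (2.0 * math.pi) for i in range(n)]
    w = [(math.tan(alpha[i - 1] / 2.0) + math.tan(alpha[i] / 2.0)) / rho[i]
         for i in range(n)]
    s = sum(w)
    return [wi / s for wi in w]

square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
lam = mean_value_coordinates((0.25, 0.5), square)
# The coordinates sum to one and reproduce x as a combination of the vertices.
x = [sum(l * p[k] for l, p in zip(lam, square)) for k in range(2)]
```

The reproduction of x is exactly the linear precision property stated above.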
2.3 THE BOUNDARY INTEGRAL FORMULA
It is not clear from the formula (I.4) how to differentiate g. Ju, Schaefer, and Warren
[48] noticed however that if a parametric representation of ∂Ω is available, the two
integrals in (I.4) can be converted to integrals over the parameter of the curve. Let
c : [a, b] → R2 , with c(b) = c(a), be some parametric representation of ∂Ω, oriented
anti-clockwise with respect to increasing parameter values. If c(t) = (c1 (t), c2 (t)) =
p(x, θ), then θ is given by

θ = arctan((c2(t) − x2)/(c1(t) − x1)),    (I.9)
and differentiating this with respect to t gives
dθ/dt = ((c1(t) − x1)c′2(t) − (c2(t) − x2)c′1(t)) / ((c1(t) − x1)² + (c2(t) − x2)²) = ((c(t) − x) × c′(t)) / ‖c(t) − x‖²,    (I.10)
where × denotes the cross product in R², i.e., a × b := a1 b2 − a2 b1. Using (I.10) to change the integration variable in (I.4) yields the boundary integral representation (cf. [48]),
g(x) = (∫_a^b w(x, t) f(c(t)) dt) / φ(x),    φ(x) = ∫_a^b w(x, t) dt,    (I.11)

where

w(x, t) = ((c(t) − x) × c′(t)) / ‖c(t) − x‖³.    (I.12)
It is now clear that we can take as many partial derivatives of g as we like by differentiating under the integral sign in (I.11). Thus we see that g is in C^∞(Ω). The boundary integral formula is also important because it provides a way of numerically computing the value of g at a point x by sampling the curve c and its first derivative c′ and applying some standard quadrature rule to the two integrals in (I.11). A simple alternative evaluation method that only requires evaluating c itself is to make a polygonal approximation to c and apply (I.7). The third alternative, using the original angle formula (I.4) and sampling the angles between 0 and 2π, requires computing the intersection points p(x, θ).
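A quadrature sketch of the first option in Python (an assumed ellipse boundary for illustration; the midpoint rule converges very fast for smooth periodic integrands):

```python
import math

def mv_boundary_integral(f, c, dc, x, a, b, samples=4000):
    """Evaluate g(x) by applying the midpoint rule to the two integrals of
    the boundary integral formula (I.11), with w(x, t) as in (I.12)."""
    num = den = 0.0
    h = (b - a) / samples
    for k in range(samples):
        t = a + (k + 0.5) * h
        ct, dct = c(t), dc(t)
        d = (ct[0] - x[0], ct[1] - x[1])
        w = (d[0] * dct[1] - d[1] * dct[0]) / math.hypot(d[0], d[1]) ** 3
        num += w * f(ct) * h
        den += w * h
    return num / den

# An ellipse boundary with linear data: by linear precision, g(x) = f(x).
c = lambda t: (2.0 * math.cos(t), math.sin(t))
dc = lambda t: (-2.0 * math.sin(t), math.cos(t))
f = lambda p: p[0] + 3.0 * p[1] - 1.0
g = mv_boundary_integral(f, c, dc, (0.5, 0.25), 0.0, 2.0 * math.pi)
```

The same routine applies unchanged to any smooth parametric boundary for which c and c′ can be evaluated.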
The numerator in w can also be written as the scalar product of (c(t) − x) and rot(c′(t)) = (c′2(t), −c′1(t)), the rotation of c′(t) through an angle of −π/2. Then, since the outward normal to the curve c is rot(c′(t))/‖c′(t)‖, another way of representing g is in terms of flux integrals:

g(x) = (∫_{∂Ω} f(y) F(y) · N(y) dy) / φ(x),    φ(x) = ∫_{∂Ω} F(y) · N(y) dy,

where F is the vector field

F(y) = (y − x)/‖y − x‖³,

and N(y) is the outward unit normal at y, and dy denotes the element of arc length of ∂Ω.
of ∂Ω. The Gauss theorem could then be applied to these expressions to give further
formulas for g and φ. Recently, Lee [60] has studied more general formulas of this
type.
2.4 NON-CONVEX DOMAINS
We now turn our attention to the case that Ω is an arbitrary connected open domain in
R2 , not necessarily convex. In the case that Ω is a polygon, it was shown in [28] that
the mean value interpolant g defined by (I.7–I.8) has a natural extension to non-convex
Figure I.3: (a) Example with two non-transversal angles and (b) an angle with three intersections.
polygons if we simply allow αi (x) in (I.8) to be a signed angle: negative when x lies
to the right of the vector pi+1 − pi . The main point is that φ continues to be strictly
positive in Ω so that g is well defined.
To deal with arbitrary (non-polygonal) domains, suppose initially that Ω is simply connected, i.e., has no holes, in which case its boundary can be represented as a single
parametric curve c : [a, b] → R2 , with c(b) = c(a), oriented anti-clockwise. Then,
similar to the construction in [48], we define g in this more general setting by the
boundary integral (I.11). Note that for arbitrary x ∈ Ω the quantity w(x, t) may change
sign several times as t varies.
We can also express g in this general setting using angle integrals. Recall that an
intersection point of two smooth planar curves is said to be transversal if the curves
have distinct tangents at that point. We call θ a transversal angle with respect to x if all
the intersections between L(x, θ) and ∂Ω are transversal. For example, in Figure I.3a
all angles at x are transversal except for θ1 and θ2 . We make the assumption that ∂Ω
is such that there is a finite number of non-transversal angles at each x ∈ Ω. If θ is a
transversal angle, let n(x, θ) be the number of intersections of L(x, θ) with ∂Ω which
will be an odd number, assumed finite, and let pj (x, θ), j = 1, 2, . . . , n(x, θ), be the
points of intersection, ordered so that their distances ρj (x, θ) = kpj (x, θ) − xk are
increasing,
ρ1 (x, θ) < ρ2 (x, θ) < · · · < ρn(x,θ) (x, θ).
(I.13)
For example, for θ ∈ (θ1 , θ2 ) in Figure I.3a, there are three such intersections, shown
in Figure I.3b.
Now for fixed x ∈ Ω, let
\[ S_+ = \{\, t \in [a,b] : w(x,t) > 0 \,\} \quad\text{and}\quad S_- = \{\, t \in [a,b] : w(x,t) < 0 \,\}, \]
and observe that both integrals in (I.11) reduce to integrals over S+ and S− . Moreover,
the sets S+ and S− are unions of intervals, and thus the integrals in (I.11) are sums of
integrals, one integral for each interval, and w(x, ·) has constant sign in each interval.
By changing the variable of integration for each interval from t to θ, using (I.10), it
follows that g can be expressed as
\[ g(x) = \frac{1}{\phi(x)} \int_0^{2\pi} \sum_{j=1}^{n(x,\theta)} (-1)^{j-1}\, \frac{f(p_j(x,\theta))}{\rho_j(x,\theta)}\, d\theta, \]
where
\[ \phi(x) = \int_0^{2\pi} \sum_{j=1}^{n(x,\theta)} \frac{(-1)^{j-1}}{\rho_j(x,\theta)}\, d\theta. \tag{I.14} \]
Here, if θ is not a transversal angle, we set n(x, θ) = 0. We now use (I.14) to deduce
the positivity of φ and therefore the validity of g in the non-convex case.
Theorem I.1. For all x ∈ Ω, φ(x) > 0.
Proof. The argument is similar to the polygonal case treated in [28]. Since the sequence of distances in (I.13) is increasing, if n(x, θ) ≥ 3,
\[ \frac{1}{\rho_{2j-1}(x,\theta)} - \frac{1}{\rho_{2j}(x,\theta)} > 0, \qquad j = 1, 2, \dots, (n(x,\theta)-1)/2, \]
and so (I.14) implies
\[ \phi(x) \ge \int_0^{2\pi} \frac{1}{\rho_{n(x,\theta)}(x,\theta)}\, d\theta > 0. \]

2.5 BOUNDS ON φ
Having shown that g, given by either (I.11) or (I.14), is well-defined for non-convex
domains, our next goal is to show that g interpolates the boundary data f under reasonable conditions on the shape of the boundary. A crucial step in this is to study the
behaviour of φ near the boundary. In this section we show that φ behaves like the reciprocal of the distance function d(x, ∂Ω), the minimum distance between a point x ∈ Ω
and the set ∂Ω. First we derive an upper bound.
Theorem I.2. For any x ∈ Ω,
\[ \phi(x) \le \frac{2\pi}{d(x,\partial\Omega)}. \tag{I.15} \]
Proof. If n(x, θ) ≥ 3 in equation (I.13), then
\[ -\frac{1}{\rho_{2j}(x,\theta)} + \frac{1}{\rho_{2j+1}(x,\theta)} < 0, \qquad j = 1, 2, \dots, (n(x,\theta)-1)/2, \]
and so
\[ \phi(x) \le \int_0^{2\pi} \frac{1}{\rho_1(x,\theta)}\, d\theta \le \int_0^{2\pi} \frac{1}{d(x,\partial\Omega)}\, d\theta. \]
To derive a lower bound for φ, we make some assumptions about ∂Ω in terms of
its medial axis [6]. Observe that ∂Ω divides R² into two open and disjoint sets: the set Ω itself and its complement ΩC. The interior medial axis MI ⊂ R² of ∂Ω is the set of all points in Ω whose minimal distance to ∂Ω is attained at least twice; the exterior medial axis ME ⊂ R² is defined analogously, with ΩC in place of Ω. For any set M ⊂ R², we let
\[ d(M, \partial\Omega) = \inf_{y \in M} d(y, \partial\Omega), \]
and to derive a lower bound, we will make the assumption that d(ME , ∂Ω) > 0. Note
that this condition holds for convex curves because in the convex case, ME = ∅ and
d(ME , ∂Ω) = ∞. We will also make use of the diameter of Ω,
\[ \operatorname{diam}(\Omega) = \sup_{y_1, y_2 \in \partial\Omega} \|y_1 - y_2\|. \]
Theorem I.3. If d = d(ME , ∂Ω) > 0, there is a constant C > 0 such that for x ∈ Ω,
\[ \phi(x) \ge \frac{C}{d(x,\partial\Omega)}. \tag{I.16} \]
With β the ratio β = D/d, where D = diam(Ω), we can take
\[ C = \frac{2}{(1+\beta)\bigl(1+\beta+\sqrt{\beta^2+2\beta}\,\bigr)}. \]
Note that C ≤ 2 and if Ω is convex then β = 0 and C = 2. On the other hand, if d
is small relative to D, then C will be small.
Proof. Let y be some boundary point such that d(x, ∂Ω) = ‖y − x‖, set δ = ‖y − x‖, and let θy ∈ [0, 2π) be the angle such that y lies on L(x, θy). Then the open disc B1 = B(x, δ) is contained in Ω. By the assumption that d > 0, let xC be the
Figure I.4: Lines in proof of Theorem I.3.
point in ΩC on the line L(x, θy ) whose distance from y is d; see Figure I.4. Then the
open disc B2 = B(xC , d) is contained in ΩC . Let α1 , α2 , with α1 < θy < α2 , be the
two angles such that the lines L(x, α1 ) and L(x, α2 ) are tangential to ∂B2 , and let ai ,
i = 1, 2, be the point where L(x, αi ) touches ∂B2 . Let q1 be the polygon consisting
of the two line segments [a1 , y] and [y, a2 ], and q2 the polygon consisting of [a1 , xC ]
and [xC , a2 ].
Let θ be any transversal angle in (α1 , α2 ). Then there is some odd number k, say
with k ≤ n(x, θ), such that the intersection points p1 (x, θ), . . . , pk (x, θ) lie between
B1 and B2 while the remaining ones pk+1 (x, θ), . . . , pn(x,θ) (x, θ) lie beyond B2 .
Then, similar to the proof of Theorem I.1, if k = n(x, θ), the sum in φ in (I.14)
satisfies the inequality
\[ \sum_{j=1}^{n(x,\theta)} \frac{(-1)^{j-1}}{\rho_j(x,\theta)} \ge \frac{1}{\rho_k(x,\theta)}, \]
while, if k < n(x, θ), it satisfies
\[ \sum_{j=1}^{n(x,\theta)} \frac{(-1)^{j-1}}{\rho_j(x,\theta)} \ge \frac{1}{\rho_k(x,\theta)} - \frac{1}{\rho_{k+1}(x,\theta)}. \]
Consequently, in either case
\[ \sum_{j=1}^{n(x,\theta)} \frac{(-1)^{j-1}}{\rho_j(x,\theta)} \ge \frac{1}{\rho(x,\theta;q_1)} - \frac{1}{\rho(x,\theta;q_2)}, \]
and therefore, from (I.14),
\[ \phi(x) \ge \int_{\alpha_1}^{\alpha_2} \left( \frac{1}{\rho(x,\theta;q_1)} - \frac{1}{\rho(x,\theta;q_2)} \right) d\theta. \]
We now use the explicit formula from Lemma I.1, and setting α = (α2 − α1 )/2, we
find
\[ \phi(x) \ge 2\left(\frac{1}{\|a_1-x\|}+\frac{1}{\|y-x\|}\right)\tan(\alpha/2) - 2\left(\frac{1}{\|a_1-x\|}+\frac{1}{\|x_C-x\|}\right)\tan(\alpha/2) \]
\[ = 2\left(\frac{1}{\delta}-\frac{1}{\delta+d}\right)\tan(\alpha/2) = \frac{2d}{\delta(\delta+d)}\tan(\alpha/2). \]
Moreover, since
\[ \tan(\alpha/2) = \frac{1-\cos\alpha}{\sin\alpha}, \qquad \sin\alpha = \frac{d}{\delta+d}, \qquad \cos\alpha = \frac{\sqrt{(\delta+d)^2-d^2}}{\delta+d}, \]
we have
\[ \tan(\alpha/2) = \frac{d}{\delta+d+\sqrt{(\delta+d)^2-d^2}}, \]
and therefore
\[ \phi(x) \ge \frac{1}{\delta}\,\frac{2d}{\delta+d}\,\frac{d}{\delta+d+\sqrt{\delta^2+2\delta d}}. \tag{I.17} \]
Since δ ≤ D, this implies
\[ \phi(x) \ge \frac{1}{\delta}\,\frac{2d}{D+d}\,\frac{d}{D+d+\sqrt{D^2+2Dd}}, \]
and, putting D = βd and cancelling the d's, proves the theorem.
2.6 PROOF OF INTERPOLATION
We can now prove that g really interpolates f under the medial axis condition of Theorem I.3. We also make the mild assumption that
\[ N := \sup_{x\in\Omega}\ \sup_{\theta\in T(x)} n(x,\theta) < \infty, \tag{I.18} \]
where T (x) is the subset of [0, 2π) of those angles that are transversal with respect to
x. Note that this holds for convex Ω, in which case N = 1.
Theorem I.4. If f is continuous on ∂Ω and d(ME , ∂Ω) > 0, then g interpolates f .
Proof. Let c(s) be any boundary point and observe that for x ∈ Ω,
\[ g(x) - f(c(s)) = \frac{1}{\phi(x)} \int_a^b w(x,t)\,\bigl(f(c(t)) - f(c(s))\bigr)\, dt. \tag{I.19} \]
We will choose some small γ > 0 and split the integral into two parts, ∫_a^b = ∫_I + ∫_J, where I = [s − γ, s + γ] and J = [a, b] \ I. Then, with M := sup_{y∈∂Ω} |f(y)|,
\[ |g(x) - f(c(s))| \le \max_{t\in I} |f(c(t)) - f(c(s))|\, \frac{1}{\phi(x)} \int_I |w(x,t)|\, dt + 2M\, \frac{1}{\phi(x)} \int_J |w(x,t)|\, dt. \]
Considering the first term on the right hand side, note that
\[ \frac{1}{\phi(x)} \int_I |w(x,t)|\, dt \le \frac{1}{\phi(x)} \int_a^b |w(x,t)|\, dt =: R, \]
which we will bound above. The argument used to derive (I.14) also shows that
\[ \int_a^b |w(x,t)|\, dt = \int_0^{2\pi} \sum_{j=1}^{n(x,\theta)} \frac{1}{\rho_j(x,\theta)}\, d\theta, \]
and so
\[ \int_a^b |w(x,t)|\, dt = \phi(x) + 2 \int_0^{2\pi} \sum_{j=1}^{(n(x,\theta)-1)/2} \frac{1}{\rho_{2j}(x,\theta)}\, d\theta \le \phi(x) + \frac{2(N-1)\pi}{d(x,\partial\Omega)}, \]
with N as in (I.18). Dividing by φ(x) and applying the lower bound (I.16) to φ(x),
then leads to
\[ R \le 1 + \frac{2(N-1)\pi}{\phi(x)\, d(x,\partial\Omega)} \le 1 + \frac{2(N-1)\pi}{C}, \]
which is independent of x. Note that when Ω is convex, N = 1 and R = 1.
Let ε > 0. We must show that there is some δ > 0 such that if x ∈ Ω and ‖x − c(s)‖ ≤ δ then |g(x) − f(c(s))| < ε. Since f ◦ c is continuous at t = s, we can choose γ > 0 such that |f(c(t)) − f(c(s))| < (ε/2)/(1 + 2(N − 1)π/C) for t ∈ I.
Then
\[ |g(x) - f(c(s))| < \frac{\varepsilon}{2} + 2M\, \frac{1}{\phi(x)} \int_J |w(x,t)|\, dt. \tag{I.20} \]
Finally, since
\[ \lim_{x\to c(s)} \int_J |w(x,t)|\, dt = \int_J |w(c(s),t)|\, dt < \infty \qquad\text{and}\qquad \lim_{x\to c(s)} \phi(x) = \infty, \]
it follows that there is some δ > 0 such that if x ∈ Ω and kx − c(s)k ≤ δ then
\[ \frac{1}{\phi(x)} \int_J |w(x,t)|\, dt < \frac{\varepsilon}{4M}, \]
in which case |g(x) − f(c(s))| < ε.
3 DIFFERENTIATION
In some applications we might need to compute derivatives of g. Let α = (α1, α2) be a multi-index, and let Dα denote the partial derivative operator \(\partial^{\alpha_1+\alpha_2}/(\partial x_1^{\alpha_1}\partial x_2^{\alpha_2})\). We start by expressing g in (I.11) as g(x) = σ(x)/φ(x), where
\[ \sigma(x) = \int_a^b w(x,t)\, f(c(t))\, dt, \]
and we reduce the task of computing derivatives of g to that of computing derivatives
of σ and φ, which are given by
\[ D^\alpha \sigma(x) = \int_a^b D^\alpha w(x,t)\, f(c(t))\, dt \quad\text{and}\quad D^\alpha \phi(x) = \int_a^b D^\alpha w(x,t)\, dt, \]
with Dα operating with respect to the x variable. Letting \(\binom{\alpha}{\beta} = \binom{\alpha_1}{\beta_1}\binom{\alpha_2}{\beta_2}\), and defining β ≤ α to mean that βi ≤ αi for both i = 1, 2, and β < α to mean that β ≤ α and α ≠ β, we take the Dα derivative of the equation φ(x)g(x) = σ(x), and the Leibniz rule gives
\[ \sum_{0\le\beta\le\alpha} \binom{\alpha}{\beta}\, D^\beta \phi(x)\, D^{\alpha-\beta} g(x) = D^\alpha \sigma(x), \]
and by rearranging this in the form
\[ D^\alpha g(x) = \frac{1}{\phi(x)} \Bigl( D^\alpha \sigma(x) - \sum_{0<\beta\le\alpha} \binom{\alpha}{\beta}\, D^\beta \phi(x)\, D^{\alpha-\beta} g(x) \Bigr), \tag{I.21} \]
we obtain a rule for computing all partial derivatives of g recursively from those of σ
and φ. Letting
\[ d = d(x,t) = c(t) - x, \qquad r = r(x,t) = \|d(x,t)\|, \tag{I.22} \]
so that r³w = d × c′, an approach similar to the derivation of (I.21) gives
\[ D^\alpha w = \frac{1}{r^3} \Bigl( D^\alpha (d \times c') - \sum_{0<\beta\le\alpha} \binom{\alpha}{\beta}\, D^\beta r^3\, D^{\alpha-\beta} w \Bigr), \tag{I.23} \]
a rule to compute the partial derivatives of w recursively. Since it is easy to differentiate
r2 , we can use the Leibniz rule to differentiate r3 :
\[ D^\alpha r^3 = D^\alpha (r^2\, r) = \sum_{0\le\beta\le\alpha} \binom{\alpha}{\beta}\, D^\beta r^2\, D^{\alpha-\beta} r. \]
By applying the Leibniz rule to r² = r·r, we obtain derivatives of r:
\[ D^\alpha r = \frac{1}{2r} \Bigl( D^\alpha r^2 - \sum_{0<\beta<\alpha} \binom{\alpha}{\beta}\, D^\beta r\, D^{\alpha-\beta} r \Bigr). \tag{I.24} \]
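To make the recursion concrete, the following sketch (illustrative code of ours, not from the paper; the point data is hypothetical) evaluates Dα r at a fixed point via (I.24), using the fact that r² has only finitely many non-zero derivatives; the results agree with the analytic first and second derivatives of r.

```python
import math
from functools import lru_cache

C = (2.0, 1.0)      # a fixed curve point c(t) (hypothetical data)
X = (0.5, -0.3)     # the evaluation point x

def d2r2(a):
    """D^a of r^2 = (c1-x1)^2 + (c2-x2)^2 at X; only finitely many are non-zero."""
    v = (C[0] - X[0], C[1] - X[1])
    if a == (0, 0): return v[0]**2 + v[1]**2
    if a == (1, 0): return -2.0 * v[0]
    if a == (0, 1): return -2.0 * v[1]
    if a in ((2, 0), (0, 2)): return 2.0
    return 0.0

@lru_cache(maxsize=None)
def dr(a):
    """D^a r via the recursion (I.24)."""
    if a == (0, 0):
        return math.sqrt(d2r2(a))
    r, s = dr((0, 0)), 0.0
    for b1 in range(a[0] + 1):
        for b2 in range(a[1] + 1):
            if (b1, b2) in ((0, 0), a):   # sum runs over 0 < beta < alpha only
                continue
            s += (math.comb(a[0], b1) * math.comb(a[1], b2)
                  * dr((b1, b2)) * dr((a[0] - b1, a[1] - b2)))
    return (d2r2(a) - s) / (2.0 * r)

print(dr((1, 0)))   # equals -(c1-x1)/r analytically
print(dr((2, 0)))   # equals (c2-x2)^2 / r^3 analytically
```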
In the case that ∂Ω is a polygon, we can differentiate the explicit formula of g in
(I.7), which boils down to differentiating wi in (I.8). Similar to (I.21) we have
\[ D^\alpha \frac{1}{\rho_i} = -\frac{1}{\rho_i} \sum_{0<\beta\le\alpha} \binom{\alpha}{\beta}\, D^\beta \rho_i\, D^{\alpha-\beta} \frac{1}{\rho_i}, \]
and the formula for Dα ρi is given by (I.24) with r replaced by ρi . Derivatives of
tan(αi /2) can be found by rewriting it in terms of scalar and cross products of di (x) =
pi − x,
\[ \tan\frac{\alpha_i}{2} = \frac{\rho_i \rho_{i+1} - d_i \cdot d_{i+1}}{d_i \times d_{i+1}}. \]
Then, by viewing this as a quotient, we have
\[ D^\alpha \tan\frac{\alpha_i}{2} = \frac{1}{d_i \times d_{i+1}} \Bigl( D^\alpha (\rho_i \rho_{i+1} - d_i \cdot d_{i+1}) - \sum_{0<\beta\le\alpha} \binom{\alpha}{\beta}\, D^\beta (d_i \times d_{i+1})\, D^{\alpha-\beta} \tan\frac{\alpha_i}{2} \Bigr). \]
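As a quick numerical check of the half-angle identity above (a sketch of ours, not code from the paper), the tangent computed from scalar and cross products matches the tangent of half the actual angle between the two vectors:

```python
import math

def tan_half(d1, d2):
    # tan(angle/2) = (|d1||d2| - d1.d2) / (d1 x d2), as in the identity above
    dot = d1[0] * d2[0] + d1[1] * d2[1]
    crs = d1[0] * d2[1] - d1[1] * d2[0]
    return (math.hypot(*d1) * math.hypot(*d2) - dot) / crs

print(tan_half((1.0, 0.0), (1.0, 1.0)))   # angle pi/4: prints tan(pi/8) ~ 0.4142
```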
4 HERMITE INTERPOLATION
We now construct a Hermite interpolant based on mean value interpolation. As we will
see, the interpolant is a natural generalization of cubic Hermite interpolation in one
variable, and it helps to recall the latter. Given the values and first derivatives of some
function f : R → R at the points x0 < x1 , cubic Hermite interpolation consists of
finding the unique cubic polynomial p such that
\[ p(x_i) = f(x_i) \quad\text{and}\quad p'(x_i) = f'(x_i), \qquad i = 0, 1. \tag{I.25} \]
One way of expressing p is in the form
\[ p(x) = g_0(x) + \psi(x)\, g_1(x), \tag{I.26} \]
where g0 is the linear Lagrange interpolant
\[ g_0(x) = \frac{x_1 - x}{x_1 - x_0}\, f(x_0) + \frac{x - x_0}{x_1 - x_0}\, f(x_1), \]
ψ is the quadratic weight function
\[ \psi(x) = \frac{(x - x_0)(x_1 - x)}{x_1 - x_0}, \]
and g1 is another linear Lagrange interpolant,
\[ g_1(x) = \frac{x_1 - x}{x_1 - x_0}\, m_0 + \frac{x - x_0}{x_1 - x_0}\, m_1, \]
whose data m0 and m1 are yet to be determined. To see this, observe that since ψ(xi) = 0, i = 0, 1, p in (I.26) already meets the Lagrange conditions in (I.25), and since ψ′(xi) ≠ 0 for i = 0, 1, the derivative conditions in (I.25) are met by setting
\[ m_i = \frac{f'(x_i) - g_0'(x_i)}{\psi'(x_i)}, \qquad i = 0, 1. \]
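The construction can be checked numerically; the sketch below (ours, with hypothetical function names) assembles p = g0 + ψg1 from the pieces above and reproduces a cubic exactly:

```python
def hermite(x, x0, x1, f0, f1, df0, df1):
    """Evaluate the cubic Hermite interpolant in the form p = g0 + psi*g1 of (I.26)."""
    g0   = lambda t: ((x1 - t) * f0 + (t - x0) * f1) / (x1 - x0)
    dg0  = (f1 - f0) / (x1 - x0)                  # g0 is linear
    psi  = lambda t: (t - x0) * (x1 - t) / (x1 - x0)
    dpsi = lambda t: (x0 + x1 - 2.0 * t) / (x1 - x0)
    m0 = (df0 - dg0) / dpsi(x0)                   # the formula for m_i above
    m1 = (df1 - dg0) / dpsi(x1)
    g1 = lambda t: ((x1 - t) * m0 + (t - x0) * m1) / (x1 - x0)
    return g0(x) + psi(x) * g1(x)

# Reproduces f(t) = t^3 on [0, 1] (values 0, 1; derivatives 0, 3) exactly:
print(hermite(0.3, 0.0, 1.0, 0.0, 1.0, 0.0, 3.0))   # ~0.027, i.e. 0.3**3
```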
Now observe that for x ∈ (x0, x1) we can express g0 and ψ as
\[ g_0(x) = \sum_{i=0}^{1} \frac{f(x_i)}{|x_i - x|} \Bigm/ \sum_{i=0}^{1} \frac{1}{|x_i - x|} \quad\text{and}\quad \psi(x) = 1 \Bigm/ \sum_{i=0}^{1} \frac{1}{|x_i - x|}, \tag{I.27} \]
and similarly for g1 . Therefore, by viewing |xi −x| as the distance from x to the boundary point xi of the domain (x0 , x1 ) we see that the mean value interpolant g in (I.4)
is a generalization of the linear univariate interpolant g0 to two variables. Similarly, φ
in (I.4) generalizes the denominator of ψ above. This suggests a Hermite approach for
the curve case. Given the values and inward normal derivative of a function f defined
on ∂Ω, we seek a function p : Ω → R satisfying
\[ p(y) = f(y) \quad\text{and}\quad \frac{\partial p}{\partial n}(y) = \frac{\partial f}{\partial n}(y), \qquad y \in \partial\Omega, \tag{I.28} \]
in the form
\[ p(x) = g_0(x) + \psi(x)\, g_1(x), \tag{I.29} \]
Figure I.5: Upper and lower bounds for the unit disk.
where g0 is the Lagrange mean value interpolant to f in (I.11), ψ is the weight function
\[ \psi(x) = \frac{1}{\phi(x)}, \tag{I.30} \]
with φ from (I.11), and g1 is a second Lagrange mean value interpolant whose data is
yet to be decided. Similar to the univariate case, we need to show that ψ(y) = 0 and ∂ψ/∂n(y) ≠ 0 for y ∈ ∂Ω. Then we obtain (I.28) by setting
\[ g_1(y) = \left( \frac{\partial f}{\partial n}(y) - \frac{\partial g_0}{\partial n}(y) \right) \Bigm/ \frac{\partial\psi}{\partial n}(y), \qquad y \in \partial\Omega. \tag{I.31} \]
Thus we also need to determine ∂ψ/∂n(y) and ∂g0/∂n(y). We treat each of these requirements in turn.
First, observe that Theorems I.2 and I.3 give the upper and lower bounds
\[ \frac{1}{2\pi}\, d(x,\partial\Omega) \le \psi(x) \le \frac{1}{C}\, d(x,\partial\Omega), \qquad x \in \Omega, \tag{I.32} \]
so ψ(x) → 0 as x → ∂Ω, and ψ extends continuously to ∂Ω with value zero
there. Figure I.5 shows the upper and lower bounds on ψ with C = 2 in the case that
Ω is the unit disk. The figure shows a plot of ψ and the two bounds along the x-axis.
Next we show that the normal derivative of ψ is non-zero.
Theorem I.5. If d(ME , ∂Ω) > 0 and d(MI , ∂Ω) > 0 and y ∈ ∂Ω, then
\[ \frac{\partial\psi}{\partial n}(y) = \frac{1}{2}. \]
Figure I.6: Lines in proof of Theorem I.5.
Proof. Let R = d(MI, ∂Ω). Then the open disc B of radius R that is tangential to ∂Ω at y on the inside of ∂Ω is contained in Ω. For small enough δ > 0, the point x = y + δn is in
B. Let a1 , a2 , a3 be the three points on ∂B such that a2 6= y lies on the line through
x and y, and a1 and a3 lie on the line perpendicular to it, see Figure I.6. Let q be the
four-sided polygon passing through y, a1 , a2 , a3 . Then
Then
\[ \phi(x) \le \int_0^{2\pi} \frac{1}{\rho_1(x,\theta)}\, d\theta \le \int_0^{2\pi} \frac{1}{\rho(x,\theta;q)}\, d\theta. \]
Then, by Lemma I.1 applied to each edge of q, and since tan(π/4) = 1, we have
\[ \phi(x) \le 2\left( \frac{1}{\|y-x\|} + \frac{1}{\|a_1-x\|} \right) + 2\left( \frac{1}{\|a_1-x\|} + \frac{1}{\|a_2-x\|} \right). \]
So, since ky − xk = δ and ka2 − xk = 2R − δ, and letting h = ka1 − xk = ka3 − xk,
we find
\[ \delta\,\phi(x) \le 2\left( 1 + \frac{2\delta}{h} + \frac{\delta}{2R-\delta} \right). \]
Moreover, since h² = R² − (R − δ)², we have h = √((2R − δ)δ) ≈ √(2Rδ) for small δ, and therefore
\[ \limsup_{\delta\to 0}\, \delta\,\phi(x) \le 2. \tag{I.33} \]
On the other hand, for small δ, y is the closest point to x in ∂Ω, and then (I.17) gives
\[ \delta\,\phi(x) \ge \frac{2d}{\delta+d}\,\frac{d}{\delta+d+\sqrt{\delta^2+2\delta d}}, \]
where d = d(ME , ∂Ω), and thus
\[ \liminf_{\delta\to 0}\, \delta\,\phi(x) \ge 2. \tag{I.34} \]
The inequalities (I.33) and (I.34) show that δφ(x) → 2 as δ → 0, and thus
\[ \frac{\partial\psi}{\partial n}(y) = \lim_{\delta\to 0} \frac{\psi(x) - \psi(y)}{\delta} = \lim_{\delta\to 0} \frac{1}{\delta\,\phi(x)} = \frac{1}{2}. \]
We have now shown that the Hermite construction (I.29) works, and that the normal
derivative of ψ is 1/2. To apply (I.31) we still have to compute the normal derivative
of g0 .
Theorem I.6. Let g be as in (I.11). If d(ME , ∂Ω) > 0 and d(MI , ∂Ω) > 0, and
y ∈ ∂Ω then
\[ \frac{\partial g}{\partial n}(y) = \frac{1}{2} \int_a^b w(y,t)\,\bigl(f(c(t)) - f(y)\bigr)\, dt. \]
Proof. For small δ > 0, let x = y + δn. Then dividing both sides of equation (I.19)
by δ, and letting δ → 0, gives the result, using Theorem I.5.
We plotted the weight function ψ on four different domains, shown in Figure I.7. In
the first three, we used numerical quadrature on the integral formula for φ in (I.11). We
use an adaptive approach, where for each x, we split the integral into a fixed number of
pieces, and apply Romberg integration to each piece, i.e., the extrapolated trapezoidal
rule. If at some stage we detect that x is on the boundary, within a given numerical
tolerance, we terminate the integration and return 0 for the value of ψ. For the fourth
domain, which is a regular pentagon, we simply use the exact polygonal formula in
(I.7). We apply similar approaches to evaluate the interpolant g in (I.11).
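For the unit circle, where c(t) = (cos t, sin t) and the integrand of (I.11) is available in closed form, the quadrature idea can be sketched as follows (a minimal fixed-step midpoint rule of ours, not the adaptive Romberg scheme described above); at the centre φ = 2π, and ψ = 1/φ obeys the bounds (I.32) with C = 2:

```python
import math

def phi_circle(x, n=2000):
    """Midpoint-rule approximation of phi in (I.11) for the unit circle
    c(t) = (cos t, sin t), with w(x,t) = (c(t)-x) x c'(t) / ||c(t)-x||^3."""
    h = 2.0 * math.pi / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        cx, cy = math.cos(t), math.sin(t)
        dx, dy = cx - x[0], cy - x[1]
        cross = dx * cx + dy * cy       # (c-x) x (-sin t, cos t) simplifies to this
        total += cross / math.hypot(dx, dy) ** 3 * h
    return total

print(phi_circle((0.0, 0.0)))           # 2*pi at the centre
psi = 1.0 / phi_circle((0.5, 0.0))      # here d(x, boundary) = 1/2
print(0.5 / (2 * math.pi) <= psi <= 0.5 / 2.0)   # checks the bounds (I.32), C = 2
```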
The weight function ψ is itself a Hermite interpolant with value 0 and normal
derivative 1/2 on the boundary. Figure I.8 shows other Hermite interpolants.
5 A MINIMUM PRINCIPLE
A useful property of harmonic functions is that they have no local maxima or minima
on arbitrary domains. Lagrange mean value interpolants, however, do not share this
property on arbitrary domains, but we conjecture that they do on convex domains. We
are not able to show this, but we can establish a ‘minimum principle’ for the weight
function ψ on arbitrary domains. Since ψ is positive in Ω and zero on ∂Ω, it must have
at least one maximum in Ω, and the S example in Figure I.7 illustrates that it can have
saddle points. But we show that it never has local minima.
Figure I.7: The weight function ψ on various domains.
Figure I.8: Hermite mean value interpolants.
Lemma I.2. For arbitrary Ω, with φ given by (I.14),
\[ \Delta\phi(x) = 3 \int_0^{2\pi} \sum_{j=1}^{n(x,\theta)} \frac{(-1)^{j-1}}{\rho_j^3(x,\theta)}\, d\theta. \]
Proof. With the notation of (I.22) we have w = (d × c′)/r³ in (I.11) and differentiation gives
\[ \nabla w = \frac{(-c_2', c_1')}{r^3} + \frac{3(d\times c')\,d}{r^5} \quad\text{and}\quad \Delta w = 3\,\frac{d\times c'}{r^5}. \]
Figure I.9: Multiply connected domain.
Integrating the latter expression with respect to t and using (I.10) and the notation of
(I.14), gives the claimed formula.
Lemma I.2 shows that ∆φ > 0 in Ω due to (I.13). From this we deduce
Theorem I.7. In an arbitrary domain Ω, the weight function ψ has no local minima.
Proof. Suppose x∗ ∈ Ω is a local minimum of ψ. Then ∇ψ(x∗) = 0 and ∆ψ(x∗) ≥ 0. But since ψ = 1/φ, we have
\[ \nabla\psi = -\frac{\nabla\phi}{\phi^2} \quad\text{and}\quad \Delta\psi = -\frac{\Delta\phi}{\phi^2} + 2\,\frac{|\nabla\phi|^2}{\phi^3}. \]
Therefore, ∇φ(x∗) = 0 and ∆ψ(x∗) = −∆φ(x∗)/φ²(x∗) < 0, which is a contradiction.
6 DOMAINS WITH HOLES
So far in the paper, we have assumed that Ω is simply connected. In the case that Ω
is multiply connected, all the previously derived properties and formulas continue to
hold with only minor changes. In fact, the angle formula for g in (I.14) is unchanged
in the presence of holes as long as the points pj (x, θ) are understood to be the ordered
intersections of L(x, θ) with all components of ∂Ω. Thus, all angle formulas and
associated properties are also valid for multiply connected domains. However, the
boundary integral formula (I.11) needs to be extended as follows. Suppose that Ω has
m holes, m ≥ 0, so that ∂Ω has m+1 components: the outer one and the m inner ones.
We represent all these pieces parametrically as ck : [ak , bk ] → R2 , k = 0, 1, . . . , m,
with ck (ak ) = ck (bk ). The outer curve c0 of ∂Ω is oriented anti-clockwise and the
inner pieces c1 , . . . , cm are oriented clockwise, see Figure I.9. Then (I.11) should be
replaced by
\[ g(x) = \frac{1}{\phi(x)} \sum_{k=0}^{m} \int_{a_k}^{b_k} w_k(x,t)\, f(c_k(t))\, dt, \qquad \phi(x) = \sum_{k=0}^{m} \int_{a_k}^{b_k} w_k(x,t)\, dt. \tag{I.35} \]
Previous formulas involving the single parametric curve c need to be extended accordingly, but this is straightforward and left to the reader.
7 APPLICATIONS
We discuss two applications of mean value Hermite (and Lagrange) interpolation.
7.1 SMOOTH MAPPINGS
Smooth mappings from one planar region to another are required in reduced basis element methods for PDE’s that model complex fluid flow systems [65]. The reduced
basis element method is a domain decomposition method where the idea is to decompose the computational domain into smaller blocks that are topologically similar to a
few reference shapes. We propose using mean value interpolation as an efficient way
of generating suitable smooth mappings. Figure I.10 shows on the top left a reference
shape for a bifurcation point in a flow system studied in [65] that could model for
example blood flow in human veins. Top right shows the reference shape mapped to
the computational domain, using the method of [65]. The mapping is continuous but
not C 1 along certain lines in the interior of the domain. However, the result of using
Lagrange mean value interpolation is a C ∞ mapping, bottom left. Finally, it may be
desirable to control the normal derivative of the mapping along the boundary. This can
be achieved using Hermite mean value interpolation. Bottom right shows the Hermite
mean value mapping where the normal derivative of the mapping at each boundary
point equals the unit normal vector at the corresponding point of the computational
domain boundary.
There appears to be no guarantee that these mappings will in general be one-to-one. However, we have tested Lagrange mean value mappings from convex domains
to convex domains and have always found them to be injective. We conjecture that this
holds for all convex domains.
7.2 A WEIGHT FUNCTION FOR WEB-SPLINES
Recently, Höllig, Reif, and Wipper [41, 40] proposed a method for solving elliptic
PDE’s over arbitrarily shaped domains based on tensor-product B-splines defined over
a square grid. In order to obtain numerical stability, the B-splines are ‘extended’, and
in order to match the zero boundary condition, they are multiplied by a common weight
Figure I.10: A bifurcation prototype is mapped to a deformed bifurcation using different transfinite interpolants.
function: a function that is positive in Ω and zero on ∂Ω. Various approaches to choosing a weight function for this kind of finite element method have been discussed in
[51, 76, 41, 83]. The weight function ψ we used in Hermite interpolation satisfies
these basic properties, and in view of the upper and lower bounds (I.32) and the constant normal derivative in Theorem I.5, it behaves like half the signed distance function
near the boundary. So ψ is a good candidate for the weight function in the web-spline
method.
We used bicubic web-splines to solve Poisson’s equation ∆u = f on various domains Ω with zero Dirichlet boundary condition and various right-hand sides f . The
top two plots of Figure I.11 show approximate solutions u over an elliptic domain with
a circular hole, defined by the zeros of r1 and r2 where
\[ r_1(x_1, x_2) = 1 - x_1^2/16 - x_2^2/9, \qquad r_2(x_1, x_2) = (x_1 + 3/4)^2 + (x_2 - 1/2)^2 - 1, \]
and with right hand side f = sin(r1 r2 /2), a test case used in [41]. The top left plot
shows the result of using the weight function ψ = r1 r2 , while the top right plot shows
the result of using the mean value weight function ψ. The error for the two methods
is similar, with both having a numerical L2 -error of the order O(h4 ) with h the grid
size, as predicted by the analysis of [41]. At the bottom of Figure I.11 are plots of the
approximate numerical solution to ∆u = −1 on other domains using the mean value
weight function. On the left is the solution over a regular pentagon, and on the right is
Figure I.11: Numerical solution using bicubic web-splines.
the solution over the domain defined by the ‘S’ in the Times font, with piecewise-cubic
boundary. The numerical L2 error in these two cases was O(h2 ), which is expected
when the domain boundary has corners.
One can extend the web-spline method to deal with inhomogeneous problems using
Lagrange mean value interpolation. If we want to solve ∆u = f in Ω with u = u0 on
∂Ω, we can let g be the mean value interpolant (I.11) to u0 , and express the solution
as u = g + v where v solves the homogeneous problem ∆v = f̂ in Ω with v = 0 on ∂Ω, and f̂ = f − ∆g. This approach requires computing the Laplacian of the
mean value interpolant g in (I.4) and this can be done using the formulas of Section 3.
We used bicubic web-splines to solve the inhomogeneous problem with f = −1/2
and u0 (y) = 1 − (y12 + y22 )/8. In Figure I.12, the left plot shows the true solution
u(x) = 1 − (x21 + x22 )/8 and the right plot shows the numerical solution.
Acknowledgement. We thank Ulrich Reif, Kai Hormann, and Solveig Bruvoll for
helpful ideas and comments in this work.
Figure I.12: Solving inhomogeneous problems.
PAPER II: PREFERRED DIRECTIONS FOR RESOLVING THE NON-UNIQUENESS OF DELAUNAY TRIANGULATIONS
Christopher Dyken and Michael S. Floater
Computational Geometry: Theory and Applications 43 (2006).
Abstract: This note proposes a simple rule to determine a unique triangulation
among all Delaunay triangulations of a planar point set, based on two preferred
directions. We show that the triangulation can be generated by extending Lawson’s
edge-swapping algorithm and that point deletion is a local procedure. The rule can
be implemented exactly when the points have integer coordinates and can be used
to improve image compression methods.
1 INTRODUCTION
Delaunay triangulations [16] play an important role in computational geometry [70].
A recent application that has emerged is their use in compressing digital images [17].
Such images are represented by rectangular arrays of grey scale values or colour values
and one approach to compression is to start by representing them as piecewise linear
functions over regular triangulations and then to approximate these functions by piecewise linear functions over triangulations of subsets of the points. If one could agree
on a unique method of triangulating the points, one would obtain higher compression
rates because the sender would only need to encode the points and the height values,
not the connectivity of the triangulation: the receiver would be able to reproduce the
triangulation exactly.
One advantage of Delaunay triangulations is that they are almost unique. In fact
they are unique for point sets containing no sets of four co-circular points. However,
in the case that a set of planar points is a subset of a rectangular array of points, there
will typically be many co-circular points and therefore a large number of Delaunay
triangulations.
An obvious and simple approach to the non-uniqueness problem is to perturb the
points randomly before triangulating but there is no guarantee that the perturbed points
will have a unique Delaunay triangulation. An alternative approach could be a symbolic
PAPER II: PREFERRED DIRECTIONS FOR DELAUNAY TRIANGULATIONS
perturbation method as discussed in [1, 18, 24, 68, 81]. Such a method should lead
to a unique Delaunay triangulation of the ‘perturbed’ points, but may not be a valid
triangulation of the original ones.
The purpose of this note is to point out that non-uniqueness can be resolved without perturbing the points. We show that a simple rule based on two preferred directions
can be used to determine a unique member of all the Delaunay triangulations of a set
of points in the plane. We show that the rule can simply be incorporated into Lawson’s
swapping algorithm and that point deletion is a local procedure. We further show that,
importantly, the rule can be computed exactly in integer arithmetic. The preferred direction method could immediately be applied to the compression algorithms described
above.
2 TRIANGULATING QUADRILATERALS
Let Q be a quadrilateral in the plane, with ordered vertices v1 , w1 , v2 , w2 as in Figure II.1. If Q is strictly convex, i.e., convex and such that no three vertices are collinear,
then there are two ways to triangulate it, either by placing a diagonal edge between
v1 and v2 , as in Figure II.1, or between w1 and w2 . We want to propose a simple
rule which determines the diagonal of any such quadrilateral uniquely. A natural rule
seems to be to use a preferred direction: we choose that diagonal which makes the
smallest angle with some arbitrary, fixed straight line. However, the two angles could
be equal, and in order to distinguish the two diagonals in this case, we will use a second preferred direction. Thus we choose any two non-zero vectors d1 and d2 which are
neither parallel nor orthogonal to each other. One such choice would be d1 = (1, 0)
and d2 = (1, 1). For any line (or line segment) `, let αi (`), where 0 ≤ αi (`) ≤ π/2, be
the angle between ` and the (undirected) vector di , i = 1, 2. Then we define the score
of ` as the ordered pair of angles
score(`) := (α1 (`), α2 (`)).
We compare scores lexicographically. Thus for two arbitrary lines ` and m, we say that
score(`) < score(m) if either α1 (`) < α1 (m) or α1 (`) = α1 (m) and α2 (`) < α2 (m).
Lemma II.1. Two lines ℓ and m have the same score if and only if they are parallel.
Proof. If ℓ and m are parallel, then clearly αi(ℓ) = αi(m) for both i = 1, 2 and so ℓ and m have the same score. Conversely, suppose ℓ and m are not parallel but that they have the same score. Then at the point p of intersection between ℓ and m, the two lines through p in the directions d1 and d2 must bisect the lines ℓ and m. But this can only occur if d1 is either parallel or orthogonal to d2, which is a contradiction.
Figure II.1: A strictly convex quadrilateral
Our preferred direction rule for the quadrilateral Q in Figure II.1 simply chooses
that diagonal, [v1 , v2 ] or [w1 , w2 ], with the lowest score. Since the two diagonals are
never parallel, they always have distinct scores by Lemma II.1.
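The rule is straightforward to implement; below is a sketch of ours (illustrative code, not from the paper), using the suggested choice d1 = (1, 0) and d2 = (1, 1), that scores a segment and picks the diagonal of a strictly convex quadrilateral:

```python
import math

D1, D2 = (1.0, 0.0), (1.0, 1.0)         # the two preferred directions

def angle_to(d, p, q):
    """Angle in [0, pi/2] between segment [p, q] and the undirected vector d."""
    ex, ey = q[0] - p[0], q[1] - p[1]
    c = abs(ex * d[0] + ey * d[1]) / (math.hypot(ex, ey) * math.hypot(*d))
    return math.acos(min(1.0, c))

def score(p, q):
    return (angle_to(D1, p, q), angle_to(D2, p, q))   # compared lexicographically

def choose_diagonal(v1, w1, v2, w2):
    """Diagonal of the strictly convex quadrilateral v1, w1, v2, w2 with lowest score."""
    return (v1, v2) if score(v1, v2) < score(w1, w2) else (w1, w2)

# Unit square: both diagonals make 45 degrees with d1, so d2 breaks the tie.
print(choose_diagonal((0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)))
```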
3 TRIANGULATING CONVEX POLYGONS
We next use the preferred direction rule to triangulate uniquely any strictly convex
polygon P , illustrated in Figure II.2. Consider EI (P ), the set of all interior edges of
P , i.e., all line segments [v1 , v2 ] connecting non-neighbouring pairs of vertices v1 and
v2 of P . We start by ranking all the edges of EI (P ) according to their score. We
then employ an insertion algorithm, inserting edges of EI (P ) into P in order of their
ranking. In the first step, we insert all edges of EI (P ) (one or more) which share the
lowest score. Note that if there are more than one of these, they must all be parallel by
Lemma II.1, and so they do not cross each other. In the general step, we insert all edges
of EI (P ) with the current lowest score which do not cross edges previously inserted.
We continue until we have triangulated P , denoting the triangulation by Td1 ,d2 (P ).
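The insertion algorithm can be sketched directly (our illustrative code, not from the paper; it re-implements the score rule with d1 = (1, 0) and d2 = (1, 1) so as to be self-contained):

```python
import math

D1, D2 = (1.0, 0.0), (1.0, 1.0)

def _angle(d, p, q):
    ex, ey = q[0] - p[0], q[1] - p[1]
    c = abs(ex * d[0] + ey * d[1]) / (math.hypot(ex, ey) * math.hypot(*d))
    return math.acos(min(1.0, c))

def score(p, q):                      # the preferred-direction score of [p, q]
    return (_angle(D1, p, q), _angle(D2, p, q))

def crosses(p1, p2, q1, q2):
    """Proper crossing test: interiors intersect; shared endpoints do not count."""
    area = lambda o, a, b: (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])
    return (area(q1, q2, p1) * area(q1, q2, p2) < 0 and
            area(p1, p2, q1) * area(p1, p2, q2) < 0)

def triangulate(poly):
    """Insert interior edges of a strictly convex polygon in order of score."""
    n = len(poly)
    interior = [(i, j) for i in range(n) for j in range(i + 1, n)
                if j - i != 1 and (i, j) != (0, n - 1)]   # skip the polygon sides
    interior.sort(key=lambda e: score(poly[e[0]], poly[e[1]]))
    chosen = []
    for i, j in interior:
        if not any(crosses(poly[i], poly[j], poly[a], poly[b]) for a, b in chosen):
            chosen.append((i, j))
    return chosen

pentagon = [(math.cos(2*math.pi*k/5 + 0.2), math.sin(2*math.pi*k/5 + 0.2))
            for k in range(5)]
print(triangulate(pentagon))          # a convex n-gon receives n - 3 = 2 diagonals
```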
We will show that Td1 ,d2 (P ) has the useful property that it can be generated by
a local optimisation procedure based on edge swapping. Suppose T (P ) is any triangulation of P . Every interior edge e of T (P ) is the diagonal of a strictly convex
quadrilateral Q, and can therefore be swapped by the other diagonal e0 of Q to form
a new triangulation. We say that e is locally optimal if its score is lower than that of
e0 . Otherwise, we optimize Q by swapping e with e0 . We keep applying this local
optimisation procedure until no more swaps can be performed, in which case we say
that the triangulation is locally optimal.
Figure II.2: A strictly convex polygon and triangulation

Figure II.3: The edges e and e′

Lemma II.2. Let T(P) be any triangulation of P. If there is an edge e ∈ EI(P) which crosses an edge of T(P) with a higher score, then T(P) is not locally optimal.
Proof. Let e′ = [v1, v2] be an edge of T(P) with the highest score among all edges of T(P) which cross e. We will show that e′ is not locally optimal. Let p be the intersection of e and e′. Without loss of generality, and by translating and rotating P and d1 and d2 about p, we may assume that p = (0, 0) and d1 = (1, 0). Then since e′ has a higher score than e, e′ cannot lie along the x-axis, and so we can assume that y1 < 0 < y2 (where vi = (xi, yi)). Further, by reflecting all points, edges, and d1 and d2 about the y-axis if necessary, we may assume that x1 ≤ 0 ≤ x2; see Figure II.3.
Next let ℓ′ be the infinite straight line passing through e′ and let ℓ′′ be its reflection about the y-axis, and let A1 and A2 be the two open semi-infinite cones bounded by ℓ′ and ℓ′′, with A1 in the positive y half-plane and A2 in the negative y half-plane; see Figure II.4. Clearly because α1(e) ≤ α1(ℓ′) = α1(ℓ′′), e does not intersect A1 ∪ A2. Thus e is contained in the union of the two closed regions B1 and B2 shown in Figure II.4. Note, moreover, that e is not contained in the line ℓ′. It may however be contained in the line ℓ′′.

Figure II.4: Regions of vertices
Let Q be the convex quadrilateral of T(P) with e′ as its diagonal. Let the two vertices in Q other than v1 and v2 be w1 and w2, with w1 lying on the side of ℓ′ containing the positive x-axis and w2 lying on the side containing the negative x-axis; see Figure II.5 for an example. Next we show that w2 ∈ B1. Indeed if w2 ∈ A1 then by the convexity of P, the edge [v1, w2] of T(P) intersects e and [v1, w2] has a higher score than e′, which contradicts the definition of e′. Furthermore, if w2 ∈ ℓ′′ and e ⊄ ℓ′′, then again the edge [v1, w2] would intersect e and have a higher score than e′. Therefore, if w2 ∈ ℓ′′ then also e ⊂ ℓ′′. In this case, since no three vertices of P are collinear, we conclude that w2 is an end point of e.
Similarly, w1 ∈ B2, because if w1 ∈ A2 then the edge [v2, w1] would intersect e and have a higher score than e′, which contradicts the definition of e′. A similar argument shows that if w1 ∈ ℓ′′ then e ⊂ ℓ′′ and w1 must be an end point of e.
To complete the proof, we will show that e′ is not locally optimal by showing that [w1, w2] has a lower score than e′. First observe from Figure II.5 that α1([w1, w2]) ≤ α1(e′). Moreover, if either w2 is in the interior of B1 or w1 is in the interior of B2, or both, then the inequality is strict and so score([w1, w2]) < score(e′). The only remaining possibility is that both w1 and w2 lie on the line ℓ′′, in which case α1([w1, w2]) = α1(e′). We have shown, however, that in this case both w1 and w2 are the end points of the edge e, so that e = [w1, w2]. Since e has a lower score than e′ by assumption, e′ is therefore not locally optimal.
An immediate consequence of Lemma II.2 is
Theorem II.1. If T (P ) is locally optimal then T (P ) = Td1 ,d2 (P ).
Figure II.5: A possible quadrilateral
Proof. Lemma II.2 implies that if e is an interior edge of T (P ) then the only edges of
EI (P ) which cross it have a higher score than e. This means that when the score of e is
reached in the insertion algorithm, e will be inserted into P . Thus e is in Td1 ,d2 (P ).
Lemma II.2 and Theorem II.1 together clearly imply
Theorem II.2. A line segment e ∈ EI (P ) is an edge of Td1 ,d2 (P ) if and only if it is
not crossed by an edge in EI (P ) with a lower score.
4 DELAUNAY TRIANGULATIONS
Finally we use the preferred direction rule to determine a unique triangulation among
all possible Delaunay triangulations of a given set of points in the plane.
Any set of planar points V which are not all collinear admits a unique Delaunay
pretriangulation, which is a tiling of the points, whose boundary is the convex hull of
V [86]. Two points form an edge in the tiling if and only if they are strong neighbours
in the Voronoi diagram of V [86, 70]. Two points are Voronoi neighbours if their
Voronoi tiles intersect. Two such tiles intersect either in a line segment or a point. If
the intersection is a line segment the two points are strong neighbours, and they are
weak neighbours otherwise. The vertices of each tile in the Delaunay pretriangulation
lie on a circle. Tiles with three vertices are triangles. By triangulating any tile with four
or more vertices arbitrarily, we convert the Delaunay pretriangulation into a Delaunay
triangulation.
Since each tile P of the Delaunay pretriangulation is strictly convex, we can simply triangulate it using Td1 ,d2 (P ) and in this way we determine a unique Delaunay
triangulation of V , which we will denote by Td1 ,d2 (V ).
Similarly to the case of convex polygons, we next show that given an arbitrary
triangulation T (V ), we always reach Td1 ,d2 (V ) by edge swapping. We simply augment Lawson’s local optimisation procedure for Delaunay triangulations [58] with the
preferred direction rule as follows. Suppose e = [v1, v2] is an interior edge of some
triangulation T(V). If Q, the quadrilateral having e as its diagonal, is not strictly
convex we say that e is locally optimal. Otherwise, if e′ = [w1, w2] denotes the opposite
diagonal of Q, we say that e is locally optimal if w2 lies strictly outside the circumcircle
C through v1, v2, w1. If w2 lies strictly inside C, we swap e with e′. Otherwise
v1, v2, w1, w2 are co-circular and we use the preferred direction rule as a tie-breaker: e
is locally optimal if score(e) < score(e′); otherwise we swap e with e′. We say that
T(V) is locally optimal if all its interior edges are locally optimal.
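In outline, the augmented local optimality test for an interior edge of a strictly convex quadrilateral can be sketched as follows. This is a minimal illustration, not code from the thesis: the function names, the tuple-based point representation, and the counter-clockwise orientation convention are ours, and `score` stands in for the preferred-direction scoring of an edge.

```python
def incircle(a, b, c, d):
    """> 0 iff d lies strictly inside the circumcircle of the
    counter-clockwise triangle (a, b, c); exact for integer input."""
    rows = []
    for p in (a, b, c):
        dx, dy = p[0] - d[0], p[1] - d[1]
        rows.append((dx, dy, dx * dx + dy * dy))
    (a0, a1, a2), (b0, b1, b2), (c0, c1, c2) = rows
    return (a0 * (b1 * c2 - b2 * c1)
            - a1 * (b0 * c2 - b2 * c0)
            + a2 * (b0 * c1 - b1 * c0))

def locally_optimal(v1, v2, w1, w2, score):
    """Is e = [v1, v2] locally optimal against the opposite diagonal
    e' = [w1, w2] of its strictly convex quadrilateral?
    (v1, w1, v2) is assumed counter-clockwise; `score` is a
    stand-in for the preferred-direction scoring of an edge."""
    s = incircle(v1, w1, v2, w2)
    if s < 0:    # w2 strictly outside the circumcircle: keep e
        return True
    if s > 0:    # w2 strictly inside: e should be swapped
        return False
    # co-circular case: preferred direction rule as tie-breaker
    return score(v1, v2) < score(w1, w2)
```

Since the incircle determinant is exact for integer input, the tie-breaker is consulted only in the genuinely co-circular case.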
Theorem II.3. If T (V ) is locally optimal then T (V ) = Td1 ,d2 (V ).
Proof. If T (V ) is locally optimal then it is a Delaunay triangulation [58], [86]. Thus
every edge of T (V ) which is in the Delaunay pretriangulation of V is also in Td1 ,d2 (V ).
Every remaining edge e of T (V ) is an interior edge of some tile P of the pretriangulation. But since e is then in Td1 ,d2 (P ) it must also be in Td1 ,d2 (V ).
Note further that an important operation on Delaunay triangulations is point deletion.
It is well known that when an interior vertex v of a Delaunay triangulation is
deleted, a Delaunay triangulation of the remaining points can be constructed by simply
retriangulating the hole created by the removal of v; thus updating the triangulation is
a local operation, and can be implemented efficiently. We now show that the
triangulation Td1,d2(V) has an analogous property.
Theorem II.4. Let v ∈ V be an interior vertex of the triangulation Td1 ,d2 (V ). Then
every edge e of Td1 ,d2 (V ) which is not incident on v also belongs to the triangulation
Td1 ,d2 (V \ v).
Proof. Suppose first that e is an edge of the Delaunay pretriangulation of V . Then its
end points are strong neighbours in the Voronoi diagram of V , and are therefore also
strong neighbours of the Voronoi diagram of V \ v, and so e is also an edge of the
Delaunay pretriangulation of V \ v, and therefore an edge of Td1 ,d2 (V \ v).
The remaining possibility is that e is an interior edge of some strictly convex tile P
of the Delaunay pretriangulation of V , and all vertices of P lie on a circle. If v is not a
vertex of P , then P will also be a tile in the Delaunay pretriangulation of V \ v, and so
e, being an interior edge of Td1,d2(P), will be contained in Td1,d2(V \ v). Otherwise
v is a vertex of P. Then the polygon P′, formed by removing v from P, is a tile
in the Delaunay pretriangulation of V \ v. Now if v and the two end points of e are
consecutive vertices of P, then e will be an edge of P′, and therefore of Td1,d2(V \ v).
Otherwise, since EI(P′) ⊂ EI(P), we see by Theorem II.2 that since e is not crossed
by any edge in EI(P) with a lower score, it is not crossed by any edge of EI(P′) with
a lower score and is therefore an interior edge of Td1,d2(P′), and hence also an edge of
Td1,d2(V \ v).
5 NUMERICAL IMPLEMENTATION
Using the implementation of the circumcircle test proposed by Cline and Renka [14],
one can construct a Delaunay triangulation of a set V of points with integer coordinates
using integer arithmetic. So in order to construct the triangulation Td1 ,d2 (V ) in this
case, we only need to show that the preferred direction rule can be implemented exactly.
To see that this is possible, consider the angle test

αi[v1, v2] < αi[w1, w2],    i = 1, 2.

One way to convert this to an integer comparison is to observe that it is equivalent to

cos αi[v1, v2] > cos αi[w1, w2],

which, using scalar products, is equivalent to

((v2 − v1) · di) / (|v2 − v1| |di|) > ((w2 − w1) · di) / (|w2 − w1| |di|).

By squaring and removing the common denominator we get the equivalent test

|w2 − w1|² ((v2 − v1) · di)² − |v2 − v1|² ((w2 − w1) · di)² > 0.
Since the left hand side is a polynomial in the coordinates of the points v1 , w1 , v2 , w2 ,
and the vector di , we see that provided d1 and d2 also have integer coordinates, the left
hand side is also an integer. Clearly, all angle tests involved in comparing two scores
involve testing the sign (positive, zero, or negative) of such an integer.
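This sign test translates directly into integer arithmetic; a minimal sketch (the function name and the tuple-based point representation are ours):

```python
def angle_test(v1, v2, w1, w2, d):
    """Sign of the exact comparison alpha_d([v1,v2]) < alpha_d([w1,w2]):
    returns +1 if [v1,v2] makes a strictly smaller angle with the
    direction d, -1 if strictly larger, and 0 on a tie.
    Exact when all coordinates are integers."""
    def sq_len(a, b):
        # squared length of the segment [a, b]
        return (b[0] - a[0]) ** 2 + (b[1] - a[1]) ** 2
    def dot_d(a, b):
        # scalar product of (b - a) with the direction d
        return (b[0] - a[0]) * d[0] + (b[1] - a[1]) * d[1]
    lhs = sq_len(w1, w2) * dot_d(v1, v2) ** 2
    rhs = sq_len(v1, v2) * dot_d(w1, w2) ** 2
    return (lhs > rhs) - (lhs < rhs)
```

With Python's arbitrary-precision integers no overflow can occur; in a fixed-width integer implementation the products must be given enough bits to hold the fourth-degree polynomial above.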
Figure II.6 shows the Delaunay triangulation Td1 ,d2 (V ) for two different choices
of d1 and d2 , where V is a subset of points on a square grid. The triangulations in
the figure were found by recursive point insertion, applying Lawson’s swapping algorithm augmented with the preferred direction rule, and using integer arithmetic. One
Figure II.6: Triangulations with d1 = (1, 0), d2 = (1, 1) and d1 = (1, −1), d2 = (0, 1)
could, however, use any method to find one of the possible Delaunay triangulations and
then apply edge swapping with the preferred direction rule to reach the triangulation
Td1 ,d2 (V ). Since the edges of the Delaunay pretriangulation will never be swapped,
the only swapping required will occur inside the tiles of the pretriangulation, and so
the number of swaps will generally be low. The Delaunay triangulations computed in
[17] were found by recursive point removal; this algorithm could also be augmented
with the preferred direction rule, and the updates would then be local due to Theorem II.4.
PAPER III:
S IMULTANEOUS CURVE
SIMPLIFICATION
Morten Dæhlen, Thomas Sevaldrud, and Christopher Dyken
Submitted.
Abstract: In this paper we present a method for simultaneous simplification
of a collection of piecewise linear curves in the plane. The method is based on
triangulations, and the main purpose is to remove line segments from the piecewise
linear curves without changing the topological relations between the curves. The
method can also be used to construct a multi-level representation of a collection
of piecewise linear curves. We illustrate the method by simplifying cartographic
contours and a set of piecewise linear curves representing a road network.
1 INTRODUCTION
A simplification of a piecewise linear curve is a piecewise linear approximation using
fewer segments than the original, where the distance between the curve and its approximation is within a prescribed tolerance. In addition, when simplifying sets of curves,
we want to maintain the relationships between the curves, thus keeping the topology
of the curve set constant. In this article we will describe a method for performing a
simultaneous simplification of a curve set, that is, all the curves are simplified in parallel while enforcing the given topology of the curve set. A piecewise linear curve is
defined by a sequence of points, where the convex hulls of pairs of consecutive points
define the line segments. We create the approximation by finding a subsequence of the
original sequence of points by discarding points one-by-one until no more points can
be removed without violating the tolerance.
An important issue is the strategy for finding which point to remove at each step of
the process. Usually, such processes are guided by evaluating the error induced by removing each candidate point, and choosing the point inducing the smallest error. This
is a greedy optimization approach, which does not necessarily find a global optimum, but
gives good results in practice. In most cases the traditional Euclidean norm or variations over the Hausdorff metric are used to measure the distance between curves; see [4]. In
this paper we will use a variation over these measures. The distance measure is particularly important when performing simultaneous curve simplification since we must
measure the distance to neighbouring curves. Our solution for handling neighbouring
relations, both topological and geometrical, is based on representing the curve set as a
triangulation.
Simplification of piecewise linear curve sets, and in particular curve sets representing level curves or cartographic contours, was one of the early challenges of computer-based cartography. Other typical examples of curve sets which appear in cartography
and geographical information systems are river networks, various types of boundary
curves describing property borders, forest stands, etc. Contour data and boundary
curves are also important within other application areas, e.g. when modelling geological structures.
The most commonly used method for curve simplification is the Douglas-Peucker
algorithm [20, 74], which was developed in the early seventies. Other methods have
also been developed; see [4] and references therein. Our method for simultaneous simplification of a set of curves is based on measuring the distance between points
and curves using variations over the Euclidean norm, in addition to relationships between points and curves, which are efficiently found by using triangulations. We also
illustrate how this method can be used to construct a nested multi-level representation
of curve networks.
The outline of this paper is as follows: In the next section we give some background
information on refinement and decimation of curves and triangulations, followed by a
section defining the problem statement and commenting further on the motivation behind
this work. In Section 4 we describe the triangulation into which the curve set is embedded. Then, in Section 5 we describe the simultaneous curve simplification algorithm.
Finally, we conclude the paper with a few examples in Section 6 and some remarks on
future work in Section 7.
2 REFINEMENT AND DECIMATION
Various algorithms have been constructed for simplification of piecewise linear curves.
The most well-known is the Douglas-Peucker algorithm [20, 74], of which we will give
an overview.
We begin by defining a piecewise linear planar curve. Given a sequence of points
in the plane

p1, . . . , pn,    pi ∈ R²,

the curve is defined by the line segments

[pi, pi+1],    i = 1, 2, . . . , n − 1,

where [·] denotes the convex hull.
Figure III.1: Two steps of the Douglas-Peucker simplification algorithm.
Then, given a tolerance we proceed as follows. The initial approximation is the
single segment [p1 , pn ]. We then pick the point pk in p2 , . . . pn−1 that is farthest
away from [p1 , pn ], using e.g. the Euclidean distance measure.
If the distance between pk and [p1 , pn ] is greater than the tolerance, we insert
pk into the approximation, and get two new line segments [p1 , pk ] and [pk , pn ]. We
continue this process recursively on every new line segment until the tolerance is met
for every line segment.
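The recursion just described can be sketched as follows. This is a straightforward illustration, not the implementation used in the paper; the function names are ours, and the Euclidean point-to-segment distance is used as the error measure.

```python
import math

def douglas_peucker(points, tol):
    """Recursive Douglas-Peucker simplification of an open polyline
    (a sketch). Returns the retained subsequence of `points`."""
    def dist_to_segment(p, a, b):
        # Euclidean distance from p to the segment [a, b]
        ax, ay = b[0] - a[0], b[1] - a[1]
        px, py = p[0] - a[0], p[1] - a[1]
        l2 = ax * ax + ay * ay
        t = 0.0 if l2 == 0 else max(0.0, min(1.0, (px * ax + py * ay) / l2))
        return math.hypot(px - t * ax, py - t * ay)

    if len(points) < 3:
        return list(points)
    # find the interior point farthest from the segment [p1, pn]
    k, dmax = 0, -1.0
    for i in range(1, len(points) - 1):
        d = dist_to_segment(points[i], points[0], points[-1])
        if d > dmax:
            k, dmax = i, d
    if dmax <= tol:
        return [points[0], points[-1]]
    # recurse on both halves; drop the duplicated split point
    left = douglas_peucker(points[:k + 1], tol)
    right = douglas_peucker(points[k:], tol)
    return left[:-1] + right
```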
The first two steps of the process are shown in Figure III.1, where first pk is inserted
into [p1, pn], yielding the two line segments [p1, pk] and [pk, pn]. Then, pl is inserted
into [pk, pn], and we end up with an approximation defined by the three line segments
[p1, pk], [pk, pl], and [pl, pn]. Variations over the Douglas-Peucker algorithm and other
methods can be found in [4] and references therein.
The Douglas-Peucker algorithm is a refinement procedure: we start with a coarse approximation (a single line segment in our example) and insert the most significant
points, getting increasingly finer approximations, until the prescribed tolerance is met.
Another approach is the converse process, called a decimation procedure. We start with
the original curve as the initial approximation and remove the least significant points
progressively, giving increasingly coarser approximations until no more points can be
removed without violating the prescribed tolerance.
The simultaneous simplification algorithm presented in this paper is based on decimation of triangle meshes. Similarly to curve simplification, we can start with two
triangles covering the domain as our initial approximation and insert the most significant points one by one, re-triangulating after each insertion, until the tolerance is met. Or
we can start with the full triangle mesh and remove points and re-triangulate while the
tolerance is not violated. A variety of methods has been developed for this purpose,
and details on these algorithms can be found in [38] and references therein.
3 PROBLEM STATEMENT
We are given a set of piecewise linear curves in the plane, and we want to simplify
these curves without modifying the topological relationship between the curves. Such
a curve set is the lake in the top row of Figure III.3, where each curve defines a contour
of a complex polygon. When simplifying, it is important that islands in the lake remain
inside the lake, and furthermore that the lake polygon remains a valid polygon. Another
example is the road network in the bottom row of Figure III.3. The road network is a
curve set where each curve describes the path between two junctions.
Thus, we are given a set Q of M piecewise linear curves,

Q = {P^j, j = 1, . . . , M},    (III.1)

where each curve P^j in Q is a sequence of N^j points,

P^j = {p^j_1, . . . , p^j_{N^j}},    p^j_i ∈ D,    D ⊂ R²,    (III.2)

contained in some suitable domain D in the plane.
We want to create a simplified approximation of Q, which we denote Q̂,

Q̂ = {P̂^j, j = 1, . . . , M},

where P̂^j approximates P^j using a subset of the points of P^j.
Given a tolerance ε and a suitable distance measure, we create Q̂ from Q by removing
points from the curves one-by-one while |Q̂ − Q| < ε. When removing points one-by-one,
we inherently create a sequence of increasingly coarser approximations, and thus,
we can easily obtain a multiple-level-of-detail representation of Q if, instead of using
a single fixed tolerance ε, we use a sequence of monotonically increasing tolerances
0 < ε1 < ε2 < . . . < εL. Initially, we let Q̂0 = Q, and using ε1 we create Q̂1 from
Q̂0. Further, we create Q̂2 from Q̂1 using ε2, and so on. Since we only remove points
and never move a point, the points of Q̂i are a subset of the points of Q̂i−1, and thus, we
have a nested multiple-level-of-detail representation.
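The construction of the nested sequence can be sketched as follows, with `simplify` standing in for the decimation procedure of Section 5 (the function names are ours):

```python
def nested_lod(curves, tolerances, simplify):
    """Build a nested multi-level-of-detail sequence: level i is obtained
    by simplifying level i-1 with tolerance eps_i. `simplify(curves, eps)`
    is an application-specific stand-in for the decimation procedure."""
    assert all(a < b for a, b in zip(tolerances, tolerances[1:])), \
        "tolerances must be strictly increasing"
    levels = [curves]          # level 0 is the original curve set
    for eps in tolerances:
        levels.append(simplify(levels[-1], eps))
    return levels
```

Because each level is produced from the previous one by removal only, every level is automatically a subset of the finer levels, which is exactly the nesting property described above.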
If this is done carefully we can introduce the curve sets into a triangle based multilevel terrain model. Alternatively, a sequence of approximations can also be obtained
by removing a fixed number of points successively, and hence avoid using an error
tolerance at all. In Figure III.7, a fixed number of points were removed, while in
figures III.8, III.9, and III.10, a sequence of tolerances were used.
Figure III.2: Approximating the two consecutive line segments [p, q] and [q, r] with [p, r]
introduces two new intersections, and thus is an illegal simplification which changes the topology
of the curve set.
However, care must be taken when we remove points. Figure III.2 shows the result
of an unfortunate point removal. By approximating the two consecutive line segments
[p, q] and [q, r] with the single line segment [p, r], we introduce two new intersections,
and thus, change the topological relationship between the curves. From this we see
that in order to simplify one curve segment in a network, we must consider all curves
simultaneously.
To ensure that the topological relationship between curves remains constant when
we remove a point, we never remove any point whose removal would violate any of the
following four requirements:
1. All intersections between curves shall be kept fixed.
2. No new intersections between curves shall be generated.
3. No curves shall degenerate into one single point.
4. No closed curves shall degenerate into a single line segment.
This ensures that the topology of the curve set as a whole does not change.
For example, in the case of the road network, Requirements 1 through 3 ensure
that all existing junctions are maintained, that no new junctions are introduced, and that two
junctions do not collapse into one single junction. Further, in the case of areal objects,
like a polygonal lake, Requirement 4 ensures that they remain areal objects.
In order to continuously enforce the four requirements while we remove points,
we need a suitable method of encoding the geometric relationship between the curves.
This encoding can be handled by using the concept of triangulations from graph theory,
on which we will elaborate in the next section.
Figure III.3: Polygonal networks with constrained Delaunay triangulations. The top row depicts
a polygonal lake, and the bottom row depicts a small part of a road network. The left column
shows the original datasets, while the right column shows their maximally simplified counterparts.
4 TRIANGULATION
To encode the geometrical relationship between the curves of Q, we represent Q as a
triangulation and use triangle decimation to remove points and line segments. Thus,
the initial triangulation must contain all points of Q and every line segment of every
curve P j of Q must be an edge in the triangulation. The edges not corresponding to
line segments encode the spatial relationship between the curves. Figure III.3 shows
examples of such triangulations. Notice that a curve can have itself as a neighbour.
We assume that all curve intersections are also points on the curves. If this is not
the case, the intersection points must be found and inserted into their respective curves.
With this assumption, it is always possible to construct such a triangulation of Q.
This follows from the fact that the collection of curves can be extended to a set of
closed non-overlapping polygons by including the boundary of the convex hull of the
curve set. Moreover, any closed polygon in the plane can be partitioned into a set of
triangles [69], and hence it follows that we can construct a triangulation in which every
line segment from the curve set is an edge.
For practical purposes we embed the entire curve set in a rectangular region by introducing the boundary of this rectangle as a curve in the curve set; see Figure III.3.
In most examples this boundary curve will be a rectangle represented as a polygon
containing the 4 corner points of the rectangle plus points where the curve network
intersects the boundary, e.g. where a road or a river leaves the rectangular region.
Several methods exist for creating a triangulation with prescribed edges. A well-known approach is to create a Delaunay triangulation followed by a successive phase
of edge swapping that includes the prescribed edges into the triangulation [69]. Any
non-boundary edge is always a diagonal of a quadrilateral composed of two adjacent
triangles. If this quadrilateral is convex we can swap the edge to form two new triangles. Alternatively, we can directly construct a constrained Delaunay triangulation,
using all the line segments of the curves as initial edges and filling in edges between the
curves [59]. Figure III.3 shows the original and decimated versions of two curve sets.
5 CURVE SET DECIMATION
We have a curve set Q of piecewise linear curves embedded in a triangulation T such
that each line segment in Q corresponds to an edge in T . The triangulation T encodes
the geometrical relationship between the curves of Q. In this section we describe how
we use triangle decimation to simplify the curves while enforcing the topology of the
curve set.
We begin by giving a general description of the decimation algorithm, and continue
with discussing the various components of the algorithm in detail. The structure of the
decimation algorithm is as follows:
1. Determine the set of points that are candidates for removal, and assign to each
candidate a weight based on a suitable error measure.
2. While the list of candidates for removal is non-empty and the candidate with the
smallest weight has an error less than the prescribed tolerance, do:
(a) Remove the candidate with the smallest weight from its curve and the triangulation.
(b) Update the list of candidates and calculate the new weights for points in the
neighbourhood of the removed point.
Notice that there are two error measures at play. One error measure is used to determine
the sequence in which points are removed, and another error measure is used to describe
the global error of the approximation, and thus is used to terminate the decimation
process. Alternatively, we can use a prescribed number of points to be removed as
the termination criterion.
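The loop above is naturally driven by a priority queue; a minimal sketch with lazy handling of stale entries (the callbacks `weight` and `removable` are application-specific stand-ins, and all names are ours):

```python
import heapq

def decimate(neighbours, weight, removable, tol):
    """Greedy decimation loop (a sketch). `neighbours` maps a point id to
    the ids of nearby points whose weights may change when it is removed;
    `weight(v)` and `removable(v)` stand in for the error measure and the
    topological check of the application."""
    heap = [(weight(v), v) for v in neighbours if removable(v)]
    heapq.heapify(heap)
    alive = set(neighbours)
    removed = []
    while heap:
        w, v = heapq.heappop(heap)
        # skip stale entries: already removed, or weight changed since push
        if v not in alive or w != weight(v) or not removable(v):
            continue
        if w >= tol:
            break  # smallest remaining weight violates the tolerance
        alive.discard(v)
        removed.append(v)
        # re-queue neighbours whose weights may have changed
        for u in neighbours[v]:
            if u in alive and removable(u):
                heapq.heappush(heap, (weight(u), u))
    return removed
```

Instead of updating entries in place, the sketch simply pushes fresh entries and discards outdated ones when they surface, a common pattern for decimation queues.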
5.1 NODES AND VERTICES
We assume that the initial curve set Q satisfies the four requirements given in Section 3.
The triangulation T is a planar graph consisting of points and edges, and Q is a subgraph of T.
We classify the points in T into two sets, nodes and vertices. A node is a point in T
that is also an endpoint of a curve or a junction in Q; the four corner points of the
rectangular domain are also included in the set of nodes. The vertices are the points
that are not nodes.
Further, some points are classified as nodes to handle closed loops. We must make
sure that every closed loop consists of at least three nodes that are not collinear. This
requirement ensures that Requirement 4 will not be violated in the decimation process.
If a single curve itself forms a closed loop (and the two endpoints of the curve are
the same point), we tag two of the interior points of the curve as nodes such that the
three nodes of the curve are not collinear and form a triangle. Further, if two curves
together form a closed curve, e.g. when two curves describe the perimeter of a lake,
one interior point of one curve that is not collinear with the endpoints must be classified
as a node. For three or more curves forming a loop, the problem does not exist.
5.2 REMOVABLE VERTICES
Nodes are never considered for removal. Further, at each step in the decimation process, only some of the vertices can be removed without violating the requirements.
Let q be a vertex. The platelet of q is the union of the triangles of T that have q as a
corner; see Figure III.4. The geometry of the platelet determines whether or not a vertex
can be removed. Vertices are always in the interior of curves, and thus they belong
to one single curve and always have two immediate neighbouring points on the same
curve.
Let q be the vertex on the curve P^j considered for removal and let p and r be the
two immediate neighbours on P^j. The points p and r are situated on the border of
Figure III.4: The geometry of the platelet of q allows or prohibits removal of q. In the left
figure, [p, r] is strictly inside the platelet of q, and thus, q can safely be removed. However, in
the right figure, [p, r] intersects the boundary of the platelet of q twice, and thus removal of q
would introduce two new intersections, and therefore q cannot be removed without changing the
topology of the curve set.
the platelet of q; refer to Figure III.4. Removing q implies that we replace the two
consecutive line segments [p, q] and [q, r] of P^j with the single line segment [p, r].
The line segments of neighbouring curves are located on the boundary of the platelet of
q. Thus, if [p, r] is strictly inside the platelet of q, the line segment [p, r] never touches
any other curve, and therefore q can be safely removed without violating any of the four
requirements. Conversely, if [p, r] intersects the boundary, the approximation intersects one or
more of the neighbouring curves, and the removal of q would change the topology of
the curve set.
We check whether the line segment [p, r] is strictly inside the platelet of q in the
following way. Let s1, . . . , sL be the points on the boundary of the platelet of q from
p to r, organized counter-clockwise, and similarly, let t1, . . . , tR be the points on the
boundary of the platelet of q from r to p, also organized counter-clockwise. That is,
the sequence

p, s1, . . . , sL, r, t1, . . . , tR,

describes the complete border of the platelet of q. Then, the line segment [p, r] is
strictly inside the platelet of q if either

det(r − p, si − p) < 0, i = 1, . . . , L,  and  det(r − p, tj − p) > 0, j = 1, . . . , R,

or

det(r − p, si − p) > 0, i = 1, . . . , L,  and  det(r − p, tj − p) < 0, j = 1, . . . , R,

where det(·, ·) is the determinant of a 2×2 matrix.

Figure III.5: The maximum distance v1 is the maximum distance between the approximating
line segment l = [p, r] and the vertices s1, . . . , sn between p and r that are removed from the
curve segment. In the left figure, the maximum distance is perpendicular to l, while in the right
figure it is not.
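The two sign tests translate directly into code; a minimal sketch (the function names and tuple-based point representation are ours):

```python
def det2(u, v):
    """Determinant of the 2x2 matrix with columns u and v."""
    return u[0] * v[1] - u[1] * v[0]

def segment_inside_platelet(p, r, s, t):
    """True iff the segment [p, r] is strictly inside the platelet of q,
    where s = boundary points from p to r and t = boundary points from
    r to p, both organized counter-clockwise."""
    d = (r[0] - p[0], r[1] - p[1])
    def side(x):
        # signed side of point x relative to the line through p and r
        return det2(d, (x[0] - p[0], x[1] - p[1]))
    return (all(side(si) < 0 for si in s) and all(side(tj) > 0 for tj in t)) or \
           (all(side(si) > 0 for si in s) and all(side(tj) < 0 for tj in t))
```

The test requires every boundary point on one side of the line through p and r to lie strictly on that side, which is exactly the condition that [p, r] does not touch the platelet boundary.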
5.3 WEIGHT CALCULATION
Some points are more significant than others, and we handle this by associating a weight
with each vertex. The decimation process removes the vertex with the smallest weight and
updates the weights of the remaining vertices.
It is natural that weights are based on some appropriate error measure. If approximation error is the sole property we are interested in, using the approximation error induced by the removal of a vertex as the weight of that vertex is appropriate. However,
in some applications, other properties, like the separation of adjacent curves, are important.
In this case more elaborate weight expressions are appropriate.
In the following we present three different error measures based on Euclidean distances in the plane.
Maximum distance. This distance measure is the maximum distance between a
curve and the approximation of the curve we get if the vertex is removed. Let q be
a vertex on the curve P j . The points defining P j are either remaining or have been
removed in preceding decimation steps.
Let p be the last remaining point on P^j before q, and let r be the first remaining
point after q. Moreover, let s1, . . . , sn be the sequence of removed points of P^j from
p to r.
The removal of q means that the two line segments [p, q] and [q, r] are replaced by
the single line segment [p, r]. The maximum distance between the original curve P^j
and the approximation induced by the removal of q is

v1 = max {dist{l, si}, i = 1, . . . , n},    (III.3)

where l is the line segment [p, r]. Figure III.5 illustrates the calculation of the maximum distance.
Scaled maximum distance. The maximum distance measure only describes local
approximation error. However, in some cases the separation of curves, that is, the
distance between adjacent curves, is important, for example when approximating cartographic contours. To accommodate this, we introduce an additional error measure, a
scaled version of the maximum distance measure.
Similarly to the maximum distance calculation, let q be the point on P^j we are
calculating the weight for, and let p and r be the remaining points before and after
q on P^j. The points p and r and segments of curves close to P^j lie on the border of
the platelet of q.
Let t1, . . . , tm be the vertices on the border of the platelet of q, excluding the two
points p and r. Then, with l = [p, r],
v2 = min {dist{l, tj}, j = 1, . . . , m},    (III.4)

is the minimum distance between the approximating line segment [p, r] and the points
on the border of the platelet of q, as illustrated in the left of Figure III.6.
The scaled maximum distance is then defined as

ws = v1 / v2,

where v1 is the maximum distance (III.3).
The scaled maximum distance measure scales the maximum distance in inverse proportion to the minimum distance to neighbouring curves, and thus makes it less
likely that q on P^j is removed if other curves are situated close to P^j. In this way, the
scaled maximum distance measure tends to preserve the separation of adjacent curves.
Figure III.6: The scaled maximum distance measure is the maximum distance scaled by v2,
which is the minimum distance between the points on the border of the platelet of q and the
approximating line segment [p, r].
Hybrid distance measure. The scaled measure tries to maintain the spatial separation between neighbouring curves. A drawback is that this scaling also influences
the simplification in areas where curves are relatively far apart, areas where the maximum distance measure would be more appropriate. For example, when
simplifying cartographic contours, the scaled maximum distance measure is appropriate in areas dense with curves in order to maintain a good separation of adjacent curves,
while in areas sparse with curves, the maximum distance measure is appropriate. The
hybrid distance measure solves this.
The hybrid distance measure introduces a parameter R, which can be regarded as a
radius of separation. If the minimum distance v2 is smaller than R, we use the scaled
maximum distance measure, otherwise, if v2 is larger than R, we use the maximum
distance measure, see Figure III.6 right.
First, we let v3 be v2 clamped to the interval [0, R],

v3 = v2 if v2 < R, and v3 = R otherwise.
Then, the hybrid distance measure wh is defined as

wh = v1 / v3 .
In effect, R chooses whether the maximum distance measure or the scaled distance
measure is used.
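In code, the clamp and the resulting weight are a pair of one-liners; a minimal sketch (the function name is ours):

```python
def hybrid_weight(v1, v2, R):
    """Hybrid distance measure w_h = v1 / v3, where v3 is the minimum
    separation v2 clamped to [0, R]: v3 = v2 if v2 < R, and v3 = R otherwise."""
    v3 = v2 if v2 < R else R
    return v1 / v3
```

For v2 < R this reduces to the scaled maximum distance measure v1/v2; for v2 ≥ R it reduces to the maximum distance measure scaled by the constant 1/R.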
The choice of R is highly dependent on context. For plotting purposes, R should be
dependent on pen size, or for visualization on-screen, R should be dependent on pixel
size.

Figure III.7: Comparison of the three distance measures. The curve set given in Figure III.8, left,
was simplified to 90 points using the three different norms. Left: Maximum distance measure,
εM = 16.7m. Center: Scaled maximum distance measure, εS = 44.5m. Right: Hybrid distance
measure (R = 20m), εH = 22.3m.

If the resulting curve set should be represented on a grid, the grid size should
dictate the size of R. At other times, it is appropriate that R be proportional
to the error threshold ε. However, this can be a bit dicey when doing multi-resolution
simplifications, since R can be too small at the finer resolutions, resulting in bad spatial
separation at the coarser resolutions.
Figure III.7 shows the dataset from Figure III.8, left, simplified to 90 points using
the three different distance measures. We see that the approximation from the maximum
distance measure has the best overall shape, but the contours along the gorge in
the right part of the contours end up almost on top of each other. The approximation
of the scaled maximum distance measure has distinctly better separation between the
contours, however at the cost of an overall coarser shape. The result of the hybrid
measure maintains good separation between adjacent contours while also preserving
the overall shape of the contours considerably better than the scaled maximum norm.
6 NUMERICAL RESULTS
In this section we present three examples: a set of cartographic contours, a road network, and a lake and river network. The first two examples consist of polygonal curves,
while the last example is a mix of polygonal curves and complex polygons with
holes.
Cartographic contours. The cartographic contour example is a curve set of 15 cartographic contours, a mix of open and closed piecewise linear
curves. The curve set is originally defined by 470 points (Figure III.8, left) over an area
of 607 × 607 meters.

Figure III.8: Multi-resolution simplification of cartographic contours, created using the hybrid
norm (R = 20m). Left: Original set of cartographic contours, 470 points. Center: Simplification
ε1 = 15.4m, 108 points. Right: Further simplification ε2 = 100m, 66 points. Here, no more
points can be removed without violating the topological constraints.
First, we simplified the original curve set three times, removing 380 points (and
thus leaving 90 points to define the curve set), using the three different distance measures to guide the simplification. The results are shown in Figure III.7. We used the
maximum distance measure to define the approximation error. It is no surprise that
the simplification guided by the maximum distance measure achieved the smallest approximation error (εM = 16.7 meters). However, as mentioned in Section 5.3, the maximum distance measure has problems preserving separation of adjacent curves, which is
clearly visible in Figure III.7, left. In Figure III.7, center, the result of using the scaled
maximum distance measure as a guide is shown. The separation of adjacent curves
is maintained, but the approximation error is considerably worse (εS = 44.5 meters),
which is particularly visible on the gentle side of the mountain top, where the contours
are unnecessarily coarse (the approximation error on this part is scaled such that it
becomes very small, since the distance between adjacent contours is large). However,
simplification guided by the hybrid distance measure, which is depicted in the right of
Figure III.7, maintains a good balance between approximation error (εH = 22.3 meters) and separation of adjacent contours. The shapes are almost as good as the result
of simplification guided by the maximum distance measure, while the contours
along the gorge are also well separated.
Note in particular that even though adjacent contours may end up close to each
other, especially when the maximum distance measure is used to
guide the simplification, a contour never ends up on top of another contour, which
would cause new intersections to form. This is because the four requirements of Section 3 are honored, and points whose removal would result in a contour set of a
different topology are never removed; thus, the simplification maintains a constant
topology of the contour set.

Figure III.9: Multi-resolution simplification of the road network of the Lake Tahoe dataset [89],
created using the hybrid distance measure (R = 500m). Left: Original curve set, 123038 points.
Middle: Simplification ε1 = 100m, 33014 points. Right: Further simplification ε2 = 1000m,
12839 points.
Further, Figure III.8 shows a multi-resolution simplification of the set of cartographic contours, where the hybrid distance measure (R = 20 meters) was used to
guide the simplification. The sequence of simplifications is nested in the sense that
the points of a coarse simplification are a subset of the points of any of the finer simplifications. In addition, each line segment of a coarse simplification has a one-to-one
correspondence to a sequence of one or more line segments in a finer simplification.
To the left, the original contour set is shown. In the center is the first simplification, with
tolerance ε1 = 15.4 meters, which gives a contour set of 108 points (23%), and to the
right a further simplification, with tolerance ε2 = 100 meters, using 66 points (14%).
This simplification cannot be simplified further, since removal of any of the remaining
points would change the topological type.
Figure III.10: Multi-resolution simplification of the combined lake and river network of the
Lake Tahoe dataset [89], created using the hybrid distance measure (R = 500m). Left: Original
network, 144293 points. Middle: Simplification ε1 = 100m, 31794 points. Right: Further
simplification ε2 = 1000m, 8022 points.
Road network. The road network of Figure III.9 is the road network of the Lake
Tahoe dataset [89], covering a domain of approximately 32 × 32 square kilometers.
Note that the network is plotted using the geographic coordinate system and thus appears to be rectangular even though the domain is square. The original network, which
is depicted in Figure III.9, is defined by 123038 points.
The road network was simplified twice in succession using the hybrid distance
measure (R = 500 meters). The first simplification ran with a tolerance ε1 = 100
meters, which produced an approximation of 33014 points (27%), depicted in the center of Figure III.9. The original road network and the first approximation are visually
indistinguishable at this scale, even though the approximation only uses one fourth of
the geometry. The second simplification is a further simplification of the first, using a tolerance ε2 = 1000 meters, and results in a road network of 12839
points (10%), about one tenth of the geometry of the original network, with the
same topology as the original road network.
Figure III.11: Approximation errors resulting from using the different distance measures to
guide the simplification process. From left to right, the cartographic contours, the road network,
and the lake and river network are shown, respectively.
Lake and river network. The final example is the lake and river networks of the
Lake Tahoe dataset [89] combined into one network, a network consisting of both
piecewise linear curves and complex polygons with holes.
A clean polygon is a set of closed non-intersecting curves, called rings or contours,
consistently oriented such that the inside of the polygon is always on the same side
when one “walks” along the curves. Simplification of polygons using our technique is
simple: we let each contour be a closed curve in the contour set. Since the topology of
the curve set is constant during the simplification, the polygon defined by the simplified
curves is always a clean, consistent polygon.
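A consistent orientation of a ring can be checked (or enforced) with the shoelace formula; a small sketch under the common convention that the interior lies to the left of a counter-clockwise ring (function names are ours):

```python
def signed_area(ring):
    """Signed area of a closed ring given as a list of (x, y) vertices;
    positive for counter-clockwise orientation (shoelace formula)."""
    a = 0.0
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]  # wrap around to close the ring
        a += x0 * y1 - x1 * y0
    return 0.5 * a

def is_counter_clockwise(ring):
    return signed_area(ring) > 0.0
```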
The original network is shown in the left of Figure III.10, and the two consecutive
simplifications are shown in the middle and right of the same figure. The simplifications were guided by the hybrid distance measure (R = 500 meters), and the tolerances
were the same as for the road network. The original network consists of 144293 points,
the first approximation with a tolerance of ε1 = 100 meters consists of 31794 points
(22%), and the second approximation with a tolerance of ε2 = 1000 meters consists of
8022 points (6%).
Similarly to the road network, the original lake and river network is visually indistinguishable at this scale from the first approximation, which contains significantly less
geometry.
Approximation error. In Figure III.11, the approximation error (the maximum distance measure) is plotted as a function of the number of points removed, for each of
the three examples. From left to right, the plots for the cartographic contours, the road
network, and the lake and river networks are shown, respectively. Each plot shows the
approximation error of the three different norms as the simplification progresses, and
thus gives an indication of the performance of the three distance measures.
As expected, simplification guided directly by the maximum distance gives the
best performance in terms of approximation error, but does not preserve curve separation. The scaled maximum distance measure preserves curve separation better, but
has significantly worse approximation error, which is quite evident in all three plots.
The hybrid distance measure, however, embraces the best of both worlds, preserving
good separation between curves while giving reasonable performance in terms of
approximation error.
7 CONCLUDING REMARKS
In this paper we presented a method for simultaneous simplification of curve sets in
the plane, that is, simplification of curves while maintaining the geometric relationship
between curves.
In addition, we introduced three different distance measures that can be used to
guide the simplification process, choosing which point to remove at each step. The
distance measures have different characteristics. The maximum distance measure minimizes approximation error, while the scaled maximum distance measure preserves
separation of adjacent curves. These two measures were successfully combined in the
hybrid distance measure, which preserves separation of adjacent curves while giving
reasonable performance in terms of approximation error. However, the hybrid
distance measure requires that a radius of separation be specified, which is highly
dependent on context.
The method can, in addition to creating single simplifications with a given tolerance
or a given number of points, be used to create multi-level-of-detail representations of
collections of curves.
PAPER IV:
REAL-TIME LINEAR SILHOUETTE ENHANCEMENT
Christopher Dyken and Martin Reimers
Mathematical Methods for Curves and Surfaces: Tromsø 2004.
Abstract: We present a simple method for improving the rendered appearance
of coarse triangular meshes. We refine and modify the geometry along silhouette edges in real-time, and thus increase the geometric complexity only where
needed. We address how to define the improved silhouette and a method to blend
the modified and the original geometry continuously.
1 INTRODUCTION
Coarse triangular meshes are used extensively in real-time rendering applications such
as computer games and virtual reality software. Fragment level techniques, like texture
mapping and per-pixel lighting, effectively make objects appear to have more geometric detail than they do in reality. Although highly effective in many cases, even
the most advanced shading technique cannot hide the piecewise linear silhouette of a
coarse polygonal mesh. This artifact can be seen even in current, cutting edge computer
games.
The silhouette separates an object from the background and is characterised by a
sharp change in contrast and texture, which is very important in the human perception
of shape [55]. Therefore, the silhouette region conveys a very significant amount of the
visual information.
We propose a straightforward technique to improve the visual appearance of the
silhouette of a coarse polygonal mesh. We replace linear silhouette edges with smooth
curves and refine the local geometry accordingly during rendering. We define a continuous “silhouetteness” test and use this to blend flat and curved geometry in order to
avoid transitional artifacts. We also propose a simple and efficient rendering technique
for the silhouette enhanced geometry.
The idea of improving silhouettes of piecewise linear geometry has been addressed
before. In [79] a high resolution version of the model is used to create a stencil clipping
the outline of the coarse geometry, at the cost of an extra rendering pass. Our method
is related to the PN-triangle construction [92] which is based on replacing each triangle
of the mesh with a triangular spline patch defined by its geometry and shading normals.
This implicitly improves the silhouettes, but increases the geometric complexity globally. A similar approach is taken in [91], where subdivision is used instead of spline
patches. In [2] a local refinement technique for subdivision surfaces is described.
The outline of the rest of the paper is as follows. After some preliminaries, we
propose a new continuous silhouetteness classification method. In Section 3 we define
smooth edge curves based on vertex positions and normals, and construct a Bézier patch
for each triangle. We use silhouetteness to define a view dependent geometry with
smoother silhouettes. We discuss implementational issues in Section 4 and conclude in
Section 5.
2 SILHOUETTES
We assume for simplicity that Ω is a closed triangle mesh with consistently oriented
triangles T1 , . . . , TN and vertices p1 , . . . , pn in R3 . An edge of Ω is defined as eij =
[pi , pj ] where [·] denotes the convex hull of a set. The triangle normal nt of a triangle
Tt = [pi , pj , pk ] is defined as the normalisation of the vector (pj − pi ) × (pk − pi ).
Since our interest is rendering Ω, we also assume that we are given shading normals
nti , ntj , ntk associated with the vertices of Tt . The view point o ∈ R3 is the position
of the observer, and for a point p on Ω, the view direction vector is p − o. If n is the
surface normal in p, we say that Ω is front-facing in p if (p − o) · n ≤ 0, otherwise it
is back-facing.
The silhouette of a triangle mesh is the set of edges where one of the adjacent faces
is front-facing while the other is back-facing. Let pij be the midpoint of an edge eij
shared by two triangles Ts and Tt in Ω. Defining fij : R3 → R by
fij (x) := ((pij − x)/‖pij − x‖ · ns ) ((pij − x)/‖pij − x‖ · nt ),   (IV.1)
we see that eij is a silhouette edge when observed from o in case fij (o) ≤ 0 (and it is
not occluded by other objects).
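The binary silhouette test can be sketched directly from (IV.1); the helper names below are ours:

```python
import math

def _normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def f_ij(o, p_mid, n_s, n_t):
    """Evaluate (IV.1): the product of the cosines of the angles between the
    view direction p_mid - o and the two adjacent triangle normals."""
    d = _normalize(tuple(p - c for p, c in zip(p_mid, o)))
    return _dot(d, n_s) * _dot(d, n_t)

def is_silhouette_edge(o, p_mid, n_s, n_t):
    """e_ij is a silhouette edge seen from o when f_ij(o) <= 0."""
    return f_ij(o, p_mid, n_s, n_t) <= 0.0
```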
Our objective is to render silhouette edges of Ω as smooth curves. Since these
curves do not in general lie in Ω, and since “silhouetteness” is a binary function
of the view-point, a naive implementation leads to transitional artifacts; the rendered
geometry depends discontinuously on the view-point. We propose instead to make the
rendered geometry depend continuously on the view-point. To that end we define the
(a) Coarse mesh
(b) Coarse mesh refined along silhouette
Figure IV.1: Replacing triangles along the silhouette with triangular spline patches yields a
smoother silhouette.
silhouetteness of eij seen from x ∈ R3 to be

αij (x) :=   1                  if fij (x) ≤ 0,
             1 − fij (x)/βij    if 0 < fij (x) ≤ βij ,
             0                  if βij < fij (x),   (IV.2)
where βij ≥ 0 is a constant. This continuous silhouetteness classification extends
the standard binary classification by adding a transitional region, see Figure IV.2. A
silhouetteness αij ∈ (0, 1) implies that eij is nearly a silhouette edge. We will use
Figure IV.2: Values of fij in (IV.1) looking along the edge eij with the transitional region with
angle φ marked gray.
Figure IV.3: Edges with αij > 0 for different angles φ in (IV.3).
silhouetteness to control the view dependent interpolation between silhouette geometry
and non-silhouette geometry.
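The classification (IV.2) is a simple clamped ramp; a sketch:

```python
def silhouetteness(f, beta):
    """Silhouetteness alpha_ij as a function of f = f_ij(x), eq. (IV.2):
    1 on silhouette edges (f <= 0), a linear ramp on the transitional
    region (0, beta], and 0 elsewhere."""
    if f <= 0.0:
        return 1.0
    if f <= beta:
        return 1.0 - f / beta
    return 0.0
```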
The constant βij could depend on the local geometry. As an example we could let
the transitional region define a “wedge” with angle φ with the adjacent triangles as in
Figure IV.2. This amounts to setting
β = sin φ cos φ sin θ + sin² φ cos θ,   (IV.3)
where θ is the angle between ns and nt . We have illustrated the effect of varying φ in
Figure IV.3. More elaborate methods could be considered, however we found that the
heuristic choice of βij = 0.25 worked well in practice.
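A sketch of (IV.3); the function name is ours. Note that the right-hand side simplifies to sin φ sin(φ + θ), which gives a handy sanity check:

```python
import math

def beta_from_wedge(phi, theta):
    """Transitional-region constant beta computed from the wedge angle phi and
    the angle theta between the adjacent triangle normals n_s and n_t (IV.3)."""
    return (math.sin(phi) * math.cos(phi) * math.sin(theta)
            + math.sin(phi) ** 2 * math.cos(theta))
```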
3 VIEW DEPENDENT GEOMETRY
In this section we propose a scheme to define smooth silhouettes based on vertex positions, shading normals and view point. We define for each edge eij in Ω a smooth edge
curve on the Bézier form
Cij (t) = pi B0^3(t) + cij B1^3(t) + cji B2^3(t) + pj B3^3(t),   (IV.4)

where Bi^3(t) = (3 i) t^i (1 − t)^(3−i) and (3 i) denotes the binomial coefficient, see e.g. [26]. This cubic curve interpolates its end
points pi and pj , and it remains to determine the two inner control points cij and cji .
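Once the inner control points are known, the curve (IV.4) can be evaluated stably with de Casteljau's algorithm; a componentwise sketch on point tuples:

```python
def cubic_bezier(t, b0, b1, b2, b3):
    """Evaluate the cubic Bezier curve (IV.4) at parameter t by repeated
    linear interpolation (de Casteljau), componentwise on point tuples."""
    def lerp(a, b):
        return tuple((1.0 - t) * x + t * y for x, y in zip(a, b))
    a0, a1, a2 = lerp(b0, b1), lerp(b1, b2), lerp(b2, b3)
    c0, c1 = lerp(a0, a1), lerp(a1, a2)
    return lerp(c0, c1)
```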
Let Ts and Tt be the two triangles adjacent to eij . In the event that the two shading
normals nsi , nti are equal we say that the edge end pi is smooth, otherwise it is a
feature edge end, see Figure IV.4.
Since shading normals by assumption equal surface normals, we require the edge
curve tangents to be orthogonal to the shading normals at the end points. This is equivalent to the conditions

(cij − pi ) · nqi = 0,   q = s, t.   (IV.5)
This is an under-determined Hermite type of problem which has been addressed before. Sabin [77] determines the interior coefficients uniquely by requiring in addition
that the surface normal is parallel to the curve normal at the endpoints, yielding an approximation to a geodesic curve. Note that this method applies only in case nsi = nti .
In [91] a minimal energy approach is used. Farin [26] describes a method for which
the end point tangent direction at pi is found by projecting pj into the tangent plane at
pi and the tangent length is determined somewhat heuristically.
To describe our method, we define the linear interpolant to eij in cubic form,

Lij (t) = pi B0^3(t) + lij B1^3(t) + lji B2^3(t) + pj B3^3(t),   (IV.6)

where lkℓ = (2pk + pℓ )/3. The control points of Lij are used to determine the control
points of C ij uniquely as follows. For a smooth edge end we define cij to be the
projection of lij onto the edge end tangent plane defined by pi and nsi , i.e.
cij = (2pi + pj )/3 − (((pj − pi ) · nsi )/3) nsi .   (IV.7)
For a feature edge end, i.e. nsi ≠ nti , the intersection of the two tangent planes at pi
defined by nsi and nti is the line pi + x mij with

mij = (nti × nsi )/‖nti × nsi ‖.
Projecting lij onto this line yields

cij = pi + (((pj − pi ) · mij )/3) mij .   (IV.8)
Figure IV.4: The edge curve control points in case pi is a smooth edge end and pj is a feature
edge end.
It is easy to verify that in either case the conditions (IV.5) are satisfied. The method
has linear precision in the sense that it reproduces Lij in case the shading normals are
orthogonal to eij .
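Both cases (IV.7) and (IV.8) can be folded into one routine that picks the feature-end branch when the two shading normals differ; a 3D sketch (the names and the parallel-normal tolerance are ours):

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def edge_end_control_point(p_i, p_j, n_si, n_ti, eps=1e-12):
    """Inner control point c_ij at the edge end p_i.
    Smooth end (n_si = n_ti): project l_ij = (2 p_i + p_j)/3 onto the tangent
    plane at p_i, eq. (IV.7).  Feature end: project l_ij onto the tangent-plane
    intersection line p_i + x m_ij, eq. (IV.8)."""
    d = _sub(p_j, p_i)
    m = _cross(n_ti, n_si)
    m_len = math.sqrt(_dot(m, m))
    if m_len < eps:  # normals numerically parallel: treat as a smooth edge end
        s = _dot(d, n_si) / 3.0
        return tuple((2.0 * pi + pj) / 3.0 - s * n
                     for pi, pj, n in zip(p_i, p_j, n_si))
    m = tuple(c / m_len for c in m)  # unit direction of the intersection line
    s = _dot(d, m) / 3.0
    return tuple(pi + s * mc for pi, mc in zip(p_i, m))
```

In both branches the result satisfies the orthogonality conditions (IV.5).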
It is worthwhile to mention that the naive approach of handling the feature edges by
using the smooth edge end method (IV.7) with the average of the two shading normals
leads in some cases to sharp edge curves with undesired inflections.
The edge curve coefficients defined in (IV.7) and (IV.8) were used in the PN-triangle
construction [92] to define for each triangle of Ω a Bézier patch on the form
s(u, v, w) = Σ i+j+k=3 bijk Bijk^3(u, v, w),   where Bijk^n(u, v, w) = (n!/(i! j! k!)) u^i v^j w^k.
Our approach is similar, defining the boundary coefficients of the patch as a convex
combination of the edge curves (IV.4) and the linear interpolants (IV.6), weighted by
silhouetteness;
b300 = pi ,   b030 = pj ,   b003 = pk ,
b210 = αij cij + (1−αij )lij ,   b120 = αji cji + (1−αji )lji ,
b201 = αik cik + (1−αik )lik ,   b102 = αki cki + (1−αki )lki ,
b012 = αkj ckj + (1−αkj )lkj ,   b021 = αjk cjk + (1−αjk )ljk .
As in [92], we define the central control point to be
b111 = (3/12) (b201 + b102 + b021 + b012 + b210 + b120 ) − (1/6) (b300 + b030 + b003 ),
(a) The edge curve control points of a triangle with one silhouette edge. (b) The control points
of the corresponding patch. (c) A possible rendering strategy for a curved patch.
Figure IV.5: We replace every triangle with a triangular Bézier patch defined by the silhouetteness and edge curves of adjacent edges.
since this choice reproduces quadratic polynomials. See Figure IV.5 for an illustration
of the resulting control mesh. Note that if αij = 1 for all edges, the resulting patches
equal the ones in the PN-triangle construction.
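The blending of boundary coefficients and the choice of b111 can be sketched as follows (helper names are ours):

```python
def blend(alpha, c, l):
    """Boundary coefficient as a convex combination of the curved control
    point c and the linear control point l, weighted by silhouetteness alpha."""
    return tuple(alpha * cc + (1.0 - alpha) * ll for cc, ll in zip(c, l))

def central_control_point(edge_pts, corner_pts):
    """b111 = (3/12) * (sum of the six edge coefficients)
            - (1/6)  * (sum of the three corner coefficients)."""
    def vec_sum(pts):
        return tuple(sum(cs) for cs in zip(*pts))
    e = vec_sum(edge_pts)
    v = vec_sum(corner_pts)
    return tuple(0.25 * ec - vc / 6.0 for ec, vc in zip(e, v))
```

With all αij = 0 the boundary coefficients reduce to the linear interpolants and b111 reduces to the triangle centroid, so the patch reproduces the flat triangle.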
Our construction yields, for a given viewpoint, a spline surface that is a blend between the geometry of the PN-triangle construction and Ω itself. The blend is such that
the surface is smooth near a silhouette and flat elsewhere. The patches depend continuously on the viewpoint and moreover, neighbouring patches have a common edge
curve, and thus meet continuously. Figure IV.1 shows the result on a mesh with only
smooth edges, while the mesh in Figure IV.6 has numerous sharp features. Note that in
the latter example we could have used the same approach to render the obvious feature
lines smoothly as well.
4 IMPLEMENTATIONAL ISSUES
The algorithm proposed in the previous sections could be implemented as follows.
Precompute all the edge curves and store the inner coefficients. For a given view point
we calculate the silhouetteness of the edges according to (IV.2). A straightforward
implementation is linear in the number of triangles. More sophisticated methods exist,
however the break-even point appears to be around 10,000 triangles, see [37] and the
references therein.
Figure IV.6: Silhouette improvement of a model with numerous feature edges.

We next tag a triangle as flat if its three edges have zero silhouetteness, and curved
otherwise. A flat triangle is rendered in the standard way, while a curved triangle could
be rendered as depicted in Figure IV.5 (c). The patch is split into three sub-patches with
a common apex at the parametric midpoint of the patch. The sub-patches are refined
independently along the base, and the three sub-patches rendered using a triangle fan.
The OpenGL specification [80] encourages the use of perspective-correct interpolation of associated data like texture coordinates, colours and shading normals when
rasterising polygons. Therefore associated data should be interpolated linearly when
refining the triangular patches.
Our algorithm should be a strong candidate for GPU implementation. However,
determining silhouetteness requires connectivity information, and refinement generates
new geometry. These operations are not supported by current versions of OpenGL or
DirectX GPU programs. However, the stencil shadow volume algorithm [25], whose
popularity is growing rapidly in real-time rendering applications, has the same functionality requirements. Therefore, it is likely that future revisions of graphics APIs
will have the functionality needed for our algorithm to be implemented on the GPU.
5 FINAL REMARKS
We have proposed a practical algorithm for improving the visual quality of coarse triangle meshes. To overcome transitional artifacts, we introduced a continuous silhouette
classification method that could be useful in other similar applications. Our method
of enhancing the silhouettes gave significantly better visual quality in our experiments.
Sharp features can be handled through the use of shading normals. We believe our
method is applicable in many real-time graphics applications such as flight simulators,
computer games and virtual reality settings.
The method could be improved in several ways. Feature lines and boundaries are
easily accommodated by our method; we simply set αij = 1 for such edges in order to
render them smoothly. If a high resolution version of the mesh is known, as in [79], the
edge curves could be defined using this fine geometry. One alternative is to let the edge
curve Cij approximate a geodesic curve connecting pi and pj .
PAPER V:
REAL-TIME GPU SILHOUETTE REFINEMENT USING ADAPTIVELY BLENDED BÉZIER PATCHES
Christopher Dyken, Martin Reimers, and Johan Seland
Computer Graphics Forum 27, 1 (2008).
Abstract: We present an algorithm for detecting and extracting the silhouette
edges of a triangle mesh in real time using GPUs (Graphical Processing Units).
We also propose a tessellation strategy for visualizing the mesh with smooth silhouettes through a continuous blend between Bézier patches with varying level
of detail. Furthermore, we show how our techniques can be integrated with displacement and normal mapping. We give details on our GPU implementation and
provide a performance analysis with respect to mesh size.
1 INTRODUCTION
Coarse triangular meshes are used extensively in real-time rendering applications such
as games and virtual reality systems. Recent advances in graphics hardware have made
it possible to use techniques such as normal mapping and per pixel lighting to increase the visual realism of such meshes. These techniques work well in many cases,
adding a high level of detail to the final rendered scene. However, they cannot hide
the piecewise linear silhouette of a coarse triangular mesh. We propose an effective
GPU implementation of a technique similar to the one proposed by two of the authors
in [21], to adaptively refine triangular meshes along the silhouette, in order to improve
its visual appearance. Since our technique dynamically refines geometry at the vertex
level, it integrates well with pixel based techniques such as those mentioned above.
We start by reviewing previous and related work in the following section, before
we introduce our notation and recall the silhouetteness classification method that was
introduced in [21]. In Section 4 we discuss the construction of a cubic Bézier patch for
each triangle in the mesh, based on the mesh geometry and shading normals. These
patches are in the subsequent section tessellated adaptively using the silhouetteness to
determine the local level of detail. The result is a “watertight” mesh with good geometric quality along the silhouettes, which can be rendered efficiently. We continue by
Figure V.1: A dynamic refinement (left) of a coarse geometry (center). Cracking between
patches of different refinement levels (top right) is eliminated using the technique described
in Section 5 (bottom right).
discussing details of our GPU implementation in Section 6, and show how to integrate
our approach with normal and displacement mapping. Thereafter, in Section 7, we
compare the performance of our GPU implementation with several CPU based methods, before we conclude.
2 PREVIOUS AND RELATED WORK
Silhouette extraction. Silhouette extraction has been studied extensively, both in
the framework of rendering soft shadows and for use in non-photorealistic rendering.
Isenberg et al. [45] provide an excellent overview of the trade-offs involved in choosing among the various CPU-based silhouette extraction techniques. Hartner et al. [37]
benchmark and compare various algorithms in terms of runtime performance and code
complexity. For comparison, we present runtime performance for our method within
this framework in Section 7. Card and Mitchell [12] propose a single pass GPU assisted
algorithm for rendering silhouette edges, by degenerating all non-silhouette edges in a
vertex shader.
Curved geometry. Curved point-normal triangle patches (PN-triangles), introduced
by Vlachos et al. [92], do not need triangle connectivity between patches, and are therefore well suited for tessellation in hardware. An extension allowing for finer control of
the resulting patches was presented by Boubekeur et al. [7] and dubbed scalar tagged
PN-triangles. A similar approach is taken by van Overveld and Wyvill [91], where subdivision was used instead of Bézier patches. Alliez et al. describe a local refinement
technique for subdivision surfaces [2].
Adaptivity and multi resolution meshes. Multi resolution methods for adaptive rendering have a long history; a survey is given by Luebke et al. [64]. Some examples are
progressive meshes, where refinement is done by repeated triangle splitting and deletion, by Hoppe [42], or triangle subdivision as demonstrated by Pulli and Segal [73] and
Kobbelt [54].
GPU techniques. Global subdivision using a GPU kernel is described by Shiue et al.
in [84], and an adaptive subdivision technique using GPUs is given by Bunnel [11].
A GPU friendly technique for global mesh refinement on GPUs was presented by
Boubekeur and Schlick [8], using pre-tessellated triangle strips stored on the GPU.
Our rendering method is similar, but we extend their method by adding adaptivity to
the rendered mesh.
A recent trend is to utilize the performance of GPUs for non-rendering computations, often called GPGPU (General-Purpose Computing on GPUs). We employ
such techniques extensively in our algorithm, but refer the reader to the introduction
by Harris [34] for a description of GPGPU programming. An overview of various applications in which GPGPU techniques have successfully been used is presented in Owens
et al. [71]. For information about OpenGL and the OpenGL Shading Language, see the
reference material by Shreiner et al. [85] and Rost [75].
3 SILHOUETTES OF TRIANGLE MESHES
We consider a closed triangle mesh Ω with consistently oriented triangles T1 , . . . , TN
and vertices v1 , . . . , vn in R3 . The extension to meshes with boundaries is straightforward and is omitted for brevity. An edge of Ω is defined as eij = [vi , vj ] where [·]
denotes the convex hull of a set. The triangle normal nt of a triangle Tt = [vi , vj , vk ]
is defined as the normalization of the vector (vj − vi ) × (vk − vi ). Since our interest
is in rendering Ω, we also assume that we are given shading normals, nti , ntj , ntk associated with the vertices of Tt . The viewpoint x ∈ R3 is the position of the observer
and for a point v on Ω, the view direction vector is v − x. If n is the surface normal at v, we say that a triangle T is front facing at v if (v − x) · n ≤ 0, and back facing otherwise.
The silhouette of a triangle mesh is the set of edges where one of the adjacent
triangles is front facing while the other is back facing. Let vij be the midpoint of an
Figure V.2: From left to right: A triangle [vi , vj , vk ] and the associated shading normals ni , nj , and nk are used to define three cubic Bézier curves and a corresponding cubic triangular Bézier patch F . The sampling operator Si yields tessellations of the patch at refinement level i.
edge eij shared by two triangles Ts and Tt in Ω. Defining fij : R3 → R by

f_{ij}(x) = \left( \frac{v_{ij} - x}{\| v_{ij} - x \|} \cdot n_s \right) \left( \frac{v_{ij} - x}{\| v_{ij} - x \|} \cdot n_t \right),   (V.1)

we see that eij is a silhouette edge when observed from x in the case fij (x) ≤ 0.
Our objective is to render Ω so that it appears to have smooth silhouettes, by adaptively refining the mesh along the silhouettes. Since the resulting silhouette curves in
general do not lie in Ω, and since silhouette membership for edges is a binary function
of the viewpoint, a naive implementation leads to transitional artifacts: The rendered
geometry depends discontinuously on the viewpoint. In [21], a continuous silhouette
test was proposed to avoid such artifacts. The silhouetteness of eij as seen from x ∈ R3
was defined as

\alpha_{ij}(x) = \begin{cases} 1 & \text{if } f_{ij}(x) \le 0; \\ 1 - f_{ij}(x)/\beta_{ij} & \text{if } 0 < f_{ij}(x) \le \beta_{ij}; \\ 0 & \text{if } f_{ij}(x) > \beta_{ij}, \end{cases}   (V.2)
where βij > 0 is a constant. We let βij depend on the local geometry, so that the transitional region defines a “wedge” with angle φ with the adjacent triangles, see Figure V.3. This amounts to setting β_{ij} = \sin\varphi \cos\varphi \sin\theta + \sin^2\varphi \cos\theta, where θ is the angle between the normals of the two adjacent triangles. We also found that the heuristic choice of βij = 1/4 works well in practice, but this choice gives unnecessary refinement over flatter areas.
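As a concrete reading of (V.1) and (V.2), the following CPU-side sketch computes the silhouetteness of an edge. This is an illustrative reference, not the paper's GPU shader; the function and parameter names are ours, and the triangle normals are assumed to be unit length.

```python
import math

def silhouetteness(v_mid, x, n_s, n_t, beta=0.25):
    """Silhouetteness (V.2) of an edge with midpoint v_mid seen from the
    viewpoint x; n_s and n_t are the unit normals of the two adjacent
    triangles. Uses the heuristic constant beta = 1/4."""
    d = [v_mid[k] - x[k] for k in range(3)]
    norm = math.sqrt(sum(c * c for c in d))
    d = [c / norm for c in d]
    # f_ij(x) of (V.1): product of the dot products with the view direction
    f = sum(d[k] * n_s[k] for k in range(3)) * sum(d[k] * n_t[k] for k in range(3))
    if f <= 0.0:
        return 1.0             # genuine silhouette edge
    if f <= beta:
        return 1.0 - f / beta  # transitional region
    return 0.0                 # both triangles clearly face the same way
```

For example, if the two adjacent triangles have opposing normals, the product in (V.1) is negative from any viewpoint and the silhouetteness is 1.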
The classification (V.2) extends the standard binary classification by adding a transitional region. A silhouetteness αij ∈ (0, 1) implies that eij is nearly a silhouette
edge. We use silhouetteness to control the view dependent interpolation between silhouette geometry and non-silhouette geometry.
4 CURVED GEOMETRY
We assume that the mesh Ω and its shading normals are sampled from a piecewise smooth surface (which may, however, have sharp edges) at a sampling rate that yields non-smooth silhouettes. In this section we use the vertices and shading normals of each
triangle in Ω to construct a corresponding cubic Bézier patch. The end result of our
construction is a set of triangular patches that constitutes a piecewise smooth surface.
Our construction is a variant of the one in [21], see also [92].
For each edge eij in Ω, we determine a cubic Bézier curve based on the edge
endpoints vi , vj and their associated shading normals:
C ij (t) = vi B03 (t) + cij B13 (t) + cji B23 (t) + vj B33 (t),
where B_i^3(t) = \binom{3}{i} t^i (1 - t)^{3-i} are the Bernstein polynomials, see e.g. [26]. The inner
control point cij is determined as follows. Let Ts and Tt be the two triangles adjacent
to eij and let nsi and nti be the shading normals at vi belonging to triangle Ts and Tt
respectively. If n_{si} = n_{ti}, we say that vi is a smooth edge end and we determine its inner control point as
c_{ij} = \frac{2 v_i + v_j}{3} - \frac{(v_j - v_i) \cdot n_{si}}{3}\, n_{si}.
On the other hand, if n_{si} \neq n_{ti}, we say that vi is a sharp edge end and let
c_{ij} = v_i + \frac{(v_j - v_i) \cdot t_{ij}}{3}\, t_{ij},
where tij is the
normalized cross product of the shading normals. We refer to [21] for the rationale
behind this construction.
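The two rules above can be sketched as follows. This is an illustrative helper with names of our choosing; the shading normals are assumed to be unit length, and the exact-equality test for smoothness stands in for whatever tolerance an implementation would use.

```python
import math

def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])

def normalize(a):
    n = math.sqrt(sum(c * c for c in a))
    return tuple(c / n for c in a)

def inner_control_point(vi, vj, n_si, n_ti):
    """Inner control point c_ij of the cubic edge curve from vi towards vj;
    n_si and n_ti are the shading normals at vi in the two triangles
    sharing the edge."""
    if n_si == n_ti:
        # smooth edge end: project the edge step out of the tangent plane
        w = sum((vj[k] - vi[k]) * n_si[k] for k in range(3))
        return tuple((2 * vi[k] + vj[k]) / 3 - w / 3 * n_si[k] for k in range(3))
    # sharp edge end: step a third of the way along t_ij, the normalized
    # cross product of the two shading normals
    t = normalize(cross(n_si, n_ti))
    w = sum((vj[k] - vi[k]) * t[k] for k in range(3))
    return tuple(vi[k] + w / 3 * t[k] for k in range(3))
```

In a flat configuration the smooth rule places c_ij a third of the way along the edge, which reproduces the linear edge as a degree-elevated cubic.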
Next, we use the control points of the three edge curves belonging to a triangle
[vi , vj , vk ] to define nine of the ten control points of a cubic triangular Bézier patch of
the form

F = \sum_{l+m+n=3} b_{lmn} B^3_{lmn}.   (V.3)

Here B^3_{lmn} = \frac{6}{l!\,m!\,n!} u^l v^m w^n are the Bernstein-Bézier polynomials and u, v, w are barycentric coordinates, see e.g. [26]. We determine the coefficients such that b300 =
vi , b210 = cij , b120 = cji and so forth. In [92] and [21] the center control point b111
was determined as

b_{111} = \frac{3}{12}(c_{ij} + c_{ji} + c_{jk} + c_{kj} + c_{ki} + c_{ik}) - \frac{1}{6}(v_i + v_j + v_k).   (V.4)

We propose instead to optionally use the average of the inner control points of the three edge curves,

b_{111} = \frac{1}{6}(c_{ij} + c_{ji} + c_{jk} + c_{kj} + c_{ki} + c_{ik}).   (V.5)
Figure V.3: Values of fij in (V.1) looking along the edge eij with the transitional region φ marked gray.
This choice allows for a significantly more efficient implementation at the cost of a
slightly “flatter” patch, see Section 6.4 and Figure V.4. This example is typical in that
the patches resulting from the two formulations are visually almost indistinguishable.
5 ADAPTIVE TESSELLATION
In the previous section we defined a collection of cubic Bézier patches based on the
mesh Ω and its shading normals. We next propose a strategy for tessellating these
patches adaptively for rendering. We let the tessellation level (which controls the number of triangles produced) depend on the local silhouetteness so that silhouettes appear
to be smooth, while retaining the coarse geometry away from the silhouettes. We avoid
“cracking” by ensuring that the tessellations of neighboring patches meet continuously,
see Figure V.1.
The parameter domain of a triangular Bézier patch F is a triangle P0 ⊂ R2 . We
can refine P0 to form a triangle mesh P1 by splitting each edge in half and forming
four new triangles. A further refinement P2 of P1 is formed similarly, by splitting each
triangle in P1 into four new triangles, and so forth. The m’th refinement Pm is a triangle
mesh with vertices at the dyadic barycentric coordinates
I_m = \left\{ \frac{(i, j, k)}{2^m} : i, j, k \in \mathbb{Z}_+,\; i + j + k = 2^m \right\}.   (V.6)
A tessellation Pm of the parameter domain P0 and a map f : P0 → Rd , gives rise
to a continuous approximation Sm [f ] : P0 → Rd of f that is linear on each triangle
of Pm and agrees with f at the vertices of Pm . For example, Sm [F] maps a triangle
[pi , pj , pk ] in Pm linearly to a triangle [F(pi ), F(pj ), F(pk )] in R3 . It is clear that
the collection of all such triangles forms a tessellation of F. We will in the following
call both the map Sm [F] and its image Sm [F](P0 ) a tessellation. A piecewise linear
map Sm [f ] can be evaluated at a point p ∈ P0 as follows: Let T = [pi , pj , pk ] be
a triangle in Pm containing p and let (ui , uj , uk ) be the barycentric coordinates of p
with respect to T . Then p = ui pi + uj pj + uk pk and
Sm [f ](p) = ui f (pi ) + uj f (pj ) + uk f (pk ).
(V.7)
Given two tessellations Ps and Pm with integers s ≤ m, the set of vertices of Ps is contained in the set of vertices of Pm, and each triangle of Pm is contained in a triangle of Ps. Since both Sm[Ss[f]] and Ss[f] are linear on each triangle of Pm and agree at the vertices, the two maps must be equal on the whole of P0. This implies that a tessellation can be refined to a finer level without changing its geometry: given a map f : P0 → Rd, we have

S_m\big[S_s[f]\big] = S_s[f].   (V.8)
We say that Sm[Ss[f]] has topological refinement level m and geometric refinement level s. From the previous result we can define tessellations for a non-integer refinement level s = m + α, where m is an integer and α ∈ [0, 1). We refine Sm [f] to
refinement level m + 1 and let α control the blend between the two refinement levels,
Sm+α [f ] = (1 − α)Sm+1 [Sm [f ]] + αSm+1 [f ].
(V.9)
See Figure V.5 for an illustration of non-integer level tessellation. The sampling operator Sm is linear, i.e. Sm[α1 f1 + α2 f2] = α1 Sm[f1] + α2 Sm[f2] for all real α1, α2 and maps f1, f2. As a consequence, (V.8) also holds for a non-integer geometric refinement level s.
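The dyadic vertex sets (V.6) and the blend (V.9) can be sketched compactly. The helper below is an illustrative CPU-side reference, not the shader implementation: `dyadic_vertices` generates I_m exactly (so the nesting I_s ⊆ I_m for s ≤ m can be checked directly), and `blend_at_midpoint` evaluates S_{m+α}[f] at a vertex that P_{m+1} introduces at the midpoint of a P_m-edge, where the coarse level contributes the linear average of the endpoint values.

```python
from fractions import Fraction

def dyadic_vertices(m):
    """Vertex set I_m of the m'th refinement P_m, as exact barycentric
    coordinates (V.6); Fraction keeps the nesting I_s <= I_m exact."""
    n = 2 ** m
    return {(Fraction(i, n), Fraction(j, n), Fraction(n - i - j, n))
            for i in range(n + 1) for j in range(n + 1 - i)}

def blend_at_midpoint(f, a, b, alpha):
    """Value of S_{m+alpha}[f] (V.9) at the vertex of P_{m+1} introduced at
    the midpoint of the P_m-edge [a, b]: a blend of the coarse linear value
    S_{m+1}[S_m[f]] and the exact sample S_{m+1}[f]."""
    mid = tuple((a[k] + b[k]) / 2 for k in range(len(a)))
    coarse = (f(a) + f(b)) / 2   # S_m[f] is linear along the edge
    fine = f(mid)                # S_{m+1}[f] interpolates f at the new vertex
    return (1 - alpha) * coarse + alpha * fine
```

At α = 0 the new vertex lies on the coarse tessellation; as α grows it slides towards the exact surface point, which is what removes popping between refinement levels.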
Our objective is to define for each triangle T = [vi , vj , vk ] a tessellation T of
the corresponding patch F adaptively with respect to the silhouetteness of the edges
eij , ejk , eki . To that end we assign a geometric refinement level sij ∈ R+ to each
edge, based on its silhouetteness as computed in Section 3. More precisely, we use
sij = M αij where M is a user defined maximal refinement level, typically M = 3.
We set the topological refinement level for a triangle to be m = \lceil \max\{s_{ij}, s_{jk}, s_{ki}\} \rceil, i.e. our tessellation T has the same topology as Pm. Now, it only remains to determine
the position of the vertices of T . We use the sampling operator Ss with geometric
refinement level varying over the patch and define the vertex positions as follows. For
a vertex p of Pm we let the geometric refinement level be
s(p) = \begin{cases} s_{qr} & \text{if } p \in (p_q, p_r); \\ \max\{s_{ij}, s_{jk}, s_{ki}\} & \text{otherwise}, \end{cases}   (V.10)
Figure V.4: The surfaces resulting from the center control point rule (V.4) (left) and (V.5) (right),
applied to a tetrahedron with one normal vector per vertex. The difference is marginal, although
the surface to the right can be seen to be slightly flatter.
Figure V.5: To tessellate a patch at the non-integer refinement level s = 1.5, we create the tessellations S1[F] and S2[F], and refine S1[F] to S2[S1[F]] such that the topological refinement levels match. Then, the two surfaces are weighted and combined to form S1.5[F].
where (pq , pr ) is the interior of the edge of P0 corresponding to eqr . Note that the
patch is interpolated at the corners vi , vj , vk . The vertex v of T that corresponds to p
is then defined as
v = S_{s(p)}[F](p) = S_m\big[S_{s(p)}[F]\big](p).   (V.11)
Note that s(p) is in general a real value and so (V.9) is used in the above calculation.
The final tessellation is illustrated in Figure V.6.
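The per-triangle set-up described above amounts to a few lines; the sketch below uses our own helper names and only mirrors the level assignment, not the vertex evaluation (V.11).

```python
import math

def triangle_refinement(alpha_ij, alpha_jk, alpha_ki, M=3):
    """Per-triangle set-up of Section 5: geometric level s = M * alpha for
    each edge, and topological level m = ceil(max s)."""
    s = (M * alpha_ij, M * alpha_jk, M * alpha_ki)
    return s, math.ceil(max(s))

def vertex_level(edge_index, s):
    """Geometric level s(p) of a P_m-vertex, rule (V.10): a vertex in the
    interior of a boundary edge uses that edge's level; all other vertices
    use the maximal level. edge_index is None for interior vertices."""
    return s[edge_index] if edge_index is not None else max(s)
```

Keeping s(p) constant along each boundary edge is what makes the tessellations of two neighboring patches agree on their shared edge.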
Figure V.6: Composing multiple refinement levels for adaptive tessellation. Each edge has a geometric refinement level, and the topological refinement level is dictated by the edge with the largest refinement level.

The topological refinement level of two neighboring patches will in general not be equal. However, our choice of constant geometric refinement level along an edge
ensures that neighboring tessellations match along the common boundary. Although
one could let the geometric refinement level s(p) vary over the interior of the patch,
we found that taking it to be constant as in (V.10) gives good results.
6 IMPLEMENTATION
We next describe our implementation of the algorithm. We need to distinguish between
static meshes for which the vertices are only subject to affine transformations, and dynamic meshes with more complex vertex transformations. Examples of the latter are
animated meshes and meshes resulting from physical simulations. We assume that the
geometry of a dynamic mesh is retained in a texture on the GPU that is updated between frames. This implies that our algorithm must recompute the Bézier coefficients
accordingly. Our implementation is described sequentially, although some steps do not
require the previous step to finish. A flowchart of the implementation can be found in
Figure V.7.
6.1 SILHOUETTENESS CALCULATION
Silhouetteness is well suited for computation on the GPU since the same transform is applied to every edge and there are no data dependencies. The only changing parameter between frames is the viewpoint.

Figure V.7: Flowchart of the silhouette refinement algorithm. The white boxes are executed on the CPU, the blue boxes on the GPU, the green boxes are textures, and the red boxes are pixel buffer objects. The dashed lines and boxes are only necessary for dynamic geometry.
If the mesh is static we can pre-compute the edge-midpoints and neighboring triangle normals for every edge as a preprocessing step and store these values in a texture on
the GPU. For a dynamic mesh we store the indices of the vertices of the two adjacent
triangles instead and calculate the midpoint as part of the silhouetteness computation.
The silhouetteness of the edges is calculated by first sending the current viewpoint
to the GPU as a shader uniform, and then by issuing the rendering of a textured rectangle into an off-screen buffer with the same size as our edge-midpoint texture.
We could alternatively store the edges and normals of several meshes in one texture
and calculate the silhouetteness of all in one pass. If the models have different model
space bases, such as in a scene graph, we reserve a texel in a viewpoint-texture for each
model. In the preprocessing step, we additionally create a texture associating the edges
with the model’s viewpoint texel. During rendering we traverse the scene graph, find
the viewpoint in the model space of the model and store this in the viewpoint texture.
We then upload this texture instead of setting the viewpoint explicitly.
6.2 HISTOGRAM PYRAMID CONSTRUCTION AND EXTRACTION
The next step is to determine which triangles should be refined, based on the silhouetteness of the edges. The straightforward approach is to read back the silhouetteness
texture to host memory and run sequentially through the triangles to determine the refinement level for each of them. This direct approach rapidly congests the graphics bus
and thus reduces performance. To minimize transfers over the bus we use a technique
called histogram pyramid extraction [95] to find and compact only the data that we
need to extract for triangle refinement. As an added benefit the process is performed in
parallel on the GPU.
The first step in the histogram pyramid extraction is to select the elements that we
will extract. We first create a binary base texture with one texel per triangle in the
mesh. A texel is set to 1 if the corresponding triangle is selected for extraction, i.e. has
at least one edge with non-zero silhouetteness, and 0 otherwise. We create a similar
base texture for the edges, setting a texel to 1 if the corresponding edge has at least one
adjacent triangle that is selected and 0 otherwise.
For each of these textures we build a histopyramid, which is a stack of textures
similar to a mipmap pyramid. The texture at one level is a quarter of the size of the
previous level. Instead of storing the average of the four corresponding texels in the
layer below like for a mipmap, we store the sum of these texels. Thus each texel in
the histopyramid contains the number of selected elements in the sub-pyramid below
and the single top element contains the total number of elements selected for extraction. Moreover, the histopyramid induces an ordering of the selected elements that can
be obtained by traversal of the pyramid. If the base texture is of size 2n × 2n , the
histopyramid is built bottom up in n passes. Note that for a static mesh we only need a
histopyramid for edge extraction and can thus skip the triangle histopyramid.
The next step is to compact the selected elements. We create a 2D texture with at
least m texels where m is the number of selected elements and each texel equals its
index in a 1D ordering. A shader program is then used to find for each texel i the corresponding element in the base texture as follows. If i > m there is no corresponding
selected element and the texel is discarded. Otherwise, the pyramid is traversed top-down, using partial sums at the intermediate levels to determine the position of the i’th
selected element in the base texture. Its position in the base texture is then recorded in
a pixel buffer object.
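The build and top-down traversal can be mimicked on the CPU as follows. This is a small reference sketch of the histopyramid logic with our own function names, not the actual shader passes; it uses a 0-indexed element counter and a fixed child-visiting order.

```python
def build_histopyramid(base):
    """Build a histopyramid from a 2^n x 2^n base of 0/1 flags: each texel
    of a coarser level stores the sum of the 2x2 block below it."""
    levels = [base]
    while len(levels[-1]) > 1:
        prev, half = levels[-1], len(levels[-1]) // 2
        levels.append([[prev[2*y][2*x] + prev[2*y][2*x+1] +
                        prev[2*y+1][2*x] + prev[2*y+1][2*x+1]
                        for x in range(half)] for y in range(half)])
    return levels  # levels[-1][0][0] = total number of selected elements

def extract(levels, i):
    """Base-texture (row, col) of the i'th selected element, 0 <= i < total,
    by top-down traversal using the partial sums of each 2x2 block."""
    y = x = 0
    for lvl in range(len(levels) - 2, -1, -1):
        tex = levels[lvl]
        y, x = 2 * y, 2 * x
        for dy, dx in ((0, 0), (0, 1), (1, 0), (1, 1)):  # fixed child order
            c = tex[y + dy][x + dx]
            if i < c:          # the i'th element lies in this sub-pyramid
                y, x = y + dy, x + dx
                break
            i -= c             # skip the whole sub-pyramid
    return y, x
```

The fixed child order is what induces the 1D ordering of the selected elements mentioned above.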
The result of the histopyramid extraction is a compact representation of the texture
positions of the elements for which we need to extract data. The final step is to load
associated data into pixel buffer objects and read them back to host memory over the
graphics bus. For static meshes we output for each selected edge its index and silhouetteness. We can thus fit the data of two edges in the RGBA values of one render
target.
For dynamic meshes we extract data for both the selected triangles and edges. The
data for a triangle contains the vertex positions and shading normals of the corners of
the triangle. Using polar coordinates for the normal vectors, this fits into four RGBA render
targets. The edge data is the edge index, its silhouetteness and the two inner Bézier
control points of that edge, all of which fits into two RGBA render targets.
6.3 RENDERING UNREFINED TRIANGLES
While the histopyramid construction step finishes, we issue the rendering of the unrefined geometry using a VBO (vertex buffer object). We encode the triangle index into
the geometry stream, for example as the w-coordinates of the vertices. In the vertex
shader, we use the triangle index to do a texture lookup in the triangle refinement texture in order to check if the triangle is tagged for refinement. If so, we collapse the
vertex to [0, 0, 0], such that triangles tagged for refinement are degenerate and hence
produce no fragments. This is the same approach as [12] uses to discard geometry.
For static meshes, we pass the geometry directly from the VBO to vertex transform,
where triangles tagged for refinement are culled. For dynamic meshes, we replace the
geometry in the VBO with indices and retrieve the geometry for the current frame using
texture lookups, before culling and applying the vertex transforms.
The net effect of this pass is the rendering of the coarse geometry, with holes where
triangles are tagged for refinement. Since this pass is vertex-shader only, it can be combined with any regular fragment shader for lighting and shading.
6.4 RENDERING REFINED TRIANGLES
While the unrefined triangles are rendered, we wait for the triangle data read back to
the host memory to finish. We can then issue the rendering of the refined triangles. The geometry of the refined triangles consists of tessellations of triangular cubic Bézier patches, as described in Sections 4 and 5.
To allow for high frame-rates, the transfer of geometry to the GPU, as well as
the evaluation of the surface, must be handled carefully. Transfer of vertex data over
the graphics bus is a major bottleneck when rendering geometry. Boubekeur and Schlick [8]
have an efficient strategy for rendering uniformly sampled patches. The idea is that
the parameter positions and triangle strip set-up are the same for any patch with the
same topological refinement level. Thus, it is enough to store a small number of pretessellated patches Pi with parameter positions Ii as VBOs on the GPU. The coefficients of each patch are uploaded and the vertex shader is used to evaluate the surface
at the given parameter positions. We use a similar set-up, extended to our adaptive
watertight tessellations.
The algorithm is similar for static and dynamic meshes, the only difference is the
place from which we read the coefficients. For static meshes, the coefficients are pregenerated and read directly from host memory. The coefficients for a dynamic mesh
are obtained from the edge and triangle read-back pixel pack buffers. Note that this
pass is vertex-shader only and we can therefore use the same fragment shader as for
the rendering of the unrefined triangles.
The tessellation strategy described in Section 5 requires the evaluation of (V.11) at
the vertices of the tessellation Pm of the parameter domain of F, i.e. at the dyadic
parameter points (V.6) at refinement level m. Since the maximal refinement level M
over all patches is usually quite small, we can save computations by pre-evaluating the
basis functions at these points and store these values in a texture.
A texture lookup gives four channels of data, and since a cubic Bézier patch has
ten basis functions, we need three texture lookups to get the values of all of them at a
point. If we define the center control point b111 to be the average of six other control
points, as in (V.5), we can eliminate it by distributing the associated basis function B^3_{111} = 6uvw among the six corresponding basis functions, adding \mu = uvw to each:

\hat{B}^3_{300} = u^3, \quad \hat{B}^3_{030} = v^3, \quad \hat{B}^3_{003} = w^3,
\hat{B}^3_{210} = 3u^2 v + \mu, \quad \hat{B}^3_{201} = 3u^2 w + \mu, \quad \hat{B}^3_{120} = 3u v^2 + \mu,   (V.12)
\hat{B}^3_{102} = 3u w^2 + \mu, \quad \hat{B}^3_{021} = 3v^2 w + \mu, \quad \hat{B}^3_{012} = 3v w^2 + \mu.

We thus obtain a simplified expression

F = \sum_{i+j+k=3} b_{ijk} B^3_{ijk} = \sum_{(i,j,k) \neq (1,1,1)} b_{ijk} \hat{B}^3_{ijk},   (V.13)
involving only nine basis functions. Since they form a partition of unity, we can obtain
one of them from the remaining eight. Therefore, it suffices to store the values of eight
basis functions, and we need only two texture lookups for evaluation per point. Note
that if we choose the center coefficient as in (V.4) we need three texture lookups for
retrieving the basis functions, but the remainder of the shader is essentially the same.
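A quick numerical check of this reduction, with scalar coefficients standing in for the 3D control points for brevity, confirms that when b111 is chosen by rule (V.5) the nine modified basis functions of (V.12) reproduce the full ten-coefficient patch (V.3):

```python
from math import factorial

def bernstein(l, m, n, u, v, w):
    """Cubic Bernstein-Bezier polynomial B^3_lmn at barycentric (u, v, w)."""
    return factorial(3) // (factorial(l) * factorial(m) * factorial(n)) \
        * u**l * v**m * w**n

def patch_full(b, u, v, w):
    """Evaluate (V.3) with all ten coefficients (b maps (l,m,n) -> value)."""
    return sum(c * bernstein(*lmn, u, v, w) for lmn, c in b.items())

def patch_reduced(b, u, v, w):
    """Evaluate (V.13): b111 eliminated, mu = u*v*w folded into the six
    modified edge basis functions of (V.12)."""
    mu = u * v * w
    hat = {(3, 0, 0): u**3, (0, 3, 0): v**3, (0, 0, 3): w**3,
           (2, 1, 0): 3*u*u*v + mu, (1, 2, 0): 3*u*v*v + mu,
           (0, 2, 1): 3*v*v*w + mu, (0, 1, 2): 3*v*w*w + mu,
           (1, 0, 2): 3*u*w*w + mu, (2, 0, 1): 3*u*u*w + mu}
    return sum(b[lmn] * B for lmn, B in hat.items())
```

The nine modified functions also sum to one, which is the partition-of-unity property used above to drop one more of them.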
Due to the linearity of the sampling operator, we may express (V.11) for a vertex p of PM with s(p) = m + α as

v = S_{s(p)}[F](p) = \sum_{i,j,k} b_{ijk}\, S_{s(p)}[\hat{B}^3_{ijk}](p) = \sum_{i,j,k} b_{ijk} \Big( (1-\alpha)\, S_m[\hat{B}^3_{ijk}](p) + \alpha\, S_{m+1}[\hat{B}^3_{ijk}](p) \Big).   (V.14)
Thus, for every vertex p of PM, we pre-evaluate S_m[\hat{B}^3_{300}](p), \ldots, S_m[\hat{B}^3_{021}](p) for every refinement level m = 1, \ldots, M and store this in an M × 2 block in the texture.
We organize the texture such that four basis functions are located next to the four corresponding basis functions of the adjacent refinement level. This layout optimizes spatial
coherency of texture accesses since two adjacent refinement levels are always accessed
when a vertex is calculated. Also, if vertex shaders on future graphics hardware support filtered texture lookups, we could increase performance by carrying out the linear interpolation between refinement levels by sampling between texel centers.

Figure V.8: The left image depicts a coarse mesh using normal mapping to increase the perceived detail, and the right image depicts the same scene using the displacement mapping technique described in Section 6.5.
Since the values of our basis functions are always in the interval [0, 1], we can trade precision for performance and pack two basis functions into one channel of data, letting one basis function have the integer part while the other has the fractional part of a channel. This reduces the precision to about 12 bits, but increases the speed of the algorithm by 20% without adding visual artifacts.
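The packing trick can be sketched as follows; the 12-bit quantization scheme here is our illustrative reading of the description, not the exact shader arithmetic, and the round-trip error is bounded by one quantization step.

```python
def pack(a, b, bits=12):
    """Pack two values in [0, 1] into one float channel: a quantized into
    the integer part, b into the fractional part."""
    q = (1 << bits) - 1            # 4095 quantization steps at 12 bits
    return round(a * q) + round(b * q) / (q + 1)

def unpack(p, bits=12):
    """Recover the two quantized values from a packed channel."""
    q = (1 << bits) - 1
    ai = int(p)                    # integer part carries the first value
    return ai / q, (p - ai) * (q + 1) / q
```

With 32-bit float channels the packed value still fits exactly, since both halves use at most 12 bits of mantissa each.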
6.5 NORMAL AND DISPLACEMENT MAPPING
Our algorithm can be adjusted to accommodate most regular rendering techniques.
Pure fragment level techniques can be applied directly, but vertex-level techniques may
need some adjustment.
An example of a pure fragment-level technique is normal mapping. The idea is to
store a dense sampling of the object’s normal field in a texture, and in the fragment
shader use the normal from this texture instead of the interpolated normal for lighting
calculations. The result of using normal mapping on a coarse mesh is depicted in the
left of Figure V.8.
Normal mapping only modulates the lighting calculations; it does not alter the geometry. Thus, silhouettes are still piecewise linear. In addition, the flat geometry is distinctly visible at grazing angles, as is the case for the sea surface in Figure V.8.
The displacement mapping technique attacks this problem by perturbing the vertices of a mesh. The drawback is that displacement mapping requires the geometry
in problem areas to be densely tessellated. The brute force strategy of tessellating the whole mesh increases the complexity significantly and is best suited for off-line rendering. However, a ray-tracing-like approach using GPUs has been demonstrated by Donnelly [19].
We can use our strategy to establish the problem areas of the current frame and use our variable level-of-detail refinement strategy to tessellate these areas. First, we augment the silhouetteness test, tagging for refinement those edges that are large in the current projection and part of planar regions at grazing angles. Then we incorporate displacement mapping in the vertex shader of Section 6.4. However, care must be taken to avoid cracks and maintain a watertight surface.
For a point p at integer refinement level s, we find the triangle T = [pi , pj , pk ]
of Ps that contains p. We then find the displacement vectors at pi , pj , and pk . The
displacement vector at pi is found by first doing a texture lookup in the displacement
map using the texture coordinates at pi and then multiplying this displacement with
the interpolated shading normal at pi . In the same fashion we find the displacement
vectors at pj and pk . The three displacement vectors are then combined using the
barycentric weights of p with respect to T , resulting in a displacement vector at p.
If s is not an integer, we interpolate the displacement vectors of two adjacent levels
similarly to (V.9).
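The barycentric combination of the three corner displacement vectors described above amounts to the following one-liner; the helper name is ours, and the corner vectors (texture value times interpolated shading normal) are assumed to be computed beforehand.

```python
def displacement_at(weights, corner_disps):
    """Displacement vector at p: the three corner displacement vectors
    combined with p's barycentric weights w.r.t. its containing triangle."""
    return tuple(sum(w * d[k] for w, d in zip(weights, corner_disps))
                 for k in range(3))
```

If s is not an integer, two such vectors computed at the adjacent levels would be blended as in (V.9).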
The result of this approach is depicted to the right in Figure V.8, where the cliff
ridges are appropriately jagged and the water surface is displaced according to the
waves.
7 PERFORMANCE ANALYSIS
In this section we compare our algorithms to alternative methods. We have measured
both the speedup gained by moving the silhouetteness test calculation to the GPU, as
well as the performance of the full algorithm (silhouette extraction + adaptive tessellation) with a rapidly changing viewpoint. We have executed our benchmarks on two
different GPUs to get an impression of how the algorithm scales with advances in GPU
technology.
For all tests, the CPU is an AMD Athlon 64 running at 2.2GHz with PCI-E graphics
bus, running Linux 2.6.16 and using GCC 4.0.3. The Nvidia graphics driver is version
87.74. All programs have been compiled with full optimization settings. We have used
two different Nvidia GPUs: a 6600 GT running at 500 MHz with 8 fragment and 5 vertex pipelines and a memory clock speed of 900 MHz, and a 7800 GT running at 400 MHz with 20 fragment and 7 vertex pipelines and a memory clock speed of 1000 MHz. Both
GPUs use the PCI-E interface for communication with the host CPU.
(a) The measured performance for brute force CPU silhouette extraction, hierarchical CPU silhouette extraction, and GPU silhouette extraction on the Nvidia GeForce 6600GT and 7800GT GPUs.

(b) The measured performance on an Nvidia GeForce 7800GT for rendering refined meshes using a single static VBO, the uniform refinement method of [8], and our algorithm for static and dynamic meshes.
Figure V.9: Performance measurements of our algorithm.
Our adaptive refinement relies heavily on texture lookups in the vertex shader.
Hence, we have not been able to perform tests on ATI GPUs, since these have only recently gained this ability. However, we expect similar performance on ATI hardware.
We benchmarked using various meshes ranging from 200 to 100k triangles. In
general, we have found that the size of a mesh is more important than its complexity and topology, an observation shared by Hartner et al. [37]. However, for adaptive refinement
it is clear that a mesh with many internal silhouettes will lead to more refinement, and
hence lower frame-rates.
7.1 SILHOUETTE EXTRACTION ON THE GPU
To compare the performance of silhouette extraction on the GPU versus traditional
CPU approaches, we implemented our method in the framework of Hartner et al. [37]. This allowed us to benchmark our method against the hierarchical method described by Sander et al. [79], as well as against standard brute force silhouette extraction. Figure V.9(a) shows the average performance over a large number of frames with random viewpoints for an asteroid model of [37] at various levels of detail. The GPU measurements include the time spent reading back the data to host memory.
Our observations for the CPU based methods (hierarchical and brute force) agree
with [37]. For small meshes that fit into the CPU’s cache, the brute force method is
the fastest. However, as the mesh size increases, we see the expected linear decrease in
performance. The hierarchical approach scales much better with regard to mesh size,
but at around 5k triangles the GPU based method begins to outperform this method as
well.
The GPU based method has a different performance characteristic than the CPU
based methods. There is virtually no decline in performance for meshes up to about
10k triangles. This is probably due to the dominance of set-up and tear-down operations
for data transfer across the bus. At around 10k triangles we suddenly see a difference in
performance between the 8-pipeline 6600GT GPU and the 20-pipeline 7800GT GPU,
indicating that the increased floating point performance of the 7800GT GPU starts to
pay off. We also see clear performance plateaus, which is probably due to the fact
that geometry textures for several consecutive mesh sizes are padded to the same size
during histopyramid construction.
It could be argued that coarse meshes in need of refinement along the silhouette typically contain fewer than 5000 triangles, and thus silhouetteness should be computed
on the CPU. However, since the test can be carried out for any number of objects at the
same time, the above result applies to the total number of triangles in the scene, and
not in a single mesh.
For the hierarchical approach, there is a significant pre-processing step that is not
reflected in Figure V.9(a), which makes it unsuitable for dynamic meshes. Also, in real-time rendering applications, the CPU will typically be used for other calculations, such as physics and AI, and cannot be fully utilized to perform silhouetteness calculations.
It should also be emphasized that it is possible to do other per-edge calculations, such
as visibility testing and culling, in the same render pass as the silhouette extraction, at
little performance overhead.
7.2 BENCHMARKS OF THE COMPLETE ALGORITHM
Using variable level of detail instead of uniform refinement should increase rendering
performance since less geometry needs to be sent through the pipeline. However, the
added complexity balances out the performance of the two approaches to some extent.
We have tested against two methods of uniform refinement. The first method is to
render the entire refined mesh as a static VBO stored in graphics memory. The rendering of such a mesh is fast, as there is no transfer of geometry across the graphics bus.
However, the mesh is static and the VBO consumes a significant amount of graphics
memory. The second approach is the method of Boubekeur and Schlick [8], where
each triangle triggers the rendering of a pre-tessellated patch stored as triangle strips in
a static VBO in graphics memory.
PAPER V: REAL-TIME GPU SILHOUETTE REFINEMENT
Figure V.9(b) shows these two methods against our adaptive method. It is clear
from the graph that using static VBOs is extremely fast and outperforms the other
methods for meshes up to 20k triangles. At around 80k triangles, the VBO grows
too big for graphics memory, and is stored in host memory, with a dramatic drop in
performance. The method of [8] has a linear performance degradation, but the added
cost of triggering the rendering of many small VBOs is outperformed by our adaptive
method at around 1k triangles. The performance of our method also degrades linearly,
but at a slower rate than uniform refinement. Using our method, we can at 24 FPS adaptively refine meshes of up to 60k triangles for dynamic meshes and up to 100k triangles for static meshes, which is significantly better than the other methods. The other GPUs show
the same performance profile as the 7800 in Figure V.9(b), just shifted downward as
expected by the number of pipelines and lower clock speed.
Finally, to get an idea of the performance impact of various parts of our algorithm, we ran the same tests with various features enabled or disabled. We found that when using a uniformly distributed random refinement level for each edge (to avoid the silhouetteness test), performance is 30–50% better than for uniform refinement. This is as expected, since the vertex shader is only marginally more complex and the total number of vertices processed is reduced. In a real-world scenario, where there is often a high degree of frame coherency, this can be exploited by not calculating the silhouetteness every frame. Further, if we disable blending of consecutive refinement levels (which can lead to some popping, but no cracking), we remove half of the texture lookups in the vertex shader for refined geometry and gain a 10% performance increase.
8 CONCLUSION AND FUTURE WORK
We have proposed a technique for performing adaptive refinement of triangle meshes
using graphics hardware, requiring just a small amount of preprocessing, and with
no changes to the way the underlying geometry is stored. Our criterion for adaptive
refinement is based on improving the visual appearance of the silhouettes of the mesh.
However, our method is general in the sense that it can easily be adapted to other
refinement criteria, as shown in Section 6.5.
We execute the silhouetteness computations on a GPU. Our performance analysis
shows that our implementation using histogram pyramid extraction outperforms other
silhouette extraction algorithms as the mesh size increases.
Our technique for adaptive level of detail automatically avoids cracking between
adjacent patches with arbitrary refinement levels. Thus, there is no need to “grow”
refinement levels from patch to patch, making sure two adjacent patches differ only
by one level of detail. Our rendering technique is applicable to dynamic and static
meshes and creates continuous level of detail for both uniform and adaptive refinement
algorithms. It is transparent for fragment-level techniques such as texturing, advanced
lighting calculations, and normal mapping, and the technique can be augmented with
vertex-level techniques such as displacement mapping.
Our performance analysis shows that our technique gives interactive frame-rates for
meshes with up to 100k triangles. We believe this makes the method attractive since
it allows complex scenes with a high number of coarse meshes to be rendered with
smooth silhouettes. The analysis also indicates that the performance of the technique
is limited by the bandwidth between host and graphics memory. Since the CPU is
available for other computations while waiting for results from the GPU, the technique
is particularly suited for CPU-bound applications. This also shows that if one could somehow eliminate the read-back of silhouetteness and trigger the refinement directly on the graphics hardware, performance would likely increase significantly. To our
knowledge there are no such methods using current versions of the OpenGL and Direct3D APIs. However, considering the recent evolution of both APIs, we expect such
functionality in the near future.
A major contribution of this work is an extension of the technique described in [8]. We address three issues: evaluation of PN-triangle type patches on vertex shaders, adaptive level of detail refinement, and elimination of popping artifacts. We have proposed a simplified PN-triangle type patch which allows the use of pre-evaluated basis functions, requiring only a single texture lookup (if we pack the pre-evaluated basis functions into the fractional and rational parts of a texel). Further, the use of a geometric refinement level different from the topological refinement level comes at no cost, since this is achieved simply by adjusting a texture coordinate. Thus, adaptive level of detail comes at a very low price.
We have shown that our method is efficient, and we expect it to be even faster when texture lookups in the vertex shader become more mainstream and the hardware manufacturers respond with increased efficiency for this operation. Future GPUs use a unified shader approach, which could also boost the performance of our algorithm, since it is primarily vertex bound and current GPUs perform best for fragment processing.
Acknowledgments. We would like to thank Gernot Ziegler for introducing us to the
histogram pyramid algorithm. Furthermore, we are grateful to Mark Hartner for giving
us access to the source code of the various silhouette extraction algorithms. Finally,
Marie Rognes has provided many helpful comments after reading an early draft of this
manuscript. This work was funded, in part, by contract number 158911/I30 of The
Research Council of Norway.
PAPER VI: SEMI-UNIFORM ADAPTIVE PATCH TESSELLATION
Christopher Dyken, Martin Reimers, and Johan Seland
Submitted.
Abstract: We present an adaptive tessellation scheme for surfaces consisting of
parametric patches. The resulting tessellations are topologically uniform, yet consistent and watertight across boundaries of patches with different tessellation levels. Our scheme is simple to implement, requires little memory, and is well suited
for instancing, a feature available on current GPUs that allows a substantial performance increase. We describe how the scheme can be implemented effectively
and give performance benchmarks comparing it to other standard approaches.
1 INTRODUCTION
Higher-order primitives like Bézier patches can model complex and smooth shapes parameterized by relatively few control points, and are easier to manipulate than triangular meshes of corresponding fidelity. However, since current graphics hardware (GPUs) usually cannot handle higher-order primitives directly, these must be tessellated into triangle meshes before rendering. A standard approach is to tessellate the parameter
domain of the patch, map the vertices of this triangulation to the patch surface, and
render the resulting triangulation. A uniform tessellation results if each patch is tessellated in the same way. Otherwise, one can use an adaptive approach with varying
tessellation levels. A complication with adaptive methods is that the tessellations of
adjoining patches must match edge-by-edge to guarantee watertight rasterization [80].
The performance of GPUs relies heavily on massive parallelism, and thus, to achieve good GPU utilization, geometry must be grouped into large batches. Instancing is a feature of DX10-class GPUs that drastically increases batch sizes by letting multiple instances of an object be rendered with a single draw call. A patch tessellation
approach as described above is well suited for instancing if the number of different
tessellations is low and patches with identical tessellations are easily grouped.
We present an adaptive patch tessellation scheme based on dyadic tessellations of
the triangular parameter domain. To make adjacent patches meet edge-by-edge, we
use a snap function that in one step moves boundary vertices and collapses triangles in a consistent manner, which can be interpreted as a series of edge-collapses [43]. This way, we can efficiently produce adaptive and topologically consistent tessellations from a limited set of source tessellations. We also show how the patches can be grouped according to tessellation level on the GPU into render queues. Combined, this provides a pure GPU rendering pipeline for adaptive tessellation with efficient use of instancing.

Figure VI.1: The triangles of a bump-mapped sphere are adaptively refined and displaced along the silhouette, giving the impression of high geometric complexity.
After a brief account of previous work in the next section, we discuss uniform tessellations and our new semi-uniform adaptive approach in Section 3. In the subsequent
section we elaborate on implementation and describe a pure GPU pipeline suitable for
current graphics hardware. Section 5 is devoted to performance benchmarks, comparing a number of alternative implementations with and without instancing. We finish
the paper with a few final remarks.
2 RELATED WORK
Boubekeur and Schlick [8] propose to store a parameter space tessellation in a vertex buffer object (VBO) on the GPU, letting the vertex shader evaluate the patch at the vertices. Their approach was extended by adaptive refinement patterns [9], which store an atlas of tessellations covering all the different combinations of edge refinement levels in GPU memory, allowing patches with different refinement levels to meet edge-by-edge. However, the number of tessellations grows rapidly with the maximum tessellation level, which quickly consumes a significant amount of GPU memory and hinders efficient use of instancing.
GPU based adaptive tessellation has been demonstrated for subdivision surfaces [84, 11] and for grid based terrain models using Geometry Clipmaps [63]. These methods either yield tessellations with hanging nodes, or watertight tessellations obtained by inserting triangles.

Figure VI.2: The left figure shows the case where patches with different tessellation levels meet, resulting in numerous hanging nodes (red) along the boundaries. The center triangulation is the result of our method, which amounts to moving hanging nodes to the nearest vertex common to both tessellations. The close-up to the right illustrates how the red vertices are moved by the snap function φ as indicated by the arrows. The dashed edges are either collapsed or moved as a result.
We have earlier proposed an adaptive tessellation scheme based on dyadic uniform
tessellations with geometric continuity [22]. The present scheme is to some extent
similar but guarantees that tessellations are topologically consistent.
Moreton [67] discusses adaptive tessellations and crack removal strategies, and proposes a forward differencing scheme for Bézier patches, aimed at hardware implementation. The GPU of the Xbox 360 [3] game console contains a hardware tessellation unit, and it is likely that standard graphics hardware will incorporate such units in the future. The geometry shader of DX10-class hardware can in principle be used for tessellation. For reference, we have benchmarked a geometry shader-based implementation of our scheme.
3 SEMI-UNIFORM ADAPTIVE PATCH TESSELLATION
A parametric triangular patch F : P^0 → R^3 can be rendered by evaluating F at the vertices of a tessellation P of its triangular parameter domain P^0 ⊂ R^2. If the resulting tessellation T in R^3 is fine enough, it can be rendered as a good approximation of the patch itself. A dyadic tessellation of P^0 is a natural choice due to its uniformity and the fact that it is trivial to produce on the fly, as demonstrated by our geometry shader implementation. A dyadic refinement P^1 of P^0 can be obtained by inserting a vertex at the midpoint of each edge of P^0 and replacing each triangle with four new triangles.
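The refinement step just described can be sketched in a few lines (a Python illustration of the definition, not the paper's GPU code), operating on triangles of barycentric triples:

```python
from fractions import Fraction


def midpoint(a, b):
    return tuple((x + y) / 2 for x, y in zip(a, b))


def dyadic_refine(tris):
    """One dyadic refinement step: insert edge midpoints and replace
    each triangle with four new triangles."""
    out = []
    for a, b, c in tris:
        ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
        out += [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]
    return out


one, zero = Fraction(1), Fraction(0)
P0 = [((one, zero, zero), (zero, one, zero), (zero, zero, one))]
P1 = dyadic_refine(P0)   # P^1: four triangles
P2 = dyadic_refine(P1)   # P^2: sixteen triangles
assert len(P1) == 4 and len(P2) == 16
```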
Continuing the refinement procedure, we get at the m'th step a triangulation P^m of the m-dyadic barycentric points

    I^m = { (1/2^m)(i, j, k) : i, j, k ∈ N, i + j + k = 2^m }.
Note that dyadic tessellations of P^0 are nested in the sense that I^m ⊂ I^{m+1}, i.e. a vertex of a coarse tessellation is also a vertex of its refinements. A dyadic tessellation P^m of the patch parameter domain yields a corresponding tessellation T^m of the patch itself, consisting of the triangles [F(u_i), F(u_j), F(u_k)] for triangles [u_i, u_j, u_k] of P^m. This approach lends itself naturally to the use of VBOs and vertex shader programs for patch evaluations.
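The point sets I^m are easy to enumerate directly from the definition; the following Python sketch (ours, not part of the paper's code) also verifies the nesting property on small examples:

```python
from fractions import Fraction


def dyadic_points(m: int):
    """The m-dyadic barycentric points: (i, j, k) / 2^m with i + j + k = 2^m."""
    n = 1 << m
    return {(Fraction(i, n), Fraction(j, n), Fraction(n - i - j, n))
            for i in range(n + 1) for j in range(n + 1 - i)}


I2, I3 = dyadic_points(2), dyadic_points(3)
assert len(I2) == 15   # (2^m + 1)(2^m + 2)/2 points for m = 2
assert I2 <= I3        # nesting: every vertex of I^2 is also in I^3
```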
We are interested in the case where we have a number of triangular patches that
meet with geometric continuity, i.e. with common boundaries. We wish to construct
one tessellation for each patch such that the individual tessellations are sufficiently
fine. Allowing this form of adaptivity we are faced with the problem of making the
tessellations of neighboring patches compatible. One approach is to add triangles on
the patch boundaries in order to fill holes in the tessellations, resulting in a slightly
more complex mesh than in the case of uniform tessellations. Another approach is to
let two tessellations meet with geometric continuity [22, 63]. However, this results in
hanging nodes as illustrated in Figure VI.2 (left), which can result in artifacts such
as drop-out pixels. To guarantee a watertight tessellation one must ensure that the
patch tessellations are topologically consistent, i.e. that adjacent triangles share the
end-points of their common edge.
Consider dyadic tessellations as the ones in Figure VI.2 (left), with neighboring patches of different tessellation levels and thus a number of hanging nodes. Our approach is to move each hanging node to the nearest dyadic barycentric point shared by the neighboring tessellations, as illustrated in Figure VI.2 (center). This results in a new tessellation that is uniform in the interior and topologically consistent with neighboring tessellations, although with degenerate triangles. Since the resulting mesh is in fact still topologically uniform, it can be rendered using a VBO corresponding to a dyadic tessellation P^m. Degenerate triangles pose no problems as they are quickly discarded in the rendering process. We next discuss the details of our approach.

We consider a single patch to be tessellated at level m and denote by p_0, p_1, p_2 the tessellation levels of its neighboring patches. In order to remove hanging nodes we define a dyadic snap function φ : I^m → I^m which maps u = (u_0, u_1, u_2) ∈ I^m to the nearest dyadic barycentric point consistent with neighboring tessellations, illustrated
in Figure VI.2 (right). More precisely,

    φ(u) = (u_0, u_1, u_2)                               if u_0, u_1, u_2 ≠ 0;
    φ(u) = (σ_{p_i}(u_0), σ_{p_i}(u_1), σ_{p_i}(u_2))    if u_i = 0,

where σ_p maps a real value to the nearest p-dyadic number,

    σ_p(t) = (1/2^p) ⌈2^p t − 1/2⌉    if t < 1/2;
    σ_p(t) = (1/2^p) ⌊2^p t + 1/2⌋    otherwise,         (VI.1)

breaking ties towards 0 if t < 1/2 and towards 1 otherwise. Thus, for a tie with u equally distant from two p_i-dyadic points, φ snaps towards the closest corner vertex of P^0, yielding some degree of symmetry.
Let us verify that φ is well defined and works as required. If u_0, u_1, u_2 ≠ 0, then φ(u) = u, i.e. interior vertices of P^m are preserved. Since σ_p(i) = i for all integers i and p, the corners of P^0 are left unchanged by φ. This also implies that φ is well defined even if two of the coordinates (u_0, u_1, u_2) are zero. Suppose now that u ∈ I^m is a boundary vertex, i.e. with some u_i = 0. Since σ_p(t) + σ_p(1 − t) = 1 for any integer p and real value t, we get σ_{p_i}(u_0) + σ_{p_i}(u_1) + σ_{p_i}(u_2) = 1 and hence φ(u) ∈ I^{p_i}. Note that this holds for all integers p_i. Therefore φ(u) ∈ I^m and, in the case u is on the i'th boundary edge, φ(u) ∈ I^{p_i}. In conclusion, φ preserves interior vertices and snaps a hanging boundary vertex to the closest dyadic barycentric point at the required tessellation level.
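To make σ_p and φ concrete, here is a Python sketch (ours, using exact rational arithmetic rather than the paper's shader code) together with checks of the properties used above:

```python
from fractions import Fraction
from math import ceil, floor


def sigma(p, t):
    """Snap t to the nearest p-dyadic number; ties go toward 0 for
    t < 1/2 and toward 1 otherwise, as in equation (VI.1)."""
    s = (1 << p) * t
    q = ceil(s - Fraction(1, 2)) if t < Fraction(1, 2) else floor(s + Fraction(1, 2))
    return Fraction(q, 1 << p)


def phi(u, p):
    """Snap barycentric u = (u0, u1, u2); p[i] is the tessellation level
    of the neighbour across the boundary edge where u_i = 0."""
    for i in range(3):
        if u[i] == 0:
            return tuple(sigma(p[i], c) for c in u)
    return u  # interior vertices are preserved


# sigma_p(t) + sigma_p(1 - t) = 1, so snapped boundary points stay barycentric
t = Fraction(3, 8)
assert sigma(1, t) + sigma(1, 1 - t) == 1
# a hanging node on edge 0 of an I^3 tessellation snaps into I^1
u = (Fraction(0), Fraction(3, 8), Fraction(5, 8))
v = phi(u, (1, 1, 1))
assert sum(v) == 1 and all(c.denominator <= 2 for c in v)
```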
Applying φ to all the vertices of P^m, we obtain a corresponding planar tessellation of P^0,

    P^m_{p_0 p_1 p_2} = { [φ(u_i), φ(u_j), φ(u_k)] : [u_i, u_j, u_k] ∈ P^m },

with vertices I^m_{p_0 p_1 p_2} = {φ(u) : u ∈ I^m} ⊆ I^m. A triangle in P^m_{p_0 p_1 p_2} is degenerate if two of its vertices are identical. The remaining triangles cannot fold over, and since φ preserves the order of the boundary vertices, P^m_{p_0 p_1 p_2} is a valid tessellation of P^0 if we ignore the degenerate triangles.

Our choice of snap function minimizes the problem of long, thin triangles for a given set of vertices, since φ always maps a boundary point to the closest boundary point in I^m_{p_0 p_1 p_2}. It can be shown that if P^0 is an equilateral triangle, then P^m_{p_0 p_1 p_2} is a Delaunay triangulation of I^m_{p_0 p_1 p_2}.
4 IMPLEMENTATION
In this section we describe a pure GPU implementation of the tessellation scheme on DX10-class hardware. The performance benchmarks in the next section compare this approach to CPU-assisted alternatives on older hardware.
[Figure VI.3 diagram: Start new frame → Calculate refinement level → Build refinement HistoPyramids (HP 1 … HP M) → Build render queues (RQ 1 … RQ M) → Render tessellation level 0 and, via instancing from per-level VBOs (levels 1 … M), tessellation levels 1 … M, applying F ◦ φ using the patch coefficients and the patch tessellation level texture.]
Figure VI.3: A schematic view of the implementation. The thick arrows designate control flow,
while the data flow is drawn in green using dashed arrows. Static data is drawn in red. Instancing
is indicated by the ⊗-symbol.
Conceptually, an implementation of the scheme is straightforward, and the snap function can be realized in a few lines of shader code, see Listing 1. For simplicity we assume the input patches are static, but this is not a requirement. Given an input mesh of patches and a maximum refinement level M, we use some predicate to associate a refinement level 0 ≤ l_e ≤ M with every edge, e.g. silhouetteness [22], patch curvature, or the distance from the camera. The patch tessellation level m is the maximum of the integer edge tessellation levels ⌈l_e⌉. We then issue the rendering of P^m, applying the snap function φ and the patch evaluator F to calculate the final vertex positions. Any parametric patch evaluator can be used, such as subdivision patches [87] or PN-triangles [92]. Similar to adaptive refinement patterns [9], the tessellations only handle integer refinement levels. However, continuous refinement levels can be accommodated by using a blending scheme like adaptively blended Bézier patches [22].
For better performance, we organize the calculations as in Figure VI.3. We first
calculate the tessellation levels for all input patches. Thereafter we group the patches
according to tessellation level, using the HistoPyramid [95] data compaction algorithm.
For each m = 1, . . . , M we extract a render queue, consisting of all patches with tessellation level m. Finally, we issue all P^m-patches in a single draw call using instancing,
applying F ◦ φ to the vertices using a vertex shader.
4.1 BUILDING THE RENDER QUEUES
For each frame, the first step is to determine the tessellation level of each patch. This is
typically done in a GPGPU-pass, resulting in a patch tessellation level texture. Then,
// levels and mask are per-patch data (neighbour edge levels and corner mask)
vec4 p = gl_Vertex;                       // barycentric coords in xyz, vertex tag in w
vec4 m = vec4( equal( floor(p.wwww), vec4(0,1,2,3) ) ); // one-hot: edge 0-2 or interior (w)
float l = dot( levels, m );               // tessellation level of the neighbouring patch
if( m.w == 0.0 ) {                        // boundary vertex: snap along the edge
    float s = exp2( ceil(l) );
    float t = (1.0/s)*floor( s*dot( p.xyz, m.xyz ) + fract(p.w) );
    p.xyz = mix( mask.zxy, mask.xyz, t );
}
Listing 1: GLSL implementation of φ.
for each m = 1, . . . , M we build a HistoPyramid, setting the base level of the pyramid to one if the patch is of level m and zero otherwise. The upper levels of the HistoPyramid are built by a sequence of GPGPU-passes. Finally, we trigger an asynchronous read-back of the top elements of the HistoPyramids to the CPU, a total of M integers, which give the number of patches in each render queue. These numbers are needed by the CPU to control subsequent passes, e.g. the number of instances to produce of a given tessellation.
Then the render queues are built, laid out in a single texture. This is done using a GPGPU-pass that linearly enumerates the elements and determines which render queue each element belongs to by inspecting the top elements of the HistoPyramids. Then the HistoPyramid is traversed, terminating at the corresponding patch. The patch index is stored in the render queue, along with the tessellation levels of its neighbors.
This GPU-based bucket sort algorithm can be used to group geometry for other purposes as well, e.g. to sort geometry into layers to control the order of pixel overwrites in scenes with transparent geometry. When supported, the render queue construction could
possibly benefit from implementation in a data parallel programming environment like
Nvidia CUDA, using a standard sorting routine.
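The queue construction can be sketched serially. The Python below is our CPU analogue of the GPGPU passes (helper names are ours): it builds a 1D reduction pyramid over a 0/1 base (length padded to a power of two) and extracts the k-th flagged patch by top-down traversal, which is exactly the bucket-sort step described above:

```python
def build_histopyramid(base):
    """Reduction pyramid over a 1D base of 0/1 flags (power-of-two length)."""
    levels = [base]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([prev[2 * i] + prev[2 * i + 1] for i in range(len(prev) // 2)])
    return levels  # levels[-1][0] is the total count (the CPU read-back)


def extract(levels, k):
    """Find the index of the k-th set flag by top-down traversal."""
    idx = 0
    for lvl in reversed(levels[:-1]):
        idx *= 2
        if k >= lvl[idx]:    # skip the left child's count
            k -= lvl[idx]
            idx += 1
    return idx


patch_levels = [2, 1, 2, 2, 1, 3, 2, 1]   # tessellation level per patch
hp = build_histopyramid([int(l == 2) for l in patch_levels])
queue = [extract(hp, k) for k in range(hp[-1][0])]
assert queue == [0, 2, 3, 6]              # patches tessellated at level 2
```

One such pyramid per tessellation level m yields the M render queues; on the GPU each extraction runs in parallel, one thread per output slot.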
4.2 RENDERING
The geometry is rendered in two stages, one for unrefined patches and one for the
refined patches. The input patches are stored in a static VBO, with the patch index
encoded. We render this VBO using a vertex shader that degenerates all patches tagged
for refinement in the patch tessellation level texture.
We then render the patches in the render queues. For each tessellation level m we have a parameter space tessellation P^m stored in a static VBO [8]. The xyz-coordinates of gl_Vertex contain the barycentric coordinates, and the integer part of w specifies whether the vertex is in the interior or on an edge. Since the first case of (VI.1) can be exchanged with ⌊2^p t + 1/2 − ε⌋ for a sufficiently small ε > 0, we let the fractional part of w contain 1/2 or 1/2 − ε, and thus avoid run-time conditionals in σ_p. Listing 1 shows our GLSL implementation of φ. We bind the render queue as a texture and trigger the given number of instances of P^m. The vertex shader fetches the neighboring tessellation levels and patch index from the render queue using the instance ID, and then uses the patch index to fetch the per-patch data. We get the final vertex position by applying φ and F, producing the evaluated patch at the primitive assembly stage. Thus, any regular fragment shader can be used without modification.
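The replacement of the ceiling by a floor with an ε-offset can be checked numerically. The Python sketch below (ours) compares the branching definition of σ_p against the branch-free floor form, with the offset 1/2 or 1/2 − ε chosen per input exactly as the fractional part of w would encode it:

```python
from math import ceil, floor


def sigma_ref(p: int, t: float) -> float:
    """Equation (VI.1): snap t to the nearest p-dyadic number,
    breaking ties toward 0 below 1/2 and toward 1 otherwise."""
    s = (1 << p) * t
    q = ceil(s - 0.5) if t < 0.5 else floor(s + 0.5)
    return q / (1 << p)


# Branch-free variant: the tie-breaking direction is baked into a
# per-vertex offset (stored in fract(w) in the paper's VBO layout).
eps = 2.0 ** -14          # assumed small enough for the dyadic inputs below
p, m = 2, 5
for i in range(2 ** m + 1):       # every m-dyadic coordinate t = i / 2^m
    t = i / 2 ** m
    offset = 0.5 - eps if t < 0.5 else 0.5
    assert floor((1 << p) * t + offset) / (1 << p) == sigma_ref(p, t)
```

Since the offset is fixed per vertex at VBO build time, the shader only evaluates a single floor expression at run time.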
4.3 OPTIMIZATIONS
The description above gives a simplified view of the implementation. We reduce the
number of GPGPU-passes by storing the set of HistoPyramids in a single mipmapped 2D
texture array. In this way, all HistoPyramids can be bound to a single texture sampler
when building the render queues, and they are built in parallel using multiple render
targets.
Per-patch data, such as shading normals and coefficients, can be uploaded through
bindable uniforms instead of buffer textures. The rationale behind this is that it is faster
to have data that is constant for numerous vertices stored in constant memory instead
of (cached) textures. We store the extra data in addition to the edge refinement levels
in the render queue, increasing the storage requirement to 40 floats per patch. The
render queues are built using transform feedback. A restriction of bindable uniforms
is that the amount of constant memory is limited. This limits the batch size to roughly
400 patches, resulting in more draw calls. The performance effect of this approach is
described in the following section.
5 PERFORMANCE ANALYSIS
We have performed a series of performance benchmarks on the implementation described in the last section, as well as a few other variations of it, for two different GPU generations. The benchmark consists of measuring the average framerate for a series of surfaces with a randomly varied viewpoint. The results are presented in Table VI.1. For the benchmarks, we have used the silhouetteness criterion [22] to determine the patch tessellation level and PN-triangles [92] as the patch evaluator.

Method A is a brute force approach that, as a preprocess, tessellates all patches to level M and stores the result in a static VBO. This VBO is rendered every frame, which should be the fastest approach as long as the resulting VBO fits in GPU memory. Method B is a uniform refinement method that uses a single parameter space tessellation (of level M) that is invoked for every patch [8].
Method C is our earlier method, which results in hanging nodes [22]. Method D is Method C augmented with φ, to show the performance hit of the snap function.

A: Single static VBO.
B: Uniform refinement [8], per patch triggering.
C: Adaptive refinement [22], per patch triggering.
D: Algorithm C using semi-uniform tessellation.
E: Semi-uniform tessellation with CPU render queues and patch data through textures.
F: Algorithm E with patch data through bindable uniforms.
G: Semi-uniform tessellation with GPU render queues and patch data through textures.
H: Algorithm G with patch data through bindable uniforms.
I: Semi-uniform tessellation using the geometry shader.

Table VI.1: Performance in frames per second for the algorithms, for mesh sizes from 200 to 96966 patches and maximum tessellation levels 3 and 5, on a GeForce 7800 GT (methods A–D) and a GeForce 8800 GT (methods A–I). Algorithms E–I are only available on DX10-class GPUs. [The numeric entries of the table did not survive text extraction.]
Method E is method D, but uses instancing to draw the geometry. Instead of triggering
rendering of the VBOs for each individual triangle, the CPU builds the render queues
using memory-mapped buffer objects. Then, each queue is bound to a buffer texture,
and rendered in one batch using instancing. Method F is identical to Method D except
that the render queues are exposed to the vertex shader through bindable uniforms as
outlined in Section 4.3.
Methods G and H are the pure GPU implementations described in Section 4. Method G builds a single texture with all render queues in a GPGPU-pass. Method H exposes the per-patch data through bindable uniforms.
For reference, Method I is semi-uniform tessellation implemented as a single geometry shader program. The triangle adjacency primitive is used to stream the base triangles, along with their direct neighbours, into the geometry shader, which first calculates the silhouetteness of the edges. If none of the edges dictates a refinement, the triangle is passed directly through. Otherwise, a uniform dyadic tessellation is produced on the fly and F ◦ φ is applied to every emitted vertex.
Since methods A and B render the same number of triangles, their relative performance gives some information on the cost associated with triggering a VBO per patch and evaluating F in the vertex shader. It is clear that a large pre-evaluated VBO is much faster than evaluating the vertices every frame, but it can be prohibitively expensive with respect to memory usage. We observe that there is very little difference between C and D on similar hardware, indicating that applying φ has a negligible overhead.
We observe that introducing adaptivity (C–H) outperforms method B for reasonably sized meshes, due to the reduced number of vertices processed. Instancing (E–H) gives a considerable performance increase, and for larger meshes even outperforms using the static VBO (A). For mesh sizes of about 1000 patches and above, sorting on the GPU (G and H) is faster than using the CPU for sorting (E and F). We believe that adaptive refinement patterns [9] would have performance comparable to Method C.
The geometry shader implementation (I) is the simplest adaptive method to implement. However, it consistently has the worst performance, probably because the hardware does not know in advance the number of output primitives per patch, which makes load balancing difficult. Also, the geometry upscale capabilities are insufficient for refinement levels greater than three, which limits the applicability of the approach.
When it comes to the different methods for passing the patch coefficients, the results are less clear. For M = 3, our finding is that the fastest approach is to use texture buffers. For M = 5, we observe that it is faster to use bindable uniforms. We presume this is because the patch data is more heavily accessed, so the fast constant memory outweighs the smaller batch sizes.
6 CONCLUSION
We have presented a scheme that modifies uniform tessellations by a dyadic snap function. The result is that patches of different tessellation levels are topologically consistent across boundaries. The snap function is easy to implement and incurs little overhead. Furthermore, the scheme can trivially be extended to other patch types, such as subdivision surfaces or a mix of triangular and polynomial patches.
The snap function is well suited for implementation in a vertex shader, and can
be incorporated in a rendering framework using a VBO for each tessellation level.
The tessellations are simple and can be generated on the fly without any triangulation
software. The number of VBOs scales linearly with the maximum tessellation level,
so the memory requirement is low. Furthermore the low number of VBOs makes it
possible to use instancing, yielding high performance as shown by our performance
analysis.
The advent of fast GPUs has favored regular meshes over small irregular meshes. However, our findings indicate that the increased flexibility of GPUs and GPGPU algorithms makes it beneficial to use adaptivity to reduce the total number of vertices, as long as enough uniformity can be preserved to use instancing.
7 ACKNOWLEDGMENTS
We thank Denis Kovacs and Ignacio Castaño for helpful comments in this work.
PAPER VII: HIGH-SPEED MARCHING CUBES USING HISTOGRAM PYRAMIDS
Christopher Dyken, Gernot Ziegler, Christian Theobalt, and Hans-Peter Seidel
Submitted.
Abstract: We present an implementation approach for Marching Cubes on graphics hardware for OpenGL 2.0 or comparable APIs. It currently outperforms all other known GPU-based iso-surface extraction algorithms in direct rendering for sparse or large volumes, even those using the recently introduced geometry shader capabilities. To achieve this, we outfit the HistoPyramid algorithm, previously only used in GPU data compaction, with the capability for arbitrary data expansion. After reformulating Marching Cubes as a data compaction and expansion process, the HistoPyramid algorithm becomes the core of a highly efficient and interactive Marching Cubes implementation. For graphics hardware lacking geometry shaders, such as mobile GPUs, the concept of HistoPyramid data expansion is easily generalized, opening new application domains in mobile visual computing. Further, to serve recent developments, we present how the HistoPyramid can be implemented in the parallel programming language CUDA, using a novel 1D chunk/layer construction.
1 INTRODUCTION
Iso-surfaces of scalar fields defined over cubical grids are essential in a wide range of
applications, e.g. medical imaging, geophysical surveying, physics, and computational
geometry. A major challenge is that the number of elements grows cubically with the sample density, and the massive amount of data puts tough requirements on processing power and memory bandwidth. This is particularly true for
applications that require interactive visualization of scalar fields. In medical visualization, for example, iso-surface extraction, as depicted in Figure VII.1, is used on a daily
basis. There, the user benefits greatly from immediate feedback in the delicate process
of determining transfer functions and iso-levels. In other areas such as geophysical
surveys, iso-surfaces are an invaluable tool for interpreting the enormous amounts of
measurement data.
Therefore, and not unexpectedly, there has been a lot of research on volume data
processing on Graphics Processing Units (GPUs), since GPUs are particularly designed
Figure VII.1: Determining transfer functions and iso-levels for medical data is a delicate process
where the user benefits greatly from immediate feedback.
for huge computational tasks with challenging memory bandwidth requirements, building on simple and massive parallelism instead of the CPU’s more sophisticated serial
processing. Volume ray-casting is one visualization technique for scalar fields that
has been successfully implemented on GPUs. While the intense computation for every
change in viewport can nowadays be handled, ray-casting can never produce an explicit
representation of the iso-surface. Such an explicit iso-surface is essential for successive
processing of the geometry, like volume or surface area calculations, freeform modeling, surface fairing, or surface-related effects for movies and games, such as the one
shown in Figure VII.2. In particular, two efficient algorithms for extracting explicit isosurfaces, Marching Cubes (MC) and Marching Tetrahedra (MT), have been introduced.
By now, substantial research effort has been spent on accelerating these algorithms on
GPUs.
In this paper, we present a novel, though well-founded, formulation of the Marching Cubes algorithm, suitable for any graphics hardware with at least Shader Model 3
(SM3) capabilities. This allows the implementation to run on a wide range of graphics hardware. Our approach extracts iso-surfaces directly from raw data without any pre-processing; dynamic datasets, changes in the transfer function, and variations in the iso-level are thus handled directly. It is able to produce a compact sequence of iso-surface
triangles in GPU memory without any transfer of geometry to or from the CPU. The
method requires only a moderate implementation effort and can thus be easily integrated into existing applications, while currently outperforming all other known GPU-based iso-surface extraction approaches. For completeness, we also propose how this
algorithm can be implemented in the general GPU programming language CUDA [15].
Figure VII.2: An iso-surface represented explicitly as a compact list of triangles (left) can be
visualized from any viewpoint (middle) and even be directly post-processed. One example for
such post-processing is the spawning of particles evenly over the surface (right). In all three
images, the GPU has autonomously extracted the mesh from the scalar field, where it is kept in
graphics memory.
The main element of our approach is the Histogram Pyramid [95] (short: HistoPyramid), which has been shown to be an efficient data structure for GPU stream compaction.
In this paper, we have extended the HistoPyramid to handle general GPU stream expansion. This simple, yet fundamental modification, together with a reformulation of
the MC algorithm as a stream compaction and expansion process, enables us to map
the MC algorithm onto the GPU.
We begin with an overview of previous and related work in Section 2, followed by a
description of HistoPyramids in Section 3. In Section 4, we describe the MC algorithm,
its mapping to HistoPyramid stream processing, and implementation details for both
the OpenGL and the CUDA implementations. After that, we provide a performance
analysis in Section 5, before we conclude in the final section.
2 PREVIOUS AND RELATED WORK
In recent years, iso-surface extraction on stream processors (like GPUs) has been
a topic of intensive research. MC is particularly suited for parallelization, as each
MC cell can be processed individually. Nevertheless, the number of MC cells is substantial, and some approaches employ pre-processing strategies to avoid processing of
empty regions at render time. Unfortunately, this greatly reduces the applicability of
the approach to dynamic data. Also, merging the outputs of the MC cells’ triangles
into one compact sequence is not trivial to parallelize.
Prior to the introduction of geometry shaders, GPUs completely lacked functionality to create primitives directly. Consequently, geometry had to be instantiated by
the CPU or be prepared as a vertex buffer object (VBO). Therefore, a fixed number of
triangles had to be assumed for each MC cell, and by making the GPU cull degenerate
primitives, the superfluous triangles could be discarded. Some approaches used MT
to reduce the amount of redundant triangle geometry, since MT never requires more
than two triangles per MT tetrahedron. In addition, the configuration of a MT tetrahedron can be determined by inspecting only four corners, reducing the amount of
inputs. However, for a cubical grid, each cube must be subdivided into at least five
tetrahedra, which usually makes the total number of triangles larger than for an
MC-generated mesh. Beyond that, tetrahedral subdivision of cubical grids introduces
artifacts [13].
Pascucci et al. [72] represent each MT tetrahedron with a quad and let the vertex
shader determine intersections. The input geometry consists of triangle strips arranged in a 3D space-filling curve, which minimizes the workload of the vertex shader.
The MT approach of Klein et al. [53] renders the geometry into vertex arrays, moving
the computations to the fragment shader. Kipfer et al. [52] improved upon this by letting edge intersections be shared. Buatois et al. [10] applied multiple stages and vertex
texture lookups to reduce redundant calculations. Some approaches reduce the impact
of fixed expansion by using spatial data-structures. Kipfer et al. [52], for example,
identify empty regions in their MT approach using an interval tree. Goetz et al. [32]
let the CPU classify MC cells, and only feed surface-relevant MC cells to the GPU,
an approach also taken by Johannson et al. [46], where a kd-tree is used to cull empty
regions. But they also note that this pre-processing on the CPU limits the speed of the
algorithm. The geometry shader (GS) stage of SM4 hardware can produce and discard
geometry on the fly. Uralsky [90] proposes a GS-based MT approach for cubical grids,
splitting each cube into six tetrahedra. An implementation is provided in the Nvidia
OpenGL SDK-10, and has also been included in the performance analysis.
Most methods could provide a copy of the iso-surface in GPU memory, using either
vertex buffers or the new transform feedback mechanism of SM4-hardware. However,
except for GS-based approaches, the copy would be littered with degenerate geometry,
so additional post-processing, such as stream compaction, would be needed to produce
a compact sequence of triangles.
Horn’s early approach [44] to GPU-based stream compaction uses a prefix sum
method to generate output offsets for each input element. Then, for each output element, the corresponding input element is gathered using binary search. The approach
has a complexity of O(n log n) and does not perform well on large datasets.
Prefix Sum (Scan) uses a pyramid-like up-sweep and down-sweep phase, where it
creates, in parallel, a table that associates each input element with output offsets. Then,
using scattering, the GPU can iterate over input elements and directly store the output
using this offset table. Harris [35] designed an efficient implementation in CUDA.
The Nvidia CUDA SDK 1.1 provides an MC implementation using Scan, and we have
included a highly optimized version in the performance analysis.
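The Scan-based compaction pattern described above, an exclusive prefix sum producing output offsets followed by a scatter, can be illustrated with a small CPU-side Python sketch. The function names are ours, for illustration only; they are not from [35] or the SDK:

```python
def exclusive_scan(counts):
    """Exclusive prefix sum: the output offset for each input element."""
    offsets, total = [], 0
    for c in counts:
        offsets.append(total)
        total += c
    return offsets, total

def compact(elements, keep):
    """Scatter the kept elements to their scanned offsets."""
    counts = [1 if keep(e) else 0 for e in elements]
    offsets, total = exclusive_scan(counts)
    out = [None] * total
    for e, c, o in zip(elements, counts, offsets):
        if c:
            out[o] = e
    return out
```

On the GPU, the scan is organized as the parallel up-sweep and down-sweep phases mentioned above, but the offset table and the scatter are the same.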
Ziegler et al. [95] have proposed another approach to data compaction. With the
introduction of HistoPyramids, data compaction can be run on the GPU of SM3 hardware. Despite a deep gather process for the output elements, the algorithm is surprisingly fast when extracting a small subset of the input data.
3 HISTOPYRAMIDS
The core component of our MC implementation is the HistoPyramid data structure,
introduced in [95] for GPU-based data compaction. We extend its definition here, and
introduce the concept of local key indices (below) to provide for GPU-based 1:m expansion of data stream elements for all non-negative m. The input is a stream of data
input elements, short: the input stream. Now, each input element may allocate a given
number of elements in the output stream. If an input element allocates zero elements
in the output stream, the input element is discarded and the output stream becomes
smaller (data compaction). On the other hand, if the input element allocates more than
one output element, the stream is expanded (data expansion). The input elements’ individual allocation is given by a user-supplied predicate function, which determines the output multiplicity for each input element. As a side note, in [95], each element
allocated exactly one output or none.
The HistoPyramid algorithm consists of two distinct phases. In the first phase, we
create a HistoPyramid, a pyramid-like data structure very similar to a MipMap. In the
second phase, we extract the output elements by traversing the HistoPyramid top-down
to find the corresponding input elements. In the case of stream expansion, we also
determine which numbered copy of the input element we are currently generating.
3.1 CONSTRUCTION
The first step is to build the HistoPyramid, a stack of 2D textures. At each level, the
texture size is a quarter of the size of the level below, i.e. the same layout as the MipMap
pyramid of a 2D texture. We call the largest texture, in the bottom of the stack, the base
texture, and the single texel of the 1 × 1 texture in the top of the stack the top element.
Figure VII.3 shows the levels of a HistoPyramid, laid out from left to right. The texel
count in the base texture is the maximum number of input elements the HistoPyramid
can handle. For simplicity, we assume that the base texture is square and the side length
a power of two (arbitrary sizes can be accommodated with suitable padding).
In the base level, each input element corresponds to one texel. This texel holds
the number of allocated output elements. In Figure VII.3 we have an input stream of
16 elements, laid out from left to right and top to bottom. Thus, elements number
0, 1, 3, 4, 6, 11, and 12 have allocated one output element each (stream pass-through).
Element number 9 has allocated two output elements (stream expansion), while the rest
of the elements have not allocated anything (stream compaction). These elements will
be discarded. The number of elements to be allocated is determined by the predicate
function at the base level. This predicate function may also map the dimension of the
input stream to the 2D layout of the base level. In our MC application, the input stream
is a 3D volume.
The next step is to build the rest of the levels from the bottom up, level by level. According to the MipMap principle, each texel in a level corresponds to four texels in the
level below. In contrast to the averaging used in the construction of MipMaps, we sum
the four elements. Thus, each texel receives the sum of the four corresponding elements in the level below. The example in Figure VII.3 illustrates this process. The sum
of the texels in the 2 × 2 block in the upper left of the base level is three, and stored in
the upper left texel of Level 1. The sum of the texels in the single 2×2 block of Level 1
is nine, and stored in the single texel of Level 2, the top element of the HistoPyramid.
At each level, the computation of a texel depends only on four texels from the
previous one. This allows us to compute all texels in one level in parallel, without any
data inter-dependencies.
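The bottom-up build can be sketched in plain Python. This is a CPU-side illustration of the reduction principle only, not the GPU implementation; the function name is ours:

```python
def build_histopyramid(base):
    """Bottom-up HistoPyramid build: base is a 2^n x 2^n grid of per-element
    output counts; returns all levels, from the base (index 0) to the 1x1 top."""
    levels = [base]
    while len(levels[-1]) > 1:
        prev, half = levels[-1], len(levels[-1]) // 2
        # each texel is the sum (not the average, as in MipMaps) of the
        # corresponding 2x2 block in the level below
        levels.append([[prev[2 * y][2 * x] + prev[2 * y][2 * x + 1]
                        + prev[2 * y + 1][2 * x] + prev[2 * y + 1][2 * x + 1]
                        for x in range(half)] for y in range(half)])
    return levels
```

Applied to the 4 × 4 base level of Figure VII.3, this yields [[3, 2], [3, 1]] for Level 1 and [[9]] for the top, matching the figure.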
3.2 TRAVERSAL
In the second phase, we generate the output stream. The number of output elements
is provided by the top element of the HistoPyramid. Now, to fill the output stream,
we traverse the HistoPyramid once per output element. To do this, we linearly enumerate the output elements with a key index k, and re-interpret HP values as key index
intervals. The traversal requires several variables: we let m denote the index of the top HP level. The traversal maintains a texture coordinate p and a current level l, referring to
one specific texel in the HistoPyramid. We further maintain a local key index kl , which
adapts to the local index range. It is initialized as kl = k. The traversal starts from
the top level l = m and goes recursively down, terminating at the base level l = 0.
During traversal, kl and p are continuously updated, and when traversal terminates, p
points to a texel in the base level. In the case of stream pass-through, kl is always zero
when traversal terminates. However, in the case of stream expansion, the value in kl
determines which numbered copy of the input element this particular output element
is.
Initially, l = m and p points to the center of the single texel in the top level. We
subtract one from l, descending one step in the HistoPyramid, and now p refers to the
center of the 2 × 2 block of texels in level m − 1 corresponding to the single texel p
pointed to at level m. We label these four texels in the following manner,
a b
c d
Figure VII.3: Bottom-up build process of the HistoPyramid, adding the values of four texels
repeatedly. The top texel contains the total number of output elements in the pyramid.
Figure VII.4: Element extraction, interpreting partial sums as intervals in top-down traversal.
Red traces the extraction of key index 4 and green traces key index 6.
Figure VII.5: The 15 basic predefined triangulations [62] for edge intersections (left). By symmetry, they provide triangulations for all 256 MC cases. Ambiguous cases are handled by adding
some extra triangulations [66]. An MC cell (right) where only f is inside the iso-surface, and thus
the edges (e, f ), (b, f ), and (f, h) intersect the iso-surface.
and use the values of these texels to form the four ranges A, B, C, and D, defined as
A = [0, a), B = [a, a + b), C = [a + b, a + b + c), and D = [a + b + c, a + b + c + d).
Then, we examine which of the four ranges kl falls into. If, for example, kl falls into
the range B, we adjust p to point to the center of b and subtract the start of the range,
in this case a, from kl , adapting kl to the local index range. We recurse by subtracting
one from l and repeating the process until l = 0, when the traversal terminates. Then,
from p we can calculate the index of the corresponding input stream element, and the
value in kl enumerates the copy.
Figure VII.4 shows two examples of HistoPyramid traversal. The first example,
labeled red, is for the key index k = 4, a stream pass-through. We start at level 2 and
descend to level 1. The four texels at level 1 form the ranges
A = [0, 3), B = [3, 5), C = [5, 8), D = [8, 9).
We see that kl is in the range B. Thus, we adjust the texture coordinate to point to the
upper right texel and adjust kl to the new index range by subtracting 3 from kl, which
leaves kl = 1. Then, we descend again to the base level. The four texels in the base
level corresponding to the upper right texel of level 1 form the ranges
A = [0, 0), B = [0, 1), C = [1, 2), D = [2, 2).
The ranges A and D are empty. Here, kl = 1 falls into C, and we adjust p and kl
accordingly. Since we are at the base level, the traversal terminates with p = [2, 1] and
kl = 0.
The second example of Figure VII.4, labeled green, is a case of stream expansion,
with key index k = 6. We begin at the top of the HistoPyramid and descend to level 1.
Again, the four texels form the ranges
A = [0, 3), B = [3, 5), C = [5, 8), D = [8, 9),
and kl falls into the range C. We adjust p to point to c and subtract the start of range C
from kl , resulting in the new local key index kl = 1. Descending, we inspect the four
texels in the lower left corner of the base level, which form the four ranges
A = [0, 0), B = [0, 2), C = [2, 3), D = [3, 3),
where kl now falls into range B, and we adjust p and kl accordingly. Since we are at the base level, we terminate the traversal with p = [1, 2]. The value kl = 1 implies that
output element 6 is the second copy of the input element from position [1, 2] in the
base texture.
The traversal only reads from the HistoPyramid. There are no data dependencies
between traversals. Therefore, the output elements can be extracted in any order —
even in parallel.
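The traversal can be sketched in plain Python as well. This is a CPU-side illustration only (the GPU version works on textures and normalized coordinates); the function name is ours:

```python
def traverse(levels, k):
    """Top-down HistoPyramid traversal for output element k.
    levels: list of 2D grids from the base (index 0) to the 1x1 top, where
    each texel holds the sum of the 2x2 block below it.
    Returns the base-level texel position (x, y) and the local key index kl,
    which enumerates the copy in the case of stream expansion."""
    x = y = 0
    kl = k
    for level in reversed(levels[:-1]):   # from just below the top to the base
        x, y = 2 * x, 2 * y               # the 2x2 block below the current texel
        a, b, c = level[y][x], level[y][x + 1], level[y + 1][x]
        # ranges: A = [0, a), B = [a, a+b), C = [a+b, a+b+c), D = the rest
        if kl < a:
            pass                          # descend into a
        elif kl < a + b:
            kl -= a; x += 1               # descend into b
        elif kl < a + b + c:
            kl -= a + b; y += 1           # descend into c
        else:
            kl -= a + b + c; x += 1; y += 1   # descend into d
    return (x, y), kl
```

With the HistoPyramid of Figure VII.3, traverse(levels, 4) yields ((2, 1), 0) and traverse(levels, 6) yields ((1, 2), 1), matching the red and green traces of Figure VII.4.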
3.3 COMMENTS
The 2D texture layout of the HistoPyramid fits graphics hardware very well. In normalized texture coordinates, the fetches at one level overlap with the fetches from the level below, which allows the 2D texture cache to assist HP traversal with memory prefetches and thus increases performance.
At each descent during traversal, we have to inspect the values of four texels, which
amounts to four texture fetches. However, since we always fetch 2 × 2 blocks, we can
use a four-channel texture and encode these four values as an RGBA value. This halves the size of all textures along both dimensions, and thus lets us build four times larger HistoPyramids within the same texture size limits. In addition, since we quarter the number of texture fetches, and graphics hardware is quite efficient at fetching four-channel RGBA values, this usually yields a speed-up. For more details, see vec4-HistoPyramids in [95].
Memory requirements of the HP are identical to those of a 2D MipMap-pyramid, i.e. 1/3 the size of the base level. Since the lower levels contain only small values, one could create a composite structure using UINT8 for the lowest levels and UINT16 for some levels in between, before using UINT32 for the top levels.
4 MARCHING CUBES
The Marching Cubes (MC) algorithm [62] of Lorensen and Cline is probably the most
commonly used algorithm for extracting iso-surfaces from scalar fields, which is why
we chose it as basis for our GPU iso-surface extraction. From a 3D grid of N × M × L
scalar values, we form a grid of (N −1) × (M −1) × (L−1) cube-shaped “MC cells”
in-between the scalar values such that each corner of the cube corresponds to a scalar
value. The basic idea is to “march” through all the cells one-by-one, and for each cell,
produce a set of triangles that approximates the iso-surface locally in that particular
cell.
It is assumed that the topology of the iso-surface inside an MC cell can be completely determined from classifying the eight corners of the MC cell as inside or outside the iso-surface. Thus, the topology of the local iso-surface can be encoded into an eight-bit integer, which we call the MC case of the MC cell. If any of the twelve edges of
the MC cell have one endpoint inside and one outside, the edge is said to be piercing
the iso-surface. The set of piercing edges is completely determined by the MC case
of the cell. For example, the MC cell on the right in Figure VII.5 has corner f inside and the rest of the corners outside. Encoding “inside” with 1 and “outside” with 0, we attain the MC case %00100000 in binary notation, or 32 in decimal. The three piercing edges of
the MC cell are (b, f ), (e, f ), and (f, h).
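The encoding is easy to check in a few lines of Python. The corner-to-coordinate assignment below is our own assumption for illustration (the paper's figure fixes the actual labeling); it is chosen so that, as in the text, corner f shares cube edges with b, e, and h:

```python
# corner labels a..h placed at cube corners (x, y, z); this assignment is an
# illustrative assumption chosen so that f's edge-neighbours are b, e, and h
CORNERS = {'a': (0, 0, 0), 'b': (1, 0, 0), 'c': (0, 1, 0), 'd': (1, 1, 0),
           'e': (0, 0, 1), 'f': (1, 0, 1), 'g': (0, 1, 1), 'h': (1, 1, 1)}
ORDER = 'abcdefgh'

# the twelve cube edges: corner pairs differing in exactly one coordinate
EDGES = [(p, q) for p in ORDER for q in ORDER if p < q and
         sum(u != v for u, v in zip(CORNERS[p], CORNERS[q])) == 1]

def mc_case(inside):
    """Encode the set of inside corners as an eight-bit MC case."""
    return sum(1 << ORDER.index(c) for c in inside)

def piercing_edges(inside):
    """Edges with exactly one endpoint inside the iso-surface."""
    return [(p, q) for p, q in EDGES if (p in inside) != (q in inside)]
```

With only f inside, mc_case yields 32, and the piercing edges are exactly (b, f), (e, f), and (f, h).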
For each piercing edge we determine the intersection point where the edge intersects the iso-surface. By triangulating these intersection points we attain an approximation of the iso-surface inside the MC cell, and with some care, the triangles of
two adjacent MC cells fit together. Since the intersection points only move along the
piercing edges, there are essentially 256 possible triangulations, one for each MC case.
From 15 basic predefined triangulations, depicted left in Figure VII.5, we can create
Figure VII.6: Assuming that edges pierce the iso-surface at the middle of an edge (left) and
using an approximating linear polynomial to determine the intersection (right).
triangulations for all 256 MC cases due to inherent symmetries [62]. However, some
of the MC cases are ambiguous, which may result in a discontinuous surface. Luckily,
this is easily remedied by modifying the triangulations for some of the MC cases [66].
On the downside, this also increases the maximum number of triangles emitted per
MC cell from 4 to 5.
Where a piercing edge intersects the iso-surface is determined by the scalar field
along the edge. However, the scalar field is only known at the end-points of the edge,
so some assumptions must be made. A simple approach is to position the intersection
at the midpoint of the edge; however, this choice leads to an excessively “blocky” appearance, see the left side of Figure VII.6. A better choice is to approximate the scalar
field along the edge with an interpolating linear polynomial, and find the intersection
using this approximation, as shown in the right half of Figure VII.6.
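For a piercing edge, the linear approximation gives a simple closed form: with scalar values s0 and s1 at the edge end-points and iso-level c, the intersection parameter is t = (c - s0)/(s1 - s0). A minimal Python sketch, with a helper name of our own choosing:

```python
def edge_intersection(p0, p1, s0, s1, iso):
    """Intersect the iso-surface with the edge (p0, p1), assuming the scalar
    field varies linearly from s0 at p0 to s1 at p1 along the edge."""
    t = (iso - s0) / (s1 - s0)   # well-defined: a piercing edge has s0 != s1
    return tuple(a + t * (b - a) for a, b in zip(p0, p1))
```

Fixing t = 0.5 instead would reproduce the midpoint rule shown in the left half of Figure VII.6.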
4.1 MAPPING MC TO STREAM AND HISTOPYRAMID PROCESSING
Our approach is to implement MC as a sequence of data stream operations, with the
input data stream being the cells of the 3D scalar field, and the output stream being a
set of vertices, forming the triangles of the iso-surface. The data stream operations are
executed via the HistoPyramid or, in one variant, the geometry shader, which compact
and expand the data stream as necessary.
Figure VII.7 shows a flowchart of our algorithm. We use a texture to represent the
3D scalar field, and the first step of our algorithm is to update this field. The 3D scalar
field can stem from a variety of sources: it may e.g. originate from disk storage, CPU memory, or simply be the result of GPGPU computations. For static scalar fields, this update is of course only needed once.
[Figure VII.7 flowchart: control flows from “Start new frame” through “Update scalar field”, “Build HP base”, “HP reduce” (once per level), and “Vertex count readback” to “Render geometry”; the data are the scalar field, vertex count, HistoPyramid, and triangulation table textures, the enumeration VBO, and the iso-level.]
Figure VII.7: A schematic view of the implementation. Thick arrows designate control flow, with blue boxes executing on the GPU and white boxes on the CPU. The dotted and dashed arrows represent data flow, with dotted arrows for fetches and dashed arrows for writes. Green boxes stand for dynamic data, red boxes for static data.
The next step is to build the HistoPyramid. We start at the base level. Our predicate
function associates each base level texel with one MC cell, and calculates the
corresponding 3D scalar field coordinates. Then, it samples the scalar field to classify
the MC cell corners. By comparing against the iso-level, it can determine which MC
cell corners are inside or outside the iso-surface. This determines the MC case of the
cell, and thus the number of vertices needed to triangulate this case. We store this value
in the base level, and can now proceed with HistoPyramid build-up for the rest of the
levels, as described in Section 3.1.
After HistoPyramid buildup has been completed, we read back the single texel at
its top level. This makes the CPU aware of the actual number of vertices required
for a complete iso-surface mesh. Dividing this number by three yields the number of
triangles.
As already mentioned, output elements can be extracted by traversing the HistoPyramid. Therefore, the render pass is fed with dummy vertices, enumerated with increasing key indices. For each input vertex, we use its key index to conduct a HistoPyramid
traversal, as described in Section 3.2. After the traversal, we have a texel position in
the base level and a key index remainder kl . From the texel position in the base texture,
we can determine the corresponding 3D coordinate, inverting the predicate function’s
3D to 2D mapping. Using the MC case of the cell and the local key index kl , we can
perform a lookup in the triangulation table texture, a 16 × 256 table where entry (i, j)
tells which edge vertex i corresponds to in a cell of MC case j. Then, we sample the
scalar field at the two end-points of the edge, determine a linear interpolant of the scalar
field along this edge, find the exact intersection, and emit the corresponding vertex.
In effect, the algorithm has transformed the stream of 3D scalar field values into a
stream of vertices, generated on the fly while rendering iso-surface geometry. Still, the
geometry can be stored in a buffer on the GPU if so needed, either by using transform
feedback buffers or via a render-to-vertex-buffer pass.
4.2 IMPLEMENTATION DETAILS
The actual implementation of our MC approach contains some noteworthy details, which we describe in this section.
We store the 3D scalar field using a large tiled 2D texture, known as a Flat 3D
layout [36], which allows the scalar field to be updated using a single GPGPU-pass.
Since the HistoPyramid algorithm performs better for large amounts of data, we use
the same layout for the base level of the HistoPyramid, allowing the entire volume to
be processed using one HistoPyramid.
We use a four-channel HistoPyramid, where the RGBA-values of each base level
texel correspond to the analysis of a tiny 2 × 2 × 1-chunk of MC cells. The analysis
begins by fetching the scalar values at the common 3 × 3 × 2 corners of the four MC
cells. We compare these values to the iso-value to determine the inside/outside state
of the corners, and from this determine the actual MC cases of the MC cells. The
MC case corresponds to the MC template geometry set forth by the Marching Cubes
algorithm. As it is needed in the extraction process, we use some of the bits in the base
level texels to cache it. To do this, we let the vertex count be the integer part and the
MC case the fractional part of a single float32 value. This is sound, as the maximum
number of vertices needed by an MC case is 15, and therefore the vertex count only
needs 4 of the 32 bits in a float32 value. This data co-sharing is only of relevance in the
base-level, and the fractional part is stripped away when building the first level of the
HistoPyramid. HistoPyramid texture building is implemented as consecutive GPGPU passes of reduction operations, as exemplified in “render-to-texture loop with custom
MipMap generation” [50], but instead of using one single framebuffer object (FBO) for
all MipMap levels, we use a separate FBO for each MipMap level, yielding a speedup
on some hardware. We retrieve the RGBA-value of the top element of the HistoPyramid to the CPU, as the sum of these four values is the number of vertices in the
iso-surface.
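The packing of the vertex count and MC case into one float32 value can be sketched as follows. This is a CPU-side Python illustration with hypothetical helper names; the round-trip is exact because 4 integer bits plus 8 fraction bits fit well within the 24-bit float32 significand:

```python
import struct

def pack_cell(vertex_count, mc_case):
    """Vertex count (0..15) in the integer part, MC case (0..255) scaled
    into the fractional part of a single float32 value."""
    value = vertex_count + mc_case / 256.0
    # round-trip through 32-bit storage, as a float32 texel would
    return struct.unpack('f', struct.pack('f', value))[0]

def unpack_cell(value):
    """Recover the vertex count and MC case from a packed texel value."""
    vertex_count = int(value)
    mc_case = round((value - vertex_count) * 256.0)
    return vertex_count, mc_case
```

Taking the integer part of such a value corresponds to stripping the fractional part away when building the first HistoPyramid level.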
Our SM3 variant uses the vertex shader to generate the iso-surface on the fly. Here,
rendering is triggered by feeding the given number of (dummy) vertices to the vertex
shader. The only vertex attribute provided by the CPU is a sequence of key indices,
streamed off a static vertex buffer object (VBO). Even though SM4-hardware provides
the gl_VertexID-attribute, OpenGL cannot initiate vertex processing without any
attribute data, and hence a VBO is needed anyway. For each vertex, the vertex shader
uses the provided key index to traverse the HistoPyramid, determining which MC cell
and which of its edges this vertex is part of. It then samples the scalar field at both
end-points of its edge, and uses its linear approximation to intersect with the edge.
The shader can also find an approximate normal vector at this vertex, which it does by
interpolating the forward differences of the 3D scalar field at the edge end-points.
Our SM4 variant of iso-surface extraction lets the geometry shader generate the vertices required for each MC cell. Here, the HistoPyramid is only used for data stream
compaction, discarding MC cells that do not intersect with the iso-surface. To this
purpose, we modified the predicate function to fill the HP base level with keep (1)
or discard (0) values, since no output cloning is necessary for vertex generation. After retrieving the number of geometry-producing MC cells from the top level of the
HistoPyramid, the CPU triggers the geometry shader by rendering one point primitive
per geometry-producing MC cell. For each primitive, the geometry shader first traverses the HistoPyramid and determines which MC cell this invocation corresponds
to. Then, based on the stored MC case, it emits the required vertices and, optionally,
their normals by iterating through the triangulation table texture. This way, the SM4
variant reduces the number of HistoPyramid traversals from once for every vertex of
each iso-surface triangle, to once for every geometry-relevant MC cell.
If the iso-surface is required for post-processing, the geometry can be recorded
directly as a compact list of triangles in GPU memory using either the new transform
feedback extension or a more traditional render-to-texture setup.
Algorithmically, there is no reason to handle the complete volume in one go, except for the moderate performance increase at large volume sizes that is typical for
HistoPyramids. Hence, the volume could also be tiled into suitable chunks, making the
memory impact of the HP small.
4.3 CUDA IMPLEMENTATION
Even though our method can be implemented using standard OpenGL 2.0, we have
noticed increased interest in performance behavior under the GPGPU programming
language CUDA. In the following section, we describe some additional insights from
porting our algorithm to CUDA. A thorough introduction to CUDA itself can be found
in [15].
At the core of our algorithm lies the HistoPyramid, a data structure based on 2D textures. Unfortunately, in the current release of CUDA, kernels cannot output data to a
2D texture without an intermediate device-device copy. Instead, the kernels write output to linear memory, which can in turn be bound directly to a 1D sampler. Therefore, we linearize the 2D structure using Morton-code, which preserves locality very
well. The Morton-code of a point is determined by interleaving the bits of the coordinate values.
Figure VII.8: HistoPyramids in CUDA: a chunk. Serialization of the HistoPyramid cells, aka Morton code, is shown in the cells’ lower left. If unknown, it can be constructed via the value one at each cell in the base level. Using this layout, four numbers that form intervals lie consecutively in memory. Red arrows (above) show construction of the linearized HistoPyramid, while the green arrows (below) show extraction of key index 6, as exemplified in Figure VII.4 and explained in Section 3.2.
Figure VII.8 shows a HistoPyramid with the Morton-code in the lower
left corners of the elements. To improve the locality between MipMap levels, we use chunks: HistoPyramids small enough that all their levels remain close in memory. These chunks are then organized into layers, where the top level of the chunks in one layer forms the base of the chunks in the next layer. For example, using chunks with 64 base-layer elements, one layer handles three levels of a comparable 2D HistoPyramid.
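The bit interleaving itself can be sketched with standard bit tricks; the following C helper (an illustrative sketch, not the code used in the paper) spreads the bits of each coordinate apart and merges them into a 2D Morton code:

```c
#include <stdint.h>

/* Spread the lower 16 bits of v apart, so that bit i of v ends up in
 * bit 2i of the result. */
static uint32_t part1by1(uint32_t v)
{
    v &= 0x0000ffffu;
    v = (v | (v << 8)) & 0x00ff00ffu;
    v = (v | (v << 4)) & 0x0f0f0f0fu;
    v = (v | (v << 2)) & 0x33333333u;
    v = (v | (v << 1)) & 0x55555555u;
    return v;
}

/* 2D Morton code: interleave the bits of x (even bit positions) and
 * y (odd bit positions). */
static uint32_t morton2d(uint32_t x, uint32_t y)
{
    return part1by1(x) | (part1by1(y) << 1);
}
```

Since four consecutive Morton codes differ only in their two lowest bits, the four children of a HistoPyramid element lie consecutively in memory, which is exactly the property the chunk layout exploits.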
Using this layout, we can link our data structures closely to the CUDA computation
concepts of grids, blocks, and threads. Each layer of chunks is built by one grid. Inside
the layer, each chunk is built with one block, using one thread per chunk base level
element. The block’s threads store the chunk base layer in global memory, but keep
a copy in shared memory. Then, the first quarter of the threads continue, building the
next level of the chunk from shared memory, again storing it in global memory with a copy in shared memory, and so on. Four consecutive elements are summed to form an
element in the next level, as shown by the red arrows in Figure VII.8. HP Chunk/Layer
Traversal is largely analogous to 2D texture-based traversal, as shown by the green
arrows in Figure VII.8. In addition, for each chunk traversed, we must jump to the
corresponding chunk in the layer below. Data extraction based on this traversal can be
carried out in CUDA by letting CUDA populate a VBO. Alternatively, by letting CUDA
store the layers of chunks in an OpenGL buffer object, HP Chunk/Layer structures can
be traversed in an OpenGL vertex shader.
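The construction and traversal can be illustrated on the CPU. The following C sketch (a serial, hypothetical stand-in for the CUDA kernels, with made-up counts) builds a single chunk with 16 base elements and two reduction levels, and extracts the base element for a given output key, in the same way as the green arrows in Figure VII.8:

```c
/* Build a tiny 1D HistoPyramid over 16 base counts: levels of 16, 4 and
 * 1 elements, where each parent is the sum of 4 consecutive children
 * (the red arrows in Figure VII.8). */
static void build_hp(const int base[16], int l1[4], int *top)
{
    *top = 0;
    for (int i = 0; i < 4; ++i) {
        l1[i] = base[4*i] + base[4*i+1] + base[4*i+2] + base[4*i+3];
        *top += l1[i];
    }
}

/* Traverse for output key k (the green arrows in Figure VII.8): descend
 * from the top, choosing at each level the child whose interval
 * contains k and subtracting the skipped intervals from k. Returns the
 * base element index; *local is the offset of k within that element. */
static int traverse_hp(const int base[16], const int l1[4], int k, int *local)
{
    int child = 0;
    while (k >= l1[child]) k -= l1[child++]; /* level 1 */
    int i = 4 * child;
    while (k >= base[i]) k -= base[i++];     /* base level */
    *local = k;
    return i;
}
```

A full implementation repeats the descent once per layer of chunks, jumping to the corresponding chunk in the layer below after each chunk is traversed.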
In effect, we have transformed the HistoPyramid 2D data structure into a novel
1D HistoPyramid layer/chunk-structure. In principle, the memory requirement of the
layer/chunk-structure is one third of the base layer size, just like for 2D MipMaps. But
since empty chunks are never traversed, they can even be completely omitted. This
way, the size of the input data only needs to be padded up to the number of base elements
in a chunk, which further reduces memory requirements. Furthermore, not all layers need chunks with full 32-bit values. MC produces at most 15 vertices per cell,
which allows us to use 8-bit chunks with 16 base level elements in the first layer, and a
layer of 16-bit chunks with 256 base level elements, before we have to start using 32-bit chunks. Thus, the flexibility of the layer/chunk-structure makes it easier to handle large datasets, quite possibly even out-of-core data.
We had good results using 64 elements in the chunk base layer. With this chunk
size, a set of 256³ elements can be reduced using four layers. Since the chunks’ cells
are closely located in linear memory, we get improved 1D cache utilization, both in theory and in practice.
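The storage-format bounds above are easy to check with a line of arithmetic; the helper below (illustrative only) bounds a chunk's top-level value given its base:

```c
/* Upper bound for a chunk's top-level value: base_elements elements,
 * each at most base_max. This is what determines the storage width
 * needed per layer. */
static long chunk_max(long base_max, long base_elements)
{
    return base_max * base_elements;
}
```

With at most 15 vertices per MC cell, an 8-bit first layer with 16 base elements tops out at 16 · 15 = 240 < 256, a 16-bit second layer with 256 base elements at 240 · 256 = 61440 < 65536, and four layers of 64-element chunks cover exactly 64⁴ = 256³ elements.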
5 PERFORMANCE ANALYSIS
We have performed a series of performance benchmarks on six iso-surface extraction
methods. Four are of our own design: the OpenGL-based HistoPyramid with extraction in the vertex shader (GLHP-VS) or in the geometry shader (GLHP-GS), and the CUDA-based HistoPyramid with extraction into a VBO using CUDA (CUHP-CU) or directly in the OpenGL vertex shader (CUHP-VS). In addition, we have
benchmarked the MT-implementation [90] from the Nvidia OpenGL SDK-10 (NVSDK10), where the geometry shader is used for compaction and expansion. For the
purpose of this performance analysis, we obtained a highly optimized version of the
Scan [35]-based MC-implementation provided in the Nvidia CUDA 1.1 SDK. This
optimized version (CUDA1.1+) is up to three times faster than the original version from
the SDK, which reinforces that intimate knowledge of the hardware is an advantage in
CUDA application development.
To measure the performance of the algorithms under various loads, we extracted
iso-surfaces out of six different datasets, at four different resolutions. The iso-surfaces
are depicted in Figure VII.9. The first three volumes, “Bunny”, “CThead”, and “MRbrain”, were obtained from the Stanford volume data archive [88], the “Bonsai” and
“Aneurism” volumes were obtained from volvis.org [93]. The analytical “Cayley” surface is the zero set of the function f(x, y, z) = 16xyz + 4(x + y + z) − 1, sampled over [−1, 1]³. While the algorithm is perfectly capable of handling dynamic volumes without modification, we have kept the scalar field and iso-level static to get consistent statistics; the full pipeline is, however, run every frame.
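The analytical benchmark field is straightforward to reproduce; the following C function (a sketch of the sampled field, not the benchmark code itself) evaluates the Cayley function, whose zero set is the extracted iso-surface:

```c
/* The Cayley field f(x,y,z) = 16xyz + 4(x + y + z) - 1, sampled over
 * [-1,1]^3; the iso-surface is its zero set (iso-level 0.0). */
static double cayley(double x, double y, double z)
{
    return 16.0 * x * y * z + 4.0 * (x + y + z) - 1.0;
}
```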
Table VII.2 shows the performance of the algorithms, given in million MC cells processed per second, capturing the throughput of each algorithm. In addition, the frames
per second, given in parentheses, captures the interactiveness on current graphics hardware. Since the computations per MC cell vary, we recorded the percentage of MC cells
that produce geometry. This is because processing an MC cell that intersects the iso-surface is heavier than processing one that does not. On average, each intersecting
MC cell produces roughly two triangles.
All tests were carried out on a single workstation with an Intel Core2 2.13 GHz
CPU and 1 GB RAM, with four different Nvidia GeForce graphics cards: a 128MB 6600GT, a 256MB 7800GT, a 512MB 8800GT-640, and a 768MB 8800GTX. Table VII.2
shows the results for the 7800GT and the 8800GTX, representing the SM3.0 and SM4.0
generations of graphics hardware. All tests were carried out under Linux, running the
169.04 Nvidia OpenGL display driver, except the test with NVSDK10, which was
carried out on MS Windows under the 158.22 OpenGL display driver.
Evaluation shows that the HistoPyramid algorithms benefit considerably from increasing amounts of volume data. This meets our expectations, since the HistoPyramid
is particularly suited for sparse input data, and in large volume datasets, large amounts
of data can be culled early in the MC algorithm. However, some increase in throughput
is also likely caused by the fact that larger chunks of data offer more opportunity for data-parallelism and require fewer GPU state changes (shader setup, etc.)
in relation to the data processed. This probably explains the (moderate) increase in
performance for the NVSDK10.
The 6600GT performs quite consistently at half the speed of the 7800GT, indicating that HistoPyramid buildup speeds are highly dependent on memory bandwidth, as
the 7800GT has twice the memory bandwidth of the 6600GT. The 8800GT performs
at around 90–100% of the speed of the 8800GTX, which is slightly faster than expected,
given it only has 70% of the memory bandwidth. This might be explained by the architecture improvements carried out along with the improved fabrication process that
differentiates the GT from the GTX. However, the HP-VS algorithm on the 8800GTX
is 10–30 times faster than on the 7800GT, peaking at over 1000 million MC cells processed per second. This difference cannot be explained by larger caches and improved
memory bandwidth alone, and shows the benefits of the unified Nvidia 8 architecture,
enabling radically higher performance in the vertex shader-intensive extraction phase.
The CUDA implementations are not quite as efficient as the GLHP-VS, running at
only 70–80% of its speed. However, if geometry must not only be rendered but also
stored (GLHP-VS uses transform feedback in this case), the picture changes. There,
CUHP-CU is at least as fast as GLHP-VS, and for dense datasets up to 60% faster than
our reference. CUHP-VS using transform feedback, however, is consistently slower
than the GLHP-VS with transform feedback, indicating that the 1D chunk/layer-layout
is not as efficient as the MipMap-like 2D layout.
PAPER VII: HIGH-SPEED MARCHING CUBES
The geometry shader approach, GLHP-GS, has the theoretical advantage of reducing the number of HistoPyramid traversals to roughly one sixth of the vertex shader
traversal in GLHP-VS. Surprisingly, in practice we observed a throughput that is four
to eight times lower than for GLHP-VS, implying that the data amplification rate of the
geometry shader cannot compete with the HistoPyramid, at least not in this application.
It seems as if the overhead of this additional GPU pipeline stage is still considerably
larger than the partially redundant HistoPyramid traversals. Similarly, the NVSDK10 approach shows relatively mediocre performance compared to GLHP-VS. But this picture is likely to change with the improved geometry shaders of future hardware generations.
The CUDA1.1 approach uses two successive passes of scan. The first pass is a
pure stream compaction pass, culling all MC cells that will not produce geometry. The
second pass expands the stream of remaining MC cells. The advantage of this two-pass approach is that it enables direct iteration over the geometry-producing voxels,
and this avoids a lot of redundant fetches from the scalar field and calculations of
edge intersections. The geometry-producing voxels are processed homogeneously until
the final step where the output geometry is built using scatter write. This approach
has approximately the same performance as our CUDA implementation for moderately
dense datasets, and slightly worse for sparse datasets, where the HistoPyramid excels.
We also experimented with various detail changes in the algorithm. One was to
position the vertices at the edge midpoints, removing the need for sampling the scalar
field in the extraction pass, as mentioned in Section 4. In theory, this should increase
performance, but experiments show that the speedup is marginal and visual quality
drops drastically; see Figure VII.6. In addition, we benchmarked performance with
different texture storage formats, including the new integer storage format of SM4.
However, it turned out that the storage type still has relatively little impact in this
hardware generation. We therefore omitted these results to improve readability.
6 CONCLUSION AND FUTURE WORK
We have presented a fast and general method to extract iso-surfaces from volume data
running completely on the GPU of OpenGL 2.0 graphics hardware. It combines the
well-known MC algorithm with novel HistoPyramid algorithms to handle geometry
generation. We have described a basic variant using the HistoPyramid for stream compaction and expansion, which works on almost any graphics hardware. In addition,
we have described a version using the geometry shader for stream expansion, and one
version implemented fully in CUDA. Since our approach does not require any preprocessing and simply re-generates the mesh constantly from the raw data source, it
can be applied to all kinds of dynamic data sets, be it analytical, volume streaming,
or a mixture of both (e.g. an analytical transfer function applied to a static volume, or volume processing output).

Figure VII.9: The iso-surfaces used in the performance analysis, along with the actual iso-values used in the extraction process: Bunny (iso=512), CThead (iso=512), MRbrain (iso=1539), Bonsai (iso=36), Aneurism (iso=11), and Cayley (iso=0.0).
We have conducted an extensive performance analysis on both versions, and set
them in contrast to the MT implementation of the Nvidia SDK-10, and an optimized
version of the CUDA-based MC implementation provided in the CUDA SDK 1.1.
At increasing data sizes, our algorithms outperform all other known GPU-based iso-surface extraction algorithms. Surprisingly, the vertex-shader-based variant of the algorithm (GLHP-VS) is also the fastest on recent DX10-class hardware, even though it
does not use any geometry shader capabilities.
In direct comparison, Scan and HistoPyramids have some similarities (the Scan upsweep phase and the HistoPyramid construction are closely related), while the difference lies in the extraction process. Scan has the advantage that only one table lookup is
needed, as long as scatter-write is available. For HistoPyramids, each output element
extraction requires a log(n)-traversal of the HistoPyramid. Despite that algorithmic
complexity, the HistoPyramid algorithm can utilize the texture cache very efficiently,
reducing the performance hit of the deeper traversal. A second difference is that Scan’s
output extraction iterates over all input elements and scatters the relevant ones to output, while HistoPyramid iterates on the output elements instead. Scan uses two passes
to minimize the impact of this disadvantage for the MC application, and often succeeds.
However, if a lot of the input elements are to be culled, which is the case with MC for
larger and sparse volumes, the HistoPyramid algorithms can play out their strengths,
despite the deep gathering traversal.
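This contrast can be made concrete with a serial sketch of Scan-style extraction (hypothetical code, not the CUDA1.1+ implementation): an exclusive prefix sum over the per-cell output counts gives every cell its output offset, after which each input cell scatter-writes its outputs; one table lookup per cell, but a visit to every input element.

```c
/* Serial sketch of Scan-style extraction: an exclusive prefix sum over
 * per-cell output counts gives each cell its output offset, then every
 * input cell scatter-writes its outputs. The HistoPyramid instead
 * iterates over the output keys and gathers. */
static int extract_by_scan(const int count[], int n, int cell_of_output[])
{
    int offset = 0;                          /* running exclusive sum */
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < count[i]; ++j)
            cell_of_output[offset + j] = i;  /* scatter write */
        offset += count[i];
    }
    return offset;                           /* total number of outputs */
}
```

When most cells have `count[i] == 0`, as with MC on large sparse volumes, this loop still touches every input cell, which is precisely the cost the HistoPyramid's output-driven traversal avoids.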
While the CUDA API is an excellent GPGPU tool, seemingly better suited to this task, we still feel that a pure OpenGL implementation is of considerable interest. First of
all, the OpenGL implementation still outperforms all other implementations. Further, it
is completely platform-independent, and can thus be implemented on AMD hardware
or even mobile graphics platforms, requiring only minor changes. This aside, we still
see more future potential for our CUDA implementation, which we believe is not yet
fully mature. The chunk/layer structure does remedy CUDA’s lack of render-to-2D texture, which brings the CUDA implementation up to speed with the OpenGL approach
and introduces a flexible data structure that requires a minimal amount of padding.
But we believe that our CUDA approach would benefit significantly from a traversal
algorithm that iterates over every output-producing input element, which would allow edge intersections to be calculated once per geometry-producing voxel and triangles to be emitted using scatter write, similar to the Scan-based approach (CUDA1.1+).
We have begun investigating approaches to this problem, and preliminary results look
promising.
A port of Marching Cubes to OpenGL ES would be a good reference for making general data compaction and expansion available on mobile graphics hardware.
As already mentioned, our geometry generation approach is not specific to MC; its
data expansion principle is general enough to be used in totally different areas, such
as providing geometry generation for games, or advanced mobile image processing.
For geometry-shader-capable hardware, we are curious whether the gap between geometry
shaders and HistoPyramids will actually close. It is fully possible that general GPU
improvements will benefit HistoPyramid performance accordingly, and thus keep this
HP-based algorithm useful even there.
Future work might concentrate on out-of-core applications, which also benefit from
high-speed MC implementations. Multiple Rendering Targets will allow us to generate multiple iso-surfaces or to accelerate HistoPyramid processing (and thus geometry
generation) even further. For example, a view-dependent layering of volume data could
allow for immediate output of transparency-sorted iso-surface geometry. We also consider introducing indexed triangle mesh output in our framework, as it preserves mesh
connectivity. For that purpose, we would experiment with algorithmic approaches that
avoid the two passes and scattering that the straightforward solution would require.
We have further been informed that this approach should also fit iso-surface extraction from unstructured grids, since the whole approach is independent of the input's data structure: it only requires a stream of MC cells.
Acknowledgment. We thank Simon Green and Mark Harris for helpful discussions
and comments on this work, and for providing the optimized version of the Marching
Cubes implementation from the CUDA SDK 1.1.
Model | Grid | MC cells | Density | 7800GT GLHP-VS | 8800GTX GLHP-VS | 8800GTX GLHP-GS | 8800GTX CUHP-CU | 8800GTX CUHP-VS | 8800GTX NVSDK10 | 8800GTX CUDA1.1+
Bunny | 255x255x255 | 16581375 | 3.18% | — | 540 (33) | 82 (5.0) | 414 (25) | 420 (25) | — | 400 (24)
Bunny | 127x127x127 | 2048383 | 5.64% | 12 (5.7) | 295 (144) | 45 (22) | 250 (122) | 248 (121) | — | 246 (120)
Bunny | 63x63x63 | 250047 | 9.07% | 8.5 (34) | 133 (530) | 27 (108) | 94 (378) | 119 (474) | 28 (113) | 109 (436)
Bunny | 31x31x31 | 29791 | 13.57% | 5.0 (167) | 22 (722) | 12 (399) | 17 (569) | 24 (805) | 22 (734) | 22 (739)
CThead | 255x255x127 | 8258175 | 3.73% | 16 (2.0) | 434 (53) | 68 (8.2) | 372 (45) | 366 (44) | — | 358 (43)
CThead | 127x127x63 | 1016127 | 6.25% | 12 (11) | 265 (260) | 40 (40) | 200 (197) | 200 (196) | — | 217 (213)
CThead | 63x63x31 | 123039 | 9.62% | 7.6 (62) | 82 (669) | 23 (189) | 58 (473) | 76 (615) | 25 (206) | 70 (571)
CThead | 31x31x15 | 14415 | 14.46% | 4.5 (310) | 11 (768) | 8 (566) | 8.6 (599) | 12 (857) | 17 (1187) | 12 (802)
MRbrain | 255x255x127 | 8258175 | 5.87% | 10 (1.3) | 305 (37) | 38 (4.6) | 269 (33) | 274 (33) | — | 279 (34)
MRbrain | 127x127x63 | 1016127 | 7.35% | 9.8 (9.7) | 239 (235) | 32 (31) | 184 (181) | 183 (180) | — | 112 (194)
MRbrain | 63x63x31 | 123039 | 9.96% | 7.4 (60) | 82 (663) | 21 (169) | 57 (466) | 75 (611) | 26 (215) | 70 (566)
MRbrain | 31x31x15 | 14415 | 14.91% | 4.3 (302) | 11 (771) | 8 (546) | 8.5 (589) | 12 (837) | 18 (1257) | 12 (795)
Bonsai | 255x255x255 | 16581375 | 3.04% | — | 562 (34) | 82 (4.9) | 427 (26) | 433 (26) | — | 407 (25)
Bonsai | 127x127x127 | 2048383 | 5.07% | 13 (6.3) | 314 (153) | 45 (22) | 264 (129) | 262 (128) | — | 269 (131)
Bonsai | 63x63x63 | 250047 | 6.69% | 11 (45) | 148 (590) | 32 (127) | 103 (413) | 132 (526) | 29 (116) | 119 (476)
Bonsai | 31x31x31 | 29791 | 8.17% | 8.0 (268) | 21 (717) | 16 (529) | 17 (578) | 25 (827) | 24 (805) | 23 (783)
Aneurism | 255x255x255 | 16581375 | 1.60% | — | 905 (55) | 134 (8.1) | 605 (37) | 598 (36) | — | 510 (31)
Aneurism | 127x127x127 | 2048383 | 2.11% | 29 (14) | 520 (254) | 98 (48) | 396 (193) | 427 (209) | — | 372 (182)
Aneurism | 63x63x63 | 250047 | 3.70% | 19 (77) | 169 (676) | 50 (199) | 116 (464) | 152 (607) | 33 (132) | 136 (544)
Aneurism | 31x31x31 | 29791 | 6.80% | 8.7 (292) | 21 (715) | 16 (545) | 17 (584) | 25 (830) | 26 (857) | 24 (789)
Cayley | 255x255x255 | 16581375 | 0.93% | — | 1135 (68) | 245 (15) | 695 (42) | 700 (42) | — | 563 (34)
Cayley | 127x127x127 | 2048383 | 1.89% | 31 (15) | 534 (261) | 118 (58) | 405 (198) | 438 (214) | — | 377 (184)
Cayley | 63x63x63 | 250047 | 3.87% | 18 (72) | 174 (695) | 52 (206) | 116 (465) | 151 (606) | 32 (129) | 133 (530)
Cayley | 31x31x31 | 29791 | 8.10% | 7.3 (246) | 22 (736) | 17 (574) | 18 (589) | 25 (828) | 25 (828) | 23 (774)

Table VII.2: The performance of extraction and rendering of iso-surfaces, measured in million MC cells processed per second, with frames per second given in parentheses. The implementations are described in Section 5.
REFERENCES
[1] P. Alliez, O. Devillers, and J. Snoeyink. Removing degeneracies by perturbing
the problem or the world. Reliable Computing, 6:61–79, 2000.
[2] P. Alliez, N. Laurent, H. Sanson, and F. Schmitt. Efficient view-dependent refinement of 3D meshes using √3-subdivision. The Visual Computer, 19:205–221, 2003.
[3] J. Andrews and N. Baker. Xbox 360 system architecture. IEEE Micro, 26(2):25–
37, 2006.
[4] E. Arge and M. Dæhlen. Data reduction of piecewise linear curves. In A. Tveito
and M. Dæhlen, editors, Numerical Methods and Software Tools in Industrial
Mathematics, pages 347–364. Birkhauser, 1997.
[5] A. Belyaev. On transfinite barycentric coordinates. In K. Polthier and A. Sheffer,
editors, Eurographics symposium on geometry processing, pages 89–99, 2006.
[6] H. Blum. A Transformation for Extracting New Descriptors of Shape. In
W. Wathen-Dunn, editor, Models for the Perception of Speech and Visual Form,
pages 362–380. MIT Press, 1967.
[7] T. Boubekeur, P. Reuter, and C. Schlick. Scalar tagged PN triangles. In Eurographics 2005 (Short Papers), 2005.
[8] T. Boubekeur and C. Schlick. Generic mesh refinement on GPU. In Graphics
Hardware 2005, pages 99–104, July 2005.
[9] T. Boubekeur and C. Schlick. A flexible kernel for adaptive mesh refinement on
GPU. Computer Graphics Forum, 27(1):102–114, 2008.
[10] L. Buatois, G. Caumon, and B. Levy. GPU accelerated isosurface extraction on
tetrahedral grids. In International Symposium on Visual Computing, 2006.
[11] M. Bunnell. Adaptive tessellation of subdivision surfaces with displacement mapping. In GPU Gems 2, pages 109–122. Addison-Wesley, 2005.
[12] D. Card and J. L. Mitchell. ShaderX, chapter Non-Photorealistic Rendering with
Pixel and Vertex Shaders. Wordware, 2002.
[13] H. Carr, T. Moller, and J. Snoeyink. Artifacts caused by simplicial subdivision. IEEE Transactions on Visualization and Computer Graphics, 12(2):231–
242, March 2006.
[14] A. K. Cline and R. J. Renka. A storage efficient method for construction of a
Thiessen triangulation. Rocky Mountain J. Math, pages 119–140, 1984.
[15] NVIDIA Corporation. CUDA programming guide version 1.0.
[16] B. Delaunay. Sur la sphère vide. Bulletin of Academy of Sciences of the USSR,
pages 793–800, 1934.
[17] L. Demaret, N. Dyn, M. S. Floater, and A. Iske. Adaptive thinning for terrain
modelling and image compression. In N. Dodgson, M. S. Floater, and M. A.
Sabin, editors, Advances in Multiresolution for Geometric Modelling, pages 319–
338. Springer-Verlag, 2004.
[18] O. Devillers and M. Teillaud. Perturbations and vertex removal in a 3D Delaunay
triangulation. In SODA ’03: Proceedings of the fourteenth annual ACM-SIAM
symposium on Discrete algorithms, pages 313–319. Society for Industrial and
Applied Mathematics, 2003.
[19] W. Donnelly. GPU Gems 2, chapter 8 Per-Pixel Displacement Mapping with
Distance Functions. Addison Wesley Professional, 2005.
[20] D. H. Douglas and T. K. Peuker. Algorithms for the reduction of the number
of points required to represent a digitized line or its caricature. The Canadian
Cartographer, 10(2):112–122, 1973.
[21] C. Dyken and M. Reimers. Real-time linear silhouette enhancement. In Mathematical Methods for Curves and Surfaces: Tromsø 2004, pages 135–144. Nashboro Press, 2004.
[22] C. Dyken, M. Reimers, and J. Seland. Real-time GPU silhouette refinement using
adaptively blended Bézier patches. Computer Graphics Forum, 27(1):1–12, 2008.
[23] D. S. Ebert, F. K. Musgrave, D. Peachey, K. Perlin, and S. Worley. Texturing and
Modeling: A Procedural Approach. Academic Press, 2nd edition, 1998.
[24] H. Edelsbrunner and E. P. Mücke. Simulation of simplicity: a technique to cope
with degenerate cases in geometric algorithms. ACM Trans. Graph., 9(1):66–104,
1990.
[25] C. Everitt and M. J. Kilgard. Practical and robust stenciled shadow volumes for
hardware-accelerated rendering, 2001.
[26] G. Farin. Curves and surfaces for CAGD. Morgan Kaufmann Publishers Inc.,
2002.
[27] M. S. Floater. Mean value coordinates. Computer Aided Geometric Design,
20:19–27, 2003.
[28] M. S. Floater and K. Hormann. Mean value coordinates for arbitrary planar polygons. ACM Transactions on Graphics, 25:1424–1441, 2006.
[29] M. S. Floater and K. Hormann. Barycentric rational interpolation with no poles
and high rates of approximation. Numerische Mathematik, 107(2):315–331,
2007.
[30] M. S. Floater, K. Hormann, and G. Kós. A general construction of barycentric
coordinates over convex polygons. Advances in Computational Mathematics,
24(1):311–331, 2006.
[31] M. S. Floater, G. Kós, and M. Reimers. Mean value coordinates in 3D. Computer
Aided Geometric Design, 22(7):623–631, 2005.
[32] F. Goetz, T. Junklewitz, and G. Domik. Real-time marching cubes on the vertex
shader. Eurographics 2005 Short Presentations, 2005.
[33] W. J. Gordon and J. A. Wixom. Pseudo-harmonic interpolation on convex domains. SIAM J. Numer. Anal, 11(5):909–933, 1974.
[34] M. J. Harris. GPU Gems 2, chapter 31 Mapping Computational Concepts to
GPUs. Addison Wesley Professional, 2005.
[35] M. J. Harris. Parallel prefix sum (scan) with CUDA. NVIDIA CUDA SDK 1.0,
2007.
[36] M. J. Harris, W. V. Baxter III, T. Scheuermann, and A. Lastra. Simulation of cloud
dynamics on graphics hardware. Proceedings of Graphics Hardware, 2003.
[37] A. Hartner, M. Hartner, E. Cohen, and B. Gooch. Object space silhouette algorithms. In Theory and Practice of Non-Photorealistic Graphics: Algorithms,
Methods, and Production System. SIGGRAPH Course Notes, 2003.
[38] P. S. Heckbert and M. Garland. Survey on polygonal surface simplification algorithms. Technical report, Carnegie Mellon University, 1997. Multiresolution
Surface Modeling Course, SIGGRAPH’97.
[39] Ø. Hjelle and M. Dæhlen. Triangulations and Applications. Springer-Verlag,
2006.
[40] K. Hollig. Finite Element Methods with B-Splines. Society for Industrial and
Applied Mathematics, 2003.
[41] K. Hollig, U. Reif, and J. Wipper. Weighted extended B-spline approximation of Dirichlet problems. SIAM Journal on Numerical Analysis, 39:442–462, 2001.
[42] H. Hoppe. Progressive meshes. In ACM SIGGRAPH 1996, pages 99–108, 1996.
[43] H. Hoppe, T. DeRose, T. Duchamp, J. McDonald, and W. Stuetzle. Mesh optimization. In Proceedings of SIGGRAPH 93, Computer Graphics Proceedings,
Annual Conference Series, pages 19–26, August 1993.
[44] D. Horn. GPU Gems 2, chapter 36 Stream Reduction Operations for GPGPU
Applications, pages 573–589. Addison-Wesley, 2005.
[45] T. Isenberg, B. Freudenberg, N. Halper, S. Schlechtweg, and T. Strothotte. A
developer’s guide to silhouette algorithms for polygonal models. IEEE Computer
Graphics and Applications, 23(4):28–37, 2003.
[46] G. Johansson and H. Carr. Accelerating marching cubes with graphics hardware.
In CASCON ’06: Proceedings of the 2006 conference of the Center for Advanced
Studies on Collaborative research. ACM Press, 2006.
[47] T. Ju, P. Liepa, and J. Warren. A general geometric construction of coordinates in
a convex simplicial polytope. Comput. Aided Geom. Des., 24(3):161–178, 2007.
[48] T. Ju, S. Schaefer, and J. Warren. Mean value coordinates for closed triangular
meshes. ACM Transactions on Graphics, 24:561–566, 2005.
[49] T. Ju, S. Schaefer, J. Warren, and M. Desbrun. A geometric construction of coordinates for convex polyhedra using polar duals. In SGP ’05: Proceedings of the
third Eurographics symposium on Geometry processing, pages 181–186, 2005.
[50] J. Juliano and J. Sandmel. GL_EXT_framebuffer_object. OpenGL extension registry, 2005.
[51] L. V. Kantorovich and V. I. Krylov. Approximate methods of higher analysis.
Interscience, 1964.
[52] P. Kipfer and R. Westermann. GPU construction and transparent rendering of iso-surfaces. In G. Greiner, J. Hornegger, H. Niemann, and M. Stamminger, editors,
Proceedings Vision, Modeling and Visualization 2005, pages 241–248. IOS Press,
infix, 2005.
[53] T. Klein, S. Stegmaier, and T. Ertl. Hardware-accelerated reconstruction of polygonal isosurface representations on unstructured grids. Pacific Graphics 2004 Proceedings, 2004.
[54] L. Kobbelt. √3-subdivision. In ACM SIGGRAPH 2000, pages 103–112, 2000.
[55] J. J. Koenderink. What does the occluding contour tell us about solid shape?
Perception, 13(3):321–330, 1984.
[56] T. Langer, A. Belyaev, and H.-P. Seidel. Spherical barycentric coordinates. In
SGP ’06: Proceedings of the fourth Eurographics symposium on Geometry processing, pages 81–88, 2006.
[57] C. L. Lawson. Transforming triangulations. Discrete Mathematics, 3:365–372,
1972.
[58] C. L. Lawson. Software for C¹ surface interpolation. In J. Rice, editor, Mathematical Software III. Academic Press, 1977.
[59] D. T. Lee and A. K. Lin. Generalized Delaunay triangulation for planar graphs.
Discrete and Computational Geometry, 1:201–217, 1986.
[60] S. L. Lee. Mean value representations and curvatures of compact convex hypersurfaces. preprint, 2007.
[61] Y. Lipman, J. Kopf, D. Cohen-Or, and D. Levin. GPU-assisted positive mean
value coordinates for mesh deformations. In SGP ’07: Proceedings of the fifth
Eurographics symposium on Geometry processing, pages 117–123, 2007.
[62] W. Lorensen and H. E. Cline. Marching cubes: A high resolution 3D surface construction algorithm. SIGGRAPH 87 Computer Graphics Proceedings, 21(4):163–
170, 1987.
[63] F. Losasso and H. Hoppe. Geometry clipmaps: terrain rendering using nested
regular grids. ACM Transactions on Graphics, 23(3):769–776, August 2004.
[64] D. Luebke, B. Watson, J. D. Cohen, M. Reddy, and A. Varshney. Level of Detail
for 3D Graphics. Elsevier Science Inc., 2002.
[65] A. E. Løvgren, Y. Maday, and E. M. Rønquist. A reduced basis element method
for complex flow systems. In P. Wesseling, E. Oñate, and J. Périaux, editors,
ECCOMAS CFD 2006, European Conference on Computational Fluid Dynamics.
TU Delft, 2006.
[66] C. Montani, R. Scateni, and R. Scopigno. A modified look-up table for implicit
disambiguation of Marching Cubes. The Visual Computer, 10:353–355, 1994.
[67] H. Moreton. Watertight tessellation using forward differencing. In Graphics
Hardware 2001, pages 25–32, 2001.
[68] E. P. Mücke. A robust implementation for three-dimensional Delaunay triangulations. International Journal of Computational Geometry & Applications, 8,
1998.
[69] G. M. Nielson. Tools for Triangulations and Tetrahedrizations and Constructing
Functions Defined over Them, pages 429–525. CS Press, 1997.
[70] J. O’Rourke. Computational Geometry in C. Cambridge University Press, 2nd
edition, 1998.
[71] J. D. Owens, D. Luebke, N. Govindaraju, M. Harris, J. Krüger, A. E. Lefohn, and
T. J. Purcell. A survey of general-purpose computation on graphics hardware. In
Eurographics 2005, State of the Art Reports, pages 21–51, August 2005.
[72] V. Pascucci. Isosurface computation made simple: Hardware acceleration, adaptive refinement and tetrahedral stripping. Joint Eurographics - IEEE TVCG Symposium on Visualization, pages 292–300, 2004.
[73] K. Pulli and M. Segal. Fast rendering of subdivision surfaces. In ACM SIGGRAPH 1996 Visual Proceedings: The art and interdisciplinary programs, 1996.
[74] U. Ramer. An iterative procedure for the polygonal approximation of plane
curves. Computer Graphics and Image Processing, 1, 1972.
[75] R. J. Rost. OpenGL Shading Language. Addison Wesley Longman Publishing Co., Inc., 2006.
[76] V. L. Rvachev, T. I. Sheiko, V. Shapiro, and I. Tsukanov. Transfinite interpolation
over implicitly defined sets. Comput. Aided Geom. Des., 18(3):195–220, 2001.
[77] M. Sabin. Numerical geometry of surfaces. Acta Numerica, 3:411–466, 1994.
[78] M. Sabin. Transfinite surface interpolation. In Proceedings of the 6th IMA Conference on the Mathematics of Surfaces, pages 517–534, 1996.
[79] P. V. Sander, X. Gu, S. J. Gortler, H. Hoppe, and J. Snyder. Silhouette clipping.
In K. Akeley, editor, SIGGRAPH 2000 Computer Graphics Proceedings, pages
327–334. Addison-Wesley Professional, 2000.
[80] M. Segal, K. Akeley, C. Frazier, J. Leech, and P. Brown. The OpenGL Graphics System: A Specification, version 2.1. Silicon Graphics, Inc., 2006.
[81] R. Seidel. The nature and meaning of perturbations in geometric computing.
Discrete Comput. Geom., 19:1–17, 1998.
[82] J. Seland and T. Dokken. Real-time algebraic surface visualization. In G. Hasle,
K.-A. Lie, and E. Quak, editors, Geometric Modelling, Numerical Simulation,
and Optimization: Applied Mathematics at SINTEF. Springer-Verlag, 2007.
[83] V. Shapiro. Semi-analytic geometry with R-functions. Acta Numerica, pages
1–65, 2007.
[84] L.-J. Shiue, I. Jones, and J. Peters. A realtime GPU subdivision kernel. ACM
Transactions on Graphics, 24(3):1010–1015, August 2005.
[85] D. Shreiner, M. Woo, J. Neider, and T. Davis. OpenGL Programming Guide. Addison Wesley Longman Publishing Co., Inc., 2006.
[86] R. Sibson. Locally equiangular triangulations. Comput. J., pages 243–245, 1978.
[87] J. Stam. Exact evaluation of Catmull-Clark subdivision surfaces at arbitrary parameter values. In Proceedings of SIGGRAPH 98, Computer Graphics Proceedings, Annual Conference Series, pages 395–404, July 1998.
[88] The Stanford volume data archive. http://graphics.stanford.edu/data/voldata/.
[89] U. S. Geological Survey. Lake Tahoe data clearinghouse website, 1997.
[90] Y. Uralsky. DX10: Practical metaballs and implicit surfaces. GameDevelopers
conference, 2006.
[91] K. van Overveld and B. Wyvill. An algorithm for polygon subdivision based on
vertex normals. In Computer Graphics International, 1997. Proceedings, pages
3–12, 1997.
[92] A. Vlachos, J. Peters, C. Boyd, and J. L. Mitchell. Curved PN triangles. In 2001
ACM Symposium on Interactive 3D Graphics, March 2001.
[93] Volvis volume dataset archive. http://www.volvis.org/.
[94] J. Warren, S. Schaefer, A. Hirani, and M. Desbrun. Barycentric coordinates for
convex sets. Advances in Computational Mathematics, to appear.
[95] G. Ziegler, A. Tevs, C. Theobalt, and H.-P. Seidel. GPU point list generation
through histogram pyramids. Technical Report MPI-I-2006-4-002, Max-Planck-Institut für Informatik, 2006.