Examensarbete
Ray Tracing Bézier Surfaces on GPU
Joakim Löw
LITH - MAT - EX - - 06 / 24 - - SE
Ray Tracing Bézier Surfaces on GPU
Scientific Computing, Department of Mathematics, Linköpings Universitet
Joakim Löw
LITH - MAT - EX - - 06 / 24 - - SE
Examensarbete: 20 p
Level: D
Supervisor: Tomas Akenine-Möller,
Computer Graphics, Department of Computer Science, Lund University
Examiner: Fredrik Berntsson,
Scientific Computing, Department of Mathematics, Linköpings Universitet
Linköping: January 2006
Date: January 2006
Division, Department: Matematiska Institutionen, 581 83 LINKÖPING, SWEDEN
Language: English
Report category: Examensarbete
ISRN: LITH - MAT - EX - - 06 / 24 - - SE
URL for electronic version: http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva5476
Title: Ray Tracing Bézier Surfaces on GPU
Author: Joakim Löw
Abstract
In this report, we show how to implement direct ray tracing of Bézier surfaces on
graphics processing units (GPUs), in particular bicubic rectangular Bézier surfaces
and nonparametric cubic Bézier triangles. We use Newton’s method for the rectangular case and show how to use this method to find the ray-surface intersection. For
Newton’s method to work we must build a spatial partitioning hierarchy around each
surface patch, and in general, hierarchies are essential to speed up the process of ray
tracing. We have chosen to use bounding box hierarchies and show how to implement
stackless traversal of such a structure on a GPU.
For the nonparametric triangular case, we show how to find the desired intersection by
simply solving a cubic polynomial equation. Because of the limited precision of current GPUs,
we also propose a numerical approach to solve the problem, using a one-dimensional
Newton search.
Keywords: Ray Tracing, Bézier Surface, Newton, Newton’s method, Graphics Processor, Graphics processing unit, Graphics Hardware, GPU.
Acknowledgements
First and foremost, I would like to thank my supervisor, Tomas Akenine-Möller,
for giving me the opportunity to work on my master’s thesis at Lund University,
and my examiner, Fredrik Berntsson, for his constructive and helpful criticism
of the content of this document.
In addition, I would like to thank the computer graphics people at the Department of Computer Science, Lund University, including Jon Hasselgren, Jacob Munkberg, Petrik Clarberg and Calle Lejdfors for their invaluable help
during my work.
I also want to thank Linde Wittmeyer-Koch, Tommy Elfving, Ulla Ouchterlony, Ingegerd Skoglund and the rest of the great people at the Department of
Mathematics, Linköpings Universitet, for inspiring me and helping me in my
studies in scientific computing.
Finally, I would like to thank my opponents, Johan Pettai and Julius Jägerskog, for reading through this material and giving me suggestions on how to
improve it.
Contents
Abstract
Acknowledgements
1 Introduction
  1.1 Background
  1.2 Overview
2 Bézier Surfaces
  2.1 Triangular Bézier Surface
  2.2 Rectangular Bézier Surface
3 The Graphics Processing Unit
  3.1 General purpose usage
  3.2 Limitations
4 Ray Tracing Bézier Surfaces
  4.1 Subdivision
  4.2 Newton’s method
  4.3 Interval Analysis
  4.4 Bézier clipping
  4.5 Other methods
5 Ray Tracing Algorithm
  5.1 Traversing the Bounding Volume Hierarchy
  5.2 Rectangular Surface Intersection
  5.3 Nonparametric Triangular Surface Intersection
6 Implementation
  6.1 GPU ray tracing
  6.2 Data
  6.3 Program
  6.4 Optimizations
7 Results
  7.1 Results
  7.2 Conclusion
  7.3 Future work
Bibliography
Chapter 1
Introduction
This document presents the results of my master’s thesis work at LUGG (Lund
University Graphics Group) during the autumn of 2005. This first chapter gives a
short background to the material, along with an overview of the contents of
the later chapters.
1.1 Background
Computer hardware is reaching a point where interactive, or even real-time,
ray tracing can become a reality. Software implementations that take advantage
of advanced features such as SIMD architectures and use caches intelligently
have been shown to achieve high speeds on modern computers. One example of this
is the OpenRT project [8]. We have also seen the appearance of processors
specialized for ray tracing, most notably the RPU [51], with surprisingly fast
render times. One key factor in reaching high speeds is to reduce the number of
primitive types, allowing for more aggressive optimizations. The above-mentioned
ray tracing systems use triangles as the only primitive.
Direct ray tracing of curved surfaces, that is, ray tracing without tessellation, is more
involved and places more demands on the implementation. Previous implementations of
interactive Bézier surface ray tracing have been presented in [1, 17]. They
reached reasonable speeds by making heavy use of SIMD extensions and by restricting their
implementations to bicubic surfaces.
Graphics processing units (GPUs), whose processing power is today increasing
more rapidly than that of CPUs, have started to become an alternative for computation
purposes other than rasterizing graphics. They exist in most modern home
computers, and can therefore be used as co-processors by many applications.
One of these general-purpose applications is ray tracing. A few implementations
of GPU ray tracing exist. As early as 2002, Purcell et al. [37] showed how
to fit the complete ray tracing algorithm on a GPU by using a stream processor
model. Just recently, two more implementations have been made [28, 6].
Yet another implementation [5] uses the GPU only for intersection tests and
performs the rest of the algorithm on the CPU. All of these implementations
use triangles as the sole primitive.
In this report we take a closer look at how to utilize graphics processing units
for the purpose of ray tracing Bézier surfaces. Several existing methods are
considered, in order to find one fit to be implemented on a GPU. GPUs have several
limitations which need to be taken into account when choosing a method. We
choose an appropriate method and implement ray tracing of rectangular Bézier
surfaces as well as nonparametric triangular Bézier surfaces.
1.2 Overview
The report has been split into seven chapters which, excluding the current
chapter, have the following content:
Chapter 2 contains an introduction to triangular and rectangular Bézier surfaces.
Chapter 3 introduces the concepts of the graphics processing unit and explains
the advantages and limitations we need to take into consideration.
Chapter 4 gives an overview of the different methods that exist for ray tracing
Bézier surfaces, along with any advantages or disadvantages the methods
may have in comparison.
Chapter 5 explains the algorithm that was chosen for implementation in this
work and the reasons for making this choice. It also discusses the
problems that exist with the algorithm and their solutions.
Chapter 6 gives details on the GPU specific implementation.
Chapter 7 finally presents the results, a conclusion and a short discussion on
how this work may be extended in the future.
Chapter 2
Bézier Surfaces
A Bézier surface is a parametric surface defined by a set of control points (usually
in three-dimensional Euclidean space) and a set of basis functions, in this case
Bernstein polynomials, which hold the weight of each control point throughout the parametric domain of the surface. Bézier surfaces yield a natural and
intuitive way of representing surfaces and algorithms are usually fast and numerically stable. This chapter gives a brief overview of two types of Bézier
surfaces, triangular and rectangular.
2.1 Triangular Bézier Surface
For triangular surfaces, we are given a set of control points $\mathbf{p}_{i,j,k}$, $i + j + k = n$,
where $n$ is the degree of the surface. In the triangular case, each control point
is assigned a bivariate Bernstein polynomial as basis function, defined as
$$B^n_{i,j,k}(\mathbf{u}) = \frac{n!}{i!\,j!\,k!}\, u^i v^j w^k, \qquad i + j + k = n, \tag{2.1}$$
where $\mathbf{u} = (u, v, w)$ is given in barycentric coordinates. The parametric domain
is set to be $u, v, w \geq 0$, $u + v + w = 1$. A few examples of the Bernstein
polynomials can be seen in Figure 2.2. More thorough information on barycentric
coordinates can be found in [14, 13], but briefly, the theory proceeds as follows:
assume we have points $\mathbf{a}, \mathbf{b}, \mathbf{c}, \mathbf{p} \in \mathbb{R}^2$. Then $\mathbf{p} = u\mathbf{a} + v\mathbf{b} + w\mathbf{c}$ is called a
barycentric combination. We require that $u + v + w = 1$. Now $\mathbf{u} = (u, v, w)$ is
said to be the barycentric coordinates of $\mathbf{p}$ in the barycentric coordinate system
$(\mathbf{abc})$. Note that (2.1) really is bivariate, since $w = 1 - u - v$. Note also that
the control points can be placed arbitrarily and independently of each other in
Euclidean space.
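As an illustration of (2.1), the following minimal Python sketch evaluates the bivariate Bernstein polynomials of a cubic triangle and sums them at one point of the domain. The function names `bernstein_tri` and `tri_indices` and the sample point are chosen here for illustration only and are not part of the thesis implementation. Since the basis functions are the terms of the multinomial expansion of $(u + v + w)^n$, they should sum to one everywhere in the domain.

```python
from math import factorial

def bernstein_tri(n, i, j, k, u, v, w):
    """Bivariate Bernstein polynomial B^n_{i,j,k}(u, v, w) as in (2.1)."""
    assert i + j + k == n
    coeff = factorial(n) // (factorial(i) * factorial(j) * factorial(k))
    return coeff * u**i * v**j * w**k

def tri_indices(n):
    """All index triples (i, j, k) with i + j + k = n."""
    return [(i, j, n - i - j) for i in range(n + 1) for j in range(n + 1 - i)]

# A point inside the domain triangle, given in barycentric coordinates.
u, v = 0.2, 0.3
w = 1.0 - u - v

# The ten cubic basis functions form a partition of unity.
basis_sum = sum(bernstein_tri(3, i, j, k, u, v, w) for i, j, k in tri_indices(3))
```

A cubic triangle has ten index triples, matching the ten control points shown in Figure 2.1.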
Figure 2.1: Bézier triangle example. Control point indexing (a), corresponding basis
functions (b) and surface (c) of a cubic Bézier triangle.
The equation of the surface takes the form
$$\mathbf{S}^n(\mathbf{u}) = \sum_{i+j+k=n} \mathbf{p}_{i,j,k} B^n_{i,j,k}(\mathbf{u}). \tag{2.2}$$
If, however, the control points are distributed in a certain uniform manner, more
specifically
$$\mathbf{p}_{i,j,k} = (u, v, w, z) = \left(\frac{i}{n}, \frac{j}{n}, \frac{k}{n}, p_{i,j,k}\right),$$
where $(u, v, w)$ are the barycentric coordinates of the point in a given plane, the
surface can be reduced to the form
$$z = \sum_{i+j+k=n} p_{i,j,k} B^n_{i,j,k}(\mathbf{u}). \tag{2.3}$$
In this form, the task of ray tracing the surface becomes particularly simple,
especially for low-degree surfaces (degree ≤ 4), since the ray-surface intersection
then can be found by solving an explicit formula. We will look closer at this
case in a later chapter.
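The reduction from (2.2) to (2.3) relies on the fact that control points placed at $(i/n, j/n, k/n)$ reproduce the barycentric coordinates themselves, so that only the height $z$ remains to be evaluated. The sketch below checks this numerically for a cubic triangle. The names `eval_nonparametric` and `heights` are hypothetical and used only in this example.

```python
from math import factorial

def bernstein_tri(n, i, j, k, u, v, w):
    """Bivariate Bernstein polynomial B^n_{i,j,k}(u, v, w) as in (2.1)."""
    coeff = factorial(n) // (factorial(i) * factorial(j) * factorial(k))
    return coeff * u**i * v**j * w**k

def eval_nonparametric(heights, n, u, v):
    """Evaluate (2.3); heights maps each (i, j, k) to a scalar control value."""
    w = 1.0 - u - v
    return sum(z * bernstein_tri(n, i, j, k, u, v, w)
               for (i, j, k), z in heights.items())

n = 3
indices = [(i, j, n - i - j) for i in range(n + 1) for j in range(n + 1 - i)]

# Using i/n as the 'height' of each control point gives back u itself.
u_back = eval_nonparametric({idx: idx[0] / n for idx in indices}, n, 0.2, 0.3)

# A constant height field gives a flat surface at that height.
flat = eval_nonparametric({idx: 1.5 for idx in indices}, n, 0.2, 0.3)
```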
A natural derivative to consider when working with triangular patches is the
directional derivative [14]. It gives us a tool to compute the derivative along
any direction in the parametric domain of the triangle:
$$D_{\mathbf{d}} \mathbf{S}(\mathbf{u}) = n \sum_{i+j+k=n-1} \left( d\,\mathbf{p}_{i+1,j,k} + e\,\mathbf{p}_{i,j+1,k} + f\,\mathbf{p}_{i,j,k+1} \right) B^{n-1}_{i,j,k}(\mathbf{u}), \tag{2.4}$$
where the (barycentric) vector $\mathbf{d} = \mathbf{u}_2 - \mathbf{u}_1 = (d, e, f)$ defines the direction of
the derivative. As we can see from (2.4), a derivative of a Bézier surface is itself
a Bézier surface. Note that for a vector in barycentric coordinates we have
$d + e + f = 0$.
Figure 2.2: Three of the basis functions of a cubic Bézier triangle: $B^3_{1,1,1} = 6uvw$, $B^3_{0,3,0} = v^3$ and $B^3_{0,1,2} = 3vw^2$.
However, for the purposes of this thesis, we need non-barycentric partial derivatives. The surface can easily be rewritten as a function of two parameters by
substituting $w = 1 - u - v$. The partial derivatives of the surface then become
simple special cases of the directional derivatives:
$$\frac{\partial}{\partial u} \mathbf{S}^n(u, v, 1 - u - v) = D_{(1,0,-1)} \mathbf{S}(\mathbf{u}), \tag{2.5}$$
$$\frac{\partial}{\partial v} \mathbf{S}^n(u, v, 1 - u - v) = D_{(0,1,-1)} \mathbf{S}(\mathbf{u}). \tag{2.6}$$
Bézier surfaces enjoy several useful properties, one of which is invariance under
affine transformations. This means the overall shape of the surface remains unchanged when translating or rotating the surface (that is, applying a translation
or rotation transformation to all control points). Also, Bézier surfaces have the
convex hull property, meaning that the surface is always completely contained
inside the convex hull generated by its control points. This property is very
useful for ray tracing purposes as we shall see later.
2.2 Rectangular Bézier Surface
For rectangular surfaces we have a set of control points $\mathbf{p}_{i,j}$, $0 \leq i \leq m$, $0 \leq j \leq n$,
where $m$ and $n$ are the degrees of the surface along the two
parametric directions. In this case, the basis functions are given by the product
of two univariate Bernstein polynomials:
$$B^{m,n}_{i,j}(u, v) = B^m_i(u)\, B^n_j(v), \tag{2.7}$$
where
$$B^n_i(t) = \binom{n}{i} t^i (1 - t)^{n-i}, \qquad 0 \leq t \leq 1, \tag{2.8}$$
yielding the surface equation
$$\mathbf{S}^{m,n}(u, v) = \sum_{i=0}^{m} \sum_{j=0}^{n} \mathbf{p}_{i,j} B^{m,n}_{i,j}(u, v) = \mathbf{u}^T \mathbf{P} \mathbf{v}. \tag{2.9}$$
In the last expression the equation has been rewritten as a matrix-vector product, where
$$\mathbf{u}^T = \left[ B^m_0(u) \;\; B^m_1(u) \;\; \ldots \;\; B^m_m(u) \right], \tag{2.10}$$
$$\mathbf{v}^T = \left[ B^n_0(v) \;\; B^n_1(v) \;\; \ldots \;\; B^n_n(v) \right], \tag{2.11}$$
and $\mathbf{P}$ is the $(m+1) \times (n+1)$ matrix with the surface control points as elements.
Figure 2.3: Bicubic Bézier surface example.
Figure 2.4: Three of the sixteen basis functions of a bicubic Bézier rectangle: $B^{3,3}_{0,3} = (1 - u)^3 v^3$, $B^{3,3}_{0,1} = (1 - u)^3\, 3v(1 - v)^2$ and $B^{3,3}_{1,1} = 3u(1 - u)^2\, 3v(1 - v)^2$.
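To make the tensor-product form (2.9) concrete, here is a small Python sketch that evaluates a patch from a grid of three-dimensional control points. The helper names `bernstein` and `eval_patch`, as well as the example control grid, are assumptions made for this illustration and do not come from the thesis code.

```python
from math import comb

def bernstein(n, i, t):
    """Univariate Bernstein polynomial B^n_i(t) as in (2.8)."""
    return comb(n, i) * t**i * (1.0 - t)**(n - i)

def eval_patch(P, u, v):
    """Evaluate (2.9) for control points P[i][j], i = 0..m, j = 0..n."""
    m, n = len(P) - 1, len(P[0]) - 1
    bu = [bernstein(m, i, u) for i in range(m + 1)]  # the vector u in (2.10)
    bv = [bernstein(n, j, v) for j in range(n + 1)]  # the vector v in (2.11)
    point = [0.0, 0.0, 0.0]
    for i in range(m + 1):
        for j in range(n + 1):
            weight = bu[i] * bv[j]
            for c in range(3):
                point[c] += weight * P[i][j][c]
    return tuple(point)

# Hypothetical bicubic patch: a regular grid in x and y with height i * j.
P = [[(i / 3.0, j / 3.0, float(i * j)) for j in range(4)] for i in range(4)]
corner = eval_patch(P, 1.0, 1.0)   # interpolates the corner control point
centre = eval_patch(P, 0.5, 0.5)
```

For a bicubic patch the control grid has 4 × 4 entries, in line with the $(m+1) \times (n+1)$ size of $\mathbf{P}$.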
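The partial derivatives of a rectangular patch are given by
$$\frac{\partial}{\partial u} \mathbf{S}^{m,n}(u, v) = m \sum_{i=0}^{m-1} \sum_{j=0}^{n} (\mathbf{p}_{i+1,j} - \mathbf{p}_{i,j})\, B^{m-1,n}_{i,j}(u, v), \tag{2.12}$$
$$\frac{\partial}{\partial v} \mathbf{S}^{m,n}(u, v) = n \sum_{i=0}^{m} \sum_{j=0}^{n-1} (\mathbf{p}_{i,j+1} - \mathbf{p}_{i,j})\, B^{m,n-1}_{i,j}(u, v). \tag{2.13}$$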
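The derivative formula (2.12) can be checked numerically against a central finite difference. The sketch below does this for a single coordinate of a bicubic patch, which is enough since (2.12) applies to each coordinate separately. The function names, the control values and the step size `h` are all illustrative assumptions.

```python
from math import comb

def bernstein(n, i, t):
    """Univariate Bernstein polynomial B^n_i(t) as in (2.8)."""
    return comb(n, i) * t**i * (1.0 - t)**(n - i)

def eval_scalar(P, u, v):
    """Evaluate one coordinate of (2.9) for scalar control values P[i][j]."""
    m, n = len(P) - 1, len(P[0]) - 1
    return sum(P[i][j] * bernstein(m, i, u) * bernstein(n, j, v)
               for i in range(m + 1) for j in range(n + 1))

def du_scalar(P, u, v):
    """(2.12) for one coordinate: m times a degree (m-1, n) patch of differences."""
    m = len(P) - 1
    diff = [[P[i + 1][j] - P[i][j] for j in range(len(P[0]))] for i in range(m)]
    return m * eval_scalar(diff, u, v)

# Hypothetical bicubic control values.
P = [[float(i * i + j) for j in range(4)] for i in range(4)]
u, v, h = 0.3, 0.6, 1e-6
analytic = du_scalar(P, u, v)
numeric = (eval_scalar(P, u + h, v) - eval_scalar(P, u - h, v)) / (2 * h)
```

The derivative patch is evaluated with the same routine as the surface itself, only with one row fewer of control values.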
Again, we can see that the partial derivatives of a Bézier surface are also Bézier
surfaces.
Similar to triangular surfaces, rectangular surfaces are invariant under affine
transformations and have the convex hull property.
Chapter 3
The Graphics Processing Unit
Before going into detail on the algorithms used for implementation, we need
some information about the usage of graphics processing units (GPUs) for
purposes other than graphics. The current design of GPUs has a few limitations which
need to be explained in order to motivate the choices of algorithms.
3.1 General purpose usage
The main purpose of a GPU is to accelerate the rendering of triangles using the
traditional rasterization technique. This includes hardware transformation and
decoration, such as lighting, texture mapping, bump mapping, etc. Figure 3.1
shows a rudimentary overview of the rendering pipeline. The transformation
and decoration steps are programmable, giving the programmer the means for
implementing a wide range of different effects. The effect programs are known as
vertex shaders and fragment shaders, used for transformations and decorations
respectively.
In recent years, the research community has started to utilize the programmability
of modern GPUs to perform general purpose calculations [34, 29, 19, 2, 30],
often huge numbers of simple calculations. The GPU is extremely efficient for
this purpose, performing simple calculations on large sets of data in parallel.
For general purpose computations we normally do not have much use for vertex
shaders, but instead use the fragment shader. The algorithms are implemented
as a number of fragment shader programs, also referred to as kernels, and we use
the stream processor model (Figure 3.2), where each kernel works on a stream
of data.
Figure 3.1: Rough overview of the GPU pipeline. The vertex and fragment shaders
are the programmable parts of the GPU. The vertex shader transforms the geometry
and the fragment shader determines the value (color) of the pixels.
In 2002, [37] introduced the GPU as a stream processor for ray tracing purposes.
Several GPU ray tracers have been implemented since then [38, 28, 10]. In [5]
the GPU is utilized for intersection computations only, leaving the rest of the
ray tracing algorithm for the CPU to resolve.
3.2 Limitations
Unfortunately, when it comes to more complex programs, the GPU is not as easy
to program as a CPU. A GPU does not have all the functionality and flexibility
of a CPU, simply because it does not need them to accomplish its
main task, rendering triangles.
Firstly, and most importantly, GPUs have a different memory model [30]. Every
kernel has access only to read-only memory, in the form of textures. During
execution, a number of temporary registers are available to store intermediate
values, but the only way to communicate values outside the program
is through the return values of the program. Normally, these values are sent to the
final buffer used for display, but by using render-to-texture instead, we have the
means to send data back to the CPU. Alternatively, render-to-texture makes it possible to set up
data for further processing in other kernels (or repeated processing in the current
kernel), a technique called multipass rendering (see Figure 3.3).
GPUs have no stack [30]. This means recursion is not allowed within kernels.
On the other hand, there are loop constructs and conditional constructs we
can use to implement similar stackless algorithms. Alternatively, we can take
advantage of the multipass rendering technique to simulate a stack.
Furthermore, GPUs do their computations in a very limited precision environment. The maximum precision available is what is normally called single
precision, in other words, 32-bit floating point precision. This affects algorithms
that depend highly on numerical accuracy. We shall see later that it has a
serious impact on one of the algorithms implemented during this work.
Figure 3.2: Streaming model used for general purpose GPU programming (input, kernel, output).
Figure 3.3: Multipass rendering using one kernel (a) or several kernels (b).
Chapter 4
Ray Tracing Bézier Surfaces
In this chapter, we will take a look at different existing methods for ray tracing Bézier surfaces, that is, methods for finding the intersection between a ray
and a Bézier surface. In addition to a description, the advantages and disadvantages of each method are discussed, including whether the
method is appropriate for an implementation on a GPU. The examples in this
chapter all concern rectangular patches, but each method below can (with some
modification) be used on triangular patches as well.
Before heading into the different methods, we will say a few words about a
technique that can be used together with most of the methods. If we, before
applying the ray tracing method, project all control points of the surface to two
dimensions [26, 33], we can reduce the amount of work of surface-specific operations (for example the de Casteljau algorithm and surface point and derivative
evaluations) by roughly 33% (even the problem of ray tracing rational surfaces
can be reduced to two dimensions [33]). Doing this does not always reduce
the total cost of finding the intersection, since we lose some depth information.
We often need to do extra computations in the final stages of the algorithm,
computations that would not be needed if projection was not performed.
A ray can be written in parametric form as
$$\mathbf{r} = \mathbf{o} + t\mathbf{d} = (o_x + t d_x,\; o_y + t d_y,\; o_z + t d_z), \tag{4.1}$$
where $\mathbf{o}$ is the ray origin, $\mathbf{d}$ is the (normalized) direction of the ray and $t$ is a
variable scalar that determines positions along the ray (alternatively the length
of the ray). If we find two perpendicular planes, intersecting exactly along
the ray, we can use the plane normals as basis vectors for a two-dimensional
Euclidean space. We easily construct these normals as [31]:
$$\mathbf{n}_1 = \begin{cases} (d_y, -d_x, 0) & \text{if } |d_x| > |d_y| \text{ and } |d_x| > |d_z|, \\ (0, d_z, -d_y) & \text{otherwise,} \end{cases} \tag{4.2}$$
$$\mathbf{n}_2 = \mathbf{n}_1 \times \mathbf{d}. \tag{4.3}$$
Figure 4.1: Projection of a surface from three to two dimensions.
In this space, the ray reduces to a point. If we want the ’ray point’ to lie at
the origin, the projection of a control point $\mathbf{p}_i$ from $\mathbb{R}^3$ to the currently defined
space is performed by
$$\mathbf{p}'_i = \begin{pmatrix} \mathbf{p}_i \cdot \mathbf{n}_1 - \mathbf{n}_1 \cdot \mathbf{o} \\ \mathbf{p}_i \cdot \mathbf{n}_2 - \mathbf{n}_2 \cdot \mathbf{o} \end{pmatrix}. \tag{4.4}$$
The projection is illustrated in Figure 4.1. Finding the intersection now reduces
to the problem of finding the parametric coordinates of the surface at the origin
in the two-dimensional space.
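The construction in (4.2) to (4.4) can be sketched in a few lines of Python. The function names `ray_planes` and `project` and the example ray are assumptions made for this illustration. The check at the end uses the fact that any point on the ray must project to the origin of the two-dimensional space.

```python
def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def ray_planes(d):
    """Plane normals n1, n2 for ray direction d, following (4.2) and (4.3)."""
    dx, dy, dz = d
    if abs(dx) > abs(dy) and abs(dx) > abs(dz):
        n1 = (dy, -dx, 0.0)
    else:
        n1 = (0.0, dz, -dy)
    return n1, cross(n1, d)

def project(p, o, n1, n2):
    """Project a control point p to two dimensions as in (4.4)."""
    return (dot(p, n1) - dot(n1, o), dot(p, n2) - dot(n2, o))

# Hypothetical ray with a normalized direction.
o = (1.0, 2.0, 3.0)
d = (0.6, 0.0, 0.8)
n1, n2 = ray_planes(d)

# A point on the ray (t = 2.5) should land on the 2D origin.
on_ray = tuple(o[c] + 2.5 * d[c] for c in range(3))
projected = project(on_ray, o, n1, n2)
```

Both normals are perpendicular to the ray direction, which is why the parameter $t$ drops out of the projected coordinates.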
4.1 Subdivision
Subdivision is a straightforward method for finding an intersection and was
implemented for bicubic Bézier patches in [1] using SIMD optimizations on a
CPU (and earlier in [50]). The idea is simple: split the surface into smaller
and smaller patches, throwing away those patches that cannot be intersected by
the ray, until some maximum depth has been reached. Then we are left with
a (hopefully) small set of patches that are candidates for nearest ray-surface
intersection.
Splitting the surface may be done efficiently using the classic de Casteljau algorithm. Discarding patches is simple: if all of the control points have the same
sign on one of the coordinates (in the two-dimensional space defined before),
the patch cannot be intersected by the ray due to the convex hull property (see
Figure 4.2). Finally, the control point mesh of each candidate subpatch is used
as an approximation to the surface and the intersection is found by performing
simple ray-triangle intersection on the triangles generated by the mesh.
Figure 4.2: Subdivision method. Surface (a) is split into (b) and (c). (b) can be
discarded since all of its control points lie on one side of a coordinate axis.
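The two building blocks of the subdivision method, splitting and discarding, can be sketched as follows. For brevity the example splits a single cubic Bézier curve with projected (two-dimensional) control points; for a surface, the same split would be applied to each row or column of the control mesh. The names `split_curve` and `can_discard` and the control points are hypothetical.

```python
def split_curve(ctrl, t=0.5):
    """de Casteljau split of a Bézier curve with 2D control points at parameter t."""
    left, right = [ctrl[0]], [ctrl[-1]]
    pts = list(ctrl)
    while len(pts) > 1:
        # One de Casteljau step: interpolate between neighbouring points.
        pts = [tuple((1 - t) * a + t * b for a, b in zip(p, q))
               for p, q in zip(pts, pts[1:])]
        left.append(pts[0])
        right.append(pts[-1])
    return left, right[::-1]

def can_discard(points2d):
    """True if all projected control points share a strict sign in x or in y."""
    for axis in (0, 1):
        values = [p[axis] for p in points2d]
        if all(val > 0 for val in values) or all(val < 0 for val in values):
            return True
    return False

# Projected control points; the ray is the origin of this 2D space.
ctrl = [(-3.5, -1.0), (-1.5, 1.0), (0.5, 1.0), (2.5, -1.0)]
left, right = split_curve(ctrl)
```

In this example the full curve cannot be discarded, but after one split the left half lies entirely at negative x and can be thrown away, while the right half remains a candidate.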
Although the basic idea is simple and easy to implement, there are some drawbacks to the method. First of all, an object consisting of several surfaces must
use the same subdivision depth for all its surfaces in order to avoid cracks. Secondly, since we are approximating the surface with control point meshes, the
intersection points can be found only with a low degree of accuracy. We can
improve accuracy by increasing subdivision depth, but that would make the
method slower. Lastly, since we need to cache at least one surface at every
subdivision depth, the demand for temporary memory is high. For this reason
the method is hard (if not impossible) to implement on a GPU.
4.2 Newton’s method
Finding a zero of a system of nonlinear equations is a standard problem in
computational mathematics, and a problem that appears in many applications.
One of the few tools available for this task is Newton’s method [23].
Finding the intersection between a ray and a Bézier surface can easily be formulated as a system of equations. Using the projected control points (4.4) we
already have a system of nonlinear equations ready to solve: S(u, v) = (0, 0)
(see Figure 4.3). In [17, 31] a slightly different approach is taken, performing
the projection after surface points and derivatives have been computed in R3 .
In [44] the method is implemented for triangular Bézier surfaces.
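Newton’s method is an iterative method, derived from the truncated Taylor
series of the system [23]:
$$\mathbf{f}(\mathbf{x} + \mathbf{s}) \approx \mathbf{f}(\mathbf{x}) + \mathbf{J}_f(\mathbf{x})\,\mathbf{s}. \tag{4.5}$$
The Jacobian $\mathbf{J}_f$ of a function $\mathbf{f} : \mathbb{R}^n \to \mathbb{R}^n$ is an $n \times n$ matrix with elements
$$\{\mathbf{J}_f\}_{ij} = \frac{\partial f_i}{\partial x_j}. \tag{4.6}$$
If we assume $\mathbf{x} + \mathbf{s}$ to be a zero of $\mathbf{f}$ we have $\mathbf{J}_f(\mathbf{x})\,\mathbf{s} \approx -\mathbf{f}(\mathbf{x})$. We let $\mathbf{s}$ be the
change of the current approximate zero $\mathbf{x}_i$ and get the Newton step (note the
inverse Jacobian):
$$\mathbf{x}_{i+1} = \mathbf{x}_i + \mathbf{s} = \mathbf{x}_i - \mathbf{J}_f^{-1}(\mathbf{x}_i)\,\mathbf{f}(\mathbf{x}_i). \tag{4.7}$$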
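For a system of two equations in two unknowns, as in the projected intersection problem, the Newton step (4.7) only needs the inverse of a 2 × 2 Jacobian, which can be written out explicitly. The sketch below is a generic illustration under that assumption. The function `newton_2d`, its tolerances, and the small test system are not taken from the thesis; in the ray tracing setting `f` would be the projected surface and `jacobian` its partial derivatives.

```python
def newton_2d(f, jacobian, x0, tol=1e-10, max_iter=20):
    """Newton iteration (4.7) for two equations in two unknowns.

    Returns the approximate zero, or None if the Jacobian becomes
    (nearly) singular or the iteration does not converge in time.
    """
    u, v = x0
    for _ in range(max_iter):
        f1, f2 = f(u, v)
        if abs(f1) < tol and abs(f2) < tol:
            return (u, v)
        (a, b), (c, d) = jacobian(u, v)
        det = a * d - b * c
        if abs(det) < 1e-14:
            return None  # near-singular Jacobian, no reliable step
        # s = -J^{-1} f, with the 2x2 inverse written out explicitly.
        su = -(d * f1 - b * f2) / det
        sv = -(-c * f1 + a * f2) / det
        u, v = u + su, v + sv
    return None

# Illustrative system: u^2 + v - 1 = 0 and u - v = 0.
def f(u, v):
    return (u * u + v - 1.0, u - v)

def jacobian(u, v):
    return ((2.0 * u, 1.0), (1.0, -1.0))

root = newton_2d(f, jacobian, (0.5, 0.5))
```

The zero of this example system satisfies $u = v = (\sqrt{5} - 1)/2$. Returning None on a near-singular Jacobian mirrors the numerical problems that arise when the ray is close to tangential to the surface.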
Newton’s method finds a zero of a system quickly, but convergence is
not guaranteed. To start iterating we need an initial guess
$\mathbf{x}_0$, and in order for the method to converge to a solution, it is crucial to have
an initial guess close enough to the solution. For ray tracing purposes this is
often solved using precomputed bounding volume hierarchies, which completely
enclose the surface (Figure 4.4(b)). Each volume in the hierarchy is associated
with one or several initial guesses for the part of the surface contained in that
particular volume [31, 17, 10]. If the ray intersects one of the volumes, the
probability that it also intersects the surface is high, and we can start a Newton
iteration using the initial guess of the current volume. Interval analysis gives us
another tool to ensure convergence as we shall see in the next section.
There are other problems inherent to the method in addition to the vague convergence conditions. If the Jacobian Jf is close to singular, that is if the condition number of Jf is very large, we will run into serious numerical problems.
This happens when the ray is close to tangential to the surface and close to
areas on the surface where at least one of the partial derivatives is close to zero
(for example when neighboring control points of the surface coincide). This is
solved in [31, 10] by slightly perturbing the parametric point to push away from
the problem area. Another problem concerns multiple intersections within one
bounding volume (see Figure 4.5). We can increase the chance of getting the
correct solution by assigning several initial guesses to every bounding volume
and putting some effort into choosing the best one for a given ray [10].
Newton’s method is the method chosen for implementation on GPU (using only
one initial guess per bounding volume), and we will go into more detail on how
to apply the method for ray tracing Bézier surfaces in the next chapter.
4.3 Interval Analysis
Interval analysis is a tool often used in error analysis to keep track of error
propagation and to compute error limits [22], but it can also be utilized for ray
tracing. Again Newton’s method is used to find the desired intersection, but
interval analysis can be used for providing good starting approximations. In
fact, by utilizing a couple of theorems of interval analysis, we can test whether
Newton’s method is guaranteed to converge to a unique solution inside a given
parametric interval, using any initial guess inside the interval [45].
Figure 4.3: The intersection is located at (x, y) = (0, 0), and we therefore search for
(u, v)-coordinates such that S(u, v) = (0, 0).
Figure 4.4: Actual intersection (a) and bounding volume technique (b) to find initial guess for Newton’s method.
Figure 4.5: Problem with multiple intersections within a bounding volume.
This method could be used in conjunction with a precomputed bounding box
hierarchy. Every time a node is reached, a test for guaranteed convergence is
performed, and if convergence is guaranteed, we start a Newton search. If a leaf
node has been reached and convergence is still not guaranteed, the parametric
interval is split further on-the-fly.
Interval analysis also provides us with the interval Newton iteration method,
similar to Newton’s method used in the previous section, but working on
intervals instead [45]. Given a parametric interval containing the intersection point, this
method can under certain conditions give us a fairly rapid convergence
of the interval to the intersection point.
A downside of the method is that we may need to make a very large number
of surface splits (or interval Newton iterations) to get an interval inside which
convergence is guaranteed. Additionally, in a prototype CPU implementation,
the test for guaranteed convergence proved slow, and this method was therefore
never included in our GPU ray tracer.
For more information on the details of the method, the reader is referred to [45,
32], where the method is implemented and described more in depth. See also [27,
20] for a similar approach for implicit surfaces.
4.4 Bézier clipping
Bézier clipping, introduced in [33], is a technique that iteratively narrows
the parametric domain of the surface in which the intersection can lie. It was later improved in [3, 4] and has also been implemented
for triangular Bézier surfaces in [39].
The given control point mesh is first projected onto two dimensions (even the
problem of ray tracing rational surfaces can be reduced to two dimensions [33]).
In this example we will perform clipping in the parametric u-direction. First of
all, we define a line $L$:
$$ax + by = 0, \qquad a^2 + b^2 = 1, \tag{4.8}$$
through the origin, parallel to the vector $\mathbf{v}_0 + \mathbf{v}_1$ (see Figure 4.6(a)). We then
define the signed distance from every (projected) control point $\mathbf{p}_{i,j} = (x_{i,j}, y_{i,j})$
to $L$ as
$$d_{i,j} = a x_{i,j} + b y_{i,j} \tag{4.9}$$
(Figure 4.6(b)). Now, the control points
$$\mathbf{d}_{i,j} = \left( \frac{i}{m}, \frac{j}{n}, d_{i,j} \right)$$
($m$ and $n$ being the surface degrees) can be used to form a new Bézier surface
$\mathbf{d}(u, v)$, for which we find the convex hull (seen from the side in Figure 4.7).
Analyzing where this convex hull intersects the zero axis, we find the parametric
interval in which the ray can intersect the original surface. The ray can only
intersect the surface in the interval [umin , umax ].
Finally, when we have found umin and umax , we compute new control points for
the subpatch corresponding to the subinterval of the original parametric domain
(Figure 4.8).
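The step of finding $[u_{min}, u_{max}]$ can be sketched as follows, assuming the signed distances $d_{i,j}$ of (4.9) have already been computed. The sketch looks at the distance control points from the side, that is, as the two-dimensional points $(i/m, d_{i,j})$, and finds where their convex hull meets the zero axis. Instead of building the hull explicitly it tests every pair of points, which gives the same interval because the end points of the interval lie on hull edges and every pairwise segment lies inside the hull. This brute-force variant and the name `clip_interval_u` are choices made for this illustration, not the procedure of [33].

```python
def clip_interval_u(d):
    """Interval (u_min, u_max) where the convex hull of the points
    (i/m, d[i][j]) meets the axis d = 0, or None if it never does."""
    m = len(d) - 1
    pts = [(i / m, dij) for i, row in enumerate(d) for dij in row]
    # Points lying exactly on the axis belong to the interval.
    crossings = [u for u, dist in pts if dist == 0.0]
    for a in range(len(pts)):
        for b in range(a + 1, len(pts)):
            (ua, da), (ub, db) = pts[a], pts[b]
            if da * db < 0.0:  # segment a-b crosses the zero axis
                t = da / (da - db)
                crossings.append(ua + t * (ub - ua))
    if not crossings:
        return None  # hull entirely on one side: the ray misses the patch
    return min(crossings), max(crossings)

# Hypothetical distances for a bicubic patch: negative for i = 0, positive otherwise.
d = [[-1.0] * 4, [1.0] * 4, [1.0] * 4, [1.0] * 4]
interval = clip_interval_u(d)

# All distances positive: no intersection with the zero axis.
miss = clip_interval_u([[1.0] * 4 for _ in range(4)])
```

With sixteen control points the quadratic number of pairs is small, so the simplicity of the pairwise test may outweigh the cost of a proper hull computation in a sketch like this.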
The procedure is then repeated, alternating between the parametric directions,
until the desired accuracy has been reached. If, in some step, the total reduction
of the parametric interval is too small (which may happen, for example, when we have
multiple intersections), the surface is split in half, and the procedure is continued
for each of the halves.
Figure 4.6: Bézier clipping method. Determine a line L (a) and compute distances
to L from every control point (b).
This last case is the reason for not implementing this method on a GPU. When
performing a split, we are left with two new surfaces, one of which must be stored
temporarily while we work with the other, which makes the method unfit for
GPU (for the same reasons as the subdivision method). The method is stable
in the sense that it is guaranteed to find an intersection if it exists, but it is
relatively slow for surfaces of higher degree [39].
Figure 4.7: Convex hull of ’distance’ surface.
Figure 4.8: Resulting subpatch after one iteration of Bézier clipping.
4.5 Other methods
There are a few other methods in addition to the ones mentioned in this chapter.
Probably the most commonly used method today is tessellation of the surface,
that is, approximating the surface with a set of triangles, before starting the ray
trace. This solution has similarities to the subdivision method, but often has
an extreme demand on memory to store the precomputed meshes. However,
ray tracers with triangles as primitives have already been successfully implemented on GPUs, so this method would certainly be an alternative for GPU
implementation.
Other methods include implicitization methods. If possible we rewrite the parametric surface as an implicit surface, or if that is too costly, approximate the
surface with other implicit surfaces [42, 41, 18].
Chapter 5
Ray Tracing Algorithm
In this work, we have implemented ray tracing of rectangular, bicubic Bézier
surfaces using Newton’s method, since this method meets the limitations of
modern GPUs. Apart from the needed precomputation, it is fast and has very
little demand on memory for intermediate data during the actual search.
Bounding volume hierarchy traversal is used to find candidate surfaces for intersection with the ray, and further to find initial guesses for Newton’s method.
This traversal is done without recursion since GPUs don’t have a stack, and thus
cannot allow for recursion. As bounding volumes, we use axis aligned boxes.
Furthermore, ray tracing of nonparametric triangular Bézier surfaces has been
implemented. This is done in a brute force way, in the sense that no space
partitioning has been made, but every surface is tested against every ray. Nonparametric surfaces are faster to ray trace and could be used for terrain or water
(height maps).
This chapter describes the implemented algorithms in detail.
5.1 Traversing the Bounding Volume Hierarchy
Space partitioning is a fundamental technique when accelerating ray tracing,
allowing us to reduce the total amount of intersection tests we need to make.
But having no stack affects the algorithms that are usually used when traversing
the space partition hierarchies in ray tracing. Either we simulate a stack, or we
rewrite the algorithms to fit the non-recursive environment.
There are several ways we can choose to do space partitioning. In [21] it is
shown that KD-trees are the best choice for most purposes when performing ray
tracing on a CPU. A KD-tree is built recursively by splitting each node into two
smaller nodes using an axis-aligned splitting plane (see Figure 5.1). Stackless
KD-tree traversal has been successfully implemented on a GPU in [16]; see
also [12], where KD-tree traversal on GPU is implemented by simulating a
stack. However, the simulated-stack approach works with a fixed stack depth, and
its memory cost is high, as we need memory proportional to the number of rays
times the stack depth.

Figure 5.1: Simple two-dimensional KD-tree.
Figure 5.2: Bounding box hierarchy (a) and corresponding node tree (b).
Here, we have instead chosen to try to implement stackless bounding volume
hierarchy traversal, using axis aligned boxes as bounding volumes, and it turns
out to be feasible indeed.
In a bounding volume hierarchy, each node is associated with a volume, completely enclosing its children (see Figure 5.2(a)). For Bézier surfaces, we build
the hierarchy simply by repeatedly splitting the surface until some flatness criterion is met, or until a maximum depth is reached. In [31] a measure based
on the surface curvature is used to determine the number of splits. The convex
hull of the subsurface can then be used to generate a bounding volume.
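As a rough sketch of this construction, the following code builds such a hierarchy for a cubic Bézier curve rather than a full bicubic patch, to keep the example short. The curve stand-in, the fixed maximum depth (no flatness criterion), and all function names are assumptions made for illustration. The bounding box of the control points is used as the volume, which is valid because the curve lies inside the convex hull of its control points.

```python
def lerp(p, q, t):
    """Linear interpolation between two points given as tuples."""
    return tuple(a + t * (b - a) for a, b in zip(p, q))


def split_cubic(ctrl, t=0.5):
    """de Casteljau split of a cubic Bezier curve into two cubic halves."""
    p0, p1, p2, p3 = ctrl
    a, b, c = lerp(p0, p1, t), lerp(p1, p2, t), lerp(p2, p3, t)
    d, e = lerp(a, b, t), lerp(b, c, t)
    m = lerp(d, e, t)  # point on the curve at parameter t
    return [p0, a, d, m], [m, e, c, p3]


def bounding_box(ctrl):
    """Axis-aligned box (min corner, max corner) around the control points."""
    lo = tuple(min(p[k] for p in ctrl) for k in range(3))
    hi = tuple(max(p[k] for p in ctrl) for k in range(3))
    return lo, hi


def build_hierarchy(ctrl, depth=0, max_depth=3):
    """Split repeatedly until max_depth; each node stores its box and children."""
    node = {"box": bounding_box(ctrl), "children": []}
    if depth < max_depth:
        left, right = split_cubic(ctrl)
        node["children"] = [
            build_hierarchy(left, depth + 1, max_depth),
            build_hierarchy(right, depth + 1, max_depth),
        ]
    return node


curve = [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0), (3.0, 2.0, 0.0), (4.0, 0.0, 0.0)]
root = build_hierarchy(curve, max_depth=2)
print(root["box"], root["children"][0]["box"])
```

For a surface patch the same idea applies with the split performed in both parametric directions; a flatness or curvature measure would replace the fixed depth.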
Normally, the first time we visit a node, we compute the entry point of the ray
in each of the child node volumes, in order to decide which child to traverse
first. Having a stack, we have fast access to stored information about what
choices we made during previous visits to the node. This is a luxury we must
live without when doing a stackless implementation. The only memory we will
use is a short traversal history, telling us which node we just came from.
As mentioned before, using a stack we would only have to compute child node
entry points once. Here, we need to make this computation every time we visit
a node. Having the child entry points we can, based on which node we came
from, decide where to traverse next. We can enter the node from three
directions: from its parent or from one of its two children (Figure 5.2(b)). Coming
from the parent, we proceed to one of the child nodes (provided the current
node isn't a leaf), and we decide which child simply based on
which entry point is closest to the ray origin. When coming from one of the
children the decision is slightly more complicated. Assume we came from the
left child. Now, if the left child entry point is closer to the ray origin than the
right child entry point, we know that we traversed the left child first, and thus
need to traverse right. If the left child entry point instead was further away than
the right child entry point, we know that we have traversed both children, and
instead traverse up. Coming from the right child we make traversal decisions in
the same way. Of course, we keep track of the currently nearest intersection
at all times, so that we avoid traversing nodes that cannot yield a closer
intersection. Pseudo code for the algorithm is given in Algorithm 5.1.
Algorithm 5.1 Stackless bounding box hierarchy traversal
    previousNode = Parent(root)
    currentNode = root
    t_n = INFINITY    {Reset nearest intersection to infinity}
    while currentNode ≠ Parent(root) do
        if currentNode has no children then
            t_n = min(t_n, nearest surface intersection in currentNode)
            previousNode = currentNode
            currentNode = Parent(currentNode)
        else
            BB_l, BB_r = bounding boxes of child nodes
            t_l, t_r = ray entry points in child node boxes
            if previousNode = Parent(currentNode) then
                previousNode = currentNode
                if ray hits BB_l and t_l ≤ t_r and t_l ≤ t_n then
                    currentNode = LeftChild(currentNode)
                else if ray hits BB_r and t_r ≤ t_n then
                    currentNode = RightChild(currentNode)
                else
                    currentNode = Parent(currentNode)
                end if
            else if previousNode = LeftChild(currentNode) and ray hits BB_r
                    and t_l ≤ t_r and t_r ≤ t_n then
                previousNode = currentNode
                currentNode = RightChild(currentNode)
            else if previousNode = RightChild(currentNode) and ray hits BB_l
                    and t_l > t_r and t_l ≤ t_n then
                previousNode = currentNode
                currentNode = LeftChild(currentNode)
            else
                previousNode = currentNode
                currentNode = Parent(currentNode)
            end if
        end if
    end while
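A CPU-side sketch of Algorithm 5.1 may help to check the traversal logic. It is not the Cg kernel: the node layout (a list of dictionaries with `parent`, `left`, `right` and `box` keys), the slab test used for the box entry distance, and the `intersect_leaf` callback standing in for the Newton search are all assumptions made for this illustration. A missed box is represented by an entry distance of infinity.

```python
import math

INF = math.inf


def entry_distance(box, origin, direction):
    """Slab test: distance to the ray's entry into an axis-aligned box, INF on a miss."""
    t_near, t_far = 0.0, INF
    for lo, hi, o, d in zip(box[0], box[1], origin, direction):
        if d == 0.0:
            if o < lo or o > hi:
                return INF
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return INF
    return t_near


def traverse(nodes, origin, direction, intersect_leaf):
    """Stackless traversal; nodes[0] is the root and its parent index is -1."""
    root_parent = -1
    previous, current = root_parent, 0
    t_nearest = INF
    visited_leaves = []
    while current != root_parent:
        node = nodes[current]
        if node["left"] is None:  # leaf: run the surface intersection
            visited_leaves.append(current)
            t_nearest = min(t_nearest, intersect_leaf(node, origin, direction))
            previous, current = current, node["parent"]
            continue
        # Entry distances are recomputed on every visit (no stack to remember them).
        t_l = entry_distance(nodes[node["left"]]["box"], origin, direction)
        t_r = entry_distance(nodes[node["right"]]["box"], origin, direction)
        if previous == node["parent"]:
            if t_l < INF and t_l <= t_r and t_l <= t_nearest:
                nxt = node["left"]
            elif t_r < INF and t_r <= t_nearest:
                nxt = node["right"]
            else:
                nxt = node["parent"]
        elif previous == node["left"] and t_r < INF and t_l <= t_r and t_r <= t_nearest:
            nxt = node["right"]  # left was visited first, right still pending
        elif previous == node["right"] and t_l < INF and t_l > t_r and t_l <= t_nearest:
            nxt = node["left"]  # right was visited first, left still pending
        else:
            nxt = node["parent"]  # both children done or pruned
        previous, current = current, nxt
    return t_nearest, visited_leaves
```

In a usage example with two leaf boxes along the x axis, a hit found in the nearer leaf prunes the farther one, while a miss in the nearer leaf sends the traversal on to its sibling before returning to the root.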
5.2 Rectangular Surface Intersection
After having reached a leaf node of the bounding volume hierarchy, we have
the surface patch identifier and the Newton iteration initial guess associated
with the node. We do not project the control points to two dimensions before starting the Newton search. Tests showed that the algorithm slowed down
when preprojecting the control points: the cost of projecting the points evidently outweighed the reduced cost of computing the surface point and derivatives.
Additionally, we need a depth value of the intersection point, something we get
for free when using the original control points. Using projected control points,
we would need to make an extra surface point and derivative evaluation in three dimensions after the search. In our implementation, the mean number of iterations in
the Newton search most often lay between two and three. With a higher mean,
the reduced cost of surface point and derivative evaluation would probably be
more beneficial.
Recall the Newton step (4.7). In order to get a two-dimensional search, we
define the following function f (u, v) to use in our search [31, 17]:
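$$\mathbf{f}(u, v) = \begin{pmatrix} \mathbf{n}_1 \cdot \mathbf{S}(u, v) - \mathbf{n}_1 \cdot \mathbf{o} \\ \mathbf{n}_2 \cdot \mathbf{S}(u, v) - \mathbf{n}_2 \cdot \mathbf{o} \end{pmatrix}. \tag{5.1}$$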
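This means basically that we delay the projection to two dimensions until after
we have computed the surface point. $\mathbf{n}_1$ and $\mathbf{n}_2$ are the normals defined in (4.2)
and (4.3) and $\mathbf{o}$ is the ray origin. Using this function, we get the Jacobian
$$J_f = \begin{pmatrix} j_{11} & j_{12} \\ j_{21} & j_{22} \end{pmatrix} = \begin{pmatrix} \mathbf{n}_1 \cdot \mathbf{S}_u(u, v) & \mathbf{n}_1 \cdot \mathbf{S}_v(u, v) \\ \mathbf{n}_2 \cdot \mathbf{S}_u(u, v) & \mathbf{n}_2 \cdot \mathbf{S}_v(u, v) \end{pmatrix}. \tag{5.2}$$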
The inverse of the Jacobian is particularly easy to calculate in the two-dimensional
case:
$$J_f^{-1} = \frac{1}{\det J_f} \begin{pmatrix} j_{22} & -j_{12} \\ -j_{21} & j_{11} \end{pmatrix}, \tag{5.3}$$
where the determinant is determined by
$$\det J_f = j_{11} j_{22} - j_{12} j_{21}, \tag{5.4}$$
giving us the final piece of information we need to start iterating. Newton's
method is described in Algorithm 5.2.
We keep iterating until one of the stopping criteria is met. If at any time the
norm of the function value is less than some preset tolerance, that is
$$|\mathbf{f}(u_n, v_n)| < \epsilon, \tag{5.5}$$
we consider the search a success and report a hit. If, however, this measure
grows from one iteration to the next,
$$|\mathbf{f}(u_{n+1}, v_{n+1})| > |\mathbf{f}(u_n, v_n)|, \tag{5.6}$$
or if we reach a maximum number of iterations, we assume the ray missed the
surface.
Note that (5.5) only gives a measure of the spatial error, not the error of
$u_n$ and $v_n$. We could use an alternative error measure as stopping criterion,
$|(u_{n+1}, v_{n+1})^T - (u_n, v_n)^T|$, giving us a measure of the error in the parametric
domain.
Finally, if a hit was reported, we make sure the resulting (u, v)-coordinates lie
within the parametric domain of the surface. This test needs to be done with
some tolerance to avoid visual artifacts in the seam between surfaces.
Algorithm 5.2 Newton's method
    u_0 = initial guess    {u_n = (u_n, v_n)^T}
    error = |f(u_0)|
    n = 0
    repeat
        u_{n+1} = u_n − J^{-1}(u_n) f(u_n)
        previousError = error
        error = |f(u_{n+1})|
        n = n + 1
    until error < TOL or error > previousError or n ≥ MAXITER
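The two-dimensional Newton search with the function (5.1), the Jacobian (5.2) and its closed-form inverse (5.3) can be sketched as below. This is an illustrative CPU version under stated assumptions: the surface and its partial derivatives are passed in as callables `S`, `Su`, `Sv`, a bilinear patch is used as a stand-in for a bicubic Bézier patch, and the tolerance and iteration limit are arbitrary choices.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def newton_ray_patch(S, Su, Sv, n1, n2, o, u, v, tol=1e-6, max_iter=20):
    """Solve n1.(S(u,v) - o) = 0 and n2.(S(u,v) - o) = 0 for (u, v).

    Returns (u, v) on success and None if the error grows, the Jacobian is
    singular, or the iteration limit is reached.
    """
    def f(u, v):
        diff = [p - q for p, q in zip(S(u, v), o)]
        return dot(n1, diff), dot(n2, diff)

    f1, f2 = f(u, v)
    error = math.hypot(f1, f2)
    for _ in range(max_iter):
        if error < tol:
            return u, v
        su, sv = Su(u, v), Sv(u, v)
        j11, j12 = dot(n1, su), dot(n1, sv)
        j21, j22 = dot(n2, su), dot(n2, sv)
        det = j11 * j22 - j12 * j21
        if det == 0.0:
            return None
        # (u, v) <- (u, v) - J^{-1} f, using the 2x2 inverse (5.3)
        u -= (j22 * f1 - j12 * f2) / det
        v -= (-j21 * f1 + j11 * f2) / det
        f1, f2 = f(u, v)
        previous_error, error = error, math.hypot(f1, f2)
        if error > previous_error:
            return None  # diverging: report a miss
    return (u, v) if error < tol else None


# Stand-in surface S(u, v) = (u, v, uv) with its partial derivatives.
S = lambda u, v: (u, v, u * v)
Su = lambda u, v: (1.0, 0.0, v)
Sv = lambda u, v: (0.0, 1.0, u)
# Ray from o along (1, 0, -1); n1 and n2 are normals of two planes containing it.
o, n1, n2 = (0.0, 0.5, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 0.0)
result = newton_ray_patch(S, Su, Sv, n1, n2, o, 0.2, 0.3)
print(result)
```

For this stand-in patch the intersection lies at $(u, v) = (2/3, 1/2)$; a caller would still check that the returned parameters fall inside the patch domain, with some tolerance.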
5.3 Nonparametric Triangular Surface Intersection
Trying to exploit the simplicity of solving polynomials of low degree (less than
or equal to three), we have also implemented ray tracing of nonparametric,
triangular Bézier patches. These types of surfaces may well be used, for example,
to interpolate height data, creating terrain or decorating reliefs. Numerically,
the most appealing type is certainly the surface of second degree. There exist
several simple interpolation schemes for these types of surfaces, for example
the $C^1$ Clough-Tocher interpolant or, particularly for piecewise second degree
surfaces, the $C^1$ Powell-Sabin interpolants [13].
We opted for testing cubic surfaces, which gives us the possibility to use the
explicit cubic formula to solve the intersection problem. However, due to numerical problems with this approach, we also present a different method,
based on the one-dimensional Newton method.
In order to utilize the implicit form of the surface, we first transform the ray
into the local coordinate system of the surface. We assume the surface has been
modeled in the barycentric coordinate system $(abc)$, $a = (1, 0, 0)$, $b = (0, 1, 0)$,
$c = (0, 0, 0)$ (Figure 5.3). By making this assumption, we can simply use the
inverse of any transformation applied to the surface to get the ray into the local
coordinate system of the surface. Furthermore, the barycentric coordinates
needed as arguments for the basis functions are easily extracted from the local
$(x, y)$-coordinates by letting $\mathbf{u} = (x, y, 1 - x - y)$.

Figure 5.3: Local coordinate system of nonparametric triangular Bézier surface.
Now, we rewrite the surface in implicit form and insert the elements of the
transformed ray's parametric form (4.1), yielding
$$z - f(x, y) = o_z + t d_z - \sum_{i+j+k=3} p_{i,j,k} B^3_{i,j,k}(\mathbf{u}_r) = 0, \tag{5.7}$$
where $\mathbf{u}_r = (o_x + t d_x,\; o_y + t d_y,\; 1 - (o_x + t d_x + o_y + t d_y))$. Expanding (5.7) reveals
the coefficients of a third degree polynomial,
$$c_0 t^3 + c_1 t^2 + c_2 t + c_3 = 0, \tag{5.8}$$
and how these coefficients depend on the elements of the ray. Before solving the
polynomial we rewrite it in the following form:
$$t^3 + a t^2 + b t + c = 0. \tag{5.9}$$
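The expansion of (5.7) into polynomial coefficients can be carried out mechanically, as the following sketch shows. It assumes the Bézier ordinates are stored in a dictionary keyed by the index triple $(i, j, k)$, with $i$ belonging to the $x$ coordinate, and it represents polynomials in $t$ as coefficient lists in ascending order; both conventions and all function names are choices made for this example. Note that the returned list is in ascending order, so $c_0$ in (5.8) corresponds to the last entry.

```python
from math import factorial


def poly_mul(p, q):
    """Multiply two polynomials given as ascending coefficient lists."""
    r = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r


def poly_pow(p, n):
    r = [1.0]
    for _ in range(n):
        r = poly_mul(r, p)
    return r


def ray_cubic(heights, o, d):
    """Coefficients [g0, g1, g2, g3] of g(t) = o_z + t d_z - f(x(t), y(t)).

    heights[(i, j, k)] is the Bezier ordinate p_ijk with i + j + k = 3;
    o and d are the ray origin and direction in the surface's local system.
    """
    u = [o[0], d[0]]                      # x(t) = o_x + t d_x
    v = [o[1], d[1]]                      # y(t) = o_y + t d_y
    w = [1.0 - o[0] - o[1], -d[0] - d[1]]  # 1 - x(t) - y(t)
    g = [o[2], d[2], 0.0, 0.0]
    for (i, j, k), p in heights.items():
        multinomial = factorial(3) // (factorial(i) * factorial(j) * factorial(k))
        basis = poly_mul(poly_mul(poly_pow(u, i), poly_pow(v, j)), poly_pow(w, k))
        for n, coeff in enumerate(basis):
            g[n] -= p * multinomial * coeff
    return g


indices = [(i, j, 3 - i - j) for i in range(4) for j in range(4 - i)]
flat = {idx: 2.0 for idx in indices}  # constant height f = 2
print(ray_cubic(flat, (0.2, 0.3, 5.0), (0.1, 0.2, -1.0)))
```

For a flat surface the cubic and quadratic coefficients vanish up to rounding, which is exactly the degenerate situation that the solver has to handle separately. When the leading coefficient is nonzero, dividing by it gives $a$, $b$ and $c$ of (5.9).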
As suggested earlier, there are some numerical aspects we need to consider
when solving (5.9). First of all, to avoid excessively large coefficients and thus
potential numerical problems (remember, we are working in 32 bit precision
only), we reset the ray origin to a location closer to the surface along the ray.
All coefficients except $c_0$ depend on the ray origin coordinates, and large
origin coordinates can severely affect the accuracy of the final result. Here, we
solve the problem by simply resetting the origin to the entry point of the surface
bounding volume. This may not yield the best possible values, but proved good
enough for our tests. We keep track of the distance we move the origin in order
to be able to get the distance between the actual origin and the intersection.
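A minimal sketch of the origin reset follows, assuming an axis-aligned bounding box and the common slab test for the entry point (the text does not specify how the entry point is computed, so the slab test is an assumption, as are the function names). The returned offset is the distance that must be added to the root found with the moved origin, provided the direction is normalized.

```python
def box_entry_distance(origin, direction, box_min, box_max):
    """Slab test; returns the ray parameter at box entry, or None on a miss."""
    t_near, t_far = 0.0, float("inf")
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            if o < lo or o > hi:
                return None  # parallel to this slab and outside it
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return None
    return t_near


def reset_origin(origin, direction, box_min, box_max):
    """Move the origin to the box entry point; also return the moved distance."""
    t_entry = box_entry_distance(origin, direction, box_min, box_max)
    if t_entry is None:
        return None
    moved = tuple(o + t_entry * d for o, d in zip(origin, direction))
    return moved, t_entry


print(reset_origin((0.0, 0.0, 1000.0), (0.0, 0.0, -1.0),
                   (-1.0, -1.0, -1.0), (1.0, 1.0, 1.0)))
```

With the origin moved from $z = 1000$ to $z = 1$, the polynomial coefficients are formed from coordinates of order one instead of order one thousand, and the total distance is the offset plus the root of the local polynomial.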
To solve (5.9) explicitly, we follow the methods outlined in [36, 49], with some
modifications to improve numerical stability. Define
$$Q = \frac{a^2 - 3b}{9}, \tag{5.10}$$
$$R = \frac{2a^3 - 9ab + 27c}{54}. \tag{5.11}$$
The discriminant is defined as
$$D = R^2 - Q^3. \tag{5.12}$$
For large values of $a$, $R^2$ and $Q^3$ are both completely dominated by the term
$a^6/729$, and we have an obvious risk of cancellation. Using instead the expansions
$$R^2 = \frac{a^6}{729} - \frac{a^4 b}{81} + \frac{a^3 c}{27} + \frac{a^2 b^2}{36} - \frac{abc}{6} + \frac{c^2}{4},$$
$$Q^3 = \frac{a^6}{729} - \frac{a^4 b}{81} + \frac{a^2 b^2}{27} - \frac{b^3}{27},$$
to compute $D$, we arrive at
$$D = \frac{a^3 c}{27} - \frac{a^2 b^2}{108} + \frac{b^3}{27} - \frac{abc}{6} + \frac{c^2}{4}. \tag{5.13}$$
The sign of the discriminant reveals the number of real roots of the polynomial.
If $D < 0$ it has three real distinct roots, if $D = 0$ it has three real roots of which
at least two are equal, and if $D > 0$ it has one real root only. If $D < 0$, we
compute an angle $\theta$ using one of
$$\theta = \frac{\pi}{2} - \arctan \frac{R}{\sqrt{-D}}, \tag{5.14}$$
$$\theta = \arccos \frac{R}{\sqrt{Q^3}}, \tag{5.15}$$
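and then get the three real roots using:
$$t_1 = -2\sqrt{Q} \cos\left(\frac{\theta}{3}\right) - \frac{a}{3}, \tag{5.16}$$
$$t_2 = -2\sqrt{Q} \cos\left(\frac{\theta + 2\pi}{3}\right) - \frac{a}{3}, \tag{5.17}$$
$$t_3 = -2\sqrt{Q} \cos\left(\frac{\theta + 4\pi}{3}\right) - \frac{a}{3}. \tag{5.18}$$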
If instead $D \ge 0$, we compute one real root (note that even if $D = 0$, we still
compute one root only) using
$$\begin{aligned}
S &= -\operatorname{sgn}(R) \left[ |R| + \sqrt{D} \right]^{1/3}, \\
T &= \begin{cases} Q/S, & S \ne 0, \\ 0, & S = 0, \end{cases} \\
t &= S + T - \frac{a}{3}.
\end{aligned} \tag{5.19}$$
Of course we also need to take care of the degenerate cases when (5.9) reduces
to a second or first degree polynomial.
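The explicit solution can be sketched in double precision as follows. The code follows (5.10), (5.11), the expanded discriminant (5.13), the arccos form (5.15) and (5.19); the three trigonometric roots are written with the angles $\theta/3$, $(\theta + 2\pi)/3$ and $(\theta + 4\pi)/3$. The function name and the clamping of the arccos argument are additions for this sketch, and the degenerate lower-degree cases are not handled.

```python
import math


def solve_cubic_explicit(a, b, c):
    """Real roots of t^3 + a t^2 + b t + c = 0, sorted ascending.

    Returns three roots when D < 0 and a single real root otherwise.
    """
    Q = (a * a - 3.0 * b) / 9.0
    R = (2.0 * a ** 3 - 9.0 * a * b + 27.0 * c) / 54.0
    # Expanded discriminant: the a^6/729 terms of R^2 and Q^3 cancel analytically.
    D = (a ** 3 * c / 27.0 - a * a * b * b / 108.0 + b ** 3 / 27.0
         - a * b * c / 6.0 + c * c / 4.0)
    if D < 0.0:
        # Clamp guards against rounding pushing the ratio just outside [-1, 1].
        ratio = max(-1.0, min(1.0, R / math.sqrt(Q ** 3)))
        theta = math.acos(ratio)
        scale = -2.0 * math.sqrt(Q)
        return sorted(scale * math.cos((theta + 2.0 * math.pi * k) / 3.0) - a / 3.0
                      for k in range(3))
    S = -math.copysign((abs(R) + math.sqrt(D)) ** (1.0 / 3.0), R)
    T = Q / S if S != 0.0 else 0.0
    return [S + T - a / 3.0]


print(solve_cubic_explicit(-6.0, 11.0, -6.0))  # (t - 1)(t - 2)(t - 3)
print(solve_cubic_explicit(-2.0, 1.0, -2.0))   # (t - 2)(t^2 + 1)
```

In single precision on a GPU the same formulas are considerably less accurate, which is the motivation for the numerical alternative.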
On a CPU prototype implementation, (5.14) yielded better results than (5.15),
but on the GPU implementation (using the Cg language) the trigonometric
function arctan had poor accuracy. In this case arccos proved
better, but still not nearly as good as in the CPU implementation. Because
of these shortcomings, we have also implemented a solution based on the one-dimensional Newton method. This method is slower, but much more stable.
Define a function using the cubic polynomial:
$$f(t) = t^3 + a t^2 + b t + c. \tag{5.20}$$
Taking a closer look at (5.20), we find that there are only four basic cases
to take care of. To determine the case of the current ray, and an initial
guess for the Newton search, we compute the discriminant of the (quadratic)
derivative $f'(t) = 3t^2 + 2at + b$ of the cubic polynomial,
$$D_{f'} = (2a)^2 - 4 \cdot 3 \cdot b = 4(a^2 - 3b), \tag{5.21}$$
and its roots $t'_{1,2} = (-2a \mp \sqrt{D_{f'}})/6$, ordered so that $t'_1 \le t'_2$. Since the
leading coefficient of $f$ is positive, $t'_1$ is the local maximum and $t'_2$ the local
minimum. The four cases are then as follows:
1. $D_{f'} < 0$: the derivative has no roots, and thus $f$ has no local minimum
or maximum. Use any initial guess $t_0$. We have chosen to use the explicit
formula to compute an initial guess in this case (the case where the cubic
polynomial discriminant $D \ge 0$, see above).
2. $D_{f'} \ge 0$ and $f(t'_1) \cdot f(t'_2) \le 0$: we have three real roots. Find one using
a Newton search with initial guess $t_0 = (t'_1 + t'_2)/2$.
3. $D_{f'} \ge 0$ and $f(t'_1), f(t'_2) > 0$: we have one real root, located to the left of
the local maximum. Use initial guess $t_0 < t'_1$, for example $t_0 = t'_1 - (t'_2 - t'_1)/2$.
4. $D_{f'} \ge 0$ and $f(t'_1), f(t'_2) < 0$: we have one real root, located to the right of
the local minimum. Use initial guess $t_0 > t'_2$, for example $t_0 = t'_2 + (t'_2 - t'_1)/2$.
These cases are illustrated in Figure 5.4 (note that the figure does not illustrate
the ray and the surface but rather the value of f along the ray).
Having an initial guess, we start the Newton search:
$$t_{n+1} = t_n - \frac{f(t_n)}{f'(t_n)} = t_n - \frac{t_n^3 + a t_n^2 + b t_n + c}{3 t_n^2 + 2 a t_n + b}, \tag{5.22}$$
and repeat until some desired accuracy has been reached. In case 2, where we
have three roots, we can use the just found root $t^*$ to find the other two by
solving the second degree polynomial
$$t^2 + (t^* + a) t + ((t^*)^2 + a t^* + b) = 0. \tag{5.23}$$
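The numerical approach can be sketched as follows. The code classifies the polynomial using the extrema of $f$, picks an initial guess, runs the Newton iteration (5.22), and in the three-root case deflates with (5.23). Several details are assumptions of this sketch rather than the described implementation: the extrema are ordered so that the local maximum comes first, the inflection point $-a/3$ is used as the initial guess when $f$ has no extrema (instead of the explicit formula), a guard steps off points where $f'$ vanishes, and the tolerance and iteration limit are arbitrary.

```python
import math


def solve_cubic_newton(a, b, c, tol=1e-7, max_iter=50):
    """Real roots of f(t) = t^3 + a t^2 + b t + c via a 1D Newton search."""
    def f(t):
        return ((t + a) * t + b) * t + c

    def df(t):
        return (3.0 * t + 2.0 * a) * t + b

    def newton(t):
        for _ in range(max_iter):
            slope = df(t)
            if slope == 0.0:
                t += 1.0  # guard added for this sketch: leave a stationary point
                continue
            step = f(t) / slope
            t -= step
            if abs(step) < tol:
                break
        return t

    quarter_disc = a * a - 3.0 * b  # a quarter of the discriminant of f'
    three_roots = False
    if quarter_disc < 0.0:
        t0 = -a / 3.0  # no extrema: start at the inflection point (assumption)
    else:
        r = math.sqrt(quarter_disc)
        t1, t2 = (-a - r) / 3.0, (-a + r) / 3.0  # local maximum, local minimum
        f1, f2 = f(t1), f(t2)
        if f1 * f2 <= 0.0:
            three_roots = True
            t0 = 0.5 * (t1 + t2)
        elif f1 > 0.0:
            t0 = t1 - 0.5 * (t2 - t1)  # both extrema above zero: root left of t1
        else:
            t0 = t2 + 0.5 * (t2 - t1)  # both extrema below zero: root right of t2

    root = newton(t0)
    roots = [root]
    if three_roots:
        # Deflate: f(t) = (t - root)(t^2 + p t + q)
        p, q = root + a, root * root + a * root + b
        disc = p * p - 4.0 * q
        if disc >= 0.0:
            s = math.sqrt(disc)
            roots += [(-p - s) / 2.0, (-p + s) / 2.0]
    return sorted(roots)


print(solve_cubic_newton(-6.0, 11.0, -6.0))  # three roots
print(solve_cubic_newton(0.0, -3.0, 10.0))   # one root, left of the local maximum
```

For ray tracing, the smallest root that is positive and whose surface point lies inside the triangle would then be selected as the intersection.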
Figure 5.4: The different cases of the cubic polynomial (case 1 to case 4).

Having a root $t^*$ we quickly find the ray-surface intersection point either by
inserting $t^*$ into the parametric form of the original (untransformed) ray, or
by computing the point on the surface using the barycentric coordinates
$\mathbf{u} = (o_x + t^* d_x,\; o_y + t^* d_y,\; 1 - (o_x + t^* d_x + o_y + t^* d_y))$.
Chapter 6
Implementation
Now that the algorithms have been made clear, we will discuss their actual implementation on the GPU. All GPU specific code was written in Cg
and run through the Cg API of OpenGL. Cg (C for graphics) is a programming
language developed by NVIDIA to give developers access to GPU hardware on a
higher level than pure assembler [15]. We will focus mainly on the implementation of the rectangular case, since our current implementation of nonparametric
triangular Bézier surfaces is very much a simplified version of the former, with
no traversal and of course a different intersection kernel.
6.1 GPU ray tracing
Traditionally, GPUs are used to render large numbers of triangles fast, also handling
transformations and decorations that we may want to apply to the triangles.
By specializing in this, the GPU can be extremely efficient at drawing the type
of three-dimensional graphics used in, for example, modern games, where all
models are drawn onto the screen triangle by triangle. For ray tracing purposes, we
cannot fully take advantage of this functionality since the algorithm is inherently
different. Instead of drawing all triangles onto the screen one by one, we need
to work on each screen pixel one by one, tracing a ray through each one. As
mentioned in Chapter 3, the decoration step (fragment shader) of the GPU is
programmable, and this is what we will use to make the GPU perform ray tracing.
Instead of using the fragment shader to decorate a triangle with
bitmap textures and light, we use it to run the ray tracing algorithm. This
is accomplished by programming a set of fragment shader programs (kernels),
each one specialized to handle a certain step of the ray tracing algorithm. To
get the GPU to execute a certain step of the ray tracing algorithm, we simply
enable the kernel corresponding to the step, and then draw two triangles which
entirely cover the screen (see Figure 6.2). When rasterizing the triangles, the
GPU will then execute the kernel on each pixel.
6.2 Data
Each kernel needs to be able to read and write data in order to communicate
with other kernels. All data is put into textures; for example, a point in three-dimensional space can be stored in the red, green and blue components of a
texture. Before describing the details of the program, we will briefly discuss
how to handle the input (and output) of data to a kernel. There are basically
two ways of accessing the data, directly or indirectly.
When applying textures to triangles for decoration, we supply texture coordinates to each triangle vertex. The GPU then interpolates these coordinates
over the triangle to fetch the correct value of the texture for each generated
pixel in the triangle (Figure 6.1). This is the facility we will use to simulate a
streaming data model. We call this direct data access, since the GPU always
supplies the current kernel with the address to the value in the texture. When
drawing the two screen covering triangles, we set texture coordinates in each
corner of the triangles to map the texture perfectly to the screen (Figure 6.2).
Having textures of the same size as the screen, every element of the texture will
be addressed by exactly one pixel.
Not all kinds of data can be used in this way. For example, when laying out
the bounding volume hierarchy data in a texture, there is no particular relation
between any part of this data and a certain pixel in the screen space. Instead,
any piece of this data may be needed for the calculation of any pixel value (for
example, we never know beforehand which parts of the hierarchy a certain ray
will traverse). Thus, we need to compute the address of this data ourselves, and
we call this indirect data access. Often, we compute this address using data
from another texture (Figure 6.3).
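The difference between the two access patterns can be mimicked on the CPU with plain lists standing in for textures. This is only an analogy under stated assumptions: `run_kernel` emulates drawing the screen-covering triangles by calling a kernel once per pixel, and the names and data layout are invented for the example.

```python
def run_kernel(kernel, width, height, *screen_textures):
    """Call kernel once per pixel, handing it that pixel's element of each
    screen-sized texture (direct access)."""
    return [kernel(*(texture[i] for texture in screen_textures))
            for i in range(width * height)]


def make_fetch_node_kernel(hierarchy_texture):
    """Build a kernel that looks up the node a ray is currently visiting."""
    def kernel(state):
        # Indirect access: the address is computed from streamed data.
        return hierarchy_texture[state["node"]]
    return kernel


# 2 x 2 screen: one traversal state per pixel, plus a shared hierarchy texture.
states = [{"node": 2}, {"node": 0}, {"node": 1}, {"node": 2}]
hierarchy = ["root box", "left box", "right box"]
fetched = run_kernel(make_fetch_node_kernel(hierarchy), 2, 2, states)
print(fetched)
```

The state list is read at the position of the pixel being processed, while the hierarchy list is read at whatever position the state dictates.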
In our implementation, we use six data sets. We use two screen sized sets to
store ray origin and direction. Screen sized textures give us the possibility to
use direct access and give each pixel the corresponding element of the texture
for storage (note that a texture can be either read from or written to during one
kernel execution, not both). Additionally, we use two indirect data sets to store
the bounding box hierarchy and Bézier control point data. Lastly, we also need
two screen sized (direct) data sets holding current traversal data (for example,
current node identifier) and nearest found intersection.
Figure 6.1: Classic texture mapping using a bitmap texture.

Figure 6.2: We execute a kernel on every element of a data set (texture) by drawing
two triangles just large enough to get a one-to-one mapping between the generated
pixels in the screen space and the data set. In this example we use a screen size of
8 × 8 pixels. The triangles we draw (a) are rasterized when drawn in screen space (b).
By enabling a certain kernel before drawing the triangles, we force the GPU to execute
this kernel on every pixel generated. During this execution, we access the values of
screen sized textures (c) at corresponding coordinates.

Figure 6.3: Indirect addressing using streams.
6.3 Program
The program is implemented using multipass rendering consisting of eight kernels and the program flow is illustrated in Figure 6.4. It should be mentioned
that the program currently only handles primary rays, meaning that shadow,
reflection and refraction effects have not been implemented. The bounding volume hierarchy and the Bézier control point data are set up as texture data
before starting the program and then input to kernels when needed. In the
bounding volume hierarchy data set, we add information about surface id and
initial guess to each leaf node, supplying us with the required information to
start a Newton search. Following is a brief description of each kernel:
Generate primary rays This is the simplest of the kernels. It generates two
textures, one with the origins and one with the normalized directions of the
rays through every pixel. We need no input texture, but simply generate
the two output textures by setting the origin and direction values in each
corner of the texture and letting the GPU interpolate these values.
Initialize traversal During the program execution we need to keep track of
each ray’s state, that is whether it is traversing the bounding volume tree
(and if so its current position in the tree), waiting for intersection test or
has been deactivated (done traversing). We also need to hold information
about the currently nearest intersection that has been found. For this we
use two data sets, both of which we initialize in this kernel.
Check state In this kernel we decide whether to continue traversing or whether
to perform an intersection test. This decision is based on how many
rays are ready for intersection test, that is, rays that have reached a leaf
node. The number of rays in a certain state is counted using occlusion
queries [29]. A kernel can always discard a value if it needs to, resulting
in no value being written to the output stream (leaving previous data intact). The check state kernel discards values if the corresponding ray is
in waiting state or deactivated. Occlusion queries are used to count the
number of values that were not discarded by the kernel, and we can thus
determine whether to continue traversing or not.
Traverse This is the bounding volume hierarchy traversal kernel. Assuming
a ray is in traversal state, we traverse the tree based on which node we
came from, and based on intersection tests between the ray and child node
bounding volumes (which are also performed by this kernel). If a ray reaches
the root node after having traversed both of its child trees (if needed), it
is deactivated.
Intersection When enough rays (see next section) are in a waiting state we
stop traversing, and instead perform an intersection test. Using the surface
id and initial guess from the current leaf node, we start a Newton search.
In this kernel we also make an occlusion query. If all values were discarded,
then all rays have been deactivated, we are done and can continue to the
shading step.
Continue This step consists of two kernels, the first used to update the nearest
intersection data set, and the other to update the state data set. This
could be combined into one kernel, but was kept separate to simplify
implementation.
Shade Finally, when all rays are deactivated, this kernel takes care of shading
using the nearest intersection data set.

Figure 6.4: Program flow.
6.4 Optimizations
A naive implementation can prove to be very inefficient, especially if we do
not take into consideration the parallel nature of the GPU. In this section, we
explain a few techniques we have used to speed up the rendering of a scene.
Probably the single most important optimization is tiling of the screen space.
The number of needed kernel executions varies between different parts of the screen
space. After each iteration, we will have larger inactive areas of the screen space.
Executing each kernel on the entire data set thus means a lot of unnecessary
work. Since we do have screen coherency to some extent, we tile the screen
space. From the start we maintain a list of active tiles, that is, tiles with at
least one active pixel (Figure 6.5), for which we execute the current kernel.
We can speed the program up by performing some load balancing [37]. Executing a kernel on a tile with almost no active pixels means a lot of redundant work.
Therefore, we do not wait for all pixels to be in waiting or inactive state before
switching kernel. In the current implementation we use the following rules for
the traversal/check state loop to perform load balancing:
1. A tile is considered to be in waiting state if one of the following holds:
• the number of active pixels in the tile is larger than a threshold
Pactive and the number of pixels in waiting state represents at least
Pwaiting % of the number of active pixels.
• all active pixels in the tile are in waiting state.
2. We switch to the intersection kernel if one of the following holds:
• the number of active tiles is larger than a threshold Tactive and the
number of tiles in waiting state represents at least Twaiting % of the
number of active tiles.
• all active tiles are in waiting state.
For our tests, we have used a screen size of 512 × 512, a tile size of 16 × 16 pixels
and the following values for the rules:
$$P_{active} = 30, \quad P_{waiting} = 80\%, \quad T_{active} = 40, \quad T_{waiting} = 35\%.$$
These values have been roughly determined through experimentation to give
good results for our test scenes. Small variations to the values do not affect the
results significantly.
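The two load balancing rules can be written down directly, as in the sketch below. It assumes that pixels in waiting state count as active (only deactivated pixels are inactive) and that per-tile counts are available on the CPU, for instance from occlusion queries; the function names and the tuple layout are invented for the example.

```python
P_ACTIVE, P_WAITING = 30, 0.80  # per-tile thresholds from the text
T_ACTIVE, T_WAITING = 40, 0.35  # per-screen thresholds from the text


def tile_is_waiting(active_pixels, waiting_pixels):
    """Rule 1: decide whether a tile counts as being in waiting state."""
    if active_pixels == 0:
        return False  # tile is inactive and not in the list at all
    if waiting_pixels == active_pixels:
        return True
    return (active_pixels > P_ACTIVE
            and waiting_pixels >= P_WAITING * active_pixels)


def switch_to_intersection(tiles):
    """Rule 2: tiles is a list of (active_pixels, waiting_pixels) per tile."""
    active = [(a, w) for a, w in tiles if a > 0]
    if not active:
        return False
    waiting = sum(1 for a, w in active if tile_is_waiting(a, w))
    if waiting == len(active):
        return True
    return len(active) > T_ACTIVE and waiting >= T_WAITING * len(active)


# 50 active tiles, 20 of them fully waiting: 40 % is above the 35 % threshold.
print(switch_to_intersection([(10, 10)] * 20 + [(100, 0)] * 30))
```

With only a handful of active tiles the percentage rule is disabled, so the switch happens only once every remaining tile is waiting.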
Figure 6.5: Tiling of the screen space and list of active tiles.
Chapter 7
Results
In this chapter we will present and discuss the results of the Bézier surface ray
tracer that was implemented on GPU as well as give a few suggestions on how
to extend the work.
7.1 Results
The implementation of the algorithms presented in this work has been tested
on a few scenes, illustrated in Figure 7.1. All tests were run on an NVIDIA
GeForce 6800 with a screen size of 512 × 512 pixels and a tile size of 16 × 16
pixels. The scenes were tested with two different bounding box tree depths to
investigate how this affects the rendering time. The data and results for the
scenes are given in Tables 7.1, 7.2 and 7.3. The data shows how many times the
traversal and intersection kernels were bound (loaded into memory) and how
many times they were executed on a tile. We also tested the performance of the
kernels and the results are given in Table 7.4.
The main work is done by the traversal and intersection kernels, which have
the highest instruction counts. Switching kernels does not seem to have any
significant effect on the overall speed and, as we can see, neither does the
number of bounding boxes. However, we need more tests to determine the effect
of the tree depth.
The results show that our GPU implementation is at least comparable with
the rendering speeds achieved in [17, 1] with CPU implementations. We will
not make a qualitative comparison with those CPU implementations, since it is
hard to make a fair one.
Figures 7.2 and 7.3 show two of the problems that can arise when using
Newton's method.
The test scenes for nonparametric triangular patches are shown in Figure 7.4.
These are simple scenes consisting of only four patches and using no spatial partitioning; thus every ray is tested against all surface patches. Both scenes reached
a speed of 17 fps using the numerical method to find intersections.
Scene                      teapot (1)   teapot (2)
Patches                            32           32
Bounding boxes                  12423         3111
Tree depth                         13           12
Traversal binds                   433          233
Traversal executions            59391        44555
Intersection binds                 29           27
Intersection executions          6253         5751
Frames per second                1.00         1.18
Table 7.1: Results from teapot scene tests.

Scene                      teacup (1)   teacup (2)
Patches                            26           26
Bounding boxes                   8571         2671
Tree depth                         13           12
Traversal binds                   374          311
Traversal executions            74125        57802
Intersection binds                 25           27
Intersection executions          6984         7169
Frames per second                0.89         1.01
Table 7.2: Results from teacup scene tests.

Scene                      teaspoon (1)   teaspoon (2)
Patches                              16             16
Bounding boxes                     3433           1245
Tree depth                           12             10
Traversal binds                     581            388
Traversal executions              25638          20037
Intersection binds                   31             28
Intersection executions            2764           2631
Frames per second                  1.89           2.22
Table 7.3: Results from teaspoon scene tests.
Figure 7.1: Test scenes for rectangular Bézier ray tracing. Screen is 512 × 512 pixels.

Kernel                    Instruction count   Pixel throughput (GP/s)
Generate primary rays                     4                     0.975
Initialize traversal                     21                     0.300
Check state                               3                     1.950
Traverse                                157                     0.061
Intersection                            341                         −
Continue (state)                         21                     0.433
Continue (intersection)                  10                     0.780
Shade                                   149                     0.031
Table 7.4: Results from running NVShaderPerf on the kernels. The pixel throughput
test failed for the intersection kernel.
Figure 7.2: Problem area on the knob of the teapot. Several control points coincide
and we run into numerical problems when computing partial derivatives.
Figure 7.3: Too few bounding boxes have been created for the surface patches, resulting
in bad initial guesses. We therefore find the wrong intersection or none at all.
Figure 7.4: Test scenes for nonparametric triangular Bézier ray tracing. Screen is
512 × 512 pixels.

7.2 Conclusion
We have shown in this work that it is possible to implement ray tracing of Bézier
surfaces on a GPU using Newton’s method and we have introduced bounding
box hierarchy traversal on GPU. The results show that our GPU implementation
is slower but still performs well compared to previous similar implementations
on CPUs. We have used load balancing and suggested and implemented the use
of tiling to speed up execution.
Furthermore, we have shown how to ray trace nonparametric Bézier triangles.
We have proposed two methods, one direct method using the cubic formula to
find the intersection analytically, and a numerical method based on Newton’s
method to handle an environment with limited floating point precision, which
we are presented with when working with GPUs. The numerical method has
been implemented and works quite well on our simple test scenes.
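To make the numerical method concrete, the following minimal sketch applies one-dimensional Newton iteration to a cubic polynomial in the ray parameter t. It is written in Python for readability rather than as a fragment program. The function name `newton_cubic`, the coefficient names, the tolerance and the iteration limit are assumptions chosen for this example and are not taken from the implementation described in this work.

```python
def newton_cubic(a, b, c, d, t0, tol=1e-6, max_iter=20):
    """Find a root of a*t^3 + b*t^2 + c*t + d near the initial guess t0.

    Returns the root, or None if the derivative vanishes or the
    iteration does not converge within max_iter steps.
    """
    t = t0
    for _ in range(max_iter):
        # Evaluate the polynomial and its derivative with Horner's scheme.
        f = ((a * t + b) * t + c) * t + d
        df = (3.0 * a * t + 2.0 * b) * t + c
        if abs(df) < 1e-12:
            return None  # flat tangent: the Newton step is undefined
        step = f / df
        t -= step
        if abs(step) < tol:
            return t
    return None


# Hypothetical example: (t - 1)(t - 2)(t - 3) = t^3 - 6t^2 + 11t - 6.
root = newton_cubic(1.0, -6.0, 11.0, -6.0, t0=0.5)
```

Starting to the left of the smallest root, as in the example, the iteration approaches that root from one side. In general the result depends on the initial guess, which is why a good starting value matters for a method of this kind.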
7.3 Future work
We have successfully implemented the basic parts of a Bézier surface ray tracer
on GPU, but there is still a lot of work that can be done. A first step to extend
the current ray tracer could be to add shadow rays and support for reflection
and refraction. We should also put more work into minimizing the numerical
problems that still exist and test the algorithm on larger and more complex
scenes.
Another point that certainly deserves some attention is the use of early-z culling.
This is a technique used by modern GPUs to avoid unnecessary execution of
fragment programs which we could utilize to speed up the execution. By making
a look-up in the depth buffer, the GPU can discard pixels before execution of
the fragment program. By setting up the depth buffer properly before executing
an expensive kernel, for example by running a small and much more inexpensive
kernel specialized for this task, we can avoid unnecessary work on inactive rays.
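The idea can be illustrated with a small CPU emulation. This is only a sketch of the control flow, assuming a "less than" depth test and a screen-sized quad drawn at a fixed depth. The state names, the constant `QUAD_DEPTH` and the functions `mask_pass` and `expensive_pass` are hypothetical and do not correspond to any graphics API call.

```python
ACTIVE, DONE = "active", "done"  # hypothetical ray states
QUAD_DEPTH = 0.5                 # assumed depth of the screen-sized quad


def mask_pass(states):
    """Cheap kernel: write depth 1.0 for active rays and 0.0 for inactive ones."""
    return [1.0 if s == ACTIVE else 0.0 for s in states]


def expensive_pass(states, depth, kernel):
    """Run `kernel` only for pixels where the quad passes the depth test."""
    out = list(states)
    executed = 0
    for i, s in enumerate(states):
        if not QUAD_DEPTH < depth[i]:
            continue  # rejected by the depth test, the kernel never runs
        out[i] = kernel(s)
        executed += 1
    return out, executed


states = [ACTIVE, DONE, ACTIVE, DONE, DONE]
depth = mask_pass(states)
new_states, executed = expensive_pass(states, depth, lambda s: DONE)
```

In this example the expensive kernel runs for two of the five pixels. On real hardware the saving depends on how the GPU groups pixels for early rejection, which this emulation does not model.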
Further testing of nonparametric triangular surfaces would include the use of
more complex scenes and spatial hierarchies. Also, if future GPUs offer higher
floating point precision or if the current numerical problems can be solved, the
analytical approach would yield faster execution.
The algorithms in this work have all focused on cubic surfaces, but in some
cases it may be sufficient to use quadratic surfaces. Quadratic surfaces are
much more attractive when considering the numerical aspect and would also
yield faster algorithms.
In [17] rays are not traced one by one, but rather several rays at a time in
small packets. A similar approach should be possible to implement on a GPU,
effectively reducing the total work of traversing the spatial hierarchy.
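The potential saving can be sketched on the CPU as follows. This is a minimal illustration under stated assumptions, not the method of [17] and not a GPU implementation: it uses an explicit stack for clarity, whereas the traversal in this work is stackless, and the names `Node`, `hits_box` and `traverse_packet` are hypothetical. A node is visited once per packet instead of once per ray, and only rays that hit the node's box are passed on to its children.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    lo: tuple                       # minimum corner of the bounding box
    hi: tuple                       # maximum corner of the bounding box
    children: list = field(default_factory=list)
    patch: object = None            # payload for leaf nodes


def hits_box(origin, direction, lo, hi):
    """Slab test: does the ray (t >= 0) intersect the axis-aligned box?"""
    t_near, t_far = 0.0, float("inf")
    for o, d, a, b in zip(origin, direction, lo, hi):
        if d == 0.0:
            if o < a or o > b:
                return False        # parallel to the slab and outside it
            continue
        t0, t1 = (a - o) / d, (b - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True


def traverse_packet(root, rays):
    """Return (candidates, visits) for a packet of (origin, direction) rays."""
    candidates = {i: [] for i in range(len(rays))}
    visits = 0
    stack = [(root, list(range(len(rays))))]
    while stack:
        node, active = stack.pop()
        visits += 1                 # one node visit for the whole packet
        active = [i for i in active if hits_box(*rays[i], node.lo, node.hi)]
        if not active:
            continue
        if node.patch is not None:
            for i in active:
                candidates[i].append(node.patch)
        for child in node.children:
            stack.append((child, active))
    return candidates, visits


left = Node((0, 0, 0), (1, 1, 1), patch="left")
right = Node((1, 0, 0), (2, 1, 1), patch="right")
root = Node((0, 0, 0), (2, 1, 1), children=[left, right])
rays = [((0.25, 0.5, -1.0), (0.0, 0.0, 1.0)),
        ((0.75, 0.5, -1.0), (0.0, 0.0, 1.0))]
candidates, packet_visits = traverse_packet(root, rays)
single_visits = sum(traverse_packet(root, [r])[1] for r in rays)
```

For the two coherent rays in the example, the packet visits three nodes, while tracing the rays one at a time visits six. The number of ray-box tests is not reduced. The saving is in traversal steps, and it shrinks as the rays in a packet diverge.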
Bibliography
[1] Carsten Benthin, Ingo Wald, and Philipp Slusallek. Interactive ray tracing
of free-form surfaces. In AFRIGRAPH ’04: Proceedings of the 3rd international conference on Computer graphics, virtual reality, visualisation
and interaction in Africa, pages 99–106, New York, NY, USA, 2004. ACM
Press.
[2] Ian Buck. Taking the plunge into GPU computing. In Matt Pharr, editor,
GPU Gems 2, pages 509–519. Addison-Wesley, March 2005.
[3] Swen Campagna and Philipp Slusallek. Improving Bézier clipping and
Chebyshev boxing for ray tracing parametric surfaces, 1996.
[4] Swen Campagna, Philipp Slusallek, and Hans-Peter Seidel. Ray tracing of
spline surfaces: Bézier clipping, Chebyshev boxing, and bounding volume
hierarchy - a critical comparison with new results. The Visual Computer,
13(6):265–282, 1997.
[5] Nathan A. Carr, Jesse D. Hall, and John C. Hart. The ray engine.
In HWWS ’02: Proceedings of the ACM SIGGRAPH/EUROGRAPHICS
conference on Graphics hardware, pages 37–46, Aire-la-Ville, Switzerland,
2002. Eurographics Association.
[6] M. Christen. Ray tracing on GPU. Diploma thesis, University of Applied
Sciences Basel, Switzerland, 2005.
[7] J. L. D. Comba and J. Stolfi. Affine arithmetic and its applications to computer graphics. In Proc. VI Brazilian Symposium on Computer Graphics
and Image Processing (SIBGRAPI’93), pages 9–18, 1993.
[8] Andreas Dietrich, Ingo Wald, Carsten Benthin, and Philipp Slusallek. The
OpenRT Application Programming Interface – Towards A Common API for
Interactive Ray Tracing. In Proceedings of the 2003 OpenSG Symposium,
pages 23–31, Darmstadt, Germany, 2003. Eurographics Association.
[9] Dominique Michelucci, Sebti Foufou, Loic Lamarque, and David Menegaux.
Bernstein based arithmetic featuring de Casteljau. In Proceedings of the
17th Canadian Conference on Computational Geometry (CCCG’05), pages
215–218, 2005.
[10] Alexander Efremov. Efficient ray tracing of trimmed NURBS surfaces.
Master’s thesis, University of Saarland, 2004.
[11] Alexander Efremov, Vlastimil Havran, and Hans-Peter Seidel. Robust and
numerically stable Bézier clipping method for ray tracing NURBS surfaces. In
SCCG ’05: Proceedings of the 21st spring conference on Computer graphics,
pages 127–135, New York, NY, USA, 2005. ACM Press.
[12] Manfred Ernst, Christian Vogelgsang, and Günther Greiner. Stack implementation on programmable graphics hardware. In VMV, pages 255–262,
2004.
[13] Gerald Farin. Triangular Bernstein-Bézier patches. Computer Aided Geometric Design, 3(2):83–128, 1986.
[14] Gerald Farin. Curves and Surfaces for CAGD, a Practical Guide. Academic
Press, San Diego, fourth edition, 1997.
[15] Randima Fernando and Mark J. Kilgard. The Cg Tutorial: The Definitive
Guide to Programmable Real-Time Graphics. Addison-Wesley Longman
Publishing Co., Inc., Boston, MA, USA, 2003.
[16] Tim Foley and Jeremy Sugerman. KD-tree acceleration structures for
a GPU raytracer. In HWWS ’05: Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware, pages 15–22,
New York, NY, USA, 2005. ACM Press.
[17] Markus Geimer and Oliver Abert. Interactive ray tracing of trimmed bicubic Bézier surfaces without triangulation. In WSCG (Full Papers), pages
71–78, 2005.
[18] Pat Hanrahan. Ray tracing algebraic surfaces. In SIGGRAPH ’83: Proceedings of the 10th annual conference on Computer graphics and interactive
techniques, pages 83–90, New York, NY, USA, 1983. ACM Press.
[19] Mark Harris. Mapping computational concepts to GPUs. In Matt Pharr,
editor, GPU Gems 2, pages 493–508. Addison-Wesley, March 2005.
[20] John C. Hart. Ray tracing implicit surfaces. In SIGGRAPH 93 Modeling,
Visualizing, and Animating Implicit Surfaces course notes, pages 13–1 to
13–15. 1993.
[21] Vlastimil Havran. Heuristic Ray Shooting Algorithms. PhD thesis, Department of Computer Science and Engineering, Faculty of Electrical Engineering, Czech Technical University in Prague, November 2000.
[22] Brian Hayes. A lucid interval. American Scientist, 91(6):484–488,
November–December 2003.
[23] Michael T. Heath. Scientific Computing: An Introductory Survey. McGraw-Hill, second edition, 2002.
[24] Kenneth I. Joy and Murthy N. Bhetanabhotla. Ray tracing parametric surface patches utilizing numerical techniques and ray coherence. In
SIGGRAPH ’86: Proceedings of the 13th annual conference on Computer
graphics and interactive techniques, pages 279–285, New York, NY, USA,
1986. ACM Press.
[25] A. Junior, L. de Figueiredo, and M. Gattas. Interval methods for raycasting
implicit surfaces with affine arithmetic, 1999.
[26] James T. Kajiya. Ray tracing parametric patches. In SIGGRAPH ’82:
Proceedings of the 9th annual conference on Computer graphics and interactive techniques, pages 245–254, New York, NY, USA, 1982. ACM Press.
[27] D. Kalra and A. H. Barr. Guaranteed ray intersections with implicit surfaces. In SIGGRAPH ’89: Proceedings of the 16th annual conference on
Computer graphics and interactive techniques, pages 297–306, New York,
NY, USA, 1989. ACM Press.
[28] Filip Karlsson and Carl Johan Ljungstedt. Ray tracing fully implemented
on programmable graphics hardware. Master’s thesis, Chalmers University
of Technology, 2004.
[29] Emmett Kilgariff and Randima Fernando. The GeForce 6 series GPU architecture. In Matt Pharr, editor, GPU Gems 2, pages 471–491. Addison-Wesley, March 2005.
[30] Aaron Lefohn, Joe M. Kniss, and John D. Owens. Implementing efficient
parallel data structures on GPUs. In Matt Pharr, editor, GPU Gems 2,
pages 521–545. Addison-Wesley, March 2005.
[31] William Martin, Elaine Cohen, Russell Fish, and Peter Shirley. Practical
ray tracing of trimmed NURBS surfaces. J. Graph. Tools, 5(1):27–52, 2000.
[32] R. E. Moore and S. T. Jones. Safe starting regions for iterative methods.
SIAM Journal on Numerical Analysis, 14(6):1051–1065, 1977.
[33] Tomoyuki Nishita, Thomas W. Sederberg, and Masanori Kakimoto. Ray
tracing trimmed rational surface patches. In SIGGRAPH ’90: Proceedings of the 17th annual conference on Computer graphics and interactive
techniques, pages 337–345, New York, NY, USA, 1990. ACM Press.
[34] John Owens. Streaming architectures and technology trends. In Matt
Pharr, editor, GPU Gems 2, pages 457–470. Addison-Wesley, March 2005.
[35] Les Piegl and Wayne Tiller. The NURBS Book. Springer-Verlag, second
edition, 1997.
[36] W. H. Press, W. T. Vetterling, S. A. Teukolsky, and B. P. Flannery. Numerical Recipes in C++. Cambridge University Press, New York, second
edition, 2002.
[37] Timothy J. Purcell, Ian Buck, William R. Mark, and Pat Hanrahan. Ray
tracing on programmable graphics hardware. In SIGGRAPH ’02: Proceedings of the 29th annual conference on Computer graphics and interactive
techniques, pages 703–712, New York, NY, USA, 2002. ACM Press.
[38] Timothy John Purcell. Ray tracing on a stream processor, 2004.
[39] S. H. Martin Roth, Patrick Diezi, and Markus H. Gross. Raytracing triangular Bézier patches. Comput. Graph. Forum, 20(3), 2001.
[40] Thomas W. Sederberg and David C. Anderson. Ray tracing of Steiner
patches. In SIGGRAPH ’84: Proceedings of the 11th annual conference on
Computer graphics and interactive techniques, pages 159–164, New York,
NY, USA, 1984. ACM Press.
[41] Thomas W. Sederberg and Falai Chen. Implicitization using moving curves
and surfaces. In SIGGRAPH ’95: Proceedings of the 22nd annual conference on Computer graphics and interactive techniques, pages 301–308, New
York, NY, USA, 1995. ACM Press.
[42] Thomas W. Sederberg, Jianmin Zheng, Kris Klimaszewski, and Tor
Dokken. Approximate implicitization using monoid curves and surfaces.
Graph. Models Image Process., 61(4):177–198, 1999.
[43] John M. Snyder. Interval analysis for computer graphics. In SIGGRAPH
’92: Proceedings of the 19th annual conference on Computer graphics and
interactive techniques, pages 121–130, New York, NY, USA, 1992. ACM
Press.
[44] Wolfgang Stürzlinger. Ray-tracing triangular trimmed free-form surfaces.
IEEE Transactions on Visualization and Computer Graphics, 4(3):202–214,
1998.
[45] Daniel L. Toth. On ray tracing parametric surfaces. In SIGGRAPH ’85:
Proceedings of the 12th annual conference on Computer graphics and interactive techniques, pages 171–179, New York, NY, USA, 1985. ACM Press.
[46] Alex Vlachos, Jörg Peters, Chas Boyd, and Jason L. Mitchell. Curved PN
triangles. In SI3D ’01: Proceedings of the 2001 symposium on Interactive
3D graphics, pages 159–166, New York, NY, USA, 2001. ACM Press.
[47] Ingo Wald, Philipp Slusallek, Carsten Benthin, and Markus Wagner. Interactive rendering with coherent ray tracing. In A. Chalmers and T.-M.
Rhyne, editors, EG 2001 Proceedings, volume 20(3), pages 153–164. Blackwell Publishing, 2001.
[48] Shyue-Wu Wang, Zen-Chung Shih, and Ruei-Chuan Chang. An efficient
and stable ray tracing algorithm for parametric surfaces. J. Inf. Sci. Eng.,
18(4):541–561, 2002.
[49] Eric W. Weisstein. Cubic formula. From MathWorld–A Wolfram Web Resource. http://mathworld.wolfram.com/CubicFormula.html (2005-11-13).
[50] Ch. Woodward. Ray tracing parametric surfaces by subdivision in viewing
plane. pages 273–287, 1989.
[51] Sven Woop, Jörg Schmittler, and Philipp Slusallek. RPU: a programmable
ray processing unit for realtime ray tracing. ACM Trans. Graph., 24(3):434–
444, 2005.
LINKÖPING UNIVERSITY
ELECTRONIC PRESS
Copyright
The publishers will keep this document online on the Internet - or its possible
replacement - for a period of 25 years from the date of publication barring exceptional circumstances. The online availability of the document implies a permanent permission for anyone to read, to download, to print out single copies
for your own use and to use it unchanged for any non-commercial research and
educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional on the consent of the
copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility. According to intellectual
property law the author has the right to be mentioned when his/her work is
accessed as described above and to be protected against infringement. For additional information about the Linköping University Electronic Press and its
procedures for publication and for assurance of document integrity, please refer
to its WWW home page: http://www.ep.liu.se/
Upphovsrätt (Copyright)
This document is kept available on the Internet - or its future replacement
- for 25 years from the date of publication, provided that no extraordinary
circumstances arise. Access to the document implies permission for
anyone to read, download, print single copies for personal use and
to use it unchanged for non-commercial research and for teaching.
Transfer of the copyright at a later date cannot revoke this
permission. All other use of the document requires the consent of the copyright
holder. To guarantee authenticity, security and accessibility, there are
solutions of a technical and administrative nature. The author's moral right
includes the right to be named as the author to the extent required by good
practice when the document is used as described above, as well as protection against
the document being altered or presented in such a form or in such a context
as is offensive to the author's literary or artistic reputation or
distinctive character. For further information about Linköping University Electronic Press,
see the publisher's home page http://www.ep.liu.se/
© 2006, Joakim Löw