Master's Thesis (Examensarbete)

Ray Tracing Bézier Surfaces on GPU

Joakim Löw

LITH-MAT-EX--06/24--SE

Scientific Computing, Department of Mathematics, Linköpings Universitet

Degree project (examensarbete): 20 p
Level: D
Supervisor: Tomas Akenine-Möller, Computer Graphics, Department of Computer Science, Lund University
Examiner: Fredrik Berntsson, Scientific Computing, Department of Mathematics, Linköpings Universitet
Linköping: January 2006

Division, Department: Matematiska Institutionen, 581 83 LINKÖPING, SWEDEN
Language: English
ISRN: LITH-MAT-EX--06/24--SE
URL for electronic version: http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva5476

Abstract

In this report, we show how to implement direct ray tracing of Bézier surfaces on graphics processing units (GPUs), in particular bicubic rectangular Bézier surfaces and nonparametric cubic Bézier triangles. We use Newton's method for the rectangular case and show how to use this method to find the ray-surface intersection. For Newton's method to work we must build a spatial partitioning hierarchy around each surface patch, and in general, hierarchies are essential to speed up the process of ray tracing. We have chosen to use bounding box hierarchies and show how to implement stackless traversal of such a structure on a GPU. For the nonparametric triangular case, we show how to find the desired intersection by simply solving a cubic polynomial. Because of the limited precision of current GPUs, we also propose a numerical approach to solve the problem, using a one-dimensional Newton search.
Keywords: Ray Tracing, Bézier Surface, Newton, Newton's method, Graphics Processor, Graphics processing unit, Graphics Hardware, GPU.

Acknowledgements

First and foremost, I would like to thank my supervisor, Tomas Akenine-Möller, for giving me the opportunity to work on my master's thesis at Lund University, and my examiner, Fredrik Berntsson, for his constructive and helpful criticism of the content of this document. In addition, I would like to thank the computer graphics people at the Department of Computer Science, Lund University, including Jon Hasselgren, Jacob Munkberg, Petrik Clarberg and Calle Lejdfors for their invaluable help during my work.
I also want to thank Linde Wittmeyer-Koch, Tommy Elfving, Ulla Ouchterlony, Ingegerd Skoglund and the rest of the great people at the Department of Mathematics, Linköpings Universitet, for inspiring me and helping me in my studies in scientific computing. Finally, I would like to thank my opponents, Johan Pettai and Julius Jägerskog, for reading through this material and giving me suggestions on how to improve it.

Contents

Abstract
Acknowledgements
1 Introduction
    1.1 Background
    1.2 Overview
2 Bézier Surfaces
    2.1 Triangular Bézier Surface
    2.2 Rectangular Bézier Surface
3 The Graphics Processing Unit
    3.1 General purpose usage
    3.2 Limitations
4 Ray Tracing Bézier Surfaces
    4.1 Subdivision
    4.2 Newton's method
    4.3 Interval Analysis
    4.4 Bézier clipping
    4.5 Other methods
5 Ray Tracing Algorithm
    5.1 Traversing the Bounding Volume Hierarchy
    5.2 Rectangular Surface Intersection
    5.3 Nonparametric Triangular Surface Intersection
6 Implementation
    6.1 GPU ray tracing
    6.2 Data
    6.3 Program
    6.4 Optimizations
7 Results
    7.1 Results
    7.2 Conclusion
    7.3 Future work
Bibliography

Chapter 1 Introduction

This document presents the results of my master's thesis work at LUGG (Lund University Graphics Group) during the autumn of 2005. In this first chapter, a short background to the material is presented along with an overview of the contents of the later chapters.

1.1 Background

Computer hardware is reaching a point where interactive, or even real time, ray tracing can become a reality. Software implementations that take advantage of advanced features such as SIMD architectures and make intelligent use of caches have been shown to reach high speeds on modern computers. One example of this is the OpenRT project [8]. We have also seen the appearance of processors specialized for ray tracing, especially the RPU [51], with surprisingly fast render times. One key factor in reaching high speeds is to reduce the number of primitive types, allowing for more aggressive optimizations. The above-mentioned ray tracing systems use triangles as the only primitive.

Direct ray tracing of curved surfaces, that is, without tessellation, is more involved and places more demands on the implementation. Previous implementations of interactive Bézier surface ray tracing have been presented in [1, 17]. They reached reasonable speeds through heavy use of SIMD extensions and by restricting their implementations to bicubic surfaces.

Graphics processing units (GPUs), whose processing power is today increasing more rapidly than that of CPUs, have started to become an alternative for computations other than rasterizing graphics. They exist in most modern home computers, and can therefore be used as co-processors by many applications.

One of these general-purpose applications is ray tracing. A few implementations of GPU ray tracing exist. Purcell et al. [37] showed as early as 2002 how to fit the complete ray tracing algorithm on a GPU by using a stream processor model. Just recently, two more implementations have been made [28, 6]. Yet another implementation [5] uses the GPU only for intersection tests and performs the rest of the algorithm on the CPU. All of these implementations use triangles as the sole primitive.

In this report we take a closer look at how to utilize graphics processing units for the purpose of ray tracing Bézier surfaces. Several existing methods are considered, in order to find one fit to be implemented on a GPU. GPUs have several limitations which need to be taken into account when choosing a method. We choose an appropriate method and implement ray tracing of rectangular Bézier surfaces as well as nonparametric triangular Bézier surfaces.

1.2 Overview

The report has been split into seven chapters which, excluding the current chapter, have the following content:

Chapter 2 contains an introduction to triangular and rectangular Bézier surfaces.

Chapter 3 introduces the concepts of the graphics processing unit and explains the advantages and limitations we need to take into consideration.

Chapter 4 gives an overview of the different methods that exist for ray tracing Bézier surfaces, along with any advantages or disadvantages the methods may have in comparison.

Chapter 5 explains the algorithm that was chosen for implementation in this work and the reasons for making this choice. Also included are a discussion of, and solutions to, the different problems that exist with the algorithm.

Chapter 6 gives details on the GPU specific implementation.

Chapter 7 finally presents the results, a conclusion and a short discussion on how this work may be extended in the future.
Chapter 2 Bézier Surfaces

A Bézier surface is a parametric surface defined by a set of control points (usually in three-dimensional Euclidean space) and a set of basis functions, in this case Bernstein polynomials, which hold the weight of each control point throughout the parametric domain of the surface. Bézier surfaces yield a natural and intuitive way of representing surfaces, and the associated algorithms are usually fast and numerically stable. This chapter gives a brief overview of two types of Bézier surfaces, triangular and rectangular.

2.1 Triangular Bézier Surface

For triangular surfaces, we are given a set of control points $\mathbf{p}_{i,j,k}$, $i + j + k = n$, where $n$ is the degree of the surface. In the triangular case, each control point is assigned a bivariate Bernstein polynomial as basis function, defined as

$$B^n_{i,j,k}(\mathbf{u}) = \frac{n!}{i!\,j!\,k!}\,u^i v^j w^k, \qquad i + j + k = n, \tag{2.1}$$

where $\mathbf{u} = (u, v, w)$ is given in barycentric coordinates. The parametric domain is set to be $u, v, w \ge 0$, $u + v + w = 1$. A few examples of the Bernstein polynomials can be seen in Figure 2.2.

More thorough information on barycentric coordinates can be found in [14, 13], but briefly, the theory proceeds as follows: assume we have points $\mathbf{a}, \mathbf{b}, \mathbf{c}, \mathbf{p} \in \mathbb{R}^2$. Then $\mathbf{p} = u\mathbf{a} + v\mathbf{b} + w\mathbf{c}$ is called a barycentric combination. We require that $u + v + w = 1$. Now $\mathbf{u} = (u, v, w)$ is said to be the barycentric coordinates of $\mathbf{p}$ in the barycentric coordinate system $(\mathbf{abc})$. Note that (2.1) really is bivariate, since $w = 1 - u - v$. Note also that the control points can be placed arbitrarily and independently of each other in Euclidean space.

Figure 2.1: Bézier triangle example. Control point indexing (a), corresponding basis functions (b) and surface (c) of a cubic Bézier triangle.

The equation of the surface takes the form

$$S^n(\mathbf{u}) = \sum_{i+j+k=n} \mathbf{p}_{i,j,k}\, B^n_{i,j,k}(\mathbf{u}). \tag{2.2}$$

If, however, the control points are distributed in a certain uniform manner, more specifically

$$\mathbf{p}_{i,j,k} = (u, v, w, z) = \left(\frac{i}{n}, \frac{j}{n}, \frac{k}{n}, p_{i,j,k}\right),$$

where $(u, v, w)$ are the barycentric coordinates of the point in a given plane, the surface can be reduced to the form

$$z = \sum_{i+j+k=n} p_{i,j,k}\, B^n_{i,j,k}(\mathbf{u}). \tag{2.3}$$

In this form, the task of ray tracing the surface becomes particularly simple, especially for low-degree surfaces (degree ≤ 4), since the ray-surface intersection can then be found by solving an explicit formula. We will look closer at this case in a later chapter.

A natural derivative to consider when working with triangular patches is the directional derivative [14]. It gives us a tool to compute the derivative along any direction in the parametric domain of the triangle:

$$D_{\mathbf{d}} S(\mathbf{u}) = n \sum_{i+j+k=n-1} \left(d\,\mathbf{p}_{i+1,j,k} + e\,\mathbf{p}_{i,j+1,k} + f\,\mathbf{p}_{i,j,k+1}\right) B^{n-1}_{i,j,k}(\mathbf{u}), \tag{2.4}$$

where the (barycentric) vector $\mathbf{d} = \mathbf{u}_2 - \mathbf{u}_1 = (d, e, f)$ defines the direction of the derivative. As we can see from (2.4), a derivative of a Bézier surface is itself a Bézier surface. Note that for a vector in barycentric coordinates we have $d + e + f = 0$.

Figure 2.2: Three of the basis functions of a cubic Bézier triangle: $B^3_{1,1,1} = 6uvw$, $B^3_{0,3,0} = v^3$ and $B^3_{0,1,2} = 3vw^2$.

However, for the purposes of this thesis, we need non-barycentric partial derivatives. The surface can easily be rewritten as a function of two parameters by substituting $w = 1 - u - v$. The partial derivatives of the surface then become simple special cases of the directional derivatives:

$$\frac{\partial}{\partial u} S^n(u, v, 1 - u - v) = D_{(1,0,-1)} S(\mathbf{u}), \tag{2.5}$$

$$\frac{\partial}{\partial v} S^n(u, v, 1 - u - v) = D_{(0,1,-1)} S(\mathbf{u}). \tag{2.6}$$

Bézier surfaces enjoy several useful properties, one of which is invariance under affine transformations. This means the overall shape of the surface remains unchanged when translating or rotating the surface (that is, applying a translation or rotation transformation to all control points).
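As an illustration of (2.1) and (2.2) and of the invariance property, the following minimal Python sketch evaluates a Bézier triangle directly from its definition. The function names `bernstein_tri` and `eval_triangle` and the dictionary layout of the control points are assumptions made for this example only; they are not taken from the implementation described in this report.

```python
from math import factorial

import numpy as np


def bernstein_tri(n, i, j, k, u, v, w):
    """Bivariate Bernstein polynomial B^n_{i,j,k}(u, v, w) as in (2.1)."""
    coeff = factorial(n) // (factorial(i) * factorial(j) * factorial(k))
    return coeff * u**i * v**j * w**k


def eval_triangle(ctrl, n, u, v):
    """Evaluate the surface (2.2) at barycentric coordinates (u, v, 1 - u - v).

    ctrl is a hypothetical layout: a dict mapping index triples (i, j, k)
    with i + j + k = n to 3D control points.
    """
    w = 1.0 - u - v
    point = np.zeros(3)
    for (i, j, k), p in ctrl.items():
        point += bernstein_tri(n, i, j, k, u, v, w) * np.asarray(p, dtype=float)
    return point
```

With such a helper, affine invariance can be checked numerically: evaluating the patch built from transformed control points should give the same result as transforming the evaluated point, up to rounding.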
Also, Bézier surfaces have the convex hull property, meaning that the surface is always completely contained inside the convex hull generated by its control points. This property is very useful for ray tracing purposes as we shall see later.

2.2 Rectangular Bézier Surface

For rectangular surfaces we have a set of control points $\mathbf{p}_{i,j}$, $0 \le i \le m$, $0 \le j \le n$, where $m$ and $n$ are the degrees of the surface along the two parametric directions. In this case, the basis functions are given by the product of two univariate Bernstein polynomials:

$$B^{m,n}_{i,j}(u, v) = B^m_i(u)\, B^n_j(v), \tag{2.7}$$

where

$$B^n_i(t) = \binom{n}{i}\, t^i (1 - t)^{n-i}, \qquad 0 \le t \le 1, \tag{2.8}$$

yielding the surface equation

$$S^{m,n}(u, v) = \sum_{i=0}^{m} \sum_{j=0}^{n} \mathbf{p}_{i,j}\, B^{m,n}_{i,j}(u, v) = \mathbf{u}^T P \mathbf{v}. \tag{2.9}$$

Figure 2.3: Bicubic Bézier surface example.

Figure 2.4: Three of the sixteen basis functions of a bicubic Bézier rectangle: $B^{3,3}_{0,3} = (1 - u)^3 v^3$, $B^{3,3}_{1,1} = 3u(1 - u)^2\, 3v(1 - v)^2$ and $B^{3,3}_{0,1} = (1 - u)^3\, 3v(1 - v)^2$.

In the last expression of (2.9) the equation has been rewritten as a matrix-vector product, where

$$\mathbf{u}^T = \left[B^m_0(u) \;\; B^m_1(u) \;\; \dots \;\; B^m_m(u)\right], \tag{2.10}$$

$$\mathbf{v}^T = \left[B^n_0(v) \;\; B^n_1(v) \;\; \dots \;\; B^n_n(v)\right], \tag{2.11}$$

and $P$ is the $(m + 1) \times (n + 1)$ matrix with the surface control points as elements. The partial derivatives of a rectangular patch are given by

$$\frac{\partial}{\partial u} S^{m,n}(u, v) = m \sum_{i=0}^{m-1} \sum_{j=0}^{n} (\mathbf{p}_{i+1,j} - \mathbf{p}_{i,j})\, B^{m-1,n}_{i,j}(u, v), \tag{2.12}$$

$$\frac{\partial}{\partial v} S^{m,n}(u, v) = n \sum_{i=0}^{m} \sum_{j=0}^{n-1} (\mathbf{p}_{i,j+1} - \mathbf{p}_{i,j})\, B^{m,n-1}_{i,j}(u, v). \tag{2.13}$$

Again, we can see that the partial derivatives of a Bézier surface are also Bézier surfaces. Like triangular surfaces, rectangular surfaces are invariant under affine transformations and have the convex hull property.

Chapter 3 The Graphics Processing Unit

Before going into the details of the algorithms used for the implementation, we need some information about the use of graphics processing units (GPUs) for purposes other than graphics.
The current design of GPUs has a few limitations which need to be explained in order to motivate the choice of algorithms.

3.1 General purpose usage

The main purpose of a GPU is to accelerate the rendering of triangles using the traditional rasterization technique. This includes hardware transformation and decoration, such as lighting, texture mapping, bump mapping, etc. Figure 3.1 shows a rudimentary overview of the rendering pipeline. The transformation and decoration steps are programmable, giving the programmer the means for implementing a wide range of different effects. The effect programs are known as vertex shaders and fragment shaders, used for transformations and decorations respectively.

Figure 3.1: Rough overview of the GPU pipeline. The vertex and fragment shaders are the programmable parts of the GPU. The vertex shader transforms the geometry and the fragment shader determines the value (color) of the pixels.

In recent years, the research community has started to utilize the programmability of modern GPUs to perform general purpose calculations [34, 29, 19, 2, 30], often huge numbers of simple calculations. The GPU is extremely efficient for this purpose, performing simple calculations on large sets of data in parallel. For general purpose computations we normally do not have much use for vertex shaders, but instead use the fragment shader. The algorithms are implemented as a number of fragment shader programs, also referred to as kernels, and we use the stream processor model (Figure 3.2), where each kernel works on a stream of data.

In 2002, [37] introduced the GPU as a stream processor for ray tracing purposes. Several GPU ray tracers have been implemented since then [38, 28, 10]. In [5] the GPU is utilized for intersection computations only, leaving the rest of the ray tracing algorithm for the CPU to resolve.
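The stream processor model can be pictured with a small CPU-side sketch in Python. This is only an analogy under stated assumptions: `run_kernel`, `scale_kernel` and the `read_only` dictionary are hypothetical names for this example, and a real GPU would execute the kernel for many stream elements in parallel rather than in a loop.

```python
def run_kernel(kernel, stream, read_only):
    """Apply a kernel independently to every element of the input stream.

    Each invocation sees only its own element and shared read-only data,
    and communicates with the outside world only through its return value.
    """
    return [kernel(element, read_only) for element in stream]


def scale_kernel(element, read_only):
    """A trivial example kernel: scale each element by a constant factor."""
    return read_only["factor"] * element


# One pass over a stream of three elements.
output = run_kernel(scale_kernel, [0.0, 1.0, 2.0], {"factor": 2.0})
```

Because no invocation depends on the result of another, the order of evaluation does not matter, which is what makes the model suitable for parallel hardware.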
3.2 Limitations

Unfortunately, when it comes to more complex programs, the GPU is not as easy to program as a CPU. A GPU does not have all the functionality and flexibility of a CPU, simply because it does not need them in order to accomplish its main task, rendering triangles.

Firstly, and most importantly, GPUs have a different memory model [30]. Every kernel has access only to read-only memory, in the form of textures. During execution, there are a number of temporary registers available to store intermediate values, but the only way to communicate values outside the program is through the return values of the program. Normally, this value is sent to the final buffer used for display, but by using render-to-texture instead, we have the means to send data back to the CPU. Alternatively, it makes it possible to set up data for further processing in other kernels (or repeated processing in the current kernel), a technique called multipass rendering (see Figure 3.3).

GPUs have no stack [30]. This means recursion is not allowed within kernels. On the other hand, there are loop constructs and conditional constructs we can use to implement similar stackless algorithms. Alternatively, we can take advantage of the multipass rendering technique to simulate a stack.

Furthermore, GPUs do their computations in a very limited precision environment. The maximum precision available is what is normally called single precision, in other words, 32 bit floating point precision. This affects algorithms that depend highly on numerical accuracy. We shall see later that it has a serious impact on one of the algorithms implemented during this work.

Figure 3.2: Streaming model used for general purpose GPU programming.

Figure 3.3: Multipass rendering using one kernel (a) or several kernels (b).

Chapter 4 Ray Tracing Bézier Surfaces

In this chapter, we will take a look at different existing methods for ray tracing Bézier surfaces, that is, methods for finding the intersection between a ray and a Bézier surface. In addition to a description, the advantages and disadvantages of each method are discussed, including whether the method is appropriate for an implementation on a GPU. The examples in this chapter all concern rectangular patches, but each method below can (with some modification) be used on triangular patches as well.

Before heading into the different methods, we will say a few words about a technique that can be used together with most of the methods. If we, before applying the ray tracing method, project all control points of the surface to two dimensions [26, 33], we can reduce the amount of work in surface specific operations (for example the de Casteljau algorithm and surface point and derivative evaluations) by about 33% (even the problem of ray tracing rational surfaces can be reduced to two dimensions [33]). Doing this does not always reduce the total cost of finding the intersection, since we lose some depth information. We often need to do extra computations in the final stages of the algorithm, computations that would not be needed if the projection was not performed.

A ray can be written in parametric form as

$$\mathbf{r} = \mathbf{o} + t\mathbf{d} = (o_x + t d_x,\; o_y + t d_y,\; o_z + t d_z), \tag{4.1}$$

where $\mathbf{o}$ is the ray origin, $\mathbf{d}$ is the (normalized) direction of the ray and $t$ is a variable scalar that determines positions along the ray (alternatively the length of the ray). If we find two perpendicular planes, intersecting exactly along the ray, we can use the plane normals as basis vectors for a two-dimensional Euclidean space. We easily construct these normals as [31]:

$$\mathbf{n}_1 = \begin{cases} (d_y, -d_x, 0) & \text{if } |d_x| > |d_y| \text{ and } |d_x| > |d_z|, \\ (0, d_z, -d_y) & \text{otherwise,} \end{cases} \tag{4.2}$$

$$\mathbf{n}_2 = \mathbf{n}_1 \times \mathbf{d}. \tag{4.3}$$

In this space, the ray reduces to a point. If we want the 'ray point' to lie at the origin, the projection of a control point $\mathbf{p}_i$ from $\mathbb{R}^3$ to the currently defined space is performed by

$$\mathbf{p}'_i = \begin{pmatrix} \mathbf{p}_i \cdot \mathbf{n}_1 - \mathbf{n}_1 \cdot \mathbf{o} \\ \mathbf{p}_i \cdot \mathbf{n}_2 - \mathbf{n}_2 \cdot \mathbf{o} \end{pmatrix}. \tag{4.4}$$

The projection is illustrated in Figure 4.1. Finding the intersection now reduces to the problem of finding the parametric coordinates of the surface at the origin in the two-dimensional space.

Figure 4.1: Projection of a surface from three to two dimensions.

4.1 Subdivision

Subdivision is a straightforward method for finding an intersection and was implemented for bicubic Bézier patches in [1] using SIMD optimizations on a CPU (and earlier in [50]). The idea is simple: split the surface into smaller and smaller patches, throwing away those patches that cannot be intersected by the ray, until some maximum depth has been reached. Then we are left with a (hopefully) small set of patches that are candidates for the nearest ray-surface intersection.

Splitting the surface may be done efficiently using the classic de Casteljau algorithm. Discarding patches is simple: if all of the control points have the same sign on one of the coordinates (in the two-dimensional space defined before), the patch cannot be intersected by the ray due to the convex hull property (see Figure 4.2). Finally, the control point mesh of each candidate subpatch is used as an approximation to the surface and the intersection is found by performing simple ray-triangle intersection on the triangles generated by the mesh.

Figure 4.2: Subdivision method. Surface (a) is split into (b) and (c). (b) can be discarded since all of its control points lie on one side of a coordinate axis.

Although the basic idea is simple and easy to implement, there are some drawbacks to the method. First of all, an object consisting of several surfaces must use the same subdivision depth for all its surfaces in order to avoid cracks.
Secondly, since we are approximating the surface with control point meshes, the intersection points can be found only with a low degree of accuracy. We can improve accuracy by increasing the subdivision depth, but that would make the method slower. Lastly, since we need to cache at least one surface at every subdivision depth, the demand for temporary memory is high. For this reason the method is hard (if not impossible) to implement on a GPU.

4.2 Newton's method

Finding a zero of a system of nonlinear equations is a standard problem in computational mathematics, and a problem that appears in many applications. One of the few tools available for this task is Newton's method [23].

Finding the intersection between a ray and a Bézier surface can easily be formulated as a system of equations. Using the projected control points (4.4) we already have a system of nonlinear equations ready to solve: $S(u, v) = (0, 0)$ (see Figure 4.3). In [17, 31] a slightly different approach is taken, performing the projection after surface points and derivatives have been computed in $\mathbb{R}^3$. In [44] the method is implemented for triangular Bézier surfaces.

Newton's method is an iterative method, derived from the truncated Taylor series of the system [23]:

$$f(\mathbf{x} + \mathbf{s}) \approx f(\mathbf{x}) + J_f(\mathbf{x})\,\mathbf{s}. \tag{4.5}$$

The Jacobian $J_f$ of a function $f : \mathbb{R}^n \to \mathbb{R}^n$ is an $n \times n$ matrix with elements

$$\{J_f\}_{ij} = \frac{\partial f_i}{\partial x_j}. \tag{4.6}$$

If we assume $\mathbf{x} + \mathbf{s}$ to be a zero of $f$ we have $J_f(\mathbf{x})\,\mathbf{s} \approx -f(\mathbf{x})$. We let $\mathbf{s}$ be the change of the current approximate zero $\mathbf{x}_i$ and get the Newton step (note the inverse Jacobian):

$$\mathbf{x}_{i+1} = \mathbf{x}_i + \mathbf{s} = \mathbf{x}_i - J_f^{-1}(\mathbf{x}_i)\, f(\mathbf{x}_i). \tag{4.7}$$

Newton's method can be used for finding the zero of a system in a fast, but not unconditionally stable manner. To start iterating we need an initial guess $\mathbf{x}_0$, and in order for the method to converge to a solution, it is crucial to have an initial guess close enough to the solution.
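The projection (4.2) to (4.4) and the Newton step (4.7) can be combined into a short sketch for a bicubic patch. The Python code below is an illustration under stated assumptions, not the GPU implementation of this report: the function names, the `(4, 4, 3)` array layout of the control net, the iteration limit and the tolerance are all choices made for this example, and the linear system is solved directly instead of forming the inverse Jacobian.

```python
import numpy as np


def bernstein3(t):
    """Cubic Bernstein basis values and their derivatives at parameter t."""
    s = 1.0 - t
    b = np.array([s**3, 3 * t * s**2, 3 * t**2 * s, t**3])
    db = np.array([-3 * s**2, 3 * s**2 - 6 * t * s, 6 * t * s - 3 * t**2, 3 * t**2])
    return b, db


def project_control_points(P, o, d):
    """Project a (4, 4, 3) control net to two dimensions as in (4.2)-(4.4)."""
    dx, dy, dz = d
    if abs(dx) > abs(dy) and abs(dx) > abs(dz):
        n1 = np.array([dy, -dx, 0.0])
    else:
        n1 = np.array([0.0, dz, -dy])
    n2 = np.cross(n1, d)
    rel = P - o  # subtracting o places the ray at the 2D origin
    return np.stack([rel @ n1, rel @ n2], axis=-1)  # shape (4, 4, 2)


def newton_intersect(P2, u, v, max_iter=20, tol=1e-10):
    """Solve S(u, v) = (0, 0) for the projected patch using Newton steps (4.7).

    Returns (u, v, converged). np.linalg.solve raises LinAlgError if the
    Jacobian is exactly singular; near-singular cases are not handled here.
    """
    for _ in range(max_iter):
        bu, dbu = bernstein3(u)
        bv, dbv = bernstein3(v)
        f = np.einsum("i,ijk,j->k", bu, P2, bv)
        if np.linalg.norm(f) < tol:
            return u, v, True
        # Jacobian columns are the partial derivatives dS/du and dS/dv.
        J = np.column_stack([
            np.einsum("i,ijk,j->k", dbu, P2, bv),
            np.einsum("i,ijk,j->k", bu, P2, dbv),
        ])
        step = np.linalg.solve(J, -f)
        u, v = u + step[0], v + step[1]
    return u, v, False
```

The sketch does not check that the returned parameters stay inside the unit square, and, as noted above, whether it converges at all depends on the initial guess passed in as `u` and `v`.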
For ray tracing purposes this is often solved using precomputed bounding volume hierarchies, which completely enclose the surface (Figure 4.4(b)). Each volume in the hierarchy is associated with one or several initial guesses for the part of the surface contained in that particular volume [31, 17, 10]. If the ray intersects one of the volumes, the probability that it also intersects the surface is high, and we can start a Newton iteration using the initial guess of the current volume. Interval analysis gives us another tool to ensure convergence as we shall see in the next section.

There are other problems inherent to the method in addition to the vague convergence conditions. If the Jacobian $J_f$ is close to singular, that is, if the condition number of $J_f$ is very large, we will run into serious numerical problems. This happens when the ray is close to tangential to the surface and close to areas on the surface where at least one of the partial derivatives is close to zero (for example when neighboring control points of the surface coincide). This is solved in [31, 10] by slightly perturbing the parametric point to push away from the problem area.

Another problem concerns multiple intersections within one bounding volume (see Figure 4.5). We can increase the chance of getting the correct solution by assigning several initial guesses to every bounding volume and putting some effort into choosing the best one for a given ray [10].

Newton's method is the method chosen for implementation on GPU (using only one initial guess per bounding volume), and we will go into more detail on how to apply the method for ray tracing Bézier surfaces in the next chapter.

Figure 4.3: The intersection is located at $(x, y) = (0, 0)$, and we therefore search for $(u, v)$-coordinates such that $S(u, v) = (0, 0)$.

Figure 4.4: Actual intersection (a) and bounding volume technique (b) to find initial guess for Newton's method.

Figure 4.5: Problem with multiple intersections within a bounding volume.

4.3 Interval Analysis

Interval analysis is a tool often used in error analysis to keep track of error propagation and to compute error limits [22], but it can also be utilized for ray tracing. Again Newton's method is used to find the desired intersection, but interval analysis can be used for providing good starting approximations. In fact, by utilizing a couple of theorems of interval analysis, we can test whether Newton's method is guaranteed to converge to a unique solution inside a given parametric interval, using any initial guess inside the interval [45].

This method could be used in conjunction with a precomputed bounding box hierarchy. Every time a node is reached, a test for guaranteed convergence is performed, and if convergence is guaranteed, we start a Newton search. If a leaf node has been reached and convergence is still not guaranteed, the parametric interval is split further on-the-fly.

Interval analysis also provides us with the interval Newton iteration method, similar to Newton's method used in the previous section, but instead working on intervals [45]. Given a parametric interval containing the intersection point, this method can under certain conditions be used to give us a fairly rapid convergence of the interval to the intersection point.

A downside of the method is that we may need to make an extreme number of surface splits (or interval Newton iterations) to get an interval inside which convergence is guaranteed. Additionally, in a prototype CPU implementation, the test for guaranteed convergence proved slow, and this method was therefore never included in our GPU ray tracer.
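To give a feeling for how an interval Newton step works, here is a deliberately simplified one-dimensional Python sketch. It is not the two-dimensional test of [45] and not part of the GPU ray tracer: the `Interval` class, the function `interval_newton_step` and the example function $f(x) = x^2 - 2$ are assumptions made for this illustration, and the arithmetic uses ordinary floating point without the outward rounding a rigorous implementation would need. The step computes $N(X) = m - f(m) / F'(X)$, where $m$ is the midpoint of $X$ and $F'(X)$ is an interval enclosing the derivative on $X$. A standard result of interval analysis states that if $N(X)$ lies inside $X$, then $f$ has exactly one zero in $X$.

```python
class Interval:
    """A closed interval [lo, hi] (no outward rounding; illustration only)."""

    def __init__(self, lo, hi):
        self.lo, self.hi = float(lo), float(hi)

    def contains_zero(self):
        return self.lo <= 0.0 <= self.hi

    def inside(self, other):
        return other.lo <= self.lo and self.hi <= other.hi


def interval_newton_step(f, df_enclosure, X):
    """Return N(X) = m - f(m) / F'(X), or None if F'(X) contains zero.

    A None result means the test is inconclusive and X should be split.
    """
    D = df_enclosure(X)
    if D.contains_zero():
        return None
    m = 0.5 * (X.lo + X.hi)
    fm = f(m)
    q_lo, q_hi = sorted([fm / D.lo, fm / D.hi])
    return Interval(m - q_hi, m - q_lo)


# Example: f(x) = x^2 - 2, whose derivative 2x is increasing, so its
# enclosure over X is simply [2 * X.lo, 2 * X.hi].
f = lambda x: x * x - 2.0
df = lambda X: Interval(2.0 * X.lo, 2.0 * X.hi)

N = interval_newton_step(f, df, Interval(1.0, 2.0))
```

For the interval $[1, 2]$ the step yields a narrower interval inside $[1, 2]$, which certifies a unique zero there. For $[-2, 2]$ the derivative enclosure contains zero and the step returns `None`, corresponding to the case described above where the interval has to be split further.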
For more information on the details of the method, the reader is referred to [45, 32], where the method is implemented and described in more depth. See also [27, 20] for a similar approach for implicit surfaces.

4.4 Bézier clipping

Bézier clipping, introduced in [33], is a method that iteratively narrows the parametric domain of the surface in which the intersection can lie. It was later improved in [3, 4] and has also been implemented for triangular Bézier surfaces in [39].

The given control point mesh is first projected onto two dimensions (even the problem of ray tracing rational surfaces can be reduced to two dimensions [33]). In this example we will perform clipping in the parametric $u$-direction. First of all, we define a line $L$:

$$ax + by = 0, \qquad a^2 + b^2 = 1, \tag{4.8}$$

through the origin, parallel to the vector $\mathbf{v}_0 + \mathbf{v}_1$ (see Figure 4.6(a)). We then define the signed distance from every (projected) control point $\mathbf{p}_{i,j} = (x_{i,j}, y_{i,j})$ to $L$ as

$$d_{i,j} = a x_{i,j} + b y_{i,j} \tag{4.9}$$

(Figure 4.6(b)). Now, the control points

$$\mathbf{d}_{i,j} = \left(\frac{i}{m}, \frac{j}{n}, d_{i,j}\right)$$

($m$ and $n$ being the surface degrees) can be used to form a new Bézier surface $d(u, v)$, for which we find the convex hull (seen from the side in Figure 4.7). Analyzing where this convex hull intersects the zero axis, we find the parametric interval in which the ray can intersect the original surface. The ray can only intersect the surface in the interval $[u_{min}, u_{max}]$.

Figure 4.6: Bézier clipping method. Determine a line L (a) and compute distances to L from every control point (b).

Finally, when we have found $u_{min}$ and $u_{max}$, we compute new control points for the subpatch corresponding to the subinterval of the original parametric domain (Figure 4.8). The procedure is then repeated, alternating between the parametric directions, until the desired accuracy has been reached. If, in some step, the total reduction of the parametric interval is too small (which for example may happen when we have multiple intersections), the surface is split in half, and the procedure is continued for each of the halves.

This last case is the reason for not implementing this method on a GPU. When performing a split, we are left with two new surfaces, one of which must be stored temporarily while we work with the other, which makes the method unfit for GPU (for the same reasons as the subdivision method). The method is stable in the sense that it is guaranteed to find an intersection if it exists, but it is relatively slow for surfaces of higher degree [39].

Figure 4.7: Convex hull of 'distance' surface.

Figure 4.8: Resulting subpatch after one iteration of Bézier clipping.

4.5 Other methods

There are a few other methods in addition to the ones mentioned in this chapter. Probably the most commonly used method today is tessellation of the surface, that is, approximating the surface with a set of triangles, before starting the ray trace. This solution has similarities to the subdivision method, but often has an extreme demand on memory to store the precomputed meshes. However, ray tracers with triangles as primitives have already been successfully implemented on GPUs, so this method would certainly be an alternative for GPU implementation.

Other methods include implicitization methods. If possible we rewrite the parametric surface as an implicit surface, or if that is too costly, approximate the surface with other implicit surfaces [42, 41, 18].

Chapter 5 Ray Tracing Algorithm

In this work, we have implemented ray tracing of rectangular, bicubic Bézier surfaces using Newton's method, since this method fits within the limitations of modern GPUs. Apart from the needed precomputation, it is fast and has very little demand on memory for intermediate data during the actual search.
Bounding volume hierarchy traversal is used to find candidate surfaces for intersection with the ray, and further to find initial guesses for Newton’s method. This traversal is done without recursion, since GPUs do not have a stack and thus cannot support recursion. As bounding volumes, we use axis-aligned boxes. Furthermore, ray tracing of nonparametric triangular Bézier surfaces has been implemented. This is done in a brute force way, in the sense that no space partitioning has been made, but every surface is tested against every ray. Nonparametric surfaces are faster to ray trace and could be used for terrain or water (height maps). This chapter describes the implemented algorithms in detail.

5.1 Traversing the Bounding Volume Hierarchy

Space partitioning is a fundamental technique when accelerating ray tracing, allowing us to reduce the total number of intersection tests we need to make. But having no stack affects the algorithms that are usually used when traversing the space partition hierarchies in ray tracing. Either we simulate a stack, or we rewrite the algorithms to fit the non-recursive environment. There are several ways we can choose to do space partitioning. In [21] it is shown that KD-trees are the best choice for most purposes when performing ray tracing on CPU. A KD-tree is built recursively by splitting each node into two smaller nodes using an axis-aligned splitting plane (see Figure 5.1). Stackless KD-tree traversal has been successfully implemented on a GPU in [16]; see also [12], where KD-tree traversal on GPU is implemented by simulating a stack. However, this approach works with a fixed stack depth, and the memory cost is high as we need memory proportional to the number of rays times the stack depth.

Figure 5.1: Simple two-dimensional KD-tree.

Figure 5.2: Bounding box hierarchy (a) and corresponding node tree (b).
Here, we have instead chosen to implement stackless bounding volume hierarchy traversal, using axis-aligned boxes as bounding volumes, and it turns out to be feasible indeed. In a bounding volume hierarchy, each node is associated with a volume completely enclosing its children (see Figure 5.2(a)). For Bézier surfaces, we build the hierarchy simply by repeatedly splitting the surface until some flatness criterion is met, or until a maximum depth is reached. In [31] a measure based on the surface curvature is used to determine the number of splits. The convex hull of the subsurface can then be used to generate a bounding volume.

Normally, the first time we visit a node, we compute the entry point of the ray in each of the child node volumes, in order to decide which child to traverse first. Having a stack, we have fast access to stored information about what choices we made during previous visits to the node. This is a luxury we must live without when doing a stackless implementation. The only memory we will use is a short traversal history, telling us which node we just came from. With a stack we would only have to compute child node entry points once. Here, we need to make this computation every time we visit a node. Having the child entry points we can, based on which node we came from, decide on where to traverse next.

We can enter the node from three directions: from its parent or from one of its children (Figure 5.2(b)). Coming from the parent, we proceed to one of the child nodes (provided the current node is not a leaf), and we decide on which child simply based on which entry point is closest to the ray origin. When coming from one of the children the decision is slightly more complicated. Assume we came from the left child. Now, if the left child entry point is closer to the ray origin than the right child entry point, we know that we traversed the left child first, and thus need to traverse right. If the left child entry point instead was further away than the right child entry point, we know that we have traversed both children, and instead traverse up. Coming from the right child we make traversal decisions in the same way. Of course, we keep the value of the currently nearest intersection at all times, and avoid traversing nodes that obviously cannot yield a closer intersection. Pseudo code for the algorithm is given in Algorithm 5.1.

Algorithm 5.1 Stackless bounding box hierarchy traversal
previousNode = Parent(root)
currentNode = root
t_n = INFINITY {Reset nearest intersection to infinity}
while currentNode ≠ Parent(root) do
  if currentNode has no children then
    t_n = min(t_n, nearest surface intersection in currentNode)
    previousNode = currentNode
    currentNode = Parent(currentNode)
  else
    BB_l, BB_r = bounding boxes of child nodes
    t_l, t_r = ray entry points in child node boxes
    if previousNode = Parent(currentNode) then
      previousNode = currentNode
      if ray hits BB_l and t_l ≤ t_r and t_l ≤ t_n then
        currentNode = LeftChild(currentNode)
      else if ray hits BB_r and t_r ≤ t_n then
        currentNode = RightChild(currentNode)
      else
        currentNode = Parent(currentNode)
      end if
    else if previousNode = LeftChild(currentNode) and ray hits BB_r and t_l ≤ t_r and t_r ≤ t_n then
      previousNode = currentNode
      currentNode = RightChild(currentNode)
    else if previousNode = RightChild(currentNode) and ray hits BB_l and t_l > t_r and t_l ≤ t_n then
      previousNode = currentNode
      currentNode = LeftChild(currentNode)
    else
      previousNode = currentNode
      currentNode = Parent(currentNode)
    end if
  end if
end while
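The traversal rules of Algorithm 5.1 can be sketched on the CPU as follows. This is only an illustrative Python sketch, not the Cg kernel used in this work: the node layout (a list of dictionaries with `parent`, `left`, `right` and `box` fields), the sentinel `NO_NODE`, the slab-test helper `ray_box_entry` and the callback `leaf_hit` are all hypothetical names chosen for the example. A missed box is given the entry distance infinity, so that the comparisons of the algorithm can be used unchanged.

```python
import math

INF = math.inf
NO_NODE = -1  # hypothetical sentinel playing the role of Parent(root)


def ray_box_entry(origin, direction, box):
    """Slab test against an axis-aligned box; returns the entry distance
    along the ray (clamped to 0), or INF if the ray misses the box."""
    lo, hi = box
    t_near, t_far = 0.0, INF
    for o, d, a, b in zip(origin, direction, lo, hi):
        if d == 0.0:
            if o < a or o > b:
                return INF  # parallel to the slab and outside it
            continue
        t0, t1 = (a - o) / d, (b - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return INF
    return t_near


def traverse(nodes, root, origin, direction, leaf_hit):
    """Stackless traversal; the only state is (previous, current, t_n).
    leaf_hit(node_index, origin, direction) returns the nearest surface
    intersection distance in a leaf, or INF if there is none."""
    prev, cur, t_n = NO_NODE, root, INF
    while cur != NO_NODE:
        node = nodes[cur]
        if node["left"] is None:  # leaf: test surface, then go back up
            t_n = min(t_n, leaf_hit(cur, origin, direction))
            prev, cur = cur, node["parent"]
            continue
        left, right = node["left"], node["right"]
        # Entry points are recomputed on every visit (no stack to cache them).
        t_l = ray_box_entry(origin, direction, nodes[left]["box"])
        t_r = ray_box_entry(origin, direction, nodes[right]["box"])
        hit_l, hit_r = t_l < INF, t_r < INF
        if prev == node["parent"]:  # arriving from above
            if hit_l and t_l <= t_r and t_l <= t_n:
                nxt = left
            elif hit_r and t_r <= t_n:
                nxt = right
            else:
                nxt = node["parent"]
        elif prev == left and hit_r and t_l <= t_r and t_r <= t_n:
            nxt = right  # left was visited first, right is still pending
        elif prev == right and hit_l and t_l > t_r and t_l <= t_n:
            nxt = left   # right was visited first, left is still pending
        else:
            nxt = node["parent"]  # both children handled or culled
        prev, cur = cur, nxt
    return t_n
```

Note how a child whose entry point lies beyond the current nearest intersection is never entered; in a two-leaf tree where the nearer leaf reports a hit in front of the other leaf's box, only the nearer leaf is visited.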
5.2 Rectangular Surface Intersection

After having reached a leaf node of the bounding volume hierarchy, we are set with the surface patch identifier and the Newton iteration initial guess associated with the node. We will not project the control points to two dimensions before starting the Newton search. Tests showed that the algorithm slowed down when preprojecting the control points: the cost of projecting the points evidently outweighed the reduced cost of computing the surface point and derivatives. Additionally, we need a depth value of the intersection point, something we get for free when using the original control points. Using projected control points, we need to make an extra surface point and derivative evaluation in three dimensions after the search. In our implementation, the mean number of iterations in the Newton search most often lay between two and three. With a higher mean, the reduced cost of surface point and derivative evaluation would probably be more beneficial.

Recall the Newton step (4.7). In order to get a two-dimensional search, we define the following function $f(u, v)$ to use in our search [31, 17]:

\[ f(u, v) = \begin{pmatrix} n_1 \cdot S(u, v) - n_1 \cdot o \\ n_2 \cdot S(u, v) - n_2 \cdot o \end{pmatrix}. \tag{5.1} \]

This basically means that we delay the projection to two dimensions until after we have computed the surface point. $n_1$ and $n_2$ are the normals defined in (4.2) and (4.3) and $o$ is the ray origin. Using this function, we get the Jacobian

\[ J_f = \begin{pmatrix} j_{11} & j_{12} \\ j_{21} & j_{22} \end{pmatrix} = \begin{pmatrix} n_1 \cdot S_u(u, v) & n_1 \cdot S_v(u, v) \\ n_2 \cdot S_u(u, v) & n_2 \cdot S_v(u, v) \end{pmatrix}. \tag{5.2} \]

The inverse of the Jacobian is particularly easy to calculate in the two-dimensional case:

\[ J_f^{-1} = \frac{1}{\det J_f} \begin{pmatrix} j_{22} & -j_{12} \\ -j_{21} & j_{11} \end{pmatrix}, \tag{5.3} \]

where the determinant is given by

\[ \det J_f = j_{11} j_{22} - j_{12} j_{21}, \tag{5.4} \]

giving us the final piece of information we need to start iterating. Newton’s method is described in Algorithm 5.2. We keep iterating until one of the stopping criteria is met.
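A single Newton update built from (5.1) to (5.4) can be sketched as below. This is a minimal Python illustration and not the Cg kernel of this work; the helper names `bernstein3`, `patch_eval` and `newton_step` are hypothetical, the patch is assumed to be given as a 4 × 4 grid `P[i][j]` of three-dimensional control points, and `n1`, `n2` are assumed to be two linearly independent normals of planes containing the ray, as in (4.2) and (4.3).

```python
def bernstein3(t):
    """Cubic Bernstein basis values and their derivatives at t."""
    s = 1.0 - t
    b = [s * s * s, 3.0 * s * s * t, 3.0 * s * t * t, t * t * t]
    db = [-3.0 * s * s, 3.0 * s * s - 6.0 * s * t,
          6.0 * s * t - 3.0 * t * t, 3.0 * t * t]
    return b, db


def patch_eval(P, u, v):
    """Surface point S and partial derivatives S_u, S_v of a bicubic patch."""
    bu, dbu = bernstein3(u)
    bv, dbv = bernstein3(v)
    S, Su, Sv = [0.0] * 3, [0.0] * 3, [0.0] * 3
    for i in range(4):
        for j in range(4):
            for k in range(3):
                p = P[i][j][k]
                S[k] += bu[i] * bv[j] * p
                Su[k] += dbu[i] * bv[j] * p
                Sv[k] += bu[i] * dbv[j] * p
    return S, Su, Sv


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def newton_step(P, o, n1, n2, u, v):
    """One update (u, v) <- (u, v) - J^{-1} f, using the closed-form 2x2
    inverse (5.3). Returns None if the Jacobian is singular."""
    S, Su, Sv = patch_eval(P, u, v)
    f1 = dot(n1, S) - dot(n1, o)          # first component of (5.1)
    f2 = dot(n2, S) - dot(n2, o)          # second component of (5.1)
    j11, j12 = dot(n1, Su), dot(n1, Sv)   # Jacobian entries (5.2)
    j21, j22 = dot(n2, Su), dot(n2, Sv)
    det = j11 * j22 - j12 * j21           # determinant (5.4)
    if det == 0.0:
        return None
    du = (j22 * f1 - j12 * f2) / det
    dv = (-j21 * f1 + j11 * f2) / det
    return u - du, v - dv
```

For a flat patch with control points `(i/3, j/3, 0)` the surface is simply S(u, v) = (u, v, 0), so a ray travelling along the negative z-axis is found in a single step from any initial guess; curved patches need the iteration described next.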
If at any time the norm of the function value is less than some preset tolerance, that is

\[ |f(u_n, v_n)| < \epsilon, \tag{5.5} \]

we consider the search a success and report a hit. If, however, this measure grows from one iteration to another,

\[ |f(u_{n+1}, v_{n+1})| > |f(u_n, v_n)|, \tag{5.6} \]

or if we reach a maximum number of iterations, we assume the ray missed the surface. Note that (5.5) only gives a measure of the spatial error, not the error of $u_n$ and $v_n$. We could use an alternative error measure as stopping criterion,

\[ \left| (u_{n+1}, v_{n+1})^T - (u_n, v_n)^T \right|, \]

giving us a measure of the error in the parametric domain. Finally, if a hit was reported, we make sure the resulting (u, v)-coordinates lie within the parametric domain of the surface. This test needs to be done with some tolerance to avoid visual artifacts in the seam between surfaces.

Algorithm 5.2 Newton’s method
u_0 = initial guess {u_n = (u_n, v_n)^T}
error = |f(u_0)|
n = 0
repeat
  u_{n+1} = u_n − J^{-1}(u_n) f(u_n)
  previousError = error
  error = |f(u_{n+1})|
  n = n + 1
until error < TOL or error > previousError or n ≥ MAXITER

5.3 Nonparametric Triangular Surface Intersection

Trying to exploit the simplicity of solving polynomials of low degree (less than or equal to three), we have also implemented ray tracing of nonparametric, triangular Bézier patches. These types of surfaces may well be used, for example, to interpolate height data, creating terrain or decorating reliefs. Numerically, the most appealing type is certainly the surface of second degree. There exist several simple interpolation schemes for these types of surfaces, for example the $C^1$ Clough-Tocher interpolant or, particularly for piecewise second degree surfaces, the $C^1$ Powell-Sabin interpolants [13]. We opted for testing cubic surfaces, which gives us the possibility to use the explicit cubic formula to solve the intersection problem.
However, due to numerical problems with this approach, we also present a different method, based on the one-dimensional Newton method. In order to utilize the implicit form of the surface, we first transform the ray into the local coordinate system of the surface. We assume the surface has been modeled in the barycentric coordinate system (abc), a = (1, 0, 0), b = (0, 1, 0), c = (0, 0, 0) (Figure 5.3). By making this assumption, we can simply use the inverse of any transformation applied to the surface to get the ray into the local coordinate system of the surface. Furthermore, the barycentric coordinates needed as argument for the basis function are easily extracted from the local (x, y)-coordinates by letting $\mathbf{u} = (x, y, 1 - x - y)$.

Figure 5.3: Local coordinate system of nonparametric triangular Bézier surface.

Now, we rewrite the surface in implicit form and insert the elements of the transformed ray’s parametric form (4.1), yielding

\[ z - f(x, y) = o_z + t d_z - \sum_{i+j+k=3} p_{i,j,k} B^3_{i,j,k}(\mathbf{u}_r) = 0, \tag{5.7} \]

where $\mathbf{u}_r = (o_x + t d_x,\; o_y + t d_y,\; 1 - (o_x + t d_x + o_y + t d_y))$. Expanding (5.7) reveals the coefficients of a third degree polynomial,

\[ c_0 t^3 + c_1 t^2 + c_2 t + c_3 = 0, \tag{5.8} \]

and how these coefficients depend on the elements of the ray. Before solving the polynomial we rewrite it in the following form:

\[ t^3 + a t^2 + b t + c = 0. \tag{5.9} \]

As suggested before, there are some numerical aspects we need to consider when solving (5.9). First of all, to avoid excessively large coefficients and thus potential numerical problems (remember, we are working in 32 bit precision only), we reset the ray origin to a location closer to the surface along the ray. All coefficients except $c_0$ depend on the ray origin coordinates, and large origin values can strongly affect the accuracy of the final result. Here, we solve the problem by simply resetting the origin to the entry point of the surface bounding volume.
This may not yield the best possible values, but proved good enough for our tests. We keep track of the distance we move the origin in order to be able to get the distance between the actual origin and the intersection.

To solve (5.9) explicitly, we follow the methods outlined in [36, 49], with some modifications to improve numerical stability. Define

\[ Q = \frac{a^2 - 3b}{9}, \tag{5.10} \]

\[ R = \frac{2a^3 - 9ab + 27c}{54}. \tag{5.11} \]

The discriminant is defined as

\[ D = R^2 - Q^3. \tag{5.12} \]

For large values of $a$, $R^2$ and $Q^3$ are both completely dominated by the term $a^6/729$, and we have an obvious risk of cancellation. Using instead the expansions

\[ R^2 = \frac{a^6}{729} - \frac{a^4 b}{81} + \frac{a^3 c}{27} + \frac{a^2 b^2}{36} - \frac{abc}{6} + \frac{c^2}{4}, \]

\[ Q^3 = \frac{a^6}{729} - \frac{a^4 b}{81} + \frac{a^2 b^2}{27} - \frac{b^3}{27}, \]

to compute $D$, we arrive at

\[ D = \frac{b^3}{27} - \frac{a^2 b^2}{108} + \frac{a^3 c}{27} - \frac{abc}{6} + \frac{c^2}{4}. \tag{5.13} \]

The sign of the discriminant reveals the number of real roots of the polynomial. If $D < 0$ it has three distinct real roots, if $D = 0$ it has three real roots of which at least two are equal, and if $D > 0$ it has one real root only. If $D < 0$, we compute an angle $\theta$ using one of

\[ \theta = \frac{\pi}{2} - \arctan \frac{R}{\sqrt{-D}}, \tag{5.14} \]

\[ \theta = \arccos \frac{R}{\sqrt{Q^3}}, \tag{5.15} \]

and then get the three real roots using

\[ t_1 = -2\sqrt{Q} \cos\left(\frac{\theta}{3}\right) - \frac{a}{3}, \tag{5.16} \]

\[ t_2 = -2\sqrt{Q} \cos\left(\frac{\theta + 2\pi}{3}\right) - \frac{a}{3}, \tag{5.17} \]

\[ t_3 = -2\sqrt{Q} \cos\left(\frac{\theta + 4\pi}{3}\right) - \frac{a}{3}. \tag{5.18} \]

If instead $D \geq 0$, we compute one real root (note that even if $D = 0$, we still compute one root only) using

\[ S = -\operatorname{sgn}(R) \left[ |R| + \sqrt{D} \right]^{1/3}, \qquad T = \begin{cases} Q/S, & S \neq 0 \\ 0, & S = 0 \end{cases}, \qquad t = S + T - \frac{a}{3}. \tag{5.19} \]

Of course we also need to take care of the degenerate cases when (5.8) reduces to a second or first degree polynomial. On a CPU prototype implementation, (5.14) yielded better results than (5.15), but in the GPU implementation (using the Cg language) the trigonometric function arctan had unpleasantly bad accuracy. In this case arccos proved better, but still not nearly as good as in the CPU implementation.
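The explicit solution can be sketched in a few lines of Python. This is a double precision CPU illustration under stated assumptions, not the Cg code of this work: the function name `solve_cubic` is hypothetical, the angle is computed with the arccos form (5.15), the argument of arccos is clamped to [−1, 1] as an extra safeguard that is not part of the description above, and the degenerate lower-degree cases are not handled.

```python
import math


def solve_cubic(a, b, c):
    """Real roots of t^3 + a t^2 + b t + c = 0, in ascending order.
    Returns three roots if D < 0, otherwise the single root of (5.19)."""
    Q = (a * a - 3.0 * b) / 9.0
    R = (2.0 * a ** 3 - 9.0 * a * b + 27.0 * c) / 54.0
    # Expanded discriminant (5.13): the dominating a^6/729 terms of R^2 and
    # Q^3 have been cancelled symbolically instead of numerically.
    D = (b ** 3 / 27.0 - a * a * b * b / 108.0 + a ** 3 * c / 27.0
         - a * b * c / 6.0 + c * c / 4.0)
    if D < 0.0:
        ratio = R / math.sqrt(Q ** 3)
        theta = math.acos(max(-1.0, min(1.0, ratio)))  # clamp: safeguard only
        m = -2.0 * math.sqrt(Q)
        return sorted(m * math.cos((theta + 2.0 * math.pi * k) / 3.0) - a / 3.0
                      for k in range(3))
    S = -math.copysign(1.0, R) * (abs(R) + math.sqrt(D)) ** (1.0 / 3.0)
    T = Q / S if S != 0.0 else 0.0
    return [S + T - a / 3.0]
```

As a usage example, `solve_cubic(-6.0, 11.0, -6.0)` corresponds to (t − 1)(t − 2)(t − 3) = 0 and takes the three-root branch, while `solve_cubic(-2.0, 1.0, -2.0)` corresponds to (t − 2)(t² + 1) = 0 and takes the single-root branch.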
Because of these shortcomings, we have also implemented a solution based on the one-dimensional Newton method. This method is slower, but much more stable. Define a function using the cubic polynomial:

\[ f(t) = t^3 + a t^2 + b t + c. \tag{5.20} \]

Taking a closer look at (5.20), we find that there are only four basic cases that need to be taken care of. To figure out the case of the current ray, and an initial guess for the Newton search, we compute the discriminant of the (quadratic) derivative $f'(t) = 3t^2 + 2at + b$ of the cubic polynomial,

\[ D_{f'} = (2a)^2 - 4 \cdot 3 \cdot b = 4a^2 - 12b, \tag{5.21} \]

and its roots $t'_1 \leq t'_2$. The four cases are then as follows:

1. $D_{f'} < 0$: the derivative has no roots, and thus $f$ has no local minimum or maximum. Use any initial guess $t_0$. We have chosen to use the explicit formula to compute an initial guess in this case (the case where the cubic polynomial discriminant $D \geq 0$, see above).

2. $D_{f'} \geq 0$ and $f(t'_1) \cdot f(t'_2) \leq 0$: we have three real roots. Find one using a Newton search with initial guess $t_0 = (t'_1 + t'_2)/2$.

3. $D_{f'} \geq 0$ and $f(t'_1), f(t'_2) < 0$: we have one real root, located to the right of $t'_2$. Use initial guess $t_0 > t'_2$. We have chosen to use $t_0 = t'_2 + (t'_2 - t'_1)/2$.

4. $D_{f'} \geq 0$ and $f(t'_1), f(t'_2) > 0$: we have one real root, located to the left of $t'_1$. Use initial guess $t_0 < t'_1$. We have chosen to use $t_0 = t'_1 - (t'_2 - t'_1)/2$.

These cases are illustrated in Figure 5.4 (note that the figure does not illustrate the ray and the surface but rather the value of $f$ along the ray). Having an initial guess, we start the Newton search,

\[ t_{n+1} = t_n - \frac{f(t_n)}{f'(t_n)} = t_n - \frac{t_n^3 + a t_n^2 + b t_n + c}{3 t_n^2 + 2 a t_n + b}, \tag{5.22} \]

and repeat until some desired accuracy has been reached. In case 2, where we have three roots, we can use the just found root $t^*$ to find the other two by solving the second degree polynomial

\[ t^2 + (t^* + a) t + \left( (t^*)^2 + a t^* + b \right) = 0. \tag{5.23} \]

Figure 5.4: The different cases of the cubic polynomial (case 1, case 2, case 3, case 4).
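The case analysis and the Newton search (5.22) can be sketched as follows. This Python sketch is illustrative only: the names `newton_cubic_root` and `remaining_roots` are hypothetical, the case classification assumes the monic form (5.20) with the derivative roots ordered as t'_1 ≤ t'_2, and in case 1 it starts from the inflection point t = −a/3 rather than from the explicit formula (an assumption made to keep the example self-contained). The tolerance is applied to |f(t)|, and the search gives up if the derivative vanishes.

```python
import math


def newton_cubic_root(a, b, c, tol=1e-9, max_iter=50):
    """Find one real root of t^3 + a t^2 + b t + c by a 1D Newton search.
    Returns None if the search does not converge."""
    f = lambda t: ((t + a) * t + b) * t + c
    df = lambda t: (3.0 * t + 2.0 * a) * t + b
    disc = 4.0 * a * a - 12.0 * b          # discriminant of f'
    if disc < 0.0:
        t = -a / 3.0                       # case 1 (assumed start: inflection point)
    else:
        s = math.sqrt(disc)
        t1, t2 = (-2.0 * a - s) / 6.0, (-2.0 * a + s) / 6.0   # t1 <= t2
        f1, f2 = f(t1), f(t2)
        if f1 * f2 <= 0.0:
            t = 0.5 * (t1 + t2)            # case 2: three real roots
        elif f1 < 0.0:
            t = t2 + 0.5 * (t2 - t1)       # both extrema below zero: root right of t2
        else:
            t = t1 - 0.5 * (t2 - t1)       # both extrema above zero: root left of t1
    for _ in range(max_iter):
        ft = f(t)
        if abs(ft) < tol:
            return t
        slope = df(t)
        if slope == 0.0:
            return None
        t -= ft / slope                    # Newton update (5.22)
    return None


def remaining_roots(a, b, t_star):
    """Real roots of the deflated quadratic (5.23), given one root t_star."""
    p = t_star + a
    q = t_star * t_star + a * t_star + b
    d = p * p - 4.0 * q
    if d < 0.0:
        return []
    s = math.sqrt(d)
    return [(-p - s) / 2.0, (-p + s) / 2.0]
```

For example, with a = −6, b = 5, c = 0 (roots 0, 1 and 5) the search starts at the midpoint t = 2 and converges to the middle root t = 1, after which `remaining_roots` recovers the other two from (5.23).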
Having a root $t^*$, we quickly find the ray-surface intersection point either by inserting $t^*$ into the parametric form of the original (untransformed) ray, or by computing the point on the surface using the barycentric coordinates $\mathbf{u} = (o_x + t^* d_x,\; o_y + t^* d_y,\; 1 - (o_x + t^* d_x + o_y + t^* d_y))$.

Chapter 6 Implementation

Now that the algorithms have been made clear, we will discuss their actual implementation on the GPU. All GPU specific code was written in Cg and run through the Cg API of OpenGL. Cg (C for graphics) is a programming language developed by NVIDIA to give developers access to GPU hardware on a higher level than pure assembler [15]. We will focus mainly on the implementation of the rectangular case, since our current implementation of nonparametric triangular Bézier surfaces is very much a simplified version of the former, with no traversal and of course a different intersection kernel.

6.1 GPU ray tracing

Traditionally, GPUs are used to render large numbers of triangles fast, also handling transformations and decorations that we may want to apply to the triangles. By specializing in this, the GPU can be extremely efficient for drawing the type of three-dimensional graphics used in, for example, modern games, where all models are drawn onto the screen triangle by triangle. For ray tracing purposes, we cannot fully take advantage of this functionality since the algorithm is inherently different. Instead of drawing all triangles onto the screen one by one, we need to work on each screen pixel one by one, tracing a ray through each one. As mentioned in Chapter 3, the decoration step (fragment shader) of the GPU is programmable, and this is what we will use to get the GPU ray tracing. Instead of using the fragment shader to decorate a triangle with bitmap textures and light, we use it to run the ray tracing algorithm.
This is accomplished by programming a set of fragment shader programs (kernels), each one specialized to handle a certain step of the ray tracing algorithm. To get the GPU to execute a certain step of the ray tracing algorithm, we simply enable the kernel corresponding to the step, and then draw two triangles which entirely cover the screen (see Figure 6.2). When rasterizing the triangles, the GPU will then execute the kernel on each pixel.

6.2 Data

Each kernel needs to be able to read and write data in order to communicate with other kernels. All data is put into textures; for example, a point in three-dimensional space can be stored in the red, green and blue components of a texture. Before describing the details of the program, we will briefly discuss how to handle the input (and output) of data to a kernel. There are basically two ways of accessing the data: directly or indirectly.

When applying textures to triangles for decoration, we supply texture coordinates to each triangle vertex. The GPU then interpolates these coordinates over the triangle to fetch the correct value of the texture for each generated pixel in the triangle (Figure 6.1). This is the facility we will use to simulate a streaming data model. We call this direct data access, since the GPU always supplies the current kernel with the address to the value in the texture. When drawing the two screen covering triangles, we set texture coordinates in each corner of the triangles to map the texture perfectly to the screen (Figure 6.2). Having textures of the same size as the screen, every element of the texture will be addressed by exactly one pixel.

Not all kinds of data can be used in this way. For example, when laying out the bounding volume hierarchy data in a texture, there is no particular relation between any part of this data and a certain pixel in the screen space.
Instead, any piece of this data may be needed for the calculation of any pixel value (for example, we never know beforehand which parts of the hierarchy a certain ray will traverse). Thus, we need to compute the address of this data ourselves, and we call this indirect data access. Often, we compute this address using data from another texture (Figure 6.3).

In our implementation, we use six data sets. We use two screen sized sets to store ray origin and direction. Screen sized textures give us the possibility to use direct access and give each pixel the corresponding element of the texture for storage (note that a texture can be either read from or written to during one kernel execution, not both). Additionally, we use two indirect data sets to store the bounding box hierarchy and Bézier control point data. Lastly, we also need two screen sized (direct) data sets holding current traversal data (for example, current node identifier) and the nearest found intersection.

Figure 6.1: Classic texture mapping using a bitmap texture.

Figure 6.2: We execute a kernel on every element of a data set (texture) by drawing two triangles just large enough to get a one-to-one mapping between the generated pixels in the screen space and the data set. In this example we use a screen size of 8 × 8 pixels. The triangles we draw (a) are rasterized when drawn in screen space (b). By enabling a certain kernel before drawing the triangles, we force the GPU to execute this kernel on every pixel generated. During this execution, we access the values of screen sized textures (c) at corresponding coordinates.

Figure 6.3: Indirect addressing using streams.

6.3 Program

The program is implemented using multipass rendering consisting of eight kernels, and the program flow is illustrated in Figure 6.4.
It should be mentioned that the program currently only handles primary rays, meaning that shadow, reflection and refraction effects have not been implemented. The bounding volume hierarchy and the Bézier control point data are set up as texture data before starting the program and then input to kernels when needed. In the bounding volume hierarchy data set, we add information about surface id and initial guess to each leaf node, supplying us with the required information to start a Newton search. Following is a brief description of each kernel:

Generate primary rays. This is the simplest of the kernels. It generates two textures, one with the origins and one with the normalized directions of the rays through every pixel. We need no input texture, but simply generate the two output textures by setting the origin and direction values in each corner of the texture and letting the GPU interpolate these values.

Initialize traversal. During the program execution we need to keep track of each ray’s state, that is, whether it is traversing the bounding volume tree (and if so its current position in the tree), waiting for an intersection test or has been deactivated (done traversing). We also need to hold information about the currently nearest intersection that has been found. For this we use two data sets, both of which we initialize in this kernel.

Check state. In this kernel we decide whether to continue traversing or whether to perform an intersection test. This decision is based on how many rays are ready for an intersection test, that is, rays that have reached a leaf node. The number of rays in a certain state is counted using occlusion queries [29]. A kernel can always discard a value if it needs to, resulting in no value being written to the output stream (leaving previous data intact). The check state kernel discards values if the corresponding ray is in waiting state or deactivated.
Occlusion queries are used to count the number of values which were not discarded by the kernel, and we can thus determine whether to continue traversing or not.

Traverse. This is the bounding volume hierarchy traversal kernel. Assuming a ray is in traversal state, we traverse the tree based on which node we came from, and based on intersection tests between the ray and child node bounding volumes (which are also performed by this kernel). If a ray reaches the root node after having traversed both of its child trees (if needed), it is deactivated.

Intersection. When enough rays (see next section) are in a waiting state we stop traversing, and instead perform an intersection test. Using the surface id and initial guess from the current leaf node, we start a Newton search. In this kernel we also make an occlusion query. If all values were discarded, then all rays have been deactivated, we are done and can continue to the shading step.

Continue. This step consists of two kernels, the first used to update the nearest intersection data set, and the other to update the state data set. This could be combined into one kernel, but was kept separate to simplify implementation.

Shade. Finally, when all rays are deactivated, this kernel takes care of shading using the nearest intersection data set.

Figure 6.4: Program flow.

6.4 Optimizations

A naive implementation can prove to be very inefficient, especially if we do not take into consideration the parallel nature of the GPU. In this section, we explain a few techniques we have used to speed up the rendering of a scene. Probably the single most important optimization is tiling of the screen space. The number of needed kernel executions varies between different parts of the screen space. After each iteration, we will have larger inactive areas of the screen space.
Executing each kernel on the entire data set thus means a lot of unnecessary work. Since we do have screen coherency to some extent, we tile the screen space. From the start we maintain a list of active tiles, that is, tiles with at least one active pixel (Figure 6.5), for which we execute the current kernel. We can speed the program up by performing some load balancing [37]. Executing a kernel on a tile with almost no active pixels means a lot of redundant work. Therefore, we do not wait for all pixels to be in waiting or inactive state before switching kernel. In the current implementation we use the following rules for the traversal/check state loop to perform load balancing:

1. A tile is considered to be in waiting state if one of the following holds:
• the number of active pixels in the tile is larger than a threshold P_active and the number of pixels in waiting state represents at least P_waiting % of the number of active pixels.
• all active pixels in the tile are in waiting state.

2. We switch to the intersection kernel if one of the following holds:
• the number of active tiles is larger than a threshold T_active and the number of tiles in waiting state represents at least T_waiting % of the number of active tiles.
• all active tiles are in waiting state.

For our tests, we have used a screen size of 512 × 512, a tile size of 16 × 16 pixels and the following values for the rules:

P_active = 30, P_waiting = 80%, T_active = 40, T_waiting = 35%.

These values have been roughly determined through experimentation to give good results for our test scenes. Small variations to the values do not affect the results significantly.

Figure 6.5: Tiling of the screen space and list of active tiles.

Chapter 7 Results

In this chapter we will present and discuss the results of the Bézier surface ray tracer that was implemented on GPU, as well as give a few suggestions on how to extend the work.
7.1 Results

The implementation of the algorithms presented in this work has been tested on a few scenes, illustrated in Figure 7.1. All tests were run on an NVIDIA GeForce 6800 with a screen size of 512 × 512 pixels and a tile size of 16 × 16 pixels. The scenes were tested with two different bounding box tree depths to investigate how this affects the rendering time. The data and results for the scenes are given in Tables 7.1, 7.2 and 7.3. The data shows how many times the traversal and intersection kernels were bound (loaded into memory) and how many times they were executed on a tile. We also tested the performance of the kernels, and the results are given in Table 7.4. The main work is done by the traversal and intersection kernels, which have the highest instruction counts. Switching kernels does not seem to have any significant effect on the overall speed and, as we can see, neither does the number of bounding boxes. We need more tests to determine the effect of the tree depth, however. The results show that our GPU implementation is at least comparable with the rendering speeds achieved in [17, 1] with CPU implementations. We will not make a qualitative comparison with those CPU implementations, since it is hard to make a fair one.

In Figures 7.2 and 7.3 we show two of the problems that can arise when using Newton’s method. The test scenes for nonparametric triangular patches are shown in Figure 7.4. These are simple scenes consisting of only four patches and using no spatial partitioning; thus every ray is tested against all surface patches. Both scenes reached a speed of 17 fps using the numerical method to find intersections.

Table 7.1: Results from teapot scene tests.
Scene | teapot (1) | teapot (2)
Patches | 32 | 32
Bounding boxes | 12423 | 3111
Tree depth | 13 | 12
Traversal binds | 433 | 233
Traversal executions | 59391 | 44555
Intersection binds | 29 | 27
Intersection executions | 6253 | 5751
Frames per second | 1.00 | 1.18
Table 7.2: Results from teacup scene tests.
Scene | teacup (1) | teacup (2)
Patches | 26 | 26
Bounding boxes | 8571 | 2671
Tree depth | 13 | 12
Traversal binds | 374 | 311
Traversal executions | 74125 | 57802
Intersection binds | 25 | 27
Intersection executions | 6984 | 7169
Frames per second | 0.89 | 1.01

Table 7.3: Results from teaspoon scene tests.
Scene | teaspoon (1) | teaspoon (2)
Patches | 16 | 16
Bounding boxes | 3433 | 1245
Tree depth | 12 | 10
Traversal binds | 581 | 388
Traversal executions | 25638 | 20037
Intersection binds | 31 | 28
Intersection executions | 2764 | 2631
Frames per second | 1.89 | 2.22

Figure 7.1: Test scenes for rectangular Bézier ray tracing. Screen is 512 × 512 pixels.

Table 7.4: Results from running NVShaderPerf on the kernels. NVShaderPerf failed to report the pixel throughput of the intersection kernel.
Kernel | Instruction count | Pixel throughput (GP/s)
Generate primary rays | 4 | 0.975
Initialize traversal | 21 | 0.300
Check state | 3 | 1.950
Traverse | 157 | 0.061
Intersection | 341 | −
Continue (state) | 21 | 0.433
Continue (intersection) | 10 | 0.780
Shade | 149 | 0.031

Figure 7.2: Problem area on the knob of the teapot. Several control points coincide and we run into numerical problems when computing partial derivatives.

Figure 7.3: Too few bounding boxes have been created for the surface patches, resulting in bad initial guesses. We therefore find the wrong intersection or none at all.

Figure 7.4: Test scenes for nonparametric triangular Bézier ray tracing. Screen is 512 × 512 pixels.

7.2 Conclusion

We have shown in this work that it is possible to implement ray tracing of Bézier surfaces on a GPU using Newton’s method, and we have introduced bounding box hierarchy traversal on GPU. The results show that our GPU implementation is slower but still performs well compared to previous similar implementations on CPUs. We have used load balancing and suggested and implemented the use of tiling to speed up execution.
Furthermore, we have shown how to ray trace nonparametric Bézier triangles. We have proposed two methods: a direct method using the cubic formula to find the intersection analytically, and a numerical method based on Newton’s method to handle an environment with limited floating point precision, which is what we are presented with when working with GPUs. The numerical method has been implemented and works quite well on our simple test scenes.

7.3 Future work

We have successfully implemented the basic parts of a Bézier surface ray tracer on GPU, but there is still a lot of work that can be done. A first step to extend the current ray tracer could be to add shadow rays and support for reflection and refraction. We should also put more work into minimizing the numerical problems that still exist and test the algorithm on larger and more complex scenes.

Another point that certainly deserves some attention is the use of early-z culling. This is a technique used by modern GPUs to avoid unnecessary execution of fragment programs, which we could utilize to speed up the execution. By making a look-up in the depth buffer, the GPU can discard pixels before execution of the fragment program. By setting up the depth buffer properly before executing an expensive kernel, for example by running a small and much less expensive kernel specialized for this task, we can avoid unnecessary work on inactive rays.

Further testing of nonparametric triangular surfaces would include the use of more complex scenes and spatial hierarchies. Also, if future GPUs offer higher floating point precision or if the current numerical problems can be solved, the analytical approach would yield faster execution. The algorithms in this work have all focused on cubic surfaces, but in some cases it may be sufficient to use quadratic surfaces. Quadratic surfaces are much more attractive when considering the numerical aspect and would also yield faster algorithms.
In [17], rays are not traced one by one, but rather several rays at a time in small packets. A similar approach should be possible to implement on a GPU, effectively reducing the total work of traversing the spatial hierarchy.

Bibliography

[1] Carsten Benthin, Ingo Wald, and Philipp Slusallek. Interactive ray tracing of free-form surfaces. In AFRIGRAPH '04: Proceedings of the 3rd international conference on Computer graphics, virtual reality, visualisation and interaction in Africa, pages 99–106, New York, NY, USA, 2004. ACM Press.
[2] Ian Buck. Taking the plunge into GPU computing. In Matt Pharr, editor, GPU Gems 2, pages 509–519. Addison-Wesley, March 2005.
[3] Swen Campagna and Philipp Slusallek. Improving Bézier clipping and Chebyshev boxing for ray tracing parametric surfaces, 1996.
[4] Swen Campagna, Philipp Slusallek, and Hans-Peter Seidel. Ray tracing of spline surfaces: Bézier clipping, Chebyshev boxing, and bounding volume hierarchy - a critical comparison with new results. The Visual Computer, 13(6):265–282, 1997.
[5] Nathan A. Carr, Jesse D. Hall, and John C. Hart. The ray engine. In HWWS '02: Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware, pages 37–46, Aire-la-Ville, Switzerland, 2002. Eurographics Association.
[6] M. Christen. Ray tracing on GPU. Diploma thesis, University of Applied Sciences Basel, Switzerland, 2005.
[7] J. L. D. Comba and J. Stolfi. Affine arithmetic and its applications to computer graphics. In Proc. VI Brazilian Symposium on Computer Graphics and Image Processing (SIBGRAPI'93), pages 9–18, 1993.
[8] Andreas Dietrich, Ingo Wald, Carsten Benthin, and Philipp Slusallek. The OpenRT Application Programming Interface – Towards A Common API for Interactive Ray Tracing. In Proceedings of the 2003 OpenSG Symposium, pages 23–31, Darmstadt, Germany, 2003. Eurographics Association.
[9] Dominique Michelucci, Sebti Foufou, Loic Lamarque, and David Menegaux.
Bernstein based arithmetic featuring de Casteljau. In Proceedings of the 17th Canadian Conference on Computational Geometry (CCCG'05), pages 215–218, 2005.
[10] Alexander Efremov. Efficient ray tracing of trimmed NURBS surfaces. Master's thesis, University of Saarland, 2004.
[11] Alexander Efremov, Vlastimil Havran, and Hans-Peter Seidel. Robust and numerically stable Bézier clipping method for ray tracing NURBS surfaces. In SCCG '05: Proceedings of the 21st spring conference on Computer graphics, pages 127–135, New York, NY, USA, 2005. ACM Press.
[12] Manfred Ernst, Christian Vogelgsang, and Günther Greiner. Stack implementation on programmable graphics hardware. In VMV, pages 255–262, 2004.
[13] Gerald Farin. Triangular Bernstein-Bézier patches. Computer Aided Geometric Design, 3(2):83–128, 1986.
[14] Gerald Farin. Curves and Surfaces for CAGD, a Practical Guide. Academic Press, San Diego, fourth edition, 1997.
[15] Randima Fernando and Mark J. Kilgard. The Cg Tutorial: The Definitive Guide to Programmable Real-Time Graphics. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 2003.
[16] Tim Foley and Jeremy Sugerman. Kd-tree acceleration structures for a GPU raytracer. In HWWS '05: Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware, pages 15–22, New York, NY, USA, 2005. ACM Press.
[17] Markus Geimer and Oliver Abert. Interactive ray tracing of trimmed bicubic Bézier surfaces without triangulation. In WSCG (Full Papers), pages 71–78, 2005.
[18] Pat Hanrahan. Ray tracing algebraic surfaces. In SIGGRAPH '83: Proceedings of the 10th annual conference on Computer graphics and interactive techniques, pages 83–90, New York, NY, USA, 1983. ACM Press.
[19] Mark Harris. Mapping computational concepts to GPUs. In Matt Pharr, editor, GPU Gems 2, pages 493–508. Addison-Wesley, March 2005.
[20] John C. Hart. Ray tracing implicit surfaces.
In SIGGRAPH 93 Modeling, Visualizing, and Animating Implicit Surfaces course notes, pages 13–1 to 13–15. 1993.
[21] Vlastimil Havran. Heuristic Ray Shooting Algorithms. PhD thesis, Department of Computer Science and Engineering, Faculty of Electrical Engineering, Czech Technical University in Prague, November 2000.
[22] Brian Hayes. A lucid interval. American Scientist, 91(6):484–488, November–December 2003.
[23] Michael T. Heath. Scientific Computing: An Introductory Survey. McGraw-Hill, second edition, 2002.
[24] Kenneth I. Joy and Murthy N. Bhetanabhotla. Ray tracing parametric surface patches utilizing numerical techniques and ray coherence. In SIGGRAPH '86: Proceedings of the 13th annual conference on Computer graphics and interactive techniques, pages 279–285, New York, NY, USA, 1986. ACM Press.
[25] A. Junior, L. de Figueiredo, and M. Gattas. Interval methods for raycasting implicit surfaces with affine arithmetic, 1999.
[26] James T. Kajiya. Ray tracing parametric patches. In SIGGRAPH '82: Proceedings of the 9th annual conference on Computer graphics and interactive techniques, pages 245–254, New York, NY, USA, 1982. ACM Press.
[27] D. Kalra and A. H. Barr. Guaranteed ray intersections with implicit surfaces. In SIGGRAPH '89: Proceedings of the 16th annual conference on Computer graphics and interactive techniques, pages 297–306, New York, NY, USA, 1989. ACM Press.
[28] Filip Karlsson and Carl Johan Ljungstedt. Ray tracing fully implemented on programmable graphics hardware. Master's thesis, Chalmers University of Technology, 2004.
[29] Emmett Kilgariff and Randima Fernando. The GeForce 6 series GPU architecture. In Matt Pharr, editor, GPU Gems 2, pages 471–491. Addison-Wesley, March 2005.
[30] Aaron Lefohn, Joe M. Kniss, and John D. Owens. Implementing efficient parallel data structures on GPUs. In Matt Pharr, editor, GPU Gems 2, pages 521–545. Addison-Wesley, March 2005.
[31] William Martin, Elaine Cohen, Russell Fish, and Peter Shirley.
Practical ray tracing of trimmed NURBS surfaces. J. Graph. Tools, 5(1):27–52, 2000.
[32] R. E. Moore and S. T. Jones. Safe starting regions for iterative methods. SIAM Journal on Numerical Analysis, 14(6):1051–1065, 1977.
[33] Tomoyuki Nishita, Thomas W. Sederberg, and Masanori Kakimoto. Ray tracing trimmed rational surface patches. In SIGGRAPH '90: Proceedings of the 17th annual conference on Computer graphics and interactive techniques, pages 337–345, New York, NY, USA, 1990. ACM Press.
[34] John Owens. Streaming architectures and technology trends. In Matt Pharr, editor, GPU Gems 2, pages 457–470. Addison-Wesley, March 2005.
[35] Les Piegl and Wayne Tiller. The NURBS Book. Springer-Verlag, second edition, 1997.
[36] W. H. Press, W. T. Vetterling, S. A. Teukolsky, and B. P. Flannery. Numerical Recipes in C++. Cambridge University Press, New York, second edition, 2002.
[37] Timothy J. Purcell, Ian Buck, William R. Mark, and Pat Hanrahan. Ray tracing on programmable graphics hardware. In SIGGRAPH '02: Proceedings of the 29th annual conference on Computer graphics and interactive techniques, pages 703–712, New York, NY, USA, 2002. ACM Press.
[38] Timothy John Purcell. Ray tracing on a stream processor, 2004.
[39] S. H. Martin Roth, Patrick Diezi, and Markus H. Gross. Raytracing triangular Bézier patches. Comput. Graph. Forum, 20(3), 2001.
[40] Thomas W. Sederberg and David C. Anderson. Ray tracing of Steiner patches. In SIGGRAPH '84: Proceedings of the 11th annual conference on Computer graphics and interactive techniques, pages 159–164, New York, NY, USA, 1984. ACM Press.
[41] Thomas W. Sederberg and Falai Chen. Implicitization using moving curves and surfaces. In SIGGRAPH '95: Proceedings of the 22nd annual conference on Computer graphics and interactive techniques, pages 301–308, New York, NY, USA, 1995. ACM Press.
[42] Thomas W. Sederberg, Jianmin Zheng, Kris Klimaszewski, and Tor Dokken.
Approximate implicitization using monoid curves and surfaces. Graph. Models Image Process., 61(4):177–198, 1999.
[43] John M. Snyder. Interval analysis for computer graphics. In SIGGRAPH '92: Proceedings of the 19th annual conference on Computer graphics and interactive techniques, pages 121–130, New York, NY, USA, 1992. ACM Press.
[44] Wolfgang Stürzlinger. Ray-tracing triangular trimmed free-form surfaces. IEEE Transactions on Visualization and Computer Graphics, 4(3):202–214, 1998.
[45] Daniel L. Toth. On ray tracing parametric surfaces. In SIGGRAPH '85: Proceedings of the 12th annual conference on Computer graphics and interactive techniques, pages 171–179, New York, NY, USA, 1985. ACM Press.
[46] Alex Vlachos, Jörg Peters, Chas Boyd, and Jason L. Mitchell. Curved PN triangles. In SI3D '01: Proceedings of the 2001 symposium on Interactive 3D graphics, pages 159–166, New York, NY, USA, 2001. ACM Press.
[47] Ingo Wald, Philipp Slusallek, Carsten Benthin, and Markus Wagner. Interactive rendering with coherent ray tracing. In A. Chalmers and T.-M. Rhyne, editors, EG 2001 Proceedings, volume 20(3), pages 153–164. Blackwell Publishing, 2001.
[48] Shyue-Wu Wang, Zen-Chung Shih, and Ruei-Chuan Chang. An efficient and stable ray tracing algorithm for parametric surfaces. J. Inf. Sci. Eng., 18(4):541–561, 2002.
[49] Eric W. Weisstein. Cubic formula. From MathWorld–A Wolfram Web Resource. http://mathworld.wolfram.com/CubicFormula.html (2005-11-13).
[50] Ch. Woodward. Ray tracing parametric surfaces by subdivision in viewing plane. pages 273–287, 1989.
[51] Sven Woop, Jörg Schmittler, and Philipp Slusallek. RPU: a programmable ray processing unit for realtime ray tracing. ACM Trans. Graph., 24(3):434–444, 2005.

LINKÖPING UNIVERSITY ELECTRONIC PRESS

Copyright

The publishers will keep this document online on the Internet - or its possible replacement - for a period of 25 years from the date of publication barring exceptional circumstances.
The online availability of the document implies a permanent permission for anyone to read, to download, to print out single copies for your own use and to use it unchanged for any non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional on the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility. According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement. For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its WWW home page: http://www.ep.liu.se/

Upphovsrätt (Copyright)

This document is kept available on the Internet - or its future replacement - for 25 years from the date of publication, provided that no exceptional circumstances arise. Access to the document implies permission for anyone to read, to download, to print out single copies for personal use and to use it unchanged for non-commercial research and for teaching. A later transfer of copyright cannot revoke this permission. All other use of the document requires the consent of the copyright owner. Technical and administrative measures are in place to guarantee authenticity, security and accessibility. The author's moral rights include the right to be named as author, to the extent required by good practice, when the document is used as described above, as well as protection against the document being altered or presented in a form or context that is offensive to the author's literary or artistic reputation or distinctive character.
For further information about Linköping University Electronic Press, see the publisher's home page: http://www.ep.liu.se/

© 2006, Joakim Löw
