Hardware Accelerated Multi-Resolution
Geometry Synthesis
Martin Bokeloh*
Michael Wand°
* WSI/GRIS, University of Tübingen ([email protected])
° Computer Graphics Laboratory, Stanford University ([email protected])
Abstract
In this paper, we propose a new technique for hardware accelerated multi-resolution geometry synthesis. The level of detail for a
given viewpoint is created on-the-fly, allowing for an almost
unlimited model resolution in rendering without excessive memory usage. The models consist of regularly sampled rectangular
patches that are subdivided hierarchically by a programmable
shader in order to create different levels of resolution. The approach is inherently parallel and lends itself to an implementation
on vector processor-like parallel architectures. We demonstrate
this property by an implementation on programmable graphics
hardware. This implementation shows a substantial performance
benefit over a CPU-based implementation of up to more than an
order of magnitude. We apply the framework to rendering of
smooth surfaces and to rendering of complexly structured fractal
landscapes using a novel multi-channel fractal subdivision technique. Due to the hardware acceleration, it is possible to perform
interactive editing and walkthroughs of such scenes in real-time.
Categories and Subject Descriptors: I.3.3 [Computer Graphics]:
Picture / Image Generation – Display Algorithms; I.3.6 [Computer
Graphics]: Methodology and Techniques – Graphics data structures and data types.
Keywords: multi-resolution modeling, games and GPUs, graphics hardware, large data sets
1 INTRODUCTION
In the last 10 years, computer graphics has experienced a dramatic
increase in performance of rendering hardware. Contemporary
graphics coprocessors (GPUs) are capable of processing several
hundred million primitives per second, allowing for highly complex geometry to be displayed in real-time. Given the capabilities
for sophisticated rendering of complex content, additional attention has to be paid to the problem of modeling complex scenes
and the coupling of the modeling and the rendering process.
In terms of effort for a human modeler, it is virtually impossible to create complex geometries by editing on a per-primitive
basis. Consequently, procedural modeling techniques are frequently used to create detailed 3d models. Such techniques allow
(generally speaking) the control of a more complex geometry by
only a few parameters to a modeling algorithm. This property is
usually called data amplification in computer graphics literature.
© ACM, 2006. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version will be published in the Proceedings of the Symposium on Interactive 3D Graphics and Games 2006.
Procedural techniques range from spline surfaces to complex
fractal models, which provide a varying degree of data amplification. An important advantage of procedural modeling techniques
is memory efficiency: By storing only the parameters for the
procedural model instead of the generated set of geometric primitives, memory requirements can be drastically reduced. Rendering
primitives (triangles, micro-polygons, ray sample points) are
generated on-the-fly, during rendering. A further benefit is level
of detail control: For many procedural rendering techniques, the
number of primitives being generated for rendering can be easily
adapted to the current requirements (such as the viewpoint), resulting in a significant reduction of rendering time.
Although commonly used in offline rendering (see e.g.
[Cook et al. 1987]), procedural generation of geometry is only rarely
used in interactive graphics. Most often, triangle meshes are
precomputed and transferred to the graphics board for rendering.
Only surface shading is commonly performed by procedural
techniques (and in hardware), as this is directly supported by the
architecture of current GPUs. In cases where a more compact
procedural description of geometry is available, this causes avoidable storage and bandwidth problems. Ideally, the evaluation of
the procedural model should be performed on-the-fly, at rendering
time. For current PC hardware, this means that geometry synthesis
should be performed by the GPU, avoiding the bandwidth and
processing bottlenecks of the main CPU. For other architectures
(such as the upcoming multi-core game console architectures), a
similar processing model, enabling the usage of several computational hardware units in parallel, is also desirable.
In this paper, we propose a new approach for hardware accelerated geometry synthesis. It employs a restricted quadtree subdivision of rectangular, regularly sampled patches, corresponding to
different levels-of-detail of an object. Higher resolution patches
are created by subdividing lower resolution patches into four; new
points are functions of fixed neighborhoods of the corresponding
lower resolution points. Multiple attribute channels are employed
to represent additional information to guide the subdivision process. The subdivision routine, which accounts for most of the
computational demands of the algorithm, can be implemented
using a single instruction stream on large amounts of data in
parallel so that it can be executed very efficiently by a vector-processor style parallel architecture. We demonstrate the performance benefits of this approach by implementing the algorithm
using the pixel shaders of current GPUs, resulting in a speedup of
up to more than an order of magnitude in comparison to a CPU
implementation. The proposed framework is very flexible, allowing for applications ranging from simple smooth surfaces to complex landscape models.
The proposed technique combines several well-known algorithmic building blocks. Our main contribution is a composite
modeling architecture that can be implemented efficiently on
parallel graphics hardware and is still flexible enough to create
complexly structured models. Efficient execution on parallel
hardware is achieved by the usage of a fixed subdivision kernel
for all data points. However, a problem with this approach is the
stationarity of the subdivision rule, leading to models where different parts have similar geometric characteristics. The main idea
to overcome these limitations is the usage of multiple attributes
per data point. The additional attribute channels store meta information (such as surface roughness or vegetation density) that controls the subdivision process and is itself altered by
higher level subdivision steps. This results in more flexibility and
variability in the synthesized model. We apply this modeling
approach to the synthesis of complexly structured fractal landscapes. Due to the hardware acceleration and the multi-resolution
approach, modeling and interactive editing of such scenes can be
performed in real-time while maintaining a high model and image
quality.
2 RELATED WORK
Our system is based on several techniques from literature, such as
deterministic and stochastic subdivision for geometry synthesis
and restricted quadtree triangulations for level of detail control. In
this section, we discuss the relation to literature in these areas as
well as to recent GPU-based geometry synthesis techniques.
Modeling by subdivision: Many procedural modeling techniques can be expressed as subdivision algorithms. Spline surfaces can for example be rendered by a repeated application of the
de Casteljau algorithm [Bartels et al. 1987]. Subdivision surfaces
[Catmull and Clark 1978, Doo 1978] generalize modeling of
smooth surfaces to meshes of general topology (see e.g. [Zorin et
al. 2000] for a survey).
Stochastic subdivision / fractal modeling: Subdivision techniques can also be used to create irregular, non-smooth surfaces.
Such surfaces can be characterized as random noise with a certain
frequency spectrum (often proportional to 1/f^h for some fixed h)
[Musgrave 1993]. A subdivision algorithm takes a regularly sampled noise signal, upsamples it to a higher sampling rate and adds
additional high-frequency noise that has not been represented by
the lower resolution version. This scheme has been first introduced by [Fournier et al. 1982] and extended by several authors:
[Miller 1986] proposes a smooth interpolation scheme to avoid
discontinuity artifacts. A general analysis of stochastic subdivision of scalar data arrays is given by [Lewis 1987]. Noise properties are modeled by 2nd-order statistics (mean, variance,
autocorrelation). It is shown how different noise characteristics
(such as different roughness or anisotropy, e.g. to create ocean
waves) can be translated into subdivision rules of fixed neighborhoods. Our approach can handle subdivision rules that create high
resolution points as a function of a fixed neighborhood of the
original data (with performance depending on the neighborhood
size). This demand is met by all aforementioned subdivision
schemes.
A non subdivision-based technique is described by [Perlin
1985]: Noise functions of several input attributes are used to
create complexly structured textures, allowing a pixel-parallel
evaluation. We apply a similar idea to describe the subdivision
function. Rendering of fractal landscapes with dynamic level of
detail is also provided by commercial software packages such as
MojoWorld [Pandromeda 2005] or Terragen [Planetside 2005].
These packages offer a high image quality; however, generating
such images takes at least several minutes.
Multi-resolution modeling: The classic approach for level of
detail control is the construction of a triangle hierarchy that allows
a refinement or coarsening of the model by local triangle insertions and deletions (see e.g. [Luebke et al. 2003] for a survey). This
hierarchy can allow general triangle meshes [Hoppe 1996] or
restricted classes of meshes, such as subdivision connectivity
meshes [Lindstrom et al. 1996]. Many techniques have been
described that target especially at the case of terrain visualization
(see e.g. [Duchaineau et al. 1997, Pajarola 1998, Röttger et al.
1998, Lindstrom and Pascucci 2001]), mostly being based on
restricted triangle hierarchies. Recent level of detail techniques
mostly operate in a batch-oriented fashion, employing hierarchies with several
thousand triangles per hierarchy node to optimize the throughput
to the GPU [Cignoni et al. 2003, Larsen and Christensen 2003,
Balázs et al. 2004]. The technique of [Losasso and Hoppe 2004] is
especially optimized for streaming data to the GPU. Arguing that
geometry setup and transfer is typically more often a limiting
factor than vertex processing by the GPU, their technique does not
perform feature dependent mesh optimization but uploads regular
grids of different mip-map levels to the GPU.
Our technique uses a subdivision connectivity hierarchy (restricted quadtree) of regularly sampled patches, similar to [Larsen
and Christensen 2003]. The regular sampling is needed to facilitate the subdivision modeling process. The geometry is rendered
batchwise, directly from graphics memory (where it has been
created). Following the arguments of [Losasso and Hoppe 2004],
we think that the benefits of the regular structure, which guarantees a good utilization of the rendering pipeline, outweigh the
losses due to reduced adaptivity of the locally uniform mesh.
Currently, we use a single rectangular patch to parameterize and
sample the data, which excludes general base meshes as
topology (a subject of future work).
In addition to mesh simplification-based level of detail techniques, there are also point-based level of detail techniques that
are favorable for objects of complex mesh topology. Applications
to landscape rendering have been demonstrated for example by
[Stamminger and Drettakis 2001] or [Wand et al. 2001].
Parallel / GPU-based geometry synthesis: The desire for
hardware accelerated geometry synthesis and rendering is not
new: For example, [Max 1981] describes a raytracer for procedural
terrain models implemented on a Cray-1
vector computer. [Perlin and Hoffert 1989] employ a massively
parallel raytracer for efficient rendering of procedurally defined
noise volumes, coined “Hypertextures”. Recently, several papers
have been published that deal with geometry synthesis on contemporary GPUs. [Dachsbacher and Stamminger 2004] propose a
multi-resolution rendering technique based on image warping:
The geometry of a terrain is encoded in a regularly sampled patch.
This patch is then upsampled non-uniformly to a higher resolution, spending more space in “important” regions (according to
camera distance, orientation, view frustum). Additionally, fractal
noise is added to the geometry to increase the level of detail. This
technique is conceptually elegant but aims at a different application than ours. As a general modeling primitive, the application of fractal noise in distorted space is probably
more difficult to control than regular hierarchical subdivision.
[Shiue et al. 2003] propose an extension to current GPU shader
APIs to support general mutation and subdivision operations.
[Guthe et al. 2005] describe an approach for rendering trimmed
NURBS and T-spline surfaces on graphics hardware using a bit-counting scheme for efficient, hardware-based evaluation of
trimming curves. They report a drastic performance boost due to
the GPU implementation. [Bolz and Schröder 2003] describe a
GPU-based algorithm to evaluate subdivision surfaces using
precomputed tables reflecting the mesh topology. A refined technique is presented by [Shiue et al. 2005] using spiral enumeration
of vertices. In contrast to our proposal, these techniques support
general topologies of base meshes but do not provide an intrapatch multi-resolution scheme, thus not being applicable to rendering of extended objects such as landscapes. The same argument also applies to the method of [Boubekeur and Schlick 2005],
who propose mesh refinement in the vertex shader. Recently, a
fully procedural rendering hardware has been proposed by [Whitted and Kajiya 2005] that executes procedures in hardware to
create point rendering primitives.
Figure 1: Layout of a single patch. The border points (2k to each side) are not shown.

Figure 2: Patches are subdivided by applying a subdivision function F to a k-neighborhood.
3 MULTI-RESOLUTION GEOMETRY SYNTHESIS
In this section, we describe our proposal for hardware accelerated
multi-resolution geometry synthesis. First, we define geometry
through subdivision, then we describe the hierarchical multi-resolution scheme and the GPU-based prototype implementation.
3.1 Modeling by Subdivision
In order to facilitate a hardware implementation, we describe
geometry as rectangular, regularly sampled patches. Each patch
describes a surface with disc topology; for more complex topologies, several patches have to be combined. Each sample point $x_{i,j}$
in a patch is an n-dimensional vector of attributes (Figure 1). The
initial patch is given by a w × h array that is enlarged by a border
of 2k sample points to each side:
Initial patch $P_0$:
$$x_{i,j}^{(0)} \in \mathbb{R}^n, \quad -2k \le i < w + 2k, \quad -2k \le j < h + 2k$$
The values for the initial patch are specified by the human modeler. k is the support of the subdivision function (see below). The
border is necessary to define a consistent subdivision function
(border issues are discussed in the following subsection). Subsequently, higher resolution versions of the initial patch are created
with the number of sample points doubling at each iteration:
Higher resolution patch $P_d$, $d > 0$:
$$x_{i,j}^{(d)} \in \mathbb{R}^n, \quad -2k \le i < w \cdot 2^d + 2k, \quad -2k \le j < h \cdot 2^d + 2k$$
Sometimes, it is useful to identify points in a patch by a unique
parameter coordinate $(i/2^d, j/2^d) \in [-2k, w + 2k] \times [-2k, h + 2k]$
rather than by the indices (i, j). The higher resolution patches are
created procedurally by applying a subdivision function F to data
from the previous level (Figure 2). The subdivision function
obtains a fixed $(2k+1)^2$ neighborhood of values from the previous level as well as the current point index and subdivision level
as input and creates a new point:
$$x_{i,j}^{(d)} = F\left(\begin{pmatrix} x_{i/2-k,\,j/2-k}^{(d-1)} & \cdots & x_{i/2+k,\,j/2-k}^{(d-1)} \\ \vdots & \ddots & \vdots \\ x_{i/2-k,\,j/2+k}^{(d-1)} & \cdots & x_{i/2+k,\,j/2+k}^{(d-1)} \end{pmatrix},\; d,\; i,\; j\right)$$
The function F is specified by the human modeler as a procedure.
There are no general restrictions to F other than being computed
in finite time. However, for an efficient implementation on vector
processors, the instruction stream for computing F must not depend on the values $x_{i,j}^{(d-1)}$, i, or j but only on d. This does not mean
that the computed value is independent of these quantities, only
the sequence of instructions doing the computation is restricted.
For more general architectures (such as DirectX 9 pixel shader 3.0
hardware [ATI 2005, nVidia 2005]), this restriction can be relaxed,
requiring only a spatially coherent rather than identical instruction
stream.
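As a concrete illustration, the following is a minimal CPU sketch of one subdivision step, assuming a single attribute channel (n = 1) and a toy kernel F (a box filter plus a level-dependent offset); the kernel and all names are ours, not the paper's shader code. On the GPU, the body of the inner loop corresponds to one pixel shader invocation per output sample.

```cpp
#include <vector>

struct Patch {
    int w, h;                  // inner sample counts (without border)
    int k;                     // support of the subdivision function
    std::vector<float> x;      // (w + 4k) * (h + 4k) samples, border included
    float& at(int i, int j) {  // valid for i in [-2k, w + 2k), j likewise
        return x[(j + 2 * k) * (w + 4 * k) + (i + 2 * k)];
    }
};

static int fdiv2(int i) { return i >= 0 ? i / 2 : (i - 1) / 2; }  // floor(i/2)

// Hypothetical subdivision function F: reads a (2k+1)^2 neighborhood of
// the parent and returns one child sample. The instruction stream depends
// only on d, as required for SIMD execution.
static float F(Patch& p, int pi, int pj, int d) {
    float sum = 0.0f;
    int cnt = 0;
    for (int dj = -p.k; dj <= p.k; ++dj)
        for (int di = -p.k; di <= p.k; ++di) {
            sum += p.at(pi + di, pj + dj);
            ++cnt;
        }
    return sum / cnt + 0.01f / float(1 << d);  // toy rule, for illustration
}

// One subdivision step: doubles the sampling rate of 'parent' (level d-1).
Patch subdivide(Patch& parent, int d) {
    Patch c{parent.w * 2, parent.h * 2, parent.k, {}};
    c.x.assign(size_t(c.w + 4 * c.k) * (c.h + 4 * c.k), 0.0f);
    for (int j = -2 * c.k; j < c.h + 2 * c.k; ++j)
        for (int i = -2 * c.k; i < c.w + 2 * c.k; ++i)
            c.at(i, j) = F(parent, fdiv2(i), fdiv2(j), d);  // same kernel everywhere
    return c;
}
```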
Figure 3: Handling of borders: k neighbors
are needed to create a new point. Thus, a
border area of 2k points is needed.
The attribute vectors $x_{i,j}^{(d)}$ do not need to represent geometric quantities (such as a position in three space) but may describe arbitrary attributes. To create the actual geometry, a mapping function $R: \mathbb{R}^n \to \mathbb{R}^m$, $m \ge 3$, is applied. This function computes a geometric position in three space for each attribute vector, possibly along with other rendering parameters such as normals, colors, or texture coordinates. Rendering buffers are cached in memory; employing this extra mapping step avoids overhead during rendering, as the mapping is only performed once.
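For illustration, a possible mapping function R for a height-field channel layout could emit a position and a finite-difference normal per sample; this is a hypothetical example rather than the paper's shader, and the neighbor heights would be fetched like texture reads on the GPU.

```cpp
#include <cmath>

struct Vertex { float pos[3]; float n[3]; };  // example output of R (m = 6)

// u, v: parameter coordinates (i/2^d, j/2^d); hC: height at the sample;
// hL/hR/hD/hU: heights of the four axis neighbors; spacing: sample
// distance at level d. Executed once per sample and cached for rendering.
Vertex mapR(float u, float v, float hC,
            float hL, float hR, float hD, float hU, float spacing) {
    Vertex out = {{u, hC, v}, {0, 0, 0}};
    float nx = (hL - hR) / (2 * spacing);   // central differences
    float nz = (hD - hU) / (2 * spacing);
    float len = std::sqrt(nx * nx + 1 + nz * nz);
    out.n[0] = nx / len; out.n[1] = 1 / len; out.n[2] = nz / len;
    return out;
}
```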
Handling Borders
Please note that the subdivision procedure outlined above leads to shrinking patches: With each subdivision step, a border region of k sample points to each side is removed. However, the size of these regions in the original parameter domain shrinks by $1/2^d$. This means that, for d subdivision steps, a region of at most
$$\sum_{i=0}^{d} k \cdot 2^{-i} \;\le\; 2k$$
points, measured in parameter coordinates (i.e. sample spacing of
the original patch), is removed. This is the reason for choosing a
border size of ± 2k for the initial patch. It is guaranteed that the
“lost” area after an arbitrary number of subdivisions does not
exceed this boundary area (see Figure 3). Consequently, only
geometry at coordinates within [0, w] × [0, h] (parameter coordinates) is rendered. The border region is never shown; it only
affects the shape of the inner region indirectly, similar to boundary points of uniform B-splines [Bartels et al. 1987].
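For completeness, the bound above is just a partial geometric series, so the lost region never exceeds the 2k border for any number of subdivisions d:

$$\sum_{i=0}^{d} k \cdot 2^{-i} \;=\; k\left(2 - 2^{-d}\right) \;<\; 2k .$$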
This effect only occurs at boundaries. If we consider more
general topologies, where several patches are stitched together
along their boundaries to form a quad mesh of arbitrary topology,
we only need to provide boundary values at topological borders.
In other areas, the boundary values are taken from the adjacent
patch. A special case is a patch with cyclic boundary conditions,
referring to a topological torus. In this case, no boundary values
are needed. In general, the same is true for arbitrary manifold
meshes without (topological) boundaries. Currently, our implementation supports cyclic and border boundary conditions for a
single patch only; more general topologies are left to future work.
3.2 Multi-Resolution Hierarchy of Patches
Employing the subdivision process outlined above, the amount of
data to be processed is quadrupled at each subdivision step. If the
viewer is very close to the surface, demanding a high resolution for adequate rendering, the processing costs can easily become prohibitive. This problem can be alleviated by a multi-resolution approach: Instead of increasing the resolution for the whole patch at once, we divide each patch into four equally sized subpatches and apply the refinement step separately to each subpatch where necessary. This leads to a quadtree subdivision
scheme (Figure 4). Each node in the quadtree corresponds to a w × h array of sample points.

Figure 4: Subdividing patches. A border of 2k points is attached to each patch to allow the computation of near-border values.

Figure 5: A restricted quadtree (8-neighborhood) is used to make neighboring values available. Red: additional hierarchy levels, enforcing at most one level difference.

This approach causes some subtle
issues that have to be addressed by the subdivision algorithm:
Handling Inner Borders
The first problem is handling inner borders. In order to refine a sample point with index (i, j) in a patch, all its neighbors with indices [i – k … i + k] × [j – k … j + k] have to be known. This means that we must have computed the 8 direct neighbors of a patch to be able to compute the next level of refinement for this patch. In other words, the hierarchy must be a (well-known) restricted quadtree [de Berg et al. 1997]: Adjacent patches must not deviate by more than one level of resolution. In our case, adjacency is defined by the 8-neighborhood of a patch (Figure 5). Then, we can access the values of neighboring patches to obtain values at the borders.
This constraint can easily be enforced by on-demand computation: Whenever a patch has to be refined, all eight direct neighbors are retrieved by a sibling search algorithm. If the demanded patch (or one of its parents) does not yet exist, we call the creation procedure recursively. After some patch subdivisions, all necessary neighboring nodes have been built and the patch of the next higher resolution level can finally be created. This "balancing" step adds additional overhead to the multi-resolution scheme. However, this overhead is only O(1), which is easy to see by assigning "overhead" subdivisions to the neighbor that demanded them [de Berg et al. 1997]. In practice, the overhead factor is rather small: Overhead nodes only occur at the border of the view frustum (which is typically only a one-dimensional border in the parameter domain, affecting O(√n) of the n nodes). The resolution variation due to the distance to the camera is a smooth function, which usually already keeps the resolution differences between neighboring patches small by itself.
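The on-demand creation can be written compactly if patches are addressed by (level, i, j); the map-based lookup below is an illustrative stand-in for the actual sibling search, with the border assembly and GPU passes omitted.

```cpp
#include <map>
#include <tuple>

using Key = std::tuple<int, int, int>;        // (level, i, j)
struct PatchData { /* sample buffers, bounding box, ... */ };
std::map<Key, PatchData> patches;

// Creates the patch (level, i, j) on demand. Before subdividing, the
// parent and the parent's 8-neighborhood are forced into existence so
// that all border values are available -- exactly the restricted-quadtree
// balancing described above. Recursion terminates at level 0.
PatchData& ensurePatch(int level, int i, int j) {
    auto it = patches.find(Key{level, i, j});
    if (it != patches.end()) return it->second;
    if (level > 0) {
        int pi = i >> 1, pj = j >> 1, pn = 1 << (level - 1);
        for (int dy = -1; dy <= 1; ++dy)
            for (int dx = -1; dx <= 1; ++dx) {
                int ni = pi + dx, nj = pj + dy;   // parent and its neighbors,
                if (ni >= 0 && nj >= 0 && ni < pn && nj < pn)
                    ensurePatch(level - 1, ni, nj);  // clamped at the domain border
            }
    }
    // assemble the 2k border, run the subdivision and mapping shaders,
    // and store the result (Section 3.3); omitted in this sketch
    return patches[Key{level, i, j}];
}
```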
A second problem, also caused by the variation of resolution, is the triangulation of the surface: We would like to display a continuous, triangulated surface when rendering the patches. As we are already forced to build a restricted quadtree hierarchy for modeling, the solution is straightforward: Considering one node, the neighboring nodes can only differ by at most one level of resolution. Correspondingly, only a small number of triangulations can occur, which can easily be precomputed (similar to [Larsen and Christensen 2003]) and instantiated during rendering, as sketched below.
Multi-Resolution Rendering
During rendering, we traverse the quadtree top-down and stop the
descent if a node meets the precision requirements (if a node does
not exist, it is created, as outlined above). Different metrics can be
employed at this step. We currently use the following, rather
canonical rule: The decision whether to render a node is solely
based on the bounding box of its geometry (which has to be determined during or after geometry synthesis, see below). Nodes with
bounding boxes completely outside the view frustum are never
rendered. Nodes inside the view frustum are rendered (and the
descent is stopped) if their projected on-screen resolution exceeds a user-defined threshold.

Figure 6: Patch subdivision in hardware – first, the neighboring area is assembled into an enlarged patch; then the subdivision shader is employed.

The on-screen resolution is estimated by
dividing the side length of the largest side of the bounding box by
the number of points along one edge of the patch (we employ
square patches only, with w = h). This value is then projected onto
the screen by dividing by the minimum z-value of the bounding
box and scaling by a constant according to resolution and viewing
angle. A near-clipping plane is included in the view frustum to
avoid demanding infinite resolution (additionally, a fixed upper
limit can also be specified, if desired).
When all patches have been selected from the hierarchy, each
patch is rendered as a triangle mesh. The mesh is chosen from a
list of precomputed vertex indexing buffers by considering the
resolution of the neighboring patches.
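In outline, the traversal and the projected-resolution estimate described above read as follows; the frustum test, depth query, and drawing are stubbed, and all names and constants are ours rather than the paper's (screenScale would be derived from the viewport resolution and viewing angle).

```cpp
#include <algorithm>

struct BBox { float mn[3], mx[3]; };
struct Node { BBox box; Node* child[4] = {}; };

const int   PATCH_SIZE = 32;   // sample points per patch edge (w = h)
const float NEAR_PLANE = 0.1f;

bool  inFrustum(const BBox&)    { return true; }  // stub: plane tests
float minViewDepth(const BBox&) { return 1.0f; }  // stub: min z of the box
void  drawPatch(Node*)          {}                // stub: submit index buffer
void  ensureChildren(Node*)     {}                // stub: on-demand synthesis

// Estimated on-screen size of one sample step, in pixels.
float projectedSpacing(const BBox& b, float screenScale) {
    float ext = 0.0f;
    for (int a = 0; a < 3; ++a)
        ext = std::max(ext, b.mx[a] - b.mn[a]);       // largest box side
    float z = std::max(minViewDepth(b), NEAR_PLANE);  // avoid z -> 0
    return (ext / PATCH_SIZE) / z * screenScale;
}

// Top-down traversal: cull, draw if fine enough, otherwise refine.
void render(Node* n, float screenScale, float threshold) {
    if (!inFrustum(n->box)) return;
    if (projectedSpacing(n->box, screenScale) <= threshold) {
        drawPatch(n);
        return;
    }
    ensureChildren(n);                                // may synthesize patches
    for (Node* c : n->child)
        if (c) render(c, screenScale, threshold);
}
```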
3.3 Hardware Implementation
The algorithm involves two major tasks: Management of the
restricted hierarchy and processing of the points. Hierarchy management involves the traversal of irregular data structures which is
difficult to accelerate by special purpose hardware. Thus, this task
is done by the CPU. If the resulting CPU load is too high, we have
the option to increase the number of sample points per patch,
trading off the adaptivity of the multi-resolution representation for less hierarchy management workload for the CPU. Larger patch sizes lead to less accurate view frustum culling and some oversampling at parts of the patch farther away from the viewer. However, for typical patch sizes of about 32² – 64² triangles, such adverse effects are small, while the main computational burden is already placed on the hardware accelerated patch processing.
Hardware Subdivision
The first step for creating higher resolution patches is the assembly of the 2k-neighborhood: The original patch is copied into a
buffer enlarged by 2k sample values at each side. Then, the 8
neighboring patches are fetched from the hierarchy and the values
at the border to the current patch are copied to the border regions
of the larger buffer (Figure 6) using a BitBlit operation on the
graphics hardware.
The second step is the computation of the high resolution
data. First, four w × h sized destination buffers for the 4 children
are allocated. Then, the subdivision function F has to be evaluated. This step is usually the most expensive part of the algorithm, and
the main goal of our architecture was to allow for an efficient
hardware implementation at this point. This evaluation can be
implemented very efficiently on a vector processing architecture
(SIMD): The patch consists of several sample points that can all
be processed in parallel, using the same instruction stream. In our
implementation, we typically use 32² patches, corresponding to 1024 potentially parallelizable function evaluations.
The third step is the creation of rendering data by applying the
mapping function R. For this step, a new buffer (possibly with a
different number of attributes per point) of the same size as the
source patch (but omitting the border region) has to be allocated
first. Then, R is applied to each point of a patch independently and
the result is written into the output buffer. This process can be
executed on the same hardware as the subdivision process; the only difference is that no upsampling takes place.
The last step is the rendering step: A precomputed index
buffer of triangles is chosen and the data in the rendering buffer
provides the vertices of the mesh. Each vertex provides a position
in three space and possibly further shading attributes such as normals
and color. This data can be processed directly and very efficiently
by a contemporary programmable GPU.
The created patches, both subdivided and rendering data, are not deleted after rendering but kept in memory for future use. An LRU scheme is applied to track the reuse of these buffers. If memory fills up, the patches that have not been used for the longest time are deleted first to free memory, as in the sketch below.
GPU-based Implementation
We have implemented a prototype of our algorithm on a programmable GPU, using OpenGL and CG as API (see [ATI 2005,
nVidia 2005] for details on the programming capabilities mentioned below). We map the computationally intense steps of subdivision (F) and mapping (R) to the pixel shader of the GPU.
These units provide several parallel ALUs that can be used in a
SIMD programming model: Each pixel is computed independently, using the same instruction stream. Additionally, the
number of output pixels has to be specified in advance while the
amount of input data may vary, according to the shader program.
These conditions are met by our geometry synthesis technique.
Mapping of the algorithm to a programmable GPU is straightforward: Patches are represented as textures (if being used as
source) or render targets (if being used as destination). In order to
avoid switching of render targets, only one render target is created
and used as temporary buffer. The data is copied to a texture
associated with a patch directly after each computation via onboard memory transfer.
The attribute channels of the patches are implemented using
multiple render targets: On the latest hardware, each pixel shader
can read from up to 16 textures and output to up to 4 render targets, each providing up to four 32-bit floating-point channels. In this way, up to 16 floating-point attribute channels can be handled in one rendering pass. For more attribute channels, multiple rendering passes are necessary. The example scenes in this paper use 12 (landscapes) and 8 (subdivision surfaces) 32-bit floating-point channels, respectively.
In our implementation, initial data for patch P0 can be specified by importing data from data sources such as landscape elevation data or by interactive painting on the 3d-geometry. We allow
arbitrary amounts of initial data, main memory permitting. If the
initial data is larger than a patch (i.e. typically 32² plus border), a multi-resolution pyramid is built in main memory by subsampling (currently nearest-neighbor subsampling) the original data in
a quadtree of patches. This initial pyramid is handled in software
and patches are transferred to the graphics board on demand. If
the demanded rendering resolution exceeds that of the initial data,
the hardware accelerated geometry synthesis is invoked.
The subdivision function F and the mapping function R of the
geometry synthesis are represented as pixel shader programs. The
latest shader standard (DirectX 9, shader model 3.0, [ATI 2005,
nVidia 2005]) even allows data dependent branching in the pixel
shader, extending the strict SIMD model. The achieved performance depends on the coherency of the instruction streams for
neighboring data. In our example scenes, we do not use data
dependent branching but only conditional writes that do not alter
the instruction stream, which has turned out to be sufficient for
our models.
Lastly, a further vertex/pixel shader pair is used for final rendering of the resulting triangle meshes. The render buffers are
created by copying the content of the render target directly to a
vertex buffer, which is supported by current OpenGL vendor
extensions. Copying to a vertex buffer is very efficient on current
hardware. An alternative would be the usage of texture fetches in
the rendering vertex shader. This method has the advantage of
easily allowing for interpolation between adjacent subdivision
levels to avoid popping artifacts, which is not included in our
current implementation based on copying buffers.
Our GPU-based implementation processes all geometry data
on the GPU only, with one exception: In order to control the
multi-resolution rendering, the bounding boxes of the synthesized
geometry have to be known to the CPU. Thus, the position channel of the rendering data has to be examined and the minimum and
maximum x, y and z coordinates must be determined. This is done
in two steps: First, we reduce the amount of data to be transferred
by scaling down the patches [Buck and Purcell 2004]: We use a
pixel shader that computes the minimum and maximum values of
4 × 4 neighborhoods and outputs them to an eightfold reduced
patch of data. This process can be repeated iteratively. In our
experiments, one such reduction pass was sufficient; a second
pass did not lead to a further reduction of the overall computation
time. After reduction, the resulting data is read back to main
memory and the bounding box is computed by the host CPU, now
requiring only little transfer bandwidth. Up to 16 read back operations are performed in one batch from the same reduction buffer
to reduce synchronization overhead.
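A CPU reference of this 4 × 4 min/max reduction pass might look as follows; on the GPU it is a pixel shader over the position channel, and the types and names here are ours.

```cpp
#include <algorithm>
#include <vector>

struct Vec3 { float x, y, z; };
struct MinMax { Vec3 mn, mx; };

// Reduces a w x h grid of positions to a (w/4) x (h/4) grid of per-block
// minima and maxima (w, h divisible by 4). The host reads this small
// buffer back and folds it into the final bounding box.
std::vector<MinMax> reduce4x4(const std::vector<Vec3>& in, int w, int h) {
    int rw = w / 4, rh = h / 4;
    std::vector<MinMax> out(rw * rh);
    for (int J = 0; J < rh; ++J)
        for (int I = 0; I < rw; ++I) {
            MinMax m{{ 1e30f,  1e30f,  1e30f},
                     {-1e30f, -1e30f, -1e30f}};
            for (int j = 0; j < 4; ++j)
                for (int i = 0; i < 4; ++i) {
                    const Vec3& p = in[(J * 4 + j) * w + (I * 4 + i)];
                    m.mn.x = std::min(m.mn.x, p.x);
                    m.mn.y = std::min(m.mn.y, p.y);
                    m.mn.z = std::min(m.mn.z, p.z);
                    m.mx.x = std::max(m.mx.x, p.x);
                    m.mx.y = std::max(m.mx.y, p.y);
                    m.mx.z = std::max(m.mx.z, p.z);
                }
            out[J * rw + I] = m;
        }
    return out;
}
```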
4 MODELING
We have implemented two different modeling techniques to examine the practical applicability of our proposal:
Smooth surfaces: To model smooth surfaces, we first need a
parameterization of the surface as a planar patch. Then, well-known techniques such as subdivision surfaces or spline subdivision can be employed (a sketch of the refinement masks follows below). As an example, we have implemented the bicubic B-spline subdivision model of the Stanford bunny described in [Losasso et al. 2003]: The authors create a geometry
image of the bunny geometry and compute vertices for a least-squares B-spline subdivision surface approximation. We have used
the data from this paper (which is available on the web) and reimplemented the subdivision process. In addition to the original
paper, our implementation provides adaptive multi-resolution
modeling and rendering, allowing for close-ups of objects without
loss of detail or serious penalties to the rendering performance.
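For reference, here is a minimal version of one bicubic B-spline refinement step, using the standard uniform knot-insertion masks applied per attribute channel (e.g., separately to the x, y, and z control point coordinates); this is our illustration, not the paper's shader, and the interior-only output mirrors the shrinking-border scheme of Section 3.1.

```cpp
#include <vector>

using Grid = std::vector<std::vector<float>>;  // one attribute channel

// One uniform cubic B-spline refinement step along one dimension:
// even ("vertex") points use the 1/8 (1 6 1) mask, odd ("edge") points
// the 1/2 (1 1) mask. Only interior points are emitted, so the patch
// shrinks at its border. Assumes at least 4 control points per row.
static std::vector<float> refine1D(const std::vector<float>& p) {
    size_t n = p.size();
    std::vector<float> q(2 * n - 3);
    for (size_t i = 0; i + 1 < n; ++i)      // edge points
        q[2 * i] = 0.5f * (p[i] + p[i + 1]);
    for (size_t i = 1; i + 1 < n; ++i)      // vertex points
        q[2 * i - 1] = (p[i - 1] + 6.0f * p[i] + p[i + 1]) / 8.0f;
    return q;
}

// Tensor-product surface refinement: rows first, then columns.
Grid refine2D(const Grid& g) {
    Grid rows;
    for (const auto& r : g) rows.push_back(refine1D(r));
    size_t w = rows[0].size();
    Grid out(2 * g.size() - 3, std::vector<float>(w));
    for (size_t c = 0; c < w; ++c) {
        std::vector<float> col(rows.size());
        for (size_t r = 0; r < rows.size(); ++r) col[r] = rows[r][c];
        std::vector<float> rc = refine1D(col);
        for (size_t r = 0; r < rc.size(); ++r) out[r][c] = rc[r];
    }
    return out;
}
```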
Multi-channel fractals: The multi-resolution approach of our
modeling technique allows handling of large, extended models
such as an entire landscape. To define such models, we employ a
fractal modeling technique which we call multi-channel fractals.
The object is described by a set of attribute channels corresponding to different surface properties. In our example, we use a height
channel describing the landscape as a height field. Additionally,
we have channels for surface roughness, vegetation density for
different layers of vegetation (shown as different colors during
rendering), and a snow layer. Each channel contains fractal 1/f^h
noise (with non-stationary h). To create believable landscapes,
interdependencies between these channels are introduced in the
subdivision step:
The height field is created by first interpolating the local neighborhood using a smoothing filter. Then, random noise is added with an amplitude of 2^(-dh), where h is a smoothness parameter stored in a separate channel. The h-channel is a fractal itself: It is also created by smooth filtering of neighboring h values and random additions. However, we prefer larger values of h (leading to smoother terrain due to smaller noise increments) if the value in the height channel is small (i.e. we are in the area of a valley). Conversely, h is decreased (leading to more roughness) if the slope of the height field at the current level of resolution is large, leading to more roughness at steep mountainsides. Both are implemented by blending between the h channel and a height/slope-dependent h according to the subdivision level. At low levels, a strict correlation of roughness to height and slope is enforced, while more randomness is allowed at smaller scales.
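The following sketch condenses this interplay into a per-sample rule for the height and roughness channels; all constants and the valley/slope targets are hypothetical stand-ins for the tuned shader, and a real implementation would use position-seeded deterministic noise so that patches rebuild identically.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdlib>

struct Sample { float height, rough; };  // two of the attribute channels

static float rnd() {                     // uniform noise in [-1, 1];
    return 2.0f * std::rand() / RAND_MAX - 1.0f;  // a real implementation
}                                        // would hash the sample position

// 'smooth' holds the smoothly interpolated parent values at the new
// sample; 'slope' is the local height-field slope at the parent level.
Sample subdivideSample(Sample smooth, float slope, int d) {
    Sample out = smooth;
    // fractal height increment with amplitude 2^(-d*h)
    out.height += rnd() * std::pow(2.0f, -float(d) * smooth.rough);
    // hypothetical target roughness: smoother (larger h) in valleys,
    // rougher (smaller h) on steep slopes
    float target = 1.0f + (smooth.height < 0.0f ? 0.5f : 0.0f)
                 - 0.4f * std::min(slope, 1.0f);
    // strict correlation at coarse levels, more randomness at fine scales
    float blend = std::min(1.0f, 3.0f / float(d));
    out.rough = blend * target
              + (1.0f - blend) * (smooth.rough + 0.05f * rnd());
    return out;
}
```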
Similarly, vegetation textures and a snow density are created
by employing fractal channels, which are influenced but not determined entirely by height and slope. For snow, we expect a
smooth surface appearance at thick layers of snow. The thicker
the layer of snow, the more high frequency details are attenuated.
Consequently, the values in the roughness channel h are strongly
enlarged in regions with a large value in the snow channel. The
result conveys a quite realistic look of snow-covered areas in a
rough mountain range. This interplay of fractal randomness and
parameter interdependence yields landscapes with irregular attributes but believable mutual influence and can probably be employed to approximate a variety of other natural phenomena, too.
Of course there are limitations. For example, we cannot directly
simulate global physically-based effects such as erosion [Musgrave et al. 1989].
5 RESULTS
The results reported in this section have been measured on a
system equipped with an nVidia GeForce 6800GT AGP graphics
board (256MB video ram) and a 2.6GHz Pentium 4 CPU. The
software has been implemented in C++ and all shaders have been
implemented in Cg [nVidia 2005]. The shader code is canonical C code; no assembly code or hardware-specific optimizations
have been employed. Figure 8 shows renderings of example models created with the techniques described in Section 4. The images
are annotated with the rendering time (from cache), the rebuild
time (rendering with emptied caches) and a typical rendering time
for a walkthrough (as shown in the accompanying video).
Smooth surface: The bunny model in Figure 8(a) has been
constructed using the technique of [Losasso et al. 2003], as described in Section 4. The subdivision shader performs smoothing
and normal vector computation, rendering is done by a simple
environment mapping shader (to show the surface smoothness).
For a typical viewpoint, we obtain 33 frames per second and only
moderate reduction for a moving observer (see video).
Fractal landscapes: The landscape models in Figure 8(b)–(e) have been created using the multi-channel fractal technique.
For rendering, antialiased shadow maps (12 samples) and an
approximate atmospheric scattering model have been employed
[Hoffman and Preetham 2002]. The vegetation texture (different
shades of green) and the snow have been modeled as fractal attribute channels (as described in Section 4); the grass has been
additionally modulated by a periodic 2d texture.
A basic landscape scene is shown in Figure 8(b). The shown
view consists of 604 patches of 32² vertices, accounting for about
1.2 million triangles. At the shown quality level, it can be rendered at about 6 frames per second. The throughput of the rendering stage is currently limited by the complexity of the rendering
shaders which have to compute the quite involved lighting model.
Additionally, some of the mapping steps (such as coloring of
vegetation layers) are still computed during rendering to facilitate
interactive landscape design. A rebuild of all geometry from
scratch takes 2.4 seconds; however, due to temporal coherence,
the average frame rate during a walkthrough does not drop significantly (see video). Figure 8(d) shows a similar scene, but with
more roughness and more snow. Please note how the snow channel automatically damps out high frequency noise, leading to the
impression of rough terrain covered by a layer of snow of different thickness. Figures (e) and (f) show a variant of the model from
Figure 8(b). Here, a second fractal layer has been introduced to
model water. The second layer is computed for each patch after
the landscape layer, so that the landscape attributes can be accessed when defining the water layer. It is rendered with a water shader (using an
additional rendering pass to create a mirrored and a refracted
image of the landscape). The foam at the coastline is created by a
fractal channel similar to the vegetation channel. The overall
shape depends on water depth but also shows random variations.
Due to the double layer modeling and the multiple rendering
passes, the frame rates are lower. Figures 8(e) and (f) have been
created with different level of detail settings, varying the projected vertex spacing parameter as described in Section 3.2. A last
example is shown in Figure 8(c). For this scene, we have used height field data of the Grand Canyon [US Geological Survey 2005] and added different fractal channels. The original data is 400²; a multi-resolution pyramid of 32² patches is created from it by the CPU, and geometry synthesis is applied for deeper levels of subdivision (see the video for an interactive walkthrough). Figure 7 shows the variation of the rendering time and the number of overall and rebuilt patches per frame during the walkthrough of this scene. Due to caching, only a few patches have to be rebuilt for each frame, so that interactive walkthroughs are possible.

Figure 7: Overall rendering time, number of rebuilt patches, and overall number of patches for the frames of the Grand Canyon flyover (see accompanying video).
Evaluation: We have measured how much time is consumed
by the different parts of the algorithm: Comparing the costs for
geometry synthesis and rendering, we observe a factor of about 6–14. It is interesting to further split up the synthesis costs into actual hardware processing costs and the time needed to do the bounding box calculation (which involves reading back data from GPU
memory). Due to the min/max reduction step (aggregation of 4 × 4
neighborhoods in a pixel shader), only a moderate overhead is
observed: 10% of the rebuild time (landscape scene Figure 8(b))
and 25% (bunny scene), respectively, are spent for bounding box
calculation. Without prior reduction, the overhead is significantly
larger (41% and 57% respectively). The overhead is larger for the
bunny scene because the subdivision shader is less complex.
During animations, the average percentage of rendering time
spent for bounding box calculations is about 1% for all scenes
(due to caching) so that this overhead is not really an issue in
practice.
A last, important point is to examine the benefit of a hardware
implementation. We have compared the execution speed of the
GPU implementation with a CPU implementation. As all shaders
have been written in Cg, we were able to compile almost the
original code with a C++ compiler (Intel C++ 7.1, all optimizations enabled). Only a few CG specific commands and data types
had to be translated into macros and classes with inline functions.
Textures have been modeled as conventional, two dimensional C
arrays. This approach (plain C++ code) reflects the typical programming approach in practice. However, it is still biased a bit in
favor of the GPU, as vector data types are not intrinsic in standard
C++ (although the employed compiler automatically tries to employ SSE SIMD instructions) and we do not use an optimized
texture memory layout. Hence, the results should be considered
with some care. Using this setup we have measured the computation time of the subdivision shader on both the CPU and GPU of
the test system. We have obtained a computation time of 0.5 ms
per patch for the GPU and 7.25 ms for the Pentium 4 2.6 GHz
(factor 14.5) for the subdivision shader of the landscape model of
Figure 8(b). For the bunny model subdivision shader, the result is
0.4 ms for the GPU and 1.3 ms for the CPU (factor 3.25)¹. The speedup for the bunny scene is appreciable but significantly smaller than for the landscape scene. Again, this is due to the much shorter shader, which puts more emphasis on the additional CPU-GPU communication overhead. For the landscape scene, the speedup is more than an order of magnitude. Hence, it is probably safe to assume that the hardware-based implementation will still provide a substantial performance benefit for synthesizing complex geometry even if more aggressive low-level CPU optimizations are applied.

¹ We have also repeated the CPU benchmark on a Pentium 4 3.4 GHz, the fastest machine we had access to. This yielded performance factors of 11.8 and 2.5, respectively, relative to the GeForce 6800 GT GPU. Unfortunately, we did not have a system with an ATI Radeon X1800 or nVidia GeForce 7800 GTX graphics board available for comparison with a high-end GPU.
6 CONCLUSIONS AND FUTURE WORK
We have proposed a new hardware accelerated modeling and
rendering technique that can be implemented on data parallel
architectures such as current GPUs. The algorithm employs a
hierarchy of regularly sampled patches to facilitate an efficient
implementation on SIMD processing arrays. This structure maps
well to the pixel shaders of current GPUs, allowing modeling and rendering to execute almost entirely on the GPU, yielding a
substantial performance improvement.
There are several directions for future work. First, some technical implementation issues (such as blending between resolution
levels to avoid popping) could be improved. More importantly,
the implementation should be generalized to support general base
meshes. Currently, each patch is treated separately (this can be seen as a small hole in the bunny surface; the triangulation
scheme does not connect the outer borders of the geometry image
to a closed surface). This extension is mostly straightforward. The
main issue is handling of neighborhoods at extraordinary vertices
(valence ≠ 4). Here, the technique of [Bolz and Schröder 2003]
could be a starting point to be generalized for more general, stochastic subdivision techniques. Lastly, the subdivision topology
could be made more flexible: Due to the limitations of current
GPUs, we can only handle regularly sampled, rectangular patches.
It would be interesting to examine subdivision rules that allow a
change of topology during subdivision. In combination with a
point-based rendering approach, more general shapes could be
created. This would involve a generalized concept of neighborhoods and a subdivision unit with a variable number of output
data points.
Acknowledgements
This work has been supported by the state of Baden Württemberg
(Germany) and the Max Planck Center for Visual Computing and
Communication. The authors wish to thank Leonidas Guibas,
Martin Frisch, Timo Schairer, Andreas Schilling, and Wolfgang
Straßer and the anonymous reviewers for their valuable comments. We especially wish to thank Alexander Berner, Arno
Fleck, Mark Hoffmann, Philipp Jenke, Benjamin Maier for help
with the implementation and Robert Kuchar for providing the
skylight maps. The environment map for the bunny scene has
been taken from P. Debevec’s web site (www.debevec.org).
References
ATI, 2005. ATI developer relations. http://www.ati.com
BALÁZS, Á., GUTHE, M., and KLEIN, R., 2004. Fat borders: Gap
filling for efficient view-dependent LOD rendering. In: Computers
& Graphics, 28(1), 79–86.
BARTELS, R. H., BEATTY, J. C., and BARSKY, B. A. 1987. An
Introduction to Splines for use in Computer Graphics and
Geometric Modeling, Morgan Kaufmann Publishers.
BOLZ, J. and SCHRÖDER, P., 2003: Evaluation of Subdivision Surfaces
on Programmable Graphics Hardware. http://www.multires.
caltech.edu/pubs/GPUSubD.pdf
BOUBEKEUR, T., and SCHLICK, C., 2005: Generic Mesh Refinement
on GPU. In: Graphics Hardware 2005.
BUCK, I., and PURCELL, T., 2004: A Toolkit for Computation on GPUs.
In: GPU Gems, Addison-Wesley.
CATMULL, E., and CLARK, J. 1978. Recursively generated B-spline
surfaces on arbitrary topological meshes. In: Computer Aided
Design, 10(6), 350–355.
CIGNONI, P., GANOVELLI, F., GOBBETTI, E., MARTON, F., PONCHIO,
F., and SCOPIGNO, R. 2003: Planet-Sized Batched Dynamic
Adaptive Meshes (P-BDAM). In: Visualization 2003
Proceedings.
COOK, R.L., CARPENTER, L., and CATMULL, E., 1987. The Reyes
image rendering architecture. In: Computer Graphics, 21(3), 95–
102.
DACHSBACHER, C., and STAMMINGER, M., 2004: Rendering
Procedural Terrain by Geometry Image Warping. In: Proc. of
Eurographics Symposium on Rendering 2004.
DE BERG, M., VAN KREVELD, M., OVERMARS, M., and
SCHWARZKOPF, O., 1997: Computational Geometry – Algorithms
and Applications, Springer Verlag.
DOO, D. 1978. A subdivision algorithm for smoothing down
irregularly shaped polyhedrons. In: Proc. on Interactive
Techniques in Computer Aided Design, 157–165.
DUCHAINEAU, M., WOLINSKY, M., SIGETI, D. E., MILLER, M. C.,
ALDRICH, C. and MINEEV-WEINSTEIN, M. B. 1997: ROAMing
Terrain: Real-time Optimally Adapting Meshes. Visualization 97
Proceedings, 81–88.
FOURNIER, A., FUSSELL, D., and CARPENTER, L., 1982. Computer
Rendering of Stochastic Models. In: Communications of the ACM
25(6), 371–384.
GUTHE, M., BALÁZS, Á., and KLEIN, R., 2005: GPU-based trimming
and tessellation of NURBS and T-Spline surfaces. In: ACM
Transactions on Graphics, 24(3).
HOFFMAN N., and PREETHAM, A.J. 2002: Rendering Outdoor Light
Scattering in Real Time.
http://www.ati.com/developer/techpapers.html
HOPPE, H., 1996: Progressive meshes. In: SIGGRAPH 96 Proceedings,
Annual Conference Series, 99–108.
LARSEN, B.D., and CHRISTENSEN, N.J., 2003: Real-time Terrain
Rendering using Smooth Hardware Optimized Level of Detail. In:
Journal of WSCG, Vol.11, No.1.
LOSASSO, F., and HOPPE, H., 2004: Geometry Clipmaps: Terrain
Rendering Using Nested Regular Grids. In: ACM Transactions
on Graphics, 23(3).
LOSASSO, F., HOPPE, H., SCHAEFER, S., and WARREN, J., 2003:
Smooth geometry images. In: Eurographics Symposium on
Geometry Processing 2003, 138 – 145. Data taken from
http://research.microsoft.com/~hoppe.
LEWIS, J.P., 1987. Generalized Stochastic Subdivision. In: ACM
Transactions on Graphics, 6(3).
LINDSTROM, P., KOLLER, D., RIBARSKY, W., HODGES, L.F., FAUST,
N., and TURNER, G.A., 1996. Real-time, continuous level of detail
rendering of height fields. In: SIGGRAPH 96 Proceedings,
Annual Conference Series, 109–118.
LINDSTROM, P., and PASCUCCI, V., 2001. Visualization of Large
Terrains Made Easy. In: Visualization 2001 Proceedings.
LUEBKE, D., REDDY, M., COHEN, J.D., VARSHNEY, A., WATSON, B.,
and HUEBNER, R., 2003. Level of Detail for 3D Graphics, Morgan
Kaufmann Publishers.
MAX, N.L. 1981. Vectorized Procedural Models for Natural Terrain:
Waves and Islands in the Sunset. In: Computer Graphics, 15(3).
MILLER, G.S.P., 1986. The Definition and Rendering of Terrain Maps.
In: Computer Graphics, 20(4).
MUSGRAVE, K.F., KOLB, C.E., and MACE, R.S., 1989. The synthesis
and rendering of eroded fractal terrains. In: Computer Graphics,
23(3).
MUSGRAVE, K.F., 1993. Methods for Realistic Landscape Imaging.
PhD thesis, Yale University.
NVIDIA, 2005. nVidia developer relations. http://www.nvidia.com
PANDROMEDA 2005: MojoWorld software package.
http://www.pandromeda.com
PERLIN, K., and HOFFERT, E.M., 1989. Hypertexture. In: Computer
Graphics, 23(3).
PERLIN, K., 1985. An Image Synthesizer. In: Computer Graphics,
19(3).
PLANETSIDE SOFTWARE 2005: Terragen software package.
http://www.planetside.co.uk
PAJAROLA, R., 1998. Large Scale Terrain Visualization Using The
Restricted Quadtree Triangulation. In: Visualization 98
Proceedings.
RÖTTGER, S., HEIDRICH, W., SLUSSALLEK, P., and SEIDEL, H.P. 1998:
Real-Time Generation of Continuous Levels of Detail for Height
Fields. In: Proceedings of the 6th International Conference in
Central Europe on Computer Graphics and Visualization, 315–
322.
SHIUE, L.J., GOEL, V., and PETERS, J. 2003. Mesh Mutation in
Programmable Graphics Hardware. In: Graphics Hardware 2003.
SHIUE, L.J., JONES, I., and PETERS, J., 2005: A Realtime GPU Subdivision
Kernel. In: ACM Transactions on Graphics, 24(3).
STAMMINGER, M., and DRETTAKIS, G., 2001: Interactive Sampling
and Rendering for Complex and Procedural Geometry. In:
Rendering Techniques 2001.
US GEOLOGICAL SURVEY, 2005: http://edc.usgs.gov/geodata/
WAND, M., FISCHER, M., PETER, I., MEYER AUF DER HEIDE, F., and
STRAßER, W., 2001: The Randomized z-Buffer Algorithm:
Interactive Rendering of Highly Complex Scenes. In: SIGGRAPH
2001 Proceedings, Annual Conference Series, 361–370.
WHITTED, T., and KAJIYA, J., 2005: Fully Procedural Graphics. In:
Graphics Hardware 2005.
ZORIN, D., SCHRÖDER, P., DEROSE, T., KOBBELT, L., LEVIN, A., and
SWELDENS, W., 2000. Subdivision for Modeling and Animation.
In: SIGGRAPH 2000 Course Notes.
Figure 8: Application examples. In all examples, a multi-resolution patch contains 32² vertices (and a border of k = 3 vertices). Timings: rendering time from cache / rebuild from scratch.

(a) Stanford Bunny – subdivision surface rendering (cf. [Losasso et al. 2003]), 1089 control points, 184 patches (188416 vertices), rendering time 30 ms, rebuild from scratch 251 ms.

(b) Landscape scene – 604 patches, rendering time 169 ms, rebuild from scratch 2438 ms, walkthrough (see video) 152 ms per frame (av.).

(c) Grand Canyon – initial 400² height field from [US Geological Survey 2005], 433 patches, rendering time 130 ms, rebuild from scratch 955 ms, walkthrough (see video) 136 ms per frame (av.).

(d) Mountain range at sunset – 985 patches, rendering time 228 ms, rebuild from scratch 2487 ms, walkthrough (as shown in the video) 231 ms per frame (av.).

(e) Mountain Lake – a variant of landscape (b), medium resolution (2.2 pixels per triangle, 2×586 patches), rendering time 196 ms, rebuild from scratch 1079 ms.

(f) Mountain Lake – a variant of landscape (b), high resolution (1 pixel per triangle, 2×1039 patches), rendering time 370 ms, rebuild from scratch 3241 ms.