Interactive Ray Tracing with CUDA
David Luebke and Steven Parker
NVIDIA Research
Ray Tracing & Rasterization
Rasterization
For each triangle:
Find the pixels it covers
For each pixel: compare to closest triangle so far
Requires Z-buffer: track distance per pixel
Ray tracing
For each pixel:
Find the triangles that might be closest
For each triangle: compute distance to pixel
Requires spatial index: a spatially sorted arrangement of triangles
When all triangles/pixels have been processed, we know the closest triangle at all pixels
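The two loop orders above can be sketched on the CPU. This is a minimal illustration, not code from the talk: the 1D "screen", the `Span` primitive, and both function names are assumptions made for the example, and the ray tracing loop searches every primitive by brute force where a real tracer would consult its spatial index.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical 1D scene: primitive i covers pixels [x0, x1) at a constant depth.
struct Span { int x0, x1; float depth; };

const int kNone = -1;  // ID stored where nothing covers the pixel

// Rasterization order: outer loop over primitives, Z-buffer tracks depth per pixel.
std::vector<int> rasterize(const std::vector<Span>& prims, int width) {
    std::vector<float> zbuf(width, std::numeric_limits<float>::infinity());
    std::vector<int> ids(width, kNone);
    for (std::size_t i = 0; i < prims.size(); ++i) {
        int lo = prims[i].x0 < 0 ? 0 : prims[i].x0;
        int hi = prims[i].x1 > width ? width : prims[i].x1;
        for (int x = lo; x < hi; ++x) {          // the pixels this primitive covers
            if (prims[i].depth < zbuf[x]) {      // compare to closest so far
                zbuf[x] = prims[i].depth;
                ids[x] = static_cast<int>(i);
            }
        }
    }
    return ids;
}

// Ray tracing order: outer loop over pixels, inner search over candidate primitives.
std::vector<int> raytrace(const std::vector<Span>& prims, int width) {
    std::vector<int> ids(width, kNone);
    for (int x = 0; x < width; ++x) {
        float nearest = std::numeric_limits<float>::infinity();
        for (std::size_t i = 0; i < prims.size(); ++i) {  // brute-force candidate set
            bool covers = prims[i].x0 <= x && x < prims[i].x1;
            if (covers && prims[i].depth < nearest) {
                nearest = prims[i].depth;
                ids[x] = static_cast<int>(i);
            }
        }
    }
    return ids;
}
```

Both functions return the same per-pixel ID buffer for the same input; only the loop nesting and the auxiliary structure (Z-buffer versus candidate search) differ.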
Myths of Ray Tracing & Rasterization
Ray tracing is clean, rasterization is ugly
Both are ugly
Ray tracing is sublinear, rasterization linear in primitives
Rasterization uses culling techniques
Ray tracing is linear, rasterization sublinear in pixels
Ray tracing uses packets & frustum tracing
Ray Tracing vs. Rasterization
Rasterization is fast
but needs cleverness to support complex visual effects
Ray tracing supports complex visual effects
but needs cleverness to be fast
Why Rasterization?
Fast & Efficient
Ubiquitous – part of workflow, pipeline
Great for displacement-mapped geometry
Developers know how to make beautiful pictures...
Copyright NVIDIA 2008
From Battlefield: Bad Company, EA Digital Illusions CE AB
From Crysis, Crytek GmbH
Why ray tracing?
Ray tracing unifies rendering of visual phenomena
fewer algorithms with fewer interactions between algorithms
Easier to combine advanced visual effects robustly
soft shadows
subsurface scattering
indirect illumination
transparency
reflective & glossy surfaces
depth of field
…
Ray Tracing vs. Rasterization
Rasterization is fast
but needs cleverness to support complex visual effects
Ray tracing supports complex visual effects
but needs cleverness to be fast
Use both!
Ray tracing (Appel 1968, Whitted 1980)
Distributed Ray Tracing (Cook, 1984)
Path Tracing (Kajiya, 1986)
Ray Tracing Regimes
Real-time
Interactive
Computational Power
Industrial strength ray tracing
mental images is the market leader in ray tracing software
Applicable in numerous markets: automotive, design, architecture, film
Importance
Interactive Ray Tracing
GPUs Are Fast & Getting Faster
[Chart: Peak GFLOP/s (0 to 1000), NVIDIA GPU vs. Intel CPU, Sep-02 through Feb-08]
Why GPU Ray Tracing?
Abundant parallelism, massive computational power
GPUs excel at shading
Opportunity for hybrid algorithms
GPU Ray Tracing
Purcell et al., Ray Tracing on Programmable Graphics Hardware, SIGGRAPH 2002
Purcell et al., Photon Mapping on Programmable Graphics Hardware, Graphics Hardware 2004
Popov et al., Stackless KD-Tree Traversal for High Performance GPU Ray Tracing, Computer Graphics Forum, Oct 2007
Popov et al., Realtime Ray Tracing on GPU with BVH-based Packet Traversal, Symposium on Interactive Ray Tracing 2007
GPU Ray Tracing
[Chart: vertical axis 0 to 18; labels: K-D Restart, GPU Improvement, Looping, Short-Stack]
Horn et al., Interactive k-D Tree GPU Raytracing, ACM SIGGRAPH Symposium on Interactive 3D Graphics 2007
GPU Ray Tracing
Zhou et al., Real‐Time KD‐Tree Construction on Graphics Hardware
Microsoft Research Asia Tech Report 2008‐52
Volume Ray Casting
Ray marching for isosurfaces + direct volume rendering
Electron density of virus from cryo-electron microscopy
Vital to change isosurface interactively
Great match for CUDA
Volume Ray Casting With CUDA, Marsalek & Slusallek 2008
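Ray marching for an isosurface can be sketched as a fixed-step walk along the ray that stops at the first sample reaching the isovalue. The sketch below is a CPU illustration under stated assumptions, not the cited implementation: `density` is a hypothetical analytic stand-in for a sampled volume, and `Vec3`, `pointAt`, and `marchIsosurface` are names invented here. It also assumes the interior is denser than the exterior.

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };

// Hypothetical stand-in for a sampled volume: density is 1 at the origin
// and falls off linearly with distance from it.
double density(Vec3 p) {
    return 1.0 - std::sqrt(p.x * p.x + p.y * p.y + p.z * p.z);
}

Vec3 pointAt(Vec3 o, Vec3 d, double t) {
    return {o.x + t * d.x, o.y + t * d.y, o.z + t * d.z};
}

// Marches from tNear to tFar in steps of dt. Returns the ray parameter where
// the density first reaches iso, or -1 if it never does on this interval.
double marchIsosurface(Vec3 o, Vec3 d, double tNear, double tFar,
                       double dt, double iso) {
    double tPrev = tNear;
    double vPrev = density(pointAt(o, d, tPrev));
    if (vPrev >= iso) return tPrev;  // ray starts inside the surface
    for (int i = 1; tNear + i * dt <= tFar; ++i) {
        double t = tNear + i * dt;
        double v = density(pointAt(o, d, t));
        if (v >= iso) {
            // Refine by linear interpolation between the bracketing samples.
            return tPrev + (iso - vPrev) / (v - vPrev) * (t - tPrev);
        }
        tPrev = t;
        vPrev = v;
    }
    return -1.0;
}
```

Because `iso` is an ordinary argument, changing the isosurface only means re-running the march with a new value; nothing has to be rebuilt. Each ray is independent, which is what makes one thread per pixel a natural mapping.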
City demo
Real system
NVSG-driven animation and interaction
Programmable shading
Modeled in Maya, imported through COLLADA
Fully ray traced
2 million polygons
Bump-mapping
Movable light source
5 bounce reflection/refraction
Adaptive antialiasing
System Diagram – ray tracing
Texture/Vertex buffer setup (OpenGL)
Ray tracing (CUDA)
Image display/postprocessing (OpenGL)
Programmable ray tracing system: Ray generation, Build, Traversal, Light shader, Miss shader, Material shading
Key Parallel Abstractions in CUDA
0. Zillions of lightweight threads
→ Simple decomposition model
1. Hierarchy of concurrent threads
→ Simple execution model
2. Lightweight synchronization primitives
→ Simple synchronization model
3. Shared memory model for cooperating threads
→ Simple communication model
Hierarchy of concurrent threads
Parallel kernels composed of many threads
all threads execute the same sequential program
Threads are grouped into thread blocks
threads in the same block can cooperate
Threads/blocks have unique IDs
[Diagram: Thread t; Block b containing threads t0 t1 … tB; Kernel foo()]
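The hierarchy can be illustrated with a serial C++ emulation. This is only a sketch: `ThreadCtx` is a hypothetical stand-in for CUDA's built-in `blockIdx`, `blockDim`, and `threadIdx` (one dimension only), and `launch` replaces a real kernel launch with two nested loops, so no actual parallelism or block cooperation is modeled.

```cpp
#include <vector>

// Hypothetical stand-in for CUDA's built-in index variables (1D only).
struct ThreadCtx { int blockIdx, blockDim, threadIdx; };

// The "kernel": one sequential program, executed once per thread.
void scaleKernel(ThreadCtx c, const std::vector<float>& in,
                 std::vector<float>& out) {
    int i = c.blockIdx * c.blockDim + c.threadIdx;  // unique global thread ID
    if (i < static_cast<int>(in.size())) {          // threads past the end do nothing
        out[i] = 2.0f * in[i];
    }
}

// Serial emulation of launching gridDim blocks of blockDim threads each.
void launch(int gridDim, int blockDim, const std::vector<float>& in,
            std::vector<float>& out) {
    for (int b = 0; b < gridDim; ++b) {
        for (int t = 0; t < blockDim; ++t) {
            scaleKernel({b, blockDim, t}, in, out);
        }
    }
}
```

Launching 4 blocks of 8 threads over a 30-element array covers every element once; the last two threads fall outside the array and are masked by the bounds check.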
Big Picture
GTX 280 supports up to 30,720 concurrent threads!
1. Big strategic optimization: minimize per-thread state
2. Otherwise, take simplest option
• Clever optimizations usually violate rule 1
3. Lots of opportunity for further research
• Coalescing work for increased coherence (work queues)
• Data coherence
• Execution coherence
• Ray space hierarchies
• Radical departures from traditional methods (see RT08)
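A rough calculation shows why per-thread state matters. The sketch below assumes hardware figures that are not stated on the slide: 30 multiprocessors, at most 1,024 resident threads per multiprocessor (30 × 1,024 = 30,720, matching the slide), and 16,384 32-bit registers per multiprocessor. It ignores register allocation granularity, shared memory, and block-count limits, so treat the output as an upper bound rather than a measured occupancy.

```cpp
#include <algorithm>

// Assumed GTX 280 figures (not taken from the slide, see the text above).
const int kMultiprocessors = 30;
const int kMaxThreadsPerSM = 1024;
const int kRegistersPerSM = 16384;

// Upper bound on concurrently resident threads when each thread needs
// registersPerThread registers.
int residentThreads(int registersPerThread) {
    int byRegisters = kRegistersPerSM / registersPerThread;
    return kMultiprocessors * std::min(kMaxThreadsPerSM, byRegisters);
}
```

Under these assumptions, 16 registers per thread still allows all 30,720 threads, while 32 registers per thread halves that to 15,360.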
Details – Algorithmic
Top-level BVH + subtrees (BVH or k-d tree)
Supports rigid motion, instancing
Rebuild/refit easy to add
Traversal + intersection + shading “megakernel”
while – while vs. if – if
Highly variable thread lifetimes!
Software load-balancing
Details - Implementation
Triangle & hierarchy data through texture cache
Ray tree recursion
Stack in local memory to store shader live variables
Short Stack
Goal: minimize state per thread
Strategy: replace traversal stack with short stack
Horn et al., Interactive k-D Tree GPU Raytracing, I3D 2007
Slides courtesy Daniel Horn
KD-Tree
[Diagram, shown in several build steps: k-d tree with split planes X, Y, Z and leaf cells A, B, C, D; the final step marks tmin and tmax along a ray]
KD-Tree Traversal
[Diagram: stack-based traversal of the same tree along a ray; the stack is shown holding Z]
KD-Restart
Standard traversal
Omit stack operations
Proceed to 1st leaf
If no intersection
Advance (tmin,tmax)
Restart from root
Proceed to next leaf
[Diagram: k-d tree with split planes X, Y, Z and leaf cells A, B, C, D]
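The restart steps above can be sketched as follows. This is an illustrative CPU version, not the implementation from the talk: the `Node` and `Ray` layouts and the function name are assumptions, the tree is 2D, the ray direction is assumed non-zero on every split axis, and `sceneMin` is assumed non-negative. Instead of intersecting triangles, the function records which leaves it reaches, which is the order a ray that hits nothing would visit them in.

```cpp
#include <string>
#include <vector>

// axis < 0 marks a leaf; otherwise left is the side below the split plane.
struct Node { int axis; double split; int left, right; char leaf; };
struct Ray { double o[2], d[2]; };  // assumes d[axis] != 0 on every split axis

// Stackless k-d traversal by restarting. Returns the leaves visited, in order.
std::string kdRestart(const std::vector<Node>& tree, const Ray& r,
                      double sceneMin, double sceneMax) {
    std::string visited;
    double tmin = sceneMin, tmax = sceneMin;
    while (tmax < sceneMax) {
        tmin = tmax;       // advance past the leaf just processed
        tmax = sceneMax;
        int n = 0;         // restart from the root
        while (tree[n].axis >= 0) {
            const Node& k = tree[n];
            int a = k.axis;
            double tsplit = (k.split - r.o[a]) / r.d[a];
            bool belowFirst = r.o[a] < k.split ||
                              (r.o[a] == k.split && r.d[a] <= 0);
            int first = belowFirst ? k.left : k.right;
            int second = belowFirst ? k.right : k.left;
            if (tsplit >= tmax || tsplit <= 0) {
                n = first;                 // interval lies on the near side
            } else if (tsplit <= tmin) {
                n = second;                // interval lies on the far side
            } else {
                n = first;                 // both sides: take near now,
                tmax = tsplit;             // far side is found by a later restart
            }
        }
        // A real tracer would intersect the leaf's triangles here and
        // return as soon as it finds a hit inside [tmin, tmax].
        visited += tree[n].leaf;
    }
    return visited;
}
```

The test `tsplit <= tmin` (rather than a strict `<`) is what guarantees progress: after a restart, `tmin` equals the split distance that ended the previous pass, so the traversal must choose the far child at that node.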
KD-Restart with short stack (size 1)
[Diagram: traversal of the same tree with a one-entry stack]
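A short stack can be sketched as a small, fixed-capacity stack that silently drops its oldest entry on overflow and falls back to a restart from the root when it runs empty. The code below is an assumption-laden CPU illustration rather than the talk's implementation: the `Node`, `Ray`, `Entry`, and `Result` types and the function name are invented here, the tree is 2D, ray directions are assumed non-zero on every split axis, and `sceneMin` is assumed non-negative. It counts descents from the root so the effect of the stack size is visible.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct Node { int axis; double split; int left, right; char leaf; };
struct Ray { double o[2], d[2]; };        // assumes d[axis] != 0 on split axes
struct Entry { int node; double tmax; };  // deferred far child and its interval end
struct Result { std::string visited; int rootDescents; };

Result kdShortStack(const std::vector<Node>& tree, const Ray& r,
                    double sceneMin, double sceneMax, std::size_t capacity) {
    Result res{"", 0};
    std::vector<Entry> stack;  // never holds more than `capacity` entries
    double tmin = sceneMin, tmax = sceneMin;
    while (tmax < sceneMax) {
        int n;
        tmin = tmax;
        if (!stack.empty()) {           // resume at a cached far child
            n = stack.back().node;
            tmax = stack.back().tmax;
            stack.pop_back();
        } else {                        // nothing cached: restart from the root
            n = 0;
            tmax = sceneMax;
            ++res.rootDescents;
        }
        while (tree[n].axis >= 0) {
            const Node& k = tree[n];
            int a = k.axis;
            double tsplit = (k.split - r.o[a]) / r.d[a];
            bool belowFirst = r.o[a] < k.split ||
                              (r.o[a] == k.split && r.d[a] <= 0);
            int first = belowFirst ? k.left : k.right;
            int second = belowFirst ? k.right : k.left;
            if (tsplit >= tmax || tsplit <= 0) {
                n = first;
            } else if (tsplit <= tmin) {
                n = second;
            } else {
                if (capacity > 0) {
                    if (stack.size() == capacity) stack.erase(stack.begin());  // drop oldest
                    stack.push_back({second, tmax});
                }
                n = first;
                tmax = tsplit;
            }
        }
        res.visited += tree[n].leaf;  // triangle tests would go here
    }
    return res;
}
```

With capacity 0 this degenerates to plain restart traversal. A larger capacity visits the same leaves in the same order but descends from the root less often, trading a little per-thread state for fewer repeated traversal steps.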
Short Stack Cache
Even better:
Each thread stores full stack in memory (non-blocking writes)
Cache top of stack locally (registers or shared memory)
Enables BVHs as well as k-d trees
5-10% faster in our current implementation
System Diagram – Hybrid
Multi-pass rasterization (OpenGL)
→ Δ IDs, …
Ray tracing (CUDA)
→ FBO, …
Composite, shade, display (OpenGL)
Programmable ray tracing system: Ray generation, Build, Traversal, Light shader, Miss shader, Material shading, …
Hybrid Rendering – Primary Rays
Hybrid Rendering – “God Rays”
Wyman & Ramsey, RT08
Creative Commons Image: Mila Zinkova
Indirect Illumination != Ray Tracing
No indirect lighting
With indirect lighting
Laine et al., Incremental Instant Radiosity for Real‐Time Indirect Illumination
Eurographics Symposium on Rendering 2007
Solve the Right Problems!
Tracing eye rays is uninteresting
rasterization wins, use it
Scenes change dynamically at run time
can’t lovingly craft all spatial indices in off-line process
Complex shaders & texturing are mandatory
a big weakness of CPU software tracers to date
Need to provide a complete solution
construction, shading, application integration, hardware
Summary
CUDA makes GPU ray tracing fast and practical
A powerful tool in the interactive graphics toolbox
Hybrid algorithms are the future
Leverage the power of rasterization with the flexibility of CUDA
Together they provide tremendous scope for innovation
Thank You!