Interactive Ray Tracing with CUDA
David Luebke and Steven Parker
NVIDIA Research

Ray Tracing & Rasterization
  Rasterization
    Requires Z-buffer: track distance per pixel
    For each triangle:
      Find the pixels it covers
      For each pixel: compare to closest triangle so far
  Ray tracing
    Requires spatial index: a spatially sorted arrangement of triangles
    For each pixel:
      Find the triangles that might be closest
      For each triangle: compute distance to pixel
  When all triangles/pixels have been processed, we know the closest triangle at all pixels

Myths of Ray Tracing & Rasterization
  Ray tracing is clean, rasterization is ugly
    Both are ugly
  Ray tracing is sublinear, rasterization linear in primitives
    Rasterization uses culling techniques
  Ray tracing is linear, rasterization sublinear in pixels
    Ray tracing uses packets & frustum tracing

Ray Tracing vs. Rasterization
  Rasterization is fast but needs cleverness to support complex visual effects
  Ray tracing supports complex visual effects but needs cleverness to be fast

Why Rasterization?
  Fast & Efficient
  Ubiquitous – part of workflow, pipeline
  Great for displacement-mapped geometry
  Developers know how to make beautiful pictures...
  [Images: from Battlefield: Bad Company, EA Digital Illusions CE AB]
  [Images: from Crysis, Crytek GmbH]

Copyright NVIDIA 2008

Why ray tracing?
  Ray tracing unifies rendering of visual phenomena
    fewer algorithms with fewer interactions between algorithms
  Easier to combine advanced visual effects robustly
    soft shadows
    subsurface scattering
    indirect illumination
    transparency
    reflective & glossy surfaces
    depth of field
    …

Ray Tracing vs. Rasterization
  Rasterization is fast but needs cleverness to support complex visual effects
  Ray tracing supports complex visual effects but needs cleverness to be fast
  Use both!

Ray tracing (Appel 1968, Whitted 1980)

Distributed Ray Tracing (Cook, 1984)

Path Tracing (Kajiya, 1986)

Ray Tracing Regimes
  [Diagram labels: Real-time, Interactive, Computational Power]

Industrial strength ray tracing
  mental images is market leader for ray tracing software
  Applicable in numerous markets: automotive, design, architecture, film

Importance
  [Images]

Interactive Ray Tracing

GPUs Are Fast & Getting Faster
  [Chart: peak GFLOP/s (0 to 1000), NVIDIA GPU vs. Intel CPU, Sep-02 to Feb-08]

Why GPU Ray Tracing?
  Abundant parallelism, massive computational power
  GPUs excel at shading
  Opportunity for hybrid algorithms

GPU Ray Tracing
  Purcell et al., Ray Tracing on Programmable Graphics Hardware, SIGGRAPH 2002
  Purcell et al., Photon Mapping on Programmable Graphics Hardware, Graphics Hardware 2004
  [Image placeholder: Purcell photon map]
  Popov et al., Stackless KD-Tree Traversal for High Performance GPU Ray Tracing, Computer Graphics Forum, Oct 2007
  Popov et al., Realtime Ray Tracing on GPU with BVH-based Packet Traversal, Symposium on Interactive Ray Tracing 2007

GPU Ray Tracing
  [Chart: bars labeled K-D Restart, GPU Improvement, Looping, Short-Stack; axis 0 to 18]
  Horn et al., Interactive k-D Tree GPU Raytracing, ACM SIGGRAPH Symposium on Interactive 3D Graphics 2007

GPU Ray Tracing
  Zhou et al., Real-Time KD-Tree Construction on Graphics Hardware, Microsoft Research Asia Tech Report 2008-52

Volume Ray Casting
  Ray marching for isosurfaces + direct volume rendering
  Electron density of virus from cryo-electron microscopy
  Vital to change isosurface interactively
  Great match for CUDA
  Volume Ray Casting With CUDA, Marsalek & Slusallek 2008

City demo
  Real system
  NVSG-driven animation and interaction
  Programmable shading
  Modeled in Maya, imported through COLLADA
  Fully ray traced
  2 million polygons
  Bump-mapping
  Movable light source
  5 bounce reflection/refraction
  Adaptive antialiasing

System Diagram – ray tracing
  Texture/Vertex buffer setup
  (OpenGL) → Ray tracing (CUDA) → Image display/postprocessing (OpenGL)

System Diagram – ray tracing
  Texture/Vertex buffer setup (OpenGL) → Ray tracing (CUDA) → Image display/postprocessing (OpenGL)
  [Diagram labels: Ray generation, Programmable, Ray tracing system, Light shader, Build, Traversal, Miss shader, Material shading]

Key Parallel Abstractions in CUDA
  0. Zillions of lightweight threads → Simple decomposition model
  1. Hierarchy of concurrent threads → Simple execution model
  2. Lightweight synchronization primitives → Simple synchronization model
  3. Shared memory model for cooperating threads → Simple communication model

Hierarchy of concurrent threads
  Parallel kernels composed of many threads (Thread t)
    all threads execute the same sequential program
  Threads are grouped into thread blocks (Block b: t0 t1 … tB)
    threads in the same block can cooperate
  Threads/blocks have unique IDs (Kernel foo())

Big Picture
  GTX 280 supports up to 30,720 concurrent threads!
  1. Big strategic optimization: minimize per-thread state
  2. Otherwise, take simplest option
     • Clever optimizations usually violate rule 1
  3. Lots of opportunity for further research
     • Coalescing work for increased coherence (work queues)
     • Data coherence
     • Execution coherence
     • Ray space hierarchies
     • Radical departures from traditional methods (see RT08)

Details – Algorithmic
  Top-level BVH + subtrees (BVH or k-d tree)
  Supports rigid motion, instancing
  Rebuild/refit easy to add
  Traversal + intersection + shading “megakernel”
  while – while vs. if – if
  Highly variable thread lifetimes!
  Software load-balancing

Details – Implementation
  Triangle & hierarchy data through texture cache
  Ray tree recursion
  Stack in local memory to store shader live variables

Short Stack
  Goal: minimize state per thread
  Strategy: replace traversal stack with short stack
  Horn et al., Interactive k-D Tree GPU Raytracing, I3D 2007
  Slides courtesy Daniel Horn

KD-Tree
  [Diagram: k-d tree with split planes X, Y, Z and leaves A, B, C, D; ray interval from tmin to tmax]

KD-Tree Traversal
  [Diagram: k-d tree with split planes X, Y, Z and leaves A, B, C, D; stack contents shown]

KD-Restart
  Standard traversal
    Omit stack operations
    Proceed to 1st leaf
  If no intersection
    Advance (tmin,tmax)
    Restart from root
    Proceed to next leaf

KD-Restart with short stack (size 1)
  [Diagram: k-d tree with split planes X, Y, Z and leaves A, B, C, D; stack contents shown]

Short Stack Cache
  Even better:
    Each thread stores full stack in memory
      non-blocking writes
    Cache top of stack locally (registers or shared memory)
  Enables BVHs as well as k-d trees
  5-10% faster in our current implementation

Details – Algorithmic
  Top-level BVH + subtrees (BVH or k-d tree)
  Supports rigid motion, instancing
  Rebuild/refit easy to add
  Traversal + intersection + shading
    “megakernel”
  while – while vs. if – if
  Highly variable thread lifetimes!
  Software load-balancing

Details – Implementation
  Triangle & hierarchy data through texture cache
  Ray tree recursion
  Stack in local memory to store shader live variables

Big Picture
  1. Big strategic optimization: minimize per-thread state
  2. Otherwise, take simplest option
     • Clever optimizations usually violate rule 1
  3. Lots of opportunity for further research
     • Coalescing work for increased coherence (work queues)
     • Data coherence
     • Execution coherence
     • Ray space hierarchies
     • Radical departures from traditional methods (see RT08)

System Diagram – ray tracing
  Texture/Vertex buffer setup (OpenGL) → Ray tracing (CUDA) → Image display/postprocessing (OpenGL)
  [Diagram labels: Ray generation, Programmable, Ray tracing system, Light shader, Build, Traversal, Miss shader, Material shading, …]

System Diagram – Hybrid
  Multi-pass Rasterization (OpenGL) → Ray tracing (CUDA) → Composite, shade, display (OpenGL)
  [Diagram labels: Δ IDs, …; FBO, …; Ray generation, Programmable, Ray tracing system, Light shader, Build, Traversal, Miss shader, Material shading]

Hybrid Rendering – Primary Rays
  [Images]

Hybrid Rendering – “God Rays”
  Wyman & Ramsey, RT08
  Creative Commons Image: Mila Zinkova

Indirect Illumination != Ray Tracing
  [Images: No indirect lighting; With indirect lighting]
  Laine et al., Incremental Instant Radiosity for Real-Time Indirect Illumination, Eurographics Symposium on Rendering 2007

Solve the Right Problems!
  Tracing eye rays is uninteresting
    rasterization wins, use it
  Scenes change dynamically at run time
    can’t lovingly craft all spatial indices in off-line process
  Complex shaders & texturing are mandatory
    a big weakness of CPU software tracers to date
  Need to provide a complete solution
    construction, shading, application integration, hardware

Summary
  CUDA makes GPU ray tracing fast and practical
    A powerful tool in the interactive graphics toolbox
  Hybrid algorithms are the future
    Leverage the power of rasterization with the flexibility of CUDA
    Together they provide tremendous scope for innovation

Thank You!
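
Supplementary sketch: KD-Restart
  The KD-Restart slide describes traversal with no stack: descend to the first leaf, and if nothing is hit there, advance (tmin,tmax) and restart from the root. The following is a minimal CPU-side C++ sketch of that idea, not the talk's CUDA implementation. The Node layout, the use of spheres in place of triangles, and the names traceKdRestart and hitSphere are all assumptions made for illustration.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

using Vec3 = std::array<float, 3>;

struct Ray    { Vec3 o, d; };        // d is assumed to be unit length
struct Sphere { Vec3 c; float r; };  // stand-in primitive (assumption; the talk uses triangles)

// Hypothetical flat k-d tree node; axis == -1 marks a leaf.
struct Node {
    int axis;                // split axis 0/1/2, or -1 for a leaf
    float split;             // split plane position (inner nodes only)
    int left, right;         // child indices (inner nodes only)
    std::vector<int> prims;  // sphere indices (leaves only)
};

// Distance along the ray to the nearest sphere hit, or +infinity on a miss.
inline float hitSphere(const Ray& ray, const Sphere& s) {
    const float inf = std::numeric_limits<float>::infinity();
    float b = 0.0f, c = -s.r * s.r;
    for (int i = 0; i < 3; ++i) {
        float oc = ray.o[i] - s.c[i];
        b += oc * ray.d[i];
        c += oc * oc;
    }
    float disc = b * b - c;            // discriminant of t^2 + 2bt + c = 0
    if (disc < 0.0f) return inf;
    float root = std::sqrt(disc);
    float t = -b - root;               // nearer root first
    if (t < 0.0f) t = -b + root;       // origin inside the sphere
    return t >= 0.0f ? t : inf;
}

// KD-Restart: returns the index of the closest sphere hit, or -1 on a miss.
// No traversal stack is kept; only (tmin, tmax) survives between restarts.
inline int traceKdRestart(const std::vector<Node>& nodes,
                          const std::vector<Sphere>& spheres,
                          const Ray& ray, float sceneMin, float sceneMax,
                          float& tHit) {
    assert(!nodes.empty());
    float tmax = sceneMin;
    while (tmax < sceneMax) {
        float tmin = tmax;   // advance the interval past the leaf just visited
        tmax = sceneMax;
        int n = 0;           // restart from the root
        while (nodes[n].axis >= 0) {
            const Node& nd = nodes[n];
            float o = ray.o[nd.axis], d = ray.d[nd.axis];
            bool leftFirst = o < nd.split || (o == nd.split && d <= 0.0f);
            int first  = leftFirst ? nd.left  : nd.right;
            int second = leftFirst ? nd.right : nd.left;
            if (d == 0.0f) { n = first; continue; }   // ray parallel to the plane
            float ts = (nd.split - o) / d;            // distance to the split plane
            if (ts > tmax || ts <= 0.0f) n = first;   // interval lies on the near side
            else if (ts <= tmin)         n = second;  // interval lies on the far side
            else { n = first; tmax = ts; }            // straddles: near side now, far side after restart
        }
        int best = -1;
        float bestT = std::numeric_limits<float>::infinity();
        for (int p : nodes[n].prims) {
            float t = hitSphere(ray, spheres[p]);
            if (t <= tmax && t < bestT) { bestT = t; best = p; }  // accept hits inside this leaf's interval
        }
        if (best >= 0) { tHit = bestT; return best; }
    }
    return -1;
}
```

  Each restart repeats the descent from the root, so the sketch trades extra node visits for zero stack state per ray, which is the trade-off behind "minimize per-thread state". A short stack, as described on the following slides, would avoid some of those restarts.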