5. Graphics Hardware - Interactive Media Systems, TU Wien

Hannes
Kaufmann
3D Graphics Hardware
Hannes Kaufmann
Interactive Media Systems Group (IMS)
Institute of Software Technology and Interactive Systems
Thanks to Dieter Schmalstieg and Michael Wimmer for providing slides/images/diagrams
Motivation
• VR/AR environment = Hardware setup + VR Software Framework + Application
• Detailed knowledge is needed about
– Hardware: Input Devices & Tracking, Output Devices, 3D Graphics
– Software: Standards, Toolkits, VR frameworks
– Human Factors: Usability, Evaluations, Psychological Factors (Perception, …)
3D Graphics Hardware: Development
• Enormous performance growth of consumer cards over the past ~20 years
• Development driven by game industry
• PC graphics surpassed workstations (~2001)
Consumer Graphics – History
• Up to 1995
– 2D only (S3, Cirrus Logic, Tseng Labs, Trident)
• 1996 3dfx Voodoo (first real 3D card); introduction of DX3
• 1997 Triangle rendering (… DX5)
• 1998 Triangle setup (… DX6)
• 1999 Multi-pipe, multitexture (… DX7)
• 2000 Transform and lighting (… DX8)
• 2001 Programmable shaders
– PCs surpass workstations
• 2002 Full floating point
• 2004 Full looping and conditionals
• 2006/07 Geometry/primitive shaders (DX10, OpenGL 2.1)
• 2007/08 CUDA (Nvidia): general-purpose computing on GPUs
• 2009 DX11: multithreaded rendering, compute shaders, tessellation
Moore's Law
• Gordon Moore, Intel co-founder, 1965
• Exponential growth in the number of transistors
• Doubles every 18 months
– Yearly growth: factor 1.6
– CPU clock rates have grown slowly since 2002 (2.8 GHz available since December 2002)
– Instead, the number of cores increases: currently up to 18-core CPUs
• Moore's Law is coming to an end
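The yearly factor of about 1.6 follows directly from the 18-month doubling period: a quantity that doubles every T months grows by 2^(12/T) per year. The same formula, applied to the 9–12-month doubling period quoted for GPUs, also shows where the "Moore's law squared (^1.5–2.0)" exponent comes from:

```latex
\begin{aligned}
f_{\text{year}}(T) &= 2^{12/T} \\
f_{\text{year}}(18) &= 2^{2/3} \approx 1.59 \\
f_{\text{year}}(12) &= 2 = \left(2^{2/3}\right)^{1.5}, \qquad
f_{\text{year}}(9) = 2^{4/3} \approx 2.52 = \left(2^{2/3}\right)^{2}
\end{aligned}
```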
Multi-Core Graphics
NVIDIA Quadro P6000
3840 cores, 1560 MHz
Single/double precision
~12000 GFLOPS (single precision)
• Faster than the fastest supercomputer in 2001
• Performance growth almost Moore's law squared (exponent 1.5–2.0)
• In the past, performance doubled every 9–12 months; this is no longer the case, but development is still fast
• Used in HPC parallel computers (CUDA, Tesla)
– Molecular dynamics, climate simulations, fluid dynamics
– …everything that can be computed in a highly parallel way
• Speedup of 10–100x compared to standard processors
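The ~12000 GFLOPS figure can be reproduced from the core count and clock rate. A minimal sketch, assuming (as is usual for such peak numbers) that every core completes one fused multiply-add per clock and that an FMA counts as 2 floating-point operations; the function name `peak_gflops` is hypothetical:

```c
/* Theoretical single-precision peak in GFLOPS.
   Assumption: each core retires one fused multiply-add (FMA) per clock,
   and one FMA is counted as 2 floating-point operations. */
double peak_gflops(int cores, double clock_mhz)
{
    const double flops_per_core_per_cycle = 2.0;  /* multiply + add */
    /* cores * ops/cycle * MHz gives MFLOPS; divide by 1000 for GFLOPS */
    return cores * flops_per_core_per_cycle * clock_mhz / 1000.0;
}
```

For the Quadro P6000, `peak_gflops(3840, 1560.0)` gives 11980.8, i.e. the ~12000 GFLOPS above. This is an upper bound that real workloads rarely reach, since memory bandwidth and non-FMA instructions usually limit throughput.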
And it goes on and on…
• Performance increase expected to continue within the next few years
– Smaller chip production processes possible (currently 12 nm for graphics cards, 10 nm for CPUs)
– Multiple graphics cards or GPUs in a PC
– Multi-core GPUs
• General-purpose computing on GPUs
– OpenCL
– CUDA / www.gpgpu.org
Mobile ARM Graphics Chips for Smartphones/Tablets
Example: Tegra X1 (2015)
• Processor: 4 ARM Cortex-A57 + 4 A53 cores, 20 nm
• NVIDIA Maxwell 256-core GPU @ 1 GHz supporting GPU computing: CUDA, DirectX 12, OpenGL 4.5, OpenGL ES 3.1
• Video output 4K x 2K @ 60 Hz, 1080p @ 120 Hz
• 4K H.265 video decode
What are the benefits in VR/AR?
Which features are needed?
3D Card: High-End Model
NVIDIA Quadro P6000 (> € 5000)
• 24 GB GDDR5X RAM
• Based on Pascal architecture (GeForce GTX 1080)
• 3840 CUDA cores
• 480 GB/s bandwidth
• Optimized OpenGL drivers (compared to consumer cards)
• 16K x 16K texture resolution
• DX12, Shader Model 5, OpenGL 4.5, Vulkan 1.0
Some Relevant Features (for VR)
• Memory size: 24 GB
• 4 DisplayPorts (4096x2160 @ 120 Hz or 4 x 5K @ 60 Hz), 1 DVI-I DL (1600p)
• OpenGL quad-buffered stereo (optional 3-pin sync connector); 3D Vision Pro
• Scalable Link Interface (SLI™) technology
• Nvidia Mosaic: 2–8 displays (4K resolution)
• Fast 3D texture transfer; HW 3D window clipping
• Quadro Sync (optional) with framelock and genlock
• HDR technology, 30-bit color, SDI output option
• Quality: 64x full-scene antialiasing (FSAA), …
Explanations & Back to the Basics
3D Graphics Basics
The Graphics Pipeline(s)
What for?
Understanding the rendering pipeline is the key
to real-time rendering!
• Insights into how things work
– Understanding algorithms
• Insights into how fast things work
– Performance
"Historical" Fixed Graphics Pipeline
Purpose: Convert scene to pixel data
Fixed processing of scene
Geometry Stage:
• Input: Primitives
• Output: 2D window coordinates
Rasterization Stage:
• Input: 2D window coordinates
• Output: Pixels
• Fragment: "pixel", but with additional info (alpha, depth, stencil, …)
Nowadays every part of the pipeline is hardware accelerated!
3D Graphics Basics
The Stages
(1) Application Stage:
3D Graphics Programming
3D Application Programmer's Interfaces (APIs)
• Access to hardware
• Standards:
– OpenGL, Direct3D
• Language: C, C++ (mostly)
• Higher-level APIs based on OpenGL, Direct3D
– Game engines
– Scene graph APIs:
• Open Inventor, Java3D
• OpenSceneGraph, Performer, …
OpenGL – Hello World
#include <GL/glut.h>

/* One-time OpenGL state setup */
void init(void) {
    glClearColor(0.0, 0.0, 0.0, 0.0);          /* black background */
    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    glOrtho(0.0, 1.0, 0.0, 1.0, -1.0, 1.0);    /* 2D view of the unit square */
}

/* Called by GLUT whenever the window has to be redrawn */
void display(void) {
    glClear(GL_COLOR_BUFFER_BIT);
    /* draw white polygon (rectangle) with corners at (0.25, 0.25, 0.0)
       and (0.75, 0.75, 0.0) */
    glColor3f(1.0, 1.0, 1.0);
    glBegin(GL_POLYGON);
        glVertex3f(0.25, 0.25, 0.0);
        glVertex3f(0.75, 0.25, 0.0);
        glVertex3f(0.75, 0.75, 0.0);
        glVertex3f(0.25, 0.75, 0.0);
    glEnd();
    glFlush();   /* single-buffered window: force queued commands to execute */
}

int main(int argc, char** argv) {
    glutInit(&argc, argv);
    glutInitDisplayMode(GLUT_SINGLE | GLUT_RGB);
    glutInitWindowSize(250, 250);
    glutInitWindowPosition(100, 100);
    glutCreateWindow("hello");
    init();
    glutDisplayFunc(display);
    glutMainLoop();
    return 0;
}
OpenGL Primitives
(1) Application Stage
• Generate database (scene description)
– Usually only once
– Load from disk
– Build acceleration / optimization structures
• Lots of optimizations possible: build hierarchy, levels of detail, culling techniques, impostors, …
• Simulation (animation, AI, physics)
• Input event handlers
• Modify data structures
• Database traversal
• Primitive generation
• Shaders (vertex, geometry, fragment)
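Culling is one of the application-stage optimizations listed above: objects that cannot be visible are never sent down the pipeline. A minimal sketch of view-frustum culling with bounding spheres, assuming the six frustum planes are already available with unit-length normals pointing into the frustum; the types and the function name are hypothetical:

```c
/* Plane a*x + b*y + c*z + d = 0; (a, b, c) is a unit normal pointing
   towards the inside of the view frustum. */
typedef struct { double a, b, c, d; } Plane;

/* Bounding sphere of an object */
typedef struct { double x, y, z, radius; } Sphere;

/* Returns 1 if the sphere lies completely outside at least one of the six
   frustum planes (the object can be culled), 0 otherwise. The test is
   conservative: some invisible spheres near frustum corners are kept. */
int sphere_outside_frustum(const Plane planes[6], Sphere s)
{
    for (int i = 0; i < 6; ++i) {
        /* signed distance of the sphere centre to plane i */
        double dist = planes[i].a * s.x + planes[i].b * s.y
                    + planes[i].c * s.z + planes[i].d;
        if (dist < -s.radius)
            return 1;   /* entirely on the outer side of this plane */
    }
    return 0;
}
```

In a scene graph, the same test is typically applied to the bounding volume of a whole subtree first, so that an entire hierarchy branch can be skipped with a single check.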
Graphics Driver
• Command interpretation/translation
– Host commands → GPU commands
• Handle data transfer
• Memory management
• Emulation of missing features (e.g. full OpenGL 4.5 support)
(2) Geometry Stage
Command
• Command buffering
• Command interpretation
• Unpack and perform format conversion
"Input Assembler"
Vertex Processing: Old Geometry Stage (Vertex Shader)
[Figure: eye origin, lighting and shading, unit cube]
Viewing Frustum
Vertex Processing
Vertex Processing
• Fixed-function pipeline:
– User has to provide matrices, the rest happens automatically
• Programmable pipeline:
– User has to provide matrices/other data to the shader
– Shader code transforms the vertex explicitly
• We can do whatever we want with the vertex!
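What the fixed-function pipeline does "automatically", and what a vertex shader has to do explicitly, can be written out on the CPU. A minimal sketch of the geometry-stage path from object coordinates to window coordinates: matrix multiply to clip coordinates, perspective divide to normalized device coordinates, then the viewport mapping. It assumes OpenGL conventions (column vectors, default depth range [0, 1]); `ortho` builds the matrix documented for glOrtho, while `Mat4` and `to_window` are hypothetical names:

```c
/* 4x4 matrix, m[row][col], used with column vectors: clip = M * (x, y, z, 1) */
typedef struct { double m[4][4]; } Mat4;

/* Matrix that glOrtho(l, r, b, t, n, f) multiplies onto the current matrix */
Mat4 ortho(double l, double r, double b, double t, double n, double f)
{
    Mat4 o = {{{0}}};
    o.m[0][0] = 2.0 / (r - l);   o.m[0][3] = -(r + l) / (r - l);
    o.m[1][1] = 2.0 / (t - b);   o.m[1][3] = -(t + b) / (t - b);
    o.m[2][2] = -2.0 / (f - n);  o.m[2][3] = -(f + n) / (f - n);
    o.m[3][3] = 1.0;
    return o;
}

/* Object coordinates -> clip coordinates -> NDC (divide by w) -> window
   coordinates for a viewport at (vx, vy) of size w x h, depth range [0, 1]. */
void to_window(const Mat4 *mvp, const double obj[3],
               int vx, int vy, int w, int h, double win[3])
{
    double clip[4];
    for (int i = 0; i < 4; ++i)      /* object-space w is 1 */
        clip[i] = mvp->m[i][0] * obj[0] + mvp->m[i][1] * obj[1]
                + mvp->m[i][2] * obj[2] + mvp->m[i][3];
    double ndc[3] = { clip[0] / clip[3], clip[1] / clip[3], clip[2] / clip[3] };
    win[0] = vx + (ndc[0] + 1.0) * 0.5 * w;
    win[1] = vy + (ndc[1] + 1.0) * 0.5 * h;
    win[2] = (ndc[2] + 1.0) * 0.5;   /* window depth in [0, 1] */
}
```

With the modelview matrix left at identity, as in the Hello World program, `to_window(&p, v, 0, 0, 250, 250, out)` with `p = ortho(0, 1, 0, 1, -1, 1)` maps the corner (0.25, 0.25, 0) to window position (62.5, 62.5) and depth 0.5, assuming the viewport covers the whole 250 x 250 window.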
(3) Rasterization Stage
• Input: 2D geometric primitives (points, lines, polygons, bitmaps)
• Primitives needed!
• 1st step output: fragments (pixel coordinates + color + depth + texture coordinates)
• Polygons are decomposed (various methods)
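One common way to decompose a triangle into fragments is to test pixel centres against the triangle's three edge functions. A minimal sketch of this edge-function method, chosen here as one of the "various methods"; it counts pixels whose centre lies on an edge as covered, whereas real hardware uses a tie-breaking rule (e.g. the "top-left" rule) so that shared edges are not drawn twice. `edge` and `rasterize` are hypothetical names:

```c
/* Edge function: positive if (px, py) lies to the left of edge a -> b,
   zero on the edge, negative to the right. */
static double edge(const double a[2], const double b[2], double px, double py)
{
    return (b[0] - a[0]) * (py - a[1]) - (b[1] - a[1]) * (px - a[0]);
}

/* Generates fragments for triangle (v0, v1, v2) on a width x height pixel
   grid: mask[y * width + x] is set to 1 for covered pixels, 0 otherwise.
   Returns the number of fragments. Both vertex windings are accepted. */
int rasterize(const double v0[2], const double v1[2], const double v2[2],
              int width, int height, unsigned char *mask)
{
    double area = edge(v0, v1, v2[0], v2[1]);   /* twice the signed area */
    if (area == 0.0)
        return 0;                               /* degenerate triangle */
    double s = area > 0.0 ? 1.0 : -1.0;         /* normalise the winding */
    int count = 0;
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x) {
            double px = x + 0.5, py = y + 0.5;  /* sample at pixel centre */
            double w0 = s * edge(v1, v2, px, py);
            double w1 = s * edge(v2, v0, px, py);
            double w2 = s * edge(v0, v1, px, py);
            int inside = w0 >= 0.0 && w1 >= 0.0 && w2 >= 0.0;
            mask[y * width + x] = (unsigned char)inside;
            count += inside;
        }
    return count;
}
```

The values w0, w1, w2 divided by the area are the barycentric coordinates of the sample, which is how depth, color and texture coordinates can be interpolated for each fragment.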
(3) Rasterization Stage
• Per-Fragment Operations
• Pixel Ownership Test (Window visible?)
Buffers:
• Frame Buffer (Color + Alpha channel)
• Depth Buffer Test (z-Buffer)
• Stencil Buffer
• Accumulation Buffer
• P-Buffer (aux. color buffer -> direct rendering)
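The depth buffer test listed above decides per fragment whether it is hidden. A minimal software sketch, assuming depths in [0, 1] with smaller values closer to the viewer and a "less than" comparison, comparable to clearing with glClearDepth(1.0) and using glDepthFunc(GL_LESS); the `DepthBuffer` type and both function names are hypothetical:

```c
/* Tiny software depth buffer; smaller depth = closer to the viewer. */
typedef struct {
    int width, height;
    float *depth;            /* width * height entries, caller-allocated */
} DepthBuffer;

/* Reset every entry to the far plane (1.0). */
void depth_clear(DepthBuffer *db)
{
    for (int i = 0; i < db->width * db->height; ++i)
        db->depth[i] = 1.0f;
}

/* "Less than" depth test: the fragment survives only if it is closer than
   the stored value; in that case the stored depth is replaced. */
int depth_test_and_write(DepthBuffer *db, int x, int y, float z)
{
    float *stored = &db->depth[y * db->width + x];
    if (z < *stored) {
        *stored = z;
        return 1;            /* fragment passes; its color may be written */
    }
    return 0;                /* hidden by a fragment drawn earlier */
}
```

Because the test is done per fragment, triangles can be submitted in any order and still be resolved correctly (for opaque geometry).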
Rasterizer/Display Stage
• Framebuffer pixel format: RGBA vs. indexed (colormap)
• Bits per pixel: 32, 24 (true color); 16, 15 (high color); 8
• Double buffering, triple buffering
• For stereo: quad buffering
• Per-window video mode (e.g. stereo, mono)
• Display: frame buffer → screen
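Pixel format and buffering mode together determine how much video memory the color buffers of a window take. A minimal sketch, assuming packed storage rounded up to whole bytes per pixel and counting only color buffers (no depth, stencil or accumulation buffers); the function name is hypothetical:

```c
#include <stdint.h>

/* Memory for the color buffers of one window:
   width * height * bytes per pixel * number of color buffers, where
   num_buffers is 1 (single), 2 (double), 3 (triple buffering) or
   4 (quad-buffered stereo: front and back buffer for each eye).
   Bits per pixel are rounded up to whole bytes (15-bit high color -> 2). */
uint64_t color_buffer_bytes(uint32_t width, uint32_t height,
                            uint32_t bits_per_pixel, uint32_t num_buffers)
{
    uint32_t bytes_per_pixel = (bits_per_pixel + 7u) / 8u;
    return (uint64_t)width * height * bytes_per_pixel * num_buffers;
}
```

For example, a 1920 x 1080 window with 32-bit RGBA needs 16,588,800 bytes when double-buffered and twice that, 33,177,600 bytes, with quad-buffered stereo.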
Modern Programmable Graphics Pipeline
• Vertex shader integrated in "old" geometry stage
– Allows per-vertex transformations, e.g. warping
• Fragment/pixel shader integrated in "old" rasterization stage
– Fragment: "pixel" with additional information (alpha, depth, stencil, …)
– Allows e.g. per-pixel lighting, …
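Per-pixel lighting means the lighting equation is evaluated once per fragment rather than once per vertex. A minimal sketch of what such a fragment shader computes, written as a C function so it can be run on the CPU; it uses only a Lambertian diffuse term and assumes the interpolated normal and the light direction are already unit vectors (a real shader would renormalize the interpolated normal). `Vec3` and `shade_fragment` are hypothetical names:

```c
typedef struct { float x, y, z; } Vec3;

static float dot3(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

/* Lambertian diffuse lighting for one fragment:
   color = diffuse_color * max(0, n . l)
   n: interpolated surface normal (unit length assumed)
   l: direction from the surface point to the light (unit length assumed) */
Vec3 shade_fragment(Vec3 n, Vec3 l, Vec3 diffuse_color)
{
    float ndotl = dot3(n, l);
    if (ndotl < 0.0f)
        ndotl = 0.0f;            /* light is behind the surface */
    Vec3 c = { diffuse_color.x * ndotl,
               diffuse_color.y * ndotl,
               diffuse_color.z * ndotl };
    return c;
}
```

On the GPU this function would run in parallel for every fragment of every triangle; with per-vertex lighting it would run only at the vertices and the resulting colors would be interpolated, which loses highlights inside large triangles.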
Vertex and Fragment Shaders
• Various shading languages
– ARB: GPU assembly language (optimized)
– GLSL (OpenGL Shading Language, in OpenGL 2.0)
– HLSL (High Level Shading Language, Microsoft)
– Cg (Nvidia)
DirectX10 / OpenGL 2.0 Evolution
Current OpenGL 4.x / DirectX 11 Architecture
Tessellation
• If there are only triangles, nothing needs to be done; otherwise:
– Evaluation of polynomials for curved surfaces
– Create vertices (tessellation)
• DirectX 11 specifies this in hardware!
– 3 new pipeline stages (hull shader, tessellator, domain shader)!
– Still not trivial (special algorithms required)
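"Evaluation of polynomials" and "create vertices" can be illustrated with the simplest curved primitive, a cubic Bézier curve. A minimal sketch, assuming uniform parameter steps; the DirectX 11 tessellator instead subdivides patches according to tessellation factors, so this shows the principle, not the hardware algorithm. `Point2`, `bezier3` and `tessellate_bezier3` are hypothetical names:

```c
typedef struct { double x, y; } Point2;

/* Evaluates a cubic Bezier curve with control points p[0..3] at t in [0, 1]
   using the Bernstein polynomials (1-t)^3, 3(1-t)^2 t, 3(1-t) t^2, t^3. */
Point2 bezier3(const Point2 p[4], double t)
{
    double u = 1.0 - t;
    double b0 = u * u * u, b1 = 3.0 * u * u * t;
    double b2 = 3.0 * u * t * t, b3 = t * t * t;
    Point2 r = { b0 * p[0].x + b1 * p[1].x + b2 * p[2].x + b3 * p[3].x,
                 b0 * p[0].y + b1 * p[1].y + b2 * p[2].y + b3 * p[3].y };
    return r;
}

/* Creates segments + 1 vertices along the curve at uniform parameter steps;
   consecutive vertices form the line segments sent to the rasterizer.
   out must have room for segments + 1 points. */
void tessellate_bezier3(const Point2 p[4], int segments, Point2 *out)
{
    for (int i = 0; i <= segments; ++i)
        out[i] = bezier3(p, (double)i / segments);
}
```

With control points (0,0), (0,1), (1,1), (1,0) and 4 segments, the generated vertices start at (0,0), end at (1,0), and the middle one lies at (0.5, 0.75). Raising the segment count trades more vertices for a smoother curve, which is exactly the knob hardware tessellation exposes per patch.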
DirectX 11 Tessellation
Tessellation Example & Displacement Map
Mobile Graphics: OpenGL ES
OpenGL ES 2.0
2014: OpenGL ES 3.x
• 32-bit floating-point support
• Texture compression, 3D textures
• Multiple apps can share 3D hardware
Current State
• Heterogeneous architectures
– CPU and GPU on one chip (especially mobile chips)
– GPU is treated as a parallel stream processing unit
• High-bandwidth interconnect of CPU and GPU
– CPU and streaming units working together
• Whole pipeline is fully programmable (GPU computing)
• Goodbye to the one-way graphics pipeline!
Real-time Ray Tracing
Long term future
• We have…
– Very high fill rates and polygon rates
– Lots of textures
– Almost full programmability
– Few limits (program lengths, memory bandwidth)
• We want (and will get)…
– Flexible geometry specification
– Full, easy programmability
– Real-time ray tracing
VR/AR and the Need for Extreme Graphics Power: Examples
• Princeton Display Wall: 3x8 projectors, 24-PC cluster
• Mechanical visualization
• CAVE, SGI Onyx (8 CPUs, 6 outputs)
• HMD setups for larger groups
Parallel Graphics Hardware
Overcome bottleneck by parallel computation
Types of parallel graphics:
1. On-chip / on a graphics board (standard)
2. Multiple boards (formerly: graphics supercomputer)
– Multiple boards with multiple GPUs (1+2)
3. PC cluster:
– Offline rendering: standard network, distributed environment
– Real-time rendering: PC cluster with special hardware
Multiple Graphics Pipelines
• Pipelines fully in HW
• Multiple independent pipelines can be parallelized
• NVIDIA / AMD
– CUDA cores / streaming processors
• Modern GPUs process more than 3800 pipelines in parallel (unified architecture), e.g. Nvidia Quadro P6000
Parallel On-Board
Examples:
• Nvidia GeForce GTX TITAN Z (2014)
– 2 x 2880 cores
• AMD Radeon R9 295X2
– 2 x 2816 cores
Multiple Graphics Boards
Parallel graphics rendering:
• Graphics "supercomputer"
• PC with SLI or CrossFire
In contrast, multiple display support, (not) synchronized:
• PC with multiple unconnected cards
– Nvidia Mosaic
Graphics Supercomputer
SGI Onyx with Infinite Reality 3
SGI Onyx 3000 & Infinite Reality 4
G-Brick:
• 4 RasterManager boards
• 1.3 Gpixel/s per pipeline
– (8 subsamples, full-scene AA)
• 1 GB texture memory
• 10 GB framebuffer
• 192 GB/s bandwidth
• Combination of up to 16 IR4
Nvidia Multi-GPU solutions
• Connected via PCIe to PC
• 2–8 GPUs
• 12 GB frame buffer per GPU
• 2 to 8 dual-link digital display connectors
• Genlock/frame lock
Supercomputer – Application Areas
• Theme parks (DisneyQuest: CyberSpace Mountain)
• Flight simulators
• Military applications
• CAVEs / large setups
Barco RP-360
Flight Simulator (Video)
Graphics Supercomputer
• Multiple CPUs
• Multiple geometry engines
• Multiple rasterization engines
• Genlocking
• Multiple pipes (= graphics cards)
• Multiple channels (= display outputs)
• Highly configurable
• Now used: standard Nvidia/ATI graphics chips
• On PC: Scalable Link Interface (Nvidia) or CrossFireX (ATI) for PCI Express
Parallel Graphics Hardware
(A) Computing the same (high-resolution) image
(B) Computing multiple images: multiple outputs
Basic Problems of Parallel Rendering
Vertex and Pixel Load Balancing:
• Problem with parallel rendering
– Load balancing of vertices → 3D (object space) problem
– Load balancing of pixels (rasterizers) → 2D (screen space) problem
Parallel Rendering as Sorting
• Parallel geometry stage
– Cut 3D model into pieces with an equal number of vertices
– Assign one piece to one T&L unit
• Parallel rasterization
– Cut destination image into tiles
– Assign (the triangles contained in) one tile to one rasterizer
→ Need to SORT transformed 2D triangles
• Shared common memory
[Figure: transform and lighting units T&L1–T&L4 feeding rasterizers R1–R4]
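The sorting step between the parallel geometry stage and the parallel rasterizers (often called sort-middle) can be sketched as binning: each transformed 2D triangle is sent to every screen tile it may touch. A minimal sketch, assuming one rasterizer per tile and a conservative bounding-box test, so a triangle can occasionally be sent to a tile it does not actually cover; the function name `bin_triangle` is hypothetical:

```c
/* Screen of tiles_x * tiles_y square tiles of tile_size pixels; tile i is
   owned by rasterizer i. Writes the indices of all tiles overlapped by the
   triangle's bounding box into `tiles` (room for tiles_x * tiles_y entries)
   and returns their number. */
int bin_triangle(double v[3][2], int tile_size, int tiles_x, int tiles_y,
                 int *tiles)
{
    double minx = v[0][0], maxx = v[0][0];
    double miny = v[0][1], maxy = v[0][1];
    for (int i = 1; i < 3; ++i) {
        if (v[i][0] < minx) minx = v[i][0];
        if (v[i][0] > maxx) maxx = v[i][0];
        if (v[i][1] < miny) miny = v[i][1];
        if (v[i][1] > maxy) maxy = v[i][1];
    }
    /* completely off-screen: no rasterizer needs this triangle */
    if (maxx < 0.0 || maxy < 0.0 ||
        minx >= (double)tiles_x * tile_size ||
        miny >= (double)tiles_y * tile_size)
        return 0;
    int tx0 = minx < 0.0 ? 0 : (int)(minx / tile_size);
    int ty0 = miny < 0.0 ? 0 : (int)(miny / tile_size);
    int tx1 = (int)(maxx / tile_size);
    int ty1 = (int)(maxy / tile_size);
    if (tx1 >= tiles_x) tx1 = tiles_x - 1;    /* clamp to the screen */
    if (ty1 >= tiles_y) ty1 = tiles_y - 1;
    int n = 0;
    for (int ty = ty0; ty <= ty1; ++ty)
        for (int tx = tx0; tx <= tx1; ++tx)
            tiles[n++] = ty * tiles_x + tx;   /* tile index = rasterizer index */
    return n;
}
```

This also shows the load-balancing problem from the previous slide: if most triangles fall into one tile, one rasterizer does most of the work while the others idle, which is why small or interleaved tiles are usually preferred.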
SLI™ (Nvidia)
• Scalable Link Interface
3 modes:
• Split Frame Rendering (SFR), "scissors": splits each frame and sends half the load to each of the graphics cards
• Alternate Frame Rendering (AFR): frame 1 → card 1, frame 2 → card 2, alternating
• VR SLI: right/left image computed on card 1/card 2 in parallel
• PCIe cards are connected by a bridge
• Optimal performance increase: 1.8x at most
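The two classic multi-GPU modes differ in what is distributed: whole frames (AFR) or parts of one frame (SFR). A minimal sketch of both assignment policies, with GPUs numbered from 0. The SFR rebalancing rule (move the split line a fixed step towards the GPU that was slower in the last frame) is a simple heuristic assumed here for illustration, not the actual SLI driver algorithm; both function names are hypothetical:

```c
/* Alternate Frame Rendering: frames are distributed round-robin. */
int afr_gpu_for_frame(unsigned long frame, int num_gpus)
{
    return (int)(frame % (unsigned long)num_gpus);
}

/* Split Frame Rendering with two GPUs: GPU 0 renders scanlines [0, split),
   GPU 1 renders [split, height). Illustrative heuristic: shift the split by
   `step` scanlines away from the GPU that needed more time last frame. */
int sfr_rebalance_split(int split, int height, double ms_gpu0, double ms_gpu1,
                        int step)
{
    if (ms_gpu0 > ms_gpu1)
        split -= step;                 /* GPU 0 overloaded: shrink its part */
    else if (ms_gpu1 > ms_gpu0)
        split += step;                 /* GPU 1 overloaded: grow GPU 0's part */
    if (split < 1) split = 1;          /* each GPU keeps at least one line */
    if (split > height - 1) split = height - 1;
    return split;
}
```

AFR needs no load balancing within a frame but adds a frame of latency, which matters for VR; SFR keeps latency low but must redistribute the work every frame. Neither mode reaches a full 2x, since geometry work is duplicated (SFR) or frames depend on each other (AFR).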
CrossFireX (ATI)
3 Modes:
• Supertiling
• Scissors
• Alternate Frame Rendering
Additional AA Mode
SLI / CrossFire
• Mainboards with SLI or CrossFire support needed
• Connection via separate bridge (PCIe communication)
• CrossFire SuperTiling is efficient
• CrossFireX is more flexible (supports multiple displays and connection of different ATI cards)
• 2–4 cards can be connected
• VR SLI mode: left/right images rendered in parallel
Nvidia Mosaic
multiple display configurations with Quadro cards
Dual Host Interface Card required to run dual systems.
Parallel Cluster Rendering (1)
• PC cluster
– Off-the-shelf hardware
– Network (LAN)
– Cheap
– Scalable
• Distributed software system
Parallel Cluster Rendering (2)
• Power of cluster ≥ power of supercomputer
• Price of cluster << price of supercomputer
• BUT: problems of clusters
– How to make cluster PCs work together
– On a single image (or consistent set of images)
→ Parallel execution of rendering!
→ Cluster synchronisation (genlocking)!
Cluster Synchronisation
Q: How to synchronize multiple displays?
(1) Simple: PC + multiple graphics outputs
(2) Not so simple: multiple workstations
Example: CAVE
"CAVE Automatic Virtual Environment" ™
• Has 3 to 6 large screens
• Puts user in a room for visual immersion
• Usually driven by a single powerful graphics engine or a group of them; nowadays usually a PC cluster
Example: CAVE & Shuttering
Shutter Glasses
Hardware Synchronisation
Synchronizing multiple displays/workstations
Framelock:
Synchronizing frame buffer swap
• Begins redrawing at the same time
Genlock:
Exact synchronization of vertical sync (electron beam of CRT)
• Refreshes each pixel synchronously
Example: Blue-C
Physics Effects
• Calculation on GPU
• Rigid bodies, joints
• Cloth, particles, fire, fluids
• Puts higher rendering load on graphics card
– SLI recommended
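Particle effects suit GPU calculation because every particle can be updated independently. A minimal sketch of one simulation step on the CPU, assuming gravity as the only force and semi-implicit Euler integration (velocity first, then position with the new velocity); on a GPU, each loop iteration would become one thread. `Particle` and `particles_step` are hypothetical names:

```c
typedef struct { float px, py, pz, vx, vy, vz; } Particle;

/* Advances n particles by dt seconds under gravity (along -y).
   Semi-implicit Euler: update velocity, then position from the new velocity.
   Iterations do not depend on each other, so they can run in parallel. */
void particles_step(Particle *p, int n, float dt)
{
    const float gravity = -9.81f;   /* m/s^2 */
    for (int i = 0; i < n; ++i) {
        p[i].vy += gravity * dt;
        p[i].px += p[i].vx * dt;
        p[i].py += p[i].vy * dt;
        p[i].pz += p[i].vz * dt;
    }
}
```

A particle starting at height 10 with horizontal velocity 1 is, after one step of 0.1 s, at about (0.1, 9.902) with vertical velocity about -0.981. Rigid bodies, cloth and fluids additionally need interactions between elements, which makes them harder to parallelize than independent particles.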
Physics in VR
Examples: GRIMAGE project, Incredible Machine, Microsoft Holodesk
General Purpose Computing
• Nvidia Tesla V100
– "High Performance Computing" / deep learning
– No graphics card! No graphics output!
– Programmed using CUDA
– Additional GPU
– 5120 CUDA cores
– 640 Tensor Cores
– 16 GB HBM2 RAM
– CUDA C/C++/Fortran, OpenCL, DirectCompute toolkits, …
• Alternative: Intel Xeon Phi
– x86 cores (72 Atom cores)
Nvidia GRID
• GPU virtualization: sharing the GPU
• Low-latency remote display
– "Real-time" H.264 encoding
• GRID K2:
– 2 Kepler GPUs, 3072 cores
– 8 GB RAM