The new Xbox 360 250GB CPU GPU SoC

Rune Jensen, Microsoft
Bob Drehmel, IBM
Hot Chips 22
8/23/2010
Xbox 360 250GB System
CPU GPU SoC Module
• CPU GPU Die
  • High Performance CPU & GPU
  • GDDR3 Memory Interface
  • Video Output
  • PCIe
• Embedded DRAM Die
Custom South Bridge
• IO Connectivity
• System Management
Custom Video Display Controller
Optical Disk Drive
Flash and IO Connectivity
250GB HDD
Wireless 802.11N Integration
CPU, GPU Process Migrations
• CPU: 90nm (2005), 65nm (2007)
• GPU: 90nm (2005), 65nm (2008)
• CPU GPU SoC: 45nm (2010)
Motivation for Integrated CPU GPU SoC
Cost and Power Savings
• Front Side Bus Removal
• Single Package
• IBM 45nm SOI Technology
Simplified Console Design
• Motherboard Footprint
• Power Delivery
• Thermal Design
• Single Heatsink + Fan
[Figure: 35x35mm package, 1156 balls, with the integrated CPU GPU die and the embedded DRAM die]
CPU GPU SoC: Features & Block Diagram
CPU
• Three 3.2 GHz PowerPC® cores
• Shared 1MB L2 cache
• Per Core:
  • Dual Thread Execution
  • 32K L1 I-cache, 32K L1 D-cache
  • 2-issue per cycle
  • Branch, Integer, Load/Store Units
  • VMX128 Units enhanced for games
GPU
• 48 parallel unified shaders
• 24 billion shader instructions per second
• 4 billion pixels/sec pixel fill rate
• 500 million triangles/sec geometry rate
• High Speed IO interface to 10 MB EDRAM
Compatibility
• Functional and Performance equivalent to prior Xbox 360 GPU/CPU
• FSB Latency and BW match prior FSB
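The GPU throughput figures on this slide can be cross-checked against each other. The sketch below is a back-of-the-envelope calculation, not something stated in the slides: it assumes each of the 48 shaders issues one instruction per cycle (the `issue_per_cycle` parameter is hypothetical) and derives the clock and per-clock rates that the quoted numbers would then imply.

```python
# Figures quoted on the slide.
SHADERS = 48
SHADER_INSTR_PER_SEC = 24e9
PIXELS_PER_SEC = 4e9
TRIANGLES_PER_SEC = 500e6


def implied_clock_hz(instr_per_sec: float, units: int, issue_per_cycle: int = 1) -> float:
    """Clock implied by a throughput figure, ASSUMING `issue_per_cycle`
    instructions per unit per cycle (an assumption; not given in the slides)."""
    return instr_per_sec / (units * issue_per_cycle)


clock = implied_clock_hz(SHADER_INSTR_PER_SEC, SHADERS)
pixels_per_clock = PIXELS_PER_SEC / clock
triangles_per_clock = TRIANGLES_PER_SEC / clock

print(f"implied GPU clock:   {clock / 1e6:.0f} MHz")
print(f"pixels per clock:    {pixels_per_clock:.0f}")
print(f"triangles per clock: {triangles_per_clock:.0f}")
```

Under that one-instruction-per-cycle assumption the three figures are mutually consistent: they all reduce to whole numbers per clock at the same implied frequency.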
Chip Statistics
• 372M transistors
• 45nm SOI, Ultra-low k dielectric
• 10 levels of metal
• 153 array types, ~1000 instances
• 1.8 million flip flops
• 6 PLLs
• 12 clock domains
• Compared to 2005 CPU GPU
• >60% Power Reduction
• >50% Silicon Area Reduction
Package Technology
• 35mm FC-PBGA (3-2-3) build-up layers
• Lidded Multi-Chip Module
• High speed interface to on-module EDRAM
• C4 Pitch: 151 µm minimum
Power Delivery
• Adaptive Power Supply (APS)
• 8 Power Domains
Manufactured by multiple foundries
[Figure: die floorplan; labels: three CPU Cores, L2 Cache, FSBR, BIU/IO, MC0, MC1, Graphics Core, Vid, EDRAM I/O]
Implementation Challenge: High Performance + Density
[Figure: design flow diagram; labels: CPU VHDL, Technology Map, GPU Verilog, Conversion Scripts, GPU VHDL, Synthesis, CPU, GPU]
Semi-Custom Design Methodology
• 18 Track High Performance Base Library
Standard Cell “ASIC Like” Methodology
• Synthesized Macros
• 12 Track High Density Base Library
• Custom Macros
• Custom Arrays
• Synthesized Logic Macros
Infrastructure
• Grow-able Array Subsystem
• Transistor Level Timing Analysis
• Gate Level Timing Analysis
• Full clock grid
• Combination clock tree and clock grid
Full Chip Hierarchical Design Methodology
• Full Chip Logic Verification
• Hierarchical Partition Based Timing
• Full Chip Design For Test
Implementation Challenge: Backward Compatibility
Challenge: The new hardware must be ‘transparent’ to the user
• Backward Compatibility is a combination of both performance and function
• Existing verification environments only validate function
• Problem compounded by new chip boundaries and technology change
Solution: Sequential equivalence used to validate design migration
• Compare corresponding sequential path outputs from two different design representations to ensure their function is the same
• Provides both performance and functional validation for units that didn’t change
• Leveraged IBM developed tool for functional equivalence
Solution: Pattern based verification used to focus on any areas of change
• Ran existing pattern based test cases to validate functions
• Wrote new test cases for any areas of change, including the new FSB logic
[Figure: Original GPU and Converted GPU compared by Sequential Equivalence. Updates to: Arrays, I/O’s & PHY’s, PLL’s, Test logic, Clocks & Clock Gating]
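The idea behind sequential equivalence can be shown on a toy design. The sketch below is illustrative only and is not the IBM tool mentioned above: it takes two hypothetical representations of the same 2-bit counter (one behavioral, one written as gates on two flip-flops), walks every reachable state pair of the combined machine, and checks that the outputs agree on every cycle for every input. All function names here are invented for the example.

```python
from collections import deque

# Each design is a step function: step(state, inp) -> (next_state, outputs).


def counter_behavioral(state, enable):
    """2-bit counter written as arithmetic: counts when enable is 1."""
    nxt = (state + enable) % 4
    return nxt, (nxt >> 1, nxt & 1)


def counter_gates(state, enable):
    """Same counter written as XOR/AND gates on two flip-flops (q1, q0)."""
    q1, q0 = state
    n0 = q0 ^ enable
    n1 = q1 ^ (q0 & enable)
    return (n1, n0), (n1, n0)


def sequentially_equivalent(init_a, step_a, init_b, step_b, input_values):
    """Breadth-first walk of the product machine: every reachable pair of
    states must produce identical outputs for every input value.
    Returns (True, None) or (False, (state_a, state_b, failing_input))."""
    seen = {(init_a, init_b)}
    queue = deque(seen)
    while queue:
        sa, sb = queue.popleft()
        for x in input_values:
            na, out_a = step_a(sa, x)
            nb, out_b = step_b(sb, x)
            if out_a != out_b:
                return False, (sa, sb, x)
            if (na, nb) not in seen:
                seen.add((na, nb))
                queue.append((na, nb))
    return True, None


ok, counterexample = sequentially_equivalent(
    0, counter_behavioral, (0, 0), counter_gates, (0, 1)
)
print(ok, counterexample)
```

Because the comparison is made cycle by cycle, a match says something about timing behavior as well as function, which is the property the slide relies on for units that did not change. Exhaustive state enumeration like this only scales to toy designs; production tools use symbolic methods.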
Power Optimization
Power optimization was a key design requirement
Adaptive Power Supply
• Part specific supply voltage for Core VDD
• Separate SRAM supply tracking Core VDD
• Power saving of 31%
In System Voltage Regulator Calibration
• Regulator loadline and tolerance calibrated
• Ring Oscillator based on-die voltage measurement
• Power saving of 12%
Total Power Saving 43%
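The 43% total is the plain sum of the two savings, which suggests both percentages are quoted against the same baseline. The sketch below contrasts that additive reading with a compounded one, where the second saving applies only to the power left after the first. The function names are hypothetical and the interpretation is an inference from the arithmetic, not a statement from the slides.

```python
APS_SAVING = 0.31          # Adaptive Power Supply, from the slide
CALIBRATION_SAVING = 0.12  # in-system regulator calibration, from the slide


def additive_total(*savings):
    """Total when every saving is a fraction of the same baseline."""
    return sum(savings)


def compounded_total(*savings):
    """Total when each saving applies to what the previous one left."""
    remaining = 1.0
    for s in savings:
        remaining *= 1.0 - s
    return 1.0 - remaining


print(f"additive:   {additive_total(APS_SAVING, CALIBRATION_SAVING):.0%}")
print(f"compounded: {compounded_total(APS_SAVING, CALIBRATION_SAVING):.1%}")
```

Only the additive reading reproduces the 43% on the slide; compounding the same two figures gives roughly 39%.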
Max Power Application – Power Virus
• Combine CPU + GPU Max Usage
• Power virus >10% more aggressive than games
Thermal Management
Requirement: Max hot spot & Max average temperature
• Must be met regardless of workload
Power and Thermal Maps created for extreme use cases
• Combinations of Max/Min CPU and GPU power
Thermal diode placement dictated by use cases
• Hot Spot Diode: Between CPU core0 and 1
• Average Temperature Diode: By GPU shaders
• Separate Diode for EDRAM
[Figure: Example Thermal Map]
Thermal set points to ensure ample margin to requirements
• Closed loop operation based on all T-Diode measurements
• Goal to keep fan speed low
• Set points reduced in low power mode to reduce thermal overshoot when switching to full power mode
Result: Thermal requirements met
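A closed loop of the kind described above could look roughly like the following. This is a minimal sketch under stated assumptions: the slides give no set point values or control law, so every temperature, the proportional gain, the minimum duty cycle, and all names (`DiodeTemps`, `fan_duty`) are hypothetical. It only illustrates two ideas from the slide: the fan responds to whichever diode is worst relative to its set point, and the set points are lower in low power mode.

```python
from dataclasses import dataclass


@dataclass
class DiodeTemps:
    """One value per thermal diode, in deg C."""
    hot_spot_c: float   # diode between CPU core 0 and core 1
    average_c: float    # diode by the GPU shaders
    edram_c: float      # separate EDRAM diode


# Hypothetical set points; the slides give no numeric values.
FULL_POWER = DiodeTemps(hot_spot_c=85.0, average_c=75.0, edram_c=80.0)
LOW_POWER = DiodeTemps(hot_spot_c=70.0, average_c=60.0, edram_c=65.0)


def fan_duty(readings: DiodeTemps, low_power_mode: bool,
             min_duty: float = 0.2, gain_per_deg: float = 0.05) -> float:
    """Proportional fan command driven by the diode that exceeds its set
    point the most; stays at min_duty while every diode is below target."""
    targets = LOW_POWER if low_power_mode else FULL_POWER
    worst_excess = max(
        readings.hot_spot_c - targets.hot_spot_c,
        readings.average_c - targets.average_c,
        readings.edram_c - targets.edram_c,
    )
    duty = min_duty + gain_per_deg * max(0.0, worst_excess)
    return min(1.0, duty)


sample = DiodeTemps(hot_spot_c=72.0, average_c=58.0, edram_c=60.0)
print(fan_duty(sample, low_power_mode=False))
print(fan_duty(sample, low_power_mode=True))
```

With the same readings, the low power set points make the fan respond earlier, so the die starts cooler when the console switches back to full power. That is one way to obtain the overshoot reduction the slide describes.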
Results from Power and Thermal Optimizations
Console Design Using CPU GPU SoC
Power Reduction
• Smaller Power Supply Unit
Simplified Motherboard Layout
• Single Chip for CPU GPU
• Power Delivery
• Efficient decoupling cap placement
Thermal Flexibility
• Single Heatsink
• Single Fan
Console Size Reduction
[Figure: Existing Xbox 360 Motherboard (labels: GPU, FSB, CPU) compared with Motherboard with CPU GPU SoC]
[Figure: Motherboard + Heatsink (labels: Power Delivery, Heatsink)]
[Figure: Motherboard, Fan, Optical Disk Drive (labels: HDD, Fan, ODD)]
[Figure: New Xbox 360 250GB Console]
Conclusion
First High Performance Integrated CPU GPU SoC
• 372M Transistors
• IBM 45nm SOI Technology
Enabled Whisper Quiet Console
• Optimized Power and Thermal Design
Significant benefits achieved from close collaboration of system and chip design teams
Appendix
Contributing Authors
Dan Kuper, Greg Williams, John Sell, Mike Love, Walker Robb, Ram Kadiyala, Eiko Junus,
Jim Barnhart, Kent Haselhorst, Mike Gruver, Bill Hovis, Paul Espeset, Julia Purtell,
Michael Lau, Andrew Roedel, Pete Atkinson, Aaron Buerman, Greg Luurtsema, Paul Paternoster
©2010 Microsoft Corporation, IBM Corporation