InfiniBand and Virtualization
IT-Symposium 2006, 18 May 2006
Ulrich Hamm uhamm@cisco.com
Data Center Team, Cisco Systems
InfiniBand Primer
What is InfiniBand?
• InfiniBand is a high-speed, low-latency technology used to interconnect servers, storage, and networks within the data center
• Standards based – InfiniBand Trade Association
http://www.infinibandta.org
• Scalable interconnect:
1X = 2.5 Gb/s
4X = 10 Gb/s
12X = 30 Gb/s
InfiniBand Physics
• Copper and fibre interfaces are specified
• Copper
Up to 20m for 4x connections
Up to 10m for 12x connections
• Optical
Initial availability via a dongle solution
Up to 300m with current silicon (probably 150m first)
Long haul possible, but not with current silicon
InfiniBand Physics
• A link is a bond of 2.5Gbps (1x) lanes
Fiber is a ribbon cable
Copper is a multi-conductor cable
• Each link is 8b/10b encoded
A 4x link is four 2.5Gbps physical connections
Each connection carries 2Gbps of data
SAR provides a single 8Gbps data connection (4x) or 24Gbps (12x) – see the sketch below
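The data rates above follow directly from the encoding overhead; a minimal sketch of the arithmetic (plain C, not part of the original deck):

```c
/* 8b/10b encoding carries 8 data bits in every 10 signal bits, so a
   2.5 Gb/s lane moves 2.0 Gb/s of data; wider links bond more lanes. */
#include <stdio.h>

int main(void) {
    const double lane_signal_gbps = 2.5;      /* 1x signaling rate */
    const double data_fraction = 8.0 / 10.0;  /* 8b/10b overhead   */
    const int widths[] = {1, 4, 12};          /* 1x, 4x, 12x links */

    for (int i = 0; i < 3; i++) {
        double signal = widths[i] * lane_signal_gbps;
        double data   = signal * data_fraction;
        printf("%2dx link: %5.1f Gb/s signaling, %5.1f Gb/s data\n",
               widths[i], signal, data);
    }
    return 0;
}
```

This reproduces the slide's figures: 10 Gb/s signaling and 8 Gb/s data for 4x, 30 Gb/s signaling and 24 Gb/s data for 12x.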
Pluggable Optics Module
Transforms powered copper ports into optical ports
• Converts a copper port to an optical port on a port-by-port basis (Topspin Optical Module)
• Extends port-to-port reach to 150m-300m with fibre ribbon cables
InfiniBand Nomenclature
(Figure: servers – each with CPUs, a memory controller, and system memory – attach to an InfiniBand switch through HCAs over IB links; TCAs bridge the IB fabric to Ethernet and Fibre Channel links.)
• HCA – Host Channel Adapter
• TCA – Target Channel Adapter
• SM – Subnet Manager
InfiniBand Switch Hardware
• The hardware switch device is a cut-through memory switch
• Full-duplex, non-blocking tag-forwarding switch
• Forwarding tags are subnet-local IDs (LIDs), assigned to all network endpoints by the master Subnet Manager at system startup (see the sketch below)
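A minimal sketch of what LID-based tag forwarding amounts to, assuming an illustrative table layout (names and sizes are not from the deck):

```c
/* Hypothetical sketch of cut-through forwarding: the switch indexes a
   linear forwarding table (LFT) with the packet's destination LID to
   pick an output port. The SM fills the table at fabric bring-up. */
#include <stdint.h>
#include <stdio.h>

#define LID_COUNT (48 * 1024)        /* unicast LID space, illustrative */

static uint8_t lft[LID_COUNT];       /* dest LID -> output port */

static uint8_t forward(uint16_t dest_lid) {
    return lft[dest_lid];            /* one lookup, no route computation */
}

int main(void) {
    lft[0x0004] = 7;                 /* SM programmed LID 4 out port 7 */
    printf("LID 0x0004 -> port %u\n", forward(0x0004));
    return 0;
}
```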
InfiniBand Host Channel Adapter
• Network interface for IB-attached servers
• Provides hardware virtual/physical memory mapping, Direct Memory Access (DMA), and memory protection
• Provides an RDMA (Remote DMA) data transfer engine and reliable packet forwarding capabilities
InfiniBand Gateway
• Technically a Target Channel Adapter
• Similar to an HCA attached to an embedded device
• Usually doesn't require virtual memory manipulation and mapping
• A simplified HCA on a specialized device
Examples: Ethernet-to-InfiniBand or Fibre Channel-to-InfiniBand packet forwarding engines
InfiniBand System Overview
InfiniBand System Architecture
• Connection-oriented architecture
Central connection routing management (SM)
All communications based on send/receive queue pairs (see the sketch below)
• Two primary connection types
Reliable Connection
Unreliable Datagram
• Unused connection types
Unreliable Connection
Reliable Datagram
Raw Datagram
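A minimal sketch of queue pair creation, using the later open-source libibverbs API rather than the 2006 Topspin verbs stack; the `qp_type` field is where Reliable Connection vs. Unreliable Datagram is chosen (error handling trimmed for brevity):

```c
#include <infiniband/verbs.h>
#include <stdio.h>

int main(void) {
    struct ibv_device **list = ibv_get_device_list(NULL);
    struct ibv_context *ctx  = ibv_open_device(list[0]);
    struct ibv_pd *pd        = ibv_alloc_pd(ctx);
    struct ibv_cq *cq        = ibv_create_cq(ctx, 16, NULL, NULL, 0);

    struct ibv_qp_init_attr attr = {
        .send_cq = cq,
        .recv_cq = cq,
        .cap     = { .max_send_wr = 16, .max_recv_wr = 16,
                     .max_send_sge = 1, .max_recv_sge = 1 },
        .qp_type = IBV_QPT_RC,   /* IBV_QPT_UD for unreliable datagram */
    };
    struct ibv_qp *qp = ibv_create_qp(pd, &attr);
    printf("created %s QP number 0x%x\n",
           attr.qp_type == IBV_QPT_RC ? "RC" : "UD", qp->qp_num);

    ibv_destroy_qp(qp);
    ibv_destroy_cq(cq);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(list);
    return 0;
}
```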
InfiniBand Connections
• Reliable Connection
Host Channel Adapter-based guaranteed delivery
Uses HCA onboard memory (or system memory with PCIe) for packet buffering
Primarily used for RDMA communications
Can use end-to-end flow control based on credits tied to available receive buffers
• Unreliable Datagram
Best-effort forwarding
Used for IP-over-IB communications
InfiniBand Subnet Manager
• An IB fabric is called an InfiniBand subnet
All devices are under the control of a single master Subnet Manager (SM)
There may be multiple standby (slave) SMs with replicated SM database state
• At system startup, all devices register with the SM
Central routing function
Shortest-path-first routing
Equal paths load-balanced with static round-robin distribution
Connection endpoint lookup
Clusters 2.0 Subnet Manager: Fabric Sweep Performance

Number of Hosts    Time
32                 < 1 sec
64                 < 1 sec
128                2 sec
256                4 sec
512                22 sec
1,024*             35-40 sec
2,048*             1-1:30 min**
4,096*             5-7 min**

* Requires HPC Subnet Manager for this performance
** Estimated based on simulation
• Assumes an InfiniSwitch-III based two-tier topology
• The embedded SM can handle up to 1,024 nodes
IB Addressing
• Three addresses: GUID, GID, LID
• GUID
Globally Unique ID, 64 bits in length
Used to uniquely identify a port or port group
The HCA and each of its ports has a GUID
(e.g. 00:05:ad:00:00:01:02:03)
• GID
GUID plus subnet prefix (see the sketch below)
Used for host lookup on a subnet
Used for inter-subnet IB routing (future)
(e.g. fe:80:00:00:00:00:00:00:00:05:ad:00:00:01:02:03)
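The GID construction is mechanical; a minimal sketch that reproduces the slide's example values (plain C, not from the deck):

```c
/* A GID is the 64-bit subnet prefix followed by the 64-bit port GUID,
   giving a 128-bit, IPv6-style address in big-endian byte order. */
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint64_t subnet_prefix = 0xfe80000000000000ULL; /* default IB prefix */
    uint64_t port_guid     = 0x0005ad0000010203ULL; /* e.g. from the HCA */

    uint8_t gid[16];
    for (int i = 0; i < 8; i++) {
        gid[i]     = (uint8_t)(subnet_prefix >> (56 - 8 * i));
        gid[8 + i] = (uint8_t)(port_guid     >> (56 - 8 * i));
    }

    for (int i = 0; i < 16; i++)
        printf("%02x%s", gid[i], i < 15 ? ":" : "\n");
    return 0;
}
```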
Address Resolution
1. Join the multicast group
2. Send an ARP on the "broadcast" address
3. Receive the remote GID via ARP
4. Ask the SM for the GID-to-LID mapping
5. Ask the host for service info (QP)
6. Communicate
(Figure: Host 1 resolves Host 2 through the SM, with Host 3 on the same subnet.)
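For comparison, the later librdmacm API wraps this same sequence behind two calls; a hedged sketch under the assumption of that API (peer address is a placeholder, error handling trimmed):

```c
#include <rdma/rdma_cma.h>
#include <netdb.h>

int main(void) {
    struct rdma_event_channel *ch = rdma_create_event_channel();
    struct rdma_cm_id *id;
    struct rdma_cm_event *ev;
    rdma_create_id(ch, &id, NULL, RDMA_PS_TCP);

    struct addrinfo *addr;
    getaddrinfo("192.168.1.10", NULL, NULL, &addr); /* illustrative peer */

    rdma_resolve_addr(id, NULL, addr->ai_addr, 2000); /* ARP + SM lookup */
    rdma_get_cm_event(ch, &ev);   /* expect RDMA_CM_EVENT_ADDR_RESOLVED  */
    rdma_ack_cm_event(ev);

    rdma_resolve_route(id, 2000);                     /* GID -> path/LID */
    rdma_get_cm_event(ch, &ev);   /* expect RDMA_CM_EVENT_ROUTE_RESOLVED */
    rdma_ack_cm_event(ev);
    /* ... then rdma_connect() and communicate over the QP ... */

    freeaddrinfo(addr);
    rdma_destroy_id(id);
    rdma_destroy_event_channel(ch);
    return 0;
}
```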
RDMA and Upper Layer Protocols
Current NIC Architecture
(Figure: server host with CPUs, memory controller, and system memory; data is copied between the app buffer, the OS buffer, and the NIC across the host interconnect.)
• Data traverses the bus 3 times
• Multiple context switches rob CPU cycles from actual work
• Memory bandwidth and per-packet interrupts limit max throughput
• The OS manages the end-to-end communications path
With RDMA and OS Bypass
(Figure: the HCA moves data directly between the app buffer and the fabric, bypassing the OS buffer.)
• Data traverses the bus once, saving CPU and memory cycles
• Secure memory – memory transfers with no CPU overhead
• PCI-X/PCIe becomes the bottleneck for network data transmission
• The HCA manages remote data transmission (see the sketch below)
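A minimal sketch of this data path using libibverbs (a later open API; the deck's Topspin stack exposed equivalent verbs): register the buffer once, then let the HCA execute an RDMA WRITE without involving the remote CPU. The remote address and rkey would come from an out-of-band exchange and are parameters here:

```c
#include <infiniband/verbs.h>
#include <stdint.h>
#include <string.h>

int post_rdma_write(struct ibv_pd *pd, struct ibv_qp *qp,
                    uint64_t remote_addr, uint32_t rkey) {
    static char buf[4096];
    strcpy(buf, "payload");

    /* Registration pins the pages and gives the HCA a DMA mapping. */
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, sizeof(buf),
                                   IBV_ACCESS_LOCAL_WRITE);
    struct ibv_sge sge = {
        .addr = (uintptr_t)buf, .length = sizeof(buf), .lkey = mr->lkey,
    };
    struct ibv_send_wr wr = {
        .sg_list    = &sge,
        .num_sge    = 1,
        .opcode     = IBV_WR_RDMA_WRITE,   /* zero-copy, no remote CPU */
        .send_flags = IBV_SEND_SIGNALED,
        .wr.rdma.remote_addr = remote_addr,
        .wr.rdma.rkey        = rkey,
    };
    struct ibv_send_wr *bad;
    return ibv_post_send(qp, &wr, &bad);   /* HCA DMAs straight from buf */
}
```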
Kernel Bypass
(Figure: in the traditional model, the application crosses the user/kernel boundary through the sockets layer, the TCP/IP transport, and the driver before reaching hardware; in the kernel bypass model, the application uses an RDMA ULP to reach the hardware directly.)
Upper Layer Protocols
• A variety of software protocols handle high-speed communication over RDMA
• Protocols include:
IP-over-InfiniBand – IETF http://www.ietf.org/internet-drafts/draft-ietf-ipoib-ip-over-infiniband-09.txt
SDP – InfiniBand Trade Association http://infinibandta.org
SRP – ANSI T10 http://www.t10.org/ftp/t10/drafts/srp/srp-r16a.pdf
DAPL – DAT Collaborative http://www.datcollaborative.org
MPI – MPI Forum http://www.mpi-forum.org
IPoIB
IP over InfiniBand
• IETF draft specification
• Leverages InfiniBand multicast for broadcast requirements (ARP)
• Supports TCP, UDP, and IP multicast (see the sketch below)
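Because IPoIB presents a normal IP interface, unmodified sockets code works; a minimal sketch of an IP multicast join, which the IPoIB driver maps onto an IB multicast group join underneath (group and interface addresses are placeholders):

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <stdio.h>

int main(void) {
    int s = socket(AF_INET, SOCK_DGRAM, 0);   /* plain UDP socket */

    struct ip_mreq mreq;
    inet_pton(AF_INET, "239.1.1.1", &mreq.imr_multiaddr); /* group    */
    inet_pton(AF_INET, "10.0.0.5", &mreq.imr_interface);  /* ib0 addr */

    if (setsockopt(s, IPPROTO_IP, IP_ADD_MEMBERSHIP,
                   &mreq, sizeof(mreq)) < 0)
        perror("IP_ADD_MEMBERSHIP"); /* driver joins the IB mcast group */
    return 0;
}
```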
SDP
Sockets Direct Protocol
• STREAM sockets over InfiniBand Reliable Connections
• TCP offload function for IB-attached devices
• Can be used by a TCP application without rebuilding the application (see the sketch below)
• An asynchronous I/O model with true RDMA forwarding is also available – requires an application rewrite
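A minimal sketch of the "no rebuild" path as implemented in later OFED stacks, either via an LD_PRELOAD shim with no change at all or via an AF_INET_SDP address family; the constant and its value below follow the OFED libsdp convention and are an assumption, not something stated in this deck:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

#ifndef AF_INET_SDP
#define AF_INET_SDP 27   /* assumed value, per OFED libsdp convention */
#endif

int main(void) {
    /* identical to a TCP client except for the address family */
    int s = socket(AF_INET_SDP, SOCK_STREAM, 0);

    struct sockaddr_in peer = { .sin_family = AF_INET,
                                .sin_port   = htons(5000) };
    inet_pton(AF_INET, "10.0.0.5", &peer.sin_addr);  /* placeholder peer */
    connect(s, (struct sockaddr *)&peer, sizeof(peer));
    return 0;
}
```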
SCSI RDMA Protocol
SRP
• SCSI semantics over an RDMA fabric
• Not IB-specific
• Host drivers tie into the standard SCSI/disk interfaces in the kernel/OS
• Can be used for end-to-end IB storage (implemented today!)
uDAPL
Direct Access Provider Library
• Two variants: User DAPL (uDAPL) / Kernel DAPL (kDAPL)
• RDMA-semantics API (see the sketch below)
• Provides a low-level interface for application-direct or kernel-direct RDMA functions (memory pinning, key exchange, etc.)
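A minimal sketch of the uDAPL entry point, with signatures as given in the DAT Collaborative spec; the provider name comes from a dat.conf registry and should be treated as an assumption, as should the header path:

```c
#include <dat/udat.h>
#include <stdio.h>

int main(void) {
    DAT_IA_HANDLE  ia  = DAT_HANDLE_NULL;
    DAT_EVD_HANDLE evd = DAT_HANDLE_NULL;

    /* open an interface adapter by its registered provider name */
    DAT_RETURN rc = dat_ia_open("OpenIB-cma", 8, &evd, &ia);
    if (rc != DAT_SUCCESS) {
        fprintf(stderr, "dat_ia_open failed: 0x%x\n", rc);
        return 1;
    }
    /* ... dat_pz_create, dat_ep_create, dat_lmr_create (memory pinning),
       key exchange, dat_ep_post_rdma_write, etc. ... */
    dat_ia_close(ia, DAT_CLOSE_GRACEFUL_FLAG);
    return 0;
}
```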
MPI
Message Passing Interface
• MPI is the de facto standard API for parallel computing applications (see the sketch below)
• RDMA capabilities were added via a set of patches to the base MPI code (MPICH, one of many available MPI libraries), initially developed at Ohio State University
http://nowlab.cis.ohio-state.edu/projects/mpi-iba/
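A minimal MPI example using the standard C bindings; with an RDMA-enabled MPI such as the OSU patches mentioned above, the message rides the IB fabric:

```c
/* rank 0 sends one integer to rank 1; run with: mpirun -np 2 ./ping */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int payload = 42;
    if (rank == 0) {
        MPI_Send(&payload, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&payload, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", payload);
    }

    MPI_Finalize();
    return 0;
}
```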
InfiniBand Performance
Measured Results

(Stack: applications reach the fabric through BSD Sockets over TCP/IP (1GE) or SDP/IPoIB (10G IB), through uDAPL and Async I/O for direct access, through SRP for storage, or through the MPI extension.)

Path                 Throughput   Latency
1GE (TCP/IP)         1 Gb/s       40-60 usec
IPoIB                4.1 Gb/s     30 usec
SDP                  4.5 Gb/s     18 usec
SRP                  7.9 Gb/s     18 usec
Async I/O (SDP)      8 Gb/s       8 usec
MPI                  8 Gb/s       3.5 usec
IB Glossary
• IB – InfiniBand Architecture (not InfinityBand)
• HCA – Host Channel Adapter (NIC)
• RDMA – Remote Direct Memory Access
• SM – Subnet Manager (management process)
• SRP – SCSI RDMA Protocol
• SDP – Sockets Direct Protocol
• TCA – Target Channel Adapter (gateway)
Why InfiniBand for HPC?
1. InfiniBand technology provides industry-best price/performance and network efficiency
2. Mission-critical management and reliability features for HPC clusters
3. Industry standard with a strong open-source community and Linux adoption
4. Customer and application proof points: InfiniBand is a hot technology right now
Cisco HPC Product Overview
Cisco SFS 7000, 7008, 7012, and 7024
What Makes the Server Fabric Switch Different?
Performance and control
• High-performance server-to-server interconnect
– RDMA
– High bandwidth
– Low latency
– InfiniBand today; PCI-Express and/or 10GigE when ready
• Virtualization (I/O, storage, and CPU)
– Policy-based dynamic resource mapping across the entire cluster
– Shared resources
– Routing, aggregation, load balancing
– App/OS-to-CPU provisioning
SFS Building Blocks
• Switches
Integrated system and fabric management
• Gateway modules
– InfiniBand to Ethernet
– InfiniBand to Fibre Channel
• Host Channel Adapter (HCA) with upper-layer protocols
– SRP
– SDP
– uDAPL
– MPI
– IPoIB
Linux and Windows driver support
The Cisco SFS Product Line
• Multifabric server switches
SFS 3001 (TS90): (12) 4X IB + 1 gateway
SFS 3012 (TS360): (24) 4X IB + 12 gateways
– (2) 2G FC gateway
– (6) GE gateway
• Server fabric switches
SFS 7000 (TS120): (24) 4X IB
SFS 7008 (TS270): (96) 4X IB
• HCAs* – (2) 1X IB PCI-X, (2) 4X IB PCI-X, (2) 4X IB PCI-Express
*plus InfiniBand cables
• Blade server components
HCA: (2) 4X IB PCI-Express
Embedded switch: (14) 1X IB (internal) + (1) 4X IB and (1) 12X IB (external)
Passthru module: (10) 4X IB
• Software
VFrame™ Server Fabric Virtualization Software R1.0
Remote boot
Linux host driver
Windows host driver
Terms and Components
Topspin IB Components
• InfiniBand switches
SFS 7000 (24 ports): generally used as an edge switch
SFS 7008 (up to 96 ports), SFS 7012 (up to 144 ports), SFS 7024 (up to 288 ports): generally used as core switches in large fabrics, or as an edge switch in single-switch fabrics
• Topspin InfiniBand cable
Plugs into a switch as either a host or uplink cable
• Topspin 2-port low-profile HCA
PCI-X and PCIe flavors available
SFS 7008 – 96-Port Core Switch
Distributed building block for highly manageable HPC clusters
• Rapid Service™ Architecture
– Passive midplane
– 2-minute service vs. 2 man-days
• No single point of failure
– Hot-swap power, fans, management
• Full security
– SSH/SSL/SNMPv3, RADIUS, SCP
– Multi-tiered security partitions
• Architected for rolling upgrades and hitless service windows
• 6U form factor; 96 4X or 32 12X InfiniBand ports; 8 line card slots (8 x 12 = 96)
Mission-Critical RAS Functionality
Cisco InfiniBand Rapid Service Architecture (SARA)
• InfiniBand switches that deliver enterprise-class reliability while still being priced for high-port-count HPC
• Full set of software and hardware diagnostics with active alerts
• Repair at the FRU level, not the switch level, with minimal fabric disruption
Cisco SFS Large Switch Family Overview

                    SFS 7008                SFS 7012                SFS 7024
Chassis type        6U modular              7U modular              14U modular
Max 4X ports        96 ports                144 ports               288 ports
Max 12X ports       32 ports                48 ports                96 ports
Port module         8 horizontal slots;     12 side-by-side slots;  24 side-by-side slots;
options             12-port 4X LIMs or      12-port 4X LIMs or      12-port 4X LIMs or
                    4-port 12X LIMs         4-port 12X LIMs         4-port 12X LIMs
High availability   Redundant power/cooling, redundant management, non-disruptive
                    upgrades, hot-swappable FRUs (all three models)
Fabric manager      Embedded or external    Embedded or external    Embedded or external
Best use            64-96 node clusters;    97-144 node clusters    145-288 node clusters;
                    core switch for 288 to                          core switch for 1,536+
                    4,000 node clusters                             node clusters
Systems Management – GUI Software
(Screenshot: SFS 7008 / Topspin 270 chassis view – adjust management ports and control every port on the chassis.)
System Management – Web Interface
(Screenshot: SFS 7000 / Topspin 120 – manage the switch chassis, fabric, or subnet manager.)
I/O Virtualization
The Evolution of I/O Virtualization
• SMP (Symmetric Multi-Processor)
Pro: single managed entity, fast backplane
Con: expensive, proprietary server + backplane
• Dis-aggregation (standard servers on Fibre Channel and Ethernet networks)
Pro: standard servers, inexpensive
Con: lots of managed components, low-performing interconnect
• Virtualization
Pro: reduced number of managed components, virtual I/O, fast standards-based backplane
Evolution of the Data Center
Network and Storage Virtualization
Evolution of the Data Center
Server Virtualization - The Server Switch
Virtual I/O for Network and Storage
Unified "wire-once" fabric
(Figure: server cluster attached to a Cisco SFS 3012, with gateways out to the SAN and the LAN/WAN.)
• A single InfiniBand link per server carries both storage and network traffic
• Fibre Channel to InfiniBand gateway for storage access
– Two 2-Gbps Fibre Channel ports per gateway
– Creates a 10-Gbps virtual storage pipe to each server
• Ethernet to InfiniBand gateway for LAN access
– Six Gigabit Ethernet ports per gateway
– Creates a virtual GigE pipe to each server
Virtual I/O: Simplifying the I/O Problem
• Legacy architectures – dedicated 1:1
Dedicated switch ports: 48 GigE ports, 48 SAN ports, 48 IPC ports
Dedicated adapters: 48 NICs, 48 HBAs, 48 IPC
Dedicated disk: 24+ local disks
Dedicated processing: 24 servers
• Topspin – virtual many:1
Pooled network: 6 GigE ports, 6 SAN ports, 48 IPC ports, 48 HCAs
Pooled storage: no local disk
Pooled processing: 24 stateless servers
Scaling Massive I/O for the BladeCenter
(Figure: three BladeCenter chassis of 14 servers each connect over IB to a Topspin 360, which bridges to FC storage and Ethernet networking.)
• Up to 24 BladeCenter connections; most commonly 3-6, depending on redundancy requirements and blocking ratios
• Up to 24 2G FC connections
• Up to 72 1G Ethernet connections
• Control plane for on-demand computing
Integrated InfiniBand for Blade Servers
Create a "wire-once" fabric
(Figure: blade chassis with integrated InfiniBand switches; 10Gbps links to each blade HCA, 30Gbps uplinks.)
• Integrated 10Gbps InfiniBand switches provide a unified "wire-once" fabric
• Optimize density, cooling, space, and cable management
• Virtual I/O provides shared Ethernet and Fibre Channel ports across blades and racks
• Option of an integrated InfiniBand switch (e.g. IBM BladeCenter) or a pass-through module (e.g. Dell 1855)
SFS 3012 Multifabric Server Switch
• 2 InfiniBand switch modules
• 24 10Gbps InfiniBand ports
• 12 hot-plug expansion slots
– 2-port 2Gbps Fibre Channel
– 4-port Gigabit Ethernet
• 4U form factor
• Embedded subnet manager
• Redundant power and cooling
• Redundant control
• Dual 4X IB to each slot, dual 12 4X ports
• Console and modem ports
• Ethernet-based management port
InfiniBand-to-Ethernet Gateway Overview
• Acts like an L2 bridge between IB and Ethernet
• The bridge group is the main forwarding entity
• A bridge group has two bridge ports: Ethernet and IPoIB
– one VLAN maps to one IB partition
• The Ethernet bridge port can be tagged or untagged
• The Ethernet bridge port can aggregate up to 6 ports
InfiniBand-to-Ethernet Gateway Features
• IP-only protocols
• 802.1Q VLAN support
• Link aggregation
• IPv4 multicast support
• Loop protection
• Ethernet jumbo frames up to 9k (ingress only)
• IP fragmentation
• High availability
InfiniBand-to-Ethernet Gateway Performance
• Six 1 Gb/s ports
• FPGA-based packet forwarding engine
• 11 million PPS aggregate
• Hardware multicast mapping
Ethernet Gateway – Multicast Support
• InfiniBand switches support true IB multicast in hardware
• InfiniBand-to-Ethernet gateways support multicast mapping in hardware
• IB switches use two types of forwarding tables (see the sketch below):
Linear Forwarding Table (1-to-1 – message in/out)
Multicast Forwarding Table (1-to-many – message in/out)
• IB partitions can be used to segregate traffic domains
• Hardware multicast support means:
No host overhead for sending multicast messages
No appreciable latency between the 1st message and the last message
No superfluous network traffic
Multiple IB switches in a fabric effectively create a parallelized multicast delivery mechanism (scales very large, very fast)
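A minimal sketch of the two table types, assuming an illustrative port-bitmask layout for the MFT (the LID ranges follow the IB spec; the names and sizes are hypothetical):

```c
/* LFT: one destination LID -> one output port.
   MFT: one multicast LID -> bitmask of output ports, so a single
   incoming packet is replicated in hardware to every member port. */
#include <stdint.h>
#include <stdio.h>

#define PORTS 24

static uint8_t  lft[0xC000];   /* unicast LIDs 0x0001-0xBFFF          */
static uint32_t mft[0x4000];   /* multicast LIDs 0xC000+ -> port mask */

static void forward(uint16_t dlid, int in_port) {
    if (dlid < 0xC000) {                     /* unicast range */
        printf("unicast   -> port %u\n", lft[dlid]);
        return;
    }
    uint32_t mask = mft[dlid - 0xC000];      /* multicast range */
    for (int p = 0; p < PORTS; p++)
        if ((mask & (1u << p)) && p != in_port)   /* skip ingress port */
            printf("multicast -> replicate to port %d\n", p);
}

int main(void) {
    lft[0x0004] = 7;                                  /* unicast route */
    mft[0x0001] = (1u << 2) | (1u << 5) | (1u << 9);  /* 3-port group  */
    forward(0x0004, 1);
    forward(0xC001, 2);
    return 0;
}
```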
Ethernet Gateway – High Availability
• Bridge-group-based redundancy
• Each bridge group is a member of a redundancy group
• One redundancy group covers one VLAN
• Active/passive and active/active modes
• Automatic fail-over and fail-back
• Uses gratuitous ARP to redirect traffic (see the sketch below)
• A redundancy group can span multiple chassis
• Proprietary redundancy protocols for address distribution and bridge group election
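A minimal sketch of the gratuitous ARP mechanism itself, using standard Linux packet sockets; the interface, MAC, and IP are placeholders, this is not the gateway's actual implementation, and it needs root to run. The trick is that sender and target IP are both the address being taken over, broadcast so that switches and hosts update their tables:

```c
#include <arpa/inet.h>
#include <net/if.h>
#include <netinet/if_ether.h>
#include <netpacket/packet.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    /* SOCK_DGRAM: kernel builds the Ethernet header from sockaddr_ll */
    int s = socket(AF_PACKET, SOCK_DGRAM, htons(ETHERTYPE_ARP));

    unsigned char our_mac[6] = {0x00, 0x05, 0xad, 0x01, 0x02, 0x03};
    struct in_addr ip;
    inet_pton(AF_INET, "10.0.0.1", &ip);     /* address being taken over */

    struct ether_arp arp = {0};
    arp.ea_hdr.ar_hrd = htons(ARPHRD_ETHER);
    arp.ea_hdr.ar_pro = htons(ETHERTYPE_IP);
    arp.ea_hdr.ar_hln = 6;
    arp.ea_hdr.ar_pln = 4;
    arp.ea_hdr.ar_op  = htons(ARPOP_REQUEST);
    memcpy(arp.arp_sha, our_mac, 6);
    memcpy(arp.arp_spa, &ip, 4);             /* sender IP ...            */
    memcpy(arp.arp_tpa, &ip, 4);             /* ... equals target IP     */

    struct sockaddr_ll dst = {0};
    dst.sll_family   = AF_PACKET;
    dst.sll_protocol = htons(ETHERTYPE_ARP);
    dst.sll_ifindex  = if_nametoindex("eth0");
    dst.sll_halen    = 6;
    memset(dst.sll_addr, 0xff, 6);           /* Ethernet broadcast       */

    sendto(s, &arp, sizeof(arp), 0, (struct sockaddr *)&dst, sizeof(dst));
    close(s);
    return 0;
}
```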
InfiniBand-to-Fibre Channel Gateway
• Ensures seamless integration with important SAN tools
– Fabric-based zoning
– LUN-based access controls
– Storage- and host-based HA (multipath) and load balancing
• Creates SAN network addresses on InfiniBand
– SAN management tools must "see" each node
– Creates a "talk-through" mode with virtual WWNNs per server
• Enables SAN interoperability with InfiniBand
– Appears as a Fibre-Attached Loop
– Proven interoperability with Brocade, McData, QLogic, EMC, IBM, Hitachi, and more
Fibre Channel Gateway Performance
• Current SW gateway:
– Two 2 Gb/s FC ports
– Appears as a Fibre-Attached Loop device
– 40,000 IOPS
• Next-gen FPGA "Newton" gateway:
– Four 4 Gb/s FC ports
– Appears as an E-Port
– 450,000 IOPS
– Future support for VSANs
Topology Transparency: How it Works
• The storage gateway presents either:
Fabric-attached loops
E-Port
• The SCSI RDMA (SRP) driver installs on the host as a normal SCSI driver – defined by the ANSI T10 standards
• Each IB/SRP initiator is assigned:
1 FC WWNN and
multiple WWPNs
• Unique WWNs allow normal zoning to work as usual
• Storage-based load balancing works as usual
• Enhanced multipathing and I/O consolidation
Topspin and Top-Tier Server Vendors
"IBM and Topspin Communications Forge Key Agreement"
"Sun and Topspin Partner to Deliver Grid Computing Solutions"
"Dell Adds Topspin Switches to High-Performance Computing Clusters"
"Topspin Selected As NEC's Strategic InfiniBand Technology Provider"
"HP To Leverage Topspin Technology"
Server Virtualization
VFrame Server Virtualization Framework
Building blocks (top to bottom):
• Policy and provisioning services
• Virtualization and boot services
• Topology transparency
• High-performance server-to-server connectivity: InfiniBand (RDMA), with Ethernet (I/O) and Fibre Channel (storage) fabrics
VFrame™
• A software suite that makes the Server Switch programmable
• Three main components:
VFrame™ Embedded System Logic
– Policy ingestion, interpretation, and enforcement at the server switch
VFrame™ APIs (and SDK)
– Allow 3rd-party (end-user customers, software partners, system vendor OEMs) management and provisioning tools to program and manage the server switch fabric
VFrame™ Director
– Software package that disseminates policies to the server switch fabric
– Central policy enforcement provides better system-wide decision making and conflict arbitration
– Can be installed on any server in the network
Programmability
(Figure: a VFrame™ policy flows to a Cisco SFS 3012, which assembles a virtual server from pooled CPUs, with vIPs toward the LAN and vHBAs toward the SAN.)
1) The Server Switch receives a policy from the VFrame™ Director or 3rd-party software
2) Based on the policy, the Server Switch assembles the virtual server:
• Selects server(s) that meet minimum criteria (e.g. CPU, memory)
• Boots server(s) over the network with the appropriate app/OS image
• Creates virtual IPs in servers and maps them to VLANs for client access
• Creates virtual HBAs in servers and maps them to zones, LUNs, and WWNNs for storage access
How it Works
Policy Definition
• A Virtual Server combines everything but the physical hardware, e.g.:
– Shared storage
– Network interfaces
– VLAN / SAN zoning
– SAN WWNs
– Server customization scripts
• A Virtual Server Group combines:
– One or more Virtual Servers
– Performance monitors
– Policies
• Policies consist of:
– One or more trigger(s): component failure, performance metric, scheduled event, custom script
– One or more action(s): add/remove/change server or group, failover server, email notification, custom script
Cisco SFS Architecture
(Layered view, top to bottom:)
• Extensibility layer – 3rd-party management and provisioning tools; protocols and APIs (for third-party management tools)
• Control plane – policy enforcement with triggers and actions; topology transparency
• Switching fabric – Ethernet (network), InfiniBand (servers), Fibre Channel (storage)