Network Processor Overview

This lecture was produced by Professor Yong-Surk Lee's laboratory at Yonsei University. It carries no copyright, so anyone may copy and distribute it for non-commercial purposes only. The laboratory homepage offers many more lectures on high-performance microprocessors, and anyone can download them free of charge.

Network Processor Overview
2004. 8. 30
Processor Laboratory (Prof. Yong-Surk Lee), Dept. of Electrical and Electronic Engineering, Yonsei University
Ph.D. candidate In-Pyo Hong
E-mail: yonglee@yonsei.ac.kr
Homepage: http://mpu.yonsei.ac.kr
Phone: 02-392-7194

References
1. Douglas E. Comer, Network Systems Design Using Network Processors, Pearson Prentice Hall, 2004
2. Microprocessor Report, Cahners, http://www.mpronline.com/, 1999~2004
References (continued)
3. White papers and presentations of network processor makers:
   http://www.intel.com/
   http://www.ezchip.com/
   http://www.broadcom.com/
   http://www.agere.com/
4. Presentations from the Network Processor Summit in Networld+Interop, Las Vegas, 2000

Traditional Network System
• Embedded Processors
  – Large and complex layer 4 protocol
[Figure: a traditional network system. Each port path runs from an interface (Interface1, Interface2) through a Layer 1 & 2 ASIC, a Layer 3 & classification ASIC, and an embedded processor for Layer 4. A traffic-management ASIC, the switching fabric, and a standard CPU for other processing complete the system.]
Traditional Network System (continued)
• Performance
  – Growth of network data rates
    ▪ OC-3, OC-12, OC-48, OC-192
    ▪ Bottleneck: embedded processor
• ASIC
  – The only solution for the high-speed backbone
  – Problem
    ▪ High cost
    ▪ Long time-to-market
    ▪ Difficult to simulate
    ▪ Hard to debug, change or reuse

What is the Network Processor?
[Figure: the network processor combines the strengths of the ASIC (high-speed) and the general-purpose µP (flexible), plotted on the axes Performance and Flexibility & time-to-market.]
What is the Network Processor?
• Traffic growth
• Convergence of voice and data
• Emergence of diverse network services
→ Network equipment that can handle growing traffic while adapting flexibly to new technologies
→ High-performance programmable network devices

Requirements
• High performance processing
• Flexibility
• Ability to leverage co-processors & memory
• Headroom for emerging services
• Robust software development environment
What is Important in NPUs…
• Performance & Scalability
  – Parallel architecture
  – Pipelining
• Flexibility
  – Programmability
    ▪ Software controls hardware

Performance Gap
[Figure: CPU speed (MIPS) versus the speed of Internet backbone connections (Mbps), 1985 to 2005, on a logarithmic scale from 100 to 62500; backbone links step from T1 through OC-3, OC-12, OC-48 and OC-192 to OC-768.]
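The gap becomes concrete once a line rate is turned into a per-packet time budget. The sketch below assumes minimum-size 40-byte IP packets arriving back to back and ignores framing overhead; the SONET line rates are the standard OC-n values, which the slide does not list, and the function names are illustrative.

```python
# Per-packet time budget at SONET/SDH line rates.
# Assumptions (not on the slide): minimum-size 40-byte IP packets sent
# back to back, and no framing or inter-packet overhead.

# Standard SONET OC-n line rates in bits per second.
LINE_RATE_BPS = {
    "OC-3": 155.52e6,
    "OC-12": 622.08e6,
    "OC-48": 2488.32e6,
    "OC-192": 9953.28e6,
    "OC-768": 39813.12e6,
}


def packets_per_second(rate_bps: float, packet_bytes: int = 40) -> float:
    """Worst-case packet arrival rate when every packet is packet_bytes long."""
    return rate_bps / (packet_bytes * 8)


def ns_per_packet(rate_bps: float, packet_bytes: int = 40) -> float:
    """Time available to finish one packet before the next one arrives."""
    return 1e9 / packets_per_second(rate_bps, packet_bytes)


def instructions_per_packet(mips: float, rate_bps: float, packet_bytes: int = 40) -> float:
    """Instruction budget per packet for one processor sustaining `mips` MIPS."""
    return mips * 1e6 / packets_per_second(rate_bps, packet_bytes)


if __name__ == "__main__":
    for name, rate in LINE_RATE_BPS.items():
        print(f"{name:>6}: {packets_per_second(rate) / 1e6:7.2f} Mpps, "
              f"{ns_per_packet(rate):8.2f} ns/packet, "
              f"{instructions_per_packet(500, rate):7.1f} instructions/packet at 500 MIPS")
```

At OC-192 a single 500-MIPS processor gets only about 16 instructions, or about 32 ns, per minimum-size packet. That illustrates why one embedded processor becomes the bottleneck and why NPs turn to parallelism and pipelining.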
Architecture

Types of Network Processors
[Figure: two organizations. Multiple Processor Approach (parallel architecture): the input stream is spread over several RISC cores, each running the whole task sequence F( ); G( ); H( ); I( ). Pipeline Approach (pipeline architecture): the input stream passes through one processor per task, F( ) → G( ) → H( ) → I( ), with stages such as Classify, Lookup, Modify and Queueing. The two approaches are compared on High performance and on Flexibility / Time-to-market.]
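An idealized throughput model shows the basic trade-off between the two organizations. This is a sketch under stated assumptions: each task costs a fixed number of cycles, packets are independent, and nothing ever stalls on memory. The per-task costs are hypothetical.

```python
# Idealized throughput of the pipeline and multiple-processor organizations.
# Assumptions (illustrative only): each task has a fixed cost in cycles,
# packets are independent, and no stage ever stalls on memory.
from typing import Sequence


def pipeline_throughput(stage_cycles: Sequence[int]) -> float:
    """Packets per cycle with one processor per task (F -> G -> H -> I).

    A new packet can enter only as fast as the slowest stage finishes."""
    return 1.0 / max(stage_cycles)


def pipeline_latency(stage_cycles: Sequence[int]) -> int:
    """Cycles one packet spends passing through every stage."""
    return sum(stage_cycles)


def parallel_throughput(stage_cycles: Sequence[int], n_processors: int) -> float:
    """Packets per cycle when each of n processors runs F; G; H; I itself."""
    return n_processors / sum(stage_cycles)


if __name__ == "__main__":
    # Hypothetical per-task costs: classify, lookup, modify, queueing.
    unbalanced = [10, 40, 20, 10]
    balanced = [20, 20, 20, 20]
    for name, cycles in (("unbalanced", unbalanced), ("balanced", balanced)):
        print(name,
              "pipeline:", pipeline_throughput(cycles),
              "parallel x4:", parallel_throughput(cycles, 4))
```

With balanced stages, both organizations deliver 0.05 packets per cycle using four processors. When one task (here the lookup) dominates, the pipeline is capped at 1/40, while the parallel design still reaches 4/80. In exchange, each pipeline stage can be a small task-optimized engine, whereas every parallel processor must be able to run the whole program.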
Pipeline Processing Approach
Example) TOPs of EZchip
• TOPparse: Header field extraction and classification
• TOPsearch: Table lookup
• TOPresolve: Queue management and forwarding
• TOPmodify: Packet header and content modification
[Figure: EZchip ingress and egress processing (Parsing, Searching, Editing, Queuing/Scheduling, Lookup, Enqueuing, Output Scheduling) between the input/output streams and the fabric, with external policy memory, external packet memory, a packet buffer, and co-processor interfaces.]
Source: Networld+Interop 2000

Example) TOPs of EZchip (continued)
[Figure: the four TOPs chained as parse → search → resolve → modify, with attached memories.]
* TOP: Task Optimized Processor

Example) PXF of Cisco
• 32 homogeneous embedded processors
[Figure: the PXF processor array between the network (MAC) input and the switching fabric, with attached memories. Tasks mapped onto it include MAC classify, MPLS classify, access control, CAR, routing, FIB & NetFlow, accounting & ICMP, MLPPP, WRED, and queuing toward the output.]
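The division of labor among the EZchip TOP stages (TOPparse, TOPsearch, TOPresolve, TOPmodify) can be mimicked in software for a single IPv4 packet. This is only a toy under stated assumptions. The stage roles follow the slide, but the route table, the queue-selection policy, and all function names are hypothetical, and the header is assumed to be 20 bytes with no options.

```python
# Toy walk through the four EZchip TOP stages for one IPv4 packet.
# Stage roles follow the slide; the route table, the queue choice, and
# every function name here are hypothetical. Assumes a 20-byte IPv4
# header with no options.
import ipaddress
import struct

ROUTES = [  # hypothetical forwarding table: (prefix, output port)
    (ipaddress.ip_network("0.0.0.0/0"), 0),
    (ipaddress.ip_network("10.0.0.0/8"), 1),
    (ipaddress.ip_network("10.1.0.0/16"), 2),
]


def ipv4_checksum(header: bytes) -> int:
    """One's-complement sum of 16-bit words (RFC 791 header checksum)."""
    total = sum(word for (word,) in struct.iter_unpack("!H", header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def top_parse(packet: bytes) -> dict:
    """TOPparse: extract the header fields later stages need."""
    return {"tos": packet[1], "ttl": packet[8],
            "dst": ipaddress.IPv4Address(packet[16:20])}


def top_search(fields: dict) -> int:
    """TOPsearch: table lookup; the longest matching prefix wins."""
    return max((net.prefixlen, port) for net, port in ROUTES if fields["dst"] in net)[1]


def top_resolve(fields: dict, port: int):
    """TOPresolve: forwarding decision and queue selection.

    Drops packets whose TTL would expire; picks the queue from the
    IP precedence bits (an illustrative policy, not EZchip's)."""
    if fields["ttl"] <= 1:
        return None
    return port, fields["tos"] >> 5


def top_modify(packet: bytes) -> bytes:
    """TOPmodify: decrement TTL and rewrite the header checksum."""
    hdr = bytearray(packet[:20])
    hdr[8] -= 1
    hdr[10:12] = b"\x00\x00"
    hdr[10:12] = struct.pack("!H", ipv4_checksum(bytes(hdr)))
    return bytes(hdr) + packet[20:]


def process(packet: bytes):
    """Run parse -> search -> resolve -> modify; None means the packet was dropped."""
    fields = top_parse(packet)
    decision = top_resolve(fields, top_search(fields))
    if decision is None:
        return None
    port, queue = decision
    return port, queue, top_modify(packet)


def make_packet(dst: str, ttl: int = 64, tos: int = 0) -> bytes:
    """Build a header-only IPv4 packet with a valid checksum (test helper)."""
    hdr = struct.pack("!BBHHHBBH4s4s", 0x45, tos, 20, 0, 0, ttl, 17, 0,
                      ipaddress.IPv4Address("192.0.2.1").packed,
                      ipaddress.IPv4Address(dst).packed)
    return hdr[:10] + struct.pack("!H", ipv4_checksum(hdr)) + hdr[12:]
```

For example, `process(make_packet("10.1.2.3", tos=0xA0))` selects port 2 and queue 5 and returns the header with TTL 63 and a recomputed checksum. In a TOP-based NP, each function would run on its own task-optimized engine, so four packets can be in flight at once.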
Multiple Processor Approach
[Figure: multiple processor approach. Source: Networld+Interop 2000]

Multiple Processor Approach (continued)
[Figure: multiple processor approach. Source: CSIX & CPIX]

Example) BCM1250 of Broadcom
[Figure: BCM1250 block diagram. Dual 1GHz MIPS64 processor (two SB-1 cores) with L2 cache on the 256-bit ZBbus; DDR memory controller; data mover; I/O bridge; three 10/100/1000 MACs, each with DMA and FIFO; 32-bit PCI; HyperTransport (HT) and host bridge; two serial interfaces with DMA; dual SMBus; GPIO/interrupt/PCMCIA; generic bus and flash I/O; JTAG and debug/bus trace.]
Source: http://www.broadcom.com/

Example) CNP810 of Clearwater Networks
• Uses an SMT core instead of multiple processors
[Figure: CNP810 block diagram. SMT core; PMU (PMMU, queues); RTU; Xpress Switch (peak 225Gbps); DDR1 to DDR4 with DMA1 to DMA4; FIFO; SPI-4 and two SPI-3 ports; PCI-X and PCI/HT bridge; UART1, UART2; SPI EEPROM; JTAG & trace.]
Example) IXP-2800 of Intel

Reference on the SMT core
– Online lecture "Overview of SMT Microprocessor Architecture"
[Figure: an SMT core. Four thread contexts, each with its own register file and program counter, share one set of execution resources (ALUs, multipliers, FPUs).]
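Keeping several thread contexts lets the shared execution resources stay busy while individual threads wait on memory. The sketch below is a simplified timing model under explicit assumptions: one instruction issues per cycle, a thread runs until it issues a memory access and then swaps out (switch-on-event, in the spirit of the IXP-1200 CTX_ARB "context swap and wake on event"), and every access has a fixed latency. A real SMT core can also issue from several threads in the same cycle, and this model does not capture that. All names are hypothetical.

```python
# Simplified timing model of latency hiding with several thread contexts.
# Assumptions (not from the slide): one instruction issues per cycle,
# a thread runs until it issues a memory access and then swaps out
# (switch-on-event), and every access takes a fixed latency.
# A real SMT core can also issue from several threads in one cycle.
from dataclasses import dataclass, field


@dataclass
class ThreadContext:
    """Per-thread state that each context keeps to itself."""
    tid: int
    pc: int = 0                                                # program counter
    registers: list = field(default_factory=lambda: [0] * 16)  # private register file (not modeled further)
    ready_at: int = 0                                          # cycle its pending memory access completes


def utilization(n_threads: int, compute: int, mem_latency: int, cycles: int) -> float:
    """Fraction of cycles in which the shared execution resources do useful work."""
    threads = [ThreadContext(t) for t in range(n_threads)]
    current, busy = 0, 0
    for now in range(cycles):
        # Keep the current thread if it is ready, else take the next ready one.
        for i in range(n_threads):
            t = threads[(current + i) % n_threads]
            if t.ready_at <= now:
                current = t.tid
                break
        else:
            continue                   # every context waits on memory: idle cycle
        t.pc += 1                      # issue one instruction
        busy += 1
        if t.pc % compute == 0:        # every `compute` instructions, one memory access
            t.ready_at = now + 1 + mem_latency
    return busy / cycles


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(n, "thread(s):", utilization(n, compute=4, mem_latency=12, cycles=160))
```

With 4 instructions between accesses and a 12-cycle latency, utilization rises from 0.25 to 0.5 to 1.0 as the context count goes from one to two to four. This matches min(1, n·compute/(compute + mem_latency)): four contexts are enough to hide the latency completely.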
More Commercial Network Processors
• Reference [1]
  – Chapter 15

Instruction Set Architecture
• Dedicated instruction set
  – Network specific ISA
• Modified RISC instruction set
  – MIPS ISA + special instructions
Instruction Set Architecture
• Dedicated instruction set
  – Small fast instruction set
    ▪ Fewer than 40 instructions
  – Arithmetic, rotate, and shift
  – Strong bit manipulation
  – Bit field extraction
  – Special functions; CRC or Hash
  – Load/store on various data sizes
  – Load/store on various kinds of memory
  – Conditional branch
    ▪ Different conditions from common RISCs

Dedicated Instruction Set
• ISA of Intel IXP-1200 micro-engine
  – Branch
    ▪ Common conditional branch
    ▪ BR_BSET, BR_BCLR
      – Branch if bit set or clear
    ▪ BR=BYTE, BR!=BYTE
      – Branch if byte equal or not equal
    ▪ Branch on event or signal
    ▪ Common jump and return
  – Arithmetic operation
    ▪ ALU
      – Arithmetic operation
    ▪ ALU_SHF
      – Arithmetic operation and shift
  – Field extraction
    ▪ DBL_SHF
      – Concatenate two words and shift

Dedicated Instruction Set (continued)
• ISA of Intel IXP-1200 micro-engine
  – Reference
    ▪ CSR
    ▪ FIFO
    ▪ PCI bus
    ▪ Scratchpad memory
    ▪ SDRAM
    ▪ SRAM
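Several of these operations fold what would be a multi-instruction sequence on a general RISC into one micro-engine instruction. These include a bit or byte test combined with a branch, a shift across two registers, and bit field extraction. The Python functions below only model the results as a rough illustration. The function names, the byte numbering (byte 0 = least significant), and the right-shift direction chosen for DBL_SHF are assumptions, not the exact IXP-1200 semantics.

```python
# Behavioral sketch of a few IXP-1200-style operations on 32-bit words.
# Function names, byte numbering (byte 0 = least significant), and the
# right-shift direction chosen for DBL_SHF are assumptions for
# illustration, not the exact micro-engine semantics.
MASK32 = 0xFFFF_FFFF


def br_bset_taken(word: int, bit: int) -> bool:
    """Condition of BR_BSET (branch if bit set); BR_BCLR is its negation."""
    return (word >> bit) & 1 == 1


def br_byte_taken(word: int, byte_index: int, value: int) -> bool:
    """Condition of BR=BYTE (branch if the selected byte equals value)."""
    return (word >> (8 * byte_index)) & 0xFF == value


def dbl_shf(hi: int, lo: int, shift: int) -> int:
    """Concatenate hi:lo into 64 bits, shift right, keep the low 32 bits."""
    return ((((hi & MASK32) << 32) | (lo & MASK32)) >> shift) & MASK32


def extract_field(word: int, lsb: int, width: int) -> int:
    """Bit field extraction: `width` bits starting at bit `lsb`."""
    return (word >> lsb) & ((1 << width) - 1)


def insert_field(word: int, value: int, lsb: int, width: int) -> int:
    """Overwrite `width` bits of word starting at `lsb` with the low bits of value."""
    mask = ((1 << width) - 1) << lsb
    return ((word & ~mask) | ((value << lsb) & mask)) & MASK32


if __name__ == "__main__":
    # Packet bytes 11 22 33 44 | 55 66 77 88 held in two big-endian words;
    # a 32-bit field starting at byte offset 2 straddles both words.
    print(hex(dbl_shf(0x11223344, 0x55667788, 16)))   # 0x33445566
    first_word = 0x4500_0054                           # IPv4: version, IHL, TOS, total length
    print(extract_field(first_word, 28, 4), extract_field(first_word, 24, 4))  # 4 5
```

The DBL_SHF case is typical of packet processing. Header fields rarely line up with word boundaries in memory, so a single concatenate-and-shift replaces the shift, shift, and OR sequence a plain RISC core would need.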
Dedicated Instruction Set (continued)
• ISA of Intel IXP-1200 micro-engine
  – Local register instructions
    ▪ FIND_BSET (_WITH_MASK)
    ▪ IMMED
    ▪ IMMED_Bn, IMMED_Wn
    ▪ LD_FIELD (_W_CLR)
    ▪ LOAD_ADDR
    ▪ LOAD_BSET_RESULTn
  – Misc.
    ▪ HASH
    ▪ NOP
    ▪ CTX_ARB
      – Context swap and wake on event
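A common use of a find-first-set instruction such as FIND_BSET is scanning a bitmap, for example of non-empty queues or free buffers, in a single step. LD_FIELD updates only selected bytes of a register. The sketch below models results only. The real micro-engine differs in operand widths and returns FIND_BSET results through LOAD_BSET_RESULTn, and the function names and the example bitmap are hypothetical.

```python
# Behavioral sketch of two IXP-1200 local-register operations.
# Simplified: operand widths and the way results are returned
# (LOAD_BSET_RESULTn for FIND_BSET) differ on the real micro-engine;
# the function names are hypothetical.
from typing import Optional

MASK32 = 0xFFFF_FFFF


def find_bset(word: int, mask: int = MASK32) -> Optional[int]:
    """Index of the least-significant set bit of word & mask, or None if none is set."""
    v = word & mask
    if v == 0:
        return None
    return (v & -v).bit_length() - 1   # v & -v isolates the lowest set bit


def ld_field(dest: int, src: int, byte_mask: int) -> int:
    """Replace the bytes of dest selected by a 4-bit byte mask with the bytes of src."""
    m = 0
    for i in range(4):
        if (byte_mask >> i) & 1:
            m |= 0xFF << (8 * i)
    return ((dest & ~m) | (src & m)) & MASK32


if __name__ == "__main__":
    ready_queues = 0b0101_1000      # hypothetical bitmap: queues 3, 4 and 6 have packets
    print(find_bset(ready_queues))                    # 3
    print(find_bset(ready_queues, mask=0b1111_0000))  # 4
```

The mask argument corresponds to the _WITH_MASK variant. It restricts the search, for example, to the queues a given thread is allowed to serve.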
Instruction Set Architecture
• Modified RISC ISA
  – Mostly based on MIPS ISA
  – Remove some instructions
  – Add network specific instructions
  – Example)
    ▪ Clearwater Networks, CNP810
    ▪ Broadcom BCM-1250
    ▪ Motorola C-5

Modified RISC ISA
• ISA of Motorola C-5
  – MIPS based ISA
    ▪ Removed instructions
      – Multiply, divide
      – FPU instructions
      – Unaligned load/store
      – Move to high/low
    ▪ Added instructions
      – CLZ
      – Insert/extract bit field
      – Conditional branches

Data Transfer
• Intensive data movement
  – Memory Subsystem
  – Internal Transfer Mechanism
  – External Interfaces

Memory Subsystem
• Streaming data
  – Not reused
  – Not suitable for cache
  – On-chip SRAM buffer
  – Cache + special hardware
    ▪ Ex) Packet Management Unit
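Memory Subsystem
• Table data
  – Various lookup tables
    ▪ L3 routing table, L2 forwarding, security policy table, etc.
  – Cache memory
  – CAM (Content Addressable Memory)
    ▪ Suitable for table lookup
    ▪ Small capacity

Memory Subsystem (continued)
[Figure: SRAM versus CAM. An SRAM takes an address and returns the stored data; a CAM compares the search contents against every stored entry (=) and, on a hit, returns the associated data.]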
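An SRAM maps an address to data, while a CAM maps contents to a location. Every CAM entry has its own comparator, which is why a CAM answers a lookup in one step but holds relatively few entries. The behavioral sketch below uses a hypothetical L2 forwarding table. The class and method names are illustrative, and the software loop stands in for comparisons that hardware performs in parallel.

```python
# Behavioral model of a small binary CAM used as an L2 forwarding table.
# Hardware compares the key with every entry in the same cycle; this model
# loops over the entries instead. Class and method names are hypothetical.
from typing import List, Optional, Tuple


class CAM:
    def __init__(self, depth: int):
        self.keys: List[Optional[int]] = [None] * depth  # stored contents
        self.data: List[int] = [0] * depth               # associated data (e.g. output port)

    def write(self, index: int, key: int, data: int) -> None:
        self.keys[index] = key
        self.data[index] = data

    def search(self, key: int) -> Optional[Tuple[int, int]]:
        """Return (index, data) of the lowest-indexed matching entry, or None on a miss."""
        matches = [i for i, k in enumerate(self.keys) if k == key]  # one comparator per entry
        if not matches:
            return None
        hit = min(matches)                                           # priority encoder
        return hit, self.data[hit]


if __name__ == "__main__":
    table = CAM(depth=4)
    table.write(0, 0x00005E005301, 3)   # hypothetical MAC address -> port 3
    table.write(1, 0x020000000001, 7)   # hypothetical MAC address -> port 7
    print(table.search(0x020000000001))  # (1, 7)
    print(table.search(0x0000DEADBEEF))  # None
```

The priority encoder (lowest matching index wins) only matters when entries overlap, which is the normal case for ternary CAMs used in prefix or ACL lookups.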
Internal Transfer Mechanism
• Internal bus
• Hardware FIFO
• On-chip shared memory

Internal Transfer Mechanism (continued)
• Internal bus
  – Multiple units are attached to an internal bus
  – Centralized control (bus arbiter)
  – Multiple DMA engines
  – 256~512 bits wide

Internal Transfer Mechanism (continued)
• Hardware FIFO
[Figure: Input stream → Classify → Lookup → Modify → Queueing, with hardware FIFOs between the stages.]

External Interfaces
• Standard and specialized bus interfaces
  – USB, PCI, LA-2 (by NPF)
• External memory interfaces
• Direct I/O interfaces
  – SPI, SFI, serial line
• Switching fabric interfaces
  – CSIX standard
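The hardware FIFOs that couple pipeline stages in the internal transfer mechanism decouple the stages' timing. When the downstream stage falls behind, the FIFO fills and stalls the upstream stage. The sketch below makes illustrative assumptions: the producer stage offers one packet per cycle, the consumer takes one every `consumer_period` cycles, and a full FIFO refuses the push. The names are hypothetical.

```python
# Two pipeline stages coupled by a fixed-depth hardware FIFO.
# Assumptions (illustrative): the upstream stage produces one packet per
# cycle, the downstream stage consumes one every `consumer_period` cycles,
# and a full FIFO stalls the producer (back-pressure).
from collections import deque


class BoundedFIFO:
    def __init__(self, depth: int):
        self.depth = depth
        self.items = deque()

    def push(self, item) -> bool:
        """Enqueue; returns False when full, i.e. the producer must stall."""
        if len(self.items) >= self.depth:
            return False
        self.items.append(item)
        return True

    def pop(self):
        """Dequeue the oldest item, or None when empty (consumer idles)."""
        return self.items.popleft() if self.items else None


def simulate_stages(cycles: int, depth: int, consumer_period: int):
    """Return (produced, consumed, producer_stall_cycles)."""
    fifo = BoundedFIFO(depth)
    produced = consumed = stalls = 0
    for now in range(cycles):
        if now % consumer_period == 0 and fifo.pop() is not None:
            consumed += 1                 # downstream stage takes a packet
        if fifo.push(produced):
            produced += 1                 # upstream stage hands off a packet
        else:
            stalls += 1                   # FIFO full: upstream waits this cycle
    return produced, consumed, stalls


if __name__ == "__main__":
    print(simulate_stages(20, depth=4, consumer_period=1))
    print(simulate_stages(20, depth=4, consumer_period=2))
```

Once the FIFO fills, the faster stage is throttled to the slower stage's rate (one packet every two cycles in the second run). FIFO depth absorbs short bursts but cannot fix a sustained rate mismatch between stages.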
Example) Data Transfer Units of CNP810
[Figure: CNP810 data transfer units: high-speed external memory (DDR1 to DDR4) with DMA1 to DMA4; on-chip memory (packet cache); internal bus interface; direct I/O interfaces (SPI-4, two SPI-3); PCI-X; Xpress Switch (peak 225Gbps); SMT core; Packet Management Unit (PMMU, queues); RTU; UART1, UART2; SPI EEPROM; JTAG & trace.]

External Interfaces (continued)
• External memory
  – Fast DRAM
    ▪ DDR-SDRAM, QDR-SDRAM
      – CNP810, BCM-1250
    ▪ RDRAM
      – IXP2400, IXP2800
• Memory bandwidth
  – NPUs > General Purpose Processors
Benchmarks
• Hard to compare the performance of network processors
  – Wide range of target applications
  – Various operating environments
• Netbench
  – http://cares.icsl.ucla.edu/NetBench/
• Commbench
  – http://ccrc.wustl.edu/~wolf/cb/

Netbench
• Micro-level algorithms
  – CRC: CRC32
  – TL: Table Lookup (radix-tree routing table)
• IP-level algorithms
  – Route: IPv4 routing
  – DRR: scheduling method
  – NAT: Network Address Translation
  – IPCHAINS: firewall application
• Application-level algorithms
  – URL: URL-based switching
  – DH: public key encryption mechanism
  – MD5: Message Digest algorithm (security)
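The CRC32 kernel is "micro-level" because its inner loop is nothing but shifts and XORs, exactly what dedicated CRC instructions accelerate. The sketch below implements the standard reflected CRC-32 used by Ethernet and zlib (polynomial 0xEDB88320, initial value and final XOR 0xFFFFFFFF) in a bit-serial and a table-driven form. It is not NetBench's source code, and the function names are illustrative.

```python
# Standard reflected CRC-32 (Ethernet / zlib): bit-serial and table-driven.
# A sketch for illustration, not the NetBench kernel's source.
import zlib

POLY_REFLECTED = 0xEDB88320  # CRC-32 polynomial 0x04C11DB7, bit-reversed


def crc32_bitwise(data: bytes) -> int:
    """One shift/XOR step per input bit."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ POLY_REFLECTED if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF


def make_table() -> list:
    """Precompute the CRC of every possible byte value."""
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ POLY_REFLECTED if c & 1 else c >> 1
        table.append(c)
    return table


TABLE = make_table()


def crc32_table(data: bytes) -> int:
    """One table lookup per input byte: trades 1 KB of memory for 8x fewer steps."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = (crc >> 8) ^ TABLE[(crc ^ byte) & 0xFF]
    return crc ^ 0xFFFFFFFF


if __name__ == "__main__":
    print(hex(crc32_bitwise(b"123456789")), hex(crc32_table(b"123456789")),
          hex(zlib.crc32(b"123456789")))
```

All three calls print the standard check value 0xcbf43926. The table-driven version shows the usual software trade-off of memory accesses for ALU work, which a hardware CRC unit avoids entirely.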
Commbench
• Header processing applications
  – RTR: Radix-Tree Routing table lookup
  – FRAG: IP-packet fragmentation
  – DRR: scheduling algorithm
  – TCP: TCP traffic monitoring
• Payload processing applications
  – CAST: CAST-128 block cipher algorithm
  – ZIP: data compression
  – REED: Reed-Solomon forward error correction
  – JPEG: image compression
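Both RTR (Commbench) and TL (Netbench) measure longest-prefix-match lookup in a radix-tree routing table. The sketch below uses a binary (unibit) trie, a simplified radix tree without path compression, rather than the benchmarks' own code. Class names and next-hop labels are hypothetical.

```python
# Longest-prefix-match lookup in a binary (unibit) trie: a simplified radix
# tree without path compression, not the code used by NetBench or CommBench.
# Class names and next-hop labels are hypothetical.
import ipaddress
from typing import List, Optional


class TrieNode:
    __slots__ = ("children", "next_hop")

    def __init__(self) -> None:
        self.children: List[Optional["TrieNode"]] = [None, None]
        self.next_hop: Optional[str] = None


class RoutingTrie:
    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, prefix: str, next_hop: str) -> None:
        net = ipaddress.IPv4Network(prefix)
        addr = int(net.network_address)
        node = self.root
        for i in range(net.prefixlen):           # walk one address bit per level
            bit = (addr >> (31 - i)) & 1
            if node.children[bit] is None:
                node.children[bit] = TrieNode()
            node = node.children[bit]
        node.next_hop = next_hop

    def lookup(self, address: str) -> Optional[str]:
        """Return the next hop of the longest matching prefix, or None."""
        addr = int(ipaddress.IPv4Address(address))
        node = self.root
        best = self.root.next_hop                # default route, if any
        for i in range(32):
            node = node.children[(addr >> (31 - i)) & 1]
            if node is None:
                break
            if node.next_hop is not None:        # remember the deepest match so far
                best = node.next_hop
        return best


if __name__ == "__main__":
    rt = RoutingTrie()
    for prefix, hop in [("0.0.0.0/0", "A"), ("10.0.0.0/8", "B"),
                        ("10.1.0.0/16", "C"), ("10.1.2.0/24", "D")]:
        rt.insert(prefix, hop)
    print(rt.lookup("10.1.2.3"), rt.lookup("10.1.9.9"), rt.lookup("192.0.2.1"))  # D C A
```

Each lookup may visit up to 32 nodes, and each visit is a dependent memory read. That is why table lookup stresses an NP's memory subsystem and why fast SRAM or a CAM is used for lookup tables.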