Intel® 82599 10 GbE Controller: Software Initialization and Diagnostics
• PCIe reset (PERST and in-band)
• D3hot --> D0
• FLR
• Software reset by the PF
The 82599 sets the RSTI bits in all the VFMailbox registers. Once the reset completes, each VF might read its VFMailbox register to identify a reset in progress.
Once the PF completes configuring the device, it clears the CTRL_EXT.PFRSTD bit. As a result, the 82599 clears the RSTI bits in all the VFMailbox registers and sets the RSTD (Reset Done) bits in all the VFMailbox registers.
Until an RSTD condition is detected, the VFs should access only the VFMailbox register and should not attempt to activate the interrupt mechanism or the transmit and receive process.
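As a rough illustration of the VF-side rule above, the sketch below decides from a VFMailbox value whether a VF may go beyond mailbox access. The RSTI and RSTD bit positions and the function name are assumptions made for this example and should be checked against the VFMailbox register description.

```python
# Assumed bit positions, for illustration only; verify them against the
# VFMailbox register description before relying on them.
VFMAILBOX_RSTI = 1 << 6  # reset indication (assumed position)
VFMAILBOX_RSTD = 1 << 7  # reset done (assumed position)


def vf_may_start(vfmailbox: int) -> bool:
    """Return True once RSTD is set and RSTI is clear in a VFMailbox value."""
    reset_in_progress = bool(vfmailbox & VFMAILBOX_RSTI)
    reset_done = bool(vfmailbox & VFMAILBOX_RSTD)
    return reset_done and not reset_in_progress
```

A VF driver would typically re-read the register until this check passes before enabling interrupts or the transmit and receive paths.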
4.6.11 DCB Configuration
After power up or device reset, DCB and all types of FC are disabled by default, and a single TC and packet buffer (PB0) is used. In this mode, the host can exchange information via the DCX protocol to determine the number of TCs to configure. Before the device is set to multiple TCs, it should be reset (software reset).
The registers concerned with setting the number of TCs are: RXPBSIZE[0-7], TXPBSIZE[0-7], TXPBTHRESH, MRQC, MTQC, and RTRUP2TC, along with the following bits: RTRPCS.RAC, RTTDCS.TDPAC, RTTDCS.VMPAC, and RTTPCS.TPPAC. They cannot be modified on the fly, but only after device reset. Packet buffers with a non-zero size must be allocated from PB0 and up.
Rate parameters and bandwidth allocation to VMs can be modified on the fly without disturbing traffic flows.
4.6.11.1 CPU Latency Considerations
When the CPU detects an idle period of some length, it enters a low-power sleep state. When traffic arrives from the network, it takes time for the CPU to wake and respond (such as to snoop). During that period, Rx packets are not posted to system memory.
If the entry time to the sleep state is too short, the CPU might move in and out of the sleep state between packets, which impacts latency and throughput. 100 µs was defined as a safe margin for entry time to avoid such effects.
Each time the CPU is in low power, received packets need to be stored (or dropped) in the 82599 for the duration of the exit time. Given 64 KB Rx packet buffers per TC in the 82599, Priority Flow Control (PFC) does not spread (or a packet is not dropped) provided that the CPU exits its low power state within 50 µs.
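The exit-time figure can be sanity-checked with simple arithmetic, assuming the times quoted above are in microseconds and that traffic for one TC arrives at the full 10 Gb/s line rate. The helper name below is hypothetical, and the sketch ignores framing overhead such as preamble and inter-packet gap.

```python
def buffer_fill_time_us(buffer_bytes: int, link_gbps: float) -> float:
    """Time, in microseconds, for line-rate traffic to fill a packet buffer."""
    bits = buffer_bytes * 8
    bits_per_us = link_gbps * 1000  # 1 Gb/s equals 1000 bits per microsecond
    return bits / bits_per_us


# A 64 KB Rx packet buffer on a 10 Gb/s link fills in roughly 52.4 us,
# which is in line with the exit-time budget quoted above.
fill_us = buffer_fill_time_us(64 * 1024, 10)
```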
4.6.11.2 Link Speed Change Procedure
Each time the link status or speed changes, hardware automatically rescales the transmit rates that were loaded by software relative to the new link speed. For example, if software set a rate limiter to 500 Mb/s at a 10 GbE link speed, hardware changes it to 50 Mb/s when the link speed changes to 1 GbE.
Since transmit rates must be treated as absolute rate limitations (expressed in Mb/s, regardless of the link speed), on such occasions software is responsible for clearing all the transmit rate limiters via the BCN_CLEAR_ALL bit in the RTTBCNRD register and/or reloading each transmit rate with the correct value relative to the new link speed. In the previous example, the new transmit rate value loaded by software must be multiplied by 10 to keep the rate limitation at 500 Mb/s.
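The scaling in this example can be sketched as follows, using the Rate_Factor = Link_Speed / Target_Rate definition from Section 4.6.11.4. The function names are hypothetical, and the sketch assumes that hardware keeps the programmed factor unchanged across a link speed change, which is one way to read the behavior described above.

```python
from fractions import Fraction


def rate_factor(link_mbps: int, target_mbps: int) -> Fraction:
    """Rate_Factor = Link_Speed / Target_Rate."""
    return Fraction(link_mbps, target_mbps)


def effective_rate_mbps(link_mbps: int, factor: Fraction) -> Fraction:
    """Rate actually enforced for a given link speed and programmed factor."""
    return Fraction(link_mbps) / factor


factor_at_10g = rate_factor(10_000, 500)                     # 20
rate_after_drop = effective_rate_mbps(1_000, factor_at_10g)  # 50 Mb/s
factor_at_1g = rate_factor(1_000, 500)                       # 2
```

Restoring the 500 Mb/s limit at 1 GbE therefore means raising the rate tenfold, which is the same as programming a Rate_Factor ten times smaller.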
4.6.11.3 Initial Configuration Flow
Only the following configuration modes are allowed.
4.6.11.3.1 General Case: DCB-on, VT-on
1. Configure packet buffers, queues, and traffic mapping:
— 8 TCs mode: packet buffer size and threshold, typically
RXPBSIZE[0-7].SIZE=0x40
TXPBSIZE[0-7].SIZE=0x14, but non-symmetrical sizing is also allowed, subject to rules
TXPBTHRESH.THRESH[0-7]=TXPBSIZE[0-7].SIZE minus the maximum expected Tx packet length in this TC
— 4 TCs mode: packet buffer size and threshold, typically
RXPBSIZE[0-3].SIZE=0x80, RXPBSIZE[4-7].SIZE=0x0
TXPBSIZE[0-3].SIZE=0x28, TXPBSIZE[4-7].SIZE=0x0, but non-symmetrical sizing among TCs[0-3] is also allowed, subject to rules
TXPBTHRESH.THRESH[0-3]=TXPBSIZE[0-3].SIZE minus the maximum expected Tx packet length in this TC
TXPBTHRESH.THRESH[4-7]=0x0
— Multiple Receive and Transmit Queue Control (MRQC and MTQC)
• Set MRQC.MRQE to 1xxxb, with the three least significant bits set according to the number of VFs, TCs, and RSS mode as described in Section 8.2.3.7.12.
• Set both the RT_Ena and VT_Ena bits in the MTQC register.
• Set MTQC.NUM_TC_OR_Q according to the number of TCs/VFs enabled as described in Section 8.2.3.7.16.
— Set PFVTCTL.VT_Ena (as the MTQC.VT_Ena)
— Queue Drop Enable (PFQDE): In SR-IOV mode, the QDE bit should be set to 1b in the PFQDE register for all queues. In VMDq mode, the QDE bit should be set to 0b for all queues.
— Split receive control (SRRCTL[0-127]): Drop_En=1 — drop policy for all the queues, in order to avoid crosstalk between VMs
— Rx User Priority (UP) to TC (RTRUP2TC)
— Tx UP to TC (RTTUP2TC)
— DMA TX TCP Maximum Allowed Size Requests (DTXMXSZRQ): set Max_byte_num_req = 0x010 = 4 KB
2. Enable PFC and disable legacy flow control:
— Enable transmit PFC via: FCCFG.TFCE=10b
— Enable receive PFC via: MFLCN.RPFCE=1b
— Disable receive legacy flow control via: MFLCN.RFCE=0b
— See the flow control register descriptions for other registers related to flow control
3. Configure arbiters, per TC[0-1]:
— Tx descriptor plane T1 Config (RTTDT1C) per queue, via setting RTTDQSEL first. Note that the RTTDT1C for queue zero must always be initialized.
— Tx descriptor plane T2 Config (RTTDT2C[0-7])
— Tx packet plane T2 Config (RTTPT2C[0-7])
— Rx packet plane T4 Config (RTRPT4C[0-7])
4. Enable TC and VM arbitration layers:
— Tx Descriptor Plane Control and Status (RTTDCS), bits: TDPAC=1b, VMPAC=1b, TDRM=1b, BDPM=1b, BPBFSM=0b
— Tx Packet Plane Control and Status (RTTPCS): TPPAC=1b, TPRM=1b, ARBD=0x004
— Rx Packet Plane Control and Status (RTRPCS): RAC=1b, RRM=1b
5. Set the SECTXMINIFG.SECTXDCB field to 0x1F.
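The typical packet buffer values in step 1 are consistent with splitting a fixed total evenly among the enabled TCs. The sketch below illustrates this under stated assumptions: the SIZE fields are taken to be in KB units (0x40 matches the 64 KB per-TC Rx buffer mentioned in Section 4.6.11.1), the totals are the 0x200 and 0xA0 values that the DCB-off modes assign to PB0, and the dash in the TXPBTHRESH lines is read as a subtraction. The function names and the 10 KB maximum packet length are hypothetical.

```python
RX_PB_TOTAL_KB = 0x200  # Rx total, as assigned to PB0 when DCB is off
TX_PB_TOTAL_KB = 0xA0   # Tx total, as assigned to PB0 when DCB is off


def symmetric_sizes(total_kb: int, num_tcs: int, num_buffers: int = 8) -> list:
    """Split a packet buffer total evenly over the first num_tcs buffers."""
    if not 1 <= num_tcs <= num_buffers or total_kb % num_tcs:
        raise ValueError("unsupported TC count for an even split")
    per_tc = total_kb // num_tcs
    # Buffers are allocated from PB0 upward; unused buffers get size 0.
    return [per_tc] * num_tcs + [0] * (num_buffers - num_tcs)


def tx_thresholds(tx_sizes_kb: list, max_tx_packet_kb: int) -> list:
    """Threshold = buffer size minus the maximum expected Tx packet length."""
    return [size - max_tx_packet_kb if size else 0 for size in tx_sizes_kb]


rx_8tc = symmetric_sizes(RX_PB_TOTAL_KB, 8)  # 0x40 per TC
tx_4tc = symmetric_sizes(TX_PB_TOTAL_KB, 4)  # 0x28 for TC0-3, 0 for TC4-7
thresh_8tc = tx_thresholds(symmetric_sizes(TX_PB_TOTAL_KB, 8), 10)
```

Non-symmetrical sizing is also allowed by the text, so an even split is only the typical case.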
4.6.11.3.2 DCB-On, VT-Off
Set the configuration bits as specified in Section 4.6.11.3.1, with the following exceptions:
• Configure packet buffers, queues, and traffic mapping:
— MRQC and MTQC
• Set MRQE to 0xxxb, with the three least significant bits set according to the number of TCs and RSS mode
• Set the RT_Ena bit and clear the VT_Ena bit in the MTQC register.
• Set MTQC.NUM_TC_OR_Q according to the number of TCs enabled
— Clear PFVTCTL.VT_Ena (as the MRQC.VT_Ena)
• Allow no-drop policy in Rx:
— PFQDE: The QDE bit should be set to 0b in the PFQDE register for all queues, enabling per-queue policy by the SRRCTL[n] setting.
— Split Receive Control (SRRCTL[0-127]): The Drop_En bit should be set per receive queue according to the required drop / no-drop policy of the TC of the queue.
• Tx descriptor plane control and status (RTTDCS) bits:
— TDPAC=1b, VMPAC=1b, TDRM=1b, BDPM=0b if Tx rate limiting is not enabled and 1b if Tx rate limiting is enabled, BPBFSM=0b.
• Disable VM arbitration layer:
— Clear RTTDT1C register, per each queue, via setting RTTDQSEL first
— RTTDCS.VMPAC=0b
4.6.11.3.3 DCB-Off, VT-On
Set the configuration bits as specified for the general case (DCB-on, VT-on), with the following exceptions:
• Disable multiple packet buffers and allocate all queues to PB0:
— RXPBSIZE[0].SIZE=0x200, RXPBSIZE[1-7].SIZE=0x0
— TXPBSIZE[0].SIZE=0xA0, TXPBSIZE[1-7].SIZE=0x0
— TXPBTHRESH.THRESH[0]=0xA0 minus the maximum expected Tx packet length in this TC
— TXPBTHRESH.THRESH[1-7]=0x0
— MRQC and MTQC
• Set MRQE to 1xxxb, with the three least significant bits set according to the number of VFs and RSS mode
• Clear the RT_Ena bit and set the VT_Ena bit in the MTQC register.
• Set MTQC.NUM_TC_OR_Q according to the number of VFs enabled
— Set PFVTCTL.VT_Ena (as the MRQC.VT_Ena)
— Rx UP to TC (RTRUP2TC), UPnMAP=0b, n=0,...,7
— Tx UP to TC (RTTUP2TC), UPnMAP=0b, n=0,...,7
— DMA TX TCP Maximum Allowed Size Requests (DTXMXSZRQ): set Max_byte_num_req = 0xFFF = 1 MB
• Disable PFC and enable legacy flow control:
— Disable receive PFC via: MFLCN.RPFCE=0b
— Enable transmit legacy flow control via: FCCFG.TFCE=01b
— Enable receive legacy flow control via: MFLCN.RFCE=1b
• Configure VM arbiters only, reset others:
— Tx Descriptor Plane T1 Config (RTTDT1C) per pool, via setting RTTDQSEL first for the pool index. Clear RTTDT1C for other queues. Note that the RTTDT1C for queue zero must always be initialized.
— Clear RTTDT2C[0-7] registers
— Clear RTTPT2C[0-7] registers
— Clear RTRPT4C[0-7] registers
• Disable TC arbitrations while enabling the packet buffer free space monitor:
— Tx Descriptor Plane Control and Status (RTTDCS), bits: TDPAC=0b, VMPAC=1b, TDRM=0b, BDPM=1b, BPBFSM=0b
— Tx Packet Plane Control and Status (RTTPCS): TPPAC=0b, TPRM=0b, ARBD=0x224
— Rx Packet Plane Control and Status (RTRPCS): RAC=0b, RRM=0b
4.6.11.3.4 DCB-Off, VT-Off
Set the configuration bits as specified in Section 4.6.11.3.1, with the following exceptions:
• Disable multiple packet buffers and allocate all queues and traffic to PB0:
— RXPBSIZE[0].SIZE=0x200, RXPBSIZE[1-7].SIZE=0x0
— TXPBSIZE[0].SIZE=0xA0, TXPBSIZE[1-7].SIZE=0x0
— TXPBTHRESH.THRESH[0]=0xA0 minus the maximum expected Tx packet length in this TC
— TXPBTHRESH.THRESH[1-7]=0x0
— MRQC and MTQC
• Set MRQE to 0xxxb, with the three least significant bits set according to the RSS mode
• Clear both the RT_Ena and VT_Ena bits in the MTQC register.
• Set MTQC.NUM_TC_OR_Q to 00b.
— Clear PFVTCTL.VT_Ena (as the MRQC.VT_Ena)
— Rx UP to TC (RTRUP2TC), UPnMAP=0b, n=0,...,7
— Tx UP to TC (RTTUP2TC), UPnMAP=0b, n=0,...,7
— DMA TX TCP Maximum Allowed Size Requests (DTXMXSZRQ): set Max_byte_num_req = 0xFFF = 1 MB
• Allow no-drop policy in Rx:
— PFQDE: The QDE bit should be set to 0b in the PFQDE register for all queues, enabling per-queue policy by the SRRCTL[n] setting.
— Split Receive Control (SRRCTL[0-127]): The Drop_En bit should be set per receive queue according to the required drop / no-drop policy of the TC of the queue.
• Disable PFC and enable legacy flow control:
— Disable receive PFC via: MFLCN.RPFCE=0b
— Enable receive legacy flow control via: MFLCN.RFCE=1b
— Enable transmit legacy flow control via: FCCFG.TFCE=01b
• Reset all arbiters:
— Clear RTTDT1C register, per each queue, via setting RTTDQSEL first
— Clear RTTDT2C[0-7] registers
— Clear RTTPT2C[0-7] registers
— Clear RTRPT4C[0-7] registers
• Disable TC and VM arbitration layers:
— Tx Descriptor Plane Control and Status (RTTDCS), bits: TDPAC=0b, VMPAC=0b, TDRM=0b, BDPM=1b, BPBFSM=1b
— Tx Packet Plane Control and Status (RTTPCS): TPPAC=0b, TPRM=0b, ARBD=0x224
— Rx Packet Plane Control and Status (RTRPCS): RAC=0b, RRM=0b
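The four allowed modes differ mainly in a handful of enable bits. The sketch below collects the ones that follow directly from whether DCB and VT are on, as listed in Sections 4.6.11.3.1 through 4.6.11.3.4. The function name and dictionary keys are hypothetical, `MRQE_top_bit` stands for the most significant bit of MRQC.MRQE, and the remaining per-mode settings (packet buffers, arbiters, VMPAC, BDPM, and so on) are deliberately left out.

```python
def dcb_vt_mode_settings(dcb_on: bool, vt_on: bool) -> dict:
    """Summarize the mode-dependent enable bits of the four configuration flows."""
    return {
        "MTQC.RT_Ena": int(dcb_on),
        "MTQC.VT_Ena": int(vt_on),
        "PFVTCTL.VT_Ena": int(vt_on),
        "MRQE_top_bit": int(vt_on),              # 1xxxb with VT on, 0xxxb otherwise
        "FCCFG.TFCE": 0b10 if dcb_on else 0b01,  # PFC with DCB on, legacy FC otherwise
        "MFLCN.RPFCE": int(dcb_on),
        "MFLCN.RFCE": int(not dcb_on),
        "RTTDCS.TDPAC": int(dcb_on),
        "RTTPCS.TPPAC": int(dcb_on),
        "RTRPCS.RAC": int(dcb_on),
    }


general_case = dcb_vt_mode_settings(dcb_on=True, vt_on=True)
plain_mode = dcb_vt_mode_settings(dcb_on=False, vt_on=False)
```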
4.6.11.4 Transmit Rate Scheduler
In some applications it might be useful to set up rate limiters on Tx queues for other usage models (rate-limiting VF traffic, for instance). In all cases, setting a rate limiter on Tx queue N to a TargetRate requires the following settings:
Global Setting
• The Transmit Rate-scheduler memory for all transmit queues must be cleared before rate limiting is enabled on any queue. This memory is accessed by the RTTBCNRC register mapped by the RTTDQSEL.TXDQ_IDX.
• Set global transmit compensation time to the MMW_SIZE in the RTTBCNRM register. Typically MMW_SIZE=0x014 if 9.5 KB (9728-byte) jumbo is supported and 0x004 otherwise.
Per Queue Setting
• Select the requested queue by programming the queue index in RTTDQSEL.TXDQ_IDX
• Program the desired rate as follows:
— Compute the Rate_Factor, which equals Link_Speed / Target_Rate. Link_Speed can be either 10 Gb/s or 1 Gb/s. The Rate_Factor is composed of an integer part and a fraction part. The integer part is a 10-bit field and the fraction part is a 14-bit binary fraction.
— Integer (Rate_Factor) is programmed by the RTTBCNRC.RF_INT[9:0] field
— Fraction (Rate_Factor) is programmed by the RTTBCNRC.RF_DEC[13:0] field. It equals $\mathrm{RF\_DEC}[13] \cdot 2^{-1} + \mathrm{RF\_DEC}[12] \cdot 2^{-2} + \dots + \mathrm{RF\_DEC}[0] \cdot 2^{-14}$
• Enable the Rate Scheduler by setting the RTTBCNRC.RS_ENA bit
Numerical Example
• Target_Rate = 240 Mb/s; Link_Speed = 10 Gb/s
• Rate_Factor = 10 / 0.24 = 41.6666... = 101001.10101010101011b
• RF_DEC = 10101010101011b; RF_INT = 0000101001b
• Therefore, set RTTBCNRC to 0x800A6AAB
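The numerical example can be reproduced with a short sketch. The field positions used here (RF_DEC in bits 13:0, RF_INT in bits 23:14, RS_ENA in bit 31) are inferred from the 0x800A6AAB result above rather than taken from the register description, rounding to the nearest 14-bit fraction is likewise inferred from the example, and the function name is hypothetical.

```python
from fractions import Fraction

RF_DEC_BITS = 14   # fraction part, RF_DEC[13:0]
RF_INT_BITS = 10   # integer part, RF_INT[9:0]
RS_ENA = 1 << 31   # enable bit position inferred from the example value


def rttbcnrc_value(link_mbps: int, target_mbps: int) -> int:
    """Encode Rate_Factor = Link_Speed / Target_Rate as a fixed-point value."""
    if target_mbps <= 0:
        raise ValueError("target rate must be positive")
    factor = Fraction(link_mbps, target_mbps)
    scaled = round(factor * (1 << RF_DEC_BITS))  # nearest 14-bit fraction
    rf_int, rf_dec = divmod(scaled, 1 << RF_DEC_BITS)
    if rf_int >= 1 << RF_INT_BITS:
        raise ValueError("Rate_Factor does not fit in the 10-bit integer field")
    return RS_ENA | (rf_int << RF_DEC_BITS) | rf_dec


value = rttbcnrc_value(10_000, 240)  # 240 Mb/s on a 10 Gb/s link
```

Using exact fractions instead of floating point keeps the last fraction bit stable for ratios such as 41.666..., where the rounding direction matters.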
Note: The IPG pacing feature is a parallel feature to the Tx rate scheduler: IPG pacing is applied to the entire Tx data flow, while the Tx rate scheduler is applied separately to each Tx queue. Therefore, if a single queue is used, either feature can be used to limit the Tx data rate; however, if multiple queues are used, the IPG pacing feature is a better choice for a homogeneous Tx data rate limitation.
4.6.11.5 Configuration Rules
4.6.11.5.1 TC Parameters
Traffic Class
Per 802.1p, priority #7 is the highest priority.
A specific TC can be configured to receive or transmit a specific amount of the total bandwidth available per port.
Bandwidth allocation is defined as a fraction of the total available bandwidth, which can be less than the full Ethernet link bandwidth (if it is bounded by the PCIe bandwidth or by flow control).
Low latency TC should be configured to use the highest priority TC possible (TC 6, 7). The lowest latency is achieved using TC7.
Bandwidth Group (BWGs)
The main reason for having BWGs is to represent different traffic types. A traffic type (such as storage, IPC LAN or manageability) can have more than one TC (for example, one for control traffic and one for the raw data). By grouping these two TCs into a BWG, the user can allocate bandwidth to the storage traffic so that bandwidth left unused by the control traffic can be used by the data traffic and vice versa. This BWG concept supports the converged fabric: each traffic type that used to run on a different fabric can be configured as a BWG and gets its resources as if it were on a different fabric.
1. To configure DCB not to share bandwidth between TCs, each TC should be configured as a separate BWG.
2. There are no limits on the TCs that can be bundled together as a BWG. All TCs can be configured as a single BWG.
3. BWG numbers should be sequential starting from zero until the total number of BWGs minus one.
4. BWG numbers do not imply priority; priority is set only according to TCs.
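Rule 3 lends itself to a quick configuration check. The sketch below assumes a TC-to-BWG mapping held as a list indexed by TC number; that representation and the function name are hypothetical.

```python
def bwg_numbering_is_valid(tc_to_bwg: list) -> bool:
    """Check that the BWG numbers in use run from 0 to (number of BWGs - 1)."""
    used = sorted(set(tc_to_bwg))
    return used == list(range(len(used)))


shared = [0, 0, 1, 1, 2, 2, 2, 2]  # three BWGs numbered 0, 1, 2
isolated = list(range(8))          # rule 1: each TC is its own BWG
gapped = [0, 0, 2, 2, 2, 2, 2, 2]  # BWG 1 is skipped
```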
Refill Credits
Refill credits regulate the bandwidth allocated to BWGs and TCs. The ratio between the credits of the BWGs represents the relative bandwidth percentage allocated to each BWG. The ratio between the credits of the TCs represents the relative bandwidth percentage allocated to each TC within a BWG.
Credits are configured and calculated using 64-byte granularity.
1. In any case, the number of refill credits assigned per TC should be as small as possible but must be larger than the maximum frame size used and larger than 1.5 KB. Using a lower refill value causes more refill cycles before a packet can be sent. These extra cycles unnecessarily increase the latency.
2. Refill credits ratio between TCs should be equal to the desired ratio of bandwidth allocation between the different TCs. Applying rule #1 means bandwidth shares are sorted from the smallest to the biggest, and just one maximum-sized frame is allocated to the smallest.
3. The ratio between the refill credits of any two TCs should not be greater than 100.
4. Exception to rule #2: TCs that require low latency should be configured so that they are under-subscribed. That is, the credit refill value should provide these TCs somewhat more bandwidth than they actually need. Low latency TCs should always have credits so they can be next in line for the WSP arbitration. This exception causes the low latency TC to always have maximum credits (as it starts with maximum credits and, in an average cycle, uses less than the refill credits).
An end point that sends or receives 127-byte packets eventually gets double the bandwidth it was configured for, because all credit calculations round values down to the nearest 64-byte-aligned value.
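The 127-byte remark follows from the 64-byte rounding. The sketch below assumes, as one reading of the text, that a packet is charged its length rounded down to a multiple of 64 bytes; the function names are hypothetical.

```python
CREDIT_BYTES = 64  # credit granularity stated above


def charged_bytes(packet_bytes: int) -> int:
    """Bytes charged against credits when lengths are rounded down to 64."""
    if packet_bytes < CREDIT_BYTES:
        raise ValueError("packet shorter than one credit")
    return (packet_bytes // CREDIT_BYTES) * CREDIT_BYTES


def bandwidth_gain(packet_bytes: int) -> float:
    """Ratio of bytes actually sent to bytes charged."""
    return packet_bytes / charged_bytes(packet_bytes)


gain_127 = bandwidth_gain(127)  # 127 / 64, just under 2
gain_128 = bandwidth_gain(128)  # exactly 1, no rounding loss
```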
Maximum Credit Limit
The maximum credit limit value establishes a limit for the number of credits that a TC or BWG can own at any given time. This value prevents stacking up stale credits that can be added up over a relatively long period of time and then used by TCs all at once, altering fairness and latency.
Maximum credit limits are configured and calculated using 64-byte granularity.
1. Maximum credit limit should be bigger than the refill credits allocated to the TC.
2. Maximum credit limit should be set to be as low as possible while still meeting other rules to minimize the latency impact on low latency TCs.
3. If a low latency TC generates a burst that is larger than its maximum credit limit, this TC might experience higher latency, since it has used all its credits for this cycle and needs to wait for additional credits to be allocated. Therefore, the maximum credit limit for a low latency TC must be set bigger than the maximum burst length of traffic expected on that TC (for all the VMs at once). If TC7 and TC6 are for low latency traffic, this leads to:
Max(TC7,6) >= MaxBurst(TC7,6) served with low latency
4. An arbitration cycle can extend when one or more TCs accumulate credits more than their refill values (up to their maximum credit limit). For such a case, a low latency TC should be provided with enough credits to cover for the extended cycle duration.
Since the low latency TC operates at maximum credits (see rule #3) its maximum credit limit should meet the following formula:
{Max(TCx)/SUMi=0..7[Max(TCi)]} >= {BW(TCx)/Full BW}
The formula applies to both descriptor arbiter and data arbiter.
5. In a virtualized environment, the low latency TC condition checked by the VM WRR arbiter induces the following relation between the maximum credits of a low latency TC and the refill credits of its attached VM arbiter:
Max(TCx) >= 2 x {SUMi=0...15[Refill(VMi)]}
6. To ensure bandwidth for low priority TC (when those are allocated with most of the bandwidth) the maximum credit value of the low priority TC in the data arbiter needs to be high enough to ensure sync between the two arbiters. In the equation that follows the bandwidth numbers are from the descriptor arbiter while the maximum values are of the data arbiter.
{Max(TCx)/SUMi=x+1..7[Max(TCi)]} >= {BW(TCx)/Full_PCIE_BW}
Tip: The previous equation is worst case and covers the assumption that all higher TCs have the full maximum to transmit.
A simplified maximum credits allocation scheme would be to find the minimum number N >= 2 such that rules #3 and #5 are respected, and allocate
Max(TCi) = N x Refill(TCi), for i=0...7
By maintaining the same ratios between the maximum credits and the bandwidth shares, the bandwidth allocation scheme is made more immune to disturbing events such as reception of priority pause frames with short timer values.
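The simplified scheme can be sketched as a search for the smallest multiplier N that satisfies rules #3 and #5 for a low latency TC. All quantities are in 64-byte credit units; the function names and the sample numbers are illustrative assumptions, not values mandated by the text.

```python
def min_multiplier(refill: int, max_burst: int = 0, vm_refills=()) -> int:
    """Smallest N >= 2 with N * refill covering rule #3 and rule #5."""
    needed = max(max_burst, 2 * sum(vm_refills))  # rule #3, rule #5
    n = -(-needed // refill)                      # integer ceiling division
    return max(2, n)


def max_credits(refills: list, n: int) -> list:
    """Max(TCi) = N x Refill(TCi) for every TC."""
    return [n * r for r in refills]


# Illustrative low latency TC: refill of 114 credits, a 120 KB maximum
# burst (1920 credits), and VM refills of 192, 24, and 24 credits.
n = min_multiplier(114, max_burst=1920, vm_refills=[192, 24, 24])
```

With several low latency TCs, the largest of their individual multipliers would be applied to all TCs so that the ratios between maximum credits and bandwidth shares stay the same.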
GSP and LSP
TC Link Strict Priority (TC.LSP): This bit specifies that the configured TC can transmit without any restriction of credits. This effectively means that the TC can take up the entire link bandwidth, unless preempted by higher priority traffic. The Tx queues associated with an LSP TC must be set as Strict Low Latency in the TXLLQ[n] registers.
TC Strict Priority within group (TC.GSP): This bit defines whether strict priority is enabled or disabled for this TC within its BWG. If TC.GSP is set to 1b, the TC is scheduled for transmission using strict priority. It does not check for availability of credits in the TC. It does check whether the BWG of this TC has credits. That is, the amount of traffic generated from this TC is still limited by the bandwidth allocated to its BWG.
1. TCs with the LSP bit set should be the first to be considered by the scheduler. This implies that LSP should be configured on the highest priority TCs, that is, starting from priority 7 and down. The other TCs should be used for groups with bandwidth allocation. It is recommended to use LSP for only one TC (TC7), as the first LSP TC takes its bandwidth and there are no guarantees for the lower priority LSPs.
2. GSP can be set on more than one TC in a BWG, always from the highest priority TC within that BWG downward. For the LAN scenario, all TCs could be configured to be GSP, as their bandwidth needs are not known.
3. For a low latency TC that has the GSP bit set, non-null refill credits must be set for at least one maximum-sized frame. This ensures that even after the TC has been quiet for a while, some BWG credits are left available to the GSP TC, for serving it with minimum latency (without waiting for replenishing). Bigger refill credit values ensure longer bursts of GSP traffic served with minimum latency.
4.6.11.5.2 VM Parameters
Refill Credits
Refill credits regulate the fraction of the TC’s bandwidth that is allocated to a VM. The ratio between the credits of the VMs represents the relative TC bandwidth percentage allocated to each VM.
Credits are configured and calculated using 64-byte granularity.
1. The number of refill credits assigned per VM should be as small as possible but still larger than the maximum frame size used and larger than 1.5 KB in any case. Using a lower refill value causes more refill cycles before a packet can be sent. These extra cycles increase the latency unnecessarily.
2. Refill credits ratio between VMs should be equal to the desired ratio of bandwidth allocation between the different VMs. Applying rule #1 means bandwidth shares are sorted from the smallest to the biggest, and just one maximum-sized frame is allocated to the smallest.
3. The ratio between the refill credits of any two VMs within the TC should not be greater than 10.
VMs that send or receive 127-byte packets eventually get double the bandwidth they were configured for, because all credit calculations round values down to the nearest 64-byte-aligned value.
4. In a low latency TC, non-null refill credits must be set for a VSP VM, for at least one maximum-sized frame. This ensures that even after the VM has been quiet for a while, some TC credits are left available to the VSP VM, for serving it with minimum latency (without waiting for the TC to replenish). Bigger refill credit values ensure longer bursts of VSP traffic served with minimum latency.
Example 4-1 Refill and MaxCredits Setting Example
This example assumes a system with only four TCs and three VMs present, and with the following bandwidth allocation scheme. Also, full PCIe bandwidth is estimated at 15 Gb/s.
Table 4-9 Bandwidth Share Example

TCs and VMs    Bandwidth Share (%)    Notes
TC0  Total     40                     9.5 KB (9728-byte) jumbo allowed.
     VM0       60
     VM1       30
     VM2       10
TC1  Total     20                     No jumbo.
     VM0       34
     VM1       33
     VM2       33
TC2  Total     30                     Low latency TC. No jumbo. Bandwidth share already increased. MaxBurstTC2=120 KB
     VM0       80
     VM1       10
     VM2       10
TC3  Total     10                     Low latency LSP TC. No jumbo. MaxBurstTC3=36 KB
     VM0       20
     VM1       60
     VM2       20
The ratios between TC refills were driven by TC0, which was set to 152 to support 9.5 KB jumbos.
The ratio between MaxCredits and Refill was taken as 17 for all the TCs, as driven by the TC2 relation between MaxCredits and MaxBurstTC2.
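The refill values in this example can be approximated with a small helper that scales the bandwidth shares until every entry covers its own maximum frame. This is a sketch of one way to apply the refill rules, not necessarily the procedure used to build the table; the function name is hypothetical, and the minimums of 152 credits (9728 bytes) and 24 credits (1.5 KB) follow from the 64-byte granularity.

```python
from fractions import Fraction
from math import ceil


def refill_credits(shares: list, min_credits: list) -> list:
    """Scale shares so each entry gets at least its own minimum credits."""
    scale = max(Fraction(m, s) for s, m in zip(shares, min_credits))
    return [ceil(scale * s) for s in shares]


JUMBO = 9728 // 64  # 152 credits for a 9.5 KB frame
STD = 1536 // 64    # 24 credits for 1.5 KB

tc_refill = refill_credits([40, 20, 30, 10], [JUMBO, STD, STD, STD])
tc0_vm_refill = refill_credits([60, 30, 10], [JUMBO] * 3)
tc1_vm_refill = refill_credits([34, 33, 33], [STD] * 3)
tc_max = [17 * r for r in tc_refill]  # MaxCredits = 17 x Refill
```

Under these assumptions the sketch yields 152 and 76 for the TC0 and TC1 refills, 912/456/152 and 25/24/24 for their VMs, and 2584 and 1292 for the TC0 and TC1 MaxCredits, which agree with the values visible in Table 4-10.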
Table 4-10 Refill and MaxCredits Setting

TCs and VMs    Refill (64-Byte Units)    MaxCredits (64-Byte Units)
TC0  Total     152                       2584
     VM0       912
     VM1       456
     VM2       152
TC1  Total     76                        1292
     VM0       25
     VM1       24
     VM2       24
Table of contents
- 21 Scope
- 21 Product Overview
- 22 82599 Silicon/Software Features
- 23 System Configurations
- 24 External Interfaces
- 24 PCI-Express* (PCIe*) Interface
- 24 Network Interfaces
- 25 EEPROM Interface
- 26 Serial Flash Interface
- 26 SMBus Interface
- 26 NC-SI Interface
- 26 MDIO Interfaces
- 27 I2C Interfaces
- 27 Software-Definable Pins (SDP) Interface (General-Purpose I/O)
- 28 LED Interface
- 28 Features Summary
- 33 Overview of New Capabilities Beyond the
- 33 Security
- 33 Transmit Rate Limiting
- 33 Fibre Channel over Ethernet (FCoE)
- 34 Performance
- 35 Rx/Tx Queues and Rx Filtering
- 35 Interrupts
- 35 Virtualization
- 36 Double VLAN
- 37 Time Sync — IEEE 1588 — Precision Time Protocol (PTP)
- 37 Conventions
- 37 Terminology and Acronyms
- 37 Byte Count
- 37 Byte Ordering
- 38 Register/Bit Notations
- 38 References
- 41 Architecture and Basic Operation
- 41 Transmit (Tx) Data Flow
- 42 Receive (Rx) Data Flow
- 43 Pin Assignment
- 43 Signal Type Definition
- 44 PCIe Symbols and Pin Names
- 47 EEPROM
- 47 Serial Flash
- 48 SMBus
- 49 NC-SI
- 50 Software Defined Pins (SDPs)
- 51 RSVD and No Connect Pins
- 53 Miscellaneous
- 54 Power Supplies
- 55 Pull-Ups
- 58 Ball Out — Top Level
- 61 PCI-Express* (PCIe*)
- 61 Overview
- 64 General Functionality
- 64 Host Interface
- 68 Transaction Layer
- 75 Link Layer
- 76 Physical Layer
- 80 Error Events and Error Reporting
- 86 Performance Monitoring
- 87 SMBus
- 87 Channel Behavior
- 87 SMBus Addressing
- 88 SMBus Notification Methods
- 90 Receive TCO Flow
- 91 Transmit TCO Flow
- 93 Concurrent SMBus Transactions
- 93 SMBus ARP Functionality
- 97 LAN Fail-Over Through SMBus
- 97 Network Controller — Sideband Interface (NC-SI)
- 97 Electrical Characteristics
- 98 NC-SI Transactions
- 98 EEPROM
- 98 General Overview
- 98 EEPROM Device
- 98 EEPROM Vital Content
- 99 Software Accesses
- 99 Signature Field
- 100 Protected EEPROM Space
- 101 EEPROM Recovery
- 102 EEPROM Deadlock Avoidance
- 103 VPD Support
- 104 Flash
- 104 Flash Interface Operation
- 105 Flash Write Control
- 105 Flash Erase Control
- 105 Flash Access Contention
- 106 Configurable I/O Pins — Software-Definable Pins (SDP)
- 109 Network Interface (MAUI Interface)
- 110 10 GbE Interface
- 121 GbE Interface
- 123 SGMII Support
- 125 Auto Negotiation For Backplane Ethernet and Link Setup Features
- 129 Transceiver Module Support
- 130 Management Data Input/Output (MDIO) Interface
- 136 Ethernet Flow Control (FC)
- 146 Inter Packet Gap (IPG) Control and Pacing
- 147 MAC Speed Change at Different Power Modes
- 151 Power Up
- 151 Power-Up Sequence
- 152 Power-Up Timing Diagram
- 155 Reset Operation
- 155 Reset Sources
- 158 Reset in PCI-IOV Environment
- 159 Reset Effects
- 162 Queue Disable
- 163 Function Disable
- 163 General
- 163 Overview
- 165 Control Options
- 165 Event Flow for Enable/Disable Functions
- 166 Device Disable
- 166 Overview
- 167 BIOS Disable of the Device at Boot Time by Using the Strapping Option
- 167 Software Initialization and Diagnostics
- 167 Introduction
- 167 Power-Up State
- 168 Initialization Sequence
- 169 100 Mb/s, 1 GbE, and 10 GbE Link Initialization
- 170 Initialization of Statistics
- 170 Interrupt Initialization
- 171 Receive Initialization
- 175 Transmit Initialization
- 176 FCoE Initialization Flow
- 177 Virtualization Initialization Flow
- 180 DCB Configuration
- 191 Security Initialization
- 193 Alternate MAC Address Support
- 195 Power Targets and Power Delivery
- 195 Power Management
- 195 Introduction to the 82599 Power States
- 196 Auxiliary Power Usage
- 196 Power Limits by Certain Form Factors
- 197 Interconnects Power Management
- 199 Power States
- 204 Timing of Power-State Transitions
- 208 Wake Up
- 208 Advanced Power Management Wake Up
- 208 ACPI Power Management Wake Up
- 209 Wake-Up Packets
- 215 Wake Up and Virtualization
- 217 EEPROM General Map
- 219 EEPROM Software
- 219 SW Compatibility Module — Word Address 0x10-0x
- 219 PBA Number Module — Word Address 0x15-0x
- 220 iSCSI Boot Configuration — Word Address 0x
- 223 Software Reserved Word — PXE VLAN Configuration Pointer — Word Address 0x
- 224 VPD Module Pointer — Word Address 0x2F
- 224 EEPROM PXE Module — Word Address 0x30-0x
- 227 Alternate Ethernet MAC Address — Word Address 0x
- 227 Checksum Word Calculation (Word 0x3F)
- 229 Word Address 0x
- 229 Software Reserved Word 16 — Alternate SAN MAC Block Pointer — Word Address 0x
- 230 Software Reserved Word 17 — Active SAN MAC Block Pointer — Word Address 0x
- 231 EEPROM Hardware Sections
- 231 EEPROM Hardware Section — Auto-Load Sequence
- 231 EEPROM Init Module
- 233 PCIe Analog Configuration Module
- 234 Core 0/1 Analog Configuration Modules
- 235 PCIe General Configuration Module
- 244 PCIe Configuration Space 0/1 Modules
- 246 LAN Core 0/1 Modules
- 249 MAC 0/1 Modules
- 255 CSR 0/1 Auto Configuration Modules
- 257 Firmware Module
- 257 Test Configuration Module
- 258 Common Firmware Parameters — (Global MNG Offset 0x3)
- 259 Pass Through LAN 0/1 Configuration Modules
- 268 Sideband Configuration Module
- 270 Flexible TCO Filter Configuration Module
- 272 NC-SI Microcode Download Module
- 272 NC-SI Configuration Module
- 277 Receive Functionality
- 278 Packet Filtering
- 282 Rx Queues Assignment
- 310 MAC Layer Offloads
- 310 Receive Data Storage in System Memory
- 310 Legacy Receive Descriptor Format
- 313 Advanced Receive Descriptors
- 323 Receive Descriptor Fetching
- 323 Receive Descriptor Write-Back
- 324 Receive Descriptor Queue Structure
- 327 Header Splitting
- 330 Receive Checksum Offloading
- 333 SCTP Receive Offload
- 334 Receive UDP Fragmentation Checksum
- 335 Transmit Functionality
- 335 Packet Transmission
- 344 Transmit Contexts
- 345 Transmit Descriptors
- 361 TCP and UDP Segmentation
- 369 Transmit Checksum Offloading in Non-segmentation Mode
- 373 Interrupts
- 373 Interrupt Registers
- 377 Interrupt Moderation
- 381 TCP Timer Interrupt
- 381 Mapping of Interrupt Causes
- 388 802.1q VLAN Support
- 388 802.1q VLAN Packet Format
- 388 802.1q Tagged Frames
- 389 Transmitting and Receiving 802.1q Packets
- 390 802.1q VLAN Packet Filtering
- 390 Double VLAN and Single VLAN Support
- 394 Direct Cache Access (DCA)
- 395 PCIe TLP Format for DCA
- 397 Data Center Bridging (DCB)
- 397 Overview
- 400 Transmit-side Capabilities
- 413 Receive-Side Capabilities
- 417 LinkSec
- 418 Packet Format
- 418 LinkSec Header (SecTag) Format
- 420 LinkSec Management – KaY (Key Agreement Entity)
- 421 Receive Flow
- 424 Transmit Data Path
- 425 LinkSec and Manageability
- 425 Key and Tamper Protection
- 426 LinkSec Statistics
- 428 Time SYNC (IEEE1588 and 802.1AS)
- 428 Overview
- 428 Flow and Hardware/Software Responsibilities
- 430 Hardware Time Sync Elements
- 433 Time Sync Related Auxiliary Elements
- 434 PTP Packet Structure
- 437 7.10 Virtualization
- 437 Overview
- 441 PCI-SIG SR-IOV Support
- 452 Packet Switching
- 463 Virtualization of Hardware
- 464 7.11 Receive Side Coalescing (RSC)
- 466 Packet Viability for RSC Functionality
- 468 Flow Identification and RSC Context Matching
- 470 Processing New RSC
- 470 Processing Active RSC
- 472 Packet DMA and Descriptor Write Back
- 474 RSC Completion and Aging
- 476 7.12 IPsec Support
- 476 Overview
- 476 Hardware Features List
- 479 Software/Hardware Demarcation
- 480 IPsec Formats Exchanged Between Hardware and Software
- 484 TX SA Table
- 485 TX Hardware Flow
- 487 AES-128 Operation in Tx
- 489 RX Descriptors
- 489 Rx SA Tables
- 492 RX Hardware Flow without TCP/UDP Checksum Offload
- 493 RX Hardware Flow with TCP/UDP Checksum Offload
- 493 AES-128 Operation in Rx
- 495 7.13 Fibre Channel over Ethernet (FCoE)
- 495 Introduction
- 496 FCoE Transmit Operation
- 502 FCoE Receive Operation
- 518 7.14 Reliability
- 518 Memory Integrity Protection
- 518 PCIe Error Handling
- 519 Address Regions
- 519 Memory-Mapped Access
- 520 I/O-Mapped Access
- 522 Registers Terminology
- 523 Device Registers — PF
- 523 MSI-X BAR Register Summary PF
- 523 Registers Summary PF — BAR
- 543 Detailed Register Descriptions — PF
- 734 Device Registers — VF
- 734 Registers Allocated Per Queue
- 734 Non-Queue Registers
- 735 MSI-X Register Summary VF — BAR
- 737 Registers Summary VF — BAR
- 739 Detailed Register Descriptions — VF
- 749 PCI Compatibility
- 750 Configuration Sharing Among PCI Functions
- 752 PCIe Register Map
- 752 Register Attributes
- 752 PCIe Configuration Space Summary
- 754 Mandatory PCI Configuration Registers — Except BARs
- 757 Subsystem ID Register (0x2E; RO)
- 757 Cap_Ptr Register (0x34; RO)
- 758 Mandatory PCI Configuration Registers — BARs
- 759 PCIe Capabilities
- 765 MSI-X Capability
- 770 VPD Registers
- 771 PCIe Configuration Registers
- 782 PCIe Extended Configuration Space
- 783 Advanced Error Reporting Capability (AER)
- 788 Serial Number
- 790 Alternate Routing ID Interpretation (ARI) Capability Structure
- 791 IOV Capability Structure
- 798 Virtual Functions Configuration Space
- 800 Mandatory Configuration Space
- 802 PCI Capabilities
- 805 10.1 Platform Configurations
- 805 On-Board BMC Configurations
- 806 82599 NIC
- 806 10.2 Pass Through (PT) Functionality
- 807 DMTF NC-SI Mode
- 809 SMBus Pass Through (PT) Functionality
- 813 10.3 Manageability Receive Filtering
- 813 Overview and General Structure
- 815 L2 EtherType Filters
- 815 VLAN Filters - Single and Double VLAN Cases
- 816 L3 and L4 Filters
- 818 Manageability Decision Filters
- 820 Possible Configurations
- 822 10.4 LinkSec and Manageability
- 823 Handover of LinkSec Responsibility Between BMC and Host
- 825 10.5 Manageability Programming Interfaces
- 825 NC-SI Programming
- 874 SMBus Programming
- 911 Manageability Host Interface
- 915 Software and Firmware Synchronization
- 919 11.1 Introduction
- 919 11.2 Operating Conditions
- 919 Absolute Maximum Ratings
- 920 Recommended Operating Conditions
- 920 11.3 Power Delivery
- 920 Power Supply Specifications
- 922 In-Rush Current
- 922 11.4 DC/AC Specification
- 922 DC Specifications
- 927 Digital I/F AC Specifications
- 938 PCIe Interface AC/DC Specification
- 938 Network (MAUI) Interface AC/DC Specification
- 940 SerDes Crystal/Reference Clock Specification
- 946 11.5 Package
- 946 Mechanical
- 946 Thermal
- 946 Electrical
- 947 Mechanical Package
- 947 11.6 Devices Supported
- 947 Flash
- 948 EEPROM
- 949 12.1 Connecting the PCIe Interface
- 949 Link Width Configuration
- 950 Polarity Inversion and Lane Reversal
- 950 PCIe Reference Clock
- 950 PCIe Analog Bias Resistor
- 950 Miscellaneous PCIe Signals
- 950 PCIe Layout Recommendations
- 951 12.2 Connecting the MAUI Interfaces
- 951 MAUI Channels Lane Connections
- 951 MAUI Bias Resistor
- 951 XAUI, KX/KR, BX4, CX4, BX and SFI+ Layout Recommendations
- 952 Board Stack-Up Example
- 952 Trace Geometries
- 953 Other High-Speed Signal Routing Practices
- 956 Reference Planes
- 958 Dielectric Weave Compensation
- 959 Impedance Discontinuities
- 959 Reducing Circuit Inductance
- 960 Signal Isolation
- 960 Power and Ground Planes
- 966 KR and SFI+ Recommended Simulations
- 967 Additional Differential Trace Layout Guidelines for SFI+ Boards
- 969 12.3 Connecting the Serial EEPROM
- 969 Supported EEPROM Devices
- 969 12.4 Connecting the Flash
- 970 Supported Flash Devices
- 970 12.5 SMBus and NC-SI
- 972 12.6 NC-SI
- 972 NC-SI Design Requirements
- 974 NC-SI Layout Requirements
- 978 12.7 Resets
- 979 12.8 Connecting the MDIO Interfaces
- 979 12.9 Connecting the Software-Definable Pins (SDPs)
- 980 12.10 Connecting the Light Emitting Diodes (LEDs)
- 980 12.11 Connecting Miscellaneous Signals
- 980 LAN Disable
- 981 BIOS Handling of Device Disable
- 982 12.12 Oscillator Design Considerations
- 982 Oscillator Types
- 983 Oscillator Solution
- 983 Oscillator Layout Recommendations
- 983 Reference Clock Measurement Recommendations
- 983 12.13 Power Supplies
- 984 Power Supply Sequencing
- 984 Power Supply Filtering
- 985 Support for Power Management and Wake Up
- 985 12.14 Connecting the JTAG Port
- 987 13.1 Thermal Considerations
- 988 13.2 Importance of Thermal Management
- 988 13.3 Packaging Terminology
- 989 13.4 Thermal Specifications
- 990 13.5 Case Temperature
- 990 13.6 Thermal Attributes
- 990 Designing for Thermal Performance
- 990 Model System Definition
- 991 Package Thermal Characteristics
- 992 13.7 Thermal Enhancements
- 992 13.8 Clearances
- 994 13.9 Default Enhanced Thermal Solution
- 995 13.10 Extruded Heatsinks
- 996 13.11 Attaching the Extruded Heatsink
- 996 Clips
- 996 Thermal Interface (PCM45 Series)
- 996 Avoid Damaging Die-Side Capacitors with Heat Sink Attached
- 997 Maximum Static Normal Load
- 998 13.12 Reliability
- 998 Thermal Interface Management for Heat-Sink Solutions
- 999 13.13 Measurements for Thermal Specifications
- 999 Case Temperature Measurements
- 1000 Attaching the Thermocouple (No Heatsink)
- 1000 Attaching the Thermocouple (Heatsink)
- 1001 13.14 Heatsink and Attach Suppliers
- 1002 13.15 PCB Guidelines
- 1003 14.1 Link Loopback Operations
- 1016 15.1 Register Attributes
- 1017 Legacy Packet Formats
- 1017 ARP Packet Formats
- 1019 IP and TCP/UDP Headers for TSO
- 1025 Magic Packet
- 1025 SNAP Packet Format
- 1025 Packet Types for Packet Split Filtering
- 1026 Type 1.1: Ethernet (VLAN/SNAP) IP Packets
- 1034 Type 2: Ethernet, IPv
- 1037 Type 3: Reserved
- 1037 Type 4: NFS Packets
- 1042 IPsec Formats Run Over the Wire
- 1042 AH Formats
- 1046 ESP Formats
- 1051 BCN Frame Format
- 1052 FCoE Framing
- 1052 FCoE Frame Format
- 1055 FC Frame Format
- 1063 Background
- 1064 Location in the NVM