Express5800/A2040c, A2020c, A2010c, A1040c
PCIe Live Error Recovery User’s Guide
(Release 1.0)
June 2015
NEC Corporation
© 2015 NEC Corporation
855-901079-001-A
Notes on Using This Manual
- No part of this manual may be reproduced in any form without the prior written permission of NEC Corporation.
- The contents of this manual may be revised without prior notice.
- The contents of this manual shall not be copied or altered without the prior written permission of NEC Corporation.
Trademarks
- Linux is a trademark or registered trademark of Linus Torvalds in Japan and other countries.
- Red Hat and Red Hat Enterprise Linux are trademarks or registered trademarks of Red Hat, Inc. in the United States and other countries.
- Oracle is a registered trademark of Oracle Corporation or its subsidiaries, and/or its affiliates in the United States and other countries.
- Intel and Intel Xeon are trademarks or registered trademarks of Intel Corporation in the United States and other countries.
- All other product, brand, or trade names used in this publication are the trademarks or registered trademarks of their respective trademark owners.
Related Documents
- Express5800/A1040c, A2040c, A2020c, A2010c User’s Guide
Contents
1. Introduction
1.1 What is PCIe Live Error Recovery?
1.2 Operating Environment
1.3 Supported Cards
1.4 Terminology
1.5 Access Limitation
2. Installing necpciras
2.1 Installing necpciras
2.2 Uninstalling necpciras
2.3 Upgrading necpciras
2.4 Configuration by necpciras
2.5 Backup Configuration Information
3. Necpciras Command Reference
3.1 necpciras command line format
3.2 --show option
3.3 --set-ler option
3.4 --set-noler option
3.5 --set-threshold option
3.6 --reset option
3.7 --version option
3.8 Usage
1. Introduction
1.1 What is PCIe Live Error Recovery?
PCIe Live Error Recovery is a feature that improves I/O availability. When a critical or uncorrectable failure occurs on an adapter, the feature brings down the PCIe link associated with the failed root port within one cycle and, if the failure is intermittent, automatically reinitializes the adapter so that the system keeps running. Without this feature, a critical I/O failure on the adapter brings the system down. Combining this feature with redundant I/O features such as NIC teaming improves I/O availability further.
1.2 Operating Environment
The operating environment for PCIe Live Error Recovery is shown below:
Table 1-1 Operating Environment
Hardware (Server)    Express5800/A2040c
                     Express5800/A2020c
                     Express5800/A2010c
                     Express5800/A1040c
OS                   Red Hat Enterprise Linux 6.6
1.3 Supported Cards
The cards supported by PCIe Live Error Recovery are shown below:
Table 1-2 Supported Cards
Network Card         10GBASE (SFP+/2ch)
                       - NE3304-149
Fibre Channel Card   Fibre Channel Controller (1ch,8G)
                       - NE3390-159
                     Fibre Channel Controller (2ch,8G)
                       - NE3390-160
                     Fibre Channel Controller (1ch,16G)
                       - NE3390-157A
                     Fibre Channel Controller (2ch,16G)
                       - NE3390-158A
1.4 Terminology
The terms used for PCIe Live Error Recovery are shown below:
Table 1-3 Terminology
Term               Description
Bonding            Bonding is the standard NIC teaming feature in Linux.
SPS                StoragePathSavior (SPS) is software that multiplexes paths
                   between a server and a storage unit in a system with
                   Express5800 and the NEC Storage series Disk Array Subsystem.
Failover           Traffic failover to prevent connectivity loss in the event
                   of a network component failure.
LER mode           LER mode is Live Error Recovery mode. Setting LER mode
                   enables PCIe Live Error Recovery. When an uncorrected error
                   occurs in a PCIe slot set as LER, the feature brings down
                   the PCIe link and, if the failure is intermittent,
                   automatically reinitializes the adapter.
NoLER mode         Setting NoLER mode disables PCIe Live Error Recovery. When
                   an uncorrected error occurs in a PCIe slot set as NoLER,
                   the system is rebooted.
LER / NoLER slot   An LER slot is a PCIe slot set as LER.
                   A NoLER slot is a PCIe slot set as NoLER.
Web console        A tool, provided by EXPRESSSCOPE Engine SP3, used to view
                   or configure the server via a web browser.
necpciras          Command used for configuring LER mode.
1.5 Access Limitation
Operations related to the PCIe Live Error Recovery feature are allowed only for users with administrative rights (Administrator account).
2. Installing necpciras
This section describes how to install, uninstall, and upgrade the necpciras command.
2.1 Installing necpciras
1. Log in to the target machine as the root user.
2. Copy the file necpciras-*.x86_64.rpm to any directory on the target machine (* represents the revision number), then install it with the rpm command.
# rpm -ivh necpciras-2.4-1.02.el6.x86_64.rpm
Preparing...                ########################################### [100%]
   1:necpciras              ########################################### [100%]
3. Run the following command to check that the necpciras package is installed correctly.
# rpm -qa |grep necpciras
necpciras-2.4-1.02.el6.x86_64
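The verification step can also be scripted. The following is a minimal sketch, not part of the necpciras package: the function name is_necpciras_installed is hypothetical, and the function only inspects text that has already been captured from rpm -qa.

```python
def is_necpciras_installed(rpm_qa_output):
    """Return True if any line of `rpm -qa` output names a necpciras package."""
    return any(
        line.strip().startswith("necpciras-")
        for line in rpm_qa_output.splitlines()
    )


# Sample line taken from the verification step in this section.
print(is_necpciras_installed("necpciras-2.4-1.02.el6.x86_64\n"))
# An empty listing corresponds to the "no response" case after uninstallation.
print(is_necpciras_installed(""))
```

The same check covers the uninstallation procedure: an empty result means that the package is no longer installed.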
2.2 Uninstalling necpciras
1. Log in to the target machine as the root user.
2. Uninstall the necpciras package by running the rpm command.
# rpm -e necpciras
3. Run the following command to check that the necpciras package has been uninstalled.
# rpm -qa |grep necpciras
Uninstallation has completed successfully if the command displays no output.
Important
The configuration made by the necpciras command is preserved after uninstallation.
2.3 Upgrading necpciras
Upgrade necpciras as follows:
Uninstall the old necpciras package according to "2.2 Uninstalling necpciras", then install the new necpciras package according to "2.1 Installing necpciras".
2.4 Configuration by necpciras
The necpciras command is used to display information related to the PCIe Live Error Recovery feature and to configure LER mode settings.
See "3. Necpciras Command Reference" for details of the necpciras command line.
Important
Some settings require a system (OS) reboot to take effect.
Important
The factory default setting is NoLER mode.
Table 2-1 necpciras command options
Option            Use case                                           Reboot
--set-ler         Use this option to set PCIe slots as LER mode.     Required
--set-noler       Use this option to set PCIe slots as NoLER mode.   Required
--reset           Use this option to restore factory default         Required
                  settings.
--set-threshold   Use this option to specify the recovery            Required
                  threshold of uncorrected errors.
Important
Set LER mode only for cards supported by PCIe Live Error Recovery.
Tips
Combining this feature with redundant I/O features such as NIC teaming improves I/O availability further.
2.5 Backup Configuration Information
Information configured by necpciras is stored in the hardware of the server, not in the file system of the OS. If you change the configuration information, be sure to back it up using the web console.
Important
Reboot or shut down the system before starting the backup process.
The procedure for backing up configuration information using the web console is described below.
Refer to "Express5800/A1040c, A2040c, A2020c, A2010c User’s Guide" for detailed information and operation screen images.
Backup procedure
1. Reboot or shut down the system.
2. Select [Configuration] on the web console.
3. Select [Save/Restore in Bulk] on the web console.
4. Press the [Backup] button to download the file containing the configuration information.
Refer to "Express5800/A1040c, A2040c, A2020c, A2010c User’s Guide" for how to restore the configuration information using the backup file obtained from the web console.
3. Necpciras Command Reference
This section describes the details of the necpciras command, which is used to view or configure information related to PCIe Live Error Recovery. For how to install necpciras, see “2.1 Installing necpciras”.
3.1 necpciras command line format
necpciras subcommand [<options>]
subcommand:
  --show                            … See [3.2].
  --set-ler=<PCI_SLOT_NUMBERS>      … See [3.3].
  --set-noler=<PCI_SLOT_NUMBERS>    … See [3.4].
  --set-threshold=<THRESHOLD>       … See [3.5].
  --reset                           … See [3.6].
  --version                         … See [3.7].
PCI_SLOT_NUMBERS: List of PCIe slot numbers, delimited by slashes.
THRESHOLD: Recovery threshold
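As an illustration of the slash-delimited PCI_SLOT_NUMBERS format, the sketch below builds argument lists for the --set-ler and --set-noler subcommands. The helper names (slot_argument, set_ler_command, set_noler_command) are hypothetical and are not provided by necpciras; the default program path ./necpciras follows the examples in this manual.

```python
def slot_argument(slots):
    """Join PCIe slot numbers with slashes, e.g. [9, 10] -> "9/10"."""
    slots = list(slots)
    if not slots:
        raise ValueError("at least one slot number is required")
    for slot in slots:
        if not isinstance(slot, int) or slot < 1:
            raise ValueError("invalid slot number: %r" % (slot,))
    return "/".join(str(slot) for slot in slots)


def set_ler_command(slots, program="./necpciras"):
    """Argument list that sets the given slots to LER mode."""
    return [program, "--set-ler=" + slot_argument(slots)]


def set_noler_command(slots, program="./necpciras"):
    """Argument list that sets the given slots to NoLER mode."""
    return [program, "--set-noler=" + slot_argument(slots)]


print(set_ler_command([9, 10]))
print(set_noler_command([1, 6]))
```

The resulting lists could be passed to a process launcher such as Python's subprocess.run on a machine where necpciras is installed; that step is left out here so the sketch runs anywhere.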
3.2 --show option
Shows the current settings of the PCIe Live Error Recovery feature.
Suboption
None
Execution result
# ./necpciras --show
LER Settings:
-----------------------------------------------------
               LER      LER
Slot   Status  Current  Next
-----------------------------------------------------
PCI1   Enable  No       No
PCI2   N/A     No       No
PCI3   Enable  No       No
PCI4   N/A     No       No
PCI5   N/A     No       No
PCI6   N/A     No       No
PCI7   N/A     No       No
PCI8   N/A     No       No
PCI9   N/A     No       No
PCI10  N/A     No       No
PCI11  N/A     No       No
PCI12  N/A     No       No
PCI13  N/A     No       No
PCI14  N/A     No       No
PCI15  N/A     No       No
PCI16  N/A     No       No
LER threshold Setting:
-----------------------------------------------------
           Current  Next
-----------------------------------------------------
Threshold  1        1
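Scripts that monitor the settings may want the --show output in structured form. The following is a rough sketch under stated assumptions: it assumes that each slot is printed on one line as whitespace-separated Slot, Status, LER Current, and LER Next fields, and that the threshold is printed as a single Threshold row, as in the example in this section. The function parse_show and the regular expressions are hypothetical and are not part of necpciras.

```python
import re

# Assumed row shapes, based on the example output in this section:
#   PCI<n>  <Status>  <LER Current>  <LER Next>
#   Threshold  <Current>  <Next>
SLOT_ROW = re.compile(r"^PCI(\d+)\s+(\S+)\s+(Yes|No)\s+(Yes|No)$")
THRESHOLD_ROW = re.compile(r"^Threshold\s+(\d+)\s+(\d+)$")


def parse_show(text):
    """Return (slots, threshold) parsed from --show style text.

    slots maps slot number -> dict with keys status, current, next.
    threshold is a (current, next) tuple, or None if no Threshold row exists.
    """
    slots = {}
    threshold = None
    for raw in text.splitlines():
        line = raw.strip()
        match = SLOT_ROW.match(line)
        if match:
            slots[int(match.group(1))] = {
                "status": match.group(2),
                "current": match.group(3),
                "next": match.group(4),
            }
            continue
        match = THRESHOLD_ROW.match(line)
        if match:
            threshold = (int(match.group(1)), int(match.group(2)))
    return slots, threshold


# Shortened sample in the assumed layout; header and ruler lines are ignored.
SAMPLE = """\
LER Settings:
Slot   Status  Current  Next
PCI1   Enable  No       No
PCI2   N/A     No       No
LER threshold Setting:
Threshold  1        1
"""

slots, threshold = parse_show(SAMPLE)
print(slots[1]["status"], slots[2]["status"], threshold)
```

Lines that match neither pattern, such as the headers and the dashed rulers, are skipped, so the parser tolerates small layout differences as long as the data rows keep the assumed shape.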
Description
Table 3-1 necpciras --show option
Item               Displayed           Meaning
                   character string
Slot Status        Enable              The PCIe slot is available.
                   N/A                 The PCIe slot is not available.
LER Current        Yes                 The PCIe slot is set as LER mode.
                   No                  The PCIe slot is set as NoLER mode. An
                                       empty PCIe slot is displayed as NoLER
                                       mode.
LER Next           Yes                 The PCIe slot will be set as LER mode on
                                       the next boot or PCIe hot-add.
                   No                  The PCIe slot will be set as NoLER mode
                                       on the next boot.
Threshold Current  Value               The current recovery threshold. Recovery
                                       is performed until the threshold is
                                       reached.
Threshold Next     Value               The recovery threshold that takes effect
                                       on the next boot.
3.3 --set-ler option
Sets the specified PCIe slots to LER mode.
Important
Set LER mode only for cards supported by PCIe Live Error Recovery.
Tips
Combining this feature with redundant I/O features such as NIC teaming improves I/O availability further.
Important
Reboot the system to apply the settings.
Suboption
--set-ler=<PCI_SLOT_NUMBERS> (e.g. --set-ler=9/10)
Specify the PCIe slot numbers of the LER slots, delimited by slashes.
Execution result
When the command succeeds, the same content as the --show option is displayed.
If the command fails, for example because of an illegal argument, an error message or the usage of necpciras is displayed.
# ./necpciras --set-ler=9/10
LER Settings:
-----------------------------------------------------
               LER      LER
Slot   Status  Current  Next
-----------------------------------------------------
PCI1   Enable  No       No
PCI2   N/A     No       No
PCI3   Enable  No       No
PCI4   N/A     No       No
PCI5   N/A     No       No
PCI6   N/A     No       No
PCI7   N/A     No       No
PCI8   N/A     No       No
PCI9   Enable  Yes      No
PCI10  Enable  Yes      No
PCI11  N/A     No       No
PCI12  N/A     No       No
PCI13  N/A     No       No
PCI14  N/A     No       No
PCI15  N/A     No       No
PCI16  N/A     No       No
LER threshold Setting:
-----------------------------------------------------
           Current  Next
-----------------------------------------------------
Threshold  1        1
** NOTICE **
The configuration changes have not been applied yet.
You must reboot the system to apply them.
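One way a script could spot settings that are still waiting for a reboot is to compare the LER Current and LER Next values of each slot. This is only a sketch: the function slots_pending_reboot is hypothetical, and how necpciras itself decides to print the notice is not documented here.

```python
def slots_pending_reboot(rows):
    """Return slot numbers whose LER Current and LER Next values differ.

    rows is an iterable of (slot_number, ler_current, ler_next) tuples,
    where the last two items are the strings "Yes" or "No".
    """
    return [slot for slot, current, nxt in rows if current != nxt]


# Values copied from three rows of the example output in this section.
rows = [
    (1, "No", "No"),
    (9, "Yes", "No"),
    (10, "Yes", "No"),
]
print(slots_pending_reboot(rows))
```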
Tips
An empty PCIe slot can be set as LER mode. When a card is added by PCIe hot-add, the slot will be set as LER mode.
3.4 --set-noler option
Sets the specified PCIe slots to NoLER mode.
Important
Reboot the system to apply the settings.
Suboption
--set-noler=<PCI_SLOT_NUMBERS> (e.g. --set-noler=9/10)
Specify the PCIe slot numbers of the NoLER slots, delimited by slashes.
Execution result
When the command succeeds, the same content as the --show option is displayed.
If the command fails, for example because of an illegal argument, an error message or the usage of necpciras is displayed.
# ./necpciras --set-noler=9/10
LER Settings:
-----------------------------------------------------
               LER      LER
Slot   Status  Current  Next
-----------------------------------------------------
PCI1   Enable  No       No
PCI2   N/A     No       No
PCI3   Enable  No       No
PCI4   N/A     No       No
PCI5   N/A     No       No
PCI6   N/A     No       No
PCI7   N/A     No       No
PCI8   N/A     No       No
PCI9   Enable  Yes      No
PCI10  Enable  Yes      No
PCI11  N/A     No       No
PCI12  N/A     No       No
PCI13  N/A     No       No
PCI14  N/A     No       No
PCI15  N/A     No       No
PCI16  N/A     No       No
LER threshold Setting:
-----------------------------------------------------
           Current  Next
-----------------------------------------------------
Threshold  1        1
** NOTICE **
The configuration changes have not been applied yet.
You must reboot the system to apply them.
3.5 --set-threshold option
Specifies the recovery threshold for uncorrected errors. Recovery is performed until the threshold is reached. The number of recoveries is counted separately for each slot. If the number of recoveries in a slot exceeds the threshold, the system goes down in order to prevent unsafe recovery. If the recovery threshold is zero, recovery is performed an unlimited number of times.
Example: If the threshold is 3 and an uncorrectable error occurs 3 times on PCIe slot 9, recovery is performed 3 times on PCIe slot 9. If a 4th uncorrectable error then occurs on PCIe slot 9, the system goes down.
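The threshold rule can be expressed as a small model. The sketch below is an illustration of the behavior described in this section, not the firmware logic: the function outcome_of_error is hypothetical, and the 0 to 255 range is taken from the usage text in "3.8 Usage".

```python
def outcome_of_error(threshold, error_number):
    """Outcome of the error_number-th uncorrectable error on one slot.

    threshold    -- recovery threshold, 0 to 255 (0 means no limit)
    error_number -- 1 for the first uncorrectable error on the slot, and so on
    """
    if not 0 <= threshold <= 255:
        raise ValueError("threshold must be between 0 and 255")
    if error_number < 1:
        raise ValueError("error_number must be 1 or greater")
    if threshold == 0 or error_number <= threshold:
        return "recover"
    return "system down"


# The worked example from this section: threshold 3 on PCIe slot 9.
for n in range(1, 5):
    print(n, outcome_of_error(3, n))
```

With a threshold of 3, errors 1 to 3 are recovered and the 4th brings the system down, which matches the example above.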
Tips
If errors occur repeatedly on the same PCIe card, the card is likely to fail completely. In that case, bringing the system down may be safer than keeping it running. Determine the threshold according to your environment. The default recovery threshold is one.
Important
Reboot the system to apply the settings.
Suboption
--set-threshold=<THRESHOLD> (e.g. --set-threshold=3)
Specify the recovery threshold as a decimal number.
Execution result
When the command succeeds, the same content as the --show option is displayed.
If the command fails, for example because of an illegal argument, an error message or the usage of necpciras is displayed.
# ./necpciras --set-threshold=3
LER Settings:
-----------------------------------------------------
               LER      LER
Slot   Status  Current  Next
-----------------------------------------------------
PCI1   Enable  No       No
PCI2   N/A     No       No
PCI3   Enable  No       No
PCI4   N/A     No       No
PCI5   N/A     No       No
PCI6   N/A     No       No
PCI7   N/A     No       No
PCI8   N/A     No       No
PCI9   Enable  Yes      Yes
PCI10  Enable  Yes      Yes
PCI11  N/A     No       No
PCI12  N/A     No       No
PCI13  N/A     No       No
PCI14  N/A     No       No
PCI15  N/A     No       No
PCI16  N/A     No       No
LER threshold Setting:
-----------------------------------------------------
           Current  Next
-----------------------------------------------------
Threshold  1        3
** NOTICE **
The configuration changes have not been applied yet.
You must reboot the system to apply them.
3.6 --reset option
Resets the LER settings to the factory defaults.
Important
Reboot the system to apply the settings.
Suboption
None
Execution result
The LER settings are reset, and the same content as the --show option is displayed.
If the command fails, for example because of an illegal argument, an error message or the usage of necpciras is displayed.
# ./necpciras --reset
LER Settings:
-----------------------------------------------------
               LER      LER
Slot   Status  Current  Next
-----------------------------------------------------
PCI1   Enable  No       No
PCI2   N/A     No       No
PCI3   Enable  No       No
PCI4   N/A     No       No
PCI5   N/A     No       No
PCI6   N/A     No       No
PCI7   N/A     No       No
PCI8   N/A     No       No
PCI9   Enable  Yes      No
PCI10  Enable  Yes      No
PCI11  N/A     No       No
PCI12  N/A     No       No
PCI13  N/A     No       No
PCI14  N/A     No       No
PCI15  N/A     No       No
PCI16  N/A     No       No
LER threshold Setting:
-----------------------------------------------------
           Current  Next
-----------------------------------------------------
Threshold  3        1
** NOTICE **
The configuration changes have not been applied yet.
You must reboot the system to apply them.
3.7 --version option
Shows the version information of necpciras.
Suboption
None
Execution result
The version information is displayed in the following format:
# ./necpciras --version
necpciras Version 1.3
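If a script needs to compare versions, the version line can be turned into a tuple of integers. This is a minimal sketch that assumes the output always has the form shown above; the function parse_version is hypothetical.

```python
import re


def parse_version(output):
    """Extract the version from text such as "necpciras Version 1.3"."""
    match = re.search(r"necpciras Version (\d+(?:\.\d+)*)", output)
    if match is None:
        raise ValueError("no necpciras version line found")
    return tuple(int(part) for part in match.group(1).split("."))


print(parse_version("necpciras Version 1.3\n"))
```

Tuples compare element by element, so (1, 10) correctly sorts after (1, 3), which a plain string comparison would not guarantee.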
3.8 Usage
If the command fails, for example because of an illegal argument, the usage of necpciras is displayed.
Suboption
None
Execution result
# ./necpciras
Usage:./necpciras --show
Usage:./necpciras --reset
Usage:./necpciras --set-ler=<PCI_SLOT_NUMBERS>
Usage:./necpciras --set-noler=<PCI_SLOT_NUMBERS>
Usage:./necpciras --set-threshold=<THRESHOLD>
Default value of LER Setting is "No".
<PCI_SLOT_NUMBERS> is pci slot numbers separated by a slash.
  ex) --set-ler=1/6 : slot1 and slot6 are set to LER.
<THRESHOLD> is LER threshold value of 0 - 255. Default value is "1".
  0 : No specified threshold. PCI Error Recovery will be performed
      for every uncorrectable error.
  1 - 255 : Specify threshold value.
  ex) --set-threshold=1 : PCI Error Recovery will be performed only for
      1st uncorrectable error.
  ex) --set-threshold=2 : PCI Error Recovery will be performed for
      1st and 2nd uncorrectable error.
NEC Corporation
7-1 Shiba 5-Chome, Minato-Ku
Tokyo 108-8001, Japan
TEL (03) 3454-1111 (Main phone number)