Intel® Omni-Path Fabric Staging Guide

May 2016
Document Number: J27600-1.0
Legal Disclaimer
You may not use or facilitate the use of this document in connection with any infringement or other legal analysis concerning Intel
products described herein. You agree to grant Intel a non-exclusive, royalty-free license to any patent claim thereafter drafted
which includes subject matter disclosed herein.
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest Intel
product specifications and roadmaps.
The products described may contain design defects or errors known as errata which may cause the product to deviate from
published specifications. Current characterized errata are available on request.
Copies of documents which have an order number and are referenced in this document may be obtained by calling 1-800-548-4725 or by visiting: http://www.intel.com/design/literature.htm
Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service
activation. Learn more at http://www.intel.com/ or from the OEM or retailer.
No computer system can be absolutely secure.
Intel, Intel Xeon Phi, Xeon, and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.
*Other names and brands may be claimed as the property of others.
Copyright © 2016, Intel Corporation. All rights reserved.
Contents
1 Introduction
2 Installation Prerequisites
3 Configure BIOS Settings
4 Configure OS Settings
  4.1 CPU Frequency Settings
  4.2 OS Tuning
5 Install Intel® Omni-Path Software
  5.1 Disable Linux* Firewall
6 TCP/IP Host Name Resolution
7 Generate Cable Map Topology Files
8 Configure FastFabric
  8.1 Format for IPoIB Host Names
  8.2 Specify Test Areas for opaallanalysis
  8.3 Location of mpi_apps Directory
9 punchlist.csv
10 Configure Internally Managed Switches
11 Configure Externally Managed Switches
12 Verify Cable Map Topology
13 Verify Server and Fabric
14 Best Known Methods (BKMs) for Site Installation
  14.1 Enable Intel® Fabric Manager GUI for early debug
  14.2 Address Server and Fabric Verification Test Results
  14.3 Debug Intel® Omni-Path Physical Link Issues
    14.3.1 OPA Link Transition Flow
    14.3.2 Verify the Fabric Manager is running
    14.3.3 Check the state of all links in the system
    14.3.4 Check the state of HFI links from a server
    14.3.5 Link width, downgrades, and opafm.xml
    14.3.6 How to check fabric connectivity
    14.3.7 Physical links stability test using opacabletest
    14.3.8 How to debug and fix physical link issues
    14.3.9 Link Debug CLI Commands
  14.4 Use opatop for Bandwidth and Error Summary
  14.5 Use the Beacon LED on HFI and Edge Switches
  14.6 Decode the Physical Configuration of an HFI Adapter
  14.7 Verify Fabric Manager Sweep
  14.8 Verify PM Sweep Duration
  14.9 Check Credit Loop Operation
15 Run Benchmark and Stress Tests
  15.1 Run Bandwidth Test
  15.2 Run Latency Test
  15.3 Run MPI Deviation Test
  15.4 Run mpi_groupstress (Cable Stress)
  15.5 Run run_mpi_stress
16 Take State Dump of a Switch
17 BKM and OPA Commands
  17.1 Retrieve Host Fabric Interface (HFI) Temperature
  17.2 Read Error Counters
  17.3 Clear Error Counters
  17.4 Load and Unload Intel® Omni-Path Host HFI Driver
  17.5 Analyze Links
  17.6 Trace Route Between Two Nodes
  17.7 Analyze All Fabric ISLs Routing Balance
  17.8 Dump Switch ASIC Forwarding Tables
  17.9 Configure Redundant Fabric Manager (FM) Priority
    17.9.1 Configure FM priority from a local or remote terminal
    17.9.2 Configure FM Elevated Priority
    17.9.3 Configuration Consistency for Priority/Elevated Priority
    17.9.4 Display FM states from the Management Node
18 Final Fabric Checks
Figures
Figure 1. Fabric Manager GUI Connection Test
Tables
Table 1. Link Quality Values and Description
Revision History
Date        Revision    Description
May 2016    1.0         Initial Release
1 Introduction
This document provides a high-level overview of the steps required to stage a
customer-based installation of the Intel® Omni-Path Fabric. Procedures and key
reference documents, such as Intel® Omni-Path user guides and installation guides,
are provided to clarify the process. Additional commands and BKMs are defined to
facilitate the installation process and troubleshooting.
Intel recommends that you use the Intel® Omni-Path FastFabric (FF) TUI (Textual
User Interface) as the initial tool suite for installation, configuration, and validation of
the fabric. This tool includes a set of automated features that are specifically used for
standalone host, Ethernet*, and Intel® Omni-Path Fabric connectivity validation.
This document includes recommendations for processes and procedures that
complement the FF tools to reduce the time required to install and configure the
customer's fabric.
You should check applicable release notes and technical advisories for key information
that could influence installation steps outlined in this document.
Assumptions:
• The customer has generated a topology.csv file in the format specified for
  opaxlattopology, as described in Generate Cable Map Topology Files in this
  document, and provided it to Intel.
• Reference Documentation: Intel® Omni-Path End User Publications.
• Operating System (OS) Software: RHEL* 7.1 or later.
• Single Management Node (with Fabric Manager running) configured with the
  Intel® Omni-Path Fabric Suite Software, also known as IntelOPA-IFS.
• Intel® Omni-Path Fabric Manager enabled on management nodes.
• Compute Nodes configured with the Intel® Omni-Path Fabric Host Software, also
  known as IntelOPA-Basic.
• Password-less access enabled for all hosts and switches.
2 Installation Prerequisites
The recommended fabric installation pre-requisites are defined in Intel® Omni-Path
Fabric Software Installation Guide: Installation Prerequisites section.
The RPMs required for the operating system you are using are defined in Intel® Omni-Path
Fabric Software Installation Guide: OS RPMs Installation Prerequisites section.
Complete the following steps before starting software installation:
1. Install Intel® Omni-Path Host Fabric Interface (HFI) Gen3 PCIe Card(s) in servers.
2. Verify server boots OS from local disk or PXE remote boot server with no
   hardware errors.
3. Verify node executes a warm reset and boots to OS.
3 Configure BIOS Settings
The recommended BIOS settings for Intel® Xeon® servers are defined in:
Intel® Omni-Path Performance Tuning User Guide: Section 2 BIOS Settings.
Intel recommends pre-configuring servers with the appropriate BIOS settings before
starting Intel® Omni-Path software configuration. The settings below are for all-around
performance.
The performance-relevant recommended BIOS settings on a server with Intel® Xeon®
Processor V3 family CPUs (codenamed Haswell) are listed in the Intel® Omni-Path
Performance Tuning User Guide.
4 Configure OS Settings
The recommended RHEL* 7 OS settings to optimize performance are defined in:
Intel® Omni-Path Performance Tuning User Guide: RHEL* Settings section.
Intel recommends pre-configuring servers with the appropriate OS configuration
settings before starting Intel® Omni-Path software installation, thus reducing
installation time.
4.1 CPU Frequency Settings
These settings are used to optimize CPU performance for benchmarks and may not be
required for a production environment.
The default Intel P-state driver (intel_pstate) in RHEL* 7 can cause CPU frequencies
to change, resulting in unpredictable performance. The following change allows
cpupower to set a consistent and steady CPU clock rate on all cores.
1. Disable intel_pstate in the kernel command line: edit /etc/default/grub by
   adding intel_pstate=disable to GRUB_CMDLINE_LINUX.
2. Apply the change: grub2-mkconfig -o /boot/grub2/grub.cfg
3. Reboot.
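Step 1 can be scripted. The following is a minimal sketch, assuming that GRUB_CMDLINE_LINUX is a single double-quoted line in /etc/default/grub. The add_kernel_arg helper is hypothetical and is not part of any Intel or RHEL* tool. It prints the updated file to standard output instead of editing it in place, so the result can be reviewed before it is installed.

```shell
# Hypothetical helper: print FILE with ARG appended to GRUB_CMDLINE_LINUX.
# Assumes the value is one double-quoted line; the file itself is not modified.
add_kernel_arg() (
    file=$1
    arg=$2
    if grep -q "^GRUB_CMDLINE_LINUX=.*$arg" "$file"; then
        cat "$file"    # argument already present, nothing to change
    else
        sed "s/^GRUB_CMDLINE_LINUX=\"\(.*\)\"\$/GRUB_CMDLINE_LINUX=\"\1 $arg\"/" "$file"
    fi
)

# Example: preview the change, then install it by hand if it looks right.
# add_kernel_arg /etc/default/grub intel_pstate=disable > /tmp/grub.new
```

After the reviewed file replaces /etc/default/grub, continue with steps 2 and 3 above.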
Platform Settings
To reduce run-to-run performance variations, Intel recommends that you pin the CPU
clock frequency to a specific value and use the performance setting of the CPU power
governor.
For example, the following command sets the frequency of all cores to a value of 2.6
GHz and sets the performance governor, when using the acpi-cpufreq driver:
sudo cpupower -c all frequency-set --min 2.6GHz --max 2.6GHz -g performance
4.2 OS Tuning
These settings are used to optimize OS performance and are recommended for both
benchmark and production environments.
1. Avoid acpi_pad consuming CPU resources: The ACPI processor aggregator driver
   handles high core count processor power management. However, the driver can
   cause the system to run acpi_pad and consume 100% of each core. To work
   around this issue, add the following line to /etc/modprobe.d/blacklist.conf:
   blacklist acpi_pad
2. irqbalance: For optimum verbs and IPoIB performance and stability, add the
   following line to the /etc/sysconfig/irqbalance file:
   IRQBALANCE_ARGS=--hintpolicy=exact
   Then restart the irqbalance service after the HFI1 driver loads
   (/bin/systemctl restart irqbalance.service), or reboot.
3. Set IPoFabric to an MTU size of 65520 and set connected mode in the
   /etc/sysconfig/network-scripts/ifcfg-ib0 file.
5 Install Intel® Omni-Path Software
You should configure at least one node to run the Intel® Omni-Path Management
Software including Fabric Manager (FM). This node is used to configure and validate all
of the other hosts, switches, and chassis fabric devices. You must install the Intel®
Omni-Path Fabric Suite software on this node.
Install IntelOPA-IFS (including mpi apps) by changing directory to
/IntelOPA-IFS.DISTRO.VERSION and running ./INSTALL.
• The following document and sections describe the install procedures:
  Intel® Omni-Path Fabric Software Installation Guide: Section 3.0 Download and
  Extract Installation Packages
  Intel® Omni-Path Fabric Software Installation Guide: Section 4.0 Install the Intel®
  Omni-Path Fabric Software
• Recommendation: Enable servers with IPMI interfaces to support ACPI or equivalent
  remote power management and reset control via an Ethernet network.
• Apply Technical Advisories as needed.
• Verify HFI speed and bus width using lspci.
  After the IFS installation, verify the Intel® OP HFI card is configured and visible to
  the host OS as Gen3 x16 slot speed (check the Speed and Width values in the
  following output):
lspci -d 8086:24f0 -vv | grep Width
LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L1, Exit Latency L0s <4us, L1 <64us
LnkSta: Speed 8GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
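The same check can be automated when many servers are staged. The following is a minimal sketch, assuming the LnkSta line format shown in the sample output above. The check_hfi_link helper is hypothetical and is not an OPA tool. It only inspects LnkSta, the negotiated link state, and ignores LnkCap.

```shell
# Hypothetical helper: read `lspci -vv` output on stdin and succeed only if
# every LnkSta line reports 8GT/s (Gen3) speed and x16 width.
check_hfi_link() {
    awk '
        /LnkSta:/ {
            links++
            if ($0 !~ /Speed 8GT\/s[^,]*, Width x16/) bad++
        }
        END { exit (links == 0 || bad > 0) ? 1 : 0 }
    '
}

# Example (device ID taken from the command above):
# lspci -d 8086:24f0 -vv | check_hfi_link && echo "HFI link OK"
```

The helper fails when no LnkSta line is found, so a missing HFI is reported as an error rather than silently passing.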
5.1 Disable Linux* Firewall
Use the commands:
# systemctl status firewalld
# systemctl stop firewalld
# systemctl disable firewalld
# systemctl status firewalld
6 TCP/IP Host Name Resolution
For an overview of resolving TCP/IP host names, see the
Intel® Omni-Path Fabric Software Installation Guide: Section 2.1, Installation
Prerequisites.
The following notes provide an example of the contents of the /etc/hosts file.
Create an /etc/hosts file before starting Intel® Omni-Path software installation to
simplify the process. In a typical installation, the server and switch names follow a
local convention to indicate physical location or purpose of the node. This convention
simplifies installing and managing the fabric.
• If using /etc/hosts, update the /etc/hosts file on the Management Node (the
  head node with IFS installed) and copy it to all hosts.
• If using DNS, all Management Network and IPoIB hostnames must be added to DNS,
  and /etc/resolv.conf must be configured on the Management Node.
• The /etc/hosts file should contain:
  • Local host, required for subsequent single host verification using FastFabric
    TUI
  • Ethernet and IPoIB addresses and names for all hosts
  • Ethernet addresses and names of switches
  • Ethernet addresses of IPMI or remote management modules
  • Ethernet addresses of power domain.
An example of these recommendations follows:
# /etc/hosts example
# localhost (required)
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain
# Ethernet Addresses of hosts
10.128.196.14 node1
10.128.196.15 node2
10.128.196.16 node3
# IPoIB Address of hosts should be outside Ethernet network
10.128.200.14 node1-opa
10.128.200.15 node2-opa
10.128.200.16 node3-opa
# RMM IP Addresses
10.127.240.121 node1-rmm
10.127.240.122 node2-rmm
# Chassis IP Address
10.128.198.250 Edgeswitch1
10.128.198.249 Edgeswitch2
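A quick consistency check on a hosts file can catch IPoIB entries that have no matching Ethernet entry. The following is a minimal sketch, assuming the -opa suffix convention used in the example above. The check_ipoib_names helper is hypothetical and is not an OPA tool. It prints each base name that is missing and returns a non-zero status if any are found.

```shell
# Hypothetical helper: for every NAME-opa entry in a hosts file, verify that
# a plain NAME entry also exists. Prints the base names that are missing.
check_ipoib_names() {
    awk '
        /^[[:space:]]*#/ || NF < 2 { next }          # skip comments and blank lines
        { for (i = 2; i <= NF; i++) seen[$i] = 1 }   # record every name and alias
        END {
            missing = 0
            for (name in seen) {
                if (name ~ /-opa$/) {
                    base = name
                    sub(/-opa$/, "", base)
                    if (!(base in seen)) { print base; missing = 1 }
                }
            }
            exit missing
        }
    ' "$1"
}

# Example:
# check_ipoib_names /etc/hosts || echo "IPoIB names without Ethernet entries"
```

If the site uses a different suffix or a prefix, the pattern in the helper would need to change to match.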
Other files that may need adjustment according to specific site requirements include:
• /etc/hostname
• /etc/resolv.conf
• /etc/sysconfig/network
• /etc/sysconfig/network-scripts/ifcfg-enp5s0f0
7 Generate Cable Map Topology Files
This section describes how to generate topology files that are used later in this
document to verify physical installation and to debug basic link, node, and switch
connectivity issues.
Refer to README.topology and README.xlat_topology in the /opt/opa/samples
directory for best practices in editing topology.xlsx.
1. From a management node, copy /opt/opa/samples/topology.xlsx to a laptop
   and edit the file in a spreadsheet to reflect rack source and destination name,
   port, type (FI or SW in upper case), label, and cable length.
2. Save this file as .xlsx and then save the first spreadsheet tab as .csv on the
   Management Node.
3. Generate the topology file in .xml using the following command with the .csv file
   as the source:
   /usr/sbin/opaxlattopology topology.csv /etc/sysconfig/opa/topology.0:0.xml
Following is an example of the .xlsx file:
Standard-Format Topology Spreadsheet
| Source                                        | Destination                                   | Cable
| Rack Group  Rack   Name     Name-2  Port Type | Rack Group  Rack   Name     Name-2  Port Type | Label    Length  Details
| row1        rack1  node1    hdlab   1    FI   |                    edgesw1          3    SW   | n1-esw1  1m      Cable CU
| row1        rack2  node2    hdlab   1    FI   |                    edgesw2          1    SW   | n2-esw2  1m      Cable CU
|                    edgesw2  hdlab   9    SW   |                    edgesw1          12   SW   | ISL      1m      Cable CU
|                    edgesw2  hdlab   33   SW   |                    edgesw1          47   SW   | ISL      1m      Cable CU
Refer to opaxlattopology in the Intel® Omni-Path Fabric Suite FastFabric Command
Line Interface Reference Guide for more information.
8 Configure FastFabric
The Intel® Omni-Path Fabric Suite FastFabric User Guide, Configuration Files for
FastFabric section, defines the list of configuration files that are used by FastFabric.
The opafastfabric.conf file provides default settings for most of the FastFabric
command line options.
8.1 Format for IPoIB Host Names
By default, FastFabric uses the suffix -opa for the IPoIB host name. You can change
this to a prefix, and you can also change opa to another convention such as ib, as
the customer requires, in /etc/sysconfig/opa/opafastfabric.conf.
The following examples show changing opa to ib as a suffix or prefix.
For suffix, change:
export FF_IPOIB_SUFFIX=${FF_IPOIB_SUFFIX:--opa}
to:
export FF_IPOIB_SUFFIX=${FF_IPOIB_SUFFIX:--ib}
For prefix, change:
export FF_IPOIB_PREFIX=${FF_IPOIB_PREFIX:-opa-}
to:
export FF_IPOIB_PREFIX=${FF_IPOIB_PREFIX:-ib-}
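These lines rely on the shell's ${VAR:-default} expansion: the default applies only when the variable is unset or empty, so a value already exported in the environment takes precedence over the file. The following sketch illustrates the suffix case. The ipoib_name helper is hypothetical and only mimics how a name might be assembled from the prefix and suffix; it is not how FastFabric is known to build names internally.

```shell
# Start from a clean environment so the defaults below take effect.
unset FF_IPOIB_SUFFIX FF_IPOIB_PREFIX

# Stand-ins for the edited lines in opafastfabric.conf (suffix convention).
export FF_IPOIB_SUFFIX=${FF_IPOIB_SUFFIX:--ib}
export FF_IPOIB_PREFIX=${FF_IPOIB_PREFIX:-}

# Hypothetical helper (not an OPA tool): build an IPoIB name from a host name.
ipoib_name() {
    printf '%s%s%s\n' "$FF_IPOIB_PREFIX" "$1" "$FF_IPOIB_SUFFIX"
}

ipoib_name node1    # prints node1-ib
```

Whichever convention is chosen, the resulting names must match the IPoIB entries in /etc/hosts or DNS.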
8.2 Specify Test Areas for opaallanalysis
By default, opaallanalysis includes the fabric and chassis. These can be modified to
include host SM, embedded SM, and externally managed switches in
/etc/sysconfig/opa/opafastfabric.conf as follows:
# pick appropriate type of SM to analyze
#export FF_ALL_ANALYSIS=${FF_ALL_ANALYSIS:-fabric chassis hostsm esm}
export FF_ALL_ANALYSIS=${FF_ALL_ANALYSIS:-fabric chassis hostsm}
8.3 Location of mpi_apps Directory
By default, opafastfabric uses mpi_apps located in /opt/opa/src/mpi_apps.
If a different path is set up for mpi_apps, then modify the following in
/etc/sysconfig/opa/opafastfabric.conf:
export FF_MPI_APPS_DIR=${FF_MPI_APPS_DIR:-/opt/opa/src/mpi_apps}
9 punchlist.csv
A punchlist file is generated during execution of the FastFabric TUI and CLI commands
and can be used for tracking issues identified by the OPA tools. The file is located in:
$FF_RESULT_DIR/punchlist.csv, typically /root/punchlist.csv.
10 Configure Internally Managed Switches
For a complete description of the configuration process, refer to the Intel® Omni-Path
Fabric Software Installation Guide, Configure Intel® Omni-Path Chassis section.
The following steps provide a summary:
1. Download and install the driver file CDM v2.12.00 WHQL Certified.exe from:
   http://www.ftdichip.com/Drivers/VCP.htm
2. Set up USB serial port terminal emulator using the following serial options:
   • Speed: 115200
   • Data Bits: 8
   • Stop Bits: 1
   • Parity: None
   • Flow Control: None
3. Set up the switch TCP/IP address, gateway, netmask, and other options using a
   terminal emulator.
   a. Set the chassis IP address:
      setChassisIpAddr -h ipaddress -m netMask
      where ipaddress is the new IP address in dotted decimal format
      (xxx.xxx.xxx.xxx), and netMask is the new subnet mask in dotted decimal
      format.
   b. Change the chassis default gateway IP address:
      setDefaultRoute -h ipaddress
      where ipaddress is the new default gateway IP address in dotted decimal
      format.
   The changes are effective immediately.
   For details, refer to the Intel® Omni-Path Fabric Switches Hardware Installation
   Guide.
4. Run opafastfabric
5. Select 1) Chassis Setup/Admin
6. Select items 0-6 (Edit Config …, Test, …)
   a. Press P to Perform
   b. Item 0: Edit Config and Select/Edit Chassis File
      1. Skip opafastfabric.conf, no changes needed.
      2. Skip ports, no changes needed.
      3. For chassis file, in the editor, review the list of chassis selected. The
         setup of this file should have occurred above when setting up the
         Management Node by editing /etc/sysconfig/opa/chassis with the
         name corresponding to the Ethernet IP address of the chassis.
   c. Item 1: Verify Chassis via Ethernet Ping, should pass without error
   d. Item 2: Update Chassis Firmware
      Specify the location for the firmware file to use.
   e. Item 3: Set Up Chassis Basic Configuration
      Provide answers as follows:
      1. Password: Press Enter (no password)
      2. Syslog (y)
         a. Syslog server (n)
         b. TCP/UDP port number (n): use default
         c. Syslog facility (n): use default
      3. NTP (n): Customer to assign
      4. Timezone and DST (y)
         Use local timezone of server (y)
      5. Do you wish to configure OPA Node Desc to match Ethernet chassis
         name (y)
      6. Do you wish to configure the Link CRC Mode? [n]
   f. Item 4: Set Up Password-Less SSH/SCP
   g. Item 5: Reboot Chassis
      Should pass without error.
   h. Item 6: Get Basic Chassis Configuration.
      Expected Summary output at end is shown below. Note that count should
      match the number of Edge switches.
      Edgeswitch1:
      Firmware Active        : 10.0.0.0.696
      Firmware Primary       : 10.0.0.0.696
      Syslog Configuration   : Syslog host set to: 0.0.0.0 port 514 facility 22
      NTP                    : Configured to use the local clock
      Time Zone              : Current time zone offset is: -5
      LinkWidth Support      : 4X
      Node Description       : switch1
      Link CRC Mode          : 48b_or_14b_or_16b
11 Configure Externally Managed Switches
For a complete description of the install process, including prerequisites and
configuration, refer to the Intel® Omni-Path Fabric Software Installation Guide,
Configure Firmware on the Externally Managed Intel® Omni-Path Switches section.
The 100SWE48QF Edge switches do not have an Ethernet* interface. Setup of these
switches is performed using FastFabric via in-band commands.
Preferred approach:
1. Make sure all hosts are booted; this is required to identify switch names. If hosts
   are not available, you can perform all configuration steps except setting the switch
   names.
2. Run opafastfabric
3. Select 2) Externally Managed Switch Setup/Admin
4. Select items 0-9 (Edit config, Generate …, Test, …, VPD info)
   a. Press P to Perform
   b. Item 0: Edit Config and Select/Edit Switch File
      1. Skip opafastfabric.conf, no changes needed.
      2. Skip ports, no changes needed.
      3. Edit the file /etc/sysconfig/opa/switches and review the list of
         switches selected. The switches file specifies:
         • switches by node GUID
         • (optional) hfi:port
         • (optional) Node Description (nodename) to be assigned to the switch
         • (optional) distance value indicating the relative distance from the
           FastFabric node for each switch
         The following snippet shows the switches file format and an example:
         nodeguid:hfi:port,nodename,distance
         0x00117501026a5683:0:0,OmniPth00117501ff6a5602,2
      4. Item 1: Generate or Update Switch File
         a. Regenerate: Answer y if this is the first time or additional
            externally-managed switches have been added or replaced.
         b. Update switch names: Answer y if there are sufficient hosts
            booted to allow determination of switch names. When y is selected,
            this step may take a few minutes.
      5. Item 2, 3:
         Should pass without error.
      6. Item 4: Specify the location for the FW file (.emfw) to use.
      7. Item 5: Set up switch basic configuration and set the node description.
         Performing Switch Admin: Setup Switch basic configuration
         Executing: /usr/sbin/opaswitchadmin -L /etc/sysconfig/opa/switches configure
         Do you wish to configure the switch Link Width Options? [n]:
         Do you wish to configure the switch Node Description as it is set in the switches file? [n]: y
         Do you wish to configure the switch FM Enabled option? [n]:
         Do you wish to configure the switch Link CRC Mode? [n]:
         Executing configure Test Suite (configure) Fri Jan 15 11:11:12 EST 2016 ...
         Executing TEST SUITE configure CASE (configure.0x00117501026a5683:0:0,OmniPth00117501ff6a5602.i2c.extmgd.switchconfigure) configure switch 0x00117501026a5683:0:0,OmniPth00117501ff6a5602 ...
         TEST SUITE configure CASE (configure.0x00117501026a5683:0:0,OmniPth00117501ff6a5602.i2c.extmgd.switchconfigure) configure switch 0x00117501026a5683:0:0,OmniPth00117501ff6a5602 PASSED
         TEST SUITE configure: 1 Cases; 1 PASSED
      8. Item 6: Reboot should pass without error.
      9. Item 7: Review results for redundant power and FAN status.
         Expected summary output at end should be similar to the following
         (count should match number of un-managed Edge switches):
         0x00117501026a5683:0:0,OmniPth00117501ff6a5602:
         F/W ver:10.0.0.0.696 H/W ver:003-01 H/W pt num:H89344-00301 Fan status:Normal/Normal/Normal/Normal/Normal/Normal PS1 Status:ONLINE PS2 Status: ONLINE Temperature status:LTC2974:33C/MAX_QSFP:40C/PRR_ASIC:40C
         Any non-redundant or failed fans or power supplies found during this
         step are also reported in /root/punchlist.csv
      10. Item 8: Get Basic Switch Configuration.
         Expected summary output at end should be similar to the following
         (count should match number of un-managed Edge switches):
         Link Width                 : 1,2,3,4
         Link Speed                 : 25Gb
         FM Enabled                 : No
         Link CRC Mode              : None
         vCU                        : 0
         External Loopback Allowed  : Yes
         Node Description           : Edgeswitch1
      11. Item 9: Save the test.res output for future reference.
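The switches file format shown under Item 0 (nodeguid:hfi:port,nodename,distance) is simple enough to split with the shell when entries need to be audited. The following is a minimal sketch; parse_switch_entry is a hypothetical helper, not an OPA tool, and it assumes the node description contains no commas.

```shell
# Hypothetical helper: split one switches-file entry into its fields.
# Runs in a subshell so the working variables do not leak to the caller.
parse_switch_entry() (
    IFS=, read -r guidspec nodename distance <<EOF
$1
EOF
    IFS=: read -r nodeguid hfi port <<EOF
$guidspec
EOF
    printf 'nodeguid=%s hfi=%s port=%s nodename=%s distance=%s\n' \
        "$nodeguid" "$hfi" "$port" "$nodename" "$distance"
)

parse_switch_entry '0x00117501026a5683:0:0,OmniPth00117501ff6a5602,2'
# prints: nodeguid=0x00117501026a5683 hfi=0 port=0 nodename=OmniPth00117501ff6a5602 distance=2
```

Optional fields that are absent from an entry come back as empty strings.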
12 Verify Cable Map Topology
This section describes how to use the fabric topology xml created in Generate Cable
Map Topology Files to verify that fabric topology (cabling) is consistent with the cable
map.
The command opareport -o verifyall compares the live fabric interconnect against
the topology file created based on the cable map (links, nodes, and switches).
opareport -o verifyall -T /etc/sysconfig/opa/topology.0:0.xml
or
opalinkanalysis verifyall
These commands test links, switches, and SM topology. If successful, the output
reports a total of 0 Incorrect Links found, 0 Missing, 0 Unexpected, 0 Misconnected, 0
Duplicate, and 0 Different.
For links reported with errors, verify the physical interconnect against the cable map.
13 Verify Server and Fabric
Validation of servers and the fabric is initiated from the Management Node using the
FastFabric TUI. From the FastFabric TUI, choose Item 4) Host Verification/Admin
and run through all steps.
• Perform 3) Perform Single Host Verification. When prompted “Would you
  like to specify tests to run? [n]:” enter y for HPL test. When prompted
  “View Load on hosts prior to verification? [y]:” enter y. This checks CPU
  load by running /usr/sbin/opacheckload -f /etc/sysconfig/opa/allhosts.
  Edit hostverify.res for results.
• Perform 4) Verify OPA Fabric Status and Topology. This runs a fabric
  error and topology verification. Choose the default for all prompts. Edit
  /root/linkanalysis.res to view results.
• Perform 6) Verify Hosts Ping via IPoIB. This pings all IPoIB interfaces.
• Perform 8) Check MPI Performance. This tests latency and bandwidth deviation
  between all hosts. Choose defaults for all prompts. Edit /root/test.log for results.
For details, refer to the following sections in the Intel® Omni-Path Fabric Software
Installation Guide:
• Verify Intel® Omni-Path Fabric Host Software on the Remaining Servers (FF TUI)
  section
• Verify Intel® Omni-Path Fabric Host Software on the Remaining Servers using CLI
  Commands section
A punchlist file is generated during execution of the FastFabric TUI and CLI commands
and is useful for tracking and resolving issues. The file is located in:
$FF_RESULT_DIR/punchlist.csv, typically /root/punchlist.csv.
Two additional files, /root/test.res and /root/test.log, are created during OPA
test commands and are useful for tracking test failures and issues.
14 Best Known Methods (BKMs) for Site Installation
This section contains commands useful for configuring and debugging issues during
fabric installation.
14.1 Enable Intel® Fabric Manager GUI for early debug
By default, the Intel® Omni-Path Fabric Suite Fabric Manager GUI is disabled after
installation of the IFS software. To quickly enable it for early debug, use the following
steps.
Note: This method bypasses the SSH key authorization and is not intended for end customer
installs.
1. Edit the /etc/sysconfig/opafm.xml file on the Management Node and make the
   following two changes: set SslSecurityEnabled to 0 and set Start to 1 in the Fe
   section.
   <SslSecurityEnabled>0</SslSecurityEnabled>
   <!-- Common FE (Fabric Executive) attributes -->
   <Fe>
   <!-- The FE is required by the Intel Omni-Path FM GUI. -->
   <!-- To enable the FE, configure the SslSecurity parameters in this file -->
   <!-- as desired. -->
   <!-- For Host FM then set Start to 1. -->
   <!-- For Embedded FM the Start parameter in this file is not used; -->
   <!-- enable the FE via the smConfig and smPmStart chassis CLI commands. -->
   <Start>1</Start> <!-- default FE startup for all instances -->
   <!-- Overrides of the Common.Shared parameters if desired -->
   <!-- <SyslogFacility>Local6</SyslogFacility> -->
2. Restart the Fabric Manager to enable the changes and start the FE process
   required by the Fabric Manager GUI.
   systemctl restart opafm
3. Download and install the Fabric Manager GUI application to a Windows* PC or
   Linux* system.
4. Start the Fabric Manager GUI application.
5. Open the Configuration tab and enter the hostname or IP address of the
   Management Node running the Fabric Manager in your system into the FE
   Connection.
6. Uncheck the Secure tab.
7. Select APPLY to run the connection test and then RUN to start the Fabric
   Manager GUI application.
Note: The Fabric Manager GUI does not operate through network proxies. Network firewall
access may also need to be disabled. For a quick go/no-go verification, complete the
connection test in the configuration tab as previously described.
Figure 1. Fabric Manager GUI Connection Test
14.2
Address Server and Fabric Verification Test Results
During fabric validation, unexpected loads on Host CPUs may result in inconsistent
performance results. As a debug step, isolate the issue using the following:
Use the opacheckload tool to verify host CPU load. By default, it captures the top 10 most
heavily loaded hosts.
# /usr/sbin/opacheckload -f /etc/sysconfig/opa/allhosts
After the high load hosts have been identified, the next step is to root cause the
issues.
Perform the following steps:
1.
Check for HFI PCIe width or speed issues.
Are HFI cards operating in a degraded mode, narrow width or less than PCIe
Gen3? Use lspci or opahfirev to verify the PCIe operating speed and bus width:
• lspci
• opahfirev: see Decode the Physical Configuration of an HFI Adapter
Possible sources for narrow PCIe width:
a.
Be aware that OPA does support different width PCIe cards, including dual
HFI cards using two x8 slices of a x16 physical connector. opahfirev is very
useful for detecting this configuration.
b.
HFI Card partial insertion into x16 slots. Initially this appears to be a narrow
width issue but re-inserting the card often resolves the issue. This may occur
after a server is shipped. This step has resolved most width issues.
c.
Server physical configuration: Many servers support different PCIe logical
widths based on riser card configuration. The slot may be physically x16 but
internally limited to x8. Check other servers of the same configuration in the
fabric. Check the server configuration. This is also a common issue.
d.
Swap the HFI to another server to determine if the problem follows the card
or the server.
2.
Use the Linux* top command to identify the key CPU load processes:
# top
opatop may be useful for checking for loads that vary over time, using the r (rev),
f (forward), and L (live) options to look through PM snapshots of system activity.
This is also helpful for monitoring application startup vs. run time loads. The PM
captures high resolution statistics, with very low system overhead, over periods of
up to 2 days. opatop (and the FM GUI) are the tools that harvest the PM statistics.
# opatop
3.
Check for high CPU percent processes.
Examples of some issues:
• ksoftirqd process: known issue in RHEL* 7.1. The workaround is to reboot the
individual server; the fix is to update to a newer release.
• Screen savers: when a Linux* GUI is enabled on hosts, the screen saver that runs
when the user interface is idle may have a high CPU load.
• Test applications: look for MPI jobs or similar applications running in the
background. This is a common issue, particularly in a shared fabric bring-up
environment. Use the kill command with the process ID to stop orphan applications,
or reboot the server to debug the issue.
4.
Review the following sections of this document to isolate nodes with different or
incorrect settings. Each area represents configuration variables that have been
shown to create performance deltas.
• Configure BIOS Settings
• CPU Frequency Settings
• OS Tuning
14.3
Debug Intel® Omni-Path Physical Link Issues
After you’ve run the FastFabric tool suite and identified issues with links, then it is
useful to start root-causing the issues. This section focuses on Intel® Omni-Path Fabric
physical links and not PCIe bus link issues.
OPA reporting tools are robust, but it can be confusing for new users to understand
the difference between error counters and actual failures.
From an installation perspective, it is important to watch for physical issues with
cabling, both copper and optical. In general, bend radius, cable insertion issues, and
physical compression or damage to cables can result in transmission issues. OPA
recovers from many issues transparently. This section helps root-cause solid failures
as well as marginal links. Most often the issue is resolved simply by re-installing a
cable and verifying that it clicks into the connector socket on the HFI or switch.
14.3.1
OPA Link Transition Flow
To debug link issues, it is helpful to understand the four key link states, which
progress from Offline to the final, fully operational Active state.
Note: The Fabric Manager (opafm) must be running to transition physical links from the Init
state to the Active state. If you subsequently stop the Fabric Manager when a link is in
the Active state, the link remains active. You can safely make changes to the
opafm.xml file for the Fabric Manager and restart the service without dropping active
links. As of the 10.0.0.696 software release, by default, the opafm service is not
configured for autostart after IFS FULL installation.
PortState:
• Offline: Link down. QSFP not present or not visible to the HFI driver.
• Polling: Physical link training in progress. At this point you do not know if the other
end of the QSFP is connected to a working OPA device.
• Init: Link training has completed, both sides are present. Typically waiting for the
Fabric Manager to enable the link.
• Active: Normal operating state of a fully functional link.
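As a rough illustration of how the four states map to a first debug action, the following shell sketch prints a suggested next step for a given PortState value. The function name portstate_hint is hypothetical and is not part of the OPA tool set; the hints only restate the state descriptions above.

```shell
#!/bin/sh
# Hypothetical helper (not an OPA tool): map a PortState value to a first
# debug step. The hints restate the state descriptions in this section.
portstate_hint() {
    case "$1" in
        Offline) echo "Offline: check that the QSFP is present and fully inserted" ;;
        Polling) echo "Polling: check the far end of the cable and the remote device" ;;
        Init)    echo "Init: check that the Fabric Manager (opafm) is running" ;;
        Active)  echo "Active: link is up, no action needed" ;;
        *)       echo "Unknown state: $1"; return 1 ;;
    esac
}
```

For example, portstate_hint Init points at the Fabric Manager, which matches the note above that opafm must be running for a link to move from Init to Active.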
14.3.2
Verify the Fabric Manager is running
From the Management Node, run the following command to report all HFIs, Switches,
and where the FM is running:
# opareport -V
If it fails, try the following steps:
• Check the status of the Fabric Manager process:
# systemctl status opafm
• Start the Fabric Manager if it is not running:
# systemctl start opafm
14.3.3
Check the state of all links in the system
The opaextracterror command generates a CSV output representing the entire link
state of the fabric.
# opaextracterror > link_status.csv
14.3.4
Check the state of HFI links from a server
If you are debugging server link issues, the opainfo command may be useful for a
single server view.
opainfo captures a variety of data useful for debugging server related link issues.
Multiple OPA commands can be used to extract individual data elements, however,
this command is unique in the combination of data it provides.
• PortState: see OPA Link Transition Flow
• LinkWidth: a fully functional link should indicate Act: 4 and En: 4
• QSFP: physical cable information for the QSFP; in the example below, a 1 m passive
copper (PassiveCu) FCI Electronics cable
• Link Quality: 5 is the best, 4 is good, 1 is the worst
# opainfo
hfi1_0:1                           PortGID:0xfe80000000000000:001175010165b19c
   PortState:     Active
   LinkSpeed      Act: 25Gb         En: 25Gb
   LinkWidth      Act: 4            En: 4
   LinkWidthDnGrd ActTx: 4  Rx: 4   En: 3,4
   LCRC           Act: 14-bit       En: 14-bit,16-bit,48-bit       Mgmt: True
   LID: 0x00000001-0x00000001       SM LID: 0x00000002 SL: 0
   QSFP: PassiveCu, 1m  FCI Electronics  P/N 10131941-2010LF Rev 5
   Xmit Data:           22581581 MB Pkts:           5100825193
   Recv Data:           18725619 MB Pkts:           4024569756
   Link Quality: 5 (Excellent)
14.3.5
Link width, downgrades, and opafm.xml
By default, OPA links run in x4 link width mode. OPA has a highly robust link
mechanism, as compared to InfiniBand*, and it allows links to run in reduced widths
with no data loss.
Three things to know:
1.
By default the opafm.xml configuration file requires links to start up in x4 Mode.
This is configurable separately for HFI and ISL link using the WidthPolicy
parameter.
2.
Link downgrade ranges are also configurable in the opafm.xml file, using the
MaxDroppedLanes parameter.
3.
Default configuration example: a link that successfully starts up in x4 width and
subsequently downgrades to x3 width continues to operate. If the link is restarted
(by a server reboot, for example) and attempts to run at less than x4 width, then
the link is disabled by the Fabric Manager and does not enter the Active state.
The opainfo command for HFIs is useful for checking the link width and link
downgrade configuration on servers.
For a system view of all links that are running in less than x4 width mode, use the
command:
# opareport -o slowlinks
14.3.6
How to check fabric connectivity
For large fabrics, follow the flow described in: Generate Cable Map Topology Files.
14.3.7
Physical links stability test using opacabletest
OPA uses a quality metric for reporting status (opainfo). The quality metric ranges
from 5 (excellent) to 1 (poor). For a more quantitative metric, use opacabletest to
generate traffic on the HFI and ISL links, and opaextractperf and
opaextracterror to harvest the data.
Before you begin:
• Clear error counters prior to the test using opareport -o none --clearall and check
the error counters after the test.
• Check to make sure there are no errors in the fabric using: opareport -o errors
• Use opatop to monitor fabric utilization.
Detailed procedure:
1.
Start and stop cable test on the Management Node either from the opafastfabric
TUI or using CLI commands:
a. # opafastfabric
b. 4) Host Verification/Admin
c. a) Start or Stop Bit Error Rate cable Test
Or, to run manually, use the following tests for hosts, then ISLs. Test each one for
a reasonable time, typically 5-15 minutes.
# /usr/sbin/opacabletest stop_fi stop_isl -A -n 3 -f '/etc/sysconfig/opa/allhosts'
# opareport -o none --clearall
# /usr/sbin/opacabletest start_fi -A -n 3 -f '/etc/sysconfig/opa/allhosts'
Run the previous command for 5-15 minutes for the hosts.
# /usr/sbin/opacabletest stop_fi start_isl -A -n 3 -f '/etc/sysconfig/opa/allhosts'
Run the previous command for 5-15 minutes for the ISLs.
# /usr/sbin/opacabletest stop_isl stop_fi -A -n 3 -f '/etc/sysconfig/opa/allhosts'
# opaextractperf > link_stability_perf.csv
# opaextracterror > link_stability_counters.csv
Use opatop to view link utilization.
2.
For large fabrics, check stability using a long run of opacabletest (typically 4-8
hours). Short runs of 10-15 minutes are fine for initial validation.
3.
How to interpret the results:
The opaextracterror command name is a misnomer: it captures interesting statistics
for evaluating links, but most of the content is not indicative of failures. The OPA
fabric has robust end-to-end recovery mechanisms that handle issues.
Look specifically at the following columns:
• LinkWidthDnGradeTxActive: expect to see x4 width
• LinkWidthDnGradeRxActive: expect to see x4 width
• LinkQualityIndicator: 5 is excellent, 4 is acceptable, 3 is marginal and clearly
an issue.
• LinkDowned: when an HFI is reset, the link down count increases, so
rebooting a server results in small increments. If you see a link with
significantly higher counts than its reboot history explains, check the
server /var/log/messages file to determine whether the server is rebooting
or the link is re-initializing.
For the other error counters, run a column sort and look for error counts that are much
higher (greater than 100x) than those of other links, and take the link types into
account. Optical links have higher retry rates. This is not typically an issue unless
they far exceed their peers.
The opaextractperf output is useful for verifying that every link is being tested.
Unusual fabric topologies may result in non-optimum cabletest results. One
workaround is to run the isl and fi (HFI) link tests separately, then look at the total
error results.
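The column checks described in this section can be sketched as a small filter over the CSV file. This is a minimal sketch, not an Intel tool: the function name low_quality_links is hypothetical, and it assumes the file starts with a header row, uses ';' as the field separator, and names the column exactly LinkQualityIndicator. Check these assumptions against the actual opaextracterror output on your system before relying on it.

```shell
#!/bin/sh
# Sketch: list rows of an opaextracterror-style CSV whose LinkQualityIndicator
# is below a threshold. Assumptions: the first line is a header, the field
# separator is ';' (override with a third argument), and the column is named
# exactly LinkQualityIndicator.
low_quality_links() {
    file=$1
    min=${2:-4}
    sep=${3:-";"}
    awk -F"$sep" -v min="$min" '
        NR == 1 {
            # Locate the column by its header name.
            for (i = 1; i <= NF; i++) if ($i == "LinkQualityIndicator") col = i
            if (!col) { print "LinkQualityIndicator column not found" > "/dev/stderr"; exit 2 }
            next
        }
        # Print data rows whose quality value is below the threshold.
        ($col + 0) < min { print }
    ' "$file"
}
```

A possible use after a cable test run is low_quality_links link_stability_counters.csv 4, which prints only the rows with a link quality of 3 or lower.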
14.3.8
How to debug and fix physical link issues
Check the topology before and after each of the debug steps using:
# opareport -o verifyall -T test_topology.xml
If the original issue was marginal operation rather than a hard failure, then re-run
opacabletest and analyze the opaextracterror results to verify whether the issues
were resolved.
At this point, you have a list of links with issues. Intel recommends the following
approach for physical link resolution:
1.
Unplug and re-insert each end of a physical cable. Check that the cable actually
clicks into place. It may be useful to do this step separately for each end of the
cable. Re-run opacabletest and verify if the issue has been resolved.
Note: This step has resolved more link issues in fabric installs than all others.
2.
Swap the questionable cable with a known good cable to isolate whether it’s an
HFI/Switch issue or cable issue.
3.
If step 2 worked, then install the questionable cable into another location and
verify whether it works.
4.
If the issue is corrected, then the issue may be a mechanical latching issue on the
HFI/Switch connector.
5.
If the original issue was marginal operation rather than a hard failure, then re-run
opacabletest and analyze the opaextracterror results to verify whether the
issues were resolved.
6.
Re-run the physical links stability test using opacabletest.
14.3.9
Link Debug CLI Commands
Use the steps in Generate Cable Map Topology Files to create a baseline topology file
/etc/sysconfig/opa/topology.0:0.xml
You can also generate a temporary topology file of the current system, which can be
useful for live debug and tracking physical changes to the system (either adds or
deletions on connectivity).
Generate a topology file of the current system:
# opareport -o topology > test_topology.xml
Check the topology file against the current physical system:
# opareport -o verifyall -T test_topology.xml
or
# opareport -o verifyall -T /etc/sysconfig/opa/topology.0:0.xml
Intel recommends you start at the top of this list to identify all errors and then work
down to isolate specific types of errors.
• Identify fabric errors:
# opareport -o errors
• Identify slow links (< x4 width):
# opareport -o slowlinks
• Find links that are not plugged in or not seen by the interface. Find all links stuck in
the Offline state:
# opareport -A -m -F portphysstate:offline -o comps -d 5
• A link stuck in Polling may indicate that the other end of the cable is not inserted
correctly. In this case, typically, one end is Polling and the other end is Offline.
Find all links stuck in the Polling state:
# opareport -A -m -F portphysstate:polling -o comps -d 5
• Identify bad links:
# opaextractbadlinks
• As a debug step, temporarily disable all bad links. The list of disabled links is
appended to /etc/sysconfig/opa/disabled.0:0.csv.
# opaextractbadlinks | opadisableports
• Enable links that were previously disabled:
# cat /etc/sysconfig/opa/disabled.0:0.csv | opaenableports
• Bounce a link, simulating a cable pull and re-insert on a server. It may take up
to 60 seconds for the port to re-enter the Active state.
# opaportconfig bounce
• Check status using:
# opainfo
opaportconfig and opaportinfo are key commands for port debug. Run them with --help
to see the available options.
14.4
Use opatop for Bandwidth and Error Summary
Use opatop to look at bandwidth and error summary of HFIs and switches. The
following items provide a high-level overview of opatop:
• 1) selects HFIs and 2) selects SWs. Intel recommends selecting 2) SWs.
In this display, HFIs show up as Send/Rcv and ISLs show up as Int.
• Select (W) for BW.
• Select (E) for Error summary.
Use u to move to an upper level. Similarly, use 2) to view the SWs BW and error
summary.
14.5
Use the Beacon LED on HFI and Edge Switches
The LED beaconing flash pattern can be turned ON/OFF with opaportconfig. This can
be used to identify the HFI and switches/ports installed in racks that need attention.
For HFI:
# opaportconfig -l 0x001 ledoff
Disabling Led at LID 0x00000001 Port 0 via local port 1 (0x0011750101671ed9)
# opaportconfig -l 0x001 ledon
Enabling LED at LID 0x00000001 Port 0 via local port 1 (0x0011750101671ed9)
For Switch port (where -m 40 specifies the port number):
# opaportconfig -l 0x002 -m 40 ledon
Enabling LED at LID 0x00000002 Port 40 via local port 1 (0x0011750101671ed9)
# opaportconfig -l 0x002 -m 40 ledoff
Disabling Led at LID 0x00000002 Port 40 via local port 1 (0x0011750101671ed9)
14.6
Decode the Physical Configuration of an HFI Adapter
The opahfirev command provides a quick snapshot of an HFI (OPA) adapter,
providing both PCIe status and physical configuration state, complementary to the
opainfo command.
# opahfirev
######################
node145 - HFI 0000:03:00.0     # Compute server name = node145, PCIe addr 0000:03:00.0
HFI:   hfi1_0
Board: ChipABI 3.0, ChipRev 7.16, SW Compat 3
SN:    0x0067671e
Bus:   Speed 8GT/s, Width x16  # PCIe Gen3 = 8GT/s, with a x16 configuration
GUID:  0011:7501:0167:671e
TMM:   10.0.0.992.40
######################
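The Bus line in this output is the quickest way to spot a degraded PCIe link. The following is a minimal sketch of such a check, assuming the "Speed ..., Width x.." text format shown in the example above. The function name check_pcie_bus is hypothetical, and the expected speed and width are parameters because some adapters legitimately run at x8.

```shell
#!/bin/sh
# Sketch: flag an HFI whose PCIe link differs from the expected speed and
# width, based on the "Speed 8GT/s, Width x16" text shown in the opahfirev
# example. Reads the file named in $1, or stdin when $1 is "-".
check_pcie_bus() {
    want_speed=${2:-8GT/s}
    want_width=${3:-x16}
    awk -v ws="$want_speed" -v ww="$want_width" '
        /Speed [^,]+, Width x[0-9]+/ {
            # Isolate the speed token (text between "Speed " and the comma).
            speed = $0; sub(/.*Speed /, "", speed); sub(/,.*/, "", speed)
            # Isolate the width token (for example x16), dropping any trailing text.
            width = $0; sub(/.*Width /, "", width); sub(/[^x0-9].*/, "", width)
            status = (speed == ws && width == ww) ? "OK" : "DEGRADED"
            print status ": Speed " speed ", Width " width
        }
    ' "$1"
}
```

One way to use it is opahfirev | check_pcie_bus -, which prints one OK or DEGRADED line per adapter found in the output.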
Note the TMM field, which reports the firmware version of the TMM, an optional
micro-controller for thermal monitoring on vendor-specific HFI adapters using the SMBus.
• Check the TMM FW version using: opatmmtool fwversion
• Check the TMM FW version in the hfi1_smbus.fw file using:
opatmmtool -f /lib/firmware/updates/hfi1_smbus.fw fileversion
• If the fwversion is less than the fileversion, update the TMM firmware using:
opatmmtool -f /lib/firmware/updates/hfi1_smbus.fw update
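The "fwversion is less than fileversion" decision can be sketched as a dotted version comparison. This is only an illustration: the function name tmm_needs_update is hypothetical, the two version strings must be supplied by the caller because the exact opatmmtool output format is not shown in this guide, and the comparison relies on the version sort (-V) option of GNU sort.

```shell
#!/bin/sh
# Sketch: return success (0) when the running TMM firmware version is older
# than the version in the firmware file. Both arguments are dotted version
# strings such as 10.0.0.992.40. Relies on "sort -V" from GNU coreutils.
tmm_needs_update() {
    running=$1   # version reported by the adapter (fwversion)
    infile=$2    # version contained in hfi1_smbus.fw (fileversion)
    # Identical versions never need an update.
    [ "$running" = "$infile" ] && return 1
    # The lower of the two versions sorts first.
    lowest=$(printf '%s\n%s\n' "$running" "$infile" | sort -V | head -n 1)
    [ "$lowest" = "$running" ]
}
```

A caller would run the update command from the list above only when tmm_needs_update returns success.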
14.7
Verify Fabric Manager Sweep
By default, Fabric Manager sweeps every 5 minutes as defined in
/etc/sysconfig/opafm.xml. Sweeps are triggered sooner if there are fabric changes
such as hosts, switches, or links going up or down. Open /var/log/messages and
search for CYCLE START. Each cycle start has a complementary cycle end. Any links
with errors are noted during this sweep cycle.
An example of a clean SM sweep follows:
Feb 16 16:12:08 hds1fnb8261 fm0_sm[3946]: PROGR[topology]: SM: topology_main: TT: DISCOVERY CYCLE START - REASON: Scheduled sweep interval
Feb 16 16:12:08 hds1fnb8261 fm0_sm[3946]: PROGR[topology]: SM: topology_main: DISCOVERY CYCLE END. 9 SWs, 131 HFIs, 131 end ports, 523 total ports, 1 SM(s), 1902 packets, 0 retries, 0.350 sec sweep
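To review many sweeps at once, the retry count and sweep time can be pulled out of each cycle end entry. The following is a minimal sketch that assumes each log entry is a single line in the format of the example above; other software releases may log a different layout. The function name sweep_summary is hypothetical.

```shell
#!/bin/sh
# Sketch: summarize SM sweeps from a syslog file by extracting the retry
# count and sweep time from each "DISCOVERY CYCLE END" line. The field layout
# is taken from the example in this section.
sweep_summary() {
    grep 'DISCOVERY CYCLE END' "$1" |
    sed -n 's/.* \([0-9][0-9]*\) retries, \([0-9.][0-9.]*\) sec sweep.*/retries=\1 sweep_sec=\2/p'
}
```

For example, sweep_summary /var/log/messages prints one line per completed sweep; a non-zero retry count or a sudden jump in sweep time is a reason to look at the surrounding log entries.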
14.8
Verify PM Sweep Duration
To show the sweep duration, open opatop then select i
opatop: Img:Tue Feb 16 01:54:43 2016, Hist    Now:Tue Feb 16 09:53:26 2016
Image Info:
Sweep Start: Tue Feb 16 01:54:43 2016
Sweep Duration: 0.001 Seconds
Num SW-Ports:    3   HFI-Ports: 2
Num SWs:         1   Num Links: 2   Num SMs: 2
Num Fail Nodes:  0   Ports:     0   Unexpected Clear Ports: 0
Num Skip Nodes:  0   Ports:     0
Select r to traverse the previous sweep duration time from history files. By default,
PM sweeps every 10 seconds. The latest 10 image files (100 sec) are stored in RAM
and up to 24 hours of history is stored in /var/opt/opafm/.
14.9
Check Credit Loop Operation
For details on credit loops, see the Intel® Omni-Path Fabric Suite Fabric Manager User
Guide QoS Operation section.
To verify that a fabric does not have a credit loop issue, use:
# opareport -o validatecreditloops
15
Run Benchmark and Stress Tests
Configuration for both Bandwidth and Latency Test:
# source /usr/mpi/gcc/openmpi-1.10.0-35-hfi/bin/mpivars.sh
# cd /usr/mpi/gcc/openmpi-1.10.0-35-hfi/bin/
Create host2 file with two nodes (if it does not exist):
# export LD_LIBRARY_PATH=/usr/mpi/gcc/openmpi-1.10.0-35-hfi/lib64
15.1
Run Bandwidth Test
From /opt/opa/src/mpi_apps run:
# ./run_bw3
This test uses hosts defined in the mpi_hosts file.
15.2
Run Latency Test
From /opt/opa/src/mpi_apps run:
# ./run_lat3
15.3
Run MPI Deviation Test
From /opt/opa/src/mpi_apps run:
# ./run_deviation 20 -bwtol 20 -lattol 50
15.4
Run mpi_groupstress (Cable Stress)
Note: This section describes a procedure for using Cable Test for OEM testing of custom
interconnects such as backplanes, integrated switches, and custom HFIs. This test is
not for use by end customers.
Refer to: Intel® Omni-Path Fabric Suite FastFabric Command Line Interface Reference
Guide for detailed information. The test is located in /opt/opa/src/mpi_apps.
1.
Create mpi_group_hosts:
# /opt/opa/src/mpi_apps/gen_group_hosts
Accept the defaults.
2.
Verify file contents from /opt/opa/src/mpi_apps/mpi_group_hosts:
# cat mpi_group_hosts
/opt/opa/src/mpi_apps/groupstress/mpi_groupstress.c
mpicc -o mpi_groupstress_ompi mpi_groupstress.c
3.
Clear error counters:
# opareport -o none --clearall
4.
Confirm no errors exist:
# opareport -o errors
5.
Run mpi_groupstress test:
# mpirun -machinefile /opt/opa/src/mpi_apps/mpi_group_hosts ./mpi_groupstress
MPI_GroupStress BIBW Cable Stress Test
6 groups of 2, running for 60 minutes.
6.
Run opatop to monitor the link utilization during the test. For HFI use 1, then W
and for switches use 2, then W.
An output example of opatop 1, W follows:
Group BW Stats: HFIs     Criteria: Util-High     Number: 10
Snd:  TotMBps AvgMBps MinMBps MaxMBps TotKPps AvgKPps MinKPps MaxKPps
        45618    7603       0    9173    6590    1098       0    1328
Buckt   0+%  10+%  20+%  30+%  40+%  50+%  60+%  70+%  80+%  90+%
          1     0     0     0     0     0     0     5     0     0
Rcv:  TotMBps AvgMBps MinMBps MaxMBps TotKPps AvgKPps MinKPps MaxKPps
        45619    7603       0    9173    6595    1099       0    1328
Buckt   0+%  10+%  20+%  30+%  40+%  50+%  60+%  70+%  80+%  90+%
          1     0     0     0     0     0     0     5     0     0
By default, the test runs for 60 minutes. To run for longer duration, specify
minutes in the mpirun command. For example, ./mpi_groupstress 120 runs for
2 hours.
7.
Check error counts after the test:
# opareport -o errors
8.
View the log file, which is available for analysis in /opt/opa/src/mpi_apps/logs.
The log filename format is mpi_groupstress.date_time
9.
Extract the log file in CSV (Comma-Separated Values) format for errors and
performance.
# opaextracterror
# opaextractperf
15.5
Run run_mpi_stress
The default traffic pattern is “all-to-all” for this test.
Refer to Intel® Omni-Path Fabric Suite FastFabric Command Line Interface Reference
Guide for detailed information.
The test is located in /opt/opa/src/mpi_apps.
1.
Clear error counters:
# opareport -o none --clearall
2.
Confirm no errors exist:
# opareport -o errors
3.
Run mpi_stress test using a 60 minute duration:
# ./run_mpi_stress all -t 60
4.
Run opatop to monitor the link utilization during the test.
5.
Check error counts after the test:
# opareport -o errors
6.
View the log file that is available for analysis in /opt/opa/src/mpi_apps/logs.
The log filename format is mpi_stress.date_time
7.
Extract the log file in CSV format for errors and performance.
# opaextracterror
# opaextractperf
16
Take State Dump of a Switch
Note: Taking a state dump is a disruptive process and requires reboot of the switch after the
state dump is taken. A state dump should only be taken if required to debug an issue.
A state dump of a switch is taken from an internally managed switch in the fabric. To
take a state dump of a switch, find its LID by running the opaextractlids | grep
switch_name command. Then run the ismTakeStateDump -lid lid command. The
following example shows taking a state dump of a switch with the LID 0x04. Because
a state dump is addressed by LID, the following process applies to both
internally managed and externally managed switches.
1.
On an internally managed switch in the fabric, log in to support:
Edge-> supportLogin
username: Username
password: Password
Edge-> ismTakeStateDump -lid 0x0004
Dumping state of the switch at lid 4 to /firmware/prr-LID0004.gz
2.
From the Management Node SFTP to the internally managed switch:
sftp admin@<internally managed switch> with password adminpass.
admin@10.228.222.20's password:
Connected to 10.228.222.20.
sftp> dir
admin
operator
prr-LID0004.gz
prr-LID0005.gz
prr-LID0015.gz
get prr-LID0004.gz
3.
Reboot the switch where the state dump was taken. For externally managed
switches, use FastFabric to reboot the switch. A reboot clears the state dump.
17
BKM and OPA Commands
Note: OPA commands should be issued from the Management Node where the IFS Full
package was installed.
17.1
Retrieve Host Fabric Interface (HFI) Temperature
Use the command:
cat /sys/class/infiniband/hfi1_X/tempsense
where X represents the device number.
When you send the command, the information is acquired at that specific time. Do not
be concerned with the file’s date/time.
An example of the output and the definition for each group of numbers follows:
# cat /sys/class/infiniband/hfi1_0/tempsense
68.50 0.00 105.00 105.00 0 0 0
 68.50 – actual temperature and temperature steps are 0.25⁰ C
 0.00 – low limit
 105.00 – upper limit
 105.00 – critical limit
 0 (first) – low limit flag (1 = flag set)
 0 (second) – upper limit flag (1 = flag set)
 0 (third) – critical limit flag (1 = flag set)
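The field order above can be turned into a small decoder. This is a minimal sketch, assuming the seven-field layout shown in the example; the function name decode_tempsense is hypothetical and is not part of the OPA software.

```shell
#!/bin/sh
# Sketch: label the seven tempsense fields in the order described above and
# report whether any limit flag is set. Pass the tempsense line as one
# quoted argument.
decode_tempsense() {
    # Intentional word splitting of the single-line reading into $1..$7.
    set -- $1
    [ $# -eq 7 ] || { echo "expected 7 fields, got $#" >&2; return 2; }
    echo "temp=$1 low=$2 high=$3 crit=$4"
    if [ "$5" -eq 1 ] || [ "$6" -eq 1 ] || [ "$7" -eq 1 ]; then
        echo "ALERT: limit flag set (low=$5 high=$6 crit=$7)"
        return 1
    fi
    echo "OK: no limit flags set"
}
```

A typical call would be decode_tempsense "$(cat /sys/class/infiniband/hfi1_0/tempsense)"; the non-zero return status when a flag is set makes it easy to use in a loop over hosts.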
17.2
Read Error Counters
Use the command:
# opareport -o errors
This command uses the default thresholds defined in:
/etc/sysconfig/opa/opamon.si.conf.
To run it against a different threshold file, use:
opareport -o errors -c /etc/sysconfig/opa/filename.conf
17.3
Clear Error Counters
Use the command:
# opareport -o none --clearall
17.4
Load and Unload Intel® Omni-Path Host HFI Driver
Unload the HFI driver using:
# modprobe -r hfi1
Load the HFI driver using:
# modprobe hfi1
17.5
Analyze Links
Use the commands:
# opainfo
Output includes the link quality of the local HFI port.
# opareport -o errors
Includes links with lower quality.
# opareport -o links -F linkqualLE:value
Outputs the ports with a link quality less than or equal to the value.
# opareport -o links -F linkqualGE:value
Outputs the ports with a link quality greater than or equal to the value.
# opareport -o links -F linkqual:value
Outputs the ports with a link quality equal to the value.
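The value passed to the linkqual filters is a number from 0 to 5, with the meanings given in Table 1. As an illustration only, the following sketch translates a value into the recommended action from that table; the function name linkqual_action is hypothetical.

```shell
#!/bin/sh
# Sketch: translate a link quality value (0-5) into the recommended action
# listed in Table 1 of this section.
linkqual_action() {
    case "$1" in
        5) echo "no action needed" ;;
        4) echo "no action required" ;;
        3) echo "corrective action on next maintenance window" ;;
        2) echo "timely corrective action" ;;
        1) echo "immediate corrective action" ;;
        0) echo "link down" ;;
        *) echo "invalid link quality: $1" >&2; return 2 ;;
    esac
}
```

For example, linkqual_action 3 prints the maintenance-window recommendation, which can be attached to each port reported by the linkqualLE filter.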
Table 1. Link Quality Values and Description
Link Quality Value   Description
5                    Working at or above preferred link quality, no action needed.
4                    Working slightly below preferred link quality, but no action required.
3                    Working on low end of acceptable link quality, recommended corrective
                     action on next maintenance window.
2                    Working below acceptable link quality, recommend timely corrective
                     action.
1                    Working far below acceptable link quality, recommend immediate
                     corrective action.
0                    Link down
17.6
Trace Route Between Two Nodes
Use the command:
# opareport -o route -S nodepat:"hds1fnb6101 hfi1_0" -D nodepat:"hds1fnb6103 hfi1_0"
To trace using LID, use the command:
# opareport -o route -S lid:5 -D lid:8
17.7
Analyze All Fabric ISLs Routing Balance
Use the command:
# opareport -o treepathusage
17.8
Dump Switch ASIC Forwarding Tables
Use the commands:
• opareport -o linear: Displays all switch unicast forwarding tables, DLIDs, and
egress ports.
• opareport -o mcast: Displays multicast groups and members.
17.9
Configure Redundant Fabric Manager (FM) Priority
17.9.1
Configure FM priority from a local or remote terminal
Perform the following steps:
1.
Edit the /etc/sysconfig/opafm.xml file.
2.
Select the <Priority>0</Priority> entry and change 0 to the number you want (0-15).
3.
Save the file.
4.
Start or restart the Fabric Manager to load the new file.
systemctl restart opafm
Note: If you set a Fabric Manager to a higher priority, it becomes the master Fabric Manager
automatically. The sticky failover option is disabled by default.
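For scripted staging, the edit in step 2 could be sketched as follows. This is a minimal sketch, not an Intel-provided procedure: the function name set_fm_priority is hypothetical, it assumes each Priority element is written on one line without attributes, and it changes every Priority element it finds (the file may contain more than one), so review the result and keep a backup before restarting the Fabric Manager.

```shell
#!/bin/sh
# Sketch: set every <Priority>N</Priority> element in an opafm.xml-style file.
# ElevatedPriority elements are not touched because the pattern requires "<"
# directly before "Priority".
set_fm_priority() {
    file=$1
    prio=$2
    # Accept only an unsigned integer in the documented range 0-15.
    case "$prio" in
        ''|*[!0-9]*) echo "priority must be an integer 0-15" >&2; return 2 ;;
    esac
    [ "$prio" -le 15 ] || { echo "priority must be an integer 0-15" >&2; return 2; }
    tmp=$(mktemp) || return 1
    # Write through a temporary file, then copy back so that the ownership
    # and permissions of the original file are preserved.
    sed "s|<Priority>[0-9]*</Priority>|<Priority>${prio}</Priority>|g" "$file" > "$tmp" &&
        cat "$tmp" > "$file"
    rc=$?
    rm -f "$tmp"
    return $rc
}
```

A cautious sequence would be to copy /etc/sysconfig/opafm.xml to a backup, run set_fm_priority on the original with the chosen value, compare the two files, and only then restart the Fabric Manager.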
17.9.2
Configure FM Elevated Priority
Perform the following steps:
1.
Edit the /etc/sysconfig/opafm.xml file.
2.
Select the <ElevatedPriority>0</ElevatedPriority> entry and change 0 to the
number you want (0-15).
3.
Save the file.
4.
Start or restart the Fabric Manager to load the new file.
systemctl restart opafm
17.9.3
Configuration Consistency for Priority/Elevated Priority
The opafm.xml configuration consistency check makes standby Fabric Managers with a
mismatched configuration inactive, because they are not valid to take over as Master
in case of failover. Priority and Elevated Priority are not part of the configuration
consistency checksum calculation.
Having different values for Priority and Elevated Priority settings for SM instances is
allowed and failover works as documented per Priority/ElevatedPriority settings. In
normal failover without elevated priority, if the original Master Fabric Manager goes
down, the Standby Fabric Manager becomes Master. When the original Master comes
back up, it again takes over as Master.
Note: In sticky failover, Elevated Priority is used and with sticky failover enabled, when the
original Master comes back up, it does NOT take over.
17.9.4
Display FM states from the Management Node
Run the opafabricinfo command to view the new active master SM.
18
Final Fabric Checks
After addressing all issues, perform final fabric checks as described in Verify Server
and Fabric.
§