Isolation procedures

Power Systems
Note
Before using this information and the product it supports, read the information in “Safety notices” on page xi, “Notices” on
page 469, the IBM Systems Safety Notices manual, G229-9054, and the IBM Environmental Notices and User Guide, Z125–5823.
This edition applies to IBM Power Systems™ servers that contain the POWER7 processor and to all associated
models.
© Copyright IBM Corporation 2010, 2015.
US Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract
with IBM Corp.
Contents
Safety notices . . . xi
Isolation procedures . . . 1
HSL/RIO 12X isolation procedures . . . 1
Bus, high-speed link (HSL/RIO/12X) isolation information . . . 1
PCI bus isolation using AIX, Linux, or the management console . . . 2
Isolating a PCI bus problem while running AIX or Linux . . . 2
Isolating a PCI bus problem from the management console . . . 3
Verifying a high-speed link, system PCI bus, or a multi-adapter bridge repair . . . 3
Analyzing a 12X or PCI bus reference code . . . 5
DSA translation . . . 6
Card positions . . . 7
Converting the loop number to 12X port location labels . . . 13
HSL loop configuration and status form . . . 18
Installed features in a PCI bridge set form . . . 19
RIO/HSL/12X link status diagnosis form . . . 19
CONSL01 . . . 20
RIOIP01 . . . 21
Main task . . . 21
The ports on both ends of the failed link are in different system units on the loop . . . 22
The port on one end of the failed link is in a system unit and the port on the other end is in an I/O unit . . . 23
The ports on both ends of the failed link are in an I/O unit . . . 24
Cannot power on unit . . . 25
Manually detecting the failed link . . . 25
Refresh the port status . . . 26
RIOIP06 . . . 26
RIOIP08 . . . 26
RIOIP09 . . . 27
RIOIP10 . . . 28
RIOIP11 . . . 29
RIOIP12 . . . 30
RIOIP56 . . . 32
Multi-adapter bridge isolation procedures . . . 33
MABIP02 . . . 33
MABIP03 . . . 33
MABIP05 . . . 33
MABIP50 . . . 33
MABIP51 . . . 33
MABIP52 . . . 34
MABIP53 . . . 35
MABIP54 . . . 38
MABIP55 . . . 40
MABIP56 . . . 42
MABIP57 . . . 44
Communication isolation procedure . . . 46
COMIP01, COMPIP1 . . . 46
Disk unit isolation procedure . . . 48
DSKIP03 . . . 48
Intermittent isolation procedures . . . 51
INTIP03 . . . 52
INTIP05 . . . 53
INTIP07 . . . 53
INTIP08 . . . 54
INTIP09 . . . 55
INTIP14 . . . 57
INTIP16 . . . 57
INTIP18 . . . 57
INTIP20 . . . 57
INTIP24 . . . 58
I/O processor isolation procedures . . . 59
IOPIP01 . . . 60
Using the product activity log . . . 63
IOPIP13 . . . 63
IOPIP16 . . . 65
IOPIP17 . . . 68
IOPIP18 . . . 70
IOPIP19 . . . 72
IOPIP20 . . . 72
IOPIP21 . . . 74
IOPIP22 . . . 75
IOPIP23 . . . 75
IOPIP25 . . . 75
IOPIP26 . . . 77
IOPIP27 . . . 78
IOPIP28 . . . 80
IOPIP29 . . . 80
IOPIP30 . . . 80
IOPIP31 . . . 83
IOPIP32 . . . 85
IOPIP33 . . . 86
IOPIP34 . . . 86
IOPIP40 . . . 87
IOPIP41 . . . 88
Licensed Internal Code isolation procedures . . . 89
LICIP01 . . . 89
LICIP03 . . . 91
LICIP04 . . . 91
LICIP07 . . . 91
LICIP08 . . . 94
LICIP11 . . . 94
How to find the cause code . . . 95
0001 . . . 95
0002 . . . 96
0004 . . . 98
0005 . . . 99
0006 . . . 99
0007 . . . 99
0008 . . . 99
0009 . . . 99
000A . . . 99
000B . . . 100
000C . . . 100
000D . . . 100
000E . . . 100
0010 . . . 101
0011 . . . 101
0012 . . . 101
0015 . . . 101
0016 . . . 101
0017 . . . 101
0018 . . . 101
0019 . . . 101
001A . . . 102
001C . . . 102
001D . . . 102
001E . . . 102
001F . . . 102
0020 . . . 103
0021 . . . 103
0022 . . . 103
0023 . . . 103
0024 . . . 103
0025 . . . 104
0026 . . . 104
0027 . . . 104
002A . . . 104
002B . . . 104
0031 . . . 104
0033 . . . 104
0034 . . . 104
0035 . . . 105
0037 . . . 105
0038 . . . 105
0039 . . . 105
003A . . . 105
0099 . . . 105
LICIP12 . . . 105
How to find the cause code . . . 105
0002 . . . 106
0004 . . . 108
0007 . . . 108
0008 . . . 108
0009 . . . 108
000A . . . 108
000B . . . 108
000D . . . 108
000E . . . 109
002C . . . 109
002D . . . 109
002E . . . 109
002F . . . 109
0030 . . . 109
0032 . . . 109
0099 . . . 109
LICIP13 . . . 109
LICIP14 . . . 114
LICIP15 . . . 115
LICIP16 . . . 117
Logical partition isolation procedure . . . 118
LPRIP01 . . . 118
Operations console isolation procedures . . . 122
OPCIP03 . . . 122
Power isolation procedures . . . 124
Power problems . . . 125
Cannot power on system unit . . . 126
Cannot power off system or SPCN-controlled I/O expansion unit . . . 129
Cannot power on SPCN-controlled I/O expansion unit . . . 132
IQYDBPL . . . 136
IQYPLNR . . . 136
IQYRIEA . . . 137
IQYRIEB . . . 137
IQYRIRR . . . 137
IQYRISC . . . 137
IQYRISE . . . 137
IQYRISJ . . . 138
IQYRISK . . . 138
IQYRISM . . . 138
IQYRISQ . . . 139
IQYRISR . . . 139
IQYRISS . . . 139
IQYRISU . . . 139
IQYRISZ . . . 140
PWR1900 . . . 140
PWR1904 . . . 140
Procedure for 8248-L4T, 8408-E8D, or 9109-RMD . . . 141
Procedure for 8412-EAD, 9117-MMB, 9117-MMC, 9117-MMD, 9179-MHB, 9179-MHC, or 9179-MHD . . . 143
PWR1905 . . . 145
Procedure for 8202-E4B, 8202-E4C, 8202-E4D, 8205-E6B, 8205-E6C, 8205-E6D, 8231-E2B, 8231-E1C, 8231-E1D, 8231-E2C, 8231-E2D, 8233-E8B, 8236-E8C, or 8268-E1D . . . 145
PWR1907 . . . 147
PWR1909 . . . 149
Procedure for 5796, 5802, 5877, and 7314-G30 . . . 149
PWR1911 . . . 151
PWR1912 . . . 154
PWR1917 . . . 156
PWR1918 . . . 158
PWR1920 . . . 162
PWR2402 . . . 163
Router isolation procedures . . . 165
RTRIP01 . . . 165
RTRIP02 . . . 165
RTRIP03 . . . 166
RTRIP04 . . . 166
RTRIP05 . . . 166
RTRIP06 . . . 167
RTRIP07 . . . 168
RTRIP08 . . . 168
Serial-attached SCSI isolation procedures . . . 169
SIP3110 . . . 169
SIP3111 . . . 171
SIP3112 . . . 172
SIP3113 . . . 174
SIP3120 . . . 176
SIP3121 . . . 177
SIP3130 . . . 178
SIP3131 . . . 179
SIP3132 . . . 182
SIP3134 . . . 184
SIP3140 . . . 186
SIP3141 . . . 187
SIP3142 . . . 189
SIP3143 . . . 190
SIP3144 . . . 191
SIP3145 . . . 195
SIP3146 . . . 196
SIP3147 . . . 199
SIP3148 . . . 200
SIP3149 . . . 201
SIP3150 . . . 201
SIP3152 . . . 204
SIP3153 . . . 207
SIP3250 . . . 207
SIP3254 . . . 210
SIP3290 . . . 210
SIP3295 . . . 211
SIP4040 . . . 211
SIP4041 . . . 211
SIP4044 . . . 213
SIP4047 . . . 215
SIP4049 . . . 215
SIP4050 . . . 217
SIP4052 . . . 220
SIP4053 . . . 222
SIP4140 . . . 222
SIP4141 . . . 222
SIP4144 . . . 223
SIP4147 . . . 225
SIP4149 . . . 226
SIP4150 . . . 226
SIP4152 . . . 229
SIP4153 . . . 231
Service processor isolation procedures . . . 232
FSPSP01 . . . 232
FSPSP02 . . . 240
FSPSP03 . . . 241
FSPSP04 . . . 241
FSPSP05 . . . 241
FSPSP06 . . . 241
FSPSP07 . . . 241
FSPSP09 . . . 242
FSPSP10 . . . 243
FSPSP11 . . . 244
FSPSP12 . . . 245
FSPSP14 . . . 246
FSPSP16 . . . 247
FSPSP17 . . . 247
FSPSP18 . . . 247
FSPSP20 . . . 247
FSPSP22 . . . 248
FSPSP23 . . . 250
FSPSP24 . . . 250
FSPSP25 . . . 251
FSPSP27 . . . 251
FSPSP28 . . . 257
FSPSP29 . . . 257
FSPSP30 . . . 259
FSPSP31 . . . 260
FSPSP32 . . . 260
FSPSP33 . . . 265
FSPSP34 . . . 265
FSPSP35 . . . 266
FSPSP36 . . . 266
FSPSP38 . . . 268
FSPSP42 . . . 269
FSPSP45 . . . 270
FSPSP46 . . . 273
FSPSP47 . . . 276
FSPSP48 . . . 278
FSPSP49 . . . 280
FSPSP50 . . . 281
FSPSP51 . . . 283
FSPSP52 . . . 283
FSPSP54 . . . 284
FSPSP55 . . . 285
FSPSP56 . . . 285
FSPSP57 . . . 285
FSPSP58 . . . 286
FSPSP59 . . . 287
FSPSP60 . . . 287
FSPSP61 . . . 287
9119-FHB bulk power connection tables . . . 287
9125-F2C bulk power connection tables . . . 289
FSPSP62 . . . 290
9119-FHB bulk power connection tables . . . 292
FSPSP63 . . . 294
FSPSP64 . . . 294
FSPSP65 . . . 294
FSPSP66 . . . 294
FSPSP67 . . . 294
FSPSP68 . . . 294
FSPSP70 . . . 294
FSPSP71 . . . 295
FSPSP73 . . . 295
FSPSP75 . . . 296
FSPSP79 . . . 296
FSPSP83 . . . 297
FSPSPC1 . . . 297
FSPSPD1 . . . 302
Tape unit isolation procedures . . . 303
TUPIP03 . . . 304
TUPIP04 . . . 306
TUPIP06 . . . 310
Tape unit self-test procedure . . . 311
Running the self-test . . . 311
Interpreting the results . . . 312
Tape device ready conditions . . . 312
Twinaxial workstation I/O processor isolation procedure . . . 313
TWSIP01 . . . 314
Workstation adapter isolation procedure . . . 320
WSAIP01 . . . 321
Workstation adapter console isolation procedure . . . 323
Isolating problems on servers that run AIX or Linux . . . 325
MAP 0210: General problem resolution . . . 325
Problems with loading and starting the operating system (AIX and Linux) . . . 326
SCSI service hints . . . 329
General SCSI configuration checks . . . 329
High availability or multiple SCSI system checks . . . 329
SCSI-2 single-ended adapter PTC failure isolation procedure . . . 330
Determining where to start . . . 330
External SCSI-2 single-ended bus PTC isolation procedure . . . 330
External SCSI-2 single-ended bus probable tripped PTC causes . . . 331
Internal SCSI-2 single-ended bus PTC isolation procedure . . . 331
Internal SCSI-2 single-ended bus probable tripped PTC resistor causes . . . 333
SCSI-2 differential adapter PTC failure isolation procedure . . . 333
External SCSI-2 differential adapter bus PTC isolation procedure . . . 333
SCSI-2 differential adapter probable tripped PTC causes . . . 335
Dual-channel ultra SCSI adapter PTC failure isolation procedure . . . 335
64-bit PCI-X dual channel SCSI adapter PTC failure isolation procedure . . . 336
MAP 0020 . . . 336
MAP 0030 . . . 343
MAP 0040 . . . 344
MAP 0050 . . . 346
Preparing for hot-plug SCSI device or cable deconfiguration . . . 351
After hot-plug SCSI device or cable deconfiguration . . . 352
MAP 0054 . . . 353
MAP 0070 . . . 355
MAP 0220 . . . 356
MAP 0230 . . . 360
MAP 0235 . . . 366
MAP 0260 . . . 367
MAP 0270 . . . 369
MAP 0280 . . . 373
MAP 0285 . . . 374
MAP 0291 . . . 377
MAP 4040 . . . 379
MAP 4041 . . . 379
MAP 4044 . . . 380
MAP 4047 . . . 383
MAP 4049 . . . 384
MAP 4050 . . . 385
MAP 4052 . . . 389
MAP 4053 . . . 391
MAP 4140 . . . 393
MAP 4141 . . . 394
MAP 4144 . . . 395
MAP 4147 . . . 398
MAP 4149 . . . 398
MAP 4150 . . . 399
MAP 4152 . . . 402
MAP 4153 . . . 405
MAP 5000 . . . 407
MAP 5001 . . . 407
PFW1540: Problem isolation procedures . . . 407
PFW1542: I/O problem isolation procedure . . . 408
PFW1548: Memory and processor subsystem problem isolation procedure . . . 423
PFW1548: Memory and processor subsystem problem isolation procedure when a management console is attached . . . 438
PFW1548: Memory and processor subsystem problem isolation procedure without a management console attached . . . 446
SAS fabric identification . . . 453
SAS RAID configurations . . . 458
Notices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 469
Trademarks
Electronic emission notices
Class A Notices
Class B Notices
Terms and conditions
Safety notices
Safety notices may be printed throughout this guide:
v DANGER notices call attention to a situation that is potentially lethal or extremely hazardous to
people.
v CAUTION notices call attention to a situation that is potentially hazardous to people because of some
existing condition.
v Attention notices call attention to the possibility of damage to a program, device, system, or data.
World Trade safety information
Several countries require the safety information contained in product publications to be presented in their
national languages. If this requirement applies to your country, safety information documentation is
included in the publications package (such as in printed documentation, on DVD, or as part of the
product) shipped with the product. The documentation contains the safety information in your national
language with references to the U.S. English source. Before using a U.S. English publication to install,
operate, or service this product, you must first become familiar with the related safety information
documentation. You should also refer to the safety information documentation any time you do not
clearly understand any safety information in the U.S. English publications.
Replacement or additional copies of safety information documentation can be obtained by calling the IBM
Hotline at 1-800-300-8751.
German safety information
Das Produkt ist nicht für den Einsatz an Bildschirmarbeitsplätzen im Sinne § 2 der
Bildschirmarbeitsverordnung geeignet.
Laser safety information
IBM® servers can use I/O cards or features that are fiber-optic based and that utilize lasers or LEDs.
Laser compliance
IBM servers may be installed inside or outside of an IT equipment rack.
DANGER
When working on or around the system, observe the following precautions:
Electrical voltage and current from power, telephone, and communication cables are hazardous. To
avoid a shock hazard:
v Connect power to this unit only with the IBM provided power cord. Do not use the IBM
provided power cord for any other product.
v Do not open or service any power supply assembly.
v Do not connect or disconnect any cables or perform installation, maintenance, or reconfiguration
of this product during an electrical storm.
v The product might be equipped with multiple power cords. To remove all hazardous voltages,
disconnect all power cords.
v Connect all power cords to a properly wired and grounded electrical outlet. Ensure that the outlet
supplies proper voltage and phase rotation according to the system rating plate.
v Connect any equipment that will be attached to this product to properly wired outlets.
v When possible, use one hand only to connect or disconnect signal cables.
v Never turn on any equipment when there is evidence of fire, water, or structural damage.
v Disconnect the attached power cords, telecommunications systems, networks, and modems before
you open the device covers, unless instructed otherwise in the installation and configuration
procedures.
v Connect and disconnect cables as described in the following procedures when installing, moving,
or opening covers on this product or attached devices.
To Disconnect:
1. Turn off everything (unless instructed otherwise).
2. Remove the power cords from the outlets.
3. Remove the signal cables from the connectors.
4. Remove all cables from the devices.
To Connect:
1. Turn off everything (unless instructed otherwise).
2. Attach all cables to the devices.
3. Attach the signal cables to the connectors.
4. Attach the power cords to the outlets.
5. Turn on the devices.
(D005)
DANGER
Observe the following precautions when working on or around your IT rack system:
v Heavy equipment–personal injury or equipment damage might result if mishandled.
v Always lower the leveling pads on the rack cabinet.
v Always install stabilizer brackets on the rack cabinet.
v To avoid hazardous conditions due to uneven mechanical loading, always install the heaviest
devices in the bottom of the rack cabinet. Always install servers and optional devices starting
from the bottom of the rack cabinet.
v Rack-mounted devices are not to be used as shelves or work spaces. Do not place objects on top
of rack-mounted devices.
v Each rack cabinet might have more than one power cord. Be sure to disconnect all power cords in
the rack cabinet when directed to disconnect power during servicing.
v Connect all devices installed in a rack cabinet to power devices installed in the same rack
cabinet. Do not plug a power cord from a device installed in one rack cabinet into a power
device installed in a different rack cabinet.
v An electrical outlet that is not correctly wired could place hazardous voltage on the metal parts of
the system or the devices that attach to the system. It is the responsibility of the customer to
ensure that the outlet is correctly wired and grounded to prevent an electrical shock.
CAUTION
v Do not install a unit in a rack where the internal rack ambient temperatures will exceed the
manufacturer's recommended ambient temperature for all your rack-mounted devices.
v Do not install a unit in a rack where the air flow is compromised. Ensure that air flow is not
blocked or reduced on any side, front, or back of a unit used for air flow through the unit.
v Consideration should be given to the connection of the equipment to the supply circuit so that
overloading of the circuits does not compromise the supply wiring or overcurrent protection. To
provide the correct power connection to a rack, refer to the rating labels located on the
equipment in the rack to determine the total power requirement of the supply circuit.
v (For sliding drawers.) Do not pull out or install any drawer or feature if the rack stabilizer brackets
are not attached to the rack. Do not pull out more than one drawer at a time. The rack might
become unstable if you pull out more than one drawer at a time.
v (For fixed drawers.) This drawer is a fixed drawer and must not be moved for servicing unless
specified by the manufacturer. Attempting to move the drawer partially or completely out of the
rack might cause the rack to become unstable or cause the drawer to fall out of the rack.
(R001)
CAUTION:
Removing components from the upper positions in the rack cabinet improves rack stability during
relocation. Follow these general guidelines whenever you relocate a populated rack cabinet within a
room or building:
v Reduce the weight of the rack cabinet by removing equipment starting at the top of the rack
cabinet. When possible, restore the rack cabinet to the configuration of the rack cabinet as you
received it. If this configuration is not known, you must observe the following precautions:
– Remove all devices in the 32U position and above.
– Ensure that the heaviest devices are installed in the bottom of the rack cabinet.
– Ensure that there are no empty U-levels between devices installed in the rack cabinet below the
32U level.
v If the rack cabinet you are relocating is part of a suite of rack cabinets, detach the rack cabinet from
the suite.
v Inspect the route that you plan to take to eliminate potential hazards.
v Verify that the route that you choose can support the weight of the loaded rack cabinet. Refer to the
documentation that comes with your rack cabinet for the weight of a loaded rack cabinet.
v Verify that all door openings are at least 760 x 2030 mm (30 x 80 in.).
v Ensure that all devices, shelves, drawers, doors, and cables are secure.
v Ensure that the four leveling pads are raised to their highest position.
v Ensure that there is no stabilizer bracket installed on the rack cabinet during movement.
v Do not use a ramp inclined at more than 10 degrees.
v When the rack cabinet is in the new location, complete the following steps:
– Lower the four leveling pads.
– Install stabilizer brackets on the rack cabinet.
– If you removed any devices from the rack cabinet, repopulate the rack cabinet from the lowest
position to the highest position.
v If a long-distance relocation is required, restore the rack cabinet to the configuration of the rack
cabinet as you received it. Pack the rack cabinet in the original packaging material, or equivalent.
Also lower the leveling pads to raise the casters off of the pallet and bolt the rack cabinet to the
pallet.
(R002)
(L001)
(L002)
(L003)
All lasers are certified in the U.S. to conform to the requirements of DHHS 21 CFR Subchapter J for class
1 laser products. Outside the U.S., they are certified to be in compliance with IEC 60825 as a class 1 laser
product. Consult the label on each part for laser certification numbers and approval information.
CAUTION:
This product might contain one or more of the following devices: CD-ROM drive, DVD-ROM drive,
DVD-RAM drive, or laser module, which are Class 1 laser products. Note the following information:
v Do not remove the covers. Removing the covers of the laser product could result in exposure to
hazardous laser radiation. There are no serviceable parts inside the device.
v Use of the controls or adjustments or performance of procedures other than those specified herein
might result in hazardous radiation exposure.
(C026)
CAUTION:
Data processing environments can contain equipment transmitting on system links with laser modules
that operate at greater than Class 1 power levels. For this reason, never look into the end of an optical
fiber cable or open receptacle. (C027)
CAUTION:
This product contains a Class 1M laser. Do not view directly with optical instruments. (C028)
CAUTION:
Some laser products contain an embedded Class 3A or Class 3B laser diode. Note the following
information: laser radiation when open. Do not stare into the beam, do not view directly with optical
instruments, and avoid direct exposure to the beam. (C030)
CAUTION:
The battery contains lithium. To avoid possible explosion, do not burn or charge the battery.
Do Not:
v Throw or immerse into water
v Heat to more than 100°C (212°F)
v Repair or disassemble
Exchange only with the IBM-approved part. Recycle or discard the battery as instructed by local
regulations. In the United States, IBM has a process for the collection of this battery. For information,
call 1-800-426-4333. Have the IBM part number for the battery unit available when you call. (C003)
Power and cabling information for NEBS (Network Equipment-Building System) GR-1089-CORE
The following comments apply to the IBM servers that have been designated as conforming to NEBS
(Network Equipment-Building System) GR-1089-CORE:
The equipment is suitable for installation in the following:
v Network telecommunications facilities
v Locations where the NEC (National Electrical Code) applies
The intrabuilding ports of this equipment are suitable for connection to intrabuilding or unexposed
wiring or cabling only. The intrabuilding ports of this equipment must not be metallically connected to the
interfaces that connect to the OSP (outside plant) or its wiring. These interfaces are designed for use as
intrabuilding interfaces only (Type 2 or Type 4 ports as described in GR-1089-CORE) and require isolation
from the exposed OSP cabling. The addition of primary protectors is not sufficient protection to connect
these interfaces metallically to OSP wiring.
Note: All Ethernet cables must be shielded and grounded at both ends.
The ac-powered system does not require the use of an external surge protection device (SPD).
The dc-powered system employs an isolated DC return (DC-I) design. The DC battery return terminal
shall not be connected to the chassis or frame ground.
Isolation procedures
Isolation procedures are used together with diagnostic programs, which are part of server firmware.
If a server is connected to a management console, these procedures are available on the management
console. Use the management console procedures to continue isolating the problem. If the server does not
have a management console and you are directed to perform an isolation procedure, the procedures
documented here are needed to continue isolating a problem.
HSL/RIO 12X isolation procedures
Use RIO/HSL/12X isolation procedures if there is no management console attached to the server. If the
server is connected to a management console, use the procedures that are available on the management
console to continue FRU isolation.
Bus, high-speed link (HSL/RIO/12X) isolation information
Symbolic FRUs, failing items, and bus isolation procedures use the terms partition and logical partition to
indicate any single partition in a system that has multiple partitions. If the system you are working on
does not have multiple partitions, then the terms refer to the primary partition.
Read all safety notices below before servicing the system and while performing a procedure.
Note: Unless instructed otherwise, always power off the system before removing, exchanging, or
installing a field-replaceable unit (FRU).
DANGER
When working on or around the system, observe the following precautions:
Electrical voltage and current from power, telephone, and communication cables are hazardous. To
avoid a shock hazard:
v Connect power to this unit only with the IBM provided power cord. Do not use the IBM
provided power cord for any other product.
v Do not open or service any power supply assembly.
v Do not connect or disconnect any cables or perform installation, maintenance, or reconfiguration
of this product during an electrical storm.
v The product might be equipped with multiple power cords. To remove all hazardous voltages,
disconnect all power cords.
v Connect all power cords to a properly wired and grounded electrical outlet. Ensure that the outlet
supplies proper voltage and phase rotation according to the system rating plate.
v Connect any equipment that will be attached to this product to properly wired outlets.
v When possible, use one hand only to connect or disconnect signal cables.
v Never turn on any equipment when there is evidence of fire, water, or structural damage.
v Disconnect the attached power cords, telecommunications systems, networks, and modems before
you open the device covers, unless instructed otherwise in the installation and configuration
procedures.
v Connect and disconnect cables as described in the following procedures when installing, moving,
or opening covers on this product or attached devices.
To Disconnect:
1. Turn off everything (unless instructed otherwise).
2. Remove the power cords from the outlets.
3. Remove the signal cables from the connectors.
4. Remove all cables from the devices.
To Connect:
1. Turn off everything (unless instructed otherwise).
2. Attach all cables to the devices.
3. Attach the signal cables to the connectors.
4. Attach the power cords to the outlets.
5. Turn on the devices.
(D005)
PCI bus isolation using AIX, Linux, or the management console
Isolate a PCI bus problem from the management console or while running in the AIX® or Linux
environment.
If you have a management console, then this procedure should be performed from the management
console as part of the management console directed service.
If you do not have a management console, then you should perform this procedure when directed by the
maintenance package.
Isolating a PCI bus problem while running AIX or Linux
Can an IPL be run on the operating system?
v No: Perform “MABIP52” on page 34. This ends the procedure.
v Yes:
Choose from the following:
– If you are running AIX, go to Running the online and standalone diagnostics log to isolate the PCI
bus failure with online diagnostics in concurrent mode.
– If you are running Linux, go to Running the online and standalone diagnostics log to isolate the PCI
bus failure with stand-alone diagnostics.
This ends the procedure.
Isolating a PCI bus problem from the management console
To isolate a PCI bus problem from the management console, check the serviceable event view for the
server for FRU part locations associated with the serviceable event, then continue with this procedure:
1. Did the serviceable event view provide the locations for the failing FRUs?
Yes: Use those locations to exchange the given FRUs one at a time until the problem is resolved.
This ends the procedure.
No: Continue with the next step.
2. Go to “DSA translation” on page 6 to determine the Direct Select Address (DSA).
3. Perform the following steps:
a. Record the bus number value (BBBB) from the DSA and convert it to decimal format.
b. Search for the decimal system bus number in the partition resources screens on the management
console.
c. Record the frame or unit type and continue with the next step.
4. Record the Cc value from the DSA. Is the Cc value greater than 00?
Yes: Continue with the next step.
No: The multi-adapter bridge number and the multi-adapter function number have not been
identified, and so the card slot cannot be identified using the DSA. Look in the management
console partition resources for non-reporting or non-operational hardware. That will indicate
which cards in which positions need to be replaced. See System FRU locations for the model you
are working on for information about the multi-adapter bridge that controls those card slots. That
multi-adapter bridge is also a FRU. This ends the procedure.
5. Is the right-most character (c) F?
No: Continue with the next step.
Yes: Only the multi-adapter bridge number has been identified. Record the multi-adapter bridge
number (left-most character of Cc) for later use. Because the card slot cannot be identified with the
DSA, see System FRU locations for the model you are working on for information about the
multi-adapter bridge that controls the card slots. Consider all card slots controlled by the
multi-adapter bridge to be FRUs. This ends the procedure.
6. See “Card positions” on page 7 and use the BBBB and Cc values that you recorded to identify the card
position. Then return to the procedure that sent you here. This ends the procedure.
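The decisions in steps 3 through 6 can be sketched in a few lines of Python. This is an illustration only, not part of the service tooling: the function name `interpret_card_field`, the returned dictionary keys, and the sample DSA values are assumptions made for this example. It takes a DSA written as BBBB Ccxx, converts the bus number to decimal, and reports which branch of the procedure applies.

```python
import string


def interpret_card_field(dsa):
    """Classify the Cc field of a DSA written as 'BBBB Ccxx' (steps 3 to 6)."""
    digits = "".join(dsa.split()).upper()
    # Only BBBB and Cc must be hexadecimal; the last two characters are not used here.
    if len(digits) != 8 or any(ch not in string.hexdigits for ch in digits[:6]):
        raise ValueError("expected a DSA such as '0204 1000'")

    bus_decimal = int(digits[:4], 16)      # step 3a: BBBB converted to decimal
    bridge, function = digits[4], digits[5]

    if bridge == "0" and function == "0":  # step 4: Cc is not greater than 00
        outcome = "check partition resources for non-reporting hardware"
        bridge = function = None
    elif function == "F":                  # step 5: only the bridge number is known
        outcome = "treat every slot under this multi-adapter bridge as a FRU"
        function = None
    else:                                  # step 6: bridge and function are known
        outcome = "look up the card position"

    return {
        "bus_decimal": bus_decimal,
        "bridge": bridge,
        "function": function,
        "outcome": outcome,
    }


# Hypothetical DSA values, chosen only to exercise each branch.
for sample in ("0204 1000", "0204 2F00", "0204 0000"):
    print(sample, interpret_card_field(sample))
```

The sketch returns the branch as text rather than acting on it, because the follow-up work (checking partition resources, consulting System FRU locations) is manual.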
Verifying a high-speed link, system PCI bus, or a multi-adapter bridge repair
Use this procedure to verify a repair for the high-speed link, a system PCI bus, or for a multi-adapter
bridge.
Within this procedure, the terms "system" and "logical partition" are interchangeable when used
individually.
1. Perform this procedure from the logical partition you were in when you were sent to this procedure,
or from the management console if this error was worked from the management console.
2. If you previously powered off a system or logical partition, or an expansion unit during this service
action, then you need to power it off again.
3. Install all cards, cables, and hardware, ensuring that all connections are tight. You can use the system
configuration list to verify that the cards are installed correctly.
4. Power on any expansion unit, logical partition or system unit that was powered off during the
service action. Is one of the following true?
v If the system or a logical partition was powered off during the service action, does the IPL
complete successfully to the IPL or Install the System display?
v If an expansion unit was powered off during the service action, does the expansion unit power on
complete successfully?
v If any IOP or IOA card locations were powered off using concurrent maintenance during the
service action, do the slots power on successfully?
v If you exchanged a FRU that should appear as a resource or resources to the system, such as an
IOA, or I/O bridge, does the new FRU's resource appear in HSM as operational?
Yes: Continue with the next step.
No: Verify that you have followed the power off, remove and replace, and power-on
procedures correctly. When you are sure that you have followed the procedures correctly, then
exchange the next FRU in the list. If there are no more FRUs to exchange, then contact your
next level of support. This ends the procedure.
5. Does the system or logical partition have mirrored protection? Select Yes if you are not sure.
No: Continue with the next step.
Yes: From the Dedicated Service Tools (DST) display, select Work with disk units, and resume
mirrored protection for all units that have a suspended status.
6. Choose from the following options:
v If you are working from a partition, from the Start a Service Tool display, select Hardware service
manager and look for the I/O processors that have a failed or missing status.
v If you are working from a management console, look at the system unit properties.
a. Choose the I/O tab.
b. Look for IOAs or IOPs that have a failed or missing status.
Are all I/O processor cards operational?
Note: Ignore any IOPs that are listed with a status of not connected.
Yes: Go to step 10 on page 5.
No: Display the logical hardware resource information for the non-operational I/O processors.
For all I/O processors and I/O adapters that are failing, record the bus number (BBBB), board
(bb), and card information (Cc). Continue with the next step.
7. Perform the following steps:
a. Return to the Dedicated Service Tools (DST) display.
b. Display the Product Activity Log.
c. Select All logs and search for an entry with the same bus, board, and card address information
as the non-operational I/O processor. Do not include informational or statistical entries in your
search. Use only entries that occurred during the last IPL.
Did you find an entry for the SRC that sent you to this procedure?
No: Continue with the next step.
Yes: Ask your next level of support for assistance. This ends the procedure.
8. Did you find a B600 6944 SRC that occurred during the last IPL?
Yes: Continue with the next step.
No: A different SRC is associated with the non-operational I/O processor. Go to the Start of call
procedure and look up the new SRC to correct the problem. This ends the procedure.
9. Is there a B600 xxxx SRC that occurred during the last IPL other than the B600 6944 and
informational SRCs?
Yes: Use the other B600 xxxx SRC to determine the problem. Go to the Start of call and look up
the new SRC to correct the problem. This ends the procedure.
No: You connected an I/O processor in the wrong card position. Use the system configuration list
to compare the cards. When you have corrected the configuration, go to the start of this
procedure to verify the bus repair. This ends the procedure.
10. If in a partition, use the hardware service manager function to print the system configuration list.
Are there any configuration mismatches?
No: Continue with the next step.
Yes: Ask your next level of support for assistance. This ends the procedure.
11. You have verified the repair of the system bus.
a. If for this service action only an expansion unit was powered off or only the concurrent
maintenance function was used for an IOP or IOA, then continue with the next step.
b. Otherwise, perform the following steps to return the system to the customer:
1) Power off the system or logical partition. See Powering on and powering off the system for
procedures on powering on or off your system.
2) Select the operating mode with which the customer was originally running.
3) Power on the system or logical partition.
12. If the system has logical partitions and the entry point SRC was B600 xxxx, then check for related
problems in other logical partitions that could have been caused by the failing part. This ends the
procedure.
Analyzing a 12X or PCI bus reference code
Use Word 7 of the reference code to determine the bus number, bus type, multi-adapter bridge number,
multi-adapter bridge function number, and logical card number from the direct select address (DSA).
Physical card slot labels and card positions for PCI buses are determined by using the DSA and the
appropriate system unit or I/O unit card positions. See “Card positions” on page 7 for details.
Table 1. 12X and PCI reference code analysis
Word of the reference code | Control panel function | Panel function characters | Format | Description
1 | 11 | 1–8 | B600 uuuu or B700 uuuu | uuuu = unit reference code (69xx)
1 – extended reference code information | 11 | 9–16 | iiii | Frame ID of the failing resource
1 – extended reference code information | 11 | 17–24 | ffff | Frame location
1 – extended reference code information | 11 | 25–32 | bbbb | Board position
2 | 12 | 1–8 | MIGVEP62 or MIGVEP63 | See System Reference Code (SRC) Format Description.
3 | 12 | 9–16 | cccc cccc | Component reference code
4 | 12 | 17–24 | pppp pppp | Programming reference code
5 | 12 | 25–32 | qqqq qqqq | Program reference code high order qualifier
6 | 13 | 1–8 | qqqq qqqq | Program reference code low order qualifier
7 | 13 | 9–16 | BBBB Ccbb | See “DSA translation”
8 | 13 | 17–24 | TTTT MMMM | Type (TTTT) and model (MMMM) of the failing item (if not zero)
9 | 13 | 25–32 | uuuu uuuu | Unit address (if not zero)
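Table 1 places word 7 in characters 9–16 of control panel function 13. The Python sketch below shows one way to pull that word out of a recorded function 13 value. The function names and the sample string are hypothetical, and the sketch assumes the 32 characters were copied from the panel with or without spaces between groups.

```python
def split_panel_function(text):
    """Split one 32-character control panel function into four 8-character groups."""
    chars = "".join(text.split())  # drop any spaces used when copying the value
    if len(chars) != 32:
        raise ValueError("expected 32 characters, got %d" % len(chars))
    return [chars[i:i + 8] for i in range(0, 32, 8)]


def word7_dsa(function_13_text):
    """Return word 7 (characters 9-16 of function 13) formatted as 'BBBB Ccbb'."""
    groups = split_panel_function(function_13_text)
    word7 = groups[1]  # groups[0] is characters 1-8, groups[1] is characters 9-16
    return word7[:4] + " " + word7[4:]


# Hypothetical function 13 contents, used only to show the slicing.
sample_function_13 = "00000000 02041000 00000000 00000000"
print(word7_dsa(sample_function_13))  # 0204 1000
```

The same split applies to functions 11 and 12, which hold words 1 through 5 in the positions that Table 1 lists.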
DSA translation
The Direct Select Address (DSA) may be coded in word 7 of the reference code.
This DSA is either a PCI system bus number or a RIO/HSL/12X loop number, depending on the type of
error. With the following information, and the information in either the card position table (for PCI bus
numbers) or the information in the loop-number-to-NIC-port table (for RIO/HSL/12X loop numbers),
you can isolate a failing PCI bus or RIO/HSL/12X loop. Use the following instructions to translate the
DSA:
1. Separate the DSA into the bus number, multi-adapter bridge number, and multi-adapter bridge
function number. The DSA is of the form BBBB Ccxx, and separates into the following parts:
v BBBB = bus number
v C = multi-adapter bridge number
v c = multi-adapter bridge function number
v xx = not used
2. Is the bus number less than 0684?
Yes: The bus number is a PCI bus number in hexadecimal. Convert the number to decimal, and
then continue with the next step.
No: The bus number is a RIO/HSL/12X loop number in hexadecimal. Convert the number to
decimal, and then go to step 4.
3. Use one of the following guides to determine the type of system unit or expansion unit in which the
bus is located:
v If you are using a management console interface, view the managed system's properties on the
management console.
v If you are using AIX or Linux, use the command line interface to determine the enclosure type. On
the command line, type the following:
lshwres -r io --rsubtype bus
The result will be in the form:
unit_phys_loc=Uxxxx.yyy.zzzzzzz,bus_id=a,
......
Find the bus ID "a" entry that matches the decimal bus number you determined in step 2. Using the
corresponding Uxxxx value, look up the unit model or enclosure type using the Unit Type and
Locations table in System FRU locations.
4. Perform one of the following:
v If you are working with a PCI bus number, see “Card positions” on page 7 to search for the bus
number, the multi-adapter bridge number, and the multi-adapter bridge function number that
matches the system unit or expansion unit type where the bus is located. This ends the procedure.
v If you are working with a RIO/HSL/12X loop number, see “Converting the loop number to 12X
port location labels” on page 13 to determine the starting ports for the RIO/HSL/12X loop with the
failed link. This ends the procedure.
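The translation steps above can be sketched in Python as follows. This is a hedged illustration, not IBM tooling: the function names, the sample unit location `U0000.001.ABC1234`, and the assumption that each output line is a plain comma-separated list of `name=value` pairs are all made up for this example. Real command output might quote values or include more fields.

```python
PCI_BUS_LIMIT = 0x0684  # step 2: bus numbers below this value are PCI bus numbers


def classify_bus_number(dsa):
    """Return (kind, decimal bus number) for a DSA written as 'BBBB Ccxx'."""
    digits = "".join(dsa.split())
    bus = int(digits[:4], 16)  # BBBB is hexadecimal
    kind = "PCI bus" if bus < PCI_BUS_LIMIT else "RIO/HSL/12X loop"
    return kind, bus


def find_unit_for_bus(lshwres_output, bus_decimal):
    """Return the unit_phys_loc whose bus_id matches the decimal bus number."""
    for line in lshwres_output.splitlines():
        # Assumed format: name=value pairs separated by commas, one bus per line.
        pairs = [item.split("=", 1) for item in line.strip().split(",") if "=" in item]
        fields = dict(pairs)
        if fields.get("bus_id") == str(bus_decimal):
            return fields.get("unit_phys_loc")
    return None


# Hypothetical output lines in the form described in step 3.
sample_output = (
    "unit_phys_loc=U0000.001.ABC1234,bus_id=516,\n"
    "unit_phys_loc=U0000.001.ABC1234,bus_id=517,\n"
)

kind, bus = classify_bus_number("0204 1000")
print(kind, bus)                              # PCI bus 516
print(find_unit_for_bus(sample_output, bus))  # U0000.001.ABC1234
```

The Uxxxx portion of the returned location is what you look up in the Unit Type and Locations table. The sketch does not attempt that lookup.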
Card positions
The following information correlates PCI bus numbers to PCI card location codes for the listed machine
types and models.
PCI bus numbers in system units are assigned as indicated in the tables below. PCI bus numbers in
expansion units are assigned by Licensed Internal Code or firmware as the busses are discovered.
Card positions for model 8202-E4B or 8205-E6B
Card positions for model 8202-E4C, 8202-E4D, 8205-E6C, or 8205-E6D
Card positions for model 8231-E2B
Card positions for model 8231-E1C, 8231-E1D, 8231-E2C, 8231-E2D, or 8268-E1D
Card positions for model 8233-E8B or 8236-E8C
Card positions for model 8248-L4T, 8408-E8D, or 9109-RMD
Card positions for model 9117-MMB or 9179-MHB
Card positions for model 8412-EAD, 9117-MMC, 9117-MMD, 9179-MHC, or 9179-MHD
Card positions for model 9125-F2C
Table 2. Card positions for the 8202-E4B or 8205-E6B
Bus number in DSA (hexadecimal/decimal) | Item designated by the DSA | Location
200/512 | PCI-X embedded storage I/O adapter | Un-P1
201/513 | PCI-X embedded USB controller | Un-P1
202/514 | PCI-X storage I/O adapter | Un-P1-C19
204/516 | PCIe IOA card | Un-P1-C4
205/517 | PCIe IOA card | Un-P1-C5
206/518 | PCIe IOA card | Un-P1-C6
207/519 | PCIe IOA card | Un-P1-C7
208/520 | PCIe IOA card | Un-P1-C1-C2
209/521 | PCIe IOA card | Un-P1-C1-C4
20A/522 | PCIe IOA card | Un-P1-C1-C3
20B/523 | PCIe IOA card | Un-P1-C1-C1
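A card position table of this kind maps naturally onto a lookup keyed by bus number. The Python sketch below encodes the rows of Table 2 for the 8202-E4B or 8205-E6B. The dictionary name and the `card_location` helper are hypothetical names introduced for illustration. Writing the keys as hexadecimal literals also gives a quick check on the hexadecimal/decimal pairs in the first column.

```python
# Rows of Table 2, keyed by the bus number from the DSA (BBBB).
CARD_POSITIONS_8202_E4B = {
    0x200: ("PCI-X embedded storage I/O adapter", "Un-P1"),
    0x201: ("PCI-X embedded USB controller", "Un-P1"),
    0x202: ("PCI-X storage I/O adapter", "Un-P1-C19"),
    0x204: ("PCIe IOA card", "Un-P1-C4"),
    0x205: ("PCIe IOA card", "Un-P1-C5"),
    0x206: ("PCIe IOA card", "Un-P1-C6"),
    0x207: ("PCIe IOA card", "Un-P1-C7"),
    0x208: ("PCIe IOA card", "Un-P1-C1-C2"),
    0x209: ("PCIe IOA card", "Un-P1-C1-C4"),
    0x20A: ("PCIe IOA card", "Un-P1-C1-C3"),
    0x20B: ("PCIe IOA card", "Un-P1-C1-C1"),
}


def card_location(bus_hex):
    """Return (item, location) for a hexadecimal bus number, or None if not listed."""
    return CARD_POSITIONS_8202_E4B.get(int(bus_hex, 16))


print(card_location("0204"))  # ('PCIe IOA card', 'Un-P1-C4')
print(0x20B)                  # 523, matching the 20B/523 row
```

A bus number that is not in the table (for example 203 on these models) returns `None`, which signals that the table for a different model or an expansion unit should be consulted.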
Table 3. Card positions for the 8202-E4C, 8202-E4D, 8205-E6C, or 8205-E6D
Bus number in DSA (hexadecimal/decimal) | Item designated by the DSA | Location
A/10 | PCIe embedded storage I/O adapter | Un-P1
A/10 | Cache battery card | Un-P1-C14
A/10 | Battery on cache battery card | Un-P1-C14-E1
B/11 | PCIe embedded USB controller | Un-P1
C/12 | RAID storage controller or RAID and cache storage controller | Un-P1-C19
C/12 | Battery on RAID and cache storage controller | Un-P1-C19-E1
D/13 | PCIe IOA card | Un-P1-C7
201/513 | PCIe IOA card | Un-P1-C2
202/514 | PCIe IOA card | Un-P1-C3
203/515 | PCIe IOA card | Un-P1-C4
204/516 | PCIe IOA card | Un-P1-C5
205/517 | PCIe IOA card | Un-P1-C6
208/520 | PCIe IOA card | Un-P1-C1-C1
209/521 | PCIe IOA card | Un-P1-C1-C2
20A/522 | PCIe IOA card | Un-P1-C1-C3
20B/523 | PCIe IOA card | Un-P1-C1-C4
Table 4. Card positions for the 8231-E2B
Bus number in DSA (hexadecimal/decimal) | Item designated by the DSA | Location
200/512 | PCI-X embedded storage I/O adapter | Un-P1
201/513 | PCI-X embedded USB controller | Un-P1
202/514 | PCI-X storage I/O adapter | Un-P1-C18
203/515 | PCIe IOA card | Un-P1-C3
204/516 | PCIe IOA card | Un-P1-C4
205/517 | PCIe IOA card | Un-P1-C5
206/518 | PCIe IOA card | Un-P1-C6
Table 5. Card positions for the 8231-E1C, 8231-E1D, 8231-E2C, 8231-E2D, or 8268-E1D
Bus number in DSA
(hexadecimal/decimal)
Item designated by the DSA
Location
A/10
v PCIe embedded storage I/O adapter
v Cache battery card
v Battery on cache battery card
v Un-P1
v Un-P1-C13
v Un-P1-C13-E1
B/11
PCIe embedded USB controller
Un-P1
C/12
v RAID and cache storage controller
v Battery on RAID and cache storage controller
v Un-P1-C18
v Un-P1-C18-E1
D/13
PCIe IOA card
Un-P1-C7
201/513
PCIe IOA card
Un-P1-C2
202/514
PCIe IOA card
Un-P1-C3
203/515
PCIe IOA card
Un-P1-C4
204/516
PCIe IOA card
Un-P1-C5
205/517
PCIe IOA card
Un-P1-C6
Table 6. Card positions for the 8233-E8B or 8236-E8C
Bus number in DSA
(hexadecimal/decimal)
Item designated by the DSA
Location
200/512
PCI-X IOA card
Un-P1-C4
201/513
PCI-X IOA card
Un-P1-C5
202/514
PCI-X embedded storage I/O adapter
Un-P1
203/515
PCI-X embedded USB controller
Un-P1
204/516
PCIe IOA card
Un-P1-C1
205/517
PCIe IOA card
Un-P1-C2
206/518
PCIe RAID enablement and auxiliary write cache card
Un-P1-C10
207/519
PCIe IOA card
Un-P1-C3
Table 7. Card positions for the 8248-L4T, 8408-E8D, or 9109-RMD
Bus number in DSA
(hexadecimal/decimal)
Item DSA points to
Location
200/512
PCIe embedded SATA media device controller for the DVD at location Un-P2-C9-D7
Un-P2
201/513
PCI embedded serial controller for location Un-P2-C8-T7
Un-P2
202/514
PCIe IOA card
Un-P2-C4
203/515
PCIe IOA card
Un-P2-C3
204/516
PCIe IOA card
Un-P2-C2
205/517
PCIe IOA card
Un-P2-C1
208/520
PCIe embedded storage I/O adapter
Un-P2-C9
Note: The cache battery is at location Un-P2-C9-C1-E1
209/521
PCIe embedded storage I/O adapter
Un-P2-C9
Note: The cache battery is at location Un-P2-C9-C1-E2
20A/522
PCIe embedded Ethernet controller
Un-P2-C8
20B/523
Embedded USB controller for port locations Un-P2-C8-T5 and Un-P2-C8-T6
Un-P2-C8
20C/524
PCIe IOA card
Un-P2-C6
20D/525
PCIe IOA card
Un-P2-C5
Table 8. Card positions for the 9117-MMB or 9179-MHB
Bus number in DSA
(hexadecimal/decimal)
Item DSA points to
Location
200/512 (node 1), 240/576 (node 2), 280/640 (node 3), 2C0/704 (node 4)
PCI embedded USB controller
Un-P2
Note: The USB ports are Un-P2-C8-T5 and Un-P2-C8-T6 but the controller is embedded in Un-P2.
204/516 (node 1), 244/580 (node 2), 284/644 (node 3), 2C4/708 (node 4)
PCIe IOA card
Un-P2-C1
205/517 (node 1), 245/581 (node 2), 285/645 (node 3), 2C5/709 (node 4)
PCIe IOA card
Un-P2-C2
206/518 (node 1), 246/582 (node 2), 286/646 (node 3), 2C6/710 (node 4)
PCIe IOA card
Un-P2-C3
207/519 (node 1), 247/583 (node 2), 287/647 (node 3), 2C7/711 (node 4)
PCIe IOA card
Un-P2-C4
208/520 (node 1), 248/584 (node 2), 288/648 (node 3), 2C8/712 (node 4)
PCI-X embedded SATA media device controller
Un-P2
Note: The device slot is Un-P2-C9 but the device controller is embedded in Un-P2.
209/521 (node 1), 249/585 (node 2), 289/649 (node 3), 2C9/713 (node 4)
PCI embedded serial controller
Un-P2
Note: The serial port is in Un-P2-C8-T7 but the controller is embedded in Un-P2.
20C/524 (node 1), 24C/588 (node 2), 28C/652 (node 3), 2CC/716 (node 4)
PCIe IOA card
Un-P2-C5
20D/525 (node 1), 24D/589 (node 2), 28D/653 (node 3), 2CD/717 (node 4)
PCIe IOA card
Un-P2-C6
20E/526 (node 1), 24E/590 (node 2), 28E/654 (node 3), 2CE/718 (node 4)
PCIe embedded storage I/O adapter
Un-P2-C9
20F/527 (node 1), 24F/591 (node 2), 28F/655 (node 3), 2CF/719 (node 4)
PCIe embedded storage I/O adapter
Un-P2-C9
Table 9. Card positions for the 8412-EAD, 9117-MMC, 9117-MMD, 9179-MHC, or 9179-MHD
Bus number in DSA
(hexadecimal/decimal)
Item DSA points to
Location
200/512 (node 1), 240/576 (node 2), 280/640 (node 3), 2C0/704 (node 4)
PCIe embedded SATA media device controller
Un-P2
Note: The device slot is Un-P2-C9 but the device controller is embedded in Un-P2.
201/513 (node 1), 241/577 (node 2), 281/641 (node 3), 2C1/705 (node 4)
PCI embedded serial controller
Un-P2
Note: The serial port is in Un-P2-C8-T7 but the controller is embedded in Un-P2.
202/514 (node 1), 242/578 (node 2), 282/642 (node 3), 2C2/706 (node 4)
PCIe IOA card
Un-P2-C4
203/515 (node 1), 243/579 (node 2), 283/643 (node 3), 2C3/707 (node 4)
PCIe IOA card
Un-P2-C3
204/516 (node 1), 244/580 (node 2), 284/644 (node 3), 2C4/708 (node 4)
PCIe IOA card
Un-P2-C2
205/517 (node 1), 245/581 (node 2), 285/645 (node 3), 2C5/709 (node 4)
PCIe IOA card
Un-P2-C1
208/520 (node 1), 248/584 (node 2), 288/648 (node 3), 2C8/712 (node 4)
PCIe embedded storage I/O adapter
Un-P2-C9
209/521 (node 1), 249/585 (node 2), 289/649 (node 3), 2C9/713 (node 4)
PCIe embedded storage I/O adapter
Un-P2-C9
20A/522 (node 1), 24A/586 (node 2), 28A/650 (node 3), 2CA/714 (node 4)
PCI embedded USB controller
Un-P2
Note: The USB ports are Un-P2-C8-T5 and Un-P2-C8-T6 but the controller is embedded in Un-P2.
20B/523 (node 1), 24B/587 (node 2), 28B/651 (node 3), 2CB/715 (node 4)
PCIe embedded Ethernet controller
Un-P2
20C/524 (node 1), 24C/588 (node 2), 28C/652 (node 3), 2CC/716 (node 4)
PCIe IOA card
Un-P2-C6
20D/525 (node 1), 24D/589 (node 2), 28D/653 (node 3), 2CD/717 (node 4)
PCIe IOA card
Un-P2-C5
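In the multi-node tables above, the bus numbers listed for nodes 2 through 4 appear to be the node 1 value plus a multiple of hexadecimal 40 (for example 205, 245, 285, and 2C5). The following is a minimal sketch of that arithmetic, assuming the spacing holds for every row. The function name and constants are illustrative and are not part of any service tool.

```python
# Sketch only: the 0x40 spacing between nodes is inferred from the rows of
# the multi-node card position tables; it is not stated as a rule there.
FIRST_BUS = 0x200    # lowest bus number listed for node 1
NODE_STRIDE = 0x40   # observed spacing between the same bus on adjacent nodes
MAX_NODES = 4


def split_bus_number(bus: int) -> tuple[int, int]:
    """Return (node, equivalent node 1 bus number) for a DSA bus number."""
    offset = bus - FIRST_BUS
    if not 0 <= offset < MAX_NODES * NODE_STRIDE:
        raise ValueError(f"bus number {bus:X} is outside the range in the tables")
    node = offset // NODE_STRIDE + 1
    return node, FIRST_BUS + offset % NODE_STRIDE


# Example: bus number 2C5 taken from a DSA.
node, base = split_bus_number(int("2C5", 16))
print(f"node {node}, see the row that starts with {base:X}/{base}")
```

The sketch does not check that the resulting row exists for a given model, so the result still has to be confirmed against the table for the system being serviced.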
Table 10. Card positions for the 9125-F2C
Bus number in DSA
(hexadecimal/decimal)
Item DSA points to
Location
200/512
PCIe IOA card
Un-P1-C16
201/513
PCIe IOA card
Un-P1-C15
202/514
PCIe IOA card
Un-P1-C17
208/520
PCIe IOA card
Un-P1-C14
209/521
PCIe IOA card
Un-P1-C13
210/528
PCIe IOA card
Un-P1-C12
211/529
PCIe IOA card
Un-P1-C11
218/536
PCIe IOA card
Un-P1-C10
219/537
PCIe IOA card
Un-P1-C9
220/544
PCIe IOA card
Un-P1-C8
221/545
PCIe IOA card
Un-P1-C7
228/552
PCIe IOA card
Un-P1-C6
229/553
PCIe IOA card
Un-P1-C5
230/560
PCIe IOA card
Un-P1-C4
231/561
PCIe IOA card
Un-P1-C3
238/568
PCIe IOA card
Un-P1-C2
239/569
PCIe IOA card
Un-P1-C1
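The card position tables key each row by the bus number written as hexadecimal/decimal. As a rough illustration, the sketch below formats a bus number in that style and looks up the location for the 9125-F2C. The dictionary is a hand transcription of Table 10, and the names `CARD_POSITIONS_9125_F2C`, `table_key`, and `locate` are hypothetical helpers, not part of any IBM tool.

```python
# Sketch only: hand transcription of Table 10 (9125-F2C). Every row in that
# table is described as a PCIe IOA card, so only the location is stored.
CARD_POSITIONS_9125_F2C = {
    0x200: "Un-P1-C16", 0x201: "Un-P1-C15", 0x202: "Un-P1-C17",
    0x208: "Un-P1-C14", 0x209: "Un-P1-C13", 0x210: "Un-P1-C12",
    0x211: "Un-P1-C11", 0x218: "Un-P1-C10", 0x219: "Un-P1-C9",
    0x220: "Un-P1-C8", 0x221: "Un-P1-C7", 0x228: "Un-P1-C6",
    0x229: "Un-P1-C5", 0x230: "Un-P1-C4", 0x231: "Un-P1-C3",
    0x238: "Un-P1-C2", 0x239: "Un-P1-C1",
}


def table_key(bus: int) -> str:
    """Format a bus number the way the tables print it, for example 20A/522."""
    return f"{bus:X}/{bus}"


def locate(bus_hex: str) -> str:
    """Look up the location for a hexadecimal bus number string."""
    bus = int(bus_hex, 16)
    try:
        return CARD_POSITIONS_9125_F2C[bus]
    except KeyError:
        raise KeyError(f"{table_key(bus)} is not listed in Table 10") from None


print(table_key(0x229), locate("229"))
```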
Converting the loop number to 12X port location labels
Use this table to convert the 12X loop number to port location labels.
Select the system you are servicing from the following list:
v 8202-E4B or 8205-E6B
v 8202-E4C, 8202-E4D, 8205-E6C, or 8205-E6D
v 8231-E2B
v 8231-E1C, 8231-E1D, 8231-E2C, 8231-E2D, or 8268-E1D
v 8233-E8B or 8236-E8C
v 8248-L4T, 8408-E8D, or 9109-RMD
v 9117-MMB or 9179-MHB
v 8412-EAD, 9117-MMC, 9117-MMD, 9179-MHC, or 9179-MHD
v 9119-FHB
8202-E4B or 8205-E6B
Table 11. Converting the loop number to port location labels for the 8202-E4B or 8205-E6B
Loop number (hex/dec)
FRU position
12X port labels on system unit
0780/1920
Un-P1
Internal
0781/1921
Un-P1-C2
Un-P1-C2-T1 (top)
Un-P1-C2-T2 (bottom)
0782/1922
Un-P1-C8
Un-P1-C8-T1 (top)
Un-P1-C8-T2 (bottom)
8202-E4C, 8202-E4D, 8205-E6C, or 8205-E6D
Table 12. Converting the loop number to port location labels for the 8202-E4C, 8202-E4D, 8205-E6C, or 8205-E6D
Loop number (hex/dec)
FRU position
12X port labels on system unit
0780/1920
Un-P1
Internal
0781/1921
Un-P1-C1
Un-P1-C1-T1 (top)
Un-P1-C1-T2 (bottom)
0782/1922
Un-P1-C8
Un-P1-C8-T1 (top)
Un-P1-C8-T2 (bottom)
8231-E2B
Table 13. Converting the loop number to port location labels for the 8231-E2B
Loop number (hex/dec)
FRU position
12X port labels on system unit
0780/1920
Un-P1
Internal
0781/1921
Un-P1-C1
Un-P1-C1-T1 (top)
Un-P1-C1-T2 (bottom)
0782/1922
Un-P1-C7
Un-P1-C7-T1 (top)
Un-P1-C7-T2 (bottom)
8231-E1C, 8231-E1D, 8231-E2C, 8231-E2D, or 8268-E1D
Table 14. Converting the loop number to port location labels for the 8231-E1C, 8231-E1D, 8231-E2C, 8231-E2D, or
8268-E1D
Loop number (hex/dec)
FRU position
12X port labels on system unit
0780/1920
Un-P1
Internal
0782/1922
Un-P1-C8
Un-P1-C8-T1
Un-P1-C8-T2
8233-E8B or 8236-E8C
Table 15. Converting the loop number to port location labels for the 8233-E8B or 8236-E8C
Loop number (hex/dec)
FRU position
12X port labels on system unit
0780/1920
Un-P1
Internal
0781/1921
Un-P1-C8
Un-P1-C8-T1 (top)
Un-P1-C8-T2 (bottom)
0782/1922
Un-P1-C7
Un-P1-C7-T1 (top)
Un-P1-C7-T2 (bottom)
8248-L4T, 8408-E8D, or 9109-RMD
Table 16. Converting the loop number to port location labels for the 8248-L4T, 8408-E8D, or 9109-RMD
Loop number (hex/dec)
FRU position
12X port labels on system unit
0781/1921
Un-P2
Internal
0783/1923
Un-P2
Internal
0785/1925
Un-P1-C2
Un-P1-C2-T1 (left)
Un-P1-C2-T2 (right)
0787/1927
Un-P1-C3
Un-P1-C3-T1 (left)
Un-P1-C3-T2 (right)
9117-MMB or 9179-MHB
Table 17. Converting the loop number to port location labels for the 9117-MMB or 9179-MHB
Loop number (hex/dec)
FRU position
12X port labels on system unit
0780/1920 (node 1), 0788/1928 (node 2), 0790/1936 (node 3), 0798/1944 (node 4)
Un-P2
Internal
0781/1921 (node 1), 0789/1929 (node 2), 0791/1937 (node 3), 0799/1945 (node 4)
Un-P2
Internal
0782/1922 (node 1), 078A/1930 (node 2), 0792/1938 (node 3), 079A/1946 (node 4)
Un-P1-C2
Un-P1-C2-T1 (left)
Un-P1-C2-T2 (right)
0783/1923 (node 1), 078B/1931 (node 2), 0793/1939 (node 3), 079B/1947 (node 4)
Un-P1-C3
Un-P1-C3-T1 (left)
Un-P1-C3-T2 (right)
8412-EAD, 9117-MMC, 9117-MMD, 9179-MHC, or 9179-MHD
Table 18. Converting the loop number to port location labels for the 8412-EAD, 9117-MMC, 9117-MMD, 9179-MHC,
or 9179-MHD
Loop number (hex/dec)
FRU position
12X port labels on system unit
0781/1921 (node 1), 0789/1929 (node 2), 0791/1937 (node 3), 0799/1945 (node 4)
Un-P2
Internal
0784/1924 (node 1), 078C/1932 (node 2), 0794/1940 (node 3), 079C/1948 (node 4)
Un-P2
Internal
0783/1923 (node 1), 078B/1931 (node 2), 0793/1939 (node 3), 079B/1947 (node 4)
Un-P1-C2
Un-P1-C2-T1 (left)
Un-P1-C2-T2 (right)
0786/1926 (node 1), 078E/1934 (node 2), 0796/1942 (node 3), 079E/1950 (node 4)
Un-P1-C3
Un-P1-C3-T1 (left)
Un-P1-C3-T2 (right)
9119-FHB
Table 19. Converting the loop number to port location labels for the 9119-FHB
Loop number (hex/dec)
FRU position
12X port labels on system unit
0780/1920
Un-P9-C44
Un-P9-C44-T1 (left)
Un-P9-C44-T2 (right)
0781/1921
Un-P7-C44
Un-P7-C44-T1 (left)
Un-P7-C44-T2 (right)
0782/1922
Un-P9-C41
Un-P9-C41-T1 (left)
Un-P9-C41-T2 (right)
0783/1923
Un-P7-C41
Un-P7-C41-T1 (left)
Un-P7-C41-T2 (right)
0784/1924
Un-P9-C39
Un-P9-C39-T1 (left)
Un-P9-C39-T2 (right)
0785/1925
Un-P7-C39
Un-P7-C39-T1 (left)
Un-P7-C39-T2 (right)
0786/1926
Un-P9-C40
Un-P9-C40-T1 (left)
Un-P9-C40-T2 (right)
0787/1927
Un-P7-C40
Un-P7-C40-T1 (left)
Un-P7-C40-T2 (right)
0788/1928
Un-P5-C44
Un-P5-C44-T1 (left)
Un-P5-C44-T2 (right)
0789/1929
Un-P8-C44
Un-P8-C44-T1 (left)
Un-P8-C44-T2 (right)
078A/1930
Un-P5-C41
Un-P5-C41-T1 (left)
Un-P5-C41-T2 (right)
078B/1931
Un-P8-C41
Un-P8-C41-T1 (left)
Un-P8-C41-T2 (right)
078C/1932
Un-P5-C39
Un-P5-C39-T1 (left)
Un-P5-C39-T2 (right)
078D/1933
Un-P8-C39
Un-P8-C39-T1 (left)
Un-P8-C39-T2 (right)
078E/1934
Un-P5-C40
Un-P5-C40-T1 (left)
Un-P5-C40-T2 (right)
078F/1935
Un-P8-C40
Un-P8-C40-T1 (left)
Un-P8-C40-T2 (right)
0790/1936
Un-P6-C44
Un-P6-C44-T1 (left)
Un-P6-C44-T2 (right)
0791/1937
Un-P3-C44
Un-P3-C44-T1 (left)
Un-P3-C44-T2 (right)
0792/1938
Un-P6-C41
Un-P6-C41-T1 (left)
Un-P6-C41-T2 (right)
0793/1939
Un-P3-C41
Un-P3-C41-T1 (left)
Un-P3-C41-T2 (right)
0794/1940
Un-P6-C39
Un-P6-C39-T1 (left)
Un-P6-C39-T2 (right)
0795/1941
Un-P3-C39
Un-P3-C39-T1 (left)
Un-P3-C39-T2 (right)
0796/1942
Un-P6-C40
Un-P6-C40-T1 (left)
Un-P6-C40-T2 (right)
0797/1943
Un-P3-C40
Un-P3-C40-T1 (left)
Un-P3-C40-T2 (right)
0798/1944
Un-P2-C44
Un-P2-C44-T1 (left)
Un-P2-C44-T2 (right)
0799/1945
Un-P4-C44
Un-P4-C44-T1 (left)
Un-P4-C44-T2 (right)
079A/1946
Un-P2-C41
Un-P2-C41-T1 (left)
Un-P2-C41-T2 (right)
079B/1947
Un-P4-C41
Un-P4-C41-T1 (left)
Un-P4-C41-T2 (right)
079C/1948
Un-P2-C39
Un-P2-C39-T1 (left)
Un-P2-C39-T2 (right)
079D/1949
Un-P4-C39
Un-P4-C39-T1 (left)
Un-P4-C39-T2 (right)
079E/1950
Un-P2-C40
Un-P2-C40-T1 (left)
Un-P2-C40-T2 (right)
079F/1951
Un-P4-C40
Un-P4-C40-T1 (left)
Un-P4-C40-T2 (right)
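The 9119-FHB rows above follow a regular pattern: each group of eight consecutive loop numbers alternates between two planars and steps through cards C44, C41, C39, and C40. The sketch below reproduces the table from that pattern, starting from the first four characters of word 7 of the reference code. The pattern is inferred from the table rows and is not documented as a rule, the function names are illustrative, and the word 7 value in the example is made up.

```python
# Sketch only: planar and card ordering inferred from the rows of Table 19.
# Always confirm the result against the table itself.
PLANAR_PAIRS = [("P9", "P7"), ("P5", "P8"), ("P6", "P3"), ("P2", "P4")]
CARDS = ["C44", "C41", "C39", "C40"]
FIRST_LOOP = 0x0780
LAST_LOOP = 0x079F


def loop_from_word7(word7: str) -> int:
    """Read the four leftmost hexadecimal characters of word 7 as the loop number."""
    return int(word7[:4], 16)


def port_labels_9119_fhb(loop: int) -> tuple[str, str, str]:
    """Return (FRU position, T1 label, T2 label) for a 9119-FHB loop number."""
    if not FIRST_LOOP <= loop <= LAST_LOOP:
        raise ValueError(f"loop {loop:04X}/{loop} is not listed in Table 19")
    index = loop - FIRST_LOOP
    planar = PLANAR_PAIRS[index // 8][index % 2]  # even/odd loops alternate planars
    card = CARDS[(index % 8) // 2]                # two loops per card position
    fru = f"Un-{planar}-{card}"
    return fru, f"{fru}-T1 (left)", f"{fru}-T2 (right)"


# Hypothetical word 7 value; only the first four characters are used.
loop = loop_from_word7("078B0000")
print(f"{loop:04X}/{loop}", port_labels_9119_fhb(loop))
```

The decimal value printed alongside the hexadecimal loop number is the form that the Service Action Log is described as using.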
HSL loop configuration and status form
Use this HSL loop configuration and status form to record the status of the HSL ports in the loop.
Note: You may copy this form as necessary.
HSL loop configuration and status worksheet for system _______________, Loop number ___________
Table 20. HSL loop configuration and status form
HSL resource information
Resource type
Resource name
Frame ID
Leading port information
Port number (or internal)
Link status (operational or failed)
Trailing port information
Port number (or internal)
Link status (operational or failed)
Installed features in a PCI bridge set form
Use this form to record the PCI bridge set card positions, and multi-adapter bridge function numbers.
Note: You might find it helpful to copy this form as necessary.
Table 21. Installed features in a PCI bridge set
Multi-adapter bridge function
number
PCI bridge set card positions
Record if IOP or IOA is installed.
0
1
2
3
4
5
6
7
RIO/HSL/12X link status diagnosis form
Use this form to record the status of the RIO/HSL/12X links.
Column A (starting status)
Resource
Port info
with
failing link
First
Column B
Column C
(column A
is failed and
column B is
failed)
Column D
Port status
Port status
Port status
____
Port _0 (or
internal)
Port _0 (or
internal)
Port _0 (or
internal)
Card Position
____
____
____
Port #
Port _1 (or
internal)
Port _1 (or
internal)
Port _1 (or
internal)
____
____
____
____
Frame ID
____
Column E
(column B is
failed and
column D is
failed)
Column A (starting status)
Resource
Port info
with
failing link
Second
Column B
Column C
(column A
is failed and
column B is
failed)
Column D
Port status
Port status
Port status
____
Port _0 (or
internal)
Port _0 (or
internal)
Port _0 (or
internal)
Card Position
____
____
____
Port #
Port _1 (or
internal)
Port _1 (or
internal)
Port _1 (or
internal)
____
____
____
____
Frame ID
____
Column E
(column B is
failed and
column D is
failed)
CONSL01
Use this procedure to exchange the I/O processor (IOP) for the system or partition console.
1. Is the system managed by a management console?
No: Go to step 6 on page 21.
Yes: The management console will be required for this procedure. Move to the management
console and continue with the next step only if the management console is functional.
2. Can the customer power off the partition at this time?
Yes: Power off the partition from the operating system console or the management console. Then,
continue with the next step.
No: The IOP controlling the partition's console may be controlling other critical resources. The
partition must be powered off to exchange this IOP. Perform this procedure when the customer is
able to power off the partition. Then, continue with the next step.
3. Perform the following steps to determine the unit machine type, model, and serial number where the
console IOP is located and the location of the console IOP:
a. Hardware Management Console (HMC): Select Systems Management. Double-click the partition
you are working on. Select the I/O tab. Record the location of the load source IOP. The unit type,
model, and serial number are the first three parts of the location code and are separated by
periods.
Systems Director Management Console (SDMC): In the content area, select the virtual server under
Resources. Click Actions > Properties. Record the location of the load source IOP. The unit type,
model, and serial number are the first three parts of the location code and are separated by
periods.
b. Continue with the next step.
4. Record the frame type or feature by using the frame ID and system configuration listing or by
locating the frame with that ID and recording the frame type or feature.
5. Perform the following steps to exchange the IOP in that card position:
a. Go to System FRU locations and select the unit type and model that you recorded.
b. Locate the card position in the FRU locations table and use the exchange procedure that is
identified.
c. Power on the partition.
This ends the procedure.
6. The problem is in the partition of a system with one or more partitions that is not managed by a
management console. Replace the load source IOP. This ends the procedure.
RIOIP01
Use this procedure to isolate a failure in a RIO/HSL/12X loop using service tools.
If you need additional information for failing part numbers, location codes, or removal and replacement
procedures, see Part locations and location codes. Select your machine type and model number to find
additional location codes, part numbers, or replacement procedures for your system.
Follow the steps in the “Main task” and you will be directed to the proper subtasks.
Note: During this procedure, you will be disconnecting and reconnecting cables. If errors concerning
missing resources (such as disk units and RIO/HSL/12X failures) occur, ignore them. Missing resources
will report in again when the loop reinitializes.
Main task
1. Were you sent here from a B600 xxxx reference code?
No: Continue with the next step.
Yes: Use the serviceable event view and the system service documentation to search for a B700
xxxx reference code with the same last four characters reported at approximately the same time. If
you find one, perform service on that reference code first, and when you close that problem, close
this one as well. If you do not find one, continue with the next step.
2. Before powering down any system unit or expansion unit, work with the customer to end all
subsystems in all of the partitions using each partition's console.
3. From the partition control panel, IPL the system or partition to Dedicated Service Tools (DST).
Attention: Do not use function 21.
4. Are all system and expansion units on the loop powered on?
Yes: Go to step 6.
No: Continue with the next step.
5. Perform the following steps:
a. Power on all system and expansion units on the loop. If a frame cannot be powered on, perform
the “Cannot power on unit” on page 25 subtask below, and then continue with step 6.
b. Was the RIO/HSL/12X link error cleared up when the frames were powered back on?
v No: Continue with the next step.
v Yes: Go to Verify a repair.
This ends the procedure.
6. Perform the following steps:
a. Access the Service Action Log (SAL) entry for this error; the field replaceable units (FRUs) should
be listed there. Look for part numbers and descriptions for the FRUs containing the RIO/HSL/12X
port for two frames. There should also be a FRU for the cable between them. The locations
information for the FRUs is the location of the failed ports on the failed link.
b. Record the loop number from the SAL (if it is displayed there in one of the FRU descriptions) or
from the first four characters of word 7 of the reference code. Go to “Converting the loop number
to 12X port location labels” on page 13 to determine which RIO/HSL/12X cables you are working
with on the system.
Is this information in the SAL?
Yes: Continue with the next step.
No: Perform “Manually detecting the failed link” on page 25 below, and then continue with the
next step of the main task.
7. Is the cable connecting the failed ports an optical cable?
No: Go to step 9.
Yes: Continue with the next step.
8. Perform the following steps:
a. Clean the RIO/HSL/12X cable connectors and ports using the fiber optic cleaning kit and the
fiber-optic cleaning procedures in "SY27-2604 Fiber Optic Cleaning Procedures".
b. To determine if cleaning the connectors and ports solved the problem, perform “Manually
detecting the failed link” on page 25 below and return to this point. Did the ports you were
working on have a status of "failed"?
No: Then the problem is fixed, go to Verify a repair. This ends the procedure.
Yes: Continue with the next step.
9. There are now three cases to consider. Continue with the appropriate subtask of this procedure:
v “The ports on both ends of the failed link are in different system units on the loop.”
v “The port on one end of the failed link is in a system unit and the port on the other end is in an
I/O unit” on page 23.
v “The ports on both ends of the failed link are in an I/O unit” on page 24.
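The choice among the three subtasks depends only on what kind of unit sits at each end of the failed link. A minimal sketch of that selection follows; the `UnitKind` enum and `choose_subtask` function are hypothetical names used for illustration, and the returned strings are the subtask titles from this procedure.

```python
from enum import Enum


class UnitKind(Enum):
    """Kind of unit that holds one port of the failed link (illustrative type)."""
    SYSTEM = "system unit"
    IO = "I/O unit"


def choose_subtask(end_a: UnitKind, end_b: UnitKind) -> str:
    """Return the title of the subtask that matches the two ends of the failed link."""
    kinds = {end_a, end_b}
    if kinds == {UnitKind.SYSTEM}:
        return "The ports on both ends of the failed link are in different system units on the loop"
    if kinds == {UnitKind.IO}:
        return "The ports on both ends of the failed link are in an I/O unit"
    # One end is a system unit and the other is an I/O unit, in either order.
    return ("The port on one end of the failed link is in a system unit and "
            "the port on the other end is in an I/O unit")


print(choose_subtask(UnitKind.IO, UnitKind.SYSTEM))
```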
The ports on both ends of the failed link are in different system units on the loop
1. There may be failed hardware that will report a different error on the other system units. Perform the
following steps:
a. Resolve any other RIO/HSL/12X problems in the serviceable event view on the other system
units.
b. Perform “Manually detecting the failed link” on page 25 below and return to this point. Did the
ports you were working on have a status of "failed"?
No: Then the problem is fixed, go to Verify a repair. This ends the procedure.
Yes: Continue with the next step.
2. Is the cable an optical RIO/HSL/12X cable?
Yes: Go to step 4.
No: Continue with the next step.
3. Perform the following steps:
a. Verify that the cables are connected securely. For any cable that was loose, disconnect the cable at
that end, wait 30 seconds, and reconnect the cable securely. If there are thumbscrews, you must
tighten both thumbscrews within 30 seconds of when the cable makes contact with the port.
b. If you disconnected and reconnected the cable at either end, perform “Manually detecting the
failed link” on page 25 below and return to this point. Did the ports you were working on have a
status of "failed"?
No: Then the problem is fixed, go to Verify a repair. This ends the procedure.
Yes: Continue with the next step.
4. Replace the cable between the two system unit ports on the failed link. To determine if replacing the
cable resolved the problem, perform “Manually detecting the failed link” on page 25 below and
return to this point. Did the ports you were working on have a status of "failed"?
No: Then the problem is fixed, go to Verify a repair. This ends the procedure.
Yes: Continue with the next step.
5. Exchange the FRU with the RIO/HSL/12X port in one of the system units. If you are working with a
serviceable event view and the RIO/HSL/12X FRUs are listed, exchange the FRU corresponding to
the first RIO/HSL/12X cable port listed. Otherwise, exchange the FRU that is quickest and easiest to
replace. To determine if replacing the FRU resolved the problem, perform “Manually detecting the
failed link” on page 25 below and return to this point. Did the ports you were working on have a
status of "failed"?
No: Then the problem is fixed, go to Verify a repair. This ends the procedure.
Yes: Continue with the next step.
6. Exchange the remaining FRU with the RIO/HSL/12X port on the other system unit. To determine if
replacing the FRU resolved the problem, perform “Manually detecting the failed link” on page 25
below and return to this point. Did the ports you were working on have a status of "failed"?
No: Then the problem is fixed, go to Verify a repair. This ends the procedure.
Yes: Continue with the next step.
7. Use the procedure HSL_LNK to determine if there are any additional RIO/HSL/12X cable-related
FRUs, such as interposer cards and internal ribbon cables, that may be on either unit. Did you
exchange any additional RIO/HSL/12X FRUs?
No: Call your next level of support for further instruction. This ends the procedure.
Yes: Continue with the next step.
8. To determine if replacing the FRU resolved the problem, perform “Manually detecting the failed link”
on page 25 below and return to this point. Did the ports you were working on have a status of
"failed"?
No: Then the problem is fixed, go to Verify a repair. This ends the procedure.
Yes: Call your next level of support for further instruction. This ends the procedure.
The port on one end of the failed link is in a system unit and the port on the other
end is in an I/O unit
1. Switch the two RIO/HSL/12X cables on the I/O unit with the failed port, so that each cable is
connected to the port where the other cable was previously connected. Disconnect both cables at the
same time, wait 30 seconds, and then reconnect the cables one at a time.
Attention: For copper cables with thumbscrews, you must fully connect the cable and tighten the
connector's screws within 30 seconds of when the cable makes contact with the port. Otherwise, the
link will fail and you must disconnect and reconnect again. Also, if the connector screws are not
tightened, errors will occur on the link and it will fail.
2. Refresh the port status for the first failing resource by performing “Refresh the port status” on page
26 below. Then continue with the next step.
3. Is the port on the system unit that was failed now working?
No: Continue with the next step.
Yes: Exchange the RIO/HSL/12X bridge FRU in the I/O unit. Go to Verify a repair. This ends the
procedure.
4. Switch the cables back to their original positions by disconnecting both cables at the same time,
waiting 30 seconds, and then reconnecting the cables one at a time.
Attention: For copper cables with thumbscrews, you must fully connect the cable and tighten the
connector's screws within 30 seconds of when the cable makes contact with the port. Otherwise, the
link will fail and you must disconnect and reconnect again. Also, if the connector screws are not
tightened, errors will occur on the link and it will fail.
5. Exchange the cable between the two ports on the failed link. To determine if replacing the cable
resolved the problem, perform “Manually detecting the failed link” on page 25 below and return to
this point. Did the ports you were working on have a status of "failed"?
No: Then the problem is fixed, go to Verify a repair. This ends the procedure.
Yes: Continue with the next step.
6. Use the procedure HSL_LNK to determine if there are any additional RIO/HSL/12X cable-related
FRUs, such as interposer cards and internal ribbon cables, that may be on either unit. Did you
exchange any additional RIO/HSL/12X FRUs?
No: Call your next level of support for further instruction. This ends the procedure.
Yes: Continue with the next step.
7. Exchange the RIO/HSL/12X FRU that contains the failing port in the system unit. To determine if
replacing the FRU resolved the problem, perform “Manually detecting the failed link” on page 25
below and return to this point. Did the ports you were working on have a status of "failed"?
No: Then the problem is fixed, go to Verify a repair. This ends the procedure.
Yes: Continue with the next step.
8. To determine if replacing the FRU resolved the problem, perform “Manually detecting the failed link”
on page 25 below and return to this point. Did the ports you were working on have a status of
"failed"?
No: Then the problem is fixed, go to Verify a repair. This ends the procedure.
Yes: Call your next level of support for further instruction. This ends the procedure.
The ports on both ends of the failed link are in an I/O unit
1. Switch the two RIO/HSL/12X cables on the first (or "From") cable's I/O unit with the failed port so
that each cable is connected to the port where the other cable was previously connected.
Attention: For copper cables with thumbscrews, you must fully connect the cable and tighten the
connector's screws within 30 seconds of when the cable makes contact with the port. Otherwise, the
link will fail and you must disconnect and reconnect again. Also, if the connector screws are not
tightened, errors will occur on the link and it will fail.
2. Refresh the port status for the first failing resource by performing “Refresh the port status” on page
26 below. Then continue with the next step.
3. Is the port on the I/O unit on which you did not switch the cables now working?
No: Go to step 5.
Yes: Exchange the RIO/HSL/12X I/O bridge card in the I/O unit where you just switched the
cables. Then continue with the next step.
4. To determine if replacing the FRU resolved the problem, perform “Manually detecting the failed
link” on page 25 and return to this point. Did the ports you were working on have a status of
"failed"?
No: Then the problem is fixed, go to Verify a repair. This ends the procedure.
Yes: Continue with the next step.
5. Switch the cables back to their original positions.
Attention: For copper cables with thumbscrews, you must fully connect the cable and tighten the
connector's screws within 30 seconds of when the cable makes contact with the port. Otherwise, the
link will fail and you must disconnect and reconnect again. Also, if the connector screws are not
tightened, errors will occur on the link and it will fail.
6. Switch the two RIO/HSL/12X cables on the second (or "To") I/O unit with the failed port so that
each cable is connected to the port where the other cable was previously connected.
Attention: For copper cables with thumbscrews, you must fully connect the cable and tighten the
connector's screws within 30 seconds of when the cable makes contact with the port. Otherwise, the
link will fail and you must disconnect and reconnect again. Also, if the connector screws are not
tightened, errors will occur on the link and it will fail.
7. Refresh the port status for the first failing resource by performing “Refresh the port status” on page
26. Then continue with the next step.
8. Is the port on the I/O unit on which you did not switch cables now working?
No: Go to step 10 on page 25.
Yes: Exchange the RIO/HSL/12X I/O bridge card in the I/O unit where you just switched the
cables. Then continue with the next step.
9. To determine if replacing the FRU resolved the problem, perform “Manually detecting the failed
link” on page 25 below and return to this point. Did the ports you were working on have a status of
"failed"?
No: Then the problem is fixed, go to Verify a repair. This ends the procedure.
Yes: Continue with the next step.
10. Switch the cables back to their original positions.
Attention: For copper cables with thumbscrews, you must fully connect the cable and tighten the
connector's screws within 30 seconds of when the cable makes contact with the port. Otherwise, the
link will fail and you must disconnect and reconnect again. Also, if the connector screws are not
tightened, errors will occur on the link and it will fail.
11. Exchange the RIO/HSL/12X cable between the two ports on the failed link. To determine if
replacing the cable resolved the problem, perform “Manually detecting the failed link,” then return
to this point.
Did the ports you were working on have a status of "failed"?
No: Then the problem is fixed, go to Verify a repair. This ends the procedure.
Yes: Continue with the next step.
12. Use the procedure HSL_LNK to determine if there are any additional RIO/HSL/12X cable-related
FRUs, such as interposer cards and internal ribbon cables, that may be on either unit. Did you
exchange any additional RIO/HSL/12X FRUs?
No: Call your next level of support for further instruction. This ends the procedure.
Yes: Continue with the next step.
13. To determine if replacing the FRU resolved the problem, perform “Manually detecting the failed
link” below and return to this point. Did the ports you were working on have a status of "failed"?
No: The problem is fixed. Go to Verify a repair. This ends the procedure.
Yes: Call your next level of support for further instruction. This ends the procedure.
Cannot power on unit
1. Work the errors related to powering on the units, and then continue with the next step. If a unit still
cannot be powered on, re-cable the RIO/HSL/12X loop without the I/O units and system units that
cannot be powered on, allowing the loop to be complete (no disconnected cables).
2. To determine if re-cabling the loop resolved the problem, perform “Manually detecting the failed link”
below and return to this point.
Manually detecting the failed link
1. Get the loop number from the reference code if you do not already have it. The loop number is a
hexadecimal number in word 7 of the reference code.
v If you are working from the Product Activity Log (PAL), then the loop number is the 4 leftmost
characters of the DSA in word 7 (BBBB). The loop number is in hexadecimal. Convert the
hexadecimal loop number to decimal format before continuing with this procedure.
v If you are working from the Service Action Log (SAL), the loop number should be displayed in
the FRU description area in decimal format.
2. Sign on to SST or DST (if you have not already done so). Select Start a service tool > Hardware
service manager > Logical hardware resources > High-speed link (HSL) resources.
3. Select Resources associated with loop for the RIO/HSL/12X loop with the failed link. The
RIO/HSL/12X bridges will be displayed under the loop.
4. Select Display detail for the loop with the failed link.
5. Record the name of the NIC/RIO controller resource you are starting from on the display. You will
need to know this name to determine if you have followed the loop around and back to this
resource.
6. If the leading port does not have a status of "failed", select Follow leading port until a leading port
with a "failed" status is found, or the display is showing information for the starting NIC/RIO
resource you recorded. Did you find a leading port with a status of "failed"?
No: The loop is functioning properly. Return to the subtask that sent you here.
Yes: Record the resource name at the leading port with a "failed" status, and the type, model, and
serial number for the resource with the failed status. Continue with the next step.
7. Select Follow leading port one more time and note all the information for the resource name with a
failed trailing port.
8. Select Display system information and note the power controlling system's type, model, and serial
number (and name, if available). This information may be needed for FRU replacement at a later time.
9. Select Cancel twice to return to the previous screen.
10. Go to each resource name (found above) and select Associated packaging resources. This gives the
description of the failing item and the unit ID.
11. Select Display detail to find the part number and location associated with the possible failing item.
Then return to the step that sent you here.
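The walk around the loop in step 6 can be pictured with a minimal sketch. It assumes a hypothetical in-memory model of the loop; the Resource class, its fields, and the resource names are illustrative only and are not HSM data structures. The function follows leading ports from the starting NIC/RIO resource until it finds a leading port with a status of "failed" or arrives back at the starting resource.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Resource:
    """One resource on the loop (hypothetical model, not an HSM structure)."""
    name: str
    leading_port_status: str  # for example "operational" or "failed"
    next_name: str            # resource reached by "Follow leading port"


def find_failed_leading_port(resources: Dict[str, Resource], start: str) -> Optional[str]:
    """Return the first resource whose leading port is "failed".

    Returns None when the walk gets back to the starting resource
    without finding a failed leading port (the loop is complete).
    """
    current = resources[start]
    # Bound the walk so that inconsistent data cannot loop forever.
    for _ in range(len(resources)):
        if current.leading_port_status == "failed":
            return current.name
        current = resources[current.next_name]
        if current.name == start:
            return None
    return None


# Illustrative data only: the names and statuses are made up.
loop = {
    "NIC01": Resource("NIC01", "operational", "BR01"),
    "BR01": Resource("BR01", "failed", "BR02"),
    "BR02": Resource("BR02", "operational", "NIC01"),
}
failed_at = find_failed_leading_port(loop, "NIC01")
```

Recording the starting resource name (step 5) is what makes the stop condition possible: without it there is no way to tell that the whole loop has been examined.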
Refresh the port status
1. Wait one minute, and then sign on to SST or DST (if you have not already done so).
2. Select Start a service tool > Hardware service manager > Logical hardware resources > High-speed
link (HSL) resources.
3. Move the cursor to the RIO/HSL/12X loop that you want to examine and select Display detail >
Include non-reporting resources.
4. If the display is not already showing the ports for one of the units you are working on, then select
Follow leading port. Continue to select Follow leading port until the display is showing the ports for
one of the units you are working on. Note the status of the port you were working on. Select Follow
leading port until the display is showing the ports for the other unit you are working on, and note
the status of the port you were working on.
5. Select Cancel > Refresh > Display detail for the failing resource you are checking. Note any change
in the status for the resource. Then return to the step that sent you here.
RIOIP06
Use HSM to examine the RIO/HSL/12X Loop to determine if other systems are connected to the loop.
1. Sign on to SST or DST (if you have not already done so).
2. Select Start a service tool > Hardware service manager > Logical hardware resources > High-speed
link (HSL) resources.
3. Move the cursor to the RIO/HSL/12X loop that you want to examine, and select Resources
associated with loop.
4. Search for Remote RIO/HSL/12X NICs on the loop.
Are there any Remote RIO/HSL/12X NICs on the loop?
Yes: You have determined that there are other systems connected to this loop. This ends the
procedure.
No: You have determined that there are not any other systems connected to this loop. This ends
the procedure.
RIOIP08
Starting with the unit ID and RIO/HSL/12X port for one end of an RIO/HSL/12X cable, determine the
unit ID and port location for the other end.
1. Sign on to SST or to DST if you have not already done so.
2. Select Start a Service Tool > Hardware Service Manager > Logical Hardware Resources > High
Speed Link (HSL) Resources.
3. Move the cursor to the RIO/HSL/12X loop that you want to examine, and select Resources
associated with loop > Include non-reporting resources. The display that appears shows the loop
resource and all the "HSL I/O Bridge" and all the "Remote HSL NIC" resources connected to the loop.
4. Perform the following for each of the HSL I/O Bridge resources listed until you are directed to do
otherwise.
a. Move the cursor to the HSL I/O Bridge resource and select Associated packaging resources.
b. Compare the unit ID on the display with the unit ID (in hexadecimal format) that you started
with.
Are the unit IDs the same?
Yes: Continue with the next step.
No: Select Cancel to return to the Logical Hardware Associated with HSL Loops display.
Repeat this for each HSL I/O Bridge under the loop, until you are directed to do otherwise.
5. Perform the following steps:
a. Select Associated logical resources.
b. Move the cursor to the HSL I/O Bridge resource and select Display detail.
c. Examine the Leading port and Trailing port information. Search the display for the RIO/HSL/12X
port location label that you recorded prior to starting this procedure. If the label is part of the
information for the Leading port, then select Follow leading port. If the label is part of the
information for the Trailing port, then select Follow trailing port.
d. Perform the step below that matches the function you selected in the previous step:
v If you selected Follow leading port, then examine the display for the Trailing port information.
Record, on the worksheet that you are using, the RIO/HSL/12X port location label shown on
the "Trailing port from previous resource" line. Record this information as the "To HSL Port Label".
v If you selected Follow trailing port, then examine the display for the Leading port information.
Record, on the worksheet that you are using, the RIO/HSL/12X port location label on the
"Leading port to next resource" line. Record this information as the "To HSL Port Label".
e. Record the "Link type" (Copper or Optical) on the worksheet that you are using in the field
describing the cable type.
f. Select Cancel > Cancel > Cancel to return to the Logical Hardware Associated With HSL Loops
display.
g. Record the resource name on the display.
h. Move the cursor to the resource with the resource name you recorded in step 5g.
i. Select Associated packaging resources.
j. Record the unit ID.
k. Return to the procedure that sent you here. This ends the procedure.
RIOIP09
This procedure offers a description and service action for RIO/HSL/12X reference code B600 6982.
Note: A fiber-optic cleaning kit may be required for optical RIO/HSL/12X connections.
Note: This reference code can occur on an RIO/HSL/12X loop when an I/O expansion unit on the loop
is powered off for a concurrent maintenance action.
1. Is the reference code in the Service Action Log (SAL) or serviceable event view you are using?
Yes: There is a connection failure on an RIO/HSL/12X link. A B600 6984 reference code may also
appear in the Product Activity Log (PAL) or error log view you are using. Both reference codes are
reporting the same problem. Continue with the next step.
No: The reference code is only informational, and requires no service action. This ends the
procedure.
2. Multiple B600 6982 errors may occur due to retry and recovery activity. Is there a B600 6985 with
"xxxx 3206" in word 4 logged after all B600 6982 errors for the same RIO/HSL/12X loop in the PAL?
Yes: The recovery efforts were successful. Close all of the B600 6982 entries for the same loop in
the SAL. No service is required. This ends the procedure.
No: Continue with the next step.
3. Is there a B600 6987 reference code in the SAL, or serviceable event view you are using, logged at
about the same time?
Yes: Close this problem and work the B600 6987. This ends the procedure.
No: Continue with the next step.
4. Is there a B600 6981 reference code in the SAL, or serviceable event view you are using, logged at
approximately the same time?
Yes: Go to step 9.
No: Continue with the next step.
5. Perform “RIOIP06” on page 26 to determine if this loop connects to any other systems and then
return here.
Note: The loop number can be found in the SAL in the description for the HSL_LNK FRU.
Is this loop connected to other systems?
Yes: Continue with the next step.
No: Go to step 9.
6. Check for RIO/HSL/12X failures in the serviceable event views on the other systems. RIO/HSL/12X
failures are indicated by entries with RIO/HSL/12X I/O bridge and Network Interface Controller
(NIC) resources. Ignore B600 6982 and B600 6984 entries.
Are there RIO/HSL/12X failures on other systems?
Yes: Continue with the next step.
No: Go to step 9.
7. Repair the problems on the other systems and return to this step. After making repairs on the other
systems check the PAL of this system. Is there a B600 6985 reference code, with this loop's resource
name, that was logged after the repairs you made on the other systems?
Yes: Continue with the next step.
No: Go to step 9.
8. For the B600 6985 reference code you found, use SIRSTAT to determine if the loop is now complete.
Is the loop complete?
Yes: The problem has been resolved. Use “RIOIP01” on page 21 to verify that the loop is now
working properly. This ends the procedure.
No: Continue with the next step.
9. The FRU list displayed in the SAL, or serviceable event view you are using, may be different from the
failing item list given here. Use the FRU list in the serviceable event view if it is available.
Does the reference code appear in the serviceable event view with HSL_LNK or HSLxxxx listed as a
symbolic FRU?
Yes: Perform “RIOIP01” on page 21. This ends the procedure.
No: Exchange the FRUs in the serviceable event view according to their part action codes. This
ends the procedure.
RIOIP10
Use this procedure to determine if the 12X loop is complete (with both primary and redundant paths
functioning for each unit on the loop).
1. Is the system managed by a management console?
Yes: Continue with the next step.
No: Go to step 3 on page 29.
2. The 12X loop number found in the first 4 characters of word 7 of the SRC that sent you here is in
hexadecimal. Convert this number to decimal.
Hardware Management Console (HMC): From the HMC, expand Systems Management > Servers.
Select the server on which you are working, expand Hardware Information, and click View RIO-12X
Topology. Locate the decimal loop number's information.
Systems Director Management Console (SDMC): From the SDMC, select the server on the Resources
page. Click Actions > Hardware Information > View Hardware Topology. Click Actions, and select
Hardware Information to view the RIO-12X topology. Locate the decimal loop number's information.
Are all links in this loop operational?
Yes: The 12X loop recovered. Return to the procedure that sent you here. This ends the procedure.
No: The 12X loop did not recover. Return to the procedure that sent you here. This ends the
procedure.
3. Search in Advanced System Management Interface (ASMI) for a B700 6985 informational SRC logged
after the 12X SRC you are working on. If the system has an IBM i operating system partition, you can
also find informational logs in the product activity log. Compare the first half of word 7 in the B700
6985 informational log to the value that caused you to be sent to this procedure. Are the two values
the same?
Yes: Use the informational log and SIRSTAT to determine if the loop has recovered. This ends the
procedure.
No: The loop did not recover. Return to the procedure that sent you here. This ends the
procedure.
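The hexadecimal-to-decimal conversion in step 2 and the word 7 comparison in step 3 can be sketched as follows. This is an illustration only: the function names and the sample word 7 values are hypothetical, and the sketch assumes that word 7 is available as a string of hexadecimal characters whose first four characters are the loop number.

```python
def loop_number_from_word7(word7: str) -> int:
    """Return the decimal loop number from the first four hex characters of word 7."""
    cleaned = word7.replace(" ", "")
    if len(cleaned) < 4:
        raise ValueError("word 7 must contain at least four hexadecimal characters")
    return int(cleaned[:4], 16)  # raises ValueError if the characters are not hex


def same_loop(word7_a: str, word7_b: str) -> bool:
    """Compare the first half of word 7 from two log entries."""
    return loop_number_from_word7(word7_a) == loop_number_from_word7(word7_b)


# Hypothetical sample values, not taken from a real log.
decimal_loop = loop_number_from_word7("0101FFFF")   # 0x0101 is 257 in decimal
matches = same_loop("0101FFFF", "0101 0000")
```

The decimal value is the one to look for in the RIO-12X topology view on the management console.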
RIOIP11
Use this procedure to recover from a B7xx 6982 12X failure.
1. Record the 12X loop number in the first four characters of word 7 of this SRC and perform
“RIOIP10” on page 28.
2. Did the 12X loop recover?
No: Continue with the next step.
Yes: Close the problem. This ends the procedure.
3. Work with the customer to determine if an I/O enclosure on the 12X loop has powered down
normally.
4. Was an I/O enclosure on the loop powered down normally?
No: Go to step 6.
Yes: The loop remains in a failed state until all I/O enclosures on the loop are powered on and
functioning. Work with the customer to determine if all the powered down enclosures on the
loop can be powered on. After all enclosures on the loop are powered on, continue with the next
step.
5. Did the 12X loop recover?
No: Continue with the next step.
Yes: Close the problem. This ends the procedure.
6. Search for a serviceable event with a 1xxx xxxx SRC logged at approximately the same time and with
one or more FRUs in the same unit as those in the FRU list for the SRC you are currently working.
7. Did you find a serviceable event with a 1xxx xxxx SRC?
No: Go to step 9 on page 30.
Yes: Work to resolve the problem. After you have repaired that error, the 12X loop may be
recovered. After you finish working on the problem, return to this procedure and check to
determine if correcting that problem also corrected the 12X error. To determine if the 12X loop
has recovered, record the 12X loop number in the first four characters of word 7 of this SRC and
perform “RIOIP10” on page 28.
8. Did the 12X loop recover?
No: Continue with the next step.
Yes: Close the problem. This ends the procedure.
9. In the serviceable event view, search for a B700 6981 error logged at approximately the same time
and on the same 12X loop (the first four characters of word 7 are the same).
10. Did you find a serviceable event with a B700 6981 SRC at approximately the same time and on the
same 12X loop?
No: Go to step 14.
Yes: Work to resolve the problem. After you have repaired that error, the 12X loop may be
recovered. After you finish working on the problem, return to this procedure and check to
determine if correcting that problem also corrected the 12X error. To determine if the 12X loop
has recovered, record the 12X loop number in the first four characters of word 7 of this SRC and
perform “RIOIP10” on page 28.
11. Did the 12X loop recover?
No: Continue with the next step.
Yes: Close the problem. This ends the procedure.
12. Verify that all 12X cables in the loop are connected securely. Attach any cables which are not
connected to complete the 12X loop. Did you find any 12X cables to connect?
No: Go to step 14.
Yes: Perform “RIOIP10” on page 28.
13. Did the 12X loop recover?
No: Continue with the next step.
Yes: Close the problem. This ends the procedure.
14. Using the FRU list that you are working with for this SRC, exchange one FRU at a time. After you
exchange each FRU, determine if the loop has recovered. To determine if the 12X loop has recovered,
record the 12X loop number in the first four characters of word 7 of this SRC and perform
“RIOIP10” on page 28. After the loop recovers or after you have exchanged all the FRUs, continue
with the next step. To replace a FRU, see System FRU locations.
15. Did the 12X loop recover?
No: Contact your next level of support. This ends the procedure.
Yes: Close the problem. This ends the procedure.
RIOIP12
Use this procedure to recover from a B7xx 6985 12X failure.
1. Work with the customer to determine if a processor enclosure or I/O enclosure on the RIO loop has
powered down normally.
2. Was a processor enclosure or I/O enclosure on the loop powered down normally?
No: Go to step 6 on page 31.
Yes: The loop remains in a failed state until all processor enclosures and I/O enclosures on the
loop are powered on and functioning. Work with the customer to determine if all the powered
down enclosures on the loop can be powered on. After all processor enclosures and I/O
enclosures on the loop are powered on, check to determine if the 12X loop is complete. To
determine if the 12X loop has recovered, record the 12X loop number in the first four characters
of word 7 of this SRC and perform “RIOIP10” on page 28.
3. Did the 12X loop recover?
No: Continue with the next step.
Yes: Close the problem. This ends the procedure.
4. Was an I/O enclosure concurrently added, and did the enclosure power on with one or both 12X
links connected at approximately the same time that the permanent B7xx 6985 error was logged?
No: Go to step 6.
Yes: The permanent B7xx 6985 error is expected under some circumstances when the first I/O
enclosure is added to the loop concurrently. For example, if the enclosure powers on with only
one link connected, the error will be generated. After ensuring that both links have been
connected, record the 12X loop number in the first four characters of word 7 of this SRC and then
perform “RIOIP10” on page 28 to determine if the 12X loop has recovered.
5. Did the 12X loop recover?
No: Continue with the next step.
Yes: Close the problem. This ends the procedure.
6. In the serviceable event view, search for a serviceable event with a 1xxx xxxx SRC logged at
approximately the same time and with one or more FRUs in the same enclosure as those in the FRU
list for the SRC you are currently working with.
7. Did you find a serviceable event with a 1xxx xxxx SRC?
No: Go to step 9.
Yes: Work to resolve the problem. After you have repaired that error, the 12X loop may be
recovered. After you finish working on the problem, return to this procedure and check to
determine if correcting that problem also corrected the 12X error. To determine if the 12X loop
has recovered, record the 12X loop number in the first four characters of word 7 of this SRC and
perform “RIOIP10” on page 28.
8. Did the 12X loop recover?
No: Continue with the next step.
Yes: Close the problem. This ends the procedure.
9. In the serviceable event view, search for a B700 6981 or B700 6986 error logged at approximately the
same time and on the same 12X loop (the first four characters of word 7 are the same).
10. Did you find a B700 6981 or a B700 6986 error logged at approximately the same time and on the
same 12X loop?
No: Go to step 12.
Yes: Work to resolve the problem. After you have repaired that error, the 12X loop may be
recovered. After you finish working on the problem, return to this procedure and check to
determine if correcting that problem also corrected the 12X error. To determine if the 12X loop
has recovered, record the 12X loop number in the first four characters of word 7 of this SRC and
perform “RIOIP10” on page 28.
11. Did the 12X loop recover?
No: Continue with the next step.
Yes: Close the problem. This ends the procedure.
12. Search for a processor enclosure or I/O enclosure on the 12X loop that has not powered up as
expected.
13. Did you find a processor enclosure or I/O enclosure on the 12X loop that has not powered up as
expected?
No: Go to step 15 on page 32.
Yes: Go to “Cannot power on SPCN-controlled I/O expansion unit” on page 132 and work that
power symptom. Use the first half of word 7 to determine the loop number for later use. After
you have repaired that error, the 12X loop may be recovered. After you finish working that
power symptom, return to this procedure and check to determine if correcting that problem also
corrected the 12X error. To determine if the 12X loop has recovered, record the 12X loop number
in the first four characters of word 7 of this SRC and perform “RIOIP10” on page 28.
14. Did the 12X loop recover?
No: Continue with the next step.
Yes: Close the problem. This ends the procedure.
15. Using the FRU list that you are working with for this SRC, exchange one FRU at a time. After you
exchange each FRU, determine if the loop has recovered. To determine if the 12X loop has recovered,
record the 12X loop number in the first four characters of word 7 of this SRC and perform
“RIOIP10” on page 28. After the loop recovers or after you have exchanged all the FRUs, continue
with the next step. To replace a FRU, see System FRU locations.
16. Did the 12X loop recover?
No: Contact your next level of support. This ends the procedure.
Yes: Close the problem. This ends the procedure.
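The exchange-and-check pattern in steps 15 and 16 can be sketched as a simple loop. The sketch below is illustrative only: the function name, the callbacks, and the FRU names are hypothetical stand-ins for the manual actions (exchanging a FRU, then performing RIOIP10), and nothing here is a service tool interface.

```python
from typing import Callable, Iterable, Optional


def exchange_frus_until_recovered(
    fru_list: Iterable[str],
    exchange: Callable[[str], None],
    loop_recovered: Callable[[], bool],
) -> Optional[str]:
    """Exchange one FRU at a time and check the loop after each exchange.

    Returns the FRU whose exchange recovered the loop, or None if the
    whole list was exchanged without recovery (contact the next level
    of support in that case).
    """
    for fru in fru_list:
        exchange(fru)          # stands in for the physical exchange
        if loop_recovered():   # stands in for performing RIOIP10
            return fru
    return None


# Simulated run with made-up FRU names: the loop recovers once
# the cable has been exchanged.
exchanged = []
fixed_by = exchange_frus_until_recovered(
    ["BRIDGE", "CABLE", "BACKPLANE"],
    exchange=exchanged.append,
    loop_recovered=lambda: "CABLE" in exchanged,
)
```

Checking after every single exchange, rather than after exchanging the whole list, is what identifies the failing FRU and avoids replacing good parts.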
RIOIP56
Use this procedure to restore the 12X link to optimal bandwidth.
1. Record the 12X loop number in the first four characters of word 7 of this SRC. The loop number is in
hexadecimal format and must be converted to decimal.
2. Is the system managed by a management console?
Yes: Continue with the next step.
No: Go to step 6.
3. Perform the following from the management console:
Hardware Management Console (HMC): From the HMC, expand Systems Management > Servers.
Select the server on which you are working, expand Hardware Information, and click View RIO-12X
Topology. In the Current Topology area, scroll down until you find the data for the decimal 12X loop
number you identified in step 1.
Systems Director Management Console (SDMC): From the SDMC, select the server on the Resources
page. Click Actions > Hardware Information > View Hardware Topology. Click Actions, and select
Hardware Information to view the RIO-12X topology. Scroll down until you find the data for the
decimal 12X loop number you identified in step 1.
Is the link width 12X?
Yes: The 12X cable connection is now operating at the optimal bandwidth. No further action is
required. This ends the procedure.
No: Continue with the next step.
4. Unplug both ends of the cable indicated in the FRU list for at least 30 seconds and then reconnect it.
Refresh the view on the management console and verify that the width is now 12X for the decimal
loop number you identified in step 1.
Is the link width 12X?
Yes: The 12X cable connection is now operating at the optimal bandwidth. No further action is
required. This ends the procedure.
No: Continue with the next step.
5. Replace the cable. Refresh the view on the management console and verify that the width is now 12X
for the decimal loop number you identified in step 1.
Is the link width 12X?
Yes: The 12X cable connection is now operating at the optimal bandwidth. No further action is
required. This ends the procedure.
No: Continue replacing the items in the FRU list until the problem is resolved. This ends the
procedure.
6. Unplug both ends of the cable indicated in the FRU list for at least 30 seconds and then reconnect the
cable. It is not possible to concurrently verify that the 12X link has been restored to optimal
bandwidth. If the same SRC occurs for this 12X link after the next IPL, the problem has not been
resolved. Replace the cable and check for the error condition after the next IPL. Continue replacing
the items in the FRU list, and perform an IPL of the system each time until the problem has been
resolved. This ends the procedure.
Multi-adapter bridge isolation procedures
Use multi-adapter bridge (MAB) isolation procedures if there is not a management console attached to
the server. If the server is connected to a management console, use the procedures that are available on
the management console to continue FRU isolation.
MABIP02
Use this procedure to resolve a problem with a multi-adapter bridge.
Perform “MABIP51.”
MABIP03
Use this procedure to isolate a failing PCI adapter under a multi-adapter bridge.
Perform “MABIP50.”
MABIP05
Use this procedure to reset an IOP.
Attention: When the IOP reset is performed, all resources controlled by the IOP will be reset. Perform
this procedure only if the customer has verified that the IOP reset can be performed at this time.
1. Go to the SST/DST display in the partition which reported the problem. Use STRSST if IBM i is
running; use function 21 if STRSST does not work; or IPL the partition to DST.
2. On the Start Service Tools Sign On display, type in a user ID with service authority and password.
3. Select Start a service tool > Hardware service manager > Logical hardware resources > System bus
resources.
4. Page forward until you find the IOP that you want to reset. For help in identifying the IOP from the
Direct Select Address (DSA) in the reference code, see “DSA translation” on page 6.
5. Verify that the IOP is correct by matching the resource names on the display with the resource
names in the Service Action Log (SAL) for the problem you are working on.
6. Move the cursor to the IOP that you want to reset, and select I/O Debug > Reset IOP > IPL IOP. This
ends the procedure.
MABIP50
This isolation procedure is not supported on these models. Continue with the next failing item in the
failing item list.
MABIP51
Use this procedure to resolve a problem with a multi-adapter bridge.
This procedure will determine if the multi-adapter bridge is failing when the symbolic PIOCARD FRU is
in the failing item list. It will also determine whether the symbolic PIOCARD FRU can be removed from
the FRU list.
1. Is the failing item located in a 5797 or 5798 expansion unit and is the location code of the other failing
items in the failing item list Un-P1-C1 through Un-P1-C7, or Un-P2-C1 through Un-P2-C7?
No:
Continue with the next step.
Yes:
Continue with the next failing item in the failing item list. This ends the procedure.
2. Is the partition an AIX partition or a Linux partition, or is a management console attached?
No:
Continue with the next step.
Yes:
Go to “PCI bus isolation using AIX, Linux, or the management console” on page 2 to isolate a
PCI bus problem from AIX, Linux, or the management console.
3. Is a location code available for the PIOCARD FRU in the service action log entry? For information
about how to find service action log entries, see Searching the Service Action Log.
No:
Find the reference code in the service action log and record the Direct Select Address (DSA),
which is in word 7. For information, see “DSA translation” on page 6, and then continue with
the next step.
Yes:
Go to step 8.
4. Record the bus number (BBBB) and the multi-adapter bridge number (C) of the DSA.
5. Go to isolation procedure “MABIP53” on page 35 and determine the location of the PCI I/O card in
the failing item list. Then, return here and continue with the next step.
6. Determine which of the card positions are controlled by the same multi-adapter bridge that is
controlling the PCI I/O card. (Use the card position table for the frame or I/O tower type you
recorded in MABIP53.)
Note: A card position is controlled by the same multi-adapter bridge if it has the same bus number
and multi-adapter bridge number as the PCI I/O Card that you previously located.
7. Record the card position and the DSA from the card position table for each card position that is
controlled by the same multi-adapter bridge.
8. Use the service action log to find other failures in the same frame that are either located in the card
positions that you recorded in step 7 or are listed with the PIOCARD FRU.
9. Are any such failures listed in the service action log?
No:
Use the failing item list that you were using when you started this procedure. This ends the
procedure.
Yes:
The multi-adapter bridge is failing. Remove the symbolic PIOCARD FRU from the list of
failing items, because it is not the failing FRU. This ends the procedure.
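Steps 6 and 7 amount to filtering the card position table on two values: the bus number and the multi-adapter bridge number. A minimal sketch follows, assuming a hypothetical in-memory copy of a card position table; the CardPosition type, its field names, and the sample rows are made up for illustration and do not come from any real card position table.

```python
from typing import List, NamedTuple


class CardPosition(NamedTuple):
    """One row of a card position table (hypothetical layout)."""
    position: str  # card position label
    bus: str       # bus number, BBBB
    bridge: str    # multi-adapter bridge number, C
    function: str  # multi-adapter bridge function number, c


def positions_under_same_bridge(
    table: List[CardPosition], bus: str, bridge: str
) -> List[CardPosition]:
    """Return the rows that share both the bus number and the bridge number."""
    return [
        row
        for row in table
        if row.bus.upper() == bus.upper() and row.bridge.upper() == bridge.upper()
    ]


# Made-up rows for illustration only.
table = [
    CardPosition("C01", "0204", "2", "1"),
    CardPosition("C02", "0204", "2", "2"),
    CardPosition("C05", "0204", "4", "1"),
    CardPosition("C01", "0205", "2", "1"),
]
same_bridge = positions_under_same_bridge(table, "0204", "2")
```

Both values must match: a position with the same bridge number on a different bus belongs to a different multi-adapter bridge.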
MABIP52
This procedure will isolate a failing PCI adapter from a reference code when an IPL is not successful on
the system or logical partition.
Attention: Power off the partition to remove and replace any failing items referenced in this procedure.
If you do not perform dedicated maintenance, the problem will persist.
1. Determine the PCI bridge set (multi-adapter bridge domain) by performing the following:
a. Record the bus number (BBBB), the multi-adapter bridge number (C) and the multi-adapter bridge
function number (c) from the Direct Select Address (DSA) in word 7 of the reference code. See
“DSA translation” on page 6 for help in determining these values.
b. Use the bus number that you recorded and the System Configuration Listing (or ask the customer)
to determine which frame the bus is in.
c. Record the frame type where the bus is located.
d. Use the System Configuration Listing, the card position table for the frame type that you recorded,
the bus number, and the multi-adapter bridge number to determine the PCI bridge set where the
failure occurred. The PCI bridge set is the group of card positions controlled by the same
multi-adapter bridge on the bus that you recorded.
e. Use the card position table to record the PCI bridge set card positions.
f. Examine the PCI bridge set in the frame, and record all the positions with IOA cards installed in
them.
2. Perform the following steps:
a. Power off the partition.
b. Remove all the IOA cards in the PCI bridge set identified in step 1. Be sure to record the card
position of each IOA so that you can reinstall it in the same position later. To determine the
location of all the IOA cards in the PCI bridge set, go to System FRU locations.
c. Power on the partition.
Does the reference code or failure that sent you to this procedure occur?
No:
Continue with the next step.
Yes:
The problem is the multi-adapter bridge. Continue with step 4.
3. Reinstall one of the IOAs and power on the partition. Does the reference code or failure that sent you
to this procedure occur?
Yes:
The IOA that you just installed is the failing FRU. Replace the IOA. This ends the procedure.
No:
v If a different SRC occurs, return to Start of call and follow the service procedures for the
new reference code. This ends the procedure.
v If no SRC occurs and there are more IOAs to install, power off the partition and repeat this
step.
v If no SRC occurs and there are no more IOAs to install, the problem is intermittent; contact
your next level of support. This ends the procedure.
4. Power off the partition. Determine which FRU contains the multi-adapter bridge. Locate the card
position table for the frame type that you recorded. Perform the following steps:
a. Using the multi-adapter bridge number that you recorded, search for the multi-adapter bridge
function number "F" in the card position table to determine the card position of the multi-adapter
bridge's FRU.
b. Exchange the multi-adapter bridge's FRU at the card position that you determined for it. See
System FRU locations to determine the correct card location for removal.
c. Install all IOAs in their original positions.
d. Power on the partition.
Does the reference code or failure that sent you to this procedure occur?
No:
Go to Verify a repair. This ends the procedure.
Yes:
Call your next level of support. This ends the procedure.
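Step 1a of this procedure records three values from the DSA in word 7. The sketch below shows one way to split them out. It rests on an assumption that is not confirmed here: that the DSA begins with the four-character bus number BBBB, followed immediately by the bridge number C and the function number c. See “DSA translation” for the actual layout. The Dsa type, the function name, and the sample value are hypothetical.

```python
from typing import NamedTuple


class Dsa(NamedTuple):
    """Fields recorded in step 1a (assumed layout: BBBB, then C, then c)."""
    bus: str       # bus number, BBBB (hexadecimal)
    bridge: str    # multi-adapter bridge number, C
    function: str  # multi-adapter bridge function number, c


def split_dsa(word7: str) -> Dsa:
    """Split word 7 into BBBB, C, and c under the assumed layout."""
    cleaned = word7.replace(" ", "").upper()
    if len(cleaned) < 6:
        raise ValueError("expected at least six hexadecimal characters")
    int(cleaned[:6], 16)  # raises ValueError if the characters are not hex
    return Dsa(bus=cleaned[:4], bridge=cleaned[4], function=cleaned[5])


# Hypothetical sample value, not taken from a real reference code.
dsa = split_dsa("0204 2100")
decimal_bus = int(dsa.bus, 16)  # decimal form of the bus number
```

The bus number and bridge number together identify the PCI bridge set in the card position table for the frame type recorded in step 1c.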
MABIP53
Use this procedure to determine a card position when no location is given for a PCI adapter FRU.
This procedure uses the Direct Select Address given in the reference code because no location was given
for a PCI adapter FRU.
1. Is the partition an AIX partition or a Linux partition, or is a management console attached?
No:
Continue with the next step.
Yes:
Go to “PCI bus isolation using AIX, Linux, or the management console” on page 2 to isolate
a PCI bus problem from an AIX partition, a Linux partition, or a management console.
2. If you were sent to this procedure with a specific Direct Select Address (DSA), then use it.
Otherwise, use the DSA in the reference code. See “DSA translation” on page 6 to find the DSA in
the reference code and translate it into the BBBB Cc values that you use in later steps of this
procedure.
3. Perform the following steps:
a. Record the bus number value, BBBB, in the DSA and convert it to decimal format.
b. Search for the decimal bus number in HSM or the System Configuration Listing to determine
which frame or I/O unit contains the failing item. From the HSM screen select Logical Hardware
Resources > System Bus resources. Move the cursor to a system bus object and select Display
Detail. Do this for each bus until you find the bus on which you are working.
c. Record the frame or unit type.
4. Record the Cc value in the DSA. Is the Cc value greater than 00?
No: The multi-adapter bridge and the multi-adapter function number are not identified. Record
that the multi-adapter bridge is not identified in the DSA. The card slot cannot be identified
using the DSA. Go to step 9.
Yes: Continue with the next step.
5. Is the right-most character of the Cc value 'F'?
No: Continue with the next step.
Yes: Only the multi-adapter bridge number is identified. Record the multi-adapter bridge number
(the leftmost character of the Cc value) for later use. The card slot cannot be identified using the
DSA. Go to step 9.
6. Are you working with a B7xx reference code?
No: Go to step 8.
Yes: Continue with the next step.
7. Is SST/DST available?
No: Continue with the next step.
Yes: Go to step 13 on page 37.
8. Use the “Card positions” on page 7 with the BBBB and Cc values that you recorded to identify the
card position. Then return to the procedure, failing item, or symbolic FRU that sent you here. This
ends the procedure.
9. Perform the following steps:
a. Sign on to SST or DST if you have not already done so.
b. Select Start a service tool > Hardware service manager > Logical Hardware Resources > System
bus resources.
c. Put the bus number in the System bus(es) to work with field. Then select Include non-reporting
resources and examine the display. Is there more than one multi-adapter bridge connected to the
bus resource you are working with?
No: Continue with the next step.
Yes: Go to step 12.
10. Was there a multi-adapter bridge number identified in the Cc value of the DSA?
Yes: Continue with the next step.
No: From the Logical Hardware Resources on System Bus display, examine the status of all the
resources under the bus, looking for a "failed" resource.
• To examine the status of the IOAs, select Resources associated with IOP for each IOP under the
bus.
• To determine the card position of a failed resource, select Associated packaging resources >
Display detail and record the unit ID and part number.
Return to the procedure that sent you here. This ends the procedure.
11. Search for the multi-adapter bridge number that is identified in the DSA by moving the cursor to
each multi-adapter bridge resource and selecting Display detail. Convert the system card value to
hexadecimal (it is displayed in decimal format). The hexadecimal system card value is the Cc address
of the multi-adapter bridge. When you find the multi-adapter bridge resource, where the
multi-adapter bridge number (the leftmost character of the hexadecimal Cc value) matches the
multi-adapter bridge number that you recorded from the DSA, then you have located the
multi-adapter bridge identified in the DSA.
12. From the Logical Hardware Resources on System Bus display, examine the status of all the resources
under the multi-adapter bridge, looking for a "failed" resource.
• To examine the status of the IOAs, select Resources associated with IOP for each IOP under the
multi-adapter bridge.
• To determine the card position of a failed resource, select Associated packaging resources >
Display detail and record the unit ID and part number.
Did you find any failed resources?
Yes: One of the failing resources that you located is the problem. Return to the procedure that
sent you here. This ends the procedure.
No: Use the System Configuration Listing and “Card positions” on page 7 for the frame type that
you recorded to determine which card positions may have the failing card. If you recorded that
the multi-adapter bridge was identified in the leftmost character of the Cc value, then “Card
positions” on page 7 will help you identify which card slots (PCI bridge set) are controlled by the
multi-adapter bridge that is identified in the Cc value. If the multi-adapter bridge was not
identified in the Cc value (indicated by a value of '0' in the leftmost character) then “Card
positions” on page 7 will identify which card slots are controlled by the bus (BBBB) that is
identified in the DSA. Return to the procedure that sent you here. This ends the procedure.
13. Perform the following steps:
a. Convert the hexadecimal Cc value in the DSA into a decimal value. You will be searching for the
decimal value in HSM where it will be called "System card".
b. Sign on to SST or DST if you have not already done so.
c. Select Start a Service Tool > Hardware Service Manager > Logical Hardware Resources >
System Bus Resources.
d. Search for the "System Bus" resource identified in BBBB of the DSA by moving the cursor to each
system bus resource and selecting Display detail. Do this until you locate the bus number that
matches the decimal bus number value that you recorded from the DSA. Record the resource
name of the bus for later use. From the Logical Hardware Resources on System Bus display,
select Include non-reporting resources.
14. From the Logical Hardware Resources on System Bus display, examine all of the IOP and IOA
resources under the bus. Look for a "System card" value that matches the decimal value of the Cc that
you converted to decimal in step 13. Perform the following steps to display the "System card" value
for each of the IOP and IOA resources:
Note: Do not examine virtual IOP resources.
• To examine the IOP resources:
a. Select Associated packaging resources > Display detail. The "System card" value of the IOP
will be shown on the display.
b. If the "System card" value matches the decimal value of the Cc, then you have located the
failing resource. Record the unit ID and part number, and then return to the procedure that
sent you here. Otherwise, continue to examine all the IOP and IOA resources on the bus.
• To examine the IOA resources:
a. Move the cursor to an IOP or a virtual IOP resource and select Resources associated with IOP
> Associated packaging resources > Display detail. The "System card" value of the IOA will be
shown on the display.
b. If the "System card" value matches the decimal value of the Cc, then you have located the
failing resource. Record the unit ID and part number, and then return to the procedure that
sent you here. Otherwise, continue to examine all the IOP and IOA resources on the bus.
Have you examined all the IOP and IOA resources under the bus?
No: Repeat this step.
Yes: Continue with the next step.
15. Did you locate a resource with a "System card" value that matches the decimal Cc value from step 13?
Yes: Record the unit ID and part number of the resource. Return to the procedure that sent you
here. This ends the procedure.
No: You will not be able to locate the card using DST. Go to step 8 on page 36 to locate the card.
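The hexadecimal and decimal conversions in steps 3, 11, and 13, and the Cc checks in steps 4 and 5, can be sketched in Python as follows. This is a minimal illustration, not an IBM utility: the function names and sample values are hypothetical, and the BBBB and Cc strings are assumed to have already been extracted from the DSA as described in "DSA translation".

```python
from typing import Optional, TypedDict


class CcInfo(TypedDict):
    bridge: Optional[str]      # multi-adapter bridge number (leftmost character)
    function: Optional[str]    # multi-adapter bridge function number (rightmost character)
    slot_identifiable: bool    # True only when the DSA alone can identify the card slot


def bus_number_decimal(bbbb: str) -> int:
    """Step 3a: convert the hexadecimal bus number BBBB to decimal."""
    return int(bbbb, 16)


def classify_cc(cc: str) -> CcInfo:
    """Steps 4 and 5: decide what the two-character Cc value identifies."""
    cc = cc.upper()
    if len(cc) != 2:
        raise ValueError("Cc must be exactly two hexadecimal characters")
    if int(cc, 16) == 0:  # step 4: Cc is not greater than 00
        return {"bridge": None, "function": None, "slot_identifiable": False}
    if cc[1] == "F":      # step 5: only the bridge number is identified
        return {"bridge": cc[0], "function": None, "slot_identifiable": False}
    return {"bridge": cc[0], "function": cc[1], "slot_identifiable": True}


def system_card_decimal(cc: str) -> int:
    """Step 13a: the decimal "System card" value to search for in HSM."""
    return int(cc, 16)


def system_card_to_cc(system_card: int) -> str:
    """Step 11: convert a decimal "System card" value back to a hexadecimal Cc."""
    return format(system_card, "02X")


print(bus_number_decimal("0019"), classify_cc("2F"), system_card_decimal("14"))
```

When slot_identifiable is False, the procedure continues at step 9 because the card slot cannot be found from the DSA alone.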
MABIP54
Use this procedure to isolate the failing PCI I/O adapter card from a reference code with a Direct Select
Address when the serviceable event view does not indicate a location for the PCI card.
The removal and replacement of all FRUs in this procedure must be performed using dedicated
maintenance.
1. Is the partition an AIX partition or a Linux partition, or is a management console attached?
No:
Continue with the next step.
Yes:
Go to “PCI bus isolation using AIX, Linux, or the management console” on page 2 to isolate
a PCI bus problem from an AIX partition, a Linux partition, or a management console.
2. Determine the PCI bridge set (multi-adapter bridge domain) by performing the following:
a. Record the bus number (BBBB), the multi-adapter bridge number (C), and the multi-adapter
bridge function number (c) from the Direct Select Address (DSA) (see “Analyzing a 12X or PCI
bus reference code” on page 5 for help in determining these values).
b. Using the bus number and the System Configuration Listing, or by asking the customer,
determine which unit the bus is located in and record that unit type.
c. The PCI bridge set is the group of card positions controlled by the same multi-adapter bridge on
the bus that you recorded. Use the System Configuration Listing, the “Card positions” on page 7
table for the unit type that you recorded, the bus number, and the multi-adapter bridge number
to determine in which PCI bridge set the failure occurred.
d. Using the card position table, record the PCI bridge set card positions, and the multi-adapter
bridge function numbers.
e. Examine the PCI bridge set and record the information on the form for all of the positions with
IOP and IOA cards installed in them.
3. Perform the following steps:
a. Power off the system or partition.
b. Remove all the IOAs in the PCI bridge set identified in step 2. Be sure to record the card position
of each IOA so that you can reinstall it in the same position later. To determine the remove and
replace procedures for the IOAs, locate the IOA card positions in System FRU locations for the
frame type that you recorded.
c. Power on the system or partition.
Does the reference code or failure that sent you to this procedure occur?
No: Power off the system or partition. Go to step 7 on page 39.
Yes: Continue with the next step.
4. Perform the following steps:
a. Power off the system or partition.
b. Exchange the IOP that is indicated in the DSA. Be sure to record the card position of the IOP so
that you can reinstall it in the same position later. To determine the exchange procedure for the
IOP, locate the IOP's card position in System FRU locations for the frame type you recorded.
c. Power on the system or partition.
Does the reference code or failure that sent you to this procedure occur?
No: Power off the system or partition. Go to step 11 on page 39.
Yes: Continue with the next step.
5. Perform the following steps:
a. Power off the system or partition.
b. Install all of the IOAs that you removed in step 3. Be sure to install them in their original
positions.
c. Power on the system or partition.
Does the reference code or failure that sent you to this procedure occur?
No: Perform “Verifying a high-speed link, system PCI bus, or a multi-adapter bridge repair” on
page 3. This ends the procedure.
Yes: Continue with the next step.
6. Perform the following steps:
a. Power off the system or partition.
b. Remove all of the IOAs in the PCI bridge set identified in step 2. Be sure to record the card
position of each IOA so that you can reinstall it in the same position later.
c. Remove the IOP that you exchanged and install the original IOP in its original position. Continue
with the next step.
7. Perform the following steps:
a. Reinstall, in its original position, one of the IOAs that you removed.
b. Power on the system or partition.
Does the reference code or failure that sent you to this procedure occur?
No: Power off the system or partition. Repeat this step for another one of the IOAs that you
removed. If you have reconnected all of the IOAs and the reference code or failure that sent you
to this procedure does not occur, the problem is intermittent; contact your next level of support.
This ends the procedure.
Yes: The IOA that you just installed is the failing FRU. Continue with the next step.
8. Perform the following steps:
a. Power off the system or partition.
b. Replace the I/O adapter card that you last installed. Be sure to install the new I/O adapter card
in the same position.
c. Power on the system or partition.
Does the reference code or failure that sent you to this procedure occur?
Yes: Contact your next level of support. This ends the procedure.
No: Continue with the next step.
9. Perform the following steps:
a. Power off the system or partition.
b. Reinstall, in their original positions, the remaining I/O adapter cards that you removed.
c. Power on the system or partition.
Does the reference code or failure that sent you to this procedure occur?
Yes: Contact your next level of support. This ends the procedure.
No: Continue with the next step.
10. Does a different reference code occur?
Yes: Perform Problem Analysis for the new reference code.
No: Perform “Verifying a high-speed link, system PCI bus, or a multi-adapter bridge repair” on
page 3. This ends the procedure.
11. The problem is the multi-adapter bridge. Perform the following:
a. Power off the system or partition.
b. If you have not already done so, replace the I/O backplane in the failing enclosure.
c. Install the original IOP in its original position.
d. Install all of the other IOPs and IOAs into their original positions. Do not install the IOAs that
you were instructed to remove in step 3 on page 38.
e. Power on the system or partition.
Does the reference code or failure that sent you to this procedure occur?
Yes: Contact your next level of support. This ends the procedure.
No: Continue with the next step.
12. Perform the following steps:
a. Power off the system or partition.
b. Install all of the IOAs that you removed in step 3 on page 38. Be sure to install them in their
original positions.
c. Power on the system or partition.
Does the reference code or failure that sent you to this procedure occur?
Yes: Contact your next level of support. This ends the procedure.
No: The problem has been resolved. This ends the procedure.
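Step 2a records three values from the DSA. The Python sketch below shows one way to split them out. It assumes the DSA is written as eight hexadecimal digits in the order BBBB, C, c, followed by a two-digit board number; that layout is an assumption here, so confirm it against "DSA translation" before relying on it. The names Dsa and parse_dsa and the sample value are hypothetical.

```python
import re
from typing import NamedTuple


class Dsa(NamedTuple):
    bus: int        # BBBB, converted to decimal for lookup in the configuration listing
    bridge: str     # C, the multi-adapter bridge number
    function: str   # c, the multi-adapter bridge function number
    board: str      # remaining two digits (assumed layout)


def parse_dsa(dsa: str) -> Dsa:
    """Split an 8-digit hexadecimal DSA such as '0019 2400' into its fields."""
    digits = re.sub(r"\s+", "", dsa).upper()
    if not re.fullmatch(r"[0-9A-F]{8}", digits):
        raise ValueError("expected eight hexadecimal digits")
    return Dsa(
        bus=int(digits[0:4], 16),
        bridge=digits[4],
        function=digits[5],
        board=digits[6:8],
    )


print(parse_dsa("0019 2400"))
```

The bus number and bridge number returned here are the values that step 2c uses with the card position table to find the PCI bridge set.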
MABIP55
Use this procedure to isolate a failing I/O adapter.
If you need additional information for failing part numbers, location codes, or removal and replacement
procedures, see Part locations and location codes. Select your machine type and model number to find
additional location codes, part numbers, or replacement procedures for your system.
1. Is the partition an AIX partition or a Linux partition, or is a management console attached?
No:
Continue with the next step.
Yes:
Go to “PCI bus isolation using AIX, Linux, or the management console” on page 2 to isolate
a PCI bus problem from an AIX partition, a Linux partition, or a management console.
2. If the system is not IPLed, will it IPL to DST?
No:
Perform “MABIP54” on page 38. This ends the procedure.
Yes:
From the SAL display for the reference code, record the count. Continue with the next step.
3. Go to the SST/DST display in the partition which reported the problem. Use STRSST if the operating
system is running; use function 21 if STRSST does not work; or IPL the partition to DST.
4. On the Start Service Tools Sign On display, type in a user ID with QSRV authority and password.
5. Select Start a service tool > Hardware service manager > Logical hardware resources > System bus
resources.
6. Is there a resource name logged in the SAL entry?
No:
Continue with the next step.
Yes:
Go to step 13 on page 41.
7. Do you have a location for the I/O processor?
No:
Record the Direct Select Address (DSA), word 7 of the reference code, from the SAL display.
Then continue with the next step.
Yes:
Go to step 11 on page 41.
8. Return to the HSM System bus resources display.
9. Locate the I/O processor by performing the following:
a. Select Display detail.
b. Compare the DSA with the bus, card, and board information for the IOP.
Note: The card information on the HSM display is in decimal format. You must convert the
decimal card information to hexadecimal format to match the DSA format.
c. Repeat this step until you find the IOP with the same DSA.
10. Select Cancel, and then go to step 14.
11. Locate the I/O processor in HSM by performing the following for each IOP:
a. Select Associated packaging resources > Display detail.
b. Repeat until you find the IOP with the same location.
12. Select Cancel > Cancel and go to step 14.
13. Page forward until you find the multi-adapter bridge and IOP where the problem exists. Verify that
the multi-adapter bridge and IOP are correct by matching the resource names on the display with
the resource names in the SAL for the problem you are working on.
14. For the IOP you are working on, select Resources associated with IOP (if the I/O adapters are not
already displayed).
15. If there is an IOA that is listed in any state other than "operational", then perform steps 16 through
19, starting with the disabled IOA by moving the cursor to the disabled IOA. Otherwise, move the
cursor to the first IOA that is assigned to the IOP.
16. Select Associated packaging resources > Concurrent maintenance > Power off domain. Record the
unit ID of the slot you are powering off. Did the domain power off successfully?
No:
Choose from the following options:
• If only one IOA was listed as failing, power down the system and replace the IOA. Re-IPL
the system. If a different reference code occurred, perform problem analysis and work that
reference code. If there was no reference code, go to Verify a repair. This ends the
procedure.
• If there were multiple failed IOAs and concurrent maintenance did not work on one, then
move to the next failed IOA and repeat steps 16 through 19.
• If concurrent maintenance does not work for multiple failed IOAs, this procedure will not
be able to identify a failing I/O adapter. Return to the procedure that sent you here. This
ends the procedure.
Yes:
Perform “MABIP05” on page 33 and then return here and continue with the next step.
17. Did the IOP reset and IPL successfully?
No:
This procedure will not be able to identify a failing I/O adapter. Return to the procedure
that sent you here. This ends the procedure.
Yes:
Check for the same failure that sent you to this procedure. Check the system control panel,
the SAL for the partition that reported the problem, or the Work with partition status display
for the partition that reported the problem. In the SAL, the count will increase if the
reference code occurred again. Continue with the next step.
18. Did the same reference code occur after the IOP was reset and you performed an IPL?
No:
Go to step 20 on page 42.
Yes:
Perform the following steps:
a. Go to the Hardware Service Manager display.
b. Go to Packaging Hardware Resources.
c. Power on the IOA by selecting Power on domain.
d. Reassign the IOA to the IOP.
e. Return to the HSL resource display, showing the IOP and associated resources.
f. Continue with the next step.
19. Is there any other IOA, assigned to the IOP, that you have not already powered off and on?
No:
Go to step 22 on page 42.
Yes:
Move the cursor to another IOA assigned to the IOP, choosing IOAs with a status of
"unknown" or "disabled" before moving on to IOAs with a status of "operational". Go to step
16 on page 41.
20. The failing IOA is located. Exchange the I/O adapter that you just powered off. Use the location you
recorded in step 16 on page 41 to locate the IOA.
21. Power on the IOA that you just exchanged. Does the same reference code that sent you to this
procedure still occur?
No:
You have exchanged the failing IOA. Go to Verify a repair. This ends the procedure.
Yes:
The IOA is not the failing item. Remove the IOA and reinstall the original IOA. Continue
with the next step.
22. No failing IOAs were identified. Return to the procedure that sent you here. This ends the
procedure.
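Steps 15 and 19 ask you to work through the IOAs with non-operational ones first, and step 17 uses the SAL count to decide whether the reference code occurred again. The following Python sketch illustrates both rules. The function names, resource names, and status strings are hypothetical examples and do not come from HSM output.

```python
from typing import List, Tuple


def ioa_test_order(ioas: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Order (resource_name, status) pairs so that IOAs in any state other
    than "operational" come first. sorted() is stable, so the displayed
    order is preserved within each group."""
    return sorted(ioas, key=lambda item: item[1].lower() == "operational")


def reference_code_recurred(count_before: int, count_after: int) -> bool:
    """In the SAL, the count increases if the reference code occurred again."""
    return count_after > count_before


# Hypothetical IOAs as they might be listed under one IOP.
listed = [
    ("CMN01", "operational"),
    ("CMN02", "disabled"),
    ("CMN03", "operational"),
    ("CMN04", "unknown"),
]
for name, status in ioa_test_order(listed):
    print(name, status)
```

If reference_code_recurred returns False after an IOA is powered off, that IOA is the one to exchange in step 20.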
MABIP56
Use this procedure to isolate a problem with a PCI Express (PCIe) storage enclosure, a PCIe cable, or an
enclosure RAID module (ERM).
The following procedure can be used to locate and isolate problems with PCIe storage enclosures, PCIe
cables, and enclosure RAID modules only if there is a location code of the form Un-Px-Cy-Tz-L1 in the
serviceable event view for this problem. If a location code of the form Un-Px-Cy-Tz-L1 is not in the
serviceable event view, contact your next level of support. If you already performed this procedure,
return to the procedure that sent you here.
1. Is a location code of the form Un-Px-Cy-Tz-L1 available in the serviceable event view for a
field-replacement unit (FRU) associated with the reference code you are working on?
Yes:
Continue with the next step.
No:
Contact your next level of support. This ends the procedure.
2. Determine the location of the PCIe connector on the system unit by removing the -L1 from the
location code. The resulting location code has the form Un-Px-Cy-Tz. The PCIe connector location is
listed in the part location topic for the machine type and model number of the system unit. See Part
locations and location codes. Activate the identify indicator using the PCIe connector location code.
See Identifying a part.
Were you able to determine the location of the PCIe connector?
Yes:
Continue with the next step.
No:
Contact your next level of support. This ends the procedure.
3. Is a PCIe cable securely connected to the PCIe connector identified in the previous step?
Yes:
Continue with the next step.
No:
Reconnect the PCIe cable to the connector. If you are not sure which PCIe cable to connect to
the connector, work with the customer to determine which PCIe storage enclosure and PCIe
cable to connect to the connector. Record that you reconnected the system unit end of the
PCIe cable. Continue with the next step.
4. Use one of the following methods to locate the ERM at the other end of the PCIe cable:
• Using the serviceable event view, find the ADJ_PHY symbolic FRU in the failing item list for this
problem. The location associated with this FRU is the physical location of the ERM.
• If your system firmware level is Ax760 or later, use the ASMI to find the location of the ERM. On
the ASMI, use the System Configuration menu to access the PCIe Hardware Topology. Locate the
entry for the PCIe connector you identified in step 2 in the Host Port column. The location code of
the ERM can be found using the I/O enclosure port location in the corresponding row and
removing the Tx label.
• With the system powered on, activate the identify indicator of the FRU with location code
Un-Px-Cy-Tz-L1. Locate the ERM. Using Part locations and location codes for this PCIe storage
enclosure, determine the location code of the ERM you have identified.
• Using functions in the operating system of the logical partition that owns the ERM, determine the
location code of the ERM connected to the PCIe cable.
• Trace the PCIe cable to find the location of the enclosure RAID module. If it is not possible to
trace the PCIe cable, record the PCIe cable serial number where the cable is attached to the system
unit. Examine the PCIe cables at all other PCIe storage enclosures that are connected to the system
unit. Match the serial number of the PCIe cable where it is attached to the ERM of the PCIe
storage enclosure to the serial number you recorded. Using Part locations and location codes for
this PCIe storage enclosure, determine the location code of the ERM you have identified.
5. Were you able to determine the location of the enclosure RAID module?
Yes:
Continue with the next step.
No:
Contact your next level of support. This ends the procedure.
6. Is the PCIe cable securely connected to the enclosure RAID module identified in the previous step?
Yes:
Continue with the next step.
No:
Reconnect the PCIe cable to the enclosure RAID module. Record that you reconnected the
enclosure RAID module end of the PCIe cable. Continue with the next step.
7. Ensure that the PCIe storage enclosure does not have any power problems by performing the
following steps:
a. Verify that the power cables are securely connected to the PCIe storage enclosure.
b. Resolve any power errors reported by the logical partition that owns the enclosure RAID module.
c. Ensure that the LEDs are set as follows:
• The ac and dc power supply LEDs (green) are on solid.
• The enclosure RAID module powered on LED (green) is on solid.
Was there a power problem with the PCIe storage enclosure?
Yes:
Resolve the power problem. If additional assistance is needed, contact your next level of
support. After resolving the problem, record that you restored power to the PCIe storage
enclosure and continue with the next step.
No:
Continue with the next step.
8. Is the enclosure RAID module securely installed and latched into place?
Yes:
Continue with the next step.
No:
Remove and reinstall the enclosure RAID module, ensuring that it is securely latched into
place. See Removing and installing an ERM assembly. Record that you reinstalled the
enclosure RAID module and continue with the next step.
9. Did you reconnect a PCIe cable or restore power to the PCIe storage enclosure during this
procedure?
Yes:
Continue with the next step.
No:
Remove and install the ERM assembly and continue with the next step.
For a 5888 PCIe storage enclosure, complete the steps mentioned in Removing and installing
an ERM assembly for a 5888 PCIe storage enclosure.
For an EDR1 PCIe storage enclosure, complete the steps mentioned in Removing and
installing an ERM assembly for an EDR1 PCIe storage enclosure.
10. Is the PCIe storage enclosure feature code EDR1?
Yes:
If the system is already powered off, power it on. If the system is not already powered off,
you can either power off the system and then power on the system or use concurrent
maintenance procedures. Perform Removing and installing a PCIe cable for an EDR1 PCIe
storage enclosure with the power on, but do not physically replace the cable. Then continue
with the next step.
No:
If the system is not already powered off, power off the system and then power on the
system. Continue with the next step.
11. Record whether the problem was resolved or not and return to the procedure that sent you here.
This ends the procedure.
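Two of the steps in this procedure derive one location code from another: step 2 removes the -L1 label to get the PCIe connector location, and the ASMI method in step 4 removes the Tx label from the I/O enclosure port location to get the ERM location. The Python sketch below shows both string operations. The function names and the sample location codes are hypothetical, and the patterns assume that the codes follow the Un-Px-Cy-Tz-L1 form described above.

```python
import re


def connector_location(fru_location: str) -> str:
    """Step 2: remove the trailing -L1 from a code of the form Un-Px-Cy-Tz-L1."""
    match = re.fullmatch(r"(U.+-P\d+-C\d+-T\d+)-L1", fru_location)
    if match is None:
        raise ValueError("location code is not of the form Un-Px-Cy-Tz-L1")
    return match.group(1)


def erm_location(enclosure_port_location: str) -> str:
    """Step 4 (ASMI method): remove the trailing Tx label from the
    I/O enclosure port location to get the ERM location code."""
    match = re.fullmatch(r"(.+)-T\d+", enclosure_port_location)
    if match is None:
        raise ValueError("location code does not end with a Tx label")
    return match.group(1)


# Hypothetical location codes, for illustration only.
print(connector_location("U78AA.001.WZS0001-P1-C1-T1-L1"))
print(erm_location("U5888.001.G123456-P1-C1-T1"))
```

A ValueError from connector_location corresponds to the case in step 1 where no usable location code is available and the procedure directs you to your next level of support.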
MABIP57
Use this procedure to determine which I/O slot location codes are associated with a known I/O
controller location code.
1. Use the following table to determine the I/O controller type used by your system.
Table 22. I/O controller to I/O slot location code mapping

Type and model        I/O controller type   I/O controller    I/O slot location codes
                                            location codes
--------------------  --------------------  ----------------  ------------------------
8202-E4B              GX++ 12X adapter      Un-P1-C2          Expansion unit I/O slots

8205-E6B              PCIe expansion riser  Un-P1-C1          Un-P1-C1-C1
                                                              Un-P1-C1-C2
                                                              Un-P1-C1-C3
                                                              Un-P1-C1-C4

8205-E6B              GX++ 12X adapter      Un-P1-C2          Expansion unit I/O slots
                                            Un-P1-C8

8202-E4C, 8202-E4D,   Embedded I/O hub on   Un-P1             Un-P1-T9
8205-E6C, 8205-E6D    system backplane

8202-E4C, 8202-E4D,   PCIe expansion riser  Un-P1-C1          Un-P1-C1-C1
8205-E6C, 8205-E6D                                            Un-P1-C1-C2
                                                              Un-P1-C1-C3
                                                              Un-P1-C1-C4

8202-E4C, 8202-E4D    GX++ 12X adapter      Un-P1-C1          Expansion unit I/O slots

8202-E4C, 8202-E4D    GX++ PCIe adapter     Un-P1-C1          Un-P1-C1-T1-L1
                                                              Un-P1-C1-T2-L1

8205-E6C, 8205-E6D    GX++ 12X adapter      Un-P1-C1          Expansion unit I/O slots
                                            Un-P1-C8

8205-E6C, 8205-E6D    GX++ PCIe adapter     Un-P1-C1          Un-P1-C1-T1-L1
                                            Un-P1-C8          Un-P1-C1-T2-L1
                                                              Un-P1-C8-T1-L1
                                                              Un-P1-C8-T2-L1

8231-E1C, 8231-E1D,   Embedded I/O hub on   Un-P1             Un-P1-T9
8268-E1D              system backplane

8231-E1C, 8231-E1D,   GX++ PCIe adapter     Un-P1-C1          Un-P1-C1-T1-L1
8268-E1D

8231-E2C, 8231-E2D    GX++ 12X adapter      Un-P1-C8          Expansion unit I/O slots

8231-E2C, 8231-E2D    GX++ PCIe adapter     Un-P1-C1          Un-P1-C1-T1-L1
                                            Un-P1-C8          Un-P1-C8-T1-L1

8248-L4T, 8408-E8D,   Embedded I/O hub on   Un-P2             Un-P2-C9-R1
9109-RMD              I/O backplane                           Un-P2-C9-R2

8248-L4T, 8408-E8D,   GX++ 12X adapter      Un-P1-C2          Expansion unit I/O slots
9109-RMD                                    Un-P1-C3

8412-EAD, 9117-MMC,   Embedded I/O hub on   Un-P2             Un-P2-C9-T1
9117-MMD, 9179-MHC,   I/O backplane                           Un-P2-C9-T2
or 9179-MHD

8412-EAD, 9117-MMC,   GX++ 12X adapter      Un-P1-C2          Expansion unit I/O slots
9117-MMD, 9179-MHC,                         Un-P1-C3
or 9179-MHD

9119-FHB              GX++ 12X adapter      Un-Px-C39         Expansion unit I/O slots
                                            Un-Px-C40
                                            Un-Px-C41
                                            Un-Px-C44
2. Is the I/O controller type an embedded I/O hub, a PCIe expansion riser, or a GX++ PCIe adapter?
Yes:
The I/O slot location codes associated with this I/O controller are listed in Table 22 on page
44. This ends the procedure.
No:
Continue with the next step.
3. Is the I/O controller type a GX++ 12X adapter?
Yes:
The I/O controller might have expansion units attached. Continue with the next step.
No:
The I/O slot location codes associated with the I/O controller cannot be determined. This
ends the procedure.
4. Is the system managed by a management console?
Yes:
Continue with the next step.
No:
The I/O slot location codes associated with the I/O controller cannot be determined. This
ends the procedure.
5. Determine whether any expansion units are associated with the I/O controller by using the
management console.
Hardware Management Console (HMC): From the HMC, expand Systems Management > Servers.
Select the server on which you are working, expand Hardware Information, and click View
Hardware Topology.
Systems Director Management Console (SDMC): From the SDMC, select the server on the Resources
page. Click Actions > Hardware Information > View Hardware Topology.
Find the location code of the I/O controller in the Leading Port Location Code column. Are any
expansion units listed under the I/O controller?
Yes:
The occupied slots of each expansion unit listed are under the I/O controller. This ends the
procedure.
No:
No additional I/O slot location codes are associated with the I/O controller. This ends the
procedure.
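The decision flow in steps 2 through 5 can be summarized in a short Python sketch. It is illustrative only: the function name, the result labels, and the sample expansion unit identifier are hypothetical, and the lookup dictionary holds just three rows excerpted from Table 22 rather than the full table.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Controller types whose I/O slot location codes are listed directly in Table 22.
TABLE_LISTED_TYPES = ("Embedded I/O hub", "PCIe expansion riser", "GX++ PCIe adapter")

# Excerpt of Table 22: (model, controller type, controller location) -> slot codes.
# None marks a row whose slots are "Expansion unit I/O slots".
TABLE_22_EXCERPT: Dict[Tuple[str, str, str], Optional[List[str]]] = {
    ("8205-E6B", "PCIe expansion riser", "Un-P1-C1"): [
        "Un-P1-C1-C1", "Un-P1-C1-C2", "Un-P1-C1-C3", "Un-P1-C1-C4",
    ],
    ("8202-E4C", "GX++ PCIe adapter", "Un-P1-C1"): [
        "Un-P1-C1-T1-L1", "Un-P1-C1-T2-L1",
    ],
    ("8231-E2C", "GX++ 12X adapter", "Un-P1-C8"): None,
}


def resolve_io_slots(
    model: str,
    controller_type: str,
    location: str,
    managed_by_console: bool,
    expansion_units: Sequence[str] = (),
) -> Tuple[str, List[str]]:
    """Return (result label, values) following steps 2 through 5."""
    if controller_type.startswith(TABLE_LISTED_TYPES):   # step 2
        slots = TABLE_22_EXCERPT.get((model, controller_type, location)) or []
        return "table", slots
    if controller_type != "GX++ 12X adapter":             # step 3
        return "undetermined", []
    if not managed_by_console:                            # step 4
        return "undetermined", []
    if expansion_units:                                   # step 5
        # The occupied slots of each listed expansion unit are under the controller.
        return "expansion-units", list(expansion_units)
    return "none", []


print(resolve_io_slots("8202-E4C", "GX++ PCIe adapter", "Un-P1-C1", False))
```

For a GX++ 12X adapter, the list of expansion units has to come from the Hardware Topology view on the management console; the sketch only records that dependency.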
Communication isolation procedure
Isolate a communications failure.
Read and observe the following warnings when using this procedure.
DANGER
When working on or around the system, observe the following precautions:
Electrical voltage and current from power, telephone, and communication cables are hazardous. To
avoid a shock hazard:
• Connect power to this unit only with the IBM provided power cord. Do not use the IBM
provided power cord for any other product.
• Do not open or service any power supply assembly.
• Do not connect or disconnect any cables or perform installation, maintenance, or reconfiguration
of this product during an electrical storm.
• The product might be equipped with multiple power cords. To remove all hazardous voltages,
disconnect all power cords.
• Connect all power cords to a properly wired and grounded electrical outlet. Ensure that the outlet
supplies proper voltage and phase rotation according to the system rating plate.
• Connect any equipment that will be attached to this product to properly wired outlets.
• When possible, use one hand only to connect or disconnect signal cables.
• Never turn on any equipment when there is evidence of fire, water, or structural damage.
• Disconnect the attached power cords, telecommunications systems, networks, and modems before
you open the device covers, unless instructed otherwise in the installation and configuration
procedures.
• Connect and disconnect cables as described in the following procedures when installing, moving,
or opening covers on this product or attached devices.
To Disconnect:
1. Turn off everything (unless instructed otherwise).
2. Remove the power cords from the outlets.
3. Remove the signal cables from the connectors.
4. Remove all cables from the devices.
To Connect:
1. Turn off everything (unless instructed otherwise).
2. Attach all cables to the devices.
3. Attach the signal cables to the connectors.
4. Attach the power cords to the outlets.
5. Turn on the devices.
(D005)
COMIP01, COMPIP1
This procedure helps you to isolate problems with the communications input/output adapter (IOA) or
input/output processor (IOP).
Read and observe the danger notices in “Communication isolation procedure” before proceeding with
this procedure.
1. If the system has logical partitions, perform this procedure from the logical partition that reported the
problem. To determine if the system has logical partitions, go to Determining if the system has logical
partitions.
2. To determine which communications hardware to test, use the SRC from the problem summary form
or problem log. For details on line description information, see the Starting a Trace section of Work
with communications trace.
3. Perform the following steps:
a. Vary off the resources.
b. On the Start a Service Tool display, select Hardware service manager > Logical hardware
resources > System bus resources > Resources associated with IOP for the attached IOPs in the
list until you display the suspected failing hardware.
c. Select Verify on the hardware you want to test. The Verify option may be valid on the IOP, IOA,
or port resource. When it is valid on the IOP resource, any replaceable memory will be tested.
Communications IOAs are tested by using the Verify option on the port resource.
4. Run the IOA/IOP tests. This may include any of the following:
v Adapter internal test
v Adapter wrap test (requires adapter wrap plug - available from your hardware service provider).
v Processor internal test
v Memory test
v System port test
Do the IOA/IOP tests complete successfully?
No: The problem is in the IOA or IOP. If a verify test identified a failing memory module, replace
the memory module. On multiple card combinations, exchange the IOA card before exchanging
the IOP card. Exchange the failing hardware. See System FRU locations. This ends the procedure.
Yes: The IOA/IOP is good. Do not replace the IOA/IOP. Continue with the next step.
5. Before running tests on modems or network equipment, verify the remaining local hardware.
Because the IOA/IOP tests completed successfully, the remaining local hardware to be
tested is the external cable.
Is the IOA adapter type 2838, with a UTP (unshielded twisted pair) external cable?
Yes: Continue with the next step.
No: Go to step 8.
6. Is the RJ-45 connector on the external cable correctly wired according to the EIA/TIA-568A standard?
That is:
- Pins 1 and 2 using the same twisted pair,
- Pins 3 and 6 using the same twisted pair,
- Pins 4 and 5 using the same twisted pair,
- Pins 7 and 8 using the same twisted pair.
Yes: Continue with the next step.
No: Replace the external cable with correctly wired cable. This ends the procedure.
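The pairing rule in step 6 can be expressed as a small check. This is only an illustrative sketch: the function name and the `pin_to_pair` mapping (RJ-45 pin number to a label for the twisted pair it is wired to) are assumptions made for the example and are not part of any service tool.

```python
# Pin pairs that must each share one twisted pair (from step 6).
REQUIRED_PAIRS = [(1, 2), (3, 6), (4, 5), (7, 8)]


def miswired_pairs(pin_to_pair):
    """Return the required pin pairs that do not share a twisted pair.

    pin_to_pair is a hypothetical mapping from RJ-45 pin number (1-8)
    to a label for the twisted pair that the pin is wired to.
    """
    bad = []
    for a, b in REQUIRED_PAIRS:
        pair_a = pin_to_pair.get(a)
        pair_b = pin_to_pair.get(b)
        # A missing pin or two different pair labels both count as miswired.
        if pair_a is None or pair_a != pair_b:
            bad.append((a, b))
    return bad


# Example data (hypothetical): a correctly wired cable and a split-pair cable.
good_cable = {1: "A", 2: "A", 3: "B", 6: "B", 4: "C", 5: "C", 7: "D", 8: "D"}
split_pair = {1: "A", 2: "A", 3: "B", 6: "C", 4: "C", 5: "B", 7: "D", 8: "D"}
```

An empty result corresponds to the Yes branch of step 6. Any entry in the result corresponds to the No branch, where the cable is replaced.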
7. Do the Line Speed and Duplex values of the line description (DSPLINETH) match the corresponding
values for the network device (router, hub or switch) port?
No: Change the Line Speed and/or Duplex value for either the line description or the network
device (router, hub or switch) port. This ends the procedure.
Yes: Go to step 9 on page 48.
8. Is the cable wrap test option available as a Verify test option for the hardware you are testing?
v Yes: Verify the external cable by running the cable wrap test. A wrap plug is required to perform
the test. This plug is available from your hardware service provider.
Does the cable wrap test complete successfully?
Yes: Continue with the next step.
No: The problem is in the cable. Exchange the cable. This ends the procedure.
v No: The communications IOA/IOP is not the failing item. One of the following could be causing
the problem:
– External cable
– The network
– Any system or device on the network
– The configuration of any system or device on the network
– Intermittent problems on the network
– A new SRC - perform problem analysis or ask your next level of support for assistance
Work with the customer or your next level of support to correct the problem. This ends the
procedure.
9. All the local hardware is good. This completes the local hardware verification. The communications
IOA/IOP and/or external cable is not the failing item.
One of the following could be causing the problem:
v The network
v Any system or device on the network
v The configuration of any system or device on the network
v Intermittent problems on the network
v A new SRC - perform problem analysis or ask your next level of support for assistance
Work with the customer or your next level of support to correct the problem. This ends the
procedure.
Disk unit isolation procedure
Provides a procedure to isolate a failure in a disk unit.
Read and observe all safety procedures before servicing the system and while performing the disk unit
isolation procedure.
Attention: Unless instructed otherwise, always power off the system or expansion unit where the FRU
is located before removing, exchanging, or installing a field-replaceable unit (FRU).
DSKIP03
Use this procedure to determine the reference code, which is used to isolate a problem and to determine
the failing device.
If you need additional information for failing part numbers, location codes, or removal and replacement
procedures, see Part locations and location codes. Select your machine type and model number to find
additional location codes, part numbers, or replacement procedures for your system.
1. If the system has logical partitions, perform this procedure from the logical partition that reported
the problem. To determine if the system has logical partitions, go to determining if the system has
logical partitions.
2. Look in the Service action log for other errors logged at or around the same time as the 310x SRC. If
no entries appear in the service action log, use the product activity log. Use the other SRCs to correct
the problem before performing an IPL. Contact your next level of support as necessary for assistance
with SCSI bus problem isolation. If the problem is not corrected, continue with the next step.
3. Perform an IPL to dedicated service tool (DST). See dedicated service tools.
Does an SRC appear on the control panel?
v Yes: Go to step 6 on page 49.
v No: Does either the Disk Configuration Error Report, the Disk Configuration Attention Report, or
the Disk Configuration Warning Report display appear?
Yes: Continue with the next step.
No: Go to step 5 on page 49.
4. Does one of the following messages appear in the list?
v Missing disk units in the configuration
v Missing mirror protection disk units in the configuration
v Device parity protected units in exposed mode
– No: Continue with the next step.
– Yes: Select option 5, press F11, and then press Enter to display the details.
If all of the reference codes are 0000, go to “LICIP11” on page 94 and use cause code 0002. If
any of the reference codes are not 0000, go to step 6 and use the reference code that is not 0000.
Note: Use the characters in the Type column to find the correct reference code table.
5. Does the Display Failing System Bus display appear?
v No: Look at all the Product activity logs by selecting Product activity log under DST (see
dedicated service tools). If there is more than one SRC logged, use an SRC that is logged against
the IOP or IOA.
Is an SRC logged as a result of this IPL?
Yes: Continue with the next step.
No: You cannot continue isolating the problem. Use the original SRC and exchange the failing
items, starting with the highest probable cause of failure. This ends the procedure.
v Yes: Use the reference code that is displayed under Reference Code to correct the problem. This
ends the procedure.
6. Record the SRC.
Is the SRC the same one that sent you to this procedure?
Yes: Continue with the next step.
No: Perform problem analysis to correct the problem. This ends the procedure.
7. Perform the following steps:
a. Power off the system or expansion unit.
b. Find the devices identified by FI code FI01106. See FI01106.
c. Disconnect one of the disk units (other than the load-source disk unit), tape units, or
optical storage units that are identified by FI code FI01106. Slide it partially out of the system.
Note: Do not disconnect the load-source disk unit, although FI code FI01106 may identify it.
8. Power on the system or the expansion unit that you powered off.
Does an SRC appear on the control panel?
No: Continue with the next step.
Yes: Go to step 12 on page 50.
9. Does either the Disk Configuration Error Report, the Disk Configuration Attention Report, or the
Disk Configuration Warning Report display appear with one of the following listed?
v Missing disk units in the configuration
v Missing mirror protection disk units in the configuration
v Device parity protected units in exposed mode
Yes: Continue with the next step.
No: Go to step 11 on page 50.
10. Select option 5, press F11, and then press Enter to display details.
Does an SRC appear in the Reference Code column?
No: Continue with the next step.
Yes: Go to step 12 on page 50.
11. Look at all the Product activity logs by selecting Product activity log under DST.
Is an SRC logged as a result of this IPL?
v Yes: Continue with the next step.
v No: The last device you disconnected is the failing item. Replace the failing device and reconnect
the devices that were disconnected previously. This ends the procedure.
12. Record the SRC.
Is the SRC the same one that sent you to this procedure?
v No: Continue with the next step.
v Yes: The last device you disconnected is not the failing item.
a. Leave the device disconnected and go to step 7 on page 49 to continue isolation.
b. If you have disconnected all devices that are identified by FI code FI01106 except the
load-source disk unit, reconnect all devices. Then, go to step 15.
13. Does the Disk Configuration Error Report, the Disk Configuration Attention Report, or the Disk
Configuration Warning Report display appear with one of the following listed?
v Missing disk units in the configuration
v Missing mirror protection disk units in the configuration
v Device parity protected units in exposed mode
Yes: Continue with the next step.
No: Use the reference code to correct the problem. This ends the procedure.
14. Select option 5, press F11, and then press Enter to display details.
Are all the reference codes 0000?
v No: Use the reference code to correct the problem. This ends the procedure.
v Yes: The last device you disconnected is the failing item.
a. Reconnect all devices except the failing item.
b. Replace the failing item. This ends the procedure.
15. Was disk unit 1 (the load-source disk unit) a failing item that FI code FI01106 identified?
Yes: The failing items that FI code FI01106 identified are not failing. The load-source disk unit
may be failing. Use the original SRC and exchange the failing items, starting with the highest
probable cause of failure. This ends the procedure.
No: The failing items that FI code FI01106 identified are not failing. Use the original SRC and
exchange the failing items, starting with the highest probable cause of failure. This ends the
procedure.
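The disconnect-and-retest loop in steps 7 through 12 can be summarized as the following sketch. It is a simplification under stated assumptions, not a replacement for the steps above. The callback `ipl_and_get_src` is hypothetical and stands in for powering on and checking the control panel, the disk configuration reports, and the Product activity log (steps 8 through 11). The return labels are also invented for the example.

```python
def isolate_failing_device(candidates, original_src, ipl_and_get_src):
    """Sketch of the disconnect-and-retest loop in steps 7 through 12.

    candidates: devices identified by FI code FI01106, excluding the
        load-source disk unit, which is never disconnected.
    original_src: the SRC that sent you to this procedure.
    ipl_and_get_src: hypothetical callback. Given the list of devices
        that are currently disconnected, it returns the SRC found after
        power on, or None if no SRC was logged as a result of the IPL.
    """
    disconnected = []
    for device in candidates:
        disconnected.append(device)            # step 7c: disconnect one more device
        src = ipl_and_get_src(disconnected)    # steps 8 through 11
        if src is None:
            # Step 11, No branch: the last device disconnected is the failing item.
            return ("replace", device)
        if src != original_src:
            # Step 12, No branch: a different SRC, so continue with steps 13 and 14.
            return ("continue_step_13", src)
        # Step 12, Yes branch: same SRC, so leave the device disconnected and repeat.
    # Every candidate was disconnected and the original SRC still appears.
    return ("go_to_step_15", None)
```

For example, if the original SRC stops appearing only after the second device is disconnected, the sketch returns `("replace", <second device>)`, which matches the No branch of step 11.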
Intermittent isolation procedures
These procedures help you to correct an intermittent problem.
DANGER
When working on or around the system, observe the following precautions:
Electrical voltage and current from power, telephone, and communication cables are hazardous. To
avoid a shock hazard:
v Connect power to this unit only with the IBM provided power cord. Do not use the IBM
provided power cord for any other product.
v Do not open or service any power supply assembly.
v Do not connect or disconnect any cables or perform installation, maintenance, or reconfiguration
of this product during an electrical storm.
v The product might be equipped with multiple power cords. To remove all hazardous voltages,
disconnect all power cords.
v Connect all power cords to a properly wired and grounded electrical outlet. Ensure that the outlet
supplies proper voltage and phase rotation according to the system rating plate.
v Connect any equipment that will be attached to this product to properly wired outlets.
v When possible, use one hand only to connect or disconnect signal cables.
v Never turn on any equipment when there is evidence of fire, water, or structural damage.
v Disconnect the attached power cords, telecommunications systems, networks, and modems before
you open the device covers, unless instructed otherwise in the installation and configuration
procedures.
v Connect and disconnect cables as described in the following procedures when installing, moving,
or opening covers on this product or attached devices.
To Disconnect:
1. Turn off everything (unless instructed otherwise).
2. Remove the power cords from the outlets.
3. Remove the signal cables from the connectors.
4. Remove all cables from the devices.
To Connect:
1. Turn off everything (unless instructed otherwise).
2. Attach all cables to the devices.
3. Attach the signal cables to the connectors.
4. Attach the power cords to the outlets.
5. Turn on the devices.
(D005)
Use these procedures to correct an intermittent problem, if other problem analysis steps or tables sent you
here. Only perform the procedures that apply to your system.
Read all safety procedures before servicing the system. Observe all safety procedures when performing a
procedure. Unless instructed otherwise, always power off the system or expansion unit where the FRU is
located before removing, exchanging, or installing a field-replaceable unit (FRU). See Powering on and
powering off the system.
Use the procedure below to identify intermittent problems and the associated corrective actions.
INTIP03
Use this procedure to isolate problems with external noise on AC voltage lines.
DANGER
When working on or around the system, observe the following precautions:
Electrical voltage and current from power, telephone, and communication cables are hazardous. To
avoid a shock hazard:
v Connect power to this unit only with the IBM provided power cord. Do not use the IBM
provided power cord for any other product.
v Do not open or service any power supply assembly.
v Do not connect or disconnect any cables or perform installation, maintenance, or reconfiguration
of this product during an electrical storm.
v The product might be equipped with multiple power cords. To remove all hazardous voltages,
disconnect all power cords.
v Connect all power cords to a properly wired and grounded electrical outlet. Ensure that the outlet
supplies proper voltage and phase rotation according to the system rating plate.
v Connect any equipment that will be attached to this product to properly wired outlets.
v When possible, use one hand only to connect or disconnect signal cables.
v Never turn on any equipment when there is evidence of fire, water, or structural damage.
v Disconnect the attached power cords, telecommunications systems, networks, and modems before
you open the device covers, unless instructed otherwise in the installation and configuration
procedures.
v Connect and disconnect cables as described in the following procedures when installing, moving,
or opening covers on this product or attached devices.
To Disconnect:
1. Turn off everything (unless instructed otherwise).
2. Remove the power cords from the outlets.
3. Remove the signal cables from the connectors.
4. Remove all cables from the devices.
To Connect:
1. Turn off everything (unless instructed otherwise).
2. Attach all cables to the devices.
3. Attach the signal cables to the connectors.
4. Attach the power cords to the outlets.
5. Turn on the devices.
(D005)
Electrical noise on incoming ac voltage lines can cause various system failures. The most common source
of electrical noise is lightning.
1. Ask the customer if an electrical storm was occurring at the time of the failure to determine if
lightning could have caused the failure.
Could lightning have caused the failure?
No: Go to step 3 on page 53.
Yes: Continue with the next step.
2. Determine if lightning protection devices are installed on the incoming ac voltage lines where they
enter the building. There must be a dedicated ground wire from the lightning protection devices to
earth ground.
Are lightning protection devices installed?
Yes: Continue with the next step.
No: Lightning may have caused the intermittent problem. Recommend that the customer install
lightning protection devices to prevent this problem from recurring. This ends the procedure.
3. Have an installation planning representative perform the following steps:
a. Connect a recording ac voltage monitor to the incoming ac voltage lines of the units that contain
the failing devices with reference to ground.
b. Set the voltage monitor to start recording at a voltage slightly higher than the normal incoming ac
voltage.
Does the system fail again with the same symptoms?
No: This ends the procedure.
Yes: Continue with the next step.
4. Look at the recording and see if the voltage monitor recorded any noise when the failure occurred.
Did the monitor record any noise when the failure occurred?
Yes: Review with the customer what was happening external to the system when the failure
occurred. This may help you to determine the source of the noise. Discuss with the customer what
to do to remove the noise or to prevent it from affecting the server. This ends the procedure.
No: Perform the next intermittent isolation procedure listed in the Isolation procedure column. This
ends the procedure.
INTIP05
Use this procedure to isolate problems with external noise on twinaxial cables.
Electrical noise on twinaxial cables that are not installed correctly may affect the twinaxial workstation
I/O processor card.
Examples of this include open shields on twinaxial cables and station protectors that are not
installed where necessary.
Check for the following on the system:
v There must be no more than 11 connector breaks in a twinaxial cable run.
v Station protectors must be installed (in pairs) where a cable enters or leaves a building.
v There can only be two station protectors for each twinaxial run.
v There is a maximum of seven devices (with addresses 0-6) for each cable run.
v There is a maximum cable length of 1524 meters (5000 feet) for each port.
v All cable runs must be ended (terminated).
v Disconnect all twinaxial cables that are not used.
v Remove any cause of electrical noise in the twinaxial cables.
v All workstations must be grounded.
This ends the procedure.
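The numeric limits in the INTIP05 checklist can be collected into a simple check, sketched below. The `TwinaxRun` record and its field names are assumptions made for this example. It models a single cable run on a single port, so the per-port length limit is applied to that one run.

```python
from dataclasses import dataclass
from typing import List

# Limits taken from the INTIP05 checklist.
MAX_CONNECTOR_BREAKS = 11
MAX_STATION_PROTECTORS = 2
MAX_DEVICES = 7
VALID_ADDRESSES = range(0, 7)   # device addresses 0-6
MAX_CABLE_LENGTH_M = 1524


@dataclass
class TwinaxRun:
    """Hypothetical description of one twinaxial cable run on one port."""
    connector_breaks: int
    station_protectors: int
    device_addresses: List[int]
    length_m: float
    terminated: bool


def check_twinax_run(run: TwinaxRun) -> List[str]:
    """Return a list of checklist items that the cable run violates."""
    problems = []
    if run.connector_breaks > MAX_CONNECTOR_BREAKS:
        problems.append("more than 11 connector breaks")
    if run.station_protectors > MAX_STATION_PROTECTORS:
        problems.append("more than two station protectors")
    if run.station_protectors % 2 != 0:
        problems.append("station protectors not installed in pairs")
    if len(run.device_addresses) > MAX_DEVICES:
        problems.append("more than seven devices")
    if any(a not in VALID_ADDRESSES for a in run.device_addresses):
        problems.append("device address outside 0-6")
    if run.length_m > MAX_CABLE_LENGTH_M:
        problems.append("cable longer than 1524 meters")
    if not run.terminated:
        problems.append("cable run not terminated")
    return problems
```

The sketch covers only the countable items. Grounding of workstations, unused cables, and other noise sources still require a physical inspection.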
INTIP07
Use this procedure to lessen the effects of electrical noise (electromagnetic interference, or EMI) on the
system.
1. Ensure that air flow cards are installed in all adapter card slots that are not used.
2. Keep all cables away from sources of electrical interference, such as ac voltage lines, fluorescent lights,
arc welding equipment, and radio frequency (RF) induction heaters. These sources of electrical noise
can cause the system to become powered off.
3. If you have an expansion unit, ensure that the cables that attach the system unit to the expansion unit
are seated correctly.
Note: If the failures occur when people are close to the system or machines that are attached to the
system, the problem may be electrostatic discharge (ESD).
4. Have an installation planning representative use a radio frequency (RF) field intensity meter to
determine if there is an unusual amount of RF noise near the server. You also can use it to help
determine the source of the noise. This ends the procedure.
INTIP08
Use this procedure to ensure that the system is electrically grounded correctly.
1. Have an installation planning representative or an electrician (when necessary) perform the
following steps.
2. Power off the server and the power network branch circuits before performing this procedure.
3. Ensure the safety of personnel by making sure that all electrical wiring in the United States meets
National Electrical Code requirements.
4. Check all system receptacles to ensure that each one is wired correctly. This includes receptacles for
the server and all equipment that attaches to the server, including workstations. Do this to determine
if a wire with primary voltage on it is swapped with the ground wire, causing an electrical shock
hazard.
5. For each unit, check continuity from a conductive area on the frame to the ground pin on the plug.
Do this at the end of the mainline ac power cable. The resistance must be 0.1 ohm or less.
6. Ground continuity must be present from each unit receptacle to an effective ground. Therefore, check
the following:
v The ac voltage receptacle for each unit must have a ground wire connected from the ground
terminal on the receptacle to the ground bar in the power panel.
v The ground bars in all branch circuit panels must be connected with an insulated ground wire to a
ground point that is defined as follows:
– The nearest available metal cold water pipe, only if the pipe is effectively grounded to the earth
(see National Electrical Code Section 250-81, in the United States).
– The nearest available steel beam in the building structure, only if the beam is effectively
grounded to the earth.
– Steel bars in the base of the building or a metal ground ring that is around the building under
the surface of the earth.
– A ground rod in the earth (see National Electrical Code Section 250-83, in the United States).
Note: For installations in the United States only, by National Electrical Code standard, if more
than one of the preceding grounding methods is used, they must be connected together
electrically. See National Electrical Code Section 250 for more information about grounding.
v The grounds of all separately derived sources (uninterruptable power supply, service entrance
transformer, system power module, motor generator) must be connected to a ground point as
defined above.
v The service entrance ground bar must connect to a ground point as defined above.
v All ground connections must be tight.
v Check continuity of the ground path for each unit by using an ECOS tester, Model 1023-100.
Check continuity at each unit receptacle, and measure to the ground point as defined above. The
total resistance of each ground path must be 1.0 ohm or less. If you cannot meet this requirement,
check for faults in the ground path.
v Conduit is sometimes used to meet wiring code requirements. If conduit is used, the branch
circuits must still have a green (or green and yellow) wire for grounding as stated above.
Note: The ground bar and the neutral bar must never be connected together in branch circuit
power panels.
The ground bar and the neutral bar in the power panels that make up the electrical power
network for the server must be connected together. This applies to the first electrically isolating
unit that is found in the path of electrical wiring from the server to the service entrance power
panel. This isolating unit is sometimes referred to as a separately derived source. It can be an
uninterruptable power supply, the system power module for the system, or the service entrance
transformer. If the building has none of the above isolating units, the ground bar and the neutral
bar must be connected together in the service entrance power panel.
7. Look inside all power panels to ensure the following:
v There is a separate ground wire for each unit.
v The green (or green and yellow) ground wires are connected only to the ground bar.
v The ground bar inside each power panel is connected to the frame of the panel.
v The neutral wires are connected only to the neutral bar.
v The ground bar and the neutral bar are not connected together, except as stated in step 6 on page
54.
8. For systems with more than one unit, ensure that the ground wire for each unit is not connected
from one receptacle to the next in a string. Each unit must have its own ground wire, which goes to
the power source.
9. Ensure that the grounding wires are insulated with green (or green and yellow) wire at least equal in
size to the phase wires. The grounding wires also should be as short as possible.
10. If extension-mainline power cables or multiple-outlet power strips are used, make sure that they
have a three-wire cable. One of the wires must be a ground conductor. The ground connector
on the plug must not be removed. This applies to any extension-mainline power cables or
multiple-outlet power strips that are used on the server. It also applies to attached devices such as
personal computers, workstations, and modems.
Note: Check all extension-mainline power cables and multiple-outlet power strips with an ECOS
tester while power is applied. Ensure that no wires are crossed (for example, a ground wire
crossed with a wire that has voltage on it).
This ends the procedure.
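INTIP08 gives two resistance limits: 0.1 ohm or less from the frame to the ground pin on the plug (step 5), and 1.0 ohm or less for the total ground path of each unit (step 6). The following sketch compares measured values against those limits. The function name and the idea of recording the readings as two numbers per unit are assumptions made for the example.

```python
# Resistance limits stated in INTIP08.
FRAME_TO_GROUND_PIN_MAX_OHMS = 0.1   # step 5
GROUND_PATH_MAX_OHMS = 1.0           # step 6


def grounding_findings(frame_to_pin_ohms, ground_path_ohms):
    """Return the INTIP08 resistance limits that a unit's readings exceed.

    Both arguments are measured values in ohms for a single unit.
    """
    findings = []
    if frame_to_pin_ohms > FRAME_TO_GROUND_PIN_MAX_OHMS:
        findings.append("frame-to-ground-pin resistance exceeds 0.1 ohm (step 5)")
    if ground_path_ohms > GROUND_PATH_MAX_OHMS:
        findings.append("ground path resistance exceeds 1.0 ohm (step 6)")
    return findings
```

An empty list means that both readings are within the stated limits. It does not replace the wiring inspections in steps 4 and 7 through 10.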
INTIP09
Use this procedure to check the AC electrical power for the system.
1. Have an installation planning representative or an electrician (when necessary) perform the
following steps.
2. Power off the server and the power network branch circuits before performing this procedure.
3. To ensure the safety of personnel, all electrical wiring in the United States must meet National
Electrical Code requirements.
4. Check all system receptacles to ensure that each is wired correctly. This includes receptacles for the
server and all equipment that attaches to the server, including workstations. Do this to determine if a
wire with primary voltage on it has been swapped with the ground wire, causing an electrical shock
hazard.
5. When three-phase voltage is used to provide power to the server, correct balancing of the load on
each phase is important. The units should be connected so that all three phases are used equally.
6. The power distribution neutral must return to the "separately derived source" (uninterruptable
power supply, service entrance transformer, system power module, motor generator) through an
insulated wire that is the same size as the phase wire or larger.
7. The server and its attached equipment should be the only units that are connected to the power
distribution network from which the server gets its power.
8. The equipment that is attached to the server, such as workstations and printers, must be attached to
the power distribution network for the server when possible.
9. Check all circuit breakers in the network that supply ac power to the server as follows:
v Ensure that the circuit breakers are installed tightly in the power panel and are not loose.
v Feel the front surface of each circuit breaker to detect if it is warm. A warm circuit breaker may be
caused by:
– The circuit breaker is not installed tightly in the power panel.
– The contacts on the circuit breaker are not making a good electrical connection with the
contacts in the power panel.
– A defective circuit breaker.
– A circuit breaker of a smaller current rating than the current load which is going through it.
– Devices on the branch circuit which are using more current than their rating.
10. Equipment that uses a large amount of current, such as air conditioners, copiers, and fax
machines, should not receive power from the same branch circuits as the system or its workstations.
Also, the wiring that provides ac voltage for this equipment should not be placed in the same
conduit as the ac voltage wiring for the server. The reason for this is that this equipment generates
ac noise pulses. These pulses can get into the ac voltage for the server and cause intermittent
problems.
11. Measure the ac voltage to each unit to ensure that it is in the normal range.
Is the voltage outside the normal range?
No: Continue with the next step.
Yes: Contact the customer to have the voltage source returned to within the normal voltage
range.
12. The remainder of this procedure is only for a server that is attached to a separately derived source.
Some examples of separately derived sources are an uninterruptable power supply, a motor
generator, a service entrance transformer, and a system power module.
The ac voltage system must meet all the requirements that are stated in this procedure and also all
of the following:
Notes:
a. The following applies to an uninterruptable power supply, but it can be used for any separately
derived source.
b. System upgrades must not exceed the power requirements of your derived source.
The uninterruptable power supply must be able to supply the peak-repetitive current that is used by
the system and the devices that attach to it. An uninterruptable power supply that has a low
peak-repetitive current specification can be driven beyond its maximum capacity if it is already
fully loaded. Therefore, a de-rating factor for the uninterruptable power
supply must be calculated to allow for the peak-repetitive current of the complete system. To help
you determine the de-rating factor for an uninterruptable power supply, use the following:
Note: The peak-repetitive current is different from the "surge" current that occurs when the server is
powered on.
The de-rating factor equals the crest factor multiplied by the RMS load current divided by the peak
load current where the:
v Crest factor is the peak-repetitive current rating of the uninterruptable power supply that is
divided by the RMS current rating of the uninterruptable power supply. If you do not know the
crest factor of the uninterruptable power supply, assume that it is 1.414.
v RMS load current is the steady state RMS current of the server as determined by the power
profile.
v Peak load current is the steady state peak current of the server as determined by the power
profile.
For example, if the de-rating factor of the uninterruptable power supply is calculated to be 0.707,
then the uninterruptable power supply must not be used more than 70.7% of its kVA-rated
capacity. If the kVA rating of the uninterruptable power supply is 50 kVA, then the maximum
allowable load on it is 35.35 kVA (50 kVA multiplied by 0.707).
When a three-phase separately derived source is used, correct balancing of the load as specified in
step 5 on page 55 is critical. If the load on any one phase of an uninterruptable power supply is
more than the load on the other phases, the voltage on all phases may be reduced.
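The de-rating calculation in step 12 can be worked through as follows. This is a minimal sketch. The function names are invented for the example, and the sample load currents (10 A RMS, 20 A peak) are assumed values chosen so that the result matches the 0.707 factor used in the text.

```python
def derating_factor(rms_load_current, peak_load_current, ups_crest_factor=1.414):
    """De-rating factor = crest factor x RMS load current / peak load current.

    The default crest factor of 1.414 is the value the procedure says to
    assume when the crest factor of the uninterruptable power supply is
    not known.
    """
    return ups_crest_factor * rms_load_current / peak_load_current


def max_allowable_load_kva(ups_kva_rating, factor):
    """Maximum load in kVA after the de-rating factor is applied."""
    return ups_kva_rating * factor


# Assumed sample load: 10 A RMS with a 20 A steady-state peak.
factor = derating_factor(rms_load_current=10.0, peak_load_current=20.0)
limit = max_allowable_load_kva(50.0, factor)
print(f"de-rating factor: {factor:.3f}")
print(f"maximum allowable load: {limit:.2f} kVA")
```

With these sample values the factor is approximately 0.707 and the limit for a 50 kVA uninterruptable power supply is approximately 35.35 kVA, which agrees with the example in step 12.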
13. If the system is attached to an uninterruptable power supply or motor generator, then check for the
following:
v The system and the attached equipment should be the only items that are attached to the
uninterruptable power supply or motor generator. Equipment such as air conditioners, copiers,
and fax machines should not be attached to the same uninterruptable power supply or motor
generator to which the system is attached.
v The system unit console and the Electronic Customer Support modem must get ac voltage from
the same uninterruptable power supply or motor generator to which the system is attached. This
ends the procedure.
INTIP14
Use this procedure to isolate problems with station protectors.
Station protectors must be installed on all twinaxial cables that leave the building in which the server is
located. This applies even if the cables go underground, through a tunnel, through a covered outside
hallway, or through a skyway. Station protectors help prevent electrical noise on these cables from
affecting the server.
1. Look at the Product Activity Log to determine what workstations are associated with the failure.
2. Determine if station protectors are installed on the twinaxial cables to the failing workstations.
Are station protectors installed on the twinaxial cables to the failing workstations?
Yes: Perform the next intermittent isolation procedure listed in the Isolation procedure column. This
ends the procedure.
No: You may need to install station protectors on the twinaxial cables to the failing workstations.
This ends the procedure.
INTIP16
Use this procedure when you need to copy a main storage dump to give to your next level of support.
For some problems, performing a dump of main storage helps to analyze the problem. The data on the
dump is analyzed by support personnel to determine the cause of the problem and how to correct it.
1. Copy the main storage dump to tape. See Copying a dump.
2. Ask your next level of support for assistance. This ends the procedure.
INTIP18
Use this procedure to determine if one or more PTFs are available to correct this specific problem.
1. Ensure that all PTFs that relate to the problem have been installed.
Note: Ensure that the latest platform LIC fix has been installed before you exchange a service
processor.
2. Contact your next level of support for more information. This ends the procedure.
INTIP20
Use this procedure to analyze system performance problems.
1. Look in the Product Activity Log, ASM log, or management console to determine if any hardware
errors occurred at the same time that the performance problem occurred. Did any hardware problems
occur at the same time that the performance problem occurred?
Yes: Perform problem analysis and correct the hardware errors. This ends the procedure.
No: The performance problems are not related to hardware. Continue with the next step.
2. Perform the following steps:
a. Ask the customer if they have asked software support for any software PTFs that relate to this
problem.
b. Recommend that the customer install a cumulative PTF package if they have not done so in the
past three months.
c. Inform the customer that performance could possibly be improved by having Software Support
analyze the conditions.
d. Inform the customer that your service provider has performance tools. Contact Software Support
for more information. This ends the procedure.
INTIP24
Use this procedure to collect data when the service processor reports a suspected intermittent problem.
If you need additional information for failing part numbers, location codes, or removal and replacement
procedures, see Part locations and location codes. Select your machine type and model number to find
additional location codes, part numbers, or replacement procedures for your system.
It is important that you collect data for this problem so that the problem can be corrected. Use this
procedure to collect the data.
There are several ways the system can display the SRC. Follow the instructions for the correct display
method, defined as follows:
v If this SRC is displayed in the Product Activity Log or ASM log, then record all of the SRC data words,
save all of the error log data, and contact your next level of support to submit an APAR.
v If the control panel is displaying SRC data words scrolling automatically through control panel
functions 11, 12 and 13, and the control panel user interface buttons are not responding, then perform
“FSPSP02” on page 240 instead of using this procedure.
v If the SRC is displayed at the control panel, and the control panel user interface buttons respond
normally, then record all of the SRC words.
Do not perform an IPL until you perform a storage dump of the service processor. To get a storage dump
of the service processor, perform the following steps:
1. Record the complete system reference code (SRC) (functions 11 through 20).
2. Perform a service processor dump. See Performing dumps.
3. Is a display shown on the console?
v Yes: Continue with the next step.
v No: The problem is not intermittent. Choose from the following options:
– If you were sent here from another procedure, return there and follow the procedure for a
problem that is not intermittent.
– If the problem continues, replace the service processor hardware. This ends the procedure.
4. The problem is intermittent. Copy the IOP dump to tape. See Performing dumps.
5. Complete the IPL.
6. Determine if there are available program temporary fixes (PTFs) for this problem.
7. If a PTF is found, apply the PTF. Then, return here and answer the following question.
Did you find and apply a PTF for this problem?
v Yes: This ends the procedure.
v No: Record the following information, and contact your next level of support.
– The complete SRC you recorded in this procedure
– The service processor dump to tape you obtained in step 4 on page 58.
– All known system symptoms:
- How often the intermittent problem occurs
- System environment (IPL, certain applications)
- If necessary, other SRCs that you suspect relate to the problem
– Information needed to write an LICTR. This ends the procedure.
I/O processor isolation procedures
Isolate a failure in the multiple function I/O card.
Read all safety procedures before servicing the system.
Attention: Unless instructed otherwise, always power off the system or expansion unit where the field
replaceable unit (FRU) is located before removing, exchanging, or installing a FRU.
Attention:
Disconnecting the J15 and J16 cables will not prevent the system unit from powering on.
DANGER
When working on or around the system, observe the following precautions:
Electrical voltage and current from power, telephone, and communication cables are hazardous. To
avoid a shock hazard:
v Connect power to this unit only with the IBM provided power cord. Do not use the IBM
provided power cord for any other product.
v Do not open or service any power supply assembly.
v Do not connect or disconnect any cables or perform installation, maintenance, or reconfiguration
of this product during an electrical storm.
v The product might be equipped with multiple power cords. To remove all hazardous voltages,
disconnect all power cords.
v Connect all power cords to a properly wired and grounded electrical outlet. Ensure that the outlet
supplies proper voltage and phase rotation according to the system rating plate.
v Connect any equipment that will be attached to this product to properly wired outlets.
v When possible, use one hand only to connect or disconnect signal cables.
v Never turn on any equipment when there is evidence of fire, water, or structural damage.
v Disconnect the attached power cords, telecommunications systems, networks, and modems before
you open the device covers, unless instructed otherwise in the installation and configuration
procedures.
v Connect and disconnect cables as described in the following procedures when installing, moving,
or opening covers on this product or attached devices.
To Disconnect:
1. Turn off everything (unless instructed otherwise).
2. Remove the power cords from the outlets.
3. Remove the signal cables from the connectors.
4. Remove all cables from the devices.
To Connect:
1. Turn off everything (unless instructed otherwise).
2. Attach all cables to the devices.
3. Attach the signal cables to the connectors.
4. Attach the power cords to the outlets.
5. Turn on the devices.
(D005)
IOPIP01
Use this procedure to perform an IPL to dedicated service tool (DST) to determine if the same reference
code occurs.
If a new reference code occurs, more analysis may be possible with the new reference code. If the same
reference code occurs, you are instructed to exchange the failing items.
1. If the system has logical partitions, perform this procedure from the logical partition that reported
the problem. To determine if the system has logical partitions, go to Determining if the system has
logical partitions, before continuing with this procedure.
2. Was the IPL performed from disk (Type A or Type B)?
No: Continue with the next step.
Yes: Go to step 5 on page 61.
3. Perform the following steps:
a. Ensure that the IPL media is the correct version and level that are needed for the system model.
b. Ensure that the media is not physically damaged.
c. Choose from the following options to clean the IPL media:
v If it is cartridge type optical media (for example, DVD), do not attempt to clean the media.
v If it is non-cartridge type media (for example, CD-ROM), wipe the disc in a straight line from
the inner hub to the outer rim. Use a soft, lint-free cloth or lens tissue. Always handle the disc
by the edges to avoid fingerprints.
v If it is tape, clean the recording head in the tape unit. Use the correct Cleaning Cartridge Kit
provided by your service provider.
4. Perform a Type D IPL in Manual mode.
Does a system reference code (SRC) appear on the control panel?
v No: Go to step 8.
v Yes: Is the SRC the same one that sent you to this procedure?
Yes: You cannot continue isolating the problem. Use the original SRC and exchange the failing
items, starting with the highest probable cause of failure. See the reference code list. If the
failing item list contains FI codes, see System FRU locations to help determine part numbers
and location in the system. This ends the procedure.
No: A different SRC occurred. Use the new SRC to correct the problem. See Start of call. This
ends the procedure.
5. Perform an IPL to DST. See Performing an IPL to dedicated service tools.
Does an SRC appear on the control panel?
No: Continue with the next step.
Yes: Go to step 10 on page 62.
6. Does either the Disk Configuration Error Report, the Disk Configuration Attention Report, or the
Disk Configuration Warning Report display appear on the console?
v No: Continue with the next step.
v Yes: Select option 5, press F11, then press Enter to display the details. Then, choose from the
following options:
– If all of the reference codes are 0000, go to “LICIP11” on page 94 and use cause code 0002.
– If any of the reference codes are not 0000, go to step 10, and use the reference code that is not
0000.
Note: Use the characters in the Type column to find the correct reference code table.
7. Look at the product activity log. See “Using the product activity log” on page 63 for details.
Is an SRC logged as a result of this IPL?
Yes: Continue with the next step.
No: The problem cannot be isolated any more. Use the original SRC and exchange the failing
items. Start with the highest probable cause of failure in the failing item list for this reference
code. If the failing item list contains FI codes, see System FRU locations to help determine part
numbers and location in the system. This ends the procedure.
8. Does either the Disk Configuration Error Report, the Disk Configuration Attention Report, or the
Disk Configuration Warning Report display appear on the console?
v Yes: Continue with the next step.
v No: Look at the product activity log. See “Using the product activity log” on page 63 for details.
Is an SRC logged as a result of this IPL?
Yes: Continue with the next step.
No: The problem is corrected. This ends the procedure.
9. Select option 5, press F11, then press Enter to display the details. Then, choose from the following
options:
v If all of the reference codes are 0000, go to “LICIP11” on page 94 and use cause code 0002.
v If any of the reference codes are not 0000, continue with the next step and use the reference code
that is not 0000.
Note: Use the characters in the Type column to find the correct reference code table.
10. Record the SRC.
Are the SRC and unit reference code (URC) the same ones that sent you to this procedure?
Yes: Continue with the next step.
No: Use the new SRC or reference code to correct the problem. This ends the procedure.
11. Perform the following steps:
a. Power off the system or expansion unit. See Powering on and powering off the system.
b. Exchange the FRUs in the failing item list for the SRC you have now. Start with the highest
probable cause of failure in the failing item column in the reference code list. Perform the
remaining steps of this procedure after you exchange each FRU until you determine the failing
FRU.
Note: If you exchange a disk unit, do not attempt to save customer data until instructed to do so
in this procedure.
12. Power on the system or the expansion unit. See Powering on and powering off the system.
Does an SRC appear on the control panel?
No: Continue with the next step.
Yes: Go to step 16.
13. Does either the Disk Configuration Error Report, the Disk Configuration Attention Report, or the
Disk Configuration Warning Report display appear on the console?
v Yes: Continue with the next step.
v No: Look at the product activity log. See “Using the product activity log” on page 63 for details.
Is an SRC logged as a result of this IPL?
– Yes: Continue with the next step.
– No: The last FRU you exchanged was failing.
Note: Before exchanging a disk unit, you should attempt to save customer data.
This ends the procedure.
14. Select option 5, press F11, then press Enter to display the details. Then, choose from the following
options:
v If all of the reference codes are 0000, go to “LICIP11” on page 94 and use cause code 0002.
v If any of the reference codes are not 0000, go to step 10, and use the reference code that is not
0000.
Note: Use the characters in the Type column to find the correct reference code table.
15. Record the SRC on the Problem summary form. See “Using the product activity log” on page 63 for
details.
Is the SRC the same one that sent you to this procedure?
v Yes: The last FRU you exchanged is not the failing FRU. Go to step 12 to continue FRU isolation.
v No: Is the SRC B100 4504 or B100 4505 and have you exchanged disk unit 1 in the system unit, or
are all the reference codes on the console 0000?
– Yes: The last FRU you exchanged was failing. This ends the procedure.
Note: Before exchanging a disk unit, you should attempt to save customer data.
– No: Use the new SRC or reference code to correct the problem. This ends the procedure.
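Several steps in this procedure apply the same rule to the reference codes shown on a disk configuration report: if all of them are 0000, go to LICIP11 and use cause code 0002; otherwise use the reference code that is not 0000. The following Python sketch restates that rule. The function name, the return labels, and the choice to return the first nonzero code when several are present are assumptions made for illustration and are not part of any IBM tool.

```python
def next_action(reference_codes):
    """Apply the 'all reference codes are 0000' rule from the procedure.

    reference_codes: list of 4-character reference code strings read from
    the report details. Returns a (label, value) tuple; both labels are
    hypothetical names chosen for this sketch.
    """
    if not reference_codes:
        raise ValueError("no reference codes were supplied")
    nonzero = [code for code in reference_codes if code != "0000"]
    if not nonzero:
        # All codes are 0000: go to LICIP11 and use cause code 0002.
        return ("LICIP11", "0002")
    # Assumption: when several codes are nonzero, start with the first one.
    return ("USE_REFERENCE_CODE", nonzero[0])


print(next_action(["0000", "0000"]))
print(next_action(["0000", "3020"]))
```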
Using the product activity log
This procedure can help you learn how to use the Product Activity Log (PAL).
1. To locate a problem, find an entry in the product activity log for the symptom you are seeing.
a. On the command line, enter the Start System Service Tools command:
STRSST
If you cannot get to SST, select DST.
Note: Do not IPL the system or partition to get to DST.
b. On the Start Service Tools Sign On display, type in a User ID with service authority and password.
c. From the System Service Tools display, select Start a Service Tool > Product activity log >
Analyze log.
d. On the Select Subsystem Data display, select the option to view All Logs.
Note: You can change the From: and To: Dates and Times from the 24-hour default if the time that
the customer reported having the problem was more than 24 hours ago.
e. Use the defaults on the Select Analysis Report Options display by pressing the Enter key.
f. Search the entries on the Log Analysis Report display.
Note: For example, a 6380 Tape Unit error would be identified as follows:
System Reference Code: 6380CC5F
Class: Perm
Resource Name: TAP01
2. Find an SRC from the product activity log that best matches the time and type of the problem the
customer reported.
Did you find an SRC that matches the time and type of problem the customer reported?
Yes: Use the SRC information to correct the problem. This ends the procedure.
No: Contact your next level of support. This ends the procedure.
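The example entry in this procedure pairs a 6380 Tape Unit with the system reference code 6380CC5F. Other procedures in this document note that the characters under Type are the 4 leftmost characters of word 1 and that the characters under Reference Code are the 4 rightmost characters of word 1. The Python sketch below splits a word on that basis. The function name is hypothetical, and treating word 1 as exactly 8 characters (ignoring spaces, as in "B100 4504") is an assumption drawn from the examples in this document.

```python
def split_src_word1(word1: str):
    """Split SRC word 1 into (type, reference_code).

    Assumption: word 1 is 8 characters once spaces are removed; the
    4 leftmost characters are the type and the 4 rightmost characters
    are the reference code.
    """
    word = word1.replace(" ", "").upper()
    if len(word) != 8:
        raise ValueError("word 1 is expected to contain 8 characters")
    return word[:4], word[4:]


# Example SRC from the product activity log note in this procedure.
print(split_src_word1("6380CC5F"))
```

The type portion is what the notes in the surrounding procedures use to find the correct reference code table.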
IOPIP13
Use this procedure to isolate problems on the interface between the I/O card and the storage devices.
The unit reference code (part of the SRC that sent you to this procedure) indicates the SCSI bus that has
the problem:
Unit reference code (URC)    SCSI bus
3100                         0
3101                         1
3102                         2
3103                         3
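The mapping between the unit reference code and the SCSI bus can be expressed as a small lookup. The Python sketch below is illustrative only: the names URC_TO_SCSI_BUS and scsi_bus_for_urc are hypothetical, and the table holds only the four values listed for this procedure.

```python
# Unit reference codes listed for this procedure and the SCSI bus each one indicates.
URC_TO_SCSI_BUS = {"3100": 0, "3101": 1, "3102": 2, "3103": 3}


def scsi_bus_for_urc(urc: str) -> int:
    """Return the SCSI bus number indicated by a 310x unit reference code."""
    key = urc.strip()
    if key not in URC_TO_SCSI_BUS:
        raise ValueError(f"URC {urc!r} is not one of 3100 through 3103")
    return URC_TO_SCSI_BUS[key]


print(scsi_bus_for_urc("3102"))
```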
1. If the system has logical partitions, perform this procedure from the logical partition that reported
the problem. To determine if the system has logical partitions, go to Determining if the system has
logical partitions.
2. Were you performing an IPL from removable media (IPL type D) when the error occurred?
No: Continue with the next step.
Yes: Exchange the FRUs in the failing item list for the reference code that sent you to this
procedure. This ends the procedure.
3. Perform the following steps:
a. Look in the service action log (see Searching the service action log) for other errors logged at or
around the same time as the 310x SRC.
b. If no entries appear in the service action log, use the product activity log (see Using the product
activity log).
c. Use the other SRCs to correct the problem before performing an IPL.
d. Contact your next level of support as necessary for assistance with SCSI bus problem isolation.
e. If the problem is not corrected, continue with the next step.
4. Perform an IPL to DST. See Performing an IPL to dedicated service tools. Does an SRC appear on the
control panel?
No: Continue with the next step.
Yes: Go to step 7.
5. Does one of the following displays appear on the console?
v Disk Configuration Error Report
v Disk Configuration Attention Report
v Disk Configuration Warning Report
v Display Unknown Mirrored Load-Source Status
v Display Load-Source Failure
– Yes: Continue with the next step.
– No: Look at the product activity log. See Using the product activity log for details.
Is an SRC logged as a result of this IPL?
– Yes: Continue with the next step.
– No: You cannot continue isolating the problem. Use the original SRC and exchange the failing
items, starting with the highest probable cause of failure (see the failing item list for this
reference code). If the failing item list contains FI codes, see the System FRU locations topic to
help determine part numbers and location in the system. This ends the procedure.
6. Are all of the reference codes 0000? On some of the displays, you must press F11 to display reference
codes.
v No: Continue with the next step. Use the reference code that is not 0000.
v Yes: Go to “LICIP11” on page 94 and use cause code 0002. This ends the procedure.
7. Is the SRC the same one that sent you to this procedure?
Yes: Continue with the next step.
No: Record the SRC. Then use the SRC description to correct the problem. This ends the
procedure.
8. Perform the following steps:
a. Power off the system or the expansion unit. See Powering on and powering off the system for
details.
b. Find the I/O card identified in the failing item list.
c. Remove the I/O card and install a new I/O card. This item has the highest probability of being
the failing item.
d. Power on the system or the expansion unit.
Does an SRC appear on the control panel?
No: Continue with the next step.
Yes: Go to step 12 on page 65.
9. Does one of the following displays appear on the console?
v Disk Configuration Error Report
v Disk Configuration Attention Report
v Disk Configuration Warning Report
v Display Unknown Mirrored Load-Source Status
v Display Load-Source Failure
v Yes: Does the Display Unknown Mirrored Load-Source Status display appear on the console?
Note: On some of these displays, you must press F11 to display reference codes.
– Yes: Continue with the next step.
– No: Are all of the reference codes 0000?
- No: Go to step 12 using the reference code that is not 0000.
- Yes: Go to “LICIP11” on page 94 and use cause code 0002. This ends the procedure.
v No: Go to step 11.
10. Is the reference code the same one that sent you to this procedure?
v No: Either a new reference code occurred, or the reference code is 0000. There may be more than
one problem. The original I/O card may be failing, but it must be installed in the system to
continue problem isolation. Install the original I/O card by doing the following:
a. Power off the system or the expansion unit. See Powering on and powering off the system for
details.
b. Remove the I/O card you installed in step 8 on page 64 and install the original I/O card.
Note: Do not power on the system or the expansion unit now.
A device connected to the I/O card could be the failing item. Go to “IOPIP16,” step (9) to
continue isolating the problem. This ends the procedure.
v Yes: Go to step 13.
11. Look at the product activity log. See Using the product activity log for details. Is an SRC logged as a
result of this IPL?
Yes: Continue with the next step.
No: The I/O card, which you removed in step 8 on page 64, is the failing item. This ends the
procedure.
12. Is the SRC or reference code the same one that sent you to this procedure?
Yes: Continue with the next step.
No: Record the SRC. Then use the SRC description to correct the problem. This ends the
procedure.
13. The original I/O card is not the failing item. Install the original I/O card by doing the following:
a. Power off the system or the expansion unit. See Powering on and powering off the system for
details.
b. Remove the I/O card you installed in step 8 on page 64 of this procedure and install the original
I/O card.
Note: Do not power on the system or the expansion unit now.
A device connected to the I/O card could be the failing item. Go to “IOPIP16,” step (9) to continue
isolating the problem. This ends the procedure.
IOPIP16
Use this procedure to isolate failing devices that are identified by FI codes FI01105, FI01106, and FI01107.
During this procedure, you will remove devices that are identified by the FI code, and then you will
perform an IPL to determine if the symptoms of the failure have disappeared or changed. You should
not remove the load-source disk until you have shown that the other devices are not failing. Removing
the load-source disk can change the symptom of failure even if it is not the failing unit.
1. If the system has logical partitions, perform this procedure from the logical partition that reported
the problem. To determine if the system has logical partitions, go to Determining if the system has
logical partitions before continuing with this procedure.
2. Use the Hardware Service Manager (HSM) verify function (use DST or SST), and verify that all tape
and optical units attached to the SCSI bus (identified by FI01105, FI01106, or FI01107) are operating
correctly. See Verify a repair for details.
Note: Do not IPL the system to get to DST.
3. Choose from the following options:
v If verification was successful for all tape and optical units, then go to step 5.
v If any tape or optical device could not be verified, or if it failed verification, then exchange the
failing item. See the System FRU locations and continue with the next step.
4. Use the Hardware Service Manager (HSM) verify function (use SST or DST) and verify that the
exchanged item is operating correctly. See Verify a repair for details.
Was the verification successful?
No: Replace the exchanged device with the original. See System FRU locations and continue with
the next step.
Yes: The newly exchanged tape or optical device was the failing item. This ends the procedure.
5. Perform an IPL to DST. See Performing an IPL to dedicated service tools.
Does an SRC appear on the control panel?
No: Continue with the next step.
Yes: Go to step 8.
6. Does one of the following displays appear on the console?
v Disk Configuration Error Report
v Disk Configuration Attention Report
v Disk Configuration Warning Report
v Display Unknown Mirrored Load-Source Status
v Display Load-Source Failure
Note: On some of these displays, you must press F11 to display reference codes. The characters
under Type are the same as the 4 leftmost characters of word 1. The characters under Reference
Code are the same as the 4 rightmost characters of word 1.
– No: Continue with the next step.
– Yes: Are all of the reference codes 0000?
No: Go to step 8, and use the reference code that is not 0000.
Yes: Go to “LICIP11” on page 94 and use cause code 0002. This ends the procedure.
7. Look at the Product Activity Log. See Using the product activity log for details.
Is a reference code logged as a result of this IPL?
Yes: Continue with the next step.
No: You cannot continue isolating the problem. Use the original reference code and exchange the
failing items, starting with the highest probable cause of failure. If the failing item list contains FI
codes, see System FRU locations for additional details. This ends the procedure.
8. Is the SRC or reference code the same one that sent you to this procedure?
Yes: Continue with the next step.
No: Record the SRC or reference code. Then, use the SRC or reference code to correct the
problem. This ends the procedure.
9. Isolate the failing device by doing the following:
a. Power off the system or the expansion unit if it is powered on. See Powering on and powering
off the system.
b. Go to System FRU locations to find the devices identified by FI code FI01105, FI01106, or FI01107
in the failing item list.
c. Disconnect one of the devices that are identified by the FI code, other than the load-source disk
unit.
Note: The tape or optical units should be the first devices to be disconnected, if they are
attached to the SCSI bus identified by FI01105, FI01106, or FI01107.
d. Go to step 11.
10. Continue to isolate the possible failing items by doing the following:
a. Power off the system or the expansion unit. See Powering on and powering off the system.
b. Disconnect the next device that is identified by FI codes FI01105, FI01106, or FI01107 in the FRU
list. See the note in step 9 on page 66. Do not disconnect disk unit 1 (load-source disk) until you
have disconnected all other devices and the load-source disk is the last device that is identified
by these FI codes.
11. Power on the system or the expansion unit.
Does an SRC appear on the control panel?
No: Continue with the next step.
Yes: Go to step 14.
12. Does one of the following displays appear on the console?
v Disk Configuration Error Report
v Disk Configuration Attention Report
v Disk Configuration Warning Report
v Display Unknown Mirrored Load-Source Status
v Display Load-Source Failure
Note: On some of these displays, you must press F11 to display reference codes. The characters
under Type are the same as the 4 leftmost characters of word 1. The characters under Reference
Code are the same as the 4 rightmost characters of word 1.
– Yes: Go to step 14.
– No: Look at the Product Activity Log. See Using the product activity log for details. Is a
reference code logged as a result of this IPL?
No: Continue with the next step.
Yes: Go to step 14.
13. You are here because the IPL completed successfully. The last device you disconnected is the failing
item.
Is the failing item a disk unit?
No: Exchange the failing item and reconnect the devices you disconnected previously. See the
System FRU locations. This ends the procedure.
Yes: Exchange the failing FRU. Before exchanging a disk drive, you should attempt to save
customer data. This ends the procedure.
14. Is the SRC or reference code the same one that sent you to this procedure?
Yes: Continue with the next step.
No: Record the SRC or reference code on the Problem summary form. Then go to step 16 on page
68.
15. The last device you disconnected is not failing.
Have you disconnected all the devices that are identified by FI codes FI01105, FI01106, or FI01107 in
the FRU list?
No: Leave the device disconnected and return to step 10 to continue isolating the possible failing
items.
Yes: Replace the device backplane or backplanes associated with the devices you removed in the
earlier steps. If the device backplane does not fix the problem, then you cannot continue isolating
the problem. Use the original SRC and exchange the failing items, starting with the highest
probable cause of failure. If the failing item list contains FI codes, see System FRU locations for
additional information. This ends the procedure.
16. Is the SRC B1xx 4504, and have you disconnected the load-source disk unit? (The load-source disk
unit is disconnected by disconnecting disk unit 1.)
v Yes: Continue with the next step.
v No: Does one of the following displays appear on the console, and are all reference codes 0000?
– Disk Configuration Error Report
– Disk Configuration Attention Report
– Disk Configuration Warning Report
– Display Unknown Mirrored Load-Source Status
– Display Load-Source Failure
Note: On some of these displays, you must press F11 to display reference codes. The characters
under Type are the same as the 4 leftmost characters of word 1. The characters under Reference
Code are the same as the 4 rightmost characters of word 1.
Yes: Continue with the next step.
No: A new SRC or reference code occurred. Perform problem analysis and correct the
problem. This ends the procedure.
17. The last device you disconnected may be the failing item. Exchange the last device you
disconnected. See the System FRU locations.
Note: Before exchanging a disk drive, you should attempt to save customer data.
Was the problem corrected by exchanging the last device you disconnected?
No: Continue with the next step.
Yes: This ends the procedure.
18. Reconnect the devices you disconnected previously in this procedure.
19. Use the original SRC and exchange the failing items, starting with the highest probable cause of
failure. Do not exchange the FRU that you exchanged in this procedure. If the failing item list
contains FI codes, see System FRU locations to help determine part numbers and location in the
system. This ends the procedure.
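The order in which this procedure disconnects devices (tape and optical units first, other devices next, and the load-source disk last) can be sketched in Python as follows. The Device class, its field names, and the kind labels are hypothetical and exist only for this illustration; the procedure itself identifies devices through FI codes FI01105, FI01106, and FI01107.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str                  # hypothetical resource name, for example "TAP01"
    kind: str                  # assumed labels: "tape", "optical", or "disk"
    load_source: bool = False  # True only for disk unit 1 (load-source disk)


def disconnect_order(devices):
    """Return devices in the order the procedure suggests disconnecting them."""
    def rank(device):
        if device.load_source:
            return 2   # load-source disk is disconnected last
        if device.kind in ("tape", "optical"):
            return 0   # tape and optical units are disconnected first
        return 1       # all other devices come in between
    # sorted() is stable, so devices of equal rank keep their listed order.
    return sorted(devices, key=rank)


devices = [
    Device("DD001", "disk", load_source=True),
    Device("DD002", "disk"),
    Device("TAP01", "tape"),
    Device("OPT01", "optical"),
]
print([device.name for device in disconnect_order(devices)])
```

Only one device is disconnected per pass; the list gives the sequence to follow across passes, with a power-on and check after each one.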
IOPIP17
Use this procedure to isolate problems that are associated with SCSI bus configuration errors and device
task initialization failures.
1. If the system has logical partitions, perform this procedure from the logical partition that reported the
problem. To determine if the system has logical partitions, go to Determining if the system has logical
partitions before continuing with this procedure.
2. Were you performing an IPL from removable media (IPL type D) when the error occurred?
v Yes: Exchange the FRUs in the failing item list for the reference code that sent you to this
procedure.
v No: Perform an IPL to DST. See Performing an IPL to dedicated service tools. Does an SRC appear
on the control panel?
No: Continue with the next step.
Yes: Go to step 5 on page 69.
3. Does either the Disk Configuration Error Report, the Disk Configuration Attention Report, or the Disk
Configuration Warning Report display appear on the console?
v No: Continue with the next step.
v Yes: Does one of the following messages appear in the list?
– Missing disk units in the configuration
– Missing mirror protection disk units in the configuration
– Device parity protected units in exposed mode.
- No: Continue with the next step.
- Yes: Select option 5, press F11, then press Enter to display the details. Then, choose from the
following options:
v If all of the reference codes are 0000, go to “LICIP11” on page 94 and use cause code 0002.
v If any of the reference codes are not 0000, go to step 5, and use the reference code that is
not 0000.
Note: Use the characters in the Type column to find the correct reference code table.
4. Look at the product activity log. See Searching the service action log. Is an SRC logged as a result of
this IPL?
Yes: Continue with the next step.
No: You cannot continue isolating the problem. Use the original SRC and exchange the failing
items, starting with the highest probable cause of failure (see the failing item list for this reference
code in the System Reference Codes topic). If the failing item list contains FI codes, see Failing
items to help determine part numbers and location in the system. This ends the procedure.
5. Record the SRC. Is the SRC the same one that sent you to this procedure?
v No: A different SRC or reference code occurred. Use the new SRC or reference code to perform
problem analysis and correct the problem. This ends the procedure.
v Yes: Determine the device unit reference code (URC) from the SRC. If the Disk Configuration Error
Report, the Disk Configuration Attention Report, or the Disk Configuration Warning Report display
appears on the console, the device URC is displayed under Reference Code. This is on the same line
as the missing device. Is the device unit reference code 3020, 3021, 3022, or 3023?
Yes: Continue with the next step.
No: Go to step 7 on page 70.
6. A unit reference code of 3020, 3021, 3022, or 3023 indicates that there is a problem on an I/O card
SCSI bus. The problem can be caused by a device that is attached to the I/O card that:
v Is not supported.
v Does not match system configuration rules. For example, there are too many devices that are
attached to the bus.
v Is failing.
Perform the following steps:
a. Look at the characters on the control panel Data display or the Problem Summary Form for
characters 9 - 16 of the top 16 character line of function 12 (word 3). Use the format BBBB-Cc-bb
(BBBB = bus, Cc = card, bb = board) to determine the card slot location for the I/O card.
b. The unit reference code indicates the SCSI bus that has the problem:
URC     SCSI Bus
3020    0
3021    1
3022    2
3023    3
c. Find the bus and device locations. See System FRU locations for information about FRU locations
for the system you are servicing.
d. Find the printout that shows the system configuration from the last IPL and compare it to the
present system configuration.
Note: If configuration is not the problem, a device on the SCSI bus may be failing.
e. If you need to perform isolation on the SCSI bus, go to “IOPIP16” on page 65. This ends the
procedure.
7. The possible failing items are FI codes FI01105 (90%) and FI01112 (10%). Find the device unit address
from the SRC (see System Reference Code (SRC) Format Description). Use this information to find the
physical location of the device. Record the type and model numbers to determine if the addressed
I/O card supports this device. Is the device supported on your system?
v No: Continue with the next step.
v Yes: Perform the following steps:
a. Exchange the device.
b. Perform an IPL to DST. See Performing an IPL to dedicated service tools.
Does this correct the problem?
No: Ask your next level of support for assistance. This ends the procedure.
Yes: This ends the procedure.
8. Perform the following steps:
a. Remove the device.
b. Perform an IPL to DST. See Performing an IPL to dedicated service tools.
Does this correct the problem?
No: Ask your next level of support for assistance. This ends the procedure.
Yes: This ends the procedure.
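The address and bus decoding in step 6 can be sketched in Python. This is a minimal illustration only, not part of any service tool. The function names format_dsa and scsi_bus_for_urc are hypothetical, and the sketch assumes that characters 9 - 16 of word 3 have already been copied from the control panel as one eight-character string in the order BBBB, Cc, bb.

```python
# Hypothetical helpers for illustration; not part of any IBM service tool.

# URC-to-SCSI-bus table taken from step 6b.
URC_TO_SCSI_BUS = {"3020": 0, "3021": 1, "3022": 2, "3023": 3}


def format_dsa(word3_chars: str) -> str:
    """Split eight characters (assumed order BBBBCcbb) into BBBB-Cc-bb."""
    if len(word3_chars) != 8:
        raise ValueError("expected exactly 8 characters")
    return f"{word3_chars[0:4]}-{word3_chars[4:6]}-{word3_chars[6:8]}"


def scsi_bus_for_urc(urc: str):
    """Return the SCSI bus number for URC 3020-3023, or None for other URCs."""
    return URC_TO_SCSI_BUS.get(urc)
```

A URC outside the table returns None, which corresponds to the "No" branch of step 5 (go to step 7).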
IOPIP18
Use this procedure to isolate problems that are associated with SCSI bus configuration errors and device
task initialization failures.
1. If the system has logical partitions, perform this procedure from the logical partition that reported the
problem. To determine if the system has logical partitions, go to Determining if the system has logical
partitions before continuing with this procedure.
2. Perform an IPL to DST. See Performing an IPL to dedicated service tools.
Does an SRC appear on the control panel?
v Yes: Go to step 5 on page 71.
v No: Does either the Disk Configuration Error Report, the Disk Configuration Attention Report, or
the Disk Configuration Warning Report display appear on the console?
Yes: Continue with the next step.
No: Go to step 4.
3. Does one of the following messages appear in the list?
v Missing disk units in the configuration
v Missing mirror protection disk units in the configuration
v Device parity protected units in exposed mode.
– No: Continue with the next step.
– Yes: Select option 5, press F11, and then press Enter to display the details. Choose from the
following options:
- If all of the reference codes are 0000, go to “LICIP11” on page 94 and use cause code 0002.
- If any of the reference codes are not 0000, go to step 5 on page 71, and use the reference code
that is not 0000.
Note: Use the characters in the Type column to find the correct reference code table.
4. Look at the product activity log.
Is an SRC logged as a result of this IPL?
Yes: Continue with the next step.
No: You cannot continue isolating the problem. Use the original SRC and exchange the failing
items, starting with the highest probable cause of failure in the failing item column in the reference
code list. If the failing item list contains FI codes, see System FRU locations to help determine part
numbers and location in the system. This ends the procedure.
5. Record the SRC.
Is the SRC the same one that sent you to this procedure?
Yes: Continue with the next step.
No: A different SRC or reference code occurred. Use the new SRC or reference code to correct the
problem. This ends the procedure.
6. Determine the device unit reference code (URC) from the SRC. If the Disk Configuration Error Report,
the Disk Configuration Attention Report, or the Disk Configuration Warning Report display appears
on the console, the device URC is displayed under Reference Code. This is on the same line as the
missing device.
Is the device unit reference code 3020?
v No: Continue with the next step.
v Yes: A device reference code of 3020 indicates that a device attached to the addressed I/O card
either is not supported or does not match system configuration rules. For example, there are
too many devices that are attached to the bus. Perform the following steps:
a. Find the printout that shows the system configuration from the last IPL and compare it to the
present system configuration.
b. Use the unit address and the physical address in the SRC to help you with this comparison.
c. If configuration is not the problem, a device on the SCSI bus may be failing. Use FI code
FI00884 in the System FRU locations table to help find the failing device.
d. If you need to perform isolation on the SCSI bus, go to “IOPIP16” on page 65. This ends the
procedure.
7. The possible failing items are FI codes FI01105 (90%) and FI01112 (10%).
Find the device unit address from the SRC. Use this information to find the physical location of the
device. Record the type and model numbers to determine if the addressed I/O card supports this
device.
Is the device supported on your system?
v No: Continue with the next step.
v Yes: Perform the following steps:
a. Exchange the device.
b. Perform an IPL to DST. See Performing an IPL to dedicated service tools.
Does this correct the problem?
No: Ask your next level of support for assistance. This ends the procedure.
Yes: This ends the procedure.
8. Perform the following steps:
a. Remove the device.
b. Perform an IPL to DST. See Performing an IPL to dedicated service tools.
Does this correct the problem?
No: Ask your next level of support for assistance. This ends the procedure.
Yes: This ends the procedure.
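The reference code rule in step 3 of this procedure can be written as a small decision function. The following Python sketch is illustrative only. The function name next_action and the tuple it returns are assumptions made for the example, and the reference codes are assumed to have been read from the report detail display as four-character strings.

```python
# Hypothetical sketch of the step 3 routing rule; not an IBM tool.

def next_action(reference_codes):
    """Return where to go next, based on the reference codes on the display.

    All codes 0000      -> go to LICIP11 and use cause code 0002.
    Any code not 0000   -> go to step 5 and use the code(s) that are not 0000.
    """
    if not reference_codes:
        raise ValueError("no reference codes were recorded")
    nonzero = [code for code in reference_codes if code != "0000"]
    if not nonzero:
        return ("LICIP11", "0002")
    return ("step 5", nonzero)
```

The function returns every code that is not 0000 rather than choosing one, because the procedure does not say which code to prefer when several are present.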
IOPIP19
You were sent to this procedure from unit reference code (URC) 9010, 9011, or 9013.
Contact your next level of support for assistance.
IOPIP20
Use this procedure to isolate the problem when two or more devices are missing from a disk array.
You were sent to this procedure from unit reference code (URC) 9020 or 9021.
1. If the system has logical partitions, perform this procedure from the logical partition that reported
the problem. To determine if the system has logical partitions, go to Determining if the system has
logical partitions, before continuing with this procedure.
2. Access SST/DST by doing one of the following:
v If you can enter a command at the console, access system service tools (SST). See System service
tools.
v If you cannot enter a command at the console, perform an IPL to DST. See Performing an IPL to
dedicated service tools.
v If you cannot perform a type A or B IPL, perform a type D IPL from removable media.
3. Have any other I/O card or device SRCs (other than a 902F SRC) occurred at about the same time as
this error?
v Yes: Use the other I/O card or device SRCs to correct the problem. This ends the procedure.
v No: Has the I/O card, or have the devices been repaired or reconfigured recently?
– Yes: Continue with the next step.
– No: Contact your next level of support for assistance. This ends the procedure.
4. Did you perform a D IPL to get to DST?
v Yes: Continue with the next step.
v No: Perform the following steps:
a. Access the Product Activity Log and display the SRC that sent you here and view the
"Additional Information" to record the formatted log information. Record all devices that are
missing from the disk array. These are the array members that have both a present address of
0 and an expected address that is not 0.
Note: There might be more than one Product Activity Log entry with the same Log ID. Access
any additional entries by pressing the enter key from the "Display Detail Report for Resource"
screen. View the "Additional Information" for each entry to record the formatted log
information.
For example: There might be an xxxx902F SRC entry in the Product Activity Log if there are
more than 10 disk units in the array.
b. Continue with step 6.
5. A formatted display of hexadecimal information for Product Activity Log entries is not available. In
order to interpret the hexadecimal information, see More information from hexadecimal reports.
Record all devices that are missing from the disk array. These are the array members that have both
a present address of 0 and an expected address that is not 0.
Note: There might be an xxxx902F SRC entry in the Product Activity Log if there are more than 10
disk units in the array. In order to interpret the hexadecimal information for these additional disk
units, see More information from hexadecimal reports.
6. There are three possible ways to correct the problem:
a. Find the missing devices and install them in the correct physical locations in the system. If you
can find the missing devices and want to continue with this repair option, then continue with the
next step.
b. Stop the disk array that contains the missing devices.
Attention: Customer data might be lost.
If you want to continue with this repair option, go to step 8.
c. Initialize and format the remaining members of the disk array.
Attention: Customer data will be lost.
If you want to continue with this repair option, go to step 9.
7. Perform the following steps:
a. Install the missing devices in the correct locations in the system. See System FRU locations.
b. Power on the system. See Powering on and powering off the system.
Does the IPL complete successfully?
No: You have a new problem. Perform problem analysis and correct the problem. This ends the
procedure.
Yes: This ends the procedure.
8. You have chosen to stop the disk array that contains the missing devices.
Attention: Customer data might be lost.
Perform the following steps:
a. If you are not already using dedicated service tools, perform an IPL to DST. See Performing an
IPL to dedicated service tools.
If you cannot perform a type A or B IPL, perform a type D IPL from removable media.
b. Select Work with disk units.
Did you get to DST with a Type D IPL?
Yes: Continue with the next step.
No: Select Work with disk configuration > Work with device parity protection. Then,
continue with the next step.
c. Select Stop device parity protection.
d. Follow the online instructions to stop device parity protection.
e. Perform an IPL from disk.
Does the IPL complete successfully?
No: You have a new problem. Perform problem analysis and correct the problem. This ends the
procedure.
Yes: This ends the procedure.
9. You have chosen to initialize and format the remaining members of the disk array. Perform the
following steps:
Attention: Customer data will be lost.
a. If you are not already using dedicated service tools, perform an IPL to DST. See Performing an
IPL to dedicated service tools.
If you cannot perform a type A or B IPL, perform a type D IPL from removable media.
b. Select Work with disk units.
Did you get to DST with a Type D IPL?
Yes: Continue with the next step.
No: Select Work with disk unit recovery > Disk unit problem recovery procedures, and
continue with the next step.
10. Select Initialize and format disk unit.
11. Follow the online instructions to format and initialize the disk units.
12. Perform an IPL from disk. Does the IPL complete successfully?
No: You have a new problem. Perform problem analysis and correct the problem. This ends the
procedure.
Yes: This ends the procedure.
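Steps 4 and 5 of this procedure identify missing array members as those that have a present address of 0 and an expected address that is not 0. A minimal Python sketch of that test follows. The ArrayMember record and its field names are hypothetical, and the addresses are assumed to have been transcribed from the Product Activity Log as hexadecimal strings.

```python
from typing import NamedTuple


class ArrayMember(NamedTuple):
    """Hypothetical record of one array member copied from the log."""
    serial: str
    expected_address: str  # hexadecimal string, spaces allowed
    present_address: str   # hexadecimal string, spaces allowed


def is_zero(address: str) -> bool:
    """True when the hexadecimal address is all zeros."""
    return int(address.replace(" ", ""), 16) == 0


def missing_members(members):
    """Members with a present address of 0 and an expected address not 0."""
    return [
        m for m in members
        if is_zero(m.present_address) and not is_zero(m.expected_address)
    ]
```

Members whose expected address is also 0 are not reported, which matches the wording of the procedure.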
IOPIP21
Use this procedure to determine the failing disk unit when a disk unit is not compatible with other disk
units in the disk array, or when a disk unit has failed. If the URC is 9025 or 9030, the disk array is
running, but it might not be protected.
You were sent to this procedure from a unit reference code (URC) of 9025, 9030, or 9032.
1. If the system has logical partitions, perform this procedure from the logical partition that reported the
problem. To determine if the system has logical partitions, go to Determining if the system has logical
partitions before continuing with this procedure.
2. Is the device location information for this SRC available in the service action log (see Searching the
service action log for details)?
Yes: Exchange the disk unit.
No: Continue with the next step.
3. Access SST/DST by doing one of the following:
v If you can enter a command at the console, access system service tools (SST). See System service
tools.
v If you cannot enter a command at the console, perform an IPL to DST. See Performing an IPL to
dedicated service tools.
4. Perform the following steps:
a. Access the Product Activity Log and display the SRC that sent you here.
b. Press the F9 key for address information. This is the I/O card address.
Note: There may be more than one entry with the same Log ID. Entries with the same Log ID
may be accessed by pressing the Enter key from the "Display Detail Report for Resource" screen.
Example: There may be a device specific SRC and/or an xxxx902F SRC entry in the Product
Activity Log. The xxxx902F SRC will occur if there are more than 10 disk units in the array.
c. Continue with the next step.
5. Perform the following steps:
a. Return to the SST or DST main menu.
b. Select Work with disk units > Display disk configuration > Display disk configuration status.
c. On the Display disk configuration status display, look for the devices attached to the I/O card that
is identified in step 4.
d. Find the device that has a status of "DPY/Unknown" or "DPY/Failed". This is the device that is
causing the problem. Show the device address by selecting Display Disk Unit Details > Display
Detailed Address. Record the device address.
e. See System FRU locations and find the diagram of the system unit, or the expansion unit and find
the following:
v The card slot that is identified by the I/O card direct select address
v The disk unit location that is identified by the device address
Have you determined the location of the I/O card and disk unit that is causing the problem?
Yes: Exchange the disk unit that is causing the problem. This ends the procedure.
No: Ask your next level of support for assistance. This ends the procedure.
IOPIP22
Use this procedure to gather error information and contact your next level of support.
1. If the system has logical partitions, perform this procedure from the logical partition that reported the
problem. To determine if the system has logical partitions, go to Determining if the system has logical
partitions, before continuing with this procedure.
2. Access SST/DST by doing one of the following:
v If you can enter a command at the console, access system service tools (SST). See System service
tools.
v If you cannot enter a command at the console, perform an IPL to DST. See Performing an IPL to
dedicated service tools.
v If you cannot perform a type A or B IPL, perform a type D IPL from removable media.
3. Did you perform a D IPL to get to DST?
v Yes: Continue with the next step.
v No: Perform the following steps:
a. Access the Product Activity Log and display the SRC that sent you here and view the
"Additional Information" to record the formatted log information. Record all the information.
Note: There may be more than one Product Activity Log entry with the same Log ID. Access
any additional entries by pressing the Enter key from the "Display Detail Report for Resource"
screen. View the "Additional Information" for each entry to record the formatted log
information. Example: There may be an xxxx902F SRC entry in the Product Activity Log if there
are more than 10 disk units in the array.
b. Continue with step 5.
4. A formatted display of hexadecimal information for Product Activity Log entries is not available. In
order to interpret the hexadecimal information, see More information from hexadecimal reports.
Record all the information. Then continue with the next step.
Note: There may be an xxxx902F SRC entry in the Product Activity Log if there are more than 10 disk
units in the array. In order to interpret the hexadecimal information for these additional disk units,
see More information from hexadecimal reports.
5. Ask your next level of support for assistance.
Note: Your next level of support may require the error information you recorded in the previous step.
This ends the procedure.
IOPIP23
You were sent to this procedure from a unit reference code (URC) 9050.
Contact your next level of support for assistance.
IOPIP25
Use this procedure to isolate the problem when a device attached to the I/O card has functions that are
not supported by the I/O card.
You were sent to this procedure from URC 9008.
1. If the system has logical partitions, perform this procedure from the logical partition that reported the
problem. To determine if the system has logical partitions, go to Determining if the system has logical
partitions, before continuing with this procedure.
2. Have any other I/O card or device SRCs occurred at about the same time as this error?
v Yes: Use the other I/O card or device SRCs to correct the problem. See the system reference codes.
This ends the procedure.
v No: Has the I/O card, or have the devices been repaired or reconfigured recently?
– Yes: Continue with the next step.
– No: Contact your next level of support for assistance. This ends the procedure.
3. Access SST/DST by doing one of the following:
v If you can enter a command at the console, access system service tools (SST). See System service tools.
v If you cannot enter a command at the console, perform an IPL to DST. See Performing an IPL to
dedicated service tools.
v If you cannot perform a type A or B IPL, perform a type D IPL from removable media.
4. Did you perform a D IPL to get to DST?
v No: Access the Product Activity Log and display the SRC that sent you here. Press the F9 key for
address information. This is the I/O card address. Then, view the "Additional Information" to
record the formatted log information. Record the addresses that are not 0000 0000 for all devices
listed.
Continue with the next step.
v Yes: Access the Product Activity Log and display the SRC that sent you here. The direct select
address (DSA) of the I/O Card is in the format BBBB-Cc-bb:
– BBBB = hexadecimal offsets 4C and 4D
– Cc = hexadecimal offset 51
– bb = hexadecimal offset 4F
The unit address of the I/O card is hexadecimal offset 18C through 18F.
A formatted display of hexadecimal information for Product Activity Log entries is not available. In
order to interpret the hexadecimal information, see More information from hexadecimal reports.
Record the addresses that are not 0000 0000 for all devices listed. Continue with the next step.
5. See System FRU locations and find the diagram of the system unit, or the expansion unit. Then find
the following:
v The card slot that is identified by the I/O card direct select address (DSA) and unit address. If there
is no IOA with a matching DSA and unit address, the IOP and IOA are one card. Use the IOP with
the same DSA.
v The disk unit locations that are identified by the unit addresses.
Have you determined the location of the I/O card and the devices that are causing the problem?
v No: Ask your next level of support for assistance. This ends the procedure.
v Yes: Have one or more devices been moved to this I/O card from another I/O card?
Yes: Continue with the next step.
No: Ask your next level of support for assistance. This ends the procedure.
6. Is the I/O card capable of supporting the devices attached?
v No: Remove the devices from the I/O card.
Note: You can remove disk units without installing another disk unit, and the system continues to
operate.
This ends the procedure.
v Yes: Do you want to continue using these devices with this I/O card?
Yes: Continue with the next step.
No: Remove the devices from the I/O card. This ends the procedure.
7. Initialize and format the disk units by performing the following steps:
Attention: Data on the disk unit will be lost.
a. Access SST or DST.
b. Select Work with disk units.
Did you get to DST with a Type D IPL?
Yes: Continue with the next step.
No: Select Work with disk unit recovery > Disk unit problem recovery procedures. Then
continue with the next step.
c. Select Initialize and format disk unit for each disk unit. When the new disk unit is initialized and
formatted, the display shows that the status is complete. This may take 30 minutes or longer. The
disk unit is now ready to be added to the system configuration. This ends the procedure.
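The hexadecimal offsets given in step 4 for a type D IPL can be illustrated with a short Python sketch. It is an assumption-laden example, not a replacement for More information from hexadecimal reports. The function names are hypothetical, and the sketch assumes that the Product Activity Log entry is available as raw bytes indexed from offset 0.

```python
# Hypothetical sketch; assumes the log entry is raw bytes starting at offset 0.

MIN_ENTRY_LENGTH = 0x190  # the entry must reach offset 18F


def dsa_from_entry(entry: bytes) -> str:
    """Build BBBB-Cc-bb from offsets 4C-4D (BBBB), 51 (Cc), and 4F (bb)."""
    if len(entry) < MIN_ENTRY_LENGTH:
        raise ValueError("log entry is too short")
    bus = entry[0x4C:0x4E].hex().upper()
    card = f"{entry[0x51]:02X}"
    board = f"{entry[0x4F]:02X}"
    return f"{bus}-{card}-{board}"


def unit_address_from_entry(entry: bytes) -> str:
    """Return the I/O card unit address from offsets 18C through 18F."""
    if len(entry) < MIN_ENTRY_LENGTH:
        raise ValueError("log entry is too short")
    return entry[0x18C:0x190].hex().upper()
```

Note that the card byte (offset 51) comes after the board byte (offset 4F) in the entry, so the fields are not read in the same order in which they are printed.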
IOPIP26
Use this procedure to correct the problem when the I/O card recognizes that the attached disk unit must
be initialized and formatted.
You were sent to this procedure from URC 9092.
1. If the system has logical partitions, perform this procedure from the logical partition that reported the
problem. To determine if the system has logical partitions, go to Determining if the system has logical
partitions before continuing with this procedure.
2. Have any other I/O card or device SRCs occurred at about the same time as this error?
v Yes: Use the other I/O card or device SRCs to correct the problem. This ends the procedure.
v No: Has the I/O card, or have the devices been repaired or reconfigured recently?
Yes: Continue with the next step.
No: Contact your next level of support for assistance. This ends the procedure.
3. Access SST/DST by doing one of the following:
v If you can enter a command at the console, access system service tools (SST). See System service
tools.
v If you cannot enter a command at the console, perform an IPL to DST. See Performing an IPL to
dedicated service tools for details.
v If you cannot perform a type A or B IPL, perform a type D IPL from removable media.
4. Did you perform a D IPL to get to DST?
v Yes: Continue with the next step.
v No: Perform the following steps:
a. Access the Product Activity Log and display the SRC that sent you here.
b. Press the F9 key for address information. This is the I/O card address.
c. Then view the "Additional Information" to record the formatted log information.
d. Record the addresses that are not 0000 0000 for all devices listed.
e. Continue with step 6.
5. Perform the following steps:
a. Access the Product Activity Log and display the SRC that sent you here. The direct select address
(DSA) of the I/O card is in the format BBBB-Cc-bb:
v BBBB = hexadecimal offsets 4C and 4D
v Cc = hexadecimal offset 51
v bb = hexadecimal offset 4F
The unit address of the I/O card is hexadecimal offset 18C through 18F.
b. A formatted display of hexadecimal information for Product Activity Log entries is not available.
In order to interpret the hexadecimal information, see More information from hexadecimal reports.
Record the addresses that are not 0000 0000 for all devices listed.
c. Continue with the next step.
6. See System FRU locations and find the diagram of the system unit, or the expansion unit. Then find
the following:
v The card slot that is identified by the I/O card direct select address (DSA) and unit address. If there
is no IOA with a matching DSA and unit address, the IOP and IOA are one card. Use the IOP with
the same DSA.
v The disk unit locations that are identified by the unit addresses.
Have you determined the location of the I/O card and the devices that are causing the problem?
v No: Ask your next level of support for assistance. This ends the procedure.
v Yes: Have one or more devices been moved to this I/O card from another I/O card?
Yes: Continue with the next step.
No: Ask your next level of support for assistance. This ends the procedure.
7. Do you want to continue using these devices with this I/O card?
v Yes: Continue with the next step.
v No: Remove the devices from the I/O card.
Note: You can remove disk units without installing another disk unit, and the system continues to
operate.
This ends the procedure.
8. Initialize and format the disk units by performing the following steps:
Attention: Data on the disk unit will be lost.
a. Access SST or DST.
b. Select Work with disk units.
Did you get to DST with a Type D IPL?
Yes: Continue with the next step.
No: Select Work with disk unit recovery > Disk unit problem recovery procedures. Then
continue with the next step.
c. Select Initialize and format disk unit for each disk unit. When the new disk unit is initialized and
formatted, the display shows that the status is complete. This may take 30 minutes or longer. The
disk unit is now ready to be added to the system configuration. This ends the procedure.
IOPIP27
I/O card cache data exists for a missing or failed device.
You were sent to this procedure from a unit reference code (URC) of 9051.
Note: For some storage I/O adapters, the cache card is integrated and not removable.
Having I/O card cache data for a missing or failed device might be caused by the following conditions:
v One or more disk units have failed on the I/O card.
v The cache card of the I/O card was not cleared before it was shipped as a MES to the customer. In
addition, the service representative moved devices from the I/O card to a different I/O card before
performing a system IPL.
v The cache card of the I/O card was not cleared before it was shipped to the customer. In addition,
residual data was left in the cache card for disk units that manufacturing used to test the I/O card.
v The I/O card and cache card were moved from a different system or a different location on this system
after an abnormal power off.
v One or more disk units were moved either concurrently, or they were removed after an abnormal
power off.
CAUTION:
Any Function 08 power down (including from a D-IPL) is an abnormal power off.
1. If the system has logical partitions, perform this procedure from the logical partition that reported
the problem. To determine if the system has logical partitions, go to Determining if the system has
logical partitions before continuing with this procedure.
2. Access SST/DST by doing one of the following:
v If you can enter a command at the console, access system service tools (SST). See System service
tools.
v If you cannot enter a command at the console, perform an IPL to DST. See Performing an IPL to
dedicated service tools.
v If you cannot perform a type A or B IPL, perform a type D IPL from removable media.
3. Did you perform a D IPL to get to DST?
v Yes: Continue with the next step.
v No: Perform the following steps:
a. Access the Product Activity Log and display the SRC that sent you here.
b. Press the F9 key for address information. This is the I/O card address.
c. Then view the "Additional Information" to record the formatted log information. Record the
device types and serial numbers for those devices that show a unit address of 0000 0000.
d. Continue with step 5.
4. Perform the following steps:
a. Access the Product Activity Log and display the SRC that sent you here. The direct select address
(DSA) of the I/O card is in the format BBBB-Cc-bb:
v BBBB = hexadecimal offsets 4C and 4D
v Cc = hexadecimal offset 51
v bb = hexadecimal offset 4F
The unit address of the I/O card is hexadecimal offset 18C through 18F.
A formatted display of hexadecimal information for Product Activity Log entries is not available.
In order to interpret the hexadecimal information, see More information from hexadecimal
reports.
b. Record the device types and serial numbers for those devices that show a unit address of 0000
0000.
c. Continue with the next step.
5. See System FRU locations and find the diagram of the system unit, or the expansion unit. Find the
card slot that is identified by the I/O card direct select address (DSA) and unit address. If there is no
IOA with a matching DSA and unit address, the IOP and IOA are one card. Use the IOP with the
same DSA.
6. Choose from the following options:
v If the devices from step 3 of this procedure have never been installed on this system, continue
with the next step.
v If the devices are not in the current system disk configuration, go to step 9 on page 80.
v Otherwise, the devices are part of the system disk configuration; go to step 11 on page 80.
7. Choose from the following options:
v If this I/O card and cache card were moved from a different system, continue with the next step.
v Otherwise, the cache card was shipped to the customer without first being cleared. Perform the
following steps:
a. Make a note of the serial number, the customer number, and the device types and their serial
numbers. These were found in step 3.
b. Inform your next level of support.
c. Then go to step 10 on page 80 to clear the cache card and correct the URC 9051 problem.
8. Install both the I/O card and the cache card back into their original locations. Then re-IPL the
system. There could be data in the cache card for devices in the disk configuration of the original
system. After an IPL to DST and a normal power off on the original system, the cache card will be
cleared. It is then safe to move the I/O card and the cache card to another location.
9. One or more devices that are not currently part of the system disk configuration were installed on
this I/O card. Either they were removed concurrently, they were removed after an abnormal power
off, or they have failed. Continue with the next step.
10. Use the Reclaim IOP cache storage procedure to clear data from the cache for the missing or failed
devices as follows:
a. Perform an IPL to DST. See Performing an IPL to dedicated service tools.
If you cannot perform a type A or B IPL, perform a type D IPL from removable media.
b. Reclaim the cache adapter card storage. See Reclaiming IOP cache storage.
11. Choose from the following options:
v If this I/O card and cache card were moved from a different location on this system, go to step 8.
v If the devices from step 3 on page 79 of this procedure are now installed on another I/O card, and
they were moved there before the devices were added to the system disk configuration, go to step
7 on page 79. (On an MES, the disk units are sometimes moved from one I/O card to another I/O
card. This problem will result if manufacturing did not clear the cache card before shipping the
MES.)
v Otherwise, continue with the next step.
12. One or more devices that are currently part of the system disk configuration are either missing or
failed, and have data in the cache card. Consider the following:
v The problem may be because devices were moved from the I/O card concurrently, or they were
removed after an abnormal power off. If this is the case, locate the devices, power off the system
and install the devices on the correct I/O card.
v If no devices were moved, look for other errors logged against the device, or against the I/O card
that occurred at approximately the same time as this error. Continue the service action by using
these system reference codes.
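The branching in step 6 of this procedure can be summarized as a small function. The Python sketch below is illustrative only. The function name and its two boolean parameters are hypothetical, and the service representative must still determine the answers from the system itself.

```python
# Hypothetical summary of the step 6 branching; not an IBM tool.

def step_after_step6(never_installed: bool, in_current_config: bool) -> int:
    """Return the number of the step to perform after step 6.

    never_installed:    the recorded devices were never installed on this system
    in_current_config:  the devices are part of the system disk configuration
    """
    if never_installed:
        return 7   # continue with the next step
    if not in_current_config:
        return 9   # devices are not in the current system disk configuration
    return 11      # devices are part of the system disk configuration
```

The checks are made in the same order as the options are listed in step 6, so a device that was never installed is routed to step 7 regardless of the second answer.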
IOPIP28
You were sent to this procedure from unit reference code (URC) 9052.
Contact your next level of support for assistance.
IOPIP29
The failing item is in a migrated expansion unit.
You were sent to this procedure from URC 9012.
Contact your next level of support for assistance.
IOPIP30
Use this procedure to correct the problem when the system cannot find the required cache data for the
attached disk units.
You were sent to this procedure from URC 9050.
1. If the system has logical partitions, perform this procedure from the logical partition that reported
the problem. To determine if the system has logical partitions, go to Determining if the system has
logical partitions before continuing with this procedure.
2. Did you just exchange the storage input/output (I/O) adapter as a result of a failure?
No: Continue with step 13 on page 82.
Yes: Continue with the next step.
3. Are you working with a 571F/575B card set?
No: Go to step 5.
Yes: Continue with the next step.
4. Remove the 571F/575B card set. Create a new card set with the following:
Note: Label all parts (both original and new) before moving them.
v The new replacement 571F storage IOA.
v The cache directory card from the original 571F storage IOA.
v The original 575B auxiliary cache adapter.
See Separating the 571F/575B card set and moving the cache directory card.
v Ensure that the SCSI cable and the battery power cable on the top edge of the storage side of the
card are connected to the top edge of the auxiliary cache side of the card.
v Reinstall this card set into the system and go to step 6.
5. Remove the I/O adapter. Install the new replacement storage I/O adapter with the following parts
installed on it:
Note: Label all parts (both old and new) before moving them.
v The cache directory card from the original storage I/O adapter. On adapters with removable cache
cards, the cache directory card will move with the removable cache card.
v The removable cache card from the original storage I/O adapter (this applies to only the 571E and
some 2780 I/O adapters).
See System FRU locations for information about removing and replacing parts.
v If the I/O adapter is attached to an auxiliary cache I/O adapter, ensure that the SCSI cable on the
last port of the new replacement storage I/O adapter is connected to the auxiliary cache I/O
adapter. For a list of auxiliary cache I/O adapters, see System parts.
6. Did the 9050 SRC that sent you to this procedure occur on a type-D IPL?
Yes: Perform a type-D IPL and continue with the next step.
No: Continue with the next step.
7. Has a new 9010 or 9050 SRC occurred in the Service Action Log or Product Activity Log?
No: Go to step 10.
Yes: Continue with the next step.
8. Was the new SRC 9050?
No: Continue with the next step.
Yes: Contact your next level of support. This ends the procedure.
9. The new SRC was 9010. Reclaim the cache storage. See Reclaiming IOP cache storage.
Note: When an auxiliary cache I/O adapter that is connected to the storage I/O adapter logs a 9055
SRC in the Product Activity Log, the reclaim does not result in lost sectors. Otherwise, the reclaim
does result in lost sectors, and the system operator might want to restore data from the most recent
saved tape after you complete the repair.
10. Are you working with a 571F/575B card set?
No: Go to step 12 on page 82.
Yes: Continue with the next step.
11. Remove the 571F/575B card set. Create a new card set with the following:
v The new 571F storage IOA
v The cache directory card from the new 571F storage IOA
v The new 575B auxiliary cache adapter
See Separating the 571F/575B card set and moving the cache directory card.
v Ensure that the SCSI cable and the battery power cable on the top edge of the storage side of the
card are connected to the top edge of the auxiliary cache side of the card.
v Reinstall this card set into the system. This ends the procedure.
12. Remove the I/O adapter. Install the new replacement storage I/O adapter that has the following
parts installed on it:
v The cache directory card from the new storage I/O adapter. On adapters with removable cache
cards, the cache directory card will move with the removable cache card.
v The removable cache card from the new storage I/O adapter (this applies to only the 571E and
some 2780 I/O adapters).
See System FRU locations.
v If the I/O adapter is attached to an auxiliary cache I/O adapter, ensure that the SCSI cable on the
last port of the new replacement storage I/O adapter is connected to the auxiliary cache I/O
adapter. For a list of auxiliary cache I/O adapters, see System parts.
This ends the procedure.
13. Identify the affected disk units using information in the Product Activity Log. Access SST/DST by
doing one of the following:
v If you can enter a command at the console, access system service tools (SST). See System Service
Tools.
v If you cannot enter a command at the console, perform an IPL to DST. See Performing an IPL to
dedicated service tools.
v If you cannot perform a type A or B IPL, perform a type D IPL from removable media.
14. Did you perform a type D IPL to get to DST?
v Yes: Continue with the next step.
v No: Perform the following steps:
a. Access the Product Activity Log and display the SRC that sent you here, then view Additional
Information to record the formatted log information. The Device Errors detected field indicates the
total number of disk units that are affected. The Device Errors logged field indicates the number
of disk units for which detailed information is provided. Under the Device heading, the unit
address, type, and serial number are provided for up to three disk units. Additionally, the
controller type and serial number for each of these disk units indicates the adapter to which
the disk was last attached when it was operational.
Note: You might find more than one Product Activity Log entry with the same Log ID. Access
any additional entries by pressing Enter from the Display Detail Report for Resource screen.
View Additional Information for each entry, and record the formatted log information. For
example: You might find an entry for an xxxx902F SRC in the Product Activity Log when the
array includes more than 10 disk units.
b. Continue with step 16 on page 83.
15. A formatted display of hexadecimal information for Product Activity Log entries is not available. To
interpret the hexadecimal information, see More information from hexadecimal reports. The Device
Errors detected field indicates the total number of disk units that are affected. The Device Errors logged
field indicates the number of disk units for which detailed information is provided. Under the Device
heading, the unit address, type, and serial number are provided for up to three disk units.
Additionally, the controller type and serial number for each of these disk units indicates the adapter
to which the disk was last attached when it was operational.
Note: You might find an entry for an xxxx902F SRC in the Product Activity Log when the
array includes more than 10 disk units. To interpret the hexadecimal information for these additional
disk units, see More information from hexadecimal reports.
16. Has the I/O card or have the devices been repaired or reconfigured recently?
Yes: Continue with the next step.
No: Contact your next level of support for assistance. This ends the procedure.
17. You can use one of the following repair options to correct the problem:
v Reunite the adapter and disk units identified in previous steps so that the cache data can be
written to the disk units. If you can find the devices and adapters and want to continue with this
repair option, then continue with the next step.
v If the data for the disk units identified in previous steps is not needed on this or any other
system, initialize and format these disk units.
Attention: This repair option causes a loss of customer data. If you want to continue with this
repair option, go to step 19.
18. Perform the following steps:
a. Restore the adapter and disk units back to their original configuration. For more information, see
System FRU locations. After the system writes cache data to the disk units and you power off the
system normally, you can move the adapter and disk units to another location.
b. Power on the system. For more information, see Powering on and powering off the system. Does
the IPL complete successfully?
No: Perform problem analysis to correct the new problem. This ends the procedure.
Yes: This ends the procedure.
19. You have chosen to initialize and format the identified disk units. Perform the following steps:
Attention: Performing the following steps causes a loss of customer data.
a. If you are not already using dedicated service tools, perform an IPL to DST. For more
information, see Performing an IPL to dedicated service tools. If you cannot perform a type A or
B IPL, perform a type D IPL from removable media.
b. Select Work with disk units. Did you get to DST with a Type D IPL?
Yes: Continue with the next step.
No: Select Work with disk unit recovery > Disk unit problem recovery procedures, then
continue with the next step.
20. Select Initialize and format disk unit.
21. Follow the online instructions to format and initialize the disk units.
22. Perform an IPL from disk. Does the IPL complete successfully?
No: Perform problem analysis and correct the new problem. This ends the procedure.
Yes: This ends the procedure.
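The branch in steps 7 through 9 of IOPIP30 can be summarized as a small lookup. The following Python
sketch is illustrative only: the function names and the returned strings are hypothetical and are not
part of any IBM service tool. The SRC values (9010, 9050, 9055) and the actions are taken from the
steps and the note above.

```python
from typing import Optional


def next_action_after_exchange(new_src: Optional[str]) -> str:
    """Restate steps 7 through 9 of IOPIP30 as a lookup.

    new_src is the unit reference code of a newly logged SRC ("9010" or
    "9050"), or None when no new 9010 or 9050 SRC was logged.
    Hypothetical helper; it does not query any log.
    """
    if new_src is None:
        return "go to step 10"                  # step 7, answer No
    if new_src == "9050":
        return "contact next level of support"  # step 8, answer Yes
    if new_src == "9010":
        return "reclaim the cache storage"      # step 9
    raise ValueError(f"unexpected SRC for this procedure: {new_src}")


def reclaim_loses_sectors(aux_cache_logged_9055: bool) -> bool:
    """Restate the note in step 9.

    The reclaim does not result in lost sectors when the connected
    auxiliary cache I/O adapter logged a 9055 SRC; otherwise it does.
    """
    return not aux_cache_logged_9055
```

A service checklist could call next_action_after_exchange with the SRC read from the Service Action
Log or Product Activity Log and call reclaim_loses_sectors to decide whether to warn the system
operator about restoring data from the most recent saved tape.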
IOPIP31
Cache data associated with the attached devices cannot be found.
You were sent to this procedure from URC 9010.
1. If the system has logical partitions, perform this procedure from the logical partition that reported
the problem. To determine if the system has logical partitions, go to Determining if the system has
logical partitions before continuing with this procedure.
2. Has the system been powered off for several days?
v No: Go to step 4 on page 84.
v Yes: The cache battery pack may be depleted. Do not replace the I/O adapter or the cache battery
pack. Reclaim the cache storage. See Reclaiming IOP cache storage. Then continue with the next
step.
Note: When an auxiliary cache I/O adapter connected to the storage I/O adapter logs a 9055 SRC
in the Product Activity Log, the Reclaim does not result in lost sectors. Otherwise, the Reclaim
does result in lost sectors, and the system operator may want to restore data from the most recent
saved tape after you complete the repair.
3. Does the IPL complete successfully?
No: Contact your next level of support. This ends the procedure.
Yes: This ends the procedure.
4. Are you working with a 571F/575B card set?
No: Go to step 6.
Yes: Continue with the next step.
5. Remove the 571F/575B card set. Create a new card set with the following:
Note: Label all parts (both original and new) before moving them.
v The new replacement 571F storage IOA
v The cache directory card from the original 571F storage IOA
v The original 575B auxiliary cache adapter
See Separating the 571F/575B card set and moving the cache directory card.
v Ensure that the SCSI cable and the battery power cable on the top edge of the storage side of the
card are connected to the top edge of the auxiliary cache side of the card.
v Reinstall this card set into the system and go to step 7.
6. Remove the I/O adapter. Install the new replacement storage I/O adapter with the following parts
installed on it:
Note: Label all parts (both old and new) before moving them.
v The cache directory card from the original storage I/O adapter. On adapters with removable cache
cards, the cache directory card will move with the removable cache card.
v The removable cache card from the original storage I/O adapter (this applies to only the 571E and
some 2780 I/O adapters).
See System FRU locations.
v If the I/O adapter is attached to an auxiliary cache I/O adapter, ensure that the SCSI cable on the
last port of the new replacement storage I/O adapter is connected to the auxiliary cache I/O
adapter. For a list of auxiliary cache I/O adapters, see System parts.
7. Did the 9010 SRC that sent you to this procedure occur on a type-D IPL?
No: Continue with the next step.
Yes: Perform a type-D IPL and continue with the next step.
8. Has a new 9010 or 9050 SRC occurred in the Service Action Log?
No: Go to step 11.
Yes: Continue with the next step.
9. Was the new SRC 9050?
No: Continue with the next step.
Yes: Contact your next level of support. This ends the procedure.
10. The new SRC was 9010. Reclaim the cache storage. See Reclaiming IOP cache storage.
Note: When an auxiliary cache I/O adapter that is connected to the storage I/O adapter logs a 9055
SRC in the Product Activity Log, the reclaim does not result in lost sectors. Otherwise, the reclaim
does result in lost sectors, and the system operator might want to restore data from the most recent
saved tape after you complete the repair.
11. Are you working with a 571F/575B card set?
No: Go to step 13.
Yes: Continue with the next step.
12. Remove the 571F/575B card set. Create a new card set with the following:
v The new 571F storage IOA
v The cache directory card from the new 571F storage IOA
v The new 575B auxiliary cache adapter
See Separating the 571F/575B card set and moving the cache directory card.
v Ensure that the SCSI cable and the battery power cable on the top edge of the storage side of the
card are connected to the top edge of the auxiliary cache side of the card.
v Reinstall this card set into the system.
This ends the procedure.
13. Remove the I/O adapter. Install the new replacement storage I/O adapter that has the following
parts installed on it:
v The cache directory card from the new storage I/O adapter. On adapters with removable cache
cards, the cache directory card will move with the removable cache card.
v The removable cache card from the new storage I/O adapter (this applies to only the 571E and
some 2780 I/O adapters).
See System FRU locations.
v If the I/O adapter is attached to an auxiliary cache I/O adapter, ensure that the SCSI cable on the
last port of the new replacement storage I/O adapter is connected to the auxiliary cache I/O
adapter. For a list of auxiliary cache I/O adapters, see System parts.
This ends the procedure.
IOPIP32
You were sent to this procedure from unit reference code (URC) 9011.
If you need additional information for failing part numbers, location codes, or removal and replacement
procedures, see Part locations and location codes. Select your machine type and model number to find
additional location codes, part numbers, or replacement procedures for your system.
Attention: There is data in the cache of this I/O card that belongs to devices other than those that are
attached. Customer data may be lost.
1. If the system has logical partitions, perform this procedure from the logical partition that reported the
problem. To determine if the system has logical partitions, go to Determining if the system has logical
partitions before continuing with this procedure.
2. Did you just exchange the storage I/O adapter as a result of a failure?
v No: Continue with the next step.
v Yes: Reclaim the cache storage. See Reclaiming IOP cache storage.
Does the IPL complete successfully?
No: You have a new problem. Perform problem analysis and correct the problem. This ends the
procedure.
Yes: This ends the procedure.
3. Have the I/O cards been moved or reconfigured recently?
v No: Ask your next level of support for assistance. This ends the procedure.
v Yes: Perform the following steps:
a. Power off the system. See Powering on and powering off the system for details.
b. Restore all I/O cards to their original position.
c. Select the IPL type and mode that are used by the customer.
d. Power on the system.
Does the IPL complete successfully?
No: Ask your next level of support for assistance. This ends the procedure.
Yes: This ends the procedure.
IOPIP33
The I/O processor card detected a device configuration error. The configuration sectors on the device
may be incompatible with the current I/O processor card.
If you need additional information for failing part numbers, location codes, or removal and replacement
procedures, see Part locations and location codes. Select your machine type and model number to find
additional location codes, part numbers, or replacement procedures for your system.
You were sent to this procedure from unit reference code (URC) 9001.
1. If the system has logical partitions, perform this procedure from the logical partition that reported the
problem. To determine if the system has logical partitions, go to Determining if the system has logical
partitions before continuing with this procedure.
2. Has the I/O adapter been replaced with a different type of I/O adapter, or have the devices been
moved from a different type of I/O adapter to this one?
v No: Contact your next level of support. This ends the procedure.
v Yes: Continue with the next step.
3. Does the disk unit contain data that needs to be saved?
v Yes: Continue with the next step.
v No: Initialize and format the disk units.
Attention: Any data on the disk unit will be lost. Perform the following steps:
a. Access SST or DST.
b. Select Work with disk units.
c. Did you get to DST with a type D IPL?
v No: Select Work with disk unit recovery > Disk unit problem recovery procedures. Then,
continue with the next step.
v Yes: Continue with the next step.
d. Select Initialize and format disk unit for each disk unit. When the new disk unit is initialized and
formatted, the display will show that the status is complete. This may take 30 minutes or longer.
e. The disk unit is now ready to be added to the system configuration. This ends the procedure.
4. The disk unit contains data that needs to be saved.
v If the I/O adapter has been replaced with a different type of I/O adapter, reinstall the original I/O
adapter. Then continue with the next step.
v If the disk units have been moved from a different type of I/O adapter to this one, return the disk
units to their original I/O adapter. Then continue with the next step.
5. Stop parity protection on the disk units, and power down the system normally with the I/O adapter
in an operational state. The I/O adapter or disk units can now be returned to the configuration at the
beginning of this procedure. This ends the procedure.
IOPIP34
You were sent to this procedure from unit reference code (URC) 9027.
The I/O processor card detected that an array is not functional due to the present hardware
configuration.
1. If the system has logical partitions, perform this procedure from the logical partition that reported the
problem. To determine if the system has logical partitions, go to Determining if the system has logical
partitions before continuing with this procedure.
2. Has the I/O adapter been replaced with a different I/O adapter, or have the devices been moved
from a different I/O adapter to this one?
v No: Perform IOPIP22. This ends the procedure.
v Yes: Perform the following steps:
a. Power off the system. See Powering on and powering off the system.
b. Restore all I/O cards or devices to their original position.
c. Power on the system.
3. Does the IPL complete successfully?
v No: Ask your next level of support for assistance. This ends the procedure.
v Yes: This ends the procedure.
IOPIP40
Use this procedure to isolate the problem when a storage I/O adapter is connected to an incompatible or
non-operational auxiliary cache I/O adapter.
Perform the following steps:
1. Are you working on a 571F/575B combination storage and auxiliary cache IOA card set (uses two
card slot locations)?
v Yes: Continue with step 2.
v No: Continue with step 3.
2. Find the location of the card.
v Use the location displayed in the Service Action Log. If the Service Action Log does not have a
location, determine the address of the I/O adapter. See System Reference Code (SRC) Format
Description.
v The location identified is for the 571F side of the card set.
You must configure both the 571F and the 575B in the same partition. Are both the 571F side of the
card set and the 575B side of the card set configured in the same partition?
v Yes: Replace the entire card set. See System FRU locations for information about FRU locations on
the system that you are servicing. This ends the procedure.
v No: Change the configuration so that the same partition controls both cards in the card set. This
ends the procedure.
3. Ensure that the SCSI cable on the last port of the storage I/O adapter is connected to the auxiliary
cache I/O adapter. Do the following:
a. Use the location of the storage I/O adapter displayed in the Service Action Log. If the Service
Action Log does not have a location, determine the address of the storage I/O adapter. See System
Reference Code (SRC) Format Description.
b. Determine the location of the storage I/O adapter. See System FRU locations for information about
FRU locations on the system that you are servicing.
c. Ensure that the SCSI cable on the last port of the storage I/O adapter is properly connected to an
auxiliary cache I/O adapter.
d. Ensure that both the auxiliary cache I/O adapter and the storage I/O adapter are in the same
partition.
e. Ensure that the slot power indicator is lit for the auxiliary cache I/O adapter. If it is not, use
concurrent maintenance to power on the slot.
4. Ensure that the auxiliary cache I/O adapter is supported for the storage I/O adapter to which it is
connected. See the PCI adapter installation instructions for information about which adapters are
compatible.
5. Replace the SCSI cable on the last port of the storage I/O adapter that connects to the auxiliary cache
I/O adapter. See System FRU locations for information about FRU locations on the system that you
are servicing. If this does not fix the problem, replace the auxiliary cache I/O adapter. This ends the
procedure.
IOPIP41
Use this procedure to correct the problem when an auxiliary cache I/O adapter is not connected to a
storage I/O adapter or when an auxiliary cache I/O adapter is connected to an incompatible or
non-operational storage I/O adapter.
Perform the following steps:
1. Are you working on a 571F/575B combination storage and auxiliary cache IOA card set (uses two
card slot locations)?
v Yes: Continue with step 2.
v No: Continue with step 3.
2. Find the location of the card.
v Use the location displayed in the Service Action Log. If the Service Action Log does not have a
location, determine the address of the I/O adapter. See System FRU locations.
v The location identified is for the 575B side of the card set.
You must configure both the 571F and the 575B in the same partition. Are both the 571F side of the
card set and the 575B side of the card set configured in the same partition?
v Yes: Replace the entire card set. See System FRU locations for the model you are working on. This
ends the procedure.
v No: Change the configuration so that the same partition controls both cards in the card set. This
ends the procedure.
3. Ensure that the SCSI cable of the auxiliary cache I/O adapter is connected to the last port of the
storage I/O adapter. Do the following:
a. Use the location of the auxiliary cache I/O adapter displayed in the Service Action Log. If the
Service Action Log does not have a location, determine the address of the auxiliary cache I/O
adapter. See System Reference Code (SRC) Format Description.
b. Determine the location of the auxiliary cache I/O adapter. See System FRU locations for the model
you are working on.
c. Ensure that the SCSI cable of the auxiliary cache I/O adapter is properly connected to the last port
of the storage I/O adapter.
d. Ensure that both the auxiliary cache I/O adapter and the storage I/O adapter are in the same
partition.
e. Ensure that the slot power indicator is lit for the storage I/O adapter. If it is not, use concurrent
maintenance to power on the slot.
4. Did you just replace the auxiliary cache I/O adapter because of a failure and did the new replacement
auxiliary cache I/O adapter log a 9073 URC?
v Yes: The SCSI bus on the storage I/O adapter may be disabled as a result of the initial failure. Use
Hardware Service Manager to re-IPL the storage I/O adapter that is connected to the new
replacement auxiliary cache I/O adapter. This ends the procedure.
v No: Continue with the next step.
5. Ensure that the auxiliary cache I/O adapter is supported for the storage I/O adapter to which it is
connected. See the PCI adapter installation instructions for information about which adapters are
compatible.
6. Replace the SCSI cable that connects the auxiliary cache I/O adapter to the storage I/O adapter. See
System parts for cable part number information. If this does not fix the problem, replace the storage
I/O adapter. This ends the procedure.
Licensed Internal Code isolation procedures
Use this section to isolate Licensed Internal Code problems.
Read and observe all safety procedures before servicing the system and while performing a procedure.
Attention: Unless instructed otherwise, always power off the system or expansion unit where the FRU
is located before removing, exchanging, or installing a field-replaceable unit (FRU).
DANGER
When working on or around the system, observe the following precautions:
Electrical voltage and current from power, telephone, and communication cables are hazardous. To
avoid a shock hazard:
v Connect power to this unit only with the IBM provided power cord. Do not use the IBM
provided power cord for any other product.
v Do not open or service any power supply assembly.
v Do not connect or disconnect any cables or perform installation, maintenance, or reconfiguration
of this product during an electrical storm.
v The product might be equipped with multiple power cords. To remove all hazardous voltages,
disconnect all power cords.
v Connect all power cords to a properly wired and grounded electrical outlet. Ensure that the outlet
supplies proper voltage and phase rotation according to the system rating plate.
v Connect any equipment that will be attached to this product to properly wired outlets.
v When possible, use one hand only to connect or disconnect signal cables.
v Never turn on any equipment when there is evidence of fire, water, or structural damage.
v Disconnect the attached power cords, telecommunications systems, networks, and modems before
you open the device covers, unless instructed otherwise in the installation and configuration
procedures.
v Connect and disconnect cables as described in the following procedures when installing, moving,
or opening covers on this product or attached devices.
To Disconnect:
1. Turn off everything (unless instructed otherwise).
2. Remove the power cords from the outlets.
3. Remove the signal cables from the connectors.
4. Remove all cables from the devices.
To Connect:
1. Turn off everything (unless instructed otherwise).
2. Attach all cables to the devices.
3. Attach the signal cables to the connectors.
4. Attach the power cords to the outlets.
5. Turn on the devices.
(D005)
LICIP01
Licensed Internal Code detected an IOP programming problem.
You will need to gather data to determine the cause of the problem. If you are using OptiConnect and
the IOP is connected to another system, collect this information from both systems. Read the danger
notices in “Licensed Internal Code isolation procedures” on page 89 before continuing with this procedure.
1. If the system has logical partitions, perform this procedure from the logical partition that reported
the problem. To determine if the system has logical partitions, go to Determining if the system has
logical partitions before continuing with this procedure.
2. Is the system operational: Did the SRC come from the Service Action Log, Product Activity Log,
problem log, or system operator message?
v No: Go to step 9.
v Yes: Is this a x6xx5121 SRC?
No: Continue with the next step.
Yes: Go to step 4.
3. If the IOP has DASD attached to it, then the IOP dump is in SID87 (or SID187 if the DASD is
mirrored). Copy the IOP dump. See Storage dumps.
4. Print the Product Activity Log, including any IOP dumps, to removable media for the day that the
problem occurred. Select the option to obtain HEX data.
5. Use the "Licensed Internal Code log" service function under DST/SST to copy the LIC log entries to
removable media for the day that the problem occurred.
6. Copy the system configuration list. See System configuration list.
7. Provide the dumps to Service Support.
8. Check the Logical Hardware Resource STATUS field using Hardware Service Manager. If the status
is not Operational then IPL the IOP using the I/O Debug option. Ignore resources with a status of not
connected.
To IPL a failed IOP, the following command can be used: VRYCFG CFGOBJ(XXXX)
CFGTYPE(*CTL) STATUS(*RESET) or use DST/SST Hardware Service Manager.
If the IPL does not work:
v Check the Service Action Log for new SRC entries. See Searching the service action log. Use the
new SRC and perform problem analysis to correct the problem.
v If there are no new SRCs in the Service Action Log, contact your next level of support. This ends
the procedure.
9. Has the system stopped but the DST console is still active: Did the SRC come from the Main Storage
Dump manager screen on the DST console?
Yes: Continue with the next step.
No: Go to step 15.
10. Complete a Problem Summary Form using the information in words 1-9 from the control panel, or
from the DST Main Storage Dump screen.
11. The system has already taken a partial main storage dump for this SRC and automatically re-IPLed
to DST.
12. Copy the main storage dump to tape. See Storage dumps.
13. When the dump is completed, the system will re-IPL automatically. Sign on to DST or SST. Obtain
the data in steps 3, 4, 5, and 6.
14. Provide the dumps to Service Support. This ends the procedure.
15. Has the system stopped with an SRC at the control panel?
Yes: Continue with the next step.
No: Go to step 2.
16. Complete a Problem Summary Form using the information in words 1-9 from the control panel, or
from the DST Main storage dump screen.
17. Do not power off the system. Perform a manual IPL to DST, and start the Main storage dump
manager service function.
18. Copy the main storage dump to tape.
19. Obtain the data in steps 3, 4, 5, and 6.
20. Re-IPL the system.
21. Has the system stopped with an SRC at the control panel?
Yes: Using the new SRC, perform problem analysis to correct the problem. This ends the
procedure.
No: Provide the dumps to Service Support. This ends the procedure.
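Step 2 of LICIP01 branches on whether the SRC has the form x6xx5121. The following Python sketch shows
one way to test an 8-character SRC against that form. It assumes that each lowercase x stands for any
single hexadecimal digit; this procedure does not define the placeholder, so treat that reading as an
assumption. The function name is hypothetical.

```python
import re

# Assumption: each "x" in x6xx5121 is one hexadecimal digit.
_X6XX5121 = re.compile(r"[0-9A-F]6[0-9A-F]{2}5121")


def is_x6xx5121(src: str) -> bool:
    """Return True when the 8-character SRC matches the x6xx5121 form."""
    return _X6XX5121.fullmatch(src.strip().upper()) is not None
```

Under this reading, B6005121 matches the form, so step 2 would send you to step 4, while B6006910 does
not match and the procedure continues with step 3.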
LICIP03
Dedicated service tools (DST) found a permanent program error, or a hardware failure occurred.
Read the danger notices in the “Licensed Internal Code isolation procedures” on page 89 before
continuing with this procedure.
1. If the system has logical partitions, perform this procedure from the logical partition that reported the
problem. To determine if the system has logical partitions, go to Determining if the system has logical
partitions before continuing with this procedure.
2. Perform a main storage dump. See Storage dumps.
3. Go to LICIP08. This ends the procedure.
LICIP04
The initial program load (IPL) service function ended.
Dedicated service tools (DST) was in the disconnected status or lost communications with the IPL console
because of a console failure and could not communicate with the user. Read the danger notices in
“Licensed Internal Code isolation procedures” on page 89 before continuing with this procedure.
1. If the system has logical partitions, perform this procedure from the logical partition that reported the
problem. To determine if the system has logical partitions, go to Determining if the system has logical
partitions before continuing with this procedure.
2. Select function 21 (Make DST Available) on the control panel and press Enter to start DST again.
Does the DST Sign On display appear?
v No: Continue with the next step.
v Yes: Perform the following steps (see Dedicated service tools for details):
a. Select Start a Service Tool > Licensed Internal Code log.
b. Perform a dump of the Licensed Internal Code log to tape. See Start a service tool for details.
c. Return here and continue with the next step.
3. Perform a main storage dump. See Storage dumps for details.
4. Copy the main storage dump to removable media. See Storage dumps for details.
5. Report a Licensed Internal Code problem to your next level of support. This ends the procedure.
LICIP07
The system detected a problem while communicating with a specific I/O processor.
The problem could be caused by Licensed Internal Code, the I/O processor card, or by bus hardware.
Read the danger notices in “Licensed Internal Code isolation procedures” on page 89 before continuing
with this procedure.
1. If the system has logical partitions, perform this procedure from the logical partition that reported
the problem. To determine if the system has logical partitions, go to Determining if the system has
logical partitions before continuing with this procedure.
2. Did a previous procedure have you power off the system and perform an IPL in Manual mode, and is
the system in Manual mode now?
• Yes: Continue with the next step.
• No: Perform the following steps:
a. Power off the system. See Powering on and powering off the system for details.
b. Select Manual mode on the control panel. See IPL type, mode, and speed options for details.
c. Power on the system.
d. Continue with the next step.
3. Does the SRC that sent you to this procedure appear on the control panel?
• No: Continue with the next step.
• Yes: Use the information in the SRC to determine the card direct select address. If the SRC is
B6006910, you can use the last 8 characters of the top 16-character line of function 13 (word 7) to
find the card direct select address in BBBBCcbb format.
BBBB Bus number
Cc Card direct select address
bb Board address
Go to step 11 on page 93.
4. Does the console display indicate a problem with missing disks?
Yes: Continue with the next step.
No: Go to step 6.
5. Perform the following steps:
a. Go to the DST main menu.
b. On the DST sign-on display, enter the DST full authority user ID and password. See Dedicated
service tools for details.
c. Select Start a service tool > Hardware service manager.
d. Check for the SRC in the service action log. See Searching the service action log.
Did you find the same SRC that sent you to this procedure?
• Yes: Note the date and time for that SRC. Go to the Product Activity Log and search all logs to
find the same SRC. When you have found the SRC, go to step 9 on page 93.
• No: Perform the following steps:
1) Return to the DST main menu.
2) Perform an IPL and return to the Display Missing Disk Units display.
3) Go to “LICIP11” on page 94. This ends the procedure.
6. Does the SRC that sent you to this procedure appear on the console or on the alternative console?
• Yes: Continue with the next step.
• No: Does the IPL complete successfully to the IPL or Install the System display?
Yes: Continue with the next step.
No: A different SRC occurred. Use the new SRC to correct the problem. This ends the
procedure.
7. Perform the following steps:
a. Use the full-authority password to sign on to DST.
b. Search All logs in the product activity log for references to SRC B600 5209 and the SRC
that sent you to this procedure.
Note: Search only for SRCs that occurred during the last IPL.
Did you find B600 5209 or the same SRC that sent you to this procedure?
• Yes: Go to step 10 on page 93.
• No: Did you find a different SRC than the one that sent you to this procedure?
Yes: Continue with the next step.
No: The problem appears to be intermittent. Ask your next level of support for assistance. This
ends the procedure.
8. Use the new SRC to correct the problem. This ends the procedure.
9. Use F11 to move through alternative views of the log analysis displays until you find the card
position and frame ID of the failing IOP associated with the SRC.
Was the card position and frame ID available, and did this information help you find the IOP?
No: Continue with the next step.
Yes: Go to step 12.
10. Perform the following steps:
a. Display the report for the log entry of the SRC that sent you to this procedure.
b. Display the additional information for the entry.
c. If the SRC is B6006910, use characters 9-16 of the top 16-character line of function 13 (word 7) to
find the card direct select address in BBBBCcbb format.
BBBB Bus number
Cc Card direct select address
bb Board address
11. Use the BBBBCcbb information and see System FRU locations to determine the failing IOP and its
location.
12. Go to “MABIP55” on page 40 to isolate an I/O adapter problem on the IOP you just identified. If
this fails to isolate the problem, return here and continue with the next step.
13. Is the I/O processor card you identified in step 9 or step 11 the CFIOP?
• No: Continue with the next step.
• Yes: Perform the following steps:
a. Exchange the failing CFIOP card. See System FRU locations.
Note: You will be prompted for the system serial number. Ignore any error messages regarding
system configuration that appear during the IPL.
b. Go to step 16.
14. Perform the following steps:
a. Power off the system.
b. Remove the IOP card.
c. Power on the system.
Does the SRC that sent you to this procedure appear on the control panel or appear as a new entry
in the service action log or product activity log?
• No: Continue with the next step.
• Yes: Perform the following steps:
a. Power off the system.
b. Install the IOP card you just removed.
c. Replace the I/O backplane (Un-P1). This ends the procedure.
15. Perform the following steps:
a. Power off the system.
b. Exchange the failing IOP card.
16. Power on the system.
Does the SRC that sent you to this procedure appear on the control panel, on the console, or on the
alternative console?
No: Continue with the next step.
Yes: Go to step 18.
17. Does a different SRC appear on the control panel, on the console, or on the alternative console?
• Yes: Use the new SRC to correct the problem. This ends the procedure.
• No: On the IPL or Install the System display, check for the SRC in the service action log. See
Searching the service action log for details.
Did you find the same SRC that sent you to this procedure?
Yes: Continue with the next step.
No: Go to Verify a repair. This ends the procedure.
18. Perform the following steps:
a. Power off the system.
b. Remove the IOP card you just exchanged and install the original card.
c. Go to (Bus-PIP1). This ends the procedure.
19. Ask your next level of support for assistance and report a Licensed Internal Code problem. You may
be asked to verify that all PTFs have been applied.
If you are asked to perform any of the following tasks, see the indicated topic:
• To copy the main storage dump from disk to tape or diskette, see Storage dumps.
• To print the product activity log, see Using the product activity log.
• To copy the IOP storage dump to removable media, see Storage dumps. This ends the procedure.
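The BBBBCcbb layout used in steps 3 and 10 of LICIP07 can be sketched in a few lines of Python. This is only an illustration of the character positions named in the procedure: the function names, the CardAddress type, and the sample line are hypothetical, and the sketch assumes the top line of function 13 is available as a 16-character string.

```python
from typing import NamedTuple


class CardAddress(NamedTuple):
    """Fields of an 8-character BBBBCcbb value (hypothetical helper type)."""
    bus_number: str      # BBBB
    card_dsa: str        # Cc, the card direct select address
    board_address: str   # bb


def word7_from_function13(top_line: str) -> str:
    """Return characters 9-16 (word 7) of the 16-character top line."""
    if len(top_line) != 16:
        raise ValueError("expected a 16-character line")
    return top_line[8:16]


def parse_bbbbccbb(word7: str) -> CardAddress:
    """Split an 8-character BBBBCcbb value into its three fields."""
    if len(word7) != 8:
        raise ValueError("expected 8 characters in BBBBCcbb format")
    return CardAddress(word7[0:4], word7[4:6], word7[6:8])


# Made-up sample line, not taken from a real system.
sample_top_line = "0123456700021000"
address = parse_bbbbccbb(word7_from_function13(sample_top_line))
```

With the made-up sample above, the bus number is 0002, the card direct select address is 10, and the board address is 00. The result still has to be matched against System FRU locations as described in step 11.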
LICIP08
Licensed Internal Code detected an operating system program problem.
Read the danger notices in “Licensed Internal Code isolation procedures” on page 89 before continuing
with this procedure.
1. If the system has logical partitions, perform this procedure from the logical partition that reported the
problem. To determine if the system has logical partitions, go to Determining if the system has logical
partitions before continuing with this procedure.
2. Select Manual mode and perform an IPL to DST. See Performing an IPL to dedicated service tools.
Does the same SRC occur?
• Yes: Go to step 5.
• No: Does the same URC appear on the console?
No: Continue with the next step.
Yes: Go to step 4.
3. Does a different SRC occur, or does a different URC appear on the console?
• Yes: Use the new SRC or reference code to correct the problem. If the procedure for the new SRC
sends you back to this procedure, then continue with the next step. This ends the procedure.
• No: Select Perform an IPL on the IPL or Install the System display to complete the IPL.
Is the problem intermittent?
Yes: Continue with the next step.
No: This ends the procedure.
4. Copy the main storage dump to removable media. See Storage dumps.
5. Report a Licensed Internal Code problem to your next level of support. This ends the procedure.
LICIP11
Use this procedure to isolate a system STARTUP failure in the initial program load (IPL) mode.
Ensure you have read the danger notices in “Licensed Internal Code isolation procedures” on page 89
before continuing with this procedure.
How to find the cause code
1. If the system has logical partitions, perform this procedure from the logical partition that reported the
problem. To determine if the system has logical partitions, go to Determining if the system has logical
partitions.
2. Were you given a cause code by another procedure?
No: Continue with the next step.
Yes: Use the cause code given by the other procedure. Then, go to step 4.
3. Look at the Data display characters in word 3. You can obtain these characters by either:
• Looking at word 3 on the Problem summary form that was filled out earlier.
• Selecting characters 9-16 of the top 16-character line of function 12 (word 3).
4. The 4 leftmost characters of word 3 represent the cause code. Select the cause code to go to the correct
isolation instructions:
“0001”               “0010” on page 101   “0020” on page 103   “0031” on page 104
“0002” on page 96    “0011” on page 101   “0021” on page 103   “0033” on page 104
“0004” on page 98    “0012” on page 101   “0022” on page 103   “0034” on page 104
“0005” on page 99    “0015” on page 101   “0023” on page 103   “0035” on page 105
“0006” on page 99    “0016” on page 101   “0024” on page 103   “0037” on page 105
“0007” on page 99    “0017” on page 101   “0025” on page 104   “0038” on page 105
“0008” on page 99    “0018” on page 101   “0026” on page 104   “0039” on page 105
“0009” on page 99    “0019” on page 101   “0027” on page 104   “003A” on page 105
“000A” on page 99    “001A” on page 102   “002A” on page 104   “0099” on page 105
“000B” on page 100   “001C” on page 102   “002B” on page 104
“000C” on page 100   “001D” on page 102
“000D” on page 100   “001E” on page 102
“000E” on page 100   “001F” on page 102
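Steps 3 and 4 of "How to find the cause code" amount to two slices of the function 12 display. The following Python sketch shows one way to express them. The function name and the sample line are hypothetical, and the set of known codes is copied from the list above only to catch transcription mistakes.

```python
# Cause codes listed for LICIP11 (copied from the list in step 4).
KNOWN_CAUSE_CODES = {
    "0001", "0002", "0004", "0005", "0006", "0007", "0008", "0009",
    "000A", "000B", "000C", "000D", "000E",
    "0010", "0011", "0012", "0015", "0016", "0017", "0018", "0019",
    "001A", "001C", "001D", "001E", "001F",
    "0020", "0021", "0022", "0023", "0024", "0025", "0026", "0027",
    "002A", "002B",
    "0031", "0033", "0034", "0035", "0037", "0038", "0039", "003A",
    "0099",
}


def cause_code_from_function12(top_line: str) -> str:
    """Return the cause code held in word 3 of the function 12 top line."""
    if len(top_line) != 16:
        raise ValueError("expected a 16-character line")
    word3 = top_line[8:16]          # characters 9-16 of the top line
    code = word3[:4].upper()        # the 4 leftmost characters of word 3
    if code not in KNOWN_CAUSE_CODES:
        raise ValueError(f"cause code {code} is not in the LICIP11 list")
    return code


# Made-up sample line, not taken from a real system.
code = cause_code_from_function12("A600509000020000")
```

The check against the known list is a convenience in this sketch and is not part of the documented procedure.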
0001
Disk configuration is missing.
1. Select Manual mode and perform an IPL to DST for the failing partition. See Performing an IPL to
dedicated service tools. Does the Disk Configuration Error Report display appear?
Yes: Continue with the next step.
No: The IPL completed successfully. This ends the procedure.
2. Is Missing Disk Configuration information displayed?
Yes: Continue with the next step.
No: Go to step 1 on page 96 for cause code 0002.
3. On the Missing Disk Configuration display, perform the following:
a. Select option 5 > Display Detailed Report > Work with disk units > Work with disk unit
recovery > Recover Configuration.
b. Follow the instructions on the display. After the disk configuration is recovered, the system
automatically performs an IPL. This ends the procedure.
0002
Disk units are missing from the disk configuration.
Data from the control panel can be used to find information about the missing disk unit.
1. Did you enter this procedure because all the devices listed on the Display Missing Units display
(reached from the Disk Configuration Error Report, the Disk Configuration Attention Report, or the
Disk Configuration Warning Report display) have a reference code of 0000?
No: Continue with the next step.
Yes: Go to step 20 on page 98.
2. Have you installed a new disk enclosure in a disk unit and not restored the data to the disk unit?
No: Continue with the next step.
Yes: Ignore SRC A600 5090. Continue with the disk unit exchange recovery procedure. This ends
the procedure.
3. Use words 1-9 from the information recorded on the Problem summary form to determine the disk
unit that is missing from the configuration:
• Characters 1-8 of the bottom 16-character line of function 12 (word 4) contain the IOP direct select
address.
• Characters 1-8 of the top 16-character line of function 13 (word 6) contain the disk unit type, level,
and model number.
• Characters 9-16 of the top 16-character line of function 13 (word 7) contain the disk unit serial
number.
Note: For 2105 and 2107 disk units, the 5 rightmost characters of word 7 contain the disk unit
serial number.
• Characters 1-8 of the bottom 16-character line of function 13 (word 8) contain the number of
missing disk units.
Are the problem disk units 432x, 433x, 660x, 671x, or 673x disk units?
No: Continue with the next step.
Yes: Go to step 5.
4. Attempt to get all devices attached to the MSIOP to Ready status by performing the following:
a. The MSIOP address (MSIOP Direct Select Address) to use is characters 1-8 of the bottom
16-character line of function 12 (word 4).
b. Verify the following and correct if necessary before continuing with step 10 on page 97.
• All cable connections are made correctly and are tight.
• All storage devices have the correct signal bus address, as indicated in the system
configuration list.
• All storage devices are powered on and ready.
5. Did you enter this procedure because there was an entry in the Service Action Log which has the
reference code B6005090?
Yes: Continue with the next step.
No: Go to step 10 on page 97.
6. Are customer jobs running on the system now?
Yes: Continue with the next step.
No: Ensure that the customer is not running any jobs before continuing with this procedure. Then
go to step 10 on page 97.
7. Select System Service Tools (SST) > Work with disk units > Display disk configuration > Display
disk configuration status.
Are any disk units missing from the configuration (indicated by an asterisk *)?
Yes: Continue with the next step.
No: This ends the procedure.
8. Do all of the disk units that are missing from the configuration have a status of "Suspended"?
Yes: Continue with the next step.
No: Ensure that the customer is not running any jobs before continuing with this procedure. Then
go to step 10.
9. Use the Service Action Log to determine if there are any entries for the missing disk units (see
Searching the service action log). Are there any entries in the Service Action Log for the missing disk
units that were logged since the last IPL?
Yes: Use the information in the Service Action Log and go to Reference Code Finder. Perform the
action indicated for the unit reference code. This ends the procedure.
No: Go to step 21 on page 98.
10. Select Manual mode and perform an IPL to DST for the failing partition (see Performing an IPL to
dedicated service tools). Does the Disk Configuration Error Report, the Disk Configuration Attention
Report, or the Disk Configuration Warning Report display appear?
Yes: Continue with the next step.
No: The IPL completed successfully. This ends the procedure.
11. Does one of the following messages appear in the list?
• Missing disk units in the configuration
• Missing mirror protected disk units in the configuration
Yes: Continue with the next step.
No: Go to step 16.
12. Select option 5. Do the missing units have device parity protected status? (Device parity protection
status is indicated by "DPY/" as the first four characters of the status.)
Yes: Continue with the next step.
No: Go to step 14.
13. Is the status DPY/Active?
Yes: Continue with the next step.
No: Use the Service Action Log to determine if there are any entries for the missing disk units or
the IOA/IOP controlling them. See Searching the service action log for details. This ends the
procedure.
14. Press F11, and press Enter to display the details.
Do all of the disk units listed on the display have a reference code of 0000?
Yes: Continue with the next step.
No: Use the disk unit reference code shown on the display and Reference Code Finder. Perform
the action indicated for the unit reference code. This ends the procedure.
15. Do all of the IOPs or devices listed on the display have a reference code of 0000?
No: Use the IOP reference code shown on the display and Reference Code Finder. Perform the
action indicated for the reference code. This ends the procedure.
Yes: Go to step 20 on page 98.
16. Does the following message appear in the list: Unknown load-source status?
Yes: Continue with the next step.
No: Go to step 18.
17. Select option 5, press F11, and then press Enter to display the details.
Does the Assign Missing Load Source Disk display appear?
No: Continue with the next step.
Yes: Press Enter to assign the missing load-source disk unit. This ends the procedure.
18. Does the following message appear in the list?
Load source failure
Yes: Continue with the next step.
No: The IPL completed successfully. This ends the procedure.
19. Select option 5, press F11, and then press Enter to display the details.
20. The number of failing disk unit facilities (actuators) is the number of disk units displayed. A disk
unit has a Unit number greater than zero.
Find the failing disk unit by type, model, serial number, or address displayed on the console.
21. Is there more than one failing disk device attached to the IOA or MSIOP?
Yes: Continue with the next step.
No: Go to step 24.
22. Use the SAL to determine if there are any entries that occurred around the time of the A6xx/B6xx
5090 SRC. See Using the service action log. Are there any such entries?
No: Continue with the next step.
Yes: Use the information in the SAL and go to Reference Code Finder. Perform the action indicated for
the unit reference code. This ends the procedure.
23. Are all the disk devices that are attached to the IOA or MSIOP failing? (If the disk units are using
mirrored protection, select Display Disk Status to find out.)
No: Continue with the next step.
Yes: Go to step 25.
24. Go to the Reference Code Finder and exchange the FRUs shown one at a time. Then return here and
answer the question below the listed disk units.
Disk Unit   SRC to look up
2105        21053002
2107        21073002
432x        432x3002
433x        433x3002
660x        660x3002
671x        671x3002
673x        673x3002
Did the disk unit service information correct the problem?
No: Continue with the next step.
Yes: This ends the procedure.
25. Exchange the IOA or MSIOP. See System FRU locations for information about FRU locations for the
system you are servicing.
If exchanging the IOA or MSIOP did not correct the problem, use the original SRC and exchange the
failing items, starting with the highest probable cause of failure. If the failing item list contains FI
codes, use Reference Code Finder to help determine parts and locations. This ends the procedure.
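The word positions in step 3 and the lookup table in step 24 of cause code 0002 can be summarized in a short Python sketch. All function names, dictionary keys, and sample lines below are hypothetical. The sketch only slices the characters named in the procedure and does not interpret the contents of word 6 or word 8, because their internal layout is not described here.

```python
# Disk unit families and the fixed suffix come from the table in step 24.
DISK_UNIT_FAMILIES = ("2105", "2107", "432x", "433x", "660x", "671x", "673x")
SRC_SUFFIX = "3002"


def _check_line(line: str) -> str:
    """Validate one 16-character control panel line."""
    if len(line) != 16:
        raise ValueError("expected a 16-character line")
    return line


def missing_disk_words(f12_bottom: str, f13_top: str, f13_bottom: str) -> dict:
    """Slice words 4, 6, 7, and 8 out of the function 12 and 13 lines."""
    for line in (f12_bottom, f13_top, f13_bottom):
        _check_line(line)
    return {
        "word4_iop_dsa": f12_bottom[0:8],           # chars 1-8, bottom line, function 12
        "word6_type_level_model": f13_top[0:8],     # chars 1-8, top line, function 13
        "word7_serial": f13_top[8:16],              # chars 9-16, top line, function 13
        "word8_missing_count": f13_bottom[0:8],     # chars 1-8, bottom line, function 13
    }


def serial_number(word7: str, disk_type: str) -> str:
    """Apply the 2105/2107 note: only the 5 rightmost characters are used."""
    if disk_type in ("2105", "2107"):
        return word7[-5:]
    return word7


def src_to_look_up(family: str) -> str:
    """Return the SRC to look up in Reference Code Finder for a disk unit family."""
    if family not in DISK_UNIT_FAMILIES:
        raise ValueError(f"{family} is not listed in the step 24 table")
    return family + SRC_SUFFIX


# Made-up sample lines, not taken from a real system.
words = missing_disk_words("1111222233334444", "AAAABBBBCCCCDDDD", "0000000299999999")
```

For example, src_to_look_up("432x") gives 432x3002, which matches the table row for 432x disk units.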
0004
Some disk units are unprotected but configured into a mirrored ASP. These units were originally DPY
protected but protection was disabled.
Perform the following steps:
1. Is the system managed by a management console?
Yes: Select DST by performing the management console action for Function 21 for the failing
partition. See Control panel functions on the management console. Then continue with the next
step.
No: Select DST using Function 21 for the failing partition. See Selecting function 21 from the
control panel in Service functions. Then continue with the next step.
2. Select Work with disk units and take the actions to protect the system.
If you do not know what actions to take, select Manual mode and perform an IPL to DST for the
failing partition. See Performing an IPL to dedicated service tools.
When the Disk configuration error report appears, the recovery actions are listed in the Help text for
the error message "Unprotected disk units in a mirrored ASP". This ends the procedure.
0005
A disk unit using parity protection is operating in exposed mode.
1. Select Manual mode and perform an IPL to DST for the failing partition. See Performing an IPL to
dedicated service tools.
2. Choose from the following options:
• If the same reference code appears, ask your next level of support for assistance.
• If no reference code appears and the IPL completes successfully, the problem is corrected.
• If a different reference code appears, use it to perform problem analysis and correct the new
problem. This ends the procedure.
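The three outcomes in step 2 of cause code 0005 can be written as a small decision function. This is a minimal sketch with hypothetical names. It assumes reference codes are compared after removing spaces and ignoring case, which the procedure does not state, and it flags the one combination the procedure does not cover.

```python
from typing import Optional


def _normalize(code: str) -> str:
    """Assumed comparison rule: ignore spaces and letter case."""
    return code.replace(" ", "").upper()


def action_after_ipl(original_code: str,
                     observed_code: Optional[str],
                     ipl_completed: bool) -> str:
    """Map the result of the IPL to DST onto the three documented outcomes."""
    if observed_code is None:
        if ipl_completed:
            return "The problem is corrected."
        # No reference code and no completed IPL is not described in the source.
        return "Not covered by this procedure."
    if _normalize(observed_code) == _normalize(original_code):
        return "Ask your next level of support for assistance."
    return "Use the new reference code to perform problem analysis."
```

The return strings paraphrase the bullets above and are not system messages.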
0006
There are new devices attached to the system that do not have Licensed Internal Code installed. Ask your
next level of support for assistance.
0007
Some of the configured disk units have device parity protection disabled when the system expected
device parity protection to be enabled.
1. Is the system managed by a management console?
Yes: Select DST by performing the management console action for Function 21 for the failing
partition. See Control panel functions on the management console. Then continue with the next
step.
No: Select DST using Function 21 for the failing partition. See Selecting function 21 from the
control panel in Service functions. Then continue with the next step.
2. Correct the problem by doing the following:
a. Select Work with disk units > Work with disk unit recovery > Correct device parity protection.
b. Follow the online instructions. This ends the procedure.
0008
A disk unit has no more alternate sectors to assign.
1. Determine the failing unit by type, model, serial number or address given in words 4-7. See The
system reference code format description.
2. See the service information for the specific storage device. Use the disk unit reference code listed
below for service information entry.
432x 102E, 433x 102E, 660x 102E, 671x 102E, 673x 102E (Reference Code Finder).
This ends the procedure.
0009
The procedure to restore a disk unit from the tape unit did not complete. Continue with the disk unit
exchange recovery procedure.
000A
There is a problem with a disk unit subsystem. As a result, there are missing disk units in the system.
Perform the following steps:
1. Is the system managed by a management console?
Yes: Select DST by performing the management console action for Function 21 for the failing
partition. See Control panel functions on the management console. Then continue with the next
step.
No: Select DST using Function 21 for the failing partition. See Selecting function 21 from the
control panel in Service functions. Then continue with the next step.
2. On the Service Tools display, select Start a Service Tool > Product activity log > Analyze log.
3. On the Select Subsystem Data display, select the option to view All Logs.
Note: You can change the From: and To: Dates and Times from the 24-hour default if the time that
the customer reported having the problem was more than 24 hours ago.
4. Use the defaults on the Select Analysis Report Options display by pressing Enter.
5. Search the entries on the Log Analysis Report display for system reference codes associated with the
missing disk units.
6. Go to Reference Code Finder to correct the problem. This ends the procedure.
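The note in step 3 of cause code 000A says the From: and To: values can be changed when the customer reported the problem more than 24 hours ago. A minimal Python sketch of that check follows. The function name and the sample timestamps are hypothetical, and the sketch assumes both times are expressed in the same time zone.

```python
from datetime import datetime, timedelta

# Default range of the Select Subsystem Data display, per the note in step 3.
DEFAULT_WINDOW = timedelta(hours=24)


def needs_wider_window(reported_at: datetime, now: datetime) -> bool:
    """Return True when the reported time falls outside the 24-hour default."""
    return now - reported_at > DEFAULT_WINDOW


# Made-up timestamps for illustration only.
now = datetime(2015, 1, 2, 12, 0)
outside_default = needs_wider_window(datetime(2015, 1, 1, 11, 0), now)
inside_default = needs_wider_window(datetime(2015, 1, 2, 0, 0), now)
```

When the function returns True, the From: date and time would need to be moved back far enough to include the reported time.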
000B
Some system IOPs require cache storage be reclaimed.
1. Is the system managed by a management console?
Yes: Select DST by performing the management console action for Function 21 for the failing
partition. See Control panel functions on the management console. Then continue with the next
step.
No: Select DST using Function 21 for the failing partition. See Selecting function 21 from the
control panel in Service functions. Then continue with the next step.
2. Reclaim the cache adapter card storage. See Reclaiming IOP cache storage.
Note: The system operator may want to restore data from the most recent saved tape after you
complete the repair.
This ends the procedure.
000C
One of the mirror protected disk units has no more alternate sectors to assign.
1. Determine the failing unit by type, model, serial number or address given in words 4-7. See System
reference code information.
2. See the service information for the specific storage device. Use the disk unit reference code listed
below for service information entry.
432x 102E, 433x 102E, 660x 102E, 671x 102E, 673x 102E (Reference Code Finder).
This ends the procedure.
000D
The system disk capacity has been exceeded.
For more information about disk capacity, see iSeries® Handbook, GA19-5486-20.
000E
Start compression failure.
1. Select Manual mode and perform an IPL to DST for the failing partition. See Performing an IPL to
dedicated service tools.
2. Correct the problem by doing the following:
a. Select Work with disk units > Work with disk unit recovery > Recover from start compression
failure.
b. Follow the online instructions. This ends the procedure.
0010
The disk configuration has changed.
The operating system must be installed again, and all customer data must be restored.
1. Select Manual mode on the control panel.
2. Perform an IPL to reinstall the operating system.
3. The customer must restore all data from the latest system backup. This ends the procedure.
0011
The serial number of the control panel does not match the system serial number.
1. Select Manual mode on the control panel.
2. Perform an IPL. You will be prompted for the system serial number. This ends the procedure.
0012
The operation to write the vital product data (VPD) to the control panel failed.
Exchange the multiple function I/O processor card. See System FRU locations for information about FRU
locations for the system that you are servicing.
0015
The mirrored load-source disk unit is missing from the disk configuration. Go to step 1 on page 96 for
cause code 0002.
0016
A mirrored protected disk unit is missing. Wait six minutes. If the same reference code appears, go to
step 1 on page 96 for cause code 0002.
0017
One or more disk units have a lower level of mirrored protection than originally configured.
1. Select Manual mode and perform an IPL to DST for the failing partition. See Performing an IPL to
dedicated service tools.
2. Review the detailed display, which shows the new and the previous levels of mirrored protection.
This ends the procedure.
0018
Load-source configuration problem. The load-source disk unit is using mirrored protection and is
configured at an incorrect address. Ensure that the load-source disk unit is in device location 1.
0019
One or more disk units were formatted incorrectly.
The system will continue to operate normally. However, it will not operate at optimum performance. To
repair the problem, perform the following steps:
1. Record the unit number and serial number of the disk unit that is formatted incorrectly.
2. Sign on to DST. See Accessing dedicated service tools.
3. Select Work with disk units > Work with disk unit configuration > Remove unit from
configuration.
4. Select the disk unit you recorded earlier in this procedure.
5. Confirm the option to remove data from the disk unit. This step may take a long time because the
data must be moved to other disk units in the auxiliary storage pool (ASP).
6. When the remove function is complete, select Add unit to configuration.
7. Select the disk unit you recorded earlier in this procedure.
8. Confirm the add. The disk unit is formatted during functional operation. This ends the procedure.
001A
The load-source disk unit data is down-level.
The load-source disk unit is mirror protected. The system is using the load-source disk unit that does not
have the current level of data.
1. Select Manual mode and perform an IPL to DST for the failing partition. See Performing an IPL to
dedicated service tools. Does the Disk Configuration Error Report display appear?
No: The system is now using the correct load-source. This ends the procedure.
Yes: Continue with the next step.
2. Does a "Load source failure" message appear in the list?
Yes: Continue with the next step.
No: The system is now using the correct load-source. This ends the procedure.
3. Select option 5, press F11, and then press Enter to display details.
The load-source type, model, and serial number information that the system needs is displayed on the
console.
Is the load-source disk unit (displayed on the console) attached to an MSIOP that cannot be used for a
load-source?
Yes: Contact your next level of support. This ends the procedure.
No: The load-source disk unit is missing. Go to step 1 on page 96 for cause code 0002.
001C
The disk units that are needed to update the system configuration are missing.
Perform an IPL by doing the following:
1. Select Manual mode on the control panel.
2. Perform an IPL. Use the IPL information to determine the cause of the problem. This ends the
procedure.
001D
1. Is the Disk Configuration Attention Report or the Disk Configuration Warning Report displayed?
Yes: Continue with the next step.
No: Ask your next level of support for assistance. This ends the procedure.
2. On the Bad Load Source Configuration message line, select 5, and press Enter to rebuild the
load-source configuration information. If there are other types of warnings, select option 5 on the
warnings, and correct the problem. This ends the procedure.
001E
The load-source data must be restored.
001F
Licensed Internal Code was installed on the wrong disk unit of the load-source mirrored pair.
The system performed an IPL on a load source that may not contain the same level of Licensed Internal
Code that was installed on the other load source. The type, model, and address of the active device are
displayed in words 4-7 of the SRC.
Choose from the following options:
1. If the load-source disk unit in position 1 contains the correct level of Licensed Internal Code, perform
the following steps:
a. Select Manual mode and perform an IPL to DST for the failing partition. See Performing an IPL to
dedicated service tools. Is the Disk Configuration Attention Report or Disk Configuration Warning
Report displayed?
Yes: Select option 5 on the Incorrect Licensed Internal Code Install message line. When the
Display Incorrect Licensed Internal Code Install display appears on the console, press Enter.
No: The system is now using the correct load source. This ends the procedure.
2. If the load-source disk unit in position 1 of the system unit does not contain the correct level of
Licensed Internal Code, restore the Licensed Internal Code to the disk unit in position 1 of the system
unit. This ends the procedure.
0020
The system appears to be a one disk unit system. Select Manual mode and perform an IPL to DST for the
failing partition. See Performing an IPL to dedicated service tools.
0021
The system password verification failed.
1. Select Manual mode and perform an IPL to DST for the failing partition. See Performing an IPL to
dedicated service tools.
2. When prompted, enter the correct system password. If the correct system password is not available
perform the following steps:
a. Select Bypass the system password.
b. Have the customer contact the marketing representative immediately to order a new system
password from your service provider. This ends the procedure.
0022
A different compression status was expected on a reporting disk unit. Accept the warning. The reported
compression status will be used as the current compression status.
0023
There is a problem with a disk unit subsystem. As a result, there are missing disk units in the system.
The system is capable of IPLing in this state.
1. Is the system managed by a management console?
Yes: Select DST by performing the management console action for Function 21 for the failing
partition. See Control panel functions on the management console. Then continue with the next
step.
No: Select DST using Function 21 for the failing partition. See Selecting function 21 from the
control panel in Service functions. Then continue with the next step.
2. On the Service Tools display, select Start a Service Tool > Product activity log > Analyze log.
3. On the Select Subsystem Data display, select the option to view All Logs.
Note: You can change the From: and To: Dates and Times from the 24-hour default if the time that
the customer reported having the problem was more than 24 hours ago.
4. Use the defaults on the Select Analysis Report Options display by pressing Enter.
5. Search the entries on the Log Analysis Report display for system reference codes associated with the
missing disk units.
6. Go to the Reference Code Finder topic and use the SRC information to correct the problem. This ends
the procedure.
0024
The system type or system unique ID needs to be entered.
1. Select Manual mode and perform an IPL to DST for the failing partition. See Performing an IPL to
dedicated service tools.
2. When prompted, enter the correct system type or system unique ID. This ends the procedure.
0025
Hardware Resource Information Persistence disabled.
1. Select Manual mode and perform an IPL to DST for the failing partition. See Performing an IPL to
dedicated service tools.
2. Contact your next level of support for instructions on how to enable the Hardware Resource
Information Persistence function. This ends the procedure.
0026
A disk unit is incorrectly configured for an LPAR system.
1. Select Manual mode and perform an IPL to DST for the failing partition. See Performing an IPL to
dedicated service tools.
2. On the Service Tools display, select Start a Service Tool > Product activity log > Analyze log.
3. On the Select Subsystem Data display, select the option to view All Logs.
Note: You can change the From: and To: Dates and Times from the 24-hour default if the time that
the customer reported having the problem was more than 24 hours ago.
4. Use the defaults on the Select Analysis Report Options display by pressing Enter.
5. Search the entries on the Log Analysis Report display for system reference codes (B6xx 53xx) that are
associated with the error.
6. Go to the Reference Code Finder topic and use the SRC information to correct the problem. This ends
the procedure.
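Steps 5 and 6 above involve picking out reference codes of the form B6xx 53xx from the Log Analysis Report. If the entries have been transcribed into text, the wildcard comparison can be sketched in Python as follows. This is an illustration only, not an IBM-supplied tool: the function name matches_src_pattern is hypothetical, and the sketch assumes that each SRC is written as 8 hexadecimal characters, with or without a space in the middle, and that each lowercase x in the pattern stands for any single hexadecimal digit.

```python
import re


def matches_src_pattern(src, pattern):
    """Return True if an SRC matches a wildcard pattern such as 'B6xx 53xx'.

    Hypothetical helper: a lowercase 'x' in the pattern is treated as
    "any single hexadecimal digit". Spaces are ignored on both sides.
    """
    src = src.replace(" ", "").upper()
    pat = pattern.replace(" ", "")
    # Build a regular expression one character at a time.
    regex = "".join(
        "[0-9A-F]" if ch == "x" else re.escape(ch.upper()) for ch in pat
    )
    return re.fullmatch(regex, src) is not None


# Made-up sample entries, for illustration only.
sample_entries = ["B600 5311", "B6005094", "A600 5090"]
hits = [s for s in sample_entries if matches_src_pattern(s, "B6xx 53xx")]
print(hits)  # prints ['B600 5311']
```

The same helper can be reused for other wildcard forms that appear in these procedures, such as B6xx 5094 or B6xx 5090.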
0027
The user ASP has overflowed. Contact your next level of support.
002A
The command to examine the status of the IOA cache storage failed. Contact your next level of support.
002B
Data from the user ASP has overflowed into the system ASP because the user ASP was full. Either add
more disk units to the user ASP or delete data from the user ASP so that there is enough capacity in the
user ASP to hold the data which has overflowed. Then select Manual mode and perform an IPL to DST
for the failing partition. See Performing an IPL to dedicated service tools. At the display, recover the
overflowed user ASP. This will move the overflowed data from the system ASP back to the user ASP. If
you need assistance, contact your next level of support.
0031
A problem was detected with the installation of Licensed Internal Code service displays. The cause may
be defective media, the installation media being removed too early, a device problem or a Licensed
Internal Code problem.
v Ask your next level of support for assistance. Characters 13-16 of the top 16 character line of function
12 (4 rightmost characters of word 3) contain information regarding the install error.
v If the customer does not require the service displays to be in the national language, you may be able to
continue by performing another system IPL. This ends the procedure.
0033
System model not supported. This model of hardware does not support the System Licensed Internal
Code version and release that is being used. Use a supported version and release of the System Licensed
Internal Code.
0034
Insufficient main storage capacity.
There is not enough main storage capacity. For details about how much more capacity is required, see the
"Insufficient Main Storage Capacity" screen that is displayed when the system is IPLed in manual mode.
Typically, this error occurs when you have moved memory between logical partitions, and one partition
no longer has a sufficient amount of main storage.
0035
Data from a User ASP has overflowed into the System ASP (ASP 1). There is not enough free space in the
User ASP to move the overflowed data from the System ASP back into the User ASP. The system will
continue to run in this condition, but if a disk failure in the System ASP causes the System ASP to be
cleared, the data in the User ASP will also be cleared out.
You should delete some files or objects from the User ASP so that enough free space exists in the User
ASP to allow the data that is overflowed into the System ASP to be moved back.
0037
One or more functional connections to a disk unit in a multi-path environment have not been detected.
The connections to the disk unit were established by running ESS Specialist. If you use the server in this
state, you may cause a loss of data. You must ensure that all of the functional connections are still
established between the disk and the Input/Output Adapters (IOAs) attached to this server and this
logical partition. If there is an IOA which has a connection to the disk unit that has been moved to a
different logical partition or different server, you should not continue with the IPL. Notify your next level
of support.
0038
Verification of the encryption key failed. Using the backup media which contains the correct encryption
key value, restore the system. Contact your next level of support.
0039
The disk unit is attached to the partition in a dual storage adapter configuration. The secondary adapter
is missing or disabled. The primary adapter is working, so the disk unit is available to the partition.
Contact your next level of support to determine why the secondary controller is missing or disabled.
003A
The disk unit is attached to the partition in a dual storage adapter configuration. The secondary adapter
has failed. The primary adapter is working, so the disk unit is available to the partition. Repair or replace
the failed secondary adapter.
0099
A Licensed Internal Code program error occurred. Ask your next level of support for assistance.
LICIP12
Use this procedure to isolate an Independent Auxiliary Storage Pool (IASP) vary on failure.
Message CPDB8E0 occurred if the user attempted to vary on the IASP. Read the Danger notices in
“Licensed Internal Code isolation procedures” on page 89 before continuing with this procedure.
How to find the cause code
1. If the system has logical partitions, perform this procedure from the logical partition that reported the
problem. To determine if the system has logical partitions, go to Determining if the system has logical
partitions before continuing with this procedure.
2. Were you given a cause code by another procedure?
No: Continue with the next step.
Yes: Use the cause code given by the other procedure. Then go to step 4 on page 106.
3. Look at the characters in word 3. You can obtain these characters by doing the following:
a. On the command line, enter the Start System Service Tools (STRSST) command. If you cannot get
to SST, use function 21 to get to DST. Do not IPL the system to get to DST.
b. On the Start Service Tools Sign On display, type in a User ID with service authority and password.
c. Select Start a Service Tool > Hardware Service Manager > Work with service action log.
d. On the Select Timeframe display, change the From: Date and Time to a date and time prior to
when the user attempted to vary on the IASP.
e. Search for a B6005094 system reference code that occurred at the time the user attempted to vary
on the IASP. Display the failing item information for this entry.
f. Select the function key for Additional details.
g. The 4 leftmost characters of word 3 are the cause code to be used in this procedure.
4. Find the cause code below:
“0002”
“0004” on page 108
“0007” on page 108
“0009” on page 108
“000A” on page 108
“000B” on page 108
“000D” on page 108
“000E” on page 109
“002C” on page 109
“002D” on page 109
“002E” on page 109
“002F” on page 109
“0030” on page 109
“0032” on page 109
“0099” on page 109
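Step 3 takes the cause code from the 4 leftmost characters of word 3, and step 4 looks that code up in the list above. A minimal Python sketch of that lookup follows. It is an illustration under stated assumptions, not part of the service tools: the names LISTED_CAUSE_CODES and cause_code_from_word3 are hypothetical, and word 3 is assumed to have been copied by hand as 8 hexadecimal characters.

```python
# Cause codes listed in step 4 of this procedure.
LISTED_CAUSE_CODES = {
    "0002", "0004", "0007", "0009", "000A", "000B", "000D", "000E",
    "002C", "002D", "002E", "002F", "0030", "0032", "0099",
}

HEX_DIGITS = set("0123456789ABCDEF")


def cause_code_from_word3(word3):
    """Return the 4 leftmost characters of word 3 (hypothetical helper)."""
    word = word3.replace(" ", "").upper()
    # Assumption: word 3 is 8 hexadecimal characters.
    if len(word) != 8 or not set(word) <= HEX_DIGITS:
        raise ValueError("word 3 must be 8 hexadecimal characters: %r" % word3)
    return word[:4]


code = cause_code_from_word3("002D0000")  # made-up word 3 value
print(code, code in LISTED_CAUSE_CODES)   # prints: 002D True
```

A code that is not in the listed set is a prompt to recheck the transcription of word 3 before continuing.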
0002
Disk units are missing from the IASP disk configuration.
1. Have you installed a new disk enclosure in a disk unit and not restored the data to the disk unit?
v No: Continue with the next step.
v Yes: Ignore SRC A600 5094.
Continue with the disk unit exchange recovery procedure. This ends the procedure.
2. Use words 1-9 from the information in the Service Action Log to determine the disk unit that is
missing from the configuration:
v Word 4 contains the IOP direct select address.
v Word 5 contains the unit address.
v Word 6 contains the disk unit type, level and model number.
v Word 7 contains the disk unit serial number.
v Word 8 contains the number of missing disk units.
Are the problem disk units 432x, 660x, or 671x Disk Units?
v Yes: Continue with the next step.
v No: Attempt to get all devices attached to the IOP to Ready status by performing the following:
a. The IOP address (IOP Direct Select Address) to use is Word 4.
b. Verify the following, and correct if necessary:
– Ensure all cable connections are made correctly and are tight.
– Ensure the configuration within the device is correct.
– Ensure all storage devices are powered on and ready.
c. Continue with the next step.
3. Perform the following steps:
Select System Service Tools (SST) > Work with disk units > Display disk configuration > Display
disk configuration status.
Are any disk units missing from the IASP configuration? Missing disk units are indicated with an asterisk (*).
Yes: Continue with the next step.
No: This ends the procedure.
4. Use the Service Action Log to determine if there are any entries other than B6xx 5094 for the missing
disk units or the IOA or IOP that is controlling them. See Using the product activity log.
Are there any entries in the Service Action Log other than B6xx 5094 for the missing disk units or
the IOA or IOP that is controlling them?
No: Continue with the next step.
Yes: Use the information in the Service Action Log to solve the problem. See Using the product
activity log. This ends the procedure.
5. Did you enter this procedure because there was a B6xx 5094 cause code of 0030?
v No: Continue with the next step.
v Yes: Work with the customer to recover the unknown configuration source disk unit.
Use a workstation with System i® Navigator installed to select the disk pool with the problem,
and then select Recover unknown configuration source for this disk pool. This ends the
procedure.
6. Use Hardware Service Manager to display logical resources connected to the IOP. See Hardware
service manager.
7. Is every device attached to the IOP failing?
v Yes: Continue with the next step.
v No: Are all of the disk units that are attached to one IOA missing?
– No: Continue with the next step.
– Yes: Exchange the IOA. Use the IOP direct select address and the first character of the unit
address from step 2 on page 106 to find the location. See System FRU locations. This ends the
procedure.
8. Is there more than one storage IOA attached to the IOP?
v Yes: Exchange the IOP. Use the IOP direct select address from step 2 on page 106 to find the
location. See System FRU locations. This ends the procedure.
v No: Go to step 10.
9. Go to the service information for the specific disk unit that is listed below and perform the action
indicated. Then return here and answer the following question.
v 2105 Disk Units: Use SRC 3002 and exchange the FRUs shown one at a time.
v 432x, 660x, 671x Disk Units: Use SRC 3002 and exchange the FRUs shown one at a time.
Did the disk unit service information correct the problem?
No: Continue with the next step.
Yes: This ends the procedure.
10. Perform the following steps:
a. Exchange the IOA. Use the IOP direct select address and the first character of the unit address
from step 2 on page 106 to find the location. See System FRU locations.
b. If exchanging the IOA does not correct the problem, exchange the IOP. Use the IOP direct select
address from step 2 on page 106 to find the location. See System FRU locations.
c. If exchanging the IOP does not correct the problem, exchange the failing items in the following
FRU list starting with the first item in the list.
1) FI01140
2) System backplane
3) FI00580
4) AJDG301
This ends the procedure.
0004
Some disk units are unprotected but configured into a mirrored IASP. These units were originally DPY
protected but protection was disabled.
Direct the customer to take the actions necessary to start protection on these disk units. This ends the
procedure.
0007
Some of the configured disk units have device parity protection disabled when the system expected
device parity protection to be enabled.
1. Select Manual mode and perform an IPL to DST. See Performing an IPL to dedicated service tools.
2. Correct the problem by doing the following:
a. Select Work with disk units > Work with disk unit recovery > Correct device parity protection
mismatch.
b. Follow the on-line instructions. This ends the procedure.
0008
A disk unit has no more alternate sectors to assign.
1. Determine the failing unit by type, model, serial number or address given in words 4-7. See The
system reference code format description.
2. See the service information for the specific storage device. Use the disk unit reference code listed
below for service information entry.
432x 102E, 660x 102E, 671x 102E. This ends the procedure.
0009
The procedure to restore a disk unit from the tape unit did not complete.
Continue with the disk unit exchange recovery procedure. This ends the procedure.
000A
There is a problem with a disk unit subsystem. As a result, there are missing disk units in the system.
Use the Service Action Log to find system reference codes associated with the missing disk units by
changing the From: Date and Time on the Select Timeframe display to a date and time prior to when the
user attempted to vary on the IASP. For information about how to use the Service Action Log, see
Searching the service action log. This ends the procedure.
000B
Some system IOPs require cache storage be reclaimed.
1. Start SST.
2. Reclaim the cache adapter card storage by performing the following:
a. Select Work with disk units > Work with disk unit recovery > Reclaim IOP Cache Storage.
b. Follow the on-line instructions to reclaim cache storage.
c. After you complete the repair, the system operator may want to restore data from the most
recently saved tape. This ends the procedure.
000D
The system disk capacity has been exceeded.
For more information about disk capacity, see the iSeries Handbook. This ends the procedure.
000E
Start compression failure.
1. Select Manual mode and perform an IPL to DST. See Performing an IPL to dedicated service tools.
2. Correct the problem by doing the following:
a. Select Work with disk units > Work with disk unit recovery > Recover from start compression
failure.
b. Follow the on-line instructions. This ends the procedure.
002C
A Licensed Internal Code program error occurred.
Ask your next level of support for assistance. This ends the procedure.
002D
The IASP configuration source disk unit data is down-level.
The system is using the IASP configuration source disk unit that does not have the current level of data.
Work with the customer to recover the configuration. On a workstation with System i Navigator installed,
select the disk pool with the problem, and then select Recover configuration. This ends the procedure.
002E
The Independent ASP is assigned to another system or a Licensed Internal Code program error occurred.
Work with the customer to check other systems to determine if the Independent ASP has been assigned
to one of them. If the Independent ASP has not been assigned to another system, ask your next level of support for
assistance. This ends the procedure.
002F
The system version and release are at a different level than the IASP version and release.
The system version and release must be upgraded to be the same as the system version and release in
which the IASP was created. This ends the procedure.
0030
The mirrored IASP configuration source disk unit has a disk configuration status of unknown and is
missing from the disk configuration.
Go to step 1 on page 106 for cause code 0002.
0032
A Licensed Internal Code program error occurred.
Ask your next level of support for assistance. This ends the procedure.
0099
A Licensed Internal Code program error occurred.
Ask your next level of support for assistance. This ends the procedure.
LICIP13
A disk unit seems to have stopped communicating with the system.
The system has stopped normal operation until the cause of the disk unit failure is found and corrected.
Ensure you have read the Danger notices in “Licensed Internal Code isolation procedures” on page 89
before continuing with this procedure.
If the disk unit that stopped communicating with the system has mirrored protection active, normal
operation of the system stops for one to two minutes. Then the system suspends mirrored protection for
that disk unit and continues normal operation.
Note: Do not power off the system or partition using the white button, function 08, ASMI, or
management console immediate power-off when performing this procedure. If this procedure or other
isolation procedures referenced by this procedure direct you to IPL or power off the system,
v perform a partition main storage dump (see Performing dumps), or
v if additional dump information is not needed, perform a function 03 IPL or restart the system or
partition using the management console.
1. If the system has logical partitions, perform this procedure from the logical partition that reported
the problem. To determine if the system has logical partitions, go to Determining if the system has
logical partitions before continuing with this procedure.
2. Was a problem summary form completed for this problem?
No: Continue with the next step.
Yes: Use the problem summary form information and go to step 4.
3. Fill out a Problem Reporting Form completely with the instructions provided.
4. Recovery from a device command time-out may have caused the communications loss condition
(indicated by an SRC on the control panel or in the management console). This communications loss
condition has the following symptoms:
v The A6xx SRC does not increment within two minutes.
v The system continues to run normally after it recovers from the communications loss condition
and the reference code is cleared from the control panel.
Does the communication loss condition have the above symptoms?
Yes: Continue with the next step.
No: Go to step 6.
5. Verify that all Licensed Internal Code PTFs have been applied to the system. Apply any Licensed
Internal Code PTFs that have not been applied to the system. Does the intermittent condition
continue?
Yes: Print all product activity logs. Print the LIC logs with a major code of 1000. Provide this
information to your next level of support. This ends the procedure.
No: This ends the procedure.
6. Is the storage hosted by another partition?
Yes: Contact your next level of support.
No: Continue with the next step.
7. A manual reset of the IOP may clear the attention reference code. Perform the following steps:
If you are working from the control panel:
a. Select Manual mode on the control panel.
b. Select Function 25 and press Enter.
c. Select Function 26 and press Enter.
d. Select Function 67 and press Enter to reset the IOP.
e. Wait 10 minutes.
f. Select Function 25 and press Enter to disable the service functions on the control panel.
If you are working from the HMC:
a. In the navigation area, select Systems Management.
b. In the contents area, open the server on which the logical partition is located.
c. In the contents area, select the logical partition.
d. Select Serviceability > Control Panel Functions.
e. Select (67) Disk Unit IOP Reset/Reload.
f. Wait 10 minutes.
Did the reset successfully clear the control panel SRC or management console panel value and can
commands be entered on the partition console?
No: Continue with the next step.
Yes: Look for a Service Action Log (SAL) entry since the last IPL, and use it to fix the problem
(see Searching the service action log). If a B6xx 5090 SRC occurred since the last IPL, look for
other SRC entries and take action on them first. This ends the procedure.
8. Is the SRC the same reference code that sent you here?
Yes: The same reference code occurred. Continue with the next step.
No: Collect all words of the reference code and perform problem analysis to resolve the new
problem. This ends the procedure.
9. Powering off and powering on the affected IOP domain may clear the attention reference code.
Perform the following steps:
If you are working from the control panel:
a. Select Manual mode on the control panel.
b. Select Function 25 and press Enter.
c. Select Function 26 and press Enter.
d. Select Function 68 and press Enter to power off the domain.
e. After the domain has been powered off or 10 minutes have passed, select Function 69 and press
Enter to power on the domain.
f. Wait 10 minutes.
g. Select Function 25 and press Enter to disable the service functions on the control panel.
If you are working from the HMC:
a. In the navigation area, select Systems Management.
b. In the contents area, open the server on which the logical partition is located.
c. In the contents area, select the logical partition.
d. Select Serviceability > Control Panel Functions.
e. Select (68) Concurrent Maintenance Power Off Domain.
f. After the domain has been powered off or 10 minutes have passed, select (69) Concurrent
Maintenance Power On Domain.
g. Wait 10 minutes.
Did this successfully clear the control panel SRC or management console panel value, and can
commands be entered on the partition console?
No: Continue with the next step.
Yes: Look for a SAL entry since the last IPL, and use it to fix the problem (see Searching the
service action log). If a B6xx 5090 SRC occurred since the last IPL, look for other SRC entries and
take action on them first. This ends the procedure.
10. Is the SRC the same reference code that sent you here?
Yes: The same reference code occurred. Continue with the next step.
No: Collect all words of the reference code and perform problem analysis to resolve the new
problem. This ends the procedure.
11. Perform a main storage dump, then perform an IPL by performing the following:
If you are working from the control panel:
a. Select Manual mode on the control panel.
b. Select Function 22 and press Enter to dump the main storage to the load-source disk unit.
c. Wait for SRC A100 300x to occur, indicating that the dump is complete.
d. Then perform an IPL to DST (see Performing an IPL to dedicated service tools).
If you are working from the HMC:
a. In the navigation area, select Systems Management.
b. In the contents area, open the server on which the logical partition is located.
c. In the contents area, select the logical partition.
d. Select Operations > Restart.
e. In the Restart Partition window, select the Dump restart option.
Does a different SRC occur, or does a display appear on the console showing reference codes?
No: Continue with the next step.
Yes: Perform problem analysis to correct the new problem. This ends the procedure.
12. Does the same reference code occur?
v Yes: Continue with the next step.
v No: The problem is intermittent. Perform the following:
a. Print the system product activity log for the magnetic storage subsystem and print the LIC
logs with a major code of 1000.
b. Copy the main storage dump to removable media (see Managing dumps).
c. Contact your next level of support and provide them with this information. This ends the
procedure.
13. Are characters 7-8 of the top 16 character line of function 12 (2 rightmost characters of word 2) equal
to 13 or 17?
Yes: Continue with the next step.
No: Go to step 16 on page 113.
14. Use the word 1 through 9 information recorded on the Problem summary form to determine the disk
unit that stopped communicating with the system:
v Characters 9-16 of the top 16 character line of function 12 (word 3) contain the IOP direct select
address.
v Characters 1-8 of the bottom 16 character line of function 12 (word 4) contain the unit address.
v Characters 1-8 of the top 16 character line of function 13 (word 6) may contain the disk unit type,
level and model number.
v Characters 13-16 of the top 16 character line of function 13 (4 rightmost characters of word 7) may
contain the disk unit reference code.
v Characters 1-8 of the bottom 16 character line of function 13 (word 8) may contain the disk unit
serial number.
Note: For 2105 and 2107 disk units, characters 4-8 of the bottom 16 character line of function 13 (5
rightmost characters of word 8) contain the disk unit serial number.
15. Is the disk unit reference code 0000?
v No: Using the information from step 14, find the table for the indicated disk unit type. Perform
problem analysis for the disk unit reference code. This ends the procedure.
v Yes: Perform the following steps:
a. Determine the IOP type by using characters 9-12 of the bottom 16 character line of function 13
(4 leftmost characters of word 9).
b. Find the unit reference code table for the IOP type. Determine the unit reference code by using
characters 13-16 of the bottom 16 character line of function 13 (4 rightmost characters of word
9).
c. Perform problem analysis for the unit reference code. This ends the procedure.
16. Are characters 7-8 of the top 16 character line of function 12 (the two rightmost characters of word 2)
equal to 27?
Yes: Continue with the next step.
No: Go to step 20.
17. Use the word 1 through 9 information recorded on the Problem summary form to determine the disk
unit that stopped communicating with the system:
v Characters 9-16 of the top 16 character line of function 12 (word 3) contain the IOP direct select
address.
v Characters 1-8 of the bottom 16 character line of function 12 (word 4) contain the disk unit
address.
v Characters 9-16 of the bottom 16 character line of function 12 (word 5) contain the disk unit type,
level and model number.
v Characters 1-8 of the bottom 16 character line of function 13 (word 8) contain the disk unit serial
number.
Note: For 2105 and 2107 Disk Units, characters 4-8 of the bottom 16 character line of function 13
(5 rightmost characters of word 8) contain the disk unit serial number.
v Characters 13-16 of the bottom 16 character line of function 13 (4 rightmost characters of word 9)
contain the disk unit reference code.
18. Is the disk unit reference code 0000?
v No: Continue with the next step.
v Yes: Find the table for the indicated disk unit type. Then find unit reference code (URC) 3002 in
the table, and exchange the FRUs for that URC, one at a time.
Note: Do not perform any other isolation procedures that are associated with URC 3002.
This ends the procedure.
19. Are characters 9-16 of the bottom 16 character line of function 13 (word 9) B6xx 51xx?
Yes: Using the B6xx table, perform problem analysis for the 51xx unit reference code. This ends
the procedure.
No: Using the information from step 17, find the table for the indicated disk unit type. Perform
problem analysis for the disk unit reference code. This ends the procedure.
20. Are the 2 rightmost characters of word 2 on the Problem summary form equal to 62?
No: Use the information in characters 9-16 of the bottom 16 character line of function 13 (word 9),
instead of the information in word 1, for the reference code. This ends the procedure.
Yes: Continue with the next step.
21. Are characters 9-16 of the top 16 character line of function 12 (word 3) equal to 00010004?
Yes: Continue with the next step.
No: Go to step 24 on page 114.
22. Are characters 13-16 of the bottom 16 character line of function 12 (4 rightmost characters of word 5)
equal to 0000?
No: Continue with the next step.
Yes: Go to step 25 on page 114.
23. Note the following:
v Characters 13-16 of the bottom 16 character line of function 12 (4 rightmost characters of word 5)
contain the disk unit reference code.
v Characters 1-8 of the top 16 character line of function 13 (word 6) contain the disk unit address.
v Characters 9-16 of the top 16 character line of function 13 (word 7) contain the IOP direct select
address.
v Characters 1-8 of the bottom 16 character line of function 13 (word 8) contain the disk unit type,
level and model number.
Find the table for the disk unit type (characters 1-4 of the bottom 16 character line of function 13 - 4
leftmost characters of word 8), and use characters 13-16 of the bottom 16 character line of function 12
(4 rightmost characters of word 5) as the unit reference code. This ends the procedure.
24. Are characters 9-16 of the top 16 character line of function 12 (word 3) equal to 0002000D?
v Yes: Continue with the next step.
v No: Use the information in characters 9-16 of the bottom 16 character line of function 13 (word 9),
instead of the information in word 1 for the reference code, and perform problem analysis.
– Characters 1-8 of the top 16 character line of function 13 (word 6) may contain the disk unit
address.
– Characters 9-16 of the top 16 character line of function 13 (word 7) may contain the IOP direct
select address.
– Characters 1-8 of the bottom 16 character line of function 13 (word 8) may contain the disk unit
type, level and model number. This ends the procedure.
25. Note the following:
v Characters 1-8 of the top 16 character line of function 13 (word 6) contain the disk unit address.
v Characters 9-16 of the top 16 character line of function 13 (word 7) contain the IOP direct select
address.
v Characters 1-8 of the bottom 16 character line of function 13 (word 8) contain the disk unit type,
level and model number.
Find the table for the disk unit type (characters 1-4 of the bottom 16 character line of function 13 - 4
leftmost characters of word 8) and use 3002 as the unit reference code. Exchange the FRUs for URC
3002 one at a time. This ends the procedure.
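Steps 13 through 25 of LICIP13 repeatedly convert character positions on the function 12 and function 13 display lines into word numbers, and then branch on the 2 rightmost characters of word 2. The following Python sketch shows that mapping and the branch. It is an illustration, not an IBM utility: the function names split_words and next_step are hypothetical, and the sketch assumes that each of the four 16 character lines has been copied by hand as a string.

```python
def split_words(f12_top, f12_bottom, f13_top, f13_bottom):
    """Map the four 16 character lines to words 2 through 9.

    Function 12 top line    -> words 2 and 3
    Function 12 bottom line -> words 4 and 5
    Function 13 top line    -> words 6 and 7
    Function 13 bottom line -> words 8 and 9
    """
    words = {}
    lines = [f12_top, f12_bottom, f13_top, f13_bottom]
    for index, line in enumerate(lines):
        text = line.replace(" ", "").upper()
        if len(text) != 16:
            raise ValueError("expected 16 characters, got %r" % line)
        words[2 + 2 * index] = text[:8]   # characters 1-8
        words[3 + 2 * index] = text[8:]   # characters 9-16
    return words


def next_step(word2):
    """Branch on the 2 rightmost characters of word 2 (steps 13, 16, 20)."""
    code = word2[-2:]
    if code in ("13", "17"):
        return "step 14"
    if code == "27":
        return "step 17"
    if code == "62":
        return "step 21"
    return "use word 9 as the reference code"


# Made-up display values, for illustration only.
words = split_words("0000006200010004", "1111111122220000",
                    "3333333344444444", "6666666655555555")
print(words[3], next_step(words[2]))  # prints: 00010004 step 21
```

Keeping the lines as strings, rather than converting them to integers, preserves leading zeros and letters such as the D in 0002000D.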
LICIP14
Licensed Internal Code detected a card slot test failure.
If you need additional information for failing part numbers, location codes, or removal and replacement
procedures, see Part locations and location codes. Select your machine type and model number to find
additional location codes, part numbers, or replacement procedures for your system.
1. Has the I/O adapter moved to a new card location?
Yes: Continue with the next step.
No: Go to step 4.
2. Perform one of the following, and then continue with the next step:
v Use the concurrent maintenance option in Hardware Service Manager in SST/DST to power off,
remove, reinsert, and power on the I/O adapter.
OR
v Power off the system, remove and reinsert the I/O adapter. Then IPL the system.
3. Does the reference code occur again for this same I/O adapter?
v Yes: Continue with the next step.
v No: No further service action is needed.
This ends the procedure.
4. Move the I/O adapter to a different card location that has no I/O processors in the PCI bridge set by
performing one of the following, and then continue with the next step:
v Use the concurrent maintenance option in Hardware Service Manager in SST/DST to power off,
remove the I/O adapter, install the I/O adapter in a different card location, and power on the I/O
adapter.
OR
v Power off the system, remove the I/O adapter, install the I/O adapter in a different card location,
and then IPL the system.
5. Does the same reference code occur again for this I/O adapter?
v Yes: Replace the I/O adapter.
This ends the procedure.
v No: Replace the backplane.
This ends the procedure.
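The decision flow of LICIP14 can be summarized in a few lines of Python. This is only a sketch for reading the procedure; the function name licip14_action and its parameter names are hypothetical, and the returned strings paraphrase the actions in steps 3 and 5.

```python
def licip14_action(adapter_was_moved, fails_after_reseat, fails_in_new_location):
    """Return the final action of LICIP14 for the given observations.

    adapter_was_moved:     answer to step 1
    fails_after_reseat:    answer to step 3 (ignored if the adapter was not moved)
    fails_in_new_location: answer to step 5
    """
    # Steps 1-3: an adapter that was moved is reseated first.
    if adapter_was_moved and not fails_after_reseat:
        return "No further service action is needed"
    # Steps 4-5: move the adapter to a different card location and retest.
    if fails_in_new_location:
        return "Replace the I/O adapter"
    return "Replace the backplane"


print(licip14_action(False, False, True))  # prints: Replace the I/O adapter
```

The logic is that a failure that follows the adapter to a new card location points to the adapter, while a failure that stays with the original location points to the backplane.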
LICIP15
Use this procedure to help you recover from an initial program load (IPL) failure.
1. Is the system managed by Hardware Management Console (HMC), Systems Director Management
Console (SDMC), or Integrated Virtualization Manager (IVM)?
Yes: Continue with the next step.
No: Go to step 4.
2. Check the LPAR configuration to ensure that the load source and alternate load source devices are
valid. Is the LPAR configuration correct?
Yes: Continue with the next step.
No: Correct the LPAR configuration problem. This ends the procedure.
3. Is the load source hosted by another partition?
Yes: Contact your next level of support.
No: Continue with the next step.
4. Did the failure occur when you were performing a type-D IPL?
v No: Go to step 10 on page 116.
v Yes: Perform the following steps:
a. Ensure that the device is ready and has valid install media.
b. Ensure that the device has the correct SCSI address and that any cables are properly connected
and terminated.
If a correction is made during the above checks, retry the IPL. If none of the above items resolve
the problem, continue with the next step.
5. Are the load source and alternate load source devices controlled by the same I/O adapter, and does
the load source disk unit have SLIC loaded on it?
Yes: Continue with the next step.
No: Go to step 7 on page 116.
6. Perform a type-B IPL in manual mode. Does the same SRC occur?
v No: Continue with the next step.
v Yes: Replace the following items, one at a time, and retry the IPL until the problem is resolved
(see System FRU locations):
a. The I/O adapter controlling load source and alternate load source devices.
Note: The I/O adapter may be embedded on the system unit backplane.
b. The common cable, if present, attached between both the load source and alternate load source
and the controlling I/O adapter.
c. If none of the items above resolve the problem, contact your next level of support. This ends
the procedure.
7. Replace the following items, one at a time, and retry the type-D IPL until the problem is resolved
(see System FRU locations):
a. Media in the alternate load source device
b. Device cables (if present)
c. Media device
d. Media backplane
e. I/O adapter controlling the alternate load source device
Note: The I/O adapter may be embedded on the system unit backplane.
f. If the problem persists after replacing each of these parts, contact your next level of support. This
ends the procedure.
8. You performed a type-A or type-B IPL. Is the load source I/O adapter a Fibre Channel adapter?
Yes: Continue with the next step.
No: Continue with step 10.
9. Perform a type-D IPL in manual mode to DST. Look for other SRCs and use them to resolve the
problem. If there are no SRCs, or if the SRCs do not resolve the problem, perform the actions for the
2847 3100 SRC. This ends the procedure.
10. Is the device in a valid location (see System FRU locations)?
Yes: Continue with the next step.
No: Correct the device location problem and retry the IPL. If the problem persists, continue with
the next step.
11. Perform a type-D IPL in manual mode to DST. Is the type-D IPL successful?
v No: Continue with the next step.
v Yes: Look for other SRCs and use them to resolve the problem. If there are no SRCs, or the SRCs
do not resolve the problem, replace the following items, one at a time, until the problem is
resolved (see System FRU locations):
a. Load source disk drive
b. Cables (if present)
c. Disk drive backplane
d. I/O adapter controlling the load source device
Note: The I/O adapter may be embedded on the system unit backplane.
e. Backplane that the I/O adapter is plugged into
f. If the problem persists after replacing each of these parts, contact your next level of support.
This ends the procedure.
12. The type-D IPL in manual mode to DST was not successful. Is the I/O adapter embedded on the
system unit backplane?
No: Continue with the next step.
Yes: Replace the system unit backplane and retry the IPL. If the IPL still fails, contact your next
level of support. This ends the procedure.
13. Are the load source and alternate load source controlled by the same I/O adapter?
No: Go to step 16 on page 117.
Yes: Continue with the next step.
14. Replace the I/O adapter and perform a type-A or type-B IPL. Does the IPL complete successfully?
Yes: This ends the procedure.
No: Continue with the next step.
15. Perform a type-D IPL in manual mode to DST. Is the type-D IPL successful?
v No: Continue with the next step.
v Yes: Look for other SRCs and use them to resolve the problem. If there are no SRCs, or the SRCs
do not resolve the problem, replace the following items, one at a time, until the problem is
resolved (see System FRU locations):
a. Load source disk drive
b. Cables (if present)
c. Disk drive backplane
d. I/O adapter controlling the load source device
Note: The I/O adapter may be embedded on the system unit backplane.
e. Backplane that the I/O adapter and I/O processor are plugged into
f. If the problem persists after replacing each of these parts, contact your next level of support.
This ends the procedure.
16. Replace the backplane that the I/O adapter is plugged into and retry the IPL. If the IPL still fails,
contact your next level of support. This ends the procedure.
LICIP16
Use this procedure to identify an adapter that is operational but is not located in the same partition as its
associated adapter.
An adapter has reported that its associated adapter is operational but is not located in the same partition.
Use this procedure to identify the serial number of the associated adapter, find its location, and
reassign it so that both adapters are in the same partition.
Note: If the associated adapter is located in a different IBM i partition, there might also be a B600690A
logged against the associated adapter in that partition.
1. The adapter against which the B600690A is logged has identified that its associated adapter cannot be
found in this partition. Find the resource name that this error was logged against. You can obtain it
from the Service Action Log. Then, using the resource name, perform the following steps:
a. Access SST or DST.
b. Select Start a Service Tool.
c. Select Hardware Service Manager.
d. Select Locate resource by resource name.
e. Enter the resource name that this error was logged against.
f. Take the option to Display detail for the adapter.
2. The bottom of the resource detail screen displays any combination of the following information:
Attached storage IOA resource name. :
Attached storage IOA serial number. :
Attached storage IOA link status. . :
Or
Attached auxiliary IOA resource name:
Attached auxiliary IOA serial number:
Attached auxiliary IOA link status. :
Or
Remote storage IOA resource name. . :
Remote storage IOA serial number. . :
Remote storage IOA link status. . . :
3. Using the serial number information displayed for the Attached or Remote IOA, have the customer
determine which partition currently owns the adapter with that serial number by using logical
resource or VPD utilities in each of the partitions on the system.
Note: The CCIN of the associated adapter is the first four characters of word 6 of the SRC.
4. Then, have the customer ensure that both adapters are owned by the same partition. For further
assistance, the customer should contact their software service provider. This ends the procedure.
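The note in step 3 can be illustrated with a minimal sketch. It assumes that word 6 of the SRC is available as an 8-character string. The function name and the sample value are hypothetical and are not taken from any IBM tool.

```python
def ccin_from_word6(word6: str) -> str:
    """Return the CCIN of the associated adapter from SRC word 6.

    Assumption: word 6 is supplied as an 8-character string. Per the
    procedure, the CCIN is the first four characters of that word.
    """
    word = word6.strip().upper()
    if len(word) != 8:
        raise ValueError("expected an 8-character SRC word")
    return word[:4]


# Hypothetical word 6 value, used only to show the slicing.
print(ccin_from_word6("572f0000"))
```

The result can then be compared with the CCIN shown by the logical resource or VPD utilities in each partition.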
Logical partition isolation procedure
Identify logical partition (LPAR) configuration conditions and the associated corrective actions.
DANGER
When working on or around the system, observe the following precautions:
Electrical voltage and current from power, telephone, and communication cables are hazardous. To
avoid a shock hazard:
v Connect power to this unit only with the IBM provided power cord. Do not use the IBM
provided power cord for any other product.
v Do not open or service any power supply assembly.
v Do not connect or disconnect any cables or perform installation, maintenance, or reconfiguration
of this product during an electrical storm.
v The product might be equipped with multiple power cords. To remove all hazardous voltages,
disconnect all power cords.
v Connect all power cords to a properly wired and grounded electrical outlet. Ensure that the outlet
supplies proper voltage and phase rotation according to the system rating plate.
v Connect any equipment that will be attached to this product to properly wired outlets.
v When possible, use one hand only to connect or disconnect signal cables.
v Never turn on any equipment when there is evidence of fire, water, or structural damage.
v Disconnect the attached power cords, telecommunications systems, networks, and modems before
you open the device covers, unless instructed otherwise in the installation and configuration
procedures.
v Connect and disconnect cables as described in the following procedures when installing, moving,
or opening covers on this product or attached devices.
To Disconnect:
1. Turn off everything (unless instructed otherwise).
2. Remove the power cords from the outlets.
3. Remove the signal cables from the connectors.
4. Remove all cables from the devices.
To Connect:
1. Turn off everything (unless instructed otherwise).
2. Attach all cables to the devices.
3. Attach the signal cables to the connectors.
4. Attach the power cords to the outlets.
5. Turn on the devices.
(D005)
LPRIP01
Use this procedure to isolate the problem when LPAR configuration data does not match the current
system configuration.
1. Is there only one B6005311 error logged, and is it against the load source device for the partition, in
either the primary or a secondary partition?
v Yes: Is the reporting partition the primary partition?
– Yes: Continue with the next step.
– No: Go to step 3 on page 119.
v No: Go to step 4 on page 119.
2. Was the load source disk unit migrated from another partition within the same system?
v Yes: Is this load source device intended to be the load source for the primary partition?
– Yes: To accept the load source disk unit: Go to SST/DST in the current partition and select
Work with system partitions > Recover configuration data > Accept load source disk unit.
This ends the procedure.
– No: Power off the system. Return the original load source disk to the primary partition and
perform a system IPL. This ends the procedure.
v No: The load source disk unit has not changed. Contact your next level of support. This ends the
procedure.
3. The reporting partition is a secondary partition.
Since the last IPL of the reporting partition, has one of the following events occurred?
v Has the primary partition time/date been moved backward to a time/date earlier than the
previous setting?
v Has the system serial number been changed?
v Was the load source disk unit in this secondary partition replaced intentionally with a load source
from another system or another partition from the same system?
v Yes: To accept the load source disk unit: Go to SST/DST in the current partition and select Work
with system partitions > Recover configuration data > Accept load source disk unit. This ends
the procedure.
v No: Contact your next level of support. This ends the procedure.
4. Are there multiple B6005311 SRCs logged in the same partition?
v Yes: Continue with the next step.
v No: None of the conditions in this procedure have been met. Contact your next level of support.
This ends the procedure.
5. Is the resource for one of the B6005311 SRCs the load source device and are all of the other B6005311
entries for resources which are non-configured disk units?
Note: To determine if a disk unit is a non-configured disk unit, see the "Work with disk unit
options" section in the "DST options" section of the DST topic collection in the iSeries Service
Functions information.
v Yes: Is the partition that is reporting the error the primary partition?
– Yes: Continue with the next step.
– No: Go to step 7.
v No: Go to step 8 on page 120.
6. Was the load source disk unit migrated from another partition within the same system?
v Yes: Is this load source device intended to be the load source for the primary partition?
– Yes: To accept the load source disk unit: Go to SST/DST in the current partition and select
Work with system partitions > Recover configuration data > Accept load source disk unit.
This ends the procedure.
– No: Power off the system. Return the original load source disk to the primary partition and
perform a system IPL. This ends the procedure.
v No: The load source disk unit has not changed. Contact your next level of support. This ends the
procedure.
7. The reporting partition is a secondary partition.
Since the last IPL of the reporting partition, has one of the following events occurred?
v Has the primary partition time/date been moved backward to a time/date earlier than the
previous setting?
v Has the system serial number been changed?
v Was the load source disk unit in this secondary partition replaced intentionally with a load source
from another system or another partition from the same system?
v Yes: To accept the load source disk unit: Go to SST/DST in the current partition and select Work
with system partitions > Recover configuration data > Accept load source disk unit. This ends
the procedure.
v No: Contact your next level of support. This ends the procedure.
8. One or more B6005311 SRCs have been logged in the same partition.
Do all of the B6005311 errors have a resource that is a non-configured disk unit in the partition?
Note: To determine if a disk unit is a non-configured disk unit, see the "Work with disk unit
options" section in the "DST options" section of the DST topic collection in the iSeries Service
Functions information.
v Yes: Continue with the next step.
v No: None of the conditions in this procedure have been met. Contact your next level of support.
This ends the procedure.
9. Have any disk unit resources associated with the B6005311 SRCs been added to the partition since
the last IPL of the partition?
v No: Continue with the next step.
v Yes: Perform the following steps to clear non-configured disk unit configuration data:
a. Go to SST/DST in the partition and select Work with system partitions > Recover
configuration data > Clear non-configured disk unit configuration data.
b. Select each unit in the list that is new to the system and press Enter.
c. Continue the system IPL. This ends the procedure.
10. None of the resources that are associated with the B6005311 SRCs are disk units that were added to
the partition since the last IPL of the partition.
Has a scratch install recently been performed on the partition that is reporting the errors?
v No: Continue with the next step.
v Yes: Go to step 13.
11. If a scratch install was not performed, was the Clear configuration data option recently used to
discontinue LPAR use?
v Yes: Continue with the next step.
v No: The Clear configuration data option was not used. Contact your next level of support. This
ends the procedure.
12. Perform the following steps to clear non-configured disk unit configuration data:
a. Go to SST/DST in the partition and select Work with system partitions > Recover configuration
data > Clear non-configured disk unit configuration data.
b. Select each unit in the list that is new to the system and press Enter.
c. Continue the system IPL. This ends the procedure.
13. Was the load source device previously mirrored before the scratch install?
v Yes: Continue with the next step.
v No: Go to step 15.
14. Perform the following steps to clear the old configuration data from the disk unit that was mirroring
the old load source disk:
a. Go to SST/DST in the partition and select Work with system partitions > Recover configuration
data > Clear non-configured disk unit configuration data.
b. Select the former load source mirror in the list and press Enter.
15. Is the primary partition reporting the B6005311 errors?
v No: This ends the procedure.
v Yes: Does the customer want multiple partitions on the system?
– No: This ends the procedure.
– Yes: Use the Recover primary partition configuration data option to retrieve the LPAR
configuration data from other devices in the system.
a. Go to SST/DST in the primary partition and select Work with system partitions > Recover
configuration data > Recover primary partition configuration data. The system will
perform an automatic IPL.
b. Verify the information that appears.
- The device should be a former load source device from a secondary partition.
- The time and date should reflect a time when that partition was active. It should be more
recent than the last change to the logical partition configuration. This ends the
procedure.
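The branching in steps 1, 4, 5, and 8 of LPRIP01 can be summarized as a small sketch. It assumes that each B6005311 entry has already been classified by hand as a load source device, a non-configured disk unit, or another resource. The constant and function names are hypothetical. The sketch only restates the routing between steps and does not replace the procedure.

```python
LOAD_SOURCE = "load_source"
NON_CONFIGURED = "non_configured_disk"
OTHER = "other"


def next_step(resources, is_primary):
    """Return the LPRIP01 step to continue with, or None when the
    procedure says to contact your next level of support.

    resources: one entry per B6005311 error logged in the partition,
    each being LOAD_SOURCE, NON_CONFIGURED, or OTHER (hypothetical labels).
    is_primary: True if the reporting partition is the primary partition.
    """
    # Step 1: exactly one error, logged against the load source device.
    if len(resources) == 1 and resources[0] == LOAD_SOURCE:
        return 2 if is_primary else 3
    # Step 4: the remaining cases require multiple errors in the partition.
    if len(resources) < 2:
        return None
    # Step 5: one load source entry, all other entries non-configured disks.
    others = [r for r in resources if r != LOAD_SOURCE]
    if resources.count(LOAD_SOURCE) == 1 and all(r == NON_CONFIGURED for r in others):
        return 6 if is_primary else 7
    # Step 8: every entry is against a non-configured disk unit.
    if all(r == NON_CONFIGURED for r in resources):
        return 9
    return None
```

For example, a secondary partition with one load source entry and two non-configured disk unit entries is routed to step 7.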
Operations console isolation procedures
These procedures help you to isolate a failure with the Operations Console.
DANGER
When working on or around the system, observe the following precautions:
Electrical voltage and current from power, telephone, and communication cables are hazardous. To
avoid a shock hazard:
v Connect power to this unit only with the IBM provided power cord. Do not use the IBM
provided power cord for any other product.
v Do not open or service any power supply assembly.
v Do not connect or disconnect any cables or perform installation, maintenance, or reconfiguration
of this product during an electrical storm.
v The product might be equipped with multiple power cords. To remove all hazardous voltages,
disconnect all power cords.
v Connect all power cords to a properly wired and grounded electrical outlet. Ensure that the outlet
supplies proper voltage and phase rotation according to the system rating plate.
v Connect any equipment that will be attached to this product to properly wired outlets.
v When possible, use one hand only to connect or disconnect signal cables.
v Never turn on any equipment when there is evidence of fire, water, or structural damage.
v Disconnect the attached power cords, telecommunications systems, networks, and modems before
you open the device covers, unless instructed otherwise in the installation and configuration
procedures.
v Connect and disconnect cables as described in the following procedures when installing, moving,
or opening covers on this product or attached devices.
To Disconnect:
1. Turn off everything (unless instructed otherwise).
2. Remove the power cords from the outlets.
3. Remove the signal cables from the connectors.
4. Remove all cables from the devices.
To Connect:
1. Turn off everything (unless instructed otherwise).
2. Attach all cables to the devices.
3. Attach the signal cables to the connectors.
4. Attach the power cords to the outlets.
5. Turn on the devices.
(D005)
The following safety notices apply throughout this section.
Read all safety procedures before servicing the system. Observe all safety procedures when performing a
procedure. Unless instructed otherwise, always power off the system or expansion unit where the
field-replaceable unit (FRU) is located before removing, exchanging, or installing a FRU.
OPCIP03
Use this procedure to isolate a bringup failure with Operations Console.
If you need additional information for failing part numbers, location codes, or removal and replacement
procedures, see Part locations and location codes. Select your machine type and model number to find
additional location codes, part numbers, or replacement procedures for your system.
Use this procedure to isolate an Operations Console bringup failure when the SRC on the panel is
A6xx5008 or B6xx5008. If you are not using the Operations Console, see A6005004. This procedure only
works with cable-connected and LAN configurations. It is not valid for dial-connected configurations.
1. If the system has logical partitions, perform this procedure from the logical partition that reported the
problem. To determine if the system has logical partitions, go to Determine if the system has logical
partitions before continuing with this procedure.
2. Is the SRC on the panel A6xx5008 or B6xx5008?
v No: This ends the procedure.
v Yes: Are you connecting Operations Console using the ASYNC adapter?
Yes: Continue with the next step.
No: You are connecting using a LAN adapter. Go to step 6 on page 124.
3. Are words 17, 18, and 19 all equal to 00000000?
v Yes: Report the problem to your next level of support. This ends the procedure.
v No: Is word 17 equal to 00000001?
No: Continue with the next step.
Yes: The ASYNC adapter was not detected. Ensure that the ASYNC adapter card is installed, or
replace the IOA and try again. This ends the procedure.
4. Is word 17 equal to 00000002?
v Yes: On the ASYNC adapter card that was found, no cable was detected. Word 18 contains the card
position. Locate the ASYNC adapter card in this card position, and ensure that the external cable is
attached. Install or replace the external cable. This ends the procedure.
v No: Is word 17 equal to 00000003?
No: Continue with the next step.
Yes: The cable that was detected does not have the correct cable ID. Word 18 contains the card
position. Word 19 contains the cable ID. Locate the ASYNC adapter card in this card position,
and verify that the correct cable is attached, or replace the cable. This ends the procedure.
5. Is word 17 equal to 00000004?
No: Report the problem to your next level of support. This ends the procedure.
Yes: Operations Console failed to make a connection because the port is already being used. Word
18 contains the card position. Disconnect the active communications session and try using the
resource again. This ends the procedure.
6. Are words 13, 14 and 15 all equal to 00000000?
v Yes: Report the problem to your next level of support. This ends the procedure.
v No: Is word 13 equal to 00000002?
No: Continue with the next step.
Yes: The LAN hardware failed to activate. Replace the LAN IOA being used. This ends the
procedure.
7. Is word 13 equal to 00000003?
v No: Continue with the next step.
v Yes: A hardware error occurred. Word 14 contains the error code (example: 53001A80). Word 15
contains the card position.
Is the error code equal to 53001A80?
Yes: The network cable is not attached to the LAN adapter, the cable is defective, or the network
is not operational. This ends the procedure.
No: The LAN adapter hardware is not operational. Replace the hardware and try again. This
ends the procedure.
8. Is word 13 equal to 00000004?
v Yes: The console did not respond. Word 14 contains the number of attempts made. Word 15
contains the card position. The system is inserted into the network but there is no connection to the
client (PC). Verify the configuration for the network at the system and client; verify the
configuration of Operations Console. This ends the procedure.
v No: Is word 13 equal to 00000005?
No: Report the problem to your next level of support. This ends the procedure.
Yes: IP information was received from the console. Word 14 contains the IP address that was
received. Verify the configuration data for the client (PC) or verify the configuration for the
network. This ends the procedure.
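The word checks in steps 3 through 8 amount to a lookup on word 17 (ASYNC adapter) or word 13 (LAN adapter). The following is a minimal sketch of that lookup. It assumes that the SRC words are available as 8-digit hexadecimal strings keyed by word number. The function name, the table names, and the message wording are hypothetical summaries of the steps above, not output of any IBM tool.

```python
SUPPORT = "Report the problem to your next level of support."

# Word 17 values for an ASYNC adapter connection (steps 3 to 5).
ASYNC_WORD17 = {
    0x1: "ASYNC adapter not detected. Ensure the card is installed, or replace the IOA.",
    0x2: "No cable detected. Word 18 is the card position. Install or replace the external cable.",
    0x3: "Incorrect cable ID. Word 18 is the card position, word 19 is the cable ID.",
    0x4: "Port already in use. Word 18 is the card position. Disconnect the active session.",
}

# Word 13 values for a LAN adapter connection (steps 6 to 8).
LAN_WORD13 = {
    0x2: "LAN hardware failed to activate. Replace the LAN IOA.",
    0x4: "Console did not respond. Word 14 is the number of attempts, word 15 is the card position.",
    0x5: "IP information was received from the console. Word 14 is the IP address received.",
}

NETWORK = "Network cable not attached or defective, or network not operational."
ADAPTER = "LAN adapter hardware is not operational. Replace the hardware."


def _value(words, number):
    # Missing words are treated as 00000000 (an assumption of this sketch).
    return int(words.get(number, "0"), 16)


def diagnose(connection, words):
    """Map the SRC words to the action described in OPCIP03."""
    if connection == "async":
        w17, w18, w19 = (_value(words, n) for n in (17, 18, 19))
        if w17 == 0 and w18 == 0 and w19 == 0:
            return SUPPORT
        return ASYNC_WORD17.get(w17, SUPPORT)
    if connection == "lan":
        w13, w14, w15 = (_value(words, n) for n in (13, 14, 15))
        if w13 == 0 and w14 == 0 and w15 == 0:
            return SUPPORT
        if w13 == 0x3:
            # Step 7: word 14 holds the hardware error code.
            return NETWORK if w14 == 0x53001A80 else ADAPTER
        return LAN_WORD13.get(w13, SUPPORT)
    raise ValueError("connection must be 'async' or 'lan'")
```

Any value that the procedure does not list falls through to the next level of support, which matches the final No branches in steps 5 and 8.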
Power isolation procedures
Use power isolation procedures for isolating a problem in the power system. Use isolation procedures if
there is not a management console attached to the server. If the server is connected to a management
console, use the procedures that are available on the management console to continue FRU isolation.
Some field replaceable units (FRUs) can be replaced with the unit powered on. Follow the instructions in
System FRU locations when directed to remove, exchange, or install a FRU.
The following safety notices apply throughout the power isolation procedures. Read all safety procedures
before servicing the system and observe all safety procedures when performing a procedure.
DANGER
When working on or around the system, observe the following precautions:
Electrical voltage and current from power, telephone, and communication cables are hazardous. To
avoid a shock hazard:
v Connect power to this unit only with the IBM provided power cord. Do not use the IBM
provided power cord for any other product.
v Do not open or service any power supply assembly.
v Do not connect or disconnect any cables or perform installation, maintenance, or reconfiguration
of this product during an electrical storm.
v The product might be equipped with multiple power cords. To remove all hazardous voltages,
disconnect all power cords.
v Connect all power cords to a properly wired and grounded electrical outlet. Ensure that the outlet
supplies proper voltage and phase rotation according to the system rating plate.
v Connect any equipment that will be attached to this product to properly wired outlets.
v When possible, use one hand only to connect or disconnect signal cables.
v Never turn on any equipment when there is evidence of fire, water, or structural damage.
v Disconnect the attached power cords, telecommunications systems, networks, and modems before
you open the device covers, unless instructed otherwise in the installation and configuration
procedures.
v Connect and disconnect cables as described in the following procedures when installing, moving,
or opening covers on this product or attached devices.
To Disconnect:
1. Turn off everything (unless instructed otherwise).
2. Remove the power cords from the outlets.
3. Remove the signal cables from the connectors.
4. Remove all cables from the devices.
To Connect:
1. Turn off everything (unless instructed otherwise).
2. Attach all cables to the devices.
3. Attach the signal cables to the connectors.
4. Attach the power cords to the outlets.
5. Turn on the devices.
(D005)
Power problems
Use the following table to learn how to begin analyzing a power problem.
Table 23. Analyzing power problems
Symptom | What you should do
System unit does not power on. | See “Cannot power on system unit” on page 126.
The processor or I/O expansion unit does not power off. | See “Cannot power off system or SPCN-controlled I/O expansion unit” on page 129.
The system does not remain powered on during a loss of incoming ac voltage and has an uninterruptible power supply (UPS) installed. | See the UPS user's guide that was provided with your unit.
An I/O expansion unit does not power on. | See “Cannot power on SPCN-controlled I/O expansion unit” on page 132.
Cannot power on system unit
Perform this procedure until you correct the problem and you can power on the system.
For important safety information before continuing with this procedure, see “Power isolation procedures”
on page 124.
1. Attempt to power on the system. See Powering on and powering off the system for information
about powering on or off your system. Does the system power on, and is the system power status
indicator light on continuously?
Note: The system power status indicator flashes at the slower rate (one flash per two seconds) while
powered off, and at the faster rate (one flash per second) during a normal power-on sequence.
No: Continue with the next step.
Yes: Go to step 20 on page 129.
2. Are there any characters displayed on the control panel (a scrolling dot may be visible as a
character)?
No: Continue with the next step.
Yes: Go to step 5.
3. Are the mainline ac power cables from the power supply, power distribution unit, or external
uninterruptible power supply (UPS) to the customer's ac power outlet connected and seated correctly
at both ends?
Yes: Continue with the next step.
No: Connect the mainline ac power cables correctly at both ends and go to step 1.
4. Perform the following steps:
a. Verify that the UPS is powered on (if it is installed). If the UPS will not power on, follow the
service procedures for the UPS to ensure proper line voltage and UPS operation.
b. Disconnect the mainline ac power cable or ac power jumper cable from the system's ac power
connector at the system.
c. Use a multimeter to measure the ac voltage at the system end of the mainline ac power cable or
ac power jumper cable.
Note: Some system models have more than one mainline ac power cable or ac power jumper
cable. For these models, disconnect all the mainline ac power cables or ac power jumper cables
and measure the ac voltage at each cable before continuing with the next step.
Is the ac voltage from 200 V ac to 240 V ac, or from 100 V ac to 127 V ac?
No: Go to step 15 on page 128.
Yes: Continue with the next step.
5. Perform the following steps:
a. Disconnect the mainline ac power cables from the power outlet.
b. Exchange the system unit control panel (Un-D1). See System FRU locations.
c. Reconnect the mainline ac power cables to the power outlet.
d. Attempt to power on the system.
Does the system power on?
No: Continue with the next step.
Yes: The system unit control panel was the failing item. This ends the procedure.
6. Perform the following steps:
a. Disconnect the mainline ac power cables from the power outlet.
b. Exchange the power supply or supplies (Un-E1, Un-E2). See System FRU locations.
c. Reconnect the mainline ac power cables to the power outlet.
d. Attempt to power on the system. See Powering on and powering off the system.
Does the system power on?
No: Continue with the next step.
Yes: The power supply was the failing item. This ends the procedure.
7. Is the system an 8248-L4T, 8408-E8D, or 9109-RMD?
No: Go to step 9.
Yes: Continue with the next step.
8. Perform the following steps:
a. Disconnect the mainline ac power cables.
b. Replace the service processor card at location Un-P1-C1.
c. Reconnect the mainline ac power cables to the power outlet.
d. Attempt to power on the system. Does the system power on?
No: Continue with the next step.
Yes: The service processor card was the failing item. This ends the procedure.
9. Is the system an 8412-EAD, 9117-MMB, 9117-MMC, 9117-MMD, 9179-MHB, 9179-MHC, or 9179-MHD?
No: Go to step 14.
Yes: Continue with the next step.
10. Perform the following steps:
a. Disconnect the mainline ac power cables.
b. Replace the service processor card in the primary processor enclosure.
c. Reconnect the mainline ac power cables to the power outlet.
d. Attempt to power on the system. Does the system power on?
No: Continue with the next step.
Yes: The service processor card was the failing item. This ends the procedure.
11. Perform the following steps:
a. Disconnect the mainline ac power cables.
b. Replace the service processor card in the first secondary processor enclosure.
c. Reconnect the mainline ac power cables to the power outlet.
d. Attempt to power on the system. Does the system power on?
No:
Continue with the next step.
Yes:
The service processor card was the failing item. This ends the procedure.
12. Perform the following steps:
a. Disconnect the mainline ac power cables.
b. Replace the midplane in the primary processor enclosure.
c. Reconnect the mainline ac power cables to the power outlet.
d. Attempt to power on the system. Does the system power on?
No:
Continue with the next step.
Yes:
The midplane was the failing item. This ends the procedure.
13. Perform the following steps:
a. Disconnect the mainline ac power cables.
b. Replace the midplane in the first secondary processor enclosure.
c. Reconnect the mainline ac power cables to the power outlet.
d. Attempt to power on the system. Does the system power on?
No:
Contact your next level of support.
Yes:
The midplane was the failing item. This ends the procedure.
14. Perform the following steps:
a. Disconnect the mainline ac power cables.
b. Replace the system backplane (Un-P1). See System FRU locations.
c. Reconnect the mainline ac power cables to the power outlet.
d. Attempt to power on the system.
Does the system power on?
No: Continue with the next step.
Yes: The system backplane was the failing item. This ends the procedure.
15. Are you working with a system unit with a power distribution unit with tripped breakers?
• No: Continue with the next step.
• Yes: Perform the following steps:
a. Reset the tripped power distribution breaker.
b. Verify that the removable ac power cable is not the problem. Replace the cord if it is defective.
c. If the breaker continues to trip, install a new power supply in each location until the defective
one is found. This ends the procedure.
16. Does the system have an external UPS installed?
Yes: Continue with the next step.
No: Go to step 18.
17. Use a multimeter to measure the ac voltage at the external UPS outlets. Is the ac voltage from 200 V
ac to 240 V ac or from 100 V ac to 127 V ac?
No: The UPS needs service. For 9910 type UPS, call IBM Service Support. For all other UPS types,
have the customer call the UPS provider. In the meantime, go to step 19 to bypass the UPS.
Yes: Replace the ac power cable. See System parts for FRU part number. This ends the
procedure.
18. Perform the following steps:
a. Disconnect the mainline ac power cable from the customer's ac power outlet.
b. Use a multimeter to measure the ac voltage at the customer's ac power outlet.
Note: Some system models have more than one mainline ac power cable. For these models,
disconnect all the mainline ac power cables and measure the ac voltage at all ac power outlets
before continuing with this step.
Is the ac voltage from 200 V ac to 240 V ac or from 100 V ac to 127 V ac?
Yes: Exchange the mainline ac power cable. See System parts for the FRU part number. Then
go to step 1 on page 126.
No: Inform the customer that the ac voltage at the power outlet is not correct. When the ac
voltage at the power outlet is correct, reconnect the mainline ac power cables to the power
outlet. This ends the procedure.
19. Perform the following steps to bypass the UPS unit:
a. Power off your system and the UPS unit.
b. Remove the signal cable used between the UPS and the system.
c. Remove any power jumper cords used between the UPS and the attached devices.
d. Remove the country or region-specific power cord used from the UPS to the wall outlet.
e. Use the correct power cord (the original country or region-specific power cord that was provided
with your system) and connect it to the power inlet on the system. Plug the other end of this
cord into a compatible wall outlet.
f. Attempt to power on the system.
Does the power-on standby sequence complete successfully?
Yes: Go to Verify a repair. This ends the procedure.
No: Go to step 5 on page 126.
20. Display the selected IPL mode on the system unit control panel. Is the selected mode the same mode
that the customer was using when the power-on failure occurred?
No: Go to step 22.
Yes: Continue with the next step.
21. Is a function 11 reference code displayed on the system unit control panel?
No: Go to step 23.
Yes: Return to Start of call. This ends the procedure.
22. Perform the following steps:
a. Power off the system. See Powering on and powering off the system for information about
powering on and off your system.
b. Select the mode on the system unit control panel that the customer was using when the
power-on failure occurred.
c. Attempt to power on the system.
Does the system power on?
Yes: Continue with the next step.
No: Exchange the system unit control panel (Un-D1). See System FRU locations. This ends the
procedure.
23. Continue the IPL. Does the IPL complete successfully?
Yes: This ends the procedure.
No: Return to Start of call. This ends the procedure.
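The voltage question in steps 17 and 18 reduces to a range test on each multimeter reading. The following Python sketch is illustrative only and is not part of the service procedure. The names ACCEPTABLE_AC_RANGES, ac_voltage_in_range, and all_outlets_ok are hypothetical, and treating the range endpoints as inclusive is an assumption.

```python
from typing import Sequence

# Acceptable ac ranges quoted in steps 17 and 18, in volts.
# Inclusive endpoints are an assumption of this sketch.
ACCEPTABLE_AC_RANGES = ((100.0, 127.0), (200.0, 240.0))


def ac_voltage_in_range(reading_volts: float) -> bool:
    """Return True if the reading lies inside either acceptable range."""
    return any(low <= reading_volts <= high for low, high in ACCEPTABLE_AC_RANGES)


def all_outlets_ok(readings_volts: Sequence[float]) -> bool:
    """For models with several mainline ac power cables, every outlet
    must be in range; an empty list of readings is treated as a failure."""
    return len(readings_volts) > 0 and all(
        ac_voltage_in_range(v) for v in readings_volts
    )
```

For example, a reading of 208 V ac passes the test, while 150 V ac lies between the two ranges and fails.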
Cannot power off system or SPCN-controlled I/O expansion unit
Use this procedure to analyze a failure of the normal command and control panel procedures to power
off the system unit or an SPCN-controlled I/O expansion unit.
Attention: To prevent loss of data, ask the customer to verify that no interactive jobs are running before
you perform this procedure.
For important safety information before continuing with this procedure, see “Power isolation procedures”
on page 124.
1. Is the power off problem on the system unit?
No: Continue with the next step.
Yes: Go to step 3.
2. Ensure that the SPCN cables that connect the units are connected and seated correctly at both ends.
Does the I/O unit power off, and is the power indicator light flashing slowly?
Yes: This ends the procedure.
No: Go to step 7 on page 130.
3. Attempt to power off the system. Does the system unit power off, and is the power indicator light
flashing slowly?
No: Continue with the next step.
Yes: The system is not responding to normal power off procedures which could indicate a
Licensed Internal Code problem. Contact your next level of support. This ends the procedure.
4. Attempt to power off the system using ASMI. Does the system power off?
Yes: The system is not responding to normal power off procedures which could indicate a
Licensed Internal Code problem. Contact your next level of support. This ends the procedure.
No: Continue with the next step.
5. Attempt to power off the system using the control panel power button. Does the system power off?
Yes: Continue with the next step.
No: Go to step 10.
6. Is there a reference code logged in the ASMI, the control panel, or the management console that
indicates a power problem?
Yes: Perform problem analysis for the reference code in the log. This ends the procedure.
No: Contact your next level of support. This ends the procedure.
7. Is the I/O expansion unit that will not power off part of a shared expansion unit loop?
Yes: Go to step 9.
No: Continue with the next step.
8. Attempt to power off the I/O expansion unit. Were you able to power off the expansion unit?
Yes: This ends the procedure.
No: Go to step 10.
9. The unit will only power off under certain conditions:
• If the unit is in private mode, it should power off with the system unit that is connected by the
SPCN frame-to-frame cable.
• If the unit is in switchable mode, it should power off if the "owning" system is powered off or is
powering off, and the system unit that is connected by the SPCN frame-to-frame cable is powered
off or is powering off.
Does the I/O expansion unit power off?
No: Continue with the next step.
Yes: This ends the procedure.
10. Ensure there are no jobs running on the system or partition, and verify that an uninterruptible
power supply (UPS) is not powering the system or I/O expansion unit. Then continue with the next
step.
11. Perform the following steps:
a. Remove the system or I/O expansion unit ac power cord from the external UPS or, if an external
UPS is not installed, from the customer's ac power outlet. If the system or I/O expansion unit has
more than one ac line cord, disconnect all the ac line cords.
b. Exchange the following FRUs one at a time. See System FRU locations and System parts for
information about FRU locations and parts for the system that you are servicing.
If the system unit is failing:
1) Power supply (Un-E1 or Un-E2). Go to step 12 on page 131.
2) For the 8202-E4B, 8202-E4C, 8202-E4D, 8205-E6B, 8205-E6C, 8205-E6D, 8231-E2B, 8231-E1C,
8231-E1D, 8231-E2C, 8231-E2D, or 8268-E1D system, exchange the service processor (Un-P1).
For the 8233-E8B or 8236-E8C system, replace the system backplane (Un-P1).
For the 8248-L4T, 8408-E8D, 8412-EAD, 9109-RMD, 9117-MMB, 9117-MMC, 9117-MMD,
9179-MHB, 9179-MHC, or 9179-MHD system, replace the service processor (Un-P1-C1). If that
does not resolve the problem, replace the midplane (Un-P1).
3) System control panel (Un-D1)
If an I/O expansion unit is failing:
1) Each power supply. Go to step 12 on page 131.
2) I/O backplane
3) I/O backplane for the expansion unit that is sequentially configured before the expansion unit
that will not power off
4) SPCN frame-to-frame cable
This ends the procedure.
12. A power supply might be the failing item.
Attention: When replacing a redundant power supply, a 1xxx1504, 1xxx1514, 1xxx1524, or 1xxx1534
reference code may be logged in the error log. If you just removed and replaced the power supply in
the location associated with this reference code, and the power supply came ready after the install,
disregard this reference code. If you had not previously removed and replaced a power supply, the
power supply did not come ready after installation, or there are repeated fan fault errors after the
power supply replacement, continue to follow these steps.
Is the reference code 1xxx15xx?
No: Continue with the next step.
Yes: Perform the following steps:
a. Find the unit reference code in one of the following tables to determine the failing power supply.
b. Ensure that the power cables are properly connected and seated.
c. Is the reference code 1xxx1500, 1xxx1510, 1xxx1520, or 1xxx1530 and is the failing unit configured
with a redundant power supply option (or dual line cord feature)?
• Yes: Perform “PWR1911” on page 151 before replacing parts.
• No: Continue with step 12d.
d. See System FRU locations for information about FRU locations for the system that you are
servicing.
e. Replace the failing power supply (see the following tables to determine which power supply to
replace).
f. If the new power supply does not fix the problem, perform the following steps:
1) Reinstall the original power supply.
2) Try the new power supply in each of the other positions listed in the table.
3) If the problem still is not fixed, reinstall the original power supply and go to the next FRU in
the list.
4) For reference codes 1xxx1500, 1xxx1510, 1xxx1520, and 1xxx1530, exchange the power
distribution backplane if a problem persists after replacing the power supply.
Table 24. System unit
Unit reference code                   Power supply
1510, 1511, 1512, 1513, 1514, 7110    E1
1520, 1521, 1522, 1523, 1524, 7120    E2
Attention: For reference codes 1500, 1510, 1520, and 1530, perform “PWR1911” on page 151 before
replacing parts.
Table 25. 7314-G30 expansion unit
Unit reference code                         Power supply
1510, 1511, 1512, 1513, 1514, 1516, 1517    P01/E1
1520, 1521, 1522, 1523, 1524, 1526, 1527    P02/E2
This ends the procedure.
13. Is the reference code 1xxx2600, 1xxx2603, 1xxx2605, or 1xxx2606?
• No: Continue with the next step.
• Yes: Perform the following steps:
a. See System FRU locations for information about FRU locations for the system you are servicing.
b. Replace the failing power supply.
c. Perform the following if the new power supply does not fix the problem:
1) Reinstall the original power supply.
2) Try the new power supply in each of the other positions listed in the table.
3) If the problem still is not fixed, reinstall the original power supply and go to the next FRU in
the list.
Attention: Do not install power supplies P00 and P01 ac jumper cables on the same ac input
module.
Table 26. Failing power supplies
System or feature code    Failing power supply
System unit               Un-E1, Un-E2
7314-G30                  E1, E2
This ends the procedure.
14. Is the reference code 1xxx8455 or 1xxx8456?
• No: Return to Start of call. This ends the procedure.
• Yes: One of the power supplies is missing, and must be installed. Use the following table to
determine which power supply is missing, and install the power supply. See System FRU locations
for information about FRU locations for the system you are servicing.
Table 27. Missing power supplies
Reference code    Missing power supply
1xxx8455          Un-E1
1xxx8456          Un-E2
This ends the procedure.
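The lookups in Table 24 and Table 27 can be expressed as dictionaries keyed by the unit reference code. This Python sketch is illustrative only. The function and variable names are hypothetical, and it assumes that the unit reference code is the last four characters of an eight-character 1xxxNNNN reference code.

```python
from typing import Optional

# Table 24 (system unit): unit reference code -> power supply position.
SYSTEM_UNIT_SUPPLY = {
    **dict.fromkeys(("1510", "1511", "1512", "1513", "1514", "7110"), "E1"),
    **dict.fromkeys(("1520", "1521", "1522", "1523", "1524", "7120"), "E2"),
}

# Table 27: unit reference code -> location of the missing power supply.
MISSING_SUPPLY = {"8455": "Un-E1", "8456": "Un-E2"}


def unit_reference_code(reference_code: str) -> str:
    """Return the last four characters of a 1xxxNNNN reference code
    (assumed layout; raises ValueError for anything else)."""
    code = reference_code.strip().upper()
    if len(code) != 8 or not code.startswith("1"):
        raise ValueError(f"not a 1xxxNNNN reference code: {reference_code!r}")
    return code[4:]


def failing_system_unit_supply(reference_code: str) -> Optional[str]:
    """Look up the failing system unit power supply per Table 24."""
    return SYSTEM_UNIT_SUPPLY.get(unit_reference_code(reference_code))


def missing_supply(reference_code: str) -> Optional[str]:
    """Look up the missing power supply location per Table 27."""
    return MISSING_SUPPLY.get(unit_reference_code(reference_code))
```

A result of None means the code is not listed in the corresponding table, in which case the written procedure determines the next step.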
Cannot power on SPCN-controlled I/O expansion unit
You are here because an SPCN-controlled I/O expansion unit cannot be powered on, and might be
displaying a 1xxxC62E reference code.
For important safety information before continuing with this procedure, see “Power isolation procedures”
on page 124.
1. Power on the system.
2. Start from SPCN 0 or SPCN 1 on the system unit (see System FRU locations), then go to the first unit
in the SPCN frame-to-frame cable sequence that does not power on. Is the Data display background
light on, is the power-on LED flashing, or are there any characters displayed on the I/O
expansion unit display panel?
Note: The background light is a dim yellow light in the Data area of the display panel.
Yes: Go to step 12 on page 134.
No: Continue with the next step.
3. Use a multimeter to measure the ac voltage at the customer's ac power outlet.
Is the ac voltage from 200 V ac to 240 V ac, or from 100 V ac to 127 V ac?
• Yes: Continue with the next step.
• No: Inform the customer that the ac voltage at the power outlet is not correct.
This ends the procedure.
4. Is the mainline ac power cable from the ac module, power supply, or power distribution unit to the
customer's ac power outlet connected and seated correctly at both ends?
• Yes: Continue with the next step.
• No: Connect the mainline ac power cable correctly at both ends.
This ends the procedure.
5. Perform the following steps:
a. Disconnect the mainline ac power cable from the ac module, power supply, or power distribution
unit.
b. Use a multimeter to measure the ac voltage at the ac module, power supply, or power
distribution unit end of the mainline ac power cable.
Is the ac voltage from 200 V ac to 240 V ac, or from 100 V ac to 127 V ac?
No: Continue with the next step.
Yes: Go to step 7.
6. Are you working on a power distribution unit with tripped breakers?
• No: Replace the mainline ac power cable or power distribution unit.
This ends the procedure.
• Yes: Perform the following steps:
a. Reset the tripped power distribution breaker.
b. Verify that the removable ac line cord is not the problem. Replace the cord if it is defective.
c. Install a new power supply (one with the same part number as the one that is currently
installed) in all power locations until the defective one is found.
This ends the procedure.
7. Does the unit you are working on have ac power jumper cables installed?
Note: The ac power jumper cables connect from the ac module, or the power distribution unit, to
the power supplies.
Yes: Continue with the next step.
No: Go to step 11 on page 134.
8. Are the ac power jumper cables connected and seated correctly at both ends?
• Yes: Continue with the next step.
• No: Connect the ac power jumper cables correctly at both ends.
This ends the procedure.
9. Perform the following steps:
a. Disconnect the ac power jumper cables from the ac module or power distribution unit.
b. Use a multimeter to measure the ac voltage at the ac module or power distribution unit (that
goes to the power supplies).
Is the ac voltage at the ac module or power distribution unit from 200 V ac to 240 V ac, or from 100
V ac to 127 V ac?
• Yes: Continue with the next step.
• No: Replace the following items (see System parts for location and part number information):
– ac module
– Power distribution unit
This ends the procedure.
10. Perform the following steps:
a. Connect the ac power jumper cables to the ac module, or power distribution unit.
b. Disconnect the ac power jumper cable at the power supplies.
c. Use a multimeter to measure the voltage input that the power jumper cables provide to the
power supplies.
Is the voltage 200 V ac to 240 V ac or 100 V ac to 127 V ac for each power jumper cable?
• No: Exchange the power jumper cable.
This ends the procedure.
• Yes: Replace the following parts one at a time:
a. I/O backplane
b. Display unit
c. Power supply 1
d. Power supply 2
e. Power supply 3
This ends the procedure.
11. Perform the following steps:
a. Disconnect the mainline ac power cable (to the expansion unit) from the customer's ac power
outlet.
b. Exchange the following FRUs, one at a time:
• Power supply
• I/O backplane
c. Reconnect the mainline ac power cables (from the expansion unit) into the power outlet.
d. Attempt to power on the system.
Does the expansion unit power on?
• Yes: The unit you exchanged was the failing item.
This ends the procedure.
• No: Repeat this step and exchange the next FRU in the list. If you have exchanged all of the FRUs
in the list, ask your next level of support for assistance.
This ends the procedure.
12. Is there a reference code displayed on the display panel for the I/O unit that does not power on?
• Yes: Continue with the next step.
• No: Replace the I/O backplane.
This ends the procedure.
13. Is the reference code 1xxxxx2E?
• Yes: Continue with the next step.
• No: Use the new reference code and return to Start of call.
This ends the procedure.
14. Do the SPCN optical cables (A) connect the failing unit (B) to the preceding unit in the chain or
loop?
[Figure: SPCN chain from the system unit (SPCN 0) through secondary units 1 to 4, joined at connectors J15 and J16. Labeled items: (A) SPCN optical cables, SPCN optical adapter, (B) failing unit.]
Yes: Continue with the next step.
No: Go to step 18 on page 136.
15. Remove the SPCN optical adapter (A) from the frame that precedes the frame that cannot become
powered on.
[Figure: SPCN chain from the system unit (SPCN 0) through secondary units 1 to 4, joined at connectors J15 and J16. Labeled items: (A) SPCN optical adapter, failing unit.]
16. Perform the following steps:
Notes:
a. The cable may be connected to either J15 or J16.
b. Use an insulated probe or jumper when performing the voltage readings.
a. Connect the negative lead of a multimeter to the system frame ground.
b. Connect the positive lead of a multimeter to pin 2 of the connector from which you removed the
SPCN optical adapter in the previous step of this procedure.
c. Note the voltage reading on pin 2.
d. Move the positive lead of the multimeter to pin 3 of the connector or SPCN card.
e. Note the voltage reading on pin 3.
Is the voltage on both pin 2 and pin 3 from 1.5 V dc to 5.5 V dc?
• Yes: Continue with the next step.
• No: Exchange the I/O backplane.
This ends the procedure.
17. Exchange the following FRUs, one at a time:
a. In the failing unit (first frame with a failure indication), replace the I/O backplane.
b. In the preceding unit in the string, replace the I/O backplane.
c. SPCN optical adapter (A) in the preceding unit in the string.
d. SPCN optical adapter (B) in the failing unit.
e. SPCN optical cables (C) between the preceding unit in the string and the failing unit.
This ends the procedure.
[Figure: SPCN chain from the system unit (SPCN 0) through secondary units 1 to 4, joined at connectors J15 and J16. Labeled items: (A) SPCN optical adapter, (B) SPCN optical adapter, (C) SPCN optical cables, failing unit.]
18. Perform the following steps:
a. Power off the system.
b. Disconnect the SPCN frame-to-frame cable from the connector of the first unit that cannot be
powered on.
c. Connect the negative lead of a multimeter to the system frame ground.
d. Connect the positive lead of the multimeter to pin 2 of the SPCN cable.
Note: Use an insulated probe or jumper when performing the voltage readings.
e. Note the voltage reading on pin 2.
f. Move the positive lead of the multimeter to pin 3 of the SPCN cable.
g. Note the voltage reading on pin 3.
Is the voltage on both pin 2 and pin 3 from 1.5 V dc to 5.5 V dc?
• No: Continue with the next step.
• Yes: Exchange the following FRUs one at a time:
a. In the failing unit, replace the I/O backplane.
b. In the preceding unit in the string, replace the I/O backplane.
c. SPCN frame-to-frame cable.
This ends the procedure.
19. Perform the following steps:
a. Follow the SPCN frame-to-frame cable back to the preceding unit in the string.
b. Disconnect the SPCN cable from the connector.
c. Connect the negative lead of a multimeter to the system frame ground.
d. Connect the positive lead of a multimeter to pin 2 of the connector.
Note: Use an insulated probe or jumper when performing the voltage readings.
e. Note the voltage reading on pin 2.
f. Move the positive lead of the multimeter to pin 3 of the connector.
g. Note the voltage reading on pin 3.
Is the voltage on both pin 2 and pin 3 from 1.5 V dc to 5.5 V dc?
• Yes: Exchange the following FRUs one at a time:
a. SPCN frame-to-frame cable.
b. In the failing unit, replace the I/O backplane.
c. In the preceding unit in the string, replace the I/O backplane.
This ends the procedure.
• No: Exchange the I/O backplane from the unit from which you disconnected the SPCN cable in
the previous step of this procedure.
This ends the procedure.
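Steps 16, 18, and 19 each apply the same test to the pin 2 and pin 3 readings. The Python sketch below is illustrative only and is not part of the service procedure. The names SPCN_MIN_VDC, SPCN_MAX_VDC, and spcn_pins_ok are hypothetical, and treating the 1.5 V dc and 5.5 V dc limits as inclusive is an assumption.

```python
# Limits quoted in the procedure for pins 2 and 3, in volts dc.
# Inclusive endpoints are an assumption of this sketch.
SPCN_MIN_VDC = 1.5
SPCN_MAX_VDC = 5.5


def spcn_pins_ok(pin2_vdc: float, pin3_vdc: float) -> bool:
    """Return True only if both pin readings are within the quoted limits."""
    return all(
        SPCN_MIN_VDC <= reading <= SPCN_MAX_VDC
        for reading in (pin2_vdc, pin3_vdc)
    )
```

The test fails if either pin is out of range, which matches the wording "on both pin 2 and pin 3" in the steps.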
IQYDBPL
A disk enclosure backplane might be failing.
Contact your next level of support. This ends the procedure.
IQYPLNR
A system board switch might be failing.
Contact your next level of support. This ends the procedure.
IQYRIEA
The system detected that the room unit emergency power off (UEPO) is in bypass.
If the room UEPO is in bypass, enable the room UEPO.
If the room UEPO is not in bypass, contact your next level of support.
This ends the procedure.
IQYRIEB
The system detected a temperature warning in the bulk power assembly (BPA) enclosure.
Verify that there is unrestricted air flow around the system. In the BPA that detected the problem, ensure
that there are no empty card positions. All card positions must be filled for proper air flow and cooling.
Contact your next level of support. This ends the procedure.
IQYRIRR
A switch riser might be failing.
Contact your next level of support. This ends the procedure.
IQYRISC
The system detected that the temperature threshold was exceeded in a CEC enclosure.
Perform the following steps:
1. Is the room temperature less than 40°C (104°F)?
Yes:
Continue with the next step.
No:
Notify the customer. The customer must bring the room temperature within normal range. If
the problem persists, continue with the next step.
2. Are the front and rear doors of the system free of obstructions?
Yes:
Continue with the next step.
No:
Notify the customer. The system must be free of obstructions for proper air flow. If the
problem persists, continue with the next step.
3. Do all the unused FRU positions contain fillers?
Yes:
Continue with the next step.
No:
Fill any unused FRU positions with fillers. If the problem persists, continue with the next
step.
4. There might be a problem with the CEC enclosure cooling. Contact your next level of support. This
ends the procedure.
IQYRISE
The bulk power assembly (BPA) has detected that the unit emergency power off (UEPO) is in bypass.
For 9119-FHB, perform the following steps:
1. Is the switch on the bulk power controller (BPC) indicated in the reference code description set to
bypass?
Yes:
Continue with the next step.
No:
Use the management console to replace the BPC indicated in the reference code description
for this error. This ends the procedure.
2. Set the switch on the BPC to the normal position. To resolve the error condition, activate the BPC or
use the management console to remove and reinstall the BPC (without replacing hardware). This
ends the procedure.
For 9125-F2C, perform the following steps:
1. Is the switch on the bulk power controller hub (BPCH) indicated in the reference code description set
to bypass?
Yes:
Continue with the next step.
No:
Use the management console to replace the BPCH indicated in the reference code description
for this error. This ends the procedure.
2. Set the switch on the BPCH to the normal position. To resolve the error condition, activate the BPCH
or use the management console to remove and reinstall the BPCH (without replacing hardware). This
ends the procedure.
IQYRISJ
The system detected a bulk power regulator overcurrent problem.
Perform the following steps:
1. Is the reference code 14022A10 or 14022B10?
Yes:
Contact your next level of support. This ends the procedure.
No:
Continue with the next step.
2. Replace the failing items in the failing item list one at a time until the problem is resolved. This ends
the procedure.
IQYRISK
The system detected a bulk power regulator (BPR) switch that is in the wrong position.
Perform the following steps:
1. Slide the switch on the BPR identified in the reference code description to the opposite position.
Verify that the switch is in the correct position by activating the BPR or using the management
console to remove and reinstall the BPR (without replacing hardware). Has the problem been
resolved?
Yes:
This ends the procedure.
No:
Continue with the next step.
2. Use the management console to replace the BPR indicated in the reference code description for this
error. This ends the procedure.
IQYRISM
The system detected a problem in which a cable is included in the failing item list.
Perform the following steps:
1. Check that each end of the cable indicated in the failing item list is connected to the system. Is the
cable connected at both ends and not visibly damaged?
Yes:
Replace the failing items in the failing item list one at a time until the problem is resolved.
This ends the procedure.
No:
If the cable is disconnected, connect the cable. If the cable is visibly damaged, replace the
cable. Continue with the next step.
2. Has the problem been resolved?
Yes:
This ends the procedure.
No:
Replace the failing items in the failing item list one at a time until the problem is resolved.
This ends the procedure.
IQYRISQ
The system detected a problem.
Contact your next level of support. This ends the procedure.
IQYRISR
The system detected a vital product data problem.
Contact your next level of support. This ends the procedure.
IQYRISS
The system detected a single chip module (SCM) or a multiple chip module (MCM) configuration
problem.
Contact your next level of support. This ends the procedure.
IQYRISU
The system detected a loss of input power to a bulk power assembly (BPA).
For 9119-FHB, perform the following steps:
For important safety information before continuing with this procedure, see “Power isolation procedures”
on page 124.
Note: 9119-FHB systems can be powered by 200 - 480 V ac or 380 - 520 V dc. When measuring voltage,
determine the type of input voltage and set the meter accordingly.
1. Is the system powered by 380 - 520 V dc?
Yes:
Continue with the next step.
No:
Continue with step 3.
2. Determine the voltage polarity on the BPA by measuring the voltages between the labeled test points
on the face of the BPA. Connect the negative lead of the meter to phase A and the positive lead of the
meter to phase B. Is the voltage negative?
Yes:
Inform the customer that power voltage at the input to the BPA needs to be corrected. This
ends the procedure.
No:
Continue with the next step.
3. Measure the voltages between the following labeled test points on the face of the BPA:
• Phase A and phase B
• Phase B and phase C
• Phase C and phase A
Are all of the meter readings greater than 180 V ac or 330 V dc?
Yes:
Continue with the next step.
No:
Inform the customer that power voltage at the input to the BPA needs to be corrected. This
ends the procedure.
4. Does the problem persist?
Yes:
Continue with the next step.
No:
This ends the procedure.
5. Replace the failing items in the failing item list one at a time until the problem is resolved. This ends
the procedure.
For 9125-F2C, perform the following steps:
1. Is the LED on the BPA near the power cord on?
Yes:
Continue with the next step.
No:
Inform the customer that the ac voltage at the power outlet is not correct. This ends the
procedure.
2. Does the problem persist?
Yes:
Continue with the next step.
No:
This ends the procedure.
3. Replace the failing items in the failing item list one at a time until the problem is resolved. This ends
the procedure.
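The 9119-FHB checks in steps 2 and 3 can be summarized as a polarity test followed by a threshold test on the three test-point readings. The Python sketch below is illustrative only and is not part of the service procedure. All names are hypothetical. It assumes that the 180 V threshold applies to ac input and the 330 V threshold to dc input, and that readings are compared by magnitude.

```python
from typing import Sequence

MIN_AC_VOLTS = 180.0  # threshold quoted in step 3, assumed to apply to ac input
MIN_DC_VOLTS = 330.0  # threshold quoted in step 3, assumed to apply to dc input


def dc_polarity_ok(a_to_b_volts: float) -> bool:
    """Step 2: with the negative lead on phase A and the positive lead on
    phase B, a negative reading means the input must be corrected."""
    return a_to_b_volts >= 0


def bpa_input_ok(readings_volts: Sequence[float], dc_input: bool) -> bool:
    """Step 3: all three test-point readings must exceed the threshold.
    Comparing magnitudes (abs) is an assumption of this sketch."""
    threshold = MIN_DC_VOLTS if dc_input else MIN_AC_VOLTS
    return len(readings_volts) == 3 and all(
        abs(v) > threshold for v in readings_volts
    )
```

If either function returns False, the written procedure directs you to inform the customer that the input voltage to the BPA needs to be corrected.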
IQYRISZ
The system detected a bulk power problem.
Contact your next level of support. This ends the procedure.
PWR1900
Determine which procedure to use based on the model number.
Follow the instructions for the model or expansion unit you are servicing.
Perform isolation procedure “PWR1905” on page 145 when servicing an 8202-E4B, 8202-E4C,
8202-E4D, 8205-E6B, 8205-E6C, 8205-E6D, 8231-E2B, 8231-E1C, 8231-E1D, 8231-E2C, 8231-E2D,
8233-E8B, 8236-E8C, or 8268-E1D system unit.
Perform isolation procedure “PWR1904” when servicing an 8248-L4T, 8408-E8D, 8412-EAD,
9109-RMD, 9117-MMB, 9117-MMC, 9117-MMD, 9179-MHB, 9179-MHC, or 9179-MHD system unit.
Perform isolation procedure “PWR1909” on page 149 when servicing a 5796, 5802, 5877, or 7314-G30
expansion unit.
This ends the procedure.
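The model-to-procedure selection in PWR1900 is a plain lookup. The Python sketch below is illustrative only. The names PROCEDURE_BY_MODEL and power_isolation_procedure are hypothetical, and the model lists are copied from the text above.

```python
# Models and expansion units listed under PWR1900, grouped by procedure.
_MODELS_BY_PROCEDURE = {
    "PWR1905": (
        "8202-E4B", "8202-E4C", "8202-E4D", "8205-E6B", "8205-E6C",
        "8205-E6D", "8231-E2B", "8231-E1C", "8231-E1D", "8231-E2C",
        "8231-E2D", "8233-E8B", "8236-E8C", "8268-E1D",
    ),
    "PWR1904": (
        "8248-L4T", "8408-E8D", "8412-EAD", "9109-RMD", "9117-MMB",
        "9117-MMC", "9117-MMD", "9179-MHB", "9179-MHC", "9179-MHD",
    ),
    "PWR1909": ("5796", "5802", "5877", "7314-G30"),
}

# Invert the grouping into a model -> procedure dictionary.
PROCEDURE_BY_MODEL = {
    model: procedure
    for procedure, models in _MODELS_BY_PROCEDURE.items()
    for model in models
}


def power_isolation_procedure(model: str) -> str:
    """Return the isolation procedure name for a model or expansion unit."""
    key = model.strip().upper()
    if key not in PROCEDURE_BY_MODEL:
        raise ValueError(f"model not listed under PWR1900: {model!r}")
    return PROCEDURE_BY_MODEL[key]
```

Raising an error for an unlisted model, rather than returning a default, is a design choice of this sketch so that an unknown unit is never routed to the wrong procedure.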
PWR1904
A power supply fault is occurring on the +12V/-12V line.
See “Power isolation procedures” on page 124 for important safety information before servicing the
system.
Select the system that you are servicing:
• 8248-L4T, 8408-E8D, or 9109-RMD
• 8412-EAD, 9117-MMB, 9117-MMC, 9117-MMD, 9179-MHB, 9179-MHC, or 9179-MHD
Procedure for 8248-L4T, 8408-E8D, or 9109-RMD
1. Perform the following steps:
a. Power off the system.
b. Replace one of the memory DIMMs on a memory riser card. See System FRU locations for
information about FRU locations for the system that you are servicing.
c. Power on the system. See Powering on and powering off the system.
d. Has this resolved the problem?
No: Continue with step 1e.
Yes: This ends the procedure.
e. Have you replaced all of the DIMMs?
No: Repeat step 1 and replace the next memory DIMM.
Yes: Go to step 2.
2. Perform the following steps:
a. Power off the system and disconnect the ac power cable from the unit you are working on.
b. Disconnect all of the I/O devices (tape, diskette, optical, and disk units) from the unit you are
working on by sliding them partially out of the unit.
c. Remove and label all cards (for example, PCI adapters, memory riser cards, GX adapters,
RIO/HSL and RAID cards if installed).
d. Reconnect the ac power cable to the unit you are working on.
e. Power on the system.
Does a power reference code occur?
Yes: Continue with the next step.
No: Go to step 7 on page 142.
3. Perform the following steps:
a. Power off the system. See Powering on and powering off the system.
b. Replace one of the system fans. See System FRU locations for information about FRU locations
for the system that you are servicing.
c. Power on the system.
Does a power reference code occur?
v Yes: Continue with the next step.
v No: The fan you just replaced was the failing item.
This ends the procedure.
4. Have you tried replacing all of the fans?
v Yes: Reinstall all of the fans you replaced in step 3 and continue with the next step.
v No: Perform the following steps:
a. Power off the system.
b. Reinstall the fan that you just removed in step 3 to its original location.
c. Repeat step 3.
5. Perform the following steps:
a. Power off the system.
b. Replace one of the voltage regulator cards. See System FRU locations for the voltage regulator
card location for the system you are servicing.
c. Power on the system.
Does a power reference code occur?
v Yes: Continue with the next step.
v No: The voltage regulator card you just replaced was the failing item.
This ends the procedure.
6. Have you tried replacing all of the voltage regulator cards?
v Yes: Continue with the next step.
v No: Perform the following steps:
a. Power off the system. See Powering on and powering off the system.
b. Reinstall the voltage regulator card that you just removed in step 5 to its original location.
c. Repeat step 5 on page 141.
7. Perform the following steps:
a. Power off the system. See Powering on and powering off the system.
b. Reinstall all of the cards (PCI adapters, memory risers, GX adapters, RIO/HSL and RAID cards)
you removed in step 2 on page 141 to their original locations.
c. Power on the system. See Powering on and powering off the system.
Does a power reference code occur?
Yes: Continue with the next step.
No: Go to step 10.
8. Perform the following steps:
a. Power off the system. See Powering on and powering off the system.
b. Disconnect one of the cards you reinstalled in step 7.
c. Power on the system.
Does a power reference code occur?
Yes: Continue with the next step.
No: Exchange the last card you disconnected in this step (see System FRU locations). This ends
the procedure.
9. Have you disconnected all of the cards?
No: Repeat step 8.
Yes: Reinstall all of the parts you have removed or exchanged in this procedure and perform
problem analysis using the reference code. This ends the procedure.
10. Perform the following steps:
a. Power off the system. See Powering on and powering off the system.
b. Reconnect all of the I/O devices (tape, diskette, optical, and disk units) that you disconnected in
step 2 on page 141.
c. Power on the system. See Powering on and powering off the system.
Does a power reference code occur?
Yes: Continue with the next step.
No: The problem has been resolved. This ends the procedure.
11. Perform the following steps:
a. Power off the system. See Powering on and powering off the system.
b. Disconnect one of the I/O devices (tape, diskette, optical, and disk units) that you reconnected in
step 10.
c. Power on the system. See Powering on and powering off the system.
Does a power reference code occur?
Yes: Continue with the next step.
No: Exchange the last I/O device you disconnected in this step (see System FRU locations). This
ends the procedure.
12. Have you tried disconnecting all of the I/O devices?
No: Repeat step 11 on page 142.
Yes: Reinstall all of the parts you have removed or exchanged in this procedure and contact your
next level of support. This ends the procedure.
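The procedure above repeats one pattern for DIMMs, fans, and voltage regulator cards: replace a single part, power on, and check for a power reference code. A minimal sketch of that pattern follows. The function and parameter names are hypothetical, and the callback stands in for the manual power-off, swap, and power-on sequence.

```python
from typing import Callable, Iterable, Optional


def isolate_by_replacement(
    parts: Iterable[str],
    fault_present_with_replacement: Callable[[str], bool],
) -> Optional[str]:
    """Replace one part at a time and return the part whose replacement clears the fault.

    `fault_present_with_replacement(part)` is a stand-in for the manual steps:
    power off, install a new `part`, power on, and check whether a power
    reference code occurs. It returns True if the code still occurs.
    """
    for part in parts:
        if not fault_present_with_replacement(part):
            return part  # the part just replaced was the failing item
        # The code still occurs: the original part is reinstalled before
        # the next candidate is tried.
    return None  # all candidates tried; continue with the next stage


if __name__ == "__main__":
    # Simulated system in which replacing "fan 3" clears the fault.
    failing = "fan 3"
    fans = ["fan 1", "fan 2", "fan 3", "fan 4"]
    print(isolate_by_replacement(fans, lambda p: p != failing))
```

A return value of `None` corresponds to the "Yes" branch of the "Have you tried replacing all of the ..." questions.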
Procedure for 8412-EAD, 9117-MMB, 9117-MMC, 9117-MMD, 9179-MHB, 9179-MHC,
or 9179-MHD
1. Perform the following steps:
a. Power off the system.
b. Replace one of the memory DIMMs on the processor card. See System FRU locations for
information about FRU locations for the system that you are servicing.
c. Power on the system. See Powering on and powering off the system.
d. Has this resolved the problem?
No: Continue with step 1e.
Yes: This ends the procedure.
e. Have you replaced all of the DIMMs?
No: Repeat step 1 and replace the next memory DIMM.
Yes: Go to step 2.
2. Perform the following steps:
a. Power off the system. See Powering on and powering off the system.
b. Replace the processor card. See System FRU locations for information about FRU locations for the
system you are servicing.
c. Power on the system.
d. Has this resolved the problem?
No: Continue with the next step.
Yes: This ends the procedure.
3. Perform the following steps:
a. Power off the system and disconnect the ac power cable from the unit you are working on.
b. Disconnect all of the I/O devices (tape, diskette, optical, and disk units) from the unit you are
working on by sliding them partially out of the unit.
c. Remove and label all cards (for example, PCI adapters, memory DIMMs, GX adapters, RIO/HSL
and RAID cards if installed).
d. Reconnect the ac power cable to the unit you are working on.
e. Power on the system.
Does a power reference code occur?
Yes: Continue with the next step.
No: Go to step 8 on page 144.
4. Perform the following steps:
a. Power off the system. See Powering on and powering off the system.
b. Replace one of the system fans. See System FRU locations for information about FRU locations
for the system that you are servicing.
c. Power on the system.
Does a power reference code occur?
v Yes: Continue with the next step.
v No: The fan you just replaced was the failing item.
This ends the procedure.
5. Have you tried replacing all of the fans?
v Yes: Reinstall all of the fans you replaced in step 4 and continue with the next step.
v No: Perform the following steps:
a. Power off the system.
b. Reinstall the fan that you just removed in step 4 on page 143 to its original location.
c. Repeat step 4 on page 143.
6. Perform the following steps:
a. Power off the system.
b. Replace one of the voltage regulator cards. See System FRU locations for the voltage regulator
card location for the system you are servicing.
c. Power on the system.
Does a power reference code occur?
v Yes: Continue with the next step.
v No: The voltage regulator card you just replaced was the failing item.
This ends the procedure.
7. Have you tried replacing all of the voltage regulator cards?
v Yes: Replace the processor card (Un-P3). See System FRU locations. This ends the procedure.
v No: Perform the following steps:
a. Power off the system. See Powering on and powering off the system.
b. Reinstall the voltage regulator card that you just removed in step 6 to its original location.
c. Repeat step 6.
8. Perform the following steps:
a. Power off the system. See Powering on and powering off the system.
b. Reinstall all of the cards (PCI adapters, memory DIMMs, GX adapters, RIO/HSL and RAID
cards) you removed in step 3 on page 143 to their original locations.
c. Power on the system. See Powering on and powering off the system.
Does a power reference code occur?
Yes: Continue with the next step.
No: Go to step 11.
9. Perform the following steps:
a. Power off the system. See Powering on and powering off the system.
b. Disconnect one of the cards you reinstalled in step 8.
c. Power on the system.
Does a power reference code occur?
Yes: Continue with the next step.
No: Exchange the last card you disconnected in this step (see System FRU locations). This ends
the procedure.
10. Have you disconnected all of the cards?
No: Repeat step 9.
Yes: Reinstall all of the parts you have removed or exchanged in this procedure and perform
problem analysis using the reference code. This ends the procedure.
11. Perform the following steps:
a. Power off the system. See Powering on and powering off the system.
b. Reconnect all of the I/O devices (tape, diskette, optical, and disk units) that you disconnected in
step 3 on page 143.
c. Power on the system. See Powering on and powering off the system.
Does a power reference code occur?
Yes: Continue with the next step.
No: The problem has been resolved. This ends the procedure.
12. Perform the following steps:
a. Power off the system. See Powering on and powering off the system.
b. Disconnect one of the I/O devices (tape, diskette, optical, and disk units) that you reconnected in
step 11 on page 144.
c. Power on the system. See Powering on and powering off the system.
Does a power reference code occur?
Yes: Continue with the next step.
No: Exchange the last I/O device you disconnected in this step (see System FRU locations). This
ends the procedure.
13. Have you tried disconnecting all of the I/O devices?
No: Repeat step 12.
Yes: Reinstall all of the parts you have removed or exchanged in this procedure and contact your
next level of support. This ends the procedure.
PWR1905
A system unit power supply load fault is occurring.
See “Power isolation procedures” on page 124 for important safety information before servicing the
system.
Procedure for 8202-E4B, 8202-E4C, 8202-E4D, 8205-E6B, 8205-E6C, 8205-E6D,
8231-E2B, 8231-E1C, 8231-E1D, 8231-E2C, 8231-E2D, 8233-E8B, 8236-E8C, or
8268-E1D
1. Is the reference code 1xxx 1B01?
Yes: Continue with the next step.
No: This procedure only isolates problems that cause 1xxx 1B01 to be logged. Return to the
procedure that sent you here. This ends the procedure.
2. Perform the following steps:
a. Power off the system and disconnect the ac power cable from the unit you are working on. See
Powering on and powering off the system.
b. Disconnect all the I/O devices (tape, diskette, optical, and disk units) by sliding them partially
out of the system unit. See System FRU locations for information about system FRU parts,
locations and addresses.
c. Remove and label all cards (for example, PCI adapters, GX adapters, RIO/HSL, and RAID cards
if installed).
d. Reconnect the ac power cable or cables to the unit you are working on.
e. Power on the system. See Powering on and powering off the system for information about
powering on or off your system.
Does a power reference code occur?
Yes: Continue with the next step.
No: Go to step 5 on page 146.
3. Perform the following steps:
a. Power off the system. See Powering on and powering off the system.
b. Replace one of the system fans. See System FRU locations for information about system fan
locations for the system you are servicing.
c. Power on the system. See Powering on and powering off the system for information about
powering on or off your system.
Does a power reference code occur?
Yes: Continue with the next step.
No: The fan you just replaced was the failing item. This ends the procedure.
4. Have you tried replacing all the fans?
v Yes: Reinstall the fan you just replaced in step 3 on page 145 and continue with the next step.
v No: Perform the following steps:
a. Power off the system.
b. Reinstall the fan that you just removed in step 3 on page 145 to its original location.
c. Repeat step 3 on page 145.
5. Perform the following steps:
a. Power off the system.
b. Reinstall all the cards (PCI adapters, memory DIMMs, GX adapters, RIO/HSL and RAID cards)
you removed in step 2 on page 145 into their original locations.
c. Power on the system. See Powering on and powering off the system.
Does a power reference code occur?
Yes: Continue with the next step.
No: Go to step 8.
6. Perform the following steps:
a. Power off the system. See Powering on and powering off the system.
b. Disconnect one of the cards you reinstalled in step 5.
c. Power on the system. See Powering on and powering off the system.
Does a power reference code occur?
Yes: Continue with the next step.
No: Exchange the last card you disconnected in this step. See System FRU locations for
information about FRU locations for the system you are servicing. This ends the procedure.
7. Have you disconnected all the cards?
No: Repeat step 6.
Yes: Reinstall all of the parts removed or exchanged in this procedure and return to Start of call.
This ends the procedure.
8. Perform the following steps:
a. Power off the system. See Powering on and powering off the system.
b. Reconnect all of the I/O devices (tape, diskette, optical, and disk units) that you disconnected in
step 2 on page 145.
c. Power on the system. See Powering on and powering off the system.
Does a power reference code occur?
Yes: Continue with the next step.
No: The problem has been resolved. This ends the procedure.
9. Perform the following steps:
a. Power off the system. See Powering on and powering off the system.
b. Disconnect one of the I/O devices (tape, diskette, optical, and disk units) that you reconnected in
step 8.
c. Power on the system. See Powering on and powering off the system.
Does a power reference code occur?
Yes: Continue with the next step.
No: Exchange the last I/O device you disconnected in this step. See System FRU locations for
information about the I/O device location for the system you are servicing. This ends the
procedure.
10. Have you tried disconnecting all of the I/O devices?
No: Repeat step 9 on page 146.
Yes: Continue with the next step.
11. Replace the system backplane (Un-P1). Reinstall all of the parts you have removed or exchanged in
this procedure.
Does a power reference code occur?
Yes: Continue with the next step.
No: The system is fixed. This ends the procedure.
12. Does the system contain only one power supply?
Yes: Continue with the next step.
No: Go to step 14.
13. Replace the power supply.
Does a power reference code occur?
Yes: Contact your service support. This ends the procedure.
No: The system is fixed. This ends the procedure.
14. Power off the system. Remove one of the power supplies, and then power on the system. See
Powering on and powering off the system.
Does a power reference code occur?
Yes: Continue with the next step.
No: The power supply you just removed is defective. Replace it. This ends the procedure.
15. Power off the system. Reinstall the power supply that was removed in step 14, and remove the other
power supply. Power on the system. See Powering on and powering off the system.
Does a power reference code occur?
Yes: Continue with the next step.
No: The power supply you just removed is defective. Replace it. This ends the procedure.
16. Power off the system. Replace both power supplies. Power on the system.
Does a power reference code occur?
Yes: Contact your service support. This ends the procedure.
No: The system is fixed. This ends the procedure.
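Steps 14 through 16 isolate a failing supply in a two-supply system by running on each supply alone. The sketch below mirrors that logic under stated assumptions: the function name, the callback, and the supply labels in the usage note are illustrative, and the callback stands in for powering on with only the listed supplies installed and checking for a power reference code.

```python
from typing import Callable, FrozenSet, Tuple


def isolate_dual_supplies(
    supplies: Tuple[str, str],
    fault_with_installed: Callable[[FrozenSet[str]], bool],
) -> str:
    """Mirror steps 14 to 16 for a system with two power supplies.

    `fault_with_installed(installed)` returns True if a power reference
    code occurs when only the supplies in `installed` are present.
    """
    first, second = supplies
    # Step 14: remove the first supply and power on with the second alone.
    if not fault_with_installed(frozenset({second})):
        return f"replace {first}"
    # Step 15: reinstall the first supply and remove the second.
    if not fault_with_installed(frozenset({first})):
        return f"replace {second}"
    # Step 16: replace both; if the code still occurs, contact service support.
    return "replace both supplies"
```

With labels such as `("E1", "E2")` and a simulated fault that occurs whenever `"E1"` is installed, the function returns `"replace E1"`.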
PWR1907
A unit was dropped from the SPCN configuration.
1. Is the reference code you are working with 1xxx 913B?
v No: Continue with the next step.
v Yes: A system power control network (SPCN) firmware update is needed but was not started because of
the SPCN firmware update policy setting. Start the update manually.
Notes:
– Do not perform maintenance on an expansion unit or modify the SPCN network while the SPCN
firmware update is being performed.
– Performing firmware updates or powering off the system will interrupt SPCN firmware updates
and the firmware update will need to be started again after these actions.
a. Access the Advanced System Management Interface (ASMI) and select System Configuration and
then Configure I/O Enclosures.
b. Record the current SPCN firmware update policy setting so it can be restored later.
c. Change the SPCN firmware update policy setting to expanded and click Save Policy Setting to
allow for SPCN firmware updates to be done over the RIO/HSL and serial SPCN interfaces.
d. Select Start SPCN Firmware Update. The SPCN firmware will now be downloaded to the
expansion units that require an update.
e. Change the SPCN firmware update policy setting back to what it was originally set to in step 1b
on page 147 and click on Save Policy Setting.
Note: The SPCN firmware update can be stopped using the Stop SPCN Firmware Update button.
However, the firmware update must be allowed to complete in order for the expansion units to be
updated to the latest SPCN firmware level.
The progress of the SPCN firmware update can be monitored by clicking on Configure I/O
Enclosures to update the screen. Do not use the browser Back or Refresh button to monitor the
update progress. The Power Control Network Firmware Update Status column shows the
percentage complete and In Progress is displayed while the download is in progress. Not
Required is displayed when the download process completes.
This ends the procedure.
2. Is the reference code you are working with 1xxx 90F0?
v No: Contact your next level of support.
v Yes: A unit was dropped from the SPCN configuration.
This can be caused by any of the following:
– The rack or unit has lost all ac or dc power.
– The SPCN function in the unit has an error.
– The SPCN frame-to-frame cable has failed.
3. Using the HMC or ASMI, find the 1xxx 90F0 SRC in the error log (see Displaying error and event
logs). Use the option Show details to display the location information for the failing unit.
4. After locating the failing unit, ensure that the SPCN cable is seated correctly. Reseat the SPCN cable, if
necessary.
Is the cable connected correctly?
v No: Correctly reconnect the SPCN cable, or replace it if necessary. This ends the procedure.
v Yes: Continue with the next step.
5. Are the ac line cords on the failing unit connected properly at both ends?
v No: Reconnect the ac line cords, or replace them if necessary. This ends the procedure.
v Yes: Continue with the next step.
6. Check the voltage at the customer's ac outlet. Is the voltage correct?
v No: Inform the customer that the voltage at the ac power outlet is incorrect. This ends the
procedure.
v Yes: Continue with the next step.
7. Are the power supplies functional?
v No: Perform the following steps:
a. See System FRU locations to determine the location and part number for each power supply,
and to find the appropriate procedure for exchanging the power supplies.
b. Replace each power supply one at a time, until the problem has been resolved.
c. If the problem persists after replacing all of the power supplies, continue with the next step.
v Yes: Continue with the next step.
8. Use the following table to determine the FRUs to replace. See System FRU locations for instructions
on replacing FRUs. This ends the procedure.
Table 28. Expansion enclosures FRUs
Enclosure         FRU
5796, 7314-G30    I/O backplane
5802, 5877        EMC card, Un-P2
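The reference-code checks in steps 1 and 2 and the lookup in Table 28 can be sketched as follows. This is an illustration only: the function names are hypothetical, and the sketch assumes a reference code is written as eight characters beginning with 1 (for example, "1xxx 913B" with the xxx filled in), optionally with a space in the middle.

```python
# Enclosure-to-FRU mapping copied from Table 28.
ENCLOSURE_FRU = {
    "5796": "I/O backplane",
    "7314-G30": "I/O backplane",
    "5802": "EMC card, Un-P2",
    "5877": "EMC card, Un-P2",
}


def triage_spcn_code(reference_code: str) -> str:
    """Classify a reference code the way steps 1 and 2 of PWR1907 do."""
    code = reference_code.replace(" ", "").upper()
    # Assumption: a 1xxx code is eight characters and starts with "1".
    if len(code) == 8 and code.startswith("1"):
        if code.endswith("913B"):
            return "start a manual SPCN firmware update"
        if code.endswith("90F0"):
            return "unit dropped from SPCN configuration"
    return "contact your next level of support"


def fru_for_enclosure(enclosure: str) -> str:
    """Return the FRU that Table 28 lists for an expansion enclosure."""
    return ENCLOSURE_FRU[enclosure]
```

An enclosure that Table 28 does not list raises `KeyError`, which keeps the sketch from guessing at a FRU.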
PWR1909
A power supply load fault is occurring in an expansion unit.
For important safety information before servicing the system, see “Power isolation procedures” on page
124.
Procedure for 5796, 5802, 5877, and 7314-G30
See System FRU locations for information about FRU locations.
1. Perform the following steps:
a. Power off the system.
b. Disconnect all the I/O devices (tape, diskette, optical, and disk units) from the expansion unit
you are working on by sliding them partially out of the unit. See System FRU locations for
information about FRU location and removal.
c. Remove and label all cards installed in the PCI adapters area.
d. Power on the system. See Powering on and powering off the system.
Does a power reference code occur?
Yes: Continue with the next step.
No: Go to step 8 on page 150.
2. Perform the following steps:
a. Power off the system. See Powering on and powering off the system.
b. Remove one of the fans from the expansion unit or I/O expansion unit that you have not
previously removed during this procedure.
Note: If a fan reference code occurs during this step, ignore it.
c. Power on the system. See Powering on and powering off the system.
Does a power reference code occur?
v Yes: Continue with the next step.
v No: The fan you removed in this step is the failing item.
This ends the procedure.
3. Have you removed all of the fans one at a time?
v No: Perform the following steps:
a. Power off the system. See Powering on and powering off the system.
b. Reinstall the fan that you removed in step 2 into its original location.
c. Repeat step 2.
v Yes: Reinstall all of the fans and continue with the next step.
4. Perform the following steps:
Note: If there are no DASD installed in this enclosure, go to step 6 on page 150.
a. Power off the system. See Powering on and powering off the system.
b. Remove the expansion unit power supply cable, at the DASD backplane, that you have not
previously removed.
c. Power on the system. See Powering on and powering off the system.
Does a power reference code occur?
v No: The DASD backplane that was disconnected in this step is the failing item.
This ends the procedure.
v Yes: Continue with the next step.
5. Have you disconnected the power cables from each of the DASD backplanes one at a time?
Yes: Continue with the next step.
No: Repeat step 4 on page 149.
6. Perform the following steps:
a. Power off the system. See Powering on and powering off the system.
b. Remove a power supply that you have not previously removed, and replace it with a new one.
c. Power on the system. See Powering on and powering off the system.
Does a power reference code occur?
v Yes: Continue with the next step.
v No: The power supply that was removed in this step is the failing item.
This ends the procedure.
7. Have you removed all of the power supplies one at a time?
v Yes: Perform the following steps:
a. Remove the new power supply that you installed in step 6 and reinstall the original power
supply.
b. Replace the I/O backplane if you are working on a 5796, 5802, 5877, or 7314-G30.
This ends the procedure.
v No: Remove the new power supply that you installed in step 6 and reinstall the original power
supply. Then, repeat step 6.
8. Perform the following steps:
a. Power off the system. See Powering on and powering off the system.
b. Reinstall all of the cards you removed in step 1 on page 149.
c. Power on the system. See Powering on and powering off the system.
Does a power reference code occur?
Yes: Continue with the next step.
No: Go to step 11.
9. Perform the following steps:
a. Power off the system. See Powering on and powering off the system.
b. Disconnect one of the cards you reconnected in step 8.
c. Power on the system. See Powering on and powering off the system.
Does a power reference code occur?
v Yes: Continue with the next step.
v No: Exchange the last card you disconnected in this step. See System FRU locations for
information about the FRU location for the system that you are servicing. Reinstall all the parts.
This ends the procedure.
10. Have you disconnected all the cards?
v No: Repeat step 9.
v Yes: Reinstall all the parts and return to Start of call.
This ends the procedure.
11. Perform the following steps:
a. Power off the system.
b. Reconnect all of the I/O devices (tape, diskette, optical, or disk units) that you disconnected in
step 1 on page 149.
c. Power on the system.
Does a power reference code occur?
Yes: Continue with the next step.
No: This ends the procedure.
12. Perform the following steps:
a. Power off the system.
b. Disconnect one of the I/O devices you reconnected in step 11 on page 150.
c. Power on the system.
Does a power reference code occur?
v Yes: Continue with the next step.
v No: Exchange the last I/O device you disconnected in this step.
This ends the procedure.
13. Have you disconnected all of the I/O devices?
v No: Repeat step 12.
v Yes: Reinstall all the parts and return to Start of call.
This ends the procedure.
PWR1911
You are here because of a power problem on a dual line cord system. If the failing unit does not have a
dual line cord, return to the procedure that sent you here or go to the next item in the FRU list.
The following steps are for the system unit, unless other instructions are given. For important safety
information before servicing the system, see “Power isolation procedures” on page 124.
1. If an uninterruptible power supply is installed, verify that it is powered on before proceeding.
2. Are all the units powered on?
Note: In an 8233-E8B or 8236-E8C system, there are two internal ac power cables that run from the
back of the drawer to the power supplies in the front. The upper ac power connector on the back of
the system goes to power supply E1. The lower ac power connector on the back of the system goes
to power supply E2.
v Yes: Go to step 7 on page 153.
v No: On the unit that does not power on, perform the following steps:
a. Disconnect the ac line cords from the unit that does not power on.
b. Use a multimeter to measure the ac voltage at the system end of both ac line cords.
Table 29. Correct ac voltage
Model or expansion unit                                   Correct ac voltage
8202-E4B, 8202-E4C, 8202-E4D, 8205-E6B, 8205-E6C,         100 - 127 V or 200 - 240 V
  8205-E6D, 8231-E2B, 8231-E1C, 8231-E1D, 8231-E2C,
  8231-E2D, 8233-E8B, 8236-E8C, or 8268-E1D
8248-L4T, 8408-E8D, 8412-EAD, 9109-RMD, 9117-MMB,         200 - 240 V
  9117-MMC, 9117-MMD, 9179-MHB, 9179-MHC, or
  9179-MHD
5796 and 7314-G30 expansion units                         200 - 240 V
5802 expansion unit                                       90 - 259 V
c. Is the voltage correct (see Table 29)?
Yes: Continue with the next step.
No: Go to step 6.
3. Are you working on an 8202-E4B, 8202-E4C, 8202-E4D, 8205-E6B, 8205-E6C, 8205-E6D, 8231-E2B,
8231-E1C, 8231-E1D, 8231-E2C, 8231-E2D, 8233-E8B, 8236-E8C, 8268-E1D, 8248-L4T, 8408-E8D,
8412-EAD, 9109-RMD, 9117-MMB, 9117-MMC, 9117-MMD, 9179-MHB, 9179-MHC, or 9179-MHD
system, or a 5796, 7314-G30, 5802, or 5877 expansion unit?
v No: Continue with the next step.
v Yes: Perform the following steps:
a. Reconnect the ac line cords.
b. Verify that the failing unit fails to power on.
c. Replace the failing power supply. Use the table below to determine which power supply needs
replacing, and then see System FRU locations for its location, part number, and exchange
procedure.
Table 30. Failing power supply for systems models and expansion units
Reference code    System or expansion unit    Failing item name
1510              System unit                 Power supply 1
1510              Expansion unit              Power supply 1
1520              System unit                 Power supply 2
1520              Expansion unit              Power supply 2
This ends the procedure.
4. Perform the following steps:
a. Reconnect the ac line cord to the ac modules.
b. Remove the ac jumper cables at the power supplies.
c. Use a multimeter to measure the ac voltage at the power-supply end of the jumper cable.
Is the ac voltage from 200 V to 240 V?
v No: Continue with the next step.
v Yes: Replace the failing power supply. See System FRU locations for its location, part number, and
exchange procedure.
Attention: Do not install power supplies P00 and P01 ac jumper cables on the same ac module.
This ends the procedure.
5. Perform the following steps:
a. Disconnect the ac jumper cable at the ac module output.
b. Use a multimeter to measure the ac voltage at the ac module output.
Is the ac voltage from 200 V to 240 V?
v Yes: Exchange the ac jumper cable.
This ends the procedure.
v No: Exchange the ac module. See System FRU locations for FRU location information for the
system that you are servicing.
This ends the procedure.
6. Perform the following steps:
a. Disconnect the ac line cords from the customer's ac power outlet.
b. Use a multimeter to measure the ac voltage at the customer's ac power outlet.
Is the ac voltage correct (see Table 29 on page 151)?
v Yes: Exchange the failing ac line cord.
This ends the procedure.
v No: Perform the following steps:
a. Inform the customer that the ac voltage at the power outlet is not correct.
b. Reconnect the ac line cords to the power outlet after the ac voltage at the power outlet is
correct.
This ends the procedure.
7. Is the reference code 1xxx 00AC?
v No: Continue with the next step.
v Yes: This reference code may have been caused by an ac outage. If the system will power on
without an error, no parts need to be replaced.
This ends the procedure.
8. Is the reference code 1xxx 1510 or 1xxx 1520?
v No: Continue with the next step.
v Yes: Perform the following steps:
a. Use the following table, figures and location codes to locate the failing parts. See System FRU
locations for information about FRU location for the system that you are servicing.
Table 31. Power reference code table
System or expansion unit    Reference code    Locate these parts
System unit                 1xxx 1510         Power supply E1 and ac line cord 1
System unit                 1xxx 1520         Power supply E2 and ac line cord 2
Expansion unit              1xxx 1510         Power supply 1 and ac line cord 1
Expansion unit              1xxx 1520         Power supply 2 and ac line cord 2
b. Locate the ac line cord or the ac jumper cable for the reference code you are working on.
c. Go to step 10.
9. Is the reference code 1xxx 1500 or 1xxx 1530?
v No: Perform Problem Analysis using the reference code.
This ends the procedure.
v Yes: Locate the ac jumper cables for the reference code you are working on (see Table 31), and
then continue with the next step:
– If the reference code is 1xxx 1500, determine the locations of ac jumper cables that connect to
power supply P00 (see the preceding figures).
– If the reference code is 1xxx 1530, determine the locations of ac jumper cables that connect to
power supply P03 (see the preceding figures).
10. Perform the following steps:
Attention: Do not disconnect the other system line cord or the other ac jumper cable when
powered on.
a. For the reference code you are working on, disconnect either the ac jumper cable or the ac line
cord from the power supply.
b. Use a multimeter to measure the ac voltage at the power supply end of the ac jumper cable or
the ac line cord.
Is the ac voltage correct (see Table 29 on page 151)?
No: Continue with the next step.
Yes: Exchange the failing power supply. See Table 30 on page 152 for its position, and then see
System FRU locations for part numbers and directions to the correct exchange procedures. This
ends the procedure.
11. Perform the following steps:
a. Disconnect the ac line cords from the power outlet.
b. Use a multimeter to measure the ac voltage at the customer's ac power outlet.
Is the ac voltage correct (see Table 29 on page 151)?
v Yes: Exchange the following, one at a time:
– Failing ac line cord
– Failing ac jumper cable (if installed)
– Failing ac module (if installed) (see System FRU locations for part numbers and directions to
the correct exchange procedures)
This ends the procedure.
v No: Perform the following steps:
a. Inform the customer that the ac voltage at the power outlet is not correct.
b. Reconnect the ac line cords to the power outlet after the ac voltage at the power outlet is
correct.
This ends the procedure.
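The voltage checks in this procedure compare a multimeter reading against the ranges in Table 29, first at the power supply end of the cord and then at the customer's outlet (steps 10 and 11). The following sketch shows one way to express that comparison. The names `AC_RANGES`, `voltage_is_correct`, and `next_action` are hypothetical; the ranges are copied from Table 29, and no range is assumed for units that the table does not list.

```python
from typing import Dict, List, Tuple

Range = Tuple[float, float]

# Ranges in volts, copied from Table 29.
LOW_OR_HIGH: List[Range] = [(100.0, 127.0), (200.0, 240.0)]
HIGH_ONLY: List[Range] = [(200.0, 240.0)]

AC_RANGES: Dict[str, List[Range]] = {}
for _model in ("8202-E4B", "8202-E4C", "8202-E4D", "8205-E6B", "8205-E6C",
               "8205-E6D", "8231-E2B", "8231-E1C", "8231-E1D", "8231-E2C",
               "8231-E2D", "8233-E8B", "8236-E8C", "8268-E1D"):
    AC_RANGES[_model] = LOW_OR_HIGH
for _model in ("8248-L4T", "8408-E8D", "8412-EAD", "9109-RMD", "9117-MMB",
               "9117-MMC", "9117-MMD", "9179-MHB", "9179-MHC", "9179-MHD",
               "5796", "7314-G30"):
    AC_RANGES[_model] = HIGH_ONLY
AC_RANGES["5802"] = [(90.0, 259.0)]


def voltage_is_correct(model: str, measured_volts: float) -> bool:
    """True if the reading falls inside a Table 29 range for the model."""
    return any(lo <= measured_volts <= hi for lo, hi in AC_RANGES[model])


def next_action(model: str, volts_at_supply_end: float, volts_at_outlet: float) -> str:
    """Mirror steps 10 and 11: check the supply end first, then the outlet."""
    if voltage_is_correct(model, volts_at_supply_end):
        return "exchange the failing power supply"
    if voltage_is_correct(model, volts_at_outlet):
        return "exchange the ac line cord, jumper cable, or ac module, one at a time"
    return "inform the customer that the outlet voltage is not correct"
```

In practice the outlet is measured only when the supply-end reading is out of range; the sketch takes both readings as arguments to keep the decision in one function.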
PWR1912
The server detected an error in the power system.
If you need additional information for failing part numbers, location codes, or removal and replacement
procedures, see Part locations and location codes. Select your machine type and model number to find
additional location codes, part numbers, or replacement procedures for your system.
DANGER
When working on or around the system, observe the following precautions:
Electrical voltage and current from power, telephone, and communication cables are hazardous. To
avoid a shock hazard:
v Connect power to this unit only with the IBM provided power cord. Do not use the IBM
provided power cord for any other product.
v Do not open or service any power supply assembly.
v Do not connect or disconnect any cables or perform installation, maintenance, or reconfiguration
of this product during an electrical storm.
v The product might be equipped with multiple power cords. To remove all hazardous voltages,
disconnect all power cords.
v Connect all power cords to a properly wired and grounded electrical outlet. Ensure that the outlet
supplies proper voltage and phase rotation according to the system rating plate.
v Connect any equipment that will be attached to this product to properly wired outlets.
v When possible, use one hand only to connect or disconnect signal cables.
v Never turn on any equipment when there is evidence of fire, water, or structural damage.
v Disconnect the attached power cords, telecommunications systems, networks, and modems before
you open the device covers, unless instructed otherwise in the installation and configuration
procedures.
v Connect and disconnect cables as described in the following procedures when installing, moving,
or opening covers on this product or attached devices.
To Disconnect:
1. Turn off everything (unless instructed otherwise).
2. Remove the power cords from the outlets.
3. Remove the signal cables from the connectors.
4. Remove all cables from the devices.
To Connect:
1. Turn off everything (unless instructed otherwise).
2. Attach all cables to the devices.
3. Attach the signal cables to the connectors.
4. Attach the power cords to the outlets.
5. Turn on the devices.
(D005)
1. Perform the following steps:
a. Ensure that both power line cords are properly connected.
b. Make sure that the unit EPO switch is in the on position.
c. Make sure that the unit EPO bypass switches on both bulk power controllers (BPCs) are in the
normal position.
d. Ensure that the cable from unit EPO connector J00 to BPC-A connector J05 and the cable from unit
EPO connector J01 to BPC-B connector J05 are secure and undamaged.
e. Ensure that the room temperature is not in excess of the maximum allowed (40° Celsius or 104°
Fahrenheit).
Note: If the room temperature has exceeded the maximum allowed, the system may continually
cycle on and off.
Were any problems discovered while performing the above checks?
No: Continue with the next step.
Yes: Correct any problems you found. This ends the procedure.
2. Make sure that the on/off switches on all the bulk power regulators (BPRs) are in the on (left)
position.
Note: A switch that is set to the off position is not the cause of the problem, but all of the
switches must be on before you proceed.
3. Check the state of the LEDs on both sides of the bulk power assembly (BPA) and then choose from
the following conditions:
v If all of the LEDs on both sides of the BPA are off, go to step 4.
v If the unit EPO power LED is on, the BPC GOOD LED is on, and all other LEDs are
off, go to step 5.
v If neither of the above two conditions is true, independent faults are indicated on both sides of the
BPA. Each side must be isolated separately. Call your next level of support. This ends the
procedure.
4. Prepare a voltage meter to measure up to 600 V ac. Using the labeled test points on the frame,
measure the voltage between phase A and phase B. Is the voltage greater than 180 V ac?
Yes: Independent faults are indicated on both sides of the BPA. Each side must be isolated
separately. Call your next level of support. This ends the procedure.
No: Inform the customer that power line voltage at the input to the BPR is missing or too low and
needs to be corrected. This ends the procedure.
5. Is a cable connected to connector J02 on the unit EPO card?
No: Continue with the next step.
Yes: Go to step 7 to determine if the room EPO circuit is the problem.
6. Is the internal toggle switch on the unit EPO card set to the RM EPO BYPASS position?
No: Set the internal toggle switch on the unit EPO card to the RM EPO BYPASS position. This
ends the procedure.
Yes: The unit EPO card is the failing item and needs to be replaced. This ends the procedure.
7. Unplug the cable from connector J02 on the EPO card and set the toggle switch to the RM EPO
BYPASS position. Does the EPO CMPLT LED on at least one BPC become lit?
Yes: Inform the customer that the room EPO circuit is defective at this connection and requires
service. This ends the procedure.
No: The unit EPO card is the failing item and needs to be replaced. This ends the procedure.
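The branching in steps 3 through 7 of PWR1912 can be summarized as a small decision function. The following Python sketch is illustrative only: the function name, the parameter names, and the outcome strings are assumptions made for this example and are not part of any IBM tool. It assumes that the checks in steps 1 and 2 have already passed.

```python
from typing import Optional


def pwr1912_outcome(
    all_leds_off: bool,
    only_epo_power_and_bpc_good_on: bool,
    phase_a_to_b_volts_ac: Optional[float] = None,
    j02_cable_connected: bool = False,
    toggle_in_rm_epo_bypass: bool = False,
    epo_cmplt_lit_after_bypass: bool = False,
) -> str:
    """Return the ending action of PWR1912 for the observed conditions.

    Hypothetical helper; names and return strings are illustrative.
    """
    if all_leds_off:
        # Step 4: measure phase A to phase B at the frame test points.
        if phase_a_to_b_volts_ac is None:
            raise ValueError("step 4 needs a phase A to phase B reading")
        if phase_a_to_b_volts_ac > 180:
            return "call next level of support"
        return "customer must correct line voltage"
    if only_epo_power_and_bpc_good_on:
        if j02_cable_connected:
            # Step 7: unplug J02, set the toggle to RM EPO BYPASS,
            # then watch the EPO CMPLT LED on the BPCs.
            if epo_cmplt_lit_after_bypass:
                return "room EPO circuit needs service"
            return "replace unit EPO card"
        # Step 6: no room EPO cable, so the toggle belongs in RM EPO BYPASS.
        if toggle_in_rm_epo_bypass:
            return "replace unit EPO card"
        return "set toggle to RM EPO BYPASS"
    # Step 3, third condition: independent faults on both sides of the BPA.
    return "call next level of support"
```

For example, with all LEDs off and a reading of 120 V ac between phase A and phase B, the sketch returns the step 4 "No" outcome (the customer must correct the line voltage).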
PWR1917
This procedure is used to display or change the configuration ID.
If you need additional information for failing part numbers, location codes, or removal and replacement
procedures, see Part locations and location codes. Select your machine type and model number to find
additional location codes, part numbers, or replacement procedures for your system.
1. Use either the Advanced System Management Interface (ASMI) or the control panel to display and
change the configuration ID.
v If you are using the ASMI, see the operations guide for your system and use the instructions for
Changing system configuration. Use Table 32 on page 157 to find the correct configuration ID.
v If you are using the control panel, continue with the next step.
2. Perform the following steps to display the configuration ID:
Attention: The system or unit that will display the ID must be powered off with ac power applied.
Notes:
v If you have just restored power to the system, the service processor must return to standby before
control panel functions will work correctly. Returning the service processor to standby takes a few
minutes after the panel appears to be operational.
v You must have the panel in manual mode to access function 7 options.
a. Select function 07 on the system control panel. Press Enter (07** will be displayed).
b. Use the arrow keys to increment or decrement to subfunction A8. 07A8 will be displayed. Press
Enter (07A8 00 will be displayed).
c. Use the arrow keys to increment or decrement to the first byte of the unit address (usually 3C) for
the box you want to check. 07nn will be displayed, where nn is the first byte of the unit address.
d. Press Enter (073C 00, for example, will be displayed).
e. Use the arrow keys to increment or decrement to the second byte of the unit address (usually 01,
02, and so on for I/O expansion units) for the box you want to check. 07nn will be displayed, where nn is
the second byte of the unit address (0701, for example, for a unit). Press Enter (0701 00, for
example, will be displayed).
Note: The display on the addressed I/O expansion unit should be flashing on
and off while displaying the configuration ID as the last two characters of the bottom line.
f. Use the following table to check the unit configuration ID.
Table 32. Unit configuration IDs
Model or expansion unit     Configuration ID
5802 or 5877                8E
5796 or 7314-G30            8D
g. Is the correct configuration ID displayed for the expansion unit selected?
v No: Continue with the next step.
v Yes: Go to step 6 on page 158.
3. You need to set the unit configuration ID. Are you starting this step from the function 01 view on the
control panel?
v No: To ensure that the control panel operates properly, return to function 01. Do the following:
a. The operator panel should still show the incorrect configuration ID (for example, 07C0).
b. Press Enter. The control panel will now show 07xx 00 (for example, 07C0 00).
c. Use the arrow keys to display 07**, then press Enter. The control panel will now show 07.
d. Use the arrow keys to get the display to function 01, then press Enter. You should now be at the
regular function 01 control panel view.
e. Continue with the next step.
v Yes: Continue with step 4.
4. Set the unit configuration ID. Do the following:
a. Select function 07 on the system control panel. Press Enter (07** will be displayed).
b. Use the arrow keys to increment/decrement to subfunction A9 (07A9 will be displayed). Press
Enter (07A9 00 will be displayed).
c. Use the arrow keys to increment/decrement to the first byte of the unit address (usually 3C) for
the box that you want to change. 07nn (073C, for example) will be displayed, where nn is the first
byte of the unit address. Press Enter (073C 00, for example, will be displayed).
d. Use the arrow keys to increment/decrement to the second byte of the unit address (usually 01, 02,
and so on for I/O expansion units) for the box that you want to change. 07nn will be displayed, where nn is
the second byte of the unit address (0701, for example, for a unit). Press Enter (0701 00, for example,
will be displayed).
Note: The display on the addressed I/O expansion unit will be flashing on and off.
e. Use the arrow keys to increment/decrement to the correct configuration ID (see Table 32). 07xx
will be displayed where xx is the configuration ID.
f. Press Enter (07xx 00 will be displayed). After 20 to 30 seconds, the display on the addressed I/O
expansion unit will stop flashing and return to the normal display format.
Note: To return the panel to normal display, scroll to 07** and press Enter.
g. Continue with the next step.
5. Power on the system. Do you still get SRC 1xxxx84D0 or 1xxxx840E?
v No: This ends the procedure.
v Yes: Continue with the next step.
6. Perform the following steps:
a. Power off the system.
b. Exchange the SPCN card in the failing frame. This ends the procedure.
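The comparison in step 2 of PWR1917 (reading the displayed configuration ID and checking it against Table 32) can be expressed as a simple lookup. The following Python sketch is illustrative only; the dictionary name and the function name are assumptions made for this example, and the values are copied from Table 32.

```python
# Table 32 from this procedure, keyed by model or expansion unit.
UNIT_CONFIG_IDS = {
    "5802": "8E",
    "5877": "8E",
    "5796": "8D",
    "7314-G30": "8D",
}


def config_id_is_correct(unit: str, displayed_id: str) -> bool:
    """Compare the ID read from the panel with the Table 32 value.

    Hypothetical helper: displayed_id is the two-character configuration
    ID as read by the servicer, not a raw panel string.
    """
    expected = UNIT_CONFIG_IDS.get(unit.strip().upper())
    if expected is None:
        raise KeyError(f"{unit!r} is not listed in Table 32")
    return displayed_id.strip().upper() == expected
```

A result of False corresponds to the "No" branch of step 2 (set the unit configuration ID); a result of True corresponds to the "Yes" branch.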
PWR1918
The server detected an error in the power system.
See System FRU locations for information about FRU locations for the system that you are servicing.
Select the system that you are servicing and then perform the indicated PWR1918 procedure.
8202-E4B, 8202-E4C, 8202-E4D, 8205-E6B, 8205-E6C, or 8205-E6D
8231-E2B, 8231-E1C, 8231-E1D, 8231-E2C, 8231-E2D, or 8268-E1D
8233-E8B, 8236-E8C
8248-L4T, 8408-E8D, 9109-RMD
8412-EAD, 9117-MMB, 9117-MMC, 9117-MMD, 9179-MHB, 9179-MHC, or 9179-MHD
Table 33. For 8202-E4B, 8202-E4C, 8202-E4D, 8205-E6B, 8205-E6C, or 8205-E6D, use the following table to
determine the action to perform.
Reference code
Action
1xxx 2611
Replace the following, if present, one at a time, and in the order listed:
1. System backplane (Un-P1)
2. PCIe riser (Un-P1-C1)
3. Disk drive backplane (Un-P2)
1xxx 2622
Replace the following, if present, one at a time, and in the order listed:
1. System backplane (Un-P1)
2. RAID card (Un-P1-C13 or Un-P1-C14 or Un-P1-C19)
3. PCIe riser (Un-P1-C1)
1xxx 2623
Replace the following, if present, one at a time, and in the order listed:
1. System backplane (Un-P1)
2. RAID battery (Un-P1-C14-E1 or Un-P1-C19-E1)
3. RAID card (Un-P1-C13 or Un-P1-C14 or Un-P1-C19)
4. PCIe riser (Un-P1-C1)
5. Disk drive backplane (Un-P2)
1xxx 2625
Replace the following, if present, one at a time, and in the order listed:
1. System backplane (Un-P1)
2. RAID battery (Un-P1-C14-E1 or Un-P1-C19-E1)
3. RAID card (Un-P1-C13 or Un-P1-C14 or Un-P1-C19)
4. HEA adapter (Un-P1-C3)
5. Disk drive backplane (Un-P2)
1xxx 2626
Replace the following, if present, one at a time, and in the order listed:
1. System backplane (Un-P1)
2. RAID card (Un-P1-C13 or Un-P1-C14 or Un-P1-C19)
3. HEA adapter (Un-P1-C3)
4. PCIe riser (Un-P1-C1)
1xxx 2628
Replace the system backplane (Un-P1).
1xxx 2691
Perform the following steps:
1. Remove one of the memory cards that has not already been removed in this
procedure and replace it with a new memory card.
2. Power on the system. See Powering on and powering off the system for
information on starting and stopping the system.
3. Is reference code 1xxx 2691 logged again?
Yes:
Continue with the next step.
No:
The memory card that was replaced was the failing item. This ends the
procedure.
4. Have all of the memory cards been replaced?
Yes:
Replace the system backplane at location Un-P1. This ends the
procedure.
No:
Reinstall the original memory card and repeat this procedure.
1xxx 8450
At least one VRM is missing from the system unit. Inspect all of the processor cards
and memory cards and install the VRMs that are missing. See System FRU locations
for FRU location information for the system that you are servicing.
1xxx 8453
There is an extra processor regulator in the system. Remove the extra regulator.
1xxx 8454
A fan at location Un-A3 is installed in a single-socket system. This configuration is not
valid. Remove the fan at location Un-A3.
1xxx 8457 and 1xxx 8458
Low power processor VRMs were installed on the system. The system requires high
power processor VRMs. Replace the processor VRMs with the correct part number.
Table 34. For 8231-E2B, 8231-E1C, 8231-E1D, 8231-E2C, 8231-E2D, or 8268-E1D, use the following table to
determine the action to perform.
Reference code
Action
1xxx 2611
Replace the following, if present, one at a time, and in the order listed:
1. System backplane (Un-P1)
2. Disk drive backplane (Un-P2)
1xxx 2622
Replace the following, if present, one at a time, and in the order listed:
1. System backplane (Un-P1)
2. RAID card (Un-P1-C12 or Un-P1-C13 or Un-P1-C18)
1xxx 2623
Replace the following, if present, one at a time, and in the order listed:
1. System backplane (Un-P1)
2. RAID battery (Un-P1-C13-E1 or Un-P1-C18-E1)
3. RAID card (Un-P1-C12 or Un-P1-C13 or Un-P1-C18)
4. Disk drive backplane (Un-P2)
1xxx 2625
Replace the following, if present, one at a time, and in the order listed:
1. System backplane (Un-P1)
2. RAID battery (Un-P1-C13-E1 or Un-P1-C18-E1)
3. RAID card (Un-P1-C12 or Un-P1-C13 or Un-P1-C18)
4. HEA adapter (Un-P1-C3)
5. Disk drive backplane (Un-P2)
1xxx 2626
Replace the following, if present, one at a time, and in the order listed:
1. System backplane (Un-P1)
2. HEA adapter (Un-P1-C3)
1xxx 2691
Perform the following steps:
1. Remove one of the memory cards that has not already been removed in this
procedure and replace it with a new memory card.
2. Power on the system. See Powering on and powering off the system for
information on starting and stopping the system.
3. Is reference code 1xxx 2691 logged again?
Yes:
Continue with the next step.
No:
The memory card that was replaced was the failing item. This ends the
procedure.
4. Have all of the memory cards been replaced?
Yes:
Replace the system backplane at location Un-P1. This ends the
procedure.
No:
Reinstall the original memory card and repeat this procedure.
1xxx 8450
At least one processor VRM is missing. Inspect all of the processor cards and install
the VRMs that are missing. See System FRU locations for FRU location information for
the system that you are servicing.
1xxx 8453
There is an extra processor regulator in the system. Remove the extra regulator.
1xxx 8458
Low-power processor VRMs were installed on the system. The system requires high
power processor VRMs. Replace the processor VRMs with the correct part number.
Table 35. For 8233-E8B or 8236-E8C, use the following table to determine the action to perform.
Reference code
Action
1xxx 2622
Replace the following, if present, one at a time, and in the order listed:
1. Base RAID card (Un-P1-C11)
2. Auxiliary RAID card (Un-P1-C10)
3. System backplane (Un-P1)
1xxx 2623
Replace the following, if present, one at a time, and in the order listed:
1. Base RAID card (Un-P1-C11)
2. Auxiliary RAID card (Un-P1-C10)
1xxx 2624
Replace the following, if present, one at a time, and in the order listed:
1. Ethernet card (Un-P1-C6)
2. Base RAID card (Un-P1-C11)
3. Auxiliary RAID card (Un-P1-C10)
4. System backplane (Un-P1)
5. DASD drives
6. DASD and media backplane (Un-P2)
1xxx 2625
Replace the following, if present, one at a time, and in the order listed:
1. Ethernet card (Un-P1-C6)
2. Base RAID card (Un-P1-C11)
3. Auxiliary RAID card (Un-P1-C10)
4. System backplane (Un-P1)
5. TPMD card (Un-P1-C12)
1xxx 2626
Replace the Ethernet card (Un-P1-C6), if present.
1xxx 2630
Replace the POWER7® VRM on processor card 1 (Un-P1-C13-C1). See System FRU
locations for FRU location information for the system that you are servicing.
1xxx 2632
Replace the memory VRM on processor card 1 (Un-P1-C13-C10). See System FRU
locations for FRU location information for the system that you are servicing.
1xxx 2634
Replace processor card 1 (Un-P1-C13). If the problem is not resolved, replace the
system backplane (Un-P1).
1xxx 2640
Replace the POWER7 VRM on processor card 2 (Un-P1-C14-C1). See System FRU
locations for FRU location information for the system that you are servicing.
1xxx 2642
Replace the memory VRM on processor card 2 (Un-P1-C14-C10). See System FRU
locations for FRU location information for the system that you are servicing.
1xxx 2644
Replace processor card 2 (Un-P1-C14). If the problem is not resolved, replace the
system backplane (Un-P1).
1xxx 2650
Replace the POWER7 VRM on processor card 3 (Un-P1-C15-C1). See System FRU
locations for FRU location information for the system that you are servicing.
1xxx 2652
Replace the memory VRM on processor card 3 (Un-P1-C15-C10). See System FRU
locations for FRU location information for the system that you are servicing.
1xxx 2654
Replace processor card 3 (Un-P1-C15). If the problem is not resolved, replace the
system backplane (Un-P1).
1xxx 2660
Replace the POWER7 VRM on processor card 4 (Un-P1-C16-C1). See System FRU
locations for FRU location information for the system that you are servicing.
1xxx 2662
Replace the memory VRM on processor card 4 (Un-P1-C16-C10). See System FRU
locations for FRU location information for the system that you are servicing.
1xxx 2664
Replace processor card 4 (Un-P1-C16). If the problem is not resolved, replace the
system backplane (Un-P1).
1xxx 3120
Replace the following, if present:
1. The VRM specified by the location code.
2. The processor card specified by the location code.
1xxx 8450
At least one VRM is missing from the processor card. Inspect all of the processor
cards and install the VRMs that are missing. See System FRU locations for FRU
location information for the system that you are servicing.
1xxx 8453
There is an extra processor regulator in the system. Remove the extra regulator.
Table 36. For 8248-L4T, 8408-E8D, or 9109-RMD, use the following table to determine the action to perform.
Reference code
Action
1xxx 262C
Replace the disk drive backplane at location Un-P2-C9.
1xxx 8450
There are fewer voltage regulator modules than system processor modules. Add
another voltage regulator module in the next empty position.
1xxx 8453
There is an extra voltage regulator module in the system. Remove the extra voltage
regulator module.
Table 37. For 8412-EAD, 9117-MMB, 9117-MMC, 9117-MMD, 9179-MHB, 9179-MHC, or 9179-MHD, use the
following table to determine the action to perform.
Reference code
Action
1xxx 2620
Replace the backplane (Un-P1)
1xxx 262C
Replace the disk drive backplane (Un-P2-C9)
1xxx 2611, 2629, 262B, 262D, 262E, 262F
Replace the following:
v I/O backplane (Un-P2)
v Processor card (Un-P3)
1xxx 2630, 2632, 2635, 2636,
2640, 2642, 2645
Replace the VRM specified by the location code.
1xxx 2634
Replace the I/O backplane (Un-P2).
1xxx 2700, 2701
Replace the VRM specified by the location code.
1xxx 2702, 2703
Replace the disk drive backplane (Un-P2-C9).
1xxx 2704, 2705
Replace the I/O backplane (Un-P2).
1xxx 3120
Replace the following, if present:
v The VRM specified by the location code.
v The processor card specified by the location code, if included in the failing item list.
v The I/O backplane (Un-P2) if included in the failing item list.
1xxx 8450
There are fewer voltage regulator cards than processor cards. Add another regulator
card in the next empty position.
1xxx 8451
There are too few voltage regulator cards installed. Add another regulator card in the
next empty position.
1xxx 8453
There is an extra processor regulator in the system. Remove the extra regulator.
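The PWR1918 tables are keyed by the last four characters of the reference code, with the first character fixed at 1. The following Python sketch shows one way to look up the replacement order. It is illustrative only: the names are assumptions made for this example, only three rows of Table 34 are transcribed, and it assumes that each x in "1xxx" stands for one hexadecimal digit.

```python
import re
from typing import List

# Three rows transcribed from Table 34; the remaining rows are omitted here.
TABLE_34_FRUS = {
    "2611": ["System backplane (Un-P1)", "Disk drive backplane (Un-P2)"],
    "2622": ["System backplane (Un-P1)",
             "RAID card (Un-P1-C12 or Un-P1-C13 or Un-P1-C18)"],
    "2626": ["System backplane (Un-P1)", "HEA adapter (Un-P1-C3)"],
}

# Assumption: each x in "1xxx" is one hexadecimal digit.
_SRC_PATTERN = re.compile(r"1[0-9A-F]{3} ?([0-9A-F]{4})")


def fru_replacement_order(reference_code: str) -> List[str]:
    """Return the FRUs to replace, one at a time, in the order listed.

    Hypothetical helper; returns an empty list for codes not transcribed.
    """
    match = _SRC_PATTERN.fullmatch(reference_code.strip().upper())
    if match is None:
        raise ValueError(f"not a 1xxx reference code: {reference_code!r}")
    return list(TABLE_34_FRUS.get(match.group(1), []))
```

The list order matters because the tables direct you to replace the FRUs one at a time, in the order listed, skipping any FRU that is not present.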
PWR1920
Use this procedure to verify that the lights on the server control panel and the display panel on all
attached I/O expansion units are operating correctly.
For important safety information before continuing with this procedure, see “Power isolation procedures”
on page 124.
See System FRU locations for instructions for removing and replacing FRUs.
1. Activate the lamp test by performing one of the following:
v Select function 04 lamp test on the control panel and press Enter.
v Sign on to ASMI and click System Configuration -> Service Indicators -> Lamp Test.
2. Look at the server control panel and the display panels on all attached expansion units. The lamp test
is active only for 25 seconds after you press Enter. Check the following lights on the server control
panel and all expansion units:
v Power-on light.
v Attention light.
v All dots for the 32 character display.
Are all the lights on the control panel and the I/O display panels on?
v No: Go to step 4.
v Yes: These control panel lights are working correctly. Continue with the next step.
3. Are any abnormal characters or character patterns (not reference codes or normal display mode)
displayed?
v No: The lights are operating correctly.
This ends the procedure.
v Yes: Continue with the next step.
4. Verify that all cables are seated correctly. If the problem persists, replace the control panel. If the
problem still persists, use the following table to determine the possible causes for the lamp test
failure:
Table 38. Failing unit
Failing unit
FRU
8202-E4B, 8202-E4C, 8202-E4D, 8205-E6B, 8205-E6C,
8205-E6D, 8231-E2B, 8231-E1C, 8231-E1D, 8231-E2C,
8231-E2D, 8233-E8B, 8236-E8C, or 8268-E1D
System backplane
8248-L4T, 8408-E8D, 8412-EAD, 9109-RMD, 9117-MMB,
9117-MMC, 9117-MMD, 9179-MHB, 9179-MHC, or
9179-MHD
Replace the following one at a time:
1. Service processor, Un-P1-C1
2. Control panel cable
3. Midplane, Un-P1
5796 or 7314-G30
I/O backplane
PWR2402
Use this procedure when the server detects an error in the power system.
For important safety information before continuing with this procedure, see “Power isolation procedures”
on page 124.
See System FRU locations for instructions for removing and replacing FRUs.
Note: 9119-FHB systems can be powered by 200 - 480 V ac or 380 - 520 V dc. 9125-F2C systems can be
powered by 200 - 480 V ac or 370 - 575 V dc. When measuring voltage, determine the type of input
voltage and set the meter accordingly.
1. Is the SRC 1xxx8700 or 1xxx8701?
No:
Return to the Start of call procedure. This ends the procedure.
Yes:
Continue with the next step.
2. Measure voltages on the BPRs. If the SRC is 1xxx8700, you should measure the voltage on BPR-A
(front). If the SRC is 1xxx8701, you should measure the voltage on BPR-B (rear). The test points are on
the left side of BPR-1 and BPR-2. Using the labeled test points on the face of the BPR, measure the
voltages between the following:
v Phase A and phase B
v Phase B and phase C
v Phase C and phase A
Are all of the meter readings greater than 200 V ac or 380 V dc?
No:
Inform the customer that power voltage at the input to the BPR is either missing or too low
and needs to be corrected. This ends the procedure.
Yes:
Go to step 4.
3. Does the check confirm that the customer's voltage levels are missing or too low?
No:
Replace the power cord (see Cables for the proper part number). This ends the procedure.
Yes:
The customer must correct the voltage levels. This ends the procedure.
4. Exchange the following FRUs, one at a time, until the problem is resolved.
Note: See System FRU locations for information about FRU locations for the system that you are
servicing.
v Bulk power regulator (BPR) 1
v BPR 2
v BPR 3
v BPR 4
v Bulk power controller (BPC)
v Bulk power assembly (BPA)
This ends the procedure.
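Steps 1 and 2 of PWR2402 reduce to two checks: which BPR the SRC points to, and whether every measured voltage exceeds the minimum for the input type. The following Python sketch is illustrative only; the function names and parameters are assumptions made for this example, and the thresholds (200 V ac, 380 V dc) are taken from step 2.

```python
from typing import Optional, Sequence


def bpr_to_measure(src: str) -> Optional[str]:
    """Step 1 and step 2: pick the BPR side from the SRC.

    Returns None when the SRC is not 1xxx8700 or 1xxx8701, in which case
    the procedure returns to Start of call. Hypothetical helper.
    """
    code = src.replace(" ", "").upper()
    if len(code) != 8 or not code.startswith("1"):
        return None
    return {"8700": "BPR-A (front)", "8701": "BPR-B (rear)"}.get(code[4:])


def input_voltage_ok(readings: Sequence[float], dc_input: bool = False) -> bool:
    """Step 2: every reading must be greater than 200 V ac or 380 V dc."""
    if not readings:
        raise ValueError("at least one meter reading is required")
    minimum = 380.0 if dc_input else 200.0
    return all(volts > minimum for volts in readings)
```

For ac input, pass the three phase-to-phase readings (A to B, B to C, and C to A). A False result corresponds to the "No" branch of step 2.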
Router isolation procedures
These procedures serve as a guide to the correct isolation procedures from the reference code tables.
DANGER
When working on or around the system, observe the following precautions:
Electrical voltage and current from power, telephone, and communication cables are hazardous. To
avoid a shock hazard:
v Connect power to this unit only with the IBM provided power cord. Do not use the IBM
provided power cord for any other product.
v Do not open or service any power supply assembly.
v Do not connect or disconnect any cables or perform installation, maintenance, or reconfiguration
of this product during an electrical storm.
v The product might be equipped with multiple power cords. To remove all hazardous voltages,
disconnect all power cords.
v Connect all power cords to a properly wired and grounded electrical outlet. Ensure that the outlet
supplies proper voltage and phase rotation according to the system rating plate.
v Connect any equipment that will be attached to this product to properly wired outlets.
v When possible, use one hand only to connect or disconnect signal cables.
v Never turn on any equipment when there is evidence of fire, water, or structural damage.
v Disconnect the attached power cords, telecommunications systems, networks, and modems before
you open the device covers, unless instructed otherwise in the installation and configuration
procedures.
v Connect and disconnect cables as described in the following procedures when installing, moving,
or opening covers on this product or attached devices.
To Disconnect:
1. Turn off everything (unless instructed otherwise).
2. Remove the power cords from the outlets.
3. Remove the signal cables from the connectors.
4. Remove all cables from the devices.
To Connect:
1. Turn off everything (unless instructed otherwise).
2. Attach all cables to the devices.
3. Attach the signal cables to the connectors.
4. Attach the power cords to the outlets.
5. Turn on the devices.
(D005)
Perform these procedures only when directed to do so from another procedure.
RTRIP01
Gives a link to a topic that might assist you when exchanging the I/O processor (IOP) for the system or
partition console.
Perform “CONSL01” on page 20.
RTRIP02
Gives a link to a topic that might assist you when diagnosing errors detected by a workstation IOP.
Perform “TWSIP01” on page 314.
RTRIP03
Gives links to topics to assist you when diagnosing errors detected by a workstation IOP.
If you have a twinaxial terminal for the console, perform “TWSIP01” on page 314. Otherwise, perform
“WSAIP01” on page 321.
RTRIP04
Use the FRU list in the service action log if it is available. If it is not available, examine word 5 of the
reference code.
Is word 5 of the reference code zero (0000 0000)?
Yes: Perform SIIOADP.
No: Perform PIOCARD.
This ends the procedure.
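When the FRU list in the service action log is not available, the RTRIP04 decision depends only on whether word 5 of the reference code is zero. The following Python sketch is illustrative only; the function name is an assumption made for this example, and it assumes that word 5 is written as eight hexadecimal digits, with or without a space.

```python
def rtrip04_next_procedure(word5: str) -> str:
    """Return the symbolic FRU procedure to perform, based on word 5.

    Hypothetical helper; use it only when the SAL FRU list is unavailable.
    """
    digits = word5.replace(" ", "")
    if len(digits) != 8:
        raise ValueError("word 5 is expected to be 8 hexadecimal digits")
    # int(..., 16) raises ValueError if a character is not hexadecimal.
    return "SIIOADP" if int(digits, 16) == 0 else "PIOCARD"
```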
RTRIP05
Use this procedure when this reference code occurs for a RIO/HSL/12X loop resource.
Note: This reference code can occur for the RIO/HSL/12X loop resource when an I/O expansion unit on
the loop is powered off for a concurrent maintenance action.
Note: A fiber-optic cleaning kit may be required for optical RIO/HSL/12X connections.
1. Multiple B600 6982 errors may occur due to efforts to retry and recover. If the recovery efforts were
successful, there will be a B600 6985 reference code with xxxx 3206 in word 4 logged after all B600
6982 reference codes in the product activity log (PAL). If this is the case, close out all the B600 6982
entries. Then continue with the next step.
2. Is there a B600 6987 reference code in the service action log (SAL) logged at about the same time?
Yes: Close this problem and work the B600 6987.
This ends the procedure.
No: Continue with the next step.
3. Is there a B600 6981 reference code in the SAL logged at approximately the same time?
Yes: Go to step 8 on page 167.
No: Continue with the next step.
4. Perform “RIOIP06” on page 26 to determine if any other systems are connected to this loop and then
return here.
Note: The loop number can be found in the SAL in the description for the HSL_LNK FRU.
Are there other systems connected to this loop?
Yes: Continue with the next step.
No: Go to step 8 on page 167.
5. Check for HSL failures in the SALs on the other systems before replacing parts. HSL failures are
indicated by SAL entries with HSL I/O bridge and Network Interface Controller (NIC) resources.
Ignore B600 6982 and B600 6984 entries.
Are there HSL failures on other systems?
Yes: Continue with the next step.
No: Go to step 8 on page 167.
6. Repair the problems on the other systems and return to this step. After making repairs on the other
systems check the PAL of this system. Is there a B600 6985, along with this loop's resource name, that
was logged after the repairs you made on the other systems?
Yes: Continue with the next step.
No: Go to step 8.
7. For the B600 6985 reference code you found, use SIRSTAT to determine if the loop is now complete.
Is the loop complete?
Yes: The problem has been resolved.
This ends the procedure.
No: Go to step 8.
8. The FRU list displayed in the SAL may be different from the failing item list given here. Use the
SAL's FRU list when it is available.
Does this reference code appear in the SAL with the symbolic FRU HSL_LNK listed as a FRU?
Yes: Perform “RIOIP01” on page 21.
This ends the procedure.
No: Exchange the FRUs listed in the SAL according to their part action codes.
This ends the procedure.
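The check in step 1 of RTRIP05 (a B600 6985 reference code with xxxx 3206 in word 4 logged after all B600 6982 reference codes) can be expressed as a scan over the product activity log. The following Python sketch is illustrative only: the PalEntry type and the function name are assumptions made for this example, and the entries are assumed to be supplied oldest first.

```python
from typing import List, NamedTuple


class PalEntry(NamedTuple):
    """One product activity log entry (hypothetical structure)."""
    reference_code: str  # for example "B600 6982"
    word4: str           # for example "0000 3206"


def retries_recovered(pal_entries: List[PalEntry]) -> bool:
    """True if a B600 6985 with xxxx 3206 in word 4 follows all B600 6982 entries.

    pal_entries must be ordered oldest first.
    """
    last_6982 = -1
    for index, entry in enumerate(pal_entries):
        if entry.reference_code == "B600 6982":
            last_6982 = index
    for entry in pal_entries[last_6982 + 1:]:
        word4_suffix = entry.word4.replace(" ", "")[-4:]
        if entry.reference_code == "B600 6985" and word4_suffix == "3206":
            return True
    return False
```

A True result corresponds to the case in step 1 where all of the B600 6982 entries can be closed out before you continue with step 2.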
RTRIP06
Use this procedure when this reference code occurs in the service action log (SAL).
Note: A fiber-optic cleaning kit may be required for optical HSL connections.
1. Is the reference code in the service action log (SAL)?
v Yes: Continue with the next step.
v No: The reference code is informational. Use SIRSTAT to determine what the reference code means.
This ends the procedure.
2. This error can appear in the SAL if an expansion unit or another system in the loop did not complete
powering on before Licensed Internal Code (LIC) checked this loop for errors. Search the product
activity log (PAL) for all B600 6985 reference codes logged for this loop and use SIRSTAT to determine
if this error requires service.
Is further service required?
Yes: Continue with the next step.
No: This ends the procedure.
3. There may be multiple B600 6985 reference codes, with xxxx 3205 in word 4, for the same loop
resource in the SAL. This is caused by attempts to retry and recover. If there is a B600 6985 reference
code with xxxx 3206 or xxxx 3208 in word 4 after the above B600 6985 entries in the PAL, then the
recovery efforts were successful. If this is the case, close all the B600 6985 entries for that loop
resource in the SAL. Then continue with the next step.
4. Is there a B600 6981 reference code in the SAL?
Yes: Close that problem and go to step 9 on page 168.
No: Continue with the next step.
5. Perform “RIOIP06” on page 26 to determine if any other systems are connected to this loop and then
return here.
Note: The loop number can be found in the SAL in the description for the HSL_LNK FRU.
Are there other systems connected to this loop?
Yes: Continue with the next step.
No: Go to step 9 on page 168.
6. Check for HSL failures in the SALs on the other systems before replacing parts. HSL failures are
indicated by SAL entries with HSL I/O bridge and Network Interface Controller (NIC) resources.
Ignore B600 6982 and B600 6984 entries.
Are there HSL failures on other systems?
Yes: Continue with the next step.
No: Go to step 9.
7. Repair the problems on the other systems and return to this step. After making repairs on the other
systems check the PAL of this system. Is there a B600 6985 reference code that was logged after the
repairs you made on the other systems?
Yes: Continue with the next step.
No: Go to step 9.
8. For the B600 6985 log you found, use SIRSTAT to determine if the loop is now complete.
Is the loop complete?
v Yes: The problem has been resolved.
This ends the procedure.
v No: Go to step 9.
9. The FRU list displayed in the SAL may be different from the failing item list given here. Use the
SAL's FRU list when it is available.
Does this reference code appear in the SAL with the symbolic FRU HSL_LNK listed as a FRU?
v Yes: Perform “RIOIP01” on page 21.
This ends the procedure.
v No: Exchange the FRUs listed in the SAL according to their part action codes.
This ends the procedure.
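Steps 4 through 9 of RTRIP06 form a chain of yes/no questions in which every "no" answer (and a "yes" at step 4) leads to step 9. The following Python sketch summarizes that chain. It is illustrative only: the function name, the parameter names, and the outcome strings are assumptions made for this example.

```python
def rtrip06_outcome(
    *,
    has_6981_in_sal: bool,
    other_systems_on_loop: bool,
    hsl_failures_on_other_systems: bool,
    new_6985_after_repairs: bool,
    loop_complete: bool,
    hsl_lnk_in_sal_fru_list: bool,
) -> str:
    """Return the ending action for steps 4 through 9 (hypothetical helper)."""
    # Step 8 is reached only if steps 4 through 7 all continue to the next step.
    reached_step_8 = (
        not has_6981_in_sal                  # step 4: No
        and other_systems_on_loop            # step 5: Yes
        and hsl_failures_on_other_systems    # step 6: Yes
        and new_6985_after_repairs           # step 7: Yes
    )
    if reached_step_8 and loop_complete:
        return "resolved"
    # Step 9: choose between RIOIP01 and the FRUs listed in the SAL.
    if hsl_lnk_in_sal_fru_list:
        return "perform RIOIP01"
    return "exchange FRUs listed in SAL"
```

Keyword-only parameters are used so that each answer is named at the call site, which keeps the mapping to the numbered steps visible.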
RTRIP07
Gives a link to assist you when diagnosing a keyboard error.
Perform “WSAIP01” on page 321.
RTRIP08
Gives a link to assist when the Licensed Internal Code detected an IOP programming problem.
Perform a system IPL. Is the IPL successful?
Yes: Perform “LICIP01” on page 89 to determine the cause of the problem. This ends the procedure.
No: Perform the action described in the new reference code. This ends the procedure.
Serial-attached SCSI isolation procedures
Use serial-attached SCSI (SAS) isolation procedures when a management console is not attached to the
server. If the server is connected to a management console, use the procedures that are available on the
management console to continue FRU isolation.
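As a quick cross-reference, the SRC endings named in the descriptions of procedures SIP3110 through SIP3131 in this section can be collected into a lookup table. The following Python sketch is illustrative only: the names SRC_TO_PROCEDURE and procedure_for_src are hypothetical, the table covers only those procedures, and the reference code tables for the adapter remain the authority on which procedure to perform.

```python
# SRC ending (last four characters) -> isolation procedure, taken from the
# descriptions of procedures SIP3110 through SIP3131 in this section.
SRC_TO_PROCEDURE = {
    "9008": "SIP3130",
    "9010": "SIP3120",
    "9020": "SIP3111",
    "9021": "SIP3111",
    "9022": "SIP3111",
    "9023": "SIP3112",
    "9025": "SIP3110",
    "9027": "SIP3113",
    "9030": "SIP3110",
    "9032": "SIP3110",
    "9050": "SIP3131",
    "9054": "SIP3121",
}


def procedure_for_src(src):
    """Return the procedure name for an 8-character SRC, or None.

    Hypothetical helper: only the last four characters are compared,
    mirroring the xxxx90NN notation used in the procedure descriptions.
    """
    text = src.replace(" ", "").upper()
    if len(text) != 8:
        raise ValueError("expected an 8-character SRC, got %r" % src)
    return SRC_TO_PROCEDURE.get(text[-4:])
```

A result of None means that the SRC is not covered by this sketch, not that no procedure exists for it.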
DANGER
When working on or around the system, observe the following precautions:
Electrical voltage and current from power, telephone, and communication cables are hazardous. To
avoid a shock hazard:
v Connect power to this unit only with the IBM provided power cord. Do not use the IBM
provided power cord for any other product.
v Do not open or service any power supply assembly.
v Do not connect or disconnect any cables or perform installation, maintenance, or reconfiguration
of this product during an electrical storm.
v The product might be equipped with multiple power cords. To remove all hazardous voltages,
disconnect all power cords.
v Connect all power cords to a properly wired and grounded electrical outlet. Ensure that the outlet
supplies proper voltage and phase rotation according to the system rating plate.
v Connect any equipment that will be attached to this product to properly wired outlets.
v When possible, use one hand only to connect or disconnect signal cables.
v Never turn on any equipment when there is evidence of fire, water, or structural damage.
v Disconnect the attached power cords, telecommunications systems, networks, and modems before
you open the device covers, unless instructed otherwise in the installation and configuration
procedures.
v Connect and disconnect cables as described in the following procedures when installing, moving,
or opening covers on this product or attached devices.
To Disconnect:
1. Turn off everything (unless instructed otherwise).
2. Remove the power cords from the outlets.
3. Remove the signal cables from the connectors.
4. Remove all cables from the devices.
To Connect:
1. Turn off everything (unless instructed otherwise).
2. Attach all cables to the devices.
3. Attach the signal cables to the connectors.
4. Attach the power cords to the outlets.
5. Turn on the devices.
(D005)
SIP3110
This procedure resolves problems when disk units are incompatible or a disk unit is missing or failed.
The following SRCs are possible:
v SRC xxxx9025. Indicates that an incompatible disk unit is installed at the disk unit location that caused
the array to be exposed.
v SRC xxxx9030. Indicates that a disk array is exposed due to a missing or failed disk unit.
v SRC xxxx9032. Indicates that a disk unit in a disk array is missing or failed, but the array is still
protected.
If you received SRC xxxx9030 or xxxx9032, one of the following occurred:
v A disk unit has failed and the RAID array protection is exposed or will become exposed if another disk
unit fails because no hot spare disk unit was available to replace it. If the array is exposed, then the
array will continue to be exposed until the disk unit has been replaced and a manual rebuilding of the
array has been started.
v A disk unit has failed in a RAID array, but a hot spare was used to automatically start rebuilding the
array. Replace, format, and configure the failed disk unit as a hot spare.
Note: If the previous hot spare disk unit was a larger capacity than the failed disk unit, ensure that the
customer understands that the replacement disk unit might not provide adequate hot spare coverage
for all of the arrays under this adapter.
1. Is the device location information for this SRC available in the service action log (see Searching the
service action log for details)?
No:
Continue with the next step.
Yes:
Exchange the disk unit. This ends the procedure.
2. Identify the affected adapter and disk units by examining the product activity log. Perform one of the
following steps to access system service tools (SST) or dedicated service tools (DST):
v If you can enter a command at the console, access system service tools (SST). See System service
tools.
v If you cannot enter a command at the console, perform an IPL to DST. See Performing an IPL to
dedicated service tools.
3. Perform the following steps:
a. Access the product activity log and display the SRC that sent you here.
b. Press the F9 key for address information. This is the adapter address.
c. Continue with the next step.
4. Perform the following steps:
a. Return to the SST or DST main menu.
b. Select Work with disk units > Display disk configuration > Display disk configuration status.
c. On the Display disk configuration status display, look for the devices attached to the adapter that
is identified in step 3.
Is there a device that has a status of RAID 5/Unknown, RAID 6/Unknown, RAID 5/Failed, or RAID
6/Failed?
No:
Continue with step 7.
Yes:
Continue with the next step.
5. Find the device that has a status of RAID 5/Unknown, RAID 6/Unknown, RAID 5/Failed, or RAID
6/Failed. This is the device that is causing the problem. Show the device address by selecting Display
Disk Unit Details > Display Detailed Address. Record the device address. See System FRU locations
and find the diagram of the system unit or of the expansion unit and find the following items:
v The slot that is identified by the direct select address of the adapter
v The disk unit location that is identified by the device address
6. Have you determined the location of the adapter and disk unit that is causing the problem?
No:
Ask your next level of support for assistance. This ends the procedure.
Yes:
Exchange the disk unit that is causing the problem. This ends the procedure.
7. Press the function key to cancel and to return to the Display Disk Configuration menu, then do the
following:
a. Select Display disk hardware status.
b. Find a device that is either Not operational or Read/write protected.
c. Display details for the device and get the location of the failed disk unit.
d. Exchange the disk unit and configure it as a hot spare. This ends the procedure.
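Steps 4 through 7 of SIP3110 amount to a two-stage search: first look for an array member with a failed or unknown status, and only if none is found, look at the disk hardware status. The following Python sketch shows that order. The function name and the dictionaries that map a device address to its displayed status are hypothetical; the status strings are the ones named in the steps.

```python
ARRAY_PROBLEM_STATUSES = {
    "RAID 5/Unknown", "RAID 6/Unknown", "RAID 5/Failed", "RAID 6/Failed",
}
HARDWARE_PROBLEM_STATUSES = {"Not operational", "Read/write protected"}


def find_disk_to_exchange(config_status, hardware_status):
    """Pick the disk unit to exchange, following steps 4, 5, and 7.

    Both arguments are hypothetical dicts that map a device address to the
    status text read from the corresponding display.
    Returns (device_address, configure_as_hot_spare), or None if no
    device matches either stage.
    """
    # Steps 4 and 5: the disk configuration status display.
    for address, status in config_status.items():
        if status in ARRAY_PROBLEM_STATUSES:
            return address, False
    # Step 7: the disk hardware status display. The replacement disk
    # unit is configured as a hot spare in this branch.
    for address, status in hardware_status.items():
        if status in HARDWARE_PROBLEM_STATUSES:
            return address, True
    return None
```

If the function returns None, or the physical location cannot be determined from the address, the procedure directs you to your next level of support.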
SIP3111
This procedure resolves the problem when two or more disk units are missing from a RAID 5 or RAID 6
disk array.
The following SRCs are possible:
v xxxx9020
v xxxx9021
v xxxx9022
1. Identify the affected adapter and disk units by examining the Product Activity Log. Perform the
following steps:
a. Access SST or DST.
v If you can enter a command at the console, access system service tools (SST). See System service
tools.
v If you cannot enter a command at the console, perform an IPL to DST. See Performing an IPL to
dedicated service tools.
v If you cannot perform a type A or B IPL, perform a type D IPL from removable media.
b. Access the Product Activity Log and record address information.
If a type D IPL was not performed to get to SST or DST:
The log information is formatted. Access the Product Activity Log and display the SRC
that sent you here. Press the F9 key for address information. This is the adapter address.
Then, press F12 to cancel and return to the previous screen. Then press the F4 key to view
the "Additional Information" to record the formatted log information. Record all devices
that are missing from the disk array. These are the array members that have both a current
address of 0 and an expected address that is not 0.
If a type D IPL was performed to get to DST:
The log information is not formatted. Access the Product Activity Log and display the SRC
that sent you here. The direct select address (DSA) of the adapter is in the format
BBBB-Cc-bb:
BBBB Hexadecimal offsets 4C and 4D
Cc Hexadecimal offset 51
bb Hexadecimal offset 4F
In order to interpret the hexadecimal information to get device addresses, see More
information from hexadecimal reports. Record all devices that are missing from the disk
array. These are the array members that have both a current address of 0 and an expected
address that is not 0.
c. Determine the location of the adapter and the devices that are causing the problem. See System
FRU locations and find the diagram of the system unit, or the expansion unit. Then find the
following items:
v The card slot that is identified by the direct select address (DSA)
v The disk unit locations that are identified by the unit addresses
2. Perform one of the following options (listed in order of preference):
Option 1
Power off the system or partition and install the identified disk units in the correct physical
locations (that is, the expected addresses) in the system. This ends the procedure.
Option 2
Stop the disk array that contains the missing devices.
Attention:
Customer data might be lost.
Perform the following:
a. If you are not already using dedicated service tools, perform an IPL to DST. See
Performing an IPL to dedicated service tools. If you cannot perform a type A or B IPL,
perform a type D IPL from removable media.
b. Select Work with disk units. Did you get to DST with a type D IPL?
No:
Select Work with disk configuration > Work with device parity protection. Then,
continue with substep 2c.
Yes:
Continue with substep 2c.
c. Select Stop device parity protection.
d. Follow the online instructions to stop device parity protection.
e. Perform an IPL from disk.
Does the IPL complete successfully?
No:
Go to Start of call. This ends the procedure.
Yes:
This ends the procedure.
Option 3
If the data on the disk units is not needed, initialize and format the remaining members of the
disk array by performing the following steps:
Attention:
Data on the disk unit will be lost.
If a type D IPL was not performed to get to SST or DST:
a. Access SST or DST.
b. Select Work with disk units > Work with disk unit recovery > Disk unit problem
recovery procedures.
c. Select Initialize and format disk unit for each disk unit. When the new disk unit is
initialized and formatted, the display shows that the status is complete. This might take 30
minutes or much longer depending on the capacity of the disk unit. The disk unit is now
ready to be added to the system configuration. This ends the procedure.
If a type D IPL was performed to get to DST:
a. Access DST.
b. Select Work with disk units.
c. Select Initialize and format disk unit for each disk unit. When the new disk unit is
initialized and formatted, the display shows that the status is complete. This might take 30
minutes or much longer depending on the capacity of the disk unit. The disk unit is now
ready to be added to the system configuration. This ends the procedure.
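Step 1b of SIP3111 describes two small computations: assembling the direct select address (DSA) from an unformatted log entry, and picking out the array members that are missing. The following Python sketch shows both. It assumes that each hexadecimal offset addresses one byte of the entry and that each byte is shown as two hexadecimal digits; the function names and the pair representation of a member are hypothetical. For real entries, see More information from hexadecimal reports.

```python
def dsa_from_raw_log(raw):
    """Build the BBBB-Cc-bb string from unformatted log bytes.

    Assumption: each hexadecimal offset named in the procedure addresses
    one byte of the entry, and each byte is shown as two hex digits.
    """
    if len(raw) <= 0x51:
        raise ValueError("log entry is too short to contain offset 0x51")
    bbbb = "%02X%02X" % (raw[0x4C], raw[0x4D])  # offsets 4C and 4D
    cc = "%02X" % raw[0x51]                     # offset 51
    bb = "%02X" % raw[0x4F]                     # offset 4F
    return "%s-%s-%s" % (bbbb, cc, bb)


def missing_members(members):
    """Return the array members that are missing from the disk array.

    `members` is a hypothetical list of (current_address,
    expected_address) pairs of integers. A member is missing when its
    current address is 0 and its expected address is not 0.
    """
    return [m for m in members if m[0] == 0 and m[1] != 0]
```

The expected address of each missing member is the location where Option 1 installs the disk unit.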
SIP3112
This procedure resolves the problem when one or more disk array members are not at the required
physical locations.
The possible SRC is xxxx9023.
1. Identify the affected adapter and disk units that are not at their required locations by examining the
Product Activity Log. Perform the following steps:
a. Access SST or DST.
v If you can enter a command at the console, access system service tools (SST). See System service
tools.
v If you cannot enter a command at the console, perform an IPL to DST. See Performing an IPL to
dedicated service tools.
v If you cannot perform a type A or B IPL, perform a type D IPL from removable media.
b. Access the product activity log and record address information.
If a type D IPL was not performed to get to SST or DST:
The log information is formatted. Access the Product Activity Log and display the SRC
that sent you here. Press the F9 key for address information. This is the adapter address.
Then, press F12 to cancel and return to the previous screen. Then press the F4 key to view
the additional information to record the formatted log information. Record all devices that
are not at their required locations. These are the array members that have a current
address and an expected address that do not match. A current address of 0 is acceptable,
and no action is needed to correct it for a known failed drive in the array.
If a type D IPL was performed to get to DST:
The log information is not formatted. Access the Product Activity Log and display the SRC
that sent you here. The direct select address (DSA) of the adapter is in the format
BBBB-Cc-bb:
BBBB Hexadecimal offsets 4C and 4D
Cc Hexadecimal offset 51
bb Hexadecimal offset 4F
In order to interpret the hexadecimal information to get device addresses, see More
information from hexadecimal reports. Record all devices that are not at their required
locations. These are the array members that have a current address and an expected
address that do not match. A current address of 0 is acceptable, and no action is needed to
correct it for a known failed drive in the array.
c. Determine the location of the adapter and the devices that are causing the problem. See System
FRU locations and find the diagram of the system unit or the expansion unit. Then find the
following items:
v The card slot that is identified by the direct select address (DSA)
v The disk unit locations that are identified by the unit addresses
2. Perform only one of the following options (listed in order of preference):
Option 1
Power off the system or partition and install the identified disk units in the correct physical
locations (that is, the expected addresses) in the system. This ends the procedure.
Option 2
Stop the disk array that contains the missing devices.
Attention:
Customer data might be lost.
Perform the following:
a. If you are not already using dedicated service tools, perform an IPL to DST. See
Performing an IPL to dedicated service tools. If you cannot perform a type A or B IPL,
perform a type D IPL from removable media.
b. Select Work with disk units. Did you get to DST with a type D IPL?
No:
Select Work with disk configuration > Work with device parity protection. Then,
continue with substep 2c.
Yes:
Continue with substep 2c.
c. Select Stop device parity protection.
d. Follow the online instructions to stop device parity protection.
e. Perform an IPL from disk.
Does the IPL complete successfully?
No:
Go to Start of call. This ends the procedure.
Yes:
This ends the procedure.
Option 3
If the data on the disk units is not needed, initialize and format the remaining members of the
disk array by performing the following steps:
Attention:
Data on the disk unit will be lost.
If a type D IPL was not performed to get to SST or DST:
a. Access SST or DST.
b. Select Work with disk units > Work with disk unit recovery > Disk unit problem
recovery procedures.
c. Select Initialize and format disk unit for each disk unit. When the new disk unit is
initialized and formatted, the display shows that the status is complete. This might take 30
minutes or much longer depending on the capacity of the disk unit. The disk unit is now
ready to be added to the system configuration. This ends the procedure.
If a type D IPL was performed to get to DST:
a. Access DST.
b. Select Work with disk units.
c. Select Initialize and format disk unit for each disk unit. When the new disk unit is
initialized and formatted, the display shows that the status is complete. This might take 30
minutes or much longer depending on the capacity of the disk unit. The disk unit is now
ready to be added to the system configuration. This ends the procedure.
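The rule in step 1b of SIP3112 for deciding which array members are not at their required locations can be written as a short filter. In the following Python sketch, the function name, the member labels, and the integer addresses are hypothetical; the only rules taken from the procedure are that a member is misplaced when its current and expected addresses do not match, and that a current address of 0 is acceptable for a known failed drive.

```python
def misplaced_members(members, known_failed=()):
    """Return members whose current and expected addresses differ.

    `members` maps a hypothetical member label to a
    (current_address, expected_address) pair. A current address of 0 is
    skipped when the label is in `known_failed`, because the procedure
    treats that case as acceptable. The result maps each misplaced
    member to its expected address.
    """
    result = {}
    for label, (current, expected) in members.items():
        if current == expected:
            continue  # already at the required location
        if current == 0 and label in known_failed:
            continue  # known failed drive, no action needed
        result[label] = expected
    return result
```

The expected addresses in the result are the physical locations where Option 1 places the disk units.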
SIP3113
This procedure resolves problems when a disk array is or would become exposed and parity data is out
of synchronization.
The possible SRC is xxxx9027.
1. Identify the affected adapter and disk units by examining the Product Activity Log. Perform the
following steps:
a. Access SST or DST.
v If you can enter a command at the console, access system service tools (SST). See System service
tools.
v If you cannot enter a command at the console, perform an IPL to DST. See Performing an IPL to
dedicated service tools.
v If you cannot perform a type A or B IPL, perform a type D IPL from removable media.
b. Access the Product Activity Log and record address information.
If a type D IPL was not performed to get to SST or DST:
The log information is formatted. Access the Product Activity Log and display the SRC
that sent you here. Press the F9 key for address information. This is the adapter address.
Then, press F12 to cancel and return to the previous screen. Then press the F4 key to view
the additional information to record the formatted log information. Record all devices that
are missing from the disk array. These are the array members that have both a current
address of 0 and an expected address that is not 0.
If a type D IPL was performed to get to DST:
The log information is not formatted. Access the product activity log and display the SRC
that sent you here. The direct select address (DSA) of the adapter is in the format
BBBB-Cc-bb:
BBBB Hexadecimal offsets 4C and 4D
Cc Hexadecimal offset 51
bb Hexadecimal offset 4F
In order to interpret the hexadecimal information to get device addresses, see More
information from hexadecimal reports. Record all devices that are missing from the disk
array. These are the array members that have both a current address of 0 and an expected
address that is not 0.
c. Determine the location of the adapter and the devices that are causing the problem. See System
FRU locations and find the diagram of the system unit or the expansion unit. Then find the
following:
v The card slot that is identified by the direct select address (DSA)
v The disk unit locations that are identified by the unit addresses
2. Have the adapter or disk units been physically moved recently?
No: Contact your hardware service provider. This ends the procedure.
Yes: Continue with the next step.
3. Perform one of the following options (listed in order of preference):
Option 1
Power off the system or partition and restore the I/O adapter and disk units back to their
original configuration. This ends the procedure.
Option 2
Stop the disk array that contains the missing devices.
Attention:
Customer data might be lost.
Perform the following:
a. If you are not already using dedicated service tools, perform an IPL to DST. See
Performing an IPL to dedicated service tools. If you cannot perform a type A or B IPL,
perform a type D IPL from removable media.
b. Select Work with disk units. Did you get to DST with a type D IPL?
No:
Select Work with disk configuration > Work with device parity protection. Then,
continue with substep 3c.
Yes:
Continue with substep 3c.
c. Select Stop device parity protection.
d. Follow the online instructions to stop device parity protection.
e. Perform an IPL from disk.
Does the IPL complete successfully?
No:
Go to Start of call. This ends the procedure.
Yes:
This ends the procedure.
Option 3
If the data on the disk units is not needed, initialize and format the disk units by performing
the following steps:
Attention:
Data on the disk unit will be lost.
If a type D IPL was not performed to get to SST or DST:
a. Access SST or DST.
b. Select Work with disk units > Work with disk unit recovery > Disk unit problem
recovery procedures.
c. Select Initialize and format disk unit for each disk unit. When the new disk unit is
initialized and formatted, the display shows that the status is complete. This might take 30
minutes or much longer depending on the capacity of the disk unit. The disk unit is now
ready to be added to the system configuration. This ends the procedure.
If a type D IPL was performed to get to DST:
a. Access DST.
b. Select Work with disk units.
c. Select Initialize and format disk unit for each disk unit. When the new disk unit is
initialized and formatted, the display shows that the status is complete. This might take 30
minutes or much longer depending on the capacity of the disk unit. The disk unit is now
ready to be added to the system configuration. This ends the procedure.
SIP3120
This procedure resolves the problem when cache data associated with attached disk units cannot be
found.
If you need additional information for failing part numbers, location codes, or removal and replacement
procedures, see Part locations and location codes. Select your machine type and model number to find
additional location codes, part numbers, or replacement procedures for your system.
The possible SRC is xxxx9010.
1. Is the adapter connected in a dual storage adapter configuration (that is, two adapters connected to
the same set of disk units)?
No
Go to step 2.
Yes
Contact your hardware service provider.
2. Has the server been powered off for several days?
No
Go to step 3.
Yes
Go to step 8 on page 177.
3. Are you working with a 572F/575C card set?
No
Go to step 5.
Yes
Go to step 4.
4. Using the appropriate service procedures, remove the adapter card set. Install the new replacement
storage adapter with the following parts installed on it:
Note: Label all parts (original and new) before moving them.
v The new replacement 572F storage adapter with the cache directory card from the original storage
adapter. See Separating the 572F/575C card set and moving the cache directory card.
v The original 575C auxiliary cache adapter.
Go to step 6 on page 177.
5. Using the appropriate service procedures, remove the adapter. Install the new replacement storage
adapter with the following parts installed on it:
Note: Label all parts (original and new) before moving them.
v The cache directory card from the original storage adapter. Refer to Replacing the cache directory
card.
v The removable cache card from the original storage adapter. This applies only to certain adapters
that have a removable cache card.
Go to step 6 on page 177.
6. Has a new SRC of xxxx9010 or xxxx9050 occurred?
No
Go to step 9.
Yes
Go to step 7.
7. Was the new SRC xxxx9050?
No
The new SRC was xxxx9010. Reclaim adapter cache storage. See Reclaiming IOP cache
storage.
Attention: Data might be lost. When an auxiliary cache adapter connected to the RAID
adapter logs a xxxx9055 SRC in the hardware error log, the reclaim process does not result in
lost sectors. Otherwise, the reclaim process does result in lost sectors.
Note: On the Reclaim Controller Cache Storage results screen, the number of lost sectors is
displayed. If the number is 0, there is no data loss. If the number is not 0, data has been lost
and the system operator might want to restore data after this procedure is completed.
Go to step 9.
Yes:
Contact your hardware service provider. This ends the procedure.
8. If the server has been powered off for several days after an abnormal power-down, the cache battery
pack might be depleted. Do not replace the adapter or the cache battery pack. Reclaim adapter cache
storage. See Reclaiming IOP cache storage.
Attention: Data might be lost. When an auxiliary cache adapter connected to the RAID adapter
logs a xxxx9055 SRC in the hardware error log, the reclaim process does not result in lost sectors.
Otherwise, the reclaim process does result in lost sectors.
Note: On the Reclaim Controller Cache Storage results screen, the number of lost sectors is
displayed. If the number is 0, there is no data loss. If the number is not 0, data has been lost and the
system operator might want to restore data after this procedure is completed.
This ends the procedure.
9. Are you working with a 572F/575C card set?
No
Go to step 11.
Yes
Go to step 10.
10. Using the appropriate service procedures, remove the adapter card set. Install the new replacement
storage adapter with the following parts installed on it:
v The new replacement 572F storage adapter with the cache directory card from the new
replacement storage adapter. See Separating the 572F/575C card set and moving the cache
directory card.
v The new 575C auxiliary cache adapter.
This ends the procedure.
11. Using the appropriate service procedures, remove the adapter. Install the new replacement storage
adapter with the following parts installed on it:
v The cache directory card from the new storage adapter. See Replacing the cache directory card.
v The removable cache card from the new storage adapter. This applies only to certain adapters
that have a removable cache card.
This ends the procedure.
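Because SIP3120 branches several times, it can help to see the whole decision path in one place. The following Python sketch encodes the Yes and No answers of steps 1 through 11 and returns the actions in order. The function names, parameter names, and action strings are hypothetical summaries; the numbered steps above remain the authoritative wording.

```python
def sip3120_actions(dual_adapter, powered_off_days, card_set_572f_575c,
                    new_src=None):
    """Return the ordered actions that SIP3120 calls for.

    new_src is the SRC ending seen after the first adapter exchange:
    "9010", "9050", or None if neither occurred.
    """
    if dual_adapter:                                   # step 1
        return ["contact hardware service provider"]
    if powered_off_days:                               # steps 2 and 8
        return ["reclaim adapter cache storage"]
    actions = []
    if card_set_572f_575c:                             # steps 3 and 4
        actions.append("install new 572F with the original cache "
                       "directory card and the original 575C")
    else:                                              # steps 3 and 5
        actions.append("install new adapter with the original cache "
                       "directory card and original removable cache card")
    if new_src == "9050":                              # steps 6 and 7
        actions.append("contact hardware service provider")
        return actions
    if new_src == "9010":                              # steps 6 and 7
        actions.append("reclaim adapter cache storage")
    if card_set_572f_575c:                             # steps 9 and 10
        actions.append("install new 572F with its own cache directory "
                       "card and the new 575C")
    else:                                              # steps 9 and 11
        actions.append("install new adapter with its own cache directory "
                       "card and its own removable cache card")
    return actions


def data_was_lost(lost_sectors):
    """Interpret the lost-sector count from the reclaim results screen."""
    return lost_sectors != 0
```

The removable cache card entries apply only to adapters that have a removable cache card. For example, sip3120_actions(False, False, True, "9010") lists three actions: the first exchange with the original parts, the cache reclaim, and the final exchange with the new parts.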
SIP3121
Use this procedure to resolve the following problem: RAID adapter resources not available due to
previous problems (SRC xxxx9054).
Power off the system and remove all new or replacement disk units. IPL the system to DST. If you cannot
perform a type A or B IPL, perform a type D IPL from removable media.
Look for Product Activity Log entries for other reference codes and take action on them. This ends the
procedure.
SIP3130
Use this procedure to resolve the following problem: Adapter does not support function expected by one
or more disk units (SRC xxxx9008).
1. Identify the affected adapter and disk units by examining the Product Activity Log. Perform the
following steps:
a. Access SST or DST.
v If you can enter a command at the console, access system service tools (SST). See System service
tools.
v If you cannot enter a command at the console, perform an IPL to DST. See Performing an IPL to
dedicated service tools.
v If you cannot perform a type A or B IPL, perform a type D IPL from removable media.
b. Access the product activity log and record address information.
If a type D IPL was not performed to get to SST or DST:
The log information is formatted. Access the Product Activity Log and display the SRC
that sent you here. Press the F9 key for address information. This is the adapter address.
Then, press F12 to cancel and return to the previous screen. Then press the F4 key to view
the "Additional Information" to record the formatted log information. The Device Errors
Detected field indicates the total number of disk units that are affected. The Device Errors
Logged field indicates the number of disk units for which detailed information is
provided. Under the Device heading, the unit address, type, serial number, and worldwide
ID are provided for up to three disk units. Additionally, the adapter type, serial number,
and worldwide ID for each of these disk units indicates the adapter to which the disk was
last attached when it was operational.
If a type D IPL was performed to get to DST:
The log information is not formatted. Access the Product Activity Log and display the SRC
that sent you here. The direct select address (DSA) of the adapter is in the format
BBBB-Cc-bb:
BBBB Hexadecimal offsets 4C and 4D
Cc
Hexadecimal offset 51
bb
Hexadecimal offset 4F
In order to interpret the hexadecimal information to get device addresses, see More
information from hexadecimal reports. The Device Errors Detected field indicates the total
number of disk units that are affected. The Device Errors Logged field indicates the
number of disk units for which detailed information is provided. Under the Device
heading, the unit address, type, serial number, and worldwide ID are provided for up to
three disk units. Additionally, the adapter type, serial number, and worldwide ID for each
of these disk units indicates the adapter to which the disk was last attached when it was
operational.
c. Determine the location of the adapter and the devices that are causing the problem. See System
FRU locations and find the diagram of the system unit or the expansion unit. Then find the
following items:
v The card slot that is identified by the direct select address (DSA)
v The disk unit locations that are identified by the unit addresses
Have you determined the location of the adapter and the devices that are causing the problem?
No: Ask your next level of support for assistance. This ends the procedure.
Yes: Continue with the next step.
2. Have the adapter or disk units been physically moved recently, or were the disk units previously used
by the AIX or Linux operating system?
No: Ask your next level of support for assistance. This ends the procedure.
Yes: Continue with the next step.
3. Perform one of the following options (listed in order of preference):
Option 1
Power off the system or partition and restore the adapter and disk units back to their original
configuration. This ends the procedure.
Option 2
If the data on the disk units is not needed, initialize and format the disk units by performing
the following steps:
Attention:
Data on the disk unit will be lost.
If a type D IPL was not performed to get to SST or DST:
a. Access SST or DST.
b. Select Work with disk units > Work with disk unit recovery > Disk unit
problem recovery procedures.
c. Select Initialize and format disk unit for each disk unit. When the new disk unit
is initialized and formatted, the display shows that the status is complete. This
might take 30 minutes or much longer depending on the capacity of the disk unit.
The disk unit is now ready to be added to the system configuration. This ends the
procedure.
If a type D IPL was performed to get to DST:
a. Access DST.
b. Select Work with disk units.
c. Select Initialize and format disk unit for each disk unit. When the new disk unit
is initialized and formatted, the display shows that the status is complete. This
might take 30 minutes or much longer depending on the capacity of the disk unit.
The disk unit is now ready to be added to the system configuration. This ends the
procedure.
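In step 1b of SIP3130, the Device Errors Detected and Device Errors Logged fields can differ, because detailed information is provided for up to three disk units. The following Python sketch shows the arithmetic for working out how many affected disk units have no detailed entry and must be identified by other means. The function and constant names are hypothetical.

```python
MAX_DETAILED_UNITS = 3  # details are provided for up to three disk units


def units_without_details(errors_detected, errors_logged):
    """Return how many affected disk units have no detailed log entry.

    errors_detected: value of the Device Errors Detected field.
    errors_logged:   value of the Device Errors Logged field.
    """
    if errors_logged < 0 or errors_detected < 0:
        raise ValueError("field values cannot be negative")
    if errors_logged > MAX_DETAILED_UNITS:
        raise ValueError("details are provided for at most three disk units")
    if errors_logged > errors_detected:
        raise ValueError("logged count cannot exceed detected count")
    return errors_detected - errors_logged
```

A result greater than 0 means that the unit addresses shown under the Device heading do not cover every affected disk unit.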
SIP3131
Use this procedure to resolve the following problem: Required cache data cannot be located for one or
more disk units (SRC xxxx9050).
1. Did you just exchange the adapter as the result of a failure?
No:
Go to step 6 on page 180.
Yes:
Go to step 2.
2. Is the adapter connected in a dual storage adapter configuration (that is, two adapters connected to
the same set of disk units)?
No:
Go to step 3.
Yes:
Contact your hardware service provider.
3. Are you working with a 572F/575C card set?
No:
Go to step 5 on page 180.
Yes:
Go to step 4 on page 180.
4. Using the appropriate service procedures, remove the adapter card set. Install the new replacement
storage adapter with the following parts installed on it:
Note: Label all parts (original and new) before moving them.
v The new replacement 572F storage adapter with the cache directory card from the original storage
adapter. See Separating the 572F/575C card set and moving the cache directory card.
v The original 575C auxiliary cache adapter.
Go to step 11 on page 182.
5.
Attention:
a. The failed adapter that you have just exchanged contains cache data that is required by the disk
units that were attached to that adapter. If the adapter that you just exchanged is failing
intermittently, reinstalling it and performing an IPL of the system might allow the data to be
successfully written to the disk units. After the cache data is written to the disk units and the
system is powered off normally, the adapter can be replaced without data being lost. Otherwise,
continue with this procedure.
b. Label all parts (original and new) before moving them.
Using the appropriate service procedures, remove the adapter. Install the new replacement storage
adapter with the following parts installed on it:
v The cache directory card from the original storage adapter. Refer to Replacing the cache directory
card.
v The removable cache card from the original storage adapter. This applies only to certain adapters
that have a removable cache card.
Go to step 11 on page 182.
6. Identify the affected adapter and disk units by examining the Product Activity Log. Perform the
following steps:
a. Access SST/DST.
v If you can enter a command at the console, access system service tools (SST). See System
service tools.
v If you cannot enter a command at the console, perform an IPL to DST. See Performing an IPL
to dedicated service tools.
v If you cannot perform a type A or B IPL, perform a type D IPL from removable media.
b. Access the product activity log and record address information.
If a type D IPL was not performed to get to SST/DST:
The log information is formatted. Access the Product Activity Log and display the SRC
that sent you here. Press the F9 key for address information. This is the adapter address.
Then, press F12 to cancel and return to the previous screen. Then press the F4 key to
view the additional information to record the formatted log information. The Device
Errors Detected field indicates the total number of disk units that are affected. The Device
Errors Logged field indicates the number of disk units for which detailed information is
provided. Under the Device heading, the unit address, type, serial number, and
worldwide ID are provided for up to three disk units. Additionally, the adapter type,
serial number, and worldwide ID for each of these disk units indicates the adapter to
which the disk was last attached when it was operational.
If a type D IPL was performed to get to DST:
The log information is not formatted. Access the product activity log and display the SRC
that sent you here. The direct select address (DSA) of the adapter is in the format
BBBB-Cc-bb:
BBBB Hexadecimal offsets 4C and 4D
Cc Hexadecimal offset 51
bb Hexadecimal offset 4F
In order to interpret the hexadecimal information to get device addresses, see More
information from hexadecimal reports. The Device Errors Detected field indicates the
total number of disk units that are affected. The Device Errors Logged field indicates the
number of disk units for which detailed information is provided. Under the Device
heading, the unit address, type, serial number, and worldwide ID are provided for up to
three disk units. Additionally, the adapter type, serial number, and worldwide ID for
each of these disk units indicates the adapter to which the disk was last attached when it
was operational.
c. Determine the location of the adapter and the devices that are causing the problem. See System
FRU locations and find the diagram of the system unit or the expansion unit. Then find the
following items:
v The card slot that is identified by the direct select address (DSA)
v The disk unit locations that are identified by the unit addresses
Have you determined the location of the adapter and the devices that are causing the problem?
No: Ask your next level of support for assistance. This ends the procedure.
Yes: Continue with the next step.
7. Have the adapter or disk units been physically moved recently?
No:
Contact your hardware service provider.
Yes:
Go to step 8.
8. Is the data on the disk units needed for this or any other system?
No:
Go to step 10.
Yes:
Go to step 9.
9. Restore the adapter and disk units back to their original configuration. The adapter and disk units
must be rejoined so that the cache data can be written to the disk units.
After the cache data is written to the disk units and the system is powered off normally, the adapter
or disk units can be moved to another location. This ends the procedure.
10. Perform only one of the following options, listed in the order of preference:
Option 1
Reclaim adapter cache storage. See Reclaiming IOP cache storage.
Attention: Data on the disk array will be lost.
This ends the procedure.
Option 2
If the data on the disk units is not needed, initialize and format the disk units by performing
the following steps:
Attention:
Data on the disk units will be lost.
If a type D IPL was not performed to get to SST or DST:
a. Access SST or DST.
b. Select Work with disk units > Work with disk unit recovery > Disk unit
problem recovery procedures.
c. Select Initialize and format disk unit for each disk unit. When the new disk unit
is initialized and formatted, the display shows that the status is complete. This
might take 30 minutes or much longer depending on the capacity of the disk
unit. The disk unit is now ready to be added to the system configuration. This
ends the procedure.
If a type D IPL was performed to get to DST:
a. Access DST.
b. Select Work with disk units.
c. Select Initialize and format disk unit for each disk unit. When the new disk unit
is initialized and formatted, the display shows that the status is complete. This
might take 30 minutes or much longer depending on the capacity of the disk
unit. The disk unit is now ready to be added to the system configuration. This
ends the procedure.
11. Has a new SRC xxxx9010 or xxxx9050 occurred?
No:
Go to step 13.
Yes:
Go to step 12.
12. Was the new SRC xxxx9050?
No:
The new SRC was xxxx9010.
Reclaim adapter cache storage. See Reclaiming IOP cache storage.
Attention: Data might be lost. When an auxiliary cache adapter connected to the RAID
adapter logs an xxxx9055 SRC in the hardware error log, the reclaim process does not result
in lost sectors. Otherwise, the reclaim process does result in lost sectors.
Note: On the Reclaim Controller Cache Storage results screen, the number of lost sectors is
displayed. If the number is 0, there is no data loss. If the number is not 0, data has been lost
and the system operator might want to restore data after this procedure is completed.
Go to step 13.
Yes:
Contact your hardware service provider.
13. Are you working with a 572F/575C card set?
No:
Go to step 15.
Yes:
Go to step 14.
14. Using the appropriate service procedures, remove the adapter card set. Install the new replacement
storage adapter with the following parts installed on it:
v The new replacement 572F storage adapter with the cache directory card from the new
replacement storage adapter. See Separating the 572F/575C card set and moving the cache
directory card.
v The new 575C auxiliary cache adapter.
This ends the procedure.
15. Using the appropriate service procedures, remove the adapter. Install the new replacement storage
adapter with the following parts installed on it:
v The cache directory card from the new storage adapter. See Replacing the cache directory card.
v The removable cache card from the new storage adapter. This only applies to certain adapters that
have a removable cache card.
This ends the procedure.
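The unformatted-log DSA layout described in step 6 can be sketched in Python. This is a minimal illustration under stated assumptions, not an IBM tool: the function name dsa_from_log is hypothetical, and it assumes the log entry is available as raw bytes whose index 0 corresponds to hexadecimal offset 0 of the report. Only the offsets (4C and 4D for BBBB, 51 for Cc, 4F for bb) come from the procedure.

```python
def dsa_from_log(entry: bytes) -> str:
    """Build a BBBB-Cc-bb string from an unformatted log entry.

    Assumption: `entry` holds the raw log bytes, and index 0 of `entry`
    corresponds to hexadecimal offset 0 of the report.
    """
    if len(entry) <= 0x51:
        raise ValueError("log entry is too short to contain the DSA bytes")
    bbbb = entry[0x4C:0x4E].hex().upper()  # offsets 4C and 4D
    cc = f"{entry[0x51]:02X}"              # offset 51
    bb = f"{entry[0x4F]:02X}"              # offset 4F
    return f"{bbbb}-{cc}-{bb}"


# Hypothetical sample data, made up purely to exercise the function.
sample = bytearray(0x60)
sample[0x4C:0x4E] = b"\x02\x0c"
sample[0x51] = 0x10
sample[0x4F] = 0x00
print(dsa_from_log(bytes(sample)))  # prints 020C-10-00
```

The resulting string is only a formatting aid. Use System FRU locations to translate the DSA into a card slot.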
SIP3132
Use this procedure to resolve the following problem: Cache data exists for one or more missing or failed
disk units (SRC xxxx9051).
The possible causes are:
v One or more disk units have failed on the adapter.
v One or more disk units were either moved concurrently or were removed after an abnormal power off.
v The adapter was moved from a different system or a different location on this system after an
abnormal power off.
v The cache of the adapter was not cleared before it was shipped to the customer.
1. Identify the affected adapter and disk units by examining the product activity log. Perform the
following steps:
a. Access SST or DST.
v If you can enter a command at the console, access system service tools (SST). See System service
tools.
v If you cannot enter a command at the console, perform an IPL to DST. See Performing an IPL to
dedicated service tools.
v If you cannot perform a type A or B IPL, perform a type D IPL from removable media.
b. Access the Product Activity Log and record address information.
If a type D IPL was not performed to get to SST or DST:
The log information is formatted. Access the product activity log and display the SRC that
sent you here. Press the F9 key for address information. This is the adapter address. Then,
press F12 to cancel and return to the previous screen. Then press the F4 key to view the
additional information to record the formatted log information. The Device Errors
Detected field indicates the total number of disk units that are affected. The Device Errors
Logged field indicates the number of disk units for which detailed information is
provided. Under the Device heading, the unit address, type, serial number, and worldwide
ID are provided for up to three disk units. Additionally, the adapter type, serial number,
and worldwide ID for each of these disk units indicates the adapter to which the disk was
last attached when it was operational.
If a type D IPL was performed to get to DST:
The log information is not formatted. Access the product activity log and display the SRC
that sent you here. The direct select address (DSA) of the adapter is in the format
BBBB-Cc-bb:
BBBB Hexadecimal offsets 4C and 4D
Cc Hexadecimal offset 51
bb Hexadecimal offset 4F
In order to interpret the hexadecimal information to get device addresses, see More
information from hexadecimal reports. The Device Errors Detected field indicates the total
number of disk units that are affected. The Device Errors Logged field indicates the
number of disk units for which detailed information is provided. Under the Device
heading, the unit address, type, serial number, and worldwide ID are provided for up to
three disk units. Additionally, the adapter type, serial number, and worldwide ID for each
of these disk units indicates the adapter to which the disk was last attached when it was
operational.
c. Determine the location of the adapter and the devices that are causing the problem. See System
FRU locations and find the diagram of the system unit or the expansion unit. Then find the
following items:
v The card slot that is identified by the direct select address (DSA)
v The disk unit locations that are identified by the unit addresses
Have you determined the location of the adapter and the devices that are causing the problem?
No: Ask your next level of support for assistance. This ends the procedure.
Yes: Continue with the next step.
2. Are there other disk unit or adapter errors that have occurred at approximately the same time as this
error?
No:
Go to step 3.
Yes:
Go to step 6.
3. Is the data on the disk units (and thus the cache data for the disk units) needed for this or any other
system?
No:
Go to step 7.
Yes:
Go to step 4.
4. Have the adapter card or disk units been physically moved recently?
No:
Contact your hardware service provider.
Yes:
Go to step 5.
5. Restore the adapter and disk units back to their original configuration. The adapter and disk units
must be rejoined so that the cache data can be written to the disk units.
After the cache data is written to the disk units and the system is powered off normally, the adapter
or disk units can be moved to another location. This ends the procedure.
6. Take action on the other errors that have occurred at the same time as this error. This ends the
procedure.
7. Reclaim adapter cache storage. See Reclaiming IOP cache storage.
Attention: Data will be lost. This ends the procedure.
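The branching in steps 2 through 7 of SIP3132 can be summarized as a small decision function. This is a hedged sketch for reading the flow, not a replacement for the procedure: the function name and the three boolean parameters are hypothetical stand-ins for the answers to the questions in steps 2, 3, and 4.

```python
def sip3132_next_action(other_errors: bool,
                        data_needed: bool,
                        recently_moved: bool) -> str:
    """Return the action that steps 2 through 7 of SIP3132 lead to."""
    if other_errors:        # step 2, Yes: go to step 6
        return "Take action on the other errors"
    if not data_needed:     # step 3, No: go to step 7
        return "Reclaim adapter cache storage (data will be lost)"
    if not recently_moved:  # step 4, No
        return "Contact your hardware service provider"
    # step 4, Yes: go to step 5
    return "Restore the adapter and disk units to their original configuration"


print(sip3132_next_action(other_errors=False,
                          data_needed=True,
                          recently_moved=True))
```

Step 1 (locating the adapter and disk units) is assumed to have been completed before the function is consulted.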
SIP3134
Use this procedure to resolve the following problem: Disk unit requires format before use (SRC xxxx9092).
The possible causes are:
v Disk unit is a previously failed disk unit from a disk array and was automatically replaced by a hot
spare disk unit.
v Disk unit is a previously failed disk unit from a disk array and was removed and later reinstalled on a
different adapter or different location on this adapter.
v Appropriate service procedures were not followed when replacing disk units or reconfiguring the
adapter, such as not performing a normal power off of the system prior to reconfiguring disk units and
adapters.
v Disk unit is a member of a disk array, but was detected subsequent to the adapter being configured.
v Disk unit has multiple or complex configuration problems.
1. Identify the affected adapter and disk units by examining the product activity log. Perform the
following steps:
a. Access SST or DST.
v If you can enter a command at the console, access system service tools (SST). See System service
tools.
v If you cannot enter a command at the console, perform an IPL to DST. See Performing an IPL to
dedicated service tools.
v If you cannot perform a type A or B IPL, perform a type D IPL from removable media.
b. Access the product activity log and record address information.
If a type D IPL was not performed to get to SST or DST:
The log information is formatted. Access the product activity log and display the SRC that
sent you here. Press the F9 key for address information. This is the adapter address. Then,
press F12 to cancel and return to the previous screen. Then press the F4 key to view the
additional information to record the formatted log information. The Device Errors
Detected field indicates the total number of disk units that are affected. The Device Errors
Logged field indicates the number of disk units for which detailed information is
provided. Under the Device heading, the unit address, type, serial number, and worldwide
ID are provided for up to three disk units. Additionally, the adapter type, serial number,
and worldwide ID for each of these disk units indicates the adapter to which the disk was
last attached when it was operational.
If a type D IPL was performed to get to DST:
The log information is not formatted. Access the product activity log and display the SRC
that sent you here. The direct select address (DSA) of the adapter is in the format
BBBB-Cc-bb:
BBBB Hexadecimal offsets 4C and 4D
Cc Hexadecimal offset 51
bb Hexadecimal offset 4F
In order to interpret the hexadecimal information to get device addresses, see More
information from hexadecimal reports. The Device Errors Detected field indicates the total
number of disk units that are affected. The Device Errors Logged field indicates the
number of disk units for which detailed information is provided. Under the Device
heading, the unit address, type, serial number, and Worldwide ID are provided for up to
three disk units. Additionally, the adapter type, serial number, and Worldwide ID for each
of these disk units indicates the adapter to which the disk was last attached when it was
operational.
c. Determine the location of the adapter and the devices that are causing the problem. See System
FRU locations and find the diagram of the system unit or the expansion unit. Then find the
following items:
v The card slot that is identified by the direct select address (DSA)
v The disk unit locations that are identified by the unit addresses
Have you determined the location of the adapter and the devices that are causing the problem?
No: Ask your next level of support for assistance. This ends the procedure.
Yes: Continue with the next step.
2. Are there other disk unit or adapter errors that have occurred at about the same time as this error?
No:
Go to step 3.
Yes:
Go to step 5.
3. Have the adapter card or disk units been physically moved recently?
No:
Go to step 4.
Yes:
Go to step 6.
4. Is the data on the disk units needed for this or any other system?
No:
Go to step 7 on page 186.
Yes:
Go to step 6.
5. Take action on the other errors that have occurred at the same time as this error. This ends the
procedure.
6. Perform one of the following options that is most applicable to your situation:
Option 1
Perform the following steps to cause the adapter to rediscover the devices and connections:
a. Use Hardware Service Manager to perform another IPL of the virtual I/O processor that is
associated with this adapter.
b. Vary on any other resources attached to the virtual I/O processor.
Take action for any other errors that are now occurring. This ends the procedure.
Option 2
Power off the system or partition and restore the adapter and disk units to their original
configuration. This ends the procedure.
Option 3
Remove the disk units from this adapter. This ends the procedure.
7. Do the following steps to format the disk units:
Attention: All data on the disk units will be lost.
If a type D IPL was not performed to get to SST or DST:
a. Access SST or DST.
b. Select Work with disk units > Work with disk unit recovery > Disk unit problem
recovery procedures.
c. Select Initialize and format disk unit for each disk unit. When the new disk unit is
initialized and formatted, the display shows that the status is complete. This might take 30
minutes or much longer depending on the capacity of the disk unit. The disk unit is now
ready to be added to the system configuration. This ends the procedure.
If a type D IPL was performed to get to DST:
a. Access DST.
b. Select Work with disk units.
c. Select Initialize and format disk unit for each disk unit. When the new disk unit is
initialized and formatted, the display shows that the status is complete. This might take 30
minutes or much longer depending on the capacity of the disk unit. The disk unit is now
ready to be added to the system configuration. This ends the procedure.
SIP3140
Use this procedure to resolve the following problem: Multiple adapters connected in an invalid
configuration (SRC xxxx9073).
The possible causes are:
v Incompatible adapters are connected to each other. This includes invalid adapter combinations such as
the following:
– Adapters with different write cache sizes
– One adapter is not supported by the IBM i operating system
– An adapter that does not support auxiliary cache is connected to an auxiliary cache adapter
– An adapter that supports dual storage IOAs is connected to another adapter which does not have
the same support
– More than two adapters are connected for dual storage IOAs
– Adapter code levels are not up to date or are not at the same level of functioning
v One adapter, of a connected pair of adapters, is not operating under the IBM i operating system.
Connected adapters must both be controlled by the IBM i operating system. Additionally, both
adapters must be in the same system partition.
v Adapters connected for dual storage IOAs are not cabled correctly. Each type of dual storage IOA
configuration requires specific cables be used in a supported manner.
Determine which of the possible causes applies to the current configuration and take the appropriate
actions to correct it. If this does not correct the error, contact your hardware service provider. This ends
the procedure.
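The possible causes listed for SRC xxxx9073 can be read as a checklist over the connected adapters. The following Python sketch illustrates that idea only. The Adapter record, its field names, and the function name are hypothetical, and comparing code levels for equality is a simplification of "not up to date or not at the same level of functioning". Cabling and IBM i support for a given adapter type are not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Adapter:
    # All fields are hypothetical; fill them in from your own inventory.
    name: str
    write_cache_mb: int
    supports_dual_ioa: bool
    runs_ibm_i: bool
    partition: str
    code_level: str


def invalid_config_reasons(adapters: List[Adapter]) -> List[str]:
    """Return the listed causes that apply to a set of connected adapters."""
    reasons = []
    if len(adapters) > 2:
        reasons.append("more than two adapters are connected for dual storage IOAs")
    if len({a.write_cache_mb for a in adapters}) > 1:
        reasons.append("adapters have different write cache sizes")
    if len({a.supports_dual_ioa for a in adapters}) > 1:
        reasons.append("not all adapters support dual storage IOAs")
    if not all(a.runs_ibm_i for a in adapters):
        reasons.append("an adapter is not controlled by the IBM i operating system")
    if len({a.partition for a in adapters}) > 1:
        reasons.append("adapters are in different system partitions")
    if len({a.code_level for a in adapters}) > 1:
        reasons.append("adapter code levels differ")
    return reasons
```

An empty result does not prove that the configuration is valid. It only means that none of the modeled causes was found.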
SIP3141
Use this procedure to resolve the following problem: Multiple adapters not capable of similar functions or
controlling same set of devices (SRC xxxx9074).
1. This error relates to adapters connected in a dual storage IOA configuration. To obtain the reason or
description for this failure, you must find the formatted error information in the Product Activity Log.
This should also contain information about the connected adapter.
Perform the following steps:
a. Access SST/DST.
v If you can enter a command at the console, access system service tools (SST). See System service
tools.
v If you cannot enter a command at the console, perform an IPL to DST. See Performing an IPL to
dedicated service tools.
v If you cannot perform a type A or B IPL, perform a type D IPL from removable media.
b. Access the Product Activity Log and record address information.
If a D IPL was not performed to get to SST/DST:
The log information is formatted. Access the Product Activity Log and display the SRC
that sent you here. Press the F9 key for "Address Information". This is the adapter address.
Then, press F12 to cancel and return to the previous screen. Then press the F4 key to view
the "Additional Information" to record the formatted log information. The Problem
description field indicates the type of problem. The type, serial number, and Worldwide ID
of the connected adapter are also available.
If a D IPL was performed to get to DST:
The log information is not formatted. Access the Product Activity Log and display the SRC
that sent you here. The direct select address (DSA) of the adapter is in the format
BBBB-Cc-bb:
BBBB Hexadecimal offsets 4C and 4D
Cc Hexadecimal offset 51
bb Hexadecimal offset 4F
In order to interpret the hexadecimal information to get device addresses, see More
information from hexadecimal reports. The Problem description field indicates the type of
problem. The type, serial number, and Worldwide ID of the connected adapter are also
available.
2. Find the problem description and information for the connected adapter (remote adapter) shown in
the error log, and perform the action listed for the reason in the following table.
Table 39. RAID array reason for failure

Problem description: Secondary does not support RAID level being used by primary.
Full description: Secondary adapter detected that the primary has a RAID array with a RAID level
that the secondary does not support.
Action: Customer needs to upgrade the type of secondary adapter or change the RAID level of the
array on the primary to a level that is supported by the secondary.
Adapter on which to perform the action: Physically change the type of adapter that logged the
error. Change RAID level on primary adapter (remote adapter indicated in the error log).

Problem description: Secondary does not support disk unit function being used by primary.
Full description: Secondary adapter detected a device function that it does not support.
Action: Customer might need to upgrade the adapter code or upgrade the type of secondary adapter.
Adapter on which to perform the action: Adapter that logged the error.

Problem description: Secondary is unable to find devices found by the primary.
Full description: Secondary adapter cannot discover all the devices that the primary has.
Action: The adapter that is logging the error is the secondary adapter. View the logging
adapter's resource details under "Logical Hardware Resources" in hardware service manager to
determine the resource name of the primary adapter. Then view the devices attached to both the
primary adapter and secondary adapter in the "Logical Hardware Resources" information to
determine which of the devices are missing from the secondary adapter. Verify the cable
connections to the missing devices. If the cable connections are correct but the problem
persists, replace the cables.
Adapter on which to perform the action: Adapter that logged the error.

Problem description: Secondary found devices not found by the primary.
Full description: Secondary adapter has discovered more devices than the primary. After this
error is logged, an automatic failover will occur.
Action: Verify the connections to the devices from the remote adapter as indicated in the error
log.
Adapter on which to perform the action: Remote adapter indicated in the error log.

Problem description: Secondary port not connected to the same numbered port on primary.
Full description: SAS connections from the adapter to the devices are incorrect. Common disk
expansion units must be connected to the same numbered SAS port on both adapters.
Action: Verify connections and re-cable SAS connections as necessary. View the disk units under
each adapter using HSM to determine the SAS port that has the problem. The failure can also be
caused by incorrect cabling to a disk expansion unit. Ensure that the Y0 cable, YI cable, or X
cable is routed along the right side of the rack frame, as viewed from the rear, when connecting
to a disk expansion unit.
Adapter on which to perform the action: Either adapter.

Problem description: Primary lost contact with disk units accessible by secondary.
Full description: Link failure from primary adapter to devices. An automatic failover will occur.
Action: Verify cable connections from the adapter which logged the error. Possible disk
expansion unit failure.
Adapter on which to perform the action: Adapter that logged the error.

Problem description: Caching is disabled.
Full description: A CCIN 57B5 adapter is connected to a CCIN 57BB adapter. These adapters are not
compatible. The CCIN 57BB adapter will log this error and prevent either adapter from performing
write caching. Performance might be degraded until the problem is resolved.
Action: Replace the remote adapter with an adapter that is of the same type as the adapter that
logged this error. Identify the CCIN 57B5 adapter that is paired to the CCIN 57BB adapter that is
logging this error and replace it with a CCIN 57BB adapter.
Adapter on which to perform the action: Remote adapter indicated in the serviceable event view.

Problem description: Other
Full description: Not currently defined.
Action: Contact your hardware service provider.
This ends the procedure.
SIP3142
Use this procedure to resolve the following configuration error: incorrect connection between cascaded
enclosures (SRC xxxx4010).
The possible causes are:
v Incorrect cabling of cascaded device enclosures
v Use of an unsupported device enclosure
To prevent hardware damage, power off the system, partition, or card slot as appropriate, before
connecting and disconnecting cables or devices.
1. Identify the affected adapter and its port by examining the product activity log. Perform the following
steps:
a. Access SST or DST.
v If you can enter a command at the console, access system service tools (SST). See System service
tools.
v If you cannot enter a command at the console, perform an IPL to DST. See Performing an IPL to
dedicated service tools.
v If you cannot perform a type A or B IPL, perform a type D IPL from removable media.
b. Access the product activity log and record address information.
If a type D IPL was not performed to get to SST or DST:
The log information is formatted. Access the product activity log and display the SRC that
sent you here. Press the F9 key for address information. This is the adapter address. Then,
press F12 to cancel and return to the previous screen. Then press the F4 key to view the
additional information to record the formatted log information. The Adapter Port field
indicates the port on the adapter reporting the problem. There might be more than one
port listed because multiple ports map to the same physical connector. For example, ports
0 through 3 map to the first physical connector, 4 through 7 map to the second physical
connector, and so on. The port numbers are labeled on the adapter tailstock.
If a type D IPL was performed to get to DST:
The log information is not formatted. Access the product activity log and display the SRC
that sent you here. The direct select address (DSA) of the adapter is in the format
BBBB-Cc-bb:
BBBB Hexadecimal offsets 4C and 4D
Cc Hexadecimal offset 51
bb Hexadecimal offset 4F
In order to interpret the hexadecimal information to get device addresses, see More
information from hexadecimal reports. The Adapter Port field indicates the port on the
adapter reporting the problem. There might be more than one port listed because multiple
ports map to the same physical connector. For example, ports 0 through 3 map to the first
physical connector, 4 through 7 map to the second physical connector, and so on. The port
numbers are labeled on the adapter tailstock.
c. Determine the location of the adapter that reported the problem. See System FRU locations and
find the diagram of the system unit or the expansion unit. Then find the following items:
v The card slot that is identified by the direct select address (DSA)
v The physical connector identified by the port number found on the adapter tailstock
Have you determined the location of the adapter and its port?
No: Ask your next level of support for assistance. This ends the procedure.
Yes: Continue with the next step.
2. Review the device enclosure cabling and correct the cabling as required for the device or device
enclosure attached to the identified adapter port. To see example device configurations with SAS
cabling, see Serial-attached SCSI cable planning, in the Site and hardware planning information. If
unsupported device enclosures are attached, then either remove or replace them with supported
device enclosures.
3. Perform the following steps to cause the adapter to rediscover the devices and connections:
a. Use Hardware Service Manager to perform another IPL of the virtual I/O processor that is
associated with this adapter.
b. Vary on any other resources attached to the virtual I/O processor.
Did the error recur?
No:
This ends the procedure.
Yes:
Contact your hardware service provider. This ends the procedure.
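The port-to-connector example in step 1 (ports 0 through 3 on the first physical connector, 4 through 7 on the second, and so on) can be written as integer division. The sketch below assumes that the pattern of four ports per connector continues for every connector on the adapter, which the text implies but does not state for every adapter type. The constant and function names are hypothetical.

```python
PORTS_PER_CONNECTOR = 4  # assumption taken from the example: 0-3, 4-7, and so on


def connector_for_port(port: int) -> int:
    """Return the 1-based physical connector position for an adapter port."""
    if port < 0:
        raise ValueError("port number must not be negative")
    return port // PORTS_PER_CONNECTOR + 1


def connectors_for_ports(ports):
    """Collapse the ports listed in the Adapter Port field into connectors."""
    return sorted({connector_for_port(p) for p in ports})


print(connectors_for_ports([4, 5, 6, 7]))  # prints [2]
```

Always confirm the result against the port numbers labeled on the adapter tailstock.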
SIP3143
Use this procedure to resolve the following configuration error: connections exceed adapter design limits
(SRC xxxx4020).
The possible causes are:
v Unsupported number of cascaded device enclosures
v Improper cabling of cascaded device enclosures
To prevent hardware damage, power off the system, partition, or card slot as appropriate, before
connecting and disconnecting cables or devices.
1. Identify the affected adapter and its port by examining the product activity log. Perform the following
steps:
a. Access SST or DST.
v If you can enter a command at the console, access system service tools (SST). See System service
tools.
v If you cannot enter a command at the console, perform an IPL to DST. See Performing an IPL to
dedicated service tools.
v If you cannot perform a type A or B IPL, perform a type D IPL from removable media.
b. Access the product activity log and record address information.
If a type D IPL was not performed to get to SST or DST:
The log information is formatted. Access the product activity log and display the SRC that
sent you here. Press the F9 key for address information. This is the adapter address. Then,
press F12 to cancel and return to the previous screen. Then press the F4 key to view the
additional information to record the formatted log information. The Adapter Port field
indicates the port on the adapter reporting the problem. There might be more than one port
listed because multiple ports map to the same physical connector. For example, ports 0
through 3 map to the first physical connector, 4 through 7 map to the second physical
connector, and so on. The port numbers are labeled on the adapter tailstock.
If a type D IPL was performed to get to DST:
The log information is not formatted. Access the product activity log and display the SRC
that sent you here. The direct select address (DSA) of the adapter is in the format
BBBB-Cc-bb:
BBBB Hexadecimal offsets 4C and 4D
Cc Hexadecimal offset 51
bb Hexadecimal offset 4F
In order to interpret the hexadecimal information to get device addresses, see More
information from hexadecimal reports. The Adapter Port field indicates the port on the
adapter reporting the problem. There might be more than one port listed because multiple
ports map to the same physical connector. For example, ports 0 through 3 map to the first
physical connector, 4 through 7 map to the second physical connector, and so on. The port
numbers are labeled on the adapter tailstock.
c. Determine the location of the adapter that reported the problem. See System FRU locations and
find the diagram of the system unit or the expansion unit. Then find the following items:
v The card slot that is identified by the direct select address (DSA)
v The physical connector identified by the port number found on the adapter tailstock
Have you determined the location of the adapter and its port?
No: Ask your next level of support for assistance. This ends the procedure.
Yes: Continue with the next step.
2. Reduce the number of cascaded device enclosures. Device enclosures can only be cascaded one level
deep, and only in certain configurations. Review the device enclosure cabling and correct the cabling
as required for the device or device enclosure attached to the identified adapter port. To see example
device configurations with SAS cabling, see Serial-attached SCSI cable planning, in the Site and
hardware planning information.
3. Perform the following steps to cause the adapter to rediscover the devices and connections:
a. Use Hardware Service Manager to perform another IPL of the virtual I/O processor that is
associated with this adapter.
b. Vary on any other resources attached to the virtual I/O processor.
Did the error recur?
No:
This ends the procedure.
Yes:
Contact your hardware service provider. This ends the procedure.
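The rule in step 2 that device enclosures can be cascaded only one level deep can be checked against a written description of the cabling. The following Python sketch is one way to model it. The data model (a dictionary that maps each enclosure to the enclosure it is cabled from, or None when it attaches directly to the adapter) and all names are assumptions made for illustration. The sketch does not cover the "only in certain configurations" restriction.

```python
from typing import Dict, List, Optional

MAX_CASCADE_DEPTH = 1  # "cascaded one level deep"


def cascade_depth(enclosure: str, parent: Dict[str, Optional[str]]) -> int:
    """Count how many enclosures sit between this enclosure and the adapter."""
    depth = 0
    seen = {enclosure}
    current = parent[enclosure]
    while current is not None:
        if current in seen:
            raise ValueError("cabling description contains a loop")
        seen.add(current)
        depth += 1
        current = parent[current]
    return depth


def too_deep(parent: Dict[str, Optional[str]]) -> List[str]:
    """Return the enclosures that exceed the one-level cascade limit."""
    return sorted(e for e in parent if cascade_depth(e, parent) > MAX_CASCADE_DEPTH)


# Hypothetical cabling: E1 on the adapter, E2 behind E1, E3 behind E2.
cabling = {"E1": None, "E2": "E1", "E3": "E2"}
print(too_deep(cabling))  # prints ['E3']
```

Any enclosure that the check reports is a candidate for removal or recabling when reducing the number of cascaded enclosures.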
SIP3144
Use this procedure to resolve problems with multipath connections.
This procedure is used to resolve the following configuration errors:
v Configuration error, incorrect multipath connection (SRC xxxx4030)
v Configuration error, incomplete multipath connection between adapter and enclosure detected (SRC
xxxx4040)
The possible causes are:
v Incorrect cabling to device enclosure.
Note: Pay special attention to the requirement that a Y0-cable, YI-cable, or X-cable must be routed
along the right side of the rack frame (as viewed from the rear) when connecting it to a disk expansion
unit. Review the device enclosure cabling and correct the cabling as required. To see example device
configurations with serial-attached SCSI (SAS) cabling, see Serial-attached SCSI cable planning, in the
Site and hardware planning information.
v A failed connection caused by a failing component in the SAS fabric between, and including, the
adapter and device enclosure.
Considerations:
v Power off the system, partition, or card slot before connecting and disconnecting cables or devices, as
appropriate, to prevent hardware damage.
v Some systems have SAS, PCI-X, and PCIe bus interface logic integrated onto the system boards and
use a pluggable RAID enablement card (a non-PCI form factor card) for these SAS, PCI-X, and PCIe
buses. For these configurations, replacement of the RAID enablement card is unlikely to solve a
SAS-related problem because the SAS interface logic is on the system board.
v Some systems have the disk enclosure or removable media enclosure integrated in the system with no
cables. For these configurations the SAS connections are integrated onto the system boards and a failed
connection can be the result of a failed system board or integrated device enclosure.
v Some systems have SAS RAID adapters integrated onto the system backplane and use a cache RAID
and dual IOA enablement card to enable storage adapter write cache and dual storage I/O adapter
(IOA) mode. For these configurations, replacement of the cache RAID and dual IOA enablement card is
unlikely to solve a SAS-related problem because the SAS interface logic is on the system backplane.
v Some configurations involve a SAS adapter connecting to internal SAS disk enclosures within a system
using a cable card. Keep in mind that when the procedure refers to a device enclosure, it could be
referring to the internal SAS disk slots or media slots. Also, when the procedure refers to a cable, it
could include a cable card.
v When using SAS adapters in a dual storage IOA configuration, ensure that the actions taken in this
procedure are against the primary adapter (that is, not the secondary adapter).
Attention: When SAS fabric problems exist, do not replace RAID adapters without assistance from your
service provider. Because the adapter might contain non-volatile write cache data and configuration data
for the attached disk arrays, additional problems can be created by replacing an adapter. Follow
appropriate service procedures when replacing the cache RAID and dual IOA Enablement Card. Incorrect
removal can result in data loss or a nondual storage IOA mode of operation.
1. Was the SRC xxxx4030?
No:
Go to step 5 on page 193.
Yes:
Go to step 2.
2. Identify the affected adapter and its port by examining the product activity log. Perform the
following steps:
a. Access SST or DST.
v If you can enter a command at the console, access system service tools (SST). See System
service tools.
v If you cannot enter a command at the console, perform an IPL to DST. See Performing an IPL
to dedicated service tools.
v If you cannot perform a type A or B IPL, perform a type D IPL from removable media.
b. Access the product activity log and record address information.
If a type D IPL was not performed to get to SST or DST:
The log information is formatted. Access the product activity log and display the SRC
that sent you here. Press the F9 key for address information. This is the adapter address.
Then, press F12 to cancel and return to the previous screen. Then press the F4 key to
view the additional information to record the formatted log information. The Adapter
Port field indicates the port on the adapter reporting the problem. There may be more
than one port listed because multiple ports map to the same physical connector. For
example, ports 0 through 3 map to the first physical connector, 4 through 7 map to the
second physical connector, and so on. The port numbers are labeled on the adapter
tailstock.
If a type D IPL was performed to get to DST:
The log information is not formatted. Access the product activity log and display the SRC
that sent you here. The direct select address (DSA) of the adapter is in the format
BBBB-Cc-bb:
BBBB Hexadecimal offsets 4C and 4D
Cc Hexadecimal offset 51
bb Hexadecimal offset 4F
In order to interpret the hexadecimal information to get device addresses, see More
information from hexadecimal reports. The Adapter Port field indicates the port on the
adapter reporting the problem. There may be more than one port listed because multiple
ports map to the same physical connector. For example, ports 0 through 3 map to the
first physical connector, 4 through 7 map to the second physical connector, and so on.
The port numbers are labeled on the adapter tailstock.
c. Determine the location of the adapter that reported the problem. See System FRU locations and
find the diagram of the system unit or the expansion unit. Then find the following items:
v The card slot that is identified by the direct select address (DSA)
v The physical connector identified by the port number found on the adapter tailstock
Have you determined the location of the adapter and its port?
No: Ask your next level of support for assistance. This ends the procedure.
Yes: Continue with the next step.
3. Review the device enclosure cabling and correct the cabling as required for the device or device
enclosure attached to the identified adapter port. To see example device configurations with SAS
cabling, see Serial-attached SCSI cable planning, in the Site and hardware planning information.
4. Perform the following steps to cause the adapter to rediscover the devices and connections:
a. Use Hardware Service Manager to perform another IPL of the virtual I/O processor that is
associated with this adapter.
b. Vary on any other resources attached to the virtual I/O processor.
Did the error recur?
No:
This ends the procedure.
Yes:
Contact your hardware service provider. This ends the procedure.
5. The SRC is xxxx4040. Is the IBM i operating system at Version 6.1.1 or later?
No: Continue with the next step.
Yes: Go to step 7 on page 194.
6. Determine whether a problem still exists for the adapter that logged this error by examining the SAS
connections as follows:
a. On the System Service Tools (SST) screen, select Start a Service Tool then press Enter.
b. Select Display/Alter/Dump.
c. Select Display/Alter storage.
d. Select Licensed Internal Code (LIC) data.
e. Select Advanced Analysis.
f. Type FABQUERY on the entry line and then select it with option 1.
g. On the Specify Advanced Analysis Options screen, type -SUB 01 -IOA DCxx -DSP 0 in the
Options field, where DCxx is the adapter resource name. Press Enter.
Note: More information is available by returning to the Specify Advanced Analysis Options
screen and typing -SUB 01 -IOA DCxx -DSP 2 in the Options field, where DCxx is the adapter
resource name. Press Enter.
Do all expected devices appear in the list and are all paths marked as Operational?
No: Go to step 8.
Yes: The error condition no longer exists. This ends the procedure.
7. Determine whether a problem still exists for the DCxx adapter resource that logged this error by
examining the SAS connections. See Viewing SAS fabric path information. Do all expected devices
appear in the list and are all paths marked as Operational?
No: Continue with the next step.
Yes: The error condition no longer exists. This ends the procedure.
8. Perform the following steps to cause the adapter to rediscover the devices and connections:
a. Use Hardware Service Manager to re-IPL the virtual I/O processor that is associated with this
adapter.
b. Vary on any other resources attached to the virtual I/O processor.
Note: At this point, ignore any problems found and continue with the next step.
9. Determine if the problem still exists for the adapter that logged this error by examining the SAS
connections by performing the actions in step 6 on page 193 or step 7 again.
Do all expected devices appear in the list and are all paths marked as Operational?
No:
Go to step 10.
Yes:
This ends the procedure.
10. Because the problem persists, some corrective action is needed to resolve the problem. Proceed by
doing the following:
Perform only one of the following corrective actions (listed in the order of preference). If one of the
corrective actions has previously been attempted, proceed to the next one in the list.
v Reseat cables if present on adapter and device enclosure. Perform the following:
a. Use adapter concurrent maintenance to power off the adapter slot, or power off the system or
partition.
b. Reseat the cables.
c. Use adapter concurrent maintenance to power on the adapter slot, or power on the system or
partition.
v Replace the cable, if present, from the adapter to the device enclosure. Perform the following
steps:
a. Use adapter concurrent maintenance to power off the adapter slot, or power off the system or
partition.
b. Replace the cables.
c. Use adapter concurrent maintenance to power on the adapter slot, or power on the system or
partition.
v Replace the internal device enclosure or see the service documentation for an external expansion
unit. Perform the following steps:
a. Power off the system or partition. If the enclosure is external, adapter concurrent maintenance
can be used instead to power off the adapter slot.
b. Replace the device enclosure.
c. Power on the system or partition. If the enclosure is external, use adapter concurrent
maintenance instead to power on the adapter slot.
v Replace the adapter. The procedure to replace the adapter can be found in PCI adapter.
v Contact your service provider.
11. To determine if the problem still exists for the adapter that logged this error, examine the SAS
connections by performing the actions in step 6 on page 193 or step 7 on page 194 again. Do all
expected devices appear in the list and are all paths marked as Operational?
No:
Go to step 10 on page 194.
Yes:
This ends the procedure.
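The control flow of steps 10 and 11 (perform one corrective action, re-examine the SAS connections, and move to the next action only if the problem persists) can be sketched as a simple loop. The function and parameter names are hypothetical, and each callable stands in for work that a service person performs by hand; nothing here drives real hardware.

```python
from typing import Callable, Iterable, Optional, Tuple

# A corrective action is a label plus a callable that performs it.
Action = Tuple[str, Callable[[], None]]


def run_corrective_actions(
    actions: Iterable[Action],
    fabric_ok: Callable[[], bool],
) -> Optional[str]:
    """Try each action in order of preference; return the one that worked.

    Returns None when every action was tried and the check still fails,
    which corresponds to the final list entry: contact your service provider.
    """
    for name, perform in actions:
        perform()        # step 10: perform exactly one corrective action
        if fabric_ok():  # step 11: re-examine the SAS connections
            return name
    return None
```

Because the loop stops at the first action that clears the error, at most one further action is taken after each check, matching the "perform only one" instruction.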
SIP3145
Use this procedure to resolve the following problem: Unsupported enclosure function detected (SRC
xxxx4110).
The possible causes are:
v Device enclosure or adapter code levels are not up to date.
v Unsupported type of device enclosure or device. For example, this error can occur if a SATA device,
such as a DVD drive, is attached to a CCIN 57B4 adapter. The CCIN 57B4 adapter does not support
SATA devices. To determine whether an adapter supports SATA devices, see PCIe2 SAS RAID card
comparison or PCIe3 SAS RAID card comparison.
Considerations:
To prevent hardware damage or erroneous diagnostic results, remove power from the system as
appropriate before connecting and disconnecting cables or devices.
1. Identify the affected adapter and its port by examining the product activity log. Perform the following
steps:
a. Access SST or DST.
v If you can enter a command at the console, access system service tools (SST). See System service
tools.
v If you cannot enter a command at the console, perform an IPL to DST. See Performing an IPL to
dedicated service tools.
v If you cannot perform a type A or B IPL, perform a type D IPL from removable media.
b. Access the product activity log and record address information.
If a type D IPL was not performed to get to SST or DST:
The log information is formatted. Access the product activity log and display the SRC that
sent you here. Press the F9 key for address information. This is the adapter address. Then,
press F12 to cancel and return to the previous screen. Then press the F4 key to view the
additional information to record the formatted log information. The Adapter Port field
indicates the port on the adapter that is reporting the problem. There may be more than
one port listed because multiple ports map to the same physical connector. For example,
ports 0 through 3 map to the first physical connector, 4 through 7 map to the second
physical connector, and so on. The port numbers are labeled on the adapter tailstock.
If a type D IPL was performed to get to DST:
The log information is not formatted. Access the product activity log and display the SRC
that sent you here. The direct select address (DSA) of the adapter is in the format
BBBB-Cc-bb:
BBBB Hexadecimal offsets 4C and 4D
Cc Hexadecimal offset 51
bb Hexadecimal offset 4F
In order to interpret the hexadecimal information to get device addresses, see More
information from hexadecimal reports. The Adapter Port field indicates the port on the
adapter reporting the problem. There might be more than one port listed because multiple
ports map to the same physical connector. For example, ports 0 through 3 map to the first
physical connector, 4 through 7 map to the second physical connector, and so on. The port
numbers are labeled on the adapter tailstock.
c. Determine the location of the adapter that reported the problem. See System FRU locations and
find the diagram of the system unit or the expansion unit. Then find the following items:
v The card slot that is identified by the direct select address (DSA)
v The physical connector identified by the port number found on the adapter tailstock
Have you determined the location of the adapter and its port?
No: Ask your next level of support for assistance. This ends the procedure.
Yes: Continue with the next step.
2. Check for the latest PTFs for the device enclosure or adapter and apply them. If unsupported device
enclosures or devices are attached, then either remove or replace them with supported device
enclosures or devices.
Review the device enclosure cabling and correct the cabling as required for the device or device
enclosure attached to the identified adapter port. To see example device configurations with SAS
cabling, see Serial-attached SCSI cable planning, in the Site and hardware planning information.
3. Perform the following steps to cause the adapter to rediscover the devices and connections:
a. Use Hardware Service Manager to perform another IPL of the virtual I/O processor that is
associated with this adapter.
b. Vary on any other resources attached to the virtual I/O processor.
Did the error recur?
No:
This ends the procedure.
Yes:
Contact your hardware service provider. This ends the procedure.
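When the log is unformatted (type D IPL), step 1b assembles the DSA from fixed hexadecimal offsets. The following sketch shows one way to do that from a raw copy of the log entry. It rests on assumptions not stated in this procedure: that the offsets are zero-based byte offsets from the start of the entry, and that each field is written as uppercase hexadecimal. The function name is hypothetical. See More information from hexadecimal reports for the authoritative layout.

```python
def format_dsa(entry: bytes) -> str:
    """Build the BBBB-Cc-bb direct select address from an unformatted log entry.

    Assumption: offsets 4C, 4D, 51, and 4F are zero-based byte offsets
    from the start of the entry.
    """
    if len(entry) <= 0x51:
        raise ValueError("log entry is too short to contain a DSA")
    bbbb = entry[0x4C:0x4E].hex().upper()  # BBBB: offsets 4C and 4D
    cc = f"{entry[0x51]:02X}"              # Cc: offset 51
    bb = f"{entry[0x4F]:02X}"              # bb: offset 4F
    return f"{bbbb}-{cc}-{bb}"
```

Note that the Cc field comes from a higher offset (51) than the bb field (4F), so the fields are not read in the order in which they are displayed.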
SIP3146
Use this procedure to resolve the configuration error: Incomplete multipath connection between
enclosures and device detected (SRC xxxx4041).
If you need additional information for failing part numbers, location codes, or removal and replacement
procedures, see Part locations and location codes. Select your machine type and model number to find
additional location codes, part numbers, or replacement procedures for your system.
The possible cause is a failed connection caused by a failing component within the device enclosure,
including the device itself.
Note: The adapter is not a likely cause of this problem.
Considerations:
v Power off the system, partition, or card slot before connecting and disconnecting cables or devices, as
appropriate, to prevent hardware damage.
v Some systems have the disk enclosure or removable media enclosure integrated in the system with no
cables. For these configurations the SAS connections are integrated onto the system boards and a failed
connection can be the result of a failed system board or integrated device enclosure.
v Some configurations involve a serial-attached SCSI (SAS) adapter connecting to internal SAS disk
enclosures within a system using a cable card. Keep in mind that when the procedure refers to a device
enclosure, it could be referring to the internal SAS disk slots or media slots. Also, when the procedure
refers to a cable, it could include a cable card.
v When using SAS adapters in a dual storage IOA configuration, ensure that the actions taken in this
procedure are against the primary adapter (that is, not the secondary adapter).
Attention: Do not remove functioning disk units in a disk array without assistance from your service
provider. If functioning disk units are removed, a disk array might become unprotected or failed and
additional problems could be created.
1. Determine the resource name of the adapter that reported the problem by performing the following
steps:
a. Access SST or DST.
b. Access the product activity log and record the resource name that this error is logged against. If
the resource name is an adapter resource name, use it and continue with the next step. If the
resource name is a disk-unit resource name, use the Hardware Service Manager to determine the
resource name of the adapter that is controlling this disk unit. The logical bus number of the
disk-unit logical resource might be useful in determining the adapter resource name.
2. Is the IBM i operating system at Version 6.1.1 or later?
No: Continue with the next step.
Yes: Go to step 4.
3. Determine if a problem still exists for the adapter that logged this error by examining the SAS
connections as follows:
a. On the System Service Tools (SST) display, select Start a Service Tool and press Enter.
b. Select Display/Alter/Dump > Display/Alter storage > Licensed Internal Code (LIC) data >
Advanced Analysis.
c. Type FABQUERY on the entry line and then select it with option 1.
d. On the Specify Advanced Analysis Options display, type -SUB 01 -IOA DCxx -DSP 0 in the
Options field, where DCxx is the adapter resource name. Press Enter.
Note: More information is available by returning to the Specify Advanced Analysis Options
display and typing -SUB 01 -IOA DCxx -DSP 2 in the Options field, where DCxx is the adapter
resource name. Press Enter.
Are all of the following statements true?
v All expected devices appear in the list.
v All paths are marked as Operational.
v None of the paths are blank.
No: Go to step 5 on page 198.
Yes: The error condition no longer exists. This ends the procedure.
4. Determine whether a problem still exists for the DCxx adapter resource that logged this error by
examining the SAS connections. See Viewing SAS fabric path information. Are all of the following
statements true?
v All expected devices appear in the list.
v All paths are marked as Operational.
v None of the paths are blank.
No: Continue with the next step.
Yes: The error condition no longer exists. This ends the procedure.
5. Perform the following steps to cause the adapter to rediscover the devices and connections:
Note: Performing this step causes the system partition to temporarily hang. Wait until the system
bypasses the temporary hang.
a. Use the logical resources IO debug option in Hardware Service Manager to perform another IPL of
the virtual I/O processor that is associated with this adapter.
b. Vary on any other resources that are attached to the virtual I/O processor.
6. To determine whether the problem still exists for the adapter that logged this error, check the product
activity log to determine whether any new errors were logged for the same resource that was
identified in step 1 on page 197. Also, examine the SAS connections by performing the actions in step
3 on page 197 or step 4 on page 197 again. Are all of the following statements true?
v All expected devices appear in the list.
v All paths are marked as Operational.
v None of the paths are blank.
v No new errors are logged in the product activity log for the resource.
No: Continue with the next step.
Yes: The error condition no longer exists. This ends the procedure.
7. Perform only one of the following corrective actions (listed in the order of preference) and then
continue with step 8 on page 199. If one of the corrective actions has previously been attempted,
proceed to the next one in the list.
v Review the device enclosure cabling and correct the cabling as required. To see example device
configurations with SAS cabling, see Serial-attached SCSI cable planning, in the Site and hardware
planning information.
v Reseat the cables, if present, on the adapter, device enclosure, and any additional device enclosures
connected to the device enclosure. Perform the following steps:
a. Using Hardware Service Manager packaging resources, perform adapter concurrent maintenance
to power off the adapter slot, or power off the system or partition.
b. Reseat the cables.
c. Using Hardware Service Manager packaging resources, perform adapter concurrent maintenance
to power on the adapter slot, or power on the system or partition.
v Replace the cable, if present, from the adapter to the device enclosure, and any cables between the
device enclosure and additional device enclosures connected to the device enclosure. Perform the
following steps:
a. Using Hardware Service Manager packaging resources, perform adapter concurrent maintenance
to power off the adapter slot, or power off the system or partition.
b. Replace the cables.
c. Using Hardware Service Manager packaging resources, perform adapter concurrent maintenance
to power on the adapter slot, or power on the system or partition.
v Replace the internal device enclosure or see the service documentation for an external expansion
unit. Perform the following steps:
a. Power off the system or partition. If the enclosure is external, use adapter concurrent
maintenance instead to power off the adapter slot.
b. Replace the device-enclosure failing items. See SASEXP and DEVBPLN for possible failing items
to replace.
c. Power on the system or partition. If the enclosure is external, adapter concurrent maintenance
can be used instead to power on the adapter slot.
v Replace the device.
v Contact your service provider.
8. To determine whether the problem still exists for the adapter that logged this error, check the product
activity log to determine whether any new errors were logged for the same resource that was
identified in step 1 on page 197. Also, examine the SAS connections by performing the actions in step
3 on page 197 or step 4 on page 197 again. Are all of the following statements true?
v All expected devices appear in the list.
v All paths are marked as Operational.
v None of the paths are blank.
v No new errors are logged in the product activity log for the resource.
No: Go to step 7 on page 198.
Yes: The error condition no longer exists. This ends the procedure.
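Two parts of this procedure are easy to get wrong: the Options string typed for FABQUERY in step 3, and the checklist applied in steps 6 and 8. The sketch below builds the string and evaluates the checklist against a path list transcribed by hand from the display. The function names, the dictionary layout, and the device names in the usage note are all hypothetical; the only values taken from this procedure are the option text and the Operational status.

```python
from typing import Dict, Iterable, List


def fabquery_options(resource_name: str, detailed: bool = False) -> str:
    """Return the text typed in the Options field for FABQUERY.

    -DSP 0 is the view used in step 3; -DSP 2 shows more information.
    """
    return f"-SUB 01 -IOA {resource_name} -DSP {2 if detailed else 0}"


def fabric_is_healthy(
    expected_devices: Iterable[str],
    paths: Dict[str, List[str]],
    new_errors_logged: bool = False,
) -> bool:
    """Apply the checklist: devices present, paths Operational, none blank."""
    if new_errors_logged:
        return False  # new product activity log entries for the resource
    if any(device not in paths for device in expected_devices):
        return False  # an expected device is missing from the list
    for statuses in paths.values():
        if not statuses:
            return False  # no path information at all counts as blank
        if any(s.strip() != "Operational" for s in statuses):
            return False  # blank or non-Operational path
    return True
```

For example, `fabric_is_healthy(["DD001"], {"DD001": ["Operational", ""]})` is false because one path is blank, which corresponds to the "No" branch of the step.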
SIP3147
Use this procedure to resolve the following problem: Missing remote adapter (SRC xxxx9076)
1. An adapter attached in either an Auxiliary Cache or Dual Storage IOA configuration was not
discovered in the allotted time. To obtain additional information about the configuration involved,
locate the formatted log in the Product Activity Log.
a. Access SST/DST.
v If you can enter a command at the console, access system service tools (SST). See System service
tools.
v If you cannot enter a command at the console, perform an IPL to DST. See Performing an IPL to
dedicated service tools.
v If you cannot perform a type A or B IPL, perform a type D IPL from removable media.
b. Access the Product Activity Log and record address information.
If a D IPL was not performed to get to SST/DST:
The log information is formatted. Access the Product Activity Log and display the SRC
that sent you here. Press the F9 key for "Address Information". This is the reporting
adapter address. Then, press F12 to cancel and return to the previous screen. Then press
the F4 key to view the "Additional Information" to record the formatted log information.
The "Type of adapter connection" field indicates the type of configuration involved.
If a D IPL was performed to get to DST:
The log information is not formatted. Access the Product Activity Log and display the SRC
that sent you here. The direct select address (DSA) of the adapter is in the format
BBBB-Cc-bb:
BBBB Hexadecimal offsets 4C and 4D
Cc Hexadecimal offset 51
bb Hexadecimal offset 4F
In order to interpret the hexadecimal information to get device addresses, see More
information from hexadecimal reports. The "Type of adapter connection" field indicates the
type of configuration involved.
2. Determine which of the following is the cause of your specific error and take the appropriate actions
listed. If this does not correct the error, contact your hardware service provider. The possible causes
are:
v An attached adapter for the configuration is not installed or is not powered on. Some adapters are
required to be part of a Dual Storage IOA Configuration. Ensure that both adapters are properly
installed and powered on. Use the Hardware Service Manager to determine the missing remote
adapter that is paired with the reporting adapter.
v If this is an Auxiliary Cache or dual storage IOA configuration, then both adapters may not be in
the same partition. Ensure that both adapters are assigned to the same partition.
v An attached adapter does not support the desired configuration.
v An attached adapter for the configuration failed. Take action on the other errors that have occurred
at the same time as this error.
v Adapter code levels are not up to date or are not at the same level of supported function. Ensure
that the code for both adapters is at the latest level.
v A PCIe cable to a PCIe storage enclosure might be failing.
Note: The adapter that is logging this error will run in a performance degraded mode, without
caching, until the problem is resolved.
This ends the procedure.
SIP3148
Use this procedure to resolve the following problem: Attached enclosure does not support required
multipath function (SRC xxxx4050).
The possible cause is the use of an unsupported device enclosure.
Considerations:
v Power off the system, partition, or card slot before connecting and disconnecting cables or devices, as
appropriate, to prevent hardware damage.
1. Identify the adapter and adapter port that is associated with the problem by examining the product
activity log. Perform the following steps:
a. Access SST or DST.
v If you can enter a command at the console, access system service tools (SST). See System service
tools.
v If you cannot enter a command at the console, perform an IPL to DST. See Performing an IPL to
dedicated service tools.
v If you cannot perform a type A or B IPL, perform a type D IPL from removable media.
b. Access the product activity log and record address information.
If a type D IPL was not performed to get to SST or DST:
The log information is formatted. Access the product activity log and display the SRC that
sent you here. Press the F9 key for address information. This is the adapter address. Then,
press F12 to cancel and return to the previous screen. Then press the F4 key to view the
additional information to record the formatted log information. The Adapter Port field
indicates the port on the adapter reporting the problem. There may be more than one port
listed because multiple ports map to the same physical connector. For example, ports 0
through 3 map to the first physical connector, 4 through 7 map to the second physical
connector, and so on. The port numbers are labeled on the adapter tailstock.
If a type D IPL was performed to get to DST:
The log information is not formatted. Access the product activity log and display the SRC
that sent you here. The direct select address (DSA) of the adapter is in the format
BBBB-Cc-bb:
BBBB Hexadecimal offsets 4C and 4D
Cc Hexadecimal offset 51
bb Hexadecimal offset 4F
In order to interpret the hexadecimal information to get device addresses, see More
information from hexadecimal reports. The Adapter Port field indicates the port on the
adapter reporting the problem. There may be more than one port listed because multiple
ports map to the same physical connector. For example, ports 0 through 3 map to the first
physical connector, 4 through 7 map to the second physical connector, and so on. The port
numbers are labeled on the adapter tailstock.
c. Determine the location of the adapter that reported the problem. See System FRU locations and
find the diagram of the system unit or the expansion unit. Then find the following items:
v The card slot that is identified by the direct select address (DSA)
v The physical connector identified by the port number found on the adapter tailstock
Have you determined the location of the adapter and its port?
No: Ask your next level of support for assistance. This ends the procedure.
Yes: Continue with the next step.
2. If unsupported device enclosures are attached to the identified adapter port, then either remove or
replace them with supported device enclosures.
3. Perform the following steps to cause the adapter to rediscover the devices and connections:
a. Use Hardware Service Manager to perform another IPL of the virtual I/O processor that is
associated with this adapter.
b. Vary on any other resources attached to the virtual I/O processor.
Did the error recur?
No:
This ends the procedure.
Yes:
Contact your hardware service provider. This ends the procedure.
SIP3149
Use this procedure to resolve the following problem: Incomplete multipath connection between adapter
and remote adapter (SRC xxxx9075)
The possible cause is incorrect cabling between SAS RAID adapters.
Power off the system, partition, or card slot before connecting and disconnecting cables or devices, as
appropriate, to prevent hardware damage.
Review the device enclosure cabling and correct the cabling as required. To see example device
configurations with SAS cabling, see Serial-attached SCSI cable planning.
This ends the procedure.
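Procedures SIP3144 through SIP3149 are each tied to the last four characters of an SRC. The lookup below collects those pairings from the "Use this procedure to resolve" statements in each procedure. It is a reading aid only: the function name is hypothetical, the table covers only the SRCs named in those statements, and the SRC tables remain the authoritative route to a procedure.

```python
from typing import Optional

# Last four characters of the SRC -> procedure, taken from the
# introductory statements of SIP3144 through SIP3149.
PROCEDURE_BY_SRC_SUFFIX = {
    "4030": "SIP3144",
    "4040": "SIP3144",
    "4110": "SIP3145",
    "4041": "SIP3146",
    "9076": "SIP3147",
    "4050": "SIP3148",
    "9075": "SIP3149",
}


def procedure_for_src(src: str) -> Optional[str]:
    """Return the isolation procedure for an SRC of the form xxxxNNNN, or None."""
    src = src.strip().upper()
    if len(src) != 8:
        raise ValueError("expected an SRC of the form xxxxNNNN")
    return PROCEDURE_BY_SRC_SUFFIX.get(src[4:])
```

A return value of None means only that the SRC is not named in these six procedures, not that no procedure applies.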
SIP3150
Use this procedure to perform serial attached SCSI (SAS) fabric problem isolation.
Considerations:
v Power off the system, partition, or card slot before connecting and disconnecting cables or devices, as
appropriate, to prevent hardware damage.
v Some systems have SAS, PCI-X, and PCIe bus interface logic integrated onto the system boards and
use a pluggable RAID enablement card (a non-PCI form factor card) for these SAS, PCI-X, and PCIe
buses. For these configurations, replacement of the RAID enablement card is unlikely to solve a
SAS-related problem because the SAS interface logic is on the system board.
v Some systems have the disk enclosure or removable media enclosure integrated in the system with no
cables. For these configurations the SAS connections are integrated onto the system boards and a failed
connection can be the result of a failed system board or integrated device enclosure.
v Some systems have SAS RAID adapters integrated onto the system backplane and use a cache RAID
and dual IOA enablement card to enable storage adapter write cache and dual storage I/O adapter
(IOA) mode. For these configurations, replacement of the cache RAID and dual IOA enablement card is
unlikely to solve a SAS-related problem because the SAS interface logic is on the system backplane.
Attention: When SAS fabric problems exist, obtain assistance from your hardware service provider:
v When SAS fabric problems exist, do not replace RAID adapters without assistance from your service
provider. Because the adapter might contain nonvolatile write cache data and configuration data for
the attached disk arrays, additional problems can be created by replacing an adapter.
v Follow appropriate service procedures when replacing the Cache RAID and dual IOA enablement card.
Incorrect removal can result in data loss or a nondual storage IOA mode of operation.
v Do not remove functioning disk units in a disk array without assistance from your service provider. A
disk array might become unprotected or might fail if functioning disk units are removed. The removal
of functioning disk units might also result in additional problems in the disk array.
1. Was the SRC xxxx3020 or SRC xxxx8130?
No:
Go to step 3.
Yes:
Go to step 2.
2. Determine which of the following is the cause of your specific error and take the appropriate actions
listed.
The possible causes for SRC xxxx3020 are:
v More devices are connected to the adapter than the adapter supports. Change the configuration to
reduce the number of devices below what is supported by the adapter.
v A SAS device has been incorrectly moved from one location to another. Either return the device to
its original location or move the device while the adapter is powered off.
v A SAS device has been incorrectly replaced by a SATA device. A SAS device must be used to
replace a SAS device.
The possible causes for SRC xxxx8130 are:
v One or more SAS devices were moved from a PCIe2 or PCIe3 adapter to a PCI-X or PCIe adapter.
If the device was moved from a PCIe2 or PCIe3 adapter to a PCI-X or PCIe adapter, the Detail
Data section of the hardware error log contains a reason for failure of Payload CRC Error. For this
case, the error can be ignored and the problem is resolved if the devices are moved back to a
PCIe2 or PCIe3 adapter or if the devices are formatted on the PCI-X or PCIe adapter.
v For all other causes, go to step 3.
3. Determine the status of the disk units in the array by doing the following steps:
a. Access the product activity log and display the SRC that sent you here.
b. Press the F9 key for address information. This is the adapter address.
c. Return to the SST or DST main menu.
d. Select Work with disk units > Display disk configuration > Display disk configuration status.
e. On the Display disk configuration status screen, look for the devices attached to the adapter that
was identified.
Is there a device that has a status of RAID 5/Unknown, RAID 6/Unknown, RAID 5/Failed, or RAID
6/Failed?
No:
Go to step 5.
Yes:
Go to step 4.
4. Other errors should have occurred related to the disk array having degraded protection. Take action
on these errors to replace the failed disk unit and restore the disk array to a fully protected state.
This ends the procedure.
5. Have other errors occurred at the same time as this error?
No:
Go to step 7 on page 203.
Yes:
Go to step 6.
6. Take action on the other errors that have occurred at the same time as this error. This ends the
procedure.
7. Was the SRC xxxxFFFE?
No:
Go to step 10.
Yes:
Go to step 8.
8. Check for the latest PTFs for the device, device enclosure, and adapter and apply them. Did you find
and apply a PTF?
No:
Go to step 10.
Yes:
Go to step 9.
9. This ends the procedure.
10. Identify the adapter and adapter port associated with the problem by examining the product activity
log. Perform the following steps:
a. Access SST or DST.
v If you can enter a command at the console, access system service tools (SST). See System
service tools.
v If you cannot enter a command at the console, perform an IPL to DST. See Performing an IPL
to dedicated service tools.
v If you cannot perform a type A or B IPL, perform a type D IPL from removable media.
b. Access the product activity log and display the SRC that sent you here. Record the adapter
address and the adapter port by doing one of the following:
v If the SRC is xxxxFFFE, press the F9 key for address information. The adapter address is the
bus, board, card information. The port is shown in the I/O bus field. Convert the port value
from decimal to hexadecimal.
v Press the F9 key for address information. The adapter address is the bus, board, card
information. Then, press F12 to cancel and return to the previous screen. Then press the F4 key
to view the additional information, if available. The adapter port is characters 1 and 2 of the
unit address. For example, if the unit address is 123456FF, the port would be 12.
v Go to Hexadecimal product activity log data to obtain the address information. The adapter
address is the bus, board, card information. The adapter port is characters 1 and 2 of the unit
address. For example, if the unit address is 123456FF, the port would be 12.
11. Perform the following steps:
a. Select Start a Service Tool > Hardware Service Manager > Logical Hardware Resources >
System Bus Resources.
b. Enter the adapter bus address and use the Associated packaging resource(s) option to display
the type, model, and unit ID.
c. Record the type, model, and unit ID of the enclosure in which the adapter is located.
d. Use the type, model, unit ID and adapter address to find the location of the adapter (see
Addresses to find the location and then go to System FRU locations).
e. The logical port number was identified in step 10. Logical port numbers are indicated on the
physical connector labels located on the tailstock of the adapter. To locate the device or device
enclosure that is experiencing the problem, use the logical port number to determine the physical
connector to which the device or device enclosure is attached.
12. Because the problem persists, some corrective action is needed to resolve the problem. Proceed by
doing the following:
Perform only one of the following corrective actions (listed in the order of preference). If one of the
corrective actions has previously been attempted, proceed to the next one in the list.
v Reseat cables, if present, on adapter and device enclosure. Perform the following steps:
a. Use adapter concurrent maintenance to power off the adapter slot, or power off the system or
partition.
b. Reseat the cables.
c. Use adapter concurrent maintenance to power on the adapter slot, or power on the system or
partition.
v Replace the cable, if present, from the adapter to the device enclosure. Perform the following
steps:
a. Use adapter concurrent maintenance to power off the adapter slot, or power off the system or
partition.
b. Replace the cables.
c. Use adapter concurrent maintenance to power on the adapter slot, or power on the system or
partition.
v Replace the device.
Note: If there are multiple devices with a path that is not Operational, the problem is not likely to
be with a device.
v Replace the internal device enclosure or see the service documentation for an external expansion
unit. Perform the following steps:
a. Power off the system or partition. If the enclosure is external, use adapter concurrent
maintenance instead to power off the adapter slot.
b. Replace the device enclosure.
c. Power on the system or partition. If the enclosure is external, use adapter concurrent
maintenance instead to power on the adapter slot.
v Replace the adapter. The procedure to replace the adapter can be found in PCI adapter.
v Contact your service provider.
13. Does the problem still occur after performing the corrective action?
No: This ends the procedure.
Yes: Go to step 12 on page 203.
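The address handling described in step 10 can be sketched in Python. This is a minimal illustration and not part of the service tools: the function names are hypothetical, the unit address is assumed to be available as a hexadecimal string, and padding the converted port value to two hexadecimal digits is an assumption made for readability.

```python
def port_from_unit_address(unit_address: str) -> str:
    """Return the adapter port: characters 1 and 2 of the unit address."""
    cleaned = unit_address.strip().upper()
    if len(cleaned) < 2:
        raise ValueError("unit address must have at least two characters")
    return cleaned[:2]


def port_from_io_bus_field(io_bus_decimal: int) -> str:
    """Convert the decimal I/O bus field value (SRC xxxxFFFE case) to hexadecimal.

    Zero-padding to two digits is an assumption of this sketch.
    """
    if io_bus_decimal < 0:
        raise ValueError("the I/O bus field value cannot be negative")
    return format(io_bus_decimal, "02X")


# The example from step 10: unit address 123456FF gives port 12.
print(port_from_unit_address("123456FF"))  # 12
# A decimal I/O bus field value of 18 is 12 in hexadecimal.
print(port_from_io_bus_field(18))          # 12
```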
SIP3152
Use this procedure to resolve possible failed connection problems.
If you need additional information for failing part numbers, location codes, or removal and replacement
procedures, see Part locations and location codes. Select your machine type and model number to find
additional location codes, part numbers, or replacement procedures for your system.
This procedure is used to resolve the following problems:
v Multipath redundancy level is worse (SRC xxxx4060)
v Device bus fabric error (SRC xxxx4100)
v Temporary device bus fabric error (SRC xxxx4101)
The possible causes are:
v A failed connection caused by a failing component in the serial-attached SCSI (SAS) fabric between,
and including, the adapter and device enclosure.
v A failed connection caused by a failing component within the device enclosure, including the device
itself.
Note: For SRC xxxx4060, the failed connection was previously working, and might have already
recovered.
Considerations:
v Power off the system, partition, or card slot before connecting and disconnecting cables or devices, as
appropriate, to prevent hardware damage.
v Some systems have SAS, PCI-X, and PCIe bus interface logic integrated onto the system boards and
use a pluggable RAID enablement card (a non-PCI form factor card) for these SAS, PCI-X, and PCIe
buses. For these configurations, replacement of the RAID enablement card is unlikely to solve a
SAS-related problem because the SAS interface logic is on the system board.
v Some systems have the disk enclosure or removable media enclosure integrated in the system with no
cables. For these configurations the SAS connections are integrated onto the system boards and a failed
connection can be the result of a failed system board or integrated device enclosure.
v Some systems have SAS RAID adapters integrated onto the system backplane and use a cache RAID
and dual IOA enablement card to enable the storage-adapter write cache and dual-storage I/O adapter
(IOA) mode. For these configurations, replacement of the cache RAID and dual IOA enablement card is
unlikely to solve a SAS-related problem because the SAS interface logic is on the system backplane.
v Some configurations involve a SAS adapter connecting to internal SAS disk enclosures within a system
that uses a cable card. When the procedure refers to a device enclosure, it could be referring to the
internal SAS disk slots or media slots. Also, when the procedure refers to a cable, it could include a
cable card.
v When using SAS adapters in a dual storage IOA configuration, ensure that the actions taken in this
procedure are against the primary adapter (not the secondary adapter).
Attention:
v When SAS fabric problems exist, do not replace RAID adapters without assistance from your service
provider. Because the adapter might contain nonvolatile, write-cache data and configuration data for
the attached disk arrays, additional problems can be created by replacing an adapter.
v Follow appropriate service procedures when replacing the cache RAID and dual IOA enablement card.
Incorrect removal can result in data loss or a nondual storage IOA mode of operation.
v Do not remove functioning disk units in a disk array without assistance from your service provider. A
disk array might become unprotected or might fail if functioning disk units are removed. The removal
of functioning disk units might also result in additional problems in the disk array.
1. Determine the resource name of the adapter that reported the problem by performing the following
steps:
a. Access SST or DST.
b. Access the product activity log and record the resource name that this error is logged against. If
the resource name is an adapter resource name, use it and continue with the next step. If the
resource name is a disk-unit resource name, use the Hardware Service Manager to determine the
resource name of the adapter that is controlling this disk unit. The logical bus number of the
disk-unit logical resource might be useful in determining the adapter resource name.
2. Is the IBM i operating system at Version 6.1.1 or later?
No: Continue with the next step.
Yes: Go to step 4 on page 206.
3. Determine whether a problem still exists for the adapter that logged this error by examining the SAS
connections as follows:
a. On the System Service Tools (SST) display, select Start a Service Tool and press Enter.
b. Select Display/Alter/Dump > Display/Alter storage > Licensed Internal Code (LIC) data >
Advanced Analysis.
c. Type FABQUERY on the entry line and then select it with option 1.
d. On the Specify Advanced Analysis Options display, type -SUB 01 -IOA DCxx -DSP 0 in the
Options field, where DCxx is the adapter resource name. Press Enter.
Note: More information is available by returning to the Specify Advanced Analysis Options
display and typing -SUB 01 -IOA DCxx -DSP 2 in the Options field, where DCxx is the adapter
resource name. Press Enter.
Do all expected devices appear in the list and are all paths marked as Operational?
No: Go to step 5.
Yes: The error condition has been recovered. If the error condition has been recovered more
than once, go to step 7. Otherwise, the error condition is not a persistent problem and no
further service action is necessary. This ends the procedure.
4. Determine whether a problem still exists for the DCxx adapter resource that logged this error by
examining the SAS connections. See Viewing SAS fabric path information. Do all expected devices
appear in the list and are all paths marked as Operational?
No: Continue with the next step.
Yes: The error condition has been recovered. If the error condition has been recovered more than
once, go to step 7. Otherwise, the error condition is not a persistent problem and no further service
action is necessary. This ends the procedure.
5. Perform the following steps to cause the adapter to rediscover the devices and connections:
a. Use the logical resources IO debug option in Hardware Service Manager to perform another IPL of
the virtual I/O processor that is associated with this adapter.
b. Vary on any other resources that are attached to the virtual I/O processor.
6. To determine whether the problem still exists for the adapter that logged this error, examine the SAS
connections by performing the actions in step 3 on page 205 or step 4 again. Do all expected devices
appear in the list and are all paths marked as Operational?
No: Continue with the next step.
Yes: The error condition no longer exists. This ends the procedure.
7. Perform only one of the following corrective actions (listed in the order of preference). If one of the
corrective actions has previously been attempted, proceed to the next one in the list.
v Reseat cables, if present, on the adapter, device enclosure, and any additional device enclosures
connected to the device enclosure. Perform the following steps:
a. Using Hardware Service Manager packaging resources, perform adapter concurrent maintenance
to power off the adapter slot, or power off the system or partition.
b. Reseat the cables.
c. Using Hardware Service Manager packaging resources, perform adapter concurrent maintenance
to power on the adapter slot, or power on the system or partition.
v Replace the cable, if present, from the adapter to device enclosure, and any cables between the
device enclosure and additional device enclosures connected to the device enclosure. Perform the
following steps:
a. Using Hardware Service Manager packaging resources, perform adapter concurrent maintenance
to power off the adapter slot, or power off the system or partition.
b. Replace the cables.
c. Using Hardware Service Manager packaging resources, perform adapter concurrent maintenance
to power on the adapter slot, or power on the system or partition.
v Replace the device.
Note: If there are multiple devices with a path that is not Operational, the problem is not likely to
be with a device.
v Replace the internal device enclosure or see the service documentation for an external expansion
unit. Perform the following steps:
a. Power off the system or partition. If the enclosure is external, adapter concurrent maintenance
can be used instead to power off the adapter slot.
b. Replace the device-enclosure failing items. See SASEXP and DEVBPLN for possible failing items
to replace.
c. Power on the system or partition. If the enclosure is external, adapter concurrent maintenance
can be used instead to power on the adapter slot.
v Replace the adapter. For the procedure to replace the adapter, see PCI adapter.
v Contact your service provider.
8. To determine if the problem still exists for the adapter that logged this error, examine the SAS
connections by performing the actions in step 3 on page 205 or step 4 on page 206 again. Do all
expected devices appear in the list and are all paths marked as Operational?
No: Go to step 7 on page 206.
Yes: The error condition has been recovered. If the error condition has been recovered more than
once, go to step 7 on page 206. Otherwise, the error condition is not a persistent problem and no
further service action is necessary. This ends the procedure.
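The FABQUERY options from step 3 and the check that follows it (all expected devices listed, all paths Operational, recovered more than once) can be sketched in Python. This is an illustration under stated assumptions, not a parser for the Advanced Analysis output: PathEntry, check_fabric, and the device labels are hypothetical, and the path list is assumed to have been transcribed by hand from the display.

```python
from dataclasses import dataclass


def fabquery_options(resource_name: str, display_level: int = 0) -> str:
    """Build the Options field text from step 3: -SUB 01 -IOA DCxx -DSP n."""
    if display_level not in (0, 2):
        raise ValueError("the procedure uses only -DSP 0 and -DSP 2")
    return f"-SUB 01 -IOA {resource_name} -DSP {display_level}"


@dataclass
class PathEntry:
    device: str  # hypothetical label for a device shown in the list
    status: str  # path status text as displayed, for example "Operational"


def check_fabric(expected_devices, paths, times_recovered):
    """Apply the step 3 decision to a hand-transcribed path list."""
    seen = {p.device for p in paths}
    all_present = set(expected_devices) <= seen
    all_operational = all(p.status == "Operational" for p in paths)
    if not (all_present and all_operational):
        return "problem still exists: go to step 5"
    if times_recovered > 1:
        return "recovered more than once: go to step 7"
    return "recovered: no further service action"


print(fabquery_options("DCxx"))  # -SUB 01 -IOA DCxx -DSP 0
paths = [PathEntry("D1", "Operational"), PathEntry("D2", "Operational")]
print(check_fabric(["D1", "D2"], paths, times_recovered=1))
```

Replace DCxx with the adapter resource name recorded in step 1 before typing the options string.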
SIP3153
Use this procedure to resolve possible failed connection problems.
Use isolation procedure “SIP3152” on page 204.
SIP3250
Use this procedure to perform SAS fabric problem isolation for a PCIe2 or PCIe3 controller.
Considerations:
v Power off the system, partition, or card slot before connecting and disconnecting cables or devices, as
appropriate, to prevent hardware damage.
v Some systems have the disk enclosure or removable media enclosure integrated in the system with no
cables. For these configurations, the SAS connections are integrated onto the system boards and a
failed connection can be the result of a failed system board or integrated device enclosure.
Attention: When SAS fabric problems exist, obtain assistance from your hardware service provider:
v When SAS fabric problems exist, do not replace RAID adapters without assistance from your service
provider. Because the adapter might contain nonvolatile write cache data and configuration data for
the attached disk arrays, additional problems can be created by replacing an adapter.
v Do not remove functioning disk units in a disk array without assistance from your service provider. If
functioning disk units are removed, a disk array might become unprotected or might fail. The removal
of functioning disk units might also result in additional problems in the disk array.
1. Was the SRC xxxx3020?
No:
Go to step 3.
Yes:
Go to step 2.
2. The possible causes are:
v More devices are connected to the adapter than the adapter supports. Change the configuration to
reduce the number of devices below what is supported by the adapter.
v A SAS device has been incorrectly moved from one location to another. Either return the device to
its original location or move the device while the adapter is powered off.
v A SATA device has incorrectly replaced a SAS device. A SAS device must be used to replace a SAS
device.
This ends the procedure.
3. Determine the status of the disk units in the array by doing the following steps:
a. Access the product activity log and display the SRC that sent you here.
b. Press the F9 key for address information. The displayed information contains the adapter
address.
c. Return to the SST or DST main menu.
d. Select Work with disk units > Display disk configuration > Display disk configuration status.
e. On the Display disk configuration status screen, look for the devices attached to the adapter that
was identified.
Is there a device that has a status of RAID 5/Unknown, RAID 6/Unknown, RAID 5/Failed, or RAID
6/Failed?
No:
Go to step 5.
Yes:
Go to step 4.
4. Other errors must have occurred related to the disk array having degraded protection. For these
errors, replace the failed disk unit and restore the disk array to a fully protected state. This ends the
procedure.
5. Was the SRC xxxxFFFD?
No:
Go to step 8.
Yes:
Go to step 6.
6. Identify the location of the device that is associated with the problem. Perform the following steps:
a. Access SST or DST.
v If you can enter a command at the console, access system service tools (SST). See System
service tools.
v If you cannot enter a command at the console, perform an IPL to DST. See Performing an IPL
to dedicated service tools.
v If you cannot perform a type A or B IPL, perform a type D IPL from removable media.
b. Access the product activity log and display the SRC that sent you here. Find the location of the
device by completing one of the following actions:
v Record the resource name of the device. Then, use the Hardware Service Manager to determine
the location of the device by using the resource name.
v Go to Hexadecimal product activity log data to obtain the direct select address (DSA) and the
unit address information. Then, go to Addresses to determine the location of the device.
7. Replace the device at the location identified in step 6. For information about locations, see System
FRU locations. If replacing the device does not resolve the problem, contact your hardware service
provider. This ends the procedure.
8. Have other errors occurred at the same time as this error?
No:
Go to step 10.
Yes:
Go to step 9.
9. Fix the other errors that occurred at the same time as this error. This ends the procedure.
10. Was the SRC xxxxFFFE?
No:
Go to step 13 on page 209.
Yes:
Go to step 11.
11. Check for the latest PTFs for the device, device enclosure, and adapter and apply them. Did you find
and apply a PTF?
No:
Go to step 13 on page 209.
Yes:
Go to step 12.
12. This ends the procedure.
13. Identify the adapter SAS port associated with the problem by examining the product activity log.
Perform the following steps:
a. Access SST or DST.
v If you can enter a command at the console, access system service tools (SST). See System
service tools.
v If you cannot enter a command at the console, perform an IPL to DST. See Performing an IPL
to dedicated service tools.
v If you cannot perform a type A or B IPL, perform a type D IPL from removable media.
b. Access the product activity log and display the SRC that sent you here. Record the adapter
address and the adapter port by doing one of the following:
v If the SRC is xxxxFFFE, press the F9 key for address information. The adapter address is the
bus, board, card information. The port is shown in the I/O bus field. Convert the port value
from decimal to hexadecimal.
v Press the F9 key for address information. The adapter address is the bus, board, card
information. Then, press F12 to cancel and return to the previous screen. Then press the F4 key
to view the additional information, if available. The adapter port is characters 1 and 2 of the
unit address. For example, if the unit address is 123456FF, the port would be 12.
v Go to Hexadecimal product activity log data to obtain the address information. The adapter
address is the bus, board, card information. The adapter port is characters 1 and 2 of the unit
address. For example, if the unit address is 123456FF, the port would be 12.
14. Perform the following steps:
a. Select Start a Service Tool > Hardware Service Manager > Logical Hardware Resources >
System Bus Resources.
b. Enter the adapter bus address and use the Associated packaging resource(s) option to display
the type, model, and unit ID.
c. Record the type, model, and unit ID of the enclosure in which the adapter is located.
d. Use the type, model, unit ID and adapter address to find the location of the adapter (see
Addresses to find the location and then go to System FRU locations).
e. The logical port number was identified in step 13. Logical port numbers are indicated on the
physical connector labels located on the tailstock of the adapter. To locate the device or device
enclosure that is experiencing the problem, use the logical port number to determine the physical
connector to which the device or device enclosure is attached.
15. Because the problem persists, some corrective action is required to resolve the problem. Proceed by
doing the following:
Perform only one of the following corrective actions (listed in the order of preference). If one of the
corrective actions has previously been attempted, proceed to the next one in the list.
v Reseat cables, if present, on adapter and device enclosure. Perform the following steps:
a. Use adapter concurrent maintenance to power off the adapter slot, or power off the system or
partition.
b. Reseat the cables.
c. Use adapter concurrent maintenance to power on the adapter slot, or power on the system or
partition.
v Replace the cable, if present, from the adapter to the device enclosure. Perform the following
steps:
a. Use adapter concurrent maintenance to power off the adapter slot, or power off the system or
partition.
b. Replace the cables.
c. Use adapter concurrent maintenance to power on the adapter slot, or power on the system or
partition.
v Replace the device.
Note: If there are multiple devices with a path that is not Operational, the problem is not likely to
be with a device.
v Replace the internal device enclosure or see the service documentation for an external expansion
unit. Perform the following steps:
a. Power off the system or partition. If the enclosure is external, use adapter concurrent
maintenance instead to power off the adapter slot.
b. Replace the device enclosure.
c. Power on the system or partition. If the enclosure is external, use adapter concurrent
maintenance instead to power on the adapter slot.
v Replace the adapter. The procedure to replace the adapter can be found in PCI adapter.
v Contact your service provider.
16. Does the problem still occur after performing the corrective action?
No: This ends the procedure.
Yes: Go to step 15 on page 209.
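The branching in steps 1 through 12 of SIP3250 can be summarized as a small Python function. This is only a reading aid under stated assumptions: the function name and its parameters are hypothetical, the answers to each question are assumed to have been gathered by hand from the product activity log and the disk configuration status display, and the function returns the number of the step to perform next.

```python
# Disk unit statuses that route the procedure to step 4.
DEGRADED_STATUSES = {
    "RAID 5/Unknown",
    "RAID 6/Unknown",
    "RAID 5/Failed",
    "RAID 6/Failed",
}


def sip3250_next_step(src, device_statuses, other_errors, ptf_applied):
    """Return the SIP3250 step that the answers lead to."""
    suffix = src[-4:].upper()            # last four characters of the SRC
    if suffix == "3020":
        return 2                         # step 1: review the listed causes
    if any(s in DEGRADED_STATUSES for s in device_statuses):
        return 4                         # step 3: act on the array errors
    if suffix == "FFFD":
        return 6                         # step 5: locate and replace the device
    if other_errors:
        return 9                         # step 8: fix the other errors first
    if suffix == "FFFE" and ptf_applied:
        return 12                        # step 11: a PTF was found and applied
    return 13                            # identify the adapter SAS port


print(sip3250_next_step("xxxx3020", [], False, False))                  # 2
print(sip3250_next_step("xxxxFFFE", ["RAID 5/Failed"], False, False))   # 4
print(sip3250_next_step("xxxxFFFE", [], False, False))                  # 13
```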
SIP3254
Use this procedure to resolve the following problem: Device bus fabric performance degradation (SRC
xxxx4102) for a PCIe2 or PCIe3 controller.
Note: This problem is not common for a PCIe2 or PCIe3 controller.
Perform “SIP3290.”
This ends the procedure.
SIP3290
The problem that occurred is uncommon or complex to resolve. Use this procedure to gather error
information and contact your next level of support.
1. If the system has logical partitions, perform this procedure from the logical partition that reported the
problem. To determine if the system has logical partitions, go to Determining if the system has logical
partitions, before continuing with this procedure.
2. Access SST/DST by doing one of the following:
v If you can enter a command at the console, access system service tools (SST). See System service
tools.
v If you cannot enter a command at the console, perform an IPL to DST. See Performing an IPL to
dedicated service tools.
3. Access the product activity log and collect any hardware error information, including hexadecimal
data, logged about the same time for the adapter.
4. Determine the status of the disk units in the array by performing the following steps:
a. Access the product activity log and display the SRC that sent you here.
b. Press the F9 key for address information. This displays the adapter address.
c. Return to the SST or DST main menu.
d. Select Work with disk units > Display disk configuration > Display disk configuration status.
e. On the Display disk configuration status screen, look for the devices attached to the adapter that
was identified and record their status.
5. Collect log, debug, and dump data if available and contact your next level of support for assistance.
This ends the procedure.
SIP3295
Use this procedure to resolve the following problem: Controller exceeded maximum operating
temperature (SRC xxxx4080) for a PCIe2 or PCIe3 controller.
1. The storage controller chip has exceeded the maximum normal operating temperature. The adapter
continues to run unless the temperature rises even more to the point where errors or hardware
failures occur. The adapter is not likely to be the cause of the over temperature condition. Perform the
following steps to view the temperature information:
a. Access SST/DST by doing one of the following:
v If you can enter a command at the console, access system service tools (SST). See System service
tools.
v If you cannot enter a command at the console, perform an IPL to DST. See Performing an IPL to
dedicated service tools.
b. Access the product activity log and display the SRC that sent you here. Press the F4 key to view
the temperature information in Additional Information. The Detail Data section contains the
Current Temperature (in degrees Celsius in decimal notation) and Maximum Operating
Temperature (in degrees Celsius in decimal notation) at the time the error was logged.
c. Continue with the next step to determine the possible cause and necessary action to prevent
exceeding the maximum operating temperature.
2. Determine which of the following items is the cause of exceeding the maximum operating
temperature and take the appropriate actions listed. If this does not correct the error, contact your
next level of support for assistance.
The possible causes are:
v The adapter is installed in an unsupported system. For information about which systems support
the adapter, see PCI adapter information by feature type.
v The adapter is installed in an unsupported slot location within the system unit or I/O enclosure.
For information about supported slot locations, see PCI adapter placement information for the
machine type model (MTM) where the adapter is located.
v The adapter is installed in a supported system, but the system is not operating in the required
airflow mode (for example, the adapter is in an 8202-E4B or 8205-E6B system that is running in
Acoustic Mode). For information about system-specific requirements for this adapter, see PCI adapter
information by feature type.
v Cooling is impaired, for example by a fan failure or an airflow obstruction. Ensure that there are
no issues affecting proper cooling.
Note: The adapter that is logging this error continues to log this error while the adapter remains
above the maximum operating temperature or each time it exceeds the maximum operating
temperature.
This ends the procedure.
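The two Detail Data values described in step 1 can be compared with a few lines of Python. This is a minimal sketch: the function name is hypothetical, the values are assumed to have been read from the Additional Information display and typed in as decimal degrees Celsius, and the sample numbers are illustrative only.

```python
def temperature_report(current_c: int, max_operating_c: int) -> str:
    """Describe how far the logged temperature was from the maximum."""
    margin = max_operating_c - current_c
    if margin < 0:
        return f"{-margin} C above the maximum operating temperature"
    return f"{margin} C below the maximum operating temperature"


# Illustrative values only; use the values from the error log entry.
print(temperature_report(current_c=97, max_operating_c=95))
```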
SIP4040
Use this procedure to resolve the following problem: Multiple adapters connected in an invalid
configuration (SRC xxxx9073)
A configuration error occurred. See SAS RAID configurations to determine the allowable configurations.
Then correct the configuration. If correcting the configuration does not resolve the error, contact your
hardware service provider. This ends the procedure.
SIP4041
Use this procedure to resolve the following problem: Multiple adapters not capable of controlling the
same set of devices (SRC xxxx9074)
1. This error relates to adapters connected in a dual storage IOA configuration. To obtain the reason or
description for this failure, you must find the formatted error information in the Product Activity Log.
The log also contains information about the connected adapter.
Perform the following steps:
a. Access SST/DST.
v If you can enter a command at the console, access system service tools (SST). See System service
tools.
v If you cannot enter a command at the console, perform an IPL to DST. See Performing an IPL to
dedicated service tools.
v If you cannot perform a type A or B IPL, perform a type D IPL from removable media.
b. Access the Product Activity Log and record address information.
If a D IPL was not performed to get to SST/DST:
The log information is formatted. Access the Product Activity Log and display the SRC
that sent you here. Press the F9 key for "Address Information". This is the adapter address.
Then, press F12 to cancel and return to the previous screen. Then press the F4 key to view
the "Additional Information" to record the formatted log information. The Problem
description field indicates the type of problem. The type, serial number, and Worldwide ID
of the connected adapter are also available.
If a D IPL was performed to get to DST:
The log information is not formatted. Access the Product Activity Log and display the SRC
that sent you here. The direct select address (DSA) of the adapter is in the format
BBBB-Cc-bb:
BBBB  hexadecimal offsets 4C and 4D
Cc    hexadecimal offset 51
bb    hexadecimal offset 4F
To interpret the hexadecimal information to get device addresses, see More
information from hexadecimal reports. The Problem description field indicates the type of
problem. The type, serial number, and Worldwide ID of the connected adapter are also
available.
2. Find the problem description and information for the connected adapter (remote adapter) shown in
the error log, and perform the action listed for the reason in the following table.
Table 40. RAID array reason for failure

Problem description: Secondary is unable to find devices found by the primary.
Full description: Secondary adapter cannot discover all the devices that the primary has.
Action: Verify the connections to the devices from the adapter that logged the error. See SAS fabric
identification to verify connections.
Adapter on which to perform the action: Adapter that logged the error.

Problem description: Secondary found devices not found by the primary.
Full description: Secondary adapter has discovered more devices than the primary. After this error is
logged, an automatic failover will occur.
Action: Verify the connections to the devices from the remote adapter as indicated in the error log. See
SAS fabric identification to verify connections.
Adapter on which to perform the action: Remote adapter indicated in the error log.

Problem description: Primary lost contact with disk units accessible by secondary.
Full description: Link failure from primary adapter to devices. An automatic failover will occur.
Action: Verify the connections to the devices from the adapter that logged the error. See SAS fabric
identification to verify connections.
Adapter on which to perform the action: Adapter that logged the error.

Problem description: Other
Full description: Not currently defined.
Action: Contact your hardware service provider.
This ends the procedure.
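The unformatted-log case in step 1 builds the direct select address (DSA) from three hexadecimal offsets. The following is a minimal Python sketch of that assembly, not a service tool. It assumes the log entry is available as raw bytes and that the offsets index directly into that buffer; the function name format_dsa and the sample bytes are hypothetical. See More information from hexadecimal reports for the authoritative layout.

```python
def format_dsa(entry: bytes) -> str:
    """Build a BBBB-Cc-bb string from an unformatted log entry.

    Assumption: the hexadecimal offsets 4C, 4D, 4F, and 51 index directly
    into `entry`, counted from the first byte of the buffer.
    """
    if len(entry) <= 0x51:
        raise ValueError("entry is too short to contain offset 0x51")
    bbbb = entry[0x4C:0x4E].hex().upper()  # offsets 4C and 4D
    cc = f"{entry[0x51]:02X}"              # offset 51
    bb = f"{entry[0x4F]:02X}"              # offset 4F
    return f"{bbbb}-{cc}-{bb}"


# Made-up bytes, used only to show which positions are read.
sample = bytearray(0x60)
sample[0x4C:0x4E] = bytes([0x02, 0x0A])
sample[0x51] = 0x10
sample[0x4F] = 0x03
print(format_dsa(bytes(sample)))
```

Note that the Cc field comes from offset 51 even though it appears before bb (offset 4F) in the formatted address.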
SIP4044
Use this procedure to resolve problems with multipath connections.
This procedure is used to resolve the following configuration errors:
v Configuration error, incorrect multipath connection (SRC xxxx4030)
v Configuration error, incomplete multipath connection between adapter and enclosure detected (SRC
xxxx4040)
The possible causes are:
v Incorrect cabling to device enclosure.
Note: Pay special attention to the requirement that a YI-cable must be routed along the right side of
the rack frame (as viewed from the rear) when connecting to a disk expansion unit. Review the device
enclosure cabling and correct the cabling as required. To see example device configurations with serial
attached SCSI (SAS) cabling, see Serial-attached SCSI cable planning, in the Site and hardware
planning information.
v A failed connection caused by a failing component in the SAS fabric between, and including, the
adapter and device enclosure.
Considerations:
v Power off the system, partition, or card slot before connecting and disconnecting cables or devices, as
appropriate, to prevent hardware damage.
v Some systems have SAS, PCI-X, and PCIe bus interface logic integrated onto the system boards and
use a pluggable RAID enablement card (a non-PCI form factor card) for these SAS, PCI-X, and PCIe
buses. For these configurations, replacement of the RAID enablement card is unlikely to solve a
SAS-related problem because the SAS interface logic is on the system board.
v Some systems have the disk enclosure or removable media enclosure integrated in the system with no
cables. For these configurations, the SAS connections are integrated onto the system boards. A failed
connection can be the result of a failed system board or integrated device enclosure.
v When using SAS adapters in a dual storage IOA configuration, ensure that the actions taken in this
procedure are against the primary adapter (that is, not the secondary adapter).
Attention: When SAS fabric problems exist, do not replace RAID adapters without assistance from your
service provider. Because the adapter might contain nonvolatile write cache data and configuration data
for the attached disk arrays, additional problems can be created by replacing an adapter. Follow
appropriate service procedures when replacing the cache RAID and dual IOA enablement card. Incorrect
removal can result in data loss or a nondual storage IOA mode of operation.
1. Was the SRC xxxx4030?
No:
Go to step 4.
Yes:
Go to step 2.
2. Review the device enclosure cabling and correct the cabling as required for the device or device
enclosure attached to the identified adapter port. To see example device configurations with SAS
cabling, see Serial-attached SCSI cable planning, in the Site and hardware planning information.
3. Perform the following steps to cause the adapter to rediscover the devices and connections:
a. Use Hardware Service Manager to perform another IPL of the virtual I/O processor that is
associated with this adapter.
b. Vary on any other resources attached to the virtual I/O processor.
Did the error recur?
No:
This ends the procedure.
Yes:
Contact your hardware service provider. This ends the procedure.
4. The SRC is xxxx4040. Is the IBM i operating system at Version 6.1.1 or later?
No: Continue with the next step.
Yes: Go to step 6.
5. Determine whether a problem still exists for the adapter that logged this error by examining the SAS
connections as follows:
a. On the System Service Tools (SST) screen, select Start a Service Tool then press Enter.
b. Select Display/Alter/Dump.
c. Select Display/Alter storage.
d. Select Licensed Internal Code (LIC) data.
e. Select Advanced Analysis.
f. Type FABQUERY on the entry line and then select it with option 1.
g. On the Specify Advanced Analysis Options screen, type -SUB 01 -IOA DCxx -DSP 0 in the
Options field, where DCxx is the adapter resource name. Press Enter.
Note: More information is available by returning to the Specify Advanced Analysis Options
screen and typing -SUB 01 -IOA DCxx -DSP 2 in the Options field, where DCxx is the adapter
resource name. Press Enter.
Do all expected devices appear in the list and are all paths marked as Operational?
No: Go to step 7.
Yes: The error condition no longer exists. This ends the procedure.
6. Determine whether a problem still exists for the DCxx adapter resource that logged this error by
examining the SAS connections. See Viewing SAS fabric path information. Do all expected devices
appear in the list and are all paths marked as Operational?
No: Continue with the next step.
Yes: The error condition no longer exists. This ends the procedure.
7. Perform the following steps to cause the adapter to rediscover the devices and connections:
a. Use Hardware Service Manager to re-IPL the virtual I/O processor that is associated with this
adapter.
b. Vary on any other resources attached to the virtual I/O processor.
Note: At this point, ignore any problems found and continue with the next step.
8. Determine if the problem still exists for the adapter that logged this error by examining the SAS
connections by performing the actions in step 5 or step 6 again.
Do all expected devices appear in the list and are all paths marked as Operational?
No:
Go to step 9.
Yes:
This ends the procedure.
9. Go to SAS fabric identification. Then continue with the next step.
10. To determine if the problem still exists for the adapter that logged this error, examine the SAS
connections by performing the actions in step 5 or step 6 again. Do all
expected devices appear in the list and are all paths marked as Operational?
No:
Go to step 9.
Yes:
This ends the procedure.
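Steps 4 and 5 of SIP4044 depend on two small pieces of bookkeeping: choosing a branch from the IBM i version, and typing the FABQUERY options string with the correct adapter resource name. The sketch below illustrates both in Python. The function names are hypothetical, and it assumes the version is written as dotted numbers (for example "6.1.1"). It only formats text for a person to type on the Specify Advanced Analysis Options screen and does not talk to the system.

```python
def fabquery_options(resource_name: str, display_level: int = 0) -> str:
    """Return the Options-field text for FABQUERY.

    `resource_name` is the DCxx adapter resource name. The procedure uses
    -DSP 0 for the basic view and -DSP 2 for more information.
    """
    if display_level not in (0, 2):
        raise ValueError("the procedure only names -DSP 0 and -DSP 2")
    return f"-SUB 01 -IOA {resource_name} -DSP {display_level}"


def next_step_for_src_4040(ibm_i_version: str) -> int:
    """Step 4: at Version 6.1.1 or later go to step 6, otherwise step 5.

    Assumption: the version is supplied as dotted numbers such as "6.1.1".
    """
    parts = tuple(int(p) for p in ibm_i_version.split("."))
    parts += (0,) * (3 - len(parts))  # treat "7.1" as 7.1.0
    return 6 if parts >= (6, 1, 1) else 5


print(fabquery_options("DC01"))
print(next_step_for_src_4040("6.1"))
```

Comparing the version as a tuple of integers avoids the string-comparison trap where "10.1" would sort before "6.1.1".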
SIP4047
Use this procedure to resolve the following problem: Missing remote adapter (SRC xxxx9076)
An adapter attached in a dual storage IOA configuration was not discovered in the allotted time.
Determine which one of the following items is the cause of your specific error and take the appropriate
actions listed. If this action does not correct the error, contact your hardware service provider. The
possible causes are:
v An attached adapter for the configuration is not installed or is not powered on. Some adapters are
required to be part of a Dual Storage IOA Configuration. Ensure that both adapters are properly
installed and powered on.
v If this configuration is a dual storage IOA configuration, then both adapters might not be in the same
partition. Ensure that both adapters are assigned to the same partition.
v An attached adapter does not support the desired configuration. See SAS RAID configurations to
determine the allowable configurations. Then correct the configuration. If correcting the configuration
does not resolve the error, contact your hardware service provider.
v An attached adapter for the configuration failed. Take action on the other errors that have occurred at
the same time as this error.
v Adapter code levels are not up to date or are not at the same level of supported function. Ensure that
the code for both adapters is at the latest level.
Note: The adapter that is logging this error will run in a performance degraded mode, without caching,
until the problem is resolved.
This ends the procedure.
SIP4049
Use this procedure to resolve the following problem: Incomplete multipath connection between adapter
and remote adapter (SRC xxxx9075)
The possible cause is a failure in the embedded SAS fabric. Use the following table to determine the
service action to perform.
Table 41. Service actions for a failure in the embedded SAS fabric

Location of device or devices: 8202-E4B or 8205-E6B system unit
Service action: Replace the following FRUs, one at a time, in the order shown until the problem is resolved. See 8202-E4B or 8205-E6B locations to determine the location, part number, and replacement procedure to use for each FRU.
1. Device backplane (CCIN 2BD5 or 2BD6) at location Un-P2.
2. Replace the adapter that logged the error. The adapter might be any of the following possible failing items:
v System backplane (CCIN 2BFB or 2BFC) at location Un-P1.
v RAID adapter (CCIN 2BD9 or 2BE0) at location Un-P1-C19.

Location of device or devices: Disk expansion unit attached to an 8202-E4B or 8205-E6B system unit
Service action: Replace the cables to the disk expansion unit. If that does not resolve the problem, see the service information for the disk expansion unit for additional FRUs to replace.

Location of device or devices: 8202-E4C, 8202-E4D, 8205-E6C, or 8205-E6D system unit
Service action: Replace the following FRUs, one at a time, in the order shown until the problem is resolved. See 8202-E4C, 8202-E4D, 8205-E6C, or 8205-E6D locations to determine the location, part number, and replacement procedure to use for each FRU.
1. Device backplane (CCIN 2BD5 or 2BD6) at location Un-P2.
2. Replace the adapter that logged the error. The adapter might be any of the following possible failing items:
v System backplane (CCIN 2B2C, 2B2D, 2B4A, or 2B4B) at location Un-P1.
v RAID adapter (CCIN 2B4C or 2B4F) at location Un-P1-C19.

Location of device or devices: Disk expansion unit attached to an 8202-E4C, 8202-E4D, 8205-E6C, or 8205-E6D system unit
Service action: Replace the cables to the disk expansion unit. If that does not resolve the problem, see the service information for the disk expansion unit for additional FRUs to replace.

Location of device or devices: 8231-E1C, 8231-E1D, 8231-E2C, 8231-E2D, or 8268-E1D system unit
Service action: Replace the following FRUs, one at a time, in the order shown until the problem is resolved. See 8231-E1C, 8231-E1D, 8231-E2C, 8231-E2D, or 8268-E1D locations to determine the location, part number, and replacement procedure to use for each FRU.
1. Device backplane (CCIN 2BD7 or 2BE7) at location Un-P3.
2. Device backplane interposer (CCIN 2D1E or 2D1F) at location Un-P2.
3. Replace the adapter that logged the error. The adapter might be any of the following possible failing items:
v System backplane (CCIN 2B2C, 2B2D, 2B4A, or 2B4B) at location Un-P1.
v RAID adapter (CCIN 2B4C) at location Un-P1-C18.

Location of device or devices: Disk expansion unit attached to an 8231-E1C, 8231-E1D, 8231-E2C, 8231-E2D, or 8268-E1D system unit
Service action: Replace the cables to the disk expansion unit. If that does not resolve the problem, see the service information for the disk expansion unit for additional FRUs to replace.

Location of device or devices: 8231-E2B system unit
Service action: Replace the following FRUs, one at a time, in the order shown until the problem is resolved. See 8231-E2B locations to determine the location, part number, and replacement procedure to use for each FRU.
1. Device backplane (CCIN 2BD7 or 2BE7) at location Un-P3.
2. Device backplane interposer (CCIN 2D1E or 2D1F) at location Un-P2.
3. Replace the adapter that logged the error. The adapter might be any of the following possible failing items:
v System backplane (CCIN 2BFB or 2BFC) at location Un-P1.
v RAID adapter (CCIN 2BD9 or 2BE0) at location Un-P1-C18.

Location of device or devices: Disk expansion unit attached to an 8231-E2B system unit
Service action: Replace the cables to the disk expansion unit. If that does not resolve the problem, see the service information for the disk expansion unit for additional FRUs to replace.
This ends the procedure.
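The "one at a time, in the order shown" rule in Table 41 can be tracked with a small lookup. The Python sketch below is illustrative only. It covers the system unit rows, omits the CCINs (take those from the table), and uses hypothetical names (FRU_ORDER, next_fru). For a disk expansion unit, the table instead directs you to replace the cables first.

```python
# Ordered FRU locations per system unit, transcribed from Table 41.
# The last entry of each list is the adapter that logged the error, which
# is either the system backplane or the RAID adapter.
_TWO_STEP = [
    "Device backplane at Un-P2",
    "Adapter that logged the error (Un-P1 or Un-P1-C19)",
]
_THREE_STEP = [
    "Device backplane at Un-P3",
    "Device backplane interposer at Un-P2",
    "Adapter that logged the error (Un-P1 or Un-P1-C18)",
]

FRU_ORDER = {}
for model in ("8202-E4B", "8205-E6B", "8202-E4C", "8202-E4D",
              "8205-E6C", "8205-E6D"):
    FRU_ORDER[model] = _TWO_STEP
for model in ("8231-E1C", "8231-E1D", "8231-E2C", "8231-E2D",
              "8268-E1D", "8231-E2B"):
    FRU_ORDER[model] = _THREE_STEP


def next_fru(model, already_replaced):
    """Return the next FRU to replace, or None when the list is exhausted.

    `already_replaced` is how many FRUs from the list were replaced so far.
    """
    order = FRU_ORDER[model]
    return order[already_replaced] if already_replaced < len(order) else None


print(next_fru("8231-E2B", 1))
```

A return value of None means every listed FRU has been tried.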
SIP4050
Use this procedure to perform serial attached SCSI (SAS) fabric problem isolation.
Considerations:
v Power off the system, partition, or card slot before connecting and disconnecting cables or devices, as
appropriate, to prevent hardware damage.
v Some systems have SAS, PCI-X, and PCIe bus interface logic integrated onto the system boards and
use a pluggable RAID enablement card (a non-PCI form factor card) for these SAS, PCI-X, and PCIe
buses. For these configurations, replacement of the RAID enablement card is unlikely to solve a
SAS-related problem because the SAS interface logic is on the system board.
v Some systems have the disk enclosure or removable media enclosure integrated in the system with no
cables. For these configurations, the SAS connections are integrated onto the system boards. A failed
connection can be the result of a failed system board or integrated device enclosure.
Attention: When SAS fabric problems exist, obtain assistance from your hardware service provider:
v When SAS fabric problems exist, do not replace RAID adapters without assistance from your service
provider. Because the adapter might contain nonvolatile write cache data and configuration data for
the attached disk arrays, additional problems can be created by replacing an adapter.
v Follow appropriate service procedures when replacing the Cache RAID and dual IOA enablement card.
Incorrect removal can result in data loss or a nondual storage IOA mode of operation.
v Do not remove functioning disk units in a disk array without assistance from your service provider. A
disk array might become unprotected or might fail if functioning disk units are removed. The removal
of functioning disk units might also result in additional problems in the disk array.
1. Was the SRC xxxx3020?
No:
Go to step 3.
Yes:
Go to step 2.
2. The possible causes are:
v More devices are connected to the adapter than the adapter supports. Change the configuration to
the allowable number of devices.
v A SAS device has been incorrectly moved from one location to another. Either return the device to
its original location or move the device while the adapter is powered off.
v A SAS device has been incorrectly replaced by a SATA device. A SAS device must be used to
replace a SAS device.
This ends the procedure.
3. Determine the status of the disk units in the array by doing the following steps:
a. Access the product activity log and display the SRC that sent you here.
b. Press the F9 key for address information. This is the adapter address.
c. Return to the SST or DST main menu.
d. Select Work with disk units > Display disk configuration > Display disk configuration status.
e. On the Display disk configuration status screen, look for the devices attached to the adapter that
was identified.
Is there a device that has a status of RAID 5/Unknown, RAID 6/Unknown, RAID 5/Failed, or RAID
6/Failed?
No:
Go to step 5.
Yes:
Go to step 4.
4. Other errors might have occurred related to the disk array having degraded protection. Take action
on these errors to replace the failed disk unit and restore the disk array to a fully protected state.
This ends the procedure.
5. Have other errors occurred at the same time as this error?
No:
Go to step 7.
Yes:
Go to step 6.
6. Take action on the other errors that have occurred at the same time as this error. This ends the
procedure.
7. Was the SRC xxxxFFFE?
No:
Go to step 10.
Yes:
Go to step 8.
8. Check for the latest PTFs for the device, device enclosure, and adapter and apply them. Did you find
and apply a PTF?
No:
Go to step 10.
Yes:
Go to step 9.
9. This ends the procedure.
10. Is the problem in a disk expansion unit?
No:
Go to SAS fabric identification.
Yes:
Go to step 11.
11. Identify the adapter and adapter port associated with the problem by examining the product activity
log. Perform the following steps:
a. Access SST or DST.
v If you can enter a command at the console, access system service tools (SST). See System
service tools.
v If you cannot enter a command at the console, perform an IPL to DST. See Performing an IPL
to dedicated service tools.
v If you cannot perform a type A or B IPL, perform a type D IPL from removable media.
b. Access the product activity log and display the SRC that sent you here. Record the adapter
address and the adapter port by doing one of the following:
v If the SRC is xxxxFFFE, press the F9 key for address information. The adapter address is the
bus, board, card information. The port is shown in the I/O bus field. Convert the port value
from decimal to hexadecimal.
v Press the F9 key for address information. The adapter address is the bus, board, card
information. Then, press F12 to cancel and return to the previous screen. Then press the F4 key
to view the additional information, if available. The adapter port is characters 1 and 2 of the
unit address. For example, if the unit address is 123456FF, the port would be 12.
v Go to Hexadecimal product activity log data to obtain the address information. The adapter
address is the bus, board, card information. The adapter port is characters 1 and 2 of the unit
address. For example, if the unit address is 123456FF, the port would be 12.
12. Perform the following steps:
a. Select Start a Service Tool > Hardware Service Manager > Logical Hardware Resources >
System Bus Resources.
b. Enter the adapter bus address and use the Associated packaging resource(s) option to display
the type, model, and unit ID.
c. Record the type, model, and unit ID of the enclosure in which the adapter is located.
d. Use the type, model, unit ID and adapter address to find the location of the adapter (see
Addresses to find the location and then go to System FRU locations).
e. The logical port number was identified in step 11. Logical port numbers are
indicated on the physical connector labels located on the tailstock of the adapter. To locate the
device or device enclosure that is experiencing the problem, use the logical port number to
determine the physical connector to which the device or device enclosure is attached.
13. Because the problem persists, some corrective action is needed to resolve the problem. Proceed by
doing the following:
Perform only one of the following corrective actions (listed in the order of preference). If one of the
corrective actions has previously been attempted, proceed to the next one in the list.
v Reseat cables, if present, on adapter and device enclosure. Perform the following steps:
a. Use adapter concurrent maintenance to power off the adapter slot, or power off the system or
partition.
b. Reseat the cables.
c. Use adapter concurrent maintenance to power on the adapter slot, or power on the system or
partition.
v Replace the cable, if present, from the adapter to the device enclosure. Perform the following
steps:
a. Use adapter concurrent maintenance to power off the adapter slot, or power off the system or
partition.
b. Replace the cables.
c. Use adapter concurrent maintenance to power on the adapter slot, or power on the system or
partition.
v Replace the device.
Note: If there are multiple devices with a path that is not Operational, the problem is not likely to
be with a device.
v Replace the internal device enclosure or see the service documentation for an external expansion
unit. Perform the following steps:
a. Power off the system or partition. If the enclosure is external, use adapter concurrent
maintenance instead to power off the adapter slot.
b. Replace the device enclosure.
c. Power on the system or partition. If the enclosure is external, use adapter concurrent
maintenance instead to power on the adapter slot.
v Replace the adapter. The procedure to replace the adapter can be found in PCI adapter.
v Contact your service provider.
14. Does the problem still occur after performing the corrective action?
No: This ends the procedure.
Yes: Go to step 13.
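Step 11 of SIP4050 derives the adapter port in two ways: from characters 1 and 2 of the unit address, or, for SRC xxxxFFFE, by converting the decimal value in the I/O bus field to hexadecimal. A short Python sketch of both conversions follows. The function names are hypothetical, and padding the hexadecimal result to two digits is an assumption made so that both forms look alike.

```python
def port_from_unit_address(unit_address: str) -> str:
    """Characters 1 and 2 of the unit address identify the adapter port."""
    if len(unit_address) < 2:
        raise ValueError("unit address must have at least two characters")
    return unit_address[:2]


def port_from_io_bus_field(io_bus_decimal: int) -> str:
    """For SRC xxxxFFFE, convert the decimal I/O bus value to hexadecimal.

    Assumption: the result is padded to two hexadecimal digits.
    """
    return f"{io_bus_decimal:02X}"


print(port_from_unit_address("123456FF"))  # the example from step 11
print(port_from_io_bus_field(18))
```

The first call reproduces the example in the procedure: unit address 123456FF gives port 12.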
SIP4052
Use this procedure to resolve possible failed connection problems.
This procedure is used to resolve the following problems:
v Multipath redundancy level is worse (SRC xxxx4060)
v Device bus fabric error (SRC xxxx4100)
v Temporary device bus fabric error (SRC xxxx4101)
The possible causes are:
v A failed connection caused by a failing component in the serial attached SCSI (SAS) fabric between,
and including, the adapter and device enclosure.
v A failed connection caused by a failing component within the device enclosure, including the device
itself.
Note: For SRC xxxx4060, the failed connection was previously working, and might have already
recovered.
Considerations:
v Power off the system, partition, or card slot before connecting and disconnecting cables or devices, as
appropriate, to prevent hardware damage.
v Some systems have SAS, PCI-X, and PCIe bus interface logic integrated onto the system boards and
use a pluggable RAID enablement card (a non-PCI form factor card) for these SAS, PCI-X, and PCIe
buses. For these configurations, replacement of the RAID enablement card is unlikely to solve a
SAS-related problem because the SAS interface logic is on the system board.
v Some systems have the disk enclosure or removable media enclosure integrated in the system with no
cables. For these configurations, the SAS connections are integrated onto the system boards. A failed
connection can be the result of a failed system board or integrated device enclosure.
v When using SAS adapters in a dual storage IOA configuration, ensure that the actions taken in this
procedure are against the primary adapter (not the secondary adapter).
Attention:
v When SAS fabric problems exist, do not replace RAID adapters without assistance from your service
provider. Because the adapter might contain nonvolatile write cache data and configuration data for
the attached disk arrays, additional problems can be created by replacing an adapter.
v Follow appropriate service procedures when replacing the Cache RAID and dual IOA enablement card.
Incorrect removal can result in data loss or a nondual storage IOA mode of operation.
v Do not remove functioning disk units in a disk array without assistance from your service provider. A
disk array might become unprotected or might fail if functioning disk units are removed. The removal
of functioning disk units might also result in additional problems in the disk array.
1. Determine the resource name of the adapter that reported the problem by performing the following:
a. Access SST or DST.
b. Access the product activity log and record the resource name that this error is logged against. If
the resource name is an adapter resource name, use it and continue with the next step. If the
resource name is a disk unit resource name, use the Hardware Service Manager to determine the
resource name of the adapter that is controlling this disk unit.
2. Is the IBM i operating system at Version 6.1.1 or later?
No: Continue with the next step.
Yes: Go to step 4.
3. Determine whether a problem still exists for the adapter that logged this error by examining the SAS
connections as follows:
a. On the System Service Tools (SST) screen, select Start a Service Tool and then press Enter.
b. Select Display/Alter/Dump.
c. Select Display/Alter storage.
d. Select Licensed Internal Code (LIC) data.
e. Select Advanced Analysis.
f. Type FABQUERY on the entry line and then select it with option 1.
g. On the Specify Advanced Analysis Options screen, type -SUB 01 -IOA DCxx -DSP 0 in the Options
field, where DCxx is the adapter resource name. Press Enter.
Note: More information is available by returning to the Specify Advanced Analysis Options screen
and typing -SUB 01 -IOA DCxx -DSP 2 in the Options field, where DCxx is the adapter resource
name. Press Enter.
Do all expected devices appear in the list and are all paths marked as Operational?
No: Continue with the next step.
Yes: The error condition has been recovered. If the error condition has been recovered more
than one time, go to step 7. Otherwise, the error condition is not a persistent problem and no
further service action is necessary. This ends the procedure.
4. Determine whether a problem still exists for the DCxx adapter resource that logged this error by
examining the SAS connections. See Viewing SAS fabric path information. Do all expected devices
appear in the list and are all paths marked as Operational?
No: Continue with the next step.
Yes: The error condition has been recovered. If the error condition has been recovered more than
one time, go to step 7. Otherwise, the error condition is not a persistent problem and no further
service action is necessary. This ends the procedure.
5. Perform the following steps to cause the adapter to rediscover the devices and connections:
a. Use Hardware Service Manager to perform another IPL of the virtual I/O processor that is
associated with this adapter.
b. Vary on any other resources attached to the virtual I/O processor.
6. To determine if the problem still exists for the adapter that logged this error, examine the SAS
connections by performing the actions in step 3 or step 4 again. Do all expected devices appear in the
list and are all paths marked as Operational?
No: Continue with the next step.
Yes: The error condition no longer exists. This ends the procedure.
7. Go to SAS fabric identification. Then continue with the next step.
8. To determine if the problem still exists for the adapter that logged this error, examine the SAS
connections by performing the actions in step 3 or step 4 again. Do all expected devices appear in the
list and are all paths marked as Operational?
No: Go to step 7.
Yes: The error condition has been recovered. If the error condition has been recovered more than
one time, go to step 7. Otherwise, the error condition is not a persistent problem and
no further service action is necessary. This ends the procedure.
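The question repeated in steps 3, 4, and 8 of SIP4052 ("Do all expected devices appear in the list and are all paths marked as Operational?") and the follow-up rule about repeated recoveries can be written as two small checks. This Python sketch is illustrative only. The dictionary of path states is a hypothetical structure filled in by hand from the fabric path display, and times_recovered is a count that the servicer keeps.

```python
def fabric_is_healthy(expected_devices, observed_paths):
    """True when every expected device is listed and all paths are Operational.

    `observed_paths` maps a device name to the list of its path states
    (hypothetical structure, transcribed from the display by hand).
    """
    missing = set(expected_devices) - set(observed_paths)
    if missing:
        return False
    return all(state == "Operational"
               for states in observed_paths.values()
               for state in states)


def next_action(healthy, times_recovered):
    """Outcome of the path check in steps 3, 4, and 8 of SIP4052."""
    if not healthy:
        return "continue isolation"
    if times_recovered > 1:
        return "go to step 7"
    return "no further service action"


observed = {"disk1": ["Operational", "Operational"], "disk2": ["Operational"]}
print(next_action(fabric_is_healthy(["disk1", "disk2"], observed), 1))
```

Step 6 is deliberately not covered: there, a Yes answer ends the procedure without the repeated-recovery check.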
SIP4053
Use this procedure to resolve possible failed connection problems.
Use isolation procedure “SIP4052”.
SIP4140
Use this procedure to resolve the following problem: Multiple adapters connected in an invalid
configuration (SRC xxxx9073)
One adapter, of a connected pair of adapters, is not operating under the same operating system type as
the other adapter. Connected adapters must both be controlled by the same type of operating system.
Correct the configuration. If correcting the configuration does not resolve the error, contact your
hardware service provider. This ends the procedure.
SIP4141
Use this procedure to resolve the following problem: Multiple adapters not capable of controlling the
same set of devices (SRC xxxx9074)
1. This error relates to adapters connected in a dual storage IOA configuration. To obtain the reason or
description for this failure, you must find the formatted error information in the Product Activity Log.
The log also contains information about the connected adapter.
Perform the following steps:
a. Access SST/DST.
v If you can enter a command at the console, access system service tools (SST). See System service
tools.
v If you cannot enter a command at the console, perform an IPL to DST. See Performing an IPL to
dedicated service tools.
v If you cannot perform a type A or B IPL, perform a type D IPL from removable media.
b. Access the Product Activity Log and record address information.
If a D IPL was not performed to get to SST/DST:
The log information is formatted. Access the Product Activity Log and display the SRC
that sent you here. Press the F9 key for "Address Information". This is the adapter address.
Then, press F12 to cancel and return to the previous screen. Then press the F4 key to view
the "Additional Information" to record the formatted log information. The Problem
description field indicates the type of problem. The type, serial number, and Worldwide ID
of the connected adapter are also available.
If a D IPL was performed to get to DST:
The log information is not formatted. Access the Product Activity Log and display the SRC
that sent you here. The direct select address (DSA) of the adapter is in the format
BBBB-Cc-bb:
BBBB hexadecimal offsets 4C and 4D
Cc hexadecimal offset 51
bb hexadecimal offset 4F
To interpret the hexadecimal information to get device addresses, see More
information from hexadecimal reports. The Problem description field indicates the type of
problem. The type, serial number, and Worldwide ID of the connected adapter are also
available.
2. Find the problem description and information for the connected adapter (remote adapter) shown in
the error log, and perform the action listed for the reason in the following table.
Table 42. RAID array reason for failure

Problem description: Secondary is unable to find devices found by the primary.
Full description: Secondary adapter cannot discover all the devices that the primary has.
Action: Verify the connections to the devices from the adapter that logged the error. See SAS fabric identification to verify connections.
Adapter on which to perform the action: Adapter that logged the error.

Problem description: Secondary found devices not found by the primary.
Full description: Secondary adapter has discovered more devices than the primary. After this error is logged, an automatic failover will occur.
Action: Verify the connections to the devices from the remote adapter as indicated in the error log. See SAS fabric identification to verify connections.
Adapter on which to perform the action: Remote adapter indicated in the error log.

Problem description: Primary lost contact with disk units accessible by secondary.
Full description: Link failure from primary adapter to devices. An automatic failover will occur.
Action: Verify the connections to the devices from the adapter that logged the error. See SAS fabric identification to verify connections.
Adapter on which to perform the action: Adapter that logged the error.

Problem description: Other
Full description: Not currently defined.
Action: Contact your hardware service provider.
This ends the procedure.
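Step 2 of SIP4141 is a lookup from the Problem description field to the adapter whose connections must be verified. The Python sketch below shows that lookup. It assumes the field text matches the table wording exactly, which might not hold for every log format, and the names ACTION_ADAPTER and adapter_to_check are hypothetical.

```python
# Problem description -> adapter on which to perform the action (Table 42).
ACTION_ADAPTER = {
    "Secondary is unable to find devices found by the primary.":
        "adapter that logged the error",
    "Secondary found devices not found by the primary.":
        "remote adapter indicated in the error log",
    "Primary lost contact with disk units accessible by secondary.":
        "adapter that logged the error",
}


def adapter_to_check(problem_description):
    """Return which adapter's connections to verify.

    Returns None for "Other" or any unlisted description; in that case the
    table says to contact your hardware service provider.
    """
    return ACTION_ADAPTER.get(problem_description.strip())


print(adapter_to_check("Secondary found devices not found by the primary."))
```

Only the second row points at the remote adapter; the other two defined rows act on the adapter that logged the error.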
SIP4144
Use this procedure to resolve problems with multipath connections.
This procedure is used to resolve the following configuration errors:
v Configuration error, incorrect multipath connection (SRC xxxx4030)
v Configuration error, incomplete multipath connection between adapter and enclosure detected (SRC
xxxx4040)
The possible causes are:
v Incorrect cabling to device enclosure.
Note: Pay special attention to the requirement that a YI-cable must be routed along the right side of
the rack frame (as viewed from the rear) when connecting to a disk expansion unit. Review the device
enclosure cabling and correct the cabling as required. To see example device configurations with serial
attached SCSI (SAS) cabling, see Serial-attached SCSI cable planning, in the Site and hardware
planning information.
v A failed connection caused by a failing component in the SAS fabric between, and including, the
adapter and device enclosure.
Considerations:
v Power off the system, partition, or card slot before connecting and disconnecting cables or devices, as
appropriate, to prevent hardware damage.
v Some systems have the disk enclosure or removable media enclosure integrated in the system with no
cables. For these configurations, the SAS connections are integrated onto the system boards. A failed
connection can be the result of a failed system board or integrated device enclosure.
v Some systems have SAS RAID adapters integrated onto the system backplane and use a cache RAID
and dual IOA enablement card to enable storage adapter write cache and dual storage I/O adapter
(IOA) mode. For these configurations, replacement of the cache RAID and dual IOA enablement card is
unlikely to solve a SAS-related problem because the SAS interface logic is on the system backplane.
v When using SAS adapters in a dual storage IOA configuration, ensure that the actions taken in this
procedure are against the primary adapter (that is, not the secondary adapter).
Attention: When SAS fabric problems exist, do not replace RAID adapters without assistance from your
service provider. Because the adapter might contain nonvolatile write cache data and configuration data
for the attached disk arrays, additional problems can be created by replacing an adapter. Follow
appropriate service procedures when replacing the cache RAID and dual IOA enablement card. Incorrect
removal can result in data loss or a nondual storage IOA mode of operation.
1. Was the SRC xxxx4030?
No:
Go to step 4.
Yes:
Go to step 2.
2. Review the device enclosure cabling and correct the cabling as required for the device or device
enclosure attached to the identified adapter port. To see example device configurations with SAS
cabling, see Serial-attached SCSI cable planning, in the Site and hardware planning information.
3. Perform the following steps to cause the adapter to rediscover the devices and connections:
a. Use Hardware Service Manager to perform another IPL of the virtual I/O processor that is
associated with this adapter.
b. Vary on any other resources attached to the virtual I/O processor.
Did the error recur?
No:
This ends the procedure.
Yes:
Contact your hardware service provider. This ends the procedure.
4. The SRC is xxxx4040. Is the IBM i operating system at Version 6.1.1 or later?
No: Continue with the next step.
Yes: Go to step 6 on page 225.
5. Determine whether a problem still exists for the adapter that logged this error by examining the SAS
connections as follows:
a. On the System Service Tools (SST) screen, select Start a Service Tool then press Enter.
b. Select Display/Alter/Dump.
c. Select Display/Alter storage.
d. Select Licensed Internal Code (LIC) data.
e. Select Advanced Analysis.
f. Type FABQUERY on the entry line and then select it with option 1.
g. On the Specify Advanced Analysis Options screen, type -SUB 01 -IOA DCxx -DSP 0 in the
Options field, where DCxx is the adapter resource name. Press Enter.
Note: More information is available by returning to the Specify Advanced Analysis Options
screen and typing -SUB 01 -IOA DCxx -DSP 2 in the Options field, where DCxx is the adapter
resource name. Press Enter.
Do all expected devices appear in the list and are all paths marked as Operational?
No: Go to step 7.
Yes: The error condition no longer exists. This ends the procedure.
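The check in step 5 can be sketched in Python as follows. This is only an illustration: the function names and the dictionary layout for the observed paths are hypothetical (the values would be transcribed by hand from the FABQUERY screen), and only the option string and the two display levels come from the procedure.

```python
def fabquery_options(resource_name: str, display_level: int = 0) -> str:
    """Build the Options-field string shown in step 5g for a DCxx resource name."""
    if display_level not in (0, 2):
        raise ValueError("the procedure only mentions -DSP 0 and -DSP 2")
    return f"-SUB 01 -IOA {resource_name} -DSP {display_level}"


def fabric_is_healthy(expected_devices, observed_paths) -> bool:
    """True when every expected device is listed and every path is Operational.

    observed_paths maps a device name to the list of path states read from
    the FABQUERY screen (a hypothetical layout, not a real API).
    """
    if set(expected_devices) - set(observed_paths):
        return False  # at least one expected device is missing from the list
    return all(
        state == "Operational"
        for states in observed_paths.values()
        for state in states
    )


if __name__ == "__main__":
    print(fabquery_options("DC01"))
    print(fabric_is_healthy(["DD001"], {"DD001": ["Operational", "Operational"]}))
```

A False result corresponds to the No branch of the question above; a True result corresponds to the Yes branch.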
6. Determine whether a problem still exists for the DCxx adapter resource that logged this error by
examining the SAS connections. See Viewing SAS fabric path information. Do all expected devices
appear in the list and are all paths marked as Operational?
No: Continue with the next step.
Yes: The error condition no longer exists. This ends the procedure.
7. Perform the following steps to cause the adapter to rediscover the devices and connections:
a. Use Hardware Service Manager to re-IPL the virtual I/O processor that is associated with this
adapter.
b. Vary on any other resources attached to the virtual I/O processor.
Note: At this point, ignore any problems found and continue with the next step.
8. To determine if the problem still exists for the adapter that logged this error, examine the SAS
connections by performing the actions in step 5 on page 224 or step 6 again.
Do all expected devices appear in the list and are all paths marked as Operational?
No:
Go to step 9.
Yes:
This ends the procedure.
9. Go to SAS fabric identification. Then continue with the next step.
10. To determine if the problem still exists for the adapter that logged this error, examine the SAS
connections by performing the actions in step 5 on page 224 or step 6 again. Do all expected devices
appear in the list and are all paths marked as Operational?
No:
Go to step 9.
Yes:
This ends the procedure.
SIP4147
Use this procedure to resolve the following problem: Missing remote adapter (SRC xxxx9076)
An adapter attached in a dual storage IOA configuration was not discovered in the allotted time.
Determine which one of the following items is the cause of your specific error and take the appropriate
actions listed. If this action does not correct the error, contact your hardware service provider. The
possible causes are:
v An attached adapter for the configuration is not installed or is not powered on. Some adapters are
required to be part of a dual storage IOA configuration. Ensure that both adapters are properly
installed and powered on.
v If this configuration is a dual storage IOA configuration, then both adapters might not be in the same
partition. Ensure that both adapters are assigned to the same partition.
v The 175 MB cache RAID and dual storage IOA enablement card is not seated properly. See SAS RAID
enablement and cache battery pack for the model 8202-E4B, 8202-E4C, 8202-E4D, 8205-E6B, 8205-E6C,
or 8205-E6D for information about removing and installing the card.
Attention: Appropriate service procedures must be followed when reseating the 175 MB cache RAID
and dual storage IOA enablement card because removal of this card can cause data loss if incorrectly
performed.
v An attached adapter for the configuration failed. Take action on the other errors that have occurred at
the same time as this error.
v Adapter code levels are not up to date or are not at the same level of supported function. Ensure that
the code for both adapters is at the latest level.
Note: The adapter that is logging this error will run in a performance degraded mode, without caching,
until the problem is resolved.
This ends the procedure.
SIP4149
Use this procedure to resolve the following problem: Incomplete multipath connection between adapter
and remote adapter (SRC xxxx9075)
The possible cause is a failure in the embedded SAS fabric. Use the following table to determine the
service action to perform.
Table 43. Service actions for a failure in the embedded SAS fabric
Location of device or devices
Service action
8248-L4T, 8408-E8D, 8412-EAD, 9109-RMD, 9117-MMB,
9117-MMC, 9117-MMD, 9179-MHB, 9179-MHC, or
9179-MHD
Replace the following FRUs, one at a time, in the order
shown until the problem is resolved. See Part locations
and location codes to determine the location, part
number, and replacement procedure to use for each FRU.
1. Small form factor SAS disk drive backplane with
embedded SAS adapters at location Un-P2-C9.
2. I/O backplane at location Un-P2.
Disk expansion unit attached to an 8248-L4T, 8408-E8D,
8412-EAD, 9109-RMD, 9117-MMB, 9117-MMC,
9117-MMD, 9179-MHB, 9179-MHC, or 9179-MHD
Replace the cables to the disk expansion unit. If that does
not resolve the problem, see the service information for
the disk expansion unit for additional FRUs to replace.
This ends the procedure.
SIP4150
Use this procedure to perform serial attached SCSI (SAS) fabric problem isolation.
Considerations:
v Power off the system, partition, or card slot before connecting and disconnecting cables or devices, as
appropriate, to prevent hardware damage.
v Some systems have the disk enclosure or removable media enclosure integrated in the system with no
cables. For these configurations, the SAS connections are integrated onto the system boards. A failed
connection can be the result of a failed system board or integrated device enclosure.
v Some systems have SAS RAID adapters integrated onto the system backplane and use a cache RAID
and dual IOA enablement card to enable storage adapter write cache and dual storage I/O adapter
(IOA) mode. For these configurations, replacement of the cache RAID and dual IOA enablement card is
unlikely to solve a SAS-related problem because the SAS interface logic is on the system backplane.
Attention: When SAS fabric problems exist, obtain assistance from your hardware service provider:
v When SAS fabric problems exist, do not replace RAID adapters without assistance from your service
provider. Because the adapter might contain nonvolatile write cache data and configuration data for
the attached disk arrays, additional problems can be created by replacing an adapter.
v Follow appropriate service procedures when replacing the cache RAID and dual IOA enablement card.
Incorrect removal can result in data loss or a nondual storage IOA mode of operation.
v Do not remove functioning disk units in a disk array without assistance from your service provider. A
disk array might become unprotected or might fail if functioning disk units are removed. The removal
of functioning disk units might also result in additional problems in the disk array.
1. Was the SRC xxxx3020?
No:
Go to step 3 on page 227.
Yes:
Go to step 2.
2. The possible causes are:
v More devices are connected to the adapter than the adapter supports. Change the configuration to
the allowable number of devices.
v A SAS device has been incorrectly moved from one location to another. Either return the device to
its original location or move the device while the adapter is powered off.
v A SAS device has been incorrectly replaced by a SATA device. A SAS device must be used to
replace a SAS device.
This ends the procedure.
3. Determine the status of the disk units in the array by doing the following steps:
a. Access the product activity log and display the SRC that sent you here.
b. Press the F9 key for address information. This is the adapter address.
c. Return to the SST or DST main menu.
d. Select Work with disk units > Display disk configuration > Display disk configuration status.
e. On the Display disk configuration status screen, look for the devices attached to the adapter that
was identified.
Is there a device that has a status of RAID 5/Unknown, RAID 6/Unknown, RAID 5/Failed, or RAID
6/Failed?
No:
Go to step 5.
Yes:
Go to step 4.
4. Other errors might have occurred related to the disk array having degraded protection. Take action
on these errors to replace the failed disk unit and restore the disk array to a fully protected state.
This ends the procedure.
5. Have other errors occurred at the same time as this error?
No:
Go to step 7.
Yes:
Go to step 6.
6. Take action on the other errors that have occurred at the same time as this error. This ends the
procedure.
7. Was the SRC xxxxFFFE?
No:
Go to step 10.
Yes:
Go to step 8.
8. Check for the latest PTFs for the device, device enclosure, and adapter and apply them. Did you find
and apply a PTF?
No:
Go to step 10.
Yes:
Go to step 9.
9. This ends the procedure.
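The branching in steps 1 through 8 can be summarized with a short Python sketch. It is an illustration only: the function name and its parameters are hypothetical, the SRC is compared on its last four characters as the xxxx notation suggests, and the disk statuses are assumed to be entered exactly as they appear on the Display disk configuration status screen.

```python
# Statuses named in step 3 that indicate degraded disk array protection.
DEGRADED_STATES = {
    "RAID 5/Unknown",
    "RAID 6/Unknown",
    "RAID 5/Failed",
    "RAID 6/Failed",
}


def next_step(src, disk_statuses, other_errors, ptf_applied):
    """Return the number of the step that steps 1 through 8 lead to."""
    src = src.upper()
    if src.endswith("3020"):
        return 2   # step 1: configuration problem, see the causes in step 2
    if any(status in DEGRADED_STATES for status in disk_statuses):
        return 4   # step 3: act on the degraded disk array first
    if other_errors:
        return 6   # step 5: act on the errors logged at the same time
    if src.endswith("FFFE") and ptf_applied:
        return 9   # steps 7 and 8: a PTF was found and applied
    return 10      # continue with fabric isolation


if __name__ == "__main__":
    print(next_step("57B8FFFE", ["RAID 5/Failed"], False, False))
```

The SRC value in the usage line is a made-up sample, not a value taken from a real log.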
10. Is the problem in a disk expansion unit?
No:
Go to SAS fabric identification.
Yes:
Go to step 11.
11. Identify the adapter and adapter port associated with the problem by examining the product activity
log. Perform the following steps:
a. Access SST or DST.
v If you can enter a command at the console, access system service tools (SST). See System
service tools.
v If you cannot enter a command at the console, perform an IPL to DST. See Performing an IPL
to dedicated service tools.
v If you cannot perform a type A or B IPL, perform a type D IPL from removable media.
b. Access the product activity log and display the SRC that sent you here. Record the adapter
address and the adapter port by doing one of the following:
v If the SRC is xxxxFFFE, press the F9 key for address information. The adapter address is the
bus, board, card information. The port is shown in the I/O bus field. Convert the port value
from decimal to hexadecimal.
v Press the F9 key for address information. The adapter address is the bus, board, card
information. Then, press F12 to cancel and return to the previous screen. Then press the F4 key
to view the additional information, if available. The adapter port is characters 1 and 2 of the
unit address. For example, if the unit address is 123456FF, the port would be 12.
v Go to Hexadecimal product activity log data to obtain the address information. The adapter
address is the bus, board, card information. The adapter port is characters 1 and 2 of the unit
address. For example, if the unit address is 123456FF, the port would be 12.
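The two ways of deriving the port in step 11 can be sketched in Python. The function names are hypothetical, and padding the converted value to two hexadecimal digits is an assumption made here so that both results have the same shape; the procedure itself only says to convert from decimal to hexadecimal.

```python
def port_from_unit_address(unit_address: str) -> str:
    """Return characters 1 and 2 of the unit address, as described in step 11b."""
    if len(unit_address) < 2:
        raise ValueError("unit address must have at least two characters")
    return unit_address[:2].upper()


def port_from_io_bus_field(io_bus_decimal: int) -> str:
    """Convert the decimal I/O bus field value (SRC xxxxFFFE case) to hexadecimal."""
    if io_bus_decimal < 0:
        raise ValueError("port value cannot be negative")
    return format(io_bus_decimal, "02X")  # two-digit padding is an assumption


if __name__ == "__main__":
    print(port_from_unit_address("123456FF"))  # the example from the text
    print(port_from_io_bus_field(18))
```

With the unit address 123456FF from the text, the first function returns 12. A decimal I/O bus value of 18 converts to the same hexadecimal port, 12.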
12. Perform the following steps:
a. Select Start a Service Tool > Hardware Service Manager > Logical Hardware Resources >
System Bus Resources.
b. Enter the adapter bus address and use the Associated packaging resource(s) option to display
the type, model, and unit ID.
c. Record the type, model, and unit ID of the enclosure in which the adapter is located.
d. Use the type, model, unit ID and adapter address to find the location of the adapter (see
Addresses to find the location and then go to System FRU locations).
e. The logical port number was identified in step 11 on page 227. Logical port numbers are
indicated on the physical connector labels located on the tailstock of the adapter. To locate the
device or device enclosure that is experiencing the problem, use the logical port number to
determine the physical connector to which the device or device enclosure is attached.
13. Because the problem persists, some corrective action is needed to resolve the problem. Proceed by
doing the following:
Perform only one of the following corrective actions (listed in the order of preference). If one of the
corrective actions has previously been attempted, proceed to the next one in the list.
v Reseat cables, if present, on adapter and device enclosure. Perform the following steps:
a. Use adapter concurrent maintenance to power off the adapter slot, or power off the system or
partition.
b. Reseat the cables.
c. Use adapter concurrent maintenance to power on the adapter slot, or power on the system or
partition.
v Replace the cable, if present, from the adapter to the device enclosure. Perform the following
steps:
a. Use adapter concurrent maintenance to power off the adapter slot, or power off the system or
partition.
b. Replace the cables.
c. Use adapter concurrent maintenance to power on the adapter slot, or power on the system or
partition.
v Replace the device.
Note: If there are multiple devices with a path that is not Operational, the problem is not likely to
be with a device.
v Replace the internal device enclosure or see the service documentation for an external expansion
unit. Perform the following steps:
a. Power off the system or partition. If the enclosure is external, use adapter concurrent
maintenance instead to power off the adapter slot.
b. Replace the device enclosure.
c. Power on the system or partition. If the enclosure is external, use adapter concurrent
maintenance instead to power on the adapter slot.
v Replace the adapter. The procedure to replace the adapter can be found in PCI adapter.
v Contact your service provider.
14. Does the problem still occur after performing the corrective action?
No: This ends the procedure.
Yes: Go to step 13 on page 228.
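The loop formed by steps 13 and 14 tries one corrective action per pass, in the listed order of preference, and skips any action that was already attempted. A minimal Python sketch of that selection follows; the action labels and the function name are shorthand invented for this illustration, and the "if present" conditions on the cable actions are not modeled.

```python
# Corrective actions from step 13, in the order of preference given there.
CORRECTIVE_ACTIONS = [
    "reseat cables",
    "replace cable",
    "replace device",
    "replace device enclosure",
    "replace adapter",
    "contact service provider",
]


def next_corrective_action(already_attempted):
    """Return the first action in the list that has not been attempted yet."""
    for action in CORRECTIVE_ACTIONS:
        if action not in already_attempted:
            return action
    return None  # every action in the list has been tried


if __name__ == "__main__":
    print(next_corrective_action(["reseat cables"]))
```

On a system with no cables, the two cable actions can be passed in as already attempted so that the selection starts with the device.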
SIP4152
Use this procedure to resolve possible failed connection problems.
This procedure is used to resolve the following problems:
v Multipath redundancy level is worse (SRC xxxx4060)
v Device bus fabric error (SRC xxxx4100)
v Temporary device bus fabric error (SRC xxxx4101)
The possible causes are:
v A failed connection caused by a failing component in the serial attached SCSI (SAS) fabric between,
and including, the adapter and device enclosure.
v A failed connection caused by a failing component within the device enclosure, including the device
itself.
Note: For SRC xxxx4060, the failed connection was previously working, and might have already
recovered.
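When matching a logged SRC against the patterns used in these procedures, the x positions can be treated as characters that are not compared. The following Python sketch assumes that reading of the notation; the function names and the lookup table are hypothetical, and the table contains only SRC and procedure pairs that are stated in this section.

```python
from typing import Optional

# SRC patterns and the procedures this section names for them.
SRC_TO_PROCEDURE = {
    "xxxx9076": "SIP4147",
    "xxxx9075": "SIP4149",
    "xxxx4060": "SIP4152",
    "xxxx4100": "SIP4152",
    "xxxx4101": "SIP4152",
}


def src_matches(pattern: str, src: str) -> bool:
    """Compare an SRC with a pattern, skipping positions marked with a lowercase x."""
    if len(pattern) != len(src):
        return False
    return all(p == "x" or p == s for p, s in zip(pattern, src.upper()))


def procedure_for(src: str) -> Optional[str]:
    """Return the procedure name for an SRC, or None if the table has no match."""
    for pattern, procedure in SRC_TO_PROCEDURE.items():
        if src_matches(pattern, src):
            return procedure
    return None


if __name__ == "__main__":
    print(procedure_for("57B84060"))
```

The SRC in the usage line is a made-up sample value. An SRC that is not in the table returns None and must be looked up in the reference code information instead.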
Considerations:
v Power off the system, partition, or card slot before connecting and disconnecting cables or devices, as
appropriate, to prevent hardware damage.
v Some systems have the disk enclosure or removable media enclosure integrated in the system with no
cables. For these configurations, the SAS connections are integrated onto the system boards. A failed
connection can be the result of a failed system board or integrated device enclosure.
v Some systems have SAS RAID adapters integrated onto the system backplane and use a cache RAID
and dual IOA enablement card to enable storage adapter write cache and dual storage I/O adapter
(IOA) mode. For these configurations, replacement of the cache RAID and dual IOA enablement card is
unlikely to solve a SAS-related problem because the SAS interface logic is on the system backplane.
v When using SAS adapters in a dual storage IOA configuration, ensure that the actions taken in this
procedure are against the primary adapter (not the secondary adapter).
Attention:
v When SAS fabric problems exist, do not replace RAID adapters without assistance from your service
provider. Because the adapter might contain nonvolatile write cache data and configuration data for
the attached disk arrays, additional problems can be created by replacing an adapter.
v Follow appropriate service procedures when replacing the cache RAID and dual IOA enablement card.
Incorrect removal can result in data loss or a nondual storage IOA mode of operation.
v Do not remove functioning disk units in a disk array without assistance from your service provider. A
disk array might become unprotected or might fail if functioning disk units are removed. The removal
of functioning disk units might also result in additional problems in the disk array.
1. Determine the resource name of the adapter that reported the problem by performing the following:
a. Access SST or DST.
b. Access the product activity log and record the resource name that this error is logged against. If
the resource name is an adapter resource name, use it and continue with the next step. If the
resource name is a disk unit resource name, use the Hardware Service Manager to determine the
resource name of the adapter that is controlling this disk unit.
2. Is the IBM i operating system at Version 6.1.1 or later?
No: Continue with the next step.
Yes: Go to step 4.
3. Determine whether a problem still exists for the adapter that logged this error by examining the SAS
connections as follows:
a. On the System Service Tools (SST) screen, select Start a Service Tool and then press Enter.
b. Select Display/Alter/Dump.
c. Select Display/Alter storage.
d. Select Licensed Internal Code (LIC) data.
e. Select Advanced Analysis.
f. Type FABQUERY on the entry line and then select it with option 1.
g. On the Specify Advanced Analysis Options screen, type -SUB 01 -IOA DCxx -DSP 0 in the Options
field, where DCxx is the adapter resource name. Press Enter.
Note: More information is available by returning to the Specify Advanced Analysis Options screen
and typing -SUB 01 -IOA DCxx -DSP 2 in the Options field, where DCxx is the adapter resource
name. Press Enter.
Do all expected devices appear in the list and are all paths marked as Operational?
No: Go to step 5.
Yes: The error condition has been recovered. If the error condition has been recovered more
than one time, go to step 7 on page 231. Otherwise, the error condition is not a persistent
problem and no further service action is necessary. This ends the procedure.
4. Determine whether a problem still exists for the DCxx adapter resource that logged this error by
examining the SAS connections. See Viewing SAS fabric path information. Do all expected devices
appear in the list and are all paths marked as Operational?
No: Continue with the next step.
Yes: The error condition has been recovered. If the error condition has been recovered more than
one time, go to step 7 on page 231. Otherwise, the error condition is not a persistent problem and
no further service action is necessary. This ends the procedure.
5. Perform the following steps to cause the adapter to rediscover the devices and connections:
a. Use Hardware Service Manager to perform another IPL of the virtual I/O processor that is
associated with this adapter.
b. Vary on any other resources attached to the virtual I/O processor.
6. To determine if the problem still exists for the adapter that logged this error, examine the SAS
connections by performing the actions in step 3 on page 230 or step 4 on page 230 again. Do all
expected devices appear in the list and are all paths marked as Operational?
No: Continue with the next step.
Yes: The error condition no longer exists. This ends the procedure.
7. Go to SAS fabric identification. Then continue with the next step.
8. To determine if the problem still exists for the adapter that logged this error, examine the SAS
connections by performing the actions in step 3 on page 230 or step 4 on page 230 again. Do all
expected devices appear in the list and are all paths marked as Operational?
No: Go to step 7.
Yes: The error condition has been recovered. If the error condition has been recovered more than
one time, go to step 7. Otherwise, the error condition is not a persistent problem and no further
service action is necessary. This ends the procedure.
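The Yes branch of steps 3, 4, and 8 depends on how often the error condition has recovered. A minimal Python sketch of that rule follows; the function name is hypothetical, and the count is assumed to include the recovery that was just observed.

```python
def after_successful_path_check(times_recovered: int) -> str:
    """Decide what follows when all devices appear and all paths are Operational."""
    if times_recovered < 1:
        raise ValueError("a successful check means at least one recovery")
    if times_recovered > 1:
        return "go to step 7"  # recurring problem, isolate the SAS fabric
    return "no further service action"  # not a persistent problem


if __name__ == "__main__":
    print(after_successful_path_check(1))
    print(after_successful_path_check(2))
```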
SIP4153
Use this procedure to resolve possible failed connection problems.
Use isolation procedure “SIP4152” on page 229.
Service processor isolation procedures
Use service processor isolation procedures if no management console is attached to the server. If
the server is connected to a management console, use the procedures that are available on the
management console to continue FRU isolation.
DANGER
When working on or around the system, observe the following precautions:
Electrical voltage and current from power, telephone, and communication cables are hazardous. To
avoid a shock hazard:
v Connect power to this unit only with the IBM provided power cord. Do not use the IBM
provided power cord for any other product.
v Do not open or service any power supply assembly.
v Do not connect or disconnect any cables or perform installation, maintenance, or reconfiguration
of this product during an electrical storm.
v The product might be equipped with multiple power cords. To remove all hazardous voltages,
disconnect all power cords.
v Connect all power cords to a properly wired and grounded electrical outlet. Ensure that the outlet
supplies proper voltage and phase rotation according to the system rating plate.
v Connect any equipment that will be attached to this product to properly wired outlets.
v When possible, use one hand only to connect or disconnect signal cables.
v Never turn on any equipment when there is evidence of fire, water, or structural damage.
v Disconnect the attached power cords, telecommunications systems, networks, and modems before
you open the device covers, unless instructed otherwise in the installation and configuration
procedures.
v Connect and disconnect cables as described in the following procedures when installing, moving,
or opening covers on this product or attached devices.
To Disconnect:
1. Turn off everything (unless instructed otherwise).
2. Remove the power cords from the outlets.
3. Remove the signal cables from the connectors.
4. Remove all cables from the devices.
To Connect:
1. Turn off everything (unless instructed otherwise).
2. Attach all cables to the devices.
3. Attach the signal cables to the connectors.
4. Attach the power cords to the outlets.
5. Turn on the devices.
(D005)
FSPSP01
A part vital to system function has been unconfigured. Review the system error logs for errors that
include in their failing item list parts that are relevant to each reason code. If replacing those parts does
not resolve the error, use this procedure.
If you need additional information for failing part numbers, location codes, or removal and replacement
procedures, see Part locations and location codes. Select your machine type and model number to find
additional location codes, part numbers, or replacement procedures for your system.
Perform the following steps:
1. Is the SRC B1xxB10C or B1xxB10D (xx indicates that the subsystem ID is irrelevant)?
v No: Go to step 8.
v Yes: The system has detected that one of the following has occurred:
– A memory controller that is required for the system to function is unconfigured.
– There is not enough memory.
– The memory is plugged incorrectly.
Continue with the next step.
2. Use the Advanced System Management Interface (ASMI) to determine if the memory controller is
unconfigured.
Note: To perform this operation, your authority level must be administrator or authorized service
provider.
a. On the ASMI Welcome pane, specify your user ID and password, and click Log In.
b. In the navigation area, expand System Service Aids > Deconfiguration Records.
c. Is the memory controller unconfigured?
v Yes: Contact your next level of support.
v No: Continue with the next step.
3. Ensure the memory is plugged correctly.
System:
Action:
8202-E4B, 8202-E4C,
8202-E4D, 8205-E6B,
8205-E6C, or 8205-E6D
Go to Memory riser placement and memory module balancing.
8231-E2B, 8231-E1C,
8231-E1D, 8231-E2C,
8231-E2D, or 8268-E1D
Go to Memory riser placement and memory module balancing.
8233-E8B or 8236-E8C
Contact your next level of support.
8248-L4T, 8408-E8D, or
9109-RMD
Go to Memory riser placement and memory module balancing.
8412-EAD, 9117-MMB,
9117-MMC, 9117-MMD,
9119-FHB, 9125-F2C,
9179-MHB, 9179-MHC, or
9179-MHD
Contact your next level of support.
Is the memory plugged correctly?
Yes: Go to step 5.
No: Correct the memory plugging problem and continue with the next step.
4. Perform the action indicated in the table below for your system.
System:
Action:
8202-E4B, 8202-E4C,
8202-E4D, 8205-E6B,
8205-E6C, 8205-E6D,
8231-E2B, 8231-E1C,
8231-E1D, 8231-E2C,
8231-E2D, 8248-L4T,
8268-E1D, 8408-E8D,
9109-RMD, 9119-FHB, or
9125-F2C
Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
8233-E8B or 8236-E8C
Determine the level of firmware the system is running.
Is system firmware AL710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
8412-EAD, 9117-MMB,
9117-MMC, 9117-MMD,
9179-MHB, 9179-MHC, or
9179-MHD
Determine the level of firmware the system is running.
Is system firmware AM710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
Does the problem persist?
Yes: Continue with the next step.
No: Go to Verify a repair. This ends the procedure.
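The table in step 4 reduces to a lookup on the machine type and model plus a check of the firmware level. The Python sketch below restates it; the names are hypothetical, and the firmware level is assumed to be available as a string that starts with the release identifier followed by an underscore, as in AL710_xxx.

```python
# Models for which step 4 states that slow boot is not supported.
NO_SLOW_BOOT = {
    "8202-E4B", "8202-E4C", "8202-E4D", "8205-E6B", "8205-E6C", "8205-E6D",
    "8231-E2B", "8231-E1C", "8231-E1D", "8231-E2C", "8231-E2D", "8248-L4T",
    "8268-E1D", "8408-E8D", "9109-RMD", "9119-FHB", "9125-F2C",
}

# Models that perform a slow boot only at the listed firmware release.
SLOW_BOOT_RELEASE = {
    "8233-E8B": "AL710", "8236-E8C": "AL710",
    "8412-EAD": "AM710", "9117-MMB": "AM710", "9117-MMC": "AM710",
    "9117-MMD": "AM710", "9179-MHB": "AM710", "9179-MHC": "AM710",
    "9179-MHD": "AM710",
}

POWER_ON = "power on to hypervisor standby"
SLOW_BOOT = "slow boot"


def boot_action(model: str, firmware_level: str) -> str:
    """Return the action that step 4 lists for this model and firmware level."""
    if model in NO_SLOW_BOOT:
        return POWER_ON
    release = SLOW_BOOT_RELEASE.get(model)
    if release is None:
        raise KeyError(f"model {model} is not listed in step 4")
    if firmware_level.startswith(release + "_"):
        return SLOW_BOOT
    return POWER_ON


if __name__ == "__main__":
    print(boot_action("8233-E8B", "AL710_065"))
```

The firmware level in the usage line is a sample value chosen for illustration.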
5. Perform the following steps:
a. Perform the action indicated in the table below for your system.
System:
Action:
8202-E4B, 8202-E4C,
8202-E4D, 8205-E6B,
8205-E6C, 8205-E6D,
8231-E2B, 8231-E1C,
8231-E1D, 8231-E2C,
8231-E2D, 8248-L4T,
8268-E1D, 8408-E8D, or
9109-RMD
Reseat all of the memory DIMMs on each memory card but do not replace any
memory DIMMs at this time.
8233-E8B or 8236-E8C
Reseat all of the memory DIMMs in each processor card but do not replace any
memory DIMMs at this time.
8412-EAD, 9117-MMB,
9117-MMC, 9117-MMD,
9179-MHB, 9179-MHC, or
9179-MHD
Reseat all of the memory DIMMs in each processor enclosure (primary and all
secondary units) but do not replace any memory DIMMs at this time.
9119-FHB
Reseat all of the memory DIMMs on each memory card but do not replace any
memory DIMMs at this time.
9125-F2C
Reseat all of the memory DIMMs in the node but do not replace any memory DIMMs
at this time.
b. Perform the action indicated in the table below for your system.
System:
Action:
8202-E4B, 8202-E4C,
8202-E4D, 8205-E6B,
8205-E6C, 8205-E6D,
8231-E2B, 8231-E1C,
8231-E1D, 8231-E2C,
8231-E2D, 8248-L4T,
8268-E1D, 8408-E8D,
9109-RMD, 9119-FHB, or
9125-F2C
Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
8233-E8B or 8236-E8C
Determine the level of firmware the system is running.
Is system firmware AL710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
8412-EAD, 9117-MMB,
9117-MMC, 9117-MMD,
9179-MHB, 9179-MHC, or
9179-MHD
Determine the level of firmware the system is running.
Is system firmware AM710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
Does the problem persist?
Yes: Continue with the next step.
No: Go to Verify a repair. This ends the procedure.
6. Perform the action indicated in the table below for your system.
System:
Action:
8202-E4B, 8202-E4C, 8202-E4D, 8205-E6B, 8205-E6C, 8205-E6D, 8231-E2B, 8231-E1C, 8231-E1D,
8231-E2C, 8231-E2D, 8248-L4T, 8268-E1D, 8408-E8D, or 9109-RMD
1. Replace the first memory DIMM pair (on each memory card, starting with memory card 1).
2. Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
3. Does the problem persist?
Yes:
Repeat this step and replace the next memory DIMM pair. If you have
replaced all of the memory DIMM pairs, then continue with the next step.
No:
Go to Verify a repair. This ends the procedure.
8233-E8B or 8236-E8C
1. Replace the first memory DIMM pair (for each processor card, starting with processor
card 1).
2. Determine the level of firmware the system is running.
Is system firmware AL710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
3. Does the problem persist?
Yes:
Repeat this step and replace the next memory DIMM pair. If you have
replaced all of the memory DIMM pairs, then continue with the next step.
No:
Go to Verify a repair. This ends the procedure.
8412-EAD, 9117-MMB, 9117-MMC, 9117-MMD, 9179-MHB, 9179-MHC, or 9179-MHD
1. Replace the first memory DIMM pair in the first processor enclosure.
2. Determine the level of firmware the system is running.
Is system firmware AM710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
3. Does the problem persist?
Yes:
Repeat this step and replace the next memory DIMM pair. If you have
replaced all of the memory DIMMs, then continue with the next step.
No:
Go to Verify a repair. This ends the procedure.
9119-FHB
1. Replace the first memory DIMM quad.
2. Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
3. Does the problem persist?
Yes:
Repeat this step and replace the next memory DIMM. If you have replaced
all of the memory DIMMs, continue with the next step.
No:
Go to Verify a repair. This ends the procedure.
9125-F2C
1. Replace the first memory DIMM in each octant.
2. Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
3. Does the problem persist?
Yes:
Repeat this step and replace the next memory DIMM in each octant. If you
have replaced all of the memory DIMMs, contact your next level of support.
This ends the procedure.
No:
Go to Verify a repair. This ends the procedure.
7. Perform the action indicated in the table below for your system.
System:
Action:
8202-E4B, 8202-E4C, 8202-E4D, 8205-E6B, 8205-E6C, 8205-E6D, 8231-E1C, 8231-E1D,
8231-E2C, 8231-E2D, or 8268-E1D
1. Replace the first processor module (Un-P1-C11).
2. Does the problem persist?
Yes:
Continue with the next step.
No:
Go to Verify a repair. This ends the procedure.
3. Is a second processor module present?
Yes:
Continue with the next step.
No:
Replace the system backplane at location Un-P1. This ends the procedure.
4. Replace the second processor module at location Un-P1-C10. Does the problem
persist?
Yes:
Replace the system backplane at location Un-P1. This ends the procedure.
No:
Go to Verify a repair. This ends the procedure.
8231-E2B
1. Replace the first processor module (Un-P1-C10).
2. Does the problem persist?
Yes:
Continue with the next step.
No:
Go to Verify a repair. This ends the procedure.
3. Is a second processor module present?
Yes:
Continue with the next step.
No:
Replace the system backplane at location Un-P1. This ends the procedure.
4. Replace the second processor module at location Un-P1-C19. Does the problem
persist?
Yes:
Replace the system backplane at location Un-P1. This ends the procedure.
No:
Go to Verify a repair. This ends the procedure.
8233-E8B or 8236-E8C
1. Replace the first processor card, starting with processor card 1 (Un-P1-C13).
2. Determine the level of firmware the system is running.
Is system firmware AL710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
3. Does the problem persist?
Yes:
Repeat this step and replace the next processor card. If you have replaced all
of the processor cards, then contact your next level of support. This ends the
procedure.
No:
Go to Verify a repair. This ends the procedure.
8248-L4T, 8408-E8D, or 9109-RMD
1. Replace the first processor module (Un-P3-C12).
2. Does the problem persist?
Yes:
Continue with the next step.
No:
Go to Verify a repair. This ends the procedure.
3. If there are additional processor modules present at Un-P3-C17, Un-P3-C13, or
Un-P3-C16, replace them one at a time until the problem is resolved.
4. Does the problem persist?
Yes:
Continue with the next step.
No:
Go to Verify a repair. This ends the procedure.
5. Replace the processor backplane at location Un-P3.
8412-EAD, 9117-MMB, 9117-MMC, 9117-MMD, 9179-MHB, 9179-MHC, or 9179-MHD
1. Replace the processor card in the primary processor enclosure.
2. Determine the level of firmware the system is running.
Is system firmware AM710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
3. Does the problem persist?
Yes:
Contact your next level of support. This ends the procedure.
No:
Go to Verify a repair. This ends the procedure.
9119-FHB
1. Replace the processor book.
2. Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
3. Go to Verify a repair. This ends the procedure.
8. Is SRC B1xxB107 or B1xxB108?
Yes:
Perform the action indicated in the table below for your system.
No:
Continue with the next step.
System:
Action:
8202-E4B, 8202-E4C,
8202-E4D, 8205-E6B,
8205-E6C, 8205-E6D,
8231-E2B, 8231-E1C,
8231-E1D, 8231-E2C,
8231-E2D, 8233-E8B,
8236-E8C, or 8268-E1D
The system has detected a problem with the clock module. Replace the system backplane
(Un-P1). This ends the procedure.
8248-L4T, 8408-E8D, or
9109-RMD
The system has detected a problem with the clock module on the service processor.
Replace the service processor at location Un-P1-C1. This ends the procedure.
8412-EAD, 9117-MMB,
9117-MMC, 9117-MMD,
9179-MHB, 9179-MHC,
or 9179-MHD
The system has detected a problem with a clock module on one of the service processors.
1. Starting in the primary processor enclosure and continuing in the secondary
enclosures, replace the service processor cards one at a time (starting with Un-P1-C1)
until the problem is resolved.
2. Determine the level of firmware the system is running.
Is system firmware AM710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
This ends the procedure.
9119-FHB
The system has detected a problem with the system unit clock card. Perform the following
steps for each of the system unit clock cards until the problem is resolved:
1. Replace the system unit clock card.
2. Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
3. Go to Verify a repair. This ends the procedure.
9125-F2C
Contact your next level of support. This ends the procedure.
9. Is the SRC B1xxB106?
Yes:
Perform the action indicated in the table below for your system.
System:
Action:
8202-E4B, 8202-E4C,
8202-E4D, 8205-E6B,
8205-E6C, 8205-E6D,
8231-E2B, 8231-E1C,
8231-E1D, 8231-E2C,
8231-E2D, or 8268-E1D
Replace the system backplane (Un-P1). This ends the procedure.
8233-E8B or 8236-E8C
Replace the system backplane (Un-P1).
1. Determine the level of firmware the system is running.
Is system firmware AL710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
2. Go to Verify a repair. This ends the procedure.
8248-L4T, 8408-E8D, or
9109-RMD
Review the system error logs for errors that include system midplanes and backplanes in
their failing item list. This will indicate which system backplanes have problems and need
to be replaced. Replace the system midplane or backplane in the enclosure indicated by
the logs.
8412-EAD, 9117-MMB,
9117-MMC, 9117-MMD,
9179-MHB, 9179-MHC,
or 9179-MHD
Review the system error logs for errors that include system midplanes and backplanes in
their failing item list. This will indicate which system backplanes have problems and need
to be replaced. Replace the system midplane or backplane in the enclosure indicated by
the logs.
1. Determine the level of firmware the system is running.
Is system firmware AM710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
2. Go to Verify a repair. This ends the procedure.
9119-FHB
Review the system error logs for errors that include system backplanes in their failing
item list. This will indicate which system backplanes have problems and need to be
replaced. Replace the appropriate processor book or processor books.
1. Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
2. Go to Verify a repair. This ends the procedure.
9125-F2C
Contact your next level of support. This ends the procedure.
No:
Continue with the next step.
10. Is the SRC B1xxB110 or B1xxB111?
No:
Contact your next level of support. This ends the procedure.
Yes:
Perform the action indicated in the table below for your system.
System:
Action:
8202-E4B, 8202-E4C,
8202-E4D, 8205-E6B,
8205-E6C, 8205-E6D,
8231-E2B, 8231-E1C,
8231-E1D, 8231-E2C,
8231-E2D, or 8268-E1D
Replace the system backplane (Un-P1). This ends the procedure.
8233-E8B or 8236-E8C
Replace the system backplane (Un-P1).
1. Determine the level of firmware the system is running.
Is system firmware AL710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
2. Go to Verify a repair. This ends the procedure.
8248-L4T, 8408-E8D, or
9109-RMD
Review the system error logs for errors that include system midplanes and backplanes in
their failing item list. This will indicate which system backplanes have problems and need
to be replaced. Replace the system midplane or backplane in the enclosure indicated by
the logs.
8412-EAD, 9117-MMB,
9117-MMC, 9117-MMD,
9179-MHB, 9179-MHC,
or 9179-MHD
Review the system error logs for errors that include I/O backplanes in their failing item
list. This will indicate which I/O backplanes have problems and need to be replaced.
Replace the I/O backplane in the enclosure indicated by the logs.
1. Determine the level of firmware the system is running.
Is system firmware AM710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
2. Go to Verify a repair. This ends the procedure.
9119-FHB
Review the system error logs for errors that include I/O bridges in their failing item list.
This will indicate which I/O bridges have problems and need to be replaced. Replace the
appropriate processor book or processor books.
1. Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
2. Go to Verify a repair. This ends the procedure.
9125-F2C
Contact your next level of support. This ends the procedure.
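The SRC checks in steps 8 through 10 amount to matching the reference code against a few patterns in which xx stands for any two characters. The following Python is only a minimal sketch of that matching, not an IBM tool. The function name `step_for_src` is hypothetical, and the sketch assumes that the SRC is supplied as eight hexadecimal characters. It covers only the patterns named in steps 8 through 10; earlier steps of the procedure are not modeled.

```python
import re

# SRC patterns named in steps 8-10 of this procedure. "xx" is a wildcard
# for any two hexadecimal characters (assumption: SRCs are 8 hex characters).
SRC_STEPS = [
    (8, ("B1xxB107", "B1xxB108")),
    (9, ("B1xxB106",)),
    (10, ("B1xxB110", "B1xxB111")),
]


def _to_regex(pattern):
    """Turn a pattern such as B1xxB107 into a compiled regular expression."""
    return re.compile(pattern.replace("xx", "[0-9A-F]{2}"))


def step_for_src(src):
    """Return the step (8, 9, or 10) whose table applies, or None if no pattern matches."""
    code = src.strip().upper()
    for step, patterns in SRC_STEPS:
        if any(_to_regex(p).fullmatch(code) for p in patterns):
            return step
    # No match: step 10 directs you to contact your next level of support.
    return None
```

For example, `step_for_src("B181B107")` selects the step 8 table, while an SRC that matches none of the patterns returns `None`.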
FSPSP02
This procedure is for boot failures that terminate very early in the boot process.
This error path is indicated when the SRC data words are scrolling automatically through control panel
functions 11, 12, and 13, and the control panel interface buttons are not responsive.
Perform the following:
1. Is the system a 9119-FHB or a 9125-F2C?
No: Continue with the next step.
Yes: Save any error log and dump data and contact your next level of support for assistance. This
ends the procedure.
2. Push the white power button to reset the system and start it on the other side of the platform
Licensed Internal Code.
Note: The white power button will only reset the system and attempt to reach standby.
3. Did an SRC occur after starting the system on the other side?
No: Verify that the system's firmware is at the latest level. Update the system's firmware on the
failing side if necessary. Perform LICCODE. This ends the procedure.
Yes: Continue with the next step.
4. Is the SRC the same SRC that brought you to this procedure?
v No: Return to Start of call to service this new SRC. This ends the procedure.
v Yes: Perform the action indicated in the table below for your system.
System:
Action:
8202-E4B, 8202-E4C,
8202-E4D, 8205-E6B,
8205-E6C, 8205-E6D,
8231-E2B, 8231-E1C,
8231-E1D, 8231-E2C,
8231-E2D, 8233-E8B,
8236-E8C, 8268-E1D
Replace the system backplane (Un-P1). Refer to System FRU locations to locate the
correct system location of the part you are replacing. If the problem is not resolved,
contact your next level of support. This ends the procedure.
8248-L4T, 8408-E8D,
9109-RMD
1. Replace the service processor at location Un-P1-C1.
2. If the problem persists, replace the system midplane at location Un-P1. This ends
the procedure.
8412-EAD, 9117-MMB,
9117-MMC, 9117-MMD,
9179-MHB, 9179-MHC, or
9179-MHD
1. Replace the service processor.
2. If the problem persists, replace the system midplane at location Un-P1. This ends
the procedure.
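The decision flow of FSPSP02 can be summarized in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function name `fspsp02_next_action` and the returned strings are hypothetical, and the sketch assumes that the reset described in step 2 has already been performed for models that reach step 3.

```python
def fspsp02_next_action(model, original_src, src_after_reset):
    """Return a short description of the next action in FSPSP02.

    model           -- machine type and model, for example "8233-E8B"
    original_src    -- the SRC that brought you to this procedure
    src_after_reset -- SRC seen after starting on the other side, or None
    """
    # Step 1: these two models go straight to the next level of support.
    if model in ("9119-FHB", "9125-F2C"):
        return "save error log and dump data; contact next level of support"
    # Step 3: no SRC after starting on the other side of the firmware.
    if src_after_reset is None:
        return "verify and update firmware on the failing side; perform LICCODE"
    # Step 4: a different SRC is serviced from Start of call.
    if src_after_reset.strip().upper() != original_src.strip().upper():
        return "return to Start of call with the new SRC"
    return "perform the action in the table for this system"
```

The comparison is case-insensitive only as a convenience for hand-typed input.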
FSPSP03
A system operator or user error has occurred.
See the documentation for the task that you were attempting to perform.
FSPSP04
A problem has been detected in the service processor firmware.
Perform LICCODE. This ends the procedure.
FSPSP05
The service processor has detected a problem in the platform firmware.
Perform LICCODE. This ends the procedure.
FSPSP06
The service processor reported a suspected intermittent problem.
Collect log, debug, and dump data if available and send it to your next level of support.
FSPSP07
The time of day has been reset to the default value.
1. To set the time of day, see the systems operations guide.
2. If the problem persists, perform the following steps:
a. Power off the system. See Powering on and powering off the system.
b. To determine the action to perform, use the following table.
System
Action
8202-E4B, 8202-E4C, 8202-E4D, 8205-E6B, 8205-E6C,
8205-E6D, 8231-E2B, 8231-E1C, 8231-E1D, 8231-E2C,
8231-E2D, 8233-E8B, 8236-E8C, or 8268-E1D
Replace the time-of-day battery at location Un-P1-E1. See
System FRU locations.
8248-L4T, 8408-E8D, 8412-EAD, 9109-RMD, 9117-MMB,
9117-MMC, 9117-MMD, 9179-MHB, 9179-MHC, or
9179-MHD
Replace the time-of-day battery at location Un-P1-C1-E1.
See System FRU locations.
9119-FHB
Replace the time-of-day battery. To determine which
battery to replace, use the last byte in word 3 of the
primary SRC.
v If the last byte is a 10, replace the system controller
card 0 time-of-day battery.
v If the last byte is a 20, replace the system controller
card 1 time-of-day battery.
9125-F2C
Replace the time-of-day battery. To determine which
battery to replace, use the location code specified with
the SRC. If a location code is not available, look at the
amber battery LEDs on the front of the DCCAs. Replace
the battery next to the amber LED that is lit.
3. To determine the action to perform, use the following table.
System:
Action:
8202-E4B, 8202-E4C,
8202-E4D, 8205-E6B,
8205-E6C, 8205-E6D,
8231-E2B, 8231-E1C,
8231-E1D, 8231-E2C,
8231-E2D, 8248-L4T,
8268-E1D, 8408-E8D,
9109-RMD, 9119-FHB, or
9125-F2C
Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
8233-E8B or 8236-E8C
Determine the level of firmware the system is running.
Is system firmware AL710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
8412-EAD, 9117-MMB,
9117-MMC, 9117-MMD,
9179-MHB, 9179-MHC, or
9179-MHD
Determine the level of firmware the system is running.
Is system firmware AM710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
This ends the procedure.
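For the 9119-FHB row in step 2, the battery to replace is selected from the last byte of word 3 of the primary SRC. The Python below is a minimal sketch of that lookup. The function name `fhb_battery_for_word3` is hypothetical, and the sketch assumes that word 3 is available as eight hexadecimal characters and that the values 10 and 20 are compared as the two displayed characters.

```python
def fhb_battery_for_word3(word3):
    """Map the last byte of SRC word 3 to a 9119-FHB time-of-day battery.

    Returns None when the last byte is neither 10 nor 20, because the
    procedure does not describe any other value.
    """
    word = word3.strip().upper()
    if len(word) != 8 or any(c not in "0123456789ABCDEF" for c in word):
        raise ValueError("expected word 3 as 8 hexadecimal characters")
    last_byte = word[-2:]  # one byte is two hexadecimal characters
    batteries = {
        "10": "system controller card 0 time-of-day battery",
        "20": "system controller card 1 time-of-day battery",
    }
    return batteries.get(last_byte)
```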
FSPSP09
A problem has been detected with a memory DIMM, but it cannot be isolated to a specific memory
DIMM.
1. Use the following table to determine the action to perform.
System
Action
8202-E4B, 8202-E4C, 8202-E4D, 8205-E6B, 8205-E6C, or
8205-E6D
Replace all of the memory DIMMs that are located on
the same memory card (starting with memory card 1:
Un-P1-Cx-Cy, with x = 15 through 18 and y = 1 through 4
and 7 through 10). See System FRU locations for FRU
location information.
8231-E2B, 8231-E1C, 8231-E1D, 8231-E2C, 8231-E2D, or
8268-E1D
Replace all of the memory DIMMs that are located on
the same memory card (starting with memory card 1:
Un-P1-Cx-Cy, with x = 14 through 17 and y = 1 through
4). See System FRU locations for FRU location
information.
8233-E8B, 8236-E8C
Replace all of the memory DIMMs that are located on
the same processor card (starting with processor card 1:
Un-P1-Cx-Cy, with x = 13 through 16 and y = 2 through
9). See System FRU locations for FRU location
information.
8248-L4T, 8408-E8D, 9109-RMD
Replace all of the memory DIMMs that are located on
the same memory card (starting with memory card 1:
Un-P3-Cx-Cy, with x = 1 through 4 and 6 through 9 and y
= 1 through 8). See System FRU locations for FRU
location information.
8412-EAD, 9117-MMB, 9117-MMC, 9117-MMD,
9179-MHB, 9179-MHC, or 9179-MHD
Using the FRU that is included in the failing item list
after this procedure, replace all of the memory DIMMs
that are on the processor card. See System FRU locations
for FRU location information.
9119-FHB
Using the FRU that is included in the failing item list
after this procedure, replace all of the memory DIMMs
that are located on the same memory board or card. See
System FRU locations for FRU location information.
9125-F2C
Using the FRU that is included in the failing item list
after this procedure, replace all of the memory DIMMs
that are located in the same octant. See System FRU
locations for FRU location information.
2. To determine the action to perform, use the following table.
System:
Action:
8202-E4B, 8202-E4C,
8202-E4D, 8205-E6B,
8205-E6C, 8205-E6D,
8231-E2B, 8231-E1C,
8231-E1D, 8231-E2C,
8231-E2D, 8248-L4T,
8268-E1D, 8408-E8D,
9109-RMD, 9119-FHB, or
9125-F2C
Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
8233-E8B or 8236-E8C
Determine the level of firmware the system is running.
Is system firmware AL710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
8412-EAD, 9117-MMB,
9117-MMC, 9117-MMD,
9179-MHB, 9179-MHC, or
9179-MHD
Determine the level of firmware the system is running.
Is system firmware AM710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
This ends the procedure.
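The location-code ranges in step 1 (for example Un-P1-Cx-Cy with x = 15 through 18 and y = 1 through 4 and 7 through 10) can be expanded mechanically when you need the full list of DIMMs on one card. The following Python is a small sketch of that expansion; the function name `dimm_locations` is hypothetical and is not part of any IBM tool.

```python
def dimm_locations(plane, card_slots, dimm_slots):
    """Group DIMM location codes by the card they are plugged into.

    plane      -- planar label, for example "P1" or "P3"
    card_slots -- values of x in Un-<plane>-Cx-Cy
    dimm_slots -- values of y in Un-<plane>-Cx-Cy
    """
    return {
        f"Un-{plane}-C{x}": [f"Un-{plane}-C{x}-C{y}" for y in dimm_slots]
        for x in card_slots
    }


# Ranges from the first table row (8202-E4B through 8205-E6D):
# x = 15 through 18, y = 1 through 4 and 7 through 10.
E4_E6_DIMMS = dimm_locations("P1", range(15, 19), [*range(1, 5), *range(7, 11)])
```

Each key is a card location and each value lists the DIMM locations on that card, so `E4_E6_DIMMS["Un-P1-C15"]` gives the DIMMs that would be replaced together for the card at Un-P1-C15.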
FSPSP10
The part indicated in the FRU list that follows this procedure is either not valid for this system's
configuration or missing from it.
See System FRU locations for information about FRU locations.
Perform the following steps to correct the problem:
1. Does word 8 (the 8 leftmost characters in the 2nd line of function 13) of the reference code end with
02 or 04?
No: Go to step 3.
Yes: Continue with the next step.
2. The FRU that is included in the failing item list after this procedure is either missing or not valid. Is
that FRU installed and connected or plugged in correctly?
Yes: The installed FRU is not valid. Remove that FRU. Then contact your next level of support to
determine the correct FRU. This ends the procedure.
No: The FRU is missing. If the FRU is present but not connected, reconnect it and then perform the
specified action in the following table. Otherwise, contact your next level of support to determine the
missing FRU.
System:
Action:
8202-E4B, 8202-E4C,
8202-E4D, 8205-E6B,
8205-E6C, 8205-E6D,
8231-E2B, 8231-E1C,
8231-E1D, 8231-E2C,
8231-E2D, 8248-L4T,
8268-E1D, 8408-E8D,
9109-RMD, 9119-FHB,
9125-F2C
Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
8233-E8B, 8236-E8C
Determine the level of firmware the system is running.
Is system firmware AL710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
8412-EAD, 9117-MMB,
9117-MMC, 9117-MMD,
9179-MHB, 9179-MHC, or
9179-MHD
Determine the level of firmware the system is running.
Is system firmware AM710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
This ends the procedure.
3. Does word 8 end with 01 or 05?
No: Return to the Start of call. This ends the procedure.
Yes: The FRU that is listed after this procedure has the same serial number as another FRU in the
system. Remove all but one of the FRUs that are listed after this procedure and then perform the
specified action in the following table.
System:
Action:
8202-E4B, 8202-E4C,
8202-E4D, 8205-E6B,
8205-E6C, 8205-E6D,
8231-E2B, 8231-E1C,
8231-E1D, 8231-E2C,
8231-E2D, 8248-L4T,
8268-E1D, 8408-E8D,
9109-RMD, 9119-FHB,
9125-F2C
Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
8233-E8B, 8236-E8C
Determine the level of firmware the system is running.
Is system firmware AL710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
8412-EAD, 9117-MMB,
9117-MMC, 9117-MMD,
9179-MHB, 9179-MHC, or
9179-MHD
Determine the level of firmware the system is running.
Is system firmware AM710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
This ends the procedure.
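FSPSP10 branches on the last two characters of word 8, which the procedure describes as the 8 leftmost characters in the second line of function 13. The Python below is a minimal sketch of that branching. The function names `word8_from_function13` and `fspsp10_condition` are hypothetical, and the sketch assumes that the second line of function 13 is available as a plain character string.

```python
def word8_from_function13(second_line):
    """Return word 8: the 8 leftmost characters of the 2nd line of function 13."""
    word = second_line.strip()[:8]
    if len(word) != 8:
        raise ValueError("second line of function 13 is shorter than 8 characters")
    return word.upper()


def fspsp10_condition(word8):
    """Classify the FSPSP10 condition from the last two characters of word 8."""
    suffix = word8.strip().upper()[-2:]
    if suffix in ("02", "04"):
        return "FRU missing or not valid"        # steps 1 and 2
    if suffix in ("01", "05"):
        return "duplicate FRU serial number"     # step 3
    return "return to Start of call"             # step 3, No branch
```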
FSPSP11
The service processor has detected an error in the system unit.
If you need additional information for failing part numbers, location codes, or removal and replacement
procedures, see Part locations and location codes. Select your machine type and model number to find
additional location codes, part numbers, or replacement procedures for your system.
1. Verify that the system's firmware is at the latest level. Update the system's firmware if necessary.
2. If the problem persists, use the following table to determine the action to perform.
System
Action
8233-E8B, 8236-E8C
v Replace the host Ethernet adapter (HEA) card.
v Replace the system backplane at location Un-P1.
v Determine the level of firmware the system is running.
Is system firmware AL710_xxx installed?
Yes: Perform a slow boot. See Performing a slow
boot.
No: Power on the system to the hypervisor
standby.
This ends the procedure.
8202-E4B, 8205-E6B, 8231-E2B
v Replace the host Ethernet adapter (HEA) card.
v Replace the system backplane at location Un-P1.
v Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
This ends the procedure.
8248-L4T, 8408-E8D, 9109-RMD
v Replace the host Ethernet adapter (HEA) card.
v Replace the I/O backplane at location Un-P2.
v Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
This ends the procedure.
8412-EAD, 9117-MMB, 9117-MMC, 9117-MMD,
9179-MHB, 9179-MHC, or 9179-MHD
1. Replace the host Ethernet adapter (HEA) card.
2. Replace each I/O backplane one at a time.
3. Determine the level of firmware the system is
running.
Is system firmware AM710_xxx installed?
Yes: Perform a slow boot. See Performing a slow
boot.
No: Power on the system to the hypervisor
standby.
This ends the procedure.
9119-FHB
v Replace the processor book.
v Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
This ends the procedure.
FSPSP12
The DIMM FRU that was previously replaced did not correct the memory error.
Perform the following steps:
1. Power off the system. See Powering on and powering off the system.
2. To determine the action to perform, use the following table.
System
Action
8202-E4B, 8202-E4C, 8202-E4D, 8205-E6B, 8205-E6C,
8205-E6D
Replace the memory card that the DIMM is plugged into
(Un-P1-C15 through Un-P1-C18). See System FRU
locations for information about system FRU locations.
8231-E2B, 8231-E1C, 8231-E1D, 8231-E2C, 8231-E2D,
8268-E1D
Replace the memory card that the DIMM is plugged into
(Un-P1-C14 through Un-P1-C17). See System FRU
locations for information about system FRU locations.
8233-E8B, 8236-E8C
Replace the processor card that the DIMM is plugged
into (Un-P1-C13 for processor card 1 through Un-P1-C16
for processor card 4). See System FRU locations for
information about system FRU locations.
8248-L4T, 8408-E8D, 9109-RMD
Replace the memory card that the DIMM is plugged into
(Un-P3-C1 through Un-P3-C4 and Un-P3-C6 through
Un-P3-C9). See System FRU locations for information
about system FRU locations.
8412-EAD, 9117-MMB, 9117-MMC, 9117-MMD,
9179-MHB, 9179-MHC, or 9179-MHD
Replace the processor card the DIMM is plugged into
(Un-P3). See System FRU locations for information about
system FRU locations.
9119-FHB
Replace the processor book with the failure. See System
FRU locations for information about system FRU
locations.
9125-F2C
Contact your next level of support. This ends the
procedure.
3. To determine the action to perform, use the following table.
System:
Action:
8202-E4B, 8202-E4C,
8202-E4D, 8205-E6B,
8205-E6C, 8205-E6D,
8231-E2B, 8231-E1C,
8231-E1D, 8231-E2C,
8231-E2D, 8248-L4T,
8268-E1D, 8408-E8D,
9109-RMD, 9119-FHB, or
9125-F2C
Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
8233-E8B or 8236-E8C
Determine the level of firmware the system is running.
Is system firmware AL710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
8412-EAD, 9117-MMB,
9117-MMC, 9117-MMD,
9179-MHB, 9179-MHC, or
9179-MHD
Determine the level of firmware the system is running.
Is system firmware AM710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
This ends the procedure.
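The power-on table above recurs in several procedures: slow boot applies only to certain models, and only when a specific firmware level (AL710_xxx or AM710_xxx) is installed. The following Python is a sketch of that rule under stated assumptions. The function name `boot_method` is hypothetical, the firmware level is assumed to be a string that begins with the release identifier, and the sample level strings in the usage note are made up for illustration.

```python
# Models for which the table says slow boot is not supported.
STANDBY_ONLY = {
    "8202-E4B", "8202-E4C", "8202-E4D", "8205-E6B", "8205-E6C", "8205-E6D",
    "8231-E2B", "8231-E1C", "8231-E1D", "8231-E2C", "8231-E2D", "8248-L4T",
    "8268-E1D", "8408-E8D", "9109-RMD", "9119-FHB", "9125-F2C",
}

# Models that perform a slow boot only at the listed firmware release.
SLOW_BOOT_PREFIX = {
    "8233-E8B": "AL710_", "8236-E8C": "AL710_",
    "8412-EAD": "AM710_", "9117-MMB": "AM710_", "9117-MMC": "AM710_",
    "9117-MMD": "AM710_", "9179-MHB": "AM710_", "9179-MHC": "AM710_",
    "9179-MHD": "AM710_",
}


def boot_method(model, firmware_level):
    """Return "slow boot" or "hypervisor standby" for the given system."""
    if model in STANDBY_ONLY:
        return "hypervisor standby"
    prefix = SLOW_BOOT_PREFIX.get(model)
    if prefix is None:
        raise ValueError(f"model {model!r} is not listed in the table")
    if firmware_level.strip().upper().startswith(prefix):
        return "slow boot"
    return "hypervisor standby"
```

For instance, `boot_method("8233-E8B", "AL710_065")` yields a slow boot, while the same model at a level that does not start with AL710_ is powered on to the hypervisor standby.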
FSPSP14
The service processor cannot communicate with the system firmware. The server firmware will continue
to run the system and partitions while it attempts to recover communications. Server firmware recovery
actions will continue for approximately 30 to 40 minutes.
Perform the following steps:
1. Record the time the log was created or when the SRC was first noticed. Continue with the next step.
2. Are progress codes being displayed on the panel?
v Yes: Server firmware was able to reset the service processor. This ends the procedure.
v No: Continue with the next step.
3. Has an A7006995 SRC been displayed on the panel?
v Yes: Partitions are being powered off and a server dump will be attempted. Follow the A7006995
SRC description if the partitions do not terminate as requested. This ends the procedure.
v No: Continue with the next step.
4. Has the A1xx SRC remained on the panel for more than 40 minutes?
v Yes: Server firmware could not begin termination of the partitions. Contact your next level of
support to assist in attempting to terminate any remaining partitions and forcing a server dump.
Collect the dump for support, then power off and power on the system. This ends the procedure.
v No: Contact your next level of support. This ends the procedure.
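The checks in steps 2 through 4 are evaluated in order, and the first one that applies decides the outcome. The Python below sketches that ordering. The function name `fspsp14_outcome` and the returned strings are hypothetical, and the elapsed time is assumed to be measured in minutes from the time recorded in step 1.

```python
def fspsp14_outcome(progress_codes_displayed, a7006995_displayed, minutes_a1xx_on_panel):
    """Return the outcome of FSPSP14 steps 2 through 4, checked in order."""
    # Step 2: progress codes mean the service processor was reset.
    if progress_codes_displayed:
        return "service processor was reset; no further action"
    # Step 3: A7006995 means partitions are being powered off.
    if a7006995_displayed:
        return "follow the A7006995 SRC description if partitions do not terminate"
    # Step 4: the A1xx SRC has stayed on the panel for more than 40 minutes.
    if minutes_a1xx_on_panel > 40:
        return "contact support to terminate partitions and force a server dump"
    return "contact your next level of support"
```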
FSPSP16
Save any error log and dump data and contact your next level of support for assistance.
FSPSP17
A system uncorrectable error has occurred.
1. Look for other serviceable events and use the failing items listed with them to correct the problem.
2. If you need to run the system in a degraded mode until you can perform the service actions, do the
following:
a. Power off the system (see Powering on and powering off the system).
b. Power on the system (see Powering on and powering off the system) to allow the memory
diagnostics to clean up the memory and deconfigure any defective parts.
This ends the procedure.
FSPSP18
A problem has been detected in the platform Licensed Internal Code.
Perform LICCODE. This ends the procedure.
FSPSP20
A failing item has been detected by a hardware procedure.
To run full hardware diagnostics, use the following table.
System:
Action:
8202-E4B, 8202-E4C,
8202-E4D, 8205-E6B,
8205-E6C, 8205-E6D,
8231-E2B, 8231-E1C,
8231-E1D, 8231-E2C,
8231-E2D, 8248-L4T,
8268-E1D, 8408-E8D,
9109-RMD, 9119-FHB, or
9125-F2C
Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
8233-E8B or 8236-E8C
Determine the level of firmware the system is running.
Is system firmware AL710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
8412-EAD, 9117-MMB,
9117-MMC, 9117-MMD,
9179-MHB, 9179-MHC, or
9179-MHD
Determine the level of firmware the system is running.
Is system firmware AM710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
If a new SRC occurs, repair the system using that reference code.
If an incomplete occurs, go to Managing the advanced system management interface (ASMI) menus to
power off, check for deconfigured components and perform the action mentioned in the previous table.
This ends the procedure.
FSPSP22
The system has detected that a processor chip is missing from the system configuration because JTAG
lines are not working.
See System FRU locations for instructions for removing and replacing FRUs.
For 8202-E4B, 8202-E4C, 8202-E4D, 8205-E6B, 8205-E6C, 8205-E6D, 8231-E1C, 8231-E1D, 8231-E2C,
8231-E2D, or 8268-E1D, perform the following steps:
1. Power off the system. See Powering on and powering off the system.
2. Replace the system backplane at location Un-P1.
3. Replace the first processor module at location Un-P1-C11.
4. Replace the second processor module if present at location Un-P1-C10.
5. Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
For 8231-E2B, perform the following steps:
1. Power off the system. See Powering on and powering off the system.
2. Replace the system backplane at location Un-P1.
3. Replace the first processor module at location Un-P1-C10.
4. Replace the second processor module if present at location Un-P1-C9.
5. Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
This ends the procedure.
For 8233-E8B or 8236-E8C, perform the following steps:
1. Power off the system. See Powering on and powering off the system.
2. Replace the system backplane at location Un-P1.
3. Replace the processor cards at locations Un-P1-C13 through Un-P1-C16.
4. Determine the level of firmware the system is running.
Is system firmware AL710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
This ends the procedure.
For 8248-L4T, 8408-E8D, or 9109-RMD, perform the following steps:
1. Power off the system. See Powering on and powering off the system.
2. Replace the service processor card at location Un-P1-C1.
3. Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
4. Does the problem persist?
No:
This ends the procedure.
Yes:
Continue with the next step.
5. Replace the system processor modules one at a time (Un-P3-C12, then Un-P3-C17, then Un-P3-C13,
then Un-P3-C16, if present). See System FRU locations for information about FRU locations for the
system you are servicing.
6. Does the problem persist?
No:
This ends the procedure.
Yes:
Continue with the next step.
7. Replace the system processor assembly at location Un-P3.
This ends the procedure.
For 8412-EAD, 9117-MMB, 9117-MMC, 9117-MMD, 9179-MHB, 9179-MHC, or 9179-MHD, perform the
following steps:
1. Power off the system. See Powering on and powering off the system.
2. Replace the service processor at location Un-P1-C1.
3. Determine the level of firmware the system is running.
Is system firmware AM710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
4. Does the problem persist?
No:
This ends the procedure.
Yes:
Continue with the next step.
5. Perform the following for each system processor until the problem is resolved:
a. Power off the system.
b. Replace the processor card at location Un-P3.
c. Determine the level of firmware the system is running.
Is system firmware AM710_xxx installed?
Yes: Perform a slow boot after replacing each processor. See Performing a slow boot.
No: Power up the system to hypervisor standby and see if the error recurs.
This ends the procedure.
For 9119-FHB, perform the following steps:
1. Power off the system. See Powering on and powering off the system.
2. Replace the node controllers in the failing node, one at a time.
3. Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
4. Does the problem persist?
No:
This ends the procedure.
Yes:
Continue with the next step.
5. Perform the following for each system processor until the problem is resolved:
a. Power off the system.
b. Replace each processor in the failing node, one at a time.
c. Power up the system to hypervisor standby after replacing each processor.
Note: Slow boot is not supported.
This ends the procedure.
For 9125-F2C, perform the following steps:
1. Power off the system. See Powering on and powering off the system.
2. Replace the DCCAs one at a time.
3. Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
4. Does the problem persist?
No:
This ends the procedure.
Yes:
Continue with the next step.
5. Replug all of the memory cards.
6. Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
7. Does the problem persist?
No:
This ends the procedure.
Yes:
Replace the system backplane at location Un-P1.
FSPSP23
The system needs to perform a service processor dump.
1. Perform a service processor dump (see Performing a platform system dump or service processor
dump).
2. Attempt to perform an IPL on the system.
3. Save the service processor dump to storage. See Managing dumps.
4. Contact your next level of support. This ends the procedure.
FSPSP24
The system is running degraded. Array bit steering may be able to correct this problem without replacing
hardware.
1. Power off the system (see Powering on and powering off the system).
2. To determine the action to perform, use the following table.
System:
Action:
8202-E4B, 8202-E4C,
8202-E4D, 8205-E6B,
8205-E6C, 8205-E6D,
8231-E2B, 8231-E1C,
8231-E1D, 8231-E2C,
8231-E2D, 8248-L4T,
8268-E1D, 8408-E8D,
9109-RMD, 9119-FHB,
9125-F2C
Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
8233-E8B, 8236-E8C
Determine the level of firmware the system is running.
Is system firmware AL710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
8412-EAD, 9117-MMB,
9117-MMC, 9117-MMD,
9179-MHB, 9179-MHC, or
9179-MHD
Determine the level of firmware the system is running.
Is system firmware AM710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
This ends the procedure.
3. If the problem persists, replace the FRU that is listed after this procedure in the failing item list. See
System FRU locations for instructions. This ends the procedure.
FSPSP25
The server has detected an over-temperature thermal fault.
1. Before replacing any server hardware FRUs, look in the error log for thermal problems related to fans,
power supplies, etc. Perform all service actions for the thermal problem SRCs first before continuing
with any other failing items in the current SRC. Thermal problems are associated with 1100 xxxx
SRCs, where xxxx may be any of the following:
v 1514
v 1524
v 7201
v 7203
v 7205
v 7610
v 7611
v 7620
v 7621
v 7630
v 7631
v 7640
v 7641
2. If no thermal-related SRCs or problems can be found, replace the server hardware FRU associated
with the current SRC. See System FRU locations for instructions. This ends the procedure.
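Step 1 asks you to look through the error log for SRCs of the form 1100 xxxx, where xxxx is one of the listed values. A minimal Python sketch of that filter follows. It assumes the SRCs have already been collected as strings, and the function names are hypothetical.

```python
# The xxxx values that step 1 associates with thermal problems.
THERMAL_SUFFIXES = {
    "1514", "1524", "7201", "7203", "7205",
    "7610", "7611", "7620", "7621", "7630", "7631", "7640", "7641",
}


def is_thermal_src(src):
    """Return True if src has the form 1100 xxxx with xxxx in the list above."""
    digits = src.replace(" ", "").upper()
    return (
        len(digits) == 8
        and digits.startswith("1100")
        and digits[4:] in THERMAL_SUFFIXES
    )


def thermal_srcs(error_log_srcs):
    """Return the thermal SRCs from an iterable of SRC strings, in log order."""
    return [src for src in error_log_srcs if is_thermal_src(src)]
```

If `thermal_srcs` returns an empty list, step 2 applies and the FRU associated with the current SRC is the next item to service.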
FSPSP27
A problem has been detected on an attention line. If the FRU replaced before this procedure did not
correct the problem, perform the following:
See System FRU locations to locate the correct system location of the part you are replacing.
To perform the correct FSPSP27 procedure for the system that you are servicing, select your machine type and
model (system) from among the following:
8202-E4B, 8202-E4C, 8202-E4D, 8205-E6B, 8205-E6C, or 8205-E6D
8231-E2B, 8231-E1C, 8231-E1D, 8231-E2C, 8231-E2D, or 8268-E1D
8233-E8B or 8236-E8C
8248-L4T, 8408-E8D, or 9109-RMD
8412-EAD, 9117-MMB, 9117-MMC, 9117-MMD, 9179-MHB, 9179-MHC, or 9179-MHD
9119-FHB
8202-E4B, 8202-E4C, 8202-E4D, 8205-E6B, 8205-E6C, or 8205-E6D
1. Was the FRU listed before this procedure a memory DIMM (Un-P1-C15-Cx or Un-P1-C16-Cx or
Un-P1-C17-Cx or Un-P1-C18-Cx)?
No:
Go to step 4.
Yes:
Continue with the next step.
2. Perform the following steps:
a. Replace the memory card (Un-P1-C15 or Un-P1-C16 or Un-P1-C17 or Un-P1-C18) that the DIMM
was plugged into.
b. Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
Does the problem persist?
No:
This ends the procedure.
Yes:
Continue with the next step.
3. Perform the following steps:
a. Replace the system backplane (Un-P1).
b. Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
Does the problem persist?
No:
This ends the procedure.
Yes:
Continue with the next step.
4. Was the FRU listed before this procedure a processor module (Un-P1-C11 or Un-P1-C10)?
No:
This ends the procedure.
Yes:
Continue with the next step.
5. Perform the following steps:
a. Replace the system backplane (Un-P1).
b. Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
This ends the procedure.
8231-E2B, 8231-E1C, 8231-E1D, 8231-E2C, 8231-E2D, or 8268-E1D
1. Was the FRU listed before this procedure a memory DIMM (Un-P1-C14-Cx or Un-P1-C15-Cx or
Un-P1-C16-Cx or Un-P1-C17-Cx)?
No:
Go to step 4.
Yes:
Continue with the next step.
2. Perform the following steps:
a. Replace the memory card (Un-P1-C14 or Un-P1-C15 or Un-P1-C16 or Un-P1-C17) that the DIMM
was plugged into.
b. Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
Does the problem persist?
No:
This ends the procedure.
Yes:
Continue with the next step.
3. Perform the following steps:
a. Replace the system backplane (Un-P1).
b. Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
Does the problem persist?
No:
This ends the procedure.
Yes:
Continue with the next step.
4. Was the FRU listed before this procedure a processor module?
No:
This ends the procedure.
Yes:
Continue with the next step.
5. Perform the following steps:
a. Replace the system backplane (Un-P1).
b. Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
This ends the procedure.
8233-E8B or 8236-E8C
1. Was the FRU listed before this procedure a memory DIMM (Un-P1-C13-Cx or Un-P1-C14-Cx or
Un-P1-C15-Cx or Un-P1-C16-Cx)?
No:
Go to step 4.
Yes:
Continue with the next step.
2. Perform the following steps:
a. Replace the processor card (Un-P1-C13 or Un-P1-C14 or Un-P1-C15 or Un-P1-C16) that the DIMM
was plugged into.
b. Determine the level of firmware the system is running.
Is system firmware AL710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
Does the problem persist?
No:
This ends the procedure.
Yes:
Continue with the next step.
3. Perform the following steps:
a. Replace the system backplane (Un-P1).
b. Determine the level of firmware the system is running.
Is system firmware AL710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
Does the problem persist?
No:
This ends the procedure.
Yes:
Continue with the next step.
4. Was the FRU listed before this procedure a processor card (Un-P1-C13 or Un-P1-C14 or Un-P1-C15 or
Un-P1-C16) or GX adapter card (Un-P1-C8 or Un-P1-C7)?
No:
This ends the procedure.
Yes:
Continue with the next step.
5. Perform the following steps:
a. Replace the system backplane (Un-P1).
b. Determine the level of firmware the system is running.
Is system firmware AL710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
This ends the procedure.
8248-L4T, 8408-E8D, or 9109-RMD
1. Was the FRU listed before this procedure a memory DIMM (Un-P3-Cx-Cy, where x = 1 through 4
and 6 through 9, and y = 1 through 8)?
No:
Go to step 8.
Yes:
A memory DIMM attention line has an error. Continue with the next step.
2. Replace the service processor card at location Un-P1-C1.
3. Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
Does the problem persist?
No:
This ends the procedure.
Yes:
Continue with the next step.
4. Replace the memory riser card (Un-P3-Cx, where x = 1 through 4 and 6 through 9) on which you
previously replaced DIMMs.
5. Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
Does the problem persist?
No:
This ends the procedure.
Yes:
Continue with the next step.
6. Replace the system processor assembly at location Un-P3.
7. Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
Does the problem persist?
No:
This ends the procedure.
Yes:
Continue with the next step.
8. Was the FRU listed before this procedure the system processor assembly (Un-P3) or GX adapter card
(Un-P1-C2 or Un-P1-C3)?
No:
Go to step 13.
Yes:
A processor card or GX adapter card attention line has an error. Continue with the next step.
9. Replace the service processor card at location Un-P1-C1.
10. Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
Does the problem persist?
No:
This ends the procedure.
Yes:
Continue with the next step.
11. Replace the midplane at location Un-P1.
12. Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
This ends the procedure.
13. Was the FRU listed before this procedure the midplane (Un-P1)?
No:
Contact your next level of support. This ends the procedure.
Yes:
An I/O chip attention line has an error. Continue to the next step.
14. Replace the service processor card at location Un-P1-C1.
15. Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
This ends the procedure.
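For the 8248-L4T, 8408-E8D, and 9109-RMD, the questions in steps 1, 8, and 13 classify the previously listed FRU by its location code. The Python sketch below shows one way to express that classification. It assumes the location code ends in a position part such as `P3-C1-C2` preceded by a unit prefix, and the function name `entry_step` is hypothetical.

```python
import re

# Matches the position part of a location code, for example "P3-C1-C2".
_POSITION = re.compile(r"-(P\d+(?:-C\d+)*)$")

RISER_SLOTS = {1, 2, 3, 4, 6, 7, 8, 9}  # x in Un-P3-Cx-Cy
DIMM_SLOTS = range(1, 9)                # y in Un-P3-Cx-Cy


def entry_step(location):
    """Map the previously listed FRU location to the step that handles it.

    Returns 2, 9, or 14 (the first repair step of each branch), or None when
    the procedure says to contact the next level of support.
    """
    match = _POSITION.search(location.strip())
    if not match:
        return None
    numbers = [int(part[1:]) for part in match.group(1).split("-")]
    if len(numbers) == 3 and numbers[0] == 3:
        if numbers[1] in RISER_SLOTS and numbers[2] in DIMM_SLOTS:
            return 2   # memory DIMM attention line
        return None
    if numbers == [3] or numbers in ([1, 2], [1, 3]):
        return 9       # system processor assembly or GX adapter card
    if numbers == [1]:
        return 14      # midplane, I/O chip attention line
    return None
```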
8412-EAD, 9117-MMB, 9117-MMC, 9117-MMD, 9179-MHB, 9179-MHC, or 9179-MHD
1. Was the FRU listed before this procedure a memory DIMM (Un-P3-Cx)?
No:
Go to step 10.
Yes:
A memory DIMM attention line has an error. Continue to the next step.
2. Replace the service processor (Un-P1-C1) from the same drawer as the memory DIMM.
3. Determine the level of firmware the system is running.
Is system firmware AM710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
4. Does the problem persist?
No:
This ends the procedure.
Yes:
Continue to the next step.
5. Replace the processor card (Un-P3) on which you previously replaced DIMMs.
6. Determine the level of firmware the system is running.
Is system firmware AM710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
7. Does the problem persist?
No:
This ends the procedure.
Yes:
Continue with the next step.
8. Replace the system backplane (Un-P1) from the same drawer as the memory DIMM.
9. Determine the level of firmware the system is running.
Is system firmware AM710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
10. Was the FRU listed before this procedure a processor card (Un-P3) or GX adapter card (Un-P1-C2 or
Un-P1-C3)?
No:
Go to step 16.
Yes:
A processor card or GX adapter card attention line has an error. Continue to the next step.
11. Replace the service processor (Un-P1-C1) from the same drawer as the processor card or GX adapter
card.
12. Determine the level of firmware the system is running.
Is system firmware AM710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
13. Does the problem persist?
No:
This ends the procedure.
Yes:
Continue with the next step.
14. Replace the system backplane (Un-P1) in the same drawer as the processor card or GX adapter card.
15. Determine the level of firmware the system is running.
Is system firmware AM710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
This ends the procedure.
16. Was the FRU listed before this procedure the system backplane (Un-P1)?
No:
Contact your next level of support. This ends the procedure.
Yes:
An I/O chip attention line has an error. Continue to the next step.
17. Replace the service processor (Un-P1-C1) from the same drawer as the system backplane.
18. Determine the level of firmware the system is running.
Is system firmware AM710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
This ends the procedure.
9119-FHB
1. Was the FRU listed before this procedure a memory DIMM (Un-Pm-Cx, where m = 2 through 9 and x
= 2 through 21, 24 through 29, and 33 through 38), a processor (Un-Pm-Cx, where m = 2 through 9
and x = 32, 23, 22, or 31), or GX adapter card (Un-Pm-Cx, where m = 2 through 9 and x = 44, 41, 40,
or 39)?
No:
This ends the procedure.
Yes:
Continue with the next step.
2. Replace both node controllers (Un-Pm-C42 and Un-Pm-C43, where m = 2 through 9) from the same
node as the memory DIMM, processor, or GX adapter card.
3. Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
4. Does the problem persist?
No:
This ends the procedure.
Yes:
Continue with the next step.
5. Replace the processor book (Un-Pm, where m = 2 through 9), the memory DIMM, processor, or GX
adapter card.
6. Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
This ends the procedure.
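Step 1 for the 9119-FHB identifies the FRU type from the node number m and the card slot x in Un-Pm-Cx, and step 2 names the two node controllers in the same node. The following Python sketch encodes those ranges. The function name and the returned tuple layout are assumptions made for illustration.

```python
import re

# Matches the node and card slot at the end of a location code, e.g. "-P4-C23".
_NODE_FRU = re.compile(r"-P(\d+)-C(\d+)$")

# Slot ranges from step 1: 2 through 21, 24 through 29, and 33 through 38.
DIMM_SLOTS = set(range(2, 22)) | set(range(24, 30)) | set(range(33, 39))
PROCESSOR_SLOTS = {22, 23, 31, 32}
GX_SLOTS = {39, 40, 41, 44}


def classify_node_fru(location):
    """Return (fru_type, node_controllers), or None if step 1 does not match."""
    match = _NODE_FRU.search(location.strip())
    if not match:
        return None
    node, slot = int(match.group(1)), int(match.group(2))
    if not 2 <= node <= 9:
        return None
    if slot in DIMM_SLOTS:
        fru_type = "memory DIMM"
    elif slot in PROCESSOR_SLOTS:
        fru_type = "processor"
    elif slot in GX_SLOTS:
        fru_type = "GX adapter card"
    else:
        return None
    # Step 2 replaces both node controllers in the same node.
    return fru_type, (f"P{node}-C42", f"P{node}-C43")
```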
FSPSP28
The resource ID (RID) of one or more FRUs could not be found in the vital product data (VPD) table.
If you need additional information for failing part numbers, location codes, or removal and replacement
procedures, see Part locations and location codes. Select your machine type and model number to find
additional location codes, part numbers, or replacement procedures for your system.
1. Go to your serviceable event view and find other failing items that read "FSPxxxx", where xxxx is a
4-digit hexadecimal number that represents the RID. Do not perform any actions on these failing items.
2. Record all failing items, RID numbers, and the model of the system and contact your next level of
support. This ends the procedure.
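The failing items described in step 1 follow a fixed pattern: the letters FSP followed by exactly four hexadecimal digits. As a sketch, assuming the failing item names have been copied out of the serviceable event view as plain strings, the RIDs to record in step 2 could be collected as follows. The function name `extract_rids` is hypothetical.

```python
import re

# "FSP" followed by exactly four hexadecimal digits (the RID).
_RID_ITEM = re.compile(r"FSP([0-9A-F]{4})", re.IGNORECASE)


def extract_rids(failing_items):
    """Return the RIDs of all failing items that read FSPxxxx, in list order."""
    rids = []
    for item in failing_items:
        match = _RID_ITEM.fullmatch(item.strip())
        if match:
            rids.append(match.group(1).upper())
    return rids
```

Isolation procedure names such as FSPSP28 are not matched, because the letter S is not a hexadecimal digit.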
FSPSP29
The system has detected that all I/O bridges are missing from the system configuration.
Select the system that you are servicing and then perform the indicated FSPSP29 procedure.
• 8202-E4B, 8202-E4C, 8202-E4D, 8205-E6B, 8205-E6C, 8205-E6D, 8231-E2B, 8231-E1C, 8231-E1D,
8231-E2C, 8231-E2D, 8268-E1D
• 8233-E8B, 8236-E8C
• 8248-L4T, 8408-E8D, 9109-RMD
• 8412-EAD, 9117-MMB, 9117-MMC, 9117-MMD, 9179-MHB, 9179-MHC, or 9179-MHD
8202-E4B, 8202-E4C, 8202-E4D, 8205-E6B, 8205-E6C, 8205-E6D, 8231-E2B, 8231-E1C, 8231-E1D, 8231-E2C,
8231-E2D, 8268-E1D
1. Power off the system. To review the "Powering on and powering off" procedure, go to Powering on
and powering off the system.
2. Replace the system backplane (Un-P1). See System FRU locations and locate the correct location of the
part needing servicing on your system.
3. Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
This ends the procedure.
8233-E8B, 8236-E8C
1. Power off the system. To review the "Powering on and powering off" procedure, go to Powering on
and powering off the system.
2. Replace the system backplane (Un-P1). See System FRU locations for the location of the failing item.
3. Determine the level of firmware the system is running.
Is system firmware AL710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
This ends the procedure.
8248-L4T, 8408-E8D, 9109-RMD
1. Power off the system. To review the "Powering on and powering off" procedure, go to Powering on
and powering off the system.
2. Replace the service processor card at location Un-P1-C1. See System FRU locations for the location of
the failing item.
3. Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
4. Does the problem persist?
No:
Go to Verify a repair. This ends the procedure.
Yes:
Continue with the next step.
5. Replace the I/O backplane at location Un-P2.
6. Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
7. Does the problem persist?
No:
Go to Verify a repair. This ends the procedure.
Yes:
Replace the midplane at location Un-P1. This ends the procedure.
8412-EAD, 9117-MMB, 9117-MMC, 9117-MMD, 9179-MHB, 9179-MHC, or 9179-MHD
1. Power off the system. To review the "Powering on and powering off" procedure, go to Powering on
and powering off the system.
2. Replace the service processor (Un-P1-C1).
3. Determine the level of firmware the system is running.
Is system firmware AM710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
4. Does the problem persist?
No:
Go to Verify a repair. This ends the procedure.
Yes:
Continue with the next step.
5. Replace the system backplane (Un-P1).
6. Determine the level of firmware the system is running.
Is system firmware AM710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
7. Does the problem persist?
No:
Go to Verify a repair. This ends the procedure.
Yes:
Continue with the next step.
8. Replace each secondary unit system backplane (Un-P1), one at a time, until the problem is resolved.
This ends the procedure.
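Each FSPSP29 variant is an ordered list of FRUs to replace, with a power-on and a check after each replacement. The Python sketch below records those orders as data so that the next FRU can be looked up. The data layout and the function names are assumptions made for illustration, and the FRU descriptions are transcribed from the steps above.

```python
# Ordered FRU replacements per model group, transcribed from the FSPSP29 steps.
_ORDERS = (
    (
        {"8202-E4B", "8202-E4C", "8202-E4D", "8205-E6B", "8205-E6C",
         "8205-E6D", "8231-E2B", "8231-E1C", "8231-E1D", "8231-E2C",
         "8231-E2D", "8268-E1D", "8233-E8B", "8236-E8C"},
        ("system backplane (Un-P1)",),
    ),
    (
        {"8248-L4T", "8408-E8D", "9109-RMD"},
        ("service processor card (Un-P1-C1)",
         "I/O backplane (Un-P2)",
         "midplane (Un-P1)"),
    ),
    (
        {"8412-EAD", "9117-MMB", "9117-MMC", "9117-MMD",
         "9179-MHB", "9179-MHC", "9179-MHD"},
        ("service processor (Un-P1-C1)",
         "system backplane (Un-P1)",
         "secondary unit system backplanes (Un-P1), one at a time"),
    ),
)


def replacement_order(model):
    """Return the ordered FRU tuple for a model listed under FSPSP29."""
    model = model.strip().upper()
    for models, order in _ORDERS:
        if model in models:
            return order
    raise ValueError(f"model {model!r} is not listed under FSPSP29")


def next_fru(model, replaced_count):
    """Return the next FRU to replace, or None when the list is exhausted."""
    order = replacement_order(model)
    if 0 <= replaced_count < len(order):
        return order[replaced_count]
    return None
```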
FSPSP30
A problem has been encountered accessing the VPD card or the data found on the VPD card has been
corrupted.
This error occurred before VPD collection was completed. No location codes have been created.
1. Power off the system. To review the "Powering on and powering off" procedure, go to Powering on
and powering off the system.
2. Clear any deconfiguration errors for the VPD card.
3. Perform the action indicated in the table below for your system.
System: 8202-E4B, 8202-E4C, 8202-E4D, 8205-E6B, 8205-E6C, 8205-E6D
Action: Replace the following one at a time in the order listed:
1. VPD card at location Un-P1-C20. See System FRU locations to locate the correct system location of
the part you are replacing.
2. System backplane at location Un-P1.
This ends the procedure.
System: 8231-E2B, 8231-E1C, 8231-E1D, 8231-E2C, 8231-E2D, 8268-E1D
Action: Replace the following one at a time in the order listed:
1. VPD card at location Un-P1-C19. See System FRU locations to locate the correct system location of
the part you are replacing.
2. System backplane at location Un-P1.
This ends the procedure.
System: 8233-E8B, 8236-E8C
Action: Replace the following one at a time in the order listed:
1. VPD card at location Un-P1-C9. See System FRU locations to locate the correct system location of
the part you are replacing.
2. System backplane at location Un-P1.
This ends the procedure.
System: 8248-L4T, 8408-E8D, 9109-RMD
Action: Replace the following one at a time in the order listed:
1. Service processor card at location Un-P1-C1. See System FRU locations to locate the correct system
location of the part you are replacing.
2. VPD card at location Un-P2-C7.
This ends the procedure.
System: 8412-EAD, 9117-MMB, 9117-MMC, 9117-MMD, 9179-MHB, 9179-MHC, or 9179-MHD
Action:
1. Replace all of the service processors.
2. Determine the level of firmware the system is running.
Is system firmware AM710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
3. Does the problem persist?
No: Go to Verify a repair. This ends the procedure.
Yes: Continue with the next step.
4. Replace the VPD card at location Un-P2-C7.
5. Does the problem persist?
No: Go to Verify a repair. This ends the procedure.
Yes: Replace the system backplane at location Un-P1. This ends the procedure.
System: 9119-FHB
Action:
1. Replace both of the system controller cards (Un-P1-C2, Un-P1-C5).
2. Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
3. Does the problem persist?
No: This ends the procedure.
Yes: Replace the VPD card at location Un-P1-C1. This ends the procedure.
System: 9125-F2C
Action:
1. Replace both of the DCCAs (Un-P1-C147, Un-P1-C148).
2. Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
3. Does the problem persist?
No: This ends the procedure.
Yes: Replace the VPD card at location Un-P1-C146. This ends the procedure.
FSPSP31
The service processor has detected that one or more of the required fields in the system VPD has not
been initialized.
1. Log into ASMI with authorized service provider authority (see Accessing the Advanced System
Management Interface).
2. Set the system VPD values (see Setting the system enclosure type and Setting the system identifiers).
Note: The service processor will automatically reset when leaving the ASMI after updating the system
VPD.
3. Power on the system. See Powering on and powering off the system. This ends the procedure.
FSPSP32
A problem with the enclosure has been found.
The problem might be caused by one of the following items:
• The enclosure VPD cannot be found.
• The enclosure serial number is not programmed or has the same value as the system serial number.
• The enclosure feature code is not programmed.
See System FRU locations for instructions for removing and replacing FRUs.
Perform the following steps:
1. Record the reason code (the last 4 characters of word 11) from the SRC by looking at the operator
panel or accessing the error log with the ASMI.
2. Is the reason code A46F?
No: Go to step 5.
Yes: Continue with the next step.
3. Check for and apply any server firmware updates. Does the problem persist?
No: This ends the procedure.
Yes: Continue with the next step.
4. Use the following table to determine the action to perform.
System: 8202-E4B, 8202-E4C, 8202-E4D, 8205-E6B, 8205-E6C, 8205-E6D, 8231-E2B, 8231-E1C, 8231-E1D,
8231-E2C, 8231-E2D, 8268-E1D
Action: Perform the following steps:
1. Power off the system. See Powering on and powering off the system.
2. Replace the system backplane (Un-P1).
3. Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
Does the problem persist?
No: This ends the procedure.
Yes: Contact your next level of support. This ends the procedure.
System: 8233-E8B, 8236-E8C
Action: Perform the following steps:
1. Power off the system. See Powering on and powering off the system.
2. Replace the system backplane (Un-P1).
3. Determine the level of firmware the system is running.
Is system firmware AL710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
Does the problem persist?
No: This ends the procedure.
Yes: Contact your next level of support. This ends the procedure.
System: 8248-L4T, 8408-E8D, 9109-RMD
Action: Perform the following steps:
1. Power off the system. See Powering on and powering off the system.
2. Replace the service processor card at location Un-P1-C1.
3. Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
Does the problem persist?
No: This ends the procedure.
Yes: Contact your next level of support. This ends the procedure.
System: 8412-EAD, 9117-MMB, 9117-MMC, 9117-MMD, 9179-MHB, 9179-MHC, or 9179-MHD
Action: Perform the following steps:
1. Power off the system. See Powering on and powering off the system.
2. Replace the service processor (Un-P1-C1).
3. Determine the level of firmware the system is running.
Is system firmware AM710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
Does the problem persist?
No: This ends the procedure.
Yes: Continue with the next step.
4. Power off the system. See Powering on and powering off the system.
5. Replace the system backplane.
6. Determine the level of firmware the system is running.
Is system firmware AM710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
Does the problem persist?
No: This ends the procedure.
Yes: Contact your next level of support. This ends the procedure.
System: 9119-FHB
Action: Perform the following steps:
1. Power off the system. See Powering on and powering off the system.
2. Replace both system controller cards (Un-P1-C2 and Un-P1-C5).
3. Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
Does the problem persist?
No: This ends the procedure.
Yes: Continue with the next step.
4. Power off the system. See Powering on and powering off the system.
5. Replace the midplane (Un-P1).
6. Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
Does the problem persist?
No: This ends the procedure.
Yes: Contact your next level of support. This ends the procedure.
System: 9125-F2C
Action: Perform the following steps:
1. Power off the system. See Powering on and powering off the system.
2. Replace both of the DCCAs (locations Un-P1-C147 and Un-P1-C148).
3. Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
Does the problem persist?
No: This ends the procedure.
Yes: Continue with the next step.
4. Power off the system. See Powering on and powering off the system.
5. Replace the system backplane at location Un-P1.
6. Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
Does the problem persist?
No: This ends the procedure.
Yes: Contact your next level of support. This ends the procedure.
5. Is the reason code A41C or A460?
No: Go to step 7.
Yes: Continue with the next step.
6. Perform the following steps:
a. Set the enclosure serial number using the ASMI. See Setting the system enclosure type.
Note: The enclosure serial number can be found on the label located on the system chassis.
b. The service processor will automatically reset when leaving the ASMI after updating the serial
number.
c. To determine the action to perform, use the following table.
System: 8202-E4B, 8202-E4C, 8202-E4D, 8205-E6B, 8205-E6C, 8205-E6D, 8231-E2B, 8231-E1C, 8231-E1D,
8231-E2C, 8231-E2D, 8248-L4T, 8268-E1D, 8408-E8D, 9109-RMD, 9119-FHB, 9125-F2C
Action: Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
System: 8233-E8B, 8236-E8C
Action: Determine the level of firmware the system is running.
Is system firmware AL710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
System: 8412-EAD, 9117-MMB, 9117-MMC, 9117-MMD, 9179-MHB, 9179-MHC, or 9179-MHD
Action: Determine the level of firmware the system is running.
Is system firmware AM710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
Does the problem persist?
No: This ends the procedure.
Yes: Contact your next level of support. This ends the procedure.
7. Is the reason code A45F?
No: Contact your next level of support. This ends the procedure.
Yes: Continue with the next step.
8. Perform the following steps:
a. Set the enclosure feature code using the ASMI. See Setting the system enclosure type. The service
processor will automatically reset when leaving the ASMI after updating the feature code.
b. To determine the action to perform, use the following table.
System: 8202-E4B, 8202-E4C, 8202-E4D, 8205-E6B, 8205-E6C, 8205-E6D, 8231-E2B, 8231-E1C, 8231-E1D,
8231-E2C, 8231-E2D, 8248-L4T, 8268-E1D, 8408-E8D, 9109-RMD, 9119-FHB, 9125-F2C
Action: Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
System: 8233-E8B, 8236-E8C
Action: Determine the level of firmware the system is running.
Is system firmware AL710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
System: 8412-EAD, 9117-MMB, 9117-MMC, 9117-MMD, 9179-MHB, 9179-MHC, or 9179-MHD
Action: Determine the level of firmware the system is running.
Is system firmware AM710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
Does the problem persist?
No: This ends the procedure.
Yes: Contact your next level of support. This ends the procedure.
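FSPSP32 branches on the reason code, which step 1 defines as the last 4 characters of word 11 of the SRC. The Python sketch below shows that extraction and the resulting branch. It assumes word 11 is available as a string, and the function names are hypothetical.

```python
def reason_code(word11):
    """Return the last 4 characters of SRC word 11, in uppercase."""
    text = word11.strip().upper()
    if len(text) < 4:
        raise ValueError("word 11 must contain at least 4 characters")
    return text[-4:]


def next_step(code):
    """Return the FSPSP32 step to continue with, or None to contact support."""
    if code == "A46F":
        return 3   # check for firmware updates, then use the step 4 table
    if code in ("A41C", "A460"):
        return 6   # set the enclosure serial number using the ASMI
    if code == "A45F":
        return 8   # set the enclosure feature code using the ASMI
    return None    # step 7, No: contact your next level of support
```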
FSPSP33
A problem has been detected in the connection with the management console.
1. Ensure that the cable connectors to the network from the management console, managed system,
managed system partitions, and other management consoles are securely connected. If the connections
are not secure, plug the cables back into the proper locations and make sure that the connections are
good.
2. Check to see if the management console is working correctly or if the management console was
disconnected incorrectly from the managed system, managed system partitions, and other
management consoles. If either has happened, reboot the management console.
3. Verify that the network connection between the HMC, managed system, managed system partitions,
and other HMCs is working properly.
4. If applicable, service the next FRU. See System FRU locations for instructions.
5. If the problem persists, contact your next level of support. This ends the procedure.
FSPSP34
The memory cards are plugged in an invalid configuration and cannot be used by the system.
See System FRU locations for information about locating FRUs.
1. Is the SRC B1xx B17B?
Yes: There are one or more memory cards that are incompatible with the other memory cards
plugged into the same board in the system. To correct the error, remove the medium severity
failing items identified in the failing item list and replace them with compatible ones. If you are
unable to determine which memory cards are compatible, contact your next level of support. This
ends the procedure.
No: Continue with the next step.
2. Is the SRC B1xx B180?
Yes: There are one or more memory cards in the system which are not supported. Remove the
failing items identified in the failing item list and replace them with compatible ones. If you are
unable to determine which memory cards are compatible, contact your next level of support. This
ends the procedure.
No: Continue with the next step.
3. Is the SRC B1xx C029?
Yes: A memory module is a different type than the other memory modules in the same group. The
additional parts in the FRU list will include all memory modules in the group that contain the
error. To correct the error, exchange the memory modules of the incorrect type with those of the
required type. This ends the procedure.
No: Continue with the next step.
4. Is the SRC B1xx C02A?
Yes: A memory module is missing from the system. The additional parts in the FRU list will
include all memory modules in the group with the missing card. To correct the error, visually
check the system to determine which of these modules is missing, and add the module. This ends
the procedure.
No: Continue with the next step.
5. Is the SRC B1xx C02B?
Yes: A group of memory modules is missing and is required so that other memory modules on
the board can be configured. The additional parts in the FRU list will include all missing memory
modules in the group. To correct the error, add or remove these modules to the required locations.
This ends the procedure.
No: Continue with the next step.
6. Is the SRC B1xx C036, B1xx C04E, or B1xx C067?
Yes: A memory module is not supported by this system. The additional parts in the FRU list will
include all memory modules in the group that contains the unsupported modules. To correct the
error, remove these modules from the system or replace them with the correct type. This ends the
procedure.
No: Continue with the next step.
7. Is the SRC B1xx C071?
Yes: Fewer than eight DIMMs are installed for the processor listed in the FRU list. This system
requires a minimum of eight DIMMs per processor. To correct the error, add as many DIMMs as
are required to meet the minimum of eight. This ends the procedure.
No: Return to Start of call. This ends the procedure.
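The seven questions in FSPSP34 map the last four characters of a B1xx SRC to a corrective action. The Python sketch below summarizes that mapping. The action strings are shortened paraphrases of the steps above and the function names are hypothetical. The helper `dimms_to_add` applies the minimum of eight DIMMs per processor that step 7 states.

```python
UNSUPPORTED = "remove or replace the unsupported memory modules"

# Last four characters of the SRC mapped to a summary of the corrective action.
SRC_ACTIONS = {
    "B17B": "replace the incompatible memory cards",
    "B180": "replace the unsupported memory cards",
    "C029": "exchange the memory modules of the incorrect type",
    "C02A": "add the missing memory module",
    "C02B": "add the missing group of memory modules",
    "C036": UNSUPPORTED,
    "C04E": UNSUPPORTED,
    "C067": UNSUPPORTED,
    "C071": "add DIMMs to reach eight per processor",
}


def memory_config_action(src):
    """Return the action summary for a B1xx SRC, or None (return to Start of call)."""
    digits = src.replace(" ", "").upper()
    if len(digits) != 8 or not digits.startswith("B1"):
        return None
    return SRC_ACTIONS.get(digits[4:])


def dimms_to_add(installed, minimum=8):
    """Return how many DIMMs a processor still needs to reach the minimum."""
    return max(0, minimum - installed)
```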
FSPSP35
The system has detected a problem with a memory controller.
Perform the following steps to enable redundant utilization:
1. Power off the system. See Powering on and powering off the system.
2. To determine the action to perform, use the following table.
System: 8202-E4B, 8202-E4C, 8202-E4D, 8205-E6B, 8205-E6C, 8205-E6D, 8231-E2B, 8231-E1C, 8231-E1D,
8231-E2C, 8231-E2D, 8248-L4T, 8268-E1D, 8408-E8D, 9109-RMD, 9119-FHB, or 9125-F2C
Action: Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
System: 8233-E8B or 8236-E8C
Action: Determine the level of firmware the system is running.
Is system firmware AL710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
System: 8412-EAD, 9117-MMB, 9117-MMC, 9117-MMD, 9179-MHB, 9179-MHC, or 9179-MHD
Action: Determine the level of firmware the system is running.
Is system firmware AM710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
This ends the procedure.
FSPSP36
One or both of the SMP cables connecting the system processor cards on this system are incorrectly
plugged, broken, or not the correct type of cable for this system configuration.
See System FRU locations for instructions for removing and replacing FRUs.
Perform the following:
1. Re-plug the SMP cables (P2-Cx-T1 or P2-Cx-T2) that connect to the system processor cards.
2. To determine the action to perform, use the following table.
System: 8202-E4B, 8202-E4C, 8202-E4D, 8205-E6B, 8205-E6C, 8205-E6D, 8231-E2B, 8231-E1C, 8231-E1D,
8231-E2C, 8231-E2D, 8248-L4T, 8268-E1D, 8408-E8D, 9109-RMD, 9119-FHB, or 9125-F2C
Action: Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
System: 8233-E8B or 8236-E8C
Action: Determine the level of firmware the system is running.
Is system firmware AL710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
System: 8412-EAD, 9117-MMB, 9117-MMC, 9117-MMD, 9179-MHB, 9179-MHC, or 9179-MHD
Action: Determine the level of firmware the system is running.
Is system firmware AM710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
3. Does the problem persist?
Yes: Continue with the next step.
No: This ends the procedure.
4. Replace the SMP cables.
5. To determine the action to perform, use the following table.
System: 8202-E4B, 8202-E4C, 8202-E4D, 8205-E6B, 8205-E6C, 8205-E6D, 8231-E2B, 8231-E1C, 8231-E1D,
8231-E2C, 8231-E2D, 8248-L4T, 8268-E1D, 8408-E8D, 9109-RMD, 9119-FHB, or 9125-F2C
Action: Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
System: 8233-E8B or 8236-E8C
Action: Determine the level of firmware the system is running.
Is system firmware AL710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
System: 8412-EAD, 9117-MMB, 9117-MMC, 9117-MMD, 9179-MHB, 9179-MHC, or 9179-MHD
Action: Determine the level of firmware the system is running.
Is system firmware AM710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
6. Does the problem persist?
No: This ends the procedure.
Yes: Continue with the next step.
7. Perform the following for each system processor card:
a. Power off the system (see Powering on and powering off the system).
b. Remove processor card one in the primary unit.
c. To determine the action to perform, use the following table.
System:
Action:
8202-E4B, 8202-E4C,
8202-E4D, 8205-E6B,
8205-E6C, 8205-E6D,
8231-E2B, 8231-E1C,
8231-E1D, 8231-E2C,
8231-E2D, 8248-L4T,
8268-E1D, 8408-E8D,
9109-RMD, 9119-FHB, or
9125-F2C
Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
8233-E8B or 8236-E8C
Determine the level of firmware the system is running.
Is system firmware AL710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
8412-EAD, 9117-MMB,
9117-MMC, 9117-MMD,
9179-MHB, 9179-MHC, or
9179-MHD
Determine the level of firmware the system is running.
Is system firmware AM710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
d. Does the problem persist?
Yes: Reinstall the processor card you removed and then repeat this step removing each of the
processor cards, one at a time, until all of the processor cards have been removed. After removing
each card, perform the specified action in the following table. If you have removed the last
processor card and the problem persists, contact your next level of support.
System:
Action:
8202-E4B, 8202-E4C,
8202-E4D, 8205-E6B,
8205-E6C, 8205-E6D,
8231-E2B, 8231-E1C,
8231-E1D, 8231-E2C,
8231-E2D, 8248-L4T,
8268-E1D, 8408-E8D,
9109-RMD, 9119-FHB, or
9125-F2C
Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
8233-E8B or 8236-E8C
Determine the level of firmware the system is running.
Is system firmware AL710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
8412-EAD, 9117-MMB,
9117-MMC, 9117-MMD,
9179-MHB, 9179-MHC, or
9179-MHD
Determine the level of firmware the system is running.
Is system firmware AM710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
No: Replace the processor card that you just removed; it is the failing FRU. This ends the procedure.
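The system and action table that recurs in this procedure can be summarized as a small lookup. The following is a minimal sketch only, not part of any IBM tool: the function name `boot_action` is hypothetical, and the assumption that a firmware level is reported as a string such as `AL710_065` is ours. The model lists are copied from the table.

```python
# Hypothetical helper: choose the power-on action after a FRU is replaced.
# Models that may need a slow boot, mapped to the firmware prefix that triggers it.
SLOW_BOOT_PREFIX = {
    "8233-E8B": "AL710", "8236-E8C": "AL710",
    "8412-EAD": "AM710", "9117-MMB": "AM710", "9117-MMC": "AM710",
    "9117-MMD": "AM710", "9179-MHB": "AM710", "9179-MHC": "AM710",
    "9179-MHD": "AM710",
}

# Models for which the table states that slow boot is not supported.
NO_SLOW_BOOT = {
    "8202-E4B", "8202-E4C", "8202-E4D", "8205-E6B", "8205-E6C", "8205-E6D",
    "8231-E2B", "8231-E1C", "8231-E1D", "8231-E2C", "8231-E2D", "8248-L4T",
    "8268-E1D", "8408-E8D", "9109-RMD", "9119-FHB", "9125-F2C",
}

STANDBY = "Power on the system to the hypervisor standby"
SLOW_BOOT = "Perform a slow boot"


def boot_action(model: str, firmware_level: str) -> str:
    """Return the action from the system/action table for one model."""
    prefix = SLOW_BOOT_PREFIX.get(model)
    if prefix is None:
        if model not in NO_SLOW_BOOT:
            raise ValueError(f"model {model!r} is not listed in the table")
        return STANDBY
    # Assumed level format: "<prefix>_<xxx>", for example "AL710_065".
    if firmware_level.startswith(prefix + "_"):
        return SLOW_BOOT
    return STANDBY
```

For example, `boot_action("8233-E8B", "AL710_065")` returns the slow boot action, while any other firmware level on the same model returns the hypervisor standby action.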
FSPSP38
The system has detected an error within the JTAG path.
Replace all failing items, if any, in the failing item list of the service processor error log entry. If the
replacement of all failing items in the failing item list does not fix the problem, contact your next level of
support.
See System FRU locations for information about locating FRUs.
This ends the procedure.
FSPSP42
An error communicating between two system processors was detected.
See System FRU locations for instructions on locating FRUs found on your system.
There is a communication error between the processors and the FRUs listed before this procedure. If you
were unable to correct the problem by replacing FRUs that were previously specified before coming to
this procedure, consider the possibility of failing system backplanes.
Select the system that you are servicing and then perform the indicated FSPSP42 procedure.
• 8233-E8B, 8236-E8C
• 8248-L4T, 8408-E8D, 9109-RMD
• 8412-EAD, 9117-MMB, 9117-MMC, 9117-MMD, 9179-MHB, 9179-MHC, or 9179-MHD
8233-E8B, 8236-E8C
1. Power off the system. See Powering on and powering off the system.
2. Replace the system backplane (Un-P1).
3. Determine the level of firmware the system is running.
Is system firmware AL710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
This ends the procedure.
8248-L4T, 8408-E8D, 9109-RMD
1. Power off the system. See Powering on and powering off the system.
2. Replace the system processor assembly at location (Un-P3).
3. Power on the system to the hypervisor standby.
This ends the procedure.
8412-EAD, 9117-MMB, 9117-MMC, 9117-MMD, 9179-MHB, 9179-MHC, or 9179-MHD
1. Use the location codes for the FRUs that were previously specified to determine if the system
processors are on separate nodes. Are the system processors on separate nodes?
No:
Continue with step 2.
Yes:
Do the following:
a. Power off the system. See Powering on and powering off the system.
b. Replace the SMP cables between the two enclosures.
c. Determine the level of firmware the system is running.
Is system firmware AM710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
d. Is the problem resolved?
Yes:
This ends the procedure.
No:
Continue with the next step.
2. Power off the system. See Powering on and powering off the system.
3. Replace the bad node backplane at location Un-P1.
4. Determine the level of firmware the system is running.
Is system firmware AM710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
This ends the procedure.
FSPSP45
The system has detected an error with the FSI path.
Select the system that you are servicing and then perform the FSPSP45 procedure.
• 8202-E4B, 8202-E4C, 8202-E4D, 8205-E6B, 8205-E6C, 8205-E6D, 8231-E1C, 8231-E1D, 8231-E2C,
  8231-E2D, 8268-E1D
• 8231-E2B
• 8233-E8B, 8236-E8C
• 8248-L4T, 8408-E8D, 9109-RMD
• 8412-EAD, 9117-MMB, 9117-MMC, 9117-MMD, 9179-MHB, 9179-MHC, or 9179-MHD
• 9119-FHB
• 9125-F2C
8202-E4B, 8202-E4C, 8202-E4D, 8205-E6B, 8205-E6C, 8205-E6D, 8231-E1C, 8231-E1D, 8231-E2C, 8231-E2D,
8268-E1D
1. Replace the processor modules one at a time (Un-P1-C11, then Un-P1-C10 if present). See System FRU
locations for information about locating the FRU.
2. Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
Does the problem persist?
No:
This ends the procedure.
Yes:
Continue with the next step.
3. Have all processor modules been replaced?
No:
Go back to step 1 and replace another processor module.
Yes:
Continue with the next step.
4. Replace the system backplane (Un-P1).
5. Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
This ends the procedure.
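The replace-one-at-a-time pattern in the steps above can be sketched as a loop. This is an illustration under stated assumptions, not a service tool: `replace_until_fixed` and the `problem_persists` callback are hypothetical names, and the callback stands in for the technician replacing the FRU, powering on to the hypervisor standby, and checking whether the error recurs.

```python
from typing import Callable, Iterable, Optional

# FRU order for this model group, taken from the steps above:
# processor modules first, then the system backplane.
FSI_FRU_ORDER = ["Un-P1-C11", "Un-P1-C10", "Un-P1"]


def replace_until_fixed(
    frus: Iterable[str],
    problem_persists: Callable[[str], bool],
) -> Optional[str]:
    """Replace FRUs in order and stop at the first one that clears the error.

    Returns the location of the FRU whose replacement resolved the problem,
    or None if the problem persists after every FRU has been replaced.
    """
    for fru in frus:
        # The callback represents: replace this FRU, power on to the
        # hypervisor standby, and report whether the problem is still present.
        if not problem_persists(fru):
            return fru
    return None
```

A call such as `replace_until_fixed(FSI_FRU_ORDER, check)` returns the first location whose replacement cleared the error. A result of `None` means that every listed FRU was replaced without resolving the problem.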
8231-E2B
1. Replace the processor modules one at a time (Un-P1-C10, then Un-P1-C9 if present). See System FRU
locations for information about locating the FRU.
2. Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
Does the problem persist?
No:
This ends the procedure.
Yes:
Continue with the next step.
3. Have all processor modules been replaced?
No:
Go back to step 1 and replace another processor module.
Yes:
Continue with the next step.
4. Replace the system backplane (Un-P1).
5. Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
This ends the procedure.
8233-E8B, 8236-E8C
1. Replace the processor cards one at a time (Un-P1-C13 or Un-P1-C14 or Un-P1-C15 or Un-P1-C16). See
System FRU locations for information about locating the FRU.
2. Determine the level of firmware the system is running.
Is system firmware AL710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
Does the problem persist?
No:
This ends the procedure.
Yes:
Continue with the next step.
3. Have all processor cards been replaced?
No:
Go back to step 1 and replace another processor card.
Yes:
Continue with the next step.
4. Replace the system backplane (Un-P1).
5. Determine the level of firmware the system is running.
Is system firmware AL710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
This ends the procedure.
8248-L4T, 8408-E8D, 9109-RMD
1. Power off the system. See Powering on and powering off the system.
2. Replace the system processor modules one at a time (Un-P3-C12, then Un-P3-C17, then Un-P3-C13,
then Un-P3-C16, if present). See System FRU locations for information about FRU locations for the
system you are servicing.
3. Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
Does the problem persist?
No:
This ends the procedure.
Yes:
Continue with the next step.
4. Have all system processor modules been replaced?
No:
Go back to step 1 (power off the system), and then replace another system processor module in step 2.
Yes:
Continue with the next step.
5. Replace the system processor assembly (Un-P3).
6. Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
This ends the procedure.
8412-EAD, 9117-MMB, 9117-MMC, 9117-MMD, 9179-MHB, 9179-MHC, or 9179-MHD
1. Use the serviceable event view to determine the location of the failing items in the failing item list.
Are all of the failing items located only in drawer 1 or in drawer 2, but not in both drawer 1 and
drawer 2?
No:
Continue with the next step.
Yes:
Replace the following FRUs one at a time until the problem is resolved:
a. Any FRU in the failing item list other than the service processor card at location Un-P1-C1.
b. The service processor card at location Un-P1-C1 in the drawer that is reporting the error.
c. The processor card at location Un-P3 if the card is located in the path between the
locations identified in the failing item list.
d. The midplane card at location Un-P1.
This ends the procedure.
2. Are some of the failing items located in drawer 1, and some of the failing items located in drawer 2?
No:
Continue with the next step.
Yes:
Replace the following FRUs one at a time until the problem is resolved:
a. The service processor cable (Un-P1-C1-Tx, where x = 1, 2, 3, or 4). See System FRU
locations for information about locating the FRU.
b. The service processor card at location Un-P1-C1 in drawer 1.
c. The service processor card at location Un-P1-C1 in drawer 2.
This ends the procedure.
3. Some of the failing items are located in drawer 1 or drawer 2, and some of the failing items are
located in drawer 3 or drawer 4. Replace the following FRUs one at a time until the problem is
resolved:
a. The service processor cable (Un-P1-C1-Tx, where x = 1, 2, 3, or 4). See System FRU locations for
information about locating the FRU.
b. Any FRU in the failing item list in drawer 3 or 4 other than the service processor card at location
Un-P1-C1.
c. The service processor card at location Un-P1-C1 in drawer 1 or 2 that is identified in the failing
item list.
d. The pass-through card at location Un-P1-C1 in drawer 3 or 4 that is identified in the failing item
list.
e. The processor card at location Un-P3 if the card is located in the path between the locations
identified in the failing item list.
f. The midplane card at location Un-P1.
This ends the procedure.
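The drawer-based decision in steps 1 through 3 above can be sketched as follows. This is a minimal illustration, assuming that the drawer numbers of the failing items have already been read from the serviceable event view. The function name `fsi_replacement_order` is hypothetical, and the FRU descriptions are abbreviated from the steps.

```python
def fsi_replacement_order(drawers):
    """Return the FRU replacement order for the set of drawers (1 to 4)
    that contain failing items, following steps 1 through 3 in order."""
    drawers = set(drawers)
    if not drawers or not drawers <= {1, 2, 3, 4}:
        raise ValueError("expected a non-empty set of drawer numbers 1 to 4")

    # Step 1: all failing items are in drawer 1 only or in drawer 2 only.
    if drawers in ({1}, {2}):
        return [
            "Any FRU in the failing item list other than Un-P1-C1",
            "Service processor card Un-P1-C1 in the reporting drawer",
            "Processor card Un-P3 if it is in the path",
            "Midplane card Un-P1",
        ]
    # Step 2: some failing items are in drawer 1 and some are in drawer 2.
    if {1, 2} <= drawers:
        return [
            "Service processor cable Un-P1-C1-Tx",
            "Service processor card Un-P1-C1 in drawer 1",
            "Service processor card Un-P1-C1 in drawer 2",
        ]
    # Step 3: items are split between drawer 1 or 2 and drawer 3 or 4.
    if drawers & {1, 2} and drawers & {3, 4}:
        return [
            "Service processor cable Un-P1-C1-Tx",
            "Any FRU in the failing item list in drawer 3 or 4 other than Un-P1-C1",
            "Service processor card Un-P1-C1 in drawer 1 or 2",
            "Pass-through card Un-P1-C1 in drawer 3 or 4",
            "Processor card Un-P3 if it is in the path",
            "Midplane card Un-P1",
        ]
    # The procedure does not describe this combination (for example, drawer 3 only).
    raise ValueError("combination not covered by the procedure")
```

Combinations that the procedure does not describe raise an error instead of guessing at an order.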
9119-FHB
1. Was the FRU listed before this procedure a memory DIMM (Un-Pm-Cx, where m = 2 through 9 and x
= 2 through 21, 24 through 29, and 33 through 38), a processor (Un-Pm-Cx, where m = 2 through 9
and x = 32, 23, 22, or 31), or GX adapter card (Un-Pm-Cx, where m = 2 through 9 and x = 44, 41, 40,
or 39)?
No:
This ends the procedure.
Yes:
Continue with the next step.
2. Replace both node controllers (Un-Pm-C42 and Un-Pm-C43, where m = 2 through 9) from the same
node as the memory DIMM, processor, or GX adapter card.
3. Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
4. Does the problem persist?
No:
This ends the procedure.
Yes:
Continue with the next step.
5. Replace the processor book (Un-Pm, where m = 2 through 9), the memory DIMM, processor, or GX
adapter card.
6. Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
This ends the procedure.
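The location-code check in step 1 of the 9119-FHB procedure can be sketched as a lookup on the `Pm` and `Cx` parts of the location code. This is an illustration only. The function names are hypothetical, and the sketch assumes that the location code ends in `-Pm-Cx`. The slot ranges are copied from step 1, and the node controller positions from step 2.

```python
import re

# Matches the trailing "-P<m>-C<x>" portion of a location code.
_SUFFIX = re.compile(r"-P(\d+)-C(\d+)$")

# Slot numbers (x) from step 1 of the 9119-FHB procedure.
DIMM_SLOTS = set(range(2, 22)) | set(range(24, 30)) | set(range(33, 39))
PROCESSOR_SLOTS = {22, 23, 31, 32}
GX_ADAPTER_SLOTS = {39, 40, 41, 44}


def classify_fhb_location(code):
    """Return 'memory DIMM', 'processor', or 'GX adapter card' for a
    location code in a processor book (m = 2 through 9), else None."""
    match = _SUFFIX.search(code)
    if match is None:
        return None
    book, slot = int(match.group(1)), int(match.group(2))
    if not 2 <= book <= 9:
        return None
    if slot in DIMM_SLOTS:
        return "memory DIMM"
    if slot in PROCESSOR_SLOTS:
        return "processor"
    if slot in GX_ADAPTER_SLOTS:
        return "GX adapter card"
    return None


def node_controllers_for(code):
    """Return the two node controller locations in the same node (step 2)."""
    match = _SUFFIX.search(code)
    if match is None:
        raise ValueError(f"unrecognized location code: {code!r}")
    book = int(match.group(1))
    return [f"Un-P{book}-C42", f"Un-P{book}-C43"]
```

A return value of `None` corresponds to the "No" branch of step 1, which ends the procedure.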
9125-F2C
1. Replace the DCCA indicated in the failing item list.
2. Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
3. Does the problem persist?
No:
This ends the procedure.
Yes:
Continue with the next step.
4. Replace the other failing items in the failing item list one at a time.
5. Does the problem persist?
No:
This ends the procedure.
Yes:
Continue with the next step.
6. Replace the system backplane at location Un-P1. This ends the procedure.
FSPSP46
Some corrupt areas of flash or RAM have been detected on the service processor.
If you need additional information for failing part numbers, location codes, or removal and replacement
procedures, see Part locations and location codes. Select your machine type and model number to find
additional location codes, part numbers, or replacement procedures for your system.
1. Use the following table to determine the action to perform.
System
Action
8202-E4B, 8202-E4C, 8202-E4D, 8205-E6B, 8205-E6C,
8205-E6D, 8231-E2B, 8231-E1C, 8231-E1D, 8231-E2C,
8231-E2D, 8233-E8B, 8236-E8C, 8268-E1D
Replace the system backplane at location Un-P1.
See System FRU locations for information about system
FRU locations.
8248-L4T, 8408-E8D, 9109-RMD
Replace the service processor at location Un-P1-C1.
See System FRU locations for information about system
FRU locations.
8412-EAD, 9117-MMB, 9117-MMC, 9117-MMD,
9179-MHB, 9179-MHC, or 9179-MHD
Replace the service processor.
1. If the last byte in word 3 of the SRC is a 10, replace
the primary unit, service processor (Un-P1-C1).
2. If the last byte in word 3 of the SRC is a 20, replace
the secondary unit 1, service processor (Un-P1-C1).
9119-FHB
Using the management console, look at the platform ID
in the IQYYLOG error log entry to determine the failing
system controller.
• For platform ID 0x50xxxxxx, replace system controller
  A at location Un-P1-C2.
• For platform ID 0x51xxxxxx, replace system controller
  B at location Un-P1-C5.
• For platform ID 0x86xxxxxx, replace bulk power
  controller A at location Un-P1-C1.
• For platform ID 0x88xxxxxx, replace bulk power
  controller B at location Un-P2-C1.
Example:
|------------------------------------------|
| Platform Event Log - 0x51EC6DB6          |
|------------------------------------------|
|              Private Header              |
|------------------------------------------|
| Section Version   : 1                    |
| Sub-section type  : 0                    |
| Created by        : hlth                 |
| Created at        : 04/21/2010 05:05:17  |
| Committed at      : 04/21/2010 05:05:17  |
| Creator Subsystem : FipS Error Logger    |
| CSSVER            :                      |
| Platform Log Id   : 0x51EC6DB6           |
| Entry Id          : 0x51EC6DB6           |
| Total Log Size    : 496                  |
|------------------------------------------|
In this example, the platform ID is 0x51, which indicates
system controller B should be replaced.
9125-F2C
Using the management console, look at the platform ID
in the IQYYLOG error log entry to determine the failing
system controller.
• For platform ID 0x50xxxxxx, replace DCCA A at
  location Un-P1-C147.
• For platform ID 0x51xxxxxx, replace DCCA B at
  location Un-P1-C148.
• For platform ID 0x86xxxxxx, replace bulk power
  controller A at location Un-P1-C1.
• For platform ID 0x88xxxxxx, replace bulk power
  controller B at location Un-P2-C1.
Example:
|------------------------------------------|
| Platform Event Log - 0x51EC6DB6          |
|------------------------------------------|
|              Private Header              |
|------------------------------------------|
| Section Version   : 1                    |
| Sub-section type  : 0                    |
| Created by        : hlth                 |
| Created at        : 04/21/2010 05:05:17  |
| Committed at      : 04/21/2010 05:05:17  |
| Creator Subsystem : FipS Error Logger    |
| CSSVER            :                      |
| Platform Log Id   : 0x51EC6DB6           |
| Entry Id          : 0x51EC6DB6           |
| Total Log Size    : 496                  |
|------------------------------------------|
In this example, the platform ID is 0x51, which indicates
DCCA B should be replaced.
2. To determine the action to perform, use the following table.
System:
Action:
8202-E4B, 8202-E4C,
8202-E4D, 8205-E6B,
8205-E6C, 8205-E6D,
8231-E2B, 8231-E1C,
8231-E1D, 8231-E2C,
8231-E2D, 8248-L4T,
8268-E1D, 8408-E8D,
9109-RMD, 9119-FHB, or
9125-F2C
Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
8233-E8B or 8236-E8C
Determine the level of firmware the system is running.
Is system firmware AL710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
8412-EAD, 9117-MMB,
9117-MMC, 9117-MMD,
9179-MHB, 9179-MHC, or
9179-MHD
Determine the level of firmware the system is running.
Is system firmware AM710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
This ends the procedure.
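The platform ID lookup for the 9119-FHB and 9125-F2C rows of the table in step 1 can be sketched as follows. This is a minimal illustration, assuming that the platform log ID is an 8-digit hexadecimal value such as `0x51EC6DB6` and that the platform ID is its leading byte, as in the example log. The function name `failing_fru` is hypothetical.

```python
# Leading byte of the platform log ID -> (FRU, location), per model.
# Values are copied from the FSPSP46 table.
PLATFORM_ID_TABLE = {
    "9119-FHB": {
        0x50: ("system controller A", "Un-P1-C2"),
        0x51: ("system controller B", "Un-P1-C5"),
        0x86: ("bulk power controller A", "Un-P1-C1"),
        0x88: ("bulk power controller B", "Un-P2-C1"),
    },
    "9125-F2C": {
        0x50: ("DCCA A", "Un-P1-C147"),
        0x51: ("DCCA B", "Un-P1-C148"),
        0x86: ("bulk power controller A", "Un-P1-C1"),
        0x88: ("bulk power controller B", "Un-P2-C1"),
    },
}


def failing_fru(model, platform_log_id):
    """Return (FRU name, location) for a platform log ID string such as
    '0x51EC6DB6', or None if the leading byte is not in the table."""
    table = PLATFORM_ID_TABLE[model]
    value = int(platform_log_id, 16)  # base 16 accepts an optional "0x" prefix
    leading_byte = (value >> 24) & 0xFF  # assumes a 32-bit log ID
    return table.get(leading_byte)
```

With the example log entry, `failing_fru("9119-FHB", "0x51EC6DB6")` returns system controller B at location Un-P1-C5.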
FSPSP47
The system has detected an error within the PSI link. To resolve the problem, perform the following
steps:
See System FRU locations for instructions for removing and replacing FRUs.
Select the system that you are servicing:
• 8202-E4B, 8202-E4C, 8202-E4D, 8205-E6B, 8205-E6C, 8205-E6D, 8231-E1C, 8231-E1D, 8231-E2C,
  8231-E2D, 8268-E1D
• 8231-E2B
• 8233-E8B, 8236-E8C
• 8248-L4T, 8408-E8D, 9109-RMD
• 8412-EAD, 9117-MMB, 9117-MMC, 9117-MMD, 9179-MHB, 9179-MHC, or 9179-MHD
8202-E4B, 8202-E4C, 8202-E4D, 8205-E6B, 8205-E6C, 8205-E6D, 8231-E1C, 8231-E1D, 8231-E2C, 8231-E2D,
8268-E1D
1. Replace the system backplane at location Un-P1.
2. Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
3. Does the problem persist?
No:
This ends the procedure.
Yes:
Continue with the next step.
4. Replace the processor modules one at a time (Un-P1-C11, then Un-P1-C10 if present). See System FRU
locations for instructions for removing and replacing any FRUs.
5. Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
6. Does the problem persist?
No:
This ends the procedure.
Yes:
Continue with the next step.
7. Have all processor modules been replaced?
No:
Go back to step 4 and replace another processor module.
Yes:
Contact your next level of support. This ends the procedure.
8231-E2B
1. Replace the system backplane at location Un-P1.
2. Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
3. Does the problem persist?
No:
This ends the procedure.
Yes:
Continue with the next step.
4. Replace the processor modules one at a time (Un-P1-C10, then Un-P1-C9 if present). See System FRU
locations for instructions for removing and replacing any FRUs.
5. Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
6. Does the problem persist?
No:
This ends the procedure.
Yes:
Continue with the next step.
7. Have all processor modules been replaced?
No:
Go back to step 4 and replace another processor module.
Yes:
Contact your next level of support. This ends the procedure.
8233-E8B, 8236-E8C
1. Replace the system backplane at location Un-P1.
2. Determine the level of firmware the system is running.
Is system firmware AL710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
3. Does the problem persist?
No:
This ends the procedure.
Yes:
Continue with the next step.
4. Replace the processor cards one at a time (Un-P1-C13 or Un-P1-C14 or Un-P1-C15 or Un-P1-C16). See
System FRU locations for instructions for removing and replacing any FRUs.
5. Determine the level of firmware the system is running.
Is system firmware AL710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
6. Does the problem persist?
No:
This ends the procedure.
Yes:
Continue with the next step.
7. Have all processor cards been replaced?
No:
Go back to step 4 and replace another processor card.
Yes:
Contact your next level of support. This ends the procedure.
8248-L4T, 8408-E8D, 9109-RMD
1. Replace the service processor card at location Un-P1-C1.
2. Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
3. Does the problem persist?
No:
This ends the procedure.
Yes:
Continue with the next step.
4. Replace the system processor modules one at a time (Un-P3-C12, then Un-P3-C17, then Un-P3-C13,
then Un-P3-C16, if present). See System FRU locations for instructions for removing and replacing any
FRUs.
5. Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
6. Does the problem persist?
No:
This ends the procedure.
Yes:
Continue with the next step.
7. Have all processor modules been replaced?
No:
Go back to step 4 and replace another system processor module.
Yes:
Replace the system processor assembly at location Un-P3. This ends the procedure.
8412-EAD, 9117-MMB, 9117-MMC, 9117-MMD, 9179-MHB, 9179-MHC, or 9179-MHD
1. Replace the service processor at location Un-P1-C1.
2. Determine the level of firmware the system is running.
Is system firmware AM710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
3. Does the problem persist?
No:
This ends the procedure.
Yes:
Continue with the next step.
4. Replace the processor card at location Un-P3.
5. Determine the level of firmware the system is running.
Is system firmware AM710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
6. Does the problem persist?
No:
This ends the procedure.
Yes:
Continue with the next step.
7. Replace the system backplane at location Un-P1.
8. Determine the level of firmware the system is running.
Is system firmware AM710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
This ends the procedure.
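The FSPSP47 procedures above differ mainly in the order in which FRUs are replaced. That order can be captured as data, as in the following sketch. The structure and the function name `fspsp47_order` are hypothetical, and the locations are copied from the steps for each model group.

```python
# FRU replacement order for FSPSP47, keyed by model group.
FSPSP47_ORDER = [
    (
        {"8202-E4B", "8202-E4C", "8202-E4D", "8205-E6B", "8205-E6C",
         "8205-E6D", "8231-E1C", "8231-E1D", "8231-E2C", "8231-E2D",
         "8268-E1D"},
        ["Un-P1", "Un-P1-C11", "Un-P1-C10"],
    ),
    ({"8231-E2B"}, ["Un-P1", "Un-P1-C10", "Un-P1-C9"]),
    (
        {"8233-E8B", "8236-E8C"},
        ["Un-P1", "Un-P1-C13", "Un-P1-C14", "Un-P1-C15", "Un-P1-C16"],
    ),
    (
        {"8248-L4T", "8408-E8D", "9109-RMD"},
        ["Un-P1-C1", "Un-P3-C12", "Un-P3-C17", "Un-P3-C13", "Un-P3-C16",
         "Un-P3"],
    ),
    (
        {"8412-EAD", "9117-MMB", "9117-MMC", "9117-MMD", "9179-MHB",
         "9179-MHC", "9179-MHD"},
        ["Un-P1-C1", "Un-P3", "Un-P1"],
    ),
]


def fspsp47_order(model):
    """Return the ordered FRU locations to replace for one model."""
    for models, order in FSPSP47_ORDER:
        if model in models:
            return list(order)  # copy so callers cannot alter the table
    raise ValueError(f"model {model!r} is not covered by FSPSP47")
```

Locations that the steps mark "if present" must be skipped when the module is not installed. The sketch does not model that check.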
FSPSP48
A diagnostic function detected an external processor interface problem. Perform the following steps:
Select the system that you are servicing and then perform the indicated FSPSP48 procedure.
• 8202-E4B, 8202-E4C, 8202-E4D, 8205-E6B, 8205-E6C, 8205-E6D, 8231-E2B, 8231-E1C, 8231-E1D,
  8231-E2C, 8231-E2D, 8268-E1D
• 8233-E8B, 8236-E8C
• 8248-L4T, 8408-E8D, 9109-RMD
• 8412-EAD, 9117-MMB, 9117-MMC, 9117-MMD, 9179-MHB, 9179-MHC, or 9179-MHD
• 9119-FHB
8202-E4B, 8202-E4C, 8202-E4D, 8205-E6B, 8205-E6C, 8205-E6D, 8231-E2B, 8231-E1C, 8231-E1D, 8231-E2C,
8231-E2D, 8268-E1D
1. Power off the system. See Powering on and powering off the system.
2. Replace the system backplane at location Un-P1. See System FRU locations for information about FRU
locations for the system you are servicing.
3. Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
4. If the problem is not resolved, replace the FRUs listed in the FRU list. This ends the procedure.
8233-E8B, 8236-E8C
1. Power off the system. See Powering on and powering off the system.
2. Replace the system backplane at location Un-P1. See System FRU locations for information about FRU
locations for the system you are servicing.
3. Determine the level of firmware the system is running.
Is system firmware AL710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
4. If the problem is not resolved, replace the FRUs listed in the FRU list. This ends the procedure.
8248-L4T, 8408-E8D, 9109-RMD
1. Power off the system. See Powering on and powering off the system.
2. Replace the first system processor module at the location indicated in the service processor error log
entry. If the problem is not resolved, replace the second system processor module indicated in the
service processor error log entry. If the problem is not resolved, continue replacing system processor
modules one at a time until the problem is resolved. See System FRU locations for information about
FRU locations for the system you are servicing.
3. Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
4. If the problem is not resolved, replace the FRUs listed in the FRU list. This ends the procedure.
8412-EAD, 9117-MMB, 9117-MMC, 9117-MMD, 9179-MHB, 9179-MHC, or 9179-MHD
1. Power off the system. See Powering on and powering off the system.
2. Replace the SMP cable at location Un-P3-Tx that attaches the two nodes that were listed before this
procedure. See System FRU locations for information about FRU locations for the system that you are
servicing.
3. Determine the level of firmware the system is running.
Is system firmware AM710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
4. Replace the first processor module at the location indicated in the service processor error log entry. If
the problem is not resolved, replace the second processor module indicated in the service processor
error log entry.
5. If the problem is not resolved, replace the FRUs listed in the FRU list. This ends the procedure.
9119-FHB
1. Power off the system. See Powering on and powering off the system.
2. Replace the processor book (Un-Pm, where m = 2 through 9) of each failing FRU. See System FRU
locations for information about FRU locations for the system that you are servicing.
3. Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
4. Does the problem still exist?
No: This ends the procedure.
Yes:
Continue with the next step.
5. Replace the midplane (Un-P1).
6. Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
7. If the problem is not resolved, replace the FRUs listed in the FRU list. This ends the procedure.
FSPSP49
A diagnostic function detected an internal processor interface problem. Perform the following steps:
1. Replace the FRUs listed in the FRU list. Does the problem still exist?
No: This ends the procedure.
Yes:
Continue with the next step.
2. Power off the system. See Powering on and powering off the system.
3. Use the following table to determine the action to perform.
System:
Action:
8202-E4B, 8202-E4C,
8202-E4D, 8205-E6B,
8205-E6C, 8205-E6D,
8231-E1C, 8231-E1D,
8231-E2C, 8231-E2D,
8268-E1D
1. Replace the processor modules at locations Un-P1-C11 and Un-P1-C10 if present,
one at a time.
2. Replace the system backplane at location Un-P1. See System FRU locations for
information about locating the system backplane.
8231-E2B
1. Replace the processor modules at locations Un-P1-C10 and Un-P1-C9 if present,
one at a time.
2. Replace the system backplane at location Un-P1. See System FRU locations for
information about locating the system backplane.
8233-E8B, 8236-E8C
1. Replace the processor cards at locations Un-P1-C13 through Un-P1-C16 one at a
time.
2. Replace the system backplane at location Un-P1. See System FRU locations for
information about locating the system backplane.
8248-L4T, 8408-E8D,
9109-RMD
1. Replace the system processor module at location Un-P3-C12. See System FRU
locations for information about locating the system processor module.
2. Replace the system processor module at location Un-P3-C17 if it is present. See
System FRU locations for information about locating the system processor module.
3. Replace the system processor module at location Un-P3-C13 if it is present. See
System FRU locations for information about locating the system processor module.
4. Replace the system processor module at location Un-P3-C16 if it is present. See
System FRU locations for information about locating the system processor module.
9117-MMB, 9117-MMC,
9179-MHB, 9179-MHC
(two-processor system unit)
1. Replace the processor module at location Un-P3-C22. See System FRU locations for
information about locating the processor module.
2. Replace the processor module at location Un-P3-C25 if it is present. See System
FRU locations for information about locating the processor module.
8412-EAD, 9117-MMD,
9179-MHC (four-processor
system unit), 9179-MHD
1. Replace the processor module at location Un-P3-C23. See System FRU locations for
information about locating the processor module.
2. Replace the processor module at location Un-P3-C28 if it is present. See System
FRU locations for information about locating the processor module.
3. Replace the processor module at location Un-P3-C24 if it is present. See System
FRU locations for information about locating the processor module.
4. Replace the processor module at location Un-P3-C27 if it is present. See System
FRU locations for information about locating the processor module.
9119-FHB
Replace the processor book. See System FRU locations for information about locating
the processor book.
9125-F2C
Replace the system backplane at location Un-P1. See System FRU locations.
4. To determine the action to perform, use the following table.
System:
Action:
8202-E4B, 8202-E4C,
8202-E4D, 8205-E6B,
8205-E6C, 8205-E6D,
8231-E2B, 8231-E1C,
8231-E1D, 8231-E2C,
8231-E2D, 8248-L4T,
8268-E1D, 8408-E8D,
9109-RMD, 9119-FHB,
9125-F2C
Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
8233-E8B, 8236-E8C
Determine the level of firmware the system is running.
Is system firmware AL710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
8412-EAD, 9117-MMB,
9117-MMC, 9117-MMD,
9179-MHB, 9179-MHC, or
9179-MHD
Determine the level of firmware the system is running.
Is system firmware AM710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
This ends the procedure.
FSPSP50
A diagnostic function detected a connection problem between a processor chip and a GX chip. If replacing
the FRUs previously listed in the FRU list does not fix the problem, perform the following steps:
1. Power off the system. See Powering on and powering off the system.
2. Use the following table to determine the action to perform.
System:
Action:
8202-E4B, 8202-E4C,
8202-E4D, 8205-E6B,
8205-E6C, 8205-E6D,
8231-E2B, 8231-E1C,
8231-E1D, 8231-E2C,
8231-E2D, 8268-E1D
Replace the system backplane at location Un-P1. See System FRU locations for
information about locating the system backplane.
8233-E8B, 8236-E8C
1. Replace the processor cards at locations Un-P1-C13 through Un-P1-C16 one at a
time.
2. Replace the system backplane at location Un-P1. See System FRU locations for
information about locating the system backplane.
8248-L4T, 8408-E8D,
9109-RMD
1. Replace the I/O backplane at location Un-P2. See System FRU locations for
information about locating the I/O backplane.
2. Replace the midplane at location Un-P1. See System FRU locations for information
about locating the midplane.
3. Replace the system processor module at location Un-P3-C12. See System FRU
locations for information about locating the system processor module.
4. Replace the system processor module at location Un-P3-C17 if it is present. See
System FRU locations for information about locating the system processor module.
5. Replace the system processor module at location Un-P3-C13 if it is present. See
System FRU locations for information about locating the system processor module.
6. Replace the system processor module at location Un-P3-C16 if it is present. See
System FRU locations for information about locating the system processor module.
9117-MMB, 9117-MMC,
9179-MHB, 9179-MHC
(two-processor system unit)
1. Replace the I/O backplane at location Un-P2.
2. Replace the midplane at location Un-P1. See System FRU locations for information
about locating the midplane.
3. Replace the processor module at location Un-P3-C22. See System FRU locations for
information about locating the processor module.
4. Replace the processor module at location Un-P3-C25 if it is present. See System
FRU locations for information about locating the processor module.
8412-EAD, 9117-MMD,
9179-MHC (four-processor
system unit), 9179-MHD
1. Replace the I/O backplane at location Un-P2.
2. Replace the midplane at location Un-P1. See System FRU locations for information
about locating the midplane.
3. Replace the processor module at location Un-P3-C23. See System FRU locations for
information about locating the processor module.
4. Replace the processor module at location Un-P3-C28 if it is present. See System
FRU locations for information about locating the processor module.
5. Replace the processor module at location Un-P3-C24 if it is present. See System
FRU locations for information about locating the processor module.
6. Replace the processor module at location Un-P3-C27 if it is present. See System
FRU locations for information about locating the processor module.
9119-FHB
Replace the processor book that contains the failing parts. See System FRU locations
for information about locating the processor book.
3. To determine the action to perform, use the following table.
System: 8202-E4B, 8202-E4C, 8202-E4D, 8205-E6B, 8205-E6C, 8205-E6D, 8231-E2B, 8231-E1C, 8231-E1D,
8231-E2C, 8231-E2D, 8248-L4T, 8268-E1D, 8408-E8D, 9109-RMD, 9119-FHB, or 9125-F2C
Action: Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
System: 8233-E8B or 8236-E8C
Action: Determine the level of firmware the system is running.
Is system firmware AL710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
System: 8412-EAD, 9117-MMB, 9117-MMC, 9117-MMD, 9179-MHB, 9179-MHC, or 9179-MHD
Action: Determine the level of firmware the system is running.
Is system firmware AM710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
This ends the procedure.
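The power-on table above amounts to a lookup on machine type and firmware level. The following Python sketch only illustrates that decision. The function name power_on_action and the assumption that the firmware level is available as a string such as AL710_065 are not part of this procedure.

```python
# Model groups are copied from the table above. The function name and the
# firmware-level string format (for example "AL710_065") are assumptions.
NO_SLOW_BOOT = {
    "8202-E4B", "8202-E4C", "8202-E4D", "8205-E6B", "8205-E6C", "8205-E6D",
    "8231-E2B", "8231-E1C", "8231-E1D", "8231-E2C", "8231-E2D", "8248-L4T",
    "8268-E1D", "8408-E8D", "9109-RMD", "9119-FHB", "9125-F2C",
}
SLOW_BOOT_FIRMWARE_PREFIX = {
    "8233-E8B": "AL710_", "8236-E8C": "AL710_",
    "8412-EAD": "AM710_", "9117-MMB": "AM710_", "9117-MMC": "AM710_",
    "9117-MMD": "AM710_", "9179-MHB": "AM710_", "9179-MHC": "AM710_",
    "9179-MHD": "AM710_",
}

STANDBY = "Power on the system to the hypervisor standby"
SLOW_BOOT = "Perform a slow boot"


def power_on_action(model: str, firmware_level: str) -> str:
    """Return the table action for a machine type and firmware level."""
    if model in NO_SLOW_BOOT:
        return STANDBY  # slow boot is not supported on these models
    prefix = SLOW_BOOT_FIRMWARE_PREFIX.get(model)
    if prefix is None:
        raise ValueError(f"model not listed in the table: {model}")
    return SLOW_BOOT if firmware_level.startswith(prefix) else STANDBY
```

For example, power_on_action("8233-E8B", "AL710_065") selects the slow boot, while the same model at any other firmware level is powered on to the hypervisor standby.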
FSPSP51
A runtime diagnostic function detected a chip interconnection bus correctable error that exceeded the
threshold. The correctable error did not cause a disruption of system operations. However, the system is
operating in a degraded mode because the error is being corrected by hardware.
To resolve the problem, replace the FRU listed after this procedure in the failing item list.
FSPSP52
A problem has been detected on a memory bus. If replacing the FRUs previously listed in the FRU list
does not fix the problem, perform the following steps:
1. Power off the system. See Powering on and powering off the system.
2. Use the following table to determine the action to perform.
System: 8202-E4B, 8202-E4C, 8202-E4D, 8205-E6B, 8205-E6C, 8205-E6D
Action: Replace the memory card the DIMMs were located on (Un-P1-C15, Un-P1-C16, Un-P1-C17, or
Un-P1-C18). See System FRU locations for information about system FRU locations.
System: 8231-E2B, 8231-E1C, 8231-E1D, 8231-E2C, 8231-E2D, 8268-E1D
Action: Replace the memory card the DIMMs were located on (Un-P1-C14, Un-P1-C15, Un-P1-C16, or
Un-P1-C17). See System FRU locations for information about system FRU locations.
System: 8233-E8B, 8236-E8C
Action: Replace the processor card the DIMMs were located on (Un-P1-C13, Un-P1-C14, Un-P1-C15, or
Un-P1-C16). See System FRU locations for information about system FRU locations.
System: 8248-L4T, 8408-E8D, 9109-RMD
Action: Replace the memory card the DIMMs were located on (Un-P3-C1 through Un-P3-C4 or Un-P3-C6
through Un-P3-C9). See System FRU locations for information about system FRU locations.
System: 8412-EAD, 9117-MMB, 9117-MMC, 9117-MMD, 9179-MHB, 9179-MHC, or 9179-MHD
Action: Replace the processor card on which DIMMs were previously replaced (Un-P3). See System FRU
locations for information about system FRU locations.
3. To determine the action to perform, use the following table.
System: 8202-E4B, 8202-E4C, 8202-E4D, 8205-E6B, 8205-E6C, 8205-E6D, 8231-E2B, 8231-E1C, 8231-E1D,
8231-E2C, 8231-E2D, 8248-L4T, 8268-E1D, 8408-E8D, 9109-RMD, 9119-FHB, or 9125-F2C
Action: Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
System: 8233-E8B or 8236-E8C
Action: Determine the level of firmware the system is running.
Is system firmware AL710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
System: 8412-EAD, 9117-MMB, 9117-MMC, 9117-MMD, 9179-MHB, 9179-MHC, or 9179-MHD
Action: Determine the level of firmware the system is running.
Is system firmware AM710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
This ends the procedure.
FSPSP54
A processor over-temperature has been detected. Check for any environmental issues before replacing any
parts.
1. Is the ambient room temperature in the normal operating range (less than 35 degrees C/95 degrees
F)?
No:
Notify the customer. The customer must lower the room temperature so that it is within the
normal range. Do not replace any parts. This ends the procedure.
Yes:
Continue to the next step.
2. Are the front and rear of the system unit drawer, and the front and rear rack doors, free of
obstructions that would impede the airflow through the drawer?
No:
Notify the customer. The system must be free of obstructions for proper airflow. Clean the air
inlets and exits in the drawer as required. Do not replace any parts. This ends the procedure.
Yes:
Continue to the next step.
3. Are all of the fans, especially those at the back of the power supply, functioning normally?
No:
Replace any fans that are not turning or are turning slowly. Refer to System FRU locations for
instructions. This ends the procedure.
Yes:
There are no environmental issues with the cooling of the processors. This ends the
procedure.
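The three environmental checks above are evaluated in order, and the first failed check decides the outcome. A minimal Python sketch of that ordering follows. The function name and its three inputs are hypothetical; only the 35 degrees C limit and the actions come from the steps above.

```python
# Illustrative only: the function and parameter names are assumptions.
# The 35 degrees C limit and the actions are taken from the steps above.
def over_temperature_action(ambient_c: float,
                            airflow_unobstructed: bool,
                            fans_normal: bool) -> str:
    """Walk the FSPSP54 checks in order and return the resulting action."""
    if ambient_c >= 35.0:  # normal operating range is less than 35 degrees C
        return ("Notify the customer to lower the room temperature. "
                "Do not replace any parts.")
    if not airflow_unobstructed:
        return ("Notify the customer and clean the air inlets and exits. "
                "Do not replace any parts.")
    if not fans_normal:
        return "Replace any fans that are not turning or are turning slowly."
    return "There are no environmental issues with the cooling of the processors."
```

Because the checks short-circuit, a room that is too warm ends the procedure before the airflow or fans are examined, which matches the order of the steps.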
FSPSP55
An error occurred on a bus between two FRUs in the failing item list. All FRUs in the failing path are
included in the failing item list. Any of the FRUs in the failing item list might be the cause of the error.
Replace the items in the failing item list one at a time until the problem is resolved.
See System FRU locations for information about failing items.
FSPSP56
A concurrent maintenance repair action could not complete.
Reinstall the VPD card from the same drawer it was originally removed from prior to the beginning of
the concurrent maintenance repair action.
FSPSP57
The host appears on only one bulk power network hub.
If you need additional information for failing part numbers, location codes, or removal and replacement
procedures, see Part locations and location codes. Select your machine type and model number to find
additional location codes, part numbers, or replacement procedures for your system.
To perform the correct procedure for the system that you are servicing, select your machine type and
model (system) from among the following:
9119-FHB
9125-F2C
9119-FHB
1. Verify that the cables for the missing host are connected properly. The missing host is identified by
using words 6 and 7 of the SRC.
• Word 6 indicates which bulk power hub the host is missing from.
If word 6 is 0x0000000A, then the host is missing from bulk power hub A (Un-P1-C4).
If word 6 is 0x0000000B, then the host is missing from bulk power hub B (Un-P2-C4).
• Word 7 indicates the missing host. Use the following table to identify the missing host based upon
the value of word 7.
Table 44. 9119-FHB word 7 value to missing host cross reference
Word 7 value                    Missing host
0x00000095                      System controller A
0x00000094                      System controller B
0x00000093                      Bulk power controller B
0x00000092                      Bulk power controller A
0x0000000x (x = 2 through 9)    Node controller Um-Px (x = 2 through 9)
2. Are all cables connected correctly?
Yes:
Continue to the next step.
No:
Correct the cable connections. If the problem persists, continue to the next step.
3. Replace the cable used to connect the missing host to the bulk power hub. Does the problem persist?
Yes:
Continue to the next step.
No:
This ends the procedure.
4. Replace the missing host identified in word 7 of the SRC. Does the problem persist?
Yes:
Continue to the next step.
No:
This ends the procedure.
5. Replace the bulk power hub (Un-P1-C4 or Un-P2-C4) identified in word 6 of the SRC. This ends the
procedure.
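The word 6 and word 7 decoding used in step 1 of the 9119-FHB procedure can be sketched as a pair of lookups. The Python below is a hedged illustration: the function and dictionary names are invented here, and it assumes that the SRC words have already been read as integers.

```python
# Values are copied from step 1 and Table 44. The names are hypothetical.
BULK_POWER_HUBS = {
    0x0000000A: "bulk power hub A (Un-P1-C4)",
    0x0000000B: "bulk power hub B (Un-P2-C4)",
}
WORD7_HOSTS = {
    0x00000095: "System controller A",
    0x00000094: "System controller B",
    0x00000093: "Bulk power controller B",
    0x00000092: "Bulk power controller A",
}


def decode_missing_host_fhb(word6: int, word7: int) -> tuple:
    """Return (missing host, bulk power hub) for a 9119-FHB SRC."""
    if word6 not in BULK_POWER_HUBS:
        raise ValueError(f"unexpected word 6 value: {word6:#010x}")
    if 0x2 <= word7 <= 0x9:
        host = f"Node controller Um-P{word7}"
    elif word7 in WORD7_HOSTS:
        host = WORD7_HOSTS[word7]
    else:
        raise ValueError(f"unexpected word 7 value: {word7:#010x}")
    return host, BULK_POWER_HUBS[word6]
```

Values outside the ranges listed in the procedure raise an error rather than being guessed at, because the procedure does not define them.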
9125-F2C
1. Verify that the cables for the missing host are connected properly. The missing host is identified by
using words 6 and 7 of the SRC.
• Word 6 indicates which bulk power hub the host is missing from.
If word 6 is 0x0000000A, then the host is missing from bulk power hub A (Un-P2-C1).
If word 6 is 0x0000000B, then the host is missing from bulk power hub B (Un-P1-C1).
• Word 7 indicates the EIA location that the host is missing from. Word 8 indicates the bulk power
hub port (Txx) that the host is missing from. Use the following table to identify the missing host
based upon the values of words 7 and 8.
Table 45. 9125-F2C word 7 value to missing host cross reference
Word 7 value                       Word 8 value     Missing host
0x000000xx (xx = 05 through 27)    12 through 24    DCCA A
0x000000xx (xx = 05 through 27)    29 through 41    DCCA B
0x00000092                         11               • Bulk power controller A if word 6 is 0x0000000A
                                                    • Bulk power controller B if word 6 is 0x0000000B
0x00000093                         28               • Bulk power controller B if word 6 is 0x0000000A
                                                    • Bulk power controller A if word 6 is 0x0000000B
2. Are all cables connected correctly?
Yes:
Continue to the next step.
No:
Correct the cable connections. If the problem persists, continue to the next step.
3. Replace the cable used to connect the missing host to the bulk power hub. Does the problem persist?
Yes:
Continue to the next step.
No:
This ends the procedure.
4. Replace the missing host identified in words 7 and 8 of the SRC. Does the problem persist?
Yes:
Continue to the next step.
No:
This ends the procedure.
5. Replace the bulk power hub (Un-P2-C1 or Un-P1-C1) identified in word 6 of the SRC. This ends the
procedure.
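For the 9125-F2C, the missing host depends on the combination of words 6, 7, and 8. The Python sketch below illustrates the Table 45 lookup under stated assumptions: the names are invented, word 8 is taken to be the port number exactly as printed in the table, and word 7 is not validated for the DCCA rows because the table does not say how the EIA digits are encoded.

```python
# Hub locations come from step 1 and the rows from Table 45. The names are
# hypothetical, and word 8 is assumed to be the printed port number.
HUB_LOCATIONS = {0x0000000A: "Un-P2-C1", 0x0000000B: "Un-P1-C1"}


def decode_missing_host_f2c(word6: int, word7: int, port: int) -> tuple:
    """Return (missing host, bulk power hub location) for a 9125-F2C SRC."""
    if word6 not in HUB_LOCATIONS:
        raise ValueError(f"unexpected word 6 value: {word6:#010x}")
    on_hub_a = word6 == 0x0000000A
    if word7 == 0x00000092 and port == 11:
        host = "Bulk power controller A" if on_hub_a else "Bulk power controller B"
    elif word7 == 0x00000093 and port == 28:
        host = "Bulk power controller B" if on_hub_a else "Bulk power controller A"
    elif 12 <= port <= 24:
        host = "DCCA A"
    elif 29 <= port <= 41:
        host = "DCCA B"
    else:
        raise ValueError(f"port T{port} is not listed in Table 45")
    return host, HUB_LOCATIONS[word6]
```

The bulk power controller rows are checked first because their meaning flips with word 6, whereas the DCCA rows depend only on the port range.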
FSPSP58
A network cable is misplugged.
The FRU immediately following this procedure shows the current bulk power hub port that has the
wrong cable plugged into it. The FRU after that shows the bulk power hub port that the cable should be
plugged into. Move the cable to the correct port. This ends the procedure.
FSPSP59
The network data is not available or missing.
If you need additional information for failing part numbers, location codes, or removal and replacement
procedures, see Part locations and location codes. Select your machine type and model number to find
additional location codes, part numbers, or replacement procedures for your system.
1. Is the power on for the bulk power controllers (Un-P1-C1 and Un-P2-C1)?
No:
Apply power to the bulk power controllers. Continue to the next step.
Yes:
Replace the bulk power controller listed after this procedure.
2. Does the problem persist?
No:
This ends the procedure.
Yes:
Replace the bulk power hub listed after this procedure.
FSPSP60
A MAC address has been duplicated in the system.
If you need additional information for failing part numbers, location codes, or removal and replacement
procedures, see Part locations and location codes. Select your machine type and model number to find
additional location codes, part numbers, or replacement procedures for your system.
1. Perform LICCODE to update the system firmware.
Note: Do not update the BPC firmware.
2. Does the problem persist?
No:
This ends the procedure.
Yes:
Replace the hardware connected to the bulk power hub ports that are in the failing item list
after this procedure. This ends the procedure.
FSPSP61
An invalid network connection to the bulk power hub has been detected, causing a looping ping
condition.
1. The failing item listed after this procedure in the failing item list indicates which bulk power hub port
to correct.
2. If you are servicing a 9119-FHB, go to “9119-FHB bulk power connection tables” to view the correct
cabling connection. If you are servicing a 9125-F2C, go to “9125-F2C bulk power connection tables” to
view the correct cabling connection.
9119-FHB bulk power connection tables
Cross-reference tables identify the system unit node to bulk power hub, system controller to bulk power
hub, bulk power controller to bulk power hub, and management console bulk power hub connections.
Table 46. Node controller to bulk power hub connection table
Un refers to BPC location. Um refers to system unit location.
System unit    FRU          FRU connector    BPH side A          BPH side B
node location  location     location         connector location  connector location
P2             Um-P2-C42    Um-P2-C42-J01    Un-P1-C4-J18
                            Um-P2-C42-J02                        Un-P2-C4-J18
               Um-P2-C43    Um-P2-C43-J01    Un-P1-C4-J17
                            Um-P2-C43-J02                        Un-P2-C4-J17
P3             Um-P3-C42    Um-P3-C42-J01    Un-P1-C4-J12
                            Um-P3-C42-J02                        Un-P2-C4-J12
               Um-P3-C43    Um-P3-C43-J01    Un-P1-C4-J11
                            Um-P3-C43-J02                        Un-P2-C4-J11
P4             Um-P4-C42    Um-P4-C42-J01    Un-P1-C4-J10
                            Um-P4-C42-J02                        Un-P2-C4-J10
               Um-P4-C43    Um-P4-C43-J01    Un-P1-C4-J09
                            Um-P4-C43-J02                        Un-P2-C4-J09
P5             Um-P5-C42    Um-P5-C42-J01    Un-P1-C4-J22
                            Um-P5-C42-J02                        Un-P2-C4-J22
               Um-P5-C43    Um-P5-C43-J01    Un-P1-C4-J21
                            Um-P5-C43-J02                        Un-P2-C4-J21
P6             Um-P6-C42    Um-P6-C42-J01    Un-P1-C4-J20
                            Um-P6-C42-J02                        Un-P2-C4-J20
               Um-P6-C43    Um-P6-C43-J01    Un-P1-C4-J19
                            Um-P6-C43-J02                        Un-P2-C4-J19
P7             Um-P7-C42    Um-P7-C42-J01    Un-P1-C4-J16
                            Um-P7-C42-J02                        Un-P2-C4-J16
               Um-P7-C43    Um-P7-C43-J01    Un-P1-C4-J15
                            Um-P7-C43-J02                        Un-P2-C4-J15
P8             Um-P8-C42    Um-P8-C42-J01    Un-P1-C4-J14
                            Um-P8-C42-J02                        Un-P2-C4-J14
               Um-P8-C43    Um-P8-C43-J01    Un-P1-C4-J13
                            Um-P8-C43-J02                        Un-P2-C4-J13
P9             Um-P9-C42    Um-P9-C42-J01    Un-P1-C4-J24
                            Um-P9-C42-J02                        Un-P2-C4-J24
               Um-P9-C43    Um-P9-C43-J01    Un-P1-C4-J23
                            Um-P9-C43-J02                        Un-P2-C4-J23
Table 47. System controller to bulk power hub connection table
System            FRU          FRU connector    BPH side A          BPH side B
controller (SC)   location     location         connector location  connector location
SC-A              Um-P1-C2     Um-P1-C2-J04     Un-P1-C4-J05
                               Um-P1-C2-J03                         Un-P2-C4-J05
SC-B              Um-P1-C5     Um-P1-C5-J04     Un-P1-C4-J06
                               Um-P1-C5-J03                         Un-P2-C4-J06
Table 48. Bulk power controller to bulk power hub connection table
Bulk power        FRU          FRU connector    BPH side A          BPH side B
controller (BPC)  location     location         connector location  connector location
BPC-A             Un-P1-C1     Un-P1-C1-J03     Un-P1-C4-J07
                               Un-P1-C1-J02                         Un-P2-C4-J07
BPC-B             Un-P2-C1     Un-P2-C1-J03     Un-P1-C4-J08
                               Un-P2-C1-J02                         Un-P2-C4-J08
Table 49. Management console bulk power hub connection table
Management console  BPH side A connector location  BPH side B connector location
HMC-A               Un-P1-C4-J01
HMC-B                                              Un-P2-C4-J01
9125-F2C bulk power connection tables
Cross-reference tables identify the system unit node to bulk power hub, bulk power controller to bulk
power hub connections, and management console to bulk power hub.
Table 50. Node controller to bulk power hub connection table
System unit node        FRU      FRU          Ethernet 0              Ethernet 1
location                         location     (T2, lower switch)      (T3, upper switch)
Rack EIA position 5U    DCCA A   Un-P1-C147   Un-P2-C1-T12            Un-P1-C1-T12
                        DCCA B   Un-P1-C148   Un-P2-C1-T29            Un-P1-C1-T29
Rack EIA position 7U    DCCA A   Un-P1-C147   Un-P2-C1-T13            Un-P1-C1-T13
                        DCCA B   Un-P1-C148   Un-P2-C1-T30            Un-P1-C1-T30
Rack EIA position 9U    DCCA A   Un-P1-C147   Un-P2-C1-T14            Un-P1-C1-T14
                        DCCA B   Un-P1-C148   Un-P2-C1-T31            Un-P1-C1-T31
Rack EIA position 11U   DCCA A   Un-P1-C147   Un-P2-C1-T15            Un-P1-C1-T15
                        DCCA B   Un-P1-C148   Un-P2-C1-T32            Un-P1-C1-T32
Rack EIA position 13U   DCCA A   Un-P1-C147   Un-P2-C1-T16            Un-P1-C1-T16
                        DCCA B   Un-P1-C148   Un-P2-C1-T33            Un-P1-C1-T33
Rack EIA position 15U   DCCA A   Un-P1-C147   Un-P2-C1-T17            Un-P1-C1-T17
                        DCCA B   Un-P1-C148   Un-P2-C1-T34            Un-P1-C1-T34
Rack EIA position 17U   DCCA A   Un-P1-C147   Un-P2-C1-T18            Un-P1-C1-T18
                        DCCA B   Un-P1-C148   Un-P2-C1-T35            Un-P1-C1-T35
Rack EIA position 19U   DCCA A   Un-P1-C147   Un-P2-C1-T20            Un-P1-C1-T20
                        DCCA B   Un-P1-C148   Un-P2-C1-T37            Un-P1-C1-T37
Rack EIA position 21U   DCCA A   Un-P1-C147   Un-P2-C1-T21            Un-P1-C1-T21
                        DCCA B   Un-P1-C148   Un-P2-C1-T38            Un-P1-C1-T38
Rack EIA position 23U   DCCA A   Un-P1-C147   Un-P2-C1-T22            Un-P1-C1-T22
                        DCCA B   Un-P1-C148   Un-P2-C1-T39            Un-P1-C1-T39
Rack EIA position 25U   DCCA A   Un-P1-C147   Un-P2-C1-T23            Un-P1-C1-T23
                        DCCA B   Un-P1-C148   Un-P2-C1-T40            Un-P1-C1-T40
Rack EIA position 27U   DCCA A   Un-P1-C147   Un-P2-C1-T24            Un-P1-C1-T24
                        DCCA B   Un-P1-C148   Un-P2-C1-T41            Un-P1-C1-T41
Table 51. Bulk power controller to bulk power hub connection table
Bulk power controller
(BPC)
FRU connector location
BPC-A
Un-P1-C1-T28
Un-P2-C1-T11
BPC-B
Ethernet 0 (lower switch)
Un-P2-C1-T2
Un-P2-C1-T3
Un-P1-C1-T11
Un-P2-C1-T28
Ethernet 1 (upper switch)
Un-P1-C1-T2
Un-P1-C1-T3
Table 52. Management console bulk power hub connection table
Management console  Port 1 (upper switch)  Port 2 (lower switch)
HMC1                Un-P1-C1-T19           Un-P2-C1-T19
HMC2                Un-P1-C1-T36           Un-P2-C1-T36
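The DCCA port assignments in Table 50 appear to follow a regular pattern: ports are assigned in ascending order of rack EIA position, skipping T19 and T36, which Table 52 assigns to the management console. The Python sketch below encodes that observation. It is an assumption drawn from the table rows in order, not a rule stated in this document, and the function name is hypothetical.

```python
# Pattern inferred from Table 50, assuming the rows are in ascending EIA order.
# T19 and T36 are skipped because Table 52 assigns them to the management console.
def dcca_hub_ports(eia_position: int) -> dict:
    """Return the (lower switch, upper switch) hub ports for both DCCAs."""
    if eia_position not in range(5, 28, 2):
        raise ValueError("Table 50 lists odd EIA positions 5U through 27U only")
    index = (eia_position - 5) // 2       # 0 for 5U, 11 for 27U
    skip = 1 if index >= 7 else 0         # step over T19 and T36
    port_a = 12 + index + skip
    port_b = 29 + index + skip
    return {
        "DCCA A": (f"Un-P2-C1-T{port_a}", f"Un-P1-C1-T{port_a}"),
        "DCCA B": (f"Un-P2-C1-T{port_b}", f"Un-P1-C1-T{port_b}"),
    }
```

Always confirm the result against Table 50 before moving a cable; the table remains the reference.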
FSPSP62
The service processor has detected a missing node due to a mismatch between the nodes with power and
the nodes that the service processor can communicate with.
If you need additional information for failing part numbers, location codes, or removal and replacement
procedures, see Part locations and location codes. Select your machine type and model number to find
additional location codes, part numbers, or replacement procedures for your system.
1. Using the “9119-FHB bulk power connection tables”, verify the cables for the missing
node. Word 6 of the SRC identifies the missing node as follows:
Table 53. Missing node table
Word 6 value                    Missing node
0x0000000x (x = 2 through 9)    Node controllers Un-Px (x = 2 through 9)
2. Are the cables set correctly?
Yes:
Continue to step 4.
No:
Correct the cable connections.
To determine the action to perform, use the following table.
System: 8202-E4B, 8202-E4C, 8202-E4D, 8205-E6B, 8205-E6C, 8205-E6D, 8231-E2B, 8231-E1C, 8231-E1D,
8231-E2C, 8231-E2D, 8248-L4T, 8268-E1D, 8408-E8D, 9109-RMD, 9119-FHB, or 9125-F2C
Action: Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
System: 8233-E8B or 8236-E8C
Action: Determine the level of firmware the system is running.
Is system firmware AL710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
System: 8412-EAD, 9117-MMB, 9117-MMC, 9117-MMD, 9179-MHB, 9179-MHC, or 9179-MHD
Action: Determine the level of firmware the system is running.
Is system firmware AM710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
3. Does the problem persist?
Yes:
Continue to the next step.
No:
This ends the procedure.
4. Replace the cables used to connect the missing node to the bulk power hubs.
To determine the action to perform, use the following table.
System: 8202-E4B, 8202-E4C, 8202-E4D, 8205-E6B, 8205-E6C, 8205-E6D, 8231-E2B, 8231-E1C, 8231-E1D,
8231-E2C, 8231-E2D, 8248-L4T, 8268-E1D, 8408-E8D, 9109-RMD, 9119-FHB, or 9125-F2C
Action: Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
System: 8233-E8B or 8236-E8C
Action: Determine the level of firmware the system is running.
Is system firmware AL710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
System: 8412-EAD, 9117-MMB, 9117-MMC, 9117-MMD, 9179-MHB, 9179-MHC, or 9179-MHD
Action: Determine the level of firmware the system is running.
Is system firmware AM710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
5. Does the problem persist?
Yes:
Continue to the next step.
No:
This ends the procedure.
6. Replace the node controllers (Un-Px-C42 or Un-Px-C43, where x = word 6 value) on the missing node.
To determine the action to perform, use the following table.
System: 8202-E4B, 8202-E4C, 8202-E4D, 8205-E6B, 8205-E6C, 8205-E6D, 8231-E2B, 8231-E1C, 8231-E1D,
8231-E2C, 8231-E2D, 8248-L4T, 8268-E1D, 8408-E8D, 9109-RMD, 9119-FHB, or 9125-F2C
Action: Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
System: 8233-E8B or 8236-E8C
Action: Determine the level of firmware the system is running.
Is system firmware AL710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
System: 8412-EAD, 9117-MMB, 9117-MMC, 9117-MMD, 9179-MHB, 9179-MHC, or 9179-MHD
Action: Determine the level of firmware the system is running.
Is system firmware AM710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
This ends the procedure.
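Steps 1 and 6 of FSPSP62 both use word 6 of the SRC to select a node. As a minimal sketch, the Python below turns word 6 into the two node controller location codes named in step 6. The function name is hypothetical, and word 6 is assumed to be available as an integer.

```python
# Location code pattern taken from step 6 (Un-Px-C42 and Un-Px-C43).
# The function name is an assumption made for illustration.
def missing_node_controllers(word6: int) -> tuple:
    """Return the node controller locations for the node named by word 6."""
    if not 2 <= word6 <= 9:
        raise ValueError("word 6 must be 0x00000002 through 0x00000009")
    return (f"Un-P{word6}-C42", f"Un-P{word6}-C43")
```

For example, a word 6 value of 0x00000004 points at Un-P4-C42 and Un-P4-C43.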
9119-FHB bulk power connection tables
Cross-reference tables identify the system unit node to bulk power hub, system controller to bulk power
hub, bulk power controller to bulk power hub, and management console bulk power hub connections.
Table 54. Node controller to bulk power hub connection table
Un refers to BPC location. Um refers to system unit location.
System unit    FRU          FRU connector    BPH side A          BPH side B
node location  location     location         connector location  connector location
P2             Um-P2-C42    Um-P2-C42-J01    Un-P1-C4-J18
                            Um-P2-C42-J02                        Un-P2-C4-J18
               Um-P2-C43    Um-P2-C43-J01    Un-P1-C4-J17
                            Um-P2-C43-J02                        Un-P2-C4-J17
P3             Um-P3-C42    Um-P3-C42-J01    Un-P1-C4-J12
                            Um-P3-C42-J02                        Un-P2-C4-J12
               Um-P3-C43    Um-P3-C43-J01    Un-P1-C4-J11
                            Um-P3-C43-J02                        Un-P2-C4-J11
P4             Um-P4-C42    Um-P4-C42-J01    Un-P1-C4-J10
                            Um-P4-C42-J02                        Un-P2-C4-J10
               Um-P4-C43    Um-P4-C43-J01    Un-P1-C4-J09
                            Um-P4-C43-J02                        Un-P2-C4-J09
P5             Um-P5-C42    Um-P5-C42-J01    Un-P1-C4-J22
                            Um-P5-C42-J02                        Un-P2-C4-J22
               Um-P5-C43    Um-P5-C43-J01    Un-P1-C4-J21
                            Um-P5-C43-J02                        Un-P2-C4-J21
P6             Um-P6-C42    Um-P6-C42-J01    Un-P1-C4-J20
                            Um-P6-C42-J02                        Un-P2-C4-J20
               Um-P6-C43    Um-P6-C43-J01    Un-P1-C4-J19
                            Um-P6-C43-J02                        Un-P2-C4-J19
P7             Um-P7-C42    Um-P7-C42-J01    Un-P1-C4-J16
                            Um-P7-C42-J02                        Un-P2-C4-J16
               Um-P7-C43    Um-P7-C43-J01    Un-P1-C4-J15
                            Um-P7-C43-J02                        Un-P2-C4-J15
P8             Um-P8-C42    Um-P8-C42-J01    Un-P1-C4-J14
                            Um-P8-C42-J02                        Un-P2-C4-J14
               Um-P8-C43    Um-P8-C43-J01    Un-P1-C4-J13
                            Um-P8-C43-J02                        Un-P2-C4-J13
P9             Um-P9-C42    Um-P9-C42-J01    Un-P1-C4-J24
                            Um-P9-C42-J02                        Un-P2-C4-J24
               Um-P9-C43    Um-P9-C43-J01    Un-P1-C4-J23
                            Um-P9-C43-J02                        Un-P2-C4-J23
Table 55. System controller to bulk power hub connection table
System            FRU          FRU connector    BPH side A          BPH side B
controller (SC)   location     location         connector location  connector location
SC-A              Um-P1-C2     Um-P1-C2-J04     Un-P1-C4-J05
                               Um-P1-C2-J03                         Un-P2-C4-J05
SC-B              Um-P1-C5     Um-P1-C5-J04     Un-P1-C4-J06
                               Um-P1-C5-J03                         Un-P2-C4-J06
Table 56. Bulk power controller to bulk power hub connection table
Bulk power        FRU          FRU connector    BPH side A          BPH side B
controller (BPC)  location     location         connector location  connector location
BPC-A             Un-P1-C1     Un-P1-C2-J02     Un-P1-C4-J07
                               Un-P1-C2-J03                         Un-P2-C4-J07
BPC-B             Un-P2-C1     Un-P1-C5-J02     Un-P1-C4-J08
                               Un-P1-C5-J03                         Un-P2-C4-J08
Table 57. Management console bulk power hub connection table
Management console  BPH side A connector location  BPH side B connector location
HMC-A               Un-P1-C4-J01
HMC-B                                              Un-P2-C4-J01
FSPSP63
The system has experienced a power error. Please review previous error logs for power-related issues.
FSPSP64
All the processor support interface (PSI) links of the system are either nonfunctional or deconfigured, so
the system cannot perform an IPL appropriately. Look for previous error logs that deconfigure hardware.
FSPSP65
Both service processor ports are on the same IP subnet. This configuration is not valid.
The network is either wired or set up incorrectly. For the Hardware Management Console (HMC), see
Configuring the HMC to correct the problem. For the IBM Systems Director Management Console
(SDMC), see Configuring network to correct the problem. This ends the procedure.
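Whether two port configurations fall on the same IP subnet can be checked before rewiring anything. The Python sketch below uses the standard ipaddress module. The function name and the sample addresses, which come from the reserved documentation ranges, are assumptions for illustration and are not service processor defaults.

```python
import ipaddress


# Illustrative check only. Each argument is an address with a prefix length,
# for example "192.0.2.10/24"; the function name is hypothetical.
def ports_share_subnet(port_a: str, port_b: str) -> bool:
    """Return True if the two interface configurations overlap in one subnet."""
    net_a = ipaddress.ip_interface(port_a).network
    net_b = ipaddress.ip_interface(port_b).network
    return net_a.overlaps(net_b)
```

A result of True for the two service processor ports corresponds to the configuration that FSPSP65 reports as not valid.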
FSPSP66
Use this procedure when a system with redundant service processors was booted with service processor
fail-over disabled.
If you are servicing a system that contains redundant service processors and you booted the system with
the fail-over disabled, perform the following steps.
1. Shut down the system.
2. Use the management console to enable fail-over.
3. Reboot the system. Verify that the SRC that sent you here was not logged during this boot.
FSPSP67
No standby power to the primary service processor.
The primary service processor is not receiving standby power.
Perform the following steps:
1. Verify that the AC power cords are properly plugged into the back of drawers 1 and 2.
2. Verify that the service processor and SPCN cables are properly plugged into the back of the system.
3. Verify that "01" is in the upper left-hand corner of the control (operator) panel in the first drawer.
4. Correct any problems that are found in the preceding steps.
5. If the system does not reach standby after correcting the AC input and cabling, continue to the next
FRU in the list. If the system reaches standby, this ends the procedure.
FSPSP68
A problem occurred during a service action.
Perform a service processor dump. See Performing a service processor dump. Report the dump to your
next level of support. See Reporting a dump. This ends the procedure.
FSPSP70
Look for 1100xxxx errors in the serviceable event view that were logged at about the same time as this
error and resolve them. If there are no 1100xxxx errors logged, contact your next level of support.
FSPSP71
Some corrupt areas of flash or RAM have been detected in the bulk power controller (BPC).
Use the following table to determine the action to perform.
System: 9119-FHB, 9125-F2C
Action: Replace the failing bulk power controller. See System FRU locations for information about
system FRU locations.
Look at the platform ID in the IQYYLOG error log entry to determine the failing bulk power
controller (BPC).
• For platform ID 0x86xxxxxx, replace bulk power controller A at location Un-P1-C1.
• For platform ID 0x88xxxxxx, replace bulk power controller B at location Un-P2-C1.
Example:
|------------------------------------------|
| Platform Event Log - 0x88EC6DB6          |
|------------------------------------------|
|              Private Header              |
|------------------------------------------|
| Section Version   : 1                    |
| Sub-section type  : 0                    |
| Created by        : hlth                 |
| Created at        : 04/21/2010 05:05:17  |
| Committed at      : 04/21/2010 05:05:17  |
| Creator Subsystem : FipS Error Logger    |
| CSSVER            :                      |
| Platform Log Id   : 0x88EC6DB6           |
| Entry Id          : 0x88EC6DB6           |
| Total Log Size    : 496                  |
|------------------------------------------|
In this example, the platform ID is 0x88, which indicates
bulk power controller (BPC) B should be replaced.
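The platform ID check above looks only at the leading byte of the platform log ID. A minimal Python sketch of that check follows. The function name is hypothetical, and the log ID is assumed to be available as a 32-bit integer.

```python
# Prefixes and locations are taken from the FSPSP71 table above.
BPC_BY_PLATFORM_PREFIX = {
    0x86: "bulk power controller A (Un-P1-C1)",
    0x88: "bulk power controller B (Un-P2-C1)",
}


def failing_bpc(platform_log_id: int) -> str:
    """Map the leading byte of a platform log ID to the BPC to replace."""
    prefix = (platform_log_id >> 24) & 0xFF
    if prefix not in BPC_BY_PLATFORM_PREFIX:
        raise ValueError(f"platform ID prefix {prefix:#04x} is not a BPC prefix")
    return BPC_BY_PLATFORM_PREFIX[prefix]
```

With the log ID from the example, failing_bpc(0x88EC6DB6) selects bulk power controller B.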
FSPSP73
One of the BPHs (bulk power hubs) detected an IP address that is not valid.
If you need additional information for failing part numbers, location codes, or removal and replacement
procedures, see Part locations and location codes. Select your machine type and model number to find
additional location codes, part numbers, or replacement procedures for your system.
Perform the following steps:
1. Enter “netvcmd -validate” on the BPC (bulk power controller) command line. To perform this
operation, your authority level must be an authorized service provider. Perform the following steps:
a. On the ASMI Welcome pane, specify your user ID and password, and click Log In.
b. In the navigation area, expand System Service Aids.
c. Select BPC (bulk power controller) Command Line.
d. Enter “netvcmd -validate” on the command line.
e. Click Execute to perform the command on the BPC.
Check the service processor error logs in the ASMI. Is there a new B1818CAC SRC logged?
Yes:
Continue to the next step.
No:
This ends the procedure.
2. Use the bulk power connection tables in isolation procedure FSPSP61 to verify that the cables going
to the BPH port identified by words 6 and 7 of the SRC data are connected properly.
v Word 6 indicates which bulk power hub detected the IP address that is not valid.
If word 6 is 0x0000000A, bulk power hub A detected an IP address that is not valid.
If word 6 is 0x0000000B, bulk power hub B detected an IP address that is not valid.
v Word 7 indicates the bulk power hub port that detected the IP address that is not valid.
v Word 8 is the IP address that is not valid displayed in hexadecimal format. For example, if word 8
is 090619F9, the IP address that is not valid is 9.6.25.249.
3. Are the cables connected correctly?
Yes:
Continue to step 8.
No:
Correct the cable connections. Then continue to the next step.
4. Repeat step 1. Does the problem persist?
Yes:
Continue to the next step.
No:
This ends the procedure.
5. Enter “nmgrcmd -surv -display-npl” on the service processor command line. See Entering service
processor commands. Find the “Node” and “Pos” of the BPH that is detecting the IP address that is
not valid. Does the output show the host with the IP address that is not valid?
Yes:
Continue to the next step.
No:
Continue to step 8.
6. Enter “nmgr -admin -reboot -node=x -pos=y”, where x and y are the “Node” and “Pos” values
obtained in step 5.
Attention: This command will reboot the BPH that detected the IP address that is not valid. If there
is more than one BPH that detected an IP address that is not valid, reboot one BPH at a time and
wait for the BPH to come back online before rebooting the next one.
7. Repeat step 1. Does the problem persist?
Yes:
Continue to the next step.
No:
This ends the procedure.
8. Perform symbolic FRU LICCODE to update the service processor firmware.
9. Repeat step 1. Does the problem persist?
Yes:
Continue to the next step.
No:
This ends the procedure.
10. Replace the hardware connected to the bulk power hub port that is called out after this procedure in
the detailed error log. This ends the procedure.
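Step 2 of FSPSP73 reads word 8 as an IPv4 address in hexadecimal, one byte per octet. The Python sketch below performs that conversion with the standard ipaddress module. The function name is hypothetical.

```python
import ipaddress


# Converts word 8 as displayed (eight hexadecimal digits) to dotted decimal.
def word8_to_ip(word8_hex: str) -> str:
    """Return the dotted-decimal IPv4 address encoded in word 8."""
    return str(ipaddress.IPv4Address(int(word8_hex, 16)))
```

Applied to the value from step 2, word8_to_ip("090619F9") gives 9.6.25.249.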
FSPSP75
One of the BPHs (bulk power hubs) detected an IP address that is not valid.
Perform isolation procedure FSPSP73. This ends the procedure.
FSPSP79
Look for uncorrectable memory errors in the serviceable event view that were logged at about the same
time as this error and resolve them. If there are no uncorrectable memory errors logged, replace the
processor module.
FSPSP83
The service processor detected hardware that was removed from the system configuration. Use the
Advanced System Management Interface (ASMI) to determine the hardware to be replaced and schedule
maintenance at your earliest convenience.
To determine the hardware to replace, complete the following steps:
Note: To perform this operation, your authority level must be administrator or authorized service
provider.
1. On the ASMI Welcome pane, specify your user ID and password, and click Log In.
2. In the navigation area, expand System Service Aids > Deconfiguration Records.
3. Use the system reference code (SRC) associated with the unconfigured hardware to resolve the
problem. This ends the procedure.
FSPSPC1
If the system hangs after the code that sent you to this procedure appears in the control panel, perform
these steps to reset the service processor.
To perform the correct procedure for the system that you are servicing, select your machine type and
model (system) from among the following:
• 8202-E4B, 8202-E4C, 8202-E4D, 8205-E6B, 8205-E6C, 8205-E6D, 8231-E2B, 8231-E1C, 8231-E1D,
8231-E2C, 8231-E2D, 8233-E8B, 8236-E8C, 8268-E1D
• 8248-L4T, 8408-E8D, 9109-RMD
• 8412-EAD, 9117-MMB, 9117-MMC, 9117-MMD, 9179-MHB, 9179-MHC, or 9179-MHD
• 9119-FHB
• 9125-F2C
8202-E4B, 8202-E4C, 8202-E4D, 8205-E6B, 8205-E6C, 8205-E6D, 8231-E2B, 8231-E1C, 8231-E1D, 8231-E2C,
8231-E2D, 8233-E8B, 8236-E8C, 8268-E1D
Attention: You should periodically check the system firmware level on all your servers and update the
firmware to the latest level, if appropriate. If you were directed to this procedure because the server
displayed B1817201, C1001014, or C1001020, or a combination of these codes, the latest firmware can help
avoid a recurrence of this problem.
Even if the customer cannot update the firmware on this system at this time, all of their systems should
be updated to the latest firmware level as soon as possible to help prevent this problem from occurring
on other systems.
Resetting the service processor on systems with a physical control panel
1. If the Advanced System Management Interface (ASMI) is available, reset the service processor using
the ASMI menus.
Were you able to use the ASMI menus to reset the service processor?
Yes: This ends the procedure.
No: Continue with the next step.
2. Activate the service processor pinhole reset switch on the system's control (operator) panel by
carefully performing these steps:
a. Using an insulated paper clip, unbend the paper clip so that it has a straight section about 2
inches long.
b. Insert the clip straight into the hole, keeping the clip perpendicular to the plastic bezel.
c. When you engage the reset switch, you should feel the detent of the switch. Pressing the reset
switch resets the service processor and causes the system to shut down.
3. Reboot the system in slow mode from the permanent side, using control panel function 02 or the
ASMI menus, if available.
4. If the hang repeats, check whether a firmware update that fixes the problem is available, and apply it
if one is available. For more information, see:
v If the system is managed by a Hardware Management Console (HMC), see Upgrading the machine
code on an HMC from Version 6 to Version 7.
v If the system is managed by an IBM Systems Director Management Console (SDMC), see Updating
the SDMC.
v If the system is not managed by a management console, see Managing the Advanced System
Management Interface.
5. Choose from the following options:
v If no firmware update is available, continue with the next step.
v If a firmware update is available, apply it using the management console, if one is attached.
Did the update resolve the problem so that the system now boots?
Yes: This ends the procedure.
No: You are here because there is no management console attached to the system, the flash update
failed, or the updated firmware did not fix the hang. Continue with the next step.
6. Choose from the following options:
v If you are a customer, contact your authorized hardware service provider. This ends the procedure.
v If you are the authorized hardware service provider, continue with the next step.
7. Replace the service processor (see System FRU locations).
8. If replacing the service processor does not fix the problem, contact your next level of support. This
ends the procedure.
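The branching in the physical control panel procedure above can be summarized in code. The following is a minimal sketch, not a service tool: the function name, the parameter names, and the action strings are hypothetical. It also assumes that the procedure ends if the hang does not repeat after the reboot, which the steps imply but do not state.

```python
def remaining_steps(asmi_reset_worked, hang_repeats=False,
                    update_available=False, update_fixed_hang=False,
                    is_service_provider=False):
    """Return the ordered actions for the physical control panel procedure."""
    # Step 1: a successful ASMI reset ends the procedure.
    if asmi_reset_worked:
        return ["end"]

    # Steps 2 and 3: pinhole reset, then reboot.
    steps = ["pinhole reset", "reboot in slow mode from the permanent side"]

    # Assumption: if the hang does not repeat, nothing more is needed.
    if not hang_repeats:
        return steps + ["end"]

    # Steps 4 and 5: apply a firmware update if one is available.
    if update_available:
        steps.append("apply firmware update")
        if update_fixed_hang:
            return steps + ["end"]

    # Step 6: customers stop here and call their service provider.
    if not is_service_provider:
        return steps + ["contact authorized hardware service provider"]

    # Steps 7 and 8: service provider actions.
    return steps + ["replace service processor",
                    "if still failing, contact next level of support"]
```

Calling remaining_steps(True) returns only the end marker, because a successful ASMI reset ends the procedure at step 1.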
Resetting the service processor on systems with a logical control panel
1. Reset the service processor. Use the Advanced System Management Interface (ASMI) menus, if
available, or use the management console to first remove and then reapply power to the service processor.
2. Using the setting in the ASMI menu, reboot the system in slow mode from the permanent side.
3. If the hang repeats, check with service support to see if a firmware update is available that fixes the
problem. For the Hardware Management Console (HMC), see Upgrading the machine code on an
HMC from Version 6 to Version 7. If the system is managed by an IBM Systems Director Management
Console (SDMC), see Updating the SDMC.
4. Choose from the following options:
v If no firmware update is available, continue with the next step.
v If a firmware update is available, apply it using the management console. Did the update resolve
the problem so that the system now boots?
No:
You are here because there is no management console attached to the system, the flash
update failed, or the updated firmware did not fix the hang. Continue with the next step.
Yes:
This ends the procedure.
5. Choose from the following options:
v If you are a customer, contact your authorized hardware service provider. This ends the procedure.
v If you are a customer and your system has a secondary service processor, use the management
console to initiate a service processor failover and continue to power on the system. Contact your
authorized service provider to schedule deferred maintenance on the service processor that is not
working. This ends the procedure.
v If you are the authorized hardware service provider, continue with the next step.
6. Replace the service processor. See System FRU locations for information about the FRU location for
the system that you are servicing.
7. If replacing the service processor does not fix the problem, contact your next level of support. This
ends the procedure.
8248-L4T, 8408-E8D, 9109-RMD
Select the procedure that applies to the system on which you are working.
v Systems with a physical control panel.
v Systems with a logical control panel.
Resetting the service processor on systems with a physical control panel
1. If the Advanced System Management Interface (ASMI) is available, reset the service processor using
the ASMI menus. Were you able to use the ASMI menus to reset the service processor?
Yes:
This ends the procedure.
No:
Continue with the next step.
2. Activate the service processor pinhole reset switch on the system's operator panel by carefully
performing these steps:
a. Using an insulated paper clip, unbend the clip so that it has a straight section about two inches
long.
b. Insert the clip straight into the hole, keeping the clip perpendicular to the plastic bezel.
c. When you engage the reset switch, you should feel the detent of the switch. Pressing the reset
switch resets the service processor and causes the system to shut down.
3. Reboot the system from the permanent side using control panel function 02 or the ASMI menus, if
available.
4. If the hang repeats, check with service support to see if a firmware update is available that fixes the
problem.
5. Choose from the following options:
v If no firmware update is available, continue with the next step.
v If a firmware update is available, apply it using the management console. Did the update resolve
the problem so that the system now boots?
Yes:
This ends the procedure.
No:
You are here because there is no management console attached to the system, the flash
update failed, or the updated firmware did not fix the hang. Continue with the next step.
6. Choose from the following options:
v If you are a customer, contact your authorized hardware service provider. This ends the procedure.
v If you are the authorized hardware service provider, continue with the next step.
7. Replace the service processor card. See System FRU locations for information about the FRU location
for the system that you are servicing.
8. If replacing the service processor card does not fix the problem, contact your next level of support.
This ends the procedure.
Resetting the service processor on systems with a logical control panel
1. Reset the service processor. Use the Advanced System Management Interface (ASMI) menus, if
available, or use the management console to first remove and then reapply power to the service processor.
2. Using the setting in the ASMI menu, reboot the system from the permanent side.
3. If the hang repeats, check with service support to see if a firmware update is available that fixes the
problem.
4. Choose from the following options:
v If no firmware update is available, continue with the next step.
v If a firmware update is available, apply it using the management console. Did the update resolve
the problem so that the system now boots?
No:
You are here because there is no management console attached to the system, the flash
update failed, or the updated firmware did not fix the hang. Continue with the next step.
Yes:
This ends the procedure.
5. Choose from the following options:
v If you are a customer, contact your authorized hardware service provider. This ends the procedure.
v If you are the authorized hardware service provider, continue with the next step.
6. Replace the service processor card. See System FRU locations for information about the FRU location
for the system that you are servicing.
7. If replacing the service processor card does not fix the problem, contact your next level of support.
This ends the procedure.
8412-EAD, 9117-MMB, 9117-MMC, 9117-MMD, 9179-MHB, 9179-MHC, or 9179-MHD
Are you servicing a system with multiple drawers?
Yes:
Verify that the VPD card is present in the first (top) drawer, and that the VPD card is not
installed in any of the other processor drawers. If there are problems with the configuration of
the VPD card, correct them, and reapply AC power. If the service processor comes up to standby
mode, this ends the procedure. If the service processor still fails early in the boot process, or the
VPD card was configured correctly, continue to the next step.
No:
Continue with the next step.
Select the procedure that applies to the system on which you are working.
v Systems with a physical control panel.
v Systems with a logical control panel.
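The VPD card check for multiple-drawer systems described above reduces to one rule: the card must be present in the first (top) drawer and absent from every other processor drawer. The following is a minimal sketch of that rule. The function name and the list-of-booleans input are assumptions made for illustration.

```python
def vpd_card_placement_ok(drawers_with_vpd_card):
    """Check VPD card placement across processor drawers.

    drawers_with_vpd_card is a list of booleans, one per drawer, where
    index 0 is the first (top) drawer and True means a card is installed.
    """
    if not drawers_with_vpd_card:
        return False
    first = bool(drawers_with_vpd_card[0])
    others = drawers_with_vpd_card[1:]
    # The card must be in the top drawer and in no other drawer.
    return first and not any(others)
```

A result of False corresponds to the case in which the configuration must be corrected and AC power reapplied.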
Resetting the service processor on systems with a physical control panel
1. If the Advanced System Management Interface (ASMI) is available, reset the service processor using
the ASMI menus. Were you able to use the ASMI menus to reset the service processor?
Yes:
This ends the procedure.
No:
Continue with the next step.
2. Activate the service processor pinhole reset switch on the system's operator panel by carefully
performing these steps:
a. Using an insulated paper clip, unbend the clip so that it has a straight section about two inches
long.
b. Insert the clip straight into the hole, keeping the clip perpendicular to the plastic bezel.
c. When you engage the reset switch, you should feel the detent of the switch. Pressing the reset
switch resets the service processor and causes the system to shut down.
3. Reboot the system in slow mode from the permanent side using control panel function 02 or the
ASMI menus, if available.
4. If the hang repeats, check with service support to see if a firmware update is available that fixes the
problem. For the Hardware Management Console (HMC), see Upgrading the machine code on an
HMC from Version 6 to Version 7. If the system is managed by an IBM Systems Director Management
Console (SDMC), see Updating the SDMC.
5. Choose from the following options:
v If no firmware update is available, continue with the next step.
v If a firmware update is available, apply it using the management console. Did the update resolve
the problem so that the system now boots?
Yes:
This ends the procedure.
No:
You are here because there is no management console attached to the system, the flash
update failed, or the updated firmware did not fix the hang. Continue with the next step.
6. Choose from the following options:
v If you are a customer, contact your authorized hardware service provider. This ends the procedure.
v If you are a customer and your system has a secondary service processor, use the management
console to initiate a service processor failover and continue to power on the system. Contact your
authorized service provider to schedule deferred maintenance on the service processor that is not
working. This ends the procedure.
v If you are the authorized hardware service provider, continue with the next step.
7. Replace the service processor. See System FRU locations for information about the FRU location for
the system that you are servicing.
8. If replacing the service processor does not fix the problem, contact your next level of support. This
ends the procedure.
Resetting the service processor on systems with a logical control panel
1. Reset the service processor. Use the Advanced System Management Interface (ASMI) menus, if
available, or use the management console to first remove and then reapply power to the service processor.
2. Using the setting in the ASMI menu, reboot the system in slow mode from the permanent side.
3. If the hang repeats, check with service support to see if a firmware update is available that fixes the
problem. For the Hardware Management Console (HMC), see Upgrading the machine code on an
HMC from Version 6 to Version 7. If the system is managed by an IBM Systems Director Management
Console (SDMC), see Updating the SDMC.
4. Choose from the following options:
v If no firmware update is available, continue with the next step.
v If a firmware update is available, apply it using the management console. Did the update resolve
the problem so that the system now boots?
No:
You are here because there is no management console attached to the system, the flash
update failed, or the updated firmware did not fix the hang. Continue with the next step.
Yes:
This ends the procedure.
5. Choose from the following options:
v If you are a customer, contact your authorized hardware service provider. This ends the procedure.
v If you are a customer and your system has a secondary service processor, use the management
console to initiate a service processor failover and continue to power on the system. Contact your
authorized service provider to schedule deferred maintenance on the service processor that is not
working. This ends the procedure.
v If you are the authorized hardware service provider, continue with the next step.
6. Replace the service processor. See System FRU locations for information about the FRU location for
the system that you are servicing.
7. If replacing the service processor does not fix the problem, contact your next level of support. This
ends the procedure.
9119-FHB
1. Reset the service processor. Use the Advanced System Management Interface (ASMI) menus, if
available, or use the management console to first remove and then reapply power to the processor node.
2. Using the setting in the ASMI menu, reboot the system to hypervisor standby from the permanent
side.
3. If the hang repeats, check with service support to see if a firmware update is available that fixes the
problem. See Getting fixes in the Customer service and support topic for details.
4. Choose from the following options:
v If no firmware update is available, continue with the next step.
v If a firmware update is available, apply it using the Service Focal Point™ in the management
console. Did the update resolve the problem so that the system now boots?
No:
You are here because there is no management console attached to the system, the flash
update failed, or the updated firmware did not fix the hang. Continue with the next step.
Yes:
This ends the procedure.
5. Choose from the following options:
v If you are a customer, and your system has only one system controller, contact your hardware
service provider. This ends the procedure.
v If you are a customer, and your system has a secondary system controller, use the management
console to initiate a service processor failover and continue to power on the system. Contact your
service provider to schedule deferred maintenance on the service processor that is hung. This ends
the procedure.
v If you are a hardware service provider, continue with the next step.
6. Replace the system controller (symbolic FRU SYSCONTR).
7. If replacing the system controller does not fix the problem, contact your next level of support. This
ends the procedure.
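Step 5 of the 9119-FHB procedure above depends on who is performing the service and on whether a secondary system controller is installed. The following is a minimal sketch of that choice. The function name, the role strings, and the returned action text are hypothetical.

```python
def step5_action(role, has_secondary_system_controller=False):
    """Return the step 5 action for the 9119-FHB procedure."""
    if role == "service_provider":
        # Service providers go on to step 6.
        return "replace the system controller (symbolic FRU SYSCONTR)"
    if role == "customer":
        if has_secondary_system_controller:
            # Failover keeps the system usable until deferred maintenance.
            return ("initiate a service processor failover, "
                    "then schedule deferred maintenance")
        return "contact your hardware service provider"
    raise ValueError("role must be 'customer' or 'service_provider'")
```

The failover branch applies only to customers, because a service provider continues directly to the replacement step.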
9125-F2C
1. Reset the service processor. Use the Advanced System Management Interface (ASMI) menus, if
available, or use the management console to first remove and then reapply power to the processor node.
2. Using the setting in the ASMI menu, reboot the system to hypervisor standby from the permanent
side.
3. If the hang repeats, check with service support to see if a firmware update is available that fixes the
problem. See Getting fixes in the Customer service and support topic for details.
4. Choose from the following options:
v If no firmware update is available, continue with the next step.
v If a firmware update is available, apply it using the Service Focal Point in the management console.
Did the update resolve the problem so that the system now boots?
No:
You are here because there is no management console attached to the system, the flash
update failed, or the updated firmware did not fix the hang. Continue with the next step.
Yes:
This ends the procedure.
5. Replace the DCCAs, one at a time, at locations Un-P1-C147 and Un-P1-C148.
6. If replacing the DCCAs does not fix the problem, contact your next level of support. This ends the
procedure.
FSPSPD1
If the system hangs after the code that sent you to this procedure appears in the control panel, perform
these steps to reset the service processor.
Perform “FSPSPC1” on page 297.
Tape unit isolation procedures
This topic contains the procedures necessary to isolate a failure in a tape device.
In these procedures, the term tape unit may be any one of the following:
v An internal tape drive, including its electronic parts and status indicators
v An internal tape drive, including its tray, power regulator, and AMDs
v An external tape drive, including its power supply, power switch, power regulator, and AMDs
You should interpret the term tape unit to mean the tape drive you are working on. However, these
procedures use the terms tape drive and enclosure to indicate a more specific meaning.
Read and observe all safety procedures before servicing the system and while performing the procedures
in this topic. Unless instructed otherwise, always power off the system or expansion unit where the FRU
is located (see Powering on and powering off the system) before removing, exchanging, or installing a
field-replaceable unit (FRU).
DANGER
When working on or around the system, observe the following precautions:
Electrical voltage and current from power, telephone, and communication cables are hazardous. To
avoid a shock hazard:
v Connect power to this unit only with the IBM provided power cord. Do not use the IBM
provided power cord for any other product.
v Do not open or service any power supply assembly.
v Do not connect or disconnect any cables or perform installation, maintenance, or reconfiguration
of this product during an electrical storm.
v The product might be equipped with multiple power cords. To remove all hazardous voltages,
disconnect all power cords.
v Connect all power cords to a properly wired and grounded electrical outlet. Ensure that the outlet
supplies proper voltage and phase rotation according to the system rating plate.
v Connect any equipment that will be attached to this product to properly wired outlets.
v When possible, use one hand only to connect or disconnect signal cables.
v Never turn on any equipment when there is evidence of fire, water, or structural damage.
v Disconnect the attached power cords, telecommunications systems, networks, and modems before
you open the device covers, unless instructed otherwise in the installation and configuration
procedures.
v Connect and disconnect cables as described in the following procedures when installing, moving,
or opening covers on this product or attached devices.
To Disconnect:
1. Turn off everything (unless instructed otherwise).
2. Remove the power cords from the outlets.
3. Remove the signal cables from the connectors.
4. Remove all cables from the devices.
To Connect:
1. Turn off everything (unless instructed otherwise).
2. Attach all cables to the devices.
3. Attach the signal cables to the connectors.
4. Attach the power cords to the outlets.
5. Turn on the devices.
(D005)
TUPIP03
You were directed here because you may need to exchange a failing part.
The failing part was determined from one of the following:
v Other problem isolation procedures
v The Failing item column of the tape unit reference code table
v Tape unit service information
Note: Occasionally, the system is available but not performing an alternate IPL (type D IPL). In this
instance, any hardware failure of the tape unit I/O processor, or any device attached to it is not critical.
With the exception of the loss of the affected devices, the system remains available.
1. If the system has logical partitions, perform this procedure from the logical partition that reported
the problem (see Determining if the system has logical partitions).
2. Do you need to exchange a possible failing device?
v No: Do you need to exchange the tape unit I/O processor?
No: Continue with the next step.
Yes: Exchange the tape unit I/O processor. See System FRU locations. When you have
completed the remove and replace procedure, continue with the next step.
v Yes: Perform the following steps:
– For an internal tape unit, go to System FRU locations for information about system FRU
locations.
– For an external tape unit, go to the remove and replace procedures in the device service
information.
3. Are you working with a tape unit in the system unit or in an expansion unit?
v Yes: Is the system available, and can you enter commands on the command line?
No: Continue with the next step.
Yes: Go to step 9.
v No: Continue with the next step.
4. Display the selected IPL type.
Is the displayed IPL type D?
v No: Do you want to perform an alternate IPL (type D)?
No: Continue with the next step.
Yes: Go to step 6.
v Yes: Go to step 6.
5. Perform an IPL from disk by doing the following:
a. Power off the system. See Powering on and powering off the system.
b. Select IPL type A in manual mode.
c. Power on the system.
d. Go to step 8.
6. Place the first tape of the latest set of SAVSYS tapes or SAVSTG tapes, or the first Software
Distribution tape in the alternate IPL tape drive. The tape drive automatically becomes ready for the
IPL operation (this may take several minutes).
7. Perform an alternate IPL by doing the following:
a. Power off the system.
b. Select IPL type D in Manual mode.
c. Power on the system.
8. The IPL may take one or more hours to complete.
Does an unexpected reference code appear on the control panel, and is the System Attention light
on?
v No: Does the IPL complete successfully?
Yes: Continue with the next step.
No: Perform problem analysis to continue analyzing the problem. This ends the procedure.
v Yes: Go to step 10.
9. Perform the following steps to test the tape unit:
a. Enter VFYTAP (the Verify Tape command) on the command line.
b. Follow the prompts on the Verify Tape displays, then return here and answer the following
question.
Does the VFYTAP command end successfully?
No: Continue with the next step.
Yes: This ends the procedure.
10. Record the SRC.
Is the SRC the same one that sent you to this procedure?
Yes: You cannot continue to analyze the problem. Use the original SRC and exchange the FRUs.
Begin with the FRU which has the highest percent of probable failure (see the failing item list for
this reference code). This ends the procedure.
No: A different SRC occurred. Use the new SRC to correct the problem. This ends the procedure.
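Step 10 of TUPIP03 compares the recorded SRC with the SRC that sent you to the procedure and, if they match, exchanges FRUs in order of probable failure. The following is a minimal sketch of that ordering. The function name and the (name, percent) input format are assumptions, and the values in the usage note are placeholders, not entries from a real failing item list.

```python
def frus_to_exchange(original_src, recorded_src, failing_items):
    """Return FRU names in exchange order, or None for a different SRC.

    failing_items is a list of (fru_name, percent_probable_failure)
    tuples taken from the failing item list for the original SRC.
    """
    if recorded_src != original_src:
        # A different SRC occurred: use the new SRC instead.
        return None
    # Same SRC: begin with the FRU that has the highest percentage.
    ordered = sorted(failing_items, key=lambda item: item[1], reverse=True)
    return [name for name, _ in ordered]
```

With placeholder input such as [("cable", 10), ("tape unit", 60), ("I/O processor", 30)] and matching SRCs, the function lists the tape unit first, then the I/O processor, then the cable.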
TUPIP04
Use this procedure to reset an IOP and its attached tape units. Read the overview in “Tape unit
isolation procedures” on page 303 before continuing with this procedure.
If disk units are attached to an IOP, you must power off the system, then power it on to reset the IOP.
1. If the system has logical partitions, perform this procedure from the logical partition that reported
the problem (see Determining if the system has logical partitions).
2. Is the tape unit powered on?
v No: Continue with the next step.
v Yes: Perform the following steps:
a. Press the Unload switch on the front of the tape unit you are working on.
b. If a data cartridge or a tape reel is present, do not load it until you need it.
c. Continue with the next step of this procedure.
3. Verify the following:
v If the external device has a power switch, ensure that it is set to the On position.
v Ensure that the power and external signal cables are connected correctly.
Note: For every 8mm and 1/4 inch tape unit, the I/O bus terminating plug for the SCSI external
signal cable is connected internally. These devices do not need and should not have an external
terminating plug.
4. Did you press the Unload switch in step 2?
v Yes: Can you enter commands on the command line?
Yes: Continue with the next step.
No: Go to step 11 on page 308.
v No: Press the Unload switch on the front of the tape unit you are working on. If a data cartridge
or a tape reel is present, do not load it until you need it. Continue with the next step of this
procedure.
5. Has the tape unit operated correctly since it was installed? If you do not know, continue with the
next step of this procedure.
Yes: Continue with the next step.
No: Go to step 11 on page 308.
6. If a system message displayed an I/O processor name, a tape unit resource name, or a device name,
record the name for use in the next step. You may continue without a name.
Does the I/O processor give support to only one tape unit? If you do not know, continue with the
next step of this procedure.
v No: Continue with the next step.
v Yes: Perform the following. You must complete all parts of this step before you press Enter.
a. Enter
WRKCFGSTS *DEV *TAP ASTLVL(*INTERMED)
(the Work with Configuration Status command) on the command line.
b. If the device is not varied off, select Vary off before continuing.
c. Select Vary on for the failing tape unit.
d. Enter
RESET(*YES)
(the Reset command) on the command line.
e. Press Enter. This ends the procedure.
7. This step determines if the I/O processor for the tape unit gives support to other tape units or to a
disk unit.
Notes:
a. If you cannot determine the tape unit you are attempting to use, go to step 11 on page 308.
b. System messages refer to other tape units that the I/O processor gives support to as associated
devices.
Enter WRKHDWRSC *STG (the Work with Hardware Resources command) on the command line.
Did you record an I/O processor (IOP) resource name in step 6 on page 306?
v No: Perform the following steps:
a. Select Work with resources for each storage resource IOP (CMB01, SIO1, and SIO2 are
examples of storage resource IOPs).
b. Find the Configuration Description name of the tape unit you are attempting to use, and then
record the Configuration Description names of all tape units that the I/O processor gives
support to.
c. Record whether the I/O processor for the tape unit also gives support to any disk unit
resources.
d. Continue with the next step.
v Yes: Perform the following steps:
a. Select Work with resources for that resource.
b. Record the Configuration description name of all tape units for which the I/O processor
provides support.
c. Record whether the I/O processor for the tape unit also gives support to any disk unit
resources.
d. Continue with the next step.
8. Does the I/O processor give support to any disk unit resources?
No: Continue with the next step.
Yes: The Reset option is not available. Go to step 11 on page 308.
9. Does the I/O processor give support to only one tape unit?
v No: Continue with the next step.
v Yes: Perform the following steps:
a. Select Work with configuration description and press Enter.
b. Select Work with status and press Enter.
Note: You must complete the remaining parts of this step before you press Enter again.
c. If the device is not varied off, select Vary off before continuing.
d. Select Vary on for the failing tape unit.
e. Enter RESET(*YES) (the Reset command) on the command line.
f. Press Enter. This ends the procedure.
10. Perform the following steps:
a. Enter
WRKCFGSTS *DEV *TAP ASTLVL(*INTERMED)
(the Work with Configuration Status command) on the command line.
b. Select Vary off for the failing tape unit and associated devices (the devices you identified in step
7 on page 307), and then press Enter.
Note: You must complete the remaining parts of this step before you press Enter again.
c. Select Vary on for the failing tape unit.
d. Enter
RESET(*YES)
(the Reset command) on the command line.
e. Press Enter.
f. Select Vary on for the associated devices (tape units) you identified in step 7 on page 307. It is not
necessary to use the Reset option again.
Does a system message indicate that the vary on operation failed?
Yes: Continue with the next step.
No: This ends the procedure.
11. The Reset is not available, or you were not able to find the Configuration Description name when
using
WRKHDWRSC *STG
(the Work with Hardware Resources command).
You can perform an I/O processor (IOP) reset by performing an IPL of the I/O processor. All devices
that are attached to the IOP will reset.
The following steps describe how to load an IOP, how to configure a tape drive, how to vary on tape
devices, and how to make tape devices available.
12. Is a data cartridge or a tape reel installed in the tape device?
No: Continue with the next step.
Yes: Remove the data cartridge or tape reel. Continue with the next step.
13. Can you enter commands on the command line?
v Yes: Continue with the next step.
v No: Perform the following steps:
a. Power off the system. See Powering on and powering off the system.
b. Power on the system.
The system performs an IPL and resets all devices. If the tape device responds to SCSI address 7,
the system configures the tape device. This ends the procedure.
14. Verify that automatic configuration is on by entering
DSPSYSVAL QAUTOCFG
(the Display System Value command) on the command line.
Is the Autoconfigure device option set to 1?
v Yes: Continue with the next step.
v No: Perform the following steps:
a. Press Enter to return to the command line.
b. Set automatic configuration to On by entering
CHGSYSVAL QAUTOCFG '1'
(the Change System Value command) on the command line.
Note: QAUTOCFG resets to its initial value in step 20 on page 309.
c. Continue with the next step.
15. Perform the following steps:
a. Enter
STRSST
(the Start SST command) on the command line.
b. On the Start Service Tools Sign On display, type a user ID that has QSRV authority and its
password.
c. Select Start a Service Tool > Hardware Service Manager > Logical Hardware Resources >
System Bus Resources. The Logical Hardware Resources on System Bus display shows all of the
IOPs.
d. Find the IOP you want to reset. You must ensure that no one is using any of the tape units,
communication channels, or display stations that are attached to the IOP you want to reset.
Does a "*" indicator appear to the right of the IOP description?
v No: Continue with the next step.
v Yes: Disk units are attached to the IOP.
Perform the following steps:
a. Press F3 until the Exit System Service Tools display appears.
b. Press Enter.
c. Power off the system. See Powering on and powering off the system.
d. Power on the system.
The system performs an IPL and resets all devices. This ends the procedure.
16. Perform the following steps:
a. Select I/O debug > IPL the I/O processor.
b. When the IOP reset is complete, continue with the next step of this procedure.
17. Perform the following steps:
a. Press F12 to return to the Logical Hardware Resources on System Bus display.
b. Select Resources associated with IOP for the IOP you reset.
Did the IOP detect the tape unit?
v Yes: Continue with the next step.
v No: The IOP did not detect the tape unit. Consider the following:
– Ensure that the tape unit is powered on and that the signal cables are connected correctly. If
you find and correct a power or a signal cable problem, return to step 15.
– The tape unit may be failing. Go to the tape unit service information and perform the
procedures for analyzing device problems. If you find and correct a tape unit problem, return
to step 15.
– If none of the above are true, ask your next level of support for assistance. This ends the
procedure.
18. Press F3 until the Exit System Service Tools display appears. Then press Enter.
19. Was automatic configuration Off before you performed step 14 on page 308?
Yes: Continue with the next step.
No: This ends the procedure.
20. Enter
CHGSYSVAL QAUTOCFG '0'
(the Change System Value command) on the command line to reset QAUTOCFG to its initial value.
This ends the procedure.
TUPIP06
Use this procedure to isolate a Device Not Found message during installation from an alternate device.
There are several possible causes:
v The alternate installation device was not correctly defined.
v The alternate installation device was not made ready.
v The alternate installation device does not contain installation media.
v The alternate installation device is not powered on.
v The alternate installation device is not connected properly.
v There is a hardware error on the alternate installation device or the attached I/O processor.
Read the danger notices in “Tape unit isolation procedures” on page 303 before continuing with this
procedure.
1. Is the device that you are using for alternate installation defined as the alternate installation device?
v Yes: Is the alternate installation device ready?
Yes: Continue with the next step.
No: Make the alternate installation device ready and retry the alternate installation. This ends
the procedure.
v No: Correct the alternate installation device information and retry the alternate installation. This
ends the procedure.
2. Is there installation media in the alternate installation device?
v Yes: Is the alternate installation device an external device?
Yes: Continue with the next step.
No: Go to step 5.
v No: Load the correct media and retry the alternate installation. This ends the procedure.
3. Is the alternate installation device powered on?
v Yes: Make sure that the alternate installation device is properly connected to the I/O processor or
I/O adapter card.
Is the alternate installation device properly connected?
Yes: Go to step 5.
No: Correct the problem and retry the alternate installation. This ends the procedure.
v No: Continue with the next step.
4. Ensure that the power cable is connected tightly to the power cable connector at the back of the
alternate device. Ensure that the power cable is connected to a power outlet that has the correct
voltage. Set the alternate device Power switch to the Power On position.
The Power light should go on and remain on. If a power problem is present, one of the following
power failure conditions may occur:
v The Power light flashes, then remains off.
v The Power light does not go on.
v Another indication of a power problem occurs.
Does one of the above power failure conditions occur?
v No: The alternate device is powered on and runs its power-on self-test. Wait for the power-on
self-test to complete.
Does the power-on self-test complete successfully?
No: Go to the service information for the specific alternate installation device to correct the
problem. Then retry the alternate installation. This ends the procedure.
Yes: Retry the alternate installation. This ends the procedure.
v Yes: Perform the following steps:
a. Go to the service information for the specific alternate device to correct the power problem.
b. When you have corrected the power problem, retry the alternate installation. This ends the
procedure.
5. Was a device error recorded in the Product Activity Log?
No: Contact your next level of support. This ends the procedure.
Yes: See Reference Code Finder to look up the device error record and correct the problem. This
ends the procedure.
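The TUPIP06 flow above can be sketched as a small decision function. This is only an illustration of the branching in steps 1 through 5; the class name, field names, and returned strings are hypothetical and are not part of any IBM tool.

```python
from dataclasses import dataclass


@dataclass
class AltInstallState:
    """Answers to the TUPIP06 questions (hypothetical field names)."""
    defined_as_alt_device: bool
    device_ready: bool
    media_loaded: bool
    external_device: bool
    powered_on: bool
    properly_connected: bool
    power_failure_seen: bool
    self_test_passed: bool
    error_in_product_activity_log: bool


def tupip06_action(s: AltInstallState) -> str:
    """Return the action on which the TUPIP06 flow ends."""
    # Step 1: device definition and readiness
    if not s.defined_as_alt_device:
        return "correct device definition and retry"
    if not s.device_ready:
        return "make device ready and retry"
    # Step 2: installation media; internal devices skip to step 5
    if not s.media_loaded:
        return "load correct media and retry"
    if s.external_device:
        if s.powered_on:
            # Step 3: connection check
            if not s.properly_connected:
                return "correct connection and retry"
        else:
            # Step 4: power-on and power-on self-test
            if s.power_failure_seen:
                return "correct power problem and retry"
            if not s.self_test_passed:
                return "correct self-test problem and retry"
            return "retry"
    # Step 5: Product Activity Log
    if s.error_in_product_activity_log:
        return "look up device error in Reference Code Finder"
    return "contact next level of support"
```

For example, an internal device that is defined, ready, and loaded with media goes straight to the Product Activity Log check in step 5.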
Tape unit self-test procedure
This procedure allows you to quickly perform a complete set of diagnostic tests on a 6384 or 6387 tape
unit without impacting your system operation. This test can also be used to verify good performance of
individual tape cartridges.
Enter diagnostic mode:
1. Verify that a cartridge is not loaded in the tape unit. To unload a cartridge, press the eject button on
the front of the tape unit. If the cartridge does not eject, refer to (Tape unit - manual removal).
2. Press and hold the eject button for about 6 seconds until the amber LED starts flashing slowly, then
release the button. The amber (left) LED will flash, indicating that the tape unit is waiting for a
cartridge to be inserted.
Running the self-test
1. Self-testing begins when a scratch data cartridge is inserted into the tape unit. The Ready (left) LED
will flash, indicating that self-testing is in progress.
Note: A cartridge must be loaded within 15 seconds; otherwise, the tape unit automatically reverts
to normal operation. If necessary, return to step 1 to reenter diagnostic mode.
2. For fastest results, use an SLR100 Test Tape (P/N 35L0967), which was originally
provided with your System i server.
Attention: Use a blank cartridge that does not contain customer data. During this self-test, the
cartridge will be rewritten with a test pattern and any customer data will be destroyed.
Note: Use a cartridge that is not write-protected. If a write-protected cartridge is inserted while the
tape unit is in diagnostic mode, the cartridge will be ejected (see Incorrect cartridge below).
Self-testing will only be performed using a write-compatible cartridge type, and with a cartridge that
is not damaged (see Incorrect cartridge below).
If a cleaning cartridge is inserted while the tape unit is in diagnostic mode, drive cleaning will occur
and the tape unit will then return to normal operating mode. Return to step 1 to reenter diagnostic
mode.
3. At any time, self-testing can be stopped by pressing the eject button. After the current operation is
completed, the cartridge will be ejected and the tape unit will return to normal operating mode.
4. The Ready (left) LED will continue to flash during the following:
v The cartridge load sequence has an approximate duration of 30 seconds. The center LED indicates
tape movement.
v The hardware test has an approximate duration of 2 and 1/2 minutes. During that time, a static
test is performed on tape unit electrical components. No tape motion occurs during this step.
v The cartridge load/unload test has an approximate duration of 1 and 1/2 minutes. During that
time, the Ready LED will continue to flash while a dynamic test is performed on tape unit
mechanical components. Two cartridge load cycles are included.
v Duration of the write/read test will vary, depending on what type of cartridge is loaded into the
tape unit. When an SLR100 Test Tape is used, typical duration will be 5 minutes. Use of other
cartridge types can increase the write/read test duration to 30-40 minutes. During this test, the
Ready LED will continue to flash. The center LED indicates tape movement.
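As a rough planning aid, the approximate phase durations listed above can be added up. The sketch below assumes the phases run back to back with no extra overhead, which the text does not state; the constant and function names are illustrative only.

```python
from typing import Tuple

# Approximate phase durations in minutes, as listed in step 4.
LOAD_SEQUENCE = 0.5
HARDWARE_TEST = 2.5
LOAD_UNLOAD_TEST = 1.5
WRITE_READ_SLR100 = 5.0           # typical, with an SLR100 Test Tape
WRITE_READ_OTHER = (30.0, 40.0)   # range for other cartridge types


def estimated_self_test_minutes(uses_slr100_test_tape: bool) -> Tuple[float, float]:
    """Return a (low, high) estimate of the total self-test time in minutes."""
    fixed = LOAD_SEQUENCE + HARDWARE_TEST + LOAD_UNLOAD_TEST
    if uses_slr100_test_tape:
        total = fixed + WRITE_READ_SLR100
        return (total, total)
    low, high = WRITE_READ_OTHER
    return (fixed + low, fixed + high)
```

Under these assumptions, a run with the SLR100 Test Tape takes roughly 9.5 minutes, and a run with another cartridge type takes roughly 34.5 to 44.5 minutes.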
Interpreting the results
Test Passed: When self-testing has completed successfully, and no problems are detected, the cartridge is
unloaded from the tape unit and all LEDs are off. Proper function of both the tape unit and the tape
cartridge has now been verified.
Note: A solid amber light indicates that self-testing has completed successfully, but the tape unit requires
cleaning. Clean the tape unit by inserting a Dry Process Cleaning Cartridge (P/N 35L0844).
Test Failed: The cartridge will remain loaded inside the tape unit, and the amber LED will flash when a
problem is detected with either the tape unit or cartridge.
Note: To isolate failure to either tape unit or cartridge, return to step 1 on page 311 and repeat this
self-test using a different scratch cartridge.
Incorrect cartridge: When the center (green) and right (amber) LEDs flash and a cartridge is unloaded,
the tape unit has determined that an incorrect tape cartridge has been inserted, and self-testing cannot be
performed. Verify that your tape cartridge is not one of the following:
v Write-protected
v Damaged
v Unsupported media type
v Media that is not write-compatible with tape unit.
Press the eject button to end the self-test and return the tape unit to normal operating mode. Then return to
step 1 on page 311 and run the self-test using another cartridge, or one that is not write-protected. This
ends the procedure.
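The end states described under "Interpreting the results" can be summarized as a lookup on what the operator observes. The following is a minimal sketch; the enum, function name, and result strings are hypothetical, and any pattern not described in the text is reported as unrecognized instead of being guessed.

```python
from enum import Enum


class Led(Enum):
    OFF = "off"
    ON = "on"
    FLASHING = "flashing"


def interpret_self_test(cartridge_loaded: bool, ready: Led, center: Led, amber: Led) -> str:
    """Map the observed cartridge and LED state to a self-test result."""
    # Center and amber LEDs flash and the cartridge is unloaded.
    if not cartridge_loaded and center is Led.FLASHING and amber is Led.FLASHING:
        return "incorrect cartridge"
    # Cartridge stays loaded and the amber LED flashes.
    if cartridge_loaded and amber is Led.FLASHING:
        return "test failed"
    # Solid amber: the test passed, but the tape unit requires cleaning.
    if amber is Led.ON:
        return "test passed, cleaning required"
    # Cartridge unloaded and all LEDs off.
    if not cartridge_loaded and all(led is Led.OFF for led in (ready, center, amber)):
        return "test passed"
    return "unrecognized pattern"
```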
Tape device ready conditions
All the conditions that are listed for the device must be correct for the device to be ready.
If the device is not ready, use the Action column or other instructions, and go to the service information
for the specific tape device.
If the system has logical partitions, perform this procedure from the logical partition that reported the
problem (see Determining if the system has logical partitions).
Table 58. Tape device ready conditions
Storage device: 3480 or 3490
Ready description:
v Power switch is set to the On position.
v Power light is on.
v DC Power light is on.
v Control unit On-line switch is set to the On-line position.
v Control unit Normal/Test switch is set to the Normal position.
v Control unit channel Enable/Disable switch is set to the Enable position.
v Tape unit On-line/Off-line switch is set to the On-line position.
v Tape is loaded.
v Tape unit displays Ready U or Ready F.
Action: See the 3480 Magnetic Tape Subsystem Operator's Guide, SA32-0066, or 3490 Magnetic Tape
Subsystem Operator's Guide, SA32-0124, for instructions on making the tape unit ready.
Storage device: 9348
Ready description:
v Power switch is set to the On position.
v Power light is on.
v Tape is loaded.
v Status display shows 00 A002.
v On-line light is on.
Action: See the 9348 Customer Information manual, SA21-9567, for instructions on making the tape unit
ready. If you cannot make the tape unit ready, go to the "Analyzing Problems" section of 9348 Tape Unit
Service Information, SY31-0697.
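Because every listed condition must hold for a device to be ready, the table amounts to an all-conditions check. The sketch below encodes the conditions from Table 58 as a checklist; the dictionary layout and function names are illustrative and do not correspond to any system interface.

```python
from typing import Dict, List

# Ready conditions transcribed from Table 58.
READY_CONDITIONS: Dict[str, List[str]] = {
    "3480 or 3490": [
        "Power switch is set to the On position",
        "Power light is on",
        "DC Power light is on",
        "Control unit On-line switch is set to the On-line position",
        "Control unit Normal/Test switch is set to the Normal position",
        "Control unit channel Enable/Disable switch is set to the Enable position",
        "Tape unit On-line/Off-line switch is set to the On-line position",
        "Tape is loaded",
        "Tape unit displays Ready U or Ready F",
    ],
    "9348": [
        "Power switch is set to the On position",
        "Power light is on",
        "Tape is loaded",
        "Status display shows 00 A002",
        "On-line light is on",
    ],
}


def unmet_conditions(device: str, observed: Dict[str, bool]) -> List[str]:
    """List the conditions that were not confirmed; unrecorded ones count as unmet."""
    return [c for c in READY_CONDITIONS[device] if not observed.get(c, False)]


def is_ready(device: str, observed: Dict[str, bool]) -> bool:
    """A device is ready only when every listed condition is confirmed."""
    return not unmet_conditions(device, observed)
```

Any condition returned by `unmet_conditions` points back to the Action entry for that device.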
Twinaxial workstation I/O processor isolation procedure
Use the procedure below to isolate a failure which has been detected by the twinaxial workstation I/O
processor. If you are using a personal computer, an emulation program must be installed and working.
Read and observe all safety procedures before servicing the system and while performing the procedure
below.
Attention: Unless instructed otherwise, always power off the system or expansion unit where the FRU
is located (see Powering on and powering off the system) before removing, exchanging, or installing a
field-replaceable unit (FRU).
DANGER
When working on or around the system, observe the following precautions:
Electrical voltage and current from power, telephone, and communication cables are hazardous. To
avoid a shock hazard:
v Connect power to this unit only with the IBM provided power cord. Do not use the IBM
provided power cord for any other product.
v Do not open or service any power supply assembly.
v Do not connect or disconnect any cables or perform installation, maintenance, or reconfiguration
of this product during an electrical storm.
v The product might be equipped with multiple power cords. To remove all hazardous voltages,
disconnect all power cords.
v Connect all power cords to a properly wired and grounded electrical outlet. Ensure that the outlet
supplies proper voltage and phase rotation according to the system rating plate.
v Connect any equipment that will be attached to this product to properly wired outlets.
v When possible, use one hand only to connect or disconnect signal cables.
v Never turn on any equipment when there is evidence of fire, water, or structural damage.
v Disconnect the attached power cords, telecommunications systems, networks, and modems before
you open the device covers, unless instructed otherwise in the installation and configuration
procedures.
v Connect and disconnect cables as described in the following procedures when installing, moving,
or opening covers on this product or attached devices.
To Disconnect:
1. Turn off everything (unless instructed otherwise).
2. Remove the power cords from the outlets.
3. Remove the signal cables from the connectors.
4. Remove all cables from the devices.
To Connect:
1. Turn off everything (unless instructed otherwise).
2. Attach all cables to the devices.
3. Attach the signal cables to the connectors.
4. Attach the power cords to the outlets.
5. Turn on the devices.
(D005)
Attention: When instructed, remove and connect cables carefully. You may damage the connectors if
you use too much force.
TWSIP01
The workstation IOP detected an error.
If you need additional information for failing part numbers, location codes, or removal and replacement
procedures, see Part locations and location codes. Select your machine type and model number to find
additional location codes, part numbers, or replacement procedures for your system.
Read the danger notices in “Twinaxial workstation I/O processor isolation procedure” on page 313 before
performing this procedure.
One of the following occurred:
v All of the workstations on one port are not working.
v All of the workstations on the system are not working.
v One of the workstations on the system is not working.
v The reference code table instructed you to perform this procedure.
v The Remote Operations Console is not working.
1. If the system has logical partitions, perform this procedure from the logical partition that reported
the problem. To determine if the system has logical partitions, go to Determining if the system has
logical partitions before continuing with this procedure.
2. Are you using a workstation adapter console?
Note: A personal computer (used as a console) that is attached to the system by using a console
cable feature is known as a workstation adapter console. The cable (part number 46G0450, 46G0479,
or 44H7504) connects the system port on the personal computer to a communications I/O adapter on
the system.
No: Continue with the next step.
Yes: Go to “WSAIP01” on page 321. This ends the procedure.
3. Is the device you are attempting to repair a personal computer (PC)?
No: Continue with the next step.
Yes: PC emulation programs operate and report system-to-PC communications problems
differently. See the PC emulation information for details on error identification. Then, continue
with the next step.
4. Perform the following steps:
a. Verify that all the devices you are attempting to repair, the primary console, and any alternative
consoles are powered on.
b. Verify that all the devices you are attempting to repair, the primary console, and any
alternative consoles have an available status. For more information about displaying the device
status, see Hardware service manager.
c. Verify that the workstation addresses of all workstations on the failing port are correct. Each
workstation on the port must have a separate address, from 0 through 6. See the workstation
service information for details on how to check addresses.
d. Verify that the last workstation on the failing port is terminated. All other workstations on that
port must not be terminated.
e. Ensure that the cables that are attached to the device or devices are tight and are not visibly
damaged.
f. If there were any cable changes, check them carefully.
g. If all of the workstations on the system are not working, disconnect them by terminating at the
console.
h. Verify the device operation (see the device information for instructions).
i. The cursor position can assist in problem analysis.
v If the cursor is in the upper right corner, it indicates a communication problem between the
workstation IOP and the device. Continue with the next step.
v If the cursor is in the upper left corner, it indicates a communication problem between the
workstation IOP and the operating system. Perform the following steps:
1) Verify that all current PTFs are loaded.
2) Ask your next level of support for assistance. This ends the procedure.
5. Is the system powered off?
Yes: Continue with the next step.
No: Go to step 8 on page 316.
6. Perform the following steps:
a. Power on the system in Manual mode. See IPL type, mode, and speed options for details.
b. Wait for a display to appear on the console or a reference code to appear on the control panel.
Does a display appear on the console?
v No: Continue with the next step.
v Yes: If you disconnected any devices after the console in step 4 on page 315, perform the
following:
a. Power off the system.
b. Reconnect one device.
Note: Ensure that you terminate the device you just reconnected and remove the termination
from the device previously terminated.
c. Power on the system.
d. If a reference code appears on the control panel, go to step 9.
e. If no reference code appears, repeat steps a through d of this step until you have checked all
devices disconnected previously.
f. Continue to perform the initial program load (IPL). This ends the procedure.
7. Does the same reference code that sent you to this procedure appear on the control panel?
Yes: Continue with the next step.
No: Perform problem analysis for this new problem. This ends the procedure.
8. Perform the following steps to make DST available:
a. Ensure that Manual mode on the control panel is selected.
b. Select function 21 Make DST Available.
c. Check the console and any alternative consoles for a display.
Does a display appear on any of the console displays?
v No: Continue with the next step.
v Yes: If you disconnected any devices after the console in step 4 on page 315, perform the
following:
a. Power off the system.
b. Reconnect one device.
Note: Ensure that you terminate the device you just reconnected and remove the termination
from the previously terminated device.
c. Power on the system.
d. If a reference code appears on the control panel or on the management console, go to step 9.
e. If no reference code appears, repeat steps a through d of this step until you have checked all
devices disconnected previously.
f. Continue to perform the initial program load (IPL). This ends the procedure.
9. Ensure that the following conditions are met:
v The workstation addresses of all workstations on the failing port must be correct.
Each workstation on the port must have a separate address, from 0 through 6. See the workstation
service information if you need help with checking addresses.
Did you find a problem with any of the above conditions?
Yes: Continue with the next step.
No: Go to step 11 on page 317.
10. Perform the following steps:
a. Correct the problem.
b. Select function 21 Make DST Available.
c. Check the console and any alternative consoles for a display.
Does a display appear on any of the consoles?
v Yes: Continue to perform the IPL. This ends the procedure.
v No: Does the same reference code appear on the control panel?
Yes: Continue with the next step.
No: Perform problem analysis for this new problem. This ends the procedure.
11. Is the reference code one of the following: 0001, 0003, 0004, 0005, 0006, 0101, 0103, 0104, 0105, 0106,
5004, 5082, B000, D010, or D023?
No: Continue with the next step.
Yes: Go to step 15.
12. Does the system have an alternative console on a second workstation IOP?
Yes: Continue with the next step.
No: Go to step 14.
13. There is either a Licensed Internal Code problem, or there are two device failures on the workstation
IOPs, consoles, or cables. The console and any alternative consoles are the most probable causes for
this failure.
v See the service information for the failing display to attempt to correct the problem. If a display is
connected to the system by a link protocol converter, use the link protocol converter information
to attempt to correct the problem. The link protocol converter may be the failing item.
v If you have another working display, you can exchange the console and alternative consoles and
perform an IPL to attempt to correct the problem.
v Exchange the following parts one at a time until you determine the failing item:
a. Console
b. Alternative console
c. Cables
d. Workstation IOA for the console
e. The multi-adapter bridge. This ends the procedure.
14. The console, cables, or the workstation IOP card are the most probable causes for this failure. If the
console is connected to the system by a link protocol converter, the link protocol converter is
possibly the failing item. Use one or more of the following options to correct the problem:
a. See the service information for the failing displays for more information. If a display is connected
to the system by a link protocol converter, see the link protocol converter information to attempt
to correct the problem.
b. If you have another working display, you can exchange the console and perform an IPL to
attempt to correct the problem.
c. Exchange the following parts one at a time until you determine the failing item:
1) Console
2) Workstation IOA
3) The multi-adapter bridge.
4) Twinaxial attachment (cable) This ends the procedure.
15. To continue problem analysis, use a port tester, part 93X2040 or 59X4262. You may have one with
your tools, or the customer may have one. The port tester has either two or three lights.
Is a port tester available?
v Yes: Continue with the next step.
v No: Check or exchange the cables from the system to the failing display. Did this correct the
problem?
Yes: You corrected the problem. This ends the procedure.
No: Go to step 12.
16. To use the port tester to isolate the problem, perform the following steps:
v Verify that the port tester is operating correctly by doing a self-test. A self-test can be made at any
time, even when the port tester is attached to a port or cable. Perform the following steps to do a
self-test:
a. Move the selector switch to the center (0) position.
b. Push and hold the test button until all lights go on. The yellow lights should go on
immediately, and the green light should go on approximately 5 seconds later. The port tester is
ready for use if all lights go on.
v Leave the system power on.
17. Find the input cable to the failing console or port.
Is the failing console or the failing port attached to a protocol converter?
v No: Perform the following steps:
a. Disconnect the input cable from the failing console.
b. Connect the port tester to the input cable.
c. Continue with the next step.
v Yes: Perform the following steps:
a. Disconnect the cable that comes from the system at the protocol converter.
b. Connect the port tester to the cable.
c. Continue with the next step.
18. Perform the following steps:
a. Set the selector switch on the port tester to the left (1) position for a twinaxial connection. Set the
switch to the right (2) position for a twisted pair connection.
b. Press and hold the test switch on the port tester for 15 seconds and observe the lights.
c. Choose from the following options:
v If the port tester has three lights, do the following:
– If only the top (green) light is on, go to step 27 on page 319.
– If both the top (green) and center (yellow) lights are on, go to step 20.
Note: The center (yellow) light is always on for twisted pair cable and may be on for
fiber-optical cable.
– If only the bottom (yellow) light is on, go to step 21 on page 319.
– If all lights are off, go to step 22 on page 319.
– If all lights are on, go to step 19.
v If the port tester has two lights, do the following:
– If only the top (green) light is on, go to step 27 on page 319.
– If only the bottom (yellow) light is on, go to step 21 on page 319.
– If both lights are off, go to step 22 on page 319.
– If both lights are on, continue with the next step.
19. The tester is in the self-test mode. Check the position of the selector switch.
v If the selector switch is not in the correct position, go to step 18.
v If the selector switch is already in the correct position, the port tester is not working correctly.
Exchange the port tester, and go to step 16 on page 317.
20. The cable you are testing has an open shield.
Note: The open shield can be checked only on the cable from the twinaxial workstation attachment
to the device or from device to device. Only one section of cable can be checked at a time. See the
Port Tester Use information, SA41-3136.
This ends the procedure.
21. The cable network is bad. The wires in the cable between the console and the twinaxial workstation
attachment are reversed. Go to step 26.
22. Perform the following steps:
a. Find the twinaxial workstation attachment to which the failing console is attached.
b. Disconnect the cable from port 0 on that twinaxial workstation attachment.
c. Connect the port tester to port 0 on the attachment.
d. Set the selector switch on the port tester to the left (1) position.
23. Perform the following steps:
a. Press and hold the test switch on the port tester for 15 seconds and observe the lights.
b. If the port tester has three lights, do the following:
v If both the top (green) and center (yellow) lights are on, continue with step 24.
Note: The center (yellow) light is always on for twisted pair cable and may be on for
fiber-optical cable.
v If only the bottom (yellow) light is on, continue with step 24.
v If all lights are off, continue with step 24.
v If only the top (green) light is on, go to step 26.
v If all lights are on, go to step 25.
c. If the port tester has two lights, do the following:
v If only the top (green) light is on, go to step 26.
v If only the bottom (yellow) light is on, continue with step 24.
v If both lights are off, continue with step 24.
v If both lights are on, go to step 25.
24. The test indicated that there was no signal from the system. Reconnect the cable you disconnected
and perform the following steps:
a. Exchange the following parts:
1) Twinaxial workstation IOA card
2) The multi-adapter bridge.
b. Power on the system to perform an IPL. This ends the procedure.
25. The tester is in the self-test mode. Check the position of the selector switch:
v If the selector switch is not in the left (1) position, set the switch to the left (1) position. Then go to
step 23.
v If the selector switch is already in the left (1) position, the port tester is not working correctly.
Exchange the port tester and go to step 22.
26. The cable to the workstation is the failing item. Cable maintenance is a customer responsibility. The
cable must be repaired or exchanged. Then, power on the system to perform an IPL. This ends the
procedure.
27. The port tester detects most problems, but it does not always detect an intermittent problem or some
cable impedance problems. The tester may indicate a good condition, although there is a problem
with the workstation IOP card or cables.
a. If the failing display is connected to a link protocol converter, the link protocol converter is the
failing item. See the link protocol converter service information to correct the problem.
b. Exchange the following parts:
1) Console
2) Twinaxial workstation IOA
3) The multi-adapter bridge.
4) Cables
c. If you have another working display, you can exchange the console and perform an IPL to
attempt to correct the problem. See the service information for the failing display for more
information.
d. If exchanging the failing items did not correct the problem and the reference code was a 5002,
5082, or 50FF, there may be a Licensed Internal Code problem. Go to “LICIP03” on page 91.
e. The problem may be caused by devices that are attached after the console on port 0. This ends
the procedure.
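Two checks in TWSIP01 are rule-based enough to sketch in code: the address and termination rules in step 4, and the port tester light patterns in step 18. The example below is an illustration only; the function names and the tuple representation of workstations and lights are assumptions made for this sketch, and a light pattern that step 18 does not list returns None instead of a guessed step.

```python
from typing import List, Optional, Sequence, Tuple


def check_port(workstations: List[Tuple[int, bool]]) -> List[str]:
    """Check (address, terminated) pairs, given in cable order from the system."""
    problems = []
    addresses = [addr for addr, _ in workstations]
    for addr in addresses:
        if not 0 <= addr <= 6:
            problems.append(f"address {addr} is outside 0 through 6")
    if len(set(addresses)) != len(addresses):
        problems.append("two or more workstations share an address")
    for position, (_, terminated) in enumerate(workstations):
        is_last = position == len(workstations) - 1
        if is_last and not terminated:
            problems.append("last workstation on the port is not terminated")
        if not is_last and terminated:
            problems.append(f"workstation at position {position} is terminated but is not last")
    return problems


# Step 18 outcomes. Three-light testers: (top green, center yellow, bottom yellow).
STEP18_THREE_LIGHT = {
    (True, False, False): 27,
    (True, True, False): 20,
    (False, False, True): 21,
    (False, False, False): 22,
    (True, True, True): 19,
}
# Two-light testers: (top green, bottom yellow).
STEP18_TWO_LIGHT = {
    (True, False): 27,
    (False, True): 21,
    (False, False): 22,
    (True, True): 19,
}


def step18_next_step(lights: Sequence[bool]) -> Optional[int]:
    """Return the step that step 18 sends you to, or None if the pattern is not listed."""
    table = STEP18_THREE_LIGHT if len(lights) == 3 else STEP18_TWO_LIGHT
    return table.get(tuple(lights))
```

For instance, a port with workstations at addresses 0 and 1, where only the second is terminated, yields no problems, and a three-light tester showing only the top (green) light leads to step 27.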
Workstation adapter isolation procedure
Use this procedure to isolate a failure that is detected by the workstation adapter when no display is
available with which to perform on-line problem analysis.
The workstation adapter detected a problem while communicating with the workstation that is used as
the primary console.
Note: If you are using a PC, you must install an emulation program.
Read and observe all safety procedures before servicing the system and while performing the procedure
below.
Attention: Unless instructed otherwise, always power off the system or expansion unit where the FRU
is located (see Powering on and powering off the system) before removing, exchanging, or installing a
field-replaceable unit (FRU).
DANGER
When working on or around the system, observe the following precautions:
Electrical voltage and current from power, telephone, and communication cables are hazardous. To
avoid a shock hazard:
v Connect power to this unit only with the IBM provided power cord. Do not use the IBM
provided power cord for any other product.
v Do not open or service any power supply assembly.
v Do not connect or disconnect any cables or perform installation, maintenance, or reconfiguration
of this product during an electrical storm.
v The product might be equipped with multiple power cords. To remove all hazardous voltages,
disconnect all power cords.
v Connect all power cords to a properly wired and grounded electrical outlet. Ensure that the outlet
supplies proper voltage and phase rotation according to the system rating plate.
v Connect any equipment that will be attached to this product to properly wired outlets.
v When possible, use one hand only to connect or disconnect signal cables.
v Never turn on any equipment when there is evidence of fire, water, or structural damage.
v Disconnect the attached power cords, telecommunications systems, networks, and modems before
you open the device covers, unless instructed otherwise in the installation and configuration
procedures.
v Connect and disconnect cables as described in the following procedures when installing, moving,
or opening covers on this product or attached devices.
To Disconnect:
1. Turn off everything (unless instructed otherwise).
2. Remove the power cords from the outlets.
3. Remove the signal cables from the connectors.
4. Remove all cables from the devices.
To Connect:
1. Turn off everything (unless instructed otherwise).
2. Attach all cables to the devices.
3. Attach the signal cables to the connectors.
4. Attach the power cords to the outlets.
5. Turn on the devices.
(D005)
WSAIP01
Isolate a console keyboard error that contains a "K" on the display.
Note: If the console has a keyboard error, there may be a "K" on the display. See the workstation service
information for more information.
Perform the following procedure from the logical partition that reported the problem:
1. Select the icon on the workstation to make it the console (you may have already done this). You must
save the console selection.
2. Access Dedicated Service Tools (DST) by performing the following:
a. Select Manual mode on the control panel.
b. Use the selection switch on the control panel to display function 21, Make DST Available, and
press Enter on the control panel.
c. Wait for a display to appear on the console or for a reference code to appear on the control panel.
Does a display appear on the console?
No: Continue with the next step.
Yes: The problem is corrected. This ends the procedure.
3. Isolate the problem to one server and one workstation (console) by performing the following:
a. Disconnect the power cable from the workstation.
b. Eliminate all workstations, cables, and connector boxes from the network except for one server,
one console, two connector boxes, and one cable.
c. Ensure that the cables that are connected to the console, the keyboard, and the server are
connected correctly and are not damaged.
4. Perform the following steps:
a. Ensure that the server console is terminated correctly.
b. Set the Power switch on the console to the On position.
c. Select the SNA*PS icon on the console.
d. See the workstation information for more information.
5. Access DST by performing the following:
a. Select Manual mode on the control panel.
b. Use the selection switch on the control panel to display function 21, Make DST Available, and
press Enter on the control panel.
c. Wait for a display to appear on the console or for a reference code to appear on the control panel.
Does a display appear on the console?
No: Continue with the next step.
Yes: The problem is in a cable, connector box, or device you disconnected in step 3. This ends the
procedure.
6. Does the reference code A600 5005 appear on the control panel?
Yes: Continue with the next step.
No: Perform problem analysis using this reference code. This ends the procedure.
7. Do you have another workstation, cable, and two connector boxes you can exchange with the
workstation connected to the server?
v Yes: Continue with the next step.
v No: One of the following is causing the problem:
Note: The items at the top of the list have a higher probability of fixing the problem than the items
at the bottom of the list.
– Workstation adapter Licensed Internal Code
– Workstation adapter configuration
– Workstation
– Cable
– Connector box
– Workstation IOA
– Workstation IOP
If you still have not corrected the problem, ask your next level of support for assistance. This ends
the procedure.
8. Repeat steps 3 through 7 of this procedure, using a different workstation, cable, and connector boxes.
Do you still have a problem?
Yes: Continue with the next step.
No: The problem is in the cable, connector boxes, or workstation you disconnected. This ends the
procedure.
9. One of the following is causing the problem:
Note: The items at the top of the list have a higher probability of fixing the problem than the items at
the bottom of the list.
v Workstation adapter Licensed Internal Code
v Workstation adapter configuration
v Workstation IOA
v Communications IOP
To bring up a workstation other than the console, perform the following:
a. Connect another workstation into this network.
b. Select Normal mode on the control panel.
c. Perform an IPL.
If the sign-on display appears, the following parts are good:
v Communications IOP
v Workstation IOA
Note: If a printer connected to this assembly is not working correctly, it may look like the display is
bad. Perform a self-test on the printer to ensure that it prints correctly (see the printer service
information).
If you still have not corrected the problem, ask your next level of support for assistance. This ends
the procedure.
Workstation adapter console isolation procedure
Contains the procedure necessary to isolate a failure that is detected by the workstation adapter console.
Use this procedure when no display is available with which to perform online problem analysis.
Note: If you are using a PC, you must install an emulation program.
Read all safety procedures before servicing the system. Observe all safety procedures when performing a
procedure. Unless instructed otherwise, always power off the system or expansion unit where the FRU is
located (see Powering on and powering off the system) before removing, exchanging, or installing a
field-replaceable unit (FRU).
DANGER
When working on or around the system, observe the following precautions:
Electrical voltage and current from power, telephone, and communication cables are hazardous. To
avoid a shock hazard:
v Connect power to this unit only with the IBM provided power cord. Do not use the IBM
provided power cord for any other product.
v Do not open or service any power supply assembly.
v Do not connect or disconnect any cables or perform installation, maintenance, or reconfiguration
of this product during an electrical storm.
v The product might be equipped with multiple power cords. To remove all hazardous voltages,
disconnect all power cords.
v Connect all power cords to a properly wired and grounded electrical outlet. Ensure that the outlet
supplies proper voltage and phase rotation according to the system rating plate.
v Connect any equipment that will be attached to this product to properly wired outlets.
v When possible, use one hand only to connect or disconnect signal cables.
v Never turn on any equipment when there is evidence of fire, water, or structural damage.
v Disconnect the attached power cords, telecommunications systems, networks, and modems before
you open the device covers, unless instructed otherwise in the installation and configuration
procedures.
v Connect and disconnect cables as described in the following procedures when installing, moving,
or opening covers on this product or attached devices.
To Disconnect:
1. Turn off everything (unless instructed otherwise).
2. Remove the power cords from the outlets.
3. Remove the signal cables from the connectors.
4. Remove all cables from the devices.
To Connect:
1. Turn off everything (unless instructed otherwise).
2. Attach all cables to the devices.
3. Attach the signal cables to the connectors.
4. Attach the power cords to the outlets.
5. Turn on the devices.
(D005)
Read and understand the following service procedures before using this section:
v Powering on and powering off the system
v Primary consoles or alternative consoles
v System FRU locations
Note: If the console has a keyboard error, there may be a K on the display. See the workstation service
information for more information.
1. If the system has logical partitions, perform this procedure from the logical partition that reported the
problem. To determine if the system has logical partitions, go to Determining if the system has logical
partitions.
2. Ensure that your workstation meets the following conditions:
v The workstation that you are using for the console is powered on.
v The emulation program is installed and is working.
v The input/output adapter (IOA) is installed and the workstation console cable is attached.
Notes:
a. Card information: Hardware that is associated with 6A59 feature is the type 2745 card.
b. Cable information: The cable attaches directly to the IOA.
Did you find a problem with any of the conditions listed above?
No: Continue with the next step.
Yes: Correct the problem. Then, perform an IPL of the system. This ends the procedure.
3. Perform the following steps to make dedicated service tool (DST) available:
a. If there is an alternative console, ensure that it is powered on.
b. Ensure that Manual mode on the control panel is selected.
c. Select function 21, Make DST Available on the control panel, and press Enter.
Does a display appear on either the console or any alternative console?
No: Continue with the next step.
Yes: Complete the IPL. When the operating system display appears, use the Work with Problem
command (WRKPRB) or Analyze Problem command (ANZPRB) to analyze and correct or report any
console problems. This ends the procedure.
4. Do you have SRC A600 5001, A600 5004, A600 5007, or B075 xxxx (where xxxx is any value)?
v No: Continue with the next step.
v Yes: Perform the following steps:
a. Disconnect any cables that are attached to the IOA.
b. Install the wrap plug on the IOA. The 2745 wrap plug label is QQ.
c. Perform an IPL in Manual mode.
5. Does SRC 6A59 5007 occur?
v No: Continue with the next step.
v Yes: One of the following is causing the problem:
– Workstation emulation program
– Workstation
– Workstation console cable
This ends the procedure.
6. Did SRC A600 5001, A600 5004, or 6A59 5008 occur?
No: This is a new problem. Use the new reference code to perform problem analysis for the
problem, or ask your next level of support for assistance. This ends the procedure.
Yes: The Type 2745 workstation adapter is the failing item. This ends the procedure.
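The SRC decisions in steps 5 and 6 can be summarized as a small lookup. The following Python sketch is illustrative only and is not part of any IBM tool; the function name and the returned strings are hypothetical, and the SRC values are those listed in the steps above.

```python
def classify_console_src(src: str) -> str:
    """Return the conclusion of steps 5 and 6 for the SRC seen after the wrap-plug IPL."""
    # Normalize spacing and case so "6a59  5007" and "6A59 5007" compare equal.
    normalized = " ".join(src.upper().split())

    if normalized == "6A59 5007":
        # Step 5, Yes.
        return "suspect emulation program, workstation, or console cable"
    if normalized in {"A600 5001", "A600 5004", "6A59 5008"}:
        # Step 6, Yes.
        return "Type 2745 workstation adapter is the failing item"
    # Step 6, No: any other reference code is treated as a new problem.
    return "new problem: perform problem analysis with this reference code"
```

Any SRC that is not listed in steps 5 and 6 falls through to the "new problem" result, which matches the No branch of step 6.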
Isolating problems on servers that run AIX or Linux
Use the procedures for AIX or Linux servers if there is not a management console attached to the server.
If the server is connected to a management console, use the procedures that are available on the
management console to continue FRU isolation.
MAP 0210: General problem resolution
Replace the FRUs in the failing item list one at a time in the order that they are listed.
If you need additional information for failing part numbers, location codes, or removal and replacement
procedures, see Part locations and location codes. Select your machine type and model number to find
additional location codes, part numbers, or replacement procedures for your system.
Problems with loading and starting the operating system (AIX and Linux)
If the system is running partitions from partition standby (LPAR), the following procedure addresses the
problem in which one partition will not boot AIX or Linux while other partitions boot successfully and
run the operating system successfully.
It is the customer's responsibility to move devices between partitions. If a device must be moved to
another partition to run standalone diagnostics, contact the customer or system administrator. If the
optical drive must be moved to another partition, all SCSI devices connected to that SCSI adapter must
be moved because moves are done at the slot level, not at the device level.
Depending on the boot device, a checkpoint may be displayed on the operator panel for an extended
period of time while the boot image is retrieved from the device. This is particularly true for tape and
network boot attempts. If booting from an optical drive or tape drive, watch for activity on the drive's
LED indicator. A flashing LED indicates that the loading of either the boot image or additional
information required by the operating system being booted is still in progress. If the checkpoint is
displayed for an extended period of time and the drive LED is not indicating any activity, there might be
a problem loading the boot image from the device.
Notes:
1. For network boot attempts, if the system is not connected to an active network or if the target server
is inaccessible (which can also result from incorrect IP parameters being supplied), the system will
still attempt to boot. Because time-out durations are necessarily long to accommodate retries, the
system may appear to be hung. Refer to checkpoint CA00 E174.
2. If the partition hangs with a 4-character checkpoint in the display, the partition must be deactivated,
then reactivated before attempting to reboot.
3. If a BA06 000x error code is reported, the partition is already deactivated and in the error state.
Reboot by activating the partition. If the reboot is still not successful, go to step 3.
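Because incorrect IP parameters can make a network boot appear to hang (see note 1), it can help to check the values for internal consistency before retrying. The following Python sketch is one possible offline check, not a firmware or SMS utility. It assumes the boot parameters consist of a client address, a server address, a gateway address, and a subnet mask, and that an unused gateway is entered as 0.0.0.0; the function name and messages are hypothetical.

```python
import ipaddress


def check_boot_ip_parameters(client_ip, server_ip, gateway_ip, subnet_mask):
    """Return a list of problems found; an empty list means no inconsistency was detected.

    Malformed addresses or masks raise ValueError from the ipaddress module.
    """
    problems = []
    client = ipaddress.ip_address(client_ip)
    server = ipaddress.ip_address(server_ip)
    gateway = ipaddress.ip_address(gateway_ip)
    # strict=False derives the subnet from a host address plus its mask.
    subnet = ipaddress.ip_network(f"{client_ip}/{subnet_mask}", strict=False)

    if client == server:
        problems.append("client and server addresses are identical")
    if subnet.prefixlen < 31 and client in (subnet.network_address, subnet.broadcast_address):
        problems.append("client address is the network or broadcast address")
    if server not in subnet:
        # A server on another subnet is reachable only through a gateway
        # that is itself on the client's subnet.
        if gateway.is_unspecified:
            problems.append("server is on another subnet but no gateway is set")
        elif gateway not in subnet:
            problems.append("gateway is not on the client's subnet")
    return problems
```

An empty result does not prove that the parameters are correct; it only rules out the inconsistencies listed in the code. The network administrator still needs to confirm the server configuration.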
This procedure assumes that a diagnostic CD-ROM and an optical drive from which it can be booted are
available, or that diagnostics can be run from a NIM (Network Installation Management) server. Booting
the diagnostic image from an optical drive or a NIM server is referred to as running standalone
diagnostics.
1. Is a management console attached to the managed system?
Yes: Continue with the next step.
No: Go to step 3.
2. Look at the service action event error log on the management console. Perform the actions necessary
to resolve any open entries that affect devices in the boot path of the partition or that indicate
problems with I/O cabling. Then try to reboot the partition. Does the partition reboot successfully?
Yes: This ends the procedure.
No: Continue with the next step.
3. Boot to the SMS main menu:
v If you are rebooting a partition from partition standby (LPAR), go to the properties of the partition
and select Boot to SMS, then activate the partition.
v If you are rebooting from platform standby, access the ASMI. See Accessing the Advanced System
Management Interface using a Web browser. Select Power/Restart Control, then Power On/Off
System. In the AIX/Linux partition mode boot box, select Boot to SMS menu > Save Settings
and Power On.
At the SMS main menu, select Select Boot Options and verify whether the intended boot device is
correctly specified in the boot list. Is the intended load device correctly specified in the boot list?
v Yes: Perform the following steps:
a. Remove all removable media from devices in the boot list from which you do not want to load
the operating system.
b. If you are attempting to load the operating system from a network, go to step 4.
c. If you are attempting to load the operating system from a disk drive or an optical drive, go to
step 7.
v No: Go to step 5.
4. If you are attempting to load the operating system from the network, perform the following steps:
v Verify that the IP parameters are correct.
v Use the SMS ping utility to attempt to ping the target server. If the ping is not successful, have the
network administrator verify the server configuration for this client.
v Check with the network administrator to ensure that the network is up. Also ask the network
administrator to verify the settings on the server from which you are trying to load the operating
system.
v Check the network cabling to the adapter.
Restart the partition and try loading the operating system. Does the operating system load
successfully?
Yes: This ends the procedure.
No: Go to step 7.
5. Use the SMS menus to add the intended boot device to the boot sequence. Can you add the device
to the boot sequence?
Yes: Restart the partition. This ends the procedure.
No: Continue with the next step.
6. Ask the customer or system administrator to verify that the device you are trying to load from is
assigned to the correct partition. Then select List All Devices and record the list of bootable devices
that displays. Is the device from which you want to load the operating system in the list?
Yes: Go to step 7.
No: Go to step 10.
7. Try to load and run standalone diagnostics against the devices in the partition, particularly against
the boot device from which you want to load the operating system. You can run standalone
diagnostics from an optical drive or a NIM server. To boot standalone diagnostics, follow the
detailed procedures in Running the online and stand-alone diagnostics.
Note: When attempting to load diagnostics on a partition from partition standby, the device from
which you are loading standalone diagnostics must be made available to the partition that is not
able to load the operating system, if it is not already in that partition. Contact the customer or
system administrator if a device must be moved between partitions in order to load standalone
diagnostics.
Did standalone diagnostics load and start successfully?
Yes: Go to step 8.
No: Go to step 14 on page 328.
8. Was the intended boot device present in the output of the Display Configuration and Resource
List option, which is run from the Task Selection menu?
v Yes: Continue with the next step.
v No: Go to step 10.
9. Did running diagnostics against the intended boot device result in a No Trouble Found message?
Yes: Go to step 12 on page 328.
No: Go to the list of service request numbers and perform the repair actions for the SRN reported
by the diagnostics. When you have completed the repair actions, go to step 13 on page 328.
10. Perform the following actions:
a. Perform the first item in the action list below. In the list of actions below, choose SCSI or IDE
based on the type of device from which you are trying to boot the operating system.
b. Restart the system or partition.
c. Stop at the SMS menus and select Select Boot Options.
d. Is the device that was not appearing previously in the boot list now present?
Yes: Go to Verify a repair. This ends the procedure.
No: Perform the next item in the action list and then return to step 10b. If there are no more
items in the action list, go to step 11.
Action list:
Note: See System FRU locations for part numbers and links to exchange procedures.
a. Verify that the SCSI or IDE cables are properly connected. Also verify that the device
configuration and address jumpers are set correctly.
b. Do one of the following:
v SCSI boot device: If you are attempting to boot from a SCSI device, remove all hot-swap disk
drives (except the intended boot device, if the boot device is a hot-swap drive). If the boot
device is present in the boot list after you boot the system to the SMS menus, add the
hot-swap disk drives back in one at a time, until you isolate the failing device.
v IDE boot device: If you are attempting to boot from an IDE device, disconnect all other
internal SCSI or IDE devices. If the boot device is present in the boot list after you boot the
system to the SMS menus, reconnect the internal SCSI or IDE devices one at a time, until you
isolate the failing device or cable.
c. Replace the SCSI or IDE cables.
d. Replace the SCSI backplane (or IDE backplane, if present) to which the boot device is connected.
e. Replace the intended boot device.
f. Replace the system backplane.
11. Choose from the following:
v If the intended boot device is not listed, go to “PFW1548: Memory and processor subsystem
problem isolation procedure” on page 423. This ends the procedure.
v If an SRN is reported by the diagnostics, go to the list of service request numbers and follow the
action listed. This ends the procedure.
12. Have you disconnected any other devices?
Yes: Reinstall each device that you disconnected, one at a time. After you reinstall each device,
reboot the system. Continue this procedure until you isolate the failing device. Replace the failing
device, then go to step 13.
No: Perform an operating system-specific recovery process or reinstall the operating system. This
ends the procedure.
13. Is the problem corrected?
Yes: Go to Verify a repair. This ends the procedure.
No: If replacing the indicated FRUs did not correct the problem, or if the previous steps did not
address your situation, go to “PFW1548: Memory and processor subsystem problem isolation
procedure” on page 423. This ends the procedure.
14. Is a SCSI boot failure (where you cannot boot from a SCSI-attached device) also occurring?
v Yes: Go to “PFW1548: Memory and processor subsystem problem isolation procedure” on page
423. This ends the procedure.
v No: Continue to the next step.
15. Perform the following actions to determine if another adapter is causing the problem:
a. Remove all adapters except the one to which the optical drive is attached and the one used for
the console.
b. Reload the standalone diagnostics. Can you successfully reload the standalone diagnostics?
v Yes: Perform the following steps:
1) Reinstall the adapters that you removed (and attach devices as applicable) one at a time.
After you reinstall each adapter, retry the boot operation until the problem reoccurs.
2) Replace the adapter or device that caused the problem.
3) Go to Verify a repair. This ends the procedure.
v No: Continue with the next step.
16. The graphics adapter (if installed), optical drive, IDE or SCSI cable, or system board is most likely
defective. Does your system have a PCI graphics adapter installed?
Yes: Continue with the next step.
No: Go to step 18.
17. Perform the following steps to determine if the graphics adapter is causing the problem:
a. Remove the graphics adapter.
b. Attach a TTY terminal to the system port.
c. Try to reload standalone diagnostics. Do the standalone diagnostics load successfully?
Yes: Replace the graphics adapter. This ends the procedure.
No: Continue with the next step.
18. Replace the following (if not already replaced), one at a time, until the problem is resolved:
a. Optical drive
b. IDE or SCSI cable that goes to the optical drive
c. System board that contains the integrated SCSI or IDE adapters.
If this resolves the problem, go to Verify a repair. If the problem still persists or if the previous
descriptions did not address your particular situation, go to “PFW1548: Memory and processor
subsystem problem isolation procedure” on page 423.
This ends the procedure.
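Steps 12 and 15 use the same isolation technique: remove components, confirm that the system boots, then reinstall the components one at a time until the failure returns. A minimal Python sketch of that loop follows. The `boots_ok` callback is hypothetical and stands in for the manual reboot test that the technician performs; nothing here drives real hardware.

```python
def isolate_failing_component(removed, boots_ok):
    """Reinstall components one at a time and return the first one whose
    reinstallation makes the boot test fail, or None if the boot never fails."""
    installed = []
    for component in removed:
        installed.append(component)        # reinstall one adapter or device
        if not boots_ok(installed):        # retry the boot operation
            return component               # the problem recurred with this one
    return None
```

For example, if `boots_ok` reports a failure as soon as a component named "adapter B" is present, the function returns "adapter B" for the input `["adapter A", "adapter B", "adapter C"]`. The sketch assumes a single failing component; if the boot fails even with everything removed, the loop does not apply and the procedure continues with its later steps.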
SCSI service hints
Use one or more of the following procedures when servicing SCSI adapters or devices.
General SCSI configuration checks
With any type of SCSI problem, begin with the following steps:
1. Verify that all SCSI devices on the SCSI bus have a unique address.
2. Verify that all cables are connected securely and that there is proper termination at both ends of the
SCSI bus.
3. Verify that the cabling configuration does not exceed the maximum cable length for the adapter in
use.
4. Verify that the adapters and devices that you are working with are at the appropriate microcode
levels for the customer situation. If you need assistance with microcode issues, contact your service
support structure.
5. If there are multiple SCSI adapters on the SCSI bus, verify that the customer is using the appropriate
software to support such an arrangement. If the correct software is not in use, some SCSI errors
should be expected when multiple adapters attempt to access the same SCSI device. Also, each
adapter should have a unique address.
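The unique-address checks in steps 1 and 5 amount to looking for duplicate IDs on one bus. The following Python sketch shows one way to do that from a hand-recorded inventory. The device names and IDs in the usage note are invented for illustration, and the function is hypothetical, not an AIX or diagnostics command.

```python
from collections import defaultdict


def find_scsi_id_conflicts(bus_devices):
    """bus_devices maps a device or adapter name to its SCSI ID on a single bus.

    Returns a dict of {scsi_id: [names]} for every ID used more than once.
    """
    names_by_id = defaultdict(list)
    for name, scsi_id in bus_devices.items():
        names_by_id[scsi_id].append(name)
    # Keep only the IDs that are claimed by two or more devices.
    return {scsi_id: sorted(names)
            for scsi_id, names in names_by_id.items()
            if len(names) > 1}
```

For an inventory such as `{"adapter-1": 7, "adapter-2": 7, "disk-1": 0}`, the function returns `{7: ["adapter-1", "adapter-2"]}`, which flags two adapters sharing one address on the same bus.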
High availability or multiple SCSI system checks
If you have a high-availability configuration, or if more than one system is attached to the same SCSI
bus, do the following:
1. Verify that the adapters and devices have unique SCSI addresses. The default SCSI adapter address is
always 7. If you have more than one adapter on the bus, change the address of at least one by using
SMIT (SMIT Devices > SCSI Adapter > Change/Show characteristics of an adapter). You must make
the changes to the database only, then reboot the system in order for the change to take effect.
Note: Diagnostics defaults to using ID 7 (do not use this ID in high availability configurations).
2. If RAID devices such as the 7135 or 7137 are attached, run the appropriate diagnostics for the device.
If problems occur, contact your service support structure for assistance. If the diagnostics are run
incorrectly on these devices, misleading SRNs can result.
3. Diagnostics cannot be run against OEM devices; doing so results in misleading SRNs.
4. Verify that all cables are connected securely and that both ends of the SCSI bus are terminated correctly.
5. Verify that the cabling configuration does not exceed the maximum cable length for the adapter in
use. See the SCSI Cabling section in the RS/6000® eServer™ pSeries Adapters, Devices, and Cable
Information for Multiple Bus Systems for more details on SCSI cabling issues.
6. Verify that the adapter and devices are at the appropriate microcode levels for the customer situation.
If you need assistance with microcode issues, contact your service support structure.
SCSI-2 single-ended adapter PTC failure isolation procedure
Before replacing a SCSI-2 single-ended adapter, use these procedures to determine if a short-circuit
condition exists on the SCSI bus. The same positive temperature coefficient (PTC) resistor is used for both
the internal and external buses. The PTC protects the SCSI bus from high currents due to short-circuiting
on the cable, terminator, or device. It is unlikely that the PTC can be tripped by a defective adapter.
Unless instructed to do so by these procedures, do not replace the adapter because of a tripped PTC
resistor.
A fault (short-circuit) causes an increase in PTC resistance and temperature. The increase in resistance
causes the PTC to halt current flow. The PTC returns to a low resistive and low temperature state when
the fault is removed from the SCSI bus or when the system is turned off. Wait 5 minutes for the PTC
resistor to fully cool, then retest.
These procedures determine if the PTC resistor is still tripped and then determine if there is a
short-circuit somewhere on the SCSI bus.
Determining where to start
Use the following steps to determine the adapter configuration and select the appropriate procedure:
v If there are external cables attached to the adapter, start with the External Bus PTC Isolation Procedure
for your type of adapter.
v If there are no external cables attached, start with the “Internal SCSI-2 single-ended bus PTC isolation
procedure” on page 331.
v If there is a combination of external and internal cables start with the External Bus PTC Isolation
Procedure for your type of adapter. If this procedure does not resolve the problem, continue with the
Internal Bus PTC Isolation Procedure for your type of adapter.
External SCSI-2 single-ended bus PTC isolation procedure
Isolate the external SCSI bus PTC fault with the following procedure:
Note: The external bus is of single-ended design.
1. Ensure that system power and all externally attached device power are turned off. All testing is
accomplished with the power off.
2. Disconnect any internal and external cables from the adapter and remove the adapter from the
system.
3. Verify with a digital ohmmeter that the internal PTC resistor, labeled Z1, (see the illustration after
Internal SCSI-2 Single-Ended Bus PTC Isolation Procedure, step 3 on page 332) is cool and in a low
resistance state, typically less than 1/2 Ohm. Measuring across, be sure to probe both sides of the PTC
where the solder joints and board come together. The polarity of the test leads is not important. If
necessary, allow the PTC resistor to cool and measure again.
4. Locate Capacitor C1 and measure the resistance across it by using the following procedure:
a. Connect the positive lead to the side of the capacitor where the + is indicated on the board near
C1. Be sure to probe at the solder joint where the capacitor and board come together.
b. Connect the negative lead to the opposite side of the capacitor marked GND. Be sure to probe at
the solder joint where the capacitor and board come together.
c. If there is no short-circuit present, then the resistance reading is high, typically hundreds of Ohms.
Note: Because this is a measurement across unpowered silicon devices, the reading is a function of
the ohmmeter used.
v If there is a fault, the resistance reading is low, typically below 10 Ohms. Because there are no
cables attached, the fault is on the adapter. Replace the adapter.
Note: Some multi-function meters label the leads specifically for voltage measurements. When
using this type of meter to measure resistance, the plus lead and negative lead may not be labeled
correctly. If you are not sure that your meter leads accurately reflect the polarity for measuring
resistance, repeat this step with the leads reversed. If the short-circuit is not indicated with the
leads reversed, the SCSI bus is not faulted (short-circuited).
v If the resistance measured was high, proceed to the next step.
5. Reattach the external cable to the adapter, then do the following:
a. Measure across C1 as previously described.
b. If the resistance is still high, in this case above 10 Ohms, then there is no apparent cause for a PTC
failure from this bus. If there are internal cables attached, continue to the “Internal SCSI-2
single-ended bus PTC isolation procedure.”
c. If the resistance is less than 10 Ohms, there is a possibility of a fault on the external SCSI bus.
Troubleshoot the external SCSI bus by disconnecting devices and terminators. Measure across C1
to determine whether the fault has been removed. Replace the failing component. Go to Verify a
repair.
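The resistance readings in steps 3 through 5 can be interpreted with a few comparisons. The following Python sketch encodes the typical values quoted above (less than 1/2 Ohm across Z1, and 10 Ohms as the dividing line across C1). The function names are hypothetical, and the handling of a reading of exactly 10 Ohms is an assumption, because the procedure does not say how to treat it.

```python
def ptc_is_reset(z1_ohms: float) -> bool:
    """Step 3: the PTC resistor Z1 is cool when it reads less than 1/2 Ohm."""
    return 0 <= z1_ohms < 0.5


def interpret_c1_reading(c1_ohms: float, cable_attached: bool,
                         threshold_ohms: float = 10.0) -> str:
    """Steps 4 and 5: interpret the resistance measured across capacitor C1."""
    if c1_ohms < 0:
        raise ValueError("resistance cannot be negative")
    if c1_ohms < threshold_ohms:
        if cable_attached:
            # Step 5c: troubleshoot by disconnecting devices and terminators.
            return "possible fault on the external SCSI bus"
        # Step 4: with no cables attached, a low reading points at the adapter.
        return "fault is on the adapter"
    if c1_ohms > threshold_ohms:
        return "no short-circuit indicated"
    # Assumption: the procedure does not define a reading of exactly the threshold.
    return "borderline reading: measure again"
```

A low reading should still be confirmed with the meter leads reversed when the lead polarity is in doubt, as the note in step 4 describes; the sketch does not model that second measurement.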
External SCSI-2 single-ended bus probable tripped PTC causes
The following list provides some suggestions of things to check when the PTC is tripped:
v A short-circuited terminator or cable. Check for bent pins on each connector and removable terminator.
v Intermittent PTC failures can be caused by improperly seated cable connectors. Reseat the connector
and flex the cable in an attempt to duplicate the fault condition across C1.
v Plugging or unplugging a cable or terminator while the system is turned on (hot plugging).
v A short-circuited device.
v Differential devices or terminators are attached to the single-ended SCSI bus.
Note: The SCSI-2 Fast/Wide and Ultra PCI Adapters use an onboard electronic terminator on the
external SCSI bus. When power is removed from the adapter, as in the case of this procedure, the
terminator goes to a high impedance state and the resistance measured cannot be verified, other than it
is high. Some external terminators use an electronic terminator, which also goes to a high impedance
state when power is removed. Therefore, this procedure is designed to find a short-circuited or low
resistance fault as opposed to the presence of a terminator or a missing terminator.
Internal SCSI-2 single-ended bus PTC isolation procedure
Isolate the internal SCSI bus PTC resistor fault with the following procedure:
Note: The internal bus is single-ended.
1. Ensure that system power and all externally attached device power are turned off.
2. Disconnect any internal and external cables from the adapter, then remove the adapter from the
system.
3. Verify with a digital ohmmeter, that the internal PTC resistor, labeled Z1, is cool and in a low
resistance state, typically less than 1/2 Ohm. Measuring across, be sure to probe both sides of the PTC
where the solder joints and board come together. The polarity of the test leads is not important. If
necessary, allow the PTC to cool and measure again. See the following illustration.
Note: Ensure that only the probe tips are touching the solder joints. Do not allow the probes to touch
any other part of the component.
4. Locate capacitor C1 and measure the resistance across it using the following procedure:
a. Connect the positive lead to the side of the capacitor where the + is indicated. Be sure to probe at
the solder joint where the capacitor and board come together.
b. Connect the negative lead to the opposite side of the capacitor. Be sure to probe at the solder joint
where the capacitor and board come together.
c. If there is no short-circuit present, the resistance reading is high, typically hundreds of Ohms.
Note: Because this is a measurement across unpowered silicon devices, the reading is a function of
the ohmmeter used.
v If there is a fault, the resistance reading is low, typically below 10 Ohms. Because there are no
cables attached, the fault is on the adapter. Replace the adapter.
Note: Some multi-function meters label the leads specifically for voltage measurements. When
using this type of meter to measure resistance, the plus lead and negative lead may not be labeled
correctly. If you are not sure that your meter leads accurately reflect the polarity for measuring
resistance, repeat this step with the leads reversed. Polarity is important in this measurement to
prevent forward-biasing diodes, which lead to a false low resistance reading. If the short circuit is
not indicated with the leads reversed, the SCSI bus is not faulted (short-circuited).
v If the resistance is high and there is no internal cable to reattach, there is no apparent cause for the
PTC resistor diagnostic failure.
v If the resistance is high and there is an internal cable to reattach, proceed to the next step.
5. Reattach the internal cable to the adapter, then do the following:
a. Measure across C1 as described previously.
b. If the resistance is still high, above 25 Ohms, there is no apparent cause for a PTC failure.
c. If the resistance is less than 10 Ohms, a fault on the internal SCSI bus is possible. Troubleshoot the
internal SCSI bus by disconnecting devices and terminators. Measure across C1 to determine if the
fault has been removed.
Note: Some internal cables have nonremovable terminators.
Internal SCSI-2 single-ended bus probable tripped PTC resistor causes
The following list provides some suggestions of things to check when the PTC is tripped:
v A short-circuited terminator or cable. Check for bent pins on each connector and removable terminator.
v Intermittent PTC failures can be caused by incorrectly seated cable connectors. Reseat the connector
and flex the cable in an attempt to duplicate the fault condition across C1.
v A short-circuited device.
v On some systems, the terminator is fixed to the internal cable and cannot be removed. If all devices are
removed from the cable and the resistance is still low, then the cable should be replaced.
Note: The SCSI-2 Fast/Wide and Ultra PCI adapters use an onboard electronic terminator on the
internal SCSI bus. When power is removed from the adapter, as in the case of this procedure, the
terminator goes to a high impedance state and the resistance measured cannot be verified, other than it
is high. Some internal terminators use an electronic terminator, which also goes to a high impedance
state when power is removed. Therefore, this procedure is designed to find a short-circuit or low
resistance fault as opposed to the presence of a terminator or a missing terminator.
SCSI-2 differential adapter PTC failure isolation procedure
Use this procedure when SRN xxx-240 or xxx-800 has been indicated.
The differential adapter can be identified by the 4-B or 4-L on the external bracket plate.
Before replacing a SCSI-2 differential adapter, use these procedures to determine if a short-circuit
condition exists on the SCSI Bus. The PTC protects the SCSI bus from high currents due to short-circuits
on the cable, terminator, or device. It is unlikely that the PTC can be tripped by a defective adapter.
Unless instructed to do so by these procedures, do not replace the adapter because of a tripped PTC
resistor.
A fault (short-circuit) causes an increase in PTC resistance and temperature. The increase in resistance
causes the PTC to halt current flow. The PTC returns to a low resistive and low temperature state when
the fault is removed from the SCSI bus or when the system is turned off. Wait 5 minutes for the PTC
resistor to fully cool, then retest.
These procedures determine if the PTC resistor is still tripped and then determine if there is a
short-circuit somewhere on the SCSI bus.
External SCSI-2 differential adapter bus PTC isolation procedure
Isolate the external SCSI bus PTC fault with the following procedure:
Notes:
1. Ensure that only the probe tips are touching the solder joints. Do not allow the probes to touch any
other part of the component.
2. The external bus is differential.
1. Ensure that system power and all externally attached device power is turned off.
2. Check to ensure all devices are marked SCSI Differential and that the terminator on the end of the
SCSI bus is also marked differential. If not, you may have a single-ended SCSI device or terminator
on the differential SCSI bus. Single-ended devices do not work on a differential SCSI bus and may
cause a PTC type error to be reported. The entire SCSI bus may appear to be intermittent. After
ensuring the system is completely differential, continue.
3. Disconnect the external cables from the adapter and remove the adapter from the system.
4. Verify with a digital ohmmeter that the internal PTC resistor, labeled Z1, is cool and in a low
resistance state, typically less than 1/2 Ohm. See the following illustration. When measuring across
the PTC resistor, be sure to probe both sides where the solder joints and board come together. The
polarity of the test leads is not important. If necessary, allow the PTC resistor to cool and measure again.
5. Locate capacitor C1 and measure the resistance across it using the following procedure:
a. Connect the negative lead to the side of the capacitor marked GND. Be sure to probe at the solder
joint where the capacitor and board come together.
b. Connect the positive lead to the side of the capacitor marked Cathode D1 on the board near C1. Be
sure to probe at the solder joint where the capacitor and board come together.
v If there is no fault present, then the resistance reading is 25 to 35 Ohms. The adapter is not
faulty. Continue to the next step.
v If the resistance measured is higher than 35 Ohms, check to see if RN1, RN2, and RN3 are
plugged into their sockets. If these sockets are empty, you are working with a Multi-Initiators or
High-Availability system. With these sockets empty, a resistive reading across C1 cannot be
verified other than it measures a high resistance (not a short-circuit). If the resistance
measurement is not low enough to be suspected as a fault (lower than 10 Ohms), continue to
the next step.
v If the resistance is high and there is no external cable to reattach, there is no apparent cause for
the PTC diagnostic failure.
v If the resistance reading is low, typically below 10 Ohms, there is a fault. Because there are no
cables attached, the fault is on the adapter. Replace the adapter.
v If the resistance measured was high and there is an external cable to reattach, proceed to the
next step.
6. Reattach the external cable to the adapter.
a. Measure across C1 as previously described.
b. If the resistance is between 10 and 20 Ohms, there is no apparent cause for a PTC resistor failure.
c. If the resistance is less than 10 Ohms, there is a possibility of a fault on the external SCSI bus.
Troubleshoot the external SCSI bus by disconnecting devices and terminators. Measure across C1
to determine if the fault has been removed.
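The resistance thresholds in steps 5 and 6 can be summarized as a small lookup. The following Python sketch is illustrative only: the function name classify_c1_reading and its return strings are hypothetical, and any reading that falls between the ranges named in the procedure is reported as unspecified rather than guessed.

```python
def classify_c1_reading(ohms: float, cable_attached: bool) -> str:
    """Interpret a resistance reading across C1 on the SCSI-2 differential adapter.

    Thresholds come from steps 5 and 6 of the procedure. The function name
    and the returned strings are illustrative assumptions.
    """
    if ohms < 0:
        raise ValueError("resistance cannot be negative")
    if ohms < 10:
        # A low reading indicates a short circuit.
        if cable_attached:
            return "possible fault on external SCSI bus"
        return "fault on adapter: replace adapter"
    if cable_attached:
        # Step 6: 10 to 20 Ohms with the external cable reattached.
        if ohms <= 20:
            return "no apparent cause for PTC failure"
        return "not specified by the procedure"
    # Step 5: adapter removed from the system, no cables attached.
    if 25 <= ohms <= 35:
        return "no fault: adapter is not faulty"
    if ohms > 35:
        return "high: check whether RN1, RN2, and RN3 are installed"
    return "not specified by the procedure"
```

For example, a reading of 5 Ohms with no cable attached maps to the adapter-fault branch, while 15 Ohms with the cable reattached maps to the no-apparent-cause branch.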
SCSI-2 differential adapter probable tripped PTC causes
The following list provides some suggestions of things to check when the PTC is tripped:
v A short-circuited terminator or cable. Check for bent pins on each connector and removable terminator.
v Intermittent PTC failures can be caused by incorrectly seated cable connectors. Reseat the connector
and flex the cable in an attempt to duplicate the fault condition across C1.
v Plugging or unplugging a cable or terminator while the system is turned on (hot-plugging).
v A short-circuited device.
v Single-ended devices are attached to the differential SCSI bus.
Dual-channel ultra SCSI adapter PTC failure isolation procedure
Use the following procedures if diagnostics testing indicates a potential positive temperature coefficient
(PTC) resistor fault or the TERMPWR short-circuited LED is lit.
This procedure is used for SRNs 637-240 and 637-800 on the Dual-Channel Ultra SCSI Adapter. If the
TERMPWR short-circuited LED is lit, use this procedure to help isolate the source of the problem on the
failing channel.
1. Identify the adapter by its label of 4-R on the external bracket. Then, determine if the failure is on
channel A or channel B.
2. The same PTC is used for both the internal and external buses. The PTC protects the SCSI bus from
high currents due to short-circuits on the cable, terminator, or device. It is unlikely that the PTC can
be tripped by a defective adapter. A fault (short-circuit) causes an increase in PTC resistance and
temperature. The increase in resistance causes the PTC to halt current flow. The PTC returns to a low
resistive and low temperature state when the fault is removed from the SCSI bus or when the system
is turned off.
Wait 5 minutes for the PTC resistor to fully cool, then retest.
3. If this same error persists, or the TERMPWR short-circuited LED is lit, replace the components of the
failing channel in the following order (wait five minutes between steps):
a. If the failure is on the external cable, replace the following:
1) Cable
2) Device
3) Attached subsystem
4) Adapter
b. If the failure is on the internal cable, replace the following:
1) Cable
2) Device
3) Backplane
4) Adapter
c. If the failure persists, verify that the parts exchanged are in the correct channel (internal or
external, A or B).
If the errors are still occurring, continue isolating the problem by going to “MAP 0050” on page
346.
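The replacement order in step 3 can be written out as a short plan. This Python sketch is only an illustration: the names REPLACEMENT_ORDER, COOL_DOWN_MINUTES, and replacement_plan are hypothetical, and the plan assumes you retest after each five-minute wait and stop as soon as the error no longer occurs.

```python
from typing import Dict, List

# Part order copied from step 3 of the procedure, keyed by the failing cable.
REPLACEMENT_ORDER: Dict[str, List[str]] = {
    "external": ["cable", "device", "attached subsystem", "adapter"],
    "internal": ["cable", "device", "backplane", "adapter"],
}

# The procedure asks for a five-minute wait so the PTC resistor can cool.
COOL_DOWN_MINUTES = 5


def replacement_plan(bus: str) -> List[str]:
    """Return the ordered actions for the failing channel's internal or external bus."""
    if bus not in REPLACEMENT_ORDER:
        raise ValueError("bus must be 'internal' or 'external'")
    plan: List[str] = []
    for part in REPLACEMENT_ORDER[bus]:
        plan.append(f"replace {part}")
        plan.append(f"wait {COOL_DOWN_MINUTES} minutes, then retest")
    return plan
```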
64-bit PCI-X dual channel SCSI adapter PTC failure isolation procedure
Use the following procedures if diagnostics testing indicates a potential self-resetting thermal fuse
problem. This procedure is used for SRN 2524-702 on the integrated dual-channel SCSI adapter in a
7039/651 system.
1. Identify the adapter as the one embedded in the system board. Then, determine if the failure is on
channel 0 or channel 1.
2. The thermal fuse protects the SCSI bus from high currents due to short-circuits on the terminator,
cable, or device. It is unlikely that the thermal fuse can be tripped by a defective adapter. A fault
(short-circuit) causes an increase in resistance and temperature of the thermal fuse. The increase in
temperature causes the thermal fuse to halt current flow. The thermal fuse returns to a low resistive
and low temperature state when the fault is removed from the SCSI bus or when the system is turned
off.
Wait 10 seconds for the thermal fuse to reset itself and recover, then retest.
3. If the same error persists, replace the components of the failing channel in the following order. Wait
10 seconds for the thermal fuse to reset itself between steps.
a. Cable
b. Device
c. DASD backplane (if present)
d. System board (adapter)
4. If the failure persists, verify that the parts exchanged are in the correct channel (0 or 1). If the errors
are still occurring, continue isolating the problem by going to “MAP 0050” on page 346.
MAP 0020
Use this MAP to get a service request number (SRN) if the customer or a previous MAP provided none.
If you need additional information for failing part numbers, location codes, or removal and replacement
procedures, see Part locations and location codes. Select your machine type and model number to find
additional location codes, part numbers, or replacement procedures for your system.
Note: If you are unable to power the system on, see the “Power isolation procedures” on page 124.
v Step 0020-1
Visually check the server for obvious problems such as unplugged power cables or external devices
that are powered off.
Did you find an obvious problem?
No
Go to Step 0020-2.
Yes
Fix the problem, then go to Verify a repair.
v Step 0020-2
Are the AIX online diagnostics installed?
Note: If AIX is not installed on the server or partition, answer no to the above question.
No
If the operating system is running, perform its shutdown procedure. Get help if needed. Go to
Step 0020-4.
Yes
Go to Step 0020-3.
v Step 0020-3
Note: When possible, run online diagnostics in service mode. Online diagnostics perform additional
functions compared to standalone diagnostics.
Run online diagnostics in concurrent mode when the customer does not let you power off the system
unit. To run online diagnostics in service mode, go to substep 5. If the system unit is already running in
service mode and you want to run online diagnostics, proceed to the question at the bottom of this
MAP step. Otherwise, continue with substeps 1 through 4 in the following procedure.
1. Log in with root authority or use CE login. If necessary, ask the customer for the password.
2. Enter the diag -a command to check for missing resources.
a. If you see a command line prompt, proceed to substep 3 below.
b. If the DIAGNOSTIC SELECTION menu is displayed, with the letter M shown next to any
resource, select that resource, then press Commit (F7 key). Follow any instructions displayed. If
you are prompted with the message Do you want to review the previously displayed error?,
select Yes and press Enter. If an SRN displays, record it, and go to Step 0020-15. If there is no
SRN, go to substep 3 below.
c. If the MISSING RESOURCE menu is displayed, follow any instructions displayed. If you are
prompted with the message Do you want to review the previously displayed error?, select Yes
and press Enter. If an SRN displays, record it, and go to Step 0020-15. If there is no SRN, go to
substep 3 below.
3. Enter the diag command.
4. Go to Step 0020-5.
5. If the operating system is running, perform its shutdown procedure (get help if needed).
6. Turn off the system unit power and wait 45 seconds before proceeding.
7. Turn on the system unit power.
8. Load the online diagnostics in service mode.
9. Wait until the Diagnostic Operating Instructions display or the system appears to have stopped.
Are the Diagnostic Operating Instructions displayed?
No
Go to Step 0020-16.
Yes
Go to Step 0020-5.
v Step 0020-4
Note: If you are working on a partition, do not remove the power as directed in the following
procedure. Remove the power only if you are working on a server that does not have multiple
partitions.
1. If the server does not have multiple partitions, disconnect the power from the server, wait 45
seconds, then reconnect the power.
2. Perform the action specified in the following table.
System: 8202-E4B, 8202-E4C, 8202-E4D, 8205-E6B, 8205-E6C, 8205-E6D, 8231-E2B, 8231-E1C,
8231-E1D, 8231-E2C, 8231-E2D, 8248-L4T, 8268-E1D, 8408-E8D, 9109-RMD, 9119-FHB, 9125-F2C
Action: Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
System: 8233-E8B, 8236-E8C
Action: Determine the level of firmware the system is running.
Is system firmware AL710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
System: 8412-EAD, 9117-MMB, 9117-MMC, 9117-MMD, 9179-MHB, 9179-MHC, or 9179-MHD
Action: Determine the level of firmware the system is running.
Is system firmware AM710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
Set the server to perform a slow boot for the next boot that is performed. If the system does not
support slow boot, do a normal boot in the next step.
3. See Running the standalone diagnostics from CD-ROM to load the standalone diagnostics. Before
continuing to the next step, ensure that the server power is turned on, or if you are working on a
partition, the partition is started. The server or partition should be booting the standalone
diagnostics from a CD-ROM or a network server.
4. Wait until the Diagnostic Operating Instructions display or the server boot appears to have stopped.
Are the Diagnostic Operating Instructions displayed?
No
Go to Step 0020-16.
Yes
Go to Step 0020-5.
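The model and firmware table in substep 2 of Step 0020-4 amounts to a lookup. The following Python sketch shows one way to express it. The names boot_action, NO_SLOW_BOOT_MODELS, and SLOW_BOOT_FIRMWARE are hypothetical, and the sketch assumes that the firmware level is available as a string that begins with the release prefix shown in the table (for example, AL710_ followed by a level number).

```python
# Models for which the table states that slow boot is not supported.
NO_SLOW_BOOT_MODELS = {
    "8202-E4B", "8202-E4C", "8202-E4D", "8205-E6B", "8205-E6C", "8205-E6D",
    "8231-E2B", "8231-E1C", "8231-E1D", "8231-E2C", "8231-E2D", "8248-L4T",
    "8268-E1D", "8408-E8D", "9109-RMD", "9119-FHB", "9125-F2C",
}

# Models that perform a slow boot only at the listed firmware release.
SLOW_BOOT_FIRMWARE = {
    "8233-E8B": "AL710_", "8236-E8C": "AL710_",
    "8412-EAD": "AM710_", "9117-MMB": "AM710_", "9117-MMC": "AM710_",
    "9117-MMD": "AM710_", "9179-MHB": "AM710_", "9179-MHC": "AM710_",
    "9179-MHD": "AM710_",
}


def boot_action(model: str, firmware_level: str) -> str:
    """Return the action from the Step 0020-4 table for a machine type and model."""
    if model in NO_SLOW_BOOT_MODELS:
        return "power on to hypervisor standby"
    prefix = SLOW_BOOT_FIRMWARE.get(model)
    if prefix is None:
        raise KeyError(f"model {model} is not listed in the Step 0020-4 table")
    if firmware_level.startswith(prefix):
        return "perform a slow boot"
    return "power on to hypervisor standby"
```

The firmware level strings passed to boot_action are whatever the system reports. The sketch only checks the prefix and does not validate the rest of the level.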
v Step 0020-5
Are the Diagnostic Operating Instructions displayed (screen number 801001) with no obvious problem
(for example, blurred or distorted)?
No
For display problems, go to Step 0020-12.
Yes
To continue with diagnostics, go to Step 0020-6.
v Step 0020-6
Press the Enter key.
Is the FUNCTION SELECTION menu displayed (screen number 801002)?
No
Go to Step 0020-13.
Yes
Go to Step 0020-7.
v Step 0020-7
1. Select the option ADVANCED DIAGNOSTICS ROUTINES.
Notes:
a. If the terminal type is not defined, do so now. You cannot proceed until this is complete.
b. If you have SRNs from a Previous Diagnostics Results screen, process these Previous
Diagnostics Results SRNs prior to processing any SRNs you may have received from an SRN
reporting screen.
2. If the DIAGNOSTIC MODE SELECTION menu (screen number 801003) displays, select the option
PROBLEM DETERMINATION.
3. Find your system response in the following table. Follow the instructions in the Action column.
System response: Previous Diagnostic Results. Do you want to review the previously displayed error?
Action: You have a pending item in the error log for which there is no corresponding Log Repair
Action. To see this error, select YES at the prompt.
Information from the error log is displayed in order of last event first. Record the error code, the
FRU names and the location code of the FRUs. Go to Step 0020-15.
System response: The RESOURCE SELECTION menu or the ADVANCED DIAGNOSTIC SELECTION menu is
displayed (screen number 801006).
Action: Go to Step 0020-8.
System response: The system halted while testing a resource.
Action: Record SRN 110-xxxx, where xxxx is the first four digits of the menu number displayed in
the upper-right corner of the diagnostic menu. Go to Step 0020-15.
System response: The MISSING RESOURCE menu is displayed or the letter M is displayed alongside a
resource in the resource list.
Action: If the MISSING RESOURCE menu is displayed, follow the displayed instructions until either
the ADVANCED DIAGNOSTIC SELECTION menu or an SRN is displayed. If an M is displayed in front of a
resource (indicating that it is missing) select that resource then choose the Commit (F7 key).
Note:
1. Run any supplemental media that may have been supplied with the adapter or device, and then
return to substep 1 of Step 0020-7.
2. If the SCSI enclosure services device appears on the missing resource list along with the other
resources, select it first.
3. ISA adapters cannot be detected by the system. The ISA adapter configuration service aid in
standalone diagnostics allows the identification and configuration of ISA adapters.
If the ADVANCED DIAGNOSTIC SELECTION menu is displayed, go to Step 0020-11.
If an 8-digit error code is displayed, record it and go to Start of call.
If an SRN is displayed, record it, and go to Step 0020-15.
System response: The message The system will now continue the boot process is displayed
continuously on the system unit's console.
Action: Go to Step 0020-4.
System response: The message Processing supplemental diagnostic diskette media is displayed
continuously on the system unit's console.
Action: Call your service support.
System response: The diagnostics begin testing a resource.
Note: If the option Problem Determination was selected from the DIAGNOSTIC MODE SELECTION menu,
and if a recent error has been logged in the error log, the diagnostics automatically begin testing
the resource.
Action: Follow the displayed instructions.
If the No Trouble Found screen is displayed, press Enter.
If another resource is tested, repeat this step.
If the ADVANCED DIAGNOSTIC SELECTION menu is displayed, go to Step 0020-11.
If an SRN is displayed, record it, and go to Step 0020-15.
If an 8-digit error code is displayed, go to Start of call.
System response: The system did not respond to selecting the advanced diagnostics option.
Action: Go to Step 0020-13.
System response: A system unit with a beeper did not beep while booting.
Action: Record SRN 111-947 and then go to Step 0020-15.
System response: The system unit emits a continuous sound from the beeper.
Action: Record SRN 111-947 and then go to Step 0020-15.
System response: An SRN or an eight-digit error code is displayed.
Action: Record the error code, the FRU names, and the location code for the FRUs.
If an SRN is displayed, go to Step 0020-15.
If an 8-digit error code is displayed, go to Start of call.
System response: The system stopped with a 3-digit or 4-digit code displayed in the operator panel
display.
Action: Record SRN 101-xxx (where xxx is the rightmost three digits of the displayed code). Go to
Step 0020-15.
System response: An 888 message is displayed in the operator panel display.
Note: The 888 may or may not be flashing.
Action: Go to “MAP 0070” on page 355.
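Two rows of the Step 0020-7 table build an SRN from a displayed number. The following Python sketch shows that string handling. The function names srn_for_halted_test and srn_for_panel_code are hypothetical, and the sketch assumes that the menu number and the operator panel code are entered as text exactly as displayed.

```python
def srn_for_halted_test(menu_number: str) -> str:
    """Build SRN 110-xxxx from the menu number in the upper-right corner.

    xxxx is the first four digits of the menu number (Step 0020-7).
    """
    if len(menu_number) < 4 or not menu_number[:4].isdigit():
        raise ValueError("expected a menu number with at least four leading digits")
    return "110-" + menu_number[:4]


def srn_for_panel_code(panel_code: str) -> str:
    """Build SRN 101-xxx from a 3-digit or 4-digit operator panel code.

    xxx is the rightmost three digits of the displayed code (Step 0020-7).
    """
    if len(panel_code) not in (3, 4):
        raise ValueError("expected a 3-digit or 4-digit code")
    return "101-" + panel_code[-3:]
```

For example, if the system halts while menu number 801006 is displayed, srn_for_halted_test("801006") gives 110-8010.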
v Step 0020-8
On the DIAGNOSTIC SELECTION or ADVANCED DIAGNOSTIC SELECTION menu, look through
the list of resources to make sure that all adapters and SCSI devices are listed including any new
resources.
Notes:
1. Resources attached to serial and parallel ports may not appear in the resource list.
2. If running diagnostics in a partition within a partitioned system, resources assigned to other
partitions will not be displayed on the resource list.
Did you find all the adapters or devices on the list?
No
Go to Step 0020-9.
Yes
Go to Step 0020-11.
v Step 0020-9
Is the new device or adapter an exact replacement for a previous one installed at the same location?
No
Go to Step 0020-10.
Yes
The replacement device or adapter may be defective. If possible, try installing it in an alternate
location if one is available; if it works in that location, then suspect that the location where it
failed to appear has a defective slot. Schedule time to replace the hardware that supports that
slot. If it does not work in an alternate location, suspect a bad replacement adapter or device. If
you are still unable to detect the device or adapter, contact your service support structure.
v Step 0020-10
Is the operating system software to support this new adapter or device installed?
No
Load the operating system software.
Yes
The replacement device or adapter may be defective. If possible, try installing it in an alternate
location if one is available; if it works in that location, then suspect that the location where it
failed to appear has a defective slot. Schedule time to replace the hardware that supports that
slot. If it does not work in an alternate location, suspect a bad replacement adapter or device. If
you are still unable to detect the device or adapter, contact your service support structure.
v Step 0020-11
Select and run the diagnostic test problem determination or system verification on one of the
following:
– The resources with which the customer is having problems. If the resource is not shown on the
DIAGNOSTIC SELECTION menu, then run diagnostics on its parent (the adapter or controller to
which the resource is attached).
– The resources you suspect are causing a problem.
– All resources.
Note: When choosing All Resources, interactive tests are not done. If no problem is found running
All Resources, select each individual resource on the selection menu and run diagnostics on it so
that the interactive tests are performed.
Find the response in the following table, or follow the directions on the test results screen.
Diagnostic response: An SRN or an eight-digit error code is displayed on the screen.
Action: Record the error code, the FRU names, and the location code for the FRUs.
If an SRN is displayed, go to Step 0020-15. If an 8-digit error code is displayed, go to the
information center, and perform a search on the error code to obtain the name and location of the
failing FRU. Perform the listed action.
Diagnostic response: The TESTING COMPLETE menu and the No trouble was found message are
displayed, and you have not tested all of the resources.
Action: Press Enter and continue testing other resources.
Diagnostic response: The TESTING COMPLETE menu and the No trouble was found message are
displayed, and you have tested all of the resources.
Action: Go to Step 0020-14.
Note: If you have not run the sysplanar test, do so before going to Step 0020-14.
Diagnostic response: The system halted while testing a resource.
Action: Record SRN 110-xxxx, where xxxx is the first three or four digits of the menu number
displayed in the upper-right corner of the diagnostic menu screen.
Go to Step 0020-15.
Diagnostic response: When running the Online Diagnostics, an installed device does not appear in
the test list.
Action: Ensure that the diagnostic support for the device was installed. The display configuration
service aid can be used to determine whether diagnostic support is installed for the device.
Record SRN 110-101. Go to Step 0020-15.
Note: Supplemental diskettes may be required if service aids are run from standalone diagnostics.
Diagnostic response: The IBM ARTIC960 Quad T1/E1 adapter diagnostics display a message indicating
that the interface board (PMC) is either not installed or is malfunctioning.
Action: Install a PMC board if not already installed.
When running online diagnostics on any of the IBM ARTIC960 family of adapters and the message
indicates that the PMC is not installed, but it is installed, do the following:
v Reseat the PMC board, then run diagnostics.
v If the response is the same, replace the PMC and then go to Verify a repair.
Diagnostic response: The symptom was not found in the table.
Action: Return to the Start of call.
v Step 0020-12
The following step analyzes a console display problem.
Find your type of console display in the following table. Follow the instructions given in the Action
column.
Type of console display: TTY-type terminal
Action: Be sure the TTY terminal attributes are set correctly.
If you did not find a problem with the attributes, go to the documentation for this type of TTY
terminal, and continue problem determination. If you do not find the problem, record SRN 111-259,
then go to Step 0020-15.
Type of console display: Graphics display
Action: Go to the documentation for this type of graphics display, and continue problem
determination. If you do not find the problem, record SRN 111-82c, then go to Step 0020-15.
Type of console display: Management console
Action: For a Hardware Management Console (HMC), go to the service procedures in Troubleshooting
the HMC. For an IBM Systems Director Management Console (SDMC), go to the service procedure in
Troubleshooting the SDMC.
If management console tests find no problem, there may be a problem with the communication
between the management console and the managed system. If the management console communicates
with the managed system through a network interface, verify whether the network interface is
functional. If the management console communicates with the managed system through the management
console interface, check the cable between the management console and the managed system. If it is
not causing the problem, suspect a configuration problem of the management console communications
setup.
v Step 0020-13
There is a problem with the keyboard.
Find the type of keyboard you are using in the following table. Follow the instructions given in the
Action column.
Keyboard type: Type 101 keyboard (U.S.). Identify by the size of the Enter key. The Enter key is in
only one horizontal row of keys.
Action: Record SRN 111-736, then go to Step 0020-15.
Keyboard type: Type 102 keyboard (W.T.). Identify by the size of the Enter key. The Enter key
extends into two horizontal rows.
Action: Record SRN 111-922; then go to Step 0020-15.
Keyboard type: Kanji-type keyboard. (Identify by the Japanese characters.)
Action: Record SRN 111-923; then go to Step 0020-15.
Keyboard type: TTY terminal keyboard
Action: Go to the documentation for this type of TTY terminal and continue problem determination.
Keyboard type: Hardware Management Console (HMC)
Action: Go to the HMC service procedures in Troubleshooting the HMC. If HMC tests find no problem,
there may be a problem with the communication between the HMC and the managed system. If the HMC
communicates with the managed system through a network interface, verify whether the network
interface is functional. If the HMC communicates with the managed system through the HMC interface,
check the cable between the HMC and the managed system. If it is not causing the problem, suspect a
configuration problem of the HMC communications setup.
v Step 0020-14
The diagnostics did not detect a problem.
If the problem is related to either the system unit or the I/O expansion box, see the service
documentation for that unit.
If the problem is related to an external resource, use the problem determination procedures, if
available, for that resource.
If a problem occurs when running online diagnostics but not when running the stand-alone
diagnostics, suspect a software problem.
Check for the presence of supplemental diagnostic material, such as diskettes or documentation.
This is possibly a problem with software or intermittent hardware. If you think that you have an
intermittent hardware problem, go to “MAP 0040” on page 344.
v Step 0020-15
Take the following actions:
1. Handle multiple SRNs and error codes in the following order:
a. 8-digit error codes.
b. SRNs with a source code other than F or G.
c. SRNs with a source code of F. Run online diagnostics in advanced and problem determination
mode to obtain maximum isolation.
d. SRNs with a source code of G.
Note: The priority for multiple SRNs with a source of G is determined by the time stamp of the
failure. Follow the action for the SRN with the earliest time stamp first.
e. Device SRNs and error codes (5-digit SRNs).
If a group has multiple SRNs, it does not matter which SRN is handled first.
2. Find the SRN.
If the SRN is not listed, look for it in the following:
– Any supplemental service information for the device
– The diagnostic problem report screen for additional information
– The "Service Hints" service aid in AIX and Linux tasks and service aids
3. Perform the action listed.
4. If you replace a part, go to Verify a repair.
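The handling order in substep 1 of Step 0020-15 can be expressed as a sort key. The following Python sketch is an illustration under stated assumptions: the ReportedCode record and its fields (kind, source, timestamp) are hypothetical and are not a format produced by the diagnostics. You would fill them in from what the diagnostic screens show.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class ReportedCode:
    text: str                            # the code as displayed
    kind: str                            # "error_code", "srn", or "device_srn"
    source: Optional[str] = None         # SRN source code letter, if any
    timestamp: Optional[datetime] = None  # failure time, used for source G


def priority_key(code: ReportedCode) -> Tuple[int, datetime]:
    """Rank a code by the groups a through e listed in Step 0020-15."""
    if code.kind == "error_code":        # a. 8-digit error codes
        return (0, datetime.min)
    if code.kind == "srn":
        if code.source == "F":           # c. source code F
            return (2, datetime.min)
        if code.source == "G":           # d. source code G, earliest first
            return (3, code.timestamp or datetime.max)
        return (1, datetime.min)         # b. any source other than F or G
    if code.kind == "device_srn":        # e. device SRNs (5-digit)
        return (4, datetime.min)
    raise ValueError(f"unknown kind: {code.kind}")


def handling_order(codes: List[ReportedCode]) -> List[ReportedCode]:
    """Return the codes in the order in which they should be handled."""
    return sorted(codes, key=priority_key)
```

Because sorted is stable, codes within the same group keep the order in which they were recorded, which matches the statement that the order within a group does not matter. Source G is the exception and is ordered by time stamp.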
v Step 0020-16
Look up the AIX IPL progress codes for definitions of configuration program indicators. They are
normally 0xxx or 2xxx.
Is a configuration program indicator displayed?
No
Go to “Problems with loading and starting the operating system (AIX and Linux)” on page
326.
Yes
Record SRN 101-xxxx (where xxxx is the rightmost three or four digits or characters of the
configuration program indicator). Go to Step 0020-17.
v Step 0020-17
Is location information displayed on the operator panel display?
No
Go to Step 0020-15.
Yes
Record the location code, then go to Step 0020-15.
MAP 0030
This MAP is used for problems that still occur after all FRUs indicated by the SRN or error code have
been exchanged.
If you need additional information for failing part numbers, location codes, or removal and replacement
procedures, see Part locations and location codes. Select your machine type and model number to find
additional location codes, part numbers, or replacement procedures for your system.
Note: Check the action text of the SRN before proceeding with this MAP. If there is an action listed,
perform that action before proceeding with this MAP.
v Step 0030-1
Some external devices (including rack drawers that contain devices) have their own problem
determination procedures. If the problem is related to an external device that has its own problem
determination procedure, run those procedures if not already run. If they do not correct the problem,
continue with this MAP.
v Step 0030-2
The problem may have been caused by a resource that has not been tested. System Checkout tests all
resources. If the online Diagnostics are installed and you are able to load them, then All Resources
under the Diagnostic Selection menu should be run. If you get a different SRN, look up the SRN in the
SRN topic collections and perform the listed action. If you are unable to run All Resources under the
Diagnostic Selection menu or you do not get another SRN when running it, continue with this MAP.
v Step 0030-3
If the problem is related to a SCSI device, SCSI bus, or SCSI controller, go to “MAP 0050” on page 346.
If you are unable to isolate the problem with MAP 0050, continue with Step 0030-4.
v Step 0030-4
1. Find the resources that are identified by the SRN or error code in the following table.
2. Perform the first action listed for the resource.
3. If you exchange a FRU or change a switch setting, test the resource again.
4. If the action does not correct the problem, perform the next action until all actions have been tried.
If an action says to exchange a FRU that you have already exchanged, go to the next action. If an
action corrects the problem, go to Verify a repair.
5. If you perform all of the actions and do not correct the problem, check the Service Hints service aid
for information. If the service aid does not help, call your service support structure.
Failing resource: SCSI Device
Repair action: Exchange the SCSI controller. Replace the power supply.
Failing resource: Pluggable SCSI or IDE controller
Repair action: Exchange the backplane into which the adapter is plugged.
Failing resource: Keyboard, tablet, mouse, dials, LPF keys, diskette drive
Repair action: Check the cable attaching the device to its adapter. If you do not find a problem,
exchange the device's adapter.
Failing resource: Pluggable adapters, CPU cards, and controllers
Repair action: Determine whether the adapter contains any attached FRUs such as fuses, DRAMs, and
crossover cables.
1. Check or exchange any attached FRU on the resource.
2. If the adapter is plugged into a riser card, check or exchange the riser card.
3. Exchange the backplane into which the adapter is plugged.
Failing resource: System and I/O backplanes
Repair action: Contact your service support structure.
Failing resource: Built-In system ports
Repair action: Replace the service processor if present.
Failing resource: A device attached to the system by a cable and an adapter.
Repair action:
1. Replace the adapter for the device.
2. Replace the cable to the device.
Failing resource: IDE Device
Repair action: Replace the cable between the IDE controller and the device. If the IDE controller
is packaged on a backplane, replace that backplane, otherwise replace the adapter containing the
IDE controller.
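Substeps 2 through 4 of Step 0030-4 describe a loop over the repair actions for a resource. The following Python sketch shows that control flow only. The function work_through_actions and its parameters are hypothetical, and the problem_fixed callback stands in for performing the action and testing the resource again.

```python
from typing import Callable, Iterable, Optional, Tuple


def work_through_actions(
    actions: Iterable[Tuple[str, Optional[str]]],
    already_exchanged: Iterable[str],
    problem_fixed: Callable[[str], bool],
) -> Optional[str]:
    """Try repair actions in table order until one corrects the problem.

    actions: ordered (description, fru) pairs; fru is None when the action
             does not exchange a FRU.
    already_exchanged: FRUs exchanged before this MAP was entered.
    problem_fixed: performs the action, retests, and reports the result.
    Returns the description of the action that corrected the problem, or
    None when every action was tried without success.
    """
    exchanged = set(already_exchanged)
    for description, fru in actions:
        if fru is not None and fru in exchanged:
            continue  # the FRU was already exchanged: go to the next action
        if fru is not None:
            exchanged.add(fru)
        if problem_fixed(description):
            return description  # next: go to Verify a repair
    return None  # next: check the Service Hints service aid
```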
MAP 0040
This MAP provides a structured way of analyzing intermittent problems.
If you need additional information for failing part numbers, location codes, or removal and replacement
procedures, see Part locations and location codes. Select your machine type and model number to find
additional location codes, part numbers, or replacement procedures for your system.
This MAP consists of two tables: Hardware Symptoms and Software Symptoms.
Because software or hardware can cause intermittent problems, consider all symptoms relevant to your
problem.
How to use this MAP
This MAP contains information about causes of intermittent symptoms. In the following tables, find your
symptoms, and read the list of things to check.
When you exchange a FRU, go to Verify a repair to check out the system.
Hardware symptoms
Note: This table spans several pages.
Symptom of hardware problem
Things to check for
Any hardware log entry in the error
log.
Use the Hardware Error Report service aid to view the error log and check
for:
v Multiple errors on devices attached to the same SCSI bus.
v Multiple errors on devices attached to the same async adapter.
v Multiple errors on internally installed devices only.
Contact your service support structure for assistance with error report
interpretation.
Hardware-caused system crashes
v The connections on the CPU backplane or CPU card
v Memory modules for correct connections
v Connections to the system backplane.
v Cooling fans operational
v The environment for a too-high or too-low operating temperature.
v Vibration: proximity to heavy equipment.
System unit powers off a few seconds
after powering.
v Fan speed. Some fans contain a speed-sensing circuit. If one of these fans
is slow, the power supply powers the system unit off.
v Correct voltage at the outlet into which the system unit is plugged.
v Loose power cables and fan connectors, both internal and external.
System unit powers off after running
for more than a few seconds.
v Excessive temperature in the power supply area.
v Loose cable connectors on the power distribution cables.
v Fans turning at full speed after the system power has been on for more
than a few seconds.
Only internally installed devices are
failing.
Check the following items that are common to more than one device:
v Ground connections on all of the disk drives and other types of drives
installed.
v Loose connections on the power cables to the backplanes, drives, fans,
and battery.
v System unit cooling. Is the input air temperature within limits? Are all the
fans running at full speed? Are any of the vent areas blocked?
v Signal cables to the diskette drives and the power supply.
v SCSI device signal cables for loose connectors and terminators.
v Loose SCSI device address jumpers.
v Possible contamination of any device that has a cleaning procedure. See
the operator guide for cleaning instructions.
v Excessive static electricity.
v Correct voltage at the system unit power outlet
Only externally attached devices are
failing.
Check the following items that are common to more than one device.
v Check the SCSI signal cables to the devices for loose connectors and
terminators.
v Check devices that use jumpers to set the SCSI address for loose jumpers.
v Check any device that has a cleaning procedure for contamination. See
the operator guide for cleaning instructions.
v Check for excessive static electricity.
v Check the outlet that the device is plugged into for correct voltage.
v Check the error log for entries for the adapter driving the failing devices.
v Check the temperature of the devices. Are the cooling vents blocked? Are
the fans running?
v Check for other devices near the failing device that may be radiating
noise (displays, printers, and so on).
Software symptoms
Symptom of software problem
Things to check for
Any symptom you suspect is related
to software.
Use the software documentation to analyze software problems.
Software-caused system crashes
Check the following software items:
v Is the problem only with one application program?
v Is the problem only with one device?
v Does the problem occur on a recently installed program?
v Was the program recently patched or modified in any way?
v Is the problem associated with any communication lines?
v Check for static discharge occurring at the time of the failure.
MAP 0050
Use this MAP to analyze problems with a SCSI bus.
If you need additional information for failing part numbers, location codes, or removal and replacement
procedures, see Part locations and location codes. Select your machine type and model number to find
additional location codes, part numbers, or replacement procedures for your system.
Considerations
v To prevent hardware damage or erroneous diagnostic results from a system with its power turned on,
use "PCI hot-plug manager" subtask "replace/remove PCI hot plug adapter" before connecting or
disconnecting cables or devices.
v Also, use this MAP for SCSI adapters that are built into system backplanes or I/O backplanes. If this
procedure calls for replacing a SCSI adapter and the SCSI adapter is built into the system backplane or
I/O backplane, replace the system backplane or I/O backplane as appropriate.
v If the failure is a terminator power failure (SRNs xxx-226, xxx-240, xxx-800), always allow five minutes
for the PTC to cool.
v The differential version of the adapter has socket-type terminators to support high-availability. If this is
the adapter's configuration, the terminators would have been removed from the adapter. MAP steps
requiring the removal of the cable from the adapter do not apply, because an adapter that is not
terminated always fails diagnostics. Proper SCSI diagnostics require proper termination. If the
configuration involves a Y-cable, leave it, with the appropriate terminator, attached to the adapter. Or,
place an external differential terminator on the external port.
v If the system uses shared disk-drive hardware or a high-availability configuration, ensure that the other
system that is sharing the devices is not using the devices. For additional information concerning
high-availability configurations, see “SCSI service hints” on page 329.
v For intermittent problems that cannot be resolved with this MAP, see “SCSI service hints” on page 329.
v If the SCSI bus is attached to a RAID subsystem, see the RAID subsystem documentation for any
problem determination. If the RAID adapter is a PCI-X RAID adapter, see the PCI-X SCSI RAID
Controller Reference Guide for AIX.
Follow the steps in this MAP to isolate a SCSI bus problem.
Note: This procedure removes devices and components from a SCSI bus until the problem or symptom
is eliminated. If you follow the entire procedure, you will remove all components of a SCSI
bus in the following order:
1. Hot-swap devices
2. Devices that are not hot-swap
3. SCSI Enclosure Services (SES) device or enclosures
4. SCSI cables
5. SCSI adapter
Do the following:
v Step 0050-1
Have changes been made recently to the SCSI configuration?
No
Go to Step 0050-2.
Yes
Go to Step 0050-5.
v Step 0050-2
Are there any hot-swap devices (SCSI disk drives or media devices) controlled by the adapter?
No
Go to Step 0050-3.
Yes
Go to Step 0050-11.
v Step 0050-3
Are there any devices other than hot-swappable devices controlled by the adapter?
No
Go to Step 0050-4.
Yes
Go to Step 0050-13.
v Step 0050-4
Is an enclosure or drawer that supports hot-swap devices controlled by the adapter?
No
Go to Step 0050-22.
Yes
Go to Step 0050-15.
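The routing in Steps 0050-1 through 0050-4 can be sketched as a small function. This is an illustrative sketch only: the function name and its four yes/no parameters are hypothetical and are not part of any service tool. It covers only the initial entry into the MAP; later steps can route back into this chain (for example, a failure in Step 0050-11 returns to Step 0050-3).

```python
def first_isolation_step(recent_config_change, has_hot_swap_devices,
                         has_other_devices, has_hot_swap_enclosure):
    """Return the step that Steps 0050-1 to 0050-4 route to (hypothetical helper)."""
    if recent_config_change:        # Step 0050-1: recent SCSI configuration changes?
        return "0050-5"
    if has_hot_swap_devices:        # Step 0050-2: hot-swap devices on the adapter?
        return "0050-11"
    if has_other_devices:           # Step 0050-3: devices that are not hot-swappable?
        return "0050-13"
    if has_hot_swap_enclosure:      # Step 0050-4: enclosure or drawer for hot-swap devices?
        return "0050-15"
    return "0050-22"                # nothing but cables and the adapter remain


if __name__ == "__main__":
    # No recent changes, no hot-swap devices, but other devices are attached.
    print(first_isolation_step(False, False, True, False))
```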
v Step 0050-5
This step handles cases where recent changes have been made to the SCSI configuration.
Using the first three digits of the SRN, see the FFC listing and determine if the adapter is single-ended
or differential.
Is the adapter a single-ended adapter?
No
Go to Step 0050-6.
Yes
Go to Step 0050-7.
v Step 0050-6
The adapter's termination jumper settings may be incorrect. Power off the system, and inspect jumper
J7.
Are the jumpers correct?
No
Go to Step 0050-8.
Yes
Go to Step 0050-9.
v Step 0050-7
If the adapter is not being used in a high-availability configuration, be sure sockets RN1, RN2, and
RN3 are populated.
If the adapter is being used in a high-availability configuration, be sure sockets RN1, RN2, and RN3 are
not populated.
Go to Step 0050-9.
v Step 0050-8
1. Correct the jumper settings and reinstall the adapter and all cables.
2. Power on the system, and run diagnostics in system verification mode on the adapter.
Did the diagnostic pass?
No
Go to Step 0050-9.
Yes
Go to Step 0050-10.
v Step 0050-9
Check for the following problems:
– Address conflicts between devices.
– Cabling problems such as configurations that exceed the maximum cable lengths, missing
termination, or excessive termination.
Did you find a problem?
No
Go to Step 0050-2.
Yes
Go to Step 0050-10.
v Step 0050-10
1. Correct the problem.
2. Power on the system, and run diagnostics in system verification mode on the adapter.
Did a failure occur?
No
Go to Verify a repair.
Yes
Go to Step 0050-2.
v Step 0050-11
This step determines if a hot-swap device is causing the failure.
1. Go to Preparing for hot-plug SCSI device or cable deconfiguration.
2. Disconnect all hot-swap devices attached to the adapter.
3. Go to After hot-plug SCSI device or cable deconfiguration.
4. If the Missing Options menu displays, select the option The resource has been turned off, but
should remain in the system configuration for all the devices that were disconnected.
5. Run the diagnostics in system verification mode on the adapter.
Did a failure occur?
No
Go to Step 0050-12.
Yes
Go to Step 0050-3.
v Step 0050-12
Go to Preparing for hot-plug SCSI device or cable deconfiguration. Reconnect the hot-plug devices one
at a time. After reconnecting each device, do the following:
1. Go to After hot-plug SCSI device or cable deconfiguration.
2. Rerun the diagnostics on the adapter.
3. If the adapter fails, the problem may be with the last device reconnected. Perform these substeps:
a. Follow repair procedures for that last device.
b. Rerun diagnostics on the adapter.
c. If diagnostics fail, replace the SES backplane corresponding to the slot for the device.
d. Rerun diagnostics.
e. If diagnostics fail, replace the last device.
f. Rerun diagnostics on the adapter.
g. If diagnostics pass, go to Verify a repair. Otherwise, contact your support center.
Note: A device problem can cause other devices attached to the same SCSI adapter to go into
the defined state. Ask the system administrator to make sure that all devices attached to the
same SCSI adapter as the device that you replaced are in the available state.
4. If no errors occur, the problem could be intermittent. Make a record of the problem. Running the
diagnostics for each device on the bus may provide additional information.
v Step 0050-13
This step determines if a device other than a hot-swappable device is causing the failure. Follow these
steps:
1. Go to Preparing for hot-plug SCSI device or cable deconfiguration.
2. Disconnect all devices attached to the adapter (except for the device from which you boot to run
diagnostics; you may want to temporarily move this device to another SCSI port while you are
trying to find the problem).
3. Go to After hot-plug SCSI device or cable deconfiguration.
4. If the Missing Options menu displays, select the option The resource has been turned off, but
should remain in the system configuration for all the devices that were disconnected.
5. Run the diagnostics in system verification mode on the adapter.
Did a failure occur?
No
Go to Step 0050-14.
Yes
Go to Step 0050-4.
v Step 0050-14
Reconnect the devices one at a time. After reconnecting each device, follow this procedure:
1. Rerun the diagnostics in system verification mode on the adapter.
2. If there is a failure, the problem should be with the last device reconnected. Follow the repair
procedures for that device, then go to Verify a repair.
3. If no errors occur, the problem could be intermittent. Make a record of the problem. Running the
diagnostics for each device on the bus may provide additional information.
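The reconnect-and-retest loop used in Steps 0050-12 and 0050-14 can be sketched as follows. This is a minimal illustration under stated assumptions: the function name and the diagnostics_pass callable are hypothetical stand-ins for a technician reconnecting a device and rerunning adapter diagnostics in system verification mode.

```python
def find_failing_device(devices, diagnostics_pass):
    """Reconnect devices one at a time and rerun diagnostics after each one.

    devices: device names in the order they are reconnected.
    diagnostics_pass: hypothetical callable that takes the list of currently
        connected devices and returns True when adapter diagnostics pass.
    Returns the last device reconnected when diagnostics first fail, or None
    if no failure occurs (the problem could be intermittent).
    """
    connected = []
    for device in devices:
        connected.append(device)             # reconnect the next device
        if not diagnostics_pass(connected):  # rerun diagnostics on the adapter
            return device                    # suspect the last device reconnected
    return None


if __name__ == "__main__":
    # Simulated bus where diagnostics fail whenever "disk2" is connected.
    print(find_failing_device(["disk1", "disk2", "tape0"],
                              lambda connected: "disk2" not in connected))
```

A None result corresponds to the "no errors occur" case in the steps above: record the problem, because it could be intermittent.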
v Step 0050-15
This step determines whether the SCSI Enclosure Services (SES) controller, which provides hot-plug
capability for SCSI drives in the server, is causing the problem.
Note: In most cases the SES controller is integrated on the backplane that is used to connect SCSI
devices, for example a disk drive backplane. If your system has hot-plug capability and the SES
controller is separate from the SCSI drive backplane, there will be an intermediate card on the SCSI bus
between the SCSI adapter and the device or SCSI backplane. You will have to make a visual check to
see if there are any intermediate cards on the SCSI bus that is displaying a problem.
Does a separate SES controller plug into the SCSI device backplane?
No
Go to Step 0050-18.
Yes
Go to Step 0050-16.
v Step 0050-16
Follow these steps:
1. Power off the system.
2. Remove the intermediate SES controller card. Locate the SES controller part number under System
parts.
3. Power on the system.
4. If the Missing Options menu displays, select the option The resource has been turned off, but
should remain in the system configuration for all the devices that were disconnected.
5. Run the diagnostics in system verification mode on the adapter.
Did a failure occur?
No
Go to Step 0050-17.
Yes
Go to Step 0050-18.
v Step 0050-17
Follow these steps:
1. Power off the system.
2. Replace the intermediate SES controller card.
3. Go to Verify a repair.
v Step 0050-18
Follow these steps:
1. Go to Preparing for hot-plug SCSI device or cable deconfiguration.
2. Disconnect all cables attached to the SCSI adapter. For SCSI differential adapters in a
high-availability configuration, see Considerations.
3. Go to After hot-plug SCSI device or cable deconfiguration.
4. If the Missing Options menu displays, select the option The resource has been turned off, but
should remain in the system configuration for all the devices that were disconnected.
5. Run the diagnostics in system verification mode on the adapter.
Did a failure occur?
No
Go to Step 0050-19.
Yes
Replace the adapter, then go to Verify a repair.
v Step 0050-19
Follow these steps:
1. Go to Preparing for hot-plug SCSI device or cable deconfiguration.
2. Reconnect the cables to the adapter.
Does the SES controller (an intermediate SES controller) plug into the backplane?
No
Go to Step 0050-20.
Yes
Go to Step 0050-21.
v Step 0050-20
Follow these steps:
1. Replace the SES controller. Locate the intermediate SES controller part number under System parts.
2. Power on the system.
3. If the Missing Options menu displays, select the option The resource has been turned off, but
should remain in the system configuration for all the devices that were disconnected.
4. Run the diagnostics in system verification mode on the adapter.
Did a failure occur?
No
Go to Verify a repair.
Yes
Go to Step 0050-21.
v Step 0050-21
One of the cables remaining in the system is defective. See System parts for the cable part numbers.
Replace the parts one at a time in the order listed. Follow these steps for each FRU replaced:
1. Rerun the diagnostics for the adapter.
2. If there is any failure, continue with the next FRU.
3. If there is no failure, go to Verify a repair.
v Step 0050-22
Follow these steps:
1. Go to Preparing for hot-plug SCSI device or cable deconfiguration.
2. Disconnect all cables attached to the adapter (except for the cable to the device from which you
boot to run diagnostics; you may want to temporarily move this device to another SCSI port while
you are trying to find the problem).
3. Go to After hot-plug SCSI device or cable deconfiguration.
4. If the Missing Options menu displays, select the option The resource has been turned off, but
should remain in the system configuration for all the devices that were disconnected.
5. Run the diagnostics on the adapter.
Did a failure occur?
No
Go to Step 0050-23.
Yes
Replace the adapter, then go to Verify a repair.
v Step 0050-23
One of the cables remaining in the system is defective. See System parts for the cable part numbers.
Replace the parts one at a time in the order listed. Follow these steps for each FRU replaced:
1. Rerun the diagnostics for the adapter.
2. If there is any failure, continue with the next FRU.
3. If there is no failure, go to Verify a repair.
Preparing for hot-plug SCSI device or cable deconfiguration
Use this procedure when you are preparing to unconfigure a hot-plug Small Computer System Interface
(SCSI) device or cable. This procedure will help determine if a SCSI device or SCSI device cable is
causing your system problem.
Disconnect all cables attached to the adapter (except for the cable to the device from which you boot to
run diagnostics; temporarily move this device cable to another SCSI port while you are trying to find
the problem).
1. Go to Running the online and stand-alone diagnostics and perform the prerequisite tasks described
in the Before you begin topic.
2. Determine which SCSI adapter you plan to remove the cables or devices from.
3. Adapter slots are numbered on the rear of the system unit. Record the slot number and location of
each adapter being removed.
4. Ensure that any processes or applications that might use the adapter are stopped.
5. Enter the system diagnostics by logging in as root user or as the CE login user. Type the diag
command on the AIX command line.
6. When the DIAGNOSTIC OPERATING INSTRUCTIONS menu displays, press Enter. The
FUNCTION SELECTION menu appears.
7. From the FUNCTION SELECTION menu, select Task Selection, and then press Enter.
8. From the Task Selection list, select Hot Plug Manager, and select PCI Hot Plug Manager.
9. From the PCI Hot Plug Manager menu, select Unconfigure a Device, and then press Enter.
10. Press F4 or ESC+4 to display the Device Names menu.
11. Select the adapter from which you are removing the cables or devices in the Device Names menu.
12. In the Keep Definition field, use the Tab key to answer Yes.
13. In the Unconfigure Child Devices field, use the Tab key to answer Yes, then press Enter. The ARE
YOU SURE screen is displayed.
14. Press Enter to verify the information. A successful deconfiguration is indicated by the OK message
displayed next to the Command field at the top of the screen.
15. Press F4 or ESC+4 twice to return to the Hot Plug Manager menu.
16. Select replace/remove PCI Hot Plug adapter.
17. Select the slot that has the adapter you want to remove the cables or devices from in the system.
18. Select Remove.
Note: A fast flashing amber LED located at the back of the machine indicates the slot that you
selected.
19. Press Enter. This places the adapter in the action state, meaning it is ready to be removed from the
system. (You do not need to remove the adapter, unless it makes removing the cables attached to it
easier).
After hot-plug SCSI device or cable deconfiguration
Use this procedure after you deconfigure or hot-plug a SCSI device to ensure that the replaced
component was successfully installed.
1. Press Enter, then continue to follow the screen instructions until you receive a message that the
replacement is successful. A successful replacement is indicated by the OK message displayed next to
the command field at the top of the screen.
2. Press the F3 or ESC 3 key to return to the PCI Hot-Plug Manager menu.
3. Press the F3 or ESC 3 key to return to the Hot-Plug Manager menu.
4. Press the F3 or ESC 3 key to return to the TASK selection list.
5. Select Log Repair Action.
6. Select the adapter you just removed the cables or devices from, then press Enter.
7. Press Commit (F7 or ESC 7), then press Enter.
8. Press the F10 or the ESC 0 key to exit diagnostics.
9. Type the diag -a command on the command line.
MAP 0054
Use this MAP to determine which FRUs may need to be replaced in order to solve a SCSI bus-related
problem on a PCI-X SCSI or PCI-X SCSI RAID adapter.
If you need additional information for failing part numbers, location codes, or removal and replacement
procedures, see Part locations and location codes. Select your machine type and model number to find
additional location codes, part numbers, or replacement procedures for your system.
Considerations
v Remove power from the system before connecting and disconnecting cables or devices, as appropriate,
to prevent hardware damage or erroneous diagnostic results.
v Note that some systems have SCSI and PCI-X bus interface logic integrated onto the system boards and
use a pluggable RAID enablement card (a non-PCI form factor card) for these SCSI and PCI-X buses.
An example of such a RAID enablement card is FC 5709. For these configurations, replacement of the
RAID enablement card is unlikely to solve a SCSI bus-related problem, because the SCSI bus interface
logic is on the system board.
v Some adapters provide two connectors, one internal and one external, for each SCSI bus. For this type
of adapter, it is not acceptable to use both connectors for the same SCSI bus at the same time. SCSI bus
problems are likely to occur if this is done. However, it is acceptable to use an internal connector for
one SCSI bus and an external connector for another SCSI bus. The internal and external connectors are
labeled to indicate which SCSI bus they correspond to.
Attention: RAID adapters should not be replaced when SCSI bus problems exist, except with assistance
from your service support structure. Because the adapter may contain non-volatile write cache data and
configuration data for the attached disk arrays, additional problems can be created by replacing an
adapter when SCSI bus problems exist.
Attention: Do not remove functioning disks in a disk array without assistance from your service
support structure. A disk array may become degraded or failed if functioning disks are removed, and
additional problems may be created.
Follow the steps in this MAP to isolate a PCI-X SCSI bus problem.
v Step 0054-1
Identify the SCSI bus on which the problem is occurring by examining the hardware error log. To
view the hardware error log, do the following:
1. Invoke diagnostics and select Task Selection on the Function Selection display.
2. Select Display Hardware Error Report.
3. Choose one of the following options:
– If the type of adapter is not known, select Display Hardware Errors for Any Resource.
– If the adapter is a PCI-X SCSI adapter, select Display Hardware Errors for PCI-X SCSI
Adapters.
– If the adapter is a PCI-X SCSI RAID adapter, select Display Hardware Errors for PCI-X SCSI
RAID Adapters.
4. Select the resource, or select All Resources if the resource is not known. If you had previously
selected Display Hardware Errors for Any Resource, then select All Resources.
5. On the Error Summary screen, look for an entry with a SRN corresponding to the problem which
sent you here, and select it.
Note: If multiple entries exist for the SRN, some entries might be old or the problem might have occurred
on multiple entities (adapters, disk arrays, or devices). Older entries can be ignored; however, this
MAP may need to be used multiple times if the same problem has occurred on multiple entities.
6. Select the hardware error log to view.
While viewing the hardware error log, under the Detail Data and SENSE DATA headings, identify the
first four bytes of the hexadecimal data (for example, nnnn nnnn nnnn nnnn ...). The four bytes
identified in the error log can be interpreted as:
00bb ssLL
where:
– bb, when not FF, identifies the adapter's SCSI bus
– ss, when not FF, identifies the SCSI ID of a device
– LL, when not FF, identifies the logical unit number (LUN) of a device
Go to Step 0054-2.
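The 00bb ssLL layout described above can be decoded with a short sketch like the following. The function name and the returned dictionary keys are hypothetical, chosen only for illustration; the field positions and the FF convention come from the description above. A field that contains FF is returned as None because it identifies nothing.

```python
def parse_sense_prefix(hex_text):
    """Split the first four bytes (00bb ssLL) into bus, SCSI ID, and LUN.

    hex_text: hexadecimal data as shown in the error log, for example
        "0001 05FF ..." (whitespace between groups is ignored).
    """
    digits = "".join(hex_text.split())[:8]
    if len(digits) < 8:
        raise ValueError("need at least four bytes of hexadecimal data")
    raw = bytes.fromhex(digits)  # raises ValueError on non-hex characters

    def field(value):
        # FF means the field does not identify anything.
        return None if value == 0xFF else value

    return {"bus": field(raw[1]), "scsi_id": field(raw[2]), "lun": field(raw[3])}


if __name__ == "__main__":
    print(parse_sense_prefix("0001 05FF 1234 5678"))
```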
v Step 0054-2
Are the last two bytes of the four bytes identified in Step 0054-1 equal to FFFF (for example, 00bb FFFF,
where bb identifies the adapter's SCSI bus)?
No
Go to Step 0054-4.
Yes
Go to Step 0054-3.
v Step 0054-3
While the error persists, replace the components of the failing SCSI bus in the following order:
1. Cable on bus bb (if present)
2. Adapter (if SCSI bus interface logic is on the adapter) or system board (if SCSI bus interface logic is
on the system board)
To replace a component, and see if the problem was corrected, do the following:
1. Follow the removal and replacement procedure for the component as previously described in this
step.
2. Run diagnostics in system verification mode on the adapter.
When the problem is resolved, go to Verify a repair.
v Step 0054-4
Are the last two bytes of the four bytes identified in Step 0054-1 equal to FF00 (for example, 00bb FF00,
where bb identifies the adapter's SCSI bus)?
No
Go to Step 0054-6.
Yes
Go to Step 0054-5.
v Step 0054-5
While the error persists, replace the components of the failing SCSI bus in the following order:
1. Cable on bus bb (if present)
2. Adapter (if SCSI bus interface logic is on the adapter) or system board (if the SCSI bus interface
logic is on the system board)
3. DASD backplane attached to bus bb (if present)
To replace a component, and see if the problem was corrected, do the following:
1. Follow the removal and replacement procedure for the component as previously described in this
step.
2. Run diagnostics in system verification mode on the adapter.
When the problem is resolved, go to Verify a repair.
v Step 0054-6
While the error persists, replace the components of the failing SCSI bus in the following order:
1. Device on bus bb with SCSI ID ss
2. Cable on bus bb (if present)
3. Adapter (if SCSI bus interface logic is on the adapter) or system board (if SCSI bus interface logic is
on the system board)
To replace a component and see if the problem was corrected, do the following:
1. Follow the removal and replacement procedure for the component as previously described in this
step.
2. Run diagnostics in system verification mode on the adapter.
When the problem is resolved, go to Verify a repair.
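The branching in this MAP, from the four bytes found in the error log to a FRU replacement order, can be summarized in a sketch such as the one below. The function name and the wording of the returned strings are hypothetical; the orderings follow the three replacement steps of this MAP. The sketch only lists candidates. It does not replace the Attention notices about RAID adapters and disk arrays.

```python
def replacement_order(hex_text):
    """Return the FRU replacement order for the first four bytes (00bb ssLL)."""
    digits = "".join(hex_text.split())[:8].upper()
    if len(digits) < 8:
        raise ValueError("need at least four bytes of hexadecimal data")
    bus = digits[2:4]        # bb: the adapter's SCSI bus
    last_two = digits[4:8]   # ssLL: SCSI ID and LUN
    cable = f"Cable on bus {bus} (if present)"
    adapter = "Adapter or system board (whichever holds the SCSI bus interface logic)"
    if last_two == "FFFF":   # last two bytes FFFF: cable, then adapter
        return [cable, adapter]
    if last_two == "FF00":   # last two bytes FF00: also consider the DASD backplane
        return [cable, adapter, f"DASD backplane attached to bus {bus} (if present)"]
    # Otherwise start with the device identified by the SCSI ID.
    return [f"Device on bus {bus} with SCSI ID {digits[4:6]}", cable, adapter]


if __name__ == "__main__":
    for fru in replacement_order("0001 0500"):
        print(fru)
```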
MAP 0070
Use this MAP when you receive an 888 sequence on the operator panel display or monitor.
If you need additional information for failing part numbers, location codes, or removal and replacement
procedures, see Part locations and location codes. Select your machine type and model number to find
additional location codes, part numbers, or replacement procedures for your system.
An 888 sequence in the operator panel display suggests that either a hardware or software problem has been
detected and a diagnostic message is ready to be read.
Note: The 888 will not necessarily flash on the operator panel display.
v Step 0070-1
Perform the following steps to record the information contained in the 888 sequence message.
1. Wait until the 888 sequence displays.
2. Record, in sequence, every code displayed after the 888. On systems with a 3-digit or a 4-digit
operator panel, you may need to press the system's "reset" button to view the additional digits after
the 888. Stop recording when the 888 digits reappear.
3. Go to Step 0070-2.
v Step 0070-2
Using the first code that you recorded, use the following list to determine the next step to use.
Type 102
Go to Step 0070-3.
Type 103
Go to Step 0070-4.
v Step 0070-3
A type 102 message is generated when a software or hardware error occurs during system execution of an
application. Use the following information to determine the content of the type 102 message.
The message readout sequence is:
102 = Message type
RRR = Crash code (the three-digit code that immediately follows the 102)
SSS = Dump status code (the three-digit code that immediately follows the Crash code)
Record the crash code and the dump status code from the message you recorded in Step 0070-1.
Are there additional codes following the dump status?
No
Go to Step 0070-5.
Yes
The message also has a type 103 message included in it. Go to Step 0070-4 to decipher the SRN
and field replaceable unit (FRU) information in the type 103 message.
Note: Type 102 messages have no associated SRNs.
v Step 0070-4
A type 103 message is generated by the hardware when certain types of hardware errors are detected.
Use the following steps and information you recorded in Step 0070-1 to determine the content of the
type 103 message.
The message readout sequence is:
103 = Message type
(x)xxx (y)yyy = SRN (where (x)xxx = the three- or four-digit code following the
103 and (y)yyy is the three- or four-digit code following the (x)xxx code).
1. Record the SRN and FRU location codes from the recorded message.
2. Find the SRN in the Service Request Number List and perform the indicated action.
Note: The only way to recover from an 888 type of halt is to turn off the system unit.
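The readout sequences for type 102 and type 103 messages can be illustrated with a small decoder. This is a sketch under stated assumptions: the function name and dictionary keys are hypothetical, the codes are assumed to have been recorded as strings in the order displayed after the 888, and the two SRN parts are joined with a hyphen to match the xxx-yyy notation used for SRNs elsewhere in this document. Codes after the fields described above are returned without interpretation.

```python
def decode_888(codes):
    """Decode the codes recorded after an 888 (hypothetical helper).

    codes: list of strings recorded in sequence, starting with the message type.
    """
    if not codes:
        raise ValueError("no codes recorded")
    kind = codes[0]
    if kind == "102":
        if len(codes) < 3:
            raise ValueError("type 102 needs a crash code and a dump status code")
        extra = list(codes[3:])
        return {
            "type": "102",
            "crash_code": codes[1],    # RRR
            "dump_status": codes[2],   # SSS
            "additional": extra,       # present when a type 103 message is included
            "next_step": "0070-4" if extra else "0070-5",
        }
    if kind == "103":
        if len(codes) < 3:
            raise ValueError("type 103 needs both parts of the SRN")
        return {
            "type": "103",
            "srn": f"{codes[1]}-{codes[2]}",  # hyphenated form is an assumption
            "additional": list(codes[3:]),    # remaining codes, left uninterpreted
        }
    raise ValueError(f"unexpected message type {kind!r}")


if __name__ == "__main__":
    # Placeholder codes for illustration only; real values come from the panel.
    print(decode_888(["102", "RRR", "SSS"]))
```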
v Step 0070-5
Perform the following steps:
1. Turn off the system unit power.
2. Turn on the system unit power, and load the online diagnostics in service mode.
3. Wait until one of the following conditions occurs:
– You are able to load the diagnostics to the point where the Diagnostic Mode Selection menu
displays.
– The system stops with an 888 sequence.
– The system appears to hang.
Is the Diagnostic Mode Selection menu displayed?
No
Go to Start of call.
Yes
Go to Step 0070-6.
v Step 0070-6
Run the All Resources options under Advanced Diagnostics in Problem Determination Mode.
Was an SRN reported by the diagnostics?
No
This is possibly a software-related 888 sequence. Follow the procedure for reporting a software
problem.
Yes
Record the SRN and its location code information. Find the SRN in the SRN Listing and do the
indicated action.
MAP 0220
Use this procedure to exchange hot-swappable field replaceable units (FRUs).
If you need additional information for failing part numbers, location codes, or removal and replacement
procedures, see Part locations and location codes. Select your machine type and model number to find
additional location codes, part numbers, or replacement procedures for your system.
Note: The FRU you want to hot plug might have a defect on it that can cause the hot-plug operation to
fail. If, after following the hot-plug procedure, you continue to get an error message that indicates that
the hot-plug operation has failed, schedule a time for deferred maintenance when the system containing
the FRU can be powered off. Then go to MAP 0210: General problem resolution, Step 0210-2 and answer
NO to the question Do you want to exchange this FRU as a hot-plug FRU?
Attention: If the FRU is a disk drive or an adapter, ask the system administrator to perform the steps
necessary to prepare the device for removal.
v Step 0220-1
1. If the system displayed a FRU part number on the screen, use that part number to exchange the
FRU.
If there is no FRU part number displayed on the screen, see the SRN listing. Record the SRN source
code and the failing function codes in the order listed.
2. Find the failing function codes in the FFC listing, and record the FRU part number and description
of each FRU.
3. To determine if the part is hot-swappable, see the System FRU locations procedure for the part.
Does this system unit support hot-swapping of the first FRU listed?
No
Go to “MAP 0210: General problem resolution” on page 325.
Yes
Go to Step 0220-2.
v Step 0220-2
Is the FRU a hot-swap power supply or fan?
No
Go to Step 0220-4.
Yes
Go to Step 0220-3.
v Step 0220-3
Note: See System FRU locations for the part.
1. Remove the old FRU.
2. Install the new FRU.
3. Enter the diag command.
Go to Step 0220-14.
v Step 0220-4
Is the FRU a hot-plug PCI adapter?
No
Go to Step 0220-5.
Yes
Go to Step 0220-12.
v Step 0220-5
Is the FRU a SCSI hot-plug device?
No
Go to Step 0220-11.
Yes
Go to Step 0220-6.
v Step 0220-6
Is the hot-plug drive located within a system unit?
No
Go to Step 0220-8.
Yes
Go to Step 0220-7.
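The questions in Steps 0220-1 through 0220-6 form a simple decision chain, sketched below. The function name, parameter names, and the fru_kind strings are hypothetical labels chosen for illustration; only the step numbers and the order of the questions come from this MAP.

```python
def hot_swap_route(supports_hot_swap, fru_kind, in_system_unit=False):
    """Return the step (or MAP) that Steps 0220-1 to 0220-6 route to.

    fru_kind: one of the hypothetical labels "power supply", "fan",
        "pci adapter", "scsi device", or any other string.
    """
    if not supports_hot_swap:                  # Step 0220-1
        return "MAP 0210"
    if fru_kind in ("power supply", "fan"):    # Step 0220-2
        return "0220-3"
    if fru_kind == "pci adapter":              # Step 0220-4
        return "0220-12"
    if fru_kind != "scsi device":              # Step 0220-5
        return "0220-11"
    # Step 0220-6: is the hot-plug drive located within a system unit?
    return "0220-7" if in_system_unit else "0220-8"


if __name__ == "__main__":
    print(hot_swap_route(True, "scsi device", in_system_unit=False))
```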
v Step 0220-7
See the removal and replacement procedures for your system in System FRU locations.
Go to Step 0220-13.
v Step 0220-8
Does the hot-plug drive's enclosure have procedures for removing and replacing SCSI disk drives?
No
Go to Step 0220-9.
Yes
If a hot-plug procedure exists, use that procedure to remove the old hot-plug SCSI disk drive
and replace it with a new hot-plug SCSI disk drive. Otherwise, if no hot-plug procedure exists,
use the power off procedure to remove the old SCSI disk drive and replace it with a new SCSI
disk drive. Go to Step 0220-13.
v Step 0220-9
1. Ask the customer to back up the data on the drive that you intend to replace onto another drive.
2. Verify that the disk drive is in the defined state. The amber LED on the hot-plug disk drive should
be off.
Is the hot-plug disk drive's amber LED unlit?
No
Ask the customer to remove the hot-plug disk drive from the operating system configuration
(refer the customer to the system management guide for more information).
Yes
Go to Step 0220-10.
v Step 0220-10
Using the hot-plug task service aid, replace the hot-plug drive using the following procedure:
1. Use the option List the SES Devices to show the configuration of the hot-plug slots. Identify the
slot number of the adapter for the FRU you want to replace.
2. Select the option REPLACE/REMOVE a Device Attached to an SES Device.
3. Select the slot that contains the SCSI hot-plug drive you want to replace. Press Enter. You will see
a fast flashing green light on the front of the hot-plug drive indicating that it is ready for removal.
Note: See the "Installing hardware" section of the information; locate the server information that
you are servicing and follow the tables to locate the correct removal or replacement procedure.
4. Remove the old hot-plug drive.
5. Install the new hot-plug drive. After the hot-plug drive is in place, press Enter.
6. Press Exit. Wait while configuration is done on the drive, until you see the "hot-plug task" on the
service aid menu.
Go to Step 0220-15.
v Step 0220-11
Attention: Do not remove functioning disks in a disk array attached to a PCI-X SCSI RAID controller
without assistance from your service support structure. A disk array may become degraded or failed if
functioning disks are removed and additional problems may be created. If you still need to remove a
RAID array disk attached to a PCI-X SCSI RAID controller, use the SCSI and SCSI RAID hot-plug
manager.
Using the hot-plug task service aid, replace the hot-plug drive using the hot plug RAID service aid:
Note: The drive you want to replace must be either a SPARE or FAILED drive. Otherwise, the drive
would not be listed as an "Identify and remove resources selection" within the RAID HOT-PLUG
DEVICES screen. In that case you must ask the customer to put the drive into FAILED state. Refer the
customer to the Operating System and Device management in the AIX library for more information. Ask
the customer to back up the data on the drive that you intend to replace.
1. Select the option RAID HOT-PLUG DEVICES within the HOT-PLUG TASK under DIAGNOSTIC
SERVICE AIDS.
2. Select the RAID adapter that is connected to the RAID array containing the RAID drive you want
to remove, then select COMMIT.
3. Choose the option IDENTIFY in the IDENTIFY AND REMOVE RESOURCES menu.
4. Select the physical disk which you want to remove from the RAID array and press Enter.
5. The disk will go into the IDENTIFY state, indicated by a flashing light on the drive. Verify that it is
the physical drive you want to remove, then press Enter.
6. At the IDENTIFY AND REMOVE RESOURCES menu, choose the option REMOVE and press Enter.
7. A list of the physical disks in the system that may be removed will be displayed. If the physical
disk you want to remove is listed, select it and press Enter. The physical disk will go into the
REMOVE state, as indicated by the LED on the drive. If the physical disk you want to remove is not
listed, it is not a SPARE or FAILED drive. Ask the customer to put the drive in the FAILED state before
you can proceed to remove it. Refer the customer to the Operating System and Device management in
the AIX library for more information.
8. See the service information for the system unit or enclosure that contains the physical drive for
removal and replacement procedures for the following substeps:
a. Remove the old hot-plug RAID drive.
b. Install the new hot-plug RAID drive. After the hot-plug drive is in place, press Enter. The drive
will exit the REMOVE state, and will go to the NORMAL state after you exit diagnostics.
Note: There are no elective tests to run on a RAID drive itself under diagnostics (the drives are
tested by the RAID adapter).
9. This completes the repair. Return the system to the customer. Ask the customer to add the physical
disk drive to the original configuration within the RAID. Refer the customer to the system management
guide for more information.
v Step 0220-12
1. Remove the old adapter FRU and replace it with the new adapter FRU. See the System FRU
locations procedure for the part.
2. Enter the diag command.
3. Go to the FUNCTION SELECTION menu, and select the option Advanced Diagnostics Routines.
4. When the DIAGNOSTIC MODE SELECTION menu displays, select the option System Verification.
5. Go to Step 0220-14.
v Step 0220-13
1. If not already running diagnostics, enter the diag command.
Note: If you are already running service mode diagnostics and have just performed the task
Configure Added/Replaced Devices (under the SCSI Hot Swap manager of the Hot Plug Task
service aid), you must use the F3 key to return to the DIAGNOSTIC OPERATING INSTRUCTIONS
menu before proceeding with the next step, or else the drive might not appear on the resource list.
2. Go to the FUNCTION SELECTION menu, and select the option Advanced Diagnostics Routines.
3. When the DIAGNOSTIC MODE SELECTION menu displays, select the option System Verification.
Does the hot-plug SCSI disk drive you just replaced appear on the resource list?
No
Verify that you have correctly followed the procedures for replacing hot-plug SCSI disk drives
in the system service information. If the disk drive still does not appear in the resource list, go
to “MAP 0210: General problem resolution” on page 325 to replace the resource that the
hot-plug SCSI disk drive is plugged into.
Yes
Go to Step 0220-14.
v Step 0220-14
Run the diagnostic test on the FRU you just replaced.
Did the diagnostics run with no trouble found?
No
Go to Step 0220-15.
Yes
Go to Verify a repair. Before returning the system to the customer, if a hot-plug disk has been
removed, ask the customer to add the hot-plug disk drive to the operating system
configuration. See Operating System and Device Management in the AIX library for more
information.
v Step 0220-15
1. Use the option Log Repair Action in the TASK SELECTION menu to update the AIX error log. If
the repair action was reseating a cable or adapter, select the resource associated with your repair
action. If it is not displayed on the resource list, select sysplanar0.
Note: On systems with a fault indicator LED, this changes the fault indicator LED from the fault
state to the normal state.
2. While in diagnostics, go to the FUNCTION SELECTION menu. Select the option Advanced
Diagnostics Routines.
3. When the DIAGNOSTIC MODE SELECTION menu displays, select the option System Verification.
Run the diagnostic test on the FRU you just replaced, or sysplanar0.
Did the diagnostics run with no trouble found?
No
Go to Step 0220-16.
Yes
If you changed the service processor or network settings, restore the settings to the value they
had prior to servicing the system. If you performed service on a PCI RAID subsystem
involving changing of the RAID adapter cache card or changing the configuration on RAID
disks, ask the customer to run "PCI SCSI disk array manager" using smitty to resolve the PCI
SCSI RAID adapter configuration. The following is an example of how the customer would
resolve the configuration:
1. At the AIX command line, type smitty pdam.
2. On the PCI SCSI Disk Array Manager screen, select RECOVERY OPTIONS.
3. If a previous configuration exists on the replacement adapter, this must be cleared. Select
Clear PCI SCSI RAID Adapter Configuration. Press F3.
4. On the Recovery Options screen, select RESOLVE PCI SCSI RAID ADAPTER
CONFIGURATION.
5. On the Resolve PCI SCSI RAID Adapter Configuration screen, select ACCEPT
CONFIGURATION on DRIVES.
6. On the PCI SCSI RAID Adapter selection menu, select the adapter that you changed.
7. On the next screen, press Enter.
8. On the "Are You Sure?" selection menu, press Enter to continue.
9. You receive an OK status message when the recovery is complete. If you get a Failed status
message, verify that you are performing recovery on the correct adapter, then repeat this
complete procedure. When you complete the recovery, exit smitty to return to the AIX
command line.
Go to Verify a repair.
v Step 0220-16
Does the original problem persist?
No
If a FRU was replaced, run the log repair action service aid under the online diagnostics for the
resource that was replaced. If the resource associated with your action is not displayed on the
resource list, select sysplanar0. If steps were taken to make the device ready for removal,
inform the system administrator of the steps required to return the system to the original state.
Go to Verify a repair.
Yes
Go to Step 0220-17.
v Step 0220-17
Have you exchanged all the FRUs that correspond to the failing function codes?
No
Go to Step 0220-18.
Yes
The SRN did not identify the failing FRU. Schedule a time to run diagnostics in service mode.
If the same SRN is reported in service mode, go to “MAP 0030” on page 343.
v Step 0220-18
Note: Before proceeding, remove the FRU you just replaced and install the original FRU in its place.
Does the system unit support hot-swapping of the next FRU listed?
No
Go to “MAP 0210: General problem resolution” on page 325.
Yes
The SRN did not identify the failing FRU. Schedule a time to run diagnostics in service mode.
If the same SRN is reported in service mode, go to Step 0220-14.
MAP 0230
Use this MAP to resolve problems reported by SRNs A00-xxx to A25-xxxx.
If you need additional information for failing part numbers, location codes, or removal and replacement
procedures, see Part locations and location codes. Select your machine type and model number to find
additional location codes, part numbers, or replacement procedures for your system.
Step 0230-1
1. The last character of the SRN is bit-encoded as follows:
8 4 2 1
| | | |
| | | Replace all FRUs listed
| | Hot-swap is supported
| Software or firmware could be the cause
Reserved
2. See the last character in the SRN. A 4, 5, 6, or 7 indicates a possible software or firmware problem.
Does the last character indicate a possible software or firmware problem?
No
Go to Step 0230-4.
Yes
Go to Step 0230-2.
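The bit encoding of the last SRN character can be illustrated with a short decoder. This is a minimal sketch, assuming the last character can be read as a single hexadecimal digit; the function name, the SrnFlags type, and the SRN A00-006 used below are hypothetical and serve only as an illustration.

```python
from typing import NamedTuple


class SrnFlags(NamedTuple):
    reserved: bool              # bit value 8
    software_or_firmware: bool  # bit value 4
    hot_swap_supported: bool    # bit value 2
    replace_all_frus: bool      # bit value 1


def decode_srn_flags(srn: str) -> SrnFlags:
    """Decode the bit-encoded last character of an SRN."""
    # Assumption: the last character is one hexadecimal digit.
    value = int(srn.strip()[-1], 16)
    return SrnFlags(
        reserved=bool(value & 8),
        software_or_firmware=bool(value & 4),
        hot_swap_supported=bool(value & 2),
        replace_all_frus=bool(value & 1),
    )


# Hypothetical SRN, for illustration only.
print(decode_srn_flags("A00-006"))
```

With this decoding, the characters 4, 5, 6, and 7 set the software or firmware flag, 2, 3, 6, and 7 set the hot-swap flag, and 1, 3, 5, and 7 set the replace-all flag, which matches the checks used in Steps 0230-1, 0230-9, 0230-10, and 0230-14.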
Step 0230-2
Ask the customer if any software or firmware has been installed recently.
Has any software or firmware been installed recently?
No
Go to Step 0230-4.
Yes
Go to Step 0230-3.
Step 0230-3
Check with your support center for any known problems with the new software or firmware.
Are there any known problems with the software or firmware?
No
Go to Step 0230-4.
Yes
Obtain and follow the procedure to correct the software problem. This completes the repair.
Step 0230-4
Were any FRUs or location codes reported with the SRN?
No
Go to Step 0230-5.
Yes
Go to Step 0230-9.
Step 0230-5
Run the diagnostics in problem determination mode on sysplanar0.
Were there any FRUs reported with the SRN?
No
Go to Step 0230-6.
Yes
Go to Step 0230-9.
Step 0230-6
Did the system display: "Previous Diagnostic Results - Do you want to review the previously displayed
error?"
No
Go to Step 0230-7.
Yes
You have a pending item in the error log for which there is no corresponding log repair action.
To see this error, select YES at the prompt. Information from the error log displays in order of last
event first. Record the error code, the FRU names, and the location codes of the FRUs. Go to Step
0230-7.
Step 0230-7
Were there any other SRNs reported that begin with A00 through A1F?
No
Go to Step 0230-8.
Yes
Go to Step 0230-1 and use the new SRN.
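The check in Step 0230-7 can be sketched as a range test on the first three characters of the SRN. This sketch assumes that "A00 to A1F" is meant as a hexadecimal range; that reading, the function name, and the sample SRNs are assumptions made for illustration.

```python
def srn_in_a00_to_a1f(srn: str) -> bool:
    """Return True if the SRN prefix lies in the range A00 through A1F."""
    prefix = srn.strip().upper()[:3]
    if len(prefix) != 3:
        return False
    try:
        # Assumption: the three-character prefix is compared as a hex number.
        value = int(prefix, 16)
    except ValueError:
        return False
    return 0xA00 <= value <= 0xA1F


# Hypothetical SRNs, for illustration only.
print(srn_in_a00_to_a1f("A1F-001"))
print(srn_in_a00_to_a1f("A20-001"))
```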
Step 0230-8
Perform the action specified in the following table.
System: 8202-E4B, 8202-E4C, 8202-E4D, 8205-E6B, 8205-E6C, 8205-E6D, 8231-E2B, 8231-E1C,
8231-E1D, 8231-E2C, 8231-E2D, 8248-L4T, 8268-E1D, 8408-E8D, 9109-RMD, 9119-FHB, 9125-F2C
Action: Power on the system to the hypervisor standby.
Note: Slow boot is not supported.
System: 8233-E8B, 8236-E8C
Action: Determine the level of firmware the system is running.
Is system firmware AL710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
System: 8412-EAD, 9117-MMB, 9117-MMC, 9117-MMD, 9179-MHB, 9179-MHC, or 9179-MHD
Action: Determine the level of firmware the system is running.
Is system firmware AM710_xxx installed?
Yes: Perform a slow boot. See Performing a slow boot.
No: Power on the system to the hypervisor standby.
If the system boots, run the diagnostics in problem determination mode on sysplanar0.
Were any new error codes or SRNs reported?
No
Call your support center.
Yes
Follow the procedure for the new error code or SRN.
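The model and firmware table in Step 0230-8 can be expressed as a lookup. This is a hedged sketch: the function and constant names are hypothetical, and it assumes the installed firmware level is available as a string that starts with the level name (for example AL710_ followed by a release suffix).

```python
HYPERVISOR_STANDBY = "Power on the system to the hypervisor standby."
SLOW_BOOT = "Perform a slow boot."

# Models for which slow boot is not supported.
NO_SLOW_BOOT_MODELS = {
    "8202-E4B", "8202-E4C", "8202-E4D", "8205-E6B", "8205-E6C", "8205-E6D",
    "8231-E2B", "8231-E1C", "8231-E1D", "8231-E2C", "8231-E2D", "8248-L4T",
    "8268-E1D", "8408-E8D", "9109-RMD", "9119-FHB", "9125-F2C",
}

# Models that use a slow boot only when the listed firmware level is installed.
SLOW_BOOT_FIRMWARE_PREFIX = {
    "8233-E8B": "AL710_", "8236-E8C": "AL710_",
    "8412-EAD": "AM710_", "9117-MMB": "AM710_", "9117-MMC": "AM710_",
    "9117-MMD": "AM710_", "9179-MHB": "AM710_", "9179-MHC": "AM710_",
    "9179-MHD": "AM710_",
}


def step_0230_8_action(model: str, firmware_level: str = "") -> str:
    """Return the Step 0230-8 action for a machine type and model."""
    if model in NO_SLOW_BOOT_MODELS:
        return HYPERVISOR_STANDBY
    prefix = SLOW_BOOT_FIRMWARE_PREFIX.get(model)
    if prefix is None:
        raise ValueError(f"model {model!r} is not listed in Step 0230-8")
    # Assumption: the firmware level string begins with the level name.
    return SLOW_BOOT if firmware_level.startswith(prefix) else HYPERVISOR_STANDBY


# The firmware suffixes below are hypothetical.
print(step_0230_8_action("8233-E8B", "AL710_099"))
print(step_0230_8_action("9117-MMB", "AM730_035"))
```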
Step 0230-9
1. Obtain the list of physical location codes and FRU numbers that were listed on the Problem Report
Screen. The list can be obtained by running the sysplanar0 diagnostics or using the task Display
Previous Diagnostic Results.
2. Record the physical location codes and FRU numbers.
3. See the last character in the SRN. A 2, 3, 6, or 7 indicates that hot-swap is possible.
Does the last character indicate that hot-swap is possible?
No
Go to Step 0230-10.
Yes
Go to Step 0230-14.
Step 0230-10
Note: If necessary, see Powering on and powering off the system for information about system shutdown
and powering the system on and off.
1. If the operating system is running, perform the operating system's shutdown procedure.
2. Turn off power to the system.
3. See the last character in the SRN. A 1, 3, 5, or 7 indicates that all FRUs listed on the Problem Report
Screen need to be replaced. For SRNs ending with any other character, exchange one FRU at a time,
in the order listed.
4. Turn on power to the system.
5. If you are running the AIX operating system, load the online diagnostics in service mode. See
Running the online diagnostics in service mode.
Note: If the Diagnostics Operating Instructions do not display or you are unable to select the option
Task Selection, check for loose cards, cables, and obvious problems. If you do not find a problem,
go to “MAP 0020” on page 336 and get a new SRN.
6. Wait until the Diagnostics Operating Instructions are displayed or the system appears to stop.
7. Press Enter.
8. Select Diagnostic Routines at the function selection menu.
9. Select System Verification.
10. If any missing options exist, particularly if they are related to the device that was replaced,
resolve the missing options before proceeding.
11. Select the option Task Selection.
12. Select the option Log Repair Action.
13. Log a repair action for each replaced resource.
14. If the resource associated with your repair action is not displayed on the resource list, select
sysplanar0.
15. Return to the Task Selection Menu.
16. If the FRU that was replaced was memory and the system is running as a full system partition,
select Run Exercisers and run the short exerciser on all the resources, otherwise proceed to Step
0230-15.
17. If you ran the exercisers in Step 0230-10, substep 16, return to the Task Selection menu.
18. Select Run Error Log Analysis and run analysis on all the resources.
Was a problem reported?
No
The repair is complete. Go to Verify a repair.
Yes
Go to Step 0230-11.
Step 0230-11
Is the problem the same as the original problem?
No
The symptom has changed. Check for loose cards, cables, and obvious problems. If you do not
find a problem, go to “MAP 0020” on page 336 and get a new SRN.
Yes
Go to Step 0230-12.
Step 0230-12
Look at the physical location codes and FRU part numbers you recorded.
Have you exchanged all the FRUs that were listed?
No
Go to Step 0230-13.
Yes
The SRN did not identify the failing FRU. Call your support person for assistance.
Step 0230-13
1. After performing a shutdown of the operating system, turn off power to the system.
2. Remove the new FRU and install the original FRU.
3. Exchange the next FRU in the list.
4. Turn on power to the system.
5. If you are running the AIX operating system, load the online diagnostics in service mode. See
Running the online diagnostics in service mode.
Note: If the Diagnostics Operating Instructions do not display or you are unable to select the option
Task Selection, check for loose cards, cables, and obvious problems. If you do not find a problem, go
to “MAP 0020” on page 336 and get a new SRN.
6. Wait until the Diagnostics Operating Instructions are displayed or the system appears to stop.
7. Press Enter.
8. Select Diagnostic Routines at the function selection menu.
9. Select System Verification.
10. If any missing options exist, particularly if they are related to the device that was replaced,
resolve the missing options before proceeding.
11. Select the option Task Selection.
12. Select the option Log Repair Action.
13. Log a repair action for each replaced resource.
14. If the resource associated with your action does not appear on the Resource List, select sysplanar0.
15. Return to the Task Selection Menu.
16. If the FRU that was replaced was memory and the system is running as a full system partition,
select Run Exercisers and run the short exerciser on all the resources, otherwise proceed to Step
0230-15.
17. If you ran the exercisers in Step 0230-13, substep 16, return to the Task Selection menu.
18. Select Run Error Log Analysis and run analysis on all the resources.
Was a problem reported?
No
The repair is complete. Go to Verify a repair.
Yes
Go to Step 0230-11.
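The exchange order used in Steps 0230-10 through 0230-13 can be sketched as a list of exchange rounds: one round that contains every listed FRU when the replace-all bit is set, otherwise one round per FRU in the order listed. This is a sketch only; the function name, the FRU names, and the sample SRNs are hypothetical, and the last SRN character is assumed to be a hexadecimal digit.

```python
from typing import List, Sequence


def exchange_rounds(frus: Sequence[str], srn: str) -> List[List[str]]:
    """Group the listed FRUs into exchange rounds based on the SRN."""
    # Bit value 1 of the last character means "replace all FRUs listed".
    replace_all = bool(int(srn.strip()[-1], 16) & 1)
    if replace_all:
        return [list(frus)]
    # Otherwise exchange one FRU at a time, in the order listed. Before each
    # later round, the previously installed new FRU is swapped back for the
    # original (Step 0230-13, substep 2).
    return [[fru] for fru in frus]


# Hypothetical FRU names and SRNs, for illustration only.
print(exchange_rounds(["adapter", "planar"], "A00-003"))
print(exchange_rounds(["adapter", "planar"], "A00-002"))
```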
Step 0230-14
The FRUs can be hot-swapped. If you do not want to use the hot-swap, go to Step 0230-10.
1. See the last character in the SRN. A 1, 3, 5, or 7 indicates that all FRUs listed on the Problem Report
Screen must be replaced. For SRNs ending with any other character, exchange one FRU at a time, in
the order listed.
2. If available, use the CE Login and enter the diag command.
Note: If CE Login is not available, have the system administrator enter superuser mode and then
enter the diag command.
3. After the Diagnostics Operating Instructions display, press Enter.
4. Select the option Task Selection.
5. Select the option Log Repair Action.
6. If the resource associated with your action is not displayed on the Resource List, select sysplanar0.
7. Log a repair action for each replaced resource.
8. Return to the Task Selection menu.
9. For systems running as a full system partition, select Run Exercisers and run the short exerciser on
all resources.
10. Use the option Log Repair Action in the Task Selection menu to update the error log. If the repair
action was reseating a cable or adapter, select the resource associated with your repair action. If it is
not displayed on the resource list, select sysplanar0.
Note: On systems with a Fault Indicator LED, this changes the Fault Indicator LED from the fault
state to the normal state.
Was a problem reported?
No
The repair is complete. Return the system to the customer.
Yes
Go to Step 0230-15.
Step 0230-15
Is the problem the same as the original problem?
No
The symptom has changed. Check for loose cards, cables, and obvious problems. If you do not
find a problem, go to “MAP 0020” on page 336 and get a new SRN.
Yes
Go to Step 0230-16.
Step 0230-16
Look at the physical location codes and FRU part numbers you recorded.
Have you exchanged all the FRUs that were listed?
No
Go to Step 0230-17.
Yes
The SRN did not identify the failing FRU. Call your support person for assistance.
Step 0230-17
1. Remove the new FRU and install the original FRU.
2. Exchange the next FRU in the list.
3. Return to the Task Selection Menu.
4. Select the option Log Repair Action.
5. Log a repair action for each replaced resource.
6. If the resource associated with your action is not displayed on the Resource List, select sysplanar0.
7. Return to the Task Selection Menu.
8. For systems running as a full system partition, select Run Exercisers and run the short exerciser on
all resources.
9. If you ran the exercisers in Step 0230-17, substep 8, return to the Task Selection menu.
10. Select Run Error Log Analysis and run analysis on all exchanged resources.
Was a problem reported?
No
The repair is complete. Return the system to the customer.
Yes
Go to Step 0230-15.
MAP 0235
Use this MAP to resolve problems reported by SRNs A11-560 to A11-580.
If you need additional information for failing part numbers, location codes, or removal and replacement
procedures, see Part locations and location codes. Select your machine type and model number to find
additional location codes, part numbers, or replacement procedures for your system.
Note: The following steps may require that the system be rebooted to invoke Array bit steering, so you
may wish to schedule deferred maintenance with the system administrator to arrange a convenient time
to reboot their system.
Step 0235-1
Was the SRN A11-560?
No
Go to Step 0235-3.
Yes
Go to Step 0235-2.
Step 0235-2
Log in as root or use CE Login. At the command line, type diag, then press Enter. Use the Log
Repair Action option in the TASK SELECTION menu to update the error log. Select sysplanar0.
Note: On systems with a fault indicator LED, this changes the fault indicator LED from the FAULT state to
the NORMAL state.
Were there any other errors on the resource reporting the array bit steering problem?
No
Go to Step 0235-4.
Yes
Resolve those errors before proceeding.
Step 0235-3
Log in as root or use CE Login. At the command line, type diag, then press Enter. Use the Log Repair
Action option in the TASK SELECTION menu to update the error log. Select procx, where x is the
processor number of the processor that reported the error.
Note: On systems with a fault indicator LED, this changes the fault indicator LED from the FAULT state to
the NORMAL state.
Were there any other errors on procx?
No
Go to Step 0235-4.
Yes
Resolve those errors before proceeding.
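Steps 0235-1 through 0235-3 differ only in the resource selected for the Log Repair Action. A minimal sketch of that choice follows; the function name and the sample processor number are hypothetical.

```python
from typing import Optional


def repair_log_resource(srn: str, processor_number: Optional[int] = None) -> str:
    """Return the resource to select for Log Repair Action in MAP 0235."""
    if srn.strip().upper() == "A11-560":
        # Step 0235-2: the repair action is logged against sysplanar0.
        return "sysplanar0"
    if processor_number is None:
        raise ValueError("the number of the reporting processor is required")
    # Step 0235-3: procx, where x is the number of the reporting processor.
    return f"proc{processor_number}"


print(repair_log_resource("A11-560"))
print(repair_log_resource("A11-570", processor_number=2))  # hypothetical processor number
```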
Step 0235-4
Schedule deferred maintenance with the customer. When it is possible, reboot the system to invoke array
bit steering.
Step 0235-5
After the system has been rebooted, log in as root or use CE Login. At the command line, run diagnostics
in problem determination mode to determine if the array bit steering was able to correct the problem.
If diagnostics are not run (for instance, if the system returns to the Resource Selection menu after running
diagnostics in problem determination mode) or there is no problem on the resource that originally
reported the problem, then array bit steering was able to correct the problem. Go to Verify a repair.
MAP 0260
Use this MAP when the system unit hangs while configuring a resource.
If you need additional information for failing part numbers, location codes, or removal and replacement
procedures, see Part locations and location codes. Select your machine type and model number to find
additional location codes, part numbers, or replacement procedures for your system.
v Step 0260-1
The last three or four digits of the SRN following the dash (-) match a failing function code number.
Look at the System parts and find the failing function code that matches the last three or four digits of
your SRN, following the dash. Record the FRU part number and description (use the first FRU part
listed when multiple FRUs are listed).
The location information or device name is displayed on the operator panel.
Do you have a location code displayed?
No
Go to Step 0260-4.
Yes
Go to Step 0260-2.
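Extracting the failing function code from the SRN, as described in Step 0260-1, can be sketched as follows. The function name and the sample SRN are hypothetical, and the sketch assumes the characters after the dash may be letters as well as digits.

```python
def failing_function_code(srn: str) -> str:
    """Return the three or four characters that follow the last dash of an SRN."""
    _, dash, tail = srn.strip().rpartition("-")
    if not dash or len(tail) not in (3, 4) or not tail.isalnum():
        raise ValueError(f"cannot read a failing function code from {srn!r}")
    return tail


# Hypothetical SRN, for illustration only.
print(failing_function_code("101-2C4"))
```

The returned value is the code to look up when recording the FRU part number and description.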
v Step 0260-2
Are there any FRUs attached to the device described by the location code?
No
Go to Step 0260-6.
Yes
Go to Step 0260-3.
v Step 0260-3
Remove the FRUs of this kind that are attached to the device described by the location code, one at a
time. Note whether the system still hangs after each FRU is removed. Repeat this step until the system
no longer hangs, or until all attached FRUs have been removed from the adapter or device.
Has the symptom changed?
No
Go to Step 0260-4.
Yes
Use the location code of the attached device that you removed when the symptom changed,
and go to Step 0260-6.
v Step 0260-4
Does your system unit contain only one of this kind of FRU?
No
Go to Step 0260-5.
Yes
Go to Step 0260-6.
v Step 0260-5
One of the FRUs of this kind is defective.
Remove this kind of FRU one at a time. Test the system unit after each FRU is removed. Stop when the
test completes successfully or when you have removed all of the FRUs of this kind.
Were you able to identify a failing FRU?
No
Go to “PFW1540: Problem isolation procedures” on page 407.
Yes
Go to Step 0260-6.
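The one-at-a-time removal in Steps 0260-3 and 0260-5 amounts to a simple search loop. The sketch below is illustrative: hang_persists stands in for the manual test of the system unit after each removal, and the names used here are hypothetical.

```python
from typing import Callable, List, Optional, Sequence


def find_failing_fru(
    frus: Sequence[str],
    hang_persists: Callable[[List[str]], bool],
) -> Optional[str]:
    """Remove FRUs one at a time and report the one whose removal ends the hang.

    hang_persists receives the FRUs still installed and returns True if the
    system unit still hangs. Returns None if every FRU was removed and the
    hang never cleared.
    """
    installed = list(frus)
    for fru in frus:
        installed.remove(fru)             # remove one FRU
        if not hang_persists(installed):  # test the system unit
            return fru                    # stop when the test succeeds
    return None


# Illustration only: pretend the FRU named "card2" causes the hang.
print(find_failing_fru(["card1", "card2", "card3"], lambda left: "card2" in left))
```

A result of None corresponds to the No branch of Step 0260-5.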
v Step 0260-6
1. Turn off the system unit.
2. Exchange the FRU identified by the location code or Step 0260-5.
Is this system capable of running online diagnostics in service mode?
No
Go to Step 0260-7.
Yes
Go to Step 0260-8.
v Step 0260-7
1. Turn on the system unit.
2. Load the standalone diagnostics. See Running the standalone diagnostics from CD-ROM.
3. Wait until the Diagnostic Operating Instructions display or the system appears to have stopped.
Are the DIAGNOSTIC OPERATING INSTRUCTIONS displayed?
No
Go to Step 0260-9.
Yes
Go to Verify a repair.
v Step 0260-8
1. Turn on the system unit.
2. Load the online diagnostics in service mode. See Running the online diagnostics in service mode.
3. Wait until the Diagnostic Operating Instructions display or the system appears to have stopped.
Are the DIAGNOSTIC OPERATING INSTRUCTIONS displayed?
No
Go to Step 0260-9.
Yes
Go to Verify a repair.
v Step 0260-9
Look at the operator panel display.
Is the number displayed the same as the last three or four digits after the dash (-) of your SRN?
No
The symptom changed. Check for loose cards, cables, and obvious problems. If you do not find
a problem, go to “MAP 0020” on page 336 and get a new SRN.
Yes
Go to Step 0260-10.
v Step 0260-10
Was the FRU you exchanged an adapter or a backplane?
No
Go to Step 0260-11.
Yes
Go to “PFW1540: Problem isolation procedures” on page 407.
v Step 0260-11
Was the FRU you exchanged a device?
No
Go to “PFW1540: Problem isolation procedures” on page 407.
Yes
Go to Step 0260-12.
v Step 0260-12
The adapter for the device may be causing the problem.
1. Turn off the system unit.
2. Exchange the adapter for the device.
Note: If the AIX operating system is not used on the system, start diagnostics from an alternate
source.
3. Turn on the system unit. If c31 is displayed, follow the instructions to select a console display.
4. Load the standalone diagnostics. See Running the standalone diagnostics from CD-ROM.
5. Wait until the DIAGNOSTIC OPERATING INSTRUCTIONS display or the system appears to have
stopped.
Are the DIAGNOSTIC OPERATING INSTRUCTIONS displayed?
No
Go to “PFW1540: Problem isolation procedures” on page 407.
Yes
Go to Verify a repair.
MAP 0270
Use this MAP to resolve SCSI RAID adapter, cache, or drive problems.
If you need additional information for failing part numbers, location codes, or removal and replacement
procedures, see Part locations and location codes. Select your machine type and model number to find
additional location codes, part numbers, or replacement procedures for your system.
Notes:
1. This MAP assumes that the RAID adapter and drive microcode is at the correct level.
2. This MAP applies only to PCI, not PCI-X, RAID adapters.
Attention: If the FRU is a disk drive or an adapter, ask the system administrator to perform the steps
necessary to prepare the device for removal.
v Step 0270-1
1. If the system displayed a FRU part number on the screen, use that part number. If there is no FRU
part number displayed on the screen, see the SRN listing. Record the SRN source code and the
failing function codes in the order listed.
2. Find the failing function codes in the FFC listing, and record the FRU part number and description
of each FRU.
Go to Step 0270-2.
v Step 0270-2
Is the FRU a RAID drive?
No
Go to Step 0270-6.
Yes
Go to Step 0270-3.
v Step 0270-3
If the RAID drive you want to replace is not already in the failed state, then ask the customer to run
the PCI SCSI Disk Array Manager using smit to fail the drive that you want to replace. An example of
this procedure is:
1. Log in as root user.
2. Type smit pdam.
3. Select Fail a Drive in a PCI SCSI Disk Array.
4. Select the appropriate disk array by placing the cursor over that array and press Enter.
5. Select the appropriate drive to fail based on the Channel and ID indicated in diagnostics. The Fail a
Drive screen will appear.
6. Verify that you are failing the correct drive by looking at the Channel ID row. Press Enter when
verified correct. Press Enter again.
7. Press F10 and type smit pdam.
8. Select Change/Show PCI SCSI RAID Drive Status > Remove a Failed Drive.
9. Select the drive that just failed.
Go to Step 0270-4.
v Step 0270-4
Replace the RAID drive using the RAID HOT PLUG DEVICES service aid:
Note: The drive you want to replace must be either a SPARE or FAILED drive. Otherwise, the drive
would not be listed as an IDENTIFY AND REMOVE RESOURCES selection within the RAID HOT
PLUG DEVICES screen. In that case you must ask the customer to put the drive into FAILED state. For
information about putting the drive in a FAILED state, refer the customer to the SAS RAID controllers
for AIX or SAS RAID controllers for Linux topic.
1. Select the option RAID HOT PLUG DEVICES within the HOT PLUG TASK under DIAGNOSTIC
SERVICE AIDS.
2. Select the RAID adapter that is connected to the RAID array containing the RAID drive you want
to remove, then select COMMIT.
3. Choose the option IDENTIFY in the IDENTIFY AND REMOVE RESOURCES menu.
4. Select the physical disk which you want to remove from the RAID array and press Enter. The disk
will go into the IDENTIFY state, indicated by a flashing light on the drive.
5. Verify that it is the physical drive you want to remove, then press Enter.
6. At the IDENTIFY AND REMOVE RESOURCES menu, choose the option REMOVE and press Enter.
A list of the physical disks in the system that may be removed will be displayed.
7. If the physical disk you want to remove is listed, select it and press Enter. The physical disk will go
into the REMOVE state, as indicated by the LED on the drive. If the physical disk you want to
remove is not listed, it is not a SPARE or FAILED drive. Ask the customer to put the drive in the
FAILED state before you can proceed to remove it. For information about putting the drive in a
FAILED state, refer the customer to the SAS RAID controllers for AIX or SAS RAID controllers for
Linux topic.
8. See the service information for the system unit or enclosure that contains the physical drive for
removal and replacement procedures for the following substeps:
a. Remove the old hot-swap RAID drive.
b. Install the new hot-swap RAID drive. After the hot-swap drive is in place, press Enter. The
drive will exit the REMOVE state, and will go to the NORMAL state after you exit diagnostics.
Note: There are no elective tests to run on a RAID drive itself under diagnostics (the drives are
tested by the RAID adapter).
Go to Step 0270-5.
v Step 0270-5
If the RAID did not begin reconstructing automatically, perform the following steps.
Adding a Disk to the RAID array and Reconstructing:
Ask the customer to run the PCI SCSI Disk Array Manager using smit. An example of this procedure
is:
1. Log in as root user.
2. Type smit pdam.
3. Select Change/Show PCI SCSI RAID Drive Status.
4. Select Add a Spare Drive.
5. Select the appropriate adapter.
6. Select the channel and ID of the drive that was replaced.
7. Press Enter when verified.
8. Press F3 until you return to the Change/Show PCI SCSI