VAXft Systems Model 810 Service Information

Order Number: EK-VXFTA-SI.A01
June 1993
This manual is intended for use by trained personnel responsible for
maintaining VAXft Model 810 systems.
Digital Equipment Corporation
The information in this document is subject to change without notice and should not be construed
as a commitment by Digital Equipment Corporation. Digital Equipment Corporation assumes no
responsibility for any errors that may appear in this document.
No responsibility is assumed for the use or reliability of software on equipment that is not supplied
by Digital Equipment Corporation or its affiliated companies.
Restricted Rights: Use, duplication, or disclosure by the U.S. Government is subject to restrictions
as set forth in subparagraph (c)(1)(ii) of the Rights in Technical Data and Computer Software
clause at DFARS 252.227-7013.
© Digital Equipment Corporation June 1993.
All Rights Reserved.
Printed in Canada
The following are trademarks of Digital Equipment Corporation: CompacTape, OpenVMS, ThinWire,
TK, UETP, VAX, VAXft, VMS, VAXELN, and the DIGITAL logo.
FCC NOTICE: This equipment generates, uses, and may emit radio frequency energy. It has been
tested and found to comply with the limits for a Class A computing device pursuant to Subpart J
of Part 15 of FCC rules of operation in a commercial environment. This equipment, when operated
in a residential area, may cause interference to radio/TV communications. In such event the user
(owner), at his own expense, may be required to take corrective measures.
This document is available on CDROM.
Documentation Map
[Figure MR-6230-RA: map of the VAXft systems documentation set. Hardware Information (VAXft Systems): Overview Information; Configuration Guide for Models 110, 410, 610, 612; Configuring the Model 810; Site Prep and Installation Guide; Site Prep Information; Installation Information; Owner's Manual; Operating Information; Maintenance Guide; Service Information. Operating System (VMS): Cover Letter; Release Notes; VMS Upgrade and Installation Manual; VMS Upgrade and Installation Supplement: VAXft Systems; Using Factory-Installed Software with VAXft Systems; VMS Volume Shadowing Manual; VAX Wide Area Network Device Drivers. Software Information (VAXft System Services): Software Product Description; Before You Install Letter; Release Notes; Manager's Guide; Reference Manual; Online Help. Legend symbols identify each item as a book, tape, Bookreader document, online document, or letter; an asterisk (*) marks items that must be ordered separately.]
Contents
1 Cabinet and Component Descriptions
1.1 In This Chapter
1.2 CPU and Expansion Cabinets
1.3 Zone Control Panel
1.4 Power Modules
1.5 Domestic and International Power Distribution Boxes
2 Console Operations
2.1 In This Chapter
2.2 Console Description
2.3 Console Operating Modes
2.3.1 Entering CIO Mode
2.3.2 Exiting CIO Mode
2.4 Console Control Characters
2.5 Console Command Language Syntax
2.6 Bootstrap Procedures
2.7 Entering CIO Mode
2.8 CIO Mode Console Commands
2.8.1 BOOT
2.8.2 CLEAR
2.8.3 CONTINUE
2.8.4 DEPOSIT
2.8.5 DUP
2.8.6 EXAMINE
2.8.7 FIND
2.8.8 HELP
2.8.9 INITIALIZE
2.8.10 MOVE
2.8.11 MATCH_ZONES
2.8.12 REPEAT
2.8.13 SET
2.8.13.1 SET BOOT
2.8.14 SHOW
2.8.15 START
2.8.16 TEST
2.8.17 X(transfer)
2.8.18 Z
2.8.19 !(comment)
3 System Maintenance
3.1 In This Chapter
3.2 Maintenance Strategy
3.3 Operating Rules and Cautions
3.4 General Troubleshooting Procedure
3.5 Module Fault LEDs
3.6 Power System Overview
3.7 Power System Maintenance
3.8 Device Status and Fault Indicators
3.8.1 RF35 Disk Drawer
3.8.2 SF35 Storage Array
3.8.3 SF73 Storage Array
3.8.4 TF85C Tape Drive
3.8.5 TF857 Tape Loader
3.8.5.1 Power-On Process
3.8.5.2 Operator Control Panel Controls and Indicators
3.9 ROM-Based Diagnostics
3.9.1 TEST
3.9.2 Z
3.9.3 CPU ROM-Based Diagnostics
3.9.4 I/O ROM-Based Diagnostics
4 Error Handling and Analysis
4.1 In This Chapter
4.2 Error Handling Services Overview
4.2.1 Basic Error Isolation and Handling
4.2.2 EHS Structure
4.2.3 System Operating Modes
4.2.4 Error Types
4.2.5 VAXELN Error Handling
4.3 Field Replaceable Units (FRUs)
4.3.1 Isolation
4.3.2 Deconfiguration
4.3.2.1 I/O Attachment Module
4.3.2.2 CPU Module and Memory
4.3.2.3 I/O Expansion Module
4.3.2.4 Interface Module
4.3.2.5 Zone
4.3.2.6 Cross-Link Cable
4.3.3 Application of Thresholds
4.4 OpenVMS Error Log
4.4.1 Fault Summary
4.4.2 FRU Information
4.4.3 Deconfiguration Information
4.4.4 Threshold Information
4.4.5 Fault Data
4.4.5.1 System Registers
4.4.5.2 End Actions
4.4.5.3 End Action Timeouts
4.4.5.4 VAXELN Detected Errors
4.4.5.5 Software Detected Errors
4.4.5.6 Unsynchable Events
4.5 Module NVRAM Status and LED Indicators
4.6 FTSS Event Reporting Interface
4.6.1 Event Reporting Interface Routines
4.6.2 Error Event Messages
4.6.2.1 Deconfiguration Messages
4.7 Firmware Interfaces
4.7.1 System Console and Diagnostics
4.7.1.1 System Resets
4.7.1.2 CCA Fields
4.7.2 I/O Expansion Module Console and Diagnostics
4.8 Firmware and OpenVMS Interface Data Structures
4.8.1 Console Communications Area
4.8.1.1 Duplex Compatibility Test
4.8.1.2 Dispatch Block Description
4.8.1.3 Boot Parameter Block Description
4.8.2 Device Configuration Block
4.8.2.1 Sub-Device Configuration Blocks
4.8.2.2 CPU Module SubDCB
4.8.3 Page Frame Number Bitmap
4.9 Error Log Analysis
4.9.1 CPU/MEM Fault Error Log Entry
4.9.2 CPU/MEM Fault End Action Error Log Entry
4.9.3 CPU or Zone Unsynchable Error Log Entry
5 FRU Removal and Replacement Procedures
5.1 In This Chapter
5.2 Field Replaceable Unit List
5.3 Before You Begin
5.3.1 Handling FRUs
5.3.2 Shutting Down a Zone
5.3.3 Verifying Zone Shutdown
5.3.4 Starting Up a Zone
5.3.5 Accessing the FRUs
5.4 FRU Removal and Replacement
5.4.1 CPU and ATM Modules
5.4.2 SIMMs
5.4.3 MMBs
5.4.4 Fan and FCSB
5.4.5 RF35 Disk Drive Removal and Replacement
5.4.6 DSSI Disk Drawer
5.4.7 Zone Control Panel
5.4.8 FEU, 3.3V Regulator, 5V Regulator, PSC Modules
5.4.9 Cross-Link Assembly
5.4.10 Console Extender Module
5.4.11 DSSI Extender Module
5.4.12 CAMP Module
5.4.13 DSSI Interface Module (DIM)
5.4.14 Ethernet Interface Module (EIM)
5.4.15 DSSI Cable Removal and Replacement
5.4.16 TF85C-BA Tape Drive
5.4.17 SF73 Disk Drive
5.4.18 SF35 Storage Array
5.4.19 TF857-CA Tape Drive
5.4.20 Power Distribution Box
6 Managing Integrated Storage Elements
6.1 In This Chapter
6.2 Loading the DUP Driver
6.3 Using VMS DUP
6.4 Using the Server Setup Switch
6.5 Assigning DSSI Unit Numbers
6.6 Warm Swapping an ISE
6.6.1 Setting ISE Parameters
6.6.2 ISE Removal
6.6.3 ISE Replacement
6.6.4 Installing an ISE in a Running System
A Miscellaneous System Information
A.1 In This Appendix
A.2 Processor Halt Codes
A.3 Console Halt Codes
A.4 Error Register Descriptions
A.4.1 System Fault (SYSFLT) Register
A.4.2 System Error Address (SYSADR) Register
A.4.3 DMA Error Address (DMAADR) Register
A.4.4 Reset Reason 0013 Fault Analysis
A.5 I/O Physical Address Space
A.6 System Control Block Description
B ISE Parameter Worksheets
B.1 In This Appendix
B.2 Individual ISE Parameter Worksheets
B.3 ISE Zone Parameter Worksheets
Index
Examples
2–1 Indirect Addressing
5–1 How to Shut Down a Zone
5–2 How to Verify Zone Shutdown
Figures
1–1 Cabinet Layout, Front View
1–2 Cabinet Layout, Rear View
1–3 Zone Control Panel
1–4 Power Module Controls and Indicators
1–5 Domestic Power Distribution Box
1–6 International Power Distribution Box
2–1 System Components
2–2 Console Operating Modes
2–3 Boot Procedure
3–1 Module Fault LEDs
3–2 Power System Block Diagram (1 of 2)
3–3 Power System Block Diagram (2 of 2)
3–4 Power Module Controls and Indicators
3–5 RF35 Disk Drawer Controls and Indicators
3–6 SF35 Operator Control Panel
3–7 SF35 Rear Panel Fault Indicator
3–8 Location of SF73 Storage Array LEDs and Switchpacks
3–9 Rear of the SF73 Storage Array
3–10 TF85C Cartridge Tape Drive
3–11 TF857 Operator Control Panel
4–1 Hardware Error Handling Flowchart
4–2 EHS Architectural Position
4–3 OpenVMS Error Log Format
4–4 Fault Summary Block
4–5 FRU Information Block
4–6 Deconfiguration Information Block
4–7 Threshold Information Block
4–8 Fault Data Block
4–9 End Action Timeout Block
4–10 VAXELN Detected Error Block
4–11 Software Detected Error Block
4–12 Unsynchable Event Block
4–13 Firmware and OpenVMS Data Structure Memory Map
4–14 Dispatch Block Structure
4–15 SubDCB Links to DCB
5–1 Latches
5–2 CPU Module and ATM Module Locations
5–3 SIMM Locations
5–4 MMB Locations
5–5 Fan Location
5–6 FCSB Location
5–7 RF35 Disk Drive Location
5–8 Zone Control Panel
5–9 FEU, 3.3V Regulator, 5V Regulator, and PSC Locations
5–10 Cross-Link Assembly
5–11 Module Extraction Tool
5–12 Console Extender Module Location
5–13 Console Extender Module Layout
5–14 DSSI Extender Module Locations
5–15 CAMP Module Locations
5–16 DIM Location
5–17 DIM Removal
5–18 EIM Removal
5–19 TF85C-BA Tape Drive, Rear View
5–20 TF85C-BA Tape Drive Removal
5–21 SF73 Disk Drive, Rear View
5–22 SF73 Disk Drive, Front View
5–23 SF73 Disk Drive Enclosure Removal
5–24 SF73 Disk ISE Removal
5–25 SF35 Storage Array, Rear View
5–26 SF35 Storage Array, Front View
5–27 SF35 Disk ISE Removal
5–28 TF857-CA Tape Drive, Rear View
5–29 Loosening the Shipping Restraint Screw
5–30 Setting the TF857 Tape Loader Node ID
5–31 Domestic Power Distribution Box
5–32 International Power Distribution Box
6–1 VAXft Model 810 Front View
6–2 VAXft Model 810 Rear View
A–1 System Fault Register
A–2 JXD System Error Address Register
A–3 JXD DMA Error Address Register
A–4 I/O Physical Address Space
A–5 System Control Block Base Register
A–6 System Control Block Vector Format
Tables
1–1 Key to Figure 1–1, Cabinet Layout, Front View
1–2 Key to Figure 1–2, Cabinet Layout, Rear View
1–3 Key to Figure 1–3, Zone Control Panel
1–4 Key to Figure 1–4, Power Module Controls and Indicators
1–5 Key to Figure 1–5, Domestic Power Distribution Box
1–6 Key to Figure 1–6, International Power Distribution Box
2–1 Key to Figure 2–1, System Components
2–2 Function of the Console Components
2–3 Console Control Characters and Function Keys
2–4 Console Command Language Syntax
2–5 Qualifiers for BOOT
2–6 VMB Program /R5:<flag> Values
2–7 Qualifier for CLEAR
2–8 Qualifiers for DEPOSIT
2–9 Address-Spec Symbolic Addresses
2–10 Qualifiers for DUP
2–11 Qualifiers for EXAMINE
2–12 Address-Spec Symbolic Addresses
2–13 Qualifiers for FIND
2–14 INITIALIZE Steps
2–15 SET Variables and Values
2–16 SHOW Variables
2–17 Qualifiers for TEST Selection
2–18 Qualifiers for TEST Control
2–19 Qualifier for Z
3–1 Before Stopping a Zone
3–2 After a Zone is Repaired
3–3 Before Leaving the Site
3–4 Cautions
3–5 General Troubleshooting Procedure
3–6 Key to Figure 3–1, Module Fault LEDs
3–7 Power System Functional Summary
3–8 System DC Voltage Summary
3–9 Key to Figure 3–4, Power Module Controls and Indicators
3–10 Fan, LDC, Temperature Error Codes
3–11 FEU Error Codes
3–12 PSC Error Codes
3–13 12 V DC to DC Converter Error Codes
3–14 2 V DC to DC Converter Error Codes
3–15 3 V DC to DC Converter Error Codes
3–16 5 V DC to DC Converter Error Codes
3–17 12 V DC to DC Converter Error Codes
3–18 RF35 Disk Drawer Controls and Indicators
3–19 SF35 Operator Control Panel Description
3–20 SF35 Rear Panel Controls and Indicator
3–21 SF73 Front Panel Controls and Indicators
3–22 TF85C Tape Drive Problems
3–23 TF85C Cartridge Tape Drive Indicators
3–24 TF857 OCP Controls and Indicators
3–25 Qualifiers for TEST Selection
3–26 Qualifiers for TEST Control
3–27 Qualifier for Z
3–28 CPU ROM-Based Diagnostic Descriptions
3–29 I/O ROM-Based Diagnostic Descriptions
4–1 EHS Error Notification
4–2 Error Handling Flowchart Definitions
4–3 System Operating Modes
4–4 Error Types
4–5 VAXELN Error Classes
4–6 System FRUs
4–7 ATM Deconfiguration Actions
4–8 CPU Deconfiguration Actions
4–9 I/O Expansion Module Deconfiguration Actions
4–10 Interface Module Deconfiguration Actions
4–11 Zone Deconfiguration Actions
4–12 Cross-Link Cable Deconfiguration Actions
4–13 FRU Thresholds
4–14 OpenVMS Error Log Sizes
4–15 Fault Summary Block Entry Descriptions
4–16 FRU Information Block Entry Descriptions
4–17 Deconfiguration Information Block Entry Descriptions
4–18 Threshold Information Block Entry Descriptions
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
3–2
3–2
3–3
3–4
3–4
3–7
3–10
3–12
3–13
3–15
3–16
3–16
3–18
3–18
3–19
3–19
3–19
3–20
3–22
3–23
3–24
3–26
3–27
3–28
3–30
3–30
3–31
3–31
3–34
4–2
4–3
4–4
4–5
4–11
4–12
4–13
4–14
4–15
4–15
4–16
4–16
4–17
4–19
4–20
4–23
4–24
4–26
xi
4–19
4–20
4–21
4–22
4–23
4–24
4–25
4–26
4–27
4–28
4–29
4–30
4–31
4–32
4–33
4–34
4–35
4–36
4–37
4–38
4–39
5–1
5–2
5–3
5–4
5–5
5–6
5–7
5–8
5–9
5–10
5–11
5–12
5–13
5–14
5–15
5–16
5–17
5–18
5–19
5–20
5–21
5–22
6–1
6–2
6–3
6–4
xii
System Register Entry Descriptions . . . . . . . . . . . . . . . . . . . . . . . .
End Actions Register Descriptions . . . . . . . . . . . . . . . . . . . . . . . . .
End Action Timeout Block Entry Description . . . . . . . . . . . . . . . .
VAXELN Detected Error Block Entry Descriptions . . . . . . . . . . . .
Software Detected Error Block Entry Descriptions . . . . . . . . . . . .
Unsynchable Event Block Entry Descriptions . . . . . . . . . . . . . . . .
Module ID NVRAM/DCB Status Codes . . . . . . . . . . . . . . . . . . . . .
System Reset Action Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
System Reset Reason Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Error Handler Reset Reasons . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
I/O Reset Action Code Description . . . . . . . . . . . . . . . . . . . . . . . . .
I/O Reset Reason Code Descriptions . . . . . . . . . . . . . . . . . . . . . . .
CCA Component Descriptions . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Duplex Compatibility Test Failure Codes . . . . . . . . . . . . . . . . . . . .
Dispatch Block Components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
BPB Components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
BPB Entry Components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
DCB Components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
DCB Entry Components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
CPU SubDCB Components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
CPU SubDCB Entry Components . . . . . . . . . . . . . . . . . . . . . . . . .
Model 810 FRUs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Handling FRUs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
CPU Module and ATM Module Removal Procedure . . . . . . . . . . . .
SIMM Removal Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
MMB Removal Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Fan and FCSB Removal Procedure . . . . . . . . . . . . . . . . . . . . . . . .
RF35 Disk Drive Removal Procedure . . . . . . . . . . . . . . . . . . . . . . .
DSSI Disk Drawer Removal Procedure . . . . . . . . . . . . . . . . . . . . .
Zone Control Panel Removal Procedure . . . . . . . . . . . . . . . . . . . . .
FEU, 3.3V Regulator, 5V Regulator, and PSC Removal Procedure .
Cross-Link Assembly Removal Procedure . . . . . . . . . . . . . . . . . . .
Console Extender Module Removal Procedure . . . . . . . . . . . . . . . .
DSSI Extender Module Removal Procedure . . . . . . . . . . . . . . . . . .
CAMP Module Removal Procedure . . . . . . . . . . . . . . . . . . . . . . . .
DIM Removal Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
EIM Removal Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
DSSI Cable Removal Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . .
TF85C-BA Tape Drive Removal Procedure . . . . . . . . . . . . . . . . . . .
SF73 Disk Drive Enclosure Removal Procedure . . . . . . . . . . . . . . .
SF35 Storage Array Removal Procedure . . . . . . . . . . . . . . . . . . . .
TF857-CA Tape Drive Removal Procedure . . . . . . . . . . . . . . . . . . .
Power Distribution Box Removal Procedure . . . . . . . . . . . . . . . . . .
PARAMS Commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Switches For Disabling the MSCP . . . . . . . . . . . . . . . . . . . . . . . . .
ISE Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Disabling the MSCP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
4–28
4–29
4–30
4–30
4–35
4–37
4–39
4–51
4–52
4–52
4–54
4–54
4–55
4–58
4–60
4–60
4–61
4–61
4–61
4–65
4–65
5–1
5–4
5–7
5–8
5–9
5–11
5–13
5–14
5–15
5–17
5–19
5–21
5–23
5–25
5–27
5–29
5–29
5–31
5–33
5–38
5–40
5–43
6–2
6–2
6–5
6–9
6–5
A–1
A–2
A–3
A–4
A–5
A–6
Disabling the MSCP . . . . . . . . . . . . . . . .
Processor Halt Code Definitions . . . . . . .
Processor Halt Reason Code Definitions .
Console Halt Reason Code Definitions . .
Xlink Mode Coding . . . . . . . . . . . . . . . . .
Code Field Definition . . . . . . . . . . . . . . .
SCB Layout . . . . . . . . . . . . . . . . . . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
6–11
A–1
A–2
A–3
A–4
A–10
A–11
xiii
1 Cabinet and Component Descriptions
1.1 In This Chapter
This chapter includes descriptions of the:
• CPU and expansion cabinets
• Zone control panel
• Power modules
• Domestic power distribution box
• International power distribution box
1.2 CPU and Expansion Cabinets
Figure 1–1 shows the front layout of an expanded system. Table 1–1 describes
the components shown in Figure 1–1. Figure 1–2 shows the rear layout of an
expanded system. Table 1–2 describes the components shown in Figure 1–2.
Cabinet and Component Descriptions 1–1
Figure 1–1 Cabinet Layout, Front View
[Front view of the Expansion Cabinet and the CPU Cabinet, with callouts 1 to 16.]
Table 1–1 Key to Figure 1–1, Cabinet Layout, Front View
Item  Component                 Description
1     Zone A                    Complete computer with enough elements to run an operating system.
2     Zone B                    Complete computer with enough elements to run an operating system.
3     Fan assembly              Cooling device.
4     Disk drawer               Optional SF35 disk drive(s).
      System Module Card Cage
5     Slot 0 - CPU module       Logic chips and memory.
6     Slot 1 - ATM module       I/O logic supporting up to eight interface adapter cards.
7     Slot 2 - Not used         For future expansion.
8     Zone control panel        Zone controls and indicators.
9     Blank panel               Not used.
10    Disk device               Location for disk device.
11    Disk/tape device          Location for disk or tape device.
12    Disk/tape/tape loader     Location for disk, tape, or tape loader device.
13    Power distribution box A  AC power source for Zone A.
14    Power distribution box B  AC power source for Zone B.
15    UPS A                     Optional uninterruptible power supply for Zone A.
16    UPS B                     Optional uninterruptible power supply for Zone B.
Figure 1–2 Cabinet Layout, Rear View
[Rear view of the CPU Cabinet, the Expansion Cabinet, and the Expansion Cabinet Option, with callouts 1 to 25.]
Table 1–2 Key to Figure 1–2, Cabinet Layout, Rear View
Item  Component                         Description
1     Zone A                            Complete computer with enough elements to run an operating system.
2     Zone B                            Complete computer with enough elements to run an operating system.
3     Fan assembly                      Cooling device.
4     Blank panel                       Not used.
5     Front End Unit (FEU)              AC input circuit breaker.
6     FEU                               Converts ac power to 48 Vdc.
7     FEU                               AC input connector.
8     Regulator                         Provides +3.3 Vdc at 30 A, +12 Vdc at 12.5 A, and bias.
9     Regulator                         Provides +5 Vdc at 90 A.
10    Power system controller           Provides interface signals to the ATM module.
      Miscellaneous Module Card Cage
11    Blank panel                       Not used.
12    Slot 0 - Not used                 For future expansion.
13    Slot 1 - Cross-link assembly      Connects Zone A and Zone B.
14    Slot 2 - Console module           Module with console port.
15    Slot 3 - Not used                 Factory test module.
16    Slot 4 - Disk In/Disk Out module  Permits zone interconnections to access all configured disks.
17    Slot 5 - CAMP module              Provides custom power control circuits.
      Interface Module Card Cage
18    Slots 10 to 17                    DSSI and NI interface modules.
      Slots 20 to 27                    For future expansion.
19    Disk device                       Location for disk device.
20    Disk/tape/tape loader             Location for disk, tape, or tape loader device.
21    Disk/tape device                  Location for disk or tape device.
22    Power distribution box A          AC power source for Zone A.
23    Power distribution box B          AC power source for Zone B.
24    UPS A                             Optional uninterruptible power supply for Zone A.
25    UPS B                             Optional uninterruptible power supply for Zone B.
1.3 Zone Control Panel
Figure 1–3 shows the layout of the zone control panel. Table 1–3 describes the
functions of the zone control panel controls and indicators.
Figure 1–3 Zone Control Panel
[Zone control panel layout, with callouts 1 to 10.]
Table 1–3 Key to Figure 1–3, Zone Control Panel
Item  Control/Indicator   Function
1     Logic Power - OFF   Two switches with amber indicators. Pressing the two switches removes 48 V power and disables the zone. Pressing one switch has no effect on the operation of the zone. (CPU cabinet disk power is not affected when logic power is removed by pressing these switches.)
2     Logic Power - ON    One switch with a green indicator. Pressing this switch applies 48 V power to the zone. (CPU cabinet disk power is not affected when logic power is applied by pressing this switch.)
3     Local Console       One switch with a green indicator. Pressing this switch connects the system to the console local port for communication.
4     Remote Console      One switch with a green indicator. Pressing this switch connects the system to the remote port for communication.
5     Secure              One switch with a green indicator. Pressing this switch disables the console Break key function. (You cannot use the console Break key to halt the zone or system.)
6     Zone Halt Enable    One switch with a green indicator. Pressing this switch enables the console Break key function. (You can use the console Break key to halt the zone.)
7     System Halt Enable  One switch with a green indicator. Pressing this switch enables the console Break key function. (You can use the console Break key to halt both zones.)
                          Note: System Halt Enable is NOT supported in Simplex mode.
8     System OK           Green indicator. On when the system power is on and the system is operational.
9     System Fault        Amber indicator. On when the system is not operational.
10    OS Running          Green indicator. On when the system is operational and running a customer or diagnostic application.
1.4 Power Modules
Figure 1–4 shows the location of the power module controls and indicators.
Table 1–4 describes their functions.
Figure 1–4 Power Module Controls and Indicators
[Controls and indicators on the FEU, DC3, DC5, PSC, and CAMP modules, with callouts 1 to 16.]
Table 1–4 Key to Figure 1–4, Power Module Controls and Indicators
Item  Control/Indicator          Function
1     AC Circuit Breaker
2     FEU Failure                When on, indicates the dc output voltages for the FEU are below the specified minimum.
3     FEU OK                     When on, indicates the dc output voltages for the FEU are above the specified minimum.
4     DC3 Failure                When on, indicates that one of the +3 Vdc output voltages is not within the specified tolerances.
5     DC3 OK                     When on, indicates that the +3 Vdc output voltages are within the specified tolerances.
6     AC Present                 When on, indicates ac power is present at the ac input connector, regardless of the position of the circuit breaker.
7     DC5 Failure                When on, indicates that one of the +5 Vdc output voltages is not within the specified tolerances.
8     DC5 OK                     When on, indicates that the +5 Vdc output voltages are within the specified tolerances.
9     PSC Failure                When on, indicates a PSC fault.
10    PSC OK                     When on, indicates the PSC is functioning. When blinking, indicates the PSC is performing power-on self-tests.
11    Over Temperature Shutdown  When on, indicates that the PSC shut down the system because of an internal overtemperature condition.
12    Fan Failure                When on, indicates a fan failure. Use the hexadecimal number in the Fault ID Display to isolate the fan.
13    Disk Power Failure         When on, indicates a disk power failure. Use the hexadecimal number in the Fault ID Display to isolate the storage compartment that houses the disk.
14    Fault ID Display           Displays power subsystem fault codes.
15    PSC Reset Button           When out, indicates a PSC fault condition. Press in to reset.
16    CAMP Fan Fault             When on, indicates that a fan fault caused all disk drives and tape drives to shut down.
1.5 Domestic and International Power Distribution Boxes
The domestic power distribution box (PN 30-24374-01) is shown in Figure 1–5.
Table 1–5 describes the components shown in the figure. The international power
distribution box (PN 30-35415-02) is shown in Figure 1–6. Table 1–6 describes
the components shown in the figure.
Figure 1–5 Domestic Power Distribution Box
[Domestic power distribution box, with callouts 1 to 5.]
Table 1–5 Key to Figure 1–5, Domestic Power Distribution Box
Item  Component               Description
1     Three-phase power cord  Connects the power distribution box to ac power. The power cord may be repositioned by moving the locking arm.
2     Circuit breaker         When set to on, ac power is applied to the distribution box.
3     Local/Remote switch     The switch has icons representing Remote, Off, and Local. When set to:
                              • Local, the internal bus controls the operation of ac power.
                              • Off, the distribution box is turned off.
                              • Remote, the distribution box is turned on (if the power cord is connected to ac power and the circuit breaker is set to on).
4     For power cords         Used to dress the power cords.
5     Eight ac outlets        Reserved for the FEU and expansion cabinet.
Figure 1–6 International Power Distribution Box
[International power distribution box, with callouts 1 to 5.]
Table 1–6 Key to Figure 1–6, International Power Distribution Box
Item  Component                Description
1     Single-phase power cord  Connects the power distribution box to ac power.
2     Circuit breaker          When set to on, ac power is applied to the distribution box.
3     Local/Remote switch      The switch has icons representing Remote, Off, and Local. When set to:
                               • Local, the internal bus controls the operation of ac power.
                               • Off, the distribution box is turned off.
                               • Remote, the distribution box is turned on (if the power cord is connected to ac power and the circuit breaker is set to on).
4     For power cords          Used to dress the power cords.
5     Six ac outlets           Reserved for the expansion cabinet.
2 Console Operations
2.1 In This Chapter
This chapter describes the console, console operating modes and commands, and
booting information.
This chapter includes:
• Console description
• Console operating modes
• Console control characters
• Console command language syntax
• Bootstrap procedures
• Entering CIO mode
• CIO mode console commands
2.2 Console Description
The system architecture (Figure 2–1 and Table 2–1) supports in each zone:
• A local console terminal
• The console firmware (programs located in ROM) residing on:
  - The primary NCIO module
  - The CPU module
• A remote console terminal
The remote console terminal and the local console terminal are connected to the
zone through the primary NCIO module.
The console operates a terminal that may be:
• Connected to the CPU serial port
• On the system console port
Figure 2–1 System Components
[System components in the CPU cabinet for both zones, with callouts 1 to 8.]
Table 2–1 Key to Figure 2–1, System Components
Number  Component
1       CPU cabinet
2       Zone (A or B)
3       CPU module
4       To memory
5       Primary NCIO module
6       Cross-link cable
7       Local console terminal
8       Remote console terminal (optional)
Table 2–2 describes the function of each console component.
Table 2–2 Function of the Console Components
Part                    Function
Local console terminal  Terminal located with the system that is used for console input and display output.
Remote console port     One remote port is available in each zone. The port may be connected to a remote console terminal through a modem. There is no built-in modem control. The remote console port provides the same functions as the local console port.
Console firmware        The console firmware resides on the primary NCIO module and on the CPU module.
You can use any one of the four console terminals (local or remote) for input
commands, but use only one terminal at a time. All of the console terminals echo
the response of the system to a console command.
If the system is operating with a single zone running, you must use a console
terminal (local or remote) that is connected to that zone for input commands.
2.3 Console Operating Modes
Operators communicate with the system in one of the following input/output
modes:
• Program I/O (PIO) mode
• Console I/O (CIO) mode
Normal operation takes place in the PIO mode. From PIO mode, the operator
uses the console to:
• Log in
• Use the mail facility
• Create and edit files
From CIO mode, the operator executes the console commands. These commands
are described in Section 2.8.
2.3.1 Entering CIO Mode
The CIO mode is entered when you turn on system power if:
• The Zone Halt Enable switch is pressed
• A STOP/ZONE instruction is executed
• A severe processor condition occurs
• An external halt is detected
Once CIO mode is entered, the console displays the >>> prompt and is ready to
execute commands entered at the prompt.
2.3.2 Exiting CIO Mode
The CIO mode is exited by issuing one of the following console commands:
• BOOT
• START
• CONTINUE
These commands are described in Section 2.8. Figure 2–2 shows how to move
between PIO and CIO modes.
Figure 2–2 Console Operating Modes
[State diagram: STOP/ZONE moves from PIO mode to CIO mode; BOOT, CONTINUE, and START move from CIO mode to PIO mode.]
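The transitions between the two modes can be summarized as a small lookup, sketched here in Python. This is an illustrative model only, not console firmware: the event labels EXTERNAL_HALT and SEVERE_PROCESSOR_CONDITION are names invented for this sketch to stand for the conditions listed in Section 2.3.1.

```python
# Illustrative model of the PIO/CIO transitions in Sections 2.3.1 and 2.3.2.
PIO, CIO = "PIO", "CIO"

# Console commands that leave CIO mode (Section 2.3.2).
CIO_EXIT_COMMANDS = {"BOOT", "START", "CONTINUE"}

# Events that enter CIO mode. Apart from STOP/ZONE, these labels are
# hypothetical names chosen for this sketch.
CIO_ENTRY_EVENTS = {"STOP/ZONE", "EXTERNAL_HALT", "SEVERE_PROCESSOR_CONDITION"}


def next_mode(mode: str, event: str) -> str:
    """Return the mode after `event`; unrelated events leave the mode unchanged."""
    if mode == CIO and event in CIO_EXIT_COMMANDS:
        return PIO
    if mode == PIO and event in CIO_ENTRY_EVENTS:
        return CIO
    return mode
```

For instance, next_mode("PIO", "STOP/ZONE") gives "CIO", and next_mode("CIO", "CONTINUE") gives "PIO".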
2.4 Console Control Characters
The ASCII control characters and function keys listed in Table 2–3 have special
meanings when typed on a console terminal.
Table 2–3 Console Control Characters and Function Keys
Character/Key  Function
Break          In CIO mode, acts like Ctrl/C. In PIO mode, causes the processor to halt and begin running the console program.
               If the system is in a secure mode when you press the Break key, the halt is suppressed. If you press the Zone Halt Enable or System Halt Enable switch, the halt initiated by pressing the Break key earlier is enabled.
Ctrl/C         Echoes ^C and causes the console to abort processing of a command, if possible.
Ctrl/O         Alternately enables and disables output.
Ctrl/Q         Resumes output previously suspended by Ctrl/S.
Ctrl/R         Echoes ^R and retypes the command line.
Ctrl/S         Stops transmission until Ctrl/Q is typed.
Ctrl/U         Echoes ^U and ignores the current command line. The console prompt is displayed on the next line. This affects only the entry of the current line. Pressing Ctrl/U does not abort a command that is executing.
<x (delete)    Deletes the character to the left of the cursor. On video terminals, the deleted characters disappear. On hard-copy terminals, the deleted characters are typed within a pair of backslash delimiters as they are deleted.
Esc or Ctrl/[  Suppresses any special meaning associated with a given character.
Return         Terminates a command line and executes the command.
2.5 Console Command Language Syntax
The console commands accept qualifiers. Qualifiers specify a numerical value or
select an option from a list of options. Command elements may be abbreviated
and any extra tabs or spaces are ignored. Unless otherwise noted, numerical
values must be given in hexadecimal notation. The command length may not
exceed 80 characters.
Table 2–4 lists the console command language syntax rules. The console
commands available for the system are listed in Section 2.8.
Table 2–4 Console Command Language Syntax
Command Element                    Rule
Abbreviations                      A command verb or argument may be abbreviated to the extent that it remains unique.
Multiple adjacent spaces and tabs  Are treated as a single space.
Qualifiers                         May appear after a command verb, option, or symbol. They must be preceded by a slash (/).
Numbers                            Must be hexadecimal.
No characters                      Are treated as a null command. No action is taken.
2.6 Bootstrap Procedures
The BOOT command initializes the system and then loads and starts the virtual
memory bootstrap (VMB) program from read-only memory (ROM). The VMB
program, in turn, loads and starts the operating system from the specified boot
device. Figure 2–3 shows the steps in the boot procedure.
Figure 2–3 Boot Procedure
1. Enter BOOT command at the >>> console prompt.
2. Boot procedure initializes the system.
3. Boot procedure loads VMB into main memory.
4. VMB loads the operating system.
The VMB program is the primary bootstrap program. VMB:
• Resides in ROM on the ATM module.
• Is loaded into memory and initiated by the system console firmware.
• Provides the necessary parameters for successful operation of the OpenVMS
  secondary bootstraps.
• Allows you to boot from DSSI compatible disk and tape devices over the
  Ethernet.
2.7 Entering CIO Mode
To recognize and process CIO commands:
• The System Halt Enable switch on both zone control panels must be pressed
• The operating software must be halted
• The processor must be running the console firmware
The example below shows how to use the Break key to enter CIO mode from PIO
mode and then return to PIO mode by using the CONTINUE command. The
System Halt Enable switch on both zone control panels must be pressed.
Caution
Use CONTINUE to continue from a system halt. Use START/ZONE to
continue from a zone halt.
A remote operator can use CIO mode only when full access privileges for the
remote console have been set at the local console.
Example
$                    ! Press the System Halt Enable switch on
$                    ! both zone control panels.
$                    !
$                    !
$ Break              ! From PIO mode, press the Break key once.
>>>                  ! This puts the processor in HALT mode.
?002 External halt   !
PC = 01E01473
>>> CONTINUE         ! This command resumes execution of the
                     ! operating system software.
$                    ! The console returns to PIO mode.
Notice that comments (characters following an exclamation point (!)) are allowed
on a command line. Comments are ignored by the console when the Return key is
pressed. This may be useful when you document a console session on a hardcopy
terminal.
Notice also that lowercase characters are accepted, but the console converts all
characters to uppercase.
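The command-line rules stated so far (comments after an exclamation point are ignored, lowercase is converted to uppercase, runs of spaces and tabs count as one space, and a line may not exceed 80 characters) can be pictured with a short Python sketch. It is a rough model for readers who script console sessions, not the console firmware's parser. Applying the 80-character limit to the whole typed line, comment included, is an assumption of this sketch.

```python
MAX_COMMAND_LENGTH = 80  # limit stated in Section 2.5


def normalize_command_line(line: str) -> str:
    """Drop a trailing '!' comment, collapse blanks, and uppercase the rest."""
    # Assumption: the limit covers the whole typed line, comment included.
    if len(line) > MAX_COMMAND_LENGTH:
        raise ValueError("command line exceeds 80 characters")
    command, _, _comment = line.partition("!")
    # split() with no argument treats any run of spaces and tabs as one separator.
    return " ".join(command.split()).upper()
```

A line holding only a comment, or nothing at all, normalizes to the empty string, which corresponds to the null command in Table 2–4.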
2.8 CIO Mode Console Commands
This section describes the CIO mode console commands. The console commands
are listed below with command abbreviations shown in bold capital letters.
Boot
CLEAR
Continue
Deposit
DUP
Examine
Find
HElp
Initialize
Move
MATCH_ZONES
Repeat
SEt
SHow
Start
Test
X(transfer)
Z
!(comment)
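One way to read the capitalization in this list is that the capital letters give the shortest accepted form of each verb (for example, HE for HElp, and the whole word for CLEAR). The Python sketch below encodes that reading. It is an interpretation of the list, since the bold type did not survive in this text, and the function name resolve_verb is invented for the illustration.

```python
from typing import Optional

# Verb -> shortest accepted abbreviation, read off the capital letters in
# the command list. This mapping is an interpretation, not a firmware table.
MIN_ABBREVIATION = {
    "BOOT": "B", "CLEAR": "CLEAR", "CONTINUE": "C", "DEPOSIT": "D",
    "DUP": "DUP", "EXAMINE": "E", "FIND": "F", "HELP": "HE",
    "INITIALIZE": "I", "MOVE": "M", "MATCH_ZONES": "MATCH_ZONES",
    "REPEAT": "R", "SET": "SE", "SHOW": "SH", "START": "S",
    "TEST": "T", "X": "X", "Z": "Z",
}


def resolve_verb(token: str) -> Optional[str]:
    """Return the full verb for a typed token, or None if nothing matches."""
    token = token.upper()
    for verb, shortest in MIN_ABBREVIATION.items():
        # The token must be at least the shortest form and a prefix of the verb.
        if token.startswith(shortest) and verb.startswith(token):
            return verb
    return None
```

Under this reading, S resolves to START while SE and SH resolve to SET and SHOW, and a lone H resolves to nothing.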
2.8.1 BOOT
BOOT initializes the system, loads a program image from a specified boot device,
and transfers control to that program image.
When you do not supply a boot-spec, the default boot device is used. When you do
not supply flag(s), a value of 0 is assumed.
The console program accepts a terminating colon on the boot-spec, but ignores the
colon when the name is processed.
The BOOT syntax is:
BOOT[/OVER][[/R5:]<flag(s)> boot-spec]
The boot-spec format may be dduuu/PATH=path-list . . . dduuu/PATH=path-list,
where:
dd is a device mnemonic.
uuu is a unit number (0 to 999).
/PATH=path-list is a qualifier. See Table 2–5.
Or, the boot-spec format may be a variable that specifies the boot devices and
paths. See Section 2.8.13.1.
Table 2–5 describes the qualifiers. Table 2–6 lists the VMB program /R5:<flag>
values.
Table 2–5 Qualifiers for BOOT
Qualifier        Function
/R5:<flag>       Passes parameters to the virtual memory bootstrap (VMB) program. See Table 2–6.
/PATH=path-list  Specifies a path to a boot device. The path-list specifies zones and slot numbers in the path. When the path-list has more than one slot, you separate the slots by commas. The path-list format is zss, where:
                 z is a zone ID (A or B).¹
                 ss is a slot number (10 to 17, 20 to 27) of an adapter connecting to a boot device.
/OVER            Overrides the results of the bootability test to allow a Simplex mode boot.
¹ The console validates this field before invoking VMB.
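As an aid to checking a path-list before typing it, the zss format can be validated with a few lines of Python. This is a sketch under two assumptions: the slot numbers are treated as the two-digit labels printed in the table (10 to 17 and 20 to 27), and the function name parse_path_list is invented here. It does not reproduce the validation the console itself performs.

```python
from typing import List, Tuple

VALID_ZONES = {"A", "B"}
# Slot labels 10-17 and 20-27, taken at face value from Table 2-5.
VALID_SLOTS = set(range(10, 18)) | set(range(20, 28))


def parse_path_list(path_list: str) -> List[Tuple[str, int]]:
    """Split a comma-separated path-list into (zone, slot) pairs."""
    entries = []
    for item in path_list.split(","):
        item = item.strip().upper()
        if len(item) != 3 or item[0] not in VALID_ZONES or not item[1:].isdigit():
            raise ValueError(f"bad path entry: {item!r}")
        slot = int(item[1:])
        if slot not in VALID_SLOTS:
            raise ValueError(f"slot out of range in entry: {item!r}")
        entries.append((item[0], slot))
    return entries
```

For example, parse_path_list("A10,B11") yields the pairs ("A", 10) and ("B", 11), while "A18" is rejected because slot 18 is outside both ranges.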
Table 2–6 VMB Program /R5:<flag> Values
Bit    Hex Value  Function               Action
0      1          Conversational boot    Returns to the SYSBOOT> prompt.
1      2          Debug                  Maps the XDELTA program into the system page table.
2      4          Initial breakpoint     Operating system issues a breakpoint after turning on memory management.
3      8          Secondary boot         Boots from boot block specified in /R4:n.
5      20         Bootstrap breakpoint   Transfers control to the XDELTA program.
8      100        Solicit file name      VMB issues a prompt for the secondary boot procedure.
9      200        Halt before transfer   VMB executes a halt before transferring control to the secondary bootstrap procedure.
31:28  x0000000   Top-level system boot  Specifies the top-level directory number for a system disk with multiple system roots, where x = a hex value from 0 to F.
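Because each flag in Table 2–6 is a single bit, a combined /R5 value is the bitwise OR of the individual hex values, with the system root placed in bits 31:28. The Python sketch below shows the arithmetic. The flag names used as dictionary keys and the function name r5_value are labels chosen for this illustration; only the bit positions come from the table.

```python
# Bit positions from Table 2-6; the key names are labels for this sketch.
R5_FLAG_BITS = {
    "conversational_boot": 0,
    "debug": 1,
    "initial_breakpoint": 2,
    "secondary_boot": 3,
    "bootstrap_breakpoint": 5,
    "solicit_file_name": 8,
    "halt_before_transfer": 9,
}


def r5_value(flags=(), system_root: int = 0) -> int:
    """Combine named flags and a system root (0 to F) into one /R5 value."""
    if not 0 <= system_root <= 0xF:
        raise ValueError("system root must be a hex digit, 0 to F")
    value = system_root << 28  # bits 31:28 select the top-level directory
    for name in flags:
        value |= 1 << R5_FLAG_BITS[name]
    return value
```

The result would be typed in hexadecimal, for example format(r5_value(["conversational_boot"], system_root=1), "X") produces the string 10000001.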
2.8.2 CLEAR
CLEAR BOOT deletes a boot-spec. CLEAR ERRORS clears the error frame of the
previously detected error. If you do not clear the error frame, the next error is not
recorded in the error frame. CLEAR BROKE clears the broke bit in EEPROM.
The following CLEAR syntax deletes a boot-spec:
CLEAR BOOT <name>
The following CLEAR syntax clears the error frame:
CLEAR ERRORS
The following CLEAR syntax clears the broke bit ID in EEPROM:
CLEAR BROKE[/PATH=path-number]
Table 2–7 describes the /PATH=path-number qualifier.
Table 2–7 Qualifier for CLEAR
Qualifier          Function
/PATH=path-number  Specifies the zone and slot number of the module to clear. The path-number format is zss, where:
                   z is the zone ID (A or B).
                   ss is the slot number (0 to 2, 10 to 17, 20 to 27) of an adapter connecting to a DSSI device.
CLEAR BROKE clears the module ID EEPROM in the zone that is running.
2.8.3 CONTINUE
CONTINUE exits the CIO mode and returns operation to the PIO mode.
Caution
Use CONTINUE to continue from a system halt. Use START/ZONE to
continue from a zone halt.
The CONTINUE syntax is:
CONTINUE
2.8.4 DEPOSIT
DEPOSIT stores the specified data in the specified address.
When the system is initialized or when any transition from a running to a halted
state occurs, the defaults are physical address space 0 and data size longword.
The DEPOSIT syntax is:
DEPOSIT[/{B,W,L,Q}][/{G,I,M,P,V,U}][/N:count]address-spec data-spec
The address-spec identifies a physical or virtual hexadecimal memory address. A
qualifier may be placed before or after an address-spec or data-spec.
The data-spec identifies a hexadecimal number to be stored, unless the default
radix has been changed with a %D introducer. When you do not supply a
data-spec, a value of 0 is assumed.
Table 2–8 describes the qualifiers. Table 2–9 lists the address-spec symbolic
addresses.
Table 2–8 Qualifiers for DEPOSIT
Qualifier  Function
/B         Sets the data size to byte.
/W         Sets the data size to word.
/L         Sets the data size to longword.
/Q         Sets the data size to quadword.
/G         Sets general purpose register address space R0 through PC.
/I         Sets internal processor register (IPR) address space accessed by the MTPR and MFPR instructions.
/P         Sets physical address space.
/V         Sets virtual address space. An EXAMINE to virtual memory returns the translated physical address. A DEPOSIT to virtual memory sets the PTE <M> bit.
/U         Sets access to console private memory. This qualifier must be specified for each command.
/N:count   Specifies the number of consecutive locations to modify. The console deposits to the first address, then to the specified number of succeeding addresses. This qualifier must be specified for each command.
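The wording of /N:count implies that count + 1 locations are written: the first address plus count succeeding ones. The Python sketch below lists the addresses touched under that reading for a memory address space, stepping by the data size. The function name and the assumption that successive locations are one data item apart are illustrative; register address spaces are not modeled.

```python
# Data sizes in bytes for the /B, /W, /L, and /Q qualifiers.
DATA_SIZE_BYTES = {"B": 1, "W": 2, "L": 4, "Q": 8}


def addresses_touched(start: int, count: int, size: str = "L") -> list:
    """List the addresses written by a DEPOSIT with /N:count (memory spaces only)."""
    if count < 0:
        raise ValueError("count must not be negative")
    step = DATA_SIZE_BYTES[size]
    # First address, then `count` succeeding addresses.
    return [start + i * step for i in range(count + 1)]
```

With a longword data size, a start address of 200 (hex) and a count of 2 would touch 200, 204, and 208 (hex).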
Table 2–9 Address-Spec Symbolic Addresses
Symbolic Address  Description
R<n>              General purpose register number n, where n is a decimal number 0 to 15.
FP                Frame pointer.
AP                Argument pointer.
SP                Stack pointer.
PC                Program counter.
PSL               Program status longword.
+                 A location following the last location accessed by an EXAMINE or DEPOSIT. The location is the last address plus the size of the last reference (1 for byte, 2 for word, 4 for longword).
-                 A location preceding the last location accessed by an EXAMINE or DEPOSIT. The location is the last address minus the size of the last reference (1 for byte, 2 for word, 4 for longword).
*                 The last location referenced by an EXAMINE or DEPOSIT.
@                 Indirect addressing. The address-spec is used as a pointer to the data. The format is @address-spec, where address-spec can be any valid address except another @. See Example 2–1.
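The arithmetic behind the +, -, and * symbols is simple enough to state as code. The following Python sketch computes the address each symbol refers to, given the last address accessed and the size in bytes of that reference. The function name resolve_symbol is invented for the illustration.

```python
def resolve_symbol(symbol: str, last_address: int, last_size: int) -> int:
    """Return the address meant by '+', '-', or '*' after a previous reference."""
    if symbol == "+":
        return last_address + last_size  # location following the last one
    if symbol == "-":
        return last_address - last_size  # location preceding the last one
    if symbol == "*":
        return last_address              # the last location itself
    raise ValueError(f"not a relative symbolic address: {symbol!r}")
```

After a longword reference to address 200 (hex), + refers to 204, - refers to 1FC, and * refers to 200.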
Note
Remember that the symbolic addresses from the previous command are
used for indirect addressing. See Example 2–1.
Example 2–1 Indirect Addressing
>>> DEPOSIT R0 200
! The value 200 is stored directly in R0. The defaults
! are set to longword, general purpose register.
>>> DEPOSIT/P @R0 200
! The value 200 is stored directly in the address pointed
! to by R0. The /P qualifier tells the parser that the
! value in R0 should be treated as a physical address.
! The defaults are set to longword, physical.
>>> DEPOSIT/V @R0 200
! The value 200 is stored directly in the address pointed
! to by R0. The /V qualifier tells the parser that the
! value in R0 should be treated as a virtual address.
! The defaults are set to longword, virtual.
>>> DEPOSIT @200
! The value 200 is stored in the address specified in
! the previous command. The defaults are set to longword,
! virtual.
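The +, -, and * symbolic addresses in Table 2–9 reduce to simple arithmetic on the last address and the size of the last reference. The following Python sketch illustrates that arithmetic only. The function name and the size table are hypothetical and are not part of the console firmware.

```python
# Sizes in bytes of the references listed in Table 2-9 (byte, word, longword).
SIZES = {"B": 1, "W": 2, "L": 4}


def resolve_symbolic(symbol, last_address, last_size):
    """Return the address that '+', '-', or '*' refers to (illustration only).

    last_address is the last location accessed by EXAMINE or DEPOSIT;
    last_size is the data-size qualifier letter of that reference.
    """
    size = SIZES[last_size]
    if symbol == "+":
        return last_address + size   # location following the last one
    if symbol == "-":
        return last_address - size   # location preceding the last one
    if symbol == "*":
        return last_address          # the last location itself
    raise ValueError(f"unsupported symbolic address: {symbol!r}")


if __name__ == "__main__":
    # After a longword reference to 200 (hex), '+' refers to 204 (hex).
    print(hex(resolve_symbolic("+", 0x200, "L")))
```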
2.8.5 DUP
DUP connects to the DSSI DUP service on a selected node. DUP is used to
examine and modify the parameters of a DSSI device.
DUP syntax is:
DUP[/PATH:<path-number>] node-id [/TASK:task]
The node-id identifies the node number (0 to 7) of a DSSI device attached to
the console. Table 2–10 describes the qualifiers.
Table 2–10 Qualifiers for DUP
Qualifier
Function
/PATH=path-number
Specifies the zone and slot number of an adapter connecting to
a DSSI device. The path-number format is zss, where:
z is the zone ID (A or B).
ss is the slot number (10 to 17, 20 to 27) of an adapter
connecting to a DSSI device.
node-id
Specifies the DSSI node connecting to a DSSI device. Valid
node-ids are 0 to 5.
/TASK:task
Invokes a task from a DSSI device. Valid DUP tasks are:
DRVEXR
DRVTST
HISTRY
DIRECT
ERASE
VERIFY
DKUTIL
PARAMS
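The zss path-number format in Table 2–10 can be checked mechanically before a command is entered. The following Python sketch is one way to do that. The function name is hypothetical, and the check accepts only the uppercase zone letters and slot ranges stated in the table.

```python
import re


def parse_path_number(path):
    """Split a zss path-number into (zone, slot); raise ValueError if invalid.

    Illustration only: z must be A or B, and ss must be a slot number
    in the range 10 to 17 or 20 to 27, as listed in Table 2-10.
    """
    match = re.fullmatch(r"([AB])(\d{2})", path)
    if match is None:
        raise ValueError(f"path-number must have the form zss: {path!r}")
    zone = match.group(1)
    slot = int(match.group(2))
    if not (10 <= slot <= 17 or 20 <= slot <= 27):
        raise ValueError(f"slot {slot} is outside 10 to 17 and 20 to 27")
    return zone, slot


if __name__ == "__main__":
    print(parse_path_number("A10"))
```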
2.8.6 EXAMINE
EXAMINE displays the contents of the specified memory location or register. The
display line consists of:
•
A single-character address specifier
•
The hexadecimal physical address to be examined
•
The examined data in hexadecimal
When the system is initialized or when any transition from a running to a halted
state occurs, the defaults are physical address space 0 and data size longword.
The EXAMINE syntax is:
EXAMINE[/{B,W,L,Q}][/{G,I,M,P,V,U}][/N:count][/A][address-spec]
The address-spec identifies a physical or virtual hexadecimal memory address. A
qualifier may be placed before or after the address-spec or data-spec.
Table 2–11 describes the qualifiers. Table 2–12 lists the address-spec symbolic
addresses.
Table 2–11 Qualifiers for EXAMINE
Qualifier
Function
/B
Sets the data size to byte.
/W
Sets the data size to word.
/L
Sets the data size to longword.
/Q
Sets the data size to quadword.
/G
Sets general purpose register address space R0 through PC.
/I
Sets internal processor register (IPR) address space accessed
by the MTPR and MFPR instructions.
/P
Sets physical address space.
/V
Sets virtual address space. An EXAMINE to virtual memory
returns the translated physical address. A DEPOSIT to virtual
memory sets the PTE <M> bit.
/U
Sets access to console private memory. This qualifier must be
specified for each command.
/N:count
Specifies the number of consecutive locations to examine. The
console examines the first address, then the specified
number of succeeding addresses. This qualifier must be
specified for each command.
/A
Interprets and displays the data as ASCII characters.
Nonprinting characters are displayed as periods.
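The /A display rule (nonprinting characters shown as periods) can be sketched in a few lines of Python. This is an illustration, not the console implementation. The function name is hypothetical, and treating codes 20 through 7E (hexadecimal) as the printable range is an assumption.

```python
def ascii_display(data: bytes) -> str:
    """Render bytes as ASCII text, showing nonprinting bytes as periods.

    Assumption: printable means the ASCII codes 0x20 through 0x7E.
    """
    return "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in data)


if __name__ == "__main__":
    print(ascii_display(b"VAX\x00\x7f"))
```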
Table 2–12 Address-Spec Symbolic Addresses
Symbolic Address
Description
R<n>
General purpose register number n, where n is a decimal
number 0 to 15.
FP
Frame pointer.
AP
Argument pointer.
SP
Stack pointer.
PC
Program counter.
PSL
Program status longword.
+
A location following the last location accessed by an EXAMINE
or DEPOSIT. The location is the last address plus the size of
the last reference (1 for byte, 2 for word, 4 for longword).
-
A location preceding the last location accessed by an
EXAMINE or DEPOSIT. The location is the last address
minus the size of the last reference (1 for byte, 2 for word, 4 for
longword).
*
The last location referenced by an EXAMINE or DEPOSIT.
@
Indirect addressing. The address-spec is used as a pointer to
the data. The format is @address-spec, where address-spec can
be any valid address except another @. See Example 2–1.
Note
Remember that the symbolic addresses from the previous command are
used for indirect addressing. See Example 2–1.
2.8.7 FIND
FIND searches the main memory beginning at physical address space 0 for either
a page-aligned 512-Kbyte segment of memory, or a restart parameter block (RPB).
When FIND is successful, it saves the address plus the segment of memory (or
RPB) in the stack pointer. When FIND is unsuccessful, an error message is
displayed and the contents of the stack pointer are unpredictable.
The FIND syntax is:
FIND
Table 2–13 describes the qualifiers.
Table 2–13 Qualifiers for FIND
Qualifier
Function
/MEMORY
Searches main memory for a page-aligned 512-Kbyte segment
of memory.
/RPB
Searches main memory for a restart parameter block. The
search leaves memory unchanged.
2.8.8 HELP
HELP displays a summary of the commands, their arguments, and qualifiers.
When you supply a command name, HELP displays the arguments and qualifiers
for that command only. HELP does not provide complete descriptions of the
commands.
The HELP syntax is:
HELP [command]
Or:
? [command]
2.8.9 INITIALIZE
INITIALIZE performs the steps shown in Table 2–14.
Table 2–14 INITIALIZE Steps
Step
Action
1
Do hard reset of zone (the cross-link state is set to off).
2
Do hard reset of all available ATMs.
3
Initialize hardware.
4
Reconfigure the zone and update the device configuration block
(DCB) to reflect the zone status.
5
Execute the Duplex Compatibility Test.
6
Load the firmware into the console main loop.
The INITIALIZE syntax is:
INITIALIZE
2.8.10 MOVE
MOVE transfers the specified number of bytes (count) from the source-address to
the destination-address.
The MOVE syntax is:
MOVE source-address destination-address count
The source-address is the starting address of the data. The destination-address
is the starting address of the destination. The count is the number of bytes to be
moved.
2.8.11 MATCH_ZONES
MATCH_ZONES copies the system-wide module data EEPROM from the other
zone. MATCH_ZONES does not copy the zone-specific module data EEPROM.
Use MATCH_ZONES only when:
•
The cross-link state is set to off, and
•
The path to the other zone is available. (The cross-link cables are connected
and the other zone's power is on.)
The MATCH_ZONES syntax is:
MATCH_ZONES
2.8.12 REPEAT
REPEAT continuously executes the specified command. REPEAT applies to the
following commands only.
•
DEPOSIT
•
EXAMINE
REPEAT can be aborted by pressing Ctrl/C at the console keyboard.
The REPEAT syntax is:
REPEAT command
2.8.13 SET
SET modifies the value of the specified variable.
The SET syntax is:
SET variable value [value]
Note
SET does not allow abbreviations. You must enter the name of the
variable completely.
Table 2–15 lists the variables with the acceptable values.
Table 2–15 SET Variables and Values
Variable
Description
Acceptable Values
BOOT DEFAULT
Default boot specification.
Up to 80 characters of ASCII text
MODE
Boot mode.
FAILSTOP = Simplex mode
FAILSAFE = Duplex mode
RESTART
Halt action switch.
HALT = Enter console mode
BOOT = Boot
RESTART = Restart
BAUD
Console port speed.
300, 600, 1200, 2400, 4800, 9600,
19200, 38400
ZONE
Zone identification.
A = Zone A
B = Zone B
2.8.13.1 SET BOOT
SET BOOT saves the values of boot-specs. Space for nine boot-specs is available
on the CPU module EEPROM. The first space is reserved for the default boot-spec.
The other eight spaces are available to the user.
The SET BOOT syntax is:
SET BOOT DEFAULT value
Or:
SET BOOT boot-spec value
The boot-spec may be up to 8 characters of ASCII text. The value is the ASCII
text assigned to the boot-spec.
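The storage limits described above (one default boot-spec plus eight user boot-specs, with names of up to 8 characters) can be modeled as follows. This Python sketch is a bookkeeping illustration only. The class and method names are hypothetical, and applying the 80-character limit from Table 2–15 to every value, not just the default, is an assumption.

```python
MAX_USER_SPECS = 8   # eight spaces are available to the user
MAX_NAME_LEN = 8     # a boot-spec name may be up to 8 characters
MAX_VALUE_LEN = 80   # assumption: the Table 2-15 limit applies to all values


class BootSpecStore:
    """Toy model of the nine boot-spec spaces (not the EEPROM layout)."""

    def __init__(self):
        self.default = None   # the reserved first space
        self.user = {}        # up to eight user boot-specs

    def set_boot(self, name, value):
        if len(value) > MAX_VALUE_LEN:
            raise ValueError("value is longer than 80 characters")
        if name == "DEFAULT":
            self.default = value
            return
        if len(name) > MAX_NAME_LEN:
            raise ValueError("boot-spec name is longer than 8 characters")
        if name not in self.user and len(self.user) >= MAX_USER_SPECS:
            raise ValueError("all eight user boot-spec spaces are in use")
        self.user[name] = value


if __name__ == "__main__":
    store = BootSpecStore()
    store.set_boot("DEFAULT", "example-default-value")
    store.set_boot("TEST1", "example-value")
    print(store.default, store.user)
```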
2.8.14 SHOW
SHOW displays information about the specified variable. When the cross-link
state is off (Simplex mode), information about the current zone is displayed.
When the cross-link state is on (Duplex mode), information about both zones is
displayed.
The SHOW syntax is:
SHOW variable
Table 2–16 lists the variables. You must supply a variable.
Table 2–16 SHOW Variables
Variable
Description
Acceptable Values
DEFAULT
Default specification.
Up to 80 characters of ASCII text
MODE
Boot mode.
FAILSTOP = Simplex mode
FAILSAFE = Duplex mode
RESTART
Halt action switch.
HALT = Enter console mode
BOOT = Boot
RESTART = Restart
BAUD
Console port speed.
300, 600, 1200, 2400, 4800, 9600,
19200, 38400
ZONE
Zone identification.
A = Zone A
B = Zone B
BOOT
Displays the saved boot
specifications.
CONFIGURATION
Displays the current system
configuration, including the
identity and status of any
modules in the system.
VERSION
Displays the firmware
revision of all ROMs in the
system.
DSSI/PATH=path-number
Specifies the zone and
slot number of an adapter
connecting to a DSSI device.
The path-number format is
zss, where:
z is the zone ID (A or B).
ss is the slot number
(10 to 17, 20 to 27) of an
adapter connecting to a
DSSI device.
ETHERNET
Displays the physical
Ethernet addresses.
MEMORY
Displays system memory
information.
STATE
Displays the state of the
cross-link and the system
cables.
ERRORS
Displays the diagnostic error
frames. Not allowed if the
cross-link state is on.
ALL
Displays the contents of all
variables.
2.8.15 START
START begins execution of the operating software from the specified address.
START is equivalent to DEPOSIT PC followed by CONTINUE.
The START syntax is:
START address-spec
You must supply an address-spec.
2.8.16 TEST
TEST enables the user to test:
•
The system
•
A zone
•
The CPU and memory
Use TEST only when the cross-link state is set to off.
The TEST syntax is:
TEST [qualifier(s)]
Tables 2–17 and 2–18 describe the TEST selection and control qualifiers.
Table 2–17 Qualifiers for TEST Selection
Qualifier
Function
/GROUP:n¹
Specifies a decimal number from 0 to 5 that identifies the
group of tests to be run.
/TEST:n¹
Specifies a decimal number from 0 to 32 that identifies the
tests to be run.
/SUBTEST:n¹
Specifies a decimal number from 0 to 32 that identifies the
subtests to be run.
/VERBOSE
Enables a display of all individual tests during execution.
/NOTRACE
Disables test traces.
¹ n can be a:
•
Single value
•
Range separated by a colon (1:5)
•
List separated by commas (1,5,9)
•
Combination of range and list (1:6,8,10,11:29)
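The value forms accepted for n (single value, range, list, and combinations) can be expanded into an explicit list of numbers. The following Python sketch shows one way to do so. The function name and its low/high parameters are hypothetical, and the console's own parsing may differ, for example in how it treats duplicates or reversed ranges.

```python
def parse_selection(spec, low, high):
    """Expand a selection such as '1:6,8,10,11:29' into a list of integers.

    Items are separated by commas; a range is written first:last.
    Values outside low..high, or ranges with first > last, are rejected.
    """
    selected = []
    for item in spec.split(","):
        if ":" in item:
            first_text, last_text = item.split(":")
            first, last = int(first_text), int(last_text)
        else:
            first = last = int(item)
        if not (low <= first <= last <= high):
            raise ValueError(f"{item!r} is not within {low} to {high}")
        selected.extend(range(first, last + 1))
    return selected


if __name__ == "__main__":
    # Test numbers run from 0 to 32 according to Table 2-17.
    print(parse_selection("1:6,8,10,11:29", 0, 32))
```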
Table 2–18 Qualifiers for TEST Control
Qualifier
Function
/PASSCOUNT:n
n is a decimal number from 0 to MAXINT. When n is 0, the
passcount is infinite.
/NOTRACE
Disables the test traces.
/COE
Continues on error.
/NOCONFIRM
Disables the test confirmation on destructive tests.
/EXTENDED
Enables extended error reports.
/NOSTATUS
Disables status messages and reports.
/LIST
Lists the available tests, but does not run them.
When you do not supply the qualifier(s), TEST runs all the nonextended tests
(except those that require confirmation).
2.8.17 X(transfer)
X is used by automatic systems communicating with the console. X is not
intended for use by operators.
X loads or unloads the count of bytes beginning at the specified address.
When the high-order bit of the count longword is 1, the data is read from physical
memory to the console terminal. When the high-order bit of the count longword
is 0, the data is written from the console terminal to physical memory.
The X syntax is:
X address-spec count
Return
data-stream checksum
The address-spec is a hexadecimal number that specifies a physical address.
The count is an 8-bit hexadecimal number that specifies a number of bytes.
The data-stream contains the bytes to be transferred by X. The checksum is a
2-digit hexadecimal number that specifies the 2’s complement checksum of the
data-stream. The checksum verifies the data-stream.
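For reference, the checksum and transfer-direction rules can be sketched in Python. This is an illustration under stated assumptions, not the console implementation. The function names are hypothetical, and the sketch assumes that the 2's complement checksum is the 8-bit value that makes the sum of the data bytes plus the checksum equal zero modulo 256.

```python
def twos_complement_checksum(data: bytes) -> int:
    """Return the 8-bit 2's complement checksum of data.

    Assumption: (sum of all data bytes + checksum) modulo 256 is zero.
    """
    return (-sum(data)) & 0xFF


def reads_from_memory(count: int) -> bool:
    """True when the high-order bit of the count longword is 1.

    Per the text, a 1 means data is read from physical memory to the
    console terminal; a 0 means data is written to physical memory.
    """
    return bool(count & 0x80000000)


if __name__ == "__main__":
    stream = bytes([0x01, 0x02, 0x03])
    # Format the checksum as a 2-digit hexadecimal number.
    print(f"{twos_complement_checksum(stream):02X}")
```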
2.8.18 Z
Z connects to the firmware of another module in the system.
The Z syntax is:
Z[/PATH=path-number]
Table 2–19 describes the qualifier.
Table 2–19 Qualifier for Z
Qualifier
Function
/PATH=path-number
Specifies the zone and slot number of a module. The path-number format is zss, where:
z is the zone ID (A or B).
ss is the slot number of the module.
When you do not supply a path, Z tries to connect to the module in slot 1 of the
zone that is running.
Note
Z performs a hard reset on the ATMs, but you need to issue a programmed
reset to load and start the functional firmware. After Z, you must issue a
BOOT from the same zone, or a START/ZONE from the other zone (if that
zone is running the operating system).
2.8.19 !(comment)
The ! (exclamation point) prefixes a comment. The text following the ! is ignored.
The ! syntax is:
!(comment)
Or:
command!(comment)
3
System Maintenance
3.1 In This Chapter
This chapter includes:
•
Maintenance strategy
•
Operating rules and cautions
•
General troubleshooting procedure
•
Module fault LEDs
•
Power system overview
•
Power system maintenance
•
Device status and fault indicators
•
ROM-based diagnostics
3.2 Maintenance Strategy
When a hardware component fails, the Model 810 system uses self-diagnosis
through ROM-based diagnostics (RBDs) to isolate the faulty FRU. Once isolated,
the system automatically:
•
Places the faulty FRU off line
•
Reports the error in the error log
•
Identifies the faulty FRU on the console terminal
•
Turns on the faulty FRU fault LED
3.3 Operating Rules and Cautions
Table 3–1, Table 3–2, and Table 3–3 contain operating rules for use during a
service call. Table 3–4 provides cautions.
Table 3–1 Before Stopping a Zone
Step
Action
1.
Do not depend on the accuracy of a zone ID label. Issue SHOW ZONE before
STOP/ZONE to check the states of both zones.
2.
Issue SHOW SYSTEM to make sure that the FTSS$SERVER process is running
before turning off zone power, or pressing the Break key.
3.
Check both zone control panels. The System Fault indicator in the failing zone
should be on.
4.
Check console messages and error log for related problem information.
5.
Always issue SHOW DEV D before STOP/ZONE to make sure that shadow set
copying is not in progress.
6.
Issue STOP/ZONE. Wait for the zone to initialize, and then turn off zone power.
7.
Remove the cross-link assembly.
Table 3–2 After a Zone is Repaired
Step
Action
1.
Replace the cross-link assembly.
2.
Turn on zone power.
3.
Issue SHOW MODE to make sure that the zone is set to: MODE = FAILSAFE.
4.
Issue START/ZONE.
5.
Check the running zone console for the following message: % FTSS-S-ZONEAVAIL.
6.
If the message in step 5 does not appear on the console, consider replacing the
cross-link assembly.
7.
Monitor the console for the following environmental information messages:
"OPERATING ON EXTERNAL POWER"
"OPERATING BATTERY POWER" (Life approx 1 hr.)
"NORMAL ZONE TEMPERATURE"
"YELLOW ZONE TEMPERATURE"
"BATTERY TEST PASSED IN CABINET....."
"BATTERY TEST FAILED" (Battery not present)
FTSS messages....
Table 3–3 Before Leaving the Site
Step
Action
1.
Issue SHOW DEVICE D to make sure that all disks are either shadow set
members or in the process of being copied.
2.
Issue SHOW DEVICE E to make sure that all EP/EF drivers are on line.
3.
Use FTSS$FSM to show the failover set status:
MCR FTSS$FSM Return
FSM> SHOW ADAPTER Return
4.
Issue SHOW DEV PW to make sure the PW driver is on line.
5.
Issue SHOW CLUSTER/CONTINUE (ADD CIRCUITS,
CONNECTIONS,LPORT,RPORT) to check for correct DSSI configuration:
$ SHOW CLUSTER/CONTINUE Return
COMMAND> ADD CIRCUITS, CONNECTIONS,LPORT,RPORT Return
[Sample SHOW CLUSTER display; the column alignment was lost in extraction.
Columns, grouped under SYSTEMS, MEMBERS, CIRCUITS, and CONNECTIONS:
NODE, SOFTWARE, STATUS, LPORT, RPORT, RP_TYP, CIR_STA, LOC_PROC_NAME, CON_STA.
The display shows node FTSYS (VMS V5.4) with OPEN circuits on local ports
PWA0, PWB0, PWF0, and PWG0 to remote ports 6 and 7 (type SWIFT), with local
processes SCS$DIRECTORY, MSCP$TAPE, and MSCP$DISK in the LISTEN state. It also
shows the DSSI nodes SYSA, USERS, FTTA, SYSB, DISK1, and FTTB (software RFX V200
or RFX V246, type RF35), each with OPEN circuits and an OPEN
VMS$DISK_CL_DRVR connection.]
6.
Make sure that the Break keys on both zones are disabled (zone control panel
SECURE LED is on).
Table 3–4 Cautions
1.
Do not press ZONE HALT ENABLE and the Break key to stop a running zone.
Use STOP/ZONE. If ZONE HALT ENABLE is used, CONTINUE will not resume
zone operation.
2.
Do not press the Break key or cycle power during the power on or RBD tests. This
action may corrupt the EEPROM.
3.
Do not perform a Simplex boot (MODE = FAILSTOP) from a disk used by the
running zone. This action may corrupt the disk.
4.
Do not turn off zone power or halt a zone if the FTSS$SERVER is not loaded and
running.
3.4 General Troubleshooting Procedure
Table 3–5 provides a general procedure for isolating and replacing a faulty FRU.
While the repair is being performed, the user application continues to run.
Table 3–5 General Troubleshooting Procedure
Step
Action
1.
Check both zone control panels. The System Fault indicator in the failing zone
should be on.
2.
If the zone is not already stopped, ask the system manager or other responsible
system person to perform a SHOW ZONE and STOP/ZONE.
After the system manager stops the zone, remove the cross-link assembly.
If you are given permission to stop the zone, use the procedure specified in
Table 3–1.
3.
Check all fault LEDs and the console messages.
To verify that the correct FRU has been isolated, check the error log.
If a fault LED is on and/or a console message indicates that an FRU has been
removed from service, replace the FRU. (See Chapter 5, FRU Removal and
Replacement Procedures.)
Note
Before removing and replacing any module, check the Power Module
indicators (Table 3–9) to rule out any potential power problems.
4.
If the replaced FRU corrected the problem, turn on zone power.
5.
If the repaired zone passes the power on diagnostics, turn off zone power and
reconnect the cross-link assembly.
6.
Turn on zone power. If the power on diagnostics and the duplex compatibility test
pass with the cross-link assembly connected, turn the system over to the system
manager.
The system manager is responsible for synchronizing the system and returning it
to duplex operation.
7.
If the replaced FRU did not correct the problem, open the system cabinet front
door. Check all module and disk drawer fault LEDs.
If any fault LED is on, replace the associated module or device. (See Chapter 5,
FRU Removal and Replacement Procedures.)
8.
If no module or disk fault LED is on, open the system cabinet rear door. Check all
module LEDs in the miscellaneous and interface module card cages.
If a fault LED is on, replace the associated module. (See Chapter 5, FRU Removal
and Replacement Procedures.)
9.
If no module fault LED is on, open the expansion cabinet rear door. Check the disk
power fault indicators to eliminate any potential power problems. (See Figure 3–7
and Figure 3–9.)
If a power fault indicator is on, replace the device. (See Chapter 5, FRU Removal
and Replacement Procedures.)
10.
If no power fault indicator is on, open the expansion cabinet front door and check
all disk and tape unit fault LEDs and indicators. (See Figure 3–6, Figure 3–8, and
Table 3–23.)
If any LED or fault indicator is on, replace or repair the failing device. (See
Chapter 5, FRU Removal and Replacement Procedures.)
11.
If no fault LEDs or indicators are on, run the error log utility. (See Chapter 4,
Error Handling and Analysis.)
Use the OpenVMS HELP facility to help you run the utility as shown in the
following example.
Qualifier examples can be displayed at the ANALYZE Subtopic? prompt as
shown at the end of the code example.
$ HELP ANALYZE/ERROR_LOG
ANALYZE
/ERROR_LOG
Invokes the Errorlog Report Formatter (ERF) to selectively report
the contents of an error log file. The /ERROR_LOG qualifier is
required. For a complete description of the OpenVMS Analyze Error
Log Utility, including more information about the ANALYZE/ERROR_LOG
command and its qualifiers, see the OpenVMS Error Log Utility Reference
Manual.
Format:
ANALYZE/ERROR_LOG [file-spec[,...]]
Additional information available:
Parameters Command_Qualifiers
/BEFORE /BINARY /BRIEF /ENTRY /EXCLUDE /FULL
/INCLUDE /LOG /OUTPUT /REGISTER_DUMP /REJECTED
/SID_REGISTER /SINCE /STATISTICS /SUMMARY
Examples
ANALYZE /ERROR_LOG Subtopic? Return
ANALYZE Subtopic? Examples Return
12.
If the problem cannot be isolated and repaired, the service call should be escalated
to the Customer Service Center for further action.
3.5 Module Fault LEDs
Figure 3–1 shows all module fault LED locations. Table 3–6 identifies each
module.
Figure 3–1 Module Fault LEDs
[Figure: rear and front views of the CPU cabinet with fault LED callouts 1 through 10. Drawing number MR−0049−93RAGS.]
Table 3–6 Key to Figure 3–1, Module Fault LEDs
Key
Module
1
CPU module
2
ATM module
3
System Fault (zone control panel)
4
Front end unit
5
DC3 converter
6
DC5 converter
7
Power system controller
8
Console module
9
CAMP module
10
DSSI and Ethernet interface modules
3.6 Power System Overview
The following sections describe the power distribution and power components.
Figure 3–2 and Figure 3–3 are basic block diagrams of the system power and
power distribution.
Table 3–7 provides a functional summary of the power components. Table 3–8 is
a DC voltage summary.
Figure 3–2 Power System Block Diagram (1 of 2)
UTILITY POWER INPUT
120 Vac, 60 Hz
240 Vac, 50 Hz
Optional
Uninterruptible
Power System
AC POWER OUTPUT AND DISTRIBUTION
Power
Distribution
Boxes
With UPS: AC Power Distributed to
System and Expansion Cabinets
Without UPS: AC Power Distributed
to Expansion Cabinet
DC POWER OUTPUT AND DISTRIBUTION
48V_DRCT
Front End
Unit
48V_SWD
DC5
5 Vdc to Centerplane to CPU/IO ATM/Console
Extender/Interface Modules
3.3 Vdc and 12 Vdc to Centerplane to CPU/IO
ATM/Console Extender/Interface Modules
DC3
Thermal Emulator Output to
Power System Control (PSC)
2 Vdc Output Not Used on Model 810
CAMP
Module
Zone A
and B
LDCs
21 Vdc to CPU and IO ATM Module Clock Logic
48V_PSC to PSC
I2C Bus Power to Module Fault LEDs
5 Vdc In−Zone Disk Control Panel
5 Vdc Terminal DC Power
12 Vdc LDC Control Card
LDC Control and Status to CAMP Module
Zone A and B
Disk Extender
Modules
48V_SWD
DC3 3.3 Vdc/12 Vdc Input
DC5 5 Vdc Input
Console Extender Module
−12 Vdc Input
Console
Extender
Module
−12 Vdc to Centerplane/CPU/IO
ATM/Interface Modules
Interface
Module
MR−0500−92RAGS−A
Figure 3–3 Power System Block Diagram (2 of 2)
DC POWER OUTPUT AND DISTRIBUTION
DC3 3.3 Vdc/12 Vdc Input
DC5 5 Vdc Input
Console Extender Module
−12 Vdc Input
DC3 3.3 Vdc/12 Vdc Input
DC5 5 Vdc Input
Console Extender Module
−12 Vdc Input
CPU
Module
IO ATM
Module
12 Vdc
3.3 Vdc
Internal
DC to DC
Converter
I2C Bus−Power Status to System
Power Fail Function (POK_H) to
I/O Devices and Options
From CAMP Module: Initiate
Power On Sequence
DC3 Thermal Emulator Input
System Temperature Monitor
Centerplane DC Voltage Monitor
LDC Status Monitor
Power
System
Control
Initiate Overtemperature
Power Off Sequence
Initiate Overvoltage
Power Off Sequence
Initiate Undervoltage
Power Off Sequence
Fan Speed Commands to CAMP Module
Report LDC Status and Faults to System
MR−0500−92RAGS−B
Table 3–7 Power System Functional Summary
FRU
Functional Summary
Local Disk Converter
(LDC)
An LDC is located in each in-zone disk drawer. It provides
+12 Vdc with fast transient response and tolerance to short-term
loading during disk spinup. Also provides +5 Vdc for power
logic, and EMI filtering for the 48 V bus.
It provides VTERM, which is a 5 V diode isolated output, and
current limited for powering the I/O bus terminators. Fusing is
included to prevent a fault on one LDC from loading the 48 V
bus and crashing the entire power system.
Front End Unit (FEU)
H7884-AA
Provides the main ac circuit breaker, and generates two +48 V
outputs:
•
Unswitched (DRCT) which supports the CAMP and Disk
Extender modules, LDCs, DC3, and DC5
•
Switched (SWD) which supports the interface modules, and
Console and Disk extender modules
Also provides programmable fan power output from +11 to +27
Vdc which allows the system to adjust the fan speed based on
system temperature. The PSC monitors the system temperature
through a thermal emulator in DC3, and sends fan speed
commands through the CAMP module to the FEU to adjust
the fan power output.
Power System
Controller (PSC)
H7851-AA
An I2C bus allows the PSC to write power status information
to the system, and provides a power fail signal (POK_H) to the
mass storage devices and I/O options. Receives commands from
the CAMP module to initiate the logic power on sequence by
commanding the FEU to turn on the +48 V switched output and
enable the DC3 and DC5 outputs.
The PSC also drives the power system visual status indicators.
It monitors system temperature through the thermal emulator in
DC3 and sends fan speed commands through the CAMP module
to the FEU for fan power and fan speed control. Provides a
warning when system temperatures are beyond the normal
operating range:
Green Zone = 5°C (41°F) to 52°C (126°F)
Yellow Zone = 5°C (41°F) to 62°C (144°F)
Red Zone = 5°C (41°F) to 75°C (167°F)
Initiates the power off sequence when system temperature
reaches the red zone.
The PSC monitors the centerplane voltages and initiates a power
off on an undervoltage fault; fires the crowbar and initiates a
power off on an overvoltage fault.
Also initiates a power off if the FEU indicates a 48 V output is
out of tolerance, or there is less than 4 milliseconds of reserve
power, and on a fan failure. The PSC monitors the LDC status
and reports failures to the system.
DC5 H7179-AA
DC to dc converter which provides +5 Vdc to the CPU, MMB,
SIMMs, I/O ATM, interface and console extender modules, as
well as +5 Vdc to the I/O ATM internal +5 Vdc to +3.3 Vdc
converter for the SOC.
Provides EMI filtering on the 48 V bus, and fusing to prevent
the power system from crashing due to a short circuit on a
converter input. Supports the crowbar SCR on a 5 V overvoltage
or undervoltage fault.
DC3 H7178-AA
DC to dc converter which provides +3.3 Vdc to the CPU, I/O ATM,
interface and console extender modules. Provides +12 Vdc to the
console extender module +12 V to -12 V converter for the CPU
and I/O ATM modules, and the +21 V converter for the CPU and
I/O ATM clock logic.
Provides EMI filtering on the 48 V bus, and fusing to prevent
the power system from crashing due to a short circuit on a
converter input. Supports the crowbar SCR on a 3 V or 12 V
undervoltage or overvoltage fault. Provides system temperature
sensing through the thermal emulator.
The emulator provides system temperature information to the
PSC for system cooling fan speed control and for power off in the
event of an overtemperature condition.
CAMP module
Control and Miscellaneous Power module. Provides
miscellaneous custom power control circuits.
Console extender
module
Provides local and remote console terminal ports, modem port,
and zone control panel interface.
Fan current sense
board (FCSB)
Monitors the fan current and rotation, and generates a
rotation signal to the CAMP module. The CAMP module in
turn generates a tachometer signal to the PSC for fan speed
monitoring and control.
Zone A and B power
controllers
Provide ac utility power to the peripheral devices. Power
controllers are located in the expansion cabinet.
Power I2C bus
Provides serial communication between the PSC, console
extender, and I/O ATM modules. The PSC uses the bus to
write power status information.
The I/O ATM uses the bus to control the zone control panel
LEDs through the console extender module. It also writes the
Ethernet hardware addresses.
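The temperature zones listed for the PSC in Table 3–7 can be read as a simple threshold check. The following Python sketch shows that reading. It is an interpretation, not the PSC firmware: it assumes that the upper limit of each listed range (52°C and 62°C) is the boundary between zones, it does not model temperatures below 5°C, and the function names are hypothetical.

```python
GREEN_MAX_C = 52    # assumption: upper limit of the green zone
YELLOW_MAX_C = 62   # assumption: upper limit of the yellow zone


def temperature_zone(temp_c):
    """Classify a temperature in degrees Celsius as green, yellow, or red."""
    if temp_c <= GREEN_MAX_C:
        return "green"
    if temp_c <= YELLOW_MAX_C:
        return "yellow"
    return "red"


def psc_response(temp_c):
    """Map a zone to the PSC behavior described in Table 3-7."""
    responses = {
        "green": "normal operation",
        "yellow": "warning",
        "red": "initiate power off sequence",
    }
    return responses[temperature_zone(temp_c)]


if __name__ == "__main__":
    for reading in (40, 60, 70):
        print(reading, temperature_zone(reading), psc_response(reading))
```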
Table 3–8 System DC Voltage Summary
Component
Supplies . . .
To . . .
DC5 (H7179-AA)
+5 Vdc
CPU, I/O ATM, console extender, and
interface modules
DC3 (H7178-AA)
+3.3 Vdc
CPU, I/O ATM, console extender, and
interface modules
DC3 (H7178-AA)
+12 Vdc
CPU, I/O ATM, console extender, and
interface modules
FEU (H7884-AA)
+48V_DRCT
(direct)
CAMP and disk extender modules, LDCs,
DC3, and DC5
FEU (H7884-AA)
+48V_SWD
(switched)
Console extender, disk extender, and
interface modules
CAMP 48V_DRCT to 12
V converter
VBIAS12
I2C bus power to drive module fault LEDs
CAMP 48V_DRCT to 12
V converter
VBIAS5
CAMP module internal bias voltage
Console extender module
+48_SWD to -12 V
converter
-12 Vdc
CPU and I/O ATM modules
CAMP +12 V to +21 V
converter
+21 Vdc
CPU and I/O ATM module clock logic
FEU (H7884-AA)
11 Vdc to 27
Vdc
Programmable fan control power
Local disk converter
(LDC)
+5 Vdc
In-zone disk control panel
LDC
+12 Vdc
LDC control card
LDC
+5 VTERM
Terminal dc power
3.7 Power System Maintenance
Figure 3–4 shows the location of the power module controls and indicators.
Table 3–9 describes module functions and repair action.
Tables 3–10 through 3–17 describe the Fault ID Display codes of the PSC.
Figure 3–4 Power Module Controls and Indicators
[Figure: FEU, DC3, DC5, PSC, and CAMP modules with control and indicator callouts 1 through 16. Drawing number MR−0483−92RAGS.]
Table 3–9 Key to Figure 3–4, Power Module Controls and Indicators
Item
Control/Indicator
Function
Repair Action
1
AC Circuit Breaker
2
FEU Failure
When on, indicates the
dc output voltages for the
FEU are below the specified
minimum.
Replace the FEU. See
Chapter 5.
3
FEU OK
When on, indicates the
dc output voltages for the
FEU are above the specified
minimum.
4
DC3 Failure
When on, indicates that
one of the output voltages
is not within the specified
tolerances.
Replace the dc converter.
See Chapter 5.
5
DC3 OK
When on, indicates that the
output voltages are within
the specified tolerances.
6
AC Present
When on, indicates ac power
is present at the ac input
connector, regardless of the
position of the circuit breaker.
If the system will not
power on, and the ac LED
is the only LED on, check
the circuit breaker.
If ac power is present,
check the power source and
power cord.
7
DC5 Failure
When on, indicates that
one of the output voltages
is not within the specified
tolerances.
Replace the dc converter.
See Chapter 5.
8
DC5 OK
When on, indicates that the
output voltages are within
the specified tolerances.
9
PSC Failure
When on, indicates a PSC
fault.
Replace the PSC. See
Chapter 5.
10
PSC OK
When blinking, indicates the
PSC is performing power-on
self-tests.
When on, indicates the PSC
is functioning.
11
Over Temperature
Shutdown
When on, indicates that
the PSC shut down the
system because of an internal
overtemperature condition.
Set the circuit breaker
to off and wait 1 minute
before turning system
power on.
Make sure the air intake
is unobstructed and that
the room temperature does
not exceed the maximum
requirement.
12
Fan Failure
When on, indicates a fan
failure. Use the hexadecimal
number in the Fault ID
Display to isolate the fan.
Replace the fan. See
Chapter 5.
13
Disk Drive Power
Failure
When on, indicates a disk
drive power failure. Use the
hexadecimal number in the
Fault ID Display to isolate
the storage compartment that
houses the disk drive.
The faulty unit is probably
the local disk converter
(LDC). To isolate the LDC,
disconnect the drives on
the specified bus, and turn
on system power.
If the indicator stays
on with the drives
disconnected, replace the
failing LDC. See Chapter 5.
A cable or drive may also
be at fault.
14
Fault ID Display
Displays the power
subsystem fault codes.
15
PSC Reset Button
When out, indicates a PSC
fault condition.
Press in to reset.
16
CAMP Fan Fault
When on, indicates that a fan
fault caused all disk drives
and tape drives to shut down.
Replace the fan. See
Chapter 5.
Table 3–10 Fan, LDC, Temperature Error Codes
Error  PSC  PSC      LDC    FAN
Code   OK   Failure  Fault  Failure  Error Description
0      On   Off      — [1]  —        Normal operation, displayed after PSC
                                     passes self-test
1      —    —        —      On       Fan 1 failed
2      —    —        —      On       Fan 2 failed
3      —    —        —      On       Fan 3 failed
4      —    —        —      On       Fan 4 failed
9      —    —        —      On       Access door opened, or two or more fans
                                     failed
A      —    —        On     —        LDCA (LDC0) failed
B      —    —        On     —        LDCB (LDC1) failed
C      —    —        On     —        LDCC (LDC2) failed
D      —    —        On     —        LDCD (LDC3) failed
A      —    —        On     —        LDCE (LDC4) failed
—      —    —        On     —        LDCF (LDC5) failed
—      —    —        On     —        LDCG (LDC6) failed
—      —    —        On     —        LDCH (LDC7) failed
7      Off  On       —      —        Temperature sensor failed, low reading
8      Off  On       —      —        Temperature sensor failed, high reading
—      —    —        On     —        Temperature in red zone
[1] Dash entries = LED state NOT changed by error
The PSC Fault ID Display provides a continuous, 1-character rotating display of
the 4-character error codes listed in Tables 3–11 to 3–17. Character display time
is approximately 1/2 second.
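The following Python sketch illustrates one way to turn four successive characters read from the rotating display into a code and to pick the table to consult. The function names and the prefix-to-table mapping are assumptions read off the code ranges in Tables 3–11 to 3–17 (E0xx and EFFF for the PSC, E10x to E14x for the dc to dc converters, E2xx for the FEU); they are not part of the PSC firmware. The sketch also assumes that the letter E appears only as the first character of a code, which holds for every code listed in those tables.

```python
# Hypothetical helpers for reading the PSC Fault ID Display.
# The prefix ranges below are assumptions taken from Tables 3-11 to 3-17.
TABLE_BY_PREFIX = [
    ("E0", "Table 3-12 (PSC)"),
    ("E10", "Table 3-13 (dc to dc converters)"),
    ("E11", "Table 3-14 (2 V converter)"),
    ("E12", "Table 3-15 (3 V converter)"),
    ("E13", "Table 3-16 (5 V converter)"),
    ("E14", "Table 3-17 (12 V converter)"),
    ("E2", "Table 3-11 (FEU)"),
]


def assemble_code(window):
    """Rotate four successive display characters so the code starts at 'E'."""
    chars = [c.upper() for c in window]
    if len(chars) != 4 or chars.count("E") != 1:
        raise ValueError("expected four characters containing exactly one 'E'")
    start = chars.index("E")
    return "".join(chars[start:] + chars[:start])


def table_for(code):
    """Return the table that lists the code, or None if no range matches."""
    code = code.upper()
    if code == "EFFF":
        return "Table 3-12 (PSC)"
    for prefix, table in TABLE_BY_PREFIX:
        if code.startswith(prefix):
            return table
    return None
```

Because the display rotates continuously, observation can begin at any character; rotating the window to the single E recovers the code regardless of where reading started.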
Table 3–11 FEU Error Codes
Error
Code
FEU
OK
FEU
Failure
Error Description
E200
Off
On
48V_SWITCHED OK before enabling
E201
Off
On
Fan converter operating before enabling
E202
Off
On
HVDC is OK, but POWER is not OK (contradictory
status)
E203
Off
On
The ac current is not OK (in idle state/loop)
E204
Off
On
48V_DIRECT is not OK and POWER is OK (IRQ18)
E205
Off
On
48V_SWITCHED is not OK and switched bus requested
(IRQ19)
E206
Off
On
HVDC is OK, but POWER is not OK (IRQ20)
E210
Off
On
SWITCHED BUS did not turn on at startup
E211
Off
On
SWITCHED BUS did not turn off at shutdown
E212
Off
On
The ac current is high for the second time (in startup or
run loop)
E220
Off
On
Fan converter voltage is low
Table 3–12 PSC Error Codes
Error
Code
PSC
OK
PSC
Failure
Error Description
EFFF
Off
On
Invalid error number (in display_error procedure)
E000
Off
On
Unused error condition
E001
Off
On
PSC bias supply not OK
E002
Off
On
80C196 internal register test failed
E003
Off
On
80C196 operational test failed
E004
Off
On
80C196 on-chip RAM test failed
E005
Off
On
ROM checksum test failed
E006
Off
On
External RAM test failed
E007
Off
On
Port FF20 (PSC/FEU LEDs) not initially zero
E008
Off
On
Port FF22 (Module enable) not initially zero
E009
Off
On
Port FF23 (DC-DC LEDs) not initially zero
E010
Off
On
Port FF24 (LDC enable) not initially zero
E011
Off
On
External interrupt test failed (8259 did not clear test bit)
E012
Off
On
Masked interrupt occurred (A/D conversion complete)
E013
Off
On
Masked interrupt occurred (HSI data available)
E014
Off
On
Masked interrupt occurred (HSO)
E015
Off
On
Masked interrupt occurred (HSI pin 0)
E016
Off
On
Masked interrupt occurred (Serial I/O)
E017
Off
On
Software trap interrupt occurred (F7 instruction
executed)
E018
Off
On
Unimplemented opcode interrupt occurred (invalid
instruction)
E019
Off
On
Masked interrupt occurred (HSI FIFO 4th entry)
E020
Off
On
Masked interrupt occurred (Timer 2 capture)
E021
Off
On
Masked interrupt occurred (Timer 2 overflow)
E022
Off
On
PSC bias supply failed (NMI occurred)
E023
Off
On
Invalid interrupt number (>31) received from 8259
E024
Off
On
IRQ4 occurred (slave 0 to master 8259)
E025
Off
On
IRQ5 occurred (slave 1 to master 8259)
E026
Off
On
IRQ6 occurred (slave 2 to master 8259)
E027
Off
On
Masked IRQ13 occurred (FEU DIRECT 48 became OK)
E028
Off
On
Masked IRQ14 occurred (FEU SWITCHED 48 became
OK)
E029
Off
On
Masked IRQ16 occurred (FEU POWER became OK)
E030
Off
On
External interrupt test, not enabled (IRQ22)
E031
Off
On
External interrupt test, bit not set (IRQ22)
E032
Off
On
Masked IRQ25 occurred (OCP DC ON, turned on)
E033
Off
On
Masked IRQ26 occurred (PSC DC ON, turned on)
E034
Off
On
Invalid converter number (start of enable_converter
procedure)
E035
Off
On
Invalid converter number (end of enable_converter
procedure)
E036
Off
On
Invalid converter number (start of disable_converter
procedure)
E037
Off
On
Invalid converter number (end of disable_converter
procedure)
E047
Off
On
Unused error condition
E078
Off
On
Unused error condition
E079
Off
On
Unused error condition
E086
Off
On
Unused error condition
E087
Off
On
Unused error condition
E088
Off
On
Unused error condition
E091
Off
On
Unused error condition
E092
Off
On
Unused error condition
E093
Off
On
Unused error condition
E094
Off
On
Unused error condition
E095
Off
On
Unused error condition
E096
Off
On
Unused error condition
E097
Off
On
Unused error condition
E098
Off
On
Unused error condition
E099
Off
On
Unused error condition
Table 3–13 12 V DC to DC Converter Error Codes
Error  12V    12V    5V   5V     3V   3V     2V   2V
Code   OK     Fault  OK   Fault  OK   Fault  OK   Fault  Error Description
E010   — [1]  —      Off  On     Off  On     —    —      Delta 0 V
E101   —      —      Off  —      Off  —      Off  —      Indeterminate converter
                                                         overvoltage (IRQ7)
E102   Off    —      Off  —      Off  —      Off  —      Indeterminate converter
                                                         overvoltage/undervoltage
                                                         (IRQ15)
E103   Off    On     Off  On     Off  On     Off  On     Unknown converter
                                                         overvoltage/undervoltage
                                                         condition
[1] Dash entries = LED state NOT changed by error
Table 3–14 2 V DC to DC Converter Error Codes
Error
Code
2V
OK
2V
Fault
Error Description
E110
Off
On
Out of regulation low
E111
Off
On
Out of regulation high
E112
Off
On
Undervoltage
E113
Off
On
Overvoltage
E114
Off
On
Voltage present when disabled
E115
Off
On
Did not turn off
Note
The 2 V converter output is not used on the Model 810.
Table 3–15 3 V DC to DC Converter Error Codes
Error
Code
3V
OK
3V
Fault
Error Description
E120
Off
On
Out of regulation low
E121
Off
On
Out of regulation high
E122
Off
On
Undervoltage
E123
Off
On
Overvoltage
E124
Off
On
Voltage present when disabled
E125
Off
On
Did not turn off
Table 3–16 5 V DC to DC Converter Error Codes
Error
Code
5V
OK
5V
Fault
Error Description
E130
Off
On
Out of regulation low
E131
Off
On
Out of regulation high
E132
Off
On
Undervoltage
E133
Off
On
Overvoltage
E134
Off
On
Voltage present when disabled
E135
Off
On
Did not turn off
Table 3–17 12 V DC to DC Converter Error Codes
Error
Code
12V
OK
12V
Fault
Error Description
E140
Off
On
Out of regulation low
E141
Off
On
Out of regulation high
E142
Off
On
Undervoltage
E143
Off
On
Overvoltage
E144
Off
On
Voltage present when disabled
E145
Off
On
Did not turn off
3.8 Device Status and Fault Indicators
The following sections describe the device status and fault indicators.
3.8.1 RF35 Disk Drawer
Figure 3–5 shows the RF35 disk drawer controls and indicators. Table 3–18
describes their functions.
Figure 3–5 RF35 Disk Drawer Controls and Indicators
[Figure labels: drive positions D0 to D5; FAULT; WRITE PROT; ON LINE; PWR ON/OFF; SET UP (SU).]
Table 3–18 RF35 Disk Drawer Controls and Indicators
Control/Indicator  Color  State     Operating Condition
Fault              Red    On        Drive is faulty.
                          Off       Drive is functioning correctly.
Write Protect      Amber  Out, off  System can read from the disk and write to
                                    the disk.
                          In, on    System cannot write to the disk, but can
                                    read from the disk.
On Line            Green  Out, off  Drive is disabled.
                          In, on    Drive is enabled.
Power On/Off       Green  In, on    Power is on.
                          Out, off  Power is off.
Set Up Switch             In        Prevents the drive from joining the DSSI
                                    cluster. Also allows you to set the DSSI
                                    parameters for a new drive or a drive you
                                    replace in the system after repair. (If you
                                    want to set the DSSI parameters, you press
                                    the Set Up switch and the Power On/Off
                                    switch at the same time.)
                          Out       Has no effect on the drive.
3.8.2 SF35 Storage Array
Figure 3–6 shows the operator control panel. Table 3–19 describes its controls and indicators.
Figure 3–7 shows the rear of the storage array. Table 3–20 describes the functions
of the controls and indicator located at the rear of the storage array.
Figure 3–6 SF35 Operator Control Panel
[Figure labels: Operator Control Panel (OCP); Front A, B, C; Rear D, E, F; Ready; Write Protect; Fault; Fault Indicators.]
Table 3–19 SF35 Operator Control Panel Description
Control/Indicator
Function
Ready
Push-to-set switch with green indicator. Brings the integrated storage
element (ISE) on-line in about 10 seconds. The indicator remains on
while the ISE is on-line.
Write Protect
Push-to-set switch with amber indicator. Write protects the data on
the ISE. The data cannot be overwritten, nor can new data be written
to the ISE.
Fault
Recessed switch with multi-color indicator. Controls the MSCP.
This switch is equivalent to the SU switch. The colors indicate the
following conditions:
Green (in) = MSCP is disabled.
Green (out)= MSCP is enabled.
Amber = Fault is detected while the MSCP is disabled.
Red = ISE fault.
Off = Normal MSCP operation.
Drive DC Power
Switches
One switch/indicator for each ISE. Apply power to the ISEs. Each
ISE spins up and runs a self-test. The indicator shows that nominal
power is being applied to the ISE. (If you want to bring the ISE
on-line, you press the Ready switch next.)
Figure 3–7 SF35 Rear Panel Fault Indicator
[Figure labels: DSSI Connectors; AC Power Switch (1/0); Power Supply Fault Indicator (behind panel); Line Voltage Selector Switch, 230/115 (behind panel).]
Table 3–20 SF35 Rear Panel Controls and Indicator
Control/Indicator
Function
AC Power Switch
Applies power to the ac power supply.
Line Voltage
Selector Switch
Selects 120 Vac (60 Hz) or 240 Vac (50 Hz) line voltage.
Power Supply
Fault Indicator
When on, indicates an overtemperature condition.
3.8.3 SF73 Storage Array
Figure 3–8 shows the SF73 storage array status and fault indicators. Table 3–21
describes their functions. Figure 3–9 shows the controls and indicator located at
the rear of the storage array.
Figure 3–8 Location of SF73 Storage Array LEDs and Switchpacks
[Figure labels: Ready; Write Protect; Fault; DSSI ID.]
Table 3–21 SF73 Front Panel Controls and Indicators
Control/Indicator
Function
Ready
Push-to-set switch with green indicator. Brings the integrated storage
element (ISE) on-line in about 10 seconds. The indicator remains on
while the ISE is on-line.
Write Protect
Push-to-set switch with amber indicator. Write protects the data on
the ISE. The data cannot be overwritten, nor can new data be written
to the ISE.
Fault
Switch with red indicator. When the indicator is on, the ISE failed.
Press the switch to display the fault codes and clear the ISE fault.
The indicator is off during normal operation.
TERM PWR LED
When on, indicates that the correct termination power is being
supplied.
SPLIT LEDs (2)
When on, indicates that the storage array is operating in split-bus
mode.
Switchpacks (4)
One for each of the drives in the storage array. Each switchpack is
used to set the DSSI ID number. The icon on the front of the door
indicates the location of the drive. The three rightmost switches of
each switchpack are the DSSI ID switches. The leftmost switch is the
SU switch.
Drive DC Power
Switches
One switch/indicator for each ISE. Each switch applies power to an
ISE. Each ISE spins up and runs a self-test. The indicator shows
that nominal power is being applied to the ISE. (If you want to bring
the ISE on-line, you press the Ready switch next.)
Figure 3–9 Rear of the SF73 Storage Array
[Figure labels: DSSI Connectors; AC Power Switch (1/0); Power Supply Fault Indicator (behind panel); Line Voltage Selector Switch, 230/115 (behind panel).]
3.8.4 TF85C Tape Drive
Table 3–22 may help you define and correct TF85C tape drive problems.
Table 3–22 TF85C Tape Drive Problems
Problem
Possible Solution
Correctable failure
during operation
If the TF85C drive fails during operation, reset the drive, then
rewind, unload, and remove the cartridge.
If all four indicators are blinking, press the Unload button. If the
failure is correctable, the tape begins to rewind and the yellow
indicator blinks. When the tape is unloaded, the green indicator
turns on and the beeper sounds. Then pull the Insert/Remove
handle to open the drive and remove the cartridge.
Noncorrectable
failure during tape
motion
If the tape does not rewind when the Unload button is pushed, and
all indicators continue to blink, the failure is not correctable. The
drive must be serviced or replaced.
Failure during
cartridge insertion
A cartridge failure occurs if a cartridge is damaged or if internal
portions of the drive that handle the cartridge are not working.
Suspect a cartridge failure if the green indicator blinks, but the
tape does not move (the yellow indicator does not blink). Remove
the cartridge and try another one, or inspect the tape leader and
drive takeup leader.
Figure 3–10 shows the front of the TF85C tape drive. Table 3–23 describes the
indicators shown in Figure 3–10.
Figure 3–10 TF85C Cartridge Tape Drive
[Figure: front of the TF85C tape drive, showing the Write Protected, Tape in Use, Use Cleaning Tape, and Operate Handle indicators, the Unload button, and the Insert/Remove handle with its load and unload instruction label.]
Table 3–23 TF85C Cartridge Tape Drive Indicators
Indicator          Color   State     Operating Condition
Write Protected    Orange  On        Tape is write-protected.
                           Off       Tape is write-enabled.
Tape in Use        Yellow  Blinking  Tape is moving.
                           On        Tape is loaded; ready for use.
Use Cleaning Tape  Orange  On        Drive head needs cleaning or tape is bad.
                                     If it remains on after you unload the
                                     cleaning tape, then the cleaning was not
                                     completed because the tape ended.
                                     If, after cleaning, it turns on again when
                                     the data cartridge is reloaded, then a data
                                     cartridge problem occurred. Try another
                                     cartridge.
Operate Handle     Green   On        Okay to operate the Insert/Remove handle.
                           Off       Do not operate the Insert/Remove handle.
All four                   On        Power-on self-test is in progress.
indicators                 Blinking  A fault is occurring. Press the Unload
                                     button to unload the cartridge. If the
                                     fault is cleared, the yellow indicator
                                     blinks while the tape rewinds. When the
                                     green indicator turns on, you can move the
                                     Insert/Remove handle to remove the
                                     cartridge. If the fault is not cleared, all
                                     four indicators continue to blink. Do not
                                     attempt to remove the cartridge. Refer to
                                     the TF85C service guide.
3.8.5 TF857 Tape Loader
This section describes the power-on process and the operator control panel (OCP)
indicators.
3.8.5.1 Power-On Process
When the TF857 tape loader powers on, all of the indicators on the OCP turn on
within 15 seconds while the power-on self-test (POST) initializes the subsystem.
When POST completes successfully, all OCP indicators, including the Magazine
Fault and Loader Fault indicators, turn off, except for Power On. The elevator
then scans the magazine to find slots that contain cartridges.
3.8.5.2 Operator Control Panel Controls and Indicators
Figure 3–11 shows the OCP controls and indicators. Table 3–24 describes their
functions.
Figure 3–11 TF857 Operator Control Panel
[Figure labels: Eject, Load/Unload, and Slot Select buttons and indicators; Power On; Write Protected; Tape In Use; Use Cleaning Tape; Magazine Fault; Loader Fault; Current Slot Indicators; Mode Select Key (OCP Disabled, Automatic Mode, Manual Mode, Service Mode); OCP Label; DSSI Node ID Label.]
Table 3–24 TF857 OCP Controls and Indicators
Control/Indicator
Color
Function
Eject button
–
Opens the receiver, allowing access to the
magazine for removal and insertion of
cartridges. Also can be used to unload the
tape from the drive to the magazine.
Eject indicator
Green
Indicates that pressing the Eject button opens
the receiver. If a cartridge is in the drive, the
cartridge unloads to the magazine and the
receiver opens. If no cartridge is in the drive,
the receiver opens.
Load/Unload button
–
Loads the currently selected cartridge into the
drive, or unloads the cartridge from the drive
to the magazine.
If the Loader Fault or Magazine Fault
indicators are on, can also be used to reset
the subsystem.
Load/Unload indicator
Green
Indicates you can press the Load/Unload
button.
Slot Select button
–
When pressed, increments the current slot
indicator to the next slot.
Slot Select indicator
Green
Indicates the Slot Select button can be used.
Pressing the button increments the current
slot indicator to the next slot.
Power On indicator
Green
When on, indicates the TF857-AA tape loader
power is on (ac and dc voltages are within
tolerance). When off, indicates the tape loader
power is off.
Write Protected indicator
Orange
When on, indicates the cartridge in the drive
is write protected. When off, indicates the
cartridge in the drive is write enabled.
Tape in Use indicator
Yellow
Indicates tape drive activity as follows:
• Slow blinking indicates tape is rewinding; rapid blinking indicates tape is
  reading or writing.
• When on steadily, indicates a cartridge is in the drive and the tape is not
  moving.
• When off, indicates no cartridge is in the drive.
Magazine Fault indicator
Red
Indicates a magazine failure.
Use Cleaning Tape indicator
Orange
Indicates the read/write head needs cleaning.
Loader Fault indicator
Red
Indicates a TF857-AA tape loader transfer
assembly error or drive error.
Current slot indicators 0–6
Green
Identify the current slot (see Slot Select
button). Each current slot indicator blinks
when its corresponding cartridge moves to or
from the drive. Also used with the Magazine
Fault or Loader Fault indicator to indicate the
type of fault.
3.9 ROM-Based Diagnostics
The following sections describe how to use the TEST and Z commands and to run
the ROM-based diagnostics (RBDs).
3.9.1 TEST
TEST enables the user to test:
• The system
• A zone
• The CPU and memory
Use TEST only when the cross-link state is set to off.
The TEST syntax is:
TEST [qualifier(s)]
Tables 3–25 and 3–26 describe the TEST selection and control qualifiers.
Table 3–25 Qualifiers for TEST Selection
Qualifier       Description
/GROUP:n [1]    Specifies a decimal number from 0 to 5 that identifies the
                group of tests to be run.
/TEST:n [1]     Specifies a decimal number from 0 to 32 that identifies the
                tests to be run.
/SUBTEST:n [1]  Specifies a decimal number from 0 to 32 that identifies the
                subtests to be run.
/VERBOSE        Enables a display of all individual tests during execution.
/NOTRACE        Disables test traces.
[1] n can be a:
• Single value
• Range separated by a colon (1:5)
• List separated by commas (1,5,9)
• Combination of range and list (1:6,8,10,11:29)
Table 3–26 Qualifiers for TEST Control
Qualifier       Description
/PASSCOUNT:n    n is a decimal number from 0 to MAXINT. When n is 0, the
                passcount is infinite.
/NOTRACE        Disables the test traces.
/COE            Continues on error.
/NOCONFIRM      Disables the test confirmation on destructive tests.
/EXTENDED       Enables extended error reports.
/NOSTATUS       Disables status messages and reports.
/LIST           Lists the available tests, but does not run them.
When you do not supply the qualifier(s), TEST runs all the nonextended tests
(except those that require confirmation).
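The value forms accepted by /GROUP, /TEST, and /SUBTEST (single value, range, list, or a combination) can be illustrated with a short Python sketch that expands a selection string into the individual numbers it names. The function name parse_selection and its low and high parameters are hypothetical, and the sketch assumes that both ends of a range are included; the console firmware's own parsing rules are not documented here.

```python
def parse_selection(text, low=0, high=32):
    """Expand a selection such as '1:6,8,10,11:29' into a sorted list.

    Assumption: ranges written as a:b include both a and b.
    """
    selected = set()
    for item in text.split(","):
        item = item.strip()
        if ":" in item:
            start_text, end_text = item.split(":", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(item)
        if start > end or start < low or end > high:
            raise ValueError(f"selection {item!r} is outside {low}..{high}")
        selected.update(range(start, end + 1))
    return sorted(selected)
```

For a /GROUP value, the same function could be called with high=5 to match the limit given in Table 3–25.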
3.9.2 Z
Z connects to the firmware of another module in the system. It is also used to
initiate I/O ROM-based diagnostics.
The Z syntax is:
Z[/PATH=path-number]
Table 3–27 describes the qualifier.
Table 3–27 Qualifier for Z
Qualifier
Function
/PATH=path-number
Specifies the zone and slot number of a module. The path-number
format is zss, where:
z is the zone ID (A or B).
ss is the slot number of the module.
When you do not supply a path, Z tries to connect to the module in slot 1 of the
zone that is running.
Note
Z performs a hard reset on the ATMs, but you need to issue a programmed
reset to load and start the functional firmware. After Z, you must issue a
BOOT from the same zone, or a START/ZONE from the other zone (if that
zone is running the operating system).
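As a small illustration of the zss path-number format, the following Python sketch splits a path number into its zone and slot and builds the corresponding Z command string. The helper names are hypothetical, and the sketch assumes that ss is written as two decimal digits, which the format description suggests but does not state outright.

```python
def parse_path(path_number):
    """Split a path number such as 'A01' into (zone, slot).

    Assumption: the slot is written as exactly two decimal digits.
    """
    text = path_number.strip().upper()
    if len(text) != 3 or text[0] not in ("A", "B") or not text[1:].isdigit():
        raise ValueError(f"not a valid path number: {path_number!r}")
    return text[0], int(text[1:])


def z_command(zone, slot):
    """Build the console command that connects to the given zone and slot."""
    if zone not in ("A", "B") or not 0 <= slot <= 99:
        raise ValueError("zone must be A or B and slot must fit in two digits")
    return f"Z/PATH={zone}{slot:02d}"
```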
3.9.3 CPU ROM-Based Diagnostics
Table 3–28 provides a brief description of the CPU ROM-based diagnostics
(RBDs).
Table 3–28 CPU ROM-Based Diagnostic Descriptions
Group  Test  Subtest  Description
G: 0                  Self-Test
G: 0   T: 0           NVRAM Test
G: 0   T: 0  S: 0     NVRAM CPU EEPROM Data Integrity Test
G: 0   T: 0  S: 1     NVRAM CPU EEPROM Checksum Test
G: 0   T: 0  S: 2     NVRAM I2C Bus Register Access Test
G: 0   T: 0  S: 3     NVRAM Module-ID PROM Access and Data Integrity R/W Test
G: 0   T: 0  S: 4     NVRAM Module-ID PROM Checksum Test
G: 0   T: 0  S: 5     NVRAM System Ethernet Access Test
G: 0   T: 0  S: 6     NVRAM System Ethernet PROM Checksum Test
G: 0   T: 1           P-CACHE Test
G: 0   T: 1  S: 0     P-CACHE Register Bit Test
G: 0   T: 1  S: 1     P-CACHE Tag Integrity Test
G: 0   T: 1  S: 2     P-CACHE Data Integrity Test
G: 0   T: 1  S: 3     P-CACHE Data/Tag Parity Test
G: 0   T: 2           VIC Test
G: 0   T: 2  S: 0     VIC Register Bit Test
G: 0   T: 2  S: 1     VIC Cache Tag Test
G: 0   T: 2  S: 2     VIC Cache Data Test
G: 0   T: 2  S: 3     VIC Cache Data Parity Error Test
G: 0   T: 2  S: 4     VIC Cache Tag Parity Error Test
G: 0   T: 2  S: 5     VIC Branch Prediction Test
G: 0   T: 3           JXD Test
G: 0   T: 4           Memory Test
G: 0   T: 4  S: 0     MEMORY Data Bus & Catastrophic Failure Test
G: 0   T: 4  S: 1     MEMORY Address Uniqueness Test
G: 0   T: 4  S: 2     MEMORY Bank Addressing Test
G: 0   T: 4  S: 3     MEMORY Chip Addressing Test
G: 0   T: 4  S: 4     MEMORY Chip Open Address Lines Test
G: 0   T: 4  S: 5     MEMORY Single-Bit ECC Error Logic Test
G: 0   T: 4  S: 6     MEMORY Double-Bit ECC Error Logic Test
G: 0   T: 4  S: 7     MEMORY ECC Error Logic Test
G: 0   T: 4  S: 8     MEMORY ECC Test
G: 0   T: 4  S: 9     MEMORY ECC Lines Test
G: 0   T: 5           BITMAP Test
G: 0   T: 5  S: 0     BITMAP March Test
G: 0   T: 6           B-CACHE Test
G: 0   T: 6  S: 0     B-CACHE Data RAM Test
G: 0   T: 6  S: 1     B-CACHE Tag RAM Test
G: 0   T: 6  S: 2     B-CACHE ECC RAM Test
G: 0   T: 6  S: 3     B-CACHE Write Test
G: 0   T: 6  S: 4     B-CACHE Data Integrity Test
G: 0   T: 6  S: 5     B-CACHE Data Test (error enabled)
G: 0   T: 7           DMA Test
G: 0   T: 7  S: 0     DMA Powerup State Test
G: 0   T: 7  S: 1     DMA Register Access Test
G: 0   T: 7  S: 2     DMA Address Decode Test
G: 0   T: 7  S: 3     DMA Interlock Access Test
G: 0   T: 7  S: 4     DMA Queue Processing Test
G: 0   T: 7  S: 5     DMA Sub-Transfer Length Test
G: 0   T: 7  S: 6     DMA I/O Byte Alignment Test
G: 0   T: 7  S: 7     DMA Memory Byte Alignment Test
G: 0   T: 7  S: 8     DMA Maximum Transfer Length Test
G: 0   T: 8           XLINK Test
G: 0   T: 8  S: 0     XLINK Serial Cross-link Internal Loopback Test - Part 1
G: 0   T: 8  S: 1     XLINK Serial Cross-link Internal Loopback Request Test
G: 0   T: 8  S: 2     XLINK Serial Cross-link Internal Loopback Reply Test
G: 0   T: 8  S: 3     XLINK Serial Cross-link Internal Loopback Query Test
G: 0   T: 8  S: 4     XLINK Serial Cross-link External Loopback Test
G: 0   T: 8  S: 5     XLINK Serial Cross-link Communication Register Test
G: 0   T: 9           RESET Test
G: 0   T: 9           RESET CPU Module Hard Reset Test
G: 1                  Zone Test
G: 1   T: 0           ACCESS Test
G: 1   T: 0  S: 0     ACCESS Parallel Xlink Loopback Test
G: 1   T: 0  S: 1     ACCESS I/O Module PATH ACCESS Test
G: 1   T: 0  S: 2     ACCESS I/O Module SSC Console Uart Test
G: 1   T: 1           DMA Test
G: 1   T: 2           INTERRUPT Test
G: 1   T: 3           ERROR Test
G: 1   T: 3  S: 0     ERROR I/O Crosscheck Test
G: 1   T: 4           RESET Test
G: 1   T: 4  S: 0     RESET CPU Module Zone Reset Test
G: 1   T: 4  S: 1     RESET I/O Module Reset Test
G: 2                  System Test
G: 2   T: 0           Cross-link Mode Test
G: 2   T: 0  S: 0     Zone A (MASTER -> RESYNC MASTER -> DUPLEX) Mode Test
G: 2   T: 0  S: 1     Zone B (MASTER -> RESYNC MASTER -> DUPLEX) Mode Test
G: 2   T: 1           Zone A MASTER - Zone B SLAVE Mode Test
G: 2   T: 1  S: 0     ACCESS I/O Module Path Access Test
G: 2   T: 1  S: 1     ACCESS I/O Module SSC Console Uart Test
G: 2   T: 1  S: 2     ERROR I/O Crosscheck Test
G: 2   T: 2           Zone A RESYNC_MASTER - Zone B RESYNC_SLAVE Mode Test
G: 2   T: 2  S: 0     ACCESS I/O Module Path Access Test
G: 2   T: 2  S: 1     ACCESS I/O Module SSC Console Uart Test
G: 2   T: 2  S: 2     ERROR I/O Crosscheck Test
G: 2   T: 3           Zone B MASTER - Zone A SLAVE Mode Test
G: 2   T: 3  S: 0     ACCESS I/O Module Path Access Test
G: 2   T: 3  S: 1     ACCESS I/O Module SSC Console Uart Test
G: 2   T: 3  S: 2     ERROR I/O Crosscheck Test
G: 2   T: 4           Zone B RESYNC_MASTER - Zone A RESYNC_SLAVE Mode Test
G: 2   T: 4  S: 0     ACCESS I/O Module Path Access Test
G: 2   T: 4  S: 1     ACCESS I/O Module SSC Console Uart Test
G: 2   T: 4  S: 2     ERROR I/O Crosscheck Test
G: 2   T: 5           DUPLEX Mode Test
G: 2   T: 5  S: 0     ACCESS I/O Module Path Access Test
G: 2   T: 5  S: 1     ACCESS I/O Module SSC Console Uart Test
G: 2   T: 5  S: 2     ERROR I/O Crosscheck Test
The following example shows a CPU RBD error frame.
>>> group: 0 test: 1 subtest:2
======================================================================
----------------------- DIAGNOSTIC TEST ERROR ---------------------
GROUP: 00
Test: 01 Sub: 02
Error: 01 Pass: 00000001
Addr: 00000000
Exp: 00000000 Rec: 000000ff
Xor: 000000ff
Data Miscompare
=======================================================================
The example shows that the P-CACHE Data Integrity Test (group 0, test 1,
subtest 2) was executed and failed. The XOR data specifies a data miscompare.
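The values in the example are consistent with the Xor field being the bitwise exclusive-OR of the Exp (expected) and Rec (received) fields: 00000000 XOR 000000ff gives 000000ff. Under that assumption, the following Python sketch recomputes the Xor field and lists the bit positions that differ. The function names are hypothetical, and the 32-bit field width is taken from the eight hexadecimal digits shown in the frame.

```python
def xor_field(expected_hex, received_hex):
    """Return the exclusive-OR of two hexadecimal fields as 8 hex digits."""
    diff = int(expected_hex, 16) ^ int(received_hex, 16)
    return format(diff, "08x")


def differing_bits(expected_hex, received_hex):
    """List the bit positions (0 = least significant) that miscompare."""
    diff = int(expected_hex, 16) ^ int(received_hex, 16)
    return [bit for bit in range(32) if (diff >> bit) & 1]
```

Applied to the frame above, the differing positions are bits 0 through 7, that is, the low-order byte of the data word.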
3.9.4 I/O ROM-Based Diagnostics
Table 3–29 provides a brief description of the I/O ROM-based diagnostics (RBDs).
Table 3–29 I/O ROM-Based Diagnostic Descriptions
Group  Test  Subtest  Description
G: 0                  I/O Self-Test
G: 0   T: 0           I/O SSC Test
G: 0   T: 0  S: 0     SSC Toy Clock Test
G: 0   T: 0  S: 1     SSC Storage Uart Test
G: 0   T: 0  S: 2     SSC Bus Timeout Test
G: 0   T: 0  S: 3     SSC Interval Timer Test
G: 0   T: 1           I/O VIC Test
G: 0   T: 1  S: 0     VIC Register Test
G: 0   T: 1  S: 1     VIC Interrupt Test
G: 0   T: 2           I/O Firewall Test
G: 0   T: 2  S: 0     Firewall Register Test
G: 0   T: 2  S: 1     Firewall Rail Master Test
G: 0   T: 2  S: 2     Firewall Cross Check Error Test
G: 0   T: 3           I/O Cache Test
G: 0   T: 3  S: 0     CACHE Control Register Bit Test
G: 0   T: 3  S: 1     CACHE Minimum Bank Test
G: 0   T: 3  S: 2     CACHE Data Integrity Test
G: 0   T: 3  S: 3     CACHE Tag Integrity Test
G: 0   T: 3  S: 4     CACHE Tag Parity Detection Test
G: 0   T: 3  S: 5     CACHE Tag Parity Generation Test
G: 0   T: 3  S: 6     CACHE Data Parity Checking Test
G: 0   T: 4           I/O NVRAM Test
G: 0   T: 4  S: 0     Module Data EEPROM Integrity Test
G: 0   T: 4  S: 1     Module I2C EEPROM Integrity Test
G: 0   T: 5           I/O RAM Test
G: 0   T: 5  S: 0     SOC RAM Test
G: 1                  I/O Eself Pcard Test
G: 1   T: 0           I/O SLIM Test
G: 1   T: 0  S: 0     SLIM Register Test
G: 1   T: 0  S: 1     SLIM RAM Test
G: 1   T: 1           I/O SWIFT Test
G: 1   T: 1  S: 0     SWIFT Reset Test
G: 1   T: 1  S: 1     SWIFT Register Test
G: 1   T: 1  S: 2     SWIFT Interrupt Test
G: 1   T: 1  S: 3     SWIFT Internal Loopback Test
G: 1   T: 2           I/O LANCE Test
G: 1   T: 2  S: 0     LANCE Register Test
G: 1   T: 2  S: 1     LANCE Internal Loopback Test
G: 1   T: 2  S: 2     LANCE Interrupt Test
The following example shows an I/O RBD error frame.
>>> z
Connecting to target...Press Ctrl/P to end connection
I
IO1> group: 0 test: 4 subtest:1
======================================================================
----------------------- DIAGNOSTIC TEST ERROR ---------------------
GROUP: 00
Test: 04 Sub: 01
Error: 03 Pass: 00000001
Addr: 00000000
Exp: 00000000 Rec: 000000ff
Xor: 000000ff
Data Miscompare
=======================================================================
The example shows that the Module I2C EEPROM Integrity Test was executed
and failed. The XOR data specifies a data miscompare.
4
Error Handling and Analysis
4.1 In This Chapter
This chapter includes:
• Error handling services overview
• Field replaceable units
• OpenVMS error log
• Module NVRAM status and LED indicators
• FTSS error reporting interface
• Firmware interfaces
• Firmware and OpenVMS interface data structures
• Error log analysis
4.2 Error Handling Services Overview
The primary function of the error handling services (EHS) is to handle and
recover from high-level system interrupts generated by the hardware when an
error is detected. When an error occurs, the EHS is invoked by hardware as an
interrupt service routine.
The interrupt service routine isolates the failure by examining various system
registers. The isolation process occurs at a high system priority level; it pauses
the OpenVMS operating system until it is complete.
After isolating the faulty FRU, the EHS determines the appropriate actions
to take. For solid errors, system deconfiguration is performed and the FRU is
removed from service. This usually involves performing module resets to invoke
diagnostics.
EHS error notification is described in Table 4–1.
Table 4–1 EHS Error Notification
Step  Action
1.    Entries are made in the system error log.
2.    Status information is written to the module ID NVRAM and the DCB, where
      applicable.
3.    The LED indicator associated with a failed module is set.
4.    A call is issued to the error reporting interface (ERI), which reports the
      event to the FTSS$SERVER. The server process generates OPCOM messages and
      reports the events to a mailbox.
4.2.1 Basic Error Isolation and Handling
Figure 4–1 and Table 4–2 describe the error isolation and handling procedure.
Figure 4–1 Hardware Error Handling Flowchart
[Flowchart steps: Hardware Error; 1 IPL29 Interrupt; 2 Fault Detection; 3 FRU Isolation; 4 Solid Failure (yes/no); 5 Deconfigure FRU; 6 Fork to IPL8; 7 Transient Error; 8 Threshold Error; 9 Over Threshold (yes/no); 10 Deconfigure FRU; 11 Make Error Log Entry; 12 Notify FTSS$SERVER through ERI; 13 Done.]
Table 4–2 Error Handling Flowchart Definitions
Event    Definition
1        Hardware reports the error through a high-level interrupt and control
         is transferred to the EHS.
2        The EHS examines system registers to determine the type of failure
         that has occurred.
3        The EHS identifies the FRU that is the source of the error. FRU
         isolation is generally accomplished at the module level. In some
         cases, FRU isolation is to a set of modules. In all cases, the EHS
         isolates the error to an FRU or set of FRUs in one zone.
4        The EHS determines if the error is solid.
5        If the error is solid, the FRU is deconfigured from the system.
6        The EHS has successfully recovered from the error (either solid or
         transient) and execution is continued at IPL8.
7 and 8  If the error is transient, it is compared to its error rate threshold.
9        If the error is below the error rate threshold, an entry is made in
         the error log.
10       If the error is above the error rate threshold, the FRU is
         deconfigured from the system.
11       An entry is made in the error log.
12       The FTSS$SERVER is notified of the error through the ERI.
13       Error handling is complete.
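The decision points in Table 4–2 can be summarized in a short Python sketch. This is only an illustration of the flow, not the EHS implementation: the Fru class, the handle_error function, and the use of a simple error count in place of an error rate threshold are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Fru:
    """Hypothetical record for one field replaceable unit."""
    name: str
    threshold: int            # assumed: transient errors tolerated
    transient_count: int = 0
    configured: bool = True


def handle_error(fru, solid, error_log, notifications):
    """Follow events 4 to 12 of Table 4-2 for one isolated FRU.

    Returns True if the FRU is still configured afterwards.
    """
    if solid:
        fru.configured = False                   # event 5
    else:
        fru.transient_count += 1                 # events 7 and 8
        if fru.transient_count > fru.threshold:
            fru.configured = False               # event 10
    kind = "solid" if solid else "transient"
    error_log.append((fru.name, kind))           # events 9 and 11
    notifications.append((fru.name, kind))       # event 12
    return fru.configured
```

In this sketch a solid error removes the FRU at once, while transient errors remove it only after the assumed count is exceeded; every path ends with an error log entry and a notification, as in the flowchart.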
4.2.2 EHS Structure
The EHS is packaged as part of the Fault Tolerant System Services (FTSS)
execlet (loadable image file). The FTSS execlet is loaded and initialized when
FTSS is started after the OpenVMS operating system is booted.
System errors are reported to software through an IPL 29 interrupt. When
an interrupt occurs, the hardware fetches the dispatch vector from the System
Control Block (SCB) and dispatches to the EHS interrupt service routine.
VAXELN errors are reported to the OpenVMS operating system through an IPL
22 interrupt. The interrupts are vectored by a combination of hardware and
software to the EHS interrupt service routine.
Figure 4–2 illustrates the position of the EHS relative to the major hardware,
system firmware, and other software components.
Figure 4–2 EHS Architectural Position
[Block diagram not reproduced. Labels recovered from the figure: Error Handling
Services Functions; System Utilities (Error Reporting Interface, System Error
Log, Error Event Notification); Remote Zone Interface (IZC Routines, Zone
Available); Firmware Interface (Resets, Status, Serial Interrupts, Serial
Transmit/Receive; VAXELN and Diagnostics, Console and Diagnostics); Hardware
Interface (Registers, Interrupts; System Hardware); VMS Interface (Device
Unavailable, FRU Deconfiguration; Device Drivers, FTSS Reconfiguration).
Figure ID: MR-0004-93RAGS]
4.2.3 System Operating Modes
The error handler recognizes four modes of system operation. Each mode directly
relates to the supported hardware modes of the cross-link state as summarized in
Table 4–3.
Table 4–3 System Operating Modes
Mode
Definition
Simplex
The cross-link state in one zone is off and the CPU, memory, and I/O
subsystem of the other zone are not available for use. However, those
components in the other zone may be available and can run the OpenVMS
operating system. The system can be booted in this mode if one zone is not
physically present or is out of service. The system can also be degraded into
this mode after the failure of one zone.
Degraded Duplex
The cross-link state in one zone (the master zone) is set to master and the
cross-link state in the other zone is set to slave. The CPU and memory in the
master zone are running the OpenVMS operating system and the I/O from
the slave zone is configured and in use. However, the slave zone CPU and
memory are not in use. This mode can only be achieved as a result of the
deconfiguration of a CPU and memory set of one zone due to an error.
Resynch
This mode is similar to Degraded Duplex except that all memory writes in
the master zone are duplicated in the slave zone. That is, when a write to
memory is performed in the master zone, the same data is written to the
same memory location in the slave zone. The cross-link state in one zone
is Resynch master and in the other zone, Resynch slave. This mode is used
during the synchronization process to copy the master zone memory to the
slave zone before entering Duplex mode.
Duplex
The memories in both zones are identical and both CPUs are running in
lockstep. The I/O subsystems of both zones are available and in use. The
cross-link state in both zones is Duplex. The system can be booted in this
mode, or can transition to this mode as the result of the synchronization
process from either Simplex or Degraded Duplex modes.
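The relationship between cross-link states and operating modes in Table 4–3 can be expressed as a small lookup. This is a sketch under stated assumptions: the state strings (`"off"`, `"master"`, and so on) and the function name `operating_mode` are invented for illustration and are not register values or FTSS interfaces.

```python
from enum import Enum


class Mode(Enum):
    SIMPLEX = "Simplex"
    DEGRADED_DUPLEX = "Degraded Duplex"
    RESYNCH = "Resynch"
    DUPLEX = "Duplex"


def operating_mode(state_a: str, state_b: str) -> Mode:
    """Map the cross-link states of the two zones to a system operating mode.

    The state names are hypothetical spellings of the states in Table 4-3.
    """
    pair = {state_a, state_b}
    if pair == {"duplex"}:
        return Mode.DUPLEX
    if pair == {"master", "slave"}:
        return Mode.DEGRADED_DUPLEX
    if pair == {"resynch master", "resynch slave"}:
        return Mode.RESYNCH
    if "off" in pair:
        # One zone has its cross-link off; the other zone is not in use by it.
        return Mode.SIMPLEX
    raise ValueError(f"unsupported cross-link state pair: {state_a!r}, {state_b!r}")
```

Any combination not described in Table 4–3 raises an error rather than being guessed at.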
4.2.4 Error Types
EHS recognizes 11 error types. All errors are classified as one of those described
in Table 4–4.
Table 4–4 Error Types
Error Type
Definition
CPU/MEM Faults
All data, ECC codes, and control signals flow over the primary rail. The
mirror rail exists primarily for the purpose of performing verification
checks against the primary rail. Some checks are performed by hardware
between these two rails to detect failures within the boundaries of the
CPU module. When such a condition is detected, a CPU/MEM fault is
generated by the hardware, and results in the following set of hardware
actions:
1. A high-level system interrupt occurs to report the error, causing an
entry into the error handler. In some cases, the failure may be severe
enough to prevent instructions from executing.
2. If the operating mode at the time of the failure is Duplex, it will
be changed to Degraded Duplex mode. In this case, the other zone is
interrupted as well by a report that a CPU/MEM fault occurred in the
failing zone.
3. Approximately 145 microseconds after the interrupt, the failing CPU
module will be reset by hardware, resulting in an entry into the system
console. The purpose of this brief delay is to allow the error handler to
store the contents of the CPU, JXD, and cross-link registers in the Console
Communications Area (CCA).
In non-Duplex modes, only one CPU is in use. This failure results in the
termination of the OpenVMS operating system.
CPU/MEM faults can be caused by solid or transient errors. Since
software cannot distinguish between the two, they are all treated as
transient. The CPU module requires service only when they exceed the
operating system’s threshold, when an end action timeout occurs, or when
diagnostics fail. In all cases, the FRU identified by software is the CPU
module which experienced the failure.
Double-Bit memory errors
Hardware reports a double-bit error (DBE) when the ECC checkers detect
this condition on a read from a main memory location. This read can occur
during a DMA or CPU cycle, with two possible error causes: a memory
failure or a programming error.
If system software attempts to access a location beyond the bounds of
physical memory, hardware will report a double-bit ECC error. This is a
programming error in the OpenVMS operating system and the EHS will
initiate a system crash. This will be seen as a FATMEMERR bugcheck.
If system software attempts to access a valid physical memory location
which does not respond, a DBE will be reported by the hardware. In
this case, the cause of the problem is failed memory. The CPU with this
memory failure is removed from the configuration.
If the system is operating in a non-Duplex mode, the OpenVMS operating
system is terminated by forcing an entry into the system console. In
Duplex, the failed CPU is removed and the system continues to operate in
Degraded Duplex mode.
DBEs due to memory failures are always treated as solid. The failed CPU
will not be reconfigured until the zone with the failure is removed and the
memory is repaired.
The FRU in most cases will be a pair of SIMMs on a memory mother board
(MMB). In all cases, FRU isolation is done at the time of the end action
when system registers are recovered from the failed CPU. In the case of
an end action timeout, the CPU module will be identified as the FRU.
Single-Bit memory errors
Single-Bit Errors (SBEs) can be detected by either the JXD during a DMA
read cycle which reads from main memory or the CPU during a memory
read. Software action varies depending upon the system operating mode
and where the error detection occurs.
If the SBE is detected by the JXD during a DMA cycle in any system
mode or by the CPU during a CPU cycle in any non-Duplex mode, the
actions of the EHS are the same. The error is always transient, and no
deconfiguration is performed. A pair of memory SIMM rows on an MMB
is isolated, and the error is compared to its error rate threshold.
In Duplex mode (JXD detected) when the threshold is exceeded, the CPU
module on which the memory resides will be removed from service. In
non-Duplex mode, since there is only one CPU active and since SBEs are
always transient, the CPU is not removed from service when the threshold
is exceeded. The SBE is repaired in memory by hardware if detected by
the JXD, and by the EHS if detected by the CPU.
If the SBE is detected during a CPU cycle while the system is in Duplex
mode, the action differs due to hardware constraints. The CPU which
experiences the SBE will be removed from service by hardware at the time
of the error. An error log will be generated reporting the error, but FRU
isolation is done at the time of the end action. The error is then compared
to its error rate threshold by the OpenVMS operating system.
If the threshold is not exceeded, the CPU will be resynchronized
immediately by system software (FTSS$SERVER) at the time of the end
action. The process of resynchronization will repair the SBE in physical
memory since each location is rewritten during the memory copy.
If the failed CPU does not return for resynchronization after being
removed in the CPU-detected Duplex mode case, an end action timeout
event will be logged which identifies the failed CPU module as the FRU.
In most cases, a pair of SIMM rows and a memory mother board (MMB)
are identified as the FRU in the error log. However, in some cases, end
action data may not contain all the information needed to isolate to a pair
of memory SIMM rows. In this case the CPU module will be identified as
the FRU and will be subjected to the same threshold as a memory SIMM.
Cable failures
All traffic between the two zones of the system is performed across the
cross-link cable. If this cable is detached or broken, the hardware will
report a cable loss event to the EHS. This error can only happen in a
non-Simplex system, and when it occurs, communication between the zones is
lost.
In all cases, the system operating mode must be changed to Simplex. If
the mode before the error was not Duplex, then the slave zone is removed
from service. If the mode was Duplex, then Zone B is removed from
service.
The EHS indicates in the error log that this error is solid and service
is required, and the error is compared to its error rate threshold. If the
threshold is not exceeded, the zone will be resynchronized automatically. If
the threshold is exceeded, no automatic resynchronization will occur until
the cross-link cable is repaired. In all cases, the FRU is the cross-link
cable.
Power failures
If a zone loses power in a non-Simplex configuration, hardware generates
an interrupt to report the event to the EHS. In a non-Duplex mode,
software will detect this error only when the slave zone loses power. In
this case, the slave zone is removed from the configuration and the system
continues to run in Simplex mode.
In Duplex mode, the error is detected by software when either zone loses
power. Again, the failed zone is removed from the configuration and the
system continues in Simplex mode.
EHS indicates in the error log that this error is solid and service is
required, and the error is compared to its error rate threshold. If the
threshold is not exceeded, the zone will be resynchronized automatically.
If the threshold is exceeded, no automatic resynchronization will occur
until the zone is repaired and resynchronized manually. The failed zone is
identified as the FRU for all power failures.
Clock phase errors
If the clocks between zones begin to run out of phase, hardware generates
an interrupt to report the event to the EHS. This event can occur only
in non-Simplex modes. The cause of this type of failure can be either the
oscillator or the clock locking logic.
An oscillator failure will prevent the CPU and I/O module clocks in the two
zones from running in synchronization and will result in the termination
of the OpenVMS operating system on that zone.
Failure in the clock lock logic will result in two zones running diverged
if the system operating mode had been Duplex. In this case, EHS will
select one zone to remove, and the other zone will continue to run the
OpenVMS operating system in Simplex mode. (Zone selection is based on
timings within the system and could be either zone.) In Degraded Duplex
mode, the slave zone is removed from the configuration and the OpenVMS
operating system continues in Simplex mode.
In all cases of oscillator failure, the ATM in the zone which is removed
is identified as the FRU. If the error is caused by clock lock logic failure,
software cannot accurately determine in which zone the failure exists.
The EHS compares the error to its error rate threshold. An error log is
generated at the time of the error which identifies the ATM as the FRU. If
the threshold is exceeded, the error log indicates that service is required
for the ATM and the zone will not be resynchronized automatically. If the
threshold is not exceeded and the diagnostic tests complete successfully,
the zone will be resynchronized when it becomes available.
If the threshold is not exceeded and the diagnostics report a failure, the
end action error log will indicate that the ATM module requires service
and the zone will not be resynchronized automatically. If the zone fails to
return for service and the threshold had not been exceeded, an end action
timeout error log is generated which indicates the ATM requires service.
Halt errors
A halt error occurs when the system is operating in Duplex mode and either
of the following happens: the Zone Halt Enable switch on the zone control
panel is pressed and the Break key is pressed on one of the system consoles,
or one zone experiences errors on its halt lines.
The zone attached to the console terminal or with the error will be halted
and enter the system console. In the other zone, hardware generates
an interrupt to the EHS. The system operating mode will be degraded
to Simplex and the OpenVMS operating system will be continued after
deconfiguring the halted zone.
The failed zone is identified as the FRU in the error log. This error is
not subjected to thresholding. The halted zone must be resynchronized
manually to be returned to service.
Resynch abort errors
During memory resynchronization, all memory writes are mimicked to
both zones. The data is driven from the master zone across the resynch
bus (also referred to as the cross-link cables) to the slave zone. The
incoming data on the slave side is protected by ECC. An ECC failure on
the slave side results in a CPU/MEM fault on the slave and is handled as
that type of error. The data is protected on the master side by an ECC, a
cross-rail ECC comparison and a data cross-check.
The failure of any of these checks results in hardware generating an
interrupt to the EHS reporting a resynch abort error. Resynch mode is
terminated by the hardware and system operation continues in Degraded
Duplex mode.
Since all resynch abort errors indicate failures on the master side, the
master CPU module is isolated as the FRU. This error can occur only
when the system is in Resynch mode, so removal of the CPU would result
in termination of the OpenVMS operating system. The error log message
will indicate the master CPU as the FRU.
The EHS compares the error to its error rate threshold. If the threshold is
exceeded, the EHS will disable automatic resynchronization of the remote
zone. Manual intervention will be required to repair this situation. Since
Duplex mode cannot be achieved and the master CPU is the source of this
failure, the OpenVMS operating system must be manually terminated to
repair the CPU module.
Nonexistent I/O errors
Nonexistent I/O (NXIO) errors occur when a reference to an I/O module
times out. Such a timeout can occur during a DMA or CPU cycle. In
a CPU cycle, an automatic operation retry is attempted. If the retry
succeeds, hardware reports the failure as transient. Otherwise, it is
reported as a solid failure.
All timeouts during DMA cycles are transient errors. The error log
indicates if the error was solid or transient, and if it occurred on a DMA or
CPU cycle.
In all NXIO error cases, either an I/O or interface module will be identified
as the FRU. If the error is solid, the I/O or interface module will be
removed from system service by the EHS.
If the error is transient, it will be compared to its error rate threshold
by the EHS. If the threshold is exceeded and the system operating mode
is not Simplex, the I/O or interface module will be removed from system
service.
No I/O module will be removed due to transient errors from a Simplex
system (where alternate I/O paths are not normally available). Additional
transient errors on the I/O module will generate further error logs.
I/O errors
The ATM module contains a series of checkers that verify consistency
between the dual rails of the system during I/O accesses. When
discrepancies are detected, the hardware generates an interrupt, invoking
the EHS. System registers which reflect the state of the checkers are read
and analyzed to determine the source of the error.
These miscompare errors can be detected during a DMA operation or
a direct CPU I/O access. When miscompares occur on CPU cycles, the
hardware automatically retries the operation.
If the retry succeeds, hardware reports the error as transient. Otherwise,
the error is solid and the EHS deconfigures the system to remove the FRU.
The error log will indicate the FRU, describe the error as solid or
transient, and list any modules that were deconfigured as a result. If
the FRU is a zone or an ATM, the entire zone is removed.
These errors result in a CPU, ATM, interface module, or cross-link FRU.
Transient errors are compared to their error rate threshold by the EHS.
Errors that exceed the threshold may result in the removal of the FRU
from service.
Zone divergence
This error type occurs when the two zones begin executing separate
code paths while operating in Duplex mode. This situation is detected
by hardware when an access to I/O space is performed. At that time,
miscompares in the control and data signals will be detected in the
cross-link chips on the ATM.
This error is reported by hardware as an I/O error or an NXIO error, but
software recognizes the special case and identifies it as zone divergence in
the error log. When this error is detected, software will remove one zone
from service. Which zone is selected depends on how the divergence
manifested itself; either zone may be removed.
This error is usually due to a programming error or divergence between
the NVRAMs of the two zones. The error is treated as transient and the
threshold error count for that error is incremented.
If the threshold is not exceeded and the diagnostics on the removed zone
complete successfully, the zone will be resynchronized back into the system
at end action time. If the threshold is exceeded or if the diagnostics on the
removed zone report a failure, the zone will not be resynchronized at end
action time. The end action error log will indicate that service is required.
If the removed zone fails to return from running diagnostics, an end action
timeout error log will be generated which identifies the zone as the FRU
and requests service. If the threshold is exceeded, the zone will not be
automatically resynchronized. Manual intervention will be required to
repair the zone and return it to service.
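The end action rules for zone divergence described above reduce to a small decision function. The sketch below is an illustration, not EHS code: the function name, the dictionary keys, and the use of `None` to represent a zone that never returns from diagnostics are all assumptions.

```python
from typing import Optional


def zone_divergence_end_action(threshold_exceeded: bool,
                               diagnostics_passed: Optional[bool]) -> dict:
    """Decide what happens to the removed zone at end action time.

    diagnostics_passed is None when the zone fails to return from running
    diagnostics (the end action timeout case).
    """
    if diagnostics_passed is None:
        # End action timeout: the zone is identified as the FRU.
        return {"resynchronize": False, "service_required": True,
                "error_log": "end action timeout", "fru": "zone"}
    # Resynchronize only when the threshold is not exceeded and the
    # diagnostics on the removed zone completed successfully.
    resync = (not threshold_exceeded) and diagnostics_passed
    return {"resynchronize": resync, "service_required": not resync,
            "error_log": "end action", "fru": "zone"}
```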
4.2.5 VAXELN Error Handling
Failures detected by VAXELN software running on the I/O expansion module are
reported to the EHS through one of two mechanisms:
• An IPL 22 interrupt from the module reports the error and is dispatched
  to the EHS.
• The EHS detects the expiration of a watchdog timer maintained by VAXELN,
  signaling that VAXELN execution has terminated.
Table 4–5 describes the VAXELN error classes and the actions taken by the EHS.
Table 4–5 VAXELN Error Classes
Error Class
Description
EHS Actions
VAXELN Kernel
Fatal
This error is reported when the
VAXELN kernel detects a fatal
error which prevents it from
continuing operation.
The FRU is the I/O expansion
module. This is a solid error and
is not subjected to a threshold.
VAXELN Kernel
Recoverable
A recoverable error was
detected and handled by
VAXELN software. Currently,
this error is reported only
when VAXELN software
detects a repairable single-bit
memory error.
The FRU is the I/O expansion
module. The error is compared to its
error rate threshold. If the threshold
is exceeded, the I/O expansion
module and all attached interface
modules are deconfigured from the
system.
I/O Expansion
Module Master
Fatal
A fatal error detected by the
VAXELN I/O expansion module
master job which results in
the shutdown of all VAXELN
processes.
The FRU is the I/O expansion
module. This is considered a solid
error; no threshold is applied. The
I/O expansion module is deconfigured
from the system.
I/O Expansion
Module Master
Recoverable
An error detected by the
VAXELN I/O expansion module
master job which resulted from
the failure of a VAXELN job to
initialize successfully. The Job
ID field of the error message
indicates which VAXELN job
failed.
The FRU is an interface module.
The EHS isolates the interface
module by checking the Job ID field
of the error message. The error is
considered solid; no threshold is
applied. The module is deconfigured
from the system.
I/O Expansion
Module Job
Fatal
Similar to I/O Expansion
Module Master Recoverable,
this error indicates that a
VAXELN job has experienced
a fatal error and has been
terminated. The Job ID field
of the error message indicates
which VAXELN job failed.
The FRU is an interface module.
The EHS isolates the interface
module by checking the Job ID field
of the error message. The error is
considered solid; no threshold is
applied. The interface module is
deconfigured from the system.
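Table 4–5 can be read as a lookup from error class to FRU and thresholding policy. The following sketch encodes that reading; the dictionary layout and the function name `vaxeln_disposition` are assumptions made for illustration.

```python
# Table 4-5 as a lookup: error class -> (FRU, treated as solid?).
# Solid errors are not subjected to a threshold.
VAXELN_ERROR_CLASSES = {
    "VAXELN Kernel Fatal": ("I/O expansion module", True),
    "VAXELN Kernel Recoverable": ("I/O expansion module", False),
    "I/O Expansion Module Master Fatal": ("I/O expansion module", True),
    "I/O Expansion Module Master Recoverable": ("interface module", True),
    "I/O Expansion Module Job Fatal": ("interface module", True),
}


def vaxeln_disposition(error_class: str, threshold_exceeded: bool = False):
    """Return (fru, deconfigure) for a VAXELN error class.

    threshold_exceeded only matters for the one class that is thresholded.
    """
    fru, solid = VAXELN_ERROR_CLASSES[error_class]
    deconfigure = solid or threshold_exceeded
    return fru, deconfigure
```

Only the VAXELN Kernel Recoverable class depends on the threshold; when it is exceeded, Table 4–5 states that the attached interface modules are deconfigured along with the I/O expansion module, which this sketch does not model.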
VAXELN software implements a watchdog timer which is a cell in the I/O
Expansion Module Communication Area (NCA). It is incremented periodically
by VAXELN and monitored by the EHS. If the value in the NCA cell stops
incrementing, VAXELN has crashed. This is referred to as a VAXELN kernel
fatal error.
The EHS examines the VAXELN NCA error log buffer area for a VAXELN error
message. When it finds the error message, the EHS identifies the I/O expansion
module as the FRU. The error is considered solid; no threshold is applied, and the
I/O expansion module is deconfigured from the system.
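The watchdog mechanism described above amounts to checking that a counter keeps changing between polls. A minimal sketch follows, assuming a hypothetical `WatchdogMonitor` class that is handed the current NCA cell value on each poll; the polling interval and the way the cell is read are not specified in this section and are left out.

```python
class WatchdogMonitor:
    """Tracks a periodically incremented counter and flags when it stalls."""

    def __init__(self):
        self.last_value = None  # no sample taken yet

    def check(self, nca_cell_value: int) -> bool:
        """Return True if the counter advanced since the previous check.

        The first call always returns True because there is nothing to
        compare against yet.
        """
        alive = self.last_value is None or nca_cell_value != self.last_value
        self.last_value = nca_cell_value
        return alive
```

A caller would treat a `False` result as the VAXELN kernel fatal condition and proceed to examine the NCA error log buffer.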
4.3 Field Replaceable Units (FRUs)
After analyzing error information and determining the error type, the EHS
isolates the source of the error to a FRU. If the error is solid, the system is
deconfigured to remove the FRU from service. If the error is transient, it is
compared against a threshold for the error type and FRU, and the FRU is
removed from service only if that threshold is exceeded.
4.3.1 Isolation
Table 4–6 describes the FRUs and lists the error types which could result in a
FRU being isolated.
Table 4–6 System FRUs
FRU
Description
Source Error Types
ATM
module
I/O attachment module. Performs exchange
and verification of I/O control and data signals
between zones. The module includes an
embedded I/O expansion module.
I/O errors
Clock phase errors
CPU
module
The CPU module is identified as the FRU when
the failure is attributable to a CPU problem or
to a problem that cannot be isolated between the
CPU and memory.
Resynch abort errors
CPU/MEM faults
Double-Bit memory
errors
Single-Bit memory errors
Memory
board
A pair of rows of memory SIMMs on a memory
mother board (MMB) will be identified as the
FRU when the error can be isolated beyond the
CPU board to a specific piece of memory.
Double-Bit memory
errors
Single-Bit memory errors
I/O expansion
module
An I/O expansion module can be identified as the
FRU as a result of a firewall miscompare during
an I/O operation or as a result of a nonexistent
I/O error during a reference to the I/O expansion
module or an attached interface module.
Nonexistent I/O errors
I/O errors
VAXELN errors
Interface
module
An interface module can be identified as a FRU
only as a result of a nonexistent I/O error which
occurs during a reference to the interface module.
It is also possible that the I/O expansion module
will be identified as the FRU.
Nonexistent I/O errors
VAXELN errors
Zone
Some error cases involve failures not directly
attributable to a single module. The zone FRU is
only identified in the case of solid or reproducible
errors, so diagnostics should be able to isolate the
failure within the zone.
Power failures
Halt errors
Zone divergence
Cross-link
cable
The cross-link cable is the identified FRU
for any error which isolates the connections
between zones. This includes the resynch and
interzone buses, which are packaged into the
single physical cable.
Cable failures
I/O errors
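When working back from an error type in the error log to the FRUs it can implicate, Table 4–6 can be inverted. The sketch below transcribes the table into a dictionary and builds the reverse lookup; the names `FRU_SOURCE_ERRORS` and `candidate_frus` are illustrative assumptions.

```python
# Table 4-6: FRU -> error types that can result in that FRU being isolated.
FRU_SOURCE_ERRORS = {
    "ATM module": ["I/O errors", "Clock phase errors"],
    "CPU module": ["Resynch abort errors", "CPU/MEM faults",
                   "Double-Bit memory errors", "Single-Bit memory errors"],
    "Memory board": ["Double-Bit memory errors", "Single-Bit memory errors"],
    "I/O expansion module": ["Nonexistent I/O errors", "I/O errors",
                             "VAXELN errors"],
    "Interface module": ["Nonexistent I/O errors", "VAXELN errors"],
    "Zone": ["Power failures", "Halt errors", "Zone divergence"],
    "Cross-link cable": ["Cable failures", "I/O errors"],
}


def candidate_frus(error_type: str) -> list:
    """Return the FRUs (sorted by name) that Table 4-6 lists for an error type."""
    return sorted(fru for fru, errors in FRU_SOURCE_ERRORS.items()
                  if error_type in errors)
```

For example, `candidate_frus("I/O errors")` returns the ATM module, the cross-link cable, and the I/O expansion module.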
4.3.2 Deconfiguration
This section describes the actions taken by the EHS when a FRU is identified as
the source of a solid error or transient errors which exceed the FRU threshold. A
table is provided for each FRU that describes the actions taken by the EHS when
the FRU is deconfigured.
In non-Duplex modes, the EHS may respond to excessive transient failures by
calling out the FRU but not removing it from service. This action prevents loss of
system service due only to transient errors.
4.3.2.1 I/O Attachment Module
Table 4–7 describes the OpenVMS operating system actions taken when the
ATM is identified as the FRU and deconfigured by the EHS. Some actions are
dependent on the system operating mode.
Table 4–7 ATM Deconfiguration Actions
Action Taken
Description
Comments
Cross-link mode =
off
The cross-link mode is set to off.
The system will continue in Simplex
mode. The action may be taken by
the hardware when the error occurs
or by software while handling the
error.
Done in non-Simplex mode
only. Extraneous when the
error occurs in Simplex
mode.
CPU/MEM fault
A CPU/MEM fault is forced on the
zone with the failed ATM module.
This results in an entry into the
system console.
Done when the error
occurs in Duplex, Simplex
or in the master zone
of a Degraded Duplex
configuration.
Zone hard reset
A zone hard reset is issued to the
zone with the failed ATM to force
diagnostics to run.
Done only when the error
occurs in the slave zone
of a Degraded Duplex
configuration.
Set ATM LED
indicator
Use the module I2C bus to turn on
the LED indicator for the failed ATM
module.
Set module status
in ATM NVRAM
and DCB
Update the status_os and status_
sum fields in the module ID NVRAM
and the DCB to indicate the module
has experienced a failure. The code
written depends on the failure type.
The entries in Table 4–7 apply when the module is being removed because of a
solid error or excessive transient errors. There is one exception. When an ATM
module in a Simplex system experiences excessive transient errors, the module is
not fully deconfigured since that would result in the termination of the OpenVMS
operating system. In this case, the ATM LED indicators turn on, and the module
status is written to the ATM NVRAM and DCB. The OpenVMS operating system
continues to run. The module will not be configured when the system is booted,
or when the failed zone is synchronized until the module is repaired.
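The mode-dependent rows of Table 4–7, together with the Simplex exception just described, can be sketched as a function that returns the list of actions. The function name, parameter names, and action strings are assumptions for illustration. Resynch mode is not covered by Table 4–7 and is rejected here rather than guessed.

```python
from typing import List


def atm_deconfig_actions(mode: str, zone_role: str = "master",
                         solid: bool = True) -> List[str]:
    """List the ATM deconfiguration actions for a given operating mode.

    zone_role is only meaningful in Degraded Duplex mode.
    solid=False means the FRU was called out for excessive transient errors.
    """
    if mode not in ("Simplex", "Degraded Duplex", "Duplex"):
        raise ValueError(f"mode not covered by Table 4-7: {mode!r}")
    always = ["set ATM LED indicator",
              "set module status in ATM NVRAM and DCB"]
    if mode == "Simplex" and not solid:
        # Exception: do not terminate the operating system over transients.
        return always
    actions = []
    if mode != "Simplex":
        actions.append("set cross-link mode to off")
    if mode == "Degraded Duplex" and zone_role == "slave":
        actions.append("zone hard reset")
    else:
        actions.append("force CPU/MEM fault")
    return actions + always
```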
4.3.2.2 CPU Module and Memory
When memory is deconfigured from the system, it is done by removing the CPU
module on which the memory resides.
Table 4–8 describes the OpenVMS operating system actions taken when a CPU
module or memory is identified as the FRU and is deconfigured by the EHS.
These actions are identical for CPU and memory failures. Some actions are
dependent on the system operating mode.
Table 4–8 CPU Deconfiguration Actions
Action Taken
Description
Comments
Cross-link mode =
Degraded Duplex
The cross-link mode is set to master
on the zone with the surviving CPU
and slave on the zone with the failed
CPU. The action may be taken by the
hardware when the error occurs or by
software while handling the error.
Done in Duplex mode
only.
CPU/MEM fault
A CPU/MEM fault is forced on the failed
CPU module. This results in an entry
into system console.
Set CPU LED
indicator
The module I2C bus is used to turn on
the LED indicator for the failed CPU
module.
Set module status
in CPU NVRAM
and DCB
The status_os and status_sum fields
in the module ID NVRAM and DCB
are updated to indicate the module has
experienced a failure. The code written
depends on the failure type.
When one CPU is in use (Degraded Duplex, Simplex, or Resynch mode), excessive
transient failures will result in the EHS calling out the failed module, but not
removing it from service. Removing it from service would cause termination of
the OpenVMS operating system. In this case, the CPU module LED is turned
on, and the module status is written to the CPU module NVRAM and DCB. The
OpenVMS operating system continues to run. The CPU will not be configured
when the system is booted or when the failed zone is synchronized unless the
CPU is repaired.
4.3.2.3 I/O Expansion Module
Table 4–9 describes the actions taken by the OpenVMS operating system when
an I/O expansion module is identified as the FRU and is deconfigured by the
OpenVMS operating system.
Table 4–9 I/O Expansion Module Deconfiguration Actions
Action Taken
Description
I/O hard reset
The I/O expansion module which is being deconfigured is reset
through the cross-link I/O hard reset register.
Set I/O expansion
module LED
indicator
The module I2C bus is used to turn on the LED for the failed
module.
Set module status
in I/O expansion
module NVRAM
and DCB
The status_os and status_sum fields in the module ID NVRAM and
DCB are updated to indicate the module has experienced a failure.
The actual code written depends on the failure type.
The entries in Table 4–9 apply when the module is being removed due to a
solid error or excessive transient errors. There is one exception. When an I/O
expansion module in a Simplex system experiences excessive transient errors, the
module is not fully deconfigured since that would likely result in the loss of the
only I/O path to a device. In this case, the I/O expansion module LED is turned
on and the module status is written to the I/O expansion module NVRAM and the
DCB.
The I/O expansion module will remain in service. The module will not be
configured when the system is booted or when the failed zone is synchronized until
the module is repaired.
4.3.2.4 Interface Module
Table 4–10 describes the OpenVMS operating system actions taken when an
interface module is identified as the FRU and is deconfigured by the OpenVMS
operating system. Some actions are dependent on the system operating mode.
Table 4–10 Interface Module Deconfiguration Actions
Action Taken
Description
Reset interface module
The interface module being deconfigured is reset through the
module I2C bus.
Set interface module
LED indicator
Use the module I2C bus to turn on the LED indicator for the
failed interface module.
Set module status
in interface module
NVRAM
Update the status_os and status_sum fields in the module ID
NVRAM and the DCB to indicate the module has failed. The
code written depends on the failure type.
The entries in Table 4–10 apply when the module is being removed because of
a solid error or excessive transient errors. There is one exception. When an
interface module in a Simplex system experiences excessive transient errors, the
module is not fully deconfigured since that would likely result in the loss of the
only I/O path to a device. In this case, the interface module LED indicator is
turned on, and the module status is written to the interface module NVRAM and
the DCB (see Section 4.8.2).
The interface module will remain in service. The module will not be configured
when the system is booted or when the failed zone is synchronized until the
module is repaired.
4.3.2.5 Zone
Table 4–11 describes the OpenVMS operating system actions taken when an
entire zone is identified as the FRU and is deconfigured by the EHS. Note that
some actions are dependent on the system operating mode.
Table 4–11 Zone Deconfiguration Actions
Action Taken        Description                             Comments
Cross-link mode =   The cross-link mode is set to off.      Done only in non-Simplex
off                 The system will continue in Simplex     mode.
                    mode. The action may be taken by
                    the hardware when the error occurs
                    or by software while handling the
                    error.
CPU/MEM fault       A CPU/MEM fault is forced on the        Done when the error occurs
                    failed zone. This results in an entry   in Duplex, Simplex or in the
                    into system console.                    master zone of a Degraded
                                                            Duplex system.
Zone hard reset     A zone hard reset is issued to the      Done only in the slave zone
                    failed zone.                            of a Degraded Duplex or
                                                            Resynch mode system.
4.3.2.6 Cross-Link Cable
Table 4–12 describes the OpenVMS operating system actions taken when the
cross-link cable is identified as the FRU and is deconfigured by the EHS. The
cross-link cable is active only during non-Simplex modes.
Table 4–12 Cross-Link Cable Deconfiguration Actions
Action Taken        Description                             Comments
Cross-link mode =   The cross-link mode is set to off.      Done only in non-Simplex
off                 The system will continue in Simplex     modes.
                    mode. The action may be taken by
                    the hardware when the error occurs
                    or by software while handling the
                    error.
CPU/MEM fault       A CPU/MEM fault is forced on Zone       Done only when the error
                    B. This results in an entry into        occurs in Duplex mode.
                    system console.
Zone hard reset     A zone hard reset is issued to the      Done in the slave zone when
                    slave zone.                             the error occurs in Degraded
                                                            Duplex or Resynch mode.
4.3.3 Application of Thresholds
Application of thresholds by the EHS is rate based. An FRU exceeds its threshold
when it accumulates a certain number of a given error type in a specified time
period. Table 4–13 lists the thresholds associated with each FRU and error type.
In most cases, more than one type of error can result in the isolation of an FRU.
For each FRU and error type, a separate threshold is applied. The threshold
for an error type of a specific FRU must be exceeded before the module is
deconfigured.
For example, both NXIO and I/O errors may isolate an ATM module. EHS
maintains separate thresholds for NXIO and I/O errors for each ATM module.
When one of the errors occurs and is isolated to an ATM, the threshold for that
error type on that ATM is applied. If the threshold is exceeded, the ATM is
deconfigured.
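The rate-based rule described above can be sketched in Python as follows. This is an illustration only, not the EHS implementation: the class name ThresholdTracker, the FRU labels, and the choice to treat reaching the error limit within the time period as exceeding the threshold are all assumptions made for the example.

```python
from collections import defaultdict, deque

HOUR = 3600  # seconds


class ThresholdTracker:
    """Hypothetical sliding-window tracker: one window per (FRU, error type)."""

    def __init__(self, limits):
        # limits maps error type -> (error limit, time period in seconds)
        self.limits = limits
        self.events = defaultdict(deque)

    def record(self, fru, error_type, now):
        """Record one error at time `now` (seconds).

        Returns True when the FRU has accumulated the error limit for this
        error type inside the time period (assumed to mean "exceeded").
        """
        limit, period = self.limits[error_type]
        window = self.events[(fru, error_type)]
        window.append(now)
        # Discard errors that are older than the time period.
        while window and now - window[0] > period:
            window.popleft()
        return len(window) >= limit


tracker = ThresholdTracker({"NXIO": (3, 12 * HOUR), "I/O": (3, 12 * HOUR)})
tracker.record("ATM-A", "NXIO", 0)
tracker.record("ATM-A", "I/O", 60)               # counted separately from NXIO
tracker.record("ATM-A", "NXIO", 1 * HOUR)
deconfigure = tracker.record("ATM-A", "NXIO", 2 * HOUR)  # third NXIO error in 12 hours
```

Keeping one window per FRU and error type mirrors the statement that a separate threshold is applied to each combination.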
Table 4–13 FRU Thresholds
Error Type             Error  Time Period  Comments
                       Limit  (hours)
CPU Module
  CPU/MEM faults       3      12           A CPU/MEM fault results in the temporary removal of
                                           the CPU module from service. The CPU will be
                                           reconfigured into the system if this threshold is not
                                           exceeded.
  Resynch abort        3      1            Resynch abort errors result in the termination of the
  errors                                   Resynch operation. When the threshold for this error
                                           is exceeded, the CPU module is marked as broken.
                                           System downtime must be scheduled to repair the
                                           problem since the only CPU module has failed.
Memory SIMMs
  Single-bit memory    3      12           Each single-bit memory error is attributed to a row
  errors                                   of memory SIMMs on a single MMB. Each SIMM row
                                           has an individual threshold. When the threshold for
                                           the SIMM row is exceeded, the CPU module on which
                                           the SIMM resides will be removed from service if the
                                           system operating mode is Duplex.
I/O ATM Module
  Clock phase errors   3      12           Each clock phase error results in the temporary
                                           removal from service of a zone. When the zone returns
                                           to service, it will be resynchronized automatically if
                                           the threshold is not exceeded.
  Transient I/O        3      12           When this threshold is exceeded, the zone in which
  errors                                   the ATM resides is removed from service, except in a
                                           Simplex system.
I/O Expansion Module
  Transient NXIO       3      12           When the threshold is exceeded, the module is
  errors                                   deconfigured, except in a Simplex system.
  Transient I/O        3      12           When the threshold is exceeded, the module is
  errors                                   deconfigured, except in a Simplex system.
  VAXELN kernel        3      24           When the threshold is exceeded, the module is
  recoverable errors                       deconfigured, except in a Simplex system.
Interface Module
  Transient NXIO       3      12           When the threshold is exceeded, the interface module
  errors                                   is deconfigured, except in a Simplex system.
Zone
  Power failures       3      24           When power is lost, the zone is temporarily removed
                                           from service and the error is compared to its error
                                           rate threshold. When power is restored, the zone will
                                           be resynchronized automatically if the threshold has
                                           not been exceeded.
  Zone divergence      3      24           When the zones diverge, one zone is temporarily
                                           removed from the configuration and the error is
                                           compared to its error rate threshold. When the zone
                                           returns to service, it will be reconfigured if the
                                           threshold is not exceeded. This threshold is not
                                           applied directly to any FRU. The selection of which
                                           zone to remove is made based on how the error
                                           manifests itself within the system.
Cross-Link
  Cable failures       3      24           When the cable between the zones is lost, the zone
                                           is temporarily removed from service and the error is
                                           compared to its error rate threshold. When the zone
                                           returns, it will be resynchronized automatically if the
                                           threshold has not been exceeded.
  Transient I/O        3      12           When the threshold is exceeded, the cross-link is
  errors                                   deconfigured, which results in the removal of one of
                                           the zones from service.
4.4 OpenVMS Error Log
The EHS makes entries in the system error log for all system error interrupts.
Figure 4–3 shows the format of the error log. With the exception of the Fault
Data block, all blocks have fixed length.
Figure 4–3 OpenVMS Error Log Format
Number of Longwords
Fault Summary
FRU Information
Deconfiguration Information
Threshold Information
Fault Data
The first longword in the error log contains the count of longwords which follow.
This number is based on the fault class of the error log (see Section 4.4.1).
Table 4–14 lists the different values which will appear for each of the six different
fault classes.
Table 4–14 OpenVMS Error Log Sizes
Class Value  Fault Class              Decimal Size  Hexadecimal Size
1            System Error             40            28
2            End Action               41            29
3            End Action Timeout       13            D
4            VAXELN Error             28            1C
5            Software Detected Error  15            F
6            CPU or Zone Unsynchable  14            E
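As a worked check of these sizes, the Python sketch below validates the leading longword of an error log entry against Table 4–14 and derives the length of the Fault Data block. The function name and the per-block longword counts are assumptions for illustration: the counts are read from the offsets drawn in Figures 4–4 through 4–7 and are not stated as numbers in the text. The little-endian format reflects VAX byte ordering.

```python
import struct

# Longwords in each fixed-length block, counted from the offsets drawn in
# Figures 4-4 through 4-7 (an inference, not a value stated in the text).
FIXED_BLOCK_LONGWORDS = {
    "Fault Summary": 1,
    "FRU Information": 2,
    "Deconfiguration Information": 4,
    "Threshold Information": 5,
}

# Table 4-14: fault class value -> count stored in the first longword.
LOG_SIZE_BY_CLASS = {1: 40, 2: 41, 3: 13, 4: 28, 5: 15, 6: 14}


def fault_data_longwords(entry, fault_class):
    """Check the leading count and return the Fault Data length in longwords."""
    (count,) = struct.unpack_from("<I", entry, 0)  # VAX data is little-endian
    expected = LOG_SIZE_BY_CLASS[fault_class]
    if count != expected:
        raise ValueError(
            f"class {fault_class}: expected {expected} longwords, found {count}"
        )
    return count - sum(FIXED_BLOCK_LONGWORDS.values())


# A System Error entry leaves 28 longwords of Fault Data, which agrees with
# the 28 register entries at offsets 0 through 108 in Table 4-19.
system_error_data = fault_data_longwords(struct.pack("<I", 40), 1)
```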
4.4.1 Fault Summary
The Fault Summary block contains the fault ID, fault flags describing the nature
of the fault, the cross-link mode at the time the fault occurred, and the cross-link
mode after the error handling was completed. All fields in this block are valid for
all error entries. Figure 4–4 identifies each entry in the block and the offset from
the start of the block. Table 4–15 describes the content of each field.
Note
The 1-byte FAULT_ID field is composed of two 4-bit subfields. Bits [07:04]
indicate the class of the fault. Bits [03:00] identify the error type within
the fault class. There are six fault classes. Each class has a different fault
data block at the end of the error log. See Section 4.4.5 for a description
of each fault class and the fault data provided in the error log.
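The subfield split described in this note can be illustrated with a short Python sketch. The function name split_fault_id is hypothetical; the class names are taken from Table 4–14.

```python
FAULT_CLASSES = {
    1: "System Error",
    2: "End Action",
    3: "End Action Timeout",
    4: "VAXELN Error",
    5: "Software Detected Error",
    6: "CPU or Zone Unsynchable",
}


def split_fault_id(fault_id):
    """Split the 1-byte FAULT_ID into (fault class, error type within class)."""
    if not 0 <= fault_id <= 0xFF:
        raise ValueError("FAULT_ID is a single byte")
    fault_class = fault_id >> 4    # bits [07:04]
    error_type = fault_id & 0x0F   # bits [03:00]
    return fault_class, error_type


fault_class, error_type = split_fault_id(0x1A)   # 1A is listed as Nonexistent I/O
class_name = FAULT_CLASSES[fault_class]
```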
Figure 4–4 Fault Summary Block
XLINK_MODE_AFTER (Crosslink Mode After)
XLINK_MODE_ERROR (Crosslink Mode Error)
FAULT_FLAGS (Fault Flags)
FAULT_ID (Fault Identification)
Table 4–15 Fault Summary Block Entry Descriptions
Entry
Contents
FAULT_ID
Fault Identification type. The hexadecimal ID values are defined
as:
10 - CPU-detected double-bit error
11 - JXD-detected double-bit error
12 - Cable gone between zones
13 - Power gone in other zone
14 - Clock error
15 - Other zone halted
16 - Resynch abort error
17 - CPU-detected single-bit error
18 - JXD-detected single-bit error
19 - CPU/MEM fault
1A - Nonexistent I/O
1B - I/O miscompare error
1C - Zone divergence
20 - CPU-detected DBE end action
21 - JXD-detected double-bit error end action
22 - Cable gone end action (reserved for future use)
23 - Power gone end action (reserved for future use)
24 - Clock error end action
25 - Other zone halted end action (reserved for future use)
26 - Resynch abort error end action (reserved for future use)
27 - CPU-detected single-bit error end action
28 - JXD-detected single-bit error end action (reserved for future
use)
29 - CPU/MEM fault end action
2C - Zone divergence end action
30 - CPU-detected DBE end action timeout
31 - JXD-detected DBE end action timeout
32 - Cable gone end action timeout (reserved for future use)
33 - Power gone end action timeout (reserved for future use)
34 - Clock error end action timeout
35 - Other zone halted end action timeout (reserved for future use)
36 - Resynch abort error end action timeout (reserved for future
use)
37 - CPU-detected SBE end action timeout
38 - JXD-detected single-bit error end action timeout (reserved for
future use)
39 - CPU/MEM fault end action timeout
3C - Zones have diverged end action timeout (reserved for future
use)
40 - VAXELN kernel fatal error
41 - VAXELN kernel recoverable error
42 - VAXELN master job fatal error
43 - VAXELN master job recoverable error
44 - VAXELN job fatal error
45 - VAXELN job recoverable error (reserved for future use)
50 - Software-detected error
60 - CPU is unsynchable
FAULT_FLAGS
The following fields are defined within FAULT_FLAGS:
00 - Transient error
01 - Solid error
02 - Error threshold exceeded
03 - Service is required
XLINK_MODE_ERROR
Cross-link mode at the time of error. The following values are
defined:
[07:04] - Not used
0 - Off (Simplex)
1 - Slave
2 - Master
3 - Duplex
4 - Not used
5 - RESYNCH_SLAVE
6 - RESYNCH_MASTER
7 - Not used
XLINK_MODE_AFTER
Cross-link mode after error handling. The modes are as defined
for XLINK_MODE_ERROR.
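The FAULT_FLAGS and cross-link mode encodings in Table 4–15 can be decoded as in the following Python sketch. It is illustrative only. It assumes the FAULT_FLAGS entries 00 through 03 are bit positions within the byte, and that the cross-link mode value occupies bits [03:00] because bits [07:04] are marked as not used. The function names are hypothetical.

```python
FAULT_FLAG_BITS = {
    0: "Transient error",
    1: "Solid error",
    2: "Error threshold exceeded",
    3: "Service is required",
}

XLINK_MODES = {
    0: "Off (Simplex)",
    1: "Slave",
    2: "Master",
    3: "Duplex",
    4: "Not used",
    5: "RESYNCH_SLAVE",
    6: "RESYNCH_MASTER",
    7: "Not used",
}


def fault_flag_names(fault_flags):
    """List the names of the flags set in the FAULT_FLAGS byte."""
    return [
        name
        for bit, name in sorted(FAULT_FLAG_BITS.items())
        if fault_flags & (1 << bit)
    ]


def xlink_mode_name(mode_byte):
    """Name the cross-link mode held in bits [03:00] of a mode byte."""
    return XLINK_MODES.get(mode_byte & 0x0F, "Undefined")


flags = fault_flag_names(0b0101)     # transient error that exceeded its threshold
mode_at_error = xlink_mode_name(3)   # Duplex at the time of the error
mode_after = xlink_mode_name(0)      # Off (Simplex) after error handling
```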
4.4.2 FRU Information
This block contains information on the isolated FRU and is valid for all error
events. Figure 4–5 identifies each entry in the block and the offset from the start
of the block. Table 4–16 describes the content of each entry.
Note
In some cases, an FRU is not identified in the error log for a system error
event, and all fields in this block are -1 (FFFFFFFF hexadecimal). In
these cases, the FRU will be identified in a subsequent end action or end
action timeout error log.
Figure 4–5 FRU Information Block
Offset  Entry
0       FRU_TYPE (FRU Type)
+4      FRU_DATA (FRU Data)
Table 4–16 FRU Information Block Entry Descriptions
Entry
Contents
FRU_TYPE
The following bits are defined:
01 - The FRU is a module in Zone A (FRU_DATA has slot ID)
02 - The FRU is a module in Zone B (FRU_DATA has slot ID)
03 - Zone A is the FRU
04 - Zone B is the FRU
05 - The cross-link cable is the FRU
06 - The FRU is a Zone A SIMM (FRU_DATA has SIMM ID)
07 - The FRU is a Zone B SIMM (FRU_DATA has SIMM ID)
FRU_DATA
FRU specific data. The following bits are defined for FRU_TYPE values 01 and 02:
00 - CPU module in slot 0 is the FRU
01 - ATM module in slot 1 is the FRU
02 - I/O expansion module in slot 2 is the FRU
[09:03] - Not used
10 - Interface module in slot 10 is the FRU
11 - Interface module in slot 11 is the FRU
12 - Interface module in slot 12 is the FRU
13 - Interface module in slot 13 is the FRU
14 - Interface module in slot 14 is the FRU
15 - Interface module in slot 15 is the FRU
16 - Interface module in slot 16 is the FRU
17 - Interface module in slot 17 is the FRU
[19:18] - Not used
20 - Interface module in slot 20 is the FRU
21 - Interface module in slot 21 is the FRU
22 - Interface module in slot 22 is the FRU
23 - Interface module in slot 23 is the FRU
24 - Interface module in slot 24 is the FRU
25 - Interface module in slot 25 is the FRU
26 - Interface module in slot 26 is the FRU
27 - Interface module in slot 27 is the FRU
[31:28] - Not used
Note
The following fields define the SIMM ID for FRU_TYPEs 06 and 07:
[15:00] = MMB ID from 0 to 3.
[31:16] = SIMM row ID. Values 1 to 4 represent SIMM rows A to D,
respectively.
This field = -1 for all other FRU_TYPE values.
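The SIMM ID layout in this note can be unpacked as shown in the Python sketch below. The function name decode_simm_id is hypothetical, and the range checks simply restate the value ranges given in the note.

```python
NOT_APPLICABLE = 0xFFFFFFFF  # -1 as an unsigned longword


def decode_simm_id(fru_data):
    """Return (MMB ID, SIMM row letter) from FRU_DATA for FRU_TYPEs 06 and 07."""
    if fru_data == NOT_APPLICABLE:
        return None
    mmb_id = fru_data & 0xFFFF           # bits [15:00]
    row_id = (fru_data >> 16) & 0xFFFF   # bits [31:16]
    if mmb_id > 3 or not 1 <= row_id <= 4:
        raise ValueError("value is outside the ranges given for a SIMM ID")
    return mmb_id, "ABCD"[row_id - 1]


simm = decode_simm_id(0x00020001)   # MMB 1, SIMM row B
```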
4.4.3 Deconfiguration Information
This error log block contains information about any system deconfiguration
performed by the EHS. Figure 4–6 identifies each entry in the block and the
offset from the start of the block. Table 4–17 describes the content of each entry.
Note
For errors which require no system deconfiguration, only the FT_FLAGS
fields will be filled in. The last two longwords will contain 0.
Figure 4–6 Deconfiguration Information Block
Offset  Entry
0       FT_FLAGS_BEFORE (Fault Flags Before)
+4      FT_FLAGS_AFTER (Fault Flags After)
+8      DECONFIG_INFO (Entity Deconfigured)
+12     DECONFIG_MODULES (Modules Deconfigured)
Table 4–17 Deconfiguration Information Block Entry Descriptions
Entry
Contents
FT_FLAGS_BEFORE
The contents of EXE$GL_FT_FLAGS at the time the system error
occurred. The field is valid for all errors.
FT_FLAGS_AFTER
The contents of EXE$GL_FT_FLAGS after error handling is
complete. If the EHS performs any system deconfiguration that
includes degraded system mode in the cross-link, this field will
differ from FT_FLAGS_BEFORE. Otherwise, they are the same.
The field is valid for all errors.
DECONFIG_INFO
This field shows the entity which was deconfigured as a result of
the error. This is either a module in a given zone or an entire zone.
The following bits are defined:
00 - Zone A deconfigured.
01 - Zone B deconfigured.
02 - CPU module in Zone A deconfigured.
03 - CPU module in Zone B deconfigured.
04 - ATM module in Zone A deconfigured.
05 - ATM module in Zone B deconfigured.
06 - I/O expansion module in Zone A deconfigured.
07 - I/O expansion module in Zone B deconfigured.
08 - Interface module in Zone A deconfigured.
09 - Interface module in Zone B deconfigured.
DECONFIG_MODULES
This field shows the Zone A modules removed from service as
a result of error handling. For example, if the source of a solid
or excessive transient error were an I/O expansion module, all
attached interface modules have been removed from service. The
following bits are defined:
00 - CPU module in slot 0 has been removed from service.
01 - I/O expansion module in slot 1 has been removed from service.
Set when the expansion module portion of the ATM module in slot 1
is removed from service. Removal of this portion of the ATM module
does not require deconfiguring the entire zone.
02 - I/O expansion module in slot 2 has been removed from service.
03 - ATM module in slot 1 has been removed from service. Set when
the entire ATM module is removed from service. The bits for all
other modules present in the zone will also be set. The entire zone
is deconfigured.
[09:04] - Not used.
10 - Interface module in slot 10 has been removed from service.
11 - Interface module in slot 11 has been removed from service.
12 - Interface module in slot 12 has been removed from service.
13 - Interface module in slot 13 has been removed from service.
14 - Interface module in slot 14 has been removed from service.
15 - Interface module in slot 15 has been removed from service.
16 - Interface module in slot 16 has been removed from service.
17 - Interface module in slot 17 has been removed from service.
[19:18] - Not used.
20 - Interface module in slot 20 has been removed from service.
21 - Interface module in slot 21 has been removed from service.
22 - Interface module in slot 22 has been removed from service.
23 - Interface module in slot 23 has been removed from service.
24 - Interface module in slot 24 has been removed from service.
25 - Interface module in slot 25 has been removed from service.
26 - Interface module in slot 26 has been removed from service.
27 - Interface module in slot 27 has been removed from service.
[31:28] - Not used.
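A DECONFIG_MODULES value can be expanded into a list of removed modules as in the following Python sketch. It treats the numbers in the table as decimal bit positions within the longword, which is an interpretation suggested by the "[31:28] - Not used" entry. The function name and the short module labels are hypothetical.

```python
MODULE_BITS = {
    0: "CPU module (slot 0)",
    1: "I/O expansion portion of ATM module (slot 1)",
    2: "I/O expansion module (slot 2)",
    3: "ATM module (slot 1)",
}
# Interface modules occupy bits 10-17 and 20-27, matching their slot numbers.
for slot in list(range(10, 18)) + list(range(20, 28)):
    MODULE_BITS[slot] = f"Interface module (slot {slot})"


def deconfigured_modules(mask):
    """List the modules whose bits are set in DECONFIG_MODULES."""
    return [MODULE_BITS[bit] for bit in sorted(MODULE_BITS) if mask & (1 << bit)]


# An I/O expansion module failure that also removed two attached interface modules.
removed = deconfigured_modules((1 << 2) | (1 << 10) | (1 << 11))
```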
4.4.4 Threshold Information
When the Transient Error flag is set in the FAULT_FLAGS field of the Fault
Summary block, the isolated FRU error is compared to its error rate threshold.
When the threshold is exceeded, the FRU will be removed from the system. In
addition, the Excessive Transient Errors flag is set in the FAULT_FLAGS field.
When the threshold comparison is completed, the threshold information is written
to the error log. Figure 4–7 identifies each entry in the block and the offset from
the start of the block. Table 4–18 describes the content of each entry.
Note
For errors which do not require a threshold comparison, all entries in this
block will be -1 (FFFFFFFF hex).
Figure 4–7 Threshold Information Block
Offset  Entry
0       THRESH_INT (Threshold Interval)
+4      THRESH_COUNT (Threshold Count)
+8      THRESH_LMT (Threshold Limit)
+12     THRESH_ZERO (Time Since Zeroed)
+16     THRESH_TOTAL (Total Error Types)
Table 4–18 Threshold Information Block Entry Descriptions
Entry
Content
THRESH_INT
The event threshold interval, expressed in seconds.
THRESH_COUNT
The number of events detected within the threshold interval,
expressed in decimal.
THRESH_LMT
The number of events which, if detected within the threshold
interval, will cause the event to be treated as a solid error by the
EHS. Expressed in decimal.
THRESH_ZERO
Time since the threshold count was last zeroed, expressed in
seconds.
THRESH_TOTAL
Total number of errors of this type since the threshold was zeroed,
expressed in decimal.
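The five threshold entries can be read from a raw copy of this block as in the Python sketch below. This is a sketch under stated assumptions: it assumes the block is available as 20 bytes in VAX little-endian order at the offsets shown in Figure 4–7, and the function name is hypothetical.

```python
import struct

THRESHOLD_FIELDS = (
    "THRESH_INT",
    "THRESH_COUNT",
    "THRESH_LMT",
    "THRESH_ZERO",
    "THRESH_TOTAL",
)
NOT_APPLICABLE = 0xFFFFFFFF  # -1: no threshold comparison was required


def parse_threshold_block(block):
    """Return the threshold entries as a dict, or None when all entries are -1."""
    values = struct.unpack_from("<5I", block, 0)
    if all(value == NOT_APPLICABLE for value in values):
        return None
    return dict(zip(THRESHOLD_FIELDS, values))


# Second error seen inside a 12-hour (43200-second) interval with a limit of 3.
sample = struct.pack("<5I", 43200, 2, 3, 50000, 7)
info = parse_threshold_block(sample)
limit_reached = info["THRESH_COUNT"] >= info["THRESH_LMT"]
```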
4.4.5 Fault Data
The Fault Data block has a variable length specific to the class of the fault which
occurred. The error class can be determined by the high-order four bits of the
FAULT_ID field in the Fault Summary block (see Table 4–15). The six Fault Data
types based on these fault classes are shown in Figure 4–8 and described in the
following subsections.
Figure 4–8 Fault Data Block
(Figure shows the six Fault Data layouts: System Registers, End Actions (End
Action Registers), End Action Timeouts, VAXELN Detected Errors, Software
Detected Errors, and Unsynchable Events.)
4.4.5.1 System Registers
The EHS gathers system error information in the course of error handling. The
content of these registers is written to the error log. Table 4–19 lists each register
entry and its offset from the start of the block.
Note
For different system errors, different sets of system registers are collected.
A value of -1 (FFFFFFFF hex) in a system register location in the error
log indicates that the register was not recorded.
Table 4–19 System Register Entry Descriptions
Entry         Content                                                Offset
SYSFLT        JXD System Fault Register                              0
SYSADR        JXD System Error Address Register                      4
DMAADR        DMA Error Address Register                             8
DMA_IO_ADDR   DMA Engine I/O Error Address Register                  12
JCSR_A        JXD Control and Status Register - Zone A               16
JCSR_B        JXD Control and Status Register - Zone B               20
JDIAG_P_A     JXD Diagnostic Error Register - Zone A, primary rail   24
JDIAG_M_A     JXD Diagnostic Error Register - Zone A, mirror rail    28
JDIAG_P_B     JXD Diagnostic Error Register - Zone B, primary rail   32
JDIAG_M_B     JXD Diagnostic Error Register - Zone B, mirror rail    36
ATMERR0_A     JXD ROM BUS ATM Error Register - Zone A                40
ATMERR0_B     JXD ROM BUS ATM Error Register - Zone B                44
DMASTS_A      DMA Status Register - Zone A                           48
DMASTS_B      DMA Status Register - Zone B                           52
MMBERR0_A     JXD ROM BUS MMB Error Register 0 - Zone A              56
MMBERR0_B     JXD ROM BUS MMB Error Register 0 - Zone B              60
MMBERR1_A     JXD ROM BUS MMB Error Register 1 - Zone A              64
MMBERR1_B     JXD ROM BUS MMB Error Register 1 - Zone B              68
SERCRS_A      Serial Cross-Link Control and Status Register - Zone A 72
SERCRS_B      Serial Cross-Link Control and Status Register - Zone B 76
SERMODE_A     Serial Cross-Link Mode Register - Zone A               80
SERMODE_B     Serial Cross-Link Mode Register - Zone B               84
BIU_ADDR_A    CPU BIU Address Register - Zone A                      88
BIU_ADDR_B    CPU BIU Address Register - Zone B                      92
BIU_STAT_A    CPU Fill Syndrome - Zone A                             96
BIU_STAT_B    CPU Fill Syndrome - Zone B                             100
BIU_CTL_A     CPU Fill Address - Zone A                              104
BIU_CTL_B     CPU Fill Address - Zone B                              108
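Because every entry in Table 4–19 is one longword and the offsets advance by 4, a register offset is its position in the table multiplied by 4. The Python sketch below uses that to pick out only the registers that were actually recorded, skipping the -1 placeholders mentioned in the note above. It assumes the block is available as raw little-endian bytes, and the function name is hypothetical.

```python
import struct

# Entry names in Table 4-19 order; offset = 4 * position in this tuple.
SYSTEM_REGISTERS = (
    "SYSFLT", "SYSADR", "DMAADR", "DMA_IO_ADDR",
    "JCSR_A", "JCSR_B",
    "JDIAG_P_A", "JDIAG_M_A", "JDIAG_P_B", "JDIAG_M_B",
    "ATMERR0_A", "ATMERR0_B", "DMASTS_A", "DMASTS_B",
    "MMBERR0_A", "MMBERR0_B", "MMBERR1_A", "MMBERR1_B",
    "SERCRS_A", "SERCRS_B", "SERMODE_A", "SERMODE_B",
    "BIU_ADDR_A", "BIU_ADDR_B", "BIU_STAT_A", "BIU_STAT_B",
    "BIU_CTL_A", "BIU_CTL_B",
)
NOT_RECORDED = 0xFFFFFFFF  # -1: register was not collected for this error


def recorded_registers(block):
    """Return {register name: value} for registers that were recorded."""
    values = struct.unpack_from(f"<{len(SYSTEM_REGISTERS)}I", block, 0)
    return {
        name: value
        for name, value in zip(SYSTEM_REGISTERS, values)
        if value != NOT_RECORDED
    }


raw = [NOT_RECORDED] * len(SYSTEM_REGISTERS)
raw[SYSTEM_REGISTERS.index("SYSFLT")] = 0x12      # arbitrary sample values
raw[SYSTEM_REGISTERS.index("JCSR_A")] = 0x80
registers = recorded_registers(struct.pack("<28I", *raw))
```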
4.4.5.2 End Actions
End action data is provided after diagnostics have completed running on a zone
or CPU which was removed from service as a result of a system error. It is
composed of console and diagnostic status and the contents of registers from the
failed zone/CPU at the time the original system error occurred. Table 4–20 lists
each register entry and its offset from the start of the data block.
Table 4–20 End Actions Register Descriptions
Entry         Content                                                Offset
SYSFLT        JXD System Fault Register                              0
SYSADR        JXD System Error Address Register                      4
JCSR          JXD Control and Status Register                        8
JDIAG_P       JXD Diagnostic Error Register - primary rail           12
JDIAG_M       JXD Diagnostic Error Register - mirror rail            16
MMBERR0       JXD ROM BUS MMB Error Register 0                       20
MMBERR1       JXD ROM BUS MMB Error Register 1                       24
ATMERR0       JXD ROM BUS ATM Error Register                         28
DMASTS        DMA Status Register                                    32
DMAADR        DMA Error Address Register                             36
SERCRS        Serial Cross-Link Control and Status Register          40
SERMODE       Serial Cross-Link Mode Register                        44
SAVPC         CPU Saved PC - Zone A                                  48
SAVPSL        CPU Saved PSL                                          52
ECR           CPU EBox Control Register                              56
BIU_CTL       CPU BIU Control Register                               60
BC_TAG        CPU B-cache Error Tag                                  64
BIU_STS       CPU BIU Status Register                                68
BIU_ADDR      CPU BIU Address Register                               72
FIL_SYN       CPU Fill Syndrome                                      76
FIL_ADDR      CPU Fill Address                                       80
VMAR          CPU VIC Memory Address Register                        84
ICSR          CPU IBox Control and Status Register                   88
TBADR         CPU MBox TB Parity Address                             92
TBSTS         CPU MBox TB Parity Status                              96
PCSTS         CPU P-cache Status Register                            100
PCCTL         CPU P-cache Control Register                           104
CONSOLE_STS   System Console Duplex Compatibility Status             108
DIAG_STS      System Diagnostics Status Longword                     112
4.4.5.3 End Action Timeouts
This data is provided when a zone or CPU which was temporarily removed from
service due to a fault fails to communicate through the interzone communication
service (IZC) to the remaining zone after running diagnostics. In many cases,
such a situation results in the EHS declaring a solid error for the CPU or zone in
this error log.
Figure 4–9 shows the format of this Fault Data block entry and its offset.
Table 4–21 contains a brief description of the entry.
Figure 4–9 End Action Timeout Block
Offset  Entry
0       TIMEOUT_INT (Timeout Interval)
Table 4–21 End Action Timeout Block Entry Description
Entry     Content                                  Offset
TIMEOUT   End action timeout interval in seconds   0
4.4.5.4 VAXELN Detected Errors
This data is provided for errors detected by VAXELN software running on the I/O
expansion module. It is composed of data provided by VAXELN software when
the error was detected on the I/O expansion module.
Figure 4–10 shows the format of this Fault Data block and the offset of each
entry from the start of the block. Table 4–22 contains a brief description of each
entry.
Figure 4–10 VAXELN Detected Error Block
Offset  Entry
0       ERROR_CLASS (VAXELN Error Class)
+4      ERROR_TYPE (VAXELN Error Type)
+8      JOB_ID (ELN Component Job with Error)
+12     ERROR_CODE (Unique Error Designation Code)
+16     ERROR_DATA (Error Condition Specific Data)
Table 4–22 VAXELN Detected Error Block Entry Descriptions
Entry
Contents
ERROR_CLASS
VAXELN error class:
1 - VAXELN kernel fatal error
2 - VAXELN kernel recoverable error
3 - VAXELN master job fatal error
4 - VAXELN master job recoverable error
5 - VAXELN job fatal error
6 - VAXELN job recoverable error (reserved for future use)
ERROR_TYPE
VAXELN error type:
1 - Hardware error
2 - Software error
3 - Unknown error
JOB_ID
VAXELN component job with error:
0 - Interface module 0 driver job
1 - Interface module 1 driver job
2 - Interface module 2 driver job
3 - Interface module 3 driver job
4 - Interface module 4 driver job
5 - Interface module 5 driver job
6 - Interface module 6 driver job
7 - Interface module 7 driver job
8 - UART 0 driver job
9 - UART 1 driver job
10 - VAXELN master job
13 - VAXELN FIST job
14 - VAXELN background job
15 - VAXELN I/O expansion module error
17 - VAXELN kernel error
ERROR_CODE
Unique error designation code (in hexadecimal)
9000 - Watchdog timer expired
FA03 - Job initialization failed
FA04 - Job initialization timeout
CA01 - Unexpected command interrupt
CA02 - Unexpected interface module interrupt
0 - Machine check handler entered with unknown type code
11 - Floating point accelerator error
15 - Memory management - PTE in P0 space
16 - Memory management - PTE in P1 space
17 - Memory management - PTE in P0 space on M bit
18 - Memory management - PTE in P1 space on M bit
19 - Unused interrupt priority level
1A - Microcode detected error
80 - Unknown hardware error
10080 - Bus timeout error. Read error - normal read
20080 - DAL parity error. Read error - normal read
30080 - Cache parity error. Read error - normal read
40080 - Uncorrectable read data error. Read error - normal read
50080 - DMA error. Read error - normal read
60080 - Firewall SOC miscompare. Read error - normal read
81 - Unknown hardware error. Read error - SPTE/PCB/SCB
10081 - Bus timeout error. Read error - SPTE/PCB/SCB
20081 - DAL parity error. Read error - SPTE/PCB/SCB
30081 - Cache parity error. Read error - SPTE/PCB/SCB
40081 - Uncorrectable read data error. Read error - SPTE/PCB/SCB
50081 - DMA error. Read error - SPTE/PCB/SCB
60081 - Firewall SOC miscompare. Read error - SPTE/PCB/SCB
82 - Unknown hardware error. Write error - normal write
10082 - Bus timeout error. Write error - normal write
20082 - DAL parity error. Write error - normal write
30082 - Cache parity error. Write error - normal write
40082 - Uncorrectable read data error. Write error - normal write
50082 - DMA error. Write error - normal write
60082 - Firewall SOC miscompare. Write error - normal write
83 - Unknown hardware error. Write error - SPTE/PCB
10083 - Bus timeout error. Write error - SPTE/PCB
20083 - DAL parity error. Write error - SPTE/PCB
30083 - Cache parity error. Write error - SPTE/PCB
40083 - Uncorrectable read data error. Write error - SPTE/PCB
50083 - DMA error. Write error - SPTE/PCB
60083 - Firewall SOC miscompare. Write error - SPTE/PCB
100 - Correctable read data error
200 - Polled machine bus timeout error
201 - Polled machine DAL parity error
202 - Polled machine cache parity error
203 - Polled machine uncorrectable read data error
204 - Polled machine DMA error
205 - Polled machine Firewall SOC miscompare
206 - Polled machine battery low
400 - Fatal system bugcheck
401 - Nonfatal system bugcheck
402 - Bugcheck from process
800 - Bugcheck during boot
1 - Normal successful completion
7C04 - Bad parameter count
7C0C - Bad job or process creation
7C14 - Bad string parameter length
7C1C - Bad access mode
7C24 - Bad stack
7C2C - Bad object state
7C34 - Bad object type
7C3C - Bad parameter value
7C44 - Connect circuit completed
7C4C - Connect circuit pending
7C54 - Connect circuit timeout
7C5C - Count overflow
7C64 - Count underflow
7C6C - Debug signal
7C74 - Device already connected
7C7C - Circuit disconnected by partner
7C84 - Duplicate name
7C8C - Kernel stack not valid
7C94 - Machine check
7C9C - No access to parameter
7CA4 - No destination port
7CAC - No job initialization specified
7CB4 - No physical memory available
7CBC - No I/O mapping register available
7CC4 - No message available
7CCC - No object table entry available
7CD4 - No process page table available
7CDC - No data path register available
7CE4 - No pool available
7CEC - No port available
7CF4 - No exit status value specified
7CFC - No such device
7D04 - No such name
7D0C - No such port
7D14 - No such program
7D1C - No such service
7D24 - No system page table entries available
7D2C - No virtual address space available
7D34 - Power recovery signal
7D3C - Quit signal
7D44 - Remote port value
7D4C - Process exit signal
7D54 - Remote system currently unreachable
7D5C - Interprocess signal
7D64 - Remote system rejected username or password
7D6C - Bad message size
7D74 - Referenced shareable image not present
7D7C - Unsupported program image format
7D84 - Internal consistency failure
7D8C - Port on another BI node
7D94 - Third party disconnected circuit
7D9C - Network is in the off state
7DA4 - No such job
7F01 - Time has not been previously set
7F09 - Expedited message
7F11 - Previous job created area
7F19 - Device already exists
ERROR_DATA
Error condition specific data. This entry is reserved for future
expansion.
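The read and write error codes listed under ERROR_CODE (80 through 60083) follow a visible pattern: the value above bit 16 matches the cause, and the low 16 bits match the type of access. The Python sketch below decodes a code on that basis. The split is an inference from the listed values, not a format the table defines, and the function name is hypothetical. Note also that the table lists code 80 as "Unknown hardware error" without an access description.

```python
CAUSES = {
    0: "Unknown hardware error",
    1: "Bus timeout error",
    2: "DAL parity error",
    3: "Cache parity error",
    4: "Uncorrectable read data error",
    5: "DMA error",
    6: "Firewall SOC miscompare",
}

ACCESSES = {
    0x80: "Read error - normal read",
    0x81: "Read error - SPTE/PCB/SCB",
    0x82: "Write error - normal write",
    0x83: "Write error - SPTE/PCB",
}


def describe_access_error(code):
    """Describe a read/write ERROR_CODE, or return None for any other code."""
    cause = code >> 16        # inferred: cause is held above bit 16
    access = code & 0xFFFF    # inferred: access type is the low 16 bits
    if cause not in CAUSES or access not in ACCESSES:
        return None
    return f"{CAUSES[cause]}. {ACCESSES[access]}"


text = describe_access_error(0x30082)
```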
4.4.5.5 Software Detected Errors
This data is provided for errors detected by the OpenVMS operating system
components. Such errors are not usually detected by hardware mechanisms. The
data is composed of information passed by the operating system component to the
EHS.
Figure 4–11 shows the format of this fault data block and the offset of each entry
from the start of the block. Table 4–23 contains a brief description of each entry.
Note
If the software component which detects the module failure does not
request the setting of the module ID NVRAM status code or does not
request a reset of the module, then these fields will contain -1 (FFFFFFFF
hexadecimal).
Figure 4–11 Software Detected Error Block
Offset  Entry
0       MODULE_STATUS
+4      RESET_REASON
+8      RESET_ACTION
Table 4–23 Software Detected Error Block Entry Descriptions
Entry
Contents
MODULE_STATUS
Hexadecimal module ID NVRAM status code. The following values
are defined:
0F - Excessive CPU/MEM faults
1E - Excessive resynchronization abort errors
2D - Double-bit error
3C - Excessive single-bit errors
4B - Excessive clock phase errors
5A - Excessive CPU I/O errors
69 - Solid CPU I/O errors
78 - Excessive transient NXIO errors
87 - Solid NXIO error
96 - VAXELN kernel fatal error
A5 - The module is good
B4 - Excessive VAXELN kernel recoverable errors
C3 - VAXELN master fatal error
D2 - VAXELN master recoverable error
E1 - VAXELN job fatal error
F0 - System software detected module failure
F1 - System software detected I/O expansion module primary UART
failure
F2 - System software detected I/O expansion module auxiliary UART
failure
F3 - Unexpected VAXELN error detected
RESET_REASON
Hexadecimal OpenVMS reset reason code. The following values are
defined:
1 - Duplex zones have diverged
2 - Fatal cross-link error has occurred
3 - Fatal zone error has occurred
4 - Fatal ATM module error has occurred
5 - Fatal CPU module error has occurred
6 - Fatal memory error has occurred
7 - Single-bit error has occurred
8 - User command issued to stop a zone
9 - Unexpected machine check has occurred
A - Software detected failure has occurred
B - Solid NXIO error has occurred
C - Excessive transient I/O expansion module errors have occurred
D - A solid I/O error has occurred
E - Excessive transient I/O errors have occurred
F - Excessive VAXELN kernel recoverable errors have occurred
10 - A VAXELN master fatal error has occurred
11 - A VAXELN job fatal error has occurred
12 - Not enough SPTEs could be allocated to boot the OpenVMS
operating system
13 - Unexpected system error occurred
14 - Interface module error has occurred
15 - Unexpected VAXELN error occurred
16 - A VAXELN kernel fatal error has occurred
RESET_ACTION
Hexadecimal console reset action code. The following values are
defined:
0 - Unexpected CPU reset
1 - No diagnostic CPU reset
2 - Dispatch request CPU reset
3 - Resynchronization reset CPU reset
4 - Run diagnostic CPU reset
5 - Reconfigure console CPU reset
6 - STOP/ZONE CPU reset
10000 - Unexpected I/O reset
10001 - No diagnostic I/O reset
10002 - Dispatch request I/O reset
10003 - Z command I/O reset
10004 - Load and run (VAXELN) I/O reset
10005 - Upgrade flash ROM I/O reset
10006 - Run diagnostic I/O reset
10007 - Reconfigure console I/O reset
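The RESET_ACTION values fall into two groups: 0 through 6 for CPU resets and 10000 through 10007 (hexadecimal) for I/O resets. The Python sketch below decodes a value by treating the upper 16 bits as the reset target and the lower 16 bits as the action. That split is inferred from the listed values rather than stated in the table, and the function name is hypothetical. A value of -1 is returned as None, following the note in this section about fields that were not requested.

```python
CPU_RESET_ACTIONS = {
    0: "Unexpected",
    1: "No diagnostic",
    2: "Dispatch request",
    3: "Resynchronization reset",
    4: "Run diagnostic",
    5: "Reconfigure console",
    6: "STOP/ZONE",
}

IO_RESET_ACTIONS = {
    0: "Unexpected",
    1: "No diagnostic",
    2: "Dispatch request",
    3: "Z command",
    4: "Load and run (VAXELN)",
    5: "Upgrade flash ROM",
    6: "Run diagnostic",
    7: "Reconfigure console",
}

TARGETS = {0: ("CPU", CPU_RESET_ACTIONS), 1: ("I/O", IO_RESET_ACTIONS)}
NOT_REQUESTED = 0xFFFFFFFF  # -1: no module reset was requested


def describe_reset_action(code):
    """Return the table text for a RESET_ACTION value, or None if not requested."""
    if code == NOT_REQUESTED:
        return None
    target, action = code >> 16, code & 0xFFFF
    if target not in TARGETS or action not in TARGETS[target][1]:
        raise ValueError(f"undefined RESET_ACTION value {code:X}")
    name, actions = TARGETS[target]
    return f"{actions[action]} {name} reset"


io_action = describe_reset_action(0x10004)
cpu_action = describe_reset_action(0x4)
```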
4.4.5.6 Unsynchable Events
This data is provided if the console reports that a zone or CPU is unsynchable
when no previous error had been associated with it. The error can occur when
diagnostics run on a zone which was not present in the system configuration, or
after a zone has been manually removed. The data is composed of console and
diagnostic status from the failed zone.
Figure 4–12 shows the format of this Fault Data block and the offset of each field
from the start of the block. Table 4–24 contains a brief description of each entry.
Figure 4–12 Unsynchable Event Block
Offset  Field
0       COMPAT_STS (Test Status)
+4      DIAG_STS (Diagnostic Status)
Table 4–24 Unsynchable Event Block Entry Descriptions
Bit
Description
COMPAT_STS
System console duplex compatibility test status. This field indicates
the results of the compatibility test performed by the console after
diagnostics have completed. The following bits are defined:
00   Self test failed
01   Zone test failed
02   System test failed
03   ATM module self test failed
04   Both zones have same zone ID
05   CPU ID EEPROM is bad
06   CPU ID EEPROM has bad OpenVMS status
07   CPU ID EEPROM has bad firmware status
08   CPU ID EEPROM module ID mismatches with other zone
09   CPU ID EEPROM module name mismatches with other zone
10   CPU ID EEPROM hardware revision not compatible with other zone
11   CPU ID EEPROM firmware revision not compatible with other zone
12   CPU ID EEPROM software revision not compatible with other zone
13   ATM module ID EEPROM is bad
14   ATM module ID EEPROM has bad OpenVMS status
15   ATM module ID EEPROM has bad firmware status
16   ATM module ID EEPROM module ID mismatches with other zone
17   ATM module ID EEPROM module name mismatches with other zone
18   ATM module ID EEPROM hardware revision not compatible with other zone
19   ATM module ID EEPROM firmware revision not compatible with other zone
20   ATM module ID EEPROM software revision not compatible with other zone
21   CPU data EEPROM is bad
22   CPU data EEPROM system wide data area mismatches with other zone
23   CPU memory configuration mismatches with other zone
24   Cables (cross-link/resynchronization)
25   CPU is in burn-in mode
26   Ethernet EEPROM mismatches with other zone
27   CPU console firmware cannot be run in Duplex
[31:28]  Not used
DIAG_STS
System diagnostic status longword. This field is valid when any of bits
[03:00] are set in COMPAT_STS. This longword gives additional detail
on the diagnostic failure indicated by those bits. The following bits are
defined:
[07:00]  Subtest number, expressed in decimal
[15:08]  Test number, expressed in decimal
[23:16]  Group number, expressed in decimal
[27:24]  Diagnostic flags, expressed in hexadecimal
[30:28]  Not used
31       Diagnostic status is valid
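The two longwords of the Unsynchable Event block can be unpacked with ordinary shift and mask operations. The following Python sketch is illustrative only. The function and constant names are hypothetical, the bit numbers in Table 4–24 are taken to be decimal with bit 0 as the least significant bit, and the caller is assumed to have already read COMPAT_STS (offset 0) and DIAG_STS (offset +4) as unsigned 32-bit integers.

```python
# Descriptions transcribed from Table 4-24, indexed by COMPAT_STS bit number.
COMPAT_STS_BITS = (
    "Self test failed",
    "Zone test failed",
    "System test failed",
    "ATM module self test failed",
    "Both zones have same zone ID",
    "CPU ID EEPROM is bad",
    "CPU ID EEPROM has bad OpenVMS status",
    "CPU ID EEPROM has bad firmware status",
    "CPU ID EEPROM module ID mismatches with other zone",
    "CPU ID EEPROM module name mismatches with other zone",
    "CPU ID EEPROM hardware revision not compatible with other zone",
    "CPU ID EEPROM firmware revision not compatible with other zone",
    "CPU ID EEPROM software revision not compatible with other zone",
    "ATM module ID EEPROM is bad",
    "ATM module ID EEPROM has bad OpenVMS status",
    "ATM module ID EEPROM has bad firmware status",
    "ATM module ID EEPROM module ID mismatches with other zone",
    "ATM module ID EEPROM module name mismatches with other zone",
    "ATM module ID EEPROM hardware revision not compatible with other zone",
    "ATM module ID EEPROM firmware revision not compatible with other zone",
    "ATM module ID EEPROM software revision not compatible with other zone",
    "CPU data EEPROM is bad",
    "CPU data EEPROM system wide data area mismatches with other zone",
    "CPU memory configuration mismatches with other zone",
    "Cables (cross-link/resynchronization)",
    "CPU is in burn-in mode",
    "Ethernet EEPROM mismatches with other zone",
    "CPU console firmware cannot be run in Duplex",
)

# DIAG_STS is meaningful only when one of COMPAT_STS bits [03:00] is set.
DIAGNOSTIC_FAILURE_MASK = 0x0000000F


def decode_compat_sts(compat_sts: int) -> list:
    """Return the descriptions of all bits set in COMPAT_STS (bits 27:00)."""
    return [text for bit, text in enumerate(COMPAT_STS_BITS)
            if (compat_sts >> bit) & 1]


def decode_diag_sts(compat_sts: int, diag_sts: int):
    """Split DIAG_STS into its fields, or return None when it does not apply."""
    if not (compat_sts & DIAGNOSTIC_FAILURE_MASK):
        return None
    return {
        "valid": bool((diag_sts >> 31) & 0x1),   # bit 31
        "flags": (diag_sts >> 24) & 0xF,         # bits [27:24], shown in hex
        "group": (diag_sts >> 16) & 0xFF,        # bits [23:16], shown in decimal
        "test": (diag_sts >> 8) & 0xFF,          # bits [15:08], shown in decimal
        "subtest": diag_sts & 0xFF,              # bits [07:00], shown in decimal
    }
```

Returning None when none of bits [03:00] are set keeps a caller from interpreting a DIAG_STS longword that the table says is not valid for that event.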
4.5 Module NVRAM Status and LED Indicators
Each Model 810 zone contains multiple I2C buses that provide access to the
NVRAMs and LEDs on each module. The system I2C bus connects all the modules
in the primary backplane slots of a zone and has master controllers on the
I/O ATM module. This bus is used to access the NVRAMs and LEDs on the CPU
and I/O ATM modules and on the embedded primary I/O expansion module. The
primary I/O expansion module has its own I2C bus, with a master controller
and a connection to each Interface module, for access to the NVRAMs and LEDs
on those modules.
When the EHS identifies a module as the source of solid or excessive transient
errors, it removes the module from service. At the same time, it flags the module
as failed, turns on the module LED, and writes the error code to the module
NVRAM through its I2C bus. When the zone is removed for service, the LED
remains on.
When repair is complete and system power is turned on, diagnostics on the CPU
or I/O expansion module will examine the error code. If the OpenVMS operating
system flagged the module as failed, or diagnostics fail, the diagnostics will
not turn off the LED. The LED remains on until the module is replaced or the
NVRAM is cleared.
Table 4–25 lists the status codes that the EHS may write into the operating
system status field of the module ID NVRAM, as well as symbol names,
descriptions, and affected modules. The EHS sets the module LED every time it
writes one of these status codes.
Note
In the case of some catastrophic ATM failures, it may not be possible to
access the I2C bus for that zone to write the code and set the LED. In
such cases, diagnostics on the remote zone are relied on to report the
failure.
Table 4–25 Module ID NVRAM/DCB Status Codes
Status
Code    Description                                 Affected Modules
0F      The threshold for CPU/MEM faults for this   CPU module
        module has been exceeded.
1E      The threshold for resynch abort errors for  CPU module
        this module has been exceeded.
2D      The module experienced a double-bit memory  CPU module
        error.
3C      The threshold for single-bit errors for a   CPU module
        memory SIMM has been exceeded.
4B      The zone in which this module resides has   ATM module
        experienced excessive clock phase errors.
5A      The module has experienced excessive        ATM and I/O expansion
        transient CPU I/O errors.                   modules
69      The module has experienced a solid CPU I/O  ATM and I/O expansion
        error.                                      modules
78      The module has experienced excessive        ATM, I/O expansion, and
        transient NXIO errors.                      Interface modules
87      The module has experienced a solid NXIO     ATM, I/O expansion, and
        error.                                      Interface modules
96      The module has experienced a VAXELN kernel  I/O expansion module
        fatal error.
A5      The module is good.                         CPU, ATM, I/O expansion,
                                                    and Interface modules
B4      The module has experienced excessive VAXELN I/O expansion module
        kernel recoverable errors.
C3      The module has experienced a VAXELN master  I/O expansion module
        fatal error.
D2      The module has experienced a VAXELN master  Interface module
        recoverable error.
E1      The module has experienced a VAXELN job     Interface module
        fatal error.
F0      A failure of this module has been detected  ATM, I/O expansion, and
        by a system software component.             Interface modules
F1      A failure of the system console UART port   ATM and I/O expansion
        in the SSC on the I/O expansion module has  module
        been detected by a system software
        component.
F2      A failure of the auxiliary UART port in the ATM and I/O expansion
        SSC on the I/O expansion module has been    module
        detected by a system software component.
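When the operating system status field of a module ID NVRAM is read back during service, the codes in Table 4–25 can be interpreted with a lookup such as the one sketched below in Python. This is illustrative only: the names are hypothetical, the descriptions are shortened from the table, and how the byte is read from the NVRAM is outside the scope of the sketch. The table values 0F through F0 also show a regular pattern in which the low hexadecimal digit is the bitwise complement of the high digit, while F1 and F2 do not. This manual does not state that the pattern is intentional, so the check below should be treated as an observation about the table, not as a documented validity test.

```python
# Status codes transcribed from Table 4-25 (descriptions shortened).
MODULE_STATUS = {
    0x0F: "CPU/MEM fault threshold exceeded",
    0x1E: "Resynch abort error threshold exceeded",
    0x2D: "Double-bit memory error",
    0x3C: "Single-bit error threshold exceeded for a memory SIMM",
    0x4B: "Excessive clock phase errors in this zone",
    0x5A: "Excessive transient CPU I/O errors",
    0x69: "Solid CPU I/O error",
    0x78: "Excessive transient NXIO errors",
    0x87: "Solid NXIO error",
    0x96: "VAXELN kernel fatal error",
    0xA5: "Module is good",
    0xB4: "Excessive VAXELN kernel recoverable errors",
    0xC3: "VAXELN master fatal error",
    0xD2: "VAXELN master recoverable error",
    0xE1: "VAXELN job fatal error",
    0xF0: "Module failure detected by system software",
    0xF1: "System console UART port failure detected by system software",
    0xF2: "Auxiliary UART port failure detected by system software",
}

MODULE_GOOD = 0xA5


def describe_status(code: int) -> str:
    """Return the table description, or a placeholder for undefined codes."""
    return MODULE_STATUS.get(code, f"Undefined status code {code:02X}")


def flagged_failed(code: int) -> bool:
    """True when the code is a defined status other than 'module is good'."""
    return code in MODULE_STATUS and code != MODULE_GOOD


def is_nibble_complement(code: int) -> bool:
    """True when the low hex digit is the complement of the high hex digit."""
    return ((code >> 4) ^ (code & 0xF)) == 0xF
```

For example, describe_status(0x2D) returns "Double-bit memory error", and flagged_failed(0xA5) is False.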
4.6 FTSS Event Reporting Interface
The EHS externalizes events by reporting them to the event reporting interface
(ERI). The ERI, in turn, passes notification of the event to the FTSS$SERVER
process. The server reports the event in one of three ways:
1. Generating messages that are sent to the operator console.
2. Entering additional information into the system error log.
3. Reporting the event to an external mailbox which can be read by a user
application.
4.6.1 Event Reporting Interface Routines
The EHS reports events by calling the following ERI routines located in the
FTSS$CORE image.
FTSS$ZONE_AVAILABLE is called to report the availability of the other zone
or CPU. This occurs when the IZC notifies the EHS that the zone has completed
diagnostics and is available for use. The EHS adds a message code, and the
server then generates an OPCOM message and an error log entry.
FTSS$ERROR_REPORT is called by the EHS when a FRU is identified as the
error source. This can occur as a result of a hardware or software detected
failure. In this call the EHS passes error information through ERI to the server
process. The server generates the appropriate messages to the operator console
and user applications, and makes entries in the error log.
4.6.2 Error Event Messages
The following messages are passed to OPCOM and the system error log by
the server. Each message corresponds to an EHS error event and contains
information that identifies the FRU.
FTSS$_CABLEGONE, cross-link cable fault detected
Facility: FTSS
Explanation: The cross-link cable has been isolated as the cause of a system
failure. One zone will be removed from service by the operating system. For
transient failures, the error will be compared to its error rate threshold.
If the threshold is not exceeded, the zone will be resynchronized when it
completes diagnostics.
User Action: If the zone is automatically resynchronized, no action
is required on the part of the user. If the zone is not automatically
resynchronized, the system error log should be examined for entries which
correspond to the cross-link cable failure. These entries will identify an FRU.
FTSS$_CLOCK_END, Clock fault end action complete
Facility: FTSS
Explanation: Error processing for a clock fault has been completed and the
zone is available to be resynchronized.
User Action: If the zone is automatically resynchronized by FTSS, then no
action is needed on the part of the user. If the zone is not resynchronized,
the system error log should be examined for entries which correspond to the
clock fault. These error logs will identify an FRU.
FTSS$_CLOCK_ENDTMO, Clock fault end action timeout on zone [zone_id]
Facility: FTSS
Explanation: When a clock fault occurs in a non-Simplex system, diagnostics
normally run on the failed zone and, upon completion, report status back to
the zone running the operating system. If this end action does not occur
within a reasonable timeout period, the failure will be treated as solid and
the zone will not be automatically resynchronized by FTSS.
User Action: The system error log should be examined for entries which
correspond to the clock fault and the end action timeout. These entries will
indicate an FRU.
FTSS$_CLOCKFLT, Clock fault detected on [module_id] in slot [slot_id], zone
[zone_id]
Facility: FTSS
Explanation: The clocks in each of the two zones operate in phase lock.
When this synchronization is lost, lockstep operation of the zones is lost. The
error is compared to its error rate threshold. If the threshold is exceeded, the
zone is not automatically resynchronized by FTSS.
User Action: If the removed zone is automatically resynchronized after
running diagnostics, no action is needed on the part of the user. If the zone
is not automatically resynchronized, the system error log should be examined
for entries which correspond to the clock fault. These entries will identify an
FRU which must be replaced.
FTSS$_CPMF_END, CPU/MEM fault end action complete
Facility: FTSS
Explanation: Error processing for a CPU/MEM fault has been completed
and the CPU is available to be resynchronized.
User Action: If the CPU is automatically resynchronized by FTSS, then no
action is needed on the part of the user. If the CPU is not resynchronized,
the system error log should be examined for entries which correspond to the
CPU/MEM fault. These error logs will identify an FRU.
FTSS$_CPMF_ENDTMO, CPU/MEM fault end action timed out on zone [zone_id]
Facility: FTSS
Explanation: When a CPU/MEM fault occurs in a Duplex system,
diagnostics normally run on the failed CPU and, upon completion, report
status back to the zone running the operating system. If this end action does
not occur within a reasonable timeout period, the failure will be treated as
solid and the CPU will not be automatically resynchronized by FTSS.
User Action: The system error log should be examined for entries which
correspond to the CPU/MEM fault and the end action timeout. These entries
will indicate an FRU.
FTSS$_CPUDBE, Double-bit memory fault detected on [module_id] in slot
[slot_id], zone [zone_id]
Facility: FTSS
Explanation: A double-bit memory error has occurred. This indicates a solid
memory failure. This error will only be reported in a Duplex system and a
CPU module will be removed from service when it occurs.
User Action: The system error log should be examined for entries which
correspond to the double-bit error. These logs will indicate the SIMM memory
row which must be replaced.
FTSS$_CPUSBE, A single-bit memory fault detected on [module_id] in slot
[slot_id], zone [zone_id]
Facility: FTSS
Explanation: A recoverable single-bit memory error has been detected and
handled by the operating system. These transient errors are repaired in
memory and compared to their error rate threshold. In a Duplex system, a
CPU module will be removed from service if the threshold is exceeded.
User Action: In most cases, no action by the user is necessary. If the rate of
single-bit errors becomes excessive, replacement of a SIMM memory row or
CPU module will be required. The system error log should be examined for
the entries which correspond to the single-bit errors.
FTSS$_CPUMEMFLT, CPU/MEM fault detected on [module_id] in slot [slot_id],
zone [zone_id]
Facility: FTSS
Explanation: A CPU/MEM fault in a Duplex system has been detected.
This results in the temporary removal of that CPU from service. This error
is compared to its error rate threshold. If the threshold is not exceeded and
the CPU completes diagnostics successfully, the CPU will be automatically
resynchronized. If the threshold is exceeded or diagnostics fail, the CPU will
not be automatically resynchronized.
User Action: If the CPU is automatically resynchronized after the
completion of diagnostics, no action is required on the part of the user. If
the CPU is not automatically resynchronized, the system error log should be
examined for entries which correspond to the CPU/MEM fault. These entries
will indicate an FRU.
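The resynchronization rule described for CPU/MEM faults, together with the end action timeout described under FTSS$_CPMF_ENDTMO, reduces to a small decision. The Python sketch below only restates that rule for reference. The function and parameter names are hypothetical, and it does not represent how FTSS implements the check.

```python
def auto_resync_expected(threshold_exceeded: bool,
                         diagnostics_passed: bool,
                         end_action_timed_out: bool = False) -> bool:
    """Restate the documented rule for automatic resynchronization.

    A removed CPU is resynchronized automatically only when its error rate
    threshold has not been exceeded and it completes diagnostics
    successfully. A timed-out end action is treated as a solid failure.
    """
    if end_action_timed_out:
        return False
    return (not threshold_exceeded) and diagnostics_passed
```

When the function returns False, the corresponding User Action applies: examine the system error log for the entries that identify an FRU.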
FTSS$_CPUUNSYNC, [module_id] in slot [slot_id], zone [zone_id] is
unsynchable
Facility: FTSS
Explanation: When a CPU completes diagnostics with failure and reports
this status to the zone running the operating system, this message is
generated. The CPU with the failure will not be automatically resynchronized
by FTSS.
User Action: The system error log should be examined for the entry which
corresponds to the unsynchable event. This entry will indicate an FRU.
FTSS$_DBE_END, DBE end action complete
Facility: FTSS
Explanation: Error processing for a double-bit memory error has been
completed and the CPU is available to be resynchronized.
User Action: The system error log should be examined for entries which
correspond to the double-bit error. These error logs will identify an FRU.
FTSS$_DBE_ENDTMO, DBE end action timed out on zone [zone_id]
Facility: FTSS
Explanation: When double-bit memory errors occur in a Duplex system,
diagnostics run on the failed CPU and, upon completion, report status back
to the zone running the operating system. If this end action does not occur
within a reasonable timeout period, the failure will be treated as solid and
the CPU will not be automatically resynchronized by FTSS.
User Action: The system error log should be examined for entries which
correspond to the double-bit error and the end action timeout. These entries
will indicate an FRU.
FTSS$_DIV_END, zone divergence end action complete
Facility: FTSS
Explanation: Error processing for a zone divergence error has been completed
and the zone is available to be resynchronized.
User Action: If the zone is automatically resynchronized by FTSS, then no
action is needed on the part of the user. If the zone is not resynchronized,
the system error log should be examined for entries which correspond to the
zone divergence error. These error logs will identify an FRU.
FTSS$_DIV_ENDTMO, zone divergence end action timed out on zone [zone_id]
Facility: FTSS
Explanation: When zones diverge in a Duplex system, diagnostics run on
the removed zone and, on completion, report status to the zone running
the OpenVMS operating system. If this end action does not occur within a
reasonable timeout period, the failure will be treated as solid and the zone
will not be automatically resynchronized by FTSS.
User Action: The system error log should be examined for entries which
correspond to the zone divergence and the end action timeout. These entries
will indicate an FRU.
FTSS$_DIVERGED, A synchronized, dual zone configuration has diverged
Facility: FTSS
Explanation: Lockstep operation between the two zones of a Duplex system
has been lost. One of the zones is temporarily removed from service. The
error is compared to its error rate threshold. If the threshold is not exceeded,
the zone will be automatically resynchronized by FTSS after successfully
completing diagnostics. If the threshold is exceeded or diagnostics fail,
the zone is not automatically resynchronized.
User Action: If the zone is automatically resynchronized, no action
is necessary on the part of the user. If the zone is not automatically
resynchronized, the system error log should be examined for entries which
correspond to the zone divergence error. These entries will indicate an FRU.
FTSS$_ELNJOBFATAL, VAXELN job fatal error detected on [module_id] in slot
[slot_id], zone [zone_id]
Facility: FTSS
Explanation: A VAXELN job running on an I/O Expansion module has
detected a fatal error and has terminated. This error results in the removal
of the associated Interface module from the system.
User Action: The system error log should be examined for entries which
correspond to the VAXELN job fatal error. These entries will indicate an
FRU.
FTSS$_ELNJOBRECOV, VAXELN job recoverable error detected on [module_id]
in slot [slot_id], zone [zone_id]
Facility: FTSS
Explanation: A VAXELN job running on an I/O Expansion module has
detected a recoverable error. These errors are compared to their error
rate threshold by the operating system. If the threshold is exceeded in a
non-Simplex system, the associated Interface module is removed from the
system.
User Action: If the threshold is not exceeded, no action is required on the
part of the user. If the threshold is exceeded, the system error log should be
examined for entries which correspond to the VAXELN job recoverable error.
These entries will indicate an FRU.
FTSS$_ELNKERFATAL, VAXELN kernel fatal error detected on [module_id] in
slot [slot_id], zone [zone_id]
Facility: FTSS
Explanation: The VAXELN kernel running on an I/O Expansion module has
detected a fatal error and has terminated. This error results in the removal
of the indicated I/O Expansion module and associated Interface modules from
the system configuration.
User Action: The system error log should be examined for entries which
correspond to the VAXELN kernel fatal error. These entries will indicate an
FRU.
FTSS$_ELNKERRECOV, VAXELN kernel recoverable error detected on
[module_id] in slot [slot_id], zone [zone_id]
Facility: FTSS
Explanation: The VAXELN kernel running on an I/O Expansion module
has detected a recoverable error. These errors are compared to their error
rate threshold by the operating system. If the threshold is exceeded in a
non-Simplex system, the indicated I/O Expansion module and associated
Interface modules are removed from service.
User Action: If the threshold is not exceeded, no action is required on the
part of the user. If the threshold is exceeded, the system error log should be
examined for entries which correspond to the VAXELN kernel recoverable
errors. These entries will indicate an FRU.
FTSS$_ELNMASFATAL, VAXELN master job fatal error detected on [module_id]
in slot [slot_id], zone [zone_id]
Facility: FTSS
Explanation: The VAXELN master job running on an I/O Expansion module
has detected a fatal error and has terminated. This error results in the
removal of the indicated I/O Expansion module and associated Interface
modules from the system configuration.
User Action: The system error log should be examined for entries which
correspond to the VAXELN master job fatal error. These entries will indicate
an FRU.
FTSS$_ELNMASRECOV, VAXELN master job recoverable error detected on
[module_id] in slot [slot_id], zone [zone_id]
Facility: FTSS
Explanation: The VAXELN master job running on an I/O Expansion module
has detected a recoverable error. These errors are compared to their threshold
by the operating system. If the threshold is exceeded in a non-Simplex
system, the indicated I/O Expansion module and associated Interface modules
are removed from service.
User Action: If the threshold is not exceeded, no action is required on the
part of the user. If the threshold is exceeded, the system error log should be
examined for entries which correspond to the VAXELN master job recoverable
errors. These entries will indicate an FRU.
FTSS$_JXDDBE, Double-bit memory fault detected on [module_id] in slot
[slot_id], zone [zone_id]
Facility: FTSS
Explanation: A double-bit memory error has occurred. This indicates a solid
memory failure. In a Duplex system, a CPU module will be removed from
service when this error occurs.
User Action: The system error log should be examined for entries which
correspond to the double-bit error. These logs will indicate the SIMM memory
row which must be replaced.
FTSS$_JXDSBE, Single-bit memory fault detected on [module_id] in slot
[slot_id], zone [zone_id]
Facility: FTSS
Explanation: A recoverable single-bit memory error has been detected and
handled by the operating system. These transient errors are repaired in
memory, and the errors are compared to their error rate threshold. In a
Duplex system, a CPU module will be removed from service if the threshold
is exceeded.
User Action: In most cases, no action by the user is necessary. If the rate of
single-bit errors becomes excessive, replacement of a SIMM memory row will
be required. The system error log should be examined for the entries which
correspond to the single-bit errors.
FTSS$_POWERGONE, Power gone fault detected on zone [zone_id]
Facility: FTSS
Explanation: Power has been lost in one of the zones. This error is
compared to its error rate threshold. If the threshold is not exceeded, the
zone will be automatically resynchronized when power returns.
User Action: If power is restored and the zone is automatically
resynchronized, no action is required on the part of the user. If power is
restored and the zone is not automatically resynchronized, the user should
examine the external system power source.
FTSS$_RESYNCHFLT, Resynch abort fault detected on [module_type] in slot
[slot_id], zone [zone_id]
Facility: FTSS
Explanation: During an attempt to resynchronize a CPU/Memory module,
an error occurred on the master CPU module. This error is compared to
its error rate threshold by the operating system. If the threshold is not
exceeded, FTSS will retry the resynchronization process. When the threshold
is exceeded, attempts to resynchronize will be terminated.
User Action: If the resynchronization retry is successful, no action is
required on the part of the user. If the threshold for retries is exceeded, the
system error log should be examined for entries which correspond to the
resynch abort failure. These entries will indicate an FRU.
FTSS$_SBE_END, SBE end action complete
Facility: FTSS
Explanation: Error processing for a single-bit memory error has been
completed and the CPU is available to be resynchronized.
User Action: If the CPU is automatically resynchronized by FTSS, then no
action is needed on the part of the user. If the CPU is not resynchronized, the
system error log should be examined for entries which correspond to the
single-bit error. These error logs will identify an FRU.
FTSS$_SBE_ENDTMO, SBE end action timed out on zone [zone_id]
Facility: FTSS
Explanation: When single-bit memory errors occur in a Duplex system,
diagnostics run on the failed CPU and, on completion, report status back
to the zone running the operating system. If this end action does not occur
within a reasonable timeout period, the failure will be treated as solid and
the CPU will not be automatically resynchronized by FTSS.
User Action: The system error log should be examined for entries which
correspond to the single-bit error and the end action timeout. These entries
will indicate an FRU.
FTSS$_SOLIDIOMOD, Solid I/O fault detected on [module_type] in slot [slot_id],
zone [zone_id]
Facility: FTSS
Explanation: A fatal I/O miscompare error was detected and attributed to
the indicated module. The module is removed from service by the operating
system.
User Action: The system error log should be examined for entries which
correspond to the I/O miscompare errors. These entries will indicate an FRU.
FTSS$_SOLIDNXIO, Solid NXIO fault detected on [module_type] in slot
[slot_id], zone [zone_id]
Facility: FTSS
Explanation: A fatal nonexistent I/O error has occurred when accessing the
indicated I/O module. The module is removed from service by the operating
system.
User Action: The system error log should be examined for entries which
correspond to the nonexistent I/O error. These entries will indicate an FRU.
FTSS$_SOLIDIOXLNK, Solid I/O fault detected on the cross-link
Facility: FTSS
Explanation: A fatal I/O miscompare error was detected and attributed
to the cross-link. One zone is selected and is removed from service by the
operating system.
User Action: The system error log should be examined for entries which
correspond to the I/O miscompare errors. These entries will indicate an FRU.
FTSS$_SOLIDIOZONE, Solid I/O fault detected on zone [zone_id]
Facility: FTSS
Explanation: A fatal I/O miscompare error was detected and attributed to
the indicated zone. The zone is removed from service by the operating system.
User Action: The system error log should be examined for entries which
correspond to the I/O miscompare errors. These entries will indicate an FRU.
FTSS$_SWMODERR, Software detected failure on [module_type] in slot
[slot_id], zone [zone_id]
Facility: FTSS
Explanation: A system software component has detected the failure of a
system module. In most cases, these errors indicate the failure of an I/O
module which was detected by a device driver and not reported by a system
error interrupt. These errors indicate a fatal failure of the indicated module
and it is removed from service.
User Action: The system error log should be examined for entries which
correspond to the software detected module failure. These entries will
indicate an FRU.
FTSS$_SWZONERR, Software detected failure on zone [zone_id]
Facility: FTSS
Explanation: A system software component has detected the failure of
a zone. This error indicates a fatal failure of the indicated zone and it is
removed from service.
User Action: The system error log should be examined for entries which
correspond to the software detected zone failure. These entries will indicate
an FRU.
FTSS$_TRNSIOMOD, Transient I/O fault detected on [module_type] in slot
[slot_id], zone [zone_id]
Facility: FTSS
Explanation: A transient I/O miscompare error was detected and attributed
to the indicated module. These errors are compared to their error rate
threshold. If the threshold is exceeded and the system mode is not Simplex,
the module is removed from service.
User Action: If the threshold is not exceeded and the module is not removed
from service, no action is needed on the part of the user. If the module is
removed from service, the system error log should be examined for entries
which correspond to the I/O miscompare errors. These entries will indicate an
FRU.
FTSS$_TRNSNXIO, Transient NXIO fault detected on [module_type] in slot
[slot_id], zone [zone_id]
Facility: FTSS
Explanation: A transient nonexistent I/O error was detected when
accessing the indicated module. These errors are compared to their error
rate threshold. If the threshold is exceeded and the system mode is not
Simplex, the module is removed from service.
User Action: If the threshold is not exceeded and the module is not removed
from service, no action is needed on the part of the user. If the module is
removed from service, the system error log should be examined for entries
which correspond to the nonexistent I/O errors. These entries will indicate
an FRU.
FTSS$_TRNSIOXLNK, Transient I/O fault detected on the cross-link
Facility: FTSS
Explanation: A transient I/O miscompare error was detected and attributed
to the cross-link. These errors are compared to their error rate threshold. If
the threshold is exceeded and the system mode is not Simplex, then one zone
is removed from service.
User Action: If the threshold is not exceeded and a zone is not removed from
service, no action is needed on the part of the user. If a zone is removed from
service, the system error log should be examined for entries which correspond
to the I/O miscompare errors. These entries will indicate an FRU.
FTSS$_TRNSIOZONE, Transient I/O fault detected on zone [zone_id]
Facility: FTSS
Explanation: A transient I/O miscompare error was detected and attributed
to the indicated zone. These errors are compared to their error rate threshold.
If the threshold is exceeded and the system mode is not Simplex, the zone is
removed from service.
User Action: If the threshold is not exceeded and the zone is not removed
from service, no action is needed on the part of the user. If the zone is
removed from service, the system error log should be examined for entries
which correspond to the I/O miscompare errors. These entries will indicate an
FRU.
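The solid and transient I/O fault messages above follow one removal rule. The Python sketch below restates it for reference only. The function and parameter names are hypothetical. The solid fault descriptions do not mention the system mode, so the sketch assumes that a solid fault always leads to removal.

```python
def removed_from_service(solid: bool,
                         threshold_exceeded: bool,
                         simplex: bool) -> bool:
    """Restate the documented removal rule for I/O faults.

    Solid faults: the module, zone, or one zone (for a cross-link fault) is
    removed from service. Transient faults: removal occurs only when the
    error rate threshold is exceeded and the system mode is not Simplex.
    """
    if solid:
        # Assumption: the solid fault messages state no mode condition.
        return True
    return threshold_exceeded and not simplex
```

A transient fault that leaves the entity in service requires no user action. In every other case the system error log entries identify the FRU.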
FTSS$_ZONEHALT, Zone Halt fault detected on zone [zone_id]
Facility: FTSS
Explanation: A single zone of a Duplex system has been halted. This can be
caused by a user command on the system console or by a system error.
User Action: If the Halt was caused by a user command on the system
console, a START/ZONE command must be executed to restore the zone to
service. If the Halt was not caused by a user command, the system error log
should be examined for entries which correspond to the zone halt error. These
entries will identify an FRU.
FTSS$_ZONEUNSYNC, Zone [zone_id] is unsynchable
Facility: FTSS
Explanation: When a zone completes diagnostics with failure and reports
this status to the zone running the operating system, this message is
generated. The zone with the failure will not be automatically resynchronized
by FTSS.
User Action: The system error log should be examined for the entry which
corresponds to the unsynchable event. This entry will indicate an FRU.
4.6.2.1 Deconfiguration Messages
The following messages can be passed to OPCOM and the system error log file
by the FTSS$SERVER at the request of EHS. Each message corresponds to a
deconfiguration activity performed by EHS. Each message contains information
(through FAO arguments) that identifies the entity deconfigured by EHS.
FTSS$_DECONFIG_ATMIO, I/O expansion subsystem on I/O attachment
module in slot [slot_id], zone [zone_id] has been removed from service
Facility: FTSS
Explanation: Due to one or more system errors, the I/O expansion
subsystem on the indicated I/O ATM and its associated Interface modules
have been removed from service.
User Action: The system error log should be examined for entries which
correspond to the removal of the I/O expansion subsystem. These entries will
indicate an FRU.
FTSS$_DECONFIG_CPUMOD, CPU module in slot [slot_id], zone [zone_id] has
been removed from service
Facility: FTSS
Explanation: Due to one or more system errors, the indicated CPU module
has been removed from service. In some cases, the CPU may be automatically
resynchronized by FTSS when it successfully completes the execution of
diagnostics.
User Action: If the CPU is automatically resynchronized by FTSS after
completing diagnostics, no action is required on the part of the user. If the
CPU is not automatically resynchronized, the system error log should be
examined for entries which relate to the removal of the CPU. These entries
will indicate an FRU.
FTSS$_DECONFIG_EXMOD, I/O expansion module in slot [slot_id], zone
[zone_id] has been removed from service
Facility: FTSS
Explanation: Due to one or more system errors, the indicated I/O Expansion
module and its associated Interface modules have been removed from service.
User Action: The system error log should be examined for entries which
correspond to the removal of the I/O expansion module. These entries will
indicate an FRU.
FTSS$_DECONFIG_INTMOD, Interface module in slot [slot_id], zone [zone_id]
has been removed from service
Facility: FTSS
Explanation: Due to one or more system errors, the indicated Interface
module has been removed from service.
User Action: The system error log should be examined for entries which
correspond to the removal of the Interface module. These entries will indicate
an FRU.
FTSS$_DECONFIG_ZONE, Zone [zone_id] has been removed from service
Facility: FTSS
Explanation: Due to one or more system errors, the indicated zone has
been removed from service. In some cases, the zone may be automatically
resynchronized by FTSS when it successfully completes the execution of
diagnostics.
User Action: If the zone is automatically resynchronized by FTSS after
completing diagnostics, no action is required on the part of the user. If the
zone is not automatically resynchronized, the system error log should be
examined for entries which relate to the removal of the zone. These entries
will indicate an FRU.
4.7 Firmware Interfaces
The EHS interacts with three firmware-based software entities: system console
and diagnostics, I/O expansion module console and diagnostics, and the I/O
expansion module VAXELN software. The system console and diagnostics and
I/O expansion module console and diagnostics interfaces are discussed in the
following sections.
4.7.1 System Console and Diagnostics
The EHS communicates with the system console through:
•
System hardware resets combined with flags in the console communications
area (CCA)
•
CCA fields referenced using the IZC service
4.7.1.1 System Resets
When the EHS determines that a zone or CPU should be removed from the
configuration, it forces a reset on the CPU. The reset results in the system console
being invoked from serial ROM by the hardware. When the system console runs, it
attempts to determine the reason for the reset, which in turn may determine
the actions performed by the console. The EHS uses the fields in the CCA reset
dispatch block (at offset CCA560$R_RESET_BLOCK) to pass reset reason codes
to the console. The fields are:
RDB$L_RESET_CODE - The reset reason code. This longword field is actually
composed of two one-word fields:
•
RDB$W_ACTION - The reset action. This word instructs the console on the
action that needs to be taken. The reset action codes used by the EHS are
described in Table 4–26.
•
RDB$W_REASON - The reset reason. This field is additional data supplied
by the OpenVMS operating system which indicates the reason for the reset.
The code is printed in hex on the operator console after the reset action
is completed. The reset reason codes used by the EHS are described in
Table 4–27.
RDB$L_REASON_VALID - The 1’s complement of the reset reason code longword.
RDB$L_DISPATCH - This field is used only if the system console is to continue
the OpenVMS operating system after completing reset actions. In all reset cases
by the EHS, it will be 0.
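The relationship between RDB$L_RESET_CODE and RDB$L_REASON_VALID can be sketched in a few lines of Python. This is an illustration only, not EHS code: the function name is hypothetical, and placing the action word in the low-order 16 bits (with the reason word above it) is an assumption based on the dispatch block layout in Table 4–33.

```python
def build_reset_code(action: int, reason: int) -> tuple:
    """Return (RDB$L_RESET_CODE, RDB$L_REASON_VALID) as 32-bit values."""
    if not (0 <= action <= 0xFFFF and 0 <= reason <= 0xFFFF):
        raise ValueError("action and reason must each fit in one 16-bit word")
    # Assumption: action word in bits 15:00, reason word in bits 31:16.
    reset_code = (reason << 16) | action
    # The validity field is the 1's complement of the whole longword.
    reason_valid = ~reset_code & 0xFFFFFFFF
    return reset_code, reason_valid


# Example: action 4 (run diagnostics) with reason 5 (CPU module failure).
code, valid = build_reset_code(action=4, reason=5)
print(f"{code:08X} {valid:08X}")
```

A reader of the block can check consistency by confirming that the two longwords have no bits in common and together cover all 32 bits.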
Table 4–26 System Reset Action Codes
Decimal Value   Description
1               This code will cause the system console to enter its halt loop,
                which will establish IZC to the other zone, without invoking any
                diagnostics. Currently, this reset action is requested only when
                the EHS is handling a single-bit error.
4               This code will cause the system console to invoke diagnostics.
                The diagnostics which run depend on the cross-link mode at the
                time. Following diagnostics, the system console will enter its
                halt loop, and establish IZC to the other zone. The code is used
                when a zone or CPU is being removed due to a system error.
6               This code will cause the same actions as CPURESET$K_DIAGS. This
                code is used when a zone is being removed by operator action
                (that is, a user command).
Table 4–27 System Reset Reason Codes
Decimal Value   Description
1               When the EHS detects zone divergence, it selects one zone to
                continue the OpenVMS operating system and one zone to stop. Note
                that the OpenVMS operating system is not indicating an error in
                this zone; it must stop one of the two.
2               When the EHS isolates a failure to the cross-link cable (for
                example, a cable gone error), it will reset one zone using this
                reason type.
3               When the EHS detects a fault in a zone that cannot be isolated to
                a single module, it will reset the zone with this reason type.
                Usually, such errors are the result of backplane failures.
4               The OpenVMS operating system will use this reset with an I/O ATM
                module failure. Before this reset, the operating system will
                write an error code to the module ID EEPROM through the I2C bus.
5               The OpenVMS operating system will use this to reset a CPU module
                after determining that it has failed. Before the reset, the
                OpenVMS operating system will write an error code to the module
                ID EEPROM through the I2C bus.
6               The OpenVMS operating system will use this to reset a CPU module
                after determining that its memory has failed.
7               An SBE was detected by the CPU in Duplex mode. CPU lockstep
                between zones is lost on this event and it should be
                reestablished as soon as possible. This code is used in
                conjunction with the CPURESET$K_NO_DIAGS reset action code.
8               This code is used as a result of a user-issued command to remove
                a zone from service.
9               A fatal system machine check error has occurred.
10              A system software component detected a failure of this module.
Table 4–28 lists the events which might cause the EHS to issue the reset, and the
cross-link modes under which the reset might be issued.
Table 4–28 Error Handler Reset Reasons
Event                       Possible Cross-Link Modes
Double-Bit Error            OFF, MASTER
Single-Bit Error            SLAVE
Cross-Link Cable Failure    OFF
Clock Phase Errors          OFF
I/O Errors                  OFF, MASTER, SLAVE
Zone Divergence             OFF
User Command                OFF, SLAVE
4.7.1.2 CCA Fields
When a CPU or zone completes diagnostics, it enters its halt loop, which reports
its status to the OpenVMS operating system in the other zone through the IZC
service. The IZC service will in turn call the OpenVMS operating system to report
the availability of the other zone. The operating system requires the following
information to be available from the console in the other zone:
•
The IZC message to the operating system will contain a synchability status.
If the status is unsynchable, the OpenVMS operating system will examine
the CCA in the console zone. The field CCA560$L_COMPAT_STATUS will
contain a reason mask which describes the reasons that the zone is not
synchable. This information will be entered into the system error log.
If the reason mask indicates a diagnostic failure, the CCA560$Q_DIAG_
STATUS field will contain additional information on the failure. The EHS
will use the IZC service to read this information for entry into the system
error log.
•
The EHS uses the IZC service to read system register information from
the CCA of the other zone starting at offset CCA560$R_REG_BLOCK. The
registers in this block were written by the EHS when the original error
occurred. However, the console must preserve this area through all resets and
during diagnostic execution, whenever possible (some catastrophic failures
will prevent this from working).
4.7.2 I/O Expansion Module Console and Diagnostics
When the EHS determines that an I/O expansion module should be removed from
the configuration, it forces an I/O hard reset on the modules. This results in the
I/O expansion module console being invoked by hardware. When the console runs,
it attempts to determine the reason for the reset, which in turn may determine
the actions performed by the diagnostics. The EHS uses two fields in the NCA
reset dispatch block (at offset NCA560$L_RESET_BLOCK) to pass reset reason
codes to the diagnostics. The fields are:
RDB$L_RESET_CODE - The reset reason code. This longword field is actually
composed of two 1-word fields:
•
RDB$W_ACTION - The system reset action. This word instructs the console
on the action that needs to be taken. The only reset action code used by the
EHS is shown in Table 4–29.
•
RDB$W_REASON - The reset reason. This field is additional data supplied
by the operating system which indicates the reason for the reset. The reset
reason codes used by the EHS are shown in Table 4–30.
RDB$L_REASON_VALID - The 1’s complement of the reset reason code longword.
RDB$L_DISPATCH - This field is used only if the console is to continue the
operating system after completing reset actions. In all cases of I/O resets by the
EHS, it will be 0.
Table 4–29 I/O Reset Action Code Description
Decimal Value   Description
6               This reset code will cause the I/O expansion module console to
                invoke diagnostics. The diagnostics which run depend upon the
                mode of the cross-link at the time. After diagnostics, the
                console will enter its halt loop.
Table 4–30 I/O Reset Reason Code Descriptions
Decimal Value   Description
11              The module has experienced a solid NXIO error.
12              The module has experienced excessive transient NXIO errors.
13              The module has experienced a solid I/O miscompare error.
14              The module has experienced excessive transient I/O miscompare
                errors.
15              The module has experienced excessive VAXELN kernel recoverable
                errors.
16              The module has experienced a VAXELN master fatal error.
4.8 Firmware and OpenVMS Interface Data Structures
Figure 4–13 shows the OpenVMS operating system and firmware data structure
memory map. The following sections describe the data structures used by the
console:
•
Console Communication Area (CCA)
•
Device Configuration Block (DCB)
•
Page Frame Number Bitmap (PFN)
The firmware constructs, initializes, and shares the data structures with the
OpenVMS operating system.
Figure 4–13 Firmware and OpenVMS Data Structure Memory Map
Page Frame Number (PFN) Bitmap
Zone A Sub−Device Configuration Block (SubDCB)
Zone A Device Configuration Block (DCB)
Zone B Sub−Device Configuration Block (SubDCB)
Zone B Device Configuration Block (DCB)
Console Communications Area (CCA)
Remainder of Main Memory
MR−0019−93RAGS
4.8.1 Console Communications Area
The console communications area (CCA) is the main data structure used by the
console to interface with the OpenVMS operating system. Table 4–31 describes
the CCA components.
Table 4–31 CCA Component Descriptions
Parameter
Size
Description
CCA size
2 bytes
Size of the CCA in bytes. Initialized by firmware.
CCA
revision
1 byte
Revision of the CCA. Initialized by firmware.
CCA base
4 bytes
Physical address of the CCA. Initialized by firmware.
Header
flags
4 bytes
CCA flags. Field breakdown by bit:
•
00 = Bootstrap in progress. Set by firmware when
bootstrap operation is started. Cleared by the OpenVMS
operating system. Used to control the bootstrap
operation.
•
01 = Restart in progress. Set by firmware when restart
operation is started. Cleared by the OpenVMS operating
system. Used to control the restart operation.
•
02 = Automatic bootstrap. Set by firmware when an
automatic bootstrap occurred.
•
03 = Reboot in progress. Set by the OpenVMS operating
system when a bootstrap operation is requested by the
operating system using the default boot specification.
•
04 = Failsafe mode. Set by firmware to indicate that the
zone is in Failsafe mode. (Failsafe mode refers to the
method used for bootstrapping.)
•
05 = Synchable status. Set by firmware to indicate that
the zone is synchable (Duplex compatibility test passed).
If bit is clear, test failed. Use the Duplex compatibility
test results component to obtain the reason for failure.
•
06 = Halted from bootstrap. Set by VMB to indicate to
the firmware that it is not to report a bootstrap error.
This bit overrides the state of the bootstrap in progress
bit 0 with respect to handling errors during the bootstrap
operation.
•
[31:07] = Reserved for firmware use.
Bootability
test results
4 bytes
Results of the bootstrap test. Written by the firmware. Field
breakdown by bit:
•
00 = CPU/ATM check. Set when the CPU and ATM are
good.
•
01 = Cable state. Set when cables are present and good.
•
02 = Other zone power state. Set when the power is on in
the other zone.
•
03 = Other zone OpenVMS operating system state. Set
when the other zone is running the OpenVMS operating
system.
•
04 = Other zone CPU/ATM check. Set when the CPU and
ATM in the other zone are good.
•
[31:05] = Reserved for firmware use.
PFN
bitmap
address
4 bytes
Physical address of the PFN bitmap. Initialized by firmware.
PFN
bitmap
size
4 bytes
Size of the PFN bitmap in bytes. Initialized by firmware.
PFN
bitmap
checksum
4 bytes
Checksum of the PFN bitmap. Checksum = integer sum of all
bytes in the PFN bitmap.
System
serial
number
12 bytes
System serial number. 12 ASCII characters. Initialized by
firmware. Copied from the CPU module data EEPROM.
Zone A
DCB offset
4 bytes
Offset to the Zone A DCB. Offset is the byte offset (signed)
from the CCA base. Initialized by firmware.
Zone A
DCB size
4 bytes
Size in bytes of the DCB for Zone A. The size includes the
DCB and any SubDCBs for Zone A. Initialized by firmware.
Zone B
DCB offset
4 bytes
Offset to the Zone B DCB. Offset is the byte offset (signed)
from the CCA base. Initialized by firmware.
Zone B
DCB size
4 bytes
Size in bytes of the DCB for Zone B. The size includes the
DCB and any SubDCBs for Zone B. Initialized by firmware.
Diagnostic
status
8 bytes
Results of the diagnostic tests. Initialized by firmware.
Breakdown of the status fields:
•
[07:00] = Error number
•
[15:08] = Subtest number
•
[23:16] = Test number
•
[27:24] = Group number
•
[30:28] = Diagnostic flags. For firmware use only.
•
31 = Set when bits 27:00 indicate a valid failure code.
The high-order four bytes are reserved for firmware.
Duplex
compatibility test
results
4 bytes
Results of the compatibility test. Written by firmware. See
Section 4.8.1.1 for the test descriptions and fault codes.
Reset
dispatch
block
16 bytes
Used by firmware and the OpenVMS operating system to
notify the firmware how to handle a reset entry to firmware.
See Section 4.8.1.2 for dispatch block description.
Boot
parameter
table
164 bytes
Boot parameter table. Initialized by firmware. See
Section 4.8.1.3 for the description.
Saved
register
block
132 bytes
Register block saved by the OpenVMS operating system on
a CPU/MEM fault. Initialized and used by the operating
system.
Reserved
64 bytes
Reserved for future expansion.
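The bit layouts in Table 4–31 can be unpacked with simple shifts and masks. The Python sketch below is illustrative only: the function and constant names are hypothetical, the values are assumed to have already been read from the CCA as unsigned integers, and the test number is assumed to occupy bits 23:16 (the range between the subtest number and the group number).

```python
# Names for header flag bits 00 through 06, in bit order (from Table 4-31).
HEADER_FLAG_NAMES = [
    "Bootstrap in progress",
    "Restart in progress",
    "Automatic bootstrap",
    "Reboot in progress",
    "Failsafe mode",
    "Synchable status",
    "Halted from bootstrap",
]


def header_flags_set(flags: int) -> list:
    """List the names of the header flag bits that are set."""
    return [name for bit, name in enumerate(HEADER_FLAG_NAMES)
            if (flags >> bit) & 1]


def decode_diag_status(low_longword: int) -> dict:
    """Split the low-order longword of the diagnostic status field."""
    return {
        "error": low_longword & 0xFF,            # bits 07:00
        "subtest": (low_longword >> 8) & 0xFF,   # bits 15:08
        "test": (low_longword >> 16) & 0xFF,     # bits 23:16 (assumed)
        "group": (low_longword >> 24) & 0xF,     # bits 27:24
        "flags": (low_longword >> 28) & 0x7,     # bits 30:28, firmware use
        "valid": bool((low_longword >> 31) & 1), # bit 31
    }


print(header_flags_set(0x21))
print(decode_diag_status(0x83120407))
```

The failure code fields are meaningful only when the "valid" bit is set, so a caller would normally test that bit before reporting the group, test, subtest, and error numbers.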
4.8.1.1 Duplex Compatibility Test
On firmware entry, the console program verifies a number of conditions that are
required for system operation in Duplex mode. These conditions determine if the
zone is synchable, that is, able to join a partner zone in Duplex operation.
The IZC protocol is used by the console program to execute the Duplex
compatibility test. Once the console establishes the IZC service, it executes
the test and notifies the other zone of the results. A zone is considered synchable
if it passes the test.
The compatibility test is responsible for storing the results in the CCA. The
following items are test parameters.
•
Diagnostic status:
CPU self-test passes
CPU zone test passes
Primary I/O expansion module self-test passes
CPU system test does not fail (not run assumes a passed condition)
•
Zone identification:
One Zone A, one Zone B.
•
CPU module ID EEPROM:
Valid checksum
OpenVMS and firmware status byte is good
Module ID and module name compatible with other zone
Module hardware revision compatible with other zone (major)
Firmware and software revisions compatible with other zone (major)
•
I/O ATM module ID EEPROM:
Valid checksum
OpenVMS and firmware status byte is good
Module ID and module name compatible with other zone
Module hardware revision compatible with other zone (major)
Firmware and software revisions compatible with other zone (major)
•
CPU module data EEPROM:
Valid checksum
System data area must be the same in both zones
•
Memory restrictions for synchronization:
Same memory configuration on both zones
•
Cross-link and resynch cables functional
•
Operational modes must be compatible (that is, burnin state)
•
Ability of the CPU console firmware to run in cross-link in Duplex mode
Table 4–32 lists the test failure codes. Each bit represents the results of checking
the given condition. The test will attempt to check all conditions, and updates the
bits as it performs the test (set bit indicates failure).
Table 4–32 Duplex Compatibility Test Failure Codes
Failure Code
Bit Number     Code Description
00             CPU self-test failed
01             CPU zone test failed
02             CPU system test failed
03             ATM self-test failed
04             Both zones have the same zone ID
05             CPU ID EEPROM is bad
06             CPU ID EEPROM OpenVMS status field shows module is bad
07             CPU ID EEPROM firmware status field shows module is bad
08             CPU ID EEPROM module type field mismatches between zones
09             CPU ID EEPROM module name field mismatches between zones
10             CPU ID EEPROM hardware revision (major) mismatches between zones
11             CPU ID EEPROM firmware revision (major) mismatches between zones
12             CPU ID EEPROM software revision (major) mismatches between zones
13             ATM ID EEPROM is bad
14             ATM ID EEPROM OpenVMS status field shows module is bad
15             ATM ID EEPROM firmware status field shows module is bad
16             ATM ID EEPROM module type field mismatches between zones
17             ATM ID EEPROM module name field mismatches between zones
18             ATM ID EEPROM hardware revision (major) mismatches between zones
19             ATM ID EEPROM firmware revision (major) mismatches between zones
20             ATM ID EEPROM software revision (major) mismatches between zones
21             CPU data EEPROM is bad
22             CPU data EEPROM system wide area mismatches between zones
23             CPU/memory configuration mismatches between zones
24             Cables (cross-link and/or resynch) are not functional
25             CPU is in burnin state
26             Ethernet EEPROM address mismatches between zones
27             CPU console firmware cannot be synchable (cannot run in Duplex
               mode)
[31:28]        Reserved for future use
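Because each bit of the Duplex compatibility test results reports one failed condition, a reason mask can be translated into readable text by walking the bits. The following Python sketch is an illustration under stated assumptions: the names are hypothetical, the descriptions are copied from Table 4–32, and the mask is assumed to have already been read from the CCA as an unsigned 32-bit value.

```python
# Descriptions for failure code bits 00 through 27, in bit order (Table 4-32).
FAILURE_CODES = [
    "CPU self-test failed",
    "CPU zone test failed",
    "CPU system test failed",
    "ATM self-test failed",
    "Both zones have the same zone ID",
    "CPU ID EEPROM is bad",
    "CPU ID EEPROM OpenVMS status field shows module is bad",
    "CPU ID EEPROM firmware status field shows module is bad",
    "CPU ID EEPROM module type field mismatches between zones",
    "CPU ID EEPROM module name field mismatches between zones",
    "CPU ID EEPROM hardware revision (major) mismatches between zones",
    "CPU ID EEPROM firmware revision (major) mismatches between zones",
    "CPU ID EEPROM software revision (major) mismatches between zones",
    "ATM ID EEPROM is bad",
    "ATM ID EEPROM OpenVMS status field shows module is bad",
    "ATM ID EEPROM firmware status field shows module is bad",
    "ATM ID EEPROM module type field mismatches between zones",
    "ATM ID EEPROM module name field mismatches between zones",
    "ATM ID EEPROM hardware revision (major) mismatches between zones",
    "ATM ID EEPROM firmware revision (major) mismatches between zones",
    "ATM ID EEPROM software revision (major) mismatches between zones",
    "CPU data EEPROM is bad",
    "CPU data EEPROM system wide area mismatches between zones",
    "CPU/memory configuration mismatches between zones",
    "Cables (cross-link and/or resynch) are not functional",
    "CPU is in burnin state",
    "Ethernet EEPROM address mismatches between zones",
    "CPU console firmware cannot be synchable (cannot run in Duplex mode)",
]


def failed_checks(mask: int) -> list:
    """Return (bit number, description) for every failure bit set in mask."""
    return [(bit, text) for bit, text in enumerate(FAILURE_CODES)
            if mask & (1 << bit)]


# Example: duplicate zone IDs (bit 04) and non-functional cables (bit 24).
for bit, text in failed_checks((1 << 4) | (1 << 24)):
    print(f"bit {bit:02d}: {text}")
```

Bits 31:28 are reserved, so the sketch ignores them rather than reporting them.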
4.8.1.2 Dispatch Block Description
The firmware validates a reset entry using a dispatch block, located in memory,
to determine the next operation. Figure 4–14 shows the dispatch block structure.
Table 4–33 describes the block components.
Figure 4–14 Dispatch Block Structure
Base + 00
Dispatch Reason Code
Base + 04
Dispatch Address
Base + 0C
Dispatch Reason Complement
MR−0018−93RAGS
Table 4–33 Dispatch Block Components
Block Content
Offset
Description
Dispatch reason
code
Base + 00h
4 bytes
Code identifying reset reason. Bytes 03:02
identify the reason for the reset.
Bytes 01:00 identify the end action to be taken by
the console as specified below:
•
00 = POWERUP. Default or unexpected reset.
Run diagnostics and halt (enter the console).
•
01 = NO_DIAGS. Halt (enter the console).
•
02 = DISPATCH. Dispatch requested. Jump
to the dispatch address.
•
03 = RESYNCH. Resynch reset. Jump to the
dispatch address.
•
04 = DIAGS. Run diagnostics and halt (enter
the console).
•
05 = STOP_ZONE. OpenVMS issued a STOP_
ZONE. Run diagnostics and halt (enter the
console).
•
06 = RECONFIG. Reconfigure firmware (for
firmware use only).
Dispatch address
Base + 04h
8 bytes
Physical address where console will jump. In the
Model 810, only the first 4 bytes are used. Upper
4 bytes must be 0.
Dispatch reason
complement
Base + 0Ch
4 bytes
The 1’s complement of the dispatch reason code.
Used for checking the dispatch block validity.
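The validity check described in Table 4–33 can be sketched as follows. This Python fragment is illustrative, not the firmware's implementation: the function name is hypothetical, the 16-byte block is assumed to be available as raw bytes, and little-endian byte order is assumed because the VAX is a little-endian architecture.

```python
import struct

# End action codes carried in bytes 01:00 of the dispatch reason code.
END_ACTIONS = {
    0: "POWERUP", 1: "NO_DIAGS", 2: "DISPATCH", 3: "RESYNCH",
    4: "DIAGS", 5: "STOP_ZONE", 6: "RECONFIG",
}


def parse_dispatch_block(block: bytes) -> dict:
    """Decode a 16-byte dispatch block and check its complement field."""
    if len(block) != 16:
        raise ValueError("dispatch block must be exactly 16 bytes")
    # Longword reason code, quadword dispatch address, longword complement.
    code, address, complement = struct.unpack("<IQI", block)
    return {
        # The block is valid when the complement field is the 1's
        # complement of the dispatch reason code.
        "valid": complement == (~code & 0xFFFFFFFF),
        "end_action": END_ACTIONS.get(code & 0xFFFF, "UNKNOWN"),
        "reason": code >> 16,
        "address": address,
    }


# Example: end action 05 (STOP_ZONE) with reset reason 8, no dispatch address.
example = struct.pack("<IQI", 0x00080005, 0, 0xFFF7FFFA)
print(parse_dispatch_block(example))
```

How the firmware reacts to an invalid block is not modeled here; the sketch only reports whether the complement matches.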
4.8.1.3 Boot Parameter Block Description
The boot parameter block (BPB) is a structure built by firmware to reflect the
primary bootstrap code (VMB) of the boot device that is used during the bootstrap
sequence. Table 4–34 describes the BPB components. Table 4–35 describes the
entry components in the BPB structure.
Table 4–34 BPB Components
Component
Length
Description
Number of
entries
4 bytes
Number of entries in the BPB. Written by firmware. Is 0 if
no entries are present.
BPB entries
5 bytes
per entry
An entry describes a boot path. Written by firmware.
Maximum number of entries is 32. (See Table 4–35 for
entry description.)
Table 4–35 BPB Entry Components
Component
Length
Description
Unit number
2 bytes
Device unit number. Valid numbers are in the 0 to 999
(decimal) range.
Device
2 bytes
Device name in ASCII (for example, EP and DI).
Path identifier
1 byte
Path to device. Field breakdown is:
•
[06:00] = Slot number of the adapter module in the 10
to 17 (hex) and 20 to 27 (hex) range.
•
07 = Zone identification of the adapter module: 0 =
Zone A, 1 = Zone B.
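The BPB layout in Tables 4–34 and 4–35 (a 4-byte entry count followed by 5-byte entries) can be walked as shown in this Python sketch. It is an illustration only: the function names are hypothetical, and the sketch assumes that the fields appear in the order listed in Table 4–35 and that multi-byte integers are little-endian.

```python
import struct


def parse_bpb_entry(entry: bytes) -> dict:
    """Decode one 5-byte BPB entry: unit number, device name, path identifier."""
    if len(entry) != 5:
        raise ValueError("a BPB entry is 5 bytes long")
    unit, device, path = struct.unpack("<H2sB", entry)
    return {
        "unit": unit,
        "device": device.decode("ascii"),
        "slot": path & 0x7F,                   # bits 06:00, adapter slot
        "zone": "B" if path & 0x80 else "A",   # bit 07, 0 = Zone A, 1 = Zone B
    }


def parse_bpb(table: bytes) -> list:
    """Decode the entry count and every entry that follows it."""
    (count,) = struct.unpack_from("<I", table, 0)
    if count > 32:
        raise ValueError("a BPB holds at most 32 entries")
    return [parse_bpb_entry(table[4 + 5 * i: 9 + 5 * i]) for i in range(count)]


# Example: one boot path, device DI unit 3, adapter in slot 11 (hex), Zone B.
table = struct.pack("<I", 1) + struct.pack("<H2sB", 3, b"DI", 0x80 | 0x11)
print(parse_bpb(table))
```

The maximum of 32 entries gives 4 + (32 x 5) = 164 bytes, which matches the size of the boot parameter table listed in Table 4–31.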
4.8.2 Device Configuration Block
The device configuration block (DCB) reflects the configuration of the available
modules in the system. There is a DCB in each zone. The DCB is built by
firmware during the power up sequence and updated each time INIT and BOOT
are executed. The OpenVMS operating system uses the DCB to configure the
system. Table 4–36 describes the DCB components. Table 4–37 describes the
DCB entry components.
Table 4–36 DCB Components
Component
Length
Description
Number of
entries
4 bytes
Number of entries in the DCB. Initialized by firmware. Is 0
if no entries are present.
DCB entries
168 bytes
per entry
An entry describes a module found by the firmware.
Initialized by firmware. Maximum number of entries is
eight. (See Table 4–37 for entry description.)
Table 4–37 DCB Entry Components
Component
Length
Description
Slot number
1 byte
Physical slot number of the module. Valid slot numbers are:
0 to 2 for CPU and I/O ATM modules
0 to 7 for interface modules attached to the I/O ATM
Module type
1 byte
Code identifying the module. Module types are copied from
the module ID EEPROM. Valid module types are:
1 = Not used
2 = SWIFT adapter card
3 = I/O ATM module
4 = DSF module
5 = CPU module
6 = LANCE adapter card
7 = Not used
8 = FDDI adapter card
F = Unknown module
Status
summary
1 byte
Module status summary. This field is a summary of the
OpenVMS and firmware status fields. The field should be
updated whenever OpenVMS or firmware status fields are
updated. Codes are initially copied from the module ID
EEPROM. Valid codes (in hex) are:
A5 = Module is good.
B4 = Module is bad, marked by OpenVMS. See
OpenVMS status field.
C3 = Module is bad, marked by firmware. See firmware
status field.
FF = Module is bad, marked by OpenVMS and
firmware.
OpenVMS
status
1 byte
Module status as marked by OpenVMS (and maintained by
OpenVMS). Codes are initially copied from the module ID
EEPROM. Valid codes (in hex) are:
A5 = module is good.
non A5 = module is bad.
Firmware
status
1 byte
Module status as marked by firmware (and maintained by
firmware). Codes are initially copied from the module ID
EEPROM. Valid codes (in hex) are:
A5 = Module is good.
non A5 = Module is bad.
Module name
4 bytes
ASCII module name. Copied from the module ID EEPROM.
Module serial
number
12 bytes
Module serial in ASCII. Copied from the module ID
EEPROM.
Hardware
revision
6 bytes
Identifies the module hardware revision. Copied from the
module ID EEPROM. Divided in:
Minor revision (bytes 02:00)
Major revision (bytes 05:03)
Firmware
revision
2 bytes
Console/diagnostic firmware revision of the module. Copied
from the module ID EEPROM. Divided in:
Minor revision (byte 00)
Major revision (byte 01)
Software
revision
2 bytes
Functional firmware revision of the module. Copied from
the module ID EEPROM. Divided in:
Minor revision (byte 00)
Major revision (byte 01)
Ethernet
address
32 bytes
Module Ethernet address. Follows the DEC STD format.
Valid only for CPU module and LANCE adapter card.
Copied from the Ethernet EEPROM by firmware for the
CPU. Copied from the LANCE ROM for the LANCE adapter
card.
Extended data
32 bytes
Module-specific data. The field is copied by firmware from
the functional firmware ROM.
Memory size
4 bytes
Size of the module’s memory in 512 byte segments.
For CPU refers to the size of main memory.
For I/O ATM refers to the size of local (SOC) memory.
For interface modules refers to the size of buffer RAM.
SubDCB
4 bytes
Offset to the module SubDCB (Sub-Device Configuration
Block). Offset is the byte offset (signed) from the base of the
DCB. Is 0 if no SubDCB available.
Reserved
64 bytes
Reserved for future use.
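The status summary byte in Table 4–37 is a function of the OpenVMS and firmware status bytes. A minimal Python sketch of that relationship follows; the function name is hypothetical, and the code simply applies the codes listed in the table.

```python
GOOD = 0xA5  # "module is good" code used by all three status fields


def status_summary(openvms_status: int, firmware_status: int) -> int:
    """Derive the status summary code from the two status bytes."""
    vms_bad = openvms_status != GOOD   # any value other than A5 means bad
    fw_bad = firmware_status != GOOD
    if vms_bad and fw_bad:
        return 0xFF  # bad, marked by OpenVMS and firmware
    if vms_bad:
        return 0xB4  # bad, marked by OpenVMS
    if fw_bad:
        return 0xC3  # bad, marked by firmware
    return GOOD      # module is good


print(f"{status_summary(0xA5, 0x00):02X}")
```

Recomputing the summary this way whenever either status field changes keeps the three bytes consistent, as the table requires.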
4.8.2.1 Sub-Device Configuration Blocks
The SubDCBs reflect the configuration of the interface or memory modules
attached to a module. SubDCBs may be available for the CPU and I/O ATM
modules. The SubDCB is built by firmware during the power up sequence and
updated each time INIT and BOOT are executed.
A SubDCB is present when there are interface modules attached to a given
module and its existence is represented in that module’s DCB entry. When the
SubDCB offset field on a DCB entry is nonzero, the value is used to calculate the
location of its SubDCB block. If the SubDCB offset field on a DCB entry is zero,
there is no SubDCB block present (that is, no interface modules are attached to
that module).
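The offset rule above can be sketched in Python. The function name and the example addresses are hypothetical, and the 4-byte offset field is assumed to be a little-endian signed longword.

```python
import struct


def subdcb_address(dcb_base: int, offset_field: bytes):
    """Return the SubDCB address for a DCB entry, or None if there is none."""
    # The SubDCB field is a signed byte offset from the base of the DCB.
    (offset,) = struct.unpack("<i", offset_field)
    if offset == 0:
        return None  # no SubDCB: no interface modules attached
    return dcb_base + offset


# Example with made-up addresses: the SubDCB sits 0x200 bytes below the DCB.
print(hex(subdcb_address(0x1000, struct.pack("<i", -0x200))))
```

The same arithmetic applies one level up, where the Zone A and Zone B DCB offsets in the CCA are signed byte offsets from the CCA base.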
The format of a SubDCB is the same as that of the DCB block: a field containing
the number of entries, followed by entries in the same format as a DCB entry
(except for the CPU module SubDCB). Figure 4–15 shows how the SubDCBs are
linked to the DCB.
Figure 4–15 SubDCB Links to DCB
[Figure MR−0020−93RAGS: The CCA holds the Zone A DCB offset and the Zone B DCB
offset; each DCB is located at CCA base + offset. The Zone A DCB consists of a
number-of-entries field followed by DCB entries 1 to n. The SubDCB for a DCB
entry is located at DCB base + offset and consists of a number-of-entries field
followed by DCB entries 1 to n.]
4.8.2.2 CPU Module SubDCB
The CPU SubDCB is used to represent the memory modules (MMBs) available on
the CPU module. Table 4–38 describes the CPU SubDCB components. Table 4–39
describes the CPU SubDCB entry components.
Table 4–38 CPU SubDCB Components
Component
Length
Description
Number of
entries
4 bytes
Number of entries in the SubDCB. Initialized by firmware.
Is 0 if no entries are present.
SubDCB
entries
16 bytes
per entry
An entry describes an MMB found by the firmware.
Initialized by firmware. Maximum number of entries is
four.
Table 4–39 CPU SubDCB Entry Components
Component
Length
Description
SIMM block
16 bytes
MMB SIMM description. This field is an array of eight
elements (SIMM0 to SIMM7). Each element is 2 bytes in
size and contains:
Byte 00 - SIMM size in Mbytes.
Byte 01 - SIMM status. Values for SIMM status (in
hex) are:
A5 = SIMM is good.
B4 = SIMM is broken.
C3 = SIMM is absent.
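A 16-byte SIMM block can be decoded element by element, as in the Python sketch below. The names are hypothetical, and counting only SIMMs marked good toward usable memory is an assumption made for the example.

```python
SIMM_STATUS = {0xA5: "good", 0xB4: "broken", 0xC3: "absent"}


def parse_simm_block(block: bytes) -> list:
    """Return (SIMM index, size in Mbytes, status) for SIMM0 through SIMM7."""
    if len(block) != 16:
        raise ValueError("a SIMM block is 16 bytes (eight 2-byte elements)")
    simms = []
    for i in range(8):
        size_mb = block[2 * i]        # byte 00 of the element
        status = block[2 * i + 1]     # byte 01 of the element
        simms.append((i, size_mb, SIMM_STATUS.get(status, "unknown")))
    return simms


def good_mbytes(block: bytes) -> int:
    """Total size of the SIMMs whose status is good."""
    return sum(size for _, size, status in parse_simm_block(block)
               if status == "good")


# Example: two good 4-Mbyte SIMMs, one broken SIMM, five empty positions.
block = bytes([4, 0xA5] * 2 + [4, 0xB4] + [0, 0xC3] * 5)
print(parse_simm_block(block))
print(good_mbytes(block))
```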
4.8.3 Page Frame Number Bitmap
The page frame number (PFN) bitmap is a data structure that indicates which
pages in memory are considered usable by the OpenVMS operating system. The
bitmap is built by diagnostics as a side effect of the memory tests run during the
power up sequence.
The bitmap starts on a page boundary and resides at the top of memory. The
bitmap requires 1 Kbyte for each 4 Mbytes of main memory, that is:
•
A 32-Mbyte system requires an 8-Kbyte bitmap
•
A 512-Mbyte system requires a 128-Kbyte bitmap
The bitmap does not map itself or anything above it. There may be memory
above the bitmap which has good and bad pages.
Each bit in the PFN bitmap corresponds to a page in main memory. There is
a one-to-one correspondence between a page frame number (origin 0) and a bit
index in the bitmap. A 1 in the bitmap indicates that the page is good and can
be used. A 0 indicates that the page is bad and should not be used. By default,
a page is flagged bad if a multiple bit error occurs when referencing the page.
Single-bit errors, regardless of frequency, will not cause a page to be flagged bad.
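The sizing rule and the page lookup can be checked with a short Python sketch. The names are hypothetical; the sketch uses the 512-byte VAX page, which is consistent with 1 Kbyte of bitmap for each 4 Mbytes of memory. Two points are assumptions rather than statements from this manual: that bit 0 of each byte maps the lowest-numbered page of that byte, and that the checksum is kept as a 32-bit value.

```python
PAGE_SIZE = 512  # bytes per VAX page


def bitmap_size_bytes(memory_bytes: int) -> int:
    """Bytes of bitmap needed to map memory_bytes of main memory."""
    pages = memory_bytes // PAGE_SIZE
    return (pages + 7) // 8  # one bit per page


def page_is_good(bitmap: bytes, pfn: int) -> bool:
    """True if the bit for page frame number pfn is 1 (page usable)."""
    # Assumption: within a byte, bit 0 corresponds to the lowest PFN.
    return bool((bitmap[pfn >> 3] >> (pfn & 7)) & 1)


def bitmap_checksum(bitmap: bytes) -> int:
    """Integer sum of all bitmap bytes, truncated to 32 bits (assumed width)."""
    return sum(bitmap) & 0xFFFFFFFF


MBYTE = 1024 * 1024
print(bitmap_size_bytes(32 * MBYTE) // 1024, "Kbytes")
print(bitmap_size_bytes(512 * MBYTE) // 1024, "Kbytes")
```

The two printed sizes reproduce the 32-Mbyte and 512-Mbyte examples given above.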
4.9 Error Log Analysis
4.9.1 CPU/MEM Fault Error Log Entry
V A X / V M S        SYSTEM ERROR REPORT        COMPILED  3-FEB-1993 09:33:44
                                                                     PAGE  40.
******************************* ENTRY     686. *******************************
ERROR SEQUENCE 1033.                            LOGGED ON:       SID 17000002
DATE/TIME  2-FEB-1993 18:15:45.55                           SYS_TYPE 02010101
SYSTEM UPTIME: 0 DAYS 01:47:45
SCS NODE: SIXSHL                                             VAX/VMS T5.5-D34

INT60 ERROR KA560 CPU FW REV# 2. CONSOLE FW REV# 0.1

REGISTER COUNT    00000028

Fault Summary Block

FAULT ID          19
                        CPU/mem fault
FAULT FLAG        02
                        Solid error
XLNK MODE ERROR   03
                        Duplex
XLNK MODE AFTER   02
                        Master

FRU Information Block

FRU TYPE          00000004
                        Module in zone B
FRU DATA          00000001
                        CPU in slot 0

Deconfiguration Information

FLT FLGS BEFORE   33003301
                        Full configuration active
                        Zone A CPU present
                        Zone B CPU present
                        Zone A I/O present
                        Zone B I/O present
                        Zone A CPU in use
                        Zone B CPU in use
                        Zone A I/O in use
                        Zone A I/O in use
FLT FLGS AFTER    33003301
                        Full configuration active
                        Zone A CPU present
                        Zone B CPU present
                        Zone A I/O present
                        Zone B I/O present
                        Zone A CPU in use
                        Zone B CPU in use
                        Zone A I/O in use
                        Zone A I/O in use
DECONFIG INFO     00000008
                        Zone B cpu removed from service
DECONFIG MODULE   00000001
                        CPU in slot 0 removed from service

Threshold Information Block

V A X / V M S        SYSTEM ERROR REPORT        COMPILED  3-FEB-1993 09:33:44
                                                                     PAGE  41.
THRESHOLD INTER.  0000A8C0
                        THRESHOLD INTER. SECONDS = 43200.
THRESHOLD COUNT   00000001
                        THRESHOLD COUNT = 1.
THRESHOLD LIMIT   00000003
                        THRESHOLD LIMIT = 3.
THRESHOLD ZEROED  0000190E
                        THRESHOLD ZEROED SECONDS = 6414.
THRESHOLD TOTAL   00000001
                        THRESHOLD TOTAL = 1.

Fault Data Block

SYSTEM ERROR      19
SYSFLT            30020010
                        I/O error, zone A
                        CPU/memory fault, zone B
                        XLINK MODE = Duplex
SYSADR            61200034
                        SYSADR = 61200034(X)
DMAADR            0269BC00
                        DMAADR = 0269BC00(X)
DMA Address Register Invalid
JCSR_A CTL/STAT   00000088
                        System errors enabled
                        Bcache on
JCSR_B Register Invalid
DIAG_P_A REG      CAC00000
                        DMA most error (non-crc)
                        Burn-in mode
                        I/O divide = 6
                        CPU divide = A
DIAG_M_A REG      CAC00000
                        DMA most error (non-crc)
                        Burn-in mode
                        I/O divide = 6
                        CPU divide = A
DIAG_P_B Register Invalid
DIAG_M_B Register Invalid
ATMERR_A REG      00000000
                        Zone ID = A
ATMERR_B Register Invalid
DMA STAT REG A    00000040
                        CPU I/O error
DMASTS_B Register Invalid
MMBERR0_A REG     00000000
MMBERR0_B Register Invalid
MMBERR1_A REG     00000000

V A X / V M S        SYSTEM ERROR REPORT        COMPILED  3-FEB-1993 09:33:44
                                                                     PAGE  42.
MMBERR1_B Register Invalid
SERCSR_A REG      00000080
                        Loopback request
                        Enable query interrupt
SERCSR_B Register Invalid
SERMODE_A REG     00200912
                        Master
                        Operating System is running
                        Clock fault enable
                        Clock select 0 = Master, 1 = Slave
                        Halt source 0 = A, 1 = B
SERMODE_B Register Invalid
BIU_ADDR_A Register Invalid
BIU_ADDR_B Register Invalid
BIU_STAT_A Register Invalid
BIU_STAT_B Register Invalid
BIU_CTL_A Register Invalid
BIU_CTL_B Register Invalid
This block reflects the content of the four fields of the Fault Summary Block.
The FAULT ID, FAULT FLAG, FRU TYPE, and FRU DATA fields should
always be reviewed. They will generally provide the most immediate FRU
information.
The system operating mode has been changed from Duplex to Degraded
Duplex, with Zone A as the master.
A solid error has been identified and the FRU removed from service. However,
if the CPU has not exceeded its threshold and diagnostics pass, the CPU will
be reconfigured into the system.
At this point, the Zone B CPU has not been removed from service.
The Zone B CPU is being removed from service due to the solid error and
change in operating mode.
OpenVMS is running in Zone A.
4.9.2 CPU/MEM Fault End Action Error Log Entry
V A X / V M S
SYSTEM ERROR REPORT
******************************* ENTRY
ERROR SEQUENCE 1048.
DATE/TIME 2-FEB-1993 18:16:21.40
SYSTEM UPTIME: 0 DAYS 01:48:21
SCS NODE: SIXSHL
COMPILED 3-FEB-1993 09:33:46
PAGE 56.
701. *******************************
LOGGED ON:
SID 17000002
SYS_TYPE 02010101
VAX/VMS T5.5-D34
INT60 ERROR KA560 CPU FW REV# 2. CONSOLE FW REV# 0.1
REGISTER COUNT 00000029
Fault Summary Block
FAULT ID
29
FAULT FLAG
0A
CPU/mem fault end action
Solid error
Service is required
XLNK MODE ERROR
03
XLNK MODE AFTER
02
Duplex
Master
FRU Information Block
FRU TYPE
00000004
FRU DATA
00000001
Module in zone B
CPU in slot 0
Deconfiguration Information
FLT FLGS BEFORE 33003301
Full configuration active
Zone A CPU present
Zone B CPU present
Zone A I/O present
Zone B I/O present
Zone A CPU in use
Zone B CPU in use
Zone A I/O in use
Zone A I/O in use
FLT FLGS AFTER 31003300
Zone A CPU present
Zone B CPU present
Zone A I/O present
Zone B I/O present
Zone A CPU in use
Zone A I/O in use
Zone A I/O in use
DECONFIG INFO 00000008
Zone B cpu removed from service
DECONFIG MODULE 00000001
CPU in slot 0 removed from service
Threshold Information Not Valid
V A X / V M S
SYSTEM ERROR REPORT
COMPILED 3-FEB-1993 09:33:46
PAGE 57.
Fault Data Block
END ACTION
SYSFLT
29
30020020
I/O error, zone B
CPU/memory fault, zone B
XLINK MODE = Duplex
SYSADR
61200034
SYSADR = 61200034(X)
CNTRL/STAT REG 00000008
System errors enabled
DIAG_P REG
CAC08000
Memory double bit error
DMA most error (non-crc)
Burn-in mode
I/O divide = 6
CPU divide = A
DIAG_M REG
CAC08000
Memory double bit error
DMA most error (non-crc)
Burn-in mode
I/O divide = 6
CPU divide = A
MMBERR0 REG 01010101
MMB #3 double bit error
MMBERR1 REG 00000000
ATMERR REG 40404040
Zone ID = B
DMA STAT REG 00000040
CPU I/O error
DMAADR 0269BC00
DMAADR = 0269BC00(X)
SERCSR REG 00000080
Loopback request
Enable query interrupt
SERMODE REG
00002101
Slave
Clock fault enable
Zone ID 0 = A, 1 = B
PCADR 00000000
SAVPSL REG 0000B039
C-BIT
N-BIT
T-BIT
INTEGER OVERFLOW TRAP ENABLE
INTERRUPT PRIORITY LEVEL = 00.
PREVIOUS MODE = KERNEL
CURRENT MODE = KERNEL
FIRST PART DONE CLEAR
ECR
0000004A
fbox enable
fbox st4 bypass enable
timeout clock
pmf pmux = 00
pmf emux = 00
V A X / V M S
SYSTEM ERROR REPORT
COMPILED 3-FEB-1993 09:33:46
PAGE 58.
BIU CTL DFE0DEF9
Generate/Expect ECC on check_h pins
output enable of cache rams
direct mapped
2X CPU Cycle
IO Map = 1(X)
512 Kbytes
BC TAG
07913800
tag_match
tag control V
tag control D
tag P
BC TAG = 03C8(X)
BIU STAT
500E3070
Bits 33,32 BIU Addr Reg = 1(X)
Bits 33,32 Fill Addr Reg = 1(X)
FILL SYN
00000000
L0 ECC Syn bits Low Longword = 0(X)
Hi ECC Syn bits High Longword = 0(X)
FILL ADDR 000002A8
FILL ADDR = 000002A8(X)
VMAR 000007E0
Sub Block Select = 0(X)
Row Index = 3F(X)
Error Address Field = 00000000(X)
ICSR 00000001
enable VIC
TBADR 00000000
TBSTS 00000000
s5 cmd corresp to tb perr = 00
source of ref causing tb perr = 00
PCSTS 00000000
PCSTS.LOCK(0) NOT SET
PCCTL 00000000
Performance Monitor Mode = 0(X)
COMPAT/STAT REG 00006008
ATM self test failed
ATM ID EEPROM is bad
ATM ID EEPROM has bad os status
DIAG STATUS REG 00000000
Register is not "VALID"
This block reflects the content of the four fields of the Fault Summary Block.
This entry type (end action) is provided after diagnostics have completed
running on a zone or CPU which has been removed from service as a result of
a system error.
This is the end action for the previous example (CPU/MEM Fault Error Log
Entry).
This message specifies that a physical FRU replacement is required.
The system operating mode has been changed from Duplex to Degraded
Duplex with Zone A as the master.
The FRU may be one of five items: CPU module, or one of the four MMBs.
The Zone B CPU has been removed from service.
Double-bit errors are always treated as solid faults. The failed CPU will not
be reconfigured until Zone B memory is repaired. MMB 3 is the most likely
FRU.
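The SAVPSL REG value in the Fault Data Block of this entry can be checked by hand against the decoded lines printed beneath it. The Python sketch below applies the VAX processor status longword layout (condition codes and trap enables in bits 0 to 7, IPL in bits 16 to 20, previous and current mode in bits 22 to 25, first part done in bit 27). The helper name decode_psl is hypothetical and is not part of any VMS utility; bits 8 to 15 are not decoded here.

```python
# Sketch only: decode_psl is a hypothetical helper, not part of any VMS utility.
# Bit positions follow the VAX processor status longword (PSL) layout.
PSL_LOW_BITS = {0: "C", 1: "V", 2: "Z", 3: "N", 4: "T", 5: "IV", 6: "FU", 7: "DV"}
ACCESS_MODES = ("KERNEL", "EXECUTIVE", "SUPERVISOR", "USER")


def decode_psl(psl):
    """Split a 32-bit PSL value into the fields the error report prints."""
    return {
        "flags": [name for bit, name in PSL_LOW_BITS.items() if psl & (1 << bit)],
        "ipl": (psl >> 16) & 0x1F,
        "previous_mode": ACCESS_MODES[(psl >> 22) & 0x3],
        "current_mode": ACCESS_MODES[(psl >> 24) & 0x3],
        "first_part_done": bool(psl & (1 << 27)),
    }


fields = decode_psl(0x0000B039)  # SAVPSL REG value from the entry above
for key, value in fields.items():
    print(f"{key}: {value}")
```

For 0000B039 this yields the C, N, T, and IV flags, IPL 0, kernel for both previous and current mode, and first part done clear, which agrees with the decoded lines in the report.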
4.9.3 CPU or Zone Unsynchable Error Log Entry
V A X / V M S
SYSTEM ERROR REPORT
******************************* ENTRY
ERROR SEQUENCE 1099.
DATE/TIME 2-FEB-1993 18:16:21.40
SYSTEM UPTIME: 0 DAYS 01:48:21
SCS NODE: SIXSHL
COMPILED 3-FEB-1993 09:33:46
PAGE 56.
743. *******************************
LOGGED ON:
SID 17000002
SYS_TYPE 02010101
VAX/VMS T5.5-D34
INT60 ERROR KA560 CPU FW REV# 2. CONSOLE FW REV# 0.1
REGISTER COUNT 0000000E
Fault Summary Block
FAULT ID
60
FAULT FLAG
0A
CPU or zone unsynchable
Solid error
Service is required
XLNK MODE ERROR
02
XLNK MODE AFTER
02
Master
Master
FRU Information Block
FRU TYPE
00000004
FRU DATA
00000001
Module in zone B
CPU in slot 0
Deconfiguration Information
FLT FLGS BEFORE 31003300
Zone A CPU present
Zone B CPU present
Zone A I/O present
Zone B I/O present
Zone A CPU in use
Zone A I/O in use
Zone A I/O in use
FLT FLGS AFTER 31003301
Zone A CPU present
Zone B CPU present
Zone A I/O present
Zone A CPU in use
Zone A I/O in use
Zone A I/O in use
DECONFIG INFO 00000008
Zone B cpu removed from service
DECONFIG MODULE 00000001
CPU in slot 0 removed from service
Threshold Information Not Valid
Fault Data Block
V A X / V M S
SYSTEM ERROR REPORT
COMPILED 3-FEB-1993 09:33:46
PAGE 57.
CPU or ZONE UNSYNCHABLE EVENTS
COMPAR/STAT REG 02000000
CPU is in burnin mode
DIAG STATUS REG FFFFFFFF
Diagnostic status is valid
DIAG ERR NUM
FF
DIAG SUBTEST NUM
FF
DIAG TEST NUM
FF
DIAG GROUP NUM
0F
DIAG ERR NUM = 255
DIAG SUBTEST NUM = 255
DIAG TEST NUM = 255
DIAG GROUP NUM = 15.
Diag Flag = 7(X)
This block reflects the content of the four fields of the Fault Summary Block.
The system was unable to synchronize and reach Duplex mode. Consequently,
the before and after XLINK_MODE fields (Fault Summary Block) reflect
Degraded Duplex mode.
Since the Zone B CPU was unsynchable, it is not in use.
The Zone B CPU was removed from service, and will remain out of service
until it is repaired.
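When comparing the FLT FLGS BEFORE and FLT FLGS AFTER values in an entry, an exclusive OR shows which bit positions changed without needing the register definition. The Python sketch below does this for the values printed in the two preceding entries. The helper name changed_bits is hypothetical, and no meaning is assigned to individual bits here; use the decoded text in the report for that.

```python
# Sketch only: changed_bits is a hypothetical helper for reading error log entries.
def changed_bits(before, after):
    """Return the bit positions (0 = least significant) that differ."""
    diff = before ^ after
    return [bit for bit in range(32) if diff & (1 << bit)]


# FLT FLGS BEFORE / AFTER values as printed in the two entries above.
entries = {
    "CPU/MEM fault end action": (0x33003301, 0x31003300),
    "CPU or zone unsynchable": (0x31003300, 0x31003301),
}

for name, (before, after) in entries.items():
    print(f"{name}: XOR = {before ^ after:08X}, changed bits = {changed_bits(before, after)}")
```

The end action entry differs in two bit positions and the unsynchable entry in one, so only a small part of each flag word needs to be compared against the decoded lines.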
5 FRU Removal and Replacement Procedures
5.1 In This Chapter
This chapter includes:
• Field replaceable unit list
• Before you begin
• FRU removal and replacement
5.2 Field Replaceable Unit List
A complete list of field replaceable units (FRUs) is given in Table 5–1.
Table 5–1 Model 810 FRUs
FRU                                                 Part Number
Modules:
CPU                                                 54-21075-01
Memory mother board (MMB)                           54-21085-01
Single-sided SIMMs (4 Mbytes per SIMM)              54-21139-CA
Double-sided SIMMs (8 Mbytes per SIMM)              54-21139-DA
I/O attachment module (ATM)                         54-21083-01
Zone control panel                                  54-22130-01
Fan current sense board (FCSB)                      54-22126-01
Console extender module                             54-21067-01
Cross-link assembly                                 70-03710-01
Fan                                                 12-27848-01
Power:
AC front end unit (FEU)                             H7884-AA
5V regulator (DC5)                                  H7179-AA
3.3V regulator (DC3)                                H7178-AA
Power system controller (PSC)                       H7851-AA
Domestic power distribution box                     BA22J-AE
International power distribution box                BA22J-AJ
Control and miscellaneous power module (CAMP)       54-21073-01
Options:
Ethernet interface module (EIM)                     54-21081-01
DSSI extender module                                54-21063-01
DSSI interface module (DIM)                         54-21065-01
DSSI disk drawer assembly                           70-30569-01
Storage:
18.2 Gbyte magazine tape subsystem                  TF857-AA/AB
2.6 Gbyte cartridge tape drive                      TF85C-BA
2 Gbyte disk drive                                  RF73-EA
852 Mbyte disk drive                                RF35-EA
2.6 Gbyte cartridge tabletop tape drive             TF85-TA
Cable kit for the TF85-TA drive                     CK-KDXDA-BA
4 Gbyte half-rack storage array with two RF73 drives and one SF73-HK assembly
1.7 Gbyte half-rack storage array with two SF35 drives and one SF35-HK assembly
Cables:
DIM to storage device with terminator (84 inches)   17-03537-03
DIM to storage device with terminator (62 inches)   17-03537-02
DIM to storage device with terminator (24 inches)   17-03537-01
Fan to fan tray                                     17-03514-01
Fan tray to FCSB                                    17-03513-01
FCSB to centerplane                                 17-03512-01
VT420 to UPS (power cable)                          17-00442-17
Zone control panel to centerplane                   17-01148-03
DSSI disk drawer to centerplane                     17-03805-01
DSSI disk drawer power/signal to centerplane        17-03806-01
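The three DIM to storage device cables in Table 5–1 differ only in length, so choosing one can be treated as a simple lookup. The Python sketch below picks the shortest listed cable that covers a required run. The function name pick_dim_cable and the idea of selecting by measured length are illustrative assumptions, not a procedure from this manual; the lengths and part numbers are the ones in the table.

```python
# Sketch only: pick_dim_cable is a hypothetical helper.
# Lengths (inches) and part numbers are taken from Table 5-1.
DIM_CABLES = [
    (24, "17-03537-01"),
    (62, "17-03537-02"),
    (84, "17-03537-03"),
]


def pick_dim_cable(required_inches):
    """Return (length, part number) of the shortest cable that fits, or None."""
    for length, part_number in DIM_CABLES:  # list is ordered shortest first
        if length >= required_inches:
            return length, part_number
    return None  # no listed cable is long enough


print(pick_dim_cable(30))
print(pick_dim_cable(90))
```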
5.3 Before You Begin
Warning
Hazardous voltages exist within the system. Bodily injury or equipment
damage can result when service procedures are performed incorrectly.
Note
FRUs should be handled only by qualified maintenance personnel.
You do not need to shut down the entire system to remove and replace a FRU.
You can shut down the zone that houses the faulty FRU while the other zone
continues to operate. Section 5.3.2 explains how to shut down a zone.
There are two types of FRU removal and replacement procedures:
• Cold swaps
• Warm swaps
During a cold swap, you shut down the zone that houses the faulty FRU while
the operating system continues to run in the other zone. FRUs that require cold
swaps include:
Logic modules
Fan modules
Power supplies
DIM modules
EIM modules
Zone control panel
During a warm swap, the power remains on in both zones. The operating system
continues to run in both zones while the faulty FRU is replaced. FRUs that allow
a warm swap include:
RF35 disk drives
RF73 disk drives
SF35 disk drives
SF73 disk drives
TF85 tape drives
TF857 tape subsystems
DSSI disk drawer assemblies
Chapter 6 explains how to perform a warm swap procedure.
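The two lists above amount to a lookup from FRU category to swap type. A minimal Python sketch of that lookup follows; the names COLD_SWAP_FRUS, WARM_SWAP_FRUS, and swap_type are hypothetical, and the entries are copied from the lists in this section. A FRU that appears in neither list raises an error rather than being guessed at.

```python
# Sketch only: names are hypothetical; entries are copied from Section 5.3.
COLD_SWAP_FRUS = (
    "Logic modules", "Fan modules", "Power supplies",
    "DIM modules", "EIM modules", "Zone control panel",
)
WARM_SWAP_FRUS = (
    "RF35 disk drives", "RF73 disk drives", "SF35 disk drives",
    "SF73 disk drives", "TF85 tape drives", "TF857 tape subsystems",
    "DSSI disk drawer assemblies",
)

_SWAP_TYPE = {name.lower(): "cold" for name in COLD_SWAP_FRUS}
_SWAP_TYPE.update({name.lower(): "warm" for name in WARM_SWAP_FRUS})


def swap_type(fru):
    """Return 'cold' or 'warm' for a FRU category listed in this section."""
    key = fru.strip().lower()
    if key not in _SWAP_TYPE:
        raise KeyError(f"{fru!r} is not in either swap list")
    return _SWAP_TYPE[key]


print(swap_type("Power supplies"))
print(swap_type("RF73 disk drives"))
```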
5.3.1 Handling FRUs
Static electricity can damage FRUs. When you handle FRUs, follow the rules in
Table 5–2.
Table 5–2 Handling FRUs
Rule  Action
1     Wear an electrostatic discharge (ESD) wrist strap.
2     When possible, use a grounded ESD workmat.
3     Attach both the wrist strap and the workmat to the system chassis.
4     Before you remove the FRU from the antistatic box, be sure you ground the box to the system chassis.
5     Wear an ESD wrist strap when you remove the FRU from the antistatic box.
6     Ask the operator or system manager to shut down the zone you will be working in.
5.3.2 Shutting Down a Zone
Typically, the shutdown is performed by the operator or the system manager.
1. Enter the SHOW ZONE command to see the status of each zone.
   • Active: The zone is running.
   • Stopped: The zone is not running the operating system. It may be running diagnostics or be available for synchronizing.
   • Absent: The zone is not available.
   • Synchronizing: The zone is synchronizing with the other zone.
   • Providing I/O only: The zone has detected a CPU/MEM fault, and has placed the CPU and memory off line.
2. Enter the STOP/ZONE zone-id command.
3. At the zone control panel (A or B), simultaneously press both Logic Power OFF switches to remove logic power from the zone.
Note
Pressing the Logic Power - OFF switches does not affect the fan or the
expansion cabinet power unless the drives (disk or tape) are turned off. If
the drives are turned off, the fan will run for about 30 seconds after you
press the switches.
Example 5–1 How to Shut Down a Zone
$ SHOW ZONE
Zone A is ACTIVE
Zone B is PROVIDING I/O ONLY
! Displays the status of each zone.
! Zone A is running.
! Zone B has a faulty component.
$ STOP/ZONE B
! Stops zone B.
At the console terminal of the zone that continues to run (in this case, zone A),
the OPCOM messages show that zone synchronization has been lost and virtual
circuits are closed.
5.3.3 Verifying Zone Shutdown
The SHOW ZONE command may be used to verify that the STOP/ZONE zone-id
command was successful.
Example 5–2 How to Verify Zone Shutdown
$ SHOW ZONE
Zone A is ACTIVE
Zone B is ABSENT
! Displays the status of each zone.
! Zone A is running.
! Zone B has been shut down.
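If the SHOW ZONE output is captured to a file or session log, the check in Example 5–2 can be scripted. The Python sketch below parses lines of the form shown in Examples 5–1 and 5–2 and reports whether a zone shows ABSENT. The function names are hypothetical, the line format is assumed to match the examples exactly, and the result does not replace the shutdown procedure in Section 5.3.2.

```python
import re

# Sketch only: line shape taken from Examples 5-1 and 5-2; other lines are ignored.
ZONE_LINE = re.compile(r"^Zone\s+([AB])\s+is\s+(.+)$", re.IGNORECASE)


def parse_show_zone(text):
    """Map each zone letter to the state text that follows 'is'."""
    states = {}
    for line in text.splitlines():
        match = ZONE_LINE.match(line.strip())
        if match:
            states[match.group(1).upper()] = match.group(2).strip().upper()
    return states


def zone_is_shut_down(states, zone):
    """True only when the zone reports ABSENT, as in Example 5-2."""
    return states.get(zone.upper()) == "ABSENT"


sample = """Zone A is ACTIVE
Zone B is ABSENT"""
states = parse_show_zone(sample)
print(states)
print(zone_is_shut_down(states, "B"))
```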
5.3.4 Starting Up a Zone
Typically, the startup is performed by the operator or the system manager.
1. At the zone control panel (A or B), press the Logic Power - ON switch.
2. Enter the SHOW ZONE command to verify that the zone is shut down.
3. Enter the START/ZONE command to start up the zone.
5.3.5 Accessing the FRUs
Figure 5–1 shows the latches at the front and rear of the system. To open a door,
pull the latch.
The electrostatic discharge (ESD) kit and module extraction tool are located
inside the rear door of the CPU cabinet.
Figure 5–1 Latches
[Figure callouts: latch location on the expander cabinets and CPU cabinet, front view and rear view. MR-0457-92DG]
5.4 FRU Removal and Replacement
The following sections contain FRU removal and replacement procedures.
Caution
Service procedures may be performed only by qualified personnel. They
must be familiar with ESD procedures and power procedures for the
Model 810 system. Excessive shock or incorrect handling can damage the
logic modules.
Note
When specific replacement procedures are not given, replace the FRU by
reversing the steps in the removal procedure.
5.4.1 CPU and ATM Modules
You use the same steps to remove the CPU and ATM modules. Figure 5–2 shows
the locations of the modules. Table 5–3 describes the removal procedure.
Figure 5–2 CPU Module and ATM Module Locations
[Figure callouts: captive screws, module release levers, ATM module, CPU module, CPU cabinet. MR-0435-92RAGS]
Table 5–3 CPU Module and ATM Module Removal Procedure
Step  Action
1     Ask the operator or system manager to shut down the zone using the procedure in Section 5.3.2.
2     Open the front door of the cabinet.
3     Loosen the captive screws on the module. The CPU module has four captive screws; the ATM module has two captive screws.
4     Open the module release levers and slide the module out.
5.4.2 SIMMs
Figure 5–3 shows the locations of the SIMMs. Table 5–4 describes the removal
procedure.
Note
SIMMs are configured on the MMBs in rows, with a pair of SIMMs (two)
in each row. You always replace a pair of SIMMs (a two-SIMM row).
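Because SIMMs are installed two to a row, the memory on one MMB follows directly from which rows are populated and the SIMM type from Table 5–1 (4 Mbytes single-sided, 8 Mbytes double-sided). The Python sketch below works through that arithmetic. The function name mmb_capacity_mbytes is hypothetical, and the example assumes rows A and B hold double-sided SIMMs; this manual section does not state which row combinations are supported.

```python
# Sketch only: mmb_capacity_mbytes is a hypothetical helper.
# SIMM sizes come from Table 5-1; two SIMMs per row comes from the note above.
SIMM_MBYTES = {"single-sided": 4, "double-sided": 8}
SIMMS_PER_ROW = 2


def mmb_capacity_mbytes(rows):
    """Sum the capacity of populated rows; a row set to None is empty."""
    return sum(
        SIMMS_PER_ROW * SIMM_MBYTES[simm_type]
        for simm_type in rows.values()
        if simm_type is not None
    )


# Assumed population for illustration: rows A and B filled, C and D empty.
example_rows = {"A": "double-sided", "B": "double-sided", "C": None, "D": None}
print(mmb_capacity_mbytes(example_rows), "Mbytes")
```

Replacing one row therefore always changes the MMB capacity by a whole pair (8 or 16 Mbytes), which is why the pair is handled as a unit.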
Figure 5–3 SIMM Locations
[Figure callouts: retaining clip, SIMMs (rows A through D), MMB0 through MMB3, CPU module. MR-0453-92DG]
Table 5–4 SIMM Removal Procedure
Step  Action
1     Ask the operator or system manager to shut down the zone using the procedure in Section 5.3.2.
2     Open the front door of the cabinet.
3     Remove the CPU module using the procedure in Table 5–3.
4     Press the two retaining clips until the SIMM pops up at a 45-degree angle.
5     Remove the pair of SIMMs (a two-SIMM row) from the MMB.
5.4.3 MMBs
Figure 5–4 shows the locations of the MMBs. Table 5–5 describes the removal
procedure.
Figure 5–4 MMB Locations
[Figure callouts: mounting bracket screws, mounting brackets, MMB0 through MMB3, CPU module. MR-0414-92DG]
Table 5–5 MMB Removal Procedure
Step  Action
1     Ask the operator or system manager to shut down the zone using the procedure in Section 5.3.2.
2     Open the front door of the cabinet.
3     Remove the CPU module using the procedure in Table 5–3.
4     The MMBs are tension mounted on the CPU module with two screws. These screws are located on the MMB mounting brackets. Loosen one screw by turning it two or three times. Then loosen the other screw the same way. Alternate between the two screws until the MMB is free from the CPU module.
5     Remove the three screws that secure each of the mounting brackets on the MMB.
6     Note the configuration of the SIMMs on the MMB. They must be removed from the faulty MMB and installed in the same locations on the replacement MMB.
7     Remove the SIMMs from the MMB using the procedure in Table 5–4.
5.4.4 Fan and FCSB
Figure 5–5 shows the location of the fan. Figure 5–6 shows the location of the
FCSB. Table 5–6 describes the removal procedure.
Figure 5–5 Fan Location
[Figure callouts: front of CPU cabinet, captive screws, fan, handle. MR-0439-92RAGS]
Table 5–6 Fan and FCSB Removal Procedure
Step  Action
1     Ask the operator or system manager to shut down the zone using the procedure in Section 5.3.2.
2     Open the rear door of the cabinet.
3     Set the FEU circuit breaker to the off position.
4     Open the front door of the cabinet.
5     Loosen the three captive screws that secure the fan in the CPU cabinet.
6     Grasp the handle and pull the fan out of the cabinet.
7     Locate the FCSB inside the fan assembly.
8     Disconnect the FCSB from the fan tray to FCSB cable. See Figure 5–6.
9     Disconnect the FCSB from the FCSB to centerplane cable. See Figure 5–6.
10    Remove the FCSB from the four mounting standoffs. See Figure 5–6.
Figure 5–6 FCSB Location
[Figure callouts: fan tray to FCSB cable, FCSB to centerplane cable, mounting standoffs, FCSB. MR-0437-92RAGS]
5.4.5 RF35 Disk Drive Removal and Replacement
Figure 5–7 shows an RF35 disk drive in the DSSI disk drawer. Table 5–7
describes the RF35 disk drive removal procedure.
Figure 5–7 RF35 Disk Drive Location
[Figure callouts: release lever, bracket, Phillips screws (6), captive screws (4), release pin, captive screws, RF35 disk drive, LDC bracket. MR-0025-93DG]
Table 5–7 RF35 Disk Drive Removal Procedure
Step  Action
1     Ask the operator or system manager to shut down the zone using the procedure in Section 5.3.2.
2     Open the front door of the cabinet.
3     Turn off the RF35 disk drive.
4     Loosen the four screws that secure the DSSI disk drive rack in the CPU cabinet.
5     Pull the DSSI disk drive rack out until it locks in place.
6     Swing the LDC bracket out until you can see the disk drives. See Figure 5–7.
7     Label the DSSI, power, and disk signal cables, and disconnect them from the RF35 drive you are removing.
8     Loosen the captive screws at the bottom of the drive.
9     Remove the drive and bracket.
10    Remove the six Phillips screws that secure the bracket on the drive.
5.4.6 DSSI Disk Drawer
Figure 5–7 shows the components in the DSSI disk drawer. Table 5–8 describes
the DSSI disk drawer removal procedure.
Table 5–8 DSSI Disk Drawer Removal Procedure
Step  Action
1     Ask the operator or system manager to dismount the drive.
2     Open the rear door of the cabinet.
3     Set the FEU circuit breaker to the off position.
4     Open the front door of the cabinet.
5     Turn off all the RF35 disk drives.
6     Loosen the four screws that secure the DSSI disk drive rack in the CPU cabinet.
7     Pull the DSSI disk drive rack out until it locks in place.
8     Swing the LDC bracket out until you can see the disk drives. See Figure 5–7.
9     Label each of the RF35 disk drives. [1]
10    Label the DSSI, power, and disk signal cables, and disconnect them from each of the RF35 drives.
11    Loosen the captive screws at the bottom of each of the drives.
12    Remove all the drives from the DSSI disk drawer.
13    At the rear of the DSSI disk drawer, label the two DSSI cables and the power cable. Then disconnect them.
14    Press the release lever on the left side of the DSSI disk drawer and slide the drawer out of the cabinet.
[1] Label each drive before you remove it. The RF35 disk drives must be removed from the DSSI disk drawer and installed in the same locations in the replacement DSSI disk drawer.
5.4.7 Zone Control Panel
Figure 5–8 shows the zone control panel. Table 5–9 describes the removal
procedure.
Figure 5–8 Zone Control Panel
[Figure callouts: captive screws, zone control panel bracket, signal cable, controller module, handle, Phillips screws (6). MR-0023-93RAGS]
Table 5–9 Zone Control Panel Removal Procedure
Step  Action
1     Ask the operator or system manager to shut down the zone using the procedure in Section 5.3.2.
2     Open the front door of the cabinet.
3     Loosen the four captive screws that secure the zone control panel on the cabinet.
4     Grasp the handle and pull the zone control panel out until you can access the controller module signal cable.
5     Disconnect the signal cable from the controller module.
6     Remove the six Phillips screws that secure the controller module on the zone control panel bracket.
5.4.8 FEU, 3.3V Regulator, 5V Regulator, PSC Modules
You use the same steps to remove these four FRUs. Figure 5–9 shows the
locations of the modules. Table 5–10 describes the removal procedure.
Figure 5–9 FEU, 3.3V Regulator, 5V Regulator, and PSC Locations
[Figure callouts: rear of CPU cabinet, +3.3V regulator, +5V regulator, PSC, circuit breaker, release handle, FEU. MR-0443-92RAGS]
Caution
Removing/replacing these four modules without shutting down 48V_DRCT
may cause damage to the power components.
Table 5–10 FEU, 3.3V Regulator, 5V Regulator, and PSC Removal Procedure
Step  Action
1     Ask the operator or system manager to shut down the zone using the procedure in Section 5.3.2.
2     Open the rear door of the cabinet.
3     Set the FEU circuit breaker to the off position.
4     If you are removing the FEU, disconnect the ac power cable from the FEU.
5     Loosen the screws that secure the module in the cabinet. The FEU is secured with four screws. The 3.3V regulator, 5V regulator, and PSC are secured with two screws.
6     Grasp the module release handles and pull the power module out of the cabinet.
5.4.9 Cross-Link Assembly
Figure 5–10 shows the location of the cross-link assembly. Table 5–11 describes
the removal procedure. Figure 5–11 shows you how to use the module extraction
tool.
Figure 5–10 Cross-Link Assembly
[Figure callouts: rear of CPU cabinet, upper retaining bar, crosslink module, middle retaining bar, crosslink cable; the retaining bars and crosslink module are shown for each zone. MR-0447-92RAGS]
Note
The cross-link assembly consists of two cross-link modules (one per zone)
and one cross-link cable. These three parts are considered to be one FRU.
Table 5–11 Cross-Link Assembly Removal Procedure
Step  Action
1     Ask the operator or system manager to shut down the zone using the procedure in Section 5.3.2.
2     Open the rear door of the cabinet.
3     Remove the four screws from the upper retaining bar.
4     Remove the four screws from the middle retaining bar.
5     Insert the module extraction tool into the hole in the cross-link module. Turn the module extraction tool to the right until it is fastened to the module. See Figure 5–11.
6     Pull the cross-link module out of the cabinet.
7     Repeat steps 3 through 6 for the other zone.
Figure 5–11 Module Extraction Tool
[Figure callouts: module extraction tool, tighten, loosen, pull to remove. MR-0024-93RAGS]
5.4.10 Console Extender Module
Figure 5–12 shows the location of the console extender module. Figure 5–13
shows the layout of the console extender module. Table 5–12 describes the
removal procedure.
Figure 5–12 Console Extender Module Location
[Figure callouts: rear of CPU cabinet, upper retaining bar, console extender module, middle retaining bar. MR-0036-93RAGS]
Table 5–12 Console Extender Module Removal Procedure
Step  Action
1     Ask the operator or system manager to shut down the zone using the procedure in Section 5.3.2.
2     Open the rear door of the cabinet.
3     Remove the four screws from the upper retaining bar.
4     Remove the four screws from the middle retaining bar.
5     Turn off any devices connected to the console extender module.
6     Label any cables connected to the console extender module. Then disconnect them. See Figure 5–13.
7     Insert the module extraction tool into the hole in the console extender module. Turn the tool to the right until it is fastened to the module. See Figure 5–11.
8     Pull the console extender module out of the cabinet.
Figure 5–13 Console Extender Module Layout
[Figure callouts: Local, Remote, UPS, Modem, Alarm. MR-0456-92RAGS]
5.4.11 DSSI Extender Module
Figure 5–14 shows the locations of the DSSI extender modules. Table 5–13
describes the removal procedure.
Figure 5–14 DSSI Extender Module Locations
[Figure callouts: rear of CPU cabinet, upper retaining bar, DSSI extender modules, DIMs, middle retaining bar, DSSI cables. MR-0032-93RAGS]
Table 5–13 DSSI Extender Module Removal Procedure
Step  Action
1     Ask the operator or system manager to shut down the zone using the procedure in Section 5.3.2.
2     Open the rear door of the cabinet.
3     Remove the four screws from the upper retaining bar.
4     Remove the four screws from the middle retaining bar.
5     Turn off all the devices connected to the DSSI extender module.
6     Label the two DSSI cables and disconnect them from the module. See Figure 5–14.
7     Insert the module extraction tool into the hole in the DSSI extender module. Turn the tool to the right until it is fastened to the module. See Figure 5–11.
8     Pull the DSSI extender module out of the cabinet.
5.4.12 CAMP Module
Figure 5–15 shows the locations of the CAMP modules. Table 5–14 describes the
removal procedure.
Caution
Removing/replacing the CAMP module without shutting down 48V_DRCT
may cause damage to the CAMP module.
Figure 5–15 CAMP Module Locations
[Figure callouts: rear of CPU cabinet, CAMP module. MR-0475-92RAGS]
Table 5–14 CAMP Module Removal Procedure
Step  Action
1     Ask the operator or system manager to shut down the zone using the procedure in Section 5.3.2.
2     Open the rear door of the cabinet.
3     Set the FEU circuit breaker to the off position.
4     Remove the four screws from the upper retaining bar.
5     Remove the four screws from the middle retaining bar.
6     Turn off all the devices connected to the CAMP module.
7     Insert the module extraction tool into the hole in the CAMP module. Turn the tool to the right until it is fastened to the module. See Figure 5–11.
8     Pull the CAMP module out of the cabinet.
5.4.13 DSSI Interface Module (DIM)
Figure 5–16 shows the location of the interface logic modules. Figure 5–17 shows
how to remove the DIMs. Table 5–15 describes the removal procedure.
Figure 5–16 DIM Location
[Figure callouts: rear of CPU cabinet, middle retaining bar, interface logic modules (DIMs and EIMs), lower retaining bar. MR-0433-92RAGS]
Figure 5–17 DIM Removal
[Figure callouts: rear view, connector, DSSI cable, CPU cabinet, expansion cabinet. MR-0046-93RAGS]
Table 5–15 DIM Removal Procedure
Step  Action
1     Ask the operator or system manager to shut down the zone using the procedure in Section 5.3.2.
2     Open the rear door of the cabinet.
3     Remove the four screws from the middle retaining bar.
4     Remove the four screws from the lower retaining bar.
5     Turn off all the devices connected to the DIM you are removing.
6     Disconnect the DSSI cable from the DIM by loosening the two thumb screws. See Figure 5–17.
7     Insert the module extraction tool into the hole in the DIM. Turn the tool to the right until it is fastened to the module. See Figure 5–11.
8     Pull the DIM out of the cabinet.
5.4.14 Ethernet Interface Module (EIM)
Figure 5–16 shows the location of the interface logic modules. Figure 5–18 shows
how to remove the EIMs. Table 5–16 describes the removal procedure.
Figure 5–18 EIM Removal
[Figure callouts: rear view, Ethernet switch, Ethernet cable connector, Ethernet cable terminator, CPU cabinet, expansion cabinet. MR-0455-92RAGS]
Table 5–16 EIM Removal Procedure
Step  Action
1     Ask the operator or system manager to shut down the zone using the procedure in Section 5.3.2.
2     Open the rear door of the cabinet.
3     Remove the four screws from the middle retaining bar.
4     Remove the four screws from the lower retaining bar.
5     Turn off all the devices connected to the EIM you are removing.
6     Disconnect the Ethernet cable from the EIM. See Figure 5–18.
7     Disconnect the terminator from the EIM, if one is present. See Figure 5–18.
8     Insert the module extraction tool into the hole in the EIM. Turn the tool to the right until it is fastened to the module. See Figure 5–11.
9     Pull the EIM out of the cabinet.
5.4.15 DSSI Cable Removal and Replacement
Table 5–17 describes the removal procedure.
Table 5–17 DSSI Cable Removal Procedure
Step  Action
1     Ask the operator or system manager to shut down the zone using the procedure in Section 5.3.2.
2     Open the rear door of the cabinet.
3     Turn off all the devices connected to the DSSI cable you are removing.
4     Disconnect one end of the DSSI cable from the device by loosening the two screws on the DSSI connector.
5     Route the DSSI cable through the access hole between the system cabinets.
6     Disconnect the other end of the DSSI cable from the DIM by loosening the two screws on the DSSI connector.
5.4.16 TF85C-BA Tape Drive
Figure 5–19 and Figure 5–20 show how to remove a TF85C-BA tape drive from
the system. Table 5–18 describes the removal procedure.
Warning
Two people are required to lift and carry the TF85C-BA tape drive
enclosure.
Figure 5–19 TF85C-BA Tape Drive, Rear View
[Figure callouts: DSSI connectors, power supply fault indicator (behind panel), line voltage selector switch (behind panel, 230/115). MR-0454-92DG]
Figure 5–20 TF85C-BA Tape Drive Removal
[Figure callouts: tape drive enclosure, release tab, front plate screws (4), screws (3), TF85 tape drive, front plate. MR-0038-93DG]
Table 5–18 TF85C-BA Tape Drive Removal Procedure
Step  Action
1     Ask the operator or system manager to dismount the tape.
2     Ask the operator or system manager to dismount the tape drive.
3     Unload the tape magazine, if one is present.
4     At the front of the drive, set the power switch to off (0). All the indicators should be off.
5     Disconnect the power cable from the rear of the drive. See Figure 5–19.
6     Disconnect the two DSSI cables from the rear of the drive. See Figure 5–19.
7     At the front of the drive, remove the three screws that secure the tape drive enclosure in the cabinet. See Figure 5–20.
8     Slide the tape drive enclosure out of the expansion cabinet.
9     Remove the four screws that secure the front plate on the tape drive enclosure.
10    Push the release tab down and pull the drive straight out of the slot.
5.4.17 SF73 Disk Drive
Figure 5–21 and Figure 5–22 show how to remove the SF73 disk drives from the
system. Figure 5–23 shows how to remove an SF73 disk drive enclosure from
the system. Figure 5–24 shows how to remove an SF73 disk ISE from a drive.
Table 5–19 describes the removal procedure.
Warning
Two people are required to lift and carry the SF73 disk drive enclosure.
Figure 5–21 SF73 Disk Drive, Rear View
[Figure callouts: DSSI connectors, ac power switch (1/0), power supply fault indicator (behind panel), line voltage selector switch (behind panel, 230/115). MR-0422-92DG]
Figure 5–22 SF73 Disk Drive, Front View
[Figure callouts: captive screws, front cover, door; panel indicators for each drive: Ready, Write Protect, Fault, DSSI ID. MR-0035-93DG]
Table 5–19 SF73 Disk Drive Enclosure Removal Procedure
Step  Action
1     Ask the operator or system manager to dismount the drive.
2     Turn off the disk drive enclosure.
3     Disconnect the power cable from the rear of the drive. See Figure 5–21.
4     Disconnect the two DSSI cables from the rear of the drive. See Figure 5–21.
5     Remove the mounting screws from the retainers that secure the drive enclosure in the cabinet. See Figure 5–23.
6     Slide the disk drive enclosure out of the expansion cabinet.
7     Remove the retainer screws that secure the retainers on the disk drive enclosure. See Figure 5–23.
8     Loosen the captive screws that secure the front cover on the disk drive enclosure. See Figure 5–22.
9     Disconnect all cables from the disk ISE. Slide the disk ISE out of the disk drive enclosure. See Figure 5–24.
Figure 5–23 SF73 Disk Drive Enclosure Removal
[Figure callouts: retainer screws, chassis retainer, mounting screws, retainer. MR-0484-92DG]
Figure 5–24 SF73 Disk ISE Removal
[Figure callouts: DSSI cable, 10-pin OCP cable, 6-pin power cable, skid plate, guide, disk ISE. MR-0034-93DG]
Error Handling and Analysis 5–35
5.4.18 SF35 Storage Array
Figure 5–23 shows how to remove an SF35 storage array from the system.
Figure 3–7 and Figure 5–26 show the rear and front views of the SF35 storage
array. Figure 5–27 shows how to remove an SF35 disk ISE from the storage
array. Table 5–20 describes the removal procedure.
Warning
Two people are required to lift and carry the SF35 storage array.
Figure 5–25 SF35 Storage Array, Rear View
[Illustration not reproduced. Callouts: DSSI connectors; AC power switch (1/0); power supply fault indicator (behind panel); line voltage selector switch, 115/230 (behind panel).]
Figure 5–26 SF35 Storage Array, Front View
[Illustration not reproduced. Callouts: operator control panel (OCP) with Ready, Write Protect, and Fault indicators for front and rear positions A through F; drive DC power switches.]
Figure 5–27 SF35 Disk ISE Removal
[Illustration not reproduced. Callouts: carrier lever; screw; front and rear positions A through F.]
Table 5–20 SF35 Storage Array Removal Procedure
Step  Action
1     Ask the operator or system manager to dismount the disk.
2     Turn off the storage array.
3     Disconnect the power cable from the rear of the storage array. See Figure 3–7.
4     Disconnect the two DSSI cables from the rear of the storage array. See Figure 3–7.
5     Remove the mounting screws from the retainers that secure the storage array in the cabinet. See Figure 5–23.
6     Slide the storage array out of the expansion cabinet.
7     Remove the retainer screws that secure the retainers on the storage array. See Figure 5–23.
8     Remove the screw from the carrier lever. See Figure 5–27.
9     Pull the carrier lever forward and slide the disk ISE out of the slot. See Figure 5–27.
5.4.19 TF857-CA Tape Drive
Figure 5–28 shows how to remove the TF857-CA tape drive from the system.
Table 5–21 describes the removal procedure.
Warning
Two people are required to lift and carry the TF857-CA tape drive
enclosure.
Figure 5–28 TF857-CA Tape Drive, Rear View
[Illustration not reproduced. Callouts: DSSI cable; cable clip; tiewraps; power cable; push cable tie.]
Table 5–21 TF857-CA Tape Drive Removal Procedure
Step  Action
1     Ask the operator or system manager to shut down the zone using the procedure in Section 5.3.2.
2     Ask the operator or system manager to dismount the tape drive.
3     Unload the tape magazine, if one is present.
4     At the front of the drive, set the power switch to off (0). All the indicators should be off.
5     Disconnect the power cable from the rear of the drive. See Figure 5–28.
6     Disconnect the two DSSI cables from the rear of the drive. See Figure 5–28.
7     Remove the mounting screws from the retainers that secure the drive enclosure in the cabinet. See Figure 5–23.
8     Slide the tape drive enclosure out of the expansion cabinet.
9     Loosen the shipping restraint screw until the shipping bracket drops. See Figure 5–29. If the shipping bracket does not drop when you loosen the shipping restraint screw, push the shipping bracket down with a screwdriver.
10    Slide the tape drive enclosure out of the expansion cabinet.
Figure 5–29 Loosening the Shipping Restraint Screw
[Illustration not reproduced. Callouts: shipping bracket; shipping restraint screw.]
Note
If you are replacing the TF857 tape loader, you must set the node ID.
Refer to Figure 5–30 for the node ID DIP switch location.
Figure 5–30 Setting the TF857 Tape Loader Node ID
[Illustration not reproduced. Callouts: node ID DIP switch (positions 1 through 4); drive enclosure controller module; TF857 tape drive assembly.]
5.4.20 Power Distribution Box
Figure 5–31 shows a domestic power distribution box. Figure 5–32 shows
an international power distribution box. Table 5–22 describes the removal
procedure.
Figure 5–31 Domestic Power Distribution Box
[Illustration not reproduced. Callouts: AC power outlets (8); hex screws; circuit breaker (CB); DEC power bus switch; AC power cable; access hole.]
Figure 5–32 International Power Distribution Box
[Illustration not reproduced. Callouts: AC power outlets (6); hex screws; AC power connector; circuit breaker; DEC power bus switch; access hole.]
Table 5–22 Power Distribution Box Removal Procedure
Step  Action
1     Turn off any devices connected to the power distribution box.
2     Set the circuit breaker to the off position. See Figure 5–31 or Figure 5–32.
3     Set the DEC power bus switch to the local position. See Figure 5–31 or Figure 5–32.
4     If you are removing a domestic power distribution box, disconnect the ac power cable from facility power. See Figure 5–31.
      If you are removing an international power distribution box, disconnect the ac power cable from the ac power connector and from facility power. See Figure 5–32.
5     Disconnect any ac power cables connected to the ac power outlets and route the cables through the access hole. See Figure 5–31 or Figure 5–32.
6     Remove the four hex screws that secure the power distribution box in the cabinet. See Figure 5–31 or Figure 5–32.
7     Remove the power distribution box from the cabinet.
6 Managing Integrated Storage Elements
6.1 In This Chapter
This chapter includes:
• Loading the DUP driver
• Using VMS DUP
• Using the server setup switch
• Assigning DSSI unit numbers
• Warm swapping an ISE
6.2 Loading the DUP Driver
If the VMS diagnostic utility protocol (DUP) class driver is not loaded, load it as
follows:
$ MCR SYSGEN Return
SYSGEN> CONNECT FYA0/NOADAPTER Return
SYSGEN> EXIT Return
6.3 Using VMS DUP
Use the VMS DUP to change configuration data on mass storage devices. With
DUP, you can connect the terminal to a storage controller with the following DCL
command:
SET HOST/DUP/SERVER=MSCP$DUP/TASK=taskname nodename
where:
taskname – is the utility or diagnostic program name to be executed on the target storage system
nodename – is the node name of the ISE
You can use SET HOST/DUP to create a virtual terminal connection to the
MSCP$DUP server and to execute a utility or diagnostic program on the MSCP
storage controller that uses the DUP standard dialogue.
Once the connection is established, operations are under the control of the utility
or diagnostic program. When the utility or program ends, control returns to the
local system.
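As a minimal sketch of the command form above, the following Python helper assembles the SET HOST/DUP command line from a node name and a task name. The function name and the character check are assumptions made for illustration only; they are not part of VMS or DUP.

```python
import re

# Hypothetical helper (not part of VMS): builds the DCL command text shown above.
# The allowed-character rule is an assumption used only to catch obvious typos.
_NAME_RE = re.compile(r"[A-Za-z0-9$_]+")


def set_host_dup_command(nodename, taskname="PARAMS"):
    """Return the SET HOST/DUP command line for an ISE node and a DUP task."""
    for label, value in (("nodename", nodename), ("taskname", taskname)):
        if not _NAME_RE.fullmatch(value):
            raise ValueError(f"suspicious {label}: {value!r}")
    return (
        "SET HOST/DUP/SERVER=MSCP$DUP"
        f"/TASK={taskname.upper()} {nodename.upper()}"
    )


if __name__ == "__main__":
    print(set_host_dup_command("R1QSAA"))
```

The helper only formats text; the command still has to be entered at the DCL prompt by the operator.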
PARAMS is the DUP management utility to examine and change ISE parameters
such as node name, allocation class, and unit number. PARAMS is also used to
display the state of the ISE and performance statistics maintained by the ISE.
PARAMS prompts for a command with the PARAMS> prompt. Once you enter a
command, PARAMS executes it, and prompts you for another command.
To stop the PARAMS utility, press Ctrl/C, Ctrl/Y, or Ctrl/Z, or type EXIT at the PARAMS prompt.
Table 6–1 lists PARAMS commands.
Table 6–1 PARAMS Commands
Command  Description
EXIT     Stops the PARAMS utility
HELP     Displays information on how to use PARAMS commands
SET      Changes internal ISE parameters
SHOW     Displays the setting of a parameter or a class of parameters
WRITE    Records in nonvolatile RAM the device parameter changes you made with SET
Additional information is available on ISE tasks and commands in the
RF/TF-series installation guides.
6.4 Using the Server Setup Switch
The server setup (SU) switch facilitates the installation of a new or incorrectly
initialized ISE on a running system. Use SET HOST and configure parameters
for the ISE with DUP, before VMS recognizes the ISE as an available resource.
Table 6–2 explains how to disable the MSCP server on RF-series, SF35, SF72, and SF73 disks.
Table 6–2 Switches for Disabling the MSCP
Disks         To Disable                                                              More Information In
RF-series     Press the SU switch to disable the MSCP/TMSCP server within the ISE.    VAXft Systems Owner’s Manual
SF72 or SF73  Set the drive positions DSSI ID number and the left-most MSCP to        VAXft Systems Operating Information
              disable the ISE. The icon on the front of the door indicates the
              location of the drive.
SF35          Press the MSCP switch to disable the ISE. The MSCP switch is located    VAXft Systems Operating Information
              on the Operator Control Panel.
6.5 Assigning DSSI Unit Numbers
By default, the disk drive forces the unit number to the same value as the DSSI
node address for the drive. Since the drives in zone A and zone B initially have
the same DSSI unit number, reassign unit numbers to remove configuration
conflicts and improve system management.
All unit numbers must be unique within an allocation class. Change the
UNITNUM and FORCEUNI ISE parameters (see Table 6–3) to override the
default values that assign the unit the same value as its node address.
Reassign unit numbers so that they have values greater than 99. For example,
Figure 6–1 and Figure 6–2 use a 100-, 200-, 300-, 400-, 500-, and 600- numbering
scheme for SF35s and SF73s.
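The two rules above (unit numbers unique within an allocation class, and values greater than 99) can be checked mechanically before any parameters are written. The following is a minimal sketch; the function name and the tuple layout are assumptions for illustration, and the sample values are hypothetical.

```python
from collections import defaultdict


def check_unit_numbers(ises):
    """Check planned unit numbers.

    `ises` is an iterable of (nodename, allclass, unitnum) tuples.
    Returns a list of human-readable problems; an empty list means the
    plan satisfies both rules described in this section.
    """
    problems = []
    seen = defaultdict(dict)  # allclass -> {unitnum: nodename}
    for nodename, allclass, unitnum in ises:
        if unitnum <= 99:
            problems.append(
                f"{nodename}: unit number {unitnum} is not greater than 99"
            )
        owner = seen[allclass].get(unitnum)
        if owner is not None:
            problems.append(
                f"{nodename}: unit number {unitnum} duplicates {owner} "
                f"in allocation class {allclass}"
            )
        else:
            seen[allclass][unitnum] = nodename
    return problems


if __name__ == "__main__":
    plan = [("DISKA", 1, 100), ("DISKB", 1, 101), ("DISKC", 1, 100)]
    for problem in check_unit_numbers(plan):
        print(problem)
```

The same unit number in two different allocation classes is not reported, because uniqueness is only required within a class.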
Figure 6–1 VAXft Model 810 Front View
[Illustration not reproduced. Front view of the CPU cabinet and expansion cabinet with example unit numbers. SF35 front positions: A = 100, B = 101, C = 102, D = 103, E = 104, F = 105. SF73 positions are labeled 700 and 701. Other positions are labeled 100 through 800.]
6.6 Warm Swapping an ISE
Warm swapping is the procedure by which an ISE can be replaced or added to a
running system without interrupting system operations.
Caution
The procedure must be followed carefully. If a parameter is not entered
correctly, then a system reboot is necessary or the ISE (and possibly the
system) is rendered unusable.
The VMS operating system recognizes an ISE by its unique values for
the NODENAME and SYSTEMID parameters. If only one of these
parameters is changed, VMS inhibits connections to the old and new
parameters for the ISE.
Variations of this procedure depend on the purpose for the warm swap. An ISE
can be warm swapped for the following reasons:
• Removal and replacement for storage
• Replacement in a system that is running
• Installation in a system that is running
Figure 6–2 VAXft Model 810 Rear View
[Illustration not reproduced. Rear view of the CPU cabinet and expansion cabinet with example unit numbers. SF35 rear positions: A = 106, B = 107, C = 108, D = 109, E = 110, F = 111. SF73 positions are labeled 702 and 703. Other positions are labeled 100 through 800.]
When replacing an ISE or installing a new ISE, determine the parameter values
for the ISE before performing the warm swap procedure. Assign values for each
of the ISE parameters described in Table 6–3.
Table 6–3 ISE Parameters
Parameter
Description
1
ALLCLASS
Allocation class. The default value is 0. Set the ALLCLASS value to the allocation
class chosen for the system. Note that shadowed disk devices must be set to a nonzero
allocation class.
FORCENAM
Force name parameter. Determines if the ISE is to use the NODENAME parameter
value instead of the manufacturing name given to the ISE. The value must be 0. If the
value is 1, the ISE uses a generic device name such as RF31x.
FORCEUNI
Force unit parameter. To use UNITNUM as the device unit number, set the FORCEUNI
parameter to 0. The factory default value of 1 uses the DSSI node address (hardwired
on the backplane) as the unit number.
NODENAME
Node name for an ISE. Each ISE has a node name that is stored in EEPROM. The node
name is determined in the manufacturing process and is unique to each ISE. The node
name can be changed depending on the needs of the site.
SYSTEMID
System identification number. All SYSTEMIDs must be unique within the system. Do
not change this parameter when introducing a new ISE to the system.
UNITNUM
Unit number. Specifies a numeric value for the device name. Use a unit number that is
unique within the allocation class to which you are configuring the unit. Follow the unit
numbering scheme described in Section 6.5 or use one that meets the requirements.
1 RF-series devices only
More information is available on ISE parameters in the RF/TF-series installation
guides.
6.6.1 Setting ISE Parameters
Digital Equipment Corporation recommends maintaining a worksheet of the
parameters for all ISEs, as well as the serial number of each ISE. This is
especially important at sites that maintain a set of spare drives that may be
stored for some time before they are used.
The worksheet aids in:
• Preventing duplicate parameters, which render an ISE unusable until the duplication is isolated and corrected
• Finding the parameter settings of a non-operational ISE to create a replacement unit with identical parameters
Use the ISE parameter worksheets in Appendix B to identify and record critical
parameter names and values. When installing a new ISE, select parameter
values that meet the site ISE configuration or guidelines. Then continue with
Section 6.6.4. When replacing an ISE, make sure the parameters selected are not
being used for another ISE in the configuration.
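A worksheet kept in machine-readable form can be scanned for duplicated values before a replacement ISE is initialized. The sketch below assumes the worksheet is a list of dictionaries keyed by parameter name; that layout, the function name, and the sample entries are assumptions for illustration, not a format defined by this manual.

```python
from collections import Counter


def duplicate_values(worksheet, field):
    """Return the sorted values of `field` that appear on more than one entry."""
    counts = Counter(entry[field] for entry in worksheet)
    return sorted(value for value, count in counts.items() if count > 1)


if __name__ == "__main__":
    # Hypothetical worksheet entries; SYSTEMID values are made up.
    worksheet = [
        {"NODENAME": "RICYAA", "SYSTEMID": "0001", "UNITNUM": 21},
        {"NODENAME": "RIRRBA", "SYSTEMID": "0002", "UNITNUM": 22},
        {"NODENAME": "RICYAA", "SYSTEMID": "0003", "UNITNUM": 23},
    ]
    for field in ("NODENAME", "SYSTEMID"):
        print(field, duplicate_values(worksheet, field))
```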
If the parameter values were not recorded, perform the following steps to extract
the information required from your system:
1. Enter SHOW DEVICE DI to display the following information:
• Device name: The device names in the sample output below are $1$DIA22 and $1$DIA21.
• NODENAME: The node name is shown in parentheses. In the sample output, the node names are RIRRBA and RICYAA.
• ALLCLASS: The allocation class is found in the device name between the dollar signs ($). In $1$DIA21, the ISE has an allocation class of 1. If the allocation class were 0, the device name would display as RICYAA$DIA21.
• UNITNUM: The unit number is the number following the DIA. In $1$DIA21, the UNITNUM is 21. It is the MSCP unit number.
• FORCENAM: The force name parameter is set to 0 if NODENAME is anything other than RF31x. The x corresponds to a DSSI node ID (A = 0, B = 1, and so on).
• FORCEUNI: The force unit parameter is not shown in the sample output, but it should be 0 if the configuration rules given in the VAXft Systems Configuration Guide were followed.
2. Determine whether the VMS DUP class driver is loaded by entering the
following DCL command:
$ SHOW DEVICE FYA0 Return
If the driver is not loaded, load it as follows:
$ MCR SYSGEN Return
SYSGEN> CONNECT FYA0/NOADAPTER Return
SYSGEN> EXIT Return
3. Enter SET HOST/DUP to establish a DUP connection with the ISE as follows:
$ SET HOST/DUP/SERVER=MSCP$DUP/TASK=PARAMS nodename
This invokes DUP on the ISE and runs the PARAMS utility. If a connection
can not be established with the ISE DUP, use ANALYZE/SYSTEM to find
information on some of the parameters.
In the following sample output, the SYSTEMID is 94100302 and the
ALLCLASS is 1.
$ ANALYZE/SYSTEM Return
VMS System Analyzer
SDA> SHOW DEVICE $1$DIA21
I/O data structures
-------------------
$1$DIA21                    RF31                    UCB address: 802D65D0
Device status:   00021810 online,valid,unload,lcl_valid
Characteristics: 1C4D4108 dir,rct,fod,shr,avl,mnt,elg,idv,odv,rnd
                 000022A1 clu,mscp,srv,nnm,loc
Owner UIC [000010,000001]   Operation count      1116   ORB address   802D6700
PID              00000000   Error count             0   DDB address   804DA680
Alloc. lock ID   00B000E5   Reference count         1   DDT address   80308BD8
Alloc. class            1   Online count            2   VCB address   802E2750
Class/Type          01/38   BOFF                 0000   CRB address   8048C250
Def buf. size         512   Byte count           0000   PDT address   802A5F80
DEVDEPEND        00000000   SVAPTE           00000000   CDDB address  802D6410
DEVDEPND2        00000000   DEVSTS               0004   I/O wait queue empty
FLCK index             34   RWAITCNT             0000
DLCK address     00000000
Press RETURN for more.
SDA> Return
I/O data structures
-------------------
--- Primary Class Driver Data Block (CDDB) 802D6410 ---
Status:           0040 alcls_set
Controller Flags  80D4 icf_mlths,cf_this,cf_misc,cf_attn,cf_replc
Allocation class         1   CDRP Queue       empty   DDB address   804DA860
System ID         94100302   Restart Queue    empty   CRB address   8048C250
                      4041   DAP count            3   CDDB link     80344C30
Contrl. ID        94100302   Contr. timeout      60   PDT address   802A5F80
                  01644041   Reinit Count         0   Original OCB  00000000
Response ID       00000000   Wait UCB Count       0   UCB chain     802D65D0
MSCP Cmd status   FFFFFFFF
*** I/O request queue is empty ***
Press RETURN for more.
SDA> EXIT Return
$
$ SHOW DEVICE DI Return
Device              Device   Error  Volume  Free    Trans  Mnt
 Name               Status   Count  Label   Blocks  Count  Cnt
$1$DIA22  (RIRRBA)  Mounted      0  DISK22  744282      1    1
$1$DIA21  (RICYAA)  Online       5
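The device-name rules in step 1 (allocation class between the dollar signs, unit number after the controller letters, node-name prefix when the allocation class is 0) can be sketched as a small parser. The function name, the returned dictionary layout, and the three-letter controller assumption (as in DIA) are illustrative assumptions, not a VMS interface.

```python
import re

# Matches either $<allclass>$<ctl><unit> or <nodename>$<ctl><unit>.
# Assumes a three-letter controller designation such as DIA.
_DEVICE_RE = re.compile(
    r"^(?:\$(?P<allclass>\d+)\$|(?P<node>[A-Z0-9_]+)\$)"
    r"(?P<controller>[A-Z]{3})(?P<unit>\d+)$"
)


def parse_device_name(name):
    """Split a device name into allocation class, node name, controller, unit."""
    match = _DEVICE_RE.match(name.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized device name: {name!r}")
    return {
        # A node-name prefix means the allocation class is 0.
        "allclass": int(match.group("allclass") or 0),
        "nodename": match.group("node"),
        "controller": match.group("controller"),
        "unitnum": int(match.group("unit")),
    }


if __name__ == "__main__":
    print(parse_device_name("$1$DIA21"))
    print(parse_device_name("RICYAA$DIA21"))
```

For names of the form $1$DIA21 the node name is not part of the device name, so the sketch returns None for it; the node name must then be read from the parenthesized column of the SHOW DEVICE output.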
6.6.2 ISE Removal
When you replace an ISE, initialize the new ISE with the same parameters as
the ISE being replaced. Refer to the worksheet maintained for that ISE. (See
Section 6.6.1.)
You can turn off power and replace an ISE in a running system without
interrupting system services or users. When the ISE is replaced, the new
ISE must be correctly initialized to:
• Supersede pre-set manufacturing values
• Store the modified values in EEPROM
To replace an ISE in a system that is running, perform the following steps:
Caution
You must use an ESD wrist strap, ground clip, and grounded ESD
workmat whenever you handle ISEs. Use the static protective service kit
(PN 29-262446).
Use great care when you handle an ISE; excessive shock can damage the
head-disk-assembly (HDA).
1. If the ISE is mounted, logically dismount it from the system.
2. Make the device unavailable to the system by entering the following DCL
command:
$ SET DEVICE/NOAVAILABLE devicename Return
3. Verify that the device has been marked as unavailable by entering the
following DCL command:
$ SHOW DEVICE $1$DIA21 Return
Device              Device       Error  Volume  Free    Trans  Mnt
 Name               Status       Count  Label   Blocks  Count  Cnt
$1$DIA21  (RICYAA)  Unavailable      5
4. Set the ISE power switch to off (0). Wait 45 seconds for the drive to stop spinning
(and, on RF-series disks, for the interlock solenoid to release).
5. Remove the ISE from the slot. Follow the steps in the device owner’s manual,
and observe all FRU handling procedures.
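If the SHOW DEVICE output from step 3 is captured to a file or a terminal log, the status check can be sketched as follows. The line layout is assumed from the sample output in this section (device name, node name in parentheses, status, error count); real output may differ, and the function name is hypothetical.

```python
import re

# One data line of the sample: "$1$DIA21 (RICYAA) Unavailable 5"
_LINE_RE = re.compile(
    r"^(?P<device>\S+)\s+\((?P<node>\w+)\)\s+(?P<status>\w+)\s+(?P<errors>\d+)"
)


def device_status(output, device):
    """Return the status word shown for `device`, or None if it is not listed."""
    for line in output.splitlines():
        match = _LINE_RE.match(line.strip())
        if match and match.group("device") == device:
            return match.group("status")
    return None


if __name__ == "__main__":
    sample = """\
Device              Device       Error
 Name               Status       Count
$1$DIA21  (RICYAA)  Unavailable      5
"""
    status = device_status(sample, "$1$DIA21")
    print("safe to power off" if status == "Unavailable" else f"status is {status}")
```

Header lines are skipped because they contain no parenthesized node name.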
6.6.3 ISE Replacement
When you replace an ISE in a system that is running, use the following steps to
restore the parameters from the ISE being replaced. When you install a new ISE
in a system that is running, use the steps described in Section 6.6.4.
Caution
You must use an ESD wrist strap, ground clip, and grounded ESD
workmat whenever you handle ISEs. Use the static protective service kit
(PN 29-262446).
Use great care when handling an ISE. Excessive shock can damage the
HDA.
1. Disable the MSCP server as described in Table 6–4.
Table 6–4 Disabling the MSCP
Disks                Action
RF-series            Press and hold the SU switch/button
SF72 or SF73 series  Set the MSCP enable switch
SF35                 Press the MSCP/Fault switch (LED is green when enabled)
2. Set the ISE power switch to on (1). Wait for the drive to start spinning (and,
on RF-series disks, the interlock solenoid to lock).
3. If you have an RF-series disk, release the server setup switch. If you have an
SF-series disk, continue with Step 4.
4. Verify that the device has been marked as available by entering the following
DCL command:
$ SHOW DEVICE devicename Return
5. Find the NODENAME parameter for the replacement ISE by entering SHOW
CLUSTER. (SHOW DEVICE will not work at this time.) In the sample
output below, R1QSAA is the replacement ISE.
$ SHOW CLUSTER Return
View of Cluster from system ID 63973 node CLOUDS
+-----------------------------+
|      SYSTEMS      | MEMBERS |
+-----------------------------+
|  NODE  | SOFTWARE | STATUS  |
+-----------------------------+
| CLOUDS | VMS V5.4 | MEMBER  |
| RICYAA | RFX V200¹|         |
| RIRRBA | RFX V200 |         |
| R1QSAA | RFX V200 |         |
+-----------------------------+
6. Determine whether the VMS DUP class driver is loaded by entering the
following DCL command:
$ SHOW DEVICE FYA0 Return
If the driver is not loaded, load it by entering the following:
$ MCR SYSGEN Return
SYSGEN> CONNECT FYA0/NOADAPTER Return
SYSGEN> EXIT Return
7. Enter SET HOST/DUP to establish a DUP connection with the ISE as follows:
$ SET HOST/DUP/SERVER=MSCP$DUP/TASK=PARAMS nodename
This invokes DUP on the ISE and runs the PARAMS utility.
8. Refer to the parameters listed in Table 6–3, and enter the SET command
to set appropriate values for the parameters. Be sure to record the new
parameters on the worksheet for the ISE.
¹ Firmware version number (footnote to the SOFTWARE column of the SHOW CLUSTER sample output in step 5)
For example:
$ SET HOST/DUP/SERVER=MSCP$DUP/TASK=PARAMS R1QSAA Return
%HSCPAD-I-LOCPROGEXE, Local program executing - type ^\ to exit
Copyright (C) 1993 Digital Equipment Corporation
PARAMS> SHOW NODENAME Return
Parameter  Current       Default        Type       Radix
---------- ------------- -------------- ---------- ---------
NODENAME   R1QSAA        RF31           String     Ascii
PARAMS> SET NODENAME RICYAA Return
PARAMS> SHOW SYSTEMID Return
Parameter  Current       Default        Type       Radix
---------- ------------- -------------- ---------- ---------
SYSTEMID   593200495860  0000000000000  Quadword   Hex       B
PARAMS> SET SYSTEMID 0404194100302 Return
PARAMS> SHOW ALLCLASS Return
Parameter  Current       Default        Type       Radix
---------- ------------- -------------- ---------- ---------
ALLCLASS   0             0              Byte       Dec       B
PARAMS> SET ALLCLASS 1 Return
PARAMS> SHOW FORCENAM Return
Parameter  Current       Default        Type       Radix
---------- ------------- -------------- ---------- ---------
FORCENAM   0             0              Boolean    0/1       B
PARAMS> SHOW UNITNUM Return
Parameter  Current       Default        Type       Radix
---------- ------------- -------------- ---------- ---------
UNITNUM    0             0              Word       Dec       U
PARAMS> SET UNITNUM 21 Return
PARAMS> SHOW FORCEUNI Return
Parameter  Current       Default        Type       Radix
---------- ------------- -------------- ---------- ---------
FORCEUNI   1             1              Boolean    0/1       U
PARAMS> SET FORCEUNI 0 Return
PARAMS> WRITE Return
Changes require controller initialization, ok? [Y/(N)] Y
Initializing...
HSCPAD-S-REMPGMEND, Remote program terminated - message number 3
%HSCPAD-S-END, Control returned to CLOUDS
$
9. Make the device available to the system by entering the following DCL
command:
$ SET DEVICE/AVAILABLE devicename Return
10. Mount the ISE in the system and restore the shadow sets.
11. On SF-series drives, enable the MSCP switch.
When initialization is complete, the replacement ISE and its parameters are
made available to the VMS operating system.
Note
The SHOW CLUSTER command continues to show the name of the ISE
replaced. This does not harm the system. After the next reboot, the
replacement ISE name appears.
Note also that the following message is displayed if another node is
already assigned the same SYSTEMID and NODENAME:
%PWA0-REMOTE SYSTEM CONFLICTS WITH KNOWN SYSTEM
In this case, shut down the new node and issue a unique SYSTEMID and
NODENAME for the new node.
6.6.4 Installing an ISE in a Running System
When you install a new ISE in a system that is running, perform the following
steps to initialize the new ISE parameters:
1. Disable the MSCP server as described in Table 6–5.
Table 6–5 Disabling the MSCP
Disks         Action
RF-series     Press and hold the SU switch/button
SF72 or SF73  Set the MSCP enable switch
SF35          Press the MSCP/Fault switch (LED is green when enabled)
2. Set the ISE power switch to on (1). Wait for the drive to start spinning (and,
on RF-series disks, the interlock solenoid to lock).
3. If you have an RF-series disk, release the server setup switch. If you have an
SF disk, continue with Step 4.
4. Refer to Table 6–3 and Section 6.6.1, and select values for the following
parameters:
• ALLCLASS
• FORCENAM
• FORCEUNI
• NODENAME
• UNITNUM
5. Determine whether the VMS DUP class driver is loaded by entering the
following DCL command:
$ SHOW DEVICE FYA0 Return
If the driver is not loaded, load it by entering the following:
$ MCR SYSGEN Return
SYSGEN> CONNECT FYA0/NOADAPTER Return
SYSGEN> EXIT Return
6. Enter SET HOST/DUP to establish a DUP connection with the ISE as follows:
$ SET HOST/DUP/SERVER=MSCP$DUP/TASK=PARAMS nodename
This invokes DUP on the ISE and runs the PARAMS utility.
7. Use SET to assign appropriate values for the parameters. Be sure to record
the new parameters on the worksheet for the ISE.
In the following sample output, the new ISE is configured to be device
$1$DIA22. The device is initialized with these parameters:
• ALLCLASS: 1
• FORCENAM: 0
• FORCEUNI: 0
• NODENAME: DISK22
• SYSTEMID: no change
• UNITNUM: 22
$ SET HOST/DUP/SERVER=MSCP$DUP/TASK=PARAMS R1QSAA Return
%HSCPAD-I-LOCPROGEXE, Local program executing - type ^\ to exit
Copyright (C) 1990 Digital Equipment Corporation
PARAMS> SHOW NODENAME Return
Parameter  Current       Default        Type       Radix
---------- ------------- -------------- ---------- ---------
NODENAME   R1QSAA        RF31           String     Ascii
PARAMS> SET NODENAME DISK22 Return
PARAMS> SHOW ALLCLASS Return
Parameter  Current       Default        Type       Radix
---------- ------------- -------------- ---------- ---------
ALLCLASS   0             0              Byte       Dec       B
PARAMS> SET ALLCLASS 1 Return
PARAMS> SHOW FORCENAM Return
Parameter  Current       Default        Type       Radix
---------- ------------- -------------- ---------- ---------
FORCENAM   0             0              Boolean    0/1       B
PARAMS> SHOW UNITNUM Return
Parameter  Current       Default        Type       Radix
---------- ------------- -------------- ---------- ---------
UNITNUM    0             0              Word       Dec       U
PARAMS> SET UNITNUM 22 Return
PARAMS> SHOW FORCEUNI Return
Parameter  Current       Default        Type       Radix
---------- ------------- -------------- ---------- ---------
FORCEUNI   1             1              Boolean    0/1       U
PARAMS> SET FORCEUNI 0 Return
PARAMS> WRITE Return
Changes require controller initialization, ok? [Y/(N)] Y
Initializing...
HSCPAD-S-REMPGMEND, Remote program terminated - message number 3
%HSCPAD-S-END, Control returned to CLOUDS
$
When initialization is complete, the new ISE and its parameters are made
available to the VMS operating system.
8. On SF-series drives, enable the MSCP switch.
Note
The SHOW CLUSTER command continues to show the name of the ISE
you replaced. This does not harm the system. After the next reboot, the
new ISE name appears.
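The PARAMS dialogue in step 7 follows a fixed pattern: compare each current value with the value chosen on the worksheet, issue SET for each one that differs, then issue WRITE. A minimal sketch of that planning step is shown below. The function name and dictionary layout are assumptions for illustration; only the SET and WRITE commands themselves come from Table 6–1.

```python
# Order in which the sample session above examines the parameters.
PARAMETER_ORDER = ("NODENAME", "ALLCLASS", "FORCENAM", "UNITNUM", "FORCEUNI")


def params_commands(current, desired):
    """Return the PARAMS commands needed to move `current` to `desired`.

    Parameters missing from `desired` (for example SYSTEMID on a new ISE)
    are left unchanged. WRITE is appended only if something was SET.
    """
    commands = []
    for name in PARAMETER_ORDER:
        if name in desired and str(desired[name]) != str(current.get(name)):
            commands.append(f"SET {name} {desired[name]}")
    if commands:
        commands.append("WRITE")
    return commands


if __name__ == "__main__":
    current = {"NODENAME": "R1QSAA", "ALLCLASS": 0, "FORCENAM": 0,
               "UNITNUM": 0, "FORCEUNI": 1}
    desired = {"NODENAME": "DISK22", "ALLCLASS": 1, "FORCENAM": 0,
               "UNITNUM": 22, "FORCEUNI": 0}
    for command in params_commands(current, desired):
        print("PARAMS>", command)
```

With the values from the sample session, the sketch produces the same four SET commands followed by WRITE; FORCENAM is skipped because it already holds the required value of 0.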
A Miscellaneous System Information
A.1 In This Appendix
This appendix includes:
• Processor Halt codes
• Console Halt codes
• Error register descriptions
• I/O physical address space
• System control block description
A.2 Processor Halt Codes
Table A–1 provides the processor Halt code definitions.
Table A–1 Processor Halt Code Definitions
Halt Code            Number  Definition
CPM$K_EXT_HALT       ?02     External halt
CPM$K_RESET          ?03     Reset
CPM$K_BAD_ISP        ?04     Interrupt stack not valid
CPM$K_DBL_ERR1       ?05     Machine check during execution
CPM$K_HALT           ?06     Halt instruction executed
CPM$K_SCB_ERR3       ?07     SCB vector bits [01:00] = 11
CPM$K_SCB_ERR2       ?08     SCB vector bits [01:00] = 10
CPM$K_CHM_FRM_ISTK   ?0A     CHMx executed while on interrupt stack
CPM$K_CHM_TO_ISTK    ?0B     CHMx to interrupt stack
CPM$K_SCB_READ_ERR   ?0C     SCB read error
CPM$K_MERR_V         ?10     ACV or TNV during machine check
CPM$K_KSP_V          ?11     ACV or TNV during KSP exception
CPM$K_DBL_ERR2       ?12     Machine check during machine check
CPM$K_DBL_ERR3       ?13     Machine check during KSP not valid
CPM$K_PSL_EXC5       ?19     PSL [26:24] = 101 during interrupt or exception
CPM$K_PSL_EXC6       ?1A     PSL [26:24] = 110 during interrupt or exception
CPM$K_PSL_EXC7       ?1B     PSL [26:24] = 111 during interrupt or exception
CPM$K_PSL_REI5       ?1D     PSL [26:24] = 101 during REI
CPM$K_PSL_REI6       ?1E     PSL [26:24] = 110 during REI
CPM$K_PSL_REI7       ?1F     PSL [26:24] = 111 during REI
The following example shows a processor Halt code output. Table A–2 defines the
Halt Reason fields.
>>>
?03 Reset (Reason = 0017)
PC= 01E00000 PSL= 041F0300
Table A–2 Processor Halt Reason Code Definitions
Reason Code (Hex)  Definition
0001               Duplex zones have diverged
0002               Fatal cross-link error has occurred
0003               Fatal zone error has occurred
0004               Fatal ATM error has occurred
0005               Fatal CPU module error has occurred
0006               Fatal memory error has occurred
0007               Single bit memory error has occurred
0008               User command issued to stop a zone
0009               Unexpected machine check has occurred
000A               Software detected failure has occurred
000B               Solid NXIO error has occurred
000C               Excessive transient NCIO errors have occurred
000D               A solid IO error has occurred
000E               Excessive transient IO errors have occurred
000F               Excessive VAXELN kernel recoverable errors have occurred
0010               A VAXELN master fatal error has occurred
0011               A VAXELN job fatal error has occurred
0012               Not enough SPTEs could be allocated to boot OpenVMS
0013               Unexpected system error occurred¹
0014               Interface module failure has occurred
0015               Unexpected VAXELN error occurred
0016               A VAXELN kernel fatal error has occurred
0017               Initializing VAXELN before starting reconfiguration
¹ Reset reason 0013 indicates that an unexpected system error occurred. The contents of the SYSFLT, SYSADR, and DMAADR registers will be saved in the CCA area. See Figure A–4 for the CCA offsets of these registers. Use the register bitmaps and description in Section A.4 to determine the cause of the error.
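When halt messages are collected from a console log, the halt code and reason code can be pulled out and matched against the tables above. The following sketch assumes the message layout of the example in this section ("?03 Reset (Reason = 0017)"); the function name is hypothetical, and only a few Table A–2 entries are included to keep the example short.

```python
import re

_HALT_RE = re.compile(
    r"\?(?P<code>[0-9A-F]{2})\s+(?P<text>.+?)\s+\(Reason = (?P<reason>[0-9A-F]{4})\)"
)

# Excerpt of Table A-2; extend with the remaining rows as needed.
REASONS = {
    0x0001: "Duplex zones have diverged",
    0x0008: "User command issued to stop a zone",
    0x0009: "Unexpected machine check has occurred",
    0x0013: "Unexpected system error occurred",
    0x0017: "Initializing VAXELN before starting reconfiguration",
}


def parse_halt_line(line):
    """Return (halt_code, halt_text, reason_code, reason_text) or None."""
    match = _HALT_RE.search(line)
    if match is None:
        return None
    reason = int(match.group("reason"), 16)
    return (
        match.group("code"),
        match.group("text"),
        reason,
        REASONS.get(reason, "not listed in this excerpt"),
    )


if __name__ == "__main__":
    print(parse_halt_line("?03 Reset (Reason = 0017)"))
```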
A.3 Console Halt Codes
The following example shows a console Halt code output. Table A–3 defines the
Halt Reason fields.
>>>
?03 Reset (Reason = 0013)
PC= 01E00000 PSL= 041F0300
Table A–3 Console Halt Reason Code Definitions
Reason Code (Hex)  Definition
0000               Power-up reset
0001               Duplex zones have diverged
0002               Fatal cross-link error has occurred
0003               Fatal zone error has occurred
0004               Fatal ATM error has occurred
0005               Fatal CPU module error has occurred
0006               Fatal memory error has occurred
0007               Single bit memory error has occurred
0008               User command issued to stop a zone
0009               Unexpected machine check has occurred
000A               Software detected failure has occurred
000B               Solid NXIO error has occurred
000C               Excessive transient NCIO errors have occurred
000D               A solid IO error has occurred
000E               Excessive transient IO errors have occurred
000F               Excessive VAXELN kernel recoverable errors have occurred
0010               A VAXELN master fatal error has occurred
0011               A VAXELN job fatal error has occurred
0012               Not enough SPTEs could be allocated to boot OpenVMS
0013               Unexpected system error occurred¹
0014               Interface module failure has occurred
0015               Unexpected VAXELN error occurred
0016               A VAXELN kernel fatal error has occurred
0017               Initializing VAXELN before starting reconfiguration
¹ Reset reason 0013 indicates that an unexpected system error occurred. The contents of the SYSFLT, SYSADR, and DMAADR registers will be saved in the CCA area. See Figure A–4 for the CCA offsets of these registers. Use the register bitmaps and description in Section A.4 to determine the cause of the error.
A.4 Error Register Descriptions
A.4.1 System Fault (SYSFLT) Register
This register is not rail or zone unique (Figure A–1). Software does not take
special precautions when reading this register. In addition, the register is
continuously updated. The setting of one error bit does not prevent other bits
from being set. The register contains bits which cause IPL29 interrupts.
All bits in this register have the following characteristics: default = 0, type = ro,
reset = hr.
Figure A–1 System Fault Register
Bit    31   30:28  27:26       25   24   23   22   21   20   19   18   17   16
Field  SFB  XLM    (not used)  LCK  RSA  CBG  PWG  CPB  CPA  HTB  HTA  MFB  MFA
Bit    15   14   13   12   11   10   09   08   07   06   05   04   03   02   01   00
Field  MDB  MDA  MSB  MSA  JDB  JDA  JSB  JSA  NXB  NXA  IOB  IOA  DNB  DNA  DMB  DMA
Register Address: CPU = E110 1100 (CCA offset = 15C)
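A SYSFLT value read from the CCA can be decoded field by field. The sketch below follows the bit layout of Figure A–1 and the mode coding of Table A–4. The positions of bits 14 through 00 are taken from the order of the labels in the figure, which is an assumption to verify against the bit descriptions before relying on it; the function name is hypothetical.

```python
# Single-bit fields of SYSFLT, keyed by bit position (per Figure A-1).
SINGLE_BITS = {
    31: "SFB", 25: "LCK", 24: "RSA", 23: "CBG", 22: "PWG",
    21: "CPB", 20: "CPA", 19: "HTB", 18: "HTA", 17: "MFB", 16: "MFA",
    15: "MDB", 14: "MDA", 13: "MSB", 12: "MSA", 11: "JDB", 10: "JDA",
    9: "JSB", 8: "JSA", 7: "NXB", 6: "NXA", 5: "IOB", 4: "IOA",
    3: "DNB", 2: "DNA", 1: "DMB", 0: "DMA",
}

# XLM field, bits [30:28] (per Table A-4).
XLINK_MODES = {
    0b000: "Xlink Off", 0b001: "Xlink Slave", 0b010: "Xlink Master",
    0b011: "Xlink Duplex", 0b100: "Not Used", 0b101: "Resync Slave",
    0b110: "Resync Master", 0b111: "Not Used",
}


def decode_sysflt(value):
    """Return (xlink_mode, [names of set error bits, high bit first])."""
    mode = XLINK_MODES[(value >> 28) & 0x7]
    flags = [
        name
        for bit, name in sorted(SINGLE_BITS.items(), reverse=True)
        if value & (1 << bit)
    ]
    return mode, flags


if __name__ == "__main__":
    mode, flags = decode_sysflt(0x30C00001)
    print(mode, flags)
```

Bits 27:26 are not used and are ignored by the sketch.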
[31]: SFB - Solid Fault Bit. Latched when an automatic retry on an I/O
operation fails to complete properly.
[30:28]: XLM - Xlink Mode [2:0]. This field, sourced by the Xlink, is read-only
and indicates the Xlink mode specified in Table A–4.
Table A–4 Xlink Mode Coding
Code  Mode
000   Xlink Off
001   Xlink Slave
010   Xlink Master
011   Xlink Duplex
100   Not Used
101   Resync Slave
110   Resync Master
111   Not Used
[27:26]: - Not used.
[25]: LCK - Lock. Latched when an error occurs during an interlock I/O access.
(Interlock access refers to the special I/O access mode.)
[24]: RSA - Resync Abort. Latched when an error occurs during resync mode.
Resync mode is automatically canceled.
[23]: CBG - Cable Gone. Latched when a cable gone signal is detected. CBG set
will force the Xlink to the off mode.
[22]: PWG - Power Gone. Set when the other zone power gone signal is detected.
PWG set will force the Xlink to the off mode.
[21]: CPB - Clock Phase Error (Zone B). Latches a high level assertion on the
Clock Phase Error line coming from the Xlink. The high level will remain until a
1 is written to the bit. If the Clock Phase Error signal line is still high after the
write 1 to clear, the bit is again set to 1.
[20]: CPA - Clock Phase Error (Zone A). Latches a high level assertion on the
Clock Phase Error line coming from the Xlink. The high level will remain until a
1 is written to the bit. If the Clock Phase Error signal line is still high after the
write 1 to clear, the bit is again set to 1.
[19]: HTB - Halt Error (Zone B). Latches a high level assertion on the Halt
Request line coming from the Xlink. The high level will remain until a 1 is
written to the bit. If the Halt Error signal line is still high after the write 1 to
clear, the bit is again set to 1.
[18]: HTA - Halt Error (Zone A). Latches a high level assertion on the Halt
Request line coming from the Xlink. The high level will remain until a 1 is
written to the bit. If the Halt Error signal line is still high after the write 1 to
clear, the bit is again set to 1.
[17]: MFB - CPMF (Zone B). Set when the error logic determines that a CPMF
is required.
[16]: MFA - CPMF (Zone A). Set when the error logic determines that a CPMF
is required.
[15]: MDB - Memory Double-Bit Error (Zone B). Set when a double-bit ECC
error or single-bit ECC error is detected during memory writes on the internal
Jet Bus ECC checker. This causes a CPMF.
[14]: MDA - Memory Double-Bit Error (Zone A). Set when a double-bit ECC error
or single-bit ECC error is detected during memory writes on the internal Jet Bus
ECC checker. This causes a CPMF.
[13]: MSB - Memory Single-Bit Error (Zone B). Set when a single-bit ECC error
is detected in memory during a read and the JXD was not the requester of the
data. The bit is set regardless of the state of the Error Enable bit. The error
is automatically corrected at the CPU. An IPL26 interrupt is generated causing
a two-zone system to diverge. Hardware generates an IPL29 interrupt to both
zones within three clock cycles.
[12]: MSA - Memory Single-Bit Error (Zone A). Set when a single-bit ECC error
is detected in memory during a read and the JXD was not the requester of the
data. The bit is set regardless of the state of the Error Enable bit. The error
is automatically corrected at the CPU. An IPL26 interrupt is generated causing
a two-zone system to diverge. Hardware generates an IPL29 interrupt to both
zones within three clock cycles.
[11]: JDB - JXD Double-Bit Error (Zone B). Set when a double-bit ECC error is
detected on the internal Jet Bus ECC checker.
[10]: JDA - JXD Double-Bit Error (Zone A). Set when a double-bit ECC error is
detected on the internal Jet Bus ECC checker.
[09]: JSB - JXD Single-Bit Error (Zone B). Set when a single-bit ECC error is
detected on the internal Jet Bus ECC checker and is detected in memory. The
check operation is triggered during Jet Bus transactions. The bit is set regardless
of the state of the Error Enable bit. The error is automatically corrected on JXD
reads from memory. Detection of this error causes the current DMA address
to be latched. The DMA operation is allowed to complete. When finished, the
DMA driver will check this bit, and if set will force a mini resync by reading the
location pointed to by the DMA Error Address register.
[08]: JSA - JXD Single-Bit Error (Zone A). Set when a single-bit ECC error
is detected on the internal Jet Bus ECC checker and is detected in memory.
The check operation is only triggered during Jet Bus transactions. The bit is
set regardless of the state of the Error Enable bit. The error is automatically
corrected on JXD reads from memory. Detection of this error causes the current
DMA address to be latched. The DMA operation is allowed to complete. When
finished, the DMA driver will check this bit, and if set will force a mini resync by
reading the location pointed to by the DMA Error Address register.
[07]: NXB - Nonexistent I/O (Zone B). Set after any bus timeout. If the retry
passes, the Solid Fault bit will not be set.
[06]: NXA - Nonexistent I/O (Zone A). Set after any bus timeout. If the retry
passes, the Solid Fault bit will not be set.
[05]: IOB - I/O Error (Zone B). Set by errors that occur from nonfatal or
recoverable CPU initiated transactions. Errors resulting from CPU to I/O
transactions are retried.
[04]: IOA - I/O Error (Zone A). Set by errors that occur from nonfatal or
recoverable CPU initiated transactions. Errors resulting from CPU to I/O
transactions are retried.
[03]: DNB - DMA NXIO (Zone B). Set when a bus timeout occurs and the
CROME bus is performing a DMA operation.
[02]: DNA - DMA NXIO (Zone A). Set when a bus timeout occurs and the
CROME bus is performing a DMA operation.
[01]: DMB - DMA Error (Zone B). Set by DMA errors. If the bit is set, the DMA
is aborted. A DMA error may generate a CPMF.
[00]: DMA - DMA Error (Zone A). Set by DMA errors. If the bit is set, the DMA
is aborted. A DMA error may generate a CPMF.
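The bit assignments above can be applied mechanically to a saved SYSFLT value. The following Python sketch is an illustration only, not part of the system firmware or any Digital-supplied tool. The names `SYSFLT_BITS`, `XLINK_MODES`, and `decode_sysflt` are hypothetical; the bit positions and mode codes are copied from the field list and Table A–4.

```python
# Single-bit fields of the SYSFLT register, keyed by bit position.
# Bits [30:28] (XLM) are decoded separately; bits [27:26] are not used.
SYSFLT_BITS = {
    31: "SFB", 25: "LCK", 24: "RSA", 23: "CBG", 22: "PWG",
    21: "CPB", 20: "CPA", 19: "HTB", 18: "HTA", 17: "MFB", 16: "MFA",
    15: "MDB", 14: "MDA", 13: "MSB", 12: "MSA", 11: "JDB", 10: "JDA",
    9: "JSB", 8: "JSA", 7: "NXB", 6: "NXA", 5: "IOB", 4: "IOA",
    3: "DNB", 2: "DNA", 1: "DMB", 0: "DMA",
}

# Xlink mode coding from Table A-4.
XLINK_MODES = {
    0b000: "Xlink Off", 0b001: "Xlink Slave", 0b010: "Xlink Master",
    0b011: "Xlink Duplex", 0b100: "Not Used", 0b101: "Resync Slave",
    0b110: "Resync Master", 0b111: "Not Used",
}


def decode_sysflt(value):
    """Return (xlink_mode, mnemonics of set bits, highest bit first)."""
    value &= 0xFFFFFFFF
    mode = XLINK_MODES[(value >> 28) & 0b111]
    flags = [name
             for bit, name in sorted(SYSFLT_BITS.items(), reverse=True)
             if value & (1 << bit)]
    return mode, flags


if __name__ == "__main__":
    # 300000C0 is the saved SYSFLT value used in Section A.4.4.
    print(decode_sysflt(0x300000C0))
```

For the value 300000C0 examined in Section A.4.4, the sketch reports Xlink Duplex mode with NXB and NXA set, which agrees with the annotations in that example.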
A.4.2 System Error Address (SYSADR) Register
This register latches when any error is detected at the JXD Jet Bus and below
(Figure A–2). It contains the address the CPU was accessing at the time the
error occurred. The register is read only and cleared by clearing errors.
All bits in this register have the following characteristics: default = 0, type = ro,
reset = hr.
Figure A–2 JXD System Error Address Register
Bits    31:30   29:00
Field   DL      ADR
Register Address: CPU = E110 1030 (CCA_BASE+160)
[31:30]: DL - Data length:
00 - Hexword
01 - Longword
10 - Quadword
11 - Octaword
[29:00]: ADR - 30-bit error address latched on CPU operations to the JXD.
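As a hedged illustration of the layout above, the following Python sketch splits a saved register value into its DL and ADR fields. The function and table names are hypothetical. The DMAADR register described in Section A.4.3 uses the same two-field layout, so the same split should apply to it.

```python
# Data length coding for bits [31:30], as listed in the field description.
DATA_LENGTHS = {
    0b00: "Hexword",
    0b01: "Longword",
    0b10: "Quadword",
    0b11: "Octaword",
}


def split_error_address(value):
    """Split a 32-bit SYSADR (or DMAADR) value into (data length, 30-bit address)."""
    value &= 0xFFFFFFFF
    data_length = DATA_LENGTHS[value >> 30]   # bits [31:30]
    address = value & 0x3FFFFFFF              # bits [29:00]
    return data_length, address


if __name__ == "__main__":
    length, address = split_error_address(0x799F0000)
    print(length, "%08X" % address)
```

With the SYSADR value 799F0000 from Section A.4.4, the data length is Longword and the 30-bit address is 399F0000.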
A.4.3 DMA Error Address (DMAADR) Register
When a single-bit ECC error is detected at the JXD, the current DMA subtransfer address into main memory is latched in this register and an IPL29
interrupt is generated. Software allows the DMA to complete and later uses this
information to fix the bad location in memory (Figure A–3).
All bits in this register have the following characteristics: default = 0, type = ro,
reset = hr.
Figure A–3 JXD DMA Error Address Register
Bits    31:30   29:00
Field   DL      DEA
Register Address: CPU = E110 1040 (CCA_BASE+180)
[31:30]: DL - DMA data length:
00 - Hexword
01 - Longword
10 - Quadword
11 - Octaword
[29:00]: DEA - DMA 30-bit address latched during error.
A.4.4 Reset Reason 0013 Fault Analysis
The following example shows the content of the SYSFLT and SYSADR registers
after a Reset Halt. The following paragraph analyzes the register content and
identifies the faulty FRU.
?03 Reset (Reason = 0013)
PC= 01E00000 PSL= 041F0300
>>> E/P 1E9AD5C              ! examine saved SYSFLT register contents
                             ! from CCA_BASE+15C
P 01E9AD5C 300000C0          ! NXIO, Zone A (bus timeout)
                             ! NXIO, Zone B (bus timeout)
                             ! XLINK MODE = Duplex
>>> E/P 1E9AD60              ! examine saved SYSADR register contents
                             ! from CCA_BASE+160
P 01E9AD60 799F0000          ! Zone B, slot 17 P-card address

CCA Base Address
MEMORY SIZE    CCA_BASE
-----------    --------
32-Mbyte       1E9AC00
64-Mbyte       3E9AC00
96-Mbyte       5E9AC00
128-Mbyte      7E9AC00
160-Mbyte      9E9AC00
192-Mbyte      BE9AC00
224-Mbyte      DE9AC00
256-Mbyte      FE9AC00
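The addresses examined in the example are the CCA base for the installed memory size plus the register's CCA offset. The following Python sketch shows that arithmetic. It is an illustration under stated assumptions: the helper names are hypothetical, the base values are copied from the table above, and the offsets are the ones given with each register description in Section A.4 (SYSFLT 15C, SYSADR 160, DMAADR 180).

```python
# CCA base address by installed memory size (Mbytes), from the table above.
CCA_BASE = {
    32: 0x1E9AC00, 64: 0x3E9AC00, 96: 0x5E9AC00, 128: 0x7E9AC00,
    160: 0x9E9AC00, 192: 0xBE9AC00, 224: 0xDE9AC00, 256: 0xFE9AC00,
}

# CCA offsets of the saved error registers, from Section A.4.
CCA_OFFSET = {"SYSFLT": 0x15C, "SYSADR": 0x160, "DMAADR": 0x180}


def saved_register_address(memory_mbytes, register):
    """Physical address at which the saved register contents are stored."""
    return CCA_BASE[memory_mbytes] + CCA_OFFSET[register]


def examine_command(memory_mbytes, register):
    """Console command text that examines the saved register."""
    return "E/P %X" % saved_register_address(memory_mbytes, register)


if __name__ == "__main__":
    for name in ("SYSFLT", "SYSADR", "DMAADR"):
        print(name, examine_command(32, name))
```

For a 32-Mbyte system this yields E/P 1E9AD5C for SYSFLT and E/P 1E9AD60 for SYSADR, the two commands used in the example. In the tabulated values the base always sits 165400 (hex) bytes below the top of memory; that is an observation about the table, not a documented rule.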
The SYSFLT register indicates a NXIO (nonexistent I/O) error in both zones. Bits
[29:00] of the SYSADR register (799F0000) contain the 30-bit address 399F0000.
Sign-extended to 32 bits, this address becomes F99F0000.
Figure A–4 shows that F99F0000 is the address of an interface module in Zone
B, slot 17. The module failed to respond to its address causing a bus timeout.
Replace the module.
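The sign extension and slot lookup described above can be sketched in Python as follows. This is an illustration, not a Digital-supplied tool. The function names are hypothetical, and the decoding assumes the P-card address pattern shown in Figure A–4 (F198 0000 through F19F 0000 for Zone A, F998 0000 through F99F 0000 for Zone B), with each P-card taken to occupy the range up to the next listed address.

```python
def sign_extend_30(address):
    """Extend a 30-bit error address to 32 bits by copying bit 29 upward."""
    address &= 0x3FFFFFFF
    if address & (1 << 29):
        address |= 0xC0000000
    return address


# Second hex digit (M in Figure A-4) of the ATM slot 1 address ranges.
ZONE_FOR_M = {0x1: "A", 0x9: "B"}


def pcard_location(address):
    """Return (zone, slot) for an ATM P-card address, or None if no match."""
    if (address >> 28) & 0xF != 0xF:
        return None                      # not in the F000 0000 I/O range
    zone = ZONE_FOR_M.get((address >> 24) & 0xF)
    if zone is None or (address >> 20) & 0xF != 0x9:
        return None
    p = (address >> 16) & 0xF            # *P in Figure A-4
    if p < 0x8:
        return None
    return zone, 10 + (p - 0x8)          # P=8 is slot 10, P=F is slot 17


if __name__ == "__main__":
    full = sign_extend_30(0x399F0000)
    print("%08X" % full, pcard_location(full))
```

Applied to the example, 399F0000 extends to F99F0000, which the lookup places in Zone B, slot 17.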
A.5 I/O Physical Address Space
Figure A–4 shows the I/O physical address space.
Figure A–4 I/O Physical Address Space

Address Range              Contents
0000 0000 - 1FFF FFFF      Main Memory (512 Mbytes, 30-bit) (current VMS addressable limit)
2000 0000 - 3FFF FFFF      Main Memory (512 Mbytes, 32-bit) (supported by a later VMS release)
4000 0000                  Unsupported Memory (1 Gbyte)
8000 0000                  B Cache Tags (1 Gbyte)
C000 0000                  Unsupported Memory (512 Mbytes)
E000 0000 - FFFF FFFF      I/O Space (512 Mbytes)

I/O Space Detail
E000 0000 - EFFF FFFF      CPU Private Space
  E110 1030                SYSADR Register (CCA offset = 160)
  E110 1040                DMAADR Register (CCA offset = 180)
  E110 1100                SYSFLT Register (CCA offset = 15C)
F000 0000                  Reserved for Zone A (M=0)
F100 0000                  Zone A I/O ATM, Slot 1 (M=1)
  F198 0000                Zone A ATM Pcard, Slot 10 (*P=8)
  F199 0000                Zone A ATM Pcard, Slot 11 (*P=9)
  F19A 0000                Zone A ATM Pcard, Slot 12 (*P=A)
  F19B 0000                Zone A ATM Pcard, Slot 13 (*P=B)
  F19C 0000                Zone A ATM Pcard, Slot 14 (*P=C)
  F19D 0000                Zone A ATM Pcard, Slot 15 (*P=D)
  F19E 0000                Zone A ATM Pcard, Slot 16 (*P=E)
  F19F 0000                Zone A ATM Pcard, Slot 17 (*P=F)
  F1A0 0000 - F1AF FFFF    Zone A I/O ATM Firewall Space
  F1B0 0000 - F1FF FFFF    Zone A I/O RAM/Flash ROM
F200 0000 - F2FF FFFF      Reserved for Zone A future I/O, Slot 2 (M=2)
FM00 0000 - FMAF FFFF      Unsupported Zone A I/O (M=3 - 7)
F800 0000                  Reserved for Zone B (M=8)
F900 0000                  Zone B I/O ATM, Slot 1 (M=9)
  F998 0000                Zone B ATM Pcard, Slot 10 (*P=8)
  F999 0000                Zone B ATM Pcard, Slot 11 (*P=9)
  F99A 0000                Zone B ATM Pcard, Slot 12 (*P=A)
  F99B 0000                Zone B ATM Pcard, Slot 13 (*P=B)
  F99C 0000                Zone B ATM Pcard, Slot 14 (*P=C)
  F99D 0000                Zone B ATM Pcard, Slot 15 (*P=D)
  F99E 0000                Zone B ATM Pcard, Slot 16 (*P=E)
  F99F 0000                Zone B ATM Pcard, Slot 17 (*P=F)
  F9A0 0000 - F9AF FFFF    Zone B I/O ATM Firewall Space
  F9B0 0000 - F9FF FFFF    Zone B I/O RAM/Flash ROM
FA00 0000 - FAFF FFFF      Reserved for Zone B future I/O, Slot 2 (M=A)
FM00 0000 - FMFF FFFF      Unsupported Zone B I/O (M=B - F)
A.6 System Control Block Description
The System Control Block (SCB) contains vectors for servicing interrupts and
exceptions. The SCB address should be aligned on a page boundary. The
SCB address is contained in the System Control Block Base register (SCBB)
(Figure A–5). Microcode forces a longword-aligned SCBB by clearing bits [01:00]
of the new value before loading the register.
Figure A–5 System Control Block Base Register
Field order, high bit to low bit: 0 0 (bits [31:30]), Physical Page Address of SCB, SBZ (low-order bits)
An SCB vector is an aligned longword in the SCB through which the CPU
microcode dispatches interrupts and exceptions. Each SCB vector has the format
shown in Figure A–6.
Figure A–6 System Control Block Vector Format
Bits    31:02                                  01:00
Field   Longword Address of Service Routine    Code
[31:02]: Longword Address - Virtual address of the service routine for the
interrupt or exception. The routine must be longword aligned since the microcode
forces the two low-order bits to 0.
[01:00]: Code - The code field is defined in Table A–5.
Table A–5 Code Field Definition
Code    Definition
00      The event is to be serviced on the kernel stack unless the CPU is already on the interrupt stack, in which case the event is serviced on the interrupt stack.
01      The event is to be serviced on the interrupt stack. If the event is an exception, the IPL is raised to 1F (hex).
10      Unimplemented, results in a console error halt.
11      Unimplemented, results in a console error halt.
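The vector format and code field above can be illustrated with a short Python sketch. It is not microcode or a supplied utility. The names `decode_scb_vector` and `SERVICE_FOR_CODE` are hypothetical, and the descriptions are paraphrased from Table A–5.

```python
# Where the event is serviced, by code field value (bits [01:00]).
SERVICE_FOR_CODE = {
    0b00: "kernel stack, or interrupt stack if already on it",
    0b01: "interrupt stack",
    0b10: "console error halt",
    0b11: "console error halt",
}


def decode_scb_vector(vector):
    """Split an SCB vector into (service routine address, code, action)."""
    vector &= 0xFFFFFFFF
    code = vector & 0b11                  # bits [01:00]
    routine = vector & 0xFFFFFFFC         # bits [31:02], low two bits forced to 0
    return routine, code, SERVICE_FOR_CODE[code]


if __name__ == "__main__":
    routine, code, action = decode_scb_vector(0x80001235)
    print("%08X" % routine, code, action)
```

For the made-up vector value 80001235, the service routine address is 80001234 and the code is 01, so the event would be serviced on the interrupt stack.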
The SCB content is specified in Table A–6.
Table A–6 SCB Layout
Vector       Name                                              Type         Parameter  Notes
00           Unused                                            -            -          -
04           Unused                                            -            -          -
08           Machine check                                     Abort        6          Parameters reflect machine state; must be serviced on the interrupt stack
0C           Unused                                            -            -          -
10           Reserved privileged instruction                   Fault        0          -
14           Customer reserved instruction                     Fault        0          XFC instruction
18           Reserved operand                                  Fault/abort  0          Not always recoverable
1C           Reserved addressing mode                          Fault        0          -
20           Access control violation/vector alignment fault   Fault        2          Parameters are virtual address and status code
24           Translation not valid                             Fault        2          Parameters are virtual address and status code
28           Trace pending                                     Fault        0          -
2C           Breakpoint instruction                            Fault        0          -
30           Unused                                            -            -          Compatibility mode in other VAX systems
34           Arithmetic trap                                   Fault        1          Parameter is type code
38 to 3C     Unused                                            -            -          -
40           CHMK                                              Trap         1          Parameter is sign-extended operand word
44           CHME                                              Trap         1          Parameter is sign-extended operand word
48           CHMS                                              Trap         1          Parameter is sign-extended operand word
4C           CHMU                                              Trap         1          Parameter is sign-extended operand word
50           Unused                                            -            -          -
54           Soft error notification                           Interrupt    0          IPL is 1A (hex)
58 to 5C     Unused                                            -            -          -
60           Hard error notification                           Interrupt    0          IPL is 1D (hex)
64           Unused                                            -            -          -
68           Vector unit disabled                              Fault        0          Vector instructions
6C to 80     Unused                                            -            -          -
84           Software level 1                                  Interrupt    0          -
88           Software level 2                                  Interrupt    0          Ordinarily used for AST delivery
8C           Software level 3                                  Interrupt    0          Ordinarily used for process scheduling
90 to BC     Software levels 4 to 15                           Interrupt    0          -
C0           Interval timer                                    Interrupt    0          IPL is 16 (hex)
C4           Unused                                            -            -          -
C8           Emulation start                                   Fault        10         Same mode exception, FPD=0; parameters are opcode, PC, specifiers
CC           Emulation continue                                Fault        0          Same mode exception, FPD=1; parameters are opcode, PC, specifiers
D0           Device vector                                     Interrupt    0          IPL is 14 (hex)
D4           Device vector                                     Interrupt    0          IPL is 15 (hex), includes console interrupts
D8           Device vector                                     Interrupt    0          IPL is 16 (hex), includes interprocessor interrupts
DC           Device vector                                     Interrupt    0          IPL is 17 (hex)
E0 to F4     Unused                                            -            -          -
F8 to FC     Unused                                            -            -          -
100 to FFCC  Unused                                            -            -          -
B
ISE Parameter Worksheets
B.1 In This Appendix
This appendix includes:
• Individual ISE parameter worksheets
• ISE zone parameter worksheets
B.2 Individual ISE Parameter Worksheets
Use the following worksheets to record parameters for each ISE.
Serial Number:
NODENAME:
SYSTEMID:
ALLCLASS:
UNITNUM:
FORCEUNI:
FORCENUM:
Serial Number:
NODENAME:
SYSTEMID:
ALLCLASS:
UNITNUM:
FORCEUNI:
FORCENUM:
MR−0052−93RAGS
Serial Number:
NODENAME:
SYSTEMID:
ALLCLASS:
UNITNUM:
FORCEUNI:
FORCENUM:
Serial Number:
NODENAME:
SYSTEMID:
ALLCLASS:
UNITNUM:
FORCEUNI:
FORCENUM:
Serial Number:
NODENAME:
SYSTEMID:
ALLCLASS:
UNITNUM:
FORCEUNI:
FORCENUM:
MR−0053−93RAGS
B.3 ISE Zone Parameter Worksheets
Use the following worksheets to record parameters for each ISE.
Serial No:
Serial No:
NODENAME:
NODENAME:
UNITNUM:
UNITNUM:
Serial No:
Serial No:
NODENAME:
NODENAME:
UNITNUM:
UNITNUM:
Serial No:
Serial No:
NODENAME:
NODENAME:
UNITNUM:
UNITNUM:
Serial No:
Serial No:
NODENAME:
NODENAME:
UNITNUM:
UNITNUM:
Serial No:
Serial No:
NODENAME:
NODENAME:
UNITNUM:
UNITNUM:
Serial No:
Serial No:
NODENAME:
NODENAME:
UNITNUM:
UNITNUM:
MR−0054−93RAGS
Serial No:
Serial No:
NODENAME:
NODENAME:
UNITNUM:
UNITNUM:
Serial No:
Serial No:
NODENAME:
NODENAME:
UNITNUM:
UNITNUM:
Serial No:
Serial No:
NODENAME:
NODENAME:
UNITNUM:
UNITNUM:
Serial No:
Serial No:
NODENAME:
NODENAME:
UNITNUM:
UNITNUM:
Serial No:
Serial No:
NODENAME:
NODENAME:
UNITNUM:
UNITNUM:
Serial No:
Serial No:
NODENAME:
NODENAME:
UNITNUM:
UNITNUM:
MR−0054−93RAGS
Index
A
Application of thresholds, 4–17
ATM module
removal and replacement, 5–7
ATM module deconfiguration actions, 4–13
B
Before you begin, 5–3
Boot parameter block
data structures, 4–60
Bootstrap procedures, 2–7, 2–8
C
CAMP module
removal and replacement, 5–24
CCA fields
firmware interfaces, 4–53
CIO mode console commands
BOOT, 2–7
CIO mode, entering, 2–8
Console
command language syntax, 2–6
control characters, 2–5
description, 2–1, 2–3
entering console mode, 2–4
exiting console mode, 2–4
operating modes, 2–3, 2–4
operations, 2–1
Console commands, 2–22
BOOT, 2–9
CLEAR, 2–10
! (comment), 2–22
CONTINUE, 2–11
DUP, 2–13
EXAMINE, 2–13
FIND, 2–15
HELP, 2–15
INITIALIZE, 2–16
MATCH_ZONES, 2–16
MOVE, 2–16
REPEAT, 2–17
SET, 2–17
SET BOOT DEFAULT, 2–18
SHOW, 2–18
START, 2–19
TEST, 2–20, 3–30
X, 2–21
Z, 2–22, 3–31
Console communications area data structures,
4–55
Console extender module
removal and replacement, 5–20
Controls and indicators
disk drawer, 3–19
CPU and expansion cabinets
system component descriptions, 1–1
CPU and memory deconfiguration actions, 4–14
CPU module
removal and replacement, 5–7
CPU module subDCB
data structures, 4–64
CPU or zone unsynchable error log entry, 4–72
CPU ROM-based diagnostics
system diagnostics, 3–31
CPU/MEM fault end action error log entry, 4–69
CPU/MEM fault error log entry, 4–66
Cross-link assembly
removal and replacement, 5–18
Cross-link cable deconfiguration actions, 4–16
D
Deconfiguration information block, 4–24
Deconfiguration messages, 4–49
Device configuration block
data structures, 4–61
Device fault indicators, 3–19
Device status indicators, 3–19
DIM
removal and replacement, 5–26
Disk drawer
controls and indicators, 3–19
Disk drives
RF35 disk drawer, 3–19
SF35-BK/HK/JK, 3–21
SF73-HK/JK, 3–24
Dispatch block description
data structures, 4–59
Documentation road map, iii
DSSI cable
removal and replacement, 5–29
DSSI disk drawer
removal and replacement, 5–14
DSSI extender module
removal and replacement, 5–22
DSSI interface module
removal and replacement, 5–26
DUP, 6–1
PARAMS utility, 6–1
SET HOST, 6–1
Duplex compatibility test, 4–57
E
EHS, 4–1
EHS structure, 4–3
EIM
removal and replacement, 5–28
Eject button
unload function, 3–28
End action timeouts, 4–29
End actions, 4–28
Error event messages, 4–40
Error handling services (EHS), 4–1
Error isolation and handling, 4–2
Error log analysis, 4–66
Error register descriptions, A–4
DMA error address register, A–7
system error address register, A–7
system fault register, A–4
Error types, 4–5
ESD procedures, 5–4
Ethernet interface module
removal and replacement, 5–28
Event reporting interface routines, 4–40
F
Fan
removal and replacement, 5–10
Fault data, 4–27
Fault summary, 4–20
FCSB
removal and replacement, 5–10
FEU
removal and replacement, 5–16
Firmware and OpenVMS interface data structures,
4–54
Firmware interfaces, 4–50
FRU deconfiguration, 4–13
FRU handling, 5–4
FRU information, 4–22
FRU isolation, 4–12
FRU list, 5–1
FRUs, 4–12
access, 5–5
FTSS event reporting interface, 4–40
G
General troubleshooting procedure
system maintenance, 3–4
H
Halt codes
console halt codes, A–3
processor halt codes, A–1
I
I/O expansion module console and diagnostics
firmware interfaces, 4–53
I/O expansion module deconfiguration actions,
4–14
I/O physical address space, A–8
I/O ROM-based diagnostics
system diagnostics, 3–34
Interface module deconfiguration actions, 4–15
ISE, 6–1
finding parameter values, 6–5
individual parameter worksheet, B–1
installing new, 6–11
parameters, 6–4
replacing, 6–8
setting, 6–5
removal, 6–7
system parameter worksheet, B–3
L
Load/Unload button
reset function, 3–29
M
Maintenance strategy
system maintenance, 3–1
MMB
removal and replacement, 5–9
Module fault LEDs
system maintenance, 3–6
Module NVRAM status and LED indicators, 4–38
O
OpenVMS error log, 4–19
Operating rules and cautions
system maintenance, 3–2
P
Page frame number bitmap
data structures, 4–65
POST, 3–27
Power distribution box
removal and replacement, 5–42
Power distribution boxes
system component descriptions, 1–9
Power modules, 3–12
system component descriptions, 1–8
Power system maintenance, 3–12
Power system overview
system maintenance, 3–7
Power-on, 3–27
Power-on self-test (POST)
status of OCP indicators, 3–27
PSC
removal and replacement, 5–16
R
Removal and replacement
ATM module, 5–7
CAMP module, 5–24
console extender module, 5–20
CPU module, 5–7
cross-link assembly, 5–18
DIM, 5–26
DSSI cable, 5–29
DSSI disk drawer, 5–14
DSSI extender module, 5–22
DSSI interface module, 5–26
EIM, 5–28
Ethernet interface module, 5–28
fan, 5–10
FCSB, 5–10
FEU, 5–16
MMB, 5–9
power distribution box, 5–42
PSC, 5–16
RF35 disk drive, 5–12
SF35 storage array, 5–36
SF73 disk drive, 5–32
SIMM, 5–8
TF857-CA tape drive, 5–39
TF85C-BA tape drive, 5–30
5V regulator, 5–16
3.3V regulator, 5–16
zone control panel, 5–14
Reset
Load/Unload button, 3–29
Reset reason fault analysis
error register descriptions, A–8
RF35 disk drawer
disk drives, 3–19
RF35 disk drive
removal and replacement, 5–12
ROM-based diagnostics
system diagnostics, 3–29
S
SCB description, A–10
Server setup switch, 6–2
Services
error handling, 4–1
SET HOST, 6–1
SF35 storage array
removal and replacement, 5–36
SF35-BK/HK/JK storage array
disk drives, 3–21
SF73 disk drive
removal and replacement, 5–32
SF73-HK/JK storage array
disk drives, 3–24
Shutting down a zone, 5–4
SIMM
removal and replacement, 5–8
Software detected errors
fault data, 4–34
Starting up a zone, 5–5
Sub-device configuration block
data structures, 4–63
System console and diagnostics
firmware interfaces, 4–50
System control block description, A–10
System operating modes, 4–4
System registers
fault data, 4–27
System resets
firmware interfaces, 4–51
T
Tape devices
TF857 tape loader, 3–27
TF857 tape loader controls and indicators,
3–27
TF85C tape drive, 3–26
TEST command
system diagnostics, 3–30
TF857 tape loader controls and indicators
tape devices, 3–27
TF857-AA tape loader
operating procedures, 3–27
TF857-CA tape drive
removal and replacement, 5–39
TF85C tape drive
tape devices, 3–26
TF85C-BA tape drive
removal and replacement, 5–30
Threshold information block, 4–26
TK85C-BA cartridge tape drive indicators, 3–27
U
Unit number assignment, 6–2
Unsynchable events
fault data, 4–36
V
5V regulator
removal and replacement, 5–16
3.3V regulator
removal and replacement, 5–16
VAXELN detected errors
fault data, 4–30
VAXELN error handling, 4–10
W
Warm swapping, 6–3
Z
Z command
system diagnostics, 3–31
Zone control panel
removal and replacement, 5–14
system component descriptions, 1–6
Zone deconfiguration actions, 4–16