Compaq AlphaServer GS60E Service manual



AlphaServer GS60E

Service Manual

Order Number: EK-GS60E-SV. A01

This manual is intended for Compaq service engineers. It includes troubleshooting information, configuration rules, and instructions for removal and replacement of field-replaceable units (FRUs) for the Compaq AlphaServer GS60E system.

Compaq Computer Corporation

First Printing, February 2000

The information in this publication is subject to change without notice.

COMPAQ COMPUTER CORPORATION SHALL NOT BE LIABLE FOR TECHNICAL OR EDITORIAL ERRORS OR OMISSIONS CONTAINED HEREIN, NOR FOR INCIDENTAL OR CONSEQUENTIAL DAMAGES RESULTING FROM THE FURNISHING, PERFORMANCE, OR USE OF THIS MATERIAL.

This publication contains information protected by copyright. No part of this publication may be photocopied or reproduced in any form without prior written consent from Compaq Computer Corporation.

The software described in this guide is furnished under a license agreement or nondisclosure agreement. The software may be used or copied only in accordance with the terms of the agreement.

© 2000 Compaq Computer Corporation.

All rights reserved. Printed in the U.S.A.

COMPAQ, the Compaq logo, and Tru64 are copyrighted and are trademarks of Compaq Computer Corporation. Alpha, AlphaServer, OpenVMS, and StorageWorks are registered in the U.S. Patent and Trademark Office. Microsoft and Windows are registered trademarks of Microsoft Corporation. UNIX is a registered trademark in the U.S. and other countries, licensed exclusively through X/Open Company Ltd. Other product names mentioned herein may be the trademarks of their respective companies.

FCC Notice: The equipment described in this manual generates, uses, and may emit radio frequency energy. The equipment has been type tested and found to comply with the limits for a Class A digital device pursuant to Part 15 of FCC rules, which are designed to provide reasonable protection against such radio frequency interference. Operation of this equipment in a residential area may cause interference, in which case the user at his own expense will be required to take whatever measures may be required to correct the interference. Any modifications to this device, unless expressly approved by the manufacturer, can void the user’s authority to operate this equipment under Part 15 of the FCC rules.

Shielded Cables: If shielded cables have been supplied or specified, they must be used on the system in order to maintain international regulatory compliance.

Warning! This is a Class A product. In a domestic environment this product may cause radio interference in which case the user may be required to take adequate measures.

Achtung! Dieses ist ein Gerät der Funkstörgrenzwertklasse A. In Wohnbereichen können bei Betrieb dieses Gerätes Rundfunkstörungen auftreten, in welchen Fällen der Benutzer für entsprechende Gegenmaßnahmen verantwortlich ist.

Attention! Ceci est un produit de Classe A. Dans un environnement domestique, ce produit risque de créer des interférences radioélectriques, il appartiendra alors à l'utilisateur de prendre les mesures spécifiques appropriées.

Contents

Preface

Chapter 1 Introduction

1.1 System Overview
1.2 TLSB System Bus
1.3 Processor Module
1.4 MS7CC Memory Module
1.5 KFTHA Module
1.6 Power Subsystem Overview
1.7 I/O Bus and In-Cab Storage Devices
1.8 Troubleshooting Overview

Chapter 2 Troubleshooting with LEDs

2.1 Operator Control Panel
2.2 Troubleshooting TLSB Modules
2.3 Troubleshooting a PCI Shelf
2.4 Troubleshooting StorageWorks Shelves
2.5 Troubleshooting the Power Subsystem
2.6 Troubleshooting the Cooling Subsystem

Chapter 3 Console Display and Diagnostics

3.1 Checking Self-Test Results: Console Display
3.2 Show Configuration Display
3.3 Running Diagnostics: the Test Command
3.4 Testing the Entire System
3.5 Sample Test Command for a Memory Module
3.6 Identifying a Failing SIMM
3.7 Info Command

Chapter 4 DECevent Error Log

4.1 Brief Description of the TLSB Bus
  4.1.1 Command/Address Bus
  4.1.2 Data Bus
  4.1.3 Error Checking
4.2 Producing an Error Log with DECevent
4.3 Getting a Summary Error Log
4.4 Supported Event Types
4.5 Sample Error Log Entries
  4.5.1 Machine Check 660 Error
  4.5.2 Machine Check 620 Error
  4.5.3 DWLPB Motherboard (PCIA) Adapter Error Log
4.6 Console Halt Conditions
  4.6.1 CPU Double Error Halt
  4.6.2 Machine Check Logout Frames
  4.6.3 Machine Check Error Log

Chapter 5 Removal and Replacement Procedures

5.1 TLSB Modules
  5.1.1 How to Replace the Only Processor
  5.1.2 How to Replace the Boot Processor
  5.1.3 How to Add a New Processor or Replace a Secondary Processor
  5.1.4 Processor, Memory, or Terminator Module Removal and Replacement
  5.1.5 SIMM Removal and Replacement
  5.1.6 I/O Cable and KFTHA Module Removal and Replacement
5.2 TLSB Card Cage Removal
5.3 Operator Control Panel
5.4 CD Tray
5.5 AC Distribution Box
5.6 Power Rack Assembly
5.7 Cabinet Control Logic (CCL) Panel
5.8 BA36R StorageWorks Shelf
5.9 DWLPB PCI Box
5.10 Plenum Assembly
5.11 Cabinet Panels
5.12 Cables

Appendix A Updating Firmware

A.1 Booting LFU
A.2 List
A.3 Update
A.4 Exit
A.5 Display and Verify Commands
A.6 Create

Appendix B Console Commands and Environment Variables

B.1 Console Commands
B.2 Environment Variables

Index

Examples

3–1 System Self-Test Console Display
3–2 Show Configuration Sample
3–3 Sample Test Commands
3–4 Sample Test Command for the Entire System
3–5 Sample Test Command, Memory Test
3–6 Console Mode: No Failing SIMMs
3–7 Console Mode: Failing SIMMs Found
3–8 Examples of the Info Command
4–1 Producing an Error Log with DECevent
4–2 Summary Error Log
4–3 OSF Event Type Identification
4–4 OpenVMS Event Type Identification
4–5 Sample Machine Check 660 Error Log Entry
4–6 Sample Machine Check 620 Error Log Entry
4–7 Sample DWLPB Motherboard Error Log Entry
4–8 CPU Double Error Halt
5–1 Replacing the Only Processor Module
5–2 Replacing the Boot Processor
5–3 Adding or Replacing a Secondary Processor
A–1 Booting LFU from CD-ROM
A–2 List Command
A–3 Update Command
A–4 Exit Command
A–5 Display and Verify Commands
A–6 Create Command

Figures

1–1 AlphaServer GS60E System
1–2 TLSB Card Cage
1–3 Processor Module
1–4 MS7CC Memory Module
1–5 KFTHA Module Hoses
1–6 KFTHA Module
1–7 GS60E Power Subsystem
1–8 I/O Bus and In-Cab Storage
1–9 Troubleshooting Steps
1–10 Troubleshooting Tools
2–1 Operator Control Panel
2–2 Troubleshooting: Start with the Operator Control Panel
2–3 TLSB Module LEDs
2–4 PCI Shelf
2–5 Troubleshooting Steps for PCI Shelf
2–6 Troubleshooting StorageWorks Devices and Shelves
2–7 Power Subsystem
2–8 Cooling Subsystem
2–9 Cabinet Airflow
3–1 Hose Numbering Scheme for KFTHA
4–1 Error Log Header Structure
5–1 Processor, Memory, or Terminator Module
5–2 Removing a SIMM
5–3 SIMM Connector Numbers – E2035 Module
5–4 SIMM Connector Numbers – E2036 (2-Gbyte) and E2037 (4-Gbyte) Modules
5–5 I/O Hose Cable
5–6 TLSB Card Cage Removal
5–7 Operator Control Panel
5–8 CD Tray
5–9 AC Distribution Box
5–10 Power Rack Assembly
5–11 Cabinet Control Logic (CCL) Panel
5–12 BA36R StorageWorks Shelf
5–13 DWLPB PCI Box
5–14 Plenum Assembly
5–15 Cabinet Panels
5–16 Cables

Tables

1 Compaq AlphaServer GS60E Documentation
1–1 Memory Modules and Related SIMMs
2–1 Operator Control Panel LEDs
2–2 Operator Control Panel LEDs at Power-Up
2–3 SCSI Disk Drive LEDs
4–1 TLSB Address Bus Commands
4–2 Supported Event Types
4–3 Parsing a Sample 660 Error (Example 4-5)
4–4 Parsing a Sample 620 Error (Example 4-6)
4–5 Parsing a DWLPB Motherboard Error (Example 4-7)
5–1 Cables
B–1 Summary of Console Commands
B–2 Environment Variables
B–3 Settings for the graphics_switch Environment Variable

Preface

Intended Audience

This manual is written for the customer service engineer.

Document Structure

This manual uses a structured documentation design. Topics are organized into small sections, usually consisting of two facing pages. Most topics begin with an abstract that provides an overview of the section, followed by an illustration or example. The facing page contains descriptions, procedures, and syntax definitions.

This manual has five chapters and two appendixes.

Chapter 1, Introduction, introduces the AlphaServer GS60E system and gives a brief overview of the system bus, modules, and power subsystem.

Chapter 2, Troubleshooting with LEDs, tells how to use the LEDs and other indicators to find problem components in the system.

Chapter 3, Console Display and Diagnostics, tells how to use these tools to find nonfunctioning components in the system.

Chapter 4, DECevent Error Log, describes how to interpret the error log produced by this utility program.

Chapter 5, Removal and Replacement Procedures, describes the removal and replacement procedures for GS60E components that are replaceable by field service personnel.

Appendix A, Updating Firmware, describes how to use console commands and the Loadable Firmware Update (LFU) Utility to update system firmware.

Appendix B, Console Commands and Environment Variables, is a quick reference for commands.


Documentation Titles

Table 1 Compaq AlphaServer GS60E Documentation

Title                                                                    Order Number

Hardware User Information and Installation
AlphaServer GS60E Installation Guide                                     EK–GS60E–IN
AlphaServer GS60E Operations Manual                                      EK–GS60E–OP
KFTHA System I/O Module Installation Card                                EK–KFTHA–IN
KFE72 Installation Guide                                                 EK–KFE72–IN

Service Information
AlphaServer GS60E Service Manual                                         EK–GS60E–SV

Reference Manual
AlphaServer GS60E and GS140 Getting Started with Logical Partitions      EK–TUNLP–SF

Upgrade Manuals
GS60/8200 to GS60E Upgrade Manual                                        EK–GS60E–UP
H7506 Power Supply Installation Card                                     EK–H7506–IN
RRDCD Installation Card                                                  EK–RRDXX–IN

Information on the Internet

Visit the Compaq Web site at www.compaq.com for service tools and more information about the AlphaServer GS60E system.


Chapter 1

Introduction

The AlphaServer GS60E system is a high-performance, symmetric multiprocessing system. It offers access to multiple high-bandwidth I/O buses, very large memory capacities, up to eight high-performance CPUs, and many other features normally associated with mainframe systems.

This chapter introduces the AlphaServer GS60E system. Sections in this chapter include:

• System Overview

• TLSB System Bus

• Processor Module

• MS7CC Memory Module

• KFTHA Module

• Power Subsystem Overview

• I/O Bus and In-Cab Storage Devices

• Troubleshooting Overview


1.1 System Overview

The Compaq AlphaServer GS60E system is the latest offering in the GS60/GS140 family. It uses the same system bus, the TLSB, with seven slots. It provides the reliability and availability features normally associated with mainframe systems. The GS60E has redundant, hot-swappable N+1 power supplies.

Figure 1–1 AlphaServer GS60E System

(Figure labels: 2nd Expander Cabinet; System Cabinet; 1st Expander Cabinet.)

AlphaServer GS60E System

The AlphaServer GS60E system main cabinet contains the seven-slot TLSB card cage, power supplies, and space for PCI I/O shelves and StorageWorks shelves. The GS60E system can have up to two expander cabinets (see Figure 1-1), containing additional PCI I/O shelves and StorageWorks shelves.

Chapter 2 describes how to use LEDs and other indicators to troubleshoot the system. Chapter 3 describes the console display and diagnostics. The error log produced by the DECevent utility program is described in Chapter 4. Removal and replacement procedures for FRUs are described in Chapter 5.

AlphaServer GS60E Options

A list of the latest supported options is on the Internet, which you can access as follows:

Using ftp, copy the file: ftp://ftp.digital.com/pub/Digital/Alpha/systems/as8400/docs/supported_options.txt

Using a Web browser, follow links from the URL: http://www.digital.com/alphaserver/products.html


1.2 TLSB System Bus

The TLSB card cage is a 7-slot card cage that contains slots for up to four CPU modules, up to five memory array modules, and up to three I/O modules. The TLSB bus interconnects the CPU, memory, and I/O modules.

Figure 1–2 TLSB Card Cage

(Figure labels: Front, slots 3 2 1 0; Rear, slots 4 5 6 7 8; Centerplane; Power Filter; First Memory or Additional I/O or CPU Module; Additional Memory, I/O or CPU Modules; Additional CPUs or Memories.)

The TLSB card cage is located in the upper part of the system cabinet. The TLSB card cage contains seven module slots (slots 3 and 4 are not used). The slots are numbered 0 through 2 from right to left in the front of the cabinet and 5 through 8 from right to left in the rear of the cabinet (see Figure 1-2). The minimum configuration is a processor module in slot 0, an I/O module in slot 8, a memory module in slot 7, and terminator modules in all other slots.

Module Placement Rules

Configure modules in this order:

1. Place the processor modules first. Start at slot 0 and work up to slot 2. If a fourth processor module is used, it can be placed in slot 5, 6, or 7.

2. Place the KFTHA modules next. The first KFTHA module goes in slot 8, a second in slot 7, and a third in slot 6.

3. Place memory modules last. The first memory module goes in the highest-numbered open slot, the next in the lowest-numbered open slot, and so on, alternating between highest- and lowest-numbered open slots.

4. Fill all remaining open slots with terminator modules.
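The placement rules above can be sketched as a small Python function. This is an illustrative sketch only, not a Compaq tool: the function name `place_modules` and the module labels are hypothetical, the memory-eligible slots (1, 2, 5, 6, and 7) are taken from the card cage description in this section, and putting a fourth processor in slot 5 is an assumption (the rules allow slot 5, 6, or 7).

```python
SLOTS = (0, 1, 2, 5, 6, 7, 8)       # slots 3 and 4 are not used
MEMORY_SLOTS = (1, 2, 5, 6, 7)      # slots that may hold a memory module


def place_modules(cpus, kfthas, memories):
    """Return a {slot: module} layout following the placement rules."""
    if not 1 <= cpus <= 4:
        raise ValueError("one to four processor modules are supported")
    if not 1 <= kfthas <= 3:
        raise ValueError("one to three KFTHA modules are supported")
    if not 1 <= memories <= 5:
        raise ValueError("one to five memory modules are supported")
    if cpus + kfthas + memories > len(SLOTS):
        raise ValueError("more modules than the seven available slots")

    layout = dict.fromkeys(SLOTS, "terminator")

    # Rule 1: processors go in slots 0, 1, 2; a fourth goes in slot 5
    # (assumption: the rules also permit slot 6 or 7).
    for slot in (0, 1, 2, 5)[:cpus]:
        layout[slot] = "processor"

    # Rule 2: KFTHA modules go in slots 8, 7, 6, in that order.
    for slot in (8, 7, 6)[:kfthas]:
        layout[slot] = "KFTHA"

    # Rule 3: memory alternates between the highest- and lowest-numbered
    # open slots, starting with the highest.
    open_slots = [s for s in MEMORY_SLOTS if layout[s] == "terminator"]
    if memories > len(open_slots):
        raise ValueError("not enough open memory slots")
    take_highest = True
    for _ in range(memories):
        slot = open_slots.pop() if take_highest else open_slots.pop(0)
        layout[slot] = "memory"
        take_highest = not take_highest

    # Rule 4: every slot not assigned above keeps its terminator.
    return layout
```

For the minimum configuration, `place_modules(1, 1, 1)` puts the processor in slot 0, the KFTHA in slot 8, and the memory module in slot 7, which matches the minimum configuration described earlier in this section.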

About the TLSB Card Cage

Modules used in this system are:

Terminator

1 Gbyte memory (MS7CC-EA)

2 Gbyte memory (MS7CC-FA)

4 Gbyte memory (MS7CC-GA)

KFTHA (4 hose cables)

Dual processor (KN7CG-AB and KN7CH-AB)

The maximum number of processor modules is four.

The maximum number of memory modules is five. Memory modules may be placed in slots 1, 2, 5, 6, and 7 only. The maximum amount of memory is 20 Gbytes. All memory modules support two-way interleaving. Mixed sizes of memory modules may be installed in the TLSB card cage.

Each system must have a minimum of one KFTHA I/O module, installed in slot 8.
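As a quick arithmetic check of the memory limits, the following Python sketch totals the capacity of a mix of memory modules. The function name `total_memory_gbytes` is hypothetical; the module sizes and the five-module limit come from this section.

```python
# Capacity in Gbytes of each MS7CC variant listed in this section.
MODULE_GBYTES = {"MS7CC-EA": 1, "MS7CC-FA": 2, "MS7CC-GA": 4}
MAX_MEMORY_MODULES = 5


def total_memory_gbytes(modules):
    """Sum the capacity of a list of memory module variants."""
    if len(modules) > MAX_MEMORY_MODULES:
        raise ValueError("at most five memory modules can be installed")
    return sum(MODULE_GBYTES[name] for name in modules)
```

Five MS7CC-GA modules give 5 x 4 = 20 Gbytes, the stated maximum; a mixed set such as one module of each variant gives 1 + 2 + 4 = 7 Gbytes.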


1.3 Processor Module

Up to four processor modules can be used in an AlphaServer GS60E system. Each processor module contains two CPU chips.

Figure 1–3 Processor Module

(Figure labels: Side 1; Side 2. Numbered callouts mark the components described below.)

The KN7CG processor module has two Alpha 21264 chips, with a clock speed of 525 MHz. The KN7CH processor module has two 21264A chips, with a clock speed of 700 MHz. If one of the CPUs on the processor module is malfunctioning, you replace the entire module. The chip is not a field-replaceable unit (FRU). The console display (see Section 3.1) shows each processor on a module.

Figure 1-3 shows the processor module. The raised blocks in the figure represent heatsinks that cover the chips.

CPU chips. Each 21264(A) chip has a separate address and data bus for B-cache and system operations. The 21264(A) chip has a 64-Kbyte instruction cache and a 64-Kbyte data cache.

Cache Memory. 4-Mbyte L2 cache per CPU (21264) and 8-Mbyte ECC L2 onboard cache per CPU (21264A).

TCC. The TurboLaser control chip (TCC) takes commands from both CPUs and issues them to the TLSB. It also controls all data movements through the TDI and SWI chips.

SWIs. Two swizzle (SWI) chips receive data from the 256-bit wide DLSB and pass it to one of the CPU chips over the 64-bit wide data interface bus.

TDIs. Four TurboLaser Data Interface (TDI) chips receive data from the TLSB and pass the data over the DLSB to the two SWI chips.

DC to DC Converters. These converters step the 48 VDC power supplied by the power subsystem to the voltages required by the components on the processor board.


1.4 MS7CC Memory Module

The GS60E uses three variants of the MS7CC memory module: 1 Gbyte, 2 Gbytes, and 4 Gbytes. Up to 20 Gbytes of memory can be configured using combinations of the three module variants.

Figure 1–4 MS7CC Memory Module

(Numbered callouts mark the components described below.)

All memory modules for the AlphaServer GS60E have SIMMs (single inline memory modules). DRAMs are mounted on small cards that are fixed to the larger memory module by spring-held mounting clips that grip both sides of the SIMM. Figure 1-4 shows:

The array of SIMMs in an MS7CC–EA (1-Gbyte) memory module.

Memory data interface (MDI) gate arrays that provide the data interface between the TLSB bus and the DRAM arrays. The MDIs contain data buffers, ECC checking logic, self-test data generation and checking logic, and control and status registers (CSRs).

The control address interface (CTL) gate array that provides the interface to the TLSB, controls DRAM timing and refresh, runs memory self-test, and contains TLSB and memory-specific registers.

The DC-to-DC converter.

All types of SIMMs for all the memory modules available for AlphaServer GS60E systems are field-replaceable. Section 3.6 describes how to isolate a problem SIMM. When you replace a SIMM, you must be sure that the type of SIMM matches the module for which it is designed, as detailed in Table 1-1.

Table 1-1 Memory Modules and Related SIMMs

Memory (Size)       Motherboard Part Number   SIMM Part Number        Number of SIMMs
MS7CC–EA (1 GB)     EA2035-AA                 54-21726-01 (32 MB)     32
MS7CC–FA (2 GB)     EA2036-AA                 54-21718-01 (64 MB)     36
MS7CC–GA (4 GB)     EA2037-AA                 54-24723-01 (128 MB)    36


1.5 KFTHA Module

The KFTHA module offers four “hose” connections that interface between the TLSB and the I/O subsystem.

Figure 1–5 KFTHA Module Hoses

(Figure label: Hoses.)

The KFTHA module is designed for high-speed, high-volume data transfers. Direct memory access (DMA) transfers are pipelined to allow for up to 500 Mbytes/second throughput. The major elements of the KFTHA module are:

RAM to buffer data for the DMA transfers.

Four hose-to-data (HDP) chips, each handling 32 bits from two “hoses” (I/O cables connecting to an adapter in an associated I/O bus). Data on the HDPs flows in one direction: either “up” (to the KFTHA) or “down” (to the I/O adapter).

Four I/O data path (IDP) chips, which together handle a 256-bit data transfer to or from the TLSB system bus.

An I/O control chip (ICC) that houses the primary control logic for the TLSB interface.

A DC-to-DC converter that converts the 48 VDC system power to the DC voltage required by the KFTHA module.

Figure 1–6 KFTHA Module

(Numbered callouts mark the elements listed above.)

1.6 Power Subsystem Overview

The power subsystem consists of an AC input box, a DC distribution module, redundant hot-swap power supplies, a cabinet control logic (CCL) panel, and cables.

Figure 1–7 GS60E Power Subsystem

(Figure labels: Front; Rear; DC Distribution Module; Power Supplies; CCL Panel; AC Input Box.)

Three-phase AC power enters the system by cable through the AC input box (see Figure 1-7). The H7506 power supplies convert three-phase AC power to 48 VDC. Three hot-swappable power supplies offer n+1 redundancy; that is, if any one power supply fails, the remaining two supply the needed power.
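The n+1 arrangement can be illustrated with a short Python sketch. The function name `power_status` and the status strings are hypothetical. Treating fewer than two working supplies as insufficient is an assumption drawn from the statement that two supplies provide the needed power; the manual does not say what happens with only one.

```python
INSTALLED_SUPPLIES = 3   # three hot-swappable H7506 supplies
REQUIRED_SUPPLIES = 2    # two supplies provide the needed power


def power_status(working_supplies):
    """Classify the power subsystem by the number of working supplies."""
    if not 0 <= working_supplies <= INSTALLED_SUPPLIES:
        raise ValueError("expected between 0 and 3 working supplies")
    if working_supplies > REQUIRED_SUPPLIES:
        return "redundant"        # one supply can fail without effect
    if working_supplies == REQUIRED_SUPPLIES:
        return "no redundancy"    # system is powered, but has no spare
    return "insufficient"         # assumption: below the needed two


for count in range(INSTALLED_SUPPLIES, -1, -1):
    print(count, power_status(count))
```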


1.7 I/O Bus and In-Cab Storage Devices

Both the AlphaServer GS60E main cabinet and expander cabinets are designed to hold PCI shelves and StorageWorks I/O shelves.

Figure 1–8 I/O Bus and In-Cab Storage

(Figure labels: Front View; Rear View; 7-Slot System Bus; Up to 4 CPU Modules (8 CPUs); Up to 5 Memory Modules (12 GB); Up to 3 I/O Modules; Blowers; DWLPB PCI; CD Drive (and optional floppy drive); StorageWorks Shelf; Power Supplies; CCL Panel; AC Input Box.)

Figure 1-8 shows an AlphaServer GS60E system cabinet.

As shown, PCI shelves and StorageWorks shelves are mounted horizontally.

Each StorageWorks shelf has room for up to seven devices, including a signal converter and 3.25-inch disks or tapes. A power unit (DC-to-DC converter) is in the leftmost slot of the shelf.

The system cabinet has space for up to two PCI shelves (DWLPB-DA) and three StorageWorks shelves (BA36R-RC/RD UltraSCSI).

Each expander cabinet has space for four PCI shelves and three StorageWorks shelves or three PCI shelves and four StorageWorks shelves.
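The shelf limits above can be checked with a short Python sketch. The function names are hypothetical, and reading the limits as upper bounds (so that smaller shelf counts also fit) is an assumption.

```python
# Shelf capacities from the text, as (PCI shelves, StorageWorks shelves).
SYSTEM_CABINET_LIMIT = (2, 3)
EXPANDER_CABINET_OPTIONS = ((4, 3), (3, 4))


def fits_system_cabinet(pci, storageworks):
    """True if the shelf counts fit in the system cabinet."""
    max_pci, max_sw = SYSTEM_CABINET_LIMIT
    return 0 <= pci <= max_pci and 0 <= storageworks <= max_sw


def fits_expander_cabinet(pci, storageworks):
    """True if the shelf counts fit either expander cabinet layout."""
    return any(
        0 <= pci <= max_pci and 0 <= storageworks <= max_sw
        for max_pci, max_sw in EXPANDER_CABINET_OPTIONS
    )
```

For example, four PCI shelves with three StorageWorks shelves fit an expander cabinet, and so do three with four, but four of each do not.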


1.8 Troubleshooting Overview

Follow steps to isolate system problems. A possible routine is shown below.

Figure 1–9 Troubleshooting Steps

Start: You cannot find the cause of the user problem by phone. Go to the site and follow these steps.

1. Control panel LEDs lit?
   No: Check the power subsystem (see Section 2.5).
   Yes: Go to step 2.

2. Operating system running?
   Yes: Customer experiences intermittent error. Check the error log (see Chapter 4).
   No: Go to step 3.

3. Console software running?
   Yes: Type the "init" command. Check the system self-test display (see Section 3.1).
   No: Restart the system. Check the system self-test display (see Section 3.1).

4. Faulty FRU identified?
   Yes: Power down the system and replace the FRU. Power up. If system self-test passes, boot the operating system. Done.
   No: Boot the operating system and check the error log (see Chapter 4). Go to step 5.

5. Faulty FRU identified?
   Yes: Replace the FRU as in step 4. Done.
   No: The problem is beyond the scope of this Service Manual. Call the customer support center for help.
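The first three decisions in Figure 1-9 can be written as a small Python function. This is a sketch of one plausible reading of the flowchart, not an official procedure; the function name `first_action` and its parameters are hypothetical.

```python
def first_action(leds_lit, os_running, console_running):
    """Return the first troubleshooting action for the observed state."""
    if not leds_lit:
        return "Check power subsystem (see Section 2.5)"
    if os_running:
        return "Check error log (see Chapter 4)"
    if console_running:
        return 'Type "init" and check system self-test display (see Section 3.1)'
    return "Restart system and check system self-test display (see Section 3.1)"


# Example: LEDs are lit, no operating system, console prompt available.
print(first_action(leds_lit=True, os_running=False, console_running=True))
```

The checks are ordered from hardware toward software, so a power problem is ruled out before any console or error log work begins.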


The system hardware, console software, and operating system software provide three types of troubleshooting tools, as shown in Figure 1-10.

Chapters 2, 3, and 4 tell how to use these tools to isolate faulty components or report software problems for AlphaServer GS60E systems.

Figure 1–10 Troubleshooting Tools

(Figure labels: Tools for Finding Problems; LEDs and Indicators; System Self-Test and Other Console Displays; Error Log Printout.)

Chapter 2

Troubleshooting with LEDs

This chapter tells how to use the LED displays and other indicators to track down faulty components that you can replace in the AlphaServer GS60E system.

LEDs give status on the power subsystem, system bus (TLSB) modules (processor, memory, and I/O), the I/O bus, and devices in shelves. The cooling subsystem consists of two blowers located in the center of the system cabinet. They can be checked by looking and listening for the fans.

Sections in this chapter are as follows:

• Operator Control Panel

• Troubleshooting TLSB Modules

• Troubleshooting a PCI Shelf

• Troubleshooting StorageWorks Shelves

• Troubleshooting the Power Subsystem

• Troubleshooting the Cooling Subsystem


2.1 Operator Control Panel

Start with the operator control panel (OCP). Check the OCP lights. The OCP has six status LEDs, three pushbuttons, and a keyswitch.

Figure 2–1 Operator Control Panel (callouts 1 through 6 identify the six status LEDs)

Table 2–1 Operator Control Panel LEDs

Light       Color    State   Meaning
➊ Run       Green    On      Power is supplied to entire system; the blowers are running. System has exited console.
➋ Power     Green    On      System is powered on.
➌ Fault     Yellow   On      Fault on system bus.
➍ On        Green    On      Power is supplied to the whole system.
➎ Secure    Green    On      Indicates input from the console device is prevented.
➏ Reset     Yellow   On      Indicates a system reset has occurred, clearing captured error information.


Six status indicator LEDs (see Figure 2-1) show the state of the system. Table 2-1 describes the conditions indicated by the lights.

NOTE: With the keyswitch in the On position, if all six LEDs are blinking, one or more of the power supplies has failed or there is a missing power supply. With the keyswitch in the Off position, the LEDs will also blink but do not provide power supply status.
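The NOTE above and the Fault entry in Table 2-1 can be combined into a short check. The sketch below is an assumption-laden illustration: the function name `interpret_ocp`, the dictionary layout, and the lowercase state strings are hypothetical, and only the two conditions stated in the text are encoded.

```python
LED_NAMES = ("Run", "Power", "Fault", "On", "Secure", "Reset")


def interpret_ocp(keyswitch_on, leds):
    """Interpret OCP LED states given as {"Run": "on" | "off" | "blink", ...}.

    Hypothetical helper; LEDs missing from the dictionary are treated as off.
    """
    states = [leds.get(name, "off") for name in LED_NAMES]
    if all(state == "blink" for state in states):
        if keyswitch_on:
            # Per the NOTE: all six blinking with the keyswitch On.
            return "A power supply has failed or is missing"
        # Per the NOTE: blinking with the keyswitch Off carries no status.
        return "Keyswitch is Off: blinking gives no power supply status"
    if leds.get("Fault") == "on":
        # Per Table 2-1.
        return "Fault on system bus"
    return "No fault indicated by the OCP"


all_blinking = {name: "blink" for name in LED_NAMES}
print(interpret_ocp(True, all_blinking))
```

A Fault LED that is on during self-test is expected (see Table 2-2), so the result is meaningful only after self-test has completed.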

Table 2–2 Operator Control Panel LEDs at Power-Up

Action                                     Keyswitch   Run     Power   Fault   On      Secure   Reset
Set circuit breaker to On                  Off         Blink   Blink   Blink   Blink   Blink    Blink
Turn keyswitch to On and press On button   On          Off     On      Blink   On      Off      Off
System self-test starts                    On          Off     On      On      On      Off      Off
Module passes self-test                    On          Off     On      Off     On      Off      Off
Module fails self-test                     On          Off     On      On      On      Off      Off
Power supply problem                       On          Blink   Blink   Blink   Blink   Blink    Blink
Operating system boots                     On          On      On      Off     On      Off      Off

In the Keyswitch column, On means the keyswitch is On and the On/Off button is On.


Figure 2-2 Troubleshooting: Start with the Operator Control Panel

Is the On/Off button or keyswitch Off?
   Yes: Go to step 1.
   No: Check whether the Fault LED is lit.

1  Fix problem identified.
   If a faulty component or firmware update was identified as the problem, replace the component or update the firmware. If the problem has not yet been identified, go to step 2.

2  Turn power on and watch power-up.
   As 48-VDC power is passed to the system, initial tests are run on the CPU, memory, and I/O adapters on the system. If the system passes this power-up testing, the green Run and On LEDs should light. If it does not, look at the console terminal display to pinpoint the failing module. If there is no display, the console terminal may be a TGA (graphics) terminal, connected through a PCI bus. Connect a character-cell terminal through the serial port on the system cabinet. Repeat step 2.

Is the Fault LED lit?
   Yes: Go to step 3.
   No: Check whether any LEDs are lit on the control panel.

3  Some component failed system self-test.
   If Run and On are green, Fault is lit, and system self-tests have completed, replace any failed component and proceed with step 2.

   System clock and CPUs are not synchronized.
   If Run is off and On is green, Fault is lit, and system self-test did not complete, check to see if the system clock and the CPUs have different cycle times. Replace as appropriate and proceed with step 2.

Are any LEDs lit on the control panel?
   No: Go to step 4.
   Yes: Check whether the green LED(s) are lit.

4  Status LEDs are not receiving power/signals.
   Check the power supplies to see if DC power is leaving the supply. If so, check the power and signal lines to the CCL panel. Check the cabling between the CCL and the operator control panel. If connections seem OK, replace CCL. If still no lights on control

Are the green LED(s) lit?
   Yes: Go to step 5.

5  System self-test passed (On is lit); operating system running (Run is lit).
   If both green LEDs are lit, system self-test has passed, and the operating system is running. Check the error log (see Chapter 4). Ensure that the proper boot disk is selected to boot the operating system.
   If Run is not lit, boot the operating system. When the operating system boots, look at the error log.


2.2 Troubleshooting TLSB Modules

You can check individual module self-test results by looking at the status LEDs on the module.

Figure 2–3 TLSB Module LEDs (LED locations on the CPU, memory, and KFTHA modules)

In general, if a module on the TLSB does not pass self-test (green light is not lit), it should be replaced.

There is a case where some removal and replacement action may be needed even though the module passes self-test.

Failure of the built-in self-test for the MS7CC modules indicates that testing has shown that there is no single 64-Kbyte segment of memory that is usable.

Each 64-Kbyte segment must show at least 256 bad pages before it is noted as unusable. However, it is possible for a SIMM to warrant replacement, even though the module as a whole passes its self-test.

You can determine faulty SIMMs with the show config console command, as described in Chapter 3.


2.3 Troubleshooting a PCI Shelf

LEDs show the status of the power supplies, as well as the adapter self-test results in the PCI shelf.

Figure 2–4 PCI Shelf

LED Status in PCI Shelf (DWLPB LED numbers 1 through 4)

LED 1 - On-board power system OK

LED 2 - Motherboard self-test passed

LED 3 - 48 VDC power supply OK

LED 4 - Hose Error

Figure 2-5 Troubleshooting Steps for PCI Shelf

Is LED 3 lit?
   No: Go to steps 1 and 2.
   Yes: Check LED 1.

Is LED 1 lit?
   No: Go to steps 3 and 4.
   Yes: Check LED 2.

Is LED 2 lit?
   No: Go to step 5.
   Yes: Check LED 4.

Is LED 4 lit?
   Yes: Go to step 6.

1  Check Cabling to PCI shelf. Check to make sure the clip connectors are engaged properly. If so, proceed to step 2.

2  Check 48V Power Supply.

3  Internal Power System Error. Check fans in blower; check for jumper cable (a small plug) replacing fan connection.

4  Replace Power Board.

5  Replace Motherboard.

6  Hose Error. Some error has occurred in the protocol governing the transfer of data over the hose. Replace the hose first, the motherboard second, the KFTHA third.
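The PCI shelf LED checks in Figure 2-5 can be sketched as one function. This is a minimal sketch under stated assumptions: the name `pci_shelf_action` and its boolean parameters are hypothetical, and the LED order (3, then 1, then 2, then 4) follows the figure as read here.

```python
def pci_shelf_action(led1, led2, led3, led4):
    """Suggest an action from the four DWLPB shelf LEDs (True means lit).

    Hypothetical helper. LED meanings follow Figure 2-4:
    1 = on-board power OK, 2 = motherboard self-test passed,
    3 = 48 VDC power supply OK, 4 = hose error.
    """
    if not led3:
        return "Check cabling to PCI shelf, then check 48V power supply"
    if not led1:
        return ("Internal power system error: check blower fans and fan "
                "jumper cable, then replace power board")
    if not led2:
        return "Replace motherboard"
    if led4:
        return "Hose error: replace hose, then motherboard, then KFTHA"
    return "No fault indicated by shelf LEDs"


print(pci_shelf_action(led1=True, led2=True, led3=True, led4=True))
```

Checking LED 3 first reflects that the 48 VDC supply feeds everything else on the shelf.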


2.4 Troubleshooting StorageWorks Shelves

StorageWorks devices are mounted in horizontal shelves in the GS60E system or expander cabinet. LEDs are located on each disk drive.

Figure 2–6 Troubleshooting StorageWorks Devices and Shelves (callouts: green LEDs, yellow LEDs)

Table 2-3 SCSI Disk Drive LEDs

Indicator LED   LED State   Meaning
Green           Off         No activity
                Flashing    Activity
                On          Activity
Yellow          Off         Normal
                Flashing    Spin up/spin down
                On          Not used

2.5 Troubleshooting the Power Subsystem

The GS60E power supplies accept three-phase AC and produce 48 VDC power. Each power supply has two LEDs that indicate normal conditions and faults.

Figure 2–7 Power Subsystem (front and rear views; labels: VAUX LED (top), 48V LED (bottom), Power Supplies, AC Power Line Cord, Main Circuit Breaker)

The system must be provided with a suitable source of 3-phase AC power.

Three H7506 power supplies (see Figure 2-7) provide the necessary power and power redundancy required for all internal system components.

The AC input box is located at the bottom of the system cabinet (when viewing the system cabinet from the rear). The 48 VDC power supplies are located above the AC input box and are visible when viewing the system cabinet from the front.

The AC input box provides the interface for the system to the AC utility power.

The DC distribution module connects the AC input box and power supplies. It distributes the 48 VDC power. The circuit breaker and power indicators are at the rear of the cabinet.

Circuit Breaker

The main circuit breaker, CB1, controls power to the entire system, including the power supplies, blowers, and in-cabinet options. Current overload causes the breaker to trip to the Off position, so that power to the system is turned off.

For normal operation, circuit breaker CB1 must be in the On position, with the handle pushed up. To shut the circuit breaker off, push the handle down. Subbreakers CB2 through CB11 should also be in the On (up) position during normal system operation.

AC Power Indicators

Three lights above the AC power line cord (see Figure 2-7) indicate that AC power is supplied to the line side of main circuit breaker CB1.

The power supplies have two LEDs that indicate normal conditions and faults.

When the system (keyswitch) is off, plugged in, and the circuit breakers are on, power is present only within the AC box and power supplies. The green VAUX LEDs on the power supplies should be illuminated. When the system is on, the VAUX and 48V LEDs should light.
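The expected LED pattern on each power supply can be written down as a quick check. This sketch assumes the system is plugged in with all circuit breakers on, and it further assumes that the 48V LED is dark while the keyswitch is off (the text states only that VAUX should be lit in that case). The function names are hypothetical.

```python
def expected_supply_leds(keyswitch_on):
    """Expected LED states for one power supply (True means lit)."""
    # VAUX should be lit whenever AC power reaches the supply.
    # Assumption: the 48V LED lights only when the system is on.
    return {"VAUX": True, "48V": keyswitch_on}


def supply_leds_as_expected(keyswitch_on, vaux_lit, v48_lit):
    """Compare observed LEDs on one supply against the expected pattern."""
    expected = expected_supply_leds(keyswitch_on)
    return vaux_lit == expected["VAUX"] and v48_lit == expected["48V"]


print(supply_leds_as_expected(keyswitch_on=True, vaux_lit=True, v48_lit=False))
```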


2.6 Troubleshooting the Cooling Subsystem

The cooling system cools the power subsystem, the TLSB card cage, and shelves.

Figure 2–8 Cooling Subsystem (Front View; labels: TLSB, Blowers, CD Drive, StorageWorks Shelf, Power Supplies, AC Input Box, DWLPB PCI)

The cooling system is designed to keep the system components at an optimal operating temperature. It is important to keep the front and rear doors free of obstructions, leaving a minimum clearance space of 1.5 meters (59 inches) in the front and 1 meter in the rear to maximize airflow.

Two blowers, located in the center of the cabinet (see Figure 2-8), draw air downward through the TLSB card cage. Air is exhausted at the middle of the cabinet, to the rear (see Figure 2-9). The blower speed varies based on the system’s ambient temperature.

CAUTION: Anything placed on the top of the cabinet could restrict airflow. This will cause the system to power down.

Figure 2-9 Cabinet Airflow


Chapter 3

Console Display and Diagnostics

This chapter describes how hardware diagnostic programs are executed when the system is initialized. Sections include:

• Checking Self-Test Results: Console Display

• Show Configuration Display

• Running Diagnostics: the Test Command

• Testing the Entire System

• Sample Test Command for a Memory Module

• Identifying a Failing SIMM

• Info Command


3.1 Checking Self-Test Results: Console Display

The self-test console display gives information for the TLSB modules and the PCIs in the system.

Example 3–1 System Self-Test Console Display

   F   E   D   C   B   A   9   8   7   6   5   4   3   2   1   0  NODE #
                               A   M   M   M   .   .   P   P   P  TYP
                               o   +   +   +   .   .  ++  ++  ++  ST1
                               .   .   .   .   .   .  EE  EE  EB  BPD
                               o   +   +   +   .   .  ++  ++  ++  ST2
                               .   .   .   .   .   .  EE  EE  EB  BPD
                               o   +   +   +   .   .  ++  ++  ++  ST3
                               .   .   .   .   .   .  EE  EE  EB  BPD
                   +   +   +   +   +   +   +   .   .   .   .   +  C0 PCI +
                                   .   .   .   .   .   .   .   .  EISA +
   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .  C1
   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .  C2
   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .  C3
                                  B0  A1  A0   .   .   .   .   .  ILV
                               . 4GB 4GB 4GB   .   .   .   .   .  12GB
Compaq AlphaServer GS60E2-6/700/8, Console V5.5-25 26-OCT-1999 12:06:03
SROM V2.3, OpenVMS PALcode V1.68-101, Tru64 UNIX PALcode V1.61-101
System Serial = NI84177052, OS = OpenVMS, 3:11:57 December 7, 1999
:
:
P00>>>

➊ The NODE # line lists the node numbers on the TLSB and I/O buses.

➋ The TYP line in the printout indicates the type of module at each TLSB node. Processors are type P, memories are type M, and the KFTHA port module is type A. A period (.) indicates that the slot is not populated or that the module is not reporting.

➌ This line shows the results of individual processor and memory module tests. Possible values are pass (+) or fail (–). Since the I/O port module does not have a module-resident self-test, its entry for the ST1 line is always "o".

➍ The BPD line indicates boot processor determination. When the system goes through self-test, the processor with the lowest ID number that passes self-test (ST1 line is +) becomes the boot processor, unless you intervene. The process occurs again after ST2 and ST3 testing. “B” indicates boot processor, “E” indicates the processor is enabled to become the boot processor, and “D” indicates that a console command has been issued disabling the processor from the possibility of becoming the boot processor.

This BPD line is printed three times. After the first determination of the boot processor, the processors go through two more rounds of testing. Since it is possible for a processor to pass self-test (at line ST1) and fail ST2 or ST3 testing, the processors again determine the boot processor following each round of tests. The first processor to pass self-test is chosen as the boot processor.

➎ During the second round of testing (ST2) all processors run additional CPU tests involving memory.

➐ During the third round of testing (ST3) all processors run multiprocessor tests, and the status of each processor is once again reported on the BPD line.

➑ The primary CPU also tests the I/O port module at this time.

In Example 3-1, the PCI (channel C0) and its options at nodes 0, 5, 6, 7, 8, 9, 10, and 11 passed self-test as indicated by the + symbols. I/O channels C1, C2, and C3 are not used.

➀ The ILV line contains a memory interleave value (ILV) for each memory.

This line displays the size of each memory module and gives the total size of system memory. In Example 3-1, the total size is 12 Gbytes.

Console version and firmware revision date are given.
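When a self-test display has been captured to a text file, the TYP, ST, and BPD rows can be decoded mechanically. The sketch below is illustrative only: the helper names are hypothetical, it relies on the rightmost column being node 0 (as in the NODE # line), and it handles only rows whose label is a single word, not the NODE #, PCI, or EISA rows.

```python
def parse_row(line):
    """Split a display row into (label, {node_number: token}).

    The last word is the row label (TYP, ST1, BPD, ...); the remaining
    tokens are assigned to nodes from right to left, starting at node 0.
    """
    *tokens, label = line.split()
    return label, dict(enumerate(reversed(tokens)))


def failed_nodes(st_row):
    """Nodes whose self-test token contains a fail mark (hyphen or en dash)."""
    return sorted(n for n, tok in st_row.items() if "-" in tok or "\u2013" in tok)


def boot_node(bpd_row):
    """Lowest-numbered node whose BPD token contains 'B', or None."""
    for node in sorted(bpd_row):
        if "B" in bpd_row[node]:
            return node
    return None


label, st1 = parse_row("o + + + . . ++ ++ ++ ST1")
_, bpd = parse_row(". . . . . . EE EE EB BPD")
print(label, failed_nodes(st1), boot_node(bpd))
```

With the rows from Example 3-1, no node is reported as failed and the boot processor is on the module at node 0.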


3.2 Show Configuration Display

The show configuration console command is useful to obtain more information about the system configuration, in case you need to replace a module.

Example 3–2 Show Configuration Sample

P00>>> show configuration

        Name               Type      Rev    Mnemonic
TLSB
0++     KN7CG-AB           8025      0000   kn7cg-ab0
6+      MS7CC              5000      0000   ms7cc0
7+      KFTHA              2020      0000   kftha0
8+      KFTHA              2000      0000   kftha1

C0 PCI connected to kftha0                  pci0
0+      SIO                4828086   0003   sio0
8+      ISP1020            8101      0000   kzpsa1
7+      KZPSA              8101      0000   kzpsa0
A+      DAC960             11069     0000   dac0

Controllers on SIO                          sio0
0+      DECchip 21040-AA   21011     0000   tulip0
1+      FLOPPY             2         0000   floppy0
2+      KBD                3         0000   kbd0
3+      MOUSE              4         0000   mouse0

P00>>>

The first grouping shows the modules on the TLSB bus and their status.

In this example, the processor is in slot 0, as shown in the console display of system self-test. A memory is at node 6, and KFTHA modules at nodes 7 and 8.

C0 is next, showing the PCI bus on the KFTHA module.


Node 0 is the KFE72 standard I/O PCI/EISA adapter module.

Nodes 7 and 8 are the KZPSA adapters.

This line shows the DAC960 controller.

These lines show the controllers on the SIO module.

Figure 3-1 shows the connector numbering scheme for the KFTHA module.

Each slot has four connector numbers associated with it, numbered in increasing order from top to bottom, as shown.

Figure 3–1 Hose Numbering Scheme for KFTHA (viewed from the centerplane; connectors C0 through C3, C4 through C7, and C8 through C11, numbered top to bottom)

3.3 Running Diagnostics: the Test Command

The test command allows you to run diagnostics on the entire system, an I/O subsystem, a single module, a group of devices, or a single device.

Example 3–3 Sample Test Commands

P00>>> test             # Tests the entire system.
                        # Default run time is 10 minutes.

P00>>> t pci0 -t 60     # Tests all devices associated
                        # with the PCI0 subsystem. Test
                        # run time is 60 seconds.

P00>>> test ms*         # Tests all ms7cc memory modules.

P00>>> t -q             # Status messages will not be
                        # displayed during test time.

You enter the command test to test the entire system using exercisers resident in ROM on the boot processor module. No module self-tests are executed when the test command is issued without a mnemonic.

When you specify a subsystem mnemonic or a device mnemonic with test, such as test pci0 or test ms7cc0, self-tests are executed on the associated modules first and then the appropriate exercisers are run.


3.4 Testing the Entire System

The test command with no modifiers runs all exercisers for subsystems and devices on the system.

Example 3–4 Sample Test Command for the Entire System

P00>>>test

Console is in diagnostic mode

Complete Test Suite for runtime of 1200 seconds

Type ^C to stop testing

Configuring system...

:

:

Memory Tests not run. Must run separately using TEST MS7CC*

Starting network exerciser on ewa0.0.0.12.0 (id #28f) in internal loopback mode

Starting network exerciser on ewb0.0.0.11.0 (id #2a1) in internal loopback mode

Starting network exerciser on ewc0.0.0.12.4 (id #2b3) in internal loopback mode

Starting network exerciser on ewd0.0.0.11.4 (id #2c5) in internal loopback mode

Starting device exerciser on dka0.0.0.4.0 (id #36f) in READ-ONLY mode

Stopping device exerciser on dka0.0.0.4.0 (id #36f)

Starting device exerciser on dka100.1.0.4.0 (id #5df) in READ-ONLY mode

Stopping device exerciser on dka100.1.0.4.0 (id #5df)

Starting device exerciser on dka200.2.0.4.0 (id #858) in READ-ONLY mode

Stopping device exerciser on dka200.2.0.4.0 (id #858)

Starting device exerciser on dka300.3.0.4.0 (id #acc) in READ-ONLY mode

Stopping device exerciser on dka300.3.0.4.0 (id #acc)

Starting device exerciser on dka400.4.0.4.0 (id #d37) in READ-ONLY mode

Stopping device exerciser on dka400.4.0.4.0 (id #d37)

Stopping all testing... please wait

Stopping network exerciser on ewd0.0.0.11.4 (id #2c5)

Stopping network exerciser on ewc0.0.0.12.4 (id #2b3)

Stopping network exerciser on ewb0.0.0.11.0 (id #2a1)

Stopping network exerciser on ewa0.0.0.12.0 (id #28f)

---------Testing done ------------


Shutting down drivers...

Shutting down units on tulip2, slot 12, bus 0, hose 4...

Shutting down units on floppy1, slot 0, bus 1, hose 4...

Shutting down units on isp4, slot 6, bus 0, hose 4...

Shutting down units on isp5, slot 7, bus 0, hose 4...

Shutting down units on isp6, slot 8, bus 0, hose 4...

Shutting down units on isp7, slot 9, bus 0, hose 4...

Shutting down units on isp8, slot 10, bus 0, hose 4...

Shutting down units on tulip3, slot 11, bus 0, hose 4...

Shutting down units on tulip0, slot 12, bus 0, hose 0...

Shutting down units on floppy0, slot 0, bus 1, hose 0...

Shutting down units on isp0, slot 4, bus 0, hose 0...

Shutting down units on isp1, slot 6, bus 0, hose 0...

Shutting down units on isp2, slot 7, bus 0, hose 0...

Shutting down units on isp3, slot 8, bus 0, hose 0...

Shutting down units on tulip1, slot 11, bus 0, hose 0...

:

:

P00>>>

➊ In Example 3-4, the operator enters the test command. The complete test suite runs for 1200 seconds.

➋ To stop execution of the test command before normal completion, use Ctrl/C (^C). Termination using ^C may take a number of seconds depending upon the particular configuration being tested.

➌ Memory testing is done separately. Status messages indicate the start of the console-based exercisers.

➍ Testing is complete.

➎ All exercisers are stopped, as indicated by the status messages.

➏ The console prompt returns.


3.5 Sample Test Command for a Memory Module

To test a processor, memory module, or an I/O adapter and its associated devices, enter the test command and the correct mnemonic.

Mnemonics are displayed when you enter a show configuration or a show device command.

Example 3–5 Sample Test Command, Memory Test

P00>>> set d_report full

P00>>> test ms*

Console is in diagnostic mode

Memory subsystem test selected for runtime of 1200 seconds

Type Ctrl/C to abort...

**************************************************************

* *

* ALLOW AT LEAST 2 MINUTES OF TESTING TIME FOR EACH GIGABYTE *

* OF MAIN MEMORY *

* *

* SINGLE-BIT ERROR REPORTING IS ENABLED *

* *

**************************************************************

Starting Cache Coherency Tests

Starting Marching 1’s and 0’s Tests

Memory size is 8192 MB

More than 2 GB memory present ... memory size is 1FFE

Starting Victimize Tests

>2 GB memory testing beginning ...

Starting test 4 at addresses 7F400000 and 10F800000

Starting test 2 at addresses 13F900000 and 16FA00000

Starting test 2 at addresses AF500000 and 19FB00000

Still testing Memory...

Still testing Memory...

Still testing Memory...

:

:

Still testing Memory...

Still testing Memory...

Stopping all testing... please wait

---------Testing done ------------


Shutting down drivers...

Shutting down units on tulip2, slot 12, bus 0, hose 4...

Shutting down units on floppy1, slot 0, bus 1, hose 4...

Shutting down units on isp4, slot 6, bus 0, hose 4...

Shutting down units on isp5, slot 7, bus 0, hose 4...

Shutting down units on isp6, slot 8, bus 0, hose 4...

Shutting down units on isp7, slot 9, bus 0, hose 4...

Shutting down units on isp8, slot 10, bus 0, hose 4...

Shutting down units on tulip3, slot 11, bus 0, hose 4...

Shutting down units on tulip0, slot 12, bus 0, hose 0...

Shutting down units on floppy0, slot 0, bus 1, hose 0...

Shutting down units on isp0, slot 4, bus 0, hose 0...

P00>>>

:

:

In Example 3-5:

➊ Enter test ms*.

➋ All MS7CC memory modules are tested by the memory exerciser, a series of tests executed from the processor module.

NOTE: To test a single memory module on your system, type:

test ms7ccn, where n is the module number.
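The banner in Example 3-5 asks for at least 2 minutes of testing time per gigabyte of main memory. A small calculation shows whether a given run time is long enough. This is a sketch with hypothetical names; it only does the arithmetic and does not issue any console command.

```python
import math

MINUTES_PER_GB = 2  # from the banner printed by the memory test


def min_memory_test_seconds(memory_gb):
    """Minimum recommended memory test time, in seconds."""
    return math.ceil(memory_gb * MINUTES_PER_GB * 60)


# 8192 MB, as reported in Example 3-5, is 8 GB.
print(min_memory_test_seconds(8))   # 960
# The 12-GB configuration shown in Example 3-1.
print(min_memory_test_seconds(12))  # 1440
```

For the 8-GB system in Example 3-5, the 1200-second run time shown covers the 960-second minimum. A 12-GB system would need at least 1440 seconds.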


3.6 Identifying a Failing SIMM

From the console, you can check for flawed or poorly seated SIMMs in memory boards. This check is useful as a simple on-site test during a service call, or as a validation procedure after upgrading a memory or after adding or changing SIMMs for any reason. Failing SIMMs are also reported in the error log (see Chapter 4).

Example 3–6 Console Mode: No Failing SIMMS

P00>>> set simm_callout on

P00>>> init

Initializing. . .

WARNING: SIMM_CALLOUT environment variable is ON

   F   E   D   C   B   A   9   8   7   6   5   4   3   2   1   0  NODE #
                               A   M   M   M   .   .   P   P   P  TYP
                               o   +   +   +   .   .  ++  ++  ++  ST1
                               .   .   .   .   .   .  EE  EE  EB  BPD
                               o   +   +   +   .   .  ++  ++  ++  ST2
                               .   .   .   .   .   .  EE  EE  EB  BPD
                               o   +   +   +   .   .  ++  ++  ++  ST3
                               .   .   .   .   .   .  EE  EE  EB  BPD
                   +   +   +   +   +   +   +   .   .   .   .   +  C0 PCI +
                                   .   .   .   .   .   .   .   .  EISA +
   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .  C1
   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .  C2
   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .  C3
                                  B0  A1  A0   .   .   .   .   .  ILV
                               . 4GB 4GB 4GB   .   .   .   .   .  12GB

Compaq AlphaServer GS60E2-6/700/8, Console V5.5-25 26-OCT-1999 12:06:03

SROM V2.3, OpenVMS PALcode V1.68-101, Tru64 UNIX PALcode V1.61-101

System Serial = NI84177052, OS = OpenVMS, 3:11:57 December 7, 1999

:

P00>>> show simm

No selftest errors found on any memory modules!

P00>>> set simm_callout off

P00>>> init

Initializing. . .


The set simm_callout on command sets an internal environment variable that enables code that isolates failing SIMMs during memory testing. With this variable enabled, system self-test can take up to 40 seconds longer if a faulty SIMM is present.

The init command initializes the system and prints the console map.

This line in the console display notes that the SIMM callout environment variable is on.

The show simm command requests a display of faulty SIMMs.

In Example 3-6, no faulty SIMMs were found.

The set simm_callout off command turns off the environment variable that enabled callout of faulty SIMMs.

The init command initializes the system in normal mode.

Example 3-7 shows a show simm command that calls out some failing SIMMs.

Section 5.1.5 tells how to locate, remove, and replace SIMMs in a memory module.

Example 3-7 Console Mode: Failing SIMMS Found

.

.

.

P01>>> show simm

The following SIMMs are faulty on memory module in slot 7

J30 J31

The set simm_callout on and init commands are omitted here for brevity.

The show simm command requests a display of faulty SIMMs.

SIMMs numbered J30 and J31 on the memory module in slot 7 are found to be faulty.
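If console sessions are logged to a file, the show simm output can be turned into a list of SIMMs to pull. The parser below is a sketch based only on the two message forms shown in Examples 3-6 and 3-7; the function name is hypothetical, and other firmware versions may word the messages differently.

```python
import re


def parse_show_simm(output):
    """Return {slot_number: [SIMM connector names]} from show simm output.

    An empty dictionary means no faulty SIMMs were reported.
    """
    faulty = {}
    slot = None
    for raw_line in output.splitlines():
        line = raw_line.strip()
        match = re.search(r"faulty on memory module in slot (\d+)", line)
        if match:
            slot = int(match.group(1))
            faulty[slot] = []
        elif slot is not None and re.fullmatch(r"(J\d+\s*)+", line):
            # A line such as "J30 J31" lists connectors for the current slot.
            faulty[slot].extend(line.split())
    return faulty


sample = """P01>>> show simm
The following SIMMs are faulty on memory module in slot 7

J30 J31
"""
print(parse_show_simm(sample))  # {7: ['J30', 'J31']}
```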


3.7 Info Command

The info command provides information useful in debugging the system. Some of the information it provides can be useful for isolating FRUs in the field.

Example 3–8 Examples of the Info Command

P00>>> info
 0. About the console
 1. Bitmap
 2. PAL symbols
 3. IMPURE area (abbreviated)
 4. IMPURE area (full)
 5. TLSB Registers
 6. GBUS
 7. LOGOUT area
 8. Per Cpu HWRPB areas
 9. LAMB registers
10. TLSB register addresses
11. Page Tables
12. FRU table
13. Console internals
14. Supported devices
15. Console SCB
16. PCIA

Enter selection: 5

Node0 Node1 Node 7 Node8

KN7CG-AB MS7CC MS7CC KFTHA

Base adr 88000000 88800000 89c00000 8a000000

TLDEV 00005000 00008014 00002020 00002000

TLBER 00100000 00800000 00000000 00000000

TLCNR 000fc200 00000220 00000170 00000180

TLVID 00000080 00000054

TLMMR0 00008014 80000010 80000010

TLMMR1 00008014 00000000 00000000

TLMMR2 00008014 00000000 00000000

TLMMR3 00008014 00000000 00000000

TLMMR4 00008014 00000000 00000000

TLMMR5 00008014 00000000 00000000

TLMMR6 00008014 00000000 00000000

TLMMR7 00008014 00000000 00000000


TLFADR0 0011ab00 00000000 00000000

TLFADR1 07050000 00000000 00000000

TLESR0 00000303 00400303 00000000 00000000

TLESR1 00000c0c 00400c0c 00000000 00000000

TLESR2 00006060 00406060 00000000 00000000

TLESR3 00009090 00409090 00000000 00000000

TLILID0 00000000 00000000

Node0 Node1 Node 7 Node8

KN7CG-AB MS7CC MS7CC KFTHA

TLILID1 00000000 00000000

TLILID2 00000000 00000000

TLILID3 00000000 00000000

TLCPUMASK 00000010 00000010

.

.

.

P00>>> info 5 | grep TLBER

TLBER 00100000 00800000 00000000 00000000

P00>>> info 5 | grep TLMMR*

TLMMR0 00008014 80000010 80000010

TLMMR1 00008014 00000000 00000000

TLMMR2 00008014 00000000 00000000

TLMMR3 00008014 00000000 00000000

TLMMR4 00008014 00000000 00000000

TLMMR5 00008014 00000000 00000000

TLMMR6 00008014 00000000 00000000

TLMMR7 00008014 00000000 00000000

P00>>>

The info command lists options available. (This list may change.)

The bitmap, HWRPB, and FRU table options only provide relevant information after the operating system has been running and halted with Ctrl/P to return to console mode.

The user enters the selection 5 for a listing of TLSB registers.

The listing of bus registers continues for several pages; this is only the first page and a half to show that bus registers for all the modules are listed.

The console supports the UNIX concept of “piping.” Here, an info command requesting a listing of TLSB registers is piped into a grep command, which prints all lines produced by info 5 that contain TLBER.

This is another example of UNIX-type piping, showing the grep command with a “wildcard” (*), in which all lines produced by the info 5 command beginning with TLMMR are printed.
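The same filtering can be applied off line to a saved copy of the info 5 listing. The sketch below is not part of the console: `grep_registers` is a hypothetical helper that keeps rows whose register name begins with a given prefix and converts the hexadecimal values to integers.

```python
def grep_registers(listing, prefix):
    """Return {register_name: [values]} for rows starting with prefix."""
    rows = {}
    for line in listing.splitlines():
        fields = line.split()
        if not fields or not fields[0].startswith(prefix):
            continue
        try:
            rows[fields[0]] = [int(value, 16) for value in fields[1:]]
        except ValueError:
            # Not a register row (for example, a header line); skip it.
            continue
    return rows


listing = """TLDEV 00005000 00008014 00002020 00002000
TLBER 00100000 00800000 00000000 00000000
TLMMR0 00008014 80000010 80000010
"""
for name, values in grep_registers(listing, "TLBER").items():
    print(name, [hex(v) for v in values])
```

Because rows with fewer values than modules do not say which column is missing, the helper returns the values in printed order and leaves the column assignment to the reader.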


Chapter 4

DECevent Error Log

This chapter discusses error logs produced by the DECevent bit-to-text translator. Sections include:

• Brief Description of the TLSB Bus

• Producing an Error Log with DECevent

• Getting a Summary Error Log

• Supported Event Types

• Sample Error Log Entries

• Console Halt Conditions


4.1 Brief Description of the TLSB Bus

The error log entries discussed here are specific to the AlphaServer GS60E system. Most of the errors occur during the transmission of commands or data along the TLSB system bus or in buses or storage internal to a particular module.

To understand some of the terms used in the error log, you should understand how data is transferred on the TLSB system bus. The TLSB has two separate buses: a command/address bus and a data bus. Thus, errors can refer to transmissions on either of these buses.

A node that initiates a transaction is called a commander node. The node that responds to the command issued by the commander is called the slave node.

CPUs or I/O nodes are always the commander on memory transactions and can be either the commander or the slave on CSR (control and status register) transactions. Memory nodes are never commander nodes.

4.1.1 Command/Address Bus

Table 4-1 lists the eight address bus commands.

Table 4–1 TLSB Address Bus Commands

TLSB CMD<2:0>   Command             Description
000             No-op               Device that won arbitration nulled the command
001             Victim              Victim
010             Read                Read memory
011             Write               Memory write or write update
100             Read Bank Lock      Read memory bank, lock
101             Write Bank Unlock   Write memory bank, unlock
110             CSR Read            Read CSR data
111             CSR Write           Write CSR data
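When an error log entry reports a raw command code, Table 4-1 can be applied as a lookup. The sketch below is illustrative: the names `TLSB_COMMANDS` and `decode_command` are hypothetical, and the code for 101 is named Write Bank Unlock to match its description (write memory bank, unlock) and the name used in Section 4.1.2.

```python
# TLSB CMD<2:0> values and command names, following Table 4-1.
TLSB_COMMANDS = {
    0b000: "No-op",
    0b001: "Victim",
    0b010: "Read",
    0b011: "Write",
    0b100: "Read Bank Lock",
    0b101: "Write Bank Unlock",
    0b110: "CSR Read",
    0b111: "CSR Write",
}


def decode_command(cmd):
    """Return the command name for a 3-bit TLSB CMD<2:0> value."""
    if not 0 <= cmd <= 0b111:
        raise ValueError("TLSB CMD<2:0> is a 3-bit field")
    return TLSB_COMMANDS[cmd]


print(decode_command(0b110))  # CSR Read
```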


4.1.2 Data Bus

The TLSB transfers data in the order in which valid address bus commands are issued. In addition to 256 bits of data, the data bus contains associated ECC bits and some control signals. Three signals are of particular significance in read and write operations.

TLSB_SHARED – When a request is made to access memory, each CPU notes whether the block of memory is currently resident in cache, and, if so, asserts a signal that the data is shared. Thus, when the slave responds with the data, it asserts the TLSB_SHARED signal on the data bus, so that CPU nodes can take note and make sure that the block being accessed remains valid in the CPU’s cache. This signal is valid when driven in response to Read, Read Bank Lock, Write, and Write Bank Unlock commands.

TLSB_DIRTY – This signal is used to indicate that the block being accessed is valid in a CPU cache, and that the copy there is more recent than the copy in memory. TLSB_DIRTY is guaranteed to be valid in response to Read and Read Bank Lock commands.

TLSB_STACHK – This signal is asserted whenever TLSB_SHARED or TLSB_DIRTY is asserted, so that an error in the transmission or reception of either signal can be detected. For example, if TLSB_SHARED or TLSB_DIRTY is asserted, but TLSB_STACHK is not, there is an error. Likewise, if TLSB_STACHK is asserted and neither TLSB_SHARED nor TLSB_DIRTY is, there is also an error.
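The relationship among the three status signals reduces to one comparison. The sketch below is an interpretation of the text, not bus logic taken from a specification: it treats the check signal as consistent exactly when it matches the OR of the other two. The function name is hypothetical.

```python
def status_check_error(shared, dirty, stachk):
    """Return True when TLSB_STACHK disagrees with SHARED/DIRTY.

    Interpretation of the text: STACHK should be asserted if and only if
    TLSB_SHARED or TLSB_DIRTY is asserted.
    """
    return stachk != (shared or dirty)


# SHARED asserted but STACHK missing: a transmission or reception error.
print(status_check_error(shared=True, dirty=False, stachk=False))  # True
```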

4.1.3 Error Checking

The TLSB is designed to implement error detection and, where possible, error correction. The TLSB uses parity protection on the address bus. The data bus is protected by ECC (error correction code). Protocol sequence checking is used on the control signals across both buses. Cache coherency is monitored with the use of the TLSB_SHARED and TLSB_DIRTY signals described above.

PALcode collects error information from module control and status registers and formats it into a “logout frame” that is passed to the operating system, which uses the information to determine the action to take on the error. Some errors are fatal; they can cause a specific process or the entire system to fail. Other errors can be corrected and do not halt processing. The operating system writes the error information as an entry in a binary file that can then be used by the DECevent bit-to-text translator to produce an error log.

4.2 Producing an Error Log with DECevent

The DECevent utility is available for both Tru64 UNIX and OpenVMS operating systems to help diagnose what are called “intermittent errors.” These errors may or may not cause the operating system to crash.

Example 4–1 Producing an Error Log with DECevent

$ diagnose/output=errlog.dat

DECevent Version V3.0

In this example, the error log information is directed to a file called errlog.dat.

If the /output qualifier is not used, the error log information is displayed on the screen of the console terminal.


4.3 Getting a Summary Error Log

Running DECevent with the /summary qualifier is a good way to start analyzing the error log. It gives you a “table of contents” for the error log.

Example 4–2 Summary Error Log

$ diagnose/summary

SUMMARY OF ALL ENTRIES LOGGED ON NODE CLYP01

Unknown major class

New errorlog created

Timestamp

Machine check (670 entry)

Crash Re-start

System startup

Volume mount

Adapter Error

Soft ECC error

1.

3.

7.

2.

3.

3.

4.

1.

4.4 Supported Event Types

The events that DECevent logs can be logged by the CPU modules or one of the TLSB or I/O adapters. (Memory errors are logged by the CPU.)

Table 4–2 Supported Event Types

Event Types            Description
Machine check 670      670 processor checks
Machine check 660      660 system machine checks
630 error interrupts   630 correctable processor checks
620 errors             620 correctable system errors
Extended CRD           Memory single-bit error footprints
Adapter                Adapter is logging entity. Adapters include the KFTHA module and the DWLPB motherboard.

Example 4-3 and Example 4-4 show a Tru64 UNIX entry for a 670-type machine check and an OpenVMS 620 error entry for a CRD (corrected read data) error.

The boxes enclose the area that identifies the event type.

Example 4-3 OSF Event Type Identification

*********************** ENTRY 1 **************************

Logging OS 2. DIGITAL UNIX

System Architecture 2. ALPHA

Event sequence number 1.

Timestamp of occurrence 21-OCT-1999 16:57:19

Host name clyp01

AXP HW model AlphaServer GS60E

Number of CPUs (mpnum) x0000002

CPU logging event (mperr) x0000006

Event validity 1. Valid

Entry type 100. CPU Machine Check Errors

CPU Minor class 1. Machine check (670 entry)

Event severity 1. Severe Priority

Example 4-4 OpenVMS Event Type Identification

********************** ENTRY 124 ************************

Logging OS 1. OpenVMS

System Architecture 2. ALPHA

OS version V7.2-1

Event sequence number 102.

Timestamp of occurrence 2-NOV-1999 17:45:05

Host name CLYP01

AXP HW model AlphaServer GS60E

Number of CPUs (mpnum) x0000005

CPU logging event (mperr) x0000006

Entry type 14. CRD log

Memory Minor class 2. CRD Entry
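The event-type lines in the two examples can also be picked out by a script. The following is a minimal sketch in Python, assuming each header field sits on one line in the form "label code. description"; the function name and the sample text are illustrative only and are not part of DECevent.

```python
import re

# Header lines in the form used by Example 4-3, one field per line
# (layout is an assumption about the translated log text).
SAMPLE_ENTRY = """\
Logging OS 2. DIGITAL UNIX
Entry type 100. CPU Machine Check Errors
CPU Minor class 1. Machine check (670 entry)
Event severity 1. Severe Priority
"""

# Labels that carry the event type in Examples 4-3 and 4-4.
TYPE_LABELS = ("Entry type", "CPU Minor class", "Memory Minor class")


def event_type_fields(entry_text):
    """Return {label: (code, description)} for the event-type lines."""
    fields = {}
    for line in entry_text.splitlines():
        for label in TYPE_LABELS:
            pattern = re.escape(label) + r"\s+(\d+)\.\s+(.*)"
            match = re.match(pattern, line.strip())
            if match:
                fields[label] = (int(match.group(1)), match.group(2).strip())
    return fields


print(event_type_fields(SAMPLE_ENTRY))
```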

4.5 Sample Error Log Entries

4.5.1 Machine Check 660 Error

You can identify problem FRUs in an error log entry by checking the contents of the registers against the parse trees.

The following steps (relating to the callouts in Example 4-5) isolate the error and the FRU most likely responsible.

Table 4–3 Parsing a Sample 660 Error (Example 4-5)

1. This line identifies the error log entry as a machine check 660 error.

2. The parse tree for machine check 660 errors starts with the C_STAT register. DOUBLE BIT FILL ERR is set.

3. The TLBER register is next in the parse tree. UNCORRECTABLE DATA ERROR is set.

4. The TLBER register on the memory module is set to UNCORRECTABLE DATA ERROR, indicating that the source of the 660 is a memory module.
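The walk through the parse tree above can be sketched in code. This is a minimal Python illustration, not the parse tree itself: the bit masks are inferred from the decoded register values in the sample entries of this chapter rather than taken from a register specification, and the function and variable names are hypothetical.

```python
# Masks inferred from the decoded sample entries (assumptions):
# C_STAT Bits<04:00> = Bx10000 is printed as DOUBLE BIT FILL ERR, and
# TLBER x00110000 is printed as UNCORRECTABLE DATA ERROR + Data Syndrome 0.
C_STAT_DOUBLE_BIT_FILL = 0b10000
TLBER_UDE = 0x00010000


def parse_660(c_stat, nodes):
    """Follow the steps above and return suspect memory node numbers.

    nodes maps TLSB node number -> (device kind, TLBER value).
    """
    if (c_stat & 0x1F) != C_STAT_DOUBLE_BIT_FILL:
        return []  # a different branch of the parse tree applies
    cpu_sees_error = any(
        kind == "cpu" and tlber & TLBER_UDE for kind, tlber in nodes.values()
    )
    if not cpu_sees_error:
        return []
    return sorted(
        node
        for node, (kind, tlber) in nodes.items()
        if kind == "memory" and tlber & TLBER_UDE
    )


# C_STAT and TLBER values as printed in Example 4-5.
example_4_5 = {
    0: ("cpu", 0x00110000),
    4: ("memory", 0x00800000),
    5: ("memory", 0x01110000),
    6: ("memory", 0x00800000),
    7: ("memory", 0x00800000),
    8: ("io", 0x00100000),
}
print(parse_660(0x10, example_4_5))  # [5]
```

With the values from Example 4-5, only the memory module at TLSB node 5 has the uncorrectable data error bit set, which matches the conclusion of the last step.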

Example 4-5 Sample Machine Check 660 Error Log Entry

**************** ENTRY 1 ***********************

Logging OS 2. Digital UNIX

System Architecture 2. Alpha

Event sequence number 8.

Timestamp of occurrence 01-OCT-1999 22:12:32

Host name clyp01

System type register x0000000C AlphaServer GS60E67/700

Number of CPUs (mpnum) x00000002

CPU logging event (mperr) x00000000

Event validity 1. O/S claims event is valid

Event severity 1. Severe Priority

Entry type 100. Machine Check Error - (major class)

1. - (minor class)

-- TLaser MCHK 660 --

Software Flags x00000001 TLSB Error Log Snapshot

Packet Present

Active CPUs x00000003

Hardware Rev x00000000

System Serial Number 12345678

Module Serial Number NI81000080

System Revision x00000000

MCHK Reason Mask x0000FFF0

MCHK Frame Rev x00000001

MCHK Frame Rev: 1.0

- CPU Registers -

I_STAT x0000000000000000

Bits<31:29> Bx000 - NO Error Detected

DC_STAT x0000000000000000

Bits<04:00> Bx00000 - NO Error Detected

C_ADDR x000000004C832000

Address of last reported x0000000001320C80

DC1_SYNDROME x0000000000000000

DC0_SYNDROME x00000000000000D4

C_STAT x0000000000000010

Bits<04:00> Bx10000 DOUBLE BIT FILL ERR

C_STS x0000000000000002

Bits<03:00> Bx0010 INIT mode - Dirty

MM_STAT x0000000000000280

OPCODE x0000000000000028

Dcache Parity: OK

EXC_ADDR xFFFFFFFFB44CCB50

NO Bits Set

Addr Field_1 Bits<31:02> x000000002D1332D4

Addr Field_2 Bits<63:32> x00000000FFFFFFFF

IER_CM x0000007EE0000000

NO Bits Set

Current Mode 00 Kernel

AST Interrupt Enabled x0000000000000000

Software Interrupts Enb: x0000000000000000

Corr Read Error Intr Enb

Serial Line Intr Dis

EIEN Interrupt: x000000000000003F

I_SUM x0000000000000000

NO Bits Set

AST Interrupts NO AST Bits Set

Software Interrupts x0000000000000000

Performance Cnt Interrupt x0000000000000000

Corr Read Error Intr Dis

Serial Line Intr Dis

EIEN Interrupts: x0000000000000000

PAL_Base x0000000000020000

Base address of PAL Code: x0000000000000004

I_CTL xFFFFFFFC03300396

System Performance Counter Dsb

Icache Set enabled x0000000000000003

Super page Mode Bits x0000000000000002

I-Stream Buffer Enable 3.

I-Stream Buffer Enable DBP based on state

of chooser

Branches chosen

PALRES Inst NOT executed in Kernel Mode

VA_48, 43 Bit Virtual Address used

VA_FORM_32, Bit NOT Set

Single_Issue_L Bottom Up

Performance Counter 0 Disabled

Performance Counter 1 Disabled

CALL_PAL link Reg is R23

MCHK Check Enabled

Processor ID EV6 - Pass 2.3

VPTB Bits<47:30> x000000000003FFF0

VPTB Bits<63:48> x000000000000FFFF

PCTX x0000628000000004

Floating Point Enb

ASTER 00 Kernel

ASTRR 00 Kernel

- System Registers -

WHAMI x0000 TLSB Node ID 0.

CPU0

MISCR x00D5 Bcache Size: 4 Mbyte

Two Processors

TLSB RUN Signal

CPU0 Running console

CPU1 Running console

TLDEV x80008025 -- Device Type: Dual EV6 Proc, 525Mhz,

4meg Bcache

TLBER x00110000 UNCORRECTABLE DATA ERROR

Data Syndrome 0

TLCNR x00000200

TLVID x00000010

TLESR0 x0008D4D4 SYND0 x000000D4

SYND1 x000000D4

UNCORRECTABLE ECC ERROR

TLESR1 x00000300 SYND0 x00000000

SYND1 x00000003

TLESR2 x00000300 SYND0 x00000000

SYND1 x00000003

TLESR3 x00000300 SYND0 x00000000

SYND1 x00000003

TLMODCONFIG0 x00700B80 DPQ MAX Entries x00000007

enable fast fills

BQ_MAX_ENTRIES 7

Bcache size = 4MB

TLMODCONFIG1 x08B00111 Overtake Enabled

P0 Reqest ID line 0

P1 Reqest ID line 1

TLMBPR_RETRY_Count 2**10 retries - 6.0us

on idle system (min)

DISABLE PROBE Number 0

tbc fast path disabled

dm_dslb_prio - fills, probes, victims or wrio

en_fst_vq

en_fst_prq

en_fts_writes

TCCERR x00011800 TCC Chip Revision x00000001

TDIERR x00000000

INTR MASK 0 x000001FF duart0 interrupt enable

ipl 14 interrupt enable

ipl 15 interrupt enable

ipl 16 interrupt enable

ipl 17 interrupt enable

ip enable

intim enable

CPU halt enable

control/p halt enable

INTR MASK 1 x000000FE ipl 14 interrupt enable

ipl 15 interrupt enable

ipl 16 interrupt enable

ipl 17 interrupt enable

ip enable

intim enable

CPU halt enable

INTR SUM 0 x00000000

INTR SUM 1 x00000000

TLEP VMG x00000000

TLEPWERR0 x00000380

TLEPWERR1 x00047804

TLEPWERR2 x0006E680

TLEPWERR3 x00047810

CPU0 Last Win Sp Access x000000C780400380

Pending Bit=0, Address NOT LATCHED/NOT VALID

CPU1 Last Win Sp Access x000000C78106E680

Pending Bit=0, Address NOT LATCHED/NOT VALID

Palcode Revision x0000000400000402

Palcode Rev: 4.2-4

TLSB Base Adr x0000000000000000

*TLaser CPU Registers*

TLSB Node Number 0.

TLDEV x80008025 -- Device Type: Dual EV6 Proc, 525Mhz,

4meg Bcache

TLBER x00110000 UNCORRECTABLE DATA ERROR

Data Syndrome 0

TLCNR x00000200

TLVID x00000010

TLESR0 x0008D4D4 SYND0 x000000D4

SYND1 x000000D4

UNCORRECTABLE ECC ERROR

TLESR1 x00000300 SYND0 x00000000

SYND1 x00000003

TLESR2 x00000300 SYND0 x00000000

SYND1 x00000003

TLESR3 x00000300 SYND0 x00000000

SYND1 x00000003

MODCONFIG0 x00700B80 DPQ MAX Entries x00000007

enable fast fills

BQ_MAX_ENTRIES 7

Bcache size = 4MB

MODCONFIG1 x08B00111 Overtake Enabled

P0 Reqest ID line 0

P1 Reqest ID line 1

TLMBPR_RETRY_Count 2**10 retries - 6.0us

on idle system (min)

DISABLE PROBE Number 0

tbc fast path disabled

dm_dslb_prio - fills, probes, victims or

wrio

en_fst_vq

en_fst_prq

en_fts_writes

TCCERR x00011800 TCC Chip Revision x00000001

TDIERR x00000000

INTRMASK0 x000000FE ipl 14 interrupt enable

ipl 15 interrupt enable

ipl 16 interrupt enable

ipl 17 interrupt enable

ip enable

intim enable

CPU halt enable

INTRMASK1 x00000000

TLEP Interrupt Sum 0 x00000000

TLEP Interrupt Sum 1 x00000000

TLEP VMG x00000000

TLEPWERR0 x00000000

TLEPWERR1 x00000000

TLEPWERR2 x00000000

TLEPWERR3 x00047810

* TLaser Memory Regs *

TLSB Node Number 4.

TLDEV x00005000 -- Device Type: Memory

-- Module Revision: x00000000

TLBER x00800000

TLCNR x000FC240

TLVID x00000080

FADR 0 x0002000000300010

FADR 1 x00020000

TLESR0 x00000300

TLESR1 x00000300

TLESR2 x00000300

TLESR3 x00000300

TMIR x80000002 Interleave x00000002

TMCR x00000205 512MB Module (E2035-DA)

16 MB DRAM

60ns DRAM

Strings Installed = 2

DRAM timing: Bus Spd = 10.0-11.2

Refresh Cnt = 1360

TMER x00000000 Failing String = x00000000

TMDRA x00000000 Refresh Rate 1X

TDDR0 x00000000

TDDR1 x00000000

TDDR2 x00000000

TDDR3 x00000000

* TLaser Memory Regs *

TLSB Node Number 5.

TLDEV x00005000 -- Device Type: Memory

-- Module Revision: x00000000

TLBER x01110000 UNCORRECTABLE DATA ERROR

DATA SYNDROME 0

DATA TRANSMITTER DURING ERROR

TLCNR x000FC250

TLVID x000000A2

FADR x072200004DC32000

FADR 1 x07220000 Failing Command: Read

Failing Bank = Bank 2

TLESR0 x0009D4D4 ECC Syndrome 0 x000000D4

ECC Syndrome 1 x000000D4

TRANSMITTER DURING ERROR

UNCORRECTABLE ECC ERROR

TLESR1 x00000300

TLESR2 x00000300

TLESR3 x00000300

TMIR x80000002 Interleave x00000002

TMCR x00000208 256MB Module (E2035-CA)

4 MB DRAM

60ns DRAM

Strings Installed = 4

DRAM timing: Bus Spd = 10.0-11.2

Refresh Cnt = 1360

TMER x00000000 Failing String = x00000000

TMDRA x10000000 Refresh Rate 2X Default

TDDR0 x0000C300

TDDR1 x00000000

TDDR2 x00000000

TDDR3 x00000000

* TLaser Memory Regs *

TLSB Node Number 6.

TLDEV x02045000 -- Device Type: Memory

-- Module Revision: x00000204

TLBER x00800000

TLCNR x000FC260

TLVID x000000B3

FADR 0 x0032000000300010

FADR 1 x00320000

TLESR0 x00000300

TLESR1 x00000300

TLESR2 x00000300

TLESR3 x00000300

TMIR x80000002 Interleave x00000002

TMCR x00000208 256MB Module (E2035-CA)

4 MB DRAM

60ns DRAM

Strings Installed = 4

DRAM timing: Bus Spd = 10.0-11.2

Refresh Cnt = 1360

TMER x00000000 Failing String = x00000000

TMDRA x00000000 Refresh Rate 1X

TDDR0 x0000000

TDDR1 x00000000

TDDR2 x00000000

TDDR3 x00000000

* TLaser Memory Regs *

TLSB Node Number 7.

TLDEV x02045000 -- Device Type: Memory

-- Module Revision: x00000204

TLBER x00800000

TLCNR x000FC270

TLVID x00000091

FADR 0 x0012000000300010

FADR 1 x00120000

TLESR0 x00000300

TLESR1 x00000300

TLESR2 x00000300

TLESR3 x00000300

TMIR x80000002 Interleave x00000002

TMCR x00000205 512MB Module (E2035-DA)

16 MB DRAM

60ns DRAM

Strings Installed = 2

DRAM timing: Bus Spd = 10.0-11.2

Refresh Cnt = 1360

TMER x00000000 Failing String = x00000000

TMDRA x00000000 Refresh Rate 1X

TDDR0 x00000000

TDDR1 x00000000

TDDR2 x00000000

TDDR3 x00000000

* TLaser I/O Registers *

TLSB Node Number 8.

TLDEV x00002000 -- Device Type: I/O Module

TLBER x00100000

FADR 0 x0000000000000000

FADR 1 x00000000

TLESR0 x00000000

TLESR1 x00000000

TLESR2 x00000000

TLESR3 x00000000

CPU Interrupt Mask x00000001 Cpu Interrupt Mask = x00000001

ICCMSR x00000000 Arbitration Control Minimum Latency Mode

Suppress Control Suppress after 16

Translations

ICCNSE x80000000 Interrupt Enable on NSES Set

ICCMTR x00000000

IDPNSE-0 x00000000

IDPNSE-1 x00000006 Hose Power OK

Hose Cable OK

IDPNSE-2 x00000000

IDPNSE-3 x00000000

IDPVR x00000800

ICCWTR x00000000

TLMBPR x0000000000000000

IDPDR0 x00000000

IDPDR1 x20000000

IDPDR2 x00000000

IDPDR3 x00000000

4.5.2 Machine Check 620 Error

Machine check 620 errors are nearly always soft errors; that is, they do not cause the system to crash. Correctable write data errors (CWDE) on CSR writes are the exception.

Example 4-6 shows a sample machine check 620 error. In this case, all nodes on the TLSB are presented in the error log entry. The steps in Table 4-4 isolate the error and the FRU most likely responsible.

Table 4–4 Parsing a Sample 620 Error (Example 4-6)

1. This line identifies the error log entry as a machine check 620 error.

2. The parse tree for machine check 620 errors starts with the DC_STAT register. The next branch on the parse tree is C_STAT. DSTREAM_MEM_ERR is set.

3. The TLBER register is next in the parse tree. CORRECTABLE READ DATA ERROR is set.

4. The TLBER register on the memory module is next in the parse tree. CORRECTABLE READ DATA ERROR is set.

5. The error log identifies the SIMM where the error occurred as J22. UNIX lists each occurrence of a corrected read data error. Before replacing the SIMM, you would probably want to examine other 620 entries to see if the error on SIMM J22 was repeated.
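Checking whether the same SIMM recurs across 620 entries can be done by tallying the "Failing SIMM Number" lines in the translated log. The following is a minimal Python sketch, assuming the translated text uses the line format shown in Example 4-6; the function name is hypothetical, and the J7 line in the sample is invented purely to show a second SIMM.

```python
import re
from collections import Counter


def count_failing_simms(log_text):
    """Count how often each SIMM is named in translated 620 entries."""
    return Counter(re.findall(r"Failing SIMM Number = (J\d+)", log_text))


# The first two lines follow Example 4-6; the third is a hypothetical
# line from some other entry, included only for illustration.
sample = """\
ECC Code xD5 Failing SIMM Number = J22
Second ECC Code xD5 Failing SIMM Number = J22
ECC Code xD4 Failing SIMM Number = J7
"""
counts = count_failing_simms(sample)
print(counts.most_common())  # [('J22', 2), ('J7', 1)]
```

Note that a single entry can name the same SIMM twice (once per ECC code), so the tally counts mentions rather than entries.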

Example 4-6 Sample Machine Check 620 Error Log Entry

**** T3.1 ****** ENTRY 1 ***********************

Logging OS 2. Digital UNIX

System Architecture 2. Alpha

Event sequence number 2.

Timestamp of occurrence 15-JUN-1999 20:05:32

Host name warp5

System type register x0000000C AlphaServer 8x00

Number of CPUs (mpnum) x00000004

CPU logging event (mperr) x00000002

Event validity 1. O/S claims event is valid

Event severity 5. Low Priority

Entry type 100. Machine Check Error - (major class)

3. - (minor class)

-- TLaser 620 Corr Error

Software Flags x00000001 TLSB Error Log Snapshot

Packet Present

Active CPUs x0000000F

Hardware Rev x00000000

System Serial Number

Module Serial Number SSS

System Revision x00000000

MCHK Reason Mask x00000086

MCHK Frame Rev x00000001

MCHK Frame Rev: 1.0

-- CPU Registers --

I_STAT x0000000800000000

Bits<31:29> Bx000 - NO

Error Detected

DC_STAT x0000000000000008

Bits<04:00> Bx01000 - DCACHE DATA

CORRECTABLE ECC ERROR

(LOAD)

C_ADDRESS x0000000000874000

Address of last reported x0000000000021D00

DC1_SYNDROME x0000000000000000

DC0_SYNDROME x00000000000000D5

C_STAT x0000000000000003

Bits<04:00> Bx00011 DSTREAM_MEM_ERR

C_STS x0000000000000002

Bits<03:00> Bx0010 INIT mode - Dirty

MM_STAT x0000000000000000

OPCODE x0000000000000000

Dcache Parity: OK

-- System Registers --

WHAMI x0002 TLSB Node ID 1.

CPU0

MISCR x00D5 Bcache Size: 4 Mbyte

Two Processors

TLSB RUN Signal

CPU0 Running console

CPU1 Running console

DOF_CNT x00000000

TLDEV xB0008027 -- Device Type: Dual EV67 Proc,

700Mhz,

4meg Bcache

TLBER x00140000 CORRECTABLE READ DATA ERROR

Data Syndrome 0

TLESR0 x0020D5D5 SYND0 x000000D5

SYND1 x000000D5

CORRECTABLE ECC ERROR DURING READ

TLESR1 x00000300 SYND0 x00000000

SYND1 x00000003

TLESR2 x00000300 SYND0 x00000000

SYND1 x00000003

TLESR3 x00000300 SYND0 x00000000

SYND1 x00000003

Palcode Revision x0000001300000504

Palcode Rev: 5.4-19

TLSB Base Adr x0000000000000000

*TLaser CPU Registers*

TLSB Node Number 0.

TLDEV x80008025 -- Device Type: Dual EV6 Proc,

525Mhz,

4meg Bcache

TLBER x00800000 Data Syndrome 3

TLCNR x00000200

TLVID x00000010

TLESR0 x00000300 SYND0 x00000000

SYND1 x00000003

TLESR1 x00000300 SYND0 x00000000

SYND1 x00000003

TLESR2 x00000300 SYND0 x00000000

SYND1 x00000003

TLESR3 x00000300 SYND0 x00000000

SYND1 x00000003

MODCONFIG0 x00700B80 DPQ MAX Entries x00000007

enable fast fills

BQ_MAX_ENTRIES 7

Bcache size = 4MB

MODCONFIG1 x08B00141 Overtake Enabled

P0 Reqest ID line 0

P1 Reqest ID line 4

MBPR_RETRY_Count 2**10 retries - 6.0us

on idle system (min)

DISABLE PROBE Number 0

tbc fast path disabled

dm_dslb_prio - fills, probes, victims or

wrio

en_fst_vq

en_fst_prq

en_fts_writes

TCCERR x00011800 TCC Chip Revision x00000001

TDIERR x00000000

INTRMASK0 x000000FE ipl 14 interrupt enable

ipl 15 interrupt enable

ipl 16 interrupt enable

ipl 17 interrupt enable

ip enable

intim enable

CPU halt enable

INTRMASK1 x00000000

TLEP Interrupt Sum 0 x00000000

TLEP Interrupt Sum 1 x00000000

TLEP VMG x00000000

TLEPWERR0 x00000000

TLEPWERR1 x00000000

TLEPWERR2 x00000000

TLEPWERR3 x00041FF7

*TLaser CPU Registers*

TLSB Node Number 1.

TLDEV xB0008027 -- Device Type: Dual EV67 Proc,

700Mhz,

4meg Bcache

TLBER x00140000 CORRECTABLE READ DATA ERROR

Data Syndrome 0

TLCNR x00000210

TLVID x00000032

TLESR0 x0020D5D5 SYND0 x000000D5

SYND1 x000000D5

CORRECTABLE ECC ERROR DURING READ

TLESR1 x00000300 SYND0 x00000000

SYND1 x00000003

TLESR2 x00000300 SYND0 x00000000

SYND1 x00000003

TLESR3 x00000300 SYND0 x00000000

SYND1 x00000003

MODCONFIG0 x00700B80 DPQ MAX Entries x00000007

enable fast fills

BQ_MAX_ENTRIES 7

Bcache size = 4MB

MODCONFIG1 x08B00153 Overtake Enabled

P0 Reqest ID line 1

P1 Reqest ID line 5

TLMBPR_RETRY_Count 2**10 retries - 6.0us

on idle system (min)

DISABLE PROBE Number 0

tbc fast path disabled

dm_dslb_prio - fills, probes, victims or

wrio

en_fst_vq

en_fst_prq

en_fts_writes

TCCERR x00011800 TCC Chip Revision x00000001

TDIERR x00000000

INTRMASK0 x000000FE ipl 14 interrupt enable

ipl 15 interrupt enable

ipl 16 interrupt enable

ipl 17 interrupt enable

ip enable

intim enable

CPU halt enable

INTRMASK1 x00000000

TLEP Interrupt Sum 0 x00000000

TLEP Interrupt Sum 1 x00000000

TLEP VMG x00000000

TLEPWERR0 x00000000

TLEPWERR1 x00000000

TLEPWERR2 x00000000

TLEPWERR3 x00041FF7

* TLaser Memory Regs *

TLSB Node Number 4.

TLDEV x00005000 -- Device Type: Memory

-- Module Revision: x00000000

TLBER x01140000 CORRECTABLE READ DATA ERROR

DATA SYNDROME 0

DATA TRANSMITTER DURING

ERROR

TLCNR x000FC240

TLVID x00000080

FADR x0702000000874000

FADR 1 x07020000 Failing Command: Read

Failing Bank = Bank 0

TLESR0 x0021D5D5 ECC Syndrome 0 x000000D5

CC Syndrome 1 x000000D5

TRANSMITTER DURING ERROR

CORRECTABLE READ ECC ERROR

ECC Code xD5 Failing SIMM Number = J22

Second ECC Code xD5 Failing SIMM Number = J22

TLESR1 x00000300

TLESR2 x00000300

TLESR3 x00000300

TMIR x80000001 Interleave x00000001

TMCR x0000020D 2GB Module (E2036-AA)

16 MB DRAM

60ns DRAM

Strings Installed = 8

DRAM timing: Bus Spd = 10.0-11.2

Refresh Cnt = 1360

TMER x00000000 Failing String = x00000000

TMDRA x00000000 Refresh Rate 1X

TDDR0 x00000000

TDDR1 x00000000

TDDR2 x00000000

TDDR3 x00000000

* TLaser I/O Registers *

TLSB Node Number 8.

TLDEV x00002020 -- Device Type:

Integrated I/O Module

TLBER x00000000

FADR 0 x0000000000000000

FADR 1 x00000000

TLESR0 x00000000

TLESR1 x00000000

TLESR2 x00000000

TLESR3 x00000000

CPU Interrupt Mask x00000001 Cpu Interrupt Mask = x00000001

ICCMSR x00000000 Arbitration Control Minimum Latency Mode

Suppress Control Suppress after 16

Translations

ICCNSE x80000000 Interrupt Enable on NSES Set

ICCMTR x00000002 Mbox Trans in Prog, Hose 1

IDPNSE-0 x00000006 Hose Power OK

Hose Cable OK

IDPNSE-1 x00000006 Hose Power OK

Hose Cable OK

IDPNSE-2 x00000000

IDPNSE-3 x00000000

IDPVR x00000800

ICCWTR x00000000

TLMBPR x0000000000000000

IDPDR0 x20000000

IDPDR1 x00000000

IDPDR2 x00000000

IDPDR3 x00000000

4.5.3 DWLPB Motherboard (PCIA) Adapter Error Log

Registers on the DWLPB motherboard are printed in the error log when one of these errors occurs. You use the parse tree for the DWLPB motherboard to determine the most likely FRU.

Example 4-7 shows a sample DWLPB motherboard (PCIA) adapter error. The following steps isolate the error and the FRU most likely responsible.

Table 4–5 Parsing a DWLPB Motherboard Error (Example 4-7)

1. This line identifies the error as a PCIA (DWLPB motherboard) adapter error.

2. The parse tree for the DWLPB motherboard starts with the ERR0 register. No bits are set in this register, so we follow the tree down.

3. The ERR1 register is also all zeros, so we follow the tree down.

4. The ERR2 register’s last digit is 9, indicating that bits 0 and 3 are set. The FRUs identified for this branch of the parse tree are the KFTHA (high probability), the PCIA (DWLPB motherboard, medium probability), and the hose (the I/O cable connecting the KFTHA to the DWLPB motherboard, low probability).
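Reading set bits out of a hexadecimal register value, as done for ERR2 above, can be sketched as follows. This is a minimal Python illustration; the helper name is hypothetical, and the code only lists bit positions without assigning names to them.

```python
def set_bits(value):
    """Return the positions of the bits that are set in a register value."""
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


err2 = 0x00000209  # ERR 2 value printed in Example 4-7

print(set_bits(err2 & 0xF))  # last hex digit 9 -> [0, 3]
print(set_bits(err2))        # whole register   -> [0, 3, 9]
```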

Example 4-7 Sample DWLPB Motherboard Error Log Entry

*********************** ENTRY 1 *************************

Logging OS 1. OpenVMS

System Architecture 2. Alpha

OS version V7.2-1

Event sequence number 140.

Timestamp of occurrence 6-JAN-1999 07:45:32

System uptime in seconds 51.

Flags x0000

Host name CLYP01

Alpha HW model AlphaServer GS60E

Unique CPU ID x00000005

Entry type 28. Adapter Error

SWI Minor class 8. Adapter Error

SWI Minor sub class 5. PCIA

Software Flags x0028000 PCIA Subpacket Present

PCI Bus Snapshot Present

Base Phys Addr of TIOP x000000FF89800000

-Tlaser PCIA Registers-

Channel No.

PCI Slots Present x00000000 Contents of PCI0-Slot 0 No Card

Contents of PCI0-Slot 1 No Card

Contents of PCI0-Slot 2 No Card

Contents of PCI0-Slot 3 No Card

Contents of PCI1-Slot 0 No Card

Contents of PCI1-Slot 1 No Card

Contents of PCI1-Slot 2 No Card

Contents of PCI1-Slot 3 No Card

Contents of PCI2-Slot 0 No Card

Contents of PCI2-Slot 1 No Card

Contents of PCI2-Slot 2 No Card

Contents of PCI2-Slot 3 No Card

Module Revision x00000000

CTL0 x01E00100 Config Cycle Type PCI Type 0

Configuration

Memory Block Size 64 Bytes

PCI Cut Through Threshhold x00000000

IO Space HW Addr Ext. x00000000

Mem Read Mult Pre-fetch S 4 Cache Blocks

I/O Port Up Hose Buffers 3 Buffers (TIOP and IOP)

Scatter/Gather MAP RAM Si 128KB (32K entries-default)

PCI Arbitration Control Round Robin for all Masters

PCI Cut Through Enable

Memory Read Multiple Enable

MRETRY 0 x00400000

➋ ERR 0 x00000000

FADR0 x00000000 DMA Read from Memory

IMask PCI Interrupt 0 x01030000 Error Interrupt Enable

Device Interrupt Priority IPL 14

DIAG0 x00000000 Generate Correct parity

HPC Gate Array Revision=0.

RM Down Hose Translate Ad x00000000

IPEND 0 x00000000

IPROG 0 x00000000 Interrupt Source Slot 0 INTA

Window Mask Reg A0 x007F0000 Window Size = 8 MB

Window Base Reg A0 x00800003 Scatter/Gather Enable

Window Enable

Window Base Address=x00000080

Translation Base Reg A0 x00000000 Trans Base Address=x00000000

Window Mask Reg B0 x3FFF0000 Window Size = 1 GB

Window Base Reg B0 x40000002 Window Enable

Window Base Address=x00004000

Translation Base Reg B0 x00000000 Trans Base Address=x00000000

Window Mask Reg C0 x0FFF0000 Window Size = 256 MB

Window Base Reg C0 xF0000003 Scatter/Gather Enable

Window Enable

Window Base Address=x0000F000

Translation Base Reg C0 x00000000 Trans Base Address=x00000000

Error Vector 0 x00000945 Interrupt Vector x00000945

Dev Vec 0 Slot 0, IntA x00000B70 Interrupt Vector x00000B70

Dev Vec 0 Slot 0, IntB x00000B80 Interrupt Vector x00000B80

Dev Vec 0 Slot 0, IntC x00000B90 Interrupt Vector x00000B90

Dev Vec 0 Slot 0, IntD x00000BA0 Interrupt Vector x00000BA0

Dev Vec 0 Slot 1, IntA x00000905 Interrupt Vector x00000905

Dev Vec 0 Slot 1, IntB x00000BC0 Interrupt Vector x00000BC0

Dev Vec 0 Slot 1, IntC x00000BD0 Interrupt Vector x00000BD0

Dev Vec 0 Slot 1, IntD x00000BE0 Interrupt Vector x00000BE0

Dev Vec 0 Slot 2, IntA x00000BF0 Interrupt Vector x00000BF0

Dev Vec 0 Slot 2, IntB x00000C00 Interrupt Vector x00000C00

Dev Vec 0 Slot 2, IntC x00000C10 Interrupt Vector x00000C10

Dev Vec 0 Slot 2, IntD x00000C20 Interrupt Vector x00000C20

Dev Vec 0 Slot 3, IntA x00000C30 Interrupt Vector x00000C30

Dev Vec 0 Slot 3, IntB x00000C40 Interrupt Vector x00000C40

Dev Vec 0 Slot 3, IntC x00000C50 Interrupt Vector x00000C50

Dev Vec 0 Slot 3, IntD x00000C60 Interrupt Vector x00000C60

CTL 1 x01E00100 Config Cycle Type PCI Type 0

Configuration

Memory Block Size 64 Bytes

PCI Cut Through Threshhold x00000000

IO Space HW Addr Ext. x00000000

Mem Read Mult Pre-fetch S 4 Cache Blocks

I/O Port Up Hose Buffers 3 Buffers (TIOP and IOP)

Scatter/Gather MAP RAM Si 128KB (32K entries-default)

PCI Arbitration Control Round Robin for all Masters

PCI Cut Through Enable

Memory Read Multiple Enable

MRETRY 1 x00400000

➌ ERR 1 x00000000

FADR1 x00000000 DMA Read from Memory

IMask PCI Interrupt 0 x01030000 Error Interrupt Enable

Device Interrupt Priority IPL 14

DIAG1 x00000000 Generate Correct parity

HPC Gate Array Revision=0.

RM Down Hose Translate Ad x00000000

IPEND 1 x00000000

IPROG 1 x00000000 Interrupt Source Slot 0 INTA

Window Mask Reg A1 x007F0000 Window Size = 8 MB

Window Base Reg A1 x00800003 Scatter/Gather Enable

Window Enable

Window Base Address=x00000080

Translation Base Reg A1 x00000000 Trans Base Address=x00000000

Window Mask Reg B1 x3FFF0000 Window Size = 1 GB

Window Base Reg B1 x40000002 Window Enable

Window Base Address=x00004000

Translation Base Reg B1 x00000000 Trans Base Address=x00000000

Window Mask Reg C1 x0FFF0000 Window Size = 256 MB

Window Base Reg C1 xF0000003 Scatter/Gather Enable

Window Enable

Window Base Address=x0000F000

Translation Base Reg C1 x00000000 Trans Base Address=x00000000

Error Vector 1 x00000956 Interrupt Vector x00000956

Dev Vec 1 Slot 0, IntA x00000C70 Interrupt Vector x00000C70

Dev Vec 1 Slot 0, IntB x00000C80 Interrupt Vector x00000C80

Dev Vec 1 Slot 0, IntC x00000C90 Interrupt Vector x00000C90

Dev Vec 1 Slot 0, IntD x00000CA0 Interrupt Vector x00000CA0

Dev Vec 1 Slot 1, IntA x00000CB0 Interrupt Vector x00000CB0

Dev Vec 1 Slot 1, IntB x00000CC0 Interrupt Vector x00000CC0

Dev Vec 1 Slot 1, IntC x00000CD0 Interrupt Vector x00000CD0

Dev Vec 1 Slot 1, IntD x00000CE0 Interrupt Vector x00000CE0

Dev Vec 1 Slot 2, IntA x00000CF0 Interrupt Vector x00000CF0

Dev Vec 1 Slot 2, IntB x00000D00 Interrupt Vector x00000D00

Dev Vec 1 Slot 2, IntC x00000D10 Interrupt Vector x00000D10

Dev Vec 1 Slot 2, IntD x00000D20 Interrupt Vector x00000D20

Dev Vec 1 Slot 3, IntA x00000D30 Interrupt Vector x00000D30

Dev Vec 1 Slot 3, IntB x00000D40 Interrupt Vector x00000D40

Dev Vec 1 Slot 3, IntC x00000D50 Interrupt Vector x00000D50

Dev Vec 1 Slot 3, IntD x00000D60 Interrupt Vector x00000D60

CTL 2 x01E00100 Config Cycle Type PCI Type 0

Configuration

Memory Block Size 64 Bytes

PCI Cut Through Threshhold x00000000

IO Space HW Addr Ext. x00000000

Mem Read Mult Pre-fetch S 4 Cache Blocks

I/O Port Up Hose Buffers 3 Buffers (TIOP and IOP)

Scatter/Gather MAP RAM Si 128KB (32K entries-default)

PCI Arbitration Control Round Robin for all Masters

PCI Cut Through Enable

Memory Read Multiple Enable

MRETRY 2 x00400000

ERR 2 x00000209 Error Summary

CSR Overrun Error

FADR2 x00000000 DMA Read from Memory

IMask PCI Interrupt 0 x01030000 Error Interrupt Enable

Device Interrupt Priority IPL 14

DIAG2 x00000000 Generate Correct parity

HPC Gate Array Revision=0.

RM Down Hose Translate Ad x00000000

IPEND 2 x00000000

IPROG 2 x00000000 Interrupt Source Slot 0 INTA

Window Mask Reg A2 x007F0000 Window Size = 8 MB

Window Base Reg A2 x00800003 Scatter/Gather Enable

Window Enable

Window Base Address=x00000080

Translation Base Reg A2 x00000000 Trans Base Address=x00000000

Window Mask Reg B2 x3FFF0000 Window Size = 1 GB

Window Base Reg B2 x40000002 Window Enable

Window Base Address=x00004000

Translation Base Reg B2 x00000000 Trans Base Address=x00000000

Window Mask Reg C2 x0FFF0000 Window Size = 256 MB

Window Base Reg C2 xF0000003 Scatter/Gather Enable

Window Enable

Window Base Address=x0000F000

Translation Base Reg C2 x00000000 Trans Base Address=x00000000

Error Vector 2 x00000967 Interrupt Vector x00000967

Dev Vec 2 Slot 0, IntA x00000D70 Interrupt Vector x00000D70

Dev Vec 2 Slot 0, IntB x00000D80 Interrupt Vector x00000D80

Dev Vec 2 Slot 0, IntC x00000D90 Interrupt Vector x00000D90

Dev Vec 2 Slot 0, IntD x00000DA0 Interrupt Vector x00000DA0

Dev Vec 2 Slot 1, IntA x00000DB0 Interrupt Vector x00000DB0

Dev Vec 2 Slot 1, IntB x00000DC0 Interrupt Vector x00000DC0

Dev Vec 2 Slot 1, IntC x00000DD0 Interrupt Vector x00000DD0

Dev Vec 2 Slot 1, IntD x00000DE0 Interrupt Vector x00000DE0

Dev Vec 2 Slot 2, IntA x00000DF0 Interrupt Vector x00000DF0

Dev Vec 2 Slot 2, IntB x00000E00 Interrupt Vector x00000E00

Dev Vec 2 Slot 2, IntC x00000E10 Interrupt Vector x00000E10

Dev Vec 2 Slot 2, IntD x00000E20 Interrupt Vector x00000E20

Dev Vec 2 Slot 3, IntA x00000E30 Interrupt Vector x00000E30

Dev Vec 2 Slot 3, IntB x00000E40 Interrupt Vector x00000E40

Dev Vec 2 Slot 3, IntC x00000E50 Interrupt Vector x00000E50

Dev Vec 2 Slot 3, IntD x00000E60 Interrupt Vector x00000E60

--Tlaser PCI Registers --

Node Qty 1.

CONFIG Address x0000000000000018

Device Name x0021001 DECchip 21264A

Vendor ID x1011

Device ID x0002

Command x0007

Status x0280 Fast Back-to-Back Capable

DEVSEL Medium

Revision ID x23

Class Code x020000

Cache Line S x00

Latency T. xFF

Header Type x00

Bist x00

Base Address Register 1 x00180001

Base Address Register 2 x01000000

Base Address Register 3 x00000000

Base Address Register 4 x00000000

Base Address Register 5 x00000000

Base Address Register 6 x00000000

Expansion Rom Base Address x00000000

Interrupt P1 xE5

Interrupt P2 x01

Min Gnt x00

Max Lat x00

4.6 Console Halt Conditions

Double error halts are conditions in which the processing of a fatal error triggers a second error. The TL6 Machine Check 670/660 logout frame provides error information to the operating system error handler.

4.6.1 CPU Double Error Halt

The CPU double error halt is caused by either of two conditions:

1. The machine is processing a Machine Check and traps back into the Machine Check before exiting the first machine check. (The operating system clears the MCES MCHK in Progress bit to signal that it is exiting the handler.)

2. While PALcode is executing, the machine tries to enter a Machine Check, thus causing a Double Error halt.

Under both of these conditions, continuing system operation is not possible and the machine state cannot be saved by the normal mechanisms, such as error logging. For these conditions, PALcode and the console save the appropriate state information in EEPROM. When the system is booted, if any double error halt logs exist in the EEPROM, the halt data is copied from the EEPROM into memory. A pointer in the per-CPU slot area of the HWRPB indicates the memory location of the halt data. Using this pointer, the double error halt information is written into the error log.

Figure 4-1 illustrates the format of the Entry type 71 error log, which uses the header structures. If the console has two halt frames to log, it puts a header on each as shown. Normally there is only one halt frame in this event. In any case, there is an End of Event frame at the bottom of the entry. The packets for memory, TIOP, and PCI use the same forms specified in the TurboLaser 5 Product Fault Management Specification. The 670/660 logout frame is the standard 288-byte packet used in error logging. The TLEP subpacket is minimized so that only error information is captured during the CPU double error halt. The byte count is calculated for a fully populated configuration and includes one incidence of errors.

1

Figure 4-1 Error Log Header Structure

Revision = 1 Type = 11 Class = 5 BC= 1056

TLASER HALT FRAME

Revision = 1 Type = 11 Class = 5 BC= 1056

TLASER HALT FRAME

Revision = 1 0 0 End of Event = 8

1 Unused node locations will be filled with 0xDEADBEEF. If a register NXMs, it will be filled with 0x0BADDEED.
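When reading raw halt frame data, the two fill patterns in the note above can be screened out before interpreting a longword as register contents. This is a minimal Python sketch; the function name and the returned labels are hypothetical.

```python
UNUSED_NODE = 0xDEADBEEF   # fill pattern for unused node locations
NXM_REGISTER = 0x0BADDEED  # fill pattern when a register access NXMs


def classify_longword(value):
    """Label a 32-bit longword from a halt frame sub-packet."""
    if value == UNUSED_NODE:
        return "unused node"
    if value == NXM_REGISTER:
        return "register NXM"
    return "register data"


for longword in (0xDEADBEEF, 0x0BADDEED, 0x00110000):
    print(f"x{longword:08X}", classify_longword(longword))
```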

CPU Double Error Halt content

TL6 CPU DBL ERR HLT Frame Content

HEADER                             2 LW
HALT CODE                          1 LW
RSVD                               1 LW
WATCH                              2 LW
670/660 Logout                     72 LW
Node 0 TLEP Sub-Packet (mini)      14 LW/Node
Node ...8                          126 LW (9 nodes)
PCI 0                              3 LW/Node
PCI ...19                          60 LW (20 PCI)
Total Byte Count for two events    2112 bytes
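The byte counts can be cross-checked with a little arithmetic. The following Python sketch assumes the frame content reads as header 2 LW, halt code 1 LW, reserved 1 LW, watch 2 LW, logout frame 72 LW, 14 LW for each of 9 TLSB nodes, and 3 LW for each of 20 PCIs, with a longword being 4 bytes; the dictionary keys are descriptive labels only.

```python
BYTES_PER_LW = 4  # an Alpha longword is 32 bits

frame_longwords = {
    "header": 2,
    "halt code": 1,
    "reserved": 1,
    "watch": 2,
    "670/660 logout": 72,
    "TLEP sub-packets": 14 * 9,  # 14 LW per node, nodes 0 through 8
    "PCI sub-packets": 3 * 20,   # 3 LW per PCI, PCI 0 through 19
}

total_lw = sum(frame_longwords.values())
frame_bytes = total_lw * BYTES_PER_LW

print(total_lw)         # 264 longwords per halt frame
print(frame_bytes)      # 1056, the BC value shown in Figure 4-1
print(2 * frame_bytes)  # 2112, the total for two events
print(frame_longwords["670/660 logout"] * BYTES_PER_LW)  # 288-byte logout frame
```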

TLEP Sub-Packet (minimized)

TLBER        TLDEV
TLESR1       TLESR0
TLESR3       TLESR2
TDIERR       TCCERR
TLEPWERR1    TLEPWERR0
TLEPWERR3    TLEPWERR2
RESERVED     RESERVED

PCI Sub-Packet

PCIA ERR1 PCIA ERR0

PCIA ERR2


Memory Sub-Packet

TLBER        TLDEV
TLESR1       TLESR0
TLESR3       TLESR2
TLFADR1      TLFADR0
TLMIR        TLVID
MER          MCR
RESERVED     RESERVED

TIOP Sub-Packet

TLBER        TLDEV
TLESR1       TLESR0
TLESR3       TLESR2
ICCWTR       ICCNSE
IDPNSE1      IDPNSE0
IDPNSE3      IDPNSE2
RESERVED     RESERVED

Example 4-8 CPU Double Error Halt

***************** ENTRY 1 ********************************

Logging OS 1. OpenVMS

System Architecture 2. Alpha

OS version V6.2

Event sequence number 11.

Timestamp of occurrence 31-MAY-1996 14:37:49

Time since reboot 0 Day(s) 0:23:53

Host name FFFA0026

System Model COMPAQ AlphaServer GS140 67/700

Entry Type 113. CPU Double Error Halt

-- TLaser DE Halt --

Halt Code x00000007


Watch $ x0000620306101227

Halt On 6-Mar-1998 at 16:18:39

MCHK Reason Mask x0000FFFA

MCHK Frame Rev x00000001

MCHK Frame Rev: 0.0

- CPU Registers -

I_STAT x0000000000000000

Bits<31:29> Bx000 - NO Error Detected

DC_STAT x0000000000000000

Bits<04:00> Bx00000 - NO Error Detected

C_ADDR x0000000000000000

Address of last reported x0000000000000000

DC1_SYNDROME x000000000000C000

DC0_SYNDROME x0000000000000000

C_STAT x0000000044000100

Bits<04:00> Bx00000 NO Error

C_STS x0000000000000000

Bits<03:00> Bx0000 NO Error

MM_STAT x0000000000000000

OPCODE x0000000000000000

Dcache Parity: OK

EXC_ADDR x0000000000098000

NO Bits Set

Addr Field_1 Bits<31:02> x0000000000026000

Addr Field_2 Bits<63:32> x0000000000000000

IER_CM x0000000000000000

NO Bits Set

Current Mode 00 Kernel

AST Interrupt Enabled x0000000000000000

Software Interrupts Enb: x0000000000000000

Performance Cnt Intr Enb Interrupt 00

Corr Read Error Intr Dis

Serial Line Intr Dis

EIEN Interrupt: x0000000000000000

I_SUM x0000000000014490

ASTE Bit Set

AST Interrupts ASTU Set

Software Interrupts x0000000000000005


Performance Cnt Interrupt x0000000000000000

Corr Read Error Intr Dis

Serial Line Intr Dis

EIEN Interrupts: x0000000000000000

PAL_Base x0000000000000000

Base address of PAL Code: x0000000000000000

I_CTL x0000000000000000

System Performance Counter Dsb

Icache Set enabled x0000000000000000

Super page Mode Bits x0000000000000000

I-Stream Buffer Enable Only Demand

Requests Launched

I-Stream Buffer Enable DBP based on state

of chooser

Branches chosen

PALRES Inst NOT executed in Kernel Mode

VA_48, 43 Bit Virtual Address used

VA_FORM_32, Bit NOT Set

Single_Issue_L Bottom Up

Performance Counter 0 Disabled

Performance Counter 1 Disabled

CALL_PAL link Reg is R27

MCHK Check Disabled

Processor ID NOT Recognized

VPTB Bits<47:30> x0000000000000000

VPTB Bits<63:48> x0000000000000000

PCTX x0000000000000000

ASTER 00 Kernel

ASTRR 00 Kernel

- System Registers -

WHAMI x0011 TLSB Node ID 0.

CPU1

TLSB Bad Signal

MISCR x0055 Bcache Size: 4 Mbyte

Two Processors

TLSB RUN Signal

CPU0 Running console

TLDEV x76008024 -- Device Type: Dual EV6 Proc, 525Mhz,

4meg Bcache


TLBER x00000000

TLCNR x00000000

TLVID x00000000

TLESR0 x00400303 SYND0 x00000003

SYND1 x00000003

CPU0 Sourced Data

TLESR1 x00400C0C SYND0 x0000000C

SYND1 x0000000C

CPU0 Sourced Data

TLESR2 x00406060 SYND0 x00000060

SYND1 x00000060

CPU0 Sourced Data

TLESR3 x00409090 SYND0 x00000090

SYND1 x00000090

CPU0 Sourced Data

TLMODCONFIG0 x00040000 DPQ MAX Entries x00000000

dtag1 disable

BQ_MAX_ENTRIES NO Limit

Bcache size = 4MB

TLMODCONFIG1 x00098AD4 P0 Reqest ID line 2

P1 Reqest ID line 5

TLMBPR_RETRY_Count 2**8 retries - 1.5us

on idle system (min)

fault disabled on TLSB

P0 req disabled

DISABLE PROBE Number 0

tbc fast path enabled

dm_dslb_prio - probes, fills, victims or

wrio

wspc_error_en

TCCERR x00004000 TCC Chip Revision x00000000

TDIERR x00000000

INTR MASK 0 x000001FF duart0 interrupt enable

ipl 14 interrupt enable

ipl 15 interrupt enable

ipl 16 interrupt enable

ipl 17 interrupt enable

ip enable

intim enable

CPU halt enable

control/p halt enable

INTR MASK 1 x000000FE ipl 14 interrupt enable

ipl 15 interrupt enable

ipl 16 interrupt enable


ipl 17 interrupt enable

ip enable

intim enable

CPU halt enable

INTR SUM 0 x00000000

INTR SUM 1 x00000000

TLEP VMG x00000000

TLEPWERR0 x00000000

TLEPWERR1 x00000000

TLEPWERR2 x00000000

TLEPWERR3 x00000000

CPU0 Last Win Sp Access x000000DBEEFDBEE8

Pending Bit=1, Address Valid

CPU1 Last Win Sp Access x000000DBEEFDBEE8

Pending Bit=1, Address Valid

TLSB Node: 5. Node 5

TLDEV x00005000 -- Device Type: Memory

-- Module Revision: x00000000

TLBER x00100000

TLESR0 x00000303

TLESR1 x00000C0C

TLESR2 x00006060

TLESR3 x00009090

TLFADR1 TLFADR0 x008500000011E940

TLVID x00000080

TLMIR x80000001 Interleave x00000001

MCR x00000235 512MB Module (E2035-DA)

16 MB DRAM

60ns DRAM

Strings Installed = 2

DRAM timing: Bus Spd = 13.0-15.0,

Refresh Cnt = 1008

MER x00000001 Failing String = x00000001

TLSB Node: 7. Node 7

TLDEV x00002020 -- Device Type: Integrated I/O Module

TLBER x00000000

TLESR0 x00000000

TLESR1 x00000000


TLESR2 x00000000

TLESR3 x00000000

ICCNSE x80000000 Interrupt Enable on NSES Set

ICCWTR x00000000

IDPNSE-0 x00000006 Hose Power OK

Hose Cable OK

IDPNSE-1 x00000006 Hose Power OK

Hose Cable OK

IDPNSE-2 x00000000

IDPNSE-3 x00000000

TLSB Node: 8. Node 8

TLDEV x00002000 -- Device Type: I/O Module

TLBER x00000000

TLESR0 x00000000

TLESR1 x00000000

TLESR2 x00000000

TLESR3 x00000000

ICCNSE x80000000 Interrupt Enable on NSES Set

ICCWTR x00000008 Window Space Trans in Prog, Hose 3

IDPNSE-0 x00000000

IDPNSE-1 x00000000

IDPNSE-2 x00000000

IDPNSE-3 x00000007 HOSE ERROR SIGNAL ASSERTED

Hose Power OK

Hose Cable OK

IOP/PCI: 4. IOP Node 7, Hose 0

PCIERR 0 x00000000

PCIERR 1 x00000000

IOP/PCI: 5. IOP Node 7, Hose 1

PCIERR 0 x00000000

PCIERR 1 x00000000

PCIERR 2 x00000000
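In Example 4-8 the Watch value x0000620306101227 and the decoded line "Halt On 6-Mar-1998 at 16:18:39" appear to line up byte for byte (x62 = 98, x03 = 3, x06 = 6, x10 = 16, x12 = 18, x27 = 39). The sketch below decodes the field under that inferred layout. The layout is an assumption drawn from this single sample, not a documented format, and the year base of 1900 and the function name are likewise assumed.

```python
from datetime import datetime


def decode_watch(watch: int, year_base: int = 1900) -> datetime:
    """Decode a Watch quadword, ASSUMING bytes 2..7 (most significant
    first) hold year, month, day, hour, minute, second as plain binary."""
    raw = watch.to_bytes(8, "big")
    year, month, day, hour, minute, second = raw[2:8]
    return datetime(year_base + year, month, day, hour, minute, second)


halt_time = decode_watch(0x0000620306101227)
print(halt_time.isoformat(sep=" "))
```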


4.6.2 Machine Check Logout Frames

Machine Check Logout Frame - 670/660

The TL6 Machine Check 670/660 logout frame provides error information to the operating system error handler. When a fault is detected, PALcode enters an error handler, captures the state of the processor and system, and builds a logout frame. One frame is built for both processor-detected and system-detected errors.

Machine check logout 670 contains EV6 CPU-specific error registers, while machine check logout 660 contains system-specific error registers.

Offset  63 … 32                 31 … 00

Common Area:
00      R|S|D|C                 Frame Size
08      System Area Offset      CPU Area Offset

CPU Area:
10      MCHK Frame Rev          MCHK CODE
18      ISTAT
20      DC_STAT
28      C_ADDR
30      DC1_SYNDROME
38      DC0_SYNDROME
40      C_STAT
48      C_STS
50      MM_STAT
58      EXC_ADDR
60      IER_CM
68      I_SUM
70      RESERVED
78      PAL_BASE
80      I_CTL
88      PCTX
90      RESERVED
98      RESERVED

System Area:
A0      RSVD                    MISCR | WHAMI
A8      TLBER                   TLDEV
B0      TLVID                   TLCNR
B8      TLESR1                  TLESR0
C0      TLESR3                  TLESR2
C8      TLMODCONFIG1            TLMODCONFIG0
D0      TDIERR                  TCCERR
D8      TLINTRMASK1             TLINTRMASK0
E0      TLINTRSUM1              TLINTRSUM0
E8      TLEPWERR0               TLEP_VMG
F0      TLEPWERR2               TLEPWERR1
F8      RESERVED                TLEPWERR3
100     RESERVED                RESERVED
108     RESERVED                RESERVED
110     RESERVED                RESERVED
118     RESERVED                RESERVED


Machine Check Logout Frame - 630/620

The TL6 Machine Check 630/620 logout frame provides error information to the operating system error handler. When a fault is detected, PALcode enters an error handler, captures the state of the processor and system, and builds a logout frame. One frame is built for both processor-detected and system-detected errors that are correctable. Machine check logout 630 contains EV6 CPU-specific error registers, while machine check logout 620 contains system-specific error registers.

Offset  63 … 32                 31 … 00

Common Area:
00      R|S|D|C                 Frame Size
08      System Area Offset      CPU Area Offset

CPU Area:
10      MCHK Frame Rev 8.       MCHK CODE
18      ISTAT
20      DC_STAT
28      C_ADDR
30      DC1_SYNDROME
38      DC0_SYNDROME
40      C_STAT
48      C_STS
50      MM_STAT

System Area:
58      DOF_CNT                 MISCR | WHAMI
60      RESERVED
68      TLBER                   TLDEV
70      TLESR1                  TLESR0
78      TLESR3                  TLESR2
80      RESERVED
88      RESERVED
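The offsets in the two logout frame layouts give a quick check on the frame sizes: each row is one 8-byte quadword, so the size is the last offset plus 8. A minimal sketch of that arithmetic (the function name is illustrative):

```python
QUADWORD_BYTES = 8  # each row of the frame layout is one quadword


def frame_size(last_offset: int) -> int:
    """Size in bytes of a frame whose final quadword starts at last_offset."""
    return last_offset + QUADWORD_BYTES


size_670_660 = frame_size(0x118)  # last row of the 670/660 System Area
size_630_620 = frame_size(0x88)   # last row of the 630/620 System Area
print(size_670_660, size_630_620)  # 288 144
```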


4.6.3 Machine Check Error Log

The Error Log contains the system register information used to diagnose hardware system faults. Because most of the Error Log is specified in Chapter 5 of the TL5 Product Fault Management Specification, this section covers only the changes between TL5 and TL6.

Error Log Size

The Operating System Header for OpenVMS and Compaq Tru64 UNIX remains the same size as in the TL5. The Software Error Flags, Common TLEP Header Area, and PALcode revision area are also unchanged in size. The TLEP Machine Check Frames for 670/660 and 630/620 differ in size from those of the TL5.

Area (63 … 00)                       Size
Operating System Errorlog Header     VMS=96b OSF=56b
Software Error Flags                 24 bytes
Common TLEP Header Area              24 bytes
TLEP Machine Check Frame             670/660 = 288 bytes, 630/620 = 144 bytes
PALcode Revision                     8 bytes
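Adding the listed areas gives the base size of a machine check entry for each operating system and frame type. The sketch below assumes that "96b" and "56b" mean bytes and that the areas are simply concatenated; it does not account for any appended TLSB snapshot or subpackets. All names are illustrative.

```python
# Sizes in bytes, taken from the Error Log Size listing.
OS_HEADER = {"VMS": 96, "OSF": 56}  # ASSUMPTION: "b" means bytes
SOFTWARE_ERROR_FLAGS = 24
COMMON_TLEP_HEADER = 24
MCHK_FRAME = {"670/660": 288, "630/620": 144}
PALCODE_REVISION = 8


def base_entry_size(os_name: str, frame: str) -> int:
    """Sum of the fixed areas for one entry, without snapshot data."""
    return (OS_HEADER[os_name] + SOFTWARE_ERROR_FLAGS + COMMON_TLEP_HEADER
            + MCHK_FRAME[frame] + PALCODE_REVISION)


for os_name in OS_HEADER:
    for frame in MCHK_FRAME:
        print(os_name, frame, base_entry_size(os_name, frame))
```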


TLSB Bus Snapshot

Error Types Requiring TLSB SNAPSHOT

The following is a list of registers and errors that require the operating system to append a SNAPSHOT to the error log file.

Register Name   Signal Name                   Register Bit Position

TLBER           DTO, DE, SEQE, DCTCE,         TLBER<31:25,19:16,9:4,2:0>
                ABTCE, UACKE, FDTCE,
                CWDE2, CRDE, CWDE,
                UDE, REQDE, FNAE,
                MMRE, ACKTCE, RTCE,
                NAE, BBE, APE, ATCE

TCCERR          P1_ILLEGAL_CMD,               TCCERR<21,20,14,13,10:4,1,0>
                P0_ILLEGAL_CMD,
                CSR_XACTION_ERR,
                CSR_WR_NXM,
                P1_FATAL_MMRE,
                P0_FATAL_MMRE,
                FAULT_ASSERTED,
                WSPC_RD_ERROR,
                SYSFAULT, SYSDERR,
                P1_TLMBPR_T0,
                P0_TLMBPR_T0

TDIERR          P1T0, P0T0                    TDIERR<1,0>
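The bit positions listed for each register can be turned into masks to test whether a logged register value contains any error that calls for a snapshot. This is a sketch of that check only, not the operating system's implementation; the helper names are hypothetical.

```python
def mask_from_ranges(ranges):
    """Build a mask from inclusive (high, low) bit ranges."""
    mask = 0
    for high, low in ranges:
        for bit in range(low, high + 1):
            mask |= 1 << bit
    return mask


# Bit positions copied from the Register Bit Position column.
SNAPSHOT_MASKS = {
    "TLBER": mask_from_ranges([(31, 25), (19, 16), (9, 4), (2, 0)]),
    "TCCERR": mask_from_ranges([(21, 20), (14, 13), (10, 4), (1, 0)]),
    "TDIERR": mask_from_ranges([(1, 0)]),
}


def snapshot_required(register: str, value: int) -> bool:
    """True if any snapshot-triggering bit is set in the register value."""
    return bool(value & SNAPSHOT_MASKS[register])


for name, mask in SNAPSHOT_MASKS.items():
    print(f"{name} mask = x{mask:08X}")
```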


TLEP Subpacket

The TLEP sub-packet contains TurboLaser CPU module registers. It can be part of the TLSB sub-packet of a machine check entry packet or part of a LASTFAIL packet. The TL6 TLEP has been extended to include additional system registers.

Offset  63 … 32                 31 … 00

00      Base Physical IO Address of TLEP
08      Valid Bits
10      TLBER                   TLDEV
18      TLVID                   TLCNR
20      TLESR1                  TLESR0
28      TLESR3                  TLESR2
30      TLMODCONFIG1            TLMODCONFIG0
38      TDIERR                  TCCERR
40      TLINTRMASK1             TLINTRMASK0
48      TLINTRSUM1              TLINTRSUM0
50      TLEPWERR0               TLEP_VMG
58      TLEPWERR2               TLEPWERR1
60      RESERVED                TLEPWERR3
68      RESERVED                RESERVED
70      RESERVED                RESERVED
78      RESERVED                RESERVED

TLDEV TurboLaser Device Register (BB+0000)

The device register contains information that identifies a node. The fields are loaded by the console. A zero value indicates an uninitialized node.

TLDEV:

Bits <31:24>   HWREV
Bits <23:16>   SWREV
Bits <15:0>    DTYPE


TLDEV Format

Name          Bit(s)  Type  Init  Description

CHIP TYPE     31:28   M     0     EV5 = 5
                                  EV5/6 = 7
                                  EV6 = 8
                                  EV67 = 11

CHIP SPEED    27:24   M     0     350MHZ = 0
EV5 & EV56                        300MHZ = 1
                                  525MHZ = 2
                                  437MHZ = 3
                                  625MHZ with 8M BCACHE = 5
                                  625MHZ with 4M BCACHE = 6

CHIP SPEED    27:24   M     0     525MHZ = 0
EV6                               700MHZ = 1

DTYPE         15:0    M     0     I/O MODULE = 2000
                                  INTEGRATED I/O MODULE = 2020
                                  MEMORY MODULE = 5000
                                  SINGLE PROCESSOR, 4M BCACHE = 8011
                                  DUAL PROCESSOR, 4M BCACHE = 8014
                                  DUAL EV6, 4M BCACHE = 8025
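The field boundaries in the TLDEV format table can be applied to a logged TLDEV value as follows. This is a minimal sketch: the function and dictionary names are illustrative, and the DTYPE codes are assumed to be hexadecimal, which is consistent with the Memory (x00005000) and I/O module (x00002000, x00002020) values shown in Example 4-8.

```python
# DTYPE codes from the TLDEV format table, ASSUMED to be hexadecimal.
DTYPE_NAMES = {
    0x2000: "I/O module",
    0x2020: "Integrated I/O module",
    0x5000: "Memory module",
    0x8011: "Single processor, 4M Bcache",
    0x8014: "Dual processor, 4M Bcache",
    0x8025: "Dual EV6, 4M Bcache",
}


def decode_tldev(value: int) -> dict:
    """Split a 32-bit TLDEV value into the fields of the format table."""
    dtype = value & 0xFFFF
    return {
        "chip_type": (value >> 28) & 0xF,   # bits 31:28
        "chip_speed": (value >> 24) & 0xF,  # bits 27:24
        "dtype": dtype,                     # bits 15:0
        "dtype_name": DTYPE_NAMES.get(dtype, "unknown"),
    }


for sample in (0x00005000, 0x00002020, 0x00002000):
    print(f"x{sample:08X}", decode_tldev(sample)["dtype_name"])
```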


Chapter 5

Removal and Replacement Procedures

This chapter contains removal and replacement procedures for the following components of the AlphaServer GS60E system:

• TLSB Modules

• TLSB Card Cage Removal

• Operator Control Panel

• CD Tray

• AC Distribution Box

• Power Rack Assembly

• Cabinet Control Logic (CCL) Panel

• BA36R StorageWorks Shelf

• DWLPB PCI Box

• Plenum Assembly

• Cabinet Panels

• Cables

Removal and Replacement Procedures 5-1

5.1 TLSB Modules

This section covers replacing processor, memory, terminator, or I/O modules, as well as SIMM removal and replacement.

5.1.1 How to Replace the Only Processor

Before replacing processor modules, update console firmware and any customized environment variables and boot paths.

Example 5–1 Replacing the Only Processor Module

P00>>> sho *

[list of environment variables appears]

P00>>> boot dkd400

Building FRU table............

(boot dkd400.4.0.5.0 -flags 0,a0)

[LFU boots]

UPD> update kn7cg-ab0

WARNING: updates may take several minutes to complete for each device.

Confirm update on: kn7cg-ab0 [Y/(N)] y

DO NOT ABORT!

kn7cg-ab0 Updating to V4.9-20... Verifying V4.9-20 Passed.

UPD> exit

Initializing...

[self-test display appears]

P00>>> build -e kn7cg-ab0

Build EEPROM on kn7cg-ab0 ? [Y/N]y

EEPROM built on kn7cg-ab0

P00>>> set bootdef_dev dua1.0.0.11.0

P00>>> init

Initializing...

[self-test display appears]

P00>>> set eeprom field

LARS> 01234567

Message>

P00>>> boot


1. List the system’s environment variables to determine if any have been customized (see the sho * command in Example 5-1). You will set these in step 6.

2. Power down the system and remove and replace the module. See Section 5.1.4.

3. Power up the system. Boot LFU and issue the update command to ensure that the module has the latest version of console firmware (see the boot and update commands in Example 5-1).

4. Exit LFU (see the exit command in Example 5-1).

5. Build the EEPROM (see the build -e command in Example 5-1). The format of data often changes between versions of console firmware. This command reformats the data.

6. Set any customized environment variables with the set <envar> command (see the set bootdef_dev command in Example 5-1).

7. Initialize the system (see the init command in Example 5-1).

8. Enter into the EEPROM the 8-digit LARS number and a short message (68 characters maximum) stating the date and reason for service (see the set eeprom field command in Example 5-1).

9. Boot the operating system (see the boot command in Example 5-1).
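Before typing the service record at the console, the two limits stated in step 8 (an 8-digit LARS number and a message of at most 68 characters) can be checked offline. The helper below is a hypothetical convenience sketch; it is not a console command and does not talk to the EEPROM.

```python
LARS_DIGITS = 8
MESSAGE_MAX_CHARS = 68


def check_service_record(lars: str, message: str) -> list:
    """Return a list of problems; an empty list means both fields fit."""
    problems = []
    if not (len(lars) == LARS_DIGITS and lars.isascii() and lars.isdigit()):
        problems.append("LARS number must be exactly 8 digits")
    if len(message) > MESSAGE_MAX_CHARS:
        problems.append("message exceeds 68 characters")
    return problems


# Sample values only; the LARS number matches the placeholder in Example 5-1.
print(check_service_record("01234567", "Replaced CPU module in slot 0"))
print(check_service_record("1234", "x" * 70))
```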


5.1.2 How to Replace the Boot Processor

Check the console firmware version in the existing and replacement modules and, if they differ, use the LFU update command to bring the replacement module to the current version. Build the EEPROM on the replacement module.

Example 5–2 Replacing the Boot Processor

F E D C B A 9 8 7 6 5 4 3 2 1 0 NODE #

A M M M . . P P P TYP

o + + + . . ++ ++ ++ ST1

. . . . . . EE EE EB BPD

o + + + . . ++ ++ ++ ST2

. . . . . . EE EE EB BPD

o + + + . . ++ ++ ++ ST3

. . . . . . EE EE EB BPD

+ + + + + + + . . . . + C0 PCI +

. . . . . . . . EISA +

. . . . . . . . . . . . . . . . C1

. . . . . . . . . . . . . . . . C2

. . . . . . . . . . . . . . . . C3

B0 A1 A0 . . . . . ILV

. 4GB 4GB 4GB . . . . . 12GB

Compaq AlphaServer GS60E 2-6/700/8, Console V5.5-25

26-OCT-1999 12:06:03

SROM V2.3, OpenVMS PALcode V1.68-101, Tru64 UNIX PALcode V1.61-101

System Serial = NI84177052, OS = OpenVMS, 3:11:57 December 7, 1999

P00>>> boot dkd400

Building FRU table............

(boot dkd400.4.0.5.0 -flags 0,a0)

[LFU boots]

UPD> update kn7cg-ab0

WARNING: updates may take several minutes to complete for each device.

Confirm update on: kn7cg-ab0 [Y/(N)] y

DO NOT ABORT!


1. Remove the failing module (see Section 5.1.4). In this example, the primary processor is the failing module and it is in slot 0.

2. Power up the system and make note of the version of console firmware in the remaining modules. See the self-test display in Example 5-2.

3. Power down the system and remove all processor modules. See Section 5.1.4.

4. Insert the replacement module. See Section 5.1.4.

5. Power up the system and determine the version of console firmware in the replacement module. If it is different from the other modules, boot LFU and update the firmware using the update command. See the boot and update commands in Example 5-2.


Example 5–2 Replacing the Boot Processor (Continued)

kn7cg-ab0 Updating to V4.9-20... Verifying V4.9-20... Passed.

UPD> exit

Initializing...

[self-test display appears]

P00>>> build -e kn7cg-ab0

Build EEPROM on kn7cg-ab0 ? [Y/N]y

EEPROM built on kn7cg-ab0

F E D C B A 9 8 7 6 5 4 3 2 1 0 NODE #

A M M M . . P P P TYP

o + + + . . ++ ++ ++ ST1

. . . . . . EE EE EB BPD

o + + + . . ++ ++ ++ ST2

. . . . . . EE EE EB BPD

o + + + . . ++ ++ ++ ST3

. . . . . . EE EE EB BPD

+ + + + + + + . . . . + C0 PCI +

. . . . . . . . EISA +

. . . . . . . . . . . . . . . . C1

. . . . . . . . . . . . . . . . C2

. . . . . . . . . . . . . . . . C3

B0 A1 A0 . . . . . ILV

. 4GB 4GB 4GB . . . . . 12GB

Compaq AlphaServer GS60E 2-6/700/8, Console V5.5-25 26-OCT-1999 12:06:03

SROM V2.3, OpenVMS PALcode V1.68-101, Tru64 UNIX PALcode V1.61-101

System Serial = NI84177052, OS = OpenVMS, 3:11:57 December 7, 1999

P00>>> set cpu 2

P02>>> build -c kn7cg*

P02>>> set cpu 0

P00>>> set eeprom field

LARS> 01234567

Message>

P00>>> boot


6. Build the EEPROM. See the build -e command in Example 5-2.

7. Power down the system, replace the other processor modules (see Section 5.1.4), and power up the system.

8. Copy the EEPROM environment variables from a secondary processor to the new primary processor. To do this, set a different module as primary and copy the environment variables using the build -c command. See the set cpu 2 and build -c commands in Example 5-2.

9. Set processor 0 as the primary processor. Then enter into the EEPROM the 8-digit LARS number and a short message (68 characters maximum) stating the date and reason for service. See the set cpu 0 and set eeprom field commands in Example 5-2.

10. Boot the operating system.
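The firmware comparison in steps 2 and 5 amounts to checking whether two version strings differ. The sketch below assumes the V<major>.<minor>-<build> form seen in the examples (V4.9-20, V5.5-25); the function names are illustrative, and the versions still have to be read from the self-test display by hand.

```python
import re

# ASSUMPTION: console versions look like "V5.5-25", as in the examples.
_VERSION = re.compile(r"V(\d+)\.(\d+)-(\d+)")


def parse_console_version(text: str) -> tuple:
    """Extract (major, minor, build) from a string containing a version."""
    match = _VERSION.search(text)
    if match is None:
        raise ValueError(f"no console version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def needs_update(replacement: str, existing: str) -> bool:
    """True if the replacement module's firmware differs from the others."""
    return parse_console_version(replacement) != parse_console_version(existing)


print(parse_console_version("Console V5.5-25"))
print(needs_update("V4.9-20", "V5.5-25"))
```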


5.1.3 How to Add a New Processor or Replace a Secondary Processor

Check the console firmware version in the existing modules and the new or replacement module and, if they differ, use the LFU update command to bring the new module to the current version. Build the EEPROM on the new module.

Example 5–3 Adding or Replacing a Secondary Processor

F E D C B A 9 8 7 6 5 4 3 2 1 0 NODE #

A M M M . . P P P TYP

o + + + . . ++ ++ ++ ST1

. . . . . . EE EE EB BPD

o + + + . . ++ ++ ++ ST2

. . . . . . EE EE EB BPD

o + + + . . ++ ++ ++ ST3

. . . . . . EE EE EB BPD

+ + + + + + + . . . . + C0 PCI +

. . . . . . . . EISA +

. . . . . . . . . . . . . . . . C1

. . . . . . . . . . . . . . . . C2

. . . . . . . . . . . . . . . . C3

B0 A1 A0 . . . . . ILV

. 4GB 4GB 4GB . . . . . 12GB

Compaq AlphaServer GS60E 2-6/700/8, Console V5.5-25

26-OCT-1999 12:06:03

SROM V2.3, OpenVMS PALcode V1.68-101, Tru64 UNIX PALcode V1.61-101

System Serial = NI84177052, OS = OpenVMS, 3:11:57 December 7, 1999

P00>>> boot dkd400

Building FRU table............

(boot dkd400.4.0.5.0 -flags 0,a0)

[LFU boots]

UPD> update kn7cg-ab0

WARNING: updates may take several minutes to complete for each device.

Confirm update on: kn7cg-ab0 [Y/(N)] y

DO NOT ABORT!


In this example, the primary processor is in slot 0 and a secondary processor is being replaced in slot 1.

1. If you are replacing a secondary processor, remove the module from the system. See Section 5.1.4.

2. Power up the system and make note of the version of console firmware in the processor modules. See the self-test display in Example 5-3.

3. Power down the system and remove all processor modules. See Section 5.1.4.

4. Insert the new processor module. See Section 5.1.4.

5. Power up the system and determine the version of console firmware in the replacement module. If it is different from the other modules, boot LFU and update the firmware using the update command. See the boot and update commands in Example 5-3.


Example 5–3 Adding or Replacing a Secondary Processor (Continued)

kn7cg-ab0 Updating to V4.9-20... Verifying V4.9-20... Passed.

UPD> exit

Initializing...

[self-test display appears]

P00>>> build -e kn7cg-ab0

Build EEPROM on kn7cg-ab0 ? [Y/N]y

EEPROM built on kn7cg-ab0

F E D C B A 9 8 7 6 5 4 3 2 1 0 NODE #

A M M M . . P P P TYP

o + + + . . ++ ++ ++ ST1

. . . . . . EE EE EB BPD

o + + + . . ++ ++ ++ ST2

. . . . . . EE EE EB BPD

o + + + . . ++ ++ ++ ST3

. . . . . . EE EE EB BPD

+ + + + + + + . . . . + C0 PCI +

. . . . . . . . EISA +

. . . . . . . . . . . . . . . . C1

. . . . . . . . . . . . . . . . C2

. . . . . . . . . . . . . . . . C3

B0 A1 A0 . . . . . ILV

. 4GB 4GB 4GB . . . . . 12GB

Compaq AlphaServer GS60E 2-6/700/8, Console V5.5-25 26-OCT-1999 12:06:03

SROM V2.3, OpenVMS PALcode V1.68-101, Tru64 UNIX PALcode V1.61-101

System Serial = NI84177052, OS = OpenVMS, 3:11:57 December 7, 1999

P00>>> build -c kn7cg*2

P00>>> set eeprom field

LARS> 01234567

Message>

P00>>> boot


6. Build the EEPROM. See the build -e command in Example 5-3.

7. Power down the system and replace the other processor modules. See Section 5.1.4.

8. Power up the system. Copy the EEPROM environment variables to the new processor using the build -c command. See the build -c command in Example 5-3.

9. Enter into the EEPROM the 8-digit LARS number and a short message (68 characters maximum) stating the date and reason for service. See the set eeprom field command in Example 5-3.

10. Boot the operating system.


5.1.4 Processor, Memory, or Terminator Module Removal and Replacement

Wear an antistatic wrist strap. Release the handles and slide the module out of the card cage. To replace, line up the module with the card guide and the cover with the rail in the card cage, make sure the projections on the top and bottom of the end plate align with the slots in the card cage, and slide the module into the cage. Push the handles in to connect at the centerplane, and let them spring into the stops.

Figure 5–1 Processor, Memory, or Terminator Module

NOTE: If you are replacing or adding a processor module, see Section 5.1.1, 5.1.2, or 5.1.3 before using this procedure.

Removal

1. Shut down the operating system and power down the system.

CAUTION: You must wear a wrist strap when you handle any modules.

2. Ground yourself to the cabinet with an antistatic wrist strap.

3. Push the handles of the module to be removed in toward the module end plate and to the left, releasing them from the stops.

4. Grasp the end plate and slide the module out of the card cage. See Figure 5-1.

5. Place the module on an ESD pad. If it is being replaced, slide the module into the antistatic bag from the replacement module and pack it in the box.

Replacement

1. Ground yourself to the cabinet frame with an antistatic wrist strap.

CAUTION: To avoid damaging an EMI gasket, insert modules from left to right. These gaskets can easily break, and a broken piece of gasket can damage a module or the centerplane.

2. Remove the module from its packaging and release the spring-loaded handles from the stops. To do this, push both handles toward the module end plate and away from the stops.

3. Hold the module assembly by the end plate. Align the module with the card guide and the cover with the rail (see Figure 5-1).

4. Slide the module assembly into the card cage as far as it will easily go.

5. When the module stops, check that the projections on the top and bottom of the end plate are aligned with the slots in the card cage (see Figure 5-1). If they are not, remove the module and realign.

6. Push the handles to the module end plate. You will feel the module make contact with the connectors at the centerplane. Release the handles so they spring back into the stops.

Verification

Check that terminator modules are installed in all unused slots. Power up the system and check that the self-test display is correct. Enter the show configuration command. If you replaced a memory module, enter the show simm command.


5.1.5 SIMM Removal and Replacement

Remove both covers from the memory module. Remove the standoff at the end of the row with the failing SIMM. Remove all SIMMs in the row up to and including the failing SIMM. Release the latches on both ends of the SIMM by gently inserting a small Phillips head screwdriver.

Figure 5–2 Removing a SIMM

Removal

1. Remove the appropriate memory module from the card cage.

2. Place the module on an ESD pad on a level surface. Remove both module covers by removing the eight screws from each. (The screws that attach to the end plate of the module are larger than those that attach to the standoffs.)

3. Use an adjustable wrench to remove the standoff at the end of the row with the failing SIMM. See Figure 5-3 or 5-4.

4. Beginning with J2, J12, or J24 on the E2035 module or with J2, J14, or J28 on the E2036 module, remove each SIMM up to and including the failing SIMM. To remove a SIMM, release the latch on each end of the connector by inserting a Phillips screwdriver into the slot and pressing down. See Figure 5-2. (See Figures 5-3 and 5-4 for SIMM connector numbers.)

Replacement

1. Insert the replacement SIMM into the connector at a 45-degree angle. As you rotate it to an upright position, the latches will snap into place. (The SIMM is keyed on the sides and in the center so that the correct side faces front.)

2. Insert the other SIMMs in their connectors.

3. Replace the standoff. The square standoff goes on side 1 (the component side) and the hexagonal standoff on side 2. Torque the standoffs to 12 inch-pounds (15 inch-pounds maximum).

4. Replace the module covers and replace the memory module.
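If the available torque driver is graduated in newton-metres, the standoff torque values can be converted as sketched below. The conversion factor (1 inch-pound is about 0.113 N·m) is a standard unit conversion and not a figure from this manual; the function name is illustrative.

```python
NEWTON_METRES_PER_INCH_POUND = 0.1129848  # standard conversion, rounded


def inch_pounds_to_newton_metres(inch_pounds: float) -> float:
    """Convert a torque in inch-pounds to newton-metres."""
    return inch_pounds * NEWTON_METRES_PER_INCH_POUND


# Nominal and maximum standoff torque from the Replacement steps.
for torque in (12, 15):
    print(f"{torque} in-lb = {inch_pounds_to_newton_metres(torque):.2f} N·m")
```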

Verification

P00>>> set simm_callout on

P00>>> init

[self-test display appears]

P00>>> show simm

[test message appears]

P00>>> set simm_callout off

Look for a “no error” message.


Figure 5-3 SIMM Connector Numbers – E2035 Module

J32 J30 J28 J26 J24
J33 J31 J29 J27 J25
J22 J20 J18 J16 J14 J12
J23 J21 J19 J17 J15 J13
J10 J8 J6 J4 J2
J11 J9 J7 J5 J3

Figure 5-4 SIMM Connector Numbers – E2036 (2-Gbyte) and E2037 (4-Gbyte) Modules

J36 J34 J32 J30 J28
J37 J35 J33 J31 J29
J26 J24 J22 J20 J18 J16 J14
J27 J25 J23 J21 J19 J17 J15
J12 J10 J8 J6 J4 J2
J13 J11 J9 J7 J5 J3

5.1.6 I/O Cable and KFTHA Module Removal and Replacement

The I/O hose cable connects the KFTHA module to an I/O bus. Remove a hose by loosening the captive screws on the connector. After disconnecting all cables, removal of the module is the same as other modules.

Figure 5–5 I/O Hose Cable

I/O Hose Cable Removal

1. Shut down the operating system and power down the system.

2. Ground yourself to the cabinet with an antistatic wrist strap.

3. Loosen the captive screws (slotted) to remove the cable connectors at both ends of the I/O cable to be replaced. See Figure 5-5.

I/O Hose Cable Replacement

1. Attach the TLSB end with pin 50 on top. Torque the screws to 6 inch-pounds.

2. Route the replacement I/O cable through the same path as the original one was routed.

3. Attach the I/O bus end. The connector is asymmetrical to ensure proper orientation.

Verification

Power up the system, check that the green LED near the top connector lights, and check that the console display includes the I/O bus connected to this cable.


5.2 TLSB Card Cage Removal

Remove all modules (front and rear), disconnect the cables from the card cage, remove and save the mounting brackets, and slide the cage out from the front. You will need a Phillips head screwdriver and 8-mm and 10-mm nutdrivers.

Figure 5–6 TLSB Card Cage Removal (Front and Rear)

Removal

1. Shut down the operating system and turn the keyswitch to Off.

2. Ground yourself to the cabinet with an antistatic wrist strap.

3. Note the locations of the modules in the card cage and remove the modules. See Section 5.1.

4. At the front of the card cage, use the 8-mm nutdriver to remove the kepnuts from the terminal cover (see Figure 5-6). Save the kepnuts. Using the 10-mm nutdriver, remove the nuts and washers that attach the power and ground cables to the power posts. Save the nuts and washers.

5. Disconnect the CCL cable. See Figure 5-6.

6. At the front of the cabinet, use the Phillips head screwdriver to remove the top and bottom brackets from the card cage and frame (see Figure 5-6). Save the brackets and screws.

7. At the rear of the cabinet, remove the side and bottom brackets from the frame and from the card cage (see Figure 5-6). Save the brackets and screws.

CAUTION: The following step requires two people. Because of the height of the card cage in the cabinet, you should not remove this assembly by yourself.

8. Slide the card cage assembly out the front of the cabinet.


Replacement

1. Ground yourself to the cabinet with an antistatic wrist strap.

CAUTION: The following step requires two people. Because of the height of the card cage in the cabinet, you should not install this assembly by yourself.

2. From the front, slide the replacement card cage into the cabinet so that the label is at the top on the front and the power filter is to the left.

3. Attach the reserved front top and bottom brackets and the rear bottom bracket to the card cage using the reserved flathead screws.

NOTE: The rear bottom bracket is deeper than the front one. If these two brackets are swapped, the holes in the side bracket will not line up correctly in the next step.


4. At the rear of the cabinet, use the Phillips head screwdriver to loosely install the reserved side bracket to the frame with two reserved screws. Line up the other two holes in the bracket with the card cage holes and insert two reserved screws. Tighten all four screws. Attach the card cage to the frame at the bottom with the reserved screws.

5. At the front of the cabinet, use the Phillips head screwdriver to attach the card cage to the frame at the top and bottom with five reserved screws.

6. Install all the modules in the card cage.

7. Attach the CCL cable.

8. Use the 10-mm nutdriver and the reserved nuts to attach the power and ground cables to the power posts. (Place a washer behind the power cable connector and one in front of the connector, then attach and tighten the nut.) The yellow cable (+48 V) attaches to the top post; the gray cable (ground) attaches to the bottom post.

9. Use the 8-mm nutdriver and the reserved kepnuts to install the terminal cover over the power posts.

Verification

Power up the system and check that all the modules appear in the self-test display. Enter the show configuration, show device, and test commands.


5.3 Operator Control Panel

The operator control panel (OCP) attaches to the top of the front door. It is held in place by a boss on each side of the plastic bezel. The signal cable is attached to the bottom connector on the left side at the back of the OCP, accessible from the back of the front door.

Figure 5–7 Operator Control Panel

Removal

1. Shut down the operating system and turn the keyswitch to Off.

2. Shut the main circuit breaker off by pushing down the handle.

3. Ground yourself to the cabinet with an antistatic wrist strap.

4. Open the front cabinet door.

5. Remove the signal cable by loosening the two thumbscrews.

6. From the inside of the door, push on the left hand side boss until it snaps out of the opening.

7. Move to the outside of the door. While supporting the OCP on the front side of the door, carefully push on the right hand boss until it snaps free. Make certain the OCP does not fall.

Replacement

• Reverse the steps in the Removal procedure.

Verification

Power up the system and turn the keyswitch to On. Check that the Power and On LEDs light.


5.4 CD Tray

The CD tray houses the CD-ROM drive and optional floppy drive. It mounts to the left-hand rail in front of the DWLPB PCI box.

Figure 5–8 CD Tray

Removal

1. Shut down the operating system and turn the keyswitch to Off.

2. Shut the main circuit breaker off by pushing down the handle.

3. Remove all cable connectors from the right side of the tray that houses the CD-ROM drive.

4. Loosen the two captive screws on the left side of the tray (see Figure 5-8).

5. Slide the tray out of the cabinet and place it on a stable working surface.

Replacement

• Reverse the steps in the removal procedure.

Verification

Boot LFU.

Removal and Replacement Procedures 5-27

5.5 AC Distribution Box

The 3-phase 208 VAC distribution box, located at the bottom rear of the system cabinet, rests on right and left side stop brackets and is attached to the cabinet rails with four screws.

Figure 5–9 AC Distribution Box

(Rear)

5-28 Service Manual

Removal

1. Shut down the operating system and turn the keyswitch to Off.

2. At the rear of the cabinet, shut the main circuit breaker off by pushing down the handle.

3. Disconnect the system power cord.

4. From the front of the cabinet, unplug all option power cords from the AC distribution box.

5. At the rear of the cabinet (see Figure 5-9), loosen the four screws (two on each side) attaching the AC distribution box to the cabinet rails.

6. Slide the AC distribution box from the rear of the cabinet.

Replacement

• Reverse the steps in the Removal procedure.

Verification

Power up the system and check that the main circuit breaker does not trip.

Removal and Replacement Procedures 5-29

5.6 Power Rack Assembly

The power rack assembly contains the DC distribution module and three H7506 power supplies.

Figure 5–10 Power Rack Assembly

(Front/Side)

5-30 Service Manual

Removal

1. Shut down the operating system and turn the keyswitch to Off.

2. At the rear of the cabinet, shut the main circuit breaker off by pushing down the handle.

3. Disconnect the system power cord.

4. From the front of the cabinet, remove the three H7506 power supplies by loosening the two screws in the front of each power supply and pulling out the power supply.

5. Remove the two screws (see Figure 5-10) attaching the power rack assembly to the right and left cabinet rails.

6. At the rear of the cabinet, remove the four screws (see Figure 5-10) attaching the power rack assembly to the right and left cabinet rails.

7. Unplug the AC cables from the AC distribution box.

8. Slide the power rack assembly from the rear of the cabinet.

Replacement

• Reverse the steps in the Removal procedure.

Verification

Power up the system and check the power supply LEDs.

H7506 Power Supply

You can replace a failed power supply, or add another power supply, while the system is running. To remove an H7506 power supply (see EK-H7506-IN, H7506 Power Supply Installation), loosen the two screws in the front of the power supply and pull it out. Push the new power supply into the slot and tighten the two screws. Check that both LEDs (see Figure 2-7) are lit when the system is operational.

Removal and Replacement Procedures 5-31

5.7 Cabinet Control Logic (CCL) Panel

The cabinet control logic (CCL) panel monitors signals from parts of the power system and provides error information to the console software. It is located in the rear lower cabinet, right behind the power rack assembly.

Figure 5–11 Cabinet Control Logic (CCL) Panel

(Rear)

Panel labels shown in the figure (rear view): External Power Enable; External UPS Power; External Enable; Console; PowerComm 1; PowerComm 2; PowerComm 3; Expander.

5-32 Service Manual

Removal

1. Shut down the operating system and turn the keyswitch to Off.

2. Ground yourself to the cabinet with an antistatic wrist strap.

3. At the rear of the cabinet, shut the main circuit breaker off by pushing down the handle.

4. Disconnect the cables from the CCL panel.

5. Remove the four screws that hold the CCL panel to the CCL assembly.

6. Remove the CCL panel from the CCL assembly.

Replacement

• Reverse the steps in the Removal procedure.

Verification

Power up the system.

Removal and Replacement Procedures 5-33

5.8 BA36R StorageWorks Shelf

The StorageWorks shelf houses disk drives and a power regulator.

Figure 5–12 BA36R StorageWorks Shelf

Green LEDs

Yellow LEDs

5-34 Service Manual

The StorageWorks shelf contains a power supply, StorageWorks disks, and a controller.

Removal

1. Shut down the operating system and turn the keyswitch to Off.

2. Disconnect the power cable.

3. Remove the two Phillips screws that secure the shelf to the vertical rails.

4. Slide the shelf out of the cabinet.

Replacement

• Reverse the steps in the Removal procedure.

Verification

Power up the system.

Removal and Replacement Procedures 5-35

5.9 DWLPB PCI Box

The DWLPB provides a complete PCI bus subsystem. It contains a KFE72 adapter, which provides I/O for systems using a graphics device.

Figure 5–13 DWLPB PCI Box

(Rear)

5-36 Service Manual

Removal

1. Shut down the operating system and turn the keyswitch to Off.

2. Ground yourself to the cabinet with an antistatic wrist strap.

3. At the rear of the cabinet, shut the main circuit breaker off by pushing down the handle.

4. Disconnect the 48 V cable and I/O hose to the DWLPB.

5. Remove the four screws securing the DWLPB (see Figure 5-13).

6. Slide the DWLPB out on its rails, release the rail locking tabs, and remove the DWLPB from the system.

Replacement

• Reverse the steps in the Removal procedure.

Verification

Power up the system.

Removal and Replacement Procedures 5-37

5.10 Plenum Assembly

The plenum assembly houses the two blowers that cool the system. Air is drawn in through the top of the cabinet, passes through the TLSB card cage, and is exhausted at the middle of the cabinet, to the rear.

Figure 5–14 Plenum Assembly

(Front View)

(Front)

5-38 Service Manual

(Rear)

Removal

1. Shut down the operating system and turn the keyswitch to Off.

2. At the rear of the cabinet, shut the main circuit breaker off by pushing down the handle.

3. Disconnect the cables (17-04942-01) from the blowers.

4. Remove the four screws that secure the plenum assembly to the rack.

5. Remove the plenum assembly from the rack.

Replacement

• Reverse the steps in the Removal procedure.

Verification

Power up the system.

Removal and Replacement Procedures 5-39

5.11 Cabinet Panels

The cabinet panels and doors consist of the top and left and right cabinet panels and the front and rear doors.

Figure 5–15 Cabinet Panels

5-40 Service Manual

Removal

1. Lift off the system cabinet cover and set aside (see Figure 5-15).

2. Open the system cabinet’s front and rear doors.

3. Remove the front and rear screws holding the right panel.

4. Pull the bottom of the panel away from the cabinet, lift up, and remove.

Repeat steps 3 and 4 on the left side to remove the left system cabinet panel.

5. To remove the front door, open it and unplug the signal cable from the rear of the OCP, located at the top inside of the front door. Unscrew the top bracket securing the door to the cabinet. Lift the door off the bottom hinge pin and set aside.

6. To remove the rear door, open it and unscrew the top bracket securing the door to the cabinet. Lift the door off the bottom hinge pin and set aside.

Replacement

• Reverse the steps in the Removal procedure.

Removal and Replacement Procedures 5-41

5.12 Cables

Figure 5-16 diagrams all the GS60E cables.

Figure 5–16 Cables

[Cabling diagram. The figure shows the cabling among the DWLPB-DC PCI box (KFE72-KA and KZPBA-CX PCI modules), the optional DWLPB-DA boxes, the OCP module, the power subrack with its DC distribution module (connectors J2, J6, J7, J9, J10, J14, J15, J16, J17), the CCL module, the TLSB, the blowers, and the CD tray. Part numbers labeled in the figure: OCP module 54-30286-01; DC distribution module 54-30276-01; TLSB 70-30430-01; blower 12-42827-03; splitter 12-44937-01; terminator 12-37618-01. For the optional expander cabinet, add cable 17-03511-05.]

5-42 Service Manual

Table 5-1 Cables

Cable Number    Connects

17-04713-02     Cabinet Control Logic (CCL) panel to TLSB card cage.
17-04941-01     DC distribution module to TLSB card cage (48 V).
17-04942-01     J9, J10 of DC distribution module and CD-ROM tray to blowers.
17-04943-01     J17 of DC distribution module to OCP module.
17-04800-02     CCL panel to J6 of DC distribution module.
17-03961-10     CCL panel to J14 of DC distribution module.
17-03961-10     CCL panel to J15 of DC distribution module.
17-03961-10     CCL panel to J16 of DC distribution module.
17-04945-01     CCL panel and J6 of DC distribution module to DWLPBs (48 V).
17-04670-02     CD tray to KFE72-KA PCI module.
17-03566-15     CD tray to KFE72-KA and KZPBA-CX.
17-03511-05     CCL panel to optional expander cabinet.
17-04950-01     CD tray internal cabling.
17-04100-01     CD tray internal cabling (optional floppy drive).
17-04101-01     CD tray internal cabling (optional floppy drive).
17-03531-02     CD tray internal cabling (CD-ROM drive).
17-04952-01     CD tray internal cabling (CD-ROM drive).
17-03530-01     CD tray internal cabling.

Removal and Replacement Procedures 5-43

Appendix A

Updating Firmware

Use the Loadable Firmware Update (LFU) utility to update system firmware. LFU runs without any operating system and can update the firmware on any system module. LFU handles modules on the TLSB bus (for example, the CPU) as well as modules on the I/O buses. You are not required to specify any hardware path information, and the update process is highly automated.

Both the LFU program and the firmware microcode images it writes are supplied on a CD-ROM. From the SRM console, you start LFU with the boot command.

A typical update procedure is:

1. Verify the console environment variable setting (must be serial).

2. Boot the LFU CD-ROM. (Use the show config command to find the device name of the CD-ROM device.)

3. Use the LFU list command to show the revisions of modules that LFU can update and the revisions of update firmware.

4. Use the LFU update command to write the new firmware.

5. Exit.

Sections in this appendix are:

• Booting LFU

• List

• Update

• Exit

• Display and Verify Commands

• Create

Updating Firmware A-1

A.1 Booting LFU

LFU is supplied on the Alpha CD-ROM (Part Number AG–RCFB*–BE, where * is the letter that denotes the disk revision). Make sure this CD-ROM is mounted in the in-cabinet CD drive. Boot LFU from the CD-ROM.

Example A–1 Booting LFU from CD-ROM

P00>>> sho dev ➊
polling for units on isp0, slot 0, bus0, hose0...
dka400.4.0.0.0    DKA400    RZ26L    440C
polling for units on isp1, slot 1, bus0, hose0...
polling for units on isp2, slot 4, bus0, hose0...
polling for units on isp3, slot 5, bus0, hose0...
dkd400.4.0.5.0    DKD400    RRD47    0000
dkd500.5.0.5.0    DKD500    RZ26L    440C
P00>>> boot dkd400 ➋
Building FRU table............
(boot dkd400.4.0.5.0 -flags 0,a0)
SRM boot identifier: scsi 4 0 5 0 400 ef00 81011
boot adapter: isp3 rev 2 in bus slot 5 off of kftia0 in TLSB slot 8
block 0 of dkd400.4.0.5.0 is a valid boot block
reading 1150 blocks from dkd400.4.0.5.0
bootstrap code read in
Building FRU table…….
base = 200000, image_start = 0, image_bytes = 8fc00
initializing HWRPB at 2000
initializing page table at 1f2000
initializing machine state
setting affinity to the primary CPU
jumping to bootstrap code

The default bootfile for this platform is

[gs140]gs140_v55_10.exe

Hit <RETURN> at the prompt to use the default bootfile.

Bootfile:

Starting Firmware Update Utility

Unpacking firmware files

.

.

***** Loadable Firmware Update Utility *****

----------------------------------------------------------

Function     Description
----------------------------------------------------------
Display      Displays the system’s configuration table.
Exit         Done exit LFU (reset).
List         Lists the device, revision, firmware name, and
             update revision.
Lfu          Restarts LFU.
Readme       Lists important release information.
Create       Make a custom Console Grom Image.
Update       Replaces current firmware with loadable data
             image.
Verify       Compares loadable and hardware images.
? or Help    Scrolls this function table.

WARNING

Before upgrading the "ARC" (AlphaBIOS) section of the console, make sure that the HAL.DLL on WNT boot disk is compatible with the "ARC" section of the console.

See release notes for details.

----------------------------------------------------------

UPD>

➊ Use the show device command to find the name of the CD-ROM drive (an RRD47 in this example).

➋ Enter the boot command to boot LFU from the CD-ROM drive. In this example, the drive has the device name dkd400.

➌ Press Enter for the default bootfile, or enter the directory and file name of the utility.

LFU starts, displays a summary of its commands, and issues its prompt (UPD>).

➍ UPD> is the LFU prompt for command entry.
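When working from a captured console log, the lookup described in callout ➊ can be done programmatically. The following is a minimal Python sketch, not part of LFU or the SRM console. The function name find_cdrom_devices is hypothetical, the four-field line layout is taken from Example A–1, and the code assumes that a CD-ROM drive reports a model string beginning with RRD, as the RRD47 does in that example.

```python
import re

# Matches device lines such as "dkd400.4.0.5.0 DKD400 RRD47 0000".
# The four-field layout is taken from Example A-1; other consoles may differ.
DEVICE_LINE = re.compile(
    r"^(?P<path>\S+)\s+(?P<name>[A-Z]+\d+)\s+(?P<model>\S+)\s+(?P<rev>\S+)$"
)

def find_cdrom_devices(show_dev_output: str) -> list[str]:
    """Return lowercase device names whose model string starts with 'RRD'.

    Treating an 'RRD' prefix as "CD-ROM drive" is an assumption based on
    the RRD47 shown in Example A-1.
    """
    found = []
    for line in show_dev_output.splitlines():
        match = DEVICE_LINE.match(line.strip())
        if match and match.group("model").upper().startswith("RRD"):
            found.append(match.group("name").lower())
    return found

SAMPLE = """\
polling for units on isp0, slot 0, bus0, hose0...
dka400.4.0.0.0 DKA400 RZ26L 440C
polling for units on isp3, slot 5, bus0, hose0...
dkd400.4.0.5.0 DKD400 RRD47 0000
dkd500.5.0.5.0 DKD500 RZ26L 440C
"""

print(find_cdrom_devices(SAMPLE))
```

With the sample text above, the only name returned is dkd400, which is the device name given to the boot command in Example A–1.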

Updating Firmware A-3

A.2 List

The list command displays the inventory of update firmware on the CD-ROM. Only the devices listed at your terminal are supported for firmware updates.

Example A–2 List Command

UPD> list

Device           Current Revision   Filename     Update Revision
cipca0           A315               cipca_fw     A420
kn7cg-ab0_arc    V5.68-0            kn7xx_arc    V5.68-0
kn7cg-ab0        G5.5-11            kn7xx_fw     V5.5-12
kn7cg-ab1_arc    V5.68-0            kn7xx_arc    V5.68-0
kn7cg-ab1        G5.5-11            kn7xx_fw     V5.5-12
                                    ccmab_fw     22
                                    cixcd_fw     7
                                    demfa_fw     2.1
                                    demna_fw     9.4
                                    dfxaa_fw     3.10
                                    kdm70_fw     4.4
                                    kfmsb_fw     2.4
                                    kzmsa_fw     5.6
                                    kzpsa_fw     A12

UPD>

A-4 Service Manual

The list command shows three pieces of information for each device:

• Current revision — The revision of the device’s current firmware

• Filename — The name of the file that is recommended for updating that firmware

• Update revision — The revision of the firmware update
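The three fields above are enough to decide which devices are candidates for an update. The following Python sketch is an illustration only: the function name devices_needing_update is hypothetical, the rows are transcribed by hand from Example A–2, and the comparison is plain string inequality, which shows that two revisions differ but not which one is newer.

```python
def devices_needing_update(list_rows):
    """Return device names whose current revision differs from the update revision.

    Each row is a (device, current_revision, filename, update_revision) tuple
    transcribed from the list display. String inequality is used, so the
    result says "different", not "older".
    """
    return [device for device, current, _filename, update in list_rows
            if current != update]

# Rows transcribed from Example A-2 (devices present in the system only).
ROWS = [
    ("cipca0", "A315", "cipca_fw", "A420"),
    ("kn7cg-ab0_arc", "V5.68-0", "kn7xx_arc", "V5.68-0"),
    ("kn7cg-ab0", "G5.5-11", "kn7xx_fw", "V5.5-12"),
    ("kn7cg-ab1_arc", "V5.68-0", "kn7xx_arc", "V5.68-0"),
    ("kn7cg-ab1", "G5.5-11", "kn7xx_fw", "V5.5-12"),
]

print(devices_needing_update(ROWS))
```

For the rows shown, the ARC sections already match the update revision, so only cipca0 and the two processor console images are reported.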

Updating Firmware A-5

A.3 Update

The update command writes new firmware from the CD-ROM to the module. Then LFU automatically verifies the update by reading the new firmware image from the module into memory and comparing it with the CD-ROM image.

Example A–3 Update Command

UPD> update kn7cg-ab0

WARNING: updates may take several minutes to complete for each device.

Confirm update on: kn7cg-ab0_arc [Y/(N)] y

DO NOT ABORT!

kn7cg-ab0_arc Updating to V5.68-0 .Verifying V5.68-0 Passed

Confirm update on: kn7cg-ab0 [Y/(N)] y

DO NOT ABORT!

kn7cg-ab0 Updating to V5.5-12... Verifying V5.5-12... Passed.

UPD> update kzpsa0

WARNING: updates may take several minutes to complete for each device.

Confirm update on: kzpsa0 [Y/(N)] y

DO NOT ABORT!

kzpsa0 Updating to A10... FAILED.

UPD> exit

Errors occurred during update with the following devices: kzpsa0

Do you want to continue to exit?

Continue [Y/(N)]y

Initializing...

[self-test display appears]

A-6 Service Manual

The update command requests a firmware update for a specific module. To update more than one device, you may use a wildcard but not a list. For example, update k* updates all devices with names beginning with k, and update * updates all devices.

LFU requires you to confirm the update. For processors, the first update to confirm is the AlphaBIOS firmware; the second is the SRM console firmware. In either case, the default is no.

Status messages report update and verification progress.

The update kzpsa0 command is a second example.

In the second example, the update failed. This could indicate a bad device.
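The wildcard behavior described for update k* and update * can be previewed offline before typing the command. The sketch below uses Python's standard fnmatch module as a stand-in; it is an assumption that shell-style matching models LFU's own wildcard handling beyond the two patterns the text describes, and the function name select_devices is hypothetical.

```python
from fnmatch import fnmatchcase

def select_devices(pattern, devices):
    """Return the device names that match a shell-style wildcard pattern.

    This models only the behavior described in the text (k* and *); LFU's
    actual matching rules may differ for other patterns.
    """
    return [name for name in devices if fnmatchcase(name, pattern)]

# Device names taken from the examples in this appendix.
DEVICES = ["cipca0", "kn7cg-ab0", "kn7cg-ab1", "kzpsa0"]

print(select_devices("k*", DEVICES))  # names beginning with k
print(select_devices("*", DEVICES))   # every device
```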

CAUTION: Never abort an update operation. Aborting corrupts the firmware on the module.

Updating Firmware A-7

Example A–3 Update Command (Continued)

UPD> update

➏ confirm update on:

➐ kzpsa0 kzpsa1 pfi0

[Y/(N)]n

UPD> update kzpsa0 -path cipca_fw

WARNING: updates may take several minutes to complete for each device.

Confirm update on: kzpsa0 [Y/(N)]y

DO NOT ABORT!

Kzpsa0 firmware filename ’kdm70_fw’ is bad

UPD>

A-8 Service Manual

When you do not specify a device name, LFU tries to update all devices.

LFU lists the selected devices to update and prompts before devices are updated.

In this next example, the -path option is used to update a device with different firmware from the LFU default. A network location for the firmware file can be specified with the -path option. In this example, the firmware filename is not a valid file for the device specified.

CAUTION: Never abort an update operation. Aborting corrupts the firmware on the module.

Updating Firmware A-9

A.4 Exit

The exit command terminates the LFU program, causes system initialization and self-test, and returns the system to console mode.

Example A–4 Exit Command

UPD> exit

Initializing...

[self-test display appears]

P00>>>

UPD> update kzpsa0

WARNING: updates may take several minutes to complete for each device.

Confirm update on: kzpsa0 [Y/(N)]y

DO NOT ABORT!

kzpsa0 Updating to A10... FAILED.

UPD> exit

Errors occurred during update with the following devices :

➌ kzpsa0

Do you want to continue to exit?

Continue [Y/(N)]y

Initializing...

[self-test display appears]

P00>>>

At the UPD> prompt, exit causes the system to be initialized.

The console prompt appears.

Errors occurred during an update.

Because of the errors, confirmation of the exit is required.

Typing y causes the system to be initialized and the console prompt to appear.

A.5 Display and Verify Commands

Display and verify commands are used in special situations. Display shows the physical configuration. Verify repeats the verification process performed by the update command.

Example A–5 Display and Verify Commands

UPD> display

Name Type Rev Mnemonic

TLSB

0++ KN7CG-AB 8014 0000 kn7cg-ab0

2+ MS7CC 5000 0000 ms7cc0

5+ MS7CC 5000 0000 ms7cc1

8+ KFTHA 2020 0000 kftha0

C0 C0 PCI connected to kftha0 pci1

6+ DECchip 21040-AA 21011 0023 tulip2

A+ KZPSA 81011 0000 kzpsa0

UPD> verify kzpsa0

➋ kzpsa0 Verifying A10... PASSED.

UPD>

Display shows the system physical configuration. Display is equivalent to issuing the console command show configuration.

Because it shows the slot for each module, display can help you identify the location of a device.

Verify reads the firmware from the module into memory and compares it with the update firmware on the CD-ROM. If a module verified successfully when you updated it but later fails self-test, you can use verify to tell whether the firmware has become corrupted.
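The comparison that verify performs can be pictured as a byte-for-byte check of two images. The following Python sketch illustrates that idea only; it is not LFU's implementation, and the function name first_mismatch and the sample byte values are hypothetical.

```python
def first_mismatch(module_image: bytes, reference_image: bytes):
    """Return the offset of the first differing byte, or None if the images match."""
    for offset, (a, b) in enumerate(zip(module_image, reference_image)):
        if a != b:
            return offset
    if len(module_image) != len(reference_image):
        # One image is a prefix of the other; they diverge at the shorter length.
        return min(len(module_image), len(reference_image))
    return None

# Hypothetical four-byte images, for illustration only.
reference = bytes([0x10, 0x20, 0x30, 0x40])
corrupted = bytes([0x10, 0x20, 0x31, 0x40])

print(first_mismatch(reference, reference))
print(first_mismatch(corrupted, reference))
```

A result of None corresponds to a passing verification; any offset corresponds to a failure and identifies where the two images first differ.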

A.6 Create

The create command allows you to make a custom console image.

Example A–6 Create Command

UPD> create

Console ARC image:
File = obj\alpha\tl6ab Version = V5.68-0 Creation time = 26-NOV-1998 05:56:28
Image size = 70000(458752)

Console GROM image:
File = tl6 Version = V5.5-12 Creation time = 16-JUL-1999 11:50:35
Overlays = 163 Image size = 13b5f4(1291764)

Flash free bytes 49ec(18924)

Select form of new Console Grom image [Auto/Modify/Full/(A)] m

Do you wish to include debug capability [Y/(N)]

Included overlays: tl6 advcmd advshell arc arccmd ashshell basiccmd bitmap boot cipca cpu_mem cpu_tst diag_tio diagcmd diagsupport eecmd eeprom eisa environ ether examine fat flash floppy fptest fru galaxy hpc_diag info iso9660 isp1020 isp1020fw kbd kzpaa lfu lfu_drivers memtest mp_ex mscp net nettest nport ods2 optional pci pci_diag phase3 powerup prcache scsi set show show_power test tiop_diag toast tulip vga x86 x86a

Flash free bytes 13fefc(1310460)

Do you wish to add, remove or list overlays? [a,r,l,n] – l

Available overlays: cixcd dac960 debug defpa demfa demna dup i82558 kdm70 kfesa kfmsb kfpsa kgpsa kzmsa kzpsa lamb_diag mc_diag simport tga xct xdelta xmi

Included overlays: tl6 advcmd advshell arc arccmd ashshell basiccmd bitmap boot cipca cpu_mem cpu_tst diag_tio diagcmd diagsupport eecmd eeprom eisa environ ether examine fat flash floppy fptest fru galaxy hpc_diag info iso9660 isp1020 isp1020fw kbd kzpaa lfu lfu_drivers memtest mp_ex mscp net nettest nport ods2 optional pci pci_diag phase3 powerup prcache scsi set show show_power test tiop_diag toast tulip vga x86 x86a

Flash free bytes 13fefc(1310460)

Do you wish to add, remove or list overlays? [a,r,l,n] –

When you select create, LFU first displays the ARC and Grom console parameters.

LFU asks if you want to modify any parameter values. The default response is no.

Enter l to list the available overlays; or select another function.
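The size fields in Example A–6 appear as a hexadecimal value followed by the same value in decimal, in parentheses. When a transcript has been retyped or captured over a noisy line, the pairs can be cross-checked. The Python sketch below is an illustration; the function name check_size_fields is hypothetical, and the sample string is copied from the example.

```python
import re

# A hexadecimal number immediately followed by a decimal number in parentheses.
SIZE_FIELD = re.compile(r"\b([0-9a-fA-F]+)\((\d+)\)")

def check_size_fields(text: str):
    """Yield (hex_text, decimal_value, consistent) for every hex(decimal) pair."""
    for hex_text, dec_text in SIZE_FIELD.findall(text):
        yield hex_text, int(dec_text), int(hex_text, 16) == int(dec_text)

SAMPLE = "Image size = 13b5f4(1291764) Flash free bytes 49ec(18924)"

for hex_text, value, ok in check_size_fields(SAMPLE):
    print(hex_text, value, ok)
```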

Appendix B

Console Commands and

Environment Variables

B.1 Console Commands

Table B-1 is a summary of the console commands, showing syntax and brief descriptions. For additional information, see the Operations Manual.

Table B–1 Summary of Console Commands

Command
    Description

b[oot] [-flags M,PPPP] [-file <filename>] <device_name>
    Boot the operating system.
    –fl[ags] — overrides the boot_osflags environment variable.
    M — specifies the system root to be booted from the system disk.
    PPPP — operating system bootstrap loader options.
    –file — boot from the file <filename> (overrides the boot_file environment variable).

bu[ild] –c <device>
    Copy the EEPROM environment variables from a secondary processor to the primary processor.
    <device> — KN7CG-AA

bu[ild] –e <device>
    Initialize a module’s EEPROM.
    <device> — KN7CG-AA

Console Commands and Environment Variables B-1

Table B–1 Summary of Console Commands (Continued)

Command
    Description

bu[ild] –n <device>
    Initialize the CPU’s nonvolatile RAM.
    <device> — KN7CG-AA

bu[ild] –s <device>
    Initialize a module’s serial EEPROM.
    <device> — MS7CC, KFTHA, or DWLPB.

cl[ear] ee[prom] <option>
    Clears the selected EEPROM option.
    <option> — diag_sdd, diag_tdd, symptom, or log.

cl[ear] <envar>
    Removes an environment variable.
    <envar> — name of the environment variable.

Clears the terminal screen.

cl[ear] screen c[ontinue] cra[sh]

<device> — KN7CG- AA

Resumes processing at the point where it was interrupted by Ctrl/P.

Causes the operating system to restart and generates a memory dump.

cre[ate] <envar> [<value>]
    Creates an environment variable.
    <envar> — name of the environment variable.
    <value> — optional variable value.

da[te] [<yyyymmddhhmm.ss>]
    Sets or displays the system date and time.
    yyyy — year; mm — month; dd — day; hh — hour; mm — minutes; ss — seconds

d[eposit] [-{b,w,l,q,o,h}] [-{n val, s val}] [space:]<address> <data>
    Stores data in the specified location.
    space — device name or address space of the device to access.
    <address> — offset within a device to which data is deposited.

Provides information on console commands.

e[xamine][-{b,w,l,q,o,h}][-{n val, s val}][space:]<address> i[nitialize] Performs a reset.

B-2 Service Manual

Table B–1 Summary of Console Commands (Continued)

Command
    Description

run <program> [-d <device>] [-p <n>] [-s <parameter string>]
    Runs one of four ARC utility programs: rcu (RAID Configuration Utility), swxcrfw, eepromcfg, util_cli. The arc_enable environment variable must be set.
    <program> — command option.
    <device> — console device containing the program (default is dva0).
    <n> — unit number of the PCI to configure.
    <parameter string> — optional parameters to pass to the utility (must be enclosed in quotes).

runecu
    Invokes the EISA Configuration Utility.

se[t] ee[prom] <option>
    Sets the selected EEPROM option.
    <option> — field, halt, manufacturing, serial, or symptom.

se[t] <envar> [value]
    Modifies an environment variable. See Table B-2 for the values of envar and value. The command set –d envar resets the environment variable to its default.

se[t] h[ost] <device_adapter> or se[t] h[ost] <-dup> <-bus b> mode [task]
    Connects to another console or service. The –dup option invokes the DUP server on the selected node. The set host command can be issued only from the boot processor.

se[t] see[prom] <option> <device>
    Sets the selected SEEPROM option.
    <option> — field, manufacturing, or serial.
    <device> — the device mnemonic.

sh[ow] c[onfiguration]
    Displays the last configuration seen at system initialization.

sh[ow] cpu
    Displays information on CPUs in the system.

sh[ow] dev[ice] <dev_name>
    Displays device information for any disk or tape adapter or group of adapters.

sh[ow] ee[prom] <option>
    Displays selected EEPROM information.
    <option> — field, halt, manufacturing, serial, or symptom.

Console Commands and Environment Variables B-3

Table B–1 Summary of Console Commands (Continued)

Command
    Description

sh[ow] <envar> or show *
    Displays the current state of the specified environment variable.
    <envar> — an environment variable name (see Table B-2).

sh[ow] m[emory]
    Displays memory module information.

sh[ow] ne[twork]
    Displays the names and physical addresses of all known network devices.

sh[ow] see[prom] <option> <device>
    Displays selected SEEPROM information.
    <option> — diag_sdd, diag_tdd, symptom, field, manufacturing, or serial.
    <device> — KFTHA

sh[ow] simm
    Displays the location of any bad SIMMs or indicates that no SIMM errors were found.

s[tart] address
    Begins execution of an instruction at the address specified. Does not initialize the system.

sto[p] <processor_number>
    Halts a specified processor. Does not control the running of diagnostics and does not apply to adapters or memories.
    <processor_number> — the logical CPU number (displayed by the show cpu command).

t[est] [-write] [-nowrite “list”] [-omit “list”] [-t time] [-q] [<dev_arg>]
    Tests the entire system (default), a subsystem, or a specified device.
    –write — selects writes to media as well as reads; applicable only to disk testing.
    –nowrite “list” — used with –write to prevent selected devices or groups of devices from being written to.
    –omit “list” — specifies devices not to test.
    –t time — run time in seconds, following system sizing and configuration; default is 600 seconds.
    –q — disables status messages.
    <dev_arg> — specifies the target device, group of devices or subsystem.

# (comment)
    Introduces a comment.
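The date command in Table B–1 takes its argument in the form yyyymmddhhmm.ss. As a convenience when preparing that argument on another machine, the string can be generated from a timestamp. The Python sketch below is an illustration; the function name console_date_argument is hypothetical, and the sample date is arbitrary.

```python
from datetime import datetime

def console_date_argument(moment: datetime) -> str:
    """Format a datetime as yyyymmddhhmm.ss, the layout shown for the date command."""
    return moment.strftime("%Y%m%d%H%M.%S")

# Arbitrary sample: 15 February 2000, 14:30:05.
print(console_date_argument(datetime(2000, 2, 15, 14, 30, 5)))
```

For the sample timestamp, the string produced is 200002151430.05.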

B-4 Service Manual

B.2 Environment Variables

An environment variable is a name and value association maintained by the console program. The value associated with an environment variable is an ASCII string (up to 127 characters) or an integer. Some environment variables are typically modified by the user to tailor the recovery behavior of the system on power-up and after system failures. Volatile environment variables are initialized by a system reset; others are nonvolatile across system failures.

Environment variables are created, modified, displayed, and deleted using the create, set, show, and clear commands. A default value is associated with any variable that is stored in the EEPROM area.

Table B-2 lists console environment variables, their attributes, and their functions.
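The rules in this section (string or integer values, the 127-character limit, and the difference between volatile and nonvolatile variables) can be modeled in a few lines. The Python sketch below is a toy model for illustration, not the console's implementation. The class name EnvironmentStore and its methods are hypothetical; they mirror the create, set, show, and clear commands, and reset models a system reset by returning volatile variables to the values they were created with.

```python
MAX_STRING_LENGTH = 127  # limit stated for ASCII string values

class EnvironmentStore:
    """Toy model of console environment variables (illustrative only)."""

    def __init__(self):
        self._values = {}   # name -> current value
        self._initial = {}  # initial values of volatile variables only

    def create(self, name, value="", volatile=False):
        self.set(name, value)
        if volatile:
            self._initial[name] = value

    def set(self, name, value):
        if isinstance(value, str):
            if len(value) > MAX_STRING_LENGTH or not value.isascii():
                raise ValueError("string values must be ASCII, at most 127 characters")
        elif not isinstance(value, int):
            raise TypeError("value must be an ASCII string or an integer")
        self._values[name] = value

    def show(self, name):
        return self._values[name]

    def clear(self, name):
        self._values.pop(name, None)
        self._initial.pop(name, None)

    def reset(self):
        """Model a system reset: volatile variables return to their initial values."""
        self._values.update(self._initial)

env = EnvironmentStore()
env.create("d_report", "full", volatile=True)  # listed as Volatile in Table B-2
env.create("boot_reset", "off")                # listed as Nonvolatile in Table B-2
env.set("d_report", "summary")
env.set("boot_reset", "on")
env.reset()
print(env.show("d_report"), env.show("boot_reset"))
```

After the reset, the volatile d_report is back to full while the nonvolatile boot_reset keeps the value on that was set before the reset.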

Table B–2 Environment Variables

Variable        Attribute
    Function

arc_enable      Nonvolatile
    Enables the console ARC interface, allowing booting of ECU and other utilities. Default value is off.

auto_action     Nonvolatile
    Specifies the action the system will take following an error halt. Values are:
    restart — Automatically restart. If restart fails, boot the operating system.

bootdef_dev     Nonvolatile
    The default device or device list from which booting is attempted when no device name is specified by the boot command.

boot_file       Nonvolatile
    The default file name used for the primary bootstrap when no file name is specified by the boot command, if appropriate.

boot_osflags    Nonvolatile
    Additional parameters to be passed to the system software during booting if none are specified by the boot command with the –flags qualifier.

Console Commands and Environment Variables B-5

Table B–2 Environment Variables (Continued)

Variable        Attribute
    Function

boot_reset      Nonvolatile
    Resets system and displays self-test results during booting. Default value is off.

console         Nonvolatile
    The type of terminal being used for the console, either serial (default) for a standard video terminal or graphics for a graphics display. If the terminal is a graphics display, the system must have a PCI with a standard I/O module and a TGA graphics controller. If that hardware is not available, the variable remains set to serial.

cpu             Volatile
    Selects the current boot processor.

cpu_enabled     Nonvolatile
    A bitmask indicating which processors are enabled to run (leave console mode). Default is 0xffff.

cpu_primary     Nonvolatile
    A bitmask indicating which processors are enabled to become the next boot processor, following the next reset. Default is 0xffff.

d_harderr       Volatile
    Determines action taken following a hard error. Values are halt (default) and continue. Applies only when using the test command.

d_report        Volatile
    Determines level of information provided by the diagnostic reports. Values are summary and full (default). Applies only when using the test command.

d_softerr       Volatile
    Determines action taken following a soft error. Values are continue (default) and halt. Applies only when using the test command.

dump_dev        Nonvolatile
    Device to which dump file is written if system crashes, if supported by the operating system.

B-6 Service Manual

Table B–2 Environment Variables (Continued)

Variable        Attribute
    Function

enable_audit    Nonvolatile
    If set to on (default), enables the generation of audit trail messages. If set to off, audit trail messages are suppressed. Console initialization sets this to on.

graphics_switch Nonvolatile
    Overrides the screen resolution setting. The variable is an integer from 0 to 15, as described in Table B-3.

interleave      Nonvolatile
    The memory interleave specification. Value must be default (memory configuration algorithm that attempts to maximize memory interleaving is used), none, or an explicit interleave list.

language        Nonvolatile
    Determines whether system displays message numbers or message text. Default value is 36 (English).

simm_callout    Nonvolatile
    If set to on, enables pause-on-error mode (POEM) testing of faulty memories during power-up. Default is off.

sys_model_num   Nonvolatile
    The system model number, GS60E. Set in manufacturing.

sys_serial_num  Nonvolatile
    The system serial number. Set in manufacturing.

tta0_baud       Nonvolatile
    Sets the console terminal baud rate. Allowable values are 300, 600, 1200, 2400, 4800, and 9600.
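The cpu_enabled and cpu_primary variables in Table B–2 are bitmasks with a default of 0xffff. The Python sketch below shows how such a mask can be built and decoded. It rests on two assumptions that the table does not state: that the mask is 16 bits wide (inferred from the default value) and that bit n corresponds to CPU number n. The function names cpu_mask and cpus_in_mask are hypothetical.

```python
def cpu_mask(cpu_numbers):
    """Build a bitmask with one bit set per CPU number.

    Assumes bit n stands for CPU n and a 16-bit mask; neither is stated
    in Table B-2.
    """
    mask = 0
    for cpu in cpu_numbers:
        if not 0 <= cpu <= 15:
            raise ValueError("CPU number must be in the range 0-15 for a 16-bit mask")
        mask |= 1 << cpu
    return mask

def cpus_in_mask(mask):
    """List the CPU numbers whose bits are set in a 16-bit mask."""
    return [cpu for cpu in range(16) if mask & (1 << cpu)]

print(hex(cpu_mask(range(16))))  # all sixteen bits set
print(hex(cpu_mask([0, 1])))
print(cpus_in_mask(0x0005))
```

Under these assumptions, the default value 0xffff corresponds to all sixteen possible CPU numbers being enabled.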

Console Commands and Environment Variables B-7

Table B-3 Settings for the graphics_switch Environment Variable

Setting    Pixel Frequency (MHz)    Monitor Resolution (Pixels)    Refresh Rate (Hz)

0          130                      1280 x 1024                    72
1          119                      1280 x 1024                    66
2          108                      1280 x 1024                    60
3          104                      1152 x 900                     72
4          93                       1152 x 900                     66
5          75                       1024 x 768                     70
6          74                       1024 x 768                     72
7          69                       1024 x 864                     60
8          65                       1024 x 768                     60
9          50                       800 x 600                      72
10         40                       800 x 600                      60
11         32                       640 x 480                      72
12         25                       640 x 480                      60
13         135                      1280 x 1024                    75
14         110                      1280 x 1024                    60
15         Reserved

B-8 Service Manual

Index

A

AC distribution box, 5-28

Address bus commands, 4-2

Address gate array (ADG), 1-7

ARC utility programs, B-3

Audit trail messages, B-7

B

BA36R StorageWorks shelf, 1-14, 2-14, 5-33

Baud rate, console terminal, B-7

Blowers, 1-14, 2-14, 5-38

boot command, A-3, B-1

Boot processor, 3-3

Booting LFU, A-2

BPD line, 3-3

build -c command, 5-7, 5-11

build command, B-1

C

Cabinet control logic (CCL) panel, 1-12, 5-32

Cabinet panels, 5-40

Cables, 5-42

Cache memory, 1-7

CD-ROM drive, 1-14, 2-14, 5-26

clear command, B-2

Commander node, 4-2

Comment (#) command, B-4

Console CD-ROM, A-2

Console commands, B-1

Console halt conditions, 4-30

continue command, B-2

Control and status register (CSR), 4-2

CPU double error halt, 4-30, 4-33

crash command, B-2

create command, B-2

D

Data bus signals, 4-3

Data interface gate arrays (DIGA), 1-7

date command, B-2

DC distribution module, 5-43

DC to DC converters, 1-7, 1-15

DECevent, 4-3

deposit command, B-2

display command, LFU, A-12

Dump file, B-6

DWLPB error log, 4-24

DWLPB PCI box, 5-36

E

EMI gasket, 5-13

Enabled (E) processor, 3-3

Environment variables
    arc_enable, B-5
    auto_action, B-5
    boot_file, B-5
    boot_osflags, B-5
    boot_reset, B-6
    bootdef_dev, B-5
    console, B-6
    cpu, B-6
    cpu_enabled, B-6
    cpu_primary, B-6
    d_harderr, B-6
    d_report, B-6
    d_softerr, B-6
    dump_dev, B-6
    enable_audit, B-7
    graphics_switch, B-7
    interleave, B-7
    language, B-7
    simm_callout, B-7
    sys_model_num, B-7
    sys_serial_num, B-7
    tta0_baud, B-7

Error checking, 4-3

Error log, DECevent, 4-4

Error log header structure, 4-31

Error log size, 4-42

Event type identification, 4-7

examine command, B-2

exit command, LFU, A-10

Expander cabinet, 1-2, 5-43

F

Fatal errors, 4-30

Floppy drive, 1-14, 5-43

G

Graphics console, B-6

graphics_switch environment variable setting, B-8

grep command, 3-15

GS60E options, 1-3

H

H7056 power supply removal and replacement, 5-29

Hard error, B-6

Hose numbering, 3-5

Hoses, 1-10

I

I/O hose cable, 5-18

info 5 command, 3-15

info command, 3-14

init command, 3-13

initialize command, B-2

K

KFTHA module, 1-10

KFTHA placement, 1-5

L

LARS number, 5-7, 5-11

LFU
    booting, A-2
    display command, A-12
    exit command, A-10
    list command, A-4
    update command, A-6
    verify command, A-12

LFU prompt, UPD>, A-3

list command, LFU, A-4

Loadable firmware update (LFU) utility, A-1

M

Machine check 620 errors, 4-17, 4-52

Machine check 660 errors, 4-8

Machine check 670 errors, 4-30

Machine check errors, 4-6

Machine check error log, 4-42

Machine check logout frames, 4-39

Memory
    interleaving, 3-3
    size, 3-3

Memory interleave specification, B-7

Memory module
    placement, 1-5
    removal, 5-13
    test, 3-10

Module placement rules, 1-5

MS7CC memory module, 1-8

Multiprocessor testing, 3-3

N

Node # line, 3-3

O

OCP cable, 5-24, 5-43

OCP removal, 5-24

OpenVMS event type identification, 4-7

OSF event type identification, 4-7

P

PAL code, 4-3

Parse trees, 4-23, 4-61

Parsing errors, 4-8, 4-12

path option, A-9

PCI shelves (DWLPB-DA), 1-15

Plenum assembly, 5-38

Power rack assembly, 5-30

Power subsystem, 1-12

Power supplies, 1-12, 5-31

Processor module, 1-6, 5-2
    placement, 1-5
    replacement, 5-12

R

Removal and replacement procedures
    AC distribution box, 5-28
    BA36R StorageWorks shelf, 5-34
    boot processor, 5-4
    cabinet control logic (CCL) panel, 5-32
    cabinet panels and doors, 5-40
    CD tray, 5-26
    DWLPB, 5-36
    H7056 power supply, 5-29
    I/O hose cable, 5-18
    KFTHA, 5-18
    memory module, 5-13
    operator control panel (OCP), 5-24
    plenum, 5-39
    power rack assembly, 5-30
    power supply, 5-31
    processor module, 5-2
    second module, 5-8
    SIMM, 5-14
    terminator module, 5-12
    TLSB card cage, 5-20

run command, B-3

runecu command, B-3

S

Self-test console display, 3-2

Serial console, B-6

set command, B-3

show command, B-3

show configuration command, 5-13

Show configuration display, 3-4

show device command, 5-23, 3

show simm command, 5-13

SIMM console commands, 3-13

SIMM fault, 4-12

SIMM identification, failing, 3-12

SIMMs, 1-9

Slave node, 4-2

start command, B-4

stop command, B-4

StorageWorks shelves (BA36R), 1-15

Summary error log, 4-5

Supported event types, 4-6

T

Terminating testing, 3-9

Terminator module, 1-5, 5-12

test command, 3-6, 5-23, B-4, B-6

TLEP subpacket, 4-44

TLSB system bus, 1-4, 4-2

Troubleshooting overview, 1-16

Troubleshooting tools, 1-17

TYP line, 3-3

U

update command, 5-5, 5-9, A-6

Updating firmware, A-1

V

Verification, 5-13, 5-15, 5-19, 5-23, 5-25, 5-27, 5-29, 5-31, 5-33, 5-37, 5-39

verify command, LFU, A-12
