
Sun Fire™ 6800/4810/4800/3800 Systems Platform Administration Manual
Sun Microsystems, Inc.
901 San Antonio Road
Palo Alto, CA 94303 U.S.A.
650-960-1300
Part No. 816-2970-10
May 2002, Revision A
Send comments about this document to: docfeedback@sun.com
Copyright 2002 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, CA 95054 U.S.A. All rights reserved.
This product or document is protected by copyright and distributed under licenses restricting its use, copying, distribution, and decompilation.
No part of this product or document may be reproduced in any form by any means without prior written authorization of Sun and its licensors,
if any. Third-party software, including font technology, is copyrighted and licensed from Sun suppliers.
Parts of the product may be derived from Berkeley BSD systems, licensed from the University of California. UNIX is a registered trademark in
the U.S. and other countries, exclusively licensed through X/Open Company, Ltd.
Sun, Sun Microsystems, the Sun logo, docs.sun.com, Sun Fire, OpenBoot, Sun StorEdge, and Solaris are trademarks, registered trademarks, or
service marks of Sun Microsystems, Inc. in the U.S. and other countries. All SPARC trademarks are used under license and are trademarks or
registered trademarks of SPARC International, Inc. in the U.S. and other countries. Products bearing SPARC trademarks are based upon an
architecture developed by Sun Microsystems, Inc.
Federal Acquisitions: Commercial Software - Government Users Subject to Standard License Terms and Conditions.
DOCUMENTATION IS PROVIDED “AS IS” AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND WARRANTIES,
INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NONINFRINGEMENT, ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID.
The OPEN LOOK and Sun™ Graphical User Interface was developed by Sun Microsystems, Inc. for its users and licensees. Sun acknowledges the pioneering efforts of Xerox in researching and developing the concept of visual or graphical user interfaces for the computer industry. Sun holds a non-exclusive license from Xerox to the Xerox Graphical User Interface, which license also covers Sun's licensees who implement OPEN LOOK GUIs and otherwise comply with Sun's written license agreements.
Contents
Preface
1. Introduction
  Domains
  System Components
  Partitions
  System Controller
    Serial and Ethernet Ports
    System Controller Logical Connection Limits
    System Controller Software
  Redundant Components and Minimum Configurations
    Redundant System Controller Boards
    CPU/Memory Boards
    I/O Assemblies
    Redundant Cooling
    Redundant Power
    Repeater Boards
    Redundant System Clocks
  Reliability, Availability, and Serviceability (RAS)
    Reliability
    Availability
    Serviceability
  Dynamic Reconfiguration Software
  Sun Management Center Software for the Sun Fire 6800/4810/4800/3800 Systems Software
  FrameManager
2. System Controller Navigation Procedures
  Connection to the System Controller
    Obtaining the Platform Shell
      ▼ To Obtain the Platform Shell Using telnet
    Obtaining a Domain Shell or Console
  System Controller Navigation
    ▼ To Enter the Domain Console From the Domain Shell If the Domain Is Inactive
    ▼ To Enter the Domain Shell From the Domain Console
    ▼ To Get Back to the Domain Console From the Domain Shell
    ▼ To Enter a Domain From the Platform Shell
  Terminating Sessions
    ▼ To Terminate an Ethernet Connection With telnet
    ▼ To Terminate a Serial Connection With tip
3. System Power On and Setup
  Installing, Cabling, and Powering on the Hardware
    Setting Up Additional Services Before System Power On
    Powering On the Hardware
    Powering On the Power Grids
  Setting Up the Platform
    ▼ To Set the Date and Time for the Platform
    ▼ To Set a Password for the Platform
    ▼ To Configure Platform Parameters
  Setting Up Domain A
    ▼ To Access the Domain
    ▼ To Set the Date and Time for Domain A
    ▼ To Set a Password for Domain A
    ▼ To Configure Domain-Specific Parameters
  Saving the Current Configuration to a Server
    ▼ To Use dumpconfig to Save Platform and Domain Configurations
  Installing and Booting the Solaris Operating Environment
    ▼ To Install and Boot the Solaris Operating Environment
4. Creating and Starting Domains
  Before Creating Multiple Domains
  Creating and Starting Multiple Domains
    ▼ To Create A Second Domain
    Special Considerations When Creating a Third Domain on the Sun Fire 6800 System
    ▼ To Start the Domain
5. Security
  Security Threats
  System Controller Security
    setupplatform and setupdomain Parameter Settings
    Changing Passwords for the Platform and the Domain
  Domains
    Domain Separation
    setkeyswitch Command
  Solaris Operating Environment Security
  SNMP
6. Maintenance
  Powering Off and On the System
    Powering Off the System
    ▼ To Power Off the System
    ▼ To Power On the System
  Keyswitch Positions
    ▼ To Power On a Domain
  Shutting Down Domains
    ▼ To Shut Down a Domain
  Assigning and Unassigning Boards
    ▼ To Assign a Board to a Domain
    ▼ To Unassign a Board From a Domain
  Upgrading the Firmware
  Saving and Restoring Configurations
    Using dumpconfig
    Using restoreconfig
7. System Controller Failover
  How SC Failover Works
    What Triggers an Automatic Failover
    What Happens During a Failover
  SC Failover Prerequisites
  Conditions That Affect Your SC Failover Configuration
  How to Manage SC Failover
    ▼ To Disable SC Failover
    ▼ To Enable SC Failover
    ▼ To Perform a Manual SC Failover
    ▼ To Obtain Failover Status Information
  How to Recover After an SC Failover
8. Testing System Boards
  Testing a CPU/Memory Board
    ▼ To Test a CPU/Memory Board
  Testing an I/O Assembly
    Requirements
    ▼ To Test an I/O Assembly
9. Removing and Replacing Boards
  CPU/Memory Boards and I/O Assemblies
    ▼ To Remove and Replace a System Board
    ▼ To Unassign a Board From a Domain or Disable a System Board
    ▼ To Hot-Swap a CPU/Memory Board
    ▼ To Hot-Swap an I/O Assembly
  CompactPCI and PCI Cards
    ▼ To Remove and Replace a PCI Card
    ▼ To Remove and Replace a CompactPCI Card
  Repeater Board
    ▼ To Remove and Replace a Repeater Board
  System Controller Board
    ▼ To Remove and Replace the System Controller Board in a Single SC Configuration
    ▼ To Remove and Replace a System Controller Board in a Redundant SC Configuration
  ID Board and Centerplane
    ▼ To Remove and Replace ID Board and Centerplane
10. Troubleshooting
  System Faults
  Displaying Diagnostic Information
  Displaying System Configuration Information
  Assisting Sun Service Personnel
    ▼ To Determine the Cause of Your Failure
  Domain Not Responding
    Hung Domain
    ▼ To Recover a Hard Hung or Paused Domain
  Board and Component Failures
    CPU/Memory Board Failure
    I/O Assembly Failure
    System Controller Board Failure
    Collecting Platform and Domain Status Information
    Repeater Board Failure
    Power Supply Failure
    Fan Tray Failure
    FrameManager Failure
  Disabling Components
A. Mapping Device Path Names
  Device Mapping
    CPU/Memory Mapping
    I/O Assembly Mapping
B. Setting Up an http or ftp Server
  Setting Up the Firmware Server
    ▼ To Set Up an http Server
    ▼ To Set Up an ftp Server
Glossary
Index
Figures
FIGURE 1-1 Sun Fire 6800 System in Single-Partition Mode
FIGURE 1-2 Sun Fire 6800 System in Dual-Partition Mode
FIGURE 1-3 Sun Fire 4810/4800 Systems in Single-Partition Mode
FIGURE 1-4 Sun Fire 4810/4800 Systems in Dual-Partition Mode
FIGURE 1-5 Sun Fire 3800 System in Single-Partition Mode
FIGURE 1-6 Sun Fire 3800 System in Dual-Partition Mode
FIGURE 2-1 Navigating Between the Platform Shell and the Domain Shell
FIGURE 2-2 Navigating Between the Domain Shell, the OpenBoot PROM, and the Solaris Operating Environment
FIGURE 2-3 Navigating Between the OpenBoot PROM and the Domain Shell
FIGURE 3-1 Flowchart of Power On and System Setup Steps
FIGURE 5-1 System With Domain Separation
FIGURE 10-1 Resetting the System Controller
FIGURE A-1 Sun Fire 6800 System PCI Physical Slot Designations for IB6 Through IB9
FIGURE A-2 Sun Fire 4810/4800 Systems PCI Physical Slot Designations for IB6 and IB8
FIGURE A-3 Sun Fire 3800 System 6-Slot CompactPCI Physical Slot Designations
FIGURE A-4 Sun Fire 4810/4800 Systems 4-Slot CompactPCI Physical Slot Designations
FIGURE A-5 Sun Fire 6800 System 4-Slot CompactPCI Physical Slot Designations for IB6 Through IB9
Tables
TABLE 1-1 Repeater Boards in the Sun Fire 6800/4810/4800/3800 Systems
TABLE 1-2 Maximum Number of Partitions and Domains Per System
TABLE 1-3 Board Name Descriptions
TABLE 1-4 Functions of System Controller Boards
TABLE 1-5 Serial Port and Ethernet Port Features on the System Controller Board
TABLE 1-6 Maximum Number of CPU/Memory Boards in Each System
TABLE 1-7 Maximum Number of I/O Assemblies and I/O Slots per I/O Assembly
TABLE 1-8 Configuring for I/O Redundancy
TABLE 1-9 Minimum and Maximum Number of Fan Trays
TABLE 1-10 Minimum and Redundant Power Supply Requirements
TABLE 1-11 Sun Fire 6800 System Components in Each Power Grid
TABLE 1-12 Repeater Board Assignments by Domains in the Sun Fire 6800 System
TABLE 1-13 Repeater Board Assignments by Domains in the Sun Fire 4810/4800/3800 Systems
TABLE 1-14 Sun Fire 6800 Domain and Repeater Board Configurations for Single- and Dual-Partitioned Systems
TABLE 1-15 Sun Fire 4810/4800/3800 Domain and Repeater Board Configurations for Single- and Dual-Partitioned Systems
TABLE 1-16 Results of setkeyswitch Settings During a Power Failure
TABLE 3-1 Services That Should Be Set Up Before System Power On
TABLE 3-2 Steps in Setting up Domains Including the dumpconfig Command
TABLE 4-1 Guidelines for Creating Three Domains on the Sun Fire 6800 System
TABLE 6-1 Displaying the Status of All Domains With the showplatform -p status Command
TABLE 6-2 Overview of Steps to Assign a Board To a Domain
TABLE 6-3 Overview of Steps to Unassign a Board From a Domain
TABLE 9-1 Repeater Boards and Domains
TABLE 10-1 OpenBoot PROM error-reset-recovery Configuration Variable Settings
TABLE 10-2 Solaris Operating Environment and System Controller Software Commands for Collecting Status Information
TABLE 10-3 Repeater Board Failure
TABLE 10-4 Blacklisting Component Names
TABLE A-1 CPU and Memory Agent ID Assignment
TABLE A-2 I/O Assembly Type and Number of Slots per I/O Assembly by System Type
TABLE A-3 Number and Name of I/O Assemblies per System
TABLE A-4 I/O Controller Agent ID Assignments
TABLE A-5 8-Slot PCI I/O Assembly Device Map for the Sun Fire 6800/4810/4800 Systems
TABLE A-6 Mapping Device Path to I/O Assembly Slot Numbers for Sun Fire 3800 Systems
TABLE A-7 Mapping Device Path to I/O Assembly Slot Numbers for Sun Fire 6800/4810/4800 Systems
Code Samples
CODE EXAMPLE 2-1 Obtaining the Platform Shell With telnet
CODE EXAMPLE 2-2 Obtaining a Domain Shell With telnet
CODE EXAMPLE 2-3 Obtaining a Domain Shell From the Domain Console
CODE EXAMPLE 2-4 Obtaining a Domain Shell From the Domain Console
CODE EXAMPLE 2-5 Obtaining a Domain Shell From the Domain Console
CODE EXAMPLE 2-6 Ending a tip Session
CODE EXAMPLE 3-1 password Command Example For a Domain With No Password Set
CODE EXAMPLE 3-2 Sample Boot Error Message When the auto-boot? Parameter Is Set To true
CODE EXAMPLE 6-1 showboards -a Example Before Assigning a Board to a Domain
CODE EXAMPLE 7-1 Messages Displayed During an Automatic Failover
CODE EXAMPLE 9-1 Confirming Board ID Information
CODE EXAMPLE 9-2 ID Information To Enter Manually
CODE EXAMPLE B-1 Locating the Port 80 Value in httpd.conf
CODE EXAMPLE B-2 Locating the ServerAdmin Value in httpd.conf
CODE EXAMPLE B-3 Locating the ServerName Value in httpd.conf
CODE EXAMPLE B-4 Starting Apache
Preface
This book provides an overview of the system and presents a step-by-step
description of common administration procedures. It explains how to configure and
manage the platform and domains. It also explains how to remove and replace
components and perform firmware upgrades. It contains information about security,
troubleshooting, and a glossary of technical terms.
How This Book Is Organized
Chapter 1 describes domains and the system controller. It provides an overview of
partitions and domains, redundant system components, and minimum system
configurations. This chapter also provides an overview of reliability, serviceability,
and availability.
Chapter 2 explains how to navigate between the platform and domain shells,
between the Solaris operating environment and the domain shell, or between the
OpenBoot PROM and the domain shell. This chapter also explains how to
terminate a system controller session.
Chapter 3 explains how to power on and set up the system for the first time.
Chapter 4 explains how to create and start multiple domains.
Chapter 5 presents information on security.
Chapter 6 explains how to power on and power off the system. It also explains how
to update firmware.
Chapter 7 describes how system controller failover works.
Chapter 8 describes how to test boards.
Chapter 9 describes the software steps necessary to remove and install a
CPU/Memory board, I/O assembly, CompactPCI card, PCI card, Repeater board,
System Controller board, and ID board/centerplane.
Chapter 10 provides troubleshooting information about LEDs, system faults, the
system controller loghost, and procedures such as displaying diagnostic information,
displaying system configuration information, recovering from a hung domain,
disabling components (blacklisting), and mapping device path names to physical
system devices.
Appendix A describes how to map device path names to physical system devices.
Appendix B describes how to set up an HTTP or FTP server.
Using UNIX Commands
This book assumes you are experienced with the UNIX® operating environment. If
you are not experienced with the UNIX operating environment, see one or more of
the following for this information:
■ Documentation for the Solaris operating environment, available on docs.sun.com (see “Accessing Sun Documentation Online” on page xviii)
■ Sun Hardware Platform Guide, which is available in both hard copy and online with your operating system release, describes Solaris operating environment information specific to the Sun Fire 6800/4810/4800/3800 systems.
■ Release Notes Supplement for Sun Hardware describes late-breaking information about the Solaris operating environment.
■ Other software documentation that you received with your system
Typographic Conventions
Typeface | Meaning | Examples
AaBbCc123 (monospace) | The names of commands, files, and directories; on-screen computer output | Edit your .login file. Use ls -a to list all files.
AaBbCc123 (bold monospace) | What you type, when contrasted with on-screen computer output | % su / Password:
AaBbCc123 (italic) | Book titles, new words or terms, words to be emphasized | Read Chapter 6 in the User’s Guide. These are called class options. You must be superuser to do this.
AaBbCc123 (italic) | Command-line variable; replace with a real name or value | To delete a file, type rm filename.
Shell Prompts
Shell | Prompt
C shell | machine_name%
C shell superuser | machine_name#
Bourne shell and Korn shell | $
Bourne shell and Korn shell superuser | #
Platform shell | schostname:SC>
Platform console | schostname:SC>
Domain shell | schostname:A>, schostname:B>, schostname:C>, or schostname:D>
Domain console | ok, login:, machine_name%, or machine_name#
Related Documentation
Type of Book | Title | Part Number
Overview | Sun Fire 6800/4810/4800/3800 Systems Overview Manual | 805-7362
Service | Sun Fire 6800/4810/4800/3800 Systems Service Manual | 805-7363
Service | Sun Fire 4810/4800/3800 System Cabinet Mounting Guide | 806-6781
System Controller | Sun Fire 6800/4810/4800/3800 System Controller Command Reference Manual | 816-2971
Release Notes | Sun Fire 6800/4810/4800/3800 Systems Software Release Notes | 816-2972
Solaris operating environment | Sun Hardware Platform Guide | Varies with release
Solaris operating environment | Release Notes Supplement for Sun Hardware | Varies with release
Accessing Sun Documentation Online
A broad selection of Sun system documentation is located at:
http://www.sun.com/products-n-solutions/hardware/docs
A complete set of Solaris documentation and many other titles are located at:
http://docs.sun.com
Sun Welcomes Your Comments
Sun is interested in improving its documentation and welcomes your comments and
suggestions. You can email your comments to Sun at:
docfeedback@sun.com
Please include the part number (816-2970-10) of your document in the subject line of
your email.
CHAPTER 1
Introduction
This chapter introduces the features of the Sun Fire 6800/4810/4800/3800 systems,
a family of mid-range servers. This chapter describes:
■ “Domains” on page 2
■ “System Components” on page 3
■ “Partitions” on page 3
■ “System Controller” on page 8
■ “Redundant Components and Minimum Configurations” on page 13
■ “Reliability, Availability, and Serviceability (RAS)” on page 20
■ “Sun Management Center Software for the Sun Fire 6800/4810/4800/3800 Systems Software” on page 25
■ “FrameManager” on page 25
The term platform, as used in this book, refers to the collection of resources such as
power supplies, the centerplane, and fans that are not for the exclusive use of a
domain.
A partition is a group of Repeater boards that are used together to provide
communication between CPU/Memory boards and I/O assemblies in the same
domain.
A domain runs its own instance of the Solaris operating environment and is
independent of other domains. Each domain has its own CPUs, memory, and I/O
assemblies. Hardware resources including fans and power supplies are shared
among domains, as necessary for proper operation.
The system controller is an embedded system on a board that connects into the
centerplane of these mid-range systems. You access the system controller using
either serial or Ethernet connections. It is the focal point for platform and domain
configuration and management and is used to connect to the domain consoles.
The system controller configures and monitors the other hardware in the system and
provides a command line interface that enables you to perform tasks needed to
configure the platform and each domain, plus many other functions. The system
controller also provides monitoring and configuration capability with SNMP for use
with the Sun Management Center software. For more information on the system
controller hardware and software, see “System Controller” on page 8 and “System
Controller Software” on page 10.
Domains
With this family of mid-range systems, you can group system boards (CPU/Memory
boards and I/O assemblies) into domains. Each domain can host its own instance of
the Solaris operating environment and is independent of other domains.
Domains include the following features:
■ Each domain is able to run the Solaris operating environment.
■ Domains do not interact with each other.
■ Each domain has its own peripheral and network connections.
■ Each domain is assigned its own unique host ID and hostname.
All systems are configured at the factory with one domain.
You create domains using either the system controller command line interface or the
Sun Management Center software for the Sun Fire 6800/4810/4800/3800 systems.
How to create domains using the system controller software is described in
“Creating and Starting Domains” on page 53. For instructions on how to create
domains using the Sun Management Center software for the Sun Fire
6800/4810/4800/3800 systems, refer to the Sun Management Center 3.0 Supplement for
Sun Fire 6800, 4810, 4800, and 3800 Systems.
The largest domain configuration consists of all CPU/Memory boards and I/O
assemblies in the system. The smallest domain configuration consists of one
CPU/Memory board and one I/O assembly.
An active domain must meet these requirements:
■ Minimum of one CPU/Memory board with memory
■ Minimum of one I/O assembly with one I/O card installed
■ Required number of Repeater boards (not assigned to a domain)
■ Minimum of one system controller for the system to work (system controllers are not assigned to a domain)
In addition, sufficient power and cooling are required. The power supplies and fan
trays are not assigned to a domain.
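The board-level requirements listed above can be expressed as a simple pre-check. The following is a minimal Python sketch, not part of the system controller software: the class names and the `activation_problems` function are hypothetical, the required Repeater board count is passed in as an assumption because it depends on the system and partition mode, and power and cooling are not modeled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CpuMemoryBoard:
    name: str        # board name, for example "SB0"
    memory_mb: int   # installed memory in MB; 0 means no memory


@dataclass
class IOAssembly:
    name: str             # assembly name, for example "IB6"
    cards_installed: int  # number of I/O cards in the assembly


@dataclass
class DomainConfig:
    cpu_memory_boards: List[CpuMemoryBoard] = field(default_factory=list)
    io_assemblies: List[IOAssembly] = field(default_factory=list)


def activation_problems(domain: DomainConfig, repeaters_present: int,
                        repeaters_required: int, system_controllers: int) -> List[str]:
    """Return the unmet board requirements; an empty list means none were found."""
    problems = []
    if not any(b.memory_mb > 0 for b in domain.cpu_memory_boards):
        problems.append("no CPU/Memory board with memory")
    if not any(io.cards_installed >= 1 for io in domain.io_assemblies):
        problems.append("no I/O assembly with an I/O card installed")
    # Repeater boards and system controllers are platform resources, not domain members.
    if repeaters_present < repeaters_required:
        problems.append("not enough Repeater boards")
    if system_controllers < 1:
        problems.append("no system controller")
    return problems
```

A domain built from SB0 (with memory) and IB6 (with one card), on a platform with its Repeater boards and one system controller present, yields an empty list.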
If you run more than one domain in a partition, then the domains are not completely
isolated. A failed Repeater board could affect all domains within the partition. For
more information, see “Repeater Boards” on page 18.
System Components
The system boards in each system consist of CPU/Memory boards and I/O
assemblies. The Sun Fire 6800/4810/4800 systems have Repeater boards (TABLE 1-1),
which provide communication between CPU/Memory boards and I/O assemblies.
TABLE 1-1 Repeater Boards in the Sun Fire 6800/4810/4800/3800 Systems
System | Number of Repeater Boards
Sun Fire 6800 system | 4 Repeater boards: RP0, RP1, RP2, RP3
Sun Fire 4810 system | 2 Repeater boards: RP0, RP2
Sun Fire 4800 system | 2 Repeater boards: RP0, RP2
Sun Fire 3800 system | The equivalent of two Repeater boards (RP0 and RP2) is built into the active centerplane.
For a system overview, including descriptions of the boards in the system, refer to
the Sun Fire 6800/4810/4800/3800 Systems Overview Manual.
Partitions
A partition is a group of Repeater boards that are used together to provide
communication between CPU/Memory boards and I/O assemblies. Depending on
the system configuration, each partition can be used by either one or two domains.
These systems can be configured to have one or two partitions. Partitioning is done
at the Repeater board level. In single-partition mode, one large partition uses all of
the Repeater boards. In dual-partition mode, the system is divided into two smaller
partitions, each using one-half of the total number of Repeater boards in the system.
For more information on Repeater boards, see “Repeater Boards” on page 18.
TABLE 1-2 lists the maximum number of partitions and domains each system can
have.
TABLE 1-2 Maximum Number of Partitions and Domains Per System
 | Sun Fire 6800 System | Sun Fire 4810/4800/3800 Systems
Number of partitions (1) | 1 or 2 | 1 or 2
Number of active domains in dual-partition mode | Up to 4 (A, B, C, D) | Up to 2 (A, C)
Number of active domains in single-partition mode | Up to 2 (A, B) | Up to 2 (A, B)
(1) The default is one partition.
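The limits in TABLE 1-2 can be captured as a lookup table. The following minimal Python sketch rejects a domain letter that the chosen system and partition mode do not allow; the names `ACTIVE_DOMAINS`, `system_group`, and `check_domains` are hypothetical, and the `partitions` argument defaults to 1 to mirror the default of one partition.

```python
from typing import Iterable, Tuple

# Domains that can be active, keyed by (system group, number of partitions), per TABLE 1-2.
ACTIVE_DOMAINS = {
    ("6800", 1): ("A", "B"),
    ("6800", 2): ("A", "B", "C", "D"),
    ("4810/4800/3800", 1): ("A", "B"),
    ("4810/4800/3800", 2): ("A", "C"),
}


def system_group(model: str) -> str:
    """Map a model number to its column in TABLE 1-2."""
    if model == "6800":
        return "6800"
    if model in ("4810", "4800", "3800"):
        return "4810/4800/3800"
    raise ValueError(f"unknown model: {model}")


def check_domains(model: str, domains: Iterable[str], partitions: int = 1) -> Tuple[str, ...]:
    """Raise ValueError if a requested domain cannot be active; otherwise return the allowed domains."""
    if partitions not in (1, 2):
        raise ValueError("these systems support one or two partitions")
    allowed = ACTIVE_DOMAINS[(system_group(model), partitions)]
    unavailable = sorted(set(domains) - set(allowed))
    if unavailable:
        raise ValueError(f"domains {unavailable} are not available; allowed: {allowed}")
    return allowed
```

For example, requesting domain B on a Sun Fire 4800 system in dual-partition mode raises an error, because only domains A and C are available in that mode.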
FIGURE 1-1 through FIGURE 1-6 show partitions and domains for the Sun Fire
6800/4810/4800/3800 systems. Unlike the other systems, the Sun Fire 3800 system has
no separately installed Repeater boards; instead, the equivalent of two Repeater
boards, RP0 and RP2, is integrated into its active centerplane.
In all of these systems, you can assign CPU/Memory boards and I/O assemblies to
any domain or partition. The configurations shown in the following illustrations are
examples only; your configuration may differ.
TABLE 1-3 describes the board names used in FIGURE 1-1 through FIGURE 1-6.
TABLE 1-3 Board Name Descriptions
Board Name | Description
SB0 - SB5 | CPU/Memory boards
IB6 - IB9 | I/O assemblies
RP0 - RP3 | Repeater boards
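The naming scheme in TABLE 1-3 is regular enough to classify a board name mechanically. The following minimal Python sketch (the `describe_board` function and `BOARD_TYPES` table are hypothetical) checks a name against the full ranges in the table; note that smaller systems populate only a subset, for example SB0, SB2, and SB4 with IB6 and IB8 on the Sun Fire 4810/4800 systems, which this sketch does not check.

```python
import re

# Prefix -> (description, valid board numbers), taken from TABLE 1-3.
BOARD_TYPES = {
    "SB": ("CPU/Memory board", range(0, 6)),   # SB0 - SB5
    "IB": ("I/O assembly", range(6, 10)),      # IB6 - IB9
    "RP": ("Repeater board", range(0, 4)),     # RP0 - RP3
}

_BOARD_RE = re.compile(r"^(SB|IB|RP)(\d)$")


def describe_board(name: str) -> str:
    """Return the board type for a name such as "SB3", or raise ValueError."""
    m = _BOARD_RE.match(name.strip().upper())
    if not m:
        raise ValueError(f"not a board name from TABLE 1-3: {name!r}")
    prefix, number = m.group(1), int(m.group(2))
    description, valid = BOARD_TYPES[prefix]
    if number not in valid:
        raise ValueError(f"{name} is outside {prefix}{valid.start} - {prefix}{valid.stop - 1}")
    return description
```

For example, `describe_board("IB8")` returns "I/O assembly", while "IB5" is rejected because I/O assembly numbering starts at 6.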
FIGURE 1-1 shows the Sun Fire 6800 system in single-partition mode. This system has
four Repeater boards that operate in pairs (RP0, RP2) and (RP1, RP3), six
CPU/Memory boards (SB0 - SB5), and four I/O assemblies (IB6 - IB9).
FIGURE 1-1 Sun Fire 6800 System in Single-Partition Mode (diagram: Partition 0 containing Domains A and B; Repeater boards RP0 - RP3; CPU/Memory boards SB0 - SB5; I/O assemblies IB6 - IB9)
FIGURE 1-2 shows the Sun Fire 6800 system in dual-partition mode. The same boards
and assemblies are shown as in FIGURE 1-1.
FIGURE 1-2 Sun Fire 6800 System in Dual-Partition Mode (diagram: Partitions 0 and 1 containing Domains A, B, C, and D; Repeater boards RP0 - RP3; CPU/Memory boards SB0 - SB5; I/O assemblies IB6 - IB9)
FIGURE 1-3 shows the Sun Fire 4810/4800 systems in single-partition mode. These
systems have two Repeater boards (RP0 and RP2) that operate separately (not in pairs
as in the Sun Fire 6800 system), three CPU/Memory boards (SB0, SB2, and SB4), and
two I/O assemblies (IB6 and IB8).
FIGURE 1-3 Sun Fire 4810/4800 Systems in Single-Partition Mode (diagram: Partition 0 containing Domains A and B; Repeater boards RP0 and RP2; CPU/Memory boards SB0, SB2, and SB4; I/O assemblies IB6 and IB8)
FIGURE 1-4 shows the Sun Fire 4810/4800 systems in dual-partition mode. The same
boards and assemblies are shown as in FIGURE 1-3.
FIGURE 1-4 Sun Fire 4810/4800 Systems in Dual-Partition Mode (diagram: Partition 0 containing Domain A and Partition 1 containing Domain C; Repeater boards RP0 and RP2; CPU/Memory boards SB0, SB2, and SB4; I/O assemblies IB6 and IB8)
FIGURE 1-5 shows the Sun Fire 3800 system in single-partition mode. This system has
the equivalent of two Repeater boards (RP0 and RP2) integrated into the active
centerplane, two CPU/Memory boards (SB0 and SB2), and two I/O assemblies
(IB6 and IB8).
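FIGURE 1-5 Sun Fire 3800 System in Single-Partition Mode (diagram: Partition 0 containing Domains A and B; RP0 and RP2 integrated into the centerplane; CPU/Memory boards SB0 and SB2; I/O assemblies IB6 and IB8)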
FIGURE 1-6 shows the Sun Fire 3800 system in dual-partition mode. The same boards
and assemblies are shown as in FIGURE 1-5. This system also has the equivalent of
two Repeater boards, RP0 and RP2, integrated into the active centerplane.
FIGURE 1-6 Sun Fire 3800 System in Dual-Partition Mode (diagram: Partition 0 containing Domain A and Partition 1 containing Domain C; RP0 and RP2 integrated into the centerplane; CPU/Memory boards SB0 and SB2; I/O assemblies IB6 and IB8)
System Controller
The system controller is an embedded system on a board that connects into the
centerplane of these mid-range systems. It is the focal point for platform and domain
configuration and management and is used to connect to the domain consoles.
System controller functions include:
■ Managing platform and domain resources
■ Monitoring the platform and domains
■ Configuring the domains and the platform
■ Providing access to the domain consoles
■ Providing the date and time to the Solaris operating environment
■ Providing the reference clock signal used throughout the system
■ Providing console security
■ Performing domain initialization
■ Providing a mechanism for upgrading firmware on the boards installed in the system
■ Providing an external management interface using SNMP
The system can support up to two System Controller boards (TABLE 1-4) that function
as the main and spare system controllers. This redundant configuration supports the
system controller (SC) failover mechanism, which automatically switches over from the
main SC to the spare if the main SC fails. For details on SC failover, see Chapter 7.
TABLE 1-4 Functions of System Controller Boards
System Controller | Function
Main | Manages all system resources. Configure your system to connect to the main System Controller board.
Spare | If the main system controller fails and a failover occurs, the spare assumes all system controller tasks formerly handled by the main system controller. The spare system controller functions as a hot standby and is used only as a backup for the main system controller.
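The main/spare relationship in TABLE 1-4 can be illustrated with a small state model. This is a minimal Python sketch and not SC firmware: the `SCPair` class and the board labels "SC0" and "SC1" are hypothetical, and the failover prerequisites and the ability to enable or disable failover (Chapter 7) are not modeled.

```python
from enum import Enum
from typing import Dict, Set


class Role(Enum):
    MAIN = "main"
    SPARE = "spare"


class SCPair:
    """Toy model of a main/spare system controller pair (not the SC firmware)."""

    def __init__(self, first: str, second: str) -> None:
        self.roles: Dict[str, Role] = {first: Role.MAIN, second: Role.SPARE}
        self.failed: Set[str] = set()

    @property
    def main(self) -> str:
        return next(sc for sc, role in self.roles.items() if role is Role.MAIN)

    def report_failure(self, sc: str) -> str:
        """Record a failed SC; if it was the main and the spare is healthy, swap roles."""
        self.failed.add(sc)
        if self.roles[sc] is Role.MAIN:
            spare = next(s for s, role in self.roles.items() if role is Role.SPARE)
            if spare not in self.failed:
                # The spare (hot standby) takes over all system controller tasks.
                self.roles[sc], self.roles[spare] = Role.SPARE, Role.MAIN
        return self.main
```

A failure of the spare leaves the main in place; a failure of the main promotes the spare only if the spare itself has not failed.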
Serial and Ethernet Ports
You can connect to the system controller console in two ways:
■ Serial port: Use the serial port to connect directly to an ASCII terminal or to a network terminal server (NTS).
■ Ethernet port: Use the Ethernet port to connect to the network.
For performance reasons, consider configuring the system controllers on a private
network. For details, refer to the article Sun Fire Midframe Server Best Practices for
Administration at
http://www.sun.com/blueprints
TABLE 1-5 describes the features of the serial port and the Ethernet port on the System
Controller board. The Ethernet port provides the fastest connection.
TABLE 1-5 Serial Port and Ethernet Port Features on the System Controller Board
Capability | Serial Port | Ethernet Port
Number of connections | One | Multiple
Connection speed | 9.6 Kbps | 10/100 Mbps
System logs | Remain in the system controller message queue | Remain in the system controller message queue and are written to the configured syslog host(s). See TABLE 3-1 for how to set up the loghosts for the platform shell and each domain shell. Setting up loghosts ensures that error messages are captured when a system failure occurs.
SNMP | Not supported | Supported
Firmware upgrades | No | Yes (using the flashupdate command)
Security | Secure physical location plus secure terminal server | Password-protected access only
System Controller Logical Connection Limits
The system controller supports one logical connection on the serial port and
multiple logical connections with telnet on the Ethernet port. Connections can be
set up for either the platform or one of the domains. Each domain can have only one
logical connection at a time.
System Controller Software
The sections that follow provide information on the system controller software,
including:
■ “Platform Administration” on page 10
■ “System Controller Tasks Completed at System Power-On” on page 11
■ “Domain Administration” on page 11
■ “Domain Keyswitch” on page 12
■ “Environmental Monitoring” on page 12
■ “Console Messages” on page 13
Platform Administration
The platform administration function manages resources and services that are
shared among the domains. With this function, you can determine how resources
and services are configured and shared.
Platform administration functions include:
■ Monitoring and controlling power to the components
■ Logically grouping hardware to create domains
■ Configuring the system controller’s network, loghost, and SNMP settings
■ Determining which domains may be used
■ Determining how many domains can be used (Sun Fire 6800 system only)
■ Configuring access control for CPU/Memory boards and I/O assemblies
Platform Shell
The platform shell is the operating environment for the platform administrator. Only
commands that pertain to platform administration are available. To connect to the
platform, see “To Obtain the Platform Shell Using telnet” on page 28 or “Obtaining
the Platform Shell” on page 28.
Platform Console
The platform console is the system controller serial port, where the system controller
boot messages and platform log messages are printed.
Note – The Solaris operating environment messages are displayed on the domain
console.
System Controller Tasks Completed at System Power-On
When you power on the system, the system controller boots the system controller
real time operating system and starts the system controller application.
If there was an interruption of power, additional tasks completed at system power-on include:
■ If a domain is active, the system controller turns on components needed to support the active domain (power supplies, fan trays, and Repeater boards) as well as the boards in the domain (CPU/Memory boards and I/O assemblies).
■ If no domains are active, only the system controller is powered on.
■ The system controller reboots any domains that were active when the system lost power.
Domain Administration
The domain administration function manages resources and services for a specific
domain.
Domain administration functions include:
■ Configuring the domain settings
■ Controlling the virtual keyswitch
■ Recovering from errors
For platform administration functions, see “Platform Administration” on page 10.
Domain Shell
The domain shell is the operating environment for the domain administrator and is
where domain tasks can be performed. There are four domain shells (A – D).
To connect to a domain, see “Obtaining a Domain Shell or Console” on page 30.
Domain Console
If the domain is active (the Solaris operating environment, the OpenBoot PROM, or POST is running in the domain), you can access the domain console. When you connect to the domain console, you are in one of the following modes of operation:
■ Solaris operating environment console
■ OpenBoot PROM
■ POST, where you can view the POST output while the domain runs POST
Maximum Number of Domains
The domains that are available vary with the system type and configuration. For
more information on the maximum number of domains you can have, see
“Partitions” on page 3.
Domain Keyswitch
Each domain has a virtual keyswitch with five positions that you can set: off (default), standby, on, diag, and secure. There are also several transitional keyswitch positions.
For information on keyswitch settings, see “Keyswitch Positions” on page 69. For a
description and syntax of the setkeyswitch command, refer to the Sun Fire
6800/4810/4800/3800 System Controller Command Reference Manual.
Environmental Monitoring
There are sensors throughout the system that monitor temperature, voltage, current,
and fan speed. The system controller periodically reads the values from each of these sensors. This information is maintained for display by the console commands and is also made available through SNMP.
When a sensor is generating values that are outside of the normal limits, the system
controller takes appropriate action. This includes shutting down components in the
system to prevent damage. Domains may be automatically shut down as a result. If
domains are shut down, be aware that an abrupt hardware shutdown occurs (it is
not a graceful shutdown of the Solaris operating environment).
Console Messages
The console messages generated by the system controller for the platform and for
each domain are printed on the appropriate console. The messages are stored in a
buffer on the system controller and can be logged to a syslog host. It is important to
note that these messages are not the Solaris operating environment console
messages.
For accountability and long-term storage, send the messages to a syslog host.
The system controller does not have permanent storage for console messages. Both the
platform and each domain have a small buffer that maintains some history.
However, this information is lost when the system is rebooted or the system
controller loses power.
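The loss of history described above can be pictured as a small, fixed-size buffer held only in memory. The following Python sketch is a model, not the system controller's implementation. The class name MessageBuffer and the capacity value are hypothetical, and so is the assumption that the oldest message is discarded when the buffer fills.

```python
from collections import deque


class MessageBuffer:
    """Hypothetical model of a small, volatile console message buffer."""

    def __init__(self, capacity: int = 5) -> None:
        # Assumption: a full buffer drops its oldest entry (deque maxlen does this)
        self._messages = deque(maxlen=capacity)

    def log(self, message: str) -> None:
        self._messages.append(message)

    def history(self) -> list:
        return list(self._messages)

    def power_loss(self) -> None:
        # No permanent storage: a reboot or power loss clears all history
        self._messages.clear()


buf = MessageBuffer(capacity=3)
for i in range(5):
    buf.log(f"message {i}")
print(buf.history())  # ['message 2', 'message 3', 'message 4']
buf.power_loss()
print(buf.history())  # []
```

In this model, a message that is not also forwarded to a syslog host is eventually lost, either because newer messages push it out or because a reboot clears the buffer.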
Redundant Components and Minimum Configurations
The Sun Fire 6800/4810/4800/3800 systems are designed to increase availability by
having redundant components. The sections that follow discuss the redundant
hardware that can be installed:
■ “Redundant System Controller Boards” on page 13
■ “CPU/Memory Boards” on page 14
■ “I/O Assemblies” on page 15
■ “Redundant Cooling” on page 16
■ “Redundant Power” on page 17
■ “Repeater Boards” on page 18
■ “Redundant System Clocks” on page 19
For troubleshooting tips to perform if a board or component fails, see “Board and
Component Failures” on page 112.
Redundant System Controller Boards
Sun Fire 6800/4810/4800/3800 systems support two System Controller boards,
which serve as the main and spare system controllers. The main system controller
performs all system tasks and manages system resources, while the spare system
controller is available to assume the function of the main system controller if the
main fails.
The SC failover software monitors the main and spare system controllers for
conditions that cause the main system controller to fail. If such failure conditions are
detected, the failover software causes a switchover of the main system controller to
the spare. For details on system controller failover, see Chapter 7.
CPU/Memory Boards
All systems support multiple CPU/Memory boards. Each domain must contain at
least one CPU/Memory board.
The maximum number of CPUs you can have on a CPU/Memory board is four.
CPU/Memory boards are configured with either two CPUs or four CPUs. TABLE 1-6
lists the maximum number of CPU/Memory boards for each system.
TABLE 1-6 Maximum Number of CPU/Memory Boards in Each System
System | Maximum Number of CPU/Memory Boards | Maximum Number of CPUs
Sun Fire 6800 system | 6 | 24
Sun Fire 4810 system | 3 | 12
Sun Fire 4800 system | 3 | 12
Sun Fire 3800 system | 2 | 8
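The Maximum Number of CPUs column in TABLE 1-6 follows from the board count, because a CPU/Memory board holds at most four CPUs. The following short Python check shows that arithmetic. The dictionary and function names are illustrative only.

```python
# Maximum number of CPU/Memory boards per system, from TABLE 1-6
MAX_BOARDS = {
    "Sun Fire 6800": 6,
    "Sun Fire 4810": 3,
    "Sun Fire 4800": 3,
    "Sun Fire 3800": 2,
}
MAX_CPUS_PER_BOARD = 4  # boards are configured with two or four CPUs


def max_cpus(system: str) -> int:
    """Every board slot filled with a four-CPU board."""
    return MAX_BOARDS[system] * MAX_CPUS_PER_BOARD


for name in MAX_BOARDS:
    print(name, max_cpus(name))
```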
Each CPU/Memory board has eight physical banks of memory. The CPU provides
memory management unit (MMU) support for two banks of memory. Each bank of
memory has four slots. The memory modules (DIMMs) must be populated in groups
of four to fill a bank. The minimum amount of memory needed to operate a domain
is one bank (four DIMMs).
A CPU can be installed and used without any memory in one of its banks. A
memory bank cannot be used unless the corresponding CPU is installed and
functioning. If a CPU is disabled, it is not functioning.
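The memory rules above can be expressed as a small validation routine. This Python sketch assumes that banks are numbered 0 to 7 and that bank b is served by CPU b // 2 (two banks per CPU). The manual does not give a numbering, so treat the mapping and the function name as hypothetical.

```python
BANKS_PER_BOARD = 8
BANKS_PER_CPU = 2    # each CPU provides MMU support for two banks
DIMMS_PER_BANK = 4   # DIMMs are installed in groups of four


def usable_banks(dimms_per_bank: list, working_cpus: set) -> list:
    """Return the indices of banks a domain can use on one CPU/Memory board.

    Hypothetical numbering: bank b is served by CPU b // BANKS_PER_CPU.
    """
    if len(dimms_per_bank) != BANKS_PER_BOARD:
        raise ValueError("a CPU/Memory board has eight banks")
    usable = []
    for bank, dimms in enumerate(dimms_per_bank):
        if dimms not in (0, DIMMS_PER_BANK):
            raise ValueError(f"bank {bank}: DIMMs must fill the bank in a group of four")
        cpu = bank // BANKS_PER_CPU
        # A bank is usable only if it is full and its CPU is installed and functioning
        if dimms == DIMMS_PER_BANK and cpu in working_cpus:
            usable.append(bank)
    return usable


# Banks 0 and 2 are full, but CPU 1 (serving banks 2 and 3) is disabled.
print(usable_banks([4, 0, 4, 0, 0, 0, 0, 0], working_cpus={0, 2, 3}))  # [0]
```

A domain needs at least one usable bank, so an empty result means that the board contributes no usable memory.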
Redundant CPUs and Memory
A failed CPU or faulty memory will be isolated from the domain by the power-on
self-test (POST).
You can operate a domain with as little as one CPU and one memory bank (four
memory modules).
I/O Assemblies
All systems support multiple I/O assemblies. For the types of I/O assemblies
supported by each system and other technical information, refer to the Sun Fire
6800/4810/4800/3800 Systems Overview Manual. TABLE 1-7 lists the maximum number
of I/O assemblies for each system.
TABLE 1-7 Maximum Number of I/O Assemblies and I/O Slots per I/O Assembly
System | Maximum Number of I/O Assemblies | Number of CompactPCI or PCI I/O Slots
Sun Fire 6800 system | 4 | 8 slots (6 slots for full-length PCI cards and 2 short slots for short PCI cards); 4 slots for CompactPCI cards
Sun Fire 4810 system | 2 | 8 slots (6 slots for full-length PCI cards and 2 short slots for short PCI cards); 4 slots for CompactPCI cards
Sun Fire 4800 system | 2 | 8 slots (6 slots for full-length PCI cards and 2 short slots for short PCI cards); 4 slots for CompactPCI cards
Sun Fire 3800 system | 2 | 6 slots for CompactPCI cards
Redundant I/O
There are two possible ways to configure redundant I/O (TABLE 1-8).
TABLE 1-8 Configuring for I/O Redundancy
Ways to Configure for I/O Redundancy | Description
Redundancy across I/O assemblies | You must have two I/O assemblies in a domain with duplicate cards in each I/O assembly that are connected to the same disk or network subsystem for path redundancy.
Redundancy within I/O assemblies | You must have duplicate cards in the I/O assembly that are connected to the same disk or network subsystem for path redundancy. This does not protect against the failure of the I/O assembly itself.
The network redundancy features use a part of the Solaris operating environment known as IP multipathing (IPMP). For information on IPMP, refer to the Solaris documentation supplied with the Solaris 8 or 9 operating environment release.
The Sun StorEdge™ Traffic Manager provides multipath disk configuration management, failover support, I/O load balancing, and single instance multipath support. For details, refer to the Sun StorEdge documentation available on the Sun Network Solutions website:
http://www.sun.com/storage/san
Redundant Cooling
All systems have redundant cooling when the maximum number of fan trays is installed. If one fan tray fails, the remaining fan trays automatically increase speed, which enables the system to continue to operate.
Caution – With the minimum number of fan trays installed, you do not have
redundant cooling.
With redundant cooling, you do not need to suspend system operation to replace a
failed fan tray. You can hot-swap a fan tray while the system is running, with no
interruption to the system.
TABLE 1-9 shows the minimum and maximum number of fan trays required to cool each system. For location information, such as the fan tray number, refer to the labels on the system and to the Sun Fire 6800/4810/4800/3800 Systems Service Manual.
TABLE 1-9 Minimum and Maximum Number of Fan Trays
System | Minimum Number of Fan Trays | Maximum Number of Fan Trays
Sun Fire 6800 system | 3 | 4
Sun Fire 4810 system | 2 | 3
Sun Fire 4800 system | 2 | 3
Sun Fire 3800 system | 3 | 4
Each system has comprehensive temperature monitoring to ensure that there is no
over-temperature stressing of components in the event of a cooling failure or high
ambient temperature. If there is a cooling failure, the speed of the remaining
operational fans increases. If necessary, the system is shut down.
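The cooling rules in this section reduce to comparing the number of working fan trays with the values in TABLE 1-9. The following minimal Python sketch shows the comparison. The status strings are illustrative labels, not system controller output.

```python
# Minimum and maximum number of fan trays, from TABLE 1-9
FAN_TRAYS = {
    "Sun Fire 6800": (3, 4),
    "Sun Fire 4810": (2, 3),
    "Sun Fire 4800": (2, 3),
    "Sun Fire 3800": (3, 4),
}


def cooling_status(system: str, working_trays: int) -> str:
    minimum, maximum = FAN_TRAYS[system]
    if working_trays > maximum:
        raise ValueError(f"{system} holds at most {maximum} fan trays")
    if working_trays < minimum:
        return "below minimum"
    if working_trays == minimum:
        # Still cooled, but the next fan tray failure is not covered
        return "not redundant"
    return "redundant"


print(cooling_status("Sun Fire 4810", 3))  # redundant
print(cooling_status("Sun Fire 4810", 2))  # not redundant
```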
Redundant Power
For power supplies to be redundant, you must have the required number of power supplies installed plus one additional redundant power supply for each power grid (referred to as the n+1 redundancy model). This means that, in each power grid, two power supplies are required for the system to function properly and the third power supply is redundant. All three power supplies draw about the same current.
The power is shared in the power grid. If one power supply in the power grid fails,
the remaining power supplies in the same power grid are capable of delivering the
maximum power required for the power grid.
If more than one power supply in a power grid fails, there will be insufficient power
to support a full load. For troubleshooting tips to perform when a power supply
fails, see “Power Supply Failure” on page 121.
The System Controller boards and the ID board obtain power from any power
supply in the system. Fan trays obtain power from either power grid.
TABLE 1-10 describes the minimum and redundant power supply requirements.
TABLE 1-10 Minimum and Redundant Power Supply Requirements
System | Number of Power Grids per System | Minimum Number of Power Supplies in Each Power Grid | Total Number of Supplies in Each Power Grid (Including Redundant Power Supplies)
Sun Fire 6800 system | 2 | 2 (grid 0), 2 (grid 1) | 3 (grid 0), 3 (grid 1)
Sun Fire 4810 system | 1 | 2 (grid 0) | 3
Sun Fire 4800 system | 1 | 2 (grid 0) | 3
Sun Fire 3800 system | 1 | 2 (grid 0) | 3
Each power grid has its own assigned power supplies. Power supplies ps0, ps1, and ps2 are assigned to power grid 0. Power supplies ps3, ps4, and ps5 are assigned to power grid 1. If one power grid, such as power grid 0, fails, the remaining power grid is still operational.
TABLE 1-11 lists the components in the Sun Fire 6800 system in each power grid. If
you have a Sun Fire 4810/4800/3800 system, refer to the components in grid 0, since
these systems have only power grid 0.
TABLE 1-11 Sun Fire 6800 System Components in Each Power Grid
Components in the System | Grid 0 | Grid 1
CPU/Memory boards | SB0, SB2, SB4 | SB1, SB3, SB5
I/O assemblies | IB6, IB8 | IB7, IB9
Power supplies | PS0, PS1, PS2 | PS3, PS4, PS5
Repeater boards | RP0, RP1 | RP2, RP3
Redundant Transfer Unit (RTU) | RTUR (rear) | RTUF (front)
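TABLE 1-11 can be read as a lookup from a failed grid to the components that stay powered. The following Python sketch applies to the Sun Fire 6800 system only and omits the RTUs. Whether a given domain keeps running also depends on the rest of its configuration, which the sketch does not model. The names are illustrative.

```python
# Sun Fire 6800 components in each power grid, from TABLE 1-11 (RTUs omitted)
GRID_COMPONENTS = {
    0: {"SB0", "SB2", "SB4", "IB6", "IB8", "PS0", "PS1", "PS2", "RP0", "RP1"},
    1: {"SB1", "SB3", "SB5", "IB7", "IB9", "PS3", "PS4", "PS5", "RP2", "RP3"},
}


def still_powered(failed_grid: int) -> set:
    """Components fed by the grids that did not fail."""
    surviving = [parts for grid, parts in GRID_COMPONENTS.items() if grid != failed_grid]
    return set().union(*surviving)


print(sorted(still_powered(0)))
```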
Repeater Boards
The Repeater board is a crossbar switch that connects multiple CPU/Memory boards
and I/O assemblies. Having the required number of Repeater boards is mandatory
for operation. There are Repeater boards in each mid-range system except for the
Sun Fire 3800. In the Sun Fire 3800 system, the equivalent of two Repeater boards are
integrated into the active centerplane. Repeater boards are not fully redundant.
For steps to perform if a Repeater board fails, see “Repeater Board Failure” on page 117. TABLE 1-12 lists the Repeater board assignments for each domain in the Sun Fire 6800 system.
TABLE 1-12 Repeater Board Assignments by Domains in the Sun Fire 6800 System
Partition Mode | Repeater Boards | Domains
Single partition | RP0, RP1, RP2, RP3 | A, B
Dual partition | RP0, RP1 | A, B
Dual partition | RP2, RP3 | C, D
TABLE 1-13 lists the Repeater board assignments for each domain in the Sun Fire 4810/4800/3800 systems.
TABLE 1-13 Repeater Board Assignments by Domains in the Sun Fire 4810/4800/3800 Systems
Partition Mode | Repeater Boards | Domains
Single partition | RP0, RP2 | A, B
Dual partition | RP0 | A
Dual partition | RP2 | C
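Because Repeater boards are not fully redundant, it is useful to know which domains share a given board. The following Python sketch encodes TABLE 1-12 and TABLE 1-13 as a lookup. The key names are illustrative. The sketch also assumes that, on the Sun Fire 3800 system, RP0 and RP2 refer to the two Repeater board equivalents integrated into the active centerplane.

```python
# Repeater boards used by each domain, from TABLE 1-12 and TABLE 1-13
REPEATERS = {
    ("6800", "single"): {"A": ["RP0", "RP1", "RP2", "RP3"],
                         "B": ["RP0", "RP1", "RP2", "RP3"]},
    ("6800", "dual"): {"A": ["RP0", "RP1"], "B": ["RP0", "RP1"],
                       "C": ["RP2", "RP3"], "D": ["RP2", "RP3"]},
    ("4810/4800/3800", "single"): {"A": ["RP0", "RP2"], "B": ["RP0", "RP2"]},
    ("4810/4800/3800", "dual"): {"A": ["RP0"], "C": ["RP2"]},
}


def domains_using(system: str, mode: str, repeater: str) -> list:
    """Domains that depend on a given Repeater board in a given partition mode."""
    table = REPEATERS[(system, mode)]
    return sorted(domain for domain, boards in table.items() if repeater in boards)


print(domains_using("6800", "dual", "RP2"))  # ['C', 'D']
```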
TABLE 1-14 lists the configurations for single-partition mode and dual-partition mode
for the Sun Fire 6800 system regarding Repeater boards and domains.
TABLE 1-14 Sun Fire 6800 Domain and Repeater Board Configurations for Single- and Dual-Partitioned Systems
Configuration | Repeater Boards | Domains
Sun Fire 6800 System in Single-Partition Mode | RP0, RP1, RP2, RP3 | Domain A, Domain B
Sun Fire 6800 System in Dual-Partition Mode | RP0, RP1 | Domain A, Domain B
Sun Fire 6800 System in Dual-Partition Mode | RP2, RP3 | Domain C, Domain D
TABLE 1-15 lists the configurations for single-partition mode and dual-partition mode
for the Sun Fire 4810/4800/3800 systems.
TABLE 1-15 Sun Fire 4810/4800/3800 Domain and Repeater Board Configurations for Single- and Dual-Partitioned Systems
Configuration | Repeater Boards | Domains
Sun Fire 4810/4800/3800 System in Single-Partition Mode | RP0, RP2 | Domain A, Domain B
Sun Fire 4810/4800/3800 System in Dual-Partition Mode | RP0 | Domain A
Sun Fire 4810/4800/3800 System in Dual-Partition Mode | RP2 | Domain C
Redundant System Clocks
The System Controller board provides redundant system clocks. For more
information on system clocks, see “System Controller Clock Failover” on page 21.
Reliability, Availability, and Serviceability (RAS)
Reliability, availability, and serviceability (RAS) are features of these mid-range systems. These features are described as follows:
■ Reliability is the probability that a system stays operational for a specified time period when operating under normal conditions. Reliability differs from availability in that reliability involves only system failure, whereas availability depends on both failure and recovery.
■ Availability, also known as average availability, is the percentage of time that a system is available to perform its functions correctly. Availability can be measured at the system level or in the context of the availability of a service to an end client. The “system availability” is likely to impose an upper limit on the availability of any products built on top of that system.
■ Serviceability measures the ease and effectiveness of maintenance and system repair for the product. There is no single well-defined metric, because serviceability can include both mean time to repair (MTTR) and diagnosability.
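Availability as defined above is a ratio of time. The following Python sketch shows the arithmetic. The MTBF / (MTBF + MTTR) form is the usual steady-state approximation rather than a definition from this manual. Combining availabilities by multiplication assumes that failures are independent. The function names are illustrative.

```python
def availability_percent(up_hours: float, down_hours: float) -> float:
    """Percentage of an observation period during which the system was available."""
    total = up_hours + down_hours
    if total <= 0:
        raise ValueError("the observation period must be positive")
    return 100.0 * up_hours / total


def steady_state_percent(mtbf_hours: float, mttr_hours: float) -> float:
    """Usual approximation: MTBF / (MTBF + MTTR), expressed as a percentage."""
    return availability_percent(mtbf_hours, mttr_hours)


def combined_percent(*parts: float) -> float:
    """Availability of a service that needs every part up (independent failures)."""
    result = 1.0
    for part in parts:
        result *= part / 100.0
    return 100.0 * result


system = availability_percent(990.0, 10.0)
service = combined_percent(system, 99.5)  # the service also depends on its own software
print(f"system {system:.2f}%, service {service:.2f}%")
```

Every factor in combined_percent is at most 100%. A service built on the system can therefore never be more available than the system itself, which is the upper limit described above.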
The following sections provide details on RAS. For more hardware-related
information on RAS, refer to the Sun Fire 6800/4810/4800/3800 Systems Service
Manual. For RAS features that involve the Solaris operating environment, refer to the
Sun Hardware Platform Guide.
Reliability
The software reliability features include:
■ POST
■ Disabling of Components
■ Environmental Monitoring
■ System Controller Clock Failover
The reliability features also improve system availability.
POST
The power-on self-test (POST) is part of powering on a domain. A board or component that fails POST is disabled. The domain then boots the Solaris operating environment using only the components that passed POST.
Disabling of Components
The system controller provides component-level status and user-controlled disabling
of components, which is also referred to as blacklisting. However, note that the
system controller does not actually maintain a blacklist file.
You can add a faulty component to a blacklist with the disablecomponent
command. Components in the blacklist will not be configured. You can remove a
component from the blacklist with the enablecomponent command.
The platform blacklists supersede the domain blacklists. For example, if a
component is disabled in the platform, it will always be disabled in all domains.
Blacklisting from the platform applies to all domains. Blacklisting in a domain
applies only to the current domain. If you disable a component in one domain and
then move the component to another domain, the component is not disabled. The
showcomponent command displays status information about the component,
including whether or not it has been disabled.
To enable a component that you previously disabled, you must enable it where you disabled it: in the domain(s) or from the platform.
For additional information on the types of components that can be blacklisted, see
“Disabling Components” on page 122.
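The precedence rules above can be modeled with two kinds of set: one for the platform and one for each domain. This Python sketch only illustrates the rules. The class and method names are hypothetical and do not reproduce the syntax of the disablecomponent, enablecomponent, or showcomponent commands. The component names are examples.

```python
from typing import Optional


class Blacklists:
    """Models platform and per-domain disabling; names are hypothetical."""

    def __init__(self) -> None:
        self.platform = set()
        self.domain = {d: set() for d in "ABCD"}

    def _target(self, domain: Optional[str]) -> set:
        # domain=None stands for a command issued from the platform shell
        return self.platform if domain is None else self.domain[domain]

    def disable(self, component: str, domain: Optional[str] = None) -> None:
        self._target(domain).add(component)

    def enable(self, component: str, domain: Optional[str] = None) -> None:
        # Enabling only clears the list it is issued against
        self._target(domain).discard(component)

    def is_disabled(self, component: str, domain: str) -> bool:
        # The platform list supersedes: it applies to every domain
        return component in self.platform or component in self.domain[domain]


lists = Blacklists()
lists.disable("SB2", domain="A")      # disabled in domain A only
print(lists.is_disabled("SB2", "A"))  # True
print(lists.is_disabled("SB2", "C"))  # False: moved to C, not disabled there
lists.disable("IB6")                  # disabled from the platform
print(all(lists.is_disabled("IB6", d) for d in "ABCD"))  # True
```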
Environmental Monitoring
The system controller monitors the system temperature, current, and voltage
sensors. The fans are also monitored to make sure they are functioning.
Environmental status is not provided to the Solaris operating environment—only the
need for an emergency shutdown. The environmental status is provided to the Sun
Management Center software with SNMP.
System Controller Clock Failover
Each system controller provides a system clock signal to each board in the system.
Each board automatically determines which clock source to use. Clock failover is the
ability to change the clock source from one system controller to another system
controller without affecting the active domains.
When a system controller is reset or rebooted, clock failover is temporarily disabled.
When the clock source is available again, clock failover is automatically enabled.
Availability
The software availability features include:
■ System Controller Failover Recovery
■ Unattended Domain Reboot
■ Unattended Power Failure Recovery
■ System Controller Reboot Recovery
System Controller Failover Recovery
Systems with redundant System Controller boards support the SC failover
capability. In a high-availability system controller configuration, the SC failover
mechanism triggers the switchover of the main SC to the spare if the main SC fails.
Within approximately five minutes, the spare SC becomes the main and takes over all system controller operations. For details on SC failover, see Chapter 7.
Unattended Domain Reboot
If the system controller detects a hardware error, the domain is rebooted. This
behavior is controlled by the reboot-on-error parameter of the setupdomain
command. This parameter, which is set to true by default, reboots the domain when
a hardware error is detected. If you set this parameter to false and the system
controller detects a hardware error, the domain is paused and it must be turned off,
then on again to recover. For details, see the setupdomain command in the Sun Fire
6800/4810/4800/3800 System Controller Command Reference Manual.
If the Solaris operating environment panics, the action taken depends on the type of
panic, the software configuration, and the hardware configuration. After the panic,
when POST runs, it disables any components that fail testing.
Unattended Power Failure Recovery
If there is a power outage, the system controller reconfigures active domains.
TABLE 1-16 describes domain actions that occur during or after a power failure when the keyswitch is:
■ Active (set to on, secure, or diag)
■ Inactive (set to off or standby)
■ Processing a keyswitch operation
TABLE 1-16 Results of setkeyswitch Settings During a Power Failure
If During a Power Failure the Keyswitch Is | This Action Occurs
on, secure, diag | The domain will be powered on after a power failure.
off, standby | The domain will not be restored after a power failure.
Processing a keyswitch operation, such as off to on, standby to on, or on to off | The domain will not be restored after a power failure.
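TABLE 1-16 amounts to a small decision rule. The following Python sketch applies that rule. The function name and the in_transition flag are hypothetical. The keyswitch position names come from the table.

```python
ACTIVE = {"on", "secure", "diag"}
INACTIVE = {"off", "standby"}


def restored_after_power_failure(position: str, in_transition: bool = False) -> bool:
    """Apply TABLE 1-16: is the domain powered on again after a power failure?

    in_transition means a keyswitch operation (for example, off to on) was
    still being processed when power was lost.
    """
    if in_transition:
        return False
    if position in ACTIVE:
        return True
    if position in INACTIVE:
        return False
    raise ValueError(f"not a settable keyswitch position: {position}")


for pos in ("on", "diag", "standby"):
    print(pos, restored_after_power_failure(pos))
print("off to on", restored_after_power_failure("on", in_transition=True))
```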
System Controller Reboot Recovery
The system controller can be rebooted. It then starts up and resumes management of the system. The reboot does not disturb any domains that are currently running the Solaris operating environment.
Serviceability
The software serviceability features promote the efficiency and timeliness of
providing routine as well as emergency service to these systems.
LEDs
All field-replaceable units (FRUs) that are accessible from outside the system have
LEDs that indicate their state. The system controller manages all the LEDs in the
system, with the exception of the power supply LEDs, which are managed by the
power supplies. For a discussion of LED functions, refer to the appropriate board or
device chapter of the Sun Fire 6800/4810/4800/3800 Systems Service Manual.
Nomenclature
The system controller, the Solaris operating environment, the power-on self-test
(POST), and the OpenBoot PROM error messages use FRU name identifiers that
match the physical labels in the system. The only exception is the OpenBoot PROM nomenclature for I/O devices, which uses the device path names described in Appendix A.
System Controller Error Logging
You can configure the system controller platform and domains to log errors by using
the syslog protocol to an external loghost. The system controller also has an
internal buffer where error messages are stored. You can display the system
controller logged events, stored in the system controller message buffer, by using the
showlogs command. There is one log for the platform and one log for each of the
four domains.
System Controller XIR Support
The system controller reset command enables you to recover from a hard hung
domain and extract a Solaris operating environment core file.
Dynamic Reconfiguration Software
Dynamic Reconfiguration (DR), which is provided as part of the Solaris operating
environment, enables you to safely add and remove CPU/Memory boards and I/O
assemblies while the system is still running. DR controls the software aspects of
dynamically changing the hardware used by a domain, with minimal disruption to
user processes running in the domain.
You can use DR to do the following:
■ Shorten the interruption of system applications while installing or removing a board
■ Disable a failing device by removing it from the logical configuration, before the failure can crash the operating system
■ Display the operational status of boards in a system
■ Initiate self-tests of a system board while the domain continues to run
■ Reconfigure a system while the system continues to run
■ Invoke hardware-specific functions of a board or a related attachment
The DR software uses the cfgadm command, which is a command line interface for
configuration administration. You can perform domain management DR tasks using
the system controller software. The DR agent also provides a remote interface to the
Sun Management Center software on Sun Fire 6800/4810/4800/3800 systems.
For complete information on DR, refer to the Sun Fire 6800, 4810, 4800, and 3800
Systems Dynamic Reconfiguration User Guide and also the Solaris documentation
included with the Solaris operating environment.
Sun Management Center Software for the Sun Fire 6800/4810/4800/3800 Systems
For information on the Sun Management Center software for the Sun Fire
6800/4810/4800/3800 systems, refer to the Sun Management Center 3.0 Supplement for
Sun Fire 6800, 4810, 4800, and 3800 Systems, which is available online.
FrameManager
The FrameManager is an LCD display that is located in the top right corner of the
Sun Fire system cabinet. For a description of its functions, refer to the
“FrameManager” chapter of the Sun Fire 6800/4810/4800/3800 Systems Service Manual.
CHAPTER 2
System Controller Navigation Procedures
This chapter explains step-by-step procedures with illustrations describing how to:
■ Connect to the platform and the domains
■ Navigate between the domain shell and the domain console
■ Terminate a system controller session
Topics covered in this chapter include:
■ “Connection to the System Controller” on page 28
  ■ “Obtaining the Platform Shell” on page 28
  ■ “Obtaining a Domain Shell or Console” on page 30
■ “System Controller Navigation” on page 32
  ■ “To Enter the Domain Console From the Domain Shell If the Domain Is Inactive” on page 35
  ■ “To Enter the Domain Shell From the Domain Console” on page 36
  ■ “To Get Back to the Domain Console From the Domain Shell” on page 36
  ■ “To Enter a Domain From the Platform Shell” on page 37
■ “Terminating Sessions” on page 37
  ■ “To Terminate an Ethernet Connection With telnet” on page 37
  ■ “To Terminate a Serial Connection With tip” on page 38
Connection to the System Controller
This section describes how to obtain the following:
■ The platform shell
■ A domain shell or console
There are two types of connections: telnet and serial. If you are using a telnet connection, configure the system controller network settings before using telnet. You can access the system controller main menu using either the telnet or the serial connection.
From the main menu, you can select either the platform shell or one of the domain
consoles.
■ If you select the platform, you always obtain a shell.
■ If you select a domain, you obtain the:
  ■ Domain console (if the domain is active)
  ■ Domain shell (if the domain is inactive)
You can also bypass the system controller main menu by making a telnet connection
to a specific port.
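The main menu behavior described above can be summarized as a small lookup. The following Python sketch uses the menu numbers from CODE EXAMPLE 2-1. The function name and return strings are illustrative. Here, "active" means that the domain keyswitch is set to on, diag, or secure.

```python
MENU = {"0": "platform", "1": "A", "2": "B", "3": "C", "4": "D"}


def main_menu_result(choice: str, domain_active: bool = False) -> str:
    """Where a main menu choice leads.

    domain_active is True when the domain keyswitch is on, diag, or secure.
    """
    target = MENU.get(choice)
    if target is None:
        raise ValueError("type 0 for the platform or 1-4 for domains A-D")
    if target == "platform":
        return "platform shell"  # selecting the platform always gives a shell
    kind = "console" if domain_active else "shell"
    return f"domain {target} {kind}"


print(main_menu_result("0"))                      # platform shell
print(main_menu_result("1", domain_active=True))  # domain A console
print(main_menu_result("3"))                      # domain C shell
```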
Obtaining the Platform Shell
This section describes how to obtain the platform shell.
▼ To Obtain the Platform Shell Using telnet
Before you use telnet, be sure to configure the network settings for the system
controllers.
1. Obtain the system controller main menu by typing telnet schostname (CODE EXAMPLE 2-1), where schostname is the system controller host name.
The system controller main menu is displayed. CODE EXAMPLE 2-1 shows how to enter the platform shell.
CODE EXAMPLE 2-1 Obtaining the Platform Shell With telnet
% telnet schostname
Trying xxx.xxx.xxx.xxx
Connected to schostname.
Escape character is '^]'.
System Controller 'schostname':
   Type 0 for Platform Shell
   Type 1 for domain A
   Type 2 for domain B
   Type 3 for domain C
   Type 4 for domain D
   Input: 0
Connected to Platform Shell
schostname:SC>
Note – schostname is the system controller host name.
2. Type 0 to enter the platform shell.
The system controller prompt, schostname:SC>, is displayed for the platform shell of
the main system controller. If you have a redundant SC configuration, the spare
system controller prompt is schostname:sc>.
▼ To Initiate a Serial Connection with tip
● At the machine prompt, type tip and the serial port to be used for the system
controller session.
machinename% tip port_name
connected
The system controller main menu is displayed.
▼ To Obtain the Platform Shell Using the Serial Port
1. Connect the system controller serial port to an ASCII terminal.
The system controller main menu is displayed.
2. From the main menu, type 0 to enter the platform shell.
Obtaining a Domain Shell or Console
This section describes the following:
■ “To Obtain the Domain Shell Using telnet” on page 30
■ “To Obtain the Domain Shell From the Domain Console” on page 32
▼ To Obtain the Domain Shell Using telnet
1. Obtain the system controller main menu by typing telnet schostname (CODE EXAMPLE 2-2), where schostname is the system controller host name.
The system controller main menu is displayed. CODE EXAMPLE 2-2 shows entering the shell for domain A.
CODE EXAMPLE 2-2 Obtaining a Domain Shell With telnet
% telnet schostname
Trying xxx.xxx.xxx.xxx
Connected to schostname.
Escape character is '^]'.
System Controller 'schostname':
   Type 0 for Platform Shell
   Type 1 for domain A
   Type 2 for domain B
   Type 3 for domain C
   Type 4 for domain D
   Input: 1
Connected to Domain A
Domain Shell for Domain A
schostname:A>
2. Enter a domain. Type 1, 2, 3, or 4 to enter the appropriate domain shell.
The system controller prompt for the domain shell you connected to is displayed.
CODE EXAMPLE 2-2 shows entering the shell for domain A, whose prompt is
schostname:A>.
3. If the domain is active (the domain keyswitch is set to on, diag, or secure, which means that you are running the Solaris operating environment, are in the OpenBoot PROM, or are running POST), perform the following steps:
a. Press and hold the CTRL key while pressing the ] key to get to the telnet> prompt.
b. At the telnet> prompt, type send break (CODE EXAMPLE 2-3).
CODE EXAMPLE 2-3
Obtaining a Domain Shell From the Domain Console
ok Ctrl-]
telnet> send break
▼ To Obtain the Domain Shell From the Domain Console
If the domain is active and the domain keyswitch is set to on, diag, or secure (you
are running the Solaris operating environment, are in the OpenBoot PROM, or are
running POST), perform the following steps:
1. Press and hold the CTRL key while pressing the ] key to get to the telnet> prompt.
2. At the telnet> prompt, type send break.
CODE EXAMPLE 2-4 shows obtaining the shell for domain A from the domain console.
Because the domain is active, you will not see a prompt.
CODE EXAMPLE 2-4
Obtaining a Domain Shell From the Domain Console
ok Ctrl-]
telnet> send break
System Controller Navigation
This section explains how to navigate between the following:
■ System controller platform shell
■ System controller domain console
■ System controller domain shell
To return to the originating shell, use the disconnect command. In a domain shell,
to connect to the domain console, use the resume command. To connect to a domain
shell from the platform shell, use the console command.
FIGURE 2-1 shows how to navigate between the platform shell, the domain shell, and the
domain console by using the console and disconnect commands. FIGURE 2-1 also
shows how to connect to both the domain shell and the platform shell from the
operating environment by using the telnet command.
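FIGURE 2-1 Navigating Between the Platform Shell and the Domain Shell
[Figure: From the operating environment, type telnet schostname 5000 to reach the platform shell, or type telnet schostname 500x to reach a domain shell. From the platform shell, type console domainID to reach a domain shell. Type disconnect in the domain shell to return to the platform shell, and type disconnect in the platform shell to end the session.]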
Note – You can also use the telnet command without the port number, as
described in CODE EXAMPLE 2-1 and CODE EXAMPLE 2-2.
where:
In the telnet command in FIGURE 2-1, 5000 is the port for the platform shell.
In 500x, x is:
■ 1 for domain A
■ 2 for domain B
■ 3 for domain C
■ 4 for domain D
In the console command, domainID is a, b, c, or d.
Note – By typing telnet schostname 5000 or telnet schostname 500x, you bypass the
system controller main menu and directly enter the platform shell, a domain shell, or a
domain console.
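FIGURE 2-1 and the main menu suggest a simple rule: the telnet port equals 5000 plus the main-menu selection. Selection 0 is the platform shell, and selections 1 through 4 are domains A through D. The following is a minimal Python sketch of that mapping, assuming the rule holds. The helper names sc_telnet_port and telnet_command are hypothetical and are not part of the system controller software.

```python
# Hypothetical helper: map a system controller target to its telnet port.
# Assumption: port = 5000 + main-menu selection (0 = platform shell,
# 1-4 = domains A-D), as FIGURE 2-1 and the main menu suggest.

BASE_PORT = 5000
DOMAIN_IDS = "abcd"


def sc_telnet_port(target: str) -> int:
    """Return the telnet port for 'platform' or a domain ID a-d."""
    target = target.strip().lower()
    if target == "platform":
        return BASE_PORT
    if len(target) == 1 and target in DOMAIN_IDS:
        # Domain A is menu item 1, so its port is 5001, and so on.
        return BASE_PORT + DOMAIN_IDS.index(target) + 1
    raise ValueError(f"unknown target: {target!r}")


def telnet_command(schostname: str, target: str) -> str:
    """Build the telnet command line that bypasses the main menu."""
    return f"telnet {schostname} {sc_telnet_port(target)}"


if __name__ == "__main__":
    print(telnet_command("schostname", "platform"))
    print(telnet_command("schostname", "C"))
```

For example, telnet_command("schostname", "C") builds telnet schostname 5003, which connects directly to domain C.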
FIGURE 2-2 illustrates how to navigate between the Solaris operating environment,
the OpenBoot PROM, and the domain shell. FIGURE 2-2 assumes that the Solaris
operating environment is running.
FIGURE 2-2 Navigating Between the Domain Shell, the OpenBoot PROM, and the Solaris Operating Environment
[Figure: From the Solaris operating environment (login: prompt), press CTRL ] and, at the telnet> prompt, type send break to reach the domain shell (schostname:domainID prompt). From the domain shell, type resume to return to the console, or type break to enter the OpenBoot PROM (ok prompt).]
Caution – In FIGURE 2-2, typing the break command in the domain shell suspends the
Solaris operating environment.
FIGURE 2-3 illustrates how to navigate between the OpenBoot PROM and the domain
shell. This figure assumes that the Solaris operating environment is not running.
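FIGURE 2-3 Navigating Between the OpenBoot PROM and the Domain Shell
[Figure: From the OpenBoot PROM (ok prompt), press CTRL ] and, at the telnet> prompt, type send break to reach the domain shell (schostname:domainID prompt). From the domain shell, type resume to return to the OpenBoot PROM.]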
When you connect to a domain, you are connected to the domain shell unless the
domain is active, in which case you are connected to the domain console. When
you connect to the console, you are connected to the Solaris operating
environment console, the OpenBoot PROM, or POST, depending on which of these is
currently executing.
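The navigation rules in this section behave like a small state machine. The following Python sketch models where a session ends up after each command, assuming the behavior described in the figures and procedures of this chapter. The ScSession class and its location names are hypothetical and are for illustration only. The sketch does not talk to a real system controller.

```python
from dataclasses import dataclass


@dataclass
class ScSession:
    """Hypothetical model of an SC connection, for illustration only."""

    location: str                # "platform_shell", "domain_shell", "domain_console", or "closed"
    domain_active: bool = False  # keyswitch on/diag/secure: Solaris, OpenBoot PROM, or POST running
    from_platform: bool = False  # True if the domain was entered with the console command

    def command(self, cmd: str) -> str:
        """Apply one navigation command and return the new location."""
        if self.location == "platform_shell":
            if cmd.startswith("console"):
                self.from_platform = True
                self.location = "domain_console" if self.domain_active else "domain_shell"
            elif cmd == "disconnect":
                self.location = "closed"
        elif self.location == "domain_shell":
            # resume on an inactive domain leaves you in the shell (the SC reports an error).
            if cmd == "resume" and self.domain_active:
                self.location = "domain_console"
            elif cmd == "setkeyswitch on":
                # Powers on the domain and switches to its console.
                self.domain_active = True
                self.location = "domain_console"
            elif cmd == "disconnect":
                self.location = "platform_shell" if self.from_platform else "closed"
        elif self.location == "domain_console":
            if cmd == "send break":  # typed at the telnet> prompt after CTRL-]
                self.location = "domain_shell"
        return self.location
```

Here is one possible sequence. Start in the platform shell and type console -d a for an inactive domain, which lands in the domain shell. Then setkeyswitch on moves to the console, send break returns to the shell, and two disconnect commands end the session.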
▼ To Enter the Domain Console From the Domain Shell If the Domain Is Inactive
● Type setkeyswitch on in the domain shell.
schostname:A> setkeyswitch on
The domain console is available only when the domain is active. To make the
domain active, you must turn the keyswitch on. You are then automatically switched
from the domain shell to the domain console.
This action powers on and initializes the domain. The domain will go through POST
and then the OpenBoot PROM. If the OpenBoot PROM auto-boot? parameter is
set to true, the Solaris operating environment will boot.
▼ To Enter the Domain Shell From the Domain Console
1. Press and hold the CTRL key while pressing the ] key to get to the telnet>
prompt (CODE EXAMPLE 2-5).
2. Type send break at the telnet> prompt.
CODE EXAMPLE 2-5
Obtaining a Domain Shell From the Domain Console
ok Ctrl-]
telnet> send break
▼ To Get Back to the Domain Console From the Domain Shell
1. Type resume:
schostname:D> resume
Note that because the domain is active, you will get a blank line.
2. Press the Return key to get a prompt.
Note – If the domain is not active (the Solaris operating environment or the
OpenBoot PROM is not running), the system controller stays in the domain shell and
you receive an error.
▼ To Enter a Domain From the Platform Shell
Note – This example shows entering an inactive domain.
● Type:
schostname:SC> console -d a
Connected to Domain A
Domain Shell for Domain A
schostname:A>
If the OpenBoot PROM is running, you are connected to the console for domain A. If
the keyswitch is set to off or standby, you are connected to the shell for domain A.
Note – To enter another domain, type the proper domainID b, c, or d.
Terminating Sessions
This section describes how to terminate system controller sessions.
▼ To Terminate an Ethernet Connection With telnet
● Type the disconnect command at the domain shell prompt.
Your system controller session terminates.
schostname:A> disconnect
Connection closed by foreign host.
machinename%
This example assumes that you are connected directly to the domain and not from
the platform shell.
Note – If you have a connection to the domain initiated on the platform shell, you
must type disconnect twice.
Typing disconnect the first time takes you back to the platform shell connection
and keeps your connection to the system controller. Typing disconnect again exits
the platform shell and ends your connection to the system controller.
▼ To Terminate a Serial Connection With tip
If you are connected to the System Controller board through the serial port, use the
disconnect command to terminate the system controller session, and then use a tip
command to terminate your tip session.
1. At the domain shell or platform shell prompt, type disconnect.
schostname:A> disconnect
2. If you are in a domain shell and are connected from the platform shell, type
disconnect again to disconnect from the system controller session.
schostname:SC> disconnect
The system controller main menu is displayed.
3. Type ~. to end your tip session (CODE EXAMPLE 2-6).
CODE EXAMPLE 2-6
Ending a tip Session
System Controller ‘schostname’:
Type 0 for Platform Shell
Type 1 for domain A
Type 2 for domain B
Type 3 for domain C
Type 4 for domain D
Input: ~.
machinename%
The machinename% prompt is displayed.
CHAPTER 3
System Power On and Setup
This chapter provides information to enable you to power on your system for the
first time and perform software setup procedures using the system controller
command line interface. For instructions on how to subsequently power on your
system, see “To Power On the System” on page 68.
Note – When you are setting up your system for the first time, it is strongly
suggested that you bring up the one domain set up for you, domain A, by installing
the Solaris operating environment in the domain and booting it before you create
additional domains.
Before you create additional domains, make sure that domain A is operational, can
be accessed from the main menu, and can boot the Solaris operating environment.
To create additional domains, see Chapter 4.
This chapter contains the following topics:
■ “Installing, Cabling, and Powering on the Hardware” on page 43
■ “Powering On the Power Grids” on page 45
■ “Setting Up the Platform” on page 46
■ “Setting Up Domain A” on page 48
■ “Saving the Current Configuration to a Server” on page 50
■ “Installing and Booting the Solaris Operating Environment” on page 51
FIGURE 3-1 is a flowchart summarizing the major steps you must perform to power
on and set up the system, which are explained in step-by-step procedures in this
chapter.
FIGURE 3-1 Flowchart of Power On and System Setup Steps
[Figure: The flowchart shows these steps in order:
1. Install and cable the hardware.
2. Set up services before powering on the hardware.
3. Power on the hardware and the power grid(s).
4. Set the date and time for the platform.
5. Set the password for the platform.
6. Set up platform-specific parameters with the setupplatform command.
7. Set the date and time for domain A.
8. Set the password for domain A.
9. Set up domain-specific parameters with the setupdomain command.
10. Have the platform administrator save the system configuration with the dumpconfig command.
11. Turn on the domain keyswitch.
12. If the Solaris operating environment is not preinstalled, install it.
13. Boot the Solaris operating environment.]
Installing, Cabling, and Powering on the Hardware
1. Install and cable the hardware.
See the installation guide for your system.
2. Connect a terminal to your system using the serial port.
Refer to the installation guide for your system.
3. When you set up the terminal, set the ASCII terminal to the same baud rate as the
serial port.
The default serial port settings for the System Controller board are:
■ 9600 baud
■ 8 data bits
■ No parity
■ 1 stop bit
Because this is the platform console connection, log messages are displayed.
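If your terminal is a UNIX host, you can express the default SC serial settings (9600 baud, 8 data bits, no parity, 1 stop bit) as arguments for the POSIX stty utility. The following Python sketch builds that argument list under this assumption. The SerialSettings class is hypothetical, and the sketch handles only the no-parity case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SerialSettings:
    """Hypothetical description of a serial line; defaults match the SC board."""

    baud: int = 9600
    data_bits: int = 8
    parity: str = "none"   # only "none" is handled in this sketch
    stop_bits: int = 1

    def stty_args(self) -> list[str]:
        """Translate the settings into POSIX stty operands."""
        if self.parity != "none":
            raise ValueError("this sketch only handles no parity")
        if self.data_bits not in (5, 6, 7, 8):
            raise ValueError("data bits must be 5 through 8")
        return [
            str(self.baud),                                   # line speed
            f"cs{self.data_bits}",                            # character size
            "-parenb",                                        # parity disabled
            "-cstopb" if self.stop_bits == 1 else "cstopb",   # 1 or 2 stop bits
        ]


SC_DEFAULTS = SerialSettings()
print("stty " + " ".join(SC_DEFAULTS.stty_args()))
```

With the defaults, this prints stty 9600 cs8 -parenb -cstopb. Which device you apply it to depends on your host.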
Setting Up Additional Services Before System Power On
● Before you power on the system for the first time, set up the services described in
TABLE 3-1.
TABLE 3-1 Services That Should Be Set Up Before System Power On

■ DNS services: The system controller uses DNS to simplify communication with other systems.
■ Sun Management Center 3.0 software*: Manage and monitor your system by using the Sun Management Center software for the Sun Fire 6800/4810/4800/3800 systems. It is suggested that you use this software to manage and monitor your system.
■ Network Terminal Server (NTS): A Network Terminal Server (NTS) is used to help manage multiple serial connections. The NTS should be secured with at least a password.
■ Boot/install server*: Allows you to install the Solaris operating environment from a network server instead of using a CD-ROM.
■ http/ftp server*: To perform firmware upgrades, you must set up either an http or an ftp server. To read and write the configuration backup files for the system controller dumpconfig and restoreconfig commands, you need to set up an ftp server.
■ Loghost*: The loghost system is used to collect system controller messages. To permanently save loghost error messages, you must set up a loghost server.
  • Use the setupplatform -p loghost command to output platform messages to the loghost.
  • Use the setupdomain -d loghost command to output domain messages to the loghost.
  There is a loghost for the platform and for each domain. For complete information and command syntax, refer to the Sun Fire 6800/4810/4800/3800 System Controller Command Reference Manual.
  For information on the Solaris operating environment loghost, including how to redirect error messages, refer to the Sun Hardware Platform Guide, which is available with your Solaris operating environment release.
■ System controller: If you plan to put the system controller(s) on a network, each installed system controller must have an IP address. Each system controller should also have a serial connection.
■ Domains: Each domain you plan to use needs its own IP address.

* It is not necessary to have the loghost set up before you install and boot the Solaris operating environment. You can install the Sun Management Center 3.0 software after you boot your system for the first time. Because you can install from a CD-ROM, it is not necessary to have a boot/install server set up before system power on.
Powering On the Hardware
● Complete the hardware power-on steps detailed and illustrated in the installation
guide for your system.
Powering On the Power Grids
1. Access the system controller and connect to the system controller main menu.
See “Connection to the System Controller” on page 28.
2. Connect to the platform shell.
3. Power on the power grid(s).
The poweron gridx command powers on the power supplies in that power grid.
■ If you have a Sun Fire 6800 system, you must power on power grid 0 and power grid 1.
schostname:SC> poweron grid0 grid1
■ If you have a Sun Fire 4810/4800/3800 system, there is only one power grid, grid 0.
schostname:SC> poweron grid0
The poweron grid0 system controller command powers on the power supplies in
power grid 0.
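The grid rule above depends only on the system model. The following Python sketch encodes it, using only the grid assignments stated in this step. The poweron_command helper is hypothetical. It only builds the text you would type at the schostname:SC> prompt.

```python
# Grids per model, as stated in this step: the 6800 has grid0 and grid1;
# the 4810, 4800, and 3800 have only grid0.
GRIDS_BY_MODEL = {
    "6800": ("grid0", "grid1"),
    "4810": ("grid0",),
    "4800": ("grid0",),
    "3800": ("grid0",),
}


def poweron_command(model: str) -> str:
    """Return the platform-shell poweron command for a Sun Fire model."""
    try:
        grids = GRIDS_BY_MODEL[model]
    except KeyError:
        raise ValueError(f"unsupported model: {model}") from None
    return "poweron " + " ".join(grids)


for model in ("6800", "3800"):
    print(f"{model}: schostname:SC> {poweron_command(model)}")
```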
Setting Up the Platform
After powering on the power grids, set up your system using the commands
described in this chapter.
This section contains the following topics:
■ To Set the Date and Time for the Platform
■ To Set a Password for the Platform
■ To Configure Platform Parameters
▼ To Set the Date and Time for the Platform
The platform and each of the four domains have separate and independent dates
and times.
Note – If your time zone area uses daylight saving or summer time, the time and time
zone are adjusted automatically. On the command line, you can enter only non-daylight
time zones.
● Set the date, time, and time zone for the platform by doing one of the following:
■ Use the setdate command from the platform shell.
For complete command syntax, examples, a table of time zone abbreviations, time
zone names, and offsets from Greenwich mean time, refer to the setdate
command in the Sun Fire 6800/4810/4800/3800 System Controller Command Reference
Manual.
If you have a redundant SC configuration, you must run the setdate command
on each system controller and set the same date and time for each SC. The
platform date and time must be the same on both the main and spare SC for
failover purposes.
■ Assign a Simple Network Time Protocol (SNTP) server through the
setupplatform command.
An SNTP server can synchronize the date and time between the main and spare
system controller. The platform date and time must be the same on both the main
and spare SC for failover purposes. To assign an SNTP server, use the
setupplatform command, which is described in the Sun Fire
6800/4810/4800/3800 System Controller Command Reference Manual.
Note – Although you can set a different date and time for the platform and for each
domain, it is strongly suggested that you use the same date and time for the
platform and each domain.
Using the same date and time for the platform shell and each domain shell may help
during interpretation of error messages and logs. The date and time set on the
domains is also used by the Solaris operating environment.
▼ To Set a Password for the Platform
The system controller password that you set for the main system controller also
applies to the spare system controller.
1. From the platform shell, type the system controller password command.
2. At the Enter new password: prompt, type in your password.
3. At the Enter new password again: prompt, type in your password again.
For examples, refer to the password command in the Sun Fire 6800/4810/4800/3800
System Controller Command Reference Manual.
▼ To Configure Platform Parameters
Note – One of the platform configuration parameters that can be set through the
setupplatform command is the partition parameter. Determine if you want to set
up your system with one partition or two partitions. Read “Domains” on page 2 and
“Partitions” on page 3 before completing the following steps.
1. From the platform shell, type setupplatform.
For a description of the setupplatform parameter values and an example of this
command, refer to the setupplatform command in the Sun Fire
6800/4810/4800/3800 System Controller Command Reference Manual.
schostname:SC> setupplatform
Note – If you press the Return key after each parameter, the current value will not
be changed. If you type a dash ( - ), this clears the entry (if the entry can be blank).
2. If you have a second System Controller board installed, run the setupplatform
command on the second system controller.
All of the parameters, except for the network settings (such as the IP address and
hostname of the system controller) and the POST diag level, are copied from the
main system controller to the spare.
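The Note in Step 1 describes how setupplatform treats each answer. Pressing Return keeps the current value, and a single dash clears the entry when the entry can be blank. The following Python sketch models that rule for one parameter. The apply_answer function is hypothetical. The manual does not say what happens when you type a dash for a parameter that cannot be blank, so this sketch simply rejects it.

```python
from typing import Optional


def apply_answer(current: Optional[str], answer: str, can_be_blank: bool) -> Optional[str]:
    """Return a parameter's new value after one setupplatform-style answer."""
    answer = answer.strip()
    if answer == "":
        return current          # Return only: value unchanged
    if answer == "-":
        if not can_be_blank:
            # Assumption: the sketch refuses to clear a required entry.
            raise ValueError("this parameter cannot be blank")
        return None             # dash: entry cleared
    return answer               # anything else replaces the value
```

For example, pressing Return at a loghost prompt keeps the existing loghost, and typing - clears it.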
Setting Up Domain A
To set up a domain, you must complete the following procedures:
■ “To Access the Domain” on page 48
■ “To Set the Date and Time for Domain A” on page 48
■ “To Set a Password for Domain A” on page 48
■ “To Configure Domain-Specific Parameters” on page 49
▼ To Access the Domain
● Access the domain.
For more information, see “System Controller Navigation” on page 32.
▼ To Set the Date and Time for Domain A
● Type the setdate command in the domain A shell to set the date and time for the
domain.
Note – Because you can have up to four domains, you must eventually set the date
and time for each domain. To start, just set the date and time for domain A.
For command syntax and examples, refer to the setdate command in the Sun Fire
6800/4810/4800/3800 System Controller Command Reference Manual and to “To Set the
Date and Time for the Platform” on page 46.
▼ To Set a Password for Domain A
1. From the domain A shell, type the password command (CODE EXAMPLE 3-1).
2. At the Enter new password: prompt, type your password.
3. At the Enter new password again: prompt, type your password again
(CODE EXAMPLE 3-1).
CODE EXAMPLE 3-1
password Command Example For a Domain With No Password Set
schostname:A> password
Enter new password:
Enter new password again:
schostname:A>
▼ To Configure Domain-Specific Parameters
Note – Each domain is configured separately.
1. From the domain A shell, type the setupdomain command.
For a listing of parameter values and sample output, refer to the setupdomain
command in the Sun Fire 6800/4810/4800/3800 System Controller Command Reference
Manual.
2. Perform the steps listed in TABLE 3-2.
TABLE 3-2 Steps in Setting Up Domains Including the dumpconfig Command

If you are setting up one domain:
1. Continue with the procedures in this chapter.

If you are setting up more than one domain:
1. Install and boot the Solaris operating environment on domain A as described in
“To Install and Boot the Solaris Operating Environment” on page 51.
2. Go to Chapter 4 to set up additional domains.
3. After all of the domains are set up and before you start each additional domain
you set up, have the platform administrator run the dumpconfig command. See
“To Use dumpconfig to Save Platform and Domain Configurations” on page 50.
Saving the Current Configuration to a Server
This section describes how to use the dumpconfig command, which must be run by
the platform administrator, to save the current system controller (SC) configuration
to a server. Use dumpconfig to save the SC configuration for recovery purposes.
Use the dumpconfig command when you:
■ First set up your system and need to save the platform and domain
configurations.
■ Change the platform and domain configurations with one of the following system
controller commands (setupdomain, setupplatform, setdate, addboard,
deleteboard, enablecomponent, disablecomponent, and password), or
install or remove a CPU/Memory board or I/O assembly.
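A platform administrator might keep a log of the SC commands run since the last backup. The following Python sketch checks such a log against the list above to decide whether dumpconfig should be run again. The needs_dumpconfig function and the idea of a command log are hypothetical. Only the command names come from this section.

```python
# Commands that, per this section, change the platform or domain configuration.
CONFIG_CHANGING_COMMANDS = frozenset({
    "setupdomain", "setupplatform", "setdate", "addboard",
    "deleteboard", "enablecomponent", "disablecomponent", "password",
})


def needs_dumpconfig(commands_run: list[str], hardware_changed: bool = False) -> bool:
    """True if a listed SC command was run or a board/assembly was added or removed."""
    # Keep only the command name (first word) of each non-blank log line.
    names = {cmd.split()[0] for cmd in commands_run if cmd.strip()}
    return hardware_changed or bool(names & CONFIG_CHANGING_COMMANDS)
```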
▼ To Use dumpconfig to Save Platform and Domain Configurations
Use dumpconfig to save the platform and domain configurations to a server so that
you can restore the platform and domain configurations to a replacement system
controller (if the current system controller fails).
Note – Do not save the configuration to a domain on this system that is running the
Solaris operating environment, because that domain will be unavailable when you need
to restore the system.
● Type the system controller dumpconfig command from the platform shell to save
the present system controller configuration to a server.
schostname:SC> dumpconfig -f url
For command syntax, a description, command output, and examples, refer to the
dumpconfig command in the Sun Fire 6800/4810/4800/3800 System Controller
Command Reference Manual.
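TABLE 3-1 states that the dumpconfig configuration files require an ftp server, and the Note above warns against saving to a domain on this system. The following Python sketch checks a URL against both points before you type the command. It assumes an ftp:// URL that names a host and a path. The exact URL forms that dumpconfig accepts are given in the Command Reference Manual, not here. The dumpconfig_command helper and the example host backupsrv are hypothetical.

```python
from urllib.parse import urlparse


def dumpconfig_command(url: str, domain_hosts: frozenset = frozenset()) -> str:
    """Return a dumpconfig command line after basic sanity checks."""
    parts = urlparse(url)
    if parts.scheme != "ftp":
        # TABLE 3-1: dumpconfig/restoreconfig files need an ftp server.
        raise ValueError("expected an ftp:// URL")
    if not parts.hostname or parts.path in ("", "/"):
        raise ValueError("URL must name a server and a path")
    if parts.hostname in domain_hosts:
        # Do not save to a domain on this system; it would be unavailable
        # when the configuration has to be restored.
        raise ValueError(f"{parts.hostname} is a domain on this system")
    return f"dumpconfig -f {url}"
```

For example, dumpconfig_command("ftp://backupsrv/sc-config") returns dumpconfig -f ftp://backupsrv/sc-config.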
Installing and Booting the Solaris Operating Environment
▼ To Install and Boot the Solaris Operating Environment
1. Access the domain A shell.
See “Obtaining a Domain Shell or Console” on page 30.
2. Turn the domain A keyswitch to the on position. Type setkeyswitch on.
The setkeyswitch on command powers on the domain. If the OpenBoot PROM
auto-boot? parameter is set to true, you might obtain an error message similar to
CODE EXAMPLE 3-2.
CODE EXAMPLE 3-2
Sample Boot Error Message When the auto-boot? Parameter Is Set To true
{0} ok boot
ERROR: Illegal Instruction
debugger entered.
{0} ok
The OpenBoot PROM displays this error message because the Solaris operating
environment might not yet be installed, or because you might be booting from the
wrong disk.
3. Insert the CD for the Solaris operating environment into the CD-ROM drive.
4. Install the Solaris operating environment on your system.
Refer to the Sun Hardware Platform Guide for your operating environment release. That
guide directs you to the installation guide you need.
5. Boot the Solaris operating environment by typing the OpenBoot PROM boot cdrom
command at the ok prompt.
ok boot cdrom
CHAPTER 4
Creating and Starting Multiple Domains
This chapter assumes that domain A, which was set up for you by Sun, is bootable.
This chapter explains how to create additional domains and how to start domains.
Note – The system is shipped from the factory configured with one domain, domain
A. All system boards are assigned to domain A.
Creating and Starting Domains
This section explains how to set up two or more domains.
Before Creating Multiple Domains
1. Determine how many domains you can have in your system and how many
partitions you need.
Read “Domains” on page 2 and “Partitions” on page 3. If you have a Sun Fire 6800
system and you are planning to set up three or four domains, you will need to set up
dual partition mode (two partitions). It may be helpful to maintain at least one
unused domain for testing hardware before dynamically reconfiguring it into the
system.
Note – For all systems, it is strongly suggested that you use dual partition mode to
support two domains. Using two partitions to support two domains provides better
isolation between domains.
2. Determine the number of boards and assemblies that will be in each domain.
A domain must contain a minimum of one CPU/Memory board and one I/O
assembly. However, it is suggested that you have at least two CPU/Memory boards
and I/O assemblies for high availability configurations. If you have a Sun Fire 6800
system, go to the next step.
3. If you have a Sun Fire 6800 system, complete this step. The Sun Fire 6800 system
has two power grids, grid 0 and grid 1. It is strongly suggested that you set up
boards in a domain to be in the same power grid in order to isolate the domain
from a power failure.
Read “Redundant Power” on page 17 to learn how boards are divided between grid
0 and grid 1.
4. If you need to configure two partitions, turn off all domains and complete the
following steps. Otherwise, skip to Step 5.
a. If the Solaris operating environment is running in the domain, complete Step a
through Step d of Step 3 in “To Power Off the System” on page 66, then return
to Step b of this procedure.
b. Configure the partition mode to dual.
Refer to the setupplatform command in the Sun Fire 6800/4810/4800/3800
System Controller Command Reference Manual.
5. If you do not need to configure two partitions and the board that you plan to
assign to a new domain is currently used by domain A, shut down domain A or
use DR to unconfigure and disconnect the board from the domain.
To shut down the domain, complete Step a through Step d of Step 3 in “To Power Off
the System” on page 66.
You can use the cfgadm command to remove the board from the domain without
shutting down the domain. Refer to the Sun Fire 6800, 4810, 4800, and 3800 Systems
Dynamic Reconfiguration User Guide.
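Step 2 gives a minimum and a suggested layout for each domain. The following Python sketch checks a planned board list against them. It uses the board names from “To Create A Second Domain” (sbx for CPU/Memory boards, ibx for I/O assemblies) and reads “two CPU/Memory boards and I/O assemblies” as two of each. That reading is an assumption. The check_domain_boards function is hypothetical.

```python
def check_domain_boards(boards: list[str]) -> str:
    """Classify a planned domain as 'invalid', 'minimal', or 'high-availability'."""
    cpu = [b for b in boards if b.startswith("sb")]  # CPU/Memory boards
    io = [b for b in boards if b.startswith("ib")]   # I/O assemblies
    if not cpu or not io:
        return "invalid"            # needs at least one of each
    if len(cpu) >= 2 and len(io) >= 2:
        return "high-availability"  # suggested minimum for HA (assumed two of each)
    return "minimal"
```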
▼ To Create A Second Domain
Note – It is strongly suggested that you use domain C with two partitions (dual
partition mode) for your second domain. It provides better fault isolation (complete
isolation of Repeater boards). With one partition, use domain B for the second
domain.
Note – The steps to create a second domain should be performed by the platform
administrator.
1. Complete all steps in “Before Creating Multiple Domains” on page 53.
2. If the boards you want to move are currently assigned to a domain, type the
deleteboard command from the platform shell to unassign them:
schostname:SC> deleteboard sbx ibx
where:
sbx is sb0 through sb5 (CPU/Memory boards)
ibx is ib6 through ib9 (I/O assemblies)
3. Assign the boards to the new domain with the addboard command.
■ If you have one partition, to add sbx and ibx to domain B, from the platform
shell, type:
schostname:SC> addboard -d b sbx ibx
■ If you have two partitions, to add sbx and ibx to domain C, from the platform
shell, type:
schostname:SC> addboard -d c sbx ibx
4. From the platform shell access the proper domain shell.
See “System Controller Navigation” on page 32.
5. Set the date and time for the domain.
You set the date and time for the second domain in exactly the same way you set the
date and time for domain A. For an example of the setdate command, refer to the
setdate command in the Sun Fire 6800/4810/4800/3800 System Controller Command
Reference Manual.
6. Set a password for the second domain.
You set the password for the second domain in exactly the same way you set the
password for domain A. For an example of the password command, refer to the
password command in the Sun Fire 6800/4810/4800/3800 System Controller Command
Reference Manual.
7. Configure domain-specific parameters for the new domain with setupdomain.
You configure domain-specific parameters for each domain separately. For more
details, tables, and code examples, refer to the setupdomain command in the Sun
Fire 6800/4810/4800/3800 System Controller Command Reference Manual.
8. After creating all domains, have the platform administrator save the state of the
configuration with the dumpconfig command.
For details on using dumpconfig, see the procedure “Saving the Current
Configuration to a Server” on page 50.
9. Start each domain after all domains have been created.
Go to “To Start the Domain” on page 57.
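Steps 2 and 3 combine a board-name check with a choice of target domain. The target is domain B with one partition and domain C with two partitions. The following Python sketch builds both platform-shell commands from those rules. It validates names only against the ranges listed in Step 2, and a given model may have fewer slots. The helper names are hypothetical, and deleteboard applies only when the boards are currently assigned.

```python
import re

# Ranges from Step 2: sb0-sb5 (CPU/Memory boards), ib6-ib9 (I/O assemblies).
BOARD_RE = re.compile(r"sb[0-5]|ib[6-9]")


def second_domain_id(partitions: int) -> str:
    """Domain C in dual partition mode, domain B with a single partition."""
    if partitions == 2:
        return "c"
    if partitions == 1:
        return "b"
    raise ValueError("partitions must be 1 or 2")


def move_boards_commands(boards: list[str], partitions: int) -> list[str]:
    """Return the deleteboard/addboard lines to type at schostname:SC>."""
    bad = [b for b in boards if not BOARD_RE.fullmatch(b)]
    if bad:
        raise ValueError(f"not a valid board name: {', '.join(bad)}")
    names = " ".join(boards)
    return [
        f"deleteboard {names}",  # only needed if the boards are assigned
        f"addboard -d {second_domain_id(partitions)} {names}",
    ]
```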
Special Considerations When Creating a Third Domain on the Sun Fire 6800 System
You create three domains in exactly the same way that you create two domains.
Follow these steps:
1. If the platform is configured as a single partition, halt the Solaris operating
environment for all active domains before changing partition mode.
Complete Step 3 in “To Power Off the System” on page 66.
2. Configure partition mode to dual with the setupplatform command.
3. Decide which domain needs higher performance. Plan to assign the third domain
to the partition that requires the lowest performance.
TABLE 4-1 provides some best-practice guidelines to follow.
TABLE 4-1 Guidelines for Creating Three Domains on the Sun Fire 6800 System

■ If domain A requires higher performance and more hardware isolation, use domain IDs A, C, D.
■ If domain C requires higher performance and more hardware isolation, use domain IDs A, B, C.

On the Sun Fire 4810/4800/3800 systems, setting the partition mode to dual moves the MAC
address and the host ID from domain B to domain C. Use showplatform -p mac to view the settings.
4. Perform all steps in the procedure “To Create A Second Domain” on page 55 to
create the third domain.
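TABLE 4-1 reduces to a lookup keyed by the domain that needs higher performance. The following Python sketch captures it. The explanation in the comment assumes that, in dual partition mode, one partition holds domains A and B and the other holds C and D, so each layout leaves the priority domain alone in its partition. TABLE 4-1 itself does not state this. The three_domain_ids function is hypothetical.

```python
# Domain IDs from TABLE 4-1, keyed by the domain that needs higher
# performance and more hardware isolation. Assumption (not stated in the
# table): partitions hold {A, B} and {C, D}, so each layout leaves the
# priority domain alone in its partition.
THREE_DOMAIN_LAYOUTS = {
    "A": ("A", "C", "D"),
    "C": ("A", "B", "C"),
}


def three_domain_ids(priority_domain: str) -> tuple[str, str, str]:
    """Return the suggested domain IDs for a three-domain Sun Fire 6800."""
    try:
        return THREE_DOMAIN_LAYOUTS[priority_domain.upper()]
    except KeyError:
        raise ValueError("TABLE 4-1 covers only domain A or domain C") from None
```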
▼ To Start the Domain
1. Connect to the domain shell for the domain you want to start.
See “System Controller Navigation” on page 32.
2. Turn the keyswitch on.
schostname:C> setkeyswitch on
The OpenBoot PROM prompt is displayed.
3. Install and boot the Solaris operating environment in the domain.
Refer to the Sun Hardware Platform Guide, which is available with your operating
environment release.
CHAPTER 5
Security
This chapter lists the major security threats, provides important information about
the system controller, explains password requirements for the platform and the
domains, describes domain separation requirements, explains how to secure the
system controller with the setkeyswitch command, provides references to Solaris
operating environment security, and briefly describes SNMP.
This chapter contains the following topics:
■ “Security Threats” on page 59
■ “System Controller Security” on page 60
■ “Domains” on page 62
■ “Solaris Operating Environment Security” on page 64
■ “SNMP” on page 64
Security Threats
Some of the host break-in threats are:
■ Unauthorized system controller access
■ Unauthorized domain access
■ Unauthorized administrator workstation access
■ Unauthorized user workstation access
Caution – Anyone with access to the system controller can shut down all or part of
the system, including active domains running the Solaris operating environment, and
can change the hardware and software configuration.
System Controller Security
To secure the system controller in your system, read about system controller
security issues, which have a great impact on the security of the system controller
installation. Refer to the articles available online, including Securing the Sun Fire
Midframe System Controller, at:
http://www.sun.com/blueprints
The software setup procedures in Chapter 3 include the tasks needed to set up system
controller security. The basic steps needed to secure the system controller are:
1. Set the platform shell password using the password command.
2. Set up the platform-specific parameters using the setupplatform command.
The setupplatform parameters that affect system controller security include those
that configure:
■ Network settings
■ Loghost for the platform
■ SNMP community strings
■ Access Control List (ACL) for hardware
■ Time out period for telnet and serial port connections
3. Set the domain shell password for all domains using the password command.
4. Set the domain-specific parameters using setupdomain.
The setupdomain parameters that affect system controller security include those
that configure:
■ Loghost for each domain
■ SNMP for each domain (Public and Private Community Strings)
5. Save the current configuration of the system using dumpconfig.
This list of parameters is only a partial list of what you need to set up. For step-by-step software procedures, see Chapter 3.
setupplatform and setupdomain Parameter Settings
For technical information on the setupplatform and setupdomain settings
involving system controller security, see the system controller commands in the Sun
Fire 6800/4810/4800/3800 System Controller Command Reference Manual. Also refer to
the articles available online. See “System Controller Security” on page 60 for the
URL.
Changing Passwords for the Platform and the Domain
Note – Make sure that you know who has access to the system controller. Anyone
who has that access can control the system.
Rules for Setting Passwords
Follow these rules when you set passwords:
■ When you set up your system for the first time, set the platform password and a different domain password for each domain (even if the domain is not used) to increase isolation between domains.
■ Continue to change the platform and domain passwords on a regular basis.
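You can express the password rules above as a small audit check. The Python sketch below is a hypothetical helper and not a system controller feature. It assumes your site keeps its own record of the platform and domain shell passwords, for example in a password vault. It reports any password that is unset or shared between the platform and a domain, or between two domains. The function name check_passwords and its arguments are illustrative.

```python
from typing import Dict, List, Optional


def check_passwords(platform_pw: Optional[str],
                    domain_pws: Dict[str, Optional[str]]) -> List[str]:
    """Report violations of the password rules (hypothetical audit helper).

    platform_pw -- platform shell password; None or "" means "not set"
    domain_pws  -- maps each domain ID (for example "A") to its domain
                   shell password; include unused domains as well
    Returns a list of problems; an empty list means the rules are met.
    """
    problems: List[str] = []
    owners = [("platform", platform_pw)] + [
        (f"domain {dom}", pw) for dom, pw in sorted(domain_pws.items())
    ]
    seen: Dict[str, str] = {}  # password -> first owner that used it
    for owner, pw in owners:
        if not pw:
            problems.append(f"{owner} password is not set")
        elif pw in seen:
            problems.append(f"{owner} reuses the password of {seen[pw]}")
        else:
            seen[pw] = owner
    return problems
```

Treating an unused domain with no password as a violation mirrors the rule that every domain gets its own password, whether or not the domain is in use.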
Domains
This section discusses domain separation and the setkeyswitch command.
Domain Separation
The domain separation requirement is based on allocating computing resources to a
specific domain. These mid-range systems enforce domain separation, which
prevents users of one domain, who only have access to the Solaris operating
environment running in that domain, from accessing or modifying the data of
another domain.
This security policy enforcement is performed by the software (FIGURE 5-1). In this
figure, a domain user is a person who is using the Solaris operating environment
and does not have access to the system controller. The domain administrator is
responsible for:
■ Configuring the domain
■ Maintaining domain operations
■ Overseeing the domain
As FIGURE 5-1 shows, the domain administrator has access to the domain console and domain shell of the domain for which that administrator is responsible. The platform administrator has access to the platform shell and the platform console. If the platform administrator knows the domain passwords, the platform administrator also has access to the domain shells and consoles. For this reason, always set the domain shell password for each domain.
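The access paths in FIGURE 5-1 can be summarized as a simple predicate. The following is a simplified, hypothetical Python model and not part of any Sun software. The role and target strings are made up for illustration, and "domain-shell" stands for "domain shell or console access" as labeled in the figure. It shows why an unset domain password opens a path from the platform administrator into that domain.

```python
def can_access(role: str, target: str,
               knows_domain_password: bool = False,
               domain_password_set: bool = True) -> bool:
    """Simplified model of the access paths in FIGURE 5-1.

    role   -- "platform-admin", "domain-admin:A", or "domain-user:A"
    target -- "platform-shell", "domain-shell:A" (shell or console),
              or "solaris:A" (Solaris operating environment access)
    """
    kind, _, role_dom = role.partition(":")
    what, _, target_dom = target.partition(":")
    if kind == "platform-admin":
        if what == "platform-shell":
            return True
        if what == "domain-shell":
            # Reaches a domain shell or console only if the domain password
            # is known (or reset) or was never set.
            return knows_domain_password or not domain_password_set
        return False
    if kind == "domain-admin":
        # Shell or console of the administrator's own domain only.
        return what == "domain-shell" and target_dom == role_dom
    if kind == "domain-user":
        # Solaris access to the user's own domain; no system controller access.
        return what == "solaris" and target_dom == role_dom
    return False
```

For example, can_access("platform-admin", "domain-shell:A", domain_password_set=False) returns True. Setting every domain password removes that path.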
The following are security items to consider in each domain:
■ Make sure that all passwords are within acceptable security guidelines. For example, each domain and the platform should have a unique password.
■ Change your passwords for the platform and each domain shell on a regular basis.
■ Scrutinize log files on a regular basis for any irregularities. For more information on these log files, refer to the Sun Hardware Platform Guide for the operating environment installed on your system.
FIGURE 5-1 System With Domain Separation
[Diagram: Domain A users and Domain B users have Solaris operating environment access to their own domain. The Domain A administrator and the Domain B administrator each have shell or console access to their own domain. The platform administrator has platform shell or console access, and has domain shell or console access only if the platform administrator knows or resets the domain passwords, or if the domain passwords are not set.]
setkeyswitch Command
The Sun Fire 6800/4810/4800/3800 systems do not have a physical keyswitch. You
set the virtual keyswitch in each domain shell with the setkeyswitch command.
To secure a running domain, set the domain keyswitch to the secure setting. For more information about setkeyswitch, refer to the article Securing the Sun Fire Midframe System Controller, available online at
http://www.sun.com/blueprints
Setting the keyswitch to secure imposes the following restrictions:
■ You cannot perform flashupdate operations on CPU/Memory boards or I/O assemblies. Only an administrator who has platform shell access on the system controller should perform flashupdate operations on these boards.
■ The domain ignores break and reset commands from the system controller. This is an excellent security precaution, and it also ensures that accidentally typing a break or reset command will not halt a running domain.
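The two secure-mode restrictions amount to a filter on system controller operations aimed at the domain. The Python sketch below is illustrative only. The operation names "flashupdate", "break", and "reset" are shorthand labels chosen for this example; the sketch does not describe how the firmware implements the check.

```python
# Hypothetical operation names standing for the restrictions above:
# flashupdate of CPU/Memory boards or I/O assemblies, and break/reset
# commands sent from the system controller to the domain.
SECURE_BLOCKED = frozenset({"flashupdate", "break", "reset"})

KEYSWITCH_POSITIONS = ("off", "standby", "on", "diag", "secure")


def permitted(operation: str, keyswitch: str) -> bool:
    """Return True if the SC-side operation would take effect on the domain."""
    if keyswitch not in KEYSWITCH_POSITIONS:
        raise ValueError(f"unknown keyswitch position: {keyswitch!r}")
    # Only the secure position blocks these operations.
    return not (keyswitch == "secure" and operation in SECURE_BLOCKED)
```

With the keyswitch at on, permitted("break", "on") is True. At secure, the same accidental break is refused.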
Solaris Operating Environment Security
For information on securing the Solaris operating environment, refer to the
following books and articles:
■ SunSHIELD Basic Security Module Guide (Solaris 8 System Administrator Collection)
■ Solaris 8 System Administration Supplement or System Administration Guide: Security Services in the Solaris 9 System Administrator Collection
■ Solaris security toolkit articles available online at http://www.sun.com/blueprints
SNMP
The system controller uses SNMPv1, which is an insecure protocol. Therefore, keep SNMPv1 traffic on a private network, as described in the article Securing the Sun Fire Midframe System Controller, available online at
http://www.sun.com/blueprints
CHAPTER 6
Maintenance
This chapter explains how to perform the following procedures:
■ “Powering Off and On the System” on page 65
■ “Keyswitch Positions” on page 69
■ “Shutting Down Domains” on page 70
■ “Assigning and Unassigning Boards” on page 71
■ “Upgrading the Firmware” on page 75
■ “Saving and Restoring Configurations” on page 76
Powering Off and On the System
To power off the system, you must halt the Solaris operating environment in each
domain and power off each domain.
Note – Before you begin this procedure, make sure you have the following books:
■ Sun Fire 6800/4810/4800/3800 Systems Service Manual
■ Sun Hardware Platform Guide (available with your version of the Solaris operating environment)
Also, if you have a redundant system controller configuration, review “Conditions That Affect Your SC Failover Configuration” on page 77 before you power cycle your system.
Powering Off the System
When you power off the system, power off all of the active domains. Then power off
the power grid(s). The last step is to power off the hardware.
▼ To Power Off the System
1. Connect to the platform shell.
See “System Controller Navigation” on page 32.
2. Display the status of all domains. Type the following from the platform shell:
TABLE 6-1 Displaying the Status of All Domains With the showplatform -p status Command
schostname:SC> showplatform -p status
Domain  Solaris Nodename  Domain Status     Keyswitch
------  ----------------  -------------     ---------
A       nodename-a        Active - Solaris  on
B                         Powered Off       off
C                         Powered Off       standby
D                         Powered Off       standby
schostname:SC>
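If you script step 3, you first need the list of active domains from this output. The following Python sketch is a hypothetical parser that assumes the column layout shown in TABLE 6-1. It takes the first field of each row as the domain ID and the last field as the keyswitch position. A domain counts as active when its keyswitch is on, diag, or secure. The Solaris Nodename column can be blank, so the parser does not rely on fixed column counts.

```python
from typing import List

ACTIVE_POSITIONS = {"on", "diag", "secure"}

# Sample output copied from TABLE 6-1 (nodename column blank for B-D).
SAMPLE = """\
schostname:SC> showplatform -p status
Domain  Solaris Nodename  Domain Status     Keyswitch
------  ----------------  -------------     ---------
A       nodename-a        Active - Solaris  on
B                         Powered Off       off
C                         Powered Off       standby
D                         Powered Off       standby
schostname:SC>
"""


def active_domains(output: str) -> List[str]:
    """Return the IDs of domains whose keyswitch is in an active position."""
    domains: List[str] = []
    rows_started = False
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith("---"):
            rows_started = True  # data rows follow the dashed separator
            continue
        # Skip the trailing prompt line (it ends with ">").
        if rows_started and not fields[-1].endswith(">"):
            domain, keyswitch = fields[0], fields[-1]
            if keyswitch in ACTIVE_POSITIONS:
                domains.append(domain)
    return domains
```

For the sample above, active_domains(SAMPLE) returns ["A"], the only domain that needs substeps a through e.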
3. Complete these substeps for each active domain.
These substeps include halting the Solaris operating environment in each domain,
turning off the domain keyswitch, and disconnecting from the session.
a. Enter the console of the domain you want to power off.
See “Obtaining a Domain Shell or Console” on page 30.
b. If the Solaris operating environment is running, log in as root and halt the
operating environment.
Refer to the Sun Hardware Platform Guide, which is available with your Solaris
operating environment release.
You will see the OpenBoot PROM ok prompt when the Solaris operating
environment is shut down.
c. From the ok prompt, obtain the domain shell prompt.
i. Press and hold the CTRL key while pressing the ] key to get to the telnet>
prompt.
ii. At the telnet> prompt, type send break.
ok CTRL ]
telnet> send break
schostname:A>
The domain shell prompt is displayed.
d. Turn the domain keyswitch to the off position with the setkeyswitch off
command.
schostname:A> setkeyswitch off
e. Disconnect from the session by typing the disconnect command.
schostname:A> disconnect
4. Power off the power grid(s).
This step powers off the power supplies.
■ Access the platform shell.
See “Obtaining the Platform Shell” on page 28.
■ If you have a Sun Fire 6800 system, you must power off power grids 0 and 1:
schostname:SC> poweroff grid0 grid1
Go to Step 5.
■ If you have a Sun Fire 4810/4800/3800 system, there is only one power grid, grid 0. Power off power grid 0:
schostname:SC> poweroff grid0
5. Power off the hardware in your system.
Refer to the “Powering Off and On” chapter of the Sun Fire 6800/4810/4800/3800
Systems Service Manual.
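The procedure follows a fixed order: each active domain, then the power grids, then the hardware. The Python sketch below turns that order into a checklist. It is a hypothetical helper that only builds strings and runs nothing. schostname is the placeholder hostname used throughout this manual, and the model names are the four systems this manual covers.

```python
from typing import List, Sequence

MODELS = {"6800", "4810", "4800", "3800"}


def power_off_plan(model: str, active_domains: Sequence[str]) -> List[str]:
    """Return the power-off procedure as an ordered checklist (illustrative)."""
    if model not in MODELS:
        raise ValueError(f"unsupported model: {model!r}")
    steps: List[str] = []
    for dom in active_domains:
        # Step 3: halt Solaris, reach the domain shell, keyswitch off, disconnect.
        steps += [
            f"domain {dom} console: log in as root and halt Solaris (wait for ok)",
            f"domain {dom} console: CTRL-] then 'send break' -> schostname:{dom}>",
            f"schostname:{dom}> setkeyswitch off",
            f"schostname:{dom}> disconnect",
        ]
    # Step 4: the Sun Fire 6800 has two power grids; the other models have grid 0 only.
    grids = "grid0 grid1" if model == "6800" else "grid0"
    steps.append(f"schostname:SC> poweroff {grids}")
    # Step 5: the hardware itself.
    steps.append("power off the hardware (see the Systems Service Manual)")
    return steps
```

Feeding it the output of a status check, for example power_off_plan("6800", ["A"]), gives six lines ending with the grid power-off and the hardware step.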
▼ To Power On the System
1. Power on the hardware.
Refer to the “Powering Off and On” chapter of the Sun Fire 6800/4810/4800/3800
Systems Service Manual.
2. Access the system controller platform shell.
See “Obtaining the Platform Shell” on page 28.
3. Power on the power grids.
This step powers on the power supplies. Complete the substep that applies to your system.
■ If you have a Sun Fire 6800 system, power on power grid 0 and power grid 1:
schostname:SC> poweron grid0 grid1
■ If you have a Sun Fire 4810/4800/3800 system, there is only one power grid, grid 0. Power on power grid 0:
schostname:SC> poweron grid0
4. Boot each domain.
a. Access the domain shell for the domain you want to boot.
See “Obtaining a Domain Shell or Console” on page 30.
b. Boot the domain with the system controller setkeyswitch on command.
schostname:A> setkeyswitch on
This command turns on the domain and boots the Solaris operating environment
if the OpenBoot PROM auto-boot? parameter is set to true and the OpenBoot
PROM boot-device parameter is set to the proper boot device.
To control whether the Solaris operating environment boots automatically when you turn the keyswitch on, use either the setupdomain command (OBP.auto-boot? parameter), which is run from a domain shell, or the OpenBoot PROM setenv auto-boot? true command.
Note – If the Solaris operating environment did not boot automatically, continue with Step c. Otherwise, go to Step 5.
The Solaris operating environment will not boot automatically if the OpenBoot
PROM auto-boot? parameter is set to false. You will see the ok prompt.
c. At the ok prompt, type the boot command to boot the Solaris operating
environment.
ok boot
After the Solaris operating environment is booted, the login: prompt is
displayed.
login:
5. To access and boot another domain, repeat Step 4.
Keyswitch Positions
Each domain has a virtual keyswitch with five positions: off, standby, on, diag, and secure. The setkeyswitch command in the domain shell changes the position of the virtual keyswitch to the specified value. The virtual keyswitch eliminates the need for a physical keyswitch for each domain. The setkeyswitch command is also available, with limited functionality, in the platform shell.
For command syntax, examples, descriptions of setkeyswitch parameters, and
results when you change the keyswitch setting, see the setkeyswitch command in
the Sun Fire 6800/4810/4800/3800 System Controller Command Reference Manual.
Caution – During the setkeyswitch operation, heed the following precautions:
■ Do not power off any boards assigned to the domain.
■ Do not reboot the system controller.
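The five positions fall into two groups: inactive (off, standby) and active (on, diag, secure). The procedures in this chapter depend on that split. For example, under “Assigning and Unassigning Boards”, a new board assignment takes effect only when the keyswitch moves from an inactive position to an active one. The Python sketch below encodes just that grouping and that rule. The names are hypothetical, and the full behavior of each position is defined by setkeyswitch in the System Controller Command Reference Manual.

```python
POSITIONS = ("off", "standby", "on", "diag", "secure")
ACTIVE = frozenset({"on", "diag", "secure"})  # off and standby are inactive


def is_active(position: str) -> bool:
    """True for the active keyswitch positions (on, diag, secure)."""
    if position not in POSITIONS:
        raise ValueError(f"unknown keyswitch position: {position!r}")
    return position in ACTIVE


def configures_assigned_boards(old: str, new: str) -> bool:
    """True if moving the keyswitch from old to new brings Assigned boards
    into the domain (an inactive-to-active transition)."""
    return not is_active(old) and is_active(new)
```

This is why the procedures later in this chapter cycle the keyswitch through standby (or off) and back to on after a board change, rather than simply rebooting Solaris.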
▼ To Power On a Domain
1. Access the domain you want to power on.
See “System Controller Navigation” on page 32.
2. Set the keyswitch to on, diag, or secure using the system controller
setkeyswitch command.
Shutting Down Domains
This section describes how to shut down a domain.
▼ To Shut Down a Domain
1. Connect to the domain console of the domain you want to shut down.
See “System Controller Navigation” on page 32.
From the domain console, if the Solaris operating environment is booted, you will see the %, #, or login: prompt.
2. If the Solaris operating environment is running, halt the Solaris operating
environment from the domain console as root.
Refer to the Sun Hardware Platform Guide, which is available with your Solaris
operating environment release.
3. Enter the domain shell from the domain console.
See “To Obtain the Domain Shell From the Domain Console” on page 32.
4. In the domain shell, type:
schostname:A> setkeyswitch off
5. If you need to completely power off the system, see “Powering Off and On the
System” on page 65.
Assigning and Unassigning Boards
When you assign a board to a domain, the board must be listed in the Access Control List (ACL) for the domain, and it must not already be assigned to another domain.
The ACL is only checked when you assign a board to a domain. If the board is
assigned to a domain when the domain is active, the board is not automatically
configured to be part of that domain.
■ For an overview of the steps for assigning and unassigning boards to or from a domain with and without dynamic reconfiguration (DR), see TABLE 6-2 and TABLE 6-3.
■ For complete step-by-step procedures not using dynamic reconfiguration, see “To Assign a Board to a Domain” on page 72 and “To Unassign a Board From a Domain” on page 74.
■ For procedures that use dynamic reconfiguration, refer to the Sun Fire 6800, 4810, 4800, and 3800 Systems Dynamic Reconfiguration User Guide.
TABLE 6-2 Overview of Steps to Assign a Board To a Domain
To Assign a Board To a Domain Using DR:
1. Assign the disconnected and isolated board to the domain with the cfgadm -x assign command.
2. Use DR to configure the board into the domain. Refer to the Sun Fire 6800, 4810, 4800, and 3800 Systems Dynamic Reconfiguration User Guide.
To Assign a Board To a Domain Not Using DR:
1. Assign the board to the domain with the addboard command.
2. Halt the Solaris operating environment in the domain.
3. Shut down the domain with setkeyswitch standby.
4. Turn on the domain with setkeyswitch on.
TABLE 6-3 Overview of Steps to Unassign a Board From a Domain
To Unassign a Board From a Domain Using DR:
1. Use DR to unconfigure the board from the domain. Refer to the Sun Fire 6800, 4810, 4800, and 3800 Systems Dynamic Reconfiguration User Guide.
2. Unassign the board from the domain with the cfgadm -c disconnect -o unassign command.
To Unassign a Board From a Domain Not Using DR:
1. Halt the Solaris operating environment in the domain.
2. Turn the keyswitch to standby mode with setkeyswitch standby.
3. Unassign the board from the domain with the deleteboard command.
4. Turn on the domain with setkeyswitch on.
▼ To Assign a Board to a Domain
Note – This procedure does not use dynamic reconfiguration (DR).
1. Enter the domain shell for the domain you want to assign the board to.
See “Obtaining a Domain Shell or Console” on page 30.
2. Type the showboards command with the -a option to find available boards that
can be used in the domain.
In the domain shell, the command output lists the boards that are in the current domain and the boards that are not yet assigned to a domain but are listed in the Access Control List (ACL) for the current domain. You can assign to the current domain any listed board that is not already part of it.
CODE EXAMPLE 6-1 showboards -a Example Before Assigning a Board to a Domain
schostname:A> showboards -a
Slot     Pwr  Component Type  State   Status  Domain
----     ---  --------------  -----   ------  ------
/N0/SB0  On   CPU Board       Active  Passed  A
/N0/IB6  On   PCI I/O Board   Active  Passed  A
If the board you want to assign to the domain is not listed in the showboards -a
output, complete the following substeps. Otherwise, go to Step 3.
a. Make sure that the board has not been assigned to another domain by running
the showboards command in the platform shell.
A board cannot be assigned to the current domain if it belongs to another domain.
b. Verify that the board is listed in the Access Control List (ACL) for the domain.
Use the showplatform -p acls command (platform shell) or the
showdomain -p acls command (domain shell).
c. If the board is not listed in the ACL for the desired domain, use the
setupplatform -p acls command from the platform shell to add the board
to the ACL for the domain.
See “To Configure Platform Parameters” on page 47.
3. Assign the proper board to the desired domain with the addboard command.
The board must be in the Available board state. For example, to assign
CPU/Memory board, sb2, to the current domain, type:
schostname:A> addboard sb2
The new board assignment takes effect when you change the domain keyswitch
from an inactive position (off or standby) to an active position (on, diag, or
secure) using the system controller setkeyswitch command.
Assigning a board to a domain does not automatically make that board part of an
active domain.
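The checks that an assignment must pass can be sketched as a small model. This Python sketch is hypothetical and does not reproduce the addboard implementation. It encodes three rules from this section: the board must be in the domain's ACL, it must not belong to another domain, and it must be in the Available state. It also shows that a successful assignment leaves the board Assigned, not Active. The Board class and add_board function are illustrative names.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass
class Board:
    name: str                     # e.g. "sb2"
    state: str = "Available"      # board state as reported by showboards
    domain: Optional[str] = None  # None means not assigned to any domain


def add_board(board: Board, domain: str, acl: Dict[str, Set[str]]) -> None:
    """Assign board to domain, enforcing the rules described above."""
    # The ACL is consulted only here, at assignment time.
    if board.name not in acl.get(domain, set()):
        raise PermissionError(f"{board.name} is not in the ACL for domain {domain}")
    if board.domain is not None and board.domain != domain:
        raise ValueError(f"{board.name} already belongs to domain {board.domain}")
    if board.state != "Available":
        raise ValueError(f"{board.name} is {board.state}, not Available")
    board.domain = domain
    # Assigned is not Active: the board joins a running domain only when the
    # keyswitch moves from off/standby to on, diag, or secure.
    board.state = "Assigned"
```

A board missing from the ACL fails before anything else is checked. This matches substeps b and c of Step 2, which send you to setupplatform -p acls to fix the ACL first.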
4. If the domain is active (the domain is running the Solaris operating environment,
the OpenBoot PROM, or POST), complete this step.
■ If the Solaris operating environment is running in the domain, log in as root to the Solaris operating environment and halt it. For details on how to halt a domain running the Solaris operating environment, refer to the Sun Hardware Platform Guide.
■ If the OpenBoot PROM or POST is running, wait for the ok prompt.
a. Obtain the domain shell.
See “To Obtain the Domain Shell From the Domain Console” on page 32.
b. Shut down the domain. Type:
schostname:A> setkeyswitch standby
Setting the domain keyswitch to standby instead of off means that the boards in the domain do not need to be powered on and tested again, which decreases downtime.
c. Turn the domain on. Type:
schostname:A> setkeyswitch on
Note – Rebooting the Solaris operating environment without using the
setkeyswitch command does not configure boards that are in the Assigned board
state into the active domain.
d. If your environment is not set to automatically boot the Solaris operating
environment in the domain after you turned the keyswitch on, boot the
operating environment by typing boot at the ok prompt.
ok boot
Note – Setting up whether the Solaris operating environment auto boots or not
when you turn the keyswitch on is done either with the setupdomain command
(OBP.auto-boot? parameter), which is run from a domain shell, or with the
OpenBoot PROM setenv auto-boot? true command.
▼ To Unassign a Board From a Domain
Note – This procedure does not use dynamic reconfiguration (DR).
Unassign a board from a domain with the deleteboard command. For a complete
description of the deleteboard command, see the Sun Fire 6800/4810/4800/3800
System Controller Command Reference Manual.
Note – When you unassign a board from a domain, the domain must not be active. This means it must not be running the Solaris operating environment, the OpenBoot PROM, or POST. The board you are unassigning must be in the Assigned board state.
1. Halt the Solaris operating environment in the domain.
Refer to the Sun Hardware Platform Guide.
2. Enter the domain shell for the proper domain.
See “System Controller Navigation” on page 32.
3. Turn the domain keyswitch off with setkeyswitch off.
4. Type the showboards command to list the boards assigned to the current domain.
5. Unassign the proper board from the domain with the deleteboard command:
schostname:A> deleteboard sb2
6. Turn on the domain. Type:
schostname:A> setkeyswitch on
7. If your environment is not set to automatically boot the Solaris operating
environment in the domain after you turned the keyswitch on, boot the operating
environment by typing boot at the ok prompt.
ok boot
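The Note at the start of this procedure gives two preconditions for deleteboard. The domain must be idle, meaning it is not running Solaris, the OpenBoot PROM, or POST. The board must be in the Assigned state. The Python sketch below is a hypothetical pre-check that encodes those two conditions. The function name and the strings used for what the domain is running are illustrative.

```python
from typing import Optional

# Anything the domain may be running that makes it "active".
ACTIVE_SOFTWARE = frozenset({"Solaris", "OpenBoot PROM", "POST"})


def can_delete_board(board_state: str, domain_running: Optional[str]) -> bool:
    """Check the deleteboard preconditions (illustrative).

    board_state    -- state of the board as shown by showboards
    domain_running -- "Solaris", "OpenBoot PROM", "POST", or None if idle
    """
    if domain_running in ACTIVE_SOFTWARE:
        return False  # the domain must not be active
    return board_state == "Assigned"
```

In the procedure above, Steps 1 and 3 (halting Solaris and turning the keyswitch off) are what make the first condition true before Step 5 runs deleteboard.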
Upgrading the Firmware
The flashupdate command updates the firmware in the system controller and the
system boards (CPU/Memory boards and I/O assemblies). There is no firmware on
the Repeater boards. This command is available in the platform shell only. The
source flash image can be on a server or another board of the same type.
For a complete description of this command, including command syntax and examples, see the flashupdate command in the Sun Fire 6800/4810/4800/3800 System Controller Command Reference Manual.
Note – Review the README and Install.info files before you upgrade the
firmware.
To upgrade the firmware from a URL, the firmware must be accessible from an FTP or HTTP URL. Before performing the flashupdate procedure, read the information in the “Description” section of the flashupdate command in the Sun Fire 6800/4810/4800/3800 System Controller Command Reference Manual. The “Description” section covers:
■ Steps to perform before you upgrade the firmware.
■ What to do if the images you installed are incompatible with the new images.
Caution – When you update the firmware on the system controller, update only one
system controller at a time, as described in the Install.info file. DO NOT update
both system controllers at the same time.
Saving and Restoring Configurations
This section describes when to use the dumpconfig and restoreconfig
commands.
Using dumpconfig
Use the dumpconfig command to save platform and domain settings after you:
■ Complete the initial configuration of the platform and the domains
■ Modify the configuration, or change the hardware configuration
For an explanation of how to use this command, see “Saving the Current
Configuration to a Server” on page 50. For complete command syntax and examples
of this command, refer to the dumpconfig command in the Sun Fire
6800/4810/4800/3800 System Controller Command Reference Manual.
Using restoreconfig
Use the restoreconfig command to restore platform and domain settings.
For complete command syntax and examples of this command, refer to the
restoreconfig command in the Sun Fire 6800/4810/4800/3800 System Controller
Command Reference Manual.
CHAPTER 7
System Controller Failover
Sun Fire 6800/4810/4800/3800 systems can be configured with two system
controllers for high availability. In a high-availability system controller (SC)
configuration, one SC serves as the main SC, which manages all the system
resources, while the other SC serves as a spare. When certain conditions cause the
main SC to fail, a switchover or failover from the main SC to the spare is triggered
automatically, without operator intervention. The spare SC assumes the role of the
main and takes over all system controller responsibilities.
This chapter explains the following:
■ How SC Failover Works
■ SC Failover Prerequisites
■ Conditions That Affect Your SC Failover Configuration
■ How to Manage SC Failover
■ How to Recover After an SC Failover
How SC Failover Works
The SC failover capability is enabled by default on Sun Fire midrange servers that
have two System Controller boards installed. The failover capability includes both
automatic and manual failover. In automatic SC failover, a failover is triggered when
certain conditions cause the main SC to fail or become unavailable. In manual SC
failover, you force the switchover of the spare SC to the main.
The failover software performs the following tasks to determine when a failover
from the main SC to the spare is necessary and to ensure that the system controllers
are failover-ready:
■ Continuously checks the heartbeat of the main SC and the presence of the spare SC.
■ Copies data from the main SC to the spare SC at regular intervals so that the data on both system controllers is synchronized if a failover occurs.
If at any time the spare SC is not available or does not respond, the failover
mechanism disables SC failover. If SC failover is enabled, but the connection link
between the SCs is down, failover remains enabled and active until the system
configuration changes. After a configuration change, such as a change in platform or
domain parameter settings, the failover mechanism remains enabled, but it is not
active (SC failover is not in a failover-ready state because the connection link is
down). You can check the SC failover state by using commands such as
showfailover or showplatform, as explained in “To Obtain Failover Status
Information” on page 83.
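The paragraph above describes a subtle rule: a broken link between the SCs does not immediately change the failover state. The state changes only after the next configuration change. The Python sketch below is a hypothetical model of that rule alone. The argument names are assumptions, and it deliberately ignores the separate case of an unavailable spare SC. The returned strings are the state names that showfailover reports.

```python
def derive_failover_state(enabled: bool, link_up: bool,
                          config_changed_since_link_down: bool) -> str:
    """Model the link-down rule for SC failover (illustrative only)."""
    if not enabled:
        return "disabled"
    if not link_up and config_changed_since_link_down:
        # Still enabled, but the SCs can no longer be synchronized.
        return "enabled but not active"
    # Link up, or link down with no configuration change yet.
    return "enabled and active"
```

A link failure can therefore go unnoticed in the state display until someone changes a platform or domain parameter. This is one reason to check showfailover after configuration changes.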
What Triggers an Automatic Failover
A failover from the main to the spare SC is triggered when one of the following
failure conditions occurs:
■ The heartbeat of the main SC stops.
■ The main SC is rebooted but it does not boot successfully.
■ A fatal software error occurs.
What Happens During a Failover
An SC failover is characterized by the following:
■ Failover event message
The SC failover event is logged in the platform message log file, which is viewed
on the console of the new main SC or through the showlogs command on the SC.
The information displayed indicates that a failover has occurred and identifies the
failure condition that triggered the failover.
CODE EXAMPLE 7-1 shows the type of information that appears on the console of
the spare SC when a failover occurs due to a stop in the main SC heartbeat:
CODE EXAMPLE 7-1 Messages Displayed During an Automatic Failover
Platform Shell - Spare System Controller
sp4-sc0:sc> Nov 12 01:15:42 sp4-sc0 Platform.SC: SC Failover: enabled and active.
Nov 12 01:16:42 sp4-sc0 Platform.SC: SC Failover: no heartbeat detected from the Main SC
Nov 12 01:16:42 sp4-sc0 Platform.SC: SC Failover: becoming main SC ...
Nov 12 01:16:49 sp4-sc0 Platform.SC: Chassis is in single partition mode.
Nov 12 01:17:04 sp4-sc0 Platform.SC: Main System Controller
Nov 12 01:17:04 sp4-sc0 Platform.SC: SC Failover: disabled
sp4-sc1:SC>
■ Change in the SC prompt
The prompt for the main SC is hostname:SC> . Note that the upper case letters, SC,
identify the main SC.
The prompt for the spare SC is hostname:sc> . Note that the lower case letters,
sc, identify the spare SC.
When an SC failover occurs, the prompt for the spare SC changes and becomes
the prompt for the main SC (hostname:SC> ), as shown in the last line of
CODE EXAMPLE 7-1.
■ Command execution is disabled
When an SC failover is in progress, command execution is disabled.
■ Short recovery period
The recovery time for an SC failover from the main to the spare is approximately
five minutes or less. This recovery period consists of the time required to detect a
failure and direct the spare SC to assume the responsibilities of the main SC.
■ No disturbance to running domains
The failover process does not affect any running domains, except for temporary
loss of services from the system controller.
■ Deactivation of the SC failover feature
After an automatic or manual failover occurs, the failover capability is
automatically disabled. This prevents the possibility of repeated failovers back
and forth between the two SCs.
■ Telnet connections to domain consoles are closed
A failover closes a telnet session connected to the domain console, and any
domain console output is lost. When you reconnect to the domain through a
telnet session, you must specify the hostname or IP address of the new main SC,
unless you previously assigned a logical hostname or IP address to your main
system controller (see the next section for an explanation of the logical hostname
and IP address).
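A script that reconnects after a failover can use the prompt conventions described above to tell which system controller it has reached. In those conventions, uppercase SC marks the main SC, lowercase sc marks the spare, and a domain letter marks a domain shell. The Python sketch below is a hypothetical prompt classifier. It assumes domain letters A through D, as in TABLE 6-1, and the function name is illustrative.

```python
import re
from typing import Optional, Tuple

# hostname:SC> main SC platform shell; hostname:sc> spare SC platform shell;
# hostname:A> .. hostname:D> domain shells (letters assumed as in TABLE 6-1).
PROMPT_RE = re.compile(r"^(?P<host>[^:\s]+):(?P<shell>SC|sc|[A-D])>$")


def classify_prompt(prompt: str) -> Optional[Tuple[str, str]]:
    """Return (hostname, kind of shell), or None if not an SC prompt."""
    m = PROMPT_RE.match(prompt.strip())
    if m is None:
        return None
    shell = m.group("shell")
    if shell == "SC":
        return m.group("host"), "main SC platform shell"
    if shell == "sc":
        return m.group("host"), "spare SC platform shell"
    return m.group("host"), f"domain {shell} shell"
```

For example, classify_prompt("sp4-sc0:sc>") reports the spare SC platform shell. A prompt that switches from sc> to SC> on the same host signals that the host has taken over as main.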
The remainder of this chapter describes SC failover prerequisites, conditions that
affect your SC failover configuration, and how to manage SC failover, including how
to recover after an SC failover occurs.
SC Failover Prerequisites
This section identifies SC failover prerequisites and optional platform parameters
that can be set for SC failover:
■ Same firmware version required on both the main and spare SC
SC failover requires that you run the same version of the firmware (version 5.13.0)
on both the main and spare system controller. Be sure to follow the instructions
for installing and upgrading the firmware described in the Install.info file
that accompanies the firmware release.
■ Optional platform parameter settings
You can optionally perform the following after you install or upgrade the
firmware on each SC:
■ Assign a logical hostname or IP address to the main system controller.
The logical hostname or IP address identifies the working main system
controller, even after a failover occurs. Assign the logical IP address or
hostname by running the setupplatform command on the main SC.
Note – The logical hostname or IP address is required if you are using Sun
Management Center 3.0 for Sun Fire 6800/4810/4800/3800 systems.
■ Use Simple Network Time Protocol (SNTP) to keep the date and time values between the main and spare system controllers synchronized.
The date and time on the two SCs must be synchronized to ensure that the same time service is provided to the domains. Run the setupplatform command on each SC to identify the host name or IP address of the system to be used as the SNTP server (reference clock).
If you do not want to use an SNTP server to synchronize the SC date and time,
you can use the setdate command on each SC to set the date and time.
For further information on setting the platform date and time, see “To Set the
Date and Time for the Platform” on page 46.
Conditions That Affect Your SC Failover Configuration
If you power cycle your system (power it off and then on again), note the following:
■ After a power cycle, the first system controller that boots scapp becomes the main SC.
Certain factors, namely disabling or running SC POST with different diag levels,
influence which SC is booted first.
■ Be sure that SC failover is enabled and active before you power cycle your system, to ensure that data on both system controllers is current and synchronized.
If SC failover is disabled at the time a power cycle occurs, it is possible for the
new main SC to boot with a stale SC configuration.
When SC failover is disabled, data synchronization does not occur between the
main and spare SC. As a result, any configuration changes made on the main SC
are not propagated to the spare. If the roles of the main and spare SC change after
a power cycle, scapp on the new main SC will boot with a stale SC configuration.
As long as SC failover is enabled and active, data on both SCs will be
synchronized, and it will not matter which SC becomes the main SC after the
power cycle.
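The risk described above depends on two facts: whether the roles of the SCs change, and whether failover was enabled and active before the power cycle. The Python sketch below is a hypothetical helper that combines them. The SC names SSC0 and SSC1 are assumed labels; the manual's showfailover example shows only SSC0.

```python
from typing import Tuple


def after_power_cycle(first_to_boot_scapp: str, main_before: str,
                      failover_enabled_and_active: bool) -> Tuple[str, bool]:
    """Return (new main SC, risk of booting with a stale configuration)."""
    # Whichever SC boots scapp first becomes the main SC.
    new_main = first_to_boot_scapp
    # Data is synchronized only while failover is enabled and active, so a
    # role change is harmless only in that case.
    stale_risk = new_main != main_before and not failover_enabled_and_active
    return new_main, stale_risk
```

The only risky combination is a role change after failover was disabled. Keeping failover enabled and active removes the risk regardless of which SC boots first.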
How to Manage SC Failover
You control the failover state through the setfailover command, which enables you to do the following:
■ Disable SC failover.
■ Enable SC failover.
■ Perform a manual failover (force a failover from the main SC to the spare).
You can also obtain failover status information through commands such as
showfailover or showplatform. For details, see “To Obtain Failover Status
Information” on page 83.
▼ To Disable SC Failover
● From the platform shell on either the main or spare SC, type:
schostname:SC> setfailover off
A message indicates failover is disabled. Note that SC failover remains disabled
until you re-enable it (see the next procedure).
▼ To Enable SC Failover
● From the platform shell on either the main or spare SC, type:
schostname:SC> setfailover on
The following message is displayed while the failover software verifies the failover-ready state of the system controllers:
SC Failover: enabled but not active.
Within a few minutes, after failover readiness has been verified, the following
message is displayed on the console, indicating that SC failover is activated:
SC Failover: enabled and active.
▼ To Perform a Manual SC Failover
1. Be sure that other SC commands are not currently running on the main SC.
2. From the platform shell on either the main or spare SC, type:
schostname:SC> setfailover force
A failover from one SC to the other occurs, unless there are fault conditions (for
example, the spare SC is not available or the connection link between the SCs is
down) that prevent the failover from taking place.
A message describing the failover event is displayed on the console of the new main
SC.
Be aware that SC failover is automatically disabled after a failover. To use the
SC failover feature again, re-enable failover (see “To Enable SC Failover” on
page 82).
▼
To Obtain Failover Status Information
Display failover information through the following commands:
■ The showfailover(1M) command displays SC failover state information, for
example:
schostname:SC> showfailover -v
SC: SSC0
Main System Controller
SC Failover: enabled and active.
Clock failover enabled.
The SC failover state can be one of the following:
■ enabled and active - SC failover is enabled and functioning normally.
■ disabled - SC failover has been disabled, either as a result of an SC failover
or because the SC failover feature was specifically disabled (through the
setfailover off command).
■ enabled but not active - SC failover is enabled, but certain hardware
components, such as the spare SC or the centerplane between the main and
spare SC, are not in a failover-ready state.
■ The showplatform and showsc commands also display failover information,
similar to the output of the showfailover command.
■ The showboards command identifies the state of the System Controller boards,
either Main or Spare.
For details on these commands, refer to their descriptions in the Sun Fire
6800/4810/4800/3800 System Controller Command Reference Manual.
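If you capture showfailover output (for example, in a console log) and want to check the failover state from a script, you can recognize the three states listed above as sketched below. This is a minimal Python sketch and is not part of the SC firmware. The function names are hypothetical. The sketch assumes the state text follows "SC Failover:" exactly as worded in this manual, with an optional trailing period; only the "enabled and active" form appears in the example output above.

```python
# Hypothetical helper: classify the "SC Failover:" line of captured
# showfailover output. Assumes the state wording matches this manual.
KNOWN_STATES = ("enabled and active", "enabled but not active", "disabled")


def parse_failover_state(output: str) -> str:
    """Return the SC failover state reported in captured showfailover text."""
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("SC Failover:"):
            # Drop the label and any trailing period, e.g. "enabled and active."
            state = line[len("SC Failover:"):].strip().rstrip(".")
            if state in KNOWN_STATES:
                return state
            raise ValueError(f"unrecognized failover state: {state!r}")
    raise ValueError("no 'SC Failover:' line found")


def failover_ready(output: str) -> bool:
    """True only when SC failover is enabled and functioning normally."""
    return parse_failover_state(output) == "enabled and active"


if __name__ == "__main__":
    sample = (
        "SC: SSC0\n"
        "Main System Controller\n"
        "SC Failover: enabled and active.\n"
        "Clock failover enabled.\n"
    )
    print(parse_failover_state(sample))
    print(failover_ready(sample))
```

A state of "enabled but not active" is normal for a few minutes after setfailover on. A script that polls this check should allow for that delay before treating the state as a fault.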
How to Recover After an SC Failover
After an SC failover occurs, you must perform certain recovery tasks:
■ Identify the failure point or condition that caused the failover and determine how
to correct the failure.
■ Use the showlogs command to review the platform messages logged for the
working SC. Evaluate these messages for failure conditions and determine the
corrective action needed to reactivate any failed components.
If the syslog loghost has been configured, you can review the platform
loghost to see any platform messages for the failed SC.
■ If you need to replace a failed System Controller board, see “To Remove and
Replace a System Controller Board in a Redundant SC Configuration” on
page 103.
■ If an automatic failover occurred while you were running the flashupdate,
setkeyswitch, or dynamic reconfiguration commands, those operations were
stopped and must be rerun after you resolve the failure condition.
However, if you were running configuration commands such as
setupplatform, some configuration changes might have been made before the
failover. Be sure to verify whether any configuration changes were made. For
example, if you were running the setupplatform command when an automatic
failover occurred, use the showplatform command to check for configuration
changes made before the failover. After you resolve the failure condition, run
the appropriate commands to update your configuration as needed.
■ After you resolve the failure condition, re-enable SC failover by using the
setfailover on command (see “To Enable SC Failover” on page 82).
CHAPTER 8
Testing System Boards
This chapter describes how to test:
■ A CPU/Memory board with the system controller testboard command
■ An I/O assembly in a spare domain with POST
The CPU/Memory board and I/O assembly are the only boards with directed tests.
Testing a CPU/Memory Board
Use the testboard system controller command to test the CPU/Memory board that
you specify on the command line. This command is available in both the
platform and domain shells.
Requirements
■ The domain cannot be active.
■ The board power must be on.
■ The Repeater boards used to run the domain must also be powered on. See
“Repeater Boards” on page 18 for the Repeater boards needed to run the domain.
■ The board must not be part of an active domain. If you run testboard from a
domain shell, the board should be in the Assigned state. Use showboards to
display the board state.
▼
To Test a CPU/Memory Board
To test a CPU/Memory board from a domain A shell, type the testboard
command:
schostname:A> testboard sbx
where sbx is sb0 through sb5 (CPU/Memory boards).
For complete command syntax and examples, refer to the testboard command in
the Sun Fire 6800/4810/4800/3800 System Controller Command Reference Manual.
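Which of the two test methods in this chapter applies depends only on the board type, and the board name encodes the type. The hypothetical Python helper below maps a board name to the method described here. It checks only the name ranges quoted in this manual (sb0 through sb5, ib6 through ib9), not which slots actually exist on a particular Sun Fire model.

```python
import re

# Board-name ranges quoted in this manual; this does not check which
# slots are actually present on a given Sun Fire model.
CPU_MEMORY_BOARD = re.compile(r"sb[0-5]")
IO_ASSEMBLY = re.compile(r"ib[6-9]")


def choose_test(board: str) -> str:
    """Return the test approach this chapter describes for a board name."""
    name = board.strip().lower()
    if CPU_MEMORY_BOARD.fullmatch(name):
        # CPU/Memory boards have CPUs, so testboard can run directed tests.
        return f"testboard {name}"
    if IO_ASSEMBLY.fullmatch(name):
        # I/O assemblies have no CPUs; run POST in a spare domain instead.
        return "POST in a spare domain"
    raise ValueError(f"{board!r} is not a board with a directed test")


if __name__ == "__main__":
    for board in ("sb0", "ib8"):
        print(board, "->", choose_test(board))
```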
Testing an I/O Assembly
Although you can test a CPU/Memory board with the testboard command, you
cannot use testboard to test an I/O assembly. The testboard command needs
CPUs to run its tests, and an I/O assembly has no CPUs.
To test an I/O assembly with POST, you must construct a spare domain that contains
the unit under test and a board with working CPUs. The spare domain must meet these
requirements:
■ The domain cannot be active.
■ The domain must contain at least one CPU/Memory board.
If your spare domain does not meet these requirements, the following procedure, “To
Test an I/O Assembly” on page 87, explains how to:
■ Halt the Solaris operating environment in the spare domain
■ Assign a CPU/Memory board to the spare domain
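The spare-domain requirements above can be expressed as a simple check. The check also covers the requirement in Step 6 of the following procedure that the CPU/Memory board have at least one CPU. The Board and Domain classes below are hypothetical stand-ins for what showplatform and showboards report. This is a Python sketch, not an SC interface.

```python
from dataclasses import dataclass, field

# Hypothetical data model: fill these in from showplatform/showboards output.


@dataclass
class Board:
    name: str       # e.g. "sb0" (CPU/Memory board) or "ib6" (I/O assembly)
    cpus: int = 0   # number of working CPUs on the board


@dataclass
class Domain:
    domain_id: str
    active: bool
    boards: list = field(default_factory=list)


def spare_domain_problems(domain: Domain) -> list:
    """List the spare-domain requirements that this domain does not meet."""
    problems = []
    if domain.active:
        problems.append("domain is active")
    cpu_boards = [b for b in domain.boards if b.name.startswith("sb")]
    if not cpu_boards:
        problems.append("no CPU/Memory board assigned")
    elif not any(b.cpus >= 1 for b in cpu_boards):
        problems.append("no CPU/Memory board with at least one CPU")
    return problems


if __name__ == "__main__":
    spare = Domain("b", active=False, boards=[Board("sb2", cpus=2), Board("ib6")])
    print(spare_domain_problems(spare) or "ready for I/O assembly test")
```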
▼
To Test an I/O Assembly
1. Verify that you have a spare domain. Type the showplatform command from the
platform shell.
If you have a spare domain, go to Step 3. If you do not have a spare domain, go to
Step 2.
2. Complete these steps if you do not have a spare domain.
■ If you have a system with one partition and one domain, add a second domain to
the partition.
See “Creating and Starting Domains” on page 53. Go to Step 3.
■ If you have a system with one partition and the partition contains two domains,
create a spare domain in the second partition:
a. Shut down all running domains in the chassis.
b. Change the partition mode to dual by running the setupplatform command.
See the setupplatform command in the Sun Fire 6800/4810/4800/3800 System
Controller Command Reference Manual.
c. Create a spare domain in the second partition.
See “Creating and Starting Domains” on page 53.
3. Enter the domain shell (a through d) of a spare domain.
See “System Controller Navigation” on page 32.
4. If the spare domain is running the Solaris operating environment (#, % prompts
displayed), halt the Solaris operating environment in the domain.
Refer to the Sun Hardware Platform Guide, which is available with your Solaris
release.
5. Verify that the spare domain contains at least one CPU/Memory board by typing the
showboards command.
If you need to add a CPU/Memory board to the spare domain, go to Step 6.
Otherwise, go to Step 7.
6. Assign a CPU/Memory board with a minimum of one CPU to the spare domain
with the addboard command.
This example shows assigning a CPU/Memory board to domain B (in the domain B
shell):
schostname:B> addboard sbx
where sbx is sb0 through sb5.
7. Assign the I/O assembly you want to test on the spare domain with the addboard
command.
This example shows assigning an I/O assembly to domain B (in the domain B shell).
schostname:B> addboard ibx
where x is 6, 7, 8, or 9.
8. Run the setupdomain command to configure parameter settings, such as
diag-level and verbosity-level.
This command is an interactive command. For command syntax and a code
example, refer to the setupdomain command in the Sun Fire 6800/4810/4800/3800
System Controller Command Reference Manual.
9. Verify that the date and time are set correctly with showdate.
If the date and time are not set correctly, reset the date and time with setdate.
For complete setdate command syntax and examples, refer to the setdate
command in the Sun Fire 6800/4810/4800/3800 System Controller Command Reference
Manual.
10. Turn the keyswitch on in the spare domain.
This action runs POST in the domain.
schostname:B> setkeyswitch on
.
.
ok
The I/O assembly is tested. However, the cards in the I/O assembly are not tested.
To test the cards in the I/O assembly, you must boot the Solaris operating
environment.
■ If the setkeyswitch operation succeeds:
You will see the ok prompt, which means that the I/O assembly is most likely
working. However, it is possible that some components have been disabled. You
can also view the output of the showboards command to check the status of the
boards after testing.
■ If POST finds errors:
Error messages are displayed for the tests that failed. Check the POST output for
error messages. If the setkeyswitch operation fails, an error message is
displayed telling you why the operation failed, and you are returned to the
domain shell.
11. Obtain the domain shell from the domain console.
See “To Obtain the Domain Shell From the Domain Console” on page 32.
12. Turn the keyswitch to standby.
schostname:B> setkeyswitch standby
13. Delete the I/O assembly from the spare domain with the deleteboard command:
schostname:B> deleteboard ibx
where x is the board number you typed in Step 7.
14. Exit the spare domain shell and go back to the domain you were in before
entering the spare domain.
See “System Controller Navigation” on page 32.
CHAPTER 9
Removing and Replacing Boards
This chapter discusses the software steps to remove and replace the following
boards, cards, and assemblies:
■ “CPU/Memory Boards and I/O Assemblies” on page 92
■ “CompactPCI and PCI Cards” on page 98
■ “Repeater Board” on page 99
■ “System Controller Board” on page 101
■ “ID Board and Centerplane” on page 104
This chapter also describes how to unassign a board from a domain and how to
disable a board.
To troubleshoot board and component failures, see “Board and Component Failures”
on page 112. To remove and install the FrameManager, ID board, power supplies,
and fan trays, refer to the Sun Fire 6800/4810/4800/3800 Systems Service Manual.
Before you begin, have the following books handy:
■ Sun Hardware Platform Guide
■ Sun Fire 6800/4810/4800/3800 Systems Service Manual
You will need these books for Solaris operating environment steps and hardware
removal and installation steps. The first book is available with your Solaris operating
environment release.
CPU/Memory Boards and I/O Assemblies
The following procedures describe the software steps needed to:
■ Remove and replace a system board (CPU/Memory board or I/O assembly)
■ Unassign a system board from a domain or disable a system board
■ Hot-swap a CPU/Memory board or an I/O assembly
For details on the following tasks, refer to the Sun Fire 6800, 4810, 4800, and 3800
Systems Dynamic Reconfiguration User Guide:
■ Moving a CPU/Memory board or an I/O assembly between domains
■ Disconnecting a CPU/Memory board or I/O assembly and leaving it in the system
until a replacement board is available
▼
To Remove and Replace a System Board
The following procedure describes the steps for removing and replacing a system
board without using Dynamic Reconfiguration commands.
1. Connect to the domain console for the domain that contains the board or assembly
you want to remove and replace.
See Chapter 2.
2. Halt the Solaris operating environment in the domain containing the board or
assembly you want to remove.
Refer to the Sun Hardware Platform Guide. You should see the ok prompt.
3. Get to the domain shell prompt.
For details on accessing the domain shell, see Chapter 2.
4. Turn the domain keyswitch to the standby position with the
setkeyswitch standby command, and then power off the board or assembly. Type:
schostname:A> setkeyswitch standby
schostname:A> poweroff board_name
where board_name is sb0 through sb5 or ib6 through ib9.
5. Verify that the green power LED is off.
6. Remove the board or assembly.
Refer to the Sun Fire 6800/4810/4800/3800 Systems Service Manual.
7. Install the replacement board or assembly.
8. Power on the board or assembly. Type:
schostname:SC> poweron board_name
where board_name is sb0 through sb5 or ib6 through ib9.
9. Check the version of the firmware that is installed on the board by using the
showboards command:
schostname:SC> showboards -p version
The firmware version of the new replacement board must be compatible with the
system controller software version.
10. If the firmware version of the replacement board or assembly is different from the
board you removed, update the firmware on the board.
For a description of command syntax, refer to the flashupdate command in the
Sun Fire 6800/4810/4800/3800 System Controller Command Reference Manual.
a. If you have a CPU/Memory board of the same type installed, use the
flashupdate -c command:
schostname:SC> flashupdate -c source_board destination_board
After completing this step, go to Step c.
If you do not have a CPU/Memory board of the same type installed, go to Step b.
b. If you do not have a CPU/Memory board of the same type installed, use the
flashupdate -f command. Type:
schostname:SC> flashupdate -f url board
c. If showboards reported the board in the Failed state, power off the board after
you update it to a compatible firmware version with flashupdate. This clears the
Failed state.
11. Complete this step if you have an I/O assembly.
a. Before you bring the board back to the Solaris operating environment, test the
I/O assembly in a spare domain that contains at least one CPU/Memory board
with a minimum of one CPU.
b. Enter a spare domain.
c. Test the I/O assembly.
See “Testing an I/O Assembly” on page 86.
12. Turn the domain keyswitch to the on position with the setkeyswitch on
command.
schostname:A> setkeyswitch on
This command turns the domain on and boots the Solaris operating environment if
the system controller setupdomain OBP.auto-boot? parameter is set to true and
the OpenBoot PROM boot-device parameter is set to the proper boot device.
■ If the Solaris operating environment did not boot automatically, continue with the
next step.
■ If the appropriate OpenBoot PROM parameters are not set up to take you to the
login: prompt, you will see the ok prompt.
For more information on the OpenBoot PROM parameters, refer to the Sun
Hardware Platform Guide.
13. At the ok prompt, type the boot command:
ok boot
After the Solaris operating environment is booted, the login: prompt is displayed.
▼
To Unassign a Board From a Domain or Disable
a System Board
If a CPU/Memory board or I/O assembly fails, complete one of the following tasks:
■ Unassign the board from the domain. See “To Unassign a Board From a Domain”
on page 74.
OR
■ Disable the board. Refer to the disablecomponent command in the Sun Fire
6800/4810/4800/3800 System Controller Command Reference Manual. Disabling the
board prevents it from re-entering the domain when the domain is rebooted.
▼
To Hot-Swap a CPU/Memory Board
1. Use DR to unconfigure and disconnect the CPU/Memory board from the domain.
Refer to the Sun Fire 6800, 4810, 4800, and 3800 Systems Dynamic Reconfiguration User
Guide.
2. Verify the state of the LEDs on the board.
Refer to the CPU/Memory board chapter of the Sun Fire 6800/4810/4800/3800 Systems
Service Manual.
3. Remove and replace the board.
Refer to the CPU/Memory board chapter of the Sun Fire 6800/4810/4800/3800 Systems
Service Manual.
4. Power on the board.
5. Check the version of the firmware that is installed on the board by using the
showboards command:
schostname:SC> showboards -p version
The firmware version of the new replacement board should be the same as that of the
board you just removed.
6. If the firmware version of the replacement board or assembly is different from the
board you removed, update the firmware on the board.
■ If you have a CPU/Memory board of the same type installed, use the
flashupdate -c command:
schostname:SC> flashupdate -c source_board destination_board
For a description of command syntax, refer to the flashupdate command in
the Sun Fire 6800/4810/4800/3800 System Controller Command Reference Manual. Go
to Step 7.
■ If you do not have a CPU/Memory board of the same type installed, use the
flashupdate -f command:
schostname:SC> flashupdate -f URL board
For a description of command syntax, refer to the flashupdate command in the
Sun Fire 6800/4810/4800/3800 System Controller Command Reference Manual.
7. Use DR to connect and configure the board back into the domain.
Refer to the Sun Fire 6800, 4810, 4800, and 3800 Systems Dynamic Reconfiguration User
Guide.
8. Verify the state of the LEDs on the board.
Refer to the CPU/Memory board chapter of the Sun Fire 6800/4810/4800/3800 Systems
Service Manual.
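The firmware decision in Step 6 of this procedure follows a fixed order, and so does Step 10 of “To Remove and Replace a System Board”. If the versions match, no update is needed. Otherwise, copy the firmware from an installed board of the same type with flashupdate -c, or, if there is none, load an image with flashupdate -f. The hypothetical Python helper below only builds the command line and does not run anything. It assumes that the version strings reported by showboards -p version can be compared as plain strings. The URL is a placeholder that you must supply.

```python
from typing import Optional


def choose_flashupdate(new_board: str, new_version: str, removed_version: str,
                       same_type_source: Optional[str] = None,
                       url: Optional[str] = None) -> Optional[str]:
    """Return the flashupdate command line to run, or None if none is needed.

    new_board: board just installed, e.g. "sb0" or "ib6"
    new_version: firmware version of the new board (showboards -p version)
    removed_version: firmware version of the board that was removed
    same_type_source: installed board of the same type to copy from, if any
    url: placeholder location of a firmware image (hypothetical)
    """
    if new_version == removed_version:
        return None  # versions already match; no update needed
    if same_type_source is not None:
        # Copy firmware from an installed board of the same type.
        return f"flashupdate -c {same_type_source} {new_board}"
    if url is None:
        raise ValueError("no same-type board installed; a firmware URL is required")
    # Otherwise load the firmware image from the given URL.
    return f"flashupdate -f {url} {new_board}"


if __name__ == "__main__":
    print(choose_flashupdate("sb0", "old", "new", same_type_source="sb2"))
```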
▼
To Hot-Swap an I/O Assembly
The following procedure describes how to hot-swap an I/O assembly and test it in a
spare domain not running the Solaris operating environment.
1. Use DR to unconfigure and disconnect the I/O assembly from the domain.
Refer to the Sun Fire 6800, 4810, 4800, and 3800 Systems Dynamic Reconfiguration User
Guide.
2. Verify the state of the LEDs on the assembly.
Refer to the I/O assembly chapter of the Sun Fire 6800/4810/4800/3800 Systems Service
Manual.
3. Remove and replace the assembly.
Refer to the I/O assembly chapter of the Sun Fire 6800/4810/4800/3800 Systems Service
Manual.
4. Power on the assembly:
schostname:SC> poweron board_name
5. Check the version of the firmware that is installed on the assembly by using the
showboards command:
schostname:SC> showboards -p version
The firmware version of the new replacement assembly should be the same as that of
the assembly you just removed.
6. If the firmware version of the replacement assembly is different from the
assembly you removed, update the firmware on the assembly.
For a description of command syntax, refer to the flashupdate command in the
Sun Fire 6800/4810/4800/3800 System Controller Command Reference Manual.
■ If you have an I/O assembly of the same type installed, use the
flashupdate -c command:
schostname:SC> flashupdate -c source_board destination_board
Go to Step 8.
■ If you do not have an I/O assembly of the same type installed, use the
flashupdate -f command:
schostname:SC> flashupdate -f URL board
7. Before you bring the board back to the Solaris operating environment, test the I/O
assembly in a spare domain that contains at least one CPU/Memory board with a
minimum of one CPU.
a. Enter a spare domain.
b. Test the I/O assembly.
For details, see “Testing an I/O Assembly” on page 86.
8. Use DR to connect and configure the assembly back into the domain running the
Solaris operating environment.
Refer to the Sun Fire 6800, 4810, 4800, and 3800 Systems Dynamic Reconfiguration User
Guide.
CompactPCI and PCI Cards
If you need to remove and replace a PCI or CompactPCI card, follow the instructions
below. The replacement procedure for CompactPCI cards requires that you simply
remove and replace the card. For further information on physically replacing
CompactPCI and PCI cards, refer to the Sun Fire 6800/4810/4800/3800 Systems Service
Manual.
▼
To Remove and Replace a PCI Card
The following procedure describes the steps for removing and replacing a PCI card
without using DR commands.
1. Halt the Solaris operating environment in the domain, power off the I/O
assembly, and remove it from the system.
Complete Step 1 through Step 6 in “To Remove and Replace a System Board” on
page 92.
2. Remove and replace the card.
Refer to the Sun Fire 6800/4810/4800/3800 Systems Service Manual.
3. Replace the I/O assembly and power it on.
Complete Step 7 and Step 8 in “To Remove and Replace a System Board” on page 92.
4. Perform a reconfiguration boot of the Solaris operating environment in the domain.
At the ok prompt, type boot -r.
ok boot -r
▼
To Remove and Replace a CompactPCI Card
● Remove and replace the CompactPCI card from the I/O assembly.
For details, refer to the Sun Fire 6800/4810/4800/3800 Systems Service Manual.
Repeater Board
This section discusses the software steps necessary to remove and replace a Repeater
board. Only the Sun Fire 6800/4810/4800 systems have Repeater boards. The Sun
Fire 3800 system has the equivalent of two Repeater boards on the active
centerplane.
▼
To Remove and Replace a Repeater Board
To remove and replace a Repeater board, you must halt and power off the domains
that the Repeater board is connected to.
Caution – Be sure you are properly grounded before you remove and replace the
Repeater board.
1. Determine which domains are active by typing the showplatform -p status
system controller command from the platform shell.
2. Determine which Repeater boards are connected to each domain (TABLE 9-1).
TABLE 9-1 Repeater Boards and Domains

System                 Partition Mode     Repeater Board Names    Domain IDs
Sun Fire 6800 system   Single partition   RP0, RP1, RP2, RP3      A, B
Sun Fire 6800 system   Dual partition     RP0, RP1                A, B
Sun Fire 6800 system   Dual partition     RP2, RP3                C, D
Sun Fire 4810 system   Single partition   RP0, RP2                A, B
Sun Fire 4810 system   Dual partition     RP0                     A
Sun Fire 4810 system   Dual partition     RP2                     C
Sun Fire 4800 system   Single partition   RP0, RP2                A, B
Sun Fire 4800 system   Dual partition     RP0                     A
Sun Fire 4800 system   Dual partition     RP2                     C
Sun Fire 3800 system   Equivalent of two Repeater boards integrated into the active centerplane.
3. Complete the steps to:
■ Halt the Solaris operating environment in each domain the Repeater board is
connected to.
■ Power off each domain.
Complete Step 1 through Step 3 in “To Power Off the System” on page 66.
4. Power off the Repeater board with the poweroff command.
schostname:SC> poweroff board_name
where board_name is the name of the Repeater board (rp0, rp1, rp2, or rp3).
5. Verify that the green power LED is off.
6. Remove and replace the Repeater board.
Refer to the Sun Fire 6800/4810/4800/3800 Systems Service Manual.
7. Boot each domain using the normal boot procedure.
Refer to “To Power On the System” on page 68.
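TABLE 9-1 can also be read in reverse. Given the Repeater board you are replacing, it tells you which domains to halt and power off in Step 3. The Python sketch below encodes the table as data. The function name and the "single"/"dual" mode keys are assumptions made for illustration. The Sun Fire 3800 is excluded because its Repeater function is integrated into the active centerplane.

```python
# TABLE 9-1 as data: (system, partition mode) -> [(Repeater boards, domain IDs)]
REPEATER_TABLE = {
    ("6800", "single"): [({"RP0", "RP1", "RP2", "RP3"}, ["A", "B"])],
    ("6800", "dual"): [({"RP0", "RP1"}, ["A", "B"]),
                       ({"RP2", "RP3"}, ["C", "D"])],
    ("4810", "single"): [({"RP0", "RP2"}, ["A", "B"])],
    ("4810", "dual"): [({"RP0"}, ["A"]), ({"RP2"}, ["C"])],
    ("4800", "single"): [({"RP0", "RP2"}, ["A", "B"])],
    ("4800", "dual"): [({"RP0"}, ["A"]), ({"RP2"}, ["C"])],
}


def domains_for_repeater(system: str, mode: str, board: str) -> list:
    """Domains to halt and power off before replacing the given Repeater board."""
    if system == "3800":
        raise ValueError("the Sun Fire 3800 has no separate Repeater boards")
    for boards, domains in REPEATER_TABLE[(system, mode)]:
        if board.upper() in boards:
            return domains
    raise ValueError(f"{board} is not listed for a {system} in {mode} partition mode")


if __name__ == "__main__":
    print(domains_for_repeater("6800", "dual", "rp2"))
```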
System Controller Board
This section discusses how to remove and replace a System Controller board.
▼
To Remove and Replace the System Controller
Board in a Single SC Configuration
To remove a defective System Controller board in a single SC configuration and
replace it with a working System Controller board, follow these steps:
1. For each active domain, use a telnet session to access the domain (see Chapter 2
for details), and halt the Solaris operating environment in the domain.
Caution – Because you do not have access to the console, you cannot determine
exactly when the operating environment has completely halted. Use your best
judgment, and wait until you are reasonably sure that the operating environment has halted.
2. Turn off the system completely. Be sure to power off the circuit breakers and the
power supply switches for the Sun Fire 3800 system. Make sure you power off all
the hardware components to the system.
Refer to the “Powering Off and On” chapter in the Sun Fire 6800/4810/4800/3800
Systems Service Manual.
3. Remove the defective System Controller board and install the new System
Controller board.
Refer to the “System Controller Board” chapter in the Sun Fire 6800/4810/4800/3800
Systems Service Manual.
4. Power on the RTUs, AC input boxes, and the power supply switches.
Refer to the “Powering Off and On” chapter in the Sun Fire 6800/4810/4800/3800
Systems Service Manual. When the specified hardware is powered on, the System
Controller board will automatically power on.
5. If you previously saved the platform and domain configurations using the
dumpconfig command, use the restoreconfig command to restore the platform
and domain configurations from a server.
You must have saved the latest platform and domain configurations of your system
with the dumpconfig command in order to restore the latest platform and domain
configurations with the restoreconfig command. For command syntax and
examples, see the restoreconfig command in the Sun Fire 6800/4810/4800/3800
System Controller Command Reference Manual.
■ If you did not type the dumpconfig command earlier, configure the system
again. See Chapter 3.
Note – When you insert a new System Controller board into the system, it is set to
the default values of the setupplatform command. The default is DHCP, which means
the system controller uses DHCP to obtain its network settings.
If DHCP is not available (the system controller waits for a 60-second timeout), the
System Controller board boots anyway, and you must configure the network
(setupplatform -p net) before you can type the restoreconfig command.
6. Check the date and time for the platform and each domain. Type the showdate
command in the platform shell and in each domain shell.
If you need to reset the date or time, go to Step 7. Otherwise, skip to Step 8.
7. Set the date and time for the platform and for each domain (if needed).
a. Set the date and time for the platform shell.
See the setdate command in the Sun Fire 6800/4810/4800/3800 System Controller
Command Reference Manual.
b. Set the date and time in each domain shell.
8. Check the configuration for the platform by typing showplatform at the platform
shell. If necessary, run the setupplatform command to configure the platform.
See “To Configure Platform Parameters” on page 47.
9. Check the configuration for each domain by typing showdomain in each domain
shell. If necessary, run the setupdomain command to configure each domain.
See “To Configure Domain-Specific Parameters” on page 49.
10. Boot the Solaris operating environment in each domain you want powered on.
11. Complete Step 4 and Step 5 in “To Power On the System” on page 68.
▼
To Remove and Replace a System Controller
Board in a Redundant SC Configuration
To remove a defective System Controller board in a redundant SC configuration and
replace it with a working System Controller board, follow these steps:
1. Run the showsc or showfailover -v command to determine which SC is the
main.
2. If the working system controller (the one that is not to be replaced) is not the
main, perform a manual failover so that the working system controller becomes
the main SC:
schostname:SC> setfailover force
3. Power off the system controller to be replaced:
schostname:SC> poweroff component_name
where component_name is the name of the System Controller board to be replaced,
either SSC0 or SSC1.
The System Controller board is powered off, and the hot-plug LED is illuminated. A
message indicates when you can safely remove the system controller.
4. Remove the defective System Controller board and replace it with the new System
Controller board.
The new System Controller board powers on automatically.
5. Verify that the firmware on the new system controller matches the firmware on the
working SC.
You can use the showsc command to check the firmware version (the ScApp
version) running on the system controller. If the firmware versions do not match, use
the flashupdate command to upgrade the firmware on the new system controller
so that it corresponds with the firmware version on the other SC.
6. Re-enable SC failover by running the following command on the main or spare
SC:
schostname:SC> setfailover on
ID Board and Centerplane
▼
To Remove and Replace the ID Board and
Centerplane
1. Before you begin, be sure to have a terminal connected to the serial port of the
system controller and have the following information available (it will be used
later in this procedure):
■ System serial number
■ Model number
■ MAC address (for domain A)
■ Host ID (for domain A)
■ Whether the system is a Capacity on Demand (COD) system
This information can be found on labels affixed to the system. Refer to the Sun Fire
6800/4810/4800/3800 Systems Service Manual for more information on label placement.
In most cases, when only the ID board and centerplane are replaced, the original
System Controller board will be used. The above information was already cached by
the system controller and will be used to program the replacement ID board. You
will be asked to confirm the above information.
2. Complete the steps to remove and replace the centerplane and ID board.
Refer to the ”Centerplane and ID Board” chapter of the Sun Fire 6800/4810/4800/3800
Systems Service Manual.
Note – The ID board can be written only once. Manage this replacement process
carefully. Any errors may require a new ID board.
3. After removing and replacing the ID board, make every attempt to use the
original System Controller board installed in slot ssc0 in this system.
Using the same System Controller board allows the system controller to
automatically prompt with the correct information.
4. Power on the hardware components.
Refer to the “Power Off and On” chapter of the Sun Fire 6800/4810/4800/3800 Systems
Service Manual.
The system controller boots automatically.
5. If you have a serial port connection, access the console for the system controller
because the system will prompt you to confirm the board ID information
(CODE EXAMPLE 9-1).
The prompting will not occur with a telnet connection.
CODE EXAMPLE 9-1
Confirming Board ID Information
It appears that the ID Board has been replaced.
Please confirm the ID information:
(Model, System Serial Number, Mac Address Domain A, HostID Domain A, COD Status)
Sun Fire 4800, 45H353F, 08:00:20:d8:a7:dd, 80d8a7dd, non-COD
Is the information above correct? (yes/no):
If you have a new System Controller board, skip Step 6 and go to Step 7.
6. Compare the information collected in Step 1 with the information you have been
prompted with in Step 5.
■ If the information matches, answer yes to the above question on the system
controller console. The system will boot normally.
■ If the information does not match, answer no to the above question on the system
controller console.
7. If you answer “no” to the question in Step 6 or if you are replacing both the ID
board and the System Controller board at the same time, you will be prompted to
enter the ID information manually.
Note – Enter this information carefully, as you have only one opportunity to do so.
Use the information collected in Step 1 to answer the questions prompted for in
CODE EXAMPLE 9-2. Be aware that you must specify the MAC address and Host ID of
domain A (not those of the system controller).
CODE EXAMPLE 9-2
ID Information To Enter Manually
Please enter System Serial Number: xxxxxxxx
Please enter the model number (3800/4800/4810/6800): xxxx
MAC address for Domain A: xx:xx:xx:xx:xx:xx
Host ID for Domain A: xxxxxxxx
Is COD (Capacity on Demand) system ? (yes/no): xx
Programming Replacement ID Board
Caching ID information
8. Complete Step 3 and Step 4 in “To Power On the System” on page 68.
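Because the ID board can be written only once, it can help to check the label information collected in Step 1 for format errors before you type it in. The sketch below parses the confirmation line shown in CODE EXAMPLE 9-1. It then validates each field against the placeholders in CODE EXAMPLE 9-2: an eight-digit hexadecimal host ID and a MAC address of six colon-separated bytes. It assumes that a COD system is shown as "COD", since only "non-COD" appears in the example. All function names are hypothetical, and the system controller itself does not use this code.

```python
import re

# Field formats inferred from CODE EXAMPLE 9-1 and 9-2 (assumptions).
MODELS = {"3800", "4800", "4810", "6800"}
MAC_RE = re.compile(r"[0-9a-fA-F]{2}(:[0-9a-fA-F]{2}){5}")
HOSTID_RE = re.compile(r"[0-9a-fA-F]{8}")


def parse_id_line(line: str) -> dict:
    """Split the confirmation line: Model, Serial, MAC, HostID, COD status."""
    model, serial, mac, hostid, cod = [f.strip() for f in line.split(",")]
    return {
        "model": model.replace("Sun Fire", "").strip(),
        "serial": serial,
        "mac": mac.lower(),
        "hostid": hostid.lower(),
        "cod": cod == "COD",  # assumed spelling for COD systems
    }


def validate_id(info: dict) -> list:
    """Return format problems in ID information collected from the labels."""
    errors = []
    if info["model"] not in MODELS:
        errors.append("model must be 3800, 4800, 4810, or 6800")
    if not info["serial"]:
        errors.append("serial number is empty")
    if not MAC_RE.fullmatch(info["mac"]):
        errors.append("MAC address must be six colon-separated hex bytes")
    if not HOSTID_RE.fullmatch(info["hostid"]):
        errors.append("host ID must be eight hex digits")
    return errors


def same_id(collected: dict, prompted: dict) -> bool:
    """Step 6: answer yes only if every field matches."""
    return collected == prompted


if __name__ == "__main__":
    shown = parse_id_line("Sun Fire 4800, 45H353F, 08:00:20:d8:a7:dd, 80d8a7dd, non-COD")
    print(shown, validate_id(shown))
```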
CHAPTER 10
Troubleshooting
This chapter provides troubleshooting information for a system administrator. The
chapter describes the following topics:
■ “System Faults” on page 107
■ “Displaying Diagnostic Information” on page 107
■ “Displaying System Configuration Information” on page 108
■ “Assisting Sun Service Personnel” on page 108
■ “Domain Not Responding” on page 109
■ “Board and Component Failures” on page 112
■ “Disabling Components” on page 122
System Faults
An internal fault is any condition that is considered to be unacceptable for normal
system operation. When the system has a fault, the Fault LED turns on.
You must take immediate action to eliminate an internal fault.
Displaying Diagnostic Information
For information on displaying diagnostic information, refer to the Sun Hardware
Platform Guide, which is available with your Solaris operating environment release.
Displaying System Configuration Information
To display system configuration parameters, refer to the Sun Hardware Platform
Guide, which is available with your Solaris operating environment release.
Assisting Sun Service Personnel
The following procedure lists the actions you must take to help Sun service
personnel determine the cause of your failure.
▼ To Determine the Cause of Your Failure
● Provide the following information to Sun service personnel so that they can help
you determine the cause of your failure:
■ The system controller log files, if the system controller has a loghost. These
log files are necessary because they contain more information than the output of
the showlogs system controller command. They also give Sun service personnel a
history of the system, which can help during troubleshooting.
■ A verbatim transcript of all output written to the domain console leading up to
the failure, including any output printed in response to user actions. If the
transcript does not show certain user actions, include comments in a separate file
describing which actions prompted particular messages.
■ A copy of the domain log file and other files from /var/adm/messages covering
the time leading up to the failure.
■ The following system controller command output from the platform shell:
  ■ showsc -v
  ■ showplatform -v
  ■ showplatform -v -d domainID
  ■ showboards -v
  ■ showlogs -v
  ■ showlogs -v -d domainID
Domain Not Responding
If a domain is not responding, the domain is most likely hung or paused. This
section covers how to determine if a domain is hung or paused and how to recover
from a hard hung or paused domain.
Hung Domain
If the console is not responding, the Solaris operating environment is not
responding, and typing the break command from the domain shell did not work,
the domain is hard hung.
Follow the procedure “To Recover a Hard Hung or Paused Domain” on page 110 if:
■ The domain is no longer working.
■ It is not possible to log into the domain to terminate processes or reboot directly.
Caution – Completing the steps in “To Recover a Hard Hung or Paused Domain”
on page 110 terminates the Solaris operating environment. Do not perform the steps
in this procedure unless the domain is not working.
When the Solaris operating environment is terminated, data in memory might not be
flushed to disk. This could cause a loss or corruption of the application file system
data.
Paused Domain
Alternatively, the domain may be paused because of a hardware error. If the system
controller detects a hardware error and the reboot-on-error parameter is set to
true, the domain is automatically rebooted. If the reboot-on-error parameter is set
to false, the domain is paused.
If the domain is paused, turn the domain off with setkeyswitch off and then
turn the domain on with setkeyswitch on. See the procedure “To Recover a Hard
Hung or Paused Domain” on page 110 for steps to perform.
▼ To Recover a Hard Hung or Paused Domain
1. Verify that the system controller is functioning.
Access the platform shell and the domain shell of the failing domain. See “System
Controller Navigation” on page 32.
2. If you cannot access both the platform shell and the domain shell, reset the
system controller by pressing the reset button on the System Controller board.
See “System Controller Board Failure” on page 113. Wait for the system controller to
reboot.
3. Determine the status for the domain as reported by the system controller. Type one
of the following system controller commands:
■ showplatform -p status (platform shell)
■ showdomain -p status (domain shell)
These commands provide the same type of information in the same format.
■ If the output in the Domain Status field displays Paused due to an error,
the domain has paused due to a hardware error. Go to Step 4.
■ If the output in the Domain Status field displays Not Responding, the system
controller has determined that the domain is hung. You must reset the domain.
Go to Step 5.
■ If the output in the Domain Status field displays any Active status, the system
controller has not detected that the domain is hung. You must reset the domain.
Go to Step 5.
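The status-to-step logic in Step 3 can be summarized as a small decision function. The following Python sketch is illustrative only: recovery_step is a hypothetical helper, not part of the system controller software. The status strings are the ones quoted above, and treating any status that begins with "Active" as an Active status is an assumption; check the exact wording your firmware reports.

```python
# Hypothetical helper: maps the Domain Status text reported by
# "showplatform -p status" or "showdomain -p status" to the recovery
# step in "To Recover a Hard Hung or Paused Domain".
# Assumption: every "Active" status begins with the word "Active".

def recovery_step(domain_status: str) -> str:
    status = domain_status.strip()
    if status == "Paused due to an error":
        # Step 4: a hardware error paused the domain; cycle the keyswitch.
        return "step 4: setkeyswitch off, then setkeyswitch on"
    if status == "Not Responding" or status.startswith("Active"):
        # Step 5: the SC thinks the domain is hung, or has not noticed
        # the hang. Note that reset is refused while the keyswitch is secure.
        return "step 5: reset from the domain shell"
    return "unrecognized status: review the showplatform/showdomain output"
```

For example, recovery_step("Not Responding") points you to Step 5.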
4. Reboot the domain manually. Complete the following substeps:
a. Access the domain shell.
See “System Controller Navigation” on page 32.
b. Turn off the domain. Type setkeyswitch off.
c. Turn on the domain. Type setkeyswitch on.
5. If the output displays Not Responding or any Active status, reset the domain.
Complete the following substeps.
Note – A domain cannot be reset while the domain keyswitch is in the secure
position.
a. Access the domain shell.
See “System Controller Navigation” on page 32.
b. Reset the domain by typing reset.
In order for the system controller to perform this operation, you must confirm it.
For a complete definition of this command, refer to the reset command in the
Sun Fire 6800/4810/4800/3800 System Controller Command Reference Manual.
c. Perform one of the following actions depending on the setting of the
OBP.error-reset-recovery variable in the setupdomain command
(TABLE 10-1).
■ If the setting is sync, the domain should automatically produce a Solaris core
file and then reboot. No further action is required.
■ If the setting is none, the domain returns to the ok prompt. Type sync at the
ok prompt to obtain a core file.
■ If the setting is boot, the domain should automatically reboot without
obtaining a core file.
Note – Changing the default setting of sync is not advised. If a core file is not
obtained, the chance of identifying and fixing the failure is considerably reduced.
TABLE 10-1 OpenBoot PROM error-reset-recovery Configuration Variable Settings

Setting for error-reset-recovery   Action
none                               The domain returns immediately to the OpenBoot PROM.
sync (default)                     The domain generates a Solaris operating environment core file and reboots the domain.
boot                               The domain is rebooted.
Note – If the configuration variable is set to none and the OpenBoot PROM takes
control, you can type any OpenBoot PROM command from the ok prompt,
including rebooting the Solaris operating environment with the boot command.
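Because the follow-up in Step c depends only on the error-reset-recovery setting, it can be restated as a lookup. The following Python sketch encodes TABLE 10-1 and Step c; NEXT_ACTION and after_reset are hypothetical names used for illustration, not part of the OpenBoot PROM or the system controller software.

```python
# Restatement of TABLE 10-1 and Step c as a lookup table.
# Hypothetical helper names; the action text paraphrases this manual.
NEXT_ACTION = {
    "sync": "none required: the domain writes a Solaris core file and reboots",
    "none": "type sync at the ok prompt to obtain a core file",
    "boot": "none required: the domain reboots without a core file",
}

def after_reset(setting: str = "sync") -> str:
    """Return the follow-up action after 'reset' for a given setting."""
    try:
        return NEXT_ACTION[setting]
    except KeyError:
        raise ValueError(f"unknown error-reset-recovery setting: {setting}") from None
```

The default argument mirrors the default setting, sync.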
d. If no core file can be obtained after Step c:
i. Access the domain console from the domain shell.
See “System Controller Navigation” on page 32.
ii. Type showresetstate -v or showresetstate -v -f URL from the
domain shell.
This command prints a summary report of the contents of registers from every
CPU in the domain that has a valid saved state. If you specify the -f URL
option with the showresetstate command, the report summary is written to
a URL, where Sun service personnel can review it (see the following step) to
analyze a failure or problem.
iii. Save the output and include the command output with the information you
provide to Sun service personnel as described in “To Determine the Cause of
Your Failure” on page 108.
iv. Reboot the domain by typing setkeyswitch off. Then type
setkeyswitch on.
Board and Component Failures
This section describes what to do when the following boards or components fail:
■ CPU/Memory board
■ I/O assembly
■ Repeater board
■ System Controller board
■ Power supply
■ Fan tray
■ FrameManager
CPU/Memory Board Failure
When a CPU/Memory board fails, the domain that contains the board either goes
down or hangs, depending on the type of failure. Perform the following actions:
• Delete the board from the domain.
• If the domain is hard hung, perform the steps in “To Recover a Hard Hung or
Paused Domain” on page 110.
I/O Assembly Failure
When an I/O assembly fails, the domain that contains the I/O assembly either goes
down or hangs, depending on the type of failure. Perform the following actions:
• Delete the I/O assembly from the domain.
• If the domain is hard hung, perform the steps in “To Recover a Hard Hung or
Paused Domain” on page 110.
System Controller Board Failure
If a System Controller board fails, perform the following actions for your
configuration:

In a single SC configuration:
  Perform the procedure “To Remove and Replace the System Controller Board in a
  Single SC Configuration” on page 101.
In a redundant SC configuration:
  Perform the procedure “To Remove and Replace a System Controller Board in a
  Redundant SC Configuration” on page 103.
If you have one system controller and the clock on the system controller fails:
  1. Replace the system controller. Refer to the “System Controller” chapter of the
  Sun Fire 6800/4810/4800/3800 Systems Service Manual.
  2. Reboot each domain in the system.
If you have only one system controller in the system, and the system controller
fails due to a software error, is hung, or does not respond:
  1. Reboot the system controller from the system controller platform shell prompt
  with the reboot command.
  2. If the system controller cannot be rebooted or the problem is more severe,
  reset the System Controller board by pressing the reset button on the board with
  the tip of a pen (FIGURE 10-1).
If two system controllers are installed:
  Wait for automatic SC failover to occur or force a manual failover to the other SC.
FIGURE 10-1 Resetting the System Controller (callout: Reset button)
Collecting Platform and Domain Status Information
This section describes how to gather platform and domain status information for
troubleshooting purposes.
Note – Messages diverted to external sysloghosts can be found in the
/var/adm/messages file of the sysloghost.
▼ To Collect Platform Status Information
1. Be sure that the platform shell loghost is set up.
For details, see the description of the loghost service in TABLE 3-1.
2. Collect platform status information using the following system controller
commands:
■ showsc
■ showboards
■ showenvironment
■ showplatform
■ showlogs
For details on how to use each of these commands, refer to the Sun Fire
6800/4810/4800/3800 System Controller Command Reference Manual.
3. Collect service-required LED status and data from the platform shell loghost.
Note – Before you access domain shells and collect domain information, check the
platform logs first for any hardware errors. A hardware platform error could lead to
subsequent domain software errors.
▼ To Collect Domain Status Information
1. If a domain is paused due to a system error, collect error messages from the
designated domain sysloghost.
a. Be sure that the loghost for each domain is set up, as described in TABLE 3-1.
b. Collect error messages from the designated domain syslog loghost.
c. Collect service-required LED status and data from the designated domain
syslog loghost.
2. If a domain is not paused or hung, collect status information from the following
sources:
TABLE 10-2 Solaris Operating Environment and System Controller Software Commands for Collecting Status Information

Command                                        Description
/var/adm/messages file                         Contains error messages relating to the current operating system initialization.
dmesg Solaris operating environment command    Looks in a system buffer for recently printed diagnostic messages and prints them on the standard output.
showboards, showenvironment, showdomain,       Refer to the Sun Fire 6800/4810/4800/3800 System Controller Command Reference Manual
and showlogs system controller commands        for a complete description and syntax on how to use these commands.
Fault LED                                      The amber Fault LED is lit if there is a fault.
Platform logs                                  Check the platform logs to determine if there are any hardware errors. A hardware
                                               platform error can lead to subsequent domain software errors.
For a thorough description of /var/adm/messages and dmesg, refer to the Solaris
operating environment online documentation, which is available with your version
of the Solaris operating environment.
Repeater Board Failure
TABLE 10-3 presents information on how to troubleshoot a failed Repeater board by
system type, partition mode, and the number of domains.
TABLE 10-3 Repeater Board Failure

Each entry gives the system, the failure mode (partitions and domains), and the
failed Repeater board, followed by the resulting Repeater board and domain changes.

Sun Fire 4810/4800 systems; 1 partition, 1 domain (A); failed Repeater board RP0
  System is down.
  • If a replacement Repeater board is available:
    1. Replace RP0. Refer to the Sun Fire 6800/4810/4800/3800 Systems Service Manual.
    2. Reboot domain A. The domain reboots normally.
  • If a replacement Repeater board is not available:
    1. Replace RP0 with RP2. Refer to the Sun Fire 6800/4810/4800/3800 Systems
    Service Manual.
    2. Configure the system for dual partition mode with the setupplatform command.

Sun Fire 3800 system; 1 partition, 1 domain (A); failed Repeater board RP0
  System is down.
  1. Configure the system for dual partition mode with the setupplatform command.
  Resources from domain A can be configured into domain C. If you manually
  reconfigure the resources, the domain will have the hostID and MAC address of
  domain C.
  2. Reboot domain C.
  3. Plan to replace the centerplane.

Sun Fire 4810/4800 systems; 1 partition, 1 domain (A); failed Repeater board RP2
  System is down.
  • If a replacement Repeater board is available:
    1. Replace RP2. Refer to the Sun Fire 6800/4810/4800/3800 Systems Service Manual.
    2. Reboot domain A. The domain reboots normally.
  • If a replacement Repeater board is not available:
    1. Configure the system for dual partition mode with the setupplatform command.
    2. Reboot domain A. The domain reboots normally.

Sun Fire 3800 system; 1 partition, 1 domain (A); failed Repeater board RP2
  System is down.
  1. Configure the system for dual partition mode with the setupplatform command.
  2. Reboot domain A.
  3. Plan to replace the centerplane.

Sun Fire 6800 system; 1 partition, 1 domain (A); failed Repeater board RP0 or RP1
  System is down.
  • If a replacement Repeater board is available:
    1. Replace the defective Repeater board in the Repeater board pair.
  • If a replacement Repeater board is not available:
    1. Replace RP0 or RP1 with RP2 or RP3. Refer to the Sun Fire
    6800/4810/4800/3800 Systems Service Manual.
    2. Configure the system for dual partition mode with the setupplatform command.

Sun Fire 6800 system; 1 partition, 1 domain (A); failed Repeater board RP2 or RP3
  System is down.
  • If a replacement Repeater board is available:
    1. Replace the defective Repeater board in the Repeater board pair.
  • If a replacement Repeater board is not available:
    1. Configure the system for dual partition mode with the setupplatform command.
    RP0 and RP1 come up as partition 0 containing domain A, which reboots
    automatically.

Sun Fire 4810/4800/3800 system; 2 partitions, 2 domains (A, C); failed Repeater board RP0
  • RP0 cannot be used.
  • RP2 continues without rebooting.
  • Domain C continues unaffected.
  • Domain A cannot be rebooted, even in another domain, until you replace RP0.

Sun Fire 4810/4800/3800 system; 2 partitions, 2 domains (A, C); failed Repeater board RP2
  • RP0 continues without rebooting.
  • RP2 cannot be used.
  • Domain A continues unaffected.
  • Domain C cannot be rebooted until you replace RP2.

Sun Fire 4810/4800 system; 1 partition, 2 domains (A, B); failed Repeater board RP0
  System is down.
  • If a replacement Repeater board is available:
    1. Replace RP0.
    2. Reboot both domains.
  • If a replacement Repeater board is not available:
    1. Configure the system for dual partition mode using setupplatform.
    2. Reboot domain C.
    Note: Domain A is down. Domain B becomes domain C. The MAC address and hostID
    will not change.

Sun Fire 3800 system; 1 partition, 2 domains (A, B); failed Repeater board RP0
  System is down.
  1. Configure the system for dual partition mode using setupplatform.
  2. Reboot domain C.
  Note: Domain A is down. Domain B becomes domain C. The MAC address and hostID
  will not change.
  3. Plan to replace the centerplane.

Sun Fire 4810/4800 system; 1 partition, 2 domains (A, B); failed Repeater board RP2
  System is down.
  • If a replacement Repeater board is available:
    1. Replace RP2.
    2. Reboot both domains.
  • If a replacement Repeater board is not available:
    1. Configure the system for dual partition mode using setupplatform.
    2. Reboot domain A. In dual partition mode, you can reboot only domain A.

Sun Fire 3800 system; 1 partition, 2 domains (A, B); failed Repeater board RP2
  System is down.
  1. Configure the system for dual partition mode using setupplatform.
  2. Plan to replace the centerplane.
  Only domain A can be rebooted. Domain B is down.

Sun Fire 6800 system; 2 partitions, 2 domains (A, C); failed Repeater board RP0 or RP1
  • RP0 and RP1 cannot be used.
  • RP2 and RP3 continue without rebooting.
  • Domain C continues.
  1. Configure the CPU/Memory boards and I/O assemblies from domain A to domain D.
  2. Reboot domain D.
  Note: The domain will have the hostID and MAC address of domain D.

Sun Fire 6800 system; 2 partitions, 2 domains (A, C); failed Repeater board RP2 or RP3
  • RP0 and RP1 continue without rebooting.
  • RP2 and RP3 are not usable.
  • Domain A continues.
  1. Configure the CPU/Memory boards and I/O assemblies from domain C to domain B.
  2. Reboot domain B.
  Note: The domain will have the hostID and MAC address of domain B.

Sun Fire 6800 system; 2 partitions, 3 domains (A, B, C); failed Repeater board RP0 or RP1
  • RP0 and RP1 are not usable.
  • RP2 and RP3 continue unaffected.
  • Domains A and B cannot reboot.
  • Domain C continues unaffected.

Sun Fire 6800 system; 2 partitions, 3 domains (A, B, C); failed Repeater board RP2 or RP3
  • RP0 and RP1 are not affected.
  • RP2 and RP3 are not usable.
  • Domains A and B are not affected.
  • Domain C cannot reboot.

Sun Fire 6800 system; 2 partitions, 3 domains (A, C, D); failed Repeater board RP0 or RP1
  • RP0 and RP1 are not usable.
  • RP2 and RP3 continue unaffected.
  • Domain A cannot reboot.
  • Domains C and D continue unaffected.

Sun Fire 6800 system; 2 partitions, 3 domains (A, C, D); failed Repeater board RP2 or RP3
  • RP0 and RP1 are not affected.
  • RP2 and RP3 are not usable.
  • Domain A is not affected.
  • Domains C and D cannot reboot.

Sun Fire 6800 system; 2 partitions, 4 domains (A, B, C, D); failed Repeater board RP0 or RP1
  • RP0 and RP1 cannot restart.
  • RP2 and RP3 continue without rebooting.
  • Domains A and B cannot reboot.
  • Domains C and D continue unaffected.

Sun Fire 6800 system; 2 partitions, 4 domains (A, B, C, D); failed Repeater board RP2 or RP3
  • RP0 and RP1 continue without rebooting.
  • RP2 and RP3 cannot restart.
  • Domains C and D cannot reboot.
  • Domains A and B continue unaffected.
Power Supply Failure
When a power supply fails and you do not have any redundant power supplies, the
system may abruptly shut down due to insufficient power. Perform the following
actions:
1. Replace the defective power supply. Refer to the Sun Fire 6800/4810/4800/3800
Systems Service Manual.
2. Power on the system. See “To Power On the System” on page 68.

When a power supply fails and you have one or more redundant power supplies
installed, the redundant power supply takes over. Replace the power supply that
failed. Refer to the Sun Fire 6800/4810/4800/3800 Systems Service Manual.
Fan Tray Failure
When a fan tray fails and you do not have a redundant fan tray, the system may
overheat and shut down if there is insufficient cooling. Perform the following
actions:
1. Replace the defective fan tray. Refer to the Sun Fire 6800/4810/4800/3800
Systems Service Manual.
2. Power on the system. See “To Power On the System” on page 68.

When a fan tray fails and you have one or more redundant fan trays, the redundant
fan tray takes over. Replace the fan tray that failed. Refer to the Sun Fire
6800/4810/4800/3800 Systems Service Manual.
FrameManager Failure
When the FrameManager fails, there is no effect on the system. Replace the
FrameManager board.
Disabling Components
The system controller supports the blacklisting feature, which allows you to disable
components on a board (TABLE 10-4).
TABLE 10-4 Blacklisting Component Names

System Component     Component Subsystem                        Component Name
CPU system                                                      board_name/port/physical_bank/logical_bank
                     CPU/Memory boards (board_name)             SB0, SB1, SB2, SB3, SB4, SB5
                     Ports on the CPU/Memory board              P0, P1, P2, P3
                     Physical memory banks on CPU/Memory        B0, B1
                     boards
                     Logical banks on CPU/Memory boards         L0, L1, L2, L3
I/O assembly system                                             board_name/port/bus or board_name/card
                     I/O assemblies (board_name)                IB6, IB7, IB8, IB9
                     Ports on the I/O assembly                  P0 and P1
                                                                Note: Leave at least one I/O controller 0 enabled
                                                                in a domain so that the domain can communicate
                                                                with the system controller.
                     Buses on the I/O assembly                  B0, B1
                     I/O cards in the I/O assemblies            C0, C1, C2, C3, C4, C5, C6, C7 (the number of I/O
                                                                cards in the I/O assembly varies with the I/O
                                                                assembly type).
Blacklisting provides lists of system board components that will not be tested and
will not be configured into the Solaris operating environment. The blacklists are
stored in nonvolatile memory.
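The naming rules in TABLE 10-4 can be checked before a name is passed to disablecomponent. The following Python sketch builds regular expressions from the table; is_valid_component is a hypothetical helper, and the assumption that a name may stop at any level (for example, SB0/P2 or IB6) is not stated in the table. The sketch also does not check whether a board is actually installed in a given system.

```python
import re

# Patterns derived from TABLE 10-4.
# CPU system: board_name/port/physical_bank/logical_bank
CPU_NAME = re.compile(r"SB[0-5](/P[0-3](/B[01](/L[0-3])?)?)?")
# I/O assembly system: board_name/port/bus or board_name/card
IO_NAME = re.compile(r"IB[6-9](/P[01](/B[01])?|/C[0-7])?")

def is_valid_component(name: str) -> bool:
    """Return True if name matches a TABLE 10-4 component name.

    Assumption: shorter names such as "SB0/P2" are accepted.
    C0-C7 are all accepted even though the card count varies
    with the I/O assembly type.
    """
    return bool(CPU_NAME.fullmatch(name) or IO_NAME.fullmatch(name))
```

For example, is_valid_component("IB6/P2") returns False because I/O assembly ports are only P0 and P1.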
Blacklist a component or device if you believe it might be failing intermittently or is
failing. Troubleshoot a component you believe is having problems and replace it, if
necessary.
There are three system controller commands for blacklisting:
■ disablecomponent
■ enablecomponent
■ showcomponent
The disablecomponent and enablecomponent commands only update the
blacklists. They do not directly affect the state of the currently configured system
boards.
The updated lists take effect when you do one of the following:
■ Reboot the domain.
■ Transition a domain from an inactive state (off or standby) to an active state
(on, diag, or secure).
■ Reset the domain. This should only be done when the domain is hung. For
information on how to reset a domain, see “Domain Not Responding” on page 109.
Note – Components blacklisted from the platform shell and from a domain shell are
treated differently.
If you blacklist a component from the platform shell and then move the component
to another domain, the component is still blacklisted. However, if you blacklist a
component in a domain shell and then move the component to a different domain,
the component is no longer blacklisted.
APPENDIX A
Mapping Device Path Names
This appendix describes how to map device path names to physical system devices.
This appendix describes the following topics:
■ “CPU/Memory Mapping” on page 125
■ “I/O Assembly Mapping” on page 127
Device Mapping
The physical address represents a physical characteristic that is unique to the device.
Examples of physical addresses include the bus address and the slot number. The
slot number indicates where the device is installed.
You reference a physical device by the node identifier—Agent ID (AID). The AID
ranges from 0 to 31 in decimal notation (0 to 1f in hexadecimal). In the device path
beginning with ssm@0,0 the first number, 0, is the node ID.
CPU/Memory Mapping
CPU/Memory board and memory agent IDs (AIDs) range from 0 to 23 in decimal
notation (0 to 17 in hexadecimal). Depending on the platform type, a system can
have up to six CPU/Memory boards.
Each CPU/Memory board can have either two or four CPUs, depending on your
configuration. Each CPU/Memory board has up to four banks of memory. Each
bank of memory is controlled by one memory management unit (MMU), which is
the CPU. The following code example shows a device tree entry for a CPU and its
associated memory:
/ssm@0,0/SUNW,UltraSPARC-III@b,0
/ssm@0,0/SUNW,memory-controller@b,400000
where:
in b,0
■ b is the CPU agent identifier (AID)
■ 0 is the CPU register
in b,400000
■ b is the memory agent identifier (AID)
■ 400000 is the memory controller register
There are up to four CPUs on each CPU/Memory board (TABLE A-1):
■ CPUs with agent IDs 0–3 reside on board name SB0
■ CPUs with agent IDs 4–7 on board name SB1
■ CPUs with agent IDs 8–11 on board name SB2, and so on.
TABLE A-1 CPU and Memory Agent ID Assignment

CPU/Memory     Agent IDs On Each CPU/Memory Board
Board Name     CPU 0      CPU 1      CPU 2      CPU 3
SB0            0 (0)      1 (1)      2 (2)      3 (3)
SB1            4 (4)      5 (5)      6 (6)      7 (7)
SB2            8 (8)      9 (9)      10 (a)     11 (b)
SB3            12 (c)     13 (d)     14 (e)     15 (f)
SB4            16 (10)    17 (11)    18 (12)    19 (13)
SB5            20 (14)    21 (15)    22 (16)    23 (17)

The first number in the columns of agent IDs is a decimal number. The number or
letter in parentheses is in hexadecimal notation.
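Because agent IDs are assigned four per board, TABLE A-1 reduces to AID = 4 * board number + CPU number. The following Python sketch converts in both directions; cpu_aid and cpu_location are hypothetical helper names. For example, the b in UltraSPARC-III@b,0 is AID 11, which is CPU 3 on SB2.

```python
# Sketch of TABLE A-1: four CPU/memory agent IDs per CPU/Memory board.

def cpu_aid(board: str, cpu: int) -> int:
    """Return the agent ID for CPU `cpu` (0-3) on board SB0..SB5."""
    if not (board.startswith("SB") and board[2:].isdigit()):
        raise ValueError(f"not a CPU/Memory board name: {board}")
    n = int(board[2:])
    if not (0 <= n <= 5 and 0 <= cpu <= 3):
        raise ValueError("board must be SB0-SB5 and cpu must be 0-3")
    return 4 * n + cpu

def cpu_location(aid: int) -> tuple[str, int]:
    """Inverse mapping: AID 0-23 (0x0-0x17) -> (board name, CPU number)."""
    if not 0 <= aid <= 23:
        raise ValueError("CPU/memory agent IDs range from 0 to 23")
    return f"SB{aid // 4}", aid % 4
```

Device paths carry the AID in hexadecimal, so parse it with int(text, 16) first, for example cpu_location(int("b", 16)).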
I/O Assembly Mapping
TABLE A-2 lists the types of I/O assemblies, the number of slots each I/O assembly
has, and the systems the I/O assembly types are supported on.
TABLE A-2 I/O Assembly Type and Number of Slots per I/O Assembly by System Type

I/O Assembly Type   Number of Slots Per I/O Assembly   System Name(s)
PCI                 8                                  Sun Fire 6800/4810/4800 systems
CompactPCI          6                                  Sun Fire 3800 system
CompactPCI          4                                  Sun Fire 6800/4810/4800 systems
TABLE A-3 lists the number of I/O assemblies per system and the I/O assembly
name.
TABLE A-3 Number and Name of I/O Assemblies per System

System Name(s)          Number of I/O Assemblies   I/O Assembly Name
Sun Fire 6800 system    4                          IB6–IB9
Sun Fire 4810 system    2                          IB6 and IB8
Sun Fire 4800 system    2                          IB6 and IB8
Sun Fire 3800 system    2                          IB6 and IB8
Each I/O assembly hosts two I/O controllers:
■ I/O controller 0
■ I/O controller 1
When mapping the I/O device tree entry to a physical component in the system, you
must consider up to five nodes in the device tree:
■ Node identifier (ID)
■ I/O controller agent ID (AID)
■ Bus offset
■ PCI or CompactPCI slot
■ Device instance
TABLE A-4 lists the AIDs for the two I/O controllers in each I/O assembly.
TABLE A-4 I/O Controller Agent ID Assignments

Slot Number   I/O Assembly Name   Even I/O Controller AID   Odd I/O Controller AID
6             IB6                 24 (18)                   25 (19)
7             IB7                 26 (1a)                   27 (1b)
8             IB8                 28 (1c)                   29 (1d)
9             IB9                 30 (1e)                   31 (1f)

The first number in the column is a decimal number. The number (or a number and
letter combination) in parentheses is in hexadecimal notation.
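TABLE A-4 follows a fixed pattern: assembly IBn (slot n) has an even AID of 24 + 2 * (n - 6) and an odd AID one higher. In the device maps later in this appendix, the even AID corresponds to I/O controller 0 and the odd AID to I/O controller 1. The following Python sketch encodes that pattern; io_controller_aids and io_location are hypothetical helper names.

```python
# Sketch of TABLE A-4: two I/O controller agent IDs per I/O assembly.

def io_controller_aids(assembly: str) -> tuple[int, int]:
    """Return (even, odd) I/O controller AIDs for IB6..IB9."""
    if not (assembly.startswith("IB") and assembly[2:].isdigit()):
        raise ValueError(f"not an I/O assembly name: {assembly}")
    n = int(assembly[2:])
    if not 6 <= n <= 9:
        raise ValueError("I/O assemblies are IB6-IB9")
    even = 24 + 2 * (n - 6)
    return even, even + 1

def io_location(aid: int) -> tuple[str, int]:
    """Map an I/O controller AID (24-31, 0x18-0x1f) to (assembly, controller)."""
    if not 24 <= aid <= 31:
        raise ValueError("I/O controller agent IDs range from 24 to 31")
    return f"IB{6 + (aid - 24) // 2}", (aid - 24) % 2
```

For example, the 19 in pci@19,700000 is io_location(0x19), I/O controller 1 on IB6.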
Each I/O controller has two bus sides: A and B.
■ Bus A, which is 66 MHz, is referenced by offset 600000.
■ Bus B, which is 33 MHz, is referenced by offset 700000.
The board slots located in the I/O assembly are referenced by the device number.
PCI I/O Assembly
This section describes the PCI I/O assembly slot assignments and provides an
example of the device path.
The following code example gives a breakdown of a device tree entry for a SCSI
disk:
/ssm@0,0/pci@19,700000/pci@3/SUNW,isptwo@4/sd@5,0
Note – The numbers in the device path are hexadecimal.
where:
in 19,700000
■ 19 is the I/O controller agent identifier (AID)
■ 700000 is the bus offset
in pci@3
■ 3 is the device number
in SUNW,isptwo@4
■ isptwo is the SCSI host adapter
in sd@5,0
■ 5 is the SCSI target number for the disk
■ 0 is the logical unit number (LUN) of the target disk
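The breakdown above can be automated. The following Python sketch splits a path of the form /ssm@node,0/pci@AID,offset/pci@device/... into the fields described in this section and, if present, the sd@target,lun disk node. parse_io_path is a hypothetical helper; it assumes the layout shown in this appendix and rejects anything else. All numeric fields are parsed as hexadecimal, per the note above.

```python
import re

# Bus offsets as described in this appendix.
BUS_BY_OFFSET = {"600000": "A (66 MHz)", "700000": "B (33 MHz)"}

IO_PATH = re.compile(
    r"/ssm@([0-9a-f]+),0/pci@([0-9a-f]+),([0-9a-f]+)/pci@([0-9a-f]+)(/.*)?"
)
DISK_NODE = re.compile(r"/sd@([0-9a-f]+),([0-9a-f]+)$")

def parse_io_path(path: str) -> dict:
    """Split an I/O device path into node ID, AID, bus, device, target, LUN."""
    m = IO_PATH.fullmatch(path)
    if m is None:
        raise ValueError(f"unrecognized device path: {path}")
    node, aid, offset, device, rest = m.groups()
    info = {
        "node_id": int(node, 16),
        "aid": int(aid, 16),
        "bus": BUS_BY_OFFSET.get(offset, "unknown"),
        "device": int(device, 16),
    }
    disk = DISK_NODE.search(rest or "")
    if disk:
        info["scsi_target"] = int(disk.group(1), 16)
        info["lun"] = int(disk.group(2), 16)
    return info
```

Applied to the SCSI disk path above, this yields AID 25 (0x19), bus B, device 3, SCSI target 5, and LUN 0.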
TABLE A-5 lists, in hexadecimal notation, the slot number, I/O assembly name, device
path of each I/O assembly, the I/O controller number, and the bus.
TABLE A-5 8-Slot PCI I/O Assembly Device Map for the Sun Fire 6800/4810/4800 Systems

I/O Assembly   Device Path                     Physical      I/O Controller   Bus
Name                                           Slot Number   Number
IB6            /ssm@0,0/pci@18,700000/pci@1    0             0                B
               /ssm@0,0/pci@18,700000/pci@2    1             0                B
               /ssm@0,0/pci@18,700000/pci@3    2             0                B
               /ssm@0,0/pci@18,600000/pci@1    3             0                A
               /ssm@0,0/pci@19,700000/pci@1    4             1                B
               /ssm@0,0/pci@19,700000/pci@2    5             1                B
               /ssm@0,0/pci@19,700000/pci@3    6             1                B
               /ssm@0,0/pci@19,600000/pci@1    7             1                A
IB7            /ssm@0,0/pci@1a,700000/pci@1    0             0                B
               /ssm@0,0/pci@1a,700000/pci@2    1             0                B
               /ssm@0,0/pci@1a,700000/pci@3    2             0                B
               /ssm@0,0/pci@1a,600000/pci@1    3             0                A
               /ssm@0,0/pci@1b,700000/pci@1    4             1                B
               /ssm@0,0/pci@1b,700000/pci@2    5             1                B
               /ssm@0,0/pci@1b,700000/pci@3    6             1                B
               /ssm@0,0/pci@1b,600000/pci@1    7             1                A
IB8            /ssm@0,0/pci@1c,700000/pci@1    0             0                B
               /ssm@0,0/pci@1c,700000/pci@2    1             0                B
               /ssm@0,0/pci@1c,700000/pci@3    2             0                B
               /ssm@0,0/pci@1c,600000/pci@1    3             0                A
               /ssm@0,0/pci@1d,700000/pci@1    4             1                B
               /ssm@0,0/pci@1d,700000/pci@2    5             1                B
               /ssm@0,0/pci@1d,700000/pci@3    6             1                B
               /ssm@0,0/pci@1d,600000/pci@1    7             1                A
IB9            /ssm@0,0/pci@1e,700000/pci@1    0             0                B
               /ssm@0,0/pci@1e,700000/pci@2    1             0                B
               /ssm@0,0/pci@1e,700000/pci@3    2             0                B
               /ssm@0,0/pci@1e,600000/pci@1    3             0                A
               /ssm@0,0/pci@1f,700000/pci@1    4             1                B
               /ssm@0,0/pci@1f,700000/pci@2    5             1                B
               /ssm@0,0/pci@1f,700000/pci@3    6             1                B
               /ssm@0,0/pci@1f,600000/pci@1    7             1                A
In TABLE A-5, note the following:
■ 600000 is the bus offset and indicates bus A, which operates at 66 MHz.
■ 700000 is the bus offset and indicates bus B, which operates at 33 MHz.
■ pci@3 is the device number. In this example @3 means it is the third device on
the bus.
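TABLE A-5 follows a regular pattern. Slots 0-3 belong to I/O controller 0 (the even AID) and slots 4-7 to I/O controller 1 (the odd AID). Within each group of four, the first three slots are pci@1 through pci@3 on bus B and the fourth is pci@1 on bus A. The following Python sketch derives the table from that pattern and builds a reverse lookup. pci8_path and pci8_slot are hypothetical helpers, and neither checks whether an assembly is installed (the Sun Fire 4810/4800 systems have only IB6 and IB8).

```python
# Hypothetical helpers encoding the pattern in TABLE A-5
# (8-slot PCI I/O assembly).

def pci8_path(assembly: str, slot: int) -> str:
    """Return the device path for physical slot 0-7 of IB6..IB9."""
    if not (assembly.startswith("IB") and assembly[2:].isdigit()):
        raise ValueError(f"not an I/O assembly name: {assembly}")
    n = int(assembly[2:])
    if not (6 <= n <= 9 and 0 <= slot <= 7):
        raise ValueError("assembly must be IB6-IB9 and slot must be 0-7")
    controller, pos = divmod(slot, 4)      # slots 0-3 -> 0, slots 4-7 -> 1
    aid = 24 + 2 * (n - 6) + controller    # agent IDs from TABLE A-4
    if pos == 3:
        offset, device = "600000", 1       # bus A, 66 MHz
    else:
        offset, device = "700000", pos + 1  # bus B, 33 MHz
    return f"/ssm@0,0/pci@{aid:x},{offset}/pci@{device}"

# Reverse lookup built from the forward mapping (32 entries).
SLOT_BY_PATH = {
    pci8_path(f"IB{n}", s): (f"IB{n}", s) for n in range(6, 10) for s in range(8)
}

def pci8_slot(path: str) -> tuple[str, int]:
    """Map a device path, possibly with child nodes, to (assembly, slot)."""
    prefix = "/".join(path.split("/")[:4])  # keep /ssm@../pci@../pci@..
    return SLOT_BY_PATH[prefix]             # KeyError if not an 8-slot PCI path
```

For example, the SCSI disk path shown earlier in this section resolves to physical slot 6 of IB6.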
FIGURE A-1 illustrates the Sun Fire 6800 PCI I/O assembly physical slot designations
for I/O assemblies IB6 through IB9.
FIGURE A-1 Sun Fire 6800 System PCI Physical Slot Designations for IB6 Through IB9
Note: Slots 0 and 1 of IB6 through IB9 are short slots.
FIGURE A-2 illustrates the comparable information for the Sun Fire 4810/4800
systems.
FIGURE A-2 Sun Fire 4810/4800 Systems PCI Physical Slot Designations for IB6 and IB8
Note: Slots 0 and 1 for IB6 and IB8 are short slots.
CompactPCI I/O Assembly
This section describes the CompactPCI I/O assembly slot assignments and provides
an example of the 6-slot I/O assembly device paths.
▼ To Determine an I/O Physical Slot Number Using an I/O Device Path
1. Use TABLE A-6 for Sun Fire 3800 systems or TABLE A-7 for Sun Fire 6800/4810/4800 systems to determine the:
■ I/O assembly, based on the I/O controller agent identifier (AID) address.
■ Physical slot number, based on the I/O assembly and the device path.
2. Use FIGURE A-3 (Sun Fire 3800), FIGURE A-4 (Sun Fire 4810/4800), or FIGURE A-5 (Sun Fire 6800) to locate the slot based on the I/O assembly and the physical slot number.
CompactPCI I/O Assembly Slot Assignments
The following code example shows the breakdown of a device path for the CompactPCI
I/O assembly IB8.
/ssm@0,0/pci@1c,700000/pci@1/SUNW,isptwo@4
where:
in pci@1c,700000
■ 1c is the I/O controller agent identifier (AID)
■ 700000 is the bus offset
in pci@1
■ 1 is the device number
in SUNW,isptwo@4
■ isptwo is the SCSI host adapter
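The breakdown above can be expressed as a small parser. The following Python sketch is illustrative only. The names parse_device_path and DevicePath are hypothetical, and the sketch assumes that each path starts with /ssm@0,0 followed by two pci@ nodes, as in the examples in this appendix.

```python
import re
from typing import NamedTuple

# Expected shape (assumption based on this appendix):
# /ssm@0,0/pci@<AID>,<bus offset>/pci@<device>[/<child node>...]
_PATH_RE = re.compile(
    r"^/ssm@0,0/pci@(?P<aid>[0-9a-fA-F]+),(?P<offset>[0-9a-fA-F]+)"
    r"/pci@(?P<device>[0-9a-fA-F]+)(?P<rest>/.*)?$"
)


class DevicePath(NamedTuple):
    aid: int         # I/O controller agent identifier (hexadecimal in the path)
    bus_offset: str  # "600000" (bus A) or "700000" (bus B)
    device: int      # device number on the bus, for example 1 for pci@1
    rest: str        # anything after the device node, e.g. "/SUNW,isptwo@4"


def parse_device_path(path: str) -> DevicePath:
    """Split an I/O device path into the fields described above."""
    m = _PATH_RE.match(path)
    if m is None:
        raise ValueError(f"not a recognized I/O device path: {path}")
    return DevicePath(
        aid=int(m["aid"], 16),
        bus_offset=m["offset"],
        device=int(m["device"], 16),
        rest=m["rest"] or "",
    )


if __name__ == "__main__":
    p = parse_device_path("/ssm@0,0/pci@1c,700000/pci@1/SUNW,isptwo@4")
    print(hex(p.aid), p.bus_offset, p.device, p.rest)
    # prints: 0x1c 700000 1 /SUNW,isptwo@4
```

The AID is kept as an integer so that it can be compared with the hexadecimal values in TABLE A-6 and TABLE A-7. The bus offset is kept as a string because the tables use it only as a label.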
6-Slot CompactPCI I/O Assembly Device Map
TABLE A-6 lists, in hexadecimal notation, the slot number, I/O assembly name, device
path of each I/O assembly, the I/O controller number, and the bus.
TABLE A-6 Mapping Device Path to I/O Assembly Slot Numbers for Sun Fire 3800 Systems
I/O Assembly Name   Device Path                     Physical Slot Number   I/O Controller Number   Bus
IB6                 /ssm@0,0/pci@19,700000/pci@2    5                      1                       B
IB6                 /ssm@0,0/pci@19,700000/pci@1    4                      1                       B
IB6                 /ssm@0,0/pci@18,700000/pci@2    3                      0                       B
IB6                 /ssm@0,0/pci@18,700000/pci@1    2                      0                       B
IB6                 /ssm@0,0/pci@19,600000/pci@1    1                      1                       A
IB6                 /ssm@0,0/pci@18,600000/pci@1    0                      0                       A
IB8                 /ssm@0,0/pci@1d,700000/pci@2    5                      1                       B
IB8                 /ssm@0,0/pci@1d,700000/pci@1    4                      1                       B
IB8                 /ssm@0,0/pci@1c,700000/pci@2    3                      0                       B
IB8                 /ssm@0,0/pci@1c,700000/pci@1    2                      0                       B
IB8                 /ssm@0,0/pci@1d,600000/pci@1    1                      1                       A
IB8                 /ssm@0,0/pci@1c,600000/pci@1    0                      0                       A
In TABLE A-6, note the following:
■ 600000 is the bus offset and indicates bus A, which operates at 66 MHz.
■ 700000 is the bus offset and indicates bus B, which operates at 33 MHz.
■ In pci@1, 1 is the device number. The @1 indicates that it is the first device on the bus.
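To look up a Sun Fire 3800 slot in a script, you can encode TABLE A-6 directly as a dictionary. The following Python code is a minimal sketch. The names TABLE_A6, BUS_SPEED_MHZ, and lookup_3800_slot are hypothetical. The data is copied from TABLE A-6, and the bus speeds come from the notes above.

```python
# TABLE A-6 (Sun Fire 3800, 6-slot CompactPCI I/O assemblies), copied from
# the table above: device path -> (I/O assembly, physical slot,
# I/O controller number, bus).
TABLE_A6 = {
    "/ssm@0,0/pci@19,700000/pci@2": ("IB6", 5, 1, "B"),
    "/ssm@0,0/pci@19,700000/pci@1": ("IB6", 4, 1, "B"),
    "/ssm@0,0/pci@18,700000/pci@2": ("IB6", 3, 0, "B"),
    "/ssm@0,0/pci@18,700000/pci@1": ("IB6", 2, 0, "B"),
    "/ssm@0,0/pci@19,600000/pci@1": ("IB6", 1, 1, "A"),
    "/ssm@0,0/pci@18,600000/pci@1": ("IB6", 0, 0, "A"),
    "/ssm@0,0/pci@1d,700000/pci@2": ("IB8", 5, 1, "B"),
    "/ssm@0,0/pci@1d,700000/pci@1": ("IB8", 4, 1, "B"),
    "/ssm@0,0/pci@1c,700000/pci@2": ("IB8", 3, 0, "B"),
    "/ssm@0,0/pci@1c,700000/pci@1": ("IB8", 2, 0, "B"),
    "/ssm@0,0/pci@1d,600000/pci@1": ("IB8", 1, 1, "A"),
    "/ssm@0,0/pci@1c,600000/pci@1": ("IB8", 0, 0, "A"),
}

# Bus speeds from the notes to TABLE A-6.
BUS_SPEED_MHZ = {"A": 66, "B": 33}


def lookup_3800_slot(device_path: str):
    """Return (assembly, slot, controller, bus) for a Sun Fire 3800 path.

    Only the first three components (/ssm@0,0/pci@XX,YYYYYY/pci@N) are
    used; a child node after them, such as /SUNW,isptwo@4, is ignored.
    """
    prefix = "/".join(device_path.split("/")[:4])
    if prefix not in TABLE_A6:
        raise KeyError(f"{prefix} is not listed in TABLE A-6")
    return TABLE_A6[prefix]


if __name__ == "__main__":
    assembly, slot, controller, bus = lookup_3800_slot(
        "/ssm@0,0/pci@1c,700000/pci@1/SUNW,isptwo@4")
    print(assembly, slot, controller, bus, BUS_SPEED_MHZ[bus], "MHz")
    # prints: IB8 2 0 B 33 MHz
```

A plain lookup table is used here instead of a derived formula. That way, the script can never disagree with the published table.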
FIGURE A-3 illustrates the Sun Fire 3800 CompactPCI physical slot designations.
FIGURE A-3 Sun Fire 3800 System 6-Slot CompactPCI Physical Slot Designations
4-Slot CompactPCI I/O Assembly Device Map
TABLE A-7 lists, in hexadecimal notation, the slot number, I/O assembly name, device
path of each I/O assembly, the I/O controller number, and the bus for Sun Fire
6800/4810/4800 systems.
TABLE A-7 Mapping Device Path to I/O Assembly Slot Numbers for Sun Fire 6800/4810/4800 Systems
I/O Assembly Name   Device Path                     Physical Slot Number   I/O Controller Number   Bus
IB6                 /ssm@0,0/pci@19,700000/pci@1    3                      1                       B
IB6                 /ssm@0,0/pci@18,700000/pci@1    2                      0                       B
IB6                 /ssm@0,0/pci@19,600000/pci@1    1                      1                       A
IB6                 /ssm@0,0/pci@18,600000/pci@1    0                      0                       A
IB7                 /ssm@0,0/pci@1b,700000/pci@1    3                      1                       B
IB7                 /ssm@0,0/pci@1a,700000/pci@1    2                      0                       B
IB7                 /ssm@0,0/pci@1b,600000/pci@1    1                      1                       A
IB7                 /ssm@0,0/pci@1a,600000/pci@1    0                      0                       A
IB8                 /ssm@0,0/pci@1d,700000/pci@1    3                      1                       B
IB8                 /ssm@0,0/pci@1c,700000/pci@1    2                      0                       B
IB8                 /ssm@0,0/pci@1d,600000/pci@1    1                      1                       A
IB8                 /ssm@0,0/pci@1c,600000/pci@1    0                      0                       A
IB9                 /ssm@0,0/pci@1f,700000/pci@1    3                      1                       B
IB9                 /ssm@0,0/pci@1e,700000/pci@1    2                      0                       B
IB9                 /ssm@0,0/pci@1f,600000/pci@1    1                      1                       A
IB9                 /ssm@0,0/pci@1e,600000/pci@1    0                      0                       A
In TABLE A-7, note the following:
■ 600000 is the bus offset and indicates bus A, which operates at 66 MHz.
■ 700000 is the bus offset and indicates bus B, which operates at 33 MHz.
■ In pci@1, 1 is the device number. The @1 indicates that it is the first device on the bus.
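The rows of TABLE A-7 follow a regular pattern:

- Each I/O assembly uses a pair of AIDs.
- The even AID is I/O controller 0.
- Bus A holds slots 0 and 1, and bus B holds slots 2 and 3.

The following Python sketch derives the assembly and slot from the AID and bus offset, and then checks that rule against every row of TABLE A-7. The rule is an observation about this table, not a documented Sun algorithm. The function name slot_for_4slot_path is hypothetical.

```python
from typing import Tuple

# Bus offsets as described in the notes to TABLE A-7.
BUS_BY_OFFSET = {"600000": "A", "700000": "B"}


def slot_for_4slot_path(device_path: str) -> Tuple[str, int, int, str]:
    """Return (I/O assembly, physical slot, I/O controller, bus).

    Inferred rule (hypothetical, checked only against TABLE A-7):
      * AIDs 0x18-0x1f pair up per assembly: 18/19 -> IB6, 1a/1b -> IB7,
        1c/1d -> IB8, 1e/1f -> IB9
      * I/O controller number = low bit of the AID (even -> 0, odd -> 1)
      * slot = controller + (2 if bus B else 0)
    """
    parts = device_path.split("/")
    if len(parts) < 4 or parts[1] != "ssm@0,0" or parts[3] != "pci@1":
        raise ValueError(f"unexpected device path: {device_path}")
    node = parts[2]  # for example "pci@1c,700000"
    if not node.startswith("pci@") or "," not in node:
        raise ValueError(f"unexpected controller node: {node}")
    aid_hex, offset = node[len("pci@"):].split(",", 1)
    aid = int(aid_hex, 16)
    if not 0x18 <= aid <= 0x1F or offset not in BUS_BY_OFFSET:
        raise ValueError(f"AID or bus offset not covered by TABLE A-7: {node}")
    bus = BUS_BY_OFFSET[offset]
    controller = aid & 1
    assembly = f"IB{6 + (aid - 0x18) // 2}"
    slot = controller + (2 if bus == "B" else 0)
    return assembly, slot, controller, bus


# Rows copied from TABLE A-7, used to check the inferred rule.
TABLE_A7 = {
    "/ssm@0,0/pci@19,700000/pci@1": ("IB6", 3, 1, "B"),
    "/ssm@0,0/pci@18,700000/pci@1": ("IB6", 2, 0, "B"),
    "/ssm@0,0/pci@19,600000/pci@1": ("IB6", 1, 1, "A"),
    "/ssm@0,0/pci@18,600000/pci@1": ("IB6", 0, 0, "A"),
    "/ssm@0,0/pci@1b,700000/pci@1": ("IB7", 3, 1, "B"),
    "/ssm@0,0/pci@1a,700000/pci@1": ("IB7", 2, 0, "B"),
    "/ssm@0,0/pci@1b,600000/pci@1": ("IB7", 1, 1, "A"),
    "/ssm@0,0/pci@1a,600000/pci@1": ("IB7", 0, 0, "A"),
    "/ssm@0,0/pci@1d,700000/pci@1": ("IB8", 3, 1, "B"),
    "/ssm@0,0/pci@1c,700000/pci@1": ("IB8", 2, 0, "B"),
    "/ssm@0,0/pci@1d,600000/pci@1": ("IB8", 1, 1, "A"),
    "/ssm@0,0/pci@1c,600000/pci@1": ("IB8", 0, 0, "A"),
    "/ssm@0,0/pci@1f,700000/pci@1": ("IB9", 3, 1, "B"),
    "/ssm@0,0/pci@1e,700000/pci@1": ("IB9", 2, 0, "B"),
    "/ssm@0,0/pci@1f,600000/pci@1": ("IB9", 1, 1, "A"),
    "/ssm@0,0/pci@1e,600000/pci@1": ("IB9", 0, 0, "A"),
}

for path, expected in TABLE_A7.items():
    assert slot_for_4slot_path(path) == expected, path
```

Because the rule is only inferred, TABLE A-7 itself remains the authoritative reference. The loop at the end fails immediately if the rule and the table ever disagree.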
FIGURE A-4 illustrates the Sun Fire 4810 and 4800 CompactPCI physical slot
designations.
FIGURE A-4 Sun Fire 4810/4800 Systems 4-Slot CompactPCI Physical Slot Designations
FIGURE A-5 Sun Fire 6800 System 4-Slot CompactPCI Physical Slot Designations for IB6 Through IB9
APPENDIX B
Setting Up an http or ftp Server
This appendix describes how to set up a firmware server, which is necessary to
invoke the flashupdate command. A firmware server can be either an http or an ftp
server. To upgrade firmware, you can use either the ftp or the http protocol.
Note – This procedure assumes you do not have a web server currently running. If
you already have a web server set up, you can use or modify your existing
configuration. For more information, see man httpd.
Before you begin to set up the http or ftp server, follow these guidelines:
■ One firmware server is sufficient for several Sun Fire 6800/4810/4800/3800 systems.
■ Connect the firmware server to a network that the system controller can access.
Caution – The firmware server must not go down during the firmware upgrade. Do
not power down or reset the system during the flashupdate procedure.
Setting Up the Firmware Server
This section describes the following procedures:
■ “To Set Up an http Server”
■ “To Set Up an ftp Server”
▼ To Set Up an http Server
This procedure assumes that:
■ An http server is not already running.
■ The Solaris 8 operating environment is installed on the system to be used as the http server.
1. Log in as superuser and navigate to the /etc/apache directory.
hostname% su
Password:
hostname # cd /etc/apache
2. Back up the current httpd.conf file, and then copy the httpd.conf-example file to replace it.
hostname # cp httpd.conf httpd.conf-backup
hostname # cp httpd.conf-example httpd.conf
3. Edit the httpd.conf file and set the following directives:
■ Port (set to 80)
■ ServerAdmin
■ ServerName
a. Search through the httpd.conf file to find the “# Port:” section to
determine the correct location to add the Port 80 value as shown in
CODE EXAMPLE B-1.
CODE EXAMPLE B-1 Locating the Port 80 Value in httpd.conf
# Port: The port to which the standalone server listens. For
# ports < 1023, you will need httpd to be run as root initially.
#
Port 80
#
# If you wish httpd to run as a different user or group, you must run
# httpd as root initially and it will switch.
b. Search through the httpd.conf file to find the “# ServerAdmin:” section to
determine the correct location to add the ServerAdmin value, as shown in
CODE EXAMPLE B-2.
CODE EXAMPLE B-2 Locating the ServerAdmin Value in httpd.conf
# ServerAdmin: Your address, where problems with the server
# should be e-mailed. This address appears on some server-generated
# pages, such as error documents.
ServerAdmin root
#
# ServerName allows you to set a host name which is sent back to
c. Search through the httpd.conf file to find the “# ServerName” section to
determine the correct location to add the ServerName value, as shown in
CODE EXAMPLE B-3.
CODE EXAMPLE B-3 Locating the ServerName Value in httpd.conf
#
# ServerName allows you to set a host name which is sent back to clients for
# your server if it’s different than the one the program would get (i.e., use
# "www" instead of the host’s real name).
#
# Note: You cannot just invent host names and hope they work. The name you
# define here must be a valid DNS name for your host. If you don’t understand
# this, ask your network administrator.
# If your host doesn’t have a registered DNS name, enter its IP address here.
# You will have to access it by its address (e.g., http://123.45.67.89/)
# anyway, and this will make redirections work in a sensible way.
#
ServerName oslab-mon
4. Start Apache, and then copy the firmware files from the CD into a new directory under /var/apache/htdocs.
CODE EXAMPLE B-4 Starting Apache
hostname # cd /etc/init.d
hostname # ./apache start
hostname # cd /cdrom/cdrom0/firmware/
hostname # mkdir /var/apache/htdocs/firmware_build_number
hostname # cp * /var/apache/htdocs/firmware_build_number
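You can make the step 3 edits in an editor, or you can script them. The following Python code is a minimal sketch of one way to set the Port, ServerAdmin, and ServerName directives in the text of httpd.conf. The function set_directives is a hypothetical helper and is not part of Apache or Solaris. It only transforms a string, so reading, backing up, and writing /etc/apache/httpd.conf is left to you. The sample values 80, root, and oslab-mon are the ones shown in CODE EXAMPLE B-1 through CODE EXAMPLE B-3.

```python
import re


def set_directives(conf_text: str, values: dict) -> str:
    """Return a copy of httpd.conf text with each directive in `values` set.

    The first uncommented line that starts with a directive name (for
    example "Port 8080") is replaced in place. Comment lines such as
    "# Port: ..." are left untouched. Directives that are not found are
    appended at the end. Matching is case-sensitive, which is a
    simplification: Apache treats directive names case-insensitively.
    """
    lines = conf_text.splitlines()
    pending = dict(values)
    for i, line in enumerate(lines):
        stripped = line.strip()
        for name in list(pending):
            # Match the directive name alone or followed by whitespace.
            if re.match(rf"{re.escape(name)}(\s|$)", stripped):
                lines[i] = f"{name} {pending.pop(name)}"
                break
    for name, value in pending.items():
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = (
        "# Port: The port to which the standalone server listens.\n"
        "#\n"
        "Port 8080\n"
    )
    print(set_directives(sample, {"Port": "80", "ServerAdmin": "root",
                                  "ServerName": "oslab-mon"}), end="")
```

Leaving comment lines alone keeps the explanatory text in httpd.conf intact. Only the active directive lines change.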
▼ To Set Up an ftp Server
This procedure assumes that the Solaris 8 operating environment is installed on the
system to be used as the ftp server.
1. Log in as superuser and check the ftpd man page.
hostname % su
Password:
hostname # man ftpd
The ftpd man page contains a script that creates the ftp server environment.
Search through the man page to find the lines shown in the following example.
This script sets up your ftp server for you. Copy it into the /tmp directory on the
server and run chmod 755 script_name.
#!/bin/sh
# script to setup anonymous ftp area
#
2. Copy the entire script out of the man page (not just the portion shown in the
sample above) into the /tmp directory, make the script executable with chmod 755, and run it.
hostname # vi /tmp/script
hostname # chmod 755 /tmp/script
hostname # cd /tmp
hostname # ./script
3. If you need to set up anonymous ftp, add the following entry to the /etc/passwd
file. You must use the following values:
■ Group: 65534
■ Shell: /bin/false
ftp:x:500:65534:Anonymous FTP:/export/ftp:/bin/false
In this example, /export/ftp was chosen as the anonymous ftp area. The /bin/false
shell prevents users from logging in as the ftp user.
Note – Anonymous ftp lets anyone who can reach the server access the ftp area without an account, so review the security implications before you enable it.
4. Add the following entry to the /etc/shadow file. Do not give a valid password.
Instead, use NP.
ftp:NP:6445::::::
5. Configure the ftp server on the loghost server by copying the firmware files from the CD into a new directory under /export/ftp/pub.
hostname # cd /export/ftp/pub
hostname # mkdir firmware_build_number
hostname # cd /cdrom/cdrom0/firmware
hostname # cp * /export/ftp/pub/firmware_build_number
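Before you save /etc/passwd and /etc/shadow, you can check the two ftp entries against the requirements in steps 3 and 4. The following Python code is a minimal sketch of such a check. The function check_anonymous_ftp_entries is hypothetical and is not a Solaris tool. It only inspects the strings passed to it, and it assumes the standard seven-field passwd format and the nine-field Solaris shadow format.

```python
def check_anonymous_ftp_entries(passwd_line: str, shadow_line: str) -> list:
    """Return a list of problems; an empty list means both entries meet the
    requirements in steps 3 and 4 (group 65534, shell /bin/false, and NP in
    the shadow password field)."""
    problems = []

    # /etc/passwd: name:password:uid:gid:gecos:home:shell (7 fields)
    pw = passwd_line.strip().split(":")
    if len(pw) != 7:
        problems.append(f"passwd entry has {len(pw)} fields, expected 7")
    else:
        name, _, _, gid, _, _, shell = pw
        if name != "ftp":
            problems.append("passwd user name should be ftp")
        if gid != "65534":
            problems.append("passwd group must be 65534")
        if shell != "/bin/false":
            problems.append("passwd shell must be /bin/false")

    # /etc/shadow on Solaris has 9 colon-separated fields; the second
    # field holds the password, which must be NP (no valid password).
    sh = shadow_line.strip().split(":")
    if len(sh) != 9:
        problems.append(f"shadow entry has {len(sh)} fields, expected 9")
    else:
        if sh[0] != "ftp":
            problems.append("shadow user name should be ftp")
        if sh[1] != "NP":
            problems.append("shadow password field must be NP")
    return problems


if __name__ == "__main__":
    print(check_anonymous_ftp_entries(
        "ftp:x:500:65534:Anonymous FTP:/export/ftp:/bin/false",
        "ftp:NP:6445::::::"))  # prints: []
```

The check does not enforce the /export/ftp home directory, because the procedure presents that directory only as the choice made for this example.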
Glossary
ACL
Access Control List. In order to assign a board to a domain with the addboard
command, the board name must be listed in the Access Control List (ACL). The
ACL is checked when a domain makes an addboard or a testboard request
on that board. On the Sun Fire 3800 system, all power supplies have switches
on them to power them on. These power supplies must be listed in the ACL.
active board state
When the board state is active, the slot has hardware installed in it. The
hardware is being used by the domain to which it was assigned. Active boards
cannot be reassigned.
assigned board state
When a board state is assigned, the slot belongs to a domain but the hardware
is not necessarily tested and configured for use. The slot can be given up by the
domain administrator or reassigned by the platform administrator.
available board state
When a board state is available, the slot is not assigned to any particular
domain.
domain
A domain runs its own instance of the Solaris operating environment and is independent of other domains. Each domain has its own CPUs, memory, and I/O assemblies. Repeater boards are shared between domains in the same partition.
domain administrator
The domain administrator manages the domain.
failover
The switchover of the main system controller to its spare or the system
controller clock source to another system controller clock source when a failure
occurs in the operation of the main system controller or the clock source.
partition
A partition is a group of Repeater boards that are used together to provide
communication between CPU/Memory boards and I/O assemblies in the same
domain. You can set up the system with one partition or two partitions using
the system controller setupplatform command. Partitions do not share
Repeater boards.
platform administrator
The platform administrator manages hardware resources across domains.
port
A board connector.
Repeater board
A crossbar switch that connects multiple CPU/Memory boards and I/O assemblies. Having the required number of Repeater boards is mandatory for operation. There are Repeater boards in each mid-range system except for the Sun Fire 3800. In the Sun Fire 3800 system, the equivalent of two Repeater boards is integrated into the active centerplane.
RTS
Redundant transfer switch.
RTU
Redundant transfer unit.
SNMP agent
Simple Network Management Protocol agent. Enables or disables the SNMP agent.
Sun Management Center software
A graphical user interface that monitors your system.
system controller software
The application that performs all of the system controller configuration functions.
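The ACL and board-state entries above can be summarized as a simple model. The following Python sketch is a simplification based only on these glossary definitions, not on the actual logic of the system controller. BoardState, acl_permits, may_reassign, and may_addboard are hypothetical names.

```python
from enum import Enum


class BoardState(Enum):
    AVAILABLE = "available"  # slot is not assigned to any domain
    ASSIGNED = "assigned"    # slot belongs to a domain; hardware not necessarily tested or configured
    ACTIVE = "active"        # hardware is in use by the domain it was assigned to


def acl_permits(board: str, acl: set) -> bool:
    """addboard and testboard requests require the board to be in the ACL."""
    return board in acl


def may_reassign(state: BoardState) -> bool:
    """Active boards cannot be reassigned; available and assigned slots can."""
    return state is not BoardState.ACTIVE


def may_addboard(board: str, acl: set, state: BoardState) -> bool:
    """Simplified addboard check combining the ACL and board-state rules."""
    return acl_permits(board, acl) and may_reassign(state)


if __name__ == "__main__":
    acl = {"IB6", "IB8"}  # hypothetical ACL contents
    print(may_addboard("IB6", acl, BoardState.AVAILABLE))  # prints: True
    print(may_addboard("IB6", acl, BoardState.ACTIVE))     # prints: False
```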
Index
A
administrator workstation, unauthorized
access, 59
availability, 22
B
blacklisting, 21, 122
board
CompactPCI card
software steps, removal and installation, 91
CPU/Memory, 14
redundant, 13
software steps, removal and installation, 91
testing, 85
deleting from a domain, 72, 74
I/O assembly
software steps, removal and installation, 91
Repeater
definition, 18
description, 18
software steps for removing and installing, 99
System Controller board
software steps, removal and installation, 91
C
CompactPCI card
software steps for removal and installation, 91
components
disabling, 122
redundant, 13
configuration, minimum, 13
configurations
I/O assemblies, 15
configuring
system for redundancy, 13
console messages, 13
cooling, redundant, 13, 16
CPU
redundant, 14
CPU/Memory board, 14
hot-swapping, 95
software steps for removal and installation, 91
testing, 85
CPU/Memory mapping, 125
CPUs
maximum number per CPU/Memory board, 14
minimum number per CPU/Memory board, 14
creating domains, 2
current, monitoring, 12
D
date, setting, 46
deleteboard command, 72, 74
device name mapping, 125
device path names to physical system devices, 125
diagnostic information, displaying, 107
disabling a component, 122
displaying system configuration information, 108
domain, 1, 145
A, entering from the platform shell, 37
access, unauthorized, 59
active, 2
adding boards to, 71
console, 12
definition, 35
creating, 2
three domains on the Sun Fire 6800 system, 56
default configuration, 2
deleting boards from, 71, 72, 74
features, 2
overview, 2
powering on, 51, 57, 70
running the Solaris operating environment, 35
security, 62
separation, 62
setting up
two domains, system controller software, 55
starting, 57
domain shell, 11
navigating to the OpenBoot PROM, 34
navigating to the Solaris operating
environment, 34
domain shell and platform shell
navigation, 33
dual partition mode, 3
E
environmental monitoring, 12
Ethernet (network) port, 9
System Controller board, 9
F
failover
recovery tasks, 84
failures, determining causes, 108
fan tray
hot-swapping, 16
redundant, 13, 16
fault, system, 107
features, 9
Ethernet (network), 9
serial (RS-232) port, 9
System Controller board ports, 9
features, 9
flashupdate command, 75
Frame Manager software, 25
G
grids, power
powering on, 45
H
hangs, determining causes, 108
hardware
powering on, 45
hot-swapping
CPU/Memory board, 95
I/O assembly, 96
hot-swapping, fan trays, 16
I
I/O assemblies
mapping, 127
redundant, 15
supported configurations, 15
I/O assembly
hot-swapping, 96
software steps for removal and installation, 91
I/O, redundant, 15
IP multipathing software, 16
K
keyswitch
virtual, 12
keyswitch command, 69
keyswitch off command, 67
keyswitch positions, virtual, 69
L
loghost, Solaris operating environment, 44
M
maintenance, 65
mapping, 125
CPU/Memory, 125
I/O assembly, 127
node, 125
memory
redundant, 14
messages, console, 13
minimum configuration, 13
monitoring
current, 12
environmental conditions, 12
sensors, 12
temperature, 12
voltage, 12
multipathing, 16
N
navigation
between domain shell and the OpenBoot PROM
or the domain shell and the Solaris operating
environment, 34
between OpenBoot PROM and the domain
shell, 35
system controller, 33
to the domain shell, 34, 35
node mapping, 125
number of system controller boards supported, 8
O
OpenBoot PROM, 35
P
partition, 3
mode, 3
mode, dual, 3
mode, single, 3
partitions
number of, 3
password
setting, 61
passwords and users, security, 62
platform, 1
setting up, 46
platform shell
entering domain A, 37
platform shell and domain shell
navigation, 33
power, 17
redundant, 13, 17
power grids, powering on, 45
power on
flowchart, 42
steps performed before power on, 43
system controller
tasks completed, 11
power on and system set up steps
flowchart, 42
power supplies, 17
powering off
system, 66
powering on
domain, 51, 57, 70
hardware, 45
system, 11
processors
maximum number per CPU/Memory board, 14
minimum number per CPU/Memory board, 14
redundant, 14
R
RAS, 20
redundancy configuration, 13
redundant, 17
components, 13
cooling, 13, 16
CPU, 14
CPU/Memory boards, 13
fan trays, 13
I/O, 15
I/O assemblies, 15
memory, 14
power, 13, 17
power supplies, 17
Repeater boards, 18
reliability, 20
Repeater board
definition, 18
descriptions, 18
redundant, 18
software steps for removing and installing, 99
S
security
domain, 62
domains, 62
threats, 59
users and passwords, 62
sensors, monitoring, 12
serial (RS-232) port, 9
System Controller board, 9
server
setting up, 46
serviceability, 23
setdate command, 46
setkeyswitch on command, 51, 57, 70
setting the date and time, 46
setting up
system (platform), 46
system, flowchart, 42
two domains, system controller software, 55
shells, domain, 11
single partition mode, 3
software steps
removing and installing a CompactPCI card, 91
removing and installing a CPU/Memory
board, 91
removing and installing a System Controller
board, 91
removing and installing an I/O assembly, 91
Repeater board, removing and installing, 99
Solaris operating environment, 34
loghost, 44
starting a domain, 57
Sun Management Center 3.0 Supplement
software, 25
syslog host, 13
system
administrator, tasks, 11
configuration information, displaying, 108
faults, 107
power on, system controller
tasks completed, 11
powering off, 66
setting up, 46
setting up, flowchart, 42
system controller
access, unauthorized, 59
definition, 1, 8
failover, 77
functions, 8
navigation, 33
tasks completed, power on, 11
System Controller board
Ethernet (network) port, 9
ports, 9
features, 9
serial (RS-232) port, 9
software steps for removal and installation, 91
System Controller boards
supported, 8
T
tasks performed by system administrator, 11
temperature, monitoring, 12
testboard command, 85
three domains
creating on the Sun Fire 6800 system, 56
time, setting, 46
troubleshooting, 107
U
user workstation, unauthorized access, 59
users and passwords, security, 62
V
virtual keyswitch, 12, 69
voltage, monitoring, 12