
Dynamic Reconfiguration (DR) User’s Guide

SPARC Enterprise

M4000 / M5000 / M8000 / M9000 Servers

English

Order No. U41684-J-Z816-2-76

Part No. 819-7898-11

September 2007, Revision A

SPARC® Enterprise M4000/M5000/M8000/M9000 Servers Dynamic Reconfiguration (DR) User's Guide

Copyright 2007 FUJITSU LIMITED, 1-1, Kamikodanaka 4-chome, Nakahara-ku, Kawasaki-shi, Kanagawa-ken 211-8588, Japan. All rights reserved.

Sun Microsystems, Inc. provided technical input and review on portions of this material.

Sun Microsystems, Inc. and Fujitsu Limited each own or control intellectual property rights relating to products and technology described in this document, and such products, technology and this document are protected by copyright laws, patents and other intellectual property laws and international treaties. The intellectual property rights of Sun Microsystems, Inc. and Fujitsu Limited in such products, technology and this document include, without limitation, one or more of the United States patents listed at http://www.sun.com/patents and one or more additional patents or patent applications in the United States or other countries.

This document and the product and technology to which it pertains are distributed under licenses restricting their use, copying, distribution, and decompilation. No part of such product or technology, or of this document, may be reproduced in any form by any means without prior written authorization of Fujitsu Limited and Sun Microsystems, Inc., and their applicable licensors, if any. The furnishing of this document to you does not give you any rights or licenses, express or implied, with respect to the product or technology to which it pertains, and this document does not contain or represent any commitment of any kind on the part of Fujitsu Limited or Sun Microsystems, Inc., or any affiliate of either of them.

This document and the product and technology described in this document may incorporate third-party intellectual property copyrighted by and/or licensed from suppliers to Fujitsu Limited and/or Sun Microsystems, Inc., including software and font technology.

Per the terms of the GPL or LGPL, a copy of the source code governed by the GPL or LGPL, as applicable, is available upon request by the End User. Please contact Fujitsu Limited or Sun Microsystems, Inc.

This distribution may include materials developed by third parties.

Parts of the product may be derived from Berkeley BSD systems, licensed from the University of California. UNIX is a registered trademark in the U.S. and in other countries, exclusively licensed through X/Open Company, Ltd.

Sun, Sun Microsystems, the Sun logo, Java, Netra, Solaris, Sun Ray, Answerbook2, docs.sun.com, OpenBoot, and Sun Fire are trademarks or registered trademarks of Sun Microsystems, Inc. in the U.S. and other countries.

Fujitsu and the Fujitsu logo are registered trademarks of Fujitsu Limited.

All SPARC trademarks are used under license and are registered trademarks of SPARC International, Inc. in the U.S. and other countries.

Products bearing SPARC trademarks are based upon architecture developed by Sun Microsystems, Inc.

SPARC64 is a trademark of SPARC International, Inc., used under license by Fujitsu Microelectronics, Inc. and Fujitsu Limited.

The OPEN LOOK and Sun™ Graphical User Interface was developed by Sun Microsystems, Inc. for its users and licensees. Sun acknowledges the pioneering efforts of Xerox in researching and developing the concept of visual or graphical user interfaces for the computer industry. Sun holds a non-exclusive license from Xerox to the Xerox Graphical User Interface, which license also covers Sun’s licensees who implement OPEN LOOK GUIs and otherwise comply with Sun’s written license agreements.

United States Government Rights - Commercial use. U.S. Government users are subject to the standard government user license agreements of Sun Microsystems, Inc. and Fujitsu Limited and the applicable provisions of the FAR and its supplements.

Disclaimer: The only warranties granted by Fujitsu Limited, Sun Microsystems, Inc. or any affiliate of either of them in connection with this document or any product or technology described herein are those expressly set forth in the license agreement pursuant to which the product or technology is provided. EXCEPT AS EXPRESSLY SET FORTH IN SUCH AGREEMENT, FUJITSU LIMITED, SUN MICROSYSTEMS, INC. AND THEIR AFFILIATES MAKE NO REPRESENTATIONS OR WARRANTIES OF ANY KIND (EXPRESS OR IMPLIED) REGARDING SUCH PRODUCT OR TECHNOLOGY OR THIS DOCUMENT, WHICH ARE ALL PROVIDED AS IS, AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND WARRANTIES, INCLUDING WITHOUT LIMITATION ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT, ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID. Unless otherwise expressly set forth in such agreement, to the extent allowed by applicable law, in no event shall Fujitsu Limited, Sun Microsystems, Inc. or any of their affiliates have any liability to any third party under any legal theory for any loss of revenues or profits, loss of use or data, or business interruptions, or for any indirect, special, incidental or consequential damages, even if advised of the possibility of such damages.

DOCUMENTATION IS PROVIDED “AS IS” AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND WARRANTIES, INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT, ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID.



Contents

Preface

1. Overview of Dynamic Reconfiguration
1.1 DR
1.2 Basic DR Functions
1.2.1 Adding a System Board
1.2.2 Deleting a System Board
1.2.3 Moving a System Board
1.2.4 Replacing a System Board
1.3 Security
1.4 Overview of DR User Interfaces

2. What You Must Know Before Using DR
2.1 System Configuration
2.1.1 System Board Components
2.1.1.1 CPU
2.1.1.2 Memory
2.1.1.3 I/O Device
2.1.2 System Board Configuration Requirements
2.1.3 System Board Pool Function
2.1.4 Checklists for System Configuration
2.1.5 Reservation of Domain Configuration Changes
2.2 Conditions and Settings Using XSCF
2.2.1 Conditions Using XSCF
2.2.2 Settings Using XSCF
2.2.2.1 Configuration Policy Option
2.2.2.2 Floating Board Option
2.2.2.3 Omit-memory Option
2.2.2.4 Omit-I/O Option
2.3 Conditions and Settings Using Solaris OS
2.3.1 I/O and Software Requirements
2.3.2 Settings of Kernel Cage Memory
2.4 Status Management
2.4.1 Domain Status
2.4.2 System Board Status
2.4.3 Flow of DR Processing
2.4.3.1 Flowchart: Adding a System Board
2.4.3.2 Flowchart: Deleting a System Board
2.4.3.3 Flowchart: Moving a System Board
2.4.3.4 Flowchart: Replacing System Board
2.5 Operation Management
2.5.1 I/O Device Management
2.5.2 Swap Area
2.5.2.1 Swap Area at System Board Addition
2.5.2.2 Swap Area at System Board Deletion
2.5.3 Real-time Processes
2.5.4 Memory Mirror Mode
2.5.5 Capacity on Demand (COD)
2.5.6 XSCF Failover
2.5.7 Kernel Memory Board Deletion
2.5.8 Deletion of Board with DVD Drive

3. DR User Interface
3.1 How To Use the DR User Interface
3.1.1 Displaying Domain Information
3.1.2 Displaying Domain Status
3.1.3 Displaying System Board Information
3.1.4 Displaying Device Information
3.1.5 Displaying System Board Configuration Information
3.1.6 Adding a System Board
3.1.7 Deleting a System Board
3.1.8 Moving a System Board
3.1.9 Replacing a System Board
3.1.10 Reserving a Domain Configuration Change
3.2 Command Reference
3.3 XSCF Web
3.4 RCM Script

4. Practical Examples of DR
4.1 Flow of DR Operation
4.1.1 Flow: Adding a System Board
4.1.2 Flow: Deleting a System Board
4.1.3 Flow: Moving a System Board
4.1.4 Flow: Replacing a System Board
4.2 Example: Adding a System Board
4.3 Example: Deleting a System Board
4.4 Example: Moving a System Board
4.5 Examples: Replacing a System Board
4.5.1 Example: Replacing a Uni-XSB System Board
4.5.2 Example: Replacing a Quad-XSB System Board
4.6 Examples: Reserving Domain Configuration Changes
4.6.1 Example: Reserving a System Board Add
4.6.2 Example: Reserving a System Board Delete
4.6.3 Example: Reserving a System Board Move

A. Message Meaning and Handling
A.1 Solaris OS Messages
A.1.1 Transition Messages
A.1.2 PANIC Messages
A.1.3 Warning Messages
A.2 Command Messages
A.2.1 addboard
A.2.2 deleteboard
A.2.3 moveboard
A.2.4 setdcl
A.2.5 setupfru
A.2.6 showdevices

B. Example: Confirm Swap Space Size

Glossary

Index

Figures

FIGURE 1-1 Uni-XSB and Quad-XSB (Midrange Servers)
FIGURE 1-2 Uni-XSB and Quad-XSB (High-end Servers)
FIGURE 1-3 DR Processing Flow
FIGURE 2-1 Example of Hardware Configuration (with Uni-XSB of Midrange Server)
FIGURE 2-2 Example of Hardware Configuration (with Quad-XSBs of Midrange Server)
FIGURE 2-3 Example of a Hardware Configuration (with Uni-XSBs of High-end Server)
FIGURE 2-4 Example of a Hardware Configuration (with Quad-XSBs of High-end Server)
FIGURE 2-5 Flow of System Board Addition Processing
FIGURE 2-6 Flow of System Board Deletion Processing
FIGURE 2-7 Flow of System Board Move Processing
FIGURE 2-8 Flow of System Board Replacement Processing
FIGURE 4-1 Flow: Adding a System Board
FIGURE 4-2 Flow: Deleting a System Board
FIGURE 4-3 Flow: Moving a System Board
FIGURE 4-4 Flow: Replacing a System Board
FIGURE 4-5 Example: Adding a System Board
FIGURE 4-6 Example: Deleting a System Board
FIGURE 4-7 Example: Moving a System Board
FIGURE 4-8 Example: Replacing a Uni-XSB System Board
FIGURE 4-9 Example: Replacing a Quad-XSB System Board
FIGURE 4-10 Example: Reserving a System Board Add
FIGURE 4-11 Example: Reserving a System Board Delete
FIGURE 4-12 Example: Reserving a System Board Move

Tables

TABLE 1-1 Basic DR Terms
TABLE 1-2 Terms Related to Hardware Configurations
TABLE 2-1 Unit of Degradation
TABLE 2-2 Domain Status
TABLE 2-3 System Board Management Items
TABLE 2-4 System Board Management Items
TABLE 3-1 DR Display Commands
TABLE 3-2 DR Operation Commands
TABLE 3-3 Options of the showdcl Command
TABLE 3-4 Items of Domain Information to be Displayed
TABLE 3-5 Options of the showdomainstatus Command
TABLE 3-6 Items of Domain Information to be Displayed
TABLE 3-7 Options of the showboards Command
TABLE 3-8 Items of System Board Information to be Displayed
TABLE 3-9 Options of the showdevices Command
TABLE 3-10 Domain Information Displayed by the showdevices Command
TABLE 3-11 Options of the showfru Command
TABLE 3-12 Items of System Board Configuration Information to be Displayed
TABLE 3-13 Options of the addboard Command
TABLE 3-14 Options of the deleteboard Command
TABLE 3-15 Options of the moveboard Command
TABLE 3-16 DR Display Commands
TABLE 3-17 DR Operation Commands
TABLE 3-18 DR-related Commands

Preface

This manual describes the Dynamic Reconfiguration (DR hereafter) function provided by SPARC Enterprise servers.

This manual is intended for users, specifically system management administrators who conduct operations on systems and domains.

Before reading this manual, read the Overview Guide and the Installation Guide for your model, the SPARC Enterprise M4000/M5000/M8000/M9000 Servers Administration Guide, and the SPARC Enterprise M4000/M5000/M8000/M9000 Servers XSCF User’s Guide.

This section includes:

“Audience”
“Structure and Contents of This Manual”
“SPARC Enterprise Mx000 Servers Documentation”
“Abbreviated References to Other Documents”
“Models”
“Text Conventions”
“Prompt Notations”
“Syntax of the Command Line Interface (CLI)”
“Software License”
“Fujitsu Siemens Computers Welcomes Your Comments”

Audience

This manual is intended for users who administer SPARC Enterprise M4000/M5000/M8000/M9000 servers (hereinafter referred to as XSCF users). An XSCF user is required to have the following knowledge:

Solaris™ Operating System and UNIX commands
SPARC Enterprise M4000/M5000/M8000/M9000 servers and basic knowledge of XSCF

Structure and Contents of This Manual

This manual is organized as described below:

Chapter 1 Overview of Dynamic Reconfiguration
This chapter provides an overview of DR.

Chapter 2 What You Must Know Before Using DR
This chapter explains conditions, configuration, and checklists related to DR.

Chapter 3 DR User Interface
This chapter describes the user interfaces of DR.

Chapter 4 Practical Examples of DR
This chapter describes sample operations that use the DR commands.

Appendix A Message Meaning and Handling
This appendix explains the messages that might be displayed when DR is used.

Appendix B Example: Confirm Swap Space Size
This appendix explains how to confirm whether the system has enough swap space for operating the DR function.

Glossary and Index

Glossary
The glossary explains the terms used in this manual.

Index
The index provides keywords and corresponding reference page numbers so that the reader can easily search for items in this manual as necessary.

SPARC Enterprise Mx000 Servers Documentation

The manuals listed below are provided for reference.

Order No.            Book Title
U41674-J-Z816-x-76   SPARC Enterprise M4000/M5000 Servers Site Planning Guide
U41685-J-Z816-x-76   SPARC Enterprise M8000/M9000 Servers Site Planning Guide
U41711-J-Z816-x-76   SPARC Enterprise Equipment Rack Mounting Guide
U41719-J-Z816-x-76   SPARC Enterprise M4000/M5000 Servers Getting Started Guide
U41717-J-Z816-x-76   SPARC Enterprise M8000/M9000 Servers Getting Started Guide
U41675-J-Z816-x-76   SPARC Enterprise M4000/M5000 Servers Overview Guide
U41686-J-Z816-x-76   SPARC Enterprise M8000/M9000 Servers Overview Guide
U41715-J-Z816-x-76   Important Safety Information for Hardware Systems
U41676-J-Z816-x-76   SPARC Enterprise M4000/M5000 Servers Safety and Compliance Guide
U41687-J-Z816-x-76   SPARC Enterprise M8000/M9000 Servers Safety and Compliance Guide
U41716-J-Z816-x-76   External I/O Expansion Unit Safety and Compliance Guide
U41720-J-Z816-x-76   SPARC Enterprise M4000 Server Unpacking Guide
U41728-J-Z816-x-76   SPARC Enterprise M5000 Server Unpacking Guide
U41718-J-Z816-x-76   SPARC Enterprise M8000/M9000 Servers Unpacking Guide
U41677-J-Z816-x-76   SPARC Enterprise M4000/M5000 Servers Installation Guide
U41688-J-Z816-x-76   SPARC Enterprise M8000/M9000 Servers Installation Guide
U41678-J-Z816-x-76   SPARC Enterprise M4000/M5000 Servers Service Manual
U41689-J-Z816-x-76   SPARC Enterprise M8000/M9000 Servers Service Manual
U41679-J-Z816-x-76   External I/O Expansion Unit Installation and Service Manual
U41680-J-Z816-x-76   SPARC Enterprise M4000/M5000/M8000/M9000 Servers Administration Guide
U41681-J-Z816-x-76   SPARC Enterprise M4000/M5000/M8000/M9000 Servers XSCF User’s Guide
U41682-J-Z816-x-76   SPARC Enterprise M4000/M5000/M8000/M9000 Servers XSCF Reference Manual
U41684-J-Z816-x-76   SPARC Enterprise M4000/M5000/M8000/M9000 Servers Dynamic Reconfiguration (DR) User’s Guide
U41693-J-Z816-x-76   SPARC Enterprise M4000/M5000/M8000/M9000 Servers Capacity on Demand (COD) User’s Guide
U4173x-J-Z816-x-76   SPARC Enterprise M4000/M5000 Servers Product Notes
U4173x-J-Z816-x-76   SPARC Enterprise M8000/M9000 Servers Product Notes
U41740-J-Z816-x-76   External I/O Expansion Unit Product Notes

Note – "x" in the order number is the version number of the manual.

1. Manuals on the Web

The latest versions of all the SPARC Enterprise Series manuals are available at the following website.

http://manuals.fujitsu-siemens.com/

2. Provided in system

Man page of the XSCF

Note – The man page can be referenced on the XSCF Shell, and it provides the same content as the SPARC Enterprise M4000/M5000/M8000/M9000 Servers XSCF Reference Manual.

3. Documentation and Support on the Web

The latest information about other documents and support for the SPARC Enterprise series is provided on the Web site.

a. Message
http://www.fujitsu.com/sparcenterprise/msg/

b. Downloading the firmware program
Contact the field engineer.
The following files or documents are provided.
i. Firmware program file (XSCF Control Package (XCP) file)
ii. XSCF extension MIB definition file

Note – XSCF Control Package (XCP): XCP is a package that contains the control programs for the hardware that makes up a computing system. The XSCF firmware and the OpenBoot PROM firmware are included in the XCP file.

c. Fault Management MIB (SUN-FM-MIB) definition file
http://src.opensolaris.org/source/xref/onnv/onnvgate/usr/src/lib/fm/libfmd_snmp/mibs/

d. Solaris Operating System Related Manuals
http://docs.sun.com

Abbreviated References to Other Documents

In this manual, the following abbreviated titles may be used when referring to a systems manual. The following table lists the abbreviations used in this manual.

Abbreviated Title       Full Title
Overview Guide          SPARC Enterprise M4000/M5000 Servers Overview Guide
                        SPARC Enterprise M8000/M9000 Servers Overview Guide
Service Manual          SPARC Enterprise M4000/M5000 Servers Service Manual
                        SPARC Enterprise M8000/M9000 Servers Service Manual
Installation Guide      SPARC Enterprise M4000/M5000 Servers Installation Guide
                        SPARC Enterprise M8000/M9000 Servers Installation Guide
Administration Guide    SPARC Enterprise M4000/M5000/M8000/M9000 Servers Administration Guide
XSCF Reference Manual   SPARC Enterprise M4000/M5000/M8000/M9000 Servers XSCF Reference Manual
XSCF User’s Guide       SPARC Enterprise M4000/M5000/M8000/M9000 Servers XSCF User’s Guide

Models

The model names used in this manual are as follows.

Server class   Model name
Midrange       SPARC Enterprise M4000
               SPARC Enterprise M5000
High-end       SPARC Enterprise M8000
               SPARC Enterprise M9000

Text Conventions

This manual uses the following fonts and symbols to express specific types of information.

Fonts/symbols: AaBbCc123
Meaning: What you type, when contrasted with on-screen computer output. This font represents the example of command input in the frame.
Example: XSCF> adduser jsmith

Fonts/symbols: AaBbCc123
Meaning: The names of commands, files, and directories; on-screen computer output. This font represents the example of command output in the frame.
Example: User Name: jsmith
         Privileges: useradm
                     auditadm

Fonts/symbols: Italic
Meaning: Indicates the name of a reference manual.
Example: See the XSCF User's Guide.

Fonts/symbols: " "
Meaning: Indicates names of chapters, sections, items, buttons, or menus.
Example: See Chapter 2, "Preparation for Installation."

Prompt Notations

The prompt notations used in this manual are as follows.

Shell                                    Prompt Notation
XSCF                                     XSCF>
C shell                                  machine-name%
C shell super user                       machine-name#
Bourne shell and Korn shell              $
Bourne shell and Korn shell super user   #
OpenBoot PROM                            ok

Syntax of the Command Line Interface (CLI)

The command syntax is described below.

Command syntax

The command syntax is as follows:

A variable that requires input of a value must be enclosed in <>.

An optional element must be enclosed in [].

A group of options for an optional keyword must be enclosed in [] and delimited by |.

A group of options for a mandatory keyword must be enclosed in {} and delimited by |.
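The four notation rules above can be sketched as a small classifier. This is only an illustration: the synopsis string `examplecmd [-q] {-a|-b} [-x|-y] <name>` is hypothetical, invented to exercise each rule, and is not a real XSCF command. The sketch also assumes that no bracketed group contains a space.

```python
# Hypothetical synopsis, invented for illustration; not a real XSCF command.
SYNOPSIS = "examplecmd [-q] {-a|-b} [-x|-y] <name>"


def classify(token: str) -> str:
    """Name the notation rule that a single synopsis token follows."""
    if token.startswith("<") and token.endswith(">"):
        return "variable (a value must be supplied)"
    if token.startswith("[") and token.endswith("]"):
        # Square brackets mark optional elements; | separates alternatives.
        return "optional choice" if "|" in token else "optional element"
    if token.startswith("{") and token.endswith("}"):
        return "mandatory choice"
    return "literal text"


# Splitting on whitespace is enough here because of the no-spaces assumption.
for token in SYNOPSIS.split():
    print(f"{token:12} {classify(token)}")
```

Read this way, the hypothetical synopsis requires exactly one of `-a` or `-b` and a value for `name`, while `-q` and the `-x`/`-y` pair may be omitted.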

The command syntax is shown in a frame such as this one.

Example:

XSCF> showuser -a

Software License

The functions described in this manual use software licensed under the GPL, the LGPL, and other licenses. For license information, see Appendix E, "Software License Condition" in the SPARC Enterprise M4000/M5000/M8000/M9000 Servers XSCF User’s Guide.

Fujitsu Siemens Computers Welcomes Your Comments

We would appreciate your comments and suggestions to improve this document. You can submit your comments by using the “Reader's Comment Form”.

Reader's Comment Form

FUJITSU COMPUTER SYSTEMS
ATTENTION ENGINEERING OPS M/S 249
1250 EAST ARQUES AVENUE
P O BOX 3470
SUNNYVALE CA 94088-3470

CHAPTER 1

Overview of Dynamic Reconfiguration

This chapter provides an overview of Dynamic Reconfiguration, which is controlled by the eXtended System Control Facility (XSCF).

1.1 DR

Dynamic Reconfiguration (referred to as DR in this document) enables hardware resources such as processors, memory, and I/O to be added and deleted even while the Solaris™ Operating System (referred to as OS in this document) is running.

DR has three basic functions: addition, deletion, and move. They can be used for the following purposes.

Add system boards without stopping the Solaris OS of the domain, to improve business operations or handle higher system loads.

Temporarily remove a faulty system board for parts replacement without stopping the Solaris OS of the domain, in the event of an error that causes the system board to become degraded.

Move a resource from one domain to another while continuously operating the domains without physically removing or inserting a system board. Resources can be moved to balance the loads on multiple domains, or to share common I/O resources between domains.

SPARC Enterprise M4000/M5000/M8000/M9000 servers have a unique partitioning feature that can divide one physical system board (PSB) into one logical board (undivided status) or four logical boards. A PSB that is logically divided into one board (undivided status) is called a Uni-XSB, whereas a PSB that is logically divided into four boards is called a Quad-XSB. Each physical unit that results from dividing a PSB is called an eXtended System Board (XSB). These XSBs can be combined freely to create domains.
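The relationship between a PSB, its division type, and the resulting XSBs can be sketched as a small data model. This is a minimal illustration, not the XSCF implementation: the names `DivisionType`, `XSB`, `divide`, and `build_domain`, and the integer board numbering, are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class DivisionType(Enum):
    UNI_XSB = 1   # PSB treated as one logical board (undivided status)
    QUAD_XSB = 4  # PSB divided into four logical boards


@dataclass(frozen=True)
class XSB:
    psb: int    # number of the physical board (hypothetical numbering)
    index: int  # position of this XSB within its PSB, starting at 0


def divide(psb: int, division: DivisionType) -> list[XSB]:
    """Return the XSBs that one PSB yields for the given division type."""
    return [XSB(psb, i) for i in range(division.value)]


def build_domain(*xsbs: XSB) -> set[XSB]:
    """Model a domain as nothing more than a set of XSBs."""
    return set(xsbs)


uni = divide(0, DivisionType.UNI_XSB)    # one XSB
quad = divide(1, DivisionType.QUAD_XSB)  # four XSBs

# XSBs from different PSBs, and from the same Quad-XSB, can be combined freely.
domain_a = build_domain(uni[0], quad[0], quad[1])
domain_b = build_domain(quad[2], quad[3])
print(len(uni), len(quad), len(domain_a), len(domain_b))
```

In this sketch the four XSBs of one Quad-XSB end up in two different domains, which is the flexibility that the partitioning feature is meant to provide.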

DR functions on these servers are performed on an XSB. This manual uses the term system board unless physical units of PSB and XSB are described. For an explanation of each term, see TABLE 1-2.

Note – This document explains DR functions on system boards. Use the Solaris command cfgadm(1M) to execute DR on I/O devices, including PCI cards. For more information, see the service manual for your system, and the cfgadm(1M) and cfgadm_pci(1M) man pages.

FIGURE 1-1 Uni-XSB and Quad-XSB (Midrange Servers)
[Figure: system boards on the MBU (CMU and IOU), shown as one XSB in Uni-XSB and as four XSBs in Quad-XSB]

FIGURE 1-2 Uni-XSB and Quad-XSB (High-end Servers)
[Figure: system boards made up of a CMU and an IOU, shown as one XSB in Uni-XSB and as four XSBs in Quad-XSB]

TABLE 1-1 and TABLE 1-2 list DR-related terms.

TABLE 1-1 Basic DR Terms

Term          Definition
Add           To connect a system board to a domain and configure it into the Solaris OS of the domain.
Delete        To unconfigure a system board from the Solaris OS of a domain and disconnect it from the domain.
Move          To disconnect a system board from a domain and then connect the system board to another domain.
Register      To register a system board in the domain component list (hereinafter called DCL).
Release       To delete a registered system board from the DCL.
Assign        To assign a system board to a domain.
Unassign      To release a system board from a domain.
Connect       To connect a system board to a domain.
Disconnect    To disconnect a system board from a domain.
Configure     To configure a system board in the Solaris OS.
Unconfigure   To unconfigure a system board in the Solaris OS.
Reserve       To reserve a system board such that it is assigned to or unassigned from a domain on the next reboot or power-cycle.
Install       To insert a system board into a system.
Remove        To remove a system board from a system.
Replace       To remove a system board and then mount it or a new system board, for system maintenance and inspection.

TABLE 1-2 Terms Related to Hardware Configurations

Term                          Definition
CPU/Memory board unit (CMU)   Unit equipped with a CPU module, and memory. High-end servers only.
Motherboard Unit (MBU)        Unit for midrange servers. A CMU is mounted on this board. Midrange servers only.
I/O board unit (IOU)          Unit equipped with a PCI card and a disk drive unit.
Physical System Board (PSB)   The PSB is made up of physical components, and can include 1 CMU and 1 IOU or just 1 CMU. In midrange servers, the CMU is mounted on a MBU. A PSB also can be used to describe a physical unit for addition/deletion/exchange of hardware. The PSB can be used in one of two methods, one complete unit (undivided status) or divided into four subunits.
eXtended System Board (XSB)   The XSB is made of physical components. In the XSB, the PSB can be either one complete unit (undivided status) or divided into four subunits. The XSB is a unit used for domain construction and identification, and also can be used as a logical unit.
Logical System Board (LSB)    A logical unit name assigned to an XSB. Each domain has its own set of LSB assignments. LSB numbers are used to control how resources such as kernel memory get allocated within domains.
System board                  The hardware resources of a PSB or an XSB. A System board is used to describe the hardware resources for operations such as domain construction and identification. In this manual, this refers to the XSB.
Uni-XSB                       One of the division types of a PSB. Uni-XSB is a name for when a PSB is logically only one unit (undivided status). It is a default value setting for the division type for a PSB. The division type can be changed by using the XSCF command setupfru(8). Uni-XSB may be used to describe a PSB division type or status.
Quad-XSB                      One of the division types of a PSB. Quad-XSB is a name for when a PSB is logically divided into four parts. The division type can be changed by using the XSCF command setupfru(8). Quad-XSB may be used to describe a PSB division type or status.

1.2

Basic DR Functions

This section describes the basic DR functions.

FIGURE 1-3 shows DR processing.

FIGURE 1-3 DR Processing Flow

[Figure: domain A and domain B, illustrating a DR operation between them.]


In the example shown in FIGURE 1-3, system board #2 is deleted from domain A and added to domain B. In this way, the physical configuration of the hardware (mounting locations) is not changed, but the logical configuration is changed for management of the system boards.

1.2.1

Adding a System Board

You can use DR to add a system board to a domain, provided that the board is installed in the system and not assigned to another domain. You can do so without stopping the Solaris OS running in the domain.

A system board is added in stages: connect, then configure.

In the add operation, the selected system board is connected to the target domain. Then, the system board is configured into the Solaris OS of the domain. At this point, addition of the system board is completed.

1.2.2

Deleting a System Board

You can use DR to delete a system board from a domain without stopping the Solaris OS running in that domain.

A system board is deleted in stages: unconfigure, then disconnect. If the board must be assigned to another domain, the delete operation must also include an unassign step.

In the delete operation, the selected system board is unconfigured from its domain by the Solaris OS. Then, the board is disconnected from the domain. At this point, deletion of the system board is completed.

1.2.3

Moving a System Board

You can use DR to reassign a system board from one domain to another without stopping the Solaris OS running in either domain.

This move function can change the configurations of both domains without physical removal and remounting of the system board.

The move operation for a system board is a serial combination of the “delete” and “add” operations. In other words, the selected system board is deleted from its domain and then added to the target domain.
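The relationship between the add, delete, and move operations can be sketched in a few lines of Python. This is only an illustrative model of the stage ordering described in this section. The function names add_stages, delete_stages, and move_stages and the reassign parameter are hypothetical; they are not part of any XSCF or Solaris interface, and the sketch does not call either one.

```python
# Illustrative model only. Stage names follow the text of this section;
# the function names and the "reassign" flag are hypothetical.


def add_stages():
    """Stages of an add operation: connect, then configure."""
    return ["connect", "configure"]


def delete_stages(reassign=False):
    """Stages of a delete operation.

    When the board must be assigned to another domain afterwards,
    the delete also includes an unassign step.
    """
    stages = ["unconfigure", "disconnect"]
    if reassign:
        stages.append("unassign")
    return stages


def move_stages():
    """A move is a delete from the source domain followed by an add."""
    return delete_stages(reassign=True) + add_stages()
```

In this model, move_stages() simply concatenates the two lists, which mirrors the statement that a move is a serial combination of the delete and add operations.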


1.2.4

Replacing a System Board

You can use DR to remove a system board from a domain and either add it back later, or replace it with another system board, provided both boards satisfy DR requirements as described in this document. You can do so without stopping the Solaris OS running in either domain.

You can replace a system board when exchanging hardware resources such as CPUs, memory, and I/O devices.

A system board is replaced successively in stages.

In the replace operation, the selected system board is deleted from the OS of the domain. Then, the system board is removed when it is ready to be released from its domain. After field parts replacement or other such task, the system board is reinstalled and added.

Note –

You cannot use DR to replace a system board in a midrange server because doing so would replace an MBU. To replace a system board in a midrange server, you must turn off the power of all domains, then replace the board without using DR commands.

1.3

Security

DR operations are executed based on privileges. For information about privileges and user accounts, see the SPARC Enterprise M4000/M5000/M8000/M9000 Servers Administration Guide.

1.4

Overview of DR User Interfaces

DR operations are performed through the command line interface (CLI) within the XSCF shell or through the browser-based user interface (BUI) in the XSCF Web provided by the eXtended System Control Facility (XSCF). These operations are collectively managed by the XSCF. Furthermore, XSCF security management restricts DR operations to administrators who have the proper access privileges.


For details of XSCF shell commands provided for DR, see Section 3.1, “How To Use the DR User Interface” on page 3-1 . XSCF Web is beyond the scope of this document.

See the SPARC Enterprise M4000/M5000/M8000/M9000 Servers XSCF User’s Guide for further information.


CHAPTER 2

What You Must Know Before Using DR

This chapter provides information you must know to successfully use the DR functions.

2.1

System Configuration

This section describes the conditions, premises, and actions for operating the DR functions to construct a system.

2.1.1

System Board Components

There are three types of system board components that can be added and deleted by DR: CPU, memory, and I/O device.

FIGURE 2-1 and FIGURE 2-2 show examples of a system board of a midrange server that is divided into one Uni-XSB, and into Quad-XSBs.

FIGURE 2-3 and FIGURE 2-4 show examples of a system board of a high-end server that is divided into one Uni-XSB, and into Quad-XSBs.

Note –

Due to diagnostic requirements, the DR function works only on boards that have at least one CPU and memory.


FIGURE 2-1 Example of Hardware Configuration (with Uni-XSB of Midrange Server)

[Figure: an MBU whose CMU (memory) and IOU (I/O devices) resources form XSB 00-0 and XSB 01-0.]


FIGURE 2-2 Example of Hardware Configuration (with Quad-XSBs of Midrange Server)

[Figure: an MBU whose CMU (memory) and IOU (I/O devices) resources are divided into XSB 00-0 through 00-3 and XSB 01-0 through 01-3.]


FIGURE 2-3 Example of a Hardware Configuration (with Uni-XSBs of High-end Server)

[Figure: a CMU (memory) and an IOU (I/O devices) forming the single XSB 00-0.]

FIGURE 2-4 Example of a Hardware Configuration (with Quad-XSBs of High-end Server)

[Figure: a CMU (memory) and an IOU (I/O devices) divided into XSB 00-0 through 00-3, each with memory and an I/O device.]

2.1.1.1

CPU

Using DR to change a CPU configuration is easier than using it to change the configuration of memory or an I/O device.

An added CPU is automatically recognized by the Solaris OS and becomes available for use.

A CPU to be deleted must meet the following conditions:

No running process is bound to the CPU to be deleted. If a running process is bound to the target CPU, you must unbind or stop the process.

The CPU to be deleted does not belong to any processor set. If the target processor belongs to a processor set, you must delete the CPU from the processor set by using the psrset(1M) command.

If the resource pools facility is in use by the domain, the CPU cannot be deleted unless the minimum processor set sizes can otherwise be maintained. Use the Solaris commands pooladm(1M) and poolcfg(1M) to check these parameters and, if necessary, adjust the sizes of the domain's resource pools.

Note –

These conditions also apply to movement of a system board.

If any of the above conditions are not met, the DR operation is stopped and a message is displayed. However, if you specify the deleteboard(8) command with the -f (force) option, these protections are ignored and DR continues the deletion process.

Note –

Exercise care when using the force option, as doing so introduces risk of domain failure.

To avoid this problem and automate the operations for CPUs, the Solaris OS provides the Reconfiguration and Coordination Manager (RCM) script function. For details of RCM, see Section 3.4, “RCM Script” on page 3-27.

2.1.1.2

Memory

The DR functions classify system boards by memory usage into two types:

Kernel memory board

User memory board

(1) Kernel Memory Board

A kernel memory board is a system board on which kernel memory (memory internally used by the Solaris OS and containing an OpenBoot PROM program) is loaded. Kernel memory cannot be removed from the system. But the location of kernel memory can be controlled, and kernel memory can be copied from one board to another.


To control whether a system board contains kernel memory, use one or more of the following features, which are described below: kernel cage, floating boards, and kernel memory assignment.

To copy kernel memory from one board to another, use the Copy-rename operation. Copy-rename makes it possible for you to perform DR operations on kernel memory boards.

(1.1) Kernel Cage

The kernel cage function must be in use for DR operations on memory to succeed.

Without the kernel cage, kernel memory could be assigned to all system boards, making it impossible to perform DR operations on memory. With the kernel cage, kernel memory is limited to a minimum set of system boards.

For details on enabling this function, see Section 2.3.2, “Settings of Kernel Cage Memory” on page 2-16.

(1.2) Floating Boards

A floating board is a system board that is designated to be moved easily to another domain. In general, kernel memory is not assigned to a floating board unless absolutely necessary.

However, kernel memory can be assigned to a floating board when one of the following is true:

The total amount of space available among non-floating boards is not enough to hold the kernel memory.

The deleteboard(8) command is used with its -f (force) option.

For details on enabling the floating board option for a system board, see Section 2.2.2.2, “Floating Board Option” on page 2-14. Also see the SPARC Enterprise M4000/M5000/M8000/M9000 Servers XSCF User’s Guide or the setdcl(8) man page for further details.

(1.3) Kernel Memory Assignment

When a domain is powered on, the Power On Self Test (POST) initially assigns an address space to each system board in that domain. The order in which address spaces are assigned depends on the LSB number and floating board option of each system board. The first address spaces are assigned to non-floating boards in ascending order of LSB number. Then, additional address spaces are assigned to floating boards, again in ascending order of their LSB numbers.


When the kernel cage is enabled, kernel memory is assigned to system boards in the order of their address spaces. The kernel cage begins in the first address space (which initially corresponds to the non-floating board with the lowest LSB number). If the kernel requires more memory, then the kernel cage expands to the next address space (which initially corresponds to the non-floating board with the next-lowest LSB number), and so on. The kernel cage extends into the address spaces of floating boards only if kernel memory is too large to fit in the address spaces of the non-floating boards.
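The ordering rules above can be illustrated with a short Python sketch. It is a simplified model, not the actual POST or Solaris implementation: the Board class, the memory_gb field, and the functions address_space_order and kernel_cage_boards are hypothetical names introduced here, and the model assumes that each address space holds exactly the memory of its board and that no copy-rename operation has exchanged address spaces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Board:
    lsb: int          # LSB number of the system board
    floating: bool    # True if the floating board option is set
    memory_gb: int    # hypothetical capacity figure used only by this sketch


def address_space_order(boards):
    """Order boards as address spaces are initially assigned:
    non-floating boards first, then floating boards, each group
    in ascending order of LSB number."""
    return sorted(boards, key=lambda b: (b.floating, b.lsb))


def kernel_cage_boards(boards, kernel_gb):
    """Return the LSB numbers whose address spaces the kernel cage
    would span for kernel_gb of kernel memory (simplified model)."""
    spanned = []
    remaining = kernel_gb
    for board in address_space_order(boards):
        if remaining <= 0:
            break
        spanned.append(board.lsb)
        remaining -= board.memory_gb
    return spanned
```

For example, with two non-floating boards (LSB 1 and LSB 2) and one floating board (LSB 0), the model places the cage on LSB 1 first, then LSB 2, and reaches the floating board only when the non-floating boards cannot hold all of the kernel memory.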

Note –

During a copy-rename operation, the address spaces initially assigned by POST are exchanged between system boards. The effects of this process persist through reboots of a domain. Therefore, kernel memory may be assigned in a seemingly different order until the domain has gone through a full poweroff(8) and poweron(8) cycle, as this pair of operations cancels the effects of copy-rename operations.

For details on assigning LSB numbers to system boards, see the SPARC Enterprise M4000/M5000/M8000/M9000 Servers XSCF User’s Guide or the setdcl(8) man page.

(1.4) Copy-rename

Kernel memory itself cannot be removed, but it can be transferred to another system board. A DR operation to delete a kernel memory board must first perform this transfer, which is called a copy-rename operation.

The Solaris OS selects the target for the copy-rename operation from among the available user memory boards. The following selection and preference criteria are in effect:

The copy-destination board must not yet contain any kernel memory. (It must be a user memory board.)

The copy-destination board must not be a floating board, unless the -f (force) option is used with the deleteboard(8) command.

The copy-destination board must contain at least as much physical memory as the system board being deleted.

If more than one system board satisfies all the selection criteria equally well, the one with the lowest LSB number is selected as the copy-destination board.

Note –

If no system boards meet the selection criteria, the DR operation to delete the kernel memory board will fail.
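The selection criteria above can be expressed as a small Python function. This is a sketch under stated assumptions, not the algorithm used by the Solaris OS: the Board class, its fields, and select_copy_destination are hypothetical names, memory is compared as a single size figure per board, and any preference ordering beyond the criteria listed above is not modelled.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Board:
    lsb: int                  # LSB number of the system board
    floating: bool            # True if the floating board option is set
    has_kernel_memory: bool   # True for a kernel memory board
    memory_gb: int            # hypothetical physical memory size


def select_copy_destination(
    source: Board, boards: Iterable[Board], force: bool = False
) -> Optional[Board]:
    """Pick a copy-destination board for a copy-rename of `source`.

    Returns None when no board qualifies, in which case the DR
    operation to delete the kernel memory board would fail.
    """
    candidates = [
        b
        for b in boards
        if b.lsb != source.lsb
        and not b.has_kernel_memory             # must be a user memory board
        and (force or not b.floating)           # floating allowed only with -f
        and b.memory_gb >= source.memory_gb     # at least as much memory
    ]
    # Among equally suitable boards, the lowest LSB number wins.
    return min(candidates, key=lambda b: b.lsb, default=None)
```

The force parameter stands in for the -f option of deleteboard(8). In this model it only widens the set of candidates to include floating boards.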


Once the copy-destination board has been selected, the Solaris OS performs a memory deletion on the selected user memory board.

Then, the kernel memory on the system board to be deleted is copied into memory on the selected copy-destination system board. The system is suspended while the copying is in progress. After all the memory is copied, the address space of the copy-destination board is renamed to that of the kernel memory board being deleted.

Note –

If the address space of a system board is renamed by a copy-rename operation, the change will persist across reboots of the domain. A poweroff(8)/poweron(8) cycle of the domain will reset the address space assignments and remove the effects of one or more copy-rename operations.

(2) User Memory Board

A user memory board is a system board on which no kernel memory is loaded.

Before deleting user memory, the system attempts to swap out the physical pages to the swap area. Sufficient swap space must be available for this operation to succeed.

(2.1) Locked Pages and ISM Pages

Some user pages are locked into memory and cannot be swapped out. These pages receive special treatment by DR.

Intimate Shared Memory (ISM) pages are special user pages which are shared by all processes. ISM pages are permanently locked and cannot be swapped out as memory pages. ISM is usually used by Data Base Management System (DBMS) software to achieve better performance.

Although locked pages cannot be swapped out, the system automatically moves them to the memory on another system board to avoid any problem concerning the pages. Note, however, that the deletion of user memory fails if there is not sufficient free memory size on the remaining system boards to hold the relocated pages.

Although such moving of memory (called save processing) requires a certain length of time, system operations can continue during save processing because it is executed as a background task.

Note –

The Dynamic Intimate Shared Memory (DISM) is a feature that allows applications to dynamically resize their ISM segments. Some applications use RCM scripts to resize their DISM segments to assist DR. See the Solaris man page for rcmscript(4).

Deleting or moving a user memory board fails if either of the following statements is true:

The swap area does not have sufficient free space to save data from the user memory to be deleted.

There are too many locked or ISM pages to be covered by the memory on other system boards.

2.1.1.3

I/O Device

(1) Adding an I/O Device

The device driver processing executed by the Solaris OS is based on the premise that all device drivers dynamically recognize newly added devices. In the domain where DR is performed, all device drivers must support the addition of devices by DR.

Upon the addition of an I/O device by DR, the I/O device is reconfigured automatically.

The path name of a device file under /dev is configured as the path name of the newly added I/O device to make the I/O device accessible.

(2) Deleting an I/O Device

An I/O device can be deleted when both of the following conditions are met:

The device to be deleted is not in use in the domain where the DR operation is to be performed.

The device drivers in the domain where the DR operation is to be performed support DR.

In most cases the device to be deleted is in use. For example, the root file system or any other file systems requisite for operation cannot be unmounted.

To solve this problem, you can configure the system by using redundant configuration software to make the access path to each requisite I/O device redundant. For a disk drive unit, you can make the unit redundant by using disk mirroring software.

If a device driver that does not support DR is used in the domain, all access to I/O devices controlled by the device driver must be stopped, and the device driver must be unloaded by using the modunload(1M) command.


Note –

Do not move a device that is part of a redundant configuration from one domain to another domain. Two domains simultaneously accessing the same device through different paths could have disastrous consequences, such as data corruption.

2.1.2

System Board Configuration Requirements

XSCF enables the Uni-XSB or Quad-XSB setting according to the configuration conditions to determine the division type. If the CPU or memory configuration does not meet the configuration conditions, neither Uni-XSB nor Quad-XSB can be set as the division type.

For the CPU configuration and memory configuration conditions set for the division types, refer to the System Overview for your system.

The setting of division type may be changed for DR operation if a domain operation requirement dictates changing of a necessary hardware resource when a system board is added to the domain.

In such cases, the CPU configuration and memory configuration conditions for changing the division type are the same as described above. For the conditions, refer to the System Overview for your system.

Note –

Changing the division type before a DR operation may not be possible depending on the system board status or DR operation, even if configuration conditions have been met.

2.1.3

System Board Pool Function

The system board pooling function places a specific system board in the status where that board does not belong to any domain.

This function can be effectively used to move a system board among multiple domains as needed.

For example, a system board can be added from the system board pool to a domain where CPU or memory has a high load. When the added system board becomes unnecessary, the system board can be returned to the system board pool.

All system boards that are targets of DR operations must be registered in the target domain’s Domain Component List (DCL). A domain’s DCL, managed by XSCF, is a list of system boards that are, or are to be, attached to that domain. The DCL of each domain contains not only information of registered system boards but also domain information and option information of each system board.

Moreover, a system board that is pooled can be assigned to a domain only when it is registered in the DCL of that domain. Pooled system boards must be properly managed.

You can add and delete system boards by combining the system board pooling function with the floating board, omit-memory, and omit-I/O options described in Section 2.2, “Conditions and Settings Using XSCF” on page 2-12.

2.1.4

Checklists for System Configuration

This section describes the prerequisites and the checklists for configuring the system for DR.

1. Redundant Configuration of I/O Devices - Before a system board can be replaced, any I/O device connected to that board must be temporarily disconnected.

You should use redundant-configuration software to prevent any problem that might be caused by disconnection of an I/O device that would affect a job process. You should also confirm that the driver and software support DR before performing a DR operation.

2. Selection of PCI Cards Supporting DR - All PCI cards and I/O device interfaces on a system board must support DR. If not, you cannot execute DR operations on that system board. You must turn off the power supply to the domain before performing maintenance and installation.

3. Confirmation of DR Compliance of Drivers and Other Software - You must confirm that all I/O device drivers and software installed in the system support DR and allow the I/O device operations of DR.

You should also apply the latest patches to the drivers and other software before performing DR.

4. Allocation of Sufficient Memory and Distributed Swap Areas - You must allocate sufficient memory resources to be used when the memory on a system board is disconnected. Performing a DR operation with a high load already applied to memory may significantly lower job process performance and DR operability.

5. Consideration of Hardware Configuration and System Boards on Which Kernel Memory is Loaded - Before determining the hardware configuration and operations, you must understand how job processes are affected by DR operations on system boards on which CPUs, memory, and I/O devices are mounted.

You can perform DR operations on system boards that contain kernel memory.

When disconnecting a system board on which kernel memory is loaded, DR copies kernel memory into the memory on another system board. The copy operation is based on the premise that the copy-destination system board does not already contain any kernel memory.

When kernel memory is copied, the Solaris OS is temporarily suspended. Therefore, you must understand the effect of disconnecting the network connection with remote systems and other influences of the DR operation on job processes before determining system operations.

2.1.5

Reservation of Domain Configuration Changes

Besides letting you add, delete, or move system boards dynamically, DR also lets you order such reconfiguration to take place the next time the affected domains are turned on or turned off, or the domain is rebooted. Use the addboard(8), deleteboard(8), or moveboard(8) command with the -c reserve option to specify these actions.

Some of the reasons you might want to reserve a domain change include:

A hardware resource cannot be dynamically reconfigured by DR for business or operational reasons.

Domain configuration settings should not be immediately changed.

You need to delete a system board that has a driver or PCI card that does not support DR, so you want to leave the current domain configuration settings unchanged and have the configuration change take effect when the domain is next rebooted.

You want to assign a floating board to a specific domain beforehand to prevent the system board from being acquired by another domain.

For how to reserve domain changes, see Section 3.1.10, “Reserving a Domain Configuration Change” on page 3-24.

2.2

Conditions and Settings Using XSCF

This section describes the operating conditions required for XSCF to start DR operations and the settings that are established by XSCF.

2.2.1

Conditions Using XSCF

The DR operation to add a system board cannot be executed when the system board has only been mounted. The DR operation is enabled by registering the system board in the DCL by using the XSCF shell or XSCF Web. You must confirm that the system board to be added is registered in the DCL before performing the DR operation.

As a matter of course, system boards to be deleted, moved, or replaced have already been registered in the DCL. You need not confirm that these boards have been registered in the DCL.

For details about the DCL and how to register system boards in the DCL and to confirm registration, refer to the SPARC Enterprise M4000/M5000/M8000/M9000 Servers XSCF User’s Guide.

2.2.2

Settings Using XSCF

The DR functions provide users with some options to avoid the complexities of reconfiguration and memory allocation with the Solaris OS, and make DR operations smoother. You can set up these options using the XSCF shell or XSCF Web. This section describes the following options:

Configuration policy option

Floating board option

Omit-memory option

Omit-I/O option

For details of how to set up the options, refer to the SPARC Enterprise M4000/M5000/M8000/M9000 Servers XSCF User’s Guide or the setdcl(8) man page.

2.2.2.1

Configuration Policy Option

DR operations involve automatic hardware diagnosis to add or move a system board safely. If a hardware error is detected during this diagnosis, components are degraded according to the setting of this option, which specifies the range of degradation. This option also applies to the initial diagnosis performed at domain startup, not only to DR operations.

The unit of degradation can be a component where a hardware error is detected, the system board (XSB) where the component is mounted, or a domain.

This option is set by using the setdcl(8) command. The values that can be set and the corresponding units of degradation are explained in TABLE 2-1.

The default value of the configuration policy option is FRU.

Note –

Enable the configuration policy option when the power supply of the domain is turned off.

TABLE 2-1 Unit of Degradation

Value    Unit of degradation
FRU      Hardware is degraded in units of components such as CPU and memory.
XSB      Hardware is degraded in units of system boards (XSB).
System   Hardware is degraded in units of domains or the relevant domain is stopped without degradation.

2.2.2.2

Floating Board Option

The floating board option controls kernel memory allocation.

Upon deletion of a system board on which kernel memory is loaded, the OS is temporarily suspended. The suspended status affects job processes and may disable DR operations. To avoid this problem, use the floating board option to set the priority of kernel loading into the memory of each system board, which increases the likelihood of successful DR operations.

To move a system board among multiple domains, this option can be enabled for the system board to facilitate the system board move.

The value of this option is “true” (to enable the floating board setting) or “false” (to disable the floating board setting). The default is “false”.

A system board with “true” set for this option is called a floating board. A system board with “false” set for this option is called a non-floating board.

Kernel memory is allocated to the non-floating boards in a domain by priority in ascending order of LSB number. When only floating boards are set in the domain, one of them is selected and used as a kernel memory board. In that case, the status of the board is changed from floating board to non-floating board. When a copy-rename operation is performed as part of a system board deletion or removal, and only a floating board can serve as the copy destination because no non-floating board can be used, specify the force option (-f).

The floating board option setting does not change when the force option is used.

Note –

Enable the floating board option when the system board is in the system board pool or when the system board is not connected to the domain configuration.


2.2.2.3

Omit-memory Option

When the omit-memory option is enabled, the memory on a system board cannot be used in the domain.

Even when a system board actually has memory, this option enables you to make the memory on the system board unavailable through a DR operation to add or move the system board.

This option can be used when the target domain needs only the CPU (and not the memory) of the system board to be added.

If a domain has a high load on memory, an attempt to delete a system board from the domain may fail. This failure results if a timeout occurs in memory deletion processing (saving of the memory of the system board to be disconnected onto a disk by paging) when many memory pages are locked because of high load. To prevent this situation, you can enable the omit-memory option to facilitate the DR operation beforehand.

Note –

For diagnosis and management of a system board, memory must be mounted on the system board even if the omit-memory option is enabled. Enabling the omit-memory option reduces available memory in the domain and may lower system performance. This option must be used in consideration of the influence on jobs.

The value of this option is “true” (omit memory) or “false” (do not omit memory).

The default value is “false”.

Note –

Enable the omit-memory option when the system board is in the system board pool or when the system board is not connected to the domain configuration.

2.2.2.4

Omit-I/O Option

The omit-I/O option disables the PCI cards, disk drives, and basic local-area network (LAN) ports on a system board to prevent the target domain from using them.

Set this option to “true” if the domain needs to use only the system board’s CPU and memory.

Set this option to “false” if the domain needs to use the system board’s PCI cards and I/O units. In this case you must fully understand the restrictions on use of these I/O components. You must also stop the software (for example, application programs or daemons) that uses them before you attempt to delete or move the system board.

The value of this option is “true” (omit I/O units) or “false” (do not omit I/O units).

The default value is “false”.


Note –

Enable the omit-I/O option when the system board is in the system board pool or when the system board is not connected to the domain configuration.

2.3

Conditions and Settings Using Solaris OS

This section describes the operating conditions and settings required for DR operations.

2.3.1

I/O and Software Requirements

As described in Section 2.1, “System Configuration” on page 2-1, all I/O device drivers and software installed in a domain where DR is to be used must support DR.

The device drivers that support DR must also support the following DDI and DKI entries:

attach(9E): DDI_ATTACH and DDI_RESUME

detach(9E): DDI_DETACH and DDI_SUSPEND

If a device driver that does not support DR is present, the deletion of a system board might fail.

Even if the DDI_DETACH interface is supported, DDI_DETACH processing fails when the relevant driver is in use. Before starting the deletion of a system board, you must stop using all devices on the system board to be deleted.

The device drivers that do not support DR must be unloaded before a system board is deleted. To unload a device driver, you must stop using all I/O devices controlled by the device driver. To unload a device driver, you can use the Solaris command modunload(1M). Then, you can reload the driver for the remaining instances and resume using those remaining instances after deleting the system board.

2.3.2

Settings of Kernel Cage Memory

Kernel cage memory is a function used to minimize the number of system boards to which kernel memory is allocated. Kernel cage memory is enabled by default in the

Solaris 10 OS.

2-16

SPARC Enterprise Mx000 Servers Dynamic Reconfiguration User’s Guide • September 2007

If the kernel cage is disabled, the system may run more efficiently, but kernel memory will be spread among all boards and DR operations will not work on memory.

To determine whether kernel cage memory is enabled after the system has been rebooted, check the following message output from the /var/adm/messages file:

NOTICE: DR kernel Cage is ENABLED

If the kernel cage is disabled, the message will be:

NOTICE: DR kernel Cage is DISABLED

In most cases the kernel cage should be enabled. However, you must consider actual operations before changing the setting. If you do not need to perform DR operations, you do not need to enable the kernel cage.

To enable kernel cage memory, remove or comment out the following setting from the /etc/system file:

set kernel_cage_enable=0

The OS must be rebooted to make the new setting effective.
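The check of /var/adm/messages described above can be scripted. The following is a minimal sketch, not a supported tool: it assumes that the two NOTICE strings appear in the log exactly as quoted in this section, and the function names are hypothetical. Because the log can hold messages from several boots, the sketch reports the most recent notice.

```python
import re
from typing import Optional

# Message text as quoted in this section. Any timestamp or host prefix
# added by syslog is ignored by searching anywhere in the line.
_CAGE_RE = re.compile(r"NOTICE: DR kernel Cage is (ENABLED|DISABLED)")


def kernel_cage_enabled(log_text: str) -> Optional[bool]:
    """Return True or False for the most recent cage notice, or None if absent."""
    state = None
    for line in log_text.splitlines():
        match = _CAGE_RE.search(line)
        if match:
            # A later notice (a later boot) overrides an earlier one.
            state = match.group(1) == "ENABLED"
    return state


def read_kernel_cage_state(path: str = "/var/adm/messages") -> Optional[bool]:
    """Read the log file named in this section and report the cage state."""
    with open(path, errors="replace") as handle:
        return kernel_cage_enabled(handle.read())
```

A result of None means that no notice was found in the file, for example because the log has been rotated since the last reboot.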

2.4

Status Management

The success of DR operations depends on the status of domains and system boards.

This section describes the status information on the domains and system boards managed by XSCF, and the points to be noted for a better understanding of DR operation conditions.

2.4.1

Domain Status

XSCF manages the status of each domain.

You can display and reference the status of each domain through a user interface provided by XSCF. For details of the user interface, see Chapter 3, DR User Interface.

XSCF manages the following aspects of domain status:

TABLE 2-2

Domain Status

Status                        Description
Powered Off                   Domain power is off.
Initialization Phase          POST processing or OpenBoot PROM initialization is in progress.
OpenBoot Executing Completed  Initialization of OpenBoot PROM is completed.
Booting                       Solaris OS is being booted or, because the domain has been shut down or reset, the system is in the OpenBoot PROM running state or is suspended in the OpenBoot PROM (ok prompt) state.
Running                       Solaris OS is running.
Shutdown Started              Solaris OS is being shut down.
Panic State                   Solaris OS has panicked.

To perform a DR operation for a system board, you must determine the method of DR operation according to the status of the relevant domain. The domain status conditions under which DR operations are available are described in the individual sections of Chapter 3, DR User Interface. For details of each method used for DR, see the relevant section.

2.4.2

System Board Status

XSCF manages system board status in units of XSB for the following management items:

TABLE 2-3

System Board Management Items

Management item  Description
Power            Power on/off status of system board
Test             Diagnostic status of system board
Assignment       Status of assignment to domain
Connectivity     Status of connection to domain
Configuration    Status of addition into Solaris OS

The table below lists the status types available for individual management items.

TABLE 2-4

System Board Management Items

Management item  Status        Description
Power            Power Off     The system board is powered off and cannot be used.
                 Power On      The system board is powered on.
Test             unmount       The system board is not mounted or cannot be recognized, perhaps because it is faulty.
                 unknown       The system board is not being diagnosed.
                 testing       Testing
                 passed        Passed
                 failed        A system board error was detected and the board has been deconfigured.
Assignment       unavailable   The system board cannot be used. The reason might be one of the following:
                               - The board is faulty.
                               - The board is not listed in the domain’s DCL.
                               - The domain or board is not configured.
                               - The board is assigned to another domain.
                 available     The system board can be used and is registered in the domain’s DCL. The system board is in this status when in the system board pool.
                 assigned      The system board is reserved or assigned to the domain.
Connectivity     disconnected  The system board is disconnected from the domain configuration and is in the system board pool.
                 connected     The system board is connected to the domain configuration.
Configuration    unconfigured  The hardware resources of the system board have been deleted from the Solaris OS.
                 configured    The hardware resources of the system board have been added into the Solaris OS.

XSCF changes and configures system board status according to the conditions under which a system board is installed, removed, or registered in the DCL, or when a domain is started or stopped. System board status also changes when the system board is added, deleted, or moved by DR.

To perform a DR operation for a system board, you must determine the method of DR operation according to the status of the target system board.

You can display and reference the status of each system board via a user interface provided by XSCF. For details of the user interface, see Chapter 3, DR User Interface.

2.4.3

Flow of DR Processing

This section describes the flow of DR processing and the changes in system board status during individual DR operations.

2.4.3.1

Flowchart: Adding a System Board

The flow of DR operations and the transition of system board status when a system board has been added or reserved for addition are described in the schematic flowchart, below.

Each system board status indicated in FIGURE 2-5 is the main status that is changed.

FIGURE 2-5

Flow of System Board Addition Processing

(Flowchart. The stages and the main system board status at each stage are as follows.)

1. System board pool. Test: passed; Assignment: available.
2. DCL registration status, reached through addition or reservation (DCL registration process). Test: passed; Assignment: assigned.
3. Diagnosis, started by a request to add the system board or by a domain reboot after registration or reservation. Test: testing; Assignment: assigned. If an error is found (error status): Test: fail; Assignment: assigned.
4. Diagnosis completed (domain configuration change process). Test: passed; Assignment: assigned; Connectivity: disconnected.
5. Connection to domain. Test: passed; Assignment: assigned; Connectivity: connected.
6. Request of addition into OS (process of addition into OS). Test: passed; Assignment: assigned; Connectivity: connected; Configuration: unconfigured.
7. Incorporation into OS. Test: passed; Assignment: assigned; Connectivity: connected; Configuration: configured.

2.4.3.2

Flowchart: Deleting a System Board

The flow of DR operations and the transition of system board status when a system board has been deleted or reserved for deletion are described in the schematic flowchart, below.

Each system board status indicated in FIGURE 2-6 is the main status that is changed.

FIGURE 2-6

Flow of System Board Deletion Processing

(Flowchart. The stages and the main system board status at each stage are as follows.)

1. Status of addition into OS (before deletion or deletion reservation). Test: passed; Assignment: assigned; Connectivity: connected; Configuration: configured.
2. Status of deletion from OS, reached by a request of deletion from OS. Test: passed; Assignment: assigned; Connectivity: connected; Configuration: unconfigured.
3. Domain configuration change process, started when deletion from OS is completed or by a reboot of the domain after reservation. Test: passed; Assignment: assigned; Connectivity: connected.
4. Disconnection from domain. Test: passed; Assignment: assigned; Connectivity: disconnected.
5. DCL registration status, reached when the domain configuration change is completed (deletion from domain). Test: passed; Assignment: assigned.
6. System board pool, reached by deletion from DCL. Test: passed; Assignment: available.

2.4.3.3

Flowchart: Moving a System Board

The flow of DR operations and the transition of system board status when a system board has been moved or reserved for a move are described in the schematic flowchart, below.

Each system board status indicated in FIGURE 2-7 is the main status that is changed.

For the flow of system board addition processing or deletion processing and the related system board status, see Section 2.4.3.1, “Flowchart: Adding a System Board” on page 2-20 or Section 2.4.3.2, “Flowchart: Deleting a System Board” on page 2-21, respectively.

FIGURE 2-7

Flow of System Board Move Processing

(Flowchart. The figure shows two paths, the move process and the move reservation process. The stages and the main system board status at each stage are as follows.)

1. Original domain. Move process: deletion of system board in original domain, then deletion completed. Move reservation process: reservation to delete system board in original domain, then reboot of original domain.
2. Process to change domain configuration in original domain. Assignment: assigned; Connectivity: disconnected; Configuration: unconfigured.
3. Unassignment from domain. Assignment: unavailable; Connectivity: disconnected; Configuration: unconfigured.
4. Configuration change of original domain completed; process to change configuration of destination domain. Assignment: unavailable; Connectivity: disconnected; Configuration: unconfigured.
5. Assignment to domain. Assignment: assigned; Connectivity: disconnected; Configuration: unconfigured.
6. DCL registration status in destination domain. Move process: request to add system board to destination domain. Move reservation process: registration for destination domain completed (status of assignment to destination domain).

2.4.3.4

Flowchart: Replacing System Board

The flow of DR operations and the transition of system board status when a system board has been replaced are described using the schematic flowchart.

Each system board status indicated in FIGURE 2-8 is the main status that is changed.

The statuses shown in the figure before and after replacement are examples. The actual status after hardware replacement may not match the indicated status.

For the flow of system board addition processing or deletion processing and the related system board status, see Section 2.4.3.1, “Flowchart: Adding a System Board” on page 2-20 or Section 2.4.3.2, “Flowchart: Deleting a System Board” on page 2-21, respectively.

For details of hardware replacement operations, see the service manual for your system.

FIGURE 2-8

Flow of System Board Replacement Processing

(Flowchart. The figure shows the flow for a board in DCL registration status and for a board in the system board pool.)

1. Deletion process. Deleting a system board (request to delete from DCL registration status); system boards are also deleted from the system board pool. DCL registration status: Assignment: assigned. System board pool: Assignment: available.
2. Replacement process. Hardware replacement and diagnosis, then replacement completed. DCL registration status: Test: passed; Assignment: assigned. System board pool: Test: passed; Assignment: available.
3. Addition process. Addition of system board.

2.5

Operation Management

This section describes the premises and the actions for DR operations.

2.5.1

I/O Device Management

Upon the addition of a system board, device information is reconfigured automatically. However, addition of the system board and the reconfiguration of device information do not end at the same time.

Sometimes, a device link in the /dev directory is not automatically cleaned up by the devfsadmd(1M) daemon. You can clean up such a device link manually by using devfsadm(1M). See the devfsadm(1M) Solaris man page for details.

2.5.2

Swap Area

The size of available virtual memory is the sum of the size of memory mounted in the system and the size of the swap area on the disk. You must ensure that the size of available memory is sufficient for all necessary operations.

2.5.2.1

Swap Area at System Board Addition

By default in Solaris, the swap area is also used to store a system crash dump. You should use a dedicated dump device instead. See the dumpadm(1M) Solaris man page. The default swap area used to store the crash dump varies in size according to the size of mounted memory.

The size of the dump device used to store the crash dump must be larger than the size of mounted memory. When a system board is added, thereby increasing the size of mounted memory, the dump device must be reconfigured as required. For details, see the dumpadm(1M) Solaris man page.

2.5.2.2

Swap Area at System Board Deletion

When you delete a system board, the memory of the system board is swapped to the swap area of the disks. The available swap area is decreased by the memory size to be deleted. So, before you execute a delete board command, check the total swap area to verify that enough free swap space is available to hold the board's physical memory contents. Be aware that some of the total swap space may be supplied by disks that are attached to the board to be deleted. When making your assessment, be certain to also account for the swap space that will be lost.

If the size of available memory (for example, 1.5 gigabytes) is larger than the size of the deleted memory (for example, 1 gigabyte), the total size of available memory will be 0.5 gigabytes after the system board is deleted.

If the size of available memory (for example, 1.5 gigabytes) is smaller than the size of the deleted memory (for example, 2 gigabytes), the attempt to delete the system board will fail.

To determine the size of the currently available swap area, execute the swap -s command on the OS and check the size that is reported as available. For details, refer to the swap(1M) Solaris man page. The size of the physical memory of the system board to be deleted, and information on the I/O devices connected to it, can be confirmed with the showdevices(8) command. See Section 3.1.4, “Display Device Information” in Chapter 3, DR User Interface, or the showdevices(8) man page.

Refer to Appendix B for a more complete example.
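The arithmetic in this section can be written as a short calculation. The following sketch is illustrative only: the function name and its parameters are hypothetical, all sizes are in gigabytes, and the swap_on_board_gb parameter represents the swap space supplied by disks attached to the board to be deleted. The text does not say what happens when the two sizes are exactly equal, so the sketch treats that case as a success, which is an assumption.

```python
def remaining_after_board_deletion(available_gb: float,
                                   board_memory_gb: float,
                                   swap_on_board_gb: float = 0.0) -> float:
    """Return the available memory left after deletion, in gigabytes.

    Raises ValueError when the board's memory does not fit, which is the
    case in which this section says the deletion attempt will fail.
    """
    # Swap space on disks attached to the board is lost with the board.
    usable_gb = available_gb - swap_on_board_gb
    remaining_gb = usable_gb - board_memory_gb
    if remaining_gb < 0:
        raise ValueError(
            "not enough available memory: %.2f GB usable, %.2f GB needed"
            % (usable_gb, board_memory_gb)
        )
    return remaining_gb
```

With the figures used above, remaining_after_board_deletion(1.5, 1.0) returns 0.5, and remaining_after_board_deletion(1.5, 2.0) raises ValueError.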

2.5.3

Real-time Processes

The Solaris OS is temporarily suspended when a kernel memory board is deleted or moved. If your system has any real-time requirements (such as might be indicated by the presence of real-time processes), be aware that such a DR operation could significantly affect these processes.

2.5.4

Memory Mirror Mode

The memory mirror mode is a function used to duplex memory to ensure the hardware reliability of memory. When memory mirror mode is enabled, the domain can continue operation even if a fault occurs in a part of memory (provided that the fault is recoverable).

Memory mirror mode cannot be set in some division types of PSB. Please see the SPARC Enterprise M4000/M5000/M8000/M9000 Servers XSCF User’s Guide for the availability of memory mirroring.

Enabling memory mirror mode does not restrict any DR functions. However, you must consider the domain configuration and operation when enabling memory mirror mode.

For example, when a kernel memory board with memory mirror mode enabled is deleted or moved, kernel memory is moved from the kernel memory board to another system board. Kernel memory is moved normally even if memory mirror mode is disabled for the move-destination system board. However, this operation results in lowered reliability of memory on the new kernel memory board.

You must properly plan and decide the setting of memory mirror mode by fully considering the requirements for the domain configuration and operations.

2.5.5

Capacity on Demand (COD)

DR works the same on COD boards as on other system boards, but standard COD restrictions, such as licensing, still apply.

For detailed information on COD boards, refer to SPARC Enterprise M4000/M5000/M8000/M9000 Servers Administration Guide.

2.5.6

XSCF Failover

An XSCF failover might prevent a DR operation from completing. To check, log in to the active XSCF, check the status of the system board and, if necessary, repeat the DR operation.

2.5.7

Kernel Memory Board Deletion

If an XSCF failure or failover occurs during the copy-rename phase of a deleteboard(8) or moveboard(8) operation, the Solaris OS may panic and display the following message:

Irrecoverable FMEM error error_code

If you see this message, log in to the XSCF again to check status. You may have to reboot the Solaris OS and, on the XSCF, check system board status, specify the kernel memory board, and repeat the DR operation.

2.5.8

Deletion of Board with DVD Drive

To delete the system board to which the server’s DVD drive is connected, execute the following steps:

1. Stop the vold(1M) daemon by disabling the volfs service.

# /usr/sbin/svcadm disable volfs

2. Execute the DR operation.

3. Restart the vold(1M) daemon by enabling the volfs service.

# /usr/sbin/svcadm enable volfs

For details, see the vold(1M) Solaris man page.
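The three steps above can be sketched as a small wrapper that keeps the order fixed. This is an illustration under stated assumptions, not a supported procedure: the function name, the run parameter, and the dr_operation callback are hypothetical. The callback stands for however the DR operation is carried out at your site, because the DR commands themselves are issued from the XSCF, not from the domain. Restarting volfs even when the DR step raises an error is a design choice of this sketch.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], None]


def _run(cmd: Sequence[str]) -> None:
    """Run a command in the domain and raise if it exits with an error."""
    subprocess.run(list(cmd), check=True)


def delete_board_with_dvd(dr_operation: Callable[[], None],
                          run: Runner = _run) -> None:
    # Step 1: stop the vold(1M) daemon by disabling the volfs service.
    run(["/usr/sbin/svcadm", "disable", "volfs"])
    try:
        # Step 2: execute the DR operation (hypothetical callback).
        dr_operation()
    finally:
        # Step 3: restart the vold(1M) daemon by enabling the volfs service.
        run(["/usr/sbin/svcadm", "enable", "volfs"])
```

Passing a different run function makes it possible to rehearse the sequence without touching the volfs service.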

CHAPTER 3

DR User Interface

This chapter describes the user interfaces for DR.

3.1

How To Use the DR User Interface

XSCF provides two user interfaces for DR: the command line interface by XSCF shell, and the browser-based user interface by XSCF Web.

This section describes the main XSCF shell commands used for DR. For other related commands, see Section 3.2, “Command Reference” on page 3-25. For XSCF Web, see Section 3.2, “Command Reference” on page 3-25 and Section 3.3, “XSCF Web” on page 3-27.

XSCF shell commands for DR operations are classified into two types: DR display and DR operation commands.

TABLE 3-1

DR Display Commands

Command name      Function
showdcl           Display the DCL and domain status.
showdomainstatus  Display domain status.
showboards        Display system board information.
showdevices       Display information about the CPUs, memory, and I/O devices on system boards.
showfru           Display PSB configuration information.

TABLE 3-2

DR Operation Commands

Command name  Function
setdcl        Update and edit the DCL.
setupfru      Set the division type and memory mirror mode for a PSB.
addboard      Add a system board to a domain.
deleteboard   Delete a system board from a domain.
moveboard     Move a system board between domains.

The sections below describe the DR display and DR operation commands in detail and show examples. For details of the options, operands, and usage of these commands, refer to SPARC Enterprise M4000/M5000/M8000/M9000 Servers XSCF Reference Manual.

Note –

Use of the user interfaces with XSCF shell and XSCF Web is restricted to selected administrators, and requires administrator privileges for DR operations.

When system boards are shared by multiple administrators, the administrators must carefully prepare and plan secure DR operations.

3.1.1

Displaying Domain Information

The showdcl(8) command displays domain information including the domain ID, configured system board numbers, and domain status in list format.

The showdcl(8) command is used before a DR operation to determine whether the domain status permits DR operation, and confirm the registration of the DR-target system board in the DCL. The showdcl(8) command is also used after a DR operation to confirm domain status and configuration.

To change domain settings or register a system board in the DCL, use the setdcl(8) command. To change PSB settings, use the setupfru(8) command.

The following examples show the format and specifiable options of the showdcl(8) command.

showdcl [-v] -a
showdcl [-v] -d domain_id [-l lsb ...]
showdcl -h

TABLE 3-3

Options of the showdcl Command

Option        Description
-a            Displays configuration information and status of all domains.
-v            Displays detailed domain configuration information.
-h            Displays usage information.
-d domain_id  Displays information about the specified domain, where domain_id is the domain number, possibly 0 to 23, depending on server model. Only one domain ID can be specified.
-l lsb        Displays information about the specified logical system board (LSB), numbered 00 to 15. For information about multiple LSBs, list board numbers separated by a space. For example: showdcl -l 00 -l 01

The table below lists the items displayed by the showdcl(8) command.

TABLE 3-4

Items of Domain Information to be Displayed

Display items  Description
DID            Domain ID.
LSB            Logical system board number.
XSB            System board number.
Status         Domain status
                 Powered Off                   Domain power is off.
                 Initialization Phase          POST processing or OpenBoot PROM initialization is in progress.
                 OpenBoot Executing Completed  Initialization of OpenBoot PROM is completed.
                 Running                       Solaris OS is running.
                 Shutdown Started              Solaris OS is being shut down.
                 Panic State                   Solaris OS panic occurred.
No-mem         Setting of omit-memory option
                 true    Enabled: Solaris OS does not use memory.
                 false   Disabled: Solaris OS uses memory.
No-IO          Setting of omit-I/O option
                 true    Enabled: Solaris OS does not use I/O device.
                 false   Disabled: Solaris OS uses I/O device.
Float          Setting of floating board option
                 true    Enabled: Board is designated as a floating board.
                 false   Disabled: Board is not designated as a floating board.
Cfg-policy     Setting of configuration policy
                 FRU     Degradation in units of components.
                 XSB     Degradation in units of XSB.
                 System  Stopping of domain without degradation.

The following shows examples of displays by the showdcl(8) command.

Example 1: Display of information on domain #0

XSCF> showdcl -d 0
DID   LSB   XSB    Status
00                 Running
      00    00-0
      04    01-0
      05    01-1
      06    01-2
      07    01-3
      08    02-0

Example 2: Display of detailed information on domain #0

XSCF> showdcl -v -d 0
DID   LSB   XSB    Status    No-Mem   No-IO   Float   Cfg-policy
00                 Running                            FRU
      00    00-0             False    False   False
      01    -
      02    -
      03    -
      04    01-0             False    False   False
      05    01-1             False    True    False
      06    01-2             False    True    True
      07    01-3             True     True    True
      08    02-0             True     True    True
      09    -
      10    -
      11    -

3.1.2

Displaying Domain Status

The showdomainstatus(8) command lists the domains in the system and their status. This command displays the same domain status information as the showdcl(8) command.

Use the showdomainstatus(8) command to check domain status before and after a DR operation.

The following examples show the format and options of the showdomainstatus(8) command:

showdomainstatus -a
showdomainstatus -d domain_id
showdomainstatus -h

TABLE 3-5

Options of the showdomainstatus Command

Option        Description
-a            Displays the status of all domains.
-d domain_id  Displays information about the specified domain, where domain_id is the domain number, possibly 0 to 23, depending on server model. Only one domain ID can be specified.
-h            Displays usage information.

The table below lists the items displayed by the showdomainstatus(8) command.

TABLE 3-6

Items of Domain Information to be Displayed

Display items  Description
DID            Domain ID
Status         Domain status
                 Powered Off                   Domain power is off.
                 Initialization Phase          POST processing or OpenBoot PROM initialization is in progress.
                 OpenBoot Executing Completed  Initialization by OpenBoot PROM is completed.
                 Booting/OpenBoot PROM prompt  Solaris OS is being booted or, because the domain has been shut down or reset, the system is in the OpenBoot PROM running state or is suspended in the OpenBoot PROM (ok prompt) state.
                 Running                       Solaris OS is running.
                 Shutdown Started              Solaris OS is being shut down.
                 Panic State                   Solaris OS panic occurred.

The following example shows a display of the showdomainstatus(8) command.

Example: Display of information on all domains

XSCF> showdomainstatus -a
DID   Status
00    Running
01    Powered Off
02    -
03    Running

3.1.3

Displaying System Board Information

The showboards(8) command displays system board information including the domain ID of the domain to which the target system board belongs and various kinds of system board status in list format.

Use the showboards(8) command before a DR operation to determine whether the system board status permits DR operations, and to confirm the domain ID of the domain to which the target system board belongs. The showboards(8) command is also used after a DR operation to confirm system board status.

To change domain settings or register a system board in the DCL, use the setdcl(8) command. To change PSB settings, use the setupfru(8) command.

The following examples show the format and options of the showboards(8) command.

showboards [-v] -a [-c sp]
showboards [-v] -d domain_id [-c sp]
showboards [-v] xsb
showboards -h

TABLE 3-7

Options of the showboards Command

Option        Description
-v            Displays detailed information about the system board.
-a            Displays information about all mounted system boards.
-h            Displays the usage information.
-d domain_id  Displays information about the specified domain, where domain_id is the domain number, possibly 0 to 23, depending on server model. Only one domain ID can be specified.
xsb           Displays information about the specified XSB. Specify xsb in the XX-Y format (XX = 00 to 15, Y = 0 to 3). The value depends on server model.
-c sp         Displays information about system boards in system board pool.
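The XX-Y format of the xsb operand can be checked before a command is issued. The following is a minimal sketch with a hypothetical function name. It checks only the overall ranges given in the table above (XX = 00 to 15, Y = 0 to 3). It does not know the narrower limits of a particular server model.

```python
import re
from typing import Tuple

# Two digits, a hyphen, one digit, for example "01-2".
_XSB_RE = re.compile(r"([0-9]{2})-([0-9])")


def parse_xsb(text: str) -> Tuple[int, int]:
    """Split an XSB number into (XX, Y), or raise ValueError if malformed."""
    match = _XSB_RE.fullmatch(text)
    if match is None:
        raise ValueError("XSB must be in XX-Y format: %r" % text)
    board, part = int(match.group(1)), int(match.group(2))
    if not (0 <= board <= 15 and 0 <= part <= 3):
        raise ValueError("XSB out of range (XX = 00 to 15, Y = 0 to 3): %r" % text)
    return board, part
```

For example, parse_xsb("01-2") returns (1, 2), while "16-0", "00-4", and "01-01" are rejected.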

The table below lists the items displayed by the showboards(8) command.

TABLE 3-8

Items of System Board Information to be Displayed

Display items  Description
XSB            System board number.
R              Reservation status of a system board. “*” is displayed for a system board when the board is reserved for addition, deletion, or a move.
DID (LSB)      Domain ID of the domain into which the system board is added, and logical system board number. “SP” is displayed for a system board that is in the system board pool.
Assignment     Status of assignment to domain configuration
                 Unavailable  The system board cannot be used. The system board may be unrecognizable because it is not mounted or it is faulty, the domain or system board may not have been configured, or the system board may be assigned to another domain.
                 Available    The system board can be used and is registered in the Domain Component List (DCL). The system board is in this status when in the system board pool.
                 Assigned     The system board is assigned to the domain.
Pwr            Power-on/off status of system board
                 n  Power-off status. The system board is powered off and cannot be used.
                 y  Power-on status. The system board is powered on.
Conn           Status of connection to domain configuration
                 n  Disconnected status. The system board is disconnected from the relevant domain configuration or in the system board pool.
                 y  Connected status. The system board is connected to the relevant domain configuration.
Conf           Status of addition into Solaris OS
                 n  Unconfigured status. The hardware resources of the system board have been deleted from the Solaris OS.
                 y  Configured status. The hardware resources of the system board have been added into the Solaris OS.
Test           Diagnostic status of system board
                 Unmount  The system board is not mounted or cannot be recognized because it is faulty.
                 Unknown  The system board is not being diagnosed.
                 Testing  Testing.
                 Passed   The system board was tested, and passed.
                 Failed   The system board was tested, and failed. The system board cannot be used or has been degraded.
Fault          Normal/abnormal status of system board
                 Normal    Normal.
                 Degraded  Components have been degraded, but the system board is operating. Degraded here means that a component included in the corresponding system board is faulty.
                 Failed    The system board cannot operate because of an error.
COD            Indication of whether the system board is a COD board.
                 n  The board is not a COD board.
                 y  The board is a COD board.

The following examples show displays of the showboards(8) command.

Example 1: Display of information on all system boards

XSCF> showboards -a
XSB   DID(LSB)  Assignment  Pwr  Conn  Conf  Test     Fault
------------------------------------------------------------------
00-0  00(00)    Assigned    y    y     y     Passed   Normal
00-1  00(01)    Assigned    y    n     n     Passed   Degraded
00-2  SP        Available   y    n     n     Unknown  Normal
00-3  01(15)    Assigned    y    y     y     Passed   Normal
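Output such as that in Example 1 can be split into one record per system board for scripted checks before a DR operation. The following sketch is an illustration only. It assumes the non-verbose, eight-column layout shown in Example 1, with whitespace-separated columns, and the function and field names are hypothetical. Verbose output (-v), which adds the R and COD columns, is not handled.

```python
import re
from typing import Dict, List

# Column names for the non-verbose layout, in display order.
FIELDS = ("xsb", "did_lsb", "assignment", "pwr", "conn", "conf", "test", "fault")

_XSB_RE = re.compile(r"[0-9]{2}-[0-3]")


def parse_showboards(output: str) -> List[Dict[str, str]]:
    """Return one dict per board row; prompt, header, and rule lines are skipped."""
    boards = []
    for line in output.splitlines():
        tokens = line.split()
        # Data rows have exactly eight columns and start with an XSB number.
        if len(tokens) != len(FIELDS) or not _XSB_RE.fullmatch(tokens[0]):
            continue
        boards.append(dict(zip(FIELDS, tokens)))
    return boards


def pooled_boards(boards: List[Dict[str, str]]) -> List[str]:
    """Return the XSB numbers of boards shown as SP (system board pool)."""
    return [board["xsb"] for board in boards if board["did_lsb"] == "SP"]
```

A caller could, for instance, confirm that a board appears with Assignment "Available" and Test "Passed" before requesting that it be added to a domain.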

Example 2: Display of detailed information on all system boards

XSCF> showboards -v -a
XSB   R  DID(LSB)  Assignment  Pwr  Conn  Conf  Test     Fault     COD
--------------------------------------------------------------------------
00-0     00(00)    Assigned    y    y     y     Passed   Normal    n
00-1     00(01)    Assigned    y    n     n     Passed   Degraded  n
00-2  *  SP        Available   y    n     n     Unknown  Normal    n
00-3     01(15)    Assigned    y    y     y     Passed   Normal    n

Example 3: Display of information on the system board in the system board pool in domain #0

XSCF> showboards -c sp -d 0
XSB   DID(LSB)  Assignment  Pwr  Conn  Conf  Test     Fault
------------------------------------------------------------------
00-2  SP        Available   y    n     n     Passed   Normal

3.1.4

Displaying Device Information

Use the showdevices(8) command to display device information.

The showdevices(8) command displays information about the physical devices including CPUs, memory, and PCI cards mounted on system boards, and displays the hardware resources usable with these devices in hardware resource format.

The showdevices(8) command is used before a DR operation to confirm information about and status of the hardware resources of the DR-target system board, and to determine the process to access the CPU and I/O devices.

Resource management applications or subsystems provide information concerning use of the hardware resources. By making an offline query about the managed resources, the showdevices(8) command estimates the effect of a DR operation applied to the system boards and displays the results.

The following examples show the format and options of the showdevices(8) command.

showdevices [-v] [-p bydevice|byboard|query|force] xsb [...]
showdevices [-v] [-p bydevice|byboard] -d domain_id
showdevices -h

Note –

The showdevices(8) command only reports information about a running domain.

TABLE 3-9 Options of the showdevices Command

Option          Description

-v              Specifies that the command displays information about all devices. Information about not only the management target devices but also other devices is displayed. However, resource information is displayed only for devices whose resources are managed, not for devices whose resources are not managed.

-p bydevice     Specifies that the command display information about the devices mounted on a system board (CPU, memory, and I/O devices), sorted by device. If neither -p bydevice nor -p byboard is specified, -p bydevice is the default.

-p byboard      Specifies that the command display information about the devices mounted on system boards (CPU, memory, and I/O devices), sorted by system board.

-p query        Tests the detachability of the board by test-running the DR command without actually executing it.

-p force        Tests the detachability of the board by test-running the DR command with the force flag without actually executing it.

xsb             Specifies a system board (XSB) number. Specify xsb in the XX-Y format (XX = 00 to 15, Y = 0 to 3). The value depends on server model.

-d domain_id    Specifies the ID of the domain, where domain_id is the domain number, possibly 0 to 23, depending on server model. Only one domain ID can be specified.
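The XX-Y format of the xsb operand can be checked before a command line is assembled. The following is a minimal sketch in Python; the function name parse_xsb is hypothetical, and the ranges are the maximum ones given in the table (the range actually valid depends on the server model).

```python
import re

# XX is two decimal digits, Y is a single digit from 0 to 3.
XSB_PATTERN = re.compile(r"([0-9]{2})-([0-3])")


def parse_xsb(text):
    """Return (psb, index) for an XSB number in XX-Y format, or raise ValueError."""
    match = XSB_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError("not in XX-Y format: %r" % text)
    psb = int(match.group(1))
    if psb > 15:
        raise ValueError("XX must be 00 to 15: %r" % text)
    return psb, int(match.group(2))


print(parse_xsb("00-2"))
```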

TABLE 3-10 Domain Information Displayed by the showdevices Command

Display items     Description

CPU               CPU information.
  DID             Domain ID.
  XSB             System board number.
  id              CPU ID.
  state           CPU status.
  speed           CPU frequency (MHz).
  ecache          CPU cache size (MB).
  usage           Description of instance using resources.

Memory            Memory information.
  DID             Domain ID.
  XSB             System board number.
  board mem       Size of memory on system board (MB).
  perm mem        Size of non-relocatable (kernel) memory on system board (MB).
  base address    Base physical address of memory on system board.
  domain mem      Size of memory in domain (MB).
  target board    System board number of the system board whose kernel memory is drained.
  deleted mem     Size of already deleted memory (MB).
  remaining mem   Size of remaining memory to be deleted (MB).

IO Devices        I/O device information.
  DID             Domain ID.
  XSB             System board number.
  device          Instance name and number of I/O device.
  resource        Management resource name.
  usage           Description of resource usage.
  query           Results of estimation with an offline query.
  usage/reason    Description of resource usage and reason for the results of estimation with an offline query.

The following example shows a display by the showdevices(8) command.

Example: Display of device information on XSB00-0

XSCF> showdevices 00-0

CPU:
----
DID XSB  id state   speed ecache
00  00-0 0  on-line 2048  4
00  00-0 1  on-line 2048  4

Memory:
------
         board  perm   base               domain target deleted remaining
DID XSB  mem MB mem MB address            mem MB XSB    mem MB  mem MB
00  00-0 8192   2048   0x000003c000000000 65536

I/O Devices:
----------
DID XSB  device resource          usage
00  00-0 sd0    /dev/dsk/c0t0d0s0 mounted filesystem "/"
00  00-0 sd0    /dev/dsk/c0t0d0s1 swap area
00  00-0 sd0    /dev/dsk/c0t0d0s1 dump device (swap)
00  00-0 bge0   SUNW_network/bge0 bge0 hosts IP addresses: 10.1.1.1

3.1.5

Displaying System Board Configuration Information

Use the showfru(8) command to display system board configuration information.

The showfru(8) command displays information about the PSB division type and memory mirroring mode settings in list format.

To change the PSB configuration, use the setupfru(8) command.

The following examples show the format and options of the showfru(8) command.

showfru -a device
showfru device location
showfru -h

TABLE 3-11 Options of the showfru Command

Option      Description

-a          Specifies that the command display all configuration information on devices of the type specified by device.

-h          Displays usage information.

device      Specifies a device type. Specify “sb” for DR.

location    Specifies a device name. For DR, specify a physical system board (PSB) number as a decimal number from 00 to 15. To display information about multiple system boards, specify several PSB numbers delimited by spaces. The range of valid PSB numbers depends on the server model.

The table below lists the items displayed by the showfru(8) command.

TABLE 3-12 Items of System Board Configuration Information to be Displayed

Display items        Description

Device               Device type. “sb” is the corresponding device for DR.
Location             Mounting location of a device. Displays a physical system board (PSB) number.
XSB Mode             XSB division type.
  Uni                Uni-XSB (no division) mode.
  Quad               Quad-XSB: four-division mode.
Memory Mirror Mode   Memory mirror mode.
  yes                Memory mirror mode is enabled.
  no                 Memory mirror mode is disabled.

The following example shows a display of the showfru(8) command.

Example: Display of configuration information on all system boards

XSCF> showfru -a sb
Device Location XSB Mode Memory Mirror Mode
sb     00       Quad     yes
sb     01       Quad     yes
sb     02       Quad     no
sb     03       Uni      no
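The XSB Mode reported by showfru(8) determines which XSB numbers exist on a PSB. The following is a minimal sketch in Python of that relationship. It assumes that a Uni-XSB board appears as the single XSB XX-0 and that a Quad-XSB board appears as XX-0 through XX-3; the function name xsbs_for_psb is hypothetical.

```python
def xsbs_for_psb(psb, mode):
    """List the XSB numbers expected on one PSB for a given division type."""
    if not 0 <= psb <= 15:
        raise ValueError("PSB number must be 0 to 15")
    if mode == "Uni":
        count = 1   # assumption: no division, so a single XSB numbered XX-0
    elif mode == "Quad":
        count = 4   # four-division mode: XX-0 to XX-3
    else:
        raise ValueError("mode must be 'Uni' or 'Quad'")
    return ["%02d-%d" % (psb, index) for index in range(count)]


print(xsbs_for_psb(0, "Quad"))
print(xsbs_for_psb(3, "Uni"))
```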

3.1.6

Adding a System Board

Use the addboard(8) command to add a system board to a domain or reserve the addition of a system board to a domain based on the DCL. The system board must already be registered in the target domain’s DCL.

Use the showdcl(8) command to check whether a system board is registered in the DCL. To register a system board in the DCL, use the setdcl(8) command.

Before executing the addboard(8) command, check the status of the DR-target domain and system board. You must determine whether you can perform the DR operation based on the status of the domain and system board.

The following examples show the format and options of the addboard(8) command.

addboard [[-q] -{y|n}] [-f] [-v] [-c configure] -d domain_id xsb [...]
addboard [[-q] -{y|n}] [-f] [-v] -c assign -d domain_id xsb [...]
addboard [[-q] -{y|n}] [-f] [-v] -c reserve -d domain_id xsb [...]
addboard -h

TABLE 3-13 Options of the addboard Command

Option          Description

-q              Suppresses the display of output messages. The -y or -n option determines how output messages are automatically answered, whether the messages themselves are suppressed (with the -q option) or displayed.

-y              Automatically answers "yes" to all output messages. The answer is applied whether the messages themselves are suppressed (with the -q option) or displayed.

-n              Automatically answers "no" to all output messages. The answer is applied whether the messages themselves are suppressed (with the -q option) or displayed.

-f              Forcibly adds a system board that has not been diagnosed to a domain. Do not use this option for normal DR operations. A faulty system board, or a system board on which a fault is detected, is not forcibly added to the destination domain.

-v              Displays the progress of this DR command. If this option is specified together with the -q option, the -v option is ignored.

-h              Displays usage information.

-c configure    Adds a system board to the domain. If no -c option is specified, -c configure is the default.

-c assign       Assigns the target system board to the domain. The assigned system board is added to the domain when the addboard(8) command is executed with the -c configure option, or when the domain power is turned on or the domain is rebooted.

-c reserve      Reserves the addition of a system board to the domain. The command performs the same processing as for the -c assign option and assigns the target system board to the domain. The assigned system board is added to the domain when the addboard(8) command is executed with the -c configure option, or when the domain power is turned on or the domain is rebooted.

-d domain_id    Specifies the ID of the domain to which a system board is added, where domain_id is the domain number, possibly 0 to 23, depending on server model. Only one domain ID can be specified.

xsb             Specifies the system board (XSB) number of the system board to be added. Specify xsb in the XX-Y format (XX = 00 to 15, Y = 0 to 3). The value depends on server model. To specify multiple system boards, delimit the XSB numbers with spaces.

Note –

(Note 1) In the system board addition processing executed by this command, a diagnosis of the system board to be added is performed first, and then the system board is added to the target domain. For this reason, much time may be required for the command to complete its operation.

Note –

(Note 2) If DR processing by the addboard(8) command fails, the target system board cannot be restored to its previous status. You must identify the cause of failure based on the error message output by the addboard(8) command and Solaris OS messages, and then take appropriate corrective action. Note that some errors require the domain to be rebooted.

Note –

(Note 3) If a system board has been forcibly added to a domain by the addboard(8) command with the -f option specified, normal operation of all added hardware resources may be disabled. For this reason, you should avoid using the -f option for normal DR operations. After adding a system board by using the addboard(8) command with the -f option specified, be sure to check the status of the added system board and the devices on the system board.
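An addboard(8) command line can be assembled programmatically from the synopsis above, for example by a tool that prepares commands for an operator to review. The following is a minimal sketch in Python; the function addboard_command and its parameter names are hypothetical, and the sketch only builds a string and does not contact the XSCF. It deliberately omits the -f option, which the manual says to avoid in normal DR operations.

```python
def addboard_command(domain_id, xsbs, mode="configure", answer=None, quiet=False, verbose=False):
    """Build an addboard command line following the documented synopsis."""
    if mode not in ("configure", "assign", "reserve"):
        raise ValueError("mode must be configure, assign, or reserve")
    if not 0 <= domain_id <= 23:
        raise ValueError("domain_id must be 0 to 23 (model dependent)")
    if not xsbs:
        raise ValueError("at least one XSB number is required")
    if answer not in (None, "y", "n"):
        raise ValueError("answer must be 'y', 'n', or None")
    if quiet and answer is None:
        # The synopsis nests -q inside [[-q] -{y|n}], so -q needs -y or -n.
        raise ValueError("-q requires -y or -n")
    args = ["addboard"]
    if quiet:
        args.append("-q")
    if answer is not None:
        args.append("-" + answer)
    if verbose and not quiet:
        # -v is ignored when combined with -q, so it is left out in that case.
        args.append("-v")
    args += ["-c", mode, "-d", str(domain_id)]
    args += list(xsbs)
    return " ".join(args)


print(addboard_command(0, ["00-1"], mode="assign"))
```

Passing mode explicitly, even for the default -c configure, keeps the generated command unambiguous when it is read later in a log.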

3.1.7

Deleting a System Board

Use the deleteboard(8) command to delete a system board from a domain and assign it to the system board pool. If you specify the -c reserve option, the action takes place the next time the domain is powered off or rebooted.

Before executing the deleteboard(8) command, check the status of the target domain and system board, and the device usage status on the system board. You must determine whether you can perform the DR operation according to the status of the domains and system board, and the device usage status on the system board.

To prepare for system board deletion, you must also stop any processes that are bound to the CPU and any that are accessing I/O devices.

If the system board to be deleted is a kernel memory board, check the status and memory size of the system board to which kernel memory is to be moved.
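Whether a board holds kernel memory can be read from the perm mem column of the showdevices(8) Memory display, which gives the size of non-relocatable (kernel) memory on the board. The following is a minimal sketch in Python that flags such boards from already parsed rows. The function name, the dictionary keys, and the second sample row are hypothetical; the first row uses the values from the showdevices example in this chapter.

```python
def kernel_memory_boards(memory_rows):
    """Return the XSB numbers whose perm mem (kernel memory) size is non-zero."""
    return [row["xsb"] for row in memory_rows if row["perm_mem_mb"] > 0]


rows = [
    {"xsb": "00-0", "board_mem_mb": 8192, "perm_mem_mb": 2048},
    {"xsb": "01-0", "board_mem_mb": 8192, "perm_mem_mb": 0},  # hypothetical second board
]
print(kernel_memory_boards(rows))
```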

The following examples show the format and options of the deleteboard(8) command.

deleteboard [[-q] -{y|n}] [-f] [-v] [-c disconnect] xsb [xsb...]
deleteboard [[-q] -{y|n}] [-f] [-v] -c unassign xsb [xsb...]
deleteboard [[-q] -{y|n}] [-f] [-v] -c reserve xsb [xsb...]
deleteboard -h

TABLE 3-14 Options of the deleteboard Command

Option          Description

-q              Suppresses the display of output messages. The -y or -n option determines how output messages are automatically answered, whether the messages themselves are suppressed (with the -q option) or displayed.

-y              Automatically answers "yes" to output messages. The answer is applied whether the messages themselves are suppressed (with the -q option) or displayed.

-n              Automatically answers "no" to output messages. The answer is applied whether the messages themselves are suppressed (with the -q option) or displayed.

-f              Forcibly deletes a system board from the domain. Do not use this option for normal DR operations.

-v              Displays the progress of this DR command. If this option is specified together with the -q option, the -v option is ignored.

-h              Displays usage information.

-c disconnect   Deletes a system board from the domain and leaves it in the state where it is assigned to the domain. This is the default option.

-c unassign     Deletes the board and adds it to the system board pool. The command unconfigures and disconnects the system board from the domain. If the board is in the state where it is assigned to the domain, the command unassigns the board from the domain and puts it in the system board pool. If the domain power is off, the command likewise puts the board in the system board pool.

-c reserve      Reserves the deletion of a system board from a domain. The system board is deleted from the domain and placed in the system board pool when the domain power is turned off or the domain is rebooted. If the board is in the state where it is assigned to the domain, the command unassigns the board from the domain and places it in the system board pool. If the domain power is off, the command likewise places the board in the system board pool.

xsb             Specifies the system board (XSB) number of the system board to be deleted. Specify xsb in the XX-Y format (XX = 00 to 15, Y = 0 to 3). The value depends on server model. To specify multiple system boards, delimit the XSB numbers with spaces.

Note –

(Note 1) The time required for system board deletion processing depends on the amount of hardware resources mounted on the target system board. For this reason, much time may be required for the command to end its operation. If the system board contains kernel memory, the OS is suspended for a while.

Note –

(Note 2) If the DR processing executed by the deleteboard(8) command fails, the target system board cannot be restored to the previous status. If DR processing fails, identify the cause of failure based on the error message output by the deleteboard(8) command and Solaris OS messages, and then take appropriate corrective action. Note that some errors require the domain to be rebooted.

Note –

(Note 3) When a system board is forcibly deleted from a domain by the deleteboard(8) command with the -f option specified, a serious problem may occur in a process that is bound to the CPU or in accessing an I/O device. For this reason, you should avoid using the -f option for normal DR operations. When using the deleteboard(8) command with the -f option specified, be sure to check the status of the domain and application processes.

3.1.8

Moving a System Board

Use the moveboard(8) command to delete a system board from the move-source domain and add it to the move-destination domain, assign it to the move-destination domain, or reserve it to be moved later.

To execute the moveboard(8) command, the system board must have been configured in or assigned to the move-source domain, and be registered in the DCL for the move-destination domain.

Use the showdcl(8) command to check whether a system board is registered in the DCL. To register a system board in the DCL, use the setdcl(8) command.

Before executing the moveboard(8) command, check the status of the move-source and move-destination domains and move-target system board, and the device usage status on the system board. You must determine whether you can perform the DR operation according to the status of the domains and system board, and the device usage status on the system board. You must also stop any processes that are bound to the CPU and any that are accessing I/O devices to prepare for system board deletion.

If the system board to be deleted is a kernel memory board, check the status and memory size of the system board to which kernel memory is to be moved.

The following examples show the format and options of the moveboard(8) command.

moveboard [[-q] -{y|n}] [-f] [-v] [-c configure] -d domain_id xsb [xsb...]
moveboard [[-q] -{y|n}] [-f] [-v] -c assign -d domain_id xsb [xsb...]
moveboard [[-q] -{y|n}] [-f] [-v] -c reserve -d domain_id xsb [xsb...]
moveboard -h

TABLE 3-15 Options of the moveboard Command

Option          Description

-q              Suppresses the display of output messages. The -y or -n option determines how output messages are automatically answered, whether the messages themselves are suppressed (with the -q option) or displayed.

-y              Automatically answers "yes" to output messages. The answer is applied whether the messages themselves are suppressed (with the -q option) or displayed.

-n              Automatically answers "no" to output messages. The answer is applied whether the messages themselves are suppressed (with the -q option) or displayed.

-f              Forcibly deletes a system board from the move-source domain and moves it to the move-destination domain. Do not use this option for normal DR operations. A faulty system board, or a system board on which a fault is detected, is not forcibly added to the destination domain.

-v              Displays messages about the progress of this DR operation. If this option is specified together with the -q option, the -v option is ignored.

-h              Displays usage information.

-c configure    Deletes a system board from the move-source domain and adds it to the move-destination domain. If no -c option is specified, -c configure is the default. The move operation from the move-source domain is performed when the domain power is off or the Solaris OS is running in the move-source domain. However, if the domain power is off or the Solaris OS is not running in the move-destination domain, the move operation from the move-source domain is not performed and DR processing terminates with an error.

-c assign       Deletes a system board from the move-source domain and assigns it to the move-destination domain. The assigned system board is added to the move-destination domain when the addboard(8) command is executed in the move-destination domain, the power of the move-destination domain is turned on, or the move-destination domain is rebooted. The move operation from the move-source domain is performed and the system board is set to the state where it is assigned to the move-destination domain when the domain power is off in both the move-source domain and the move-destination domain or the Solaris OS is not running in both domains.

-c reserve      Reserves a system board move in the move-source domain. The system board is deleted from the move-source domain and assigned to the move-destination domain when the power of the move-source domain is turned off or the move-source domain is rebooted. The assigned system board is added to the move-destination domain when the addboard(8) command is executed in the move-destination domain, the power of the move-destination domain is turned on, or the move-destination domain is rebooted. The move operation from the move-source domain is performed and the system board is set to the state where it is assigned to the move-destination domain when the domain power is off or the Solaris OS is not running in the move-source domain.

-d domain_id    Specifies the domain ID of the move-destination domain, where domain_id is the domain number, possibly 0 to 23, depending on server model. Only one domain ID can be specified.

xsb             Specifies the system board (XSB) number of the system board to be moved. Specify xsb in the XX-Y format (XX = 00 to 15, Y = 0 to 3). The value depends on server model. To specify multiple system boards, delimit the XSB numbers with spaces.

Note –

(Note 1) The time required for system board deletion processing in the move-source domain depends on the amount of hardware resources mounted on the target system board. Moreover, in the system board addition processing in the move-destination domain, the system board to be added is first diagnosed, and then added to the domain. For this reason, much time may be required for the command to end its operation. Solaris OS is suspended for a while when the system board includes kernel memory.

Note –

(Note 2) If the DR processing executed by the moveboard(8) command fails, the target system board cannot be restored to the previous status. If DR processing fails, identify the cause of failure based on the error message output by the moveboard(8) command and Solaris OS messages in the move-source and move-destination domains, and then take appropriate corrective action. Note that some errors require one of the domains to be rebooted.

Note –

(Note 3) When a system board is forcibly deleted from the move-source domain by the moveboard(8) command with the -f option specified, a serious problem may occur in a process that is bound to the CPU or in accessing an I/O device. For this reason, you should avoid using the -f option for normal DR operations. When using the moveboard(8) command with the -f option specified, be sure to check the status of the move-source domain and application processes.

3.1.9

Replacing a System Board

Use the deleteboard(8) and addboard(8) commands to replace a system board.

Use them to replace, add, or delete such hardware resources as the CPU, memory, and I/O devices, or replace the PSB of a CMU or IOU.

Note –

In a midrange server, you cannot use DR commands to replace a system board. Instead, turn off the power of all domains, and then replace the target system board.

To replace a system board in a domain, first delete the target system board from the domain by using the deleteboard(8) command to make the PSB replaceable. Next, replace the PSB with a new one, and then add the target system board to the domain.

For details of the conditions and actions for executing the deleteboard(8) command, see Section 3.1.7, “Deleting a System Board” on page 3-17. For details of the conditions and actions for executing the addboard(8) command, see Section 3.1.6, “Adding a System Board” on page 3-15.

Note –

(Note 1) Before replacing a system board, you must know the division type of the replacement-target PSB and the configurations and operation status of all domains to which all XSBs on the PSB belong.

If the division type of the replacement-target PSB is Quad-XSB and the XSBs on the replacement-target PSB belong to multiple domains, you must consult with all administrators of the relevant domains in advance to adequately adjust the method of replacing the system board.

If the division type of the replacement-target PSB is Uni-XSB, its replacement does not affect any other domains. However, prior adjustment may be required when the replacement-target system board is used as a floating board for multiple domains or when hardware replacement work may affect other domains.

Note –

(Note 2) If the DR processing executed by the deleteboard(8) or addboard(8) command fails, the target system board cannot be restored to its previous status. Identify the cause of failure based on the error messages output by the commands and Solaris OS messages, and then take appropriate corrective action. Note that some errors require the domain to be rebooted.

Note –

(Note 3) If a system board is forcibly deleted from a domain by the deleteboard(8) command with the -f option specified, a serious problem may occur in a process bound to the CPU or accessing an I/O device. For this reason, you should avoid using the -f option in normal DR operations. If you must use the deleteboard(8) command with the -f option specified, be sure to check the status of the domain and application processes before and after execution.

Note –

(Note 4) To execute the addboard(8) command to add a system board by DR, the system board must already be registered in the DCL. Use the showdcl(8) command to check whether a system board is registered in the DCL. To register a system board in the DCL, use the setdcl(8) command.

To replace hardware, you must first use the deleteboard(8) command to set the system board either to the state where it is assigned to the domain or to the state where it is placed in the system board pool.

3.1.10

Reserving a Domain Configuration Change

Use the addboard(8), deleteboard(8), or moveboard(8) command to reserve a domain configuration change.

A domain configuration change is reserved when a system board cannot be added, deleted, or moved immediately for operational reasons. The reserved addition, deletion, or move of the system board is executed when the power of the target domain is turned on or off, or the domain is rebooted.

If a system board is placed in the system board pool, a domain configuration change can be reserved to assign the system board to the intended domain in advance, preventing the system board from being acquired by another domain.

To reserve the addition of a system board to a domain, use the addboard(8) command with the -c reserve option specified. The system board will be added to the domain when the domain power is turned on, the domain is rebooted, or the next time the addboard(8) command with the -c configure option specified is executed.

For details about the addboard(8) command, see Section 3.1.6, “Adding a System Board” on page 3-15.

To reserve the deletion of a system board from a domain, use the deleteboard(8) command with the -c reserve option specified. The system board will be deleted from the domain when the domain power is turned off, the domain is rebooted, or the next time the deleteboard(8) command with the -c disconnect or -c unassign option specified is executed. For details about the deleteboard(8) command, see Section 3.1.7, “Deleting a System Board” on page 3-17.

To reserve a system board move from one domain to another domain, use the moveboard(8) command with the -c reserve option specified. The system board will be deleted from the move-source domain and moved to the move-destination domain when the power of the move-source domain is turned off, the move-source domain is rebooted, or the next time the moveboard(8) command with the -c configure or -c assign option specified is executed.

For details about the moveboard(8) command, see Section 3.1.8, “Moving a System Board” on page 3-19.
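The domain events that apply a reserved change can be summarized as a small lookup table. The following is a minimal sketch in Python based on this section and on the -c reserve descriptions in the option tables; the event labels and the function name are hypothetical, and for moveboard the events refer to the move-source domain. The sketch does not model the alternative of running the command again with another -c option.

```python
# Domain events after which a reserved change is applied.
# The event names are labels chosen for this sketch only.
RESERVE_TRIGGERS = {
    "addboard": {"power_on", "reboot"},
    "deleteboard": {"power_off", "reboot"},
    "moveboard": {"power_off", "reboot"},  # events in the move-source domain
}


def reservation_takes_effect(command, event):
    """Return True if `event` in the relevant domain applies the reserved change."""
    if command not in RESERVE_TRIGGERS:
        raise ValueError("not a DR operation command: %r" % command)
    return event in RESERVE_TRIGGERS[command]


print(reservation_takes_effect("deleteboard", "power_off"))
print(reservation_takes_effect("addboard", "power_off"))
```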


3.2

Command Reference

This section lists the DR commands and other commands related to DR.

For details of the commands, refer to SPARC Enterprise M4000/M5000/M8000/M9000 Servers XSCF Reference Manual. For the DR commands, see Section 3.1, “How To Use the DR User Interface” on page 3-1.

Note –

(Note 1) Use of each command is restricted to selected administrators only. To use each command, you must have appropriate administrator privileges. For details, refer to SPARC Enterprise M4000/M5000/M8000/M9000 Servers XSCF Reference Manual.

Note –

(Note 2) This section does not list all commands related to DR. For other DR-related commands, refer to SPARC Enterprise M4000/M5000/M8000/M9000 Servers XSCF Reference Manual.

TABLE 3-16 DR Display Commands

Command name        Function

showdcl             Displays the DCL and the domain status.
showdomainstatus    Displays domain status.
showboards          Displays system board information.
showdevices         Displays information about the CPUs, memory, and I/O devices on system boards.
showfru             Displays PSB configuration information.

TABLE 3-17 DR Operation Commands

Command name    Function

setdcl          Updates and edits the DCL.
setupfru        Sets the division type and memory mirror mode for PSB.
addboard        Adds a system board into a domain.
deleteboard     Deletes a system board from a domain.
moveboard       Moves a system board between domains.

TABLE 3-18 DR-related Commands

Command name        Function

poweron             Turns on the power of all domains or a specified domain.
poweroff            Turns off the power of all domains or a specified domain.
setdscp             Configures DSCP network.
showdscp            Displays the DSCP network configuration.
addfru              Installs a Field Replaceable Unit (FRU).
deletefru           Removes a Field Replaceable Unit (FRU).
replacefru          Replaces a Field Replaceable Unit (FRU).
addcodlicense       Applies the license key obtained from the license center to the system.
deletecodlicense    Deletes the license key applied to the system.
showcodlicense      Displays the license keys applied to the system.
showcodusage        Displays license usage information.
setcod              Configures COD settings.
showcod             Displays COD settings.
showhardconf        Displays all components mounted in the server.
showstatus          Lists degraded components.
showlog             Displays an error log, power log, event log, console log, panic log, IPL log, temperature/humidity log, and monitoring message log.

3.3

XSCF Web

XSCF Web lets you execute DR functions from a browser. XSCF Web is beyond the scope of this document. For details, refer to SPARC Enterprise M4000/M5000/M8000/M9000 Servers XSCF User’s Guide.

3.4

RCM Script

Reconfiguration Coordination Manager (RCM) is a framework used to manage the dynamic disconnection of system components. RCM provides script functions that enable you to write your own scripts for dynamic reconfiguration.

Using RCM scripts enables you to avoid complicated DR operations (e.g., stopping applications and releasing devices from applications).

For details of how to register RCM scripts and script execution timing, see the Solaris man page for rcmscript(4).

Note –

(Note 1) An RCM script can only automate actions performed to prepare for the deletion of a system board. When a system board is added to a domain, any actions required for use of the added resources must be manually performed.

Note –

(Note 2) You should test the RCM scripts you create for DR before executing the DR operations. The RCM scripts may not be able to execute certain processing.
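The general shape of an RCM script is a program that is called with a command name and answers with name=value lines and an exit status. The following is a minimal sketch in Python of such a dispatcher. The command names, the rcm_* output names, and the exit status 2 for unsupported commands are written from memory of rcmscript(4) and should be verified against that man page before use; the resource name, the usage text, and the function names handle and main are hypothetical.

```python
# Hypothetical resource that this sketch registers an interest in.
RESOURCE = "/dev/dsk/c0t0d0s1"


def handle(command, resource=None):
    """Return (output_lines, exit_status) for one RCM command."""
    if command == "scriptinfo":
        return ["rcm_script_version=1",
                "rcm_script_func_info=Example DR helper (sketch)"], 0
    if command == "register":
        return ["rcm_resource_name=" + RESOURCE], 0
    if command == "resourceinfo":
        return ["rcm_resource_usage_info=used by the example application"], 0
    if command in ("queryremove", "preremove"):
        # A real script would check or stop the application that uses
        # `resource` here, before the system board is deleted.
        return [], 0
    if command in ("postremove", "undoremove"):
        # A real script would restart the application here.
        return [], 0
    return [], 2  # assumption: status 2 reports an unsupported command


def main(argv):
    """Print the response for argv = [command, resource] and return the status."""
    command = argv[0] if argv else ""
    resource = argv[1] if len(argv) > 1 else None
    lines, status = handle(command, resource)
    for line in lines:
        print(line)
    return status


# A deployed script would end with: sys.exit(main(sys.argv[1:]))
print(main(["register"]))
```

Keeping the dispatch in a function that returns its output makes it possible to test the script without a DR operation, in line with Note 2 above.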


CHAPTER 4

Practical Examples of DR

This chapter provides examples of DR operations, such as the addition, deletion, move, and replacement of system boards.

Each example shows an operation procedure using the command line interface of the XSCF shell. Similar procedures can also be applied to DR operations using the browser-based interface of the XSCF Web.

Note that the sections below explain only procedures such as those for checking the status of components and devices for DR operations and not hardware operations (e.g., installing, removing, and replacing system boards). Refer to the service manual for each server as needed.

4.1

Flow of DR Operation

This section provides the flows of basic DR operations to add, delete, move, and replace system boards, along with flow diagrams.


4.1.1

Flow: Adding a System Board

FIGURE 4-1

Flow: Adding a System Board

[Flow diagram not reproduced. Steps shown: hardware maintenance; checking operation and selecting a DR operation (operation status and configuration of a domain, judgment of whether the DR operation can be performed); checking the domain status; checking the status of the system board to be added; checking the device status. If the DR operation is possible and the domain is operating: addition operation for the system board, then addition processing of the system board. If the DR operation is not possible, or the domain configuration is to be changed: reserve operation for adding a system board, then power-on or restart of the domain, then change operation for the domain configuration.]


4.1.2

Flow: Deleting a System Board

FIGURE 4-2

Flow: Deleting a System Board

[Flow diagram not reproduced. Steps shown: checking operation and selecting a DR operation (operation status and configuration of a domain, judgment of whether the DR operation can be performed); checking the domain status; checking the status of the system board to be deleted; checking the device status. If the DR operation is possible and the domain is operating: deletion operation for the system board, then deletion processing of the system board. If the DR operation is not possible, or the domain configuration is to be changed: reserve operation for deleting a system board, then power-on or restart of the domain, then change operation for the domain configuration.]


4.1.3

Flow: Moving a System Board

FIGURE 4-3

Flow: Moving a System Board

[Flow diagram not reproduced. Steps shown: checking operation and selecting a DR operation (operation status and configuration of the move-source domain, operation status and configuration of the move-destination domain, judgment of whether the DR operation can be performed); confirmation of the move-source and move-destination domains and selecting an operation; checking the status of the system board to be moved; checking the device status. If the DR operation is possible: move operation for the system board, then move processing of the system board. If the DR operation is not possible, or the domain configuration is to be changed: reserve operation for moving a system board, then power-on or restart of the move-source domain, then change operation for the move-source and move-destination domain configurations. The diagram also shows the status of reserved addition in the move-destination domain and the addition operation for the system board in the move-destination domain.]


4.1.4

Flow: Replacing a System Board

FIGURE 4-4

Flow: Replacing a System Board

[Flow diagram not reproduced. Steps shown: checking operation and selecting a DR operation (operation status and configuration of a domain, adjustment between other domains, configuration of the system board to be replaced, checking the device status). Depending on the result, the board is released by a deletion reservation (deletion reservation operation for the system board in its domain, then power-off of the relevant domain), by DR deletion (deletion operation for the system board in its domain), or is already a pooled system board. When there is no domain for which deletion has been reserved: hardware replacement. After the replacement: checking operation and selecting a DR operation, then addition reservation (with power-on of the relevant domain and start of the domain) or DR addition while the domain is in operation.]


4.2

Example: Adding a System Board

This section provides an example of the DR operation to add a system board to a domain. The example follows the procedure in Section 4.1.1, “Flow: Adding a System Board,” and uses the XSCF shell to add the system board shown in the figure.

FIGURE 4-5

Example: Adding a System Board

[Figure not reproduced: XSB#01-0 is added to Domain#0, which already contains XSB#00-0.]

1. Log in to XSCF.

2. Check the status of the domain.

Execute the showdcl(8) command to display domain information, and then check the operation status of the domain. Based on the operation status of the domain, determine whether to perform the DR operation or change the domain configuration.

XSCF> showdcl -d 0
DID   LSB   XSB    Status
00                 Running
      00    00-0
      01    01-0

3. Check the status of the system board to be added.

Execute the showboards(8) command to display system board information, and then check the status of the system board to be added and confirm its registration in the DCL.


If you need to change the PSB configuration, use the setupfru(8) command. If the system board to be added is not registered in the DCL, register the system board in the DCL of the target domain by using the setdcl(8) command.

XSCF> showboards -a
XSB  DID(LSB) Assignment  Pwr  Conn Conf Test    Fault
---- -------- ----------- ---- ---- ---- ------- --------
00-0 00(00)   Assigned    y    y    y    Passed  Normal
01-0 SP       Available   y    n    n    Passed  Normal

4. Add the new system board.

Execute the addboard(8) command to add the system board to the domain.

XSCF> addboard -c configure -d 0 01-0

5. Check the status of the domain and added system board.

When the addboard(8) command ends normally, execute the showdcl(8) command to check the operation status of the domain, and then execute the showboards(8) command to check the status of the added system board.

If the addboard(8) command completes abnormally or leaves the board in an unwanted status, refer to the output messages to identify the problem, then correct it.

XSCF> showdcl -d 0
DID   LSB   XSB    Status
00                 Running
      00    00-0
      01    01-0

XSCF> showboards -d 0
XSB  DID(LSB) Assignment  Pwr  Conn Conf Test    Fault
---- -------- ----------- ---- ---- ---- ------- --------
00-0 00(00)   Assigned    y    y    y    Passed  Normal
01-0 00(01)   Assigned    y    y    y    Passed  Normal


4.3

Example: Deleting a System Board

This section provides an example of the DR operation to delete a system board from a domain. The example follows the procedure in Section 4.1.2, “Flow: Deleting a System Board,” and uses the XSCF shell to delete the system board shown in the figure.

FIGURE 4-6

Example: Deleting a System Board

[Figure not reproduced: XSB#01-0 is deleted from Domain#0, leaving XSB#00-0 in the domain.]

1. Log in to XSCF.

2. Check the status of the domain.

Execute the showdcl(8) command to display domain information, and then check the operation status of the domain. Based on the operation status of the domain, determine whether to perform the DR operation or change the domain configuration.

XSCF> showdcl -d 0
DID   LSB   XSB    Status
00                 Running
      00    00-0
      01    01-0


3. Check the status of the system board to be deleted.

Execute the showboards(8) command to display system board information, and then check the status of the system board to be deleted.

XSCF> showboards -a
XSB  DID(LSB) Assignment  Pwr  Conn Conf Test    Fault
---- -------- ----------- ---- ---- ---- ------- --------
00-0 00(00)   Assigned    y    y    y    Passed  Normal
01-0 00(01)   Assigned    y    y    y    Passed  Normal

4. Delete the system board.

Execute the deleteboard(8) command to delete the system board and pool it in the system board pool.

XSCF> deleteboard -c unassign 01-0

5. Check the status of the domain and deleted system board.

When the deleteboard(8) command ends normally, execute the showdcl(8) command to check the operation status of the domain, and then execute the showboards(8) command to check the status of the deleted system board.

If the deleteboard(8) command completes abnormally or leaves the board in an unwanted status, refer to the output messages to identify the problem, then correct it.

XSCF> showdcl -d 0
DID   LSB   XSB    Status
00                 Running
      00    00-0
      01    01-0

XSCF> showboards -a

XSB DID(LSB) Assignment Pwr Conn Conf Test Fault

----------------------------------------------------------------

00-0 00(00) Assigned y y y Passed Normal

01-0 SP Available y n n Passed Normal
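When these checks are scripted, the showboards(8) output has to be turned into structured data first. The following Python sketch shows one possible way to do that for the eight-column layout printed in this section. The function name parse_showboards and the dictionary keys are hypothetical, and the sketch assumes every data row starts with an XSB number of the form NN-N and contains exactly eight whitespace-separated fields.

```python
import re

# Sample text copied from the showboards output in this section.
SAMPLE = """\
XSB DID(LSB) Assignment Pwr Conn Conf Test Fault
----------------------------------------------------------------
00-0 00(00) Assigned y y y Passed Normal
01-0 SP Available y n n Passed Normal
"""

FIELDS = ["xsb", "did_lsb", "assignment", "pwr", "conn", "conf", "test", "fault"]


def parse_showboards(text):
    """Return one dict per XSB row; header and separator lines are skipped."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Keep only rows that begin with an XSB number such as "01-0".
        if len(parts) == len(FIELDS) and re.fullmatch(r"\d{2}-\d", parts[0]):
            rows.append(dict(zip(FIELDS, parts)))
    return rows


rows = parse_showboards(SAMPLE)
# Boards whose DID(LSB) column reads "SP" are in the system board pool.
pooled = [row["xsb"] for row in rows if row["did_lsb"] == "SP"]
```

With the sample above, pooled contains only 01-0, which is the state expected after the deleteboard(8) command in step 4.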


4.4

Example: Moving a System Board

This section provides an example of the DR operation to move a system board between domains. The example follows the procedure in Section 4.1.3, “Flow: Moving a System Board,” and uses the XSCF shell to move the system board shown in the figure.

FIGURE 4-7

Example: Moving a System Board

[Figure not reproduced: Domain#0 contains XSB#00-0 and XSB#00-1, and Domain#1 contains XSB#01-0. XSB#00-1 is moved from Domain#0 to Domain#1.]

1. Log in to XSCF.

2. Check the status of the move-source domain.

Execute the showdcl(8) command to display domain information, and then check the operation status of the move-source domain.

XSCF> showdcl -d 0
DID   LSB   XSB    Status
00                 Running
      00    00-0
      01    00-1


3. Check the status of the move-destination domain.

Execute the showdcl(8) command to display domain information, and then check the operation status of the move-destination domain. Based on the operation status of the move-source and move-destination domains, determine whether to perform the DR operation or change the domain configuration.

XSCF> showdcl -d 1
DID   LSB   XSB    Status
01                 Running
      00    01-0
      01    00-1

4. Check the status of the system board to be moved.

Execute the showboards(8) command to display system board information, and then check the status of the system board to be moved.

XSCF> showboards 00-1

XSB DID(LSB) Assignment Pwr Conn Conf Test Fault

---- -------- ----------- ---- ---- ---- ------- ---------------

00-1 00(01) Assigned y y y Passed Normal

5. Move the system board.

Execute the moveboard(8) command to delete the system board from the move-source domain and add it to the move-destination domain.

XSCF> moveboard -c configure -d 1 00-1

6. Check the status of the move-source domain.

When the moveboard(8) command ends normally, execute the showdcl(8) command to display and check the operation status of the move-source domain.

If the moveboard(8) command completes abnormally or leaves the board in an unwanted status, refer to the output messages to identify the problem, then correct it.

XSCF> showdcl -d 0
DID   LSB   XSB    Status
00                 Running
      00    00-0
      01    00-1


7. Check the status of the move-destination domain and moved system board.

Execute the showdcl(8) command to check the operation status of the move-destination domain, and then execute the showboards(8) command to check the status of the moved system board.

XSCF> showdcl -d 1
DID   LSB   XSB    Status
01                 Running
      00    01-0
      01    00-1

XSCF> showboards 00-1

XSB DID(LSB) Assignment Pwr Conn Conf Test Fault

-------------------------------------------------------------------

00-1 01(01) Assigned y y y Passed Normal
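The result of the move is visible in the DID(LSB) column, which changes from 00(01) in step 4 to 01(01) in step 7. The Python sketch below shows one way this comparison could be automated. The function names parse_did_lsb and moved_to are hypothetical, and the sketch assumes the field is either of the form DID(LSB) with decimal digits or a value without parentheses such as SP.

```python
import re


def parse_did_lsb(field):
    """Split a DID(LSB) field such as "01(01)" into (domain_id, lsb).

    Returns None for values without a domain assignment, such as "SP".
    """
    match = re.fullmatch(r"(\d+)\((\d+)\)", field)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def moved_to(before_field, after_field, dest_domain):
    """Return True if the board left its old domain and is now in dest_domain."""
    before = parse_did_lsb(before_field)
    after = parse_did_lsb(after_field)
    if after is None or after[0] != dest_domain:
        return False
    # The board must not have been in the destination domain already.
    return before is None or before[0] != dest_domain


# Values taken from steps 4 and 7 of this example.
ok = moved_to("00(01)", "01(01)", dest_domain=1)
```

For the values in this example, ok is True because the domain ID changes from 00 to 01.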

4.5

Examples: Replacing a System Board

This section provides examples of operations to replace a system board in a domain.

The examples illustrate replacement of a system board in a Uni-XSB environment and a system board in a Quad-XSB environment. Each example follows the procedure in Section 4.1.4, “Flow: Replacing a System Board,” and uses the XSCF shell to replace the system board shown in each figure.

Note –

You cannot use DR to replace a system board in a midrange server because replacing a system board replaces an MBU. To replace a system board in a midrange server, you must turn off the power for all domains, then perform a hardware replacement.


4.5.1

Example: Replacing a Uni-XSB System Board

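FIGURE 4-8

Example: Replacing a Uni-XSB System Board

[Figure not reproduced: Domain#0 contains XSB#00-0 and XSB#01-0. The faulty system board (XSB#01-0) is deleted from the domain, replaced with a new system board, and added to the domain again.]

1. Log in to XSCF.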

2. Check the status of the domain.

Execute the showdcl(8) command to display domain information, and then check the operation status of the domain. Based on the operation status of the domain, determine whether to perform the DR operation or replace the system board after stopping the domain.

XSCF> showdcl -d 0
DID   LSB   XSB    Status
00                 Running
      00    00-0
      01    01-0

3. Check the status of the system board to be replaced.

Execute the showboards(8) command to display system board information, and then check the status of the system board to be deleted. The DR operation for replacement may not be possible if the board to be replaced does not support the DR delete operation.

XSCF> showboards 01-0

XSB DID(LSB) Assignment Pwr Conn Conf Test Fault

-----------------------------------------------------------------

01-0 00(01) Assigned y y y Passed Normal


4. Delete the system board.

Execute the deleteboard(8) command to delete the system board.

XSCF> deleteboard -c disconnect 01-0

5. Check the status of the system board.

Execute the showboards(8) command to display system board information, and then check the status of the system board.

XSCF> showboards 01-0

XSB DID(LSB) Assignment Pwr Conn Conf Test Fault

-----------------------------------------------------------------

01-0 00(01) Assigned y n n Passed Normal

6. Physically replace the system board.

Execute the replacefru(8) command, then follow the displayed instructions to replace the system board per the Hot Replacement procedure. For information about Hot Replacement, see the SPARC Enterprise M8000/M9000 Servers Service Manual.

XSCF> replacefru

7. Check the status of the replaced system board.

Execute the showboards(8) command to display system board information, and then check the status of all related system boards and confirm their registration in the DCL.

If you need to change the system board configuration (e.g., the number of divisions), use the setupfru(8) command. If the system board is not registered in the DCL, register it in the DCL for the target domain by using the setdcl(8) command.

XSCF> showboards 01-0

XSB DID(LSB) Assignment Pwr Conn Conf Test Fault

-----------------------------------------------------------------

01-0 00(01) Assigned y n n Passed Normal


8. Check the status of the domain.

Execute the showdcl(8) command to display domain information, and then check the operation status of the domain. Based on the operation status of the domain, determine whether to perform the DR operation or reboot the domains.

XSCF> showdcl -d 0
DID   LSB   XSB    Status
00                 Running
      00    00-0
      01    01-0

9. Add the new system board to the domain.

Execute the addboard(8) command to add the system board to the domain.

XSCF> addboard -c configure -d 0 01-0

10. Check the status of the domain and added system board.

When the addboard(8) command ends normally, execute the showdcl(8) command to check the operation status of the domain, and then execute the showboards(8) command to check the status of the added system board.

If the addboard(8) command completes abnormally or leaves the board in an unwanted status, refer to output messages to identify the problem, then correct it.

XSCF> showdcl -d 0
DID   LSB   XSB    Status
00                 Running
      00    00-0
      01    01-0

XSCF> showboards 01-0

XSB DID(LSB) Assignment Pwr Conn Conf Test Fault

-----------------------------------------------------------------

01-0 00(01) Assigned y y y Passed Normal
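The ten steps above always issue the same commands in the same order, with only the domain ID and the XSB number changing. As a memory aid, the Python sketch below generates that command list for a given domain and board. It only builds strings and does not connect to the XSCF; the function name uni_xsb_replace_commands is hypothetical, and the commands and options are exactly those shown in this example.

```python
import re


def uni_xsb_replace_commands(domain_id, xsb):
    """Return the XSCF shell commands of steps 2 to 10, in order."""
    if not re.fullmatch(r"\d{2}-\d", xsb):
        raise ValueError("XSB number must look like 01-0")
    return [
        f"showdcl -d {domain_id}",                   # step 2: domain status
        f"showboards {xsb}",                         # step 3: board status
        f"deleteboard -c disconnect {xsb}",          # step 4: delete the board
        f"showboards {xsb}",                         # step 5: confirm deletion
        "replacefru",                                # step 6: hot replacement
        f"showboards {xsb}",                         # step 7: replaced board
        f"showdcl -d {domain_id}",                   # step 8: domain status
        f"addboard -c configure -d {domain_id} {xsb}",  # step 9: add the board
        f"showdcl -d {domain_id}",                   # step 10: verify domain
        f"showboards {xsb}",                         # step 10: verify board
    ]


commands = uni_xsb_replace_commands(0, "01-0")
```

Each check step still needs a human or a parser to inspect the output before the next command is issued, so the list is a checklist rather than a script to run unattended.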


4.5.2

Example: Replacing a Quad-XSB System Board

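FIGURE 4-9

Example: Replacing a Quad-XSB System Board

[Figure not reproduced: Domain#0 contains XSB#00-0, XSB#01-0, and XSB#01-1, and Domain#1 contains XSB#01-2 and XSB#01-3. The faulty system board that provides XSB#01-0 through XSB#01-3 is deleted, replaced with a new system board, and added again.]

1. Log in to XSCF.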

2. Check the configurations and status of all domains to which the relevant system boards belong.

Execute the showdcl(8) command to display domain information, and then check the configurations and operation status of all domains to which the relevant XSBs belong.

Based on the configurations and operation status of the domains, determine whether to perform the DR operation or to replace the system board after stopping the domains. If a domain consists only of XSBs in the PSB to be replaced, DR cannot be used for the replacement, and the domain must be stopped before the board is replaced.

In this example, domain #1 has a configuration that requires it to be stopped for system board replacement.


XSCF> showdcl -a
DID   LSB   XSB    Status
00                 Running
      00    00-0
      01    01-0
      02    01-1
---------------------------
01                 Running
      00    01-2
      01    01-3

3. Check the status of all related system boards.

Execute the showboards(8) command to display system board information, and then check the status of all system boards related to the PSB to be replaced. The DR operation for replacement may not be possible if the board to be replaced does not support the DR delete operation.

XSCF> showboards -a
XSB  DID(LSB) Assignment  Pwr  Conn Conf Test    Fault
---- -------- ----------- ---- ---- ---- ------- --------
00-0 00(00)   Assigned    y    y    y    Passed  Normal
01-0 00(01)   Assigned    y    y    y    Passed  Normal
01-1 00(02)   Assigned    y    y    y    Passed  Normal
01-2 01(00)   Assigned    y    y    y    Passed  Normal
01-3 01(01)   Assigned    y    y    y    Passed  Normal

4. Delete all system boards related to the CMU to be replaced.

Execute the deleteboard(8) command to delete the system boards that belong to a domain that permits the DR operation (domain #0 in this example).

XSCF> deleteboard -c disconnect 01-0 01-1

5. Power off Domain #1 so the CMU can be replaced.

Execute the poweroff(8) command so that the CMU being replaced will not be in use by domain #1.

XSCF> poweroff -d 1


6. Check the status of all related system boards.

Execute the showboards(8) command to display system board information, and then check the status of all related system boards.

XSCF> showboards -a
XSB  DID(LSB) Assignment  Pwr  Conn Conf Test    Fault
---- -------- ----------- ---- ---- ---- ------- --------
00-0 00(00)   Assigned    y    y    y    Passed  Normal
01-0 00(01)   Assigned    y    n    n    Passed  Normal
01-1 00(02)   Assigned    y    n    n    Passed  Normal
01-2 01(00)   Assigned    y    n    n    Passed  Normal
01-3 01(01)   Assigned    y    n    n    Passed  Normal

7. Physically replace the system board.

Execute the replacefru(8) command, then follow the displayed instructions to replace the system board per the Hot Replacement procedure. For information about Hot Replacement, see the SPARC Enterprise M8000/M9000 Servers Service Manual.

XSCF> replacefru

8. Check the status of the replaced system board.

Execute the showboards(8) command to display system board information, and then check the status of the system board to be added and confirm its registration in the DCL.

If you need to change the PSB configuration, use the setupfru(8) command. If the system board is not registered in the DCL, register it in the DCL for the target domain by using the setdcl(8) command.

XSCF> showboards -a
XSB  DID(LSB) Assignment  Pwr  Conn Conf Test    Fault
---- -------- ----------- ---- ---- ---- ------- --------
00-0 00(00)   Assigned    y    y    y    Passed  Normal
01-0 00(01)   Assigned    y    n    n    Passed  Normal
01-1 00(02)   Assigned    y    n    n    Passed  Normal
01-2 01(00)   Assigned    y    n    n    Passed  Normal
01-3 01(01)   Assigned    y    n    n    Passed  Normal


9. Check the status of all related domains.

Execute the showdcl(8) command to display domain information, and then check the operation status of all related domains. Based on the operation status of the domain, determine whether to perform the DR operation or reboot the domains.

XSCF> showdcl -a
DID   LSB   XSB    Status
00                 Running
      00    00-0
      01    01-0
      02    01-1
---------------------------
01                 Powered Off
      00    01-2
      01    01-3

10. Add the new system board to the domain.

Execute the addboard(8) command in the domain to add the new system board.

XSCF> addboard -c configure -d 0 01-0 01-1

11. Check the status of the related domains and system boards.

Execute the showdcl(8) command to check the operation status of related domains, and then execute the showboards(8) command to check the status of related system boards.

In this example, domain #1 is booted by power-on in this stage.

XSCF> poweron -d 1

XSCF> showdcl -a
DID   LSB   XSB    Status
00                 Running
      00    00-0
      01    01-0
      02    01-1
---------------------------
01                 Running
      00    01-2
      01    01-3

XSCF> showboards -a
XSB  DID(LSB) Assignment  Pwr  Conn Conf Test    Fault
---- -------- ----------- ---- ---- ---- ------- --------
00-0 00(00)   Assigned    y    y    y    Passed  Normal
01-0 00(01)   Assigned    y    y    y    Passed  Normal
01-1 00(02)   Assigned    y    y    y    Passed  Normal
01-2 01(00)   Assigned    y    y    y    Passed  Normal
01-3 01(01)   Assigned    y    y    y    Passed  Normal
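The rule from step 2 of this example (a domain that consists only of XSBs from the PSB being replaced must be stopped, while other domains can release their XSBs by DR) can be sketched as a small planning function. This is an illustrative Python sketch and not part of the XSCF firmware. The function name plan_psb_replacement and the dictionary standing in for showdcl(8) output are hypothetical, and the sketch assumes that the DCL registration reflects each domain's actual configuration.

```python
def plan_psb_replacement(dcl, psb):
    """Decide how each domain releases the XSBs of the PSB to be replaced.

    dcl: mapping of domain ID to the list of XSB numbers in that domain.
    psb: two-digit PSB number as a string, for example "01".
    Returns (domains_to_stop, xsbs_to_delete_by_dr).
    """
    domains_to_stop = []
    xsbs_to_delete = {}
    for domain_id, xsbs in sorted(dcl.items()):
        # An XSB number is <PSB>-<division>, so compare the part before "-".
        on_psb = [xsb for xsb in xsbs if xsb.split("-")[0] == psb]
        if not on_psb:
            continue  # this domain does not use the PSB at all
        if len(on_psb) == len(xsbs):
            # Every XSB of the domain is on the PSB: DR is not possible.
            domains_to_stop.append(domain_id)
        else:
            xsbs_to_delete[domain_id] = on_psb
    return domains_to_stop, xsbs_to_delete


# Configuration from this example.
dcl = {0: ["00-0", "01-0", "01-1"], 1: ["01-2", "01-3"]}
to_stop, to_delete = plan_psb_replacement(dcl, "01")
```

For the configuration in this example, the result matches the commands used in steps 4 and 5: deleteboard(8) for 01-0 and 01-1 in domain #0, and poweroff(8) for domain #1.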

4.6

Examples: Reserving Domain Configuration Changes

This section provides examples of operations to reserve a change in domain configuration by DR. In the examples, the XSCF shell is used to reserve the addition, deletion, and movement of a system board as shown in the given configuration diagram.

4.6.1

Example: Reserving a System Board Add

FIGURE 4-10

Example: Reserving a System Board Add

[Figure not reproduced: XSB#01-0 is added to Domain#0, which already contains XSB#00-0.]

1. Log in to XSCF.


2. Check the status of the system board to be added.

Execute the showboards(8) command to display system board information, and then check the status of the system board to be added and confirm its registration in the DCL.

If you need to change the PSB configuration, use the setupfru(8) command. If the system board is not registered in the DCL, register the system board in the DCL for the target domain by using the setdcl(8) command.

XSCF> showdcl -d 0
DID   LSB   XSB    Status
00                 Running
      00    00-0
      01    01-0

XSCF> showboards 01-0

XSB DID(LSB) Assignment Pwr Conn Conf Test Fault

------------------------------------------------------------------

01-0 SP Available y n n Passed Normal

3. Reserve the addition of the system board.

Execute the addboard(8) command to reserve the addition of the system board.

XSCF> addboard -c reserve -d 0 01-0

4. Check the status of the system board.

When the addboard(8) command ends normally, execute the showboards(8) command to display system board information, and then check the status of the target system board and confirm that the addition of the target system board has been reserved.

If the addboard(8) command ends abnormally, identify the cause of the abnormality based on the messages output, and then take appropriate corrective action.

XSCF> showboards -v 01-0

XSB R DID(LSB) Assignment Pwr Conn Conf Test Fault COD

--------------------------------------------------------------------------

01-0 * SP Available y n n Passed Normal n

5. Stop or reboot the domain.

Stop or reboot the domain. This operation executes the reserved addition of the system board as a change in domain configuration.


4.6.2

Example: Reserving a System Board Delete

FIGURE 4-11

Example: Reserving a System Board Delete

[Figure not reproduced: XSB#01-0 is deleted from Domain#0, leaving XSB#00-0 in the domain.]

1. Log in to XSCF.

2. Check the status of the domain.

Execute the showdcl(8) command to display domain information, and then check the operation status of the domain. Based on the operation status of the domain, determine whether to perform the DR operation or change the domain configuration.

XSCF> showdcl -d 0
DID   LSB   XSB    Status
00                 Running
      00    00-0
      01    01-0

3. Check the status of the system board to be deleted.

Execute the showboards(8) command to display system board information, and then check the status of the system board to be deleted.

XSCF> showboards 01-0

XSB DID(LSB) Assignment Pwr Conn Conf Test Fault

------------------------------------------------------------------

01-0 00(01) Assigned y y y Passed Normal

4. Reserve the deletion of the system board.

Execute the deleteboard(8) command to reserve deletion of the system board.

XSCF> deleteboard -c reserve 01-0

5. Check the reserved status of the system board.

Execute the showboards(8) command with the -v option specified to display system board information, and then confirm that deletion of the system board has been reserved.

XSCF> showboards -v 01-0

XSB R DID(LSB) Assignment Pwr Conn Conf Test Fault COD

--------------------------------------------------------------------------

01-0 * 00(01) Assigned y y y Passed Normal n

6. Stop or reboot the domain.

Stop or reboot the domain. This operation executes the reserved deletion of the system board as a change in domain configuration.

4.6.3

Example: Reserving a System Board Move

FIGURE 4-12

Example: Reserving a System Board Move

[Figure not reproduced: Domain#0 contains XSB#00-0 and XSB#00-1, and Domain#1 contains XSB#01-0. XSB#01-0 is moved from Domain#1 to Domain#0.]

1. Log in to XSCF.

2. Check the status of the move-source domain.

Execute the showdcl(8) command to display domain information, and then check the operation status of the move-source domain.

XSCF> showdcl -d 1
DID   LSB   XSB    Status
01                 Running
      00    01-0


3. Check the status of the move-destination domain.

Execute the showdcl(8) command to display domain information, and then check the operation status of the move-destination domain. Based on the operation status of the move-source and move-destination domains, determine whether to perform the DR operation or change the domain configuration.

XSCF> showdcl -d 0
DID   LSB   XSB    Status
00                 Running
      00    00-0
      01    00-1
      02    01-0

4. Check the status of the system board to be moved.

Execute the showboards(8) command to display system board information, and then check the status of the system board to be moved.

XSCF> showboards 01-0

XSB DID(LSB) Assignment Pwr Conn Conf Test Fault

------------------------------------------------------------------

01-0 01(00) Assigned y y y Passed Normal

5. Reserve the move of the system board.

Execute the moveboard(8) command to reserve deletion of the system board from the move-source domain and addition of the system board to the move-destination domain.

XSCF> moveboard -c reserve -d 0 01-0

6. Check the reserved status of the system board.

Execute the showboards(8) command with the -v option specified to display system board information, and confirm that moving the system board to the move-destination domain has been reserved.

XSCF> showboards -v 01-0

XSB R DID(LSB) Assignment Pwr Conn Conf Test Fault COD

--------------------------------------------------------------------------

01-0 * 01(00) Assigned y y y Passed Normal n

7. Stop the move-source domain.

Stop the move-source domain. This operation executes the reserved deletion of the system board from the move-source domain as a change in domain configuration, and the reservation of the addition of the system board to the move-destination domain.


8. Check the status of the move-destination domain and moved system board.

Execute the showdcl(8) command to check the operation status of the move-destination domain, and then execute the showboards(8) command to check the status of the system board and confirm that addition of the system board has been reserved in the move-destination domain.

XSCF> showdcl -d 0
DID   LSB   XSB    Status
00                 Running
      00    00-0
      01    00-1
      02    01-0

XSCF> showboards 01-0

XSB DID(LSB) Assignment Pwr Conn Conf Test Fault

------------------------------------------------------------------

01-0 00(02) Assigned y n n Passed Normal

9. Add the system board to the move-destination domain.

Execute the addboard(8) command to add the system board to the move-destination domain. If the move-destination domain is in stopped status, the system board will be added the next time the domain is booted.

XSCF> addboard -c configure -d 0 01-0

10. Check the status of the move-destination domain and moved system board.

Execute the showdcl(8) command to check the operation status of the move-destination domain, and then execute the showboards(8) command to check the status of the moved system board.

XSCF> showdcl -d 0
DID   LSB   XSB    Status
00                 Running
      00    00-0
      01    00-1
      02    01-0

XSCF> showboards 01-0

XSB DID(LSB) Assignment Pwr Conn Conf Test Fault

------------------------------------------------------------------

01-0 00(02) Assigned y y y Passed Normal
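The reservation checks in Section 4.6 all rely on the R column of the showboards -v output, where an asterisk marks a board with a reserved configuration change. The Python sketch below shows one way to read that column from a single output row. The function name parse_showboards_v_row is hypothetical. The sketch assumes that the R column is left blank when nothing is reserved, so an unreserved row has one field fewer; the third sample row is made up to illustrate that case.

```python
def parse_showboards_v_row(line):
    """Extract the XSB number, reservation flag, and assignment from one row."""
    parts = line.split()
    # When the R column holds "*", it appears as the second field.
    reserved = len(parts) > 1 and parts[1] == "*"
    if reserved:
        parts.pop(1)  # drop the flag so the remaining columns line up
    return {
        "xsb": parts[0],
        "reserved": reserved,
        "did_lsb": parts[1],
        "assignment": parts[2],
    }


# Rows copied from the reserve examples in this section.
add_row = parse_showboards_v_row("01-0 * SP Available y n n Passed Normal n")
move_row = parse_showboards_v_row("01-0 * 01(00) Assigned y y y Passed Normal n")
# Hypothetical row without a reservation (R column assumed blank).
plain_row = parse_showboards_v_row("01-0 00(02) Assigned y y y Passed Normal n")
```

Checking the flag before stopping or rebooting a domain confirms that the reserve operation was accepted, which is the purpose of the showboards -v steps in these examples.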


APPENDIX

A

Message Meaning and Handling

This appendix explains the meaning and handling of DR-related messages.

A.1

Solaris OS Messages

This section explains the console messages printed by the DR driver. Messages for which no output destination is listed are written to the console.

A.1.1

Transition Messages

DR: PROM detach board X

[Explanation] Detach system board X.

OS configure dr@0:SBX::cpuY

[Explanation] Configure CPU Y on system board X.

OS configure dr@0:SBX::memory

[Explanation] Configure memory on system board X.

OS configure dr@0:SBX::pciY

[Explanation] Configure PCI Y on system board X.

OS unconfigure dr@0:SBX::cpuY

[Explanation] Unconfigure CPU Y on system board X.


OS unconfigure dr@0:SBX::memory

[Explanation] Unconfigure memory on system board X.

OS unconfigure dr@0:SBX::pciY

[Explanation] Unconfigure PCI Y on system board X.
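The configure and unconfigure messages above share one pattern, so a log filter can extract the action, board, and component with a single regular expression. The Python sketch below illustrates this. The function name parse_transition is hypothetical, the sample messages are made-up instances of the templates above, and the sketch assumes that X and Y are printed as decimal numbers.

```python
import re

# Matches "OS configure dr@0:SBX::cpuY" and the memory and pci variants.
PATTERN = re.compile(
    r"OS (configure|unconfigure) dr@0:SB(\d+)::(cpu|memory|pci)(\d*)"
)


def parse_transition(message):
    """Return the parts of a configure/unconfigure message, or None."""
    match = PATTERN.search(message)
    if match is None:
        return None
    action, board, component, unit = match.groups()
    return {
        "action": action,
        "board": int(board),
        "component": component,
        # The memory messages carry no unit number.
        "unit": int(unit) if unit else None,
    }


cpu_event = parse_transition("OS configure dr@0:SB1::cpu2")
mem_event = parse_transition("OS unconfigure dr@0:SB1::memory")
```

Messages that do not follow this pattern, such as "DR: PROM detach board X", return None and would need their own handling.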

suspending <device name>@<device info> (aka <alias>)

[Explanation] Suspending the device

suspending <device name>@<device info>

[Explanation] Suspending the device

resuming <device name>@<device info> (aka <alias>)

[Explanation] Resuming the device

resuming <device name>@<device info>

[Explanation] Resuming the device

DR: resuming kernel daemons...

[Explanation] Resuming kernel daemons

DR: resuming user threads...

[Explanation] Resuming user threads

DR: suspending user threads...

[Explanation] Suspending user threads

DR: resume COMPLETED

[Explanation] DR resume operation completed

DR: checking devices...

[Explanation] Checking if there are any DR unsafe device drivers loaded

DR: dr_suspend invoked with force flag

[Explanation] User command requests DR operation without checking for unsafe conditions

DR: suspending drivers

[Explanation] Suspending device drivers

DR: in-kernel unprobe board <board>

[Explanation] Unprobing the board.

A.1.2

PANIC Messages

URGENT_ERROR_TRAP is detected during FMA.

[Explanation] A fatal HW error was encountered during copy-rename.

[Remedy] Please contact customer service.

Failed to remove CMP X LSB NN

[Explanation] There may be inconsistency in the system.

[Remedy] Please contact customer service.

drmach_copy_rename_fini: invalid op code <opcode>

[Explanation] Internal error happened during kernel migration.

[Remedy] Please contact customer service.

Cannot locate source or target board

[Explanation] Cannot locate source or target board during kernel migration.

[Remedy] Please contact customer service.

Could not update device nodes

[Explanation] Could not update device nodes during kernel migration.

[Remedy] Please contact customer service.

Irrecoverable FMEM error <error code>

[Explanation] Internal error during kernel migration

[Remedy] Please contact customer service.

scf fmem request failed error code = 0x<error code>

[Explanation] Internal error during kernel migration

[Remedy] Please contact customer service.

scf_fmem_end() failed rv=0x<error code>

[Explanation] Internal error during kernel migration

[Remedy] Please contact customer service.

CPU nn hang during Copy Rename

[Explanation] A fatal HW error was encountered during copy-rename.

[Remedy] Please contact customer service.

A.1.3

Warning Messages

# megabytes not available to kernel cage

[Explanation] Lack of memory resource detected.

[Remedy] Detach the board, then attach it again.

IKP: init failed

[Explanation] The initial device tree walk to locate the nodes that are interesting to IKP fails.

[Remedy] Please contact customer service.

dr#: failed to alloc soft-state

[Explanation] Failed to allocate soft-state due to lack of the memory resource

[Remedy] Repeat the action. If this error message appears again, please contact customer service.

dr#: module not yet attached

[Explanation] Failed to attach the DR driver.

[Remedy] Repeat the action. If this error message appears again, please contact customer service.

dr_add_memory_spans: unexpected kphysm_add_memory_dynamic return value X; basepfn=Y, npages=Z

[Explanation] There may be inconsistency in the system.

[Remedy] Please contact customer service.

dr_cancel_cpu: failed to disable interrupts on cpu X

[Explanation] Failed to disable interrupt on CPU X.

[Remedy] Disable interrupts on CPU X with psradm -i. If the command fails, respond as directed by the command's message.

dr_cancel_cpu: failed to online cpu X

[Explanation] Failed to online CPU X.

[Remedy] Repeat the action. If this error message appears again, please contact customer service.

dr_cancel_cpu: failed to power-on cpu X

[Explanation] Failed to power-on cpu X

[Remedy] Repeat the action. If this error message appears again, please contact customer service.

dr_copyin_iocmd: (32bit) failed to copyin sbdcmd-struct

[Explanation] There may be inconsistency in the system.

[Remedy] Please contact customer service.

dr_copyin_iocmd: failed to copyin options

[Explanation] There may be inconsistency in the system.

[Remedy] Please contact customer service.

dr_copyin_iocmd: failed to copyin sbdcmd-struct

[Explanation] There may be inconsistency in the system.

[Remedy] Please contact customer service.

dr_copyout_errs: (32bit) failed to copyout

[Explanation] There may be inconsistency in the system.

[Remedy] Please contact customer service.

dr_copyout_errs: failed to copyout

[Explanation] There may be inconsistency in the system.

[Remedy] Please contact customer service.

dr_copyout_iocmd: (32bit) failed to copyout sbdcmd-struct

[Explanation] There may be inconsistency in the system.

[Remedy] Please contact customer service.

dr_copyout_iocmd: failed to copyout sbdcmd-struct

[Explanation] There may be inconsistency in the system.

[Remedy] Please contact customer service.

dr_status: failed to copyout status for board #

[Explanation] There may be inconsistency in the system.

[Remedy] Please contact customer service.

dr_status: unknown dev type (#)

[Explanation] There may be inconsistency in the system.

[Remedy] Please contact customer service.

dr_dev2devset: invalid cpu unit# = #

[Explanation] Invalid argument is passed to the driver or there may be inconsistency in the system.

[Remedy] Repeat the action. If this error message appears again, please contact customer service.

dr_dev2devset: invalid io unit# = #

[Explanation] Invalid argument is passed to the driver or there may be inconsistency in the system.

[Remedy] Repeat the action. If this error message appears again, please contact customer service.

dr_dev2devset: invalid mem unit# = #

[Explanation] Invalid argument is passed to the driver or there may be inconsistency in the system.

[Remedy] Repeat the action. If this error message appears again, please contact customer service.

dr_exec_op: unknown command (#)

[Explanation] Invalid argument is passed to the driver or there may be inconsistency in the system.

[Remedy] Repeat the action. If this error message appears again, please contact customer service.

dr_post_attach_cpu: cpu_get failed for cpu X

[Explanation] There may be inconsistency in the system.

[Remedy] Please contact customer service.

dr_pre_release_cpu: thread(s) bound to cpu X

[Explanation] A thread in a process is bound to CPU X, which is being detached.

[Remedy] Use the pbind(1M) command to check whether a process is bound to the CPU. If one is, unbind it from the CPU and repeat the action.

dr_pre_release_mem: unexpected kphysm_del_release return value #

[Explanation] There may be inconsistency in the system.

[Remedy] Please contact customer service.

dr_pt_ioctl: invalid passthru args

[Explanation] Invalid argument is passed to the driver or there may be inconsistency in the system.

[Remedy] Repeat the action. If this error message appears again, please contact customer service.

dr_release_mem: unexpected kphysm error code #, id 0xX

[Explanation] There may be inconsistency in the system.

[Remedy] Please contact customer service.

dr_release_mem_done: mem-unit (X.Y): deleted memory still found in phys_install

[Explanation] There may be inconsistency in the system.

[Remedy] Please contact customer service.

dr_release_mem_done: target: mem-unit (X.Y): deleted memory still found in phys_install

[Explanation] There may be inconsistency in the system.

[Remedy] Please contact customer service.

dr_release_mem_done: unexpected kphysm_del_release return value #

[Explanation] There may be inconsistency in the system.

[Remedy] Please contact customer service.

dr_reserve_mem_spans memory reserve failed. Unexpected kphysm_del_span return value #; basepfn=# npages=#

[Explanation] The selected target board can no longer fit all the kernel memory of the source board since it was last selected.

[Remedy] Please repeat the action. If the problem remains, please contact customer service.

dr_release_mem_done: <device path>: error <error code> noted

[Explanation] Error noted for a device during releasing memory.

[Remedy] Please contact customer service.

drmach_log_sysevent failed (rv #) for SBX

[Explanation] There may be a minor error in the system.

[Remedy] Please contact customer service.

unexpected kcage_range_add return value #

[Explanation] There may be inconsistency in the system.

[Remedy] Please contact customer service.

unexpected kcage_range_delete return value #

[Explanation] There may be inconsistency in the system.

[Remedy] Please contact customer service.

dr_select_mem_target: no memlist for mem-unit X, board Y

[Explanation] Detected inconsistency of the memory unit information in the DR driver's internal data.

[Remedy] Please contact customer service.

FAILED to suspend <device name>@<device info>

[Explanation] Device suspension failed

[Remedy] Repeat the action. If the message persists, please contact customer service.

FAILED to resume <device name>@<device info>

[Explanation] The device cannot be resumed.

[Remedy] Please contact customer service.

dr_stop_user_threads: failed to stop thread: process=<name>, pid=#

[Explanation] Cannot stop the user thread.

[Remedy] Please contact customer service.

Cannot stop user thread: <pid> <pid> ...

[Explanation] The DR driver cannot stop all the user processes in the list.

[Remedy] Please contact customer service.

[Output] Console and Standard Output

Cannot setup memory node

[Explanation] DR is unable to read the HW information for the memory device.

[Remedy] Please contact customer service.

Kernel Migration fails. 0xX

[Explanation] Kernel data migration failed as a result of DR detach.

[Remedy] Please contact customer service.

TOD on board X has already been attached.

[Explanation] The Time of Day (TOD) clock on board X has already been attached. This may be a minor inconsistency in the system.

[Remedy] Please contact customer service.

TOD on board X has already been removed.

[Explanation] The Time of Day (TOD) clock on board X has already been removed. This may be a minor inconsistency in the system.

[Remedy] Please contact customer service.

Unable to detach last available TOD on board X

[Explanation] Detaching the system board would detach the last available Time of Day (TOD) clock.

[Remedy] Attach another system board before detaching.

Device in fatal state

[Explanation] There may be inconsistency in the system.

[Remedy] Please contact customer service.

[Output] Console and Standard Output

I/O error: dr@0:SBX::memory

[Explanation] There may be inconsistency in the system.

[Remedy] Please contact customer service.

[Output] Console and Standard Output

Invalid argument

[Explanation] Invalid argument is passed to the driver or there may be inconsistency in the system.

[Remedy] Repeat the action. If this error message appears again, please contact customer service.

[Output] Console and Standard Output

Invalid argument: ########

[Explanation] Invalid argument is passed to the driver.

[Remedy] Repeat the action. If this error message appears again, please contact customer service.

[Output] Console and Standard Output

Invalid CPU/core state

[Explanation] DR finds some faulty CPU that fails to power on.

[Remedy] Please contact customer service.

No error

[Explanation] Invalid argument is passed to the driver or there may be inconsistency in the system.

[Remedy] Repeat the action. If this error message appears again, please contact customer service.

[Output] Console and Standard Output

no error: dr@0:SBX::memory

[Explanation] There may be inconsistency in the system.

[Remedy] Repeat the action. If this error message appears again, please contact customer service.

[Output] Console and Standard Output

Unrecognized platform command: #

[Explanation] Invalid argument is passed to the driver or there may be inconsistency in the system.

[Remedy] Repeat the action. If this error message appears again, please contact customer service.

[Output] Console and Standard Output

Bad address: dr@0:SBX::memory

[Explanation] There may be inconsistency in the system.

[Remedy] Please contact customer service.

[Output] Console and Standard Output

Cannot read property value: device node XXXXXX property: name

[Explanation] Failed to get the property from OBP.

[Remedy] Please contact customer service.

[Output] Console and Standard Output

Cannot read property value: property: scf-cmd-reg

[Explanation] There may be inconsistency in the system.

[Remedy] Please contact customer service.

[Output] Console and Standard Output

Cannot find mc-opl interface

[Explanation] DR cannot locate the suspend/resume interface of the mc-opl driver. The mc-opl driver is probably not loaded, or an incorrect version is in use.

[Remedy] Please contact customer service.

Cannot find scf_fmem interface

[Explanation] DR cannot locate the FMEM interface functions of the SCF driver. The SCF driver is probably not loaded, or an incorrect version is in use.

[Remedy] Please contact customer service.

Device busy: dr@0:SBX::pciY

[Explanation] Some devices are still referenced.

[Remedy] Confirm that no device in this PCI slot is in use, then repeat the action. If this error message appears again, please contact customer service.

[Output] Console and Standard Output

Device driver failure: path

[Explanation] The device driver failed in attach or detach operation.

[Remedy] Repeat the action. If this error message appears again, please contact customer service.

[Output] Console and Standard Output

Error setting up FMEM buffer

[Explanation] DR fails to allocate enough memory to perform copy rename.

[Remedy] Retry and if the problem persists, contact customer service.

Failed to off-line: dr@0:SBX::cpuY

[Explanation] Failed to off-line CPU Y on board X.

[Remedy] Repeat the action. If this error message appears again, please contact customer service.

[Output] Console and Standard Output

Failed to on-line: dr@0:SBX::cpuY

[Explanation] Failed to online CPU Y on system board X.

[Remedy] Bring the CPU online with psradm -n. If the command fails, respond as directed by the command's message.

[Output] Console and Standard Output

Failed to start CPU: dr@0:SBX::cpuY

[Explanation] Failed to start CPU Y on system board X.

[Remedy] Please contact customer service.

[Output] Console and Standard Output

Failed to stop CPU: dr@0:SBX::cpuY

[Explanation] Failed to stop CPU Y on system board X.

[Remedy] Please contact customer service.

[Output] Console and Standard Output

Firmware deprobe failed: SBX::cpuY

[Explanation] Failed to deprobe the CPU.

[Remedy] Please contact customer service.

[Output] Console and Standard Output

Firmware probe failed: SBX

[Explanation] Failed to probe the board.

[Remedy] Respond in the manner directed by the other message.

[Output] Console and Standard Output

Insufficient memory: dr@0:SBX::memory

[Explanation] Detected lack of memory resource.

[Remedy] Check the size of memory, detach the board and attach again. If the problem still exists, please contact customer service.

[Output] Console and Standard Output

Internal error: dr.c #

[Explanation] There may be inconsistency in the system.

[Remedy] Please contact customer service.

[Output] Console and Standard Output

Internal error: dr_mem.c #

[Explanation] There may be inconsistency in the system.

[Remedy] Please contact customer service.

[Output] Console and Standard Output

Invalid argument: dr@0:SBX::memory

[Explanation] The memory on board X is currently involved in another DR operation and cannot be detached.

[Remedy] Repeat the action. If this error message appears again, please contact customer service.

[Output] Console and Standard Output

Invalid board number: X

[Explanation] Invalid board number.

[Remedy] Check the board number and repeat the action. If this error message appears again, please contact customer service.

[Output] Console and Standard Output

Kernel cage is disabled:

[Explanation] The kernel cage memory feature is disabled.

[Remedy] Ensure /etc/system is edited to enable kernel cage memory.

[Output] Console and Standard Output
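
Before retrying the DR operation, the contents of /etc/system can be checked for the kernel cage setting. The following is a minimal Python sketch. It assumes that the cage is enabled by a line of the form set kernel_cage_enable=1 and that comment lines start with "*" or "#"; confirm both against the kernel cage description in this guide. The function name kernel_cage_enabled is hypothetical, and the sketch assumes that the last matching line in the file is the one that takes effect.

```python
import re

# Assumption: the cage is enabled by a line of the form
#   set kernel_cage_enable=1
# and lines starting with "*" or "#" are comments.
_CAGE_RE = re.compile(r"^\s*set\s+kernel_cage_enable\s*=\s*(\S+)")


def kernel_cage_enabled(etc_system_text: str) -> bool:
    """Return True if the last kernel_cage_enable line in the text sets it to 1."""
    enabled = False
    for line in etc_system_text.splitlines():
        if line.lstrip().startswith(("*", "#")):
            continue  # skip comment lines
        m = _CAGE_RE.match(line)
        if m:
            # Only the literal value 1 is treated as "enabled" in this sketch.
            enabled = m.group(1) == "1"
    return enabled
```

The function takes the file contents as a string, so it can be run against a copy of the file. Changes to /etc/system take effect only after the domain is rebooted.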

Memory operation failed: dr@0:SBX::memory

[Explanation] There may be inconsistency in the system.

[Remedy] Please contact customer service.

[Output] Console and Standard Output

Memory operation refused: dr@0:SBX::memory

[Explanation] The DR operation is refused.

[Remedy] Respond in the manner directed by the other message.

Memory operation cancelled: dr@0:SBX::memory

[Explanation] The DR operation is canceled.

[Remedy] Respond in the manner directed by the other message.

No device(s) on board: dr@0:SBX

[Explanation] There may be inconsistency in the system.

[Remedy] Please contact customer service.

[Output] Console and Standard Output

Non-relocatable pages in span: dr@0:SBX::memory

[Explanation] There is non-relocatable (kernel) memory on the system board.

[Remedy] A board that contains kernel memory cannot be disconnected by DR. Whether a board with kernel memory can be removed depends on the hardware model.

Operator confirmation for quiesce is required: dr@0:SBX::memory

[Explanation] There is non-relocatable (kernel) memory on the board.

[Remedy] The target board with kernel memory cannot be disconnected by DR.

[Output] Console and Standard Output

Unexpected internal condition: drmach.c #

[Explanation] There may be inconsistency in the system.

[Remedy] Please contact customer service.

[Output] Console and Standard Output

Unexpected internal condition: SBX

[Explanation] The attempt to call OBP failed.

[Remedy] Repeat the action. If this error message appears again, please contact customer service.

[Output] Console and Standard Output

Device busy: dr@0:SBX::cpuY

[Explanation] CPU Y on system board X is busy during release operation.

[Remedy] Repeat the action. If this error message appears again, please contact customer service.

[Output] Console and Standard Output

Insufficient memory: dr@0:SBX::cpuY

[Explanation] Lack of memory resources detected.

[Remedy] Check the size of available memory and detach the board. If the problem still exists, please contact customer service.

[Output] Console and Standard Output

Invalid argument: dr@0:SBX::cpuY

[Explanation] There may be inconsistency in the system.

[Remedy] Please contact customer service.

[Output] Console and Standard Output

Invalid state transition: dr@0:SBX::cpuY

[Explanation] Invalid state transition of cpu Y on system board X

[Remedy] Repeat the action. If the problem still exists, please contact customer service.

[Output] Console and Standard Output

Invalid state transition: dr@0:SBX::memory

[Explanation] Invalid state transition of memory on system board X

[Remedy] Repeat the action. If the problem still exists, please contact customer service.

[Output] Console and Standard Output

Invalid state transition: dr@0:SBX::pciY

[Explanation] Invalid state transition of pci Y on system board X

[Remedy] Repeat the action. If the problem still exists, please contact customer service.

[Output] Console and Standard Output

No such device: dr@0:SBX::cpuY

[Explanation] There may be inconsistency in the system.

[Remedy] Please contact customer service.

[Output] Console and Standard Output

Operation already in progress: dr@0:SBX::cpuY

[Explanation] The operation on cpu Y on system board X is in progress.

[Remedy] Repeat the action. If the problem still exists, please contact customer service.

[Output] Console and Standard Output

dr_move_memory: failed to quiesce OS for copy-rename

[Explanation] There is a task not suspended in the process.

[Remedy] Repeat the action. If this error message appears again, please contact customer service.

[Output] Console and Standard Output

No available memory target: dr@0:SBX::memory

[Explanation] The system board cannot be detached because it contains kernel memory and there is no available target memory board.

[Remedy] Add a new system board, then try the detach operation again.

[Output] Console and Standard Output

Unsafe driver present: <driver name|major #> ...

[Explanation] DR driver found DR unsafe drivers in the system.

[Remedy] Unload the unsafe drivers and try the DR operation again.

[Output] Console and Standard Output

Device failed to resume: <driver name|major #> ...

[Explanation] Devices on the list failed to resume

[Remedy] Please contact customer service.

[Output] Console and Standard Output

Device failed to suspend: <driver name|major #> ...

[Explanation] Devices on the list failed to suspend

[Remedy] Please contact customer service.

[Output] Console and Standard Output

Operation not supported: ERROR

[Explanation] Invalid operation.

[Remedy] Repeat the action. If this error message appears again, please contact customer service.

[Output] Console and Standard Output

Cannot setup resource map opl-fcodemem

[Explanation] Resource memory mapping cannot be set up.

[Remedy] Please contact customer service.

opl_cfg failed to load, error=<errno>

[Explanation] opl_cfg module failed to load.

[Remedy] Please contact customer service.

IKP: failed to read HWD header

[Explanation] The header of the hardware descriptor could not be read.

[Remedy] Please contact customer service.

IKP: create cpu (<board>-<chip>-<core>-<cpu>) failed

[Explanation] There was a problem creating the device node for a cpu.

[Remedy] Please contact customer service.

IKP: create core (<board>-<chip>-<core>) failed

[Explanation] There was a problem creating the device node for a core.

[Remedy] Please contact customer service.

IKP: create chip (<board>-<chip>) failed

[Explanation] There was a problem creating the device node for a chip.

[Remedy] Please contact customer service.

IKP: create pseudo-mc (<board>) failed

[Explanation] There was a problem creating the pseudo-mc device node for the board.

[Remedy] Please contact customer service.

opl_claim_memory - unable to allocate contiguous memory of size zero

[Explanation] A claim request with size zero was issued by the fcode interpreter.

[Remedy] If DR failed after this message, please contact customer service.

opl_claim_memory - vhint is not zero vhint=0x<vhint> - Ignoring Argument

[Explanation] A claim request with a nonzero hint came from the fcode interpreter.

[Remedy] If DR failed after this message, please contact customer service.

opl_claim_memory - unable to allocate contiguous memory

[Explanation] Memory allocation failed for the fcode interpreter.

[Remedy] If DR failed after this message, please contact customer service.

opl_get_fcode: Unable to copy out fcode image

[Explanation] Failed to copy out the fcode image to the efcode daemon.

[Remedy] If DR failed after this message, please contact customer service.

opl_get_hwd_va: Unable to copy out cmuch descriptor for <addr>

[Explanation] Failed to copy out the cmuch HWD to the efcode daemon.

[Remedy] If DR failed after this message, please contact customer service.

opl_get_hwd_va: Unable to copy out pcich descriptor for <addr>

[Explanation] Failed to copy out the pcich HWD to the efcode daemon.

[Remedy] If DR failed after this message, please contact customer service.

IKP: create leaf (<board>-<channel>-<leaf>) failed

[Explanation] A device node was not created for a PCI device.

[Remedy] If DR failed after this message, please contact customer service.

IKP: Unable to probe PCI leaf (<board>-<channel>-<leaf>)

[Explanation] The fcode interpreter returned a bad status for the probe.

[Remedy] If DR failed after this message, please contact customer service.

IKP: Unable to bind PCI leaf (<board>-<channel>-<leaf>)

[Explanation] The driver binding fails, after the leaf has been probed.

[Remedy] If DR failed after this message, please contact customer service.

IKP: destroy pci (<board>-<channel>-<leaf>) failed

[Explanation] The node was not destroyed.

[Remedy] Please contact customer service.

IKP: destroy pseudo-mc (<board>) failed

[Explanation] The node was not destroyed.

[Remedy] Please contact customer service.

IKP: destroy chip (<board>-<chip>) failed

[Explanation] The node was not destroyed.

[Remedy] Please contact customer service.

dr_del_mlist_query: mlist=NULL

[Explanation] The memory list to be deleted is NULL. This warning is also shown for a board that has no memory.

[Remedy] Please ignore this message on memoryless boards. If DR failed after this message, please contact customer service.

dr_memlist_canfit: memlist_dup failed

[Explanation] The system might have run out of memory, or the board has no memory.

[Remedy] Please ignore this message on memoryless boards. If DR failed after this message, please check if the system has enough memory resource and repeat the action. If the error remains, please contact customer service.

Cannot get floating-boards proplen

[Explanation] Failed to get property information of floating-boards.

[Remedy] Please contact customer service.

Cannot get floating-boards prop

[Explanation] Failed to get property information of floating-boards.

[Remedy] Please contact customer service.

Device node 0x<dip> has invalid property value, board#=<board>

[Explanation] The device node has invalid property value.

[Remedy] Please contact customer service.

DR - IKP initialization failed

[Explanation] IKP initialization failed

[Remedy] Please contact customer service.

I/O callback failed in pre-release

[Explanation] I/O callback failed in pre-release

[Remedy] Please contact customer service.

I/O callback failed in post-attach

[Explanation] I/O callback failed in post-attach

[Remedy] Please contact customer service.

Kernel Migration fails. 0x%x

[Explanation] Internal error happened during kernel migration.

[Remedy] Please contact customer service.

Failed to add CMP%d on board %d

[Explanation] CPU failed to power-on during DR attach.

[Remedy] Please contact customer service.

FMEM error = 0x<error code>

[Explanation] DR detects error during the copy rename operation.

[Remedy] Please contact customer service.

Cannot proceed; Board is configured or busy

[Explanation] Board cannot be disconnected because its status is busy.

[Remedy] Repeat the action. If the problem still exists, please contact customer service.

drmach parameter is not a valid ID

[Explanation] ID parameter for status command is not a valid ID.

[Remedy] Correct the format of the ID parameter.

drmach parameter is inappropriate for operation

[Explanation] Parameter(s) for DR command specified incorrectly.

[Remedy] Correct the parameter(s).

drmach_node_ddi_get_parent: NULL dip

[Explanation] Internal error during DR operation.

[Remedy] Please contact customer service.

drmach_node_ddi_get_parent: NULL parent dip

[Explanation] Internal error during DR operation.

[Remedy] Please contact customer service.

Failed to remove CMP xx on board n

[Explanation] Internal error during DR operation.

[Remedy] Please contact customer service.

scf_fmem_cancel() failed rv=0x<error code>

[Explanation] Internal error during kernel migration.

[Remedy] Please contact customer service.

scf_fmem_start error

[Explanation] SCF failed to start the FMEM operation. It is possible that there is a HW error and there is no SCF path or the SP is down.

[Remedy] Please contact customer service.

scf_fmem_cancel error

[Explanation] DR detects some error in the copy rename process and informs SCF to cancel the operation. However, SCF fails to cancel the operation.

[Remedy] Please contact customer service.

Unknown cpu implementation

[Explanation] There may be inconsistency in the system.

[Remedy] Please contact customer service.

dr_mem_ecache_scrub:address (0x%lx) not on page boundary

[Explanation] There may be inconsistency in the system.

[Remedy] Please contact customer service.

unexpected kcage_range_delete_post_mem_del return value

[Explanation] There may be inconsistency in the system.

[Remedy] Please contact customer service.

opl_fc_ops_free_handle: DMA seen!

[Explanation] A DMA resource was found in the resource list that is being freed while the board is unprobed.

[Remedy] Please contact customer service.

opl_fc_ops_free: unknown resource type <type>

[Explanation] An unknown resource type was found in the resource list that is being freed while the board is unprobed.

[Remedy] Please contact customer service.

VM viability test failed: dr@0:SBX::memory

[Explanation] There is not enough real memory to detach memory on system board X.

[Remedy] Check the amount of available real memory, and repeat the action. If this error message appears again, please contact customer service.

DR parallel copy timeout

[Explanation] Internal error happened during kernel migration.

[Remedy] Retry and if the problem persists, contact customer service.

SCF busy

[Explanation] SCF was busy during kernel migration.

[Remedy] Retry and if the problem persists, contact customer service.

SCF I/O Retry Error

[Explanation] Internal error happened during kernel migration.

[Remedy] Please contact customer service.

FMEM command timeout

[Explanation] Internal error happened during kernel migration.

[Remedy] Please contact customer service.

Hardware error

[Explanation] Internal error happened during kernel migration.

[Remedy] Please contact customer service.

FMEM operation terminated

[Explanation] Internal error happened during kernel migration.

[Remedy] Please contact customer service.

Memory copy error

[Explanation] Memory copy error happened during kernel migration.

[Remedy] Retry and if the problem persists, contact customer service.

SCF error

[Explanation] Internal error happened during kernel migration.

[Remedy] Please contact customer service.

A.2

Command Messages

A.2.1

addboard

XSB#XX-X will be assigned to DomainID X. Continue? [y|n]:

[Explanation] Asks for confirmation before the DR operation is executed. Enter "y" to execute it or "n" to cancel it.

XSB#XX-X will be configured into DomainID X. Continue? [y|n]:

[Explanation] Asks for confirmation before the DR operation is executed. Enter "y" to execute it or "n" to cancel it.

DR operation canceled by operator.

[Explanation] DR operation canceled by operator.
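
When addboard output is captured by a script, the confirmation prompts above can be recognized before an answer is supplied. The following is a minimal Python sketch. It assumes that XSB#XX-X expands to a two-digit board number and a one-digit XSB number and that the domain ID is a decimal number; the guide shows only the placeholders. The function name parse_addboard_prompt is hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumption: XSB#XX-X is a two-digit board number and a one-digit
# XSB number, and DomainID X is a decimal number.
_PROMPT_RE = re.compile(
    r"XSB#(\d{2})-(\d)\s*will be (assigned to|configured into) "
    r"DomainID (\d+)\. Continue\? \[y\|n\]:"
)


def parse_addboard_prompt(text: str) -> Optional[Tuple[str, str, int]]:
    """Return (xsb, operation, domain_id) for a confirmation prompt, else None."""
    m = _PROMPT_RE.search(text)
    if m is None:
        return None
    board, xsb_num, verb, domain = m.groups()
    # "assigned to" and "configured into" distinguish the two prompt forms.
    operation = "assign" if verb == "assigned to" else "configure"
    return f"{board}-{xsb_num}", operation, int(domain)
```

Any other text, including the cancellation message, returns None, so the caller can fall through to its normal error handling.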

Domain (DomainID X) is not currently running.

[Explanation] The destination domain was not running when "-c configure" was specified.

[Remedy] Execute the command with "-c assign" instead.

XSB#XX-X is already assigned to another domain.

[Explanation] The specified system board (XSB#XX-X) has already been assigned to another domain.

[Remedy] Confirm the XSB assignment with showboards(8).

XSB#XX-X is not installed.

[Explanation] System board (XSB#XX-X) is not installed.

[Remedy] The wrong XSB was specified. Confirm the XSB with showboards(8).

XSB#XX-X is currently unavailable for DR. Try again later.

[Explanation] Another operation is already being executed on the specified system board (XSB#XX-X).

[Remedy] A DR or power-off operation is in progress in another session. Wait a while, confirm the XSB status, and try again.

XSB#XX-X has not been registered in DCL.

[Explanation] System board (XSB#XX-X) is not registered to DCL.

[Remedy] Register the DCL information with setdcl(8).

Another DR operation is in progress. Try again later.

[Explanation] Another session is already executing a DR operation.

[Remedy] A DR operation is in progress in another session. Wait a while, confirm the XSB status, and try again.

XSB#XX-X has been detected timeout by DR self test.

[Explanation] The timeout occurred during DR processing because the hardware diagnosis did not complete. There is something wrong with the hardware.

[Remedy] Determine the cause of the DR failure by referring to the monitoring messages and the error log. Replace the failed component.

XSB#XX-X encountered a hardware error. See error log for details.

[Explanation] An error occurred during hardware diagnosis. There is something wrong with the hardware.

[Remedy] Find out the cause of the DR failure referring monitoring message and errorlog. Replace the failure component.

IP address of DSCP path is not specified.

[Explanation] DR cannot communicate with the domain because the DSCP IP

Address is not set up or registered.

[Remedy] Register the DSCP IP Address.

An internal error has occurred. This may have been caused by a

DR library error.

[Explanation] The DR processing cannot be failed on the domain OS. The error occurred at the DR library.


[Remedy] Determine the cause of the DR failure by referring to the monitoring messages and the error log. Check the patch status and the XCP version.

DR failed. Domain (DomainID X) cannot communicate via DSCP path.

[Explanation] DR processing cannot communicate with the domain. Possible causes are that the domain is powered off, the DSCP settings are wrong, or an error occurred on the DSCP path.

[Remedy] Check whether the domain is powered off, whether the DSCP settings are correct, and whether a DSCP error appears in the monitoring messages and the error log.

XSB#XX-X could not be configured into DomainID X due to operating system error.

[Explanation] The DR library of the domain OS returned an error during DR processing. The error occurred in the configuration management of the domain OS.

[Remedy] Determine the cause of the DR failure by referring to the monitoring messages and the console messages. Resolve the cause, and then try again.

Invalid parameter.

[Explanation] There is an error in the specified argument or operand.

[Remedy] Confirm the specified argument or operand and execute the command once again.

Permission denied.

[Explanation] You do not have the required privilege.

[Remedy] Check the user privilege and the privilege required by the command. On high-end servers, also check whether the command was executed on the standby XSCF.

The current configuration does not support this operation.

[Explanation] The command cannot be executed in the current configuration, or the operation is not supported.

[Remedy] Check the current hardware configuration and the support status.

A hardware error occurred. Please check the error log for details.

[Explanation] A hardware error occurred. Check the monitoring messages and the error log.

[Remedy] Determine the cause of the DR failure by referring to the monitoring messages and the error log. Replace the failed component.


An internal error has occurred. Please contact your system administrator.

[Explanation] DR failed, possibly because of an internal error in XSCF.

[Remedy] Determine the cause of the DR failure by referring to the monitoring messages and the error log. Also check the XCP version.

Timeout detected during self-test of XSB#XX-X.

[Explanation] A timeout occurred because the hardware diagnosis in DR did not complete. A hardware error may have occurred.

[Remedy] Determine the cause of the DR failure by referring to the monitoring messages and the error log. Replace the failed component.

A.2.2

deleteboard

XSB#XX-X will be unassigned from domain immediately. Continue? [y|n]:

[Explanation] Asks you to confirm whether to execute the DR operation. Enter "y" to execute it or "n" to cancel it.

XSB#XX-X will be unconfigured from domain immediately. Continue? [y|n]:

[Explanation] Asks you to confirm whether to execute the DR operation. Enter "y" to execute it or "n" to cancel it.

XSB#XX-X will be unassigned from domain after the domain restarts. Continue? [y|n]:

[Explanation] Asks you to confirm whether to execute the DR operation. Enter "y" to execute it or "n" to cancel it.

DR operation canceled by operator.

[Explanation] DR operation canceled by operator.

XSB#XX-X is not installed.

[Explanation] System board (XSB#XX-X) is not installed.

[Remedy] The wrong XSB was specified. Check the XSB number by using showboards(8).

XSB#XX-X is currently unavailable for DR. Try again later.


[Explanation] The specified system board (XSB#XX-X) is already being processed by another operation.

[Remedy] A DR or power-off operation is running in another session. Wait a while, check the XSB status, and then try again.

XSB#XX-X has not been registered to DCL.

[Explanation] The system board (XSB#XX-X) is not registered in the DCL.

[Remedy] Register the DCL information by using setdcl(8).

XSB#XX-X is the last LSB for DomainID X, and this domain is still running. Operation failed.

[Explanation] XSB#XX-X is the last LSB for DomainID X, so it cannot be deleted while the domain is running.

[Remedy] Power off the domain by specifying "-c reserve".

IP address of DSCP path is not specified.

[Explanation] DR cannot communicate with the domain because the DSCP IP address is not set up or not registered.

[Remedy] Register the DSCP IP address.

An internal error has occurred. This may have been caused by a DR library error.

[Explanation] DR processing failed on the domain OS. The error occurred in the DR library.

[Remedy] Determine the cause of the DR failure by referring to the monitoring messages and the error log. Check the patch status and the XCP version.

DR failed. Domain (DomainID X) cannot communicate via DSCP path.

[Explanation] DR processing cannot communicate with the domain. Possible causes are that the domain is powered off, the DSCP settings are wrong, or an error occurred on the DSCP path.

[Remedy] Check whether the domain is powered off, whether the DSCP settings are correct, and whether a DSCP error appears in the monitoring messages and the error log.

XSB#XX-X could not be unconfigured from DomainID X due to operating system error.

[Explanation] The DR library of the domain OS returned an error during DR processing. The error occurred in the configuration management of the domain OS.

[Remedy] Determine the cause of the DR failure by referring to the monitoring messages and the console messages. Resolve the cause, and then try again.


Invalid parameter.

[Explanation] There is an error in the specified argument or operand.

[Remedy] Confirm the specified argument or operand and execute the command once again.

Permission denied.

[Explanation] You do not have the required privilege.

[Remedy] Check the user privilege and the privilege required by the command. On high-end servers, also check whether the command was executed on the standby XSCF.

A hardware error occurred. Please check the error log for details.

[Explanation] A hardware error occurred. Check the monitoring messages and the error log.

[Remedy] Determine the cause of the DR failure by referring to the monitoring messages and the error log. Replace the failed component.

An internal error has occurred. Please contact your system administrator.

[Explanation] DR failed, possibly because of an internal error in XSCF.

[Remedy] Determine the cause of the DR failure by referring to the monitoring messages and the error log. Also check the XCP version.

A.2.3

moveboard

XSB#XX-X will be moved from DomainID X to DomainID X immediately. Continue? [y|n]:

[Explanation] Asks you to confirm whether to execute the DR operation. Enter "y" to execute it or "n" to cancel it.

XSB#XX-X will be assigned to DomainID X immediately. Continue? [y|n]:

[Explanation] Asks you to confirm whether to execute the DR operation. Enter "y" to execute it or "n" to cancel it.

XSB#XX-X will be assigned to DomainID X after DomainID X restarts. Continue? [y|n]:


[Explanation] Asks you to confirm whether to execute the DR operation. Enter "y" to execute it or "n" to cancel it.

DR operation canceled by operator.

[Explanation] DR operation canceled by operator.

Domain (DomainID X) is not currently running.

[Explanation] The destination domain (DomainID X) was not running when "-c configure" was specified.

[Remedy] Execute it by specifying "-c assign".

XSB#XX-X cannot be moved due to System Board Pool.

[Explanation] An XSB in the system board pool cannot be moved.

[Remedy] Use the addboard(8) command instead.

XSB#XX-X is not installed.

[Explanation] System board (XSB#XX-X) is not installed.

[Remedy] The wrong XSB was specified. Check the XSB number by using showboards(8).

XSB#XX-X is currently unavailable for DR. Try again later.

[Explanation] The specified system board (XSB#XX-X) is already being processed by another operation.

[Remedy] A DR or power-off operation is running in another session. Wait a while, check the XSB status, and then try again.

XSB#XX-X has not been registered in DCL.

[Explanation] The system board (XSB#XX-X) is not registered in the DCL.

[Remedy] Register the DCL information by using setdcl(8).

Another DR operation is in progress. Try again later.

[Explanation] The specified system board (XSB#XX-X) is already being processed by another session.

[Remedy] A DR operation is running in another session. Wait a while, check the XSB status, and then try again.

XSB#XX-X is the last LSB for DomainID X, and this domain is still running. Operation failed.

[Explanation] XSB#XX-X is the last LSB for DomainID X, so it cannot be moved while the domain is running.


[Remedy] Power off the domain by specifying "-c reserve".

XSB#XX-X detected timeout by DR self test.

[Explanation] A timeout occurred during DR processing because the hardware diagnosis did not complete. There is a problem with the hardware.

[Remedy] Determine the cause of the DR failure by referring to the monitoring messages and the error log. Replace the failed component.

XSB#XX encountered a hardware error. See error log for details.

[Explanation] An error occurred during hardware diagnosis. There is a problem with the hardware.

[Remedy] Determine the cause of the DR failure by referring to the monitoring messages and the error log. Replace the failed component.

IP address of DSCP path is not specified.

[Explanation] DR processing cannot communicate with the domain because the DSCP IP address is not set up.

[Remedy] Register the DSCP IP address.

An internal error has occurred. This may have been caused by a DR library error.

[Explanation] DR processing failed on the domain OS. The error occurred in the DR library.

[Remedy] Determine the cause of the DR failure by referring to the monitoring messages and the error log. Check the patch status and the XCP version.

DR failed. Domain (DomainID X) cannot communicate via DSCP path.

[Explanation] DR processing cannot communicate with the domain. Possible causes are that the domain is powered off, the DSCP settings are wrong, or an error occurred on the DSCP path.

[Remedy] Check whether the domain is powered off, whether the DSCP settings are correct, and whether a DSCP error appears in the monitoring messages and the error log.

XSB#03-0 could not be unconfigured from DomainID 1 due to operating system error, or XSB#03-0 could not be configured into DomainID 0 due to operating system error.

[Explanation] The DR library of the domain OS returned an error during DR processing. The error occurred in the configuration management of the domain OS.


[Remedy] Determine the cause of the DR failure by referring to the monitoring messages and the console messages. Resolve the cause, and then try again.

Invalid parameter.

[Explanation] There is an error in the specified argument or operand.

[Remedy] Confirm the specified argument or operand and execute the command once again.

Permission denied.

[Explanation] You do not have the required privilege.

[Remedy] Check the user privilege and the privilege required by the command. On high-end servers, also check whether the command was executed on the standby XSCF.

The current configuration does not support this operation.

[Explanation] The command cannot be executed in the current configuration, or the operation is not supported.

[Remedy] Check the current hardware configuration and the support status.

A hardware error occurred. Please check the error log for details.

[Explanation] A hardware error occurred. Check the monitoring messages and the error log.

[Remedy] Determine the cause of the DR failure by referring to the monitoring messages and the error log. Replace the failed component.

An internal error has occurred. Please contact your system administrator.

[Explanation] DR failed, possibly because of an internal error in XSCF.

[Remedy] Determine the cause of the DR failure by referring to the monitoring messages and the error log. Also check the XCP version.

Timeout detected during self-test of XSB#XX-X.

[Explanation] A timeout occurred because the hardware diagnosis in DR did not complete. A hardware error may have occurred.

[Remedy] Find out the cause of the DR failure referring to the monitoring message and error log. Replace the failed component.

XSB#XX-X will be assigned to DomainID X. Continue? [y|n]:

[Explanation] Asks you to confirm whether to execute the DR operation. Enter "y" to execute it or "n" to cancel it.

XSB#XX-X will be configured into DomainID X. Continue? [y|n]:

[Explanation] Asks you to confirm whether to execute the DR operation. Enter "y" to execute it or "n" to cancel it.

XSB#XX-X could not be configured into DomainID X due to operating system error.

[Explanation] The DR library of the domain OS returned an error during the configuration process. The error occurred in the configuration management of the domain OS.

[Remedy] Determine the cause of the DR failure by referring to the monitoring messages and the console messages. Resolve the cause, and then try again.

A.2.4

setdcl

XSB is already assigned to an LSB in a running Domain (DomainID X).

[Explanation] The system board of the specified LSB has already been registered in DCL.

[Remedy] Power off the domain, or move the XSB to the system board pool, and then try again.

LSB#00 is already registered in DCL.

[Explanation] The system board of the specified LSB has already been registered in DCL.

[Remedy] Check the domain, LSB, and XSB, and set up the data correctly.

LSB#00 has not been registered in DCL yet.

[Explanation] The domain and LSB had not been set up when the DCL settings for no-mem, no-io, or floating-board were changed.

[Remedy] Set up the domain and LSB, and then try again.

DomainID X does not exist.

[Explanation] No LSB had been set up in the domain when the DCL configuration policy was changed.

[Remedy] Set up the domain and LSB, and then try again.


Invalid parameter.

[Explanation] There is an error in the specified argument or operand.

[Remedy] Confirm the specified argument or operand and execute the command once again.

Permission denied.

[Explanation] You do not have the required privilege.

[Remedy] Check the user privilege and the privilege required by the command. On high-end servers, also check whether the command was executed on the standby XSCF.

An internal error has occurred. Please contact your system administrator.

[Explanation] DR failed, possibly because of an internal error in XSCF.

[Remedy] Determine the cause of the DR failure by referring to the monitoring messages and the error log. Also check the XCP version.

A.2.5

setupfru

SB#XX is currently in use.

[Explanation] The PSB configuration cannot be changed because the system board of the PSB is running in a domain or is assigned to a domain.

[Remedy] Check whether the system board of the PSB is assigned to a domain, and release the system board if it is.

SB#XX is not installed.

[Explanation] The PSB could not be set up because it is not installed.

[Remedy] Confirm that the hardware is installed, and then execute the command again.

Operation has completed. However, a configuration error was detected.

[Explanation] The PSB configuration was changed, but a configuration error occurred on the resulting system board.

[Remedy] Check the CPU modules and DIMM slots on the PSB and the status of Memory Mirror Mode.


The specified parameter is not supported in this model.

[Explanation] A parameter that this server model does not support was specified, so the command was canceled.

[Remedy] Confirm the specified parameter and the server model, and execute the command once again.

Invalid parameter.

[Explanation] There is an error in the specified argument or operand.

[Remedy] Confirm the specified argument or operand and execute the command once again.

Permission denied.

[Explanation] You do not have the required privilege.

[Remedy] Check the user privilege and the privilege required by the command. On high-end servers, also check whether the command was executed on the standby XSCF.

The current configuration does not support this operation.

[Explanation] The command cannot be executed in the current configuration, or the operation is not supported.

[Remedy] Check the current hardware configuration and the support status.

An internal error has occurred. Please contact your system administrator.

[Explanation] DR failed, possibly because of an internal error in XSCF.

[Remedy] Determine the cause of the DR failure by referring to the monitoring messages and the error log. Also check the XCP version.

A.2.6

showdevices

XSB#%s is not currently running.

[Explanation] The system was not able to get some parameter for the XSB.

[Remedy] Confirm the information for the XSB via the showboards command.

cannot get device information from DomainID.

[Explanation] The system was unable to collect the requested information from the domain.


[Remedy] Confirm that the DSCP setting is correct and that the dsc process is running normally on the domain.


APPENDIX B

Example: Confirm Swap Space Size

This example shows one way to analyze the physical memory on a system board to determine whether the system has enough swap space to support deletion of the board. It explains how to collect and analyze information using the showdevices(8) command on the XSCF and the swap(1M) command on the Solaris OS.

In this example, the system board to be deleted contains physical memory and a disk has been attached to it to provide swap space. A disk that is attached to another system board provides additional swap space.

This example is based on the following swap space and physical memory sizes:

Swap area of the entire domain: 4GB

Swap area of the system board to be deleted: 1GB

Physical memory of the system board to be deleted: 2GB

Most of the swap space in the system is still available, so the system board can be safely deleted.

1. Execute the showdevices(8) command on the XSCF to show the resources of the system board (XSB#00-0) to be deleted.

This command displays the total physical memory on the board and the I/O devices that are attached.


XSCF> showdevices 00-0

CPU:
----
DID  XSB   id  state    speed  ecache
00   00-0  40  on-line  2048   4
00   00-0  41  on-line  2048   4
00   00-0  40  on-line  2048   4
00   00-0  41  on-line  2048   4

Memory:
-------
           board   perm    base                domain  target  deleted  remaining
DID  XSB   mem MB  mem MB  address             mem MB  XSB     mem MB   mem MB
00   00-0  2048    0       0x0000000000000000  4096

IO Devices:
----------
DID  XSB   device  resource           usage
00   00-0  sd0     /dev/dsk/c0t3d0s1  swap area

Notice in the Memory section that 2048 MB (2GB) of physical memory is on this board. In the IO Devices section, the /dev/dsk/c0t3d0s1 disk is configured as swap space.

2. On the domain, execute the swap(1M) command with its -l option to determine the size of the swap space configured on the disk.

# swap -l
swapfile            dev    swaplo  blocks   free
/dev/dsk/c0t3d0s1   118,1  16      2097152  2097152
/dev/dsk/c1t1d0s1   118,2  16      6291456  4109712

Notice that /dev/dsk/c0t3d0s1, the disk to be deleted, contributes 2097152 blocks. Each block is 512 bytes, so this disk contributes 1GB of swap space.

Moreover, the domain has additional swap space available from /dev/dsk/c1t1d0s1, a disk connected to another system board, which contributes 6291456 blocks (3GB). Thus, the total available swap space is 4GB.

3. Execute the swap(1M) command with its -s option to determine the total amount of available swap space.

This amount could have been determined in the previous step, but the following command gives a brief summary.

# swap -s
total: 40096k bytes allocated + 2200k reserved = 42296k used, 4152008k available

Notice that most of the 4GB of total swap space is available. When the system board is deleted, 1GB of total swap space will be removed, and the remaining available swap space will be nearly 3GB. Therefore, there is enough remaining swap space to allow this system board to be deleted.
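The arithmetic in steps 2 and 3 can be reproduced with a short Python sketch. It uses only the figures from the example output. The helper names are illustrative, and the final comparison assumes that the criterion is "remaining available swap is at least the physical memory on the board", which is an interpretation of this example rather than a rule stated here.

```python
import re

BLOCK_SIZE_BYTES = 512  # each block reported by swap -l is 512 bytes


def blocks_to_kb(blocks: int) -> int:
    """Convert a swap -l block count to kilobytes."""
    return blocks * BLOCK_SIZE_BYTES // 1024


def available_kb(swap_s_output: str) -> int:
    """Extract the 'available' figure (in KB) from swap -s output."""
    match = re.search(r"(\d+)k available", swap_s_output)
    if match is None:
        raise ValueError("no 'available' figure found in swap -s output")
    return int(match.group(1))


# Values copied from the example output in steps 2 and 3.
board_swap_kb = blocks_to_kb(2097152)  # swap on the board to be deleted
other_swap_kb = blocks_to_kb(6291456)  # swap on the other system board
summary = "total: 40096k bytes allocated + 2200k reserved = 42296k used, 4152008k available"

remaining_kb = available_kb(summary) - board_swap_kb
board_memory_kb = 2048 * 1024  # 2048 MB of physical memory on the board

print(board_swap_kb // 1024**2, "GB on the board,", other_swap_kb // 1024**2, "GB elsewhere")
print("remaining after deletion:", remaining_kb, "KB")
print("enough swap (assumed criterion):", remaining_kb >= board_memory_kb)
```

With these figures, about 3103432 KB (nearly 3GB) of swap space remains available after the board is deleted, which exceeds the 2GB of physical memory on the board.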


Glossary

This glossary describes some of the terms used in this manual.

Capacity on Demand (COD)

An option that provides additional CPU processing resources when needed. These additional CPUs are provided on COD CPU boards that are installed in the system. To access the COD CPUs, you must purchase the COD right-to-use (RTU) licenses for them.

CPU/Memory Board unit (CMU)

The CPU/Memory unit (CMU) consists of the CPU, memory and CMU channel.

CPU core

A segmented processing unit of the CPU chip. A virtual processor.

CPU module

A module containing one or two CPU chip(s).

Domain Component List (DCL)

List of boards available to be attached to a domain.

domain ID (DID)

Domain identifier.

Domain-SP Communication Protocol (DSCP)

A protocol that provides user-level to user-level TCP/IP sockets-type communication between the Service Processor and a domain. This communication occurs over a mailbox-type communication mechanism provided by other software components.

eXtended system board (XSB)

The XSB is made of physical components. In the XSB, the PSB can be either one complete unit (undivided status) or divided into four subunits. The XSB is a unit used for domain construction and identification, and can also be used as a logical unit.

eXtended System Control Facility (XSCF)

The software that runs on the system's Service Processor and provides control and monitoring functions for the system platform.


eXtended System Control Facility unit (XSCFU)

The XSCF board for this server, which contains the system administration function and operates with an independent processor.

field-replaceable unit (FRU)

A part that can be replaced by field engineers when servicing the system.

firmware

Firmware is the software that controls the system. These servers have the following firmware: OpenBoot PROM, POST, and XSCF. For details, see the definitions of OpenBoot PROM, POST, and XSCF in this glossary. The SAS controller, the GbE controller, and the control program for the IOBOX may also be considered firmware.

Hardware Control Program (HCP)

A data file that contains a cluster of XSCF, POST, and OpenBoot PROM firmware settings.

I/O unit (IOU)

The unit of physical printed circuit board and mechanical components that contains the I/O controller. A system (domain) is configured by combining or dividing CMUs and IOUs.

logical system board (LSB)

A logical unit name of an XSB to which a logical number (LSB number) is assigned. The LSB is used together with its LSB number when domains are constructed, and it is referred to by the Solaris OS.

motherboard unit (MBU)

The main board assembly to which other boards and components are connected in servers with a single XSCF Unit. Servers with redundant XSCF Units do not have a motherboard unit; they have CMUs instead.

OpenBoot PROM

A layer of software that takes control of the configured system from the power-on self-test (POST), builds data structures in memory, and boots the operating system. The OpenBoot PROM is IEEE 1275-compliant.

physical system board (PSB)

The PSB is made up of physical components, and can include 1 CMU and 1 IOU or just 1 CMU. In midrange servers, the CMU is mounted on the MBU. A PSB can also be used to describe a physical unit for addition/deletion/exchange of hardware. The PSB can be used in one of two ways: as one complete unit (undivided status) or divided into four subunits.

power-on self-test (POST)

A program that takes the uninitialized system hardware, probes and tests its components, configures the working components into a coherent initialized system, and transfers control to the OpenBoot PROM.


privileges

Specific permissions granted to users. This system has platform administrator, platform operator, domain administrator, domain operator, domain manager, user administrator, audit administrator, audit operator, and field engineer privileges while the XSCF program is running.

Quad-XSB

One of the division types for a PSB to be configured. Quad-XSB is a name for when a PSB is logically divided into four parts. The division type can be changed by using the XSCF command setupfru(8). Quad-XSB may be used to describe a PSB division type or status.

SCF

See “eXtended System Control Facility”.

Secure Shell (SSH)

A software program which allows the user to log into another system over a network, execute commands on a remote machine, and to move files from one machine to another. It provides strong authentication and secure communications over insecure channels.

Service Processor

A small system that directs system startup, reconfiguration, and fault diagnosis. In this system, the Service Processor refers to the XSCF. When the Service Processor is described as hardware, it refers to the XSCFU.

system board

Component unit that enables the CMU and the IOU hardware resources to constitute a domain. CPU modules and memory modules are mounted on the CMU. In midrange systems, CPU modules and memory modules are mounted on the motherboard unit (MBU). Onboard I/O devices such as a PCI card, hard disk, and LAN port are mounted in the IOU.

Uni-XSB

One of the division types for a PSB to be configured. Uni-XSB is a name for when a PSB is logically only one unit (undivided status). It is the default setting for the division type of a PSB. The division type can be changed by using the XSCF command setupfru(8). Uni-XSB may be used to describe a PSB division type or status.

XSCF

See “eXtended System Control Facility”.

XSCF Shell

The CLI interface of the XSCF.

XSCF Web

The BUI (Browser User Interface) interface of the XSCF.


Index

A

Add, 1-3

addboard, 3-2, 3-15, 3-22

addfru, 3-26

addition, 1-6, 2-20, 2-27, 3-15, 4-2, 4-6

Assign, 1-3

B

Basic DR Terms, 1-3

C

Capacity on Demand, 2-29

configuration policy, 2-13

Configure, 1-3

Copy-rename, 2-7

CPU, 2-4

D

DCL, 1-3, 2-10

degradation, 2-13

Delete, 1-3

deleteboard, 3-2, 3-17, 3-22

deletefru, 3-26

deletion, 1-6, 2-21, 2-27, 3-17, 4-3, 4-8

device information, 2-27, 3-10

division type, 1-5, 2-10, 3-13

domain component list, 1-3

domain status, 2-17, 3-2, 3-5

DR functions, 1-1, 1-5

E

eXtended System Board, 1-4

eXtended System Control Facility (XSCF), 1-7

F

Floating Boards, 2-6, 2-14

I

I/O device, 2-9, 2-16, 2-27

Install, 1-4

Intimate Shared Memory, 2-8

IO board unit, 1-4

ISM, 2-8

K

Kernel Cage, 2-6

kernel cage memory, 2-16

kernel memory, 2-11

Kernel Memory Assignment, 2-6

kernel memory board, 2-5

L

Logical System Board, 1-4

LSB, 1-4

M

memory, 2-5

memory mirror mode, 2-28

memory mirroring mode, 3-13

Move, 1-3


move, 1-6, 2-23, 3-19, 4-4, 4-10

moveboard, 3-2, 3-19

O

omit-I/O, 2-15

omit-memory, 2-15

P

Physical System Board, 1-4

poweroff, 3-26

poweron, 3-26

PSB, 1-4

Q

Quad-XSB, 1-5, 2-1, 2-10, 4-16

R

RCM Script, 3-27

real-time processes, 2-28

Register, 1-3

Release, 1-3

Remove, 1-4

Replace, 1-4

replacefru, 3-26

replacement, 1-7, 3-22, 4-12

reservation, 2-12, 3-24

Reserve, 1-4

reserve addition, 4-20

reserve deletion, 4-22

reserve move, 4-23

S

setdcl, 3-2

setdscp, 3-26

setupfru, 3-2

showboards, 3-1, 3-6

showdcl, 3-1, 3-2

showdevices, 3-1, 3-10

showdomainstatus, 3-1, 3-5

showdscp, 3-26

showfru, 3-1, 3-13

Solaris OS, 2-16

swap area, 2-11, 2-27

system board, 1-5

system board pool, 2-10

system board status, 2-18, 3-6

system configuration, 2-11

U

Unassign, 1-3

Unconfigure, 1-4

Uni-XSB, 1-5, 2-1, 2-10, 4-13

user memory board, 2-8

X

XSB, 1-4

XSCF, 2-12, 2-13

XSCF Web, 3-27


Published by Fujitsu Siemens Computers GmbH

Order No.: U41684-J-Z816-2-76
