HP Smart Array E500 Controller for Integrity Servers User Guide


Part Number 451221-001

April 2007 (First Edition)

© Copyright 2007 Hewlett-Packard Development Company, L.P.

The information contained herein is subject to change without notice. The only warranties for HP products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. HP shall not be liable for technical or editorial errors or omissions contained herein.

Microsoft and Windows are U.S. registered trademarks of Microsoft Corporation.

Audience assumptions

This document is for the person who installs, administers, and troubleshoots servers and storage systems.

HP assumes you are qualified in the servicing of computer equipment and trained in recognizing hazards in products with hazardous energy levels.

Contents

Hardware features
    Main components on the board
    Other features
Overview of the installation procedure
    Quick installation procedure (Windows or Linux)
Installing the controller hardware
    Preparing the server
    Installing the controller board
    Connecting storage devices
Updating the firmware
    Methods for updating the firmware (Windows® or Linux®)
Configuring an array
    Utilities available for configuring an array
    Using ORCA
    Using ACU or ACU-CLI
Installing device drivers and Management Agents
    Systems using Microsoft® Windows®
        Installing device drivers
        Installing the Event Notification Service
        Installing Management Agents
    Systems using Linux®
        Installing Management Agents
Upgrading or replacing controller options
    Replacing the cache
Replacing, moving, or adding hard drives
    Identifying the status of a hard drive
    Recognizing hard drive failure
        Effects of a hard drive failure
        Compromised fault tolerance
        Recovering from compromised fault tolerance
    Replacing hard drives
        Hard drive replacement guidelines
        Automatic data recovery (rebuild)
        Upgrading hard drive capacity
    Moving drives and arrays
Diagnosing array problems
    Controller board runtime LEDs
    Diagnostic tools
Electrostatic discharge
    Preventing electrostatic discharge
    Grounding methods to prevent electrostatic discharge
Regulatory compliance notices
    Federal Communications Commission notice
    Modifications
    Cables
    Canadian notice
    European Union regulatory notice
    BSMI notice
    Japanese class A notice
    Korean class A notice
Acronyms and abbreviations
Index

Hardware features

In this section

Main components on the board

Other features

Main components on the board

Item ID  Description
1        Connector for SAS miniport 1E (external), 4x wide
2        Connector for SAS miniport 2E (external), 4x wide
3        40-bit 256-MB cache module

CAUTION: Do not use this controller with cache modules designed for other controller models, or the controller will malfunction and you could lose data. Also, do not transfer this cache module to a different controller module, or you could again lose data.

Other features

Feature                            Details
Card type                          Low-profile PCIe
Dimensions (excluding bracket)     16.8 cm × 7.0 cm × 1.8 cm (6.6 in × 2.8 in × 0.7 in)
Drive types supported              3.0 Gb/s SAS or 1.5 Gb/s SATA hard drives; also supports OBDR tape drives (for more information about OBDR, see the HP website (http://www.hp.com/go/obdr))
Maximum power required             Approximately 14 W
Temperature range                  Operating, 10° to 55°C (50° to 131°F); storage, -30° to 60°C (-22° to 140°F)
Relative humidity (noncondensing)  Operating, 10% to 90%; storage, 5% to 90%
RAID levels supported              0, 1, and 1+0
Type of edge connector             PCIe x8
PCIe transfer rate                 Up to 2.0 GB/s in each direction
Number of SAS connectors           Two external mini-SAS 4x
SAS transfer rate                  1.2 GB/s per wide port at peak bandwidth
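The two peak transfer rates in the table can be cross-checked with simple arithmetic. The following Python sketch is illustrative only. It assumes 8b/10b line encoding on both links and a first-generation PCIe lane rate of 2.5 Gb/s; neither assumption is stated in this guide. The function name is hypothetical.

```python
# Sketch: reproduce the peak transfer rates quoted in the feature table.
# Assumption (not stated in this guide): both links use 8b/10b encoding,
# so only 8 of every 10 transmitted bits carry payload data.
ENCODING_EFFICIENCY = 8 / 10


def peak_bytes_per_second(line_rate_bits, lanes):
    """Return peak payload bandwidth in bytes/s for a multi-lane serial link."""
    payload_bits = line_rate_bits * lanes * ENCODING_EFFICIENCY
    return payload_bits / 8  # 8 bits per byte


# SAS wide port: four lanes at 3.0 Gb/s each (lane rate taken from the table).
sas_wide_port = peak_bytes_per_second(3.0e9, lanes=4)

# PCIe x8: eight lanes; 2.5 Gb/s per lane is an assumed first-generation rate.
pcie_x8 = peak_bytes_per_second(2.5e9, lanes=8)

print(f"SAS wide port: {sas_wide_port / 1e9:.1f} GB/s")
print(f"PCIe x8:       {pcie_x8 / 1e9:.1f} GB/s in each direction")
```

Under these assumptions the results agree with the table values of 1.2 GB/s per wide port and 2.0 GB/s in each direction.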

This controller does not support RAID level or stripe size migration, array capacity expansion, or logical drive extension in HP Integrity servers.

For more information about the controller features and specifications, and for information about system requirements, refer to the HP website ( http://www.hp.com/products/smartarray ).


Overview of the installation procedure

In this section

Quick installation procedure (Windows or Linux)

Quick installation procedure (Windows or Linux)

Before installing the controller, refer to the support matrix on the HP website (http://www.hp.com/products1/serverconnectivity) to confirm that the server and operating system support the controller.

To install the controller:

1. Power down the server.
2. Unplug the AC power cord from the power outlet.
3. Unplug the power cord from the server.
4. Install the controller hardware ("Installing the controller hardware").
5. If necessary, install additional physical drives.
   The number of drives in the server determines the RAID level that is autoconfigured when the server is powered up (next step).
6. Power up the server.
7. Update the controller firmware ("Methods for updating the firmware (Windows® or Linux®)").
   When the firmware update process is complete, the server reboots and runs through a POST procedure. This POST procedure halts briefly during controller initialization and prompts you to open ORCA.
8. Open ORCA ("Configuring an array").
   o If using a headless console, press the Esc+8 key combination.
   o Otherwise, press the F8 key.
9. Configure the logical boot drive, and then exit from ORCA.

If the server is using Linux, controller installation is complete. When the server is next rebooted, the operating system detects the controller hardware and automatically installs the required driver.

If the server is using Microsoft® Windows®, continue as follows:

1. Load the controller driver from EBSU on the Smart Setup media. (To load the driver, select Load OEM Boot Drivers in EBSU. For more information about Smart Setup, refer to the HP Smart Setup Guide on the Smart Setup media.)
2. Run Express Setup.
3. When you have finished installing the operating system as directed during the Express Setup procedure, remove the operating system CD, and then insert the Smart Setup media.
4. Install the Integrity Support Pack ("Installing device drivers and Management Agents").

Controller installation is complete.

The latest firmware, drivers, utilities, software, and documentation for HP Integrity servers are available on the support page of the HP website ( http://www.hp.com/support/itaniumservers ).


Installing the controller hardware

In this section

Preparing the server

Installing the controller board

Connecting storage devices

Preparing the server

1. Back up all data.
2. Close all applications.
3. Power down the server.

   CAUTION: In systems that use external data storage, be sure that the server is the first unit to be powered down and the last to be powered back up. Taking this precaution ensures that the system does not erroneously mark the drives as failed when the server is powered up.

4. Power down all peripheral devices that are attached to the server.
5. Unplug the AC power cord from the outlet and then from the server.
6. Disconnect all peripheral devices from the server.

Installing the controller board

WARNING: To reduce the risk of personal injury or damage to the equipment, consult the safety information and user documentation provided with the server before attempting the installation.

Many servers are capable of providing energy levels that are considered hazardous and are intended to be serviced only by qualified personnel who have been trained to deal with these hazards. Do not remove enclosures or attempt to bypass any interlocks that may be provided for the purpose of removing these hazardous conditions.

1. Remove or open the access panel.

   WARNING: To reduce the risk of personal injury from hot surfaces, allow the drives and the internal system components to cool before touching them.

2. Select an available x8 or larger PCIe slot.
3. Remove the slot cover. Save the retaining screw, if one is present.
4. Slide the controller board along the slot alignment guide, if one is present, and press the board firmly into the slot so that the contacts on the board edge are properly seated in the system board connector.
5. Secure the controller board in place with the retaining screw. If the slot alignment guide has a latch (near the rear of the board), close the latch.
6. Connect storage devices to the controller. (For details of the procedure, see "Connecting storage devices.")
7. Close or replace the access panel, and secure it with thumbscrews, if any are present.

   CAUTION: Do not operate the server for long periods with the access panel open or removed. Operating the server in this manner results in improper airflow and improper cooling that can lead to thermal damage.

Connecting storage devices

1. Power down the server.
2. Connect an external SAS cable to the external port of the controller.
   a. Pull back the tab on the mini SAS 4x connector on the cable.
   b. Insert the cable connector into the external port of the controller.
   c. Release the tab.
3. Connect the other end of the cable to the SAS input connector of the external storage enclosure.
   o If the enclosure uses a standard SAS 4x connector, insert the cable connector into the enclosure connector, and then tighten the lock screws on the cable connector.
   o If the enclosure uses a mini SAS 4x connector, pull back the tab on the cable connector, insert the cable connector into the enclosure connector, and then release the tab.
4. Power up the enclosure.
5. Power up the server.

Updating the firmware

In this section

Methods for updating the firmware (Windows® or Linux®)

Methods for updating the firmware (Windows® or Linux®)

To update the firmware on the server, controller, or hard drives, use Smart Components. The most recent version of a particular component is available on the support page of the HP website (http://www.hp.com/support). Some components are also available on the Smart Setup media.

1. Find the most recent version of the component that you require.
2. Follow the instructions for installing the component on the server. These instructions are provided on the same Web page as the component.
3. Follow the additional instructions that describe how to use the component to flash the ROM. These instructions are provided with each component.

Configuring an array

In this section

Utilities available for configuring an array

Using ORCA

Using ACU or ACU-CLI

Utilities available for configuring an array

Two utilities are available for configuring an array on an HP Smart Array controller in an HP Integrity server: ORCA and ACU.

ORCA is a simple utility that is used mainly to configure the first logical drive in a new server before the operating system is loaded.

ACU is an advanced utility that enables you to perform many complex configuration tasks.

For more information about the features of these utilities and for instructions for using the utilities, see the Configuring Arrays on HP Smart Array Controllers Reference Guide. This guide is available on the Documentation CD that is provided in the controller kit.

Whichever utility you use, remember the following factors when you build an array:

All drives in an array must be of the same type (for example, all SAS or all SATA).

For the most efficient use of drive space, all drives within an array should have approximately the same capacity. Each configuration utility treats every physical drive in an array as if it has the same capacity as the smallest drive in the array. Any excess capacity of a particular drive cannot be used in the array and so is unavailable for data storage.

The more physical drives that an array has, the greater the probability that the array will experience a drive failure during any given period. To guard against the data loss that occurs when a drive fails, configure all logical drives in an array with a suitable fault-tolerance (RAID) method.
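The capacity and reliability factors above can be illustrated with a short Python sketch. It is not part of any HP utility. The function names, the drive sizes, and the per-drive failure probability are hypothetical, and the probability calculation assumes that drives fail independently of one another.

```python
def usable_capacity_gb(drive_sizes_gb):
    """Return (usable, unused) raw capacity for an array, before RAID overhead.

    Every drive is treated as if it were the size of the smallest drive,
    as described in the text above.
    """
    smallest = min(drive_sizes_gb)
    usable = smallest * len(drive_sizes_gb)
    unused = sum(drive_sizes_gb) - usable
    return usable, unused


def probability_any_drive_fails(per_drive_probability, drive_count):
    """Chance that at least one drive fails in a period.

    Assumes independent failures, which is a simplification.
    """
    return 1 - (1 - per_drive_probability) ** drive_count


# Hypothetical array: two 72-GB drives and one 146-GB drive.
usable, unused = usable_capacity_gb([72, 72, 146])
print(f"Usable: {usable} GB, unavailable for data storage: {unused} GB")

# Hypothetical 2% chance of failure per drive over some period.
for count in (2, 4, 8):
    print(count, "drives:", round(probability_any_drive_fails(0.02, count), 4))
```

The excess capacity of the larger drive is reported as unused, and the failure probability grows with each drive added, which is why a fault-tolerant RAID method matters more for larger arrays.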

Using ORCA

1. Power up the server. POST runs, and any array controllers that are in the server are initialized one at a time. During each controller initialization process, POST halts for several seconds while an ORCA prompt message appears.
2. At the ORCA prompt:
   o If you are connected using a headless console, press the Esc+8 key combination.
   o Otherwise, press the F8 key.

The ORCA main menu appears, enabling you to create, view, or delete a logical drive.

To create a logical drive using ORCA:

1. Select Create Logical Drive.
   The screen displays a list of all available (unconfigured) physical drives and the valid RAID options for the system.
2. Use the Arrow keys, Spacebar, and Tab key to navigate around the screen and set up the logical drive, including an online spare drive if one is required.

   NOTE: You cannot use ORCA to configure one spare drive to be shared among several arrays. Only ACU enables you to configure shared spare drives.

3. Press the Enter key to accept the settings.
4. Press the F8 key to confirm the settings and save the new configuration.
   After several seconds, the Configuration Saved screen appears.
5. Press the Enter key to continue.

You can now create another logical drive by repeating the previous steps.

NOTE: Newly created logical drives are invisible to the operating system. To make the new logical drives available for data storage, format them using the instructions given in the operating system documentation.

Using ACU or ACU-CLI

You can also use the GUI or CLI format of ACU to configure arrays on HP Integrity servers. Servers on the Microsoft Windows platform use the ACU GUI, while servers on the Linux platform use ACU-CLI.

For detailed information about using ACU, see the Configuring Arrays on HP Smart Array Controllers Reference Guide. This document is available on the Smart Setup media or the Documentation CD that is provided in the controller kit.

Installing device drivers and Management Agents

In this section

Systems using Microsoft® Windows®

Systems using Linux®

Systems using Microsoft® Windows®

You can use the Integrity Support Pack to automatically install the device drivers, Event Notification Service, and Management Agents, or you can install these items manually.

The Integrity Support Pack is located on the Smart Setup media. To install the Integrity Support Pack, launch Express Setup from EBSU and follow the on-screen instructions.

Installing device drivers

The drivers for the controller are located on the Smart Setup media. Updates are posted to the support page of the HP website ( http://www.hp.com/support/itaniumservers ).

Installation instructions are provided with the drivers.

Installing the Event Notification Service

The HP Smart Array SAS/SATA Event Notification Service provides event notification to the Microsoft® Windows® Server 2003 64-bit system event log and the HP Integrated Management log.

The most recent version of the software component is available on the support page of the HP website (http://www.hp.com/support/itaniumservers). Installation instructions are provided with the component.

Installing Management Agents

The Management Agents are available on the Smart Setup media. The most recent versions of the agents are available on the support page of the HP website ( http://www.hp.com/support/itaniumservers ).

Installation instructions are provided with the agents.

If the new agents do not function correctly, you might also need to update Systems Insight Manager. The latest version of Systems Insight Manager is available for download at the HP website (http://www.hp.com/servers/manage).

Systems using Linux®

The drivers for the controller are bundled into the supported Red Hat and Novell Linux distributions.

In a system that does not yet have Linux installed:

1. Follow the standard controller installation procedure.
2. Reboot the server.
3. Follow the standard procedure for installing Linux. As Linux is installed, it recognizes the controller and automatically loads the correct driver.

In a system that already has Linux installed:

1. Power down the system.
2. Follow the standard controller installation procedure.
3. Power up the system. As Linux boots, it recognizes the controller.
4. Enter one of the following commands as appropriate to ensure that the driver is loaded correctly:

   Red Hat:

   # mkinitrd -f /boot/efi/efi/redhat/initrd-$(uname -r).img $(uname -r)

   Novell (SLES):

   # mkinitrd -k /boot/vmlinux -i/boot/initrd

5. For Novell, enter the following command to confirm that the driver is active:

   # lsmod | grep cciss

   If the driver is active, the system responds by displaying cciss.
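The driver check in the last step can also be scripted. The following Python sketch is one possible approach, not an HP-supplied tool. The function names and the sample module listing (including its sizes and use counts) are hypothetical. Unlike grep, it matches the module name column exactly, so a line that only mentions cciss as a dependency of another module does not count.

```python
import subprocess


def driver_is_loaded(lsmod_output, driver="cciss"):
    """Return True if `driver` appears in the module-name column of lsmod output."""
    for line in lsmod_output.splitlines():
        fields = line.split()
        # The first whitespace-separated field on each row is the module name.
        if fields and fields[0] == driver:
            return True
    return False


def read_lsmod():
    """Run lsmod on a live Linux system and return its text output."""
    result = subprocess.run(["lsmod"], capture_output=True, text=True, check=True)
    return result.stdout


# Hypothetical listing for illustration; real sizes and use counts will differ.
SAMPLE_OUTPUT = """Module                  Size  Used by
cciss                 123456  3
scsi_mod              234567  2 sd_mod
"""

print("cciss loaded:", driver_is_loaded(SAMPLE_OUTPUT))
```

On a server with the controller installed, pass the result of read_lsmod() to driver_is_loaded() instead of the sample text.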

Installing Management Agents

The most recent versions of the agents are available on the support page of the HP website (http://www.hp.com/support/itaniumservers). For installation instructions, refer to the downloadable file HP Insight Management Agents for Linux on Integrity Servers provided with the agents.

If the new agents do not function correctly, you might also need to update Systems Insight Manager. The latest version of Systems Insight Manager is available for download at the HP website (http://www.hp.com/servers/manage).

Upgrading or replacing controller options

In this section

Replacing the cache

Replacing the cache

CAUTION: Do not use this controller with cache modules designed for other controller models, or the controller will malfunction and you could lose data. Also, do not transfer this cache module to a different controller module, or you could again lose data.

1. Close all applications, and then power down the server. This procedure flushes all data from the cache.
2. Disconnect the server from the AC power source.
3. Remove the controller from the server and place it on a firm, flat, nonconductive surface.
4. Remove the existing cache from the controller by pulling at both ends of the cache module with equal force.
5. Install the new cache on the controller. Press firmly above each connector to ensure good electrical contact. (If the cache is not properly connected, the controller cannot boot.)
6. Replace the controller in the server.

Replacing, moving, or adding hard drives

In this section

Identifying the status of a hard drive

Recognizing hard drive failure

Replacing hard drives

Moving drives and arrays

Identifying the status of a hard drive

When a drive is configured as a part of an array and connected to a powered-up controller, the condition of the drive can be determined from the illumination pattern of the hard drive status lights (LEDs).

Item Description

1 Fault/UID LED (amber/blue)

2 Online LED (green)

Online/activity LED (green)  Fault/UID LED (amber/blue)        Interpretation
On, off, or flashing         Alternating amber and blue        The drive has failed, or a predictive failure alert has been received for this drive; it also has been selected by a management application.
On, off, or flashing         Steadily blue                     The drive is operating normally, and it has been selected by a management application.
On                           Amber, flashing regularly (1 Hz)  A predictive failure alert has been received for this drive. Replace the drive as soon as possible.
On                           Off                               The drive is online, but it is not active currently.
Flashing regularly (1 Hz)    Amber, flashing regularly (1 Hz)  Do not remove the drive. Removing a drive may terminate the current operation and cause data loss. The drive is part of an array that is undergoing capacity expansion or stripe migration, but a predictive failure alert has been received for this drive. To minimize the risk of data loss, do not replace the drive until the expansion or migration is complete.
Flashing regularly (1 Hz)    Off                               Do not remove the drive. Removing a drive may terminate the current operation and cause data loss. The drive is rebuilding, or it is part of an array that is undergoing capacity expansion or stripe migration.
Flashing irregularly         Amber, flashing regularly (1 Hz)  The drive is active, but a predictive failure alert has been received for this drive. Replace the drive as soon as possible.
Flashing irregularly         Off                               The drive is active, and it is operating normally.
Off                          Steadily amber                    A critical fault condition has been identified for this drive, and the controller has placed it offline. Replace the drive as soon as possible.
Off                          Amber, flashing regularly (1 Hz)  A predictive failure alert has been received for this drive. Replace the drive as soon as possible.
Off                          Off                               The drive is offline, a spare, or not configured as part of an array.
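For scripted checklists or training material, the LED table can be expressed as a lookup. The following Python sketch paraphrases the interpretations from the table above. The state strings, dictionary name, and function name are hypothetical labels chosen for this example and do not correspond to any HP software interface.

```python
# Marker for rows where the online/activity LED state does not matter.
ANY = "on, off, or flashing"

# (online/activity LED, fault/UID LED) -> condensed interpretation.
INTERPRETATION = {
    (ANY, "alternating amber and blue"):
        "failed or predictive failure alert; selected by a management application",
    (ANY, "steadily blue"):
        "operating normally; selected by a management application",
    ("on", "amber flashing 1 Hz"):
        "predictive failure alert; replace as soon as possible",
    ("on", "off"):
        "online, not currently active",
    ("flashing 1 Hz", "amber flashing 1 Hz"):
        "do not remove: expansion or migration in progress, predictive failure alert",
    ("flashing 1 Hz", "off"):
        "do not remove: rebuilding, or expansion or migration in progress",
    ("flashing irregularly", "amber flashing 1 Hz"):
        "active, predictive failure alert; replace as soon as possible",
    ("flashing irregularly", "off"):
        "active, operating normally",
    ("off", "steadily amber"):
        "critical fault, drive placed offline; replace as soon as possible",
    ("off", "amber flashing 1 Hz"):
        "predictive failure alert; replace as soon as possible",
    ("off", "off"):
        "offline, a spare, or not configured as part of an array",
}


def interpret_leds(online, fault):
    """Look up a drive condition from the two LED states."""
    # Fault/UID patterns that involve blue apply whatever the online LED shows.
    if (ANY, fault) in INTERPRETATION:
        return INTERPRETATION[(ANY, fault)]
    return INTERPRETATION.get((online, fault), "unknown LED combination")


print(interpret_leds("flashing 1 Hz", "off"))
print(interpret_leds("on", "steadily blue"))
```

Combinations that the table does not list fall through to an explicit "unknown" result rather than a guess.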

Recognizing hard drive failure

A steadily glowing Fault LED indicates that the drive has failed. Hard drive failure is also revealed in the following ways:

The amber LED on the front of a storage system illuminates if failed drives are inside. (However, this LED also illuminates when other problems occur, such as when a fan fails, a redundant power supply fails, or the system overheats.)

A POST message lists failed drives whenever the system is restarted, as long as the controller detects at least one functional drive.

ACU represents failed drives with a distinctive icon.

Systems Insight Manager can detect failed drives remotely across a network. (For more information about Systems Insight Manager, refer to the documentation on the Management CD.)

ADU lists all failed drives.

For additional information about diagnosing hard drive problems, refer to the HP Servers Troubleshooting Guide.

CAUTION: Sometimes, a drive that has previously been failed by the controller may seem to be operational after the system is power-cycled or (for a hot-pluggable drive) after the drive has been removed and reinserted. However, continued use of such marginal drives may eventually result in data loss. Replace the marginal drive as soon as possible.


Effects of a hard drive failure

When a hard drive fails, all logical drives that are in the same array are affected. Each logical drive in an array can use a different fault-tolerance method, so each logical drive can be affected differently.

RAID 0 configurations cannot tolerate drive failure. If any physical drive in the array fails, all nonfault-tolerant (RAID 0) logical drives in the same array will also fail.

RAID 1+0 configurations can tolerate multiple drive failures as long as no failed drives are mirrored to one another.

RAID 5 configurations can tolerate one drive failure.
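The rules above can be summarized as a small decision function. This Python sketch is a simplified model for illustration, not controller firmware logic. The function name, the drive labels, and the representation of RAID 1+0 mirroring as explicit pairs are assumptions made for this example.

```python
def logical_drive_survives(raid_level, failed_drives, mirror_pairs=()):
    """Return True if a logical drive tolerates the given set of failed drives.

    raid_level    -- "0", "1+0", or "5" (the levels discussed in the text above)
    failed_drives -- iterable of labels for failed physical drives
    mirror_pairs  -- for RAID 1+0, pairs of drives mirrored to one another
    """
    failed = set(failed_drives)
    if not failed:
        return True
    if raid_level == "0":
        # No fault tolerance: any physical drive failure fails the logical drive.
        return False
    if raid_level == "1+0":
        # Survives as long as no failed drive is mirrored to another failed drive.
        return not any(a in failed and b in failed for a, b in mirror_pairs)
    if raid_level == "5":
        # Tolerates exactly one drive failure.
        return len(failed) <= 1
    raise ValueError(f"RAID level not covered by this sketch: {raid_level}")


# Hypothetical four-drive RAID 1+0 array: d1 mirrors d2, and d3 mirrors d4.
pairs = [("d1", "d2"), ("d3", "d4")]
print(logical_drive_survives("1+0", {"d1", "d3"}, pairs))  # different mirror pairs
print(logical_drive_survives("1+0", {"d1", "d2"}, pairs))  # both halves of one pair
```

Because each logical drive in an array can use a different fault-tolerance method, the same set of failed drives can be evaluated once per logical drive.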

Compromised fault tolerance

If more hard drives fail than the fault-tolerance method allows, fault tolerance is compromised, and the logical drive fails. In this case, all requests from the operating system are rejected with unrecoverable errors. You are likely to lose data, although it can sometimes be recovered (refer to "Recovering from compromised fault tolerance").

One example of a situation in which compromised fault tolerance may occur is when a drive in an array fails while another drive in the array is being rebuilt. If the array has no online spare, any logical drives in this array that are configured with RAID 5 fault tolerance will fail.

Compromised fault tolerance can also be caused by non-drive problems, such as a faulty cable or temporary power loss to a storage system. In such cases, you do not need to replace the physical drives. However, you may still have lost data, especially if the system was busy at the time that the problem occurred.

Recovering from compromised fault tolerance

If fault tolerance is compromised, inserting replacement drives does not improve the condition of the logical volume. Instead, if the screen displays unrecoverable error messages, perform the following procedure to recover data:

1. Power down the entire system, and then power it back up. In some cases, a marginal drive will work again for long enough to enable you to make copies of important files.

If a 1779 POST message is displayed, press the F2 key to re-enable the logical volumes. Remember that data loss has probably occurred and any data on the logical volume is suspect.

2. Make copies of important data, if possible.

3. Replace any failed drives.

4. After you have replaced the failed drives, fault tolerance may again be compromised. If so, cycle the power again. If the 1779 POST message is displayed:

a. Press the F2 key to re-enable the logical drives.

b. Recreate the partitions.

c. Restore all data from backup.

To minimize the risk of data loss that is caused by compromised fault tolerance, make frequent backups of all logical volumes.


Replacing hard drives

The most common reason for replacing a hard drive is that it has failed. However, another reason is to gradually increase the storage capacity of the entire system.

If you insert a hot-pluggable drive into a drive bay while the system power is on, all disk activity in the array pauses for a second or two while the new drive is spinning up. When the drive has achieved its normal spin rate, data recovery to the replacement drive begins automatically (as indicated by the blinking Online/Activity LED on the replacement drive) if the array is in a fault-tolerant configuration.

If you replace a drive belonging to a fault-tolerant configuration while the system power is off, a POST message appears when the system is next powered up. This message prompts you to press the F1 key to start automatic data recovery. If you do not enable automatic data recovery, the logical volume remains in a ready-to-recover condition and the same POST message appears whenever the system is restarted.

Hard drive replacement guidelines

Before replacing a degraded drive, perform the following tasks:

Open HP Systems Insight Manager, and inspect the Error Counter window for each physical drive in the same array to confirm that no other drives have any errors. (For details, see the Systems Insight Manager documentation on the Management CD.)

Be sure that the array has a current, valid backup.

Confirm that the replacement drive is of the same type (SAS or SATA) as the degraded drive.

Use replacement drives that have a capacity at least as great as that of the smallest drive in the array. The controller immediately fails drives that have insufficient capacity.

In systems that use external data storage, be sure that the server is the first unit to be powered down and the last to be powered back up. Taking this precaution ensures that the system does not erroneously mark the drives as failed when the server is powered up.
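The type and capacity guidelines above lend themselves to a simple pre-check. The sketch below is an illustration only: the dictionary keys, the function name, and the sample capacities are assumptions for this example and do not come from any HP tool.

```python
def check_replacement(replacement, degraded, array_drives):
    """Return a list of guideline violations (empty if there are none).

    Each drive is a dict with two hypothetical keys:
      "type"        -- "SAS" or "SATA"
      "capacity_gb" -- capacity in gigabytes
    """
    problems = []
    # The replacement must be the same type (SAS or SATA) as the degraded drive.
    if replacement["type"] != degraded["type"]:
        problems.append("drive type does not match the degraded drive")
    # Capacity must be at least that of the smallest drive in the array;
    # the controller immediately fails drives with insufficient capacity.
    smallest = min(d["capacity_gb"] for d in array_drives)
    if replacement["capacity_gb"] < smallest:
        problems.append("capacity is smaller than the smallest drive in the array")
    return problems


array = [{"type": "SAS", "capacity_gb": 72}, {"type": "SAS", "capacity_gb": 146}]
degraded = array[0]
print(check_replacement({"type": "SAS", "capacity_gb": 146}, degraded, array))  # []
print(check_replacement({"type": "SATA", "capacity_gb": 36}, degraded, array))
```

The second call reports both problems, because the candidate drive differs in type and is smaller than the smallest drive in the array.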

To minimize the likelihood of fatal system errors, take these precautions when removing failed drives:

Do not remove a degraded drive if any other drive in the array is offline (the Online/Activity LED is off). In this situation, no other drive in the array can be removed without data loss.

The following cases are exceptions:

o When RAID 1+0 is used, drives are mirrored in pairs. Several drives can be in a failed condition simultaneously (and they can all be replaced simultaneously) without data loss, as long as no two failed drives belong to the same mirrored pair.

o If the offline drive is a spare, the degraded drive can be replaced.

Do not remove a second drive from an array until the first failed or missing drive has been replaced and the rebuild process is complete. (The rebuild is complete when the Online/Activity LED on the front of the drive stops blinking.)

The following case is an exception:

o In RAID 1+0 configurations, any drives that are not mirrored to other removed or failed drives can be simultaneously replaced offline without data loss.


Automatic data recovery (rebuild)

When you replace a hard drive in an array, the controller uses the fault-tolerance information on the remaining drives in the array to reconstruct the missing data (the data that was originally on the replaced drive) and write it to the replacement drive. This process is called automatic data recovery, or rebuild. If fault tolerance is compromised, this data cannot be reconstructed and is likely to be permanently lost.

Fault tolerance is unavailable during a rebuild. If another drive in the array fails while a rebuild is in progress, a fatal system error can occur, and all data on the array is then lost. In some cases, however, failure of another drive need not lead to a fatal system error. These exceptions include:

Failure after activation of a spare drive

Failure of a drive that is not mirrored to any other failed drives (in a RAID 1+0 configuration)

Time required for a rebuild

The time required for a rebuild varies considerably, depending on several factors:

The priority that the rebuild is given over normal I/O operations (you can change the priority setting by using ACU)

The amount of I/O activity during the rebuild operation

The rotational speed of the hard drives

The availability of drive cache

The brand, model, and age of the drives

The amount of unused capacity on the drives

For RAID 5, the number of drives in the array

Allow approximately 30 seconds per gigabyte for the rebuild process to be completed. This figure is conservative, and the actual time required is usually less.
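As a worked example of this rule of thumb, the following sketch converts a drive capacity into a conservative upper estimate of rebuild time. The function names and the 146 GB sample capacity are assumptions chosen for illustration; only the figure of 30 seconds per gigabyte comes from this guide.

```python
SECONDS_PER_GB = 30  # conservative figure quoted in this guide


def estimate_rebuild_seconds(capacity_gb):
    """Conservative rebuild time in seconds for a drive of the given capacity."""
    return capacity_gb * SECONDS_PER_GB


def format_duration(seconds):
    """Render a duration in seconds as hours and minutes."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes = remainder // 60
    return f"{hours} h {minutes:02d} min"


seconds = estimate_rebuild_seconds(146)   # hypothetical 146 GB drive
print(seconds)                            # 4380
print(format_duration(seconds))           # 1 h 13 min
```

The actual time is usually shorter, and it also depends on the factors listed above, such as rebuild priority and I/O activity.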

System performance is affected during the rebuild, and the system is unprotected against further drive failure until the rebuild has finished. Therefore, replace drives during periods of low activity when possible.

When automatic data recovery has finished, the Online/Activity LED of the replacement drive stops blinking steadily at 1 Hz and begins to either glow steadily (if the drive is inactive) or flash irregularly (if the drive is active).

CAUTION: If the Online/Activity LED on the replacement drive does not light up while the corresponding LEDs on other drives in the array are active, the rebuild process has abnormally terminated. The amber Fault LED of one or more drives might also be illuminated. Refer to "Abnormal termination of a rebuild" to determine what action you must take.

Abnormal termination of a rebuild

If the Online/Activity LED on the replacement drive goes out and stays off while other drives in the array are active, the rebuild process has abnormally terminated. The following table indicates the three possible causes of abnormal termination of a rebuild.

| Observation | Cause of rebuild termination |
| --- | --- |
| None of the drives in the array have an illuminated amber Fault LED. | One of the drives in the array has experienced an uncorrectable read error. |
| The replacement drive has an illuminated amber Fault LED. | The replacement drive has failed. |
| One of the other drives in the array has an illuminated amber Fault LED. | The drive with the illuminated Fault LED has now failed. |

Each of these situations requires a different remedial action.
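The three observations map onto the three cases described below, which can be sketched as a small decision function. This is an illustration only: the function name and its boolean inputs are assumptions, and the guide does not say what to conclude if Fault LEDs are lit on both the replacement drive and another drive, so the order of the checks here is an arbitrary choice.

```python
def rebuild_termination_case(replacement_fault_led, other_fault_leds):
    """Map amber Fault LED observations to a (case number, cause) pair.

    replacement_fault_led -- True if the replacement drive's Fault LED is lit
    other_fault_leds      -- iterable of booleans, one per other drive in the array
    """
    # Assumption: the replacement drive is checked first; the guide does not
    # cover Fault LEDs being lit on several drives at once.
    if replacement_fault_led:
        return 2, "The replacement drive has failed."
    if any(other_fault_leds):
        return 3, "Another drive in the array has failed."
    # No Fault LED is lit anywhere in the array.
    return 1, "An uncorrectable read error has occurred."


print(rebuild_termination_case(False, [False, False]))
print(rebuild_termination_case(True, [False, False]))
print(rebuild_termination_case(False, [False, True]))
```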

Case 1: An uncorrectable read error has occurred.

1. Back up as much data as possible from the logical drive.

CAUTION: Do not remove the drive that has the media error. Doing so causes the logical drive to fail.

2. Restore data from backup. Writing data to the location of the unreadable sector often eliminates the error.

3. Remove and reinsert the replacement drive. This action restarts the rebuild process.

If the rebuild process still terminates abnormally:

1. Delete and recreate the logical drive.

2. Restore data from backup.

Case 2: The replacement drive has failed.

Verify that the replacement drive is of the correct capacity and is a supported model. If these factors are not the cause of the problem, use a different drive as the replacement.

Case 3: Another drive in the array has failed.

A drive that has recently failed can sometimes be made temporarily operational again by cycling the server power.

1. Power down the server.

2. Remove the replacement physical drive (the one undergoing a rebuild), and reinstall the drive that it is replacing.

3. Power up the server.

If the newly failed drive seems to be operational again:

1. Back up any unsaved data.

2. Remove the drive that was originally to be replaced, and reinsert the replacement physical drive. The rebuild process automatically restarts.

3. When the rebuild process has finished, replace the newly failed drive.

However, if the newly failed drive has not recovered:

1. Remove the drive that was originally to be replaced, and reinsert the replacement physical drive.

2. Replace the newly failed drive.

3. Restore data from backup.


Upgrading hard drive capacity

You can increase the storage capacity on a system even if there are no available drive bays by swapping drives one at a time for higher capacity drives. This method is viable as long as a fault-tolerance method is running.

CAUTION: Because it can take up to 30 seconds per gigabyte to rebuild the data in the new configuration, the system could be unprotected against drive failure for many hours while the drives are upgraded. Perform drive capacity upgrades only during periods of minimal system activity.

To upgrade hard drive capacity:

1. Back up all data.

2. Replace any drive. The data on the new drive is re-created from redundant information on the remaining drives.

CAUTION: Do not replace any other drive until data rebuild on this drive is complete.

When data rebuild on the new drive is complete, the Online/Activity LED stops flashing steadily and either flashes irregularly or glows steadily.

3. Repeat the previous step for the other drives in the array, one at a time.

When you have replaced all drives, you can use the extra capacity to either create new logical drives or extend existing logical drives. For more information about these procedures, refer to the Configuring Arrays on HP Smart Array Controllers Reference Guide.

Moving drives and arrays

You can move drives to other ID positions on the same array controller. You can also move a complete array from one controller to another, even if the controllers are on different servers.

Before you move drives, the following conditions must be met:

The server must be powered down.

If moving the drives to a different server, the new server must have enough empty bays to accommodate all the drives simultaneously.

The array has no failed or missing drives, and no spare drive in the array is acting as a replacement for a failed drive.

The controller is not running capacity expansion, capacity extension, or RAID or stripe size migration.

The controller is using the latest firmware version (recommended).

If you want to move an array to another controller, all drives in the array must be moved at the same time.
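The conditions above can be expressed as a pre-move checklist. The following sketch is hypothetical: the class, its field names, and the function are assumptions made for illustration, and the values would have to be gathered by hand or from your own tooling. The firmware recommendation is omitted because it is a recommendation rather than a requirement.

```python
from dataclasses import dataclass


@dataclass
class MoveCheck:
    server_powered_down: bool
    moving_to_other_server: bool
    drives_to_move: int
    empty_bays_in_target: int
    failed_or_missing_drives: int
    spare_acting_as_replacement: bool
    expansion_or_migration_running: bool


def unmet_conditions(c):
    """Return the list of conditions that are not yet satisfied."""
    unmet = []
    if not c.server_powered_down:
        unmet.append("server is not powered down")
    # The bay count matters only when the drives go to a different server.
    if c.moving_to_other_server and c.empty_bays_in_target < c.drives_to_move:
        unmet.append("target server lacks enough empty bays")
    if c.failed_or_missing_drives > 0:
        unmet.append("array has failed or missing drives")
    if c.spare_acting_as_replacement:
        unmet.append("a spare is acting as a replacement for a failed drive")
    if c.expansion_or_migration_running:
        unmet.append("expansion, extension, or migration is in progress")
    return unmet


ready = MoveCheck(True, True, 4, 6, 0, False, False)
print(unmet_conditions(ready))  # []
```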

When all the conditions have been met:

1. Back up all data before removing any drives or changing configuration. This step is required if you are moving data-containing drives from a controller that does not have a battery-backed cache.

2. Power down the system.

3. Move the drives.

4. Power up the system. If a 1724 POST message appears, drive positions were changed successfully and the configuration was updated.

If a 1785 (Not Configured) POST message appears:

a. Power down the system immediately to prevent data loss.

b. Return the drives to their original locations.

c. Restore the data from backup, if necessary.

5. Verify the new drive configuration by running ORCA or ACU (see "Configuring an array").
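The two POST outcomes described in the procedure above can be captured in a small lookup. This is a sketch for illustration; the dictionary and function names are assumptions, and the action text paraphrases this guide rather than quoting the messages that the controller displays.

```python
# Recommended actions after moving drives, keyed by POST message number.
POST_ACTIONS = {
    1724: "Drive positions were changed successfully and the configuration "
          "was updated. Verify the configuration with ORCA or ACU.",
    1785: "Power down immediately, return the drives to their original "
          "locations, and restore the data from backup if necessary.",
}


def action_for_post_code(code):
    """Return the recommended action for a POST message number."""
    # Codes not covered by this procedure are deferred to the troubleshooting guide.
    return POST_ACTIONS.get(code, "Refer to the HP Servers Troubleshooting Guide.")


print(action_for_post_code(1724))
print(action_for_post_code(1785))
```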


Diagnosing array problems

In this section

Controller board runtime LEDs

Diagnostic tools

Controller board runtime LEDs

Immediately after the server is powered up, the controller runtime LEDs illuminate briefly in a predetermined pattern as part of the POST sequence. At all other times during server operation, the illumination pattern of the runtime LEDs indicates the status of the controller, as described in the following table.

| LED ID | Color | LED name and interpretation |
| --- | --- | --- |
| 1 | Amber | CR14: Controller Lockup LED. |
| 2 | Amber | CR13: Drive Failure LED. A physical drive connected to the controller has failed. Check the Fault LED on each drive to determine which drive has failed. |
| 3 | Green | CR3: Activity LED for SAS port 2E. |
| 4 | Green | CR8: Activity LED for SAS port 1E. |
| 5 | Green | CR5: Command Outstanding LED. The controller is working on a command from the host driver. |
| 6 | Green | CR6: Heartbeat LED. This LED flashes every 2 seconds to indicate that the controller is in good health. |
| 7 | Green | CR4: Gas Pedal LED. This LED, together with LED 8, indicates the amount of controller CPU activity. For details, see the following table. |
| 8 | Green | CR7: Idle Task LED. This LED, together with LED 7, indicates the amount of controller CPU activity. For details, see the following table. |


| Controller CPU activity level | LED 7 status | LED 8 status |
| --- | --- | --- |
| 75–100% | On steadily | On steadily |

Diagnostic tools

Several diagnostic tools provide feedback about problems with arrays. The most important are:

ADU

This utility is a Windows®-based diagnostic tool that sends an email to HP Support when it detects any problems with the controllers and attached storage in a system.

You can install ADU from the Smart Setup media. When installation is complete, run ADU by clicking Start and selecting Programs>HP System Tools>HP Array Diagnostic Utility.

The meanings of the various ADU error messages are provided in the HP Servers Troubleshooting Guide.

POST messages

Smart Array controllers produce diagnostic error messages at reboot. Many of these POST messages are self-explanatory and suggest corrective actions. For more information about POST messages, refer to the HP Servers Troubleshooting Guide.


Electrostatic discharge

In this section

Preventing electrostatic discharge

Grounding methods to prevent electrostatic discharge

Preventing electrostatic discharge

To prevent damaging the system, be aware of the precautions you need to follow when setting up the system or handling parts. A discharge of static electricity from a finger or other conductor may damage system boards or other static-sensitive devices. This type of damage may reduce the life expectancy of the device.

To prevent electrostatic damage:

Avoid hand contact by transporting and storing products in static-safe containers.

Keep electrostatic-sensitive parts in their containers until they arrive at static-free workstations.

Place parts on a grounded surface before removing them from their containers.

Avoid touching pins, leads, or circuitry.

Always be properly grounded when touching a static-sensitive component or assembly.

Grounding methods to prevent electrostatic discharge

Several methods are used for grounding. Use one or more of the following methods when handling or installing electrostatic-sensitive parts:

Use a wrist strap connected by a ground cord to a grounded workstation or computer chassis. Wrist straps are flexible straps with a minimum of 1 megohm ±10 percent resistance in the ground cords. To provide proper ground, wear the strap snug against the skin.

Use heel straps, toe straps, or boot straps at standing workstations. Wear the straps on both feet when standing on conductive floors or dissipating floor mats.

Use conductive field service tools.

Use a portable field service kit with a folding static-dissipating work mat.

If you do not have any of the suggested equipment for proper grounding, have an authorized reseller install the part.

For more information on static electricity or assistance with product installation, contact an authorized reseller.


Regulatory compliance notices

In this section

Federal Communications Commission notice

Modifications

Cables

Canadian notice

European Union regulatory notice

BSMI notice

Japanese class A notice

Korean class A notice

Federal Communications Commission notice

This equipment has been tested and found to comply with the limits for a Class A digital device, pursuant to Part 15 of the FCC Rules. These limits are designed to provide reasonable protection against harmful interference when the equipment is operated in a commercial environment. This equipment generates, uses, and can radiate radio frequency energy and, if not installed and used in accordance with the instructions, may cause harmful interference to radio communications. Operation of this equipment in a residential area is likely to cause harmful interference, in which case the user will be required to correct the interference at personal expense.

Modifications

The FCC requires the user to be notified that any changes or modifications made to this device that are not expressly approved by Hewlett-Packard Company may void the user’s authority to operate the equipment.

Cables

Connections to this device must be made with shielded cables with metallic RFI/EMI connector hoods in order to maintain compliance with FCC Rules and Regulations.

Canadian notice

This Class A digital apparatus meets all requirements of the Canadian Interference-Causing Equipment Regulations.

Cet appareil numérique de la classe A respecte toutes les exigences du Règlement sur le matériel brouilleur du Canada.


European Union regulatory notice

This product complies with the following EU Directives:

Low Voltage Directive 2006/95/EC

EMC Directive 2004/108/EC

Compliance with these directives implies conformity to applicable harmonized European standards (European Norms) which are listed on the EU Declaration of Conformity issued by Hewlett-Packard for this product or product family.

This compliance is indicated by the following conformity marking placed on the product:

This marking is valid for non-Telecom products and EU harmonized Telecom products (e.g. Bluetooth).

This marking is valid for EU non-harmonized Telecom products.

*Notified body number (used only if applicable—refer to the product label)

Hewlett-Packard GmbH, HQ-TRE, Herrenberger Strasse 140, 71034 Boeblingen, Germany

BSMI notice

Japanese class A notice


Korean class A notice


Acronyms and abbreviations

ACU

Array Configuration Utility

ADU

Array Diagnostics Utility

EBSU

EFI-based setup utility

EFI

extensible firmware interface

OBDR

One Button Disaster Recovery

ORCA

Option ROM Configuration for Arrays

POST

Power-On Self Test

SA

Smart Array


