StorageGRID Expansion Guide

StorageGRID® 9.0
Expansion Guide
NetApp, Inc.
495 East Java Drive
Sunnyvale, CA 94089 U.S.A.
Telephone: +1 (408) 822-6000
Fax: +1 (408) 822-4501
Support telephone: +1 (888) 463-8277
Web: http://www.netapp.com
Feedback: doccomments@netapp.com
Part number: 215-06839_B0
February 2013
Copyright and trademark information
Copyright information
Copyright © 1994-2013 NetApp, Inc. All rights reserved. Printed in the U.S.A.
No part of this document covered by copyright may be reproduced in any form or by any means—graphic, electronic, or
mechanical, including photocopying, recording, taping, or storage in an electronic retrieval system—without prior written
permission of the copyright owner.
Software derived from copyrighted NetApp material is subject to the following license and disclaimer:
THIS SOFTWARE IS PROVIDED BY NETAPP "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY
AND FITNESS FOR A PARTICULAR PURPOSE, WHICH ARE HEREBY DISCLAIMED. IN NO EVENT SHALL
NETAPP BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE,
EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
NetApp reserves the right to change any products described herein at any time, and without notice. NetApp assumes no
responsibility or liability arising from the use of products described herein, except as expressly agreed to in writing by
NetApp. The use or purchase of this product does not convey a license under any patent rights, trademark rights, or any
other intellectual property rights of NetApp.
The product described in this manual may be protected by one or more U.S. patents, foreign patents, or pending
applications.
RESTRICTED RIGHTS LEGEND: Use, duplication, or disclosure by the government is subject to restrictions as set forth
in subparagraph (c)(1)(ii) of the Rights in Technical Data and Computer Software clause at DFARS 252.277-7103 (October
1988) and FAR 52-227-19 (June 1987).
Trademark information
NetApp, the NetApp logo, Network Appliance, the Network Appliance logo, Akorri, ApplianceWatch, ASUP,
AutoSupport, BalancePoint, BalancePoint Predictor, Bycast, Campaign Express, ComplianceClock, Cryptainer,
CryptoShred, Data ONTAP, DataFabric, DataFort, Decru, Decru DataFort, DenseStak, Engenio, Engenio logo, E-Stack,
FAServer, FastStak, FilerView, FlexCache, FlexClone, FlexPod, FlexScale, FlexShare, FlexSuite, FlexVol, FPolicy,
GetSuccessful, gFiler, Go further, faster, Imagine Virtually Anything, Lifetime Key Management, LockVault, Manage
ONTAP, MetroCluster, MultiStore, NearStore, NetCache, NOW (NetApp on the Web), Onaro, OnCommand, ONTAPI,
OpenKey, PerformanceStak, RAID-DP, ReplicatorX, SANscreen, SANshare, SANtricity, SecureAdmin, SecureShare,
Select, Service Builder, Shadow Tape, Simplicity, Simulate ONTAP, SnapCopy, SnapDirector, SnapDrive, SnapFilter,
SnapLock, SnapManager, SnapMigrator, SnapMirror, SnapMover, SnapProtect, SnapRestore, Snapshot, SnapSuite,
SnapValidator, SnapVault, StorageGRID, StoreVault, the StoreVault logo, SyncMirror, Tech OnTap, The evolution of
storage, Topio, vFiler, VFM, Virtual File Manager, VPolicy, WAFL, Web Filer, and XBB are trademarks or registered
trademarks of NetApp, Inc. in the United States, other countries, or both.
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corporation
in the United States, other countries, or both. A complete and current list of other IBM trademarks is available on the Web at
www.ibm.com/legal/copytrade.shtml.
Apple is a registered trademark and QuickTime is a trademark of Apple, Inc. in the U.S.A. and/or other countries. Microsoft
is a registered trademark and Windows Media is a trademark of Microsoft Corporation in the U.S.A. and/or other countries.
RealAudio, RealNetworks, RealPlayer, RealSystem, RealText, and RealVideo are registered trademarks and RealMedia,
RealProxy, and SureStream are trademarks of RealNetworks, Inc. in the U.S.A. and/or other countries.
All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as
such.
NetApp, Inc. is a licensee of the CompactFlash and CF Logo trademarks. NetApp, Inc. NetCache is certified RealSystem
compatible.
Contents

Copyright and trademark information . . . 2

1  Introduction . . . 11
    Purpose . . . 11
    Intended Audience . . . 12
    Installed Version of StorageGRID Software . . . 12

2  Add Grid Nodes . . . 13
    Introduction . . . 13
    Overview: Add Grid Nodes to Existing Grid . . . 14
    Procedure . . . 14
    Scope and Limitations of the Expansion Procedure . . . 15
    Storage Nodes . . . 15
    Control Nodes . . . 15
    Reasons for Control Node Expansions . . . 16
    Potential Interruption of Grid Operations . . . 16
    Archive Nodes . . . 17
    TSM Archive Node . . . 17
    ILM Policy . . . 18
    Gateway Nodes . . . 18
    Adding a Gateway Node Replication Group . . . 18
    Adding Secondary Gateway Nodes . . . 19
    Admin Nodes . . . 19
    Increasing Redundancy . . . 19
    Increasing Capacity . . . 19
    Audit Nodes . . . 20
    Increasing Redundancy . . . 20
    Prepare for Expansion . . . 21
    Overview: Prepare for Expansion . . . 21
    Confirm Grid’s Grid Node Capacity . . . 21
    Determine Bindings Required for Expansion . . . 22
    Bindings Required for Conversion . . . 22
    Acquire an Updated Grid Specification File . . . 23
    Admin Node Hosted on a Virtual Machine . . . 23
    Admin Node Hosted on a Physical Server . . . 23
    Gather Required Materials . . . 24
    Ensure Expansion Hardware is Ready . . . 28
    Powering Up Hardware . . . 28
    Configuring TSM Archive Node Hardware . . . 29
    Provision the Grid . . . 29
    Provision the Grid and Create Server Activation Floppy Image . . . 29
    Provision the Grid and Create a Server Activation USB Flash Drive . . . 32
    Update Networking at Existing Sites . . . 35
    Grid Networking Configuration . . . 35
    Update Routes on Servers at Existing Sites . . . 36
    Prepare VMs and Physical Servers . . . 38
    Install StorageGRID Software . . . 39
    Add, Customize, and Start Grid Nodes . . . 41
    Overview: Add, Customize, and Start . . . 41
    Run the Initial Expansion Grid Task . . . 44
    Expansion Order . . . 46
    Run the Add Server Grid Task . . . 47
    Clone the CMS Database If Required . . . 49
    Identify the Source CMS . . . 51
    Shut Down the Source CMS . . . 52
    Clone the CMS Database . . . 53
    Restart the Source CMS . . . 55
    Customize the CMS Databases . . . 56
    Set CMS ILM Evaluation Deferred Time . . . 57
    Start Grid Software (Enable Services) . . . 57
    Apply Hotfixes and Maintenance Releases . . . 61
    Ensure Storage Nodes are Included in ILM Storage Pools . . . 61
    Monitor the Restoration of the Gateway Node File System . . . 61
    Verify that the Storage Node is Active . . . 62
    Verify Operation of the Expansion Control Nodes . . . 63
    Admin Node Customization . . . 64
    Customizations . . . 64
    Copy NMS Database . . . 65
    Customize a New Gateway Node Replication Group . . . 66
    Back up Each New Replication Group . . . 66
    Verify the Gateway Node Backup . . . 67
    Customize Gateway Nodes . . . 67
    Update Configurable ILM . . . 68
    Integrate Clients . . . 68
    Verify the New Gateway Node Replication Group . . . 68
    Update File Share Integration . . . 69
    Copy the FSG Cache . . . 72
    Archive Node Integration and Customization . . . 73
    Complete the Setup of TSM Archive Nodes . . . 73
    Update the ILM Policy . . . 73
    Copy Audit Logs . . . 73
    Update the Grid for New Grid Nodes . . . 75
    Overview: Update the Grid . . . 75
    Update Link Costs . . . 75
    Update NTP Sources on Grid Nodes . . . 76
    Troubleshooting . . . 77
    Corrupt ISO Message When Using load_cds.py Script . . . 77

3  Convert to a High Capacity Admin Cluster . . . 79
    Overview . . . 79
    Convert to a High Capacity Admin Cluster . . . 80
    Decommission the Consolidated Admin Node . . . 83

4  Add Storage . . . 85
    Introduction . . . 85
    Adding Storage Volumes . . . 85
    Understanding Storage Volumes . . . 86
    Manual Storage Volume Balancing . . . 86
    Auto Storage Volume Balancing . . . 87
    New vs. Upgraded Storage Nodes . . . 87
    Determine Storage Node’s Storage Volume Balancing Mode . . . 87
    Add NFS Mounted Storage Volumes . . . 88
    Add Direct-Attached or SAN Storage Volumes . . . 90
    Rebalancing Content Across Storage Volumes . . . 93
    Rebalance Content on the Storage Node . . . 95
    Expand the Remaining Storage Nodes . . . 97

5  Convert to a High Availability Gateway Cluster . . . 99
    Introduction . . . 99
    Overall Procedure . . . 101
    Convert to a High Availability Gateway Cluster . . . 103
    Prepare for Conversion and Provision the Grid . . . 103
    Add a New Gateway Node . . . 104
    Prevent Clients From Accessing the Replication Group . . . 105
    Update Network Configuration . . . 105
    Add New Hardware . . . 108
    Update Gateway Node Configuration . . . 109
    Start Primary Gateway Nodes . . . 111
    Update File Share Integration . . . 114
    Supplementary Gateway Node is New . . . 114
    Supplementary Gateway Node is Converted . . . 115
    Check CIFS File Shares . . . 115
    Check NFS File Shares . . . 118
    Confirm Configuration . . . 119
    Copy the FSG Cache . . . 120
    Test Failover in the Cluster . . . 120
    Restore Client Access to Gateway Nodes . . . 122

6  Hardware Refresh . . . 123
    Overview . . . 123
    Hardware Refresh vs Server Recovery . . . 124
    Refresh Combined Grid Nodes to Virtual Machines . . . 124
    Prepare for a Hardware Refresh . . . 125
    Grid Topology Description . . . 125
    Impact on Grid Operations . . . 126
    RAID . . . 127
    Servers with Control Nodes . . . 127
    Number of Control Nodes Being Refreshed . . . 127
    Metadata Consolidation . . . 127
    Content Availability . . . 127
    Metadata Capacity . . . 128
    CMS Database Expansion . . . 128
    CMS Metadata Replication Groups . . . 129
    Storage Pool/Link Cost Group . . . 129
    Impact on Grid Operations . . . 129
    Control Node Failure and Recovery . . . 130
    Servers with Storage Nodes . . . 130
    Storage Consolidation . . . 130
    Content Availability . . . 130
    Storage Capacity . . . 130
    Storage Pools . . . 131
    Number of Storage Nodes Being Refreshed . . . 131
    Object Mapping . . . 131
    Impact on Grid Operations . . . 132
    Hardware Refresh vs Decommissioning . . . 132
    Servers with Admin Nodes . . . 132
    Servers with Gateway Nodes . . . 133
    Materials Checklist . . . 134
    Master Procedure . . . 136
    Modify the Grid Specification File . . . 141
    Edit the Grid Specification File Using Grid Designer . . . 141
    Procedures for Servers with a Control Node . . . 143
    Prevent the CMS Service and MySQL from Starting . . . 143
    Prevent the CMS Service from Starting . . . 144
    Close Client Gateways and Connections . . . 144
    Open Client Gateways . . . 146
    Procedures for Servers with an Admin Node . . . 147
    Change the Preferred E-mail Notifications Sender . . . 147
    Switch CMNs . . . 148
    Load Provisioning Software and Provisioning Data . . . 149
    Primary Admin Node on a Physical Server . . . 149
    Primary Admin Node on a Virtual Machine . . . 151
    Update SSH Keys . . . 152
    Standard Update Procedure . . . 153
    Automated Update Procedure . . . 154
    Procedures for Servers with a Gateway Node . . . 155
    Confirm Samba and Winbind Versions . . . 155
    Copy the FSG Cache . . . 155
    Copy File Share Configuration . . . 156
    Designate the New FSG as Backup FSG . . . 157
    Designate the New FSG as the Primary FSG . . . 158
    Procedures for Servers with a Storage Node . . . 159
    Confirm That all Persistent Content was Copied . . . 159
    Hardware Refresh and Decommissioning Tasks . . . 161
    Remove NMS Bindings . . . 161
    Gateway Node and Admin Node Decommissioning . . . 162
    Storage Node and Control Node Hardware Refresh . . . 163
    Retire the Hardware . . . 168
    Storage Volume Failures During Refresh . . . 170
    Storage Volume Fails on Source LDR . . . 170
    Storage Volume Fails on Destination LDR . . . 170
    Procedures for Storage Failures . . . 171
    Abort the Hardware Refresh Grid Task . . . 171
    Reformat Storage . . . 172
    Generate and Run the Refresh Cleanup Grid Task . . . 174
    Resubmit the Hardware Refresh Grid Task . . . 177

A  Prepare Virtual Machines . . . 179
    Introduction . . . 179
    Refresh of Combined Grid Nodes to Virtual Machines . . . 180
    Install VMware vSphere . . . 180
    Install VMware vSphere Software . . . 181
    VMware vCenter Server . . . 181
    VMware ESX/ESXi . . . 181
    VMware vSphere Client . . . 182
    Create the Virtual Machines . . . 182
    Create Virtual Machines Using OVF Files . . . 182
    Create Virtual Machines Manually . . . 183
    Configure ESX/ESXi for Automatic Restart . . . 183
    Start the Virtual Machines . . . 185
    Install Linux on Virtual Machines . . . 185
    Install VMware Tools . . . 188
    Reorder Network Interface Cards . . . 190
    Configure Storage Nodes for NFS Mounted Storage Volumes . . . 192
    Load Software ISO Images . . . 194
    Use load_cds.py on a Virtual Machine . . . 194
    Troubleshooting . . . 195
    Virtual Machine Not Started . . . 195
    VM Resource Reservation Requires Adjustment . . . 195
    VM is Not Configured for Automatic Restart . . . 196
    Resetting the Virtual Machine . . . 197

B  Prepare Expansion Physical Servers . . . 199
    Introduction . . . 199
    Install Linux on Physical Servers . . . 200
    Install Drivers . . . 202
    Install Hardware Monitoring . . . 202
    Reorder Network Interface Cards . . . 202
    Configure Storage Nodes for NFS Mounted Storage Volumes . . . 205
    Load Software ISO Images . . . 206
    Use load_cds.py with CDs . . . 207
    Use load_cds.py with ISOs . . . 207
    Troubleshooting . . . 209
    Wrong USB Device Name . . . 209

C  How to Use GDU . . . 211
    Start GDU . . . 211
    GDU User Interface . . . 213
    Entering Commands in GDU . . . 215
    Install Drivers with GDU . . . 215
    Close GDU . . . 216
    GDU Troubleshooting . . . 217
    GDU Display Problems . . . 217
    Problems with Server Status . . . 217
    GDU Log Files . . . 217
    Missing GDU Task . . . 217
    Troubleshooting With screen in Multi Display Mode . . . 218
    About load_cds.py . . . 219
    Copy ISO Files in Multi-Site Environment . . . 219

D  Grid Specification Files and Provisioning . . . 223
    What is Provisioning . . . 223
    About the SAID Package . . . 223
    Grid Configuration Files . . . 224
    About Grid Specification Files . . . 225
    Grid Specification File Stages . . . 225
    Naming Convention . . . 226
    Grid Specification File Structure . . . 227
    Server Names . . . 227
    View a Copy of the Grid Specification File . . . 230
    Export the Latest Grid Specification File . . . 231
    Admin Node Hosted on a Virtual Machine . . . 231
    Admin Node Hosted on a Physical Server . . . 231
    Provision the Grid . . . 232
    On a Primary Admin Node on a Virtual Machine . . . 232
    On a Primary Admin Node on a Physical Server . . . 234
    Change the Provisioning Passphrase . . . 236
    On a Primary Admin Node on a Virtual Machine . . . 236
    On a Primary Admin Node on a Physical Server . . . 238
    Provisioning Without a USB Flash Drive . . . 239
    Preserving Copies of the Provisioning Data . . . 240
    Provisioning Troubleshooting . . . 241
    Provision Command Fails . . . 242
    Errors in Grid Specification File . . . 242
    Initial Installation . . . 243
    Upgrades, Expansion, and Maintenance Procedures . . . 243

E  Connectivity . . . 245
    Browser Settings . . . 245
    Verify Internet Explorer Settings . . . 245
    Enable Pop-ups . . . 246
    NMS Connection Procedure . . . 246
    Security Certificate . . . 247
    Log In . . . 248
    Log Out . . . 248
    Command Shell Access Procedures . . . 249
    Log In . . . 249
    Log Out . . . 249
    Accessing a Server Remotely . . . 249
    Connect Using the Remote Server Password . . . 249
    Connect Using the ssh Private Key Password . . . 249
    Connect to a Server Without Using a Password . . . 250
    Using GNU screen . . . 250

Glossary . . . 253
1  Introduction
Purpose
This guide to the expansion of an existing NetApp® StorageGRID®
system provides all of the steps that you need to expand a grid. The
following chapters guide you through the process of provisioning,
installing, and activating new grid nodes on an existing StorageGRID
system, without interrupting grid operations. Read this guide in full
before you begin an expansion, and become familiar with all of the
steps and requirements that you need to expand a grid. This guide
assumes that you are familiar with the installation processes described
in the Installation Guide and the integration processes described in the
Administrator Guide.
NOTE Grid node software may be installed on different sets of qualified
hardware. This guide only covers the software installation and configuration process. Specific hardware installation and integration
instructions are not included.
This document helps trained technicians to:
•  Add grid nodes to an existing StorageGRID system.
•  Install and activate grid node software on newly added servers, and retain or copy required data from the existing grid.
•  Add storage to an existing Storage Node.
•  Convert Admin Nodes to a High Capacity Admin Cluster.
•  Convert an existing Basic Gateway replication group to a High Availability Gateway replication group.
•  Refresh hardware for servers that host supported grid nodes.
This guide assumes that the grid expansion has been planned in
advance and that all required hardware has been purchased. This
guide also assumes that hardware has been installed, connected, and
configured to specifications.
NetApp StorageGRID
215-06839_B0
11
Expansion Guide
Intended Audience
Content of this guide is current with Release 9.0 of the StorageGRID
software and is intended for technical personnel trained to install and
support the NetApp StorageGRID system.
It is assumed that you have a general understanding of the product
and the product’s provisioning, installation, activation, and integration
process.
This document assumes familiarity with many terms related to
computer operations and programming, network communications,
and operating system file operations. There is wide use of acronyms.
Installed Version of StorageGRID Software
By the end of the expansion process, all grid services of the same type
must be running the same version of the StorageGRID 9.0 software.
This includes applied hotfixes and maintenance releases. Before
performing an expansion, it is recommended that you note the currently
installed version of the StorageGRID 9.0 software running on the grid
node type that is to be added. Use this number as a reference to determine
whether hotfixes and/or maintenance releases must be applied when
performing expansion procedures.
1. In the NMS management interface (MI), go to <grid_node> > SSM > Services > Main.
2. Scroll to the bottom of the page to view the Packages section. The
Version for the storage-grid-release package indicates the installed
version number.
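If you prefer to check from a command shell, you can also query the package directly on a server that hosts the grid node type in question. This is a sketch only: it assumes command shell access as described in the Connectivity appendix, and that the software is delivered as an RPM package named storage-grid-release (the package name shown in the NMS MI).

    # Query the installed StorageGRID release package on a grid server
    rpm -q storage-grid-release
    # The reported version string should match the value shown in the
    # NMS MI on the SSM > Services > Main page.

Record the reported version so that you can compare it against the expansion grid node after installation.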
2  Add Grid Nodes
Add grid nodes to an existing site, or add a new satellite
site
Introduction
NOTE To avoid errors, read this chapter in full and become familiar with the
entire expansion process before you begin an expansion.
You can expand a grid by adding the following types of grid nodes:
•  Storage Nodes
•  Control Nodes
•  Secondary Gateway Nodes
•  Gateway Node replication groups
•  Admin Nodes (including a High Capacity Admin Cluster)
•  Archive Nodes
•  Audit Nodes
This chapter also describes how to add a satellite site. Other types of
expansions, conversions, and hardware refresh procedures are not
covered in this chapter. See the relevant chapter in this guide for more
information.
For information about expanding a grid during a software upgrade,
see the Upgrade Guide.
Overview: Add Grid Nodes to Existing Grid
The following is an overview of the steps required to add a grid node
of any type to an existing grid.
NOTE New installations of the StorageGRID 9.0 system are not supported
on physical servers. Virtual machines must be used.
Procedure
1. Read the scope and limitations. The type of expansion and the
purpose of the expansion may affect which steps you perform. See
“Scope and Limitations of the Expansion Procedure” on page 15.
2. Gather required materials, and prepare the grid to add new grid
nodes. See “Prepare for Expansion” on page 21.
3. Provision the grid to create the server activation media. See “Provision the Grid” on page 29.
4. If adding a new site, update networking at the existing site(s). See
“Update Networking at Existing Sites” on page 35.
5. Set up virtual machines or physical servers to prepare for the
installation of grid software. See “Prepare VMs and Physical
Servers” on page 38.
6. Install grid software on virtual machines or physical servers that
will host new grid nodes. See “Install StorageGRID Software” on
page 39.
7. Inform the grid of the expansion grid nodes, and complete all other
steps required to permit the expansion grid nodes to interact with
existing grid servers (including customizing and starting the new
grid nodes). See “Add, Customize, and Start Grid Nodes” on
page 41.
8. If required, update the grid by customizing link costs and/or NTP
sources to permit efficient inter-operation of the grid and the new
grid nodes. See “Update the Grid for New Grid Nodes” on
page 75.
Scope and Limitations of the Expansion Procedure
Storage Nodes
You can use the following methods to increase the storage capacity of a
grid:
•  Add Storage Nodes — to increase the total storage capacity of the grid or to add new sites to the grid. See “Overview: Add Grid Nodes to Existing Grid” on page 14.
•  Add Storage Volumes (LUNs) to a Storage Node — to increase the storage capacity of a Storage Node. See Chapter 4: “Adding Storage Volumes”.
Storage expansion depends on the grid’s ILM rules for stored content
and the grid topology. For example, if the ILM rules make one copy of
each ingested object at a Data Center site and a second copy at a
Disaster Recovery site, you must add an equivalent amount of storage
at each site to increase the overall capacity of the grid. Most ILM
policies create two copies of data on spinning media; therefore, most
storage expansions add two Storage Nodes, or add storage volumes to
two Storage Nodes.
Add or expand new Storage Nodes with a Storage Volume Balancing
mode of Auto (see LDR > Storage > Main). For upgraded Storage Nodes
that are expanded, the Storage Volume Balancing mode remains Manual.
You can add a Storage Node to the grid at any time (including during
an upgrade). When you add Storage Nodes during an upgrade, you
must install the expansion Storage Nodes with the software release
version to which you are upgrading the grid. For more information,
see the Upgrade Guide.
Control Nodes
You can add Control Nodes to increase the number of objects that can
be saved to the grid, or to add sites to the grid. For more information
about the number and location of Control Nodes being added to the
grid you are expanding, see the generated SAID package, which
includes the topology and configuration of the expanded grid.
It is particularly important to understand the purpose of a
Control Node expansion, as described in “Reasons for Control Node
Expansions” below. The procedure that you use to add Control Nodes
and the amount of disruption that it may cause to grid operations
varies depending upon the reason Control Nodes are being added.
Reasons for Control Node Expansions
When CMS databases fill, add new Control Nodes to the grid to
maintain grid capacity (number of objects), or, if the CMS database is
less than 800 GiB, perform a hardware refresh to expand the database
and the number of objects that the grid can support. A new
Control Node has an empty database, while a refreshed Control Node
that expands the CMS database includes cloned data from the original
CMS database and will result in an increase to the grid’s object capacity. As Control Nodes (new and refreshed) become operational, they
immediately begin to collect metadata for new objects.
When you add a new site to an existing grid, the updated grid design
may include new Control Nodes. Depending upon customer requirements, you can add these new Control Nodes with empty CMS
databases (when they act as satellite sites), or you may be required to
copy the database from an existing CMS service to the new
Control Node (to increase the grid’s redundancy and data availability).
The most common type of site that is added to a grid is a satellite site.
For more information about satellite sites, and in particular satellite
CMSs, see the Administrator Guide.
WARNING You must add Control Nodes correctly to ensure that
the grid continues to operate as expected under
normal and failure conditions.
Potential Interruption of Grid Operations
When adding Control Nodes, if you need to clone the CMS database,
you must temporarily shut down a functioning CMS service as part of
the cloning procedure. At least two Control Nodes must be operational to perform this procedure. If the grid does not include at least
two operational Control Nodes, before adding more Control Nodes,
rectify this issue and confirm that the grid includes at least two operational Control Nodes.
NOTE When operating with a single CMS, alarms appear in the NMS management interface.
While you clone the database, the grid can continue normal operations
as long as at least one read-write CMS service remains active (that is, if
the grid includes three CMSs). However, when only one CMS remains
active during the cloning process, you are at risk of data loss in case of
another Control Node failure.
NOTE Cloning a CMS database can take over four hours, depending on
content, and the entire expansion process can take up to eight hours.
You are advised to perform this operation at a time that is least disruptive to normal customer operations.
Archive Nodes
You can add one or more Archive Nodes to a grid to add the ability to
store content to archival media.
TSM Archive Node
You can integrate a TSM Archive Node with either a new or an
existing TSM middleware server.
NOTE Archive Nodes cannot be co-hosted with a TSM server.
Before you begin to configure the TSM middleware for use with the
Archive Node, read through the description of best practices included
in the Installation Guide. The best practices section explains details of
Archive Node operation that can impact the configuration of the TSM
middleware and provides guidance as to which TSM settings are
required and recommended.
The section “Configure the TSM Server” in the Installation Guide
includes detailed sample instructions for preparing a TSM Server that
follow these recommendations. These instructions are provided for
your guidance only. They are not intended to replace TSM Server documentation, or to provide complete and comprehensive instructions
suitable for any customer configuration. Instructions suitable for your
site should be provided by a TSM administrator who is familiar both
with your detailed requirements, and with the complete set of TSM
Server documentation.
ILM Policy
When you add an Archive Node to an existing grid, you must update
the ILM policy for the grid to direct the Control Nodes to use the
Archive Node as a storage location. Depending upon customer
requirements and the site design, existing content may also be re-evaluated and directed to storage on an Archive Node. For more
information about updating the ILM for the new Archive Node, see
“Update the ILM Policy” on page 73.
Gateway Nodes
Gateway Nodes provide a redundant virtual file system interface to
the files stored in the grid by client applications.
The instructions in this chapter cover the following cases:
•  “Adding a Gateway Node Replication Group”
•  “Adding Secondary Gateway Nodes”
The procedure for converting a Basic Gateway replication group to a
High Availability Gateway replication group (which may include
adding an additional Gateway Node to the replication group) is
described elsewhere in this guide. See Chapter 5: “Convert to a High
Availability Gateway Cluster”.
Adding a Gateway Node Replication Group
You can add additional Gateway Node replication groups to a grid to
let more clients access the grid, to give an entirely separate set of
clients read and write access, or to support the ingest of a larger total
number of files than is possible with a single replication group. You
can also add a Gateway Node replication group when you add a satellite site to let the satellite site ingest and retrieve files. You can add a
Gateway Node replication group to the grid at any time (including
during an upgrade).
When you add Gateway Node replication groups to the grid during an
upgrade, you must install the expansion Gateway Nodes with the
software release version to which you are upgrading the grid. You
cannot expand existing Gateway Node replication groups while you
upgrade the grid. For more information, see the Upgrade Guide.
When you add a new replication group, it runs independently of the
replication groups that are a part of the existing grid.
Adding Secondary Gateway Nodes
You can expand an existing replication group by adding a secondary
Gateway Node. This expansion does not increase grid capacity: it
increases redundancy. Adding a secondary Gateway Node may create
increased retrieval throughput from the grid (by providing an additional read-only access point). If you add the secondary
Gateway Node at another site, it also provides a new access point to
the grid.
NOTE You cannot expand an existing replication group during an upgrade.
Admin Nodes
In the simplest case, a single primary Admin Node (hosting the CMN
service) collects and stores attributes from all services and components
in the grid. This Admin Node provides a web based interface to
display grid information.
Increasing Redundancy
For redundancy, a grid may be configured with a second
Admin Node. This Admin Node does not host the CMN service. Each
Admin Node collects and stores information from all grid services,
and provides a separate, but equivalent, NMS management interface
(MI) to display grid information. This configuration of multiple
Admin Nodes is common for grids that are geographically distributed
(for example, a grid that includes a Data Center site and a Disaster
Recovery site). Note that adding a second Admin Node does not
increase a grid’s overall grid node capacity, and the second
Admin Node must have at least the same capacity as the primary
Admin Node.
For redundancy, in a grid configured with a High Capacity Admin
Cluster (HCAC), a second HCAC can be added. When adding a
second HCAC, note that its reporting Admin Node is not a primary —
it does not host the CMN service.
Increasing Capacity
In a standard grid configuration, there is an upper limit to the number
of grid nodes and their services that a primary Admin Node can
support. If the number of grid nodes exceeds the capacity of the
primary Admin Node, you must convert to an HCAC. For more information on determining the current capacity, see “Confirm Grid’s Grid
Node Capacity” on page 21. For the procedure to convert to an HCAC,
see Chapter 3: “Convert to a High Capacity Admin Cluster”.
Audit Nodes
Usually a grid is deployed with the AMS service hosted by the
Admin Node; however, for larger grids, the AMS service can be hosted
by an Audit Node. Regardless of the grid’s configuration (AMS hosted
by either an Admin Node or Audit Node), the grid can be expanded to
add an Audit Node. This expansion adds redundancy for log files.
After adding an Audit Node, if the Audit option is included in the
deployment, enable the audit share on the Audit Node. For more
information and procedures, see the Administrator Guide.
NOTE Authorization is required to access audit log files.
Increasing Redundancy
When you add an Audit Node, the original AMS service hosted by the
Admin Node or Audit Node remains, and a second AMS service is
added to the grid. When started, the expansion Audit Node begins
logging system events. Note, however, that logs created before you
added the Audit Node are not automatically copied over to the expansion Audit Node. You can manually copy log files to the expansion
Audit Node.
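If you choose to copy existing log files, the copy can be made with standard tools over SSH from the server hosting the original AMS service to the expansion Audit Node. The sketch below is an illustration only, not the documented “Copy Audit Logs” procedure (page 73); the audit log directory and host name are placeholders that you must replace with the values for your deployment.

    # Run as root on the server that hosts the original AMS service.
    # <audit_log_directory> and <expansion_audit_node> are placeholders.
    scp -rp /<audit_log_directory>/* root@<expansion_audit_node>:/<audit_log_directory>/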
Prepare for Expansion
Complete the steps below to prepare yourself and the grid for the new
grid nodes.
There are steps you must complete before starting the expansion and
others that must be completed immediately before you add the expansion
grid node. You must complete all of the steps in this table
regardless of the type of grid node being added.
Overview: Prepare for Expansion
Table 1: Prepare for Expansion

Step  Action                                                            See
1.    Confirm the Admin Node's current grid node capacity.              page 21
2.    Acquire an updated grid specification file for the expansion.     page 23
3.    Confirm that you have all materials required for the expansion.   page 24
4.    Ensure hardware required for the expansion is ready.              page 28
Confirm Grid’s Grid Node Capacity
Before adding grid nodes, confirm that the grid’s current configuration
can support additional grid nodes. If it cannot, convert to an HCAC
before adding other expansion grid nodes.
1. In the NMS MI, go to <Admin Node> > NMS > Overview > Main.
   Under Binding Information the following information is displayed:
   •  Name — The name of the NMS service or cluster.
   •  Type — The type of NMS service hosted by the Admin Node (consolidated, reporting, or processing).
   •  Bound Nodes — The current number of grid services bound to the NMS service.
   •  Maximum Supported Bindings — The total service capacity of the grid.
   •  Remaining Capacity — The percentage of available binding capacity remaining on the Admin Node.
Determine Bindings Required for Expansion
To determine if the grid’s current configuration can support additional
grid nodes, calculate how many bindings the expansion process adds.
Each service on a grid node equals one binding. For example, an
expansion Gateway Node includes three services (CLB, FSG, SSM) and
therefore three bindings.
Bindings Required for Conversion
If you convert to an HCAC, be aware that additional bindings are
needed during the conversion process. The conversion process
requires at least six bindings for the reporting and processing
Admin Nodes. These bindings are temporarily bound to the old
Admin Node during the expansion phase of the conversion process
and are removed when the old Admin Node is decommissioned. The
old Admin Node must always have at least six bindings available to
accommodate the conversion process.
Example
The following example illustrates how adding a new High Availability
Gateway replication group to a grid increases the number of required
bindings to a point where you need to use an HCAC in the grid.
Suppose you want to add a new High Availability Gateway replication
group to the grid and the gateway group includes a main primary, a
supplementary primary, and a secondary Gateway Node. The expansion adds nine services (three FSGs, three CLBs, and three SSMs), and
therefore requires nine additional bindings on the Admin Node.
When you check the capacity of the Admin Node, you discover that its
value for Maximum Supported Bindings is 160 bindings, and the number
of Bound Nodes is 154. Only six bindings are available for new nodes
and you require nine for the expansion. You must convert the grid to
use an HCAC before you can add the new gateway replication group
to the grid. Note as well that the conversion to an HCAC will use all
six available bindings.
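A quick way to apply this check is to compare the figures shown on the <Admin Node> > NMS > Overview > Main page with the number of services you plan to add. The following shell sketch simply repeats the arithmetic of the example above; the three input values are the example figures, not values read from your grid.

    # Substitute the figures from your own NMS MI for these example values
    max_bindings=160      # Maximum Supported Bindings
    bound_nodes=154       # Bound Nodes
    bindings_needed=9     # one binding per service added by the expansion
    available=$(( max_bindings - bound_nodes ))
    if [ "$bindings_needed" -gt "$available" ]; then
        echo "Convert to an HCAC first: need $bindings_needed, only $available available"
    else
        echo "Expansion fits: $available bindings available"
    fi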
Acquire an Updated Grid Specification File
Use one of the following procedures:
•  “Admin Node Hosted on a Virtual Machine” on page 23
•  “Admin Node Hosted on a Physical Server” on page 23
Admin Node Hosted on a Virtual Machine
Prerequisites and Required Materials
•  You have confirmed the grid’s ability to support additional grid nodes, or have determined that performing the grid expansion will require first converting to an HCAC.
•  Passwords.txt file
•  A tool such as WinSCP (available at http://winscp.net/eng/download.php) to transfer files to and from the Admin Node
Procedure
1. At the primary Admin Node server, access a command shell and
log in as root using the password listed in the Passwords.txt file.
2. Create a directory to hold the provisioned grid specification file.
Enter: mkdir -p /root/usb
3. Copy the provisioned grid specification file from the grid to the
directory. Enter: copy-grid-spec /root/usb
4. Use WinSCP to copy the GID<Grid_ID>_REV<revision_number>_GSPEC.xml file from the Admin Node to your service laptop.
5. Log out. Enter: exit
6. Follow the instructions in the Grid Designer User Guide and create
the expansion grid specification file. This includes deployment.
When the new deployment grid specification file is acquired, and you
have confirmed that you have all of the materials in the Materials
Checklist (below), you are ready to begin the grid expansion.
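If a command-line transfer is more convenient than WinSCP, the same copy can be made with scp from a service laptop that has an SSH client. This is a sketch only; the Admin Node address is a placeholder, and the file name pattern follows the GID<Grid_ID>_REV<revision_number>_GSPEC.xml convention noted in step 4.

    # Run on the service laptop; uses the root password from the Passwords.txt file
    scp "root@<primary_admin_node_IP>:/root/usb/GID*_GSPEC.xml" .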
Admin Node Hosted on a Physical Server
NOTE New installations of the Release 9.0 StorageGRID system are not
supported on physical servers. Virtual machines must be used.
Prerequisites and Required Materials
•  You have confirmed the grid’s ability to support additional grid nodes, or have determined that performing the grid expansion will require first converting to an HCAC.
•  USB flash drive
•  Passwords.txt file
Procedure
1. At the primary Admin Node server, access a command shell and
log in as root using the password listed in the Passwords.txt file.
2. Insert a USB flash drive into the Admin Node server.
3. Copy the latest version of the provisioned grid specification file
from the grid to the USB flash drive. Enter: copy-grid-spec
4. Transfer the provisioned grid specification file from the USB flash
drive to the service laptop.
5. Follow the instructions in the Grid Designer User Guide and create
the expansion grid specification file. This includes deployment.
When the new deployment grid specification file is acquired, and you
have confirmed that you have all of the materials in the Materials
Checklist (below), you are ready to begin the grid expansion.
Gather Required Materials
You will need the following items to complete the expansion:
Table 2: Materials Checklist for Expansion

Item
Notes
SUSE Linux Enterprise Server
(SLES) DVD
Use only the supported version of SLES for the Release 9.0
StorageGRID system. For supported versions, see the
Interoperability Matrix Tool (IMT).
NOTE Use of any version of SLES other than the correct
version will result in an installation failure.
Since you can load Linux on multiple servers in parallel, it
is helpful to have more than one copy of SLES DVD.
24
NetApp StorageGRID
215-06839_B0
Add Grid Nodes
Table 2: Materials Checklist for Expansion (cont.)

Item
Notes
StorageGRID Software CD
You only require this CD if you perform a manual installation of an expansion grid node.
Version 9.0.0
Confirm that the version matches the grid activation
information.
Used to install the base version of the grid software on all
grid servers. Works in conjunction with the Server Activation USB flash drive that customizes each server to prepare
it for its assigned role in the grid.
Enablement Layer for StorageGRID Software CD
You only require this CD if you perform a manual installation of an expansion grid node.
Version 9.0.0
The Enablement Layer customizes the Linux operating
system that is installed on each grid server. Only the
packages needed to support the services hosted on the
server are retained to minimize the overall footprint
occupied by the operating system and to maximize the
security of each server.
Item: If available, service pack CDs
Notes: You only require these CDs if you perform a manual installation of an expansion grid node. There are two Service Pack CDs:
• StorageGRID Software Service Pack 9.0.x
• Enablement Layer for StorageGRID Software Service Pack 9.0.x
where x is the service pack number. The service pack number is identical for both CDs.
Item: Hotfix and/or Maintenance Release CD (or ISO image)
Notes: Determine whether or not a hotfix and/or maintenance release has been applied to the grid node type that is being added. The expansion grid node must be updated to the same hotfix or maintenance release as the other installed grid nodes of the same type. (See the storage-grid-release version number listed on the <grid_node> > SSM > Services > Main page.) To acquire hotfixes and maintenance releases, contact Support.
Item: Three USB flash drives, at least 1 GB in size, preferably the three USB flash drives used to originally install the grid:
• The two Provisioning USB flash drives
• The Server Activation USB flash drive
Notes: These USB flash drives should be available at the customer site. It is suggested that you confirm access to these USB flash drives before leaving for the customer site. If you cannot reuse these USB flash drives, you must provide three new USB flash drives (at least 1 GB in size). You will use these to recreate the Provisioning USB flash drive, the Provisioning Backup USB flash drive, and the Server Activation USB flash drive as part of the expansion procedure. If all grid nodes are installed in virtual machines, USB flash drives are not required. The Server Activation USB flash drive is not required if all expansion grid nodes are installed on virtual machines.
Item: Provisioning passphrase
Notes: The passphrase is created and documented when grids are first installed (for release 8.0 and higher of the StorageGRID system) or when grids are updated to release 8.0. The provisioning passphrase is not in the Passwords.txt file.
Item: Expansion grid specification file
Notes: When expanding a grid, the latest version of the provisioned grid specification file is copied from the primary Admin Node and sent to a NetApp Solutions Engineer for updating. A new expansion grid specification file is returned that must then be deployed. This deployment generates the deployment grid specification file. The deployment grid specification file is then used to generate a SAID package.
Item: TSM Client Packages CD
If you require them, you must create a CD containing the following Tivoli® Storage Manager (TSM) Client packages:
• Backup Archive RPM (TIVsm-BA.i386.rpm)
• Tivoli API RPM (TIVsm-API.i386.rpm)
The current supported version of the TSM Client packages is available in the Interoperability Matrix Tool (IMT).
Notes: The CD is only required if one or more of the following conditions are true:
• The grid did not previously include a TSM Archive Node
• You are performing a manual installation of the expansion Archive Node (without GDU)
• The package versions have been updated since your last installation
Download the correct version of the TSM client packages from IBM. These packages are included in a TAR file available at:
ftp://service.software.ibm.com/storage/tivoli-storage-management/maintenance/client/
1. Download the LinuxX86 version of the TAR file corresponding to the version of the TSM Library in use at the deployment. The TAR file is found at <version> > Linux > LinuxX86 > <version>.
2. Unpack the tar file.
3. Copy the RPM files to a CD using standard CD burning software.
4. Create an ISO image of the CD to be used during software installation using the load_cds.py script. For VMs, see “Load Software ISO Images” on page 194, and for physical servers, see “Load Software ISO Images” on page 206.
(A command-line sketch of this preparation follows Table 2.)

Item: VMware software and documentation
Notes: For the current supported versions of VMware software, see the Interoperability Matrix Tool (IMT).
Item: Documentation
Notes:
• Installation Guide
• Administrator Guide
• Maintenance Guide
• Grid Designer User Guide
• Upgrade Guide
• Grid-specific documentation
Item: Service laptop
Notes: Laptop must have:
• Microsoft Windows operating system
• Network port
• Supported browser for StorageGRID 9.0
• Telnet/ssh client (PuTTY version 0.57 or higher)
• StorageGRID Grid Designer. The version of Grid Designer must be compatible with the version of the StorageGRID software.
• WinImage, used to create the floppy disk images required to install grid nodes in a virtual machine. WinImage is available at: http://www.winimage.com
• WinSCP to transfer files to and from a primary Admin Node hosted in a virtual machine. WinSCP is available at: http://winscp.net/eng/download.php
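The TSM Client Packages preparation described in Table 2 can be staged from any Linux host before going to the customer site. The following is a sketch only: the exact directory levels under the IBM FTP site and the TAR file name depend on the TSM client version in use, and building the ISO with mkisofs is offered as a generic alternative to burning a physical CD (the load_cds.py script referenced in Table 2 is covered in “Load Software ISO Images”).

# Hypothetical staging of the TSM client packages (version path and file name are placeholders)
mkdir -p /tmp/tsm-client && cd /tmp/tsm-client
wget ftp://service.software.ibm.com/storage/tivoli-storage-management/maintenance/client/<version>/Linux/LinuxX86/<version>/<package>.tar
tar -xvf <package>.tar
# The extracted TIVsm-BA.i386.rpm and TIVsm-API.i386.rpm files can be burned to CD,
# or packaged directly into an ISO image, for example:
mkisofs -J -R -o tsm-client.iso TIVsm-BA.i386.rpm TIVsm-API.i386.rpm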
Ensure Expansion Hardware is Ready
The hardware needed to support the grid expansion (including servers, racks, and networking components) and the customer network must be correctly installed, configured, and verified before software installation begins. Steps to configure virtual machines are summarized in Table 12: “Prepare Virtual Machines for Expansion” on page 179.
Powering Up Hardware
Ensure that all racks and drive arrays are powered up. This is particularly important for expansion Storage Nodes, Gateway Nodes, or
Control Nodes that use attached storage.
NOTE Do not start servers until their attached storage drive arrays spin up.
Ensure that the power is on for all racks at the site, and for all drive
array enclosures. The expansion servers can remain off until you need
to boot the server as part of these instructions.
Configuring TSM Archive Node Hardware
You can integrate a TSM Archive Node with a new or an existing TSM.
For more information on installing and configuring TSM middleware
for use with an Archive Node, see “Archive Nodes” on page 17.
Provision the Grid
After you arrive at the customer site, use one of the following procedures to provision the grid:
• For an Admin Node installed on a virtual machine, see “Provision the Grid and Create Server Activation Floppy Image” on page 29.
• For an Admin Node installed on a physical server, see “Provision the Grid and Create a Server Activation USB Flash Drive” on page 32.
NOTE New installations of the StorageGRID 9.0 system are not supported on physical servers. Virtual machines must be used.
Provision the Grid and Create Server Activation Floppy
Image
Use this procedure when the primary Admin Node is installed on a
virtual machine.
Prerequisites and Required Materials
• You have a grid specification file that has been updated for the expansion. See “Acquire an Updated Grid Specification File” on page 23.
• A utility such as WinImage (available at http://www.winimage.com) that permits you to create a floppy disk image
• A tool such as WinSCP (available at http://winscp.net/eng/download.php) to transfer files to and from the Admin Node
• Service laptop
Procedure
1. At the primary Admin Node server, access a command shell and
log in as root. When prompted for a password, press <Enter>.
2. Copy the expansion grid specification file to /root/usb.
For example, use WinSCP to copy the edited grid specification file
from your service laptop to the primary Admin Node.
3. Remove any old grid specification files from /root/usb.
Ensure that there is only one file named
GID<grid_ID>_REV<revision_number>_GSPEC.xml.
NOTE The /root/usb directory must contain only one grid specification
file. Otherwise, provisioning will fail.
4. Provision the grid:
a. Enter: provision /root/usb
b. When prompted, enter the provisioning passphrase.
When the process is complete, “Provisioning complete” is displayed.
If provisioning ends with an error message, see “Provisioning
Troubleshooting” on page 241.
5. Log out. Enter: exit
6. Use WinSCP to copy the GID<grid_ID>_REV<revision_number>_SAID.zip file from the /root/usb directory of the Admin Node to
your service laptop.
7. Unzip the GID<grid_ID>_REV<revision_number>_SAID.zip file on
your service laptop, and review the contents of the docs/Index.html
file to confirm the grid is configured correctly. For more information, see “About the SAID Package” on page 223.
8. If the expansion grid node is installed on a physical server, update
the Server Activation USB flash drive:
a. Insert the Server Activation USB flash drive in the service
laptop.
b. Delete any existing content on the flash drive.
c. From the generated SAID package, copy the contents of the
Grid_Activation directory to the Server Activation USB flash
drive.
Do not copy the directory itself. You must enter the complete
path to activation files later in the installation; omitting the
directory makes this task easier.
d. Remove the Server Activation USB flash drive from the laptop.
9. If the expansion grid node is installed on a virtual machine, create a
Server Activation floppy image:
a. On your service laptop, start the WinImage software. From the
File menu, select New.
b. In the Format selection dialog, select a standard format 1.44 MB
floppy. Click OK.
c. Copy the server activation file to the floppy image:
• From the Image menu, select Inject.
• Browse to the Grid_Activation directory of the unzipped SAID package and select a server activation file.
• If more than one grid node is being added in this expansion, add additional activation files to the floppy image if desired. Three or four activation files fit on a single floppy image.
d. Save the floppy image with a descriptive name and a file name ending in .flp.
• From the File menu, select Save.
• Select Save as type: Virtual floppy Image (*.vfd,*.flp).
• Enter a filename ending in .flp, such as <servername>.flp. You must enter the extension, or the vSphere client cannot use the image during installation.
• Click Save.
e. Repeat until all server activation files are on floppy images.
10. Back up the provisioning data to a directory on the Admin Node. This
backup copy can be used to restore the grid in the case of an emergency or during an upgrade or grid expansion.
a. At the primary Admin Node, log in as root with the password
listed in the Passwords.txt file.
b. Create a directory for the backup provisioning data. Enter:
mkdir /var/local/backup
c. Back up the provisioning data. Enter:
backup-to-usb-key /var/local/backup
d. When prompted, enter the provisioning passphrase.
11. Store the Provisioning directory and the Backup Provisioning directories separately in a safe place. For example, use WinSCP to copy these
directories to your service laptop, and then store them to two separate
USB flash drives that are stored in two separate, secure physical
locations.
For more information, see “Preserving Copies of the Provisioning Data” on page 240.
The contents of the Provisioning directory are used during expansion and maintenance of the grid when a new SAID package must be generated.
WARNING Store two copies of the Provisioning directory separately in safe, secure locations. The Provisioning directories contain encryption keys and passwords that can be used to obtain data from the grid. The Provisioning directory is also required to recover from a primary Admin Node failure.
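For reference, the shell portion of the preceding procedure (provisioning in step 4 and the backup in step 10) amounts to the following sequence on the primary Admin Node; treat it as a sketch rather than literal session output:

# On the primary Admin Node
ls /root/usb                           # must list exactly one GID<grid_ID>_REV<revision_number>_GSPEC.xml file
provision /root/usb                    # enter the provisioning passphrase when prompted
mkdir /var/local/backup
backup-to-usb-key /var/local/backup    # enter the provisioning passphrase when prompted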
Provision the Grid and Create a Server Activation USB
Flash Drive
Use this procedure when the primary Admin Node is installed on a
physical server.
NOTE New installations of the StorageGRID 9.0 system are not supported
on physical servers. Virtual machines must be used.
Prerequisites and Required Materials
• You have a grid specification file that has been updated for the expansion. See “Acquire an Updated Grid Specification File” on page 23.
• Provisioning USB flash drive, Backup Provisioning USB flash drive
• Server Activation USB flash drive
• Provisioning passphrase
• Service laptop
Procedure
1. Use Grid Designer to edit the updated expansion grid specification
file for deployment. For more information, see the Grid Designer
User Guide.
2. Copy the updated expansion grid specification file to the root
directory of the Provisioning USB flash drive.
Ensure that it is the only grid specification file in the root directory
or provisioning will fail.
3. At the primary Admin Node, log in as root. When prompted for a
password, press <Enter>.
4. Provision the grid:
a. Enter: provision
b. When prompted, insert the Provisioning USB flash drive.
c. When prompted, enter the provisioning passphrase.
d. When provisioning is complete, remove the Provisioning USB
flash drive.
When the process is complete, “Provisioning complete” is displayed.
If provisioning ends with an error message, see “Provisioning
Troubleshooting” on page 241.
5. Review the current configuration to confirm all settings are correct:
a. Copy the file GID<grid_ID>_REV<revision_number>_SAID.zip on
the USB Provisioning flash drive to the service laptop and
extract the contents.
b. Inspect the file Index.html to make sure that the settings are
correct. If there is an error, you need to provision the grid
again. For more information, see “Errors in Grid Specification
File” on page 242.
6. If the expansion grid node is installed on a physical server, update
the Server Activation USB flash drive:
a. Insert the Server Activation USB flash drive in the service
laptop.
b. Delete any existing contents on the flash drive.
c. From the generated SAID package, copy the contents of the
Grid_Activation directory to the Server Activation USB flash
drive.
Do not copy the directory itself. You must enter the complete
path to activation files later in the installation; omitting the
directory makes this task easier.
d. Remove the Server Activation USB flash drive from the service
laptop.
7. If the expansion grid node is installed on a virtual machine, create
a Server Activation floppy image:
a. On your service laptop, start the WinImage software. From the
File menu, select New.
b. In the Format selection dialog, select a standard format 1.44 MB
floppy. Click OK.
c. Copy the server activation file to the floppy image:
• From the Image menu, select Inject.
• Browse to the Grid_Activation directory of the unzipped SAID package and select a server activation file.
• If more than one grid node is being added in this expansion, add additional activation files to the floppy image if desired. Three or four activation files fit on a single floppy image.
d. Save the floppy image with a descriptive name and a file name ending in .flp.
• From the File menu, select Save.
• Select Save as type: Virtual floppy Image (*.vfd,*.flp).
• Enter a filename ending in .flp, such as <servername>.flp. You must enter the extension, or the vSphere client cannot use the image during installation.
• Click Save.
e. Repeat until all server activation files are on floppy images.
8. Back up the provisioning data:
a. At the primary Admin Node server, access a command shell
and log in as root using the password listed in the Passwords.txt
file.
b. Insert the Backup Provisioning USB flash drive.
c. Back up the provisioning data. Enter: backup-to-usb-key
d. When prompted, enter the provisioning passphrase.
e. Remove the Backup Provisioning USB flash drive.
9. Store the Provisioning USB flash drive and the Backup Provisioning USB flash drive separately in a safe place.
The content of the Provisioning USB flash drive is used during
expansion and maintenance of the grid when a new SAID package
must be generated.
WARNING Store the Provisioning USB flash drive and the Backup Provisioning USB flash drive separately in safe, secure locations such as a locked cabinet or safe. The USB flash drives contain encryption keys and passwords that can be used to obtain data from the grid. The Provisioning USB flash drive is also required to recover from a primary Admin Node failure.
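As with the virtual machine variant, the shell portion of this procedure reduces to two commands on the primary Admin Node. This is a sketch only, with the USB flash drives inserted when the scripts prompt for them:

provision              # insert the Provisioning USB flash drive and enter the provisioning passphrase when prompted
backup-to-usb-key      # insert the Backup Provisioning USB flash drive and enter the provisioning passphrase when prompted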
Update Networking at Existing Sites
If the expansion adds an additional satellite site to the grid, you may need to update the networking configuration of existing servers at all sites before you start servers at the satellite site. This step is only required if the grid uses a grid private network configured with separate subnets for each site, as described below.
Grid Networking Configuration
The StorageGRID system uses the following common networking
configurations:
• All grid servers are placed directly on the customer network, with all traffic between grid servers and all interactions with customer servers occurring on this single network.
• (Best practice) The grid is configured with a grid private network that segregates grid traffic onto a separate subnet or subnets. All interaction between grid servers occurs on the grid private network. Gateway Nodes, Admin Nodes, and in some cases Archive Nodes need to interact with customer servers over the customer network. Therefore these grid servers also have IP addresses on the customer network in addition to their IP addresses on the grid private network.
When a grid private network is used, it may be configured in a flat
network topology with all grid servers in the same subnet, or as a
series of subnets that can route to one another as required. For
example, it is common for a multi-site grid to use a separate subnet for
each site, with routes configured for each server that permit direct
communications between servers in different subnets (sites).
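As a purely illustrative example of the multi-site configuration, the grid private network might assign 10.1.1.0/24 to the Data Center site and 10.1.2.0/24 to a satellite site, with each server holding a route to the other site’s subnet. The addresses and interface below are invented for illustration only and do not come from this guide:

# Conceptual routing view on a Data Center grid server (hypothetical addresses)
ip route show
10.1.1.0/24 dev eth1  proto kernel  scope link  src 10.1.1.15    # local grid private subnet
10.1.2.0/24 via 10.1.1.1 dev eth1                                # route to the satellite site subnet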
Update Routes on Servers at Existing Sites
If the grid you are expanding uses a grid private network configured
with separate subnets for each site and the expansion adds a new satellite site, update the routing for existing servers at each site to enable
them to communicate with servers at the new site (subnet).
The deployment grid specification file for the expansion defines the
networking configuration for all new grid servers and updates the networking configuration as required for existing servers. When the
deployment grid specification file is provisioned, an updated SAID
package is created for the grid. When this SAID package is used to
install the new servers, the installation process correctly configures the
IP addresses and routing for the expansion servers. It also creates the
files required to update the routing on all existing grid servers.
Follow the procedure below to update routes on every existing grid
server before enabling services on expansion grid nodes at a new site.
NOTE If required, update routes on the grid private network for all servers at
existing sites before you start expansion servers at a new satellite
site.
You must reboot a server to update its routing. Because routing must
be updated on every server in the grid, this process disrupts grid
services (including client operations) unless you update one server at a
time and fail over primary Gateway Nodes before updating them.
Prerequisites and Required Materials
• Your expansion requires that you update the routing of existing grid servers (for example, the grid uses separate subnets for each site and the expansion adds a site to the grid).
• The CMN service must be available.
• You must have provisioned the grid for expansion, as described in “Provision the Grid and Create Server Activation Floppy Image” on page 29 or “Provision the Grid and Create a Server Activation USB Flash Drive” on page 32.
• Provisioning passphrase
• Passwords.txt file
• If you are updating the routing of Gateway Nodes that are part of a cluster (HAGC), you require the ssh password of all servers in the cluster.
Procedure
1. At the primary Admin Node server, access a command shell and
log in as root using the password listed in the Passwords.txt file.
2. Export the server configuration files created by provisioning:
a. Create a subdirectory for the configuration files. Enter:
mkdir -p /var/local/config
b. Export the configuration files. Enter:
get-server-config /var/local/config
c. When prompted, enter the provisioning passphrase.
The script creates subdirectories, one for each server in the
grid. Each subdirectory contains a configuration file for the
server. Server configuration files are created for all servers on
the grid, even if their routing has not changed.
d. Copy the configuration file of each existing server in the grid to
that server. Enter:
scp /var/local/config/<server_name>/bycast-server-config.xml <server_IP>:/etc/
where <server_name> is the hostname of the server and
<server_IP> is its grid IP address.
e. Log out of the command shell. Enter: exit
3. Run the apply-pending-changes script on each server in the grid:
a. If this is an active primary Gateway Node:
• In a Basic Gateway replication group, fail over to the secondary Gateway Node and redirect client applications to use the secondary.
• In a High Availability Gateway replication group, fail over to a standby primary Gateway Node. Clients that use NFS may need to remount their file shares after failover.
For the failover procedures, see the Maintenance Guide.
b. At the server, access a command shell and log in as root using
the password listed in the Passwords.txt file.
c. Run the script. Enter: apply-pending-changes
d. When prompted, reboot the server. Enter: reboot
e. Log out of the command shell. Enter: exit
f. Wait for the reboot to complete. Check that Server Manager reports that all services are Verified or Running on the server before proceeding.
g. For primary Gateway Nodes, fail back to the original
Gateway Node and redirect client applications as needed.
h. Repeat from step a on every server in the grid.
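Step 2d, repeated for each existing server whose routing changed, is one scp command per server. The server names and grid IP addresses below are hypothetical examples; take the real values from the SAID package:

# On the primary Admin Node, after get-server-config has populated /var/local/config
scp /var/local/config/dc1-cn1/bycast-server-config.xml 10.1.1.20:/etc/
scp /var/local/config/dc1-gn1/bycast-server-config.xml 10.1.1.21:/etc/
# ...one scp per existing server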
Prepare VMs and Physical Servers
Prepare the virtual machines or physical servers that will host the new grid nodes. At the end of this section, StorageGRID software services have not been enabled, and the virtual machines or servers have not yet joined the grid.
NOTE New installations of the StorageGRID 9.0 system are not supported on physical servers. Virtual machines must be used.
Prerequisites
• Preparation is complete. See “Prepare for Expansion” on page 21.
• The grid is provisioned. See “Provision the Grid” on page 29.
• Networking has been updated, if applicable. See “Update Networking at Existing Sites” on page 35.
Procedure
1. Perform one of the following:
• For virtual machines, see “Prepare Virtual Machines” on page 179.
• For physical servers, see “Prepare Expansion Physical Servers” on page 199.
The next step is to install grid software.
Install StorageGRID Software
Use the following procedure to install grid software on all expansion virtual machines and servers.
Prerequisites
• If installing on virtual machines, virtual machines have been prepared as described in “Prepare Virtual Machines” on page 179.
• If installing on physical servers, physical servers have been prepared as described in “Prepare Expansion Physical Servers” on page 199.
NOTE New installations of the StorageGRID 9.0 system are not supported on physical servers. Virtual machines must be used.
Procedure
1. Start GDU on the primary Admin Node. See “Start GDU” on
page 211.
2. In the Servers panel, select the expansion server where the software
is to be installed.
3. In the Server Info panel, confirm that the status displayed is Available.
(If necessary, select Update Status to refresh the display.)
4. In the Tasks panel, select Install Software, and then in the Actions
panel, select Start Task and press <Enter>.
The server status changes to busy.
Software installation is completed when the message “Finished ‘Install
Server’ task on <server>” appears in the Log Messages panel.
Installation times vary depending on the size of the database or
storage volumes being set up. It can take approximately three hours
on Control Nodes, 45 minutes on Admin Nodes, and up to
three hours on Storage Nodes which have storage installed. The
script completes in less than 10 minutes on other servers.
If the server is provisioned to host an LDR service and the provisioning hardware profile has not specified object store names,
installation detects the unallocated drives and formats the disks.
If the server hosts a Gateway Node, the server reboots automatically
as part of the installation process.
5. If the server has rebooted and this is the primary Admin Node,
continue the software installation:
a. Access a command shell and log in as root using the password
listed in the Passwords.txt file.
b. Start GDU. See “Start GDU” on page 211.
c. In the Servers panel, select the primary Admin Node and
confirm that its current state is Available.
d. In the Tasks panel, select Continue Install, and then in the
Actions panel, select Start Task and press <Enter>. Wait for the
task to complete.
WARNING Unless directed otherwise, never start grid software
(Enable Services in GDU) on an expansion server until
the expansion grid tasks have been run and all
required customization steps have been performed.
Starting grid software immediately after software
installation can lead to unrecoverable data loss. 

The exception to the above rule occurs if you are
adding Control Nodes to the grid. Do not wait. When
the Grid Expansion task (GEXP) transitions to the
Checking CMS Online stage, continue with the next
step in the Control Node expansion procedure.
6. If necessary, install drivers for the physical servers. See “Install
Drivers” on page 202.
The current service pack level of StorageGRID software is installed as
part of the software installation process on the expansion grid node.
Add, Customize, and Start Grid Nodes
Before you begin this section, ensure that you complete all of the steps
in the section “Install StorageGRID Software” on page 39.
If the expansion adds a satellite site, perform all of the following tasks at the Data Center site before you travel to the satellite site, and repeat the procedures at the satellite site:
• Prepare all of the virtual machines or physical servers. See “Prepare Virtual Machines” on page 179 or “Prepare Expansion Physical Servers” on page 199.
• Install StorageGRID software. See “Install StorageGRID Software” on page 39.
• Perform all of the steps in “Add, Customize, and Start Grid Nodes”.
NOTE New installations of the Release 9.0 StorageGRID system are not supported on physical servers. Virtual machines must be used.
Overview: Add, Customize, and Start
Table 3: Add, Customize, and Start Expansion Grid Nodes

Step 1 (All): Run the initial expansion grid task to prepare the grid for the expansion grid servers. See page 44.
Step 2 (All): Determine the order in which to add each expansion grid server. See page 46.
If you are converting to an HCAC, complete the conversion process before adding other grid nodes. Adding a second HCAC can be done at any time. For more information on converting to an HCAC, see Chapter 3: “Convert to a High Capacity Admin Cluster”.
Step 3 (All): Run the Grid Expansion: Add Server grid tasks for expansion grid nodes. See page 47.
Step 4 (Control Nodes (CMS)): Clone the CMS database if required. See page 49.
NOTE Cloning a CMS database can take over four hours, depending on content. You are advised to perform this operation at a time that is least disruptive to normal customer operations.
Step 5 (Control Node (CMS)): Set CMS ILM evaluation deferred time, if required. See page 57.
Step 6 (All): Start grid software (using Enable Services in GDU) on the grid nodes in the order described in “Expansion Order” on page 46. Read that section before enabling services. See page 57.
Step 7 (All): Apply hotfixes and maintenance releases. See page 61.
Step 8 (Storage Node (LDR)): Ensure the new Storage Nodes are included in the active ILM’s storage pools. See page 61.
Step 9 (Gateway Node (FSG)): If you added a secondary Gateway Node to an existing FSG replication group, monitor the “restoration” of its file system. See page 61.
Step 10 (Storage Node (LDR)): Verify that the new Storage Node is active. See page 62.
Step 11 (Control Node (CMS)): Verify the operation of the new Control Nodes. See page 63.
Step 12 (Admin Node (NMS)): If you are adding a second Admin Node or HCAC to the grid, customize the Admin Node (processing Admin Node in an HCAC) now. See page 64.
If you are converting an Admin Node to be an HCAC, do not customize it now: wait to clone the NMS database, copy the audit log, update IP addresses, and enable DNS when directed to in Chapter 3: “Convert to a High Capacity Admin Cluster”.
Step 13 (Gateway Nodes (FSG)): If you added a new FSG replication group, complete these integration and customization tasks (see page 66):
• Create and verify a backup of the new replication group.
• Customize Gateway Node behavior as required.
• Update the configurable ILM if required.
• If this is a new FSG replication group, create file shares to integrate clients.
Step 14 (Gateway Nodes (FSG)): If you added a secondary Gateway Node to an existing FSG replication group, update its file shares integration to match the other FSGs in its replication group. See page 69.
Step 15 (Gateway Node (FSG)): Verify operation of the new Gateway Node replication group. See page 68.
Step 16 (Gateway Node (FSG)): Copy the FSG cache. See page 72.
Step 17 (Archive Node (ARC)): Complete the setup of a TSM Archive Node. Configure ARC to communicate with the TSM middleware. See page 73.
Step 18 (Archive Node (ARC)): Update storage pools to use the new Archive Node, and update the ILM policy. See page 73.
Step 19 (Audit Node (AMS), Admin Node (AMS)): Optionally, copy existing audit logs to the expansion Admin Node or Audit Node. See page 73.
Step 20 (Audit Node (AMS), Admin Node (AMS)): If the Audit Option is included in the deployment, enable the audit share. See the “File Share Configuration” chapter of the Administrator Guide.
Step 21: After completing the steps in Table 3 for all virtual machines or servers at a site, repeat all steps except running the initial expansion grid task (page 44) at each additional site with expansion grid nodes.
Run the Initial Expansion Grid Task
Running the Initial expansion grid task (Grid Expansion: Initial) adds
configuration information to the grid for all expansion grid nodes at
all sites. The grid task is run once per expansion.
Completing the installation of grid software on all expansion grid
nodes before running the initial grid task helps verify that the grid
specification file is correct. Recovering from an error that requires you
to reprovision the grid is much simpler if the Grid Expansion: Initial grid
task has not yet been run.
Prerequisites
• Grid software has been installed on all expansion grid nodes.
  It is better to wait until the installation of grid software is complete on all expansion grid nodes, but it is safe to run the grid task while you are waiting for the installation to complete on expansion Control Nodes, where database initialization may take hours to complete.
• If installation failed on any server because of a provisioning error, ensure that you have:
  • Deleted the grid tasks created by provisioning (Grid Expansion: Initial and all Grid Expansion: Add Server grid tasks)
  • Corrected the provisioning issue and have a corrected grid specification file for the expansion, if required
  • Reprovisioned the grid (including removing the failed revision), as described in “Errors in Grid Specification File” on page 242
  • Reinstalled all expansion servers (because reprovisioning the grid regenerates node IDs and certificates for all expansion servers)
• Only the following grid tasks are running:
  • LDR Content Rebalancing grid task (LBAL)
  • Software Upgrade grid task (SWUP)
  • If you are not adding a CMS, LDR Foreground Verification (VFGV or LFGV)
  If any other grid tasks are running, wait for them to complete, release their lock, or abort them as appropriate. For more information on grid tasks and resource locking, see the “Grid Tasks” chapter in the Administrator Guide. For more information on expansion during an upgrade, see the Upgrade Guide.
• Control Node servers are accessible.
  To ensure that information about expansion grid nodes is propagated throughout the grid, do not run expansion grid tasks while any Control Node servers are inaccessible (for example, due to network problems).
Procedure
1. In the NMS MI, go to CMN > Grid Tasks > Configuration > Main.
Figure 1: Grid Tasks
2. Under Pending, adjacent to Grid Expansion: Initial, under Actions,
select Start.
3. Click Apply Changes.
The grid task moves from the Pending table to the Active table. You
must wait for the NMS MI page to auto-refresh before the change
is visible. Do not submit the change again.
The grid task completes quickly.
NOTE While active, the progress of the grid task can be charted via CMN > Grid Tasks > Reports > Charts.
The grid task continues to execute until it completes or fails. When
the grid task completes successfully, it moves to the Historical table
with a Status of Successful. If the grid task fails, it moves to the Historical table with a description of the error under Status.
NOTE Grid tasks that are Paused can be resumed. If you Abort a grid task, you cannot restart it while it is in the Historical table. For more information and instructions on Grid Tasks, see the Administrator Guide.
If the grid task fails, contact Support.
NOTE Wait until the grid task completes successfully before continuing.
Expansion Order
When adding more than one type of grid node in a single expansion,
determine the expansion order for grid nodes. Then run the Add Server
expansion grid task (Grid Expansion: Add Server) for each grid node,
perform any required customization, and start grid software (using
enable services in GDU) before moving on to the next one. As soon as
the Grid Expansion: Add Server grid task completes, the grid becomes
aware of the expansion grid node and may begin queuing messages for
it.
A quorum (50% + 1) of the total number of ADCs that are installed and have at some point been connected to the grid must be online and available in order for the grid to perform certain activities (such as running grid tasks). For example, a grid with four installed ADCs requires at least three of them to be online. It is good practice to run the Grid Expansion: Add Server grid task for a grid node, perform any required customization, and then start the grid node immediately, before running the Add Server grid task for the next grid node.
In the following cases, it is recommended that you run the Grid Expansion: Add Server grid tasks for several nodes, one after another, and then start grid software (enable services) on all of those grid nodes:
• It is recommended that you run the Grid Expansion: Add Server grid task for all members of a Gateway Node replication group, and then enable services on these Gateway Nodes one after another. For more information, see “Start Grid Software (Enable Services)” on page 57.
• It is recommended that if the grid includes CMSs with synchronized databases, you run the Grid Expansion: Add Server grid task for all CMSs in the expansion and then immediately enable services on each one. (Check the NMS MI. If a CMS includes the Metadata component, the grid uses metadata replication. If it does not, the grid uses metadata synchronization.)
NOTE Metadata synchronization is deprecated.
It is recommended that you run the Add Server grid task, customize,
and then start grid software (enable services) on grid nodes in the following order:
NOTE If you are converting to an HCAC, you must complete the conversion
before adding other grid nodes. For more information, see “Convert to
a High Capacity Admin Cluster” on page 79.
1. Admin Nodes
2. Storage Nodes (and Archive Nodes), and Control/Storage Nodes
3. Control Nodes
4. Gateway Nodes
5. Audit Nodes
Starting grid software on grid nodes in this order (Admin Nodes,
Storage Nodes, Control Nodes, Gateway Nodes, Audit Nodes) helps
prevent queue buildups, particularly in a larger expansion where the
length of time to complete the process may be longer.
Run the Add Server Grid Task
Before you customize or start an expansion grid node, you must run
the Add Server expansion grid task (Grid Expansion: Add Server) for the
grid node. This prepares the grid for the new grid node.
Prerequisites
• Grid Expansion: Initial grid task has been run
• You have read the section “Expansion Order” and determined which grid task(s) to run first
• Only the following grid tasks are running:
  • LDR Content Rebalancing grid task (LBAL)
  • Software Upgrade grid task (SWUP)
  • If you are not adding a CMS, LDR Foreground Verification (VFGV or LFGV)
  If any other grid tasks are running, wait for them to complete, release their lock, or abort them as appropriate. For more information on grid tasks and resource locking, see the “Grid Tasks” chapter in the Administrator Guide. For more information on expansion during an upgrade, see the Upgrade Guide.
• Control Node servers are accessible.
  To ensure that information about expansion grid nodes is propagated throughout the grid, do not run expansion grid tasks while any Control Node servers are inaccessible (for example, due to network problems).
• It is a good practice to ensure that the grid is healthy, and that no secondary Gateway Node is acting as a primary, before you run the Add Server expansion grid task, but it is not required.
Procedure
NOTE If the expansion requires more than one grid task, allow each grid task
to complete before you start the next grid task.
1. Go to CMN > Grid Tasks > Configuration > Main.
2. Adjacent to the Pending task that you want to start, under Actions,
select Start.
The grid task appears with the description: Grid Expansion: Add
Server <hostname>.
3. Click Apply Changes.
The grid task moves from the Pending table to the Active table.
For all grid node types except Control Nodes, the grid task continues to execute until it completes or fails.
In the case of a Control Node expansion, the Grid Expansion task
(GEXP) transitions to Checking CMS Online stage and remains at this
stage. Do not wait for it to complete, as it will not complete on its
own. When the Grid Expansion task (GEXP) transitions to the
Checking CMS Online stage, continue with the next step in the
Control Node expansion procedure.
When a grid task completes (or fails), it moves to the Historical table
with a Status description.
NOTE You can resume grid tasks that are Paused. If you Abort a grid task, you cannot restart it while it is in the Historical table. For more information and instructions on Grid Tasks, see the Administrator Guide.
If the grid task fails, contact Support. You must abort the expansion
process, and issue a new expansion request to a NetApp Solutions
Engineer.
NOTE Unless directed otherwise, before you start grid software on an
expansion grid node, wait at least two minutes after the grid task
completes successfully. This allows authentication certificates to
propagate throughout the grid. 

When adding Control Nodes do not wait. Continue with the
expansion process.
Clone the CMS Database If Required
WARNING Read the “Clone CMS database” procedure from start
to finish before you perform any of the procedures it
describes. Never attempt to start a CMS before you
complete all relevant procedures. 

A CMS started with an inconsistent database that has
not been successfully configured may corrupt other
CMS services in the grid and result in undefined grid
behavior.
Prerequisites
• Grid software installation is complete on all expansion Control Nodes.
• The Grid Expansion: Add Server grid task for each expansion Control Node has been run.
Procedure
Most grid expansions that add Control Nodes do so to increase the
metadata capacity of the grid. When added to increase capacity,
Control Nodes are added with empty CMS databases. Control Nodes
may also be added with empty CMS databases if you are adding a satellite site that permits islanded operation, in a grid where the CMSs
use metadata replication. In these cases, skip this section and go on to
“Set CMS ILM Evaluation Deferred Time” on page 57.
If you are adding Control Nodes to increase metadata redundancy and
data availability, follow the procedure in this section to clone the CMS
database.
When you add Control Nodes to an existing grid and you need to
clone the CMS database from an existing Control Node to the new
Control Nodes, you must temporarily shut down a functioning CMS
as part of the cloning procedure. The StorageGRID system has a
minimum of two CMSs. Both must be operational to perform this
process. If one is unavailable, correct the problem before you add more
servers.
While the database is being cloned, the grid can continue normal operations as long as at least one read-write CMS remains active.
NOTE When the grid operates with a single CMS, alarms are triggered and displayed in the NMS MI.
1. Identify the source CMS. See “Identify the Source CMS” on
page 51.
2. Shut down the source CMS to allow capture of its database. See
“Shut Down the Source CMS” on page 52.
3. Clone the source database to all target CMSs. See “Clone the CMS
Database” on page 53.
4. Restart the source CMS. See “Restart the Source CMS” on page 55.
5. Customize the target CMS database. See “Customize the CMS
Databases” on page 56.
NOTE Cloning a CMS database can take over four hours, depending on
content. The entire expansion process can take up to eight hours.
You are advised to perform this operation at a time that is least disruptive to normal customer operations.
Terminology
New CMSs added to increase redundancy require a clone of the
database in use by other CMS services within the grid. This process
makes reference to:
• The “target” CMS, which is the new CMS on the expansion server.
• The “source” CMS, which is a functional CMS currently serving the grid. This service is used as the source of the CMS database to be cloned to the target CMS(s).
Remote Access
If any of the servers that you need to access are at a different location,
use the procedure described in “Accessing a Server Remotely” on
page 249 to access a command shell on these servers and eliminate
travel between locations.
Identify the Source CMS
CMSs added to increase redundancy must be cloned from one of the
existing CMSs in the grid. The CMS you use as the source to clone the
expansion CMS database depends on the type of CMS and the grid
history:
• Metadata Replication
In a grid that uses metadata replication, you normally add CMSs with empty databases. However, if you add a CMS to an existing CMS replication group, you need to clone the database. Use one of the existing CMSs in that replication group as the source for cloning.
The names of the CMSs in a CMS replication group display in CMS > Tasks, under Recovery Sources.
Figure 2: Identifying Source CMS When Using Metadata Replication
• Metadata synchronization, no grid expansion
If the grid has not been expanded since it was originally deployed,
every CMS database in the grid is equivalent. You can select any
functioning CMS in the grid as the source.
NOTE Metadata synchronization is deprecated.
• Metadata synchronization, after grid expansion
If the grid was expanded previously but the expansion did not
include CMSs, select any functioning CMS as the source.
If the previous grid expansion added CMSs with cloned CMS databases, all CMSs remain equivalent. Select any functioning CMS as
the source.
If the previous grid expansion added CMSs that increased CMS
metadata capacity, all CMSs are not equivalent. Select the CMS
from the correct generation that has the most metadata as the
source. For more information on selecting the CMS with the most
metadata, see the Maintenance Guide.
Shut Down the Source CMS
You must shut down the source CMS service to clone its database.
The CMS selected as the source is unavailable to the grid while you
clone the database. The grid can continue to function during this
process, using the remaining read-write CMSs on the grid. However, if
only one read-write Control Node remains active, consider shutting
down client gateways while cloning proceeds. You are at increased
risk of data loss while operating with only one Control Node.
NOTE When the grid operates with a single CMS, alarms are triggered and displayed in the NMS MI. This is normal.
1. In the NMS MI, check the following attributes:
• If the CMS uses metadata replication (has a CMS > Metadata component), check CMS > Content > Active Replications.
• If the CMS uses metadata synchronization, check CMS > Synchronization > Incoming Messages.
Ensure that the value of this attribute is not increasing.
NOTE Metadata synchronization is deprecated.
2. If the source Control Node is hosted on the same server as other
grid nodes (that is, the Control Node is part of a combined node
such as a Control/Storage Node):
a. Try to stop the MySQL service. Log in to the command line
interface of the source Control Node, and enter:
/etc/init.d/mysql stop
b. The output from this command tells you which services you
need to stop on this server. The example below, from a server
that hosts both a Control Node and an Admin Node, shows
that both the CMS and MI services need to be stopped.
hostname:~# /etc/init.d/mysql stop
mysql is unable to stop because the following
services are running:
cms
mi
Please manually stop these services before stopping
mysql.
c. Stop the indicated services. Enter:
/etc/init.d/cms stop
/etc/init.d/mi stop
d. Stop the MySQL service. Enter: /etc/init.d/mysql stop
Stopping only the individual services that must be stopped on the source Control Node permits other parts of the combined grid node (such as the Storage Node) to continue to operate normally while the cloning is in progress.
3. If the source Control Node is not combined with any other grid
nodes:
a. Use Server Manager to execute the Stop All command. If
prompted, enter the Server Console’s password listed in the
Passwords.txt file.
If using ssh, enter: /etc/init.d/servermanager stop
b. Wait for Server Manager to report that all services, including the CMS and the MySQL database services, have stopped.
If using ssh, enter: /usr/local/bin/storagegrid-status
c. Press <Ctrl>+<C> to exit.
Clone the CMS Database
After you shut down the source CMS, clone its CMS database to the
target expansion CMS.
1. At the server with the target CMS database, access a command
shell and log in as root using the password listed in the Passwords.txt file.
2. Change directories to access the cloning script. Enter:
cd /usr/local/cms/tools
3. Run the cloning script. Enter: ./cms-clone-db.sh <IP_source>
where <IP_source> is the IP address of the server with the source
CMS. The IP address is listed in the index.html file in the Doc directory of the SAID package.
WARNING Enter the correct information for the source
Control Node to prevent a cloning error.
4. When prompted, enter the password for the source server listed in
the Passwords.txt file.
Track the cloning process using the displayed progress bar and
estimated time of completion. Cloning ends with the message
done.
If the script aborts with the message tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now, see the “Troubleshoot CMS
Cloning” section of the Maintenance Guide for instructions on
recovering from this error.
If the script aborts with a different message, contact Support. After
correction, check that no part of the process is active on the source
CMS, and restart the script.
5. If you are adding more than one CMS to the grid, switch to a
command shell on the next target server, and start the cloning
operation on each additional CMS in turn.
You can clone the source CMS to more than one target CMS simultaneously. The cloning process can take up to four hours to
complete for a single CMS. You can realize time savings if you
clone multiple CMSs in parallel. For instance, cloning the CMS
database on two CMSs in parallel should take less than six hours.
WARNING Once database cloning begins, the process must not
be interrupted. As soon as the first file copies from the
source CMS to the target CMS, the target CMS
database is corrupt until the file-copying process 
completes.

If the cloning script terminates prematurely for any
reason, ensure no part of the process is active on the
source CMS (using pgrep cms-clone), then restart the
script on the target CMS.
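Taken together, steps 1 through 4 on one target Control Node look approximately like the following; the source IP address is a hypothetical example:

# On the target (expansion) Control Node
cd /usr/local/cms/tools
./cms-clone-db.sh 10.1.1.30        # grid IP address of the server hosting the source CMS
# Enter the source server's password when prompted; cloning ends with the message "done"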
Restart the Source CMS
WARNING Let the cloning process complete on all target CMSs
before you proceed.
After you clone the CMS database, you can restart the source CMS.
1. Access a remote secure shell to the source CMS server.
2. Confirm that no cloning processes are running. Enter:
pgrep cms-clone
cn2-a-1# pgrep cms-clone
cn2-a-1#
The command should not display any output. If any output
appears, ensure all expansion CMSs have reported done from the
cloning script. If necessary, contact Support.
NOTE Before you restart the source CMS and/or MySQL, make absolutely sure no cloning process is running. If you cancelled a
database cloning script, or if the script was interrupted on a target
CMS, the script could still be running on the source CMS and
must be killed before MySQL (and therefore the CMS) can be
restarted.
3. If the source Control Node is hosted on the same server as other
grid nodes, restart the services that you stopped in “Shut Down
the Source CMS” on page 52. For example, for a server that hosts
both an Admin Node and a Control Node, enter:
/etc/init.d/mysql start
/etc/init.d/cms start
/etc/init.d/mi start
The MySQL service must be started before the CMS or MI services.
The MI service is only present on servers that also host an
Admin Node.
4. If the source Control Node is not combined with any other grid
nodes, enter: /etc/init.d/servermanager start
5. Wait for Server Manager to report that all services, including the CMS and the MySQL database services, have a status of either Verified or Running. Enter: storagegrid-status
Press <Ctrl>+<C> to exit.
6. Close the remote session. Enter: exit
7. In the NMS MI, go to <source_Control Node> > CMS > Overview > Main.
8. Verify that the source CMS is running normally:
CMS State = Online, and CMS Status = No Errors.
If there are large queues on ILM evaluation (see the Metadata with Unachievable ILM Evaluations attribute at CMS > Metadata > Overview > Main), the source CMS may take longer to start up. Wait until the source CMS has started before going on to “Customize the CMS Databases” below.
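For a source Control Node that shares a server with an Admin Node, steps 2 through 5 can be summarized as the following sketch:

pgrep cms-clone                    # must produce no output before anything is restarted
/etc/init.d/mysql start            # MySQL must start before the CMS or MI services
/etc/init.d/cms start
/etc/init.d/mi start               # MI is present only if the server also hosts an Admin Node
storagegrid-status                 # wait for Verified or Running, then press <Ctrl>+<C> to exit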
Customize the CMS Databases
The cloned database on the new CMS services must be customized.
This stage requires the MySQL support service.
1. Start MySQL on one of the new target CMSs. Enter:
/etc/init.d/mysql start
2. Change directories to access the scripts. Enter:
cd /usr/local/cms/tools
3. Run the script to customize the cloned CMS database. Enter:
./customize-cloned-db.sh
This script has no output. The command prompt appears when the
script is done.
NOTE If the script fails, it cannot be run again. Contact Support.
4. Run the script to update the CMS group configuration. Enter:
./update-config-group-expansion.sh <nodeID>
where <nodeID> is the node ID of the new CMS listed in the
index.html file of the SAID package.
This script has no output. The command prompt appears when the
script is done.
NOTE If the script fails, it cannot be run again. Contact Support.
5. Stop MySQL on the new CMS. Enter: /etc/init.d/mysql stop
6. Repeat step 1 to step 5 for all new CMSs.
7. Run the script update-config-group-expansion.sh on all pre-existing
CMSs in the grid:
a. Enter: ssh <IP_CN>
where <IP_CN> is the IP address of a server hosting a pre-existing CMS.
b. Change directories to access the scripts. Enter:
cd /usr/local/cms/tools
c. Run the script. Enter:
./update-config-group-expansion.sh <nodeID>
where <nodeID> is the node ID of a new CMS.
d. Repeat step c for each new CMS added.
e. Restart the pre-existing CMS service. Enter:
/etc/init.d/cms restart
f. Close the remote session. Enter: exit
g. Repeat for every pre-existing CMS in the grid.
It is now safe to enable services to start the grid software on the
expansion Control Nodes.
This is the end of the CMS database cloning procedure.
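Applied to one new CMS and one pre-existing CMS, the customization steps above reduce to the following sketch; the node ID and address placeholders stand in for values from the SAID package:

# On each new (target) CMS
/etc/init.d/mysql start
cd /usr/local/cms/tools
./customize-cloned-db.sh
./update-config-group-expansion.sh <nodeID>     # node ID of this new CMS
/etc/init.d/mysql stop

# On each pre-existing CMS
ssh <IP_CN>                                     # server hosting the pre-existing CMS
cd /usr/local/cms/tools
./update-config-group-expansion.sh <nodeID>     # repeat once for each new CMS node ID
/etc/init.d/cms restart
exit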
Set CMS ILM Evaluation Deferred Time
Configure the CMS ILM Evaluation Deferred Time for each expansion
Control Node, if required. This setting is the time when the grid
performs a daily evaluation of objects ingested into the grid against
current business rules to determine if object locations should be
updated (for example, the object may be moved to tape after a set
time). This evaluation is computationally intense and can interfere
with normal grid activities. It is recommended to set the CMS ILM
Evaluation Deferred Time to a time when the grid is lightly used. The
default time is 21:00 UTC. For more information, see the Administrator
Guide.
Start Grid Software (Enable Services)
Prerequisites
• You have read the section “Expansion Order” on page 46 and are following its recommendations.
• You have run the Grid Expansion: Add Server grid task for grid node(s) you will enable services on.
• All server preparation steps have been completed as outlined in “Prepare Virtual Machines” on page 179 or “Prepare Expansion Physical Servers” on page 199.
  In particular, for grid nodes installed on virtual machines:
  • For vSphere 4.1 or later, VMware Tools are installed and the vmware_setup.py script has been run. See “Install VMware vSphere” on page 180.
• If the grid node is a Control Node:
  • You have configured the CMS ILM Evaluation Deferred Time. See “Set CMS ILM Evaluation Deferred Time” on page 57.
  • If you are adding the Control Node to increase redundancy, you have cloned the CMS database. See “Clone the CMS Database If Required” on page 49.
WARNING Do not start grid software on a grid node until you complete all necessary preparation steps for that type of grid node.
• You are not adding a Gateway Node as part of a conversion from a Basic Gateway replication group to one that includes an HAGC. Do not enable services on the Gateway Node. Go to Chapter 5: “Convert to a High Availability Gateway Cluster”.
If necessary, access a server remotely to avoid travel between sites. See
“Accessing a Server Remotely” on page 249.
Procedure
1. If the grid uses synchronized CMS databases and the
Control Nodes were added with an empty database:
a. Access a command shell on an expansion Control Node and
use a text editor to create a text file called recoveryinfo.txt that
lists all Control Nodes added in this expansion:
cd /usr/local/cms/cms/
vi recoveryinfo.txt
NOTE Metadata synchronization is deprecated.
WARNING Correct information about which Control Nodes were
added during each expansion is critical to restoring
failed synchronized CMSs. Record this information in
the /usr/local/cms/cms/recoveryinfo.txt reference file of
each CMS.
The following is an example of the information to include in the
recoveryinfo.txt file.
Original CMSs
These CMSs have equivalent databases. If one of the CMSs in this list fails,
the CMS in this list with the most metadata is used as the source CMS for a
cloning operation.
Names: DC-CN1, DC-CN2, DR-CN1, DR-CN2
Node IDs: 13693641, 13693642, 13693651, 13693652
CMSs Added with Empty DB in Expansion on <yyyy-mm-dd>
These CMSs have equivalent databases. If one of the CMSs in this list fails,
the CMS in this list with the most metadata is used as the source CMS for a
cloning operation.
Names: DC-CN3, DC-CN4, DR-CN3, DR-CN4
Node IDs: 13693643, 13693644, 13693653, 13693654
b. Copy this file to every other Control Node in the grid, including any other expansion Control Nodes (for example, using scp as shown below).
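A minimal sketch of the copy, assuming root SSH access between the grid servers; the IP addresses are placeholders for the grid IP address of each other Control Node:
# Minimal sketch only: distribute recoveryinfo.txt to the other Control Nodes.
for ip in <IP_other_CN_1> <IP_other_CN_2>; do
    scp -p /usr/local/cms/cms/recoveryinfo.txt ${ip}:/usr/local/cms/cms/
done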
2. Start GDU. See “Start GDU” on page 211.
3. Start the StorageGRID software on the server:
a. In the Servers panel, select the server where you want to enable
services.
b. In the Server Info panel, confirm that the status displayed is
Available.
c. In the Tasks panel, select Enable Services, and then in the
Actions panel, select Start Task and press <Enter>.
Wait for the message Finished ‘Postinstall start’ task to appear in
the Log Messages panel.
NOTE If this is a primary Admin Node, do not select the Load Configuration option.
4. Close GDU as described in “Close GDU” on page 216.
5. If this is a primary Admin Node (that is, you are installing a reporting Admin Node as part of the conversion to a High Capacity Admin Cluster (HCAC), or you are performing a hardware refresh on a primary Admin Node), disable the ability to load configuration bundles.
The grid automatically restores the latest version of the configuration bundles to the new Admin Node after it is started. If you
mistakenly load the configuration bundle, the grid reloads the
default configuration and overwrites all configuration changes
made via the NMS MI since the grid was first installed.
a. At the primary Admin Node server, access a command shell
and log in as root using the password listed in the Passwords.txt
file.
b. Update the install state of the Admin Node. Enter:
echo LOAD > /var/local/run/install.state
c. Log out of the command shell. Enter: exit
The postinstall.rb load command is disabled.
6. Use the service laptop to connect through the customer’s network
to the Network Management System service of the grid. Log in
using the “Vendor” account. For more information, see the “NMS
Connection Procedure” on page 246.
As each node joins the grid, you can monitor it via the NMS MI.
Alarms are normal until the services establish connectivity with
others in the grid. The alarms clear automatically as connections
are established.
NOTE TSM Archive Nodes show a major alarm (are orange) when they
are first started. This is normal. You must configure the TSM middleware and the Archive Node as described in “Complete the
Setup of TSM Archive Nodes” on page 73 before these alarms
clear.
7. Repeat the process of starting the grid software for any other
expansion grid nodes at the site.
For Gateway Nodes, when adding a replication group, start all
members of the replication group together: start the main primary
in an HAGC first, then the supplementary primary, and then any
secondary Gateway Nodes. Start all of the Gateway Nodes in a
replication group before proceeding.
If the new Gateway Node replication groups are not updated in
the grid topology tree, try refreshing the page and/or logging out
of the NMS MI and logging in again.
Apply Hotfixes and Maintenance Releases
After grid software is started, confirm that the expansion grid node is
running the same version of the StorageGRID software as other grid
nodes of the same type. If it is not, apply any necessary hotfixes or
maintenance releases to update the expansion grid node to the same
software version as the rest of the grid nodes of the same type.
Prerequisites
• Grid software has been started on the expansion grid node.
Procedure
1. Determine the current version of the StorageGRID software on an existing grid node of the same type:
a. Go to <grid_node_of_same_type>  SSM  Services  Main and under Packages note the storage-grid-release number.
2. Determine the version of the StorageGRID software on the expansion grid node:
a. Go to <expansion_grid_node>  SSM  Services  Main and under Packages note the storage-grid-release number.
3. Compare the two versions and, if they differ, apply hotfixes or
maintenance releases as necessary to update the expansion grid
node to the same software version as the rest of the grid nodes of
the same type. For more information on available hotfixes and
maintenance releases, contact Support.
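You can also compare versions from the command line. The following is a minimal sketch; it assumes that the storage-grid-release entry shown under Packages in the NMS MI corresponds to an installed package of the same name that can be queried with rpm:
# Run on an existing grid node of the same type and on the expansion grid
# node, then compare the two version strings.
rpm -q storage-grid-release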
Ensure Storage Nodes are Included in ILM Storage Pools
You must ensure that the expansion Storage Nodes are included in a
storage pool used by the active ILM policy or the new storage will not
be used by the grid. For more information, see the “Information Lifecycle Management” chapter in the Administrator Guide.
Monitor the Restoration of the Gateway Node File System
Prerequisites
• Grid software has been started on the expansion Gateway Node (via Enable Services in GDU).
• You are adding a Gateway Node to an existing replication group.
• You are adding an additional secondary Gateway Node to an existing FSG replication group.
• You added a replacement Gateway Node to an FSG replication group as part of a hardware refresh procedure.
Procedure
Whenever you add a Gateway Node to an existing FSG replication
group, you must synchronize its file system with the file system of the
other FSGs in the replication group before you proceed.
When you start the new secondary Gateway Node, the FSG service
performs a start-up scan and discovers that its file system must be
“restored”. The FSG proceeds to restore the file system automatically
from the last backup and subsequent replication sessions, which store
information about changes that have happened since the last backup.
This process may take a few hours to complete. You can use the NMS
MI to monitor the progress of the restoration.
1. In the NMS MI, go to the new FSG  Backup  Overview, and
monitor the Restore Result. Wait until the value is Successful.
NOTE Restoring the FSG file system from backup may take several
hours for a large file system.
2. Monitor the synchronization of the replication sessions:
a. Go to FSG  Replication  Overview, and verify that the Secondary Active Session ID is not zero.
b. Monitor FSG  Replication  Overview  Operations Not Applied.
Wait until the number stabilizes at a low value. The number
decreases as replication sessions are processed, but may not
reach zero if ingests into the replication group continue while
the restoration is in progress.
Verify that the Storage Node is Active
After you start grid software (via Enable Services in GDU) on the
Storage Node, the grid automatically starts using the new
Storage Node, unless you added the Storage Node as part of a
hardware refresh. In this case, Storage Nodes are read-only until you
complete the hardware refresh grid task. For more information, see
Chapter 6: “Hardware Refresh”.
Use the NMS MI to verify that the grid stores new content on the new
nodes.
Procedure
1. Go to LDR  Storage  Overview of a new node.
2. Click the immediate report button for the Objects Committed attribute to see a graph of data objects being stored on the node.
3. Verify the value of the Objects Committed attribute is increasing.
Verify Operation of the Expansion Control Nodes
Prerequisites
• Services have been enabled on the expansion Control Nodes.
Procedure
1. In the NMS MI, monitor the new Control Nodes in the grid
topology tree. It could take a few minutes for the grid to detect the
new Control Nodes. Eventually the alarms should clear, indicating
that the services are running normally.
2. Monitor the attributes Managed Objects and Stored Objects for each
new Control Node:
• If the grid uses metadata replication, the attributes are found at CMS  Metadata  Overview.
• If the grid uses metadata synchronization, the attributes are found at CMS  Content  Overview.
NOTE Metadata synchronization is deprecated.
If you cloned the database, the value of Stored Objects should be
approximately the same as for the source CMS used for database
cloning. The value of Managed Objects should start to increase as the
new CMSs start owning the metadata of new content ingested into
the grid.
3. If you added an additional Control Node to an existing metadata replication group, go to CMS  Tasks to monitor the metadata re-evaluation:
• On the new CMS, look at the progress of the grid task: expansion-<task_ID>::local-<new_CMS_node_ID>.
• On the other CMSs in the grid, look at the progress of the task: expansion-<task_ID>::remote-<new_CMS_node_ID>
The metadata re-evaluation is necessary when you add a CMS to
an existing group to determine which metadata is required on the
expansion Control Node.
These tasks complete in the background and do not interfere with
normal grid operation.
Admin Node Customization
Prerequisites
• Services have been enabled on the expansion Admin Node(s).
• You are adding a second Admin Node or second HCAC to the grid.
• Passwords.txt file
NOTE Do not customize IP addresses now if you are installing an
Admin Node as part of the process of converting a primary
Admin Node to an HCAC. Instead, follow the instructions in Chapter 3: “Convert to a High Capacity Admin Cluster”.
Customizations
When you add Admin Nodes, the Admin Node configured as the preferred sender continues to send notifications after you add the
expansion Admin Node.
If the IP address allocated to the Admin Node on the customer
network changed since you planned the grid expansion (and generated the SAID package), update the IP address of the Admin Node
using the procedure outlined in the “Network Configuration” chapter
of the Administrator Guide.
If Domain Name Service (DNS) has been configured on the existing
Admin Nodes in the grid, configure DNS for the expansion
Admin Nodes. Follow the instructions in the “Network Configuration” chapter of the Administrator Guide.
When you add a second Admin Node or HCAC, you may optionally
update your browser bookmark with the IP address of the new
Admin Node. This provides web-based grid access via the new NMS
MI.
Copy NMS Database
The activation process creates a database for the NMS service on the
new Admin Node, and the new node starts recording attribute and
audit information as if the installation were new. The new NMS
service records attribute information for any services that exist at the
time the NMS is first started. The new NMS also records attribute
information for servers and services added to the grid after the
Admin Node is first started.
When adding Admin Nodes (including those that are part of an
HCAC) to the grid or performing a hardware refresh, you can optionally copy the existing NMS database from the original Admin Node to
the expansion Admin Nodes (the processing Admin Node in an
HCAC). For more information on HCACs, see the Administrator Guide.
Procedure
1. Stop the MI service on both the original Admin Node and the
expansion Admin Node.
If you are adding a second HCAC or converting a consolidated
Admin Node to an HCAC, the expansion Admin Node in this case
is the processing Admin Node.
a. For both Admin Nodes, access the command shell and log in
using the password listed in the Passwords.txt file.
b. Stop the MI service. Enter: /etc/init.d/mi stop
2. On the expansion Admin Node (processing Admin Node in an
HCAC):
a. Enter:
/usr/local/mi/bin/mi-clone-db.sh <IP_of_source_Admin_Node>
where <IP_of_source_Admin_Node> is the IP address of the
source Admin Node from which the NMS database is copied.
b. When prompted, confirm that you want to overwrite the MI
database on the expansion Admin Node.
c. When prompted, enter the password for the Admin Node, as
found in the Passwords.txt file.
NOTE Copying the NMS database may take several hours.
The NMS database and its historical data is copied to the expansion Admin Node. When it is done, the script starts the expansion Admin Node.
3. Restart Server Manager on both the original Admin Node and the
expansion Admin Node. Enter: /etc/init.d/servermanager
restart
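As a recap, the command-line sequence for copying the NMS database looks like the following minimal sketch, where <IP_of_source_Admin_Node> is a placeholder:
# On both the original Admin Node and the expansion (processing) Admin Node:
/etc/init.d/mi stop
# On the expansion Admin Node only; confirm the overwrite and supply the
# password from the Passwords.txt file when prompted:
/usr/local/mi/bin/mi-clone-db.sh <IP_of_source_Admin_Node>
# On both Admin Nodes, after the copy completes:
/etc/init.d/servermanager restart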
Customize a New Gateway Node Replication Group
Prerequisites
• You are adding a new Gateway Node replication group to the grid.
• Services have been enabled on all Gateway Nodes in the expansion FSG replication group.
Back up Each New Replication Group
If the expansion adds a new replication group or replication groups, use
the procedure in this section to manually back up the replication group
for the first time. This creates the initial conditions for future maintenance
of the grid.
Backups are scheduled to occur automatically when the grid is running.
This step ensures that you can recover from any problems that might arise
prior to the first scheduled backup of a new replication group, using the
backup created by this process.
Procedure
1. Find the FSG designated to perform the backup.
a. In the NMS MI, go to Grid Management  FSG Management 
<Replication_Group>  Overview  Main.
b. Click the link for Backup FSG to go to FSG  Overview  Main.
2. Ensure the backup FSG is running normally:
a. Verify the FSG State attribute reports Online.
b. Verify the FSG Status attribute reports No Errors.
c. Go to FSG  Backup  Overview  Main.
d. Verify that Current Status is Idle.
e. Verify the Backup Schedule: Next Scheduled Backup is not imminent.
3. Initiate the manual backup:
a. Go to FSG  Backup  Configuration.
b. Select Force Manual Backup.
c. Click Apply Changes.
The FSG now performs the backup, which stores a file into the grid.
4. If there is more than one replication group in the expansion, repeat
this procedure once for each replication group.
Verify the Gateway Node Backup
1. In the NMS MI, find the server hosting the backup FSG and go to
FSG  Backup.
2. Check the Current Status attribute of the Current Backup group on
this page. If that reports Active, you must wait for the backup to
complete.
3. Verify:
• The attribute Successful Backups reports 1. (If the automatic backup process began while you were completing grid integration, the value may be greater than one.)
• In the Previous Backup section, the Backup Result reports Successful.
• The Start Time and End Time report a reasonably current time indicating the manual backup.
If the backup fails, repeat the process to force a backup and revalidate.
4. If there is more than one expansion replication group in the
deployment, repeat this procedure for each group. Verify that the
number of backups continues to increment.
5. Log out of the NMS MI. Click Logout.
The expansion Gateway Nodes are now ready for use. The CLB
services on the Gateway Nodes do not require any customization
or integration. They obtain all settings from the grid through the
ADC services.
After you make and confirm the initial backup of a new replication
group, customize the Gateway Nodes as required.
Customize Gateway Nodes
Before integrating new client accounts with the expansion
Gateway Nodes, you must perform any required customization. Customization may include:
• Changing the IP address of the Gateway Nodes, including the virtual IP address of the HAGC.
• Customizing Gateway Node behavior. Some possible customizations include:
  • Preventing file deletion
  • Enforcing a file protection period
  • Preloading files
  • Setting caching priority
  • Enabling file recovery
NOTE File recovery is deprecated.
For information on how to customize, see the Administrator Guide.
Update Configurable ILM
If your configurable ILM uses FSG ingest path or other FSG metadata,
update your ILM policy to handle objects saved via the new
Gateway Node replication group. For information about updating a
configurable ILM, see the Administrator Guide.
Integrate Clients
Next, integrate new CIFS and/or NFS clients with the expansion
Gateway Node replication group by creating file shares that they can
use to access the grid. See the “File Share Configuration” chapter of the
Administrator Guide.
Verify the New Gateway Node Replication Group
After you add CIFS and/or NFS clients to the new FSG services, verify
that clients can access the shares. See the “Verify Client and Grid Integration” chapter of the Installation Guide.
Ensure that you perform the test that verifies that clients can save
objects to the primary, and retrieve them from another FSG in the same
replication group. This test verifies that you correctly completed all
steps in the Gateway Node expansion procedure.
WARNING Always verify the expansion by performing the Integration Verification tests outlined in the Installation
Guide.
Update File Share Integration
Whenever you add a new Gateway Node to an existing FSG replication group, you must update the file share configuration on the new
Gateway Node to synchronize it with file shares on existing FSGs in
the same replication group. Check grid specific documentation for
information on whether the grid has CIFS file shares, NFS file shares,
or both configured for the replication group.
Prerequisites
• You added a secondary Gateway Node to an existing FSG replication group, and the restoration of the file system on the new Gateway Node is complete. See “Monitor the Restoration of the Gateway Node File System” on page 61.
—or—
• You are converting a Basic Gateway replication group to be a High Availability Gateway replication group, and you added a new Gateway Node to the grid to be the supplementary primary, and you completed the restoration of the file system on the new Gateway Node. For the procedure, see Chapter 5: “Convert to a High Availability Gateway Cluster”.
• You are performing a hardware refresh on a primary or secondary Gateway Node. For the procedure, see Chapter 6: “Hardware Refresh”.
NOTE If the server hosts the Gateway Node and an Admin Node (such
as an Admin/Gateway Node), you must manually create the audit
share if one is required.
Procedure
1. For CIFS file shares:
a. At the command line of the expansion Gateway Node, enter:
config_cifs.rb
b. Copy the file shares to the expansion Gateway Node. Enter:
pull-config
A list of the other FSG services in the replication group is
displayed.
c. Select the active primary in the current replication group, and
enter its number from the list. Enter: <number>
d. When you are prompted Make all shares read-only? [Yes/No]:

   Type of Expansion                        Conditions                                        Enter:
   Add or Refresh Secondary Gateway Node    If the replication group supports business        No
                                            continuity failover, the shares must be
                                            copied read-write.
   Add or Refresh Secondary Gateway Node    If the replication group does not support         Yes
                                            business continuity failover, the shares
                                            must be copied read-only.
   Add Primary Gateway Node during          Shares must be copied read-write.                 No
   HAGC conversion
   Refresh Primary Gateway Node
e. When you are asked Sync authentication information?, enter: yes
Authentication information on the expansion Gateway Node is
synchronized with the primary.
f. The script asks if you want to synchronize a custom configuration to the target server. (A custom configuration is created by hand-editing the cifs-custom-config.inc Samba configuration file. Custom configurations may be used to integrate with customer environments that have unusual requirements.)
Enter: No, unless the Samba configuration has been hand-edited for this replication group to meet particular customer needs, or the file share supports the use of ACLs.
g. The configuration utility must establish a secure shell connection with the active primary Gateway Node to pull a copy of
the shares. Therefore the following message may appear: The
authenticity of host 'hostname (xx.xx.xx.xx)' cannot be established.
The RSA key fingerprint is
d9:2b:ea:86:fd:90:3d:a1:f7:e9:d5:f6:d5:30:75:38.
Are you sure you want to continue connecting (yes/no)?
Enter: yes
The script responds: Warning: Permanently added 'xx.xx.xx.xx'
(RSA) to the list of known hosts.
h. Next, you are prompted for the password of the remote
Gateway Node. Enter the password, as found in the Passwords.txt file. Enter: <password>
The script responds Configuration updated when it is done.
When you pull the configuration, the NetBIOS name is set to
the hostname of the target Gateway Node. To customize the
NetBIOS name, see the instructions in the Administrator Guide.
i. If the Gateway Node is cohosted on a server that also hosts an Admin Node, you must manually create an audit share, if required, using the instructions in the Administrator Guide. The pull-config and push-config options do not copy the audit file share.
j. If the file share uses Windows Active Directory authentication, join the domain from this Gateway Node. For more information, see the Administrator Guide.
k. Quit the utility. Enter: exit
2. For NFS file shares:
a. At the command line of the expansion Gateway Node, enter:
config_nfs.rb
b. Copy the file shares to the expansion Gateway Node. Enter:
pull-config
A list of the other FSG services in the replication group is
displayed.
c. Select the active primary in the current replication group, and
enter its number from the list. Enter: <number>
d. When you are prompted Make all shares read-only? [Yes/No]:

   Type of Expansion                        Conditions                                        Enter:
   Add Secondary Gateway Node               If the replication group supports business        No
                                            continuity failover, the shares must be
                                            copied read-write.
   Add Secondary Gateway Node               If the replication group does not support         Yes
                                            business continuity failover, the shares
                                            must be copied read-only.
   Add Primary Gateway Node during          Shares must be copied read-write.                 No
   HAGC conversion
e. The following message may appear: The authenticity of host 'hostname (xx.xx.xx.xx)' can't be established. The RSA key fingerprint is
d9:2b:ea:86:fd:90:3d:a1:f7:e9:d5:f6:d5:30:75:38.
Are you sure you want to continue connecting (yes/no)?
Enter: yes
The script responds: Warning: Permanently added 'xx.xx.xx.xx'
(RSA) to the list of known hosts.
f. Next, you are prompted for the password of the remote Gateway Node. Enter the password, as found in the Passwords.txt file. Enter: <password>
The server opens an ssh connection with the active primary
Gateway Node and pulls a copy of its file shares to the expansion Gateway Node.
The script responds Configuration updated when it is done.
g. If the Gateway Node is cohosted on a server that also hosts an
Admin Node, you must manually create an audit share if
required using the instructions in the Administrator Guide. The
pull-config and push-config options do not copy the audit file
share.
h. Quit the utility. Enter: exit
3. Confirm that the file shares have been copied correctly.
a. Start the config_cifs.rb or config_nfs.rb utility.
b. Enter: validate-config
c. Review the list of file shares and ensure that they are the same
as those on the active primary.
d. Quit the utility. Enter: exit
Copy the FSG Cache
Prerequisites
• You are adding an expansion secondary Gateway Node to an FSG replication group.
• File shares on the new secondary are synchronized. See “Update File Share Integration” on page 69.
Procedure
Optionally copy the FSG cache from the primary Gateway Node to the
expansion secondary. When the expansion Gateway Node is installed,
its FSG cache is empty. Copying the FSG cache can speed up retrieval
access to files on the FSG.
For the procedure, see the Maintenance Guide.
Archive Node Integration and Customization
Prerequisites
• Grid software has been started using Enable Services in GDU on the expansion Archive Node.
Complete the Setup of TSM Archive Nodes
To complete the setup of a TSM Archive Node:
• Configure the TSM server.
• After the TSM server is configured, go to the NMS MI and configure the Archive Node to communicate with the TSM server.
See the “Configure the Grid” chapter of the Installation Guide.
Update the ILM Policy
You must include the new Archive Node in a storage pool and activate
an ILM policy that uses this storage pool before the grid can use the
new storage. For more information, see the ILM management chapter
in the Administrator Guide.
Copy Audit Logs
Each Admin Node or Audit Node that hosts the AMS service has one
audit log file. This audit log file records events and actions performed
by all grid nodes and services in the grid, both before and after any
expansion. Access to the audit log files is limited to authorized technical support staff unless the customer has purchased the audit option.
For those with the audit option, a separate document (Audit Message
Reference) detailing access and log content is provided.
When you add a second Admin Node, HCAC, or Audit Node to the
grid (a grid node that hosts the AMS service), its audit log will record
information on all events and actions that occurred after it joined the
grid.
If you add an Admin Node or Audit Node for redundancy, convert a
grid to an HCAC, or perform a hardware refresh of an Admin Node,
copy the log files manually to the expansion or refreshed Admin Node
or Audit Node to preserve the audit messages stored on the original
Admin Node or Audit Node. For more information on the HCAC
conversion process, see Chapter 3: “Convert to a High Capacity Admin
Cluster”.
Procedure
To preserve audit logs, copy the audit logs from the original
Admin Node or Audit Node that hosts the AMS service to the expansion node that hosts the AMS service (reporting Admin Node in an
HCAC).
1. At the original Admin Node or Audit Node server, access a
command shell and log in as root using the password listed in the
Passwords.txt file.
2. Stop the AMS service to prevent it from creating a new file. Enter:
/etc/init.d/ams stop
3. Rename the audit.log file, so it does not overwrite the file on the
expansion grid node that hosts the AMS service to which you copy
it. Enter:
cd /var/local/audit/export
ls -l
mv audit.log <new_name>.txt
4. Copy all the audit log files to the expansion grid node hosting the
AMS service. Enter:
scp -p * <IP_address>:/var/local/audit/export
When prompted, enter the password of the grid node as listed in
the Passwords.txt file.
5. Restore the original audit.log file. Enter:
mv <new_name>.txt audit.log
6. Restart the AMS service on the grid node stopped in step 2. Enter:
/etc/init.d/ams start
7. Log out of the command shell. Enter: exit
8. At the expansion grid node hosting the AMS service, access a
command shell and log in as root using the password listed in the
Passwords.txt file.
9. Update the user and group settings of the audit log files. Enter:
cd /var/local/audit/export
chown ams-user:bycast *
10. Log out of the command shell. Enter: exit
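As a recap, the command sequence looks like the following minimal sketch; audit_original.txt is an example name for <new_name>.txt, and <IP_address> is the expansion grid node hosting the AMS service:
# On the original Admin Node or Audit Node (the source of the audit logs):
/etc/init.d/ams stop
cd /var/local/audit/export
mv audit.log audit_original.txt
scp -p * <IP_address>:/var/local/audit/export
mv audit_original.txt audit.log
/etc/init.d/ams start
exit
# On the expansion grid node hosting the AMS service:
cd /var/local/audit/export
chown ams-user:bycast *
exit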
Update the Grid for New Grid Nodes
[Grid Expansion Procedure workflow: Read scope and limitations > Prepare for expansion > Provision the grid > Update networking > Prepare VMs and servers > Install grid software > Add, customize, and start grid nodes > Update grid for new grid nodes]
After you add all grid nodes at all sites (as described in “Add, Customize, and Start Grid Nodes” on page 41), perform the final steps in this section to complete the grid expansion.
Overview: Update the Grid
Table 4: Complete the Expansion

   Step   Applies to servers          Action                                              See
          hosting these services
   1.     All                         If you have added a new site (group) to the        page 75
                                      grid, update link costs for groups within
                                      the grid.
   2.     All                         If any of the servers added in the expansion       page 76
                                      have the role of NTP primary, update the
                                      list of NTP sources on all other servers in
                                      the grid. Gateway Nodes and Admin Nodes
                                      are commonly NTP primaries.
Update Link Costs
Prerequisites
• You started grid software on all expansion grid nodes, and completed all required customizations for these grid nodes.
• You added a new site to the grid in the expansion.
Procedure
When you add a new site to the grid via an expansion, the grid assigns
the grid nodes at the new site to a new group. The link cost values
between the new group and all other groups in the grid default to 100.
Update the link cost values (found in the NMS MI at
Grid Management  Grid Configuration  Link Cost Groups  Configuration)
to properly reflect the relative costs of communicating between the different groups within the grid. For the procedure to change link cost
values, see the Administrator Guide.
Update NTP Sources on Grid Nodes
Prerequisites
• You completed all other expansion steps for expansion grid nodes.
Procedure
Some expansion grid nodes may be configured to act as a primary
NTP time source for the grid. (Gateway Nodes and Admin Nodes are
commonly designated to be primary NTP time sources). Follow the
procedure below to update all pre-expansion grid nodes to include the
new primary NTP time sources in their list of NTP sources.
1. Check which grid nodes are currently configured to have the role
of a primary NTP time source.
This information is in the SAID package for the grid. Consult the
NTP tab of the Doc/index.html file.
2. If any expansion grid nodes are NTP primaries, note their grid IP
addresses.
3. On each server that was in the grid prior to the expansion, add the
expansion grid nodes as NTP sources. Follow the procedure
“Update the List of Grid NTP Primaries” in the “Network Configuration” chapter in the Administrator Guide.
After you complete any necessary updates to NTP sources, the grid
expansion is complete.
Troubleshooting
This section includes troubleshooting topics to help you identify and
solve problems that may occur while expanding the StorageGRID
system. See also “GDU Troubleshooting” on page 217.
If problems persist, contact Support. You may be asked to supply the
following installation log files:
• /var/local/log/install.log (found on the server being installed)
• /var/local/log/gdu-console.log (found on the primary Admin Node)
Corrupt ISO Message When Using load_cds.py Script
If you get an error message that a corrupt ISO has been detected (for
instance, a failed md5sum check), check the integrity of the CD and
copy the ISO image again.
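As a quick check of the copied image, you can compute its checksum and compare it against the expected value. A minimal sketch, assuming the expected md5 checksum is available for the image; <image>.iso is a placeholder:
# Compare the output against the checksum supplied with the ISO image.
md5sum <image>.iso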
3 Convert to a High Capacity Admin Cluster
Overview
This chapter describes how to convert a consolidated Admin Node to a
High Capacity Admin Cluster (HCAC). The need to convert most
likely occurs because it has been determined (as described in “Confirm
Grid’s Grid Node Capacity” on page 21) that the number of services in
the grid after an expansion will exceed the current capacity of the
grid’s consolidated Admin Node. If the grid requires an HCAC, you
must convert the existing consolidated Admin Node to an HCAC as
described in this chapter before you add any other grid nodes to the
grid.
In an HCAC, grid services are bound to the processing Admin Node
which collects attribute information. The reporting Admin Node
provides a browser-based management interface for web clients and is
configured as the primary Admin Node (hosts the CMN).
For redundancy, you can also add a second HCAC to the grid. A
second HCAC duplicates the bindings of the first HCAC; however,
only the first HCAC hosts a primary Admin Node. Each HCAC
collects and stores information from all grid services, and each cluster
provides a separate, but equivalent, management interface to display
this grid information. Web clients can link to either HCAC and have
the same view of the grid. Note that alarm acknowledgments made
from one HCAC are not copied to any other HCAC in the grid. It is
therefore possible that the Grid Topology trees will not look the same.
For more information, see the Administrator Guide.
NOTE A grid cannot have both a consolidated Admin Node and an HCAC.
When you convert to an HCAC, the second Admin Node, if present,
must be decommissioned or converted to an HCAC.
Convert to a High Capacity Admin Cluster
The conversion process involves a grid expansion to add the reporting
and processing Admin Nodes to the grid, and a decommissioning to
remove the consolidated Admin Node.
NOTE New installations of the Release 9.0 StorageGRID system are not
supported on physical servers. Virtual machines must be used.
1. Prepare for the expansion. See “Prepare for Expansion” on
page 21.
2. Provision the grid. See “Provision the Grid” on page 29.
3. Add the reporting Admin Node:
a. Perform one of the following actions:
  • Prepare virtual machines, see “Prepare Virtual Machines” on page 179.
  • Prepare physical servers, see “Prepare Expansion Physical Servers” on page 199.
b. Install the grid software. See “Install StorageGRID Software”
on page 39.
NOTE Do not start the grid software. Do not select Enable Services in
GDU.
c. Run the Initial expansion grid task. See “Run the Initial Expansion Grid Task” on page 44.
d. Run the Add Server expansion grid task. See “Run the Add
Server Grid Task” on page 47.
e. Start the grid software on the reporting Admin Node. See
“Start Grid Software (Enable Services)” on page 57.
NOTE It is safe to ignore the ADC alarm Unreachable Attribute Repositories (ALUR).
4. Add the processing Admin Node:
a. Perform one of the following actions:
  • Prepare virtual machines, see “Prepare Virtual Machines” on page 179.
  • Prepare physical servers, see “Prepare Expansion Physical Servers” on page 199.
b. Install the grid software. See “Install StorageGRID Software”
on page 39.
NOTE Do not start the grid software. Do not select Enable Services in
GDU.
c. Run the Add Server expansion grid task. See “Run the Add
Server Grid Task” on page 47.
d. Start the grid software on the processing Admin Node. See
“Start Grid Software (Enable Services)” on page 57.
The ADC alarm Unreachable Attribute Repositories (ALUR) should
clear.
5. Change the state of the CMN service hosted by the consolidated
primary Admin Node to Standby:
a. Go to <consolidated_primary_Admin Node>  CMN 
Configuration  Main.
b. Change CMN State to Standby.
c. Click Apply Changes.
6. Change the state of the CMN service hosted by the reporting
primary Admin Node of the HCAC to Online:
a. Go to <reporting_primary_Admin Node>  CMN 
Configuration  Main.
b. Change CMN State to Online.
c. Click Apply Changes.
7. Apply hotfixes and maintenance releases to the HCAC. See “Apply
Hotfixes and Maintenance Releases” on page 61.
8. Copy the NMS database from the original consolidated
Admin Node to the processing Admin Node. See “Copy NMS
Database” on page 65.
9. If you require the audit logs, copy them from the original consolidated Admin Node to the reporting Admin Node. See “Copy
Audit Logs” on page 73.
10. If the Audit option is included in the deployment, enable the audit
share on the reporting Admin Node (the Admin Node hosting the
AMS service). For more information and procedures, see the
Administrator Guide.
11. If Domain Name Service (DNS) was configured on the consolidated Admin Node, configure DNS for the HCAC Admin Nodes
(both the processing and reporting). Follow the instructions in the
“Network Configuration” chapter of the Administrator Guide.
12. Set the new HCAC as the preferred notifications sender:
a. Go to Grid Management  NMS Management  General  Main.
b. Change Preferred Notification Sender to the HCAC.
c. Click Apply Changes.
13. If the HCAC includes a CMN, load provisioning data software and
data on the reporting Admin Node (hosting the CMN service). See
“Load Provisioning Software and Provisioning Data” on page 149.
14. Update all servers that use the consolidated Admin Node as the
primary NTP time source to use the processing Admin Node as the
primary NTP time source. For the procedure, see “Update the list
of NTP Primaries” in the “Network Configuration” chapter of the
Administrator Guide.
15. Update ssh keys on the reporting Admin Node (Admin Node that
hosts the CMN service). See “Update SSH Keys” on page 152.
16. Decommission the consolidated Admin Node. See “Decommission
the Consolidated Admin Node” on page 83.
17. Change the IP address of the reporting Admin Node to the IP
address of the consolidated Admin Node. For more information,
see the Administrator Guide.
— or —
Remap web clients to the IP address of the reporting Admin Node.
This change allows web clients to log in to the NMS MI of the new
HCAC.
If the grid requires a second HCAC, add it now following the procedure in this chapter.
If a second consolidated Admin Node is present, either decommission
it or convert it to an HCAC.
NOTE A grid cannot have a consolidated Admin Node and an HCAC. If a
second Admin Node is present, it must be decommissioned or converted to an HCAC.
After the HCAC has been added to the grid and the consolidated
Admin Node decommissioned, return to Chapter 2 “Add Grid Nodes”
to add any remaining grid nodes that are a part of this expansion.
Decommission the Consolidated Admin Node
Use this procedure to decommission the consolidated Admin Node.
Prerequisites
• The HCAC Admin Node is running (the HCAC conversion has been completed to step 15 on page 82).
• No grid tasks are running. The only exceptions are:
  • Gateway Node or Admin Node decommissioning for a different server (GDCM)
  • LDR content rebalancing grid task (LBAL)
  • ILM evaluation (ILME)
  • LDR foreground verification (LFGV or VFGV)
These grid tasks can run concurrently with the Admin Node
decommissioning grid tasks. If any other grid tasks are running,
wait for them to complete or release their lock, or abort them as
appropriate. For more information on grid tasks and resource
locking, see the chapter “Grid Tasks” in the Administrator Guide.
Procedure
1. Run the grid task to remove bindings from the Admin Node.
a. In the NMS MI of the primary Admin Node of the HCAC
(reporting Admin Node), go to <grid root> Configuration 
Tasks.
b. In the Pending table, locate the grid task Remove NMS Cluster
Bindings. Under Actions, select Start.
c. Click Apply Changes.
The grid task moves to the Active table. Wait until the grid task
moves to the Historical table with a Status of Successful.
2. Run the Admin Node decommissioning grid task:
a. In the NMS MI of the primary Admin Node of the HCAC
(reporting Admin Node), go to <grid root> Configuration 
Tasks.
b. In the Pending table, locate the Admin Node Decommissioning:
Server <hostname> grid task. Under Actions, select Start.
c. Click Apply Changes.
The grid task moves from the Pending table to the Active table.
Wait until the grid task moves to the Historical table with a
Status of Successful. The Admin Node is decommissioned.
3. Power down the consolidated Admin Node.
After the consolidated Admin Node has been decommissioned, go
back to step 17 on page 82 of the HCAC conversion procedure.
4 Add Storage
Increasing the capacity of existing Storage Nodes
Introduction
You can use the following methods to increase the disk storage
capacity of a grid:
• Add Storage Nodes — to increase the total storage capacity of the grid or to add new sites to the grid. See Chapter 2: “Add Grid Nodes”.
• Add Storage Volumes (LUNs) to a Storage Node — to increase the storage capacity of a Storage Node.
Storage expansion depends upon the grid’s ILM rules for stored
content and the grid topology. For example, if the ILM rules make one
copy of each ingested object at a Data Center site and a second copy at
a Disaster Recovery site, you must add an equivalent amount of
storage at each site to increase the overall capacity of the grid. Most
ILM policies create two copies of data on spinning media; therefore,
most storage expansions add two Storage Nodes, or add storage
volumes to two Storage Nodes.
NOTE You cannot add storage volumes to a Storage Node during upgrade.
Adding Storage Volumes
You can add additional storage volumes (LUNs) to increase the
capacity of a Storage Node (or a server that hosts a combination of grid
nodes, such as a Control/Storage Node). Each Storage Node can
support up to 16 storage volumes. If a Storage Node includes fewer
than 16 storage volumes, you can add additional volumes to that
Storage Node to increase its storage capacity.
Note that to increase the total usable storage capacity of the grid, you
must generally add storage capacity to more than one Storage Node
(as described in the “Introduction” above).
Understanding Storage Volumes
The underlying data storage on a Storage Node is divided into a fixed
number of “object stores”, which are drives or partitions that act as
mount points for the storage.
You can view the list of object stores on a node using the NMS MI
interface. Each item in the LDR  Storage  Overview  Object Stores
table corresponds to a mount point listed in the Volumes table on the
SSM  Resources  Overview page of the same node. That is, the object
store with an ID of 0000 corresponds to /var/local/rangedb/0 in the SSM 
Resources  Overview  Volumes table.
NOTE Object stores can vary in size. For more information, see “Rebalancing Content Across Storage Volumes” on page 93.
It is recommended that you add storage volumes that are roughly the
same size as existing storage volumes.
The procedure to add storage to a Storage Node varies depending on
whether it uses auto or manual storage volume balancing.
Manual Storage Volume Balancing
For Storage Nodes with LDRs whose Storage Volume Balancing mode is
Manual, content is randomly distributed across the storage volumes,
with the number of objects stored to each volume being roughly proportional to the volume’s size.
When files are saved to an LDR, they are saved to an object store based
on the file’s CBID, which is assigned to the file by the CMS as it is
ingested. Each object store is mapped to a range of CBIDs and the
object is saved to the correct volume based on its assigned CBID.
If any one of the storage volume fills, the Storage Node as a whole
cannot accept more data and goes read-only. If one storage volume is
much smaller than the rest, the Storage Node might “fill” when there
is still a significant amount of unused disk space on the Storage Node
as a whole, particularly if the object size is large compared to the size
of the smallest storage volume. Therefore, for Storage Nodes with
LDRs whose Storage Volume Balancing mode is Manual, it is
recommended that all storage volumes be approximately the same size.
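The following is an illustrative sketch only (not a StorageGRID tool) of the idea behind CBID-based placement: if the sixteen object stores were equal in size, each would own one sixteenth of the CBID space, so the leading hexadecimal digit of an object's CBID would select its object store.
# Illustrative sketch only: map an example CBID to one of 16 equal CBID ranges.
CBID="8F3A29C4D1E07B65"     # example 16-digit hexadecimal CBID
range_digit=${CBID:0:1}     # 0-F: which sixteenth of the CBID space
echo "CBID ${CBID} falls in the range owned by object store ${range_digit}"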
Auto Storage Volume Balancing
For Storage Nodes with LDRs whose Storage Volume Balancing mode is
Auto, content is balanced across volumes taking into account available
free space. The number of objects stored to each volume is based on the
size of the objects.
Each object store has a unique volume ID. When files are saved to an
LDR, the LDR randomly (based on available free space) assigns the object
a volume ID and the object is saved to the object store with that volume
ID.
If a storage volume fills, the Storage Node can continue to accept data.
Data is randomly distributed across the remaining storage volumes and
the Storage Node does not go read-only until all storage volumes are
filled.
Note that the object is also assigned a CBID, but only to map the object’s
path within the directory structure of the object store.
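To see the free space that Auto balancing takes into account when it assigns a volume ID, you can inspect the object stores directly. A minimal sketch, run as root on the Storage Node:
# List the size, used, and available space of each mounted object store.
df -Pk /var/local/rangedb/*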
New vs. Upgraded Storage Nodes
Newly installed or added Storage Nodes (not Storage Nodes added as
part of a hardware refresh) have a Storage Volume Balancing mode of Auto
(see LDR  Storage  Main). Upgraded Storage Nodes or Storage Nodes
added as part of a hardware refresh have a Storage Volume Balancing
mode of Manual.
Because of this difference in the mapping of objects to object stores for
new release 9.0 Storage Nodes versus upgraded or refreshed
Storage Nodes, expansion procedures differ slightly. Follow the appropriate expansion procedures in this guide.
Determine Storage Node’s Storage Volume Balancing Mode
1. Go to <Storage Node>  LDR  Storage  Main.
2. Determine the mode for the attribute Storage Volume Balancing:
• Auto — Objects are mapped to volume ID and the object stores are automatically balanced
• Manual — Objects are mapped to a CBID and object stores must be manually rebalanced
Add NFS Mounted Storage Volumes
New Storage Nodes only support NetApp storage systems (that is, an
NFS server).
WARNING You cannot change the size of storage volumes after
you integrate the Storage Node with an NFS server,
even if the NFS system lets you expand storage
volumes.
Prerequisites
• The Storage Node is connected to a NetApp storage system; the Storage Node is not connected to direct-attached or SAN storage.
• The additional NFS volumes have been set up and exported.
  NOTE Configuration of the NFS volumes on the NFS server is beyond the scope of this guide.
• Ensure that you have the following:
  • IP address of the NFS server
  • Passwords.txt file
  • Node ID of the LDR (go to LDR  Overview  Main)
Procedure
1. Log in to the Storage Node as root, using the password provided in
the Passwords.txt file.
2. Verify connectivity to the NFS server. Enter: ping <NFS_Server_IP>
3. Verify that the Linux NFS client package is installed (it should be
installed by default). Enter: rpm -q nfs-utils
4. Stop services on the Storage Node. Enter:
/etc/init.d/servermanager stop
5. Mount each NFS export that is to be added:
a. At the Storage Node server, using a text editor such as vi, add
this line to the /etc/fstab file for each storage volume (on one
line):
<NFS_Server_IP>:<volume_path> /var/local/rangedb/
<next_available_index> nfs rw,rsize=65536,
wsize=65536,nfsvers=3,tcp
where:
• <NFS_Server_IP> is the IP address of the NFS server
• <volume_path> is the path of the storage volume exported from the NFS server
• <next_available_index> is the Storage Node rangedb, a number between 0 and 15 in hexadecimal notation (0 to F, case-specific). For the first storage volume, the index number is 0. For example:
  192.168.130.16:/vol/vol1 /var/local/rangedb/0 nfs rw,rsize=65536,wsize=65536,nfsvers=3,tcp
Repeat step a for each storage volume.
b. Create a mount point for each NFS storage volume, using the
same index numbers used in the /etc/fstab file in step a. Enter:
mkdir -p /var/local/rangedb/<next_available_index>
For example:
mkdir -p /var/local/rangedb/0
Do not create mount points outside of /var/local/rangedb.
Repeat step b for each storage volume.
c. Mount the storage volumes to the mount points. Enter:
mount -a
If mounting fails, make sure that the storage volumes are configured on the NFS server for read-write access to the
Storage Node and that the IP address of the Storage Node
supplied to the NFS server is correct.
6. Set up the NFS rangedb directory. Enter:
/usr/local/ldr/setup_rangedb.sh <LDR_nodeid>
This script creates the required subdirectories in the rangedb
directory.
7. Restart services on the Storage Node. Enter:
/etc/init.d/servermanager start
8. Check that the services start correctly. To view a listing of the status
of all services on the server, enter: storagegrid-status
Wait until all services are Running or Verified. Press <Ctrl>+<C> to
exit.
9. If the Storage Node’s desired state is set to Read-only or Offline,
change its desired state to Online:
a. Log in to the NMS MI using the Vendor account or an account
with Grid Management permissions.
b. Go to <Storage Node>  LDR  Storage  Configuration 
Main.
c. Change Storage State – Desired to Online.
Figure 3: Change the Desired Storage State to Online
d. Click Apply Changes.
10. If not already known, determine the Storage Node’s volume balancing mode:
a. Go to LDR  Storage  Overview  Main
b. Determine the value for Storage Volume Balancing: Auto (for
volume ID mapping) or Manual (for CBID mapping).
11. If the value for Storage Volume Balancing is Manual, go on to the procedure “Rebalancing Content Across Storage Volumes” on
page 93.
NOTE In the NMS MI, the attributes in the Utilization table on the LDR 
Storage  Overview page remain gray until you have rebalanced
content for the Storage Node.
If the value for Storage Volume Balancing is Auto, the procedure to
add storage to a Storage Node is complete.
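In summary, the command-line portion of this procedure looks like the following minimal sketch for a single new storage volume; the IP address, export path, index, and node ID are placeholders, and appending to /etc/fstab with echo is shown only for brevity (the procedure above uses a text editor):
# Run as root on the Storage Node.
/etc/init.d/servermanager stop
echo "<NFS_Server_IP>:<volume_path> /var/local/rangedb/<next_available_index> nfs rw,rsize=65536,wsize=65536,nfsvers=3,tcp" >> /etc/fstab
mkdir -p /var/local/rangedb/<next_available_index>
mount -a
/usr/local/ldr/setup_rangedb.sh <LDR_nodeid>
/etc/init.d/servermanager start
storagegrid-status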
Add Direct-Attached or SAN Storage Volumes
Prerequisites
• The existing Storage Node is integrated with direct-attached or SAN storage. It is not integrated with NFS mounted storage volumes.
• Ensure that you have the Passwords.txt file
Add Storage
Procedure
To add direct-attached or SAN storage volumes to a Storage Node:
1. Install the new storage hardware. For more information, see the
documentation provided by your hardware vendor.
If adding storage volumes to a virtual machine, restart the virtual
machine, or the add_rangedbs.rb script may not find the new drives.
2. At the Storage Node server, access a command shell and log in as
root, using the password for the server found in the Passwords.txt
file.
3. Shut down all StorageGRID services on the server. Enter:
/etc/init.d/servermanager stop
4. Configure the new storage for use by the Storage Node:
a. Configure the new storage volumes. Enter: add_rangedbs.rb
The script finds any new storage volumes and proposes to
format them.
b. To accept the reformatting, enter: y
c. If any of the drives have previously been formatted, you are
asked if you want to reformat them.
To proceed, enter: y
servername:# add_rangedbs.rb
Unallocated drives detected. Is the following proposed action correct?
Use /dev/sdj (750G) as an LDR rangedb
Use /dev/sdk (750G) as an LDR rangedb
Use /dev/sdl (750G) as an LDR rangedb
Use /dev/sdm (750G) as an LDR rangedb
Accept proposal [y/n]? y
Restoring any volume groups ...
Determining drive partitioning ...
WARNING: drive /dev/sdj exists
Reformat the drive? [y/n]? y
Formatting the following: /dev/sdj /dev/sdk /dev/sdl /dev/sdm
Tuning the following: /dev/sdj /dev/sdk /dev/sdl /dev/sdm
Running: /usr/local/ldr/setup_rangedb.sh 12084115
Disk drive(s) for LDR ready.
servername:/usr/local/sbin # add_rangedbs.rb
Unallocated drives detected. Is the following proposed action correct?
Partition and format /dev/cciss/c1d6 (750G) for use as an LDR rangedb
Partition and format /dev/cciss/c1d7 (750G) for use as an LDR rangedb
Partition and format /dev/cciss/c1d8 (750G) for use as an LDR rangedb
Partition and format /dev/cciss/c1d9 (750G) for use as an LDR rangedb
Accept proposal [y/n]? y
Restoring any volume groups ...
Determining drive partitioning ...
WARNING: partition /dev/cciss/c1d6 exists
Reformat the partition? [y/n]? y
Partitioning the following drives: /dev/cciss/c1d6 /dev/cciss/c1d7 /dev/cciss/c1d8 /dev/cciss/c1d9
Formatting the following partitions: /dev/cciss/c1d6p /dev/cciss/c1d7p /dev/cciss/c1d8p /dev/cciss/c1d9p
Tuning the following partitions: /dev/cciss/c1d6p /dev/cciss/c1d7p /dev/cciss/c1d8p /dev/cciss/c1d9p
Running: /usr/local/ldr/setup_rangedb.sh 12075054
Disk partition for LDR ready. Now do manual steps to restart LDR and initiate rebalancing process.
servername:/usr/local/sbin #
d. If you do not want to format the drives, enter n to exit the
script.
5. Restart services. Enter: /etc/init.d/servermanager start
6. Check that the services start correctly. To view a listing of the status
of all services on the server, enter: storagegrid-status
The status is updated automatically. Wait until all services are
Running or Verified. To exit the status screen, press <Ctrl>+<C>.
7. If the Storage Node’s desired state is set to Read-only or Offline,
change its desired state to Online to enable the grid to use the additional storage:
a. Log in to the NMS MI using the Vendor account or an account
with Grid Management permissions.
b. Go to <Storage Node>  LDR  Storage  Configuration 
Main.
c. Change Storage State – Desired to Online.
Figure 4: Change the Desired Storage State to Online
d. Click Apply Changes.
8. If not already known, determine the Storage Node’s volume balancing mode:
a. Go to <Storage Node> LDR  Storage  Overview  Main
b. Determine the value for Storage Volume Balancing: Auto (for
volume ID mapping) or Manual (for CBID mapping).
9. If the value for Storage Volume Balancing is Manual, go on to the procedure “Rebalancing Content Across Storage Volumes”.
NOTE In the NMS MI, the attributes in the Utilization table on the LDR 
Storage  Overview page remain gray until you have rebalanced
content for the Storage Node, if required.
If the value for Storage Volume Balancing is Auto, the procedure to
add storage to a Storage Node is complete.
Rebalancing Content Across Storage Volumes
NOTE The following information and procedure only applies to
Storage Nodes upgraded to release 9.0 or Storage Nodes added as
part of a hardware refresh. That is, Storage Nodes with LDRs whose
Storage Volume Balancing mode is Manual.
Content on Storage Nodes only needs to be rebalanced for LDRs
where Storage Volume Balancing is Manual. This is true for all
Storage Nodes upgraded to release 9.0. New release 9.0 Storage Nodes
have Storage Volume Balancing listed as Auto.
As each object is ingested into the grid, it is assigned a unique 16 digit
hexadecimal identifier known as a CBID that falls in the range of
0000000000000000 to FFFFFFFFFFFFFFFF. For LDRs where Storage
Volume Balancing is Manual, storage volumes are configured such that
each object store (storage volume or rangedb) is associated with a
range of CBID values. When an object is stored to an LDR, it is saved to
a particular storage volume based on its CBID. Because CBIDs are
randomly generated, this ensures that objects are assigned evenly
across the multiple storage volumes of a Storage Node.
NOTE Storage volumes upgraded to release 9.0 do not need to be identical
in size: larger storage volumes are associated with a larger CBID
range to ensure that storage volumes fill evenly.
For LDRs where Storage Volume Balancing is Manual, although storage
volumes do not need to be identical in size, it is not recommended that
you create a configuration where some storage volumes are much
smaller than others. If large objects are stored to such a Storage Node,
one or more of these small storage volumes could fill, rendering the
entire Storage Node read-only.
When you add additional storage volumes to a Storage Node, you
must take a number of actions before the grid can use the new storage.
First, direct the Storage Node to remap its CBID ranges, so it uses all of
the storage volumes that now exist on the Storage Node. Then “rebalance” the existing stored content across all storage volumes to ensure
that each object already stored on the server ends up in the correct
storage volume, as required by the new mapping.
The procedure below describes how to direct the Storage Node to shift
content so that stored objects are spread evenly across all storage
volumes, with each object stored in the storage volume that corresponds to its CBID.
Balancing content across object stores works best when all objects are
much smaller than the object store size. Avoid rebalancing content on
Storage Nodes that contain very large objects (comparable in size to
the object store itself):
• If any objects are large, content rebalancing does not permit you to take advantage of unused space on the Storage Node. One object store (and therefore the Storage Node as a whole) may register as “full” after receiving one (or only a few) large objects. Rebalancing cannot correct this issue.
• If you rebalance content when storage volumes are unbalanced due to a few large objects, there is a risk that the rebalancing grid task could become stuck. If this occurs, contact Support.
Rebalance Content on the Storage Node
New content cannot be written to a Storage Node while content rebalancing is in progress. If you rebalance content on more than one
Storage Node at a time, you may interfere with the process of ingesting data into the grid. Wait until rebalancing completes on the first
Storage Node of any expanded set before rebalancing the next
Storage Node. Note that in a situation where multiple Storage Nodes
are read-only, it is acceptable to rebalance content in parallel on these
read-only Storage Nodes.
Prerequisites
• One of the following conditions is true:
  • You added storage volumes to an existing Storage Node and Storage Volume Balancing is Manual.
  • The object stores on an existing Storage Node are significantly unbalanced and Storage Volume Balancing is Manual. Significantly unbalanced means that Total Free Space is significantly greater than Total Usable Space. (For example, one object store has 50% or more free space when others are full.)
    — and —
    The imbalance in the object stores is not due to the presence of a few large objects.
• In general, it is safe to run other grid tasks at the same time as you rebalance content on a Storage Node.
  However, you can only run one grid task that affects a given Storage Node at a time. Ensure that none of the following grid tasks are active for the Storage Node whose content you are rebalancing:
  • Storage Node decommissioning (LDCM)
  • Storage Node hardware refresh or a Control/Storage Node hardware refresh (CSRF) — the current Storage Node is neither a source nor a destination for a hardware refresh
  • LDR Foreground Verification (LFGV)
  • LDR Content Rebalancing (LBAL)
It is safe to run these grid tasks concurrently for different
Storage Nodes. For example, you can run the LDR content rebalancing
grid task on one Storage Node while you perform foreground verification on a second Storage Node.
Procedure
NOTE This procedure only applies to LDRs where Storage Volume Balancing is Manual. Rebalancing is not necessary for LDRs where Storage
Volume Balancing is Auto.
1. Log on to the NMS MI and ensure that the CMN service is healthy
and that the LDR has no connectivity alarms:
a. In the NMS MI, go to the CMN and LDR and ensure that they
are green. In particular, ensure that these services do not show
any alarms related to connectivity. For example:
• An alarm on Events > Overview > Available Metadata Services indicates that the service cannot communicate with any CMS. This can indicate an issue with connectivity.
• An alarm on SSM > Resources > Network Interfaces can indicate an issue with the network interface card on the server.
Resolve any issues before continuing.
It is important to ensure that the LDR and the CMN can communicate because the command to rebalance content on the LDR is
executed by a grid task that is coordinated by the CMN.
2. At the Storage Node server, access a command shell session and
log in as root, using the password provided in the Passwords.txt file.
WARNING Use the ADE Console with caution. If you use the
console improperly, you can interrupt grid operations
and corrupt data. Enter commands carefully, and only
use the commands documented in this procedure.
3. Access the ADE console of the LDR service. Enter:
telnet localhost 1402
4. Access the Storage Tasks Module. Enter: cd /proc/STSK
5. Rebalance the content on the Storage Node. Enter: rebalance
The command initiates a grid task to rebalance content across all storage volumes on the Storage Node. The CMN service executes the grid task. (A sample console session for steps 3 through 7 appears at the end of this procedure.)
6. Log out of the ADE console. Enter: exit
7. Log out of the Storage Node command shell. Enter: exit
8. In the NMS MI, check the status:
a. Go to Storage Node > LDR and check the values of the following attributes as the rebalancing task starts:
• Storage State – Current becomes Maintenance
• Inbound Replication Status becomes Storage Read-only
• Outbound Replication Status remains No Errors
This means that while rebalancing is in progress, content can be retrieved from, but not written to, the Storage Node. The
Storage State remains Maintenance until rebalancing is complete.
While active, you can chart the progress of the grid task on the CMN > Grid Tasks > Reports > Charts page.
b. Go to <primary Admin Node> > CMN > Grid Tasks, and find the Content Rebalancing grid task in the Active list. Monitor the % Complete attribute.
Storage Node rebalancing may take several days to complete.
WARNING Do not abort the grid task. If you abort the grid task, you must treat the Storage Node and all its storage volumes as failed. Follow the instructions in the Maintenance Guide to recover the Storage Node and all storage volumes. Aborting the grid task leaves the Storage Node in an inconsistent state.
NOTE If a Storage Node fails on this server while rebalancing is in progress, you must recover all storage volumes (using the instructions in the Maintenance Guide). A Storage Node failure leaves the storage volumes in an inconsistent state.
When the content rebalancing grid task completes, the Storage
State – Current returns to Online, and the Inbound Replication Status
changes back to No Errors. The Storage Node is read-write once
more, and can be used both to store and retrieve content.
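For reference, the console portion of this procedure (steps 2 through 7) might look like the following sequence. Only the commands shown are taken from the steps above; the comments are added here for orientation, and no prompts or console output are reproduced.

    # At the Storage Node server, in a command shell as root (step 2):
    telnet localhost 1402    # step 3: open the ADE console of the LDR service
    cd /proc/STSK            # step 4: switch to the Storage Tasks module
    rebalance                # step 5: submit the content rebalancing grid task (executed by the CMN)
    exit                     # step 6: leave the ADE console
    exit                     # step 7: close the Storage Node command shell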
Expand the Remaining Storage Nodes
After the content rebalancing grid task completes, repeat the procedures in “Add NFS Mounted Storage Volumes” on page 88 and
“Rebalancing Content Across Storage Volumes” on page 93 (when
applicable) for each Storage Node that requires expansion.
5  Convert to a High Availability Gateway Cluster
Converting a Gateway Node replication group to one
that contains an HAGC
Introduction
This chapter describes the procedure for converting an existing basic
Gateway Node replication group to a replication group that contains a
High Availability Gateway Cluster (HAGC).
NOTE Converting an existing basic Gateway Node replication group to a
replication group that contains an HAGC is not supported during
upgrade.
An HAGC uses redundant hardware and clustering software to create
a primary Gateway Node cluster. Each Gateway Node cluster includes
a main Gateway Node and a supplementary Gateway Node that share
a single IP address (to group the network interfaces on the two servers). The cluster provides a single access point to the grid for clients,
whose NFS or CIFS file shares are present on both members of the
cluster.
By default, the main Gateway Node in an HAGC is the active primary
while the supplementary Gateway Node is the standby primary. A
heartbeat service monitors the status of both Gateway Nodes in the
cluster. If a hardware or software failure disables the active primary
Gateway Node, the heartbeat service detects the fault and automatically directs the standby primary Gateway Node to become the active
primary, minimizing the disruption to client operations. For more
information on Gateway Node replication groups and High Availability Gateway replication groups, see the Administrator Guide.
A replication group that includes an HAGC may include either two or
three members. A three-member replication group includes the two
Gateway Nodes in the HAGC and a secondary Gateway Node used to
perform backups. A two-member replication group includes only the
two members of the High Availability Gateway Cluster: the main
Gateway Node and the supplementary Gateway Node. In a two-member replication group, backups are performed by the Gateway Node that is currently in the standby state (that is, its FSG > Replication > Current Role is Standby Primary).
The procedure to convert an existing Gateway Node replication group
to one that includes an HAGC varies slightly depending upon the
current members of the replication group, and what you want the final
configuration of the replication group to be:
• If the existing replication group contains three Gateway Nodes (or two Gateway Nodes and an Admin/Gateway Node), you can convert one of the existing secondary Gateway Nodes to a supplementary primary Gateway Node and add it to a cluster that includes the current primary. The end result is a High Availability Gateway replication group that has a secondary Gateway Node.
• If the existing replication group has two members that are co-located, you can choose to convert the existing secondary FSG to a supplementary primary. If you select this option, the existing secondary is added to a cluster that includes the current primary. The end result is a High Availability Gateway replication group that does not have a secondary Gateway Node.
• If the existing replication group has two members, you can choose to add a third Gateway Node to the replication group to act as a supplementary primary. Select this option when you require a secondary Gateway Node in the replication group for any of the following reasons:
  • The original secondary is not co-located with the primary. Therefore you cannot convert it to a supplementary and add it to the primary cluster.
  • Read-only access is required from a remote location via a secondary.
  • Better performance during backups is desired.
  The end result is a High Availability Gateway replication group with three members that has a secondary Gateway Node.
Gateway Nodes that are members of an HAGC and are hosted on
physical servers must be co-located at the same physical location, as
they must be connected by a crossover cable to maintain the highest
levels of reliability. An HAGC hosted on a virtual machine does not
need to use a crossover cable for heartbeat.
All Gateway Nodes in the replication group must perform online
backups. (Older Gateway Nodes that performed offline backups
should have been converted to use online backups when they were
updated to use release 8.1 software, as described in the Upgrade Guide
Release 8.1.x.)
Overall Procedure
Perform the following steps to convert an existing basic
Gateway Node replication group to one that includes an HAGC. (Each
step is described in more detail later in this chapter.)
1. Decide if a new Gateway Node is required to complete the conversion. See the Introduction above and “Add a New Gateway Node”
on page 104.
2. Prepare for the conversion by performing all of the relevant steps
described in “Prepare for Conversion and Provision the Grid” on
page 103.
3. If you are adding a new Gateway Node, see “Add a New
Gateway Node” on page 104.
4. Prevent clients from accessing the Gateway Node replication
group while the conversion takes place. See “Prevent Clients From
Accessing the Replication Group” on page 105.
5. Update the network configuration of Gateway Nodes. See “Update
Network Configuration” on page 105.
6. If you are performing the conversion on physical servers, add the
new hardware required to convert to an HAGC. For more information, see “Add New Hardware” on page 108.
a. Add a network card to each Gateway Node in the HAGC.
b. Add a crossover cable between the two Gateway Nodes in the
HAGC.
NOTE An HAGC hosted on a virtual machine does not need to use a
crossover cable.
7. Update the Gateway Node configuration to support the HAGC.
See “Update Gateway Node Configuration” on page 109.
a. Install heartbeat configuration files on the primary gateway in
the HAGC.
b. If an existing secondary is converted to be a supplementary
primary:
• Install heartbeat configuration files.
• Run a grid task to update configuration information for the replication group.
c. If a new Gateway Node is added to be the new supplementary,
no configuration of the new Gateway Node is necessary.
d. If there are any other Gateway Nodes that are members of the
HAGC, install the same heartbeat configuration file on these
remaining Gateway Nodes.
NOTE All Gateway Nodes in an HAGC must include the same heartbeat
configuration file (ha.cf).
8. Start all Gateway Nodes in the replication group. See “Start
Primary Gateway Nodes” on page 111.
9. Optionally, restore client access to the main primary to reduce the
total length of service outage.
10. Apply hotfixes and maintenance releases to new Gateway Nodes.
See “Apply Hotfixes and Maintenance Releases” on page 61.
11. Update the file share configurations of the Gateway Nodes. See
“Update File Share Integration” on page 114.
12. Optionally, copy the FSG cache from the main Gateway Node in
the same replication group to the new Gateway Node. See “Copy
the FSG Cache” on page 120.
13. If not done earlier, restore client access to the grid. See “Restore
Client Access to Gateway Nodes” on page 122.
14. Test the HAGC. See “Test Failover in the Cluster” on page 120.
Convert to a High Availability Gateway Cluster
You may convert an existing replication group to one whose primary
Gateway Node is an HAGC either as part of another expansion, or as a
standalone expansion.
You must shut down all Gateway Nodes in a replication group to add
an HAGC. Arrange a Gateway Node service outage with the customer.
NOTE Adding a High Availability Gateway Cluster to an existing replication
group requires that all Gateway Nodes be shut down. Arrange a
Gateway Node service outage with the customer.
The service outage may last several hours: read through this procedure before planning the outage to assess how long the outage will
last. If a backup is in progress, it is not necessary to wait for the backup
to complete before proceeding with the conversion. If a backup is in
progress, it will be aborted by the shutdown that is a part of this procedure. If necessary, a failed Gateway Node can be restored from the
previous backup already stored in the grid.
Prepare for Conversion and Provision the Grid
1. If you are adding an additional Gateway Node to the replication
group, confirm the Admin Node capacity. See “Confirm Grid’s
Grid Node Capacity” on page 21.
2. Acquire an updated grid specification file that includes the conversion and/or addition of an additional Gateway Node to the
replication group. See “Acquire an Updated Grid Specification
File” on page 23.
3. Confirm that you have all required items listed in the materials
checklist, as described in “Gather Required Materials” on page 24.
4. Provision the grid. See “Provision the Grid” on page 232.
5. If you are installing a new Gateway Node, create or update the
Server Activation media to use during server installation. For
virtual machines, see “Provision the Grid and Create Server Activation Floppy Image” on page 29, and for physical servers, see
“Provision the Grid and Create a Server Activation USB Flash
Drive” on page 32.
6. If you are adding an additional Gateway Node to the replication
group, prepare virtual machines and/or hardware. See:
• "Prepare Virtual Machines" on page 179
• "Prepare Expansion Physical Servers" on page 199
Add a New Gateway Node
It is not necessary to add a new Gateway Node if you can repurpose an
existing secondary Gateway Node to be a part of the HAGC. You can
convert the secondary Gateway Node to be the supplementary
primary as described later. Go to “Prevent Clients From Accessing the
Replication Group” on page 105.
You must add a new Gateway Node as part of the conversion when:
• The replication group contains only two Gateway Nodes, and you require a secondary FSG.
• Physical servers host an HAGC, and the secondary in the replication group is not in the same physical location as the primary Gateway Node. When installing on physical servers, you must place both Gateway Nodes in the HAGC in the same physical location because a crossover cable connects the physical servers.
Adding an expansion Gateway Node is much the same as installing a
Gateway Node during the initial installation of the grid. The expansion grid tasks automatically allocate the new Gateway Node to the
existing replication group, and assign it its role within that group.
NOTE An HAGC hosted on a virtual machine does not require the use of a
crossover cable for heartbeat.
Procedure
1. Perform all of the steps in one of the following sections:
• "Prepare Virtual Machines" on page 179
• "Prepare Expansion Physical Servers" on page 199
2. Install grid software. See “Install StorageGRID Software” on
page 39.
3. Complete the following steps from Table 3: “Add, Customize, and
Start Expansion Grid Nodes” on page 41:
• "Run the Initial Expansion Grid Task" on page 44
• "Run the Add Server Grid Task" on page 47
4. After running the Add Server expansion grid task (Grid Expansion: Add Server), return to the overview procedure. See "Overall Procedure" on page 101.
NOTE Do not start the grid software on the expansion Gateway Node, that
is, do not select Enable Services in GDU.
Prevent Clients From Accessing the Replication Group
To reconfigure the replication group to include an HAGC, you must
add additional hardware and make configuration changes that cannot
be performed while Gateway Node services are active. Before you
begin the conversion process, you must arrange a Gateway Node
service outage that may last several hours.
NOTE Adding a High Availability Gateway Cluster to an existing replication
group requires that all Gateway Nodes be made unavailable. Arrange
a Gateway Node service outage with the customer.
If you do not allow clients to access the primary Gateway Node until
after the conversion is complete and the supplementary primary is
fully functional, the service outage lasts approximately as long as it
takes to perform the nightly file system backup of the grid.
Ensure that clients cannot access the Gateway Node before you begin
the conversion.
Do not disconnect the Gateway Nodes, as they must be able to communicate with other grid servers and the customer network during the
conversion process. Instead, ensure that clients do not attempt to
access the Gateway Nodes to store or retrieve files during the conversion. Such attempts will fail as services are stopped and reconfigured
as the Gateway Nodes are converted.
Update Network Configuration
The first step of the conversion process is to update the network configuration of the Gateway Nodes in the replication group to support
the HAGC.
Each Gateway Node in an HAGC has multiple IP addresses:
• Grid IP — The IP address of the server on the grid network.
• Customer IP — The IP address of the server on the customer network.
• Heartbeat IP — An IP address used by the heartbeat service to monitor the health of the server and its services.
• Virtual IP address — An IP address that is shared between both members of the High Availability Gateway Cluster, used to provide a stable IP address to clients throughout failover.
If the grid does not have a private network infrastructure, the IP
address of the server on the customer network is on the same network
as its IP address on the grid network. Any of the network interfaces on
the server may be bonded to provide additional redundancy.
Figure 5: Networking for a High Availability Gateway Cluster
(The diagram shows the main Gateway Node and the supplementary Gateway Node, each with a grid NIC connected to the grid network and a customer NIC connected to the client network; the heartbeat interfaces of the two servers are joined by the private heartbeat network.)
When you convert an existing Gateway Node to be a member of a
High Availability Gateway Cluster, you must:
• Update the customer IP address of the primary Gateway Node. The previous customer IP address of the server (before conversion) is used as the virtual IP address of the cluster, so that existing client integrations will continue to work without alteration.
• If using physical servers (not a virtual machine), add an additional network card to the server to support the heartbeat service, and update its network configuration to include the additional IP address.
• If using physical servers (not a virtual machine), use a crossover cable to physically connect the network interfaces for the heartbeat service on the two members of the cluster. A crossover cable is NOT required when Gateway Nodes are hosted on virtual machines.
It is convenient to update the network configuration before you add
the additional network cards, so that the networking will be correct
when the servers rejoin the grid after being restarted.
Procedure
1. Locate information about the IP addresses and networking configuration of the new High Availability Gateway Cluster:
a. Consult the updated SAID package provided to support the
expansion.
b. Go to the Doc directory and open the index.html file.
c. The Network tab describes the networking configuration of each
server, as shown in Figure 6 below.
Figure 6: Network Tab
2. Export the server configuration files created by provisioning:
a. At the primary Admin Node, access a command shell and log
in as root using the password listed in the Passwords.txt file.
b. Create a subdirectory for the configuration files. Enter:
mkdir -p /var/local/config
c. Export the configuration files. Enter:
get-server-config /var/local/config
d. When prompted, enter the provisioning passphrase.
The script creates subdirectories, one for each server in the
grid. Each subdirectory contains a configuration file for the
server. Server configuration files are created for all servers on
the grid, even if their IP address is not changing.
3. Copy the configuration file of the primary Gateway Node to that
server. Enter on one line:
scp /var/local/config/<server_name>/bycast-server-config.xml <server_IP>:/etc/
where <server_name> is the hostname of the server and <server_IP> is its grid IP address. (A worked example of this procedure, using hypothetical values, appears after step 9.)
4. Log out of the Admin Node command shell. Enter: exit
5. Run the apply-pending-changes script:
a. At the primary Gateway Node, access a command shell and log
in as root using the password listed in the Passwords.txt file.
b. Run the script. Enter: apply-pending-changes
c. When prompted, reboot the server.
d. Log in to the Gateway Node after it restarts.
6. Shut down all grid services on the primary Gateway Node. Enter:
/etc/init.d/servermanager stop
7. By default, services are restarted when the server or Server
Manager is restarted. It is important that the FSG service not be
started inadvertently.
Add a DoNotStart file to prevent the FSG service from restarting.
Enter: touch /etc/sv/fsg/DoNotStart
8. If you are converting an existing secondary Gateway Node to a
supplementary primary:
a. Go to the Server Manager console of the secondary
Gateway Node.
b. Access a command shell and log in as root using the password
provided in the Passwords.txt file.
c. Repeat step 3 to step 8 on the secondary Gateway Node.
9. If you added a Gateway Node to the replication group, it was correctly configured at installation. Proceed to the next section.
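As an illustration only, steps 2 through 7 of this procedure might look like the following for a hypothetical primary Gateway Node named gw-dc1-1 with grid IP address 10.10.1.11. The hostname and address are invented for this example; substitute the values for your own grid.

    # On the primary Admin Node, as root (steps 2 to 4):
    mkdir -p /var/local/config
    get-server-config /var/local/config      # prompts for the provisioning passphrase
    scp /var/local/config/gw-dc1-1/bycast-server-config.xml 10.10.1.11:/etc/
    exit

    # On the primary Gateway Node gw-dc1-1, as root (steps 5 to 7):
    apply-pending-changes                    # reboot the server when prompted, then log back in
    /etc/init.d/servermanager stop           # shut down all grid services
    touch /etc/sv/fsg/DoNotStart             # prevent the FSG service from restarting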
Add New Hardware
Make sure that the Gateway Node servers that are a part of the HAGC
are powered down.
1. Add any additional required network cards to the Gateway Nodes
that you are adding to the HAGC.
2. For Gateway Nodes installed on physical servers, connect the
crossover cable between the heartbeat interfaces on the servers.
NOTE An HAGC hosted on a virtual machine does not require the use of
a crossover cable for heartbeat.
3. Power up both servers.
Note that FSG services do not start on these servers when they are
powered up because of the DoNotStart files.
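If you want to confirm that the heartbeat link is working before continuing, you can optionally ping the heartbeat address of the other cluster member from either Gateway Node. This check is not part of the documented procedure, and the address shown is hypothetical; use the heartbeat IP addresses recorded in the SAID package.

    # From the main Gateway Node, as root:
    ping -c 3 172.16.0.2    # hypothetical heartbeat IP of the supplementary Gateway Node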
Update Gateway Node Configuration
The steps in this section make the configuration changes that are
required to convert the Gateway Nodes to an HAGC.
The process of provisioning an HAGC conversion creates a number of
configuration files that are required to convert a Gateway Node to an
HAGC. You must extract these files on the primary Admin Node, and
copy them to the primary Gateway Node.
1. Log onto the primary Admin Node as root using the password
provided in the Passwords.txt file.
2. Create a temporary directory for the SAID package. Enter:
mkdir /tmp/autoyast
3. Extract the SAID package from the gpt repository to your temporary directory. Enter: backup-to-usb-key /tmp/autoyast
4. When prompted, enter the provisioning passphrase used to
encrypt the gpt repository.
Loading GPT Repository...done
Copying GID170179_REV1_SAID.zip to USB key...done
Copying repository backup to USB key...done
Backup complete.
This passphrase is created during the provisioning portion of the grid installation process.
5. Change directories to the temporary directory created in step 2
above. Enter: cd /tmp/autoyast
6. Unzip the SAID package. Enter: unzip <SAID_package>.zip
7. Create a staging directory for the configuration files required for
the primary Gateway Node. Enter:
mkdir /tmp/<GN_hostname>_staging
Substitute the hostname of the primary Gateway Node for
<GN_hostname>.
8. Change directories to the SAID package’s Grid_Activation directory.
Enter: cd <SAID_package>/Grid_Activation
The Grid_Activation directory contains the grid activation xml files
for the grid’s nodes, including the primary Gateway Node.
9. Copy the activation file for the primary Gateway Node to the
staging directory for the Gateway Node. Enter:
cp <GN_hostname>_autoinst.xml /tmp/<GN_hostname>_staging
10. Move to the staging directory. Enter:
cd /tmp/<GN_hostname>_staging
11. Run the script that extracts the configuration files from the autoinst.xml file. Enter (on one line):
/usr/local/gpt/bin/autoyast_extractor.rb
<GN_hostname>_autoinst.xml tmp.tar.gz
12. Unpack the file produced by the script. Enter (on two lines):
tar xvfz tmp.tar.gz
tar xpf provisioned_files.tar
13. Copy only the configuration files to the primary Gateway Node.
Enter (each scp command on one line):
scp /tmp/<GN_hostname>_staging/var/lib/heartbeat/crm/cib.xml <GN_IP_address>:/var/lib/heartbeat/crm/
scp /tmp/<GN_hostname>_staging/etc/ha.d/ha.cf /tmp/<GN_hostname>_staging/etc/ha.d/authkeys <GN_IP_address>:/etc/ha.d
where <GN_IP_address> is the IP address of the primary Gateway Node on the customer network, as found in the Configuration.txt file. When prompted, enter the password of the primary Gateway Node as found in the Passwords.txt file. (A worked example of steps 2 through 13, using hypothetical values, appears at the end of this section.)
14. Remove the temporary files on the primary Admin Node. Enter
(on two lines):
rm -rf /tmp/<GN_hostname>_staging
15. Log onto the primary Gateway Node as root, using the password
provided in the Passwords.txt file (or access the Gateway Node
remotely, as described on page 249).
16. Remove the old heartbeat configuration files. Enter:
rm -f /var/lib/heartbeat/crm/cib.xml.*
17. Change file access permissions for cib.xml and authkeys to ensure
they can only be changed by root. Enter:
chmod 600 /var/lib/heartbeat/crm/cib.xml /etc/ha.d/authkeys
18. Change the ownership of the cib.xml file. Enter:
chown hacluster:haclient /var/lib/heartbeat/crm/cib.xml
19. If you are converting an existing Gateway Node and not adding a
new server, repeat steps 7 to 18 on the supplementary primary.
20. If you are converting an existing Gateway Node and not adding a
new server, run the grid task that imports the updated configuration bundles needed to complete the conversion of the
Gateway Nodes:
a. Log in to the NMS MI, using the Vendor account.
b. Go to <grid root> > Configuration > Tasks.
c. Confirm that no grid tasks are running. The only exceptions
are:
• LDR content rebalancing grid task (LBAL)
• ILM evaluation (ILME)
• LDR foreground verification (LFGV or VFGV)
These grid tasks can run concurrently with the Bundle Import
(BDLI) grid task. If any other grid tasks are running, wait for
them to complete, release their lock, or abort them as appropriate. For more information on grid tasks and resource locking,
see the chapter “Grid Tasks” in the Administrator Guide.
d. In the Pending table, locate the grid task Bundle Import: FCLS,
FSGR. Under Actions, select Start.
e. Click Apply Changes.
The grid task moves to the Active table.
The task continues to execute until it completes or fails. When
the task completes it moves to the Historical list with a Status of
Successful.
While active, the progress of the grid task can be charted via the CMN > Grid Tasks > Reports > Charts page.
After the grid task completes, the required configuration changes are
complete.
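As an illustration only, steps 2 through 13 might look like the following for a hypothetical primary Gateway Node named gw-dc1-1 whose customer-network IP address is 192.168.50.11, using the SAID package name shown in the sample output above. The hostname, IP address, and unzipped directory name are assumptions made for this example; substitute the values for your own grid.

    # On the primary Admin Node, as root:
    mkdir /tmp/autoyast
    backup-to-usb-key /tmp/autoyast          # prompts for the provisioning passphrase
    cd /tmp/autoyast
    unzip GID170179_REV1_SAID.zip
    mkdir /tmp/gw-dc1-1_staging
    cd GID170179_REV1_SAID/Grid_Activation   # assumes the package unzips to a directory of the same name
    cp gw-dc1-1_autoinst.xml /tmp/gw-dc1-1_staging
    cd /tmp/gw-dc1-1_staging
    /usr/local/gpt/bin/autoyast_extractor.rb gw-dc1-1_autoinst.xml tmp.tar.gz
    tar xvfz tmp.tar.gz
    tar xpf provisioned_files.tar
    scp /tmp/gw-dc1-1_staging/var/lib/heartbeat/crm/cib.xml 192.168.50.11:/var/lib/heartbeat/crm/
    scp /tmp/gw-dc1-1_staging/etc/ha.d/ha.cf /tmp/gw-dc1-1_staging/etc/ha.d/authkeys 192.168.50.11:/etc/ha.d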
Start Primary Gateway Nodes
Start the primary Gateway Node (which is now the main primary
Gateway Node in the cluster) before the supplementary
Gateway Node to ensure that the nodes assume their default roles in
the grid.
You did not shut down the secondary Gateway Node as part of this
procedure. You do not have to restart it.
Procedure
1. Start the main primary Gateway Node:
a. Return to the command shell of the Primary Gateway Node
(the Main Gateway Node in the cluster).
b. Remove the DoNotStart file. Enter: rm /etc/sv/fsg/DoNotStart
c. Restart services on the Gateway Node. Enter:
/etc/init.d/servermanager restart
It takes a few minutes for services to start.
d. To monitor the heartbeat service and the services it is starting,
enter: watch -n <time> crm_mon -1
where <time> is the length (in seconds) between refreshes
Refresh in 2s...
============
Last updated: Fri Jun 15 01:30:17 2012
Current DC: 99-14 (4f43d4ec-cbe8-8d1b-d50d-58bc47490a64)
1 Nodes configured.
2 Resources configured.
============
Node: 99-14 (4f43d4ec-cbe8-8d1b-d50d-58bc47490a64): online
Resource Group: res_group
    fsgbase          (ocf::Bycast:FSGBase):        Started 99-14
    fsgactive        (ocf::Bycast:FSGActive):      Started 99-14
    Filesharing-NFS  (ocf::Bycast:NFSServer):      Started 99-14
    Filesharing-CIFS (ocf::Bycast:Samba):          Started 99-14
Clone Set: failcountcheck_clone
    failcountcheck:0 (ocf::Bycast:FailcountCheck): Started 99-14
NOTE If the StorageGRID system is installed using SLES 10, when
crm_mon shows that all services are started, press <Ctrl>+<C>
to return to the command line.
2. To reduce the length of the service outage, you can restore client
access to the main primary Gateway Node now.
NOTE A major alarm “Cluster Status: Vulnerable (FCSS)” appears in the
NMS when you start the Gateway Node. This alarm persists until
you complete the conversion and the supplementary
Gateway Node is functional.
You can store and retrieve files from the main primary while you
complete the conversion procedure. You cannot fail over from the
main primary to the supplementary primary until the conversion
procedure is complete.
3. If you added a new Gateway Node to the replication group to act
as the supplementary Gateway Node, start software now:
a. Start GDU. See “Start GDU” on page 211.
b. In the Servers panel, select the server and confirm that its
current state is Available.
c. In the Tasks panel, select Enable Services, and then select Start
Task in the Actions panel and press <Enter>.
Wait for the task to complete.
d. Close GDU. See “Close GDU” on page 216.
e. It takes a few minutes for services to start. When the node
reports all services are verified or running, the FSG service on
this new Gateway Node performs a start-up scan and discovers
that its file system must be synchronized with the file system
on the main primary.
The new FSG automatically populates its file system from the
last backup and the subsequent replication sessions. This
process may take several hours to complete.
While it is in progress, in the NMS MI on the FSG > Replication > Overview page, the Replication Status remains in the Starting state, and the FSG > Client Services > Heartbeat status is Stopped.
f. Monitor the progress of the synchronization of the file system.
• Go to FSG > Backup > Overview, and monitor the Percent Complete value in the Restore section. You can chart this attribute to estimate when the restoration of the file system from the backup file will complete.
This step can take several hours to complete.
NOTE While the restoration is in progress, the FSG > Replication > Cluster Status is Vulnerable. This is normal. The alarm clears when the restoration of the file system completes.
• After the restore completes, go to FSG > Backup > Overview and ensure that the Secondary Active Session is not zero. When Operations Not Applied stops decreasing for at least two minutes, the outstanding replication session messages have been processed.
4. If you converted an existing secondary Gateway Node to be a part
of the HAGC:
a. Return to the command shell of this Gateway Node.
b. Remove the DoNotStart file. Enter: rm /etc/sv/fsg/DoNotStart
c. Restart services on the node. Enter:
/etc/init.d/servermanager restart
It takes a few minutes for services to start.
When you convert a secondary Gateway Node, its file system
is already synchronized with that of the other primary.
5. Monitor the status of the cluster from the command shell of the
second Gateway Node. Enter: watch -n <time> crm_mon -1
where <time> is the length (in seconds) between refreshes
When both servers in the cluster are running normally, 2 Nodes
configured appears, and both servers are listed as being online.
NOTE If the StorageGRID system is installed using SLES 10, when
crm_mon shows that all services are started, press <Ctrl>+<C>
to return to the command line.
6. Use the NMS MI to confirm that each Gateway Node has assumed
its new role:
a. Use the service laptop to connect through the customer’s
network to the NMS MI and log in using the "Vendor" account.
For more information, see “NMS Connection Procedure” on
page 246.
b. Go to the first (former primary) Gateway Node > FSG > Replication > Overview, and ensure that the:
• Configured Role is Main Primary
• Current Role is Active Primary
• Cluster Status is Normal
c. Go to the second (new or former secondary) Gateway Node > FSG > Replication > Overview, and ensure that the:
• Configured Role is Supplementary Primary
• Current Role is Standby Primary
• Cluster Status is Normal
Update File Share Integration
Before clients can use the new HAGC, you must update the file share
configuration on the supplementary Gateway Node. Check grid
specific documentation for information on whether the grid has CIFS
file shares, NFS file shares, or both configured for the replication
group.
Supplementary Gateway Node is New
In an HAGC, if the supplementary Gateway Node is new, you must
synchronize file shares on this grid node with the file shares on the
main primary Gateway Node to enable failover between the clustered
Gateway Nodes. You must wait until the restoration of the file system
on the new supplementary Gateway Node is complete before synchronizing the file shares. For the procedure, see “Update File Share
Integration” on page 68.
Supplementary Gateway Node is Converted
If the supplementary Gateway Node was previously a secondary in the
same replication group (and has been converted to a supplementary), it
already has a copy of the file shares that exist on the main primary.
However, its file shares may have been configured as read-only versions
of the shares on the primary, if the secondary was not configured to
permit business continuity failover.
In an HAGC, file shares on the supplementary must have the same read-write permissions as those on the main primary Gateway Node; that is, there must be at least one "writable" user to enable data to be saved to the file share by a client.
Check grid specific documentation for information on whether the pre-conversion grid supported "business continuity failover" in this replication group, and whether the replication group has CIFS file shares, NFS file shares, or both. If the shares on the secondary Gateway Node were configured read-write, no action is necessary. If the shares were configured read-only, update the file shares on the supplementary to be read-write as described in the procedure below. If you are not sure if the shares on the secondary were configured read-only (pre-conversion), follow the procedures for CIFS and NFS file shares below to check.
Check CIFS File Shares
1. At the supplementary Gateway Node server, access a command
shell and log in as root using the password listed in the Passwords.txt
file.
2. Start the CIFS configuration utility. Enter: config_cifs.rb
# config_cifs.rb
CIFS Config Utility
-----------------------------------------------------------------------
| Shares                  | Authentication          | Config          |
-----------------------------------------------------------------------
| add-filesystem-share    | set-authentication      | validate-config |
| add-audit-share         | set-netbios-name        | pull-config     |
| enable-disable-share    | join-domain             | push-config     |
| remove-share            | add-password-server     | help            |
| add-user-to-share       | remove-password-server  | exit            |
| remove-user-from-share  | add-wins-server         |                 |
| modify-group            | remove-wins-server      |                 |
-----------------------------------------------------------------------
Enter command: _
3. Check the file shares configured on the Gateway Node. Enter:
validate-config
Enter command: validate-config
Load smb config files from /etc/samba/smb.conf
Can't find include file /etc/samba/includes/cifs-interfaces.inc
Can't find include file /etc/samba/includes/cifs-custom-config.inc
Processing section "[sharetwo]"
Server's Role (logon server) NOT ADVISED with domain-level security
Loaded services file OK.
WARNING: passdb expand explicit = yes is deprecated
Server role: ROLE_DOMAIN_BDC
Press enter to see a dump of your service definitions
[global]
workgroup = NTDOMAIN
netbios name = NETBIOSSERVER
server string = StorageGRID archive
security = DOMAIN
obey pam restrictions = Yes
password server = 192.168.130.41
passdb backend = tdbsam:/var/lib/samba/passdb.tdb
passwd program = /usr/bin/passwd %u
passwd chat = *Enter\snew\sUNIX\spassword:* %n\n
*Retype\snew\sUNIX\spassword:* %n\n .
syslog = 0
log file = /var/local/log/samba/log.%m
max log size = 1000
name resolve order = wins bcast
max stat cache size = 51200
domain logons = Yes
os level = 65
preferred master = Yes
domain master = Yes
dns proxy = No
panic action = /usr/share/samba/panic-action %d
invalid users = root
create mask = 0700
directory mask = 0700
include = /etc/samba/includes/share-sharetwo.inc
[sharetwo]
path = /fsg/sharetwo
valid users = user1, user2, @Group1
write list = user1, @Group1
force user = fsg-client
force group = nogroup
browseable = No
include = /etc/samba/includes/share-audit-export.inc
Press return to continue.
The file shares configured on the Gateway Node are at the bottom
of the listing. Each share name is enclosed in square brackets,
followed by information about the share. Users and groups with
read-write permission are shown in the write list.
In the example above, the file share named "sharetwo" has a read-write user named "user1".
4. Examine the listing of shares. If each share has at least one user in
the write list, you do not have to update the file share configuration on the supplementary Gateway Node.
5. If the shares do not have writable users, update the file share configuration. Enter: pull-config
A numbered list of the other FSG services in the replication group
is displayed.
6. Enter the number of the active primary from the list. Enter:
<number>
7. When prompted to make all shares read-only, enter: No
Shares on the supplementary Gateway Node must be read-write,
to permit clients to continue to save objects to the grid during a
Gateway Node failover.
8. If you are prompted to sync authentication configuration information, enter: y
9. If you are prompted to sync any custom configuration, enter: n
Enter y if the Samba configuration has been hand-edited for this
replication group to meet particular customer needs, or the file
share supports ACLs.
10. If you are prompted with the following message:
The authenticity of host 'hostname (xx.xx.xx.xx)' can't be
established. RSA key fingerprint is
d9:2b:ea:86:fd:90:3d:a1:f7:e9:d5:f6:d5:30:75:38.
Are you sure you want to continue connecting (yes/no)? “
Enter: yes
The following warning is displayed: Warning: Permanently added
'xx.xx.xx.xx' (RSA) to the list of known hosts.
11. If prompted for the password of the remote Gateway Node, enter
the password, as found in the Passwords.txt file. Enter: password
The server opens an ssh connection with the primary
Gateway Node and pulls a copy of its file shares to the supplementary Gateway Node.
The script responds “Configuration updated” when it is done.
12. When prompted, press <Enter>.
The CIFS configuration utility is displayed.
13. Log out of the CIFS configuration utility. Enter: exit
If the server hosting the Gateway Node also hosts an Admin Node that
has an audit share configured, you must recreate the audit share if you
used the pull-config option. For more information, see the Administrator
Guide.
Check NFS File Shares
1. At the supplementary Gateway Node server, access a command
shell and log in as root using the password listed in the Passwords.txt file.
2. Start the NFS configuration utility. Enter: config_nfs.rb
# config_nfs.rb
NFS Config Utility
----------------------------------------------------------------
| Shares                | Clients               | Config          |
----------------------------------------------------------------
| add-nfs-share         | add-ip-to-share       | validate-config |
| add-audit-share       | remove-ip-from-share  | pull-config     |
| enable-disable-share  |                       | push-config     |
| remove-share          |                       | refresh-config  |
| set-share-uid-gid     |                       | help            |
|                       |                       | exit            |
----------------------------------------------------------------
Enter command:
3. Check the file shares configured on the Gateway Node. Enter:
validate-config
NFS file shares configured on the Gateway Node are listed, along
with the IP addresses permitted to access them. IP addresses with
read-write access are indicated with “rw”, while IP addresses with
read-only access are indicated with “ro”.
In the example below, the file share "/fsg/mydirectory" has one read-write user and one read-only user.
Enter command: validate-config
# See the exports(5) manpage for a description of the syntax of this file.
# This file contains a list of all directories that are to be exported to
# other computers via NFS (Network File System).
# This file used by rpc.nfsd and rpc.mountd. See their manpages for details
# on how make changes in this file effective.
/tmp 127.0.0.1(ro,all_squash)
/fsg/mydirectory 99.99.99.99/12(rw,all_squash,anonuid=6000,sync)
                 99.99.99.98(ro,all_squash,anonuid=6000,sync)
4. Examine the listing of shares. If each share has read-write (rw)
access, you do not have to update the file share configuration on
the supplementary Gateway Node.
5. If the shares on the supplementary Gateway Node are read-only,
you must update their configuration to make the shares read-write.
Enter: pull-config
A numbered list of the other Gateway Nodes in the replication
group is displayed.
6. Enter the number of the active primary. Enter: <number>
7. When prompted to make all shares read-only, enter: No
8. If you are prompted with the following message:
The authenticity of host 'hostname (xx.xx.xx.xx)' can't be
established. RSA key fingerprint is
d9:2b:ea:86:fd:90:3d:a1:f7:e9:d5:f6:d5:30:75:38.
Are you sure you want to continue connecting (yes/no)? “
Enter: yes
The following warning is displayed: Warning: Permanently added
'xx.xx.xx.xx' (RSA) to the list of known hosts.
9. If prompted for the password of the remote Gateway Node, enter the password, as found in the Passwords.txt file. Enter: password
The server opens an ssh connection with the main primary
Gateway Node and pulls a copy of its file shares to the supplementary Gateway Node.
The script responds “Configuration updated” when it is done.
10. When prompted, press <Enter>.
The NFS configuration utility is displayed.
11. Log out of the NFS configuration utility. Enter: exit
If the server hosting the Gateway Node also hosts an Admin Node that
has an audit share configured, you must recreate the audit share if you
used the pull-config option. For more information and the procedure, see
the Administrator Guide.
Confirm Configuration
1. At the supplementary Gateway Node server, access a command
shell and log in as root using the password listed in the Passwords.txt
file.
2. Start either the CIFS or NFS configuration utility. Enter:
config_cifs.rb
— or —
config_nfs.rb
3. Enter: validate-config
4. Review the list of file shares and ensure that they are the same as
those on the main primary.
5. When prompted, press <Enter>.
The CIFS or NFS configuration utility is displayed.
6. Log out of the configuration utility. Enter: exit
7. Log out of the supplementary Gateway Node. Enter: exit
Copy the FSG Cache
If you added a new Gateway Node, you can optionally copy the FSG
cache from the main Gateway Node in the same replication group to
the new Gateway Node. Copying the FSG cache can speed up retrieval
access to files on the FSG during a failover. For more information and
the procedure, see the Maintenance Guide.
Test Failover in the Cluster
Follow the instructions in this section to test that the cluster is fully
functional by testing failover.
1. Log in to the NMS MI and monitor the status of the clustered
servers throughout the failover.
2. Go to the Main Primary Gateway Node > FSG > Replication > Overview page.
The server’s Current Role should be Active Primary.
3. If installed on physical servers, remove the cable that connects the
main primary Gateway Node to the customer network.
4. If installed on virtual machines:
a. At the main primary Gateway Node server, access a command
shell and log in as root using the password listed in the Passwords.txt file.
b. Enter: /etc/init.d/fsg restart
5. After a short period of time, the cluster should fail over and display the following information on the FSG > Replication > Overview page of each server in the cluster.
Table 5: Cluster Information

                       First Gateway Node     Second Gateway Node
Configured Role        Main Primary           Supplementary Primary
Current Role           Standby Primary        Active Primary
Replication Status     Normal                 Normal
Cluster Status         Normal                 Normal
Failover Count (a)     0                      1

a. The Failover Count attribute increments by one on the new active primary Gateway Node.
The second Gateway Node in the cluster (the former secondary) is
now the Active Primary and provides gateway service to clients. (If
clients have already actively begun saving or retrieving data to the
grid, note that NFS file shares may need to be remounted after a
failover.)
6. If installed on physical servers, replace the network cable removed
in step 3.
7. Manually fail back to the main primary:
a. Go to the supplementary primary Gateway Node > FSG > Client Services > Configuration > Main.
b. Change the Client Services State to Stopped.
c. Click Apply Changes.
d. Go to FSG > Client Services > Overview and confirm that the services were stopped.
8. Confirm that the main primary Gateway Node is now the active
primary:
a. Go to the first Gateway Node > FSG > Replication > Overview.
b. Confirm that the main primary’s Current Role has switched back
to Active Primary and the Cluster Status is now Vulnerable.
The Failover Count attribute increments by one on the active
primary Gateway Node.
9. Return the supplementary primary Gateway Node to service:
a. Go to the supplementary primary Gateway Node > FSG > Client Services > Configuration > Main.
b. Change Client Services State to Running.
c. Click Apply Changes.
d. Go to FSG > Client Services > Overview and confirm that the services have started.
10. Go to the supplementary Gateway Node > FSG > Replication > Overview > Main and confirm that the Current Role of the supplementary Gateway Node is Standby Primary and the Cluster Status has returned to Normal.
11. After you have confirmed that the HAGC is operating normally,
restore client access to the grid.
12. If desired, you can also test failover from the HAGC to the secondary Gateway Node:
a. In the NMS MI, go to Grid Management > FSG Management > <replication_group> > Configuration > Settings.
b. Change Primary FSG to make the configured secondary FSG act as the active primary.
c. Click Apply Changes.
d. Go to the Secondary FSG > Replication > Overview.
e. Verify that the Current Role attribute reports "Active Primary".
If the grid is configured for business continuity failover, swapping roles and redirecting client applications permits you to maintain full file system access for clients to the grid should the HAGC become unavailable.
f. Go back to Grid Management > FSG Management > <replication_group> > Configuration > Settings.
g. Change the Primary FSG back to the original primary FSG.
h. Click Apply Changes.
i. Go to the main primary FSG > Replication component, Overview tab, and confirm that it has assumed the role Active Primary.
The conversion procedure is complete.
Restore Client Access to Gateway Nodes
The new HAGC is now active, and ready to provide gateway access to
clients via the virtual IP address of the cluster. If you did not restore
access to clients earlier in the conversion procedure, you can now
restore client access to the grid.
6  Hardware Refresh
Replacing an existing server with a new server and
moving combined grid nodes onto virtual machines
Overview
NOTE A hardware refresh is a type of grid expansion; it is not a maintenance procedure. For more information, see "Hardware Refresh vs Server Recovery" on page 124.
Performing a hardware refresh is the process of replacing an existing
server with a new server and is typically done to increase server data
storage capacity, improve grid performance, or move grid nodes from
physical servers to virtual machines. A basic hardware refresh procedure involves adding a new expansion grid node hosted by a new
server, copying data from the original grid node to the new expansion
grid node, decommissioning the original grid node, and then retiring
the old hardware.
This chapter explains how to refresh hardware for:
• Admin Nodes
• Control Nodes
• Gateway Nodes
• Storage Nodes
• Admin/Gateway Nodes
• Control/Storage Nodes
• Gateway/Control/Storage Nodes
• Admin/Gateway/Control/Storage Nodes (physical server only)
NOTE Refreshing a combined grid node directly to virtual machines is not supported; combined grid nodes must first be split into individual grid nodes, as described below.
You can refresh Admin Nodes, Control Nodes, Gateway Nodes, and
Storage Nodes directly from physical servers onto virtual machines.
However, you cannot refresh combined grid nodes, such as Admin/
Gateway Nodes, from physical servers onto virtual machines. Because
virtual machines do not support combined grid nodes, you must first
“split” combined grid nodes into single grid nodes. For example, a
Control/Storage Node is split into a Control Node and a Storage Node,
each hosted by a separate virtual machine and the original Control/
Storage Node is removed from the grid topology. For more information on refreshing combined grid nodes to virtual machines, see
“Refresh Combined Grid Nodes to Virtual Machines” on page 124.
NOTE Always perform the hardware refresh procedure under the supervision of authorized and trained personnel.
Hardware Refresh vs Server Recovery
Refreshing hardware is not the same as recovering a server. Use the
hardware refresh procedure when you want to replace server
hardware with newer hardware or move combined grid nodes to
virtual machines. Use the recovery procedure when a server fails. For a
server recovery procedure, see the Maintenance Guide.
Table 6: Server Recovery vs Hardware Refresh

                  Recovery                            Hardware Refresh
When to use       To recover a server that has        To replace server hardware or move a
                  failed                              combined grid node to virtual machines
Node ID           Same node IDs as on old server      Different node IDs from old server
Server location   New server must be at same          New server can be at a different
                  location as old server              location
Data arrays       New data arrays are configured      Hardware configuration and storage
                  exactly as those of the original    capacity of the new server may be
                  server                              different
Refresh Combined Grid Nodes to Virtual Machines
Combined grid nodes cannot be hosted by virtual machines. To move
combined grid nodes to virtual machines, you must perform a
hardware refresh and split the combined grid node into individual
grid nodes. Each individual grid node is then hosted by a virtual
machine. The following table lists the combined grid nodes that can be
split and refreshed to virtual machines.
Table 7: Results of Combined Grid Nodes Refresh

Combined Grid Node                 Split Grid Nodes
Admin/Gateway Node                 Admin Node (with or without a CMN)
                                   Gateway Node
Control/Storage Node               Control Node
                                   Storage Node
Gateway/Control/Storage Node       Gateway Node
                                   Control Node
                                   Storage Node
NOTE Refreshing an Admin/Gateway/Control/Storage Node to virtual machines is not supported.
The procedure to refresh combined grid nodes to virtual machines
runs parallel to the standard hardware refresh procedures as described
below. To refresh combined grid nodes to virtual machines, follow the
hardware refresh procedures as listed in Table 9 on page 137. Pay particular attention to the grid node type to which you are applying a
procedure. After a combined grid node is split into separate grid
nodes, follow procedures for the resulting individual grid node type
and not the original combined grid node.
Prepare for a Hardware Refresh
Grid Topology Description
Before you begin a hardware refresh, ensure that you have a description of the grid topology before and after the hardware refresh. Create
a complete, updated description of the final configuration of the grid
before proceeding. Use this description as a guide to determine what
hardware refresh procedures to perform and to determine if, at the
end of this process, the refresh has been successful.
The current grid topology is documented as part of the original
process of commissioning the grid. Additional updated documentation may have been created if the grid has undergone changes since it
was first commissioned. As well, consult the grid configuration pages
in the Doc directory of the latest revision of the grid’s SAID package.
If you are performing a hardware refresh of combined grid nodes
hosted on physical servers to grid nodes hosted by virtual machines,
determine the grid node types that will result from “splitting” the
combined grid node into individual grid nodes. Refreshing a combined grid node to virtual machines results in a new grid
topology. For example, the NMS MI’s Grid Topology tree is updated:
the original combined grid node is removed and the new grid nodes
are added.
Impact on Grid Operations
Grid tasks associated with hardware refresh procedures have a lower
priority than tasks associated with normal grid operations. Hardware
refresh should not interfere with normal grid operations and, in most
cases, does not need to be scheduled for a period of grid inactivity.
During certain stages of the hardware refresh grid task, you cannot
perform the following operations:
• Add or remove other servers
  You cannot run the grid task Expansion: Add Server <hostname> (GEXP). However, you can use GDU to start the grid software (enable services) on a newly expanded node while a refresh task is running.
• Change the grid topology, for example, add sites, change IP addresses, move servers, change the group IDs of servers, and so on
• Change the ILM policy
Plan a hardware refresh for a time when these activities will not be
necessary. Make any changes to grid topology associated with the
hardware refresh before you begin the hardware refresh process.
The Storage Node/Control Node Hardware Refresh task (CSRF) and the LDR Decommissioning task (LDCM) release their locks once the actual data migration phase starts. In other words, you can run another task, for example the Grid Expansion task (GEXP), during the migration phase of the Storage Node Hardware Refresh or LDR Decommissioning task as long as there is no other resource contention. Use the NMS MI to determine the phase of the grid task. For more information on mutually exclusive grid tasks, see the Administrator Guide.
RAID
When performing a hardware refresh, note that the optional RAID
component of the SSM service is not installed on the new server. The RAID component is obsolete in a virtualized environment and is no longer supported. For more information on the RAID component of the SSM service, see the Administrator Guide.
Servers with Control Nodes
The following guidelines apply to any server that hosts a
Control Node, including Control/Storage Nodes, Gateway/Control/
Storage Nodes, and Admin/Gateway/Control/Storage Nodes.
Number of Control Nodes Being Refreshed
When refreshing multiple Control Nodes, the grid specification file is updated only once and the grid is provisioned only once; however, the procedure to refresh a Control Node must be performed one Control Node at a time, not in parallel with other Control Node hardware refreshes.
Metadata Consolidation
This hardware refresh procedure does not support Control Node
consolidation — that is, moving metadata from two or more
Control Nodes onto a single Control Node.
Content Availability
During a Control Node hardware refresh, the metadata on the source
Control Node and target Control Node is not available. To prevent an
interruption to grid services while the refresh is in progress, at least
two other Control Nodes equivalent to the Control Node being
refreshed must remain available in the grid while the hardware refresh
is in progress. That is, in a grid that uses metadata replication, two
Control Nodes from the same CMS replication group must remain
available. In a grid that uses metadata synchronization, two
Control Nodes from the same generation must remain available.
NOTE Metadata synchronization is deprecated.
In a grid with only two Control Nodes, plan a service outage for the
duration of the Control Node refresh procedure. The procedure to
refresh a Control Node renders it unavailable: operating the grid with
a single active Control Node renders metadata vulnerable to loss if the
active Control Node fails.
Metadata Capacity
The new Control Node must have at least the same database capacity
as the original Control Node, as the refresh process clones the
database from the original Control Node to the new Control Node.
CMS Database Expansion
A grid's object capacity may be increased by expanding the size of the
CMS database to a maximum of 800 GiB. This increase is achieved by
performing a Control Node hardware refresh of all Control Nodes in a
replication group.
Metadata Synchronization vs Metadata Replication
Both a CMS database that uses metadata replication and a CMS
database that uses metadata synchronization can be expanded by performing a hardware refresh. However, a CMS database that uses
metadata synchronization cannot use the new free tablespace. To use
the new free tablespace, a CMS database that uses metadata synchronization must be converted to metadata replication and then Object
Location Indexing must be enabled.
For a CMS that uses metadata synchronization, read-only thresholds
remain the same after expansion. For example, a 200 GiB CMS
database will go read-only at 180 GiB (90% full). After expanding the
CMS database to 800 GiB, it will still go read-only at 180 GiB. To utilize
the new 540 GiB of free tablespace, the CMS must be converted to use
metadata replication and have Object Location Indexing enabled.
After conversion, the CMS will go read-only at 720 GiB (90% full).
A currently read-only Control Node that uses metadata synchronization remains read-only after a hardware refresh that expands the CMS
database. The Control Node cannot use newly added free tablespace
until it is converted to use metadata replication and Object Location
Indexing is enabled. Read-only and read-write Control Nodes using
metadata replication before expansion and where Object Location
Indexing is enabled are read-write immediately and are able to use
newly added free database space.
CMS Metadata Replication Groups
In a grid that uses metadata replication, the new Control Node must
be in the same CMS metadata replication group as the original
Control Node. This ensures that the grid maintains full redundancy
and can continue to make the required number of copies of metadata.
Storage Pool/Link Cost Group
When refreshing a Control Node in a grid where the CMS services use
metadata synchronization, you can place the new Control Node in a
different link cost group than the original Control Node, which may
be desirable if you want to move it to a new location. When moving
Control Nodes, plan the refresh to ensure that the grid retains sufficient redundancy in terms of the number and location of copies of
metadata.
NOTE Metadata synchronization is deprecated.
In a grid that uses metadata replication, metadata follows content. For
every copy of an object made in a storage pool, the grid makes a copy
of its metadata in the same storage pool. If you refresh a Control Node
into a different storage pool (that is, if you change its link cost group)
during hardware refresh, it may not be possible to satisfy the ILM for
newly ingested objects after refresh is complete. Moreover, the existing
metadata locations on the refreshed Control Node may no longer be in
the correct storage pools to satisfy the grid’s ILM policy.
Before you move a Control Node that uses metadata replication to a
different group as part of a hardware refresh, carefully consider the
ILM policy and the impact of moving the Control Node. After you
complete the refresh, perform an ILM re-evaluation to harmonize
metadata locations and ensure that the ILM for all objects is satisfied
for both content and metadata. For more information on ILM policies
and their impact on the distribution of metadata, see the “Information
Lifecycle Management” and the “Content Management System
(CMS)” chapters of the Administrator Guide.
Impact on Grid Operations
The hardware refresh process for a Control Node involves cloning the
CMS database from the original Control Node to the new
Control Node. Cloning takes about four hours to complete. During
this time, neither the source nor the target Control Node is available to
the grid. There is no impact on grid operations other than the original
Control Node being unavailable during the refresh procedure.
Control Node Failure and Recovery
If an active Control Node fails while Control Node refresh is in progress, recover the failed Control Node from one of the other
Control Nodes that remain active.
Servers with Storage Nodes
The following guidelines apply to any server that hosts a
Storage Node, including Control/Storage Nodes, Gateway/Control/
Storage Nodes, and Admin/Gateway/Control/Storage Nodes.
Storage Consolidation
Storage Node consolidation — that is, moving content from two or
more Storage Nodes onto a single Storage Node — is not supported by
this hardware refresh procedure.
Content Availability
During a Storage Node hardware refresh, all content is available for
retrieval. However, the original and the replacement Storage Nodes
are not available for writing until the procedure completes.
WARNING Do not change the state of the Storage Node until the
hardware refresh procedure completes. The
Storage Node must not be placed in a state where
objects can be written to it until after the hardware
refresh procedure completes.
Storage Capacity
Identify the storage capacity on the existing Storage Node server, and
ensure that the new Storage Node server has enough capacity to store
existing content.
1. In the NMS MI, go to <Storage Node> > LDR > Storage > Overview > Main.
2. View the Storage Volume Balancing attribute to determine whether
the Storage Node is set to Manual or Auto.
3. Identify the disk space:
a. For Storage Nodes with a Storage Volume Balancing attribute of
Manual, view the Total Usable Space attribute. The new
Storage Node server must have the same or greater amount of
disk space.
—or—
a. For Storage Nodes with a Storage Volume Balancing attribute of
Auto, view the ID attribute to identify the number of volumes
on the existing server. The new Storage Node server must have
the same or greater number of volumes.
b. View the Total Usable Space attribute. The new Storage Node
server must have the same or greater amount of disk space on
each volume.
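As a supplementary spot check from the command line, you can compare the per-volume capacity reported on the existing and new Storage Node servers. A minimal sketch, assuming the object-store volumes are mounted under /var/local/rangedb (this path is an assumption; substitute the volume mount points used in your deployment):

# Show the size and usage of each object-store volume on this server
df -h /var/local/rangedb/*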
Storage Pools
The new Storage Node must be in the same storage pool as the original
Storage Node to ensure that the ILM policy continues to be met once
data migration is complete. For more information on storage pools, see
the Administrator Guide.
Number of Storage Nodes Being Refreshed
It is possible to refresh more than one Storage Node at once and run
multiple Storage Node refresh operations in parallel. The
Storage Node pairs must be unique to each operation. In other words,
one Storage Node cannot be the source Storage Node for two operations or the destination Storage Node for two operations.
You can run two or more refresh tasks at once provided there is
enough writable storage left in the grid to meet the ILM policy while
the tasks are running.
Object Mapping
If refreshing an upgraded Storage Node with LDRs whose Storage Volume Balancing mode is Manual (see LDR > Storage > Main), the hardware refresh does not upgrade object mapping. The Storage Volume Balancing mode does not change to Auto. Objects continue to be mapped to object stores using CBIDs; they are not mapped to volume IDs.
Impact on Grid Operations
In general, a Storage Node hardware refresh completes more quickly
when the grid is quiet or if only one server is being refreshed at a time.
The rate will be affected by the load on the grid if content is retrieved
from the Storage Node that is being refreshed. The refresh process is
throttled by a limit on the number of files that it will process concurrently (the limit is set to four). There is no relative prioritization of
client requests over migration requests.
Storage Node hardware refresh typically involves the transfer of large
volumes of data over the network. Although these transfers should not
affect normal grid operations, they may impact the total amount of
network bandwidth consumed by the StorageGRID system.
Hardware Refresh vs Decommissioning
Both Storage Node hardware refresh and Storage Node decommissioning permanently move content from an existing server to a
replacement server. However, decommissioning puts a heavier burden
on the grid than hardware refresh because it causes higher traffic and
requires more operations. For example, during decommissioning, the
Control Nodes process each object to make sure that the ILM policy is
fulfilled correctly. During a hardware refresh, the Control Nodes
simply map all instances of the original Storage Node to the new
Storage Node.
If there is not a one-to-one mapping between the original
Storage Node and the new Storage Node, you may have to use both
the refresh procedure and the decommissioning procedure. For
example, if you install two 20-TB Storage Nodes to replace three 5-TB
Storage Nodes, you use the refresh procedure for two of the 5-TB
Storage Nodes and the decommissioning procedure for the third 5-TB
Storage Node.
Servers with Admin Nodes
Hardware refresh for Admin Nodes is supported for these servers:
• Admin Node that is not part of an HCAC
• Admin/Gateway Node where the Admin Node is not part of an HCAC and the Gateway Node is the secondary in a basic replication group
• Admin/Gateway/Control/Storage Node where the Admin Node is not part of an HCAC and the Gateway Node is the secondary in a basic replication group
NOTE Refreshing an Admin/Gateway/Control/Storage Node to virtual machines is not supported.
Grid tasks managed by the CMN on the old primary Admin/
Gateway Node automatically transfer to the new CMN.
During a hardware refresh procedure of the Admin Node, if the
Admin Node hosts the AMS service, some audit messages will be
written to audit logs on the old Admin Node server, some to the audit
logs on the new server (if it hosts an AMS service), and some to both
servers. You will have to reconcile the log messages manually.
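If you later need a single chronological view of the audit messages, one minimal sketch of the reconciliation follows. It assumes the audit logs are uncompressed plain-text .log files under /var/local/audit/export/ on both servers, that a plain sort of the lines is acceptable for your log format, and that the working paths shown are only examples:

# Copy the old Admin Node's audit logs to a working directory on the new Admin Node
mkdir -p /var/local/tmp/old-audit
scp root@<old_admin_node_ip>:/var/local/audit/export/*.log /var/local/tmp/old-audit/

# Concatenate both sets of logs, sort them, and drop byte-identical lines written to both servers
cat /var/local/tmp/old-audit/*.log /var/local/audit/export/*.log | sort -u > /var/local/tmp/merged-audit.log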
When you refresh the hardware of an Admin/Gateway/Control/
Storage Node:
• Historical attribute data is lost for part of the time period that the refresh is in progress: specifically, attribute data is lost for the time between the completion of the Admin Node decommissioning grid task and the completion of the CMS refresh script.
• The NMS MI is unavailable while the CMS refresh script is running.
Servers with Gateway Nodes
Hardware refresh for Gateway Nodes is supported for these servers:
• Gateway Node where the Gateway Node is the primary or the secondary in a basic replication group
• Admin/Gateway Node where the Admin Node is not part of an HCAC and the Gateway Node is the secondary in a basic replication group
• Gateway/Control/Storage Node
• Admin/Gateway/Control/Storage Node where the Admin Node is not part of an HCAC and the Gateway Node is the secondary in a basic replication group
NOTE Refreshing an Admin/Gateway/Control/Storage Node to virtual machines is not supported.
Materials Checklist
To perform a hardware refresh, ensure that you have the materials
listed in Table 8.
Table 8: Materials Checklist for Hardware Refresh

• Server hardware: This guide does not cover hardware installation.

• SUSE Linux Enterprise Server (SLES) DVD: Use only the supported version of SLES for the Release 9.0 StorageGRID system. For supported versions, see the Interoperability Matrix Tool (IMT).
NOTE Use of any version of SLES other than the correct version will result in an installation failure.

• VMware software and documentation, if refreshing to a virtual machine: For the currently supported versions of VMware software, see the Interoperability Matrix Tool (IMT).

• Provisioning USB flash drive and Backup Provisioning USB flash drive, or backups of provisioning data: If the primary Admin Node is installed on a physical server, get a copy of the most recent provisioning USB flash drives. These flash drives are updated each time the grid is modified.
—or—
If the primary Admin Node is installed on a virtual machine, retrieve the backup copies of the most recent provisioning data. Provisioning data is updated each time the grid is modified.

• Hotfix and/or Maintenance Release CD (or ISO image): Determine whether a hotfix and/or maintenance release has been applied to the grid node type that is being refreshed. During the hardware refresh process, the grid node must be updated to the same hotfix or maintenance release as the other installed grid nodes of the same type. (See the storage-grid-release version number listed on the <grid_node> > SSM > Services > Main page.) To acquire hotfixes and maintenance releases, contact Support.

• Blank USB flash drive: Used to copy the grid specification file from the Admin Node server to the service laptop. Not required if the primary Admin Node is installed on a virtual machine.

• Provisioning passphrase: The passphrase is created and documented when grids are first installed (for StorageGRID 8.0 and higher) or when grids are updated to StorageGRID 8.0. The provisioning passphrase is not in the Passwords.txt file.

• Passwords.txt file for the grid: Available in the SAID package.

• Current grid specification file: This is the file that you will edit to specify the hardware refresh. To retrieve the latest copy of the file, see “Export the Latest Grid Specification File” on page 231. Use Grid Designer to update the grid specification file.

• Documentation: Installation Guide, Administrator Guide, Maintenance Guide, and Grid Designer User Guide.

• Service laptop: The laptop must have a Microsoft® Windows® operating system, a network port, a supported browser for StorageGRID 9.0, and Grid Designer. The version of Grid Designer must be compatible with the version of the StorageGRID software.
Master Procedure
Figure 7 summarizes the main steps to refresh hardware. For links to
procedures, see Table 9. When splitting a combined grid node, after the
grid specification file is edited and the combined grid node is split,
follow procedures for the resulting individual grid nodes.
Figure 7: Master Hardware Refresh Procedure
[Figure 7 is a matrix that maps each hardware refresh step, from editing the grid specification file through retiring the old hardware, to the server types to which it applies (A=Admin, C=Control, G=Gateway, S=Storage, plus the combined CS, AG, GCS, and AGCS nodes). The same steps, with links to the detailed procedures, are listed in Table 9.]
WARNING This high-level procedure omits many details. To avoid
unrecoverable data loss, go to the detailed procedures
and follow them exactly. Read all procedures in full
before attempting to refresh hardware.
Table 9: Hardware Refresh Master Procedure

Server types: A=Admin, C=Control, G=Gateway, S=Storage (combined nodes are shown as CS, AG, GCS, and AGCS).

1. (All) Prepare for a hardware refresh and gather required materials. See page 125 and page 134.
2. (All) Edit the grid specification file. See page 141.
When refreshing a combined grid node to virtual machines, the combined grid node is split into individual grid node types. From this point on, refer to procedures for the resulting individual grid node types and not the original combined grid node. See page 124.
3. (All) Provision the grid and prepare the server activation flash drive. See page 29.
4. (All) Prepare hardware for the expansion servers. See page 28.
5. (All) Complete preparation of the expansion virtual machines or physical servers. See page 179 and page 199.
6. (All) Use GDU to install the software, but do not enable services (that is, do not start grid software). See page 39.
7. (All) Run the grid task Grid Expansion: Initial. See page 44.
NOTE If splitting combined grid nodes, at this point the combined grid nodes are considered split. Going forward, follow procedures for individual grid node types and not for combined grid nodes. For Control Nodes, perform the following steps one Control Node at a time.
8. (All) Run the grid task Grid Expansion: Add Server <servername>. See page 47.
9. (C, CS, GCS) Prevent MySQL and CMS from starting, and then start grid services. See page 143.
10. (AGCS) Prevent the CMS from starting. See page 144.
11. (G) Confirm that samba and winbind package versions are the same for all Gateway Nodes. See page 155.
12. (A, G, S, AG, GCS, AGCS) Start the grid software (enable services). See page 57 and page 216.
NOTE Do not load Configuration bundles on a refreshed primary Admin Node. If you are performing a hardware refresh of a primary Admin Node, disable the Load Configuration option in GDU as described in step 5 on page 59. Close GDU after you enable services.
13. (All) Apply hotfixes and maintenance releases. See page 61.
14. (G, AG, GCS, AGCS) Wait until the Gateway Node file system has been “restored”. See page 61.
15. (A, AG, AGCS) Change the preferred e-mail notification sender. See page 147.
16. (A, AG) Copy the NMS database from the old Admin Node to the new Admin Node. See page 65.
Do not copy the NMS database for an AGCS. The NMS database is copied when the Control Node refresh grid task is run later in this procedure.
17. (A, AG, AGCS) If the server hosts the CMN, put the CMN on the old server in standby and the CMN on the new primary Admin Node online. See page 148.
18. (A, AG, AGCS) Copy the log files located in the directory /var/local/audit/export/ from the old Admin Node to the new Admin Node. See page 73.
19. (G, AG, GCS, AGCS) Copy the FSG cache from the old Gateway Node to the new Gateway Node. This step can take up to four days to complete. See page 155.
20. (A, AG, AGCS) If the server hosts the CMN, load provisioning software on the new server. See page 149.
21. (G, AG, GCS, AGCS) Copy the file share configuration to the new server. See page 156.
22. (G, AG, GCS, AGCS) Designate the new FSG as the backup or primary FSG, depending on the case. See page 157 and page 158.
23. (A, AG, AGCS) Run the grid task to remove NMS bindings. See page 161.
24. (A, G, AG, GCS, AGCS) Run the grid task to decommission the Admin Node, Admin/Gateway Node, or Gateway Node. See page 162.
25. (C, CS, GCS, AGCS) If the grid includes only two read-write CMSs, close all client gateways and connections to prevent client access to the grid. See page 144.
26. (S, C, CS, GCS, AGCS) Run the grid task to refresh the Storage Node, Control Node, or Control/Storage Node hardware. See page 163.
If the server hosts a Control Node, the grid task pauses and prompts you to run the CMS refresh script.
27. (S, CS, GCS, AGCS) Confirm that all persistent content has been copied from the source to the destination LDR using the build-cbid-list script. See page 159.
28. (C, CS, GCS, AGCS) If client access was removed in step 25, re-enable access now. See page 146.
29. (C, CS, GCS, AGCS) Change the CMS deferred time if required. See page 57.
30. (All) Update the grid for the new server. For instance, update the link cost groups. See page 75.
31. (A, AG, AGCS) If the server hosts the CMN, update ssh keys. See page 152.
32. (A, AG, AGCS) If the Audit option is included in the deployment, enable the audit share. See the “File Share Configuration” chapter of the Administrator Guide.
33. (S, CS, GCS, AGCS) Perform foreground verification on the destination LDR. See the “Disk Storage” chapter of the Administrator Guide.
34. (C, CS, GCS, AGCS) If expanding the CMS database, convert Control Nodes to metadata replication. See the “Convert Control Nodes to Metadata Replication” chapter of the Maintenance Guide.
35. (C, CS, GCS, AGCS) If expanding the CMS database, enable object location indexing. Until object location indexing is enabled, the CMS service will become read-only at the same amount of free tablespace as prior to the hardware refresh. See the “Enable Control Nodes for Object Location Indexing” chapter of the Maintenance Guide.
36. (All) Retire the old hardware. See page 168.
Modify the Grid Specification File
This procedure is the first step of a hardware refresh. Use Grid
Designer to update the grid specification file to indicate which server
you are refreshing (this includes splitting a combined grid node). If the
grid was not deployed using Grid Designer, contact Support for
instructions on how to update the grid specification file.
Edit the Grid Specification File Using Grid Designer
For detailed instructions, see the Grid Designer User Guide.
1. Retrieve a copy of the current grid specification file. See “Export
the Latest Grid Specification File” on page 231.
2. Start Grid Designer.
NOTE The version of Grid Designer must be compatible with the version
of StorageGRID software installed on the grid.
3. In Grid Designer, open the grid specification file retrieved in
step 1. Select File > Open.
4. Specify which server is being refreshed and the new hardware:
a. Select Maintenance > Refresh Hardware.
b. In the Hardware Refresh window select a grid node. If a
combined grid node is selected, the Split combination node check
box becomes available.
c. If you are refreshing a combined grid node to virtual machines,
select Split combination node.
d. Click OK.
The Refresh Hardware interface appears. The selected grid
node is removed from Grid Designer’s Grid Topology tree and
replaced with the new refreshed grid node or nodes (if splitting
a combined grid node).
e. Enter required information on the highlighted tabs.
f. Click Finish.
The grid specification file is saved (the revision number
increases by 1).
5. Enter the grid deployment specifications:
a. Select Actions > Deploy Grid.
b. Enter required information on the highlighted tabs.
c. Click Finish.
The updated grid specification file is saved (the revision
number does not change).
For the next step in the procedure, see Table 9: “Hardware Refresh
Master Procedure” on page 137.
Procedures for Servers with a Control Node
NOTE Perform the following procedures one Control Node at a time.
Prevent the CMS Service and MySQL from Starting
Use this procedure to keep the CMS service and MySQL from starting
and to start grid software.
After you disable the CMS service and MySQL, Server Manager cannot start if you use GDU to Enable Services. Therefore, this procedure includes steps to start grid software using postinstall.rb start.
This procedure applies to:
• Control Node
• Control/Storage Node
• Gateway/Control/Storage Node
To determine when to use this procedure while performing a
hardware refresh, see Table 9: “Hardware Refresh Master Procedure”
on page 137.
Prerequisites
• Software has been installed.
• Both expansion grid tasks have been executed.
Procedure
1. Log in to the new Control Node server as root using the password
listed in the Passwords.txt file.
2. Keep the CMS service from starting. Enter:
touch /etc/sv/cms/DoNotStart
3. Manually start services on the server. Enter: postinstall.rb start
4. Prevent MySQL from starting after a restart. Enter:
touch /etc/sv/mysql/DoNotStart
5. Stop the MySQL service. Enter: /etc/init.d/mysql stop
6. Log out. Enter: exit
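Before logging out in step 6, you can optionally confirm that both DoNotStart markers are in place and that MySQL has stopped. A minimal check (it assumes the MySQL server process is named mysqld):

# Both marker files should be listed
ls -l /etc/sv/cms/DoNotStart /etc/sv/mysql/DoNotStart

# Should print nothing if MySQL is stopped
ps -ef | grep [m]ysqld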
For the next step in the hardware refresh procedure, see
Table 9: “Hardware Refresh Master Procedure” on page 137.
Prevent the CMS Service from Starting
Use this procedure to prevent the CMS service from starting, and start
grid services.
After you disable the CMS service, Server Manager cannot start if you
use GDU to Enable Services. Therefore this procedure includes steps to
start grid software using postinstall.rb start. This procedure applies to:
• Admin/Gateway/Control/Storage Node
To determine when to use this procedure while performing a
hardware refresh, see Table 9: “Hardware Refresh Master Procedure”
on page 137.
Prerequisites
• Software has been installed.
• Both expansion grid tasks have been executed.
Procedure
1. Log in to the new server as root using the password listed in the
Passwords.txt file.
2. Enter:
touch /etc/sv/cms/DoNotStart
postinstall.rb start
3. Log out. Enter: exit
For the next step in the hardware refresh procedure, see
Table 9: “Hardware Refresh Master Procedure” on page 137.
Close Client Gateways and Connections
If the grid includes only two read-write CMSs, close all client gateways
and connections to prevent HTTP client access to the grid. If the grid
includes more than two read-write CMSs, skip this procedure.
NOTE This procedure only applies if you are refreshing a server that hosts a
Control Node, and the grid is configured with only two read-write
CMSs.
To determine when to use this procedure while performing a
hardware refresh, see Table 9: “Hardware Refresh Master Procedure”
on page 137.
When you perform hardware refresh on a Control Node, the original
Control Node and the new Control Node are unavailable while the
refresh is in progress. Operating the grid with a single active
Control Node renders metadata vulnerable to loss if the active
Control Node fails. To prevent this loss, you must close all client
gateways and prevent HTTP clients from accessing the grid using the
procedure described below.
WARNING Plan a service outage for the duration of the refresh
procedure, approximately four hours.
Procedure
1. Shut down all Gateway Nodes in each replication group:
a. If a Gateway Node is hosted on the same server as other grid
nodes (that is, the Gateway Node is part of a combined node
such as an Admin/Gateway Node or a Gateway/Control/
Storage Node) stop the FSG and CLB services. Log in to the
command line interface for each Gateway Node, and enter
these commands:
/etc/init.d/fsg stop
/etc/init.d/clb stop
If desired, to prevent the services from starting if the server or
Server Manager is restarted, insert a DoNotStart file for each
one. Enter:
touch /etc/sv/fsg/DoNotStart
touch /etc/sv/clb/DoNotStart
b. If a Gateway Node is not combined with any other grid node
on the same server, use the Server Manager Stop All option to
stop services on all Gateway Nodes.
If prompted, enter the Server Consoles password (used for
Server Manager) listed in the Passwords.txt file.
c. Use the Server Manager console to ensure all services and supporting dependencies (databases in particular) are stopped
before stopping the next server.
2. If the deployment includes clients that access the grid via the StorageGRID API (SGAPI) or CDMI:
a. If clients access the grid via a CLB service, these clients can no
longer access the grid once the gateways are shut down.
b. If clients access LDRs directly, ensure that they do not access
the grid while the CMSs are not available.
At this point all external access to the grid via the gateways is blocked.
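If you created the optional DoNotStart files in step 1a, you can confirm that they are in place on each combined Gateway Node before continuing:

# Both marker files should be listed on each combined Gateway Node
ls -l /etc/sv/fsg/DoNotStart /etc/sv/clb/DoNotStart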
For the next step in the hardware refresh procedure, see
Table 9: “Hardware Refresh Master Procedure” on page 137.
Open Client Gateways
Use this procedure to open client gateways that were closed to prevent
access during the Control Node hardware refresh.
This procedure applies if client access was interrupted, for example
because the grid has only two Control Nodes.
To determine when to use this procedure while performing a
hardware refresh, see Table 9: “Hardware Refresh Master Procedure”
on page 137.
Prerequisites
• Client gateways are closed.
• Control Node hardware refresh grid task has completed successfully.
Procedure
1. If the Gateway Node is hosted on the same server as other grid
nodes (that is, the Gateway Node is part of a combined node such
as an Admin/Gateway Node or a Gateway/Control/Storage Node),
log in at the command line interface of each Gateway Node, and
then:
a. If you optionally created DoNotStart files for the services,
remove them. Enter:
rm /etc/sv/fsg/DoNotStart
rm /etc/sv/clb/DoNotStart
b. Restart the FSG and CLB services. Enter:
/etc/init.d/fsg start
/etc/init.d/clb start
2. If a Gateway Node is not combined with any other grid node on
the same server, use the Server Manager interface to issue the Start
All command.
This opens the primary FSG and CLB gateways for client access.
3. If the deployment includes clients that access the grid via SGAPI or
CDMI:
a. Enable clients that access LDRs directly to access the grid once
more.
b. Clients that connect to the grid via a CLB can access the grid
automatically once the Gateway Nodes are available.
For the next step in the hardware refresh procedure, see
Table 9: “Hardware Refresh Master Procedure” on page 137.
Procedures for Servers with an Admin Node
This section describes procedures that apply to servers that host an
Admin Node.
Change the Preferred E-mail Notifications Sender
Use this procedure to change the preferred NMS e-mail notifications
sender.
This procedure applies to:
• Admin Node
• Admin/Gateway Node
• Admin/Gateway/Control/Storage Node
To determine when to use this procedure while performing a hardware
refresh, see Table 9: “Hardware Refresh Master Procedure” on
page 137.
Prerequisites
• Grid software has been started.
Procedure
1. Log in to the NMS MI of the old Admin Node using the Vendor
account.
2. Confirm that all new services that have joined the grid appear in
green.
3. Go to Grid Management > NMS Management > General > Main.
Figure 8: Changing the Notification Sender
4. Change Preferred Notification Sender to the new NMS that was
added.
5. Click Apply Changes.
For the next step in the hardware refresh procedure, see
Table 9: “Hardware Refresh Master Procedure” on page 137.
Switch CMNs
Use this procedure to put the CMN on the old primary Admin Node
in standby and the CMN on the new primary Admin Node online.
This procedure applies to these servers if they host the CMN:
• Admin Node
• Admin/Gateway Node
• Admin/Gateway/Control/Storage Node
To determine when to use this procedure while performing a
hardware refresh, see Table 9: “Hardware Refresh Master Procedure”
on page 137.
Prerequisites
• NMS database has been copied (optional)
• Preferred e-mail notification sender has been changed
Procedure
1. Log in to the NMS MI using the Vendor account.
2. Go to <old_primary_Admin Node> > CMN > Configuration > Main.
3. Change CMN State to Standby.
4. Click Apply Changes.
Both CMNs in the grid are now in standby, and the alarm CMNA
(CMN Status) is triggered on both CMNs.
5. Go to <new_primary_Admin Node> > CMN > Configuration > Main.
6. Change CMN State to Online.
7. Click Apply Changes.
The CMN on the new Admin Node is now online.
For the next step in the hardware refresh procedure, see
Table 9: “Hardware Refresh Master Procedure” on page 137.
Load Provisioning Software and Provisioning Data
Use this procedure to load grid provisioning data on the new primary
Admin Node.
This procedure applies to these servers if they host the CMN:
• Admin Node
• Admin/Gateway Node
• Admin/Gateway/Control/Storage Node
To determine when to use this procedure while performing a
hardware refresh, see Table 9: “Hardware Refresh Master Procedure”
on page 137.
Primary Admin Node on a Physical Server
Prerequisites
• The CMN on the new Admin Node is online.
Procedure
1. Log in to the new Admin Node server as root. When prompted for
a password, press <Enter>.
2. Install the provisioning software:
a. At the new Admin Node, access a command shell and log in as
root using the password listed in the Passwords.txt file.
b. Mount the StorageGRID 9.0.0 Software CD image, or if applicable, the StorageGRID 9.0.x Software Service Pack CD image.
Enter:
mount -o loop,ro /var/local/install/Bycast_StorageGRID\
_9.0.0_Software_<buildnumber>.iso /cdrom
—or—
mount -o loop,ro /var/local/install/Bycast_StorageGRID\
_9.0.<servicepacknumber>_Software_Service_Pack_\
<buildnumber>.iso /cdrom
c. Install the provisioning software from the service pack ISO.
Enter: /cdrom/load-provisioning-software
d. When prompted, accept the NetApp StorageGRID Licensing
Agreement.
e. When prompted, enter the provisioning passphrase.
3. After the provisioning software has been loaded, restore the provisioning repository:
a. Insert the provisioning USB flash drive.
b. Enter: restore-from-usb-key
c. When prompted, enter the provisioning passphrase.
d. Remove the Provisioning USB flash drive.
4. Back up the provisioning data:
a. Insert the Backup Provisioning USB flash drive.
b. Enter: backup-to-usb-key
c. When prompted, enter the provisioning passphrase.
d. Remove the Backup Provisioning USB flash drive.
5. Store both USB flash drives separately in safe locations.
6. Log out. Enter: exit
WARNING Store the Provisioning USB flash drive and the Backup
Provisioning USB flash drive separately in safe, secure locations, such as a locked cabinet or safe. The USB
flash drives contain encryption keys and passwords
that can be used to obtain data from the grid. The Provisioning USB is also required to recover from a
primary Admin Node failure.
7. Log out of the NMS MI of the old Admin Node.
For the next step in the hardware refresh procedure, see
Table 9: “Hardware Refresh Master Procedure” on page 137.
Primary Admin Node on a Virtual Machine
Prerequisites
• The CMN on the new Admin Node is online.
Procedure
1. Log in to the new Admin Node server as root. When prompted for
a password, press <Enter>.
2. Install the provisioning software:
a. At the new Admin Node, access a command shell and log in as
root using the password listed in the Passwords.txt file.
b. Mount the StorageGRID 9.0.0 Software CD image, or if applicable, the StorageGRID 9.0.x Software Service Pack CD image.
Enter:
mount -o loop,ro /var/local/install/Bycast_StorageGRID\
_9.0.0_Software_<build>.iso /cdrom
—or—
mount -o loop,ro /var/local/install/Bycast_StorageGRID\
_9.0.<servicepack>_Software_Service_Pack_<build>.iso \
/cdrom
c. Install the provisioning software from the service pack ISO.
Enter: /cdrom/load-provisioning-software
d. When prompted, accept the NetApp StorageGRID Licensing
Agreement.
e. When prompted, enter the provisioning passphrase.
3. Create a directory for the provisioning data. Enter:
mkdir -p /root/usb
4. Copy the contents of the Provisioning Media to the /root/usb directory. In particular, copy the /gpt-backup directory and all of its
contents.
For example, use scp to copy the provisioning repository from 
/root/usb on the original Admin Node to /root/usb on the new
Admin Node.
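A minimal sketch of this copy, run from the new Admin Node (the host placeholder and exact paths are examples; adjust them to your environment):

# Copy the provisioning repository, including /gpt-backup, from the original Admin Node
scp -rp root@<original_admin_node_ip>:/root/usb/* /root/usb/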
5. After the provisioning software has been loaded, restore the provisioning repository:
a. Enter: restore-from-usb-key /root/usb
b. When prompted, enter the provisioning passphrase.
6. Back up the provisioning data:
a. Create a directory for the backup provisioning data. Enter:
mkdir -p /var/local/backup
b. Back up the provisioning data. Enter:
backup-to-usb-key /var/local/backup
c. When prompted, enter the provisioning passphrase.
7. Store the contents of the Provisioning directory (found at /root/usb)
and the Backup Provisioning directory (/var/local/backup) separately in a safe place. For example, use WinSCP to copy these
directories to your service laptop, and then store them to two
separate USB flash drives that are stored in two separate, secure
physical locations.
For more information, see “Preserving Copies of the Provisioning
Data” on page 240.
The contents of the Provisioning directory are used during expansion and maintenance of the grid when a new SAID package must be generated.
WARNING Store two copies of the Provisioning directory separately in safe, secure locations. The Provisioning
directories contain encryption keys and passwords
that can be used to obtain data from the grid. The Provisioning directory is also required to recover from a
primary Admin Node failure.
8. Log out. Enter: exit
9. If logged in, log out of the NMS MI of the old Admin Node.
For the next step in the hardware refresh procedure, see
Table 9: “Hardware Refresh Master Procedure” on page 137.
Update SSH Keys
Use this procedure to update the authorized ssh keys on all servers in
the grid. When you refresh a primary Admin Node, you must update
the authorized ssh keys on all servers in the grid to enable ssh access
from the ssh access point on the primary Admin Node. This access is
required to perform maintenance procedures on servers through
GDU.
You can use the standard procedure (manual update) or, for large
grids, the automated procedure.
This procedure applies to these servers if they host the CMN:
• Admin Node
• Admin/Gateway Node
• Admin/Gateway/Control/Storage Node
• Reporting Admin Node in an HCAC
To determine when to use this procedure while performing a
hardware refresh, see Table 9: “Hardware Refresh Master Procedure”
on page 137.
Standard Update Procedure
Prerequisites
• The CMN on the new Admin Node is online.
Procedure
1. Log in to the primary Admin Node as root using the password
listed in the Passwords.txt file.
2. Obtain the public ssh key for the root user from the file 
/root/.ssh/id_rsa.pub on the new primary Admin Node.
3. Log out of the command shell. Enter: exit
4. For each server in the grid other than the primary Admin Node:
a. Log in to the server as root, using the password listed in the
Passwords.txt file.
b. In a text editor, open the file /root/.ssh/authorized_keys
c. Delete the line corresponding to the ssh key of the decommissioned primary Admin Node. Locate the old server by
searching for its IP address.
d. Append the line corresponding to the ssh key of the new
primary Admin Node.
e. Save the file.
f. Log out of the command shell. Enter: exit
5. Repeat step 4 for every server in the grid other than the primary
Admin Node.
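The edit in step 4 can also be scripted on each server. A minimal sketch, assuming the old primary Admin Node's key line contains its IP address and that you have copied the new node's public key to /tmp/new_admin_key.pub (both the match pattern and the file name are assumptions):

# Remove the line containing the old primary Admin Node's key
sed -i '/<old_primary_Admin_Node_IP_address>/d' /root/.ssh/authorized_keys

# Append the new primary Admin Node's public key
cat /tmp/new_admin_key.pub >> /root/.ssh/authorized_keys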
Automated Update Procedure
On a large grid, it may be simpler to perform the following automated
procedure instead of manually editing the authorized ssh keys files on
every server in the grid.
Prerequisites
• The CMN on the new Admin Node is online.
• Network connectivity from the primary Admin Node to all servers in the grid. Any server that does not have connectivity to the primary Admin Node must be updated using the standard update procedure described above.
• The /etc/hosts file contains up-to-date IP addresses.
Procedure
1. Log in to the old Admin Node as root using the password listed in
the Passwords.txt file.
2. Enable passwordless ssh access from the primary Admin Node to
the rest of the grid. Enter: ssh-add
3. If prompted, enter the SSH Access Password listed in the Passwords.txt file.
4. Obtain the public ssh key from the new primary Admin Node.
Enter (on one line):
ssh <new_primary_Admin Node> cat /root/.ssh/id_rsa.pub >
tmp.pub
5. Add the public ssh key of the new primary Admin Node, and
remove the public ssh key of the old primary Admin Node on all
hosts in the grid. Enter:
for i in $(grep -v '^#' /etc/hosts | egrep -v '::|localhost' \
| awk '{print $2}'); do cat tmp.pub | ssh $i \
"sed -i -e '/\b<old_primary_Admin_Node_IP_address>\b/d' \
/root/.ssh/authorized_keys; \
cat >> /root/.ssh/authorized_keys"; done
6. Disable passwordless ssh access and perform clean-up:
ssh-add -D
rm tmp.pub
7. Log out of the command shell of the old primary Admin Node.
Enter: exit
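To verify the update, you can repeat the ssh-add step on the new primary Admin Node and confirm that every server now accepts its key. A minimal sketch that reuses the same host list from /etc/hosts as the loop above (the same assumptions apply):

# Each server should print its hostname; a password prompt or an error indicates a key that was not updated
for i in $(grep -v '^#' /etc/hosts | egrep -v '::|localhost' | awk '{print $2}'); do
    ssh -o BatchMode=yes $i hostname
done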
Procedures for Servers with a Gateway Node
This section describes procedures that apply to servers that host a
Gateway Node.
Confirm Samba and Winbind Versions
Samba and winbind versions must be the same for the refreshed
Gateway Node as for other Gateway Nodes in the same FSG replication group.
1. Log in to the NMS MI using the Vendor account.
2. Go to <refreshed_Gateway Node> > SSM > Overview > Main.
3. Under Services, determine the current version of the following services:
• CIFS Filesharing (nmbd)
• CIFS Filesharing (smbd)
• CIFS Filesharing (winbindd)
4. Verify that all Gateway Nodes in the same FSG replication group
are using the same version.
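As a cross-check against the NMS MI, you can also compare the installed package versions from the command line on each Gateway Node in the replication group. A minimal sketch (the exact package names may vary by SLES release):

# List installed samba and winbind packages and their versions
rpm -qa | egrep -i 'samba|winbind' | sort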
Copy the FSG Cache
When you refresh hardware for a server that hosts a primary
Gateway Node, the FSG cache is lost. To preserve the Gateway Node’s
FSG cache, export it to a temporary file before you retire the old hardware. After you refresh hardware, copy this temporary file to the
refreshed Gateway Node, and import the FSG cache.
If you do not export the FSG cache to a temporary file before you retire
the old hardware, you can copy the FSG cache from another FSG in the
same replication group.
This procedure applies to:
• Gateway Node
• Admin/Gateway Node
• Gateway/Control/Storage Node
• Admin/Gateway/Control/Storage Node
For the procedure, see the Maintenance Guide.
Copy File Share Configuration
Use this procedure to copy file share configuration to the new
Gateway Nodes. This procedure applies to:
• Gateway Node
• Admin/Gateway Node
• Gateway/Control/Storage Node
• Admin/Gateway/Control/Storage Node
To determine when to use this procedure while performing a
hardware refresh, see Table 9: “Hardware Refresh Master Procedure”
on page 137.
Prerequisites
• Grid software has been started (using Enable Services in GDU).
• The restoration of the FSG file system has completed (as described in “Monitor the Restoration of the Gateway Node File System” on page 61).
Procedure
1. At the new Gateway Node server, ensure Server Manager reports
all services are Running.
2. Access a command shell and log in as root using the password
listed in the Passwords.txt file.
3. Start the configuration utility:
• If using CIFS shares, enter: config_cifs.rb
• If using NFS shares, enter: config_nfs.rb
4. Copy the file share configuration to the new Gateway Node using
pull-config. See “Update File Share Integration” on page 68, or for
more information, see the Administrator Guide.
If the server cohosts an Admin Node (hosting the AMS service)
and you require an audit file share, create one manually. The pull-config and push-config options do not copy the audit file share.
5. Join the AD domain for CIFS integration with Active Directory. See
the Administrator Guide.
6. Close the configuration utility. Enter: exit
7. Log out of the command shell. Enter: exit
For the next step in the hardware refresh procedure, see
Table 9: “Hardware Refresh Master Procedure” on page 137.
Designate the New FSG as Backup FSG
Use this procedure to make the FSG on the new server the backup FSG.
This procedure applies to the following servers if the FSG is the secondary in the basic replication group:
• Gateway Node
• Admin/Gateway Node
• Gateway/Control/Storage Node
• Admin/Gateway/Control/Storage Node
To determine when to use this procedure while performing a hardware
refresh, see Table 9: “Hardware Refresh Master Procedure” on page 137.
Prerequisites
• File share configuration has been copied to the new server.
Procedure
1. Log in to the NMS MI using the Vendor account.
2. Go to Grid Management > FSG Management > <Replication Group> > Configuration > Settings.
3. Change Backup FSG to the FSG on the new server.
Figure 9: Designating the Backup FSG
4. Click Apply Changes.
For the next step in the hardware refresh procedure, see
Table 9: “Hardware Refresh Master Procedure” on page 137.
Designate the New FSG as the Primary FSG
Use this procedure to make the FSG on the new server the primary
FSG. This procedure applies to the following servers if the FSG is the
primary in the basic replication group:
• Gateway Node
• Gateway/Control/Storage Node
To determine when to use this procedure while performing a
hardware refresh, see Table 9: “Hardware Refresh Master Procedure”
on page 137.
Prerequisite
• File share configuration has been copied to the new server.
Procedure
1. Log in to the NMS MI using the Vendor account.
2. Go to Grid Management > FSG Management > <Replication Group> > Configuration > Settings.
3. Change Primary FSG to the FSG on the new server.
4. Click Apply Changes.
5. Go to <new server> > FSG > Replication > Overview > Main and verify the following:
• The Failover Count attribute increments by one on the new primary FSG, and a Notice alarm (RPFO) is triggered.
• The Current Role attribute reports as Active Primary.
6. Reset the Failover Count attribute and clear the RPFO alarm:
a. Go to <new Gateway Node> > FSG > Replication > Configuration > Main.
b. Select Reset Failover Count.
c. Click Apply Changes.
The Failover Count attribute is reset to zero and the RPFO alarm
is cleared.
For the next step in the hardware refresh procedure, see
Table 9: “Hardware Refresh Master Procedure” on page 137.
Procedures for Servers with a Storage Node
Confirm That All Persistent Content Was Copied
Confirm that you copied all persistent content from the source to the
destination LDR. Generate a list of the CBIDs on the source LDR and
on the destination LDR, and compare the two lists.
This procedure applies to the following servers with an LDR:
• Storage Node
• Control/Storage Node
• Gateway/Control/Storage Node
• Admin/Gateway/Control/Storage Node
To determine when to use this procedure while performing a
hardware refresh, see Table 9: “Hardware Refresh Master Procedure”
on page 137.
Prerequisites
• The hardware refresh grid task for Storage Node or Control/Storage Node has been executed.
• There is enough disk space on the source and destination Storage Nodes for the listing files (a listing of 20 million CBIDs requires about 400 MB).
Procedure
1. Generate a list of the CBIDs on the source LDR:
a. Log in to the server hosting the source LDR as root using the
password listed in the Passwords.txt file.
b. Stop the ldr service on the source LDR. Enter:
/etc/init.d/ldr stop
touch /etc/sv/ldr/DoNotStart
c. Change directories. Enter: cd /usr/local/ldr
d. Run the build-cbid-list script. Enter:
./build-cbid-list /var/local/tmp/old.list
The script should process approximately 20 million CBIDs per hour, but because the script is I/O bound, the duration depends on the server hardware.
2. Generate a list of the CBIDs on the destination LDR:
a. Log in to the server hosting the destination LDR as root using
the password listed in the Passwords.txt file.
b. Optionally, stop the ldr service on the destination LDR. The
script will be as much as five times faster if the LDR is not
storing content. Enter:
/etc/init.d/ldr stop
touch /etc/sv/ldr/DoNotStart
c. Change directories. Enter: cd /usr/local/ldr
d. Run the build-cbid-list script. Enter:
./build-cbid-list /var/local/tmp/new.list
e. If you stopped the ldr service on the destination LDR, restart it
now. Enter:
rm /etc/sv/ldr/DoNotStart
/etc/init.d/ldr restart
3. Copy the new.list file to /var/local/tmp on the source Storage Node:
a. Make sure that there is enough disk space on the source
Storage Node.
b. From the destination Storage Node, enter:
scp /var/local/tmp/new.list root@<old_server_ip>:/var/local/tmp
4. Compare the files using comm. On the source LDR, enter:
comm -2 -3 /var/local/tmp/old.list /var/local/tmp/new.list > /var/local/tmp/only-on-old.list
The -2 option suppresses lines that exist only in the new.list file. The
-3 option suppresses lines common to both files. Therefore, the
output contains CBIDs that exist only on the source LDR.
This step should take only a few minutes to complete.
5. Examine the output. If any CBID files are missing from the destination LDR, contact Support.
6. If there are no CBID files missing, remove the list of files on the
destination LDR to free up space. On the destination LDR, enter:
rm /var/local/tmp/new.list
7. Log out of the destination LDR. Enter: exit
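As a quick follow-up to step 5, you can count on the source LDR how many CBIDs, if any, exist only there. A count of zero means all persistent content was copied to the destination:

# 0 means nothing is missing from the destination LDR
wc -l < /var/local/tmp/only-on-old.list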
Hardware Refresh and Decommissioning Tasks
This section describes the various grid tasks that you must run as part
of hardware refresh.
For general information on grid tasks, see the Administrator Guide.
Remove NMS Bindings
Use this procedure to run the grid task that removes NMS bindings
(RNMC).
This procedure applies to:
• Admin Node
• Admin/Gateway Node
• Admin/Gateway/Control/Storage Node
To determine when to use this procedure while performing a
hardware refresh, see Table 9: “Hardware Refresh Master Procedure”
on page 137.
Prerequisites
• If the server hosts the CMN, the new CMN is online.
• No grid tasks are running. The only exceptions are:
  • LDR content rebalancing grid task (LBAL)
  • ILM evaluation (ILME)
  • LDR foreground verification (LFGV or VFGV)
You can run these grid tasks concurrently with the grid task that removes cluster bindings. If any other grid tasks are running, wait for them to complete, release their lock, or abort them as appropriate. For more information on grid tasks and resource locking, see the chapter “Grid Tasks” in the Administrator Guide.
Procedure
1. Log in to the NMS MI of the new primary Admin Node using the
Vendor account.
2. Go to <grid root> > Configuration > Tasks.
3. In the Pending table, locate the grid task Remove NMS Cluster Bindings. Under Actions, select Start.
4. Click Apply Changes.
The grid task moves to the Active table. Wait until the grid task
moves to the Historical table with a Status of Successful.
For the next step in the hardware refresh procedure, see
Table 9: “Hardware Refresh Master Procedure” on page 137.
Gateway Node and Admin Node Decommissioning
Use this procedure to run the Gateway Node or Admin Node or
Admin/Gateway Node decommissioning grid task (GDCM). This procedure applies to:
• Admin Node
• Gateway Node
• Admin/Gateway Node
• Gateway/Control/Storage Node
• Admin/Gateway/Control/Storage Node
To determine when to use this procedure while performing a
hardware refresh, see Table 9: “Hardware Refresh Master Procedure”
on page 137.
Prerequisites
• FSG cache has been preserved (optional).
• The new FSG has been designated as the backup FSG or primary FSG, depending on the situation.
• If the server includes an Admin Node, the grid task to remove NMS bindings has been executed.
• No grid tasks are running. The only exceptions are:
  • Admin Node or Gateway Node decommissioning (GDCM) for a different server
  • LDR content rebalancing grid task (LBAL)
  • ILM evaluation (ILME)
  • LDR foreground verification (LFGV or VFGV)
These grid tasks can run concurrently with the Admin Node/Gateway Node decommissioning grid task. If any other grid tasks are running, wait for them to complete, release their lock, or abort them as appropriate. For more information on grid tasks and resource locking, see the chapter “Grid Tasks” in the Administrator Guide.
Procedure
1. Log in to the NMS MI using the Vendor account.
2. Go to <Gateway Node> > FSG > Replication > Main.
3. Confirm that the value for Files Pending for Replication (FRPP) is 0. If it is not, contact Support.
4. Go to <primary_Admin Node> > CMN > Grid Tasks > Configuration > Main.
5. In the Pending table, locate the grid task Admin/Gateway Node
Decommissioning or Gateway Node Decommissioning or Admin Node
Decommissioning. Under Actions, select Start.
6. Click Apply Changes.
The grid task moves to the Active table. Wait for the grid task to
complete and move to the Historical table with a Status of Successful.
For the next step in the hardware refresh procedure, see
Table 9: “Hardware Refresh Master Procedure” on page 137.
Storage Node and Control Node Hardware Refresh
Use this procedure to run the Storage Node or Control Node or
Control/Storage Node hardware refresh grid task (CSRF). This procedure applies to:
• Storage Node
• Control/Storage Node
• Gateway/Control/Storage Node
• Admin/Gateway/Control/Storage Node
If the server hosts a Control Node, the grid task pauses and prompts
you to run the CMS refresh script.
NOTE Do not abort the hardware refresh grid task.
To determine when to use this procedure while performing a
hardware refresh, see Table 9: “Hardware Refresh Master Procedure”
on page 137.
Prerequisites
• On a Control Node, Control/Storage Node, or Gateway/Control/Storage Node, MySQL and the CMS have been prevented from starting.
• On an Admin/Gateway/Control/Storage Node, the CMS has been prevented from starting.
• Grid software has been started.
• If the grid only has two read-write CMSs, client gateways have been closed.
• If the server includes a Gateway Node or an Admin Node, the Gateway Node or Admin Node decommissioning grid task has been executed.
• Dual Commit is enabled for all data (all FSG profiles) in the NMS MI at FSG Management > Group <x> > Profiles.
• No grid task is running with the exception of the following grid tasks, when executed for a different grid node:
  • LDR content rebalancing grid task (LBAL)
  • LDR foreground verification (LFGV or VFGV)
  • Control/Storage Node hardware refresh (CSRF)
Note that the Storage Node/Control Node Hardware Refresh task
(CSRF) releases its lock once the actual data migration phase starts.
In other words, you can run another task, for example, the Grid
Expansion task (GEXP) during the migration phase of the
Storage Node Hardware Refresh as long as no other resource contentions exist. Use the NMS MI to determine the phase of the grid
task.
For more information on grid tasks and resource locking and
restrictions, see the chapter “Grid Tasks” in the Administrator
Guide.
Procedure
1. Log in to the NMS MI using the Vendor account.
2. Verify these attributes:
Grid Node: Old Storage Node
Component: LDR > Storage
Attribute: Total Persistent Data
Notes: If the task involves a Storage Node, record this value for future reference.

Grid Node: Old Control Node
Component: CMS > Content or CMS > Metadata (a)
Attributes: Stored Objects, Managed Objects
Notes: If the task involves a Control Node, record these values for future reference.

Grid Node: Old Control Node
Component: CMS > Metadata (b)
Attributes: Metadata with ILM Evaluation Pending, Metadata with Unachievable ILM Evaluations, Locations Requiring Updates
Notes: If the task includes a Control Node and there is metadata with unachievable ILM evaluations, resolve the issue before continuing. If there are ILM Evaluations Pending or Locations Requiring Updates, wait until the number decreases to zero. If the number does not decrease but Dual Metadata Commit is enabled, proceed with the refresh. If it is not, contact Support. For information on Dual Metadata Commit, see the Administrator Guide.

a. These attributes appear under CMS > Metadata when CMSs use metadata replication and under CMS > Content when CMSs use metadata synchronization. Note: Metadata synchronization is deprecated.
b. These attributes only appear when CMSs use metadata replication.
3. Go to <grid root> > Configuration > Tasks.
4. In the Pending table, locate the grid task Storage Node or
Control Node or Control/Storage Node Hardware Refresh. Under
Actions, select Start.
5. Click Apply Changes.
6. The grid task moves to the Active table.
NOTE Do not abort the hardware refresh grid task.
If the refreshing process stops:
• Check the number of Available Audit Relays. Go to
<primary_Admin Node> > NMS > Events > Overview > Main.
The CMN must be connected to at least one ADC and that
ADC must be able to communicate with a majority of the
ADCs in the grid. ADCs act as Audit Relays, so if Available Audit
Relays is one or greater, the CMN is connected to at least one
ADC.
If the CMN is not connected to at least one ADC, ensure that
the Control Nodes are online, and check network connectivity
between the Admin Node and the Control Nodes.
• If the task involves a Storage Node, verify that the source LDR
and the destination LDR are online. If you shut down the
Storage Node being replaced, the task stops until the
Storage Node is started again. The LDRs must be online
throughout the entire process.
If a storage volume fails on the source or destination LDR while
the grid task is Active, go to “Storage Volume Failures During
Refresh” on page 170. You cannot continue with the current
hardware refresh task.
7. If the task does not involve a Control Node, wait until the grid task
completes and moves to the Historical table with a Status of
Successful.
8. If the task involves a Control Node, wait until the grid task status
changes to Paused and the prompt Paused for user to run cms-refresh.sh
appears in the Message field.
9. If you are refreshing an Admin/Gateway/Control/Storage Node
(AGCS):
a. At the old AGCS server, access a command shell and log in as
root using the password listed in the Passwords.txt file.
b. Stop the MI service. Enter: /etc/init.d/mi stop
c. Log out of the old AGCS. Enter: exit
d. At the new AGCS server, access a command shell and log in as
root using the password listed in the Passwords.txt file.
e. Prevent the MySQL service from restarting once stopped.
Enter: touch /etc/sv/mysql/DoNotStart
f. Stop the MI and then the MySQL services. Enter:
/etc/init.d/mi stop
/etc/init.d/mysql stop
g. Log out of the new AGCS server. Enter: exit
10. If the server hosts a Control Node, run the CMS refresh script
when prompted:
a. At the new Control Node server, access a command shell and
log in as root using the password listed in the Passwords.txt file.
b. Change directories to access the refresh script. Enter:
cd /usr/local/cms/tools
c. Run the refresh script to copy the content metadata database
from the old Control Node server to the new Control Node
server. Enter: ./cms-refresh.sh <IP_source>
where <IP_source> is the grid network IP address of the old
Control Node server.
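For example, if the grid network IP address of the old Control Node
server were 192.168.130.21 (an illustrative address only; use the
address you recorded for your own grid), you would enter:
./cms-refresh.sh 192.168.130.21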
WARNING Make sure you enter the correct IP address for the
source server.
d. Follow the prompts.
If asked whether to continue because the authenticity of the host
cannot be established, answer yes.
When prompted for a password, enter the password of the old
Control Node server. You will be prompted for the password
three times.
e. Wait for the script to complete. This could take up to four
hours. Track the refresh process using the displayed progress
bar and estimated time of completion. The script ends with the
message “starting cms.... done”.
f. If this is an AGCS, restart the MI service. Enter:
/etc/init.d/mi start
g. In the NMS MI, confirm that the new CMS is shown in green in
the Grid Topology tree.
h. Restart the grid task that was paused. In the NMS MI, go to
<grid root> > Configuration > Tasks. In the Active table, locate
the grid task Control/Storage Node Hardware Refresh or
Control Node Hardware Refresh. Under Actions, select Run.
The grid task status changes from Paused to Running. Wait until
the grid task moves to the Historical table with a Status of
Successful.
11. Verify the grid in the NMS MI:
a. Wait until the new server is detected and all alarms clear. It
could take a few minutes.
b. Verify that the old server no longer appears in the tree.
c. Monitor the following attributes:
Grid Node: New Storage Node
Component: LDR > Overview
Attribute: State
Notes: When the Grid Expansion task is complete, the state of the LDR on the new Storage Node is Read-only. When the refresh task is complete, the state changes to Online.

Grid Node: New Storage Node
Component: LDR > Storage
Attribute: Total Persistent Data
Notes: While the hardware refresh task is running, the value of Total Persistent Data continuously increases until it reaches a constant final value. The value of Total Persistent Data on the source LDR does not change. The value should be approximately equal on both LDRs at the end of the task. It is not a concern if the values are slightly different.

Grid Node: New Control Node
Component: CMS > Content or CMS > Metadata (a)
Attributes: Stored Objects, Managed Objects
Notes: The values of Stored Objects and Managed Objects should be approximately the same as for the source CMS.

a. The attributes appear under CMS > Metadata when CMSs use metadata replication and under CMS > Content when CMSs use metadata synchronization. Note: Metadata synchronization is deprecated.
For the next step in the hardware refresh procedure, see
Table 9: “Hardware Refresh Master Procedure” on page 137.
Retire the Hardware
Use this procedure to wipe the contents of the server drives and data
arrays.
The hardware refresh process does not remove files from the data
arrays. Confidential data could potentially be retrieved from the old
data arrays, particularly if encryption is not enabled. Before retiring or
repurposing the data arrays, wipe their contents.
NOTE It could still be possible to restore data (such as CMS or MI databases, or audit logs) after you have performed the following
procedure. To ensure data is securely wiped, use commercially available data wiping tools or services.
Wiping the contents of the data arrays is time-consuming: the process
described here takes between several hours and several days to
complete per logical drive.
Prerequisites
• All other steps in Table 9: “Hardware Refresh Master Procedure” on page 137 have been completed.
• The server hosted a Storage Node and the Storage Node does not use NFS rangedbs.
Procedure
1. At the old server, access a command shell and log in as root, using
the password listed in the Passwords.txt file.
2. Stop all grid services on the old server. Enter:
/etc/init.d/servermanager stop
3. Display the list of mounted file systems. Enter: mount
For example, the following shows the file systems on a
Storage Node with three logical drives (rangedb directories):
# mount
/dev/sda1 on / type ext3 (rw,errors=remount-ro, barrier=0)
proc on /proc type proc (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/sda3 on /var/local type ext3 (rw,errors=remount-ro)
/dev/sdb1 on /var/local/rangedb/0 type ext3 (rw,dirsync,errors=remount-ro,barrier=0,data=writeback)
/dev/sdb2 on /var/local/rangedb/1 type ext3 (rw,dirsync,errors=remount-ro,barrier=0,data=writeback)
/dev/sdb3 on /var/local/rangedb/2 type ext3 (rw,dirsync,errors=remount-ro,barrier=0,data=writeback)
4. Unmount the first partition that you want to wipe. Enter:
umount /var/local/rangedb/0
5. Overwrite all blocks on the partition. Enter:
dd if=/dev/zero of=/dev/sdb1 bs=128k
Depending upon the size of the logical drive, this step could take
from several hours to a few days to complete.
6. Repeat step 4 and step 5 for each partition in the data array.
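For example, the following is a minimal sketch of steps 4 and 5 for the
three rangedb partitions shown in the mount output in step 3
(/dev/sdb1 through /dev/sdb3). The device names and mount points are
illustrative only; always confirm them against your own mount listing
before overwriting anything:
umount /var/local/rangedb/0 && dd if=/dev/zero of=/dev/sdb1 bs=128k
umount /var/local/rangedb/1 && dd if=/dev/zero of=/dev/sdb2 bs=128k
umount /var/local/rangedb/2 && dd if=/dev/zero of=/dev/sdb3 bs=128k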
After the contents have been wiped, you can power down, and remove
the server and its attached storage.
Storage Volume Failures During Refresh
If a storage volume fails on a source or destination Storage Node while
the content migration phase of a hardware refresh grid task is in
progress, the refresh cannot complete successfully. Follow the
procedures in this section to recover from the failure and complete the
hardware refresh with all data intact.
Storage Volume Fails on Source LDR
If a storage volume fails on an LDR that is the source for a hardware
refresh after the content migration from the source LDR to the destination LDR has begun, follow the steps below to recover from the failure.
Table 10: Recover From Source LDR Storage Volume Failure

(In the Applies To column: S = Storage Node, CS = Control/Storage Node, GCS = Gateway/Control/Storage Node, AGCS = Admin/Gateway/Control/Storage Node.)
1. (Applies to: All) Abort the grid task. See page 171.
2. (Applies to: All) Reformat all storage volumes on the destination LDR. See page 172.
3. (Applies to: All) Generate and run the grid task to recover from the storage volume failure. See page 174.
4. (Applies to: CS, GCS, AGCS) Return to step 28 of Table 9: “Hardware Refresh Master Procedure” (see page 140) and complete all of the remaining steps in the hardware refresh procedure, starting with “Open Client Gateways”.
5. (Applies to: S) Return to step 30 of Table 9: “Hardware Refresh Master Procedure” (see page 140) and complete all of the remaining steps in the hardware refresh procedure, starting with “Update the grid for the new server”.
Storage Volume Fails on Destination LDR
If a storage volume fails on an LDR that is the destination for a
hardware refresh after the content migration phase of the hardware
refresh grid task has begun, follow the steps below to recover from the
failure and complete the hardware refresh.
Table 11: Recover From Destination LDR Storage Volume Failure

1. (Applies to: All) Abort the grid task. See page 171.
2. (Applies to: All) Replace the defective hardware on the destination Storage Node.
3. (Applies to: All) Reformat all storage volumes on the destination LDR. See page 172.
4. (Applies to: S) Resubmit and re-run the hardware refresh grid task. See page 177 and page 163.
5. (Applies to: CS, GCS, AGCS) Generate and run the refresh cleanup grid task (see page 174), and then return to step 27 of Table 9: “Hardware Refresh Master Procedure” (see page 140).
6. (Applies to: S) Complete all of the remaining steps in the hardware refresh procedure, starting with “Confirm That all Persistent Content was Copied”.
7. (Applies to: CS, GCS, AGCS) Return to step 28 of Table 9: “Hardware Refresh Master Procedure”, and “Open Client Gateways” if required. See page 140.
Procedures for Storage Failures
Abort the Hardware Refresh Grid Task
Prerequisites
• The Storage Node or Control/Storage Node Hardware Refresh is in the content migration stage.
• A storage volume on the source or destination LDR has failed.
Procedure
1. Log in to the NMS MI using the Vendor account.
2. Go to <grid root> > Configuration > Tasks.
3. In the Active table, locate the grid task Storage Node or Control/
Storage Node Hardware Refresh. Under Actions, select Pause.
4. Click Apply Changes.
The status of the grid task changes to Paused.
If there is an error pausing, the grid task retries ten times and goes
into an error state. When the grid task status is Error, you can abort
the grid task.
5. From the Actions menu, select Abort.
6. Click Apply Changes.
The grid task moves to the Historical list with the description
Aborted in the task field.
7. Delete the grid task from the Historical table:
a. In the Historical table, select Remove for the task.
b. Click Apply Changes.
This enables the task to be run again, which is necessary when a
storage volume has failed on the destination LDR.
For the next step in the procedure, see:
• Table 10: “Recover From Source LDR Storage Volume Failure” on page 170
  or
• Table 11: “Recover From Destination LDR Storage Volume Failure” on page 171.
Reformat Storage
If a storage volume fails, reformat the storage.
Prerequisites
• If the server is rebooted with a failed volume, it may fail to reboot. For information on recovering from this state, see the troubleshooting procedure “Server with a Failed Volume Fails to Reboot” in the Maintenance Guide.
• The Storage Node or Control/Storage Node Hardware Refresh grid task has been aborted.
Procedure For Direct-attached or SAN Storage
1. Log on to the destination Storage Node as root using the password
in the Passwords.txt file.
2. Stop the LDR service. Enter: /etc/init.d/ldr stop
3. Unmount all object stores. Enter:
umount /var/local/rangedb/<object_store_number>
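For example, on a destination Storage Node with three object stores
(the numbers 0 through 2 are illustrative; list the directories under
/var/local/rangedb to see which object stores exist on your node):
umount /var/local/rangedb/0
umount /var/local/rangedb/1
umount /var/local/rangedb/2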
4. Reinitialize all storage volumes on the destination grid node:
a. Enter: reformat_rangedbs.rb
b. When asked “WARNING: partition name exists. Reformat the partition <name>? [y/n]?”, enter:
• y for all storage volumes
Reformat and reinitialize all storage volumes. Because a
volume failed while the hardware refresh was in progress, the
storage must be returned to a known state before restarting the
refresh process.
c. Examine the mapping between each device and the rangedb
directory.
d. When asked whether the assignment of devices to rangedbs is
correct, enter:
• y to confirm the mount point assignment
• n to change the assignment
NOTE Make sure to assign the replacement device to a rangedb directory that has the same size as the rangedb that was the mount
point for the original device.
5. For direct-attached or SAN storage, restart the LDR service. Enter:
/etc/init.d/ldr start
6. Monitor services to ensure that the LDR restarts. Enter:
/usr/local/bin/storagegrid-status
Press <Ctrl>+<C> to exit.
For the next step, return to:
• Table 10: “Recover From Source LDR Storage Volume Failure” on page 170
  or
• Table 11: “Recover From Destination LDR Storage Volume Failure” on page 171.
Procedure for NFS Mounted Storage
1. Log on to the destination Storage Node using the password in the
Passwords.txt file.
2. Recreate and export the failed storage volumes on the NFS server.
NOTE Maintenance procedures for the NFS server are beyond the
scope of this guide.
3. Stop services on the Storage Node. Enter:
/etc/init.d/servermanager stop
4. Unmount all storage volumes. For each storage volume, enter:
umount /var/local/rangedb/<volume ID>
For example:
umount /var/local/rangedb/1
5. Remount all volumes. Enter: mount -a
6. Set up the NFS rangedb directory. Enter:
/usr/local/ldr/setup_rangedb.sh <LDR_nodeid>
Obtain the LDR node ID from the NMS MI under LDR >
Overview > Main.
This script creates the required subdirectories in the rangedb
directory.
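For example, assuming the LDR node ID shown in the NMS MI is 12345678
(a hypothetical value used only for illustration):
/usr/local/ldr/setup_rangedb.sh 12345678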
7. Restart services on the Storage Node. Enter:
/etc/init.d/servermanager start
For the next step in the procedure, see:
• Table 10: “Recover From Source LDR Storage Volume Failure” on page 170
  or
• Table 11: “Recover From Destination LDR Storage Volume Failure” on page 171.
Generate and Run the Refresh Cleanup Grid Task
Generate and run the Control/Storage Refresh Cleanup (CSRC) task as
described below. This grid task informs the grid that the objects on the
source LDR have been lost, and then goes on to complete the remaining steps in the hardware refresh procedure. It is used to ensure that
copies are made of all objects from the source LDR, and that the CMS
database on a destination grid node is preserved, if present.
This grid task replaces the Storage Node or Control/Storage Node
Hardware Refresh that was aborted.
Prerequisites
• A storage volume has failed on a source or destination LDR while a hardware refresh was in progress.
• The Storage Node or Control/Storage Node Hardware Refresh grid task has been aborted.
• No grid task is running with the exception of the following grid tasks, when executed for a different grid node:
  • LDR content rebalancing grid task (LBAL)
  • LDR foreground verification (LFGV or VFGV)
  • Control/Storage Node Refresh (CSRF)
Note that the Storage Node/Control Node Refresh Cleanup task
(CSRC) releases its lock once it notifies the grid of the lost content
from the original. In other words, you can run another task, for
example, the Grid Expansion task (GEXP) during this phase of the
Hardware Refresh Cleanup grid task as long as there are no other
resource contentions. Use the NMS MI to determine the phase of
the grid task.
For more information on grid tasks and resource locking and
restrictions, see the chapter “Grid Tasks” in the Administrator
Guide.
Procedure
1. Log in to the NMS MI using the Vendor account.
2. Find the OID of the root element in the NMS topology tree for the
source node that is being refreshed.
a. Go to Grid Management > Grid Configuration > NMS Entities >
Overview > Main.
b. In the Name column, look for the name of the source grid node
that is being removed from the grid. Find and record its OID
value.
For example, assume you are performing a hardware refresh
on a Control/Storage Node at the Disaster Recovery site whose
name in the grid topology tree is DR-CS1. The OID value for
the grid node item is 2.16.124.113590.2.1.400001.1.1.2.1.
Figure 10: Find the OID from Grid Configuration > NMS Entities
3. Find the node IDs of the destination grid nodes (LDR and/or CMS)
that are replacing the original source grid nodes.
a. Go to the NMS MI, and find the destination grid node in the
grid topology tree.
b. Record the Node ID, shown at LDR > Overview > Main.
c. Repeat for the destination CMS, if applicable.
4. Log in to the primary Admin Node as root, using the password
listed in the Passwords.txt file.
5. Generate the Control/Storage Refresh Cleanup grid task.
For a Storage Node, enter:
cs-refresh-cleanup --oid <oid> --dest-ldr <nodeID>
For a Control/Storage Node, enter on one line:
cs-refresh-cleanup --oid <oid> --dest-ldr <nodeID>
--dest-cms <nodeID>
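For example, for a Control/Storage Node, using the OID from the example
in step 2 and hypothetical node IDs for the destination LDR and CMS
(substitute the values you recorded in steps 2 and 3), you would enter
on one line:
cs-refresh-cleanup --oid 2.16.124.113590.2.1.400001.1.1.2.1 --dest-ldr 12345678 --dest-cms 12345679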
6. Log out of the primary Admin Node. Enter: exit
7. Return to the NMS MI, and go to <grid_root> > Configuration >
Tasks.
8. In the Pending table, locate the grid task Storage Node or Control/
Storage Node Refresh Cleanup. Under Actions, select Start.
9. Click Apply Changes.
10. The grid task moves to the Active table.
NOTE Do not abort the hardware refresh cleanup grid task.
11. Wait until the grid task completes and moves to the Historical table
with a Status of Successful.
12. Verify the grid in the NMS MI:
a. Wait until the new server is detected and all alarms clear. It
could take a few minutes.
b. Verify that the old server no longer appears in the tree.
c. Monitor these attributes:
Grid Node: New Storage Node
Component: LDR > Overview
Attribute: State
Notes: When the Grid Expansion task is complete, the state of the LDR on the new Storage Node is Read-only. When the refresh cleanup task is complete, the state changes to Online.

Grid Node: New Control Node
Component: CMS > Content or CMS > Metadata (a)
Attributes: Stored Objects, Managed Objects
Notes: The values of Stored Objects and Managed Objects should be approximately the same as for the source CMS.

Grid Node: New Control Node
Component: CMS > Tasks
Attribute: Object Lost background task
Notes: The value indicates the progress of restoring copies of the objects from the failed storage.

a. The attributes appear under CMS > Metadata when CMSs use metadata replication and under CMS > Content when CMSs use metadata synchronization. Note: Metadata synchronization is deprecated.
For the next step in the procedure, see:
• Table 10: “Recover From Source LDR Storage Volume Failure” on page 170
  or
• Table 11: “Recover From Destination LDR Storage Volume Failure” on page 171.
Resubmit the Hardware Refresh Grid Task
Prerequisites
• A storage volume failed on the destination LDR while a Storage Node hardware refresh grid task was in progress.
• The original hardware refresh grid task was aborted.
• Destination storage volumes have been replaced and reformatted.
Procedure
1. Retrieve the file that contains the grid task from the current SAID
package, found on the Provisioning Media for the grid.
The grid task is found in the Grid_Tasks folder of the SAID package,
and is named grid-task-CSRF-<hostname>.
2. Copy the grid-task-CSRF-<hostname> file to the same computer that
you use to access the NMS MI.
3. Open the file that contains the grid task (Task Signed Text Block)
using WordPad or a programmer’s text editor.
4. Copy the Task Signed Text Block to the clipboard:
a. Select the text, including the opening and closing delimiters:
-----BEGIN TASK-----
AAAOH1RTSUJDT05UAAANB1RCTEtjbmN0AAAM+1RCTEtDT05UAAAA
EFRWRVJVSTMyAAAAAQAAABBUU0lEVUkzMoEecsEAAAAYVFNSQ0NTV
...
s5zJz1795J3x7TWeqBAInHDVEMKg95O95VJUW5kQij5SRjtoWLAYXC
-----END TASK-----
If the block has a readable description above the opening delimiter, it may be included but is ignored by the StorageGRID system.
b. Copy the selected text.
5. Log in to the NMS MI using the Vendor account.
6. Go to CMN > Grid Tasks > Configuration > Main.
7. Under Submit New Task, select the prompt text so that you can replace it with the Task Signed Text Block.
8. Paste in the Task Signed Text Block.
9. Click Apply Changes.
The grid validates the Task Signed Text Block and either rejects the grid task or adds it to the table of Pending tasks.
For the next step in the procedure, see Table 11: “Recover From Destination LDR Storage Volume Failure” on page 171.
Appendix A: Prepare Virtual Machines
Prepare virtual machines to host grid nodes as a result of expansion or hardware refresh
Introduction
Complete the steps in Table 12 to prepare new virtual machines for
new grid nodes.
If the grid expansion adds virtual machines at two or more sites,
complete the steps in Table 12: “Prepare Virtual Machines for Expansion” on page 179 and Table 3: “Add, Customize, and Start Expansion
Grid Nodes” on page 41 for all virtual machines at one site before you
travel to the next site.
Table 12: Prepare Virtual Machines for Expansion

1. Install VMware vSphere. See page 180.
2. Install Linux on virtual machines. See page 185.
3. Install VMware tools on virtual machines. See page 188.
4. Reorder network interface cards if necessary. See page 190.
5. Configure Storage Nodes for NFS storage volumes. See page 192.
6. Load software ISO images of any required CDs. See page 194.
Refresh of Combined Grid Nodes to Virtual
Machines
If you are refreshing combined grid nodes to virtual machines, create
one virtual machine for each split grid node.
Table 13: Refresh of Combined Grid Nodes to Virtual Machines
• Admin/Gateway Node splits into an Admin Node (with or without a CMN) and a Gateway Node: two virtual machines needed.
• Control/Storage Node splits into a Control Node and a Storage Node: two virtual machines needed.
• Gateway/Control/Storage Node splits into a Gateway Node, a Control Node, and a Storage Node: three virtual machines needed.
Install VMware vSphere
Install and configure VMware vSphere software on all servers that will
host virtual machines. These virtual machines will host grid nodes.
Perform the steps outlined in Table 14.
Table 14: Install and Configure VMware Software

1. Install VMware vSphere software. See page 181.
2. Create the virtual machines. See page 182.
3. Configure ESX/ESXi for automatic restart. See page 183.
4. Start the virtual machines. See page 185.
NOTE For supported versions of VMware software, see the Interoperability
Matrix Tool (IMT).
Install VMware vSphere Software
To install and configure a virtual machine, install and configure
VMware vSphere software. In particular, you require VMware ESX/ESXi
and VMware vCenter Server software. For the steps required to
install these vSphere products, see the VMware documentation available
at: http://www.vmware.com/support/pubs
VMware vCenter Server
Create and configure virtual machines using the Open Virtualization
Format (OVF) files produced by Grid Designer:
• one OVF file per VM host — When deployed using vCenter Server,
this OVF file creates and configures all virtual machines hosted on
a physical server.
To use these OVF files, each destination host must have a single
datastore with enough free space to hold all virtual disks required
on this VM host. For information on the size of the virtual disks
required on a VM host, see the VM BOM created by Grid Designer.
After the initial deployment, you can use vCenter to migrate
virtual disks to multiple datastores.
• one OVF file per virtual machine — These OVF files create and configure a single virtual machine. For information on the size of the
virtual disks required for each virtual machine, consult the VM
BOM created by Grid Designer.
VMware ESX/ESXi
You must install VMware ESX/ESXi on a prepared physical server with
correctly configured hardware. Grid hardware must be correctly configured (including firmware versions and BIOS settings) before you
install VMware software.
Configure networking in the hypervisor as required to support networking for the grid. Note that an HAGC hosted on a virtual machine
does not require the use of a crossover cable for heartbeat.
Ensure that the datastore is large enough for the virtual machines and
virtual disks that are required to host the grid nodes. If you create
more than one datastore, name each datastore so that you can easily
identify which datastore to use for each grid node when you create
virtual machines.
VMware vSphere Client
Install the VMware vSphere client on your service laptop. The
VMware vSphere client allows you to connect to vCenter Server,
monitor ESX/ESXi servers, and create and configure virtual machines.
Create the Virtual Machines
After you install VMware ESX/ESXi, create one virtual machine for
each grid node installed on the server.
NOTE For more information on creating virtual machines, see VMware
vSphere documentation.
Create Virtual Machines Using OVF Files
Use this procedure when you are using VMware vCenter Server.
Prerequisites
• OVF files
• VM Bill of Materials (the GID<grid_ID>_REV<revision_number>_GSPEC_VMBOM.html file) includes information on the virtual machines to be installed on each physical server, including the type of grid node hosted in each one and the resources that each VM requires.
• vSphere client software
• when deploying one OVF file per VM Host:
  • vCenter Server software
  • a datastore large enough for all virtual disks for all virtual machines hosted on the server
• when deploying one OVF file per virtual machine:
  • a datastore large enough for the virtual machine’s virtual disks
Procedure
• Connect to vCenter Server using vSphere Client software, and
deploy each OVF file to a host system.
This creates all of the required virtual machines, as described in the
VM bill of materials.
Create Virtual Machines Manually
If required, use the NetApp StorageGRID Deployment Guide and the
information in the VM BOM generated by Grid Designer to manually
create virtual machines.
After you create all virtual machines hosted on the server, and you
adjust the resource allocations if required, configure the VMs to
automatically restart when the ESX/ESXi server restarts. See “Configure ESX/ESXi for Automatic Restart” below.
Configure ESX/ESXi for Automatic Restart
Configure the ESX/ESXi server to automatically restart the virtual
machines when the server restarts. Without an automatic restart, the
virtual machines and grid nodes remain shut down after VMware
ESX/ESXi server restarts.
Prerequisites
• All virtual machines have been created and configured.
Procedure
1. In the VMware vSphere client tree, click the root element (that is,
the ESX/ESXi server), and then select the Configuration tab.
Figure 11: Configuration Pane
2. Under Software, click Virtual Machine Startup/Shutdown, and click
Properties.
Figure 12: Virtual Machine Startup and Shutdown Window
3. Under System Settings, select Allow virtual machines to start and
stop automatically with the system.
4. Under Default Startup Delay, leave the start-up delay time at the
default of 120 seconds.
A delay of 120 seconds gives each virtual machine time to start
before the next virtual machine begins the process of starting. This
delay allows for a smoother start-up process that will not overload
the system.
5. Under Startup Order, in the Manual Startup list select any virtual
machine that hosts an Admin Node or a Gateway Node, and click
Move up to move it to the Automatic Startup list.
Place any Admin Node at the top of the list and any
Gateway Nodes below it in any order. Admin Nodes and
Gateway Nodes are generally NTP primaries. Starting them first
helps prevent timing alarms within the grid when the ESX/ESXi
server restarts.
6. Move virtual machines that host other grid nodes to the Any Order
list by clicking Move up.
These virtual machines restart in any order.
Figure 13: Move Virtual Machines to Control Startup Order
7. Click OK.
Start the Virtual Machines
Prerequisites
• All virtual machines have been configured for automatic restart.
Procedure
1. In the vSphere client, select the virtual machine.
2. Click the Console tab.
3. Click the green Power On button.
No operating system is available to the virtual machine at this point, so
it attempts to PXE boot and fails with the message “Operating System
not found”.
Install Linux on Virtual Machines
You can load Linux, prepare hardware, and install grid software on
servers in parallel. This reduces the time required to install the grid.
Prerequisites
• The virtual machine for the grid node has been started. See “Install VMware vSphere” on page 180.
• SLES 11 DVD. Use only a supported version of SLES 11. For more information, see the Interoperability Matrix Tool (IMT).
• Server Activation floppy image. See “Provision the Grid and Create Server Activation Floppy Image” on page 29.
Procedure
1. Insert the SLES DVD into the machine from which you are running
the vSphere client. Skip this step if you are using an ISO image of
Linux.
2. In the VMware vSphere client navigation tree, select the virtual
machine.
3. Click the Connect/Disconnect CD/DVD drive to the virtual machine icon,
then select Connect CD/DVD 1 > Connect to <CD_drive_letter>
or
If you are using an ISO image of the Linux installation, select
Connect CD/DVD 1 > Connect to ISO image on local disk.
4. Click the Console tab.
5. Click anywhere inside the Console pane to enter the Console pane.
Your mouse pointer disappears.
TIP
Press <Ctrl>+<Alt> to release your mouse pointer from the VM
console.
6. Press <Ctrl>+<Alt>+<Insert> to reset the virtual machine. The server
performs the following steps:
• The BIOS runs a hardware verification.
• By default the system boots from the DVD, and loads the SUSE
Linux Enterprise Server Boot Screen in the VMware vSphere
client Console pane.
Figure 14: SLES Boot Screen
7. From the SLES Boot Screen:
a. Press the keyboard’s down arrow and highlight Installation.
(Do not press <Enter>.)
NOTE You must move the cursor to the Installation option within eight
seconds. If you do not, SLES will automatically attempt to install
from the hard drive and the installation process will fail. If this
happens, you must begin the installation process again from the
beginning.
b. Press <Ctrl>+<Alt> to leave the Console pane.
The mouse icon disappears.
c. Click Connect Floppy 1 and select Connect to Floppy Image on
local disk.
d. On your service laptop, select the floppy image that contains
the activation file for this server.
e. Click anywhere inside the Console pane to return to the Console
pane.
f. Press <Tab>. At the bottom of the screen, adjacent to the Boot
Options prompt, enter:
Options prompt, enter:
autoyast=device://fd0/<servername>-autoinst.xml
where <servername> is the server name used to name the activation file.
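For example, if the activation file on the floppy image is named
agn20-autoinst.xml (the server name agn20 is used here only as an
illustration), you would enter:
autoyast=device://fd0/agn20-autoinst.xml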
NOTE If you do not enter the path to the activation file when required,
AutoYaST does not customize the installation for the server.
Always enter the path to the activation file.
If you enter an incorrect value, and are prompted to re-enter
the device name and path, check the floppy device name.
g. Press <Enter>.
Wait about one minute while the installer processes the
information.
The base SLES installation completes without further
intervention.
8. During this installation process, disconnect the floppy image after
it is no longer required. Click the Connect/Disconnect the floppy
devices of the virtual machine icon and select Disconnect Floppy 1 >
Disconnect from <drive>.
NOTE If the floppy image is connected when the virtual machine
reboots, the message “This is not a bootable disk” is displayed.
To continue, disconnect the floppy image and press any key.
When SLES installation is complete, the server completes its configuration and starts the operating system. Installation is complete when the
login prompt appears.
Install VMware Tools
Install VMware Tools on each virtual machine that will host a grid
node. For information on the supported versions of VMware software,
see the Interoperability Matrix Tool (IMT).
Install VMware Tools from an ISO image that is packaged with
vSphere client software.
The vmware_setup.py script configures the video resolution settings
that the virtual machine uses on startup. This is necessary because the
default resolution is not supported by the VMware video adapter,
which results in an error on startup.
Prerequisites
• Linux has been installed on the virtual machine.
Procedure
1. In the Virtual Machine tree, right-click the virtual machine, then
select Guest > Install/Upgrade VMware Tools.
The Install VMware Tools dialog appears.
2. Select Interactive Tool Upgrade and click OK.
The VMware Tools package is made available to the virtual
machine as an ISO at /cdrom.
Wait until VMware disconnects the Linux CD/DVD.
3. In the VMware vSphere Client, click in the Console pane of the
virtual machine and log in to the host that will be the new grid
node.
4. Copy the VMware Tools packages to the virtual machine:
a. Mount the ISO image of the VMware Tools CD. Enter:
mount /cdrom
b. Copy the gzip package from the CD to the virtual machine, and
unpack it. Enter:
mkdir /tmp/vmtools
cd /tmp/vmtools
tar -zxvf /cdrom/VMwareTools-*.tar.gz
5. Install VMware Tools, accepting the default installation options.
Enter:
cd /tmp/vmtools/vmware-tools-distrib/
./vmware-install.pl --default
Wait for the installation to complete. This takes about a minute.
6. Check to make sure that VMware Tools is running. Enter:
/etc/init.d/vmware-tools status
You will see the message “vmtoolsd is running”.
7. Remove the installation files from the virtual machine. Enter:
cd /tmp
rm -rf vmtools
8. Run the vmware_setup.py script. Enter: vmware_setup.py
The script completes silently.
9. Reboot to ensure the changes take effect. Enter: reboot
Reorder Network Interface Cards
Before proceeding with the installation of a server with multiple NICs,
ensure that you understand the purpose of each network interface on
the server, and that each one is connected correctly. The Network
Interface reorder tool (nic_reorder.py) can help you by automating the
process of associating a logical ethernet name to a physical network
interface.
The way that SLES assigns ethernet names to ports based on both the
PCI Bus ID and the MAC address of the network interfaces can lead to
an arbitrary arrangement of names and ports. This is particularly troublesome on Gateway Nodes, which always have multiple network
interfaces.
A Gateway Node that is part of a basic replication group has a
minimum of one network interface: one for the grid network and
optionally one on the customer network for application integration. A
Gateway Node that is part of a High Availability Gateway Cluster has
a minimum of two network interfaces, as it also requires a network
interface for the heartbeat service used to monitor the health of the
cluster. In addition, the customer and/or grid network interfaces of any
Gateway Node may be bonded to provide additional redundancy.
NOTE You cannot ping via the heartbeat interface until after SLES is
installed on the second server. Connect and identify this interface on
both servers after all other interfaces have been identified. If the
Gateway Node is a member of a High Availability Gateway Cluster
and is installed on physical servers (not VMs), there is a crossover
cable linking the two servers. An HAGC hosted on a virtual machine
does not require the use of a crossover cable for heartbeat.
Procedure
1. Log in to the server as root, using the password provided in the
Passwords.txt file.
2. Consult the grid documentation to determine which IP address is
assigned for each purpose.
3. Start the NIC reorder tool. Enter: nic_reorder.py
The script provides a mapping of the logical interfaces and the corresponding physical interfaces, as shown below. Each ethernet
name and IP address defined on the server is listed, along with
information about the physical interface, such as the PCI Bus ID of
each NIC.
Logical Interfaces               Physical Interfaces
Name  IP Address                 #  Driver  MAC Address        PCI Bus Id
eth0  bond0 192.168.170.174      A  bnx2    00:18:fe:70:b0:a6  0000:05:00.0
eth1  bond0 192.168.170.174      B  e1000   00:18:fe:70:b0:a6  0000:13:00.1
eth2  bond1 192.168.120.74       C  e1000   00:17:08:7d:3b:bc  0000:17:00.0
eth3  bond1 192.168.120.74       D  bnx     00:17:08:7d:3b:bc  0000:03:00.0
eth4  192.168.120.73             E  e1000   00:17:08:7d:3b:bd  0000:17:00.1
4. If bonded interfaces show the same MAC address for both
members of the bonded pair:
a. Enter Q to exit the nic_reorder.py application.
b. Disable bonding. Enter: ifdown bond0
c. Start the NIC reorder tool. Enter: nic_reorder.py
When you re-enter the program, the actual MAC address of
each network interface is shown. (The bonding configuration is
reloaded if you exit the program using W or enter ifup bond0
after you exit the program.)
5. If no interface is mapped to eth0 (this happens if incorrect pci-bus-ids
are specified in the specification file), reset the interfaces. Enter: R
6. If the PCI Bus ID or MAC address are not readily visible on the
server (allowing you to easily identify each physical interface):
a. Note the IP address of the first logical interface.
b. Attach the network cable associated with the first logical interface to any one of the physical network interfaces on the server.
c. Enter Q to exit the nic_reorder.py application.
7. Ping the default gateway for the subnet containing the IP address
of the first logical interface (or another IP address in the same
subnet as this IP).
8. If the ping does not succeed, swap the logical interface you are
trying to find onto another PCI Bus ID:
a. Restart the NIC reorder tool. Enter: nic_reorder.py
b. Enter: S
c. When prompted for the first interface to swap, enter the letter
that appears to the right of the IP address of the interface you
are trying to find.
d. When prompted for the second interface, enter the letter associated with the next PCI Bus ID you want to try.
e. Confirm that you want to swap the interfaces. Enter: Y
f. Save the updated configuration and exit the program. Enter: W
g. The first logical interface is now associated with a different PCI
Bus ID. Try pinging the default gateway for the first logical
interface again.
h. If the ping does not succeed, go back to step a and try again.
NOTE The server may contain unused NICs.
9. When the ping succeeds, you have successfully identified the PCI
Bus ID that corresponds to the logical interface.
10. Note which physical interface corresponds to the first logical interface. This record helps you to keep track of the NICs as you
connect cables and swap interfaces.
11. Repeat steps 3 to 10 for each logical interface to correctly connect
all network interfaces on the server.
Configure Storage Nodes for NFS Mounted
Storage Volumes
Use this procedure to configure the server for the installation of a
Storage Node integrated with NFS mounted storage volumes.
Prerequisites
• The NetApp storage system has been set up and exported via NFS.
  NOTE Configuration of the NFS server is beyond the scope of this guide.
• NFS mounted storage volumes are integrated with the Storage Node as described in the Administrator Guide.
• Ensure that you have the IP address of the NFS server.
Procedure
1. Log in to the Storage Node server as root, using the password
provided in the Passwords.txt file.
2. Verify connectivity to the NFS server. Enter: ping <NFS_Server_IP>
3. Verify that the Linux NFS client package is present (it should be
installed by default). Enter:
• For SLES 11: rpm -q nfs-client
• For SLES 10: rpm -q nfs-utils
4. Mount the NFS exports:
a. At the Storage Node server, using a text editor such as vi, add
this line to the /etc/fstab file for each storage volume (on one
line):
<NFS_Server_IP>:<volume_path> /var/local/rangedb/<next_index>
nfs rw,rsize=65536,wsize=65536,nfsvers=3,tcp
where:
• <NFS_Server_IP> is the IP address of the NFS server
• <volume_path> is the path of the storage volume exported from the NFS server
• <next_index> is the Storage Node rangedb, a number between 0 and 15 in hexadecimal notation (0 to F, case-specific). For the first storage volume, the index number is 0.
For example:
192.168.130.16:/vol/vol1 /var/local/rangedb/0 nfs
rw,rsize=65536,wsize=65536,nfsvers=3,tcp
Repeat step a for each storage volume.
b. Create a mount point for each NFS storage volume, using the
same index numbers used in the /etc/fstab file in step a. Enter:
mkdir -p /var/local/rangedb/<next_available_index>
For example:
mkdir -p /var/local/rangedb/0
Do not create mount points outside of /var/local/rangedb.
Repeat step b for each storage volume.
c. Mount the storage volumes to the mount points. Enter:
mount -a
If mounting fails, make sure that the storage volumes are configured on the NFS server for read-write access to the
Storage Node and that the IP address of the Storage Node
supplied to the NFS server is correct.
Load Software ISO Images
Use load_cds.py on a Virtual Machine
Follow this procedure if the primary Admin Node is installed on a
virtual machine.
Prerequisites
• The grid did not previously include a TSM Archive Node and you are adding it in this expansion.
You do not need to load any other ISO images. The Grid Deployment
Utility (GDU) installs the current version and service pack level of grid
software on the expansion grid nodes, using copies of the ISO images
that you created when the grid was installed or last updated.
Procedure
1. In vSphere Client, click in the console window of the primary
Admin Node’s virtual machine and log in as root. When prompted
for a password, press <Enter>.
2. Start GDU, as described in “Start GDU” on page 211.
3. Check to see if the required ISOs are present:
a. In the Actions panel, select ISO List and press <Enter>.
GDU shows a list of the ISO images that have already been
copied to the Admin Node.
b. Press <Enter> to close the ISO Repository Contents list and then
in the Actions panel, select Quit and press <Enter>.
If any of the required ISO images are missing, create them using
the load_cds.py script.
4. Insert a CD in the service laptop. The order in which you insert the
CDs does not matter.
TIP
Press <Ctrl>+<Alt> to release your mouse pointer from the VM
console.
5. Click the Connect/Disconnect CD/DVD drive to the virtual machine icon,
then select Connect CD/DVD 1 > Connect to <CD_drive_letter>.
6. Enter: load_cds.py
Wait while the ISO image of the CD is written to the correct
directory.
7. To exit, type n and press <Enter> when prompted.
8. Log out. Enter: exit
Troubleshooting
This section includes troubleshooting topics to help you identify and
solve problems that may occur while preparing virtual machines.
If problems persist, contact Support. You may be asked to supply the
following installation log files:
• /var/local/log/install.log (found on the server being installed)
• /var/local/log/gdu-console.log (found on the primary Admin Node)
Virtual Machine Not Started
If a virtual machine does not start after you create it, see “VM Resource
Reservation Requires Adjustment” on page 195.
If a virtual machine does not restart after VMware ESX/ESXi is
restarted, see “VM is Not Configured for Automatic Restart” on
page 196.
VM Resource Reservation Requires Adjustment
The OVF files created by Grid Designer include a resource reservation
designed to ensure that each grid node has sufficient RAM and CPU to
operate efficiently. If you create virtual machines by deploying these
OVF files on ESX/ESXi and the predefined resources are not
available, the virtual machines will not start.
If you are certain that the VM Host has sufficient resources for each
grid node, manually adjust the resources allocated for each virtual
machine, and then try starting the virtual machines.
1. In the VMware vSphere client tree, select the virtual machine that
is not started.
2. Right-click the virtual machine, and select Edit Settings.
3. From the Virtual Machines Properties window, select the Resources
tab.
4. Adjust the resources allocated to the virtual machine:
a. Select CPU, then use the Reservation slider to adjust the MHz
reserved for this virtual machine.
b. Select Memory, then use the Reservation slider to adjust the MB
reserved for this virtual machine.
5. Click OK.
6. Repeat as required for other virtual machines hosted on the same
VM Host.
VM is Not Configured for Automatic Restart
If the virtual machine does not restart after VMware ESX/ESXi is
restarted, it is most likely that the virtual machine has not been configured for automatic restart.
1. In the VMware vSphere client tree, select the virtual machine that
is not started.
Figure 15: Virtual Machine Manual Restart
2. In the Getting Started pane, under Basic Tasks, click Power on the
virtual machine.
3. Configure the virtual machine to restart automatically. See “Install
VMware vSphere” on page 180.
Resetting the Virtual Machine
During the Linux installation procedure, the keyboard shortcut <Ctrl>
+ <Alt> + <Insert> is used to reset the virtual machine. Occasionally, this
keyboard shortcut may fail. To enable the <Ctrl> + <Alt> + <Insert>
keyboard shortcut used to reset the virtual machine, update the inittab
file.
1. At the server being installed, access a command shell and log in as
root using the password listed in the Passwords.txt file. If a
password has not yet been set, hit <Enter>.
2. Open the /etc/inittab file with a text editor.
3. Search for either CTRL-ALT-DELETE or ctrlaltdel.
4. Uncomment the line # ca::ctrlaltdel:/sbin/shutdown -t3 -r now by
removing the # symbol. For example:
# disabled for security: ca::ctrlaltdel:/sbin/shutdown -r -t 4 now
becomes
ca::ctrlaltdel:/sbin/shutdown -t3 -r now
5. Save and exit the /etc/inittab file.
6. Initiate the change. Enter: init q
The keyboard shortcut <Ctrl> + <Alt> + <Insert> will now work when
resetting the virtual machine during installation.
7. Log out of the server.
Appendix B: Prepare Expansion Physical Servers
Introduction
NOTE New installations of the StorageGRID 9.0 system are not supported
on physical servers.
Complete the steps in Table 15 to prepare new servers for grid nodes.
If the grid expansion adds physical servers at two or more sites,
complete the steps in Table 15: “Prepare Servers for Expansion” on
page 199 and Table 3: “Add, Customize, and Start Expansion Grid
Nodes” on page 41 for all physical servers at one site before you travel
to the next site.
Table 15: Prepare Servers for Expansion

1. Install Linux. See page 200.
2. Install drivers. See page 202.
3. Install hardware monitoring. See page 202.
4. Reorder network interface cards if necessary. See page 202.
5. Configure Storage Nodes for NFS storage volumes. See page 205.
6. Load software ISO images of any required CDs. See page 206.
Install Linux on Physical Servers
You can load Linux, prepare hardware, and install grid software on
servers in parallel. This can reduce the total time required to install the
grid. As the DVD is ejected from a server, you can start loading Linux
on another server using that disk.
Prerequisites and required materials
• Server Activation USB flash drive. For information on how to create it, see “Provision the Grid and Create a Server Activation USB Flash Drive” on page 32.
• SLES 11 DVD. Use only a supported version of SLES. For more information, see the Interoperability Matrix Tool (IMT).
Procedure
1. If the server is to be set up as a Storage Node, disconnect the server
from external storage.
Installation of the operating system proceeds much more quickly if
external storage is disconnected.
2. Insert the SLES installation DVD into the server. The SUSE Linux
Enterprise Server Boot Screen appears.
3. Within eight seconds, press the keyboard’s down arrow and highlight Installation. Do not press <Enter>.
NOTE You must move the cursor to the Installation option within eight
seconds. If you do not, SLES automatically attempts to boot from
the hard drive and the installation fails. If this happens, you must
begin the installation process again.
4. If you are installing from a KVM that only supports a limited
screen resolution, press <F3>, and use the arrow keys to select
1024x768 at the bottom of the screen.
5. Insert the Server Activation USB flash drive into the server.
6. At the bottom of the screen, adjacent to the Boot Options prompt,
enter:
autoyast=device://<devicename>/<servername>-autoinst.xml
where <devicename> is the device name and partition number of
the Server Activation USB flash drive, and <servername> identifies
the server name used to name the activation file (for example,
agn20-autoinst.xml). If you organized activation files into sub-directories by cabinet name or location, include the sub-directory name
in the path to the activation file.
The SLES AutoYaST installer requires the location of the server’s
activation file so that it can use the information in this file to
identify the type of grid node and install the correct packages.
NOTE Always enter the path to the activation file. If you do not,
AutoYaST does not customize the installation for the server.
In Linux, USB flash drives are usually mapped as SCSI devices.
The following device names are typical for StorageGRID grids:
Table 16: USB Flash Drive Names
• Admin Node, Control Node, or Gateway Node: sdc1
• Archive Node: sdb1
• Audit Node: sdb1
• Storage Node (disconnected): sdb1
• Storage Node (connected with a single storage volume without multipath storage): sdc1
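For example, for an Admin Node whose activation file is named
agn20-autoinst.xml and whose Server Activation USB flash drive appears
as sdc1 (illustrative values; confirm the actual device name and file
name on your server), the Boot Options entry would be:
autoyast=device://sdc1/agn20-autoinst.xml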
7. Press <Enter>.
There is a wait of about one minute while the installer processes
this information.
8. When prompted, remove the Server Activation USB flash drive,
and click OK.
9. When the login prompt appears, remove the DVD. Installation is
complete.
10. If you disconnected the storage to install Linux, reconnect the
storage to the server.
a. Power down the server. Enter: halt
b. Connect the cables between the server and the storage.
c. Power the server back up.
d. Power up the disk arrays.
Install Drivers
When installing StorageGRID software on supported servers, you may
also need to install drivers. The Enablement Layer CD for StorageGRID software may include drivers for supported servers, and you
can use GDU to install drivers included on the Enablement Layer CD.
However, if the Enablement Layer CD does not include a required driver,
you cannot use GDU to install it; you must locate and install the
driver manually.
see the Interoperability Matrix posted on the NetApp Support Site
(http://support.netapp.com/).
It is your responsibility to confirm that the drivers are the most recent
qualified version. For the latest version, see your hardware vendor.
Install Hardware Monitoring
Install SNMP hardware monitoring software provided by the server’s
hardware vendor. For example, for IBM servers, you can install IBM
Director software.
Reorder Network Interface Cards
Before proceeding with the installation of a server with multiple NICs,
ensure that you understand the purpose of each network interface on
the server, and that each one is connected correctly. The Network
Interface reorder tool (nic_reorder.py) can help you by automating the
process of associating a logical ethernet name to a physical network
interface.
The way that SLES assigns ethernet names to ports based on both the
PCI Bus ID and the MAC address of the network interfaces can lead to
an arbitrary arrangement of names and ports. This is particularly troublesome on Gateway Nodes, which always have multiple network
interfaces.
A Gateway Node that is part of a basic replication group has a
minimum of one network interface: one for the grid network and
optionally one on the customer network for application integration. A
Gateway Node that is part of a High Availability Gateway Cluster has
a minimum of two network interfaces, as it also requires a network
interface for the heartbeat service used to monitor the health of the
cluster. In addition, the customer and/or grid network interfaces of any
Gateway Node may be bonded to provide additional redundancy.
NOTE You cannot ping via the heartbeat interface until after SLES is
installed on the second server. Connect and identify this interface on
both servers after all other interfaces have been identified. If the
Gateway Node is a member of a High Availability Gateway Cluster
and is installed on physical servers (not VMs), there is a crossover
cable linking the two servers. An HAGC hosted on a virtual machine
does not require the use of a crossover cable for heartbeat.
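Once SLES is installed on both members of the cluster, a quick check of the heartbeat link might look like the sketch below; the 10.1.1.x network is the default heartbeat network described in appendix D, and the exact address shown is a hypothetical example.

# Run on one Gateway Node of the HAGC; 10.1.1.2 stands for the other node's heartbeat address.
ping -c 2 10.1.1.2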
Procedure
1. Log in to the server as root, using the password provided in the
Passwords.txt file.
2. Consult the grid documentation to determine which IP address is
assigned for each purpose.
3. Start the NIC reorder tool. Enter: nic_reorder.py
The script provides a mapping of the logical interfaces and the corresponding physical interfaces, as shown below. Each ethernet
name and IP address defined on the server is listed, along with
information about the physical interface, such as the PCI Bus ID of
each NIC.
Logical Interfaces              Physical Interfaces
Name   IP Address               #   Driver   MAC Address         PCI Bus Id
eth0   bond0 192.168.170.174    A   bnx2     00:18:fe:70:b0:a6   0000:05:00.0
eth1   bond0 192.168.170.174    B   e1000    00:18:fe:70:b0:a6   0000:13:00.1
eth2   bond1 192.168.120.74     C   e1000    00:17:08:7d:3b:bc   0000:17:00.0
eth3   bond1 192.168.120.74     D   bnx      00:17:08:7d:3b:bc   0000:03:00.0
eth4   192.168.120.73           E   e1000    00:17:08:7d:3b:bd   0000:17:00.1
4. If bonded interfaces show the same MAC address for both
members of the bonded pair:
a. Enter Q to exit the nic_reorder.py application.
b. Disable bonding. Enter: ifdown bond0
c. Start the NIC reorder tool. Enter: nic_reorder.py
When you re-enter the program, the actual MAC address of
each network interface is shown. (The bonding configuration is
reloaded if you exit the program using W or enter ifup bond0
after you exit the program.)
5. If no interface is mapped to eth0 (this happens if incorrect PCI Bus
IDs are specified in the specification file), reset the interfaces.
Enter: R
6. If the PCI Bus ID or MAC address is not readily visible on the
server (that is, you cannot easily identify each physical interface):
a. Note the IP address of the first logical interface.
b. Attach the network cable associated with the first logical interface to any one of the physical network interfaces on the server.
c. Enter Q to exit the nic_reorder.py application.
7. Ping the default gateway for the subnet containing the IP address
of the first logical interface (or another IP address in the same
subnet as this IP).
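For example, if the first logical interface is the grid network interface shown as bond0 192.168.170.174 in the sample output above, and assuming (hypothetically) that the default gateway for that subnet is 192.168.170.1, the check might look like this:

# Send a few pings to the default gateway of the first logical interface's subnet.
ping -c 4 192.168.170.1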
8. If the ping does not succeed, swap the logical interface you are
trying to find onto another PCI Bus ID:
a. Restart the NIC reorder tool. Enter: nic_reorder.py
b. Enter: S
c. When prompted for the first interface to swap, enter the letter
that appears to the right of the IP address of the interface you
are trying to find.
d. When prompted for the second interface, enter the letter associated with the next PCI Bus ID you want to try.
e. Confirm that you want to swap the interfaces. Enter: Y
f. Save the updated configuration and exit the program. Enter: W
g. The first logical interface is now associated with a different PCI
Bus ID. Try pinging the default gateway for the first logical
interface again.
h. If the ping does not succeed, go back to step a and try again.
NOTE The server may contain unused NICs.
9. When the ping succeeds, you have successfully identified the PCI
Bus ID that corresponds to the logical interface.
10. Note which physical interface corresponds to the first logical interface. This record helps you to keep track of the NICs as you
connect cables and swap interfaces.
11. Repeat steps 3 to 10 for each logical interface to correctly connect
all network interfaces on the server.
Configure Storage Nodes for NFS Mounted
Storage Volumes
For Storage Node servers with a Storage Volume Balancing attribute of Auto, ensure that the new Storage Node server has the same
number of volumes as the existing Storage Node server before you
configure NFS. For information on how to determine the number of
volumes, see “Storage Capacity” on page 130.
Use this procedure to configure the server for the installation of a
Storage Node integrated with NFS mounted storage volumes.
Prerequisites
• The NetApp storage system has been set up and exported via NFS.
  NOTE Configuration of the NFS server is beyond the scope of this guide.
• NFS mounted storage volumes are integrated with the Storage Node as described in the Administrator Guide.
• Ensure that you have the IP address of the NFS server.
Procedure
1. Log in to the Storage Node server as root, using the password
provided in the Passwords.txt file.
2. Verify connectivity to the NFS server. Enter: ping <NFS_Server_IP>
3. Verify that the Linux NFS client package is present (it should be
installed by default). Enter:
• For SLES 11: rpm -q nfs-client
• For SLES 10: rpm -q nfs-utils
4. Mount the NFS exports:
a. At the Storage Node server, using a text editor such as vi, add
this line to the /etc/fstab file for each storage volume (on one
line):
<NFS_Server_IP>:<volume_path> /var/local/rangedb/<next_index>
nfs rw,rsize=65536,wsize=65536,nfsvers=3,tcp
where:
•
<NFS_Server_IP> is the IP address of the NFS server
•
<volume_path> is the path of the storage volume exported
from the NFS server
•
<next_index> is the Storage Node rangedb index, a number between 0 and 15 in hexadecimal notation (0 to F, case-specific). For the first storage volume, the index number is 0.
For example:
192.168.130.16:/vol/vol1 /var/local/rangedb/0 nfs
rw,rsize=65536,wsize=65536,nfsvers=3,tcp
Repeat step a for each storage volume.
b. Create a mount point for each NFS storage volume, using the
same index numbers used in the /etc/fstab file in step a. Enter:
mkdir -p /var/local/rangedb/<next_available_index>
For example:
mkdir -p /var/local/rangedb/0
Do not create mount points outside of /var/local/rangedb.
Repeat step b for each storage volume.
c. Mount the storage volumes to the mount points. Enter:
mount -a
If mounting fails, make sure that the storage volumes are configured on the NFS server for read-write access to the
Storage Node and that the IP address of the Storage Node
supplied to the NFS server is correct.
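As an optional check that is not part of the documented procedure, you can confirm that each storage volume is mounted from the expected NFS export, for example:

# Show the NFS source and mount options for the first storage volume (index 0).
mount | grep /var/local/rangedb/0
# Show the size and free space of the mounted volume.
df -h /var/local/rangedb/0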
Load Software ISO Images
This section contains the following procedures:
• “Use load_cds.py with CDs” on page 207
• “Use load_cds.py with ISOs” on page 207
Use load_cds.py with CDs
Prerequisites
• The grid did not previously include a TSM Archive Node and you are adding it in this expansion.
You do not need to create any other ISO images. The Grid Deployment
Utility (GDU) installs the current version and service pack level of grid
software on the expansion grid nodes, using copies of the ISO images
that you created when the grid was installed or last updated.
Procedure
1. At the primary Admin Node server, access a command shell and
log in as root using the password listed in the Passwords.txt file.
2. Start GDU, as described in “Start GDU” on page 211.
3. Check to see if the required ISOs are present:
a. In the Actions panel, select ISO List and press <Enter>.
GDU shows a list of the ISO images that have already been
copied to the Admin Node.
b. Press <Enter> to close the ISO Repository Contents list and then
in the Actions panel, select Quit and press <Enter>.
If any of the required ISO images are missing, create them using
the load_cds.py script.
4. Remove any USB flash drive from the server.
5. Insert a CD. The order in which you insert the CDs does not
matter.
6. Enter: load_cds.py
Wait while the ISO image of the CD is written to the correct
directory.
7. When prompted, insert the next CD, type y, and press <Enter>.
8. Repeat step 7 for all CDs.
9. To exit, type n and press <Enter> when prompted.
10. Log out. Enter: exit
Use load_cds.py with ISOs
Use this procedure to load the ISOs if you have already copied ISO
images of the installation CDs to the primary Admin Node, for
example if the server is at a remote site and you have used scp to copy
the files.
NOTE Do not put the ISO images in the /var/local/install directory of the
primary Admin Node; use any other directory instead, for instance 
/var/local/tmp. The load_cds.py script will copy the files from the
directory you specify to the /var/local/install directory.
Prerequisites
• The grid did not previously include a TSM Archive Node and you are adding it in this expansion.
You do not need to load any other ISO images. The Grid Deployment
Utility (GDU) installs the current version and service pack level of grid
software on the expansion grid nodes, using copies of the ISO images
that you created when the grid was installed or last updated.
Procedure
1. At the primary Admin Node server, access a command shell and
log in as root using the password listed in the Passwords.txt file.
2. Start GDU, as described in “Start GDU” on page 211.
3. Check to see if the required ISOs are present:
a. In the Actions panel, select ISO List and press <Enter>.
GDU shows a list of the ISO images that have already been
copied to the Admin Node.
b. Press <Enter> to close the ISO Repository Contents list and then
in the Actions panel, select Quit and press <Enter>.
If any of the required ISO images are missing, create them using
the load_cds.py script.
4. Remove any USB flash drive from the server.
5. Load the ISO images. Enter on one line:
load_cds.py <iso_TSM_Client_Packages>
If there is more than one ISO, separate the ISO file names with a
space. The order does not matter.
Wait until the ISO images are written to the correct directory.
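For example, assuming the ISO image was copied to /var/local/tmp as recommended in the note above (the file name is the placeholder used in step 5):

# Replace <iso_TSM_Client_Packages> with the actual ISO file name.
load_cds.py /var/local/tmp/<iso_TSM_Client_Packages>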
Troubleshooting
This section includes troubleshooting topics to help you identify and
solve problems that may occur while preparing physical servers.
If problems persist, contact Support. You may be asked to supply the
following installation log files:
• /var/local/log/install.log (found on the server being installed)
• /var/local/log/gdu-console.log (found on the primary Admin Node)
Wrong USB Device Name
The USB flash drive device name depends on the configuration of the
StorageGRID system, and in particular on the amount of storage
installed.
If you are prompted to re-enter the device name and path, check the
device name.
1. Press <Ctrl>+<Alt>+<F5> to access another Linux command shell.
2. Enter: cat /proc/partitions
A list of devices is displayed. One of these devices is the Provisioning USB flash drive. Note that device sizes are also displayed. This
may assist you in identifying the correct device.
3. Enter: exit to close the command shell.
4. Press <Alt>+<F7> to return to the installation screen.
C
How to Use GDU
Start GDU
NOTE GDU is always run from the primary Admin Node or the HCAC’s
primary reporting Admin Node.
1. At the primary Admin Node server or the HCAC’s primary reporting Admin Node, access a command shell.
— or —
If using GDU remotely:
a. Start a Telnet/ssh client such as PuTTY.
b. Select Window > Translation > Remote character set > UTF-8.
Figure 16: PuTTY Settings for GDU
2. Log in as root using the password listed in the Passwords.txt file.
3. If you are using GDU for an upgrade, enter: exec bash
4. Enter: ssh-add
You need to run ssh-add, which adds the ssh private key to the ssh
agent, each time you start a new shell session.
For more information on ssh access points, see the Administrator
Guide.
5. If prompted, enter the SSH Access Password listed in the Passwords.txt file.
6. If using PuTTY, start screen. For example, enter: screen -S GDU
NOTE Do not use screen if running GDU locally because the GDU
console characters will not display properly.
The name of the session (for example GDU in the command above)
is optional, but recommended since it is useful for managing
screen sessions.
The screen program allows you to manage multiple shell instances
concurrently, connect to the same session from different locations,
detach from a session without stopping the program running
within the session, and resume a session that was previously
detached.
To detach from a screen, press <Ctrl>+<A> and then <Ctrl>+<D>.
To reattach to a screen, enter: screen -r
7. Start GDU. Enter: gdu-console
NOTE If you get an error using GDU during an upgrade, it is likely
because the session was already open. Either log out of the
session and log back in, or enter: exec bash
8. When prompted, enter the provisioning passphrase. Type the passphrase, press <Tab> to select OK, and then press <Enter>.
Figure 17: Entering the Provisioning Passphrase to Start GDU
If the characters do not display properly, see “GDU Display Problems” on page 217.
GDU User Interface
Figure 18: GDU Console
The GDU console consists of five panels:
• Servers — Displays the servers in the grid.
• Tasks — Displays the procedures that can be performed on the server selected in the Servers panel. Only the tasks applicable to the current situation are displayed. It is possible to run GDU tasks in parallel on different servers.
The list of tasks includes:
Task                 Select To
Continue Install     Continue the software installation on the primary Admin Node or the HCAC’s primary reporting Admin Node server if it has rebooted.
Enable Services      Start the grid software.
Install Driver       Install a driver from the Enablement Layer for StorageGRID Software CD.
Install Software     Install the grid software on a new server.
Load Configuration   Load NMS configuration settings.
Reboot Server        Reboot the server and start the services.
Remount Storage      Check for preserved storage volumes and remount them. Used for maintenance procedures on Storage Nodes.
Start Services       Start Server Manager and all services. This is equivalent to the command /etc/init.d/servermanager start.
Stop Services        Stop Server Manager and all services. This is equivalent to the command /etc/init.d/servermanager stop.
Update Software      Apply a service pack.
Upgrade Software     Install a new base version of the software and apply a service pack.
Update Status        Display the current server status.
These tasks are described in detail in the procedures where they
are used.
• Server Info — Displays the state of the server selected in the Servers panel. The status can be one of:

  Current State   Notes
  Available       The server is available for the tasks listed in the Tasks panel.
  Busy            A GDU task is running on this server.
  Error           A GDU task has failed.
  Pingable        The server is pingable, but cannot be reached because there is a problem with the hostname.
  Reachable       The server can be reached but is not available because the ssh host keys do not match.
• Log Messages — Displays the output of the GDU task executed for the server selected in the Servers panel. If you are running multiple GDU tasks in parallel, you can display the output of each task by selecting the appropriate server in the Servers panel.
• Actions — The actions are:

  Action       Select To
  Start Task   Start the procedure selected in the Tasks panel.
  ISO List     List the ISO images that are in the /var/local/install directory of the primary Admin Node.
  Quit         Quit GDU.
Entering Commands in GDU
Use the keyboard to enter commands:
To                                 Do
Go from panel to panel             Press <Tab>.
Go back from panel to panel        Press <Shift>+<Tab>.
Go up and down within a panel      Press <Up Arrow> and <Down Arrow>, <Page Up> and <Page Down>, or <Home> and <End>.
Go right and left within a panel   Press <Left Arrow> and <Right Arrow>.
Select a task                      Press the space bar. X appears next to the selected task.
Activate a command                 Press <Enter>.
Install Drivers with GDU
You can use GDU to install drivers that are included on the Enablement Layer for StorageGRID Software CD in the drivers directory. It is
your responsibility to confirm that the drivers included on the Enablement Layer for StorageGRID Software CD are the most recent
qualified version. If they are not, get the latest version from the
hardware vendor and install the drivers manually.
Prerequisites
• Connectivity to the primary Admin Node
• List of drivers required for this server
• Provisioning passphrase
Procedure
1. Start GDU.
2. Select the server in the Servers panel and confirm that its state is
Available.
3. Install the driver:
a. Select Install Driver in the Tasks panel. A panel listing the available drivers opens automatically. If the driver you need is not
listed, you must install this driver manually.
Figure 19: Installing Drivers with GDU
b. Select a driver from the list.
c. Select OK. Wait for the driver installation script to complete.
d. Reboot the server: Select Reboot Server in the Tasks panel, then
select Start Task in the Actions panel and press <Enter>.
4. Repeat step 3 if you need to install any other driver on this server
using GDU.
5. If you have finished using GDU, close it and remove passwordless
access.
Close GDU
If you quit GDU while a task is in progress, GDU pauses until the task
completes, and then closes. Some tasks, such as formatting storage
volumes on a new Storage Node, can take hours to complete. Avoid
quitting GDU while long-running tasks are in progress. Continue
working in another terminal window.
1. Quit GDU. Select Quit in the Actions panel and then press <Enter>.
When prompted, confirm that you want to quit GDU.
2. Remove the ability to access servers without a server password.
Enter: ssh-add -D
3. Close the screen session. Enter: exit
GDU Troubleshooting
GDU Display Problems
Under certain circumstances, the GDU console may not display properly. For an example, see Figure 20 below.
Figure 20: GDU Display Problems
• If using PuTTY, change the Window Translation setting to Use font encoding.
• If running GDU locally, do not use screen.
Problems with Server Status
When starting GDU, the status update of all servers may hang or take
a long time to complete. After the status update finishes, many servers
may appear as Unknown or Pingable when it is known that they are Available.
This typically occurs in large grids with many servers.
To correct the problem, quit GDU, and restart with the -k option. Enter:
gdu-console -k
Starting GDU with the -k option bypasses its initial status update on
startup. The state of all servers remains Unknown in GDU until you
manually update them using the Update Status task.
GDU Log Files
The GDU logs are located on the primary Admin Node in /var/local/log/
gdu-console.log.
Missing GDU Task
If a GDU task that you must execute is missing from the Tasks panel,
check the GDU log for the reason. For instance, it is possible that a
required ISO image is missing. To list the ISO images currently in the 
/var/local/install directory, use the ISO List GDU action. For a sample
output, see Figure 21 below.
Figure 21: ISO Images in /var/local/install
The label Missing, required means that the ISO image of the CD required
for the installation is not in the /var/local/install directory.
The label Not present means that an ISO image that GDU expected to
find is not in the /var/local/install directory, but GDU does not know
whether this ISO image is actually required.
Troubleshooting With screen in Multi Display Mode
The screen program is useful when two or more people need to
interact with a shell session simultaneously for troubleshooting purposes. Below is an example of two users connecting to GDU at the
same time.
User 1 creates a named screen session and starts GDU.
# screen -S GDU
# gdu-console
User 2 lists the screen sessions and connects without detaching User 1.
# screen -ls
There is a screen on:
        5361.GDU        (Attached)
1 Socket in /var/run/uscreens/S-root.
# screen -r -x GDU
Now both users are viewing GDU and inputs can come from either
user.
About load_cds.py
The load_cds.py command accepts two different inputs: physical installation CDs or ISO images of the installation CDs stored in a directory
on the primary Admin Node or the HCAC’s primary reporting
Admin Node.
You can run the load_cds.py script as many times as you need.
The script automatically deletes older service pack software when you
load the latest service pack software.
If you insert the same CD twice, no new ISO is created. The existing ISO
will not be overwritten.
If the load_cds.py script fails because you inserted a CD that the script
does not recognize, eject the CD and continue with the correct CD (you
do not have to start over from the first CD you loaded).
Copy ISO Files in Multi-Site Environment
In a multi-site environment, copy ISO files to the servers in the remote
location prior to installing or upgrading the software with GDU. This is
an optional, but recommended, step to reduce the number of large files
that would otherwise be transferred over a slow WAN link.
Figure 22: Copying Files to Remote Site — the primary Admin Node at the data center copies the ISO files to Server 1 at the remote site, which in turn copies them to Servers 2...n
Prerequisites and Required Materials
• ISO images of the StorageGRID Software CD and the Enablement Layer for StorageGRID Software CD have been copied to the primary Admin Node or the HCAC’s primary reporting Admin Node using the load_cds.py command
• ssh access between the Admin Node and the servers at the remote location
Procedure
1. At the primary Admin Node server or the HCAC’s primary reporting Admin Node, access a command shell and log in as root using
the password listed in the Passwords.txt file.
NOTE It is not usually necessary to copy the service pack ISO images, since these files are small: there is no real gain over letting GDU copy the files automatically.
2. Copy the ISO image of the StorageGRID Software CD to a server at
the remote location. If the site has an Admin Node, copy the ISO to
it. Otherwise, use a Gateway Node, preferably a secondary. Enter:
scp /var/local/install/Bycast_StorageGRID_9.0.0_Software_\
<build>.iso <destination>:/var/local/tmp
where <destination> is the hostname or IP address of the first server
at the remote location.
3. Copy the ISO image of the Enablement Layer for StorageGRID
Software CD from the primary Admin Node to the destination
server. Enter:
scp /var/local/install/Enablement_Layer_for_StorageGRID_\
9.0.0_Software_<build>.iso <destination>:/var/local/tmp
4. If this is an upgrade, install the load_cds.py script. Enter:
scp /usr/local/sbin/load_cds.py <destination>:/usr/local/sbin/
5. Log in to the server at the remote site where you copied the ISO
files. Enter: ssh <destination>
When prompted, enter the password for the remote server listed in
the Passwords.txt file.
6. Change to the /var/local/tmp directory. Enter: cd /var/local/tmp
7. Load the ISOs using the load_cds.py script. Enter:
load_cds.py Bycast_StorageGRID_9.0.0_Software_<build>.iso \
Enablement_Layer_for_StorageGRID_9.0.0_Software_<build>.iso
Separate the ISO file names with a space.
8. Empty the temporary directory. Enter: rm -r /var/local/tmp/*
9. Copy the ISOs from the first server at the remote location to the
remaining servers at the remote location. For each remaining
server:
a. Copy the ISO files needed for the update to the server. Enter:
scp /var/local/install/* <next_server>:/var/local/tmp
where <next_server> is the hostname or IP address of the next
server at the remote site.
b. If this is an upgrade, install the load_cds.py script. Enter:
scp /usr/local/sbin/load_cds.py <next_server>:/usr/local/sbin/
c. Log in to the next server at the remote site where you copied
the ISO files. Enter: ssh <next_server>
When prompted, enter the password for the remote server
listed in the Passwords.txt file.
d. Change to the /var/local/tmp directory. Enter: cd /var/local/tmp
e. Load the ISOs using the load_cds.py script. Enter:
load_cds.py Bycast_StorageGRID_9.0.0_Software_<build>.iso \
Enablement_Layer_for_StorageGRID_9.0.0_Software_<build>.iso
Separate the ISO file names with a space.
f. Empty the temporary directory. Enter: rm -r /var/local/tmp/*
g. End the ssh session. Enter: exit
h. Repeat from step a for each server at the remote site.
10. End the ssh session on the remote server. Enter: exit
11. Log out of the Admin Node. Enter: exit
D
Grid Specification Files and Provisioning
What is Provisioning
Provisioning is the process of turning a grid design into the collection of
files needed to create, expand, maintain, or upgrade the grid. That collection of files, referred to as the GPT (grid provisioning tool) repository,
includes the SAID package. The key input for provisioning is the grid
specification file.
About the SAID Package
The Software Activation and Integration Data (SAID) package contains
site-specific files for the grid. It is generated during the provisioning
process as a zip file and is named using the following naming
convention:
GID<grid_ID>_REV<revision_number>_SAID.zip
The SAID package contains the following items:
Table 17: SAID Package Contents

Item                        Description
Doc directory               Contains html files used to confirm provisioning.
Escrow_Keys directory       Encryption keys used by the Data Recovery Tool.
Grid_Activation directory   Contains activation files, one for each server. Activation files are named <servername>-autoinst.xml. Activation files are keyed to work with the hardware used for the grid deployment and the version of StorageGRID software.
Configuration.txt           Lists grid-wide configuration and integration data generated during the provisioning process.
Grid_Tasks directory        Contains files created by some types of changes to the grid specification file, such as adding a server or converting the grid to use metadata replication. Grid tasks are used to trigger various actions within the grid that are required to implement the specified changes to the grid.
Grid specification file     XML file that encapsulates the grid design. File name is: GID<grid_ID>_REV<revision_number>_GSPEC.xml
Passwords.txt               Passwords used to access the grid.
Grid Configuration Files
The Doc directory of the SAID package contains html files documenting the specifications of the grid’s configuration. Use these pages to
confirm that the grid configuration is correct and complete.
NOTE The index.html file can only be opened in a Windows Internet Explorer browser.

• Click the <SAID_package>/doc/index.html file. For an example, see Figure 23 below.
Figure 23: Index.html File
About Grid Specification Files
The grid specification file is an XML file that encapsulates the configuration information needed to install, expand, and maintain a grid. The
file includes topology, servers, options, and networking details for the
grid.
Figure 24: Grid Specification File in XML Notepad 2007
All new grid specification files are created and deployed using Grid
Designer. Likewise, all grid specification files updated to
StorageGRID 9.0 are edited and deployed using Grid Designer. For
more information, see the Grid Designer User Guide.
Grid Specification File Stages
The grid specification file goes through a number of stages as the grid
is designed and then installed:
• Default grid specification file — The default grid specification file describes the basic grid topology and grid configuration.
• Deployment grid specification file — The deployment grid specification file is created from the default grid specification file by updating it with customer-specific data, for example IP addresses.
• Provisioned grid specification file — The provisioned grid specification file is created when the provision command is run.
These stages are summarized in Figure 25 below.
Figure 25: Editing the Grid Specification File

• Design New Grid: a request for a new grid leads to preparing the default grid specification file; the factory defaults in the default grid specification file are replaced with customer-specific information to produce the deployment grid specification file; the grid is then provisioned, producing the provisioned grid specification file.
• Modify Grid: a request for grid changes leads to exporting the provisioned grid specification file from the grid; that file is edited to produce the deployment grid specification file; the grid is then provisioned, producing the provisioned grid specification file.
Naming Convention
Grid specification files use the naming convention
GID<grid_ID>_REV<revision_number>_GSPEC.xml, where <grid_ID>
refers to the grid’s unique identifier and <revision_number> refers to the
revision number of the grid specification file, for example,
GID1234_REV1_GSPEC.xml.
The default grid specification file has a <revision_number> of zero
(REV0). The revision number is increased by 1 each time the grid specification file is modified, for example to add servers, change IP
addresses, or refresh hardware. For the initial installation of the grid,
the revision number must be 1 (REV1) — that is, the default grid specification file has been modified once for the installation of the
StorageGRID system. Any other revision number will cause provisioning to fail.
Grid Specification File Structure
Grid specification files are edited and deployed using Grid Designer.
The following section describes the xml structure of the grid specification file.
Server Names
To review grid information using the grid specification file, you must
ensure that you select the correct server. Table 18 below lists the server
tags. In addition, check the server name
(gptSpec>grid>site>site>server>name) to confirm that you are using the
correct server.
Table 18: Server Tags

Server             XML Tag
Admin Node         admin
API Gateway Node   gateway
Archive Node       archive
Audit Node         custom
Control Node       control
Gateway Node       gateway
Storage Node       storage
Tags
Table 19 below lists the XML tags of the attributes most likely to be
reviewed or updated.
Table 19: Common Changes to Grid Specification Files

External NTP time sources
   XML tag: gptSpec>grid>ntp>sources>ip
   Notes: To provide a stable time source, it is recommended that four NTP time servers be used. External time sources must use the NTP protocol and not the SNTP protocol. In particular, do not use the Windows Time Service: it does not provide enough synchronization accuracy because it uses SNTP.

Networking information in a grid that does not have a private network
   XML tags: gptSpec>grid>site>site>server>default-gateway
             gptSpec>grid>site>site>server>grid-network>ip
             gptSpec>grid>site>site>server>grid-network>mask
             gptSpec>grid>site>site>server>grid-network>routes

NMS Entity Name
   XML tag: gptSpec>grid>site>site>nms-name
   Notes: Adds the attribute nms-name=“<name>” to the element tags for grid, site, and server. For example: <site name=“grid name” nms-name=“grid one”>

Networking information for grid communication in a grid with a private network
   XML tags: gptSpec>grid>site>site>server>default-gateway
             gptSpec>grid>site>site>server>grid-network>ip
             gptSpec>grid>site>site>server>grid-network>mask
             gptSpec>grid>site>site>server>grid-network>routes

Networking information for client access in a grid with a private network
   XML tags: gptSpec>grid>site>site>server>default-gateway
             gptSpec>grid>site>site>server>network>ip
             gptSpec>grid>site>site>server>network>mask
             gptSpec>grid>site>site>server>network>routes
   Notes: Servers that have client-side IP addresses are Admin Nodes, Gateway Nodes, and Archive Nodes.

Virtual IP address of Gateway Node cluster
   XML tag: gptSpec>grid>site>site>gateway>services-config>fsg>main-in-cluster>virtual-ip
   Notes: Virtual IP addresses are used with high availability clusters.

Heartbeat IP addresses
   XML tag: gptSpec>grid>site>site>gateway>network>ip
   Notes: Heartbeat IP addresses are only used with High Availability Gateway Clusters. They do not need to be modified unless they conflict with another network. If necessary, substitute the 10.1.1.x network with another unused non-routeable network.
View a Copy of the Grid Specification File
Follow this procedure if you need to quickly check the grid specification file. If you need to edit the file, use the procedure in “Export the
Latest Grid Specification File” on page 231 to obtain a copy of the file.
1. In the NMS MI, go to Grid Management > Grid Configuration > Configuration > Main.

Figure 26: View the Grid Specification File

2. Click Export at the bottom of the page, next to Grid Specification File. A new browser window opens, showing the grid specification file in raw XML.
Export the Latest Grid Specification File
Admin Node Hosted on a Virtual Machine
Prerequisites and Required Materials
• Passwords.txt file
• A utility such as WinImage (available at http://www.winimage.com) that permits you to create a floppy disk image
• A tool such as WinSCP (available at http://winscp.net/eng/download.php) to transfer files to and from the Admin Node
• Service laptop
Procedure
1. At the primary Admin Node server, access a command shell and
log in as root using the password listed in the Passwords.txt file.
2. Create a directory to hold the provisioned grid specification file.
Enter: mkdir -p /root/usb
3. Copy the provisioned grid specification file to the directory. Enter:
copy-grid-spec /root/usb
4. Use WinSCP to copy the GID<Grid_ID>_REV<revision_number>_GSPEC.xml file from the Admin Node to your service laptop.
Alternatively, copy the file from the Admin Node to your service
laptop using a floppy image. Create the floppy image on your
service laptop, connect to it in the vSphere client, and then mount
it under Linux as /media/floppy. Copy the grid specification file to
the floppy image from the Admin Node virtual machine, unmount
it, and then extract the files to your laptop.
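If the service laptop has a command-line scp client, the WinSCP step can also be sketched as follows; the Admin Node address is a hypothetical example:

# Run from the service laptop; 10.0.0.10 stands for the primary Admin Node's IP address.
# Copies the exported grid specification file from /root/usb to the current directory.
scp root@10.0.0.10:/root/usb/GID*_GSPEC.xml .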
5. Log out. Enter: exit
Admin Node Hosted on a Physical Server
NOTE New installations of the Release 9.0 StorageGRID system are not
supported on physical servers. Virtual machines must be used.
Prerequisites and Required Materials
• USB flash drive
• Passwords.txt file
Procedure
1. At the primary Admin Node server, access a command shell and
log in as root using the password listed in the Passwords.txt file.
2. Insert a USB flash drive.
3. Copy the provisioned grid specification file to the USB flash drive.
Enter: copy-grid-spec
4. Log out. Enter: exit
Provision the Grid
Use this procedure to implement the changes made to the grid specification file. The provisioning script imports the updated grid
specification file into the grid and generates any grid tasks required to
complete the implementation of the changes.
On a Primary Admin Node on a Virtual Machine
Prerequisites and Required Materials
• Deployment grid specification file. See “Export the Latest Grid Specification File” on page 231.
• A utility such as WinImage (available at http://www.winimage.com) that permits you to create a floppy disk image
• Passwords.txt file
• Provisioning passphrase
• A tool such as WinSCP (available at http://winscp.net/eng/download.php) to transfer files to and from the Admin Node
• Service laptop
Procedure
1. Create a floppy image that contains the deployment grid specification file:
a. Start the WinImage software on your service laptop.
b. From the File menu, select New. In the Format selection dialog,
select a standard format 1.44 MB floppy. Click OK.
c. From the Image menu, select Inject. Browse for the grid specification file, and select Open. When prompted, confirm that you
want to inject the file. Select Yes.
2. Save the floppy image:
a. From the File menu, select Save.
b. In the Save dialog, browse to the destination folder.
c. Select Save as type: Virtual floppy Image (*.vfd,*.flp)
d. Enter a filename ending in .flp. For example, <servername>.flp
You must enter the extension, or the vSphere client cannot use
the image during installation.
e. Click Save.
3. At the primary Admin Node, log in as root. When prompted for a
password, press <Enter>.
4. In vSphere Client, connect the Provisioning floppy image by
clicking the Connect/Disconnect the floppy devices of the virtual
machine icon and selecting Connect Floppy 1 > Connect to floppy image on local disk.
5. Click in the vSphere console window to return to the command
line.
6. Copy the GID<Grid_ID>_REV<revision_number>_GSPEC.xml file from
the deployment floppy image to the primary Admin Node:
a. Mount the floppy image. Enter: mount /media/floppy
b. Copy its contents to the Admin Node. Enter:
mkdir /root/usb
cp /media/floppy/* /root/usb
c. Unmount the floppy image. It is no longer needed. Enter:
umount /media/floppy
7. Remove any old grid specification files from /root/usb.
Ensure that there is only one file named
GID<grid_ID>_REV<revision_number>_GSPEC.xml.
NOTE The /root/usb directory must contain only one grid specification
file. Otherwise, provisioning will fail.
8. Run the provisioning script:
a. At the primary Admin Node server, access a command shell
and log in as root using the password listed in the Passwords.txt
file.
b. Run the provisioning script. Enter: provision /root/usb
c. When prompted, enter the provisioning passphrase.
When the process is complete, “Provisioning complete” is displayed.
NOTE If provisioning ends with an error message, see “Provisioning
Troubleshooting” on page 241.
9. Back up the provisioning data to another directory on the
Admin Node.
a. Create a directory for the backup provisioning data. Enter:
mkdir -p /var/local/backup
b. Back up the provisioning data. Enter:
backup-to-usb-key /var/local/backup
c. When prompted, enter the provisioning passphrase.
10. Store the Provisioning directory (found at /root/usb) and the
Backup Provisioning directory (found at /var/local/backup) separately in a safe place. For example, use WinSCP to copy these
directories to your service laptop, and then store them to two
separate USB flash drives that are stored in two separate and
secure locations. For more information, see “Preserving Copies of
the Provisioning Data” on page 240.
The contents of the Provisioning directory are used during expansion and maintenance of the grid when a new SAID package must
be generated.
WARNING Store copies of the Provisioning directory in two
separate and secure locations. The Provisioning directories contain encryption keys and passwords that can
be used to obtain data from the grid. The Provisioning
directory is also required to recover from a primary
Admin Node failure.
On a Primary Admin Node on a Physical Server
NOTE New installations of the Release 9.0 StorageGRID system are not
supported on physical servers. Virtual machines must be used.
Prerequisites and Required Materials
• Deployment grid specification file. See “Export the Latest Grid Specification File” on page 231.
• Provisioning USB flash drive
• Backup Provisioning USB flash drive
• Passwords.txt file
• Provisioning passphrase
Procedure
1. Copy the edited grid specification file to the root level of the Provisioning USB flash drive.
2. Remove the old grid specification file from the root level of the Provisioning USB flash drive.
NOTE The Provisioning USB flash drive must contain only one grid specification file at the root level. Otherwise, provisioning will fail.
3. Verify that the Provisioning USB flash drive contains only one grid
specification file at the root level, that is, there is only one file named
GID<grid_ID>_REV<revision_number>_GSPEC.xml.
4. Run the provisioning script:
a. At the primary Admin Node server, access a command shell and
log in as root using the password listed in the Passwords.txt file.
b. Run the provisioning script. Enter: provision
c. When prompted, insert the Provisioning USB flash drive.
d. When prompted, enter the provisioning passphrase.
e. When provisioning is complete, remove the Provisioning USB
flash drive.
NOTE If provisioning ends with an error message, see “Provisioning Troubleshooting” on page 241.
5. Back up the provisioning data:
a. Insert the Backup Provisioning USB flash drive.
b. Enter: backup-to-usb-key
c. When prompted, enter the provisioning passphrase.
6. When backup is complete, remove the Backup Provisioning USB
flash drive.
7. Review the current configuration to confirm all settings are correct:
a. Copy the file GID<grid_ID>_REV<revision_number>_SAID.zip on
the USB Provisioning flash drive to the service laptop and
extract the contents.
b. Inspect the file Doc\Index.html to make sure that the settings are
correct. If there is an error, you need to provision the grid again.
For more information, see “Errors in Grid Specification File” on
page 242.
8. Store the Provisioning USB flash drive and the Backup Provisioning USB flash drive separately in safe locations.
WARNING Store copies of the Provisioning USB flash drive and
the Backup Provisioning USB flash drive in two
separate and secure locations. The USB flash drives
contain encryption keys and passwords that can be
used to obtain data from the grid. The Provisioning
USB flash drive is also required to recover from a
primary Admin Node failure.
Change the Provisioning Passphrase
Use this procedure to update the provisioning passphrase. The provisioning passphrase is used to encrypt the GPT repository. It is created
when the grid is first installed and is required for software upgrades,
grid expansions, and many maintenance procedures.
WARNING The provisioning passphrase is required for many
installation and maintenance procedures. The provisioning passphrase is not listed in the Passwords.txt
file. Make sure that it is documented and kept in a safe
location.
On a Primary Admin Node on a Virtual Machine
Prerequisites and Required Materials
• Passwords.txt file
• Current provisioning passphrase
• New provisioning passphrase
Procedure
1. At the primary Admin Node server, access a command shell and
log in as root using the password listed in the Passwords.txt file.
2. Change the passphrase:
a. Enter: change-repository-passphrase <path>
where <path> is the location on the server where you want to
store a copy of the updated GPT repository.
b. When prompted, enter the old passphrase.
c. When prompted, enter the new passphrase. It must be at least six
characters.
d. When prompted, enter the passphrase again.
The passphrase of the GPT repository is changed to the new
value, and an updated copy of the repository that uses this
password is saved to <path>.
WARNING The provisioning passphrase is required for many
installation and maintenance procedures. The provisioning passphrase is not listed in the Passwords.txt
file. Make sure that it is documented and kept in a safe
location.
3. Back up the provisioning data to another directory on the
Admin Node. This backup copy can be used to restore the grid in the
case of an emergency or during an upgrade or grid expansion.
a. Create a directory for the backup provisioning data. Enter:
mkdir -p /var/local/backup
b. Back up the provisioning data. Enter:
backup-to-usb-key /var/local/backup
c. When prompted, enter the provisioning passphrase.
4. Store the contents of the Provisioning directory (found at </var/local/
gpt-data/>) and the Backup Provisioning directories (/var/local/backup)
separately in a safe place. For more information, see “Preserving
Copies of the Provisioning Data” on page 240.
WARNING Protect the contents of the Provisioning directory. The
Provisioning directory contains encryption keys and
passwords that can be used to obtain data from the grid.
The Provisioning directory is also required to recover
from a primary Admin Node failure.
5. Close the command shell. Enter: exit
6. Write down the provisioning passphrase for future reference.
On a Primary Admin Node on a Physical Server
NOTE New installations of the Release 9.0 StorageGRID system are not
supported on physical servers. Virtual machines must be used.
Prerequisites and Required Materials
• Provisioning USB flash drive
• Backup Provisioning USB flash drive
• Passwords.txt file
• Current provisioning passphrase
• New provisioning passphrase
Procedure
1. At the primary Admin Node server, access a command shell and
log in as root using the password listed in the Passwords.txt file.
2. Change the passphrase:
a. Enter: change-repository-passphrase
b. When prompted, enter the old passphrase.
c. When prompted, enter the new passphrase. It must be at least
six characters.
d. When prompted, enter the passphrase again.
e. When prompted, insert the Provisioning USB flash drive.
3. Remove the Provisioning USB flash drive and store it in a safe place.
4. When prompted, insert the Backup Provisioning USB flash drive.
5. When backup is complete, remove the Backup Provisioning USB
flash drive and store it in a safe place.
WARNING Store copies of the Provisioning USB flash drive and
the Backup Provisioning USB flash drive in two
separate and secure locations. The USB flash drives
contain encryption keys and passwords that can be
used to obtain data from the grid. The Provisioning
USB flash drive is also required to recover from a
primary Admin Node failure.
6. Close the command shell. Enter: exit
7. Write down the provisioning passphrase for future reference.
WARNING The provisioning passphrase is required for many
installation and maintenance procedures. The provisioning passphrase is not listed in the Passwords.txt
file. Make sure that it is documented and kept in a safe
location.
Provisioning Without a USB Flash Drive
The following commands are run at the command shell interface of the
primary Admin Node to either update or copy grid provisioning data:
• provision
• load-provisioning-software
• change-repository-passphrase
• copy-grid-spec
• backup-to-usb-key
• restore-from-usb-key
By default, provisioning data is assumed to be stored on the Provisioning USB flash drive and the Backup USB flash drive. You are prompted
to insert the appropriate device so that updated data can be written to
these locations.
However, you can store provisioning information to another location
by optionally entering a file path as an argument to each of these commands, as follows:
• provision <path>
• change-repository-passphrase <path>
• copy-grid-spec <path>
• backup-to-usb-key <path>
• restore-from-usb-key <path>
When loading provisioning software during upgrade, use the following flag to specify an alternate location for the updated provisioning
information:
• load-provisioning-software --alternate-usb-dir=<path>
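For example, the virtual machine procedures earlier in this appendix use directories on the primary Admin Node as the alternate location:

# Provision using a working directory that holds the single grid specification file.
provision /root/usb
# Back up the provisioning data to a second directory on the Admin Node.
backup-to-usb-key /var/local/backup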
If you choose to store provisioning data to another location, be aware
of the following:
• the size of the GPT repository increases every time the provision command is run, because a new revision is created and added to the GPT repository
• preserving copies of the GPT repository is critical to the continued operation of the grid, as described in “Preserving Copies of the Provisioning Data” below

NOTE For information on supported versions of VMware software, see the Interoperability Matrix Tool (IMT).
Because VMware vSphere software does not support the use of USB
flash drives, it is required that you store provisioning data to an alternate location when the primary Admin Node is installed in a virtual
machine. Floppy disk images are sometimes used to transfer data to or
from software running in a virtual machine, but you should be aware
of the following:
•
except for very small grids, the SAID package is too large to place
on a floppy disk image
•
except for very small grids that have only a few revisions, the GPT
repository is too large to place on a floppy disk image
Therefore you will generally store provisioning data to a location on
the primary Admin Node, and immediately make additional copies in
alternate locations as described in “Preserving Copies of the Provisioning Data” below.
Preserving Copies of the Provisioning Data
Preserving the grid’s GPT repository is critical to the continued operation of StorageGRID software. The contents of the GPT repository are
required to upgrade, maintain, or expand the grid. The GPT repository
is also required to restore the primary Admin Node, should it fail and
require replacement.
VMware vSphere does not permit you to store data from the grid node
directly to a USB flash drive. Each time you run one of the commands
that update or copy grid provisioning data, you must back up the provisioning data to two secure locations, preferably in two distinct
physical locations. For example, use a tool such as WinSCP to copy the
provisioning data to your service laptop, and then store it to two USB
flash drives. Store these USB flash drives separately in two geographically distinct secure locations such as a locked cabinet or safe. The USB
flash drives contain encryption keys and passwords that can be used
to obtain data from the grid.
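A minimal sketch of making one such copy from a laptop that has a command-line scp client, assuming the /var/local/backup directory used earlier in this appendix and a hypothetical Admin Node address:

# Run from the service laptop; 10.0.0.10 stands for the primary Admin Node's IP address.
scp -r root@10.0.0.10:/var/local/backup ./provisioning-backup
# Then copy the provisioning-backup directory to two USB flash drives stored separately.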
WARNING Always store two copies of the GPT repository in two
separate and secure locations. The GPT repository is
essential to the continued operation of the grid, and to
recover from a failed primary Admin Node.
It is possible to back up provisioning data directly into the grid by
saving a copy to an FSG file share. If you store a copy to the grid, it is
recommended that you always store a second copy in another location
outside of the grid. The SAID package includes a two-part encryption
key that permits you to recover data from grid nodes in the event of a
catastrophic grid failure. If the only copies of these keys are in the grid
itself, it is not possible to recover data after such a failure.
In a grid where the primary Admin Node was upgraded to StorageGRID 9.0 and is installed directly on a physical server, it is
recommended that you store provisioning data on two USB flash
drives (the Provisioning USB flash drive and the Backup USB flash
drive). Store the Provisioning USB flash drive and the Backup Provisioning USB flash drive separately in two geographically distinct
secure locations such as a locked cabinet or safe. The USB flash drives
contain encryption keys and passwords that can be used to obtain data
from the grid.
NOTE New installations of the Release 9.0 StorageGRID system are not
supported on physical servers. Virtual machines must be used.
Provisioning Troubleshooting
In case of provisioning errors, follow the guidelines below.
Figure 27: Troubleshooting Provisioning Errors

Provision command fails — look for the log file on the provisioning media:
• provision-fail.log is the only file: fix the grid specification file (keep the same rev number) and run the provision command again. There is no need to run the remove-revision command.
• provision-crash-<grid_info>.log is created: contact Support and supply the log files found on the provisioning media.

Provision command completes normally, but there are configuration errors in the SAID package — get the revision number from the grid specification file:
• Rev number of the grid spec file is 1 (new installation): fix the grid specification file (keep the rev number at 1) and start the installation over, that is, reinstall Linux, load the provisioning software, and run the provision command.
• All other revisions: run remove-revision, fix the grid spec file (keep the same rev number), and run the provision command again.
Provision Command Fails
If provisioning fails because the grid specification file is incorrect, the
file provision-fail.log is created on the Provisioning Media. This file
contains the error message that the provisioning software displayed
before terminating.
If the provisioning program terminates abnormally (crash), two identical log files are saved to the Provisioning Media:
• provision-fail.log
• provision-crash-<grid_info>.log
where <grid_info> includes the grid ID, the grid revision being
created and a timestamp.
If the provision-fail.log file is the only file created, fix the grid specification file by updating it with Grid Designer and run provisioning again.
If the provision-crash-<grid_info>.log file is created, contact Support.
If provisioning ends with an error, no information is saved and the
remove-revision command does not need to be run.
Errors in Grid Specification File
If the provision command completes normally, but you discover an
error in the provisioning data after examining the configuration pages
in the SAID package, fix the grid specification file by updating it with
Grid Designer and then reprovision the grid.
Initial Installation
NOTE Follow this procedure if the revision number of the grid specification
file is 1.
If during the initial installation you discover errors in the SAID
package, you must fix the grid specification file and reinstall the
primary Admin Node from the beginning, that is, you must reinstall
Linux, load provisioning software, and provision the grid.
Upgrades, Expansion, and Maintenance
Procedures
NOTE Follow this procedure if the revision number of the grid specification
file is greater than 1. This procedure cannot be used for a new
installation.
1. Confirm that no scripts or grid tasks generated by provisioning
have been started.
2. Remove the provisioning data from the grid. Enter:
remove-revision
The remove-revision command does not remove grid tasks generated by provisioning nor does it roll back grid tasks that have
already been run.
WARNING Do not use the remove-revision command if you have
started any scripts or grid tasks that were generated
by provisioning. Contact Support for assistance.
3. When prompted, enter the provisioning passphrase.
4. Cancel any pending grid tasks created by the provisioning.
5. Fix the deployment grid specification file with Grid Designer and
save it to the root directory of the Provisioning Media. Do not
change the REV<revision_number>.
NOTE The Provisioning Media must contain only one grid specification
file at the root level. Otherwise, provisioning will fail.
6. Run provisioning again and generate a new SAID package. For
more information, see “Provision the Grid” on page 232. The old
SAID package is overwritten and a new one is generated that uses
the same naming convention.
7. Review the contents of the SAID package to confirm that the provisioning information is correct.
E
Connectivity
Connection requirements for the NMS management
interface and the server command shell
Browser Settings
Verify Internet Explorer Settings
If you are using Internet Explorer, verify that the settings for temporary internet files, security and privacy are correct.
1. Go to Tools > Internet Options > General.
2. In the Browsing history box, click Settings.
3. For Check for newer versions of stored pages, select Automatically.
Figure 28: Temporary Files Setting
4. Go to Tools > Internet Options > Security > Custom Level and
ensure that the Active Scripting setting is Enable.
Figure 29: Active Scripting Setting
5. Go to Tools > Internet Options > Privacy and ensure that the
privacy setting is Medium or lower (cookies must be enabled).
Enable Pop-ups
To make any changes to passwords, you must ensure that your
browser allows pop-up windows. For more information on allowing
pop-up windows, see your browser’s documentation.
NMS Connection Procedure
Connecting to the NMS MI at the customer site requires access to the
customer’s network.
If the grid is configured with a High Capacity Admin Cluster (HCAC), you can connect only to the reporting Admin Node; you cannot connect to the processing Admin Node. In a grid with two Admin Nodes or two HCACs, you can connect to either Admin Node or to either HCAC’s reporting Admin Node. Each Admin Node or HCAC displays a similar view of the grid; however, alarm acknowledgments made at one Admin Node or HCAC are not copied to the other. It is therefore possible that the Grid Topology tree will not look the same for each Admin Node or HCAC.
1. Work with the customer system administrator to establish the
physical network connection to the service laptop. Using the customer’s network rather than a direct connection within the rack
verifies that the interface is accessible using the same infrastructure
the customer uses.
2. From the Configuration.txt file, note the IP address of the
Admin Node (reporting Admin Node in an HCAC) on the
customer network. This is needed to access the NMS MI.
3. From the Passwords.txt file, note the NMS MI password for the
Vendor account or the Admin account.
4. Launch the web browser.
5. Open the address https://<IP_address>
where <IP_address> is the address of the Admin Node (reporting
Admin Node in an HCAC) on the customer network specified in
the Configuration.txt file.
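If the browser cannot reach this address, an optional reachability check from a command shell can help distinguish a network problem from a browser configuration problem. The example below is not part of the documented procedure; it assumes a curl client is available on the service laptop, and 10.1.2.3 is a placeholder for the Admin Node IP address from the Configuration.txt file.
Request only the response headers; -k accepts the self-signed certificate
# curl -k -I https://10.1.2.3
Any HTTP response indicates that the NMS MI is reachable over the customer network.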
Security Certificate
Depending on your version of Windows and web browser, you may be
warned of a problem with the security certificate when you access the
NMS MI URL.
Figure 30: Example of a Security Alert Window
If this appears, you can either:
•  Proceed with this session. The alert will appear again the next time you access this URL.
•  Install the certificate. Follow the instructions of your browser.
The NMS MI uses a self-signed certificate. For information on importing this certificate into a browser, see the browser’s documentation.
Note that the self-signed certificate used by the NMS MI is based on the grid’s IP address. The expected URL for the interface is therefore this IP address and not a domain name. If a domain name is used instead, the browser may not be able to match the self-signed certificate to the identity of the NMS server. For more information, see the browser’s documentation.
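If it is unclear which identity the certificate presents, an optional check with the openssl client (if installed) displays the certificate subject and validity dates. This is not part of the documented procedure; 10.1.2.3 is a placeholder for the Admin Node IP address.
Display the subject and validity dates of the NMS MI certificate
# echo | openssl s_client -connect 10.1.2.3:443 2>/dev/null | openssl x509 -noout -subject -dates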
Log In
1. Enter the username Vendor for full access to the NMS MI. If you are
not making grid-wide configuration changes, you can also use the
Admin account.
2. Enter the password for the NMS MI specified in the Passwords.txt
file.
Figure 31: NMS MI Login Window
Log Out
When you finish your NMS MI session, log out to keep the system
secure.
1. Click Logout, located at the top-right corner of the screen.
The logging out message appears.
2. You may safely close the browser or use other applications.
NOTE Failure to log out may give unauthorized users access to your NMS
session. Simply closing your browser is not sufficient to log out of the
session.
Command Shell Access Procedures
Log In
•  At the server, access a command shell and log in as root using the password listed in the Passwords.txt file.
Log Out
1. Enter exit to close the current command shell session.
2. Press <Alt>+<F7> to return to the Server Manager GUI.
Accessing a Server Remotely
There are three ways to connect to a server remotely using ssh:
•  From any server, using the remote server password
•  From the primary Admin Node, using the ssh private key password
•  From the primary Admin Node, without entering any password except the ssh private key password once
The primary Admin Node acts as an ssh access point for other grid
servers. The procedures to change the ssh private key password and to
enable passwordless access from the primary Admin Node to other
servers are described in the “Network Configuration” chapter of the
Administrator Guide.
Connect Using the Remote Server Password
1. Log in to any local server.
2. Enter: ssh <IP_address>
where <IP_address> is the IP address of the remote server.
3. When prompted, enter the password for the remote server listed in
the Passwords.txt file.
Connect Using the ssh Private Key Password
1. Log in to the primary Admin Node.
2. Enter: ssh <hostname>
where <hostname> is the name of the remote server.
— or —
Enter: ssh <IP_address>
where <IP_address> is the IP address of the remote server.
3. When prompted, enter the SSH Access Password listed in the Passwords.txt file.
Connect to a Server Without Using a Password
1. Log in to the primary Admin Node.
2. Add the ssh private key to the ssh agent to allow the primary
Admin Node passwordless access to the other servers in the grid.
Enter: ssh-add
You need to add the ssh private key to the ssh agent each time you
start a new shell session on the primary Admin Node.
3. When prompted, enter the SSH Access Password.
You can now access any grid server from the primary Admin Node
via ssh without entering additional passwords.
4. When you no longer require passwordless access to other servers,
remove the private key from the ssh agent. Enter: ssh-add -D
5. Log out of the primary Admin Node command shell. Enter: exit
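The sketch below shows what such a session might look like. The hostname storage-node-1 is illustrative only; use the hostnames or IP addresses listed for your grid.
Add the private key to the ssh agent (prompts for the SSH Access Password)
# ssh-add
Connect to another grid server without entering a password
# ssh storage-node-1
# exit
Remove the private key from the agent when finished
# ssh-add -D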
Using GNU screen
The GNU screen program, which is installed by default, allows you to
manage many shell instances concurrently, to connect to the same
session from different locations, to detach from a session without
stopping the program running within the session, and to resume a
session that was previously detached.
The screen program decouples the terminal emulator from the running program. This means that the program keeps running even if you detach from the session, close the terminal emulator, or lose the connection.
Consider using screen when you execute maintenance procedures
remotely and there is a possibility of losing the connection or where it
would be useful to 'hang up' and connect back later. For example, you
may want to use screen when cloning a CMS database since this procedure can take a few hours to complete.
1. Log in to a server remotely. For more information, see “Accessing a
Server Remotely” on page 249.
2. Start screen. Enter: screen
3. Enter the command or script you need to execute in the new
window.
4. To quit the screen session, enter: exit
Screen has a number of command-line options. For example:
-d          To detach a session. The running program disappears from the terminal. However, it continues to run behind the scenes.
-r          To resume a detached screen session. This brings the program back to your terminal.
-ls         To list existing sessions.
-S <name>   To name a session.
-x <name>   To attach to a screen session that is already attached by another user (multi display mode). Either user can interact with the screen session or detach from it.
See below for an example of how to use screen.
Create named session:
# screen -S CMScloneproc
# <commands to start cloning process>
Detach screen session:
# <CTRL+A> <CTRL+D>
List screen sessions:
# screen -ls
There is a screen on:
        20849.CMScloneproc    (Detached)
1 Socket in /var/run/uscreens/S-root.
Reattach screen session:
# screen -d -r CMScloneproc
For more information, display the man page for screen, and consult the
GNU official web site http://www.gnu.org/software/screen/.
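As a further example of the -x option described above, a second administrator could join the session created in the previous example in multi display mode:
Attach to a session that is already attached elsewhere
# screen -x CMScloneproc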
Glossary
ACL
Access control list—Specifies what users or groups of users are allowed to access an object and what operations are permitted, for example read, write, and execute.
active primary FSG
In an HAGC, the FSG that is currently providing read-write service to clients. See also “FSG replication group”.
ADC
Administrative Domain Controller—A software component of the
StorageGRID system. The ADC service maintains topology information, provides authentication services, and responds to queries from
the LDR, CMS, CMN, and CLB. The ADC service is found on the
Control Node.
ADE
Asynchronous Distributed Environment—Proprietary development
environment used as a framework for grid services within the NetApp
StorageGRID Software.
Admin Node
A building block of the StorageGRID system. The Admin Node
provides services for the web interface, grid configuration, and audit
logs. See also “reporting Admin Node”, “processing Admin Node”,
“primary Admin Node”, “Audit Node”, and “HCAC”.
AMS
Audit Management System—A software component of the StorageGRID system. The AMS service monitors and logs all audited system events and transactions to a text log file. The AMS service is found on the Admin Node (reporting Admin Node in a High Capacity Admin Cluster, or HCAC) and the Audit Node.
API
Application Programming Interface—A set of commands and functions, and their related syntax, that enable software to use the functions provided by another piece of software.
API Gateway Node
Application Programming Interface Gateway Node—Provides read-write access for HTTP clients (via StorageGRID API or CDMI). API Gateway Nodes are configured to include a “CLB” service, but not an “FSG” service. As a result, API Gateway Nodes do not support NFS/CIFS file systems and are not configured as part of a replication group.
ARC
Archive—A software component of the StorageGRID system. The ARC service manages interactions with archiving middleware that controls nearline archival media devices such as tape libraries. The ARC service is found on the Archive Node.
Archive Node
A building block of the StorageGRID system. The Archive Node manages storage of data to nearline data storage devices such as tape libraries (via IBM Tivoli® Storage Manager).
Audit Node
A building block of the StorageGRID system. The Audit Node logs all
audit system events. It is an optional grid node that is generally
reserved for larger grid deployment.
audit message
Information about an event occurring in the StorageGRID system that
is captured and logged to a file.
atom
Atoms are the lowest-level component of the container data structure,
and generally encode a single piece of information. (Containers are
sometimes used when interacting with the grid via the StorageGRID
API).
AutoYaST
An automated version of the Linux installation and configuration tool
YaST (“Yet another Setup Tool”), which is included as part of the SUSE
Linux distribution.
BASE64
A standardized data encoding algorithm that enables 8-bit data to be
converted into a format that uses a smaller character set, enabling it to
safely pass through legacy systems that can only process basic (low
order) ASCII text excluding control characters. See RFC 2045 for more
details.
Basic Gateway replication group
A Basic Gateway replication group contains a primary FSG and one or more secondary FSGs.
Binding
The persistent assignment of a grid service (for example, an FSG or
SSM) to the consolidated NMS service or processing NMS service. This
assignment is based on grid topology (consolidated Admin Node or
HCAC). See also “Admin Node”.
bundle
A structured collection of configuration information used internally by
various components of the grid. Bundles are structured in container
format.
business continuity failover
A business continuity failover within a Gateway Node replication group is one where a secondary Gateway Node is manually configured to act as a primary after the primary Gateway Node fails. Clients can continue to read and write to the grid after they are manually redirected to the acting primary. This is a temporary measure to maintain service while the primary Gateway Node is repaired.
CBID
Content Block Identifier — A unique internal identifier of a piece of
content within the StorageGRID system.
CDMI
Cloud Data Management Interface — An industry standard defined by
SNIA that includes a RESTful interface for object storage. For more
information, see http://www.snia.org/cdmi.
CIDR
Classless Inter-Domain Routing—A notation used to compactly
describe a subnet mask used to define a range of IP addresses. In CIDR
notation, the subnet mask is expressed as an IP address in dotted
decimal notation, followed by a slash and the number of bits in the
subnet. For example, 192.0.2.0/24.
CIFS
Common Internet File System—A file system protocol based on
SMB (Server Message Block, developed by Microsoft) which coexists
with protocols such as HTTP, FTP, and NFS.
CLB
Connection Load Balancer—A software component of the StorageGRID system. The CLB service provides a gateway into the grid for
clients connecting via the HTTP protocol. The CLB service is part of
the Gateway Node.
Cloud Data Management Interface
See “CDMI” on page 255.
CMN
Configuration Management Node— A software component of the
StorageGRID system. The CMN service manages system-wide
configuration and grid tasks. The CMN service is found on the
primary Admin Node.
CMS
Content Management System—A software component of the StorageGRID system. The CMS service manages content metadata and content replication according to the rules specified by the ILM policy. The
CMS service is found on the Control Node.
command
In HTTP, an instruction in the request header such as GET, HEAD,
DELETE, OPTIONS, POST, or PUT. Also known as an HTTP method.
container
A container is a data structure used by the internals of grid software.
In the StorageGRID API, an XML representation of a container is used
to define queries or audit messages submitted using the POST
command. Containers are used for information that has hierarchical
relationships between components. The lowest-level component of a
container is an atom. Containers may contain 0 to N atoms, and 0 to N
other containers.
content block ID
See “CBID”.
content handle
See “UUID”.
consolidated Admin Node
Admin Node hosting the consolidated NMS service. Can be the
primary Admin Node.
consolidated NMS
Hosted by the consolidated Admin Node. It is the equivalent of a
combined reporting NMS and processing NMS service. See also
“NMS”.
Control Node
A building block of the StorageGRID system. The Control Node
provides services for managing content metadata and content
replication.
CSTR
Null-terminated, variable length string.
DC
Data Center site.
deduplication
If enabled, when the grid identifies two files as being identical, it “deduplicates” them by redirecting all content handles to point to a single stored instance of the file. The end result is that only the number of copies required by the ILM policy are stored in the grid. The feature was designed for use with applications that save two identical copies of a file to the grid via different Gateway Nodes.
NOTE Deduplication is deprecated and no longer supported.
distributed CMS
A CMS that uses metadata replication. See also “metadata replication”.
DR
Disaster Recovery site.
EMR
Electronic Medical Records—A computerized system for managing medical data that may be interfaced to the grid.
Enablement Layer
The Enablement Layer for StorageGRID Software CD is used during installation to customize the Linux operating system installed on each grid server. Only the packages needed to support the services hosted on the server are retained, which minimizes the overall footprint occupied by the operating system and maximizes the security of each grid node.
FCS
Fixed Content Storage—A class of stored data where the data, once captured, is rarely changed and must be retained for long periods of time in its original form. Typically this includes images, documents, and other data where alterations would reduce the value of the stored information.
FSG
File System Gateway—A software component of the StorageGRID system. The FSG service enables standard network file systems to interface with the grid. The FSG service is found on the Gateway Node.
FSG replication group
A replication group is a group of FSGs that provide grid access to a specified set of clients. Within each replication group, there is a primary FSG (or a primary FSG cluster) and one or more secondary FSGs. The primary FSG allows clients read and write access to the grid, while storing file system information (file pointers) for all files saved to the grid. The secondary FSG “replicates” file system information, and backs up this information to the grid on a regular schedule.
Gateway Node
A building block of the StorageGRID system. The Gateway Node
provides connectivity services for NFS/CIFS file systems and the
HTTP protocol.
Gateway Node replication group
See “FSG replication group”.
GDU
Grid Deployment Utility—A StorageGRID software utility used to
facilitate the installation and update of software on all grid nodes.
GDU is installed and available on the primary Admin Node.
GPT
Grid Provisioning Tool— a software tool included with StorageGRID
software that permits you to provision a grid for installation, upgrade,
maintenance, or expansion. GPT creates and maintains an encrypted
“repository” of information about the grid that is required to maintain
the grid and recover failed grid nodes.
Grid Designer
A Microsoft Windows based application used to create the configuration information needed to install, expand, and maintain a grid. It
produces a grid specification file containing the grid’s configuration
details (in an XML format) required for the successful deployment of a
StorageGRID system.
Grid ID signed text block
A BASE64 encoded block of cryptographically signed data that
contains the grid ID which must match the grid ID (gid) element in the
grid specification file. See also “provisioning”.
grid node
The name of the StorageGRID system building blocks, for example
Admin Node or Control Node. Each type of grid node consists of a set
of services running on a server.
Grid Specification File
An XML file that provides a complete technical description of a
specific grid deployment. It describes the grid topology, and specifies
the hardware, grid options, server names, network settings, time synchronization, and gateway clusters included in the grid deployment.
The Deployment Grid Specification file is used to generate the files
needed to install the grid.
Grid Task
A managed sequence of actions that are coordinated across a grid to
perform a specific function (such as adding new node certificates).
Grid Tasks are typically long-term operations that span many entities
within the grid. See also “Task Signed Text Block”.
HAGC
High Availability Gateway Cluster—An HAGC is a primary gateway
cluster that consists of a main FSG and a supplementary FSG. A high
availability gateway replication group optionally includes one or more
secondary FSGs.
HCAC
High Capacity Admin Cluster—An HCAC is the clustering of a
reporting Admin Node and processing Admin Node. The result is an
increase to a grid’s capacity for grid services and thus grid nodes. See
also “reporting Admin Node”, “processing Admin Node”, and
“Admin Node”.
HTTP
Hyper-Text Transfer Protocol—A simple, text based client/server
protocol for requesting hypertext documents from a server. This
protocol has evolved into the primary protocol for delivery of information on the World Wide Web.
HTTPS
Hyper-Text Transfer Protocol, Secure—URIs that include HTTPS indicate that the transaction must use HTTP with an additional encryption/authentication layer and often, a different default port number. The encryption layer is usually provided by SSL or TLS. HTTPS is widely used on the internet for secure communications.
ILM
Information Lifecycle Management—A process of managing content storage location and duration based on content value, cost of storage, performance access, regulatory compliance and other such factors.
inode
On Unix/Linux systems, a data structure that contains information about each file, for example, permissions, owner, file size, access time, change time, and modification time. Each inode has a unique inode number.
KVM
Keyboard, Video, Mouse—A hardware device consisting of a keyboard, LCD screen (video monitor), and mouse that permits a user to control all servers in a rack.
LAN
Local Area Network—A network of interconnected computers that is
restricted to a small area, such as a building or campus. A LAN may be
considered a node to the Internet or other wide area network. Contrast
with WAN.
latency
Time duration for processing a transaction or transmitting a unit of
data from end to end. When evaluating system performance, both
throughput and latency need to be considered. See also “throughput”.
LDR
Local Distribution Router—A software component of the StorageGRID
system. The LDR service manages the storage and transfer of content
within the grid. The LDR service is found on the Storage Node.
LUN
See “object store”.
main primary FSG
In an HAGC, the FSG that is configured to be the active primary FSG by default.
metadata
Information related to or describing an object stored in the grid, for example file ingest path or ingest time.
metadata replication
In a grid that uses metadata replication, a CMS makes copies of
metadata on the subset of CMSs that are in its CMS replication group,
and then applies the grid’s ILM policy to content metadata. In the
NMS MI, CMSs that use metadata replication display the Metadata
component. Called “distributed CMS” in a previous release.
metadata synchronization
In a grid that uses metadata synchronization, a CMS synchronizes
metadata with all other read-write CMSs in the grid. Called “synchronized CMS” in a previous release.
NOTE Metadata synchronization is deprecated.
MI
Management Interface—The web-based interface for managing and
monitoring the StorageGRID system provided by the NMS software
component. See also “NMS”.
namespace
A set whose elements are unique names. There is no guarantee that a
name in one namespace is not repeated in a different namespace.
nearline
A term describing data storage that is neither “online” (implying that
it is instantly available like spinning disk) nor “offline” (which could
include offsite storage media). An example of a nearline data storage
location is a tape that is loaded in a tape library, but is not necessarily
mounted.
NFS
Network File System—A protocol (developed by SUN Microsystems)
that enables access to network files as if they were on local disks.
NMS
Network Management System—A software component of the StorageGRID system. The NMS service provides a web-based interface for
managing and monitoring the StorageGRID system. The NMS service
is found on the Admin Node (both the reporting and processing
Admin Nodes in an HCAC). There are three types of NMS service:
consolidated, reporting, and processing. See also “MI” and
“Admin Node”.
node ID
An identification number assigned to a grid service within the StorageGRID system. Each service (such as a CMS or ADC) in a single grid must have a unique node ID. The number is set during system configuration and tied to authentication certificates.
NTP
Network Time Protocol—A protocol used to synchronize distributed
clocks over a variable latency network such as the internet.
object store
A configured file system on a disk volume. The configuration includes
a specific directory structure and resources initialized at system
installation.
object segmentation
A StorageGRID process that splits a large object into a collection of
small objects (segments) and creates a segment container to track the
collection. The segment container contains the UUID for the collection
of small objects as well as the header information for each small object
in the collection. All of the small objects in the collection are the same
size. See also “segment container”.
OID
Object Identifier—The unique identifier of an object.
primary Admin Node
Admin Node that hosts the CMN service. There is one per grid. In an
HCAC, the CMN service is hosted by the primary reporting
Admin Node. See also “Admin Node” and “HCAC”.
primary FSG
In an FSG replication group, the FSG that provides read-write services
to clients. See also “FSG replication group”.
processing Admin Node
Performs attribute and configuration processing that is passed on to
the reporting Admin Node as part of a High Capacity Admin Cluster.
See also “reporting Admin Node” and “HCAC”.
processing NMS
Hosted by the processing Admin Node. Provides attribute and data
processing functionality. Only operates in conjunction with a reporting
Admin Node and the reporting NMS. See also “NMS”.
provisioning
The process of editing the Grid Specification File (if required) and generating a new or updated SAID package and GPT repository. This is
done on the primary Admin Node using the provision command. The
new or updated SAID package is saved to the Provisioning Media. See
also “Grid Specification File” and “SAID”.
quorum
A simple majority: 50% + 1 of the total number in the grid. In StorageGRID software, some functionality may require a quorum of the total
number of some types of service to be available.
reporting Admin Node
Reports attribute and configuration information to web clients as part
of a High Capacity Admin Cluster. See also “processing Admin Node”
and “HCAC”.
reporting NMS
Hosted by the reporting Admin Node. Reports status information
about the grid and provides a browser-based interface. Only operates
in conjunction with a processing Admin Node and the processing
NMS. See also “NMS”.
SAID
Software Activation and Integration Data—Generated during provisioning, the SAID package contains site-specific files and software needed to install a grid.
Samba
A free suite of programs which implement the Server Message Block (SMB) protocol. Allows files and printers on the host operating system to be shared with other clients. For example, instead of using telnet to log in to a Unix machine to edit a file there, a Windows user might connect a drive in Windows Explorer to a Samba server on the Unix machine and edit the file in a Windows editor. A Unix client called “smbclient”, built from the same source code, allows FTP-like access to SMB resources.
SATA
Serial Advanced Technology Attachment—A connection technology
used to connect servers and storage devices.
SCSI
Small Computer System Interface— A connection technology used to
connect servers and peripheral devices such as storage systems.
secondary FSG
A read-only FSG that may also perform backups of the FSG replication
group. See also “FSG replication group”.
security partition
If enabled, access to content ingested into the grid is restricted to the
application, HTTP client, or FSG replication group that ingested the
object.
segment container
An object created by StorageGRID during the segmentation process. Object segmentation splits a large object into a collection of small objects (segments) and creates a segment container to track the collection. A segment container contains the UUID for the collection of segmented objects as well as the header information for each segment in the collection. When assembled, the collection of segments creates the original object. See also “object segmentation”.
server
Used when referring specifically to hardware.
Server Manager
Application that runs on all grid servers, supervises the starting and stopping of grid services, and monitors all grid services on the server.
service
A unit of the StorageGRID software such as the ADC, CMS or SSM.
SGAPI
StorageGRID Application Programming Interface—A set of
commands and functions, and their related syntax, that provides
HTTP clients with the ability to connect directly to the StorageGRID
system (to store and retrieve objects) without the need for a
Gateway Node.
SLES
SUSE Linux Enterprise Server—A commercial distribution of the SUSE Linux operating system, used with the StorageGRID system.
SQL
Structured Query Language—An industry standard interface language for managing relational databases. An SQL database is one that supports the SQL interface.
ssh
Secure Shell— A Unix shell program and supporting protocols used to
log in to a remote computer and execute commands over an authenticated and encrypted channel.
SSM
Server Status Monitor—A unit of the StorageGRID software that monitors hardware conditions and reports to the NMS. Every server in the grid runs an instance of the SSM. The SSM service is present on all grid nodes.
SSL
Secure Socket Layer—The original cryptographic protocol used to enable secure communications over the internet. See also “TLS”.
standby primary FSG
In an HAGC, the FSG that is available to take over and provide read-write services to clients in event of the failure of the active primary FSG.
Storage Node
A building block of the StorageGRID system. The Storage Node provides storage capacity and services to store, move, verify, and retrieve objects stored on disks.
StorageGRID®
A registered trademark of NetApp Inc. for their fixed-content storage grid architecture and software system.
StorageGRID API
See “SGAPI”.
storage volume
See “object store”.
supplementary primary FSG
In an HAGC, the FSG that is configured to be the standby primary FSG by default.
SUSE
See “SLES”—SUSE Linux Enterprise Server.
synchronized CMS
See “metadata synchronization”.
Task Signed Text Block
A BASE64 encoded block of cryptographically signed data that provides the set of instructions that define a grid task.
TCP/IP
Transmission Control Protocol / Internet Protocol—A process of
encapsulating and transmitting packet data over a network. It includes
positive acknowledgment of transmissions.
throughput
The amount of data that can be transmitted or the number of transactions that can be processed by a system or subsystem in a given period
of time. See also “latency”.
TLS
Transport Layer Security—A cryptographic protocol used to enable
secure communications over the internet. See RFC 2246 for more
details.
transfer syntax
The parameters, such as the byte order and compression method,
needed to exchange data between systems.
TSM
Tivoli® Storage Manager — IBM storage middleware product that
manages storage and retrieval of data from removable storage
resources.
URI
Universal Resource Identifier—A generic set of all names or addresses
used to refer to resources that can be served from a computer system.
These addresses are represented as short text strings.
UTC
A language-independent international abbreviation, UTC is neither
English nor French. It means both “Coordinated Universal Time” and
“Temps Universel Coordonné”. UTC refers to the standard time
common to every place in the world.
UUID
Universally Unique Identifier—Unique identifier for each piece of content in the StorageGRID. UUIDs provide client applications with a content handle that permits them to access grid content in a way that does not interfere with the grid’s management of that same content. A 128-bit number which is guaranteed to be unique. See RFC 4122 for more details.
VM
Virtual Machine—A software platform that enables the installation of an operating system and software, substituting for a physical server and permitting the sharing of physical server resources amongst several virtual “servers”.
XFS
A scalable, high performance journaled file system originally developed by Silicon Graphics.
WAN
Wide Area Network—A network of interconnected computers that covers a large geographic area such as a country. Contrast with “LAN”.
XML
eXtensible Markup Language—A text format for the extensible representation of structured information; classified by type and managed like a database. XML has the advantages of being verifiable, human readable, and easily interchangeable between different systems.