
Front cover

SAN Volume Controller Best Practices and Performance Guidelines

Read about best practices learned from the field
Learn about SVC performance advantages
Fine-tune your SVC
Jon Tate
Katja Gebuhr
Alex Howell
Nik Kjeldsen
ibm.com/redbooks
International Technical Support Organization

SAN Volume Controller Best Practices and Performance Guidelines
December 2008
SG24-7521-01
Note: Before using this information and the product it supports, read the information in “Notices” on
page xi.
Second Edition (December 2008)
This edition applies to Version 4, Release 3, Modification 0 of the IBM System Storage SAN Volume
Controller.
© Copyright International Business Machines Corporation 2008. All rights reserved.
Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule
Contract with IBM Corp.
Contents
Notices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
Trademarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xii
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
The team that wrote this book . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
Become a published author . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
Comments welcome. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvi
Summary of changes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvii
December 2008, Second Edition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvii
Chapter 1. SAN fabric . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 SVC SAN topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.1 Redundancy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.2 Topology basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.3 ISL oversubscription . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.4 Single switch SVC SANs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.1.5 Basic core-edge topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.1.6 Four-SAN core-edge topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.1.7 Common topology issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.2 SAN switches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.2.1 Selecting SAN switch models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.2.2 Switch port layout for large edge SAN switches . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.2.3 Switch port layout for director-class SAN switches . . . . . . . . . . . . . . . . . . . . . . . . 10
1.2.4 IBM System Storage/Brocade b-type SANs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.2.5 IBM System Storage/Cisco SANs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.2.6 SAN routing and duplicate WWNNs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.3 Zoning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.3.1 Types of zoning. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.3.2 Pre-zoning tips and shortcuts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.3.3 SVC intra-cluster zone . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.3.4 SVC storage zones . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.3.5 SVC host zones . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.3.6 Sample standard SVC zoning configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.3.7 Zoning with multiple SVC clusters. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.3.8 Split storage subsystem configurations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.4 Switch Domain IDs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.5 Distance extension for mirroring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.5.1 Optical multiplexors. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.5.2 Long-distance SFPs/XFPs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.5.3 Fibre Channel: IP conversion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.6 Tape and disk traffic sharing the SAN. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.7 Switch interoperability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.8 TotalStorage Productivity Center for Fabric . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
Chapter 2. SAN Volume Controller cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.1 Advantages of virtualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.1.1 How does the SVC fit into your environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.2 Scalability of SVC clusters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.2.1 Advantage of multi-cluster as opposed to single cluster . . . . . . . . . . . . . . . . . . . . 25
2.2.2 Performance expectations by adding an SVC . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.2.3 Growing or splitting SVC clusters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.3 SVC performance scenarios . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.4 Cluster upgrade . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
Chapter 3. SVC Console . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.1 SVC Console installation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.1.1 Software only installation option . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.1.2 Combined software and hardware installation option . . . . . . . . . . . . . . . . . . . . . . 39
3.1.3 SVC cluster software and SVC Console compatibility . . . . . . . . . . . . . . . . . . . . . 39
3.1.4 IP connectivity considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.2 Using the SVC Console . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.2.1 SSH connection limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
3.2.2 Managing multiple SVC clusters using a single SVC Console . . . . . . . . . . . . . . . 44
3.2.3 Managing an SVC cluster using multiple SVC Consoles . . . . . . . . . . . . . . . . . . . 45
3.2.4 SSH key management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.2.5 Administration roles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.2.6 Audit logging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.2.7 IBM Support remote access to the SVC Console . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.2.8 SVC Console to SVC cluster connection problems . . . . . . . . . . . . . . . . . . . . . . . 50
3.2.9 Managing IDs and passwords . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
3.2.10 Saving the SVC configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.2.11 Restoring the SVC cluster configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
Chapter 4. Storage controller . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.1 Controller affinity and preferred path . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.1.1 ADT for DS4000 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.1.2 Ensuring path balance prior to MDisk discovery . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.2 Pathing considerations for EMC Symmetrix/DMX and HDS . . . . . . . . . . . . . . . . . . . . 59
4.3 LUN ID to MDisk translation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.3.1 ESS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.3.2 DS6000 and DS8000 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.4 MDisk to VDisk mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
4.5 Mapping physical LBAs to VDisk extents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.5.1 Investigating a medium error using lsvdisklba . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.5.2 Investigating Space-Efficient VDisk allocation using lsmdisklba . . . . . . . . . . . . . . 63
4.6 Medium error logging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
4.6.1 Host-encountered media errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
4.6.2 SVC-encountered medium errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
4.7 Selecting array and cache parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
4.7.1 DS4000 array width . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
4.7.2 Segment size . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
4.7.3 DS8000 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4.8 Considerations for controller configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4.8.1 Balancing workload across DS4000 controllers . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4.8.2 Balancing workload across DS8000 controllers . . . . . . . . . . . . . . . . . . . . . . . . . . 68
4.8.3 DS8000 ranks to extent pools mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
4.8.4 Mixing array sizes within an MDG . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
4.8.5 Determining the number of controller ports for ESS/DS8000 . . . . . . . . . . . . . . . . 71
4.8.6 Determining the number of controller ports for DS4000 . . . . . . . . . . . . . . . . . . . . 72
4.9 LUN masking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
4.10 WWPN to physical port translation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
4.11 Using TPC to identify storage controller boundaries . . . . . . . . . . . . . . . . . . . . . . . . . 75
4.12 Using TPC to measure storage controller performance . . . . . . . . . . . . . . . . . . . . . . 77
4.12.1 Normal operating ranges for various statistics . . . . . . . . . . . . . . . . . . . . . . . . . . 78
4.12.2 Establish a performance baseline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
4.12.3 Performance metric guidelines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
4.12.4 Storage controller back end . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
Chapter 5. MDisks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
5.1 Back-end queue depth . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
5.2 MDisk transfer size . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
5.2.1 Host I/O . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
5.2.2 FlashCopy I/O . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
5.2.3 Coalescing writes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
5.3 Selecting LUN attributes for MDisks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
5.4 Tiered storage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
5.5 Adding MDisks to existing MDGs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
5.5.1 Adding MDisks for capacity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
5.5.2 Checking access to new MDisks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
5.5.3 Persistent reserve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
5.5.4 Renaming MDisks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
5.6 Restriping (balancing) extents across an MDG . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
5.6.1 Installing prerequisites and the SVCTools package . . . . . . . . . . . . . . . . . . . . . . . 89
5.6.2 Running the extent balancing script . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
5.7 Removing MDisks from existing MDGs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
5.7.1 Migrating extents from the MDisk to be deleted . . . . . . . . . . . . . . . . . . . . . . . . . . 92
5.7.2 Verifying an MDisk’s identity before removal . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
5.8 Remapping managed MDisks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
5.9 Controlling extent allocation order for VDisk creation . . . . . . . . . . . . . . . . . . . . . . . . . 96
5.10 Moving an MDisk between SVC clusters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
Chapter 6. Managed disk groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
6.1 Availability considerations for MDGs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
6.1.1 Performance considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
6.1.2 Selecting the MDisk Group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
6.2 Selecting the number of LUNs per array . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
6.2.1 Performance comparison of one LUN compared to two LUNs per array . . . . . . 105
6.3 Selecting the number of arrays per MDG . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
6.4 Striping compared to sequential type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
6.5 SVC cache partitioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
6.6 SVC quorum disk considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
6.7 Selecting storage subsystems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
Chapter 7. VDisks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
7.1 New features in SVC Version 4.3.0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
7.1.1 Real and virtual capacities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
7.1.2 Space allocation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
7.1.3 Space-Efficient VDisk performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
7.1.4 Testing an application with Space-Efficient VDisk . . . . . . . . . . . . . . . . . . . . . . . 121
7.1.5 What is VDisk mirroring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
7.1.6 Creating or adding a mirrored VDisk . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
7.1.7 Availability of mirrored VDisks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
7.1.8 Mirroring between controllers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
7.2 Creating VDisks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
7.2.1 Selecting the MDisk Group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
7.2.2 Changing the preferred node within an I/O Group . . . . . . . . . . . . . . . . . . . . . . . 124
7.2.3 Moving a VDisk to another I/O Group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
7.3 VDisk migration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
7.3.1 Migrating with VDisk mirroring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
7.3.2 Migrating across MDGs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
7.3.3 Image type to striped type migration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
7.3.4 Migrating to image type VDisk . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
7.3.5 Preferred paths to a VDisk . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
7.3.6 Governing of VDisks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
7.4 Cache-disabled VDisks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
7.4.1 Underlying controller remote copy with SVC cache-disabled VDisks . . . . . . . . . 134
7.4.2 Using underlying controller PiT copy with SVC cache-disabled VDisks . . . . . . . 134
7.4.3 Changing cache mode of VDisks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
7.5 VDisk performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
7.5.1 VDisk performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
7.6 The effect of load on storage controllers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
Chapter 8. Copy services . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
8.1 SAN Volume Controller Advanced Copy Services functions . . . . . . . . . . . . . . . . . . . 152
8.1.1 Setting up FlashCopy services . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
8.1.2 Steps to making a FlashCopy VDisk with application data integrity . . . . . . . . . . 153
8.1.3 Making multiple related FlashCopy VDisks with data integrity . . . . . . . . . . . . . . 156
8.1.4 Creating multiple identical copies of a VDisk . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
8.1.5 Creating a FlashCopy mapping with the incremental flag . . . . . . . . . . . . . . . . . 158
8.1.6 Space-Efficient FlashCopy (SEFC) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
8.1.7 Using FlashCopy with your backup application . . . . . . . . . . . . . . . . . . . . . . . . . 159
8.1.8 Using FlashCopy for data migration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
8.1.9 Summary of FlashCopy rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
8.2 Metro Mirror and Global Mirror . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
8.2.1 Using both Metro Mirror and Global Mirror between two clusters . . . . . . . . . . . 162
8.2.2 Performing three-way copy service functions . . . . . . . . . . . . . . . . . . . . . . . . . . 162
8.2.3 Using native controller Advanced Copy Services functions . . . . . . . . . . . . . . . . 163
8.2.4 Configuration requirements for long distance links . . . . . . . . . . . . . . . . . . . . . . 164
8.2.5 Saving bandwidth creating Metro Mirror and Global Mirror relationships . . . . . . 165
8.2.6 Global Mirror guidelines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
8.2.7 Migrating a Metro Mirror relationship to Global Mirror . . . . . . . . . . . . . . . . . . . . 169
8.2.8 Recovering from suspended Metro Mirror or Global Mirror relationships . . . . . . 170
8.2.9 Diagnosing and fixing 1920 errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
8.2.10 Using Metro Mirror or Global Mirror with FlashCopy . . . . . . . . . . . . . . . . . . . . 172
8.2.11 Using TPC to monitor Global Mirror performance . . . . . . . . . . . . . . . . . . . . . . 173
8.2.12 Summary of Metro Mirror and Global Mirror rules . . . . . . . . . . . . . . . . . . . . . . 174
Chapter 9. Hosts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
9.1 Configuration recommendations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
9.1.1 The number of paths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
9.1.2 Host ports . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
9.1.3 Port masking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
9.1.4 Host to I/O Group mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
9.1.5 VDisk size as opposed to quantity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
9.1.6 Host VDisk mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
9.1.7 Server adapter layout . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
9.1.8 Availability as opposed to error isolation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
9.2 Host pathing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
9.2.1 Preferred path algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
9.2.2 Path selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
9.2.3 Path management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
9.2.4 Dynamic reconfiguration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
9.2.5 VDisk migration between I/O Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
9.3 I/O queues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
9.3.1 Queue depths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
9.4 Multipathing software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
9.5 Host clustering and reserves . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
9.5.1 AIX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
9.5.2 SDD compared to SDDPCM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
9.5.3 Virtual I/O server . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
9.5.4 Windows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
9.5.5 Linux . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
9.5.6 Solaris . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
9.5.7 VMware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
9.6 Mirroring considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
9.6.1 Host-based mirroring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
9.7 Monitoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
9.7.1 Automated path monitoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
9.7.2 Load measurement and stress tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206
Chapter 10. Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
10.1 Application workloads . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
10.1.1 Transaction-based workloads . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
10.1.2 Throughput-based workloads . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
10.1.3 Storage subsystem considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
10.1.4 Host considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
10.2 Application considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
10.2.1 Transaction environments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
10.2.2 Throughput environments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
10.3 Data layout overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
10.3.1 Layers of volume abstraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
10.3.2 Storage administrator and AIX LVM administrator roles . . . . . . . . . . . . . . . . . 212
10.3.3 General data layout recommendations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212
10.3.4 Database strip size considerations (throughput workload) . . . . . . . . . . . . . . . 215
10.3.5 LVM volume groups and logical volumes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
10.4 When the application does its own balancing of I/Os . . . . . . . . . . . . . . . . . . . . . . . 216
10.4.1 DB2 I/O characteristics and data structures . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
10.4.2 DB2 data layout example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
10.4.3 SVC striped VDisk recommendation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
10.5 Data layout with the AIX virtual I/O (VIO) server . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
10.5.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219
10.5.2 Data layout strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219
10.6 VDisk size . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
10.7 Failure boundaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
Chapter 11. Monitoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
11.1 Configuring TPC to analyze the SVC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
11.2 Using TPC to verify the fabric topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
11.2.1 SVC node port connectivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
11.2.2 Ensuring that all SVC ports are online . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
11.2.3 Verifying SVC port zones . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226
11.2.4 Verifying paths to storage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228
11.2.5 Verifying host paths to the SVC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230
11.3 Analyzing performance data using TPC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
11.3.1 Setting up TPC to collect performance information . . . . . . . . . . . . . . . . . . . . . 234
11.3.2 Viewing TPC-collected information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
11.3.3 Cluster, I/O Group, and node reports . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
11.3.4 Managed Disk Group, Managed Disk, and Volume reports . . . . . . . . . . . . . . . 240
11.3.5 Using TPC to alert on performance constraints . . . . . . . . . . . . . . . . . . . . . . . . 241
11.3.6 Monitoring MDisk performance for mirrored VDisks . . . . . . . . . . . . . . . . . . . . . 242
11.4 Monitoring the SVC error log with e-mail notifications . . . . . . . . . . . . . . . . . . . . . . . 243
11.4.1 Verifying a correct SVC e-mail configuration . . . . . . . . . . . . . . . . . . . . . . . . . . 244
Chapter 12. Maintenance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245
12.1 Configuration and change tracking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246
12.1.1 SAN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246
12.1.2 SVC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
12.1.3 Storage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
12.1.4 General inventory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
12.1.5 Change tickets and tracking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250
12.1.6 Configuration archiving . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250
12.2 Standard operating procedures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251
12.3 Code upgrades . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
12.3.1 Upgrade code levels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
12.3.2 Upgrade frequency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254
12.3.3 Upgrade sequence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254
12.3.4 Preparing for upgrades . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254
12.3.5 SVC upgrade . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255
12.3.6 Host code upgrades . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256
12.3.7 Storage controller upgrades . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256
12.4 SAN hardware changes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256
12.4.1 Cross-referencing the SDD adapter number with the WWPN . . . . . . . . . . . . . 256
12.4.2 Changes that result in the modification of the destination FCID . . . . . . . . . . . 257
12.4.3 Switch replacement with a like switch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 258
12.4.4 Switch replacement or upgrade with a different kind of switch . . . . . . . . . . . . . 259
12.4.5 HBA replacement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
12.5 Naming convention . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
12.5.1 Hosts, zones, and SVC ports . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
12.5.2 Controllers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260
12.5.3 MDisks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260
12.5.4 VDisks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260
12.5.5 MDGs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260
Chapter 13. Cabling, power, cooling, scripting, support, and classes . . . . . . . . . . . . 261
13.1 Cabling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
13.1.1 General cabling advice . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
13.1.2 Long distance optical links . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
13.1.3 Labeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
13.1.4 Cable management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263
13.1.5 Cable routing and support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263
13.1.6 Cable length . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264
13.1.7 Cable installation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264
13.2 Power . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264
13.2.1 Bundled uninterruptible power supply units . . . . . . . . . . . . . . . . . . . . . . . . . . . 264
13.2.2 Power switch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
13.2.3 Power feeds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
13.3 Cooling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
13.4 SVC scripting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266
13.4.1 Standard changes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266
13.5 IBM Support Notifications Service . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266
13.6 SVC Support Web site . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267
13.7 SVC-related publications and classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267
13.7.1 IBM Redbooks publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267
13.7.2 Courses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 268
Chapter 14. Troubleshooting and diagnostics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269
14.1 Common problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 270
14.1.1 Host problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 270
14.1.2 SVC problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 270
14.1.3 SAN problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272
14.1.4 Storage subsystem problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272
14.2 Collecting data and isolating the problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274
14.2.1 Host data collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274
14.2.2 SVC data collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277
14.2.3 SAN data collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
14.2.4 Storage subsystem data collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281
14.3 Recovering from problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 282
14.3.1 Solving host problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
14.3.2 Solving SVC problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 284
14.3.3 Solving SAN problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288
14.3.4 Solving back-end storage problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288
14.4 Livedump . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 292
Chapter 15. SVC 4.3 performance highlights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293
15.1 SVC and continual performance enhancements . . . . . . . . . . . . . . . . . . . . . . . . . . . 294
15.2 SVC 4.3 code improvements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 296
15.3 Performance increase when upgrading to 8G4 nodes . . . . . . . . . . . . . . . . . . . . . . . 296
15.3.1 Performance scaling of I/O Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299
Related publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301
IBM Redbooks publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301
Other resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301
Referenced Web sites . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 302
How to get IBM Redbooks publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303
Help from IBM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305
Notices
This information was developed for products and services offered in the U.S.A.
IBM may not offer the products, services, or features discussed in this document in other countries. Consult
your local IBM representative for information on the products and services currently available in your area. Any
reference to an IBM product, program, or service is not intended to state or imply that only that IBM product,
program, or service may be used. Any functionally equivalent product, program, or service that does not
infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to
evaluate and verify the operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter described in this document. The
furnishing of this document does not give you any license to these patents. You can send license inquiries, in
writing, to:
IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785 U.S.A.
The following paragraph does not apply to the United Kingdom or any other country where such
provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION
PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR
IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT,
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of
express or implied warranties in certain transactions, therefore, this statement may not apply to you.
This information could include technical inaccuracies or typographical errors. Changes are periodically made
to the information herein; these changes will be incorporated in new editions of the publication. IBM may make
improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time
without notice.
Any references in this information to non-IBM Web sites are provided for convenience only and do not in any
manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the
materials for this IBM product and use of those Web sites is at your own risk.
IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring
any obligation to you.
Information concerning non-IBM products was obtained from the suppliers of those products, their published
announcements or other publicly available sources. IBM has not tested those products and cannot confirm the
accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the
capabilities of non-IBM products should be addressed to the suppliers of those products.
This information contains examples of data and reports used in daily business operations. To illustrate them
as completely as possible, the examples include the names of individuals, companies, brands, and products.
All of these names are fictitious and any similarity to the names and addresses used by an actual business
enterprise is entirely coincidental.
COPYRIGHT LICENSE:
This information contains sample application programs in source language, which illustrate programming
techniques on various operating platforms. You may copy, modify, and distribute these sample programs in
any form without payment to IBM, for the purposes of developing, using, marketing or distributing application
programs conforming to the application programming interface for the operating platform for which the sample
programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore,
cannot guarantee or imply reliability, serviceability, or function of these programs.
Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines
Corporation in the United States, other countries, or both. These and other IBM trademarked terms are
marked on their first occurrence in this information with the appropriate symbol (® or ™), indicating US
registered or common law trademarks owned by IBM at the time this information was published. Such
trademarks may also be registered or common law trademarks in other countries. A current list of IBM
trademarks is available on the Web at http://www.ibm.com/legal/copytrade.shtml
The following terms are trademarks of the International Business Machines Corporation in the United States,
other countries, or both:
1350™
AIX®
alphaWorks®
Chipkill™
DB2®
DS4000™
DS6000™
DS8000™
Enterprise Storage Server®
FlashCopy®
GPFS™
HACMP™
IBM®
Redbooks®
Redbooks (logo)®
ServeRAID™
System p®
System Storage™
System x™
System z®
Tivoli Enterprise Console®
Tivoli®
TotalStorage®
The following terms are trademarks of other companies:
Disk Magic, and the IntelliMagic logo are trademarks of IntelliMagic BV in the United States, other countries,
or both.
NetApp, and the NetApp logo are trademarks or registered trademarks of NetApp, Inc. in the U.S. and other
countries.
Oracle, JD Edwards, PeopleSoft, Siebel, and TopLink are registered trademarks of Oracle Corporation and/or
its affiliates.
QLogic, and the QLogic logo are registered trademarks of QLogic Corporation. SANblade is a registered
trademark in the United States.
VMware, the VMware "boxes" logo and design are registered trademarks or trademarks of VMware, Inc. in the
United States and/or other jurisdictions.
Solaris, Sun, and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States,
other countries, or both.
Active Directory, Internet Explorer, Microsoft, Visio, Windows NT, Windows Server, Windows, and the
Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.
Intel Xeon, Intel, Intel logo, Intel Inside logo, and Intel Centrino logo are trademarks or registered trademarks
of Intel Corporation or its subsidiaries in the United States, other countries, or both.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Linux is a trademark of Linus Torvalds in the United States, other countries, or both.
Other company, product, or service names may be trademarks or service marks of others.
Preface
This IBM® Redbooks® publication captures several of the best practices based on field
experience and describes the performance gains that can be achieved by implementing the
IBM System Storage™ SAN Volume Controller.
This book is intended for extremely experienced storage, SAN, and SVC administrators and
technicians.
Readers are expected to have an advanced knowledge of the SAN Volume Controller (SVC)
and SAN environment, and we recommend these books as background reading:
 IBM System Storage SAN Volume Controller, SG24-6423
 Introduction to Storage Area Networks, SG24-5470
 Using the SVC for Business Continuity, SG24-7371
The team that wrote this book
This book was produced by a team of specialists from around the world working at the
International Technical Support Organization, San Jose Center.
Jon Tate is a Project Manager for IBM System Storage SAN Solutions at the International
Technical Support Organization, San Jose Center. Before joining the ITSO in 1999, he
worked in the IBM Technical Support Center, providing Level 2 support for IBM storage
products. Jon has 23 years of experience in storage software and management, services,
and support, and is both an IBM Certified IT Specialist and an IBM SAN Certified Specialist.
Jon also serves as the UK Chair of the Storage Networking Industry Association.
Katja Gebuhr is a Support Center Representative for IBM Germany in Mainz. She joined IBM
in 2003 for an apprenticeship as an IT-System Business Professional and started working for
the DASD Front End SAN Support in 2006. Katja provides Level 1 Hardware and Software
support for SAN Volume Controller and SAN products for IMT Germany and CEMAAS.
Alex Howell is a Software Engineer in the SAN Volume Controller development team, based
at IBM Hursley, UK. He has worked on SVC since the release of Version 1.1.0 in 2003, when
he joined IBM as a graduate. His roles have included test engineer, developer, and
development team lead. He is a development lab advocate for several SVC clients, and he
has led a beta program piloting new function.
Nik Kjeldsen is an IT Specialist at IBM Global Technology Services, Copenhagen, Denmark.
With a background in data networks, he is currently a Technical Solution Architect working
with the design and implementation of Enterprise Storage infrastructure. Nikolaj has seven
years of experience in the IT field and holds a Master’s degree in Telecommunication
Engineering from the Technical University of Denmark.
Figure 0-1 Authors (L-R): Katja, Alex, Nik, and Jon
We extend our thanks to the following people for their contributions to this project. Many
people contributed to this book. In particular, we thank the development and PFE teams in
Hursley, England. Matt Smith was instrumental in moving issues along and ensuring that
they maintained a high profile. Barry Whyte was instrumental in steering us in the correct
direction and in providing support throughout the life of the residency.
The authors of the first edition of this book were:
Deon George
Thorsten Hoss
Ronda Hruby
Ian MacQuarrie
Barry Mellish
Peter Mescher
We also want to thank the following people for their contributions:
Trevor Boardman
Carlos Fuente
Gary Jarman
Colin Jewell
Andrew Martin
Paul Merrison
Steve Randle
Bill Scales
Matt Smith
Barry Whyte
IBM Hursley
Tom Jahn
IBM Germany
Peter Mescher
IBM Raleigh
Paulo Neto
IBM Portugal
Bill Wiegand
IBM Advanced Technical Support
Mark Balstead
IBM Tucson
Dan Braden
IBM Dallas
Lloyd Dean
IBM Philadelphia
Dorothy Faurot
IBM Raleigh
Marci Nagel
IBM Rochester
Bruce McNutt
IBM Tucson
Glen Routley
IBM Australia
Dan C Rumney
IBM New York
Chris Saul
IBM San Jose
Brian Smith
IBM San Jose
Sharon Wang
IBM Chicago
Deanna Polm
Sangam Racherla
IBM ITSO
Become a published author
Join us for a two- to six-week residency program. Help write a book dealing with specific
products or solutions, while getting hands-on experience with leading-edge technologies. You
will have the opportunity to team with IBM technical professionals, IBM Business Partners,
and Clients.
Your efforts will help increase product acceptance and client satisfaction. As a bonus, you will
develop a network of contacts in IBM development labs, and increase your productivity and
marketability.
Find out more about the residency program, browse the residency index, and apply online at:
ibm.com/redbooks/residencies.html
Comments welcome
Your comments are important to us!
We want our books to be as helpful as possible. Send us your comments about this book or
other IBM Redbooks publications in one of the following ways:
 Use the online Contact us review IBM Redbooks publications form found at:
ibm.com/redbooks
 Send your comments in an e-mail to:
redbooks@us.ibm.com
 Mail your comments to:
IBM Corporation, International Technical Support Organization
Dept. HYTD Mail Station P099
2455 South Road
Poughkeepsie, NY 12601-5400
Summary of changes
This section describes the technical changes made in this edition of the book and in previous
editions. This edition might also include minor corrections and editorial changes that are not
identified.
Summary of Changes
for SG24-7521-01
for SAN Volume Controller Best Practices and Performance Guidelines
as created or updated on December 7, 2008.
December 2008, Second Edition
This revision reflects the addition, deletion, or modification of new and changed information
described below.
New information
New material:
 Space-Efficient VDisks
 SVC Console
 VDisk Mirroring
Chapter 1. SAN fabric
The IBM Storage Area Network (SAN) Volume Controller (SVC) has unique SAN fabric
configuration requirements that differ from what you might be used to in your storage
infrastructure. A quality SAN configuration can help you achieve a stable, reliable, and
scalable SVC installation; conversely, a poor SAN environment can make your SVC
experience considerably less pleasant. This chapter provides you with information to tackle
this topic.
Note: As with any of the information in this book, you must check the IBM System Storage
SAN Volume Controller V4.3.0 - Software Installation and Configuration Guide, S7002156,
and IBM System Storage SAN Volume Controller Restrictions, S1003283, for limitations,
caveats, updates, and so on that are specific to your environment. Do not rely on this book
as the last word in SVC SAN design. Also, anyone planning for an SVC installation must
be knowledgeable about general SAN design principles.
You must refer to the IBM System Storage SAN Volume Controller Support Web page for
updated documentation before implementing your solution. The Web site is:
http://www.ibm.com/storage/support/2145
Note: All document citations in this book refer to the 4.3 versions of the SVC product
documents. If you use a different version, refer to the correct edition of the documents.
As you read this chapter, remember that this is a “best practices” book based on field
experience. Many SAN configurations that do not meet the recommendations in this chapter
are possible (and supported), but we do not consider them ideal configurations.
1.1 SVC SAN topology
The topology requirements for the SVC do not differ much from those of any other storage
device. What makes the SVC unique here is that it can be configured with a large number of
hosts, which can cause interesting issues with SAN scalability. Also, because the SVC often
serves so many hosts, an issue caused by poor SAN design can quickly cascade into a
catastrophe.
1.1.1 Redundancy
One of the fundamental SVC SAN requirements is to create two (or more) entirely separate
SANs that are not connected to each other over Fibre Channel in any way. The easiest way is
to construct two SANs that are mirror images of each other.
Technically, the SVC supports using just a single SAN (appropriately zoned) to connect the
entire SVC. However, we do not recommend this design in any production environment. In our
experience, we do not recommend this design in “development” environments either,
because a stable development platform is important to programmers, and an extended
outage in the development environment can cause an expensive business impact. For a
dedicated storage test platform, however, it might be acceptable.
Redundancy through Cisco VSANs or Brocade Traffic Isolation Zones
Simply put, using any logical separation in a single SAN fabric to provide SAN redundancy is
unacceptable for a production environment. While VSANs and Traffic Isolation Zones can
provide a measure of port isolation, they are no substitute for true hardware redundancy. All
SAN switches have been known to suffer from hardware or fatal software failures.
1.1.2 Topology basics
Note: Due to the nature of Fibre Channel, it is extremely important to avoid inter-switch
link (ISL) congestion. While Fibre Channel (and the SVC) can, under most circumstances,
handle a host or storage array that has become overloaded, the mechanisms in Fibre
Channel for dealing with congestion in the fabric itself are not effective. The problems
caused by fabric congestion can range anywhere from dramatically slow response time all
the way to storage access loss. These issues are common with all high-bandwidth SAN
devices and are inherent to Fibre Channel; they are not unique to the SVC.
When an Ethernet network becomes congested, the Ethernet switches simply discard
frames for which there is no room. When a Fibre Channel network becomes congested,
the Fibre Channel switches instead stop accepting additional frames until the congestion
clears, in addition to occasionally dropping frames. This congestion quickly moves
“upstream” in the fabric and clogs the end devices (such as the SVC) from communicating
anywhere. This behavior is referred to as head-of-line blocking, and while modern SAN
switches internally have a non-blocking architecture, head-of-line-blocking still exists as a
SAN fabric problem. Head-of-line-blocking can result in your SVC nodes being unable to
communicate with your storage subsystems or mirror their write caches, just because you
have a single congested link leading to an edge switch.
No matter the size of your SVC installation, there are a few best practices that you need to
apply to your topology design:
 All SVC node ports in a cluster must be connected to the same SAN switches as all of the
storage devices with which the SVC cluster is expected to communicate. Conversely,
storage traffic and inter-node traffic must never transit an ISL, except during migration
scenarios.
 High-bandwidth-utilization servers (such as tape backup servers) must also be on the
same SAN switches as the SVC node ports. Putting them on a separate switch can cause
unexpected SAN congestion problems. Putting a high-bandwidth server on an edge
switch is a waste of ISL capacity.
 If at all possible, plan for the maximum size configuration that you ever expect your SVC
installation to reach. As you will see in later parts of this chapter, the design of the SAN
can change radically for larger numbers of hosts. Modifying the SAN later to
accommodate a larger-than-expected number of hosts either produces a poorly-designed
SAN or is difficult, expensive, and disruptive to your business. This does not mean that
you need to purchase all of the SAN hardware initially, just that you need to lay out the
SAN with the maximum size in mind.
 Always deploy at least one “extra” ISL per switch. Not doing so exposes you to
consequences from complete path loss (this is bad) to fabric congestion (this is even
worse).
 The SVC does not permit the number of hops between the SVC cluster and the hosts to
exceed three hops, which is typically not a problem.
1.1.3 ISL oversubscription
The IBM System Storage SAN Volume Controller V4.3.0 - Software Installation and
Configuration Guide, S7002156, specifies a suggested maximum host port to ISL ratio of 7:1.
With modern 4 or 8 Gbps SAN switches, this ratio implies an average bandwidth (in one
direction) per host port of approximately 57 MBps at 4 Gbps; a worked calculation follows
the list below. If you do not expect most of your hosts to reach anywhere near that value, it
is possible to request an exception to the ISL oversubscription rule, known as a Request for
Price Quotation (RPQ), from your IBM marketing representative. Before requesting an
exception, however, consider the following factors:
 You must take peak loads into consideration, not average loads. For instance, while a
database server might only use 20 MBps during regular production workloads, it might
perform a backup at far higher data rates.
 Congestion to one switch in a large fabric can cause performance issues throughout the
entire fabric, including traffic between SVC nodes and storage subsystems, even if they
are not directly attached to the congested switch. The reasons for these issues are
inherent to Fibre Channel flow control mechanisms, which are simply not designed to
handle fabric congestion. Therefore, any estimates for required bandwidth prior to
implementation must have a safety factor built into the estimate.
 On top of the safety factor for traffic expansion, implement a spare ISL or ISL trunk, as
stated in 1.1.2, “Topology basics” on page 2. You still need to be able to avoid congestion
if an ISL fails due to issues, such as a SAN switch line card or port blade failure.
 Exceeding the “standard” 7:1 oversubscription ratio requires you to implement fabric
bandwidth threshold alerts. Anytime one of your ISLs exceeds 70% utilization, you need to
schedule fabric changes to distribute the load further.
 You need to also consider the bandwidth consequences of a complete fabric outage.
While a complete fabric outage is a fairly rare event, insufficient bandwidth can turn a
single-SAN outage into a total access loss event.
 Take the bandwidth of the links into account. It is common to have ISLs run faster than
host ports, which obviously reduces the number of required ISLs.
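To make the arithmetic behind the 7:1 guideline concrete, here is a minimal Python sketch (a hypothetical planning aid, not part of any SVC or switch tooling) that estimates the oversubscription ratio and the average per-host-port bandwidth for a proposed edge switch; the port counts and link speeds in the example are assumptions.

# Hypothetical sketch: estimate ISL oversubscription for a proposed edge switch.
# One direction of traffic is assumed, and protocol overhead is ignored.

def isl_oversubscription(host_ports, isl_count, host_speed_gbps=4, isl_speed_gbps=4):
    host_bw = host_ports * host_speed_gbps      # total host-facing bandwidth, Gbps
    isl_bw = isl_count * isl_speed_gbps         # total ISL bandwidth, Gbps
    ratio = host_bw / isl_bw
    # 1 Gbps of Fibre Channel carries roughly 100 MBps of payload.
    avg_mbps_per_host = (isl_bw * 100) / host_ports
    return ratio, avg_mbps_per_host

ratio, avg = isl_oversubscription(host_ports=28, isl_count=4)
print(f"oversubscription {ratio:.1f}:1, ~{avg:.0f} MBps average per host port")
if ratio > 7:
    print("exceeds the suggested 7:1 ratio - consider more ISLs or an RPQ")

With 28 host ports and four 4 Gbps ISLs, the sketch reports 7.0:1 and roughly 57 MBps per host port, matching the figure quoted above.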
The RPQ process involves a review of your proposed SAN design to ensure that it is
reasonable for your proposed environment.
1.1.4 Single switch SVC SANs
The most basic SVC topology consists of nothing more than a single switch per SAN, which
can be anything from a 16-port 1U switch for a small installation of just a few hosts and
storage devices all the way up to a director with hundreds of ports. This design obviously has
the advantage of simplicity, and it is a sufficient architecture for small to medium SVC
installations.
It is preferable to use a single multi-slot director-class switch rather than a core-edge fabric made up solely of lower-end switches.
As stated in 1.1.2, “Topology basics” on page 2, keep the maximum planned size of the
installation in mind if you decide to use this architecture. If you run too low on ports,
expansion can be difficult.
1.1.5 Basic core-edge topology
The core-edge topology is easily recognized by most SAN architects, as illustrated in
Figure 1-1 on page 5. It consists of a switch in the center (usually, a director-class switch),
which is surrounded by other switches. The core switch contains all SVC ports, storage ports,
and high-bandwidth hosts. It is connected via ISLs to the edge switches.
The edge switches can be of any size. If they are multi-slot directors, they are usually fitted
with at least a few oversubscribed line cards/port blades, because the vast majority of hosts
do not ever require line-speed bandwidth, or anything close to it. Note that ISLs must not be
on oversubscribed ports.
Figure 1-1 Core-edge topology
1.1.6 Four-SAN core-edge topology
For installations where even a core-edge fabric made up of multi-slot director-class SAN
switches is insufficient, the SVC cluster can be attached to four SAN fabrics instead of the
normal two SAN fabrics. This design is especially useful for large, multi-cluster installations.
As with a regular core-edge, the edge switches can be of any size, and multiple ISL links
should be installed per switch.
As you can see in Figure 1-2 on page 6, we have attached the SVC cluster to each of four
independent fabrics. The storage subsystem used also connects to all four SAN fabrics, even
though this design is not required.
Figure 1-2 Four-SAN core-edge topology
While certain clients have chosen to simplify management by connecting the SANs together
into pairs with a single ISL link, we do not recommend this design. With only a single ISL
connecting fabrics together, a small zoning mistake can quickly lead to severe SAN
congestion.
Using the SVC as a SAN bridge: With the ability to connect an SVC cluster to four SAN
fabrics, it is possible to use the SVC as a bridge between two SAN environments (with two
fabrics in each environment). This configuration can be useful for sharing resources
between the SAN environments without merging them. Another use is if you have devices
with different SAN requirements present in your installation.
When using the SVC as a SAN bridge, pay special attention to any restrictions and
requirements that might apply to your installation.
1.1.7 Common topology issues
In this section, we describe common topology problems that we have encountered.
Accidentally accessing storage over ISLs
One common topology mistake that we have encountered in the field is to have SVC paths
from the same node to the same storage subsystem on multiple core switches that are linked
together (refer to Figure 1-3). This problem is commonly encountered in environments where
the SVC is not the only device accessing the storage subsystems.
Figure 1-3 Spread out disk paths
If you have this type of topology, it is extremely important to zone the SVC so that it will only
see paths to the storage subsystems on the same SAN switch as the SVC nodes.
Implementing a storage subsystem host port mask might also be feasible here.
Note: This type of topology means you must have more restrictive zoning than what is
detailed in 1.3.6, “Sample standard SVC zoning configuration” on page 16.
Because of the way that the SVC load balances traffic between the SVC nodes and MDisks,
the amount of traffic that transits your ISLs will be unpredictable and vary significantly. If you
have the capability, you might want to use either Cisco Virtual SANs (VSANs) or Brocade
Traffic Isolation to help enforce the separation.
Accessing storage subsystems over an ISL on purpose
This practice is explicitly advised against in the SVC configuration guidelines, because the
consequences of SAN congestion to your storage subsystem connections can be quite
severe. Only use this configuration in SAN migration scenarios, and when doing so, closely
monitor the performance of the SAN.
SVC I/O Group switch splitting
Clients often want to attach another I/O Group to an existing SVC cluster to increase the
capacity of the SVC cluster, but they lack the switch ports to do so. If this situation happens to
you, there are two options:
 Completely overhaul the SAN during a complicated and painful redesign.
 Add a new core switch for the new I/O Group, and connect the new switch back to the original switch with ISLs, as illustrated in Figure 1-4.
Figure 1-4 Proper I/O Group splitting
This design is a valid configuration, but you must take certain precautions:
 As stated in “Accidentally accessing storage over ISLs” on page 7, zone the SAN and
apply Logical Unit Number (LUN) masking on the storage subsystems so that the SVC
does not access the storage subsystems over the ISLs. This design means that your
storage subsystems will need connections to both the old and new SAN switches.
 Have two dedicated ISLs between the two switches on each SAN with no data traffic
traveling over them; these links carry only intra-cluster communication. The reason for this
design is that if these links ever become congested or lost, you might experience problems
with your SVC cluster if there are also issues at the same time on the other SAN. If you can,
set a 5% traffic threshold alert on the ISLs so that you know if a zoning mistake has allowed
any data traffic over the links.
Note: It is not a best practice to use this configuration to perform mirroring between I/O
Groups within the same cluster. And, you must never split the two nodes in an I/O Group
between various SAN switches within the same SAN fabric.
1.2 SAN switches
In this section, we discuss several considerations when you select the Fibre Channel (FC)
SAN switches for use with your SVC installation. It is important to understand the features
offered by the various vendors and associated models in order to meet design and
performance goals.
1.2.1 Selecting SAN switch models
In general, there are two “classes” of SAN switches: fabric switches and directors. While
normally based on the same software code and Application Specific Integrated Circuit (ASIC)
hardware platforms, there are differences in performance and availability. Directors feature a
slotted design and have component redundancy on all active components in the switch
chassis (for instance, dual-redundant switch controllers). A SAN fabric switch (or just a SAN
switch) normally has a fixed port layout in a non-slotted chassis (there are exceptions to this
rule though, such as the IBM/Cisco MDS9200 series, which features a slotted design).
Regarding component redundancy, both fabric switches and directors are normally equipped
with redundant, hot-swappable environmental components (power supply units and fans).
In the past, over-subscription on the SAN switch ports had to be taken into account when
selecting a SAN switch model. Over-subscription here refers to a situation in which the
combined maximum bandwidth of all switch ports is higher than what the switch can switch
internally. For directors, this number can vary between line card/port blade options, where a
high port-count module might have a higher over-subscription rate than a low port-count
module, because the capacity toward the switch backplane is fixed. With the latest
generation of SAN switches (both fabric switches and directors), this issue has become less
important due to increased internal switching capacity. This is true both for switches with an
internal crossbar architecture and for switches built from an internal core/edge ASIC layout.
For modern SAN switches (both fabric switches and directors), processing latency from
ingress to egress port is extremely low and is normally negligible.
When selecting the switch model, try to take the future SAN size into consideration. It is
generally better to initially get a director with only a few port modules instead of having to
implement multiple smaller switches. Having a high port-density director instead of a number
of smaller switches also saves ISL capacity and therefore ports used for inter-switch
connectivity.
IBM sells and supports SAN switches from both major SAN vendors through the following
product portfolios:
 IBM System Storage b-type/Brocade SAN portfolio
 IBM System Storage/Cisco SAN portfolio
1.2.2 Switch port layout for large edge SAN switches
While users of smaller, non-bladed, SAN fabric switches generally do not need to concern
themselves with which ports go where, users of multi-slot directors must pay careful attention
to where the ISLs are located in the switch. Generally, the ISLs (or ISL trunks) must be on
separate port modules within the switch to ensure redundancy. The hosts must be spread out
evenly among the remaining line cards in the switch. Remember to locate high-bandwidth
hosts on the core switches directly.
1.2.3 Switch port layout for director-class SAN switches
Each SAN switch vendor has a selection of line cards/port blades available for their multi-slot
director-class SAN switch models. Some of these options are over-subscribed, and some of
them have full bandwidth available for the attached devices. For your core switches, we
suggest only using line cards/port blades where the full line speed that you expect to use will
be available. You need to contact your switch vendor for full line card/port blade option details.
Your SVC ports, storage ports, ISLs, and high-bandwidth hosts need to be spread out evenly
among your line cards in order to help prevent the failure of any one line card from causing
undue impact to performance or availability.
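As an illustration of this spreading, the following sketch is a hypothetical planning aid (the slot names and port counts are invented for the example); it simply distributes each category of port round-robin across the available line cards.

# Hypothetical sketch: spread each port category evenly across director line cards so
# that losing one line card never removes all ports of a single category.
from collections import defaultdict
from itertools import cycle

line_cards = ["slot1", "slot2", "slot3", "slot4"]              # assumed director layout
ports = {"SVC": 8, "storage": 8, "ISL": 4, "high_bw_host": 4}  # assumed port counts

layout = defaultdict(list)
for category, count in ports.items():
    for index, card in zip(range(count), cycle(line_cards)):
        layout[card].append(f"{category}_{index}")

for card in line_cards:
    print(card, layout[card])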
1.2.4 IBM System Storage/Brocade b-type SANs
These are several of the features that we have found useful.
Fabric Watch
The Fabric Watch feature found in newer IBM/Brocade-based SAN switches can be useful
because the SVC relies on a properly functioning SAN. This is a licensed feature, but it
comes pre-bundled with most IBM/Brocade SAN switches. With Fabric Watch, you can
pre-configure thresholds on certain switch properties, which when triggered, produce an alert.
These attributes include:
 Switch port events, such as link resets
 Switch port errors (link quality)
 Component failures
Another useful feature included with Fabric Watch is Port Fencing, which can exclude a switch
port if the port is misbehaving.
Fibre Channel Routing/MetaSANs
To enhance SAN scalability beyond a single Fibre Channel (FC) fabric, Fibre Channel
Routing (FCR) for IBM/Brocade SANs can be useful. This hierarchical network approach
allows separate FC fabrics to be connected without merging them. This approach can also be
useful for limiting the fault domains in the SAN environment. With the latest generation of
IBM/Brocade SAN switches, FCR is an optionally licensed feature. With older generations,
special hardware is needed.
For more information about the IBM System Storage b-type/Brocade products, refer to the
following IBM Redbooks publications:
 Implementing an IBM/Brocade SAN, SG24-6116
 IBM System Storage/Brocade Multiprotocol Routing: An Introduction and Implementation,
SG24-7544
1.2.5 IBM System Storage/Cisco SANs
We have found the following features to be useful.
Port Channels
To ease the required planning efforts for future SAN expansions, ISLs/Port Channels can be
made up of any combination of ports in the switch, which means that it is not necessary to
reserve special ports for future expansions when provisioning ISLs. Instead, you can use any
free port in the switch for expanding the capacity of an ISL/Port Channel.
Cisco VSANs
VSANs and inter-VSAN routing (IVR) enable port/traffic isolation in the fabric. This port/traffic
isolation can be useful, for example, for fault isolation and scalability.
It is possible to use Cisco VSANs, combined with inter-VSAN routes, to isolate the hosts from
the storage arrays. This arrangement provides little benefit for a great deal of added
configuration complexity. However, VSANs with inter-VSAN routes can be useful for fabric
migrations from non-Cisco vendors onto Cisco fabrics, or other short-term situations. VSANs
can also be useful if you have hosts that access the storage directly, along with virtualizing
part of the storage with the SVC. (In this instance, it is best to use separate storage ports for
the SVC and the hosts. We do not advise using inter-VSAN routes to enable port sharing.)
1.2.6 SAN routing and duplicate WWNNs
The SVC has a built-in service feature that attempts to detect if two SVC nodes are on the
same FC fabric with the same worldwide node name (WWNN). When this situation is
detected, the SVC will restart and turn off its FC ports to prevent data corruption. This feature
can be triggered erroneously if an SVC port from fabric A is zoned through a SAN router so
that an SVC port from the same node in fabric B can log into the fabric A port.
To prevent this situation, whenever you implement advanced SAN FCR functions, be careful to ensure that the routing configuration is correct.
1.3 Zoning
Because the SVC differs from traditional storage devices, properly zoning it into your SAN fabric is a common source of misunderstanding and errors. Despite this, zoning the SVC into your SAN fabric is not particularly complicated.
Note: Errors caused by improper SVC zoning are often fairly difficult to isolate, so create
your zoning configuration carefully.
Here are the basic SVC zoning steps:
1. Create SVC intra-cluster zone.
2. Create SVC cluster.
3. Create SVC → Back-end storage subsystem zones.
4. Assign back-end storage to the SVC.
5. Create host → SVC zones.
6. Create host definitions on the SVC.
The zoning scheme that we describe next is slightly more restrictive than the zoning
described in the IBM System Storage SAN Volume Controller V4.3.0 - Software Installation
and Configuration Guide, S7002156. The Configuration Guide is a statement of what is
supported, but this publication is a statement of our understanding of the best way to set up
zoning, even if other ways are possible and supported.
1.3.1 Types of zoning
Modern SAN switches have three types of zoning available: port zoning, worldwide node
name (WWNN) zoning, and worldwide port name (WWPN) zoning. The preferred method is
to use only WWPN zoning.
There is a common misconception that WWPN zoning provides poorer security than port
zoning, which is not the case. Modern SAN switches enforce the zoning configuration directly
in the switch hardware, and port binding functions can be used to enforce that a given WWPN
must be connected to a particular SAN switch port.
Note: Avoid using a zoning configuration with port and worldwide name zoning intermixed.
There are multiple reasons not to use WWNN zoning. For hosts, it is a particularly bad idea,
because the WWNN is often based on the WWPN of only one of the HBAs. If you have to
replace that HBA, the WWNN of the host changes on both fabrics, which results in access
loss. WWNN zoning also makes troubleshooting more difficult, because you have no
consolidated list of which ports are supposed to be in which zone, and therefore, it is difficult
to tell if a port is missing.
Special note for IBM/Brocade SAN Webtools users
If you use the Brocade Webtools Graphical User Interface (GUI) to configure zoning, you
must take special care not to use WWNNs. When looking at the “tree” of available worldwide
names (WWNs), the WWNN is always presented one level higher than the WWPNs. Refer to
Figure 1-5 on page 13 for an example. Make sure that you use a WWPN, not the WWNN.
Figure 1-5 IBM/Brocade Webtools zoning
1.3.2 Pre-zoning tips and shortcuts
Now, we describe several tips and shortcuts for the SVC zoning.
Naming convention and zoning scheme
It is important to have a defined naming convention and zoning scheme when creating and
maintaining an SVC zoning configuration. Failing to have a defined naming convention and
zoning scheme can make your zoning configuration extremely difficult to understand and
maintain.
Remember that different environments have different requirements, which means that the
level of detail in the zoning scheme will vary among environments of different sizes. It is
important to have an easily understandable scheme with an appropriate level of detail and
then to be consistent whenever making changes to the environment.
Refer to 12.5, “Naming convention” on page 259 for suggestions for an SVC naming
convention.
Aliases
We strongly recommend that you use zoning aliases when creating your SVC zones if they
are available on your particular type of SAN switch. Zoning aliases make your zoning easier
to configure and understand and cause fewer possibilities for errors.
One approach is to include multiple members in one alias, because zoning aliases can
normally contain multiple members (just like zones). We recommend that you create the
following aliases:
 One that holds all of the SVC node ports on each fabric
 One for each storage subsystem (or controller blade, in the case of DS4x00 units)
 One for each I/O Group port pair (that is, an alias that contains port 2 of the first node in
the I/O Group and port 2 of the second node in the I/O Group)
Host aliases can be omitted in smaller environments, as in our lab environment.
1.3.3 SVC intra-cluster zone
This zone needs to contain every SVC node port on the SAN fabric. While it will overlap with
the storage zones that you will create soon, it is handy to have this zone as a “fail-safe,” in
case you ever make a mistake with your storage zones.
1.3.4 SVC storage zones
You need to avoid zoning different vendor storage subsystems together; the ports from the
storage subsystem need to be split evenly across the dual fabrics. Each controller might have
its own recommended best practice.
DS4x00 and FAStT storage controllers
Each DS4x00 and FAStT storage subsystem controller consists of two separate blades. It is a
best practice that these two blades are not in the same zone if you have attached them to the
same SAN. There might be a similar best practice suggestion from non-IBM storage vendors;
contact them for details.
1.3.5 SVC host zones
There must be a single zone for each host port. This zone must contain the host port, and one
port from each SVC node that the host will need to access. While there are two ports from
each node per SAN fabric in a usual dual-fabric configuration, make sure that the host only
accesses one of them. Refer to Figure 1-6 on page 15.
This configuration provides four paths to each VDisk, which is the number of paths per VDisk
for which IBM Subsystem Device Driver (SDD) multipathing software and the SVC have been
tuned.
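As a quick sanity check on a zoning plan, the following sketch (illustrative only; the function and its parameters are not part of any SVC tooling) counts the paths that a host will see to each VDisk, which should come out to four in the recommended configuration.

# Hypothetical sketch: count the paths a host sees to each VDisk for a given zoning plan.
# Each HBA is zoned to some number of ports on every node of the owning I/O Group.

def paths_per_vdisk(hbas_per_host, ports_zoned_per_node, nodes_per_io_group=2):
    return hbas_per_host * ports_zoned_per_node * nodes_per_io_group

# Recommended dual-fabric setup: two HBAs, each zoned to one port per node -> 4 paths.
print(paths_per_vdisk(hbas_per_host=2, ports_zoned_per_node=1))   # 4
# Zoning each HBA to both node ports on its fabric yields 8 paths, which is supported
# but provides no performance benefit.
print(paths_per_vdisk(hbas_per_host=2, ports_zoned_per_node=2))   # 8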
Figure 1-6 Typical host → SVC zoning
The IBM System Storage SAN Volume Controller V4.3.0 - Software Installation and
Configuration Guide, S7002156, discusses putting many hosts into a single zone as a
supported configuration under certain circumstances. While this design usually works just
fine, instability in one of your hosts can trigger all sorts of impossible-to-diagnose problems in
the other hosts in the zone. For this reason, you need to have only a single host in each zone
(single initiator zones).
It is a supported configuration to have eight paths to each VDisk, but this design provides no
performance benefit (indeed, under certain circumstances, it can even reduce performance),
and it does not improve reliability or availability by any significant degree.
Hosts with four (or more) HBAs
If you have four host bus adapters (HBAs) in your host instead of two HBAs, it takes a little
more planning. Because eight paths are not an optimum number, you must instead configure
your SVC Host Definitions (and zoning) as though the single host is two separate hosts.
During VDisk assignment, alternate the VDisks between the two “pseudo-hosts.”
The reason that we do not just assign one HBA to each of the paths is that, for any
specific VDisk, one node solely serves as a backup node (a preferred node scheme is used).
The load is never going to be balanced for that particular VDisk. It is better to load balance by
I/O Group instead, and let the VDisks get automatically assigned to nodes.
1.3.6 Sample standard SVC zoning configuration
This section contains a sample “standard” zoning configuration for an SVC cluster. Our
sample setup has two I/O Groups, two storage subsystems, and eight hosts. (Refer to
Figure 1-7.)
Obviously, the zoning configuration must be duplicated on both SAN fabrics; we will show the
zoning for the SAN named “A.”
Note: In Figure 1-7, all SVC nodes have two connections per switch. The figure shows the four SVC nodes, switches A and B, and the eight hosts Peter, Barry, Jon, Ian, Thorsten, Ronda, Deon, and Foo.
Figure 1-7 Example SVC SAN
For the sake of brevity, we only discuss SAN “A” in our example.
Aliases
Unfortunately, you cannot nest aliases, so several of these WWPNs appear in multiple
aliases. Also, do not be concerned if none of your WWPNs looks like the example; we made
a few of them up when writing this book.
Note that certain switch vendors (for example, McDATA) do not allow multiple-member
aliases, but you can still create single-member aliases. While creating single-member aliases
does not reduce the size of your zoning configuration, it still makes it easier to read than a
mass of raw WWPNs.
For the alias names, we have appended “SAN_A” on the end where necessary to distinguish
that these alias names are the ports on SAN “A”. This system helps if you ever have to
perform troubleshooting on both SAN fabrics at one time.
SVC cluster alias
As a side note, the SVC has an extremely predictable WWPN structure, which helps make
the zoning easier to “read.” It always starts with 50:05:07:68 (refer to Example 1-1) and ends
with two octets that distinguish for you which node is which. The first digit of the third octet
from the end is the port number on the node.
The cluster alias that we create will be used for the intra-cluster zone, for all back-end storage
zones, and also in any zones that you need for remote mirroring with another SVC cluster
(which will not be discussed in this example).
Example 1-1 SVC cluster alias
SVC_Cluster_SAN_A:
50:05:07:68:01:10:37:e5
50:05:07:68:01:30:37:e5
50:05:07:68:01:10:37:dc
50:05:07:68:01:30:37:dc
50:05:07:68:01:10:1d:1c
50:05:07:68:01:30:1d:1c
50:05:07:68:01:10:27:e2
50:05:07:68:01:30:27:e2
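Because the WWPN layout is so regular, a small script can help you audit zoning listings. The sketch below is a hypothetical helper that extracts the port number and node suffix from the WWPNs shown in Example 1-1; it assumes only the 50:05:07:68 prefix and the octet layout described above.

# Hypothetical sketch: decode SVC WWPNs such as 50:05:07:68:01:10:37:e5.
# The first digit of the third octet from the end is the node port number;
# the last two octets distinguish the node.

SVC_WWPN_PREFIX = "50:05:07:68"

def decode_svc_wwpn(wwpn):
    octets = wwpn.lower().split(":")
    if ":".join(octets[:4]) != SVC_WWPN_PREFIX:
        raise ValueError(f"{wwpn} does not look like an SVC port WWPN")
    port = int(octets[-3][0])            # first digit of the third octet from the end
    node_suffix = "".join(octets[-2:])   # last two octets identify the node
    return port, node_suffix

for wwpn in ["50:05:07:68:01:10:37:e5", "50:05:07:68:01:30:37:e5"]:
    port, node = decode_svc_wwpn(wwpn)
    print(f"{wwpn} -> node ...{node}, port {port}")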
SVC I/O Group “port pair” aliases
These are the basic “building-blocks” of our host zones. Because the best practices that we
have described specify that each HBA is only supposed to see a single port on each node,
these aliases are the aliases that will be included. To have an equal load on each SVC node
port, you need to roughly alternate between the ports when creating your host zones. Refer to
Example 1-2.
Example 1-2 I/O Group port pair aliases
SVC_Group0_Port1:
50:05:07:68:01:10:37:e5
50:05:07:68:01:10:37:dc
SVC_Group0_Port3:
50:05:07:68:01:30:37:e5
50:05:07:68:01:30:37:dc
SVC_Group1_Port1:
50:05:07:68:01:10:1d:1c
50:05:07:68:01:10:27:e2
SVC_Group1_Port3:
50:05:07:68:01:30:1d:1c
50:05:07:68:01:30:27:e2
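One simple way to achieve that rough alternation is to assign hosts to the port-pair aliases round-robin. The sketch below is a hypothetical planning aid, not switch configuration; the host names anticipate the zones shown later in Example 1-6.

# Hypothetical sketch: round-robin hosts across the I/O Group port-pair aliases so that
# each SVC node port ends up with a similar number of host zones.
from itertools import cycle

port_pair_aliases = ["SVC_Group0_Port1", "SVC_Group0_Port3",
                     "SVC_Group1_Port1", "SVC_Group1_Port3"]

hosts = ["WinPeter_Slot3", "WinBarry_Slot7", "WinJon_Slot1", "WinIan_Slot2",
         "AIXRonda_Slot6_fcs1", "AIXThorsten_Slot2_fcs0"]

for host, alias in zip(hosts, cycle(port_pair_aliases)):
    print(f"zone {host}: {{host WWPN, {alias}}}")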
Storage subsystem aliases
The first two aliases here are similar to what you might see with an IBM System Storage
DS4800 storage subsystem with four back-end ports per controller blade. We have created
different aliases for each blade in order to isolate the two controllers from each other, which is
a best practice suggested by DS4x00 development.
Because the IBM System Storage DS8000™ has no concept of separate controllers (at least,
not from the viewpoint of a SAN), we put all the ports on the storage subsystem into a single
alias. Refer to Example 1-3.
Example 1-3 Storage aliases
DS4k_23K45_Blade_A_SAN_A
20:04:00:a0:b8:17:44:32
20:04:00:a0:b8:17:44:33
DS4k_23K45_Blade_B_SAN_A
20:05:00:a0:b8:17:44:32
20:05:00:a0:b8:17:44:33
DS8k_34912_SAN_A
50:05:00:63:02:ac:01:47
50:05:00:63:02:bd:01:37
50:05:00:63:02:7f:01:8d
50:05:00:63:02:2a:01:fc
Zones
Remember when naming your zones that they cannot have the same names as aliases.
Here is our sample zone set, utilizing the aliases that we have just defined.
SVC intra-cluster zone
This zone is simple; it only contains a single alias (which happens to contain all of the SVC
node ports). And yes, this zone does overlap with every single storage zone. Nevertheless, it
is good to have it as a fail-safe, given the dire consequences that will occur if your cluster
nodes ever completely lose contact with one another over the SAN. Refer to Example 1-4.
Example 1-4 SVC cluster zone
SVC_Cluster_Zone_SAN_A:
SVC_Cluster_SAN_A
SVC → Storage zones
As we have mentioned earlier, we put each of the storage controllers (and, in the case of the
DS4x00 controllers, each blade) into a separate zone. Refer to Example 1-5.
Example 1-5 SVC → Storage zones
SVC_DS4k_23K45_Zone_Blade_A_SAN_A:
SVC_Cluster_SAN_A
DS4k_23K45_Blade_A_SAN_A
SVC_DS4k_23K45_Zone_Blade_B_SAN_A:
SVC_Cluster_SAN_A
DS4k_23K45_Blade_B_SAN_A
SVC_DS8k_34912_Zone_SAN_A:
SVC_Cluster_SAN_A
DS8k_34912_SAN_A
SVC → Host zones
We have not created aliases for each host, because each host is only going to appear in a
single zone. While there will be a “raw” WWPN in the zones, an alias is unnecessary,
because it will be obvious where the WWPN belongs.
Notice that all of the zones refer to the slot number of the host, rather than “SAN_A.” If you
are trying to diagnose a problem (or replace an HBA), it is extremely important to know on
which HBA you need to work.
For System p® hosts, we have also appended the HBA number (FCS) into the zone name,
which makes device management easier. While it is possible to get this information out of
SDD, it is nice to have it in the zoning configuration.
We alternate the hosts between the SVC node port pairs and between the SVC I/O Groups
for load balancing. While we simply alternate in our example, you might want to balance the
load based on the observed load on ports and I/O Groups. Refer to Example 1-6.
Example 1-6 SVC → Host zones
WinPeter_Slot3:
21:00:00:e0:8b:05:41:bc
SVC_Group0_Port1
WinBarry_Slot7:
21:00:00:e0:8b:05:37:ab
SVC_Group0_Port3
WinJon_Slot1:
21:00:00:e0:8b:05:28:f9
SVC_Group1_Port1
WinIan_Slot2:
21:00:00:e0:8b:05:1a:6f
SVC_Group1_Port3
AIXRonda_Slot6_fcs1:
10:00:00:00:c9:32:a8:00
SVC_Group0_Port1
AIXThorsten_Slot2_fcs0:
10:00:00:00:c9:32:bf:c7
SVC_Group0_Port3
AIXDeon_Slot9_fcs3:
10:00:00:00:c9:32:c9:6f
SVC_Group1_Port1
AIXFoo_Slot1_fcs2:
10:00:00:00:c9:32:a8:67
SVC_Group1_Port3
1.3.7 Zoning with multiple SVC clusters
Unless two clusters participate in a mirroring relationship, all zoning must be configured so
that the two clusters do not share a zone. If a single host requires access to two different
clusters, create two zones with each zone to a separate cluster. The back-end storage zones
must also be separate, even if the two clusters share a storage subsystem.
1.3.8 Split storage subsystem configurations
There might be situations where a storage subsystem is used both for SVC attachment and
direct-attach hosts. In this case, it is important that you pay close attention during the LUN
masking process on the storage subsystem. Assigning the same storage subsystem LUN to
both a host and the SVC will almost certainly result in swift data corruption. If you perform a
migration into or out of the SVC, make sure that the LUN is removed from one place at the
exact same time that it is added to another place.
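One way to guard against this mistake is a simple cross-check of the LUN masking before and after any change. The following sketch is purely illustrative: it assumes that you can export the LUN-to-initiator mappings from the storage subsystem into a dictionary, which is not something the SVC or the subsystem does for you automatically.

# Hypothetical sketch: flag LUNs masked to both the SVC and a direct-attach host.
# lun_masking maps a LUN identifier to the set of initiators it is presented to.

lun_masking = {
    "lun_0001": {"SVC_Cluster", "host_Foo"},   # dangerous double mapping
    "lun_0002": {"SVC_Cluster"},
    "lun_0003": {"host_Bar"},
}

def find_double_mapped(lun_masking, svc_name="SVC_Cluster"):
    return [lun for lun, initiators in lun_masking.items()
            if svc_name in initiators and len(initiators) > 1]

for lun in find_double_mapped(lun_masking):
    print(f"WARNING: {lun} is mapped to both the SVC and a direct-attach host")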
1.4 Switch Domain IDs
All switch domain IDs must be unique across both fabrics, and the name of each switch
needs to incorporate its domain ID. Because the top byte of a port's 24-bit Fibre Channel ID
(FCID) is the domain ID of its switch, a totally unique domain ID makes troubleshooting much
easier in situations where an error message contains only the FCID of the port with a problem.
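The following minimal Python sketch (a troubleshooting aid, not part of any SVC or switch tooling; the domain-to-switch-name mapping is an invented example) shows how an FCID from an error message decomposes into domain, area, and port so that the domain byte can be matched directly to a switch name.

# Hypothetical sketch: split a 24-bit Fibre Channel ID (FCID) into domain/area/port.
# With unique domain IDs, the top byte immediately identifies the switch.

switch_names = {0x21: "SAN_A_core_21", 0x22: "SAN_A_edge_22"}  # assumed naming

def decode_fcid(fcid):
    domain = (fcid >> 16) & 0xFF
    area = (fcid >> 8) & 0xFF
    port = fcid & 0xFF
    return domain, area, port

domain, area, port = decode_fcid(0x211C00)
print(f"domain 0x{domain:02x} ({switch_names.get(domain, 'unknown switch')}), "
      f"area 0x{area:02x}, port 0x{port:02x}")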
1.5 Distance extension for mirroring
To implement remote mirroring over a distance, you have several choices:
 Optical multiplexors, such as DWDM or CWDM devices
 Long-distance small form-factor pluggable transceivers (SFPs) and XFPs
 Fibre Channel → IP conversion boxes
Of those options, the optical varieties of distance extension are the “gold standard.” IP
distance extension introduces additional complexity, is less reliable, and has performance
limitations. However, we do recognize that optical distance extension is impractical in many
cases due to cost or unavailability.
Note: Distance extension must only be utilized for links between SVC clusters. It must not
be used for intra-cluster links. Technically, intra-cluster distance extension is supported for
relatively short distances, such as a few kilometers (or miles), but refer to the IBM System
Storage SAN Volume Controller Restrictions, S1003283, for details explaining why this
arrangement is not recommended.
1.5.1 Optical multiplexors
Optical multiplexors can extend your SAN up to hundreds of kilometers (or miles) at
extremely high speeds, and for this reason, they are the preferred method for long distance
expansion. When deploying optical multiplexing, make sure that the optical multiplexor has
been certified to work with your SAN switch model. The SVC has no allegiance to a particular
model of optical multiplexor.
If you use multiplexor-based distance extension, closely monitor your physical link error
counts in your switches. Optical communication devices are high-precision units. When they
shift out of calibration, you start to see errors in your frames.
1.5.2 Long-distance SFPs/XFPs
Long-distance optical transceivers have the advantage of extreme simplicity. No expensive
equipment is required, and there are only a few configuration steps to perform. However,
ensure that you only use transceivers designed for your particular SAN switch. Each switch
vendor only supports a specific set of small form-factor pluggable transceivers (SFPs/XFPs),
so it is unlikely that Cisco SFPs will work in a Brocade switch.
1.5.3 Fibre Channel → IP conversion
Fibre Channel over IP conversion is by far the most common and least expensive form of
distance extension. However, it is also complicated to configure, and relatively subtle errors
can have severe performance implications.
With Internet Protocol (IP)-based distance extension, it is imperative that you dedicate
bandwidth to your Fibre Channel (FC) → IP traffic if the link is shared with other IP traffic. Do
not assume that a link between two sites that is “low traffic” or “only used for e-mail” will
always stay that way. Fibre Channel is far more sensitive to congestion than most IP
applications. You do not want a spyware problem or a spam attack on an IP network to
disrupt your SVC.
Also, when communicating with your organization’s networking architects, make sure to
distinguish between megabytes per second as opposed to megabits. In the storage world,
bandwidth is usually specified in megabytes per second (MBps, MB/s, or MB/sec), while
network engineers specify bandwidth in megabits (Mbps, Mbit/s, or Mb/sec). If you fail to
specify megabytes, you can end up with an impressive-sounding 155 Mb/sec OC-3 link,
which is only going to supply a tiny 15 MBps or so to your SVC. With the suggested safety
margins included, this is not an extremely fast link at all.
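The conversion is simple, but it is worth writing down. The following sketch is a generic calculation (not SVC-specific); the 10:1 divisor approximates 8 data bits per byte plus protocol overhead, and the safety margin is an assumption that you should tune for your environment.

# Hypothetical sketch: convert a network link speed in megabits/s to usable MB/s.
# Dividing by 10 approximates 8 data bits per byte plus protocol overhead.

def usable_mbytes_per_sec(megabits_per_sec, safety_margin=0.7):
    raw = megabits_per_sec / 10          # ~15 MBps for a 155 Mb/s OC-3 link
    return raw * safety_margin           # leave headroom for bursts (assumed margin)

print(f"OC-3: ~{usable_mbytes_per_sec(155):.0f} MBps usable for FC-IP traffic")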
Exact details of the configuration of these boxes are beyond the scope of this book; however, configuring these units for the SVC is no different than configuring them for any other storage device.
1.6 Tape and disk traffic sharing the SAN
If you have free ports on your core switch, there is no problem with putting tape devices (and
their associated backup servers) on the SVC SAN; however, you must not put tape and disk
traffic on the same Fibre Channel host bus adapter (HBA).
Do not put tape ports and backup servers on different switches. Modern tape devices have
high bandwidth requirements, and splitting them across switches can quickly lead to SAN
congestion over the ISL between the switches.
1.7 Switch interoperability
The SVC is rather flexible as far as switch vendors are concerned. The most important
requirement is that all of the node connections on a particular SVC cluster must all go to
switches of a single vendor. This requirement means that you must not have several nodes or
node ports plugged into vendor A, and several nodes or node ports plugged into vendor B.
While the SVC supports certain combinations of switches from multiple vendors in the same
SAN, in practice, we do not particularly recommend this approach. Despite years of effort,
interoperability among switch vendors is less than ideal, because the
Fibre Channel standards are not rigorously enforced. Interoperability problems between
switch vendors are notoriously difficult and disruptive to isolate, and it can take a long time to
obtain a fix. For these reasons, we suggest only running multiple switch vendors in the same
SAN long enough to migrate from one vendor to another vendor, if this setup is possible with
your hardware.
It is acceptable to run a mixed-vendor SAN if you have gained agreement from both switch
vendors that they will fully support attachment with each other. In general, Brocade will
interoperate with McDATA under special circumstances. Contact your IBM marketing
representative for details (“McDATA” here refers to the switch products sold by the McDATA
Corporation prior to their acquisition by Brocade Communications Systems. Much of that
product line is still for sale at this time). QLogic/BladeCenter FCSM will work with Cisco.
We do not advise interoperating Cisco with Brocade at this time, except during fabric
migrations, and only then if you have a back-out plan in place. We also do not advise that you
connect the QLogic/BladeCenter FCSM to Brocade or McDATA.
When you have SAN fabrics with multiple vendors, pay special attention to any
vendor-specific requirements. For instance, observe from which switch in the fabric the
zoning must be performed.
1.8 TotalStorage Productivity Center for Fabric
TotalStorage® Productivity Center (TPC) for Fabric can be used to create, administer, and
monitor your SAN fabrics. There is nothing special that you need to do to use it to administer
an SVC SAN fabric as opposed to any other SAN fabric. We discuss information about TPC
for Fabric in Chapter 11, “Monitoring” on page 221.
For further information, consult the TPC IBM Redbooks publication, IBM TotalStorage
Productivity Center V3.1: The Next Generation, SG24-7194, or contact your IBM marketing
representative.
Chapter 2. SAN Volume Controller cluster
In this chapter, we discuss the advantages of virtualization and the optimal time to use
virtualization in your environment. Furthermore, we describe the scalability options for the
IBM System Storage SAN Volume Controller (SVC) and when to grow or split an SVC cluster.
2.1 Advantages of virtualization
The IBM System Storage SAN Volume Controller (SVC), which is shown in Figure 2-1,
enables a single point of control for disparate, heterogeneous storage resources. The SVC
enables you to put capacity from various heterogeneous storage subsystem arrays into one
pool of capacity for better utilization and more flexible access. This design helps the
administrator to control and manage this capacity from a single common interface instead of
managing several independent disk systems and interfaces. Furthermore, the SVC can
improve the performance of your storage subsystem array by introducing 8 GB of cache
memory in each node, mirrored within a node pair.
SVC virtualization provides users with the ability to move data non-disruptively from one
storage subsystem to another storage subsystem. It also introduces advanced copy functions
that are usable over heterogeneous storage subsystems. For many users who offer storage
to other clients, the SVC is also extremely attractive because it lets you create a “tiered”
storage environment.
Figure 2-1 SVC 8G4 model
2.1.1 How does the SVC fit into your environment
Here is a short list of the SVC features:
 Combines capacity into a single pool
 Manages all types of storage in a common way from a common point
 Provisions capacity to applications more easily
 Improves performance through caching and striping data across multiple arrays
 Creates tiered storage arrays
 Provides advanced copy services over heterogeneous storage arrays
 Removes or reduces the physical boundaries or storage controller limits associated with
any vendor storage controllers
 Brings common storage controller functions into the Storage Area Network (SAN), so that
all storage controllers can be used and can benefit from these functions
2.2 Scalability of SVC clusters
The SAN Volume Controller is highly scalable, and it can be expanded up to eight nodes in
one cluster. An I/O Group is formed by combining a redundant pair of SVC nodes (System
x™ server-based). Each server includes a four-port 4 Gbps-capable host bus adapter (HBA),
which is designed to allow the SVC to connect and operate at up to 4 Gbps SAN fabric speed.
Each I/O Group contains 8 GB of mirrored cache memory. Highly available I/O Groups are
the basic configuration element of an SVC cluster. Adding I/O Groups to the cluster is
designed to linearly increase cluster performance and bandwidth. An entry level SVC
configuration contains a single I/O Group. The SVC can scale out to support four I/O Groups,
and the SVC can scale up to support 1 024 host servers. For every cluster, the SVC supports
up to 8 192 virtual disks (VDisks). This configuration flexibility means that SVC configurations
can start small with an attractive price to suit smaller clients or pilot projects and yet can grow
to manage extremely large storage environments.
2.2.1 Advantage of multi-cluster as opposed to single cluster
Growing or adding new I/O Groups to an SVC cluster is a decision that has to be made when
either a configuration limit is reached or when the I/O load reaches a point where a new I/O
Group is needed. The saturation point for the configuration that we tested was reached at
approximately 70 000 I/Os per second (IOPS) for the current SVC hardware (8G4 nodes on
an x3550) and SVC Version 4.x (refer to Table 2-2 on page 29).
To determine the number of I/O Groups and monitor the CPU performance of each node, you
can also use TotalStorage Productivity Center (TPC). The CPU performance is related to I/O
performance. When the CPUs become consistently 70% busy, you must consider either:
 Adding more nodes to the cluster and moving part of the workload onto the new nodes
 Moving several VDisks to another I/O Group, if the other I/O Group is not busy
To see how busy your CPUs are, you can use the TPC performance report, by selecting CPU
Utilization as shown in Figure 2-2 on page 26.
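As a rough planning aid, you can run the per-node CPU figures that you collect (for example, exported from TPC) through a check like the sketch below. The 70% threshold comes from this section; the data structure and the definition of “consistently” are assumptions for illustration.

# Hypothetical sketch: flag SVC nodes whose CPU utilization is consistently above 70%,
# the point at which adding an I/O Group or rebalancing VDisks is suggested.

samples = {  # assumed per-node utilization samples (%), e.g. exported from TPC
    "node1": [72, 75, 71, 78, 74],
    "node2": [40, 45, 38, 42, 41],
}

THRESHOLD = 70

for node, values in samples.items():
    busy = sum(1 for v in values if v > THRESHOLD)
    if busy >= 0.8 * len(values):   # "consistently" = most samples above the threshold
        print(f"{node}: consistently above {THRESHOLD}% - consider adding an I/O Group "
              f"or moving VDisks to a less busy I/O Group")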
Several of the activities that affect CPU utilization are:
 VDisk activity: The preferred node is responsible for I/Os for the VDisk and coordinates
sending the I/Os to the alternate node. While both nodes will exhibit similar CPU
utilization, the preferred node is a little busier. To be precise, a preferred node is always
responsible for the destaging of writes for VDisks that it owns. Therefore, skewing
preferred ownership of VDisks toward one node in the I/O Group will lead to more
destaging, and therefore, more work on that node.
 Cache management: The purpose of the cache component is to improve performance of
read and write commands by holding part of the read or write data in SVC memory. The
cache component must keep the caches on both nodes consistent, because the nodes in
a caching pair have physically separate memories.
 FlashCopy® activity: Each node (of the flash copy source) maintains a copy of the bitmap;
CPU utilization is similar.
 Mirror Copy activity: The preferred node is responsible for coordinating copy information
to the target and also ensuring that the I/O Group is up-to-date with the copy progress
information or change block information. As soon as Global Mirror is enabled, there is an
additional 10% overhead on I/O work due to the buffering and general I/O overhead of
performing asynchronous Peer-to-Peer Remote Copy (PPRC).
Figure 2-2 TPC Performance Report: Storage Subsystem Performance by Node
After you reach the performance or configuration maximum for an I/O Group, you can add
additional performance or capacity by attaching another I/O Group to the SVC cluster.
Table 2-1 on page 27 shows the current maximum limits for one SVC I/O Group.
Table 2-1 Maximum configurations for an I/O Group
Objects | Maximum number | Comments
SAN Volume Controller nodes | Eight | Arranged as four I/O Groups
I/O Groups | Four | Each containing two nodes
VDisks per I/O Group | 2048 | Includes managed-mode and image-mode VDisks
Host IDs per I/O Group | 256 (Cisco, Brocade, or McDATA); 64 (QLogic) | N/A
Host ports per I/O Group | 512 (Cisco, Brocade, or McDATA); 128 (QLogic) | N/A
Metro/Global Mirror VDisks per I/O Group | 1024 TB | There is a per I/O Group limit of 1024 TB on the quantity of Primary and Secondary VDisk address space, which can participate in Metro/Global Mirror relationships. This maximum configuration will consume all 512 MB of bitmap space for the I/O Group and allow no FlashCopy bitmap space. The default is 40 TB.
FlashCopy VDisks per I/O Group | 1024 TB | This limit is a per I/O Group limit on the quantity of FlashCopy mappings using bitmap space from a given I/O Group. This maximum configuration will consume all 512 MB of bitmap space for the I/O Group and allow no Metro Mirror or Global Mirror bitmap space. The default is 40 TB.
2.2.2 Performance expectations by adding an SVC
As shown in 2.2.1, “Advantage of multi-cluster as opposed to single cluster” on page 25, there
are limits that will cause the addition of a new I/O Group to the existing SVC cluster.
In Figure 2-3 on page 28, you can see the performance improvement gained by adding a new
I/O Group to your SVC cluster. An SVC cluster with a single I/O Group can reach a
performance of more than 70 000 IOPS, provided that the total response time does not
exceed five milliseconds. If this limit is close to being exceeded, you need to add a second
I/O Group to the cluster.
With the newly added I/O Group, the SVC cluster can now manage more than 130 000 IOPS.
An SVC cluster itself can be scaled up to an eight node cluster with which we will reach a total
I/O rate of more than 250 000 IOPS.
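To turn these figures into a first-cut sizing estimate, the following sketch (hypothetical, and based only on the roughly 70 000 IOPS per I/O Group observed in this test) computes how many I/O Groups a target workload needs; real sizing must also account for I/O size, read/write mix, and cache behavior.

import math

# Hypothetical sizing sketch: approximate I/O Groups needed for a target IOPS level,
# based on the ~70 000 IOPS per I/O Group observed for 8G4 nodes in this test.

IOPS_PER_IO_GROUP = 70_000
MAX_IO_GROUPS = 4

def io_groups_needed(target_iops):
    groups = math.ceil(target_iops / IOPS_PER_IO_GROUP)
    if groups > MAX_IO_GROUPS:
        raise ValueError("workload exceeds a single cluster; plan for a second SVC cluster")
    return groups

print(io_groups_needed(130_000))  # 2, matching the two-I/O-Group result above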
Figure 2-3 Performance increase by adding I/O Groups
Looking at Figure 2-3, you can see that throughput at a given response time scales nearly
linearly as you add SVC nodes (I/O Groups) to the cluster.
2.2.3 Growing or splitting SVC clusters
Growing an SVC cluster can be done concurrently, and the SVC cluster can grow up to the
current maximum of eight SVC nodes per cluster in four I/O Groups. Table 2-2 on page 29
contains an extract of the total SVC cluster configuration limits.
Table 2-2 Maximum SVC cluster limits
Objects | Maximum number | Comments
SAN Volume Controller nodes | Eight | Arranged as four I/O Groups
MDisks | 4 096 | The maximum number of logical units that can be managed by SVC. This number includes disks that have not been configured into Managed Disk Groups.
Virtual disks (VDisks) per cluster | 8 192 | Includes managed-mode VDisks and image-mode VDisks. The maximum requires an 8 node cluster.
Total storage manageable by SVC | 8 PB | If maximum extent size of 2048 MB is used
Host IDs per cluster | 1 024 (Cisco, Brocade, and McDATA fabrics); 155 (CNT); 256 (QLogic) | A Host ID is a collection of worldwide port names (WWPNs) that represents a host. This Host ID is used to associate SCSI LUNs with VDisks.
Host ports per cluster | 2048 (Cisco, Brocade, and McDATA fabrics); 310 (CNT); 512 (QLogic) | N/A
If you exceed one of the current maximum configuration limits for the fully deployed SVC
cluster, you then scale out by adding a new SVC cluster and distributing the workload to it.
Because the current maximum configuration limits can change, use the following link to get a
complete table of the current SVC restrictions:
http://www-1.ibm.com/support/docview.wss?rs=591&uid=ssg1S1003283
Splitting an SVC cluster or having a secondary SVC cluster provides you with the ability to
implement a disaster recovery option in the environment. Having two SVC clusters in two
locations allows work to continue even if one site is down. With the SVC Advanced Copy
functions, you can copy data from the local primary environment to a remote secondary site.
The maximum configuration limits apply here as well.
Another advantage of having two clusters is that the SVC Advanced Copy functions license is
based on:
 The total amount of storage (in gigabytes) that is virtualized
 The Metro Mirror and Global Mirror or FlashCopy capacity in use
In each case, the number of terabytes (TBs) to order for Metro Mirror and Global Mirror is the
total number of source TBs and target TBs participating in the copy operations.
Growing the SVC cluster
Before adding a new I/O Group to the existing SVC cluster, you must make changes. It is
important to adjust the zoning so that the new SVC node pair can join the existing SVC
cluster. It is also necessary to adjust the zoning for each SVC node in the cluster to be able to
see the same subsystem storage arrays.
After you make the zoning changes, you can add the new nodes into the SVC cluster. You
can use the guide for adding nodes to an SVC cluster in IBM System Storage SAN Volume
Controller, SG24-6423-06.
Splitting the SVC cluster
Splitting the SVC cluster might become a necessity if the maximum number of eight SVC
nodes is reached, and you have a requirement to grow the environment beyond the
maximum number of I/Os that a cluster can support, maximum number of attachable
subsystem storage controllers, or any other maximum mentioned in the V4.3.0 IBM System
Storage SAN Volume Controller restrictions at:
http://www-1.ibm.com/support/docview.wss?rs=591&uid=ssg1S1003283
Instead of having one SVC cluster host all I/O operations, hosts, and subsystem storage
attachments, the goal here is to create a second SVC cluster so that we equally distribute all
of the workload over the two SVC clusters.
There are a number of approaches that you can take for splitting an SVC cluster. The first,
and probably the easiest, way is to create a new SVC cluster, attach storage subsystems and
hosts to it, and start putting workload on this new SVC cluster.
The next options are more intensive, and they involve performing more steps:
 Create a new SVC cluster and start moving workload onto it. To move the workload from
an existing SVC cluster to a new SVC cluster, you can use the Advanced Copy features,
such as Metro Mirror and Global Mirror. We describe this scenario in Chapter 8, “Copy
services” on page 151.
Note: This move involves an outage from the host system point of view, because the
worldwide port name (WWPN) from the subsystem (SVC I/O Group) does change.
 You can use the VDisk “managed mode to image mode” migration to move workload from
one SVC cluster to the new SVC cluster. Migrate a VDisk from managed mode to image
mode, reassign the disk (logical unit number (LUN) masking) from your storage
subsystem point of view, introduce the disk to your new SVC cluster, and then migrate it
from image mode back to managed mode. We describe this scenario in Chapter 7, “VDisks”
on page 119.
Note: This scenario also invokes an outage to your host systems and the I/O to the
VDisk.
From a user perspective, the first option is the easiest way to expand your cluster workload.
The second and third options are more difficult, involve more steps, and require more
preparation in advance. The third option involves the longest outage to the host systems,
and therefore, it is our least preferred choice.
There is only one good reason that we can think of to reduce the existing SVC cluster by a
certain number of I/O Groups: if more bandwidth is required on the secondary SVC cluster
and if there is spare bandwidth available on the primary cluster.
Adding or upgrading SVC node hardware
If you have a cluster of six or fewer nodes of older hardware, and you have purchased new
hardware, you can choose to either start a new cluster for the new hardware or add the new
hardware to the old cluster. Both configurations are supported.
While both options are practical, we recommend that you add the new hardware to your
existing cluster. This recommendation is only true if, in the short term, you are not scaling the
environment beyond the capabilities of this cluster.
By utilizing the existing cluster, you maintain the benefit of managing just one cluster. Also, if
you are using mirror copy services to the remote site, you might be able to continue to do so
without having to add SVC nodes at the remote site.
You have a couple of choices to upgrade an existing cluster’s hardware. The choices depend
on the size of the existing cluster.
If your cluster has up to six nodes, you have these options available:
 Add the new hardware to the cluster, migrate VDisks to the new nodes and then retire the
older hardware when it is no longer managing any VDisks.
This method requires a brief outage to the hosts to change the I/O Group for each VDisk.
 Swap out one node in each I/O Group at a time and replace it with the new hardware. We
recommend that you engage an IBM service support representative (IBM SSR) to help
you with this process.
You can perform this swap without an outage to the hosts.
If your cluster has eight nodes, the options are similar:
 Swap out a node in each I/O Group one at a time and replace it with the new hardware.
We recommend that you engage an IBM SSR to help you with this process.
You can perform this swap without an outage to the hosts, and you need to swap a node
in one I/O Group at a time. Do not change all I/O Groups in a multi-I/O Group cluster at
one time.
 Move the VDisks to another I/O Group so that all VDisks are on three of the four I/O
Groups. You can then remove the remaining I/O Group with no VDisks from the cluster
and add the new hardware to the cluster.
As each pair of new nodes is added, VDisks can then be moved to the new nodes, leaving
another old I/O Group pair that can be removed. After all the old pairs are removed, the
last two new nodes can be added, and if required, VDisks can be moved onto them.
Unfortunately, this method requires several outages to the host, because VDisks are
moved between I/O Groups. This method might not be practical unless you need to
implement the new hardware over an extended period of time, and the first option is not
practical for your environment.
 You can mix the previous two options.
New SVC hardware provides considerable performance benefits with each release, and
there have been substantial performance improvements since the first hardware release.
Depending on the age of your existing SVC hardware, the performance requirements
might be met by only six or fewer nodes of the new hardware.
If this situation fits, you might be able to use a mix of the previous two options. For
example, use an IBM SSR to help you upgrade one or two I/O Groups, and then move the
VDisks from the remaining I/O Groups onto the new hardware (a CLI sketch of the VDisk
move follows).
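Moving a VDisk to a different I/O Group is a single CLI operation, but remember that it requires host I/O to the VDisk to be stopped and the host paths to be rediscovered afterward. The following is a minimal sketch, assuming a hypothetical VDisk named vdisk_app1 that is moved to io_grp1:
IBM_2145:itsosvccl1:admin>svctask chvdisk -iogrp io_grp1 vdisk_app1
IBM_2145:itsosvccl1:admin>svcinfo lsvdisk vdisk_app1
The lsvdisk output confirms the new I/O Group before you rescan the paths on the host.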
For more details about replacing nodes non-disruptively or expanding an existing SVC
cluster, refer to IBM System Storage SAN Volume Controller, SG24-6423-05.
2.3 SVC performance scenarios
In this section, we describe five test scenarios. These scenarios compare a DS4500 that is
directly attached to a Windows® host with the same configuration after the SVC is introduced
into the data path, and they show the performance improvement gained. The scenarios also
show the performance results during a VDisk migration from an image mode VDisk to a
striped VDisk. In the last test, we examined the impact of a node failure on the I/O throughput.
We performed these tests in the following environment:
 Operating System: Microsoft® Windows 2008 Enterprise Edition
 Storage: 64 GB LUN/DS4500
 SAN: Dual Fabric 2005 B5K/Firmware: V6.1.0c
 I/O Application: I/O Meter:
– 70% read
– 30% write
– 32 KB
– 100% sequential
– Queue depth: 8
As we have already explained, the test scenarios (each running for 40 minutes) are:
 Test 1: Storage subsystem direct-attached to host
 Test 2: SVC in the path and a 64 GB image mode VDisk/cache-enabled
 Test 3: SVC in the path and a 64 GB VDisk during a migration
 Test 4: SVC in the path and a 64 GB striped VDisk
 Test 5: SVC node failure
The overview shown in Figure 2-4 on page 33 does not provide any absolute numbers or
show the best performance that you are ever likely to get. The test sequence that we have
chosen shows the normal introduction of an SVC cluster in a client environment, going from a
native attached storage environment to a virtualized storage attachment environment.
Figure 2-4 on page 33 shows the total data rate in MBps, while the 64 GB disk was managed
by the SVC (tests 2, 3, 4, and 5). Test 3 and test 4 show a spike at the beginning of each test.
By introducing the SVC in the data path, we introduced a caching appliance. Therefore, host
I/O no longer goes directly to the subsystem; it is first cached and then flushed down to the
subsystem.
Figure 2-4 SVC node total data rate
During test 5, we disabled all of the ports for node 1 on the switches. Afterward, but still
during the test, we enabled the switch ports again. SVC node 1 joined the cluster with a
cleared cache, and therefore, you see the spike at the end of the test.
In this section, we show you the value of the SVC cluster in our environment. For this
purpose, we only compare the direct-attached storage with a striped VDisk (test 1 and test 4).
Figure 2-5 on page 34 shows the values for the total traffic: the read MBps and the write
MBps. Similar to the I/O rate, we saw a 12% improvement for the I/O traffic.
Figure 2-5 Native MBps compared to SVC-attached storage
For both parameters shown in Figure 2-5, the I/O rate and the MBps, we saw a performance
improvement by using the SVC.
2.4 Cluster upgrade
The SVC cluster is designed to perform a concurrent code update. Although it is a concurrent
code update for the SVC, it is disruptive to upgrade certain other parts in a client
environment, such as updating the multipathing driver. Before applying the SVC code update,
the administrator needs to review the following Web page to ensure the compatibility between
the SVC code and the SVC Console GUI. The SAN Volume Controller and SVC Console GUI
Compatibility Web site is:
http://www-1.ibm.com/support/docview.wss?rs=591&uid=ssg1S1002888
Furthermore, certain concurrent upgrade paths are only available through an intermediate
level. Refer to the following Web page for more information, SAN Volume Controller
Concurrent Compatibility and Code Cross-Reference:
http://www-1.ibm.com/support/docview.wss?rs=591&uid=ssg1S1001707
Even though the SVC code update is concurrent, we recommend that you perform several
steps in advance:
 Before applying a code update, ensure that there are no open problems in your SVC,
SAN, or storage subsystems. Use the “Run maintenance procedure” on the SVC and fix
the open problems first. For more information, refer to 14.3.2, “Solving SVC problems” on
page 284.
 It is also extremely important to check your host dual pathing. Make sure that, from the
host’s point of view, all paths are available. Missing paths can lead to I/O problems
during the SVC code update. Refer to Chapter 9, “Hosts” on page 175 for more
information about hosts; a sample path check follows this list.
 It is wise to schedule a time for the SVC code update during low I/O activity.
 Upgrade the Master Console GUI first.
 Allow the SVC code update to finish before making any other changes in your
environment.
 Allow at least one hour to perform the code update for a single SVC I/O Group and 30
minutes for each additional I/O Group. In a worst case scenario, an update can take up to
two hours, which implies that the SVC code update is also updating the BIOS, the service
processor (SP), and the SVC service card.
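For hosts that use the IBM Subsystem Device Driver (SDD), a quick way to confirm the path status from the host command line before starting the code update is shown in the following sketch; the equivalent check depends on the multipathing driver that is actually in use:
datapath query adapter
datapath query device
Every path is expected to be available (open and normal) before you start the code update; investigate and fix any missing or closed paths first.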
Important: The Concurrent Code Upgrade (CCU) might appear to stop for a long time (up
to an hour) if it is upgrading a low level BIOS. Never power off during a CCU unless you
have been instructed to power off by IBM service personnel. If the upgrade encounters a
problem and fails, the upgrade will be backed out.
New features are not available until all nodes in the cluster are at the same level. Features
that depend on a remote cluster (Metro Mirror or Global Mirror) might not be available until
the remote cluster is at the same level, too.
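During and after the code update, you can monitor progress and confirm the cluster code level from the CLI. The following commands are a sketch; the exact output and the availability of the lssoftwareupgradestatus view depend on your code level:
IBM_2145:itsosvccl1:admin>svcinfo lssoftwareupgradestatus
IBM_2145:itsosvccl1:admin>svcinfo lscluster itsosvccl1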
Chapter 3. SVC Console
In this chapter, we describe important areas of the IBM System Storage SAN Volume
Controller (SVC) Console. The SVC Console is a Graphical User Interface (GUI) application
installed on a server running a server version of the Microsoft Windows operating system.
3.1 SVC Console installation
The SVC Console is mandatory for installing and managing an SVC cluster. Currently, the
SVC Console is available as a software only solution, or the SVC Console can be a combined
software and hardware solution that can be ordered together with an SVC cluster. Common
to both options is that they communicate with the SVC cluster using an IP/Ethernet network
connection and therefore require an IP address and an Ethernet port that can communicate
with the SVC cluster. The SVC Console also serves as the SVC data source for use with IBM
TotalStorage Productivity Center (TPC).
3.1.1 Software only installation option
The SVC Console software is available for installation on a client-provided server running one
of the following operating systems:
 Microsoft Windows 2000 Server
 Microsoft Windows Server® 2003 Standard Edition
 Microsoft Windows Server 2003 Enterprise Edition
Note: Only x86 (32-bit) versions of these operating systems are supported. Do not use x64
(64-bit) variants.
You access the SVC Console application by using a Web browser. Therefore, ensure that
Microsoft Windows Internet Explorer® Version 7.0 (or Version 6.1 with Service Pack 1, for
Microsoft Windows 2000 Server) is installed on the server.
Secure Shell (SSH) connectivity with the SVC cluster uses the PuTTY SSH suite. The PuTTY
installation package comes bundled with the SVC Console software, and you must install it
prior to installing the SVC Console software.
Requirements: If you want to use Internet Protocol (IP) Version 6 (IPv6) communication
with your SVC cluster, you must run Windows 2003 Server and your PuTTY version must
be at least 0.60.
While not a requirement, we recommend that adequate antivirus software is installed on the
server together with software for monitoring the server health status. Whenever service packs
or critical updates for the operating system become available, we recommend that they are
applied.
To successfully install and run the SVC Console software, the server must have adequate
system performance. We suggest a minimum hardware configuration of:
 Single Intel® Xeon dual-core processor, minimum 2.1 GHz (or equivalent)
 4 GB DDR memory
 70 GB primary hard disk drive capacity using a disk mirror (for fault tolerance)
 100 Mbps Ethernet connection
To minimize the risk of conflicting applications, performance problems, and so on, we
recommend that the server is not assigned any other roles except for serving as the SVC
Console server. We also do not recommend that you set up the server to be a member of any
Microsoft Windows Active Directory® domain.
3.1.2 Combined software and hardware installation option
If you choose to order an SVC Console server (feature code 2805-MC2) together with an
SVC cluster, you will receive the System Storage Productivity Center (SSPC). SSPC is an
integrated hardware and software solution that provides a single management console for
managing IBM Storage Area Network (SAN) Volume Controller, IBM DS8000, and other
components of your data storage infrastructure.
Note: The SSPC option replaces the dedicated Master Console server (feature code
4001), which is being discontinued. The Master Console is still supported and will run the
latest code levels of the SVC Console software.
The SSPC server has the following initial hardware configuration:
 1x quad-core Intel Xeon® processor E531, 1.60 GHz, 8 MB L2 cache
 4x 1 GB PC2-5300 ECC DDR2 Chipkill™ memory
 2x primary hard disk drives: 146 GB 15K RPM SAS drives, ServeRAID™ 8k RAID 1 array
 2x integrated 10/100/1000 Mbps Ethernet connections
 Microsoft Windows Server 2003 Enterprise Edition
If you plan to install and use TPC for Replication or plan to manage a large number of
components using the SSPC server, we recommend that you order the SSPC server with the
additional Performance Upgrade kit (feature code 1800). With this kit installed, both the
processor capacity and memory capacity are doubled compared to the initial configuration.
When using SSPC, the SVC Console software is already installed on the SSPC server, as
well as PuTTY. For a detailed guide to the SSPC, we recommend that you refer to the IBM
System Storage Productivity Center Software Installation and User’s Guide, SC23-8823.
Note: If you want to use IPv6 communication with your SSPC and SVC cluster, ensure
that your PuTTY version is at least 0.60.
The SSPC server does not ship with antivirus software installed. We recommend that you
install antivirus software. Also, you need to apply service packs and critical updates to the
operating system when they become available.
Do not use the SSPC server for any roles except roles related to SSPC, and we do not
recommend joining the server to a Microsoft Windows Active Directory domain.
3.1.3 SVC cluster software and SVC Console compatibility
In order to allow seamless operation between the SVC cluster software and the SVC Console
software, it is of paramount importance that software levels match between the two. Before
adding an SVC cluster to an SVC Console, or before upgrading the SVC cluster software on
an existing SVC cluster, you must ensure that the software levels are compatible.
To check the current SVC cluster software level, connect to the SVC cluster using SSH and
then issue the svcinfo lscluster command, which is shown in Example 3-1 on page 40.
Example 3-1 Checking the SVC cluster software version (lines removed for clarity)
IBM_2145:itsosvccl1:admin>svcinfo lscluster itsosvccl1
cluster_IP_address 9.43.86.117
cluster_service_IP_address 9.43.86.118
code_level 4.3.0.0 (build 8.16.0806230000)
IBM_2145:itsosvccl1:admin>
You can locate the SVC Console version on the Welcome window (Figure 3-1), which
displays after you log in to the SVC Console.
Figure 3-1 Display SVC Console version
After you obtain the software versions, locate the appropriate SVC Console version. For an
overview of SAN Volume Controller and SVC Console compatibility, refer to the Web site,
which is shown in Figure 3-2.
http://www-1.ibm.com/support/docview.wss?rs=591&uid=ssg1S1002888
Figure 3-2 SVC cluster software to SVC Console compatibility matrix
3.1.4 IP connectivity considerations
Management of an SVC cluster relies on IP communication, including both access to the SVC
command line interface (CLI) and communication between the SVC Console GUI application
and the SVC cluster. Error reporting and performance data from the SVC cluster are also
transferred using IP communications through services, such as e-mail notification and Simple
Network Management Protocol (SNMP) traps.
Note: The SVC cluster state information is exchanged between nodes through the node
Fibre Channel interface. Thus, if the IP/Ethernet network connectivity fails, the SVC cluster
will remain fully operational. Only management is disrupted.
The SVC cluster supports both IP Version 4 (IPv4) and 6 (IPv6) connectivity and attaches to
the physical network infrastructure using one 10/100 Mbps Ethernet connection per node. All
nodes in an SVC cluster share the same two IP addresses (cluster address and service IP
address). The cluster IP address dynamically follows the current config node, whereas the
service IP address only becomes active when a node is put into service mode using the front
panel. At that point, the service IP address becomes active for the node entering service
mode, and it remains active until service mode is ended.
It is imperative that all node Ethernet interfaces can access the IP networks where the SVC
Console and other management stations reside, because the IP addresses for an SVC
cluster are not statically assigned to any specific node in the SVC cluster. While everything
will work with only the current config node having the correct access, access to the SVC
cluster might be disrupted if the config node role switches to another node in the SVC cluster.
Therefore, in order to allow seamless operations in failover and other state changing
situations, observe the following IP/Ethernet recommendations:
 All nodes in an SVC cluster must be connected to the same layer 2 Ethernet segment. If
Virtual LAN (VLAN) technology is implemented, all nodes must be on the same VLAN.
 If an IP gateway is configured for the SVC cluster, it must not filter traffic based on
Ethernet Media Access Control (MAC) addresses.
 There can be no active packet filters or shapers for traffic to and from the SVC cluster.
 No static (sticky) Address Resolution Protocol (ARP) caching can be active for the IP
gateway connecting to the SVC cluster. When the SVC cluster IP addresses shift from one
node to another node, the corresponding ARP entry will need to be updated with the new
MAC address information.
3.2 Using the SVC Console
The SVC Console is used as a platform for configuration, management, and service activity
on the SAN Volume Controller. You can obtain basic instructions for setting up and using the
SVC Console in your environment in the IBM System Storage SAN Volume Controller V4.3.0
Installation and Configuration Guide, S7002156, and IBM System Storage SAN Volume Controller
V4.3, SG24-6423-06.
3.2.1 SSH connection limitations
To limit resource consumption for management, each SVC cluster can host only a limited
number of Secure Shell (SSH) connections. The SVC cluster supports no more than 10
concurrent SSH connections per user ID for a maximum of 20 concurrent connections per
cluster (10 for the admin user and 10 for the service user). If this number is exceeded, the
SVC cluster will not accept any additional incoming SSH connections. Included in this count
are all of the SSH connections, such as interactive sessions, Common Information Model
Object Manager (CIMOM) applications (such as the SVC Console), and host automation
tools, such as HACMP™-XD.
There is also a limit on the number of SSH connections that can be opened per second. The
current limitation is 15 SSH connections per second.
Note: We recommend that you close SSH connections when they are no longer required.
Use the exit command to terminate an interactive SSH session.
If the maximum connection limit is reached and you cannot determine which clients have
open connections to the cluster, the SVC cluster code has incorporated options to help you
recover from this state.
A cluster error code (2500) is logged by the SVC cluster when the maximum connection limit
is reached. If there is no other error on the SVC cluster with a higher priority than this error,
message 2500 will be displayed on the SVC cluster front panel. Figure 3-3 shows this error
message in the error log.
Figure 3-3 Error code 2500 “SSH Session limit reached”
If you get this error:
1. If you still have access to an SVC Console GUI session for the cluster, you can use the
Service and Maintenance menu to start the “Run Maintenance Procedures” task to fix this
error. This option allows you to reset all active connections, which terminates all SSH
sessions and clears the login count.
2. If you have no access to the SVC cluster using the SVC Console GUI, there is now a
direct maintenance link in the drop-down menu of the View cluster panel of the SVC
Console. Using this link, you can get directly to the Service and Maintenance procedures.
The following panels guide you to access and use this maintenance feature. Figure 3-4 on
page 43 shows you how to launch this procedure.
Figure 3-4 Launch Maintenance Procedures from the panel to view the cluster
When analyzing the error code 2500, a window similar to the example in Figure 3-5 on
page 44 will appear. From this window, you can identify which user has reached the 10
concurrent connections limit, which in this case is the admin user.
Note that the service user has only logged in four times and therefore still has six connections
left. From this window, the originating IP address of a given SSH connection is also
displayed, which can be useful to determine which user opened the connection.
Remember that if the connection originated from a different IP subnet than where the SVC
cluster resides, it might be a gateway device IP address that is displayed, which is the case
with the IP address of 9.146.185.99 in Figure 3-5 on page 44. If you are unable to close any
SSH connections from the originator side, you can force the closure of all SSH connections
from the maintenance procedure panel by clicking Close All SSH Connections.
Figure 3-5 SSH connection limit exceeded
You can read more information about the current SSH limitations and how to fix related
problems at:
http://www-1.ibm.com/support/docview.wss?rs=591&context=STCFKTH&context=STCFKTW&dc=DB500&uid=ssg1S1002896&loc=en_US&cs=utf-8&lang=en
3.2.2 Managing multiple SVC clusters using a single SVC Console
If you have more than a single SVC cluster in your environment, a single SVC Console
instance can be used to manage multiple SVC clusters. Simply add the additional SVC
clusters to the SVC Console by using the Clusters pane of the SVC Console GUI. Figure 3-6
on page 45 shows how to add an additional SVC cluster to an SVC Console, which already
manages three SVC clusters.
Figure 3-6 Adding an additional SVC cluster to SVC Console
Important: All SVC clusters to be managed by a given SVC Console must have the
matching public key file installed, because an SVC Console instance can only load a single
SSH certificate (the icat.ppk private SSH key) at a time.
A single SVC Console can manage a maximum of four SVC clusters. As more testing is
done, and more powerful hardware and software become available, this limit might change.
For current information, contact your IBM marketing representative or refer to the SVC
support site on the Internet:
http://www.ibm.com/storage/support/2145
One challenge of using an SVC Console to manage multiple SVC clusters is that if one
cluster is not operational (for example, the cluster shows a “No Contact” state), ease of
access to the other clusters is affected by a two-minute timeout during the launch of SVC
menus while the GUI checks the status of all managed clusters. This timeout appears while
the SVC Console GUI is trying to access the “missing” SVC cluster.
3.2.3 Managing an SVC cluster using multiple SVC Consoles
In certain environments, it is important to have redundant management tools for the storage
infrastructure, which you can have with the SVC Console.
Note: The SVC Console is the management tool for the SVC cluster. Even if the SVC
Console fails, the SVC cluster still remains operational.
The advantages of using more than one SVC Console include:
 Redundancy: If one SVC Console fails, you can use another SVC Console to continue
managing the SVC clusters.
 Manageability from multiple locations: If you have two or more physical locations with
SVC clusters installed, have an SVC Console in each location to allow you to manage the
local clusters even if connectivity to the other sites is lost. It is a best practice to have an
SVC Console installed per physical location with an SVC cluster.
 Managing multiple SVC cluster code level versions: For certain environments, it might be
necessary to have multiple versions of the SVC Console GUI application running,
because multiple versions of the SVC cluster code are in use.
SSH connection limitations
The SSH connection limitation of a maximum of 10 SSH connections per user ID applies to
all SVC Consoles. Each SVC Console uses one SSH connection for each GUI session that
is launched.
3.2.4 SSH key management
It is extremely important that the SSH key pairs are managed properly, because management
communication with an SVC cluster relies on key-based SSH communications. Lost keys can
lead to situations where an SVC cluster cannot be managed.
PuTTYgen is used for generating the SSH key pairs. A PuTTY-generated SSH key pair is
required to successfully install an SVC cluster. This specific key pair allows the SVC Console
software to communicate with the SVC cluster using the plink.exe PuTTY component. The
private key part must be named icat.ppk, and icat.ppk must exist in the C:\Program
Files\IBM\svcconsole\cimom directory of the SVC Console server. The public key part is
uploaded to the SVC cluster during the initial setup.
As more users are added to an SVC cluster, more key pairs become active, because user
separation on the SVC cluster is performed by using different SSH key pairs. After uploading
the public key to the SVC cluster, there is no restriction on naming or on where to store the
private key for the key pairs (other than the SVC Console key pair, which must be named
icat.ppk). However, to increase manageability, we recommend the following actions for SSH
key pairs that are used with an SVC cluster:
 Store the public key of the SVC Console key pair as icat.pub in the same directory as the
icat.ppk key, which is C:\Program Files\IBM\svcconsole\cimom.
 Always store the public part and private part of an SSH key pair together.
 Name the public key and private key accordingly to allow easy matching.
For more information about SSH keys and how to use them to access the SVC cluster
through the SVC Console GUI or the SVC CLI, refer to IBM System Storage SAN Volume
Controller V4.3, SG24-6423-06.
Important: It is essential to continuously maintain a valid backup of all of the SSH key
pairs for an SVC cluster. You must store this backup in a safe and known location
(definitely not on the SVC Console server), and the backup must be validated for integrity
on a regular basis.
3.2.5 Administration roles
You can use role-based security to restrict the administrative abilities of a user at both an
SVC Console level and an SVC cluster level.
When you use role-based security at the SVC Console, the view that is presented when
opening a GUI session for an SVC cluster is adjusted to reflect the user role. For instance, a
user with the Monitor role (Figure 3-7) cannot create a new MDisk group, but a user with the
Administrator role (Figure 3-8 on page 48) can create a new MDisk group.
Figure 3-7 MDisk group actions available to SVC Console user with Monitor role
Figure 3-8 MDisk group actions available to SVC Console user with Administrator role
Implementing role-based security at the SVC cluster level implies that different key pairs are
used for the SSH communication. When establishing an SSH session with the SVC cluster,
available SVC CLI commands will be determined by the role that is associated with the SSH
key that established the session.
When implementing role-based security at the SVC cluster level, it is important to understand
that when using SSH key pairs with no associated password, anyone with access to the
correct key can gain administrative rights on the SVC cluster. If a user with restricted rights
can access the private key part of an SSH key pair that has administrative rights on the SVC
cluster (such as the icat.ppk used by the SVC Console), a user can elevate that user’s rights.
To prevent this situation, it is important that users can only access the SSH keys to which
they are entitled. Furthermore, PuTTYgen supports associating a password with generated
SSH key pairs at creation time. In conjunction with access control to SSH keys, associating a
password with user-specific SSH key pairs is the recommended approach.
Note: The SSH key pair used with the SVC Console software cannot have a password
associated with it.
For more information about role-based security on the SVC and the commands that each
user role can use, refer to IBM System Storage SAN Volume Controller V4.3, SG24-6423-06, and IBM
System Storage SAN Volume Controller Command-Line Interface User’s Guide, S7002157.
Important: Any user with access to the file system on the SVC Console server (in general,
all users who can interactively log in to the operating system) can retrieve the icat.ppk SSH
key and thereby gain administrative access to the SVC cluster. To prevent this general
access, we recommend that the SVC Console GUI is accessed through a Web browser
from another host. Only allow experienced Microsoft Windows Server professionals to
implement additional file level access control in the operating system.
3.2.6 Audit logging
Audit logging is a useful and important tool for administrators. At a certain point in time, the
administrators might have to prove or validate actions that they have performed on the hosts,
storage subsystems, SAN switches, and, in particular, the SVC. An audit log for the SVC
keeps track of action commands that are issued through a Secure Shell (SSH) session or
through the SVC Console.
The SVC audit logging facility is always turned on.
To create a new audit log file, you must use the CLI to issue the command as shown in
Example 3-2.
Example 3-2 Create a new audit log file
IBM_2145:ITSOCL1:admin>svctask dumpauditlog
IBM_2145:ITSOCL1:admin>
The audit log file name is generated automatically in the following format:
auditlog_<firstseq>_<lastseq>_<timestamp>_<clusterid>
where
<firstseq> is the audit log sequence number of the first entry in the log
<lastseq> is the audit sequence number of the last entry in the log
<timestamp> is the time stamp of the last entry in the audit log being dumped
<clusterid> is the cluster ID at the time the dump was created
Note: The audit log file names cannot be changed.
The audit log file that is created can be retrieved using either the SVC Console GUI or by
using Secure Copy Protocol (SCP).
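For example, a dumped audit log can be copied to the SVC Console server with the PuTTY pscp utility. The following sketch assumes that the dump is written to the /dumps/audit directory on the configuration node and that the local target directory exists; substitute the actual file name that svctask dumpauditlog created:
C:\Program Files\PuTTY>pscp -i "C:\Program Files\IBM\svcconsole\cimom\icat.ppk" admin@9.43.86.117:/dumps/audit/auditlog_<firstseq>_<lastseq>_<timestamp>_<clusterid> C:\svcaudit\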
The audit log provides the following information:
 The identity of the user who issued the action command
 The name of the action command
 The time stamp of when the action command was issued by the configuration node
 The parameters that were issued with the action command
Note: Certain commands are not logged in the audit log dump.
This list shows the commands that are not documented in the audit log:
 svctask dumpconfig
 svctask cpdumps
 svctask cleardumps
 svctask finderr
 svctask dumperrlog
 svctask dumpinternallog
 svcservicetask dumperrlog
 svcservicetask finderr
The audit log also tracks commands that failed.
We recommend that audit log data is collected on a regular basis and stored in a safe
location. This procedure must take into account any regulations regarding information
systems auditing.
3.2.7 IBM Support remote access to the SVC Console
The preferred method of IBM Support to remotely connect to an SVC cluster or the SVC
Console is through the use of Assist on Site (AOS). The client is required to provide a
workstation that is accessible from the outside and that can also access the SVC IP/Ethernet
network. AOS provides multiple levels of access and interactions, which can be selected by
the client, including:
 Chat
 Shared screen view
 Shared control
 The capability for the client to end the session at any time
 The option for the client to log the session locally
The client can allow IBM Support to control the AOS workstation while the client watches, or
alternatively, the client can follow directions from IBM Support, which observes the client’s
actions.
For further information regarding AOS, go to:
http://www-1.ibm.com/support/assistonsite/
3.2.8 SVC Console to SVC cluster connection problems
After adding a new SVC cluster to the SVC Console GUI, you might experience a “No
Contact” availability status for the SVC cluster as shown in Figure 3-9 on page 51.
Figure 3-9 Cluster with availability status of No Contact
There are two possible problems that might cause an SVC cluster status of “No Contact.”
The SVC Console code level does not match the SVC cluster code level (for example, SVC
Console code V2.1.0.x with SVC cluster code 4.2.0). To fix this problem, install the
corresponding SVC Console code level, as described in 3.1.3, “SVC cluster software and
SVC Console compatibility” on page 39.
The CIMOM cannot execute the plink.exe command (PuTTY component). To test the
connection, open a command prompt (cmd.exe) and go to the PuTTY installation directory.
Common installation directories are C:\Support Utils\Putty and
C:\Program Files\Putty. Execute the following command from this directory:
plink.exe admin@clusterIP -ssh -2 -i "C:\Program Files\IBM\svcconsole\cimom\icat.ppk"
This command is shown in Example 3-3.
Example 3-3 Command execution
C:\Program Files\PuTTY>plink.exe admin@9.43.86.117 -ssh -2 -i "c:\Program
files\IBM\svcconsole\cimom\icat.ppk"
Using username "admin".
Last login: Sun Jul 27 11:18:48 2008 from 9.43.86.115
IBM_2145:itsosvccl1:admin>
In Example 3-3, we executed the command, and the connection was established. If the
command fails, there are a few things to check:
 The location of the PuTTY executable does not match the SSHCLI path in the
setupcmdline.bat used when installing the SVC Console software.
 The icat.ppk key needs to be in the C:\Program Files\IBM\svcconsole\cimom directory.
 The icat.ppk file found in the C:\Program Files\IBM\svcconsole\cimom directory needs to
match the public key uploaded to the SVC cluster.
 The CIMOM can execute the plink.exe command, but the SVC cluster does not exist, it is
offline, or the network is down. Check if the SVC cluster is up and running (check the front
panel of the SVC nodes and use the arrow keys on the node to determine if the Ethernet
port on the configuration node is active). Also, check that the IP address of the cluster
matches the IP address that you have entered in the SVC Console. Then, check the
IP/Ethernet settings on the SVC Console server and issue a ping to the SVC cluster IP
address. If the ping command fails, check your IP/Ethernet network.
If the SVC cluster still reports “No Contact” after you have performed all of these actions on
the SVC cluster, contact IBM Support.
3.2.9 Managing IDs and passwords
There are a number of user IDs and passwords needed for managing the SVC Console, the
SVC cluster, the SVC CLI (SSH), TotalStorage Productivity Center (TPC) CIMOM, and SVC
service mode. It is essential that all of these credentials are carefully tracked and stored in a
safe and known location.
The important user IDs and passwords are:
 SVC Console: Login and password
 SVC Console server: Login and password to operating system
 SVC Cluster: Login and password
 SVC Service mode: Login and password
 SVC CLI (SSH): Private and public key
 TPC CIMOM: User and password (same as SVC Console)
Failing to remember a user ID, a password, or an SSH key can lead to not being able to
manage parts of an SVC installation. Certain user IDs, passwords, or SSH keys can be
recovered or changed, but several of them are fixed and cannot be recovered:
 SVC Console server: You cannot access the SVC Console server. Password recovery
depends on the operating system. The administrator will need to recover the lost or
forgotten user ID and password.
 SVC Cluster: You cannot access the cluster through the SVC Console without this
password. Allow the password reset option during the cluster creation. If the password
reset is not enabled, issue the svctask setpwdreset SVC CLI command to view and
change the status of the password reset feature for the SAN Volume Controller front
panel. Refer to Example 3-4 on page 53.
 SVC Service mode: You cannot access the SVC cluster when it is in service mode. Reset
the password in the SVC Console GUI using the Maintain Cluster Passwords feature.
 SVC CLI (PuTTY): You cannot access the SVC cluster through the CLI. Create a new
private and public key pair.
 SVC Console: You cannot access the SVC cluster through the SVC Console GUI.
Remove and reinstall the SVC Console GUI. Use the default user and password and
change the user ID and password during the first logon.
 TPC CIMOM: Same user and password as the SVC Console.
When creating a cluster, be sure to select the option Allow password reset from front
panel as shown in Figure 3-10 on page 53. You see this option during the initial cluster
creation. For additional information, refer to IBM System Storage SAN Volume Controller V4.3,
SG24-6423-06.
Figure 3-10 Select the password reset policy
This option allows access to the cluster if the admin password is lost. If the password reset
feature was not enabled during the cluster creation, use the svctask setpwdreset -enable
CLI command to enable it. Example 3-4 shows how to determine the current status (a zero
indicates that the password reset feature is disabled) and afterwards how to enable it (a one
indicates that the password reset feature is enabled).
Example 3-4 Enable password reset by using CLI
IBM_2145:itsosvccl1:admin>svctask setpwdreset -show
Password status: [0]
IBM_2145:itsosvccl1:admin>svctask setpwdreset -enable
IBM_2145:itsosvccl1:admin>svctask setpwdreset -show
Password status: [1]
3.2.10 Saving the SVC configuration
The SVC configuration is backed up automatically every day at 01:00 a.m., based on the cluster’s time zone.
There is no way to change the backup schedule on the SVC. In addition to the automated
configuration backup, it is possible to create a new backup by user intervention. You can
either run the backup command on the SVC CLI or issue a configuration backup from the
SVC Console GUI.
The SVC cluster maintains two copies of the configuration file:
 svc.config.backup.xml
 svc.config.backup.bak
These backup files contain information about the current SVC cluster configuration, such as:
 Code level
 Name and IP address
 MDisks
 Managed Disk Groups (MDGs)
 VDisks
 Hosts
 Storage controllers
If the SVC cluster has experienced a major problem and IBM Support has to rebuild the
configuration structure, the svc.config.backup.xml file is necessary.
Note: The configuration backup does not include any data from any MDisks. The
configuration backup only saves SVC cluster configuration data.
Before making major changes on your SVC cluster, such as SVC cluster code upgrades,
storage controller changes, or SAN changes, we recommend that you create a new backup
of the SVC configuration.
Creating a new configuration backup using the SVC CLI
To create a configuration backup file from the SVC CLI, open an SSH connection and run the
command svcconfig backup, as shown in Example 3-5.
Example 3-5 Running the SVC configuration backup (lines removed for clarity)
IBM_2145:itsosvccl1:admin>svcconfig backup
......
CMMVC6130W Inter-cluster partnership fully_configured will not be restored
..
CMMVC6112W controller controller0 has a default name
.
.
.
CMMVC6112W mdisk mdisk1 has a default name
................
CMMVC6136W No SSH key file svc.config.admin.admin.key
CMMVC6136W No SSH key file svc.config.test.admin.key
......................................
CMMVC6155I SVCCONFIG processing completed successfully
After the backup file is created, it can be retrieved from the SVC cluster by using Secure
Copy (SCP).
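The following sketch retrieves the file with the PuTTY pscp utility from the SVC Console server; it assumes that the backup file is written to the /tmp directory on the configuration node and that the local target directory exists:
C:\Program Files\PuTTY>pscp -i "C:\Program Files\IBM\svcconsole\cimom\icat.ppk" admin@9.43.86.117:/tmp/svc.config.backup.xml C:\svcbackup\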
Creating a configuration backup using the SVC Console GUI
To create a configuration backup file from the SVC Console GUI, you must open the Service
and Maintenance panel and run the Backup Configuration task as shown in Figure 3-11.
Figure 3-11 Backing up the SVC configuration
As in the case with the SVC CLI, a new svc.config.backup.xml_Node-1 file will appear in the
List Dumps section.
Automated configuration backup
We recommend that you periodically copy the configuration backup files off of the SVC
cluster and store them in a safe location. There is a guide that explains how to set up a
manual or scheduled task for the SVC Console server at:
http://www-1.ibm.com/support/docview.wss?rs=591&context=STPVGU&context=STPVFV&q1=pageant&uid=ssg1S1002175&loc=en_US&cs=utf-8&lang=en
3.2.11 Restoring the SVC cluster configuration
Do not attempt to restore the SVC configuration on your own. Call IBM Support and have
them help you restore the configuration. Make sure that all other components are working as
expected. For more information about common errors, refer to Chapter 14, “Troubleshooting
and diagnostics” on page 269.
If you are unsure about what to do, call IBM Support and let them help you collect the
necessary data.
Chapter 4. Storage controller
In this chapter, we discuss the following topics:
 Controller affinity and preferred path
 Pathing considerations for EMC Symmetrix/DMX and HDS
 Logical unit number (LUN) ID to MDisk translation
 MDisk to VDisk mapping
 Mapping physical logical block addresses (LBAs) to extents
 Media error logging
 Selecting array and cache parameters
 Considerations for controller configuration
 LUN masking
 Worldwide port name (WWPN) to physical port translation
 Using TotalStorage Productivity Center (TPC) to identify storage controller boundaries
 Using TPC to measure storage controller performance
4.1 Controller affinity and preferred path
In this section, we describe the architectural differences between common storage
subsystems in terms of controller “affinity” (also referred to as preferred controller) and
“preferred path.” In this context, affinity refers to the controller in a dual-controller subsystem
that has been assigned access to the back-end storage for a specific LUN under nominal
conditions (that is to say, both controllers are active). Preferred path refers to the host side
connections that are physically connected to the controller that has the assigned affinity for
the corresponding LUN being accessed.
All storage subsystems that incorporate a dual-controller architecture for hardware
redundancy employ the concept of “affinity.” For example, if a subsystem has 100 LUNs, 50
of them have an affinity to controller 0, and 50 of them have an affinity to controller 1. This
means that only one controller is serving any specific LUN at any specific instance in time;
however, the aggregate workload for all LUNs is evenly spread across both controllers. This
relationship exists during normal operation; however, each controller is capable of controlling
all 100 LUNs in the event of a controller failure.
For the DS4000™ and DS6000™, preferred path is important, because Fibre Channel cards
are integrated into the controller. This architecture allows “dynamic” multipathing and
“active/standby” pathing through Fibre Channel cards that are attached to the same controller
(the SVC does not support dynamic multipathing) and an alternate set of paths, which are
configured to the other controller that will be used if the corresponding controller fails.
For example, if each controller is attached to hosts through two Fibre Channel ports, 50 LUNs
will use the two Fibre Channel ports in controller 0, and 50 LUNs will use the two Fibre
Channel ports in controller 1. If either controller fails, the multipathing driver will fail the 50
LUNs associated with the failed controller over to the other controller and all 100 LUNs will
use the two ports in the remaining controller. The DS4000 differs from the DS6000 and
DS8000, because it has the capability to transfer ownership of LUNs at the LUN level as
opposed to the controller level.
For the DS8000 and the Enterprise Storage Server® (ESS), the concept of preferred path is
not used, because Fibre Channel cards are outboard of the controllers, and therefore, all
Fibre Channel ports are available to access all LUNs regardless of cluster affinity. While
cluster affinity still exists, the network between the outboard Fibre Channel ports and the
controllers performs the appropriate controller “routing” as opposed to the DS4000 and
DS6000 where controller routing is performed by the multipathing driver in the host, such as
with IBM Subsystem Device Driver (SDD) and Redundant Disk Array Controller (RDAC).
4.1.1 ADT for DS4000
The DS4000 has a feature called Auto Logical Drive Transfer (ADT). This feature allows
logical drive level failover as opposed to controller level failover. When you enable this option,
the DS4000 moves LUN ownership between controllers according to the path used by the
host.
For the SVC, the ADT feature is enabled by default when you select the “IBM TS SAN VCE”
host type when you configure the DS4000.
Note: It is important that you select the “IBM TS SAN VCE” host type when configuring the
DS4000 for SVC attachment in order to allow the SVC to properly manage the back-end
paths. If the host type is incorrect, SVC will report a 1625 (“incorrect controller
configuration”) error.
Refer to Chapter 14, “Troubleshooting and diagnostics” on page 269 for information regarding
checking the back-end paths to storage controllers.
4.1.2 Ensuring path balance prior to MDisk discovery
It is important that LUNs are properly balanced across storage controllers prior to performing
MDisk discovery. Failing to properly balance LUNs across storage controllers in advance can
result in a suboptimal pathing configuration to the back-end disks, which can cause a
performance degradation. Ensure that storage subsystems have all controllers online and
that all LUNs have been distributed to their preferred controller (local affinity) prior to
performing MDisk discovery. Pathing can always be rebalanced later, however, often not until
after lengthy problem isolation has taken place.
If you discover that the LUNs are not evenly distributed across the dual controllers in a
DS4000, you can dynamically change the LUN affinity. However, the SVC will move them
back to the original controller, and the DS4000 will generate an error indicating that the LUN
is no longer on its preferred controller. To correct this situation, you need to run the SVC
command svctask detectmdisk or use the GUI option “Discover MDisks.” SVC will query the
DS4000 again and access the LUNs through the new preferred controller configuration.
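For example, after redistributing the LUNs on the storage subsystem, a rediscovery and a quick check of the back-end controller can be done from the CLI (controller ID 0 is used here only as an illustration):
IBM_2145:itsosvccl1:admin>svctask detectmdisk
IBM_2145:itsosvccl1:admin>svcinfo lscontroller
IBM_2145:itsosvccl1:admin>svcinfo lscontroller 0
The detailed lscontroller view lists the controller ports, which helps you confirm that MDisk access is spread across them as expected.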
4.2 Pathing considerations for EMC Symmetrix/DMX and HDS
There are certain storage controller types that present a unique worldwide node name
(WWNN) and worldwide port name (WWPN) for each port. This can cause problems when
such controllers are attached to the SVC, because the SVC enforces a maximum of four
WWNNs per storage controller.
Because of this behavior, you must be sure to group the ports if you want to connect more
than four target ports to an SVC. Refer to the IBM System Storage SAN Volume Controller
Software Installation and Configuration Guide Version 4.3.0, SC23-6628-02, for instructions.
4.3 LUN ID to MDisk translation
The “Controller LUN Number” for MDisks is returned from the storage controllers in the
“Report LUNs Data.” The following sections show how to decode the LUN ID from the report
LUNs data for storage controllers ESS, DS6000, and DS8000.
4.3.1 ESS
The ESS uses 14 bits to represent the LUN ID, which the ESS Storage Specialist displays in
hexadecimal (that is, in the range 0x0000 to 0x3FFF). To convert this 14-bit LUN ID to the SVC
“Controller LUN Number”:
 Add 0x4000 to the LUN ID
 Append ‘00000000’
For example, LUN ID 1723 on an ESS corresponds to SVC controller LUN 572300000000.
4.3.2 DS6000 and DS8000
The DS6000 and DS8000 use 16 bits to represent the LUN ID, which decodes as:
40XX40YY0000, where XXYY is the 16-bit LUN ID
The LUN ID will only uniquely identify LUNs within the same storage controller. If multiple
storage devices are attached to the same SVC cluster, the LUN ID needs to be combined with
the WWNN attribute in order to uniquely identify LUNs within the SVC cluster. The SVC does
not contain an attribute to identify the storage controller serial number; however, the
Controller Name field can be used for this purpose and will simplify the LUN ID to MDisk
translation.
The Controller Name field is populated with a default value at the time that the storage
controller is initially configured to the SVC cluster. You must modify this field by using the SVC
console selections: Work with Managed Disk → Disk Storage Controller → Rename a
Disk Controller System.
Best Practice: Include the storage controller serial number in the naming convention for
the Controller Name field. For example, use DS8kABCDE for serial number 75-ABCDE.
Figure 4-1 shows LUN ID fields that are displayed from the DS8000 Storage Manager. LUN
ID 1105, for example, appears as 401140050000 in the Controller LUN Number field on the
SVC, which is shown in Figure 4-2 on page 61.
Figure 4-1 DS8K Storage Manager GUI
Figure 4-2 MDisk details
From the MDisk details panel in Figure 4-2, the Controller LUN Number field is
4011400500000000, which translates to LUN ID 0x1105 (represented in Hex).
We can also identify the storage controller from the Controller Name as DS8K7598654, which
had been manually assigned.
Note: The command line interface (CLI) references the Controller LUN Number as
ctrl_LUN_#.
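From the CLI, the same translation can be done by looking at the ctrl_LUN_# and controller_name fields of the MDisk. The following is a sketch, using MDisk ID 14 from Figure 4-2; the concise view lists the same fields for all MDisks:
IBM_2145:itsosvccl1:admin>svcinfo lsmdisk 14
IBM_2145:itsosvccl1:admin>svcinfo lsmdisk -delim :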
4.4 MDisk to VDisk mapping
There are instances where it is necessary to map an MDisk back to VDisks in order to
determine the potential impact that a failing MDisk might have on attached hosts.
You can use the svcinfo lsmdiskextent CLI command to obtain this information.
The lsmdiskextent output in Example 4-1 on page 62 shows a list of VDisk IDs that have
extents allocated to mdisk14 along with the number of extents. The GUI also has a drop-down
option to perform the same function for VDisks and MDisks.
Example 4-1 The lsmdiskextent command
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk14
id number_of_extents copy_id
5  16                0
3  16                0
6  16                0
8  13                1
9  23                0
8  25                0
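The reverse mapping, listing the MDisks that provide extents to a particular VDisk, is available with the svcinfo lsvdiskextent command; for example, for VDisk ID 5 from the output above:
IBM_2145:itsosvccl1:admin>svcinfo lsvdiskextent 5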
4.5 Mapping physical LBAs to VDisk extents
SVC 4.3 provides new functionality, which makes it easy to find the VDisk extent to which a
physical MDisk LBA maps and to find the physical MDisk LBA to which the VDisk extent
maps. There are a number of situations where this functionality might be useful:
 If a storage controller reports a medium error on a logical drive, but SVC has not yet taken
MDisks offline, you might want to establish which VDisks will be affected by the medium
error.
 When investigating application interaction with Space-Efficient VDisks (SEV), it can be
useful to find out whether a given VDisk LBA has been allocated or not. If an LBA has
been allocated when it has not intentionally been written to, it is possible that the
application is not designed to work well with SEV.
The two new commands are svcinfo lsmdisklba and svcinfo lsvdisklba. Their output
varies depending on the type of VDisk (for example, Space-Efficient as opposed to fully
allocated) and type of MDisk (for example, quorum as opposed to non-quorum). For full
details, refer to the SVC 4.3 Software Installation and Configuration Guide, SC23-6628-02.
4.5.1 Investigating a medium error using lsvdisklba
Assume that a medium error has been reported by the storage controller, at LBA 0x00172001
of MDisk 6. Example 4-2 shows the command that we use to discover which VDisk will be
affected by this error.
Example 4-2 Using lsvdisklba to investigate the effect of an MDisk medium error
IBM_2145:itsosvccl1:admin>svcinfo lsvdisklba -mdisk 6 -lba 0x00172001
vdisk_id vdisk_name copy_id type      LBA        vdisk_start vdisk_end  mdisk_start mdisk_end
0        diomede0   0       allocated 0x00102001 0x00100000  0x0010FFFF 0x00170000  0x0017FFFF
This output shows:
 This LBA maps to LBA 0x00102001 of VDisk 0.
 The LBA is within the extent that runs from 0x00100000 to 0x0010FFFF on the VDisk and
from 0x00170000 to 0x0017FFFF on the MDisk (so, the extent size of this Managed Disk
Group (MDG) is 32 MB).
So, if the host performs I/O to this LBA, the MDisk goes offline.
4.5.2 Investigating Space-Efficient VDisk allocation using lsmdisklba
After using an application to perform I/O to a Space-Efficient VDisk, you might want to check
which extents have been allocated real capacity. You can do this with the svcinfo lsmdisklba
command.
Example 4-3 shows the difference in output between an allocated and an unallocated part of
a VDisk.
Example 4-3 Using lsmdisklba to check whether an extent has been allocated
IBM_2145:itsosvccl1:admin>svcinfo lsmdisklba -vdisk 0 -lba 0x0
copy_id mdisk_id mdisk_name type      LBA        mdisk_start mdisk_end  vdisk_start vdisk_end
0       6        mdisk6     allocated 0x00050000 0x00050000  0x0005FFFF 0x00000000  0x0000FFFF

IBM_2145:itsosvccl1:admin>svcinfo lsmdisklba -vdisk 14 -lba 0x0
copy_id mdisk_id mdisk_name type        LBA mdisk_start mdisk_end vdisk_start vdisk_end
0                           unallocated                           0x00000000  0x0000003F
VDisk 0 is a fully allocated VDisk, so the MDisk LBA information is displayed as in
Example 4-2 on page 62.
VDisk 14 is a Space-Efficient VDisk to which the host has not yet performed any I/O; all of its
extents are unallocated. Therefore, the only information shown by lsmdisklba is that it is
unallocated and that this Space-Efficient grain starts at LBA 0x00 and ends at 0x3F (the grain
size is 32 KB).
4.6 Medium error logging
Medium errors on back-end MDisks can be encountered by Host I/O and by SVC background
functions, such as VDisk migration and FlashCopy. In this section, we describe the detailed
sense data for medium errors presented to the host and the SVC.
4.6.1 Host-encountered media errors
Data checks encountered on a VDisk from a host read request will return check condition
status with Key/Code/Qualifier = 030000.
Example 4-4 on page 64 shows an example of the detailed sense data returned to an AIX®
host for an unrecoverable medium error.
Example 4-4 Sense data
LABEL:           SC_DISK_ERR2
IDENTIFIER:      B6267342
Date/Time:       Thu Aug 5 10:49:35 2008
Sequence Number: 4334
Machine Id:      00C91D3B4C00
Node Id:         testnode
Class:           H
Type:            PERM
Resource Name:   hdisk34
Resource Class:  disk
Resource Type:   2145
Location:        U7879.001.DQDFLVP-P1-C1-T1-W5005076801401FEF-L4000000000000
VPD:
Manufacturer................IBM
Machine Type and Model......2145
ROS Level and ID............0000
Device Specific.(Z0)........0000043268101002
Device Specific.(Z1)........0200604
Serial Number...............60050768018100FF78000000000000F6
SENSE DATA
0A00 2800 001C ED00 0000 0104 0000 0000 0000 0000 0000 0000 0102 0000 F000 0300
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0800 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000
From the sense byte decode:
 Byte 2 = SCSI Op Code (28 = 10-Byte Read)
 Bytes 4 - 7 = LBA (Logical Block Address for VDisk)
 Byte 30 = Key
 Byte 40 = Code
 Byte 41 = Qualifier
4.6.2 SVC-encountered medium errors
Medium errors encountered by VDisk migration, FlashCopy, or VDisk Mirroring on the source
disk are logically transferred to the corresponding destination disk for a maximum of 32
medium errors. If the 32 medium error limit is reached, the associated copy operation will
terminate. Attempts to read destination error sites will result in medium errors just as though
attempts were made to read the source media site.
Data checks encountered by SVC background functions are reported in the SVC error log as
1320 errors. The detailed sense data for these errors indicates a check condition status with
Key/Code/Qualifier = 03110B.
Example 4-5 shows an example of an SVC error log entry for an unrecoverable media error.
Example 4-5 Error log entry
Error Log Entry 1965
Node Identifier       : Node7
Object Type           : mdisk
Object ID             : 48
Sequence Number       : 7073
Root Sequence Number  : 7073
First Error Timestamp : Thu Jul 24 17:44:13 2008
                      : Epoch + 1219599853
Last Error Timestamp  : Thu Jul 24 17:44:13 2008
                      : Epoch + 1219599853
Error Count           : 21
Error ID              : 10025 : A media error has occurred during I/O to a Managed Disk
Error Code            : 1320 : Disk I/O medium error
Status Flag           : FIXED
Type Flag             : TRANSIENT ERROR
40 11 40 02 00 00 00 00 00 00 00 02 28 00 58 59
6D 80 00 00 40 00 00 00 00 00 00 00 00 00 80 00
04 02 00 02 00 00 00 00 00 01 0A 00 00 80 00 00
02 03 11 0B 80 6D 59 58 00 00 00 00 08 00 C0 AA
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 0B 00 00 00 04 00 00 00 10 00 02 01
Where the sense byte decodes as:
 Byte 12 = SCSI Op Code (28 = 10-Byte Read)
 Bytes 14 - 17 = LBA (Logical Block Address for MDisk)
 Bytes 49 - 51 = Key/Code/Qualifier
Important: Attempting to locate medium errors on MDisks by scanning VDisks with host
applications, such as dd, or using SVC background functions, such as VDisk migrations
and FlashCopy, can cause the Managed Disk Group (MDG) to go offline as a result of
error handling behavior in current levels of SVC microcode. This behavior will change in
future levels of SVC microcode. Check with support prior to attempting to locate medium
errors by any of these means.
Notes:
 Medium errors encountered on VDisks will log error code 1320 “Disk I/O Medium
Error.”
 If more than 32 medium errors are found while data is being copied from one VDisk to
another VDisk, the copy operation will terminate and log error code 1610 “Too many
medium errors on Managed Disk.”
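As a simplified illustration of the limits described in these notes, the sketch below models a copy
that carries source medium errors over to the destination and gives up once the 32-error limit is
exceeded. It is not SVC code; only the limit and the 1320/1610 error codes are taken from the text.

# Toy model of the medium error limit described above; not SVC code.
MAX_MEDIUM_ERRORS = 32

def copy_with_medium_errors(read_grain, write_grain, grain_count):
    destination_error_sites = []                  # error sites carried over to the target
    for grain in range(grain_count):
        try:
            data = read_grain(grain)
        except IOError:                           # unreadable source grain (error 1320)
            destination_error_sites.append(grain)
            if len(destination_error_sites) > MAX_MEDIUM_ERRORS:
                raise RuntimeError("copy terminated: too many medium errors (error 1610)")
            continue                              # reading this grain on the target will also fail
        write_grain(grain, data)
    return destination_error_sites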
4.7 Selecting array and cache parameters
In this section, we describe the optimum array and cache parameters.
4.7.1 DS4000 array width
With Redundant Array of Independent Disks 5 (RAID 5) arrays, determining the number of
physical drives to put into an array always presents a compromise. Striping across a larger
number of drives can improve performance for transaction-based workloads. However,
striping can also have a negative effect on sequential workloads. A common mistake that
people make when selecting array width is the tendency to focus only on the capability of a
single array to perform various workloads. However, you must also consider in this decision
the aggregate throughput requirements of the entire storage server. A large number of
physical disks in an array can create a workload imbalance between the controllers, because
only one controller of the DS4000 actively accesses a specific array.
When selecting array width, you must also consider its effect on rebuild time and availability.
A larger number of disks in an array increases the rebuild time for disk failures, which can
have a negative effect on performance. Additionally, more disks in an array increase the
probability of a second drive failing within the same array before the rebuild of an initial drive
failure completes, which is an inherent exposure of the RAID 5 architecture.
Best practice: For the DS4000, we recommend array widths of 4+p and 8+p.
4.7.2 Segment size
With direct-attached hosts, considerations are often made to align device data partitions to
physical drive boundaries within the storage controller. For the SVC, this alignment is less
critical because of the caching that the SVC provides and because there is less variation in
the I/O profile that it uses to access back-end disks.
Because the maximum destage size for the SVC is 32 KB, it is impossible to achieve full
stride writes for random workloads. For the SVC, the only opportunity for full stride writes
occurs with large sequential workloads, and in that case, the larger the segment size is, the
better. Larger segment sizes can adversely affect random I/O, however. The SVC and
controller cache do a good job of hiding the RAID 5 write penalty for random I/O, and
therefore, larger segment sizes can be accommodated. The primary consideration for
selecting segment size is to ensure that a single host I/O will fit within a single segment to
prevent accessing multiple physical drives.
Testing has shown that the best compromise for handling all workloads is to use a segment
size of 256 KB.
Best practice: We recommend a segment size of 256 KB as the best compromise for all
workloads.
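To make the numbers concrete, the short calculation below uses only the values recommended in
this chapter (a 256 KB segment, an 8+p array, and the 32 KB SVC destage size); it is an
illustration, not a sizing tool.

KB = 1024
segment_size = 256 * KB        # recommended DS4000 segment size (per data drive)
data_drives = 8                # 8+p RAID 5 array
svc_max_destage = 32 * KB      # SVC cache track size (see 5.2.1, "Host I/O")

full_stride = segment_size * data_drives
print(f"Full stride write: {full_stride // KB} KB")                    # 2048 KB
print("Random SVC destage fits in one segment:",
      svc_max_destage <= segment_size)                                 # True: one drive accessed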
Cache block size
The DS4000 uses a 4 KB cache block size by default; however, it can be changed to 16 KB.
For the earlier models of DS4000 using the 2 Gb Fibre Channel (FC) adapters, the 4 KB block
size performs better for random I/O, and 16 KB performs better for sequential I/O. However,
because most workloads contain a mix of random and sequential I/O, the default values have
proven to be the best choice. For the higher performing DS4700 and DS4800, the 4 KB block
size advantage for random I/O has become harder to see. Because most client workloads
involve at least some sequential workload, the best overall choice for these models is the 16
KB block size.
Best practice:
 For earlier DS4000 models, leave the cache block size at the default value of 4 KB.
 For the DS4700 and DS4800 models, set the cache block size to 16 KB.
Table 4-1 is a summary of the recommended SVC and DS4000 values.
Table 4-1 Recommended SVC values
Models          Attribute              Value
SVC             Extent size (MB)       256
SVC             Managed mode           Striped
DS4000          Segment size (KB)      256
DS4000          Cache block size (KB)  4 KB (default)
DS4700/DS4800   Cache block size (KB)  16 KB
DS4000          Cache flush control    80/80 (default)
DS4000          Readahead              1
DS4000          RAID 5                 4+p, 8+p
4.7.3 DS8000
For the DS8000, you cannot tune the array and cache parameters. The arrays will be either
6+p or 7+p, depending on whether the array site contains a spare, and the segment size (the
contiguous amount of data that is written to a single disk) is 256 KB for fixed block volumes.
Caching for the DS8000 is done on a 64 KB track boundary.
4.8 Considerations for controller configuration
In this section, we discuss controller configuration considerations.
4.8.1 Balancing workload across DS4000 controllers
A best practice when creating arrays is to spread the disks of an array across multiple
enclosures, alternating slot positions from enclosure to enclosure. This practice improves the
availability of the array by protecting against enclosure failures that affect multiple members
within the array, and it improves performance by distributing the disks within an array across
drive loops. You achieve this layout by using the manual method for array creation.
Figure 4-3 shows a Storage Manager view of a 2+p array that is configured across
enclosures. Here, we can see that each of the three disks resides in a separate physical
enclosure and that the slot positions alternate from enclosure to enclosure.
Figure 4-3 Storage Manager
4.8.2 Balancing workload across DS8000 controllers
When configuring storage on the IBM System Storage DS8000 disk storage subsystem, it is
important to ensure that ranks on a device adapter (DA) pair are evenly balanced between
odd and even extent pools. Failing to do this can result in a considerable performance
degradation due to uneven device adapter loading.
The DS8000 assigns server (controller) affinity to ranks when they are added to an extent
pool. Ranks that belong to an even-numbered extent pool have an affinity to server0, and
ranks that belong to an odd-numbered extent pool have an affinity to server1.
Figure 4-4 on page 69 shows an example of a configuration that will result in a 50% reduction
in available bandwidth. Notice how arrays on each of the DA pairs are only being accessed by
one of the adapters. In this case, all ranks on DA pair 0 have been added to even-numbered
extent pools, which means that they all have an affinity to server0, and therefore, the adapter
in server1 is sitting idle. Because this condition is true for all four DA pairs, only half of the
adapters are actively performing work. This condition can also occur on a subset of the
configured DA pairs.
Figure 4-4 DA pair reduced bandwidth configuration
Example 4-6 shows what this invalid configuration looks like from the CLI output of the
lsarray and lsrank commands. The important thing to notice here is that arrays residing on
the same DA pair contain the same group number (0 or 1), meaning that they have affinity to
the same DS8000 server (server0 is represented by group0 and server1 is represented by
group1).
As an example of this situation, arrays A0 and A4 can be considered. They are both attached
to DA pair 0, and in this example, both arrays are added to an even-numbered extent pool
(P0 and P4). Doing so means that both ranks have affinity to server0 (represented by
group0), leaving the DA in server1 idle.
Example 4-6 Command output
dscli> lsarray -l
Date/Time: Aug 8, 2008 8:54:58 AM CEST IBM DSCLI Version: 5.2.410.299 DS: IBM.2107-75L2321
Array State  Data   RAIDtype  arsite Rank DA Pair DDMcap(10^9B) diskclass
===================================================================================
A0    Assign Normal 5 (6+P+S) S1     R0   0       146.0         ENT
A1    Assign Normal 5 (6+P+S) S9     R1   1       146.0         ENT
A2    Assign Normal 5 (6+P+S) S17    R2   2       146.0         ENT
A3    Assign Normal 5 (6+P+S) S25    R3   3       146.0         ENT
A4    Assign Normal 5 (6+P+S) S2     R4   0       146.0         ENT
A5    Assign Normal 5 (6+P+S) S10    R5   1       146.0         ENT
A6    Assign Normal 5 (6+P+S) S18    R6   2       146.0         ENT
A7    Assign Normal 5 (6+P+S) S26    R7   3       146.0         ENT

dscli> lsrank -l
Date/Time: Aug 8, 2008 8:52:33 AM CEST IBM DSCLI Version: 5.2.410.299 DS: IBM.2107-75L2321
ID Group State  datastate Array RAIDtype extpoolID extpoolnam stgtype exts usedexts
======================================================================================
R0 0     Normal Normal    A0    5        P0        extpool0   fb      779  779
R1 1     Normal Normal    A1    5        P1        extpool1   fb      779  779
R2 0     Normal Normal    A2    5        P2        extpool2   fb      779  779
R3 1     Normal Normal    A3    5        P3        extpool3   fb      779  779
R4 0     Normal Normal    A4    5        P4        extpool4   fb      779  779
R5 1     Normal Normal    A5    5        P5        extpool5   fb      779  779
R6 0     Normal Normal    A6    5        P6        extpool6   fb      779  779
R7 1     Normal Normal    A7    5        P7        extpool7   fb      779  779
Figure 4-5 shows an example of a correct configuration that balances the workload across all
four DA pairs.
Figure 4-5 DA pair correct configuration
Example 4-7 shows what this correct configuration looks like from the CLI output of the
lsrank command. The configuration from the lsarray output remains unchanged. Notice that
arrays residing on the same DA pair are now split between groups 0 and 1. Looking at arrays
A0 and A4 once again shows that they now have different affinities (A0 to group0, A4 to
group1). To achieve this correct configuration, array A4, in contrast to Example 4-6 on
page 69, has been moved to an odd-numbered extent pool (P5).
Example 4-7 Command output
dscli> lsrank -l
Date/Time: Aug 9, 2008 2:23:18 AM CEST IBM DSCLI Version: 5.2.410.299 DS: IBM.2107-75L2321
ID Group State  datastate Array RAIDtype extpoolID extpoolnam stgtype exts usedexts
======================================================================================
R0 0     Normal Normal    A0    5        P0        extpool0   fb      779  779
R1 1     Normal Normal    A1    5        P1        extpool1   fb      779  779
R2 0     Normal Normal    A2    5        P2        extpool2   fb      779  779
R3 1     Normal Normal    A3    5        P3        extpool3   fb      779  779
R4 1     Normal Normal    A4    5        P5        extpool5   fb      779  779
R5 0     Normal Normal    A5    5        P4        extpool4   fb      779  779
R6 1     Normal Normal    A6    5        P7        extpool7   fb      779  779
R7 0     Normal Normal    A7    5        P6        extpool6   fb      779  779
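The even/odd extent pool rule shown in Example 4-6 and Example 4-7 can also be checked with a
small script. The sketch below is not a DS8000 tool; it assumes that you have already copied the
array-to-DA pair mapping from lsarray -l and the rank group assignments from lsrank -l (the values
here are those of Example 4-6), and it flags any DA pair whose ranks all have affinity to the same
server.

# Flag DA pairs whose ranks are all owned by the same server (group).
array_to_da_pair = {"A0": 0, "A1": 1, "A2": 2, "A3": 3,        # from lsarray -l
                    "A4": 0, "A5": 1, "A6": 2, "A7": 3}
rank_groups = [("R0", 0, "A0"), ("R1", 1, "A1"), ("R2", 0, "A2"), ("R3", 1, "A3"),
               ("R4", 0, "A4"), ("R5", 1, "A5"), ("R6", 0, "A6"), ("R7", 1, "A7")]

groups_per_da_pair = {}
for rank, group, array in rank_groups:                          # from lsrank -l
    groups_per_da_pair.setdefault(array_to_da_pair[array], set()).add(group)

for da_pair in sorted(groups_per_da_pair):
    groups = groups_per_da_pair[da_pair]
    if len(groups) < 2:
        print(f"DA pair {da_pair}: all ranks on server{groups.pop()}, "
              "the adapter in the other server is idle")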
4.8.3 DS8000 ranks to extent pools mapping
When configuring the DS8000, two different approaches for the rank to extent pools mapping
exist:
 One rank per extent pool
 Multiple ranks per extent pool using DS8000 Storage Pool Striping (SPS)
The most common approach is to map one rank to one extent pool, which provides good
control for volume creation, because it ensures that all volume allocation from the selected
extent pool will come from the same rank.
The SPS feature became available with the R3 microcode release for the DS8000 series and
effectively means that a single DS8000 volume can be striped across all the ranks in an
extent pool (the functionality is therefore often referred to as “extent pool striping”). So, if a
given extent pool includes more than one rank, a volume can be allocated using free space
from several ranks (which also means that SPS can only be enabled at volume creation; no
reallocation is possible).
The SPS feature requires that your DS8000 layout has been well thought-out from the
beginning to utilize all resources in the DS8000. If this is not done, SPS might cause severe
performance problems (for example, if configuring a heavily loaded extent pool with multiple
ranks from the same DA pair). Because the SVC itself stripes across MDisks, the SPS feature
is not as relevant here as when accessing the DS8000 directly.
Best practice: Configure one rank per extent pool if using DS8000 R1 or R2 microcode
versions.
If using DS8000 R3 or later microcode versions, only configure Storage Pool Striping after
contacting IBM to have your design verified.
4.8.4 Mixing array sizes within an MDG
Mixing array sizes within an MDG in general is not of concern. Testing has shown no
measurable performance differences between selecting all 6+p arrays and all 7+p arrays as
opposed to mixing 6+p arrays and 7+p arrays. In fact, mixing array sizes can actually help
balance workload, because it places more data on the ranks that have the extra performance
capability provided by the eighth disk. There is one small exposure here in the case where an
insufficient number of the larger arrays are available to handle access to the higher capacity.
In order to avoid this situation, ensure that the smaller capacity arrays do not represent more
than 50% of the total number of arrays within the MDG.
Best practice: When mixing 6+p arrays and 7+p arrays in the same MDG, avoid having
smaller capacity arrays comprise more than 50% of the arrays.
4.8.5 Determining the number of controller ports for ESS/DS8000
Configure a minimum of eight controller ports to the SVC per controller regardless of the
number of nodes in the cluster. Configure 16 controller ports for large controller
configurations where more than 48 ranks are being presented to the SVC cluster.
Additionally, we recommend that no more than two ports of each of the DS8000’s 4-port
adapters are used.
Table 4-2 shows the recommended number of ESS/DS8000 ports and adapters based on
rank count.
Table 4-2 Recommended number of ports and adapters
Ranks    Ports   Adapters
2 - 48   8       4 - 8
> 48     16      8 - 16
The ESS and DS8000 populate Fibre Channel (FC) adapters across two to eight I/O
enclosures, depending on configuration. Each I/O enclosure represents a separate hardware
domain.
Ensure that adapters configured to different SAN networks do not share the same I/O
enclosure as part of our goal of keeping redundant SAN networks isolated from each other.
Best practices that we recommend:
 Configure a minimum of eight ports per DS8000.
 Configure 16 ports per DS8000 when > 48 ranks are presented to the SVC cluster.
 Configure a maximum of two ports per four port DS8000 adapter.
 Configure adapters across redundant SAN networks from different I/O enclosures.
4.8.6 Determining the number of controller ports for DS4000
The DS4000 must be configured with two ports per controller for a total of four ports per
DS4000.
4.9 LUN masking
For a given storage controller, all SVC nodes must see the same set of LUNs from all target
ports that have logged into the SVC nodes. If target ports are visible to the nodes that do not
have the same set of LUNs assigned, SVC treats this situation as an error condition and
generates error code 1625.
Validating the LUN masking from the storage controller and then confirming the correct path
count from within the SVC are critical.
Example 4-8 shows four LUNs being presented from a DS8000 storage controller to a 4-node
SVC cluster.
The DS8000 performs LUN masking based on volume group. Example 4-8 shows showvolgrp
output for volume group V0, which contains four LUNs.
Example 4-8 The showvolgrp command output
dscli> showvolgrp -dev IBM.2107-75ALNN1 V0
Date/Time: August 15, 2008 10:12:33 AM PDT IBM DSCLI Version: 5.0.4.43 DS: IBM.2107-75ALNN1
Name SVCVG0
ID   V0
Type SCSI Mask
Vols 1000 1001 1004 1005
Example 4-9 shows lshostconnect output from the DS8000. Here, you can see that all 16
ports of the 4-node cluster are assigned to the same volume group (V0) and, therefore, have
been assigned to the same four LUNs.
Example 4-9 The lshostconnect command output
dscli> lshostconnect -dev IBM.2107-75ALNN1
Date/Time: August 14, 2008 11:51:31 AM PDT IBM DSCLI Version: 5.0.4.43 DS: IBM.2107-75ALNN1
Name         ID   WWPN             HostType Profile               portgrp volgrpID ESSIOport
===============================================================================
svcnode      0000 5005076801302B3E SVC      San Volume Controller 0       V0       all
svcnode      0001 5005076801302B22 SVC      San Volume Controller 0       V0       all
svcnode      0002 5005076801202D95 SVC      San Volume Controller 0       V0       all
svcnode      0003 5005076801402D95 SVC      San Volume Controller 0       V0       all
svcnode      0004 5005076801202BF1 SVC      San Volume Controller 0       V0       all
svcnode      0005 5005076801402BF1 SVC      San Volume Controller 0       V0       all
svcnode      0006 5005076801202B3E SVC      San Volume Controller 0       V0       all
svcnode      0007 5005076801402B3E SVC      San Volume Controller 0       V0       all
svcnode      0008 5005076801202B22 SVC      San Volume Controller 0       V0       all
svcnode      0009 5005076801402B22 SVC      San Volume Controller 0       V0       all
svcnode      000A 5005076801102D95 SVC      San Volume Controller 0       V0       all
svcnode      000B 5005076801302D95 SVC      San Volume Controller 0       V0       all
svcnode      000C 5005076801102BF1 SVC      San Volume Controller 0       V0       all
svcnode      000D 5005076801302BF1 SVC      San Volume Controller 0       V0       all
svcnode      000E 5005076801102B3E SVC      San Volume Controller 0       V0       all
svcnode      000F 5005076801102B22 SVC      San Volume Controller 0       V0       all
fd11asys     0010 210100E08BA5A4BA VMWare   VMWare                0       V1       all
fd11asys     0011 210000E08B85A4BA VMWare   VMWare                0       V1       all
mdms024_fcs0 0012 10000000C946AB14 pSeries  IBM pSeries - AIX     0       V2       all
mdms024_fcs1 0013 10000000C94A0B97 pSeries  IBM pSeries - AIX     0       V2       all
parker_fcs0  0014 10000000C93134B3 pSeries  IBM pSeries - AIX     0       V3       all
parker_fcs1  0015 10000000C93139D9 pSeries  IBM pSeries - AIX     0       V3       all
Additionally, you can see from the lshostconnect output that only the SVC WWPNs are
assigned to V0.
Important: Data corruption can occur if LUNs are assigned to both SVC nodes and
non-SVC nodes, that is, direct-attached hosts.
Next, we show you how the SVC sees these LUNs if the zoning is properly configured.
The Managed Disk Link Count represents the total number of MDisks presented to the SVC
cluster.
Figure 4-6 on page 74 shows the storage controller general details panel. To display this
panel, we selected Work with Managed Disks → Disk Controller Systems → View
General Details.
In this case, we can see that the Managed Disk Link Count is 4, which is correct for our
example.
Figure 4-6 Viewing General Details
Figure 4-7 shows the storage controller port details. To get to this panel, we selected Work
with Managed Disks → Disk Controller Systems → View General Details → Ports.
Figure 4-7 Viewing Port Details
Here, a path represents a connection from a single node to a single LUN. Because we have
four nodes and four LUNs in this example configuration, we expect to see a total of 16 paths
with all paths evenly distributed across the available storage ports. We have validated that
this configuration is correct, because we see eight paths on one WWPN and eight paths on
the other WWPN for a total of 16 paths.
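A quick way to cross-check the figures on this panel is to compute the expected path count from
the node and LUN counts. The arithmetic below simply restates the 4-node, 4-LUN example in this
section; the two zoned storage ports are an assumption taken from the panel shown in Figure 4-7.

nodes, luns, storage_ports = 4, 4, 2
expected_paths = nodes * luns                    # one path per node/LUN pair
print(f"Expected paths: {expected_paths}")                                      # 16
print(f"Expected paths per storage port: {expected_paths // storage_ports}")    # 8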
4.10 WWPN to physical port translation
Storage controller WWPNs can be translated to physical ports on the controllers for isolation
and debugging purposes. Additionally, you can use this information for validating redundancy
across hardware boundaries.
In Example 4-10, we show the WWPN to physical port translations for the ESS.
Example 4-10 ESS
WWPN format for ESS = 5005076300XXNNNN
XX = adapter location within storage controller
NNNN = unique identifier for storage controller
Bay   R1-B1  R1-B1  R1-B1  R1-B1  R1-B2  R1-B2  R1-B2  R1-B2
Slot  H1     H2     H3     H4     H1     H2     H3     H4
XX    C4     C3     C2     C1     CC     CB     CA     C9

Bay   R1-B3  R1-B3  R1-B3  R1-B3  R1-B4  R1-B4  R1-B4  R1-B4
Slot  H1     H2     H3     H4     H1     H2     H3     H4
XX    C8     C7     C6     C5     D0     CF     CE     CD
In Example 4-11, we show the WWPN to physical port translations for the DS8000.
Example 4-11 DS8000
WWPN format for DS8000 = 50050763030XXYNNN
XX = adapter location within storage controller
Y = port number within 4-port adapter
NNN = unique identifier for storage controller
IO Bay  B1            B2            B3            B4
Slot    S1 S2 S4 S5   S1 S2 S4 S5   S1 S2 S4 S5   S1 S2 S4 S5
XX      00 01 03 04   08 09 0B 0C   10 11 13 14   18 19 1B 1C

IO Bay  B5            B6            B7            B8
Slot    S1 S2 S4 S5   S1 S2 S4 S5   S1 S2 S4 S5   S1 S2 S4 S5
XX      20 21 23 24   28 29 2B 2C   30 31 33 34   38 39 3B 3C

Port    P1  P2  P3  P4
Y       0   4   8   C
4.11 Using TPC to identify storage controller boundaries
It is often desirable to map the virtualization layer to determine which VDisks and hosts are
utilizing resources for a specific hardware boundary on the storage controller, for example,
when a specific hardware component, such as a disk drive, is failing, and the administrator is
interested in performing an application level risk assessment. Information learned from this
type of analysis can lead to actions taken to mitigate risks, such as scheduling application
downtime, performing VDisk migrations, and initiating FlashCopy. TPC allows the mapping of
the virtualization layer to occur quickly, and using TPC eliminates mistakes that can be made
by using a manual approach.
Figure 4-8 shows how a failing disk on a storage controller can be mapped to the MDisk that
is being used by an SVC cluster. To display this panel, click Physical Disk → RAID5
Array → Logical Volume → MDisk.
Figure 4-8 Mapping MDisk
Figure 4-9 on page 77 completes the end-to-end view by mapping the MDisk through the
SVC to the attached host. Click MDisk → MDGroup → VDisk → host disk.
Figure 4-9 Host mapping
4.12 Using TPC to measure storage controller performance
In this section, we provide a brief introduction to performance monitoring for the SVC
back-end disk. When talking about storage controller performance, the back-end I/O rate
refers to the rate of I/O between the storage controller cache and the storage arrays. In an
SVC environment, back-end I/O is also used to refer to the rate of I/O between the SVC
nodes and the controllers. Both rates are considered when monitoring storage controller
performance.
The two most important metrics when measuring I/O subsystem performance are response
time in milliseconds and throughput in I/Os per second (IOPS):
 Response time in non-SVC environments is measured from when the host issues a
command to when the storage controller reports that the command has completed. With
the SVC, we not only have to consider response time from the host to the SVC nodes, but
also from the SVC nodes to the storage controllers.
 Throughput, however, can be measured at a variety of points along the data path, and the
SVC adds additional points where throughput is of interest and measurements can be
obtained.
TPC offers many disk performance reporting options that support the SVC environment well
and also support the storage controller back end for a variety of storage controller types. The
most relevant storage components where performance metrics can be collected when
monitoring storage controller performance include:
 Subsystem
 Controller
 Array
 MDisk
 MDG
 Port
Note: In SVC environments, the SVC nodes interact with the storage controllers in the
same way as a host. Therefore, the performance rules and guidelines that we discuss in
this section are also applicable to non-SVC environments. References to MDisks are
analogous with host-attached LUNs in a non-SVC environment.
4.12.1 Normal operating ranges for various statistics
While the exact figures seen depend on both the type of equipment and the workload, certain
assumptions can be made about the normal range of figures that will be achievable. If TPC
reports results outside of this range, it is likely to indicate a problem, such as overloading or
component failure:
 Throughput for storage volumes can range from 1 IOPS to more than 1 000 IOPS based
mostly on the nature of the application. The I/O rates for an MDisk approach 1 000 IOPS
when that MDisk is encountering extremely good controller cache behavior; otherwise,
such high I/O rates are impossible. If the SVC is issuing large I/Os (for example, on a
FlashCopy with large grain size), the IOPS figure will be lower for a given data transfer
rate.
 A 10 millisecond response time is generally considered to be getting high; however, it
might be perfectly acceptable depending on the application behavior and requirements.
For example, many online transaction processing (OLTP) environments require response
times in the 5 to 8 millisecond range, while batch applications with large sequential
transfers are operating nominally in the 15 - 30 millisecond range.
 Nominal service times for disks today are 5 - 7 milliseconds; however, when a disk is at
50% utilization, ordinary queuing adds a wait time roughly equal to the service time, so a
10 - 14 millisecond response time is a reasonable goal in most environments (see the
sketch after this list).
 High controller cache hit ratios allow the back-end arrays to run at a higher utilization. A
70% array utilization produces high array response times; however, when averaged with
cache hits, they produce acceptable average response times. High SVC read hit ratios
can have the same effect on array utilization in that they will allow higher MDisk utilizations
and, therefore, higher array response times.
 Poor cache hit ratios require good back-end response times.
 Front-end response times typically must be in the 5 - 15 millisecond range.
 Back-end response times to arrays can usually operate in the 20 - 25 millisecond range,
and up to 60 milliseconds, unless the cache hit ratio is low.
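The queuing rule of thumb in the list above follows from simple single-server queue math. The
sketch below is a generic approximation (response time = service time / (1 - utilization)), not a TPC
calculation, and the 6 millisecond service time is only an assumed example.

service_time_ms = 6.0                            # assumed nominal disk service time

for utilization in (0.3, 0.5, 0.7):
    response_ms = service_time_ms / (1.0 - utilization)
    print(f"{utilization:.0%} busy -> about {response_ms:.0f} ms response time")
# At 50% busy the wait roughly equals the service time (about 12 ms total), as noted above.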
4.12.2 Establish a performance baseline
I/O rate often grows over time, and as I/O rates increase, response times also increase. It is
important to establish a good performance baseline so that the growth effects of the I/O
workload can be monitored and trends identified that can be used to predict when additional
storage performance and capacity will be required.
Best Practices that we recommend:
 As a general rule, derive the best metrics for any system from current and historical
data taken from specific configurations and workloads that are meeting application and
user requirements.
 Collect new sets of metrics after changes are made to the storage controller
configuration or the MDG configuration, such as adding or removing MDisks.
 Keep a historical record of performance metrics.
4.12.3 Performance metric guidelines
Several performance metric guidelines are:
 Small block reads (4 KB to 8 KB) must have average response times in the 2 - 15
millisecond range.
 Small block writes must have response times near 1 millisecond, because these small
block writes are all cache hits. High response times with small block writes often indicate
nonvolatile storage (NVS) full conditions.
 With large block reads and writes (32 KB or greater), response times are insignificant as
long as throughput objectives are met.
 Read hit percentage can vary from 0% to near 100%. Anything lower than 50% is
considered low; however, many database applications can run under 30%. Cache hit
ratios are mostly dependent on application design. Larger cache always helps and allows
back-end arrays to be driven at a higher utilization.
 Storage controller back-end read response times must not exceed 25 milliseconds unless
the cache read hit ratio is near 99%.
 Storage controller back-end write response times can be high due to the RAID 5 and RAID
10 write penalties; however, they must not exceed 60 milliseconds.
 Array throughput above 700 - 800 IOPS can start impacting front-end performance.
 Port response times must be less than 2 milliseconds for most I/O; however, they can
reach as high as 5 milliseconds with large transfer sizes.
Figure 4-10 is a TPC graph showing aggregate throughput for several ESS arrays. In this
case, all arrays have throughput lower than 700 IOPS.
Figure 4-10 Overall I/O rate for ESS subsystems
4.12.4 Storage controller back end
The back-end I/O rate is the rate of I/O between the storage subsystem cache and the
storage arrays. Write activity to the back-end disk comes from cache and is normally an
asynchronous destage operation that moves data from cache to disk in order to free space in
NVS.
One of the more common conditions that can impact overall performance is array
overdriving. TPC allows metrics to be collected and graphed for arrays, either individually or
as a group. Figure 4-11 is a TPC graph showing response times for all ESS arrays that are
being monitored. This graph shows that certain arrays are regularly peaking over 200 ms,
which indicates overloading.
Figure 4-11 ESS back-end response times showing overloading
Array response times depend on many factors, including disk RPM and the array
configuration. However, in all cases, when the number of IOPS approaches or exceeds
1 000, the array is extremely busy.
Table 4-3 shows the upper limit for several disk speeds and array widths. Remember that
while these I/O rates can be achieved, they imply considerable queuing delays and high
response times.
Table 4-3 Maximum IOPS for different DDM speeds
DDM speed          Single drive (IOPS)  6+P array (IOPS)  7+P array (IOPS)
10 K               150 - 175            900 - 1050        1050 - 1225
15 K               200 - 225            1200 - 1350       1400 - 1575
7.2 K (near-line)  85 - 110             510 - 660         595 - 770
These numbers can vary significantly depending on cache hit ratios, block size, and service
time.
Rule: 1 000 IOPS indicates an extremely busy array and can impact front-end response
times.
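The figures in Table 4-3 are essentially the single-drive figures multiplied by the number of data
drives in the array. The arithmetic below only restates the table and the 1 000 IOPS rule; the
measured value is a made-up example.

single_drive_iops = {"10K": (150, 175), "15K": (200, 225), "7.2K": (85, 110)}

def array_iops_range(ddm_speed, data_drives):
    low, high = single_drive_iops[ddm_speed]
    return low * data_drives, high * data_drives

low, high = array_iops_range("15K", 7)           # 7+P array of 15K drives
print(f"15K 7+P ceiling: {low} - {high} IOPS")   # 1400 - 1575, as in Table 4-3

measured_iops = 1100                             # example measurement from TPC
if measured_iops >= 1000:
    print("Array is extremely busy; front-end response times may be affected")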
Chapter 5. MDisks
In this chapter, we discuss various MDisk attributes, as well as provide an overview of the
process of adding and removing MDisks from existing Managed Disk Groups (MDGs).
In this chapter, we discuss the following topics:
 Back-end queue depth
 MDisk transfer size
 Selecting logical unit number (LUN) attributes for MDisks
 Tiered storage
 Adding MDisks to existing MDGs
 Restriping (balancing) extents across an MDG
 Remapping managed MDisks
 Controlling extent allocation order for VDisk creation
5.1 Back-end queue depth
SVC submits I/O to the back-end (MDisk) storage in the same fashion as any direct-attached
host. For direct-attached storage, the queue depth is tunable at the host and is often
optimized based on specific storage type as well as various other parameters, such as the
number of initiators. For the SVC, the queue depth is also tuned; however, the optimal value
used is calculated internally.
Note that the exact algorithm used to calculate queue depth is subject to change. Do not rely
upon the following details staying the same. However, this summary is true of SVC 4.3.0.
There are two parts to the algorithm: a per MDisk limit and a per controller port limit.
Q = ((P x C) / N) / M
If Q > 60, then Q=60 (maximum queue depth is 60)
If Q < 3, then Q=3 (minimum queue depth is 3)
In this algorithm:
Q = The queue for any MDisk in a specific controller
P = Number of WWPNs visible to SVC in a specific controller
N = Number of nodes in the cluster
M = Number of MDisks provided by the specific controller
C = A constant. C varies by controller type:
– FAStT200, 500, DS4100, and EMC CLARiiON = 200
– DS4700, DS4800, DS6K, and DS8K = 1000
– Any other controller = 500
When SVC has submitted and has Q I/Os outstanding for a single MDisk (that is, it is waiting
for Q I/Os to complete), it will not submit any more I/O until part of the I/O completes. That is,
any new I/O requests for that MDisk will be queued inside the SVC, which is undesirable and
indicates that back-end storage is overloaded.
The following example shows how a 4-node SVC cluster calculates the queue depth for 150
LUNs on a DS8000 storage controller using six target ports:
Q = ((6 ports x 1000/port) / 4 nodes) / 150 MDisks = 10
With this configuration, each MDisk has a queue depth of 10.
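The same calculation can be written as a small function. This is only a restatement of the
SVC 4.3.0 summary above, not the actual microcode, and as noted, the algorithm is subject to
change.

# Per-MDisk queue depth as summarized above for SVC 4.3.0 (subject to change).
def mdisk_queue_depth(ports, nodes, mdisks, controller_constant):
    q = (ports * controller_constant) / nodes / mdisks
    return int(max(3, min(60, q)))               # clamp Q to the 3 - 60 range

# Worked example from the text: DS8000 (C = 1000), six ports, four nodes, 150 MDisks
print(mdisk_queue_depth(ports=6, nodes=4, mdisks=150, controller_constant=1000))   # 10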
5.2 MDisk transfer size
The size of I/O that the SVC performs to the MDisk depends on where the I/O originated.
5.2.1 Host I/O
The maximum transfer size under normal I/O is 32 KB, because the internal cache track size
is 32 KB, and, therefore, destages from cache can be up to the cache track size. Although a
track can hold up to 32 KB, a read or write operation can only partially populate the track;
therefore, a read or write operation to the MDisks can be anywhere from 512 bytes to 32 KB.
5.2.2 FlashCopy I/O
The transfer size for FlashCopy is always 256 KB, because the grain size of FlashCopy is 256
KB and any size write that changes data within a 256 KB grain will result in a single 256 KB
write.
5.2.3 Coalescing writes
The SVC coalesces writes up to the 32 KB track size if the writes reside in the same track
prior to destage. For example, if 4 KB is written into a track and another 4 KB is then written
to another location in the same track, the track moves to the bottom of the least recently used
(LRU) list in the cache upon the second write, and the track now contains 8 KB of actual data.
This process can continue until the track reaches the top of the LRU list and is then destaged;
the data is written to the back-end disk and removed from the cache. Any contiguous data
within the track is coalesced for the destage.
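The toy model below illustrates the coalescing just described for a single 32 KB track; it is not SVC
code, and the 512-byte sector granularity is only an assumption used to keep the bookkeeping
simple.

TRACK_SIZE, SECTOR = 32 * 1024, 512
dirty = [False] * (TRACK_SIZE // SECTOR)         # dirty-sector map for one cache track

def write(offset, length):                       # mark the sectors touched by a host write
    for sector in range(offset // SECTOR, (offset + length) // SECTOR):
        dirty[sector] = True

write(0, 4096)                                   # first 4 KB write to the track
write(16384, 4096)                               # second 4 KB write, same track
print(f"Track holds {sum(dirty) * SECTOR // 1024} KB; contiguous runs are coalesced at destage")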
Sequential writes
The SVC does not employ a caching algorithm for “explicit sequential detect,” which means
coalescing of writes in SVC cache has a random component to it. For example, 4 KB writes to
VDisks will translate to a mix of 4 KB, 8 KB, 16 KB, 24 KB, and 32 KB transfers to the MDisks
with reducing probability as the transfer size grows.
Although larger transfer sizes tend to be more efficient, this varying transfer size has no effect
on the controller’s ability to detect and coalesce sequential content to achieve “full stride
writes.”
Sequential reads
The SVC uses “prefetch” logic for staging reads based on statistics maintained on 128 MB
regions. If the sequential content is sufficiently high enough within a region, prefetch occurs
with 32 KB reads.
5.3 Selecting LUN attributes for MDisks
The selection of LUN attributes requires the following primary considerations:
 Selecting array size
 Selecting LUN size
 Number of LUNs per array
 Number of physical disks per array
Important: We generally recommend that LUNs are created to use the entire capacity of
the array as described in 6.2, “Selecting the number of LUNs per array” on page 104.
Capacity planning consideration
When configuring MDisks to MDGs, we advise that you consider leaving a small amount of
MDisk capacity that can be used as “swing” (spare) capacity for image mode VDisk
migrations. A good general rule is to leave free capacity equal to the average capacity of the
configured VDisks.
Selecting MDisks for MDGs
All LUNs for MDG creation must have the same performance characteristics. If MDisks of
varying performance levels are placed in the same MDG, the performance of the MDG can be
reduced to the level of the poorest performing MDisk. Likewise, all LUNs must also possess
the same availability characteristics. Remember that the SVC does not provide any
Redundant Array of Independent Disks (RAID) capabilities within an MDG. The loss of
access to any one of the MDisks within the MDG impacts the entire MDG. However, with the
introduction of VDisk Mirroring in SVC 4.3, you can protect against the loss of an MDG by
mirroring a VDisk across multiple MDGs. Refer to Chapter 7, “VDisks” on page 119 for more
information.
We recommend these best practices for LUN selection within an MDG:
 LUNs are the same type.
 LUNs are the same RAID level.
 LUNs are the same RAID width (number of physical disks in array).
 LUNs have the same availability and fault tolerance characteristics.
MDisks created on LUNs with varying performance and availability characteristics must be
placed in separate MDGs.
RAID 5 compared to RAID 10
In general, RAID 10 arrays are capable of higher throughput for random write workloads than
RAID 5, because RAID 10 only requires two I/Os per logical write compared to four I/Os per
logical write for RAID 5. For random reads and sequential workloads, there is typically no
benefit. With certain workloads, such as sequential writes, RAID 5 often shows a
performance advantage.
Obviously, selecting RAID 10 for its performance advantage comes at an extremely high cost
in usable capacity, and, in most cases, RAID 5 is the best overall choice.
When considering RAID 10, we recommend that you use DiskMagic to determine the
difference in I/O service times between RAID 5 and RAID 10. If the service times are similar,
the lower cost solution makes the most sense. If RAID 10 shows a service time advantage
over RAID 5, the importance of that advantage must be weighed against its additional cost.
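The two-versus-four I/O difference translates directly into back-end load. The sketch below is
generic RAID arithmetic using the write penalties quoted above, not a DiskMagic model, and the
2 000 IOPS, 70/30 read/write workload is an assumption for illustration.

def backend_iops(host_iops, read_fraction, write_penalty):
    reads = host_iops * read_fraction
    writes = host_iops * (1 - read_fraction)
    return reads + writes * write_penalty        # each logical write costs write_penalty I/Os

host_iops, read_fraction = 2000, 0.7             # assumed random workload, 70/30 R/W
print("RAID 5 back end :", int(backend_iops(host_iops, read_fraction, 4)))   # 3800 IOPS
print("RAID 10 back end:", int(backend_iops(host_iops, read_fraction, 2)))   # 2600 IOPS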
5.4 Tiered storage
The SVC makes it easy to configure multiple tiers of storage within the same SVC cluster. As
we discussed in 5.3, “Selecting LUN attributes for MDisks” on page 85, it is important that
MDisks that belong to the same MDG share the same availability and performance
characteristics; however, grouping LUNs of like performance and availability within MDGs is
an attractive feature of SVC. You can define tiers of storage using storage controllers of
varying performance and availability levels. Then, you can easily provision them based on
host, application, and user requirements.
Remember that a single tier of storage can be represented by multiple MDGs. For example, if
you have a large pool of tier 3 storage that is provided by many low-cost storage controllers, it
is sensible to use a number of MDGs. Using a number of MDGs prevents a single offline
MDisk from taking all of the tier 3 storage offline.
When multiple storage tiers are defined, you need to take precautions to ensure that storage
is provisioned from the appropriate tiers. You can ensure that storage is provisioned from the
appropriate tiers through MDG and MDisk naming conventions, along with clearly defined
storage requirements for all hosts within the installation.
Note: When multiple tiers are configured, it is a best practice to clearly indicate the storage
tier in the naming convention used for the MDGs and MDisks.
5.5 Adding MDisks to existing MDGs
In this section, we discuss adding MDisks to existing MDGs.
5.5.1 Adding MDisks for capacity
Before adding MDisks to existing MDGs, ask yourself first why you are doing this. If MDisks
are being added to the SVC cluster to provide additional capacity, consider adding them to a
new MDG. Recognize that adding new MDisks to existing MDGs will reduce the reliability
characteristics of the MDG and risk destabilizing the MDG if hardware problems exist with the
new LUNs. If the MDG is already meeting its performance objectives, we recommend that, in
most cases, you add the new MDisks to new MDGs rather than add the new MDisks to
existing MDGs.
5.5.2 Checking access to new MDisks
You must be careful when adding MDisks to existing MDGs to ensure the availability of the
MDG is not compromised by adding a faulty MDisk. Because loss of access to a single MDisk
will cause the entire MDG to go offline, we recommend that with SVC versions prior to 4.2.1,
read/write access to the MDisk is tested prior to adding the MDisk to an existing online MDG.
You can test the read/write (R/W) access to the MDisk by creating a test MDG, adding the
new MDisk to it, creating a test VDisk, and then performing a simple R/W to the VDisk.
SVC 4.2.1 introduced a new feature where MDisks are tested for reliable read/write access
before being added to an MDG. This means that manually performing a test in this way is no
longer necessary. This testing before an MDisk is admitted to an MDG is automatic and no
user action is required. The test will fail if:
 One or more nodes cannot access the MDisk through the chosen controller port.
 I/O to the disk does not complete within a reasonable time.
 The SCSI inquiry data provided for the disk is incorrect or incomplete.
 The SVC cluster suffers a software error during the MDisk test.
Note that image-mode MDisks are not tested before being added to an MDG, because an
offline image-mode MDisk will not take the MDG offline.
5.5.3 Persistent reserve
A common condition where MDisks can be configured by SVC, but cannot perform R/W is in
the case where a persistent reserve (PR) has been left on a LUN from a previously attached
host. Subsystems that are exposed to this condition were previously attached with IBM
Subsystem Device Driver (SDD) or SDDPCM, because support for PR comes from these
multipath drivers. You do not see this condition on DS4000 when previously attached using
RDAC, because RDAC does not implement PR.
In this condition, you need to rezone LUNs and map them back to the host holding the
reserve or to another host that has the capability to remove the reserve through the use of a
utility, such as lquerypr (included with SDD and SDDPCM).
An alternative option is to remove the PRs from within the storage subsystem. The ESS
provides an option to remove the PRs from within the storage subsystem through the GUI
(ESS Specialist); however for the DS6000 and DS8000, removing the PRs from within the
storage subsystem can only be done by using the command line and, therefore, requires
technical support.
5.5.4 Renaming MDisks
We recommend that you rename MDisks from their SVC-assigned name after you discover
them. Using a naming convention for MDisks that associates the MDisk to the controller and
array helps during problem isolation and avoids confusion that can lead to an administration
error.
Note that when multiple tiers of storage exist on the same SVC cluster, you might also want to
indicate the storage tier in the name as well. For example, you can use R5 and R10 to
differentiate RAID levels or you can use T1, T2, and so on to indicate defined tiers.
Best practice: Use a naming convention for MDisks that associates the MDisk with its
corresponding controller and array within the controller, for example, DS8K_R5_12345.
5.6 Restriping (balancing) extents across an MDG
Adding MDisks to existing MDGs can result in reduced performance across the MDG due to
the extent imbalance that will occur and the potential to create hot spots within the MDG.
After adding MDisks to MDGs, we recommend that extents are rebalanced across all
available MDisks by manually entering commands through the command line interface (CLI).
Alternatively, you can automate rebalancing the extents across all available MDisks by using
a Perl script, available as part of the SVCTools package from the alphaWorks® Web site.
If you want to manually balance extents, you can use the following CLI commands to identify
and correct extent imbalance across MDGs:
 svcinfo lsmdiskextent
 svctask migrateexts
 svcinfo lsmigrate
The following section describes how to use the script from the SVCTools package to
rebalance extents automatically. You can use this script on any host with Perl and an SSH
client installed; we show how to install it on a Windows Server 2003 server.
5.6.1 Installing prerequisites and the SVCTools package
For this test, we installed SVCTools on a Windows Server 2003 server. The major
prerequisites are:
 PuTTY: This tool provides SSH access to the SVC cluster. If you are using an SVC
Master Console or a System Storage Productivity Center (SSPC) server, it has already
been installed. If not, you can download PuTTY from the author’s Web site at:
http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html
The easiest package to install is the “Windows installer,” which installs all the PuTTY tools
in one location.
 Perl: Perl packages for Windows are available from a number of sources. We used
ActivePerl, which can be downloaded free-of-charge from:
http://www.activestate.com/Products/activeperl/index.mhtml
The SVCTools package is available at:
http://www.alphaworks.ibm.com/tech/svctools
This package is a compressed file, which can be extracted to wherever is convenient. We
extracted it to C:\SVCTools on the Master Console. The key files for the extent balancing
script are:
 The SVCToolsSetup.doc file, which explains the installation and use of the script in detail
 The lib\IBM\SVC.pm file, which must be copied to the Perl lib directory. With ActivePerl
installed in C:\Perl, copy it to C:\Perl\lib\IBM\SVC.pm.
 The examples\balance\balance.pl file, which is the rebalancing script.
5.6.2 Running the extent balancing script
The MDG on which we tested the script was unbalanced, because we recently expanded it
from four MDisks to eight MDisks. Example 5-1 shows that all of the VDisk extents are on the
original four MDisks.
Example 5-1 The lsmdiskextent script output showing an unbalanced MDG
IBM_2145:itsosvccl1:admin>svcinfo lsmdisk -filtervalue "mdisk_grp_name=itso_ds4500"
id name   status mode    mdisk_grp_id mdisk_grp_name capacity ctrl_LUN_#       controller_name UID
0  mdisk0 online managed 1            itso_ds45_18gb 18.0GB   0000000000000000 itso_ds4500     600a0b80001744310000011a4888478c00000000000000000000000000000000
1  mdisk1 online managed 1            itso_ds45_18gb 18.0GB   0000000000000001 itso_ds4500     600a0b8000174431000001194888477800000000000000000000000000000000
2  mdisk2 online managed 1            itso_ds45_18gb 18.0GB   0000000000000002 itso_ds4500     600a0b8000174431000001184888475800000000000000000000000000000000
3  mdisk3 online managed 1            itso_ds45_18gb 18.0GB   0000000000000003 itso_ds4500     600a0b8000174431000001174888473e00000000000000000000000000000000
4  mdisk4 online managed 1            itso_ds45_18gb 18.0GB   0000000000000004 itso_ds4500     600a0b8000174431000001164888472600000000000000000000000000000000
5  mdisk5 online managed 1            itso_ds45_18gb 18.0GB   0000000000000005 itso_ds4500     600a0b8000174431000001154888470c00000000000000000000000000000000
6  mdisk6 online managed 1            itso_ds45_18gb 18.0GB   0000000000000006 itso_ds4500     600a0b800017443100000114488846ec00000000000000000000000000000000
7  mdisk7 online managed 1            itso_ds45_18gb 18.0GB   0000000000000007 itso_ds4500     600a0b800017443100000113488846c000000000000000000000000000000000
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk0
id number_of_extents copy_id
0  64                0
2  64                0
1  64                0
4  64                0
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk1
id number_of_extents copy_id
0  64                0
2  64                0
1  64                0
4  64                0
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk2
id number_of_extents copy_id
0  64                0
2  64                0
1  64                0
4  64                0
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk3
id number_of_extents copy_id
0  64                0
2  64                0
1  64                0
4  64                0
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk4
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk5
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk6
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk7
The balance.pl script was then run on the Master Console using the command:
C:\SVCTools\examples\balance>perl balance.pl itso_ds45_18gb -k "c:\icat.ppk" -i
9.43.86.117 -r -e
In this command:
 itso_ds45_18gb is the MDG to be rebalanced.
 -k "c:\icat.ppk" gives the location of the PuTTY private key file, which is authorized for
administrator access to the SVC cluster.
 -i 9.43.86.117 gives the IP address of the cluster.
 -r requires that the optimal solution is found. If this option is not specified, the extents can
still be somewhat unevenly spread at completion, but not specifying -r will often require
fewer migration commands and less time. If time is important, it might be preferable to not
use -r at first, and then rerun the command with -r if the solution is not good enough.
 -e specifies that the script will actually run the extent migration commands. Without this
option, it will merely print the commands that it might have run. This option can be used to
check that the series of steps is logical before committing to migration.
In this example, with 4 x 8 GB VDisks, the migration completed within around 15 minutes.
You can use the command svcinfo lsmigrate to monitor progress; this command shows a
percentage for each extent migration command issued by the script.
After the script had completed, we checked that the extents had been correctly rebalanced.
Example 5-2 shows that the extents had been correctly rebalanced. In a test run of 40
minutes of I/O (25% random, 70/30 R/W) to the four VDisks, performance for the balanced
MDG was around 20% better than for the unbalanced MDG.
Example 5-2 The lsmdiskextent output showing a balanced MDG
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk0
id number_of_extents copy_id
0  32                0
2  32                0
1  32                0
4  32                0
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk1
id number_of_extents copy_id
0  32                0
2  32                0
1  32                0
4  31                0
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk2
id number_of_extents copy_id
0  32                0
2  32                0
1  32                0
4  32                0
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk3
id number_of_extents copy_id
0  32                0
2  32                0
1  32                0
4  32                0
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk4
id number_of_extents copy_id
0  32                0
2  32                0
1  32                0
4  33                0
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk5
id number_of_extents copy_id
0  32                0
2  32                0
1  32                0
4  32                0
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk6
id number_of_extents copy_id
0  32                0
2  32                0
1  32                0
4  32                0
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk7
id number_of_extents copy_id
0  32                0
2  32                0
1  32                0
4  32                0
Notes on the use of the extent balancing script
To use the extent balancing script:
 Migrating extents might have a performance impact, if the SVC or (more likely) the MDisks
are already at the limit of their I/O capability. The script minimizes the impact by using the
minimum priority level for migrations. Nevertheless, many administrators prefer to run
these migrations during periods of low I/O workload, such as overnight.
 There are additional command line options to balance.pl that you can use to tune how
the extent balancing works, for example, excluding certain MDisks or certain VDisks from
the rebalancing. Refer to the SVCToolsSetup.doc in svctools.zip for details.
 Because the script is written in Perl, the source code is available for you to modify and
extend its capabilities. If you want to modify the source code, make sure that you pay
attention to the documentation in Plain Old Documentation (POD) format within the script.
5.7 Removing MDisks from existing MDGs
You might want to remove MDisks from an MDG, for example, when decommissioning a
storage controller. When removing MDisks from an MDG, consider whether to manually
migrate extents from the MDisks. It is also necessary to make sure that you remove the
correct MDisks.
5.7.1 Migrating extents from the MDisk to be deleted
If an MDisk contains VDisk extents, these extents need to be moved to the remaining MDisks
in the MDG. Example 5-3 shows how to list the VDisks that have extents on a given MDisk
using the CLI.
Example 5-3 Listing which VDisks have extents on an MDisk to be deleted
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk14
id number_of_extents copy_id
5  16                0
3  16                0
6  16                0
8  13                1
9  23                0
8  25                0
Specify the -force flag on the svctask rmmdisk command, or check the corresponding
checkbox in the GUI. Either action causes the SVC to automatically move all used extents on
the MDisk to the remaining MDisks in the MDG. In most environments, where the extents
were automatically allocated in the first place, moving the used extents in this manner is fine.
Alternatively, you might want to perform the extent migrations manually. For example,
database administrators try to tune performance by arranging high workload VDisks on the
outside of physical disks. To preserve this type of an arrangement, the user must migrate all
extents off the MDisk before deletion; otherwise, the automatic migration will randomly
allocate extents to MDisks (and areas of MDisks). After all extents have been migrated, the
MDisk removal can proceed without the -force flag.
5.7.2 Verifying an MDisk’s identity before removal
It is critical that MDisks appear to the SVC cluster as unmanaged prior to removing their
controller LUN mapping. Unmapping LUNs from the SVC that are still part of an MDG will
result in the MDG going offline and will impact all hosts with mappings to VDisks in that MDG.
If the MDisk has been named using the naming convention described in the previous section,
the correct LUNs will be easier to identify. However, we recommend that the identification of
LUNs that are being unmapped from the controller match the associated MDisk on the SVC
using either the Controller LUN Number field or the unique identifier (UID) field.
The UID is the best identifier to use here, because it is unique across all MDisks on all
controllers. The Controller LUN Number is only unique within a given controller and for a
certain host. Therefore when using the Controller LUN Number, you must check that you are
managing the correct storage controller and check that you are looking at the mappings for
the correct SVC host object.
Refer to Chapter 5, “MDisks” on page 83 for correlating ESS, DS6000, and DS8000 volume
IDs to Controller LUN Number.
Figure 5-1 on page 94 shows an example of the Controller LUN Number and UID fields from
the SVC MDisk details.
Figure 5-1 Controller LUN Number and UID fields from the SVC MDisk details panel
Figure 5-2 on page 95 shows an example of the Logical Drive Properties for the DS4000.
Note that the DS4000 refers to UID as the Logical Drive ID.
Figure 5-2 Logical Drive properties for DS4000, including the LUN UID
5.8 Remapping managed MDisks
You generally do not unmap managed MDisks from the SVC, because it causes the MDG to
go offline. However, if managed MDisks have been unmapped from the SVC for a specific
reason, it is important to know that the LUN must present the same UID to the SVC after it
has been mapped back.
If the LUN is mapped back with a different UID, the SVC will recognize this MDisk as a new
MDisk, and the associated MDG will not come back online. Consider this situation for storage
controllers that support LUN selection, because selecting a different LUN ID will change the
UID. If the LUN has been mapped back with a different LUN ID, it must be remapped again
using the previous LUN ID.
Note: The SVC identifies MDisks based on the UID of the LUN.
Another instance where the UID can change on a LUN is in the case where DS4000 support
has regenerated the metadata for the logical drive definitions as part of a recovery procedure.
When logical drive definitions are regenerated, the LUN will appear as a new LUN just as it
does when it is created for the first time (the only exception is that the user data will still be
present).
In this case, restoring the UID on a LUN back to its prior value can only be done with the
assistance of DS4000 support. Both the previous UID and the subsystem identifier (SSID) will
be required, both of which can be obtained from the controller profile. To view the logical drive
properties, click Logical/Physical View → LUN → Open Properties.
Refer to Figure 5-2 on page 95 for an example of the Logical Drive Properties panel for a
DS4000 logical drive. This panel shows Logical Drive ID (UID) and SSID.
5.9 Controlling extent allocation order for VDisk creation
When creating striped mode VDisks, it is sometimes desirable to control the order in which
extents are allocated across the MDisks in the MDG for the purpose of balancing workload
across controller resources. For example, you can alternate extent allocation across “DA
pairs” and even and odd “extent pools” in the DS8000.
Note: When VDisks are created, the extents are allocated across MDisks in the MDG in a
round-robin fashion in the order in which the MDisks were initially added to the MDG.
The following example using DS8000 LUNs illustrates how the extent allocation order can be
changed to provide a better balance across controller resources.
Table 5-1 shows the initial discovery order of six MDisks. Note that adding these MDisks to an
MDG in this order results in three contiguous extent allocations alternating between the even
and odd extent pools, as opposed to alternating between extent pools for each extent.
Table 5-1   Initial discovery order

LUN ID    MDisk ID    MDisk name    Controller resource (DA pair/extent pool)
1000      1           mdisk01       DA2/P0
1001      2           mdisk02       DA6/P16
1002      3           mdisk03       DA7/P30
1100      4           mdisk04       DA0/P9
1101      5           mdisk05       DA4/P23
1102      6           mdisk06       DA5/P39
To change extent allocation so that each extent alternates between even and odd extent
pools, the MDisks can be renamed after being discovered and then added to the MDG in their
new order.
Table 5-2 on page 97 shows how the MDisks have been renamed so that when they are
added to the MDG in their new order, the extent allocation will alternate between even and
odd extent pools.
Table 5-2   MDisks renamed

LUN ID    MDisk ID    MDisk name (original/new)    Controller resource (DA pair/extent pool)
1000      1           mdisk01/md001                DA2/P0
1100      4           mdisk04/md002                DA0/P9
1001      2           mdisk02/md003                DA6/P16
1101      5           mdisk05/md004                DA4/P23
1002      3           mdisk03/md005                DA7/P30
1102      6           mdisk06/md006                DA5/P39
There are two options available for VDisk creation. We describe both options, along with the
differences between them, and a brief CLI sketch follows the list:
 Option A: Explicitly select the candidate MDisks within the MDG that will be used (through
the command line interface (CLI) or GUI). Note that when explicitly selecting the MDisk
list, the extent allocation will round-robin across MDisks in the order that they are
represented on the list starting with the first MDisk on the list:
– Example A1: Creating a VDisk with MDisks from the explicit candidate list order:
md001, md002, md003, md004, md005, and md006. The VDisk extent allocations then
begin at “md001” and alternate round-robin around the explicit MDisk candidate list. In
this case, the VDisk is distributed in the following order: md001, md002, md003,
md004, md005, and md006.
– Example A2: Creating a VDisk with MDisks from the explicit candidate list order:
md003, md001, md002, md005, md006, and md004. The VDisk extent allocations then
begin at “md003” and alternate round-robin around the explicit MDisk candidate list. In
this case, the VDisk is distributed in the following order: md003, md001, md002,
md005, md006, and md004.
 Option B: Do not explicitly select the candidate MDisks within an MDG that will be used
(through the command line interface (CLI) or GUI). Note that when the MDisk list is not
explicitly defined, the extents will be allocated across MDisks in the order that they were
added to the MDG, and the MDisk that will receive the first extent will be randomly
selected.
Example B1: Creating a VDisk without an explicit MDisk candidate list, where the MDisks
were added to the MDG in the order md001, md002, md003, md004, md005, and md006. The
VDisk extent allocations begin at a randomly selected MDisk (assume that “md003” is
selected) and proceed round-robin in the order that the MDisks were originally added to
the MDG. In this case, the VDisk is allocated in the following order: md003, md004,
md005, md006, md001, and md002.
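The following minimal CLI sketch illustrates the renaming from Table 5-2 and an Option A style creation with an explicit MDisk list. The MDG name, I/O Group, VDisk name, and size are illustrative values only; verify the exact syntax against your SVC code level:
svctask chmdisk -name md001 mdisk01
svctask chmdisk -name md002 mdisk04
svctask chmdisk -name md003 mdisk02
svctask chmdisk -name md004 mdisk05
svctask chmdisk -name md005 mdisk03
svctask chmdisk -name md006 mdisk06
svctask mkvdisk -mdiskgrp MDG_DS8K -iogrp 0 -size 100 -unit gb -vtype striped -mdisk md001:md002:md003:md004:md005:md006 -name vd_balanced
With the explicit -mdisk list, the first extent is allocated from md001 and subsequent extents round-robin across the listed MDisks in the order shown.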
Summary:
 Independent of the order in which a storage subsystem’s LUNs (volumes) are discovered
by the SVC, recognize that renaming MDisks and changing the order in which they are
added to the MDG will influence how the VDisk’s extents are allocated.
 Renaming MDisks into a particular order and then adding them to the MDG in that order
will allow the starting MDisk to be randomly selected for each VDisk created and,
therefore, is the optimal method for balancing VDisk extent allocation across storage
subsystem resources.
 When MDisks are added to an MDG based on the order in which the MDisks were
discovered, the allocation order can be explicitly specified; however, the MDisk used for
the first extent will always be the first MDisk specified on the list.
 When creating VDisks from the GUI:
– You are not required to select the MDisks from the Managed Disk Candidates list and
click Add; you can instead simply enter a capacity value into the “Type the size of the
virtual disks” field and select whether to format the VDisk. With this approach, Option B
is the methodology applied to allocate the VDisk’s extents within the MDG.
– When a set or a subset of MDisks is selected and added (by clicking Add) to the
“Managed Disks Striped in this Order” column, Option A is the methodology applied to
distribute the VDisk’s extents explicitly across the selected MDisks.
Figure 5-3 shows the MDisk selection panel for creating VDisks.
Figure 5-3 MDisk selection for a striped-mode VDisk
5.10 Moving an MDisk between SVC clusters
It can sometimes be desirable to move an MDisk to a separate SVC cluster. Before beginning
this task, consider the alternatives, which include:
 Using Metro Mirror or Global Mirror to copy the data to a remote cluster. One instance in
which this might not be possible is where the SVC cluster is already in a mirroring
partnership with another SVC cluster, and data needs to be migrated to a third cluster.
 Attaching a host server to two SVC clusters and using host-based mirroring to copy the
data.
 Using storage controller-based copy services. If you use storage controller-based copy
services, make sure that the VDisks containing the data are image-mode and
cache-disabled.
If none of these options are appropriate, follow these steps to move an MDisk to another
cluster (a consolidated CLI sketch follows the steps):
1. Ensure that the VDisk is in image mode rather than striped or sequential mode. An
image-mode MDisk contains only the raw client data and no SVC virtualization metadata.
If you want to move data from a non-image mode VDisk, first use the svctask
migratetoimage command to migrate it to a single image-mode MDisk. For a
Space-Efficient VDisk (SEV), image mode means that all metadata for the VDisk is
present on the same MDisk as the client data; a host cannot read this MDisk directly, but
another SVC cluster can import it.
2. Remove the image-mode VDisk from the first cluster using the svctask rmvdisk
command.
Note: You must not use the -force option of the rmvdisk command. If you use the
-force option, data in cache will not be written to the disk, which might result in
metadata corruption for an SEV.
3. Use svcinfo lsvdisk to check that the VDisk is no longer displayed. You must wait until
it is removed so that cached data can destage to disk.
4. Change the back-end storage LUN mappings to prevent the source SVC cluster from
seeing the disk, and then make it available to the target cluster.
5. Perform an svctask detectmdisk command on the target cluster.
6. Import the MDisk to the target cluster. If it is not an SEV, you will use the svctask mkvdisk
command with the -image option. If it is an SEV, you will also need to use two other
options:
– -import instructs the SVC to look for SEV metadata on the specified MDisk.
– -rsize indicates that the disk is Space-Efficient. The value given to -rsize must be at
least the amount of space that the source cluster used on the Space-Efficient VDisk. If
it is smaller, an 1862 error will be logged. In this case, delete the VDisk and try the
mkvdisk command again.
7. The VDisk is now online. If it is not, and the VDisk is Space-Efficient, check the SVC error
log for an 1862 error; if an 1862 error is present, it will indicate why the VDisk import failed
(for example, metadata corruption). You might then be able to use the svctask
repairsevdisk command to correct the problem.
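The following consolidated sketch summarizes the CLI portion of these steps, assuming illustrative names (appvdisk, mdisk20, and MDG_IMAGE); verify the syntax against your SVC code level.
On the source cluster:
svctask migratetoimage -vdisk appvdisk -mdisk mdisk20 -mdiskgrp MDG_IMAGE
svctask rmvdisk appvdisk
svcinfo lsvdisk
After the back-end LUN has been remapped, on the target cluster:
svctask detectmdisk
svctask mkvdisk -mdiskgrp MDG_IMAGE -iogrp 0 -vtype image -mdisk mdisk20 -name appvdisk
For a Space-Efficient VDisk, add the -import and -rsize options to the mkvdisk command as described in step 6.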
Chapter 6. Managed disk groups
In this chapter, we describe aspects to consider when planning Managed Disk Groups
(MDGs) for an IBM System Storage SAN Volume Controller (SVC) implementation. We
discuss the following areas:
 Availability considerations for MDGs
 Selecting the number of volumes, or Logical Unit Numbers (LUNs), per storage subsystem
array
 Selecting the number of storage subsystem arrays per MDG
 Striping compared to sequential mode VDisks
 Selecting storage subsystems
6.1 Availability considerations for MDGs
While the SVC itself provides many advantages through the consolidation of storage, it is
important to understand the availability implications that storage subsystem failures can have
on availability domains within the SVC cluster.
In this section, we point out that while the SVC offers significant performance benefits through
its ability to stripe across back-end storage volumes, it is also worthwhile considering the
effects that various configurations will have on availability.
Note: Configurations designed with performance in mind tend to also offer the most in
terms of ease of use from a storage management perspective, because they encourage
greater numbers of resources within the same MDG.
When selecting Managed Disks (MDisks) for an MDG, performance is often the primary
consideration. While performance is nearly always important, there are many instances
where the availability of the configuration is traded for little or no performance gain. A
performance-optimized configuration consists of including MDisks from multiple arrays (and
possibly multiple storage subsystems/controllers) within the same MDG. With large array
sizes, including MDisks from multiple arrays in the same MDG typically requires configuring
each array into multiple LUNs and assigning those LUNs as MDisks to multiple MDGs.
These types of configurations have an availability cost associated with them and
might not yield the performance benefits that you intend.
Note: Increasing the performance “potential” of an MDG does not necessarily equate to a
gain in application performance.
Well-designed MDGs balance the required performance and storage management objectives
against availability, and therefore, all three objectives must be considered during the planning
phase.
Remember that the SVC must take the whole MDG offline if a single MDisk in that MDG
goes offline, so the number of storage subsystem arrays per MDG has an impact on
availability. For example, suppose that you have 40 arrays of 1 TB each for a total capacity
of 40 TB. With all 40 arrays placed in the same MDG, the entire 40 TB of capacity is at risk if
one of the 40 arrays fails and causes an MDisk to go offline. If you instead spread the 40
arrays over a larger number of MDGs, an array failure affects less storage capacity, thus
limiting the failure domain, provided that MDisks from a given array are all assigned to the
same MDG. If MDisks from a given array are not all assigned to the same MDG, a single
array failure impacts all MDGs in which that array resides, and therefore, the failure domain
expands to multiple MDGs.
The following best practices focus on availability and not on performance, so there are valid
reasons why these best practices do not fit in all cases. As is always the case, consider
performance in terms of specific application workload characteristics and requirements.
Best practices for availability:
 Each storage subsystem must be used with only a single SVC cluster.
 Each array must be included in only one MDG.
 Each MDG must only contain MDisks from a single storage subsystem.
 Each MDG must contain MDisks from no more than approximately 10 storage
subsystem arrays.
In the following sections, we examine the effects of these best practices on performance.
6.1.1 Performance considerations
Most applications meet performance objectives when average response times for random I/O
are in the 2 - 15 millisecond range; however, there are response-time sensitive applications
(typically transaction-oriented) that cannot tolerate maximum response times of more than a
few milliseconds. You must consider availability in the design of these applications; however,
be careful to ensure that sufficient back-end storage subsystem capacity is available to
prevent elevated maximum response times.
Note: We recommend that you use the Disk Magic™ application to size the performance
demand for specific workloads. You can obtain a copy of Disk Magic, which can assist you
with this effort, from:
http://www.intellimagic.net
Considering application boundaries and dependencies
Reducing hardware failure boundaries for back-end storage is only part of what you must
consider. When determining MDG layout, you also need to consider application boundaries
and dependencies in order to identify any availability benefits that one configuration might
have over another configuration.
Recognize that reducing hardware failure boundaries is not always advantageous from an
application perspective. For instance, when an application uses multiple VDisks from an
MDG, there is no advantage to splitting those VDisks between multiple MDGs, because the
loss of either of the MDGs results in an application outage. However, if an SVC cluster is
serving storage for multiple applications, there might be an advantage to having several
applications continue uninterrupted while an application outage has occurred on other
applications. It is the latter scenario that places the most emphasis on availability when
planning the MDG layout.
6.1.2 Selecting the MDisk Group
You can use the SVC to create tiers of storage, each with different performance
characteristics, by including only MDisks that have the same performance characteristics
within an MDG. So, if your storage infrastructure has, for example, three classes of storage,
you create each VDisk from the MDG whose class of storage most closely matches the
VDisk’s expected performance characteristics.
Because migrating between storage pools, or rather MDGs, is non-disruptive to the users, it is
an easy task to migrate a VDisk to another storage pool, if the actual performance is different
than expected.
Note: If there is uncertainty about in which storage pool (MDG) to create a VDisk, initially
use the pool with the lowest performance and then move the VDisk up to a higher
performing pool later if required.
Batch and OLTP workloads
Clients often want to know whether to mix their batch and online transaction processing
(OLTP) workloads in the same MDG. Batch and OLTP workloads might both require the same
tier of storage, but in many SVC installations, there are multiple MDGs in the same storage
tier so that the workloads can be separated.
We usually recommend mixing workloads so that the maximum resources are available to
any workload when needed. However, batch workloads are a good example of the opposing
point of view. There is a fundamental problem with letting batch and online work share
resources: the amount of I/O resources that a batch job can consume is often limited only by
the amount of I/O resources available.
To address this problem, it obviously can help to segregate the batch workload to its own
MDG, but segregating the batch workload to its own MDG does not necessarily prevent node
or path resources from being overrun. Those resources might also need to be considered if
you implement a policy of batch isolation.
For SVC, an interesting alternative is to cap the data rate at which batch volumes are allowed
to run by limiting the maximum throughput of a VDisk; refer to 7.3.6, “Governing of VDisks” on
page 130. Capping the data rate at which batch volumes are allowed to run can potentially let
online work benefit from periods when the batch load is light while limiting the damage when
the batch load is heavy.
A lot depends on the timing of when the workloads will run. If you have mainly OLTP during
the day shift and the batch workloads run at night, there is normally no problem with mixing
the workloads in the same MDG. But if you run the two workloads concurrently and the batch
workload runs with no cap or throttling and requires high levels of I/O throughput, we
recommend that wherever possible, the workloads are segregated onto different MDGs that
are supported by different back-end storage resources.
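As an illustration of the capping approach (refer to 7.3.6, “Governing of VDisks” on page 130 for the full discussion), the I/O governing rate of a batch VDisk can be set from the CLI. The VDisk name and rate here are examples only, and the exact parameters depend on your code level:
svctask chvdisk -rate 200 -unitmb batch_vdisk01
This limits the VDisk to approximately 200 MBps; omitting -unitmb expresses the rate in I/Os per second instead.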
6.2 Selecting the number of LUNs per array
We generally recommend that you configure LUNs to use the entire array, which is especially
true for midrange storage subsystems, where multiple LUNs configured on an array have
been shown to result in significant performance degradation. The performance degradation is
attributed mainly to smaller cache sizes and the inefficient use of available cache, defeating
the subsystem’s ability to perform “full stride writes” for Redundant Array of Independent
Disks 5 (RAID 5) arrays. Additionally, I/O queues for multiple LUNs directed at the same array
can have a tendency to overdrive the array.
Higher end storage controllers, such as the IBM System Storage DS8000 series, make this
much less of an issue through the use of large cache sizes. However, large array sizes might
require that multiple LUNs are created due to LUN size limitations. Later, we examine the
performance implications of having more than one LUN per array on a DS8000 storage
subsystem. However, on higher end storage controllers, most workloads show the difference
between a single LUN per array compared to multiple LUNs per array to be negligible.
Consider the manageability aspects of creating multiple LUNs per array configurations. Be
careful in regard to the placement of these LUNs so that you do not create conditions where
over-driving an array can occur. Additionally, placing these LUNs in multiple MDGs expands
failure domains considerably as we discussed in 6.1, “Availability considerations for MDGs”
on page 102.
Table 6-1 provides our recommended guidelines for array provisioning on IBM storage
subsystems.
Table 6-1   Array provisioning

Controller type                          LUNs per array
IBM System Storage DS4000                1
IBM System Storage DS6000                1
IBM System Storage DS8000                1-2
IBM Enterprise Storage Server (ESS)      1-2
6.2.1 Performance comparison of one LUN compared to two LUNs per array
The following example shows a comparison between one LUN per array as opposed to two
LUNs per array, using DS8000 arrays. Because any performance benefit relies on both LUNs
within an array being evenly loaded, this comparison was performed by placing both LUNs of
each array within the same MDG. Testing was performed on two MDGs with eight MDisks per
MDG. Table 6-2 shows the MDG layout for Config1 with two LUNs per array, and Table 6-3 on
page 106 shows the MDG layout for Config2 with a single LUN per array.
Table 6-2   Two LUNs per array

DS8000 array    LUN1    LUN2
Array1          MDG1    MDG1
Array2          MDG1    MDG1
Array3          MDG1    MDG1
Array4          MDG1    MDG1
Array5          MDG2    MDG2
Array6          MDG2    MDG2
Array7          MDG2    MDG2
Array8          MDG2    MDG2
Table 6-3   One LUN per array

DS8000 array    LUN1
Array1          MDG1
Array2          MDG1
Array3          MDG1
Array4          MDG1
Array5          MDG2
Array6          MDG2
Array7          MDG2
Array8          MDG2
We performed testing using a four node SVC cluster with two I/O Groups and eight VDisks
per MDG.
The following workloads were used in the testing:
 Ran-R/W-50/50-0%CH
 Seq-R/W-50/50-25%CH
 Seq-R/W-50/50-0%CH
 Ran-R/W-70/30-25%CH
 Ran-R/W-50/50-25%CH
 Ran-R/W-70/30-0%CH
 Seq-R/W-70/30-25%CH
 Seq-R/W-70/30-0%CH
Note: Ran=Random, Seq=Sequential, R/W= Read/Write, and CH=Cache Hit (25%CH
means that 25% of all I/Os are read cache hits)
We collected the following performance metrics for a single MDG using IBM TotalStorage
Productivity Center (TPC). Figure 6-1 on page 107 and Figure 6-2 on page 108 show the I/Os
per second (IOPS) and response time comparisons between Config1 (two LUNs per array)
and Config2 (one LUN per array).
Figure 6-1 IOPS comparison between two LUNs per array and one LUN per array
Figure 6-2 Response time comparison between two LUNs per array and one LUN per array
The test shows a small response time advantage to the two LUNs per array configuration and
a small IOPS advantage to the one LUN per array configuration for sequential workloads.
Overall, the performance differences between these configurations are minimal.
6.3 Selecting the number of arrays per MDG
The capability to stripe across disk arrays is the single most important performance
advantage of the SVC; however, striping across more arrays is not necessarily better. The
objective here is to only add as many arrays to a single MDG as required to meet the
performance objectives. Because it is usually difficult to determine what is required in terms
of performance, the tendency is to add far too many arrays to a single MDG, which again
increases the failure domain as we discussed previously in 6.1, “Availability considerations for
MDGs” on page 102.
It is also worthwhile to consider the effect of aggregate load across multiple MDGs. It is clear
that striping workload across multiple arrays has a positive effect on performance when you
are talking about dedicated resources, but the performance gains diminish as the aggregate
load increases across all available arrays. For example, if you have a total of eight arrays and
are striping across all eight arrays, your performance is much better than if you were striping
across only four arrays. However, if the eight arrays are divided into two LUNs each and are
also included in another MDG, the performance advantage drops as the load of MDG2
approaches that of MDG1, which means that when workload is spread evenly across all
MDGs, there will be no difference in performance.
More arrays in the MDG have more of an effect with lower performing storage controllers. So,
for example, we require fewer arrays from a DS8000 than we do from a DS4000 to achieve
the same performance objectives. Table 6-4 on page 109 shows the recommended number of
arrays per MDG that is appropriate for general cases. Again, when it comes to performance,
there can always be exceptions.
Table 6-4   Recommended number of arrays per MDG

Controller type    Arrays per MDG
DS4000             4 - 24
ESS/DS8000         4 - 12
DA pair considerations for selecting ESS and DS8000 arrays
The ESS and DS8000 storage architectures both access disks through pairs of device
adapters (DA pairs) with one adapter in each storage subsystem controller. The ESS
contains four DA pairs and the DS8000 scales from two to eight DA pairs. When possible,
consider adding arrays to MDGs based on multiples of the installed DA pairs. For example, if
the storage controller contains six DA pairs, use either six or 12 arrays in an MDG with arrays
from all DA pairs in a given MDG.
Performance comparison of reducing the number of arrays per MDG
The following test compares the performance between eight arrays per MDG and four arrays
per MDG. The configuration with eight arrays per MDG represents a performance-optimized
configuration, and the four arrays per MDG configuration represents a configuration that has
better availability characteristics.
We performed testing on the following configuration:
 There are eight ranks from a DS8000.
 Each rank is configured as one RAID 5 array.
 Each RAID 5 array is divided into four LUNs.
 Four MDGs are configured.
 Each MDG uses one LUN (MDisk) from each of the eight arrays.
 The VDisks are created in sequential mode.
The array to MDisk mapping for this configuration is represented in Table 6-5.
Table 6-5   Configuration one: Each array is contained in four MDGs

DS8000 array    LUN1    LUN2    LUN3    LUN4
Array1          MDG1    MDG2    MDG3    MDG4
Array2          MDG1    MDG2    MDG3    MDG4
Array3          MDG1    MDG2    MDG3    MDG4
Array4          MDG1    MDG2    MDG3    MDG4
Array5          MDG1    MDG2    MDG3    MDG4
Array6          MDG1    MDG2    MDG3    MDG4
Array7          MDG1    MDG2    MDG3    MDG4
Array8          MDG1    MDG2    MDG3    MDG4
You can see from this design that if a single array fails, all four MDGs are affected, and all
SVC VDisks that are using storage from this DS8000 fail.
Table 6-6 shows an alternative to this configuration. Here, the arrays are divided into two
LUNs each, and there are half the number of arrays for each MDG as there were in the first
configuration. In this design, the failure boundary of an array failure is cut in half, because any
single array failure only affects half of the MDGs.
Table 6-6   Configuration two: Each array is contained in two MDGs

DS8000 array    LUN1    LUN2
Array1          MDG1    MDG3
Array2          MDG1    MDG3
Array3          MDG1    MDG3
Array4          MDG1    MDG3
Array5          MDG2    MDG4
Array6          MDG2    MDG4
Array7          MDG2    MDG4
Array8          MDG2    MDG4
We collected the following performance metrics using TPC to compare these configurations.
The first test was performed with all four MDGs evenly loaded. Figure 6-3 on page 111 and
Figure 6-4 on page 112 show the IOPS and response time comparisons between Config1
(four LUNs per array) and Config2 (two LUNs per array) for varying workloads.
Figure 6-3 IOPS comparison of eight arrays/MDG and four arrays/MDG with all four MDGs active
Figure 6-4 Response time comparison between eight and four arrays/MDG with all four MDGs active
This test shows virtually no difference between using eight arrays per MDG compared to
using four arrays per MDG, when all MDGs are evenly loaded (with the exception of a small
advantage in IOPS for the eight array MDG for sequential workloads).
We performed two additional tests to show the potential effect when MDGs are not loaded
evenly. We performed the first test using only one of the four MDGs, while the other three
MDGs remained idle. This test presents the worst case scenario, because the eight array
MDG has the fully dedicated bandwidth of all eight arrays available to it, and therefore,
halving the number of arrays has a pronounced effect. This test tends to be an unrealistic
scenario, because it is unlikely that all host workload will be directed at a single MDG.
Figure 6-5 on page 113 shows the IOPS comparison between these configurations.
Figure 6-5 IOPS comparison between eight and four arrays/MDG with a single MDG active
We performed the second test with I/O running to only two of the four MDGs, which is shown
in Figure 6-6 on page 114.
Figure 6-6 IOPS comparison between eight arrays/MDG and four arrays/MDG with two MDGs active
Figure 6-6 shows the results from the test where only two of the four MDGs are loaded. This
test shows no difference between the eight arrays per MDG configuration and the four arrays
per MDG configuration for random workload. This test shows a small advantage to the eight
arrays per MDG configuration for sequential workloads.
Our conclusions are:
 The performance advantage with striping across a larger number of arrays is not as
pronounced as you might expect.
 You must consider the number of MDisks per array along with the number of arrays per
MDG to understand aggregate MDG loading effects.
 You can achieve availability improvements without compromising performance objectives.
6.4 Striping compared to sequential type
With extremely few exceptions, you must always configure VDisks using striping.
However, one exception to this rule is an environment where you have a 100% sequential
workload where disk loading across all VDisks is guaranteed to be balanced by the nature of
the application. For example, specialized video streaming applications are exceptions to this
rule. Another exception to this rule is an environment where there is a high dependency on a
large number of flash copies. In this case, FlashCopy loads the VDisks evenly and the
sequential I/O, which is generated by the flash copies, has higher throughput potential than
what is possible with striping. This situation is a rare exception given the unlikely requirement
to optimize for FlashCopy as opposed to online workload.
Note: Electing to use sequential type over striping requires a detailed understanding of the
data layout and workload characteristics in order to avoid negatively impacting system
performance.
6.5 SVC cache partitioning
In a situation where more I/O is driven to an SVC node than can be sustained by the
back-end storage, the SVC cache can become exhausted. This situation can happen even if
only one storage controller is struggling to cope with the I/O load, but it impacts traffic to
others as well. To avoid this situation, SVC cache partitioning provides a mechanism to
protect the SVC cache from not only overloaded controllers, but also misbehaving controllers.
The SVC cache partitioning function is implemented on a per Managed Disk Group (MDG)
basis. That is, the cache automatically partitions the available resources on a per MDG basis.
The overall strategy is to protect the individual controller from overloading or faults. If many
controllers (or in this case, MDGs) are overloaded, the overall cache can still suffer.
Table 7 shows the upper limit of write cache data that any one partition, or MDG, can occupy.
Table 7   Upper limit of write cache data

Number of MDGs    Upper limit
1                 100%
2                 66%
3                 40%
4                 30%
5 or more         25%
The effect of the SVC cache partitioning is that no single MDG occupies more than its upper
limit of cache capacity with write data. Upper limits are the point at which the SVC cache
starts to limit incoming I/O rates for VDisks created from the MDG.
If a particular MDG reaches its upper limit, it experiences the same result as a full global
cache resource. That is, host writes are serviced on a one-out, one-in basis as the cache
destages writes to the back-end storage. However, only writes targeted at the full MDG are
limited; all I/O destined for other (non-limited) MDGs continues normally.
Read I/O requests for the limited MDG also continue normally. However, because the SVC is
destaging write data at a rate greater than the controller can actually sustain (otherwise, the
partition would not have reached the upper limit), reads are likely to be serviced just as slowly.
The main thing to remember is that the partitioning only limits write I/Os. In general, a
70/30 or 50/50 ratio of read to write operations is observed. Of course, there are applications,
or workloads, that perform 100% writes; however, write cache hits are much less of a benefit
than read cache hits. A write always hits the cache. If modified data already resides in the
cache, it is overwritten, which might save a single destage operation. However, read cache
hits provide a much more noticeable benefit, saving seek and latency time at the disk layer.
In all benchmarking tests performed, even with single active MDGs, good path SVC I/O group
throughput remains the same as it was before the introduction of SVC cache partitioning.
For in-depth information about SVC cache partitioning, we recommend the following IBM
Redpaper publication:
 IBM SAN Volume Controller 4.2.1 Cache Partitioning, REDP-4426-00
6.6 SVC quorum disk considerations
When back-end storage is initially added to an SVC cluster as an MDG, three quorum disks
are automatically created by allocating space from the assigned MDisks. As more back-end
storage controllers (and therefore MDGs) are added to the SVC cluster, the quorum disks do
not get reallocated to span multiple back-end storage subsystems. To eliminate a situation
where all quorum disks go offline due to a back-end storage subsystem failure, we
recommend allocating quorum disks on multiple back-end storage subsystems. This design is
of course only possible when multiple back-end storage subsystems (and therefore multiple
MDGs) are available.
Even when there is only a single storage subsystem with multiple MDGs created from it, the
quorum disks must be allocated from several MDGs to avoid an array failure causing the loss
of quorum. Reallocating quorum disks can be done from either the SVC Console or from
the SVC command line interface (CLI). The SVC CLI command to use is:
svctask setquorum -quorum <quorum id> <mdisk_id>
In this command:
 The <quorum id> represents the quorum disk number and can have a value of 0, 1, or 2.
 The <mdisk_id> is the MDisk from where the quorum disk must now allocate space. The
specified MDisk must be assigned to the desired MDG, and free space (256 MB or one
extent, whichever is larger) must be available in the MDG.
To check if a specific MDisk is used as a quorum disk, the following SVC CLI command can
be used:
svcinfo lsmdisk <mdisk_id>
If this command shows a non-blank quorum-index value, the MDisk is used as a quorum disk.
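For example, to confirm whether a candidate MDisk already holds a quorum disk and then move quorum disk 2 onto it (the MDisk name is illustrative):
svcinfo lsmdisk mdisk10
svctask setquorum -quorum 2 mdisk10
Running svcinfo lsmdisk again for the affected MDisks confirms that the quorum disks are now spread across MDGs on different back-end controllers or arrays.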
6.7 Selecting storage subsystems
When selecting storage subsystems, the decision generally comes down to the ability of the
storage subsystem to meet the availability objectives of the applications. Because the SVC
does not provide any data redundancy, the availability characteristics of the storage
subsystems’ controllers have the most impact on the overall availability of the data virtualized
by the SVC.
Performance becomes less of a determining factor due to the SVC’s ability to use various
storage subsystems, regardless of whether they scale up or scale out. For example, the
DS8000 is a scale-up architecture that delivers “best of breed” performance per unit, and the
DS4000 can be scaled out with enough units to deliver the same performance. Because the
SVC hides the scaling characteristics of the storage subsystems, the inherent performance
characteristics of the storage subsystems tend not to be a direct determining factor.
A significant consideration when comparing native performance characteristics between
storage subsystem types is the amount of scaling that is required to meet the performance
objectives. While lower performing subsystems can typically be scaled to meet performance
objectives, the additional hardware that is required lowers the availability characteristics of
the SVC cluster. Remember that all storage subsystems possess an inherent failure rate, and
therefore, the failure rate of an MDG becomes the failure rate of the storage subsystem times
the number of units.
Of course, there might be other factors that lead you to select one storage subsystem over
another storage subsystem, such as utilizing available resources or a requirement for
additional features and functions, such as the System z® attach capability.
Chapter 7. VDisks
In this chapter, we show the new features of SVC Version 4.3.0 and discuss Virtual Disks
(VDisks). We describe creating them, managing them, and migrating them across I/O
Groups.
We then discuss VDisk performance and how you can use TotalStorage Productivity Center
(TPC) to analyze performance and to help guide you to possible solutions.
7.1 New features in SVC Version 4.3.0
In this section, we highlight the following new VDisk features and details for performance
enhancement:
 Space-Efficient VDisks
 VDisk mirroring
7.1.1 Real and virtual capacities
One feature of SVC Version 4.3.0 is the Space-Efficient VDisk (SE VDisk). You can configure
a VDisk to either be “Space-Efficient” or “Fully Allocated.” The SE VDisks are created with
different capacities: real and virtual capacities. You can still create VDisks in striped,
sequential, or image mode virtualization policy, just as you can any other VDisk.
The real capacity defines how much disk space is actually allocated to a VDisk. The virtual
capacity is the capacity of the VDisk that is reported to other SVC components (for example,
FlashCopy or Remote Copy) and to the hosts.
A directory maps the virtual address space to the real address space. The directory and the
user data share the real capacity.
There are two operating modes for SE VDisks. An SE VDisk can be configured to be
Auto-Expand or not. If you select the Auto-Expand operating mode, the SVC automatically
expands the real capacity of the SE VDisk. The mode of the respective SE VDisk can be
switched at any time.
7.1.2 Space allocation
As mentioned, when an SE VDisk is initially created, a small amount of the real capacity is
used for initial metadata. Write I/Os to grains of the SE VDisk that have not previously been
written to cause grains of the real capacity to be used to store metadata and user data.
Write I/Os to grains that have previously been written to simply update the grain where the
data was previously written.
Note: The grain is defined when the VDisk is created and can be 32 KB, 64 KB, 128 KB, or
256 KB.
Smaller granularities can save more space, but they have larger directories. When you use
SE with FlashCopy (FC), specify the same grain size for both SE and FC. For more details
about SEFC, refer to 8.1.6, “Space-Efficient FlashCopy (SEFC)” on page 159.
7.1.3 Space-Efficient VDisk performance
SE VDisks require more I/Os because of the directory accesses:
 For truly random workloads, an SE VDisk requires approximately one directory I/O for
every user I/O, so performance will be 50% of a normal VDisk.
 The directory is 2-way write-back cached (just like the SVC fastwrite cache), so certain
applications perform better.
 SE VDisks require more CPU processing, so the performance per I/O group will be lower.
You need to use the striping policy in order to spread SE VDisks across many MDisks.
Important: Do not use SE VDisks where high I/O performance is required.
SE VDisks only save capacity if the host server does not write to the whole VDisk. Whether
the Space-Efficient VDisk works well is partly dependent on how the filesystem allocates the
space:
 Certain filesystems (for example, NTFS (NT File System)) will write to the whole VDisk
before overwriting deleted files, while other filesystems will reuse space in preference to
allocating new space.
 Filesystem problems can be moderated by tools, such as “defrag” or by managing storage
using host Logical Volume Managers (LVMs).
The SE VDisk is also dependent on how applications use the filesystem, for example, certain
applications only delete log files when the filesystem is nearly full.
Note: There is no single best practice recommendation for SEV performance. As already
explained, it depends on how the particular environment uses the VDisk. For the absolute
best performance, use fully allocated VDisks instead of SE VDisks.
7.1.4 Testing an application with Space-Efficient VDisk
To help you understand what works in combination with SE VDisks, perform this test (a command sketch follows these steps):
1. Create an SE VDisk with Auto-Expand turned off.
2. Test the application.
3. If the application and SE do not work well, the VDisk will fill up and in the worst case, it will
go offline.
4. If the application and SE do work well, the VDisk will not fill up and will remain online.
5. You can configure warnings and also monitor how much capacity is being used.
6. If necessary, the user can expand or shrink the real capacity of the VDisk.
7. When you have determined if the combination of the application and SE works well, you
can enable Auto-Expand.
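A minimal sketch of steps 1, 5, and 7, assuming illustrative names and sizes; auto-expand is off in the first command simply because the -autoexpand flag is omitted, and the exact parameters depend on your code level:
svctask mkvdisk -mdiskgrp MDG1 -iogrp 0 -size 100 -unit gb -rsize 20% -grainsize 32 -warning 80% -name se_test
svctask chvdisk -autoexpand on se_test
The -warning value raises an event when the used real capacity crosses the threshold, and enabling -autoexpand afterwards lets the SVC grow the real capacity automatically once the application has been shown to work well with SE.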
7.1.5 What is VDisk mirroring
With the VDisk mirroring feature, we can now create a VDisk with one or two copies. These
copies can be in the same MDisk Group or in different MDisk Groups (even MDisk Groups
with different extent sizes). The first MDisk Group that is specified contains the “primary” copy.
If a VDisk is created with two copies, both copies use the same virtualization policy, just as
any other VDisk. But there is also a way to have two copies of a VDisk with different
virtualization policies. In combination with space efficiency, each mirror of a VDisk can be
Space-Efficient or fully allocated and in striped, sequential, or image mode.
A mirrored VDisk has all of the capabilities of a VDisk and also the same restrictions as a
VDisk (for example, a mirrored VDisk is owned by an I/O Group, just as any other VDisk).
This feature also provides a point-in-time copy functionality that is achieved by “splitting” a
copy from the VDisk.
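A minimal creation sketch, with illustrative MDisk Group names, I/O Group, and size (verify the parameters against your code level), creating a VDisk with two copies in two different MDisk Groups, where the first group listed holds the primary copy:
svctask mkvdisk -mdiskgrp MDG_A:MDG_B -iogrp 0 -size 100 -unit gb -copies 2 -name mirrored_vd01
Each copy can later be examined with svcinfo lsvdiskcopy mirrored_vd01.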
7.1.6 Creating or adding a mirrored VDisk
When a mirrored VDisk is created and the format has been specified, all copies are formatted
before the VDisk comes online. The copies are then considered synchronized.
Alternatively, with the “no synchronization” option chosen, the mirrored VDisks are not
synchronized.
This might be helpful in these cases:
 If it is known that the already formatted MDisk space will be used for mirrored VDisks.
 If it is not required that the copies be synchronized.
7.1.7 Availability of mirrored VDisks
VDisk mirroring provides low-level Redundant Array of Independent Disks 1 (RAID 1)
mirroring to protect against controller and MDisk Group failure, because it allows you to
create a VDisk with two copies in different MDisk Groups. If one storage controller
or MDisk Group fails, a VDisk copy is not affected if it has been placed on a different storage
controller or in a different MDisk Group.
For FlashCopy usage, a mirrored VDisk is only online to other nodes if it is online in its own
I/O Group and if the other nodes have visibility to the same copies as the nodes in the I/O
Group. If a mirrored VDisk is a source VDisk in a FlashCopy relationship, asymmetric path
failures or a failure of the mirrored VDisk’s I/O Group can cause the target VDisk to be taken
offline.
7.1.8 Mirroring between controllers
As mentioned, one advantage of mirrored VDisks is having the VDisk copies on different
storage controllers/MDisk Groups. Normally, the read I/O is directed to the primary copy, but
the primary copy must be available and synchronized. The location of the primary copy can
be selected at its creation, but the location can also be changed later.
Important: For the best practice and best performance, put all the primary mirrored
VDisks on the same storage controller, or you might see a performance impact. Selecting
the copy that is allocated on the higher performance storage controller will maximize the
read performance of the VDisk.
The write performance will be constrained by the lower performance controller, because
writes must complete to both copies before the VDisk is considered to have been written
successfully.
7.2 Creating VDisks
IBM System Storage SAN Volume Controller, SG24-6423-06, fully describes the creation of
VDisks.
The best practices that we strongly recommend are:
 Decide on your naming convention before you begin. It is much easier to assign the
correct names at the time of VDisk creation than to modify them afterwards. If you do need
to change the VDisk name, use the svctask chvdisk command (refer to Example 7-1).
This command changes the name of the VDisk Test_0 to Test_1.
Example 7-1 The svctask chvdisk command
IBM_2145:itsosvccl1:admin>svctask chvdisk -name Test_1 Test_0
 Balance the VDisks across the I/O Groups in the cluster to balance the load across the
cluster. At the time of VDisk creation, the workload to be put on the VDisk might not be
known. In this case, if you are using the GUI, accept the system default of load balancing
allocation. Using the command line interface (CLI), you must manually specify the I/O
Group. In configurations with large numbers of attached hosts where it is not possible to
zone a host to multiple I/O Groups, it might not be possible to choose to which I/O Group
to attach the VDisks. The VDisk has to be created in the I/O Group to which its host
belongs. For moving a VDisk across I/O Groups, refer to 7.2.3, “Moving a VDisk to another
I/O Group” on page 125.
Note: Migrating VDisks across I/O Groups is a disruptive action. Therefore, it is best to
specify the correct I/O Group at the time of VDisk creation.
 By default, the preferred node, which owns a VDisk within an I/O Group, is selected on a
load balancing basis. At the time of VDisk creation, the workload to be put on the VDisk
might not be known. But it is important to distribute the workload evenly on the SVC nodes
within an I/O Group. The preferred node cannot easily be changed. If you need to change
the preferred node, refer to 7.2.2, “Changing the preferred node within an I/O Group” on
page 124.
 The maximum number of VDisks per I/O Group is 2048.
 The maximum number of VDisks per cluster is 8192 (eight node cluster).
 The smaller the extent size that you select, the finer the granularity of the space that the
VDisk occupies on the underlying storage controller. A VDisk occupies an integer number
of extents, but its length does not need to be an integer multiple of the extent size; it only
needs to be an integer multiple of the block size. Any space left over between the last
logical block in the VDisk and the end of the last extent in the VDisk is unused. A small
extent size minimizes this unused space. The counter consideration is that the smaller the
extent size, the smaller the total storage capacity that the SVC can virtualize. The extent
size does not affect performance. For most clients, extent sizes of 128 MB or 256 MB give
a reasonable balance between VDisk granularity and cluster capacity. There is no longer
a default value set; the extent size is set during Managed Disk (MDisk) Group creation, as
shown in the sketch after the following Important note.
Important: VDisks can only be migrated between Managed Disk Groups (MDGs) that
have the same extent size, except for mirrored VDisks. The two copies can be in different
MDisk Groups with different extent sizes.
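Because the extent size is fixed when the MDisk Group is created, it is worth choosing it deliberately at that point. A minimal sketch with illustrative names and a 256 MB extent size:
svctask mkmdiskgrp -name MDG_tier1 -ext 256 -mdisk mdisk1:mdisk2:mdisk3
All VDisks subsequently created in MDG_tier1 use 256 MB extents.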
As mentioned in the first section of this chapter, a VDisk can be created as Space-Efficient or
fully allocated, in one of these three modes: striped, sequential, or image and with one or two
copies (VDisk mirroring).
With extremely few exceptions, you must always configure VDisks using striping mode.
Note: Electing to use sequential mode over striping requires a detailed understanding of
the data layout and workload characteristics in order to avoid negatively impacting the
system performance.
7.2.1 Selecting the MDisk Group
As discussed in 6.1.2, “Selecting the MDisk Group” on page 103, you can use the SVC to
create tiers (each one with different performance characteristics) of storage.
7.2.2 Changing the preferred node within an I/O Group
The plan is to simplify changing the preferred node within an I/O Group in a future release of
the SVC code so that a single SVC command can make the change. Currently, no
non-disruptive or easy method exists to change the preferred node within an I/O Group.
There are three alternative techniques that you can use; they are all disruptive to the host to
which the VDisk is mapped:
 Migrate the VDisk out of the SVC as an image mode-managed disk (MDisk) and then
import it back as an image mode VDisk. Make sure that you select the correct preferred
node. The required steps are:
a. Migrate the VDisk to an image mode VDisk.
b. Cease I/O operations to the VDisk.
c. Disconnect the VDisk from the host operating system. For example, in Windows,
remove the drive letter.
d. On the SVC, unmap the VDisk from the host.
e. Delete the image mode VDisk, which removes the VDisk from the MDG.
f. Add the image mode MDisk back into the SVC as an image mode VDisk, selecting the
preferred node that you want.
g. Resume I/O operations on the host.
h. You can now migrate the image mode VDisk to a regular VDisk.
 If remote copy services are enabled on the SVC, perform an intra-cluster Metro Mirror to a
target VDisk with the preferred node that you want. At a suitable opportunity:
a. Cease I/O to the VDisk.
b. Flush the host buffers.
c. Stop copy services and end the copy services relationship.
d. Unmap the original VDisk from the host.
e. Map the target VDisk to the host.
f. Resume I/O operations.
 FlashCopy the VDisk to a target VDisk in the same I/O Group with the preferred node that
you want, using the auto-delete option. The steps to follow are:
a. Cease I/O to the VDisk.
b. Start FlashCopy.
c. When the FlashCopy completes, unmap the source VDisk from the host.
d. Map the target VDisk to the host.
e. Resume I/O operations.
f. Delete the source VDisk.
There is a fourth, non-SVC method of changing the preferred node within an I/O Group if the
host operating system or logical volume manager supports disk mirroring. The steps are:
1. Create a VDisk, the same size as the existing one, on the desired preferred node.
2. Mirror the data to this VDisk using host-based logical volume mirroring.
3. Remove the original VDisk from the Logical Volume Manager (LVM).
7.2.3 Moving a VDisk to another I/O Group
The procedure of migrating a VDisk between I/O Groups is disruptive, because access to the
VDisk is lost. If a VDisk is moved between I/O Groups, the path definitions of the VDisks are
not refreshed dynamically. The old IBM Subsystem Device Driver (SDD) paths must be
removed and replaced with the new ones.
The best practice is to migrate VDisks between I/O Groups with the hosts shut down. Then,
follow the procedure listed in 9.2, “Host pathing” on page 183 for the reconfiguration of SVC
VDisks to hosts. We recommend that you remove the stale configuration and reboot the host
in order to reconfigure the VDisks that are mapped to a host.
Ensure that when you migrate a VDisk to a new I/O Group, you quiesce all I/O operations for
the VDisk. Determine the hosts that use this VDisk. Stop or delete any FlashCopy mappings
or Metro/Global Mirror relationships that use this VDisk. To check if the VDisk is part of a
relationship or mapping, issue the svcinfo lsvdisk command that is shown in Example 7-2
where vdiskname/id is the name or ID of the VDisk.
Example 7-2 Output of svcinfo lsvdisk command
IBM_2145:itsosvccl1:admin>svcinfo lsvdisk Image_mode0
id 11
name Image_mode0
IO_group_id 0
IO_group_name PerfBestPrac
status online
mdisk_grp_id 0
mdisk_grp_name itso_ds45_64gb
capacity 18.0GB
type image
formatted no
mdisk_id 10
mdisk_name mdisk10
FC_id
FC_name
RC_id
RC_name
vdisk_UID 60050768018381BF280000000000002A
...
Look for the FC_id and RC_id fields. If these fields are not blank, the VDisk is part of a
mapping or a relationship.
The procedure is:
1. Cease I/O operations to the VDisk.
2. Disconnect the VDisk from the host operating system. For example, in Windows, remove
the drive letter.
3. Stop any copy operations.
4. Issue the command to move the VDisk (refer to Example 7-3). This command does not
work while there is data in the SVC cache that is to be written to the VDisk. After two
minutes, the data automatically destages if no other condition forces an earlier destaging.
5. On the host, rediscover the VDisk. For example in Windows, run a rescan, then either
mount the VDisk or add a drive letter. Refer to Chapter 9, “Hosts” on page 175.
6. Resume copy operations as required.
7. Resume I/O operations on the host.
After any copy relationships are stopped, you can move the VDisk across I/O Groups with a
single command in an SVC:
svctask chvdisk -iogrp newiogrpname/id vdiskname/id
In this command, newiogrpname/id is the name or ID of the I/O Group to which you move the
VDisk and vdiskname/id is the name or ID of the VDisk.
Example 7-3 shows the command to move the VDisk named Image_mode0 from its existing
I/O Group, io_grp1, to PerfBestPrac.
Example 7-3 Command to move a VDisk to another I/O Group
IBM_2145:itsosvccl1:admin>svctask chvdisk -iogrp PerfBestPrac Image_mode0
Migrating VDisks between I/O Groups can be a potential issue if the old definitions of the
VDisks are not removed from the configuration prior to importing the VDisks to the host.
Migrating VDisks between I/O Groups is not a dynamic configuration change. It must be done
with the hosts shut down. Then, follow the procedure listed in Chapter 9, “Hosts” on page 175
for the reconfiguration of SVC VDisks to hosts. We recommend that you remove the stale
configuration and reboot the host to reconfigure the VDisks that are mapped to a host.
For details about how to dynamically reconfigure IBM Subsystem Device Driver (SDD) for the
specific host operating system, refer to Multipath Subsystem Device Driver: User’s Guide,
SC30-4131-01, where this procedure is also described in great depth.
Note: Do not move a VDisk to an offline I/O Group under any circumstances. You must
ensure that the I/O Group is online before moving the VDisks to avoid any data loss.
This command will not work if there is any data in the SVC cache, which has to be flushed out
first. There is a -force flag; however, this flag discards the data in the cache rather than
flushing it to the VDisk. If the command fails due to outstanding I/Os, it is better to wait a
couple of minutes after which the SVC will automatically flush the data to the VDisk.
Note: Using the -force flag can result in data integrity issues.
7.3 VDisk migration
In this section, we discuss the best practices to follow when you perform VDisk migrations.
7.3.1 Migrating with VDisk mirroring
VDisk mirroring offers the facility to migrate VDisks between MDisk Groups with different
extent sizes (a CLI sketch follows these steps):
1. First, add a copy to the target MDisk Group.
2. Wait until the synchronization is complete.
3. Remove the copy in the source MDisk Group.
The migration from a Space-Efficient to a fully allocated VDisk is almost the same:
1. Add a target fully allocated copy.
2. Wait for synchronization to complete.
3. Remove the source Space-Efficient copy.
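A minimal CLI sketch of the first procedure, with illustrative VDisk and MDisk Group names (verify the syntax against your code level):
svctask addvdiskcopy -mdiskgrp MDG_TARGET appvdisk
svcinfo lsvdisksyncprogress appvdisk
svctask rmvdiskcopy -copy 0 appvdisk
The lsvdisksyncprogress output shows when the new copy is fully synchronized; only then is it safe to remove the original copy. The copy ID of 0 is an assumption here, so check svcinfo lsvdiskcopy first.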
7.3.2 Migrating across MDGs
Migrating a VDisk from one MDG to another MDG is non-disruptive to the host application
using the VDisk. Depending on the workload of the SVC, there might be a slight performance
impact. For this reason, we recommend that you migrate a VDisk from one MDG to another
MDG when there is a relatively low load on the SVC.
7.3.3 Image type to striped type migration
When migrating existing storage into the SVC, the existing storage is brought in as image
type VDisks, which means that the VDisk is based on a single MDisk. In general, we
recommend that the VDisk is migrated to a striped type VDisk, which is striped across
multiple MDisks and, therefore, multiple RAID arrays as soon as it is practical. You generally
expect to see a performance improvement by migrating from image type to striped type.
Example 7-4 shows the command. This process is fully described in IBM System Storage
SAN Volume Controller, SG24-6423-06.
Example 7-4 Image mode migration command
IBM_2145:itsosvccl1:svctask migratevdisk -mdiskgrp itso_ds45_64gb -threads 4
-vdisk image_mode0
This command migrates our VDisk, image_mode0, to the MDG, itso_ds45_64gb, and uses four
threads while migrating. Note that instead of using the VDisk name, you can use its ID
number.
7.3.4 Migrating to image type VDisk
An image type VDisk is a direct “straight through” mapping to exactly one image mode MDisk.
If a VDisk is migrated to another MDisk, the VDisk is represented as being in managed mode
during the migration. It is only represented as an image type VDisk after it has reached the
state where it is a straight through mapping.
Image type disks are used to migrate existing data into an SVC and to migrate data out of
virtualization. Image type VDisks cannot be expanded.
The usual reason for migrating a VDisk to an image type VDisk is to move the data on the
disk to a non-virtualized environment. This operation is also carried out to enable you to
change the preferred node that is used by a VDisk. Refer to 7.2.2, “Changing the preferred
node within an I/O Group” on page 124. The procedure of migrating a VDisk to an image type
VDisk is non-disruptive to host I/O.
In order to migrate a striped type VDisk to an image type VDisk, you must be able to migrate
to an available unmanaged MDisk. The destination MDisk must be greater than or equal to
the size of the VDisk. Regardless of the mode in which the VDisk starts, it is reported as
managed mode during the migration. Both of the MDisks involved are reported as being in
image mode during the migration. If the migration is interrupted by a cluster recovery, the
migration will resume after the recovery completes.
You must perform these command line steps:
1. To determine the name of the VDisk to be moved, issue the command:
svcinfo lsvdisk
The output is in the form that is shown in Example 7-5.
Example 7-5 The svcinfo lsvdisk output
IBM_2145:itsosvccl1:admin>svcinfo lsvdisk -delim :
id:name:IO_group_id:IO_group_name:status:mdisk_grp_id:mdisk_grp_name:capacity:t
ype:FC_id:FC_name:RC_id:RC_name:vdisk_UID:fc_map_count
0:diomede0:0:PerfBestPrac:online:1:itso_ds45_18gb:8.0GB:striped:::::60050768018
381BF2800000000000024:0:1
1:diomede1:0:PerfBestPrac:online:1:itso_ds45_18gb:8.0GB:striped:::::60050768018
381BF2800000000000025:0:1
2:diomede2:0:PerfBestPrac:online:1:itso_ds45_18gb:8.0GB:striped:::::60050768018
381BF2800000000000026:0:1
3:vdisk3:0:PerfBestPrac:online:2:itso_smallgrp:500.0MB:striped:::::600507680183
81BF2800000000000009:0:1
4:diomede3:0:PerfBestPrac:online:1:itso_ds45_18gb:8.0GB:striped:::::60050768018
381BF2800000000000027:0:1
5:vdisk5:0:PerfBestPrac:online:2:itso_smallgrp:500.0MB:striped:::::600507680183
81BF280000000000000B:0:1
6:vdisk6:0:PerfBestPrac:online:2:itso_smallgrp:500.0MB:striped:::::600507680183
81BF280000000000000C:0:1
7:siam1:0:PerfBestPrac:online:4:itso_ds47_siam:70.0GB:striped:::::6005076801838
1BF2800000000000016:0:1
8:vdisk8:0:PerfBestPrac:online:many:many:800.0MB:many:::::60050768018381BF28000
00000000013:0:2
9:vdisk9:0:PerfBestPrac:online:2:itso_smallgrp:1.5GB:striped:::::60050768018381
BF2800000000000014:0:1
10:Diomede_striped:0:PerfBestPrac:online:0:itso_ds45_64gb:64.0GB:striped:::::60
050768018381BF2800000000000028:0:1
11:Image_mode0:0:PerfBestPrac:online:0:itso_ds45_64gb:18.0GB:image:::::60050768
018381BF280000000000002A:0:1
12:Test_1:0:PerfBestPrac:online:0:itso_ds45_64gb:8.0GB:striped:::::600507680183
81BF280000000000002B:0:1
2. In order to migrate the VDisk, you need the name of the MDisk to which you will migrate it.
Example 7-6 shows the command that you use.
Example 7-6 The svcinfo lsmdisk command output
IBM_2145:itsosvccl1:admin>svcinfo lsmdisk -delim :
id:name:status:mode:mdisk_grp_id:mdisk_grp_name:capacity:ctrl_LUN_#:controller_
name:UID
0:mdisk0:online:managed:1:itso_ds45_18gb:18.0GB:0000000000000000:itso_ds4500:60
0a0b80001744310000011a4888478c000000000000000000000000
1:mdisk1:online:managed:1:itso_ds45_18gb:18.0GB:0000000000000001:itso_ds4500:60
0a0b80001744310000011948884778000000000000000000000000
2:mdisk2:online:managed:1:itso_ds45_18gb:18.0GB:0000000000000002:itso_ds4500:60
0a0b80001744310000011848884758000000000000000000000000
3:mdisk3:online:managed:1:itso_ds45_18gb:18.0GB:0000000000000003:itso_ds4500:60
0a0b8000174431000001174888473e000000000000000000000000
4:mdisk4:online:managed:1:itso_ds45_18gb:18.0GB:0000000000000004:itso_ds4500:60
0a0b80001744310000011648884726000000000000000000000000
5:mdisk5:online:managed:1:itso_ds45_18gb:18.0GB:0000000000000005:itso_ds4500:60
0a0b8000174431000001154888470c000000000000000000000000
6:mdisk6:online:managed:1:itso_ds45_18gb:18.0GB:0000000000000006:itso_ds4500:60
0a0b800017443100000114488846ec000000000000000000000000
7:mdisk7:online:managed:1:itso_ds45_18gb:18.0GB:0000000000000007:itso_ds4500:60
0a0b800017443100000113488846c0000000000000000000000000
8:mdisk8:online:unmanaged:::64.0GB:0000000000000018:itso_ds4500:600a0b800017443
10000013a48a32b5400000000000000000000000000000000
9:mdisk9:online:unmanaged:::18.0GB:0000000000000008:itso_ds4500:600a0b800017443
10000011b4888aeca00000000000000000000000000000000
...
From this command, we can see that mdisk8 and mdisk9 are candidates for the image
type migration, because they are unmanaged.
3. We now have enough information to enter the command to migrate the VDisk to image
type, and you can see the command in Example 7-7.
Example 7-7 The migratetoimage command
IBM_2145:itsosvccl1:admin>svctask migratetoimage -vdisk Test_1 -threads 4
-mdisk mdisk8 -mdiskgrp itso_ds45_64gb
4. If there is no unmanaged MDisk to which to migrate, you can remove an MDisk from an
MDisk Group. However, you can only remove an MDisk from an MDisk Group if there are
enough free extents on the remaining MDisks in the group to migrate any used extents on
the MDisk that you are removing.
7.3.5 Preferred paths to a VDisk
For I/O purposes, SVC nodes within the cluster are grouped into pairs, which are called I/O
Groups. A single pair is responsible for serving I/O on a specific VDisk. One node within the
I/O Group represents the preferred path for I/O to a specific VDisk. The other node represents
the non-preferred path. This preference alternates between nodes as each VDisk is created
within an I/O Group to balance the workload evenly between the two nodes.
The SVC implements the concept of each VDisk having a preferred owner node, which
improves cache efficiency and cache usage. The cache component read/write algorithms are
dependent on one node owning all the blocks for a specific track. The preferred node is set at
the time of VDisk creation either manually by the user or automatically by the SVC. Because
read miss performance is better when the host issues a read request to the owning node, you
want the host to know which node owns a track. The SCSI command set provides a
mechanism for determining a preferred path to a specific VDisk. Because a track is just part
of a VDisk, the cache component distributes ownership by VDisk. The preferred paths are
then all the paths through the owning node. Therefore, a preferred path is any port on a
preferred controller, assuming that the SAN zoning is correct.
Note: The performance can be better if the access is made on the preferred node. The
data can still be accessed by the partner node in the I/O Group in the event of a failure.
By default, the SVC assigns ownership of even-numbered VDisks to one node of a caching
pair and the ownership of odd-numbered VDisks to the other node. It is possible for the
ownership distribution in a caching pair to become unbalanced if VDisk sizes are significantly
different between the nodes or if the VDisk numbers assigned to the caching pair are
predominantly even or odd.
To provide flexibility in making plans to avoid this problem, the ownership for a specific VDisk
can be explicitly assigned to a specific node when the VDisk is created. A node that is
explicitly assigned as an owner of a VDisk is known as the preferred node. Because it is
expected that hosts will access VDisks through the preferred nodes, those nodes can
become overloaded. Because the ownership of a VDisk cannot be changed after the VDisk is
created, an overloaded node is relieved by moving VDisks to another I/O Group. We
described this situation in 7.2.3, “Moving a VDisk to another I/O Group” on page 125.
SDD is aware of the preferred paths that SVC sets per VDisk. SDD uses a load balancing and
optimizing algorithm when failing over paths; that is, it tries the next known preferred path. If
this effort fails and all preferred paths have been tried, it load balances on the non-preferred
paths until it finds an available path. If all paths are unavailable, the VDisk goes offline. It can
take time, therefore, to perform path failover when multiple paths go offline.
SDD also performs load balancing across the preferred paths where appropriate.
7.3.6 Governing of VDisks
I/O governing effectively throttles the amount of IOPS (or MBs per second) that can be
achieved to and from a specific VDisk. You might want to use I/O governing if you have a
VDisk that has an access pattern that adversely affects the performance of other VDisks on
the same set of MDisks, for example, a VDisk that uses most of the available bandwidth.
Of course, if this application is highly important, migrating the VDisk to another set of MDisks
might be advisable. However, in some cases, it is an issue with the I/O profile of the
application rather than a measure of its use or importance.
Base the choice between I/O and MB as the I/O governing throttle on the disk access profile
of the application. Database applications generally issue large amounts of I/O, but they only
transfer a relatively small amount of data. In this case, setting an I/O governing throttle based
on MBs per second does not achieve much throttling. It is better to use an IOPS throttle.
At the other extreme, a streaming video application generally issues a small amount of I/O,
but it transfers large amounts of data. In contrast to the database example, setting an I/O
governing throttle based on IOPS does not achieve much throttling. For a streaming video
application, it is better to use an MB per second throttle.
Before running the chvdisk command, run the svcinfo lsvdisk command against the VDisk
that you want to throttle in order to check its parameters as shown in Example 7-8.
Example 7-8 The svcinfo lsvdisk command output
IBM_2145:itsosvccl1:admin>svcinfo lsvdisk Image_mode0
id 11
name Image_mode0
IO_group_id 0
IO_group_name PerfBestPrac
status online
mdisk_grp_id 0
mdisk_grp_name itso_ds45_64gb
capacity 18.0GB
type image
formatted no
mdisk_id 10
mdisk_name mdisk10
FC_id
FC_name
RC_id
RC_name
vdisk_UID 60050768018381BF280000000000002A
throttling 0
preferred_node_id 1
fast_write_state empty
cache readwrite
udid 0
fc_map_count 0
sync_rate 50
copy_count 1
...
The throttle setting of zero indicates that no throttling has been set. Having checked the
VDisk, you can then run the svctask chvdisk command. The relevant syntax of the command is:
svctask chvdisk [-iogrp iogrp_name|iogrp_id] [-rate throttle_rate [-unitmb]]
[-name new_name_arg] [-force] vdisk_name|vdisk_id
To just modify the throttle setting, we run:
svctask chvdisk -rate 40 -unitmb Image_mode0
Running the lsvdisk command now gives us the output that is shown in Example 7-9.
Example 7-9 Output of lsvdisk command
IBM_2145:itsosvccl1:admin>svcinfo lsvdisk Image_mode0
id 11
name Image_mode0
IO_group_id 0
IO_group_name PerfBestPrac
status online
mdisk_grp_id 0
mdisk_grp_name itso_ds45_64gb
capacity 18.0GB
type image
formatted no
mdisk_id 10
mdisk_name mdisk10
FC_id
FC_name
RC_id
RC_name
vdisk_UID 60050768018381BF280000000000002A
virtual_disk_throttling (MB) 40
preferred_node_id 1
fast_write_state empty
cache readwrite
udid 0
fc_map_count 0
sync_rate 50
copy_count 1
copy_id 0
status online
sync yes
primary yes
mdisk_grp_id 0
mdisk_grp_name itso_ds45_64gb
type image
mdisk_id 10
mdisk_name mdisk10
fast_write_state empty
used_capacity 18.00GB
real_capacity 18.00GB
free_capacity 0.00MB
overallocation 100
autoexpand
warning
grainsize
This example shows that the throttle setting (virtual_disk_throttling) is 40 MBps on this
VDisk. To set the throttle to an I/O rate instead, which is the default, we omit the -unitmb flag:
svctask chvdisk -rate 2048 Image_mode0
You can see in Example 7-10 that the throttle setting has no unit parameter, which means that
it is an I/O rate setting.
Example 7-10 The svctask chvdisk command and svcinfo lsvdisk output
IBM_2145:itsosvccl1:admin>svctask chvdisk -rate 2048 Image_mode0
IBM_2145:itsosvccl1:admin>svcinfo lsvdisk Image_mode0
id 11
name Image_mode0
IO_group_id 0
IO_group_name PerfBestPrac
status online
mdisk_grp_id 0
mdisk_grp_name itso_ds45_64gb
capacity 18.0GB
type image
formatted no
mdisk_id 10
mdisk_name mdisk10
FC_id
FC_name
RC_id
RC_name
vdisk_UID 60050768018381BF280000000000002A
throttling 2048
preferred_node_id 1
fast_write_state empty
cache readwrite
udid 0
fc_map_count 0
sync_rate 50
copy_count 1
Note: An I/O governing rate of 0 (displayed as virtual_disk_throttling in the CLI output
of the svcinfo lsvdisk command) does not mean that zero IOPS (or MBs per second) can
be achieved. It means that no throttle is set.
7.4 Cache-disabled VDisks
You use cache-disabled VDisks primarily when you are virtualizing an existing storage
infrastructure and you want to retain the existing storage system copy services. You might
want to use cache-disabled VDisks where there is intellectual capital in existing copy services
automation scripts. We recommend that you keep the use of cache-disabled VDisks to a
minimum for normal workloads.
You can use cache-disabled VDisks also to control the allocation of cache resources. By
disabling the cache for certain VDisks, more cache resources will be available to cache I/Os
to other VDisks in the same I/O Group. This technique is particularly effective where an I/O
Group is serving some VDisks that benefit from caching and other VDisks for which the
benefits of caching are small or non-existent.
Currently, there is no direct way to enable the cache for previously cache-disabled VDisks.
There are three options to turn the VDisk caching mechanism on:
 If the VDisk is an image-mode VDisk, you can remove the VDisk from the SVC cluster and
redefine it with the cache enabled.
 Use the SVC FlashCopy function to copy the contents of the cache-disabled VDisk to a
new cache-enabled VDisk. After the FlashCopy has been started, change the
VDisk-to-host mapping to the new VDisk, which involves an outage (a command sketch
follows this list).
 Use the SVC Metro Mirror or Global Mirror function to mirror the data to another
cache-enabled VDisk. As in the second option, you have to change the VDisk-to-host
mapping after the mirror operation is complete, which also involves an outage.
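As a sketch of the second option, assume a cache-disabled VDisk old_vdisk of 18 GB, a host host1, and the FlashCopy commands that are discussed in Chapter 8, “Copy services” on page 151 (the names and the exact parameters are illustrative; verify them against your command reference). Create a cache-enabled target of exactly the same size, define and prepare the mapping, quiesce the host, start the mapping, and then switch the host mapping to the new VDisk:
svctask mkvdisk -mdiskgrp itso_ds45_64gb -iogrp PerfBestPrac -size 18 -unit gb -name new_vdisk
svctask mkfcmap -source old_vdisk -target new_vdisk -name old_to_new -copyrate 50
svctask prestartfcmap old_to_new
svctask startfcmap old_to_new
svctask rmvdiskhostmap -host host1 old_vdisk
svctask mkvdiskhostmap -host host1 new_vdisk
The outage is limited to the time between unmapping the old VDisk and mounting the new VDisk on the host.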
7.4.1 Underlying controller remote copy with SVC cache-disabled VDisks
Where synchronous or asynchronous remote copy is used in the underlying storage
controller, the controller LUNs at both the source and destination must be mapped through
the SVC as image mode disks with the SVC cache disabled. Note that, of course, it is
possible to access either the source or the target of the remote copy from a host directly,
rather than through the SVC. You can use the SVC copy services with the image mode VDisk
representing the primary site of the controller remote copy relationship. It does not make
sense to use SVC copy services with the VDisk at the secondary site, because the SVC does
not see the data flowing to this LUN through the controller.
Figure 7-1 shows the relationships between the SVC, the VDisk, and the underlying storage
controller for a cache-disabled VDisk.
Figure 7-1 Cache-disabled VDisk in remote copy relationship
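A sketch of how such an image mode, cache-disabled VDisk can be defined, assuming an unmanaged MDisk named mdisk_rc0 that contains the controller remote copy LUN and illustrative group and VDisk names (the -vtype image, -mdisk, and -cache none parameters of svctask mkvdisk are assumed here; verify them against your command reference):
svctask mkvdisk -mdiskgrp itso_ds45_64gb -iogrp PerfBestPrac -vtype image -mdisk mdisk_rc0 -cache none -name rc_primary_image
The same definition is repeated at the secondary site for the LUN that is the target of the controller remote copy relationship.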
7.4.2 Using underlying controller PiT copy with SVC cache-disabled VDisks
Where point-in-time (PiT) copy is used in the underlying storage controller, the controller
LUNs for both the source and the target must be mapped through the SVC as image mode
disks with the SVC cache disabled as shown in Figure 7-2 on page 135.
Note that, of course, it is possible to access either the source or the target of the FlashCopy
from a host directly rather than through the SVC.
Figure 7-2 PiT copy with cache-disabled VDisks
7.4.3 Changing cache mode of VDisks
There is no non-disruptive method to change the cache mode of a VDisk. If you need to
change the cache mode of a VDisk, follow this procedure:
1. Convert the VDisk to an image mode VDisk. Refer to Example 7-11.
Example 7-11 Migrate to an image mode VDisk
IBM_2145:itsosvccl1:admin>svctask migratetoimage -vdisk Test_1 -threads 4
-mdisk mdisk8 -mdiskgrp itso_ds45_64gb
2. Stop I/O to the VDisk.
3. Unmap the VDisk from the host.
4. Run the svcinfo lsmdisk command to check your unmanaged MDisks.
5. Remove the VDisk, which makes the MDisk on which it is created become unmanaged.
Refer to Example 7-12.
Example 7-12 Removing the VDisk Test_1
IBM_2145:itsosvccl1:admin>svctask rmvdisk Test_1
6. Make a new cache-disabled VDisk. To place it on the unmanaged MDisk that was just
released from the SVC, create it as an image mode VDisk on that MDisk; for simplicity,
Example 7-13 on page 136 creates a striped cache-disabled VDisk in the MDisk Group
instead. Check the MDisks by running the svcinfo lsmdisk command first.
Example 7-13 Making a cache-disabled VDisk
IBM_2145:itsosvccl1:admin>svcinfo lsmdisk -delim :
id:name:status:mode:mdisk_grp_id:mdisk_grp_name:capacity:ctrl_LUN_#:controller_
name:UID
0:mdisk0:online:managed:1:itso_ds45_18gb:18.0GB:0000000000000000:itso_ds4500:60
0a0b80001744310000011a4888478c000000000000000000000000
1:mdisk1:online:managed:1:itso_ds45_18gb:18.0GB:0000000000000001:itso_ds4500:60
0a0b80001744310000011948884778000000000000000000000000
2:mdisk2:online:managed:1:itso_ds45_18gb:18.0GB:0000000000000002:itso_ds4500:60
0a0b80001744310000011848884758000000000000000000000000
3:mdisk3:online:managed:1:itso_ds45_18gb:18.0GB:0000000000000003:itso_ds4500:60
0a0b8000174431000001174888473e000000000000000000000000
4:mdisk4:online:managed:1:itso_ds45_18gb:18.0GB:0000000000000004:itso_ds4500:60
0a0b80001744310000011648884726000000000000000000000000
5:mdisk5:online:managed:1:itso_ds45_18gb:18.0GB:0000000000000005:itso_ds4500:60
0a0b8000174431000001154888470c000000000000000000000000
6:mdisk6:online:managed:1:itso_ds45_18gb:18.0GB:0000000000000006:itso_ds4500:60
0a0b800017443100000114488846ec000000000000000000000000
7:mdisk7:online:managed:1:itso_ds45_18gb:18.0GB:0000000000000007:itso_ds4500:60
0a0b800017443100000113488846c0000000000000000000000000
8:mdisk8:online:unmanaged:::64.0GB:0000000000000018:itso_ds4500:600a0b800017443
10000013a48a32b5400000000000000000000000000000000
9:mdisk9:online:unmanaged:::18.0GB:0000000000000008:itso_ds4500:600a0b800017443
10000011b4888aeca00000000000000000000000000000000
...
IBM_2145:itsosvccl1:admin>svctask mkvdisk -mdiskgrp itso_ds45_64gb -size 5
-unit gb -iogrp PerfBestPrac -name Image_mode1 -cache none
Virtual Disk, id [13], successfully created
IBM_2145:itsosvccl1:admin>svcinfo lsvdisk Image_mode1
id 13
name Image_mode1
IO_group_id 0
IO_group_name PerfBestPrac
status online
mdisk_grp_id 0
mdisk_grp_name itso_ds45_64gb
capacity 5.0GB
type striped
formatted no
mdisk_id
mdisk_name
FC_id
FC_name
RC_id
RC_name
vdisk_UID 60050768018381BF280000000000002D
throttling 0
preferred_node_id 1
fast_write_state empty
cache none
udid
fc_map_count 0
sync_rate 50
copy_count 1
copy_id 0
status online
sync yes
primary yes
mdisk_grp_id 0
mdisk_grp_name itso_ds45_64gb
type striped
mdisk_id
mdisk_name
fast_write_state empty
used_capacity 5.00GB
real_capacity 5.00GB
free_capacity 0.00MB
overallocation 100
autoexpand
warning
grainsize
7. If you want to create the VDisk with read/write cache, omit the -cache parameter, because
cache-enabled is the default setting. Refer to Example 7-14.
Example 7-14 Removing VDisk and recreating with cache enabled
IBM_2145:itsosvccl1:admin>svctask rmvdisk Image_mode1
IBM_2145:itsosvccl1:admin>svctask mkvdisk -mdiskgrp itso_ds45_64gb -size 5
-unit gb -iogrp PerfBestPrac -name Image_mode1
Virtual Disk, id [13], successfully created
IBM_2145:itsosvccl1:admin>svcinfo lsvdisk Image_mode1
id 13
name Image_mode1
IO_group_id 0
IO_group_name PerfBestPrac
status online
mdisk_grp_id 0
mdisk_grp_name itso_ds45_64gb
capacity 5.0GB
type striped
formatted no
mdisk_id
mdisk_name
FC_id
FC_name
RC_id
RC_name
vdisk_UID 60050768018381BF280000000000002D
throttling 0
preferred_node_id 1
fast_write_state empty
cache readwrite
...
8. You can then map the VDisk to the host and continue I/O operations after rescanning the
host. Refer to Example 7-15 on page 138.
Example 7-15 Mapping VDISK-Image to host Diomede_Win2k8
IBM_2145:itsosvccl1:admin>svctask mkvdiskhostmap -host Diomede_Win2k8
Image_mode1
Virtual Disk to Host map, id [5], successfully created
Note: Before removing the VDisk host mapping, it is essential that you follow the
procedures in Chapter 9, “Hosts” on page 175 so that you can remount the disk with its
access to data preserved.
7.5 VDisk performance
The answer to many performance questions is “it depends,” which is not much use to you
when trying to solve storage performance problems (or, just as often, perceived storage
performance problems). But there are no absolutes with performance, so it is truly difficult to
state a specific performance number for a VDisk.
Some people expect that the SVC will add significantly to the latency of I/O operations,
because the SVC is in-band. However, because the SVC is a caching in-band appliance, all
writes are essentially write hits: completion is returned to the host as soon as the SVC cache
has mirrored the write to its partner node. When the workload is heavy, the cache destages
write data based on a least recently used (LRU) algorithm, ensuring that new host writes
continue to be serviced as quickly as possible. The rate of destage is ramped up to free
space more quickly when the cache reaches certain thresholds, which avoids cache-full
situations.
Reads are likely to be read hits, and sequential workloads get the benefit of both the
controller prefetch and SVC prefetch algorithms, giving the latest SVC nodes the ability to
sustain more than 10 GBps on large-transfer sequential read miss workloads. Random reads
are again at the mercy of the back-end storage: the SVC fast path adds only tens of
microseconds of latency on a read miss, and that read is likely also a miss in the controller
cache, where even a high-end system responds in around 10 milliseconds. The order of
magnitude of the additional latency introduced by the SVC is therefore “lost in the noise.”
A VDisk, just as any storage device, has three basic properties: capacity, I/O rate, and
throughput as measured in megabytes per second. One of these properties will be the
limiting factor in your environment. Having cache and striping across large numbers of disks
can help increase these numbers. But eventually, the fundamental laws of physics apply.
There will always be a limiting number. One of the major problems with designing a storage
infrastructure is that while it is relatively easy to determine the required capacity, determining
the required I/O rate and throughput is not so easy. All too often the exact requirement is only
known after the storage infrastructure has been built, and the performance is inadequate.
One of the advantages of the SVC is that it is possible to compensate for a lack of information
at the design stage due to the SVC’s flexibility and the ability to non-disruptively migrate data
to different types of back-end storage devices.
The throughput for VDisks can range from fairly small numbers (1 to 10 IOPS) to extremely
large values (more than 1 000 IOPS). This throughput depends greatly on the nature of the
application and across how many MDisks the VDisk is striped. When the I/O rate
approaches 1 000 IOPS per VDisk, it is either because the volume is getting extremely good
performance, usually from extremely good cache behavior, or because the VDisk is striped
across multiple MDisks and, hence, usually across multiple RAID arrays on the back-end
storage system. Otherwise, it is not possible to perform that many IOPS to a VDisk that is
based on a single RAID array and still realize a good response time.
The MDisk I/O limit depends on many factors. The primary factor is the number of disks in the
RAID array on which the MDisk is built and the speed or revolutions per minute (RPM) of the
disks. But when the number of IOPS to an MDisk is near or above 1 000, the MDisk is
considered extremely busy. For 15 K RPM disks, the limit is a bit higher. But these high I/O
rates to the back-end storage systems are not consistent with good performance; they imply
that the back-end RAID arrays are operating at extremely high utilizations, which is indicative
of considerable queuing delays. Good planning demands a solution that reduces the load on
such busy RAID arrays.
For more precision, we will consider the upper limit of performance for 10 K and 15 K RPM,
enterprise class devices. Be aware that different people have different opinions about these
limits, but all the numbers in Table 7-1 represent extremely busy disk drive modules (DDMs).
Table 7-1 DDM speeds
DDM speed   Maximum operations/second   6+P operations/second   7+P operations/second
10 K        150 - 175                   900 - 1050              1050 - 1225
15 K        200 - 225                   1200 - 1350             1400 - 1575
While disks might achieve these throughputs, these ranges imply a lot of queuing delay and
high response times. These ranges probably represent acceptable performance only for
batch-oriented applications, where throughput is the paramount performance metric. For
online transaction processing (OLTP) applications, these throughputs might already have
unacceptably high response times. Because 15 K RPM DDMs are most commonly used in
OLTP environments (where response time is at a premium), a simple rule is if the MDisk does
more than 1 000 operations per second, it is extremely busy, no matter what the drive’s RPM
is.
In the absence of additional information, we often assume, and our performance models
assume, that 10 milliseconds (msec) response time is pretty high. But for a particular
application, 10 msec might be too low or too high. Many OLTP environments require
response times closer to 5 msec, while batch applications with large sequential transfers
might run fine with 20 msec response time. The appropriate value can also change between
shifts or on the weekend. A response time of 5 msec might be required from 8 a.m. until 5
p.m., while 50 msec is perfectly acceptable near midnight. It is all client and application
dependent.
What really matters is the average front-end response time, which is what counts for the
users. You can measure the average front-end response time by using TPC for Disk with its
performance reporting capabilities. Refer to Chapter 11, “Monitoring” on page 221 for more
information.
Figure 7-3 on page 140 shows the overall response time of a VDisk that is under test. Here,
we have plotted the overall response time. Additionally, TPC allows us to plot read and write
response times as distinct entities if one of these response times was causing problems to
the user. This response time in the 1 - 2 msec range gives an acceptable level of
performance for OLTP applications.
Figure 7-3 VDisk overall response time
If we look at the I/O rate on this VDisk, we see the chart in Figure 7-4 on page 141, which
shows us that the I/O rate to this VDisk was in the region of 2 000 IOPS, an I/O rate that
normally results in an unacceptably high response time for a LUN that is based on a single
RAID array. However, in
this case, the VDisk was striped across two MDisks, which gives us an I/O rate per MDisk in
the order of 1 200 IOPS. This I/O rate is high and normally gives a high user response time;
however, here, the SVC front-end cache mitigates the high latency at the back end, giving the
user a good response time.
Although there is no immediate issue with this VDisk, if the workload characteristics change
and the VDisk becomes less cache friendly, you need to consider adding another MDisk to
the MDG, making sure that it comes from another RAID array, and striping the VDisk across
all three MDisks.
Figure 7-4 VDisk I/O rate
7.5.1 VDisk performance
It is vital that you constantly monitor systems when they are performing well so that you can
establish baseline levels of good performance. Then, if performance as experienced by the
user degrades, you have the baseline numbers for a comparison. We strongly recommend
that you use TPC to monitor and manage your storage environment.
OLTP workloads
Probably the most important parameter as far as VDisks are concerned is the I/O response
time for OLTP workloads. After you have established what VDisk response time provides
good user performance, you can set TPC alerting to notify you if this number is exceeded by
about 25%. Then, you check the I/O rate of the MDisks on which this VDisk is built. If there
are multiple MDisks per RAID array, you need to check the RAID array performance. You can
perform all of these tasks using TPC. The “magic” number here is 1 000 IOPS, assuming that
the RAID array is 6+P. Refer to Table 7-1 on page 139.
If one of the back-end storage arrays is running at more than 1 000 IOPS and the user is
experiencing poor performance because of degraded response time, this array is probably
the root cause of the problem.
If users complain of response time problems, yet the VDisk response as measured by TPC
has not changed significantly, this situation indicates that the problem is in the SAN network
between the host and the SVC. You can diagnose where the problem is with TPC. The best
way to determine the location of the problem is to use the Topology Viewer to look at the host
using Datapath Explorer (DPE). This view enables you to see the paths from the host to the
SVC, which we show in Figure 7-5.
Figure 7-5 DPE view of the host to the SVC
Figure 7-5 shows the paths from the disk as seen by the server through its host bus adapters
(HBAs) to the SVC VDisk. By hovering the cursor over the switch port, you can see the
throughput of that port. You can also use TPC to produce reports showing the overall
throughput of the ports, which we show in Figure 7-6 on page 143.
Figure 7-6 Throughput of the ports
TPC can present the throughput of the ports graphically over time as shown in Figure 7-7 on
page 144.
Figure 7-7 Port throughput rate
From this type of graph, you can identify performance bottlenecks in the SAN fabric and make
the appropriate changes.
Batch workloads
With batch workloads in general, the most important parameter is the throughput rate as
measured in megabytes per second. The goal rate is harder to quantify than the OLTP
response figure, because throughput is heavily dependent on the block size. Additionally,
high response times can be acceptable for these workloads. So, it is not possible to give a single
metric to quantify performance. It really is a question of “it depends.”
The larger the block size, the greater the potential throughput to the SVC. Block size is often
determined by the application. With TPC, you can measure the throughput of a VDisk and the
MDisks on which it is built. The important measure for the user is the time that the batch job
takes to complete. If this time is too long, the following steps are a good starting point.
Determine the data rate that is needed for timely completion and compare it with the storage
system’s capability as documented in performance white papers and Disk Magic. If the
storage system is capable of greater performance:
1. Make sure that the application transfer size is as large as possible.
2. Consider increasing the number of concurrent application streams, threads, files, and
partitions.
3. Make sure that the host is capable of supporting the required data rate. For example, use
tests, such as dd (see the sketch after this list), and use TPC to monitor the results.
4. Check whether the flow of data through the SAN is balanced by using the switch
performance monitors within TPC (extremely useful).
5. Check whether all switch and host ports are operating at the maximum permitted data rate
of 2 Gbps or 4 Gbps.
6. Watch out for cases where the whole batch window stops on a single file or database
getting read or written, which can be a practical exposure for obvious reasons.
Unfortunately, sometimes there is nothing that can be done. However, it is worthwhile
evaluating this situation to see whether, for example, the database can be divided into
partitions, or the large file replaced by multiple smaller files. Or, the use of the SVC in
combination with SDD might help with a combination of striping and added paths to
multiple VDisks. These efforts can allow parallel batch streams to the VDisks and, thus,
speed up batch runs.
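As a simple host-side check of the achievable sequential read rate (refer to step 3 in the previous list), a dd test similar to the following sketch can be run on a UNIX or Linux host; the device name is illustrative and the block size syntax varies by platform, so adapt it to your multipathing device:
dd if=/dev/sdX of=/dev/null bs=1M count=4096
While the test runs, use TPC to watch the corresponding VDisk and MDisk throughput and confirm that the host, the fabric, and the back end can all sustain the required data rate.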
The chart shown in Figure 7-8 gives an indication of what can be achieved by tuning the
VDisk and the application. The line from point A to point B shows the normal steady state running of
the application on the VDisk built on a single MDisk. We then migrated the VDisk so that it
spanned two MDisks. From point B to point C shows the drop in performance during the
migration. When the migration was complete, the line from point D to point E shows that the
performance had almost doubled. The application was one with 75% reads and 75%
sequential access. The application was then modified so that it was 100% sequential. The
resulting gain in performance is shown between point E and point F.
Figure 7-8 Large 64 KB block workloads with improvements
Figure 7-9 shows the performance enhancements that can be achieved by modifying the
number of parallel streams flowing to the VDisk. The line from point A to point B shows the
performance with a single stream application. We then doubled the size of the workload, but
we kept it in a single stream. As you can see from the line between point C and point D, there is
no improvement in performance. We were then able to split the workload into two parallel
streams at point E. As you can see from the graph, from point E to point F shows that the
throughput to the VDisk has increased by over 60%.
Figure 7-9 Effect of splitting a large job into two parallel streams
Mixed workloads
As discussed in 7.2.1, “Selecting the MDisk Group” on page 124, we usually recommend
mixing workloads, so that the maximum resources are available to any workload when
needed. When there is a heavy batch workload and there is no VDisk throttling, we
recommend that the VDisks are placed on separate MDGs.
This action is illustrated by the chart in Figure 7-10 on page 147. VDisk 21 is running an
OLTP workload, and VDisk 20 is running a batch job. Both VDisks were in the same MDG
sharing the same MDisks, which were spread over three RAID arrays. As you can see
between point A and point B, the response time for the OLTP workload is extremely high,
averaging 10 milliseconds. At point in time B, we migrated VDisk 20 to another MDG, using
MDisks built on different RAID arrays. As you can see, after the migration had completed, the
response time (from point D to point E) dropped for both the batch job and, more importantly,
the OLTP workload.
Figure 7-10 Effect of migrating batch workload
7.6 The effect of load on storage controllers
Because the SVC can share the capacity of a few MDisks among many more VDisks (which
are, in turn, assigned to hosts generating I/O), an SVC can drive considerably more I/O to a
storage controller than the controller would receive if the SVC were not in the middle. Adding
FlashCopy to this situation adds still more I/O to the storage controller on top of the I/O that
the hosts are generating.
It is important to take the load that you can put onto a storage controller into consideration
when defining VDisks for hosts to make sure that you do not overload the storage controller.
So, assuming that a typical physical drive can handle 150 IOPS (a Serial Advanced
Technology Attachment (SATA) drive might handle slightly fewer), you can calculate the
maximum I/O capability that an MDG can handle.
Then, as you define the VDisks and the FlashCopy mappings, calculate the maximum
average I/O that the SVC will receive per VDisk before you start to overload your storage
controller.
This example assumes:
 An MDisk is defined from an entire array (that is, the array only provides one LUN and that
LUN is given to the SVC as an MDisk).
 Each MDisk that is assigned to an MDG is the same size and same RAID type and comes
from a storage controller of the same type.
 MDisks from a storage controller are contained entirely in the same MDG.
The raw I/O capability of the MDG is the sum of the capabilities of its MDisks. For example,
for five RAID 5 (7+P) MDisks, each with eight component disks, on a typical back-end device,
the I/O capability is:
5 x (150 x 7) = 5250
This raw number might be constrained by the I/O processing capability of the back-end
storage controller itself.
FlashCopy copying contributes to the I/O load of a storage controller, and thus, it must be
taken into consideration. The effect of a FlashCopy is effectively adding a number of loaded
VDisks to the group, and thus, a weighting factor can be calculated to make allowance for this
load.
The effect of FlashCopy copies depends on the type of I/O taking place. For example, in a
group with two FlashCopy copies and random reads and writes to those VDisks, the weighting
factor is 14 x 2 = 28. The weighting factors for FlashCopy copies are given in Table 7-2.
Table 7-2 FlashCopy weighting
Type of I/O to the VDisk       Impact on I/O    Weight factor for FlashCopy
None/very little               Insignificant    0
Reads only                     Insignificant    0
Sequential reads and writes    Up to 2x I/Os    2 x F
Random reads and writes        Up to 15x I/O    14 x F
Random writes                  Up to 50x I/O    49 x F
Thus, to calculate the average I/O per VDisk before overloading the MDG, use this formula:
I/O rate = I/O capability / (number of VDisks + weighting factor)
So, using the example MDG as defined previously, if we added 20 VDisks to the MDG and
that MDG was able to sustain 5 250 IOPS, and there were two FlashCopy mappings that also
have random reads and writes, the maximum average I/O rate per VDisk is:
5250 / (20 + 28) = 110
Note that this is an average I/O rate, so if half of the VDisks sustain 200 IOPS and the other
half of the VDisks sustain 20 IOPS, the average is still 110 IOPS.
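As a further worked example, if the same two FlashCopy mappings instead received random writes only, Table 7-2 on page 148 gives a weighting factor of 49 x 2 = 98, and the sustainable average I/O rate per VDisk drops to:
5250 / (20 + 98) = 44 (approximately)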
Conclusion
As you can see from the previous examples, TPC is an extremely useful and powerful tool for
analyzing and solving performance problems. If you want a single parameter to monitor to
gain an overview of your system’s performance, it is the read and write response times for
both VDisks and MDisks. This parameter shows everything that you need in one view. It is
the key day-to-day performance validation metric. It is relatively easy to notice that a system
that usually had 2 ms writes and 6 ms reads suddenly has 10 ms writes and 12 ms reads and
is getting overloaded. A general monthly check of CPU usage will show you how the system
is growing over time and highlight when it is time to add a new I/O Group (or cluster).
In addition, there are useful rules for OLTP-type workloads, such as the maximum I/O rates
for back-end storage arrays, but for batch workloads, it really is a case of “it depends.”
Chapter 8. Copy services
In this chapter, we discuss the best practices for using the Advanced Copy Services
functions: FlashCopy, Metro Mirror, and Global Mirror. We also describe guidelines to obtain
the best performance.
8.1 SAN Volume Controller Advanced Copy Services functions
In this section, we describe the best practices for the SAN Volume Controller (SVC)
Advanced Copy Services functions and how to get the best performance from them.
8.1.1 Setting up FlashCopy services
Regardless of whether you use FlashCopy to make one target disk, or multiple target disks, it
is important that you consider the application and the operating system. Even though the
SVC can make an exact image of a disk with FlashCopy at the point in time that you require,
it is pointless if the operating system, or more importantly, the application, cannot use the
copied disk.
Data stored to a disk from an application normally goes through these steps:
1. The application records the data using its defined application programming interface.
Certain applications might first store their data in application memory before sending it to
disk at a later time. Normally, subsequent reads of the block just being written will get the
block in memory if it is still there.
2. The application sends the data to a file. The file system accepting the data might buffer it
in memory for a period of time.
3. The file system will send the I/O to a disk controller after a defined period of time (or even
based on an event).
4. The disk controller might cache its write in memory before sending the data to the physical
drive.
If the SVC is the disk controller, it will store the write in its internal cache before sending
the I/O to the real disk controller.
5. The data is stored on the drive.
At any point in time, there might be any number of unwritten blocks of data in any of these
steps, waiting to go to the next step.
It is also important to realize that sometimes the order of the data blocks created in step 1
might not be the same order that is used when sending the blocks to steps 2, 3, or 4. So it is
possible that, at any point in time, data arriving in step 4 might be missing a vital component
that has not yet been sent from step 1, 2, or 3.
FlashCopy copies are normally created with data that is visible from step 4. So, to maintain
application integrity, when a FlashCopy is created, any I/O that is generated in step 1 must
make it to step 4 before the FlashCopy is started. In other words, there must not be any
outstanding write I/Os in steps 1, 2, or 3.
If there were outstanding write I/Os, the copy of the disk that is created at step 4 is likely to be
missing those transactions, and if the FlashCopy is to be used, these missing I/Os can make
it unusable.
8.1.2 Steps to making a FlashCopy VDisk with application data integrity
The steps that you must perform when creating FlashCopy copies are:
1. Your host is currently writing to a VDisk as part of its day-to-day usage. This VDisk
becomes the source VDisk in our FlashCopy mapping.
2. Identify the size and type (image, sequential, or striped) of the VDisk. If the VDisk is an
image mode VDisk, you need to know its size in bytes. If it is a sequential or striped mode
VDisk, its size, as reported by the SVC Master Console or SVC command line interface
(CLI), is sufficient.
To identify the VDisks in an SVC cluster, use the svcinfo lsvdisk command, as shown in
Example 8-1.
Figure 8-1 on page 154 shows how to obtain the same information using the SVC GUI. If you
want to put VDisk 10 into a FlashCopy mapping, you do not need to know the byte size of that
VDisk, because it is a striped VDisk. Creating a target VDisk of 64 GB (the capacity reported for VDisk 10) by using the SVC GUI
or CLI is sufficient.
Example 8-1 Using the command line to see the type of the VDisks
IBM_2145:itsosvccl1:admin>svcinfo lsvdisk -delim :
id:name:IO_group_id:IO_group_name:status:mdisk_grp_id:mdisk_grp_name:capacity:type
:FC_id:FC_name:RC_id:RC_name:vdisk_UID:fc_map_count:copy_count
0:diomede0:0:PerfBestPrac:online:1:itso_ds45_18gb:8.0GB:striped:::::600507680183
81BF2800000000000024:0:1
1:diomede1:0:PerfBestPrac:online:1:itso_ds45_18gb:8.0GB:striped:::::600507680183
81BF2800000000000025:0:1
2:diomede2:0:PerfBestPrac:online:1:itso_ds45_18gb:8.0GB:striped:::::600507680183
81BF2800000000000026:0:1
3:vdisk3:0:PerfBestPrac:online:2:itso_smallgrp:500.0MB:striped:::::6005076801838
1BF2800000000000009:0:1
4:diomede3:0:PerfBestPrac:online:1:itso_ds45_18gb:8.0GB:striped:::::600507680183
81BF2800000000000027:0:1
5:vdisk5:0:PerfBestPrac:online:2:itso_smallgrp:500.0MB:striped:::::6005076801838
1BF280000000000000B:0:1
6:vdisk6:0:PerfBestPrac:online:2:itso_smallgrp:500.0MB:striped:::::6005076801838
1BF280000000000000C:0:1
7:siam1:0:PerfBestPrac:online:4:itso_ds47_siam:70.0GB:striped:::::60050768018381
BF2800000000000016:0:1
8:vdisk8:0:PerfBestPrac:online:many:many:800.0MB:many:::::60050768018381BF280000
0000000013:0:2
9:vdisk9:0:PerfBestPrac:online:2:itso_smallgrp:1.5GB:striped:::::60050768018381B
F2800000000000014:0:1
10:Diomede_striped:0:PerfBestPrac:online:0:itso_ds45_64gb:64.0GB:striped:::::600
50768018381BF2800000000000028:0:1
11:Image_mode0:0:PerfBestPrac:online:0:itso_ds45_64gb:18.0GB:image:::::600507680
18381BF280000000000002A:0:1
Figure 8-1 Using the SVC GUI to see the type of VDisks
The VDisk 11, which is used in our example, is an image-mode VDisk. In this example, you
need to know its exact size in bytes.
In Example 8-2, we use the -bytes parameter of the svcinfo lsvdisk command to find its
exact size. Thus, the target VDisk must be created with a size of 19 327 352 832 bytes, not
18 GB. Figure 8-2 on page 155 shows the exact size of an image mode VDisk using the SVC
GUI.
Example 8-2 Find the exact size of an image mode VDisk using the command line interface
IBM_2145:itsosvccl1:admin>svcinfo lsvdisk -bytes 11
id 11
name Image_mode0
IO_group_id 0
IO_group_name PerfBestPrac
status online
mdisk_grp_id 0
mdisk_grp_name itso_ds45_64gb
capacity 19327352832
type image
formatted no
mdisk_id 10
mdisk_name mdisk10
FC_id
FC_name
RC_id
RC_name
vdisk_UID 60050768018381BF280000000000002A
throttling 0
preferred_node_id 1
fast_write_state empty
cache readwrite
...
Figure 8-2 Find the exact size of an image mode VDisk using the SVC GUI
3. Create a target VDisk of the required size as identified by the source VDisk in Figure 8-3
on page 163. The target VDisk can be either an image, sequential, or striped mode VDisk;
the only requirement is that it must be exactly the same size as the source VDisk. The
target VDisk can be cache-enabled or cache-disabled.
4. Define a FlashCopy mapping, making sure that you have the source and target disks
defined in the correct order. (If you use your newly created VDisk as a source and the
existing host’s VDisk as the target, you will destroy the data on the VDisk if you start the
FlashCopy.)
5. As part of the define step, you can specify the copy rate from 0 to 100. The copy rate will
determine how quickly the SVC will copy the data from the source VDisk to the target
VDisk.
If you set the copy rate to 0 (NOCOPY), the SVC only copies blocks that change on the
source VDisk, or on the target VDisk (if the target VDisk is mounted read/write to a host),
after the mapping is started.
6. The prepare process for the FlashCopy mapping can take several minutes to complete,
because it forces the SVC to flush any outstanding write I/Os belonging to the source
VDisks to the storage controller’s disks. After the preparation completes, the mapping has
a Prepared status and the target VDisk behaves as though it was a cache-disabled VDisk
until the FlashCopy mapping is either started or deleted.
Note: If you create a FlashCopy mapping where the source VDisk is a target VDisk of
an active Metro Mirror relationship, you add additional latency to that existing Metro
Mirror relationship (and possibly affect the host that is using the source VDisk of that
Metro Mirror relationship as a result).
The reason for the additional latency is that the FlashCopy prepares and disables the
cache on the source VDisk (which is the target VDisk of the Metro Mirror relationship),
and thus, all write I/Os from the Metro Mirror relationship need to commit to the storage
controller before the completion is returned to the host.
7. After the FlashCopy mapping is prepared, you can then quiesce the host by forcing the
host and the application to stop I/Os and flush any outstanding write I/Os to disk. This
process will be different for each application and for each operating system.
One guaranteed way to quiesce the host is to stop the application and unmount the VDisk
from the host.
8. As soon as the host completes its flushing, you can then start the FlashCopy mapping. The
FlashCopy starts extremely quickly (at most, a few seconds).
9. When the FlashCopy mapping has started, you can then unquiesce your application (or
mount the volume and start the application), at which point the cache is re-enabled for the
source VDisks. The FlashCopy continues to run in the background and ensures that the
target VDisk is an exact copy of the source VDisk when the FlashCopy mapping was
started.
You can perform step 1 on page 153 through step 5 on page 155 while the host that owns the
source VDisk performs its typical daily activities (that means no downtime). While step 6 on
page 155 is running, which can last several minutes, there might be a delay in I/O throughput,
because the cache on the VDisk is temporarily disabled.
Step 7 must be performed when the application is down. However, these steps complete
quickly and application downtime is minimal.
The target FlashCopy VDisk can now be assigned to another host, and it can be used for
read or write even though the FlashCopy process has not completed.
Note: If you intend to use the target VDisk on the same host as the source VDisk at the
same time that the source VDisk is visible to that host, you might need to perform
additional preparation steps to enable the host to access VDisks that are identical.
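The procedure maps to a command line sequence similar to the following sketch (the target VDisk and mapping names are illustrative, and the parameters are assumed from the standard SVC CLI; verify them against your command reference). First, create a target of the same size as the source VDisk diomede0 and define the mapping with the NOCOPY rate:
svctask mkvdisk -mdiskgrp itso_ds45_64gb -iogrp PerfBestPrac -size 8 -unit gb -name fc_target0
svctask mkfcmap -source diomede0 -target fc_target0 -name fcmap0 -copyrate 0
Then, prepare the mapping, quiesce the host when the preparation completes, start the mapping, and unquiesce the application:
svctask prestartfcmap fcmap0
svctask startfcmap fcmap0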
8.1.3 Making multiple related FlashCopy VDisks with data integrity
Where a host has more than one VDisk, and those VDisks are used by one application,
FlashCopy consistency might need to be performed across all disks at exactly the same
moment in time to preserve data integrity.
Here are examples when this situation might apply:
 A Windows Exchange server has more than one drive, and each drive is used for an
Exchange Information Store. For example, the Exchange server has a D drive, an E drive,
and an F drive. Each drive is an SVC VDisk that is used to store different information
stores for the Exchange server.
Thus, when performing a “snap copy” of the Exchange environment, all three disks need
to be flashed at exactly the same time, so that if they were used during a recovery, no one
information store has more recent data on it than another information store.
 A UNIX® relational database has several VDisks to hold different parts of the relational
database. For example, two VDisks are used to hold two distinct tables, and a third VDisk
holds the relational database transaction logs.
Again, when a snap copy of the relational database environment is taken, all three disks
need to be in sync. That way, when they are used in a recovery, the relational database is
not missing any transactions that might have occurred if each VDisk was copied by using
FlashCopy independently.
Here are the steps to ensure that data integrity is preserved when VDisks are related to each
other:
1. Your host is currently writing to the VDisks as part of its daily activities. These VDisks will
become the source VDisks in our FlashCopy mappings.
2. Identify the size and type (image, sequential, or striped) of each source VDisk. If any of
the source VDisks is an image mode VDisk, you will need to know its size in bytes. If any
of the source VDisks are sequential or striped mode VDisks, their size as reported by the
SVC Master Console or SVC command line will be sufficient.
3. Create a target VDisk of the required size for each source identified in the previous step.
The target VDisk can be either an image, sequential, or striped mode VDisk; the only
requirement is that they must be exactly the same size as their source VDisk. The target
VDisk can be cache-enabled or cache-disabled.
4. Define a FlashCopy Consistency Group. This Consistency Group will be linked to each
FlashCopy mapping that you have defined, so that data integrity is preserved between
each VDisk.
5. Define a FlashCopy mapping for each source VDisk, making sure that you have the
source disk and the target disk defined in the correct order. (If you use any of your newly
created VDisks as a source and the existing host’s VDisk as the target, you will destroy
the data on the VDisk if you start the FlashCopy).
When defining the mapping, make sure that you link this mapping to the FlashCopy
Consistency Group that you defined in the previous step.
As part of defining the mapping, you can specify the copy rate from 0 to 100. The copy
rate will determine how quickly the SVC will copy the source VDisks to the target VDisks.
If you set the copy rate to 0 (NOCOPY), the SVC only copies blocks that change on any
source VDisk, or on its target VDisk (if the target VDisk is mounted read/write to a host),
after the Consistency Group is started.
6. Prepare the FlashCopy Consistency Group. This preparation process can take several
minutes to complete, because it forces the SVC to flush any outstanding write I/Os
belonging to the VDisks in the Consistency Group to the storage controller’s disks. After
the preparation process completes, the Consistency Group has a Prepared status and all
source VDisks behave as though they were cache-disabled VDisks until the Consistency
Group is either started or deleted.
Note: If you create a FlashCopy mapping where the source VDisk is a target VDisk of
an active Metro Mirror relationship, this mapping adds additional latency to that existing
Metro Mirror relationship (and possibly affects the host that is using the source VDisk of
that Metro Mirror relationship as a result).
The reason for the additional latency is that the FlashCopy Consistency Group
preparation process disables the cache on all source VDisks (which might be target
VDisks of a Metro Mirror relationship), and thus, all write I/Os from the Metro Mirror
relationship need to commit to the storage controller before the complete status is
returned to the host.
7. After the Consistency Group is prepared, you can then quiesce the host by forcing the
host and the application to stop I/Os and flush any outstanding write I/Os to disk. This
process differs for each application and for each operating system.
One guaranteed way to quiesce the host is to stop the application and unmount the
VDisks from the host.
8. As soon as the host completes its flushing, you can then start the Consistency Group. The
FlashCopy start completes extremely quickly (at most, a few seconds).
9. When the Consistency Group has started, you can then unquiesce your application (or
mount the VDisks and start the application), at which point the cache is re-enabled. The
FlashCopy continues to run in the background and preserves the data that existed on the
VDisks when the Consistency Group was started.
Step 1 on page 157 through step 6 on page 157 can be performed while the host that owns
the source VDisks is performing its typical daily duties (that is, no downtime). While step 6 on
page 157 is running, which can take several minutes, there might be a delay in I/O
throughput, because the cache on the VDisks is temporarily disabled.
You must perform step 7 when the application is down; however, these steps complete quickly
so that the application downtime is minimal.
The target FlashCopy VDisks can now be assigned to another host and used for read or write
even though the FlashCopy processes have not completed.
Note: If you intend to use any of the target VDisks on the same host as their source VDisk
at the same time that the source VDisk is visible to that host, you might need to perform
additional preparation steps to enable the host to access VDisks that are identical.
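For the Consistency Group case, a command sketch follows; the group, mapping, and VDisk names are illustrative, and the parameters are assumed from the standard SVC CLI, so verify them against your command reference:
svctask mkfcconsistgrp -name exch_grp
svctask mkfcmap -source exch_d -target exch_d_copy -name fcmap_d -consistgrp exch_grp -copyrate 0
svctask mkfcmap -source exch_e -target exch_e_copy -name fcmap_e -consistgrp exch_grp -copyrate 0
svctask mkfcmap -source exch_f -target exch_f_copy -name fcmap_f -consistgrp exch_grp -copyrate 0
svctask prestartfcconsistgrp exch_grp
svctask startfcconsistgrp exch_grp
Quiesce the host between the prepare and the start, exactly as in the single mapping case, so that all three targets represent the same application-consistent point in time.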
8.1.4 Creating multiple identical copies of a VDisk
Since SVC 4.2, you can create multiple point-in-time copies of a source VDisk. These
point-in-time copies can be made at different times (for example, hourly) so that an image of a
VDisk can be captured before a previous image has completed.
If there is a requirement to have more than one VDisk copy created at exactly the same time,
using FlashCopy Consistency Groups is the best method.
By placing the FlashCopy mappings into a Consistency Group (where each mapping uses the
same source VDisk), when the FlashCopy Consistency Group is started, each target will be
an identical image of all the other FlashCopy targets.
The VDisk Mirroring feature, which is new in SVC 4.3, allows you to have one or two copies of
a VDisk, too. For more details, refer to Chapter 7, “VDisks” on page 119.
8.1.5 Creating a FlashCopy mapping with the incremental flag
When you create a FlashCopy mapping with the incremental flag, only the data that has
changed since the last time the mapping was started is copied to the target VDisk.
This functionality is useful when you want a full copy of a VDisk for disaster tolerance,
application testing, or data mining, because after the first background copy has completed, it
greatly reduces the time that is required to re-establish a full copy of the source data as a new
snapshot. For clients that maintain fully independent copies of data as part of their disaster
tolerance strategy, incremental FlashCopy can be useful as the first layer in their disaster
tolerance and backup strategy.
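For example, an incremental mapping might be created and started as shown in the following
sketch (the VDisk and mapping names are hypothetical):
svctask mkfcmap -source vdisk_src -target vdisk_tgt -copyrate 50 -incremental -name map_incr
svctask startfcmap -prep map_incr
The first start performs a full background copy; when the mapping is started again later, only
the grains that have changed since the previous copy are recopied to the target.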
8.1.6 Space-Efficient FlashCopy (SEFC)
Using the Space-Efficient VDisk (SEV) feature, which was introduced in SVC 4.3, FlashCopy
can be used in a more efficient way. SEV allows for the late allocation of MDisk space (also
called thin-provisioning). Space-Efficient VDisks (SE VDisks) present a virtual size to hosts,
while the real MDisk Group space (the number of extents x the size of the extents) allocated
for the VDisk might be considerably smaller.
SE VDisks as target VDisks offer the opportunity to implement SEFC. SE VDisks as source
VDisk and target VDisk can also be used to make point-in-time copies.
There are two distinct combinations:
 Copy of an SE source VDisk to an SE target VDisk
The background copy only copies allocated regions, and the incremental feature can be
used for refresh mapping (after a full copy is complete).
 Copy of a Fully Allocated (FA) source VDisk to an SE target VDisk
For this combination, you must have a zero copy rate to avoid fully allocating the SE target
VDisk.
Note: The defaults for grain size are different: 32 KB for SE VDisk and 256 KB for
FlashCopy mapping.
You can use SE VDisks for cascaded FlashCopy and Multiple Target FlashCopy, and you can
mix SE VDisks with fully allocated VDisks. SE VDisks can also be used for incremental
FlashCopy, but that combination only makes sense if both the source and the target are
Space-Efficient.
The recommendations for SEFC are:
 SEV grain size must be equal to the FlashCopy grain size.
 SEV grain size must be 64 KB for the best performance and the best space efficiency.
The exception is where the SEV target VDisk is going to become a production VDisk (will be
subjected to ongoing heavy I/O). In this case, the 256 KB SEV grain size is recommended to
provide better long term I/O performance at the expense of a slower initial copy.
Note: Even if the 256 KB SEV grain size is chosen, it is still beneficial if you keep the
FlashCopy grain size to 64 KB. It is then possible to minimize the performance impact to
the source VDisk, even though this size increases the I/O workload on the target VDisk.
Clients with extremely large numbers of FlashCopy/Remote Copy relationships might still
be forced to choose a 256 KB grain size for FlashCopy due to constraints on the amount of
bitmap memory.
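A sketch of this recommendation, using hypothetical names, is to create the SE target VDisk
and the FlashCopy mapping with matching 64 KB grain sizes:
svctask mkvdisk -mdiskgrp MDG_SE -iogrp 0 -size 100 -unit gb -rsize 2% -autoexpand -grainsize 64 -name se_tgt1
svctask mkfcmap -source vdisk_src -target se_tgt1 -grainsize 64 -copyrate 0 -name map_se1
With a copy rate of 0 (NOCOPY), the Space-Efficient target is only populated as grains
change, which prevents it from becoming fully allocated.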
8.1.7 Using FlashCopy with your backup application
If you are using FlashCopy together with your backup application and you do not intend to
keep the target disk after the backup has completed, we recommend that you create the
FlashCopy mappings using the NOCOPY option (background copy rate = 0).
If you intend to keep the target so that you can use it as part of a quick recovery process, you
might choose one of the following options:
 Create the FlashCopy mapping with NOCOPY initially. If the target is used and migrated
into production, you can change the copy rate at the appropriate time to the appropriate
rate to have all the data copied to the target disk. When the copy completes, you can
delete the FlashCopy mapping and delete the source VDisk, thus, freeing the space.
 Create the FlashCopy mapping with a low copy rate. Using a low rate might enable the
copy to complete without an impact to your storage controller, thus, leaving bandwidth
available for production work. If the target is used and migrated into production, you can
change the copy rate to a higher value at the appropriate time to ensure that all data is
copied to the target disk. After the copy completes, you can delete the source, thus,
freeing the space.
 Create the FlashCopy with a high copy rate. While this copy rate might add additional I/O
burden to your storage controller, it ensures that you get a complete copy of the source
disk as quickly as possible.
By using the target on a different Managed Disk Group (MDG), which, in turn, uses a
different array or controller, you reduce your window of risk if the storage providing the
source disk becomes unavailable.
With Multiple Target FlashCopy, you can now use a combination of these methods. For
example, you can use the NOCOPY rate for an hourly snapshot of a VDisk with a daily
FlashCopy using a high copy rate.
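As a sketch with hypothetical names, a NOCOPY mapping that later needs to become a full
copy can be handled as follows:
svctask mkfcmap -source vdisk_prod -target vdisk_backup -copyrate 0 -name map_backup
svctask startfcmap -prep map_backup
svctask chfcmap -copyrate 80 map_backup
The chfcmap command raises the background copy rate at the appropriate time so that all of
the data is copied to the target disk.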
8.1.8 Using FlashCopy for data migration
SVC FlashCopy can help you with data migration, especially if you want to migrate from an
unsupported controller (where your own testing reveals that the SVC can still communicate
with the device).
Another reason to use SVC FlashCopy is to keep a copy of your data behind on the old
controller in order to help with a back-out plan in the event that you want to stop the migration
and revert back to the original configuration.
In this example, you can use the following steps to help migrate to a new storage
environment with minimum downtime, which enables you to leave a copy of the data in the
old environment if you need to back up to the old configuration.
To use FlashCopy to help with migration:
1. Your hosts are using the storage from either an unsupported controller or a supported
controller that you plan on retiring.
2. Install the new storage into your SAN fabric and define your arrays and logical unit
numbers (LUNs). Do not mask the LUNs to any host; you will mask them to the SVC later.
3. Install the SVC into your SAN fabric and create the required SAN zones for the SVC
nodes and SVC to see the new storage.
4. Mask the LUNs from your new storage controller to the SVC and use svctask
detectmdisk on the SVC to discover the new LUNs as MDisks.
5. Place the MDisks into the appropriate MDG.
6. Zone the hosts to the SVC (while maintaining their current zone to their storage) so that
you can discover and define the hosts to the SVC.
7. At an appropriate time, install the IBM SDD onto the hosts that will soon use the SVC for
storage. If you have performed testing to ensure that the host can use both SDD and the
original driver, you can perform this step anytime before the next step.
8. Quiesce or shut down the hosts so that they no longer use the old storage.
9. Change the masking on the LUNs on the old storage controller so that the SVC now is the
only user of the LUNs. You can change this masking one LUN at a time so that you can
discover them (in the next step) one at a time and not mix any LUNs up.
10.Use svctask detectmdisk to discover the LUNs as MDisks. We recommend that you also
use svctask chmdisk to rename the LUNs to something more meaningful.
11.Define a VDisk from each LUN and note its exact size (to the number of bytes) by using
the svcinfo lsvdisk command.
12.Define a FlashCopy mapping and start the FlashCopy mapping for each VDisk by using
the steps in 8.1.2, “Steps to making a FlashCopy VDisk with application data integrity” on
page 153.
13.Assign the target VDisks to the hosts and then restart your hosts. Your host sees the
original data with the exception that the storage is now an IBM SVC LUN.
With these steps, you have made a copy of the existing storage, and the SVC has not been
configured to write to the original storage. Thus, if you encounter any problems with these
steps, you can reverse everything that you have done, assign the old storage back to the
host, and continue without the SVC.
By using FlashCopy in this example, any incoming writes go to the new storage subsystem
and any read requests that have not been copied to the new subsystem automatically come
from the old subsystem (the FlashCopy source).
You can alter the FlashCopy copy rate, as appropriate, to ensure that all the data is copied to
the new controller.
After the FlashCopy completes, you can delete the FlashCopy mappings and the source
VDisks. After all the LUNs have been migrated across to the new storage controller, you can
remove the old storage controller from the SVC node zones and then, optionally, remove the
old storage controller from the SAN fabric.
You can also use this process if you want to migrate to a new storage controller and not keep
the SVC after the migration. At step 2 on page 160, make sure that you create LUNs that are
the same size as the original LUNs. Then, at step 11, use image mode VDisks. When the
FlashCopy mappings complete, you can shut down the hosts and map the storage directly to
them, remove the SVC, and continue on the new storage controller.
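The MDisk discovery and VDisk definitions in steps 10 through 12 might look similar to the
following sketch (all names and IDs are hypothetical):
svctask detectmdisk
svctask chmdisk -name old_ctrl_lun0 mdisk12
svctask mkvdisk -mdiskgrp MDG_OLD -iogrp 0 -vtype image -mdisk old_ctrl_lun0 -name vdisk_src0
svcinfo lsvdisk -bytes vdisk_src0
svctask mkvdisk -mdiskgrp MDG_NEW -iogrp 0 -size <exact_size_in_bytes> -unit b -name vdisk_tgt0
svctask mkfcmap -source vdisk_src0 -target vdisk_tgt0 -copyrate 50 -name map_mig0
svctask startfcmap -prep map_mig0
The image mode VDisk preserves the existing data on the old LUN, and the target VDisk on
the new storage is created with exactly the same size in bytes as reported by svcinfo lsvdisk
-bytes.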
8.1.9 Summary of FlashCopy rules
To summarize the FlashCopy rules:
 FlashCopy services can only be provided inside an SVC cluster. If you want to FlashCopy
to remote storage, the remote storage needs to be defined locally to the SVC cluster.
 To maintain data integrity, ensure that all application I/Os and host I/Os are flushed from
any application and operating system buffers.
 You might need to stop your application in order for it to be “restarted” with a copy of the
VDisk that you make. Check with your application vendor if you have any doubts.
 Be careful if you want to map the FlashCopy target VDisk to the same host that already
has the source VDisk mapped to it. Check that your operating system supports this
configuration.
 The target VDisk must be the same size as the source VDisk; however, the target VDisk
can be a different type (image, striped, or sequential mode) or have different cache
settings (cache-enabled or cache-disabled).
 If you stop a FlashCopy mapping or a Consistency Group before it has completed, you will
lose access to the target VDisks. If the target VDisks are mapped to hosts, they will have
I/O errors.
 A VDisk cannot be a source in one FlashCopy mapping and a target in another FlashCopy
mapping.
 A VDisk can be the source for up to 16 targets.
 A FlashCopy target cannot be used in a Metro Mirror or Global Mirror relationship.
8.2 Metro Mirror and Global Mirror
In the following topics, we discuss Metro Mirror and Global Mirror guidelines and best
practices.
8.2.1 Using both Metro Mirror and Global Mirror between two clusters
A Remote Copy (RC) Mirror relationship is a relationship between two individual VDisks of
the same size. The management of an RC Mirror relationship is always performed in the
cluster where the source VDisk exists. You can run Metro Mirror and Global Mirror
relationships between the same two clusters at the same time.
However, you must consider the performance implications of such a configuration, because
the write data from all mirroring relationships is transported over the same inter-cluster links.
Metro Mirror and Global Mirror respond differently to a heavily loaded, poorly performing link.
Metro Mirror will usually maintain the relationships in a consistent synchronized state,
meaning that primary host applications will start to see poor performance (as a result of the
synchronous mirroring being used).
Global Mirror, however, offers a higher level of write performance to primary host
applications. With a well-performing link, writes are completed asynchronously. If link
performance becomes unacceptable, the link tolerance feature automatically stops Global
Mirror relationships to ensure that the performance for application hosts remains within
reasonable limits.
Therefore, with active Metro Mirror and Global Mirror relationships between the same two
clusters, Global Mirror writes might suffer degraded performance if Metro Mirror relationships
consume most of the inter-cluster link’s capability. If this degradation reaches a level where
hosts writing to Global Mirror experience extended response times, the Global Mirror
relationships can be stopped when the link tolerance threshold is exceeded. If this situation
happens, refer to 8.2.9, “Diagnosing and fixing 1920 errors” on page 170.
8.2.2 Performing three-way copy service functions
If you have a requirement to perform three-way (or more) replication using copy service
functions (synchronous or asynchronous mirroring), you can address this requirement by
using a combination of SVC copy services with image mode cache-disabled VDisks and
storage controller copy services. Both relationships are active, as shown in Figure 8-3 on
page 163.
Figure 8-3 Using three-way copy services
Important: The SVC only supports copy services between two clusters.
In Figure 8-3, the Primary Site uses SVC copy services (Global Mirror or Metro Mirror) to the
secondary site. Thus, in the event of a disaster at the primary site, the storage administrator
enables access to the target VDisk (from the secondary site), and the business application
continues processing.
While the business continues processing at the secondary site, the storage controller copy
services replicate to the third site.
8.2.3 Using native controller Advanced Copy Services functions
Native copy services are not supported on all storage controllers. There is a summary of the
known limitations at the following Web site:
http://www-1.ibm.com/support/docview.wss?&uid=ssg1S1002852
The storage controller is unaware of the SVC
When you use the copy services function in a storage controller, remember that the storage
controller has no knowledge that the SVC exists and that the storage controller uses those
disks on behalf of the real hosts. Therefore, when allocating source volumes and target
volumes in a point-in-time copy relationship or a remote mirror relationship, make sure you
choose them in the right order. If you accidently use a source logical unit number (LUN) with
SVC data on it as a target LUN, you can accidentally destroy that data.
If that LUN was a Managed Disk (MDisk) in an MDisk group (MDG) with striped or sequential
VDisks on it, the accident might cascade up and bring the MDG offline. This situation, in turn,
makes all the VDisks that belong to that group offline.
When defining LUNs in point-in-time copy or a remote mirror relationship, double-check that
the SVC does not have visibility to the LUN (mask it so that no SVC node can see it), or if the
SVC must see the LUN, ensure that it is an unmanaged MDisk.
The storage controller might, as part of its Advanced Copy Services function, take a LUN
offline or suspend reads or writes. The SVC does not understand why this happens;
therefore, the SVC might log errors when these events occur.
If you mask target LUNs to the SVC and rename your MDisks as you discover them and if the
Advanced Copy Services function prohibits access to the LUN as part of its processing, the
MDisk might be discarded and rediscovered with an SVC-assigned MDisk name.
Cache-disabled image mode VDisks
When the SVC uses a LUN from a storage controller that is a source or target of Advanced
Copy Services functions, you can only use that LUN as a cache-disabled image mode VDisk.
If you use the LUN for any other type of SVC VDisk, you risk data loss. Not only of the data on
that LUN, but you can potentially bring down all VDisks in the MDG to which you assigned
that LUN (MDisk).
If you leave caching enabled on a VDisk, the underlying controller does not get any write I/Os
as the host writes them; the SVC caches them and destages them at a later time, which can
have additional ramifications if a target host is dependent on the write I/Os from the source
host as they are written.
When to use storage controller Advanced Copy Services functions
The SVC provides you with greater flexibility than only using native copy service functions,
namely:
 Standard storage device driver. Regardless of the storage controller behind the SVC, you
can use the IBM Subsystem Device Driver (SDD) to access the storage. As your
environment changes and your storage controllers change, using SDD negates the need
to update device driver software as those changes occur.
 The SVC can provide copy service functions between any supported controller to any
other supported controller, even if the controllers are from different vendors. This
capability enables you to use a lower class or cost of storage as a target for point-in-time
copies or remote mirror copies.
 The SVC enables you to move data around without host application interruption, which
can be useful, especially when the storage infrastructure is retired when new technology
becomes available.
However, certain storage controllers can provide additional copy service features and
functions compared to the capability of the current version of SVC. If you have a requirement
to use those features, you can use those additional copy service features and leverage the
features that the SVC provides by using cache-disabled image mode VDisks.
8.2.4 Configuration requirements for long distance links
IBM has tested a number of Fibre Channel extender and SAN router technologies for use with
the SVC.
The list of supported SAN routers and Fibre Channel extenders is available at this Web site:
http://www.ibm.com/storage/support/2145
If you use one of these extenders or routers, you need to test the link to ensure that the
following requirements are met before you place SVC traffic onto the link:
 For SVC 4.1.0.x, the round-trip latency between sites must not exceed 68 ms (34 ms one-way) for Fibre Channel (FC) extenders or 20 ms (10 ms one-way) for SAN routers.
 For SVC 4.1.1.x and later, the round-trip latency between sites must not exceed 80 ms
(40 ms one-way).
The latency of long distance links is dependent on the technology that is used. Typically,
for each 100 km (62.1 miles) of distance, it is assumed that 1 ms is added to the latency,
which for Global Mirror means that the remote cluster can be up to 4 000 km (2485 miles)
away.
 When testing your link for latency, it is important that you take into consideration both
current and future expected workloads, including any times when the workload might be
unusually high. You must evaluate the peak workload by considering the average write
workload over a period of one minute or less plus the required synchronization copy
bandwidth.
 SVC uses part of the bandwidth for its internal SVC inter-cluster heartbeat. The amount of
traffic depends on how many nodes are in each of the two clusters. Table 8-1 shows the
amount of traffic, in megabits per second, generated by different sizes of clusters.
These numbers represent the total traffic between the two clusters when no I/O is taking
place to mirrored VDisks. Half of the data is sent by one cluster, and half of the data is
sent by the other cluster. The traffic will be divided evenly over all available inter-cluster
links; therefore, if you have two redundant links, half of this traffic will be sent over each
link during fault-free operation.
Table 8-1 SVC inter-cluster heartbeat traffic (megabits per second)
Local/remote cluster   Two nodes   Four nodes   Six nodes   Eight nodes
Two nodes              2.6         4.0          5.4         6.7
Four nodes             4.0         5.5          7.1         8.6
Six nodes              5.4         7.1          8.8         10.5
Eight nodes            6.7         8.6          10.5        12.4
 If the link between the sites is configured with redundancy so that it can tolerate single
failures, the link must be sized so that the bandwidth and latency statements continue to
be accurate even during single failure conditions.
8.2.5 Saving bandwidth creating Metro Mirror and Global Mirror relationships
If you have a situation where you have a large source VDisk (or a large number of source
VDisks) that you want to replicate to a remote site and your planning shows that the SVC
mirror initial sync time will take too long (or will be too costly if you pay for the traffic that you
use), here is a method of setting up the sync using another medium (that might be less
expensive).
Another reason that you might want to use these steps is if you want to increase the size of
the VDisks currently in a Metro Mirror relationship or a Global Mirror relationship. To increase
the size of these VDisks, you must delete the current mirror relationships and redefine the
mirror relationships after you have resized the VDisks.
In this example, we use tape media as the source for the initial sync for the Metro Mirror
relationship or the Global Mirror relationship target before using SVC to maintain the Metro
Mirror or Global Mirror. This example does not require downtime for the hosts using the
source VDisks.
Here are the steps:
1. The hosts are up and running and using their VDisks normally. There is no Metro Mirror
relationship or Global Mirror relationship defined yet.
You have identified all the VDisks that will become the source VDisks in a Metro Mirror
relationship or a Global Mirror relationship.
2. You have already established the SVC cluster relationship with the target SVC.
3. Define a Metro Mirror relationship or a Global Mirror relationship for each source VDisk.
When defining the relationship, ensure that you use the -sync option, which stops the
SVC from performing an initial sync.
Note: If you fail to use the -sync option, all of these steps are redundant, because the
SVC performs a full initial sync anyway.
4. Stop each mirror relationship by using the -access option, which enables write access to
the target VDisks. We will need this write access later.
5. Make a copy of the source VDisk to the alternate media by using the dd command to copy
the contents of the VDisk to tape. Another option might be using your backup tool (for
example, IBM Tivoli® Storage Manager) to make an image backup of the VDisk.
Note: Even though the source is being modified while you are copying the image, the
SVC is tracking those changes. Your image that you create might already have some of
the changes and is likely to have missed some of the changes as well.
When the relationship is restarted, the SVC will apply all of the changes that occurred
since the relationship was stopped in step 4. After all of the changes are applied, you will
have a consistent target image.
6. Ship your media to the remote site and apply the contents to the targets of the
Metro/Global Mirror relationship; you can mount the Metro Mirror and Global Mirror target
VDisks to a UNIX server and use the dd command to copy the contents of the tape to the
target VDisk. If you used your backup tool to make an image of the VDisk, follow the
instructions for your tool to restore the image to the target VDisk. Do not forget to remove
the mount, if this is a temporary host.
Note: It will not matter how long it takes to get your media to the remote site and
perform this step. The quicker you can get it to the remote site and loaded, the quicker
SVC is running and maintaining the Metro Mirror and Global Mirror.
7. Unmount the target VDisks from your host. When you start the Metro Mirror and Global
Mirror relationship later, the SVC will stop write access to the VDisk while the mirror
relationship is running.
8. Start your Metro Mirror and Global Mirror relationships. While the mirror relationship
catches up, the target VDisk is not usable at all. As soon as it reaches Consistent
Copying, your remote VDisk is ready for use in a disaster.
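A sketch of steps 3 and 4, with hypothetical relationship and cluster names, follows (omit
-global for a Metro Mirror relationship):
svctask mkrcrelationship -master vdisk_src -aux vdisk_tgt -cluster SVC_REMOTE -global -sync -name rc_rel1
svctask stoprcrelationship -access rc_rel1
The -sync flag records the relationship as already synchronized, and the -access flag makes
the target VDisk writable so that the tape image can later be restored onto it before the
relationship is restarted with svctask startrcrelationship (a -force flag might be required after
write access was enabled).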
8.2.6 Global Mirror guidelines
When using SVC Global Mirror, all components in the SAN (switches, remote links, and
storage controllers) must be capable of sustaining the workload generated by application
hosts, as well as the Global Mirror background copy workload. If this is not true, Global Mirror
might automatically stop your relationships to protect your application hosts from increased
response times.
The Global Mirror partnership’s background copy rate must be set to a value appropriate to
the link and secondary back-end storage.
Cache-disabled VDisks are not supported as participants in a Global Mirror relationship.
We recommend that you use a SAN performance monitoring tool, such as IBM TotalStorage
Productivity Center (TPC), which allows you to continuously monitor the SAN components for
error conditions and performance problems.
TPC can alert you as soon as there is a performance problem or if a Global (or Metro Mirror)
link has been automatically suspended by the SVC. A remote copy relationship that remains
stopped without intervention can severely impact your recovery point objective. Additionally,
restarting a link that has been suspended for a long period of time can add additional burden
to your links while the synchronization catches up.
The gmlinktolerance parameter
The gmlinktolerance parameter of the remote copy partnership must be set to an
appropriate value. The default value of 300 seconds (5 minutes) is appropriate for most
clients.
If you plan to perform SAN maintenance that might impact SVC Global Mirror relationships,
you must either:
 Pick a maintenance window where application I/O workload is reduced for the duration of
the maintenance
 Disable the gmlinktolerance feature or increase the gmlinktolerance value (meaning that
application hosts might see extended response times from Global Mirror VDisks)
 Stop the Global Mirror relationships
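For reference, the partnership background copy bandwidth and the gmlinktolerance value are
adjusted with commands similar to the following sketch; the values shown are examples only:
svctask chpartnership -bandwidth 200 SVC_REMOTE
svctask chcluster -gmlinktolerance 300
The bandwidth value is specified in megabytes per second and limits the background copy
traffic across the link; setting -gmlinktolerance to 0 disables the link tolerance feature.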
VDisk preferred node
Global Mirror VDisks must have their preferred nodes evenly distributed between the nodes
of the clusters.
The preferred node property of a VDisk helps to balance the I/O load between nodes in that
I/O Group. This property is also used by Global Mirror to route I/O between clusters.
The SVC node that receives a write for a VDisk is normally that VDisk’s preferred node. For
VDisks in a Global Mirror relationship, that node is also responsible for sending that write to
the preferred node of the target VDisk. The primary preferred node is also responsible for
sending any writes relating to background copy; again, these writes are sent to the preferred
node of the target VDisk.
Note: The preferred node for a VDisk cannot be changed non-disruptively or easily after
the VDisk is created.
Each node of the remote cluster has a fixed pool of Global Mirror system resources for each
node of the primary cluster. That is, each remote node has a separate queue for I/O from
each of the primary nodes. This queue is a fixed size and is the same size for every node.
If preferred nodes for the VDisks of the remote cluster are set so that every combination of
primary node and secondary node is used, Global Mirror performance will be maximized.
Figure 8-4 shows an example of Global Mirror resources that are not optimized. VDisks from
the Local Cluster are replicated to the Remote Cluster, and every source VDisk with a
preferred node of Node 1 is replicated to a target VDisk whose preferred node is also Node 1.
With this configuration, the Remote Cluster Node 1 resources that are reserved for Local
Cluster Node 2 are not used, and neither are the Remote Cluster Node 2 resources that are
reserved for Local Cluster Node 1.
Figure 8-4 Global Mirror resources not optimized
If the configuration was changed to the configuration shown in Figure 8-5, all Global Mirror
resources for each node are used, and SVC Global Mirror operates with better performance
than that of the configuration shown in Figure 8-4.
Figure 8-5 Global Mirror resources optimized
Back-end storage controller requirements
The capabilities of the storage controllers in a remote SVC cluster must be provisioned to
allow for:
 The peak application workload to the Global Mirror or Metro Mirror VDisks
 The defined level of background copy
 Any other I/O being performed at the remote site
The performance of applications at the primary cluster can be limited by the performance of
the back-end storage controllers at the remote cluster.
To maximize the number of I/Os that applications can perform to Global Mirror and Metro
Mirror VDisks:
 Global Mirror and Metro Mirror VDisks at the remote cluster must be in dedicated MDisk
Groups. The MDisk Groups must not contain non-mirror VDisks.
 Storage controllers must be configured to support the mirror workload that is required of
them, which might be achieved by:
– Dedicating storage controllers to only Global Mirror and Metro Mirror VDisks
– Configuring the controller to guarantee sufficient quality of service for the disks used by
Global Mirror and Metro Mirror
– Ensuring that physical disks are not shared between Global Mirror or Metro Mirror
VDisks and other I/O
– Verifying that the MDisks within a mirror MDisk Group are similar in their
characteristics (for example, Redundant Array of Independent Disks (RAID) level,
physical disk count, and disk speed)
8.2.7 Migrating a Metro Mirror relationship to Global Mirror
It is possible to change a Metro Mirror relationship to a Global Mirror relationship or a Global
Mirror relationship to a Metro Mirror relationship. This procedure, however, requires an
outage to the host and is only successful if you can guarantee that no I/Os are generated to
either the source or target VDisks through these steps:
1. Your host is currently running with VDisks that are in a Metro Mirror or Global Mirror
relationship. This relationship is in the state Consistent-Synchronized.
2. Stop the application and the host.
3. Optionally, unmap the VDisks from the host to guarantee that no I/O can be performed on
these VDisks. If there are currently outstanding write I/Os in the cache, you might need to
wait at least two minutes before you can unmap the VDisks.
4. Stop the Metro Mirror or Global Mirror relationship, and ensure that the relationship stops
with Consistent Stopped.
5. Delete the current Metro Mirror or Global Mirror relationship.
6. Create the new Metro Mirror or Global Mirror relationship. Ensure that you create it as
synchronized to stop the SVC from resynchronizing the VDisks. Use the -sync flag with
the svctask mkrcrelationship command.
7. Start the new Metro Mirror or Global Mirror relationship.
8. Remap the source VDisks to the host if you unmapped them in step 3.
9. Start the host and the application.
Extremely important: If the relationship is not stopped in the consistent state, or if any
host I/O takes place between stopping the old Metro Mirror or Global Mirror relationship
and starting the new Metro Mirror or Global Mirror relationship, those changes will never be
mirrored to the target VDisks. As a result, the data on the source and target VDisks is not
exactly the same, and the SVC will be unaware of the inconsistency.
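A sketch of steps 4 through 7, with hypothetical names, for converting a Metro Mirror
relationship into a Global Mirror relationship:
svctask stoprcrelationship rc_rel1
svctask rmrcrelationship rc_rel1
svctask mkrcrelationship -master vdisk_src -aux vdisk_tgt -cluster SVC_REMOTE -global -sync -name rc_rel1
svctask startrcrelationship rc_rel1
Because the relationship is re-created with the -sync flag, the SVC does not resynchronize
the VDisks, which is only safe if no host I/O occurred while the old relationship was deleted.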
8.2.8 Recovering from suspended Metro Mirror or Global Mirror relationships
It is important to understand that when a Metro Mirror or Global Mirror relationship is started
for the first time, or is restarted after it has been stopped or suspended for any reason, the
target disk is not in a consistent state while the synchronization is "catching up"; it only
becomes consistent when the synchronization completes.
If you attempt to use the target VDisk at any time that a synchronization has started and
before it gets to the synchronized state (by stopping the mirror relationship and making the
target writable), the VDisk will contain only parts of the source VDisk and must not be used.
This inconsistency is particularly important if you have a Global/Metro Mirror relationship
running (that is synchronized) and the link fails (thus, the mirror relationship suspends). When
you restart the mirror relationship, the target disk will not be usable until the mirror catches up
and becomes synchronized again.
Depending on the number of changes that need to be applied to the target and on your
bandwidth, this situation can leave you exposed without a usable target VDisk at all until the
synchronization completes.
To avoid this exposure, we recommend that you make a FlashCopy of the target VDisks
before you restart the mirror relationship. At least this way, you will have a usable target
VDisk even if it does contain old data.
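A minimal sketch of this protection, with hypothetical names, takes a FlashCopy of the mirror
target before the relationship is restarted:
svctask mkfcmap -source vdisk_gm_target -target vdisk_gm_target_copy -copyrate 0 -name map_protect
svctask startfcmap -prep map_protect
svctask startrcrelationship rc_rel1
If the resynchronization is interrupted, vdisk_gm_target_copy still holds the older, consistent
image of the target.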
8.2.9 Diagnosing and fixing 1920 errors
The SVC generates a 1920 error message whenever a Metro Mirror or Global Mirror
relationship has stopped due to poor performance. A 1920 error does not occur during normal
operation as long as you use a supported configuration and your SAN fabric links have been
sized to suit your workload.
This 1920 error can be a temporary error, for example, as a result of maintenance, or a
permanent error due to a hardware failure or an unexpectedly high host I/O workload.
If several 1920 errors have occurred, you must diagnose the cause of the earliest error first.
In order to diagnose the cause of the first error, it is extremely important that TPC, or your
chosen SAN performance analysis tool, is correctly configured and monitoring statistics when
the problem occurs. If you use TPC, set TPC to collect available statistics using the lowest
collection interval period, which is currently five minutes.
These situations are the most likely reasons for a 1920 error:
 Maintenance caused a change, such as switch or storage controller changes, for
example, updating firmware or adding additional capacity
 The remote link is overloaded. Using TPC, you can check the following metrics to see if
the remote link was a cause:
– Look at the total Global Mirror auxiliary VDisk write throughput before the Global Mirror
relationships were stopped.
If this write throughput is approximately equal to your link bandwidth, it is extremely
likely that your link is overloaded, which might be due to application host I/O or a
combination of host I/O and background (synchronization) copy activity.
– Look at the total Global Mirror source VDisk write throughput before the Global Mirror
relationships were stopped.
This write throughput represents only the I/O performed by the application hosts. If this
number approaches the link bandwidth, you might need to either upgrade the link’s
bandwidth, reduce the I/O that the application is attempting to perform, or choose to
mirror fewer VDisks using Global Mirror.
If, however, the auxiliary disks show much more write I/O than the source VDisks, this
situation suggests a high level of background copy. Try decreasing the Global Mirror
partnership’s background copy rate parameter to bring the total application I/O
bandwidth and background copy rate within the link’s capabilities.
– Look at the total Global Mirror source VDisk write throughput after the Global Mirror
relationships were stopped.
If write throughput increases greatly (by 30% or more) when the relationships were
stopped, this situation indicates that the application host was attempting to perform
more I/O than the link can sustain. While the Global Mirror relationships are active, the
overloaded link causes higher response times to the application host, which decreases
the throughput that it can achieve. After the relationships have stopped, the application
host sees lower response times, and you can see the true I/O workload. In this case,
the link bandwidth must be increased, the application host I/O rate must be decreased,
or fewer VDisks must be mirrored using Global Mirror.
 The storage controllers at the remote cluster are overloaded. Any of the MDisks on a
storage controller that are providing poor service to the SVC cluster can cause a 1920
error if this poor service prevents application I/O from proceeding at the rate required by
the application host.
If you have followed the specified back-end storage controller requirements, it is most
likely that the error has been caused by a decrease in controller performance due to
maintenance actions or a hardware failure of the controller.
Use TPC to obtain the back-end write response time for each MDisk at the remote cluster.
A response time for any individual MDisk that exhibits a sudden increase of 50 ms or
more, or that is higher than 100 ms, indicates a problem:
– Check the storage controller for error conditions, such as media errors, a failed
physical disk, or associated activity, such as RAID array rebuilding.
If there is an error, fix the problem and restart the Global Mirror relationships.
If there is no error, consider whether the secondary controller is capable of processing
the required level of application host I/O. It might be possible to improve the
performance of the controller by:
• Adding more physical disks to a RAID array
• Changing the RAID level of the array
• Changing the controller's cache settings (and checking that the cache batteries are
healthy, if applicable)
• Changing other controller-specific configuration parameters
 The storage controllers at the primary site are overloaded. Analyze the performance of the
primary back-end storage using the same steps you use for the remote back-end storage.
The main effect of bad performance is to limit the amount of I/O that can be performed by
application hosts. Therefore, back-end storage at the primary site must be monitored
regardless of Global Mirror.
However, if bad performance continues for a prolonged period, it is possible that a 1920
error will occur and the Global Mirror relationships will stop.
 One of the SVC clusters is overloaded. Use TPC to obtain the port to local node send
response time and port to local node send queue time.
If the total of these statistics for either cluster is higher than 1 millisecond, the SVC might
be experiencing an extremely high I/O load.
Also, check the SVC node CPU utilization; if this figure is in excess of 50%, this situation
might also contribute to the problem.
In either case, contact your IBM service support representative (IBM SSR) for further
assistance.
 FlashCopy mappings are in the prepared state. If the Global Mirror target VDisks are the
sources of a FlashCopy mapping, and that mapping is in the prepared state for an
extended time, performance to those VDisks can be impacted, because the cache is
disabled. Starting the FlashCopy mapping will re-enable the cache, improving the VDisks'
performance for Global Mirror I/O.
8.2.10 Using Metro Mirror or Global Mirror with FlashCopy
SVC allows you to use a VDisk in a Metro Mirror or Global Mirror relationship as a source
VDisk for a FlashCopy mapping. You cannot use a VDisk as a FlashCopy mapping target that
is already in a Metro Mirror or Global Mirror relationship.
When you prepare a FlashCopy mapping, the SVC puts the source VDisks into a temporary
cache-disabled state. This temporary state adds additional latency to the Metro Mirror
relationship, because I/Os that are normally committed to SVC memory now need to be
committed to the storage controller.
One method of avoiding this latency is to temporarily stop the Metro Mirror or Global Mirror
relationship before preparing the FlashCopy mapping. When the Metro Mirror or Global Mirror
relationship is stopped, the SVC records all changes that occur to the source VDisks and
applies those changes to the target when the remote copy mirror is restarted. The steps to
temporarily stop the Metro Mirror or Global Mirror relationship before preparing the
FlashCopy mapping are:
1. Stop the Metro Mirror or Global Mirror relationship. While the relationship is stopped, the
SVC records all changes that are made to the source VDisks.
2. Prepare and start the FlashCopy mapping. Because the remote copy relationship is
stopped, the temporary cache-disabled state of the FlashCopy source VDisks does not add
latency to the mirror.
3. Restart the Metro Mirror or Global Mirror relationship. The SVC applies the changes that
were recorded while the relationship was stopped, and when the relationship returns to the
Consistent Synchronized state, the remote VDisks are again usable for disaster recovery.
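A sketch of this sequence, with hypothetical names:
svctask stoprcrelationship rc_rel1
svctask prestartfcmap map_fc1
svctask startfcmap map_fc1
svctask startrcrelationship rc_rel1
Because the remote copy relationship is stopped while the FlashCopy mapping is prepared,
the temporary cache-disabled state does not slow the mirror down, and the SVC applies the
recorded changes when the relationship is restarted.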
8.2.11 Using TPC to monitor Global Mirror performance
It is important to use a SAN performance monitoring tool to ensure that all SAN components
perform correctly. While a SAN performance monitoring tool is useful in any SAN
environment, it is particularly important when using an asynchronous mirroring solution, such
as SVC Global Mirror. Performance statistics must be gathered at the highest possible
frequency, which is currently five minutes for TPC.
Note that if your VDisk or MDisk configuration is changed, you must restart your TPC
performance report to ensure that performance is correctly monitored for the new
configuration.
If using TPC, monitor:
 Global Mirror Secondary Write Lag
You monitor Global Mirror Secondary Write Lag to identify mirror delays (tpcpool metric
942).
 Port to Remote Node Send Response Time
This time needs to be less than 80 ms (the maximum latency supported by SVC Global
Mirror). A number in excess of 80 ms suggests that the long-distance link has excessive
latency, which needs to be rectified. One possibility to investigate is that the link is
operating at maximum bandwidth (tpcpool metrics 931 and 934).
 Sum of Port to Local Node Send Response Time and Port to Local Node Send Queue
Time must be less than 1 ms for the primary cluster. A number in excess of 1 ms might
indicate that an I/O Group is reaching its I/O throughput limit, which can limit performance.
 CPU Utilization Percentage
CPU Utilization must be below 50%.
 Sum of Backend Write Response Time and Write Queue Time for Global Mirror MDisks at
the remote cluster
This time needs to be less than 100 ms. A longer response time can indicate that the storage
controller is overloaded. If the response time for a specific storage controller is outside of
its specified operating range, investigate it in the same way.
 Sum of Backend Write Response Time and Write Queue Time for Global Mirror MDisks at
the primary cluster
Time must also be less than 100 ms. If response time is greater than 100 ms, application
hosts might see extended response times if the SVC’s cache becomes full.
 Write Data Rate for Global Mirror MDisk groups at the remote cluster
This data rate indicates the amount of data that is being written by Global Mirror. If this
number approaches either the inter-cluster link bandwidth or the storage controller
throughput limit, be aware that further increases can cause overloading of the system and
monitor this number appropriately.
8.2.12 Summary of Metro Mirror and Global Mirror rules
To summarize the Metro Mirror and Global Mirror rules:
 FlashCopy targets cannot be in a Metro Mirror or Global Mirror relationship, only
FlashCopy sources can be in a Metro Mirror or Global Mirror relationship.
 Metro Mirror or Global Mirror source or target VDisks cannot be moved to different I/O
Groups.
 Metro Mirror or Global Mirror VDisks cannot be resized.
 Intra-cluster Metro Mirror or Global Mirror can only mirror between VDisks in the same I/O
Group.
 The target VDisks must be the same size as the source VDisks; however, the target VDisk
can be a different type (image, striped, or sequential mode) or have different cache
settings (cache-enabled or cache-disabled).
Chapter 9. Hosts
This chapter describes best practices for monitoring host systems attached to the SAN
Volume Controller (SVC).
A host system is an Open Systems computer that is connected to the switch through a Fibre
Channel (FC) interface.
The most important part of tuning, troubleshooting, and performance consideration for a host
that is attached to an SVC lies within the host itself. There are three major areas of concern:
 Using multipathing and bandwidth (physical capability of SAN and back-end storage)
 Understanding how your host performs I/O and the types of I/O
 Utilizing measurement and test tools to determine host performance and for tuning
This topic supplements the IBM System Storage SAN Volume Controller Host Attachment
User’s Guide Version 4.3.0, SC26-7905-02, at:
http://www-1.ibm.com/support/docview.wss?rs=591&context=STCCCXR&context=STCCCYH&dc
=DA400&q1=english&q2=-Japanese&uid=ssg1S7002159&loc=en_US&cs=utf-8&lang=en
9.1 Configuration recommendations
There are basic configuration recommendations when using the SVC to manage storage that
is connected to any host. The considerations include how many paths through the fabric are
allocated to the host, how many host ports to use, how to spread the hosts across I/O
Groups, logical unit number (LUN) mapping, and the correct size of virtual disks (VDisks) to
use.
9.1.1 The number of paths
From general experience, we have determined that it is best to limit the total number of paths
from any host to the SVC: we recommend that the multipathing software on each host
manage no more than four paths, even though the maximum supported is eight paths.
Following this rule solves many issues with high port fanouts, fabric state changes, and host
memory management, and it improves performance.
Refer to the following Web site for the latest maximum configuration requirements:
http://www-1.ibm.com/support/docview.wss?uid=ssg1S7002156
The major reason to limit the number of paths available to a host from the SVC is error
recovery, failover, and failback. The overall time that a host needs to handle errors is
significantly reduced, and the host resources that are consumed are reduced each time that
you remove a path from multipathing management. A two-path configuration has just one
path to each node; it is supported but is not recommended for most environments. Refer to
the host attachment guide for specific host and OS requirements:
http://www-1.ibm.com/support/docview.wss?rs=591&context=STCCCXR&context=STCCCYH&dc
=DA400&q1=english&q2=-Japanese&uid=ssg1S7002159&loc=en_US&cs=utf-8&lang=en
We have measured the effect of multipathing on performance, as shown in the following
table. As the table shows, the differences in performance are generally minimal, but they
can reduce performance by almost 10% for specific workloads. These numbers
were produced with an AIX host running IBM Subsystem Device Driver (SDD) against the
SVC. The host was tuned specifically for performance by adjusting queue depths and buffers.
We tested a range of reads and writes, random and sequential, cache hits and misses, at 512
byte, 4 KB, and 64 KB transfer sizes.
Table 9-1 on page 177 shows the effects of multipathing.
Table 9-1 Effect of multipathing on write performance
R/W test                              Four paths    Eight paths   Difference
Write Hit 512 b Sequential IOPS       81 877        74 909        -8.6%
Write Miss 512 b Random IOPS          60 510.4      57 567.1      -5.0%
70/30 R/W Miss 4K Random IOPS         130 445.3     124 547.9     -5.6%
70/30 R/W Miss 64K Random MBps        1 810.8138    1 834.2696    1.3%
50/50 R/W Miss 4K Random IOPS         97 822.6      98 427.8      0.6%
50/50 R/W Miss 64K Random MBps        1 674.5727    1 678.1815    0.2%
9.1.2 Host ports
The general recommendation for utilizing host ports connected to the SVC is to limit the
number of physical ports to two ports on two different physical adapters. Each of these ports
will be zoned to one target port in each SVC node, thus limiting the number of total paths to
four, preferably on totally separate redundant SAN fabrics.
If four host ports are preferred for maximum path redundancy, the requirement is to zone each
host adapter to one SVC target port on each node (for a maximum of eight paths). However,
the benefits of the additional path redundancy are outweighed by the host memory and
resource utilization required for the extra paths.
Use one host object to represent a cluster of hosts and use multiple worldwide port names
(WWPNs) to represent the ports from all the hosts that will share the same set of VDisks.
Best practice: Though it is supported in theory, we strongly recommend that you keep
Fibre Channel tape and Fibre Channel disks on separate host bus adapters (HBAs).
These devices have two extremely different data patterns when operating in their optimum
mode, and the switching between them can cause undesired overhead and performance
slowdown for the applications.
9.1.3 Port masking
You can use a port mask to control the node target ports that a host can access. The port
mask applies to logins from the host port that are associated with the host object. You can
use this capability to simplify the switch zoning by limiting the SVC ports within the SVC
configuration, rather than utilizing direct one-to-one zoning within the switch. This capability
can simplify zone management.
The port mask is a four-bit field that applies to all nodes in the cluster for the particular host.
For example, a port mask of 0001 allows a host to log in to a single port on every SVC node in
the cluster, if the switch zone also includes both host and SVC node ports.
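As an illustration, the port mask can be supplied when the host object is defined or changed;
the -mask parameter name is assumed here, so verify it against the CLI reference for your
code level:
svctask mkhost -name host2 -hbawwpn 210000E08B89CCC2 -mask 0001
svctask chhost -mask 0011 host2
With the changed mask of 0011, the host can log in to two ports on every SVC node in the
cluster, provided that the switch zoning also allows those logins.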
9.1.4 Host to I/O Group mapping
An I/O Group consists of two SVC nodes that share the management of VDisks within a
cluster. The recommendation is to utilize a single I/O Group (iogrp) for all VDisks allocated to
a particular host. This recommendation has many benefits. One major benefit is the
minimization of port fanouts within the SAN fabric. Another benefit is to maximize the
potential host attachments to the SVC, because maximums are based on I/O Groups. A third
benefit is within the host itself, having fewer target ports to manage.
The number of host ports and host objects allowed per I/O Group depends upon the switch
fabric type. Refer to the maximum configurations document for these maximums:
http://www-1.ibm.com/support/docview.wss?rs=591&uid=ssg1S1003283
Occasionally, an extremely powerful host can benefit from spreading its VDisks across I/O
Groups for load balancing. Our recommendation is to start with a single I/O Group and use
the performance monitoring tools, such as TotalStorage Productivity Center (TPC), to
determine if the host is I/O Group-limited. If additional I/O Groups are needed for the
bandwidth, it is possible to use more host ports to allocate to the other I/O Group. For
example, start with two HBAs zoned to one I/O Group. To add bandwidth, add two more
HBAs and zone to the other I/O Group. The host object in the SVC will contain both sets of
HBAs. The load can be balanced by selecting which host volumes are allocated to each
VDisk. Because VDisks are allocated to only a single I/O Group, the load will then be spread
across both I/O Groups based on the VDisk allocation spread.
9.1.5 VDisk size as opposed to quantity
In general, host resources, such as memory and processing time, are used up by each
storage LUN that is mapped to the host. For each extra path, additional memory can be used,
and a portion of additional processing time is also required. The user can control this effect by
using fewer, larger LUNs rather than many small LUNs; however, this approach might require
tuning of queue depths and I/O buffers to work efficiently. A host without tunable parameters,
such as Windows, does not benefit from large VDisk sizes, whereas AIX benefits greatly from
being presented with a smaller number of larger VDisks and, therefore, fewer paths.
9.1.6 Host VDisk mapping
When you create a VDisk-to-host mapping, the host ports that are associated with the host
object can see the LUN that represents the VDisk on up to eight Fibre Channel ports (the four
ports on each node in an I/O Group). Nodes always present the logical unit (LU) that
represents a specific VDisk with the same LUN on all ports in an I/O Group.
This LUN mapping is called the SCSI ID, and the SVC software automatically assigns the
next available ID if none is specified. There is also a
unique identifier on each VDisk called the LUN serial number.
The best practice recommendation is to allocate the SAN boot OS VDisk as the lowest SCSI
ID (zero for most hosts) and then allocate the various data disks. While it is not required, if you
share a VDisk among multiple hosts, control the SCSI IDs so that they are identical across the
hosts. This consistency ensures ease of management at the host level.
If you are using image mode to migrate a host into the SVC, allocate the VDisks in the same
order that they were originally assigned on the host from the back-end storage.
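For example (host and VDisk names are hypothetical), the SCSI IDs can be pinned explicitly
when the mappings are created:
svctask mkvdiskhostmap -host host2 -scsi 0 vdisk_boot
svctask mkvdiskhostmap -host host2 -scsi 1 vdisk_data1
svctask mkvdiskhostmap -host host2 -scsi 2 vdisk_data2
Specifying -scsi keeps the boot VDisk at SCSI ID 0 and keeps the data VDisk IDs identical
when the same VDisks are mapped to other hosts in a cluster.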
An invocation example:
svcinfo lshostvdiskmap -delim :
The resulting output:
id:name:SCSI_id:vdisk_id:vdisk_name:wwpn:vdisk_UID
2:host2:0:10:vdisk10:0000000000000ACA:6005076801958001500000000000000A
2:host2:1:11:vdisk11:0000000000000ACA:6005076801958001500000000000000B
2:host2:2:12:vdisk12:0000000000000ACA:6005076801958001500000000000000C
2:host2:3:13:vdisk13:0000000000000ACA:6005076801958001500000000000000D
2:host2:4:14:vdisk14:0000000000000ACA:6005076801958001500000000000000E
In this example, vdisk10 has a unique device identifier (UID) of
6005076801958001500000000000000A, while the SCSI ID that host2 uses for access is 0.
The next example shows the reverse view, that is, the hosts to which VDisk EEXCLS_HBin01
is mapped:
svcinfo lsvdiskhostmap -delim : EEXCLS_HBin01
id:name:SCSI_id:host_id:host_name:wwpn:vdisk_UID
950:EEXCLS_HBin01:14:109:HDMCENTEX1N1:10000000C938CFDF:600507680191011D48000000000
00466
950:EEXCLS_HBin01:14:109:HDMCENTEX1N1:10000000C938D01F:600507680191011D48000000000
00466
950:EEXCLS_HBin01:13:110:HDMCENTEX1N2:10000000C938D65B:600507680191011D48000000000
00466
950:EEXCLS_HBin01:13:110:HDMCENTEX1N2:10000000C938D3D3:600507680191011D48000000000
00466
950:EEXCLS_HBin01:14:111:HDMCENTEX1N3:10000000C938D615:600507680191011D48000000000
00466
950:EEXCLS_HBin01:14:111:HDMCENTEX1N3:10000000C938D612:600507680191011D48000000000
00466
950:EEXCLS_HBin01:14:112:HDMCENTEX1N4:10000000C938CFBD:600507680191011D48000000000
00466
950:EEXCLS_HBin01:14:112:HDMCENTEX1N4:10000000C938CE29:600507680191011D48000000000
00466
950:EEXCLS_HBin01:14:113:HDMCENTEX1N5:10000000C92EE1D8:600507680191011D48000000000
00466
950:EEXCLS_HBin01:14:113:HDMCENTEX1N5:10000000C92EDFFE:600507680191011D48000000000
00466
If using IBM multipathing software (IBM Subsystem Device Driver (SDD) or SDDDSM), the
command datapath query device shows the vdisk_UID (unique identifier) and so enables
easier management of VDisks. The SDDPCM equivalent command is pcmpath query device.
Host-VDisk mapping from more than one I/O Group
The SCSI ID field in the host-VDisk map might not be unique for a VDisk on a host, because it
does not completely define the uniqueness of the LUN; the target port is also used as part of
the identification. If VDisks from two I/O Groups are assigned to a host port, one set starts
with SCSI ID 0 and then increments (given the default), and the SCSI IDs for the second
I/O Group also start at zero and increment by default. Refer to Example 9-1 on page 180 for a
sample of this type of host map. VDisk s-0-6-4 and VDisk s-1-8-2 both have a SCSI ID of 1,
yet they have different LUN serial numbers.
Example 9-1 Host-VDisk mapping for one host from two I/O Groups
IBM_2145:ITSOCL1:admin>svcinfo lshostvdiskmap senegal
id  name     SCSI_id  vdisk_id  vdisk_name  wwpn              vdisk_UID
0   senegal  1        60        s-0-6-4     210000E08B89CCC2  60050768018101BF28000000000000A8
0   senegal  2        58        s-0-6-5     210000E08B89CCC2  60050768018101BF28000000000000A9
0   senegal  3        57        s-0-5-1     210000E08B89CCC2  60050768018101BF28000000000000AA
0   senegal  4        56        s-0-5-2     210000E08B89CCC2  60050768018101BF28000000000000AB
0   senegal  5        61        s-0-6-3     210000E08B89CCC2  60050768018101BF28000000000000A7
0   senegal  6        36        big-0-1     210000E08B89CCC2  60050768018101BF28000000000000B9
0   senegal  7        34        big-0-2     210000E08B89CCC2  60050768018101BF28000000000000BA
0   senegal  1        40        s-1-8-2     210000E08B89CCC2  60050768018101BF28000000000000B5
0   senegal  2        50        s-1-4-3     210000E08B89CCC2  60050768018101BF28000000000000B1
0   senegal  3        49        s-1-4-4     210000E08B89CCC2  60050768018101BF28000000000000B2
0   senegal  4        42        s-1-4-5     210000E08B89CCC2  60050768018101BF28000000000000B3
0   senegal  5        41        s-1-8-1     210000E08B89CCC2  60050768018101BF28000000000000B4
Example 9-2 shows the datapath query device output of this Windows host. Note that the
order of the two I/O Groups’ VDisks is reversed from the host-vdisk map. VDisk s-1-8-2 is
first, followed by the rest of the LUNs from the second I/O Group, then VDisk s-0-6-4, and
the rest of the LUNs from the first I/O Group. Most likely, Windows discovered the second set
of LUNs first. However, the relative order within an I/O Group is maintained.
Example 9-2 Using datapath query device for the host VDisk map
C:\Program Files\IBM\Subsystem Device Driver>datapath query device

Total Devices : 12

DEV#:   0  DEVICE NAME: Disk1 Part0   TYPE: 2145   POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000B5
============================================================================
Path#             Adapter/Hard Disk    State   Mode       Select  Errors
    0    Scsi Port2 Bus0/Disk1 Part0   OPEN    NORMAL          0       0
    1    Scsi Port2 Bus0/Disk1 Part0   OPEN    NORMAL       1342       0
    2    Scsi Port3 Bus0/Disk1 Part0   OPEN    NORMAL          0       0
    3    Scsi Port3 Bus0/Disk1 Part0   OPEN    NORMAL       1444       0

DEV#:   1  DEVICE NAME: Disk2 Part0   TYPE: 2145   POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000B1
============================================================================
Path#             Adapter/Hard Disk    State   Mode       Select  Errors
    0    Scsi Port2 Bus0/Disk2 Part0   OPEN    NORMAL       1405       0
    1    Scsi Port2 Bus0/Disk2 Part0   OPEN    NORMAL          0       0
    2    Scsi Port3 Bus0/Disk2 Part0   OPEN    NORMAL       1387       0
    3    Scsi Port3 Bus0/Disk2 Part0   OPEN    NORMAL          0       0

DEV#:   2  DEVICE NAME: Disk3 Part0   TYPE: 2145   POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000B2
============================================================================
Path#             Adapter/Hard Disk    State   Mode       Select  Errors
    0    Scsi Port2 Bus0/Disk3 Part0   OPEN    NORMAL       1398       0
    1    Scsi Port2 Bus0/Disk3 Part0   OPEN    NORMAL          0       0
    2    Scsi Port3 Bus0/Disk3 Part0   OPEN    NORMAL       1407       0
    3    Scsi Port3 Bus0/Disk3 Part0   OPEN    NORMAL          0       0

DEV#:   3  DEVICE NAME: Disk4 Part0   TYPE: 2145   POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000B3
============================================================================
Path#             Adapter/Hard Disk    State   Mode       Select  Errors
    0    Scsi Port2 Bus0/Disk4 Part0   OPEN    NORMAL       1504       0
    1    Scsi Port2 Bus0/Disk4 Part0   OPEN    NORMAL          0       0
    2    Scsi Port3 Bus0/Disk4 Part0   OPEN    NORMAL       1281       0
    3    Scsi Port3 Bus0/Disk4 Part0   OPEN    NORMAL          0       0

DEV#:   4  DEVICE NAME: Disk5 Part0   TYPE: 2145   POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000B4
============================================================================
Path#             Adapter/Hard Disk    State   Mode       Select  Errors
    0    Scsi Port2 Bus0/Disk5 Part0   OPEN    NORMAL          0       0
    1    Scsi Port2 Bus0/Disk5 Part0   OPEN    NORMAL       1399       0
    2    Scsi Port3 Bus0/Disk5 Part0   OPEN    NORMAL          0       0
    3    Scsi Port3 Bus0/Disk5 Part0   OPEN    NORMAL       1391       0

DEV#:   5  DEVICE NAME: Disk6 Part0   TYPE: 2145   POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000A8
============================================================================
Path#             Adapter/Hard Disk    State   Mode       Select  Errors
    0    Scsi Port2 Bus0/Disk6 Part0   OPEN    NORMAL       1400       0
    1    Scsi Port2 Bus0/Disk6 Part0   OPEN    NORMAL          0       0
    2    Scsi Port3 Bus0/Disk6 Part0   OPEN    NORMAL       1390       0
    3    Scsi Port3 Bus0/Disk6 Part0   OPEN    NORMAL          0       0

DEV#:   6  DEVICE NAME: Disk7 Part0   TYPE: 2145   POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000A9
============================================================================
Path#             Adapter/Hard Disk    State   Mode       Select  Errors
    0    Scsi Port2 Bus0/Disk7 Part0   OPEN    NORMAL       1379       0
    1    Scsi Port2 Bus0/Disk7 Part0   OPEN    NORMAL          0       0
    2    Scsi Port3 Bus0/Disk7 Part0   OPEN    NORMAL       1412       0
    3    Scsi Port3 Bus0/Disk7 Part0   OPEN    NORMAL          0       0

DEV#:   7  DEVICE NAME: Disk8 Part0   TYPE: 2145   POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000AA
============================================================================
Path#             Adapter/Hard Disk    State   Mode       Select  Errors
    0    Scsi Port2 Bus0/Disk8 Part0   OPEN    NORMAL          0       0
    1    Scsi Port2 Bus0/Disk8 Part0   OPEN    NORMAL       1417       0
    2    Scsi Port3 Bus0/Disk8 Part0   OPEN    NORMAL          0       0
    3    Scsi Port3 Bus0/Disk8 Part0   OPEN    NORMAL       1381       0

DEV#:   8  DEVICE NAME: Disk9 Part0   TYPE: 2145   POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000AB
============================================================================
Path#             Adapter/Hard Disk    State   Mode       Select  Errors
    0    Scsi Port2 Bus0/Disk9 Part0   OPEN    NORMAL          0       0
    1    Scsi Port2 Bus0/Disk9 Part0   OPEN    NORMAL       1388       0
    2    Scsi Port3 Bus0/Disk9 Part0   OPEN    NORMAL          0       0
    3    Scsi Port3 Bus0/Disk9 Part0   OPEN    NORMAL       1413       0

DEV#:   9  DEVICE NAME: Disk10 Part0   TYPE: 2145   POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000A7
=============================================================================
Path#              Adapter/Hard Disk    State   Mode       Select  Errors
    0    Scsi Port2 Bus0/Disk10 Part0   OPEN    NORMAL       1293       0
    1    Scsi Port2 Bus0/Disk10 Part0   OPEN    NORMAL          0       0
    2    Scsi Port3 Bus0/Disk10 Part0   OPEN    NORMAL       1477       0
    3    Scsi Port3 Bus0/Disk10 Part0   OPEN    NORMAL          0       0

DEV#:  10  DEVICE NAME: Disk11 Part0   TYPE: 2145   POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000B9
=============================================================================
Path#              Adapter/Hard Disk    State   Mode       Select  Errors
    0    Scsi Port2 Bus0/Disk11 Part0   OPEN    NORMAL          0       0
    1    Scsi Port2 Bus0/Disk11 Part0   OPEN    NORMAL      59981       0
    2    Scsi Port3 Bus0/Disk11 Part0   OPEN    NORMAL          0       0
    3    Scsi Port3 Bus0/Disk11 Part0   OPEN    NORMAL      60179       0

DEV#:  11  DEVICE NAME: Disk12 Part0   TYPE: 2145   POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000BA
=============================================================================
Path#              Adapter/Hard Disk    State   Mode       Select  Errors
    0    Scsi Port2 Bus0/Disk12 Part0   OPEN    NORMAL      28324       0
    1    Scsi Port2 Bus0/Disk12 Part0   OPEN    NORMAL          0       0
    2    Scsi Port3 Bus0/Disk12 Part0   OPEN    NORMAL      27111       0
    3    Scsi Port3 Bus0/Disk12 Part0   OPEN    NORMAL          0       0
Sometimes, a host might discover everything correctly at initial configuration, but it does not
keep up with dynamic changes in the configuration. The SCSI ID is therefore extremely
important. For more discussion about this topic, refer to 9.2.4, “Dynamic reconfiguration” on
page 185.
9.1.7 Server adapter layout
If your host system has multiple internal I/O busses, place the two adapters used for SVC
cluster access on two different I/O busses to maximize availability and performance.
9.1.8 Availability as opposed to error isolation
It is important to balance availability, which comes from having multiple paths through the SAN
to the two SVC nodes, against error isolation. Normally, people add more paths to a SAN to
increase availability, which leads to the conclusion that you want all four ports in each node
zoned to each port in the host. However, our experience has shown that it is better to limit the
number of paths so that the error recovery software within a switch or a host can manage the
loss of paths quickly and efficiently. Therefore, keep the fan-out from a host port through the
SAN to an SVC port as close to one-to-one as possible. Limit each host port to a different set
of SVC ports on each node, which keeps errors coming from a single SVC port or from one
fabric isolated to a single host adapter and makes isolation to a failing port or switch easier.
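As a sketch only (the alias names are hypothetical, and the exact syntax depends on your switch vendor), a Brocade-style zoning layout that keeps the fan-out one-to-one might look like:
zonecreate "hostA_p0_svc", "hostA_port0; svc_n1_p1; svc_n2_p1"
zonecreate "hostA_p1_svc", "hostA_port1; svc_n1_p3; svc_n2_p3"
Each host port sees exactly one port on each SVC node, which gives four paths per VDisk in total across the two fabrics.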
9.2 Host pathing
Each host mapping associates a VDisk with a host object and allows all HBA ports in the
host object to access the VDisk. You can map a VDisk to multiple host objects. When a
mapping is created, multiple paths might exist across the SAN fabric from the hosts to the
SVC nodes that are presenting the VDisk. Most operating systems present each path to a
VDisk as a separate storage device. The SVC, therefore, requires that multipathing software
is running on the host. The multipathing software manages the many paths that are available
to the VDisk and presents a single storage device to the operating system.
9.2.1 Preferred path algorithm
I/O traffic for a particular VDisk is, at any one time, managed exclusively by the nodes in a
single I/O Group. The distributed cache in the SAN Volume Controller is two-way. When a
VDisk is created, a preferred node is chosen; you can specify the preferred node at creation
time. The owner node for a VDisk is the preferred node when both nodes are available.
When I/O is performed to a VDisk, the node that processes the I/O duplicates the data onto
the partner node in the I/O Group. A write from the SVC node to the back-end
managed disk (MDisk) is only destaged via the owner node (normally, the preferred node).
When a new write or read comes in on the non-owner node, that node has to send extra
messages to the owner node to check whether the owner node holds the data in cache or is
in the middle of destaging it. Therefore, performance is enhanced by accessing the VDisk
through the preferred node.
IBM multipathing software (SDD, SDDPCM, or SDDDSM) will check the preferred path
setting during initial configuration for each VDisk and manage the path usage:
 Non-preferred paths: Failover only
 Preferred path: Chosen multipath algorithm (default: load balance)
9.2.2 Path selection
There are many algorithms used by multipathing software to select the paths used for an
individual I/O for each VDisk. For enhanced performance with most host types, the
recommendation is to load balance the I/O between only preferred node paths under normal
conditions. The load across the host adapters and the SAN paths will be balanced by
alternating the preferred node choice for each VDisk. Care must be taken when allocating
VDisks with the SVC Console GUI to ensure adequate dispersion of the preferred node
among the VDisks. If the preferred node is offline, all I/O will go through the non-preferred
node in write-through mode.
Certain multipathing software does not utilize the preferred node information, so it might
balance the I/O load for a host differently. Veritas DMP is one example.
Table 9-2 shows, for 16 devices and random 4 KB read misses, the effect on response time of
accessing a VDisk through the preferred node contrasted with the non-preferred node. The
effect is significant.
Table 9-2   The 16 device random 4 KB read miss response time (4.2 nodes, usecs)

Preferred node (owner)     Non-preferred node     Delta
18 227                     21 256                 3 029
Table 9-3 shows the corresponding change in throughput for the same 16 device random 4 KB
read miss workload when the preferred node is used as opposed to a non-preferred node.
Table 9-3   The 16 device random 4 KB read miss throughput (IOPS)

Preferred node (owner)     Non-preferred node     Delta
105 274.3                  90 292.3               14 982
In Table 9-4, we show the effect of using the non-preferred paths compared to the preferred
paths on read performance.
Table 9-4   Random (1 TB) 4 KB read response time (4.1 nodes, usecs)

Preferred node (owner)     Non-preferred node     Delta
5 074                      5 147                  73
Table 9-5 shows the effect of using non-preferred nodes on write performance.
Table 9-5   Random (1 TB) 4 KB write response time (4.2 nodes, usecs)

Preferred node (owner)     Non-preferred node     Delta
5 346                      5 433                  87
IBM SDD software, SDDDSM software, and SDDPCM software recognize the preferred
nodes and utilize the preferred paths.
9.2.3 Path management
The SVC design is based on multiple path access from the host to both SVC nodes.
Multipathing software is expected to retry down multiple paths upon detection of an error.
We recommend that you periodically check the multipathing software display of the paths that
are available and currently in use, and that you also check just before any SAN maintenance
or software upgrades. IBM multipathing software (SDD, SDDPCM, and SDDDSM) makes this
monitoring easy through the datapath query device or pcmpath query device command.
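For example, a quick pre-maintenance check with the IBM multipathing commands (the output is host-specific and not shown here) might be:
datapath query adapter      (SDD or SDDDSM: confirm all adapters are in the NORMAL state)
datapath query device       (confirm the expected number of OPEN paths for each VDisk)
pcmpath query adapter       (SDDPCM equivalents)
pcmpath query device
Save this output so that you can compare the path counts after the maintenance completes.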
Fast node reset
There was a major improvement in SVC 4.2 in software error recovery. Fast node reset
restarts a node following a software failure before the host fails I/O to applications. This node
reset time improved from several minutes for “standard” node reset in previous SVC versions
to about thirty seconds for SVC 4.2.
Pre-SVC 4.2.0 node reset behavior
When an SVC node is reset, it will disappear from the fabric. So from a host perspective, a
few seconds of non-response from the SVC node will be followed by receipt of a registered
state change notification (RSCN) from the switch. Any query to the switch name server will
find that the SVC ports for the node are no longer present. The SVC ports/node will be gone
from the name server for around 60 seconds.
SVC 4.2.0 node reset behavior
When an SVC node is reset, the node ports will not disappear from the fabric. Instead, the
node will keep the ports alive. So from a host perspective, SVC will simply stop responding to
any SCSI traffic. Any query to the switch name server will find that the SVC ports for the node
are still present, but any FC login attempts (for example, PLOGI) will be ignored. This state
will persist for around 30-45 seconds.
This improvement is a major enhancement for host path management of potential double
failures, such as a software failure of one node while the other node in the I/O Group is being
serviced, and software failures during a code upgrade. This new feature will also enhance
path management when host paths are misconfigured and include only a single SVC node.
9.2.4 Dynamic reconfiguration
Many users want to dynamically reconfigure the storage connected to their hosts. The SVC
gives you this capability by virtualizing the storage behind the SVC so that a host will see only
the SVC VDisks presented to it. The host can then add or remove storage dynamically and
reallocate using VDisk-MDisk changes.
After you decide to virtualize your storage behind an SVC, an image mode migration is used
to move the existing back-end storage behind the SVC. This process is simple and seamless
from the data's point of view, but it requires the host to be gracefully shut down. The SAN
must then be rezoned so that the SVC acts as the host to the back-end storage, the back-end
storage LUNs must be mapped to the SVC, and the SAN rezoned so that the SVC presents
storage to the host. The host is then brought back up with the appropriate multipathing
software, and the LUNs are now managed as SVC image mode VDisks. These VDisks can
then be migrated to new storage or moved to striped storage at any time in the future with no
host impact whatsoever.
There are times, however, when users want to change the SVC VDisk presentation to the
host. Changing the presentation dynamically is error-prone and not recommended, but it is
possible if you keep several key issues in mind.
Hosts do not reprobe storage unless prompted by an external change or by the user manually
causing rediscovery. Most operating systems do not notice a change in a disk allocation
automatically, because they rely on saved device database information, such as the Windows
registry or the AIX Object Data Manager (ODM) database.
Add new VDisks or paths
Normally, adding new storage to a host and running the discovery methods (such as cfgmgr)
is safe, because there is no old, leftover information that needs to be removed. Simply
scan for new disks, or run cfgmgr several times if necessary, to see the new disks.
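As a sketch with hypothetical names, adding a VDisk to an existing AIX host typically involves mapping it on the SVC and then rediscovering on the host:
svctask mkvdiskhostmap -host aixhost1 -scsi 8 new_vdisk     (on the SVC)
cfgmgr                                                      (on the AIX host)
datapath query device                                       (verify the new vpath and its paths)
Other operating systems use their own rescan mechanism, for example, a disk rescan in Windows Disk Management.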
Removing VDisks and then later allocating new VDisks to the host
The problem surfaces when a user removes a vdiskhostmap on the SVC during the process
of removing a VDisk. After a VDisk is unmapped from the host, the device becomes
unavailable and the SVC reports that there is no such disk on this port. Usage of datapath
query device after the removal will show a closed, offline, invalid, or dead state as shown
here:
Windows host:
DEV#:   0  DEVICE NAME: Disk1 Part0   TYPE: 2145   POLICY: OPTIMIZED
SERIAL: 60050768018201BEE000000000000041
============================================================================
Path#             Adapter/Hard Disk    State    Mode        Select  Errors
    0    Scsi Port2 Bus0/Disk1 Part0   CLOSE    OFFLINE          0       0
    1    Scsi Port3 Bus0/Disk1 Part0   CLOSE    OFFLINE        263       0
AIX host:
DEV#: 189  DEVICE NAME: vpath189   TYPE: 2145   POLICY: Optimized
SERIAL: 600507680000009E68000000000007E6
============================================================================
Path#    Adapter/Hard Disk    State     Mode        Select  Errors
    0    fscsi0/hdisk1654     DEAD      OFFLINE          0       0
    1    fscsi0/hdisk1655     DEAD      OFFLINE          2       0
    2    fscsi1/hdisk1658     INVALID   NORMAL           0       0
    3    fscsi1/hdisk1659     INVALID   NORMAL           1       0
The next time that a new VDisk is allocated and mapped to that host, the SCSI ID will be
reused if it is allowed to default, and the host can confuse the new device with the old device
definition that is still left over in the device database or system memory. It is possible to end
up with two devices that use identical device definitions in the device database, as in the
following example.
Note that both vpath189 and vpath190 have the same hdisk definitions, although they actually
contain different device serial numbers. The path fscsi0/hdisk1654 exists in both vpaths.
DEV#: 189  DEVICE NAME: vpath189   TYPE: 2145   POLICY: Optimized
SERIAL: 600507680000009E68000000000007E6
============================================================================
Path#    Adapter/Hard Disk    State    Mode        Select  Errors
    0    fscsi0/hdisk1654     CLOSE    NORMAL           0       0
    1    fscsi0/hdisk1655     CLOSE    NORMAL           2       0
    2    fscsi1/hdisk1658     CLOSE    NORMAL           0       0
    3    fscsi1/hdisk1659     CLOSE    NORMAL           1       0

DEV#: 190  DEVICE NAME: vpath190   TYPE: 2145   POLICY: Optimized
SERIAL: 600507680000009E68000000000007F4
============================================================================
Path#    Adapter/Hard Disk    State    Mode        Select  Errors
    0    fscsi0/hdisk1654     OPEN     NORMAL           0       0
    1    fscsi0/hdisk1655     OPEN     NORMAL     6336260       0
    2    fscsi1/hdisk1658     OPEN     NORMAL           0       0
    3    fscsi1/hdisk1659     OPEN     NORMAL     6326954       0
The multipathing software (SDD) recognizes that there is a new device, because at
configuration time, it issues an inquiry command and reads the mode pages. However, if the
user did not remove the stale configuration data, the Object Data Manager (ODM) entries for
the old hdisks and vpaths still remain and confuse the host, because the SCSI ID-to-device
serial number mapping has changed. You can avoid this situation by removing the hdisk and
vpath information from the device configuration database (rmdev -dl vpath189,
rmdev -dl hdisk1654, and so forth) prior to mapping new devices to the host and running
discovery.
Removing the stale configuration and rebooting the host is the recommended procedure for
reconfiguring the VDisks mapped to a host.
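Using the hdisk and vpath names from the previous example, a cleanup sequence before remapping new VDisks might look like the following sketch:
rmdev -dl vpath189
rmdev -dl hdisk1654
rmdev -dl hdisk1655
rmdev -dl hdisk1658
rmdev -dl hdisk1659
After the stale entries are removed, map the new VDisks on the SVC and run cfgmgr to discover them cleanly.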
Another process that might cause host confusion is expanding a VDisk. The SVC notifies the
host through the SCSI check condition “mode parameters changed,” but not all hosts are able
to automatically discover the change and might confuse LUNs or continue to use the old size.
Review the IBM System Storage SAN Volume Controller V4.3.0 - Software Installation and
Configuration Guide, SC23-6628, for more details and supported hosts:
http://www-1.ibm.com/support/docview.wss?uid=ssg1S7002156
9.2.5 VDisk migration between I/O Groups
Migrating VDisks between I/O Groups is another potential issue if the old definitions of the
VDisks are not removed from the configuration. Migrating VDisks between I/O Groups is not a
dynamic configuration change, because each node has its own worldwide node name
(WWNN); therefore, the host will see the new nodes as a different SCSI target. This process
causes major configuration changes. If the stale configuration data is still known by the host,
the host might continue to attempt I/O to the old I/O node targets during multipathing
selection.
Example 9-3 shows the Windows SDD host display prior to I/O Group migration.
Example 9-3 Windows SDD host display prior to I/O Group migration
C:\Program Files\IBM\Subsystem Device Driver>datapath query device

DEV#:   0  DEVICE NAME: Disk1 Part0   TYPE: 2145   POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000A0
============================================================================
Path#             Adapter/Hard Disk    State   Mode        Select  Errors
    0    Scsi Port2 Bus0/Disk1 Part0   OPEN    NORMAL           0       0
    1    Scsi Port2 Bus0/Disk1 Part0   OPEN    NORMAL     1873173       0
    2    Scsi Port3 Bus0/Disk1 Part0   OPEN    NORMAL           0       0
    3    Scsi Port3 Bus0/Disk1 Part0   OPEN    NORMAL     1884768       0

DEV#:   1  DEVICE NAME: Disk2 Part0   TYPE: 2145   POLICY: OPTIMIZED
SERIAL: 60050768018101BF280000000000009F
============================================================================
Path#             Adapter/Hard Disk    State   Mode        Select  Errors
    0    Scsi Port2 Bus0/Disk2 Part0   OPEN    NORMAL           0       0
    1    Scsi Port2 Bus0/Disk2 Part0   OPEN    NORMAL     1863138       0
    2    Scsi Port3 Bus0/Disk2 Part0   OPEN    NORMAL           0       0
    3    Scsi Port3 Bus0/Disk2 Part0   OPEN    NORMAL     1839632       0
If you just quiesce the host I/O and then migrate the VDisks to the new I/O Group, you will
get closed offline paths for the old I/O Group and open normal paths to the new I/O Group.
However, these devices do not work correctly, and there is no way to remove the stale paths
without rebooting. Note the change in the pathing in Example 9-4 for device 0
(SERIAL 60050768018101BF28000000000000A0).
Example 9-4 Windows VDISK moved to new I/O Group dynamically showing the closed offline paths
C:\Program Files\IBM\Subsystem Device Driver>datapath query device

Total Devices : 12

DEV#:   0  DEVICE NAME: Disk1 Part0   TYPE: 2145   POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000A0
============================================================================
Path#             Adapter/Hard Disk    State     Mode        Select  Errors
    0    Scsi Port2 Bus0/Disk1 Part0   CLOSED    OFFLINE          0       0
    1    Scsi Port2 Bus0/Disk1 Part0   CLOSED    OFFLINE    1873173       0
    2    Scsi Port3 Bus0/Disk1 Part0   CLOSED    OFFLINE          0       0
    3    Scsi Port3 Bus0/Disk1 Part0   CLOSED    OFFLINE    1884768       0
    4    Scsi Port2 Bus0/Disk1 Part0   OPEN      NORMAL           0       0
    5    Scsi Port2 Bus0/Disk1 Part0   OPEN      NORMAL          45       0
    6    Scsi Port3 Bus0/Disk1 Part0   OPEN      NORMAL           0       0
    7    Scsi Port3 Bus0/Disk1 Part0   OPEN      NORMAL          54       0

DEV#:   1  DEVICE NAME: Disk2 Part0   TYPE: 2145   POLICY: OPTIMIZED
SERIAL: 60050768018101BF280000000000009F
============================================================================
Path#             Adapter/Hard Disk    State     Mode        Select  Errors
    0    Scsi Port2 Bus0/Disk2 Part0   OPEN      NORMAL           0       0
    1    Scsi Port2 Bus0/Disk2 Part0   OPEN      NORMAL     1863138       0
    2    Scsi Port3 Bus0/Disk2 Part0   OPEN      NORMAL           0       0
    3    Scsi Port3 Bus0/Disk2 Part0   OPEN      NORMAL     1839632       0
To change the I/O Group, you must first flush the cache within the nodes in the current I/O
Group to ensure that all data is written to disk. The SVC command line interface (CLI) guide
recommends that you suspend I/O operations at the host level.
The recommended way to quiesce the I/O is to take the volume groups offline, remove the
saved configuration entries (such as the AIX ODM hdisks and vpaths) that are planned for
removal, and then gracefully shut down the hosts. Migrate the VDisks to the new I/O Group
and power up the hosts, which will discover the new I/O Group. If the stale configuration data
was not removed prior to the shutdown, remove it from the stored host device databases
(such as the ODM on an AIX host) at this point. For Windows hosts, the stale registry
information is normally ignored after a reboot. Performing VDisk migrations in this way
prevents stale configuration issues.
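As a sketch only (the volume group, device, and VDisk names are hypothetical), the sequence on an AIX host and the SVC might be:
varyoffvg datavg                          (on the host: take the volume group offline)
rmdev -dl vpath12                         (remove the saved device definitions)
rmdev -dl hdisk55
shutdown -F                               (gracefully shut down the host)
svctask chvdisk -iogrp io_grp1 vdisk12    (on the SVC: move the VDisk to the new I/O Group)
Power the host back up; it then discovers the VDisk from the new I/O Group as a new SCSI target.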
9.3 I/O queues
Host operating system and host bus adapter software must have a way to fairly prioritize I/O
to the storage. The host bus might run significantly faster than the I/O bus or external storage;
therefore, there must be a way to queue I/O to the devices. Each operating system and host
adapter has its own method of controlling the I/O queue: it can be based on host adapter
resources, on memory and thread resources, or on how many commands are outstanding for
a particular device. Several configuration parameters are available to control the I/O
queue for your configuration. There are host adapter parameters and also queue depth
parameters for the various storage devices (VDisks on the SVC). There are also algorithms
within multipathing software, such as qdepth_enable.
9.3.1 Queue depths
Queue depth is used to control the number of concurrent operations occurring on different
storage resources. Refer to “Limiting Queue Depths in Large SANs,” in the IBM System
Storage SAN Volume Controller V4.3.0 - Software Installation and Configuration Guide,
SC23-6628-02, for more details:
http://www-1.ibm.com/support/docview.wss?uid=ssg1S7002156
Queue depth control must be considered for the overall SVC I/O Group to maintain
performance within the SVC. It must also be controlled on an individual host adapter and LUN
basis to avoid exhausting host memory or physical adapter resources. Refer to the host
attachment scripts and host attachment guides for initial queue depth recommendations,
because they are specific to each host OS and HBA.
You can obtain the IBM System Storage SAN Volume Controller V4.3.0 - Host Attachment
Guide, SC26-7905-03, at:
http://www-1.ibm.com/support/docview.wss?uid=ssg1S7002159
AIX host attachment scripts are available here:
http://www-1.ibm.com/support/dlsearch.wss?rs=540&q=host+attachment&tc=ST52G7&dc=D410
Queue depth control within the host is accomplished through limits placed by the adapter
resources for handling I/Os and by setting a queue depth maximum per LUN. Multipathing
software also controls queue depth using different algorithms. SDD recently changed its
algorithm in this area to limit queue depth individually by LUN as opposed to an overall system
queue depth limitation.
The host I/O will be converted to MDisk I/O as needed. The SVC submits I/O to the back-end
(MDisk) storage as any host normally does. The host allows user control of the queue depth
that is maintained on a disk, but the SVC controls the queue depth for MDisk I/O without any
user intervention. After the SVC has Q I/Os outstanding for a single MDisk (that is, it is waiting
for Q I/Os to complete), it will not submit any more I/O until some I/O completes; any new I/O
requests for that MDisk are queued inside the SVC.
The graph in Figure 9-1 on page 190 shows the effect of host VDisk queue depth on IOPS for
a simple configuration of 32 VDisks and one host.
Figure 9-1 IOPS compared to queue depth for 32 VDisk tests on a single host
Figure 9-2 shows another example of queue depth sensitivity for 32 VDisks on a single host.
Figure 9-2 MBps compared to queue depth for 32 VDisk tests on a single host
9.4 Multipathing software
The SVC requires the use of multipathing software on the hosts that are connected. The latest
recommended levels for each host operating system and multipath software package are
documented on the SVC Web site:
http://www-1.ibm.com/support/docview.wss?rs=591&uid=ssg1S1003278#_Multi_Host
Note that the previously recommended levels of the host software packages are also tested
with SVC 4.3.0, which allows flexibility in maintaining the host software levels with respect to
the SVC software version. In other words, it is possible to upgrade the SVC before or after
upgrading the host software levels, depending on your maintenance schedule.
9.5 Host clustering and reserves
To prevent hosts from sharing storage inadvertently, it is prudent to establish a storage
reservation mechanism. The mechanisms for restricting access to SVC VDisks utilize the
Small Computer Systems Interface-3 (SCSI-3) persistent reserve commands or the SCSI-2
legacy reserve and release commands.
There are several methods that the host software uses for implementing host clusters. They
require sharing the VDisks on the SVC between hosts. In order to share storage between
hosts, control must be maintained over accessing the VDisks. Certain clustering software
uses software locking methods. Other methods of control can be chosen by the clustering
software or by the device drivers to utilize the SCSI architecture reserve/release
mechanisms. The multipathing software can change the type of reserve used from a legacy
reserve to persistent reserve, or remove the reserve.
Persistent reserve refers to a set of Small Computer Systems Interface-3 (SCSI-3) standard
commands and command options that provide SCSI initiators with the ability to establish,
preempt, query, and reset a reservation policy with a specified target device. The functionality
provided by the persistent reserve commands is a superset of the legacy reserve/release
commands. The persistent reserve commands are incompatible with the legacy
reserve/release mechanism, and target devices can only support reservations from either the
legacy mechanism or the new mechanism. Attempting to mix persistent reserve commands
with legacy reserve/release commands will result in the target device returning a reservation
conflict error.
Legacy reserve and release mechanisms (SCSI-2) reserved the entire LUN (VDisk) for
exclusive use down a single path, which prevents access from any other host or even access
from the same host utilizing a different host adapter.
The persistent reserve design establishes a method and interface through a reserve policy
attribute for SCSI disks, which specifies the type of reservation (if any) that the OS device
driver will establish before accessing data on the disk.
Four possible values are supported for the reserve policy:
 No_reserve: No reservations are used on the disk.
 Single_path: Legacy reserve/release commands are used on the disk.
 PR_exclusive: Persistent reservation is used to establish exclusive host access to the
disk.
 PR_shared: Persistent reservation is used to establish shared host access to the disk.
When a device is opened (for example, when the AIX varyonvg command opens the
underlying hdisks), the device driver will check the ODM for a reserve_policy and a
PR_key_value and open the device appropriately. For persistent reserve, it is necessary that
each host attached to the shared disk use a unique registration key value.
Clearing reserves
It is possible to accidentally leave a reserve on an SVC VDisk, or even on an SVC MDisk,
during migration into the SVC or when reusing disks for another purpose. Several tools are
available from the hosts to clear these reserves. The easiest tools to use are the lquerypr
(AIX SDD host) and pcmquerypr (AIX SDDPCM host) commands. There is also a menu-driven
Windows SDD/SDDDSM tool.
The Windows Persistent Reserve Tool is called PRTool.exe and is installed automatically
when SDD or SDDDSM is installed:
C:\Program Files\IBM\Subsystem Device Driver>PRTool.exe
It is possible to clear SVC VDisk reserves by removing all the host-VDisk mappings when
SVC code is at 4.1.0 or higher.
Example 9-5 shows how to determine if there is a reserve on a device using the AIX SDD
lquerypr command on a reserved hdisk.
Example 9-5 The lquerypr command
[root@ktazp5033]/reserve-checker-> lquerypr -vVh /dev/hdisk5
connection type: fscsi0
open dev: /dev/hdisk5
Attempt to read reservation key...
Attempt to read registration keys...
Read Keys parameter
Generation : 935
Additional Length: 32
Key0 : 7702785F
Key1 : 7702785F
Key2 : 770378DF
Key3 : 770378DF
Reserve Key provided by current host = 7702785F
Reserve Key on the device: 770378DF
This example shows that the device is reserved by a different host. The advantage of using
the -vV parameters is that the full persistent reserve keys on the device are shown, as well as
the errors if the command fails. The following example of a failing pcmquerypr command to
clear the reserve shows the error:
# pcmquerypr -ph /dev/hdisk232 -V
connection type: fscsi0
open dev: /dev/hdisk232
couldn't open /dev/hdisk232, errno=16
Use the AIX include file errno.h to find out what the 16 indicates. This error indicates a busy
condition, which can indicate a legacy reserve or a persistent reserve from another host (or
this host from a different adapter). However, there are certain AIX technology levels (TLs)
that have a diagnostic open issue, which prevents the pcmquerypr command from opening
the device to display the status or to clear a reserve.
The following hint and tip give more information about AIX TL levels that break the
pcmquerypr command:
http://www-1.ibm.com/support/docview.wss?rs=540&context=ST52G7&uid=ssg1S1003122&loc=en_US&cs=utf-8&lang=en
SVC MDisk reserves
Sometimes, a host image mode migration will appear to succeed, but when the VDisk is
actually opened for read or write I/O, problems occur. The problems can result from not
removing the reserve on the MDisk before using image mode migration into the SVC. There
is no way to clear a leftover reserve on an SVC MDisk from the SVC. The reserve will have to
be cleared by mapping the MDisk back to the owning host and clearing it through host
commands or through back-end storage commands as advised by IBM technical support.
9.5.1 AIX
The following topics describe items specific to AIX.
HBA parameters for performance tuning
The following example settings can be used to start off your configuration in the specific
workload environment. These settings are suggestions, and they are not guaranteed to be
the answer to all configurations. Always try to set up a test of your data with your
configuration to see if there is further tuning that can help. Again, knowledge of your specific
data I/O pattern is extremely helpful.
AIX operating system settings
The following section outlines the settings that can affect performance on an AIX host. We
look at these settings in relation to how they impact the two workload types.
Transaction-based settings
The host attachment script (devices.fcp.disk.IBM.rte or devices.fcp.disk.IBM.mpio.rte) sets the
default values of the attributes for the SVC hdisks.
You can modify these values, but they are an extremely good place to start. There are also
HBA parameters that are useful to set for higher performance or for configurations with large
numbers of hdisks.
All attribute values that are changeable can be changed using the chdev command for AIX.
AIX settings, which can directly affect transaction performance, are the queue_depth hdisk
attribute and num_cmd_elem in the HBA attributes.
The queue_depth hdisk attribute
For the logical drive known as the hdisk in AIX, the setting is the attribute queue_depth:
# chdev -l hdiskX -a queue_depth=Y -P
In this example, “X” is the hdisk number, and “Y” is the queue_depth value that you are setting.
For a high transaction workload of small random transfers, try a queue_depth of 25 or more,
but for large sequential workloads, performance is better with shallow queue depths, such as 4.
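To confirm the change, you can display the attribute (the hdisk number is only an example); because -P defers the change, it takes effect after the next reboot or after the device is reconfigured:
lsattr -El hdisk5 -a queue_depth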
The num_cmd_elem attribute
For the HBA settings, the attribute num_cmd_elem for the fcs device represents the number of
commands that can be queued to the adapter:
chdev -l fcsX -a num_cmd_elem=1024 -P
The default value is 200, and the maximum value is:
 LP9000 adapters: 2048
 LP10000 adapters: 2048
 LP11000 adapters: 2048
 LP7000 adapters: 1024
Best practice: For a high volume of transactions on AIX or a large number of hdisks on
the fcs adapter, we recommend that you increase num_cmd_elem to 1024 for the fcs
devices being used.
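A quick way to check the current adapter value before changing it (fcs0 is an example adapter name):
lsattr -El fcs0 -a num_cmd_elem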
AIX settings, which can directly affect throughput performance with large I/O block size, are
the lg_term_dma and max_xfer_size parameters for the fcs device.
The lg_term_dma attribute
This AIX Fibre Channel adapter attribute controls the direct memory access (DMA) memory
resource that an adapter driver can use. The default value of lg_term_dma is 0x200000, and
the maximum value is 0x8000000. A recommended change is to increase the value of
lg_term_dma to 0x400000. If you still experience poor I/O performance after changing the
value to 0x400000, you can increase the value of this attribute again. If you have a dual-port
Fibre Channel adapter, the maximum value of the lg_term_dma attribute is divided between
the two adapter ports. Therefore, never increase lg_term_dma to the maximum value for a
dual-port Fibre Channel adapter, because this value will cause the configuration of the
second adapter port to fail.
The max_xfer_size attribute
This AIX Fibre Channel adapter attribute controls the maximum transfer size of the Fibre
Channel adapter. Its default value is 0x100000, and the maximum value is 0x1000000. You can
increase this attribute to improve performance. You can change this attribute only with AIX
5.2.0 or later.
Note that setting max_xfer_size also affects the size of a memory area used for data transfer
by the adapter. With the default value of max_xfer_size=0x100000, the area is 16 MB in size;
for the other allowable values of max_xfer_size, the memory area is 128 MB in size.
Throughput-based settings
In a throughput-based environment, you might want to decrease the queue depth setting to
a value smaller than the default from the host attachment. In a mixed application environment,
do not lower the num_cmd_elem setting, because other logical drives might need this
higher value to perform. In a purely high throughput workload, this value has no effect.
Best practice: The recommended starting values for high throughput sequential I/O
environments are lg_term_dma = 0x400000 or 0x800000 (depending on the adapter type)
and max_xfer_size = 0x200000.
We recommend that you test your host with the default settings first and then make these
possible tuning changes to the host parameters to verify if these suggested changes actually
enhance performance for your specific host configuration and workload.
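For example, assuming adapter fcs0 and the starting values named in the best practice, both attributes can be set in one command; the -P flag stages the change so that it takes effect at the next reboot:
chdev -l fcs0 -a lg_term_dma=0x400000 -a max_xfer_size=0x200000 -P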
Configuring for fast fail and dynamic tracking
For host systems that run an AIX 5.2 or higher operating system, you can achieve the best
results by using the fast fail and dynamic tracking attributes. Before configuring your host
system to use these attributes, ensure that the host is running the AIX operating system
Version 5.2 or higher. Perform the following steps to configure your host system to use the
fast fail and dynamic tracking attributes:
1. Issue the following command to set the Fibre Channel SCSI I/O Controller Protocol Device
event error recovery policy to fast_fail for each Fibre Channel adapter:
chdev -l fscsi0 -a fc_err_recov=fast_fail
The previous example command was for adapter fscsi0.
2. Issue the following command to enable dynamic tracking for each Fibre Channel device:
chdev -l fscsi0 -a dyntrk=yes
The previous example command was for adapter fscsi0.
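If the fscsi device is already in use, these chdev commands typically fail with a busy error; in that case, add the -P flag and reboot, or make the change before the adapter is configured. You can verify the settings afterward (fscsi0 is an example device name):
lsattr -El fscsi0 -a fc_err_recov -a dyntrk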
Multipathing
When the AIX operating system was first developed, multipathing was not embedded within
the device drivers. Therefore, each path to an SVC VDisk was represented by an AIX hdisk.
The SVC host attachment script devices.fcp.disk.ibm.rte sets up the predefined attributes
within the AIX database for SVC disks, and these attributes have changed with each iteration
of host attachment and AIX technology levels. Both SDD and Veritas DMP utilize the hdisks
for multipathing control. The host attachment is also used for other IBM storage devices. The
Host Attachment allows AIX device driver configuration methods to properly identify and
configure SVC (2145), DS6000 (1750), and DS8000 (2107) LUNs:
http://www-1.ibm.com/support/docview.wss?rs=540&context=ST52G7&dc=D410&q1=host+attachment&uid=ssg1S4000106&loc=en_US&cs=utf-8&lang=en
SDD
IBM Subsystem Device Driver (SDD) multipathing software has been designed and updated
consistently over the last decade and is an extremely mature multipathing technology. The
SDD software also supports many other IBM storage types directly connected to AIX, such as
the 2107. SDD algorithms for handling multipathing have also evolved. A throttling mechanism
within SDD controlled overall I/O bandwidth in SDD releases 1.6.1.0 and lower; in later
releases, this mechanism has become vpath specific and is called qdepth_enable.
SDD utilizes the persistent reserve functions, placing a persistent reserve on the device in
place of the legacy reserve when the volume group is varied on. However, if HACMP is
installed, HACMP controls the persistent reserve usage depending on the type of varyon
used. Also, enhanced concurrent volume groups (VGs) have no reserves: varyonvg -c is used
for enhanced concurrent VGs, while regular VGs varied on with varyonvg utilize the persistent
reserve.
Datapath commands are an extremely powerful method for managing the SVC storage and
pathing. The output shows the LUN serial number of the SVC VDisk and which vpath and
hdisk represent that SVC LUN. Datapath commands can also change the multipath selection
algorithm. The default is load balance, but the multipath selection algorithm is programmable.
The recommended best practice when using SDD is also load balance using four paths. The
datapath query device output will show a somewhat balanced number of selects on each
preferred path to the SVC:
DEV#:  12  DEVICE NAME: vpath12   TYPE: 2145   POLICY: Optimized
SERIAL: 60050768018B810A88000000000000E0
====================================================================
Path#    Adapter/Hard Disk    State    Mode        Select  Errors
    0    fscsi0/hdisk55       OPEN     NORMAL     1390209       0
    1    fscsi0/hdisk65       OPEN     NORMAL           0       0
    2    fscsi0/hdisk75       OPEN     NORMAL     1391852       0
    3    fscsi0/hdisk85       OPEN     NORMAL           0       0
We recommend that you verify that the selects during normal operation are occurring on the
preferred paths (use datapath query device -l). Also, verify that you have the correct
connectivity.
SDDPCM
As Fibre Channel technologies matured, AIX was enhanced by adding native multipathing
support called Multipath I/O (MPIO). This structure allows a manufacturer of storage to create
software plug-ins for their specific storage. The IBM SVC version of this plug-in is called
SDDPCM, which requires a host attachment script called devices.fcp.disk.ibm.mpio.rte:
http://www-1.ibm.com/support/docview.wss?rs=540&context=ST52G7&dc=D410&q1=host+attachment&uid=ssg1S4000203&loc=en_US&cs=utf-8&lang=en
SDDPCM and AIX MPIO have been continually improved since their release. We recommend
that you are at the latest release levels of this software.
The preferred path indicator for SDDPCM will not display until after the device has been
opened for the first time, which differs from SDD, which displays the preferred path
immediately after being configured.
SDDPCM features four types of reserve policies:
 No_reserve policy
 Exclusive host access single path policy
 Persistent reserve exclusive host policy
 Persistent reserve shared host access policy
The usage of the persistent reserve now depends on the hdisk attribute: reserve_policy.
Change this policy to match your storage security requirements.
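For example, to switch an SDDPCM-managed hdisk to no reserve (hdisk2 is only an example name; -P stages the change until the device is reconfigured or the host is rebooted):
chdev -l hdisk2 -a reserve_policy=no_reserve -P
lsattr -El hdisk2 -a reserve_policy -a PR_key_value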
There are three path selection algorithms:
 Failover
 Round-robin
 Load balancing
The latest SDDPCM code of 2.1.3.0 and later has improvements in failed path reclamation by
a health checker, a failback error recovery algorithm, Fibre Channel dynamic device tracking,
and support for SAN boot device on MPIO-supported storage devices.
9.5.2 SDD compared to SDDPCM
There are several reasons for choosing SDDPCM over SDD. SAN boot is much improved
with the native MPIO/SDDPCM software. Multiple Virtual I/O Servers (VIOSs) are supported.
Certain applications, such as Oracle® ASM, will not work with SDD.
Another point worth noting is that with SDD, all paths can go to the dead state, which improves
HACMP and Logical Volume Manager (LVM) mirroring failovers. With SDDPCM, one path
always remains open even if the LUN is dead, and this design causes longer failovers.
With SDDPCM utilizing HACMP, enhanced concurrent volume groups require the no reserve
policy for both concurrent and non-concurrent resource groups. Therefore, HACMP uses a
software locking mechanism instead of implementing persistent reserves. HACMP used with
SDD does utilize persistent reserves based on what type of varyonvg was executed.
SDDPCM pathing
SDDPCM pcmpath commands are the best way to understand configuration information about
the SVC storage allocation. The following pcmpath query device example shows how much
can be determined about the connections to the SVC from this host.
DEV#:   0  DEVICE NAME: hdisk0   TYPE: 2145   ALGORITHM: Load Balance
SERIAL: 6005076801808101400000000000037B
======================================================================
Path#    Adapter/Path Name    State    Mode       Select  Errors
    0    fscsi0/path0         OPEN     NORMAL     155009       0
    1    fscsi1/path1         OPEN     NORMAL     155156       0
In this example, both paths are being used for the SVC connections. These counts are not
the normal select counts for a properly mapped SVC, and two paths are not an adequate
number of paths. Use the -l option on pcmpath query device to check whether these paths
are both preferred paths. If they are both preferred paths, one SVC node must be missing
from the host view.
Using the -l option shows an asterisk on both paths, indicating a single node is visible to the
host (and is the non-preferred node for this VDisk):
    0*   fscsi0/path0         OPEN     NORMAL       9795       0
    1*   fscsi1/path1         OPEN     NORMAL       9558       0
This information indicates a problem that needs to be corrected. If zoning in the switch is
correct, perhaps this host was rebooted while one SVC node was missing from the fabric.
Veritas
Veritas DMP multipathing is also supported with the SVC. It requires certain AIX APARs and
the Veritas Array Support Library, as well as a specific version of the host attachment script
devices.fcp.disk.ibm.rte, so that the 2145 devices are recognized as hdisks rather than MPIO
hdisks. In addition to the normal ODM databases that contain the hdisk attributes, there are
several Veritas filesets that contain configuration data:
 /dev/vx/dmp
 /dev/vx/rdmp
 /etc/vxX.info
Storage reconfiguration of VDisks presented to an AIX host requires cleanup of the AIX
hdisks and these Veritas filesets.
9.5.3 Virtual I/O server
Virtual SCSI is based on a client/server relationship. The Virtual I/O Server (VIOS) owns the
physical resources and acts as the server, or target, device. Physical adapters with attached
disks (VDisks on the SVC, in our case) on the Virtual I/O Server partition can be shared by
one or more partitions. These partitions contain a virtual SCSI client adapter that sees these
virtual devices as standard SCSI compliant devices and LUNs.
There are two types of VDisks that you can create on a VIOS: physical volume (PV) VSCSI
hdisks and logical volume (LV) VSCSI hdisks.
PV VSCSI hdisks are entire LUNs from the VIOS point of view (and VDisks from the virtual I/O
client (VIOC) point of view). If you are concerned about the failure of a VIOS and have
configured redundant VIOSs for that reason, you must use PV VSCSI hdisks, because an LV
VSCSI hdisk cannot be served up from multiple VIOSs. LV VSCSI hdisks reside in LVM volume
groups (VGs) on the VIOS and cannot span PVs in that VG, nor be striped LVs. Due to these
restrictions, we recommend using PV VSCSI hdisks.
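As a sketch with hypothetical device and adapter names, exporting a whole SVC VDisk (seen on the VIOS as hdisk7) to a client partition through virtual SCSI adapter vhost0 uses the VIOS mkvdev command:
$ mkvdev -vdev hdisk7 -vadapter vhost0 -dev app_disk1
The client partition then discovers the disk as a standard hdisk.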
Multipath support for SVC attachment to Virtual I/O Server is provided by either SDD or MPIO
with SDDPCM. Where Virtual I/O Server SAN Boot or dual Virtual I/O Server configurations
are required, only MPIO with SDDPCM is supported. Because of this restriction, we
recommend using MPIO with SDDPCM at the latest SVC-supported levels, as shown at:
http://www-1.ibm.com/support/docview.wss?rs=591&uid=ssg1S1003278#_Virtual_IO_Server
Details of the Virtual I/O Server-supported environments are at:
http://www14.software.ibm.com/webapp/set2/sas/f/vios/home.html
There are many questions answered on the following Web site for usage of the VIOS:
http://www14.software.ibm.com/webapp/set2/sas/f/vios/documentation/faq.html
One common question is how to migrate data into a VIO environment or how to reconfigure
storage on a VIOS. This question is addressed in the previous link.
Many clients want to know if SCSI LUNs can be moved between the physical and virtual
environment “as is.” That is, given a physical SCSI device (LUN) with user data on it that
resides in a SAN environment, can this device be allocated to a VIOS and then provisioned to
a client partition and used by the client “as is”?
The answer is no, this function is not supported at this time. The device cannot be used “as
is.” Virtual SCSI devices are new devices when created, and the data must be put on them
after creation, which typically requires a type of backup of the data in the physical SAN
environment with a restoration of the data onto the VDisk.
Why do we have this limitation?
The VIOS uses several methods to uniquely identify a disk for use as a virtual SCSI disk; they
are:
 Unique device identifier (UDID)
 IEEE volume identifier
 Physical volume identifier (PVID)
Each of these methods can result in different data formats on the disk. The preferred disk
identification method for VDisks is the use of UDIDs.
MPIO uses the UDID method
Most non-MPIO disk storage multipathing software products use the PVID method instead of
the UDID method. Because of the different data format associated with the PVID method,
clients with non-MPIO environments need to be aware that certain future actions performed in
the VIOS logical partition (LPAR) can require data migration, that is, a type of backup and
restoration of the attached disks. These actions can include, but are not limited to:
 Conversion from a non-MPIO environment to MPIO
 Conversion from the PVID to the UDID method of disk identification
 Removal and rediscovery of the Disk Storage ODM entries
 Updating non-MPIO multipathing software under certain circumstances
 Possible future enhancements to VIO
Due in part to the differences in disk format that we just described, VIO is currently supported
for new disk installations only.
AIX, VIO, and SDD development are working on changes to make this migration easier in the
future. One enhancement is to use the UDID or IEEE method of disk identification. If you use
the UDID method, it might be possible to contact IBM technical support to get a method of
migrating that might not require restoration.
A quick and simple method to determine if a backup and restoration is necessary is to run the
command lquerypv -h /dev/hdisk## 80 10 to read the PVID off the disk. If the output is
different on both the VIOS and VIOC, you must use backup and restore.
How to back up the VIO configuration
To back up the VIO configuration:
1. Save the volume group information from the VIOC (PVIDs and VG names).
2. Save the disk mapping, PVID, and LUN ID information from all VIOSs. This step includes
the mapping of each VIOS hdisk to the corresponding VIOC hdisk; at a minimum, save the
PVID information.
3. Save the physical LUN to host LUN ID information on the storage subsystem for use when
you reconfigure the hdisks.
After all the pertinent mapping data has been collected and saved, it is possible to back up
and reconfigure your storage and then restore using the AIX commands:
 Back up the VG data on the VIOC.
 For rootvg, the supported method is a mksysb and an install; use savevg and restvg for
non-rootvg volume groups, as in the sketch that follows.
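As a sketch with hypothetical names, backing up and restoring a non-rootvg volume group uses savevg and restvg:
savevg -f /backup/datavg.img datavg      (back up the volume group to a file)
restvg -f /backup/datavg.img hdisk4      (restore it onto the new virtual SCSI disk)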
9.5.4 Windows
There are two multipathing driver options released for Windows 2003 Server hosts.
Windows 2003 Server device driver development has concentrated on the storport.sys driver,
which has significant interoperability differences from the older scsiport driver set.
Additionally, Windows has released a native multipathing I/O option with a storage-specific
plug-in. SDDDSM was designed to support these newer methods of interfacing with Windows
2003 Server. In order to release new enhancements more quickly, the newer hardware
architectures (64-bit EM64T and so forth) are only tested on the SDDDSM code stream;
therefore, only SDDDSM packages are available for them.
The older SDD multipathing driver works with the scsiport drivers. This version is required for
Windows 2000 Server hosts, because storport.sys is not available there. The SDD software is
also available for Windows 2003 Server hosts when the scsiport HBA drivers are used.
Clustering and reserves
Windows SDD or SDDDSM utilizes the persistent reserve functions to implement Windows
Clustering. A stand-alone Windows host will not utilize reserves.
Review this Microsoft article about clustering to understand how a cluster works:
http://support.microsoft.com/kb/309186/
When SDD or SDDDSM is installed, the reserve and release functions described in this
article are translated into proper persistent reserve and release equivalents to allow load
balancing and multipathing from each host.
SDD compared to SDDDSM
The major requirement for choosing SDD over SDDDSM is to ensure that the matching host
bus adapter driver type is also loaded on the system: choose the storport driver for SDDDSM
and the scsiport versions for SDD. From an error isolation perspective, the tracing available
and collected by sddgetdata is easier to follow with the SDD software, which is the more
mature release. Future enhancements will concentrate on SDDDSM within the Windows MPIO
framework.
Tunable parameters
With Windows operating systems, the queue depth settings are the responsibility of the host
adapters and configured through the BIOS setting. Configuring the queue depth settings
varies from vendor to vendor. Refer to your manufacturer’s instructions about how to
configure your specific cards and the IBM System Storage SAN Volume Controller Host
Attachment User’s Guide Version 4.3.0, SC26-7905-03, at:
http://www-1.ibm.com/support/docview.wss?uid=ssg1S7002159
Queue depth is also controlled by the Windows application program. The application program
has control of how many I/O commands it will allow to be outstanding before waiting for
completion.
For IBM FAStT FC2-133 (and QLogic-based HBAs), the queue depth is known as the
execution throttle, which can be set with either the QLogic SANSurfer tool or in the BIOS of
the QLogic-based HBA by pressing Ctrl+Q during the startup process.
Changing back-end storage LUN mappings dynamically
Unmapping a LUN from a Windows SDD or SDDDSM server and then mapping a different
LUN using the same SCSI ID can cause data corruption and loss of access. The procedure
for reconfiguration is documented at the following Web site:
http://www-1.ibm.com/support/docview.wss?rs=540&context=ST52G7&uid=ssg1S1003316&loc=en_US&cs=utf-8&lang=en
Recommendations for Disk Alignment using Windows with SVC VDisks
The recommended settings for the best performance with SVC when you use Microsoft
Windows operating systems and applications with a significant amount of I/O can be found at
the following Web site:
http://www-1.ibm.com/support/docview.wss?rs=591&context=STPVGU&context=STPVFV&q1=microsoft&uid=ssg1S1003291&loc=en_US&cs=utf-8&lang=en
9.5.5 Linux
IBM has decided to transition SVC multipathing support from IBM SDD to Linux® native
DM-MPIO multipathing. Refer to the V4.3.0 - Recommended Software Levels for SAN
Volume Controller for which versions of each Linux kernel require SDD or DM-MPIO support:
http://www-1.ibm.com/support/docview.wss?rs=591&uid=ssg1S1003278
If your kernel is not listed for support, contact your IBM marketing representative to request a
Request for Price Quotation (RPQ) for your specific configuration.
Linux Clustering is not supported, and Linux OS does not use the legacy reserve function.
Therefore, there are no persistent reserves used in Linux. Contact IBM marketing for RPQ
support if you need Linux Clustering in your specific environment.
SDD compared to DM-MPIO
For reference on the multipathing choices for Linux operating systems, SDD development
has provided the white paper, Considerations and Comparisons between IBM SDD for Linux
and DM-MPIO, which is available at:
http://www-1.ibm.com/support/docview.wss?rs=540&context=ST52G7&q1=linux&uid=ssg1S7
001664&loc=en_US&cs=utf-8&lang=en
Tunable parameters
Linux performance is influenced by HBA parameter settings and queue depth. Queue depth
for Linux servers can be determined by using the formula specified in the IBM System
Storage SAN Volume Controller V4.3.0 - Software Installation and Configuration Guide,
SC23-6628-02, at:
http://www-1.ibm.com/support/docview.wss?uid=ssg1S7002156
Refer to the settings for each specific HBA type and general Linux OS tunable parameters in
the IBM System Storage SAN Volume Controller V4.3.0 - Host Attachment Guide,
SC26-7905-03, at:
http://www-1.ibm.com/support/docview.wss?uid=ssg1S7002159
In addition to the I/O and OS parameters, Linux also has tunable file system parameters.
You can use the command tune2fs to increase file system performance based on your
specific configuration. The journal mode and size can be changed. Also, the directories can
be indexed. Refer to the following open source document for details:
http://swik.net/how-to-increase-ext3-and-reiserfs-filesystems-performance
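As a minimal sketch of these tunables on an ext3 file system (the device name is illustrative, and the file system must be cleanly unmounted before you run these commands):
tune2fs -O dir_index /dev/sdb1 (enable hashed directory indexes)
e2fsck -fD /dev/sdb1 (rebuild and optimize existing directory indexes)
tune2fs -o journal_data_writeback /dev/sdb1 (set writeback journaling as a default mount option)
Changing the journal size requires removing and re-creating the journal (tune2fs -O ^has_journal followed by tune2fs -j), so plan that change for a maintenance window.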
9.5.6 Solaris
There are several options for multipathing support on Solaris™ hosts. You can choose between IBM SDD, Symantec/VERITAS Volume Manager (DMP), or Solaris MPxIO, depending on the OS levels listed in the latest SVC software level matrix.
SAN boot support and clustering support are available with Symantec/VERITAS Volume Manager, and SAN boot support is also available with MPxIO.
Solaris MPxIO
Releases of SVC code prior to 4.3.0 did not support load balancing with the MPxIO software.
Configure your SVC host object with the type attribute set to tpgs if you want to run MPxIO on
your Sun™ SPARC host. For example:
svctask mkhost -name new_name_arg -hbawwpn wwpn_list -type tpgs
In this command, -type specifies the type of host. Valid entries are hpux, tpgs, or generic.
The tpgs option enables an extra target port unit. The default is generic.
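For example, a host object for a two-port Sun SPARC host might be created as follows (the host name and WWPNs are illustrative):
svctask mkhost -name sunhost01 -hbawwpn 210000E08B89B9C0:210000E08B89B9C1 -type tpgs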
Symantec/VERITAS Volume Manager
When managing IBM SVC storage in Symantec’s volume manager products, you must install
an array support library (ASL) on the host so that the volume manager is aware of the storage
subsystem properties (active/active or active/passive). If the appropriate ASL is not installed, the volume manager will not claim the LUNs. The ASL is required to enable the special failover/failback multipathing that SVC requires for error recovery.
Use the following commands to determine the basic configuration of a Symantec/Veritas
server:
pkginfo -l (lists all installed packages)
showrev -p | grep vxvm (shows the version of the volume manager)
vxddladm listsupport (shows which ASLs are configured)
vxdisk list (lists the disks that the volume manager has claimed)
vxdmpadm listctlr all (shows all attached subsystems and provides a type where possible)
vxdmpadm getsubpaths ctlr=cX (lists paths by controller)
vxdmpadm getsubpaths dmpnodename=cxtxdxs2 (lists paths by LUN)
The following command output shows at a glance whether the SVC is properly connected and which ASL is in use (the native DMP ASL or the SDD passthrough ASL).
Here is an example of what you see when the Symantec volume manager correctly recognizes our SVC using the SDD passthrough mode ASL:
# vxdmpadm listenclosure all
ENCLR_NAME ENCLR_TYPE ENCLR_SNO STATUS
============================================================
OTHER_DISKS OTHER_DISKS OTHER_DISKS CONNECTED
VPATH_SANVC0 VPATH_SANVC 0200628002faXX00 CONNECTED
Here is an example of what we see when SVC is configured using native DMP ASL:
# vxdmpadm listenclosure all
ENCLR_NAME ENCLR_TYPE ENCLR_SNO STATUS
============================================================
OTHER_DISKS OTHER_DISKS OTHER_DISKS CONNECTED
SAN_VC0 SAN_VC 0200628002faXX00 CONNECTED
ASL specifics for SVC
For SVC, ASLs have been developed for both native DMP multipathing and SDD passthrough multipathing.
For SDD passthrough:
http://support.veritas.com/docs/281321
# pkginfo -l VRTSsanvc
PKG=VRTSsanvc
BASEDIR=/etc/vx
NAME=Array Support Library for IBM SAN.VC with SDD.
PRODNAME=VERITAS ASL for IBM SAN.VC with SDD.
For native DMP:
http://support.veritas.com/docs/276913
pkginfo -l VRTSsanvc
PKGINST: VRTSsanvc
NAME: Array Support Library for IBM SAN.VC in NATIVE DMP mode
To check the installed Symantec/VERITAS version:
showrev -p |grep vxvm
To check what IBM ASLs are configured into the volume manager:
vxddladm listsupport |grep -i ibm
Following the installation of a new ASL using pkgadd, you need to either reboot or issue
vxdctl enable. To list the ASLs that are active, run vxddladm listsupport.
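A typical sequence looks like the following sketch (the package directory is illustrative, and the package name depends on which ASL you downloaded):
pkgadd -d . VRTSsanvc (install the ASL package)
vxdctl enable (rescan devices and activate the new ASL without a reboot)
vxddladm listsupport | grep -i ibm (confirm that the ASL is now listed)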
How to troubleshoot configuration issues
Here is an example of the output when the appropriate ASL is not installed or the system has not enabled the ASL. The key indicator is the enclosure type OTHER_DISKS:
vxdmpadm listctlr all
CTLR-NAME       ENCLR-TYPE      STATE       ENCLR-NAME
=====================================================
c0              OTHER_DISKS     ENABLED     OTHER_DISKS
c2              OTHER_DISKS     ENABLED     OTHER_DISKS
c3              OTHER_DISKS     ENABLED     OTHER_DISKS

vxdmpadm listenclosure all
ENCLR_NAME      ENCLR_TYPE      ENCLR_SNO      STATUS
============================================================
OTHER_DISKS     OTHER_DISKS     OTHER_DISKS    CONNECTED
Disk            Disk            DISKS          DISCONNECTED
9.5.7 VMware
Review the V4.3.0 - Recommended Software Levels for SAN Volume Controller Web site for
the various ESX levels that are supported:
http://www-1.ibm.com/support/docview.wss?rs=591&uid=ssg1S1003278#_VMWare
To get continued support for older VMware levels (for example, level 3.01), you must upgrade to a minimum VMware level of 3.02. For more details, contact your IBM marketing representative and ask about submitting an RPQ for support. The necessary patches, and the procedures to apply them, will be supplied after the specific configuration has been reviewed and approved.
Multipathing solutions supported
Multipathing is supported at ESX level 2.5.x and higher; therefore, installing multipathing
software is not required. Single pathing is only supported in ESX level 2.1.
VMware® multipathing does not support dynamic pathing, and preferred paths set in the SVC are ignored. The VMware multipathing software performs static load balancing for I/O, based upon a host setting that defines the preferred path for a given volume.
Multipathing configuration maximums
The maximum supported configuration for the VMware multipathing software is:
 A total of 256 SCSI devices
 Four paths to each VDisk
Note: Each path to a VDisk equates to a single SCSI device.
For more information about VMware and SVC, VMware storage and zoning
recommendations, HBA settings and attaching VDisks to VMware, refer to IBM System
Storage SAN Volume Controller V4.3, SG24-6423, at:
http://www.redbooks.ibm.com/redpieces/abstracts/sg246423.html
9.6 Mirroring considerations
As you plan how to fully utilize the various options to back up your data through mirroring
functions, consider how to keep a consistent set of data for your application. A consistent set
of data implies a level of control by the application or host scripts to start and stop mirroring
with both host-based mirroring and back-end storage mirroring features. It also implies a
group of disks that must be kept consistent.
Host applications have a certain granularity to their storage writes. The data has a consistent
view to the host application only at certain times. This level of granularity is at the file system
level as opposed to the SCSI read/write level. The SVC guarantees consistency at the SCSI
read/write level when its features of mirroring are in use. However, a host file system write
might require multiple SCSI writes. Therefore, without a method of controlling when the
mirroring stops, the resulting mirror can be missing a portion of a write and look corrupted.
Normally, a database application has methods to recover the mirrored data and to back up to
a consistent view, which is applicable in the case of a disaster that breaks the mirror.
However, we recommend that you have a normal procedure of stopping at a consistent view
for each mirror in order to be able to easily start up the backup copy for non-disaster
scenarios.
9.6.1 Host-based mirroring
Host-based mirroring is a fully redundant method of mirroring using two mirrored copies of the
data. Mirroring is done by the host software. If you use this method of mirroring, we
recommend that each copy is placed on a separate SVC cluster.
9.7 Monitoring
A consistent set of monitoring tools is available when IBM SDD, SDDDSM, and SDDPCM are
used for the multipathing software on the various OS environments. Examples earlier in this
chapter showed how the datapath query device and datapath query adapter commands
can be used for path monitoring.
Path performance can also be monitored via datapath commands:
datapath query devstats (or pcmpath query devstats)
The datapath query devstats command shows performance information for a single device,
all devices, or a range of devices. Example 9-6 shows the output of datapath query devstats
for two devices.
Example 9-6 The datapath query devstats command output
C:\Program Files\IBM\Subsystem Device Driver>datapath query devstats
Total Devices : 2
Device #: 0
=============
                 Total Read   Total Write   Active Read   Active Write   Maximum
I/O:                1755189       1749581             0              0         3
SECTOR:            14168026     153842715             0              0       256

Transfer Size:       <= 512         <= 4k        <= 16K         <= 64K     > 64K
                        271       2337858           104        1166537         0

Device #: 1
=============
                 Total Read   Total Write   Active Read   Active Write   Maximum
I/O:               20353800       9883944             0              1         4
SECTOR:           162956588     451987840             0            128       256

Transfer Size:       <= 512         <= 4k        <= 16K         <= 64K     > 64K
                        296      27128331           215        3108902         0
Also, an adapter level statistics command is available: datapath query adaptstats (also
mapped to pcmpath query adaptstats). Refer to Example 9-7 for a two adapter example.
Example 9-7 The datapath query adaptstats output
C:\Program Files\IBM\Subsystem Device Driver>datapath query adaptstats
Adapter #: 0
=============
                 Total Read   Total Write   Active Read   Active Write   Maximum
I/O:               11060574       5936795             0              0         2
SECTOR:            88611927     317987806             0              0       256

Adapter #: 1
=============
                 Total Read   Total Write   Active Read   Active Write   Maximum
I/O:               11048415       5930291             0              1         2
SECTOR:            88512687     317726325             0            128       256
It is possible to clear these counters so that you can script the usage to cover a precise amount of time. The commands also allow you to return statistics for a range of devices, a single device, or all devices. The command to clear the counts is datapath clear device count.
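For example, a simple interval measurement can be scripted as follows (the 300 second interval is illustrative):
datapath clear device count
sleep 300
datapath query devstats
datapath query adaptstats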
9.7.1 Automated path monitoring
There are many situations in which a host can lose one or more paths to storage. If the
problem is just isolated to that one host, it might go unnoticed until a SAN issue occurs that
causes the remaining paths to go offline, such as a switch failure, or even a routine code
upgrade, which can cause a loss-of-access event, which seriously affects your business. To
prevent this loss-of-access event from happening, many clients have found it useful to
implement automated path monitoring using SDD commands and common system utilities.
For instance, a simple command string on a UNIX system can count the number of failed (dead) paths:
datapath query device | grep dead | wc -l
This command can be combined with a scheduler, such as cron, and a notification mechanism, such as e-mail, to notify the SAN administrators and system administrators if the number of paths to the system changes.
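A minimal sketch of such a check, assuming SDD and a working mail command on the host (the script path, schedule, and mail address are illustrative):
#!/bin/sh
# /usr/local/bin/check_sdd_paths.sh - alert when any SDD path is dead
DEAD=`datapath query device | grep -ci dead`
if [ "$DEAD" -gt 0 ]; then
    echo "`hostname`: $DEAD dead SDD paths detected" | mail -s "SDD path alert" sanadmin@example.com
fi
A crontab entry, such as 0 * * * * /usr/local/bin/check_sdd_paths.sh, runs the check hourly.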
9.7.2 Load measurement and stress tools
Generally, load measurement tools are specific to each host operating system. For example, the AIX OS has the tool iostat, and the Windows OS has perfmon.msc /s.
Industry-standard performance benchmarking tools are also available by joining the Storage Performance Council. Information about how to join is available here:
http://www.storageperformance.org/home
These tools can both generate stress and measure the stress that was created in a standardized way. We highly recommend them for generating stress in your test environments so that your results can be compared against industry measurements.
Another recommended stress tool available is iometer for Windows and Linux hosts:
http://www.iometer.org
AIX System p has Wikis on performance tools and has made a set available for their users:
http://www-941.ibm.com/collaboration/wiki/display/WikiPtype/Performance+Monitoring
+Tools
http://www-941.ibm.com/collaboration/wiki/display/WikiPtype/nstress
Xdd is a tool for measuring and analyzing disk performance characteristics on single systems or clusters of systems. It was designed by Thomas M. Ruwart from I/O Performance, Inc. to provide consistent and reproducible measurements of the sustained transfer rate of an I/O subsystem. It is a command line-based tool that grew out of the UNIX community and has been ported to run in Windows environments as well.
Xdd is a free software program distributed under a GNU General Public License. Xdd is
available for download at:
http://www.ioperformance.com/products.htm
The Xdd distribution comes with all the source code necessary to install Xdd and the
companion programs for the timeserver and the gettime utility programs.
DS4000 Best Practices and Performance Tuning Guide, SG24-6363-02, has detailed
descriptions of how to use these measurement and test tools:
http://www.redbooks.ibm.com/abstracts/sg246363.html?Open
Chapter 10. Applications
In this chapter, we provide information about laying out storage for the best performance for
general applications, IBM AIX Virtual I/O (VIO) servers, and IBM DB2® databases
specifically. While most of the specific information is directed to hosts running the IBM AIX
operating system, the information is also relevant to other host types.
10.1 Application workloads
In general, there are two types of data workload (data processing):
 Transaction-based
 Throughput-based
These workloads are different by nature and must be planned for in quite different ways.
Knowing and understanding how your host servers and applications handle their workload is
an important part of being successful with your storage configuration efforts and the resulting
performance.
A workload that is characterized by a high number of transactions per second and a high
number of I/Os Per Second (IOPS) is called a transaction-based workload.
A workload that is characterized by a large amount of data transferred, normally with large I/O
sizes, is called a throughput-based workload.
These two workload types are conflicting in nature and consequently will require different
configuration settings across all components comprising the storage infrastructure. Generally,
I/O (and therefore application) performance will be best when the I/O activity is evenly spread
across the entire I/O subsystem.
But first, let us describe each type of workload in greater detail and explain what you can
expect to encounter in each case.
10.1.1 Transaction-based workloads
High performance transaction-based environments cannot be created with a low-cost model
of a storage server. Indeed, transaction process rates are heavily dependent on the number
of back-end physical drives that are available for the storage subsystem controllers to use for
parallel processing of host I/Os, which frequently results in having to decide how many
physical drives you need.
Generally, transaction intense applications also use a small random data block pattern to
transfer data. With this type of data pattern, having more back-end drives enables more host
I/Os to be processed simultaneously, because read cache is far less effective than write
cache, and the misses need to be retrieved from the physical disks.
In many cases, slow transaction performance problems can be traced directly to “hot” files
that cause a bottleneck on a critical component (such as a single physical disk). This situation
can occur even when the overall storage subsystem sees a fairly light workload. When
bottlenecks occur, they can present an extremely difficult and frustrating task to resolve.
Because workload content can continually change throughout the course of the day, these
bottlenecks can be extremely mysterious in nature and appear and disappear or move over
time from one location to another location.
Generally, I/O (and therefore application) performance will be best when the I/O activity is
evenly spread across the entire I/O subsystem.
10.1.2 Throughput-based workloads
Throughput-based workloads are seen with applications or processes that require massive
amounts of data sent and generally use large sequential blocks to reduce disk latency.
Generally, a smaller number of physical drives are needed to reach adequate I/O
performance than with transaction-based workloads. For instance, 20 - 28 physical drives are
normally enough to reach maximum I/O throughput rates with the IBM System Storage
DS4000 series of storage subsystems. In a throughput-based environment, read operations
make use of the storage subsystem cache to stage greater chunks of data at a time to
improve the overall performance. Throughput rates are heavily dependent on the storage
subsystem’s internal bandwidth. Newer storage subsystems with broader bandwidths are
able to reach higher numbers and bring higher rates to bear.
10.1.3 Storage subsystem considerations
It is of great importance that the selected storage subsystem model is able to support the
required I/O workload. Besides availability concerns, adequate performance must be ensured
to meet the requirements of the applications, which include evaluation of the disk drive
modules (DDMs) used and if the internal architecture of the storage subsystem is sufficient.
With today’s mechanically based DDMs, it is important that the DDM characteristics match
the needs. In general, a high rotation speed of the DDM platters is needed for
transaction-based throughputs, where the DDM head continuously moves across the platters
to read and write random I/Os. For throughput-based workloads, a lower rotation speed might
be sufficient, because of the sequential I/O nature. As for the subsystem architecture, newer
generations of storage subsystems have larger internal caches, higher bandwidth busses,
and more powerful storage controllers.
10.1.4 Host considerations
When discussing performance, we need to consider far more than just the performance of the
I/O workload itself. Many settings within the host frequently affect the overall performance of
the system and its applications. All areas must be checked to ensure that we are not focusing
on a result rather than the cause. However, in this book we are focusing on the I/O subsystem
part of the performance puzzle; so we will discuss items that affect its operation.
Several of the settings and parameters that we discussed in Chapter 9, “Hosts” on page 175
must match both for the host operating system (OS) and for the host bus adapters (HBAs)
being used as well. Many operating systems have built-in definitions that can be changed to
enable the HBAs to be set to the new values.
10.2 Application considerations
When gathering data for planning from the application side, it is important to first consider the
workload type for the application.
If multiple applications or workload types will share the system, you need to know the type of
workloads of each application, and if the applications have both types or are mixed
(transaction-based and throughput-based), which workload will be the most critical. Many
environments have a mix of transaction-based and throughput-based workloads; generally,
the transaction performance is considered the most critical.
However, in some environments, for example, a Tivoli Storage Manager backup environment,
the streaming high throughput workload of the backup itself is the critical part of the operation.
The backup database, although a transaction-centered workload, is a less critical workload.
10.2.1 Transaction environments
Applications that use high transaction workloads are known as Online Transaction Processing
(OLTP) systems. Examples of these systems are database servers and mail servers.
If you have a database, you tune the server type parameters, as well as the database’s logical
drives, to meet the needs of the database application. If the host server has a secondary role
of performing nightly backups for the business, you need another set of logical drives, which
are tuned for high throughput for the best backup performance you can get within the
limitations of the mixed storage subsystem’s parameters.
So, what are the traits of a transaction-based application? In the following sections, we
explain these traits in more detail.
As mentioned earlier, you can expect to see a high number of transactions and a fairly small
I/O size. Different databases use different I/O sizes for their logs (refer to the following
examples), and these logs vary from vendor to vendor. In all cases, the logs are generally
high write-oriented workloads. For table spaces, most databases use between a 4 KB and a
16 KB I/O size. In certain applications, larger chunks (for example, 64 KB) will be moved to
host application cache memory for processing. Understanding how your application is going
to handle its I/O is critical to laying out the data properly on the storage server.
In many cases, the table space is generally a large file made up of small blocks of data
records. The records are normally accessed using small I/Os of a random nature, which can
result in about a 50% cache miss ratio. For this reason and to not waste space with unused
data, plan for the SAN Volume Controller (SVC) to read and write data into cache in small
chunks (use striped VDisks with smaller extent sizes).
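A hedged SVC CLI sketch of this approach follows; the MDG name, extent size, MDisk list, and VDisk size are illustrative, and a smaller extent size reduces the maximum cluster capacity (see Table 10-1):
svctask mkmdiskgrp -name oltp_mdg -ext 16 -mdisk mdisk0:mdisk1:mdisk2:mdisk3
svctask mkvdisk -mdiskgrp oltp_mdg -iogrp 0 -vtype striped -size 100 -unit gb -name oltp_vd01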
Another point to consider is whether the typical I/O is read or write. In most OLTP
environments, there is generally a mix of about 70% reads and 30% writes. However, the
transaction logs of a database application have a much higher write ratio and, therefore,
perform better in a different managed disk (MDisk) group (MDG). Also, you need to place the
logs on a separate virtual disk (VDisk), which for best performance must be located on a
different MDG that is defined to better support the heavy write need. Mail servers also
frequently have a higher write ratio than read ratio.
Best practice: Database table spaces, journals, and logs must never be collocated on the
same MDisk or MDG in order to avoid placing them on the same back-end storage logical
unit number (LUN) or Redundant Array of Independent Disks (RAID) array.
10.2.2 Throughput environments
With throughput workloads, you have fewer transactions, but much larger I/Os. I/O sizes of
128 K or greater are normal, and these I/Os are generally of a sequential nature. Applications
that typify this type of workload are imaging, video servers, seismic processing, high
performance computing (HPC), and backup servers.
With large size I/O, it is better to use large cache blocks to be able to write larger chunks into
cache with each operation. Generally, you want the sequential I/Os to take as few back-end
I/Os as possible and to get maximum throughput from them. So, carefully decide how the
logical drive will be defined and how the VDisks are dispersed on the back-end storage
MDisks.
Many environments have a mix of transaction-oriented workloads and throughput-oriented
workloads. Unless you have measured your workloads, assume that the host workload is
mixed and use SVC striped VDisks over several MDisks in an MDG in order to have the best
performance and eliminate trouble spots or “hot spots.”
10.3 Data layout overview
In this section, we document data layout from an AIX point of view. Our objective is to help
ensure that AIX and storage administrators, specifically those responsible for allocating
storage, know enough to lay out the storage data, consider the virtualization layers, and avoid
the performance problems and hot spots that come with poor data layout. The goal is to
balance I/Os evenly across the physical disks in the back-end storage subsystems.
We will specifically show you how to lay out storage for DB2 applications as a good example
of how an application might balance its I/Os within the application.
There are also various implications for the host data layout based on whether you utilize SVC
image mode or SVC striped mode VDisks.
10.3.1 Layers of volume abstraction
Back-end storage is laid out into RAID arrays by RAID type, the number of disks in the array,
and the LUN allocation to the SVC or host. The RAID array is a certain number of disk drive
modules (DDMs) (usually containing from two to 32 disks and most often, around 10 disks) in
a RAID configuration (RAID 0, RAID 1, RAID 5, or RAID 10, typically); although, certain
vendors call their entire disk subsystem an “array.”
Use of an SVC adds another layer of virtualization to understand, because there are VDisks,
which are LUNs served from the SVC to a host, and MDisks, which are LUNs served from
back-end storage to the SVC.
The SVC VDisks are presented to the host as LUNs. These LUNs are then mapped as
physical volumes on the host, which might build logical volumes out of the physical volumes.
Figure 10-1 on page 212 shows the layers of storage virtualization.
Figure 10-1 Layers of storage virtualization
10.3.2 Storage administrator and AIX LVM administrator roles
Storage administrators control the configuration of the back-end storage subsystems and
their RAID arrays (RAID type and number of disks in the array, although there are restrictions
on the number of disks in the array and other restrictions depending upon the disk
subsystem). They normally also decide the layout of the back-end storage LUNs (MDisks),
SVC MDGs, SVC VDisks, and which VDisks are assigned to which hosts.
The AIX administrators control the AIX Logical Volume Manager (LVM) and in which volume
group (VG) the SVC VDisks (LUNs) are placed. They also create logical volumes (LVs) and
file systems within the VGs. These administrators have no control where multiple files or
directories reside in an LV unless there is only one file or directory in the LV.
There is also an application administrator for those applications, such as DB2, which balance
their I/Os by striping directly across the LVs.
Together, the storage administrator, LVM administrator, and application administrator control
on which physical disks the LVs reside.
10.3.3 General data layout recommendations
Our primary recommendation for laying out data on SVC back-end storage for general
applications is to use striped VDisks across MDGs consisting of similar-type MDisks with as
few MDisks as possible per RAID array. This general purpose rule is applicable to most SVC
back-end storage configurations and removes a significant data layout burden for the storage
administrators.
Consider where the “failure boundaries” are in the back-end storage and take this into
consideration when locating application data. A failure boundary is defined as what will be
affected if we lose a RAID array (an SVC MDisk). All the VDisks and servers striped on that
MDisk will be affected together with all other VDisks in that MDG. Consider also that
spreading out the I/Os evenly across the back-end storage has a performance benefit and a
management benefit. We recommend that an entire set of back-end storage is managed
together considering the failure boundary. If a company has several lines of business (LOBs),
it might decide to manage the storage along each LOB so that each LOB has a unique set of
back-end storage. So, for each set of back-end storage (a group of MDGs or, perhaps better, just one MDG), we create only striped VDisks across all the back-end storage arrays. This approach is beneficial, because the failure boundary is limited to a LOB, and performance and storage management are handled as a unit for each LOB independently.
What we do not recommend is to create striped VDisks that are striped across different sets
of back-end storage, because using different sets of back-end storage makes the failure
boundaries difficult to determine, unbalances the I/O, and might limit the performance of
those striped VDisks to the slowest back-end device.
For SVC configurations where SVC image mode VDisks must be used, we recommend that
the back-end storage configuration for the database consists of one LUN (and therefore one
image mode VDisk) per array, or an equal number of LUNs per array, so that the Database
Administrator (DBA) can guarantee that the I/O workload is distributed evenly across the
underlying physical disks of the arrays. Refer to Figure 10-2 on page 214.
Use striped mode VDisks for applications that do not already stripe their data across physical
disks. Striped VDisks are the all-purpose VDisks for most applications. Use striped mode
VDisks if you need to manage a diversity of growing applications and balance the I/O
performance based on probability.
If you understand your application storage requirements, you might take an approach that
explicitly balances the I/O rather than a probabilistic approach to balancing the I/O. However,
explicitly balancing the I/O requires either testing or a good understanding of the application
and the storage mapping and striping to know which approach works better.
Examples of applications that stripe their data across the underlying disks are DB2, GPFS™,
and Oracle ASM. These types of applications might require additional data layout
considerations as described in 10.4, “When the application does its own balancing of I/Os” on
page 216.
General data layout recommendation for AIX:
 Evenly balance I/Os across all physical disks (one method is by striping the VDisks)
 To maximize sequential throughput, use a maximum range of physical disks (AIX
command mklv -e x) for each LV.
 MDisk and VDisk sizes:
– Create one MDisk per RAID array.
– Create VDisks based on the space needed, which overcomes disk subsystems that do not allow dynamic LUN expansion.
 When you need more space on the server, dynamically extend the VDisk on the SVC and then use the AIX command chvg -g to see the increased size in the system (see the example after Figure 10-2).
Figure 10-2 General data layout recommendations for AIX storage
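As a hedged example of the last two points in Figure 10-2 (the VDisk, VG, and LV names and the sizes are illustrative):
svctask expandvdisksize -size 20 -unit gb app_vd01 (grow the VDisk by 20 GB on the SVC)
chvg -g appvg (make AIX LVM detect the increased size)
mklv -y applv -t jfs2 -e x appvg 100 (create an LV of 100 LPs spread across the maximum range of physical volumes)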
SVC striped mode VDisks
We recommend striped mode VDisks for applications that do not already stripe their data
across disks.
Creating VDisks that are striped across all RAID arrays in an MDG means that the AIX LVM setup does not matter. This approach works well for most general applications and eliminates data layout considerations for the physical disks.
Use striped VDisks with the following considerations:
 Use extent sizes of 64 MB to maximize sequential throughput when it is important. Refer
to Table 10-1 for a table of extent size compared to capacity.
 Use striped VDisks when the number of VDisks does not matter.
 Use striped VDisks when the number of VGs does not affect performance.
 Use striped VDisks when sequential I/O rates are greater than the sequential rate for a
single RAID array on the back-end storage. Extremely high sequential I/O rates might
require a different layout strategy.
 Use striped VDisks when you prefer the use of extremely large LUNs on the host.
Refer to 10.6, “VDisk size” on page 220 for details about how to utilize large VDisks.
Table 10-1 Extent size as opposed to maximum storage capacity

Extent size      Maximum storage capacity of SVC cluster
16 MB            64 TB
32 MB            128 TB
64 MB            256 TB
128 MB           512 TB
256 MB           1 PB
512 MB           2 PB
1 GB             4 PB
2 GB             8 PB
10.3.4 Database strip size considerations (throughput workload)
It is also worthwhile thinking about the relative strip sizes (a strip is the amount of data written
to one volume or “container” before going to the next volume or container). Database strip
sizes are typically small. Let us assume they are 32 KB. The SVC strip size (called extent) is
user selectable and in the range of 16 MB to 2 GB. The back-end RAID arrays have strip
sizes in the neighborhood of 64 - 512 KB. Then, there is the number of threads performing I/O
operations (assume they are sequential, because if they are random, it does not matter). The
number of sequential I/O threads is extremely important and is often overlooked, but it is a
key part of the design to get performance from applications that perform their own striping.
Comparing striping schemes for a single sequential I/O thread might be appropriate for
certain applications, such as backups, extract, transform, and load (ETL) applications, and
several scientific/engineering applications, but typically, it is not appropriate for DB2 or Tivoli
Storage Manager.
If we have one thread per volume or “container” performing sequential I/O, using SVC image
mode VDisks ensures that the I/O is done sequentially with full strip writes (assuming RAID
5). With SVC striped VDisks, we might run into situations where two threads are doing I/O to
the same back-end RAID array or run into convoy effects that temporarily reduce
performance (convoy effects result in longer periods of lower throughput).
Tivoli Storage Manager uses a similar scheme as DB2 to spread out its I/O, but it also
depends on ensuring that the number of client backup sessions is equal to the number of
Tivoli Storage Manager storage volumes or containers. Tivoli Storage Manager performance
issues can be improved by using LVM to spread out the I/Os (called PP striping), because it
is difficult to control the number of client backup sessions. For this situation, a good approach
is to use SVC striped VDisks rather than SVC image mode VDisks. The perfect situation for
Tivoli Storage Manager is n client backup sessions going to n containers (each container on a
separate RAID array).
To summarize, if you are well aware of the application’s I/O characteristics and the storage
mapping (from the application all the way to the physical disks), you might want to consider
explicit balancing of the I/Os using SVC image mode VDisks to maximize the application’s
striping performance. Normally, using SVC striped VDisks makes sense, balances the I/O
well for most situations, and is significantly easier to manage.
10.3.5 LVM volume groups and logical volumes
Without an SVC managing the back-end storage, the administrator must ensure that the host
operating system aligns its device data partitions or slices with those of the logical drive.
Misalignment can result in numerous boundary crossings that are responsible for
unnecessary multiple drive I/Os. Certain operating systems do this automatically, and you
just need to know the alignment boundary that they use. Other operating systems, however,
might require manual intervention to set their start point to a value that aligns them.
With an SVC managing the storage for the host as striped VDisks, aligning the partitions is
easier, because the extents of the VDisk are spread across the MDisks in the MDG. The
storage administrator must ensure an adequate distribution.
Understanding how your host-based volume manager (if used) defines and makes use of the
logical drives when they are presented is also an important part of the data layout. Volume
managers are generally set up to place logical drives into usage groups for their use. The
volume manager then creates volumes by carving up the logical drives into partitions
(sometimes referred to as slices) and then building a volume from them by either striping or
concatenating them to form the desired volume size.
How the partitions are selected for use and laid out can vary from system to system. In all
cases, you need to ensure that spreading the partitions is done in a manner to achieve
maximum I/Os available to the logical drives in the group. Generally, large volumes are built
across a number of different logical drives to bring more resources to bear. You must be
careful when selecting logical drives when you do this in order to not use logical drives that
will compete for resources and degrade performance.
10.4 When the application does its own balancing of I/Os
In this section, we discuss how to lay out data when the SVC is implemented with
applications that can balance their I/Os themselves.
10.4.1 DB2 I/O characteristics and data structures
DB2 tables are put into DB2 tablespaces. DB2 tablespaces are made up of containers that
are identified storage locations, such as a raw device (logical volume) or a file system. DB2
spreads data and I/Os evenly across all containers in a tablespace by placing one DB2 extent
of data in each container in a round-robin fashion. Each container will have the same I/O
activity. Thus, you do not use LVM to spread out I/Os across physical disks. Rather, you
create a tablespace with one container on each array, which causes DB2 to explicitly balance
I/Os, because data is being accessed equally off each array.
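As a hedged illustration of this layout, issued through the db2 command line (the tablespace name, container paths, and sizes in pages are illustrative, and each /db2/cN file system is assumed to reside on a LUN from a separate RAID array):
db2 "CREATE TABLESPACE data_ts MANAGED BY DATABASE USING (FILE '/db2/c1/data_ts.c1' 25600, FILE '/db2/c2/data_ts.c2' 25600, FILE '/db2/c3/data_ts.c3' 25600, FILE '/db2/c4/data_ts.c4' 25600)"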
As we will see, a single DB2 container resides on a single logical volume; thus, each
container of a tablespace (the logical volume, or a file or directory on it) must reside on a
single LUN on an array. This storage design achieves the goal of balanced I/Os spread
evenly across physical disks. There are also db2logs that do not share the round-robin extent
design. The db2logs reside on one LV, which is generally spread across all disks evenly.
Note that this storage design differs from the recommended storage design for other
applications in general. For example, assuming that we are using a disk subsystem directly,
the general best practice for highest performance is to create RAID arrays of the same type and size (or nearly the same size), then take one LUN from each array, use the LVM to create a VG from those LUNs, and then create LVs that are spread across every LUN in the VG. In other
words, this technique is a spread everything (all LVs) across everything (all physical disks)
approach (which is quite similar to what the SVC can do). It is better to not use this approach
for DB2, because this approach uses probability to balance I/Os across physical disks, while
DB2 explicitly assures that I/Os are balanced.
DB2 also evenly balances I/Os across DB2 database partitions, which can exist on different
AIX logical partitions (LPARs). The same I/O principles are applied to each partition
separately.
DB2 also has multiple options for containers, including:
 Storage Managed Space (SMS) file system directories
 Database Managed Space (DMS) file system files
 DMS raw
 Automatic Storage for DB2 8.2.2
DMS and SMS are DB2 acronyms for Database Managed Space and Storage Managed
Space. Think of DMS containers as pre-allocated storage and SMS containers as dynamic
storage.
Note that if we use SMS file system directories, it is important to have one file system (and
underlying LV) per container. That is, do not have two SMS file system directory containers in
the same file system. Also, for DMS file system files, it is important to have just one file per file
system (and underlying LV) per container. In other words, we have only one container per LV.
The reason for these restrictions is that we do not have control of where each container
resides in the LV; thus, we cannot assure that the LVs are balanced across physical disks.
The simplest way to think of DB2 data layout is to assume that we are using many disks and
that we create one container per disk. In general, each container has the same sustained
IOPS bandwidth and resides on a set of physically independent physical disks, because each
container will be accessed equally by DB2 agents.
DB2 also has multiple types of tablespaces and storage uses. For example, tablespaces can
be created separately for table data, indexes, and DB2 temporary work areas. The principles
of storage design for even I/O balancing among tablespace containers applies to each of
these tablespace types. Furthermore, containers for different tablespace types can be shared
on the same array, thus, allowing all database objects to have equal opportunity at using all
I/O performance of the underlying storage subsystem and disks. Also note that different
options can be used for each container type, for example, DMS file containers might be used
for data tablespaces, and SMS file system directories might be used for DB2 temporary
tablespace containers.
DB2 connects physical storage to DB2 tables and database structures through the use of
DB2 tablespaces. Collaboration between a DB2 DBA and the AIX Administrator (or storage
administrator) to create the DB2 tablespace definitions can ensure that the guidance provided
for the database storage design is implemented for optimal I/O performance of the storage
subsystem by the DB2 database.
Use of Automatic Storage bypasses LVM entirely, and here, DB2 uses disks for containers.
So in this case, each disk must have similar IOPS characteristics. We will not describe this
option here.
10.4.2 DB2 data layout example
Assume that we have one database partition, a regular tablespace for data, and a temporary
tablespace for DB2 temporary work. Further assume that we are using DMS file containers
for the regular tablespace and SMS file directories for the DB2 temporary tablespace. This
situation provides us with two options for LUN and LVM configuration:
 Create one LUN per array for SMS containers and one LUN per array for DMS containers.
 Create one LUN per array. Then, on each LUN, create one LV (and associated file
system) for SMS containers and one LV (and associated file system) for DMS containers.
In either case, the number of VGs is irrelevant from a data layout point of view, but one VG is
usually easier to administer and has an advantage for the db2log LV. For the file system
logs, JFS2 in-line logs balance the I/Os across the physical disks as well. The second
approach is more flexible for growth, at least on disk subsystems that do not allow dynamic
LUN expansion, because as the database grows, we can increase the LVs as needed. There
also does not need to be any initial planning for the size difference between DB2 tables and
DB2 temporary space, which is why DB2 practitioners will frequently recommend creating
only one LUN on an array. This storage design provides simplicity while maintaining the
highest levels of I/O performance.
For the db2log LV, we have similar options and we can create one LUN per array and then
create the LV across all the LUNs.
A second approach to growth is to add another array, the LUNs, and the LVs and allow DB2
to rebalance the data across the containers. This approach also increases the IOPS number
available to DB2.
A third approach to growth is to add one or two disks to each RAID array (for disk subsystems
that support dynamic RAID array expansion). This approach increases IOPS bandwidth.
For DB2 data warehouses, or extremely high bandwidth DB2 databases on the SVC, utilizing
sequential mode VDisks and DB2 managed striping might be preferred.
But for other general applications, we generally recommend using striped VDisks to balance
the I/Os. This recommendation also has the advantage of eliminating LVM data layout as an
issue. We also recommend using SDDPCM instead of IBM Subsystem Device Driver (SDD).
Growth can be handled for general applications by dynamically increasing the size of the
VDisk and then using chvg -g for LVM to see the increased size. For DB2, growth can be
handled by adding another container (a sequential or image mode VDisk) and allowing DB2
to restripe the data across the VDisks.
10.4.3 SVC striped VDisk recommendation
While we have recommended that applications that can handle their own striping be set up not to use the striping provided by the SVC, it usually does little harm to do both kinds of striping.
One danger of multiple striping upon striping is the “beat” effect, similar to the harmonics of
music. One striping method reverses (undoes) the benefits of the other striping method.
However, the beat effect is easy to avoid by ensuring a wide difference in stripe granularities
(sizes of the strips, extents, and so on).
You can design a careful test of an application configuration to ensure that application striping
is optimal when using SVC image mode disks, therefore, supplying maximum performance.
However, in a production environment, the usual scenario is a mix of different databases,
built at different times for different purposes, that is housed in a large and growing number of
tablespaces. Under these conditions, it is extremely difficult to ensure that application striping
continues to work well in terms of distributing the total load across the whole set of physical
disks.
Therefore, we recommend SVC striping even when the application does its own striping,
unless you have carefully planned and tested the application and the entire environment. This
approach adds a great deal more robustness to the situation. It now becomes easy to
accommodate completely new databases and tablespaces with no special planning and
without disrupting the balance of work. Also, the extra level of striping ensures that the load
will be balanced even if the application striping fails. Perhaps most important, this
recommendation lifts a significant burden from the database administrator, because good
performance can be achieved with much less care and planning.
10.5 Data layout with the AIX virtual I/O (VIO) server
The purpose of this section is to describe strategies to get the best I/O performance by evenly
balancing I/Os across physical disks when using the VIO Server.
10.5.1 Overview
In setting up storage at a VIO server (VIOS), a broad range of possibilities exists for creating
VDisks and serving them up to VIO clients (VIOCs). The obvious consideration is to create
sufficient storage for each VIOC. Less obvious, but equally important, is getting the best use
of the storage. Performance and availability are of paramount importance. There are typically
internal Small Computer System Interface (SCSI) disks (typically used for the VIOS operating
system) and SAN disks. Availability for disk is usually handled by RAID on the SAN or by
SCSI RAID adapters on the VIOS. We will assume here that any internal SCSI disks are used
for the VIOS operating system and possibly for the VIOC’s operating systems. Furthermore,
we will assume that the applications are configured so that only limited I/O will occur to the internal SCSI disks on the VIOS and to the VIOC's rootvgs. If you expect your rootvg will have
a significant IOPS rate, you can configure it in the same fashion as we recommend for other
application VGs later.
VIOS restrictions
There are two types of VDisks that you can create on a VIOS: physical volume (PV) VSCSI
hdisks and logical volume (LV) VSCSI hdisks.
PV VSCSI hdisks are entire LUNs from the VIOS point of view, and each one is presented whole as a VDisk to the VIOC. If you are concerned about the failure of a VIOS and have configured redundant VIOSs for that reason, you must use PV VSCSI hdisks.
An LV VSCSI hdisk cannot be served up from multiple VIOSs. LV VSCSI hdisks reside in
LVM VGs on the VIOS and cannot span PVs in that VG, nor be striped LVs.
VIOS queue depth
From a performance point of view, the queue_depth of VSCSI hdisks is limited to 3 at the
VIOC, which limits the IOPS bandwidth to approximately 300 IOPS (assuming an average I/O
service time of 10 ms). Thus, you need to configure a sufficient number of VSCSI hdisks to
get the IOPS bandwidth needed. The queue depth limit was raised to 256 in Version 1.3 of the VIOS (August 2006); however, you still need to consider the IOPS bandwidth of the back-end disks. When possible, set the queue depth of the VIOC hdisks to match that of the
VIOS hdisk to which it maps.
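For example, the following commands check and then raise the queue depth on a VIOC hdisk (the device name and value are illustrative; the -P flag defers the change to the next reboot if the disk is in use):
lsattr -El hdisk4 -a queue_depth (display the current value)
chdev -l hdisk4 -a queue_depth=32 -P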
10.5.2 Data layout strategies
You can use the SVC or AIX LVM (with appropriate configuration of vscsi disks at the VIOS)
to balance the I/Os across the back-end physical disks. When using an SVC, here is how to
balance the I/Os evenly across all arrays on the back-end storage subsystems:
 You create just a few LUNs per array on the back-end disk in each MDG (the normal
practice is to have RAID arrays of the same type and size, or nearly the same size, and
same performance characteristics in an MDG).
 You create striped VDisks on the SVC that are striped across all back-end LUNs.
 When you do this, the LVM setup does not matter, and you can use PV vscsi hdisks and
redundant VIOSs or LV vscsi hdisks (if you are not worried about VIOS failure).
10.6 VDisk size
Larger VDisks might need more disk buffers and larger queue_depths depending on the I/O rates; however, they have the large benefit of using less AIX memory and fewer path management resources. It is worthwhile to tune the queue_depths and adapter resources for this purpose. It is preferable to use fewer, larger LUNs: increasing the queue_depth and disk buffers is easy (although it does require application downtime), whereas handling a larger number of AIX LUNs requires a considerable amount of OS resources.
10.7 Failure boundaries
As mentioned in 10.3.3, “General data layout recommendations” on page 212, it is important
to consider failure boundaries in the back-end storage configuration. If all of the LUNs are
spread across all physical disks (either by LVM or SVC VDisk striping), and you experience a
single RAID array failure, you might lose all your data. So, there are situations in which you
probably want to limit the spread for certain applications or groups of applications. You might
have a group of applications where if one application fails, none of the applications can
perform any productive work.
When implementing the SVC, limiting the spread can be accounted for through the MDG
layout. Refer to Chapter 5, “MDisks” on page 83 for more information about failure boundaries
in the back-end storage configuration.
Chapter 11. Monitoring
The SAN Volume Controller (SVC) provides a range of data about how it performs and also
about the performance of other components of the SAN. When you properly monitor
performance, having the SVC in the SAN makes it easier to recognize and fix faults and
performance problems.
In this chapter, we first describe how to collect SAN topology and performance information
using TotalStorage Productivity Center (TPC). We then show several examples of
misconfiguration and failures, and how they can be identified in the TPC Topology Viewer
and performance reports. Finally, we describe how to monitor the SVC error log effectively by
using the e-mail notification function.
The examples in this chapter were taken from TotalStorage Productivity Center (TPC)
V3.3.2.79, which was released in June 2008 to support SVC 4.3. You must always use the
latest version of TPC that is supported by your SVC code; TPC is often updated to support
new SVC features. If you have an earlier version of TPC installed, you might still be able to
reproduce the reports described here, but certain data might not be available.
11.1 Configuring TPC to analyze the SVC
TPC manages all storage controllers using their Common Information Model (CIM) object
manager (CIMOM) interface. CIMOM interfaces enable a Storage Management Initiative
Specification (SMI-S) management application, such as TPC, to communicate to devices
using a standards-based protocol. The CIMOM interface will translate an SMI-S command
into a proprietary command that the device understands and then convert the proprietary
response back into the SMI-S-based response.
The SVC’s CIMOM interface is supplied with the SVC Master Console and is automatically
installed as part of the SVC Master Console installation. The Master Console can manage multiple SVC clusters, and TPC is aware of all of the clusters that the Master Console manages. TPC does not directly connect to the Config Node of the SVC cluster to manage the SVC cluster.
If you see that TPC is having difficulty communicating with or monitoring the SVC, check the
health and status of the SVC Master Console.
Note: For TPC to manage the SVC, you must have TCP/IP connectivity between the TPC
Server and the SVC Master Console. TPC will not communicate with the SVC nodes, so it
is acceptable that the SVC nodes are not on the same network to which TPC has access.
To configure TPC to manage the SVC:
1. Start the TPC GUI application. Navigate to Administrative Services → Data Sources →
CIMOM Agents → Add CIMOM. Enter the information in the Add CIMOM panel that
appears. Refer to Figure 11-1 for an example.
Figure 11-1 Configuring TPC to manage the SVC
2. When you click Save, TPC will validate the information that you have provided by testing
the connection to the CIMOM. If there is an error, an alert will pop up, and you must
correct the error before you can save the configuration again.
3. After the connection has been successfully configured, TPC must run a CIMOM
Discovery (under Administrative Services → Discovery → CIMOM) before you can set
up performance monitoring or before the SVC cluster will appear in the Topology Viewer.
Note: The SVC Config Node (that owns the IP address for the cluster) has a 10 session
Secure Shell (SSH) limit. TPC will use one of these sessions while interacting with the
SVC. You can read more information about the session limit in 3.2.1, “SSH connection
limitations” on page 42.
11.2 Using TPC to verify the fabric topology
After TPC has probed the SAN environment, it takes the information from all the SAN
components (switches, storage controllers, and hosts) and automatically builds a graphical
display of the SAN environment. This graphical display is available through the Topology
Viewer option on the TPC navigation tree.
The information on the Topology Viewer panel is current as of the last successful probe. By default, TPC will probe the environment daily; however, you can execute an unplanned or immediate probe at any time.
Normally, the probe takes less than five minutes to complete. If you are analyzing the
environment for problem determination, we recommend that you execute an unplanned probe
to ensure that you have the latest up-to-date information on the SAN environment. Make sure
that the probe completes successfully.
11.2.1 SVC node port connectivity
It is important that each SVC node port is connected to switches in your SAN fabric. If any
SVC node port is not connected, each node in the cluster will display an error on the LCD
display (probably, error 1060). TPC will also show the health of the cluster as a warning in the
Topology Viewer.
It is equally important to ensure that:
 You have at least one port from each node in each fabric.
 You have an equal number of ports in each fabric from each node; that is, do not have
three ports in fabric one and only one port in fabric two for an SVC node.
Figure 11-2 on page 224 shows using TPC (under IBM TotalStorage Productivity Center →
Topology → Storage) to verify that we have an even number of ports in each fabric. The
example configuration shows that:
 Our SVC is connected to two fabrics (we have named our fabrics FABRIC-2GBS and
FABRIC-4GBS).
 We have four SVC nodes in this cluster. TPC has organized our switch ports so that each column represents a node, which you can see because the worldwide port names (WWPNs) within a column have similar numbers.
 We have an even number of ports in each switch. Figure 11-2 on page 224 shows the
links to each switch at the same time. It might be easier to validate this setup by clicking on
one switch at a time (refer to Figure 11-5 on page 227).
Information: When we cabled our SVC, we intended to connect ports 1 and 3 to one
switch (IBM_2109_F32) and ports 2 and 4 to the other switch (swd77). We thought that we
were really careful about labeling our cables and configuring our ports.
TPC showed us that we did not configure the ports this way, and additionally, we made
two mistakes. Figure 11-2 shows that we:
 Correctly configured all four nodes with port 1 to switch IBM_2109_F32
 Correctly configured all four nodes with port 2 to switch swd77
 Incorrectly configured two nodes with port 3 to switch swd77
 Incorrectly configured two nodes with port 4 to switch IBM_2109_F32
Figure 11-2 Checking the SVC ports to ensure they are connected to the SAN fabric
TPC can also show us where our host and storage are in our fabric and which switches the
I/Os will go through when I/Os are generated from the host to the SVC or from the SVC to the
storage controller.
For redundancy, all storage controllers must be connected to at least two fabrics, and those
same fabrics must be the fabrics to which the SVC is connected.
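You can make the same check from the SVC command line. This is a minimal sketch; list the
controllers first with the concise view, because the controller name shown here is only a
placeholder assigned by the cluster:
IBM_2145:itsosvccl1:admin>svcinfo lscontroller
IBM_2145:itsosvccl1:admin>svcinfo lscontroller controller0
The detailed view shows the controller WWPNs that the SVC can see, which you can compare
with the fabric view in TPC.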
Figure 11-3 on page 225 shows our DS4500 is also connected to fabrics FABRIC-2GBS and
FABRIC-4GBS as we planned.
Information: Our DS4500 was shared with other users, so we were only able to use two
ports of the available four ports. The other two ports were used by a different SAN
infrastructure.
Figure 11-3 Checking that your storage is in each fabric
11.2.2 Ensuring that all SVC ports are online
Information in the Topology Viewer can also confirm the health and status of the SVC and the
switch ports. In the Topology Viewer, TPC shows each Fibre Channel port with a box next to
the WWPN. If this box has a black line in it, the port is connected to another device.
Table 11-1 shows an example of the ports with their connected status.
Table 11-1 TPC port connection status
TPC port view                      Status
Box containing a black line        This is a port that is connected.
Empty box (no black line)          This is a port that is not connected.
Figure 11-2 on page 224 shows an example where all the TPC ports are connected and the
switch ports are healthy.
Figure 11-4 on page 226 shows an example where the SVC ports are not healthy. In this
example, the two ports that have a black line drawn between the switch and the SVC node
port are in fact down.
Because TPC knew where these two ports were connected on a previous probe (and, thus,
they were previously shown with a green line), the probe discovered that these ports were no
longer connected, which resulted in the green line becoming a black line.
If these ports had never been connected to the switch, no lines would be shown for them, and
we would only see six of the eight ports connected to the switch.
Figure 11-4 Showing SVC ports that are not connected
11.2.3 Verifying SVC port zones
When TPC probes the SAN environment to obtain information on SAN connectivity, it also
collects information on the SAN zoning that is currently active. The SAN zoning information is
also available on the Topology Viewer through the Zone tab.
By opening the Zone tab and clicking both the switch and the zone configuration for the SVC,
we can confirm that all of the SVC node ports are correct in the Zone configuration.
Figure 11-5 on page 227 shows that we have defined an SVC node zone called SVC_CL1_NODE
in our FABRIC-2GBS, and we have correctly included all of the SVC node ports.
In Figure 11-5, click on the switch to see which ports are connected to it; the gray box shows
the ports that are in the zone.
Figure 11-5 Checking that our zoning is correct
Our SVC will also be used in a Metro Mirror and Global Mirror relationship with another SVC
cluster. For this configuration to be supported, we must make sure that every SVC node in
this cluster is zoned so that it can see every node port in the remote cluster.
In each fabric, we created a zone called SVC_MM_NODE that contains all of the node ports for
all of the SVC nodes. We can check each SVC cluster to make sure that all of its ports are in
fact in this zone. Figure 11-6 on page 228 shows that we have correctly configured all ports
for the SVC cluster ITSO_CL1.
In Figure 11-6, Shift-click each zone to see all of the ports.
Figure 11-6 Verifying Metro Mirror and Global Mirror zones
11.2.4 Verifying paths to storage
TPC 3.3 introduced a new feature called the Data Path View. You can use this view to see
the path between two objects, and the Data Path View shows the objects and the switch
fabric in one view.
Using the Data Path View, we can see that mdisk1 in SVC ITSOCL1 is available through all the
SVC ports and trace that connectivity to its logical unit number (LUN) ST-7S10-5. Figure 11-7
on page 229 shows this view.
What is not shown in Figure 11-7 on page 229 is that you can hover over the MDisk, LUN,
and switch ports with your mouse and get both health and performance information about
these components. This capability enables you to verify the status of each component to see
how well it is performing.
Figure 11-7 Verifying the health between two objects in the SVC
11.2.5 Verifying host paths to the SVC
By using the computer display in TPC, you can see all the fabric and storage information for
the computer that you select.
Figure 11-8 shows the host KANAGA, which has two host bus adapters (HBAs). This host has
also been configured to access part of the SVC storage (the SVC storage is only partially
shown in this panel).
Our Topology View confirms that KANAGA is physically connected to both of our fabrics.
By using the Zone tab, we can see that only one zone configuration applies to KANAGA in the
FABRIC-2GBS fabric and that no zone configuration is active for KANAGA in the FABRIC-4GBS
fabric. Therefore, KANAGA does not have redundant paths, and if switch IBM_2109_F32 went
offline, KANAGA would lose access to its SAN storage.
By clicking the zone configuration, we can see which port is included in a zone configuration
and thus which switch has the zone configuration. The port that has no zone configuration will
not be surrounded by a gray box.
Figure 11-8 Kanaga has two HBAs but is only zoned into one fabric
Using the Fabric Manager component of TPC, we can quickly fix this situation. The fixed
results are shown in Figure 11-9 on page 231.
Figure 11-9 Kanaga with the zoning fixed
You can also use the Data Path Viewer in TPC to confirm path connectivity between a disk
that an operating system sees and the VDisk that the SVC provides.
Figure 11-10 on page 232 shows two diagrams for the path information relating to host
KANAGA:
 The top (left) diagram shows the path information before we fixed our zoning configuration.
It confirms that KANAGA only has one path to the SVC VDisk vdisk4. Figure 11-8 on
page 230 confirmed that KANAGA has two HBAs and that they are connected to our SAN
fabrics. From this panel, we can deduce that our problem is likely to be a zoning
configuration problem.
 The lower (right) diagram shows the result after the zoning was fixed.
What Figure 11-10 on page 232 cannot show is that you can hover over each component to
get health and performance information as well, which might be useful when you perform
problem determination and analysis.
Figure 11-10 Viewing host paths to the SVC
11.3 Analyzing performance data using TPC
TPC can collect performance information for all of the components that make up your SAN.
With the performance information about the switches and storage, it is possible to view the
end-to-end performance for a specific host in our SAN environment.
There are three methods of using the performance data that TPC collects:
 Using the TPC GUI to manage fabric and disk performance
By default, the TPC GUI is installed on the TPC server. You can also optionally install the
TPC GUI on any supported Windows or UNIX workstation by running the setup on disk1 of
the TPC media and choosing a custom installation.
By using the TPC GUI, you can monitor the performance of the:
– Switches by navigating to Fabric Manager → Reporting → Switch Performance
– Storage controllers by navigating to Disk Manager → Reporting → Storage
Subsystem Performance
Both options are in the TPC navigation tree on the left side of the GUI.
The reports under these menu options provide the most detailed information about the
performance of the devices.
 Using the TPC GUI with the Data Path Viewer
With TPC 3.3, there is a new Data Path Viewer display, which enables you to see the
end-to-end performance between:
– A host and its disks (VDisks if the storage comes from an SVC, or LUNs if it comes
directly from a storage controller)
– The SVC and the storage controllers that provide LUNs
– A storage controller and all the hosts to which it provides storage (including the SVC)
With the Data Path Viewer, all the information and the connectivity between a source
(Initiator Entity) and a target (Target Entity) are shown in one display.
By turning on the Topology Viewer Health, Performance, and Alert overlays, you can hover
over each component to get a full understanding of how it is performing and its health.
To use the Data Path Viewer, navigate to the Topology Viewer (under IBM TotalStorage
Productivity Center → Topology), right-click a computer or storage controller and select
Open DataPath View.
 Using the TPC command line interface (CLI) TPCTOOL
The TPCTOOL command line interface enables you to script the extraction of data from
TPC so that you can perform more advanced performance analysis, which is particularly
useful if you want to include multiple performance metrics about one or more devices in
one report.
For example, if you have an application that spans multiple hosts with multiple disks
coming from multiple controllers, you can use TPCTOOL to collect all the performance
information from each component and group all of it together onto one report.
Using TPCTOOL assumes that you have an advanced understanding of TPC and requires
scripting to take full advantage of it. We recommend that you use at least TPC V3.1.2 if
you plan on using the CLI. A minimal sketch of this approach follows this list.
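As a rough illustration of the kind of scripting that TPCTOOL makes possible, the following
sketch pulls sample-level statistics into a comma-separated file that can then be merged with
data extracted from other devices. The user ID, password, Device server URL, and the
bracketed placeholders are assumptions for illustration only; check the tpctool documentation
for your TPC level for the exact component types, report columns, and timestamp format:
# List the monitored subsystems to obtain the identifier of the SVC cluster.
tpctool lsdev -user tpcadmin -pwd passw0rd -url localhost:9550 -subsys
# Extract one hour of sample-level data for that subsystem into a CSV file.
tpctool getrpt -user tpcadmin -pwd passw0rd -url localhost:9550 \
    -subsys <subsystem_id> -ctype subsystem -level sample \
    -start <start_time> -duration 3600 -fs "," -columns <column_list> > svc_sample.csv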
11.3.1 Setting up TPC to collect performance information
TPC performance collection is either turned on or turned off. You do not need to specify the
performance information that you want to collect. TPC will collect all performance counters
that the SVC (or storage controller) provides and insert them into the TPC database. After the
counters are there, you can report on the results using any of the three methods described in
the previous section.
To enable the performance collection, navigate to Disk Manager → Monitoring and
right-click Storage Performance Monitors.
We recommend that you create a separate performance monitor for each CIMOM from which
you want to collect performance data. Each CIMOM provides different sampling intervals, and
if you combine all of your different storage controllers into one performance collection, the
sample interval might not be as granular as you want.
Additionally, by having separate performance monitor collections, you can start and stop
individual monitors as required.
Note: Make sure that your TPC server, SVC Master Console, and SVC cluster are set with
the correct times for their time zones.
If your SVC is configured for Coordinated Universal Time (UTC), ensure that it is in fact on
UTC time and not local time. TPC will adjust the time on the performance data that it
receives before inserting the data in the TPC database.
If the time does not match the time zone, it is difficult to compare performance among
objects, for example, the switch performance or the storage controller performance.
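From the SVC side, you can verify the cluster time zone setting with commands similar to this
minimal sketch (the time zone ID is a placeholder; svcinfo lstimezones lists the valid IDs):
IBM_2145:itsosvccl1:admin>svcinfo showtimezone
IBM_2145:itsosvccl1:admin>svcinfo lstimezones
IBM_2145:itsosvccl1:admin>svctask settimezone -timezone <id>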
11.3.2 Viewing TPC-collected information
TPC collects and reports on many statistics as recorded by the SVC nodes. With these
statistics, you can get general cluster performance information or more detailed specific
VDisk or MDisk performance information.
An explanation of the metrics and how they are calculated is available in Appendix A of the
TotalStorage Productivity Center User Guide located at this Web site:
http://publib.boulder.ibm.com/infocenter/tivihelp/v4r1/topic/com.ibm.itpc.doc/tpcugd31389.htm
The TPC GUI provides you with an easy, intuitive method of querying the TPC database to
obtain information about many of the counters that it stores. One limitation of the TPC GUI is
that you can only report on “like” counters at one time. For example, you cannot display
response times and data rates on the same graph.
You also cannot include information from related devices on the same report. For example,
you cannot combine port utilization from a switch with the host data rate as seen on the SVC.
This information can only be provided in separate reports with the TPC GUI.
If you use the TPC command line interface, you will be able to collect all of the individual
metrics on which you want to report and massage that data into one report.
When starting to analyze the performance of the SVC environment to identify a performance
problem, we recommend that you identify all of the components between the two systems
and verify the performance of the smaller components.
Thus, traffic between a host, the SVC nodes, and a storage controller goes through these
paths:
1. The host generates the I/O and transmits it on the fabric.
2. The I/O is received on the SVC node ports.
3. If the I/O is a write I/O:
a. The SVC node writes the I/O to the SVC node cache.
b. The SVC node sends a copy to its partner node to write to the partner node’s cache.
c. If the I/O is part of a Metro Mirror or Global Mirror relationship, a copy needs to go to the
target VDisk of the relationship.
d. If the I/O is part of a FlashCopy mapping and the affected block has not yet been copied
to the target VDisk, that copy needs to be scheduled.
4. If the I/O is a read I/O:
a. The SVC needs to check the cache to see if the Read I/O is already there.
b. If the I/O is not in the cache, the SVC needs to read the data from the physical LUNs.
5. At some point, write I/Os will be sent to the storage controller.
6. The SVC might also do some read ahead I/Os to load the cache in case the next read I/O
from the host is the next block.
TPC can help you report on most of these steps so that it is easier to identify where a
bottleneck might exist.
11.3.3 Cluster, I/O Group, and node reports
The TPC cluster performance information is useful to get an overall idea of how the cluster is
performing and to get an understanding of the workload passing through your cluster.
The I/O Group and node reports enable you to drill down into the health of the cluster and
obtain a more granular understanding of the performance.
The available reports fit into the following categories.
SVC node resource performance
These reports enable you to understand the workload on the cluster resources, particularly
the load on CPU and cache memory. There is also a report that shows the traffic between
nodes.
Figure 11-11 on page 236 shows an example of several of the available I/O Group resource
performance metrics. In this example, we generated excessive I/O to our storage controller
(of which the SVC was unaware) together with an excess load on two hosts that each had 11
VDisks from our SVC cluster. The purpose of this exercise was to show how a storage
controller under stress is reflected in the TPC results.
Figure 11-11 Multiple I/O Group resource performance metrics
An important metric in this report is the CPU utilization (in dark blue), which gives you an
indication of how busy the cluster CPUs are. If the CPU utilization remains constantly high, it
might be time to grow the cluster by adding more resources.
You can add cluster resources by adding another I/O Group to the cluster (two nodes) up to
the maximum of four I/O Groups per cluster.
If the cluster already has four I/O Groups and the reports still indicate high CPU utilization, it
is time to build a new cluster and consider either migrating part of the storage to the new
cluster or servicing new storage requests from it.
We recommend that you plan additional resources for the cluster if CPU utilization remains
continually above 70%.
The cache memory resource reports provide an understanding of the utilization of the SVC
cache. These reports provide you with an indication of whether the cache is able to service
and buffer the current workload.
In Figure 11-11, you will notice that there is an increase in the Write-cache Delay Percentage
and Write-cache Flush Through Percentage and a drop in the Write-cache Hits Percentage,
Read Cache Hits, and Read-ahead percentage of cache hits. This change is noted about
halfway through the graph.
This change in the performance metrics, together with an increase in back-end response
time, shows that the storage controller is heavily burdened with I/O, and at this time interval,
the SVC cache is probably full of outstanding write I/Os. (We expected this result with our test
run.) Host I/O activity will now be impacted by the backlog of data in the SVC cache and by
any other SVC workload that is happening on the same MDisks (FlashCopy and Global/Metro
Mirror).
If cache utilization is a problem, you can add additional cache to the cluster by adding an I/O
Group and moving VDisks to the new I/O Group.
SVC fabric performance
The SVC fabric performance reports help you understand the SVC’s impact on the fabric and
give you an indication of the traffic between:
 The SVC and the hosts that receive storage
 The SVC and the back-end storage
 Nodes in the SVC cluster
These reports can help you understand whether the fabric is a performance bottleneck and
whether upgrading the fabric would lead to a performance improvement. Figure 11-12 is one
version of a port send and receive data rate report.
Figure 11-12 Port receive and send data rate for each I/O Group
Figure 11-12 and Figure 11-13 on page 238 show two versions of port rate reports.
Figure 11-12 shows the overall SVC node port rates for send and receive traffic. With a
2 Gb per second fabric, these rates are well below the throughput capability of this fabric, and
thus, the fabric is not a bottleneck here. Figure 11-13 on page 238 shows the port traffic
broken down into host, node, and disk traffic. During our busy time as reported in
Figure 11-11 on page 236, we can see that host port traffic drops while disk port traffic
continues. This information indicates that the SVC is communicating with the storage
controller, possibly flushing outstanding I/O write data in the cache and performing other
non-host functions, such as FlashCopy and Metro Mirror and Global Mirror copy
synchronization.
Figure 11-13 Total port to disk, host, and local node report
Figure 11-14 on page 239 shows an example TPC report looking at port rates between the
SVC nodes, hosts, and disk storage controllers. This report shows low queue and response
times, indicating that the nodes do not have a problem communicating with each other.
If this report showed unusually high queue times and high response times, our write activity
would be affected (because each node communicates with every other node over the fabric).
Unusually high numbers in this report indicate:
 SVC node or port problem (unlikely)
 Fabric switch congestion (more likely)
 Faulty fabric ports or cables (most likely)
Figure 11-14 Port to local node send and receive response and queue times
SVC storage performance
The remaining TPC reports give you a high level understanding of the SVC’s interaction with
hosts and back-end storage. Most reports provide both an I/O rate report (measured in IOPS)
and a data rate report (measured in MBps).
The particularly interesting areas of these reports include the back-end read and write rates
and the back-end read and write response times, which are shown in Figure 11-15.
Figure 11-15 Back-end read and write response times
In Figure 11-15 on page 239, we see an unusual spike in back-end response time for both
read and write operations, and this spike is consistent for both of our I/O Groups. This report
confirms that we are receiving poor response from our storage controller and explains our
lower than expected host performance.
Our cache resource reports (in Figure 11-11 on page 236) also show an unusual pattern in
cache usage during the same time interval. Thus, we can attribute the cache behavior to the
poor back-end response time that the SVC is receiving from the storage controller. The
cause of this poor response time must be investigated using all available
information from the SVC and the back-end storage controller. Possible causes, which might
be visible in the storage controller management tool, include:
 Physical drive failure can lead to an array rebuild, which drives internal read/write
workload in the controller while the rebuild is in progress. If this array rebuild is causing
poor latency, it might be desirable to adjust the array rebuild priority to lessen the load.
However, this priority must be balanced with the increased risk of a second drive failure
during the rebuild, which can cause data loss in a Redundant Array of Independent Disks
5 (RAID 5) array.
 Cache battery failure can lead to cache being disabled by the controller, which can usually
be resolved simply by replacing the failed battery.
Summary of the available cluster reports in TPC 3.3
These are the types of data available in reports on the cluster, I/O Groups, and nodes:
 Overall Data Rates and I/O Rates
 Backend I/O Rates and Data Rates
 Response Time and Backend Response Time
 Transfer Size and Backend Transfer Size
 Disk to Cache Transfer Rate
 Queue Time
 Overall Cache Hit Rates and Write Cache Delay
 Readahead and Dirty Write cache
 Write cache overflow, flush-through, and write-through
 Port Data Rates and I/O Rates
 CPU Utilization
 Data Rates, I/O Rates, Response Time, and Queue Time for:
– Port to Host
– Port to Disk
– Port to Local Node
– Port to Remote Node
 Global Mirror Rates
 Peak Read and Write Rates
11.3.4 Managed Disk Group, Managed Disk, and Volume reports
The Managed Disk Group, Managed Disk, and Volume reports enable you to report on the
performance of storage both from the back end and from the front end. Note that “Volumes” in
TPC correspond to VDisks when monitoring an SVC.
By including a VDisk on a report, together with the LUNs from the storage controllers (which
in turn are the MDisks over which the VDisks can be striped), you can see the performance
that a host is receiving (through the VDisks) together with the impact on the storage controller
(through the LUNs).
Figure 11-16 shows a VDisk named IOTEST and the associated LUNs from our DS4000
storage controller. We can see which of the LUNs are being used while IOTEST is being used.
Figure 11-16 Viewing VDisk and LUN performance
11.3.5 Using TPC to alert on performance constraints
Along with reporting on SVC performance, TPC can also generate alerts when performance
has not met or has exceeded a defined threshold.
Like most TPC tasks, the alerting can report to:
 Simple Network Management Protocol (SNMP), which enables you to send a trap to an
upstream systems management application. The SNMP trap can then be correlated with
other events occurring within the environment to help determine the root cause of the
condition that generated it.
For example, if the SVC reported to TPC that a Fibre Channel port went offline, it might
result from a switch failure. This “port failed” trap, together with the “switch offline”
trap, can be analyzed by a systems management tool, which discovers that this is a switch
problem and not an SVC problem, and calls the switch technician.
 TEC Event. You can select to send a Tivoli Enterprise Console® (TEC) event.
 Login Notification. You can select to send the alert to a TotalStorage Productivity Center
user. The user receives the alert upon logging in to TotalStorage Productivity Center. In
the Login ID field, type the user ID.
 UNIX or Windows NT® Server system event logger.
 Script. The script option enables you to run a defined set of commands that might help
address the event, for example, automatically opening a trouble ticket in your help desk
system (a minimal sketch of such a script follows this list).
 Notification by e-mail. TPC will send an e-mail to each person listed.
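The following is a minimal sketch of such an alert script. How TPC passes the alert details to
the script depends on the alert definition, so the arguments, log file, and e-mail address
shown here are placeholders only:
#!/bin/sh
# Hypothetical TPC alert script: record the alert details that are passed to the
# script and notify the help desk by e-mail.
LOG=/var/log/tpc_alerts.log
echo "`date`: TPC alert: $*" >> $LOG
echo "TPC alert: $*" | mail -s "TPC performance alert" helpdesk@example.com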
Useful performance alerts
While you can use performance alerts to monitor any value reported by TPC, certain alerts
will be more useful when identifying serious problems. These alerts include:
 Node CPU utilization threshold
The CPU utilization report alerts you when your SVC nodes become too busy. CPU
utilization depends on the amount of host I/O, as well as the extent to which advanced
copy services are being used. If this statistic increases beyond 70%, you might want to
think about increasing the size of your cluster or adding a new cluster.
 Overall port response time threshold
The port response time alert can let you know when the SAN fabric is becoming a
bottleneck. If the response times are consistently poor, perform additional analysis of your
SAN fabric.
 Overall back-end response time threshold
An increase in back-end response time might indicate that you are overloading your
back-end storage. The exact value at which to set the alert depends on what kind of
storage controller you are using, the RAID configuration, and the typical I/O workload. A
high-end controller, such as a DS8000, might be expected to have a lower typical latency
than a DS4500. RAID 1 typically is faster than RAID 5. To evaluate the normal working
range of your back-end storage, use TPC to collect data for a period of typical workload.
After you have established the normal working range of your controller, create a
performance alert for back-end response time. You might want to set more than one alert
level. For example, response time of more than 100 ms nearly always indicates that
back-end storage is being overloaded, so 100 ms might be a suitable high importance
alert level. You might set a low importance alert for a lower value, such as 20% over the
typical response time.
11.3.6 Monitoring MDisk performance for mirrored VDisks
The new VDisk Mirroring function in SVC 4.3 allows you to mirror VDisks between different
MDisk groups to improve availability. However, it is important to note that write performance
of the VDisk will depend on the worst performing MDisk group. Reads are always performed
from the primary VDisk Copy, if it is available. Writes remain in the SVC cache until both
MDisks have completed the I/O. Therefore, if one group performs significantly worse, this
problem will reduce the write performance of the VDisk as a whole.
You can use TPC to ensure that the performance of the groups is comparable:
 Report on back-end disk performance by selecting Disk Manager → Storage
Subsystem Performance → By managed disk
 Include Write Data Rate
 Choose Selection and check only the MDisks that are members of the groups being used
for mirrored VDisks.
The graph from this report will show whether one MDisk group performs significantly worse
than the other MDisk group. If there is a gap between the two MDisk groups, consider taking
steps to avoid adverse performance impact, which might include:
 Migrating other, non-mirrored MDisks from the poorly performing MDisk group to allow
more bandwidth for the mirrored VDisk’s I/O
 Migrating one of the mirrored VDisk’s copies to another MDisk group with spare
performance capacity
 Accepting the current performance if the slower of the two MDisk groups is still reasonable
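You can also check from the SVC command line which MDisk group holds each copy of a
mirrored VDisk and which copy is the primary. This minimal sketch reuses the VDisk name
from the report example earlier in this chapter; the lsvdiskcopy command was introduced
with VDisk Mirroring in SVC 4.3, and the -copy parameter for the detailed view is an
assumption to verify against the CLI guide for your level:
IBM_2145:itsosvccl1:admin>svcinfo lsvdiskcopy IOTEST
IBM_2145:itsosvccl1:admin>svcinfo lsvdiskcopy -copy 0 IOTEST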
11.4 Monitoring the SVC error log with e-mail notifications
In a SAN environment, it is important to ensure that events, such as hardware failures, are
recognized promptly and that corrective action is taken. Redundancy in SAN design allows
hosts to continue performing I/O even when these failures occur; however, there are two
reasons to fix problems promptly:
 While operating in a degraded state, the performance of key components might be lower.
For example, if a storage controller port fails, the remaining ports might not be able to
cope with the I/O bandwidth from hosts or SVC.
 The longer a SAN runs with a failed component, the higher the likelihood that a second
component will fail, risking a loss of access.
The SVC error log provides information about errors within the SVC, as well as problems with
attached devices, such as hosts, switches, and back-end storage. If you make good use of
this information, having an SVC in the SAN can make it easier to diagnose problems and
restore the SAN to a healthy state.
There are three ways to access the SVC error log:
 You can view the error log directly using the SVC Console GUI, which allows searching
the error log for particular problems or viewing the whole log to gain an overview of what
has happened. However, the administrator must consciously decide to check the error log.
 Simple Network Management Protocol (SNMP) allows continuous monitoring of events as
they occur. When an error is logged, the SVC sends an SNMP trap through Ethernet to a
monitoring service running on a server. Different responses can be set up for different
error classes (or severities), for example, a warning about error recovery activity on
back-end storage might simply be logged, while an MDisk Group going offline might
trigger an e-mail to the administrator.
 SVC 4.2.0.3 and higher are capable of sending e-mails directly to a standard Simple Mail
Transfer Protocol (SMTP) mail server, which means that a separate SNMP server is no
longer required. The existing e-mail infrastructure at a site can be used instead, which is
often preferable.
Best practice: You must configure SNMP or e-mail notification and test the configuration
when the cluster is created, which will make it easier to detect and resolve SAN problems
as the SVC environment grows.
11.4.1 Verifying a correct SVC e-mail configuration
After the e-mail settings have been configured on the SVC, it is important to make sure that
e-mail can be successfully sent. The svctask testemail command allows you to test sending
the e-mail. If the command completes without error, and the test e-mail arrives safely in the
administrator’s incoming e-mail, you can be confident that error notifications will be received.
If not, you must investigate where the problem lies.
The testemail output in Example 11-1 shows an example of a failed e-mail test. In this case,
the test failed because the specified IP address did not exist. The part of the lscluster output
that is not related to e-mail has been removed for clarity.
Example 11-1 Sending a test e-mail
IBM_2145:itsosvccl1:admin>svcinfo lscluster itsosvccl1
email_server 9.43.86.82
email_server_port 25
email_reply noone@uk.ibm.com
IBM_2145:itsosvccl1:admin>svcinfo lsemailuser
id   name          address            err_type   user_type   inventory
0    admin_email   noone@uk.ibm.com   all        local       off
IBM_2145:itsosvccl1:admin>svctask testemail admin_email
CMMVC6280E Sendmail error EX_TEMPFAIL. The sendmail command could not create a
connection to a remote system.
Possible causes include:
 Ethernet connectivity issues between the SVC cluster and the mail server. For example,
the SVC might be behind a firewall protecting the data center network, or even on a
separate network segment that has no access to the mail server. As with the Master
Console or System Storage Productivity Center (SSPC), the mail server must be
accessible by the SVC. SMTP uses TCP port 25 (unless you have configured an
alternative port); if there is a firewall, enable this port outbound from the SVC.
 Mail server relay blocking. Many administrators implement filtering rules to prevent spam,
which is particularly likely if you are sending e-mail to a user who is on a different mail
server or outside of the mail server’s own network. On certain platforms, the default
configuration prevents mail forwarding to any other machine. You must check the mail
server log to see whether it is rejecting mail from the SVC. If it is, the mail server
administrator must adjust the configuration to allow the forwarding of these e-mails.
 An invalid “FROM” address. Certain mail servers will reject e-mail if no valid “FROM”
address is included. The SVC takes this FROM address from the email_reply field of
lscluster. Therefore, make sure that a valid reply-to address is specified when setting up
e-mail. You can change the reply-to address by using the svctask chemail -reply command.
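For example, the following minimal sketch corrects the reply-to address and repeats the test
(the address is illustrative, and other chemail parameters exist for the mail server settings, so
check the CLI guide for your SVC level):
IBM_2145:itsosvccl1:admin>svctask chemail -reply storage.admin@uk.ibm.com
IBM_2145:itsosvccl1:admin>svctask testemail admin_email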
If you cannot find the cause of the e-mail failure, contact your IBM service support
representative (IBM SSR).
Chapter 12. Maintenance
As with any piece of enterprise storage equipment, the IBM SAN Volume Controller (SVC) is
not a completely “hands-off” device. It requires configuration changes to meet growing needs,
updates to software for enhanced performance, features, and reliability, and the tracking of all
the data that you used to configure your SVC.
12.1 Configuration and change tracking
The IBM SAN Volume Controller provides great flexibility to your storage configuration that
you do not otherwise have. However, with the flexibility comes an added layer of
configuration that is not present in a “normal” SAN. However, your total administrative burden
often decreases, because extremely few changes are necessary on your disk arrays when
the SVC manages them.
There are many tools and techniques that you can use to prevent your SVC installation from
spiralling out of control. What is most important is what information you track, not how you
track it. For smaller installations, everything can be tracked on simple spreadsheets. In
environments with several clusters, hundreds of hosts, and a whole team of administrators,
more automated solutions, such as TotalStorage Productivity Center or custom databases,
might be required.
We do not discuss how to track your changes, because there are far too many tools and
methods available to describe here. Rather, we discuss what sort of information is extremely
useful to track. You need to decide which method is best for you.
Note: Do not store all change tracking and SAN, SVC, and storage inventory information
on the SAN itself.
In theory, your documentation must be sufficient for any engineer who is skilled with the
products that you own to take a copy of all of your configuration information and use it to
create a functionally equivalent copy of the environment from nothing. If your documentation
does not allow you to achieve this goal, you are not tracking enough information.
It is a best practice to create this documentation as you install your solution. Putting this
information together after deployment is likely to be a tedious, boring, and error-prone task.
In the following sections, we provide what we think is the minimum documentation needed for
an SVC solution. Do not view it as an exhaustive list; you might have additional business
requirements that require other data to be tracked.
12.1.1 SAN
Tracking how your SAN is configured is extremely important.
SAN diagram
The most basic piece of SAN documentation is the SAN diagram. If you ever call IBM Support
asking for help with your SAN, you can be sure that the SAN diagram is likely to be one of the
first things that you are asked to produce.
Maintaining a proper SAN diagram is not as difficult as it sounds. It is not necessary for the
diagram to show every last host and the location of every last port; this information is more
properly collected (and easier to read) in other places. To understand how difficult an overly
detailed diagram is to read, refer to Figure 12-1 on page 247.
Figure 12-1 An overly detailed SAN diagram
Instead, a SAN diagram only needs to include every switch, every storage device, all
inter-switch links (ISLs) and how many of them there are, and a representation of which switches have
hosts connected to them. An example is shown in Figure 12-2 on page 248. In larger SANs
with many storage devices, the diagram can still be too large to print without a large-format
printer, but it can still be viewed on a panel using the zoom feature. We suggest a tool, such
as Microsoft Visio®, to create your diagrams. Do not worry about finding fancy stencils or
official shapes, because your diagram does not need to show exactly into which port
everything is plugged. You can use your port inventory for that. Your diagram can be
appropriately simple. You will notice that our sample diagram just uses simple geometric
shapes and “standard” stencils to represent a SAN.
Note: These SAN diagrams are just sample diagrams. They do not necessarily depict a
SAN that you actually want to deploy.
Figure 12-2 A more useful diagram of a SAN
Port inventory
Along with the SAN diagram, an inventory of “what is supposed to be plugged in where” is
also quite important. Again, you can create this inventory manually or generate it with
automated tools. Before using automated tools, remember that it is important that your
inventory contains not just what is currently plugged into the SAN, but also what is supposed
to be attached to the SAN. If a server has lost its SAN connection, merely looking at the
current status of the SAN will not tell you where it was supposed to be attached.
This inventory must exist in a format that can be exported and sent to someone else and
retained in an archive for long-term tracking.
The list, spreadsheet, database, or automated tool needs to contain the following information
for each port in the SAN:
 The name of the attached device and whether it is a storage device, host, or another
switch
 The port on the device to which the switch port is attached, for example, Host Slot 6 for a
host connection or Switch Port 126 for an ISL
 The speed of the port
 If the port is not an ISL, list the attached worldwide port name (WWPN)
 For host ports or SVC ports, the destination aliases to which the host is zoned
Automated tools, obviously, can do a decent job of keeping this inventory up-to-date, but
even with a fairly large SAN, a simple database, combined with standard operating
procedures, can be equally effective. For smaller SANs, spreadsheets are a time-honored
and simple method of record keeping.
Zoning
While you need snapshots of your zoning configuration, you do not really need a separate
spreadsheet or database just to keep track of your zones. If you lose your zoning
configuration, you can rebuild the SVC parts from your zoning snapshot, and the host zones
can be rebuilt from your port inventory.
12.1.2 SVC
For the SVC, there are several important components that you need to document.
Managed disks (MDisks) and Managed Disk Groups (MDGs)
Records for each MDG need to contain the following information:
 Name
 The total capacity of the MDG
 Approximate remaining capacity
 Type (image or managed)
 For each MDisk in the group:
– The physical location of each logical unit number (LUN) (that is, rank, loop pair, or
controller blade)
– Redundant Array of Independent Disks (RAID) level
– Capacity
– Number of disks
– Disk types (for example, 15k or 4 Gb)
Virtual disks (VDisks)
The VDisk list needs to contain the following information for every VDisk in the SAN:
 Name
 Owning host
 Capacity
 MDG
 Type of I/O (sequential, random, or mixed)
 Striped or sequential
 Type (image or managed)
12.1.3 Storage
Actually, for the LUNs themselves, you do not need to track anything outside of what is
already in your configuration documentation for the MDisks, unless the disk array is also used
for direct-attached hosts.
12.1.4 General inventory
Generally separate from your spreadsheets or databases that describe the configurations of
the components, you also need a general inventory of your equipment. This inventory can
include information, such as:
 The physical serial number of the hardware
 Support phone numbers
 Support contract numbers
 Warranty end dates
 Current running code level
 Date that the code was last checked for updates
12.1.5 Change tickets and tracking
If you have ever called support (for any vendor) for assistance with a complicated problem,
you will have been asked whether you changed anything recently. Being able to produce a
record of what was changed, if anything, is the key that leads to a swift resolution of a large
number of problems. While you might not have done anything wrong, knowing what was
changed can help the support person find the action that eventually caused the problem.
As mentioned at the beginning of this section, in theory, the record of your changes must
have sufficient detail that you can take all the change documentation and create a functionally
equivalent copy of the environment from the beginning.
The most common way that changes are actually performed in the field is that the changes
are made and then any documentation is written afterward. As in the field of computer
programming, this method often leads to incomplete or useless documentation; a
self-documenting SAN is just as much of a fallacy as self-documenting code. Instead, write
the documentation first and make it detailed enough that you have a “self-configuring”
environment. A “self-configuring” environment means that if your documentation is detailed
enough, the actual act of sitting down at the configuration consoles to execute changes
becomes an almost trivial process that does not involve any actual decision-making. This
method is actually not as difficult as it sounds when you combine it with the checklists that we
explain and demonstrate in 12.2, “Standard operating procedures” on page 251.
12.1.6 Configuration archiving
There must be at least occasional historical snapshots of your SAN and SVC configuration,
so that if there are issues, these devices can be rolled back to their previous configuration.
Historical snapshots can also be useful in measuring the performance impact of changes. In
any case, because modern storage is relatively inexpensive, just a couple of GBs can hold a
couple of years of complete configuration snapshots, even if you pull them before and after
every single SAN change.
These snapshots can include:
 The supportShow output from Brocade switches
 The show tech-support details from Cisco switches
 Data collections from Enterprise Fabric Connectivity Manager (EFCM)-equipped McDATA
switches
 SVC configuration (Config) dumps
 DS4x00 subsystem profiles
 DS8x00 LUN inventory commands:
– lsfbvol
– lshostconnect
– lsarray
– lsrank
– lsioports
– lsvolgrp
Obviously, you do not need to pull DS4x00 profiles if the only thing you are modifying is SAN
zoning.
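For the SVC configuration dump itself, the following minimal sketch pulls a snapshot from the
cluster over SSH. The cluster name matches the CLI examples in this book, but the backup
file name and its location in /tmp can vary by SVC level, so verify them against the CLI guide
before building this into a procedure:
# Create a configuration backup on the configuration node, then copy it off the
# cluster and store it with the change record (not on SAN-attached storage).
ssh admin@itsosvccl1 svcconfig backup
scp admin@itsosvccl1:/tmp/svc.config.backup.xml ./itsosvccl1_config_$(date +%Y%m%d).xml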
12.2 Standard operating procedures
The phrase “standard operating procedure” (SOP) often brings to mind thick binders filled
with useless, mind-numbing processes that nobody reads or uses in their daily job. It does not
have to be this way, even for a relatively complicated environment.
For all of the common changes that you make to your environment, there must be procedures
written that ensure that changes are made in a consistent fashion and also ensure that the
changes are documented properly. If the same task is done in different ways, it can make
things confusing quite quickly, especially if you have multiple staff responsible for storage
administration. These procedures might be created for tasks, such as adding a new host to
the SAN/SVC, allocating new storage, performing disk migrations, configuring new Copy
Services relationships, and so forth.
One way to implement useful procedures is to integrate them with checklists that can then
serve as change tracking records. Next, we describe one example of a combined checklist
and SOP document for adding a new server to the SAN and allocating storage to it on an
SVC.
In Example 12-1, our procedures have all of the variables set off in __Double Underscores__.
The example guidance about which decisions to make is shown in italics.
Note: Do not actually use this procedure exactly as described. It is almost certainly
missing information vital to the proper operation of your environment. Use it instead as a
general guide as to what a SOP can look like.
Example 12-1 Host addition standard operating procedure, checklist, and change record
Abstract: Request__ABC456__ : Add new server __XYZ123__ to the SAN and allocate
__200GB__ from SVC Cluster __1__
Date of Implementation: __08/01/2008__
Implementing Storage Administrator: Katja Gebuhr(x1234)
Server Administrator: Jon Tate (x5678)
Impact: None. This is a non-disruptive change.
Risk: Low.
Time estimate: __30 minutes__
Backout Plan: Reverse changes
Implementation Checklist:
1. ___ Verify (via phone or e-mail) that the server administrator has installed
all code levels listed on the intranet site
http://w3.itsoelectronics.com/storage_server_code.html
2. ___ Verify that the cabling change request, __CAB927__ has been completed.
3. ___ For each HBA in the server, update the switch configuration spreadsheet
with the new server using the information below.
To decide on which SVC cluster to use: All new servers must be allocated to SVC
cluster 2, unless otherwise indicated by the Storage Architect.
To decide which I/O Group to Use: These must roughly be evenly distributed. Note:
If this is a high-bandwidth host, the Storage Architect might give a specific I/O
Group assignment, which should be noted in the abstract.
To select which Node Ports to Use: If the last digit of the first WWPN is odd (in
hexadecimal, B, D, and F are also odd), use ports 1 and 3; if even, 2 and 4.
HBA A:
Switch: __McD_1__
Port: __47__
WWPN: __00:11:22:33:44:55:66:77__
Port Name:__XYZ123_A__
Host Slot/Port: __5__
Targets: __SVC 1, IOGroup 2, Node Ports 1__
HBA B:
Switch: __McD_2__
Port: __47__
WWPN: __00:11:22:33:44:55:66:88__
Port Name:__XYZ123_B__
Host Slot/Port: __6__
Targets: __SVC 1, IOGroup 2, Node Ports 4__
4. ___ Log in to EFCM and modify the Nicknames for the new ports (using the
information above).
5. ___ Collect Data Collections from both switches and attach them to this ticket
with the filenames of <ticket_number>_<switch name>_old.zip
6. ___ Add new zones to the zoning configuration using the standard naming
convention and the information above.
7. ___ Collect Data Collections from both switches again and attach them with the
filenames of <ticket_number>_<switch name>_new.zip
8. Log on to the SVC Console for Cluster __2__ and:
___ Obtain a config dump and attach it to this ticket under the filename
<ticket_number>_<cluster_name>_old.zip
___ Add the new host definition to the SVC using the information above and setting
the host type to __Generic__ Do not type in the WWPN. If it does not appear in
the drop-down list, cancel the operation and retry. If it still does not appear,
check zoning and perform other troubleshooting as necessary.
___ Create new VDisk(s) with the following parameters:
To decide on the MDiskGroup: For current requests (as of 8/1/08) use ESS4_Group_5,
assuming that it has sufficient free space. If it does not have sufficient free
space, inform the storage architect prior to submitting this change ticket and
request an update to these procedures.
Use Striped (instead of Sequential) VDisks for all requests, unless otherwise
noted in the abstract.
Name: __XYZ123_1__
Size: __200GB__
IO Group: __2__
MDisk Group: __ESS 4_Group_5__
Mode: __Striped__
9. ___ Map the new VDisk to the Host
10.___ Obtain a config dump and attach it to this ticket under
<ticket_number>_<cluster_name>_new.zip
11.___ Update the SVC Configuration spreadsheet using the above information, and
the following supplemental data:
Request: __ABC456__Project: __Foo__
12.Also update the entry for the remaining free space in the MDiskGroup with the
information pulled from the SVC console.
13.___ Call the Server Administrator listed in the ticket header and request storage
discovery. Ask them to obtain a path count to the new disk(s). If it is not 4,
perform the necessary troubleshooting to determine why there is an incorrect
number of paths.
14.___ Request that the server administrator confirm R/W connectivity to the paths.
15.Make notes on anything unusual in the implementation here: ____
Note that the example checklist does not contain pages upon pages of screen captures or
“click Option A, select Option 7....” Instead, it assumes that the user of the checklist
understands the basic operational steps for the environment.
After the change is over, the entire checklist, along with the configuration snapshots, needs to
be stored in a safe place, not the SVC or any other SAN-attached location.
You must use detailed checklists even for non-routine changes, such as migration projects, to
help the implementation go smoothly and provide an easy-to-read record of what was done.
Writing a one-use checklist might seem horribly inefficient, but if you have to review the
process for a complex project a few weeks after implementation, you might discover that your
memory of exactly what was done is not as good as you thought. Also, complex, one-off
projects are actually more likely to have steps skipped, because they are not routine.
12.3 Code upgrades
Code upgrades in a networked environment, such as a SAN, are complex enough. Because
the SVC introduces an additional layer of code, upgrades can become a bit tricky.
12.3.1 Upgrade code levels
The SVC requires an additional layer of testing on top of the normal product testing
performed by the rest of IBM storage product development. For this reason, SVC testing of
newly available SAN code often runs several months behind other IBM products, which
makes determining the correct code level quite easy; simply refer to the “Recommended
Software Levels” and “Supported Hardware List” on the SVC support Web site under
“Plan/Upgrade.”
Do not run software levels that are higher than what is recommended on those lists if
possible. We do recognize that there can be situations where you need a particular code fix
that is only available in a level of code later than what appears on the support matrix. If that is
the case, contact your IBM marketing representative and ask for a Request for Price
Quotation (RPQ); however, this particular type of modification usually does not cost you
anything. These requests are relayed to IBM SVC Development and Test and are routinely
granted. The purpose behind this process is to ensure that SVC Test has not run into an
interoperability issue in the level of code that you want to run.
12.3.2 Upgrade frequency
Most clients perform major code upgrades every 12 - 18 months, which usually includes
upgrades across the entire infrastructure, so that all of the code levels are “in sync.”
It is common to wait three months or so after a major version is released to gauge the stability
of the code level.
get deployed until its replacement is released. For instance, they do not deploy 4.2 until either
4.3 or 5.0 ships.
12.3.3 Upgrade sequence
Unless you have another compelling reason (such as a fix that the SVC readme file says you
must install first), upgrade the Master Console and the SVC first. Backward compatibility
usually works much better than forward compatibility. Do so even if the code levels on
everything else were not tested on the latest SVC release.
The exception to this rule is if you discovered that a part of your SAN is accidentally running
ancient code, such as a server running a three year old copy of IBM Subsystem Device Driver
(SDD).
The following list shows a desirable order:
 SVC Master Console GUI
 SVC cluster code
 SAN switches
 Host systems (host bus adapter (HBA), operating system (OS) and service packs, and
multipathing driver)
 Storage controller
12.3.4 Preparing for upgrades
Before performing any SAN switch or SVC upgrade, make sure that your environment has no
outstanding problems. Prior to the upgrade, you need to:
 Check all hosts for the proper number of paths. If a host was for some reason not
communicating with one of the nodes in an I/O Group, it will experience an outage during
an SVC upgrade, because the nodes individually reset to complete the upgrade. There are
several techniques that you can use to make this process less tedious; refer to 9.7.1,
“Automated path monitoring” on page 205, and see the sketch after this list.
 Check the SVC error log for unfixed errors. Remedy all outstanding errors. (Certain clients
have been known to automatically click “this error has been fixed” just to clear out the
log, which is an extremely bad idea; you must make sure that you understand the error
before marking it as fixed.)
 Check your switch logs for issues. Pay special attention to your SVC and storage ports.
Things to look for are signal errors, such as Link Resets and cyclic redundancy check
(CRC) errors, unexplained logouts, or ports in an error state. Also, make sure that your
fabric is stable with no ISLs going up and down often.
 Examine the readme files or release notes for the code that you are preparing to upgrade.
There can be important notes about required pre-upgrade dependencies, unfixed issues,
necessary APARs, and so on. This requirement applies to all SAN-attached devices, such
as your HBAs and switches, not just the SVC.
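As referenced in the first item of this list, the following minimal sketch checks paths on a host
that runs SDD; the expected number of paths (four per VDisk in the examples in this book)
depends on your zoning:
# List every vpath device and confirm that each one reports the expected number
# of paths and that all of its paths are open.
datapath query device
# Summarize the state and path count of each adapter.
datapath query adapter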
You must also expect a write performance hit during an SVC upgrade. Because node resets
are part of the upgrade, the write cache will be disabled on the I/O Group currently being
upgraded.
12.3.5 SVC upgrade
Before applying the SVC code upgrade, review the following Web page to ensure the
compatibility between the SVC code and the SVC Console GUI. The SAN Volume Controller
and SVC Console GUI Compatibility Web site is:
http://www-1.ibm.com/support/docview.wss?rs=591&uid=ssg1S1002888
Furthermore, certain concurrent upgrade paths are only available through an intermediate
level. Refer to the SAN Volume Controller Concurrent Compatibility and Code
Cross-Reference Web page for more information:
http://www-1.ibm.com/support/docview.wss?rs=591&uid=ssg1S1001707
It is wise to schedule a low I/O activity time for the SVC code upgrade. Before making any
other changes in your SAN environment, allow the SVC code upgrade to finish. Allow at least
one hour to perform the code upgrade for a single SVC I/O Group and 30 minutes for each
additional I/O Group. In a worst case scenario, for example, when the SVC code upgrade also
upgrades the BIOS, the service processor (SP), and the SVC service card, an upgrade can
take up to two hours.
Important: If the Concurrent Code Upgrade (CCU) appears to stop for a long time (up to
an hour), this delay can occur because it is upgrading a low level BIOS. Never power off
during a CCU upgrade unless you have been instructed to do so by IBM service personnel.
If the upgrade does encounter a problem and fails, it will back out the upgrade itself.
New features are not available until all nodes in the cluster are at the same level. Features
that depend on a remote cluster (Metro Mirror or Global Mirror) might not be available until the
remote cluster is at the same level, too.
Upgrade the SVC cluster in a Metro or Global Mirror cluster relationship
When upgrading the SVC cluster software where the cluster participates in an intercluster
relationship, make sure to only upgrade one cluster at a time. Do not attempt to upgrade both
SVC clusters concurrently. This action is not policed by the software upgrade process. Allow
the software upgrade to complete on one cluster before you start the upgrade on the other
cluster.
If both clusters are upgraded concurrently, it might lead to a loss of synchronization. In stress
situations, it might lead to a loss of availability.
12.3.6 Host code upgrades
Making sure that hosts run correctly and with the current HBA drivers, multipath drivers, and
HBA firmware is a chronic problem for a lot of storage administrators. In most IT
environments, server administration is separate from storage administration, which makes
enforcement of proper code levels extremely difficult.
One thing often not realized by server administrators is that proper SAN code levels are just
as important to the proper operation of the server as the latest security patches or OS
updates. There is no reason not to install updates to storage-related code on the same
schedule as the rest of the OS.
The ideal solution to this problem is software inventory tools that are accessible to both
administration staffs. These tools can be “homegrown” or are available from many vendors,
including IBM.
If automatic inventory tools are not available, an alternative approach is to have an intranet
site, which is maintained by the storage staff, that details the code levels that server
administrators need to be running. This effort will likely be more successful if it is integrated
into a larger intranet site detailing required code levels and patches for everything else.
12.3.7 Storage controller upgrades
If you have to take a controller completely offline for disruptive maintenance, SVC code
Version 4.3.0 allows you to use the VDisk mirroring feature to prepare for this event. You add
a copy of each affected VDisk in a different MDisk group, one that is built from LUNs on a
different storage controller. This feature allows you to take the controller offline to fix a
problem or perform an upgrade. When the maintenance is done, you can bring the controller
back online and resynchronize the data that changed while it was offline.
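As an illustration only, the following sketch adds a second copy of a VDisk in an MDisk group on another controller before the maintenance and removes it afterward. The names vdisk7 and MDG_CTRL2_R5 and the copy ID 1 are examples; substitute your own objects and verify the syntax against your SVC code level:

svctask addvdiskcopy -mdiskgrp MDG_CTRL2_R5 vdisk7
svcinfo lsvdiskcopy vdisk7          (wait until the new copy reports an in-sync status)
svctask rmvdiskcopy -copy 1 vdisk7  (optional, after the maintenance is complete)

Waiting for the copy to synchronize before taking the original controller offline is the critical step; otherwise, the surviving copy is not yet usable.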
12.4 SAN hardware changes
Part of SAN/SVC maintenance sometimes involves upgrading or replacing equipment, which
might require extensive preparation before performing the change.
12.4.1 Cross-referencing the SDD adapter number with the WWPN
It is extremely common in SAN maintenance operations to gracefully take affected adapters
or paths offline before performing actions that will take them down in an abrupt manner. This
method allows the multipathing software to complete any outstanding commands using that
path before it disappears. If you choose to gracefully take affected adapters or paths offline
first, it is extremely important that you verify which adapter you will be working on before
running any commands to take the adapter offline.
One common misconception is that the adapter IDs in SDD correspond to the slot number,
the FCS/FSCSI number, or any other ID assigned elsewhere; they do not. Instead, you need
to run several commands to associate the WWPN of the adapter, which you can obtain from
your SAN records, with the SDD adapter ID that is affected by the switch on which you are
performing maintenance.
For example, let us suppose that we need to perform SAN maintenance with an AIX system
on the adapter with a WWPN ending in F5:B0.
The steps are:
1. Run datapath query WWPN, which will return output similar to:
[root@abc]> datapath query wwpn
Adapter Name     PortWWN
fscsi0           10000000C925F5B0
fscsi1           10000000C9266FD1
As you can see, the adapter that we want is fscsi0.
2. Next, cross-reference fscsi0 with the output of datapath query adapter:
Active Adapters :4
Adpt#  Name    State    Mode     Select      Errors  Paths  Active
    0  scsi3   NORMAL   ACTIVE   129062051        0     64       0
    1  scsi2   NORMAL   ACTIVE    88765386      303     64       0
    2  fscsi2  NORMAL   ACTIVE   407075697     5427   1024       0
    3  fscsi0  NORMAL   ACTIVE   341204788    63835    256       0
From here, we can see that fscsi0 has the adapter ID of 3 in SDD. We will use this ID when
taking the adapter offline prior to maintenance. Note how the SDD ID was 3 even though the
adapter had been assigned the device name fscsi0 by the OS.
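Continuing this example, the adapter can then be taken offline gracefully before the maintenance and brought back afterward. This is a sketch only; adapter ID 3 is the value found above, and the exact output varies by SDD level:

datapath set adapter 3 offline
datapath query adapter     (confirm that the adapter now shows an offline state)
    (perform the maintenance)
datapath set adapter 3 online

Running datapath query adapter again and confirming that the paths are back in an open state completes the operation.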
12.4.2 Changes that result in the modification of the destination FCID
There are many changes to your SAN that will result in the modification of the destination
Fibre Channel ID (FCID), which is also known as the N_Port ID. The following operating
systems have suggested procedures that you must perform before the change takes place. If
you do not perform these steps, you might have difficulty bringing the paths back online.
The changes that trigger this issue will be noted in this chapter. Note that changes in the
FCID of the host itself will not trigger this issue.
AIX
In AIX without the SDDPCM, if you do not properly manage a destination FCID change,
running cfgmgr will create brand-new hdisk devices, all of your old paths will go into a defined
state, and you will have difficulty removing them from your Object Data Manager (ODM)
database.
There are two ways of preventing this issue in AIX.
Dynamic Tracking
This is an AIX feature present in AIX 5.2 Technology Level (TL) 1 and higher. It causes AIX to
bind hdisks to the WWPN instead of the destination FCID. However, this feature is not
enabled by default, has extensive prerequisite requirements, and is disruptive to enable. For
these reasons, we do not recommend that you rely on this feature to aid in scheduled
changes. The alternate procedure is not particularly difficult, but if you are still interested in
Dynamic Tracking, refer to the IBM System Storage Multipath Subsystem Device Driver
User’s Guide, SC30-4096, for full details.
If you choose to use Dynamic Tracking, we strongly recommend that AIX is at the latest
available TL. If Dynamic Tracking is enabled, no special procedures are necessary to change
the FCID.
Manual device swaps with SDD
Use these steps to perform manual device swaps with SDD (a consolidated example follows the steps):
1. Using the procedure in 12.4.1, “Cross-referencing the SDD adapter number with the
WWPN” on page 256, obtain the SDD adapter ID.
2. Run the command datapath set adapter X offline where X is the SDD adapter ID.
3. Run the command datapath remove adapter X. Again, X is the SDD adapter ID.
4. Run rmdev -Rdl fcsY where Y is the FCS/FSCSI number. If you receive an error
message about the devices being in use, you probably took the wrong adapter offline.
5. Perform your maintenance.
6. Run cfgmgr to detect your “new” hdisk devices.
7. Run addpaths to get the “new” hdisks back into your SDD vpaths.
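As an illustration only, the complete sequence for an SDD adapter ID of 3 that corresponds to fcs0 (the FCS device behind fscsi0) might look like the following sketch. Both IDs are examples; always substitute the values from your own cross-reference:

datapath set adapter 3 offline
datapath remove adapter 3
rmdev -Rdl fcs0
    (perform the SAN maintenance here)
cfgmgr
addpaths
datapath query adapter     (verify that all paths are back in an open state)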
Device swaps with SDDPCM
With or without Dynamic Tracking, the issue of not properly managing a destination FCID
change is not a problem if you are using AIX Multipath I/O (MPIO) with the SDDPCM.
Other operating systems
Unfortunately, whether the HBA binds to the FCID is HBA driver-dependent. Consult your
HBA vendor for further details. (We were able to provide details for AIX, because there is only
one supported adapter driver.) The QLogic HBAs that are most common in Intel based
servers are not affected by this issue.
12.4.3 Switch replacement with a like switch
If you are replacing a switch with another switch of the same model, your preparation is fairly
straightforward:
1. If the current switch is still up and running, take a snapshot of its configuration.
2. Check all affected hosts to make sure that the path on which you will be relying during the
replacement is operational.
3. If there are hosts attached to the switch, gracefully take the paths offline. In SDD, the
appropriate command is datapath set adapter X offline where X is the adapter
number. While technically taking the paths offline is not necessary, it is nevertheless a
good idea. Follow the procedure in 12.4.1, “Cross-referencing the SDD adapter number
with the WWPN” on page 256 for details.
4. Power off the old switch. Note that the SVC will log all sorts of error messages when you
power off the old switch. Perform at least a spot-check of your hosts to make sure that
your access to disk still works.
5. Remove the old switch, put in the new switch, and power up the new switch; do not attach
any of the Fibre Channel ports yet.
6. If appropriate, match the code level on the new switch with the other switches in your
fabric.
7. Give the new switch the same Domain ID as the old switch. You might also want to load
the saved configuration of the old switch onto the new switch; doing so keeps the FCIDs of
the destination devices constant, which is often important, for example, for AIX hosts
using SDD behind a Cisco switch. (A Brocade-based sketch of saving and restoring the
configuration follows this list.)
8. Plug the ISLs into the new switch and make sure that the new switch merges into the
fabric successfully.
9. Attach the storage ports, making sure to use the same physical ports as the old switch.
10.Attach the SVC ports and perform appropriate maintenance procedures to bring the disk
paths back online.
11.Attach the host ports and bring their paths back online.
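For an IBM System Storage b-type/Brocade switch, steps 1 and 7 can be performed with the configuration upload and download commands. This is a sketch only; the prompts, the FTP server details, and the file names depend on your Fabric OS level and environment:

IBM_2005_B5K_1:admin> configupload      (step 1: save the old switch configuration to an FTP or SCP server)
new_switch:admin> switchdisable         (step 7: the switch must be disabled before its identity is changed)
new_switch:admin> configdownload        (load the saved configuration, including the Domain ID and zoning)
new_switch:admin> switchenable

Cisco MDS switches have equivalent configuration copy and restore facilities; consult the switch documentation for the exact procedure.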
12.4.4 Switch replacement or upgrade with a different kind of switch
The only difference from the procedure in the previous section is that you are obviously not
going to upload the configuration of the old switch into the new switch. You must still give the
new switch the same Domain ID as the old switch. Remember that the FCIDs will almost
certainly change when installing this new switch, so be sure to follow the appropriate
procedures for your operating system here.
12.4.5 HBA replacement
Replacing an HBA is a fairly trivial operation if it is done correctly with the appropriate preparation:
1. Ensure that your SAN is currently zoned by WWPN instead of worldwide node name
(WWNN). If you are using WWNN, change your zoning first.
2. If you do not have hot-swappable HBAs, power off your system, replace the HBA, power
the system back on, and skip to step 5.
3. Using the procedure in 12.4.1, “Cross-referencing the SDD adapter number with the
WWPN” on page 256, gracefully take the appropriate path offline.
4. Follow the appropriate steps for your hardware and software platform to replace the HBA
and bring it online.
5. Ensure that the new HBA is successfully logging in to the name server on the switch. If it is
not, fix this issue before continuing to the next step. (The WWPN for which you are looking
is usually on a sticker on the back of the HBA or somewhere on the HBA’s packing box.)
6. In the zoning interface for your switch, replace the WWPN of the old adapter with the
WWPN of the new adapter.
7. Replace the old WWPN with the new WWPN in the SVC host definition (see the sketch after this list).
8. Perform the device detection procedures appropriate for your OS to bring the paths back
up and verify that the paths are up with your multipathing software. (Use the command
datapath query adapter in SDD.)
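Referring to steps 7 and 8, the WWPN swap and the verification can also be done from the SVC CLI. This is a sketch only; the host name AIX_HOST1 and the WWPNs are examples:

svctask addhostport -hbawwpn 10000000C9AAAAAA AIX_HOST1    (add the WWPN of the new HBA)
svctask rmhostport -hbawwpn 10000000C925F5B0 AIX_HOST1     (remove the WWPN of the old HBA)
svcinfo lshost AIX_HOST1                                   (confirm that the new port is logged in)

On the host, datapath query adapter (or the equivalent command for your multipathing driver) then confirms that the paths are back online.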
12.5 Naming convention
Without a proper naming convention, your SAN and SVC configuration can quickly become
extremely difficult to maintain. Plan the naming convention ahead of time and document it
for your administrative staff. It is more important that your names are useful and informative
than that they are short.
12.5.1 Hosts, zones, and SVC ports
If you examine 1.3.6, “Sample standard SVC zoning configuration” on page 16, you see a
sample naming convention that you might want to use in your own environment.
12.5.2 Controllers
It is common to refer to disk controllers by part of their serial number, which helps facilitate
troubleshooting by making the cross-referencing of logs easier. If you have a unique name,
by all means, use it, but it is helpful to append the serial number to the end.
12.5.3 MDisks
Change the MDisks from their default names (mdiskX). The name must include the serial
number of the controller, the array number or name, and the volume number or name.
Unfortunately, you are limited to fifteen characters. This design builds a name similar to
23K45_A7V10 (serial 23K45, array 7, volume 10).
12.5.4 VDisks
The VDisk name must indicate which host the VDisk is intended for, along with any other
identifying information that distinguishes this VDisk from other VDisks.
12.5.5 MDGs
MDG names must indicate from which controller the group comes, the RAID level, and the
disk size and type. For example, 23K45_R1015k300 is an MDG on 23K45, RAID 10, 15k, 300
GB drives. (As with the other names on the SVC, you are limited to 15 characters).
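Renaming follows the same pattern for MDisks, VDisks, and MDGs. The following is a minimal sketch; the object names mdisk12, vdisk3, and MDiskgrp0 are examples, and the new names follow the conventions described above:

svctask chmdisk -name 23K45_A7V10 mdisk12
svctask chvdisk -name WINSRV1_data01 vdisk3
svctask chmdiskgrp -name 23K45_R1015k300 MDiskgrp0

Renaming is nondisruptive, so the objects can be renamed at any time after the initial configuration.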
Chapter 13. Cabling, power, cooling, scripting, support, and classes
In this chapter, we discuss valuable miscellaneous advice regarding the implementation of
the IBM SAN Volume Controller (SVC). This chapter includes several of the supporting
installations upon which the SVC relies, together with information about scripting the SVC.
We also include references for further information.
13.1 Cabling
None of the advice in the following section is specific to the SVC. However, because cabling
problems can produce SVC issues that will be troublesome and tedious to diagnose,
reminders about how to structure cabling might be useful.
13.1.1 General cabling advice
All cables used in a SAN environment must be high-quality cables certified for the speeds at
which they will be used. Because current SVC nodes come with shortwave small
form-factor pluggable (SFP) optical transceivers, multi-mode cabling must be used for
connecting the nodes. For most installations, multi-mode cabling translates into multi-mode
cables with a core diameter of 50 microns. When using the current SVC node maximum
speed of 4 Gbps, the cables that you use must be certified to meet the 400-M5-SN-I
cabling specification. This specification refers to 400 MBps, 50 micron core, multi-mode,
shortwave no-Open Fiber Control (OFC) laser, intermediate distance.
Note that we do not recommend recycling old 62.5 micron core cabling, which is likely to
cause problems. There are specifications for using 62.5 micron cabling, but you are greatly
limited as far as your maximum cable length, and many cables will not meet the stringent
standards required by Fibre Channel. Also, because the SVC nodes come with LC
connectors, we do not recommend that you use any conversions between LC and SC
connectors in the fiber path.
We recommend that you use factory-terminated cables from a reputable vendor. Only use
field-terminated cables for permanent cable installations, and only when they are installed by
qualified personnel with fiber splicing skills. When using field-terminated cables, ensure that a
fiber path quality test is conducted, for instance, with an Optical Time Domain Reflectometer
(OTDR). To ensure cable quality, and prepare for future link speeds, we also advise that you
get cables that meet the OM3 cable standard.
If you have a large data center, remember that at 4 Gbps, you are limited to a maximum cable
length of 150 meters (492 feet and 1.7 inches). You must set up your SAN so that all switches
are within 150 meters of the SVC nodes.
13.1.2 Long distance optical links
Certain installations will require long distance direct fiber links to connect devices in the SAN.
For such links, single-mode cables are used together with longwave optical transceivers. For
long distance links, you must always ensure quality by measurements prior to production
usage. Also, it is of paramount importance that you use the correct optical transceivers and
that your SAN switches are capable of supporting stable, error-free operations. We
recommend that you consult IBM for any planned links longer than a kilometer (0.62 miles),
which is especially important when using wavelength division multiplexing (WDM) solutions
instead of direct fiber links.
13.1.3 Labeling
All cables must be labeled at both ends with their source and destination locations. Even in
the smallest SVC installations, unlabeled cabling quickly becomes an unusable mess
when you are trying to trace problems. A small SVC installation consisting of a two-port
storage subsystem, 10 hosts, and a single SVC cluster with two nodes requires 30
fiber-optic cables to set up.
To ensure that unambiguous information can be read from the labels, we recommend that
you institute a standard labeling scheme to be used in your environment. The labels at both
cable ends must be identical. An example labeling scheme consists of three lines per label,
with the following content:
Line 1: Cable first end physical location <-> Cable second end physical location
Line 2: Cable first end device name and port number
Line 3: Cable second end device name and port number
For one of the SVC clusters that was used when writing this book, the label for both ends of
the cable connecting SVC node 1, port 1 to the SAN switch, port 2 looks like:
NSJ1R2U14 <-> NSJ1R3U16
itsosvccl1_n1 p1
IBM_2005_B5K_1 p2
In line one, “NSJ” refers to the site name, “Rn” is the rack number and “Un” is the rack unit
number. Line two has the name of the SVC cluster node 1 together with port 1, and line three
has the name of the corresponding SAN switch together with port 2.
If your cabling installation includes patch panels in the cabling path, information about these
patch panels must be included in the labeling. We recommend using a cable management
system to keep track of cabling information and routing. For small installations, you can use a
simple spreadsheet, but for large data centers, we recommend that you use one of the
customized commercial solutions that are available.
Note: We strongly recommend that you use only cable labels that are made for this
purpose, because they have a specific adhesive that works well with the cable jacket.
Otherwise, labels made for other purposes tend to lose their grip on the cable over time.
13.1.4 Cable management
With SAN switches increasing in port density, it is now theoretically possible to install more
than 1 500 ports into a single rack cabinet (this number is based on the IBM System
Storage/Cisco MDS 9513 SAN director).
We do not recommend that you install more than 1 500 ports into a single rack cabinet.
Most SAN installations are far too dynamic for this idea to ever work. If you ever have to swap
out a faulty line card/port blade, or even worse, a switch chassis, you will be presented with
an inaccessible nightmare of cables. For this reason, we strongly advise you to use proper
cable management trays and guides. As a general rule, cable management takes about as
much space as your switches take.
13.1.5 Cable routing and support
Most guides to fiber cabling specify a minimum bend radius of around 2.5 cm (approximately
1 inch). Note that this is a radius; the minimum bend diameter is twice that length.
Design your cabling plan to maintain the proper bend radius, even though we have never
actually seen a production data center that did not have at least a few cables that failed to
meet that standard. While a few noncompliant cables are not a disaster, proper
bend radius will become even more important as SAN speeds increase. You can expect well
over twice the number of physical layer issues at 4 Gbps as you might have seen in a 2 Gbps
SAN. And, 8 Gbps will have even more stringent requirements.
There are two major causes of insufficient bend radius:
 Incorrect use of server cable management arms. These hinged arms are extremely
popular in racked server designs, including the IBM design. However, you must be careful
to ensure that when these arms are slid in and out, the cables in the arm do not become
kinked.
 Insufficient cable support. You cannot rely on the strain-relief boots built into the ends of
the cable to provide support. Over time, your cables will inevitably sag if you rely on these
strain-relief boots. A common scene in many data centers is a “waterfall” of cables
hanging down from the SAN switch without any other support than the strain-relief boots.
Use loosely looped cable ties or cable straps to support the weight of your cables. And as
stated elsewhere, make sure that you install a proper cable management system.
13.1.6 Cable length
Cables must be as close as possible to exactly the required length, with little slack. Therefore,
purchase a variety of cable lengths and use the cables that will leave you the least amount of
slack.
If you do have slack in your cable, you must neatly spool up the excess into loops that are
around 20 cm (7.87 inches) in diameter and bundle them together. Try to avoid putting these
bundled loops in a great heap on the floor, or you might never be able to remove any cables
until your entire data center is destined for the scrap yard.
13.1.7 Cable installation
Before plugging in any cables, it is an extremely good idea to clean the end of the cables with
a disposable, lint-free alcohol swab, which is especially true when reusing cables. Also,
gently use canned air to blow any dust out of the optical transceivers.
13.2 Power
Because the SVC nodes can be compared to standard one unit rack servers, they have no
particularly exotic power requirements. Nevertheless, power is often a source of field issues.
13.2.1 Bundled uninterruptible power supply units
The most notable power feature of the SVC is the required uninterruptible power supply
(UPS) units.
The most important consideration with the UPS units is to make sure that they are not
cross-connected, which means that you must ensure that the serial cable and the power cable
from a specific UPS unit connect to the same SVC node.
Also, remember that the function of the UPS units is solely to provide battery power to the
SVC nodes long enough to copy the write cache from memory onto the internal disk of the
nodes. The shutdown process will begin immediately when power is lost, and the shutdown
cannot be stopped by bringing back power during the shutdown. The SVC nodes will restart
immediately when power is restored. Therefore, think of the UPS units as the equivalent of
the built-in batteries found in most storage subsystem controllers, and not as substitutes
for normal data center UPS units. If you want continuous availability, you need to
provide other sources of backup power to ensure that the power feed to your SVC cluster is
never interrupted.
13.2.2 Power switch
The UPS unit that comes bundled with an SVC node only has a single power inlet, and
therefore, it can only be connected to a single power feed. If you prefer to have each node
connected to two power feeds, there is a small power switch available from IBM. This unit
accepts two incoming power feeds and provides a single outbound feed. If one of the
incoming feeds goes down, the outbound feed is not interrupted.
Figure 13-1 Optional SVC power switch
13.2.3 Power feeds
There must be as much separation as possible between the feeds that power each node in an
SVC I/O Group. The nodes must be plugged into completely different circuits within the data
center; you do not want a single breaker tripping to cause an entire SVC I/O Group to shut
down.
13.3 Cooling
The SVC has no extraordinary cooling requirements. From the perspective of a data center
space planner, it can be compared to a pack of standard one unit rack servers. The most
important considerations are:
 The SVC nodes cool front-to-back. When installing the nodes, make sure that the node
fronts face the side where the cold air comes in.
 Fill empty spaces in your rack with filler panels to help prevent recirculating hot exhaust air
back into the air intakes. The most common filler panels do not even require screws to
mount.
 Data centers with rows of racks must be set up with “hot” and “cold” aisles. Air intakes
must face the cold aisles, and hot air is then blown into the hot aisles. You do not want the
hot air from one rack dumping into the intake of another rack.
 In a raised-floor installation, the vent tiles must only be in the cold aisles. Vent tiles in the
hot aisle can cause air recirculation problems.
 If you need to deploy fans on the floor to fix “hot spots,” you need to reevaluate your data
center cooling configuration. Fans on the floor are a poor solution that will almost certainly
lead to reduced equipment life. Instead, engage IBM, or any one of a number of
professional data center contractors, to evaluate your cooling configuration. It might be
possible to fix your cooling by reconfiguring existing airflow without having to purchase
any additional cooling units.
13.4 SVC scripting
While the SVC Console GUI is an extremely user-friendly tool, like other GUIs, it is not well
suited to performing large numbers of repetitive operations. For complex, often-repeated
operations, it is more convenient to script the SVC command line interface (CLI). The SVC
CLI can be scripted using any program that can pass text commands to the SVC cluster over
a Secure Shell (SSH) connection. With PuTTY, the component to use is plink.exe.
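The following is a minimal sketch of running a single SVC CLI command through plink.exe from a Windows command prompt or batch file. The key file icat.ppk, the user admin, and the cluster address svccluster1 are examples; substitute the SSH key and the cluster IP address or name for your own environment:

plink -i icat.ppk admin@svccluster1 "svcinfo lsvdisk -delim :"

A script simply issues a series of such calls, or passes a prepared command file, and parses the delimited output.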
Engineers in IBM have developed a scripting toolkit that is designed to help automate SVC
operations. It is Perl-based and available at no-charge from:
http://www.alphaworks.ibm.com/tech/svctools
The scripting toolkit includes a sample script that you can use to redistribute extents across
existing MDisks in the group. Refer to 5.6, “Restriping (balancing) extents across an MDG” on
page 88 for an example use of the redistribute extents script from the scripting toolkit.
Note: The scripting toolkit is made available to users through IBM’s AlphaWorks Web site.
As with all software available on AlphaWorks, it is not extensively tested and is provided on
an as-is basis. It is not supported in any formal way by IBM Product Support. Use it at your
own risk.
13.4.1 Standard changes
For organizations that are incorporating change management processes, the SVC scripting
option is well suited for facilitating the creation of standard SVC changes. Standard changes
are pre-tested changes to a production environment. By using scripted execution, you can
ensure that the execution sequence for a specific change type remains the same as what has
been tested.
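As an example of a standard change, a pre-tested script might create a VDisk and map it to a host in a fixed sequence. The following is a sketch only; the MDisk group, I/O Group, host, and VDisk names are examples:

svctask mkvdisk -mdiskgrp 23K45_R1015k300 -iogrp io_grp0 -size 50 -unit gb -name WINSRV1_data01
svctask mkvdiskhostmap -host WINSRV1 WINSRV1_data01

Because the same commands run in the same order every time, the result matches what was tested, which is the point of a standard change.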
13.5 IBM Support Notifications Service
Unless you enjoy browsing the SVC Web site on a regular basis, it is an excellent idea to sign
up for the IBM Support Notifications Service. This service will send you e-mails when
information on the SVC Support Web site changes, including notices of new code releases,
product alerts (flashes), new publications, and so on.
The IBM Support Notifications Service is available for the SVC, along with many other IBM
products, at:
http://www.ibm.com/support/subscriptions/us/
You can obtain notifications for the SVC from the “System Storage support notifications”
section of this Web site.
You need an IBM ID to subscribe. If you do not have an IBM ID, you can create one (for free)
by following a link from the sign-on page.
13.6 SVC Support Web site
The primary support portal for the IBM SVC is at:
http://www.ibm.com/storage/support/2145
This page is the primary source for new and updated information about the IBM SVC. From
here, you can obtain a variety of SVC-related information, including installation and
configuration guides, problem resolutions, and product feature presentations.
13.7 SVC-related publications and classes
There are several IBM publications and educational options available for the SVC.
13.7.1 IBM Redbooks publications
These are useful publications describing the SVC and important related topics:
 IBM System Storage SAN Volume Controller V4.3, SG24-6423-06. This book is an SVC
configuration cookbook covering many aspects of how to implement the SVC
successfully. It is in an extremely easy to read format.
 Implementing the SVC in an OEM Environment, SG24-7275. This book describes how to
integrate the SVC with several non-IBM storage systems (EMC, HP, NetApp®, and HDS),
as well as with the IBM DS4000 series. It also discusses storage migration scenarios.
 IBM TotalStorage Productivity Center V3.1: The Next Generation, SG24-7194. While this
book was written for Version 3.1, it can be applied to later TotalStorage Productivity
Center (TPC) 3.x versions. It is a cookbook style book about TPC implementation.
 TPC Version 3.3 Update Guide, SG24-7490. This book describes new features in TPC
Version 3.3.
 Implementing an IBM/Brocade SAN, SG24-6116. This book discusses many aspects to
consider when implementing a SAN that is based on IBM System Storage b-type/Brocade
products.
 Implementing an IBM/Cisco SAN, SG24-7545. This book discusses many aspects to
consider when implementing a SAN that is based on the Cisco SAN portfolio for IBM
System Storage.
 IBM System Storage/Brocade Multiprotocol Routing: An Introduction and Implementation,
SG24-7544. This book discusses many aspects regarding design and implementation of
routed SANs (MetaSANs) with the IBM System Storage b-type/Brocade portfolio. It also
describes SAN distance extension technology.
 IBM System Storage/Cisco Multiprotocol Routing: An Introduction and Implementation,
SG24-7543. This book discusses many aspects regarding design and implementation of
routed SANs with the Cisco portfolio for IBM System Storage. It also describes SAN
distance extension technology.
There are many other IBM Redbooks publications available that describe TPC, SANs, and
IBM System Storage Products, as well as many other topics. To browse all of the IBM
Redbooks publications about Storage, go to:
http://www.redbooks.ibm.com/portals/Storage
13.7.2 Courses
IBM offers several courses to help you learn how to implement the SVC:
 SAN Volume Controller (SVC) - Planning and Implementation (ID: SN821) or SAN
Volume Controller (SVC) Planning and Implementation Workshop (ID: SN830). These
courses provide a basic introduction to SVC implementation. The workshop course
includes a hands-on lab; otherwise, the course content is identical.
 IBM TotalStorage Productivity Center Implementation and Configuration (ID: SN856). This
course is extremely useful if you plan to use TPC to manage your SVC environment.
 TotalStorage Productivity Center for Replication Workshop (ID: SN880). This course
describes managing replication with TPC. The replication part of TPC is virtually a
separate product from the rest of TPC, and it is not covered in the basic implementation
and configuration course.
Chapter 14. Troubleshooting and diagnostics
The SAN Volume Controller (SVC) has proven to be a robust and reliable virtualization
engine that has demonstrated excellent availability in the field. Nevertheless, from time to
time, problems occur. In this chapter, we provide an overview of common problems that
can occur in your environment. We discuss and explain problems related to the SVC, the
Storage Area Network (SAN) environment, storage subsystems, hosts, and multipathing
drivers. Furthermore, we explain how to collect the necessary problem determination data
and how to overcome these problems.
14.1 Common problems
Today’s SANs, storage subsystems, and host systems are complicated, often consisting of
hundreds or thousands of disks, multiple redundant subsystem controllers, virtualization
engines, and different types of Storage Area Network (SAN) switches. All of these
components have to be configured, monitored, and managed properly, and in the case of an
error, the administrator will need to know what to look for and where to look.
The SVC is a great tool for isolating problems in the storage infrastructure. With functions
found in the SVC, the administrator can more easily locate any problem areas and take the
necessary steps to fix the problems. In many cases, the SVC and its service and
maintenance features will guide the administrator directly, provide help, and suggest remedial
action. Furthermore, the SVC will probe whether the problem still persists.
When experiencing problems with the SVC environment, it is important to ensure that all
components comprising the storage infrastructure are interoperable. In an SVC environment,
the SVC support matrix is the main source for this information. You can download the SVC
Version 4.3 support matrix from:
http://www-1.ibm.com/support/docview.wss?rs=591&context=STPVGU&context=STPVFV&uid=
ssg1S1003277&loc=en_US&cs=utf-8&lang=en
Although the latest SVC code level is supported to run on older HBAs, storage subsystem
drivers, and code levels, we recommend that you use the latest tested levels.
14.1.1 Host problems
From the host point of view, you can experience a variety of problems. These problems
range from performance degradation to inaccessible disks. There are a few things that you
can check from the host itself before drilling down to the SAN, SVC, and storage subsystems.
Areas to check on the host:
 Any special software that you are using
 Operating system version and maintenance/service pack level
 Multipathing type and driver level
 Host bus adapter (HBA) model, firmware, and driver level
 Fibre Channel SAN connectivity
Based on this list, the host administrator needs to check for and correct any problems.
You can obtain more information about managing hosts on the SVC in Chapter 9, “Hosts” on
page 175.
14.1.2 SVC problems
The SVC has good error logging mechanisms. It not only keeps track of its internal problems,
but it also tells the user about problems in the SAN or storage subsystem. It also helps to
isolate problems with the attached host systems. Every SVC node maintains a database of
other devices that are visible in the SAN fabrics. This database is updated as devices appear
and disappear.
Fast node reset
The SVC cluster software incorporates a fast node reset function. The intention of a fast node
reset is to avoid I/O errors and path changes from the host’s point of view if a software
problem occurs in one of the SVC nodes. The fast node reset function means that SVC
software problems can be recovered without the host experiencing an I/O error and without
requiring the multipathing driver to fail over to an alternative path. The fast node reset is done
automatically by the SVC node. This node will inform the other members of the cluster that it
is resetting.
Apart from SVC node hardware and software problems, failures in the SAN zoning
configuration are a common cause of trouble. A zoning misconfiguration might prevent the
SVC cluster from working, because the SVC cluster nodes communicate with each other
by using the Fibre Channel SAN fabrics.
You must check the following areas from the SVC perspective:
 The attached hosts
Refer to 14.1.1, “Host problems” on page 270.
 The SAN
Refer to 14.1.3, “SAN problems” on page 272.
 The attached storage subsystem
Refer to 14.1.4, “Storage subsystem problems” on page 272.
There are several SVC command line interface (CLI) commands with which you can check
the current status of the SVC and the attached storage subsystems. Before starting the
complete data collection or starting the problem isolation on the SAN or subsystem level, we
recommend that you use the following commands first and check the status from the SVC
perspective.
You can use these helpful CLI commands to check the environment from the SVC
perspective:
 svcinfo lscontroller controllerid
Check that multiple worldwide port names (WWPNs) that match the back-end storage
subsystem controller ports are available.
Check that the path_counts are evenly distributed across each storage subsystem
controller or that they are distributed correctly based on the preferred controller. Use the
path_count calculation found in 14.3.4, “Solving back-end storage problems” on page 288.
The total of all path_counts must add up to the number of managed disks (MDisks)
multiplied by the number of SVC nodes.
 svcinfo lsmdisk
Check that all MDisks are online (not degraded or offline).
 svcinfo lsmdisk mdiskid
Check several of the MDisks from each storage subsystem controller. Are they online?
And, do they all have path_count = number of nodes?
 svcinfo lsvdisk
Check that all virtual disks (VDisks) are online (not degraded or offline). If the VDisks are
degraded, are there stopped FlashCopy jobs? Restart these stopped FlashCopy jobs or
delete the mappings.
 svcinfo lshostvdiskmap
Check that all VDisks are mapped to the correct hosts. If a VDisk is not mapped correctly,
create the necessary VDisk to host mapping.
 svcinfo lsfabric
Use of the various options, such as -controller, can allow you to check different parts of
the SVC configuration to ensure that multiple paths are available from each SVC node
port to an attached host or controller. Confirm that all SVC node port WWPNs are
connected to the back-end storage consistently.
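Most of these svcinfo ls commands also accept the -filtervalue option, which makes it easier to spot problems in large configurations. This is a sketch only; verify the filter attributes that your SVC code level supports:

svcinfo lsmdisk -filtervalue status=degraded
svcinfo lsvdisk -filtervalue status=offline

An empty result from both commands is a quick indication that no MDisks are degraded and no VDisks are offline.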
14.1.3 SAN problems
Introducing the SVC into your SAN environment and using its virtualization functions are not
difficult tasks. There are basic rules to follow before you can use the SVC in your
environment. These rules are not complicated; however, you can make mistakes that lead to
accessibility problems or reduced performance. There are two types of
SAN zones needed to run the SVC in your environment: a host zone and a storage zone. In
addition, there must be an SVC zone that contains all of the SVC node ports of the SVC
cluster; this SVC zone enables intra-cluster communication.
Chapter 1, “SAN fabric” on page 1 provides you with valuable information and important
points about setting up the SVC in a SAN fabric environment.
Because the SVC is in the middle of the SAN and connects the host to the storage
subsystem, it is important to check and monitor the SAN fabrics.
14.1.4 Storage subsystem problems
Today, we have a wide variety of heterogeneous storage subsystems. All these subsystems
have different management tools, different setup strategies, and possible problem areas. All
subsystems must be correctly configured and in good working order, without open problems,
in order to support a stable environment. You need to check the following areas if you have a
problem:
 Storage subsystem configuration: Ensure that a valid configuration is applied to the
subsystem.
 Storage controller: Check the health and configurable settings on the controllers.
 Array: Check the state of the hardware, such as a disk drive module (DDM) failure or
enclosure problems.
 Storage volumes: Ensure that the Logical Unit Number (LUN) masking is correct.
 Host attachment ports: Check the status and configuration.
 Connectivity: Check the available paths (SAN environment).
 Layout and size of Redundant Array of Independent Disks (RAID) arrays and LUNs:
Performance and redundancy are important factors.
In the storage subsystem chapter, we provide you with additional information about managing
subsystems. Refer to Chapter 4, “Storage controller” on page 57.
Determining the correct number of paths to a storage subsystem
Using SVC CLI commands, it is possible to find out the total number of paths to a storage
subsystem. To determine the proper number of available paths, use the following formula:
Number of MDisks x Number of SVC nodes per Cluster = Number of paths
mdisk_link_count x Number of SVC nodes per Cluster = Sum of path_count
Example 14-1 shows how to obtain this information using the commands svcinfo
lscontroller controllerid and svcinfo lsnode.
Example 14-1 The svcinfo lscontroller 0 command
IBM_2145:itsosvccl1:admin>svcinfo lscontroller 0
id 0
controller_name controller0
WWNN 200400A0B8174431
mdisk_link_count 2
max_mdisk_link_count 4
degraded no
vendor_id IBM
product_id_low 1742-900
product_id_high
product_revision 0520
ctrl_s/n
WWPN 200400A0B8174433
path_count 4
max_path_count 12
WWPN 200500A0B8174433
path_count 4
max_path_count 8
IBM_2145:itsosvccl1:admin>svcinfo lsnode
id name  UPS_serial_number WWNN             status IO_group_id IO_group_name config_node UPS_unique_id    hardware
6  Node1 1000739007        50050768010037E5 online 0           io_grp0       no          20400001C3240007 8G4
5  Node2 1000739004        50050768010037DC online 0           io_grp0       yes         20400001C3240004 8G4
4  Node3 100068A006        5005076801001D21 online 1           io_grp1       no          2040000188440006 8F4
8  Node4 100068A008        5005076801021D22 online 1           io_grp1       no          2040000188440008 8F4
Example 14-1 shows that two MDisks are present for the storage subsystem controller with ID
0, and there are four SVC nodes in the SVC cluster, which means that in this example the
path_count is:
2 x 4 = 8
If possible, spread the paths across all storage subsystem controller ports, which is the case
for Example 14-1 (four for each WWPN).
14.2 Collecting data and isolating the problem
Data collection and problem isolation in an IT environment are sometimes difficult tasks. In
the following section, we explain the essential steps to collect debug data to find and isolate
problems in an SVC environment. Today, there are many approaches to monitoring the
complete client environment. IBM offers the IBM TotalStorage Productivity Center (TPC)
storage management software. Together with problem and performance reporting, TPC offers
a powerful alerting mechanism and an extremely powerful Topology Viewer, which enables
the user to monitor the storage infrastructure.
Refer to Chapter 11, “Monitoring” on page 221 for more information about the TPC Topology
Viewer.
14.2.1 Host data collection
Data collection methods vary by operating system. In this section, we show how to collect the
data for several major host operating systems.
As a first step, always collect the following information from the host:
 Operating system: Version and level
 Host Bus Adapter (HBA): Driver and firmware level
 Multipathing driver level
Then, collect the following operating system specific information:
 IBM AIX
Collect the AIX system error log by running snap -gfiLGc on each AIX host.
 For Microsoft Windows or Linux hosts
Use the IBM Dynamic System Analysis (DSA) tool to collect data for the host systems.
Visit the following links for information about the DSA tool:
– http://multitool.pok.ibm.com
– http://www-304.ibm.com/jct01004c/systems/support/supportsite.wss/docdisplay?b
randind=5000008&lndocid=SERV-DSA
If your server is based on non-IBM hardware, use the Microsoft problem reporting tool,
MPSRPT_SETUPPerf.EXE, found at:
http://www.microsoft.com/downloads/details.aspx?familyid=cebf3c7c-7ca5-408f-88b
7-f9c79b7306c0&displaylang=en
For Linux hosts, another option is to run the tool sysreport.
 VMware ESX Server
Run the following script on the service console:
/usr/bin/vm-support
This script collects all relevant ESX Server system and configuration information, as well
as ESX Server log files.
In most cases, it is also important to collect data from the multipathing driver used on the host
system. Again, depending on the host system, the multipathing drivers might be different.
If this is an IBM Subsystem Device Driver (SDD), SDDDSM, or SDDPCM host, use datapath
query device or pcmpath query device to check the host multipathing. Ensure that there are
paths to both the preferred and non-preferred SVC nodes. For more information, refer to
Chapter 9, “Hosts” on page 175.
Check that paths are open for both preferred paths (with select counts in high numbers) and
non-preferred paths (the * or nearly zero select counts). In Example 14-2, path 0 and path 2
are the preferred paths with a high select count. Path 1 and path 3 are the non-preferred
paths, which show an asterisk (*) and 0 select counts.
Example 14-2 Checking paths
C:\Program Files\IBM\Subsystem Device Driver>datapath query device -l
Total Devices : 1
DEV#:   0  DEVICE NAME: Disk1 Part0  TYPE: 2145  POLICY: OPTIMIZED
SERIAL: 60050768018101BF2800000000000037
LUN IDENTIFIER: 60050768018101BF2800000000000037
============================================================================
Path#             Adapter/Hard Disk  State    Mode       Select  Errors
    0   Scsi Port2 Bus0/Disk1 Part0  OPEN     NORMAL    1752399       0
    1 * Scsi Port3 Bus0/Disk1 Part0  OPEN     NORMAL          0       0
    2   Scsi Port3 Bus0/Disk1 Part0  OPEN     NORMAL    1752371       0
    3 * Scsi Port2 Bus0/Disk1 Part0  OPEN     NORMAL          0       0
Multipathing driver data (SDD)
IBM Subsystem Device Driver (SDD) has been enhanced to collect SDD trace data
periodically and to write the trace data to the system’s local hard drive. You collect the data by
running the sddgetdata command. If this command is not found, collect the following four
files, where SDD maintains its trace data:
 sdd.log
 sdd_bak.log
 sddsrv.log
 sddsrv_bak.log
These files can be found in one of the following directories:
 AIX: /var/adm/ras
 Hewlett-Packard UNIX: /var/adm
 Linux: /var/log
 Solaris: /var/adm
 Windows 2000 Server and Windows NT Server: \WINNT\system32
 Windows Server 2003: \Windows\system32
SDDPCM
SDDPCM has been enhanced to collect SDDPCM trace data periodically and to write the
trace data to the system’s local hard drive. SDDPCM maintains four files for its trace data:
 pcm.log
 pcm_bak.log
 pcmsrv.log
 pcmsrv_bak.log
Starting with SDDPCM 2.1.0.8, the relevant data for debugging problems is collected by
running sddpcmgetdata. The sddpcmgetdata script collects information that is used for
problem determination and then creates a tar file at the current directory with the current date
and time as a part of the file name, for example:
sddpcmdata_hostname_yyyymmdd_hhmmss.tar
When you report an SDDPCM problem, it is essential that you run this script and send this tar
file to IBM Support for problem determination. Refer to Example 14-3.
Example 14-3 Use of the sddpcmgetdata script (output shortened for clarity)
>sddpcmgetdata
>ls
sddpcmdata_confucius_20080814_012513.tar
If the sddpcmgetdata command is not found, collect the following files:
 pcm.log
 pcm_bak.log
 pcmsrv.log
 pcmsrv_bak.log
 The output of the pcmpath query adapter command
 The output of the pcmpath query device command
You can find these files in the /var/adm/ras directory.
SDDDSM
SDDDSM also provides the sddgetdata script to collect information to use for problem
determination. SDDGETDATA.BAT is the batch file that generates the following files:
 The sddgetdata_%host%_%date%_%time%.cab file
 SDD\SDDSrv logs
 Datapath output
 Event logs
 Cluster log
 SDD specific registry entry
 HBA information
Example 14-4 shows an example of this script.
Example 14-4 Use of the sddgetdata script for SDDDSM (output shortened for clarity)
C:\Program Files\IBM\SDDDSM>sddgetdata.bat
Collecting SDD trace Data
Collecting datapath command outputs
Collecting SDD and SDDSrv logs
Collecting Most current driver trace
Generating a CAB file for all the Logs
sdddata_DIOMEDE_20080814_42211.cab file generated
C:\Program Files\IBM\SDDDSM>dir
Volume in drive C has no label.
Volume Serial Number is 0445-53F4
Directory of C:\Program Files\IBM\SDDDSM
06/29/2008  04:22 AM           574,130 sdddata_DIOMEDE_20080814_42211.cab
Data collection script for IBM AIX
In Example 14-5, we provide a script that collects all of the necessary data for an AIX host at
one time (both operating system and multipathing data). You can execute the script in
Example 14-5 by using these steps:
1. vi /tmp/datacollect.sh
2. Cut and paste the script into the /tmp/datacollect.sh file and save the file.
3. chmod 755 /tmp/datacollect.sh
4. /tmp/datacollect.sh
Example 14-5 Data collection script
#!/bin/ksh
export PATH=/bin:/usr/bin:/sbin
echo "y" | snap -r # Clean up old snaps
snap -gGfkLN # Collect new; don't package yet
cd /tmp/ibmsupt/other # Add supporting data
cp /var/adm/ras/sdd* .
cp /var/adm/ras/pcm* .
cp /etc/vpexclude .
datapath query device > sddpath_query_device.out
datapath query essmap > sddpath_query_essmap.out
pcmpath query device > pcmpath_query_device.out
pcmpath query essmap > pcmpath_query_essmap.out
sddgetdata
sddpcmgetdata
snap -c # Package snap and other data
echo "Please rename /tmp/ibmsupt/snap.pax.Z after the"
echo "PMR number and ftp to IBM."
exit 0
14.2.2 SVC data collection
You can collect data for the SVC either by using the SVC Console GUI or by using the SVC
CLI. In the following sections, we describe how to collect SVC data using the SVC CLI, which
is the easiest method.
Data collection for SVC code Version 4.x
Because the config node is always the SVC node with which you communicate, it is essential
that you copy all the data from the other nodes to the config node. In order to copy the files,
first run the command svcinfo lsnode to determine the non-config nodes.
The output of this command is shown in Example 14-6.
Example 14-6 Determine the non-config nodes (output shortened for clarity)
IBM_2145:itsosvccl1:admin>svcinfo lsnode
id name  WWNN             status IO_group_id config_node
1  node1 50050768010037E5 online 0           no
2  node2 50050768010037DC online 0           yes
The output that is shown in Example 14-6 shows that the node with ID 2 is the config node.
So, for all nodes, except the config node, you must run the command svctask cpdumps.
There is no feedback given for this command. Example 14-7 shows the command for the
node with ID 1.
Example 14-7 Copy the dump files from the other nodes
IBM_2145:itsosvccl1:admin>svctask cpdumps -prefix /dumps 1
To collect all the files, including the config.backup file, trace file, errorlog file, and more, you
need to run the svc_snap dumpall command. This command collects all of the data, including
the dump files. To ensure that there is a current backup of the SVC cluster configuration, run
a svcconfig backup before issuing the svc_snap dumpall command. Refer to Example 14-8
for an example run.
It is sometimes better to run svc_snap without the dumpall parameter, which captures the
data collection apart from the dump files, and then request individual dumps only as needed.
Note: Dump files are extremely large. Only request them if you really need them.
Example 14-8 The svc_snap dumpall command
IBM_2145:itsosvccl1:admin>svc_snap dumpall
Collecting system information...
Copying files, please wait...
Copying files, please wait...
Dumping error log...
Waiting for file copying to complete...
Waiting for file copying to complete...
Waiting for file copying to complete...
Waiting for file copying to complete...
Creating snap package...
Snap data collected in /dumps/snap.104603.080815.160321.tgz
After the data collection with the svc_snap dumpall command is complete, you can verify that
the new snap file appears in your 2145 dumps directory using this command, svcinfo
ls2145dumps. Refer to Example 14-9 on page 279.
Example 14-9 The svcinfo ls2145dumps command (shortened for clarity)
IBM_2145:itsosvccl1:admin>svcinfo ls2145dumps
id 2145_filename
0  dump.104603.080801.161333
1  svc.config.cron.bak_node2
.
.
23 104603.trc
24 snap.104603.080815.160321.tgz
To copy the file from the SVC cluster, use secure copy (SCP). The PuTTY SCP function is
described in more detail in IBM System Storage SAN Volume Controller V4.3, SG24-6423.
Information: If there is no dump file available on the SVC cluster or for a particular SVC
node, you need to contact your next level of IBM Support. The support personnel will guide
you through the procedure to take a new dump.
14.2.3 SAN data collection
In this section, we describe capturing and collecting the switch support data. If there are
problems that cannot be fixed by a simple maintenance task, such as exchanging hardware,
IBM Support will ask you to collect the SAN data.
We list how to collect the switch support data for Brocade, McDATA, and Cisco SAN
switches.
IBM System Storage/Brocade SAN switches
For most of the current Brocade switches, you need to issue the supportSave command to
collect the support data.
Example 14-10 shows the use of the supportSave command (interactive mode) on an IBM
System Storage SAN32B-3 (type 2005-B5K) SAN switch running Fabric OS v6.1.0c.
Example 14-10 The supportSave output from IBM SAN32B-3 switch (output shortened for clarity)
IBM_2005_B5K_1:admin> supportSave
This command will collect RASLOG, TRACE, supportShow, core file, FFDC data
and other support information and then transfer them to a FTP/SCP server
or a USB device. This operation can take several minutes.
NOTE: supportSave will transfer existing trace dump file first, then
automatically generate and transfer latest one. There will be two trace dump
files transfered after this command.
OK to proceed? (yes, y, no, n): [no] y
Host IP or Host Name: 9.43.86.133
User Name: fos
Password:
Protocol (ftp or scp): ftp
Remote Directory: /
Saving support information for switch:IBM_2005_B5K_1, module:CONSOLE0...
..._files/IBM_2005_B5K_1-S0-200808132042-CONSOLE0.gz:      5.77 kB  156.68 kB/s
Saving support information for switch:IBM_2005_B5K_1, module:RASLOG...
...files/IBM_2005_B5K_1-S0-200808132042-RASLOG.ss.gz:     38.79 kB    0.99 MB/s
Saving support information for switch:IBM_2005_B5K_1, module:TRACE_OLD...
...M_2005_B5K_1-S0-200808132042-old-tracedump.dmp.gz:    239.58 kB    3.66 MB/s
Saving support information for switch:IBM_2005_B5K_1, module:TRACE_NEW...
...M_2005_B5K_1-S0-200808132042-new-tracedump.dmp.gz:      1.04 MB    1.81 MB/s
Saving support information for switch:IBM_2005_B5K_1, module:ZONE_LOG...
...les/IBM_2005_B5K_1-S0-200808132042-ZONE_LOG.ss.gz:     51.84 kB    1.65 MB/s
Saving support information for switch:IBM_2005_B5K_1, module:RCS_LOG...
..._files/IBM_2005_B5K_1-S0-200808132044-CONSOLE1.gz:      5.77 kB  175.18 kB/s
Saving support information for switch:IBM_2005_B5K_1, module:SSAVELOG...
..._files/IBM_2005_B5K_1-S0-200808132044-sslog.ss.gz:      1.87 kB   55.14 kB/s
SupportSave completed
IBM_2005_B5K_1:admin>
IBM System Storage/Cisco SAN switches
Establish a terminal connection to the switch (Telnet, SSH, or serial) and collect the output
from the following commands:
 terminal length 0
 show tech-support detail
 terminal length 24
IBM System Storage/McDATA SAN switches
Enterprise Fabric Connectivity Manager (EFCM) is the preferred way of collecting data for
McDATA switches.
For EFCM 8.7 and higher levels (without the group manager license), select the switch for
which you want to collect data, right-click it, and launch the switch local Element Manager.
Refer to Figure 14-1.
On the Element Manager panel, choose Maintenance → Data collection → Extended, and
save the compressed file on the local disk. Name the compressed file to reflect your problem
ticket number before uploading the file to IBM Support.
Figure 14-1 Data collection for McDATA using Element Manager
If you have the group manager license for EFCM, you can collect data from multiple switches
in one run. Refer to Figure 14-2 on page 281.
Figure 14-2 Selecting Group Manager from EFCM
To collect data when you are in the EFCM Group Manager, select Run Data Collection as
the Group Action (Figure 14-3). From this point, a wizard will guide you through the data
collection process. Name the generated zipped file to reflect your problem ticket number
before uploading the file to IBM Support.
Figure 14-3 Selecting the data collection action in EFCM Group Manager
14.2.4 Storage subsystem data collection
How you collect the data depends on the storage subsystem model. We only show how to
collect the support data for IBM System Storage storage subsystems.
IBM System Storage DS4000 series
With Storage Manager levels higher than 9.1, there is a feature called Collect All Support
Data. To collect the information, open the Storage Manager and select Advanced →
Troubleshooting → Collect All Support Data as shown in Figure 14-4 on page 282.
Figure 14-4 DS4000 data collection
IBM System Storage DS8000 and DS6000 series
By issuing the following series of commands, you get an overview of the current configuration
of an IBM System Storage DS8000 or DS6000:
 lssi
 lsarray -l
 lsrank
 lsvolgrp
 lsfbvol
 lsioport -l
 lshostconnect
The complete data collection is normally performed by the IBM service support representative
(IBM SSR) or the IBM Support center. The IBM product engineering (PE) package includes
all current configuration data as well as diagnostic data.
14.3 Recovering from problems
In this section, we provide guidance about how to recover from several of the more common
problems that you might encounter. We also show example problems and how to fix them. In
all cases, it is essential to read and understand the current product limitations to verify the
configuration and to determine if you need to upgrade any components or to install the latest
fixes or “patches.”
To obtain support for IBM products, visit the major IBM Support Web page on the Internet:
http://www.ibm.com/support/us/en/
From this IBM Support Web page, you can obtain various types of support by following the
links that are provided.
To review the SVC Web page for the latest flashes, the concurrent code upgrades, code
levels, and matrixes, go to:
http://www.ibm.com/storage/support/2145/
14.3.1 Solving host problems
Apart from hardware-related problems, there can be problems in areas such as the operating
system or the software used on the host. These problems are normally handled by the host
administrator or the service provider of the host system.
However, the multipathing driver installed on the host and its features can help you determine
possible problems. Example 14-11 shows two faulty paths reported by the IBM Subsystem
Device Driver (SDD) on the host, obtained by using the datapath query device -l command.
The faulty paths are the paths in the CLOSE state. Faulty paths can be caused by both
hardware and software problems.
Hardware problems, such as:
 Faulty small form-factor pluggable transceiver (SFP) in the host or the SAN switch
 Faulty fiber optic cables
 Faulty Host Bus Adapters (HBA)
Software problems, such as:
 A back-level multipathing driver
 A back-level HBA firmware
 Failures in the zoning
 The wrong host-to-VDisk mapping
Example 14-11 SDD output on a host with faulty paths
C:\Program Files\IBM\Subsystem Device Driver>datapath query device -l

Total Devices : 1

DEV#: 3  DEVICE NAME: Disk4 Part0  TYPE: 2145  POLICY: OPTIMIZED
SERIAL: 60050768018381BF2800000000000027
LUN IDENTIFIER: 60050768018381BF2800000000000027
============================================================================
Path#             Adapter/Hard Disk       State    Mode      Select   Errors
    0   Scsi Port2 Bus0/Disk4 Part0       CLOSE    OFFLINE   218297        0
    1 * Scsi Port2 Bus0/Disk4 Part0       CLOSE    OFFLINE        0        0
    2   Scsi Port3 Bus0/Disk4 Part0       OPEN     NORMAL    222394        0
    3 * Scsi Port3 Bus0/Disk4 Part0       OPEN     NORMAL         0        0
Based on our field experience, we recommend that you check the hardware first:
 Check if any connection error indicators are lit on the host or SAN switch.
 Check if all of the parts are seated correctly (cables securely plugged in the SFPs, and the
SFPs plugged all the way into the switch port sockets).
 Ensure that there are no broken fiber optic cables (if possible, swap the cables to cables
that are known to work).
After the hardware check, continue to check the software setup:
 Check that the HBA driver level and firmware level are at the recommended and
supported levels.
 Check the multipathing driver level, and make sure that it is at the recommended and
supported level.
 Check for link layer errors reported by the host or the SAN switch, which can indicate a
cabling or SFP failure.
 Verify your SAN zoning configuration.
 Check the general SAN switch status and health for all switches in the fabric.
In Example 14-12, we discovered that one of the HBAs was experiencing a link failure,
because a fiber optic cable had been bent too sharply. After we replaced the cable, the
missing paths reappeared.
Example 14-12 Output from datapath query device command after fiber optic cable change
C:\Program Files\IBM\Subsystem Device Driver>datapath query device -l

Total Devices : 1

DEV#: 3  DEVICE NAME: Disk4 Part0  TYPE: 2145  POLICY: OPTIMIZED
SERIAL: 60050768018381BF2800000000000027
LUN IDENTIFIER: 60050768018381BF2800000000000027
============================================================================
Path#             Adapter/Hard Disk       State    Mode      Select   Errors
    0   Scsi Port3 Bus0/Disk4 Part0       OPEN     NORMAL    218457        1
    1 * Scsi Port3 Bus0/Disk4 Part0       OPEN     NORMAL         0        0
    2   Scsi Port2 Bus0/Disk4 Part0       OPEN     NORMAL    222394        0
    3 * Scsi Port2 Bus0/Disk4 Part0       OPEN     NORMAL         0        0
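After a repair, you can quickly confirm on each host that no paths remain in a faulty state by filtering the multipathing driver output for path states other than OPEN (for example, CLOSE or DEAD). The following one-liner is only a sketch for a Windows host running SDD, based on the output format shown in Example 14-11 and Example 14-12:

   C:\Program Files\IBM\Subsystem Device Driver>datapath query device -l | findstr /i "CLOSE DEAD"

Any line returned by this filter identifies a path that needs further investigation.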
14.3.2 Solving SVC problems
For any problem in an environment implementing the SVC, we advise you to use the “Run
Maintenance Procedure” function in the SVC Console GUI, as shown in Figure 14-5 on
page 285, before trying to fix the problem anywhere else.
The maintenance procedure checks the error condition. If it was a temporary failure, the
procedure marks the problem as fixed; otherwise, the SVC guides you through several
verification steps to help you isolate the problem area.
The SVC error log provides you with information, such as all of the events on the SVC, all of
the error messages, and SVC warning information. Although you can mark the error as fixed
in the error log, we recommend that you always use the “Run Maintenance Procedure”
function as shown in Figure 14-5 on page 285.
The SVC error log has a feature called Sense Expert as shown in Figure 14-5 on page 285.
Figure 14-5 Error with Sense Expert available
When you click Sense Expert, the sense data is translated into data that is more clearly
explained and more easily understood, as shown in Figure 14-6 on page 286.
Figure 14-6 Sense Expert output
Another common practice is to use the SVC CLI to find problems. The following list of
commands provides you with information about the status of your environment:
 svctask detectmdisk (discovers any changes in the back-end storage configuration)
 svcinfo lscluster clustername (checks the SVC cluster status)
 svcinfo lsnode nodeid (checks the SVC nodes and port status)
 svcinfo lscontroller controllerid (checks the back-end storage status)
 svcinfo lsmdisk (provides a status of all the MDisks)
 svcinfo lsmdisk mdiskid (checks the status of a single MDisk)
 svcinfo lsmdiskgrp (provides a status of all the MDisk groups)
 svcinfo lsmdiskgrp mdiskgrpid (checks the status of a single MDisk group)
 svcinfo lsvdisk (checks if VDisks are online)
Important: Although the SVC raises error messages, most problems are not caused by
the SVC. Most problems are introduced by the storage subsystems or the SAN.
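As an illustration, the commands in this list can be combined into a quick health sweep that is run from a management workstation over SSH. This is only a sketch; it assumes that key-based SSH access to the cluster is already configured, and the cluster name is the example used throughout this book:

   ssh admin@itsosvccl1 svcinfo lscluster itsosvccl1
   ssh admin@itsosvccl1 svcinfo lsnode
   ssh admin@itsosvccl1 svcinfo lscontroller
   ssh admin@itsosvccl1 svcinfo lsmdisk -filtervalue status=offline
   ssh admin@itsosvccl1 svcinfo lsvdisk -filtervalue status=offline

The two -filtervalue queries list only the MDisks and VDisks that are not online, which makes it easy to see at a glance whether anything needs attention.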
If the problem is caused by the SVC and you are unable to fix it either with the “Run
Maintenance Procedure” function or with the error log, you need to collect the SVC debug
data as explained in 14.2.2, “SVC data collection” on page 277.
If the problem is related to anything outside of the SVC, refer to the appropriate section in this
chapter to try to find and fix the problem.
Cluster upgrade checks
There are a number of prerequisite checks to perform to confirm readiness before an SVC
cluster code load:
 Check the back-end storage configurations for SCSI ID to LUN ID mappings. Normally, a
1625 error is detected if there is a problem, but it is also worthwhile to check these mappings
manually.
Specifically, you need to make sure that the SCSI ID to LUN ID mapping is the same for each
SVC node port.
You can use these commands on the Enterprise Storage Server (ESS) to pull the data out
to check ESS mapping:
esscli list port -d “ess=<ESS name>”
esscli list hostconnection -d “ess=<ESS name>”
esscli list volumeaccess -d “ess=<ESS name>”
Then, verify that the mapping is identical.
Use the following commands for an IBM System Storage DS8000 series storage
subsystem to check the SCSI ID to LUN ID mappings:
lsioport -l
lshostconnect -l
showvolgrp -lunmap <volume group>
lsfbvol -l -vol <SVC volume groups>
LUN mapping problems are unlikely on a DS8000-based storage subsystem because of
the way that volume groups are allocated; however, it is still worthwhile to verify the
configuration just prior to upgrades.
For the IBM System Storage DS4000 series, we also recommend that you verify that each
SVC node port has an identical LUN mapping.
From the DS4000 Storage Manager, you can use the Mappings View to verify the
mapping. You can also run the data collection for the DS4000 and use the subsystem
profile to check the mapping.
 For storage subsystems from other vendors, use the corresponding steps to verify the
correct mapping.
 Check the host multipathing to ensure path redundancy.
 Use the svcinfo lsmdisk and svcinfo lscontroller commands to check the SVC cluster
to ensure the path redundancy to any back-end storage controllers (a command sketch
follows this list).
 Use the “Run Maintenance Procedure” function or “Analyze Error Log” function in the SVC
Console GUI to investigate any unfixed or uninvestigated SVC errors.
 Download and execute the SAN Volume Controller Software Upgrade Test Utility:
http://www-1.ibm.com/support/docview.wss?uid=ssg1S4000585
 Review the latest flashes, hints, and tips prior to the cluster upgrade. There will be a list of
directly applicable flashes, hints, and tips on the SVC code download page. Also, review
the latest support flashes on the SVC support page.
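For the path redundancy check mentioned in this list, a minimal sequence on the SVC cluster might look like the following sketch (the controller ID is an example; repeat the detailed query for each back-end controller):

   IBM_2145:itsosvccl1:admin>svcinfo lsmdisk -filtervalue status=degraded
   IBM_2145:itsosvccl1:admin>svcinfo lsmdisk -filtervalue status=offline
   IBM_2145:itsosvccl1:admin>svcinfo lscontroller
   IBM_2145:itsosvccl1:admin>svcinfo lscontroller 0

Both lsmdisk queries must return no MDisks, and in the detailed lscontroller view, the path_count values must be distributed across all of the controller WWPNs, as shown in Example 14-17 on page 291.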
14.3.3 Solving SAN problems
A variety of situations can cause problems in the SAN and on the SAN switches. Problems
can be related to either a hardware fault or a software problem on the switch. Hardware
defects are normally the easiest problems to find. Here is a short list of possible hardware
failures:
 Switch power, fan, or cooling units
 Application-specific integrated circuit (ASIC)
 Installed SFP modules
 Fiber optic cables
Software failures are more difficult to analyze, and in most cases, you need to collect data
and involve IBM Support. Before taking any other step, we recommend that you check the
installed code level for any known problems and check whether a newer code level is
available that resolves the problem that you are experiencing.
The most common SAN problems are usually related to zoning. For example, you choose the
wrong WWPN for a host zone: two SVC node ports need to be zoned to one HBA, with one
port from each SVC node, but Example 14-13 shows a zone in which the two zoned SVC
ports belong to the same node. The result is that the host and its multipathing driver will not
see all of the necessary paths.
Example 14-13 Wrong WWPN zoning
zone:
Senegal_Win2k3_itsosvccl1_iogrp0_Zone
50:05:07:68:01:20:37:dc
50:05:07:68:01:40:37:dc
20:00:00:e0:8b:89:cc:c2
The correct zoning must look like the zoning shown in Example 14-14.
Example 14-14 Correct WWPN zoning
zone:
Senegal_Win2k3_itsosvccl1_iogrp0_Zone
50:05:07:68:01:40:37:e5
50:05:07:68:01:40:37:dc
20:00:00:e0:8b:89:cc:c2
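Before you correct a zone like this one, you can confirm which node owns each WWPN by listing the SVC node ports. The following commands are only a sketch, and the node IDs are examples:

   IBM_2145:itsosvccl1:admin>svcinfo lsnode
   IBM_2145:itsosvccl1:admin>svcinfo lsnode 1
   IBM_2145:itsosvccl1:admin>svcinfo lsnode 2

The detailed lsnode view includes the WWPN and status of each Fibre Channel port on that node, so you can verify that the host zone contains exactly one port from each node in the I/O Group.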
The following SVC error codes are related to the SAN environment:
 Error 1060 Fibre Channel ports are not operational.
 Error 1220 A remote port is excluded.
If you are unable to fix the problem with these actions, collect the SAN switch debugging data
as described in 14.2.3, “SAN data collection” on page 279, and then contact IBM Support.
14.3.4 Solving back-end storage problems
The SVC is a useful tool for finding and analyzing back-end storage subsystem problems,
because the SVC has a monitoring and logging mechanism.
However, the SVC is not as helpful in finding problems from a host perspective, because the
SVC is a SCSI target for the host, and the SCSI protocol defines that errors are reported
through the host.
Typical problems for storage subsystem controllers include incorrect configuration, which
results in a 1625 error code. Other problems related to the storage subsystem are failures
pointing to the managed disk I/O (error code 1310), disk media (error code 1320), and error
recovery procedure (error code 1370).
However, not all of these messages have a single explicit cause. Therefore, you have to
check multiple areas, not just the storage subsystem. Next, we explain how to determine the
root cause of the problem and the order in which to check:
1. Run the maintenance procedures under SVC.
2. Check the attached storage subsystem for misconfigurations or failures.
3. Check the SAN for switch problems or zoning failures.
4. Collect all support data and involve IBM Support.
Now, we look at these steps sequentially:
1. Run the maintenance procedures under SVC.
To run the SVC Maintenance Procedures, open the SVC Console GUI. Select Service
and Maintenance → Run Maintenance Procedures. On the Maintenance Procedures
panel that appears in the right pane, click Start Analysis (Figure 14-7).
Figure 14-7 Start Analysis from the SVC Console GUI
For more information about how to use the SVC Maintenance Procedures, refer to IBM
System Storage SAN Volume Controller V4.3, SG24-6423-06, or the SVC Service Guide,
S7002158.
2. Check the attached storage subsystem for misconfigurations or failures:
a. Independent of the type of storage subsystem, the first thing for you to check is
whether there are any open problems on the system. Use the service or maintenance
features provided with the storage subsystem to fix these problems.
b. Then, check if the LUN masking is correct. When attached to the SVC, you have to
make sure that the LUN masking maps to the active zone set on the switch. Create a
similar LUN mask for each storage subsystem controller port that is zoned to the SVC.
Also, observe the SVC restrictions for back-end storage subsystems, which can be
found at:
http://www-1.ibm.com/support/docview.wss?rs=591&uid=ssg1S1003283
c. Next, we show an example of a misconfigured storage subsystem, how it appears from
the SVC’s point of view, and how to fix the problem.
By running the svcinfo lscontroller ID command, you get output similar to the output
shown in Example 14-15 on page 290. As highlighted in the example, the MDisks, and
therefore the LUNs, are not equally allocated. In our example, the LUNs provided by the
storage subsystem are visible through only one path, that is, one storage subsystem WWPN.
Example 14-15 The svcinfo lscontroller command
IBM_2145:itsosvccl1:admin>svcinfo lscontroller 0
id 0
controller_name controller0
WWNN 200400A0B8174431
mdisk_link_count 2
max_mdisk_link_count 4
degraded no
vendor_id IBM
product_id_low 1742-900
product_id_high
product_revision 0520
ctrl_s/n
WWPN 200400A0B8174433
path_count 8
max_path_count 12
WWPN 200500A0B8174433
path_count 0
max_path_count 8
This imbalance has two possible causes:
• If the back-end storage subsystem implements a preferred controller design, perhaps the
LUNs are all allocated to the same controller. This situation is likely with the IBM System
Storage DS4000 series, and you can fix it by redistributing the LUNs evenly across the
DS4000 controllers and then rediscovering the LUNs on the SVC. Because we used a
DS4500 storage subsystem (type 1742) in Example 14-15, we need to check for this
situation.
• Another possible cause is that the WWPN with the zero path count is not visible to all of
the SVC nodes, either through the SAN zoning or through the LUN masking on the
storage subsystem. Use the SVC CLI command svcinfo lsfabric 0 to confirm.
If you are unsure which of the attached MDisks has which corresponding LUN ID, use
the SVC CLI command svcinfo lsmdisk (refer to Example 14-16). This command
also shows to which storage subsystem a specific MDisk belongs (the controller ID).
Example 14-16 Determine the ID for the MDisk
IBM_2145:itsosvccl1:admin>svcinfo lsmdisk
id name   status mode    mdisk_grp_id mdisk_grp_name capacity ctrl_LUN_#       controller_name UID
0  mdisk0 online managed 0            MDG-1          600.0GB  0000000000000000 controller0     600a0b800017423300000059469cf84500000000000000000000000000000000
2  mdisk2 online managed 0            MDG-1          70.9GB   0000000000000002 controller0     600a0b800017443100000096469cf0e800000000000000000000000000000000
The problem turned out to be with the LUN allocation across the DS4500 controllers.
After fixing this allocation on the DS4500, an SVC MDisk rediscovery fixed the problem
from the SVC’s point of view. Example 14-17 on page 291 shows an equally distributed
MDisk.
Example 14-17 Equally distributed MDisk on all available paths
IBM_2145:itsosvccl1:admin>svctask detectmdisk
IBM_2145:itsosvccl1:admin>svcinfo lscontroller 0
id 0
controller_name controller0
WWNN 200400A0B8174431
mdisk_link_count 2
max_mdisk_link_count 4
degraded no
vendor_id IBM
product_id_low 1742-900
product_id_high
product_revision 0520
ctrl_s/n
WWPN 200400A0B8174433
path_count 4
max_path_count 12
WWPN 200500A0B8174433
path_count 4
max_path_count 8
d. In our example, the problem was solved by changing the LUN allocation. If step 2 did
not solve the problem, you need to continue with step 3.
3. Check the SANs for switch problems or zoning failures.
Many situations can cause problems in the SAN. Refer to 14.2.3, “SAN data collection” on
page 279 for more information.
4. Collect all support data and involve IBM Support.
Collect the support data for the involved SAN, SVC, or storage systems as described in
14.2, “Collecting data and isolating the problem” on page 274.
Common error recovery steps using the SVC CLI
In this section, we describe how to use the SVC CLI to perform common error recovery steps
for back-end SAN problems or storage problems.
The maintenance procedures perform these steps, but it is sometimes quicker to run these
commands directly via the CLI. Run these commands anytime that you have:
 Experienced a back-end storage issue (for example, error code 1370 or error code 1630)
 Performed maintenance on the back-end storage subsystems
It is especially important to run these commands when there is a back-end storage
configuration or zoning change to ensure that the SVC follows the changes.
The SVC CLI commands for common error recovery are:
 The svctask detectmdisk command (discovers the changes in the back end)
 The svcinfo lscontroller command and the svcinfo lsmdisk command (give you
overall status of all of the controllers and MDisks)
 The svcinfo lscontroller controllerid command (checks the controller that was
causing the problems and verifies that all the WWPNs are listed as you expect)
 svctask includemdisk mdiskid (for each degraded or offline MDisk)
 The svcinfo lsmdisk command (Are all MDisks online now?)
 The svcinfo lscontroller controllerid command (checks that the path_counts are
distributed somewhat evenly across the WWPNs)
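Put together, a typical recovery pass after back-end maintenance might look like the following sketch (the MDisk ID is an example; repeat the includemdisk command for each degraded or offline MDisk):

   IBM_2145:itsosvccl1:admin>svctask detectmdisk
   IBM_2145:itsosvccl1:admin>svcinfo lscontroller
   IBM_2145:itsosvccl1:admin>svcinfo lsmdisk
   IBM_2145:itsosvccl1:admin>svctask includemdisk 5
   IBM_2145:itsosvccl1:admin>svcinfo lsmdisk
   IBM_2145:itsosvccl1:admin>svcinfo lscontroller 0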
Finally, run the maintenance procedures on the SVC to fix every error.
14.4 Livedump
SVC livedump is a procedure that IBM Support might ask your clients to run for problem
investigation.
Note: Only invoke the SVC livedump procedure under the direction of IBM Support.
Sometimes, investigations require a livedump from the configuration node in the SVC cluster.
A livedump is a lightweight dump from a node, which can be taken without impacting host I/O.
The only impact is a slight reduction in system performance (due to reduced memory being
available for the I/O cache) until the dump is finished. The instructions for a livedump are:
1. Prepare the node for taking a livedump: svctask preplivedump <node id/name>
This command will reserve the necessary system resources to take a livedump. The
operation can take some time, because the node might have to flush data from the cache.
System performance might be slightly affected after running this command, because part
of the memory, which normally is available to the cache, is not available while the node is
prepared for a livedump.
After the command completes, the livedump is ready to be triggered, which you can verify
by looking at the output from svcinfo lslivedump <node id/name>.
The status must be reported as “prepared.”
2. Trigger the livedump: svctask triggerlivedump <node id/name>
This command completes as soon as the data capture is complete, but before the dump
file has been written to disk.
3. Query the status and copy the dump off when complete:
svcinfo lslivedump <nodeid/name>
The status shows “dumping” while the file is being written to disk and “inactive” after it is
completed. After the status returns to the inactive state, you can find the livedump file in
/dumps on the node with a filename of the format:
livedump.<panel_id>.<date>.<time>
You can then copy this file off the node, just as you copy a normal dump, by using the GUI
or SCP.
The dump must then be uploaded to IBM Support for analysis.
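As a summary, the whole sequence for the configuration node might look like the following sketch. The node ID is an example, and the copy step assumes that secure copy access to the cluster is configured (here, PuTTY's pscp is used as an example; the livedump file name follows the format described above):

   svctask preplivedump 1
   svcinfo lslivedump 1        (wait until the status is "prepared")
   svctask triggerlivedump 1
   svcinfo lslivedump 1        (repeat until the status returns to "inactive")
   pscp admin@<cluster IP>:/dumps/livedump.<panel_id>.<date>.<time> .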
Chapter 15. SVC 4.3 performance highlights
In this chapter, we discuss the performance improvements that have been made with the
4.3.0 release of the SAN Volume Controller (SVC) code and the advantage of upgrading to
the latest 8G4 node hardware. We also discuss how to optimize your system to gain the
maximum benefit from the improvements that are not discussed elsewhere in this book. We
look in detail at:
 Improvements between SVC 4.2 and SVC 4.3
 Benefits of the latest 8G4 nodes
 Caching and striping capabilities
 Sequential scaling of additional nodes
15.1 SVC and continual performance enhancements
Since the introduction of the SVC in May 2003, IBM has continually increased its
performance capabilities to meet increasing client demands. The SVC architecture brought
together, for the first time, the full range of capabilities needed by storage administrators to
regain control of SAN complexity, while also meeting aggressive goals for storage reliability
and performance. On 29 October 2004, SVC Release 1.2.1 increased the potential for
storage consolidation by doubling the maximum number of supported SVC nodes from four to
eight.
There is also a performance white paper available to IBM employees at this Web site:
http://tinyurl.com/2el4ar
Contact your IBM marketing representative for details about getting this white paper.
The release of Version 2 of the SVC code included performance improvements that
increased the online transaction processing (OLTP) performance. With the release of SVC
3.1, there were not only continued code improvements but also a new hardware release: the
8F2 node, with double the cache and improved processor and internal bus speeds. The 8F4
node added support for 4 Gbps SANs and a further increase in performance.
SVC 4.2 and the new 8G4 node brought a dramatic increase in performance, as
demonstrated by the results in the Storage Performance Council (SPC) benchmarks: SPC-1
and SPC-2.
The benchmark number 272,505.19 SPC-1 IOPS is the industry-leading OLTP result and the
PDF is available here:
http://www.storageperformance.org/results/b00024_IBM-SVC4.2_SPC2_executive-summary.pdf
The throughput benchmark, 7,084.44 SPC-2 MBPS, is the industry-leading throughput
benchmark, and the PDF is available here:
http://www.storageperformance.org/results/b00024_IBM-SVC4.2_SPC2_executive-summary.pdf
The performance improvement over time can be seen in Figure 15-1 on page 295 for OLTP.
Figure 15-1 SPC-1 Benchmark over time
In Figure 15-2 on page 296, we show the improvement for throughput. Because the SPC-2
benchmark was only introduced in 2006, this graph is of necessity over a shorter time span.
Figure 15-2 SPC-2 benchmark over time
15.2 SVC 4.3 code improvements
SVC code upgrades generally include a range of minor performance improvements. The
following larger changes have been made since SVC 4.2.0:
 SVC 4.2.1 improved the ability of the cache to adapt to performance differences between
back-end storage controllers. If one Managed Disk Group (MDG) has poor performance,
for example, because a cache battery has failed, the amount of write cache that it is
allowed to use will be limited, which means that VDisks hosted by other MDGs will
continue to benefit from SVC’s write caching despite the broken storage controller.
 SVC 4.3.0 tunes inter-node communication over the SAN, which can improve
performance for workloads consisting of many small I/Os.
While these changes will improve performance in certain circumstances, upgrading from
older node hardware to the latest 8G4 level will have a much greater effect. The following test
results demonstrate the kind of improvement that can be expected.
15.3 Performance increase when upgrading to 8G4 nodes
Figure 15-3 on page 297 uses a variety of workloads to examine the performance gains
achieved by upgrading the software on an 8F4 node to SVC 4.2. These gains are compared
with those gains that result from a complete hardware and software replacement based upon
8G4 node technology.
Figure 15-3 Comparison of a software only upgrade to a full upgrade of an 8F4 node (variety of
workloads, I/O rate times 1000)
As you can see in Figure 15-3, significant gains can be achieved with the software-only
upgrade. The 70/30 miss workload, consisting of 70 percent read misses and 30 percent write
misses, is of special interest. This workload contains a mix of both reads and writes, which we
ordinarily expect to see under production conditions.
Figure 15-4 on page 298 presents another view of the effect of moving to the latest level of
software and hardware.
Figure 15-4 Two node SVC cluster with random 4 KB throughput
Figure 15-5 presents a more detailed view of performance on this specific workload.
Figure 15-5 shows that the SVC 4.2 software-only upgrade boosts the maximum throughput
for the 70/30 workload by more than 30%. Thus, a significant portion of the overall throughput
gain achieved with full hardware and software replacement comes from the software
enhancements.
Figure 15-5 Comparison of a software only upgrade to a full upgrade of an 8F4 node, 70/30 miss
workload (the chart plots response time in ms against throughput in IO/s for 4.1.0 8F4, 4.2.0 8F4, and
4.2.0 8G4)
15.3.1 Performance scaling of I/O Groups
We turn now to a discussion of the SVC’s capability to scale up to extremely high levels of I/O
demand. This section focuses on an online transaction processing (OLTP) workload, typical
of a database’s I/O demands; the following section then examines SVC scalability for
sequential demands. Figure 15-6 shows the SPC-1 type performance delivered by two, four,
six, or eight SVC nodes. The OLTP workload is handled by 1,536 15K RPM disks configured
as Redundant Array of Independent Disks 10 (RAID 10). The host connectivity was through
32 Fibre Channel connections.
Figure 15-6 OLTP workload performance with two, four, six, or eight nodes
Figure 15-7 on page 300 presents the database scalability results at a higher level by pulling
together the maximum throughputs (observed at a response time of 30 milliseconds or less)
for each configuration. The latter figure shows that SVC performance scales in a nearly linear
manner depending upon the number of nodes.
Figure 15-7 OLTP workload scalability
As Figure 15-6 on page 299 and Figure 15-7 show, the tested SVC configuration is capable of
delivering over 270,000 I/Os per second (IOPS) for the OLTP workload. You are encouraged
to compare this result against any other disk storage product currently posted on the SPC
Web site at:
http://www.storageperformance.org
Related publications
The publications listed in this section are considered particularly suitable for a more detailed
discussion of the topics covered in this book.
IBM Redbooks publications
For information about ordering these publications, refer to “How to get IBM Redbooks
publications” on page 303. Note that several of the documents referenced here might be
available in softcopy only:
 IBM System Storage SAN Volume Controller, SG24-6423-06
 Get More Out of Your SAN with IBM Tivoli Storage Manager, SG24-6687
 IBM Tivoli Storage Area Network Manager: A Practical Introduction, SG24-6848
 IBM System Storage: Implementing an IBM SAN, SG24-6116
 DS4000 Best Practices and Performance Tuning Guide, SG24-6363-02
Other resources
These publications are also relevant as further information sources:
 IBM System Storage Open Software Family SAN Volume Controller: Planning Guide,
GA22-1052
 IBM System Storage Master Console: Installation and User’s Guide, GC30-4090
 IBM System Storage Open Software Family SAN Volume Controller: Installation Guide,
SC26-7541
 IBM System Storage Open Software Family SAN Volume Controller: Service Guide,
SC26-7542
 IBM System Storage Open Software Family SAN Volume Controller: Configuration Guide,
SC26-7543
 IBM System Storage Open Software Family SAN Volume Controller: Command-Line
Interface User's Guide, SC26-7544
 IBM System Storage Open Software Family SAN Volume Controller: CIM Agent
Developers Reference, SC26-7545
 IBM TotalStorage Multipath Subsystem Device Driver User’s Guide, SC30-4096
 IBM System Storage Open Software Family SAN Volume Controller: Host Attachment
Guide, SC26-7563
 IBM System Storage SAN Volume Controller V4.3, SG24-6423-06
 Implementing the SVC in an OEM Environment, SG24-7275
 IBM TotalStorage Productivity Center V3.1: The Next Generation, SG24-7194
 TPC Version 3.3 Update Guide, SG24-7490
 Implementing an IBM/Brocade SAN, SG24-6116
 Implementing an IBM/Cisco SAN, SG24-7545
 IBM System Storage/Brocade Multiprotocol Routing: An Introduction and Implementation,
SG24-7544
 IBM System Storage/Cisco Multiprotocol Routing: An Introduction and Implementation,
SG24-7543
 Considerations and Comparisons between IBM SDD for Linux and DM-MPIO, which is
available at:
http://www-1.ibm.com/support/docview.wss?rs=540&context=ST52G7&q1=linux&uid=ssg1S7001664&loc=en_US&cs=utf-8&lang=en
 TotalStorage Productivity Center User Guide, which is located at:
http://publib.boulder.ibm.com/infocenter/tivihelp/v4r1/topic/com.ibm.itpc.doc/tpcugd31389.htm
Referenced Web sites
These Web sites are also relevant as further information sources:
 IBM TotalStorage home page:
http://www.storage.ibm.com
 SAN Volume Controller supported platform:
http://www-1.ibm.com/servers/storage/support/software/sanvc/index.html
 Download site for Windows SSH freeware:
http://www.chiark.greenend.org.uk/~sgtatham/putty
 IBM site to download SSH for AIX:
http://oss.software.ibm.com/developerworks/projects/openssh
 Open source site for SSH for Windows and Mac:
http://www.openssh.com/windows.html
 Cygwin Linux-like environment for Windows:
http://www.cygwin.com
 IBM Tivoli Storage Area Network Manager site:
http://www-306.ibm.com/software/sysmgmt/products/support/IBMTivoliStorageAreaNetworkManager.html
 Microsoft Knowledge Base Article 131658:
http://support.microsoft.com/support/kb/articles/Q131/6/58.asp
 Microsoft Knowledge Base Article 149927:
http://support.microsoft.com/support/kb/articles/Q149/9/27.asp
 Sysinternals home page:
http://www.sysinternals.com
 Subsystem Device Driver download site:
http://www-1.ibm.com/servers/storage/support/software/sdd/index.html
 IBM TotalStorage Virtualization home page:
http://www-1.ibm.com/servers/storage/software/virtualization/index.html
How to get IBM Redbooks publications
You can search for, view, or download IBM Redbooks publications, IBM Redpaper
publications, Technotes, draft publications and Additional materials, as well as order
hardcopy IBM Redbooks publications, at this Web site:
ibm.com/redbooks
Help from IBM
IBM Support and downloads
ibm.com/support
IBM Global Services
ibm.com/services
Index
Numerics
1862 error 99
2-way write-back cached 120
500 84
A
access 2, 24, 58, 86, 109, 125, 163, 177, 222, 258
access pattern 130
accident 163
action commands 49
active 42, 58, 111, 162, 201, 226, 289
Active Directory domain 39
adapters 67, 109, 177, 219, 230, 256
address 120
Address Resolution Protocol 41
adds 77, 210
Admin 212
admin password 53
administration 88, 251
administrative access 49
administrative rights 48
administrator 24, 75, 163, 251, 270
administrators 205, 211, 246, 294
advanced copy 24, 164
aggregate 58, 108
AIX 63, 176, 207, 256
AIX host 186, 193, 277
AIX LVM admin roles 212
alert 9, 167, 222
alerts 3, 241, 266
algorithms 129
Alias 17
alias 16
aliases 14, 248
alignment 215
amount of I/O 30, 104, 130, 169
analysis 76, 170, 231, 292
antivirus software 38
AOS 50
application
availability 86, 102, 219
performance 86, 102, 127, 162, 208, 233
Application Specific Integrated Circuit 9
application testing 158
applications 21, 24, 103, 130, 162, 177, 207
architecture 58, 116, 191, 294
architectures 109, 199
area 189, 217, 274
areas 175, 209, 270
ARP 41
ARP entry 41
array 2, 24, 57, 66, 85–86, 102, 104, 138, 160, 169, 201,
210, 249, 262
array overdriving 80
arrays 2, 24, 66, 86, 102, 127, 160, 211, 246
ASIC 9
Assist 50
asynchronous 80, 134, 162
asynchronously 162
attached 3, 58, 84, 123, 175, 248, 270
attention 10, 255
attributes 83
audit 49
audit log 49
audit log file 49
Audit logging 49
audit logging facility 49
auto 124
Auto-Expand 121
Automated configuration backup 54
automatically discover 187
automation 42, 133
auxiliary 171
availability 10, 66, 86, 102, 182, 265
B
backend storage controller 148
back-end storage controllers 169
background copy 167
background copy rate 167
backplane 9
backup 3, 53, 159, 198, 209, 265, 278
backup files 53
backup node 15
backup sessions 215
balance 15, 59, 96, 102, 123, 167, 183, 211
balance the workload 129
balanced 15, 59, 114, 145, 178, 216
balancing 19, 96, 123, 178, 213, 215
band 138
Bandwidth 175
bandwidth 2, 25, 68, 112, 130, 160, 178, 209, 252
bandwidth requirements 21
baseline 78, 141
Basic 4, 41
basic 2, 25, 138, 176, 246, 268, 272
beat effect 218
best practices xiii, 1, 86, 102, 122, 162, 175
between 3, 31, 58, 86, 103, 123, 177, 210, 222, 265, 293
BIOS 35, 200
blade 14
BladeCenter 22
blades 14
block 67, 123, 152, 208, 235
block size 67, 144, 210
blocking 2
blocks 129
IBM System Storage SAN Volume Controller Host Attachment User’s Guide Version 4.2.0 175, 200
boot 178
boot device 196
bottlenecks 144, 208
boundary crossing 215
bridge 6
Brocade 27, 279
buffer 152, 236
buffers 124, 161, 176, 220
bus 24, 188, 230, 294
C
cache 2, 57, 85, 104, 126, 176, 208–210, 235, 255, 264,
292, 294
cache disabled 133, 155
cache enabled 133
cache mode 135
cache-disabled VDisk 133–134
Cache-disabled VDisks 133
cache-enabled 162
cache-enabled VDisk 133
caching 24, 41, 66, 85, 130, 133, 164
caching mechanism 133
cap 104
capacity 8, 24, 85, 123, 214, 249, 290
cards 58, 200
certified 20, 262
changes 3, 29, 78, 85, 144, 164, 176, 219, 245, 266, 270
channel 194
chdev 193
choice 30–31, 67, 86, 130, 183
CIMOM 42, 222
Cisco 2, 27, 250, 263, 280
classes 103, 267
CLI 61, 88, 123, 153, 188, 233, 271
commands 69, 88, 286
client 197, 215
cluster 2, 23, 38, 55, 58, 84, 102, 123, 177, 222, 251,
271, 298
creation 52, 123
IP address 52, 223
cluster connection problems 50
cluster ID 49
cluster IP address 41
cluster partnership 54
cluster state information 41
clustering 191
clustering software 191
clusters 20, 24, 191, 222, 246
code update 34
combination 145, 162, 251
command 42, 59, 88, 123, 153, 179, 214, 222, 258, 266,
275
command prompt 51
commit 155
Common Information Model Object Manager 42
compatibility 34, 39, 254
complexity 11, 294
conception 12
concurrent 34, 42, 144, 189, 287
config node 41
configuration 1, 25, 41, 57, 84, 102, 162, 176, 208, 222,
245, 266, 271, 299
configuration backup 54
configuration backup file 54
configuration changes 187
configuration data 187, 282
configuration file 53
configuration node 52, 292
configuration parameters 171, 188
configure 86, 194, 219, 222, 245
congested 9
congestion 2, 238
control 3
connected 2, 58, 175, 223, 247, 264, 272
connection 42, 74, 192, 222, 248
connections 8, 42, 58, 196
connectivity 195, 222, 253, 270, 299
consistency 204
consistent 139, 170, 204, 240, 251
consolidation 102, 294
container 215
containers 215, 217
control 24, 71, 96, 133, 177, 212, 246, 294
controller port 84
copy 24, 103, 124, 204, 235, 246, 264, 278
copy rate 155
copy services 24, 31, 124
core 262
core switch 4, 8
core switches 10
core/edge ASIC 9
core-edge 5
correctly configured 170, 224
corrupted 204
corruption 20, 73
cost 20, 86, 102, 164, 254
counters 205, 234
create a FlashCopy 155
credentials 52
critical 66, 93, 208
cross-bar architecture 9
current 25, 65, 164, 188, 223, 248, 271
CWDM 20
D
data 3, 24, 59, 85, 162, 177, 208, 245, 262, 269
consistency 156
data formats 198
data integrity 126, 153
data layout 115, 124, 211
Data layout strategies 219
data migration 160, 198
data mining 158
data path 77
data pattern 208
data rate 104, 144, 174, 234
data structures 216
data traffic 9
database 3, 79, 130, 156, 185, 209, 234, 248, 270, 299
log 210
Database Administrator 213
date 223, 248, 276
DB2 container 216
DB2 I/O characteristics 216
db2logs 216
DBA 213
debug 75, 274
dedicate bandwidth 21
dedicated ISLs 9
default 58, 123, 167, 179, 223, 257
default values 67
defined 18, 148, 152, 210, 226, 257
degraded 141, 162, 271
delay 139, 156
delete
a VDisk 125
deleted 155
demand 103, 299
dependency 114
design 1, 24, 79, 103, 138, 184, 215, 263
destage 66, 85, 138
device 2, 66, 109, 138, 164, 179, 213, 225, 245, 274
device driver 164, 191
diagnose 15, 170, 262
diagnostic 192, 282
different vendors 164
director 10
directors 10
directory I/O 120
disabled 133, 255
disaster 29, 163, 204, 263
discovery 59, 96, 186, 253
disk 2, 24, 64, 83, 102, 123, 152, 185, 208, 231, 246,
264, 280, 300
latency 208
disk access profile 130
disk groups 29
Disk Magic 144
disruptive 3, 34, 123, 251
distance 20, 164, 262
limitations 20
distance extension 21
distances 20
DMP 184
documentation 1, 246
domain 72
Domain ID 20, 258
domain ID 20
Domain IDs 20
domains 102
download 206, 287
downtime 156
driver 34, 58, 164, 191, 258, 270
drops 108, 237
DS4000 58, 88, 105, 206, 241, 281
DS4000 Storage
Server 209
DS4100 84
DS4500 224
DS4800 17, 67, 84
DS6000 58, 88, 105, 195, 282
DS8000 18, 58, 84, 104, 195, 282
dual fabrics 14
dual-redundant switch controllers 9
DWDM 20
E
edge 2
edge switch 3
edge switches 4–5, 10
efficiency 129
egress 9
element 25
eliminates 76
e-mail 21, 205, 251
EMC 57
EMC Symmetrix 59
enable 11, 24, 53, 127, 156, 194, 209, 233, 257
enforce 8
Enterprise 58, 241, 250, 280
error 20, 42, 57, 88, 176, 222, 246, 270
Error Code 65
error handling 65
error log 64, 254, 274
error logging 63
errors 20, 176, 254, 270
ESS 58, 88, 105
Ethernet 2, 52
evenly balancing I/Os 218
event 3, 58, 102, 130, 163, 194, 241
events 164, 241, 284
exchange 156
execution throttle 200
expand 30
expansion 3, 217
extenders 164
extension 20
extent 29, 68, 83, 123, 210, 215
size 123, 215
extent size 123, 214
extent sizes 123, 214
extents 57, 123, 215
F
Fabric 22, 32, 230, 250, 280
fabric 1, 25, 144, 160, 176, 223, 270
isolation 183
login 185
fabric outage 3
Fabric Watch 10
fabrics 5, 177, 223
failover 58, 130, 176, 271
failure boundaries 103, 213
failure boundary 213
FAStT 14, 200
storage 14
FAStT200 84
fastwrite cache 120
fault isolation 11
fault tolerant 86
FC 2, 67, 185
fcs 19, 193, 256
fcs device 194
features 24, 164, 196, 245, 264, 270
Fibre Channel 2, 58, 164, 175, 257, 262, 270
ports 21, 58
routers 164
traffic 3
Fibre Channel (FC) 177
Fibre Channel ports 58, 178, 258, 288
file level access control 49
file system 152, 201, 216
file system directories 217
file system level 204
filesets 197
firmware 170, 256
flag 126, 169
FlashCopy 27, 63, 85, 114, 124, 235, 271
applications 65, 114
mapping 76
prepare 155
rules 161
source 64, 124
Start 124
target 134, 235
FlashCopy mapping 153
FlashCopy mappings 125
flexibility 25, 130, 164, 190, 246
flow 3, 145
flush the cache 188
force flag 126
format 49, 75, 198, 247, 267, 292
frames 2
free extents 129
front panel 42
full bandwidth 10
fully allocated copy 127
fully allocated VDisk 127
function 61, 117, 163, 200, 264
functions 24, 63, 152, 195, 237, 272
G
GB 250
Gb 67, 237
General Public License (GNU) 206
Global 227
Global Mirror 228
Global Mirror relationship 162
gmlinktolerance 167
GNU 206
governing throttle 130
grain 85
granularity 123, 204
graph 79, 144, 236, 295
graphs 189
group 8, 69, 102, 123, 178, 210, 233, 249, 262, 280
groups 9, 29, 70, 83, 119, 179, 212, 236, 287, 299
growth 78, 217
GUI 12, 34, 59, 88, 123, 183, 222, 266, 277
GUI session 46
H
HACMP 42, 195
hardware 2, 25, 38, 58, 87, 103, 170, 199, 249, 271, 294
HBA 21, 24, 183, 193, 200, 230, 251, 270
HBAs 12, 142, 177–178, 200, 209, 230, 255
health 196, 222, 284
healthy 171, 225
heartbeat 165
help 8, 42, 104, 119, 160, 193, 211, 235, 246, 266, 270
heterogeneous 24, 272
high-bandwidth 10
high-bandwidth hosts 4
hops 3
host 2, 24, 58, 84, 112, 123, 162, 175, 207, 246, 270
configuration 15, 125, 161, 211, 272
creating 17
definitions 125, 186, 209
HBAs 15
information 35, 184, 231, 252, 275
systems 30, 175, 209, 270
zone 14, 123, 177, 272
host bus adapter 199
host level 178
host mapping 138, 178, 271
host type 58, 252
host zones 17, 249
I
I/O governing 130
I/O governing rate 133
I/O group 8, 27, 123, 183, 235, 252, 265
I/O Groups 129
I/O groups 16, 123, 174, 187, 240
I/O performance 194, 217
I/O rate setting 132
I/O response time 141
I/O workload 213
IBM Subsystem Device Driver 58, 88, 125–126, 164, 195
IBM TotalStorage Productivity Center 22, 167, 223, 267,
274, 301
identification 93, 179
identify 57, 88, 103, 195, 234
identity 49
IDs and passwords 52
IEEE 198
image 27, 86, 123, 152, 178, 211, 249
Image mode 32, 127, 162
image mode 30, 124, 185
image mode VDisk 124, 218
Image Mode VDisks 164
image mode virtual disk 134
image type VDisk 127
implement 3, 29, 31, 88, 199, 251, 268
implementing xiii, 1, 104, 191
import 124
import failed 99
improvements 27, 31, 114, 145, 196, 293
Improves 24
in-band 138
information 1, 59, 129, 185, 207, 222, 246, 266, 270
infrastructure 103, 133, 164, 224, 254
ingress 9
initial configuration 182
initiating 76
initiators 84, 191
install 5, 51, 160, 199, 233, 254
installation 1, 87, 222, 246, 262
insufficient bandwidth 3
integrity 126, 152
Inter Switch Link 2
interface 24, 153, 175, 222, 259
Internet Protocol 21
interoperability 21, 254
interval 170, 234
inter-VSAN routing 11
introduction 77, 268, 294
iogrp 126, 178
IOPS 177, 208, 294
IP 20–21, 222
IP communication 41
IP connectivity considerations 41
IP traffic 21
IPv4 41
IPv6 41
IPv6 communication 39
ISL 2, 248
ISL capacity 10
ISL links 5
ISL oversubscription 3
ISL trunks 10
ISLs 3, 247
isolated 72, 183
isolation 2, 59, 88, 104, 183, 271
IVR 11
J
journal 201, 210
K
kernel 200
key 185, 215, 250
key based SSH communications 46
key pairs 48
keys 46, 48, 192
L
last extent 123
latency 9, 138, 155, 208
LBA 64
level 12, 25, 58, 86, 139, 162, 178, 218, 239, 270, 297
storage 75, 204, 253, 271
levels 65, 86, 104, 141, 190, 217, 251
lg_term_dma 194
library 201
license 29, 280
light 104, 208, 283
limitation 42, 189, 234
limitations 1, 29, 42, 163, 210, 282
limiting factor 138
limits 24, 139, 162, 189, 219
lines of business 213
link 2, 29, 42, 162, 198
bandwidth 21, 165
latency 165
link quality 10
link reset 10
links 164, 223, 262
Linux 200
list 12, 24, 50, 61, 85, 164, 202, 246, 270
list dump 54
livedump 292
load balance 130, 183
load balances traffic 7
Load balancing 196
load balancing 123, 199
loading 68, 114
LOBs 213
location 75, 85, 142, 208, 246
locking 191
log 49, 64, 164, 236, 254, 274
logged 42, 72
Logical Block Address 64
logical drive 58, 95, 193, 210, 215
logical unit number 163
logical units 29
logical volumes 215
login 42, 177
logins 177
logs 156, 210, 255, 276
long distance 165
loops 67, 264
lower-performance 122
LPAR 198
LU 178
LUN 30, 57, 83, 104, 134, 163, 176, 210, 228, 249
access 164, 191
LUN mapping 93, 178
LUN masking 20, 72, 272
LUN Number 59, 93
LUN per 105, 213
LUNs 58, 84, 101, 164, 178, 211, 213, 233, 249
LVM 125, 196, 212
LVM volume groups 215
M
MAC 41
MAC address 41
maintenance 34, 42, 167, 184, 256, 270
maintenance procedures 42, 259, 289
maintenance window 167
manage 24, 58, 119, 176, 213, 222, 268
managed disk 219, 289
managed disk group 127, 219
Managed Mode 67, 127
management xiii, 6, 41, 102, 162, 176, 211, 222, 263,
272
capability 177
port 177, 241
software 179
management communication 46
managing 24, 31, 176, 215, 246, 268, 270
map 61, 137, 161, 179
map a VDisk 183
mapping 57, 93, 109, 125, 153, 176, 213, 271
mappings 125, 192, 271
maps 219, 289
mask 9, 164, 177, 289
masking 30, 72, 161, 177, 272
master 35, 153
master console 41, 154
Master Console server 39
max_xfer_size 194
maximum IOs 216
MB 21, 67, 123, 194
Mb 21, 29
McDATA 27, 280
MDGs 83, 123, 213
MDisk 53, 57, 83, 103, 123, 163, 183, 210, 228, 249
adding 87, 140
removing 192
MDisk group 124, 163, 210
media 171, 233, 289
Media Access Control 41
media error 64
medium errors 63
member 16
members 67, 271
memory 152, 176, 210, 235, 253, 264, 292
message 20, 42, 170, 258
messages 183, 258, 284
metadata 120
metadata corruption 99
MetaSANs 10
metric 78, 139, 173, 236
Metro 27, 124, 227
Metro Mirror 162, 236
Metro Mirror relationship 155
microcode 65
Microsoft Windows Active Directory 38
Microsoft Windows Server professionals 49
migrate 22, 124, 160, 178
migrate data 127, 198
migrate VDisks 125
migration 3, 30, 63, 126, 160, 185, 253, 267
migration scenarios 8
mirrored 138, 165, 204
mirrored VDisk 122
mirroring 20, 125, 162, 196
misalignment 215
mkrcrelationship 169
Mode 67, 180, 252, 275
mode 27, 52, 86, 101, 162, 177, 211, 262, 290
settings 162
monitor 22, 141, 167, 272
monitored 78, 141, 172, 204, 270
monitoring 77, 167, 175, 288
monitors 145, 234
mount 126, 156, 265
MPIO 196, 258
multi-cluster installations 5
multipath drivers 88, 256
multipath software 190
multipathing 34, 58, 176, 256, 269
Multipathing software 184
multipathing software 183, 259
multiple paths 130, 183, 272
multiple striping 218
multiple vendors 21
multiplexing 20
N
name server 185, 259
names 16, 49, 122, 199, 259
nameserver 185
naming 13, 60, 87, 122, 252
naming convention 13
naming conventions 53
nest aliases 16
new disks 186
new MDisk 95
No Contact 50
no synchronization 122
NOCOPY 155
node 3, 27, 52, 72, 84, 104, 123, 176, 223, 255, 264, 270,
294
adding 29
failure 130, 185
port 14, 130, 172, 177, 223, 272
node port 14
nodes 3, 24, 52, 71, 84, 123, 177, 223, 254, 264, 271,
293
noise 138
non 11, 24, 73, 124, 183, 213, 237, 251, 267, 275
non-disruptive 127
non-preferred path 129
num_cmd_elem 193–194
O
offline 52, 65, 87, 126, 163, 184, 230, 256, 271
OLTP 210
Online 210
online 87, 115, 225, 257, 271
online transaction processing (OLTP) 210
operating system (OS) 208
operating systems 183, 215, 258, 274
optimize 115, 293
Oracle 196, 213
organizations 21
OS 52, 176, 220, 256
outage 133
overlap 14
overloading 148, 174, 242
over-subscribed 10
oversubscribed 4
over-subscription 9
oversubscription 3
overview 40, 83, 211, 269
P
packet filters 41
parameters 49, 57, 84, 131, 171, 178, 209, 252
partition 197, 216
partitions 66, 144, 197, 215
partnership 54, 167
password 52
password reset feature 53
passwords 52
path 3, 58, 104, 176, 220, 228, 256, 270
selection 195
paths 7, 34, 58, 130, 176, 228, 253, 272
peak 3, 165
per cluster 28, 123, 236
performance xiii, 3, 24, 57, 86, 102, 119, 162, 175, 207,
223, 245, 270, 293
degradation 59, 104, 162
performance advantage 86, 108
performance characteristics 103, 124, 206, 219
performance improvement 127, 237, 294
performance monitoring 173, 178
performance requirements 31
Performance Upgrade kit 39
permanent 170
permit 3
persistent 88, 191
PFE xiv
physical 20, 24, 57, 85, 147, 152, 175, 235, 249, 264
physical volume 197, 219
ping 52
PiT 134
Plain Old Documentation 92
planning 15, 86, 101, 139, 165, 209
plink 51
plink.exe 51
PLOGI 185
point-in-time 163
point-in-time copy 164
policies 196
policy 53, 104, 191, 254
pool 24, 68, 168
port 2, 24, 57, 84, 142, 172, 176, 223, 246, 262, 272
types 59
port bandwidth 9
Port Channels 11
port errors 10
port event 10
Port Fencing 10
port layout 10
port zoning 12
port/traffic isolation 11
port-density 9
ports 2, 27, 58, 84, 142, 176, 223, 248, 263, 271
power 188, 258, 264, 288
preferred 20, 30, 50, 58, 123, 167, 177, 214, 271
preferred node 15, 123, 167, 183
preferred owner node 129
preferred path 58, 129, 183
preferred paths 130, 183, 275
prepare a FlashCopy 172
prepared state 172
Pre-zoning tips 13
primary 29, 85, 102, 134, 162, 212
priority 42
private key 46
problems 2, 34, 59, 87, 138, 161, 192, 208, 250, 262, 269
profile xiv, 66, 96, 130, 287
properties 138, 201
protect 167
protecting 67
provisioning 87, 105
pSeries 19, 73, 206
public key 46
PuTTY 39, 52
PuTTY generated SSH 46
PuTTY SSH 38
PuTTYgen 46, 48
PVID 198
PVIDs 199
Q
queue depth 83, 188, 194, 200, 219
quickly 2, 76, 138, 155, 183, 230, 251, 262
quiesce 125, 156, 187
R
RAID 67, 86, 127, 169, 210, 249, 299
RAID array 139, 171, 211, 213
RAID arrays 138, 212
RAID types 211
ranges 139
RDAC 58, 88
Read cache 208
read miss performance 130
real capacity 63
reboot 125, 188
rebooted 197
receive 97, 237, 258
recovery 29, 52, 95, 128, 156, 176, 289
recovery point 167
Redbooks Web site 303
Contact us xvi
redundancy 2, 9–10, 58, 116, 165, 177, 224, 272
redundant 24, 45, 72, 165, 177, 219, 230, 270
redundant paths 177
redundant SAN 72
registry 185, 276
relationship 19, 58, 124, 197, 227
reliability 15, 87, 245, 294
remote cluster 35, 165, 227, 255
remote copy 134
remote mirroring 20
remotely 50
remount 138
removed 20, 31, 125, 186
rename 161, 277
repairsevdisk 99
replicate 163
replication 162, 268
reporting 77, 139, 241, 274
reports 142, 186, 221
reset 42, 185, 254, 270
resource consumption 42
resources 24, 75, 96, 102, 133, 168, 176, 216, 235, 292
restart 161, 264
restarting 167
restarts 185
restore 55, 166, 173, 198
restricted rights 48
restricting access 191
rights 48
risk 76, 87, 102, 164, 266
role 48, 210
role-based security 48
roles 47, 212
root 141, 192, 241, 257
round 96, 165, 216
round-robin 97
route 167
router 164
routers 165
routes 11
routing 58, 263
RPQ 3, 200, 254
RSCN 185
rules 77, 149, 161, 176, 272
S
SAN xiii, 1, 23–24, 39, 58, 122, 160, 175, 219, 245, 262,
269–270, 294
availability 183
fabric 1, 160, 183, 224
SAN bridge 6
SAN configuration 1
SAN fabric 1, 160, 178, 223, 272
SAN switch models 9
SAN Volume Controller 1, 3, 12, 15, 24, 127, 175
multipathing 200
SAN zoning 130, 226, 251
save capacity 121
scalability 2, 23, 299
scalable 1, 24
scale 25, 116, 299
scaling 31, 117, 293
scan 186
SCP 49
scripts 133, 189
SCSI 64, 130, 185, 287
commands 191, 287
SCSI disk 198
SCSI-3 191
SDD 14, 58, 88, 125–126, 164, 176, 195, 218, 254, 274
SDD for Linux 201, 302
SDDDSM 179, 274
SE VDisks 120
secondary 29, 134, 163, 210
secondary site 29, 163
secure 52
Secure Copy Protocol 49
Secure Shell 49
security 12, 47, 196, 256
segment 66
separate zone 18
sequential 85, 101, 123, 176, 208, 249, 299
serial number 60, 178, 249
serial numbers 179
Server 58, 196–197, 218, 222, 251, 275
server 3, 24, 66, 142, 156, 185, 233, 248, 264
Servers 199
servers 3, 25, 196, 207, 251, 265
service 35, 41, 78, 86, 219, 236, 270
settings 52, 171, 193, 208, 272
setup 41, 193, 214, 262, 272
SEV 159
SFPs 21
shapers 41
share 19, 72, 86, 104, 147, 177, 216
shared 21, 169, 191, 217, 224
sharing 11, 146, 191, 209
shortcuts 13
shutdown 125, 161, 185, 264
Simple Network Management Protocol 41
single host 15
single initiator zones 15
single storage device 183
single-member aliases 16
site 29, 31, 64, 134, 163, 203, 234, 251, 300, 302
slice 215
slot number 19, 256
slots 67–68
slotted design 9
SMS 217
snapshot 160, 249
SNMP 41, 241
Software 1, 3, 12, 15, 187, 253, 255, 270
software 2, 164, 176, 245, 266, 271
Solaris 201, 275
solution 1, 38, 86, 139, 173, 208, 246, 266
solutions 119, 246
source 11, 64, 124, 201, 233
sources 265
space 80, 86, 123, 210, 252
Space Efficient 121
space efficient copy 127
Space Efficient VDisk 159
Space Efficient VDisk Performance 120
space-efficient VDisk 63
spare 3, 30, 67, 86
speed 10, 25, 139, 169, 248
speeds 20, 80, 139, 262, 294
split 5, 23, 70, 146
splitting 121
SPS 71
SSH 42, 49, 223
SSH communication 48
SSH connection 46
SSH connection limitations 46
SSH connectivity 38
SSH keypairs 46
SSH Secure Copy 54
SSH session 48
SSPC 39, 89, 244
SSPC server 39
standards 22, 222, 262
start 20, 25, 79, 148, 162, 178, 215, 222, 270
state 42, 127, 162, 176, 255, 292
synchronized 170
statistics 85, 170, 205, 234
statistics collection 170
status 50, 63, 192, 222, 248, 271
storage 1, 24, 57, 83, 101, 123, 162, 175, 207, 245, 262,
270, 294
storage controller 14, 24, 57, 84, 103, 134, 163, 224
storage controllers 14, 24, 59, 86, 104, 147, 163, 222
Storage Manager 68, 166, 172, 281
storage performance 78, 138, 239
Storage Pool Striping 71
storage subsystems 49
storage traffic 3
streaming 114, 130, 209
strip 215
Strip Size Considerations 215
strip sizes 215
stripe 102, 213
striped 96, 123, 185, 210
striped mode 155, 211
striped mode VDisks 214
striped VDisk recommendation 218
Striping 101
striping 24, 66, 108, 212, 215, 293
striping policy 121
Subsystem Device Driver 14, 58, 88, 125–126, 164, 180,
195, 257, 275
support 25, 58, 88, 210, 246, 263, 294
SVC xiii, 1, 23, 58, 84, 123, 152, 175, 210, 245, 262, 269,
293
SVC CLI 52
SVC Cluster 52
SVC cluster 3, 23, 60, 84, 103, 182, 222, 279
SVC configuration 53, 177, 250, 267, 300
SVC Console 52
SVC Console server 52
SVC Console software 38
SVC error log 99
SVC installations 4, 104, 262
SVC master console 52, 157
SVC node 14, 29, 164, 177, 223, 271
SVC nodes 7, 24, 31, 72, 138, 178, 227, 288, 294
SVC Service mode 52
SVC software 178, 270
SVC zoning 16
svcinfo 88, 125, 179, 271
svcinfo lsmigrate 88
svctask 59, 88, 123, 160, 201, 278
svctask dumpinternallog 50
svctask finderr 50
switch 53, 142, 170, 175, 223, 247
fabric 3, 255
failure 3, 205
interoperability 21
switch fabric 2, 178, 228
switch ports 8, 225
switch splitting 8
switches 2, 49, 167, 247, 262, 270
Symmetrix 57
synchronization 165
Synchronized 169
synchronized 122, 162
system 30, 78, 115, 123, 152, 175, 209, 231, 256, 274,
293
system performance 124, 201, 292
System Storage Productivity Center 39
T
tablespace 210, 216
tablespaces 217–218
tape 3, 166, 177
target 59, 84, 124, 177, 233, 288
target ports 72, 177
targets 187
tasks 241
test 2, 51, 87, 108, 165, 175, 218, 236
tested 25, 87, 164, 176, 218, 254, 266, 300
This 1, 23, 58, 83, 101, 123, 163, 175, 207, 221, 246,
261, 270, 297
thread 188, 215
threads 215
threshold 3, 162, 241
thresholds 138
throttle 130, 200
throttle setting 131
throttles 130
throttling 131
throughput 28, 66, 86, 104, 138, 156, 184, 194, 208, 210,
237, 294
throughput based 208–209
tier 87, 104
time 2, 27, 52, 58, 96, 103, 122, 176, 208, 223, 248, 264,
269, 294
tips 13
Tivoli 166, 172, 241
Tivoli Storage Manager (TSM) 209
tools 175, 246, 272
Topology 142, 223, 274
topology 2, 223, 274
topology issues 7
topology problems 7
total load 218
TPC CIMOM 52
TPC for Replication 39
traditional 11
traffic 3, 7, 165, 183, 235
congestion 3
Fibre Channel 21
Traffic Isolation 8
traffic threshold 9
transaction 66, 156, 193, 208
transaction based 208–209
Transaction log 210
transceivers 21
transfer 58, 83, 130, 176, 208
transit 3
trends 78
trigger 15, 250
troubleshooting 12, 175, 252
TSM 215
tuning 145, 175
U
UID 93, 290
unique identifier 75, 178
UNIX 156, 205
Unmanaged MDisk 164
unmanaged MDisk 128
unmap 124
unused space 123
upgrade 171, 185, 254, 287, 297
upgrades 184, 253, 287
upgrading 35, 190, 237, 255–256, 296
upstream 2, 241
URL 50
user data 120
user IDs 52
users 10, 24, 139, 185, 224, 266
using SDD 164, 195, 258
utility 88, 206
V
VDisk 14, 27, 53, 57, 83, 103, 119, 178, 210, 231, 249, 271
creating 87, 158
migrating 127
modifying 146
showing 142
VDisk migration 64
VDisk Mirroring 121
VIO clients 219
VIO server 198, 219
VIOC 197, 219
VIOS 196–197, 219
virtual address space 120
virtual disk 98, 130, 198
Virtual LAN 41
virtualization 23, 75, 211, 269
virtualization policy 121
virtualizing 11, 185
VLAN 41
volume abstraction 211
volume group 72, 195
VSAN 11
VSANs 2, 11
VSCSI 197, 219
W
Web browser 49
Windows 2003 32, 199
workload 29, 58, 96, 102, 123, 165, 193, 208, 235, 297
throughput based 208
transaction based 208
workload type 209
workloads 3, 66, 86, 103, 133, 165, 176, 208, 296
worldwide node name 11
write performance 122
writes 66, 85, 104, 138, 162, 176, 210, 235, 297
WWNN 11–12, 59, 187, 259
WWNs 12
WWPN 12, 30, 57, 84, 223, 248
WWPN zoning 12
WWPNs 12, 73, 177, 259, 271
X
XFPs 21
Z
zone 7, 160, 177, 226, 272
zone name 19
zoned 2, 177, 227, 248, 288
zones 12, 160, 226, 249, 272
zoneset 18, 227, 289
Zoning 20, 249
zoning 6, 11, 29, 73, 130, 177, 226, 249
zoning configuration 11, 231, 249
zoning scheme 13
zSeries 117
Back cover

SAN Volume Controller Best Practices and Performance Guidelines

This IBM Redbooks publication captures many of the best practices based on field
experience and details the performance gains that can be achieved by implementing the
IBM System Storage SAN Volume Controller (SVC).

This book is intended for extremely experienced storage, SAN, and SVC administrators
and technicians.

Readers are expected to have an advanced knowledge of the SVC and SAN environment,
and we recommend these books as background reading:
 IBM System Storage SAN Volume Controller, SG24-6423
 Introduction to Storage Area Networks, SG24-5470
 Using the SVC for Business Continuity, SG24-7371

IBM Redbooks are developed by the IBM International Technical Support Organization.
Experts from IBM, Customers and Partners from around the world create timely technical
information based on realistic scenarios. Specific recommendations are provided to help
you implement IT solutions more effectively in your environment.

For more information:
ibm.com/redbooks

SG24-7521-01
ISBN 0738432040