Another (Off) White Paper on High Availability Storage Options
and Their Impact on Performance on the HP e3000
By Walter McCullough
Version Date January 20, 2004
This is an update to a white paper I wrote back in 1997 when we were receiving a number of questions
regarding the introduction of storage arrays and their use on MPE systems. It seems we continue to have
some confusion regarding this subject despite the passage of time and improvement in the storage
technology.
My goal for this (short) paper is to lay out a somewhat high-level overview of what makes MPE different
from other operating systems and, importantly, why MPE goes to disk to satisfy an I/O request. Then I
would like to describe the available HA storage solutions for your HP e3000 and give a little behind-the-curtain
view of how that technology works with MPE.
You can view the older paper at:
http://jazz.external.hp.com/mpeha/papers/off_white_paper.html
Architecture
The current (and last) incarnation of MPE and its lowest (machine dependent) layer was specifically
designed for the PA-RISC architecture. This thin layer allowed the MPE lab to create an operating system
that had very little shielding from the hardware layer. While the HP-UX approach was to create a (thicker)
layer which allowed for greater hardware independence, MPE's approach allowed operations to move
more expeditiously through the computer, thus giving it the ability to do more (and generate more I/O).
The disadvantage (for the HP e3000) of this thinner machine-dependent hardware layer or shielding is
that MPE is limited to the PA-RISC architecture, while HP-UX's thicker shielding allowed it much more
freedom and flexibility to move to newer, faster computer architectures.
MPE I/O Behavior
First of all, MPE/iX differs from HP-UX and Unix-like systems in how it handles file access and
how it uses the combination of disk storage and server memory. While Unix-like systems are usually
described as paging files and data to and from a swap area, the MPE operating
system uses the file's own disk location as the file swap area and only needs to allocate process/user state
information/data, which is "swapped" out to areas of the system volume set. This technique greatly
reduces the number of times that the operating system needs to go to disk to satisfy an application's I/O
requirements.
To further reduce the need to go to disk, MPE employs a cache as part of its design. This design uses
available main memory to keep active parts of files accessible through an approach similar to direct
memory access. As active applications require access to their data, those requests are satisfied through
MPE's memory cache and are handled by the MPE/iX memory manager facility.
The next technique MPE/iX employs to increase performance is reducing the need to post data
to disk. MPE's journaling file system is called the Transaction Manager (XM for short). When a file
record is updated, this facility captures just those bytes that changed and copies them to XM's log.
(This XM protection is afforded to critical MPE/iX structures and to some file types like KSAM and
Image databases.) This reduces the need to post files/data to disk to ensure data durability, and that
posting action can significantly impact overall system and application performance.
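To make the idea concrete, here is a minimal sketch (Python, purely illustrative; the names and structures are hypothetical and do not show real XM internals) of the general journaling pattern described above: when a record is updated, only the changed byte range is appended to a log rather than posting the whole file image to disk.

```python
# Illustrative sketch of a journaling (write-ahead log) pattern similar in
# spirit to MPE's Transaction Manager (XM). Names and structures here are
# hypothetical; real XM internals are not shown.

class TransactionLog:
    def __init__(self):
        self.entries = []          # in-memory stand-in for XM's log file

    def record_update(self, file_name, offset, old_bytes, new_bytes):
        # Capture only the bytes that changed, not the whole file/page.
        self.entries.append((file_name, offset, old_bytes, new_bytes))

    def replay(self, files):
        # After a failure, reapply logged changes to bring files back to a
        # consistent state without having posted every write to disk.
        for file_name, offset, _old, new in self.entries:
            data = files[file_name]
            files[file_name] = data[:offset] + new + data[offset + len(new):]


# Usage: update one record of a (simulated) file and log only the delta.
files = {"ORDERS": bytearray(b"REC1:PENDING ")}
log = TransactionLog()
old = bytes(files["ORDERS"][5:12])
files["ORDERS"][5:12] = b"SHIPPED"               # change happens in memory first
log.record_update("ORDERS", 5, old, b"SHIPPED")  # only 7 changed bytes are logged
```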
The last couple of techniques I want to mention are the way MPE stripes data files across multiple disk
members within a volume set and the way it sends multiple I/O requests for any given Ldev (volume) at the
same time. This helps reduce the impact of disk seek and latency times, part of the normal performance
cost of dealing with disk drives. MPE bundles a number of I/O requests (8 per Ldev) for most types of
disks and arrays and sends them asynchronously. This lets the operating system return to other tasks while
those operations complete.
How is performance perceived?
The number one question I usually get is "how fast will this HP e3000 server run with that storage
array?". The answer is, "It depends"! Without any performance data or an understanding of the current (or
future) application's I/O characteristics, that question really can't be answered except very conservatively.
Using a car analogy: a customer goes to a car dealer looking for a car that has a 400 horsepower
engine, and "price is no object" says the customer. The dealer sells the customer a really nice sports car
but is dismayed when, two days later, the car is returned dirty, smoking and badly wrecked. The customer
wants his money back because, it seems, the car was unable to navigate off-road, downhill and with a
large boat attached (which explains the dents on the rear and roof of the car). <Enough of cars now>
Without understanding what the needs are and understanding how the technology works together, we are
all destined (in our small way) to repeat this story.
Understanding the Technology
The benchmark for storage technology is set by the performance of JBOD (Just a Bunch Of Disks).
Remember, too, that modern storage arrays are built on top of JBOD but add memory (cache) and a small
operating system to provide features that ultimately allow for the added protection of data and, in some cases,
improve ease of management and performance.
The added value that arrays provide over JBOD is data protection. An array does this by laying the
data across a number of disk drives along with data recovery information that can be used to reconstruct
the data in the event that a drive mechanism fails. The vendor may have created a way for the
reconstruction to happen on the fly or as a background activity. The main point to remember is that
whatever technology is in place, the goal is to protect the data.
Storage Performance Features
One of the biggest features that all the vendors tout as their answer to performance improvements is the
use of large amounts of memory in the array as a cache. Some even give the impression that this
cache will improve performance over JBOD and that arrays are faster because of the cache. Not always
so.
The right side of the picture below shows an arrow that indicates the (relative) time it takes to do either a
read or write to disk. The time is based on a combination of events that culminate in I/O response time,
such as bus transfer speed, disk seek, settle and latency times. The value of this response time is usually in
milliseconds (which I will not go into because I'll have to change it every time a new disk hits the
market). Just use the placement of the JBOD Read/Write as a relative reference point to compare that of
the storage array's Read/Write response times.
When we factor in the use of a storage array cache, we see that when a Read is issued to the array and the
information is already in the cache (Cache Hit), the array can respond back with the data faster than on a Cache
Miss, which needs to wait until the data is found on the disk (seek, settle and latency) and then transferred
to the cache. As you can see in the picture, a Cache Hit is faster than a JBOD Read/Write, while a Cache Miss is
slower than a JBOD Read/Write Access and much slower than an array Cache Hit.
[Figure: relative I/O access times. Left (Array Technology): a Read or Write Cache Hit sits toward the faster end of the scale and a Cache Miss toward the slower end. Right (JBOD Technology): Read/Write Access falls between the two. Vertical axis: access time, from Faster (top) to Slower (bottom).]
Now this same picture will look a little different on the left (array) side if we look at what happens to a
Write Access. In this case, depending on available array cache, the Write Access can complete faster than
on the right side (JBOD) because the array has the ability to store the Write data in the array's cache,
inform the server that the Write has completed and, unbeknownst to the server, transfer the data from the
array cache to disk later. This only works if there is room in the array cache. If the array is getting more
data than it can transfer to disk, then I/Os will queue up and wait for room in the cache. This slows
I/O completion down just like a cache miss.
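A rough way to reason about this is that the average response time an application sees is a weighted blend of the cache-hit and cache-miss times. The sketch below is only a simplified model; the millisecond values are placeholders for illustration, not measurements of any particular array or JBOD configuration.

```python
# Simplified model of average array response time as a function of cache
# hit rate. The millisecond values below are placeholders for illustration,
# not measurements of any specific array or JBOD configuration.

def effective_response_ms(hit_rate, hit_ms, miss_ms):
    """Weighted average of cache-hit and cache-miss response times."""
    return hit_rate * hit_ms + (1.0 - hit_rate) * miss_ms

JBOD_MS = 10.0   # assumed JBOD read/write time (seek + settle + latency + transfer)
HIT_MS  = 1.0    # assumed array cache-hit time
MISS_MS = 15.0   # assumed array cache-miss time (cache lookup + disk access)

for hit_rate in (0.95, 0.60, 0.30):
    avg = effective_response_ms(hit_rate, HIT_MS, MISS_MS)
    verdict = "faster" if avg < JBOD_MS else "slower"
    print(f"hit rate {hit_rate:.0%}: {avg:.1f} ms -> {verdict} than JBOD ({JBOD_MS} ms)")
```

The point of the model is simply that an array only beats JBOD when the hit rate stays high; once misses dominate, the average falls behind the plain-disk reference point.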
Early enterprise array technology (1995-2000) had response times for cache hits only marginally faster
than JBOD, but cache misses were many times slower than JBOD. In that scenario the only way
to achieve decent performance numbers was for MPE's memory cache to be large enough to reduce the need to do
I/O to the array, or for the array to have either a better algorithm to anticipate MPE's needs (highly unlikely)
or a large enough cache to satisfy a great proportion of MPE's I/O with cache hits.
MPE Performance Environment
Let's spend a little time painting the picture of an environment in which MPE is best able to deliver
optimum performance. We will break this part up into a number of categories: memory, connection
technology and Ldev (spindle count/size). The order of the categories also indicates their impact on
performance, where the fastest access is memory and the slowest is disk. We will then go through the list
of available array products for the HP e3000.
Memory
The greater the amount of memory in the HP e3000, the more efficiently it will run (to a point). MPE uses
available (server) memory as a cache to store user process information and also keeps a virtual copy of the
file there. If there is enough memory to cache the user process information and data, then MPE does not have to
wait for disk access when making changes to the file until either the user makes an explicit post-and-wait
call or a lack of memory causes MPE to push that data from memory to make room for other processes
that need to execute. The amount of memory needed depends on an understanding of the application's
needs, the number of users and the amount of available disk space. This information can be supplied by
the application's vendor or with the help of an MPE performance specialist/consultant.
Connection Technology
The HP e3000 server comes in two flavors: older NIO-based machines, such as the 9x7, 9x8 and 9x9,
and newer PCI-based machines, such as the A-Class and N-Class. The NIO-based machines are
limited to HVD-SCSI (High Voltage Differential) but can use the A5814 Router to connect to newer HP
fibre channel storage arrays.
Without playing too much with numbers, an NIO-based machine can usually pump out about 12 Mbytes
per second of data through the backplane and HVD-SCSI HBA (Host Bus Adapter) card, while the
theoretical limit of this SCSI specification is 20 Mbytes per second. This sounds like a definite bottleneck compared
to fibre channel speeds, but remember, performance is based on all the components (memory, connection
technology and disk) working at best case, and disks are still mechanical devices limited to mechanical
speeds.
Another connection technology is the HVD-SCSI to Fibre Channel Fabric Router. This device connects to
an MPE (or HP-UX) HVD-SCSI HBA and converts SCSI to fibre channel (1Gbit) for connection to the
newer HP fibre channel portfolio of storage arrays. The router solution has been thoroughly tested and has
not been shown to add any performance overhead by its presence as a connection solution.
The last connection technology for the HP e3000 is that of native fibre channel. This technology is only
available on PCI based servers, usually referred to as the A-Class and N-Class, running MPE/iX version
7.5.
This fibre channel technology is rated at 2Gbits per second (theoretical limit). This translates to an
effective rate of about 200 Mbytes per second, but let me caution you right now: this speed limit, or
bandwidth, is no guarantee that the disks can keep up with it. Using a car analogy (again),
while you can go 65 mph on the freeway, you still have to pull into your driveway and wait for the garage
door to open before you can enter.
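As a back-of-the-envelope check on the numbers above (assuming roughly 10 bits on the wire per byte of data to account for encoding and protocol overhead), the sketch below converts link ratings to approximate payload bandwidth. The link rating is only a ceiling; the disks behind it still set the pace.

```python
# Back-of-the-envelope conversion of link ratings to approximate payload
# bandwidth. Assumes roughly 10 bits on the wire per data byte to account
# for encoding/protocol overhead; real throughput also depends on the HBA,
# the backplane and, above all, the disks behind the link.

def approx_mbytes_per_sec(gigabits_per_sec, bits_per_byte_on_wire=10):
    return gigabits_per_sec * 1000 / bits_per_byte_on_wire

print(f"1 Gbit FC ~ {approx_mbytes_per_sec(1):.0f} Mbytes/sec")   # ~100
print(f"2 Gbit FC ~ {approx_mbytes_per_sec(2):.0f} Mbytes/sec")   # ~200

# Compare with the NIO/HVD-SCSI path described above: about 12 Mbytes/sec
# observed against a 20 Mbytes/sec theoretical SCSI limit.
```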
Ldev Count & Spindle Size
The term Ldev in this paper refers to an MPE-configured disk or volume.
The following is a list of features that MPE/iX provides which will help clarify the relationship between
the number and size of Ldevs and their impact on performance:
Extent Striping
Given a single large file, MPE/iX was designed to break up that file into extents
(chunks) and then spread those extents across the different disks within its
volume set (volume group), which we will refer to as "extent striping". This striping
was shown to improve file access on the HP e3000 by spreading the I/O load across
multiple disks, much the same way newer array technology does with its
RAID-0 features.
Asynchronous I/O
MPE/iX is designed to initiate up to 8 individual I/O requests per Ldev (disk or
volume).
XM Facility
XM supports multiple instances of itself for each volume set. This means that XM
single threading is greatly reduced when the HP e3000 is configured to have a
number of user volume sets.
Given the feature set listed above let's look at the advantages of using 8 separate 9Gbyte Ldevs instead of
using a single 72GByte drive.
Extent Striping
With a number of Ldevs MPE will utilize its extent striping feature to improve
performance. This allows MPE to spread the access across multiple I/O paths and
drives.
Asynchronous I/O
MPE/iX is designed to initiate up to 8 individual I/O requests per Ldev (disk or
volume). One big drive (72GByte) will be limited to only 8 asynchronous I/O
requests at any given time, while that same disk capacity (72Gbytes) divided 8
ways allows MPE/iX to issue a total of 64 asynchronous I/O requests for that given
capacity (see the sketch after this list).
XM Facility
The best way to take advantage of XM's support of multiple instances of itself for
each volume set is by dividing the 8 Ldevs into two or more volume sets. This will,
depending on the application's I/O characteristics, also reduce single threading on
XM resources (like the XM recovery log file).
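The concurrency difference above is easy to quantify. The sketch below is illustrative only; the 8-requests-per-Ldev figure comes from the text above, and the round-robin extent placement is a simplification of MPE's extent striping, not its actual allocation algorithm.

```python
# Illustrative comparison of asynchronous I/O capacity for one large Ldev
# versus the same capacity divided into several smaller Ldevs. The limit of
# 8 outstanding I/O requests per Ldev is taken from the text above; the
# round-robin placement below is a simplification of MPE's extent striping.

MAX_ASYNC_IO_PER_LDEV = 8

def max_outstanding_io(ldev_count):
    return ldev_count * MAX_ASYNC_IO_PER_LDEV

print("1 x 72 GB Ldev :", max_outstanding_io(1), "outstanding I/Os")   # 8
print("8 x 9 GB Ldevs :", max_outstanding_io(8), "outstanding I/Os")   # 64

# Simplified view of extent striping: extents of one large file are spread
# round-robin across the Ldevs of a volume set.
def stripe_extents(extent_count, ldev_count):
    placement = {ldev: [] for ldev in range(1, ldev_count + 1)}
    for extent in range(extent_count):
        placement[extent % ldev_count + 1].append(extent)
    return placement

print(stripe_extents(extent_count=16, ldev_count=8))
```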
High Availability Storage
Let's begin our description of storage by limiting it to only those storage products that are available,
purchasable and supported by HP and that work on the HP e3000. I will start by describing entry-level
storage solutions and work up to the top-of-the-line enterprise storage arrays. A good single point of
information on arrays and HA features available for the HP e3000 can be found at
http://jazz.external.hp.com/mpeha/ ; in the left navigation window click on the ha & storage matrix
button.
MPE Mirrored Disk/iX
Though mirrored disk is not an array it is the lowest level entry into high availability data protection. This
technology is based on (SCSI) JBOD and relies on having a single piece of data (file) written to two
different disks that are on two separate paths. This method protects the data from any single point of
failure along the I/O path. The greatest advantage of this technology is its small cost to implement.
The downside of this technology is that it can disable itself under high I/O loads, which leaves the user
unprotected. These false failures can also lead to long recovery times, which again leave the user
unprotected. For this and other reasons the MPE/iX R&D Lab does not recommend this solution for
environments where there is a great deal of I/O activity or where the disk capacity exceeds 50Gbytes of
protected storage.
VA7xxx Storage Arrays
These are HP's entry and mid-range fibre channel high availability storage arrays. The VA7100 (now
discontinued) is a dual controller (active/passive) 1Gbit fibre channel array that is only slightly faster than
the 12H AutoRAID but has many more features and is easier to manage.
All VA7xxx storage arrays must be ordered with dual controllers for MPE/iX, and MPE/iX requires that a
VA management software package called CommandView SDM, used to configure and monitor the VA, be installed on a
separate PC workstation or HP-UX server attached to the VA storage array.
For older NIO-based (SCSI only) HP e3000s, the A5814A (option #003) Fabric Router can be used to connect the fibre
channel arrays. (See the ha & storage matrix for support
information.)
The VA7110 and VA7410 are HP's newer entry and mid-range virtual arrays. Both arrays support
2Gbit redundant controllers. The VA7410 (mid-range array) is better suited for heterogeneous
connections but still doesn't perform as well as some (Ultra-SCSI JBOD) Mirrored Disk/iX solutions. What
you do get is added hardware reliability, added data protection and ease of management, because most
array component failures (disk or media problems) are handled transparently by the array. Adding MPE's
High Availability Failover/iX increases availability for both the VA7110 and VA7410 by protecting the
I/O path components between the HP e3000 and the storage array.
Purchasing array controllers with as much cache memory as possible, along with making sure
there is enough server memory available for MPE/iX so that it does not have to go to disk, will help
improve the performance picture.
There have been a number of VA7xxx configuration change recommendations issued by HP to
resolve HP-UX problems that, when implemented on arrays attached to the HP e3000, have caused major
performance problems. One such change describes reducing the Queue Depth of the VA from the default
value of 750 to a very small number. This change solves an I/O saturation problem that occurs when HP-UX
issues too many I/O requests for the VA to handle. MPE handles its own requests and will only issue
8 I/O requests for each MPE/iX-configured Ldev. To MPE/iX the "Queue Full" message is an unexpected error: at
best, MPE/iX will stop all I/Os on that bus (I/O pathway), abort all the pending I/O requests, reset and
clear the bus, and then restart those pending I/Os each time it encounters this error message; at worst, it will do
a System Abort if this reset and I/O restart happens during critical operating system I/O.
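Since MPE/iX never issues more than 8 requests per configured Ldev, one conservative way to think about queue depth on ports used by the HP e3000 is to keep it at or above the worst case MPE can generate, so "Queue Full" is never returned. The sketch below is only a sizing illustration under that assumption; consult the array's configuration guidance before changing any queue depth setting.

```python
# Sizing illustration only: worst-case outstanding I/O from an HP e3000 on a
# given array port is bounded by 8 requests per configured Ldev. Keeping the
# port's queue depth at or above that bound avoids "Queue Full" responses,
# which MPE/iX treats as an unexpected error. This is a back-of-the-envelope
# check, not a substitute for the array's configuration documentation.

MAX_ASYNC_IO_PER_LDEV = 8

def min_safe_queue_depth(ldevs_on_port):
    return ldevs_on_port * MAX_ASYNC_IO_PER_LDEV

for ldevs in (8, 16, 32):
    print(f"{ldevs} Ldevs on the port -> queue depth of at least "
          f"{min_safe_queue_depth(ldevs)}")
```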
Business Copy and Continuous Access are not supported and there are no plans to release these features
for use on the HP e3000.
Enterprise Virtual Array (EVA)
The EVA is not supported on the HP e3000.
Early indications are that it is much faster than the VA family of storage and has many of the same
features, but is a higher storage capacity and connectivity solution. It remains, however, unsupported
on the HP e3000.
HP SureStore Disk Array XP256
This product was born out of the partnership between HP and Hitachi Storage. It was the replacement
for the EMC Symmetrix line (see the older white paper) and provided many of the same features but added an
increased array cache, faster disks and a host-based program to manage/control the Business Copy
XP and Continuous Access XP high availability features. Though this product used the Hitachi 7700
Storage hardware frame, HP provided value-add with its proprietary firmware base. This means that
the XP256 and the Hitachi version are different, and the Hitachi version will not work with the HP e3000.
Both the XP256 and (older) EMC Symmetrix provided an enterprise storage and connectivity solution for
a heterogeneous server environment with autonomous control within each defined domain or operating
system environment.
Both storage products were originally offered with a number of HVD-SCSI ports/connections and later
the XP256 started supporting fibre channel for HP-UX. Only HVD-SCSI was/is supported for the HP
e3000.
The XP256's much larger cache and faster drives played an important role in reducing the overall
performance problems caused by Read/Write Cache Misses on the HP e3000.
For a list of supported high availability features for this product please see the ha & storage matrix at
http://jazz.external.hp.com/mpeha
The XP256 is no longer available from HP but may still be available on the used product market.
HP SureStore Disk Array XP512/XP48
The XP512 and smaller capacity XP48 were HP's entry into 1Gbit per second fibre channel connectivity.
The XP512/XP48 again supported multi-host heterogeneous environments with up to 16 FC ports.
The earlier versions of MPE/iX released during that year (2000) only supported HVD-SCSI, so HP
OEM'ed a product from Vicom Systems that (basically) converted HVD-SCSI to 1Gb FC for a direct
connection to an XP512/XP48 (no FC switches or FC hubs). This product, referred to as the A5814A SCSI
Fibre Channel Router, is still supported in this environment.
The XP512/XP48 is no longer available from HP but may still be available on the used product market.
HP StorageWorks Disk Array XP1024/XP128
The XP1024 (and smaller XP128) are HP's latest entry into 2Gbit per second fibre channel connectivity.
The XP1024/XP128 again supports multi-host heterogeneous environments and, with HP's latest
(last) released version of MPE/iX 7.5 on PCI-based hardware, supports native fibre channel (no router)
connections to the portfolio of HP FC switches and FC hardware.
Those customers still using older NIO-based (HVD-SCSI only) hardware can upgrade to the latest
version of MPE/iX and use the latest A5814A-003 SCSI-Fibre Channel Fabric Router. This newer
hardware supports the newer VA family and XP family of storage along with the portfolio of FC
switches. (The A5814A is not upgradeable to the A5814A-003.)
The XP1024 (and XP128) support an even larger cache along with previous features like Continuous
Access XP and Business Copy XP. MPE/iX version 7.0 and 7.5 support an enhanced version of MPE's
Failover/iX (I/O path failover) and also the Cluster/iX solution for even greater availability.
As storage technology has improved over time in both performance and connectivity, customers have
increased the number of servers connected, which has increased the demands on that storage. This
increase in demand has sometimes impacted performance on individual servers like the HP e3000.
The increase in the performance of the disks and their ability to transfer data to and from the array cache
and controllers has helped reduce the impact of cache misses. One of the largest impacts on the overall
performance of the array is the effective reduction of the amount of array cache available to each of the
servers. As the number of servers (and their workload) increases, the amount of array cache available to
any one server (such as the HP e3000) decreases. Changes in the workload of the other servers have also been
known to drastically change the performance of jobs and applications running on the HP e3000 (and other
servers) that are sensitive to disk I/O traffic delays.
Another issue observed to impact performance on the array is MPE/iX XM checkpoint
operations. The checkpoints have a lesser impact on overall performance when a large amount of array
cache is available to handle the operation. In the vast majority of instances this is not the case, and these
checkpoint operations do tend to saturate the array cache, which can impact both the performance of the
HP e3000 and, in some cases, other servers connected to the array.
Depending on the application and system I/O activity, the frequency of the checkpoints can become
(somewhat) predictable. MPE/iX can be re-configured to reduce the frequency of these events. When the
XM log file size is increased, the frequency of checkpoints decreases, but the interesting observation is that
the duration of each of those checkpoints does not significantly increase.
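A simplified way to picture that relationship: if a checkpoint is triggered roughly when the XM log fills, then doubling the log size roughly halves the checkpoint frequency for the same write rate. The sketch below is only a rough model under that assumption, with a placeholder workload figure; it does not describe XM's actual trigger logic.

```python
# Rough model only: assume an XM checkpoint is triggered roughly when the
# recovery log fills. Under that assumption, checkpoint frequency scales
# inversely with log size for a fixed write rate. This is not a description
# of XM's actual trigger logic, and the workload figure is a placeholder.

def checkpoints_per_hour(log_mb_written_per_hour, log_size_mb):
    return log_mb_written_per_hour / log_size_mb

WRITE_RATE_MB_PER_HOUR = 2000   # placeholder workload figure

for log_size_mb in (100, 200, 400):
    freq = checkpoints_per_hour(WRITE_RATE_MB_PER_HOUR, log_size_mb)
    print(f"log size {log_size_mb} MB -> ~{freq:.0f} checkpoints/hour")
```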
"Rules of Thumb"
for Improving MPE/iX Disk Performance
What I would like to do now is list some guidelines to help make the migration to an array a better
experience. These are not hard rules, and some flexibility is needed as you incorporate these "Rules of
Thumb" into your environment. The most important thing is to help set realistic expectations as storage
technology changes and improves:
1.
MPE/iX prefers many small Ldevs over a few large Ldevs. Configure your array LUNs in the
preferred 4Gbyte to 18Gbyte range. Also create (more) Ldevs for the system volume set. This
allows MPE/iX to "spread" the file extents and system transient space (the swap area) across a
number of Ldevs (MPE's extent striping implementation). This will give improved performance
over a configuration that uses a single large Ldev.
2.
Restore the application’s data to User Volumes. Create as many User Volume Sets as makes sense.
This reduces the XM bottleneck to that of the master volume of the volume set, adds more pathways
to the data and adds fault containment. This also reduces the amount of data to reload in case of
catastrophic disk failure.
3.
Gather data on the user’s application I/O characteristics. There are several tools available like
GLANCE/XL that can help collect this data.
4.
Do not configure more than 16 Ldevs per fibre channel pathway or bus, and make sure the number of
Ldevs per pathway or bus is supported in that environment (HVD-SCSI should be limited to 8, while
SE-SCSI should be no more than 4; see the sketch after this list). Consolidating a large number of physical
devices onto a smaller number of larger-capacity devices might create a situation where the array is the
bottleneck, while too many Ldevs per pathway or bus can cause the connection technology (FC) to be the
bottleneck. (Your mileage will vary depending on driving habits.)
5.
Purchase a high availability disk drive technology for its feature set and its ability to protect data.
The performance of these products depends on a clear understanding of how they are used, the I/O
characteristics of the application and the way they are configured.
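Putting rules 1 and 4 together, here is a small planning sketch. It assumes only the figures stated in the rules above (the 4-18 Gbyte Ldev range and the per-path Ldev caps); the 300 Gbyte capacity is just an example, not a recommendation.

```python
# Planning sketch combining "Rules of Thumb" 1 and 4 above: carve a target
# capacity into Ldevs in the preferred 4-18 Gbyte range and check how many
# pathways/buses are needed to stay within the per-path Ldev caps. The
# 300 Gbyte capacity below is only an example figure.

import math

LDEV_MIN_GB, LDEV_MAX_GB = 4, 18                                # rule 1
MAX_LDEVS_PER_PATH = {"FC": 16, "HVD-SCSI": 8, "SE-SCSI": 4}    # rule 4

def plan_layout(total_gb, ldev_gb, path_type):
    if not LDEV_MIN_GB <= ldev_gb <= LDEV_MAX_GB:
        raise ValueError("Ldev size outside the preferred 4-18 Gbyte range")
    ldevs = math.ceil(total_gb / ldev_gb)
    paths = math.ceil(ldevs / MAX_LDEVS_PER_PATH[path_type])
    return ldevs, paths

ldevs, paths = plan_layout(total_gb=300, ldev_gb=9, path_type="FC")
print(f"300 GB at 9 GB/Ldev -> {ldevs} Ldevs across at least {paths} FC path(s)")
```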
---------------------------