In the previous chapter, we covered the deployment phase of your SRM solution. We started by reviewing the SRM goals and components, and developed an organizational view of the SRM solution. Next we looked at the different storage management strategies and the various products, listing product selection criteria for the following types of solutions: device configuration and management, enterprise storage management, application-centered SRM, fibre-channel SAN approach to SRM, and policy-based object management. I continued to cover project-management fundamentals, such as the critical path, setting milestones, and performing a risk analysis. I gave you a template for risk-mitigation techniques and identifying sources of problems such as technical issues and people issues. We also looked at change control in the context of extending AD. Finally, I gave you some sample success-measurement criteria to help you define your objectives for this phase of the project.
In this chapter, we will continue the focus on project management, as you complete the deployment by setting up systems to monitor and maintain the SRM solution. In addition, we will cover the technical aspects of what you need to monitor and which solutions are available. I will give you a complete list of system management recurring tasks that you can use to make sure that you have all your operations in place, including SRM functions.
The goal of this chapter is to aid you in developing a daily approach to SRM that automates the repetitive tasks, for example monitoring disk usage by using the SRM software that we have discussed. This automation will free your time for other crucial tasks—such as maintaining your security defenses—that might be overlooked, as you only have so much time, and must continually fight to ensure that your priorities match those of the business. Table 7.1 shows
Phase 6 in the overall SRM deployment methodology.
Phase 6: Maintain. Continue to support the solution and prepare to improve as needed. Tasks: monitor disk usage; add storage as needed (hopefully only for performance upgrades or to replace defective hardware).
Table 7.1: Phase 6 of the SRM deployment methodology.
At this point, you should be polishing off any rough edges in your SRM deployment, and you’ll find that this task is the easiest part of the deployment. You will have the opportunity to see the benefits of SRM, and to work on automating the monitoring and management process. Change of plan—this urgent message just came in—we have a security violation on our network that must be dealt with immediately!
When our security model is compromised, all else takes a back seat—SRM becomes less important than storage resource protection. Security should not be omitted in any deployment.
Throughout this SRM discussion, security may have been given lesser priority, but in this
chapter, we will increase the priority of security in the context of your SRM deployment. Recent virus outbreaks have either tested whether you have been updating your systems’ security patches or given you a chance to validate your data recovery procedures. Lately, much effort has been spent just keeping our systems safe from harm, and that effort has produced renewed attention to security and a new security initiative from Microsoft.
There are several things that you can do to improve your security immediately. First is subscribing to the Microsoft Security Notification Service. To subscribe to this service, send an email to [email protected] (no need to put anything in the subject line or message body).
More information can be found at http://www.microsoft.com/technet/security/bulletin/notify.asp.
The next thing that you can do is to download and run several Microsoft-provided security tools.
Many security flaws or problems have been found in Internet Information Server (IIS), which is a component of the default installation of Win2K, so many of the tools focus on IIS. A good starting point is the Microsoft Security Tool Kit, as it packages several tools and recent patches to the OS, IIS, and some applications such as Internet Explorer (IE).
Microsoft Security Tool Kit
Microsoft designed the Microsoft Security Tool Kit for protecting systems from Internet threats, which are primarily focused on IIS, but might also attack the core OS. You can order the security tool kit CD-ROM, at no charge for US customers, at http://www.microsoft.com/security/kitinfo.asp, or if you or someone in your office is a TechNet subscriber, you have the first release of the toolkit in the November 2001 edition and a standalone version in the December 2001 edition. For more information about the toolkit, go to http://www.microsoft.com/technet/security/tools/stkintro.asp and read the Microsoft article
“Release Notes for the Microsoft Security Toolkit” at http://support.microsoft.com/default.aspx?scid=kb;EN-US;q309536. You can also spend an hour and a half watching the support web cast “Using the Microsoft Security Tool Kit to Get and Stay
Secure,” which is available at http://support.microsoft.com/servicedesks/webcasts/wc121301/wcblurb121301.asp.
The Microsoft Security Toolkit CD-ROM includes:
A custom version of Win2K SP2 for use in the security toolkit’s automated deployment.
A patch for the SSI Privilege Elevation Vulnerability
An MSI package for installing the Microsoft Windows Critical Update Notification.
The IIS Lockdown Tool, which provides several configuration templates, including a server that does not require IIS.
URLScan, which is integrated into the IIS Lockdown Wizard, screens HTTP requests and allows only those requests that comply with a rule set that the Administrator created.
Additional updated software such as IE 5.01 SP2 and IE 5.5 SP2, and Windows Media
QChain, for applying several hotfixes with only one reboot.
To apply hotfixes using QChain, I create a folder containing QChain.exe, renamed to !QChain.exe so that the FOR loop (which matches Q*.exe) does not call it as if it were a hotfix. I then add the Apply_ALL_HotFixes_in_this_folder.cmd and HOTFIX.CMD batch files to the same folder. Apply_ALL_HotFixes_in_this_folder.cmd contains:
for %%A in (Q*.exe) DO CALL HOTFIX %%A
REM Run QChain after all hotfixes so that a single reboot suffices
!QChain.exe
echo %computername% >> ServersQ_Done.txt
SHUTDOWN /C /R /T:99
The FOR loop calls the HOTFIX.CMD batch file once for each Q##### hotfix in the folder. HOTFIX.CMD contains a single line, which runs the hotfix (%1) with the no-reboot (-z) and unattended (-m) switches:
%1 -z -m
Before you install the toolkit, Microsoft recommends that you back up your system and update the Emergency Repair Disks (ERDs). After you finish these tasks, the next step is to determine the current state of the system. You can use the Microsoft Network Security Hotfix Checker
(Hfnetchk.exe, which is available on the toolkit CD-ROM in the
COMBINED\tools\HFNetCheck folder) to determine which security fixes are currently installed.
According to the documentation, Hfnetchk compares the security patches on your system to those listed in a Microsoft-maintained database that is updated each time the company issues a new security bulletin. Hfnetchk will assess patch status for NT 4.0 and Win2K as well as hotfixes for IIS 4.0, IIS 5.0, SQL Server 7.0, SQL Server 2000 (including the Microsoft Data Engine—MSDE), and IE 5.01 or later. The security toolkit doesn’t install Hfnetchk by default, and this tool isn’t required on every system.
Hfnetchk offers a surprising range of options for scanning servers. You can scan servers by server name, using the command
Hfnetchk.exe -h server1,server2,server3
by IP range, using the command
Hfnetchk.exe -r 192.168.0.1-192.168.0.254 -z -v
and by domain, using the command
Hfnetchk.exe -d domain_name -t 128 -s 2 -z
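If you sweep many servers on a recurring schedule, the command line can be assembled programmatically before handing it to a scheduler. A minimal sketch, with placeholder server names (on a real Win2K host you would pass the list to something like subprocess.run):

```python
# Sketch: assemble an Hfnetchk command line for a list of servers.
# Server names are placeholders; -h takes a comma-separated list,
# -z skips registry checks, and -v produces verbose output.
def hfnetchk_command(servers):
    return ["Hfnetchk.exe", "-h", ",".join(servers), "-z", "-v"]

print(hfnetchk_command(["server1", "server2", "server3"]))
# ['Hfnetchk.exe', '-h', 'server1,server2,server3', '-z', '-v']
```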
The starting point for the security toolkit is the readme.htm file. If you are logged on as an
Administrator on the system that you want to secure, click the Install Now link to start the installation process. Readme.htm also has information about performing manual installations for new and existing NT 4.0 and Win2K systems and how to deploy the toolkit using Microsoft
Systems Management Server (SMS). Before running the tool, you might want to run through the
IIS 5.0 Baseline Security Checklist available at http://www.microsoft.com/technet/security/tools/iis5cl.asp.
After you finish installing the security toolkit, view the Mstsetup.log file (located in the
%systemroot% folder) to see the updates identified for installation and whether these installations were successful.
Part of hardening a system for security is to remove unnecessary services, so you may also consider removing IIS from your default installation if it is not needed on the server. The IIS
Lockdown tool can assist in removing or disabling IIS services such as HTTP, FTP, SMTP, and
NNTP. This tool is available separately for download at http://www.microsoft.com/Downloads/Release.asp?ReleaseID=33961. The Windows 2000
Server Baseline Security Checklist is available at http://www.microsoft.com/technet/security/tools/w2ksvrcl.asp.
Whew, now that we’ve secured our systems, we can get back to the business of SRM. The challenge in managing any storage environment is how to be proactive instead of reacting to catastrophic events. What can we learn from the top professionals in enterprise organizations, the largest consumers of storage? If we follow their lead, we will already know the questions that must be asked, and how to find the answers, such as “How do I know if my storage is online and performing as well as it should?”
Table 7.2 lists the types of events most likely to happen in your environment, and some planned responses. What you can gain from the table is seeing the importance of SRM. Perhaps you are reading this book with the thought of doing SRM yourself, without a third-party application. If so, you will need the following contingency plans.
Anticipated Event: Planned Response

Running low on disk space: Notify users; prepare a user report and an administrative report about what can be removed; if space is dangerously low, block writes until files are removed.

New users and home directories: Add new users to the existing storage policy (how much space is allocated, which types of files are not allowed). Ensure that no users exist outside of policy.

New folders or subdirectories: Ensure that existing storage policies are applied to the new folders.

New viral attacks: Prevent viruses from writing files by using NTFS permissions and identifying the viral files (sometimes creating a read-only preexisting file to stop the viral action).

New storage systems and SANs: Carve out the storage and allocate it to application servers and file servers. Apply storage policy to the allocated storage. Ensure that storage systems and SANs are part of the management framework. Understand how to deal with specific events, such as device failure.

Disk drives added to the servers: Add the disks to existing arrays (if supported) or create new arrays and logical disks. Ensure that the disks are part of the storage policy.

Orphaned user accounts and files (through long periods of inactivity): Plan to identify and clean up these objects periodically.

Minor service interruptions: Resolve items such as loss of power, cable damage or failure, operator error, or other service errors such as GC or domain controller failures precluding authentication.

Device failure or data corruption necessitating recovery procedures: Identify proper procedures and ensure that recovery hardware is on standby.
Table 7.2: Anticipated storage events and planned responses.
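One contingency from Table 7.2, cleaning up files orphaned by long inactivity, can be sketched as a script that flags files not modified within a cutoff period. The root path and the 365-day default are assumptions for illustration:

```python
# Sketch: list files under a root folder that have not been modified
# in a given number of days, as candidates for archive or cleanup.
# The 365-day cutoff is an assumed policy, not a recommendation.
import os
import time

def stale_files(root, days=365):
    cutoff = time.time() - days * 86400
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) < cutoff:
                stale.append(path)
    return stale
```

In practice you would feed the resulting list into a user report (as in the low-disk-space response) rather than delete anything automatically.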
You can monitor and measure Win2K stability using similar methods as you use with other applications, primarily by using an application monitor (such as Microsoft Operations
Manager—MOM, which I’ll discuss later) to watch event logs for the dirty shutdown event (ID
6008) followed by the system startup event (ID 6005).
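The pairing logic is simple enough to sketch. This assumes the events have already been exported as (timestamp, event ID) tuples in chronological order, which is a stand-in for reading the actual event log through a monitoring tool:

```python
# Sketch: detect dirty shutdowns by pairing Event ID 6008 (unexpected
# shutdown) with the next Event ID 6005 (event log service started).
# The (timestamp, event_id) record format is an assumption standing in
# for a real event log export; timestamps here are arbitrary integers.
def find_dirty_shutdowns(events):
    incidents = []
    pending = None  # timestamp of an unmatched 6008
    for timestamp, event_id in events:
        if event_id == 6008:
            pending = timestamp
        elif event_id == 6005 and pending is not None:
            incidents.append((pending, timestamp))  # (crash, restart)
            pending = None
    return incidents

log = [(100, 6005), (500, 6008), (530, 6005), (900, 6006)]
print(find_dirty_shutdowns(log))  # [(500, 530)]
```

Each (crash, restart) pair also gives you a rough downtime figure for availability reporting.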
If your server is experiencing stop errors, see the following articles for information about how to use the crash dump information recorded in the Memory.dmp file: “Gathering Blue Screen Information
After Memory Dump” at http://support.microsoft.com/directory/article.asp?ID=KB;EN-US;Q192463& and “Blue Screen Preparation Before Contacting Microsoft” at http://support.microsoft.com/directory/article.asp?ID=KB;EN-US;Q129845&.
For information about troubleshooting failed applications, see the article “How to Install Symbols for
Dr Watson Error Debugging” at http://support.microsoft.com/directory/article.asp?ID=KB;EN-
Another product that performs Win2K monitoring offers a visual perspective—Quest Software’s
Spotlight on Windows. This product’s UI looks like it should also play CDs or mp3s, but it actually provides an all-in-one view of how a server is performing, including items such as free disk space and disk I/O (reads/second and writes/second). This tool offers more functionality than Windows Performance Monitor provides, featuring an analysis of performance data and an online tuning guide.
Storage Event Monitoring
When we are forced to be in reactive mode, which is inevitable as devices fail and software crashes, the key is how quickly we can find out that there is a problem and how extensive the information is that we can gather. Quite often we find that the failure was preceded by several warnings, such as several instances of Event ID 9 (source: scsi miniport driver), which states
\Device\ScsiPortX, did not respond within the timeout period
followed by an Event ID 11 (source: scsi miniport driver), which states
The driver detected a controller error on Device\ScsiPortX.
Even Win2K Performance Monitor can be set to monitor many servers with a very infrequent polling interval; just remember to change the service startup to use a domain account with sufficient credentials. You can monitor free disk space on logical drives (by enabling disk counters using the diskperf -y command, as discussed in previous chapters), or you can monitor a counter such as system uptime, just to make sure the system is online.
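The free-space check itself can also be sketched outside of Performance Monitor. This is a minimal illustration; the paths to monitor and the 10 percent threshold are assumptions:

```python
# Sketch: report logical drives whose free space has dropped below a
# threshold, similar in spirit to a Performance Monitor free-space alert.
# The path list and 10% default threshold are illustrative assumptions.
import shutil

def low_space_report(paths, min_free_ratio=0.10):
    alerts = []
    for path in paths:
        usage = shutil.disk_usage(path)
        free_ratio = usage.free / usage.total
        if free_ratio < min_free_ratio:
            alerts.append((path, round(free_ratio * 100, 1)))
    return alerts

# Example: check the drive holding the current directory; an empty
# list means no monitored volume is below the threshold.
print(low_space_report(["."]))
```

A scheduled task running a check like this, with the alert list mailed to the administrator, covers the "running low on disk space" contingency from Table 7.2.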
Another choice is a vendor-provided storage monitoring application, such as the Web-based view of a direct-attached RAID controller, which Figure 7.1 shows. The information shown in this figure is from Compaq Insight Manager, which provides information about devices attached
to the server: controllers, disks, storage boxes (cabinets), and so on, and is available at http://www.compaq.com.
Figure 7.1: Web-based view of a direct-attached RAID controller.
This type of vendor-provided application is useful for storage event monitoring as it shows degraded and failed devices, which you can see in the Condition Legend. As the figure shows, the controller is in a degraded state as an array is being rebuilt (the error code states Expand in Progress). Where this product falls short is that a view or state must be determined for each server and rolled up to a centralized hierarchy, and this model may not apply when you are dealing with multiple applications sharing a pool of storage. So we must also consider storage monitoring from an application perspective.
Storage Application Monitoring
In the previous chapters, we have gone through the process of designing, testing, and installing your deployment of an SRM application. At this point, we must address the question, Who will monitor the storage resource monitor? How will we know that the SRM application is online and performing its duties? The answer lies in another layer of monitoring—in application monitoring and management. There are a wide variety of application-monitoring products including those focused on SAN device management, which we touched on in the last chapter.
The wide variety of storage and SAN monitoring tools presents several challenges. First, it presents a variety of interfaces or methods of managing the storage, as there is little commonality between the vendors. Second, vendors must develop a product that manages a wide variety of devices that offer varying degrees of interoperability or have limited standards. So the end result is a multitude of specialized management applications with little centralization. At this point in technology evolution, our best choice for centralization is a management and monitoring application that relies on gathering information from the servers (and other similar devices that include event logging, such as a SAN management appliance based on Win2K) attached to the network.
“But,” you ask, “what will monitor the monitoring application? How will we know that the application-monitoring application is running?” Fair question; let’s take a look at one application-monitoring package, MOM, that includes management packs to monitor itself!
So much press and publicity has been focused on MOM lately that I think it is beneficial for storage architects and storage administrators to pay attention. I have heard critical reviews of
MOM’s difficulty and shortfalls, but as with any Microsoft product, the lessons from the field will be turned into a better and perhaps more successful product. If you are unfamiliar with
MOM, it is a server- and application-monitoring product for which Microsoft bought the code from NetIQ. So, if you are familiar with the NetIQ product functionality, MOM will be familiar.
If not, you might find getting started with MOM difficult and overwhelming. I’ll give you a quick lesson in how to get started with MOM, and we’ll look at how MOM integrates or will be integrated with storage management.
Perhaps the hardest part of getting started with MOM is to meet the prerequisites. It is doubtful that you have all of them in place. The server on which you choose to install MOM is known as the central computer. This server will act as the database collection point and the management console. It should be a member of a domain, but not a domain controller, or MOM will refuse to install.
First, run Office Setup, and install the Office graphing component and Access 2000 (the full version of Access 2000 is required for creating or customizing reports, whereas, the run-time version of Access 2000 is required to run and view reports). The run-time version of Access
2000 is available on the MOM CD-ROM in the \Intel\Access2000RT folder.
Next, update %systemroot%\system32\inetsrv\browscap.ini if you’re not using IE 6.0.
Supposedly (according to the MOM product documentation), this file is downloadable from the
Microsoft Web site, but I couldn’t find it; fortunately, the MOM setup application takes care of updating browscap.ini. Optionally, you can install Outlook 98 or later to send email notifications through MOM.
Next, install SQL Server 2000, and set a password on the sa account. If you’re installing MOM on an existing SQL Server, run the sp_helpsort query to ensure that the sort order is case insensitive, and verify that the audit level of SQL
Server is set to None or Failure (check the audit level on the Security tab of the server’s properties page). Ensure that the MSSQLServer, MSDTC, and SQLServerAgent services are running and set to start automatically on computer startup.
MOM requires Microsoft Data Access Components (MDAC) 2.6 or later. As Figure 7.2 shows, the MOM setup program will verify this prerequisite and give you the option to install MDAC.
Figure 7.2: MOM installation verifies prerequisites and can update MDAC.
Next, increase the log file size of the Microsoft Distributed Transaction Coordinator (MSDTC).
As Figure 7.3 illustrates, the MOM installation program gives you the option to increase the
MSDTC log file size, and it can launch the Component Services MMC for you. In the MMC, right-click My Computer, and select Stop MS DTC. Right-click My Computer, and select
Properties to access the MSDTC log file settings. Increase the log file size as much as possible, with 512MB being a recommended minimum for production environments (possibly on its own drive array), and 64MB a recommended minimum for small or test environments. Clicking OK to confirm the changes has the same effect as clicking Reset Log. On the pop-up warning message, click Yes only if you are sure that it is OK to reset this log on your system. Finally, right-click My Computer, and select Start MS DTC.
Figure 7.3: The MOM installation program gives you the option to increase the MSDTC log file size.
During installation, you might want to add Management Pack Modules as Figure 7.4 shows.
Figure 7.4: Adding Management Pack Modules during MOM installation.
The next step in setting up MOM is to add the servers that will be monitored. MOM will discover the servers and push out agents to them if you authorize it. This process isn’t well documented in MOM, so I have illustrated it. The first step is to right-click the Agent Managers folder in the MOM Administrator Console, and open the properties, as Figure 7.5 shows.
Figure 7.5: Accessing the properties of the Agent Managers folder is the first step in selecting the computers to be managed by MOM.
Next, on the Managed Computer Rules tab, click Add, which will take you to the window that
Figure 7.6 shows. This window lets you enter the domain name of the servers and a rule for matching the server names. If you want to find all computers in the domain, just enter an asterisk (*). You can approve or reject the installation of MOM agents individually, so you don’t need to worry about finding too many computers at this point (unless MOM has been previously configured to install without confirmation, but that is not the default setting).
Figure 7.6: Selecting servers to monitor in MOM.
After MOM discovers the servers, you will see them listed in the Pending Installation folder under Configuration, as Figure 7.7 shows. In this figure, I have three new servers on which to install MOM, pending my approval.
Figure 7.7: List of computers pending installation of MOM agents.
Other Applications and MOM Integration
As Figure 7.8 shows, you can use Microsoft Visio Professional 2002 to diagram a SAN, a process that is made easier by an add-in called Netreon SANexec, part of the Netreon SANexec Manager kit (http://www.netreon.com). Netreon also offers a wizard-driven SAN-management tool that can assist in tasks such as zoning. The SANexec Manager kit integrates with AD and MOM and can trigger notification of a problem with a Brocade SAN switch or congestion in the fibre-channel fabric.
Figure 7.8: Using Microsoft Visio and Netreon SANexec to diagram a SAN.
Improving the System
Although maintaining the status quo and avoiding problems is a good starting point, there is also a need to work to improve your systems. From the business standpoint, information systems are designed to give your business a competitive advantage. On the horizon, there are always competitors to internal information systems departments—service providers—whose mission is to sell to the business the same IT functions that you, as a network administrator, provide, but on an external basis. For the service providers to be successful, they must provide competitive systems offerings, such as more efficient operations at a lower cost. They can also provide competitive advantages such as higher performance or guaranteed availability. If the business’s internal information systems operations cannot provide these desired advantages, then we are in danger of losing our jobs to service providers. One way to ensure our job security is to maintain availability.
In working with customers, I’ve determined that the key to maintaining and improving availability is to understand two components: the mean time between failures (MTBF) and the mean time to recover (MTTR). Combined, MTBF and MTTR determine the system availability.
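The relationship can be stated directly: availability is MTBF divided by the sum of MTBF and MTTR. A minimal sketch, with illustrative figures rather than vendor numbers:

```python
# Availability from MTBF and MTTR:
#   availability = MTBF / (MTBF + MTTR)
# The figures below are illustrative, not vendor specifications.
def availability(mtbf_hours, mttr_hours):
    return mtbf_hours / (mtbf_hours + mttr_hours)

# 2,000 hours between failures and 2 hours to recover yields
# roughly 99.9 percent availability.
print(round(availability(2000, 2) * 100, 2))  # 99.9
```

Note what the formula implies: halving MTTR improves availability just as surely as doubling MTBF, which is why recovery procedures deserve as much attention as redundant hardware.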
Most studies of system downtime, especially those focused on storage systems, list the top causes of service interruptions as hardware failures, software crashes, and operator errors. So the best way to ensure availability is to implement redundant hardware systems and provide adequate operator training and change control. To protect against software crashes (in addition to
change control, which can help minimize untested configurations from being deployed) and to protect against any remaining hardware faults, you can use clustering technologies.
There are many types of clustering in the Windows environment, from the Wolfpack clusters of
Microsoft Cluster Server (MSCS) to devices that run multiple servers in lockstep, such as
Marathon Technologies’ Endurance solution (http://www.marathontechnologies.com). Both of these clustering options require specialized hardware that can add significantly to the cost. For
MSCS, the storage must be on a bus that can be shared, either external SCSI (which limits both the distance and number of hosts) or fibre-channel (which is more flexible but also more costly).
Endurance requires a dual set of servers to separate the compute element from the I/O Processor (IOP), which maintains the storage and network connections. To connect the compute element and the IOP, the solution uses proprietary boards (Marathon Interconnects—MICs), essentially as an extension of the system bus. This setup lets the compute element and IOP pair be redundant (for a total of four physical servers acting as one logical server) and separated at a distance, up to the acceptable latency limits of the fibre-channel interconnects. In the near future, we may see these proprietary interconnect boards replaced by industry-standard InfiniBand boards.
Another option is to use servers that mirror all internal devices, including processor and memory, running all internal operations in lockstep. For any of these, the additional cost must be justified against the desired improvement in availability, or at least the improvement in MTBF, because if something does go wrong on one of these specialized systems, you need to look at MTTR, and it may be more difficult to recover than on a standard server.
The primary method for reducing MTTR is to ensure that suitable system and information backups are being performed, and to ensure that recovery procedures are valid. The difficulty is gauging the value of these operations against the cost of performing them. The experienced IT manager or CIO knows the value of practicing recovery operations and decreasing recovery times, but you can easily let this necessity fall behind in day-to-day priorities and activities.
File Share Security
Up to this point, I have touched briefly on NTFS security, so let’s consider this a final review of the subject, and perhaps a final examination to see how well you do. Recently I worked on a project in which a shared directory was needed so that vendors and employees could place files in it, but they could not browse the directory or open the files. I was surprised at how difficult this process turned out to be for some systems administrators. Once you follow these steps, the process will make sense as you relate it to your knowledge of inheritable NTFS permissions, but if you perform the steps in the wrong order, the process will not work, which is what was preventing the systems administrators from creating the secure drop share.
Creating a Secure Drop Directory
The design goal is a drop folder into which anyone on the network can drop files but whose contents only a select administrator can view or execute. For example, suppose the secure drop folder is called ztest. Two users, AB User and Secure (or groups of users instead of the individual accounts used in this scenario), can see the folder over the network. A member of the local Administrators group must be logged on to view the folder’s properties. The share ztest should be under the Full Control of the account Secure. The share ztest should be visible to AB User over the network and allow write-only access to this user. As Figure 7.9 illustrates, if AB User attempts to open the ztest folder, access is denied.
Figure 7.9: Attempts by a non-administrator to open the secure drop share ztest are denied.
However, copying a file to the folder by any authenticated user does not result in access denied, as Figure 7.10 illustrates.
Figure 7.10: Non-administrator users can copy a file to the ztest folder.
The process of creating a secure drop share is not that tricky, but it has a few steps that must be done in the right order or it just won’t work. Figure 7.11 shows the desired permissions for the
Secure account, which will have Full Control access to the ztest folder.
Figure 7.11: Desired permissions for the Secure account.
Figure 7.12 shows the correct permissions for the AB User account, which allow the user write-only permissions on the ztest folder.
Figure 7.12: Desired permissions for AB User account.
Figure 7.13 shows what happens if you attempt to set the NTFS permissions during the share creation process. This error message can prevent some administrators from attempting to use this configuration. As the figure shows, if the process is performed in this order, you cannot even create the share!
Figure 7.13: The error message that results when permissions are applied during the shared folder creation.
Instead of using Windows Explorer to create shares, you can use the Computer Management MMC snap-in, which Figure 7.14 shows. Doing so has two advantages: you can create shares on a remote computer, and setting permissions on the folder is easier.
Figure 7.14: Using the Computer Management snap-in to remotely create a shared folder.
The following steps walk you through how to create and configure the ztest folder:
First, create the folder on an NTFS partition, as Figure 7.15 shows.
Keep the default permissions for now (Everyone Full Control).
Share the folder either from the computer using Windows Explorer or remotely using the
Computer Management snap-in.
Keep the Share Permissions as Everyone Full Control by accepting the default basic share permissions “All users have full control.”
Figure 7.15: Creating the shared folder.
Open the newly created share properties and select the Security tab, as Figure 7.16 shows.
Remove the Everyone group.
Figure 7.16: The default share permissions.
If you use Windows Explorer to create the share, you usually cannot remove the Everyone group. As Figure 7.17 shows, an error message occurs if you attempt to remove the Everyone group from a folder that inherits permissions.
Figure 7.17: Error message that results from attempting to remove the Everyone group.
The big advantage of using the Computer Management snap-in to create and configure the shared folder is that the snap-in automatically clears the Allow inheritable permissions check box. However, if you are using Windows Explorer, you can work around the error message in Figure 7.17 using the following steps:
As Figure 7.18 shows, if you’re using Windows Explorer to create and configure the share, you must clear the Allow inheritable permissions check box before you can configure the permissions. As the figure shows, the permissions check boxes will be grayed out until you clear this check box.
Figure 7.18: Clear the Allow inheritable permissions check box to configure the folder’s permissions.
After you clear this check box, you will be presented with the pop-up message that Figure
7.19 shows. Click Remove in this dialog box.
Figure 7.19: The pop-up message that results from clearing the Allow inheritable permissions check box.
In the Customize Permissions window, which Figure 7.20 shows, clear the Read check box under Allow, but leave the Write check box under Allow selected.
Figure 7.20: Setting write-only permissions for the secure drop share.
As Figure 7.21 shows, AB User cannot even see the size of files and folders.
Figure 7.21: AB User cannot see the size of files and folders.
Ongoing Process of Storage Management
From here, the process of storage management will consist of everything from designing and deploying the appropriate storage systems to developing the techniques to manage them.
Traditionally, storage management starts with defining the types of information that will be stored and developing the appropriate type of storage to house it. Next, information is classified and prioritized so that appropriate protection and disaster recovery procedures can be implemented. Once the storage is online and adequately protected (from both a security perspective and a fault-protection perspective, such as RAID), storage monitoring begins, ensuring availability and performance. The integrity of the information must be maintained as well (protecting the data from corruption as well as ensuring that the information is necessary for the business). Finally, future storage requirements will need to be anticipated and met; this process involves everything from selecting the best technology to making sure that the technology is distributed and used wisely. Table 7.3 will help you keep track of the recurring tasks that comprise storage and information systems management.
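As a small illustration of the "anticipate future requirements" step, the following Python sketch fits a linear trend to weekly utilization samples and estimates when a volume will fill. The sample figures and the 200GB capacity are invented for the example; in practice you would rely on the trend-analysis reports of the SRM tools discussed earlier:

```python
# Minimal linear-trend forecast of disk utilization.
# used_gb holds hypothetical weekly samples; we fit used = a + b*week
# by ordinary least squares and extrapolate to the volume's capacity.

used_gb = [120.0, 126.5, 131.0, 138.2, 144.9, 150.3]  # one sample per week
weeks = list(range(len(used_gb)))

n = len(weeks)
mean_x = sum(weeks) / n
mean_y = sum(used_gb) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(weeks, used_gb)) / \
        sum((x - mean_x) ** 2 for x in weeks)
intercept = mean_y - slope * mean_x

capacity_gb = 200.0  # hypothetical volume size
weeks_left = (capacity_gb - intercept) / slope - weeks[-1]
print(f"growth: {slope:.1f} GB/week, full in about {weeks_left:.0f} weeks")
# prints: growth: 6.1 GB/week, full in about 8 weeks
```

Even this crude fit is enough to turn "we seem to be running out of space" into a concrete procurement deadline; the SRM packages simply automate the sampling and reporting.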
Monitor server and storage availability
    Recurrence: Periodically, such as a ping every hour
    Time estimate: Hours, unless automated
    Description: Verify servers are online and not reporting any failed hardware or predicted hardware failures (such as members of RAID disk sets), verify that services are running, and verify that disk storage is available to users and applications
    Resources: SRM tools; Win2K Performance Monitor; application monitors; hardware reporting tools; and the system information in the Computer Management console

Monitor information flow
    Recurrence: Daily, on demand
    Time estimate: Hours, unless automated
    Description: Ensure that network transport is active and mail queues or file transfers are not queuing up
    Resources: Application or transport-specific automation tools

Troubleshoot and support
    Recurrence: Daily, as needed
    Time estimate: 2 to 4 hours
    Description: Respond to support requests on the end-user or server level, depending on your role
    Resources: Help desk support system

Document change control
    Recurrence: Daily, as needed (as changes occur)
    Time estimate: 30 minutes
    Description: Record changes to the servers, storage, and network environment
    Resources: Auditing or surveying tools

Application data and system state backups
    Time estimate: 4 to 8 hour window; hands-on time is minimal
    Resources: NT Backup and other backup software

Review security logs
    Time estimate: 30 minutes
    Description: Review Windows event logs and application-specific logs
    Resources: Event log filtering, and monitoring tools that read the event logs (for example, MOM)

Monitor storage utilization
    Recurrence: At least weekly; automation makes this process more real-time
    Time estimate: Hours, unless automated
    Description: Check for available free disk space, perform usage forecast (trend analysis), and run reports on duplicate, aged, and unwanted file types
    Resources: SRM tools and Win2K

Perform disk maintenance
    Time estimate: 1 to 2 hours per system
    Description: Includes offline defragmentation, removing temporary files, and so on
    Resources: Disk defragmenter or database-application specific utilities

Patch or update systems
    Recurrence: Monthly, or more often if there is a security issue
    Time estimate: 1 hour per system; you might need to schedule change control if updates involve downtime
    Description: Hotfixes and service packs for OSs, and firmware updates
    Resources: Windows Update, security bulletins, and the QChain tool for applying Win2K hotfixes

Document environment and current project status
    Recurrence: Daily or weekly, as needed (as changes occur)
    Time estimate: 30 minutes
    Description: Update the documentation of the network and storage systems and prepare reports for management
    Resources: Visio, SMS, and other tracing tools; project status emails; and Microsoft Project Gantt charts

Monitor server and storage performance
    Recurrence: At least weekly; automation makes this process more real-time
    Time estimate: 15 to 30 minutes
    Description: Measure the storage performance compared with baseline or last-known state—Is performance adequate?
    Resources: Monitor and hardware-specific tools such as SCSI or fibre-channel diagnostic utilities

Check for inactive user and computer accounts
    Recurrence: Monthly
    Time estimate: 15 to 30 minutes
    Resources: Resources depend on the directory (for example, Active Directory Users and Computers for AD)

Test backup and recovery procedures
    Recurrence: Annually or quarterly
    Time estimate: 2 to 6 hours
    Description: Perform an offline server recovery and ensure that backups and recovery procedures are valid
    Resources: Backup software and standby recovery systems

Perform security audit
    Description: Perform intrusion-detection audits and attempts to breach security
    Resources: Intrusion detection tools, security consultants, and the Microsoft security toolkit

Research new technologies
    Recurrence: Annually, as needed
    Description: Keep up-to-date on improvements in technologies and update professional certifications
    Resources: Web sites and email subscriptions such as InfoStor news and Storage UPDATE from Windows & .NET Magazine

Ongoing upgrades to servers and storage systems
    Recurrence: Annually, as needed
    Description: Keep track in configuration log, including needed changes

Keep a sense of humor (here, I'm only joking!)
    Resources: Dilbert books and cartoons

Table 7.3: Storage systems management recurring tasks.
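Two of the recurring tasks in Table 7.3, checking free disk space and reporting on aged files, illustrate how much of this list can be scripted even without a full SRM suite. The following Python sketch is a minimal example; the paths, thresholds, and age cutoffs are placeholders you would tune to your environment:

```python
import os
import shutil
import time

def check_free_space(path, min_free_fraction=0.15):
    """Return (ok, free_fraction) for the volume containing path."""
    usage = shutil.disk_usage(path)
    free_fraction = usage.free / usage.total
    return free_fraction >= min_free_fraction, free_fraction

def aged_files(root, max_age_days=180):
    """Yield files under root not modified within max_age_days."""
    cutoff = time.time() - max_age_days * 86400
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                if os.path.getmtime(full) < cutoff:
                    yield full
            except OSError:
                pass  # file vanished or is unreadable; skip it

ok, free = check_free_space(".")
print(f"free space {'OK' if ok else 'LOW'}: {free:.0%} available")
for path in aged_files(".", max_age_days=365):
    print("aged:", path)
```

Scheduled to run daily (for example, via the Task Scheduler on Windows or cron elsewhere) and wired to an alerting mechanism, a script like this approximates the "monitor storage utilization" row of the table; the SRM tools add the forecasting, duplicate detection, and reporting on top.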
The goal of this chapter was to aid you in developing a daily approach to SRM that automates repetitive tasks, such as monitoring disk usage with the SRM software that we have discussed. This approach will free your time for other crucial tasks, such as maintaining your security defenses, which we explored. We finished the SRM deployment by setting up systems to monitor and maintain the SRM solution. We covered the technical aspects of what you need to monitor and what solutions are available. I gave you a complete list of systems management recurring tasks that you can use to make sure that you have all your operations in place, including SRM functions. Without a daily approach, important tasks might get squeezed out, as you only have so much time, and must continually fight to ensure that your priorities match those of the business.
In the next chapter, we will look at the future of storage and SRM, including both hardware technology and software changes. First, we’ll look at the immediate future—at changes that are happening all around us—that you’d be wise to learn about and consider. Then I’ll take a more predictive look into the future and attempt to divine what the predominant or surviving technologies and standards will be.
Much of the next chapter will focus on networked storage, as that is clearly where the most improvement and increases in adoption will occur. In the area of hardware, we’ll look at changes in speeds and feeds as we get faster pipes and possibly even greater distances. One of the upcoming changes is in virtualization of devices and storage, which we touched upon earlier. In the next chapter, we will also look at what these changes mean from a storage management perspective. We will look at the server side of storage networks, changes in host bus adapters
(HBAs), booting from the SAN, and multi-path I/O and what it means for performance and fault tolerance.
No discussion would be complete without covering disaster recovery, so we will look at distance mirroring, cloning and snapshots, and serverless backup. Some of these technologies exist today, albeit in their infancy, so we will look at where they will need to go to speed adoption.
Some of these advancements will require parallel improvements in the Windows OS, so we will look at the next generation—.NET Server—and what it will provide to enhance our storage experience. Finally, we will look at what changes we will need to make from an operational and procedural perspective.
© 2001 Realtimepublishers.com, Inc.
All rights reserved. This site contains materials that have been created, developed, or commissioned by, and published with the permission of,
Realtimepublishers.com, Inc. (the “Materials”) and this site and any such Materials are protected by international copyright and trademark laws.
THE MATERIALS ARE PROVIDED “AS IS” WITHOUT WARRANTY OF ANY KIND, EITHER
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-
INFRINGEMENT. The Materials are subject to change without notice and do not represent a commitment on the part of Realtimepublishers.com, Inc or its web site sponsors. In no event shall Realtimepublishers.com, Inc. or its web site sponsors be held liable for technical or editorial errors or omissions contained in the Materials, including without limitation, for any direct, indirect, incidental, special, exemplary or consequential damages whatsoever resulting from the use of any information contained in the Materials.
The Materials (including but not limited to the text, images, audio, and/or video) may not be copied, reproduced, republished, uploaded, posted, transmitted, or distributed in any way, in whole or in part, except that one copy may be downloaded for your personal, non-commercial use on a single computer. In connection with such use, you may not modify or obscure any copyright or other proprietary notice.
The Materials may contain trademarks, services marks and logos that are the property of third parties. You are not permitted to use these trademarks, services marks or logos without prior written consent of such third parties.
If you have any questions about these terms, or if you would like information about licensing materials from Realtimepublishers.com, please contact us via e-mail at [email protected]