Foglight for Storage Management 4.1 User and Reference Guide

Foglight™ for Storage Management 4.1
User and Reference Guide
© 2015 Dell Inc.
ALL RIGHTS RESERVED.
This guide contains proprietary information protected by copyright. The software described in this guide is furnished under a
software license or nondisclosure agreement. This software may be used or copied only in accordance with the terms of the
applicable agreement. No part of this guide may be reproduced or transmitted in any form or by any means, electronic or
mechanical, including photocopying and recording for any purpose other than the purchaser’s personal use without the written
permission of Dell Inc.
The information in this document is provided in connection with Dell products. No license, express or implied, by estoppel or
otherwise, to any intellectual property right is granted by this document or in connection with the sale of Dell products. EXCEPT
AS SET FORTH IN THE TERMS AND CONDITIONS AS SPECIFIED IN THE LICENSE AGREEMENT FOR THIS PRODUCT, DELL ASSUMES NO
LIABILITY WHATSOEVER AND DISCLAIMS ANY EXPRESS, IMPLIED OR STATUTORY WARRANTY RELATING TO ITS PRODUCTS
INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR
NON-INFRINGEMENT. IN NO EVENT SHALL DELL BE LIABLE FOR ANY DIRECT, INDIRECT, CONSEQUENTIAL, PUNITIVE, SPECIAL OR
INCIDENTAL DAMAGES (INCLUDING, WITHOUT LIMITATION, DAMAGES FOR LOSS OF PROFITS, BUSINESS INTERRUPTION OR LOSS
OF INFORMATION) ARISING OUT OF THE USE OR INABILITY TO USE THIS DOCUMENT, EVEN IF DELL HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES. Dell makes no representations or warranties with respect to the accuracy or completeness of
the contents of this document and reserves the right to make changes to specifications and product descriptions at any time
without notice. Dell does not make any commitment to update the information contained in this document.
If you have any questions regarding your potential use of this material, contact:
Dell Inc.
Attn: LEGAL Dept
5 Polaris Way
Aliso Viejo, CA 92656
Refer to our web site (software.dell.com) for regional and international office information.
Patents
This product is protected by U.S. Patent # 8,175,863. Foglight™ is protected by U.S. Patents # 7,979,245 and 8,175,862.
Additional Patents Pending. For more information, go to http://software.dell.com/legal/patents.aspx.
Trademarks
Dell, the Dell logo, and Foglight, IntelliProfile, PerformaSure, and Tag and Follow are trademarks of Dell Inc. "Apache HTTP
Server", Apache, "Apache Tomcat" and "Tomcat" are trademarks of the Apache Software Foundation. Google is a registered
trademark of Google Inc. Chrome, Android, and Nexus are trademarks of Google Inc. Red Hat, JBoss, the JBoss logo, and Red
Hat Enterprise Linux are registered trademarks of Red Hat, Inc. in the U.S. and other countries. CentOS is a trademark of Red
Hat, Inc. in the U.S. and other countries. Microsoft, .NET, Active Directory, Internet Explorer, Hyper-V, SharePoint, SQL Server,
Windows, Windows Vista and Windows Server are either registered trademarks or trademarks of Microsoft Corporation in the
United States and/or other countries. AIX, IBM, and WebSphere are trademarks of International Business Machines Corporation,
registered in many jurisdictions worldwide. Sun, Oracle, Java, Oracle Solaris, and WebLogic are trademarks or registered
trademarks of Oracle and/or its affiliates in the United States and other countries. SPARC is a registered trademark of SPARC
International, Inc. in the United States and other countries. Products bearing the SPARC trademarks are based on an
architecture developed by Oracle Corporation. OpenLDAP is a registered trademark of the OpenLDAP Foundation. HP is a
registered trademark that belongs to Hewlett-Packard Development Company, L.P. Linux is a registered trademark of Linus
Torvalds in the United States, other countries, or both. MySQL is a registered trademark of MySQL AB in the United States, the
European Union and other countries. Novell and eDirectory are registered trademarks of Novell, Inc., in the United States and
other countries. VMware, ESX, ESXi, vSphere, vCenter, vMotion, and vCloud Director are registered trademarks or trademarks
of VMware, Inc. in the United States and/or other jurisdictions. Sybase is a registered trademark of Sybase, Inc. The X Window
System and UNIX are registered trademarks of The Open Group. Mozilla and Firefox are registered trademarks of the Mozilla
Foundation. "Eclipse", "Eclipse Foundation Member", "EclipseCon", "Eclipse Summit", "Built on Eclipse", "Eclipse Ready", "Eclipse
Incubation", and "Eclipse Proposals" are trademarks of Eclipse Foundation, Inc. IOS is a registered trademark or trademark of
Cisco Systems, Inc. and/or its affiliates in the United States and certain other countries. Apple, iPad, iPhone, Xcode, Mac OS,
and Safari are trademarks of Apple Inc., registered in the U.S. and other countries. Ubuntu is a registered trademark of
Canonical Ltd. Symantec and Veritas are trademarks or registered trademarks of Symantec Corporation or its affiliates in the
U.S. and other countries. YAST is a registered trademark of SUSE LLC in the United States and other countries. Citrix and
XenDesktop are trademarks of Citrix Systems, Inc. and/or one or more of its subsidiaries, and may be registered in the United
States Patent and Trademark Office and in other countries. AlertSite and DéjàClick are either trademarks or registered
trademarks of Boca Internet Technologies, Inc. Samsung, Galaxy S, and Galaxy Note are registered trademarks of Samsung
Electronics America, Inc. and/or its related entities. MOTOROLA is a registered trademark of Motorola Trademark Holdings,
LLC. The Trademark BlackBerry Bold is owned by Research In Motion Limited and is registered in the United States and may be
pending or registered in other countries. Dell is not endorsed, sponsored, affiliated with or otherwise authorized by Research
In Motion Limited. Other trademarks and trade names may be used in this document to refer to either the entities claiming the
marks and names or their products. Dell disclaims any proprietary interest in the marks and names of others.
Legend
CAUTION: A CAUTION icon indicates potential damage to hardware or loss of data if instructions are not followed.
WARNING: A WARNING icon indicates a potential for property damage, personal injury, or death.
IMPORTANT NOTE, NOTE, TIP, MOBILE, or VIDEO: An information icon indicates supporting information.
Foglight for Storage Management User and Reference Guide 
Updated - July 2015
Foglight Version - 5.7.1
Cartridge Version - 4.1
Contents
Getting Started
Introducing Foglight for Storage Management
Navigating Foglight for Storage Management
Verifying Storage Collector Agents are Collecting Storage Data
Collecting Storage Data for Virtualized Devices and Servers
Understanding Metric Data in Charts and Tables
Modifying and Extending Data Collection
Next Steps
Monitoring Storage Performance
Introducing the Storage Environment Dashboard
Monitoring Your Storage Environment
Understanding Status, Alarms, and Rules in Foglight for Storage Management
Reviewing the Status of All Devices
Assessing Storage Alarms
Monitoring Fabrics
Monitoring Storage Arrays
Monitoring Filers
Asking Questions About the Monitored Storage Environment
Assessing Connectivity and I/O Performance
Introducing the Virtualization Dashboards
Summary of Icons Used in Topology Diagrams
Exploring Connectivity with SAN Topology Diagrams
Exploring I/O Performance with SAN Data Paths
Monitoring Storage Capacity
Capacity Trending
Evaluating Pool Capacity
Environment Summary/Monitoring/Summary
Capacity Reports
Low Capacity Rule
Creating Storage Reports
Investigating Storage Devices
Introducing the Storage Explorer
Exploring a Fabric
Exploring a Switch
Exploring a Cisco VSAN
Exploring a Filer
Exploring a Storage Array
Non-Clustered Storage Arrays
Dell EqualLogic Storage Array
EMC Isilon Storage Array
Common Data for Filers and Storage Arrays
LUNs tab
Disks tab
Advanced Find
Investigating Storage Components
Introducing Storage Component Dashboards
Investigating an Aggregate
Investigating an Array/Filer Port
Investigating a Controller
Investigating a Directory
Investigating an EqualLogic Member
Investigating an FC Switch Port
Investigating an Isilon Node
Investigating a LUN
Investigating a NASVolume
Investigating a Physical Disk
Investigating a Pool
Common Component Disk Tab Data
Troubleshooting Storage Performance
Starting a Troubleshooting Investigation
Analyzing Storage Issues
Analyzing the Pool
Changing Latency Thresholds
Understanding the Troubleshooting Algorithm
Managing Data Collection, Rules, and Alarms
Collecting Virtual Storage-to-SAN Relationships
Inferring Physical-Host-to-Storage Relationships
Enabling Dependency Processing
Reviewing and Editing Host-Port Assignments
Running Dependency Processing Manually
Customizing Helper Strings for Dependency Processing
Reviewing Inferred Hosts
Modifying Data Collection Schedules
Understanding Data Collection Types and Schedules
Modifying Data Collection Schedules for Storage Collector Agents
Managing Foglight for Storage Management Rules
Managing Alarm Settings
Changing Alarm Sensitivity
Configuring Email Notifications for Alarms
Troubleshooting Database Limits
Understanding Metrics
Units of Measurement
Performance Metrics
Fabrics and FC Switches — Performance Metrics
Storage Arrays and Filers — Disk I/O Performance Metrics
Clustered Storage Arrays — Network Performance Metrics
Capacity Metrics
Storage Arrays — Array, Member, and Pool Capacity Metrics
Filers — Filer and Aggregate Capacity Metrics
Storage Arrays and Filers — LUN, NASVolume, and Disk Capacity Metrics
Overview of Metrics in Foglight for Storage Management
Summary of SAN Topology Objects
Locating SAN Topology Objects in Foglight for Storage Management
Contacting Dell
Technical support resources
1
Getting Started
This chapter introduces you to Foglight for Storage Management, and tells you how to verify that Foglight for
Storage Management is capturing storage collection data for all the devices you are interested in monitoring.
NOTE: For information about installing Foglight for Storage Management, configuring Storage Collector
agents and the Storage Agent Master, and managing agents and Storage Agent Masters, see the Managing
Storage in Virtual Environments Installation and Configuration Guide.
This section covers the following topics:
• Introducing Foglight for Storage Management
• Navigating Foglight for Storage Management
• Verifying Storage Collector Agents are Collecting Storage Data
• Understanding Metric Data in Charts and Tables
• Modifying and Extending Data Collection
• Next Steps
Introducing Foglight for Storage Management
Virtual infrastructures add layers of abstraction that obscure the relationship between physical storage and
virtual machines. Foglight for Storage Management supports VMware and Hyper-V virtual environments. It helps
you look beyond logical datastores and volumes to see how physical storage components (switches, logical unit
numbers (LUNs), arrays, and filers) are impacting virtual machines (VMs).
The key concepts for Foglight for Storage Management are “relationship” and “context.” Rather than simply
presenting performance metrics in isolation, Foglight for Storage Management incorporates the data into an
end-to-end view of your virtual storage infrastructure, showing the relationship of the storage and storage area
network (SAN) performance to VM performance. You can also investigate performance events in the proper
context, viewing the status of the entire topology at the time the performance issues occurred.
Performance Monitoring
Monitoring the performance of your storage infrastructure is critical to maximizing VM performance. Foglight
for Storage Management allows you to track:
• Insufficient bandwidth and throughput of SAN switch ports, filers, and storage arrays.
• Over- or underutilized datastores, CSVs, and logical disks.
• I/O bottlenecks for NASVolumes and LUNs, along with the pool providing the physical disk storage for the
LUNs and NASVolumes.
Figure 1. Performance Monitoring
Foglight for Storage Management captures performance metrics from components of your storage infrastructure
(SAN switches, LUNs, storage arrays, and filers), evaluates the data based on preconfigured performance and
event rules, then presents this data via a detailed graphical interface. With architectural diagrams, graphs,
alerts, and drill-down screens, Foglight for Storage Management helps you quickly identify problems in the
virtual environment or the physical storage. For more information, see Monitoring Your Storage Environment on
page 19.
Capacity Monitoring
Foglight for Storage Management aids the process of identifying and resolving storage capacity issues. The
Foglight for Storage Management virtualization cartridges alert you to datastores and logical volumes in danger
of running out of storage. The storage component shows you how the physical storage supporting your virtual
infrastructure is configured. The detailed graphical interface of Foglight for Storage Management lets you see
storage capacity allocations for arrays and filers, and receive alerts when free disk capacity and free pool
capacity fall below thresholds. For more information, see Monitoring Storage Arrays on page 26, Monitoring
Filers on page 28, and Monitoring Storage Capacity on page 40.
Figure 2. Capacity Monitoring
Connectivity and I/O Performance Monitoring
Foglight for Storage Management provides comprehensive topology views which detail all I/O paths and
components on each I/O path from VMs down to their assigned physical storage devices. These views assist in
determining where actual performance or connectivity problems are occurring during I/O requests, something
which is not possible using traditional virtual infrastructure management tools. For more information, see
Assessing Connectivity and I/O Performance on page 31.
Figure 3. Connectivity and I/O Performance Monitoring.
Navigating Foglight for Storage Management
Foglight for Storage Management is built on the Foglight platform. The following diagram and table introduce the
common screen elements that are used in Foglight for Storage Management dashboards. If you need more
information about how to use these elements, open the online help and navigate to Using Foglight for Storage
Management > Foglight for Storage Management User Help > Getting Started > Working with Dashboards.
Figure 4. Navigating Foglight for Storage Management.
[Figure 4 callouts: Navigation Panel, Display Area, Time Range, Breadcrumbs, Action Panel, Metric Data, Status Icons, Sparklines]
Table 1. Screen Elements
Screen Element | Description
Navigation Panel | Contains a menu of all the dashboards that you can access based on your role. Select a dashboard to view in the display area. The panel operates like a drawer and can be opened or closed using the arrow.
Display Area | Displays the contents of the selected dashboard.
Action Panel | Provides the following functions: lists the actions that you can perform on the current dashboard; lists the views and data that you can add to a new dashboard or report; provides access to context-sensitive help and the online help system. The panel operates like a drawer and can be opened or closed.
Breadcrumbs | Displays the path of dashboards you navigated to get to the current dashboard. Click a name in the path to return to that dashboard. NOTE: The browser Back button is disabled. Use the breadcrumbs to return to the previous screen.
Time Range | Controls the time period for the data displayed in the dashboard and any drill-down views you access. The default is the last four hours. Click the text to open the time range popup and set the range.
Sparkline | Shows the trend of a metric in a small space. The value beside the sparkline is the current value for the metric in the selected time range. When used in a table, sparklines provide an easy way to compare spikes or other trends in a group of metrics. Click a sparkline to plot all data points for the time period on a graph. NOTE: If the last collection failed for any reason, the sparkline value is n/a.
Status Icons | Displays the status of a storage resource. For more information, see Understanding Status, Alarms, and Rules in Foglight for Storage Management on page 19.
Verifying Storage Collector Agents are Collecting Storage Data
To begin, review the Foglight for Storage Management dashboards to verify you are monitoring the devices you
want to monitor. You need the IP address of the Foglight Management Server and your user credentials from
the person who installed and configured Foglight for Storage Management.
NOTE: This procedure is intended for Foglight for Storage Management users with the roles of Storage
Administrator, VMware Administrator and Hyper-V Administrator.
To log in and confirm the list of monitored devices:
1. Open a supported browser. For a list of supported browsers, see the System Requirements Guide.
2. In the browser address bar, type the IP address of the Management Server followed by the port number 8080. For example:
http://<Foglight_Management_Server_IP_Address>:8080
3. Log in to Foglight for Storage Management using your user credentials.
4. On the navigation panel, expand Dashboards.
These menu items are of particular interest for monitoring storage performance: Storage & SAN,
VMware, Hyper-V, and Infrastructure.
5. Expand Storage & SAN.
The following menu items are available:
• Storage Environment — Monitor storage devices in your storage environment.
• Storage Explorer — Find and examine any storage device in your storage environment.
• Storage Troubleshooting — Determine if a poorly performing virtual machine is experiencing
storage issues.
6. Click Storage Environment.
7. Ensure the Monitoring tab is selected.
8. Click the top half of each active tile and review the list of devices in the quick view.
Figure 5. Storage Environment
9. If you do not see a storage device that you think should be monitored, review the list of configured
Storage Collector agents.
a. On the Storage Environment dashboard, click the Administration tab.
TIP: If you do not see the Administration tab, ask your Foglight for Storage Management
Administrator to add the role Storage Administrator to your user account.
b. In the Storage Collector Agents list, look for the Storage Collector agent that monitors the missing
device or device type. When an agent monitors a device type, click the Edit icon to view a list of
named devices in the Edit Agent Properties dialog box.
Figure 6. Storage Collector Agents
c. If a Storage Collector agent does not exist for a device, or if the agent exists but it is disabled or
its data collection is turned off, contact your Foglight for Storage Management Administrator. For
instructions about agent tasks, see the Foglight for Storage Management Installation and
Configuration Guide.
After you finish with Storage Collector agents, verify the virtualization agents.
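Before opening the browser in step 2 of the login procedure above, it can help to confirm that the Management Server answers on port 8080. The following is a minimal sketch in Python and is not part of the product: the function names `build_console_url` and `is_port_open` and the example address are hypothetical, and the check only tests TCP reachability. It does not log in or call any Foglight API.

```python
import socket

DEFAULT_PORT = 8080  # port number given in step 2 of the login procedure


def build_console_url(server_ip: str, port: int = DEFAULT_PORT) -> str:
    """Return the browser URL for the Management Server (hypothetical helper)."""
    return f"http://{server_ip}:{port}"


def is_port_open(server_ip: str, port: int = DEFAULT_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to server_ip:port succeeds within timeout."""
    try:
        with socket.create_connection((server_ip, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    ip = "192.0.2.10"  # placeholder documentation address; replace with your server's IP
    print(build_console_url(ip))
    print("reachable" if is_port_open(ip) else "not reachable")
```

A failed check points to a network, firewall, or server problem rather than a problem with your user credentials, which narrows down what to report to your administrator.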
Collecting Storage Data for Virtualized Devices and Servers
To collect storage data for virtualized devices and servers:
1. On the navigation panel, expand the dashboard menu for your virtualization software. For example,
either VMware or Hyper-V.
TIP: If you do not see the VMware or Hyper-V menu, ask your Foglight for Storage Management
Administrator to add the appropriate administrator role to your user account.
In a Foglight for Storage Management installation without Foglight for Virtualization EE, the
following menu items are available when you expand VMware or Hyper-V (additional menu items
are available with Foglight for Virtualization EE):
• Agent Administration — Create, edit, and delete the Performance agents to monitor the virtual
environment.
• Explorer — Monitor entities in your virtual environment.
2. To verify the virtualization agents:
a. Click the Hyper-V or VMware Explorer.
The Explorer opens a Virtual Infrastructure Topology tree—displayed in the navigation panel
below the list of Dashboards—that contains branches for each monitored vCenter or Hyper-V
server. For detailed information about the dashboards available through this tool, see the
Managing Virtualized Environments User and Reference Guide.
Figure 7. Explorer.
b. If the Virtual Infrastructure Topology list is empty, your Foglight for Storage Management
Administrator needs to create a Performance agent to monitor at least one vCenter or Hyper-V
server. For instructions, see the Managing Storage in Virtual Environments Installation and
Configuration Guide.
c. If the list is not empty, expand the tree and select a virtualization host or server.
Figure 8. Example of a VMware ESX host
d. Click the SAN Topology tab.
e. Verify that you can see the storage-side of the virtual infrastructure, identified by the icons that
are circled in the following diagram. If these elements are missing, the Performance agent
monitoring this host is not collecting the data necessary to make connections to storage and SAN
data. To enable storage data collection, see Collecting Virtual Storage-to-SAN Relationships on
page 103.
Figure 9. Topology
After you finish with Performance agents, review monitored hosts.
3. On the navigation panel, click Infrastructure.
a. In the Select a Service box, select All Hosts.
The tiles organize physical hosts by their operating system.
b. Click the Inferred tile.
The Inferred tile lists physical hosts that Foglight for Storage Management has inferred based on
an analysis of the switch, array, and filer ports it is monitoring (a process called dependency
processing). When physical hosts are connected to storage devices, you can drill down on a host
and view connections in SAN Topology tabs.
Figure 10. Infrastructure Environment
c. If the Inferred tile does not show any inferred hosts, you may want to enable dependency
processing to begin collecting the names of inferred hosts. For more information, see Inferring
Physical-Host-to-Storage Relationships on page 104.
You have finished verifying that agents are collecting data.
Understanding Metric Data in Charts and Tables
In the Storage & SAN dashboards, charts and tables display metric data. Storage Collector agents collect the
data and aggregate it into collections, which are published to the Storage & SAN dashboards at
regular intervals. For more information about storage collection schedules and the type of data collected, see
Modifying Data Collection Schedules on page 109.
The data displayed in charts and tables represents data for one or more intervals in the selected time range as
follows:
• Current. The value collected in the last interval in the selected time range. In charts, the last plotted
value is the current value.
• Period. Aggregated values for the entire selected time range. Period values are often used to provide
context for current values.
• Historical. Individual values for each collection interval in the selected time range. Historical values are
often presented as data points in a plot chart or as sparklines to show how metrics changed over the time
range.
• Latest. Values in the latest collection interval available, irrespective of the selected time range. If data
collection is enabled, this value reflects the latest collection. If data collection is disabled, this value
represents the last collection made by the agent. Although latest values are tracked, the Storage & SAN
dashboards do not display these values.
For in-depth information about metrics, see http://communities.quest.com/docs/DOC-12862.
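The relationship between these value types can be sketched in Python. This is an illustration only, not the product's implementation: the `Sample` type, the function names, and the use of a simple average as the period aggregate are assumptions made for the example, because the guide does not state which aggregation is used. The four-hour range in the demo matches the default described for the Time Range control.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import List, Optional


@dataclass
class Sample:
    """One collected value for a metric (hypothetical structure)."""
    collected_at: datetime
    value: float


def in_range(samples: List[Sample], start: datetime, end: datetime) -> List[Sample]:
    """Historical: every sample inside the selected time range, oldest first."""
    return sorted(
        (s for s in samples if start <= s.collected_at <= end),
        key=lambda s: s.collected_at,
    )


def current_value(samples: List[Sample], start: datetime, end: datetime) -> Optional[float]:
    """Current: the value from the last interval inside the selected time range."""
    historical = in_range(samples, start, end)
    return historical[-1].value if historical else None


def period_value(samples: List[Sample], start: datetime, end: datetime) -> Optional[float]:
    """Period: one aggregate for the whole range (averaging is an assumption)."""
    historical = in_range(samples, start, end)
    return mean(s.value for s in historical) if historical else None


def latest_value(samples: List[Sample]) -> Optional[float]:
    """Latest: the newest sample overall, irrespective of the time range."""
    return max(samples, key=lambda s: s.collected_at).value if samples else None


if __name__ == "__main__":
    now = datetime(2015, 7, 1, 12, 0)
    samples = [Sample(now - timedelta(hours=h), v) for h, v in [(6, 10.0), (3, 20.0), (1, 40.0)]]
    start, end = now - timedelta(hours=4), now  # the last four hours
    print([s.value for s in in_range(samples, start, end)])
    print(current_value(samples, start, end), period_value(samples, start, end), latest_value(samples))
```

Moving the time range so that it ends in the past changes the current and period values but leaves the latest value untouched, which is the distinction the list above draws.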
Values are not available (n/a) under the following circumstances:
•
The device vendor does not provide the necessary metric. In this case, the text n/p by vendor appears
after the metric name in column headings.
•
A device is offline (reported in the State field).
•
A device is not actively being used, such as when a disk is configured as a spare (reported in the Used or
Role fields).
•
For the Latency metric (which is a partially computed metric), when the operations/second value is very
low, the latency metric may not be submitted during a collection interval.
•
If none of the other situations applies, data collection probably failed. If the current value is
unavailable for a time range in the past, try changing the selected time range to end before or after
the problem interval.
Modifying and Extending Data Collection
You can modify and extend data collection in the following ways. You may want to make these changes now, or
adjust the settings in the future if you find a need.
Table 2. Data collection options
Options
Instructions
If not already enabled during installation, collect data about virtualization device-to-SAN
relationships and display this information in SAN topology diagrams. Requires a VMware or Hyper-V
Performance agent.
See Collecting Virtual Storage-to-SAN Relationships on page 103
Infer connections from physical hosts to storage and show the connections in topology diagrams.
See Inferring Physical-Host-to-Storage Relationships on page 104
Change the default collection interval.
See Modifying Data Collection Schedules on page 109
Modify when rules trigger alarms (change threshold values) and add a list of email addresses to notify
when alarms occur.
See Managing Foglight for Storage Management Rules on page 114
Next Steps
Your next steps depend on whether you want to monitor your storage environment to identify problems or
whether you need to respond to the report of a problem in the virtual infrastructure.
Monitoring Your Storage Environment
The following workflow suggests one approach to monitoring your storage environment and investigating issues.
You may prefer a different approach, depending on your monitoring needs.
1
Performance Monitoring — Start by assessing alarms on all storage devices. See Monitoring Storage
Performance on page 17.
2
Capacity Monitoring — After you acknowledge and/or resolve device-level alarms, you may want to
investigate individual storage devices and their components. See Investigating Storage Devices on page
44.
3
Connectivity and I/O Performance Monitoring — You can identify performance or connectivity problems
that occur during I/O requests by viewing diagrams that place storage components within the context of
the virtualization environment, from VMs and hosts to storage devices. See Assessing Connectivity and
I/O Performance on page 31.
Responding to Problem Reports
Storage Administrators need to respond to reports about problems in the storage environment.
•
When you receive reports from colleagues about poorly-performing VMware virtual machines, use the
Storage Troubleshooting analysis tool to determine whether resources in the SAN are contributing to the
problem. For instructions, see Troubleshooting Storage Performance on page 95.
•
When you receive emails from Foglight for Storage Management about a storage device or component
that has entered a problem state, click the link in the email to navigate to the dashboard that provides
details about the alarm. To learn how to set up email notifications, see Configuring Email Notifications
for Alarms on page 116.
2
Monitoring Storage Performance
This chapter introduces the Storage Environment dashboard and guides you through monitoring your storage
environment by assessing device status and alarms. The next sections provide alternative approaches to
assessing your storage environment through the use of questions or by assessing the connectivity and port
performance within the context of your virtual infrastructure. The last section introduces reports.
Introducing the Storage Environment
Dashboard
The starting point for monitoring storage performance is the Storage Environment dashboard. The Storage
Environment dashboard contains the following tabs:
•
Monitoring Tab — Monitor your overall storage environment by device type. For more information, see
Monitoring Your Storage Environment on page 19.
•
FAQts Tab — Use questions to retrieve important data about your monitored storage environment. For
more information, see Asking Questions About the Monitored Storage Environment on page 30.
•
Reports Tab — Use templates to create reports on your storage environment. For more information, see
Creating Storage Reports on page 43.
TIP: If the Reports tab is missing, your Foglight for Storage Management Administrator may need to
add the General Access role to your user account. For more information, see “Assigning Foglight for
Storage Management Roles” in the Managing Storage in Virtual Environments Installation and
Configuration Guide.
If your Foglight for Storage Management user account includes the Storage Administrator role, the Storage
Environment dashboard also displays these tabs:
•
Administration Tab — Manage configured agents, credentials, the Storage Agent Master, and modify data
collection settings.
•
Getting Started Tab — Purchase a product license.
To open the Storage Environment dashboard:
•
On the navigation panel, under Dashboards, click Storage & SAN > Storage Environment.
The first time you open this dashboard, the Monitoring tab is displayed, showing the overall status of
your monitored storage environment.
Figure 1. Monitoring dashboard (callouts: Time Range, Tiles, Device List, Quick View, Device Summary, FAQts, Alarm Summary)
Table 1. The Monitoring tab contains the following screen elements:
Screen Element
Description
Time Range
Controls the period of time represented in the dashboard. The default is the last four
hours. For more information, see “Time Range” in the online help.
Tiles
Summarizes how many of each type of device (Fabrics, Storage Arrays, and Filers) are
being monitored, and breaks down the total number by alarm severity (from left to right:
Fatal, Critical, Warning, and Normal). Tiles control the Quick View area as follows:
•
Click the top half of a tile to open the device type’s quick view with the Summary
view displayed.
•
Click a status icon in one of the tiles to open the device type’s quick view
containing only devices with that status.
Quick View
Contains four areas: Device List, Device Summary, FAQts, and Alarm Summary.
Device List
Displays the list of storage devices that match the selected tile.
•
Click Summary to display the top storage devices in terms of key performance or
capacity metrics.
•
Click a device name to display metrics for that device only.
Device Summary
Displays performance and capacity metrics for the device selected in the Device List.
FAQts
Offers common searches in the form of questions. Scroll through questions, and click
Show Me to see the answer to the currently displayed question. For more information,
see Asking Questions About the Monitored Storage Environment on page 30.
Alarm Summary
Displays a list of alarms for the device selected in the Device List. For more information,
see Understanding Status, Alarms, and Rules in Foglight for Storage Management on page
19.
Monitoring Your Storage Environment
When you begin monitoring your storage environment, a good way to start is by identifying poorly-performing
devices and resolving the issues that are contributing to the problems. In Foglight for Storage Management, all
storage devices and their child components are assigned a status, which enables you to see at a glance which
elements in your storage infrastructure need attention. Any device that enters a non-Normal status triggers an
alarm. For more information, see Understanding Status, Alarms, and Rules in Foglight for Storage Management
on page 19.
The procedures in this section show you how to use the Monitoring tab in the Storage Environment dashboard to
assess alarms and drill down to the components used by storage devices, such as the LUNs and NASVolumes that
supply the physical storage for datastores.
Understanding Status, Alarms, and Rules in Foglight
for Storage Management
In Foglight for Storage Management, all storage devices and their child components are assigned a status, which
enables you to see at a glance which entities in your storage infrastructure need attention. Status and alarms
are controlled by rules that can be based on data, time, schedules, or events. The rule conditions define which
values reflect acceptable behavior (Normal status) and which values warrant an alarm (Warning, Critical, or
Fatal status). As the value associated with a monitored entity passes a condition threshold, Foglight for Storage
Management generates an alarm and changes the entity’s status accordingly.
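The relationship between rule thresholds, status, and alarms can be sketched as follows. This is a minimal illustration, assuming a rule with one numeric threshold per severity where higher values are worse; the threshold numbers and the names status_for and evaluate are hypothetical and do not reflect how rules are stored in the product.

```python
NORMAL, WARNING, CRITICAL, FATAL = "Normal", "Warning", "Critical", "Fatal"


def status_for(value, thresholds):
    """Return the most severe status whose threshold the value has reached."""
    for status in (FATAL, CRITICAL, WARNING):
        limit = thresholds.get(status)
        if limit is not None and value >= limit:
            return status
    return NORMAL


def evaluate(previous_status, value, thresholds):
    """Return (new_status, alarm_raised) for one rule evaluation.

    Assumption: an alarm is raised when the entity moves into a different
    non-Normal status.
    """
    new_status = status_for(value, thresholds)
    alarm_raised = new_status != NORMAL and new_status != previous_status
    return new_status, alarm_raised


# Hypothetical thresholds for a percentage metric.
thresholds = {WARNING: 70, CRITICAL: 85, FATAL: 95}
print(evaluate(NORMAL, 90, thresholds))
```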
NOTE: When a component is selected, its detail views display the component’s Status followed by its
State. Status is determined by Foglight for Storage Management as described above. State refers to the
physical state of a component as reported by the vendor; if the vendor does not provide the physical
state, the state is unknown. A component’s physical state may affect its status only when an enabled rule
triggers alarms based on state. Consult with your Foglight Administrator if you want to enable or create
rules that perform this check.
A storage device often has large numbers (thousands) of child components. With a few exceptions, alarms on
child components do not change the status of the parent device. For example, a failed disk may have a Fatal
status, but because arrays are designed to cope with a failed disk, the parent device continues to display a
Normal status. The parent device status may be changed by child components in the following circumstances:
•
Controller problems typically affect the performance of the storage array or filer, so the parent device
inherits the alarm status of the controller. Pool alarms reflect capacity issues and performance problems
that can affect many users of the storage array or filer, so the parent device inherits the alarm status of
the pool or aggregate.
•
When a significant percentage of child components have problems (for example, many disks are failing),
the problems may be indicative of a systemic problem. In this case, the parent device status changes
when the number of affected child components reaches a threshold defined in rules.
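The two roll-up cases above can be sketched as a small function. This is an illustration only: the 25% fraction, the Warning status used for the systemic case, and the name parent_status are assumptions; in the product, the actual thresholds and resulting status are defined in rules.

```python
SEVERITY = {"Normal": 0, "Warning": 1, "Critical": 2, "Fatal": 3}

# Component types whose alarm status the parent device inherits directly.
INHERITED_TYPES = {"controller", "pool", "aggregate"}


def parent_status(children, systemic_fraction=0.25, systemic_status="Warning"):
    """Derive a parent device status from (component_type, status) pairs."""
    status = "Normal"

    # Case 1: controllers and pools/aggregates pass their status up.
    for ctype, cstatus in children:
        if ctype in INHERITED_TYPES and SEVERITY[cstatus] > SEVERITY[status]:
            status = cstatus

    # Case 2: other component types matter only when enough of them are affected.
    counts = {}
    for ctype, cstatus in children:
        if ctype not in INHERITED_TYPES:
            total, affected = counts.get(ctype, (0, 0))
            counts[ctype] = (total + 1, affected + (cstatus != "Normal"))
    for total, affected in counts.values():
        if affected / total >= systemic_fraction:
            if SEVERITY[systemic_status] > SEVERITY[status]:
                status = systemic_status
    return status


healthy_disks = [("disk", "Normal")] * 9
print(parent_status([("disk", "Fatal")] + healthy_disks))
```

With one failed disk out of ten, the parent stays Normal, which matches the failed-disk example in the text.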
For information about changing default rules and alarm settings, see Managing Foglight for Storage Management
Rules on page 114.
Reviewing the Status of All Devices
Use the Monitoring tab in the Storage Environment dashboard to gain a high-level understanding of the status of
the devices in your environment, organized by device type. For a general description of the dashboard, see
Introducing the Storage Environment Dashboard on page 17.
To monitor the storage environment:
1
On the navigation panel, under Dashboards, click Storage & SAN > Storage Environment.
2
Click the Monitoring tab.
The time range and selected tile are the same as the last time the dashboard was opened.
Optional—Change the Time Range. For an initial review, use the default time range.
3
Scan the lower part of the tiles.
Figure 2. Device status
4
If any of the tiles show that there are devices with the Fatal, Critical, or Warning status, do one of the
following:
•
To view all levels of alarms, click the top part of a tile that has the highest number of devices
with the most severe errors. Go to the next section, Assessing Storage Alarms.
•
To view alarms at one severity level only, such as all Critical alarms, click the Critical status
count. Go to the next section, Assessing Storage Alarms.
5
If all tiles report that devices are in the Normal status, it means that the devices are operating within
acceptable parameters. The next step is to review the devices to see if any of their child components
show an alarm status. See one of the following topics:
•
Monitoring Fabrics on page 22
•
Monitoring Storage Arrays on page 26
•
Monitoring Filers on page 28
Assessing Storage Alarms
When a storage device or component enters an unacceptable state (as defined in a rule), the rule that monitors
the entity triggers an alarm and sets the status of the resource. Examine the alarm messages starting with the
most severe alarms.
This walkthrough assumes that you are looking at alarms of all severity levels in an Alarm Summary view. In
many places in the software you can restrict your assessment to resources with the same alarm severity. This
can be useful when you want to prioritize your alarm assessment, such as focusing on all storage arrays with
Critical alarms first, and then on the Warning alarms.
To assess alarms:
1
In the Alarm Summary view, review the alarm messages to understand the issues. If the list contains
multiple levels of alarm severity, start with the highest severity alarm.
Figure 3. Alarm Summary view
TIP: If you see alarms on devices or components that you think are operating within acceptable
parameters, consider creating new rules to better suit your environment. For more information,
see Managing Foglight for Storage Management Rules on page 114.
2
If you need more details to understand the issue, click the alarm message.
An Alarm window displays more information about the alarm and troubleshooting tips.
TIP: To bypass the Alarm window and go straight to the Choose Diagnostic Focus Time window
(described in the next step), click an instance name instead of the message.
Figure 4. Alarms window.
3
From the Troubleshooting pane, click the Diagnose button.
4
Choose the time period to use as your diagnostic time range:
•
Explore at Storage Alarm time. Select this option when you want to view diagrams and other
details of the affected component at the time the alarm occurred. Shows data for the time period
leading up to and including the alarm time. For example, given an alarm time of 10:32 AM and
the default four hour time range, the diagnostic time range is set to 6:32 AM – 10:32 AM.
•
Explore at Default Diagnostic time. Select this option when you want to determine if the
situation causing the alarm persisted or if it resolved on its own. Shows data before and after the
alarm, with the alarm time positioned three quarters of the way into the time range. For
example, given an alarm time of 10:32 AM and the default four hour time range, the diagnostic
time range is set to three hours before the alarm and one hour after the alarm, that is, 7:32 AM –
11:32 AM. If the current time falls within the range, for example, it is currently 11:05 AM, the
time range is set to 7:32 AM – 11:05 AM.
A component dashboard opens with its time range set to the selected diagnostic time range.
5
Review the component dashboard to better understand the data that led to the alarm. If you navigate to
other dashboards, the diagnostic time range remains the same.
6
When you complete your investigation, in the breadcrumbs, click Storage Environment to return to the
Choose Diagnostic Focus Time window. If desired, choose the other diagnostic time range.
7
When you are finished, close the Choose Diagnostic Focus Time window, and in the Alarm window click
one of the following options:
•
Acknowledge. Continues to display the alarm, but it is marked as acknowledged until the alarm is
triggered again. For example, for Warnings, an appropriate action may be to acknowledge the
alarm and ignore it.
•
Acknowledge Until Normal. Continues to display the alarm, but it is marked as acknowledged
until the affected component returns to the Normal status. This is useful when a component has
failed and you want to know when it is replaced.
•
Clear. Deletes the alarm. Choose this option when the situation is resolved.
8
Close the Alarm window.
TIP: When you close the window, the time range returns to the time range in use before your alarm
analysis. If it does not, in the Time Range either click the Frozen Time Range icon to return to
real time or click the arrow to expand the zonar and set the range. For more information, see
“Working in a Current or a Diagnostic Time Range” in the online help.
9
Take action to resolve the issue in your storage infrastructure, either by yourself or by notifying the
appropriate person.
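The two diagnostic time ranges offered in the Choose Diagnostic Focus Time window follow simple arithmetic, sketched below using the examples from this procedure (an alarm at 10:32 AM and the default four-hour range). The function names are hypothetical; only the arithmetic is taken from the text.

```python
from datetime import datetime, timedelta

DEFAULT_SPAN = timedelta(hours=4)  # default time range described in this guide


def alarm_time_range(alarm_time, span=DEFAULT_SPAN):
    """Explore at Storage Alarm time: the period leading up to and including the alarm."""
    return alarm_time - span, alarm_time


def default_diagnostic_range(alarm_time, now, span=DEFAULT_SPAN):
    """Explore at Default Diagnostic time: alarm placed three quarters into the range.

    If the current time falls inside the range, the range ends at the current time.
    """
    start = alarm_time - span * 0.75
    end = min(alarm_time + span * 0.25, now)
    return start, end


alarm = datetime(2015, 1, 1, 10, 32)
print(alarm_time_range(alarm))
print(default_diagnostic_range(alarm, now=datetime(2015, 1, 1, 11, 5)))
```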
Monitoring Fabrics
Foglight for Storage Management provides insight into both physical and virtual fabrics available with Brocade
and Cisco Fibre Channel (FC) switches. A physical fabric is a group of interconnected FC switches. The definition
of a virtual fabric differs depending on the vendor:
•
Brocade switches enable customers to group ports on physical switches into logical switches. Logical
switches and physical switches can then be interconnected into virtual fabrics. Brocade creates logical
ISL ports to interconnect logical switches. No metrics are available for LISL ports.
•
Cisco switches enable customers to create virtual storage area networks (VSANs) partitioned from a
physical fabric. A VSAN is a logical group of ports, where the ports are located on one or more of the
interconnected FC switches that form the physical fabric.
Fabrics are displayed in the Fabrics quick view of the Storage Environment dashboard. When you expand a
fabric branch to view its components, the list of components varies depending on the type of fabric as follows:
Figure 5. Fabric branch. Callouts: Brocade virtual fabric (list of physical and logical switches in the fabric, shown with the same icon); Cisco fabric with VSANs (list of physical switches in the fabric).
This walkthrough introduces the quick views for fabrics and their components.
To monitor fabrics, switches, and VSANs:
1
On the Storage Environment dashboard, ensure the Monitoring tab is selected.
2
Click the Fabrics tile to open the Fabrics quick view.
TIP: You can also open this quick view from the navigation panel. For more information, see
Introducing the Storage Explorer on page 44.
3
To identify the busiest fabrics in your environment, in the Fabrics list, click Summary.
The Fabrics Summary (All Fabrics) panel opens. The Fabrics view identifies the top three fabrics with the
highest average values for Data Rate, Link Error Rate, and Non-Link Error Rate, respectively. The FC
Switches view identifies the top three switches in terms of the same metrics. The charts plot the metric
values over the time period, while the tables show the average and current values for each component.
Figure 7. Fabrics Summary panel.
To investigate one of the top three fabrics or switches:
•
To explore a top fabric, click its name in a table. See Exploring a Fabric on page 46.
•
To explore a top switch, click its name in a table. See Exploring a Switch on page 49.
•
To return to this quick view, in the breadcrumbs, click Storage Environment.
4
To monitor the performance of a fabric, in the Fabrics list, click a fabric name.
In the Fabric Summary (Selected Fabric), the Related Inventory view contains alarm summaries for the
selected fabric as well as its switches, ISL ports, N ports, and VSANs (Cisco fabrics only). The Resource
Utilization charts display the following metrics for ISL ports (left) and N ports (right) used by the fabric:
•
Avg Utilization Distribution. For each type of port, displays aggregated values for Rcvd
Utilization and Xmit Utilization grouped by percentage of usage. Most of your port utilization
should be in the lower percentages. When there are ports performing at high utilization rates,
you may want to investigate port performance further.
•
Data Rate. For each type of port, plots aggregated values for Data Receive Rate and Data Send
Rate over the time period and displays the Baseline.
•
Error Rate. For each type of port, plots aggregated values for Link Error Rate and Non-Link Error
Rate over the time period.
Figure 8. Fabric Summary.
To continue investigating the selected fabric:
•
To explore details about the fabric, its switches, and its ports, click View in Explorer. See
Exploring a Fabric on page 46.
•
To explore an FC switch in the selected fabric, in the Related Inventory view click FC Switches or
an alarm icon, and select a switch. See Exploring a Switch on page 49.
•
To investigate a port used in the selected fabric, in the Related Inventory view click either ISL
Ports or N Ports or an alarm icon, and select a port. See Investigating an FC Switch Port on page
78.
•
Cisco fabrics only: To investigate a VSAN used in the selected fabric, in the Related Inventory
view click VSANs or an alarm icon, and select a VSAN. See Exploring a Cisco VSAN on page 50.
•
To return to the quick view, in the breadcrumbs, click Storage Environment.
5
To monitor the performance of a switch (physical or logical), in the Fabrics list, expand a fabric and click
a switch.
In the FC Switch Summary, the Related Inventory view contains alarm summaries for the selected switch,
the fabric it belongs to, and its ISL ports and N ports. The charts display the following metrics for ISL
ports and N ports used by the switch:
•
Ports Average Utilization Distribution. For each type of port, displays aggregated values for
Rcvd Utilization and Xmit Utilization grouped by percentage of usage. Most of your port
utilization should be in the lower percentages. When there are ports on the switch performing at
high utilization rates, you may want to investigate port performance further.
•
Rcv Rate. For each type of port, plots aggregated values for Data Receive Rate over the time
period and displays the Baseline.
•
Xmit Rate. For each type of port, plots aggregated values for Data Send Rate over the time
period and displays the Baseline.
•
Error Rate. For each type of port, plots aggregated values for Link Error Rate and Non-Link Error
Rate over the time period.
Figure 9. FC Switch Summary
TIP: To identify port performance that has changed, look for values that fall outside the grey
Baseline ribbon.
To continue investigating the selected switch:
•
To explore details about the switch and its ports, click View in Explorer. See Exploring a Switch
on page 49.
•
To investigate a port used by the selected switch, in the Related Inventory view click ISL Ports or
N Ports or an alarm icon, and then select a port. See Investigating an FC Switch Port on page 78.
•
To return to the quick view, in the breadcrumbs, click Storage Environment.
6
To monitor the performance of a Cisco VSAN, in the Fabrics list, expand a Cisco fabric and click a VSAN.
In the VSAN Summary, the Related Inventory view contains alarm summaries for the selected VSAN, the
fabric it belongs to, and the ISL ports and N ports used by the VSAN. The Resource Utilization charts are
the same as the charts displayed for a fabric, but the aggregated values include only the ports used by
the VSAN.
Figure 10. VSAN Summary.
To continue investigating the selected VSAN:
•
To explore details about the VSAN and its ports, click View in Explorer. See Exploring a Cisco
VSAN on page 50.
•
To investigate a port used by the selected VSAN, in the Related Inventory view click ISL Ports or N
Ports or an alarm icon, and then select a port. See Investigating an FC Switch Port on page 78.
•
To return to the quick view, in the breadcrumbs, click Storage Environment.
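The "top three" ranking used in the Fabrics Summary panel (step 3 above) can be sketched as follows. This is a minimal illustration, assuming each fabric or switch has a list of metric values collected over the selected time range; the function name and the sample figures are hypothetical.

```python
from statistics import mean


def top_n_by_average(series_by_name, n=3):
    """Rank components by their average metric value over the time range.

    series_by_name maps a component name to its values, oldest first.
    Returns (name, average, current) rows, where current is the last value.
    """
    rows = [
        (name, mean(values), values[-1])
        for name, values in series_by_name.items()
        if values  # skip components with no data in the range
    ]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:n]


# Hypothetical Data Rate samples for four fabrics.
data_rate = {
    "fabric-a": [10, 20, 30],
    "fabric-b": [50, 50],
    "fabric-c": [5, 5, 5],
    "fabric-d": [40, 20],
}
top = top_n_by_average(data_rate)
print(top)
```

The same ranking applies to Link Error Rate and Non-Link Error Rate by passing in the corresponding samples.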
Monitoring Storage Arrays
When monitoring storage arrays, or when diagnosing alarms on storage arrays, you may need more detail on the
array or its member nodes, controllers, ports, pools, LUNs, or disks. You may also want to determine if any of
the child components have alarms that did not affect the array status. This walkthrough introduces the quick
view for storage arrays.
To monitor storage arrays:
1
On the Storage Environment dashboard, ensure the Monitoring tab is selected.
2
Click the Storage Arrays tile to open the Storage Arrays quick view.
TIP: You can also open this quick view from the navigation panel. For more information, see
Introducing the Storage Explorer on page 44.
3
To identify the most problematic arrays in your environment, in the Storage Arrays list, click Summary.
Figure 11. Storage Arrays list.
The Storage Array Summary summarizes the capacity and performance health of the arrays in the
environment.
•
Capacity
- The table is organized by arrays whose pools have the most significant, near-term capacity
issues.
- The categories take into account the estimated time when pool capacity will be full, as
well as the current available capacity and the over-provisioning state of the pool.
- To investigate, click on the array name. See Monitoring Storage Capacity on page 40.
•
Performance
- The table is organized by arrays with the most LUNs having latency issues.
- To investigate, click on the array name in the table to drill down to the Array Explorer.
Click on the LUNs tab. See Investigating a LUN on page 81.
Figure 12. Storage Arrays Summary.
4
To monitor a storage array, in the Storage Array list, click a storage array name.
The content of the quick view varies depending on the selected storage array. For most storage arrays
(excluding Dell EqualLogic and EMC Isilon), the quick view contains the following embedded views:
•
Related Inventory. Contains alarm summaries for the selected storage array and its controllers,
FC ports, IP ports, pools, LUNs, and disks.
•
Storage Capacity Summary. Displays current values for Total Advertised LUNs Size and Capacity
Provisioned to LUNs.
•
Controller Performance. Plots % Busy values by controller over the time period and displays
threshold lines (defined in registry variable StSAN.Controller.PctBusyThreshold).
•
Pools with Severe or High Pressure on Available Usable Capacity (or Raw Capacity)
- Displays the pools that have the most significant, near-term capacity issues.
- Shows the available usable capacity or available raw capacity in the table, depending on
the data provided from the device vendor, and the % available.
- Shows the estimated time when the pool capacity will be full.
- Over commitment is not shown when raw capacity numbers are displayed.
- The cylinders are colored to show the % of available capacity in the pool.
•
LUNs/Disks States. Plots the percentage of disks and LUNs in the storage array in problem states.
Problem states are reported by the vendor. Resolving these issues may improve LUN performance.
Figure 13. Storage Array Summary.
For more storage array quick views, see the Summary tab description under Dell EqualLogic Storage
Array on page 56 and EMC Isilon Storage Array on page 59.
5
To continue investigating the selected storage array:
•
To explore details about the storage array and its child components, click View in Explorer. See
Exploring a Storage Array on page 54.
•
To investigate a child component, in the Related Inventory view click a component type or a
status icon, and then select a component. A component dashboard opens. For help with the
dashboard, see one of the following topics:
- Investigating a Controller on page 74
- Investigating an EqualLogic Member on page 77
- Investigating an Isilon Node on page 80
- Investigating an Array/Filer Port on page 73
- Investigating a Pool on page 88
- Investigating a LUN on page 81
- Investigating a Physical Disk on page 86
•
To return to this quick view, in the breadcrumbs, click Storage Environment.
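The pool views in this section show an estimated time when pool capacity will be full. This guide does not describe how the product computes that estimate. As a rough mental model only, the sketch below assumes a simple linear extrapolation from the first and last usage samples; the function name and the sample figures are hypothetical.

```python
def days_until_full(used_history, total_capacity):
    """Estimate days until a pool is full by linear extrapolation.

    used_history is a list of (day_number, used_capacity) samples, oldest first.
    Returns None when there is too little data or usage is not growing.
    """
    if len(used_history) < 2:
        return None
    (first_day, first_used), (last_day, last_used) = used_history[0], used_history[-1]
    if last_day <= first_day:
        return None
    growth_per_day = (last_used - first_used) / (last_day - first_day)
    if growth_per_day <= 0:
        return None
    # A pool that is already full or over capacity has zero days left.
    return max(0.0, (total_capacity - last_used) / growth_per_day)


# Hypothetical pool: 1000 GB total, growing from 400 GB to 500 GB in 10 days.
estimate = days_until_full([(0, 400), (10, 500)], total_capacity=1000)
print(estimate)
```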
Monitoring Filers
When monitoring filers, or when diagnosing alarms on filers, you may need more detail on a filer and its
controllers, ports, NASVolumes, LUNs, aggregates, or disks. You may also want to determine if any child
components display alarms. This walkthrough introduces the quick views for filers.
To monitor filers:
1
On the Storage Environment dashboard, ensure the Monitoring tab is selected.
2
Click the Filers tile to open the Filers quick view.
TIP: You can also open this quick view from the navigation panel. For more information, see
Introducing the Storage Explorer on page 44.
3
To identify the busiest filers in your environment, in the Filers list, click Summary.
Figure 14. Summary
The Filer Summary view identifies the top three filers in two categories:
•
Filers with Lowest % of Free Disk/Spares Capacity (raw). Displays cylinders showing the amount
of used Free Disk/Spares Capacity (Raw). Below each cylinder, you can see total and free
capacity.
•
Filers with Lowest % of Available Aggr Capacity (usable). Displays cylinders showing the amount
of used Aggr Capacity (Usable) Free. Below each cylinder, you can see total and free capacity.
Figure 15. Filer Summary view
If you find that cylinder colors do not reflect the acceptable thresholds in your environment, you can ask
your Foglight for Storage Management Administrator to edit the threshold values in the following registry
variables:
•
Disk capacity: StSAN.FilerDisks.PctUnallocatedCapacityThreshold.[Fatal|Critical|Warning]
•
Aggregate capacity:
StSAN.FilerAggregates.PctUnallocatedCapacityThreshold.[Fatal|Critical|Warning]
NOTE: Registry variables are global variables that are often referenced by rules. Before editing a
registry variable, ensure that the edit will not cause unintended changes to how the affected rules
trigger alarms. For more information, search for “Registry Variable” in the online help.
4
In the Filer list, click a filer name.
The Related Inventory view contains alarm summaries for the selected filer as well as its controllers,
ports, NASVolumes, LUNs, aggregates, and disks. The Resource Utilization charts show capacity and
performance metrics for the filer:
•
Storage Capacity Summary. The cylinders show the amount of capacity consumed in the filer,
expressed using current values for the following pairs of metrics:
- Disk Capacity (Raw) and Free Disk/Spares Capacity (Raw)
- Aggr Capacity (Raw) Total and Aggr Capacity (Raw) Free
- Aggr Capacity (Usable) Total and Aggr Capacity (Usable) Free
The bar chart displays current values for Advertised LUN Size, Advertised NASVolumes Size, Aggr
Capacity (Usable) Total, and Aggr Capacity (Usable) Free.
•
NASVolume Performance. Plots the percentage of NASVolumes in the filer in problem states.
Problem states are reported by the vendor. Resolving issues may improve volume performance.
•
LUN Performance. Plots the percentage of LUNs in the filer in problem states. Problem states are
reported by the vendor. Resolving issues may improve LUN performance.
Figure 16. Filer Summary
5
To continue investigating the selected filer:
•
To explore details about the filer and its child components, click View in Explorer. See Exploring
a Filer on page 51.
•
To investigate a child component, in the Related Inventory view click a component type or an
alarm icon, and then select a component. A component dashboard opens. For help with the
dashboard, see one of the following topics:
- Investigating a Controller on page 74
- Investigating an Array/Filer Port on page 73
- Investigating a NASVolume on page 84
- Investigating a LUN on page 81
- Investigating an Aggregate on page 70
- Investigating a Physical Disk on page 86
•
To return to this quick view, in the breadcrumbs, click Storage Environment.
Asking Questions About the Monitored
Storage Environment
Another way to find out information about your storage environment is to ask questions. The FAQts tab contains
frequently-asked questions about storage environments. Answers are displayed in the form of tables and graphs.
You can select questions that apply to all storage resources or to a specific type of storage resource.
To use the FAQts tab:
1
On the Storage Environment dashboard, click the FAQts tab.
2
In the Categories pane, click a category to display the questions for that category in the Questions pane.
TIP: If the list of questions is long and you want to narrow it down, search for a particular text
string using the Search Questions box.
3
In the Questions pane, click a question to display the answer in the Answer pane.
Figure 17. Questions pane
4
In the Answer pane, you can change the number of results displayed by changing the value of the Show
Top box.
5
To investigate a component or its parent device, click a name in the Name or Parent column.
Assessing Connectivity and I/O Performance
You can also identify connectivity and port performance problems that occur during I/O requests by assessing
storage from the perspective of the virtualization environment. The VMware and Hyper-V Explorers offer an
end-to-end look at all the entities that make up your virtual infrastructure, including clusters, hosts,
virtualization storage, and virtual machines.
NOTE: This workflow assumes that the following requirement is satisfied:
•
VMware or Hyper-V Performance agents are defined for the target environment, and the
agents are configured to collect storage data. For more information, see Step 2 in
Collecting Storage Data for Virtualized Devices and Servers on page 12.
Introducing the Virtualization Dashboards
This workflow introduces you to the version of the Virtualization dashboards available in Foglight for Storage
Management. For detailed information, see the VMware Monitoring in Foglight for Storage Management User
and Reference Guide or Hyper-V Monitoring in Foglight for Storage Management User and Reference Guide.
Table 2. List of dashboards that display virtualization storage environment-to-SAN relationships when data
collection is enabled on VMware or Hyper-V Performance agents
Dashboard                       SAN Topology Tab    SAN Data Paths Tab
ESX Host/Hyper-V Server         Yes                 Yes
Virtual Machine                 Yes                 Yes
Cluster                         Yes                 -
Datastore, CSV, logical disk    Yes                 -
LUN                             Yes                 Yes
NASVolume                       Yes                 -
The following workflow walks you through opening a virtualization dashboard and selecting an entity to explore.
To use the virtualization dashboard and topology diagrams:
1
On the navigation panel, under Dashboards, click either VMware > VMware Explorer or Hyper-V >
Hyper-V Explorer.
TIP: If you do not see the menu, ask your Foglight for Storage Management Administrator to add
the role VMware Administrator or Hyper-V Administrator to your user account.
The Explorer opens an Infrastructure menu below the Dashboards menu on the navigation panel.
2
Under Infrastructure, on the Topology tab, select the desired entity.
The dashboard opens.
NOTE: The topology displays clusters, virtualization servers, and virtual machines. For VMware,
it also displays Virtual Centers and Datacenters.
•
From the Summary tab, you can monitor and investigate entities in your virtual infrastructure
using an approach similar to the Storage Environment dashboard.
•
When you click the Storage tab, you see a list of all logical storage used by the entity and the
associated storage capacity. Click a storage entity name to open the dashboard containing details
about the entity.
Figure 18. Entity details
TIP: When the list of entities is long—which is often the case for virtual machines—you can find an
entity by typing its full or partial name in the Topology tab’s Search box.
When you select a cluster, the dashboard includes a SAN Topology tab. Host/server and virtual
machine dashboards include a SAN Data Paths tab in addition to the SAN Topology tab.
•
Click the SAN Topology tab. For instructions about this view, see Exploring Connectivity with SAN
Topology Diagrams on page 34.
•
Click the SAN Data Paths tab. For instructions about this view, see Exploring I/O Performance with
SAN Data Paths on page 37.
Summary of Icons Used in Topology Diagrams
In topology diagrams, the icons represent entities in your monitored virtual infrastructure and storage
infrastructure, as described in the following table. Each icon incorporates a small status icon to show the status
of the entity.
Table 3. Icons used in topology diagrams.
Represents                           Description
Physical Host                        A physical server in your network.
ESX Host, Hyper-V Server             A physical server hosting the hypervisor-based architecture and virtual machines, controlling and managing resources for the virtual machines.
Virtual Machine (VMware, Hyper-V)    A virtual machine running on a host or server.
Datastore, Logical Disk, CSV         A datastore, CSV (Cluster Shared Volume), or logical disk is a logical storage structure used to provide virtual machine disks and files.
Disk Extent                          A disk extent is all or part of a host disk that can provide the physical storage for a datastore, CSV, or logical disk. The disk extent can be on a local DAS device, or it can be mapped to a LUN on a storage array or filer in the SAN.
Cloud                                Hides port-level complexity in a topology diagram. Click the icon to view ports and their connected devices in a Detail window. NOTE: The SAN Data Paths tab does not use the cloud icon. All ports and their devices are shown in the diagram.
FC Port                              FC ports connect to other FC ports on switches, physical hosts, storage arrays, or filers using fibre-channel network technology in the SAN.
IP Port                              IP ports provide access to the IP network for storage devices using iSCSI or other protocols.
Controller                           A controller manages the ports used by a storage array or filer.
LUN                                  A LUN (logical unit number) represents a logical SAN block storage device on an array or filer that can be exposed for mapping to a server.
NASVolume                            A NASVolume is a volume whose physical storage is on a filer or unified storage supplier. It can be mounted by an ESX host to provide the physical storage for a datastore using NFS.
Member or Node                       A storage server that works together with other members or nodes in a clustered storage architecture. The array cluster distributes the workloads among the members of the cluster.
Exploring Connectivity with SAN Topology Diagrams
SAN Topology diagrams map the connections through the virtual environment (VMs-to-virtualization storage-to-extents) to the resources in the SAN (LUNs and NASVolumes). Topology diagrams hide port-level details to focus
on connectivity. Using this view, you can see which resources have an alarm status along each connection path,
and begin to form hypotheses about whether or not the SAN environment is contributing to performance issues
in the virtual environment. You can then drill down on resources to test those hypotheses.
In a topology diagram, you can view individual port connections by clicking one or more cloud icons. Details are
shown in a separate window.
TIP: If your primary goal is to assess the performance of the ports that are used by the datastores and disk
extents connecting to the SAN, use the SAN Data Paths tab instead (when available). For more
information, see Exploring I/O Performance with SAN Data Paths on page 37.
The SAN Topology view is displayed as a graph for a VM, Datastore, or CSV, and for a storage LUN or NASVolume.
The SAN Topology view for a Cluster, server, or host offers the following choices for viewing the topology:
•
Show Datastore or CSV resources used by VMs as graph
•
Show RDM or pass-through disk resources used by VMs as graph
•
Show storage resources used by VMs as table
•
Show all storage resources as table
The option to show all storage resources as a table will include Datastores or CSVs that are not used by VMs.
The following workflow explains how you can verify connectivity and the status of entities and storage devices
in your infrastructure using a topology diagram. This procedure assumes that you navigated to a topology view
from a Virtualization Explorer dashboard (see Introducing the Virtualization Dashboards on page 31) or from a
Storage Explorer component dashboard (see Investigating a LUN on page 81 or Investigating a NASVolume on
page 84).
To use a topology diagram graph:
1
Click on a Datastore or CSV icon to see the VMs that use this storage. Click on a cloud icon to see the
extents that map to the LUN.
NOTE: When viewing the graph for a Cluster, the VMs and disk extents are not displayed on the
main graph. When viewing the graph for a server or host, if it supports large numbers of VMs, as in
a VDI environment, the VMs will not display on the main graph.
The following image shows virtual machines connected to separate datastores. The disk extents are
identified and linked through the network to LUNs. For descriptions of icons, see Summary of Icons Used
in Topology Diagrams on page 33.
Figure 19. Topology diagram.
2
When a topology diagram has a very large number of elements to be displayed, the graphical
representation may be changed to a table for clarity. If the graph is too large to display in the viewing
area, you can use the browser’s scroll bars and the Navigator to view the hidden areas. The Navigator
enables you to scale the topology diagram and, in large diagrams, drag the viewing area to a different
part of the diagram.
3
Click a Datastore or CSV icon to see the VMs that use this storage.
Figure 20. VMs using this storage.
4
Click any icon to review key details and metrics about the entity.
Figure 21. Key details and metrics.
5
Click a Cloud icon to display the ports and SAN paths from the logical storage to the LUN, through
the host or server.
Figure 22. Ports and SAN paths.
In this example, you can see that the LUN’s controller has a Critical status.
6
To investigate an entity, click its icon. For example:
a
Click the controller icon.
b
To view the alarm message, click the alarm count (opens an Alarm Summary window) or the
controller name (opens the Controller dashboard).
Figure 23. Alarm message.
c
The Alarm Summary reports the cause.
7
In some topology diagrams, you may see an inferred host (indicated with an orange line). On these
diagrams, if an inferred-host-to-port assignment is incorrect, you can click the host port icon and click
the appropriate action to correct it.
Figure 24. Incorrect inferred-host-to-port assignment.
To use a topology diagram table:
1
Click on a VM cell to display the VMs that use this Datastore or CSV, or RDM. Click on a cloud icon to
display the ports and SAN paths from the storage to the LUN, through the hosts or servers.
NOTE: The table view provides the same information as the graphical view.
Exploring I/O Performance with SAN Data Paths
The SAN Data Paths tab focuses on input/output performance of the ports used in the data paths connecting the
virtual environment to the SAN. This view combines an I/O Data Paths table with a topology diagram that
includes port details and I/O performance metrics. The table displays the worst-performing path segment of the
possible data paths between each disk extent and the LUN during the time period, helping you to identify
bottlenecks resulting in high latency. The diagram enables you to see the entities that are capable of doing I/O
through each path segment, corresponding to the table rows.
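As a rough illustration of how a busiest link can be picked for a path segment, the following Python sketch takes the higher of the read and write utilization for each link and returns the link with the largest value. The PortLink class, its field names, and the sample port names and values are hypothetical. They are not part of the product and only mirror the behavior described for the utilization columns of the I/O paths table.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PortLink:
    """Hypothetical record for one port-to-port link in a path segment."""
    name: str
    read_util_pct: float
    write_util_pct: float

    @property
    def busiest_pct(self) -> float:
        # A link is characterized by its busier direction (read or write).
        return max(self.read_util_pct, self.write_util_pct)


def busiest_link(links: List[PortLink]) -> PortLink:
    """Return the link whose busier direction has the highest utilization."""
    return max(links, key=lambda link: link.busiest_pct)


# Invented sample data for two links between a host and the FC switches.
sample_links = [
    PortLink("hba0 -> switch1:3", read_util_pct=12.0, write_util_pct=40.0),
    PortLink("hba1 -> switch2:7", read_util_pct=55.0, write_util_pct=20.0),
]
worst = busiest_link(sample_links)
print(worst.name, worst.busiest_pct)
```

In this made-up data, the second link is selected because its read utilization is the highest single value among all links.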
Because the metric values in the SAN Data Paths tab represent the average values over the time period,
consider selecting a shorter time range (one hour or less) to help investigate performance spikes. For example,
if a customer is complaining about latency issues in the last 15 minutes, you may want to set the dashboard
time range to the last 15-30 minutes to focus on the problem period.
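The effect of the time range on averaged metrics can be seen with a small worked example. The Python sketch below uses invented one-minute latency samples (225 quiet minutes followed by a 15-minute spike) and compares the average over four hours, 30 minutes, and 15 minutes. The sample values and the function name are assumptions made for illustration only.

```python
from statistics import mean

# Hypothetical one-minute latency samples in ms, oldest first:
# 225 quiet minutes at 5 ms, then a 15-minute spike at 85 ms.
samples_ms = [5.0] * 225 + [85.0] * 15


def average_over_last(samples, minutes):
    """Average of the most recent `minutes` one-minute samples."""
    return mean(samples[-minutes:])


four_hour_avg = average_over_last(samples_ms, 240)
thirty_min_avg = average_over_last(samples_ms, 30)
fifteen_min_avg = average_over_last(samples_ms, 15)

print(four_hour_avg, thirty_min_avg, fifteen_min_avg)
```

With these numbers, the four-hour average is 10 ms, while the 15-minute average is 85 ms. The spike is almost invisible in the longer range and obvious in the shorter one.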
TIP: If your primary goal is to discover which SAN resources are used by entities in your virtual
environment, use the SAN Topology tab instead. For more information, see Exploring Connectivity with
SAN Topology Diagrams on page 34.
This workflow walks you through using the SAN Data Paths tab from the VMware Explorer’s ESX Host dashboard.
The content of the SAN Data Paths tab may be slightly different on the Virtual Machine, Datastore, and LUN
dashboards, but the flow is the same. The workflow for Hyper-V servers, VMs, and CSVs is similar, but uses the
Hyper-V terminology. This workflow continues from Introducing the Virtualization Dashboards on page 31.
To review data paths:
1
On the SAN Data Paths tab, review the In Use I/O Paths table to see the worst-performing path segments
in terms of Latency.
The color of the bars (in the table and in the diagram) highlights data values that fall within various
thresholds. Metrics without thresholds are displayed in dark blue.
Figure 25. In Use I/O Paths.
The columns display the following details and metrics. On the Virtual Machine, Datastore, and LUN
dashboards, the details and metrics may be slightly different.
•
Datastore/Disk Extent. List of datastores and the disks that they use, ordered so that the
datastores with high-latency disk extents appear at the top. Datastores configured from a
NASVolume show only the associated volume; no other data is available. If an RDM or Other node
is displayed, the disk extents under this node are RDMs providing storage directly to the virtual
machine.
•
Latency. Average latency per operation.
NOTE: When physical hosts are inferred, latency values are unavailable because the disk
extents that map to the selected LUN are unknown. For more information, see Inferring
Physical-Host-to-Storage Relationships on page 104.
•
Data Rate. Average data rate for I/O from the ESX or VM to the LUN.
•
ESX FC Ports --> SAN Util. Displays the busiest link (read or write utilization) in the possible paths
between the ESX and the FC switches. Click the cell to display all the port links. Review the
topology diagram to see the ports and link utilization. Data is not available for IP ports.
•
SAN --> A/F Ports Util. Displays the busiest link (read or write utilization) in the possible paths
between the FC switches and the array/filer ports. Click the cell to display all the port links.
Review the topology diagram to see the ports and link utilization. Data is not available for IP
ports.
•
A/F Ctrl Busy. Displays the CPU % Busy metric for the busiest controller in the data path for a
storage array or filer. % Busy values are not available on some devices.
•
LUN / NASVolume / Dir. Displays the LUN that is mapped to the extent, or displays the NASVolume
(filers) or directory (Isilon arrays) providing the storage for a datastore.
•
% Competing I/O at LUN. Displays the percentage of I/O being experienced by this LUN for all
VMs accessing the Datastore, not just those in this ESX. Click the cell to display the top five VMs
doing I/O to this LUN.
•
LUN State. Reports on the state of the LUN as follows:
Table 4. LUN states
- Indicates that the LUN is reporting activity that gives an indication of performance
problems, it is currently degraded or rebuilding, or the % Busy or Latency metrics are over
their thresholds for the time period. Dwell on the cell for details.
- Indicates that the vendor does not provide % Busy or Latency metrics.
- Indicates that either the % Busy metric or Latency metric is within normal range during the
time period, and the LUN is not reporting that it is currently degraded or rebuilding. Dwell
on the cell for details.
•
Latency (ms). Average latency per operation to the LUN during the time period.
2
In the diagram, ensure that Select Desired View is set to Show full data paths.
The diagram shows the data paths between datastores and the LUNs providing the physical storage
within an ESX host. The bars shown below the icons summarize average latency for hosts (if available),
link utilization for FC ports, and percentage busy for controllers. For descriptions of icons, see Summary
of Icons Used in Topology Diagrams on page 33.
Figure 26. Show full data paths. (Callouts: Latency, Link utilization, Percentage busy.)
This diagram highlights how ports are often shared by multiple disk extents. You may be able to trace the
cause of a poorly-performing disk extent to poor link utilization values or to some resource issue further
down the data path in the SAN. Or if all SAN ports and resources are in a Normal status, you may be able
to rule out a storage issue.
3
Review the data paths for the worst performing entities to identify bottlenecks that may be contributing
to the high latency values.
4
To see more of the diagram, use the Navigator to shrink the diagram or move the viewing area. For more
information, see Exploring Connectivity with SAN Topology Diagrams on page 34.
TIP: You can also close the I/O Paths table using the arrow on the far right of the table title bar.
5
To investigate an entity, click its icon or name. Use the breadcrumbs to return to this view.
6
If you find that the colors used in some metric bars do not reflect acceptable thresholds in your
environment, you can ask your Foglight for Storage Management Administrator to edit the threshold
values in the following registry variables:
•
Latency: VMW:diskTotalLatency.[warning|critical|fatal]
•
Latency: HPV:diskTotalLatency.[Warning|Critical|Fatal]
•
Port utilization: StSAN.FCSwitchPort.Utilization.[Warning|Critical|Fatal]
•
A/F Ctrl Busy: StSAN.Controller.PctBusyThreshold.[Warning|Critical]
NOTE: Registry variables are global variables that are often referenced by rules. Before editing a
registry variable, ensure that the edit will not cause unintended changes to how the affected rules
trigger alarms. For more information, search for “Registry Variable” in the online help.
Monitoring Storage Capacity
Foglight for Storage Management (FSM) closely monitors the capacity used by the pools in your array or filer. The historical growth of pool
capacity usage is analyzed daily to estimate how much time is remaining until the pool becomes full. This
capability is critically important for pools that are over-committed with thin-provisioned LUNs and/or
NASVolumes.
This section discusses the rules and alarms, views, charts, reports, and configurable values you can use to
monitor your pool capacity.
Capacity Trending
FSM performs a linear regression analysis nightly on the historical values of consumed pool capacity.
The Time Until Full (per available history) value uses all available history, up to the last 180 days, to project when
the pool will become full. The minimum number of days required to compute this trend is defined by the
registry variable StSAN_minDaysForLongHistTrend. The default is 30 days.
The Time Until Full (per limited history) value uses only recent history, up to the last 30 days, to project when the
pool will become full. The minimum number of days required to compute this trend is defined by the registry
variable StSAN_minDaysForShortHistTrend. The default is 20 days. Examining this value is useful primarily
when there has been a significant recent change in the pool usage that is expected to continue.
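The trend projection can be sketched as an ordinary least-squares line fitted to daily consumed-capacity samples and extended until it reaches the pool capacity. The Python example below is only an illustration under stated assumptions: one sample per day, capacity in GB, and a hypothetical function name and return convention. It is not the product's implementation, whose details are not documented here.

```python
def days_until_full(used_gb, capacity_gb, min_days=30):
    """Project days until a pool is full from daily samples (oldest first).

    Returns None when there is too little history, float('inf') when usage
    is flat or decreasing, and otherwise the projected number of days
    counted from the most recent sample.
    """
    n = len(used_gb)
    if n < max(min_days, 2):
        return None  # insufficient history to compute a trend

    # Least-squares fit of used_gb against the day index 0..n-1.
    x_mean = (n - 1) / 2
    y_mean = sum(used_gb) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(used_gb))
    slope = sxy / sxx
    if slope <= 0:
        return float("inf")  # usage is not increasing

    intercept = y_mean - slope * x_mean
    day_full = (capacity_gb - intercept) / slope
    return max(0.0, day_full - (n - 1))


# Invented history: 45 days of steady growth at 2 GB per day.
history = [100.0 + 2.0 * day for day in range(45)]

# Long trend: up to the last 180 days, minimum 30 days (documented defaults).
long_trend = days_until_full(history[-180:], capacity_gb=400.0, min_days=30)
# Short trend: up to the last 30 days, minimum 20 days (documented defaults).
short_trend = days_until_full(history[-30:], capacity_gb=400.0, min_days=20)
print(long_trend, short_trend)
```

Because the invented history grows at a constant rate, both windows project roughly the same value (about 106 days). With real data, the two values diverge when recent growth differs from the long-term trend.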
The Time Until Full value is displayed with the following granularity:
•
In days, for 0-2 months
•
In weeks, for 2-6 months
•
In months, for 6-12 months
•
As "> 1 year", for anything beyond 365 days
•
As "> 1 year" when usage is decreasing or unchanged
•
As "n/a" when there is insufficient history
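The display rules above can be expressed as a small formatting function. The Python sketch below is an illustration only: the function name is hypothetical, a month is assumed to be 30 days, values are rounded down, and the exact behavior at the boundaries between units is an assumption because the guide does not specify it.

```python
DAYS_PER_MONTH = 30  # assumption: the guide does not define the length of a month


def format_time_until_full(days):
    """Format a projected number of days using the documented granularity.

    `days` is None when there is insufficient history and float('inf')
    when usage is decreasing or unchanged.
    """
    if days is None:
        return "n/a"
    if days > 365:  # also covers float('inf')
        return "> 1 year"
    if days < 2 * DAYS_PER_MONTH:
        whole = int(days)
        return f"{whole} day" + ("" if whole == 1 else "s")
    if days < 6 * DAYS_PER_MONTH:
        return f"{int(days // 7)} weeks"
    return f"{int(days // DAYS_PER_MONTH)} months"


for value in (21, 90, 200, 400, float("inf"), None):
    print(value, "->", format_time_until_full(value))
```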
The Time Until Full (per available history) value is displayed on most screens that show pool data. The Capacity tab
of a Pool Explorer displays a chart of the capacity and the trends.
Evaluating Pool Capacity
Pool capacity is evaluated every few hours (after a new topology collection) to determine if a low pool capacity
alarm should be generated, and to summarize the number of pools experiencing capacity pressure on the
Environment Summary/Monitoring/Summary for Arrays and Filers.
The following factors are examined to determine if a pool is experiencing capacity pressure.
•
The over-committed state of the pool
•
The absolute capacity available and the % of capacity available
•
The estimated Time Until Full
Environment Summary/Monitoring/Summary
This view provides a quick summary of the arrays in your environment, ordered with the lowest scores at the
top. The score reflects the percent of pools in your environment with capacity issues.
Figure 27. Storage view
Clicking on the array name will display a view that identifies the pools with Severe or High capacity
issues. A low pool capacity alarm will have been generated for every pool that is not OK, if the low pool
capacity rule is enabled.
Figure 28. Arrays with capacity issues
Capacity Reports
The SanStorage Pools Capacity Summary Report summarizes the total and available capacity, and estimated
time until full, for the pools or aggregates in the selected Storage devices. The report can be optionally limited
to the pools or aggregates with thin-provisioned entities.
Low Capacity Rule
The rule StSAN E Pool Low Available Capacity with Email will be evaluated after every collection that provides
new capacity information. A fatal alarm will be generated for every pool experiencing severe capacity pressure.
A critical alarm will be generated for every pool experiencing high capacity pressure. A warning alarm will be
generated for every pool included under the monitor count. No email will be generated for warning alarms.
Registry Variables
The following registry variables are used in the pool capacity evaluations for alarm generation and for
determining which pools are experiencing capacity pressure.
These variables are used in pool capacity evaluation to compare to the estimated Time Until Full. For over-committed pools, they will generate alarms as Fatal/Critical/Warning. For pools that are not over-committed,
they will generate alarms as Critical or Warning, based on the available capacity in the pool.
•
StSAN.Pool.TimeToFull.VerySoon - default 2 weeks
•
StSAN.Pool.TimeToFull.Soon - default 4 weeks
•
StSAN.Pool.TimeToFull.Warn - default 12 weeks
These variables are used to examine pool capacity when no estimated Time Until Full value is available, and
when the Time Until Full is very long. Critical or Warning alarms will be generated for pools that are not over-committed. Fatal and Critical alarms will be generated if the pool is over-committed.
•
StSAN.Pool.AvailableCapacityThreshold.Critical - default 100 GB
•
StSAN.Pool.AvailableCapacityThreshold.Warning - default 200 GB
•
StSAN.Pool.PctAvailableCapacityThreshold.Critical - default 5%
•
StSAN.Pool.PctAvailableCapacityThreshold.Warning - default 10%
The value of these registry variables can be easily modified to meet the needs of your environment.
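To make the Time Until Full thresholds concrete, the following Python sketch maps an estimated time until full to an alarm severity for an over-committed pool, using the documented defaults. The one-to-one mapping (VerySoon to Fatal, Soon to Critical, Warn to Warning), the use of "less than or equal" comparisons, and the function name are assumptions. The guide states only that these variables generate Fatal/Critical/Warning alarms for over-committed pools, and the sketch does not cover pools that are not over-committed or the available-capacity thresholds.

```python
from typing import Optional

# Default values from the registry variables, in weeks.
TIME_TO_FULL_WEEKS = {
    "StSAN.Pool.TimeToFull.VerySoon": 2,
    "StSAN.Pool.TimeToFull.Soon": 4,
    "StSAN.Pool.TimeToFull.Warn": 12,
}


def severity_for_over_committed_pool(weeks_until_full: float) -> Optional[str]:
    """Assumed mapping of Time Until Full to severity for an over-committed pool."""
    if weeks_until_full <= TIME_TO_FULL_WEEKS["StSAN.Pool.TimeToFull.VerySoon"]:
        return "Fatal"
    if weeks_until_full <= TIME_TO_FULL_WEEKS["StSAN.Pool.TimeToFull.Soon"]:
        return "Critical"
    if weeks_until_full <= TIME_TO_FULL_WEEKS["StSAN.Pool.TimeToFull.Warn"]:
        return "Warning"
    return None  # no time-based alarm under this assumed mapping


for weeks in (1, 3, 8, 20):
    print(weeks, "weeks ->", severity_for_over_committed_pool(weeks))
```

Changing a threshold in this sketch corresponds to editing the matching registry variable. For example, raising the Warn value would cause pools with a longer projected time until full to be flagged.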
Creating Storage Reports
In the Storage Environment dashboard, you can create reports to share with colleagues. You can also copy and
edit existing reports. If the Reports tab is not displayed, your Foglight for Storage Management Administrator
may need to modify your user account to include the General Access role.
TIP: You can also use the Foglight for Storage Management Reports dashboard available from the Reports
link at the top of all dashboards. For more information about this dashboard, see the Foglight for Storage
Management User Guide.
To create a report:
1
On the Storage Environment dashboard, click the Reports tab.
The Create a Report wizard opens. Use the Next and Back buttons to navigate between pages.
Figure 29. Follow the steps in the wizard to select a report template and specify report parameters.
2
Click Finish to generate the report.
3
Investigating Storage Devices
To find out more about the performance of a storage device and its components, you use the Storage Explorer
dashboards.
Introducing the Storage Explorer
The Storage Explorer enables you to quickly find and examine any storage device in your storage environment.
The Storage Explorer has a Topology tree—displayed in the navigation panel below the list of Dashboards—that
contains branches for Fabrics, Filers, and Storage Arrays. The fabrics in the Fabrics branch can be expanded to
show FC switches and VSANs. When you select a device from the tree, the Storage Explorer opens a dashboard
displaying details about that device.
This walkthrough describes how to open the Storage Explorer and select a storage device. If you select a top-level device type node, such as Fabrics, the Storage Environment quick view is displayed. For more information
about the quick view, see Monitoring Your Storage Environment on page 19.
To use the Storage Explorer:
1
On the navigation panel, under Dashboards, click Storage & SAN > Storage Explorer.
The Topology tree opens below the Dashboards area, containing a list of all monitored storage devices
and their status.
Figure 1. The Topology tree
2
On the navigation panel, under Topology, select a storage device. Details about the device and its
components are displayed in a Storage Explorer dashboard.
All Storage Explorer dashboards have a similar organization.
Figure 2. Common elements of the Storage Explorer dashboards. (Callouts: Time Range, Alarm Count, Components, FAQts, Summary Tab Views, Alarm Summary.)
Table 1. Storage Explorer dashboards contain the following common screen elements:
Screen Element       Description
Time Range           Displays and controls the period of time represented in the dashboard. The default is the last four hours. For more information, see “Time Range” in the online help.
Alarm Count          Displays counts of all the alarms at each severity level that are still active (not cleared) at the end of the displayed time range. Unlike the Alarm Summary view, the Alarm Count includes alarms for the device and all its child components. Click an alarm count to display the alarms at that severity level.
Summary Tab Views    When the Summary tab is selected:
                     • The top view displays the same Related Inventory view and charts that are displayed for this device in the Storage Environment quick view.
                     • The Summary and Resource Information view displays physical configuration details for the selected device.
                     • The Alarm Summary displays device-level alarms (if any).
Component Tabs       Contains a tab for each type of component used by the selected device, such as ports, drives, and disks.
                     When a component tab is selected:
                     • The top view displays key performance and/or capacity metrics for the components used by this device.
                     • A Details view lists the components by name and displays physical configuration details.
                     • The Alarm Summary displays component-level alarms (if any).
                     • Other component-specific views may be present.
FAQts Tab            Contains questions appropriate for the selected device type. For information on how to use a FAQts tab, see Asking Questions About the Monitored Storage Environment on page 30.
Alarm Summary        Displays alarms for the selected device (on the Summary tab) or the type of component (on the Component tabs).
For information about each type of Storage Explorer dashboard, see the following topics:
•
Exploring a Fabric
•
Exploring a Switch
•
Exploring a Cisco VSAN
•
Exploring a Storage Array
•
Exploring a Filer
Exploring a Fabric
The Storage Explorer’s Fabric dashboard contains information about the selected fabric and its switches as well
as all ports used by the switches in the fabric. The data displayed reflects values for the selected time range.
This walkthrough describes the contents of each of the tabs and points you to where you can drill down for more
details.
To explore a fabric:
1
On the navigation panel, under Dashboards, click Storage & SAN > Storage Explorer.
2
On the navigation panel, under Topology, expand Fabrics and select a fabric.
The Fabric dashboard opens with details for the selected fabric.
TIP: You can also open this dashboard from the Storage Environment dashboard by selecting a
fabric and clicking View in Explorer.
3
If the Alarm Count displays alarms, you may want to assess the alarms before continuing with this
walkthrough. Click a status count to open a list of device and component alarms. For more information,
see Assessing Storage Alarms on page 20.
4
In the Summary tab, review performance in terms of key metrics.
The Summary tab contains the same Related Inventory view and Resource Utilization charts as the Fabric
Summary quick view described in Monitoring Fabrics on page 22. In addition, the tab includes the
following views:
•
Summary and Resource Information. Displays physical details about the fabric.
•
FC Switch Detail. For each switch, displays the switch status, type of switch (physical or logical),
and current value for Data Rate, followed by details about the ISL ports and N ports belonging to
the switch. Ports details include the number of ports by alarm status and current values for Link
Error Rate and Non-Link Error Rate. Click a Sparkline to plot metric values over the time
period.
Figure 4. FC Switch details
•
Alarm Summary. Displays alarms on this fabric.
5
Click the ISL Ports tab.
•
Highest charts. The charts display the top three ISL ports in the fabric with the highest average
Data Rate, Link Error Rate, and Non-Link Error Rate, respectively. The tables show the average
and current values for the ports.
Figure 5. Highest charts
•
Topology Table (Inter-Switch Connections). For each switch, identifies all ISL port connections
with other switches in the fabric.
Figure 6. Topology Table
•
Port Details. For each switch-plus-port combination, displays the name and type of switch
(physical or logical) and the port name, status, and physical state. The metrics represent current
values for Rcvd Utilization, Xmit Utilization, Link Error Rate, Non-Link Error Rate, and Link Speed.
Click a
Sparkline to plot metric values over the time period.
NOTE: When the selected fabric is a Cisco physical fabric with VSANs, this table includes a
VSAN column that identifies the VSANs to which each ISL port is assigned.
Figure 7. Port details
You can add the following metrics to the table by clicking the Customizer icon:
Frame Rate, Data Receive Rate, Data Rate, and Data Send Rate.
•
Alarm Summary. Displays alarms on ISL ports.
6
Click the N Ports tab.
•
Highest charts. The charts display the top three N ports in the fabric with the highest average
Rcvd Utilization, Xmit Utilization, Link Error Rate, and Non-Link Error Rate, respectively. The
tables show the average and current values for the ports.
NOTE: When the selected fabric is a Cisco physical fabric with VSANs, the tables include a VSAN
column that identifies the VSAN to which an N port is assigned.
Figure 8. Highest charts
•
Port Details. For each switch-plus-port combination, displays the name and type of switch and
the port name, status, and physical state. The metrics represent current values for Rcvd
Utilization, Xmit Utilization, Link Error Rate, Non-Link Error Rate, and Link Speed. Click a
Sparkline to plot metric values over the time period.
NOTE: When the selected fabric is a Cisco physical fabric with VSANs, this table includes a VSAN
column that identifies the VSAN to which an N port is assigned.
Figure 9. Port Details
You can add the following metrics to the table by clicking the Customizer icon:
Frame Rate, Data Receive Rate, Data Rate, and Data Send Rate.
•
Alarm Summary. Displays alarms on N ports.
7
To investigate further, click the name of a component on any tab:
•
Click a switch name. See Exploring a Switch on page 49.
•
Click a port name. See Investigating an FC Switch Port on page 78.
•
For Cisco fabrics with VSANs, click the text in the VSAN column and then select a VSAN. See
Exploring a Cisco VSAN on page 50.
•
Click an alarm. See Assessing Storage Alarms on page 20.
8
To change the fabric name to a more user-friendly name, click the fabric name in the Summary and
Resource Information panel.
The Change Fabric Name dialog appears, allowing you to enter a new name.
Exploring a Switch
The Storage Explorer’s FC Switch dashboard contains information about the selected Fibre Channel switch and
all ports used by the switch. The data displayed reflects metric values for the selected time range. This
walkthrough describes the contents of each of the tabs and points you to where you can drill down for more
details.
To explore a switch:
1
On the navigation panel, under Dashboards, click Storage & SAN > Storage Explorer.
2
On the navigation panel, under Topology, expand Fabrics, expand the target fabric, and select a switch.
The FC Switch dashboard opens with details for the selected switch.
TIP: You can also open this dashboard from the Storage Environment dashboard by selecting a
switch and clicking View in Explorer.
3
If the Alarm Count displays alarms, you may want to assess the alarms before continuing with this
walkthrough. Click a status count to open a list of device and component alarms. For more information,
see Assessing Storage Alarms on page 20.
4
In the Summary tab, review performance in terms of key metrics.
The Summary tab contains the same Related Inventory view and Resource Utilization charts as the FC
Switch Summary quick view described in Monitoring Fabrics on page 22. In addition, the tab includes the
following views:
•
Summary and Resource Information. Displays physical details about the switch.
•
Topology Table (Inter-Switch Connections). Brocade logical switches only. For each switch,
identifies all ISL port connections with other switches in the fabric.
•
Alarm Summary. Displays alarms on the switch.
5
Click the ISL Ports tab, if available.
NOTE: This tab is hidden when the switch is a Brocade logical switch that uses only logical ISL ports
for inter-switch connections. No metrics are available for logical ISL ports. The Topology Table
(Inter-Switch Connections) table appears on the Summary tab instead.
Displays the same information as the ISL Ports tab on the Fabric dashboard, but the data reflects only the
ISL ports used by the selected switch. For more information, see Exploring a Fabric on page 46.
6
Click the N Ports tab.
Displays the same information as the N Ports tab on the Fabric dashboard, but the data reflects only the
N ports used by the selected switch. For more information, see Exploring a Fabric on page 46.
7
To investigate further, click the name of a component on any tab:
•
Click a port name. See Investigating an FC Switch Port on page 78.
•
For Cisco fabrics with VSANs, click the text in the VSAN column and then select a VSAN. See
Exploring a Cisco VSAN on page 50.
•
Click an alarm. See Assessing Storage Alarms on page 20.
Exploring a Cisco VSAN
The Storage Explorer’s VSAN dashboard contains information about the selected Cisco VSAN and all ports that
are part of the VSAN. The data displayed reflects metric values for the selected time range. This walkthrough
describes the contents of each of the tabs and points you to where you can drill down for more details.
To explore a VSAN:
1
On the navigation panel, under Dashboards, click Storage & SAN > Storage Explorer.
2
On the navigation panel, under Topology, expand Fabrics, expand a Cisco fabric with VSANs, and select a
VSAN.
The VSAN dashboard opens with details for the selected VSAN.
TIP: You can also open this dashboard from the Storage Environment dashboard by selecting a VSAN
and clicking View in Explorer.
3
If the Alarm Count displays alarms, you may want to assess the alarms before continuing with this
walkthrough. Click a status count to open a list of device and component alarms. For more information,
see Assessing Storage Alarms on page 20.
4
In the Summary tab, review performance in terms of key metrics.
The Summary tab contains the same Related Inventory view and Resource Utilization charts as the VSAN
Summary quick view described in Monitoring Fabrics on page 22.
5
Click the ISL Ports tab.
Displays the same information as the ISL Ports tab on the Fabric dashboard, but the data reflects only the
ports that are part of the selected VSAN. For more information, see Exploring a Fabric on page 46.
6
Click the N Ports tab.
Displays the same information as the N Ports tab on the Fabric dashboard, but the data reflects only the
ports that are part of the selected VSAN. For more information, see Exploring a Fabric on page 46.
7
To investigate further, click the name of a component on any tab:
•
Click a port name. See Investigating an FC Switch Port on page 78.
•
Click an alarm. See Assessing Storage Alarms on page 20.
Exploring a Filer
Foglight for Storage Management supports NetApp filers. NetApp uses the word aggregate instead of pool, but
the metrics collected for aggregates and pools are similar. In some filer views, particularly the views on the
LUNs and Disks tabs, the word pool may be displayed instead of aggregate, but in this context it refers to
aggregates.
The Storage Explorer’s Filer dashboard contains information about the selected filer and its child components.
The data displayed reflects values for the selected time range. This walkthrough describes the contents of each
of the tabs and points you to where you can drill down for more details.
To explore a filer:
1
On the navigation panel, under Dashboards, click Storage & SAN > Storage Explorer.
2
On the navigation panel, under Topology, expand Filers and select a filer.
The Filer dashboard opens with details for the selected filer.
TIP: You can also open this dashboard from the Storage Environment dashboard by selecting a filer
and clicking View in Explorer.
3
In the Summary tab, review performance in terms of key metrics.
The Summary tab contains the same views as the Filer Summary (Selected Filer) quick view described in
Monitoring Filers on page 28. In addition, the tab includes the following views:
•
Summary and Resource Information. Displays physical details about the filer.
•
Alarm Summary. Displays alarms on the filer.
4
Click the Controllers-Ports tab.
•
Controllers. The charts display the controllers with the highest average Data Rate and Ops Rate.
The table displays current values for each of the charted metrics, plus Latency by block and file
and the most severe alarm status on the FC ports and IP ports associated with this controller.
Figure 10. Controllers
•
Ports. Select a controller from the Show Ports for Controller list. The ports associated with the
selected controller are displayed; however, each port’s physical state and performance metrics are
unavailable for NetApp filers. The IP ports shown are used for iSCSI I/O traffic to LUNs.
Figure 11. Ports
•
Alarm Summary. Displays alarms on controllers and ports.
5
To investigate further, click the name of a component:
•
Click a controller name. See Investigating a Controller on page 74.
•
Click a port name. See Investigating an Array/Filer Port on page 73.
•
Click an alarm. See Assessing Storage Alarms on page 20.
6
Click the Aggregates tab.
•
Lowest Available Capacity. Displays the five aggregates with the lowest average value for
Available Usable Capacity over the time period.
•
Most Overcommitted Aggregates. Identifies up to five of the most overcommitted aggregates,
and displays current values for Advertised NASVolumes Size, Total Usable Capacity, and Used
Capacity.
Figure 12. Top 5 Aggregates
•
Aggregate Details. For each aggregate, displays the aggregate name, status, and current values
for Total Usable Capacity, Available Usable Capacity, % Available, Advertised NASVolumes Size,
and Overcommitment.
•
Alarm Summary. Displays alarms on aggregates.
To investigate further, click the name of a component:
•
Click an aggregate name. See Investigating an Aggregate on page 70.
•
Click an alarm. See Assessing Storage Alarms on page 20.
7
Click the NASVolumes tab.
This view lets you quickly see the top 15 NASVolumes with a selected metric, averaged over the selected
time period.
Selecting the time period "for current" shows the NASVolumes with the worst current metric.
Selecting the time period "last 30 minutes", "last hour", or "last 4 hours" shows the NASVolumes with the worst
average values in that time period.
NOTE: The selected time range uses the average of values from collections that completed
during the selected period, counted back from the time at the end of the zonar.
The bars in the Average column show each item's metric value relative to the first (highest)
item.
Figure 13. NASVolumes tab
Click a name in the table to display summary details about that item in the details panel
below. This panel displays the exact time period of the completed collections in the selected
time range.
Figure 14. Summary detail
To further investigate the selected item, click Explore in the details panel. See Investigating a
NASVolume on page 84.
•
Alarm Summary. Displays alarms on NASVolumes.
To investigate further, click an alarm. See Assessing Storage Alarms on page 20.
8
Click the LUNs tab.
9
Click the Disks tab.
NOTE: To investigate a NASVolume, LUN, or Disk that is not included in the Top 15 items on the
NASVolumes, LUNs, or Disks tab, use the Advanced Find functionality on the Advanced Find tab to
query for the desired items. For more information, see Advanced Find on page 66.
Exploring a Storage Array
The Storage Explorer’s Storage Array dashboard contains information about the selected storage array and its
child components.
The tabs displayed on the dashboard differ depending on the selected storage array:
•
Non-Clustered Storage Arrays (supported storage arrays except EqualLogic and Isilon)
•
Dell EqualLogic Storage Array
•
EMC Isilon Storage Array
Non-Clustered Storage Arrays
Most supported storage arrays are displayed as described in this walkthrough. If you are using Dell EqualLogic
arrays or EMC Isilon arrays, see Dell EqualLogic Storage Array or EMC Isilon Storage Array instead.
This walkthrough describes the contents of each of the tabs and points you to where you can drill down for more
details. The data displayed reflects values for the selected time range.
To explore storage arrays:
1
On the navigation panel, under Dashboards, click Storage & SAN > Storage Explorer.
2
On the navigation panel, under Topology, expand Storage Arrays and select a storage array.
The Storage Array dashboard opens with details for the selected storage array.
TIP: You can also open this dashboard from the Storage Environment dashboard by selecting a
storage array and clicking View in Explorer.
3
If the Alarm Count displays alarms, you may want to assess the alarms before continuing with this
walkthrough. Click a status count to open a list of device and component alarms. For more information,
see Assessing Storage Alarms on page 20.
4
In the Summary tab, review performance in terms of key metrics.
The Summary tab contains the same views as the Storage Array Summary (Selected Storage Array) quick
view described in Monitoring Storage Arrays on page 26. In addition, the tab includes the following
views:
•
Summary and Resource Information. Displays physical details about the storage array.
•
Alarm Summary. Displays alarms on the storage array.
5
Click the Controllers-Ports tab.
•
Controllers. The charts display the controllers with the highest average Data Rate, Ops Rate, and
% Busy, if available. The table displays current values for each of the charted metrics, plus
Latency if available, and the most severe alarm status on the FC ports and IP ports associated
with this controller.
Figure 15. Controllers
•
Ports. Select a controller from the Show Ports for Controller list. The chart displays port
utilization distribution if the array provides the operating link speed of the ports. Otherwise, the
chart displays the Data Rate of the three busiest ports. The tables display all available current
metrics for each port.
Figure 16. Ports
•
Alarm Summary. Displays alarms on controllers and ports.
To investigate further, click the name of a component:
•
Click an aggregate name. See Investigating an Aggregate on page 70.
•
Click an alarm. See Assessing Storage Alarms on page 20.
6
Click the Pools tab.
•
Show Top 15/All Pools/Pools w Thin LUNs With … This view lets you quickly see the top fifteen or
all pools with a selected metric, averaged over the selected time period. The performance
metrics that may be available for selection, depending on the array type, include:
•
R/w latency
•
Average queue depth
•
Data rate
•
Ops rate to disk
The capacity metrics that may be available for selection, depending on the array type, include:
•
Least time until full
•
Least or most available usable capacity
•
Least or most available raw capacity
Selecting the time period "for current" shows the pools with the worst current metric.
Selecting the time period "last 30 minutes", "last hour", or "last 4 hours" shows the pools with the worst
average values in that time period. These time periods are available only for performance
metrics.
NOTE: The selected time range uses the average of values from collections that completed
during the selected period, counted back from the time at the end of the zonar.
The bars in the Average column show each item's metric value relative to the first
(highest) item.
NOTE: Selecting Pools w Thin LUNs is useful for monitoring pools whose available capacity
can change quickly because they contain thin-provisioned LUNs.
Figure 17. Pools tab
Click a name in the table to display summary details about that item in the details panel
below. This panel displays the exact time period of the completed collections in the selected
time range.
Figure 18. Summary details
To further investigate the selected item, click Explore in the details panel.
•
Alarm Summary. Displays alarms on pools.
To investigate further, click an alarm. See Assessing Storage Alarms on page 20.
7
Click the LUNs tab.
8
Click the Disks tab.
9
Click the Advanced Find tab to investigate a LUN or Disk that is not included in the Top 15 items on the
LUNs or Disks tab. Use the Advanced Find functionality to query for the desired items. For more information,
see Advanced Find on page 66.
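The chart selection described for the Ports view on the Controllers-Ports tab can be pictured with a short sketch. This is an illustrative model only, not the product's implementation: it assumes utilization is the Data Rate divided by the operating link speed (both in the same unit), and the names Port and ports_chart are hypothetical.

```python
from typing import NamedTuple, Optional

class Port(NamedTuple):
    name: str
    data_rate: float             # current Data Rate; unit is an assumption (e.g. MB/s)
    link_speed: Optional[float]  # operating link speed in the same unit, or None if not reported

def ports_chart(ports):
    """Choose what to chart for one controller's ports (illustrative only)."""
    if ports and all(p.link_speed for p in ports):
        # Link speed is known for every port: chart utilization as a percentage.
        return "utilization", {p.name: 100.0 * p.data_rate / p.link_speed for p in ports}
    # Otherwise fall back to the Data Rate of the three busiest ports.
    busiest = sorted(ports, key=lambda p: p.data_rate, reverse=True)[:3]
    return "data_rate", {p.name: p.data_rate for p in busiest}

with_speed = [Port("fc0", 200.0, 800.0), Port("fc1", 400.0, 800.0)]
without_speed = [Port("e0a", 30.0, None), Port("e0b", 90.0, None),
                 Port("e0c", 10.0, None), Port("e0d", 60.0, None)]
print(ports_chart(with_speed))
print(ports_chart(without_speed))
```

In the second call no link speed is reported, so only the three ports with the highest Data Rate are returned.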
Dell EqualLogic Storage Array
Foglight for Storage Management monitors Dell EqualLogic array groups. In the interface, groups are
categorized as Storage Arrays, and array members are called Member Nodes or simply Members.
This walkthrough describes the contents of each of the tabs and points you to where you can drill down for more
details. The data displayed reflects values for the selected time range.
To explore an EqualLogic storage array:
1
On the navigation panel, under Dashboards, click Storage & SAN > Storage Explorer.
2
On the navigation panel, under Topology, expand Storage Arrays and select an EqualLogic group.
The Storage Array dashboard opens with details for the selected EqualLogic group.
TIP: You can also open this dashboard from the Storage Environment dashboard by selecting an
EqualLogic storage array and clicking View in Explorer.
3
If the Alarm Count displays alarms, you may want to assess the alarms before continuing with this
walkthrough. Click a status count to open a list of device and component alarms. For more information,
see Assessing Storage Alarms on page 20.
4
In the Summary tab, review performance in terms of key metrics.
The Storage Explorer’s Summary tab contains the same views as the Storage Environment’s quick view for
EqualLogic storage arrays:
•
Related Inventory. Contains alarm summaries for the selected storage array and its member
nodes, IP ports, pools, LUNs, and disks.
•
Storage Capacity Summary. Displays current values for Total Advertised LUNs Size and Capacity
Provisioned to LUNs.
•
Network Ports Utilization. Plots aggregated current values for Send Util and Rcv Util over the
time period.
•
Pools with Severe or High Pressure on Available Usable Capacity:
•
Displays the pools that have the most significant, near-term capacity issues.
•
Shows the available usable capacity in the table, the % available, and overcommitment.
•
Shows the estimated time when the pool capacity will be full, if available.
•
The cylinders are colored to show the % of available capacity in the pool.
•
LUNs/Disks States. Plots the percentage of disks and LUNs in the storage array in problem states.
Problem states are reported by the vendor. Resolving these issues may improve LUN performance.
Figure 19. LUNs/Disks States
In addition, the tab includes the following views:
•
Summary and Resource Information. Displays physical details about the storage array.
•
Alarm Summary. Displays alarms on the storage array.
5
Click the Network tab.
•
Network Summary. Identifies the load on the network, in terms of Send Util and Rcv Util, and the
Packet Errors % over the time period.
Figure 20. Network Summary
•
Port Status Filter. Click to show only ports with the selected status in the Port Details table.
•
Port Details. For each member-plus-port combination, displays current values for Data Rate,
Send Util, Rcv Util, Packet Errors %, and Link Speed. Click a Sparkline to plot metric values
over the time period. Also displays the member’s IPv4 and IPv6 addresses, if available.
Figure 21. Port Status
•
Alarm Summary. Displays alarms on ports.
6
Click the Pools-Members tab.
•
Highest charts. Display the pools with the highest average value for Ops Rate and Latency. Also
shows aggregated values for Total Advertised LUNs Size and Capacity Provisioned to LUNs for all
members.
Figure 22. Pools-Members
•
Pool Details. For each pool, displays its name, status, and current values for Data Rate, Ops Rate,
Latency, Average Queue Depth, Total Advertised LUNs Size, Total Usable Capacity, Available
Usable Capacity, Capacity Provisioned to LUNs, and Overcommitment. Click a Sparkline to
plot metric values over the time period. Also identifies pool members.
Figure 23. Pool Details
•
Member Details. For each member, displays its name, parent pool, status, and current values for
Data Rate, Ops Rate, Latency, Send Util, Rcv Util, Total Usable Capacity, % Used, and % of Pool.
Also displays the RAID level in use for the member.
Figure 24. Member Details
•
Alarm Summary. Displays alarms on pools and members.
7
Click the LUNs tab.
8
Click the Disks tab.
9
To investigate further, click the name of a component on any tab:
•
Click a port name. See Investigating an Array/Filer Port on page 73.
•
Click an array member name. See Investigating an EqualLogic Member on page 77.
•
Click a pool name. See Investigating a Pool on page 88.
•
Click an alarm. See Assessing Storage Alarms on page 20.
10
Click the Advanced Find tab to investigate a LUN or Disk that is not included in the Top 15 items on the
LUNs or Disks tab. Use the Advanced Find functionality to query for the desired items. For more information,
see Advanced Find on page 66.
EMC Isilon Storage Array
Foglight for Storage Management monitors EMC Isilon clusters. In the interface, clusters are categorized as
Storage Arrays, and the nodes are called Member Nodes or simply Members.
This walkthrough describes the contents of each of the tabs and points you to where you can drill down for more
details. The data displayed reflects values for the selected time range.
To explore an Isilon storage array:
1
On the navigation panel, under Dashboards, click Storage & SAN > Storage Explorer.
2
On the navigation panel, under Topology, expand Storage Arrays and select an Isilon cluster.
The Storage Array dashboard opens with details for the selected Isilon cluster.
TIP: You can also open this dashboard from the Storage Environment dashboard by selecting an
Isilon storage array and clicking View in Explorer.
3
If the Alarm Count displays alarms, you may want to assess the alarms before continuing with this
walkthrough. Click a status count to open a list of device and component alarms. For more information,
see Assessing Storage Alarms on page 20.
4
In the Summary tab, review performance in terms of key metrics.
The Storage Explorer’s Summary tab contains the same views as the Storage Environment’s quick view for
Isilon storage arrays:
•
Related Inventory. Contains alarm summaries for the selected storage array and its member
nodes, external ports, internal ports, pools, and disks.
•
IFS Capacity. The chart displays IFS Capacity: Free and IFS Capacity: Used for the IFS file system.
In addition, the following metrics pairs are displayed:
- IFS Capacity: Total and IFS Capacity: Free
- IFS Capacity: HDD Total and IFS Capacity: HDD Free
- IFS Capacity: SDD Total and IFS Capacity: SDD Free
•
Average CPU Busy %. Displays the aggregate value of % Busy for all nodes in the array over the
time period.
•
File System Throughput. Plots Data Read Rate and Data Write Rate to the file system over the
time period.
•
Disk System Throughput. Plots Data Read Rate and Data Write Rate to the disk over the time
period.
•
External Network Throughput. Plots the Rcv Data Rate and Send Data Rate for the external
network over the time period.
TIP: To see Send Util (% of max) and Rcv Util (% of max) for the external network instead, ask your
Foglight for Storage Management Administrator to edit the registry variable
StSAN_StoragePortShowUtilMax and set ISLN_E_IP=true. These values are based on the rated maximum
speeds of the ports, because actual port speeds are not available from Isilon arrays.
•
Internal Network Throughput. Plots the Rcv Data Rate and Send Data Rate for the internal
network over the time period.
•
Clients. Plots clients connected versus clients that are actively using the network over the time
period.
Figure 25. Clients
In addition, the tab includes the following views:
•
Summary and Resource Information. Displays physical details about the storage array.
•
Alarm Summary. Displays alarms on the storage array.
5
Click the Network tab.
Use this tab to understand network utilization and packet errors by port.
External Network Summary
•
Charts. Plot aggregated values for Send Data Rate and Rcv Data Rate, and Packet Errors % for the
external network connections over the time period.
•
Port Status Filter. Click this link to show only ports with a selected status in the Port Details
table.
•
Port Details. For each port used by the external network, identifies the port name, parent node,
status, physical state, and current values for Send Data Rate, Rcv Data Rate, and Packet Errors %.
Also displays the maximum speed of the port. Click a Sparkline to plot metric values over
the time period.
NOTE: When the registry variable StSAN_StoragePortShowUtilMax is set to show port utilization,
this table also displays Rcv Util and Send Util.
•
Alarm Summary. Displays alarms on ports used by the external network.
Figure 26. Alarm Summary
Internal Network Summary
•
Charts. Plot aggregated values for Send Data Rate, Rcv Data Rate, and Packet Errors % for the
internal network connections over the time period.
•
Port Status Filter. Click this link to show only ports with a selected status in the Port Details
table.
•
Port Details. For each port used by the internal network, identifies the port name, parent node,
status, physical state, and current values for Send Data Rate, Rcv Data Rate, and Packet Errors %.
Also displays the maximum speed of the port. Click a Sparkline to plot metric values over
the time period.
NOTE: When the registry variable StSAN_StoragePortShowUtilMax is set to show port utilization,
this table can also display Rcv Util and Send Util.
•
Alarm Summary. Displays alarms on the internal network and its ports.
6
Click the Member Nodes tab.
•
Top 5 Nodes. Displays the top five member nodes with the highest average value for Data Rate,
Latency, and % Busy. The charts plot values over the time period, while the tables show the
average and current values for each node.
Figure 27. Top 5 Nodes
•
Node Status Filter. Click to show only member nodes with the selected status in the Node Details
table.
•
Node Details. For each member node, displays its name, status, physical state, and current
values for Data Rate, % Busy, Latency, L2 Cache Hit Rate, IFS Capacity, and IFS Capacity: Free.
Also displays the worst status of the node’s disks, external network ports, and internal network
ports, and identifies the node’s parent pool. Click a Sparkline to plot metric values over the
time period.
Figure 28. Node Status
You can add the following metrics to the table by clicking the Customizer icon:
L1 Cache Hit Rate, Send Data Rate, and Rcv Data Rate. You can also choose to display other Isilon
details, such as Total Disk IOPS and Model.
•
Alarm Summary. Displays alarms on nodes.
7
Click the Pools tab.
•
Top 5 Pools. Displays the top five pools with the highest average value for Data Rate and Latency,
and the lowest average value for L2 Cache Hit Rate. The charts plot values over the time period,
while the tables show the average and current values for each pool.
Figure 29. Top 5 Pools
•
Pool Status Filter. Click to show only pools with the selected status in the Pool Details table.
•
Pool Details. For each pool, displays its name, status, and current values for Data Rate, Latency,
L2 Cache Hit Rate, IFS Capacity, IFS Capacity: Free, HDD % Free, and SDD % Free. Also displays the
tier that the pool belongs to and the worst status of nodes associated with the pool. Click a
Sparkline to plot metric values over the time period.
Figure 30. Pool Status Filter
You can add the following metrics to the table by clicking the Customizer icon:
Ops Rate, % Busy, and L1 Cache Hit Rate.
•
Alarm Summary. Displays alarms on pools.
8
Click the IFS tab.
•
Current Capacity Usage. For each pool or tier, displays the following sets of metrics:
- IFS Capacity: HDD Total, IFS Capacity: HDD Used, IFS Capacity: HDD Free, and HDD % Free
- IFS Capacity: SDD Total, IFS Capacity: SDD Used, IFS Capacity: SDD Free, and SDD % Free
Figure 31. Current Capacity Usage
•
File System. Plots Data Read Rate and Data Write Rate values and Read Ops Rate and Write Ops
Rate values for the file system over the time period.
Figure 32. File System
•
Exported Directories. For each exported directory, states whether the directory is used in the
monitored environment. When Used=true, you can click the directory name to view a topology
diagram showing the datastores that mount the directory.
Figure 33. Exported Directories
9
Click the Disks tab.
10
To investigate further, click the name of a component on any tab:
•
Click a port name. See Investigating an Array/Filer Port on page 73.
•
Click a member node name. See Investigating an Isilon Node on page 80.
•
Click a pool name. See Investigating a Pool on page 88.
•
When a directory is used in the monitored environment, click the directory name. See
Investigating a Directory on page 76.
•
Click an alarm. See Assessing Storage Alarms on page 20.
11
Click the Advanced Find tab to investigate a Disk that is not included in the Top 15 items on the Disks tab.
Use the Advanced Find functionality to query for the desired items. For more information, see Advanced
Find on page 66.
Common Data for Filers and Storage Arrays
LUNs tab
A LUN (logical unit) is the block storage element made available by an array or filer to a server.
•
Show LUNs With …
This view lets you quickly see the top 15 LUNs with a selected metric, averaged over the selected time
period.
•
Selecting the time period "for current" shows the LUNs with the worst current metric.
•
Selecting the time period "last 30 minutes", "last hour", or "last 4 hours" shows the LUNs with the worst
average values in that time period.
NOTE: The selected time range uses the average of values from collections that completed
during the selected period, counted back from the time at the end of the zonar.
The bars in the Average column show each item's metric value relative to the first (highest)
item.
Figure 34. Show LUNs with
NOTE: The metrics available in the drop down for selection include only the metrics made
available by that particular array or filer vendor.
Click a name in the table to display summary details about that item in the details panel
below. This panel displays the exact time period of the completed collections in the selected time
range.
Figure 35. LUNs
To further investigate the selected LUN, click Explore in the details panel. See Investigating a LUN.
•
Alarm Summary. Displays alarms on LUNs and disks. To investigate further, click an alarm. See Assessing
Storage Alarms on page 20.
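The ranking behind the Show LUNs With view can be pictured with a short sketch. It is a minimal illustration, assuming that "worst" means highest for the chosen metric and that each bar is the item's average divided by the first item's average. The function name top_items and the LUN names are hypothetical.

```python
def top_items(averages: dict[str, float], limit: int = 15) -> list[tuple[str, float, float]]:
    """Rank items by average metric value, highest first.

    Returns (name, average, relative) tuples, where `relative` is the
    item's average as a fraction of the first (highest) item's average.
    """
    ranked = sorted(averages.items(), key=lambda kv: kv[1], reverse=True)[:limit]
    if not ranked:
        return []
    top = ranked[0][1]
    return [(name, avg, avg / top if top else 0.0) for name, avg in ranked]

sample = {"lun-a": 12.0, "lun-b": 48.0, "lun-c": 24.0}
for name, avg, rel in top_items(sample):
    # Draw a text bar whose length is proportional to the first item's value.
    print(f"{name:6} {avg:6.1f} {'#' * round(rel * 20)}")
```

For metrics where a low value is worse, such as available capacity, the sort order would be reversed.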
Disks tab
•
Show Disks With...
This view lets you quickly see the top 15 disks with a selected metric, averaged over the selected time
period.
•
Selecting the time period "for current" shows the disks with the worst current metric.
•
Selecting the time period "last 30 minutes", "last hour", or "last 4 hours" shows the disks with the worst
average values in that time period.
NOTE: The selected time range uses the average of values from collections that
completed during the selected period, counted back from the time at the end of the zonar.
The bars in the Average column show each item's metric value relative to the first (highest)
item.
NOTE: The metrics available in the drop down for selection include only the metrics made
available by that particular array or filer vendor.
Figure 36. Averages
Click a name in the table to display summary details about that item in the details panel
below. This panel displays the exact time period of the completed collections in the selected time
range.
Figure 37. Details panel
To further investigate the selected disk, click Explore in the details panel. See Investigating a Physical
Disk.
•
Alarm Summary. Displays alarms on disks. To investigate further, click an alarm. See Assessing Storage
Alarms on page 20.
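The time range NOTE in the Show Disks With view can be illustrated with a small calculation. This is a sketch under stated assumptions, not the product's implementation: the window boundaries are treated as inclusive, and the function name window_average and the sample data are hypothetical.

```python
from datetime import datetime, timedelta

def window_average(samples, zonar_end, period):
    """Average the values of collections completed within `period` before `zonar_end`.

    `samples` is a list of (completed_at, value) pairs. Returns
    (average, first_completed, last_completed), or None if no collection
    completed in the window.
    """
    start = zonar_end - period
    in_window = sorted((t, v) for t, v in samples if start <= t <= zonar_end)
    if not in_window:
        return None
    average = sum(v for _, v in in_window) / len(in_window)
    return average, in_window[0][0], in_window[-1][0]

end = datetime(2015, 1, 1, 12, 0)
samples = [
    (datetime(2015, 1, 1, 11, 20), 40.0),  # completed before the last 30 minutes
    (datetime(2015, 1, 1, 11, 35), 10.0),
    (datetime(2015, 1, 1, 11, 50), 20.0),
]
print(window_average(samples, end, timedelta(minutes=30)))
```

The returned first and last completion times correspond to the exact time period of the completed collections that the details panel reports.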
Advanced Find
The tabs for specific entities (like LUNs, Disks, Pools, NASVolumes) quickly identify the Top 15 objects exhibiting
the "worst" values for a particular metric. Often, however, there is a need to locate objects by name, or with
other characteristics. Use the Advanced Find functionality to locate these items.
Queries can be constructed using a combination of Property conditions and Metric conditions.
Figure 38. Advanced Find
Using the Property Conditions
Finding Entities by Name
To find entities by name:
1
Select the desired entity type to search for in the Find drop down.
2
Type the string to search for in the Name Like field.
3
Click Perform Query.
•
All items with that string anywhere in their name will be returned.
•
_ is a positional wildcard that matches a single character. For example, C_D would match
decode, but would not match record.
•
% is a non-positional wildcard. For example, C%D would match record.
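The wildcard behavior described above can be approximated with a small sketch. It assumes that matching ignores case (as the C_D and decode example suggests), that the string may appear anywhere in the name, and that % matches any run of characters. The function names are hypothetical and do not represent the product's query engine.

```python
import re

def name_like_to_regex(pattern):
    """Translate a Name Like string into a regular expression.

    Assumptions: _ matches exactly one character, % matches any run of
    characters (possibly empty), everything else is literal.
    """
    parts = []
    for ch in pattern:
        if ch == "_":
            parts.append(".")
        elif ch == "%":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)

def name_matches(pattern, name):
    """Return True if the pattern occurs anywhere in the name."""
    return name_like_to_regex(pattern).search(name) is not None
```

With this sketch, name_matches("C_D", "decode") is true because "cod" fits the pattern, while name_matches("C_D", "record") is false because two characters separate the c and the d.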
Finding Entities with Current Status or Physical State
To find entities with current status or physical state:
1
Select the desired entity type to search for in the Find drop down.
2
In Properties, check the status boxes to retrieve entities with that status or state.
3
Click Perform Query.
For example, the following query searches for LUNs whose names include a particular string and that have a non-normal alarm state.
The query will take a short time to execute. When it completes, the Latest Query panel will summarize
the query conditions. The Query Result panel will display the LUNs found to match the query conditions.
The table header displays the time period used for evaluating the query conditions.
Figure 39. Query Conditions
Using the Metric Conditions
The metric conditions area lets you easily construct a more complex query that can include up to 3 conditions
on metrics.
Figure 40. Metrics conditions
To use the metric conditions:
1
Period. Refers to the time range on the zonar displayed in the upper right-hand corner. Select the time
the query should use for the metric values. When n/a is selected, this condition is ignored.
2
Metric. This drop down shows the metrics available for this entity type for this vendor.
3
The third drop down list provides the comparison operator.
4
Use the fourth field to enter the value to use for the comparison.
5
Use the fifth drop down to select the units for the value you entered. The units selection will change
based on the type of metric selected.
6
Click the + icon if you want to add another condition. The next condition row will be preceded by an
AND/OR selector. A trash can icon will appear to the right of the + icon to let you remove a condition.
Up to 3 conditions can be defined. When the + icon is clicked to display the 3rd condition row,
parentheses will be displayed around the first 2 conditions, indicating the order of evaluation.
For example, the following query looks for LUNs that had low average latency during the period, but had
a higher latency at the last collection.
The query will take a short time to execute. When it completes:
•
The Latest Query panel will summarize the query conditions.
•
The Query Result panel will display the LUNs found to match the query conditions. The table
header displays the time period used for evaluating the query conditions. The metrics used in the
query conditions will be displayed in the results table.
Figure 41. Query results
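The way metric conditions combine can be illustrated with a short sketch. It assumes left-to-right evaluation, which yields the (first AND/OR second) AND/OR third grouping indicated by the parentheses, and it skips any row whose period is n/a. The row layout, the operator symbols, and the metric names are hypothetical, and values are assumed to be already converted to common units.

```python
import operator

# Hypothetical set of comparison operators; the product's list may differ.
COMPARATORS = {
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
    "=": operator.eq,
}

def entity_matches(rows, metrics):
    """Evaluate up to three condition rows against one entity.

    rows: list of (connector, period, metric, op, value); the connector
    of the first row is None, later rows use "AND" or "OR".
    metrics: dict mapping (period, metric) to the entity's value.
    """
    result = None
    for connector, period, metric, op, value in rows:
        if period == "n/a":
            continue  # a condition with period n/a is ignored
        observed = metrics.get((period, metric))
        outcome = observed is not None and COMPARATORS[op](observed, value)
        if result is None:
            result = outcome
        elif connector == "AND":
            result = result and outcome
        else:
            result = result or outcome
    # Assumption: with no active conditions, every entity matches.
    return True if result is None else result
```

For instance, a query for low average latency over the last hour but higher latency at the last collection could be written as two rows joined by AND, one with the period "last hour" and one with the period "current".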
To investigate further
•
Click the name of an entity in the table to drill down to the explorer view for that entity.
•
Click an alarm Status icon to display the alarms for that entity.
4
Investigating Storage Components
To gain a complete picture of the performance of a storage device, you need to understand how its child
components are performing.
Introducing Storage Component Dashboards
A component dashboard is a drill-down view that displays physical details about a selected component,
performance and capacity metrics, and alarms. Some also include a topology diagram that shows how the
component fits into the monitored environment; for more information, see Exploring Connectivity with SAN
Topology Diagrams on page 34.
For information about each type of component dashboard, see the following topics:
•
Investigating an Aggregate
•
Investigating an Array/Filer Port
•
Investigating a Controller
•
Investigating a Directory
•
Investigating an EqualLogic Member
•
Investigating an FC Switch Port
•
Investigating an Isilon Node
•
Investigating a LUN
•
Investigating a NASVolume
•
Investigating a Physical Disk
•
Investigating a Pool
Investigating an Aggregate
The data displayed in the dashboard reflects values for the aggregate over the selected time range. This
walkthrough describes the contents of each of the tabs.
To explore an aggregate:
1
On a dashboard, click an aggregate name.
The Aggregate dashboard opens.
2
Review overall performance in the Summary tab.
•
Aggregate Details. Displays the aggregate’s status, parent filer, and current values for Advertised
NASVolumes Size, Total Usable Capacity, Available Usable Capacity, Overcommitment, Data Rate,
and Ops Rate. Click a Sparkline to plot metric values over the time period.
Figure 1. Aggregate Details
•
Pool Performance. Displays the same performance metrics found in the Aggregate Details view,
but in chart format.
Figure 2. Pool Performance
•
Perform Pool Change Analysis. Click to identify the NASVolumes and LUNs primarily responsible
for increased I/O. For more information, see Analyzing the Pool on page 100.
•
Perform Pool Load Analysis. Click to identify the busiest NASVolumes and LUNs, and rank them
based on their activity during the same time range over the last 30 days. For more information,
see Analyzing the Pool on page 100.
3
Click the Capacity tab.
Displays the following:
•
Capacity Summary
Figure 3. Capacity Summary
•
Capacity Usage Trends. The chart provides a visual display of the estimated future capacity
consumption, based on a regression analysis of the historical capacity consumption.
NOTE: If time until full shows "n/a", not enough historical data has been collected yet.
Mouse over to show how much more data is required. For more information, see Capacity Trending
on page 40.
Figure 4. Capacity Usage Trends
4
Click the Consumers tab.
Displays the NASVolumes and LUNs that are configured out of this aggregate.
•
Aggregate Details. Displays the aggregate’s status, parent filer, and current values for Advertised
LUNs Size, Advertised NASVolumes Size, Total Usable Capacity, Available Usable Capacity, and
Overcommitment.
•
NASVolumes Using This Aggregate. For each volume, displays its name, status, physical state,
and whether the NASVolume is used by an entity in the monitored environment. The metric
columns display current values for Data Rate, Latency, Advertised NASVolumes Size, Used
Capacity, and % Used. Click a Sparkline to plot metric values over the time period.
The remaining columns provide the following details:
- Protection. Displays the type of protection in use on the volume.
- Thin?. Indicates whether the volume is thin-provisioned (true) or thick-provisioned (false).
- vFiler. The name of the aggregate’s parent virtual filer.
Figure 5. NASVolumes Using this Aggregate
•
LUNs Using This Aggregate. For each LUN, displays its name, status, physical state, and whether
the LUN is used by an entity in the monitored environment. The metric columns display current
values for Data Rate, Ops Rate, Latency, % Busy, and Advertised LUNs Size. Click a Sparkline
to plot metric values over the time period.
The remaining columns provide the following details:
- Protection. Displays the type of protection in use on the LUN, such as the RAID level.
- Thin?. Indicates whether the LUN is thin-provisioned (true) or thick-provisioned (false).
- vFiler. The name of the aggregate’s parent virtual filer.
Figure 6. LUNs Using this Aggregate
5
Click the Disks tab.
Displays the disks used by the selected aggregate. For more information, see Common Component Disk
Tab Data on page 93.
6
To investigate further:
•
Click a NASVolume name. See Investigating a NASVolume on page 84 (includes a topology
diagram).
•
Click a LUN name. See Investigating a LUN on page 81 (includes a topology diagram).
•
Click a disk name. See Investigating a Physical Disk on page 86.
•
Click an alarm. See Assessing Storage Alarms on page 20.
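The idea behind the Capacity Usage Trends estimate on the Capacity tab can be sketched with an ordinary least-squares line fitted to historical usage. The guide does not state which regression model the product uses, so the linear fit, the function name, the minimum sample count, and the choice to return None (shown as n/a) for flat or shrinking usage are all assumptions made for illustration.

```python
def days_until_full(history, total_capacity, min_points=3):
    """Estimate days until used capacity reaches total capacity.

    history: list of (day, used_capacity) samples, oldest first.
    Returns None (displayed as "n/a") when there is too little history
    or when usage is not growing.
    """
    if len(history) < min_points:
        return None
    n = len(history)
    mean_x = sum(x for x, _ in history) / n
    mean_y = sum(y for _, y in history) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in history)
    if sxx == 0:
        return None  # all samples share one timestamp
    slope = sum((x - mean_x) * (y - mean_y) for x, y in history) / sxx
    if slope <= 0:
        return None  # usage is flat or shrinking
    intercept = mean_y - slope * mean_x
    full_day = (total_capacity - intercept) / slope
    last_day = history[-1][0]
    return max(0.0, full_day - last_day)
```

For example, usage that grows from 100 to 130 units over days 0 to 3 against a total capacity of 200 units extrapolates to full on day 10, which is 7 days after the last sample.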
Investigating an Array/Filer Port
The data displayed in the dashboard reflects values for the component over the selected time range. This
walkthrough describes the contents of each of the tabs.
To explore a port:
1
On a dashboard, click a filer or storage array port.
The Array/Filer Port dashboard opens.
2
Review overall performance in the Summary tab.
•
Details. Displays the port’s status, physical details, parent storage array or filer, port controller
(for filers and non-cluster arrays) or member node (for cluster arrays), and current values for
Data Rate, Ops Rate, Xmit Utilization, and Rcvd Utilization.
Figure 7. Details
•
Charts. Displays the following metrics over the time period, if available:
- Data Rate. Plots values for Data Rate, Data Read Rate, and Data Write Rate.
- Ops Rate. Plots values for Ops Rate, Read Ops Rate, and Write Ops Rate.
- Port Utilization. Plots values for Send Util and Rcv Util.
Figure 8. Data Rate
•
Alarm Summary. Displays alarms on the port.
3
If available, click the Topology tab. This tab is displayed for Fibre Channel (FC) ports only.
•
Basic Connectivity (diagram). Displays the selected port (left box) and its connection to its
controller. If the port is an FC port, the N port on the fibre switch it connects to is also
displayed.
- Click an icon for details about the device connected to the port. 
- Click a device name to open a list of components that you can drill down on.
Figure 9. Basic Connectivity
•
Port Dependencies. On separate tabs, displays connections to LUNs (through the selected port)
from virtual machines, ESX or Hyper-V servers, and physical hosts. If the selected port has
problems or failures, the connected VMs or hosts may exhibit performance problems.
Figure 10. Port Dependencies
4
To investigate further:
•
In the topology diagram, click a port or controller icon to navigate to related components.
•
Click an alarm. See Assessing Storage Alarms on page 20.
Investigating a Controller
Controllers manage the ports used by a non-cluster storage array or filer. The data displayed in the dashboard
reflects values for the component over the selected time range. The Controller dashboard has two tabs.
To explore a controller and its ports:
1
On a dashboard, click a controller name.
The Controller dashboard opens.
2
Review overall performance in the Summary tab.
•
Controller Details. Displays the controller’s status, physical details, and parent device.
Figure 11. Controller Details
•
Charts. Displays the following metrics over the time period, if available:
- Data Rate. Plots values for Data Rate, Data Read Rate, and Data Write Rate.
- Ops Rate. Plots values for Ops Rate, Read Ops Rate, and Write Ops Rate.
- Latency. Plots values for latency.
- % Busy.
Figure 12. Ops Rate
•
Alarm Summary. Displays alarms on the controller.
3
Review Port performance in the Port tab.
•
FC Ports/IP Ports charts.
- If the operational link speeds for the ports are available, a chart will be displayed that
shows the performance of the ports based on their average utilization in the time period.
- If the operational link speeds for the ports are not available, a chart will be displayed with
the top three ports with the highest average Data Rate in the time period.
•
Port Details. Lists the ports, their status and physical state, and current values for Data Rate and
Ops Rate. Click a Sparkline to plot metric values over the time period.
NOTE: Port metrics are not available for NetApp filers. The Ports tab is not displayed.
Figure 13. FC Ports
4
To investigate further:
•
Click a port name. See Investigating an Array/Filer Port on page 73.
•
Click an alarm. See Assessing Storage Alarms on page 20.
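The choice between the two FC Ports/IP Ports charts can be sketched as a simple fallback. This is an illustration only: the field names are hypothetical, and utilization is assumed here to be the average data rate divided by the operational link speed in the same units, which the guide does not confirm.

```python
def choose_port_chart(ports, top_n=3):
    """Pick the chart type and rows for a controller's ports.

    ports: list of dicts with 'name', 'avg_data_rate', and an optional
    'link_speed' expressed in the same units (assumed shape).
    """
    if ports and all(p.get("link_speed") for p in ports):
        # Link speeds are known: chart every port by average utilization (%).
        rows = [(p["name"], 100.0 * p["avg_data_rate"] / p["link_speed"]) for p in ports]
        return "utilization", sorted(rows, key=lambda r: r[1], reverse=True)
    # Link speeds are missing: chart the top ports by average Data Rate.
    rows = [(p["name"], p["avg_data_rate"]) for p in ports]
    return "data_rate", sorted(rows, key=lambda r: r[1], reverse=True)[:top_n]
```

Note that ranking by utilization and ranking by data rate can disagree: a slow link carrying moderate traffic may be busier, in relative terms, than a fast link carrying more.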
Investigating a Directory
In Isilon storage arrays, directories are mounted by datastores. The data displayed in the dashboard reflects
values for the directory over the selected time range. The Directory dashboard has one tab.
To explore a directory:
1
On a dashboard, click a directory name.
The Directory dashboard opens.
2
Review overall performance in the Summary tab.
•
Details. Displays the directory’s status, physical details, and parent device.
Figure 14. Details
•
Topology. Displays the connections from datastores to the selected directory.
Figure 15. Topology
3
To investigate further, click an icon to see more detail about the entity it represents, and if desired,
select a component name in the detailed view to navigate to the dashboard for that component.
Investigating an EqualLogic Member
The data displayed in the dashboard reflects values for the selected EqualLogic array member over the time
range. This walkthrough describes the contents of each of the tabs.
To explore an EqualLogic member:
1
In a topology diagram or from the EqualLogic Storage Array dashboard, click an array member name.
The Member Node dashboard opens.
2
Review overall performance in the Summary tab.
•
Related Inventory. Displays a list of components from the perspective of the member (rather
than the storage array).
•
Resource Utilization charts. Compares resources used by the member against resources used by
the pool.
- Capacity Summary charts. Displays Used Capacity and Available Usable Capacity.
- Data Rate charts. Plots values for Data Read Rate and Data Write Rate.
- Ops Rate charts. Plots values for Read Ops Rate and Write Ops Rate.
- Latency charts. Plots values for Read Latency and Write Latency.
Figure 16. Overall Performance summary
•
Member Network Load. Plots the member’s Send Util and Rcv Util over the time period.
•
Summary and Resource Information. Displays physical details about the member.
•
Alarm Summary. Displays alarms on the member.
3
Click the Network tab.
Displays the same information as the Network tab on the EqualLogic Storage Array dashboard, but the
data reflects only the selected member. For more information, see Dell EqualLogic Storage Array on page
56.
4
Click the Disks tab.
Displays only the disks used by the selected member. For more information, see Common Component
Disk Tab Data on page 93.
Investigating an FC Switch Port
The data displayed in the dashboard reflects values for the component over the selected time range. This
walkthrough describes the contents of each of the tabs.
NOTE: If the port is a Brocade logical ISL, no metrics are available.
To explore a switch port:
1
On a dashboard, click an FC switch port.
The FC Switch Port dashboard opens.
2
Review overall performance in the Summary tab.
•
Details. Displays the port’s status, physical details, parent switch, parent fabric, VSAN (if
applicable), and current values for Data Rate, Frame Rate, Link Error Rate, and Non-Link Error
Rate. Click a Sparkline to plot metric values over the time period.
Figure 17. Details
•
Charts. Charts plot the following metric pairs over the time period:
- Data Rate. Plots values for Data Receive Rate and Data Send Rate.
- Utilization. Plots values for Rcvd Utilization and Xmit Utilization.
- Frame Rate. Plots values for Frame Receive Rate and Frame Send Rate.
- Errors. Plots values for Link Error Rate and Non-Link Error Rate.
Figure 18. Charts
•
Alarm Summary. Displays alarms on the port.
3
Click the Topology tab.
The contents of this tab depend on the type of port selected.
ISL Port:
•
Switch-to-Switch Connectivity diagram. Displays the selected ISL port (left box) and its
connection to another ISL port (right box). Click a port icon for details about the device
connected to the port.
Figure 19. Switch-to-Switch Connectivity diagram
•
Topology Table (Inter-Switch Connections). Identifies all ISL port connections from this port’s
parent switch to other switches in the fabric.
Figure 20. Topology Table
N Port:
•
Basic Connectivity (table). Displays the ESX hosts and/or physical hosts connected to the
selected N port through their host ports. An N port can have connections to multiple host ports
using NPV technology.
TIP: If hosts in the Host Name column are (unknown), you may be able to use dependency
processing to infer the host names associated with the host ports. For instructions, see Inferring
Physical-Host-to-Storage Relationships on page 104.
•
Basic Connectivity (diagram). Displays the selected N port (left box) and its connection to a filer
or storage array port (right box). Click a port icon for details about the device connected to the
port.
•
Select Host Port(s). Controls the set of hosts displayed in the Port Dependencies table.
•
Port Dependencies. On separate tabs, displays connections to LUNs (through the selected port)
from virtual machines, ESX or Hyper-V servers, and physical hosts. If the selected port has
problems or failures, the connected VMs or hosts may exhibit performance problems.
Figure 21. Port Dependencies
4
To investigate further:
•
Click a VSAN link and then select the name of the VSAN. See Exploring a Cisco VSAN on page 50.
•
Click an alarm. See Assessing Storage Alarms on page 20.
Related Topics
•
Introducing Storage Component Dashboards
•
Exploring a Fabric
•
Exploring a Switch
•
Exploring a Cisco VSAN
Investigating an Isilon Node
The data displayed in the dashboard reflects values for the selected Isilon node over the time range. This
walkthrough describes the contents of each of the tabs.
To explore an Isilon node:
1
In a topology diagram or from the Isilon Storage Array dashboard, click a member node name.
The Member Node dashboard opens.
2
Review overall performance in the Summary tab.
•
Related Inventory. Displays a list of components from the perspective of the member node
(rather than the storage array).
•
IFS Capacity. The chart displays IFS Capacity: Free and IFS Capacity: Used for the IFS file system.
In addition, the total capacity is broken down into hard disk drive (HDD) and solid-state drive (SSD) metrics as
follows:
- Total. IFS Capacity: Total and IFS Capacity: Free
- HDD. IFS Capacity: HDD Total and IFS Capacity: HDD Free
- SSD. IFS Capacity: SSD Total and IFS Capacity: SSD Free
•
Node Network Load. Charts plot the following metric pairs:
- External Network Throughput. Plots the Send Data Rate and Rcv Data Rate for the external
network over the time period.
TIP: If you prefer to see the Send Util (% of max) and Rcv Util (% of max) based on the rated
maximum speeds of the ports (actual port speeds are not available from Isilon arrays) in the
external network, ask your Foglight for Storage Management Administrator to edit the registry
variable StSAN_StoragePortShowUtilMax and set ISLN_E_IP=true.
- Internal Network Throughput. Plots values for Send Data Rate and Rcv Data Rate.
- Clients. Clients connected versus clients that are actively using the network.
•
Node Performance. Charts plot the following metrics for the selected node over the time period:
% Busy, Latency, L1 Cache Hit Rate, L2 Cache Hit Rate, Data Read Rate, Data Write Rate, Read
Ops Rate, and Write Ops Rate.
Figure 22. Node Performance
•
Summary and Resource Information. Displays physical details about the member node.
•
Alarm Summary. Displays alarms on the member node.
3
Click the Network tab.
Displays the same information as the Network tab on the Isilon Storage Array dashboard, but the data
reflects only the selected member node. For more information, see EMC Isilon Storage Array on page 59.
4
Click the Disks tab.
Displays disks used by the selected member. For more information, see Common Component Disk Tab
Data on page 93.
Related Topics
•
Introducing Storage Component Dashboards
•
EMC Isilon Storage Array
Investigating a LUN
A LUN (logical unit number) represents a logical SAN block storage device on an array or filer that can be
exposed for mapping to a server. The data displayed in the dashboard reflects metric values for the LUN over
the selected time range. This walkthrough describes the contents of each of the tabs.
To explore a LUN:
1
On a dashboard, click a LUN name or icon.
The LUN dashboard opens.
2
Review overall performance in the Summary tab.
•
Details. Displays the LUN’s status, physical details, parent device and capacity metrics.
Figure 23. Details
•
Charts. Plot values over the time period for these metrics, if available: Ops Rate, Data Rate,
Latency, % Busy, Cache Hit Rate, and Average Queue Depth.
Figure 24. Data Rate
•
Pool Details. Identifies the pools to which the LUN belongs.
Non-clustered storage arrays — Displays performance and capacity metrics for the pools relevant
to the selected LUN. For more information, see Performance Metrics on page 118 and Capacity
Metrics on page 121.
Figure 25. Pool Details
EqualLogic storage arrays—Displays the same information as the Pool Details view on the PoolsMembers tab of the EqualLogic Storage Array dashboard, but the data reflects only the pools
relevant to the selected LUN. For more information, see Dell EqualLogic Storage Array on page
56.
Figure 26. Pool Details
•
Alarm Summary. Displays alarms on the LUN.
3
Click the Topology tab.
Displays the datastores, CSVs, or logical disks that get their physical storage from the selected LUN, the
VMs that do the I/O, and if dependency processing is enabled, the (most likely) physical host connected
to the LUN. For more information on dependency processing, see Inferring Physical-Host-to-Storage
Relationships on page 104.
Figure 27. LUN providing storage to VMware VMs
Figure 28. LUN providing storage to Hyper-V VMs
•
Click a Cloud icon to reveal the ports and paths from disk extents to the LUN, through a
particular server. For more information, see Exploring Connectivity with SAN Topology Diagrams
on page 34.
•
Click an icon, then select a component name to navigate to the component dashboard. For
example, if you select the VM icon, you can see a list of the VMs that use the storage and select
one that warrants further investigation.
4
Click the SAN Data Paths tab.
Displays the datastores or servers, CSVs or logical disks that get their physical storage from the selected
LUN, and the VMs that do I/O to them and the LUN.
5
Review the I/O Paths table to assess the worst performing path segment of the possible data paths
between each disk extent and the LUN during the time period. This helps you to identify bottlenecks
resulting in high latency. For general information about how to use a Data Paths tab, see Exploring I/O
Performance with SAN Data Paths on page 37. For information specific to the LUN Data Path tab, see
below.
TIP: For metrics that have thresholds, the color of the bars highlights data values that fall within
normal, warning, critical, and fatal thresholds. Thresholds are set in the registry variables
StSAN.FCSwitchPortUtilization.[Warning|Critical|Fatal].
Figure 29. I/O Paths
NOTE: The column headings may vary between VMware and Hyper-V. Where the names differ, the column
headers will be listed for VMware and Hyper-V respectively.
•
ESX/VMs or Hyper-V Server/VM. List of servers and the VMs they host.
•
Latency. Average latency per operation in the time period.
•
Data Rate. Average data rate for I/O from the ESX or VM to the LUN over the time period.
•
% Total I/O in ESX or % Total I/O in Hyper-V Server. When multiple VMs on a server are
performing I/O to the same LUN, displays the percentage of the total I/O performed by this VM.
•
ESX FC Ports --> SAN Util or Hyper-V Server FC Ports --> SAN Util. Displays the busiest link (read
or write utilization) over the time period. Click the cell to display all the port links. Review the
topology diagram to see the ports and link utilization. Data is not available for IP ports.
•
SAN --> A/F Ports Util. Displays the busiest link (read or write utilization) over the time period.
Click the cell to display all the port links. Review the topology diagram to see the ports and link
utilization. Data is not available for IP ports.
•
% Total I/O at LUN. When VMs on multiple servers are doing I/O to the same LUN, displays the
percentage of the total I/O performed by this server.
•
LUN Latency (ms). Displays the latency of the LUN in milliseconds.
6
Review the diagram to understand the path from disk extents through the SAN to LUNs. The topology
diagram is enhanced to show the FC ports and to display latency, link utilization, and for controllers,
percent busy.
Figure 30. Topology diagram (callouts: Latency, Link utilization, Percentage busy)
NOTE: When physical hosts are inferred by dependency processing and not monitored directly by
agents, the disk extents on the host that map to the selected LUN are unknown and therefore no
latency values are available. For instructions on how to enable dependency processing, see
Inferring Physical-Host-to-Storage Relationships on page 104.
7
To investigate further:
•
Click a host, VM, or virtual storage name or icon. See the Managing Virtualized Environments
User and Reference Guide.
•
Click a disk name or icon. See Investigating a Physical Disk on page 86.
•
Click a port name or icon. See Investigating an Array/Filer Port on page 73.
•
Click a controller name or icon. See Investigating a Controller on page 74.
•
Click an alarm. See Assessing Storage Alarms on page 20.
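The % Total I/O columns in the I/O Paths table express each contributor's share of the I/O to the same LUN. A minimal sketch of that calculation follows. It assumes the share is computed from average data rates over the time period, which the guide does not specify, and the function and key names are hypothetical.

```python
def io_shares(data_rates):
    """Return each contributor's percentage of the total I/O.

    data_rates: dict mapping a VM (or server) name to its average data
    rate to the LUN over the time period (assumed input).
    """
    total = sum(data_rates.values())
    if total == 0:
        # No I/O in the period: report 0% for everyone.
        return {name: 0.0 for name in data_rates}
    return {name: 100.0 * rate / total for name, rate in data_rates.items()}
```

The same function covers both columns: pass the VMs on one server for % Total I/O in ESX (or Hyper-V Server), or pass the servers themselves for % Total I/O at LUN.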
Related Topics
•
Introducing Storage Component Dashboards
•
Exploring a Filer
•
Exploring a Storage Array
Investigating a NASVolume
A NASVolume is a volume whose physical storage is on a filer or unified storage supplier. It can be mounted by
an ESX host to provide the physical storage for a datastore using NFS. The data displayed in the dashboard
reflects metric values for the NASVolume over the selected time range. This walkthrough describes the contents
of each of the tabs.
To explore a NASVolume:
1
On a dashboard, click a NASVolume name.
The NASVolume dashboard opens.
2
Review overall performance in the Summary tab.
•
Details. Displays the volume’s status, physical state, physical details, parent devices, and alarms
on disks in the volume. Also displays current values for Advertised NASVolumes Size, Used
Capacity, and % Used.
Figure 31. Details
•
Charts. Displays the following metrics over the time period:
- Data Rate. Plots values for Data Read Rate and Data Write Rate.
- Ops Rate. Plots values for Read Ops Rate, Write Ops Rate and Other Ops.
- Latency. Plots values for Read Latency and Write Latency.
Figure 32. Ops Rate
•
Aggregate Details. Displays values for Total Usable Capacity, Available Usable Capacity %
Available, Advertised NASVolumes Size, and Overcommitment.
Figure 33. Aggregate Details
•
Alarm Summary. Displays alarms on the NASVolume.
3
Click the Topology tab.
Displays the datastores that get their physical storage from the selected volume as a NAS mount and the
VMs that do I/O to these datastores. For more details, click an icon or an entity name. For more
information about using this view, see Exploring Connectivity with SAN Topology Diagrams on page 34.
Figure 34. Topology
4
Click the Disks tab.
Displays the disks used by the NASVolume. For more information, see Common Component Disk Tab Data
on page 93.
5
To investigate further:
•
Click a host, VM, or datastore name or icon. See the Managing Virtualized Environments User and
Reference Guide.
•
Click an alarm. See Assessing Storage Alarms on page 20.
Related Topics
•
Introducing Storage Component Dashboards
•
Exploring a Filer
•
Investigating an Aggregate
Investigating a Physical Disk
The data displayed in the dashboard reflects values for the selected disk over the selected time range. The
Physical Disk dashboard has only one tab.
To explore a physical disk:
1
On a dashboard, click a disk name.
2
Review overall performance in the Summary tab.
•
Disk Details. Displays the disk’s status, physical details, parent device, and size if available.
Figure 35. Disk Details
•
Charts. Displays the following metrics over the time period:
- Data Rate. Plots values for Data Rate, Data Read Rate, and Data Write Rate.
- Ops Rate. Plots values for Ops Rate, Read Ops Rate, and Write Ops Rate.
- Latency. Plots values for Latency, Read Latency, and Write Latency.
- Busy %. Plots values for % Busy.
- Queue Depth. Plots depth of the queue.
Figure 36. Ops Rate
•
Pool Details (Arrays only). Displays performance and capacity metrics for the pools to which this
disk contributes resources. For more information, see Performance Metrics on page 118 and
Capacity Metrics on page 121. To investigate a pool, click its name. For more information, see
Investigating a Pool on page 88.
Figure 37. Pool Details
EqualLogic storage arrays—Displays the same information as the Pool Details view on the PoolsMembers tab of the EqualLogic Storage Array dashboard, but the data reflects only the pools
relevant to the selected disk. For more information, see Dell EqualLogic Storage Array on page
56.
Figure 38. Pool Details - EqualLogic
Isilon storage arrays—Displays the same information as the Pool Details view on the Pools tab of
the Isilon Storage Array dashboard, but the data reflects only the pools relevant to the selected
disk. For more information, see EMC Isilon Storage Array on page 59.
Figure 39. Pool Details - Isilon
•
Aggregate Details (Filers only). Lists the aggregates to which this disk contributes resources. For
each aggregate, displays its name, status, physical state, and current values for Total Usable
Capacity, Available Usable Capacity % Available, Advertised NASVolumes Size, and
Overcommitment. To investigate an aggregate, click its name. See Investigating an Aggregate on
page 70.
Figure 40. Aggregate Details
•
Alarm Summary. Displays alarms on the disk.
Related Topics
•
Introducing Storage Component Dashboards
•
Exploring a Filer
•
Exploring a Storage Array
Investigating a Pool
The Pool dashboard displays a summary of details for the selected pool, including a selection of size, capacity, data rate, and
latency metrics. The data displayed in the dashboard reflects values for the pool over the selected time range.
The Pool dashboard is very similar for non-clustered storage arrays and EqualLogic storage arrays. The Isilon
storage array requires its own Pool dashboard, which is discussed separately. The following walkthroughs
describe the contents of each of the tabs.
Pool belonging to a non-clustered storage array or
EqualLogic storage array
To explore a pool:
1
On a dashboard, click a pool name.
The Pool dashboard opens.
2
Review overall performance in the Summary tab. From here you can initiate further analysis of the pool
in terms of performance and, in some cases, pool capacity.
•
Pool Details. Displays the pool’s status, parent storage array, and current values for Total Raw
Capacity, Available Raw Capacity, Total Advertised LUNs Size, Total Usable Capacity, Available
Usable Capacity, and Overcommitment, if available.
Figure 41. Pool Details
•
Charts. Displays the following sets of metrics over the time period, if available:
- Data Rate. Plots values for Data Rate, Data Read Rate, and Data Write Rate.
- Ops Rate to Disk. Plots values for Ops Rate, Read Ops Rate, and Write Ops Rate.
- Latency. Plots values for Latency, Read Latency, and Write Latency.
- Average Queue Depth. Plots values for Average Queue Depth.
Figure 42. Pool Performance
•
Perform Pool Change Analysis. Click to identify the LUNs primarily responsible for increased I/O.
For more information, see Analyzing the Pool on page 100.
•
Perform Pool Load Analysis. Click to identify the busiest LUNs, and rank them based on their
activity during the same time range over the last 30 days. For more information, see Analyzing
the Pool on page 100.
•
Alarm Summary. Displays alarms on the pool.
3
Click the Capacity tab.
•
Capacity Summary. Pools in most arrays will display data similar to the data shown below.
Figure 43. Capacity Summary
For some arrays, pool capacity is not available in usable terms. In that case, the capacity summary
displays total capacity in raw terms instead.
Figure 44. Capacity Summary - raw terms
•
Capacity Usage Trends. Provides a visual display of the estimated capacity consumption in the
future, based on a regression analysis of the historical capacity consumption.
If time until full shows n/a, not enough historical data has been collected yet to perform
an analysis. Mouse over for additional information. For more information, see Capacity Trending
on page 40.
Figure 45. Capacity Usage Trends
4
Click the LUNs tab. This tab displays the LUNs carved from this pool.
A LUN (logical unit) is the block storage element made available by an array or filer to a server.
•
Top 5 LUNs. Displays the top five LUNs with the highest average value for Data Rate, and the
performance metrics most useful for this array or filer.
Figure 46. Top 5 LUNs
•
LUN Status Filter. Click to show only LUNs with the selected status in the LUN Details table.
•
LUN Details. For each LUN, displays its status, physical state, parent pool or aggregate, and
whether the LUN is used by an entity in the monitored environment. The metric columns display
current values for Data Rate, Ops Rate, Latency, % Busy, Average Queue Depth, L1 Cache Hit
Rate, and Total Advertised LUNs Size. Click a Sparkline to plot metric values over the time
period.
NOTE: If a metric is not available for this particular array or filer, the column is not displayed.
The remaining columns provide the following details:
- Protection. Displays the type of protection in use on the LUN, such as the RAID level.
- Thin?. Indicates whether the LUN is thin-provisioned (true) or thick-provisioned (false).
- IQN. Displays the iSCSI Qualified Name. Click the Customizer icon to add this column.
- vFiler. Filers only. Displays the name of the virtual filer associated with the LUN.
•
Alarm Summary. Displays alarms on LUNs.
5
Click the Disks tab.
Displays the disks used by the selected pool. For more information, see Common Component Disk Tab
Data on page 93.
6
To investigate further:
•
Click a LUN name. See Investigating a LUN on page 81.
•
Click a disk name. See Investigating a Physical Disk on page 86.
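The Capacity Usage Trends view described in step 3 bases its estimate on a regression analysis of historical capacity consumption. The guide does not state which regression model is used, so the following is only a minimal sketch that assumes an ordinary least-squares line fitted to one sample per day. The function name days_until_full and the min_samples parameter are hypothetical and are not part of the product.

```python
from typing import Optional, Sequence


def days_until_full(used_gb: Sequence[float], total_gb: float,
                    min_samples: int = 3) -> Optional[float]:
    """Estimate days until a pool is full from daily used-capacity samples.

    Assumption: one sample per day, oldest first, fitted with a
    least-squares line. Returns None (displayed as "n/a") when there is
    too little history or when consumption is not growing.
    """
    n = len(used_gb)
    if n < max(min_samples, 2):
        return None  # not enough history to fit a trend

    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(used_gb) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, used_gb))
    slope = sxy / sxx  # growth in GB per day
    if slope <= 0:
        return None  # flat or shrinking usage never reaches full

    intercept = mean_y - slope * mean_x
    full_day = (total_gb - intercept) / slope  # day index where the line reaches total
    return max(0.0, full_day - (n - 1))  # days remaining after the latest sample
```

In this sketch, a pool with too few samples returns None, which corresponds to the n/a value shown when not enough historical data has been collected.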
Pool belonging to an Isilon storage array
To explore a pool for an Isilon storage array:
1
On an Isilon-related dashboard, click a pool name.
The Pool dashboard opens.
2
Review overall performance in the Summary tab.
•
Pool Details. Displays the pool’s status, parent storage array, and current values for IFS Capacity:
Free and IFS Capacity Used for the IFS file system. In addition, the total capacity is broken down
into hard disk drive (HDD) and solid state drive (SSD) metrics as follows:
- Total. IFS Capacity: Total and IFS Capacity: Free
- HDD. IFS Capacity: HDD Total and IFS Capacity: HDD Free
- SSD. IFS Capacity: SSD Total and IFS Capacity: SSD Free
Figure 47. Pool Details
•
Performance. Charts plot the following metrics for the selected pool over the time period: %
Busy, Latency, L1 Cache Hit Rate, L2 Cache Hit Rate, Data Read Rate, Data Write Rate, Read Ops
Rate, and Write Ops Rate.
Figure 48. Performance
•
Node Status Filter. Click to show only member nodes with the selected status in the Node Details
table.
•
Node Details. For each node belonging to the pool, displays its name, status, physical state, and
current values for Data Rate, % Busy, Latency, L2 Cache Hit Rate, IFS Capacity, and IFS Capacity:
Free. Also displays the worst status of the node’s disks, external network ports, and internal
network ports, and identifies the node’s parent pool. Click a Sparkline to plot metric values
over the time period.
Figure 49. Node Status
You can add the following metrics to the table by clicking the Customizer icon:
L1 Cache Hit Rate, Send Data Rate, and Rcv Data Rate. You can also choose to display other Isilon
details, such as Total Disk IOPS and Model.
•
Alarm Summary. Displays alarms on the pool.
3
Click the Capacity tab.
•
Capacity Usage Trends. Provides a visual display of the estimated capacity consumption in the
future, based on a regression analysis of the historical capacity consumption.
If time until full shows n/a, not enough historical data has been collected yet to perform
an analysis. Mouse over for additional information. For more information, see Capacity
Trending on page 40.
Figure 50. Capacity Usage Trends
4
Click the Network tab. Assess the performance of the ports in the External Network from the
perspective of the pool.
•
Charts. Plot aggregated values for Send Data Rate and Rcv Data Rate, and Packet Errors % for the
external network connections over the time period.
•
Port Status Filter. Click to show only ports with a selected status in the Port Details table.
•
Port Details. For each port used by the external network, identifies the port name, parent node,
status, physical state, and current values for Send Data Rate, Rcv Data Rate, and Packet Errors %.
Also displays the maximum speed of the port. Click a Sparkline to plot metric values over
the time period.
NOTE: When the registry variable StSAN_StoragePortShowUtilMax is set to show port
utilization, the chart and table display Rcv Util (% of max) and Send Util (% of max).
Figure 51. External Network Summary
•
Alarm Summary. Displays alarms on ports used by the external network.
5
Click the Disks tab. Displays the disks used by the selected pool. For more information, see Common
Component Disk Tab Data on page 93.
6
To investigate further, click a disk name. See Investigating a Physical Disk on page 86.
Common Component Disk Tab Data
The Disks tab can be found on the following dashboards:
•
Member/Node
•
Pool
•
Aggregate
•
NASVolume.
The Disks tab displays the physical disks associated with the selected entity.
NOTE: The images in this section reflect an EMC CLARiiON storage array, but the views should be similar
for any filer or storage array. For filers, where the word pool is used, it means aggregate.
•
Top 5 Disks. Displays the top five disks with the highest average value for Ops Rate and the performance
metrics most useful for this array or filer.
Figure 52. Top 5 Disks
•
Disk Status Filter. Click to show only disks with the selected status in the Disk Details table.
•
Disk Details. For each disk, identifies its status, physical state, parent pool or aggregate, and Disk Size,
and then displays current values for Ops Rate, % Busy, Average Queue Depth, Data Rate, and Latency.
Click a Sparkline to plot metric values over the time period.
NOTE: If a metric is not available for this particular array or filer, the column is not displayed.
The remaining columns provide the following details:
- Role. Displays the role played by the disk in the pool, such as disk or spare.
- RPM. Displays revolutions per minute as reported by the vendor.
- Disk Interface. When available, displays the type of interface, such as SATA or SCSI.
- Member. EqualLogic and Isilon only. Displays the name of the member node where the disk physically
resides.
Figure 53. Disk Status Filter
•
Alarm Summary. Displays alarms on disks.
5
Troubleshooting Storage Performance
NOTE: This chapter is intended for Foglight for Storage Management users with the role of Storage
Administrator.
Administrators may receive problem reports from stakeholders about the performance of a VMware virtual
machine (VM). When the suspected cause is storage, the Administrator can run an automated analysis using the
Storage Troubleshooting dashboard. The analysis can quickly rule out a storage issue, allowing the Administrator
to focus on other areas. Conversely, if a storage issue is found to be contributing to poor performance, the
results of the analysis clearly highlight the datastores or RDM disk extents that require attention.
This chapter describes how to start an investigation, analyze the results, and change latency thresholds. It also
summarizes the algorithm used to identify and assess storage performance issues.
Starting a Troubleshooting Investigation
Before beginning an investigation, you should ask the person reporting the issue the following questions:
•
What is the host name of the affected virtual machine?
•
When did you first notice the performance issues? The reported time frame determines how you set the
zonar time range.
To investigate a potential storage performance issue:
1
On the navigation panel, under Dashboards, choose Storage & SAN > Storage Troubleshooting.
2
Set the zonar to encompass the reported time frame, up to a maximum of 8 hours.
Figure 54. The zonar is located in the upper right corner of the dashboard.
3
Type the host name of the virtual machine.
4
The latency threshold values are set automatically based on the latency registry variables. If different
values would be more meaningful for your analysis, see Changing Latency Thresholds on page 101.
5
Click Perform Analysis.
Depending on the complexity of the analysis request, it may take a little time before the results of the
analysis appear on the dashboard.
Figure 55. Troubleshoot Storage Performance
Recall that a virtual machine can get its storage either from a datastore (connected in turn to one or
more disk extents or to a NASVolume) or directly from an RDM disk extent (without a datastore). In the
analysis results, each datastore or RDM disk extent that is connected directly to the selected virtual
machine is represented by a separate view.
6
Review the icons displayed in the title bars of each datastore/RDM view.
•
If a Normal icon appears in all title bars, the performance issue is not storage-related. This
investigation is complete.
TIP: In some cases, the Diagnosis summary (right-hand panel) may contain hints about where to
look next. For example, increased I/O (as compared to typical I/O for this period) may indicate
that an application is behaving differently than before.
•
If the Attention icon appears in one or more title bars, continue your investigation following
the workflow described in Analyzing Storage Issues.
Analyzing Storage Issues
If the view for a datastore or RDM disk extent shows the Attention icon, the troubleshooting algorithm has
discovered evidence of a performance problem related to storage. The problem may or may not be in the SAN
Storage environment. Review the details to determine the cause of the performance issue.
Each datastore/RDM view has three summary panels (from left to right):
•
VM I/O to Datastore/RDM (first panel)
•
Latency for Disk Extents (middle panel)
•
Diagnosis (last panel)
A virtual machine may be connected to multiple datastores and RDM disk extents, each of which may report
varying degrees of problems. When a virtual machine has more than one datastore/RDM view, start by scanning
the timeline bars in the VM I/O to Datastore/RDM panel to identify a datastore/RDM with consistently slow I/O
performance or significant changes from typical performance.
The following workflow describes one way to identify a latency problem in the collected SAN Storage
environment. While the details in your investigation may differ, the general workflow should be similar to this
one.
To analyze storage issues:
1
In a view showing the Attention icon, scan the VM I/O to Datastore/RDM summary (first panel). Look
for timeline bars that primarily show colors such as yellow, orange, or pink, that is, any color other than
green (which represents acceptable activity).
Figure 56. VM I/O to Datastore/RDM summary
In this example, the VM Latency vs Threshold timeline is orange, which means the virtual machine is
consistently exceeding the default latency thresholds that were specified for the analysis. We should
focus our investigation here.
The VM Latency vs Typical timeline is green, which means that the latency is typical for the time period;
this behavior has been going on for some time. The typical values are statistical values determined by
IntelliProfile from activity within the last 30 days.
2
Now look at the Latency for Disk Extents summary (middle panel) to identify the disk extents that are
contributing to the problem.
NOTE: When a datastore is connected to a NASVolume, this panel is empty.
Figure 57. Latency for Disk Extents
In this case, there is only one disk extent attached to the datastore. Its timeline is orange, which means
the disk extent is exceeding latency thresholds. The number in brackets indicates that the virtual
machine was performing I/O to the disk extent while the VM was experiencing latency. The larger the
number, the more I/O was occurring. When this number is zero, no I/O occurred while the VM was
experiencing latency.
TIP: In your own investigations, you may have more than one disk extent, in which case you may be
able to see that one disk extent is slow while the others are normal. Or you may have multiple
virtual machines sharing the same disk extent. In this case, the Diagnosis panel may display a
message to let you know that the latency may not reflect this virtual machine alone.
3
Next, review the notes in the Diagnosis summary (last panel).
Figure 58. Diagnosis summary
In this case, one of the notes describes a correlation between the virtual machine latency and the disk
extent latency. The other note points toward a problem in the SAN Storage or the network, and below is
a button that begins an analysis of the SAN Storage.
NOTE: If the troubleshooting algorithm determines that the issue is likely in the SAN Storage
environment, and if the SAN Storage is provided by an element that the Foglight for Storage
Management system is monitoring, the Analyze SAN Storage button appears in the Diagnosis panel.
4
Before analyzing the SAN Storage, you may want to quantify the performance issue by reviewing the
metric values that underlie the timeline bars in the summary panels.
a
In the VM I/O to Datastore/RDM summary, click the Chart icon.
Figure 59. Metric values
The charts show the values of the metrics over the time period. Some charts also contain a
baseline range, which shows a statistical range of values encountered over the last 30 days.
Spikes outside of the normal range represent a significant change in behavior that may warrant
further investigation.
In this case, the top chart shows that the VM I/O latency is hovering around 200 ms, well above
the default 25 ms and 35 ms latency thresholds.
b
Close the window.
c
In the Latency for Disk Extents summary, click the Chart icon.
Figure 60. Disk Extent latency values
The chart shows the disk extent latency values hovering around 180 ms. The latency here may
reflect the activity of multiple VMs doing I/O or the performance within the ESX itself. It is clear
that a significant delay is occurring in the disk extent.
d
Close the window.
5
In the Diagnosis panel, click Analyze SAN Storage.
The Storage-Side Analysis window breaks down performance by the Extent, the LUN to which it belongs,
and the pool to which the LUN belongs.
Figure 61. Storage-Side Analysis
This view shows that the disk extent is slow for all virtual machines and for the selected virtual machine.
The disk extent to LUN I/O performance also exceeds latency thresholds, though its current performance
is within the baseline range, which shows a statistical range of typical values encountered over the last
30 days. This indicates the high latency has been occurring for some time. If you want to see the metric
values, click the Chart icon.
6
The next step in the investigation depends on the state of the pool.
•
If the pool timeline bars are green, the investigation is complete.
•
If the pool timeline bars are a color other than green, you can analyze the changes within the
pool and the load on the pool. Continue the investigation by following the workflow in Analyzing
the Pool.
In this example, the pool’s Avg Queue Depth timeline bar is yellow. The note beside the bar suggests that
this may be a cause of slow I/O to the LUN. The pool warrants further investigation.
Analyzing the Pool
When pool timeline bars show abnormal average queue depth or ops rate, analyze the changes within the pool
and the load on the pool.
•
Perform Pool Change Analysis. The Pool Change analyzer identifies the LUNs primarily responsible for
increased I/O. It compares LUN activity in the problem time range with LUN activity during the same
time range in the past. Changes are reported in terms of average operations rate and change amount.
•
Perform Pool Load Analysis. The Pool Load analyzer identifies the busiest LUNs and ranks them based on
their activity during the same time range over the last 30 days (not the current time frame). Activity is
measured in operations per second.
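The two analyzers can be pictured as simple rankings over per-LUN operations rates. The following is a minimal sketch, not the product's implementation: the function names, the input dictionaries, and the treatment of a LUN with no comparison data as having a rate of zero are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def pool_change_analysis(problem_ops: Dict[str, float],
                         baseline_ops: Dict[str, float]) -> List[Tuple[str, float, float]]:
    """Rank LUNs by how much their average ops rate rose versus the comparison range.

    Returns (lun, average ops rate in the problem range, change amount),
    largest increase first. A LUN missing from the comparison range is
    treated as 0 ops/s (an assumption of this sketch).
    """
    rows = [(lun, ops, ops - baseline_ops.get(lun, 0.0))
            for lun, ops in problem_ops.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


def pool_load_analysis(history_ops: Dict[str, List[float]],
                       top_n: int = 10) -> List[Tuple[str, float]]:
    """Rank LUNs by mean ops/s over historical samples of the same time range."""
    means = [(lun, sum(samples) / len(samples))
             for lun, samples in history_ops.items() if samples]
    return sorted(means, key=lambda row: row[1], reverse=True)[:top_n]
```

The first function answers "which LUNs changed the most", while the second answers "which LUNs are habitually busiest", which is why a LUN can rank high in one list and low in the other.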
To analyze the pool:
1
From the Storage-Side Analysis window or the Pool Explorer window, click Perform Pool Change Analysis.
Figure 62. Perform Pool Change Analysis
The analyzer compares activity for all the LUNs in the pool during the problem time range with their
activity a week ago. In this example, the table and pie chart confirm that the LUN with the highest
average operations rate and change amount is the LUN that includes the disk extent used by the poorly
performing virtual machine.
TIP: You can change the comparison time range by clicking Change and selecting a new date and
time range.
2
Close the window.
3
Click Perform Pool Load Analysis.
Figure 63. Perform Pool Load Analysis
The chart and table show the top ten busiest LUNs. The analysis is based on the last 30 days of activity
during this time range. In this example, the table confirms that the top two busiest LUNs are connected
to the poorly performing virtual machine.
4
Close the window.
The investigation is complete.
Changing Latency Thresholds
One way Foglight for Storage Management determines if the performance problem is occurring in the SAN
storage environment is to evaluate latency against the thresholds defined for latency in registry variables. The
latency thresholds used for analysis are, by default, the same thresholds as are used for generating latency
alarms. If you think it would be helpful to adjust the threshold values for your analysis, you can change the
threshold values using the Storage Troubleshooting dashboard. The original registry variables are not updated.
TIP: If you change thresholds, you can restore the default values later using the Reset button.
To change the latency thresholds:
1
In the Storage Troubleshooting dashboard, click Edit Thresholds.
2
Specify the new thresholds in milliseconds for one or both of Warning and Critical.
3
Click Apply.
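The relationship between the registry defaults and the values edited on the dashboard can be sketched as follows. This is an illustration only: the class names are hypothetical, the defaults of 25 and 35 are taken from the example in Analyzing Storage Issues and treated as milliseconds, and the choice to treat a latency equal to a threshold as exceeding it is an assumption.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LatencyThresholds:
    warning_ms: float
    critical_ms: float


@dataclass
class TroubleshootingSession:
    # Stand-in for the values read from the latency registry variables.
    registry_defaults: LatencyThresholds
    active: LatencyThresholds = field(init=False)

    def __post_init__(self) -> None:
        self.active = self.registry_defaults

    def edit_thresholds(self, warning_ms=None, critical_ms=None) -> None:
        """Override one or both thresholds for this analysis only."""
        self.active = LatencyThresholds(
            self.active.warning_ms if warning_ms is None else warning_ms,
            self.active.critical_ms if critical_ms is None else critical_ms,
        )

    def reset(self) -> None:
        """Restore the registry defaults, like the Reset button."""
        self.active = self.registry_defaults

    def classify(self, latency_ms: float) -> str:
        if latency_ms >= self.active.critical_ms:
            return "critical"
        if latency_ms >= self.active.warning_ms:
            return "warning"
        return "normal"
```

Because edits replace only the session's active values, the registry defaults stay untouched, which mirrors the statement that the original registry variables are not updated.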
Understanding the Troubleshooting
Algorithm
To determine if the problem is likely to be a storage performance problem, Foglight for Storage Management
evaluates latency metrics against thresholds and typical performance, and disk extent metrics against the I/O
being performed to the extent by the virtual machine. If the likely cause of the problem is slow performance in
the SAN Storage environment, Foglight for Storage Management examines the LUN or NASVolume. If no
circumstances, such as a rebuild, are identified as a cause of high latency, the pool is examined.
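The order in which components are examined can be sketched as a short decision flow. This is a simplified illustration under stated assumptions, not the actual algorithm: it compares latency against a single warning threshold only, omits the comparison with typical performance, and uses hypothetical names such as Observation and next_steps.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Observation:
    vm_latency_ms: float        # latency seen by the virtual machine
    extent_latency_ms: float    # latency of the backing disk extent
    vm_io_during_latency: int   # I/O the VM sent to the extent while it was slow
    lun_rebuilding: bool        # example of a LUN-level circumstance


def next_steps(obs: Observation, warning_ms: float) -> List[str]:
    """Return the ordered list of components this sketch would examine."""
    steps = ["vm-latency"]
    if obs.vm_latency_ms < warning_ms:
        return steps  # latency within threshold: storage is ruled out

    steps.append("disk-extent")
    extent_implicated = (obs.extent_latency_ms >= warning_ms
                         and obs.vm_io_during_latency > 0)
    if not extent_implicated:
        return steps  # slow VM, but this extent does not explain it

    steps.append("lun-or-nasvolume")
    if obs.lun_rebuilding:
        return steps  # a LUN-level circumstance explains the latency

    steps.append("pool")
    return steps
```

For the worked example in Analyzing Storage Issues (VM latency near 200, extent latency near 180, no rebuild), this sketch walks all the way down to the pool.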
6
Managing Data Collection, Rules, and
Alarms
NOTE: This chapter is intended for users with the role Storage Administrator. Some tasks also require the
Foglight for Storage Management role of Administrator (as noted) to manage agents and rules.
You can collect additional types of relationship data, change collection schedules, manage rules, and manage
alarms.
•
Collecting Virtual Storage-to-SAN Relationships
•
Inferring Physical-Host-to-Storage Relationships
•
Modifying Data Collection Schedules (Administrator role required)
•
Managing Foglight for Storage Management Rules (Administrator role required)
•
Managing Alarm Settings
•
Troubleshooting Database Limits
If you are looking for information about configuring Foglight for Storage Management agents and agent masters,
reviewing agent alarms, creating support bundles, or backing up data, see the Managing Storage in Virtual
Environments Installation and Configuration Guide.
Collecting Virtual Storage-to-SAN
Relationships
To include virtual storage-to-SAN relationships in your topology diagrams, you need to enable storage collection
on your VMware or Hyper-V Performance agent. If you did not enable this option when defining the agent, you
can enable it by editing the agent.
NOTE: If you cannot see the Administration dashboards, ask your Foglight for Storage Management
Administrator to add the role VMware Administrator or Hyper-V Administrator to your user account.
To update an existing virtualization agent to add the collection of storage data:
1
On the navigation panel, under Dashboards, click VMware > VMware Agent Administration or Hyper-V >
Hyper-V Agent Administration.
2
In the Agents table, locate the Agent. In the Data Collection column, you can see whether general data
collection is enabled for this agent. Storage data collection, however, is not enabled by default.
Figure 1. Agents
3
To enable storage data collection for an agent, click the agent’s Edit icon.
4
Select the Enable Storage Collection check box and click Save.
Inferring Physical-Host-to-Storage
Relationships
If you want to understand how storage resources are being accessed by the physical hosts in your network,
without using agents to monitor the hosts directly, you can enable dependency processing. Dependency
processing collects FC switch zone information and LUN connection information from arrays and filers, analyzes
each host port used for storage I/O, and infers the (most likely) physical host associated with the port. When
dependency processing is enabled, relevant topology diagrams are enhanced to show inferred hosts connected to
the ports that are used for storage I/O. Connections from inferred hosts are shown as orange or blue lines.
Figure 2. Topology diagrams with inferred hosts
In the future, if you decide you want to collect data about inferred physical hosts, you can begin monitoring the
hosts by configuring Infrastructure agents. For instructions, see Reviewing Inferred Hosts on page 108.
Enabling Dependency Processing
Dependency processing requires that you have VMware Performance agents configured to monitor your
vCenters, and that storage collection is enabled on the agents. For more information, see “Configuring VMware
Performance Agents” in the Managing Storage in Virtual Environments Installation and Configuration Guide.
To enable dependency processing:
1
On the navigation panel, under Dashboards, click Storage & SAN > Storage Environment.
2
Click the Administration tab.
3
Click Enable Physical Host to Storage Dependency Processing.
4
In the Physical Host Support message box, click Yes.
The dependency processing algorithm collects information and runs a host-resolution process. This may
take some time.
When the processing completes, the Enable Physical Host to Storage Dependency Processing Task changes
to Disable Physical Host to Storage Dependency Processing Task, and a new task, Review Physical Host
FQDNs, appears in the task list.
Figure 3. Tasks
5
Next Step: Reviewing and Editing Host-Port Assignments
Reviewing and Editing Host-Port Assignments
After enabling dependency processing, it is a good idea to review the list of inferred hosts derived from this
process. You can verify that the inferred hosts are correct for the specified port, fix errors, and provide missing
host names. For this task, you use the Review Physical Host FQDNs dashboard.
NOTE: The Review Physical Host FQDNs dashboard also contains host-port assignment for physical hosts
that are being monitored by an agent.
To review the list of hosts identified by dependency processing:
1
On the Storage Environment dashboard, in the Administration tab, click Review Physical Host FQDNs.
The Review Physical Host FQDNs dashboard opens. The table displays the list of hosts, the associated
port carrying storage I/O, zone information for switch ports, and the number and type of storage
resources accessed. You can filter the contents of the table using the Show Host Ports check boxes.
Figure 4. Review Physical Host FQDNs
2
To add missing host names or change existing host names:
a
Click the row check boxes for the port assignments you want to change. All selected ports will be
assigned the same host name.
b
Click Edit Host FQDN.
Figure 5. Edit Host FQDN
c
Type the fully qualified domain name for the host, and click Submit.
The table displays the new host-port assignments.
3
To remove host assignments from ports, select the row check boxes for the ports and click Clear Host
FQDN.
•
Clearing a user-resolved host makes the agent-resolved host active, if one exists.
•
Clearing an agent-resolved host adds the host FQDN to a list so that it does not get reassigned to
the port in the future.
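The precedence between user-resolved and agent-resolved hosts, and the effect of Clear Host FQDN, can be sketched with a small data structure. The class and attribute names below are hypothetical, and the sketch assumes that a user-resolved host always takes precedence while it exists.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class PortAssignment:
    user_host: Optional[str] = None    # FQDN entered with Edit Host FQDN
    agent_host: Optional[str] = None   # FQDN resolved automatically
    blocked: Set[str] = field(default_factory=set)  # FQDNs never to reassign

    @property
    def active_host(self) -> Optional[str]:
        # A user-resolved host takes precedence over an agent-resolved one.
        return self.user_host or self.agent_host

    def clear(self) -> None:
        """Mimic Clear Host FQDN for this port."""
        if self.user_host is not None:
            self.user_host = None  # the agent-resolved host, if any, becomes active
        elif self.agent_host is not None:
            self.blocked.add(self.agent_host)
            self.agent_host = None

    def resolve(self, fqdn: str) -> None:
        """Record an automatically resolved host unless it was cleared before."""
        if fqdn not in self.blocked:
            self.agent_host = fqdn
```

In this sketch, clearing a port twice first falls back to the agent-resolved host and then removes it and remembers it, so later processing does not assign the same FQDN again.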
Running Dependency Processing Manually
If your environment has changed, you can update inferred-host-to-storage connections immediately rather than
waiting for a scheduled processing.
NOTE: Edited host-port assignments are not affected by these actions.
On the Review Physical Host FQDNs dashboard, use one of the following options:
•
Refresh—Runs the host-resolution analysis to identify inferred hosts for ports handling storage I/O.
•
Collect Zone Information—Dependency processing automatically collects zone information on a nightly
basis. If your environment has changed, use this button to collect updated zone information for FC
switches in your environment and run the host-resolution analysis. The collection and analysis can take
between five and ten minutes, possibly longer depending on the complexity of the environment.
Customizing Helper Strings for Dependency
Processing
Zone names are set by the Storage Administrator and typically include the names of devices connected to ports.
The dependency processing algorithm processes the zone name strings using various helper strings to identify
the possible host names associated with a port. A DNS lookup is performed on the possible names to identify the
most likely host FQDN. You can customize the helper strings for your environment and improve the accuracy
of the inferred hosts.
To customize helper strings for dependency processing:
1
On the Storage Environment dashboard, in the Administration tab, click Review Physical Host FQDNs.
2
Click Change Zone Processing Params.
The Edit Parameters for Zone Info Processing dialog box opens.
3
To specify IP addresses to ignore during dependency processing, click Invalid IPs. Click Add, specify an IP
address, and click Add.
Some DNS setups always return a default IP address when a DNS lookup of a possible host name fails. If a
DNS lookup of a host name returns an IP address listed in the Invalid IP table, the host name is
considered invalid for dependency processing purposes.
Figure 6. Invalid IP table
4
To specify strings that may be included in the zone information, but are known not to be host names,
click DNS Exclude Strings. Click Add, specify a string, and click Add.
Figure 7. DNS Exclude Strings
5
To specify strings used as separators for readability in the zone names, click Split Strings. Click Add,
specify a string, and click Add.
6
To specify strings that you want to convert to other strings before performing a host name DNS lookup,
click Translation Strings. Click Add, specify a string found in the zone information and the new string to
use, and click Add.
7
When you are finished customizing parameters on all tabs, click Save.
Dependency processing begins. Zone information is collected using the changed parameters and
host-resolution analysis is performed.
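The way the four kinds of helper strings could work together is sketched below. This is an illustration under assumptions, not the product's algorithm: the function names are hypothetical, translation is applied to whole tokens only, matching is case-sensitive, the first resolvable candidate wins, and the DNS lookup is passed in as a function so the sketch can run without network access. The sketch returns the matching candidate name rather than a full FQDN.

```python
import re
from typing import Callable, Dict, Iterable, List, Optional, Set


def candidate_names(zone_name: str, split_strings: Iterable[str],
                    exclude_strings: Set[str],
                    translations: Dict[str, str]) -> List[str]:
    """Break a zone name into possible host names."""
    # Longest separators first so that "__" is preferred over "_".
    separators = sorted((re.escape(s) for s in split_strings if s),
                        key=len, reverse=True)
    tokens = re.split("|".join(separators), zone_name) if separators else [zone_name]
    names = []
    for token in tokens:
        token = token.strip()
        if not token or token in exclude_strings:
            continue  # empty pieces and DNS Exclude Strings are not host names
        names.append(translations.get(token, token))  # Translation Strings
    return names


def infer_host(zone_name: str, lookup: Callable[[str], Optional[str]],
               invalid_ips: Set[str], **params) -> Optional[str]:
    """Return the first candidate whose lookup yields a usable IP address."""
    for name in candidate_names(zone_name, **params):
        ip = lookup(name)
        if ip is not None and ip not in invalid_ips:
            return name  # an IP in the Invalid IPs table counts as a failed lookup
    return None
```

For example, with the split strings "_" and "-", the exclude strings "zone" and "array7", and a translation from "dbhst01" to "dbhost01", the zone name "zone_hba0-dbhst01_array7" yields the candidates "hba0" and "dbhost01".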
Reviewing Inferred Hosts
You can see a list of inferred hosts on the Infrastructure dashboard. From this dashboard, you can explore hosts
and change their host port assignments if necessary. If you are interested in monitoring and collecting detailed
data about an inferred host, you can set up an agent to monitor the host. This action moves the host from the
inferred list to the appropriate monitored host list.
For more information about the Infrastructure dashboard, see the Managing the Infrastructure Cartridge
section of the Foglight for Storage Management online help.
To review inferred hosts:
1
On the navigation panel, under Dashboards, click Infrastructure.
2
In the Select a Service list, select All Hosts.
3
In the Monitoring tab, click the Inferred tile.
Figure 8. Inferred tile
4
In the quick view, click a host in the Inferred list and then click Explore.
5
Click the SAN Topology tab.
The SAN Topology tab displays the connections to the LUNs identified through dependency processing.
Figure 9. SAN topology
6
Click a Cloud
icon to show the connections through a port in a Details window.
An orange line connecting a host to a port indicates that the host is an inferred host.
Figure 10. Inferred Host
7
If the assigned host is incorrect, click the Host Port icon and select the desired action, such as
Assign to Different Host or Remove Link to Host.
8
Close the Details window.
9
If you want to set up an agent to monitor this host, click the Monitor tab.
NOTE: If your user credentials do not include the role Administrator, you need to ask someone with
this role to create an agent to monitor the host.
10
Click Configure Host Monitoring.
11
In the Add Monitored Host wizard, follow the online instructions to add the host.
For help with the wizard, see “Adding a Monitored Host” in the Managing the Infrastructure Cartridge
section of the Foglight for Storage Management online help.
Modifying Data Collection Schedules
Generally speaking, the default data collection schedules are suitable for most environments. If you notice that
some agents take an extended time to collect topology data or performance data, and modifying your
monitoring environment is not a suitable solution, you may want to change the data collection schedules for the
affected agents. You can find the current duration of collections in the Edit Agent Properties dialog box.
Figure 11. Edit Agent Properties
Understanding Data Collection Types and Schedules
Storage Collector agents perform two types of collections: topology collections and performance collections.
Topology collections
During topology collections, Foglight for Storage Management gathers basic information regarding each object
and its child objects. For example, for an array object, it gathers information about the array's LUNs, disks,
and other children. Foglight for Storage Management gathers various attributes and parameters (for example,
size or capacity) as well as relationship data. This relationship data is the key to understanding the basic
topology and interconnects, or paths, between objects (parent to parent, child to parent and, where
appropriate, child to child). Without this relationship information, performance analysis would not be possible.
Performance collections
During performance collections, Foglight for Storage Management gathers the following basic types of
information:
• Performance metrics for each known object (an object is known if it has previously been collected during
a topology collection).
• An enumeration of objects and their children. This enumeration determines if there is any change in the
topology (for example, a LUN has been created, a new ESX server has been added to a cluster, or a new VM
has been created). As the collector recognizes changes in the environment (sometimes in conjunction with
the server), it schedules topology collections to ensure that Foglight for Storage Management continually
has an accurate understanding of the underlying topology. Not all topology changes can be detected by the
performance collection. A scheduled topology collection will pick up any undetected changes.
Default data collection schedules
Storage Collector agents have a default schedule for topology and performance collections, as described in the
following table.
Table 1. Default data collection schedules

| Type | Collection Interval | Notes |
| --- | --- | --- |
| Topology collection | 3 hours | n/a |
| Performance Collection for EMC SMI-S and Hitachi AMS only | 15 minutes | Defines the collection interval for an agent that monitors a storage device using SMI-S or Hitachi AMS. The management hosts from which the Storage Collector agent collects performance data using SMI-S obtain performance data from the arrays every 15 minutes. Setting the agent to collect data more frequently is inefficient, because the same data is re-collected and discarded. |
| Performance Collection for all others | 10 minutes | Defines the collection interval for an agent that uses a different method (than those listed above) to monitor a storage device. |
For information about the technologies used by Storage Collector agents to monitor the different kinds of
storage devices, see “Configuring Agents to Monitor Storage Devices” in the Managing Storage in Virtual
Environments Installation and Configuration guide.
Modifying Data Collection Schedules for Storage
Collector Agents
NOTE: You require the Foglight for Storage Management role Administrator to perform tasks that affect
agents.
You can increase the collection interval to reduce the impact of frequent collections in busy environments.
Reducing the performance collection interval to less than five minutes is not recommended. You can change the
schedule for all agents by editing the default schedule, or clone the default schedule to create and select a
specific schedule for a Storage Collector agent.
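As a rough illustration of the default intervals and the five-minute guidance, the following Python sketch models the schedules as plain data. The dictionary keys and the function name are hypothetical labels chosen for this example; they are not Foglight property names, and the sketch does not interact with the product.

```python
from datetime import timedelta

# Default intervals as listed in the default data collection schedules table.
# The keys are illustrative labels, not Foglight property names.
DEFAULT_INTERVALS = {
    "topology": timedelta(hours=3),
    "performance_smis_hitachi_ams": timedelta(minutes=15),
    "performance_other": timedelta(minutes=10),
}

# The guide advises against performance intervals shorter than five minutes.
MIN_PERFORMANCE_INTERVAL = timedelta(minutes=5)


def follows_interval_guidance(interval: timedelta) -> bool:
    """Return True if a proposed performance interval is at least five minutes."""
    return interval >= MIN_PERFORMANCE_INTERVAL
```

For example, a proposed three-minute performance interval would fail this check, while each default performance interval passes it.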
To modify storage collection schedules:
1 On the navigation panel, under Dashboards, click Administration > Agents.
2 Click Agent Status.
The Agent Status dashboard opens.
3 Select the agent whose schedule you want to change from the Agent Status view. You can select any
Storage Collector agent if you are changing the default schedule.
4 Click Edit > Properties.
Figure 12. Edit > Properties
The Agent Status view displays.
5 Select Modify the properties for this agent only.
6 Scroll down to the Data Collection Scheduler section.
Figure 13. Data Collection Scheduler
From here you can perform the following agent editing tasks:
• To edit an existing schedule, continue with this procedure.
• To create a unique schedule for an agent using clone, see Creating a unique schedule for an agent
on page 113.
• To assign a cloned schedule to another agent, see Assigning a Cloned Schedule to Another Agent on
page 114.
7 From the Data Collection Scheduler drop-down, select the schedule you want to edit.
Figure 14. Data Collection Scheduler
8 Click Edit.
The Storage Collector-<scheduleName> dialog box opens.
9 Click inside the field you want to edit and make the required changes.
Figure 15. Storage Collection Schedule changes
10 Click Save Changes.
11 Close the Storage Collector-<scheduleName> dialog box.
12 Click Back to Agent Status.
The edit is complete.
Creating a unique schedule for an agent
This procedure provides the steps required to create a unique schedule for a Storage Collector agent.
To create a unique schedule for an agent:
1 From the Agent Status view, click Modify the private properties for this agent.
Figure 16. Modify Private Properties
2 Scroll down to the Data Collection Scheduler section.
3 In the Data Collection Schedule section, click Clone.
Figure 17. Data Collection Schedule
The Clone defaultSchedule dialog box opens.
4 Enter a schedule name, and click OK.
The newly cloned schedule appears in the Collector Config schedule selector.
5 Click Edit.
The Storage Collector edit dialog box opens.
6 Click inside the field(s) to be edited.
7 Click Save Changes to update the new schedule.
8 Close the Storage Collector-<name> dialog box.
9 Click Save to assign the new schedule to the agent.
10 Click Back to Agent Status.
The edit is complete.
Assigning a Cloned Schedule to Another Agent
This procedure provides the steps required to assign a cloned schedule to another Storage Collector agent.
To assign a cloned schedule to another agent:
1 From the Agent Status view, click Modify the private properties for this agent.
2 Scroll down to the Data Collection Scheduler section.
3 Click the Collector Config drop-down list.
4 Select the schedule you want to assign to this agent.
Figure 18. Select Schedule
5 Click Save to assign this new schedule to this agent.
6 Click Back to Agent Status.
The edit is complete.
Managing Foglight for Storage Management
Rules
NOTE: You require the Foglight for Storage Management role Administrator to perform tasks that affect
rules.
To see the list of Foglight for Storage Management rules, navigate to Administration > Rules & Notifications >
Rule Management, and select StorageUI from the Cartridge list. For help with rules, open the online help from
this dashboard.
All Foglight for Storage Management rule names start with StSAN and are organized into categories:
• Essential (StSAN E): Essential rules alert on latency issues, capacity issues, device failures, and errors.
• Normal (StSAN N): Normal rules alert on things like data traffic spikes.
• Tuning (StSAN T): Tuning rules are additional rules for situations that do not cause performance issues in
most environments.
The Essential and Normal categories are enabled by default. The rules within each category can be enabled or
disabled individually. By default, all rules in the Essential category are enabled, and all rules in the Normal
category are disabled. You may want to review the Normal rules and enable the ones that suit your
environment. Finally, you can control whether alarms are generated for rules in all categories or only some
categories. For more information, see Changing Alarm Sensitivity on page 115.
You can and should modify rule conditions to better suit your storage environment. To change conditions, copy
the rule and modify it. Then you can enable the new rule and disable the original rule. This approach allows you
to refer back to the original rule if necessary, and also protects you from changes to the default rules that may
occur during regular software updates. For more information, search for “Copying or deleting rules” in the
online help.
For rules that reference registry variables for threshold values, you should modify the threshold in the registry
variable, rather than modifying the rule. For help finding and editing registry variables, search for “Registry
Variable” in the online help.
NOTE: Rules that begin with StSANCar are rules used internally by the Storage cartridge, and do not
generate alarms. Do not disable or alter these rules.
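If you review an exported list of rule names outside the product, the naming convention described in this section can be applied with a small helper. The following Python sketch is illustrative only: it assumes rule names begin with the literal prefixes StSAN E, StSAN N, StSAN T, or StSANCar as written in this guide, and the function name and sample rule names are hypothetical.

```python
def classify_rule(name: str) -> str:
    """Classify a rule name by the prefixes described in this section."""
    # Check the internal prefix first, because "StSANCar" also starts with "StSAN".
    if name.startswith("StSANCar"):
        return "internal"
    prefixes = {
        "StSAN E": "essential",
        "StSAN N": "normal",
        "StSAN T": "tuning",
    }
    for prefix, category in prefixes.items():
        if name.startswith(prefix):
            return category
    # Anything else is not a Foglight for Storage Management rule by this convention.
    return "unknown"
```

A helper like this makes it easy to filter out the internal StSANCar rules, which should not be disabled or altered, before reviewing the remaining rules.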
Managing Alarm Settings
You can control which type of rule generates alarms and also set up email notifications.
Changing Alarm Sensitivity
You can control which categories of StSAN rules generate alarms.
To set the alarm sensitivity:
1 On the navigation panel, under Dashboards, click Storage & SAN > Storage Environment.
2 Click the Administration tab.
3 Click Change Alarm Sensitivity.
Figure 19. Alarm Sensitivity
4 Select the category of rules that generate alarms or turn off alarms for all essential, normal, and tuning
rules.
As noted in the option descriptions, each option includes the rule category that matches the name of the
option plus the categories listed in the option above it, so selecting Tuning enables alarms for tuning,
normal, and essential rules.
5 Click Save.
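The cumulative behavior of the sensitivity options can be sketched in a few lines of Python. This is a minimal model under the assumption that the options are ordered Essential, Normal, Tuning (inferred from the statement that selecting Tuning enables all three); the names used here are hypothetical and do not correspond to product settings.

```python
from typing import List, Optional

# Rule categories in the assumed order of the options, least to most inclusive.
CATEGORY_ORDER = ["essential", "normal", "tuning"]


def alarm_categories(selected: Optional[str]) -> List[str]:
    """Return the rule categories that generate alarms for a sensitivity option.

    Passing None models turning alarms off for all categories.
    """
    if selected is None:
        return []
    index = CATEGORY_ORDER.index(selected)
    # Each option includes its own category plus every category listed above it.
    return CATEGORY_ORDER[: index + 1]
```

With this model, selecting the normal option yields alarms for essential and normal rules only.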
Configuring Email Notifications for Alarms
You can define a list of email addresses that receive a notification whenever an essential rule evaluates to a
Critical or Fatal status. Email addresses are saved to the StorageAdministrator registry variable. An
essential rule begins with StSAN E and has the email action set automatically. For more information about rules,
see Managing Foglight for Storage Management Rules on page 114.
To specify email addresses:
1 On the navigation panel, under Dashboards, click Storage & SAN > Storage Environment.
2 Click the Administration tab.
3 Click Configure Storage Administrator Email Addresses.
4 Click Add, type an email address, and click Add. Repeat to add additional email addresses.
Figure 20. Configure Storage Administrator Email Addresses
Troubleshooting Database Limits
Foglight for Storage Management sets limits on the number of instances of each object type that can exist in the
Database Repository. On the Administration tab, the Review Instances and Limits task enables you to view a
table containing a list of Foglight for Storage Management object types.
For each object type, the table displays the instance limit, the number of instances currently in the database,
the utilization (count/limit), and a status based on utilization. While the utilization remains below 90%, the
object type status is Normal. When utilization reaches 90%, the status becomes Critical; if you intend to
monitor new instances of this object type, you should take some steps to plan for the growth. At 100%, the
status changes to Fatal and Foglight for Storage Management does not create any more instances of the object
type. For example, if the SanLUN object type is at 99% utilization, and you add an agent to monitor a new
storage array that has 200 LUNs configured, the SanLUN instance limit will soon be reached and you will not see
data for many of the new LUNs in the dashboards.
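The status thresholds described above can be expressed as a short calculation. The following Python sketch is an illustration only; the function names are hypothetical, and the instance limit of 10,000 used in the usage note is an assumed value, not a documented default.

```python
from typing import Tuple


def instance_limit_status(count: int, limit: int) -> Tuple[float, str]:
    """Return (utilization percentage, status) for an object type."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    utilization = 100 * count / limit
    if utilization >= 100:
        status = "Fatal"      # no more instances of this type are created
    elif utilization >= 90:
        status = "Critical"   # plan for growth before adding more instances
    else:
        status = "Normal"
    return utilization, status


def instances_not_created(count: int, limit: int, new_instances: int) -> int:
    """Return how many of the new instances would exceed the limit."""
    return max(count + new_instances - limit, 0)
```

For example, with an assumed limit of 10,000 and 9,900 existing SanLUN instances, utilization is 99% (Critical), and adding an array with 200 LUNs would leave 100 of the new LUNs without instances.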
The limit on the number of instances per object type is a global default value, which is set in the registry
variable called foglight.limit.instances. You can override the limit for selected topology types.
CAUTION: Before changing limits, you may want to consult with a Support Engineer to discuss
alternatives. You want to ensure that the size of your database remains reasonable while still meeting
your monitoring requirements.
To review the number of instances of each Storage object type in the Foglight for Storage
Management database:
1 On the Storage Environment dashboard, in the Administration tab, click Review Instances and Limits.
Figure 21. Edit Registry Values
2 Review the Status column. If any of the icons are Critical or Fatal, consider either modifying how you
monitor that type of object or extending the limit for that object type.
3 To increase or decrease a limit, click Edit Registry Variables and select foglight.limit.instances.
On the Edit Registry Variables dashboard, open the online help to learn how to add a registry variable for
an object type and set its instance limit.
7 Understanding Metrics
This chapter defines the metrics displayed in the Storage & SAN dashboards. It also provides an overview of
metrics in Foglight for Storage Management—you need this information if you intend to create new rules or
custom dashboards.
Units of Measurement
Foglight for Storage Management uses the following units of measurement:
• B/s: Number of bytes per second. Frequently converted to KB/s or MB/s.
• ops/s: Number of operations per second.
• ms: Milliseconds. Frequently converted to μs (microseconds).
• m: Milli (thousandth).
• ms/op: Milliseconds per operation.
• MB: Capacity metrics are captured in MB (megabytes), but are frequently displayed in GB (gigabytes) or
TB (terabytes).
Performance Metrics
In general, performance is gauged in terms of rates (such as data rates, ops to disk rates, frame rates),
latency, and utilization. The selection of performance metrics displayed in any given dashboard depends on the
selected device type and the metrics available from the device vendor.
The following tables define each metric and show the topology name for the metric. You need the topology
name when referencing the metric in user-defined rules and dashboards.
Performance metrics are organized into the following categories:
•
Fabrics and FC Switches — Performance Metrics
•
Storage Arrays and Filers — Disk I/O Performance Metrics
•
Clustered Storage Arrays — Network Performance Metrics
Fabrics and FC Switches — Performance Metrics
For fabrics and switches, performance is assessed in terms of send and receive values for the FC switch ports
and is expressed as the average number of bytes per second or frames per second. Port errors are also an
important indicator of port performance, because a higher number of errors correlates with lower send and
receive values.
Table 1. Performance metrics

| Display Name | Description | Topology Name |
| --- | --- | --- |
| Baseline | A range of normal values, as determined by IntelliProfile by looking at historical data for the same time range. A baseline range displays after seven days of data collection. | n/a |
| Data Rate | Bytes per second sent and received through a port. | bytesTotal |
| Data Receive Rate | Average number of bytes per second received through a port. The data rate, compared against the Baseline of typical activity, indicates whether the current traffic is typical or out of the norm. | bytesRcvd |
| Data Send Rate (also Data Xmit Rate) | Average number of bytes per second transmitted through a port. The data rate, compared against the Baseline of typical activity, indicates whether the current traffic is typical or out of the norm. | bytesXmit |
| Frame Rate | Average number of frames per second received and transmitted through a port. | framesTotal |
| Frame Receive Rate | Average number of frames per second received through a port. | framesRcvd |
| Frame Send Rate (also Frame Xmit Rate) | Average number of frames per second transmitted through a port. | framesXmit |
| Link Error Rate | Average number of link errors per second on a port. Link errors are caused by or affect the link status of the connection. Link errors include link resets, loss of signal on the port, and loss of synchronization. If an error rate value is very small, it may be displayed as 0.0. If a fabric or switch has an actual error rate of 0, it is not displayed. | errorsLink |
| Link Speed | Current speed of a port in Mb/second. | currentSpeedMb |
| Non-Link Error Rate | Average number of non-link errors per frame on a port. Non-link errors include resource constraints that cause frames to be retried, CRC errors, frame length errors, and address errors. If an error rate value is very small, it may be displayed as 0.0. If a fabric or switch has an actual error rate of 0, it is not displayed. | nonLinkErrors |
| Rcvd Utilization | = Data Receive Rate / Link Speed, expressed as a percentage. Utilization values help you identify ports that may be overloaded. | bytesRcvdUtilization |
| Xmit Utilization | = Data Send Rate / Link Speed, expressed as a percentage. Utilization values help you identify ports that may be overloaded. | bytesXmitUtilization |
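The utilization formulas in this table can be illustrated with a short Python sketch. It assumes that the data rate is expressed in bytes per second and that Link Speed (currentSpeedMb) is in megabits per second, with 1 Mb taken as 1,000,000 bits. The guide does not state how the product normalizes these units internally, so treat the conversion as an assumption; the function name is hypothetical.

```python
def port_utilization_pct(bytes_per_sec: float, link_speed_mb: float) -> float:
    """Return port utilization as a percentage of link speed.

    bytes_per_sec: Data Receive Rate or Data Send Rate, in bytes per second.
    link_speed_mb: Link Speed in megabits per second (assumed 1 Mb = 1,000,000 bits).
    """
    if link_speed_mb <= 0:
        raise ValueError("link speed must be positive")
    bits_per_sec = bytes_per_sec * 8
    link_bits_per_sec = link_speed_mb * 1_000_000
    return 100 * bits_per_sec / link_bits_per_sec
```

Under these assumptions, a port receiving 500,000,000 bytes per second on an 8,000 Mb/second link has a receive utilization of 50%.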
Storage Arrays and Filers — Disk I/O Performance
Metrics
Rate and latency metrics describe the performance of ports, controllers, pools, LUNs, disks, and NASVolumes.
The selection of performance metrics displayed in any given dashboard depends on the selected device type
and the metrics available from the device vendor.
NOTE: For controllers, the Data Rate, Latency, and Ops Rate sets of metrics include Block and File
versions as well to track block and file operations separately. For example, Data Rate has Data Rate (File)
and Data Rate (Block). These sets of metrics are not included in this table.
Table 2. Disk I/O Performance Metrics

| Display Name | Description | Topology Name |
| --- | --- | --- |
| % Busy (also displayed as Pct Busy or Busiest) | Average percentage of time a component is busy doing I/O during the collection period. | busy |
| Average Queue Depth | Average number of outstanding I/O operations at the start of each new I/O request in a pool, LUN, or disk. | avgQueueDepth |
| Cache Hit Rate | Percentage of read operations that can be satisfied from the cache. | cacheHits |
| L1 Cache Hit Rate | Percentage of read operations that can be satisfied from this cache. | cacheHitRate_L1 |
| L2 Cache Hit Rate | Percentage of read operations that can be satisfied from this cache. | cacheHitRate_L2 |
| Data Rate | Bytes per second read and written. | bytesTotal |
| Data Read Rate | Bytes per second read. | bytesRead |
| Data Write Rate | Bytes per second written. | bytesWrite |
| Disk Latency | Average latency for read/write operations to the disks in a pool, calculated in milliseconds per operation. At the pool level, Avg Disk Latency reflects the average latency for I/O to the physical disks. | latencyTotalDisk |
| Latency (R/W Latency) | Average latency for read/write operations, calculated in milliseconds per operation. At the pool level, Latency reflects I/O to the pool only. If the block of data to be read is already in a cache, then the latency for that operation is very low, because no disk access needs to be performed. NOTE: Latency is not the sum of read latency and write latency. The number of read operations and write operations is different. | latencyTotal |
| Other Latency | Average latency for other operations. | latencyOther |
| Read Latency | Average latency for read operations. | latencyRead |
| Write Latency | Average latency for write operations. | latencyWrite |
| Ops Rate | Average read and write operations per second. | opsTotal |
| Read Ops Rate | Read operations per second. | opsRead |
| Write Ops Rate | Write operations per second. | opsWrite |
| Other Ops Rate | Other operations per second. | opsOther |
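The note that Latency is not the sum of read latency and write latency can be made concrete with a worked example. The following Python sketch assumes that the combined latency is an average weighted by the number of read and write operations. The guide does not give the exact formula, so this is one plausible reading rather than the product's documented calculation; the function name is hypothetical.

```python
def combined_latency_ms(ops_read: float, latency_read_ms: float,
                        ops_write: float, latency_write_ms: float) -> float:
    """Return the operation-weighted average latency in ms per operation."""
    total_ops = ops_read + ops_write
    if total_ops == 0:
        return 0.0
    # Total time spent on all operations, divided by the number of operations.
    total_ms = ops_read * latency_read_ms + ops_write * latency_write_ms
    return total_ms / total_ops
```

Under this assumption, 900 read operations per second at 2 ms and 100 write operations per second at 10 ms give a combined latency of about 2.8 ms per operation, far from the 12 ms that a simple sum would suggest.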
Clustered Storage Arrays — Network Performance
Metrics
These metrics describe the performance of an EqualLogic array’s network and an Isilon array’s internal and
external networks.
Table 3. Network Performance Metrics

| Display Name | Description | Topology Name |
| --- | --- | --- |
| Link Speed | Current operational speed of the port link. | currentSpeedMb |
| Max Port Speed | Rated maximum speed of the port. | maxSpeedMb |
| Packet Errors % | Number of incorrectly received data packets divided by the total number of received packets, expressed as a percentage. | packetErrorPercent |
| Rcv Data Rate | Bytes per second received through a port or group of ports. | bytesWrite |
| Send Data Rate | Bytes per second sent through a port or group of ports. | bytesRead |
| Rcv Util | = Rcv Data Rate / Link Speed | bytesWriteUtilization |
| Send Util | = Send Data Rate / Link Speed | bytesReadUtilization |
| Rcv Util (% of max) (also NW Rcv Util (% of max)) | = Rcv Data Rate / Max Port Speed | bytesWriteUtilizationMax |
| Send Util (% of max) (also NW Send Util (% of max)) | = Send Data Rate / Max Port Speed | bytesReadUtilizationMax |
Capacity Metrics
The selection of capacity metrics displayed in any given dashboard depends on the selected device type and the
metrics available from the device vendor. The following tables define each metric and show the topology name
for the metric. You need the topology name when referencing the metric in user-defined rules and dashboards.
Capacity metrics are organized into the following categories:
•
Storage Arrays — Array, Member, and Pool Capacity Metrics
•
Filers — Filer and Aggregate Capacity Metrics
•
Storage Arrays and Filers — LUN, NASVolume, and Disk Capacity Metrics
Storage Arrays — Array, Member, and Pool Capacity
Metrics
The following table contains the capacity metrics you may see displayed for storage arrays, members/nodes,
and pools.
NOTE: To view the metrics for EMC Isilon arrays (which are significantly different than other arrays), see
EMC Isilon Only Capacity Metrics.
Table 4. Array, Member, and Pool Capacity Metrics
(The Array, Member/Node, and Pool applicability columns are not shown.)

| Capacity Metric | Description | Topology Name |
| --- | --- | --- |
| Total Advertised LUNs Size (also Advertised LUNs Size) | Sum of the Advertised LUN Size for all LUNs in the entity. | conf_LunSize |
| Available Raw Capacity | Capacity available in the pool that has not been committed to data, snapshots, or thick-provisioned LUNs. Raw counts all the disk blocks available, before applying RAID. | raw_available |
| Available Usable Capacity (also Available Capacity) | Capacity available in the pool that has not been committed to data, snapshots, or thick-provisioned LUNs. Usable means after applying RAID. | conf_available |
| Capacity Provisioned to LUNs (also Provisioned to LUNs) | Capacity used for LUN data. Does not include capacity that is used for snapshots, reserves, etc. For thin-provisioned LUNs, this is the capacity used for data written to the LUNs. For thick-provisioned LUNs, this is the capacity allocated to the LUNs, and does not reflect data written. Formerly displayed as Consumed Capacity. | Array: lun_usable_capacity; Pool: conf_used |
| Overcommitment | = Total Advertised LUNs Size - Total Usable Capacity. Overcommitment reflects the minimum additional capacity needed in the pool to support the advertised capacity promised to thin-provisioned LUNs. When the value of overcommitment is greater than zero, you run the risk of I/O writes failing because of insufficient physical storage to support the demand. Overcommitment is available only for arrays where the vendor provides the total usable capacity. | conf_LunOvercommitted |
| Total Disk Capacity | Sum of the sizes of the disks in the array. This includes spares. | disk_capacity |
| Total Raw Capacity | Total capacity of the pool in raw terms. Raw counts all the disk blocks allocated to the pool, before applying RAID. | raw_capacity |
| Total Usable Capacity | Total capacity of the pool or member/node in usable terms. Pools that are defined with a specific RAID level have this metric. Pools that can support LUNs with different RAID levels do not have this metric. | conf_capacity |
| Used Capacity | = Total Usable Capacity - Available Usable Capacity. | conf_used |
| % Used | = Used Capacity / Total Usable Capacity. Percentage of the member/node's usable capacity that has been used. EqualLogic Pools-Members tab only. | Calculated |
| % of Pool | Percentage of the pool storage capacity provided by a pool member. | Calculated |
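The derived capacity metrics in this table (Overcommitment, Used Capacity, and % Used) follow directly from their formulas. The Python sketch below works through them with values in MB, the unit in which capacity metrics are captured. The function names and sample values are hypothetical and serve only to illustrate the arithmetic.

```python
def overcommitment_mb(total_advertised_luns_mb: int, total_usable_mb: int) -> int:
    """Overcommitment = Total Advertised LUNs Size - Total Usable Capacity."""
    return total_advertised_luns_mb - total_usable_mb


def used_capacity_mb(total_usable_mb: int, available_usable_mb: int) -> int:
    """Used Capacity = Total Usable Capacity - Available Usable Capacity."""
    return total_usable_mb - available_usable_mb


def pct_used(used_mb: int, total_usable_mb: int) -> float:
    """% Used = Used Capacity / Total Usable Capacity, as a percentage."""
    return 100 * used_mb / total_usable_mb


def is_overcommitted(total_advertised_luns_mb: int, total_usable_mb: int) -> bool:
    """A value greater than zero signals a risk of I/O writes failing."""
    return overcommitment_mb(total_advertised_luns_mb, total_usable_mb) > 0
```

For example, a pool that advertises 12,000,000 MB to LUNs but has 10,000,000 MB of usable capacity is overcommitted by 2,000,000 MB; if 4,000,000 MB is still available, Used Capacity is 6,000,000 MB and % Used is 60%.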
EMC Isilon Only Capacity Metrics
Isilon arrays provide additional metrics to describe the Isilon OneFS file system (IFS) capacity in terms of hard
disk drive (HDD) and solid state drive (SSD) capacity.
Table 5. EMC Isilon only Capacity Metrics
(The Array, Member/Node, and Pool applicability columns are not shown.)

| Capacity Metric | Description | Topology Name |
| --- | --- | --- |
| Total HDD Capacity | Raw capacity of the hard drives in the array, as reported by the vendor. | raw_capacity_HDD |
| Total SSD Capacity | Raw capacity of the solid-state drives in the array, as reported by the vendor. | raw_capacity_SSD |
| IFS Capacity: Total | Total capacity of the Isilon File System. | Array: pool_capacity_conf; Pool: conf_capacity |
| IFS Capacity | = Total HDD Capacity + Total SSD Capacity. Capacity of the Isilon File System on a pool or member/node. | conf_capacity |
| IFS Capacity: Free (also IFS Free) | Free capacity of the Isilon File System. | Array: pool_free_conf; Pool and member: conf_available |
| IFS Capacity Used | = IFS Capacity - IFS Capacity: Free | Calculated |
| IFS Capacity: HDD Total | Capacity of the Isilon File System on hard disk drives in an array, pool, or node. | conf_capacity_HDD |
| IFS Capacity: HDD Free | Free capacity of the Isilon File System on hard disk drives in an array, pool, or node. | conf_available_HDD |
| IFS Capacity: HDD Used | = IFS Capacity: HDD Total - IFS Capacity: HDD Free | Calculated |
| IFS Capacity: SSD Total | Capacity of the Isilon File System on solid state drives in an array, pool, or node. | conf_capacity_SSD |
| IFS Capacity: SSD Free | Free capacity of the Isilon File System on solid state drives in an array, pool, or node. | conf_available_SSD |
| IFS Capacity: SSD Used | = IFS Capacity: SSD Total - IFS Capacity: SSD Free | Calculated |
| HDD % Free | = IFS Capacity: HDD Free / IFS Capacity: HDD Total | conf_pct_available_HDD |
| SSD % Free | = IFS Capacity: SSD Free / IFS Capacity: SSD Total | conf_pct_available_SSD |
Filers — Filer and Aggregate Capacity Metrics
Foglight for Storage Management supports NetApp filers (F). NetApp uses the word aggregate (Ag) instead of
pool, but the metrics collected for aggregates are similar to the metrics collected for pools.
Table 6. Filer and Aggregate Capacity Metrics
(The Filer and Aggregate applicability columns are not shown.)

| Capacity Metric | Description | Topology Name |
| --- | --- | --- |
| Adv LUNs Size (also Total Advertised LUNs Size or Advertised LUNs Size) | Sum of the Advertised LUN Size for all LUNs in the entity. | configured_lun_size |
| Advertised NASVolumes Size | Sum of the Advertised Size for all NASVolumes. | Filer: configured_vol_size; Aggregate: conf_VolSize |
| Disk Capacity (Raw) (also Total Disk Capacity) | Sum of the sizes of the disks in the filer. This includes spares. | disk_capacity |
| Free Disk/Spares Capacity (Raw) | Amount of disk capacity that has not been committed to an aggregate. Disk capacity cylinder coloring and capacity alarms are defined in registry variables with the prefix: StSAN.FilerDisks.PctUnallocatedCapacityThreshold... | disk_free |
| Aggr Capacity (Usable) Total (also Total Usable Aggr Capacity) | Sum of the Total Usable Capacity of all the aggregates in the filer. Aggregate capacity cylinder coloring and alarms are defined in registry variables with the prefix: StSAN.FilerAggregates.PctUnallocatedCapacityThreshold... | pool_capacity_conf |
| Aggr Capacity (Usable) Free (also Available Aggr Capacity) | Sum of the Available Usable Capacity in the aggregates. Usable means after applying RAID. | pool_free_conf |
| Aggr Capacity (Raw) Total (also displayed as Aggregate Capacity (Raw)) | Sum of the raw capacity of all the aggregates in the filer. | pool_capacity_raw |
| Aggr Capacity (Raw) Free | Sum of the available raw capacity in all the aggregates. | pool_free_raw |
| Overcommitment | = Advertised NASVolumes Size - Aggr Capacity (Usable) Total. Overcommitment reflects the minimum additional capacity needed in the aggregate to support the advertised capacity promised to thin-provisioned volumes and LUNs. When the value of overcommitment is greater than zero, you run the risk of I/O writes failing because of insufficient physical storage to support the demand. | conf_overcommitted |
| Total Usable Capacity | Total capacity of the aggregate in usable terms. Usable means after applying RAID. | conf_capacity |
| Available Usable Capacity (also displayed as Available Capacity) | Capacity available in the aggregate that has not been committed to data, snapshots, or thick-provisioned volumes and LUNs, in usable terms. Usable means after applying RAID. | conf_available |
| Used Capacity | = Total Usable Capacity - Available Usable Capacity | conf_used |
| % Available (also % Free) | = Available Usable Capacity / Total Usable Capacity | Calculated |
Storage Arrays and Filers — LUN, NASVolume, and
Disk Capacity Metrics
These metrics describe the capacity of LUNs, NASVolumes, and disks.

Table 7. LUN, NASVolume, and Disk Capacity Metrics
(The LUN, NASVolume, and Disk applicability columns are not shown.)

| Capacity Metric | Description | Topology Name |
| --- | --- | --- |
| Advertised LUN Size | Size of the LUN as advertised to the hosts connected to the LUN. | advertisedSize |
| Provisioned Usable Capacity | Capacity provisioned for LUN data. In a thin-provisioned LUN, this is the capacity used for data written to the LUN. In a thick-provisioned LUN, this is the capacity allocated to the LUN, and does not reflect data written. | size |
| Provisioned Raw Capacity | Capacity provisioned for LUN data in raw terms. | rawCapacity |
| Advertised Size | Size of the NASVolume as advertised to the host mounting the volume. | total_capacity |
| Free Capacity | = Advertised Size - Used Capacity | available_capacity |
| Used Capacity | Capacity in volume committed to data, snapshots, etc. in usable terms. | used_capacity |
| % Used | = Used Capacity / Advertised Size | percent_used |
| Disk Size | Size of the disk in MBs. Disk capacity cylinder coloring and capacity alarms are defined in registry variables with the prefix: StSAN.FilerDisks.PctUnallocatedCapacityThreshold... | size |
Overview of Metrics in Foglight for Storage
Management
Metrics and data are saved in the Foglight for Storage Management database as properties of topology objects.
Topology objects correspond to the types of storage devices and components being monitored in your storage
environment.
TIP: For in-depth information about the Foglight for Storage Management data model, see the Foglight for
Storage Management Data Model Guide.
Summary of SAN Topology Objects
If you want to create new rules or custom dashboards, or if you want to browse the data directly instead of
through the Storage & SAN dashboards, you need to know the names of devices, components, and metrics as
they are stored in the Foglight for Storage Management Database Repository.
The following tables list the top-level storage devices and all child components, and identify their topology
object names. These lists are not comprehensive; rather, they reflect the objects that you may need to
reference if you are creating new rules or custom dashboards. For other objects, see the Data dashboard. For
help navigating to the Data dashboard, see Locating SAN Topology Objects in Foglight for Storage Management
on page 127.
Table 8. SAN Topology Objects
Device | Topology Object
Fabric | SanFabric
Fibre Channel Switch | SanFcSwitch
VSAN | SanVsan
Filer | SanFiler
Storage Array | SanStorageArray

Component | Topology Object
FC Switch Port | SanFcSwitchPort
FC Port (Arrays/Filers) | SanStorageSupplierPortFC
IP Port (Arrays/Filers) | SanStorageSupplierPortISCSI
InfiniBand Port (Arrays) | SanStorageSupplierPortIB
Controller | SanController
Disk | SanPhysicalDisk
LUN | SanLun
Member | SanMember
NASVolume | SanVolume
Pool or Aggregate | SanPool
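If you script against these names (for example, when preparing rule or dashboard definitions), it can help to keep the Table 8 mapping in one place. The following is a minimal sketch under that assumption: the dictionary and function names are hypothetical, and only the device, component, and topology object names are taken from Table 8.

```python
# Display names and topology object names copied from Table 8.
SAN_TOPOLOGY_OBJECTS = {
    # Devices
    "Fabric": "SanFabric",
    "Fibre Channel Switch": "SanFcSwitch",
    "VSAN": "SanVsan",
    "Filer": "SanFiler",
    "Storage Array": "SanStorageArray",
    # Components
    "FC Switch Port": "SanFcSwitchPort",
    "FC Port (Arrays/Filers)": "SanStorageSupplierPortFC",
    "IP Port (Arrays/Filers)": "SanStorageSupplierPortISCSI",
    "InfiniBand Port (Arrays)": "SanStorageSupplierPortIB",
    "Controller": "SanController",
    "Disk": "SanPhysicalDisk",
    "LUN": "SanLun",
    "Member": "SanMember",
    "NASVolume": "SanVolume",
    "Pool or Aggregate": "SanPool",
}


def topology_object_for(display_name: str) -> str:
    """Return the topology object name for a Table 8 display name (hypothetical helper)."""
    try:
        return SAN_TOPOLOGY_OBJECTS[display_name]
    except KeyError:
        raise KeyError(f"No Table 8 entry for {display_name!r}") from None


print(topology_object_for("NASVolume"))  # SanVolume
```

Because Table 8 is not comprehensive, a failed lookup only means the object is not in this list; the Data dashboard remains the place to find other objects.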
Locating SAN Topology Objects in Foglight for Storage Management
NOTE: You need the Foglight for Storage Management Administrator role to access the Configuration >
Data menu.
You can directly explore the metrics and data captured for monitored storage devices and their components.
Data is displayed as a hierarchy of objects.
To locate storage topology objects:
1. On the navigation panel, under Dashboards, click Configuration > Data.
2. Expand the Storage & SAN branch.
3. Select a device type and then a device.
4. Select a metric to view details in the Property Viewer.
For example, the following image shows a selected FC Switch (brocade3900_3). The Data Rate metric,
which is called bytesRcvd, is selected and displayed in the Property Viewer. The Property Viewer
contains tables showing all collected values for the selected metric. For more information, see
Understanding Metric Data in Charts and Tables on page 14.
Figure 1. Property Viewer
About Dell
Dell listens to customers and delivers worldwide innovative technology, business solutions and services they
trust and value. For more information, visit www.software.dell.com.
Contacting Dell
Technical support: Online support
Product questions and sales: (800) 306-9329
Email: info@software.dell.com
Technical support resources
Technical support is available to customers who have purchased Dell software with a valid maintenance
contract and to customers who have trial versions. To access the Support Portal, go to
https://support.software.dell.com/.
The Support Portal provides self-help tools you can use to solve problems quickly and independently, 24 hours a
day, 365 days a year. In addition, the portal provides direct access to product support engineers through an
online Service Request system.
The site enables you to:
• Create, update, and manage Service Requests (cases)
• View Knowledge Base articles
• Obtain product notifications
• Download software. For trial software, go to Trial Downloads.
• View how-to videos
• Engage in community discussions
• Chat with a support engineer