DECamds User’s Guide
Order Number: AA–Q3JSE–TE
April 2001
This guide explains how to use DECamds software to detect and fix system availability problems. It also explains how to install DECamds.
Revision/Update Information: This guide supersedes the DECamds User's Guide, Version 7.1.
Operating System and Version:
  Data Analyzer: OpenVMS Alpha and VAX Version 7.2 or later
  Data Provider: OpenVMS Alpha and VAX Version 6.2 or later
Software Version:
Compaq DECamds Version 7.3
Compaq Computer Corporation
Houston, Texas
© 2001 Compaq Computer Corporation
Compaq, VAX, VMS, and the Compaq logo are registered in the U.S. Patent and Trademark Office.
OpenVMS is a trademark of Compaq Information Technologies Group, L. P. in the United States and other countries.
Motif, OSF/1, and UNIX are trademarks of The Open Group in the United States and other countries.
All other product names mentioned herein may be trademarks of their respective companies.
Confidential computer software. Valid license from Compaq required for possession, use, or copying.
Consistent with FAR 12.211 and 12.212, Commercial Computer Software, Computer Software
Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under vendor’s standard commercial license.
Compaq shall not be liable for technical or editorial errors or omissions contained herein. The information in this document is provided "as is" without warranty of any kind and is subject to change without notice. The warranties for Compaq products are set forth in the express limited warranty statements accompanying such products. Nothing herein should be construed as constituting an additional warranty.
ZK5929
The Compaq OpenVMS documentation set is available on CD-ROM.
This document was prepared using DECdocument, Version 3.3-1b.
Contents

Preface . . . ix

1 Overview of DECamds
1.1 How Does DECamds Work? . . . 1–2
1.2 Where to Install the DECamds Data Analyzer . . . 1–3
1.3 DECamds Security Features . . . 1–3
1.3.1 Understanding DECamds Security Files . . . 1–5
1.3.2 Customizing Security Files . . . 1–7
1.3.2.1 Setting Up Node Groups . . . 1–8
1.3.2.2 Defining Data Exchange Access Between Nodes . . . 1–8
1.3.2.3 Limiting Specific Users to Read Access . . . 1–9
1.3.3 Sending Messages to OPCOM . . . 1–9
1.3.4 Setting Broadcast Intervals for Node Availability Messages . . . 1–10

2 Getting Started
2.1 Starting DECamds . . . 2–1
2.2 Using the System Overview Window . . . 2–2
2.2.1 Expanding and Collapsing Group Information . . . 2–5
2.2.2 Displaying Additional Data . . . 2–5
2.2.3 Stopping Data Collection . . . 2–6
2.3 Using the Event Log Window . . . 2–7
2.3.1 Displaying Information About an Event Log Entry . . . 2–9
2.3.2 Performing Corrective Action on an Event Log Entry . . . 2–9
2.3.3 Sending Event Information to OPCOM . . . 2–10
2.3.4 Removing an Event from the Event Log Window . . . 2–10
2.3.5 Retaining and Releasing an Event in the Event Log Window . . . 2–10

3 Managing DECamds Data Windows
3.1 Disk Status Summary Window . . . 3–2
3.2 Volume Summary Window . . . 3–5
3.3 Single Disk Summary Window . . . 3–6
3.4 Page/Swap File Summary Window . . . 3–8
3.5 Node Summary Window . . . 3–10
3.6 Process I/O Summary Window . . . 3–12
3.7 CPU Modes Summary Window . . . 3–14
3.8 CPU Summary Window . . . 3–15
3.9 Memory Summary Window . . . 3–17
3.10 Single Process Summary Window . . . 3–19
3.11 Lock Contention Summary Window . . . 3–25
3.12 Single Lock Summary Window . . . 3–28
3.13 Cluster Transition/Overview Summary Window . . . 3–30
3.13.1 Data Displayed . . . 3–32
3.13.2 Notes About the Display . . . 3–33
3.14 System Communications Architecture Summary Window . . . 3–33
3.14.1 Notes About the Display . . . 3–35
3.15 NISCA Summary Window . . . 3–36
3.15.1 Data Displayed . . . 3–38
3.15.2 Notes About the Display . . . 3–40

4 Performing Fixes
4.1 Understanding Fixes . . . 4–1
4.2 Performing Fixes . . . 4–2
4.2.1 Adjust Quorum Fix . . . 4–4
4.2.2 Adjust Process Quota Limit . . . 4–5
4.2.3 Adjust Working Set Fix . . . 4–6
4.2.4 Change Process Priority Fix . . . 4–6
4.2.5 Crash Node Fix . . . 4–7
4.2.6 Exit Image and Delete Process Fixes . . . 4–8
4.2.7 Purge Working Set Fix . . . 4–9
4.2.8 Suspend Process and Resume Process Fixes . . . 4–9
4.3 Examples for Fixing Low Memory Availability . . . 4–10
4.3.1 Performing a Fix Using Automatic Fix Settings . . . 4–10
4.3.2 Performing a Fix Using Manual Investigation . . . 4–11

5 Customizing DECamds
5.1 Customizing DECamds Defaults . . . 5–1
5.1.1 Setting Default Data Collection . . . 5–3
5.1.2 Setting Automatic Event Investigation . . . 5–4
5.1.3 Setting Automatic Lock Investigation . . . 5–4
5.2 Filtering Data . . . 5–4
5.2.1 Filtering Events . . . 5–6
5.2.2 Customizing Events . . . 5–8
5.3 Sorting Data . . . 5–14
5.4 Setting Collection Intervals . . . 5–16
5.5 Optimizing Performance with System Settings . . . 5–18
5.5.1 Optimizing DECamds Software . . . 5–18
5.5.1.1 Setting Process Quotas . . . 5–19
5.5.1.2 Setting LAN Load . . . 5–19
5.5.1.3 Setting Window Customizations . . . 5–20
5.5.2 Optimizing System Settings . . . 5–20
5.5.2.1 Setting Data Link Read Operations . . . 5–20
5.5.2.2 Setting the Communications Buffer . . . 5–20
5.5.3 Optimizing Performance with Hardware . . . 5–21

A Installing the DECamds Data Analyzer
A.1 General Installation Information . . . A–1
A.2 Data Analyzer Installation Requirements . . . A–2
A.3 Obtaining the Data Analyzer Software . . . A–4
A.4 Installing Data Analyzer Software from a PCSI Kit . . . A–4
A.5 Postinstallation Tasks on Data Provider Nodes . . . A–7
A.5.1 Starting, Stopping, and Reloading DECamds . . . A–8
A.6 Postinstallation Tasks on a Data Analyzer Node . . . A–8
A.7 Starting to Use the Data Provider . . . A–9
A.8 Determining and Reporting Problems . . . A–10
A.9 Running the Installation Verification Procedure Separately . . . A–10

B DECamds Files and Logical Names
B.1 Files and Logical Names for the Data Analyzer Node . . . B–1
B.2 Files and Logical Names for Data Provider Nodes . . . B–2
B.3 Log Files . . . B–4
B.4 Event Log File . . . B–5
B.5 Lock Contention Log File . . . B–5
B.6 OPCOM Log . . . B–6

Glossary

Index

Examples
A–1 Sample OpenVMS Alpha Installation . . . A–6
B–1 Sample Event Log File . . . B–5
B–2 Sample Lock Contention Log File . . . B–6

Figures
1–1 DECamds Processing . . . 1–3
2–1 System Overview Window . . . 2–3
2–2 System Overview Window Menus . . . 2–5
2–3 Event Log Window . . . 2–7
2–4 Event Log Window Menus . . . 2–8
2–5 Event Display Choice Dialog Box . . . 2–9
3–1 DECamds Data Window Hierarchy . . . 3–1
3–2 Disk Status Summary Window . . . 3–3
3–3 Volume Summary Window . . . 3–5
3–4 Single Disk Summary Window . . . 3–7
3–5 Page/Swap File Summary Window . . . 3–8
3–6 Node Summary Window . . . 3–10
3–7 Process I/O Summary Window . . . 3–12
3–8 CPU Modes Summary Window . . . 3–14
3–9 CPU Summary Window . . . 3–15
3–10 Memory Summary Window . . . 3–17
3–11 Single Process Summary Window . . . 3–19
3–12 Lock Contention Summary Window . . . 3–25
3–13 Filtering Lock Events . . . 3–27
3–14 Single Lock Summary Window . . . 3–28
3–15 Cluster Transition/Overview Summary Window . . . 3–31
3–16 SCA Summary Window . . . 3–34
3–17 NISCA Summary Window . . . 3–37
4–1 FIX Adjust Quorum Dialog Box . . . 4–4
4–2 FIX Adjust Process Quota Limit Dialog Box . . . 4–5
4–3 FIX Adjust Working Set Size Dialog Box . . . 4–6
4–4 FIX Process Priority Dialog Box . . . 4–7
4–5 FIX Crash Node Dialog Box . . . 4–7
4–6 FIX Process State Dialog Box — Exit Image or Delete Process . . . 4–8
4–7 FIX Purge Working Set Dialog Box . . . 4–9
4–8 FIX Process State Dialog Box — Suspend or Resume Process . . . 4–10
4–9 Sample Fix Dialog Box . . . 4–11
4–10 DECamds Memory Summary Window . . . 4–12
4–11 DECamds Node Summary Window . . . 4–13
5–1 DECamds Application Customizations Dialog Box . . . 5–2
5–2 Event Qualification . . . 5–5
5–3 CPU Summary Filtering Dialog Box . . . 5–7
5–4 Customize Events Dialog Box . . . 5–9
5–5 LOWSQU Event Customization Window . . . 5–10
5–6 Memory Summary Sorting Dialog Box . . . 5–15
5–7 Memory Summary Collection Interval Dialog Box . . . 5–16

Tables
1–1 Security Triplet Format . . . 1–5
1–2 Security Triplet Verification . . . 1–7
1–3 DECamds Logical Names for OPCOM Messages . . . 1–9
1–4 Broadcast Availability Logical Names . . . 1–10
2–1 System Overview Window Display Fields . . . 2–4
2–2 Event Log Window Display Fields . . . 2–7
3–1 DECamds Data Windows . . . 3–1
3–2 Disk Status Summary Window Data Fields . . . 3–3
3–3 Volume Summary Window Data Fields . . . 3–6
3–4 Single Disk Summary Window Data Fields . . . 3–8
3–5 Page/Swap File Summary Window Data Fields . . . 3–9
3–6 Node Summary Window Data Fields . . . 3–11
3–7 Process I/O Summary Window Data Fields . . . 3–13
3–8 CPU Modes Summary Window Data Fields . . . 3–15
3–9 CPU Summary Window Data Fields . . . 3–16
3–10 Memory Summary Window Data Fields . . . 3–18
3–11 Single Process Summary Window Data Fields . . . 3–20
3–12 Lock Contention Summary Window Data Fields . . . 3–26
3–13 Single Lock Summary Window Data Fields . . . 3–29
3–14 Data Items in the Summary Panel of the Cluster Transition/Overview Summary Window . . . 3–32
3–15 Data Items in the Cluster Members Panel of the Cluster Transition/Overview Summary Window . . . 3–32
3–16 Data Items in the SCA Summary Window . . . 3–35
3–17 Data Items in the Transmit Panel . . . 3–38
3–18 Data Items in the Receive Panel . . . 3–38
3–19 Data Items in the Congestion Control Panel . . . 3–39
3–20 Data Items in the Channel Selection Panel . . . 3–39
3–21 Data Items in the VC Closures Panel . . . 3–40
3–22 Data Items in the Packets Discarded Panel . . . 3–40
4–1 Summary of DECamds Fixes . . . 4–3
5–1 DECamds Application Defaults . . . 5–3
5–2 Event Log Filters . . . 5–6
5–3 CPU, I/O, and Memory Class Definitions . . . 5–12
5–4 Memory Summary Collection Interval Fields . . . 5–17
5–5 Default Window Collection Intervals . . . 5–17
5–6 LAN Load . . . 5–19
5–7 Monitoring Nodes . . . 5–21
A–1 Recommended System Requirements . . . A–2
B–1 Files on the Data Analyzer Node . . . B–1
B–2 Logical Names Defined for the Data Analyzer . . . B–2
B–3 Files on Nodes Running the Data Provider . . . B–3
B–4 Logical Names Defined on Nodes Running the Data Provider . . . B–3
Preface
Intended Audience
This guide is intended for system managers who install and use Compaq
DECamds software.
Document Structure
This guide contains the following chapters and appendixes:
• Chapter 1 provides an overview of DECamds software and describes where to install DECamds, its security features, and how to customize security files.
• Chapter 2 describes how to start DECamds and use online help. It also describes the System Overview window and the Event Log window.
• Chapter 3 describes how to use the DECamds data windows.
• Chapter 4 describes how to take corrective actions, called fixes, to improve system availability.
• Chapter 5 describes the tasks you can perform to filter, sort, and customize the display of system data using DECamds. It also describes how some of these tasks can optimize the performance of DECamds.
• Appendix A contains instructions for installing DECamds.
• Appendix B contains a description of all files and logical names created when
DECamds is installed and gives examples of the log files that DECamds writes.
• The Glossary defines DECamds terminology.
Related Documents
The following manuals provide additional information:
• OpenVMS Version 7.3 Release Notes describes features and changes that apply to DECamds software.
• OpenVMS System Manager’s Manual describes tasks you perform to manage an OpenVMS system. It also describes installing a product with the
POLYCENTER Software Installation utility.
• OpenVMS System Management Utilities Reference Manual describes utilities you use to manage an OpenVMS system.
• OpenVMS Programming Concepts Manual explains OpenVMS lock management concepts.
• OpenVMS System Messages: Companion Guide for Help Message Users explains how to use help messages.
• POLYCENTER Software Installation Utility User’s Guide describes the features you can request with the PRODUCT INSTALL command when starting an installation.
For additional information about Compaq OpenVMS products and services, access the Compaq website at the following location: http://www.openvms.compaq.com/
Reader’s Comments
Compaq welcomes your comments on this manual. Please send comments to either of the following addresses:
Internet
Compaq Computer Corporation
OSSG Documentation Group, ZKO3-4/U08
110 Spit Brook Rd.
Nashua, NH 03062-2698
How to Order Additional Documentation
Use the following World Wide Web address to order additional documentation: http://www.openvms.compaq.com/
If you need help deciding which documentation best meets your needs, call
800-282-6672.
Conventions
The following conventions are used in this guide:

Ctrl/x
  A sequence such as Ctrl/x indicates that you must hold down the key labeled Ctrl while you press another key or a pointing device button.

PF1 x
  A sequence such as PF1 x indicates that you must first press and release the key labeled PF1 and then press and release another key or a pointing device button.

Return
  In examples, a key name enclosed in a box indicates that you press a key on the keyboard. (In text, a key name is not enclosed in a box.) In the HTML version of this document, this convention appears as brackets, rather than a box.

. . . (horizontal ellipsis)
  Horizontal ellipsis points in examples indicate one of the following possibilities:
  • Additional optional arguments in a statement have been omitted.
  • The preceding item or items can be repeated one or more times.
  • Additional parameters, values, or other information can be entered.

. . . (vertical ellipsis)
  Vertical ellipsis points indicate the omission of items from a code example or command format; the items are omitted because they are not important to the topic being discussed.

( )
  In command format descriptions, parentheses indicate that you must enclose choices in parentheses if you specify more than one.

[ ]
  In command format descriptions, brackets indicate optional choices. You can choose one or more items or no items. Do not type the brackets on the command line. However, you must include the brackets in the syntax for OpenVMS directory specifications and for a substring specification in an assignment statement.

|
  In command format descriptions, vertical bars separate choices within brackets or braces. Within brackets, the choices are optional; within braces, at least one choice is required. Do not type the vertical bars on the command line.

{ }
  In command format descriptions, braces indicate required choices; you must choose at least one of the items listed. Do not type the braces on the command line.

bold text
  This typeface represents the introduction of a new term. It also represents the name of an argument, an attribute, or a reason.

italic text
  Italic text indicates important information, complete titles of manuals, or variables. Variables include information that varies in system output (Internal error number), in command lines (/PRODUCER=name), and in command parameters in text (where dd represents the predefined code for the device type).

UPPERCASE TEXT
  Uppercase text indicates a command, the name of a routine, the name of a file, or the abbreviation for a system privilege.

Monospace text
  Monospace type indicates code examples and interactive screen displays. In the C programming language, monospace type identifies the following elements: keywords, the names of independently compiled external functions and files, syntax summaries, and references to variables or identifiers introduced in an example.

-
  A hyphen at the end of a command format description, command line, or code line indicates that the command or statement continues on the following line.

numbers
  All numbers in text are assumed to be decimal unless otherwise noted. Nondecimal radixes (binary, octal, or hexadecimal) are explicitly indicated.
1
Overview of DECamds
This chapter describes the following:
• Overview of DECamds
• Where to install the DECamds Data Analyzer
• DECamds security features
Compaq DECamds is a real-time monitoring, diagnostic, and correction tool that helps you improve OpenVMS system and OpenVMS Cluster availability.
DECamds also helps system programmers and analysts target a specific node or process for detailed analysis, and helps system operators and service technicians identify hardware and software issues.
DECamds simultaneously collects and analyzes system data and process data from multiple nodes and displays the output on a DECwindows Motif display.
Based on the analyzed data, DECamds detects events and proposes actions to correct resource availability and system denial issues in real time.
DECamds helps improve OpenVMS system and OpenVMS Cluster availability in the following ways:

Availability
  Alerts users to resource availability problems, suggests paths for further investigation, and recommends actions to improve availability.

Centralized management
  Provides centralized management of remote nodes within an extended local area network (LAN).

Intuitive interface
  Provides an easy-to-learn and easy-to-use DECwindows Motif user interface.

Correction capability
  Allows real-time intervention, including adjustment of node and process parameters, even when remote nodes are hung.

Customization
  Adjusts to site-specific requirements through a wide range of customization options.

Scalability
  Makes it easier to monitor multiple OpenVMS systems and OpenVMS Cluster systems over a single site or over multiple sites.
1.1 How Does DECamds Work?
DECamds is a client/server application. It is installed in two parts as follows:

1. The Data Provider gathers system data and transmits it to the Data Analyzer.
2. The Data Analyzer receives data from the Data Provider, analyzes the data, and displays it.
A node that has the DECamds Data Provider installed announces its availability, using a multicast LAN message, to any DECamds Data Analyzer that is installed and running. The Data Analyzer receives the Data Provider’s availability announcement and a communications link is established.
Note
The Data Provider runs at a high interrupt priority level (IPL), so it gathers data and transmits it to the Data Analyzer even if the Data Provider is on a remote node that is hung. However, because of the high IPL collection, the Data Provider cannot collect nonresident memory data, restricting some data collection in process space.
The Data Analyzer portion of DECamds is a DECwindows Motif application that runs on any OpenVMS Version 6.2 or later system. Although you can run the Data Analyzer as a member of a monitored cluster, it is typically run on an
OpenVMS system that is not a member of the cluster being monitored. You can have more than one Data Analyzer application executing in a LAN, but only one can be running at a time on each OpenVMS system.
System data is analyzed and translated into meaningful values and rates that are displayed in DECwindows Motif windows. The data is screened for data points that exceed thresholds that might cause system or OpenVMS Cluster availability problems. The Data Analyzer can also implement various system correction options if authorized to do so.
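The screening step can be pictured as a comparison of sampled values against per-resource thresholds, with breaches raised as events. The sketch below is illustrative only; the metric names, threshold values, and data layout are hypothetical, not DECamds' actual ones.

```python
# Illustrative sketch of threshold screening: sampled values that breach
# a per-metric threshold become "events" for the operator to investigate.
# Metric names and limits here are hypothetical, not DECamds' own.

THRESHOLDS = {
    "free_memory_pages": (5000, "below"),   # alert when the value falls below
    "cpu_busy_percent": (90, "above"),      # alert when the value rises above
}

def screen(samples):
    """Return the (node, metric, value) samples that breach a threshold."""
    events = []
    for node, metric, value in samples:
        if metric not in THRESHOLDS:
            continue
        limit, direction = THRESHOLDS[metric]
        breached = value < limit if direction == "below" else value > limit
        if breached:
            events.append((node, metric, value))
    return events

samples = [
    ("NODEA", "free_memory_pages", 1200),   # breaches: below 5000
    ("NODEA", "cpu_busy_percent", 45),      # within limits
    ("NODEB", "cpu_busy_percent", 97),      # breaches: above 90
]
print(screen(samples))
```

In DECamds itself, the thresholds behind this kind of check are customizable per event, as described in Chapter 5.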
The Data Analyzer and Data Provider nodes communicate over an Extended LAN using an IEEE 802.3 Extended Packet format protocol. Once a secure connection is established, the Data Analyzer instructs the Data Provider to gather specific system and process data.
Figure 1–1 illustrates the interaction of the Data Analyzer and Data Provider on nodes in a cluster.
Nodes A, C, D, E, F, and H can exchange information with the Data Analyzer.
Node B has defined its security to exclude the Data Analyzer from accessing its system data. Node G has not installed DECamds and does not communicate with the Data Analyzer.
Figure 1–1 DECamds Processing

[Figure 1–1 (ZK–7946A–GE) shows a Data Analyzer node on an extended LAN surrounded by Data Provider nodes A through H. Nodes A, C, D, E, F, and H exchange data with the Data Analyzer; node B's security triplet does not match, and node G does not have DECamds installed.]
1.2 Where to Install the DECamds Data Analyzer
This section discusses where to install the DECamds Data Analyzer software. You can install and run the DECamds Data Analyzer from either a cluster member or a standalone system outside the cluster. However, Compaq recommends that you run the Data Analyzer from outside a cluster because then you can monitor system information even if the nodes in the cluster pause or hang.
Generally, you can install and run the DECamds Data Provider on any OpenVMS
Version 6.2 or later system. Appendix A describes the specific system hardware and software requirements for installing and running the DECamds Data
Analyzer.
1.3 DECamds Security Features
DECamds has several security features, including the following:
• Private LAN transport
The DECamds protocol is based on the 802.3 Extended Packet Format (also known as SNAP). The IEEE DECamds protocol values are as follows:
Protocol ID: 08-00-2B-80-48
Multicast Address: 09-00-2B-02-01-09
If you filter protocols for bridges or routers in your network, add these values to your network protocols.
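For reference, the two values above can be expressed as byte strings and compared against fields extracted from a captured frame. The sketch below only checks the two values from the text; it is not a full 802.3 LLC/SNAP parser, and the assumption that the destination address and SNAP protocol ID have already been extracted from the frame is a simplification.

```python
# The DECamds protocol values quoted in the text, as byte strings.
PROTOCOL_ID = bytes.fromhex("08002B8048")        # 08-00-2B-80-48 (5-byte SNAP PID)
MULTICAST_ADDR = bytes.fromhex("09002B020109")   # 09-00-2B-02-01-09

def looks_like_decamds(dest_addr: bytes, snap_pid: bytes) -> bool:
    """True if a frame is addressed to the DECamds multicast address and
    carries the DECamds SNAP protocol ID. Simplified sketch: a real filter
    would first parse the 802.3 length field and the LLC/SNAP header."""
    return dest_addr == MULTICAST_ADDR and snap_pid == PROTOCOL_ID

print(looks_like_decamds(MULTICAST_ADDR, PROTOCOL_ID))  # → True
```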
• DECamds data transfer security
Each node running DECamds as a Data Analyzer or a Data Provider has a file containing a list of three-part codes, called security triplets. See
Section 1.3.1 for more information about security triplets.
For Data Analyzer and Data Provider nodes to exchange data, at least one security triplet must match between the files on each system. DECamds Data
Provider nodes that have read access allow system data to be viewed by the
Data Analyzer node. Data Provider nodes that have write access also allow fixes to be performed by the Data Analyzer node.
• DECamds security log
The Data Provider logs all access denials and executed write instructions to the operator communication manager (OPCOM). Each log entry contains the network address of the initiator. If access is denied, the log entry also indicates whether a read or write was attempted. If a write operation was performed, the log entry indicates the process identifier (PID) of the affected process.
• OpenVMS file protection and process privileges
When the DECamds Data Analyzer and Data Provider are installed, they set directory and file protections on system directories so that only
SYSTEM accounts can read the files. For additional security on these system directories and files, you can create access control lists (ACLs) to restrict and set alarms on write access to the security files. For more information about creating ACLs, see the OpenVMS Guide to System Security.
The AMDS$CONFIG logical translates to the location of the default security files, including the following:
• The AMDS$DRIVER_ACCESS.DAT file is installed on all Data Provider nodes. The file contains a list of Data Analyzer nodes to which system data can be sent. It also contains the type of access allowed for each of those nodes.
• The AMDS$CONSOLE_ACCESS.DAT file is installed on only those nodes that run the Data Analyzer portion of DECamds. It contains a list of passwords that the Data Analyzer uses to identify itself to Data Provider nodes.
You can create additional security files in the directory associated with the
AMDS$CONFIG logical name. By default, this logical name is assigned to
AMDS$SYSTEM. As you customize DECamds, you can change the logical assignment of AMDS$CONFIG to read input files from other locations.
The following sections describe what a security triplet is, where to find the security files, and how to set up your security files.
1.3.1 Understanding DECamds Security Files
A security triplet determines which systems can access system data from the node. The AMDS$DRIVER_ACCESS.DAT and AMDS$CONSOLE_ACCESS.DAT
files on the Data Analyzer and Data Provider systems list security triplets.
A security triplet is a three-part record whose fields are separated by backslashes ( \ ). A triplet consists of the following fields:
• A network address (DECnet address, hardware address, or a wildcard character)
• An 8-character (alphanumeric) password
The password is not case sensitive, so the passwords ‘‘testtest’’ and
‘‘TESTTEST’’ are considered to be the same.
• A read or write (R or W) access verification code
For the Data Analyzer, the security triplets that allow write access are listed last in the AMDS$CONSOLE_ACCESS.DAT security file.
The exclamation point ( ! ) is a comment delimiter; any characters after the comment delimiter are ignored.
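To make the format concrete, the following Python sketch (an illustration only; DECamds itself parses these files internally) splits a triplet line into its three fields and discards any comment:

```python
def parse_triplet(line):
    """Split one security triplet line into (address, password, access).
    An exclamation point starts a comment; blank lines yield None."""
    line = line.split("!", 1)[0].strip()       # drop any comment
    if not line:
        return None
    address, password, access = (f.strip() for f in line.split("\\"))
    # Passwords are not case sensitive, so compare in one case.
    return address, password.upper(), access.upper()

print(parse_triplet("2.1\\testtest\\r  ! read access for node 2.1"))
# ('2.1', 'TESTTEST', 'R')
```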
Table 1–1 describes the detailed format of each portion of the security triplet and then gives some examples for different situations.
Table 1–1 Security Triplet Format

DECnet address (area.number): Although DECnet is not required to run DECamds, the DECnet address is used to determine a node’s physical address. The DECnet address is created by using the area.number format, where area is a value from 1 to 63 and number is a value from 1 to 1023. This address is modified into a physical address of the form AA-00-04-00-xx-yy to conform to the standard IEEE 802.3 protocol for network addressing. The AA-00-04-00 prefix is associated with the Compaq-owned address range. The xx-yy suffix is the hexadecimal representation of the address formula area*1024 + number.
Note
If you are running on a system with more than one LAN adapter or are running DECnet-Plus networking software, then this format is not valid for you. Instead, you must use the hardware address or wildcard address format for this field.
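As an illustration of the formula, the following Python sketch (not part of DECamds) converts a DECnet Phase IV area.number address to its physical address; the low-order byte of area*1024+number appears first, which is how DECnet Phase IV stores the 16-bit value:

```python
def decnet_to_physical(area, number):
    """Map a DECnet Phase IV area.number address to the
    AA-00-04-00-xx-yy physical LAN address."""
    value = area * 1024 + number            # e.g. 2.1 -> 2049 (0x0801)
    low, high = value & 0xFF, value >> 8    # low-order byte first
    return "AA-00-04-00-{:02X}-{:02X}".format(low, high)

print(decnet_to_physical(2, 1))   # AA-00-04-00-01-08
```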
Hardware address (08-00-2B-xx-xx-xx): The hardware address field is the physical hardware address in the LAN adapter chip. It is used if you have multiple LAN adapters or are running the DECnet-Plus networking software on the system (as opposed to the DECnet for OpenVMS Phase IV networking software). For adapters provided by Compaq, the hardware address is in the form 08-00-2B-xx-xx-xx, where the 08-00-2B portion is Compaq’s valid range of LAN addresses as defined by the IEEE 802 standards and the xx-xx-xx portion is chip specific. To determine the value of the hardware address on a system, use the OpenVMS System Dump Analyzer (SDA) as follows:

$ ANALYZE/SYSTEM
SDA> SHOW LAN

The previous commands display a list of available devices. Choose the template device of the LAN adapter you will be using and then enter the following command:

SDA> SHOW LAN/DEVICE=xxA0

Wildcard address ( * ): The wildcard character allows any incoming triplet with a matching password field to access the Data Provider node. Use the wildcard character to allow read access and to run the console application from any node in your network. Because the Data Analyzer does not use this field, you should use the wildcard character in this field in the AMDS$CONSOLE_ACCESS.DAT file.
Caution
Use of the wildcard character for write access security triplets enables any system to perform system-altering fixes.
The following steps show how DECamds uses the security triplets to ensure security among DECamds nodes:
1. A message is broadcast at regular intervals to all nodes within the LAN indicating the availability of a Data Provider node to communicate with a Data Analyzer node.

2. The node running the Data Analyzer receives the availability message and returns a security triplet that identifies it to the Data Provider and requests system data from the Data Provider.

3. The Data Provider examines the security triplet to determine whether the Data Analyzer is listed in the AMDS$DRIVER_ACCESS.DAT file to permit access to the system.

   • If the AMDS$DRIVER_ACCESS.DAT file lists Data Analyzer access information, then the Data Provider and the Data Analyzer can exchange information.

   • If the Data Analyzer is not listed in the AMDS$DRIVER_ACCESS.DAT file, or does not have appropriate access information, then access is denied and a message is logged to OPCOM; the Data Analyzer receives a message stating that access to that node is not permitted.
Table 1–2 describes how the Data Provider node interprets a security triplet match.
Table 1–2 Security Triplet Verification

08-00-2B-12-34-56\HOMETOWN\W: The Data Analyzer has write access to the node only when the Data Analyzer is run from the node with this hardware address (multiadapter or DECnet-Plus system) and with the password HOMETOWN.

2.1\HOMETOWN\R: The Data Analyzer has read access to the node when run from a node with DECnet for OpenVMS Phase IV address 2.1 and the password HOMETOWN.

*\HOMETOWN\R: Any Data Analyzer with the password HOMETOWN has read access to the node.
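The matching rules in Table 1–2 can be sketched as follows (an illustration only; it assumes, per the access descriptions in Section 1.3, that a W triplet also permits reads):

```python
def access_allowed(triplet, requester_address, password, is_write):
    """Check a request against one (address, password, access) triplet.
    The address '*' is a wildcard that matches any requester."""
    trip_address, trip_password, trip_access = triplet
    if trip_address not in ("*", requester_address):
        return False                       # address must match or be *
    if trip_password != password.upper():
        return False                       # case-insensitive password
    return trip_access == "W" or not is_write

# *\HOMETOWN\R: any Data Analyzer with password HOMETOWN may read.
print(access_allowed(("*", "HOMETOWN", "R"), "2.2", "hometown", False))  # True
```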
1.3.2 Customizing Security Files
Security files define which Data Analyzers can access data on nodes that have a
Data Provider. The security files let you group nodes according to specific criteria.
Note
Compaq recommends that you group nodes according to OpenVMS Cluster membership. A node can be in only one group at a time. All nodes in a cluster must also be in the same group.
Installing DECamds initially assigns all nodes to one group. Each node that is assigned to a group is listed under the group name heading in the System
Overview window.
Consider the following items when you set up customized groups:
• OpenVMS Cluster and data integrity
– All nodes in a cluster must be in the same group for data in the disk volume and lock contention windows to be complete and accurate.
It is possible to include two clusters in one group, but if a cluster is divided between two groups or only partially included, the data might not be accurate.
– Adding standalone nodes to the group will affect only the accuracy of disk volume and lock contention data.
• Partitioning for analysis
Specific users can have read or write access to certain subsets of nodes.
For example, one Data Analyzer can be designated to monitor a certain hardware type or cluster. This is entirely independent of the group to which
the nodes of that hardware type or cluster are assigned. Apart from strict security considerations, this mechanism is often used to partition systems for convenience.
Your site might already have criteria relevant to defining groups. These could include a system management division of labor, hardware type, physical location, or work function.
Compaq recommends that you correlate your security files to your group definitions so that all nodes in the group are visible in the System Overview window. Section 1.3 explains how to set up security files.
1.3.2.1 Setting Up Node Groups
Assign nodes in a cluster to the same group.
To assign a node to a group, perform the following steps on each Data Provider node that is to be part of the group:
1. Assign a unique name of up to 15 alphanumeric characters to the AMDS$GROUP_NAME logical name in the AMDS$SYSTEM:AMDS$LOGICALS.COM file. For example:

   $ AMDS$DEF AMDS$GROUP_NAME FINANCE ! Group FINANCE; OpenVMS Cluster alias

2. Apply the logical name by restarting the Data Provider, as follows:

   $ @SYS$STARTUP:AMDS$STARTUP.COM START
For more information about the other logical names in
AMDS$LOGICALS.COM, see Appendix B.
1.3.2.2 Defining Data Exchange Access Between Nodes
The Data Provider stores access security triplets in a file called AMDS$DRIVER_
ACCESS.DAT, which indicates the Data Analyzer nodes that are allowed to request that data be provided. If a Data Analyzer node is not listed in the file, access is denied.
Examples
All Data Provider nodes in Group FINANCE have the following AMDS$DRIVER_
ACCESS.DAT file:
*\FINGROUP\R ! Let anyone with FINGROUP password read
!
2.1\DEVGROUP\W ! Let only DECnet node 2.1 with
! DEVGROUP password perform fixes (writes)
!
2.2\FINGROUP\W ! Let DECnet node 2.2 perform fixes
All Data Provider nodes in Group DEVELOPMENT have the following
AMDS$DRIVER_ACCESS.DAT file:
*\GROUPBRD\R ! Let anyone with GROUPBRD password read
!
2.1\DEVGROUP\W ! Let only DECnet node 2.1 with
! DEVGROUP password perform fixes
AMDS$CONSOLE_ACCESS.DAT file for a Data Analyzer
For a Data Analyzer to access information on any node in Groups FINANCE or
DEVELOPMENT, the following access security triplets must be listed in the Data
Analyzer node’s AMDS$CONSOLE_ACCESS.DAT file:
*\FINGROUP\R ! To access data on nodes in Group FINANCE
!
*\GROUPBRD\R ! To access data on nodes in Group DEVELOPMENT
!
*\DEVGROUP\W ! Assumes you are the owner of DECnet
! address 2.1 so you can access data and
! perform fixes on both Group FINANCE and
! Group DEVELOPMENT nodes.
!
*\FINGROUP\W ! Assumes you are the owner of DECnet
! address 2.2 so you can access data and
! perform fixes on Group FINANCE nodes.
After you modify the AMDS$CONSOLE_ACCESS.DAT security file, restart the Data Analyzer with the AVAIL command to use the changes. For more information about starting DECamds, see Chapter 2.
1.3.2.3 Limiting Specific Users to Read Access
You can restrict write access for certain users by performing the following steps:
1. Assign a search list of directories to the AMDS$CONFIG logical name in the AMDS$SYSTEM:AMDS$LOGICALS.COM file. For example:

   $ DEFINE AMDS$CONFIG SYS$LOGIN,AMDS$SYSTEM

   Execute the procedure as follows:

   $ @AMDS$SYSTEM:AMDS$LOGICALS

2. Copy the AMDS$CONSOLE_ACCESS.DAT security file to the SYS$LOGIN directory of a user and edit the file for that user.

3. Restart the Data Analyzer with the AVAIL command. For more information about starting the Data Analyzer, see Chapter 2.
The next time the user starts DECamds, the new security file will be found in their SYS$LOGIN directory and will be used. The security file found in
AMDS$SYSTEM will not be read.
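Conceptually, the search-list behavior works like the following Python sketch (the function name and directory names are placeholders; on OpenVMS the logical name mechanism itself performs this resolution):

```python
from pathlib import Path

def find_security_file(search_list, filename):
    """Return the first directory in the search list that contains the
    file. With AMDS$CONFIG = SYS$LOGIN,AMDS$SYSTEM, a per-user copy in
    SYS$LOGIN is found first, so the AMDS$SYSTEM copy is never read."""
    for directory in search_list:
        candidate = Path(directory) / filename
        if candidate.is_file():
            return candidate
    return None
```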
1.3.3 Sending Messages to OPCOM
The logical names shown in Table 1–3 control the sending of messages to OPCOM and are defined in the AMDS$LOGICALS.COM file.
Table 1–3 DECamds Logical Names for OPCOM Messages

AMDS$RM_OPCOM_READ: A value of TRUE logs read failures to OPCOM.
AMDS$RM_OPCOM_WRITE: A value of TRUE logs write failures to OPCOM.
To use the changes, restart the Data Analyzer with the following command on each system or use the System Management utility (SYSMAN) to run the command on all systems within the OpenVMS Cluster:
$ @SYS$STARTUP:AMDS$STARTUP RESTART
1.3.4 Setting Broadcast Intervals for Node Availability Messages
Availability messages are broadcast by the Data Provider on nodes at regular intervals until a node establishes a link with the Data Analyzer. After a link has been established, the interval varies depending on the amount of data collection
(and other factors) occurring between nodes.
You can modify the logical names in the AMDS$LOGICALS.COM file (shown in
Table 1–4) to change the broadcast availability intervals.
Table 1–4 Broadcast Availability Logical Names

AMDS$RM_DEFAULT_INTERVAL: Defines the interval (from 15 to 300 seconds) between availability message broadcasts.
AMDS$RM_SECONDARY_INTERVAL: Defines the interval (from 15 to 1800 seconds) between availability message broadcasts after a link has been established between nodes.
To use the changes, restart the Data Analyzer with the following command on each system or by using SYSMAN to run the command on all systems within the
OpenVMS Cluster:
$ @SYS$STARTUP:AMDS$STARTUP RESTART
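Before editing AMDS$LOGICALS.COM, you can check a proposed value against the documented ranges; the following is a sketch for illustration only (it describes the ranges above, not DECamds behavior):

```python
# Documented ranges, in seconds, for the broadcast-interval logicals.
RANGES = {
    "AMDS$RM_DEFAULT_INTERVAL": (15, 300),
    "AMDS$RM_SECONDARY_INTERVAL": (15, 1800),
}

def interval_in_range(logical_name, seconds):
    """Return True if the proposed value lies in the documented range."""
    low, high = RANGES[logical_name]
    return low <= seconds <= high

print(interval_in_range("AMDS$RM_DEFAULT_INTERVAL", 60))   # True
```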
2
Getting Started
This chapter describes the following:
• How to start DECamds
• How to use the System Overview window to monitor resource availability problems on your system
• How to use the Event Log window to correct resource availability problems on your system
2.1 Starting DECamds
To start the DECamds Data Analyzer, enter the following command and any of the following qualifiers:
AVAIL /qualifiers
Note
If you have a recent version of DECamds or if you have Availability
Manager installed, you must use the following command to invoke
DECamds:
$ AVAIL/MOTIF
Qualifiers
/CONFIGURE
Specifies the directories from which input files are read. This can be a search list of directories or a logical defining a search list of directories.
/LOG_DIRECTORY
Specifies the directory to which log files are written. Output files can be directed to the null device, NLA0:.
/GROUP
A comma-separated list of the groups of Data Provider nodes that you want the
Data Analyzer to access.
Note
If you have not already set up a group hierarchy of nodes during
DECamds installation, refer to Section 1.3.2.1 for information about setting up node groups.
The following examples of commands start DECamds with input files read first from SYS$LOGIN, and then from AMDS$SYSTEM (if the files are not found in
SYS$LOGIN). All output files are written to the SYS$LOGIN directory. Only data from the group you enter (such as KUDOS) is collected.
$ DEFINE/JOB AMDS$CONFIG SYS$LOGIN,AMDS$SYSTEM
$ AVAIL/CONFIGURE=AMDS$CONFIG/LOG_DIRECTORY=SYS$LOGIN/GROUP=(KUDOS)
When DECamds starts, it displays the System Overview and Event Log windows.
To obtain help about DECamds, choose a menu item from the Help menu.
2.2 Using the System Overview Window
The System Overview window allows you to focus on resource usage activity at a high level and to display more specific data when necessary. The System
Overview window displays CPU, memory, I/O data, number of processes in CPU queues, operating system version, and hardware model for each node and group
DECamds recognizes.
Figure 2–1 shows a sample System Overview window displaying the nodes that
DECamds can reach and is monitoring.
Figure 2–1 System Overview Window
[Figure 2–1 shows a sample System Overview window. The menu bar contains the File, Control, Customize, View, Collect, and Help menus. The display lists the group EVMS (29 nodes) and, for each node (2BOYS, 4X4TRK, ALTOS, ARUSHA, AZSUN, BARNEY, CALPAL, CHOBE, CLAIR, CLAWS, CRNPOP, DFODIL, ETOSHA, FARKLE, FCMOVE, GLOBBO, GNRS, LOADQ, MACHU, MILADY, ORNOT, PITMOD, RUMAD, SUB4, TSAVO, VAX5, VMSRMS, ZAPNOT, ZOON), columns of values for % Utilization (CPU, MEM), Rate/Sec/CPU (BIO, DIO), # procs in CPU Qs, O.S. Version (V6.2 or V7.0), and Hardware Model (for example, DEC 3000 Model 400, VAX 6000−430, VAXstation 3100/GPX).]

ZK−8543A−GE
The System Overview window contains two kinds of information:
• Group information, displayed in the row next to the group name, shows averages for all nodes in the group.
• Node information, displayed in the row next to the node name, shows averages for the node.
If the View menu is set to Hide Nodes, node information is not displayed.
Table 2–1 explains the fields displayed in the System Overview window.
Table 2–1 System Overview Window Display Fields

Group: Displays the group names in alphabetical order and the number of nodes recognized by DECamds. A group is a defined set of nodes that appear together in the System Overview window. A group can be defined by type of hardware, physical location, function, or OpenVMS Cluster alias.

NodeName: Displays the name of the node in a node row.

CPU (CPU usage): In a group row, displays the average of the percentage of CPU time used by all processors, weighted toward the present. In a node row, displays the percentage of CPU time used by all processes on the node, expressed as an exponential average, weighted toward the present. On Symmetric Multiprocessing (SMP) nodes, rates for CPU time are added and divided by the number of CPUs.

MEM (Memory rate): In a group row, displays the average of the sampled values (over time) for all processes on all nodes in a group. In a node row, displays the percent of space in physical memory that all processes on the node are currently occupying. The value represents 100 percent minus the amount of free memory.

BIO (Buffered I/O rate): In a group row, displays the average of BIO operations of all processes on all nodes. In a node row, displays the BIO rate for all processes on the node across the number of CPUs.

DIO (Direct I/O usage): In a group row, displays the average of DIO operations of all processes on all nodes. In a node row, displays the DIO rate for all processes on the node.

# procs in CPU Qs (Number of processes in CPU queues): Represents the number of processes the Node Summary data collection found in the COM, COMO, MWAIT, and PWAIT CPU queues.

O.S. Version (Version of the operating system): Lists the currently loaded version of OpenVMS on the node being monitored (not the node doing the monitoring).

Hardware Model: Lists the hardware model of the node being monitored.
A percentage of a used resource is shown both by number and a dynamic status bar. For group rows, the values are averaged for all nodes in the group when node summary data collection is active. (Node summary data collection is active by default on DECamds startup.)
Resource availability problems are indicated by highlighting. When an event occurs, DECamds highlights the status bar that represents the resource.
Highlighting is shown in red on color monitors, by default; it is bold on monochrome monitors. You can change the highlight color. (See Chapter 5 for more information.)
When data appears dimmed, the data is more than 60 seconds old due to a user action that stopped node summary data collection. When the data is updated, the display returns to normal resolution.
Figure 2–2 shows the System Overview window options. Note that on the View menu, the Hide Nodes item toggles with Show Nodes; on the Control menu, the
Disable menu choices toggle with Enable choices.
Figure 2–2 System Overview Window Menus
File: Quit, Exit

View: Hide Nodes

Control: Disable Automatic Data Collection, Enable Automatic Event Investigation, Stop All Data Collection, Close All Displays

Customize: DECamds Customizations, Save DECamds Customizations, Save Geometry, Use System Defaults

Collect: All Node Summary, All CPU Summary, All Memory Summary, All Process I/O Summary, All Disk Status Summary, All Disk Volume Summary, All Page/Swap File Summary, All Lock Contention Summary, All Cluster Transition Summary, Stop All Data Collection

Help: On Context, On Window, On Version, On Help
2.2.1 Expanding and Collapsing Group Information
Use the View menu to display group or group and node status in the System
Overview window. Typically, a group is an OpenVMS Cluster. Groups are displayed in alphabetical order. Nodes within a group are also displayed in alphabetical order.
You can also expand and collapse specific group displays by clicking MB3 while the cursor is on the selected group and choosing either the Hide Nodes or Show
Nodes menu item.
2.2.2 Displaying Additional Data
By default, the Data Analyzer collects, analyzes, and displays four categories of data from Data Provider nodes:
• Node Summary
• Page/Swap File Summary
• Lock Contention Summary
• Cluster Transition Summary
In addition to the default data, you can choose any of these categories of additional data to be collected, analyzed, and displayed:
• CPU Summary
• Memory Summary
• Process I/O Summary
• Disk Status Summary
• Disk Volume Summary
You can change the default data windows that are displayed with the DECamds
Application Customizations dialog box. For more information about customizing
DECamds, see Chapter 5.
Note
Data gathering and display consume CPU time and network bandwidth.
Request only the data you need to conclude an investigation, and then stop collecting the data (see Section 2.2.3). Whenever possible, collect data for just one node, not the entire group.
To request a specific data category, do one of the following:
• For data on a single node or a group, in the System Overview window, click
MB3 on a selected node or group, then choose Collect from the menu, and then choose a category from the submenu.
• For data on all nodes, in the System Overview window, choose a category from the Collect menu.
• In the Event Log window, click MB3 on a selected event and choose Display from the menu. (See Section 2.3 for information on the Event Log window.)
2.2.3 Stopping Data Collection
To stop collecting data, do one of the following:
• Choose Stop All Data Collection from either of the following:
Collect menu or Control menu of the System Overview window
Control menu of the Event Log window
This stops collecting for all nodes. Events are removed from the Event Log, and data values in the System Overview window go to zero and are dimmed.
Use this item if you lose track of data you are collecting in the background.
Then restart data collection as needed; new events appear once data collection resumes.
• Click MB3 on a group or node name of the System Overview window to display the Collect submenu. Select Stop All Data Collection.
This stops all data collection for the group or node you select. Node or group data in the System Overview window is zeroed.
• From the File menu of any data window, select Stop Collecting.
If the data window is specific to a node or group, this option stops collecting for the node or group. (Data windows are discussed in Chapter 3.)
Note
Choosing Close Display from the File menu of any data window closes the window but continues data collection as a background task.
• From the File menu of the System Overview window, select Exit or Quit.
2.3 Using the Event Log Window
The Event Log window allows you to identify and correct a system problem. The
Event Log window displays a warning message whenever DECamds detects a resource availability problem. Figure 2–3 shows an Event Log window.
Figure 2–3 Event Log Window
Event Log
File Control Customize Help

Time          Sev  Event    Description
13:29:12.21   60   HIBIOR,  AMDS buffered I/O rate is high
13:28:33.12   60   HIBIOR,  GALAXY buffered I/O rate is high
13:28:20.92   80   LOMEMY,  GALAXY free memory is low
13:28:22.21   75   HIHRDP,  ETOSHA hard page fault rate is high
13:28:20.39   80   LCKCNT,  ORNOT possible contention for resource F11B$s{...
13:28:03.97   80   LCKCNT,  AJAX possible contention for resource F11B$vWORK213
13:27:50.79   80   LCKCNT,  CALPAL possible contention for resource PHASE1
12:52:32.17   80   LOMEMY,  HELENA free memory is low
12:30:52.04   80   LOMEMY,  DELPHI free memory is low

ZK−7951A−GE
DECamds writes all events to a log file (AMDS$LOG:AMDS$EVENT_LOG.LOG).
You can read this file in the Event Log window while the application is running.
Note
Ignore event messages that report the system process ‘‘SWAPPER’’ as having used all its quotas. The SWAPPER process is the OpenVMS memory management process; it does not have its quotas defined in the same way other system and user processes do.
Table 2–2 explains the fields displayed in the Event Log window.
Table 2–2 Event Log Window Display Fields

Time: Displays, in real time, the time that an event is detected.

Sev (Severity): Displays a value from 0 to 100. By default, events are listed in the Event Log window in order of decreasing severity. 0 is an informational message; 100 is a severe event. An event severity of 80 is high and indicates a potentially serious problem. Events with a severity of less than 50 appear dimmed, to indicate that they are less important. See Chapter 5 for information about how to change the display of severe events. Events that are critical are also sent to the OpenVMS operator communication manager (OPCOM).

Event: Displays an alphanumeric identifier of the type of event.

(continued on next page)
Table 2–2 (Cont.) Event Log Window Display Fields

Description: Displays the node or group name and a short description of the resource availability problem.
When an event ‘‘times out’’ because availability improves, it is removed from the display. Events that are not triggered by a condition time out after 30 seconds (for example, the ‘‘CFGDON, node configuration done’’ event). When you select an event, the event remains displayed for 15 seconds (or until you initiate another task in the window), even if the event times out.
Figure 2–4 shows the Event Log window options.
Figure 2–4 Event Log Window Menus
File: Quit, Exit

Control: Disable Automatic Data Collection, Enable Automatic Event Investigation, Stop All Data Collection, Close All Displays

Customize: DECamds Customizations, Save DECamds Customizations, Save Geometry, Use System Defaults, Customize Events, Save Event Customizations, Sort Data, Filter Data, Use Last Saved Settings, Save Sort Changes, Save Filter Changes

Help: On Context, On Window, On Version, On Help

ZK−7952A−GE
For information about customizing event log information, see Section 5.2.1.
2.3.1 Displaying Information About an Event Log Entry
To display more information about an event, click MB3 on the event in the Event
Log window, and then choose Display. Depending on the event, you have one or more event display choices that give you more information about the event.
Figure 2–5 shows a sample event display choice dialog box.
Figure 2–5 Event Display Choice Dialog Box
DISPLAY − HIHRDP, DIMOND hard page fault rate is high
Event Display Choices
Memory Summary
Node Summary
OK Apply
Cancel
ZK−7950A−GE
2.3.2 Performing Corrective Action on an Event Log Entry
To take corrective action on an event, click MB3 on the event in the Event Log window, and then choose Fix. Depending on the type of event, one or more of the following event fix choices are displayed (not all events have all fix options):
Adjust process working set
Crash node
Delete a process
Exit an image
Lower process priority
Purge process working set
Raise process priority
Resume a process
Suspend a process
See Chapter 4 for detailed information about performing fixes.
2.3.3 Sending Event Information to OPCOM
DECamds sends critical events to the operator communication manager
(OPCOM).
By default, events that meet both of the following criteria are sent to OPCOM:
• Have a severity level of 90 or above
• Occur continuously for 600 seconds (10 minutes)
You can change either criterion by choosing Filter Data... from the Customize menu of the Event Log window. For more information on changing Event Log filters, see Chapter 5.
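The two default criteria combine as a simple conjunction; expressed as a sketch (an illustration only, not DECamds code):

```python
def sent_to_opcom(severity, seconds_active):
    """Default DECamds criteria for forwarding an event to OPCOM:
    severity 90 or above AND active continuously for 600 seconds."""
    return severity >= 90 and seconds_active >= 600

print(sent_to_opcom(90, 600))   # True
print(sent_to_opcom(95, 300))   # False: not yet continuous for 10 minutes
```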
2.3.4 Removing an Event from the Event Log Window
To remove an event from the Event Log window, click MB3 on the event, and choose Remove from the menu. The event reappears if routine DECamds sampling again detects the situation that caused the original log entry.
2.3.5 Retaining and Releasing an Event in the Event Log Window
Event Log entries are removed when the underlying cause is removed, so an event might disappear from the Event Log window. To retain the selected event in the Event Log window, click MB3 on an event and choose Freeze. When an event is frozen, the Time field is highlighted.
To release the selected event, click MB3 on the event and choose Unfreeze.
3
Managing DECamds Data Windows
This chapter describes the DECamds data windows that you can display from the
System Overview and Event Log windows.
Figure 3–1 shows the hierarchy of the DECamds data windows.
Figure 3–1 DECamds Data Window Hierarchy
[Figure 3–1 shows the windows that can be opened, directly or indirectly, from the Event Log and System Overview windows:]

Event Log / System Overview
    Disk Status Summary*
    Volume Summary*
        Single Disk Summary
            Process I/O Summary
    Page/Swap File Summary*
    Node Summary
        CPU Modes Summary
        CPU Summary
        Memory Summary
        Process I/O Summary
    Lock Contention Summary**
        Single Lock Summary
    Cluster Transition/Overview Summary
        System Communication Architecture Summary
            NISCA Summary
    Single Process Summary (opened from the Event Log or any data window)

* Available for individual nodes and groups of nodes.
** Available for groups only.

ZK−7970A−GE
Table 3–1 describes the data windows and their functions.

Table 3–1 DECamds Data Windows

Disk Status Summary (Section 3.1)
  Opened from: Event Log, System Overview
  Displays: Disk device data including path, volume name, status, and mount, transaction, error, and resource wait counts.

Volume Summary (Section 3.2)
  Opened from: Event Log, System Overview
  Displays: Disk volume data, including path, volume name, disk block utilization, queue length, and operation count rate.

Single Disk Summary (Section 3.3)
  Opened from: Disk Status Summary, Volume Summary
  Displays: Summary data about each node in a group in which a disk is available.

Page/Swap File Summary (Section 3.4)
  Opened from: Event Log, System Overview
  Displays: Data about page and swap file names and utilization, including free, used, and reserved pages.

Node Summary (Section 3.5)
  Opened from: Event Log, System Overview
  Displays: Overview of a specific node's resource demand on the CPU state queues and processor modes, memory utilization, page faults, and I/O.

Process I/O Summary (Section 3.6)
  Opened from: Event Log, Node Summary, System Overview, Single Disk Summary
  Displays: Statistics about I/O utilization by process, including buffered I/O, direct I/O, and page write I/O; also lists various I/O quotas.

CPU Modes Summary (Section 3.7)
  Opened from: Node Summary
  Displays: A graphic representation of each CPU's processor modes, listing the process currently executing in the CPU.

CPU Summary (Section 3.8)
  Opened from: Event Log, Node Summary, System Overview
  Displays: Statistics about CPU utilization by process, including process state, priority, execution rate, CPU time, and wait time.

Memory Summary (Section 3.9)
  Opened from: Event Log, Node Summary, System Overview
  Displays: Statistics about memory usage by process, including process working set count, quota and extent, and paging rates.

Single Process Summary (Section 3.10)
  Opened from: Event Log, any data window
  Displays: Specific data about a process, basically a combination of data elements from the CPU, Memory, and Process I/O displays, as well as data for specific quota utilization, current image, specific process information, and wait queue time.

Lock Contention Summary (Section 3.11)
  Opened from: Event Log, System Overview
  Displays: Data about each resource for which a potential lock contention situation exists.

Single Lock Summary (Section 3.12)
  Opened from: Event Log, Lock Contention Summary
  Displays: Specific data about the blocking lock and any other locks in the granted, conversion, or waiting queues.

Cluster Transition/Overview Summary (Section 3.13)
  Opened from: Event Log, System Overview
  Displays: Summary information about each node's membership in an OpenVMS Cluster.

System Communication Architecture Summary (Section 3.14)
  Opened from: Cluster Transition/Overview Summary
  Displays: System Communication Architecture (SCA) information about a selected node's connection or connections to other nodes in a cluster.

NISCA Summary (Section 3.15)
  Opened from: System Communication Architecture Summary
  Displays: Summary information about the Network Interconnect System Communication Architecture (NISCA) protocol, which is responsible for carrying messages to other nodes in the cluster.
3.1 Disk Status Summary Window
The Disk Status Summary window shown in Figure 3–2 displays data about availability, count, and errors of disk devices on the system.
Figure 3–2 Disk Status Summary Window
[Screen capture of the AMDS Disk Status Summary window. The window has File, View, Fix, Customize, and Help menus and columns for Device Name, Path, Volume Name, Status, Error Count, Trans, Mount, and Rwait; in the example, fourteen DADnn disks served by node AMDS are shown as Mounted wrtlck.]
To open a Disk Status Summary window, do one of the following:
• In the System Overview window, click MB3 on a node or group line, choose Display from the menu, and choose Disk Status Summary from the submenu.
• In the Event Log window, click MB3 on any disk status-related event, and choose Display from the menu.
Table 3–2 describes the Disk Status Summary window data fields.
Table 3–2 Disk Status Summary Window Data Fields

Device Name: The standard OpenVMS device name that indicates where the device is located, as well as a controller or unit designation.

Path: The primary path (node) from which the device receives commands.

Volume Name: The name of the media that is currently mounted.

Status: One or more of the following disk status values:
  Alloc: Disk is allocated to a specific user.
  CluTran: Disk status is uncertain due to a cluster state transition in progress.
  Dismount: Disk is in the process of dismounting; may be waiting for a file to close.
  Foreign: Disk is mounted with the /FOREIGN qualifier.
  Invalid: Disk is in an invalid state (likely Mount Verify Timeout).
  MntVerify: Disk is waiting for a mount verification.
  Mounted: Disk is logically mounted by a MOUNT command or service call.
  Offline: Disk is no longer physically mounted in the device drive.
  Online: Disk is physically mounted in the device drive.
  Shadow Set Member: Disk is a member of a shadow set.
  Unavailable: Disk is set /UNAVAILABLE.
  Wrong Volume: Disk has been mounted with the wrong volume name.
  Wrtlck: Disk is mounted and write-locked.

Errors: The number of errors generated by the disk (a quick indicator of device problems).

Trans¹: The number of currently-in-progress file system operations for the disk.

Mount¹: The number of nodes that have the specified disk mounted.

Rwait¹: An indicator that a system I/O operation is stalled, usually during normal connection failure recovery or volume processing of host-based shadowing.

¹ For the group window, the sum of the node window values is displayed.
DECamds detects the following disk status-related events and displays them in the Event Log window. Node refers to the name of the node that is signaling the event. Disk refers to the name of the disk to which the event is related.
DSKERR, node disk disk error count is high
DSKINV, node disk disk is in an invalid state
DSKMNV, node disk disk mount verify in progress
DSKOFF, node disk disk is off line
DSKRWT, node disk disk Rwait count is high
DSKUNA, node disk disk is unavailable
DSKWRV, node disk wrong volume mounted
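Each of these event lines has the same fixed shape: a mnemonic code, a comma, the signaling node, the disk name, and the descriptive text. As an illustration only (DECamds provides no such API; the `parse_event` helper, its parsing convention, and the sample line are hypothetical), a site-local script that post-processes logged event lines might split one like this:

```python
# Hypothetical parser for logged DECamds disk status event lines.
# The event codes below are the ones listed above; the parsing
# convention is a site-local assumption, not a DECamds interface.

DISK_STATUS_EVENTS = {
    "DSKERR": "disk error count is high",
    "DSKINV": "disk is in an invalid state",
    "DSKMNV": "disk mount verify in progress",
    "DSKOFF": "disk is off line",
    "DSKRWT": "disk Rwait count is high",
    "DSKUNA": "disk is unavailable",
    "DSKWRV": "wrong volume mounted",
}

def parse_event(line):
    """Split 'CODE, node disk description...' into its parts."""
    code, rest = line.split(",", 1)
    node, disk, *description = rest.split()
    return {
        "code": code.strip(),
        "node": node,
        "disk": disk,
        "description": " ".join(description),
    }

event = parse_event("DSKOFF, HELENA $64$DUA113 disk is off line")
# event["code"] is "DSKOFF"; event["node"] is "HELENA"
```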
3.2 Volume Summary Window
The Volume Summary window shown in Figure 3–3 displays summary data about disk volumes mounted in the system. Volume summary data is accurate when every node in an OpenVMS Cluster environment is in the same group. Multiple clusters can share a group, but clusters cannot be divided into different groups without losing accuracy.
Figure 3–3 Volume Summary Window
[Screen capture of the EVMS Volume Summary window. The window has File, View, Fix, Customize, and Help menus and columns for Device Name, Path, Volume Name, Disk Space (blocks) Used, % Used, Free, Queue, and OpRate; the example lists a dozen volumes sorted by operation rate, from WORK9 at 45.14 operations per second down to WORK7 at 0.17.]
Note
The group value for Free blocks is determined from the node that holds the mastering lock on the volume resource.
To open a Volume Summary window, do one of the following:
• In the System Overview window, click MB3 on a node or group line, choose Display from the menu, and choose Volume Summary from the submenu.
• In the Event Log window, click MB3 on any volume-related event, and choose Display from the menu.
Note
DECamds does not collect Volume Summary data on remote disks mounted using the VAX Distributed File Service (DFS).
Table 3–3 describes the Volume Summary window data fields.
Table 3–3 Volume Summary Window Data Fields

Device Name: The standard OpenVMS device name that indicates where the device is located, as well as a controller or unit designation.

Path: The primary path (node) from which the device receives commands.

Volume Name: The name of the mounted media.

Used: The number of volume blocks in use.

% Used: The percentage of the number of volume blocks in use in relation to the total volume blocks available.

Free: The number of blocks of volume space available for new data.

Queue: The average number of I/O operations pending for the volume (an indicator of performance; less than 1.00 is optimal).

OpRate: The rate at which the operations count to the volume has changed since the last sampling. The rate measures the amount of activity on a volume. The optimal load is device-specific.
DECamds detects the following volume-related events and displays them in the
Event Log window. Node refers to the name of the node that is signaling the event. Disk refers to the name of the disk to which the event is related. Group refers to the name of the group to which the event is related.
DSKQLN, node disk disk volume queue length is high
LOVLSP, group disk disk volume free space is low
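The queue-length and free-space conditions behind these two events reduce to simple arithmetic over the window's Used, Free, and Queue columns. The sketch below is illustrative only: the threshold values are made-up placeholders, since actual DECamds thresholds are controlled through its own customization options.

```python
# Illustrative arithmetic behind the DSKQLN and LOVLSP conditions.
# Threshold values here are hypothetical defaults, not DECamds settings.

def volume_flags(used_blocks, total_blocks, queue_length,
                 queue_threshold=1.0, low_free_pct=5.0):
    free_blocks = total_blocks - used_blocks
    pct_used = 100.0 * used_blocks / total_blocks
    flags = []
    if queue_length > queue_threshold:           # cf. DSKQLN
        flags.append("disk volume queue length is high")
    if 100.0 - pct_used < low_free_pct:          # cf. LOVLSP
        flags.append("disk volume free space is low")
    return free_blocks, round(pct_used, 1), flags
```

For example, a 1,000,000-block volume with 960,000 blocks in use and a queue length of 0.4 yields 40,000 free blocks, 96.0% used, and a low-free-space flag.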
3.3 Single Disk Summary Window
The Single Disk Summary window shown in Figure 3–4 displays summary data about each node in the group in which a disk is available. This window is a node-by-node display of the data that is summarized in the Disk Status Summary and Volume Summary windows. The values displayed are those you would see if you displayed Disk Status Summary or Volume Summary for each node within the group.
You can use this display to determine both of the following:
• Which node in the group has a disk with high I/O rates
Determining which node has a high I/O rate to the disk is useful because you can sort by direct I/O rate and learn which process or processes are causing the high I/O rates to the disk.
• If a disk is in a state that is inconsistent with other nodes
Determining which node or nodes might be in an abnormal state is useful because you can then discover if, for some reason, one node believes that the disk is in the MntVerify or CluTran state, thus holding up processing in the cluster in which the node resides.
Figure 3–4 Single Disk Summary Window
[Screen capture of the $64$DUA208 (V15SNAPSHOTS) Single Disk Summary window for the EVMS group. The window has File, View, Fix, Customize, and Help menus and columns for Node, Status, Errors, Trans, Rwait, Free, QLen, and OpRate; the example lists eighteen nodes, most showing the disk Mounted with 909 free blocks, one node reporting an unknown status, and an (M) marker on the node whose free-block count is authoritative.]
To open a Single Disk Summary window, follow these steps:
1. In the System Overview window, click MB3 on a group or node name. The system displays a pop-up menu.
2. Choose Display from the menu and Disk Status Summary (or Volume Summary) from the submenu. The system displays the Disk Status Summary window (or Volume Summary window).
3. In the Disk Status Summary window (or Volume Summary window), click MB3 on a device name. The system displays a pop-up menu.
4. Choose Display Disk. The system displays the Single Disk Summary window.

As an alternative to steps 3 and 4, you can double-click MB1 on a line in the Disk Status Summary or Volume Summary window to display the Single Disk Summary window.

Note that when you click on an item, DECamds temporarily stops updating the window for 15 seconds or until you choose an item from a menu.
Table 3–4 lists the Single Disk Summary window data fields.
Table 3–4 Single Disk Summary Window Data Fields

Node: Name of the node.

Status: Status of the disk: mounted, online, offline, and so on.

Errors: Number of errors on the disk.

Trans: Number of currently-in-progress file system operations on the disk (the number of open files on the volume).

Rwait: Indication of an I/O stalled on the disk.

Free: Count of free disk blocks on the volume. An (M) after the free block count indicates that this node holds the lock on the volume that DECamds uses to obtain the true free block count. Other nodes might not have accessed the disk, so their free block counts might not be up to date.

QLen: Average number of operations in the I/O queue for the volume.

OpRate: Rate of change of the operations count on the volume.
From the Single Disk Summary window, you can display the Process I/O
Summary window. See Section 3.6 for more information.
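The (M) marker matters when the per-node Free counts disagree: only the node holding the volume lock reports the true count. A sketch of selecting that authoritative value from per-node rows follows; the tuple layout is an illustrative assumption, not a DECamds data structure.

```python
# Pick the authoritative free-block count from Single Disk Summary rows.
# Each row is (node, free_blocks, holds_master_lock); only the node
# marked (M) in the window reports the true count. Illustrative only.

def true_free_blocks(rows):
    for node, free_blocks, holds_master_lock in rows:
        if holds_master_lock:
            return node, free_blocks
    raise ValueError("no node holds the volume master lock")

rows = [("ARUSHA", 905, False), ("ZOOH", 0, False), ("MACHU", 909, True)]
node, free = true_free_blocks(rows)
# MACHU's count of 909 is taken as the true free-block count
```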
3.4 Page/Swap File Summary Window
The Page/Swap File Summary window shown in Figure 3–5 displays data about a node’s page/swap file usage and identifies page or swap files that are overused or underconfigured. It also displays nodes that lack a page or swap file.
Figure 3–5 Page/Swap File Summary Window
[Screen capture of the STAR Page/Swap Files Summary window. The window has File, View, Fix, Customize, and Help menus and columns for Node Name, File Name, and File Usage in blocks (Used, % Used, Total, Reservable); the example lists primary and secondary page files on several nodes, two of them with negative Reservable values.]
To open a Page/Swap File Summary window, do one of the following:
• In the Event Log window, click MB3 on any event, choose Display from the menu, and then choose Page/Swap File Summary from the submenu.
• In the System Overview window, click MB3 on any node or group line, choose Display from the menu, and then choose Page/Swap File Summary from the submenu.
Table 3–5 describes the Page/Swap File Summary window data fields.
Table 3–5 Page/Swap File Summary Window Data Fields

Node Name: The name of the node on which the page/swap file resides.

File Name: The name of the page/swap file. For secondary page/swap files, the file name is obtained by a special AST to the job controller on the remote node. DECamds makes one attempt to retrieve the file name.

Used: The number of used pages or pagelet blocks within the file.

% Used: A graph representing the percentage of used blocks out of the available page or pagelet blocks in each file.

Total: The total number of pages or pagelet blocks within the file.

Reservable: The number of pages or pagelet blocks that can be logically claimed by a process for future physical allocation. This value can be negative because it merely records processes' interest in getting pages from the file: if every process currently executing needed to use the file, this value is the debt that would be owed.
DECamds detects the following page and swap file-related events and displays them in the Event Log window. Node is replaced by the name of the node to which the event is related.
LOPGSP, node file page file space is low
LOSWSP, node file swap file space is low
NOPGFL, node has no page file
NOSWFL, node has no swap file
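The LOPGSP and LOSWSP conditions come down to the same Used/Total arithmetic the window displays. In this sketch the 90% trigger is a hypothetical placeholder, not a DECamds setting, and the sample block counts are illustrative.

```python
# Illustrative page file space check in the spirit of LOPGSP/LOSWSP.
# The 90% threshold and the sample block counts are hypothetical.

def pagefile_low(used_blocks, total_blocks, threshold_pct=90.0):
    pct_used = 100.0 * used_blocks / total_blocks
    return round(pct_used, 1), pct_used > threshold_pct

pct, low = pagefile_low(138_842, 499_992)   # about 27.8% used: not low
```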
3.5 Node Summary Window
The Node Summary window shown in Figure 3–6 displays a high-level graphic summary of node resource demands on the CPU, memory, and I/O.
Figure 3–6 Node Summary Window
[Screen capture of the DELPHI Node Summary window. A banner lists Model (DEC 7000 Model 630), O.S. (OpenVMS V7.0), Uptime, Memory (192.00 Mb), and CPUs (4). Below it, bar graphs with current and peak values show CPU Process State Queues (COM and WAIT), CPU Modes averaged over all processors, Page Faults per second (Total, Hard, System), Memory in thousands of pages (Free, Used, Modified, Bad), and I/O per second (WIO, DIO, BIO).]
To open a Node Summary window, do one of the following:
• In the System Overview window, double-click on any node name. You can also click MB3 on any node name, and choose Display from the menu.
• In the Event Log window, double-click on any node name. You can also click
MB3 on an event that is related to node summary data, and choose Display from the menu.
Dynamic bar graphs display the current values for each field. Peak values are also displayed from when DECamds begins collecting node summary data. A peak value is typically the highest value received; however, for the Free Memory field it is the lowest value received.
You can open the following windows from the Node Summary window by double-clicking in the space for each category:
• CPU Summary
• CPU Modes Summary
• Memory Summary
• Process I/O Summary
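The peak bookkeeping described above can be stated precisely: every field tracks the highest value seen since collection began, except Free Memory, which tracks the lowest. A small sketch of that rule; the class and the sample values are illustrative, not part of DECamds.

```python
# Peak tracking as the Node Summary window describes it: most fields keep
# the highest value seen, but Free Memory keeps the lowest. Illustrative.

class PeakTracker:
    def __init__(self, track_lowest=False):
        self.track_lowest = track_lowest
        self.peak = None

    def update(self, value):
        if self.peak is None:
            self.peak = value
        elif self.track_lowest:
            self.peak = min(self.peak, value)
        else:
            self.peak = max(self.peak, value)
        return self.peak

free_pages = PeakTracker(track_lowest=True)   # Free Memory field
for sample in (22_838, 19_375, 21_000):
    free_pages.update(sample)
# free_pages.peak is now 19375, the lowest sample seen
```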
Table 3–6 describes the Node Summary window data fields.
Table 3–6 Node Summary Window Data Fields

Hardware Model: The system hardware model name.

Operating System: The name and version of the operating system.

Uptime: The time since the last reboot, measured in days, hours, minutes, and seconds.

Memory: The total amount of physical memory found on the system.

CPUs: The number of active CPUs on the node.

CPU Process State Queues: One of the following:
  COM: Sum of the queue lengths of processes in the COM and COMO states.
  WAIT: Sum of the queue lengths of processes in the MWAIT, COLPG, CEF, PFW, and FPG states.

CPU Modes: The CPU usage by mode (kernel, executive, supervisor, user, interrupt, compatibility, multiprocessor synchronization, and null). On symmetric multiprocessing (SMP) nodes, percentages are averaged across all the CPUs and displayed as one value.

Page Faults: The rate of system hard and soft page faulting, as well as peak values seen during a DECamds session. System page faults are taken from kernel processes.

Memory: The histogram listing memory distribution (Free, Used, Modified, Bad) as absolute values in thousands of pages or pagelets. Peak values are also listed, with Free using the lowest value seen as its peak.

I/O: The histogram listing Buffered, Direct, and Page Write I/O rates per second. Also included is the peak value seen.
DECamds detects the following node events and displays them in the Event Log window. Node is replaced by the name of the node to which the event is related.
HIBIOR, node buffered I/O rate is high
HICOMQ, node many processes waiting for CPU
HIDIOR, node direct I/O rate is high
HIHRDP, node hard page fault rate is high
HIMWTQ, node many processes waiting in MWAIT
HINTER, node interrupt mode time is high
HIPWIO, node paging write I/O rate is high
HIPWTQ, node many processes waiting in COLPG, PFW, or FPG
HITTLP, node total page fault rate is high
HMPSYN, node MP synchronization mode time is high
HISYSP, node system page fault rate is high
LOMEMY, node free memory is low
NOPROC, node cannot find process named process
3.6 Process I/O Summary Window
The Process I/O Summary window shown in Figure 3–7 displays summary statistics about process I/O rates and quotas. Use the Process I/O Summary window to display information about I/O issues that might be caused by
I/O-intensive programs or I/O bottlenecks.
Note
DECamds does not yet support kernel threads. If you use threaded processes, DECamds displays only the top thread.
Figure 3–7 Process I/O Summary Window
[Screen capture of the EDISON Process I/O Summary window. The window has File, View, Fix, Customize, and Help menus and columns for PID, Process Name, I/O Rate per second (DIO, BIO, PIO), Open Files, and Remaining Quotas (DIO, BIO, Bytes, Files) for each of ten processes.]
To open a Process I/O Summary window, do one of the following:
• In the Node Summary window, double-click in the I/O area.
• On the View menu in the Single Disk Summary window, choose Display Process I/O Summary.
• In the System Overview window, double-click on the BIO or DIO fields for any node. You can also click MB3 on any field for any node, choose Display from the menu, and choose Process I/O Summary from the submenu.
• To open a Process I/O Summary window for every node in a group, in the System Overview window, click MB3 on a group line, choose Display from the menu, and choose Process I/O Summary from the submenu.
• In the Event Log window, click MB3 on any process I/O-related event, and choose Display from the menu.
You can open a window about a specific process in the Process I/O Summary window by double-clicking on the process name.
Table 3–7 describes the Process I/O Summary window data fields.
Table 3–7 Process I/O Summary Window Data Fields

PID: The process identifier, a 32-bit value that uniquely identifies a process.

Process Name: The current process name.

Direct I/O Rate (DIO): The rate at which I/O transfers occur between the system devices and the pages or pagelets that contain the process buffer that the system locks in physical memory.

Buffered I/O Rate (BIO): The rate at which I/O transfers occur between the process buffer and an intermediate buffer from the system buffer pool.

Paging I/O Rate (PIO): The rate of read attempts necessary to satisfy page faults (also known as Page Read I/O or the Hard Fault Rate).

Open Files: The number of open files.

Direct I/O Limit Remaining (DIO): The number of direct I/O operations available before the process reaches its quota. The DIOLM quota is the maximum number of direct I/O operations a process may have outstanding at one time.

Buffered I/O Limit Remaining (BIO): The number of buffered I/O operations available before the process reaches its quota. The BIOLM quota is the maximum number of buffered I/O operations a process may have outstanding at one time.

Byte Limit Remaining (Bytes): The number of buffered I/O bytes available before the process reaches its quota. BYTLM is the maximum number of bytes of nonpaged system dynamic memory that a process can claim at one time.

Open File Limit Remaining (Files): The number of additional files the process can open before reaching its quota. The FILLM quota is the maximum number of files that can be opened simultaneously by the process, including active network logical links.
DECamds detects the following process I/O-related events and displays them in the Event Log window. Node is replaced by the name of the node to which the event is related. Process is replaced by the name of the process to which the event is related.
LOBIOQ, node process has used most of its BIOLM process quota
LOBYTQ, node process has used most of its BYTLM job quota
LODIOQ, node process has used most of its DIOLM process quota
LOFILQ, node process has used most of its FILLM job quota
PRBIOR, node process buffered I/O rate is high
PRDIOR, node process direct I/O rate is high
PRPIOR, node process paging I/O rate is high
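Each "has used most of its quota" event compares a quota limit with the remaining count the window displays. The sketch below mirrors that comparison; the quota names are real OpenVMS process and job quotas, but the 90% trigger and the dictionary data shape are hypothetical conventions, not DECamds behavior.

```python
# Illustrative "used most of its quota" check in the spirit of the
# LOBIOQ, LOBYTQ, LODIOQ, and LOFILQ events above. Thresholds and
# data shapes are assumptions, not DECamds settings.

def low_quota_flags(limits, remaining, trigger_fraction=0.90):
    """limits/remaining: dicts keyed by quota name (BIOLM, FILLM, ...)."""
    flags = []
    for name, limit in limits.items():
        used = limit - remaining[name]
        if limit > 0 and used / limit >= trigger_fraction:
            flags.append(f"has used most of its {name} quota")
    return flags

flags = low_quota_flags({"BIOLM": 100, "FILLM": 100},
                        {"BIOLM": 5, "FILLM": 80})
# flags mentions BIOLM only: 95 of 100 used, vs. 20 of 100 for FILLM
```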
3.7 CPU Modes Summary Window
The CPU Modes Summary window shown in Figure 3–8 displays more detailed summary statistics about CPU mode usage than the Node Summary window.
Use the CPU Modes Summary window to diagnose issues that may be caused by
CPU-intensive users or CPU bottlenecks.
Figure 3–8 CPU Modes Summary Window
[Screen capture of the HELTER CPU Modes Summary window. The window has File, View, Customize, and Help menus. For each of three CPUs (CPU #01 with PRIMARY, RUN, and QUORUM capabilities; CPU #03 and CPU #04 with RUN and QUORUM), it shows the Run state and, for each mode (Kernel, Executive, Supervisor, User, Interrupt, Compatibility, MP Synch, Null), a % Used bar with rate and peak values, plus the PID and name of the process in the CPU, or *** None *** if there is none.]
To open a CPU Modes Summary window, do one of the following:
• In the Node Summary window, double-click MB1 in the CPU Modes area. You can also click MB3, and choose Display from the menu.
• In the Node Summary window View menu, choose Display Modes Summary.
You can open a window about a specific process in the CPU Modes Summary window by double-clicking on the process name.
Table 3–8 describes the CPU Modes Summary window data fields.
Table 3–8 CPU Modes Summary Window Data Fields

CPU ID: A decimal value representing the identity of a processor in a multiprocessing system. On a uniprocessor, this value is always CPU #00.

Capabilities: One of the following CPU capabilities: Primary, Quorum, Run, or Vector.

State: One of the following CPU states: Boot, Booted, Init, Rejected, Reserved, Run, Stopped, Stopping, or Timeout.

Mode: One of the following values for CPU modes supported for the architecture: Compatibility, Executive, Interrupt, Kernel, MP Synch, Null, Supervisor, or User. Note: Compatibility mode does not exist on OpenVMS Alpha systems.

% Used: A bar graph, by CPU, representing the percentage of CPU utilization for each mode.

PID: The process identifier value of the process that is using the CPU. If the PID is unknown to the console application, the internal PID (IPID) is listed.

Name: The process name of the process found in the CPU. If no process is found in the CPU, this is listed as *** None ***.

Rate: A numerical percentage of CPU time for each mode.

Peak: The peak CPU usage determined for each mode.
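The Node Summary window's one-value-per-mode display on SMP nodes ("percentages are averaged across all the CPUs") is a plain arithmetic mean over the per-CPU figures shown in this window. A sketch, with made-up per-CPU samples:

```python
# Average per-CPU mode percentages into the single per-mode value that
# the Node Summary window shows on SMP nodes. Sample data is made up.

def average_modes(per_cpu_percentages):
    modes = per_cpu_percentages[0].keys()
    n = len(per_cpu_percentages)
    return {mode: sum(cpu[mode] for cpu in per_cpu_percentages) / n
            for mode in modes}

avg = average_modes([
    {"Kernel": 4, "User": 14, "Null": 74},
    {"Kernel": 13, "User": 36, "Null": 54},
])
# avg is {"Kernel": 8.5, "User": 25.0, "Null": 64.0}
```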
3.8 CPU Summary Window
The CPU Summary window shown in Figure 3–9 displays summary statistics about process CPU usage. Use this window to diagnose issues that might be caused by CPU-intensive users or CPU bottlenecks.
Figure 3–9 CPU Summary Window
[Screen capture of the VAX5 CPU Summary window. The window has File, View, Fix, Customize, and Help menus and columns for PID, Process Name, Priority, State, Rate, Wait, and CPU Time; the example shows four batch and backup processes in the LEF state, with CPU rates from 25.63% down to 5.01%.]
To open a CPU Summary window, do one of the following:
• In the System Overview window, double-click on the CPU field of any node. You can also click MB3 on an event that is related to CPU usage, choose Display from the menu, and choose CPU Summary from the list.
• In the Node Summary window, double-click on CPU Process State Queues.
• In the Event Log window, click MB3 on an event that is related to CPU usage, choose Display from the menu, and choose CPU Summary from the list.
You can open a window about a specific process in the CPU Summary window by double-clicking on the process name.
Table 3–9 describes the CPU Summary window data fields.
Table 3–9 CPU Summary Window Data Fields

PID: The process identifier, a 32-bit value that uniquely identifies a process.

Name: The process name.

Priority: Computable (xx) and base (yy) process priority in the format xx/yy.

State: One of the values listed under the Single Process Summary description in Table 3–11.

Rate: The percentage of CPU time used by this process. This is the ratio of CPU time to elapsed time. The CPU rate is also displayed in the bar graph.

Wait: The percentage of time the process is in the COM or COMO state.

Time: The amount of actual CPU time charged to the process.
DECamds detects the following CPU-related events and displays them in the
Event Log window. Node is replaced by the name of the node to which the event is related. Process is replaced by the name of the process to which the event is related.
PRCCOM, node process waiting in COM or COMO
PRCCVR, node process has high CPU rate
PRCMWT, node process waiting in MWAIT
PRCPWT, node process waiting in COLPG, PFW, or FPG
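The Rate column's definition, the ratio of CPU time to elapsed time, is easy to reproduce. Both times are expressed in seconds here; the CPU time corresponds to 0 00:09:32.39, while the elapsed-time value is a made-up example chosen for illustration.

```python
# Rate = CPU time / elapsed time, expressed as a percentage.
# 572.39 s of CPU time (0 00:09:32.39) over an assumed 2233.28 s elapsed.

def cpu_rate_percent(cpu_seconds, elapsed_seconds):
    return 100.0 * cpu_seconds / elapsed_seconds

rate = cpu_rate_percent(572.39, 2233.28)
# rate is about 25.63
```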
3.9 Memory Summary Window
The Memory Summary window shown in Figure 3–10 displays memory usage data for processes on a node so that you can identify processes that use large amounts of memory or have high page fault rates.
Figure 3–10 Memory Summary Window
[Screen capture of the AMDS Memory Summary window. The window has File, View, Fix, Customize, and Help menus and columns for PID, Process Name, Working Set Count, Size, and Extent, Page Fault Rate, and Page Fault I/O; the example lists ten processes, most with zero page fault rates.]
ZK−7960A−GE
To open a Memory Summary window, do one of the following:
• In the Node Summary window, double-click on the Page Faults or Memory area. You can also click MB3 on the Page Faults or Memory area, and choose Display from the menu.
• In the View menu of the Node Summary window, choose Display Memory Summary.
• In the System Overview window, double-click on the Memory field for any node. You can also click MB3 on any field for any node, choose Display from the pop-up menu, and choose Memory Summary from the submenu.
• To display a memory summary of every node in a group from the System Overview window, click MB3 on the group line, choose Display from the menu, and choose Memory Summary from the submenu.
• In the Event Log window, click MB3 on an event related to memory usage, and choose Display from the menu.
You can open a window about a specific process in the Memory Summary window by double-clicking on the process name.
Table 3–10 describes the Memory Summary window data fields.
Table 3–10 Memory Summary Window Data Fields

Field                 Displays
PID                   The process identifier, a 32-bit value that uniquely identifies a process.
Process Name          The process name.
Working Set Count 1   The number of physical pages or pagelets of memory that the process is using. The bar graph represents the working set count as a percentage of the working set extent.
Working Set Size 1    The number of pages or pagelets of memory the process is allowed to use. This value is periodically adjusted by the operating system based on analysis of page faults relative to CPU time used. A large increase in this value indicates that the process is incurring many page faults and its memory allocation is growing.
Working Set Extent 1  The number of pages or pagelets of memory in the process's WSEXTENT quota as defined in the user authorization file (UAF). The number of pages or pagelets will not exceed the value of the system parameter WSMAX.
Page Fault Rate       The number of page faults per second for the process. The bar graph represents a relative number of page faults per second.
Page Fault I/O        The rate of read attempts necessary to satisfy page faults (also known as Page Read I/O or the Hard Fault Rate).

1 Working Set Value = Total Physical Memory / Maximum Process Count
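The footnote's formula can be expressed as a short sketch. This is an illustration only; the function name and the sample page and process counts below are ours, not values from DECamds.

```python
# Illustration of the table footnote:
#   Working Set Value = Total Physical Memory / Maximum Process Count
# The page count and process count below are made-up sample values.
def working_set_value(total_physical_pages, max_process_count):
    """Divide physical memory (in pages) evenly across the maximum process count."""
    return total_physical_pages // max_process_count

print(working_set_value(262144, 128))  # -> 2048 pages per process
```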
DECamds detects the following memory-related events and displays them in the
Event Log window. Node is replaced by the name of the node to which the event is related. Process is replaced by the name of the process to which the event is related.
LOWEXT, node process working set extent is too small
LOWSQU, node process working set quota is too small
PRPGFL, node process high page fault rate
PRPIOR, node process paging I/O rate is high
3.10 Single Process Summary Window
The Single Process Summary window shown in Figure 3–11 displays summary data about a process, including Execution Rates, Process Quotas in Use, Wait
States, and Job Quotas in Use.
Figure 3–11 Single Process Summary Window
[Figure: the Single Process Summary window for the detached process SECURITY_SERVER on node EDISON, showing process identification and working set fields together with the Execution Rates, Wait States, Process Quotas in Use, and Job Quotas in Use panels. (ZK−7967A−GE)]
To open a Single Process Summary window, do one of the following:
• In any window that displays processes (CPU, CPU Modes, Memory, Process
I/O, and Single Lock Summary), double-click on any field. You can also click
MB3 on any field in a process line, and choose Display from the pop-up menu.
• You can also click on any field in a process line, and choose Display Process from the View menu.
• In the Event Log window, double-click on a process-related event. You can also click MB3 on a process-related event, choose Display from the menu, and choose Single Process in the dialog box.
Table 3–11 describes the Single Process Summary window data fields.
Table 3–11 Single Process Summary Window Data Fields

Field         Displays
Process name  The name of the process.
Username      The user name of the user owning the process.
Account       The account string assigned to the user by the system manager.
UIC           The user identification code (UIC), a pair of numbers or character strings designating the group and user.
PID           The process identifier, a 32-bit value that uniquely identifies a process.
Owner ID      The PID of the process that created the process displayed in the window. If 0, the process is a parent process.
PC            The program counter. On OpenVMS VAX systems, this is the address of the next instruction the CPU will execute. On OpenVMS Alpha systems, this value is displayed as 0 because the data is not readily available to the Data Provider node.
PSL           The processor status longword (PSL). On OpenVMS VAX systems, this indicates the current processor mode (user, kernel, and so on) and its interrupt level. On OpenVMS Alpha systems, this value is displayed as 0 because the data is not readily available to the Data Provider node.
Priority      The computable and base priority of the process. Priority is an integer between 0 and 31. Processes with higher priority get more CPU time.
State  One of the following process states:
       CEF     Common Event Flag, waiting for a common event flag
       COLPG   Collided Page Wait, involuntary wait state; likely indicates a memory shortage, waiting for hard page faults
       COM     Computable; ready to execute
       COMO    Computable Outswapped; COM, but swapped out
       CUR     Current, currently executing in a CPU
       FPW     Free Page Wait, involuntary wait state; likely indicates a memory shortage
       LEF     Local Event Flag, waiting for a local event flag
       LEFO    Local Event Flag Outswapped; LEF, but outswapped
       HIB     Hibernate, voluntary wait state requested by the process; it is inactive
       HIBO    Hibernate Outswapped, hibernating but swapped out
       MWAIT   Miscellaneous Resource Wait, involuntary wait state; possibly caused by a shortage of a systemwide resource, such as exhausted page or swap file capacity, or by synchronization within single-threaded code
       PFW     Page Fault Wait, involuntary wait state; possibly indicates a memory shortage, waiting for hard page faults
       RWAST   Resource Wait State, waiting for delivery of an asynchronous system trap (AST) that signals resource availability; usually an I/O is outstanding or a process quota is exhausted
       RWBRK   Resource Wait for BROADCAST to finish
       RWCAP   Resource Wait for CPU Capability
       RWCLU   Resource Wait for Cluster Transition
       RWCSV   Resource Wait for Cluster Server Process
       RWIMG   Resource Wait for Image Activation Lock
       RWLCK   Resource Wait for Lock ID database
       RWMBX   Resource Wait on Mailbox, either waiting for data to read from a mailbox or waiting to write into a full mailbox (another process has not read from it, so this process cannot write)
       RWMPB   Resource Wait for Modified Page writer Busy
       RWMPE   Resource Wait for Modified Page list Empty
       RWNPG   Resource Wait for Nonpaged Pool
       RWPAG   Resource Wait for Paged Pool
       RWPFF   Resource Wait for Page File Full
       RWQUO   Resource Wait for Pooled Quota
       RWSCS   Resource Wait for System Communication Services
       RWSWP   Resource Wait for Swap File space
       SUSP    Suspended, wait state; the process is placed into suspension and can be resumed at the request of an external process
       SUSPO   Suspended Outswapped, suspended but swapped out
WS global pages   The shared data or code between processes, listed in pages or pagelets.
WS private pages  The amount of accessible memory, listed in pages or pagelets.
WS total pages    The sum of global and private pages or pagelets.
WS size           The working set size, the number of pages or pagelets of memory the process is allowed to use. This value is periodically adjusted by the operating system based on analysis of page faults relative to CPU time used. A large increase in this value indicates that the process is taking many page faults and its memory allocation is growing.
WSdef             The working set default, the initial limit on the number of physical pages or pagelets of memory the process can use. This parameter is listed in the user authorization file (UAF); discrepancies between the UAF value and the displayed value are due to page/longword boundary rounding or other adjustments made by the operating system.
WSquo             The working set quota, the maximum number of physical pages or pagelets of memory the process can lock into its working set. This parameter is listed in the UAF; discrepancies between the UAF value and the displayed value are due to page/longword boundary rounding or other adjustments made by the operating system.
WSextent          The working set extent, the maximum number of physical pages or pagelets of memory the system will allocate for the process. The system provides memory beyond the process's quota only when it has an excess of free pages, and it can recall that memory if necessary. This parameter is listed in the UAF; any discrepancies between the UAF value and the displayed value are due to page/longword boundary rounding or other adjustments made by the operating system.
Images activated  The number of times an image has been activated.
Mutexes held      The number of mutual exclusion semaphores (mutexes) held. Persistent values other than zero (0) require analysis. A mutex is similar to a lock but is restricted to one CPU. When a process holds a mutex, its priority is temporarily incremented to 16.
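The mutex priority boost described for the Mutexes held field can be modeled in a few lines. This sketch is our own simplification; it assumes the boost never lowers a priority that is already at or above 16.

```python
# Model of the mutex priority boost: while a process holds at least one
# mutex, its priority is temporarily raised to 16 (assumption: a priority
# already at or above 16 is left unchanged).
MUTEX_BOOST_PRIORITY = 16

def effective_priority(base_priority, mutexes_held):
    if mutexes_held > 0:
        return max(base_priority, MUTEX_BOOST_PRIORITY)
    return base_priority

print(effective_priority(4, mutexes_held=1))   # -> 16
print(effective_priority(18, mutexes_held=1))  # -> 18
```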
Execution Rates
CPU           The percentage of CPU time used by this process; this is the ratio of CPU time to elapsed time. The CPU rate is also displayed in the bar graph.
Direct I/O    The rate at which I/O transfers take place from the pages or pagelets containing the process buffer that the system locks in physical memory to the system devices.
Buffered I/O  The rate at which I/O transfers take place for the process buffer from an intermediate buffer from the system buffer pool.
Paging I/O    The rate of read attempts necessary to satisfy page faults. This is also known as Page Read I/O or the Hard Fault Rate.
Page Faults   The page faults per second for the process. The bar graph represents page faults per second.
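The CPU rate above is defined as the ratio of CPU time to elapsed time; a minimal sketch of turning two time samples into the displayed percentage follows (the function name is ours).

```python
# CPU percentage as the ratio of CPU time to elapsed (wall-clock) time.
def cpu_percent(cpu_seconds, elapsed_seconds):
    if elapsed_seconds <= 0:
        return 0.0  # avoid division by zero on an empty sample interval
    return 100.0 * cpu_seconds / elapsed_seconds

print(cpu_percent(1.5, 10.0))  # -> 15.0
```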
Process Quotas in Use 1
DIOLM    Direct I/O Limit. A bar graph representing the current count of direct I/Os used with respect to the limit that can be attained.
BIOLM    Buffered I/O Limit. A bar graph representing the current count of buffered I/Os used with respect to the limit that can be attained.
ASTLM    Asynchronous System Trap Limit. A bar graph representing the current count of ASTs used with respect to the limit that can be attained.
CPU      CPU Time Limit. A bar graph representing the current amount of CPU time used with respect to the limit that can be attained. If the limit is 0, this value is not used.

Wait States 2
Compute       A relative value indicating that the process is waiting for CPU time. The included states are COM, COMO, and RWCAP.
Memory        A relative value indicating that the process is waiting for a page fault that requires data to be read from disk; common during image activation. The included states are PFW, COLPG, FPG, RWPAG, RWNPG, RWMPE, and RWMPB.
Direct I/O    A relative value indicating that the process is waiting for data to be read from or written to a disk. The included state is DIO.
Buffered I/O  A relative value indicating that the process is waiting for data to be read from or written to a slower device such as a terminal, line printer, or mailbox. The included state is BIO.
Control       A relative value indicating that the process is waiting for another process to release control of some resource. The included states are CEF, MWAIT, LEF, LEFO, RWAST, RWMBX, RWSCS, RWCLU, RWCSV, RWUNK, and LEF waiting for an ENQ.
Quotas        A relative value indicating that the process is waiting because it has exceeded some quota. The included states are QUOTA and RWAST_QUOTA.
Explicit      A relative value indicating that the process is waiting because it asked to wait, such as through a hibernate system service. The included states are HIB, HIBO, SUSP, SUSPO, and LEF waiting for a TQE.

Job Quotas in Use
FILLM    File Limit. A bar graph representing the current number of open files with respect to the limit that can be attained.
PGFLQUO  Page File Quota. A bar graph representing the current number of disk blocks in the page file that the process can use with respect to the limit that can be attained.
ENQLM    Enqueue Limit. A bar graph representing the current count of resources (lock blocks) queued with respect to the limit that can be attained.

1 When you display the SWAPPER process, no values are listed in this section. The SWAPPER process does not have quotas defined in the same way other system and user processes do.
2 The wait state specifies why a process cannot execute, based on application-specific calculations.
Job Quotas in Use (continued)
TQELM       Timer Queue Entry Limit. A bar graph representing the current count of timer requests with respect to the limit that can be attained.
PRCLM       Process Limit. A bar graph representing the current count of subprocesses created with respect to the limit that can be attained.
BYTLM       Buffered I/O Byte Limit. A bar graph representing the current count of bytes used for buffered I/O transfers with respect to the limit that can be attained.
Image Name  The name of the currently executing image, if available. If this field does not appear, the data is not resident in memory.
DECamds detects the following process-related events and displays them in the Event Log window. Node is replaced by the name of the node to which the event is related. Process is replaced by the name of the process to which the event is related.
LOASTQ, node process has used most of its ASTLM process quota
LOBIOQ, node process has used most of its BIOLM process quota
LOBYTQ, node process has used most of its BYTLM job quota
LODIOQ, node process has used most of its DIOLM process quota
LOENQU, node process has used most of its ENQLM job quota
LOFILQ, node process has used most of its FILLM job quota
LOPGFQ, node process has used most of its PGFLQUOTA job quota
LOPRCQ, node process has used most of its PRCLM process quota
LOTQEQ, node process has used most of its TQELM job quota
LOWEXT, node process working set extent is too small
LOWSQU, node process working set quota is too small
PRBIOR, node process buffered I/O rate is high
PRBIOW, node process waiting for buffered I/O
PRCCOM, node process waiting in COM or COMO
PRCCUR, node process has high CPU rate
PRCMUT, node process waiting for a mutex
PRCPUL, node process has used most of its CPUTIME process quota
PRCPWT, node process waiting in COLPG, PFW, or FPG
PRCQUO, node process waiting for a quota
PRCRWA, node process waiting in RWAST
PRCRWC, node process waiting in RWCAP
PRCRWM, node process waiting in RWMBX
PRCRWP, node process waiting in RWPAG, RWNPG, RWMPE, or RWMPB
PRCRWS, node process waiting in RWSCS, RWCLU, or RWCSV
PRCUNK, node process waiting for a system resource
PRDIOR, node process direct I/O rate is high
PRDIOW, node process waiting for direct I/O
PRLCKW, node process waiting for a lock
PRPGFL, node process high page fault rate
PRPIOR, node process paging I/O rate is high
3.11 Lock Contention Summary Window
The Lock Contention Summary window shown in Figure 3–12 determines which resources are under contention. It displays all the OpenVMS Lock Manager resources that have potential lock contention situations. The Lock Contention
Summary window is available only for groups; attempting to open a Lock
Contention Summary for a node opens the node’s group window.
Figure 3–12 Lock Contention Summary Window
[Figure: the EVMS Lock Contention Summary window, listing contended resources (for example, DECW$SERVER_2680009D_0066_0) with the master node LCKPAG, a blank parent resource, a duration such as 0 00:09:10, and a status of VALID. (ZK−7956A−GE)]
Locks are written to AMDS$LOCK_LOG.LOG; see Section B.3 for more information. To interpret the information displayed in the Lock Contention
Summary window, you should have an understanding of OpenVMS lock management services. For more information, see the OpenVMS System Services
Reference Manual.
Note
Lock contention data is accurate only if every node in an OpenVMS
Cluster environment is in the same group. Multiple clusters can share a group, but clusters cannot be divided into different groups without losing accuracy.
You can open a Lock Contention Summary window from the Event Log or System
Overview windows, as follows:
• In the Event Log window, click MB3 on any lock contention-related event and choose Display from the menu.
• In the System Overview window:
1. Click MB3 on any node or group line, and choose Display from the menu.
2. Choose Lock Contention Summary from the submenu.
Table 3–12 describes the Lock Contention Summary window data fields.
Table 3–12 Lock Contention Summary Window Data Fields
Field            Displays
Resource Name    The resource name associated with the $ENQ system service call.
Master Node      The node on which the resource is mastered.
Parent Resource  The name of the parent resource. If no name is displayed, the resource listed is the parent resource.
Duration         The amount of time elapsed since DECamds first detected the contention situation.
Status           The status of the lock. See the $ENQ(W) description in the OpenVMS System Services Reference Manual.
You can open a Single Lock Summary window from the Lock Contention
Summary window. See Section 3.12 for more information.
Figure 3–13 shows how to determine which filters can or cannot be displayed.
To filter specific locks from the display, choose Filter Data... from the Customize menu on the Lock Contention Summary window. A filter dialog box appears with a list of locks currently being filtered from the display.
To add a filter, use either of the following methods:
• Type the name of a filter in the Input Lock Name to Filter field and click on the Add button. You can use the asterisk ( * ) wildcard character to specify a range of filters. For example, $DSA*$WAITER will filter all locks beginning with $DSA and ending with $WAITER and anything in between.
• Click on a lock in the Lock Contention Summary window. The name of the lock will appear in the Input Lock Name to Filter field (as shown in
Figure 3–13). You must click on the Add button to add the filter.
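The asterisk wildcard in filter names behaves like ordinary glob matching, as in the $DSA*$WAITER example above. The sketch below models it with Python's fnmatchcase; the filter list and function name are made-up samples, not DECamds code.

```python
# Glob-style matching of lock resource names against filter patterns,
# mirroring the "*" wildcard described for the filter dialog.
from fnmatch import fnmatchcase

sample_filters = ["$DSA*$WAITER", "CACHE$*"]  # hypothetical filter list

def is_filtered(resource_name, filters):
    """Return True if the lock name matches any filter pattern."""
    return any(fnmatchcase(resource_name, pattern) for pattern in filters)

print(is_filtered("$DSA0064_COPIER$WAITER", sample_filters))  # -> True
print(is_filtered("MOU$_DAD44:", sample_filters))             # -> False
```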
Figure 3–13 Filtering Lock Events
[Figure: the EVMS Lock Contention Summary window with the Lock Contention Summary Filtering dialog box open. The Filtered Resources Name List contains ALIAS$, DQS$, NSCHED_, CACHE$, and AUDRSV$; the resource MOU$_DAD44: appears in the Input Lock Name to Filter field, with OK, Add, Remove, and Cancel buttons below it. (ZK−7957A−GE)]
You can remove a lock from the filter list by selecting a lock and clicking on the Remove button. Any lock contentions affected by the removed filter will be displayed.
DECamds detects the following lock contention-related events and displays them in the Event Log window. Italicized words are replaced with actual values.
LCKCNT, node possible contention for resource resource
LRGHSH, node lock hash table too large n entries
RESDNS, node resource hash table dense percentage full n resources, hash table size n
RESPRS, node resource hash table sparse, only percentage full n resources, table size n
3.12 Single Lock Summary Window
The Single Lock Summary window shown in Figure 3–14 displays data about a blocking lock and all locks in the granted, conversion, and waiting queues. You can use it to display detailed information about a lock contention situation. The lock name is specified in the title bar. All locks are written to AMDS$LOCK_LOG.LOG; see Section B.3 for more information.
Figure 3–14 Single Lock Summary Window
[Figure: the Single Lock Summary window for resource DECW$SERVER_268000A7_0069_0. The Granted Lock panel shows node LCKPAG, lock ID 050007F3, process DECW$SERVER_0, and lock type Local Copy. The Granted Queue holds that lock at EX mode with flags NOQUEUE SYNCSTS SYSTEM NODLCKW; the Conversion Queue is empty; the Waiting Queue holds a lock owned by DECW$TE_00A7 requesting EX mode. (ZK−7966A−GE)]
In a Single Lock Summary window, if DECamds cannot determine the node name for the group, it uses the cluster system ID (CSID) value, which the OpenVMS
Cluster software uses to uniquely identify cluster members.
To open a Single Lock Summary window, do one of the following:
• In the Lock Contention Summary window, double-click on any field. You can also click MB3 on any field, and choose Display Lock from the menu.
• In the View menu of the Lock Contention Summary window, choose Display
Lock.
• In the Event Log window, click MB3 on any lock blocking-related or lock waiting-related event, and choose Display from the menu.
Table 3–13 describes the Single Lock Summary window data fields.
Table 3–13 Single Lock Summary Window Data Fields

Field            Displays

Granted Lock
Node             The node name on which the lock is granted.
LKID             The lock ID value (which is useful with SDA).
Process Name     The name of the process owning the blocking lock.
Lock Type        One of the following: Local Copy, Process Copy, or Master Copy.
Resource Name    The name of the resource.
Parent Resource  The name of the parent resource (if any).

Granted, Conversion, and Waiting Queue
Node          The node on which the lock block resides.
Process Name  The process name of the process owning the lock.
LKID          The lock ID value (which is useful with SDA).
GR Mode       The mode at which the lock is granted: EX, CW, CR, PW, PR, or NL.
RQ Mode       The mode at which the lock is requested: EX, CW, CR, PW, PR, or NL.
Duration      The length of time the lock has been in the current queue (since the console application found the lock).
Flags         The flags specified with the $ENQ(W) request.
You can open a window about a specific process in the Single Lock Summary window by double-clicking on the process name.
Note
Processes that are labeled unknown are associated with system locks.
They cannot be opened.
DECamds detects the following single lock-related events and displays them in the Event Log window. Node is replaced by the name of the node to which the event is related. Process is replaced by the name of the process to which the event is related.
LCKBLK, node process blocking resource resource
LCKWAT, node process waiting for resource resource granted to process on node node
3.13 Cluster Transition/Overview Summary Window
The Cluster Transition/Overview Summary window shown in Figure 3–15 displays information about each node in an OpenVMS Cluster. This window is very similar to the System Overview window; however, the Cluster Transition window lists only one cluster for each set of nodes in a cluster, while the System
Overview window lists all the nodes and the user-defined groups the nodes are in.
The window displays summary information as well as information about individual nodes: System Communication Services (SCS) name, SCS ID, Cluster
System ID, Votes, Lock Directory Weight value, cluster status, and last transition time.
The data items shown in the window correspond to data that the Show Cluster utility displays for the SYSTEM and MEMBERS classes. A status field display of "unknown" usually indicates that DECamds is not communicating with the node.
Figure 3–15 Cluster Transition/Overview Summary Window
[Figure: the Cluster Transition/Overview Summary window. The Summary panel shows when the cluster was formed, the time of the last transition, votes, expected votes, quorum, quorum disk (QD) votes, failover step, failover ID, and the numbers of members in and out. The Cluster Members panel lists each cluster member's SCS name, SCS ID, CSID, votes, expected votes, quorum, lock directory weight (Lck:DirWt), status (MEMBER or UNKNOWN), and last transition time. (ZK−8545A−GE)]
To open the Cluster Transition/Overview Summary window, do either of the following:
• In the System Overview window, click MB3 on a node line. Choose Display from the menu displayed and Cluster Transition Summary from the submenu.
The system displays the Cluster Transition/Overview Summary window.
• In the Event Log window, click MB3 on a cluster-related event. Choose Display from the menu displayed and Cluster Transition Summary from the list displayed.
Note: The Cluster Transition Summary menu option is not available for nodes that are not in the cluster; it is not available from lines that display groups.
3.13.1 Data Displayed
The Cluster Transition/Overview window has two panel displays:
• The Summary (top) panel displays cluster summary information.
• The Cluster Members (bottom) panel lists each node in the cluster.
Table 3–14 describes the Summary panel data fields.
Table 3–14 Data Items in the Summary Panel of the Cluster Transition/Overview Summary Window

Data Item       Description
Formed          Date and time the cluster was formed.
Last Trans      Date and time of the most recent cluster state transition.
Votes           Total number of quorum votes being contributed by all cluster members and the quorum disk.
Expected Votes  Number of votes expected to be contributed by all members of the cluster, as determined by the connection manager. This value is based on the maximum of the EXPECTED_VOTES system parameter and the maximized value of the VOTES system parameter.
Failover Step   Current failover step index; shows which step in the sequence of failover steps the failover is currently executing.
Members In      Number of members of the cluster DECamds has a connection to.
Members Out     Number of members of the cluster DECamds either has no connection to or has lost its connection to.
Quorum          Number of votes required to keep the cluster above quorum.
QD Votes        Number of votes given to the quorum disk. A value of 65535 means there is no quorum disk.
Failover ID     Failover instance identification: the unique ID of a failover sequence; indicates to system managers whether a failover has occurred since the last time they checked.
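The relationship among Expected Votes, Votes, and Quorum in the Summary panel can be sketched numerically. OpenVMS cluster documentation gives the quorum derivation as (expected votes + 2) / 2 with integer division; treat the formula and function names below as an illustration, not DECamds's internal algorithm.

```python
# Sketch of the quorum arithmetic: quorum = (expected_votes + 2) // 2.
def estimated_quorum(expected_votes):
    return (expected_votes + 2) // 2

def above_quorum(current_votes, expected_votes):
    """A cluster continues processing only while votes meet or exceed quorum."""
    return current_votes >= estimated_quorum(expected_votes)

print(estimated_quorum(15))  # -> 8 (expected votes of 15 yields quorum 8)
print(above_quorum(12, 15))  # -> True
```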
Table 3–15 describes the Cluster Members panel data fields.
Table 3–15 Data Items in the Cluster Members Panel of the Cluster Transition/Overview Summary Window

Data Item        Description
SCS Name         System Communication Services name for the node (system parameter SCSNODE)
SCS Id           System Communication Services identification for the node (system parameter SCSYSTEMID)
CSID             Cluster system identification
Votes            Number of votes the member contributes
Expect           Expected votes to be contributed, as set by the EXPECTED_VOTES system parameter
Quorum           Recommended quorum value derived from the expected votes
Lck:DirWt        Lock manager distributed directory weight, as determined by the LCKDIRWT system parameter
Status           Current cluster member status: MEMBER, UNKNOWN, or BRK_NON (break nonmember)
Transition Time  Time of the cluster member's last transition
3.13.2 Notes About the Display
Following are notes about the display of data in the window:
• No highlighting conventions are used in the window; all data items are displayed in normal mode.
• You cannot filter out any data.
• The data items in the window are sorted on an "as-found" basis. You cannot change the sort criteria.
• When you click on an item, DECamds temporarily stops updating the window for 15 seconds or until you choose an item from a menu.
• DECamds signals the LOVOTE event when the difference between the cluster’s quorum and votes values is less than the threshold for the event:
LOVOTE, ’node’ VOTES count is close to or below QUORUM
The default threshold for LOVOTE is 1.
• You can change collection intervals.
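The LOVOTE test in the notes above reduces to a simple comparison; this sketch follows the text (the difference between votes and quorum falling below the threshold, default 1) and uses names of our own choosing.

```python
# LOVOTE fires when the cluster's votes-minus-quorum margin drops below
# the event threshold (default 1, per the text above).
def lovote_signaled(votes, quorum, threshold=1):
    return (votes - quorum) < threshold

print(lovote_signaled(votes=12, quorum=8))  # -> False (comfortable margin)
print(lovote_signaled(votes=8, quorum=8))   # -> True  (votes at quorum)
```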
3.14 System Communications Architecture Summary Window
The System Communications Architecture Summary (SCA Summary) window shown in Figure 3–16 displays information about a selected node’s virtual circuits and connections to other nodes in a cluster. (The display represents the view one node has of other nodes in the cluster.) More than one type of virtual circuit indicates that more than one path to the remote node exists.
Figure 3–16 SCA Summary Window
[Figure: the DFODIL System Communication Architecture Summary window, listing remote node names with their virtual circuits (for example, PEA0: (LAN)) in the OPEN state, and columns for messages sent and received, datagrams, KB mapped, block data transferred, and credit waits. A note in the display reads "Use MB3 to switch between raw or rate display". (ZK−8546A−GE)]
Each line in the window shows either a summary of all system applications
(SysApps) using the virtual circuit communication or the communication on the connection between a local and a remote SysApp. The data displayed in the window is similar to the information that the Show Cluster utility displays for the CIRCUITS, CONNECTIONS, and COUNTERS classes. Unlike Show Cluster, however, this display shows only SCA connections to other OpenVMS nodes; it does not show SCA connections to the Digital Storage Architecture (DSA) or to devices such as FDDI or DSSI disk controllers.
By clicking MB3 on a node name and choosing View SysApps from the pop-up menu, you can display the system applications that are using virtual circuits.
This option expands the list below a virtual circuit to show all the system applications that contribute to that virtual circuit. (The SysApp lines are dimmed and right-justified.)
To hide the display of system applications, click MB3 and choose Hide SysApps from the pop-up menu.
To display a menu that allows you to toggle between Raw and Rate data, click
MB3 on the data to the right of the State field. (For messages, the default is the display of rate data; raw data is the default for all other types of data.)
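The relationship between raw and rate data is simple arithmetic: a rate is the difference between two successive raw counter samples divided by the collection interval. The following sketch illustrates the conversion; the counter names and sample values are illustrative only, not DECamds internals.

```python
# Sketch: converting raw counters to per-second rates, as the Raw/Rate
# toggle does. Counter names here are hypothetical, not DECamds fields.

def to_rate(prev_raw, curr_raw, interval_seconds):
    """Return per-second rates from two successive raw samples."""
    return {name: (curr_raw[name] - prev_raw[name]) / interval_seconds
            for name in curr_raw}

# Two samples of message counters taken 10 seconds apart:
sample_t0 = {"msgs_sent": 18752, "msgs_rcvd": 19908}
sample_t1 = {"msgs_sent": 18772, "msgs_rcvd": 19938}

rates = to_rate(sample_t0, sample_t1, 10.0)
print(rates["msgs_sent"])   # 2.0 messages per second
print(rates["msgs_rcvd"])   # 3.0 messages per second
```

A raw display shows the accumulated totals (18772 messages sent); the rate display shows the per-second figures derived from consecutive samples.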
To open an SCA Summary window, follow these steps:
1. In the Cluster Transition/Overview Summary window, click MB3 on an SCS name. The system displays a pop-up menu.
2. Choose Display SCA Summary.
The system displays the System Communication Architecture (SCA)
Summary window.
Table 3–16 describes the SCA Summary window data fields.
Table 3–16 Data Items in the SCA Summary Window

NodeName: SCS name of the remotely connected node.
VC (Type): The virtual circuit being used and its type.
State: The state of the virtual circuit connection.
Messages: Relatively small data packets sent and received between nodes for control information.
Block Transfer: Fields listing the count of the number of block data transfers and requests initiated.
KB Mapped: Field listing the number of kilobytes mapped for block data transfer. Note: This field is available in Raw data format only.
Block Data (KB): Fields listing, in kilobytes, the data transferred via block data transfer.
Datagrams: Number of unacknowledged messages sent between virtual circuits.
Credit Wait: Number of times the connection had to wait for a send credit.
CDT Wait: Number of times the connection had to wait for a buffer descriptor.
Local SysApp: Name of the local system application using the virtual circuit.
Remote SysApp: Name of the remote system application being communicated to.
3.14.1 Notes About the Display
Following are notes about the display of data in the window:
• The window does not follow highlighting conventions: virtual circuit lines are displayed normally and are left-aligned; SysApp lines are dimmed and are indented by a column.
• You cannot filter out any data.
• The data items in the window are sorted on an "as-found" basis. You cannot change sort criteria at this time.
• DECamds signals the LOSTVC event when a virtual circuit between two nodes has been lost. This loss might be due either to a cluster node crashing or to cluster problems that caused the virtual circuit to close.
LOSTVC, <node> lost virtual circuit (<string>) to node <node>
• You can change collection intervals.
3.15 NISCA Summary Window
The Network Interconnect System Communication Architecture (NISCA) is the transport protocol responsible for carrying messages such as disk I/Os and lock messages across Ethernet and FDDI LANs to other nodes in the cluster. More detailed information about the protocol is in the OpenVMS Cluster Systems manual.
The NISCA Summary window shown in Figure 3–17 displays detailed information about the LAN (Ethernet or FDDI) connection between two nodes. DECamds displays one window per virtual circuit provided the virtual circuit is running over a PEA0: device.
The purpose of this window is to view statistics in real time and to troubleshoot problems found in the NISCA protocol. The window is intended primarily as an aid to diagnosing LAN-related problems. The OpenVMS Cluster Systems manual describes the parameters shown in this window and tells how to use them to diagnose LAN-related cluster problems.
The window provides the same information as the OpenVMS System Dump Analyzer (SDA) command SHOW PORTS/VC=VC_nodex. (VC refers to virtual circuit; nodex is a node in the cluster. The system defines VC_nodex after a SHOW PORTS command is issued from SDA.)
Figure 3–17 NISCA Summary Window
[Figure 3–17 shows the DFODIL NISCA Connection to MACHU window. Menu bar: File, View, Fix, Customize, Help. Six panels are displayed: Transmit (Packets, Unsequenced (DG), Sequenced, Lone ACK, ReXmt Count, ReXmt Timeout, ReXmt Ratio, and Bytes, each with Raw and Rate columns); Receive (Packets, Unsequenced (DG), Sequenced, Lone ACK, Duplicate, Out of Order, Illegal ACK, and Bytes, each with Raw and Rate columns); Congestion Control (Transmit Window Current, Grow, Max, and Reached; Roundtrip uSec; Roundtrip Deviation uSec; Retransmit Timeout uSec; UnAcked Messages; CMD Queue Length; CMD Queue Max); VC Closures (counts for SeqMsg TMO, CC DFQ Empty, Topology Change, and NPAGEDYN Low); Channel Selection (Buffer Size, Channel Count, Channel Selections, Protocol, Local Device, Local LAN Address, Remote Device, Remote LAN Address); and Packets Discarded (counts for No Xmt Chan, Ill Seq Msg, TR DFQ Empty, CC MFQ Empty, Rcv Short Msg, Bad Checksum, TR MFQ Empty, and Cache Miss). (ZK−8547A−GE)]
To open an NISCA Summary window, do one of the following:
• In the SCA Summary window, click MB3 on a row with the PEA0: virtual circuit. Choose View SysApps from the pop-up menu, click MB3 on a SysApps node, and choose Display NISCA. The system displays the NISCA Summary window.
Note: If the Display NISCA option is dimmed, the NISCA protocol is not running for that system application.
• Double-click MB1 on a row with a PEA0: to display an expanded list below the node name.
• Double-click MB1 on a SysApps node to display the NISCA Summary window.
3.15.1 Data Displayed
Panels in the NISCA Summary window contain the data described in the following tables.
Table 3–17 lists data items displayed in the Transmit Panel, which contains data packet transmission information.
Table 3–17 Data Items in the Transmit Panel

Packets: Number of packets transmitted through the virtual circuit to the remote node, including both sequenced and unsequenced (channel control) messages and lone acknowledgments.
Unsequenced (DG): Count and rate of unsequenced datagram packets transmitted.
Sequenced: Count and rate of sequenced packets transmitted. Sequenced messages are used for application data.
Lone ACK: Count and rate of lone acknowledgments.
ReXmt Count: Number of packets retransmitted. Retransmission occurs when the local node does not receive an acknowledgment for a transmitted packet within a predetermined timeout interval.
ReXmt Timeout: Number of retransmission timeouts that have occurred.
ReXmt Ratio: Ratio of the current and past ReXmt Count to the current and past number of sequenced messages sent.
Bytes: Count and rate of bytes transmitted through the virtual circuit.
Table 3–18 describes data items displayed in the Receive Panel, which contains data packet reception information.
Table 3–18 Data Items in the Receive Panel

Packets: Number of packets received through the virtual circuit from the remote node, including both sequenced and unsequenced (channel control) messages and lone acknowledgments.
Unsequenced (DG): Count and rate of unsequenced packets received.
Sequenced: Count and rate of sequenced packets received. Sequenced messages are used for application data.
Lone ACK: Count and rate of lone acknowledgments.
Duplicate: Number of redundant packets received by this system.
Out of Order: Number of packets received out of order by this system.
Illegal ACK: Number of illegal acknowledgments received.
Bytes: Count and rate of bytes received through the virtual circuit.
Table 3–19 describes data items displayed in the Congestion Control Panel, which contains transmit congestion control information.
The values in the panel list the number of messages that can be sent to the remote node before receiving an acknowledgment and the retransmission timeout.
Table 3–19 Data Items in the Congestion Control Panel

Transmit Window Current: Current value of the pipe quota (transmit window). After a timeout, the pipe quota is reset to 1 to decrease congestion; it is allowed to increase quickly as acknowledgments are received.
Transmit Window Grow: The slow-growth threshold; the size at which the rate of increase is slowed to avoid renewed congestion on the network.
Transmit Window Max: Maximum value of the pipe quota currently allowed for the virtual circuit, based on channel limitations.
Transmit Window Reached: Number of times the entire transmit window was full. If this number is small compared with the number of sequenced messages transmitted, the local node is not sending large bursts of data to the remote node.
Roundtrip uSec: Average roundtrip time for a packet to be sent and acknowledged, in microseconds.
Roundtrip Deviation uSec: Average deviation of the roundtrip time, in microseconds.
Retransmit Timeout uSec: Value used to determine the packet retransmission timeout. If a packet does not receive either an acknowledging or a responding packet within this interval, the packet is assumed to be lost and is resent.
UnAcked Messages: Number of unacknowledged messages.
CMD Queue Length: Current length of all command queues.
CMD Queue Max: Maximum number of commands in the queues so far.
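The transmit window behavior described above (reset to 1 after a timeout, rapid regrowth up to the slow-growth threshold, then slower growth toward the maximum) can be modeled with a short sketch. The growth policy below is an illustrative assumption in the spirit of the table's description, not the actual PEDRIVER algorithm.

```python
# Illustrative model of a pipe quota (transmit window): after a timeout it
# drops to 1, grows by one per acknowledgment up to the slow-growth
# threshold ("Transmit Window Grow"), then by one per full window of
# acknowledgments up to "Transmit Window Max". Thresholds are hypothetical.

class PipeQuota:
    def __init__(self, grow_threshold=5, window_max=16):
        self.grow = grow_threshold
        self.max = window_max
        self.current = window_max     # start wide open
        self.acks_in_window = 0

    def on_timeout(self):
        # Congestion assumed: shrink the window to the minimum.
        self.current = 1
        self.acks_in_window = 0

    def on_ack(self):
        if self.current < self.grow:
            self.current += 1         # fast regrowth below the threshold
        else:
            self.acks_in_window += 1  # slow growth: +1 per full window
            if self.acks_in_window >= self.current:
                self.acks_in_window = 0
                self.current = min(self.current + 1, self.max)

q = PipeQuota()
q.on_timeout()
print(q.current)    # 1
for _ in range(4):
    q.on_ack()
print(q.current)    # 5: fast regrowth stops at the slow-growth threshold
```

This mirrors why a small "Transmit Window Reached" count relative to sequenced traffic means the sender rarely fills the window.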
Table 3–20 describes data items displayed in the Channel Selection Panel, which contains channel selection information.
Table 3–20 Data Items in the Channel Selection Panel

Buffer Size: Maximum PPC data buffer size for this virtual circuit.
Channel Count: Number of channels connected to this virtual circuit.
Channel Selections: Number of channel selections performed.
Protocol: NISCA protocol version.
Local Device: Name of the local device that the channel uses to send and receive packets.
Local LAN Address: Address of the local LAN device that performs sends and receives.
Remote Device: Name of the remote device that the channel uses to send and receive packets.
Remote LAN Address: Address of the remote LAN device that performs sends and receives.
Table 3–21 describes data items displayed in the VC Closures panel, which contains information about the number of times a virtual circuit has closed for a particular reason.
Table 3–21 Data Items in the VC Closures Panel

SeqMsg TMO: Number of sequenced message transmit timeouts.
CC DFQ Empty: Number of times the channel control DFQ was empty.
Topology Change: Number of times PEDRIVER performed a failover from FDDI to Ethernet, necessitating the closing and reopening of the virtual circuit.
NPAGEDYN Low: Number of times the virtual circuit was lost because of a pool allocation failure on the local node.
Table 3–22 lists data items displayed in the Packets Discarded Panel, which contains information about the number of times packets were discarded for a particular reason.
Table 3–22 Data Items in the Packets Discarded Panel

No Xmt Chan: Number of times there was no transmit channel.
Ill Seq Msg: Number of times an illegal sequenced message was received.
TR DFQ Empty: Number of times the Transmit DFQ was empty.
CC MFQ Empty: Number of times the Control Channel MFQ was empty.
Rcv Short Msg: Number of times a short transport message was received.
Bad Checksum: Number of times there was a checksum failure.
TR MFQ Empty: Number of times the Transmit MFQ was empty.
Cache Miss: Number of messages that could not be placed in the cache.
3.15.2 Notes About the Display
Following are notes about the display of data in the window:
• No highlighting conventions are used in the NISCA Summary window.
• You cannot sort or filter the data displayed in this window.
• You can change collection intervals.
4 Performing Fixes
You can perform fixes to resolve resource availability problems and improve system availability.
This chapter covers the following topics:
• Understanding fixes
• Performing fixes
• Typical fix examples
Caution
Performing certain actions to fix a problem can have serious repercussions on a system, including possibly causing a system failure. Therefore, only experienced system managers should perform fixes.
4.1 Understanding Fixes
When DECamds detects a resource availability problem, it analyzes the problem and proposes one or more fixes to improve the situation. Most fixes correspond to an OpenVMS system service call.
The following fixes are available from DECamds:
Memory usage fixes: Adjust working set ($ADJWSL); Purge working set ($PURGWS).
Process fixes: Delete a process ($DELPRC); Exit an image ($FORCEX); Adjust Process Quota Limit fix, which changes limits for the AST, BIO, DIO, ENQ, FIL, PRC, and TQE process quotas (no system service call).
Process state fixes: Resume a process ($RESUME); Suspend a process ($SUSPND).
Process priority fixes: Lower or raise a process priority ($SETPRI).
Quorum fix: Adjust cluster quorum (no system service call).
System fix: Crash node (no system service call).
Before you perform a fix, you should understand the following information:
• Fixes are optional.
• You must have write access to perform a fix. (See Section 1.3 for more information about DECamds security.)
• You cannot undo many fixes. (After using the crash node fix, for example, the node must be rebooted.)
• The exit image, delete process, and suspend process fixes should not be applied to system processes. Doing so can require rebooting the node.
• Whenever you exit an image, you cannot return to that image.
• Processes that have exceeded their job or process quota cannot be deleted.
• DECamds ignores fixes applied to the SWAPPER process.
4.2 Performing Fixes
Standard OpenVMS privileges restrict users' write access. When you run the Data Analyzer, you must have the CMKRNL privilege to send a write (fix) instruction to a node with a problem.
To initiate a fix, perform one of the following actions:
• From any of the data windows, double-click on a process, and then choose an action from the Fix menu.
• Click MB3 on an event, and choose Fix from the menu.
DECamds displays a dialog box listing the fixes you can perform for the selected event. The recommended choice is highlighted. When you click on OK or Apply,
DECamds performs one of the following actions:
• If the event you selected is not specific to a certain process, DECamds performs the fix automatically. Fixes that are performed automatically display "(automatic)" next to the selection.
• If the event is specific to a process, DECamds displays another dialog box in which you can specify the fix parameters. For example, for the Adjust
Working Set Size fix, you specify a new working set size for the process.
DECamds performs the highlighted fix as long as the event still exists. If the event you are fixing has changed, the dialog box disappears when you click on
OK, Apply, or Cancel, and the fix is not performed.
Table 4–1 summarizes all fixes alphabetically and specifies the windows from which they are available.
Table 4–1 Summary of DECamds Fixes

Adjust Process Quota Limit
  Problem: A process quota has reached its limit and the process has entered RWAIT state.
  Available from: Single Process Summary, Event Log.
  Effect: Process receives a greater limit.

Adjust Quorum
  Problem: Cluster hung.
  Available from: Node Summary, Cluster Transition/Overview Summary.
  Effect: Quorum for the cluster is adjusted.

Adjust Working Set
  Problem: Working set too high or too low.
  Available from: Memory Summary, Single Process Summary, Event Log.
  Effect: Removes unused pages from the working set; page faulting might occur.

Change Process Priority
  Problem: Runaway process.
  Available from: CPU Summary, Single Process Summary, Event Log.
  Effect: Priority stays at the selected setting.

Crash Node
  Problem: Node resource hanging the cluster.
  Available from: System Overview, Node Summary, Single Lock Summary.
  Effect: Node crashes with an operator-requested shutdown.

Delete Process
  Problem: Looping process, intruder.
  Available from: Any process window.
  Effect: Process no longer exists.

Exit Image
  Problem: Process loops endlessly in the same PC range.
  Available from: Any process window.
  Effect: Exits from the current image.

Purge Working Set
  Problem: Node or process low on memory.
  Available from: Event Log, Memory Summary, Single Process Summary.
  Effect: Frees memory; page faulting might occur.

Resume Process
  Problem: Process previously suspended.
  Available from: Event Log, Memory Summary, CPU Summary, Process I/O Summary, Single Process Summary.
  Effect: Process starts from the point at which it was suspended.

Suspend Process
  Problem: Runaway process, unwelcome intruder.
  Available from: Event Log, Memory Summary, CPU Summary, Process I/O Summary, Single Process Summary.
  Effect: Process gets no compute time.
The following sections provide reference information about each DECamds fix.
4.2.1 Adjust Quorum Fix
When you perform the Adjust Quorum fix, DECamds displays a dialog box similar to the one shown in Figure 4–1.
Figure 4–1 FIX Adjust Quorum Dialog Box
[Dialog: BHAK − FIX quorum node. Text: "This fix will force a cluster quorum adjustment on the entire OpenVMS Cluster upon which this fix is run. Pressing OK will adjust the quorum, while pressing Cancel will avoid quorum adjustment." Buttons: OK, Cancel. (ZK−8991A−GE)]
The Adjust Quorum fix forces the node to recalculate the quorum value. This fix is the equivalent of the Interrupt Priority C (IPC) mechanism used at system consoles for the same purpose. The fix forces the adjustment for the entire cluster so that each node in the cluster has the same new quorum value.
The Adjust Quorum fix is useful when the number of votes in a cluster falls below the quorum set for that cluster. This fix allows you to readjust the quorum so that it corresponds to the current number of votes in the cluster.
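The arithmetic behind quorum is straightforward: OpenVMS derives quorum from the total votes as (votes + 2) / 2, using integer division. The sketch below shows why readjusting quorum to match the surviving votes unblocks a cluster; it illustrates the calculation only, not the fix mechanism itself.

```python
def quorum(total_votes):
    """OpenVMS-style quorum calculation: (VOTES + 2) // 2."""
    return (total_votes + 2) // 2

# A cluster configured with 5 voting members:
print(quorum(5))   # 3: at least 3 votes must be present to proceed

# If three voting members fail, only 2 votes remain; 2 < 3, so cluster
# activity blocks until quorum is readjusted to match the survivors:
print(quorum(2))   # 2: the new, lower quorum after the Adjust Quorum fix
```

With quorum recomputed from the two surviving votes, the cluster can resume activity.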
4.2.2 Adjust Process Quota Limit
When you perform the Adjust Process Quota Limit fix, DECamds displays a dialog box similar to the one shown in Figure 4–2.
Figure 4–2 FIX Adjust Process Quota Limit Dialog Box
[Dialog: CALPAL − FIX Adjust Process Quota Limit. Shows Process Name: BATCH_1944 (7100210E) and Current Limit: 600, a row of quota selectors (AST, BIO, DIO, ENQ, FIL, PRC, TQE, BYT), and a slider labeled Fix Process Quota Limit Size Scale ranging from 600 to 1200. Buttons: OK, Apply, Cancel. (ZK−8992A−GE)]
If a process is waiting for a resource, you can use the Adjust Process Quota Limit fix to increase the resource limit so that the process can continue. The increased limit is only in effect for the life of the process, however; any new process will be assigned the quota set in the UAF.
To use this fix, select the resource and then use the slide bar to change the current setting. Finally, select one of the following:
• OK — to apply the fix and exit the window
• Apply — to apply the fix and not exit the window (so that you can continue to make changes)
• Cancel — not to perform the fix and exit the window
4.2.3 Adjust Working Set Fix
When you perform the Adjust Working Set fix, DECamds displays a dialog box similar to the one shown in Figure 4–3.
Figure 4–3 FIX Adjust Working Set Size Dialog Box
[Dialog: DELPHI − FIX Adjust Working Set Size. Shows Process Name: NET_34934 (62A01E6B) and Ws Count: 1144, with a Fix Working Set Size Scale slider ranging from 20 to 32000, set to 544. Buttons: OK, Apply, Cancel. (ZK−7953A−GE)]
Adjusting the working set can give needed memory to other processes that are page faulting. In your adjustment, try to bring the working set size closer to the actual count being used by nonpage faulting processes.
Caution
If the automatic working set adjustment is enabled for the system, a fix to Adjust Working Set Size will disable the automatic adjustment for the process.
4.2.4 Change Process Priority Fix
When you perform the Change Process Priority fix, DECamds displays a dialog box similar to Figure 4–4.
Figure 4–4 FIX Process Priority Dialog Box
[Dialog: DELPHI − FIX Process Priority. Shows Process Name: NET_34934 (62A01E6B) and Priority: 5/4, with a Fix Process Priority Scale slider ranging from 0 to 31, set to 4. Buttons: OK, Apply, Cancel. (ZK−7972A−GE)]
Setting a priority too high for a compute-bound process allows it to consume all the CPU cycles on the node, which can affect performance dramatically. On the other hand, setting a priority too low prevents the process from getting enough
CPU cycles to do its job, which can also affect performance.
4.2.5 Crash Node Fix
When you perform the Crash Node fix, DECamds displays a dialog box similar to
Figure 4–5.
Figure 4–5 FIX Crash Node Dialog Box
[Dialog: AMDS − FIX crash node. Text: "WARNING: IRRECOVERABLE FIX. Pressing OK will force a system crash on the node listed in the title! Press Cancel to avoid crashing the node." Buttons: OK, Cancel. (ZK−7954A−GE)]
Caution
The crash node fix is an operator-requested bugcheck from the driver. It happens immediately when you click on OK in the Fix Crash Node dialog box. After performing this fix, the node cannot be restored to its previous state. After a crash, the node must be rebooted.
Recognizing a System Failure Forced by DECamds
Because a user with suitable privileges can force a node to fail from the Data
Analyzer by using the Crash Node fix, system managers have requested a method for recognizing these particular failure footprints so that they can distinguish them from other failures. These failures all have identical footprints: they are operator-induced system failures in kernel mode at IPL 8. The top of the kernel stack is similar to the following display:
SP =>  Quadword system address
       Quadword data
       1BE0DEAD.00000000
       00000000.00000000
       Quadword data        TRAP$CRASH
       Quadword data        SYS$RMDRIVER + offset
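Because the footprint is fixed, a dump-scanning script could flag DECamds-forced failures by looking for the 1BE0DEAD marker near the top of the kernel stack. The following is an illustrative sketch with a hypothetical input representation; it is not a supported SDA interface.

```python
# Sketch: spot the DECamds Crash Node signature in a list of kernel-stack
# quadwords (hypothetical representation of SDA stack output).

DECAMDS_MARKER = 0x1BE0DEAD_00000000  # quadword left by the forced bugcheck

def forced_by_decamds(stack_quadwords):
    """True if the DECamds crash marker appears in the sampled stack."""
    return DECAMDS_MARKER in stack_quadwords

top_of_stack = [
    0xFFFFFFFF_80001230,   # quadword system address (made-up value)
    0x00000000_00000042,   # quadword data (made-up value)
    0x1BE0DEAD_00000000,
    0x00000000_00000000,
]
print(forced_by_decamds(top_of_stack))   # True
```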
4.2.6 Exit Image and Delete Process Fixes
When you perform either the Exit Image or Delete Process fix, DECamds displays a dialog box similar to Figure 4–6.
Figure 4–6 FIX Process State Dialog Box — Exit Image or Delete Process
[Dialog: DELPHI − FIX suspend or resume process. Shows Process Name: NET_34934 (62A01E6B) and State: PFW, with Exit Image and Delete Process options. Buttons: OK, Apply, Cancel. (ZK−7971A−GE)]
You cannot reverse a process deletion. Deleting a process that is in a resource wait state might have no effect on the process; to eliminate such a process, you must reboot the node.
Exiting an image on a node can stop an application that is required by the user. Check the single process window first to determine which image the process is running.
Caution
Deleting or exiting a system process could corrupt the kernel.
4.2.7 Purge Working Set Fix
When you perform the Purge Working Set fix, DECamds displays a dialog box similar to Figure 4–7.
Figure 4–7 FIX Purge Working Set Dialog Box
[Dialog: DELPHI − FIX Purge Working Set. Shows Process Name: NET_34934 (62A01E6B) and Ws Count: 482. Buttons: OK, Apply, Cancel. (ZK−7973A−GE)]
Continual purging of a working set on a node could force excessive page faulting, which affects system performance.
4.2.8 Suspend Process and Resume Process Fixes
When you perform either the Suspend Process or Resume Process fix, DECamds displays a dialog box similar to the one shown in Figure 4–8.
Figure 4–8 FIX Process State Dialog Box — Suspend or Resume Process
[Dialog: DELPHI − FIX suspend or resume process. Shows Process Name: NET_34934 (62A01E6B) and State: PFW, with Suspend and Resume options. Buttons: OK, Apply, Cancel. (ZK−7955A−GE)]
Suspending a process that is consuming excess CPU time can improve perceived
CPU performance by freeing the CPU for use by other processes. Conversely, resuming a process that was using excess CPU time while running might reduce perceived CPU performance.
Caution
Do not suspend system processes, especially JOB_CONTROL.
4.3 Examples for Fixing Low Memory Availability
This section describes two approaches for solving a low memory problem, which is a common resource availability problem.
The procedure in Section 4.3.1 uses DECamds default settings. The procedure in
Section 4.3.2 shows how you can use DECamds to make a more detailed analysis and investigation. Both examples begin at the Event Log window entry.
4.3.1 Performing a Fix Using Automatic Fix Settings
When a process is page faulting, for example, it can signal a shortage of available memory, and DECamds generates a low memory (LOMEMY) event. To fix this problem, purge the working sets of inactive processes; doing so frees memory for the process that is page faulting. DECamds offers a quick, direct way to fix this and similar problems. Perform the following steps:
1. Click MB3 on the event and choose Fix.
If the event is related to a specific process, DECamds displays a dialog box with fixes you can perform. If the event is not related to a specific process but may be related to more than one process, DECamds automatically performs the fix.
In the low memory example, DECamds displays a dialog box suggesting the automatic Purge Working Set fix.
2. Click on OK or Apply to perform the fix.
The Purge Working Set fix purges the working sets of the five processes that are the highest consumers of memory and are not page faulting. If this fix is not sufficient and the low memory event entry returns, repeat the fix every 15 or 20 seconds until enough working sets are purged to eliminate the event message. If two or three purges are not sufficient, you should investigate manually.
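The selection the automatic fix makes (the five largest memory consumers that are not currently page faulting) can be sketched as follows. The process records and the fault threshold here are illustrative assumptions, not DECamds internals.

```python
# Sketch: choose purge candidates the way the automatic Purge Working Set
# fix is described -- the top five memory users that are not page faulting.

def purge_candidates(processes, count=5, fault_threshold=0.0):
    """Return up to `count` process names, largest working sets first,
    skipping any process whose page-fault rate exceeds the threshold."""
    quiet = [p for p in processes if p["fault_rate"] <= fault_threshold]
    quiet.sort(key=lambda p: p["ws_count"], reverse=True)
    return [p["name"] for p in quiet[:count]]

procs = [
    {"name": "TGOODWIN_1", "ws_count": 8542, "fault_rate": 35.44},  # faulting
    {"name": "TGOODWIN_2", "ws_count": 5578, "fault_rate": 0.0},
    {"name": "YURYAN",     "ws_count": 5233, "fault_rate": 0.0},
    {"name": "BATCH_2878", "ws_count": 2537, "fault_rate": 0.0},
]
print(purge_candidates(procs))
# ['TGOODWIN_2', 'YURYAN', 'BATCH_2878'] -- the faulting process is skipped
```

Note that the page-faulting process itself is never purged; purging frees memory that the faulting process can then use.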
4.3.2 Performing a Fix Using Manual Investigation
DECamds lets you manually display additional information related to an event before performing a fix. The following example uses the same low memory problem described in the previous section to investigate and select specific fixes for the problem.
For this example, manually select the processes you want to fix from the Memory
Summary window. You also may want to refer to data in the CPU Summary window.
To investigate the low memory event before fixing it, perform the following steps:
1. Click MB3 on the event and choose Display.
DECamds displays a dialog box with a window name highlighted to indicate the recommended path for information. In the example shown in Figure 4–9, the Memory Summary window is recommended.
Figure 4–9 Sample Fix Dialog Box
[Dialog: DISPLAY − LOMEMY, DELPHI free memory is low. Event Display Choices list: Memory Summary (highlighted), Node Summary. Buttons: OK, Apply, Cancel. (ZK−7958A−GE)]
2. Click on Apply to open the Memory Summary window shown in Figure 4–10 and keep the dialog box displayed.
Figure 4–10 DECamds Memory Summary Window
[Figure 4–10 shows the DELPHI Memory Summary window. Menu bar: File, View, Fix, Customize, Help. Columns: PID, Process Name, Working Set (Count, Size, Extent), and Page Fault (Rate, I/O). The process TGOODWIN_1 heads the list with a working set count of 8542 and a page fault rate of 35.44; the remaining processes (TGOODWIN_2, YURYAN, BATCH_2878, and others) show little or no page faulting. (ZK−7959A−GE)]
3. To determine which process consumes the most memory and is not page faulting, sort and examine the data in the Memory Summary window.
In this example, the process TGOODWIN_1 is consuming the most memory and is page faulting.
4. Select the Node Summary window from the Low Memory dialog box and click on Apply to display the window. DECamds displays a window similar to Figure 4–11.
Figure 4–11 DECamds Node Summary Window
[Figure 4–11 shows the DELPHI Node Summary window. Menu bar: File, View, Fix, Customize, Help. Header fields: Model: DEC 7000 Model 630; O.S.: OpenVMS V7.0; Uptime: 12 00:39:15.14; Memory: 192.00 Mb; CPUs: 4. Bar-graph panels show CPU Process State Queues (COM, WAIT), CPU Modes averaged over all processors, Page Faults per second (Total, Hard, System), Memory in thousands of pages (Free, Used, Modified, Bad), and I/O per second (WIO, DIO, BIO), each with current and peak values. The Free memory bar is low relative to Used. (ZK−7962A−GE)]
The Node Summary window in Figure 4–11 confirms there is little free memory available. (The Node Summary window also can show other activity that is relevant in diagnosing the problem, such as a high number of page faults.)
5. Purge the working sets. Choose which processes' working sets are to be purged by performing the following steps:
   a. In the Memory Summary window, select any process, click MB3 on the count field, and choose Fix from the menu.
   b. Click on OK or Apply in the Fix dialog box.
5 Customizing DECamds
This chapter describes how to organize data collection, analysis, and display by filtering, sorting, and customizing DECamds. It also describes how some of these tasks can optimize the performance of DECamds.
5.1 Customizing DECamds Defaults
To set DECamds application values such as bar graph colors and automatic collection options, choose DECamds Customizations from the Customize menu of the Event Log or System Overview window. DECamds displays the DECamds
Application Customizations dialog box as shown in Figure 5–1.
Figure 5–1 DECamds Application Customizations Dialog Box
[Dialog: DECamds Application Customizations. Current Values group: Event Color (Red), NoEvent Color (Green), Collection Interval Factor (1). Automatic Collection Options group: Node, CPU, Disk, Volume, CluTran, Memory, I/O, Page/Swap, Lock. Application State Options group: Show Nodes, Lock Event Collect, Automatic Event Investigation, Highlight Events. Buttons: OK, Apply, Default, Cancel. (ZK−7938A−GE)]
Table 5–1 lists the items you can customize.
To save your changes from one use to the next, choose Save DECamds
Customizations from the Customize menu of the Event Log or System Overview window. The changes are stored in the AMDS$APPLIC_CUSTOMIZE.DAT file.
Note
Subsequent installations of DECamds will not overwrite existing customization files. The installation procedure will check for the existence of each customization file. If found, the procedure will provide any new file with the .TEMPLATE file extension. The installer must check the new .TEMPLATE files for new features implemented in future releases; any changes will be stated in the online release notes in the following location:
SYS$HELP:AMDS0nn.RELEASE_NOTES
Note that nn refers to the version number of the release.
Table 5–1 DECamds Application Defaults

Current Values
  Event Color (default Red): Specifies the bar graph color used for signaled events.
  NoEvent Color (default Green): Specifies the bar graph color used for nonsignaled events.
  Collection Interval Factor (default 1): This value is multiplied by a window's collection interval definition. Use it to force windows to have longer time spans between data collection. Increasing this number decreases the use of the Data Analyzer's CPU and LAN.

Automatic Collection Options
  Node (default On): Determines whether node data is collected at startup.
  CPU (default Off): Determines whether CPU data is collected at startup.
  Memory (default Off): Determines whether memory data is collected at startup.
  I/O (default Off): Determines whether I/O data is collected at startup.
  Disk (default Off): Determines whether disk data is collected at startup.
  Volume (default Off): Determines whether volume data is collected at startup.
  Page/Swap (default On): Determines whether page and swap data is collected at startup.
  Lock (default On): Determines whether lock contention data is collected at startup.
  CluTran (default On): Determines whether a view of the cluster, from the node on which Collect Cluster Transition Information was selected, is collected.

Application State Options
  Show Nodes (default On): Determines whether the System Overview window starts up with individual node names displayed.
  Lock Event Collect (default Off): Determines whether DECamds automatically collects additional data about all the processes waiting for a locked resource.
  Automatic Event Investigation (default Off): Determines whether additional data is collected when DECamds detects an event.
  Highlight Events (default On): Determines whether event-related data is highlighted.
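As a rough illustration of the Collection Interval Factor's effect, the following Python sketch models the multiplication described above. This is explanatory code, not part of DECamds; the function name and interval values are invented for the example.

```python
def effective_interval(window_interval_secs, factor):
    """Return the effective collection interval for a window.

    DECamds multiplies each window's defined collection interval by the
    application-wide Collection Interval Factor; raising the factor
    lengthens every interval, reducing Data Analyzer CPU and LAN use.
    """
    return window_interval_secs * factor

# Example window interval (illustrative value only)
display_interval = 5.0                           # seconds between updates
print(effective_interval(display_interval, 1))   # default factor
print(effective_interval(display_interval, 2))   # half as frequent
```

With a factor of 2, a window defined to collect every 5 seconds collects every 10 seconds instead.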
5.1.1 Setting Default Data Collection
By default, DECamds collects the following categories of data when started:
• Node Summary
• Page/Swap File Summary
• Lock Contention Summary
You can change the default amount of data collected when DECamds starts by choosing DECamds Customizations from the Customize menu in the Event Log or
System Overview window. The DECamds Application Customizations dialog box appears and you can click on the Automatic Collection Options buttons to select
or disable the categories you want. To save the settings for the next time you run
DECamds, choose Save DECamds Customizations from the Customize menu.
5.1.2 Setting Automatic Event Investigation
Automatic Event Investigation enhances the speed with which you can pursue a specified event. When this option is enabled, DECamds automatically collects follow-up data on the event. When this option is disabled, you must initiate follow-up data collection when an event occurs.
To enable automatic event investigation, choose Enable Automatic Event
Investigation from the Control menu of the System Overview or Event Log window. To disable it, choose Disable Automatic Event Investigation.
You also can set Automatic Event Investigation by choosing DECamds
Customizations from the Customize menu; then click on the Automatic Event
Investigation button in the resulting DECamds Application Customizations dialog box. To save the settings for the next time you run DECamds, choose Save
DECamds Customizations from the Customize menu.
Note that enabling this option can significantly increase CPU, memory, and LAN traffic load. By default, DECamds does not automatically investigate events that might require attention. Automatic investigation applies only to events that are detected after you enable the option. It does not apply to lock-related events, which you can control using the DECamds Application Customizations dialog box.
5.1.3 Setting Automatic Lock Investigation
With Automatic Lock Investigation, the Data Analyzer automatically investigates any signaled lock contention events. Setting this option allows you to determine more quickly the blocking lock in a resource contention situation.
Note that this option sometimes uses more DECamds memory, CPU, and LAN bandwidth to investigate locks that are very transient.
To enable automatic investigation of locks, click on the Lock Event Collect button in the DECamds Application Customizations dialog box.
5.2 Filtering Data
DECamds can collect and display every event regardless of how important or unimportant an event is to you. However, you can narrow the focus so that the events that you want to see are displayed. You can use the following methods to determine which events qualify for your attention:
• Filter all events on a global severity basis. For example, you might not want to see any event that has less than a 40 severity value.
• Define specific event criteria. For example, you can refine the global filtering by also requiring that the DSKRWT event (high disk device Rwait count) pass your specifications before being considered an event worth displaying or logging.
Figure 5–2 shows the process an event must pass through to qualify as important enough to be logged or displayed for your attention.
Figure 5–2 Event Qualification
[Figure: flowchart showing the following event qualification sequence:
1. The Data Analyzer gets information from the Data Provider.
2. Event Severity Check (set in the Event Log Filter dialog box; choose Filter Data... from the Customize menu of the Event Log): if the data does not meet or exceed the values to signal an event, nothing happens.
3. Test Threshold Values (set in the event customization dialog box; choose Customize Events from the Event Log Customize menu, then double-click on an event): if the data meets or exceeds the threshold values, 1 is added to the Occurrence counter; otherwise, nothing happens.
4. If the Occurrence count is greater than or equal to the set value, the event is signaled; otherwise, nothing happens.
5. Event Severity Check (set in the Event Log Filter dialog box): if the data meets or exceeds the values to display an event, the event is displayed and written to the AMDS$LOG file, OPCOM, or a user file; otherwise, the event is written to the AMDS$LOG file, OPCOM, or a user file without being displayed.]
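The qualification flow in Figure 5–2 can be modeled in a few lines of Python. This is an explanatory sketch only, not DECamds source; the function, dictionary keys, and the counter-reset behavior are assumptions based on the text.

```python
def qualify(sample, signal_severity, display_severity, event):
    """Model of the event qualification flow in Figure 5-2.

    `event` carries the per-event customizations: severity, threshold,
    occurrence setting, and a running occurrence count.  Resetting the
    counter on an under-threshold sample is an assumption based on the
    "consecutive data samples" wording in Section 5.2.2.
    """
    # Event Severity Check: is this event severe enough to signal at all?
    if event["severity"] < signal_severity:
        return "do nothing"
    # Test Threshold Values: does the sample meet or exceed the threshold?
    if sample < event["threshold"]:
        event["count"] = 0            # consecutive samples required
        return "do nothing"
    event["count"] += 1               # add 1 to the Occurrence counter
    if event["count"] < event["occurrence"]:
        return "do nothing"
    # Event signaled; a second severity check decides display vs. log only
    if event["severity"] >= display_severity:
        return "display and log"      # Event Log window plus log file
    return "log only"                 # AMDS$LOG, OPCOM, or user file only

hittlp = {"severity": 60, "threshold": 20.0, "occurrence": 3, "count": 0}
for fault_rate in (25.0, 30.0, 28.0):          # three consecutive samples
    outcome = qualify(fault_rate, 40, 50, hittlp)
print(outcome)  # third consecutive over-threshold sample signals the event
```

The severity and threshold values used here are invented; the defaults for a real event appear in its Event Customization window.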
5.2.1 Filtering Events
To specify the events to be displayed in the Event Log window, perform the following steps:
1. Choose Filter Data... from the Customize menu. A filter dialog box appears. Table 5–2 describes the filter options.
Table 5–2 Event Log Filters

Severity: Controls the severity level at which events are displayed in the Event Log window. By default, all events are displayed. Increasing this value reduces the number of event messages in the Event Log window and can improve perceived response time.

Event Bell: Determines which events are marked by an audible signal by specifying a minimum event severity value. When a new event is displayed, if its severity value is the same as or greater than the specified value, an audible notification is given. To disable the sound, specify a value of 101.

Bell Volume: Controls the pitch or sound level at which the bell is rung when an event is signaled whose severity is greater than the Event Bell filter.

Event Highlight: Determines which events are marked by a visual signal by specifying a minimum event severity value. When a new event is displayed, if its severity value is the same as or greater than the specified value, the event is highlighted. To disable highlighting, specify a value of 101.

Event Signal: Determines the severity value at which DECamds signals an event for attention. Only events that qualify are passed on to be checked by any filters you may set for a specific event. Increasing this value reduces the number of event messages that need to be tested to see if further attention is warranted, which can improve perceived response time.

Event Timeout (secs): Determines how long an informational event is displayed (in seconds).

Event Escalation Time (secs): Determines how long an event must be signaled before it is sent to the operator communication manager (OPCOM). DECamds uses this value along with the Event Escalation Severity value; both criteria must be met before the event is signaled to OPCOM.

Event Escalation Severity: Determines which events are sent to OPCOM. DECamds uses this value along with the Event Escalation Time (secs) value; both criteria must be met before the event is signaled to OPCOM.
2. Modify the settings, which apply to the current session. To save these settings from session to session, choose Save Filter Changes from the Customize menu in the Event Log window.
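The severity-based filters interact roughly as follows. This Python sketch is illustrative only (the function and filter names are invented); the value 101 disables a notification, which suggests that event severities range up to 100.

```python
def notifications(event_severity, filters):
    """Decide which notifications accompany a newly displayed event.

    Illustrative model of the Event Log filters in Table 5-2; each
    filter is a minimum severity that the event must meet or exceed.
    """
    actions = []
    if event_severity >= filters["severity"]:         # display filter
        actions.append("display")
    if event_severity >= filters["event_bell"]:       # audible notification
        actions.append("ring bell")
    if event_severity >= filters["event_highlight"]:  # visual notification
        actions.append("highlight")
    return actions

# Example settings: bell only for severe events, highlighting disabled (101)
filters = {"severity": 40, "event_bell": 80, "event_highlight": 101}
print(notifications(90, filters))   # display and ring bell
print(notifications(50, filters))   # display only
```

Raising the Severity filter value suppresses low-severity events entirely, which is how the filters reduce Event Log traffic.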
You can also filter data in the following data windows:
• CPU Summary
• Lock Contention Summary
• Memory Summary
• Process I/O Summary
• Disk Status Summary
• Volume Summary
• Page/Swap File Summary
The modifiable options displayed in the filter dialog box vary with the window.
Figure 5–3 shows the CPU Summary Filtering dialog box. For a process to be displayed in the CPU Summary window, it must have a Current Priority of 4 or more and be in any of the process states indicated except HIB, HIBO, or SUSPO.
No other processes are displayed.
Figure 5–3 CPU Summary Filtering Dialog Box
[Figure: The CPU Summary Filtering dialog box contains Current Filter Values (Current Priority 4, CPU Rate 0.000); a Process States button group (COLPG, MWAIT, PFW, LEF, CEF, LEFO, HIB, HIBO, SUSP, SUSPO, FPG, COM, COMO, CUR); instructions to select a value and then use the arrows or input a new value and Apply or OK the change; and OK, Apply, Default, and Cancel buttons.]
If the Enable Highlighting option is on, any process that signals an event is included in the display, regardless of whether it meets the filter criteria.
To change the value of a filter, turn the filter button on by clicking on it, and then click on the up or down arrow. Click on OK or Apply for the filter to take effect.
To return to system default values, click on Default.
Changing a Filter Category
Some data windows also allow you to filter data by category. For example, in the CPU Summary window, you also can filter by the Process State category to display only processes in certain states. Category buttons that are selected display the associated information.
In the CPU Summary window, to display only inactive processes, select the HIB and HIBO buttons under Process States, and deselect all other process states.
When you click on OK or Apply, only inactive processes appear in the CPU
Summary window.
5.2.2 Customizing Events
You can define criteria by which specific events qualify for your attention. For example, you can refine the global filtering by also requiring that the DSKRWT event (high disk device Rwait count) pass your specifications before being considered an event worth displaying or logging. To define specific event criteria, perform the following steps:
1. Choose Customize Events from the Customize menu in the Event Log window. Figure 5–4 shows the Customize Events dialog box that appears.
Figure 5–4 Customize Events Dialog Box
[Figure: The Customize Events dialog box lists events and their descriptions, including the following, along with OK, Select, and Cancel buttons:

HIHRDP, high hard page fault rate
HIMWTQ, many processes waiting in MWAIT
HINTER, high interrupt mode time
HIPWIO, high paging Write I/O rate
HIPWTQ, many processes waiting in Page WAIT
HISYSP, high system page fault rate
HITTLP, high total page fault rate
HMPSYN, high MP synchronization mode time
LCKBLK, lock blocking
LCKCNT, lock contention
LCKWAT, lock waiting
LOASTQ, process has used most of ASTLM quota
LOBIOQ, process has used most of BIOLM quota
LOBYTQ, process has used most of BYTLM quota
LODIOQ, process has used most of DIOLM quota
LOENQU, process has used most of ENQLM quota
LOFILQ, process has used most of FILLM quota
LOMEMY, free memory is low
LOPGFQ, process has used most of PGFLQUOTA quota
LOPGSP, low page file space
LOPRCQ, process has used most of PRCLM quota
LOSWSP, low swap file space
LOTQEQ, process has used most of TQELM quota
LOVLSP, low disk volume free space
LOWEXT, low process working set extent
LOWSQU, low process working set quota
LRGHSH, large hash table
NOPGFL, no page file
NOPROC, cannot find process
NOSWFL, no swap file
PRBIOR, high process Buffered I/O rate
PRBIOW, process waiting for Buffered I/O
PRCCOM, process waiting in COM or COMO]
2. Double-click on an event that you want to customize. A dialog box appears for the selected event; the dialog box also contains an explanation of what might cause this event to occur. Figure 5–5 shows the LOWSQU Event Customization window.
Figure 5–5 LOWSQU Event Customization Window
[Figure: The LOWSQU Event Customization window displays the following:

LOWSQU, low process working set quota
Event Format: LOWSQU, <node> <process> working set quota is too small
Signaled From: Memory or Single Process Summary
Event Class Type: Memory
Event Description: The process page fault rate exceeds the threshold and the percentage of Working Set Size to Working Set Quota exceeds the threshold.
Event Investigation Hints: This event indicates that the process needs more memory but may not be able to get it, either because the WSQUO value in the UAF file is set too low for the size of memory allocation requests or because the system is memory constrained.
Event Customize Options: Severity 40; Occurrence 3; Class N/A; Threshold 1, 50 page faults per second; Threshold 2, 150.000 percent WSQuota over WSCount.
Event Escalation Action Options: OPCOM, USER, NONE, and a field in which to type the procedure to be run (for example, amds$system:amds$event_mail_sample.com).
The window also contains OK, Apply, Default, and Cancel buttons.]
Figure 5–5 shows the values you can set in any Event Customization window.
To change the value of an option, click on an option and then use the arrow buttons to increase or decrease the value. A higher number indicates a more severe event.
3. Modify the settings, which apply to the current session. To save these settings from session to session, choose Save Event Customizations from the Customize menu in the Event Log window.
The following sections describe the event customization options.
Severity Option
Severity is the relative importance of an event. Events with a high severity must also exceed threshold settings before an event can be signaled for display or logging.
Occurrence Option
Each DECamds event is assigned an occurrence value, that is, the number of consecutive data samples that must exceed the event threshold before the event is signaled. By default, events have low occurrence values. However, you might find that a certain event only indicates a problem when it occurs repeatedly for an extended period. You can change the occurrence value assigned to that event so that DECamds signals it only when necessary.
For example, suppose page fault spikes are common in your environment, and
DECamds frequently signals intermittent HITTLP, total page fault rate is high events. You could change the event’s occurrence value to 3, so that the total page fault rate must exceed the threshold for three consecutive collection intervals before being signaled to the Event Log.
To avoid displaying insignificant events, you can customize an event so that
DECamds signals it only when it continuously occurs.
Automatic Event Investigation (see Section 5.1.2) uses the occurrence value to determine when to investigate an event further. When the option is enabled, automatic event investigation is activated when the occurrence count reaches three times the Occurrence setting value.
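As a sketch of the occurrence mechanics described above (a model inferred from the text, not DECamds source), an event with an Occurrence setting of 3 is signaled on the third consecutive over-threshold sample, and automatic investigation begins when the count reaches three times that setting. Treating "consecutive" as resetting the counter on an under-threshold sample is an assumption.

```python
def process_sample(over_threshold, state, occurrence_setting=3):
    """Track consecutive over-threshold samples for one event.

    Returns the list of actions taken for this data sample.  The
    counter resets when a sample falls under the threshold (assumed
    from the "consecutive data samples" wording).
    """
    actions = []
    if not over_threshold:
        state["count"] = 0
        return actions
    state["count"] += 1
    if state["count"] >= occurrence_setting:
        actions.append("signal event")
    # Automatic Event Investigation triggers at 3x the Occurrence setting
    if state["count"] == 3 * occurrence_setting:
        actions.append("start automatic investigation")
    return actions

state = {"count": 0}
history = [process_sample(True, state) for _ in range(9)]
print(history[2])  # third consecutive sample: event is signaled
print(history[8])  # ninth sample (3 x 3): investigation also starts
```

This mirrors the HITTLP example: with an occurrence value of 3, intermittent single-sample spikes never reach the Event Log.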
Class Option
You can customize certain events so that the event threshold varies depending on the class of computer system the event occurs on. This feature is particularly useful in environments with many different types and sizes of computers.
By default, DECamds uses only one default threshold for each event, regardless of the type of computer the event occurs on. However, for certain events (in particular, CPU, I/O, and memory usage events) the level at which resource use becomes a problem depends on the size and type of computer. For example, a page fault rate of 100 may be important on a VAXstation 2000 system but not on a VAX 7000 system.
DECamds provides three additional predefined classes for CPU-, I/O-, and Memory-related events. You can specify threshold values for each class in addition to the default threshold for an event. To specify an additional event threshold for each class, edit the file AMDS$THRESHOLD_DEFS.DAT located in the AMDS$CONFIG directory.
Table 5–3 defines CPU, I/O, and Memory classes.
Table 5–3 CPU, I/O, and Memory Class Definitions (1)

CPU Classes
  Class 1: All VAXft systems, VAXstation/VAXserver 4000, MicroVAX 4000
  Class 2: Higher VUP workstations: VAXstation/VAXserver 3100-M76, MicroVAX 3100-M76, MicroVAX 3100-8*, VAXstation 3100-9*, MicroVAX 3100-9*, VAXstation 4000-9*
  Class 3: VAX/VAXserver 6000, 7000, 9000, 10000
  Class 4: All Alpha systems

I/O Classes
  Class 1: All VAX systems, VAXft systems, VAXstation/VAXserver 4000, MicroVAX 4000
  Class 2: Higher VUP workstations: VAXstation/VAXserver 3100-M76, MicroVAX 3100-M76, MicroVAX 3100-8*, VAXstation 3100-9*, MicroVAX 3100-9*, VAXstation 4000-9*
  Class 3: VAX/VAXserver 6000, 7000, 9000, 10000
  Class 4: All Alpha systems

Memory Classes
  Class 1: Systems with less than or equal to 24 MB of memory
  Class 2: Systems with more than 24 MB and less than or equal to 64 MB of memory
  Class 3: Systems with more than 64 MB of memory
  Class 4: All Alpha systems

(1) If no class is defined, DECamds uses the default threshold value.
You can specify class-based thresholds only for the following events:
• CPU-related events:
HINTER, node interrupt mode time is high
HICOMQ, node many processes waiting for CPU
HMPSYN, node MP synchronization mode time is high
HIPWTQ, node many processes waiting in COLPG, PFW, or FPG
HIMWTQ, node many processes waiting in MWAIT
• I/O-related events:
HIBIOR, node buffered I/O rate is high
HIDIOR, node direct I/O rate is high
HIPWIO, node paging write I/O rate is high
• Memory-related events:
LOMEMY, node free memory is low
HIHRDP, node hard page fault rate is high
HISYSP, node high system page fault rate
HITTLP, node total page fault rate is high
RESPRS, node resource hash table sparse
RESDNS, node resource hash table dense
As an example of setting a class-based threshold, the HITTLP, total page fault
rate is high event is a memory-related event, so the thresholds are based on the memory class definitions shown in Table 5–3. The default threshold for this event is 20 page faults per second. A page fault rate of 20 may be important on a VAXstation 2000 system, but it is not important on a VAX 7000 system.
To account for this, you can specify the following additional thresholds for the
HITTLP, total page fault rate is high event:
Class 1 (systems with less than or equal to 24 MB of memory): Threshold 20. Event is triggered at the default threshold of 20 page faults per second.
Class 2 (systems with more than 24 MB and up to 64 MB of memory): Threshold 40. Event is triggered at 40 page faults per second.
Class 3 (systems with more than 64 MB of memory): Threshold 100. Event is triggered at 100 page faults per second.
Class 4 (Alpha systems): Threshold 100. Event is triggered at 100 page faults per second.
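The class-based selection can be sketched as a simple lookup. This is illustrative Python using the Memory class boundaries from Table 5–3 and the HITTLP thresholds from the example above; the function names are invented.

```python
def memory_class(is_alpha, memory_mb):
    """Classify a system using the Memory class definitions in Table 5-3."""
    if is_alpha:
        return 4                      # all Alpha systems
    if memory_mb <= 24:
        return 1
    if memory_mb <= 64:
        return 2
    return 3

# Per-class HITTLP thresholds (page faults per second) from the example
hittlp_thresholds = {1: 20, 2: 40, 3: 100, 4: 100}

def hittlp_threshold(is_alpha, memory_mb, default=20):
    """Return the HITTLP threshold for a system, falling back to the
    default threshold when no class threshold is defined."""
    cls = memory_class(is_alpha, memory_mb)
    return hittlp_thresholds.get(cls, default)

print(hittlp_threshold(False, 16))   # small-memory VAX: default threshold
print(hittlp_threshold(False, 256))  # large-memory VAX: higher threshold
print(hittlp_threshold(True, 128))   # Alpha system
```

In DECamds itself, these per-class thresholds come from AMDS$CONFIG:AMDS$THRESHOLD_DEFS.DAT rather than from code.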
Threshold Options
Threshold values are compared with an event's collected data to determine whether the event meets the criteria for display or logging. Threshold values are used in conjunction with the occurrence and severity values. Increasing event threshold values can reduce CPU use and improve perceived response time, because more instances must occur before a threshold is crossed, so fewer events are triggered.
Note
Setting a threshold too high could mask a serious problem.
You can read a description of an event by choosing Customize Events from the
Customize menu in the Event Log window, then double-clicking on the event. The
Event Customization dialog box displays an Event Description field.
Most events are checked against only one threshold; however, some have dual thresholds, where the event is triggered if either one is true. For example, for the
LOVLSP, node disk volume free space is low event, DECamds checks both of the following thresholds:
• Number of blocks remaining (LowDiskFreeSpace.BlkRem)
• Percentage of total blocks remaining (LowDiskFreeSpace.Percent)
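A dual-threshold test for LOVLSP can be sketched as follows. This is illustrative Python; per the text, the event is triggered if either test is true, and the default threshold values shown here are invented examples, not DECamds defaults.

```python
def lovlsp_triggered(blocks_remaining, percent_remaining,
                     blkrem_threshold=100000, percent_threshold=10.0):
    """Dual-threshold check for LOVLSP, low disk volume free space.

    An event with dual thresholds is triggered if either test is true.
    The two thresholds correspond to LowDiskFreeSpace.BlkRem and
    LowDiskFreeSpace.Percent; the numeric defaults here are invented.
    """
    low_blocks = blocks_remaining <= blkrem_threshold      # absolute blocks
    low_percent = percent_remaining <= percent_threshold   # percent of total
    return low_blocks or low_percent

print(lovlsp_triggered(50000, 40.0))   # few blocks left: triggered
print(lovlsp_triggered(500000, 5.0))   # large disk, but only 5% free: triggered
print(lovlsp_triggered(500000, 40.0))  # plenty of space: not triggered
```

Using two thresholds covers both small disks (where the percentage drops quickly) and large disks (where even a small percentage is many blocks).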
Note
Events with both high severity and threshold values are signaled to the operator communication manager (OPCOM). For more information about signaling events to OPCOM, see Section 2.3.3.
5.3 Sorting Data
Choose Sort Data... from the Customize menu to change the order of the information displayed in a window. A dialog box appears in which you can specify sort criteria. All sort criteria must be met for a process to be displayed.
You can sort data in the following windows:
• CPU Summary
• Disk Status Summary
• Volume Summary
• Event Log
• Lock Contention Summary
• Memory Summary
• Page/Swap File Summary
• Process I/O Summary
Figure 5–6 shows a sample Memory Summary Sorting dialog box.
Figure 5–6 Memory Summary Sorting Dialog Box
[Figure: The Memory Summary Sorting dialog box contains Sort Order buttons (Ascending, Descending); Sort Field buttons (Process PID, Process Name, Working Set Count, Working Set Size, Working Set Extent, Page Fault Rate, Paging I/O Rate, Unsorted); and OK, Apply, Default, and Cancel buttons.]
Sorting is based on two variables: the sort order and the sort field. You can choose only one sort criterion for each variable: one for the sort order, and one for the sort field. To sort Memory Summary data so that the processes with the highest page fault rates are listed first, for example, perform the following steps:
1. Choose Sort Data... from the Customize menu on the Memory Summary window. The Memory Summary Sorting dialog box appears; the current sort field settings are displayed. (By default, DECamds sorts Memory Summary data on the Working Set Count field in descending order.)
2. Change the sort settings by choosing Page Fault Rate and Ascending order.
3. Click on OK or Apply.
4. To save the sort settings, choose Save Sort Changes on the Customize menu.
5.4 Setting Collection Intervals
A collection interval is the time the Data Analyzer waits before requesting more information from Data Provider nodes. Changing the collection interval helps you control the performance of DECamds and its consumption of system resources.
The frequency of polling remote nodes for data (collection intervals) can affect perceived response time. You want to find a balance between collecting data often enough to detect potential resource availability problems before a node or cluster experiences a severe problem, and seldom enough to optimize perceived response time. Increasing the collection interval factor decreases CPU consumption and
LAN load, but response time might appear slower because the intervals are longer.
Collection intervals do not affect memory use.
To change a collection interval, choose Collection Interval from the Customize menu. Figure 5–7 shows a sample Memory Summary Collection Interval dialog box.
Figure 5–7 Memory Summary Collection Interval Dialog Box
[Figure: The Memory Summary Collection Interval dialog box displays Current Collection Interval: 3.00; Based on Collection Interval Factor: 1; and three adjustable values: Display Interval (sec) 3.00, Event Interval (sec) 5.00, and NoEvent Interval (sec) 30.00; along with OK, Apply, Default, and Cancel buttons.]
Table 5–4 describes the fields on the Memory Summary Collection Interval dialog box.
Table 5–4 Memory Summary Collection Interval Fields

Current Collection Interval: Displays the number of seconds between requests for data. You can change the value of all collection intervals for all windows by choosing DECamds Customizations from the Customize menu of the Event Log or System Overview window; the DECamds Application Customizations dialog box appears, and you can increase or decrease the collection interval factor.

Based on Collection Interval Factor: Displays the number by which the collection interval is multiplied.

Display Interval (sec): Displays the collection interval for displaying data in a window. You can change the interval by clicking on the up or down arrows in the dialog box.

Event Interval (sec): Displays the collection interval used when events are found. This value is used by default when you start background collection. You can change the interval by clicking on the up or down arrows in the dialog box.

NoEvent Interval (sec): Displays the collection interval used when no events are found. You can change the interval by clicking on the up or down arrows in the dialog box.
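The relationship among the three intervals can be modeled as follows. This is an explanatory sketch based on the field descriptions above, not DECamds source; exactly when each interval applies is a simplifying assumption, and the interval values are those shown in Figure 5–7.

```python
def next_interval(window_open, event_outstanding, intervals, factor=1):
    """Pick which collection interval applies, scaled by the factor.

    Simplified model: the display interval drives an open data window,
    the event interval drives background collection while an event is
    outstanding, and the noevent interval applies otherwise.
    """
    if window_open:
        base = intervals["display"]
    elif event_outstanding:
        base = intervals["event"]
    else:
        base = intervals["noevent"]
    return base * factor

memory = {"display": 3.00, "event": 5.00, "noevent": 30.00}
print(next_interval(True, False, memory))      # window open
print(next_interval(False, True, memory))      # background, event found
print(next_interval(False, False, memory, 2))  # quiet, factor of 2
```

Because the noevent interval is the longest, a quiet system generates far less LAN traffic than one with outstanding events.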
To apply the changes, click on OK or Apply. To save collection interval changes, choose Save Collection Interval Changes from the Customize menu.
To change back to DECamds default values for the window, click on Default. To exit without making any changes, click on Cancel.
Table 5–5 lists the default window collection interval values (in seconds) provided with DECamds for each window type.
Table 5–5 Default Window Collection Intervals

Window                             Display(1)  Event(1)  No Event(1)
CPU Modes Summary                     5.0         5.0        5.0
CPU Summary                           5.0        10.0       30.0
Disk Status Summary                  30.0        15.0       60.0
Volume Summary                       15.0        15.0      120.0
Lock Contention                      10.0        20.0       60.0
Memory Summary                        5.0        10.0       30.0
Node Summary                          5.0         5.0       10.0
Page/Swap File Summary               30.0        30.0     2400.0
Process Identification Manager(2)    60.0        60.0      240.0
Process I/O Summary                  10.0        10.0       30.0
Single Lock Summary                  10.0        10.0       20.0
Single Process Summary                5.0         5.0       20.0

(1) All times are in seconds and cannot be less than .5 second.
(2) Process Identification Manager supports the CPU, Memory, Process I/O, and Single Lock Summary window sampling.
5.5 Optimizing Performance with System Settings
DECamds is a compute-intensive and LAN traffic-intensive application. At times, routine data collection, display activities, and corrective actions can cause a delay in perceived response time.
This section explains how to optimize perceived response time based on actual measurements of CPU utilization rates (throughput). Performance improvements can be made in the following areas:
Area                      Discussed in...
DECamds software          Section 5.5.1
System settings           Section 5.5.2
Hardware configuration    Section 5.5.3
Site configurations vary widely, and no rules apply to all situations. However, the information in this section can help you make informed choices about improving your system performance.
The following factors affect perceived response time:
• Load on monitored nodes including applications and peripherals (especially number of disks)
• Number of monitored nodes and users
• Size of operating system tables and lists on monitored nodes (process and lock)
• Version of operating system running on monitored nodes
• LAN traffic, cluster communications, nodes booting, and network-based applications and tools
5.5.1 Optimizing DECamds Software
When DECamds starts, it polls the LAN to locate all nodes running the DECamds
Data Provider, creates a communications link, and collects data from each
Data Provider node on the LAN. (See Section 1.1 for more information about establishing a communications link between nodes.)
The initial polling process creates a short-term high load of CPU and LAN activity. After establishing a communications link with other nodes, DECamds reduces polling frequency, thereby reducing the CPU and LAN load.
Note
Each request to collect a new category of data increases memory and LAN requirements. Memory requirements vary with the number of categories collected and the number of nodes being polled.
Polling frequency does not affect memory because polling only changes how frequently existing data is replaced with updated data.
The following sections describe system settings that you can change to improve performance and the ability of DECamds to handle data collection demands.
5.5.1.1 Setting Process Quotas
To improve the performance of DECamds, you might need to change process quotas. The quotas used extensively by DECamds are ASTLM, TQELM, BIOLM,
BYTLM, and WSEXTENT. The values listed in Section A.2 are suggestions for a
50-node cluster.
The following process quotas are recommended:

Quota       Recommended Value(1)
ASTLM       4 times the node count
TQELM       4 times the node count
BIOLM       2 times the node count
WSEXTENT    350 times the node count
BYTLM       1500 times the node count

(1) The node count is the number of nodes a Data Analyzer monitors simultaneously.
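Because the recommendations are simple multiples of the node count, the UAF values for a given configuration can be computed directly. This is an illustrative Python sketch; the function name is invented.

```python
def recommended_quotas(node_count):
    """Recommended Data Analyzer process quotas for `node_count`
    simultaneously monitored nodes, using the multipliers above."""
    return {
        "ASTLM": 4 * node_count,
        "TQELM": 4 * node_count,
        "BIOLM": 2 * node_count,
        "WSEXTENT": 350 * node_count,
        "BYTLM": 1500 * node_count,
    }

# Example: the 50-node cluster mentioned above (see Section A.2)
for quota, value in recommended_quotas(50).items():
    print(f"{quota:9} {value}")
```

For 50 nodes this yields, for instance, a BYTLM of 75000; apply the resulting values with the AUTHORIZE utility as described in the steps that follow.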
Perform the following steps to change process quotas:
1. Increase the process quotas assigned to the process initiating DECamds in the system's user authorization file (UAF).
2. Log out, log back in, and restart DECamds.
5.5.1.2 Setting LAN Load
The maximum size for data packets is 1500 bytes. When the amount of data is greater than 1500 bytes, DECamds must send multiple requests to complete the data collection request.
Table 5–6 shows the LAN load for various levels of collection intervals and data collection. You can modify a data collection window’s collection intervals (as explained in Section 5.4) or reduce the scope of data collection (as explained in
Section 5.1.1) to reduce LAN activity.
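Because each data packet carries at most 1500 bytes, the number of requests per collection grows with the amount of data returned. The following sketch (illustrative arithmetic only, not DECamds code) estimates the request count:

```python
import math

MAX_PACKET_BYTES = 1500  # maximum DECamds data packet size

def requests_needed(total_bytes):
    """Rough estimate of the number of requests DECamds must send to
    move `total_bytes` when each data packet carries at most 1500
    bytes (simplified model that ignores protocol overhead)."""
    return max(1, math.ceil(total_bytes / MAX_PACKET_BYTES))

# Example: a Memory Summary return of 36 bytes per active process
# (see Table 5-6) for 200 active processes
payload = 36 * 200       # 7200 bytes
print(requests_needed(payload))
```

At 7200 bytes of returned data, five requests are needed, which is why busy nodes with many active processes raise LAN load noticeably.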
Table 5–6 LAN Load

                          Outgoing Packet     Outgoing Packet
                          Size (in bytes)     Size (in bytes)     Return Packet Size
Data                      on Alpha Systems    on VAX Systems      (in bytes)
Configuration data        129                 285                 88
CPU Modes                 201                 129                 48 + (64 * no. of processors)
CPU Summary               178                 171                 16 per active process
Disk Status Summary       473                 473                 56 per disk
Fix                       24                  24                  12
Hello Message             N/A                 N/A                 32
Lock Contention           240                 240                 76 per resource
Memory Summary            275                 275                 36 per active process
Node Summary              319                 241                 48 + (64 * no. of processors)
Page/Swap File            208                 208                 46 per page/swap file
Process I/O Summary       236                 229                 32 per active process
Single Lock (Waiting)     272                 272                 32 per waiter
Single Process Summary    491                 471                 00
Volume Summary            430                 430                 28 per disk
5.5.1.3 Setting Window Customizations
The Sort, Filter, and collection interval settings at the data window level affect performance. Follow these guidelines to balance customization with performance:
• Filter out data to improve CPU performance. Reducing the collection criteria increases performance. See Section 5.2 for information on filtering data.
• Use unsorted windows to improve performance. Sorting requires extra computations. See Section 5.3 for information on sorting data.
• Increase collection interval values to improve performance. See Section 5.4
for information on changing collection intervals.
5.5.2 Optimizing System Settings
Changing several system settings might improve the performance of DECamds on your system. The following sections discuss these settings and how to change them.
5.5.2.1 Setting Data Link Read Operations
Increase read operations to the data link by changing the logical name
AMDS$COMM_READS in the AMDS$CONFIG:AMDS$LOGICALS.COM
command procedure. The AMDS$COMM_READS logical name controls the number of requests for data (read operations) queued to the data link.
If you increase data collection, increase the number of requests that can be queued. Compaq recommends two requests for each node being monitored. Each read operation queued requires 1500 bytes of BYTLM quota.
5.5.2.2 Setting the Communications Buffer
Increase the communications buffer by changing the logical name AMDS$COMM_BUFFER_SIZE in the AMDS$CONFIG:AMDS$LOGICALS.COM command procedure. The buffer controls the size of the global section used for communication between the provider node and the communications process.
When DECamds cannot keep up, it displays the following warning message:
AMDS$_COMMBUFOVF---communications buffer overflow.
When this message appears, increase the buffer size by 25 percent.
In addition to increasing the value of the AMDS$COMM_BUFFER_SIZE logical name, set the system parameter GBLPAGFIL on the provider node to cover the increase. This adds to the amount of data collection that DECamds can perform.
The value of the GBLPAGFIL system parameter must always be higher than the number of FREE_GBLPAGES. To determine the value of FREE_GBLPAGES, enter the following commands:
$ A = F$GETSYI("FREE_GBLPAGES")
$ SHO SYM A
The value of A must conform to the following formula:
2 * ( (buffer_size / 512) + 512)
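For example, with the default AMDS$COMM_BUFFER_SIZE of 300000 bytes, integer arithmetic gives 2 * ((300000 / 512) + 512) = 2 * (585 + 512) = 2194 pages. A sketch of evaluating the formula in DCL (the symbol names are illustrative):

$ BUFFER_SIZE = 300000
$ NEEDED = 2 * ((BUFFER_SIZE / 512) + 512)  ! DCL integer division truncates
$ SHOW SYMBOL NEEDED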
5.5.3 Optimizing Performance with Hardware
Table 5–7 provides an approximate guideline for the number of nodes you can monitor when running DECamds on certain computer types.
Table 5–7 Monitoring Nodes

                                      Number of Nodes Monitored
Monitoring Computer Type          Observation Only    Observe and Fix
VAXstation 3100                   0-30                0-20
VAXstation 4000 Model 60          20-60               20-50
VAX 6000 (1)                      75-150              65-130
VAX 4000 Model 90                 75-150              65-130
DEC 3000 Model 400                0-50                0-50
DEC 4000 Model 620                0-70                0-70
DEC 7000 Model 720                Any number          Any number

(1) With DECwindows display directed to a workstation.
Follow these suggestions when choosing and configuring a console:
• Use fast hardware.
DECamds is compute- and memory-intensive, and its real-time DECwindows-based display compounds the load, so a faster CPU improves throughput and perceived response time.
• Use multiprocessors.
DECamds runs two processes: one handles calculations and display, and the other handles communications between the monitoring node and the remote nodes. On a multiprocessor, these processes compete less with the DECwindows server process for CPU time; on a single-processor system, they must all share one CPU.
• Run the monitoring portion of DECamds on a standalone system.
If a cluster is experiencing system resource problems, you can still use
DECamds.
A
Installing the DECamds Data Analyzer
This appendix explains how to install the DECamds Data Analyzer software on
OpenVMS Alpha and OpenVMS VAX Version 6.2 and later systems.
Beginning with OpenVMS Version 7.2, the Data Provider ships as part of the
OpenVMS installation. Installing or upgrading to OpenVMS Version 7.2 or later automatically installs the Data Provider on your system. You can run the Data
Provider on any VAX or Alpha Version 6.2 or later system.
Note
The Compaq Availability Manager web site might refer you to a more recent version of the Data Provider than the one on the current OpenVMS
VAX or Alpha operating system CD-ROM. Compaq recommends that you install the DECamds Data Provider software using the version indicated at the following URL: http://www.openvms.compaq.com/openvms/products/availman/
Section A.7 explains how to start using the Data Provider.
This appendix contains the following sections:
• General installation information
• Data Analyzer installation requirements
• Downloading the Data Analyzer software
• Installing Data Analyzer software from a PCSI kit
• Postinstallation tasks on Data Provider nodes
• Postinstallation tasks on the Data Analyzer node
• Starting to use the Data Provider
• Determining and reporting problems
• Running the Installation Verification Procedure (IVP) separately
A.1 General Installation Information
DECamds provides online release notes. Compaq strongly recommends that you read the release notes before proceeding with the installation. You can print the text file of the release notes from the following location:
SYS$HELP:AMDS072-1B.RELEASE_NOTES
DECamds consists of client and server software:
• The client software, the Data Analyzer, provides the graphical user interface to display DECamds information to users.
• The server software, the DECamds Data Provider (RMDRIVER), collects the data that DECamds analyzes and displays.
In earlier versions of OpenVMS, you needed to install both the Data Analyzer and Data Provider software on your system from the latest DECamds kit. Beginning with OpenVMS Version 7.2, you need to install only the Data Analyzer software on the system where you run the client, or graphical user interface. You need to do this to obtain the new library for DECamds Version 7.2 and later.
A.2 Data Analyzer Installation Requirements
This section provides a checklist of hardware and software requirements for installing the DECamds Data Analyzer. A typical installation takes approximately 5 to 10 minutes per node, depending on your type of media and system configuration.
• Hardware requirements
  – A workstation monitor. For any hardware configuration without a DECwindows Motif display device, use the DECwindows server to direct the display to a workstation or an X terminal.
  – 16 MB of memory for VAX systems and 32 MB for Alpha systems, for the Data Analyzer portion of DECamds.

  You should use a more powerful system as the number of nodes and the amount of collected data rise. Table A–1 shows general guidelines for the default Data Analyzer node. Note that the table does not preclude DECamds from running on a less powerful system than listed for the number of nodes being monitored.
Table A–1 Recommended System Requirements

Number of
Monitored Nodes   Recommended Alpha Hardware    Recommended VAX Hardware
1-30              DEC 3000 Model 400, 32 MB     VAXstation 3100, 16 MB
20-50             DEC 3000 Model 400, 64 MB     VAXstation 4000 Model 60
40-90             DEC 3000 Model 500            VAXstation 4000 Model 90
91 or more        DEC 4000 Model 620            VAX 6000-420
• Operating system version

  At least one of the following:
  OpenVMS VAX Version 6.2 or higher
  OpenVMS Alpha Version 6.2 or higher

• Display software

  DECwindows Motif for OpenVMS Version 1.1 or higher installed on the Data Analyzer node system.
• Privileges

  Operation                                  Privileges Needed
  Monitor only (read-only access)            OPER
  Implement fixes (write access)             OPER, CMKRNL
  Stop, start, reload, or restart the        OPER, CMKRNL, LOG_IO, SYSNAM, SYSPRV
  Data Provider node (includes changing
  security or group name)

  Note
  For OpenVMS Version 6.2 and later, if the Data Provider is running on the same node as the Data Analyzer node, you must also have either SYSPRV privilege or ACL access to the RMA0: device.
• Disk space
  – 3500 blocks on VAX systems
  – 4000 blocks on Alpha systems

  To determine the number of free disk blocks on the current system disk, enter the following command at the OpenVMS DCL prompt:

  $ SHOW DEVICE SYS$SYSDEVICE
• System parameter settings

  These settings are the same as those required for operating system installation. The Installation Verification Procedure (IVP) requires the following additional resources:

  GBLPAGFIL    1200
  WSMAX        16384

  You can modify WSMAX and GBLPAGFIL using the System Management utility (SYSMAN). See the OpenVMS System Manager's Manual for more information.
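For example, a sketch of one way to check and change these parameters with SYSMAN follows; parameters written with WRITE CURRENT take effect at the next reboot, and some parameters require a reboot regardless:

$ RUN SYS$SYSTEM:SYSMAN
SYSMAN> PARAMETERS USE CURRENT
SYSMAN> PARAMETERS SET WSMAX 16384
SYSMAN> PARAMETERS SET GBLPAGFIL 1200
SYSMAN> PARAMETERS WRITE CURRENT
SYSMAN> EXIT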
• Process account quotas (minimum)

  ASTLM       150
  BIOLM       51
  BYTLM       75000
  FILLM       20
  PRCLM       3
  PGFLQUO     25600
  TQELM       100
  WSEXTENT    16384
The default values of several quotas depend on the AMDS$COMM_READS logical name. If you are reinstalling the Data Analyzer, or have changed AMDS$COMM_READS, use the following formulas to determine the required minimum values:
ASTLM >= (AMDS$COMM_READS*3)
BIOLM >= (AMDS$COMM_READS+1)
BYTLM >= (AMDS$COMM_READS*1500)
TQELM >= (AMDS$COMM_READS*2)
User account quotas are stored in the file SYSUAF.DAT. Use the OpenVMS
Authorize utility (AUTHORIZE) to verify and change user account quotas.
For more information on modifying account quotas, see the description of the
Authorize utility in the OpenVMS system management documentation.
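As a worked example, with the default AMDS$COMM_READS value of 50, the formulas give ASTLM >= 150, BIOLM >= 51, BYTLM >= 75000, and TQELM >= 100, which match the minimum quotas listed earlier in this section. A sketch of verifying and raising quotas with the Authorize utility follows; the account name SMITH is illustrative:

$ RUN SYS$SYSTEM:AUTHORIZE
UAF> SHOW SMITH                 ! display current quotas for the account
UAF> MODIFY SMITH /ASTLM=150 /BIOLM=51 /BYTLM=75000 /TQELM=100
UAF> EXIT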
Note
On both Alpha and VAX systems, Compaq recommends that you perform a system disk backup before installing any software. Use the backup procedures that are established at your site. For details about performing a system disk backup, see the OpenVMS Backup utility documentation.
A.3 Obtaining the Data Analyzer Software
The Data Analyzer software is available on the OpenVMS operating system layered product CD-ROM or from the Compaq Availability Manager web site.
Follow these steps to download the software from the web:
1. From the Availability Manager home page, click Software Download. The Availability Manager home page is at the following URL:
   http://www.openvms.compaq.com/openvms/products/availman/
2. Complete the user survey, which allows you to proceed to the Download web page.
3. Click one or both of the DECamds executables:
   DECamds - Alpha: decamds0721b.pcsi-dcx_axpexe
   DECamds - VAX: decamds0721b.pcsi-dcx_vaxexe
4. Save the executable to a device and directory of your choice.
5. Run the executable and accept the default file name. The result will be:
   DECamds: DEC-VMS-AMDSV0702-1B-1.PCSI
The next section provides installation instructions for the Data Analyzer.
A.4 Installing Data Analyzer Software from a PCSI Kit
This section describes the installation procedure on OpenVMS Version 6.2 or later systems from a POLYCENTER Software Installation (PCSI) kit.
• Starting the installation

  Enter the OpenVMS DCL command PRODUCT, the name of the task to be performed, and the name of one or more products. For example, to install DECamds Version 7.2, enter the following command:

  $ PRODUCT INSTALL AMDS /SOURCE=device:[directory] /HELP

  where device:[directory] refers to the device and the directory where the kit is located.
For a description of the features you can request with the PRODUCT INSTALL command when starting an installation, such as running the IVP, purging files, and configuring the installation, see the POLYCENTER Software Installation Utility User's Guide.
As an installation procedure progresses, the system displays a percentage message to indicate how much of the installation is done. For example:
Percent Done: 15%
...30%
...46%
...62%
...76%
...92%
%PCSI-I-SUCCESS, operation completed successfully
If you started the installation using the /LOG qualifier, the system displays details of the installation.
•
Stopping and restarting the installation
Use the following procedure to stop and restart the installation:
1. To stop the procedure at any time, press Ctrl/Y.
2. Enter the PRODUCT REMOVE command to reverse any changes to the system that occurred during the partial installation. This deletes all files created up to that point and causes the installation procedure to exit.
3. Go back to the beginning of the installation procedure to restart the installation.
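For example, a sketch of the removal command, using the same product name as the PRODUCT INSTALL example in this section:

$ PRODUCT REMOVE AMDS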
•
Recovering from errors
If the installation procedure fails for any reason, the following message is displayed:
%POLYCENTER Software Installation utility
%INSTAL-E-INSFAIL, The installation of DECamds 7.2-1B has failed.
An error during the installation can occur if one or more of the following conditions exist:
• The operating system version is incorrect.
• The prerequisite software version is incorrect.
• Quotas necessary for successful installation are inadequate.
• Process quotas required by the POLYCENTER Software Installation utility are inadequate.
• The OpenVMS Help library is currently in use.
If you receive any error message beginning with %PCSI-E-INSTAL, refer to the OpenVMS DCL HELP/MESSAGE utility for POLYCENTER Software Installation information and a possible solution to the problem.
If the installation fails, you must restart the installation procedure. If the installation fails due to an IVP failure, contact a Compaq support representative.
Sample Installation on an OpenVMS Alpha System
Example A–1 shows a sample installation on an OpenVMS Alpha system. This sample was run on a system that had no previous version of DECamds installed.
Depending on which layered products you have on your system, you might see additional messages and questions when you perform your installation.
Example A–1 Sample OpenVMS Alpha Installation
$ product install amds
The following product has been selected:
DEC VMS AMDS V7.2-1B
Do you want to continue? [YES]
Layered Product
Configuration phase starting ...
You will be asked to choose options, if any, for each selected product and for any products that may be installed to satisfy software dependency requirements.
DEC VMS AMDS V7.2-1B: DECamds (Availability Manager for Distributed
Systems) V7.2-1B
COPYRIGHT © 1994, 1995, 1999 -- All rights reserved
Compaq Computer Corporation
License and Product Authorization Key (PAK) Information
Do you want the defaults for all options? [YES]
DECamds Data Provider Installation Verification Procedure
DECamds Startup File
DECamds Logicals Customization File
DECamds Data Provider Security Access File
DECamds Data Analyzer Security Access File
DECamds Data Analyzer Installation Verification Procedure (IVP)
IVP may fail due to the following PQL values being too low:
PQL_MASTLM, PQL_MBIOLM, PQL_MTQELM, or PQL_MBYTLM
See the file AMDS$SYSTEM:AMDS$PCSI_IVP_OUTPUT.LOG for help on failure.
Do you want to review the options? [NO]
Execution phase starting ...
The following product will be installed to destination:
DEC VMS AMDS V7.2-1B DISK$ALPHA_V72:[VMS$COMMON.]
Portion done: 0%...20%...30%...40%...50%...60%...70%...80%...90%...100%
The following product has been installed:
DEC VMS AMDS V7.2-1B Layered Product
%PCSI-I-IVPEXECUTE, executing test procedure for DEC VMS AMDS V7.2-1B ...
%PCSI-I-IVPSUCCESS, test procedure completed successfully
DEC VMS AMDS V7.2-1B: DECamds (Availability Manager for Distributed
Systems) V7.2-1B
This product requires the following SYSGEN parameters:
GBLPAGES add 1172
A.5 Postinstallation Tasks on Data Provider Nodes
Perform the following tasks after installing DECamds on Data Provider nodes:
1. If you have not already done so, read the release notes.
2. Modify user accounts.
   Users who maintain the security or group name files or load new versions of the driver need the privileges associated with the driver startup procedure.
3. Add AMDS$STARTUP.COM to the node's startup and shutdown procedures to provide for automatic startup and shutdown of the Data Provider driver when a node is booted or shut down.
   Add the following command line to SYS$MANAGER:SYSTARTUP_VMS.COM:
   $ @SYS$STARTUP:AMDS$STARTUP.COM START
   Also, edit SYSHUTDWN.COM to add the following line:
   $ @SYS$STARTUP:AMDS$STARTUP.COM STOP
4. Modify default security files.
   To implement fixes, which require write access, the security files must be modified. The Data Provider security file contains a list of three-part codes representing Data Analyzer nodes that have read or write access to that node. Refer to Section 1.3 for complete instructions about designing security files.
5. Assign a node to a group. See Section 1.3.2.1.
6. Start DECamds (the Data Provider).
   Even though the IVP starts and stops the driver, you must start the Data Provider driver by entering the following command on each node:
   $ @SYS$STARTUP:AMDS$STARTUP.COM START
Note
Starting, stopping, or reloading DECamds (the AMDS$STARTUP.COM
procedure) requires at least TMPMBX, NETMBX, SYSNAM, LOG_IO, and
CMKRNL privileges. Use the OpenVMS Authorize utility (AUTHORIZE) to determine whether users have the required privileges and then make adjustments as needed.
A.5.1 Starting, Stopping, and Reloading DECamds
To start and stop the Data Provider driver, enter the following command. (Use this command if a node will be used to both provide and collect system data.)
$ @SYS$STARTUP:AMDS$STARTUP.COM [parameter]

where the optional parameter is one of the following:

NODRIVER   Defines the default input and output logicals on the Data Analyzer node. Use this parameter on a Data Analyzer node where the Data Provider driver is not running. It is the default.
START      Starts the Data Provider driver.
STOP       Stops the Data Provider driver.
RELOAD     Loads a new Data Provider driver. Use this parameter when installing a new version of DECamds.
Note
If you use the OpenVMS Snapshot Facility, stop the DECamds Data
Analyzer and Data Provider node drivers before taking a system snapshot.
A.6 Postinstallation Tasks on a Data Analyzer Node
Perform the following tasks after installing the DECamds Data Analyzer:
1. If you were previously running an earlier version of DECamds, check the differences between the .DAT or .COM files on your system and the associated .TEMPLATE files provided with the new kit. Change your existing files as necessary.

   Note
   The new .TEMPLATE files may contain important changes. However, to avoid altering your customizations, the upgrade procedure does not modify your existing customized versions of these files. Check the new .TEMPLATE versions of these files provided with the kit, and make the appropriate changes to your files.

2. Modify default DECamds security files on each Data Analyzer node.
   The security files must be modified to implement fixes (fixes require write access). Refer to Section 1.3 for complete instructions about designing security files.
3. Define the system directory logical name AMDS$SYSTEM.
   To define the logical name AMDS$SYSTEM on systems running the Data Analyzer but not the Data Provider, enter the following command:
   $ @SYS$STARTUP:AMDS$STARTUP.COM NODRIVER
   This command requires SYSNAM privilege. The NODRIVER parameter specifies that the procedure is to define the input and output logical names in AMDS$LOGICALS.COM.
4. Modify user accounts as needed.
   To use DECamds, user accounts require certain privileges and quotas:
   • Using the Data Analyzer node for data collection (read access) requires TMPMBX, NETMBX, and OPER privileges.
   • Performing fixes (write access) requires the CMKRNL privilege in addition to TMPMBX, NETMBX, and OPER.
   • Using AMDS$STARTUP.COM to start, stop, or reload the Data Provider requires at least TMPMBX, NETMBX, SYSNAM, LOG_IO, and CMKRNL privileges.
5. Start the application.
   For example, the following command starts DECamds with all input files read from AMDS$SYSTEM and all output files written to your login directory (SYS$LOGIN). Only data from group A nodes and group B nodes is displayed.
   $ AVAIL /CONFIGURE=AMDS$SYSTEM /LOG_DIRECTORY=SYS$LOGIN -
   _$ /GROUP=(GROUP_A, GROUP_B)
   See Chapter 2 for startup options.
A.7 Starting to Use the Data Provider
Before starting to use the Data Provider, you need to move and remove several files to make the Data Provider RMDRIVER part of OpenVMS.
Move these Files
Move the following files:
File                      Old Directory Location    New Directory Location
AMDS$DRIVER_ACCESS.DAT    SYS$COMMON:[AMDS]         SYS$COMMON:[SYSMGR]
AMDS$LOGICALS.COM         SYS$COMMON:[AMDS]         SYS$COMMON:[SYSMGR]
These new directory locations should not affect previous copies of
AMDS$DRIVER_ACCESS.DAT that are in the AMDS$SYSTEM directory because the AMDS$SYSTEM logical is now a search list for
SYS$COMMON:[AMDS] and SYS$COMMON:[SYSMGR]. Previous copies of the files will still be valid; however, new copies of the files will be placed in the new locations.
Delete this File
Also, because the installation replaces the following file, remove it from your system:
SYS$COMMON:[AMDS]AMDS$RMCP.EXE
Data Provider Commands
To start to use the Data Provider, perform either of the following tasks:
• Run the SYS$STARTUP:AMDS$STARTUP START command procedure at the OpenVMS DCL prompt ($).
• Add the @SYS$STARTUP:AMDS$STARTUP START command to the
SYSTARTUP_VMS.COM command file in the SYS$MANAGER directory.
A.8 Determining and Reporting Problems
If you encounter a problem while using DECamds, report the problem to Compaq.
Depending on the nature of the problem and the type of support you have, take one of these actions:
• If your software contract or warranty agreement entitles you to telephone support, contact a Compaq support representative.
• If the problem is related to the DECamds documentation, see the Preface of this manual for instructions.
A.9 Running the Installation Verification Procedure Separately
Usually the Installation Verification Procedure (IVP) runs during installation.
Should system problems occur after you install DECamds, check the integrity of installed files by executing the following command procedure:
$ @SYS$TEST:AMDS$IVP.COM
The IVP leaves the Data Provider in the same state in which it was found. For example, if the Data Provider is running when the IVP begins, the IVP stops it and then restarts it.
B
DECamds Files and Logical Names
The DECamds Data Analyzer installation procedure installs files and defines logical names to customize the environment.
The installation procedure defines all logical names in executive mode in the system table (with the /SYSTEM /EXECUTIVE qualifiers). However, you can define logical names in the job or group tables, which are translated before the system definitions.
Table B–1 and Table B–2 explain the files installed and logical names defined with the Data Analyzer.
Table B–3 and Table B–4 explain the files installed and logicals defined on each node running the Data Provider.
Logical names are added to the logical name table when the
AMDS$LOGICALS.COM procedure is invoked by AMDS$STARTUP.COM.
Note
Logical names can be a search list of other logicals.
The logical names in Table B–2 and Table B–4 must be defined in the job, group, or system table. If you change the name, define the new logical in the job, group, or system table.
B.1 Files and Logical Names for the Data Analyzer Node
Table B–1 lists the files created on a Data Analyzer node when DECamds is installed, and Table B–2 lists the logical names defined for the Data Analyzer.

Table B–1 Files on the Data Analyzer Node

Directory-Logical:File-Name                Function
AMDS$HELP:AMDS$HELP.HLB                    Help library
AMDS$CONFIG:AMDS$*.DAT                     Customization files
AMDS$SYSTEM:AMDS073.RELEASE_NOTES          Product release notes
AMDS$CONFIG:AMDS$COMM.EXE                  Communication image
AMDS$SYSTEM:AMDS$CONSOLE.EXE               Data Analyzer image
AMDS$CONFIG:AMDS$CONSOLE.UID               User interface description file
AMDS$CONFIG:AMDS$CONSOLE_ACCESS.DAT (1)    Data Analyzer security file
SYS$MANAGER:AMDS$LOGICALS.COM (1)          Logical name definition file
AMDS$SYSTEM:AMDS$VMS*-*.LIB                DECamds version-specific libraries
AMDS$TEST:AMDS$IVP.COM                     Installation verification procedure
SYS$STARTUP:AMDS$STARTUP.COM               DECamds startup file

(1) Can be provided as a .TEMPLATE file, depending on whether the file was found during installation.
Table B–2 Logical Names Defined for the Data Analyzer

AMDS$COMM_BUFFER_SIZE
    The size (in bytes) of the communications buffer between the AMDS$CONSOLE process and the AMDS$COMM process.
    Default: 300000 bytes

AMDS$COMM_READS
    The number of read aheads posted by the DECamds communications process (AMDS$COMM) to handle the delivery of remote response packets from the Data Provider to the Data Analyzer node.
    Default: 50 read aheads

AMDS$COMM_PKT_RETRY
    The number of retries before quitting and issuing a "delivery path lost" message.
    Default: 4

AMDS$COMM_PKT_TMOUT
    The timeout period (in seconds) for packet retry for the Data Analyzer.
    Default: 10

AMDS$CONFIG
    The device and directory location for the following DECamds input files: AMDS$APPLIC_CUSTOMIZE.DAT, AMDS$COMM.EXE, AMDS$CONSOLE.UID, AMDS$CONSOLE_ACCESS.DAT, AMDS$VMS*-*.LIB, and all customization files (AMDS$*_DEFS.DAT).
    Default: AMDS$SYSTEM

AMDS$DPI
    The DPI value of your display device.
    Default: 75 or 100

AMDS$LOG
    The device and directory location for the DECamds output files AMDS$EVENT_LOG.LOG and AMDS$LOCK_LOG.LOG.
    Default: AMDS$SYSTEM
B.2 Files and Logical Names for Data Provider Nodes
Table B–3 lists the files created on a node when the Data Provider is installed, and Table B–4 lists the logical names defined on such a node.

Table B–3 Files on Nodes Running the Data Provider

Directory-Logical:File-Name                    Function
SYS$MANAGER:AMDS$DRIVER_ACCESS.DAT (1)         Data Provider security file
SYS$MANAGER:AMDS$LOGICALS.COM (1)              Logical name definition file
AMDS$SYSTEM:RMCP.EXE                           Management interface to the Data Provider
SYS$HELP:AMDS072-1B.RELEASE_NOTES              Product release notes
SYS$HELP:AMDS$HELP.HLB                         Help library
SYS$LOADABLE_IMAGES:RMDRIVER.EXE,              Data Provider (VAX systems)
SYS$LOADABLE_IMAGES:RMDRIVER.STB (2)
SYS$LOADABLE_IMAGES:SYS$RMDRIVER.EXE,          Data Provider (Alpha systems)
SYS$LOADABLE_IMAGES:SYS$RMDRIVER.STB (3)
SYS$STARTUP:AMDS$STARTUP.COM                   DECamds startup file
SYS$TEST:AMDS$IVP.COM                          Installation verification procedure

(1) Can be provided as a .TEMPLATE file, depending on whether the file was found during installation.
(2) VAX specific.
(3) Alpha specific.
Table B–4 Logical Names Defined on Nodes Running the Data Provider

AMDS$CONFIG
    The device and directory location for the DECamds input file AMDS$DRIVER_ACCESS.DAT.
    Default: AMDS$SYSTEM

AMDS$DEVICE
    This logical is translated as the first LAN device to which the Data Provider or Data Analyzer node attempts to connect. The attempts are made in this order: AMDS$DEVICE, FXA0, XEA0, XQA0, EFA0, ETA0, ESA0, EXA0, EZA0, FCA0, ECA0. If your LAN line is not in this list, use AMDS$DEVICE. If the Data Analyzer and Data Provider run on the same node, *RMA0 is used.
    Default: Undefined

AMDS$GROUP_NAME
    The group to which the node is assigned; choose an alphanumeric string of up to 15 characters. The group name is defined on the node running the Data Provider and is used by the Data Analyzer node to display nodes in the System Overview window.
    Default: DECAMDS

AMDS$NUM_DL_READS
    The number of data link reads to be posted by the Data Provider as read-ahead buffers. Generally, a value between 4 and 8 is sufficient to allow the Data Provider to process without having to wait for a data link buffer to be cleared. (1)
    Default: 5 data link reads

AMDS$RM_DEFAULT_INTERVAL
    The number of seconds between multicast hello messages from the Data Provider to the Data Analyzer node when the Data Provider is inactive or only minimally active. The minimum value is 15; the maximum value is 300.
    Default: 30

AMDS$RM_OPCOM_READ
    When defined as TRUE, allows OPCOM messages for read failures from the Data Provider. When defined as FALSE, the message facility is disabled.
    Default: TRUE

AMDS$RM_OPCOM_WRITE
    When defined as TRUE, allows OPCOM messages for write (fix) successes and failures from the Data Provider. When defined as FALSE, the message facility is disabled.
    Default: TRUE

AMDS$RM_SECONDARY_INTERVAL
    The number of seconds between multicast hello messages from the Data Provider to the Data Analyzer node when the Data Provider is active. The minimum value is 15; the maximum value is 1800.
    Default: 90

(1) Each read request requires 1500 bytes of BYTCNT quota for the starting process.
B.3 Log Files
The DECamds Data Analyzer records two log files:
• An events log file named AMDS$EVENT_LOG.LOG. This ASCII text file records all event messages displayed in the Event Log window.
• A lock contention log file named AMDS$LOCK_LOG.LOG. This ASCII text file records all lock contention information displayed in the Lock Contention window.
Both log files are created when the DECamds application is started. Either file can be edited while the application is running.
Event Log File and Lock Log File Enhancements
Prior to Version 7.2, the Event Log File and Lock Log File were created with a default creation size of 1 block and a default extension size of 1 block. This sometimes resulted in a very fragmented log file (and disk) when DECamds was allowed to run for a long period of time.
Two new logicals in the AMDS$LOGICALS.COM file allow you to control the allocation and extension sizes of the log files. The following table describes these logicals and their default values.

Logical                    Description                                      Default Value
AMDS$EVTLOG_ALLOC_SIZE     Sets the initial allocation size of the          100 blocks
                           log files.
AMDS$EVTLOG_EXTNT_SIZE     Sets the extension size used when a log          0 blocks
                           file needs to grow.
The default value for AMDS$EVTLOG_EXTNT_SIZE causes DECamds to use the system defaults for extent size.
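For example, to preallocate a larger log file and further reduce fragmentation, you might define the allocation size before starting the Data Analyzer. The installation defines DECamds logicals in executive mode in the system table, so a matching definition could look like the following sketch; the value 1000 is illustrative:

$ DEFINE/SYSTEM/EXECUTIVE_MODE AMDS$EVTLOG_ALLOC_SIZE "1000"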
B.4 Event Log File
The event log file keeps a record of the events detected by DECamds. You can review it without a DECwindows terminal. Every 30 minutes, DECamds writes a message to the file, noting the date and time.
Example B–1 is an example of AMDS$LOG:AMDS$EVENT_LOG.LOG.
Example B–1 Sample Event Log File
Time Sev Event
Opening DECamds Event Log on date/time: 11:16:07.00
11:16:07.98 0 CFGDON, PROD12 configuration done
11:16:08.44 0 CFGDON, PROD09 configuration done
11:16:09.65 0 CFGDON, AXPND1 configuration done
11:16:11.47 0 CFGDON, PROD01 configuration done
11:16:11.89 0 CFGDON, VAXND1 configuration done
11:16:12.14 0 CFGDON, PROD15 configuration done
11:16:14.02 0 CFGDON, PROD14 configuration done
11:16:14.57 60 HIDIOR, PROD12 direct I/O rate is high
11:16:14.57 70 HITTLP, PROD12 total page fault rate is high
11:16:14.57 80 LOMEMY, PROD12 free memory is low
11:16:14.58 70 HITTLP, PROD09 total page fault rate is high
11:16:14.58 80 LOMEMY, PROD09 free memory is low
11:16:15.32 70 HITTLP, AXPND1 total page fault rate is high
11:16:25.33 60 HIBIOR, PROD09 buffered I/O rate is high
11:16:35.46 60 HIBIOR, AXPND1 buffered I/O rate is high
11:16:40.62 95 LOSWSP, AXPND1 DISK$ALPHAVMS015:[SYS0.SYSEXE]SWAPFILE.SYS swap file space is low
11:16:49.84 70 HITTLP, PROD09 total page fault rate is high
11:16:55.14 60 HIBIOR, PROD12 buffered I/O rate is high
11:17:14.58 0 CFGDON, PROD05 configuration done
11:17:14.94 70 HITTLP, PROD09 total page fault rate is high
11:17:16.93 0 CFGDON, PROD04 configuration done
11:17:18.10 0 CFGDON, PROD17 configuration done
11:17:18.15 0 CFGDON, PROD10 configuration done
11:17:19.50 60 HIBIOR, PROD10 buffered I/O rate is high
11:17:19.50 60 HIDIOR, PROD10 direct I/O rate is high
11:17:19.50 70 HITTLP, PROD10 total page fault rate is high
11:17:19.50 80 LOMEMY, PROD10 free memory is low
11:17:20.33 60 HIBIOR, PROD05 buffered I/O rate is high
11:17:21.49 0 CFGDON, PROD20 configuration done
11:17:21.52 0 CFGDON, PROD13 configuration done
11:17:24.96 0 CFGDON, PROD06 configuration done
11:17:35.35 0 CFGDON, PROD07 configuration done
11:17:39.84 60 HINTER, PROD07 interrupt mode time is high
11:17:40.21 70 HITTLP, PROD09 total page fault rate is high
11:18:04.69 60 HIBIOR, PROD10 buffered I/O rate is high
11:18:05.36 60 HIDIOR, PROD07 direct I/O rate is high
11:18:10.49 60 HIBIOR, PROD09 buffered I/O rate is high
11:18:10.49 60 HIDIOR, PROD09 direct I/O rate is high
11:18:14.70 60 HIBIOR, PROD12 buffered I/O rate is high
11:18:15.68 60 HIBIOR, AXPND1 buffered I/O rate is high
11:18:26.05 60 HIBIOR, PROD05 buffered I/O rate is high
11:18:40.57 75 HIHRDP, PROD10 hard page fault rate is high
11:18:45.80 60 HIDIOR, PROD09 direct I/O rate is high
11:18:55.91 60 HINTER, PROD07 interrupt mode time is high
11:19:09.67 60 HIBIOR, PROD09 buffered I/O rate is high
11:19:09.67 60 HIDIOR, PROD09 direct I/O rate is high
11:19:09.67 75 HIHRDP, PROD09 hard page fault rate is high
11:19:15.48 60 HIBIOR, PROD05 buffered I/O rate is high
B.5 Lock Contention Log File
Example B–2 is an example of a Lock Contention Log File.
Example B–2 Sample Lock Contention Log File
***********************************************
Time: 9-JUL-2000 14:23:46.68
Master Node: AXPND1
Resource Name: QMAN$JBC_ALIVE_01
Parent Resource Name: QMAN$MSR_$10$DKA300.....ñ.....
RSB Address: 805B1400, GGMODE: EX, CGMODE: EX
Hex Representation
514D414E 244A4243 (Bytes 0 - 7)
5F414C49 56455F30 (Bytes 8 - 15)
31000000 00000000 (Bytes 16 - 23)
00000000 000000C0 (Bytes 24 - 31)
Status: VALID
***********************************************
Time: 9-JUL-2000 14:28:42.44
Resource Name: QMAN$JBC_ALIVE_01
Parent Resource Name: QMAN$MSR_$10$DKA300.....ñ.....
Blocking Lock Data
Node: AXPND1, PID: 2020008C, Name: JOB_CONTROL
LKID: 0200015E, GR Mode: EX
Flags: NOQUEUE
Local Copy
Blocked Lock on WAITING queue
Node: AXPND1, PID: 2020008D, Name: QUEUE_MANAGER
LKID: 2000013B, RQ Mode: CR
Flags: NODLCKW
Local Copy
***********************************************
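The Hex Representation block in the first record above is the 32-byte resource-name buffer printed as eight hex longwords; its leading bytes spell out the resource name in ASCII. A quick illustrative check (this decoding is ours, not DECamds output):

```python
# Decode the 32-byte resource-name buffer shown in Example B-2.
# Concatenating the eight hex longwords and reading the bytes as
# ASCII up to the first NUL recovers the resource name.
hex_words = [
    "514D414E", "244A4243",   # bytes 0-7
    "5F414C49", "56455F30",   # bytes 8-15
    "31000000", "00000000",   # bytes 16-23
    "00000000", "000000C0",   # bytes 24-31 (trailing flag bytes, not text)
]
raw = bytes.fromhex("".join(hex_words))
name = raw.split(b"\x00", 1)[0].decode("ascii")
print(name)  # QMAN$JBC_ALIVE_01
```

The same trailing non-ASCII bytes are what render as stray characters in the Parent Resource Name lines of the log.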
B.6 OPCOM Log
The following examples show some of the OPCOM messages that appear in the operator log file from the Data Provider:
%%%%%%%%%%% OPCOM 2-JAN-2000 08:16:21.92 %%%%%%%%%%%
Message from user RMDRIVER
RMA0: - No privilege to access from node 2.2
This message means that the requesting node does not have the privilege to perform a read operation.
%%%%%%%%%%% OPCOM 2-JAN-2000 10:10:45.08 %%%%%%%%%%%
Message from user RMDRIVER
RMA0: - No privilege to write from node 2.2
This message means that the Data Provider does not have the privilege to perform a write operation.
%%%%%%%%%%% OPCOM 2-JAN-2000 12:35:05.28 %%%%%%%%%%%
Message from user RMDRIVER
RMA0: - Process 2390003c modified from node 2.2
This message means that the Data Provider has successfully performed a write operation on the node.
Glossary
Following is an alphabetical listing of terms used in this manual and their definitions.
automatic data collection
Data collection that begins automatically when the Data Analyzer runs and recognizes a Data Provider. By default, this feature is enabled.
The default data windows for which automatic collection is enabled are:
Node Summary
Page/Swap File Summary
Lock Contention Summary
Cluster Transition Summary
Automatic Event Investigation
Enhances the speed with which you can pursue a specified event. When this option is enabled, DECamds automatically collects follow-up data on the event.
When this option is disabled, you must initiate follow-up data collection when an event occurs.
To enable automatic event investigation, choose Enable Automatic Event Investigation from the Control menu of the System Overview or Event Log window. To disable it, choose the Disable Automatic Event Investigation menu item.
This feature does not apply to any lock contention events. To enable automatic lock contention detection, use the DECamds Application Customizations dialog box, as explained in Section 5.1.
collection interval
The frequency at which the Data Analyzer sends requests to a Data Provider to collect data.
See also Data Analyzer, Data Provider.
Data Analyzer
The portion of DECamds that collects and displays system data from Data Provider nodes. You can also perform fixes with the Data Analyzer.
See also Data Provider, fix.
Data Provider
The portion of DECamds that is installed to provide system data when requested by authorized Data Analyzers. A Data Provider node uses the OpenVMS LAN drivers to receive and send data across the network.
See also Data Analyzer.
data window
A Data Analyzer window that displays additional data. A number of different data windows are available as follows (see also Chapter 3):
CPU Modes Summary
CPU Summary
Disk Status Summary
Volume Summary
Single Disk Summary
Lock Contention Summary
Memory Summary
Node Summary
Page/Swap File Summary
Process I/O Summary
Single Lock Summary
Single Process Summary
Cluster Transition/Overview Summary
System Communication Architecture Summary
NISCA Summary
event
A description of a potential resource availability problem, based on rules defined by the Data Analyzer and customized thresholds. Events trigger display changes in data windows such as color and item highlighting.
See also Data Analyzer, data window.
Event Log window
One of two primary Data Analyzer windows that displays events as they occur.
For each event, you can display more detailed information to investigate the underlying problem by double-clicking on the event. You can also perform fixes for some events from this window.
See also System Overview window.
fix
A corrective action made to a Data Provider node but initiated from the Data Analyzer node.
group
A set of remote Data Provider nodes with similar attributes; for example, all the members of an OpenVMS Cluster can be in the same group. The group that a node belongs to is determined by the translation of the AMDS$GROUP_NAME logical on each Data Analyzer.
occurrence value
The number of consecutive data samples that must exceed the event threshold before an event is signaled.
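The occurrence value amounts to a simple consecutive-sample counter: an event is signaled only after the threshold has been exceeded on that many samples in a row. The sketch below illustrates the idea (the function and its names are ours, not DECamds code):

```python
def exceeds_occurrence(samples, threshold, occurrence):
    """Return True once `occurrence` consecutive samples exceed `threshold`."""
    consecutive = 0
    for value in samples:
        # A sample at or below the threshold resets the run.
        consecutive = consecutive + 1 if value > threshold else 0
        if consecutive >= occurrence:
            return True
    return False

# A single spike does not signal; three consecutive high samples do.
exceeds_occurrence([5, 95, 5, 5], threshold=90, occurrence=3)    # False
exceeds_occurrence([5, 95, 96, 97], threshold=90, occurrence=3)  # True
```

Raising the occurrence value therefore filters out transient spikes at the cost of signaling genuine problems a few collection intervals later.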
page
A unit used by the operating system to section memory. On VAX systems, a page is 512 bytes. On Alpha systems, a page can be 8 kilobytes (8192 bytes), 16 KB, 32 KB, or 64 KB.
pagelet
A unit used by the OpenVMS Alpha operating system to break down the page into smaller addressable units. One pagelet is the same as a VAX page: 512 bytes.
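The page and pagelet definitions above reduce to simple arithmetic — an Alpha page is a power-of-two multiple of the 512-byte pagelet:

```python
# Pagelet arithmetic from the glossary definitions: one pagelet is
# 512 bytes (one VAX page); an Alpha page is some multiple of that.
PAGELET = 512
for page_bytes in (8192, 16 * 1024, 32 * 1024, 64 * 1024):
    print(page_bytes, "byte page =", page_bytes // PAGELET, "pagelets")
# An 8 KB Alpha page is 16 pagelets; a 64 KB page is 128 pagelets.
```

This is why utilities that report sizes in pagelets show values 16 (or more) times larger on an Alpha system than the same sizes expressed in pages.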
security triplet
A three-part access code located in the AMDS$DRIVER_ACCESS.DAT and AMDS$CONSOLE_ACCESS.DAT files that enables communications to be established between the Data Analyzer and Data Provider.
System Overview window
One of two primary Data Analyzer windows that graphically displays groups and the nodes that belong to each group. The System Overview window provides summary data about CPU, Memory, Process I/O usage, Number of Processes in CPU Queues, Operating System Version, and Hardware Model for the nodes being monitored.
Index
A
Access for Data Analyzers, 1–4 for Data Providers, 1–4 read-only, 1–9
Account field, 3–20
Adjust Working Set fix, 4–6
AMDS$*_DEFS.DAT files, B–1
AMDS$APPLIC_CUSTOMIZE.DAT file, 5–2
AMDS$COMM.EXE file, B–1
AMDS$COMMBUFOVF logical name, 5–20
AMDS$COMM_BUFFER_SIZE logical name,
5–20, B–2
AMDS$COMM_READS logical name, 5–20, A–3,
B–2
AMDS$CONFIG logical name, B–2, B–3
AMDS$CONSOLE.EXE file, B–1
AMDS$CONSOLE.UID file, B–1
AMDS$CONSOLE_ACCESS.DAT file, 1–4, B–1
AMDS$DEVICE logical name, B–3
AMDS$DRIVER_ACCESS.DAT file, 1–4, B–3 example, 1–8
AMDS$EVENT_LOG.LOG file, 2–7, B–4
AMDS$EVTLOG_ALLOC_SIZE
DECamds logical, B–4
AMDS$EVTLOG_EXTNT_SIZE
DECamds logical, B–4
AMDS$GROUP_NAME logical name, 1–8, B–3
AMDS$IVP.COM file, B–2, B–3
AMDS$LOCK_LOG.LOG file, B–4
AMDS$LOGICALS.COM file, 1–8, B–1, B–3
AMDS$LOG logical name, B–2
AMDS$NUM_DL_READS logical name, B–3
AMDS$RM_DEFAULT_INTERVAL logical name,
B–3
AMDS$RM_OPCOM_READ logical name, B–4
AMDS$RM_OPCOM_WRITE logical name, B–4
AMDS$RM_SECONDARY_INTERVAL logical name, B–4
AMDS$STARTUP.COM file, 1–8, A–8, B–2
AMDS$THRESHOLD_DEFS.DAT file, 5–11
AMDS$VMS*-*.LIB files, B–2
AMDS073.RELEASE_NOTES, B–1, B–3
Assigning a node to a group, 1–8
ASTLM (AST limit) quota current count, 3–23
Automatic data collection, 5–3 event investigation, 5–3 fix, 4–10 investigation for events, 5–4 investigation for locks, 5–4
Availability messages setting broadcast intervals, 1–10
AVAIL command, 2–1
B
Bell filter, 5–6
Bell volume, 5–6
Blocks number free on a volume, 3–6 number used on a volume, 3–6 percentage number used on a volume, 3–6
Broadcast intervals, B–3 setting for node availability messages, 1–10
Buffered I/O, 3–22 byte limit (BYTLM), 3–24 field, 3–13 limit (BIOLM), 3–23
Limit Remaining field, 3–13 rate, 3–13 rate display, 2–4 wait state, 3–23
Byte limit remaining for process I/O, 3–13
C
Change process priority fix, 4–6
Choosing data categories, 2–6
Classes
CPU, 5–12 customizing, 5–11
I/O, 5–12 memory, 5–12 thresholds, 5–12
Cluster hung, 4–3
Cluster Transition/Overview Summary window,
3–30 to 3–33
Collecting data automatic at startup, 5–3 by category, 2–6 by event, 2–6 choosing a data category, 2–6 default, 5–3 for events, 5–3 for lock events, 5–3 options, 5–3 recommendations to handle heavy workloads,
5–18 single node or group, 2–6 stopping, 2–6
Collection intervals changing at window level, 5–16, 5–20 globally, 5–16 collection factor, 5–17 default values, 5–17 display factor, 5–17
Event, 5–17 factor setting, 5–3
NoEvent, 5–17
Collect menu, 2–6
Command qualifiers, 2–1
Communications buffer, 5–20
Compaq DECamds
See DECamds
Compute wait state, 3–23
/CONFIGURE qualifier, 2–1
Control wait state, 3–23
Conversion locks, 3–29
Corrective action, 2–9
CPU Modes Summary window, 3–14 to 3–15
CPU queues display of number of processes in, 2–4
CPUs (central processing units) capability, 3–15 classes, 5–12
CPU identifier (ID), 3–15
CPU Modes field, 3–11
CPU Process State Queues field, 3–11 default data collection, 5–3 execution rates, 3–22 filter categories, 5–8 filtering data, 5–7 improving performance by suspending, 4–10 response, 5–13 load in gathering data, 2–6 modes, 3–11, 3–15 number active on a node, 3–11 peak usage, 3–15 percentage used, 3–15 process identifier (PID), 3–15, 3–16 name in, 3–15
priority, 3–16 state queues, 3–11 time, 3–16 program counter, 3–20 setting process priorities, 4–7 state, 3–15, 3–16 time rate, 3–15, 3–16 time limit for single process modes, 3–23 usage, 3–14, 3–15 display of, 2–4 wait state, 3–16
CPU Summary window, 3–15 to 3–16
Crash Node fix, 4–7
Customizing automatic investigation when events detected,
5–3 collection interval factor, 5–3, 5–20 default settings, 5–1 effect on performance, 5–20 hiding node names on startup, 5–3 highlighting event data, 5–3 security files, 1–7 template customization files, 5–2
D
Data display default, 2–5 event details, 2–9 link, 5–20 transfer security, 1–4
Data Analyzer, 1–2 access for Data Providers, 1–4
AMDS$CONSOLE_ACCESS.DAT file, 1–4 data exchange with Data Provider, 1–8 files used for, B–1 log files, B–4 security file, A–8 setting up after installing, A–8 starting, 2–1 to 2–2, A–9 system directory definition, A–8 typical setup, 1–2 user account privileges and quotas, A–8
Data collection
See Collecting data
Data Provider, 1–2 access for Data Analyzers, 1–4
AMDS$DRIVER_ACCESS.DAT file, 1–4 data exchange with Data Analyzer, 1–8 files, B–3 restarting, 1–8 restrictions, 1–2 security file, A–7 setting up after installation, A–7
starting, A–8
Data windows
See also specific window names
Cluster Transition/Overview Summary, 3–30
CPU Modes Summary, 3–14
CPU Summary, 3–15
Disk Status Summary, 3–2 hierarchy, 3–1
Lock Contention Summary, 3–25
Memory Summary, 3–17
NISCA Summary, 3–36
Node Summary, 3–10 overview, 3–1
Page/Swap File Summary, 3–8
Process I/O Summary, 3–12
Single Disk Summary, 3–6
Single Lock Summary, 3–28
Single Process Summary, 3–19
System Communications Architecture (SCA)
Summary, 3–33
Volume Summary, 3–5
DECamds
Data Analyzer system requirements, A–2 installation requirements, 1–3 log file enhancements, B–4 overview, 1–1 processing model, 1–2 security features, 1–3 starting, 2–1 to 2–2, A–8
DECamds Data Analyzer installation requirements, A–2 preparing for installation, A–1 system requirements, A–2
DECwindows Motif used on DECamds, 1–2
Defaults automatic data collection, 5–3 collection intervals, 5–17 customizing, 5–1 data display, 2–5 event color, 5–3 event highlighting, 5–3 event investigation, 5–3 lock event collect state, 5–3 options, 2–5 setting default data collection, 5–3
Deleting events, 2–10
Device name field, 3–3, 3–6
DIO
See Direct I/O
DIOLM (Direct I/O limit), 3–23
Direct I/O
DIO rate field, 3–13
Limit Remaining field, 3–13 use display, 2–4 wait state, 3–23
Disk error messages, 3–4
Disk space required for installation, A–3
Disk Status Summary window, 3–2 to 3–4
Disk volumes error messages, 3–6
Displaying default data, 2–5 event data, 2–9 options, 2–5
Display software installation requirements, A–2
DSKERR error message, 3–4
DSKINV error message, 3–4
DSKMNV error message, 3–4
DSKOFF error message, 3–4
DSKQLN error message, 3–6
DSKRWT error message, 3–4
DSKUNA error message, 3–4
DSKWRV error message, 3–4
Duration field, 3–26, 3–29
E
ENQLM (enqueue limit) job quotas in use, 3–23
Error messages
CPU, 3–16 disk status, 3–4 disk volume, 3–6 lock contention, 3–27 memory, 3–18 node, 3–11 page/swap file, 3–9 process I/O, 3–13 single lock, 3–29 single process, 3–24
Errors field, 3–4
Escalation severity filter, 5–6 time filter, 5–6
Event Log window, 2–7 deleting events from, 2–10 display fields, 2–7 filters, 5–6 menus, 2–8 using, 2–7
Events
See also Event Log window automatic investigation, 5–4 bell filter, 5–6 bell volume, 5–6 changing default highlighting, 5–3 severity filter, 5–7, 5–11 corrective action, 2–9 creating thresholds for different computer classes, 5–11
customizing based on frequency of occurrence,
5–11 deleting, 2–10 displaying more information, 2–9 escalation severity filter, 5–6 time filter, 5–6 filtering, 5–6 highlight filter, 5–6 highlighting color, 5–3 lock contention investigation, 5–4 log files, B–4 removing from the Event Log window, 2–6 sending messages to OPCOM, 2–10 severity filter, 5–6 severity values, 2–7 signal filter, 5–6 temporary freeze, 2–10 timeout, 2–8 filter, 5–6
Exiting Image and Deleting Process fix, 4–8
Explicit wait state, 3–23
F
Fault rate for pages, 3–18
File Name field, 3–9
File protection for security, 1–4
Files, B–1 to B–6
FILLM (file limit) job quota in use, 3–23
Filtering data at window level, 5–20 changing a filter category, 5–8 event qualification, 5–5 methods, 5–4
Filtering events, 5–6
Filters bell, 5–6 bell volume, 5–6 changing severity values for events, 5–6, 5–7,
5–11 escalation severity, 5–6 escalation time, 5–6
Event Log, 5–6 highlight, 5–6 signal (display), 5–6 timeout, 5–6
Fixes, 4–1 adjust working set, 4–6 automatic, 4–10 changing process priority, 4–6 working set size, 4–6 cluster hung, 4–3 crashing a node, 4–7 deleting a process, 4–8 examples, 4–10 to 4–13
exiting an image, 4–8
Fix menu, 2–9 intruder, 4–3 list of available, 4–1 manual, 4–11 memory too low, 4–3 memory usage, 4–1 options, 2–9 performing, 4–2 to 4–10 process, 4–1, 4–3 purging a working set, 4–9 quorum, 4–1 resuming a process, 4–9 runaway process, 4–3 summary, 4–3 suspending a process, 4–9 system, 4–1 understanding, 4–2 working set too high or too low, 4–3
Flags field, 3–29
Free field, 3–6
G
Getting Started, 2–1
Granted locks, 3–29
GR mode field, 3–29
/GROUP qualifier, 2–1
Groups, 2–4 collapsing information, 2–5 collecting data, 2–6 expanding information, 2–5 how to assign a node, 1–8 of nodes creating, 1–7
H
Hanging cluster fix, 4–3
Hardware
DECamds installation requirements, A–2 model field, 3–11 security triplet address, 1–6
Hardware model display of for node, 2–4
Hello message broadcasts
See Broadcast intervals
Help, 2–2
HIBIOR error message, 3–11
HICOMQ error message, 3–11
Hide Nodes, 2–5 changing default behavior, 5–3
HIDIOR error message, 3–11
Highlighting changing default behavior, 5–3 customizing color, 5–3 filter, 5–6
HIHRDP error message, 3–11
HIMWTQ error message, 3–11
HINTER error message, 3–11
HIPWIO error message, 3–11
HIPWTQ error message, 3–11
HISYSP error message, 3–11
HITTLP error message, 3–11
HMPSYN error message, 3–11
I
I/O (input/output) average number of operations pending for a
BIO volume, 3–6 limit remaining, 3–13 buffered limit (BIOLM), 3–23 buffered wait state, 3–23 byte limit remaining, 3–13 classes, 5–12 default data collection, 5–3
DIO limit remaining, 3–13 direct limit (DIOLM), 3–23 direct wait state, 3–23
Fault rate for pages, 3–18 open files, 3–13 paging, 3–22
PIO, 3–13 process identifier (PID), 3–13 summary, 3–12 summary for node, 3–11
I/O field, 3–11
IEEE DECamds protocol, 1–4
Images Activated field, 3–22
Installation requirements
DECamds, 1–3
Installing DECamds post-installation tasks, A–7
Installing software
DECamds Data Analyzer, A–1
Installing the DECamds Data Analyzer, A–4 requirements, A–2
Interrupt priority level
See IPL
Intruder fix, 4–3
Investigating events, 4–11
IPID (internal PID), 3–15
IPL (interrupt priority level) on DECamds, 1–2
J
Job quotas in use, 3–23 to 3–24
JOB_CONTROL process, 4–10
L
LAN polling, 5–18 setting load, 5–19
LCKBLK error message, 3–29
LCKCNT error message, 3–27
LCKWAT error message, 3–29
LIB files, B–2
LKID field, 3–29
LOASTQ error message, 3–24
LOBIOQ error message, 3–13, 3–24
LOBYTQ error message, 3–13, 3–24
Lock Contention Summary window, 3–25 to 3–27 default data collection, 5–3 detailed data, 3–28 logging information, B–4
Locks automatic investigation, 5–4 types of, 3–29
Lock Type field, 3–29
LODIOQ error message, 3–13, 3–24
LOENQU error message, 3–24
LOFILQ error message, 3–13, 3–24
Log files, B–4 lock contention, B–5
Logging event messages, B–4
Logical names
AMDS$COMMBUFOVF, 5–20
AMDS$COMM_BUFFER_SIZE, 5–20, B–2
AMDS$COMM_READS, 5–20, B–2
AMDS$CONFIG, B–2, B–3
AMDS$DEVICE, B–3
AMDS$GROUP_NAME, B–3
AMDS$LOG, 2–7, B–2
AMDS$NUM_DL_READS, B–3
AMDS$RM_DEFAULT_INTERVAL, B–3
AMDS$RM_OPCOM_READ, B–4
AMDS$RM_OPCOM_WRITE, B–4
AMDS$RM_SECONDARY_INTERVAL, B–4
Data Analyzer node, B–1
Data Provider node, B–3 requirements, B–1 sending messages to OPCOM, 1–9
/LOG_DIRECTORY qualifier, 2–1
LOMEMY error message, 3–11
Looping process fix, 4–3
LOPGFQ error message, 3–24
LOPGSP error message, 3–9
LOPRCQ error message, 3–24
LOSWSP error message, 3–9
LOTQEQ error message, 3–24
LOVLSP error message, 3–6
LOWEXT error message, 3–18, 3–24
LOWSQU error message, 3–18, 3–24
LRGHSH error message, 3–27
M
Manual fix, 4–11
Master Node field, 3–26
Memory classes, 5–12 default data collection, 5–3 distribution on a node, 3–11 example fix for low, 4–10 fixes, 4–1 fix for low, 4–3 investigating low memory, 4–11
Memory field, 3–11 process identifier (PID), 3–18 rate display, 2–4 requirements, 5–18 sorting data, 5–14 total for a node, 3–11 wait state, 3–23
Memory field, 3–11
Memory Summary window, 3–17 to 3–18
Messages sending to OPCOM, 1–9
Mount field, 3–4
Multicast messages customizing interval, B–3
Mutexes held, 3–22
N
Network address security triplet, 1–5
Network Interconnect System Communication
Architecture
See NISCA
NISCA Summary window, 3–36 to 3–40
Node Name field, 3–9
Nodes assigning to a group, 1–8 availability messages setting broadcast intervals, 1–10 crash fix, 4–7 default data collection, 5–3 error messages, 3–11 field, 3–29 hardware model of, 2–4 shutdown procedure, A–7
Node Summary window, 3–10 to 3–12
NOPGFL error message, 3–9
NOPROC error message, 3–11
NOSWFL error message, 3–9
O
Occurrence value, 5–11
Online release notes, A–1
OPCOM (Operator Communication Manager) filtered event messages sent to, 5–6 sending events to, 2–10 sending messages, 1–9 using on DECamds, 1–4
Open files field, 3–13 for process I/O, 3–13 limit remaining, 3–13
Limit Remaining (Files) field, 3–13
OpenVMS Clusters including in groups, 1–7
Operating system field, 3–11 version displayed, 2–4 version requirements for installing the Data
Analyzer, A–2
Operations count rate, 3–6
OpRate field, 3–6
Owner ID field, 3–20
P
Page/Swap file error messages, 3–9
Page/Swap File Summary window, 3–8 to 3–9
Page faults, 3–22 adjust working set, 4–6 field, 3–11
I/O Rate field, 3–18 purging working sets, 4–9
Rate field, 3–18
Page files default data collection, 5–3
Pagelets number used, 3–9 percentage number used, 3–9 reserving for use, 3–9 total available in page file, 3–9
Pages number used, 3–9 percentage number used, 3–9 reserving for use, 3–9 total available in page file, 3–9
Paging I/O, 3–22
Parameter (system) settings required for installation, A–3
Parent Resource field, 3–26, 3–29
Partitioning groups, 1–7
Path field, 3–3, 3–6
PC (program counter) field, 3–20
Performance improving, 5–1 with hardware, 5–21 optimizing by customizing data collection, 5–18
PGFLQUO (page file quota) job quotas in use,
3–23
PID (process identifier), 3–16
CPU, 3–15 field, 3–13, 3–18, 3–20
I/O, 3–13 memory, 3–18 single process, 3–20
PIO (paging I/O) field, 3–13 rate, 3–13
Polling
LAN, 5–18
PRBIOR error message, 3–13, 3–24
PRBIOW error message, 3–24
PRCCOM error message, 3–16, 3–24
PRCCUR error message, 3–24
PRCCVR error message, 3–16
PRCLM process limit job quotas in use, 3–24
PRCMUT error message, 3–24
PRCMWT error message, 3–16
PRCPUL error message, 3–24
PRCPWT error message, 3–16, 3–24
PRCQUO error message, 3–24
PRCRWA error message, 3–24
PRCRWC error message, 3–24
PRCRWM error message, 3–24
PRCRWP error message, 3–24
PRCRWS error message, 3–24
PRCUNK error message, 3–24
PRDIOR error message, 3–13, 3–24
PRDIOW error message, 3–24
Priority field, 3–20 process, 3–20 process fix, 4–1
Private LAN transport security, 1–3
Privileges installation requirements, A–3 to run Data Analyzer node, A–8 to run Data Provider node, A–8 to start DECamds, A–7
PRLCKW error message, 3–24
Problems, reporting, A–10
Processes account quotas required for installation, A–3 displaying number of in CPU queues, 2–4 fixes, 4–1 looping fix, 4–3 name in CPU, 3–15
privileges, 1–4 quotas recommended, 5–19 setting, 5–19 states, 3–21 queues on CPUs, 3–11
Process I/O Summary window, 3–12 to 3–13
Process Name field, 3–13, 3–18, 3–20, 3–29
PRPGFL error message, 3–18, 3–24
PRPIOR error message, 3–13, 3–18, 3–24
PSLs (processor status longwords), 3–20
Purge Working Set fix, 4–9
Q
Qualifiers
See Command qualifiers
Queue field, 3–6
Quorum fix, 4–1
Quotas in use for jobs, 3–23 limit fix, 4–3 process account quota requirements, A–3 process mode, 3–23 recommended for processes, 5–19 to run Data Analyzer node, A–8 wait state, 3–23
R
Read-only access, 1–9
Recording event messages, B–4 lock contention information, B–4
Release Notes, 5–2, A–1
Reporting problems, A–10
RESDNS error message, 3–27
Reservable field, 3–9
Resource Name field, 3–29
Response time external factors, 5–18 optimizing by customizing data collection, 5–18 system hardware, 5–21
RESPRS error message, 3–27
Restarting the Data Provider, 1–8
Restrictions on the Data Provider, 1–2
Resume Process fix, 4–9
RMCP.EXE file, B–3
RMDRIVER.EXE file, B–3
RQ Mode field, 3–29
Runaway process fix, 4–3
Rwait field, 3–4
S
SCA (System Communications Architecture)
Summary window, 3–33 to 3–35
Security customizing files, 1–7
Data Analyzer file, A–8
Data Provider file, A–7 data transfer, 1–4 file protection, 1–4 logging issues with OPCOM, 1–4 private LAN transport, 1–3 process privileges, 1–4 read-only, 1–9 steps after installing
Data Analyzer, A–8
Data Provider, A–7
Security triplets access verification code, 1–5 format, 1–5 how they work, 1–6 network address, 1–5 password, 1–5 verifying, 1–7 wildcard address, 1–6
Setting broadcast intervals, 1–10
Severity events, 5–6
Show Nodes, 2–5 changing default behavior, 5–3
Shutdown procedure, node, A–7
Signal filter, 5–6
Single Disk Summary window, 3–6
Single Lock Summary window error messages, 3–29
Granted, Conversion, and Waiting Queue lock,
3–29
Granted lock fields, 3–29
Single node collecting data, 2–6
Single Process Summary window error messages, 3–24 execution rates, 3–22 quotas, 3–23
SMP (symmetric multiprocessing), 2–4
Sorting data, 5–14, 5–15 for memory data, 5–14
Starting DECamds, 2–1 to 2–2
Starting the Data Analyzer, 2–1 to 2–2, A–9
Starting the Data Provider, A–8
State field, 3–20
Status field, 3–4, 3–26
Stopping data collection, 2–6
Suspended process fix, 4–3, 4–9
Swap files default data collection, 5–3
SWAPPER process fixes ignored, 4–2
SYS$HELP.HLB file, B–1, B–3
SYS$RMDRIVER.EXE file, B–3
SYS$STARTUP.COM file, B–3
System fix, 4–1 load recommendations for hardware, 5–21 parameter settings required for installation,
A–3 processes, 4–10
System Communications Architecture
See SCA
System Overview window, 2–2, 2–3 defining groups, 1–7 hiding node name on startup with customization file, 5–3 menus, 2–4
T
Thresholds customizing for events or computer classes,
5–11
Timeout filter, 5–6
Total field, 3–9
TQELM process limit job quotas in use, 3–23
Trans (transaction) field, 3–4
Transactions number of operations for disk, 3–4
Triplets
See Security triplets
U
UICs (user identification codes) single process summary, 3–20
UIC field, 3–20
Uptime field, 3–11
Used field, 3–6, 3–9
% Used field, 3–6, 3–9
User accounts
Data Analyzer node privileges and quotas, A–8
Data Provider node privileges, A–3
Username field, 3–20
V
Version display of operating system, 2–4
Viewing groups, 2–5
View menu, 2–5
Volume default data collection, 5–3
I/O operations, 3–6 number of free blocks, 3–6
number of used blocks, 3–6 operations count rate, 3–6 percentage number of blocks used, 3–6
Volume Name field, 3–3, 3–6
Volume Summary window, 3–5 to 3–6
W
Waiting queue locks, 3–29
Wait states buffered I/O, 3–23 compute, 3–23 control, 3–23
CPU, 3–16 direct I/O, 3–23 disk status, 3–4 explicit, 3–23 memory, 3–23 quota, 3–23
RWAIT, 3–4
Wildcard address security triplet, 1–6
Working set count, 3–18 default, 3–22 extent, 3–18, 3–22 global pages, 3–21 private pages, 3–22 purging, 4–9 quota, 3–22 size, 3–18, 3–22 size fix, 4–6 too high or too low, 4–3 total pages, 3–22
WSdef field, 3–22
WSextent field, 3–22
WSquo field, 3–22