Microsoft Windows Server 2003 Performance Guide

PUBLISHED BY
Microsoft Press
A Division of Microsoft Corporation
One Microsoft Way
Redmond, Washington 98052-6399
Copyright © 2005 by Microsoft Corporation
All rights reserved. No part of the contents of this book may be reproduced or transmitted in any form or by
any means without the written permission of the publisher.
Library of Congress Control Number 2005921847
Printed and bound in the United States of America.
1 2 3 4 5 6 7 8 9 QWT 8 7 6 5 4 3
Distributed in Canada by H.B. Fenn and Company Ltd. A CIP catalogue record for this book is available from
the British Library.
Microsoft Press books are available through booksellers and distributors worldwide. For further information
about international editions, contact your local Microsoft Corporation office or contact Microsoft Press International directly at fax (425) 936-7329. Visit our Web site at www.microsoft.com/mspress. Send comments
to [email protected]
Microsoft, Active Directory, ActiveX, Microsoft Press, MSDN, MSN, Visual Basic, Win32, Windows, Windows Media, Windows NT, and Windows Server are either registered trademarks or trademarks of Microsoft
Corporation in the United States and/or other countries. Other product and company names mentioned herein
may be the trademarks of their respective owners.
The example companies, organizations, products, domain names, e-mail addresses, logos, people, places, and
events depicted herein are fictitious. No association with any real company, organization, product, domain
name, e-mail address, logo, person, place, or event is intended or should be inferred.
This book expresses the author’s views and opinions. The information contained in this book is provided without any express, statutory, or implied warranties. Neither the authors, Microsoft Corporation, nor its resellers or distributors will be held liable for any damages caused or alleged to be caused either directly or indirectly by this book.
Acquisitions Editor: Martin DelRe
Project Editor: Karen Szall
Copy Editor: Victoria Thulman
Technical Editor: Mitch Tulloch
Indexers: Tony Ross and Lee Ross
SubAsy Part No. X11-06988
Body Part No. X11-06989
Contents at a Glance
1  Performance Monitoring Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
2  Performance Monitoring Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
3  Measuring Server Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
4  Performance Monitoring Procedures. . . . . . . . . . . . . . . . . . . . . . . . . . . . 313
5  Performance Troubleshooting. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 401
6  Advanced Performance Topics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 537
Contents
About the Author . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
Document Conventions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
Resource Kit Companion CD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvi
1  Performance Monitoring Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Introducing Performance Monitoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Learning About Performance Monitoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Proactive Performance Monitoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
Diagnosing Performance Problems. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Scalability. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
Performance Monitoring Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
Bottlenecks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
Utilization Law . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
Queue Time and Utilization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
Little’s Law . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
System Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
Using the Performance Monitor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
Operating Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
Processors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
Memory and Paging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
The I/O Subsystem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
Network Interfaces. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
2  Performance Monitoring Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
Summary of Monitoring Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
Performance Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
Event Traces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
Load Generating and Testing. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
Administrative Controls . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
Required Security for Tool Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
Performance Monitoring Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
Performance Objects. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
Performance Counters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
System Monitor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
Viewing a Chart in Real Time. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
Changing the Sampling Interval . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
Creating a Custom Monitoring Configuration . . . . . . . . . . . . . . . . . . . . . . . . . 147
Saving Real-Time Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
Customizing How Data Is Viewed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
Tips for Working with System Monitor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
Task Manager . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
Working with Task Manager . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
Monitoring Applications. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
Monitoring Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
Monitoring Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
Monitoring the Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
Monitoring Users . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
Automated Performance Monitoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
Performance Logs and Alerts. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
Counter Logs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
Tips for Working with Performance Logs and Alerts . . . . . . . . . . . . . . . . . . . . 178
Creating Performance Logs Using Logman. . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
Managing Performance Logs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
Using the Relog Tool . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
Using Typeperf Queries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
Windows Performance Monitoring Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
Performance Library DLLs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
Performance Counter Text String Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
Performance Data Helper Processing. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
Disable Performance Counters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
Remote Monitoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
Event Tracing for Windows. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212
Event Tracing Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212
Using Log Manager to Create Trace Logs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
Event Trace Reports . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224
Alerts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
Configuring Alerts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226
Configuring Alert Notification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
Windows System Resource Manager. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
Network Monitor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
3  Measuring Server Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
Using Performance Measurements Effectively. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
Identifying Bottlenecks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
Management by Exception. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
Key Performance Indicators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
System and Application Availability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
Processor Utilization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242
Monitoring Memory and Paging Rates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256
Monitoring Disk Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
Managing Network Traffic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 292
Maintaining Server Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297
Terminal Services . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 310
4  Performance Monitoring Procedures . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313
Understanding Which Counters to Log . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314
Background Performance Monitoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314
Management Reporting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315
Capacity Planning. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 316
Daily Server Monitoring Procedures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 316
Daily Counter Logs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317
Using Alerts Effectively . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330
Daily Management Reporting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 338
Historical Data for Capacity Planning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 346
Automated Counter Log Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 354
Using a SQL Server Repository . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 365
Using the System Monitor Console with SQL Server . . . . . . . . . . . . . . . . . . . . 366
How to Configure System Monitor to Log to SQL Server . . . . . . . . . . . . . . . . 367
Counter Log Database Schema . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 369
Querying the SQL Performance Database . . . . . . . . . . . . . . . . . . . . . . . . . . . . 373
Capacity Planning and Trending . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 379
Organizing Data for Capacity Planning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 380
Forecasting Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 383
Counter Log Scenarios . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 388
Logging Local Counters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 388
Monitoring Remote Servers in Real Time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 392
Troubleshooting Counter Collection Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 395
Missing Performance Counters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 396
Restoring Corrupt Performance Counters . . . . . . . . . . . . . . . . . . . . . . . . . . . . 400
5  Performance Troubleshooting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 401
Bottleneck Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 402
Baseline Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 402
Current Performance Levels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 403
Resource Utilization and Queue Length . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 403
Decomposition. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 404
Analysis Procedures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 404
Understanding the Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 404
Analyzing the Logged Performance Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 404
Analyzing Performance Data Interactively. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 405
Fine-Grained Analysis Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 405
What to Check Next in the Enterprise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 406
Processor Troubleshooting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 406
Resource Utilization and Queue Length . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 407
Decomposition. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 409
Identifying a Runaway Process by Using Task Manager . . . . . . . . . . . . . . . . . 413
Identifying a Runaway Process by Using a Counter Log . . . . . . . . . . . . . . . . . 416
Memory Troubleshooting. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 430
Counters to Evaluate When Troubleshooting Memory Performance . . . . . . 431
What to Check Next When Troubleshooting Memory Performance . . . . . . 434
Excessive Paging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 435
Virtual Memory Shortages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 448
32-Bit Virtual Memory Addressing Limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 456
Disk Troubleshooting. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 464
Disk Performance Expectations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 466
Diagnosing Disk Performance Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 489
Network Troubleshooting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 509
Counters to Log When Troubleshooting Network Performance . . . . . . . . . . 509
Counters to Evaluate When Troubleshooting Network Performance . . . . . . 511
LAN Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 514
WAN Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 526
6  Advanced Performance Topics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 537
Processor Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 538
Instruction Execution Throughput. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 539
Time-Slicing Revisited . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 546
Multiprocessors. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 548
Memory Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 608
Extended Virtual Addressing in 32-Bit Machines . . . . . . . . . . . . . . . . . . . . . . . 608
64-Bit Virtual Memory. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 621
Forecasting Memory Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 622
The System Monitor Automation Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 628
Adding the System Monitor ActiveX Control to a Web Page . . . . . . . . . . . . . 629
Customizing the System Monitor ActiveX Control . . . . . . . . . . . . . . . . . . . . . . 630
Configuring the System Monitor ActiveX Control Display Type . . . . . . . . . . . 632
Configuring the System Monitor ActiveX Control Sampling Rate . . . . . . . . . 634
Manually Retrieving Performance Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 634
Configuring the System Monitor ActiveX Control’s Appearance . . . . . . . . . . 636
Configuring the System Monitor ActiveX Control Color Schemes . . . . . . . . . 637
Configuring the System Monitor ActiveX Control Font Styles . . . . . . . . . . . . 638
Adding Performance Counters to the System Monitor ActiveX Control . . . . 639
Configuring System Monitor ActiveX Control Performance Counters. . . . . . 640
Removing Performance Counters from the System Monitor ActiveX Control . . . . . . . . . . 641
Using Counter Paths to Track Individual Performance Counters . . . . . . . . . . 642
Creating a Web Page for Monitoring Performance . . . . . . . . . . . . . . . . . . . . . 643
Drag-and-Drop Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 645
Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 649
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 681
About the Author
Mark Friedman is the president and CEO of Demand Technology Software (DTS) in
Naples, Florida, and is responsible for the development of Microsoft® Windows
Server™ performance monitoring and capacity planning tools for large-scale enterprises. Mark founded DTS in 1996, and the company became a Microsoft ISV partner
in 1997. He has been programming on the Windows platform since 1990.
Mark founded OnDemand Software in 1994 and was chairman of the board of directors of that company when it was sold to Seagate Technology in the spring of 1996.
OnDemand developed and marketed utility software, including the award-winning
WinInstall software distribution package. From 1987 to 1991, he was a director of product development at Landmark Systems, responsible for the design and development of TMON/MVS, a leading mainframe performance monitoring product.
Mark is a recognized expert in computer performance, disk and tape performance,
and storage management. He is the author of over 100 technical articles and papers on
these subjects. He co-authored (with Dr. Odysseas Pentakalos) the book Windows
2000 Performance Guide (O’Reilly, 2002). Mark’s training seminars, lectures, and published work are highly regarded for their technical quality and depth, and he is
esteemed for his ability to communicate complex technical topics in plain, concise
terms.
He holds a master’s degree in computer science from Temple University.
Acknowledgments

Thank you to those who contributed to the Microsoft Windows Server 2003 Performance
Guide and Microsoft Windows Server 2003 Troubleshooting Guide.
Technical Writing Lead: David Stern
Writers: Mark Friedman, Tony Northrup, David Stern, Brit Weston
Editors: Carolyn Eller, Paula Younkin, Julia Ziobro
Project Manager: Cliff Hall
Production Lead: Jim Bevan
Art Production: Chris Blanton, David Hose, Jon Billow
Technical Contributors: Jee Fung Pang, Iain Frew, Neel Jain, Inder Sethi, Brad
Waters, Bruce Worthington, Ahmed Talat, Tom Hawthorn, Adam Berkan, Oscar Omar
Garza Santos, Rick Vicik, Kathy Sestrap, David Stern, Jon Wojan, Ben Christenbury,
Steve Patrick, Greg Cottingham, Rick Anderson, Khalil Nasser, Darrell Gorter, Andrew
Ritz, Jeremy Cahill, Rob Haydt, Jonathan V. Smith, Matt Holle, Jamie Schwartz, Keith
Hageman, Terence Hosken, Karan Mehra, Tony Donno, Joseph Davies, Greg Marshall,
Jonathan Schwartz, Chittur Subbaraman, Clark Nicholson, Bob Fruth, Lara Sosnosky,
Charles Anthe, Tim Lytle, Adam Edwards, Simon Muzio, Mike Hillberg, Vic Heller,
Prakash Rao, Ilan Caron, Shy Cohen, Ashwin Palekar, Matt Desai, Mahmood Dhalla,
Joseph Dadzie, David Cross, Jiandong Ruan, Stephane St-Michel, Kamen Moutafov,
KC Lemson, Jim Cavalaris, Jeff Westhead, Glenn Pittaway, Stephen Hui, Davide Massarenti, David Kruse, Chris Evans, Brian Granowitz, David Lee, Neta Amit, Avi
Shmueli, Jim Thatcher, Pung Xu, Steve Olsson, Ran Kalach, Brian Dewey, V Raman,
Paul Mayfield, David Eitelbach, Jaroslav Dunajsky, Alan Warwick, Pradeep Madhavarapu, Kahren Tevosyan, Huei Wang, Ido Ben-Shachar, Florin Teodorescu, Michael
Hills, Fred Bhesania, Randy Aull, Sachin Seth, Chris Stackhouse, David Fields, Stuart
Sechrest, Landy Wang, Duane Thomas, Lisa Cipriano, Kristin Thomas, Stewart Cox,
Joseph Davies, Pilar Ackerman, Cheryl Jenkins
From the Microsoft Press editorial team, the following individuals contributed to the
Microsoft Windows Server 2003 Performance Guide:
Product Planner: Martin DelRe
Project Editor: Karen Szall
Technical Reviewer: Mitch Tulloch
Copy Editor: Victoria Thulman
Production Leads: Dan Latimer and Elizabeth Hansford
Indexers: Tony Ross and Lee Ross
Art Production: Joel Panchot and William Teel
Introduction
Welcome to Microsoft® Windows Server™ 2003 Performance Guide.
The Microsoft Windows Server 2003 Resource Kit consists of seven volumes and a single
compact disc (CD) containing tools, additional reference materials, and an electronic
version of the books (eBooks).
The Microsoft Windows Server 2003 Performance Guide is your technical resource for
optimizing the performance of computers and networks running on the Microsoft
Windows Server 2003 operating system. Windows Server 2003 provides a comprehensive set of features that helps you automate the management of most workloads
and configurations. It also provides a powerful set of performance monitoring tools
and performance-oriented settings that you can use to fine-tune system performance.
Use this guide to gain a basic understanding of performance concepts and strategies
so that you can optimize the speed, reliability, and efficiency of your Windows Server
2003 operating system.
Document Conventions
Reader alerts are used throughout the book to point out useful details.
Reader Alert   Meaning
Tip            A helpful bit of inside information on specific tasks or functions
Note           Alerts you to supplementary information
Important      Provides information that is essential to the completion of a task
Caution        Important information about possible data loss, breaches of security, or other serious problems
Warning        Information essential to completing a task, or notification of potential harm
The following style conventions are used in documenting command-line tasks
throughout this guide.
Element          Meaning
Bold font        Characters that you type exactly as shown, including commands and parameters. User interface elements also appear in boldface type.
Italic font      Variables for which you supply a specific value. For example, Filename.ext can refer to any valid file name.
Monospace font   Code samples.
%SystemRoot%     Environment variables.
Resource Kit Companion CD
The companion CD includes a variety of tools and resources to help you work more
efficiently with Microsoft Windows® clients and servers.
Note  The tools on the CD are designed to be used on Windows Server 2003 or Windows XP (or as specified in the documentation of the tool).
The Resource Kit companion CD includes the following:
■ The Microsoft Windows Server 2003 Resource Kit tools—a collection of tools and other resources that help you to efficiently harness the power of Windows Server 2003. Use these tools to manage Microsoft Active Directory® directory services, administer security features, work with the registry, automate recurring jobs, and perform many other tasks. Use the Tools Help documentation to discover and learn how to use these administrative tools.
■ Windows Server 2003 Technical Reference—documentation that provides comprehensive information about the technologies included in the Windows Server 2003 operating system, including Active Directory and Group Policy, as well as core operating system, high availability and scalability, networking, storage, and security technologies.
■ Electronic version (eBook) of this guide as well as an eBook of each of the other volumes in the Microsoft Windows Server 2003 Resource Kit.
■ EBooks of Microsoft Encyclopedia of Networking, Second Edition, Microsoft Encyclopedia of Security, Internet Information Services (IIS) 6 Resource Kit, and Microsoft Scripting Self-Paced Learning Guide.
■ Sample chapters from the Assessing Network Security and Microsoft Windows Server 2003 PKI and Certificate Security books.
■ VBScript Essentials Videos—videos from the Microsoft Windows Administrator’s Automation Toolkit.
■ A link to the eLearning site where you can access free eLearning clinics and hands-on labs.
■ An online book survey that gives you the opportunity to comment on your Resource Kit experience as well as influence future Resource Kit publications.
Resource Kit Support Policy
Microsoft does not support the tools supplied on the Microsoft Windows Server 2003
Resource Kit CD. Microsoft does not guarantee the performance of the tools, or any
bug fixes for these tools. However, Microsoft Press provides a way for customers who
purchase Microsoft Windows Server 2003 Resource Kit to report any problems with the
software and receive feedback for such issues. To report any issues or problems, send
an e-mail message to [email protected] This e-mail address is only for issues
related to Microsoft Windows Server 2003 Resource Kit and any of the volumes within
the Resource Kit. Microsoft Press also provides corrections for books and companion
CDs through the World Wide Web at http://www.microsoft.com/learning/support/.
To connect directly to the Microsoft Knowledge Base and enter a query regarding a
question or issue you have, go to http://support.microsoft.com. For issues related to
the Microsoft Windows Server 2003 operating system, please refer to the support
information included with your product.
Chapter 1
Performance Monitoring Overview
In this chapter:
Introducing Performance Monitoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Performance Monitoring Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
System Architecture. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
Comprehensive measurement data on the operation and performance of computers
running the Microsoft Windows Server 2003 operating system makes these systems
easy to manage. Windows Server 2003 provides powerful and comprehensive features
that manage the performance of most workloads and configurations automatically.
The presence of these advanced features means you usually do not need to intervene
manually to try and coax better performance out of your Windows Server 2003 configuration. Nevertheless, the automatic management facilities might not be optimal in
every case. For those situations in which the performance of applications is slow or
otherwise less than optimal, you can use a variety of tunable, performance-oriented
settings to gain better performance. How to use the most important settings that are
available to fine-tune your Windows Server 2003 machines is one of the main areas of
discussion in Chapter 6, “Advanced Performance Topics,” in this book.
Caution  There are many performance settings and tuning parameters that you
can use in Windows Server 2003. The wrong values for these performance and tuning settings can easily do your system more harm than good. Changing the system’s
default settings should be attempted only after a thorough study has convinced you
that the changes contemplated are likely to make things better. How to conduct a
study to determine this is one of the important topics discussed throughout this section of the book.
When performance is not optimal, Windows Server 2003 provides a rich set of
tools for monitoring the performance of a computer system and its key components.
These components include the hardware (for example, the processor, disks, and
memory); the network; the operating system and its various services; the major server
subsystems (for example, security, file, print, Web publishing, database, and messaging); and the specific application processes that are executing. Rest assured that whatever role your Windows Server 2003 machine is configured to play in your
environment, your machine will be capable of providing ample information about its
operating characteristics and performance. This chapter shows you how to use this
information effectively to solve a wide variety of problems, from troubleshooting a performance problem to planning for the capacity required to support a major new application running on your Windows Server 2003 farm.
This chapter introduces the major facilities for performance monitoring in Windows
Server 2003. It begins by describing the empirical approach that experienced computer performance analysts should adopt to address performance problems. It then
reviews the key concepts that apply whenever you are diagnosing computer performance. Because computer performance deals with measurements and their relationship to each other, some of the discussion concerns these mathematical relations.
The central topic in this chapter is an introduction to computer system architecture
from a primarily performance point of view. This introduction focuses on how Windows Server 2003 works so that you will be able to use the performance statistics it
generates more effectively. It also introduces the most important performance
counters that are available for the major system components on your Windows Server
2003 computers. It discusses what these measurements mean and explains how they
are derived. A more comprehensive discussion of the most important performance
statistics is provided in Chapter 3, “Measuring Server Performance,” in this book.
Introducing Performance Monitoring
Configuring and tuning computer systems for optimal performance are perennial
concerns among system administrators. Users of Windows Server 2003 applications
who rely on computer systems technology to get important work done are naturally
also concerned about good performance. When computer performance is erratic or
the response time of critical applications is slow, these consumers are forced to work
less efficiently. For example, customers visiting a .NET e-commerce Web site facing
elongated response times might become dissatisfied and decide to shop elsewhere.
The ability to figure out why a particular system configuration is running slowly is a
desirable skill that is partly science and partly art. Whatever level of skill or artistry
you possess, gathering the performance data is a necessary first step to diagnosing
and resolving a wide range of problems. Determining which data to collect among all
the performance statistics that can be gathered on a Windows Server 2003 machine is
itself a daunting task. Knowing which tool to choose among the different tools supplied with Windows Server 2003 to gather the performance data you need is another
important skill to learn. Finally, learning to understand and interpret the performance
data that you gather is another valuable area of expertise you must cultivate. This section of this book is designed to help you with all these aspects of performance monitoring. Although reading this chapter and the related chapters in this book will not
immediately transform you into a performance wizard, it will provide the background
necessary for you to acquire that knowledge and skill.
Learning About Performance Monitoring
This introductory chapter looks at the basics of performance monitoring by explaining how a Windows Server 2003 computer system works. It provides an overview of
the performance data that is available and what it can be used for. If you are experienced with performance monitoring, you might want to skim through this chapter
and move to a more challenging one. Figure 1-1 provides a basic roadmap for the
chapters of this book and can help you decide where to start your reading.
Figure 1-1  Chapter roadmap. The figure maps each chapter to its topics: Chapter 1, Performance Monitoring Overview (concepts and definitions; system architecture: processors, memory, disks, and networking); Chapter 2, Performance Monitoring Tools (System Monitor console; Performance Logs and Alerts: counter logs, traces, and alerts); Chapter 3, Measuring Server Performance (key performance indicators: processors, memory and paging, disks, networking, and applications); Chapter 4, Performance Monitoring Procedures (daily monitoring best practices: alerts, management reporting, capacity planning, stress testing new applications; troubleshooting counter log collection problems); Chapter 5, Performance Troubleshooting (analysis procedures; processor, memory, disk, and network troubleshooting); Chapter 6, Advanced Performance Topics (instruction execution architectures, multiprocessor scalability, extended virtual addressing; System Monitor automation interface).
Proactive Performance Monitoring
Experienced drivers are careful to monitor their vehicles’ gas gauges on a regular basis
so that they know how much gas is left in the tank and can stop to refuel before the
tank runs dry. That, of course, is the idea behind the gas gauge in the first place—to
monitor your fuel supply so that you take action to avert a problem before it occurs.
No one would want a gas gauge that waited until after the car had stopped to
announce, “By the way, you are out of gas.”
Unfortunately, many system administrators wait until after a computer has started to
experience problems to begin monitoring its performance. When you discover that
the computer has, say, run out of disk space, it is already too late to take corrective
action that would have averted the problem in the first place. Had you been monitoring performance on a regular basis as part of your routine systems management procedures, you would have known in advance that disk space was beginning to run low.
You would have been able to take steps to prevent the problem from occurring.
Instead of using performance monitoring to react to problems once they occur, use
proactive measures to ensure that the systems you are responsible for remain capable
of delivering acceptable levels of performance at all times.
Tip  Don’t neglect performance monitoring until after something bad happens. Use proactive measures to find and correct potential performance problems before they occur.
Understanding the capabilities of the hardware and software that Windows Server
2003 manages is crucial to this goal. The computer and network hardware that you
can acquire today are extremely powerful, but they still have a finite processing capacity. If the applications you are responsible for push hard against the capacity limits of
the equipment you have installed, critical performance problems are likely to occur.
You need to understand how to identify capacity constraints and what you can do
about them when you encounter them. For example, it might be possible to upgrade
to even more powerful hardware or to configure two or more machines into a cluster
that can spread the work from a single machine over multiple ones. But to relieve a
capacity constraint, you must first be able to identify it by having access to performance monitoring data that you can understand and know how to analyze.
The overall approach championed throughout this book favors developing proactive
procedures to deal with potential performance and capacity problems in advance.
This approach focuses on continuous performance monitoring, with regular reporting
procedures to identify potential problems before they flare up and begin to have an
impact on daily operations. This book provides detailed guidelines that will allow you
to implement proven methods. These techniques include:
■ Baselining and other forms of workload characterization
■ Stress testing new applications and hardware configurations before deploying them on a widespread scale
■ Establishing and reporting service level objectives based on the reasonable service expectations of your applications
■ Management by exception to focus attention on the most serious performance-related problems
■ Trending and forecasting to ensure that service level objectives can be met in the future (a brief forecasting sketch follows below)
These are all proven techniques that can successfully scale to an enterprise level. This
book will describe step-by-step procedures that you can implement—what data to collect, what alert thresholds to set, what statistical measures to report, and so on, so that
you can establish an effective program of regular performance monitoring for your
installation. Of course, as you grow more confident in your ability to analyze the performance statistics you gather, you will want to modify the sample procedures
described here. Once you understand better what you are doing, you will be able to
tailor these procedures to suit your environment better.
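As a concrete illustration of the trending and forecasting technique listed above, the following sketch fits a least-squares line to weekly processor utilization averages and projects when the trend will cross a threshold. It is a minimal example, written here in Python; the 80 percent threshold, the sample data, and the choice of a linear model are illustrative assumptions rather than recommendations from this guide, and the weekly averages stand in for the kind of values you would extract from your own counter logs.

# A minimal trending sketch, assuming you already export weekly average
# "% Processor Time" values from your counter logs. The threshold and the
# sample data below are illustrative, not prescriptive.

def weeks_until_threshold(history, threshold=80.0):
    """Fit a least-squares line to weekly utilization averages and
    estimate how many weeks remain before the trend crosses threshold."""
    n = len(history)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(history) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, history)) / \
            sum((x - mean_x) ** 2 for x in xs)
    intercept = mean_y - slope * mean_x
    if slope <= 0:
        return None  # no upward trend; the threshold is never reached
    week_at_threshold = (threshold - intercept) / slope
    return week_at_threshold - (n - 1)  # weeks from the most recent sample

# Example: six weekly averages of processor utilization (percent).
cpu_busy = [42.0, 45.5, 47.0, 51.5, 54.0, 57.5]
remaining = weeks_until_threshold(cpu_busy)
print(f"Estimated weeks until 80% utilization: {remaining:.1f}")

More sophisticated forecasting techniques, and how to organize the historical data that feeds them, are covered in Chapter 4.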
Diagnosing Performance Problems
Even with effective proactive monitoring procedures in place across your network of
Windows Server 2003 machines, you can still expect occasional flare-ups that will call
for immediate and effective troubleshooting. It is likely that no amount of proactive
monitoring will eliminate the need for all performance troubleshooting. This book
also emphasizes the practical tools, tips, and techniques that you will use to diagnose
and solve common performance problems.
Wherever possible, this book tries to give you clear-cut advice and simple procedures
to follow so that you can quickly diagnose and resolve many common performance
problems. However, a simple cookbook approach to performance monitoring will
take you only so far. Because the Windows Server 2003 systems, applications, and
configurations you manage can be quite complex, the proactive performance procedures that you establish are subject to at least some of that complexity. (For more
information about these performance procedures, see Chapter 4, “Performance Monitoring Procedures,” in this book.) Some of the following factors can complicate the
performance monitoring procedures you implement:
■ Complex and expensive hardware configurations. Windows Server 2003 supports a wide range of environments, from simple 32-bit machines with a single processor, to more complex 64-bit machines with up to 512 GB of RAM and attached peripherals, to symmetric multiprocessing (SMP) architectures supporting up to 64 processors, and even specialized Non-Uniform Memory Access (NUMA) architecture machines. These advanced topics are discussed in detail in Chapter 6, “Advanced Performance Topics.”
■ The number of systems that must be managed. Developing automated performance monitoring procedures that can scale across multiple machines and across multiple locations is inherently challenging. For complex environments, you might need even more powerful tools than those discussed in this book, including the Microsoft Operations Manager (MOM), a Microsoft product that provides comprehensive event management, proactive monitoring and alerting, and reporting and trend analysis for Windows Server System-based networks. For more information about MOM, see http://www.microsoft.com/mom/.
■ The complexity of the applications that run on Windows Server 2003. Some of the complex application environments your Windows Server 2003 machines must support include multi-user Terminal Services configurations, the .NET Framework of application run-time services, Microsoft Internet Information Services (IIS) Web server, the Microsoft SQL Server database management system, and the Microsoft Exchange Server messaging and collaboration server application. Each of these applications might require specialized procedures to gather and analyze performance data that is specific to these environments. In addition, these server applications can be clustered so that application processing is distributed across multiple server machines. Many of the application subsystems have specific configuration and tuning options that can have an impact on performance levels. In many instances, the application-specific knowledge to solve a specific SQL Server or Exchange performance problem is beyond the scope of this book. Where possible, other useful books, resources, and white papers that deal with Microsoft server application performance are referenced.
Each of these factors can add complexity to any performance problem diagnosis task.
The best solutions to problems of this type are likely to be dependent on highly specific aspects of your configuration and workload. In addition to providing simple recipes for resolving common performance issues, this book also attempts to supply you
with the basic knowledge and skills that will allow you to deal with more complex
problems. As you gain confidence in the effectiveness of the methods and analytic
techniques that are described here, you will learn to identify and resolve more difficult
and more complex performance problems.
Overhead Considerations
One of the challenges of performance monitoring in the Windows Server 2003 environment is that the system configuration, the hardware, and the application software
can be quite complex, as discussed. The challenge in complex environments is to collect the right amount of performance monitoring data so that you can diagnose and
solve problems when they occur.
Caution  You must always be careful to ensure that the performance data you
gather does not put so great a burden on the machine you are monitoring that you
actually contribute to the performance problem you are trying to fix. You must also be
careful to avoid collecting so much performance data that it greatly complicates the
job of analyzing it.
These and other related considerations are aspects of the problem of performance
monitoring overhead. By design, the performance monitoring procedures recommended here gather data that has a high degree of usefulness and a low impact on the
performance of the underlying system. Nevertheless, it is not always possible for performance monitoring procedures to be both efficient and effective at diagnosing specific problems. The performance data you need to solve a problem might be
voluminous as well as costly to gather and analyze. There are often difficult tradeoffs
decisions that need to be made. You will always need to carefully assess the tradeoffs,
and exercise good judgment about which data to collect and at what cost.
Important In a crisis, overhead considerations pale beside the urgent need to
troubleshoot a problem that is occurring. The normal rules about limiting the impact
of performance monitoring do not apply in a crisis.
In Chapter 2, “Performance Monitoring Tools,” in this book, the architecture of the
main performance monitoring interfaces that Windows Server 2003 uses is discussed.
The mechanisms built into Windows Server 2003 to capture performance statistics,
gather them from different system components, and return them to various performance monitoring applications like the built-in Performance Monitor application will
be described in detail. Once you understand how performance monitoring in Windows Server 2003 works, you should be able to make an informed decision about
what costly performance data to gather and when it is justified to do so.
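To make the overhead tradeoff concrete, consider the volume of data a counter log produces. The following back-of-the-envelope sketch estimates daily log growth at two sampling intervals. The bytes-per-counter figure is a rough assumption adopted for illustration rather than a documented constant, so calibrate it against the log files your own collections actually produce.

# A rough sizing sketch for counter log growth. The bytes-per-counter
# value is an assumed approximation; measure your own logs to calibrate.

def daily_log_size_mb(counters, interval_seconds, bytes_per_counter=16):
    samples_per_day = 86_400 / interval_seconds
    return counters * bytes_per_counter * samples_per_day / (1024 * 1024)

# 500 counters sampled every 15 seconds vs. every 5 minutes:
print(f"{daily_log_size_mb(500, 15):.0f} MB/day at a 15-second interval")
print(f"{daily_log_size_mb(500, 300):.0f} MB/day at a 5-minute interval")

The point is not the exact numbers but the arithmetic: sampling 20 times more frequently produces 20 times more data to store and analyze, which is precisely the kind of tradeoff this section asks you to weigh.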
Crisis Mode Interventions
You are probably familiar with the crisis mode you and your Information Technology
(IT) organization are plunged into when an urgent performance problem arises.
When a Windows Server 2003 machine responsible for some mission-critical application misbehaves, alarms of various kinds start to spread through the IT technical support group. In the initial stages, there are likely to be many agitated callers to your
organization’s Help Desk function that services the user community. If the crisis is
prolonged, established escalation procedures start to increase the visibility of the key
role your department plays in maintaining a stable systems environment. Many self-appointed “experts” in this or that aspect of computer performance eventually convene to discuss the situation. Your efforts to resolve the problem quickly are suddenly
thrust into the spotlight. Senior managers who never seemed very interested in your
job function before are now anxious to hear a detailed account of your activities to
solve the current crisis.
During a crisis, it is important that cooler heads prevail. Instead of jumping to a conclusion about what caused the current problem, begin by gathering and analyzing
focused data about the problem. Windows Server 2003 includes many tools that are
specifically designed to gather data to solve particular performance problems. Chapter 5, “Performance Troubleshooting,” documents the use of many special-purpose tools that you might need to use only in a crisis.
Normally, the performance monitoring procedures recommended in Chapter 4, “Performance Monitoring Procedures,” are designed to gather performance data without having a major impact on the performance of the underlying system. In crisis mode, however, the normal overhead considerations do not apply: if costly data gathering can potentially yield crucial information about a critical performance problem, it is usually justified.
Scalability
Computer equipment today is extremely powerful, yet performance problems have
not disappeared. The computing resources you have in place are finite. They have definite limits on their processing capability. Scalability concerns how those finite limitations impact performance. Computer professionals worry about scalability because in
their experience, many computer systems encounter performance problems as the
number of users of those systems grows. When computer professionals are discussing
application or hardware scalability, they are concerned with root computer performance and capacity planning issues.
Figure 1-2 shows two scalability curves. The left-hand y-axis represents a measure of
performance workload throughput—it could be database transactions per second, disk
I/Os per second, Web visitors per hour, or e-mail messages processed per minute. The
horizontal x-axis shows the growth in the number of users of this application.
Figure 1-2  Ideal vs. actual application scalability
The “Ideal,” or dotted, line is a straight line that shows performance increasing linearly
as a function of the number of users. This is the ideal that computer engineers and
designers strive for. As the number of concurrent users of an application grows, the
user experience does not degrade because of elongated or erratic response times. The
“Actual,” or solid, line models the performance obstacles that an actual system
encounters as the workload grows. Initially, the actual throughput curve diverges very
little from the ideal case. But as the number of users grows, actual performance levels
tend to be nonlinear with respect to the number of users. As more users are added and
the system reaches its capacity limits, the throughput curve eventually plateaus, as
illustrated in the figure.
Frequently, when computer applications are initially deployed, the number of users
is quite small. Because the current system is not close to reaching its capacity limits,
performance appears to scale linearly. But as users are added and usage of the application grows, performance problems are inevitably encountered. This is a core concern whenever you are planning for an application deployment that must
accommodate a large number of users. Because computer hardware and software have
finite processing limits, this nonlinear behavior—which is evidence of some form of
performance degradation—can be expected at some point as the number of users
grows. The focus of computer capacity planning, for instance, is to determine at what
point, as the number of users grows, this performance degradation begins to interfere
with the smooth operation of this application.
Inevitably, computer systems reach their capacity limits, and at that point, when more
users are added, these systems no longer scale linearly. The focus of computer capacity planning for real-world workloads, of course, is to anticipate at what point serious
performance degradation can be expected. After you understand the characteristics of
your workload and the limitations of the computer environment in which it runs, you
should be able to forecast the capacity limits of an application server.
In many instances, you can use stress-testing tools to simulate a growing workload
until you encounter the capacity limits of your hardware. The throughput curves in Figure 1-2 are drawn from simulated benchmark runs of this kind, in which a stress-testing tool increases the number of users steadily until the telltale signs of nonlinear scalability appear. Stress testing your important applications to determine at what
point serious performance degradation occurs is one effective approach to capacity
planning. There are also analytic and modeling approaches to capacity planning that
are effective. The mathematical relationships between key performance measurements, which are discussed in the next section of this chapter, form the basis for these
analytic approaches.
For example, suppose you are able to measure the following:
■ The current utilization of a potentially saturated resource like a processor, disk, or network adaptor
■ The average individual user’s resource demand that contributes to that resource’s utilization
■ The rate at which the number of application users is increasing
Using a simple formula called the Utilization Law, which is defined later in this chapter, you will be able to estimate the number of users necessary to drive the designated
resource to its capacity limits, at which point that resource is bound to become a performance bottleneck. By both stress testing your application and using analytic modeling techniques, you can predict when the resource will reach its capacity limits. Once
you understand the circumstances that could cause the resource to run out of capacity, you can formulate a strategy to cope with the problem in advance.
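The following sketch shows the arithmetic behind such an estimate. It assumes, for simplicity, that each user generates roughly the same demand on the resource, and every measured value in it is invented for illustration; the Utilization Law itself (utilization equals throughput multiplied by service time) is defined formally later in this chapter.

# A minimal sketch of a Utilization Law estimate: if utilization scales
# with throughput, and each user contributes about the same demand, you
# can project when a resource saturates. All inputs are illustrative.

current_utilization = 0.45      # measured: the disk is 45% busy
current_users = 300             # measured: concurrent application users
users_added_per_month = 25      # measured: observed growth rate

per_user_utilization = current_utilization / current_users

saturation_users = 1.0 / per_user_utilization   # users at 100% busy
headroom_users = saturation_users - current_users
months_left = headroom_users / users_added_per_month

print(f"Resource saturates at roughly {saturation_users:.0f} users")
print(f"About {months_left:.1f} months of growth remain")

Treat the result as an upper bound: as the next example shows, queuing delays make response time unacceptable well before a resource reaches 100 percent utilization.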
Caution  Many published articles that discuss application scalability display graphs
of performance levels that are reported as a function of an ever-increasing number of
connected users, similar to Figure 1-2. These articles often compare two or more similar applications to show which has the better performance. They are apparently
intended to provide capacity planning guidance, but unless the workload used in the
tests matches your own, the results of these benchmarks might have little applicability
to your own specific problems.
Experienced computer performance analysts understand that nonlinear scalability is
to be expected when you reach the processing capacity at some bottlenecked
resource. You can expect that computer performance will cease to scale linearly at
some point as the number of users increases. As the system approaches its capacity
limits, various performance statistics that measure the amount of work being performed tend to level off. Moreover, computer systems do not degrade gracefully.
When a performance bottleneck develops, measures of application response time
tend to increase very sharply. A slight increase in the amount of work that needs to be
processed, which causes a very sharp increase in the response time of the application,
is often evidence of a resource bottleneck. This nonlinear relationship between utilization and response time is also explored in the next section.
Being able to observe a capacity constraint that limits the performance of some real-world application as the load increases, as illustrated in Figure 1-2, is merely the starting point of computer performance analysis. Once you understand that a bottleneck
is constraining the performance of the application, your analysis should proceed to
identify the component of the application (or the hardware environment that the
application runs in) that is the root cause of the constraint. This book provides guidance on how to perform a bottleneck analysis, a topic that is discussed in Chapter 5,
“Performance Troubleshooting,” but you should be prepared—this step might require
considerable effort and skill.
After you find the bottleneck, you can then proceed to consider various steps that
could relieve this capacity constraint on your system. This is also a step that might
require considerable effort and skill. The alternatives you evaluate are likely to be very
specific to the problem at hand. For example, if you determine that network capacity
is a constraint on the performance of one of your important applications, you will
need to consider practical approaches for reducing the application’s network load, for
example, compressing data before it is transmitted over the wire, or alternatively, adding network bandwidth. You might also need to weigh both the potential cost and the
benefits of the alternatives proposed before deciding on an effective course of action
to remedy the problem. Some factors you might need to consider include:
■ How long it will take to implement the change and bring some desperately needed relief to the situation
■ How long the change will be effective, considering the current growth rate in the application’s usage
■ How to pay for the change, assuming there are additional costs involved in making the change (for example, additional hardware or software that must be procured)
Bottleneck analysis is a proven technique that can be applied to diagnose and resolve
a wide variety of performance problems. Your success in using this technique
depends on your ability to gather the relevant performance statistics you will need to
understand where the bottleneck is. Effective performance monitoring procedures are
a necessary first step. Understanding how to interpret the performance information
you gathered is also quite important.
In benchmark runs, simulated users continue to be added to the system beyond the
system’s saturation point. Because these scalability articles report on the behavior of
only simulated “users,” they can safely ignore the impact on real customers and how
these customers react to a computer system that has reached its capacity limits. In real
life, system administrators must deal with dissatisfied customers who react harshly to
erratic performance conditions. There might also be serious economic considerations
associated with performance degradations. Workers who rely on computer systems to
get their daily jobs done on time will lose productivity. Customers who rely on your
applications might become so frustrated that they start to turn to your competitors for
better service. When important business applications reach the limits of their scalability using current hardware and software, one of those crisis-mode interventions discussed earlier is likely to ensue.
Performance Monitoring Concepts
This section introduces the standard computer performance terminology that will be
used in this book. Before you can apply the practices and procedures that are recommended in Chapter 4, “Performance Monitoring Procedures,” it is a good idea to
acquire some familiarity with these basic computer measurement concepts. By necessity, several mathematical formulas are introduced. These formulas are intended to
illustrate the basic concepts used in computer performance analysis. Readers who are
interested in a more formal mathematical presentation should consult any good computer science textbook on the subject.
Computers are electronic machines designed to perform calculations and other types
of arithmetic and logical operations. The components of a computer system—its central processing unit (CPU) or processor, disks, network interface card, and so on—that
actually perform the work are known generically as the computer’s resources. Each
resource has a finite capacity to perform designated types of work. Customers generate
work requests for the server machine (or machines) to perform. In this book we are
concerned primarily with Windows Server 2003 machines designed to service
requests from multiple customers. In analyzing the performance of a particular computer system with a given workload, we need to measure the following:
■ The capacity of those machines to perform this work
■ The rate at which the machines are currently performing it
■ The time it takes to complete specific tasks
The next section defines the terms that are commonly used to describe computer performance and capacity and describes how they are related to each other.
Definitions
Most computer performance problems can be analyzed in terms of resources, queues,
service requests, and response time. This section defines these basic performance
measurement concepts. It describes what they mean and how they are related.
Two of the key measures of computer capacity are bandwidth and throughput. Bandwidth is a measure of capacity, which is the rate at which work can be completed,
whereas throughput measures the actual rate at which work requests are completed.
Scalability, as discussed in the previous section, is often defined as the throughput of
the machine or device as a function of the total number of users requesting service.
How busy the various resources of a computer system get is known as their utilization.
How much work each resource can process at its maximum level of utilization is
defined as its capacity.
The key measures of the time it takes to perform specific tasks are queue time, service
time, and response time. The term latency is often used in an engineering context to
refer to either service time or response time. Response time will be used consistently
here to refer to the sum of service time and queue time. In networks, another key measure is round trip time, which is the amount of time it takes to send a message and
receive a confirmation message (called an Acknowledgement, or ACK for short) in reply.
When a work request arrives at a busy resource and cannot be serviced immediately,
the request is queued. Queued requests are subject to a queue time delay before they
are serviced. The number of requests that are delayed waiting for service is known as
the queue length.
Note The way terms like response time, service time, and queue time are defined
here is consistent with the way these same terms are defined and used in Queuing
Theory, which is a formal, mathematical approach used widely in computer performance analysis.
Elements of a Queuing System
Figure 1-3 illustrates the elements of a simple queuing system. It depicts customer
requests arriving at a server for processing. This example illustrates customer requests
for service arriving intermittently. The customer requests are for different amounts of
service. (The service request arrival rate and service time distributions are both nonuniform.) The server in the figure could be a processor, a disk, or a network interface
card (NIC). If the device is free when the request arrives, it goes into service immediately. If the device is already busy servicing some previous request, the request is
queued. Service time refers to the time spent at the device while the request is being
processed. Queue time represents the time spent waiting in the queue until the server
becomes available. Response time is the sum of both service time and queue time.
How busy the server gets is its utilization.
Figure 1-3 The elements of a queuing system (service requests with a given arrival rate distribution enter a queue in front of a single server; response time spans both queue time and service time)
The computer resource and its queue of service requests depicted in Figure 1-3 leads
to a set of mathematical formulas that can characterize the performance of this queuing system. Some of these basic formulas in queuing theory are described later. Of
course, this model is too simple. Real computer systems are much more complicated.
They have many resources, not only one, that are interconnected. At a minimum, you
might want to depict some of these additional resources, including the processor, one
or more disks, and the network interface cards. Conceptually, these additional components can be linked together in a network of queues. Computer scientists can successfully model the performance of complex computer systems using queuing
networks such as the one depicted in Figure 1-4. When specified in sufficient detail,
queuing networks, similar to the one illustrated, can model the performance of complex computer systems with great accuracy.
Figure 1-4 A network of queues (service requests flow through the processor queue to the CPU and then on through the individual disk queues and the NIC)
Note
Not all the hardware resources necessary for a computer system to function
are easily represented in a simple queuing model like the one depicted in Figure 1-4.
The memory that a computer uses is one notable resource missing from this simple
queuing model. Physical memory, or RAM, is not utilized in quite the same way as
other resources like CPUs and disks. Cache buffering is another important element
that might not be easy to characterize mathematically. Computer scientists use much
more complex queuing models to represent a complex machine environment and all
its critical resources accurately. For example, virtual memory overflowing to the paging file is usually represented indirectly as an additional disk I/O workload. If an element is important to performance, computer scientists usually find a way to represent
it mathematically, but those more complex representations are beyond the scope of
this chapter.
Bandwidth
Bandwidth measures the capacity of a link, bus, channel, interface, or the device itself
to transfer data. Bandwidth is usually measured in either bits/second or bytes/second
(where there are 8 bits in a data byte). For example, the bandwidth of a 10BaseT
Ethernet connection is 10 megabits per second (Mbps), the bandwidth of an Ultra160 SCSI disk interface is 160 megabytes per second (MBps), and the bandwidth of the PCI-X 64-bit 100
megahertz (MHz) bus is 800 MBps.
Bandwidth usually refers to the maximum theoretical data transfer rate of a device
under ideal operating conditions. Therefore, it is an upper-bound on actual performance. You are seldom able to measure a device actually performing at its full rated
bandwidth. Devices cannot reach their advertised performance level because overhead
is often associated with servicing work requests. For example, you can expect operating system overhead, protocol message processing time, and a delay in disk positioning to absorb some of the available bandwidth for each request to read or write a disk.
These overhead factors mean that the application can seldom use the full rated bandwidth of a disk for data transfer. As another example, various overheads associated
with network communication protocols reduce the theoretical capacity of a 100 Mbps
Fast Ethernet link to significantly less than 10 MBps. Consequently, discussing effective bandwidth or effective capacity—the amount of work that can be accomplished
using the device under real-world conditions—is usually more realistic.
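To illustrate the arithmetic, the short Python sketch below estimates the effective payload capacity of a Fast Ethernet link after frame-level protocol overhead. The per-frame sizes are typical published values, and the calculation deliberately ignores acknowledgement traffic, retransmissions, and operating system processing time, all of which reduce the effective rate still further.

# Rough effective-bandwidth estimate for a 100 Mbps Fast Ethernet link.
# Only frame-level overhead is counted here; ACKs, retransmissions, and
# host processing push real-world effective rates lower still.

rated_bits_per_sec = 100_000_000   # rated link speed

payload = 1460                     # typical TCP payload per frame (MSS)
overhead = 40 + 14 + 4 + 8 + 12    # TCP/IP headers, Ethernet header + CRC,
                                   # preamble, and inter-frame gap (bytes)

efficiency = payload / (payload + overhead)
effective_mbps = rated_bits_per_sec * efficiency / 8 / 1_000_000

print(f"Frame-level efficiency: {efficiency:.1%}")
print(f"Effective payload capacity: ~{effective_mbps:.1f} MBps")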
Throughput
Throughput measures the rate that work requests are completed, from the point of
view of some observer. Examples of throughput measurements include the number of
reads per second from the disk or file system, the number of instructions per second
executed by the processor, the number of HTTP requests per second processed by a Web server, and the number of transactions per second processed by a database engine.
Throughput and bandwidth are very similar. Bandwidth is often construed as the
maximum capacity of the system to perform work, whereas throughput is the current
observed rate at which that work is being performed.
Utilization
Utilization measures the fraction of time that a device is busy servicing requests, usually reported as a percent busy. Utilization of a device varies from 0 through 1, where
0 is idle and 1 (or 100 percent) represents utilization of the full bandwidth of the
device. It is customary to report that the processor or CPU is 75 percent busy, or the
disk is 40 percent busy. It is not possible for a single device to ever be greater than 100
percent busy.
Measures of resource utilization are common in Windows Server 2003. Later in this
chapter, many of the specific resource utilization measurements that you are able to
gather on your Windows Server 2003 machines will be described. You can easily find
out how busy the processors, disks, and network adaptors are on your machines. You
will also see how these utilization measurements are derived by the operating system,
often using indirect measurement techniques that save on overhead. Knowing how
certain resource utilization measurements are derived will help you understand how
to interpret them.
Monitoring the utilization of various hardware components is an important element
of any capacity planning exercise. If an application server is currently processing 60
transactions per second with a CPU utilization measured at 20 percent, the server
apparently has considerable reserve capacity to process transactions at an even higher
rate. On the other hand, a server processing 60 transactions per second running at a
CPU utilization of 98 percent is operating at or near its maximum capacity.
In forecasting your future capacity requirements based on current performance levels,
understanding the resource profile of workload requests is very important. If you are
monitoring an IIS Web server, for example, and you measure processor utilization at
20 percent busy and the transaction rate at 50 HTTP GET Requests per second, it is
easy to see how you might create the capacity forecast shown in Table 1-1.
Table 1-1 Forecasting Linear Growth in Processor Utilization as a Function of the Service Request Arrival Rate

HTTP GET Requests/Sec    % Processor Time
50                       20%
100                      40%
150                      60%
200                      80%
250                      100%
The measurements you took and the analysis you performed enabled you to anticipate that having to process 250 HTTP GET Requests per second at this Web site
would exhaust the current processor capacity. This conclusion should then lead you
to start tracking the growth of your workload, with the idea of recommending additional processor capacity as the GET Request rate approaches 200 per second, for
example.
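The straight-line projection in Table 1-1 is easy to reproduce. The short Python sketch below scales the measured cost of 0.4 percent processor time per request across higher arrival rates; the 200-requests-per-second alert threshold is the planning assumption suggested above.

# Linear processor-capacity forecast from one measured operating point,
# reproducing the projection in Table 1-1.

measured_rate = 50      # HTTP GET Requests/sec observed
measured_cpu = 20.0     # % Processor Time observed at that rate

cpu_per_request = measured_cpu / measured_rate   # 0.4% CPU per request/sec

for rate in (50, 100, 150, 200, 250):
    projected = rate * cpu_per_request
    note = "  <- plan for more capacity" if rate >= 200 else ""
    print(f"{rate:3d} GETs/sec -> {projected:5.1f}% processor time{note}")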
You have just executed a simple capacity plan designed to cope with the scalability
limitations of the current computer hardware environment for this workload. Unfortunately, computer capacity planning is rarely quite so simple. For example, Web
transactions use other resources besides the CPU, and one of those other resources
might reach its effective capacity limits long before the CPU becomes saturated.
And there are other complicating factors. One operating assumption in this simple
forecast is that processing one HTTP GET Request every second in this environment
requires 0.4 percent processor utilization, on average. This assumption is based on
your empirical observation of the current system. Other implicit assumptions in this
approach include:
■ Processor utilization is a simple, linear function of the number of HTTP GET Requests being processed
■ The service time at the processor per HTTP GET Request—the amount of processor utilization per request—remains constant
Unfortunately, these implicit assumptions might not hold true as the workload grows.
Because of caching effects, for example, the amount of processor time per request
might vary as the workload grows. If the caching is very effective, the amount of processor time per request could decrease. If the caching loses effectiveness as the workload grows, the average amount of processor time consumed per request might
increase. You will need to continue to monitor this system as it grows to see which of
these cases holds.
The component functioning as the constraining factor on throughput—in this case,
the processor when 250 HTTP GET Requests per second are being processed—is designated as the bottlenecked device. If you improve performance at the bottlenecked
device—by upgrading to a faster component, for example—you are usually able to
extend the effective capacity of the computer system to perform more work.
Tip
Measuring utilization is often very useful in detecting system bottlenecks. Bottlenecks are usually associated with processing constraints at some overloaded device.
It is usually safe to assume that devices observed operating at or near their 100 percent utilization limits are bottlenecks, although things are not always that simple, as
discussed later in this chapter.
It is not always easy to identify the bottlenecked device in a complex computer system
or a network of computer systems. For example, 80 or 90 percent utilization is not
necessarily the target threshold for all devices. Some computer equipment like disk
drives perform more efficiently under heavier loads. These and other anomalies make
the straight-line projections shown in Table 1-1 prone to error if load-dependent servers
are involved.
Service Time
Service time measures how long it takes to process a specific customer work request.
Engineers alternatively often speak of the length of time to process a request as the
device’s latency, which is another word for delay. For example, memory latency measures the amount of time the processor takes to fetch data or instructions from RAM
or one of its internal memory caches. Other related measures of service time are the
turnaround time for requests, usually ascribed to longer running tasks such as disk-to-tape backup runs. The round trip time is an important measure of network latency
because when a request is sent to a destination across a communications link using
the Transmission Control Protocol/Internet Protocol (TCP/IP), the sender must wait
for a reply.
The service time of a file system request, for example, will vary based on whether the
request is cached in memory or requires a physical disk operation. The service time
will also vary according to whether it is a sequential read of the disk, a random read of
the disk, or a write operation. The expected service time of the physical disk request
also varies depending on the block size of the request. These workload dependencies
demand that you measure disk service time directly instead of relying on projections that
are based on some idealized model of disk performance.
The service time for a work request is generally assumed to be constant, a simple function of the device’s speed or its capacity. Though this is largely true, under certain circumstances, device service times can vary as a function of utilization. Using intelligent
scheduling algorithms, it is often possible for processors and disks to work more efficiently at higher utilization rates. You are able to observe noticeably better service
times for these devices when they are more heavily utilized. Some aspects of these
intelligent scheduling algorithms are described in greater detail later in this chapter.
The service time spent processing an ASP.NET Web-based application request can be
broken down into numerous processing components—for example, time spent in the
application program, time spent during processing by .NET Framework components,
time spent in the operating system, and time spent in database processing. For each
one of these subcomponents, the application service time can be further decomposed
into time spent at various hardware components, for example, the CPU, the disk, and
the network. Decomposition is an important technique used in computer performance
analysis to relate a workload to its various hardware and software processing components. To decompose application service times into their component parts, you must
understand how busy various hardware components are and, specifically, how workloads contribute to that utilization. This can be very challenging for many Windows
Server 2003 transaction processing applications because of their complexity. You will
need to gather detailed trace data to accurately map all the resources used by applications to the component parts of individual transactions.
Response Time
Response time is the sum of service time and queue time:
response time = service time + queue time
Mathematically, this formula is usually represented as follows:
W = Ws + Wq
where W is latency, Ws is the service time, and Wq is the queue time.
Response time includes both the device latency and any queuing delays that accrue
while the request is queued waiting for the device. At heavily utilized devices, queue
time is likely to represent a disproportionate amount of the observed response time.
Queue time is discussed in greater detail in the next section.
Transaction response time also refers to the amount of time it takes to perform some
unit of work, which can further be decomposed into the time spent using (and sometimes waiting to use) various components of a computer system. Because they best
encapsulate the customer’s experience interacting with an application hosted on a
Windows Server 2003 machine, measures of application response time are among the
most important measures in computer performance and capacity planning. Wherever
possible, management reports detailing application response times are preferable to
reports showing the utilization of computer resources or their service times.
Queue Time
When a work request arrives at a busy resource and cannot be serviced immediately,
the request is queued. Requests are subject to a queue time delay once they begin to
wait in a queue before being serviced.
Queue time arises in a multi-user computer system like Windows Server 2003
because important computer resources are shared. Shared resources include the processor, the disks, and network adaptors. This sharing of devices is orchestrated by the
operating system using locking structures in a way that is largely transparent to the
individual programs you are running. The operating system guarantees the integrity
of shared resources like the processor, disks, and network interfaces by ensuring that
contending applications can access them only serially, or one at a time. One of the
major advantages of a multi-user operating system like Windows Server 2003 is that
resources can be shared safely among multiple users.
When a work request arrives at a shared resource that is already busy servicing
another request, the operating system queues the request and queue time
begins to accumulate. The one aspect of sharing resources that is not totally transparent to programs executing under Windows Server 2003 is the potential performance
impact of resource sharing. Queuing delays occur because shared resources have multiple applications attempting to access these resources in parallel. Significant delays at
a constrained resource are apt to become visible. If there is significant contention for
a shared resource because two or more programs are attempting to use it at the same
time, performance might suffer. When there is a performance problem on a local Windows workstation, only one user suffers. When there is a performance problem on a
Windows Server 2003 application server, a multitude of computer users can be
affected.
On a very heavily utilized component of a system, queue time can become a very significant source of delay. It is not uncommon for queue time delays to be longer than
the amount of time actually spent receiving service at the device. No doubt, you can
relate to many real-world experiences where queue time is significantly greater than
service time. Consider the time you spend waiting in line in your car at a tollbooth.
The amount of time it takes you to pay the toll is often insignificant compared to the
time you spend waiting in line. The amount of time spent waiting in line to have your
order taken and filled at a fast-food restaurant during the busy lunchtime period is
often significantly longer than the time it takes to process your order. Similarly, queuing delays at an especially busy shared computer resource can be prolonged. It is
important to monitor the queues at shared resources closely to identify periods when
excessive queue time delays are occurring.
Important Measurements of either the queue time at a shared resource or the
queue length are some of the most important indicators of performance you will
encounter.
Queue time can be difficult to measure directly without adding excessive measurement overhead. Direct measurements of queue time are not necessary if both the service time and the queue depth (or queue length) can be measured reliably and
accurately. If you know the queue length at a device and the average service time, the
queue time delay can be estimated reliably, as follows:
queue time = average queue length × average service time
This simple formula reflects the fact that any queued request must wait for the request
currently being serviced to complete.
Actually, this formula overestimates queue time slightly. On average, the queue time of
the first request in the queue is only one half of the service time. Subsequent requests
that arrive and find the device busy and at least one other request already in the queue
are then forced to wait. Therefore, a better formula is:
queue time = ((queue length−1) × average service time) + (average service time÷2)
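Both estimates are simple to compute. In the Python sketch below, the queue length and service time are hypothetical measurements supplied only for illustration.

# Estimating queue time from a measured queue length and service time.

queue_length = 3         # average number of queued requests (assumed)
service_time_ms = 10.0   # average service time in milliseconds (assumed)

# Simple estimate: every queued request waits one full service time.
simple = queue_length * service_time_ms

# Refined estimate: the request currently in service is, on average,
# already half finished.
refined = (queue_length - 1) * service_time_ms + service_time_ms / 2

print(f"Simple estimate : {simple:.1f} ms")    # 30.0 ms
print(f"Refined estimate: {refined:.1f} ms")   # 25.0 ms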
Of course, service time is not always so easy to measure either. However, service time
can often be computed using the Utilization Law in cases where it cannot be measured directly but the device utilization and arrival rate of requests are known.
Not all computer resources are shared on Windows Server 2003, which means that
these unshared devices have no queuing time delays. Input devices like the mouse
and keyboard, for example, are managed by the operating system so that they are
accessible by only one application at a time. These devices are buffered to match the
speed of the people operating them because they are capable of generating interrupts
faster than the application with the current input focus can process their requests.
Instead of queuing these requests, however, the operating system device driver routines for the keyboard and mouse discard extraneous interrupts. The effect is that little or no queue time delay is associated with these devices.
Bottlenecks
One of the most effective methods used to tune performance is systematically to identify bottlenecked resources and then work to remove or relieve them. When the
throughput of a particular system reaches its effective capacity limits, the system is
said to be bottlenecked. The resource bottleneck is the component that is functioning
at its capacity limit. The bottlenecked resource can also be understood as the resource
with the fastest growing queue as the number of users increases.
Important
Empirically, you can identify the bottlenecked resource that serves to
constrain system scalability as the resource that saturates first or the one with the fastest growing queue. The goal of performance tuning is to create a balanced system
with no single bottlenecked resource in evidence. A balanced system is one in which
no resource saturates before any other as the load increases, and all resource queues
grow at the same rate. In a balanced system, queue time delays are minimized across
all resources, leading to performance that is optimal for a given configuration and
workload.
It is important to know when a computer system is operating at the capacity limit of
one of its components. It means, for example, that no amount of tweaking
the tuning parameters is going to overcome the capacity constraint and allow the system to perform more work. You need more capacity, and any other resolution short of
providing some capacity relief is bound to fall short!
Once you identify a bottlenecked resource, you should follow a systematic approach
to relieve that limit on performance and permit more work to get done. You might
consider these approaches, for example:
■ Optimizing the application so that it runs more efficiently (that is, utilizes less bandwidth) against the specific resource
■ Upgrading the component of the system that is functioning at or near its effective bandwidth limits so that it runs faster
■ Balancing the application across multiple resources by adding more processors, disks, network segments, and so on, and processing it in parallel
Possibly, none of these alternatives for relieving a capacity constraint will succeed in
fixing the problem quickly enough to satisfy your users. In these cases, it might be
worthwhile to resort to tweaking this or that system or application tuning parameter
to provide some short-term relief. The most important settings for influencing system
and application performance in Windows Server 2003 are discussed in Chapter 6,
“Advanced Performance Topics.” There are also many run-time settings associated
with applications such as Exchange or Internet Information Services (IIS) that can
impact performance. Many application-oriented optimizations are documented in
other Resource Kit publications or in white papers available at http://www.microsoft.com.
A number of highly effective performance optimizations are built into Windows
Server 2003. These settings are automatic, but there might be additional adjustments
that are worthwhile for you to consider making manually. These manual adjustments
are discussed in Chapter 6, “Advanced Performance Topics.” Some of the built-in performance optimizations employ intelligent scheduling algorithms. Keep in mind that
scheduling algorithms can improve performance only when
enough queued requests are outstanding that it makes a difference which request the
operating system schedules next. Consequently, the busier the resource is, the greater
the impact these scheduling optimizations have. The next section
explains how these scheduling algorithms work.
Managing Queues for Optimal Performance
If multiple requests are waiting in a queue, the queuing discipline is what determines
which request is serviced next. Most queues that humans occupy when they are wait-
24
Microsoft Windows Server 2003 Performance Guide
ing for service are governed by the principle of fairness. A fair method of ordering the
queue is First Come, First Serve (FCFS). This is also known as a FIFO queue, which
stands for First In, First Out. This principle governs how you find yourself waiting in
a bank line to cash a check, for example. FIFO is considered fair because no request
that arrives after another can be serviced before requests that arrived earlier are themselves satisfied. Round robin is another fair scheduling algorithm where customers
take turns receiving service.
Unfair scheduling algorithms Fair scheduling policies do not always provide optimal performance. For performance reasons, the Windows Server 2003 operating system does not always use fair scheduling policies for the resource queues it is
responsible for managing. Where appropriate, Windows Server 2003 uses unfair
scheduling policies that can produce better results at a heavily loaded device. The
unfair scheduling algorithms that are implemented make it possible for devices such
as processors and disks to work more efficiently under heavier loads, for example.
Priority queuing with preemptive scheduling Certain work requests are
regarded as higher priority than others. If both high priority and low priority requests
are waiting in the queue, it makes sense for the operating system to schedule the
higher priority work first. On Windows Server 2003, queued requests waiting for the
processor are ordered by priority, with higher priority work taking precedence over
lower priority work.
The priority queuing scheme used to manage the processor queue in Windows Server
2003 has at least one additional feature worth considering here. The processor hardware is also used to service high priority interrupts from devices. Devices such as disks
interrupt the processor to signal that an I/O request that was initiated earlier is now
complete. When a device interrupt occurs, the processor stops executing the current
program thread and begins to immediately service the device that generated the
higher priority interrupt. (The interrupted program thread is queued and rescheduled to resume execution after the interrupt is serviced.) Higher priority work that is
scheduled to run immediately and interrupts a lower priority thread that is already
running is called preemptive scheduling. Windows Server 2003 uses both priority queuing and preemptive scheduling to manage the system’s processor queue. The priority
queuing scheme used by the operating system to manage the processor queue is
reviewed in more detail later in this chapter.
Priority queuing has a well-known side effect that becomes apparent when a resource
is very heavily utilized. If there are enough higher priority work requests to saturate
the processor, lower priority requests might get very little service. This is known as
starvation. When a resource is saturated, priority queuing ensures that higher priority
work receives preferred treatment, but lower priority work can suffer from starvation.
Lower priority work could remain delayed in the queue, receiving little or no service
for extended periods. The resource utilization measurements that are available on
Windows Server 2003 for the processor allow you to assess whether the processor is
saturated, what work is being performed at different priority levels, and whether low
priority tasks are suffering from starvation.
Serving shorter requests first An important result of Queuing Theory is that when
queued requests can be sorted according to the amount of service time that will be
needed to complete the request, higher throughput is achieved if the shorter work
requests are serviced first. This is the same sort of optimization that supermarkets use
when they have shoppers sort themselves into two sets of queues based on the number of items in their shopping carts. In scheduling work at the processor, for example,
this sorting needs to be done based on the expected service time of the request,
because the actual duration of a service request is not known in advance. Windows
Server 2003 implements a form of dynamic sorting that boosts the priority of processor service requests that are expected to be short and reduces the priority of requests
that are expected to take longer. Another situation in which queued requests are
ordered by the shortest service time first is when Serial ATA or SCSI disks are enabled
for tagged command queuing.
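The benefit of this kind of sorting is easy to demonstrate. The Python sketch below compares FIFO ordering against shortest-expected-service-time-first ordering for one batch of queued requests; the service times are invented, and real schedulers work from estimates and priorities rather than exact durations.

# Average wait under FIFO versus shortest-expected-service-time-first.

def average_wait(service_times):
    """Average time each request waits before its own service begins."""
    total_wait, elapsed = 0.0, 0.0
    for t in service_times:
        total_wait += elapsed
        elapsed += t
    return total_wait / len(service_times)

requests = [12.0, 3.0, 8.0, 1.0, 5.0]   # expected service times (invented)

print(f"FIFO average wait          : {average_wait(requests):.1f}")
print(f"Shortest-first average wait: {average_wait(sorted(requests)):.1f}")

Completing the same five requests, shortest-first ordering cuts the average wait by more than half without changing the total amount of work performed.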
For all the benefit these intelligent scheduling algorithms confer, it is important to
realize that reordering the device queue can have a significant performance impact
only when there is a long queue of work requests that can be rearranged. Computer
systems on which these scheduling algorithms are most beneficial have a component
that is saturated for an extended period of time, allowing lengthy lines of queued
requests to build up which can then be sorted. Such a system is by definition out of
capacity. Reducing the queue depth at the bottlenecked device by adding capacity, for
example, is a better long-term solution. You should configure machines with sufficient
capacity to service normal peak loads so that lengthy queues that can be sorted optimally are the exception, not the rule. While intelligent scheduling at a saturated
device can provide some relief during periods of exceptional load, its effectiveness
should never divert your attention from the underlying problem, which is a shortage
of capacity at the resource where the queue is being manipulated so favorably.
Bottleneck Analysis
To make the best planning decisions, a traditional approach is to try to understand
hardware speeds and feeds—how fast different pieces of equipment are capable of running. This approach, however, is much more difficult than it sounds. For example, it
certainly sounds like a SCSI disk attached to a 20-MBps SCSI-2 adapter card would
run much slower than one attached to an 80-MBps UltraSCSI-3 adapter card.
UltraSCSI-3 sounds like it should beat an older SCSI-2 configuration every time. But
the fact is that there might be little or no practical difference in the performance of the
two configurations. One reason there might be no difference is because the disk might
transfer data only at 20 MBps anyway, so the extra capacity of the UltraSCSI-3 bus is
never being utilized.
Important
A complex system can run only as fast as its slowest component. This is
the principle that underlies the technique of bottleneck analysis.
The principle that a complex system will run only as fast as its slowest component
forms the basis for a very useful analysis technique called bottleneck analysis. The slowest device in a configuration is often the weakest link. Find it and replace it with a
faster component, and you have a good chance of improving performance. Replacing
some component other than the bottleneck device with a faster component will not
appreciably improve performance. This theory sounds good, of course, but you probably noticed that the rule does not tell you how to go about finding this component.
Given the complexity of many modern computer networks, this seemingly simple
task is actually quite complicated.
In both theory and practice, performance tuning is the process of locating the bottleneck in a configuration and removing it—somehow. The system’s performance will
improve until the next bottleneck manifests, which you can then identify and remove.
Easing a bottleneck for an overloaded resource usually entails replacing it with a
newer, faster version of the same component. For example, if network bandwidth is a
constraint on performance, upgrade the configuration from 10 Mb Ethernet to 100
Mb Fast Ethernet. If the network actually is the bottleneck, performance should
improve.
A system in which all the bottlenecks have been removed can be said to be a balanced
system. All the components in a balanced system are at least capable of handling the
flow of work from component to component without excessive delays building up at
any one particular component. For a moment, think of the network of computing
components where work flows from one component (the CPU) to another (the disk),
back again, then to another (the network) and back again to the CPU, as depicted in
Figure 1-5. When different workload processing components are evenly distributed
across the hardware devoted to doing the processing, that system is balanced.
You can visualize a balanced system (and not one that is simply over-configured) as
one in which workload components are evenly distributed across the processing
resources. If there are delays, the work that is waiting to be processed is also evenly
distributed in the system. Work that is evenly distributed around the system waiting
to be processed is illustrated in Figure 1-5. Suppose you could crank up the rate at
which requests arrive to be serviced. (Think of SQL Server requests to a Windows
Server 2003 database, for example, or logon requests to an Active Directory authentication server.) If the system is balanced, you will observe that work waiting to be processed remains evenly distributed across system components, as shown in Figure 1-5.
Figure 1-5 A balanced system (queued requests are distributed evenly across the processor queue and each of the disk queues)
If instead you observe something like what is depicted in Figure 1-6, where many
more requests are waiting behind just one of the disks, you have identified with some
authority the component that is the bottleneck in the configuration. When work
backs up behind a bottlenecked device, delays there can cascade, causing delays to
build up elsewhere in the configuration. Because the manner in which work flows
through the system might be complicated, a bottlenecked resource can impact processing at other components in unexpected ways. Empirically, it is sufficient to
observe that work accumulates behind the bottlenecked device at the fastest rate as the
workload rate increases. Replacing this component with a faster processing component should improve the rate that work that can flow through the entire system.
Figure 1-6 A bottlenecked system (requests back up behind a single disk queue while the processor queue and the other disk queues remain short)
Utilization Law
The utilization of a device is the product of the observed rate in which requests are
processed and the service time of those requests, as follows:
utilization = completion rate × service time
This simple formula relating device utilization, the request completion rate (or
throughput), and the service time is known as the Utilization Law. Service time is
often difficult to measure, but the Utilization Law makes it possible to measure the
throughput rate and the utilization of a disk, for example, and derive the disk service
time. A disk that processes 30 input/output (I/O) operations per second with an average service time of 10 milliseconds is busy processing requests 30 × 0.010 sec = 300
milliseconds of utilization every second, or 30 percent busy.
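The same arithmetic, expressed as a small Python sketch, shows how measuring any two terms of the Utilization Law lets you derive the third; the figures repeat the disk example just given.

# The Utilization Law: utilization = completion_rate x service_time.
# Measure any two of the three terms and derive the remaining one.

completion_rate = 30    # I/O operations per second (example from the text)
service_time = 0.010    # seconds per operation

utilization = completion_rate * service_time
print(f"Utilization: {utilization:.0%}")                  # 30%

# When only utilization and throughput are measured, recover service time.
derived_service_time = utilization / completion_rate
print(f"Derived service time: {derived_service_time * 1000:.0f} ms")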
Utilization, by definition, is limited to 0–100 percent device-busy. If it is not possible for
a device to be more than 100 percent utilized, what happens when requests arrive at a
device faster than they can be processed? The answer, of course, is that requests for service that arrive faster than they can be serviced must be queued. A related question—
the relationship between queue time and utilization—is explored later in this chapter.
There is no substitute for direct measurements of device or application throughput,
service time, and utilization. But the Utilization Law allows you to measure two of the
three terms and then derive the remaining one. Many of the device utilization measurements reported in Windows Server 2003 are derived by taking advantage of the
Utilization Law.
Queue Time and Utilization
If request A arrives at an idle resource, it is serviced immediately. If the
resource is already busy servicing request B when request A arrives, request A is
queued for service, forced to wait until the resource becomes free. It should be apparent that the busier a resource gets, the more likely a new request will encounter a busy
device and be forced to wait in a queue. This relationship between utilization and
queue time needs to be investigated further. The insights revealed by a branch of mathematics known as Queuing Theory can shed light on this interesting relationship.
Queuing Theory is a branch of applied mathematics that is widely used to model computer system performance. It can be used, for example, to predict how queue time
might behave under load. Only very simple queuing models will be discussed here.
These simple models relate the following:
■ Server utilization (resources are termed servers)
■ Rate at which work requests arrive
■ Service time of those requests
■ Amount of queue time that can be expected as the load on the resource varies
Caution These simple models do not represent reality that closely and are used
mainly because they are easy to calculate. However, experience shows that these simple queuing models can be very useful in explaining how many of the computer performance measurements you will encounter behave—albeit up to a point.
You need to understand some of the important ways these simple models fail to reflect
the reality of complex computer systems so that you are able to use these mathematical
insights wisely. Simple queuing models, as depicted in Figure 1-3, are characterized by
three elements: the arrival rate of requests, the service time of those requests, and the
number of servers to service those requests. If those three components can be measured, simple formulas can be used to calculate other interesting metrics. Both the
queue length and the amount of queue time that requests are delayed while waiting for
service can be calculated using a simple formula known as Little’s Law. Queue time
and service time, of course, can then be added together to form response time, which is
usually the information you are most interested in deriving.
Arrival Rate Distribution
To put the mathematics of simple Queuing Theory to work, it is necessary to know
both the average rate that requests arrive and the distribution of arrivals around the
average value. The arrival rate distribution describes whether requests are spaced out
evenly (or uniformly) over the measurement interval or whether they tend to be
bunched together, or bursty. When you lack precise measurement data on the arrival
rate distribution, it is usually necessary to assume that the distribution is bursty (or
random). A random arrival rate distribution is often a reasonable assumption, especially if many independent customers are generating the requests. A large population
of users of an Internet e-business Web site, for example, is apt to generate a randomly
distributed arrival rate. Similarly, a Microsoft Exchange Server servicing the e-mail
requests of employees from a large organization is also likely to approximate a randomly distributed arrival rate.
But it is important to be careful. The independence assumption can also be a very
poor one, especially when the number of customers is very small. Consider, for example, a disk device with only one customer, such as a back-up process or a virus scan.
Instead of having random arrivals, a disk back-up process schedules requests to disk
one after another in a serial fashion. A program execution thread from a back-up program issuing disk I/O requests will generally not release another I/O request until the
previous one completes. When requests are scheduled in this fashion, it is possible for
a single program to drive the disk to virtually 100 percent utilization levels without
incurring any queuing delay. Most throughput-oriented disk processes routinely violate the independence assumption behind simple queuing models because they
schedule requests to the disk serially. As a consequence, a simple queuing model of
disk performance is likely to seriously overestimate the queue time at a disk device
being accessed by a few customers that are scheduling their requests serially.
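For intuition about what a random arrival distribution looks like, the Python sketch below draws exponentially distributed inter-arrival times, the standard model for a large population of independent requesters; the mean arrival rate is an arbitrary assumption.

import random

# Random (Poisson) arrivals have exponentially distributed inter-arrival
# times: many short gaps punctuated by occasional long ones, i.e., bursts.

random.seed(1)                # fixed seed so the sample is repeatable
mean_rate = 50.0              # mean arrivals per second (assumed)

for i in range(1, 11):
    gap = random.expovariate(mean_rate)
    print(f"request {i:2d} arrives {gap * 1000:6.2f} ms after the previous one")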
Service Time Distribution
It is also necessary to understand both the average service time for requests and the
distribution of those service times around the average value. Again, lacking precise
measurement data, it is simple to assume that the service time distribution is also random. It will also be useful to compare and contrast the case where the service time is
relatively constant, or uniform.
The two simple cases illustrated in Figure 1-7 are denoted officially as M/M/1 and
M/D/1 queuing models. The standard notation identifies the following:
arrival rate distribution/service time distribution/number of servers
where M is an exponential, or random, distribution; D is a deterministic (uniform) distribution; and 1
is the number of servers (resources).
Figure 1-7 Graphing response time as a function of utilization (M/M/1 and M/D/1 curves, assuming a constant service time of 10; response time on the y-axis rises sharply as utilization on the x-axis approaches 100 percent)
Both curves in Figure 1-7 show that the response time of a request increases sharply as
the server utilization increases. Because these simple models assume that service time
remains constant under load (not always a valid assumption), the increase in
response time is a result solely of increases in the request queue time. When device
utilization is relatively low, the response time curve remains reasonably flat. But by the
time the device reaches 50 percent utilization, in the case of M/M/1, the average
queue length is approximately 1. At 50 percent utilization in an M/M/1 model, the
amount of queue time that requests encounter is equal to the service time. To put it
another way, at approximately 50 percent busy, you can expect that queuing delays
lead to response times that are double the amount of time spent actually servicing the
request. Above 50 percent utilization, queue time increases even faster, and more and
more queue time delays accumulate. This is an example of an exponential curve,
where queue time (and response time) is a nonlinear function of utilization. As
resources saturate, queue time comes to dominate the application response time that
customers experience.
The case of M/D/1 shows queue time for a uniform service time distribution that is
exactly 50 percent of an M/M/1 random service time distribution with the same average service time. Reducing the variability of the service time distribution works to
reduce queue time delays. Many tuning strategies exploit this fact. If work requests
can be scheduled in a way to create a more uniform service time distribution, queue
time—and response time—are significantly reduced. That is why supermarkets, for
example, separate customers into two or three sets of lines based on the number of
items in their shopping carts. This smoothes out the service time distribution in the
supermarket checkout line and reduces the overall average queue time for shoppers
that are waiting their turn.
Queue Depth Limits
One other detail of the modeling results just discussed should be examined. Technically, these results are for open network queuing models that assume the average arrival
rate of new requests remains constant no matter how many requests are backed up in
any of the queues. Mathematically, open queuing models assume that the arrival rate
of requests is sampled from an infinite population. This is an assumption that is made
to keep the mathematics simple. The formula used to derive the queue time from utilization and service time for an M/M/1 model in Figure 1-7 is shown here:
Wq = (Ws × u) / (1 − u)
In this formula, u is the device utilization, Ws is the service time, and Wq is the queue
time.
The corresponding formula for an M/D/1 model where the service time distribution
is uniform is this:
Wq = (Ws × u) / (2 × (1 − u))
The practical problem with this simplifying assumption is that it predicts the queue
length growing at a faster rate than you are likely to observe in practice. Mathematically, both these formulas show extreme behavior as the utilization, u, approaches 100
percent. (As u approaches 1, the denominator in both formulas approaches 0.) This
produces a right-hand tail to the queue length distribution that rises hyper exponentially to infinity as a resource saturates.
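Transcribed directly into Python, the two formulas reproduce the curves in Figure 1-7; the service time of 10 matches the figure’s assumption.

# Open-model queue time formulas from the text:
#   M/M/1: Wq = (Ws * u) / (1 - u)
#   M/D/1: Wq = (Ws * u) / (2 * (1 - u))

def mm1_response(ws, u):
    return ws + (ws * u) / (1.0 - u)

def md1_response(ws, u):
    return ws + (ws * u) / (2.0 * (1.0 - u))

Ws = 10.0   # constant service time, as assumed in Figure 1-7
for u in (0.25, 0.50, 0.75, 0.90, 0.95):
    print(f"u = {u:.0%}: M/M/1 response = {mm1_response(Ws, u):6.1f}, "
          f"M/D/1 response = {md1_response(Ws, u):6.1f}")

Note how at 50 percent utilization the M/M/1 response time is exactly double the service time, and how both curves climb steeply past 90 percent utilization.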
In practice, when a bottleneck is evident, the arrival rate of new requests slows down
as more and more customers get stuck in the system, waiting for service. It takes a
more complicated mathematical approach to model this reality. Closed network queuing models are designed to reflect this behavior. In a closed model, there is an upper
limit on the customer population. Examples of such an upper limit in real life include
the number of TCP sessions established that sets an upper limit on the number of network requests that can be queued for service. In dealing with the processor queue,
there is an upper limit on the queue depth based on the number of processing threads
in the system eligible to execute. This upper limit on the size of the population creates
a practical limit on maximum queue depth. If 200 processing threads are defined, for
example, it is impossible for the processor queue depth (representing threads that are
queued for service) to exceed 199. The practical limit on the processor queue depth is
even lower. Because many processing threads are typically idle, a more practical upper
limit on the processor queue depth you are likely to observe is the number of processing threads that are eligible to run, that is, those threads not in a voluntary Wait state.
The number of processing threads that are currently eligible to run, in fact, sets a practical upper limit on the processor queue depth. Once every thread that is eligible to
run is stuck in the processor queue, jammed up behind a runaway thread in a high
priority loop, for example, no new requests for processor service are generated.
Though closed network queuing models are much more capable of modeling reality
than the simple open models, they lead to sets of simultaneous equations that need to
be solved, and the mathematics is beyond the scope of this chapter. You can learn
more about these equations and the techniques for solving them in any good computer science textbook on the subject. The point of this discussion is to provide some
advice on when to use the simple equations presented here. The simple formulas for
M/M/1 and M/D/1 open queuing models suffice for many capacity planning exercises that do not require a precise solution. However, they break down when there is
a saturated resource. Up until the point when server utilization starts to approach 100
percent, the simple open models generally do a reasonably good job of predicting the
behavior you can observe in practice.
Little’s Law
A mathematical formula known as Little’s Law relates response time and utilization.
In its simplest form, Little’s Law expresses an equivalence relation between response
time (W), the arrival rate (λ), and the number of customer requests in the system (Q),
which is also known as the queue length:
Q = λ × W
Note that in this context, the queue length Q refers both to customer requests in service (Qs) and waiting in a queue (Qq) for processing. In Windows Server 2003, there
are several opportunities to take advantage of Little’s Law to estimate the response
time of applications where only the arrival rate and queue length are known. For the
record, Little’s Law is a very general result that applies to a large class of queuing models. It allows you to estimate response time in a situation in which measurements for
both the arrival rate and the queue length are available. Note that Little’s Law itself
provides no insight into how the response time (W) is broken down into the service
time (Ws) and the queue time delay (Wq).
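Rearranged as W = Q / λ, Little’s Law yields a response time estimate from two quantities that are often easy to measure. The Python sketch below uses hypothetical counter readings.

# Little's Law: Q = arrival_rate x response_time, so W = Q / arrival_rate.

arrival_rate = 120.0   # requests/sec (hypothetical measured throughput)
queue_length = 3.6     # average requests in the system (hypothetical)

response_time = queue_length / arrival_rate
print(f"Estimated response time: {response_time * 1000:.0f} ms")   # 30 ms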
Unfortunately, defining suitable boundaries for transactions in Windows applications
is very difficult, which is why there are not more direct measurements of response
time available in Windows Server 2003. Using Little’s Law, in at least one instance
(discussed later in this book) it is possible to derive reliable estimates of response time
from the available measurements.
The response time to service a request at a resource is usually a nonlinear function of
its utilization. This nonlinear relationship, together with Little’s Law, explains why linear scalability of applications is so difficult to achieve. Little’s Law is a simple and powerful construct, with many
applications to computer performance analysis. However, don’t expect simple formulas like Little’s Law to explain everything in computer performance. This chapter, for
example, will highlight several common situations where intelligent scheduling algorithms actually reduce service time at some computer resources the busier the
34
Microsoft Windows Server 2003 Performance Guide
resource gets. You cannot apply simple concepts like Little’s Law unreflexively to
many of the more complicated situations you can expect to encounter.
Response Time Revisited
As the utilization of shared components increases, processing delays tend to be
encountered more frequently. When the network is heavily utilized and Ethernet collisions occur, for example, network interface cards are forced to retransmit packets. As
a result, the service time of individual network requests elongates. The fact that
increasing the rate of requests often leads to processing delays at busy shared components is crucial. It means that you should expect that as the load on your server rises
and bottlenecks in the configuration start to appear, the overall response time associated with processing requests will not hold steady. Not only will the response time for
requests increase as utilization increases, but that response time will likely increase in
a nonlinear relationship with respect to utilization. In other words, as the utilization of a device increases from 80 percent to 90 percent busy, you might observe that the response time of requests doubles. (The M/M/1 formula predicts exactly this: response time is proportional to 1 / (1 − utilization), which doubles between 80 and 90 percent utilization.)
Response time, then, encompasses both the service time at the device processing the
request and any other delays encountered waiting for processing. Formally, response
time is defined as
response time = service time + queue time
where queue time (Wq) represents the amount of time a request waits for service. In
general, at low levels of utilization, there is minimal queuing, which allows device service time to dictate response time. As utilization increases, however, queue time
increases nonlinearly and, soon, grows to dominate response time.
Note
Although this discussion focuses on the behavior of a queue at a single
resource, network queuing models allow you to step back and analyze the response
time of customer transactions at a higher level. The customer transaction must first be
decomposed into a series of service demands against a set of related resources—how
many times, on average, each transaction uses each of the disks, for example. Then the
response time of the high-level transaction can be modeled as the sum of the
response times at each individual resource.
Conceptually, a client transaction can be represented using a workstation component,
a network transmission component, and a server component. Each of these components can be further understood as having a processor component, a disk component, a network component, and so on. To track down the source of a
performance problem, you might need measurement data on every one of the
resources involved in processing the request.
Queues are essentially data structures where requests for service are parked until they
can be serviced. Examples of queues abound in Windows Server 2003, and measures
showing the queue length or the amount of time requests wait in the queue for processing are some of the most important indicators of performance bottlenecks you are
likely to find. The Windows Server 2003 queues that will be discussed here include
the operating system’s thread scheduling queue, logical and physical disk queues, the
network interface queue, and the queue of Active Server Pages (ASP) and ASP.NET
Web server requests. Little’s Law shows how the rate of requests, the response time,
and the queue length are related. This relationship allows us to calculate, for example,
average response times for ASP.NET requests even though Windows Server 2003 does
not report the average response time of these applications directly.
Preventing Bottlenecks
With regular performance monitoring procedures, you will be able to identify devices
with high utilizations that lead to long queues in which requests are delayed. These
devices are bottlenecks throttling system performance. What can you do
about a bottleneck once you discover one? Several forms of preventive medicine can
usually be prescribed:
1. Upgrade to a faster device, if available.
2. Balance the workload across multiple devices, if possible.
3. Reduce the load on the device by tuning the application, if possible.
4. Change scheduling parameters to favor cherished workloads, if possible.
These are hardly mutually exclusive alternatives. Depending on the situation, you
might want to try more than one of them. Common sense should dictate which of
these alternatives you try first. Which change will have the greatest impact on performance? Which configuration change is the least disruptive? Which is the easiest to
implement? Which is possible to back out in case it makes matters worse? Which
alternative involves the least additional cost? Sometimes, the choices are fairly obvious, but often these are not easy questions to answer.
Measuring Performance
Response time measurements are important for another reason: they reflect the user's experience. The response time for a Web server, for example, is the amount of time between the instant a client selects a hyperlink and the moment the requested page is returned and displayed on her monitor. Because it reflects the user's perspective, the overall response time is the performance measure of greatest interest to the users of a computer system. It is axiomatic that long delays cause user dissatisfaction with their computer systems, although this relationship is usually not straightforward either. Human factors research, for instance, indicates that users might be bothered more by long, unexpected delays than by consistently long response times that they have become resigned to endure.
Conclusions
This section showed how the scalability concerns of computer performance analysts
can be expressed in formal mathematical terms. Queuing systems represent computer
workloads in the form of resources and their queues and customer requests for service. The total amount of time a customer waits for a request for service to complete is
the response time of the system. Response time includes both time in service at the
resource and the time delayed in the resource queue waiting for service. A complex
computer system can be represented as a network of interconnected resources and
queues.
The utilization of a resource is the product of the average service time of requests and
the request rate. This relationship, which is known as the Utilization Law, allows you
to calculate the service time of disk I/O requests from the measured utilization of the
disk and the rate of I/O completions.
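For instance, here is a minimal sketch of that calculation; the disk utilization and I/O rate figures are hypothetical measurements:

    /* The Utilization Law rearranged to derive disk service time from
       two measured values (hypothetical numbers):
       utilization = service time x arrival rate, so
       service time = utilization / arrival rate. */
    #include <stdio.h>

    int main(void)
    {
        double disk_utilization = 0.40;  /* 40% busy (measured) */
        double io_per_second    = 80.0;  /* I/O completions/sec (measured) */

        double service_time = disk_utilization / io_per_second;
        printf("Average disk service time: %.1f ms\n",
               service_time * 1000.0);
        return 0;
    }

A disk measured 40 percent busy while completing 80 I/Os per second has an average service time of 5 milliseconds per request.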
Queue time is often a significant factor delaying service requests. Consequently, minimizing queue time delays is an important performance and tuning technique. A formula known as Little’s Law expresses the number of outstanding requests in the
system, which includes queued requests, as the product of the arrival rate and
response time. Queue time tends to increase nonlinearly with respect to utilization.
The device with the fastest growing queue as the workload grows becomes the bottleneck that constrains performance and limits scalability. Identifying the bottlenecked
device in a configuration that is performing poorly is another important technique in
performance and tuning. Only configuration and tuning actions that improve service
time or reduce queuing at the bottlenecked device can be effective at relieving a capacity constraint. Any other configuration or tuning change that you make will prove
fruitless.
The formulas that were discussed in this section are summarized in Table 1-2.
Table 1-2   Formulas Used in Computer Performance Analysis

Formula                                     Derivation

Response time                               response time = service time + queue time

Utilization Law                             utilization = service time × arrival rate

Queue time as a function of queue           queue time = ((queue length − 1) ×
length and service time                     average service time) + (average service time / 2)

Queue time as a function of utilization     queue time = (service time × utilization) /
and service time (M/M/1)                    (1 − utilization)

Queue time as a function of utilization     queue time = (service time × utilization) /
and service time (M/D/1)                    (2 × (1 − utilization))

Little's Law                                queue length = arrival rate × response time
The Utilization Law and Little’s Law are especially useful when you have measurement data for two of the terms shown in the equation and can use the equations to
derive the third term. The simple open queuing model formulas are mainly useful conceptually, to demonstrate how response time is related to utilization as a system approaches saturation at one or more resources. The primary method used in computer performance and tuning is bottleneck analysis, which uses measurement data to identify saturated resources and then works to eliminate them systematically.
System Architecture
Using the performance monitoring tools provided with Windows Server 2003 effectively requires a solid background in computer systems architecture, knowledge that often takes systems administrators years of experience to acquire. This
section discusses the aspects of computer architecture that are particularly important
in understanding how to use the Windows Server 2003 performance tools effectively.
Among the topics addressed in this section are how Windows Server 2003 thread
scheduling at the processor works, how the system manages virtual memory and paging, Windows Server 2003 networking services, and the Windows Server 2003 I/O
Manager.
This section describes the basic architectural components of a Windows Server 2003
machine, emphasizing performance-related issues. Discussing these topics here is
intended to introduce a complex subject, providing the basic background you need to
pursue these more complex topics further, particularly as you begin to explore the
performance monitoring tools available to you in Windows Server 2003. Aspects of
the operating system that are most important in diagnosing and resolving performance problems are emphasized throughout.
The introductory material presented here is a prerequisite to the discussion in Chapter 3, “Measuring Server Performance,” which describes the most important performance statistics that can be gathered. This introductory material also provides the
background necessary to understand the information presented in Chapter 5, “Performance Troubleshooting.”
Using the Performance Monitor
The discussion of each major component of the Windows Server 2003 operating system focuses on the measurements that are available to help you understand how your
computer is performing.
Tip
The best way to learn about computer performance is through direct observation. Look for sections marked as Tips that invite you to follow along with the discussion. Use the built-in Performance Monitor tool to examine the performance counters
that are directly related to the topic you are reading about.
The Performance Monitor is probably the most important tool that you will use to
diagnose performance problems on a Windows Server 2003 machine. Complete documentation about using this and other performance monitoring tools is available in
Chapter 2, "Performance Monitoring Tools," in this book. To access the Performance Monitor, click Start, click Run, and type perfmon, or click Performance on the Administrative Tools menu.
Performance data in the Performance Monitor is organized into objects, which, in turn, contain a set of related counters. To add counters to a real-time display and view their current values, click the plus sign (+) on the button bar to open a counter selection dialog box. The names of the counters to watch are referenced in each Tip section, which looks like the Tip that follows. A simple object-name\counter-name convention is used here to identify each counter. For example, Processor\% Processor Time identifies a counter called % Processor Time that you will find under the Processor object. A complete guide to the syntax for Performance Monitor counters is also provided in Chapter 2, "Performance Monitoring Tools."
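If you prefer to collect counter data programmatically rather than through the Performance Monitor display, the Performance Data Helper (PDH) library in the Platform SDK accepts the same object-name\counter-name paths. The following is a minimal C sketch, with error checking omitted, that samples Processor(_Total)\% Processor Time over a 1-second interval:

    /* A minimal sketch using the Platform SDK's PDH library; error
       checking is omitted for brevity. */
    #include <windows.h>
    #include <pdh.h>
    #include <stdio.h>
    #pragma comment(lib, "pdh.lib")

    int main(void)
    {
        PDH_HQUERY query;
        PDH_HCOUNTER counter;
        PDH_FMT_COUNTERVALUE value;

        PdhOpenQuery(NULL, 0, &query);
        /* The same object-name\counter-name syntax described in the text */
        PdhAddCounter(query, TEXT("\\Processor(_Total)\\% Processor Time"),
                      0, &counter);

        PdhCollectQueryData(query);      /* first sample */
        Sleep(1000);                     /* wait one collection interval */
        PdhCollectQueryData(query);      /* second sample */

        PdhGetFormattedCounterValue(counter, PDH_FMT_DOUBLE, NULL, &value);
        printf("%% Processor Time: %.1f\n", value.doubleValue);

        PdhCloseQuery(query);
        return 0;
    }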
Tip
You can observe the performance of your machine in action using the System
Monitor console, which runs a desktop application called Performance Monitor. To
access the System Monitor console, open the Administrative Tools menu and select
Performance. You can watch the current values of many performance statistics using
this tool.
Performance statistics are organized into objects and counters. To add counters to the
current Performance Monitor Chart View, click the Add counters button, identified by
a plus sign (+), to select the counter values you want to observe.
Complete documentation about using the System Monitor console is available in
Chapter 2, “Performance Monitoring Tools.”
Operating Systems
Windows Server 2003 is an operating system program especially designed to run
enterprise and departmental-level server applications on a wide range of computers. A
computer system includes hardware components such as a central processing unit
(CPU, or processor, for short) that performs arithmetic and logical operations, a memory unit that contains active programs and data, at least one disk drive where computer data files are stored permanently, a network interface to allow the computer to
communicate with other computers, a video display unit that provides visual feedback, and a keyboard and mouse to allow for human input. An operating system is a control program: the software that allows you to utilize the computer system and that is required to run your computer hardware configuration.
Like many other examples of operating systems software, Windows Server 2003 provides many important control functions, such as these:
■ Supports Plug and Play services that allow it to recognize, install, and configure new peripherals such as printers, fax machines, scanners, and tape back-up devices.

■ Secures the use of your computers against misappropriation by unauthorized users.

■ Allows you to install and run many popular computer applications. In addition, it is capable of running multiple application programs at any one time, a capability known as multitasking.

■ Allows your programs to store and access data files on local disks. In addition, it allows you to set up and maintain file and print servers so that you can easily share data files and printers with other attached users.

■ Allows you to set up and maintain Web servers so that other users can interact and communicate using Internet protocols and other Web services.
Moreover, Windows Server 2003 is designed to function as an application server. It
can be configured to serve many roles, including a domain controller responsible for
security and authentication services; a database server running Microsoft SQL Server,
a database management system (DBMS); a messaging server running the Exchange
messaging and collaboration program; and a Web server running the Internet Information Services (IIS) application.
Windows Server 2003 is designed to run on a wide range of computer hardware configurations, including advanced multiprocessor systems costing millions of dollars.
Many of the advanced capabilities of Windows Server 2003 are discussed in Chapter
6, “Advanced Performance Topics.” From a performance perspective, some of the
most important aspects of the Windows Server 2003 operating system are the
advanced hardware functions it supports.
Figure 1-8 shows the basic structure of the Windows Server 2003 operating system
and its most important components. Operating system components run in a protected mode enforced by the hardware, known as Privileged state or Kernel mode. Kernel mode programs, such as device drivers, can access hardware components directly.
Applications running in User mode cannot—they access hardware components indirectly by calling the operating system services that are responsible for managing hardware devices.
[Figure: a block diagram of the operating system layers. User mode components (the security subsystem, the encryption subsystem, user processes, services, the Win32 subsystem, and Winlogon) sit above the Privileged mode Executive. The Executive contains the Server and Redirector, the I/O Manager stack with its file system drivers and device drivers, the network protocol stack (TCP/UDP, IP, and NBT over the NDIS driver), the Process Manager, Object Manager, Security, Virtual Memory Manager, Local Procedure Call facility, and Win32k.sys, resting on Ntoskrnl.exe, Processr.sys, the video driver, and Hal.sys, which interface directly with the hardware.]
Figure 1-8   The main components of the Windows Server 2003 operating system
The full set of operating system services is known as the Executive, which reflects its
role as a control program. Some of the major components of the Executive are discussed further later in this chapter. A more detailed account of the operating system
structure and logic is provided in Microsoft Windows Server 2003 Resource Kit Troubleshooting Guide.
Components
Operating system components run in Kernel mode. These components include the
kernel, the Executive, and the device drivers, as illustrated in Figure 1-8. Kernel mode
is a method of execution that allows code to access all system memory and the full set
of processor instructions. The differential access allowed to Kernel mode and User
mode processes is enforced by the processor architecture. User mode components are
limited to using the set of interfaces provided by the Kernel mode components to
interact with system resources.
The kernel itself, contained in Ntoskrnl.exe, is the core of the Windows Server 2003
layered architecture. The kernel performs low-level operating system functions,
including thread scheduling, dispatching interrupts, and dispatching exceptions. The
kernel controls the operating system’s access to the processor or processors. The kernel schedules different blocks of executing code, called threads, for the processors to
keep them as busy as possible and to maximize efficiency. The kernel also synchronizes activities among Executive-level subcomponents, such as I/O Manager and Process Manager, and plays a role in troubleshooting by handling hardware exceptions
and other hardware-dependent functions.
The kernel works closely with the hardware abstraction layer (HAL). The HAL encapsulates services that allow the operating system to be very portable from one hardware platform to another. It is the principal layer of software that interacts with the
processor hardware directly. It provides hardware-dependent implementations for
services like thread dispatching and context switching, interrupt processing, instruction-level serialization, and inter-processor signaling. But it hides the details of these
implementations from the kernel. An example HAL function is translating a serialization primitive like a spin lock into a hardware-specific instruction sequence that will
test and set a data word in memory in an uninterruptible, atomic operation. The HAL
exports abstract versions of these services that can be called by the kernel, as well as
other Kernel mode programs, allowing them to be written in a manner that is independent of the specific hardware implementation. Kernel mode components can gain
access to very efficient HAL spin lock services, for example, without needing to know
how these services are actually implemented on the underlying hardware. The HAL
also provides routines that allow a single device driver to support the same device on
all platforms. Having this hardware abstraction layer between other operating system
functions and specific processor hardware is largely responsible for the ease with
which the Windows Server 2003 operating system can support so many different processor architectures.
Another hardware-oriented module, Processr.sys, contains routines to take advantage
of processor power management interfaces, where they exist.
The Executive provides higher-level operating system functions than the kernel, including Plug and Play services, power management, memory management, process and
thread management, and security. The Memory Manager, for example, plays a major
role in memory and paging performance, as will be discussed in more detail later.
The Win32k.sys module consolidates support for the elements of the Windows Graphical User Interface (GUI) into a set of highly optimized routines. These routines reside
inside the operating system to improve the performance of desktop applications.
The operating system also contains a complete TCP/IP network protocol stack and an
I/O Manager stack for communicating with peripheral devices such as disks, tape
drives, DVD players, and CD players. The video driver that communicates directly
with the graphics display adapter also resides inside the Executive and runs in Privileged mode. The network file server and file server client, respectively Server and Redirector, are services that have major Kernel mode components that run inside the
Executive.
Functions
Figure 1-8 also shows several important operating system functions that run in User
mode outside the Executive. These include the Session Manager subsystem, Smss.exe, and the security and encryption services contained in Lsass.exe. There are even some Windows graphical
user interface (GUI) functions that reside in the client/server subsystem, Csrss.exe. A
number of other operating system service processes also perform a variety of valuable
functions. Finally, the Windows Logon process, Winlogon.exe, is responsible for
authenticating user logons, which are required to establish a desktop session running
the GUI shell.
Note
Application programs running in User mode are restricted from accessing
protected mode memory locations, except through standard operating system calls.
To access a protected mode operating system service, User mode applications call the
Ntdll.dll communications module that executes a state switch to Kernel mode before
calling the appropriate Kernel mode service. Application programs running in User
mode are also restricted from accessing memory locations associated with other User
mode processes. Both of these restrictions protect the integrity of the Windows Server
2003 system from inadvertent damage by a User mode program. This is why some
applications that were designed to run on earlier versions of the DOS or Windows
operating systems might not run on Windows Server 2003.
Many important Windows Server 2003 performance considerations revolve around
the operating system’s support and interaction with hardware components. These
include the processor, memory, disks, and network interfaces. In the next sections,
the way the operating system manages these hardware components is discussed,
along with the most important performance ramifications.
Processors
At the heart of any computer is a central processing unit (CPU), or simply the processor for short. The processor is the hardware component responsible for computation—
it is a machine that performs arithmetic and logical operations that are presented to it
in the form of computer instructions. These instructions are incorporated into computer programs, which are loaded into the computer's memory and become the software that the machine executes. Windows Server 2003 runs on a wide variety of 32-bit and 64-bit processors that vary in speed and architecture. In this section, several
important aspects of processor performance in a Windows Server 2003 machine are
discussed.
Windows Server 2003 is a computer software program like any other, except that it is
designed to interface directly with the computer hardware and control it. The operating system loads first, using a bootstrap program (or boot, for short) that gains control of the machine a little at a time. After the operating system program initializes, it
is responsible for loading any additional programs that are scheduled to run. These
include the services that are scheduled to be loaded automatically immediately following the operating system kernel initialization and, finally, the Winlogon process that
allows you access to the Windows desktop. For a more detailed account of the boot
process of operating system initialization, see the Microsoft Windows Server 2003
Resource Kit Troubleshooting Guide.
The operating system detects the presence of other hardware resources that are
attached to the computer such as memory, disks, network interfaces, and printers,
and loads the device driver programs that control access to them. Device drivers that
are loaded become part of the operating system. The Plug and Play facilities of Windows Server 2003 allow the operating system to detect and support any additional
hardware devices that you happen to connect to the computer at any time after it is up
and running.
The operating system is also responsible for determining what other programs are
actually run on the computer system. This involves a Scheduler function for running
applications fast and efficiently. The Scheduler is responsible for selecting the next
program thread for the processor to execute and setting the processor controls that
allow the selected thread to run.
Threads
A thread is the unit of execution in Windows Server 2003. Every process address space
has one or more execution threads that contain executable instructions. There are
both operating system threads (or kernel threads) and application program threads
that the operating system keeps track of. A thread can be in one of several, mutually
exclusive Execution states, as illustrated in Figure 1-9.
[Figure: a state transition diagram showing a thread moving among the Ready, Running, and Wait states.]
Figure 1-9   Thread Execution state
A thread can be:
■ Running   The running thread is the set of computer instructions the processor is currently executing. The processor hardware is capable of executing only one set of instructions at a time; so, at any one time, only one thread is executing per processor.

■ Ready   A ready thread is one that is eligible to be executed but is waiting for the processor to become free before it actually can execute. The operating system stores the handles of all ready threads in the processor Ready queue, where they are ordered by priority. The priority scheme used in Windows Server 2003 is described a little later.

■ Waiting   A waiting thread is not eligible to run. It is blocked. It remains loaded in the computer while it is blocked from running until an event that the thread is waiting for occurs. When a waiting thread unblocks, it transitions to the Ready state. Blocked threads are frequently waiting for an I/O executing on an external device to complete.
Note
When a peripheral device such as a disk or a network interface card finishes an operation that it was previously instructed to perform, the device interrupts the processor to demand servicing. You can observe the rate that
interrupts occur on your machine by accessing the Interrupts/sec counter in the
Processor object.
Context switches Computers are loaded with many program threads that all
require execution at the same time. Yet only one program thread can actually execute
at a time. This is known as multiprogramming or multithreading. The Windows Server
2003 operating system, of course, keeps track of all the program threads that are
loaded and their Execution state. As soon as a running thread blocks—usually because
it needs to use one of the devices—the Scheduler function selects among the eligible
threads that are ready to run. The Scheduler selects the ready thread with the highest
priority waiting in the Ready queue to run next. It then sets the control registers on
the processor that determine which program thread executes next and passes control
to the selected thread so that its instructions execute next. This operation is known as a context switch. A context switch occurs whenever the operating system passes control from one executing thread to another. Switching threads is one of the functions that the HAL implements because the mechanics of a context switch are processor-specific.
Tip Accessing a counter called System\Context switches/sec allows you to observe
the rate at which context switches occur on your machine. You can also observe the
rate at which individual program threads execute by accessing the Thread\Context
switches/sec counter.
There is also a Thread\Thread State counter that tells you the current Execution state
of every thread.
Once a thread is running, it executes continuously on the processor until one of the
following events occurs:
■ A high priority interrupt occurs, signaling that an external device has completed an operation that was initiated previously.

■ The thread voluntarily relinquishes the processor. This is usually because the thread needs to perform I/O or is waiting on a lock or a timer.

■ The thread involuntarily relinquishes the processor, usually because it incurred a page fault, which requires the system to perform I/O on its behalf. Page fault resolution is discussed in detail in a later section of this chapter.

■ A maximum uninterrupted execution time limit is reached. This is known as a time slice. At the end of a time slice, the thread remains in the Ready state. The Scheduler returns a thread that has exhausted its time slice on the processor to the Ready queue and then proceeds to dispatch the highest priority thread that is waiting in the Ready queue. If the long-running thread that was executing is still the highest priority thread on the Ready queue, it will receive another time slice and be scheduled to run next anyway. Time-slicing is discussed in greater detail later in this chapter.
Multithreading The basic rationale for multithreading is that most computing
tasks do not execute instructions continuously. After a typical program thread executes for some period of time, it often needs to perform an input/output (I/O) operation like reading information from the disk, printing some text or graphics to a
printer, or drawing on the video display. While a program thread is waiting for this
input/output function to complete, it is not necessary for the program to hang on to
the processor. An operating system that supports multithreading saves the status of a
thread that is waiting, restores its status when it is ready to resume execution, and
tries to find something else that can run in the meantime.
I/O devices are much slower than the processor, and I/O operations typically take a
long time compared to CPU processing. A single I/O operation to a disk might take 5
or 10 milliseconds, which means that the disk is capable of executing perhaps 100 or
so such operations per second. Printers, which are even slower, are usually rated in
pages printed per minute. In contrast, processors typically execute an instruction at least every one or two clock cycles, and you might be running processors capable of 1500–3000 million cycles per second. During the time that one thread is
delayed doing one disk I/O operation, the processor could be executing some
30,000,000 instructions on behalf of another thread.
Although it leads to more efficient utilization of the processor, multithreading actually
slows down individual execution threads because they are not allowed to run uninterrupted from start to finish. In other words, when the thread that was waiting becomes
ready to execute again, it is quite possible for it to be delayed because some higher priority thread is in line ahead of it. Selecting eligible threads from the Ready queue in
order by priority is an attempt to ensure that more important work is delayed the least.
Preemptive scheduling Like other multiprogrammed operating systems, Windows
Server 2003 manages multiple program threads that are all running concurrently. Of
course, only one program can execute at a time on the processor. Threads selected to run
by the Windows Server 2003 Scheduler execute until they block, normally because they
need to perform an I/O operation to access a device or are waiting for a lock or a timer. Or
they execute until an interrupt occurs, which usually signals the completion of an event
that a blocked thread was waiting for. After a thread is activated following an interrupt,
Windows Server 2003 boosts the dispatching priority of that thread. This means that the
thread that was being executed at the time the interrupt occurred is likely to be preempted
by the now higher priority thread that was waiting for the interrupt to occur. Preemptive
scheduling of higher priority work can delay thread execution, but it typically helps to
balance processor utilization across CPU-bound and I/O-bound threads.
Note
Thread priority is boosted following an interrupt and decays over time as the
thread executes. This dynamic thread scheduling priority scheme in Windows Server
2003 works with time-slicing to help ensure that a CPU-bound thread cannot monopolize the processor when other Ready threads are waiting.
Thread state When an I/O operation completes, a thread that was blocked
becomes eligible to run again. This scheme means that a thread cycles among three states: the Ready state where it is eligible to execute, the Running
state where it actually executes instructions, and a Wait state where it is blocked from
executing. Logically, a Thread state transition diagram like the one in Figure 1-9 models this behavior.
How Windows Server 2003 Tells Time
The way that Windows Server 2003 tells time is crucial to many of the performance measurements that the operating system takes. To understand how the
operating system tells time, you must differentiate between several types of
machine hardware “clocks.”
The first clock is the machine instruction execution clock, whose frequency is measured in MHz. The machine's instruction execution clock is a good indicator of relative processor speed, but it is not accessible externally; the operating system has no access to it.
Standard Windows timer services create a virtual system clock in 100 nanosecond units. Because this time unit might or might not map easily into the
machine’s real-time clock hardware, maintaining the virtual clock is a HAL function. These timer services are built around a periodic native clock interrupt,
which is set to occur every 10 milliseconds. Even though the granularity of the
clock is in 100 nanosecond units, a clock “tick” actually occurs only once every
10 milliseconds. This is the most familiar form of clock services available in Windows Server 2003. Programmers, for example, call the SetTimer application programming interface (API) function to receive a notification for a specific virtual
clock interrupt. You can get access to the standard clock interval value by calling
GetSystemTimeAdjustment.
There are important operating system functions that work off this standard
clock interrupt. The first is the Scheduler function that checks the current running thread and performs the % Processor Time accounting that is discussed in
this section. A second Scheduler function checks to see whether the running
thread has exhausted its time slice.
The clock interrupt that drives the standard timer services relies on a native system clock, normally a hardware function provided by a chipset external to the
processor. This is also known as the High Precision clock because the native system clock usually has significantly higher resolution than the standard Windows timer services. The precise granularity of the native system clock is specific
to the external clock hardware. Win32 programmers can get access to a high
precision clock using QueryPerformanceCounter and QueryPerformanceFrequency.
For example, consider http://support.microsoft.com//kb/172338. In this example, the standard 10-millisecond clock timer does not offer enough resolution to
time an instruction loop with sufficient granularity.
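The following is a minimal C sketch of the technique that article describes: bracketing the code to be timed with QueryPerformanceCounter calls and converting the tick difference to seconds with QueryPerformanceFrequency. The loop being timed is an arbitrary stand-in:

    /* Timing a code sequence with the high precision clock rather than
       the coarser 10-millisecond standard timer. The loop is only an
       arbitrary workload for illustration. */
    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        LARGE_INTEGER frequency, start, stop;
        volatile long i, x = 0;

        QueryPerformanceFrequency(&frequency);  /* ticks per second */
        QueryPerformanceCounter(&start);

        for (i = 0; i < 1000000; i++)           /* code being timed */
            x += i;

        QueryPerformanceCounter(&stop);

        printf("Elapsed: %.6f seconds\n",
               (double)(stop.QuadPart - start.QuadPart) /
               (double)frequency.QuadPart);
        return 0;
    }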
Tip A thread that is blocked is waiting for some system event to occur. The event
signals that the transition from Waiting to Ready can occur. The Thread\Wait Reason
counter shows the reason threads are blocked. You will find that most threads are
waiting for a signal from the operating system Executive, which corresponds to a
Thread\Wait Reason value of 7. The operating system activates a thread that is waiting
on an event and makes it Ready to run when that event occurs. Common events
include waiting for a clock timer to expire; waiting for an I/O to complete; or waiting
for some requested system service, such as authentication or encryption, to complete.
Interrupt Processing
An interrupt is a signal from an external device to the processor. Hardware devices
raise interrupts to request servicing immediately.
When an I/O request to a disk device, for example, is initiated, the device processes
the request independently of the processor. When the device completes the request, it
raises an interrupt to signal the processor that the operation has completed. This signal is treated as a very high priority event: the device is relatively slow compared to the
processor, the device needs attention, and some other user might be waiting for the
device to become free. When the processor recognizes the interrupt request (IRQ), it
does the following:
1. It stops whatever it is doing immediately (unless it is already servicing a higher
priority interrupt request).
2. The device’s Interrupt Service Routine (ISR) is dispatched to begin processing
the interrupt. The Interrupt Service Routine is a function of the device driver.
3. The ISR saves the status of the current running thread. This status information
is used to restore the interrupted thread to its previous Execution state the next
time it is selected to run.
4. The ISR stops the device from interrupting and then services the device that
raised the interrupt.
This is why the process is known as an interrupt: the normal flow of thread execution
is interrupted. The thread that was running when the interrupt occurred returns to
the Ready queue. It might not be the thread the Scheduler selects to run following
interrupt processing. In addition, interrupt processing is likely to add another thread
to the Ready queue, namely the thread that was waiting for the event to occur.
Note The thread execution status information that the ISR saves is also known as the
thread context. The thread context includes the thread’s set of machine registers, the kernel stack, a thread environment block, and a user stack in the address space of the
thread’s process. For more information about the thread context, see the Platform SDK.
In Windows Server 2003, one consequence of an interrupt occurring is a likely reordering of the Scheduler Ready queue following interrupt processing. The device
driver that completes the interrupt processing supplies a boost to the priority of the
application thread that transitions from Waiting to Ready when the interrupt processing completes. Interrupt processing juggles priorities so that the thread made ready to
run following interrupt processing is likely to be the highest thread waiting to run in
the Ready queue. Consequently, the application thread that had been waiting for an
I/O request to complete is likely to receive service at the processor next.
Voluntary Wait
A thread voluntarily relinquishes the processor when it issues an I/O request and
then waits for the request to complete. Other voluntary waits include a timer wait, or
waiting for a serialization signal from another thread. A thread issuing a voluntary
wait enters the Wait state. This causes the Windows Server 2003 Scheduler to select
the highest priority task waiting in the Ready queue to execute next. Threads with a Thread Wait Reason value of 7, waiting for a component of the Windows Server 2003 Executive, are in the voluntary Wait state.
Involuntary Wait
Involuntary waits are usually associated with virtual memory management. For example, a thread enters an involuntary Wait state when the processor attempts to execute
an instruction that references data in a buffer that happens not to be currently resident in physical memory (or RAM). Because the instruction indicated cannot be executed, the processor generates a page fault interrupt, which the Virtual Memory
Manager (VMM) must resolve by allocating a free page in memory, reading the appropriate page containing the necessary instruction or data into memory from disk, and
re-executing the failed instruction.
A currently running thread encountering a page fault is promptly suspended with the
thread context reset to re-execute the failing instruction. The suspended task is placed
in an involuntary Wait state until the page requested can be brought into memory and
the instruction executed successfully. At that point, the Virtual Memory Manager
component of the operating system is responsible for resolving the page fault and
transitioning the thread from the Wait state back to Ready. Virtual memory and Paging are topics that are revisited later in this chapter.
Tip There are several Thread Wait Reason values that correspond to virtual memory
involuntary waits. See the Thread\Thread Wait Reason Explain text for more details.
When you are able to observe threads delayed with these Wait Reasons, it is usually a
sign that there is excessive memory management overhead, an indication that the
machine has a shortage of RAM, which forces the memory management routines to
work harder.
Time-Slicing
A running thread that almost never needs to perform I/O or block waiting for an event
is not allowed to monopolize the processor completely. Without intervention from
the Scheduler, some very CPU-intensive execution threads will attempt to do this.
There is also the possibility that a program bug will cause the thread to go into an infinite loop in which it will attempt to execute continuously. Either way, the Windows
Server 2003 Scheduler will eventually interrupt the running thread if no other type of
interrupt occurs. If the thread is not inclined to relinquish the processor voluntarily,
the Scheduler eventually forces the thread to return to the Ready queue. This form of
processor sharing is called time-slicing, and it is designed to prevent a CPU-bound task
from dominating the use of the processor for an extended period of time. Without
time-slicing, a high priority CPU-intensive thread could indefinitely delay other
threads waiting in the Ready queue. The Scheduler implements time-slicing using a
high-priority clock timer interrupt that is set to occur at regular intervals to check on
the threads that are running. For more information about this clock interrupt, see the "How Windows Server 2003 Tells Time" sidebar earlier in this chapter.
When a thread’s allotted time slice is exhausted, the Windows Server 2003 Scheduler
interrupts it and looks for another Ready thread to dispatch. Of course, if the interrupted thread still happens to be the highest priority Ready thread (or the only Ready
thread), the Scheduler is going to select it to run again immediately. The Scheduler
also lowers the priority of any thread that was previously boosted when the thread
executes for the entire duration of its time-slice. This further reduces the likelihood
that a CPU-intensive thread will monopolize the processor. This technique of boosting the relative priority of threads waiting on device interrupts and reducing the priority of CPU-intensive threads helps to ensure that a CPU-bound thread cannot
monopolize the processor when other Ready threads are waiting.
The duration of a thread's time slice is established by default. Under Windows Server 2003, the default time slice is a long interval, which usually benefits long-running server application threads. Longer time slices lead to less overhead from thread context switching, whereas shorter time slices generally benefit interactive work. You might consider changing the default time slice value to a shorter interval on a Windows Server 2003 machine that is used predominantly for interactive work under Terminal Services, for example. The time slice settings and criteria for changing them are discussed
in Chapter 6, “Advanced Performance Topics.”
Note The Scheduler uses a simple but flexible mechanism for making sure that running threads do not execute continuously for more than their allotted time-slice. At
the time the Scheduler initially selects a thread to run, the thread receives an initial
time-slice allotment in quantums. The quantum corresponds to the periodic clock
interrupt interval. During each periodic clock interval that the thread is found running,
the Scheduler subtracts several quantums from the allotment. When the thread has
exhausted its time allotment—that is, the number of quantums it has remaining falls
to zero—the Scheduler forces the thread to return to the Ready queue.
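The following is a conceptual sketch only, not actual operating system code; the initial allotment and per-tick deduction are invented numbers that illustrate the accounting the Note describes:

    /* A conceptual sketch (not actual kernel code) of quantum
       accounting: each periodic clock interrupt deducts quantums from
       the running thread's allotment, and the thread returns to the
       Ready queue when the allotment reaches zero. Values are
       hypothetical. */
    #include <stdio.h>

    #define QUANTUMS_PER_CLOCK_TICK 3   /* hypothetical deduction per tick */

    int main(void)
    {
        int quantum_allotment = 36;     /* hypothetical initial time slice */
        int ticks = 0;

        while (quantum_allotment > 0) { /* thread found running each tick */
            quantum_allotment -= QUANTUMS_PER_CLOCK_TICK;
            ticks++;
        }
        printf("Time slice exhausted after %d clock ticks\n", ticks);
        return 0;
    }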
Idle Thread
When the current thread that is running blocks, often because of I/O, the Windows
Server 2003 Scheduler finds some other thread that is ready to run and schedules it
for execution. What if no other program threads are ready to run?
If no threads are ready to run, the Windows Server 2003 Scheduler calls a HAL routine known as the Idle thread. The Idle thread is not a true thread, nor is it associated
with a real process. There is also no Scheduling priority associated with the Idle
thread. In reality, the Idle thread is a bookkeeping mechanism that is provided to
allow the operating system to measure processor utilization.
Normally, the Idle thread routine will execute a low priority instruction loop continuously until the next interrupt occurs, signaling that there is real work to be done. But,
for example, if the processor supports power management, the Idle thread routine
will eventually call the Processr.sys module to instruct the processor to change to a
state where it consumes less power. The way the Idle thread is implemented is discussed in greater detail in Chapter 5, “Performance Troubleshooting.”
Accounting for Processor Usage
Windows Server 2003 uses a sampling technique to account for processor usage at
the thread, process, and processor level. The operating system allows the built-in system clock to generate a high priority interrupt periodically, normally 100 times per
second. During the servicing of this periodic interval interrupt, the Interrupt Service
Routine checks to see which thread was running when the interrupt occurred. The
ISR then increments a timer tick counter field (a timer tick is 100 nanoseconds) in the
Thread Environment Block to account for the processor usage during the last interval
between periodic interrupts. Note that all the processor time during the interval is
attributed to the thread that was executing when the periodic clock interrupt
occurred. This is why the % Processor Time measurements Windows Server 2003
makes should be interpreted as sampled values.
Tip Use the Thread\% Processor Time counter to see how much processor time a
thread accumulates. This data is also rolled up to the process level where you can
watch a similar Process\% Processor Time counter. Because the sampling technique
used to account for processor usage requires a respectable number of samples for any
degree of accuracy, the smallest data collection interval that System Monitor allows is
1 second.
Because the periodic clock interrupt is very high priority, it is possible to account for
processor usage during the Interrupt state as well as threads running in either the
Privileged state or the User state.
Tip
The portion of time that a thread is executing in User mode is captured as
Thread\% User Time. Privileged mode execution time is captured in the Thread\%
Privileged Time counter. % User Time and % Privileged Time are also measured at the
Process and Processor levels.
The periodic clock interrupt might also catch the processor when it was previously
executing the Idle thread function in either the HAL or the Processr.sys routine.
The Idle thread is allowed to accumulate processor utilization clock tick samples
exactly like real threads. By subtracting the amount of time the periodic clock interval routine found the system running the Idle thread from 100 percent, it becomes
possible to calculate accurately how busy the processor is over any extended
period of observation.
Note
Think of the Idle thread as a bookkeeping mechanism instead of a true execution thread. The Processor\% Processor Time counter is calculated by subtracting
the amount of time the system found that the Idle thread was running from 100 percent. On a multiprocessor machine, there are separate bookkeeping instances of the
Idle thread, each dedicated to a specific processor. The _Total instance of the Processor
object actually reports the average % Processor Time across all processors during the
interval.
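The derivation is simple arithmetic. In the following sketch, the sample counts are hypothetical; busy time is whatever fraction of the periodic clock samples did not catch the Idle thread running:

    /* Deriving % Processor Time from Idle thread samples, as described
       above. The sample counts are hypothetical values. */
    #include <stdio.h>

    int main(void)
    {
        double total_samples = 100.0;  /* periodic clock interrupts observed */
        double idle_samples  = 35.0;   /* samples caught running Idle thread */

        double processor_time =
            100.0 * (1.0 - idle_samples / total_samples);
        printf("%% Processor Time: %.0f%%\n", processor_time);
        return 0;
    }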
There are two additional subcategories of processor time usage that Windows Server
2003 breaks out. The % Interrupt Time represents processor cycles consumed in
device driver Interrupt Service Routines, which process interrupts from attached
peripherals such as the keyboard, mouse, disks, and network interface cards. Interrupt processing was discussed earlier in this section. This is work performed at very
high priority, typically while other interrupts are disabled. It is captured and reported
separately not only because of its high priority, but also because it is not easily associated with any particular User mode process.
Windows Server 2003 also tracks the amount of time device drivers spend executing
deferred procedure calls (DPCs), which also service peripheral devices, but run with
interrupts enabled. DPCs represent higher priority work than other system calls and
kernel thread activity. Note that both ISRs and DPCs are discussed later in this chapter when the priority queuing mechanism in Microsoft Windows Server 2003 is
described in more detail.
Note % DPC Time is also included in % Privileged Time. Like % Interrupt Time, it is
only available at the Processor level. When the periodic clock interval catches the system executing an ISR or a DPC, it is not possible to associate this interrupt processing
time with any specific User or Kernel mode thread.
Transient threads and processes Although it is usually very accurate, the sampling
technique that the operating system uses to account for processor usage can miss
some of what is happening on your system. Watch out for any transient threads and
processes that execute for so little time that you can miss them entirely. Once a thread
terminates, the processor timer ticks it has accumulated are also destroyed. This is
also true at the process level. When a process terminates, all its associated threads are
destroyed. At the next Performance Monitor collection interval, there is no evidence
that the process or thread ever existed!
Tip If you discover that too much of the % Processor Time you gather at the processor level is unaccounted for at the process or thread level, your machine might be
running many transient processes. Examine the process Elapsed Time counter to
determine whether you have a large number of transient processes that execute very
quickly. Increase the rate of Performance Monitor data collection until the sample rate
is less than 1/2 the average elapsed time of your transient processes. This will ensure
that you are gathering performance data rapidly enough to catch most of these transient processes in action.
Increasing the Performance Monitor data collection sample rate also has overhead
considerations, which are discussed in Chapter 2, “Performance Monitoring Tools.”
Normalizing CPU time All the % Processor Time utilization measurements that the
operating system gathers are reported relative to the processing power of the hardware. When you see a measurement that reports the processor as being, say, 60 percent busy, the logical question to ask is "Percent of what?" This is a good question to ask
because you can expect a program running on a 1 GHz Pentium to use three times the
amount of processor time as the same program running on a 3 GHz Pentium
machine.
For comparisons across hardware, normalizing CPU seconds based on a standard hardware platform can be useful. Fortunately, both Intel and AMD microprocessors identify
their clock speed to the initialization NTDETECT routine. Use the System item in Control Panel to determine the clock speed of the processor installed in your machine.
This clock speed value is also stored in the Registry in the ~MHz field under the
HKLM\HARDWARE\DESCRIPTION\System\CentralProcessor key. When it is available, a ProcessorNameString can also be found there that provides similar information.
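As a minimal sketch, the following C fragment reads the ~MHz value from that Registry key for the first processor (instance 0); error handling is abbreviated:

    /* Reading the processor clock speed from the ~MHz Registry value
       mentioned above, for the first processor (instance 0). */
    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        HKEY key;
        DWORD mhz, size = sizeof(mhz);

        if (RegOpenKeyEx(HKEY_LOCAL_MACHINE,
                TEXT("HARDWARE\\DESCRIPTION\\System\\CentralProcessor\\0"),
                0, KEY_READ, &key) == ERROR_SUCCESS) {
            if (RegQueryValueEx(key, TEXT("~MHz"), NULL, NULL,
                    (LPBYTE)&mhz, &size) == ERROR_SUCCESS)
                printf("Processor clock speed: %lu MHz\n", mhz);
            RegCloseKey(key);
        }
        return 0;
    }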
Processor Ready Queue
The System\Processor Queue Length counter is another important indicator of processor performance. This is an instantaneous peek at the number of Ready threads
that are currently delayed waiting to run. The Processor Queue Length counter is
reported only at the system level because there is a single Scheduler Dispatch Queue
containing all the threads that are ready to run that is shared by all processors on a
multiprocessor. (The operating system does maintain separate queue structures per
processor, however, to enhance performance of the Scheduler on a multiprocessor.)
The Thread State counter in the Thread object, as discussed earlier, indicates which
threads at the moment are waiting for service at the processor or processors.
When the processor is heavily utilized, there is a greater chance that large values for
the Processor Queue Length can also be observed. The longer the queue, the longer
the delays that threads encounter waiting to execute.
Keep in mind when you are comparing processor utilization and Processor Queue
Length that the former represents a continuously sampled value, whereas the latter
represents a single observation reflecting the measurement taken at the last periodic
clock interval. For example, if you gather performance measurements once per second,
the processor utilization statistics reflect about 100 samples per second. In contrast,
the System\Processor Queue Length counter is based only on the last of these samples. This discontinuity can distort the relationship between the two measurements.
Note When the system is lightly utilized, you might see unexpectedly large values
of the Processor Queue Length counter. This is an artifact of the way the counter is
derived using the periodic clock interval.
Priority Queuing
Three major processor dispatching priority levels determine the order in which Ready
threads are scheduled to run. The highest priority work in the system is performed at
Interrupt priority by ISRs. The next highest priority is known as Dispatch level. Dispatch level is where high priority systems routines known as asynchronous procedure
calls (APCs) and deferred procedure calls (DPCs) run. Finally, there is the Passive Dispatch level, where both Kernel mode and User mode threads are scheduled. This overall priority scheme is illustrated in Figure 1-10, which shows how the three major processor dispatching priority levels determine the order in which Ready threads are scheduled to run.
[Figure: the dispatching priority hierarchy, from highest to lowest. At the Interrupt level: the system power-down and shutdown routines at the highest IRQ levels, the interprocessor signaling ISR, the clock ISR routine, and the device ISR routines at IRQ levels n down to 1. At the Dispatch level: deferred procedure calls, then asynchronous procedure calls at APC level. At the Passive Dispatch level, the Scheduler dispatches threads at the fixed real-time priorities (16-31), the dynamic priorities (1-15), and the Zero Page Thread (priority 0).]
Figure 1-10   The overall priority scheme
Interrupt priority Interrupts are subject to priority. The interrupt priority scheme is
determined by the hardware, but in the interest of portability, the priority scheme is
abstracted by the Windows Server 2003 HAL. During interrupt processing, interrupts from lower priority devices are disabled (or masked, in hardware terminology) so that they remain pending until the current interrupt processing routine completes. Lower priority devices that attempt to interrupt the processor while it is disabled for interrupts remain pending until the ISR routine finishes and once again enables the processor for interrupts. Following interrupt processing, the operating
system resets the processor to return to its normal operating mode, where it is once
again able to receive and process interrupt signals.
Note Running in Disabled mode has some important consequences for Interrupt
Service Routines. Because interrupts are disabled, the ISR cannot sustain a page fault,
something that would normally generate an interrupt. A page fault in an ISR is a fatal
error. To avoid page faults, device drivers allocate memory-resident work areas from
the system’s Nonpaged pool. For a more detailed discussion of the Nonpaged pool,
see the section entitled “System Working Set,” later in this chapter.
Hardware device interrupts are serviced by an Interrupt Service Routine, or ISR,
which is a standard device driver function. Device drivers are extensions of the operating system tailored to respond to the specific characteristics of the devices they
understand and know how to control. The ISR code executes at the interrupt level priority, with interrupts at the same or a lower level disabled. An ISR is high priority
by definition because it interrupts the regularly scheduled thread and executes until it
voluntarily relinquishes the processor (or is itself interrupted by a higher priority
interrupt).
Deferred procedure calls For the sake of performance, ISRs should perform the
minimum amount of processing necessary to service the interrupt in Disabled mode.
Any additional device interrupt-related processing that can be performed with the system once again enabled for interrupts is executed in a routine that the ISR schedules
to run after interrupts are re-enabled. This special post-interrupt routine is called a
deferred procedure call (DPC).
After all pending interrupts are cleared, queued DPCs are dispatched until the DPC
queue itself is empty.
Tip
For each processor, % Interrupt Time and % DPC Time counters are available.
Both % Interrupt Time and % DPC Time are also included in the % Privileged Time
counter. If you want to report on % Interrupt Time and % DPC Time separately, make
sure you then subtract them from % Privileged Time.
Thread dispatching priority After all interrupts are cleared and the DPC queue is
empty, the Windows Server 2003 Scheduler is invoked. The Scheduler examines the
Ready queue, selects the highest priority ready thread, and instructs the processor to
begin executing that thread.
Threads that are ready to run are ordered by priority. Thread dispatching priority is a
number that ranges from zero through 31. The higher the number, the higher the priority. The thread dispatching priority scheme is illustrated in Figure 1-11. Zero is the
lowest priority and is reserved for use by the system zero page thread. The priority values 1–31 are divided into two sections, called the dynamic range (1–15) and the real-time range (16–31). Priority values in the range of 16–31 are used by many operating
system kernel threads. They are seldom used by normal User mode applications.
Real-time priorities are fixed values.
The remaining priorities with values from 1 through 15 are known as the dynamic priority range. When a User mode process is created, it begins life at the Normal base priority, which corresponds to a priority of 8. A priority of 8 is the midpoint of the
dynamic range. The base priority of a process can be adjusted by making a call to the
SetPriorityClass Win32 API function. Using SetPriorityClass, you can choose among
the following base priorities: idle, below normal, normal, above normal, and high, as
shown in Figure 1-11. (Below Normal and Above Normal are seldom used.)
Figure 1-11 Base priorities and their ranges. (The Ready queue spans priorities 31 down to 0: the real-time range, 16–31, and the dynamic range, 1–15, with the system zero page thread at priority 0. Each Win32 base priority class (Real-time, High, Above Normal, Normal, Below Normal, and Idle) anchors a band of thread-level adjustments: Time Critical, Highest, Above Normal, Normal, Below Normal, Lowest, and Idle.)
Note
Priority 0 is reserved for exclusive use by the operating system’s zero page
thread. This is a low priority thread that places zero values in free memory pages. The
Idle Thread, which is a bookkeeping mechanism rather than an actual execution
thread, has no priority level associated with it.
Threads inherit the base priority of the process when they are created, but they can
adjust their priority upward or downward from the base setting dynamically during
run time. Within each dynamic priority class, five priority adjustments can be made at
the thread level by calling SetThreadPriority. These adjustments are highest, above normal, normal, below normal, and lowest. They correspond to +2, +1, +0, −1, and −2 priority levels above or below the base level, as illustrated in Figure 1-11. The Win32 API
also provides for two extreme adjustments within the dynamic range to either time-critical, or priority 15, and idle, or priority 1.
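The pattern looks like the following sketch, which uses the SetPriorityClass and SetThreadPriority calls described above. The priority class and the thread adjustment chosen here are illustrative, as is the background worker itself.

#include <windows.h>

DWORD WINAPI BackgroundWork(LPVOID unused)
{
    /* lowest = base - 2; in a Normal (base 8) process this yields priority 6 */
    SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_LOWEST);
    /* ... long-running background processing ... */
    return 0;
}

int main(void)
{
    HANDLE worker;

    /* Lower the base priority of the entire process (seldom used) */
    SetPriorityClass(GetCurrentProcess(), BELOW_NORMAL_PRIORITY_CLASS);

    worker = CreateThread(NULL, 0, BackgroundWork, NULL, 0, NULL);
    WaitForSingleObject(worker, INFINITE);
    CloseHandle(worker);
    return 0;
}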
Tip
You can monitor a thread’s Base Priority and its Priority Current, the latest priority level of the thread. Current priority is subject to adjustment in the dynamic
range, with a boost applied to the priority of a thread transitioning from a voluntary
Wait to Ready. Boosted thread scheduling priority decays over time. At the process
level, you can monitor a Process’s Base Priority.
The thread priority adjustments allow an application designer to give threads that perform time-critical work, such as managing the application’s main window or responding to keyboard and mouse events, a higher dispatching priority than other threads processing within the application. On the other hand, threads performing longer-running background processes can be set to priorities below normal. An application
might even expose a tuning parameter that allows a system administrator to determine what the dispatching priority of a process or some of its threads should be. Several of these Dispatch priority tuning parameters are discussed in Chapter 6,
“Advanced Performance Topics.”
When there is no higher priority interrupt processing, DPCs, or APCs to dispatch, the
Windows Server 2003 Scheduler scans the Ready queue to find the highest priority
ready thread to run next. Ready threads at the same priority level are ordered in a
FIFO (First In, First Out) queue so that there are no ties. This type of scheduling
within the priority level is also called round robin. Round robin, as noted earlier, is considered a form of fair scheduling.
Dynamic priority adjustments Priorities in the 1–15 range are called dynamic
because they can be adjusted based on their current behavior. A thread that relinquishes the processor voluntarily usually has its priority boosted when the event it is
waiting for finally occurs. Boosts are cumulative, although it is not possible to boost a
thread above priority 14, the next-to-highest priority level in the dynamic range. (This
leaves priority level 15 in the dynamic range available for time-critical work.)
Threads that have their priority boosted are also subject to decay back to their base priority. Following a dynamic priority boost, a thread in the Running state has its priority reduced over time, although the decaying priority is never pushed below the thread’s original base. At the expiration of its time slice, a running thread is forced to return to the Ready queue, and its priority is reset to the level it held prior to its original boost.
Tip Using the System Monitor console, you can observe these thread dynamic priority adjustments in action. Observe the Thread\Priority Current counter for all the threads of an application like Microsoft Internet Explorer while you are using it to browse the Web. You will see that the current priority of some of the threads of the Internet Explorer process is constantly being adjusted upward and downward, as illustrated in Figure 1-12.
These priority adjustments only apply to threads in the dynamic range—that is why it
is known as the “dynamic” range, as Figure 1-12 illustrates. The effect of these
dynamic adjustments is to boost the priority of threads waiting on I/O devices and
lower the priority of long running threads. This approximates the Mean Time To Wait
scheduling algorithm, which optimizes processor throughput and minimizes processor queuing.
Figure 1-12 Dispatching priorities in the dynamic range are subject to dynamic adjustment
Caution
Because these thread adjustments are made automatically thousands of
times per second based on current thread execution behavior, they are likely to be
much more effective than almost any static priority scheme that you can devise yourself. Override the priority adjustments that the operating system makes automatically
only after a thorough study and careful consideration.
There are no priority adjustments made for threads running in the real-time range.
Important
You normally do not need to worry much about the detailed mechanisms that the Windows Server 2003 Scheduler uses unless:
■ The processor is very busy for extended periods of time
■ There are too many threads delayed in the Ready queue
When these two conditions are true, the processor itself is a potential performance
bottleneck. Troubleshooting processor bottlenecks is one of the topics discussed in
Chapter 5, “Performance Troubleshooting.”
If you are experiencing a processor bottleneck, you might consider adjusting the
default settings that govern time-slicing. These settings are discussed in Chapter 6,
“Advanced Performance Topics.”
Although it is seldom necessary to fine-tune Scheduler parameters like the length of a
time slice, Windows Server 2003 does expose one tuning parameter in the Registry
called Win32PrioritySeparation. When to use the Win32PrioritySeparation setting is discussed in Chapter 6, “Advanced Performance Topics.”
Processor Affinity
On machines with multiple processors, there are additional thread scheduling considerations. By default, Windows Server 2003 multiprocessors are configured symmetrically. Symmetric multiprocessing means that any thread can be dispatched on any
processor. ISRs and DPCs for processing external device interrupts can also run on
any available processor when the configuration is symmetric.
Even though symmetric multiprocessing is easy and convenient, it is not always optimal from a performance standpoint. Multiprocessor performance will often improve if
very active threads are dispatched on the same physical processors. The key to this
performance improvement is the cache techniques that today’s processors utilize to
speed up instruction execution rates. Processor caches store the contents of frequently accessed memory locations. These caches—and there are several—are highspeed buffers located on the microprocessor chip. Fetching either instructions or data
from a cache is several times faster than accessing RAM directly.
When an execution thread is initially loaded on the processor following a context
switch, the processor cache is empty of the code and data areas associated with the
thread. This is known as cache cold start. Time-consuming memory fetches slow down
instruction execution rates during a cache cold start until some time afterwards as the
cache begins to fill up with the code and data areas the thread references during the
execution interval. Over time, the various processor caches become loaded with the
thread’s frequently accessed data, and the instruction execution rate accelerates.
Because the performance difference between a cache cold start and a warm start is
substantial, it is worthwhile to use a thread’s history to select among the available processors where the thread can be scheduled to run. Instead of always experiencing a
slow cache cold start, a thread dispatched on the same physical processor where it
executed last might experience a faster cache warm start. Some of the data and
instructions from memory that the thread accesses might still be in the processor
caches from the last time the thread was running.
Windows Server 2003 uses the thread’s history on a multiprocessor to make its scheduling decisions. Favoring one of the processors out of the many that could be available for thread dispatching on a multiprocessor is known as processor affinity. The
physical processor where the thread was dispatched last is known as the thread’s ideal
processor. If the ideal processor is available when a thread is selected to run, the thread
is dispatched on that processor. This is known as soft affinity, because if the thread’s
ideal processor is not available—it is already busy running a thread of equal or higher
priority—the thread will be dispatched on a less than ideal, but still available processor. If the ideal processor is busy, but it is running a lower priority thread, the lower
priority thread is pre-empted.
Windows Server 2003 also supports hard affinity in which certain threads can be
restricted to being dispatched on a subset of the available processors. Hard processor affinity can be an important configuration and tuning option to use for larger
multiprocessor configurations, but it needs to be used with care. More information
about using hard processor affinity is available in Chapter 6, “Advanced Performance Topics.”
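Hard affinity is set with the documented SetProcessAffinityMask and SetThreadAffinityMask calls. A minimal sketch follows; it assumes at least a 2-way multiprocessor, and the masks chosen are illustrative.

#include <windows.h>
#include <stdio.h>

int main(void)
{
    DWORD_PTR processMask, systemMask;

    /* Report which processors this process and the system can use */
    GetProcessAffinityMask(GetCurrentProcess(), &processMask, &systemMask);
    printf("System mask: %#Ix, process mask: %#Ix\n", systemMask, processMask);

    /* Hard affinity: restrict every thread in this process to CPUs 0 and 1 */
    SetProcessAffinityMask(GetCurrentProcess(), 0x3);

    /* Pin just the calling thread to CPU 0 (must be a subset of the process mask) */
    SetThreadAffinityMask(GetCurrentThread(), 0x1);
    return 0;
}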
Memory and Paging
Random access memory (RAM) is an essential element of your computer. Programs
are loaded from disk into memory so that they can be executed. Each memory location
has a unique address, allowing instructions to access and modify the data that is stored
there. Data files stored on the disk must first be loaded into memory before instructions to manipulate and update that data can be executed. In this section, several
important performance-related aspects of memory usage and paging are discussed.
Physical memory (or real memory) needs to be distinguished from virtual memory.
Only specific operating system kernel functions access and operate on physical memory locations directly on a machine running Windows Server 2003. Application programs use virtual memory instead, addressing memory locations indirectly.
Virtual memory addressing is a hardware feature of all the processors that Windows
Server 2003 supports. Supporting virtual memory requires close cooperation
between the hardware and the operating system software. The operating system is
responsible for mapping virtual memory addresses to physical memory locations so
that the processor hardware can translate virtual addresses to physical ones as program threads execute. Virtual memory addressing also allows executing programs to
reference larger ranges of memory addresses than might actually be installed on the
machine. The operating system is responsible for managing the contents of physical
memory so that this virtual addressing scheme runs as efficiently as possible.
Virtual Addressing
Virtual memory is a feature supported by most advanced processors. Hardware support for virtual memory includes a hardware mechanism to map from logical (that is,
virtual) memory addresses that application programs reference to physical memory
hardware addresses. When an executable program’s image file is first loaded into
memory, the logical memory address range of the application is divided into fixed size
chunks called pages. As these logical pages are referenced by an executing program,
they are then mapped to similar-sized physical pages that are resident in physical
memory. This mapping is dynamic so that the operating system can ensure that frequently referenced logical addresses reside in physical memory, while infrequently
referenced pages are relegated to paging files on secondary disk storage.
Virtual memory addressing allows executing processes to co-exist in physical memory
without any risk that a thread executing in one process can access physical memory
belonging to another. The operating system creates a separate and independent virtual address space for each individual process that is launched. On 32-bit processors,
each process virtual address space can be as large as 4 GB. On 64-bit processors, process virtual address spaces can be as large as 16 terabytes in Windows Server 2003.
Note that each process virtual address space must allow for the range of virtual
addresses that operating system functions use. For example, the 4-GB process virtual
address space on 32-bit machines is divided by default into a 2-GB range of private
addresses that User mode threads can address and a 2-GB range of system addresses
that only kernel threads can address. Application program threads can access only virtual memory locations associated with their parent process virtual address space. A
User mode thread that attempts to access a virtual memory address that is in the system range or is outside the range of currently allocated virtual addresses causes an
Access Violation that the operating system traps.
Virtual memory systems work well because executing programs seldom require all their pages to be resident in physical memory concurrently. The active
subset of virtual memory pages associated with a single process address space that is
currently resident in RAM is known as the process’s working set because those are the
active pages the program references as it executes. With virtual memory, only the
active pages associated with a program’s current working set remain resident in physical memory. On the other hand, virtual memory systems can run very poorly when
the working sets of active processes greatly exceed the amount of RAM that the computer contains. Serious performance problems can arise when physical memory is
over-committed. Windows Server 2003 provides virtual and physical memory usage
statistics so that you can recognize when an acute shortage of physical memory leads
to performance problems.
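At the process level, one way to observe these statistics programmatically is the psapi helper library. The sketch below, with error handling omitted, reads the calling process’s resident working set and its cumulative page-fault count.

#include <windows.h>
#include <psapi.h>
#include <stdio.h>

#pragma comment(lib, "psapi.lib")

int main(void)
{
    PROCESS_MEMORY_COUNTERS pmc;

    if (GetProcessMemoryInfo(GetCurrentProcess(), &pmc, sizeof(pmc))) {
        printf("Working set:      %lu KB\n", (unsigned long)(pmc.WorkingSetSize / 1024));
        printf("Peak working set: %lu KB\n", (unsigned long)(pmc.PeakWorkingSetSize / 1024));
        printf("Page faults:      %lu\n", (unsigned long)pmc.PageFaultCount);
    }
    return 0;
}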
Page Tables
Virtual memory addresses are grouped into fixed-size blocks of memory called pages.
The virtual memory pages of a process are backed by pages in physical memory that
are the same size. Page Tables, built and maintained by the operating system, are used
to map virtual memory pages to physical memory. The processor hardware specifies
the size of pages and the format of the Page Tables that are used to map them. This
mapping is dynamic, performed on demand by the operating system as threads running in the process address space access new virtual memory locations. Because available RAM is allocated for active pages on demand, virtual memory systems use RAM
very efficiently.
One advantage of virtual memory addressing is that separate application programs
loaded into RAM concurrently are both isolated and protected from each other when
they run. Threads associated with a process can reference only the physical memory
locations that correspond to the process’s unique set of virtual memory pages. This
makes it impossible for a bug in one program to access memory in another executing
program’s virtual address space. Another advantage is that User mode programs can
be written largely independently of how much RAM is actually installed on any particular machine. A process can be written to reference a uniform-sized virtual address
space regardless of how much physical memory is present on the machine.
Virtual memory addresses are assigned to physical memory locations on demand,
which has a number of implications for the performance of virtual memory machines.
The Memory Manager component of the Windows Server 2003 Executive is responsible for building and maintaining process address space Page Tables. The Memory
Manager is also responsible for managing physical memory effectively. It attempts to
ensure that an optimal set of pages for each running process—its working set of active
pages—resides in RAM.
Note
Working set pages are the active pages of a process address space currently
backed by RAM. These are resident pages. Nonresident pages are virtual memory
addresses that are allocated but not currently backed by RAM. Committed pages are
those that have Page Table entries (PTEs). Committed pages can be either resident or
nonresident.
Virtual memory addressing makes life easier for programmers because they no longer
have to worry about how much physical memory is installed. Virtual memory always
makes it appear as if 4 GB of memory is available to use. Virtual memory addressing
is also transparent to the average application programmer. User mode threads never
access anything but virtual memory addresses.
The Virtual Memory Manager performs several vital tasks in support of virtual
addressing. It constructs and maintains the Page Tables. Page Tables are built for each
process address space. The function of the Page Tables is to map logical program virtual addresses to physical memory locations. The location of a process’s set of Page
Tables is passed to the processor hardware during a context switch. The processor
loads them and refers to them to perform virtual-to-physical address translation as it
executes the thread’s instruction stream. This is illustrated in Figure 1-13.
Figure 1-13 Virtual-to-physical address translation. (Per-process Page Tables map each process virtual address space onto physical memory.)
Another key role the operating system plays is to manage the contents of physical
memory effectively. VMM implements a Least Recently Used (LRU) page replacement
policy to ensure that frequently referenced pages remain in physical memory. VMM
also attempts to maintain a pool of free or available pages to ensure that page faults
can be resolved rapidly. Whenever physical memory is in short supply, the VMM page
replacement policy replenishes the supply of free (available) pages.
When the virtual pages of active processes overflow the size of RAM, the Memory
Manager tries to identify older, inactive pages that are usually better candidates to be
removed from RAM and stored on disk instead. The Memory Manager maintains a
current copy of any inactive virtual memory pages in the paging file. In practice, this
means that the operating system checks to see whether a page that it temporarily removes from a process working set has been modified since the last time it was stored on the paging file. If the page is unchanged—that is, the copy on the paging file is still current—there is no need to copy its contents to disk again before it is removed.
If the Memory Manager succeeds in keeping the active pages of processes in RAM,
then virtual memory addressing is largely transparent to User processes. If there is not
enough RAM to hold the active pages of running processes, there are apt to be performance problems. If a running thread accesses a virtual memory address that is not
currently backed by RAM, the hardware generates an interrupt signaling a page fault.
The operating system must then resolve the page fault by accessing the page on disk,
reading it into a free page in RAM, and then re-executing the failed instruction. The
running thread that has encountered the page fault is placed in an involuntary Wait
state for the duration of the page fault resolution process, including the time it takes to
copy the page from disk into memory.
Page Fault Resolution
The most serious performance issues associated with virtual memory are the execution delays that programs encounter whenever they reference virtual memory locations that are not in the current set of memory-resident pages. This event is known as
a page fault. A program thread that incurs a page fault is forced into an involuntary
Wait state during page fault resolution for the entire time it takes the operating system
to find the specific page on disk and restore it to physical memory.
When a program execution thread attempts to reference an address on a page that is not currently resident in physical memory, the instruction referencing it fails, creating an addressing exception that generates a hardware interrupt and halts the executing program. An operating system Interrupt Service Routine gains control following the interrupt and determines that the address referenced was valid, but that the page
containing that address is not currently resident in RAM. The operating system then
must remedy this situation by locating a copy of the desired page on secondary storage, issuing an I/O operation to the paging file, and copying the designated page from
disk into a free page in RAM. Once the page has been copied successfully, the operating system re-dispatches the temporarily halted program, allowing the program
thread to continue its normal execution cycle.
Note
If a User program accesses an invalid memory location because of a logic error, for example, by referencing an uninitialized pointer, an addressing exception
similar to a page fault occurs. The same hardware interrupt is raised. It is up to the
Memory Manager’s ISR that gets control following the interrupt to distinguish
between the two situations.
The performance of User mode application programs suffers when there is a shortage
of RAM and too many page faults occur. It is also imperative that page faults be
resolved quickly so that page fault resolution does not delay the execution of program
threads unduly.
Note
For the sake of simplicity, the discussion of virtual memory addressing and
paging in this chapter generally ignores the workings of the system file cache. The system file cache uses Virtual Memory Manager functions to manage application file
data. The system file cache automatically maps open files into a portion of the system
virtual address range and then uses the process working set memory management
mechanisms discussed in this section to keep the most active portions of current files
resident in physical memory. Cache faults in Windows Server 2003 are a type of page
fault that occurs when an executing program references a section of an open file that
is not currently resident in physical memory. Cache faults are resolved by reading the
appropriate file data from disk or, in the case of a file stored remotely, accessing it
across the network. On many file servers, the system file cache is one of the leading
consumers of both virtual and physical memory.
Available Bytes pool The Windows Server 2003 Memory Manager maintains a
pool of available (free) physical memory pages to resolve page faults quickly. Whenever the pool is depleted, the Memory Manager replenishes its buffer of available RAM
by trimming older—that is, less frequently referenced—pages of active processes and
writing these to disk. If available RAM is adequate, executing programs seldom
encounter page faults that delay their execution, and the operating system has no difficulty maintaining a healthy supply of free pages. If the system is short on physical
memory, high page fault rates can occur, slowing down the performance of executing
programs considerably.
The operating system might be unable to maintain an adequate pool of available RAM
if there is less physical memory than the workload requires. This is a situation that
you can recognize by learning to interpret the memory performance statistics that are
available.
Tip
The primary indicator of a shortage of RAM is that the pool of Available Bytes,
relative to the size of RAM, is too small. Just as important are the number of paging
operations to and from disk, a Memory counter called Memory\Pages/sec. When
physical memory is over-committed, your system can become so busy moving pages
in and out of RAM that it is not accomplishing much in the way of real work.
Server applications that attempt to allocate as much RAM as possible further complicate matters. These programs, among them SQL Server and Exchange Server, communicate and coordinate with the Memory Manager to determine whether it is a good idea to try to allocate more memory
for database buffers. When you are running these server applications, RAM should
always look like it is almost full. The only way to tell that you could use more RAM on
the system is to look inside these applications and see how effectively they are using
the database buffers they have allocated.
Note Some people like to think of RAM as serving as a cache buffer for virtual memory addresses. Like most caching schemes, there is usually a point of diminishing returns when adding more and more RAM to your system. Not all the virtual memory pages that processes create have to be resident in RAM concurrently for performance to be acceptable.
Performance considerations In a virtual memory computer system, some page
fault behavior—for instance, when a program first begins to execute—is inevitable. You
do not need to eliminate paging activity completely. You want to prevent excessive paging from impacting performance.
Several types of performance problems can occur when there is too little physical
memory:
■ Too many paging operations to disk Too many page faults that result in disk operations lead to excessive program execution delays. This is the most straightforward performance problem associated with virtual memory and paging. Unfortunately, it is also the one that requires the most intense data gathering to diagnose.
■ Disk contention Virtual memory machines that sustain high paging rates to disk might also encounter disk performance problems. The disk I/O activity because of paging might contend with applications attempting to access their data files stored on the same disk as the paging file. The most common sign of a memory shortage is seeing disk performance suffer because of disk contention. Even though it is a secondary effect, it is often easier to recognize.
Tip The Memory\Pages/sec counter reports the total number of pages being moved in and out of RAM. Compare the number of Pages/sec to the total number of Logical Disk\Disk Transfers/sec for the paging file disk. If the disk is saturated and Pages/sec represents 20–50 percent or more of total Disk Transfers/sec, paging is probably absorbing too much of your available I/O bandwidth.
■ A general physical memory shortage User programs compete for access to available physical memory. Because physical memory is allocated to process virtual address spaces on demand, when memory is scarce, all running programs can suffer.
Note
Memory does not get utilized like other shared computer resources.
Unlike processors, disks, and network interfaces, it is not possible to measure
memory request rates, service times, queue time, and utilization factors. For
example, a page in RAM is either occupied or free. While memory is occupied,
it is 100 percent utilized. While it is occupied, it is occupied exclusively by a virtual memory page from a single process address space. How long memory
remains occupied depends on how often it is being accessed and what other
memory accesses are occurring on the system. When memory locations are no
longer active, they might still remain occupied for some time. None of these
usage characteristics of memory is analogous to the way computer resources
like processors, disks, and network interfaces are used.
A severe shortage of physical memory can seriously impact performance. Moving
pages back and forth between disk and memory consumes both processing and disk
capacity. A system forced to use too many CPU and disk resources on virtual memory
management tasks and too few on the application workload is said to be thrashing.
The image that the term thrashing conjures up is a washing machine so overloaded
with clothes that it expends too much energy sloshing laundry around without succeeding in getting the clothes very clean. Troubleshooting memory bottlenecks is one
of the topics discussed at length in Chapter 5, “Performance Troubleshooting.”
The solution to most paging problems is to install more physical memory capacity to
reduce the amount of paging to disk that needs to occur. If you cannot add memory
capacity immediately, you can take other effective steps to minimize the performance
impact of an acute memory shortage. For instance, you might be able to reduce the
number and size of processes that are running on the system or otherwise reduce the
physical memory demands of the ones that remain. Because a memory shortage often
manifests itself as disk contention, you can also attack the problem by improving disk
performance to the paging file. Possible remedies include:
■ Defining additional paging files across multiple (physical) disks
■ Reducing disk contention by removing other heavily accessed files from the paging file physical disk or disks
■ Upgrading to faster disks
These disk tuning strategies will speed up page fault resolution for the page faults that
do occur or increase the effective disk bandwidth to allow the system to sustain
heavier paging rates. Disk performance is discussed in greater detail later in this chapter in “The I/O Subsystem” section. Disk tuning strategies are discussed in depth in
Chapter 5, “Performance Troubleshooting.”
Committed Pages
The operating system builds page tables on behalf of each process that is created. A
process’s page tables get built on demand as virtual memory locations are accessed,
potentially mapping the entire virtual process address space range. The Win32 VirtualAlloc API call provides both for reserving contiguous virtual address ranges and committing specific virtual memory addresses. Merely allocating virtual memory does not
trigger building Page Table entries because you are not yet accessing the virtual memory address range to store data.
Reserving a range of virtual memory addresses is something your application might
want to do in advance for a data file or other data structure that needs to be mapped
into contiguous virtual storage addresses. Only later, when those virtual addresses are
accessed, is physical memory allocated to allow the program access to those reserved
virtual memory pages. The operating system constructs a Page Table entry to map the
virtual address into RAM. Alternatively, a PTE points to the address of the page where
it is stored on one of the paging files. The paging files that are defined allow virtual
memory pages that will not all fit in RAM to spill over onto disk.
Committing virtual memory addresses causes the Virtual Memory Manager to ensure
that the process address space requesting the memory will be able to access it. This is
accomplished by charging the request against the system’s commit limit. Any unreserved and unallocated process virtual memory addresses are considered free.
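The reserve-then-commit pattern looks like the following sketch. The sizes used are illustrative; note that a valid Page Table entry is built, and RAM assigned, only when a committed page is first touched.

#include <windows.h>

int main(void)
{
    /* Reserve a contiguous 64-MB virtual address range; no PTEs are built
       and the reservation is not charged against the Commit Limit */
    BYTE *region = (BYTE *)VirtualAlloc(NULL, 64 * 1024 * 1024,
                                        MEM_RESERVE, PAGE_NOACCESS);

    /* Commit the first 4-KB page; the page is charged to the Commit Limit */
    VirtualAlloc(region, 4096, MEM_COMMIT, PAGE_READWRITE);

    region[0] = 42;   /* first touch: a valid PTE is built and RAM is assigned */

    VirtualFree(region, 0, MEM_RELEASE);
    return 0;
}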
Commit Limit The Commit Limit is the upper limit on the total number of Page
Table entries the operating system will build on behalf of all running processes. The
virtual memory Commit Limit prevents the system from creating a virtual memory
page that cannot fit somewhere in either RAM or the paging files.
Note The number of PTEs that can be built per process is limited by the width of a
virtual address. For machines using a 32-bit virtual address, there is a 4-GB limit on the
size of the process virtual address space. A 32-bit machine with Physical Address Extension (PAE), which widens physical addresses by 4 bits, can be configured with more than 4 GB of RAM, but process
virtual address spaces are still limited to 4 GB. Machines with 64-bit virtual addressing
allow the operating system to build process address spaces larger than 4 GB. Windows
Server 2003 builds process address spaces on 64-bit machines that can be as large as
16 TB. For more information about 64-bit addressing, see Chapter 6, “Advanced Performance Topics.”
The Commit Limit is the sum of the amount of physical memory, plus the size of the
paging files, minus some system overhead. When the Commit Limit is reached, a process can no longer allocate virtual memory. Programs making routine calls to VirtualAlloc to allocate memory will fail.
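You can read the Commit Limit and the remaining commit capacity with the documented GlobalMemoryStatusEx call, as in this sketch; note that ullTotalPageFile and ullAvailPageFile report commit values, not literal paging file sizes.

#include <windows.h>
#include <stdio.h>

int main(void)
{
    MEMORYSTATUSEX ms;
    ms.dwLength = sizeof(ms);

    if (GlobalMemoryStatusEx(&ms)) {
        printf("Commit Limit:     %I64u MB\n", ms.ullTotalPageFile / (1024 * 1024));
        printf("Commit available: %I64u MB\n", ms.ullAvailPageFile / (1024 * 1024));
        printf("Physical memory:  %I64u MB\n", ms.ullTotalPhys / (1024 * 1024));
    }
    return 0;
}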
Paging file extension Before the Commit Limit is reached, Windows Server 2003
will alert you to the possibility that virtual memory could soon be exhausted. Whenever a paging file becomes nearly full, a distinctive warning message, shown in Figure
1-14, is issued to the console. Also generated is a System Event log message with an ID
of 26 that documents the condition.
Figure 1-14 Out of Virtual Memory console error message
Following the instructions in the error message takes you to the Virtual Memory dialog (Figure 1-15), reached from the Advanced tab of the System item in Control Panel,
where additional paging files can be defined or the existing paging files can be
extended (assuming disk space is available and the page file does not already exceed
its 4-GB upper limit). Note that this extension of the paging file occurs immediately
while the system is running—it is not necessary to reboot the system.
Figure 1-15 The Virtual Memory dialog for configuring the location and size of the paging files
Windows Server 2003 creates an initial paging file automatically when the operating
system is first installed. The default paging file is built on the same logical drive where
Windows Server 2003 is installed. The initial paging file is built with a minimum allocation equal to 1.5 times the amount of physical memory. It is defined by default so
that it can extend to approximately two times the initial allocation.
The Virtual Memory dialog illustrated in Figure 1-15 allows you to set initial and maximum values that define a range of allocated paging file space on disk for each paging
file created. When the system appears to be running out of virtual memory, the Memory Manager will automatically extend a paging file that is running out of space, has a
range defined, and is currently not at its maximum allocation value. This extension, of
course, is also subject to space being available on the specified logical disk. The automatic extension of the paging file increases the amount of virtual memory available
for allocation requests. This extension of the Commit Limit might be necessary to
keep the system from crashing.
Warning
Windows Server 2003 supports a maximum of 16 paging files, each of
which must reside on distinct logical disk partitions. Page files are named Pagefile.sys
and are always created in the root directory of a logical disk. On 32-bit systems, each
paging file can hold up to 1 million pages, so each can be as large as 4 GB on disk. This
yields an upper limit on the amount of virtual memory that can be allocated, 16 × 4
GB, or 64 GB, plus whatever amount of RAM is installed on the machine.
Extending the paging file automatically might have some performance impact. When
the paging file allocation extends, it no longer occupies a contiguous set of disk sectors. Because the extension fragments the paging file, I/O operations to disk might
suffer from longer seek times. On balance, this potential performance degradation is
far outweighed by availability considerations. Without a paging file extension occurring automatically, the system is vulnerable to running out of virtual memory and
crashing.
Warning
Although you can easily prevent paging files from being extended automatically by setting the Maximum Size of the paging file equal to its Initial Size, doing
this is seldom a good idea. Allowing the paging file to be extended automatically can
save the system from crashing. This benefit far outweighs any performance degradation that might occur. To maintain highly available systems, ensure that paging files
and their extensions exceed the demand for committed bytes of your workload. See
Chapter 3, “Measuring Server Performance,” for a description of the Performance
Monitor counters you should monitor to ensure you always have enough virtual
memory space defined.
Please note that a fragmented paging file is not always a serious performance liability.
Because your paging files coexist on physical disks with other application data files,
some disk seek activity that moves the disk read/write head back and forth between
the paging file and application data files is unavoidable. Extending the paging file so
that there are noncontiguous segments, some of which might be in areas of the disk
that are surrounded by application data files, might actually reduce overall average
seek distances for paging file operations.
Multiple paging files Windows Server 2003 supports up to 16 paging files. Having
multiple paging files has possible performance advantages. It provides greater bandwidth for disk I/O paging operations. In Windows Server 2003, you can define only
one paging file per logical disk.
Caution The contents of physical memory are copied to the paging file located on
the system root volume whenever the system creates a memory crash dump. To generate a complete diagnostic crash dump, the paging file located on the system root
volume must be at least as large as the size of RAM. Reducing the size of the primary
paging file to below the size of RAM or eliminating it altogether will prevent you from
generating a complete crash dump.
To maximize disk throughput, try to define paging files on separate physical disks, if
possible. The performance benefit of defining more than one paging file depends on
being able to access multiple physical disks in parallel. However, multiple paging files
configured on the same physical disk leads only to increased disk contention. Of
course, what looks like a single physical disk to the operating system might in fact be
an array of disks managed by a hardware disk array controller. Again, the rule of
thumb of allocating no more than one paging file per physical disk usually applies.
Caution Because paging files often sustain a higher percentage of write than read
operations, it is recommended that you avoid using RAID 5 disk arrays for the paging
file, if possible. The RAID 5 write performance penalty makes this type of disk array a
bad choice for a paging disk.
Clustered paging I/O Windows Server 2003 performs clustered paging file I/O. This
feature might encourage you to configure multiple paging files. When a page fault
occurs and the Memory Manager must retrieve the page from the paging file, additional related pages are also copied into memory in the same operation. The rationale for clustered paging is simple. Individual disk I/O operations to the paging file are time-consuming. After spending considerable time positioning the disk arm over the correct disk sector, it makes sense to gather any related nearby pages from the disk in one
continuous operation. This is also known as prefetching or anticipatory paging.
The reasoning behind anticipatory paging is similar to that which leads to your decision to add a few extra items to your shopping cart when you make an emergency visit
to the supermarket to buy a loaf of bread and a carton of milk. Picking up a dozen
eggs or a pound of butter at the same time might save you from making a second time-consuming visit to the same store later on. It takes so long (relatively speaking) to get
to the disk in the first place that it makes sense for the Memory Manager to grab a few
extra pages while it is at it. After all, these are pages that are likely to be used in the
near future.
Anticipatory paging turns individual on-demand page read requests into bulk paging
operations. It is a throughput-oriented optimization, rather than a response-oriented one, and it tends to increase both disk and memory utilization. Clustered paging elongates page read operations. Because it is handling bulk paging requests, the paging file
disk is busier for somewhat longer periods than it would be if it were just performing
individual on-demand page read operations.
Having more time-consuming paging file requests normally does not affect the thread
that is currently delayed waiting for a page fault to be resolved. Indeed, this thread
might directly benefit from the Memory Manager correctly anticipating a future page
access. However, other executing threads that encounter page faults that need to be
resolved from the same paging file disk can be impacted. Having multiple paging files
can help with this condition. With only one paging file, other threads are forced to
wait until the previous bulk paging operation completes before the Memory Manager
can resolve their page faults. Having a second (or third or fourth, and so on) paging
file allocated increases the opportunity for paging file I/O parallelism. While one paging file disk is busy with a bulk paging operation, it might be possible for the Memory
Manager to resolve a page fault quickly for a second thread—if it is lucky enough to
need a page from a different paging file than the one that is currently busy.
Tip The Memory\Page Reads/sec counter reports the number of I/O operations to
disk to resolve page faults. The Memory\Pages Input/sec counter reports the total
number of pages that were read from disk during those operations. The ratio of Pages
Input/sec to Page Reads/sec is the average number of pages fetched from disk per
paging operation.
Process Virtual Address Spaces
The operating system constructs a separate virtual memory address space on behalf of
each running process, potentially addressing up to 4 GB of virtual memory on 32-bit
machines. Each 32-bit process virtual address space is divided into two equal parts, as
depicted in Figure 1-16. The lower 2 GB of each process address space consists of private addresses associated with that specific process only. This 2-GB range of addresses
refers to pages that can be accessed only by threads running in that process address
space context. Each per-process virtual address space can range from address 0x0000 0000 to address 0x7fff ffff, potentially spanning 2 GB. Each process gets its own unique set of
user addresses in this range. Furthermore, no thread running in one process can
access virtual memory addresses in the private range that is associated with a different
process.
Figure 1-16 The 4-GB address space used in 32-bit systems. (The User range runs from 0 up to 0x8000 0000; the System range above it, up to 0xffff 0000, contains the System Code, Device Driver Code, the Nonpaged and Paged pools, PTEs, and the System File Cache.)
Because the operating system builds a unique address space for every process, Figure
1-17 is perhaps a better picture of what the User virtual address spaces look like.
Notice that the System portion of each process address space is identical. One set of
System PTEs—augmented by per process page tables and session space—maps the System portion of the process virtual address space for every process. Because System
addresses are common to all processes, they facilitate high performance communication with the operating system functions and device drivers. These common
addresses also offer a convenient way for processes to communicate with each other,
when necessary.
Figure 1-17 User processes share the System portion of the 4-GB virtual address space. (Multiple per-process User ranges below 0x8000 0000 all map to the same System range of System Code, Device Driver Code, Nonpaged and Paged pools, PTEs, and the File Cache.)
Shared system addresses The upper half of each per-process address space in the
range of ‘0x8000 0000’ to ‘0xffff ffff’ consists of system addresses common to all virtual address spaces. All running processes have access to the same set of addresses in
the system range. This feat is accomplished by combining the system’s Page Tables
with each unique per process set of Page Tables.
However, User mode threads running inside a process cannot directly address memory locations in the system range because system virtual addresses are allocated using
Privileged mode. This restricts memory access to addresses in the system range to kernel threads running in Privileged mode. This is a form of protection that restricts
access to kernel memory to authorized kernel threads. When a User mode application thread calls a system function, the thread transfers to an associated Kernel mode
address where its calling parameters are then checked. If the call is validated, the
thread safely transitions to Kernel mode, changing its Execution state from User
mode to Privileged. It is in this fashion that an application thread gains access to common system virtual memory addresses.
Commonly addressable system virtual memory locations play an important role in
interprocess communication, or IPC. Win32 API functions can be used to allocate portions of commonly addressable system areas to share data between two or more distinct processes. For example, the mechanism that Windows Server 2003 uses to allow
multiple process address spaces to access common modules, known as dynamic-link
libraries (DLLs), utilizes this form of shared memory addressing. (DLLs are library
modules that contain subroutines and functions that can be called dynamically at run
time, instead of being linked statically to the programs that utilize them.)
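One common way for a Win32 program to allocate such a shared region is a paging file-backed file mapping. The sketch below is illustrative; the mapping name is hypothetical, and a cooperating process would open the same region by name to see the same physical pages.

#include <windows.h>

int main(void)
{
    /* Create a 4-KB shared section backed by the paging file; the name
       "Local\\ExampleSharedRegion" is hypothetical */
    HANDLE section = CreateFileMappingA(INVALID_HANDLE_VALUE, NULL,
                                        PAGE_READWRITE, 0, 4096,
                                        "Local\\ExampleSharedRegion");

    /* Map the section into this process's virtual address space */
    char *view = (char *)MapViewOfFile(section, FILE_MAP_ALL_ACCESS, 0, 0, 0);
    lstrcpyA(view, "hello from process A");

    /* A second process would call OpenFileMappingA with the same name and
       then MapViewOfFile to address these same pages */

    UnmapViewOfFile(view);
    CloseHandle(section);
    return 0;
}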
Extended user virtual addressing Windows Server 2003 permits a different partitioning of user and system addressable storage locations using the /userva boot
switch. This extends the private User address range to as much as 3 GB and shrinks
the system area to as little as 1 GB, as illustrated in Figure 1-18. When to use Extended
User virtual addressing is a topic discussed in Chapter 6, “Advanced Performance
Topics.”
Figure 1-18 The /userva boot switch increases the size of the User virtual address range. (The User range extends up to 0xc000 0000, compressing the System range of System Code, Device Driver Code, pools, PTEs, and File Cache into the remaining addresses.)
Page Table Entries
During instruction execution, virtual addresses are translated into physical (real)
memory addresses. This virtual address translation takes place inside the instruction
execution pipeline internal to each processor. For example, during the Prefetch stage
of a pipelined instruction execution, the pipeline translates the logical address of the
next instruction to be executed, pointed to by the Program Counter (PC) register, into
its corresponding physical address. Similarly, during the instruction Decode phases,
virtual addresses pointing to instruction operands are translated into their corresponding physical addresses.
The precise mapping function that is used to translate a running program’s virtual
addresses into physical memory locations is hardware-specific. Hardware requirements specify the following:
■ The mechanism that establishes the virtual address translation context for individual address spaces
■ The format of the virtual-to-physical address translation tables used
■ The method for notifying the operating system that page faults have occurred
Intel-compatible 32-bit processors have the specific hardware requirements for building and maintaining 32-bit page translation tables illustrated in this section. Other
processor architectures that Windows Server 2003 supports are conceptually similar.
The processor architecture specifies the format of the page tables that the Windows
Server 2003 operating system must build and maintain to enable the computer to perform virtual-to-physical address translation. The Intel 32-bit architecture (IA-32) specifies a two-level indexing scheme using a Page Directory, which then points to the Page
Tables themselves. The Page Directory occupies a single 4-KB page and remains resident in
memory while the process executes. The processor’s internal Control Register 3
points to the origin of the Page Directory. Page Tables, also 4 KB in size, are built on
demand as virtual memory locations are accessed. These consist of 32-bit Page Table
entries that contain the physical memory address where the page of virtual addresses
is currently mapped. Each Page Table can map 1024 4-KB pages (a 4-MB range),
whereas the Page Directory can point to 1024 Page Tables. The combination supports
the full 4-GB addressing scheme.
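The two-level lookup can be made concrete by decomposing a 32-bit virtual address by hand, as in this short sketch: 10 bits index the Page Directory, 10 bits index the selected Page Table, and the low-order 12 bits locate the byte within the 4-KB page. The example address is arbitrary.

#include <stdio.h>

int main(void)
{
    unsigned long va = 0x7ffdf123UL;              /* an example virtual address */

    unsigned long dirIndex = (va >> 22) & 0x3FF;  /* selects 1 of 1024 Page Tables */
    unsigned long tblIndex = (va >> 12) & 0x3FF;  /* selects 1 of 1024 PTEs */
    unsigned long offset   =  va        & 0xFFF;  /* byte within the 4-KB page */

    printf("Page Directory index %lu, Page Table index %lu, offset 0x%03lX\n",
           dirIndex, tblIndex, offset);
    return 0;
}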
As required by the hardware, Windows Server 2003 builds and maintains one set of
Page Tables capable of accessing the full 4-GB range of virtual addresses per process.
Because each process is a separate and distinct address space, each execution thread
inherits a specific address space context. A thread can access only virtual addresses
associated with its specific process address space—with the exception of common system virtual addresses, which are accessible by any thread running in Privileged mode.
Any code that attempts to access a memory location that is not valid for that process
context encounters an invalid Page Table entry that causes an addressing exception—
a page fault. The addressing exception occurs when the hardware attempts to translate the virtual address referenced by an instruction to a physical one.
Page faults are processed by the operating system’s Virtual Memory Manager, which
then has to figure out whether they are the result of a programming bug or of accessing a page that is not currently in RAM. If the culprit is a programming bug, you will
receive a familiar Access Violation message allowing you to attempt to debug the process before the operating system destroys it.
Being a repetitive task, virtual address translation can be sped up by buffering virtual
address mapping tables in fast cache memory on board the processor chip. Like other
computer architectures that support virtual memory, Intel-compatible processors provide hardware Translation Lookaside Buffers (TLBs) to speed up virtual address translation. When Control Register 3 is reloaded to point to a new set of per-process Page
Tables, a context switch occurs, which has performance implications. A context
switch flushes the TLB, slowing down instruction execution for a transitional period
known as a cache cold start.
Memory status bits For a valid page, the high order 20 bits of the PTE reference the
address of the physical memory location where the page resides. During virtual
address translation, the processor replaces the high-order 20 bits of the virtual
address with the 20 bits contained in the PTE to create the physical memory address.
As illustrated in Figure 1-19, the Intel IA-32 hardware PTE also maintains a number of
1-bit flags that reflect the current status of the virtual memory page. Bit 0 of the PTE is
the present bit—the valid bit that indicates whether the virtual address currently
resides in physical memory. If bit 0 is set, the PTE is valid for virtual address translation, and the interpretation of the other bits is hardware-determined, as shown in Figure 1-19. If the present bit is not set, an Intel-compatible processor ignores the
remainder of the information stored in the PTE.
The status bits in the PTE perform a variety of functions. Bit 2, for example, is an
authorization bit set to prevent programs executing in User mode from accessing
operating system memory locations allocated by kernel threads running in Privileged
mode. It is called the supervisor bit. The dirty bit, which is bit 6, is set whenever the
contents of a page are changed. The Memory Manager refers to the dirty bit during
page replacement to determine whether the copy of the page on the paging file is current. Bit 5 is an access bit that the hardware sets whenever the page is referenced. It is
designed to play a role in page replacement, and it is used for that purpose by the Virtual Memory Manager, as will be described shortly. Likewise, the Virtual Memory
Manager turns off the read/write bit to protect code pages from being overwritten
inadvertently by executing programs. The Virtual Memory Manager does not utilize
the Intel hardware status bits 3 and 4, which are “hints” that are designed to influence
the behavior of the processor cache. Windows Server 2003 does use 4-MB large pages
to load sections of the operating system, which cuts down on the number of PTEs that
need to be defined for the system virtual memory areas and saves space in the processor TLB.
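To make the bit assignments concrete, the following is an illustrative C sketch of the Figure 1-19 layout. It is not a Windows or processor header definition, and C bit-field ordering is compiler-dependent, so treat it as documentation of the flags discussed above rather than as a usable overlay of a real PTE.

/* Illustrative sketch of the IA-32 PTE fields shown in Figure 1-19. */
typedef struct _PTE_SKETCH {
    unsigned int present   : 1;   /* bit 0 (P): 1 = valid for address translation     */
    unsigned int readWrite : 1;   /* bit 1 (R/W): cleared to protect code pages       */
    unsigned int userSuper : 1;   /* bit 2 (U/S): supervisor bit; blocks User mode    */
    unsigned int writeThru : 1;   /* bit 3 (WT): cache write-through hint; unused by VMM */
    unsigned int cacheDis  : 1;   /* bit 4 (CD): cache disable hint; unused by VMM    */
    unsigned int accessed  : 1;   /* bit 5 (A): set by hardware on each reference     */
    unsigned int dirty     : 1;   /* bit 6 (D): set when the page contents change     */
    unsigned int largePage : 1;   /* bit 7 (L): 4-MB large page                       */
    unsigned int global    : 1;   /* bit 8 (G): global page                           */
    unsigned int reserved  : 3;   /* bits 9-11: reserved                              */
    unsigned int pageFrame : 20;  /* bits 12-31: high-order 20 bits of the physical address */
} PTE_SKETCH;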
Figure 1-19 The format of an Intel 32-bit Page Table entry (PTE). Bits 31–12 contain the read address (the high-order 20 bits of the physical page); bits 11–9 are reserved; bits 8–5 are the G, L, D (dirty), and A (access) flags; and bits 4–0 are the CD, WT, U/S (supervisor), R/W (read/write), and P (present) flags.
Invalid PTEs When the PTE bit 0 (the present bit) is not set, it is an invalid PTE, and
the hardware ignores the remaining contents of the PTE. In the case of an invalid PTE,
the operating system is free to use this space any way it sees fit. The Virtual Memory
Manager uses the empty space in an invalid PTE to store the essential information
about where a paged-out page can be located—in physical memory in the VMM
Standby List, on the paging file, or, in the case of a file cache fault, in the file system.
This information is stored in an invalid PTE, as shown in Figure 1-20.
Figure 1-20 The format of an invalid 32-bit PTE. Bits 31–12 contain the paging file offset (20 bits); a 4-bit PFN field identifies the paging file; bit 0 (the present bit) is 0; and separate transition and prototype bits mark transition and prototype PTEs.
Invalid PTEs contain a paging file number (PFN) and a 20-bit offset to identify the
exact location on disk where the page is stored. The paging file number is a 4-bit index
that is used to reference up to 16 unique paging files. The PFN references the
HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management Registry key, where information about the paging file configuration is stored in
the PagingFiles field. The 20-bit paging file offset then references a page slot somewhere in that paging file. The Memory Manager maintains a transition bit that is used
to identify a trimmed process page that is still resident in physical memory in the
VMM Standby List. Its role in page replacement is discussed later.
Windows Server 2003 also uses invalid PTEs to build Prototype PTEs, identified in
Figure 1-20 with the prototype bit set. The prototype PTE is the mechanism used for
mapping shared memory pages into multiple process address spaces. DLLs exploit
this feature. DLL modules are loaded once by the operating system into an area of
commonly addressable storage backed by a real Page Table entry. When referenced by
individual processes, the operating system builds a prototype PTE that points to the
real PTE. Using the prototype PTE mechanism, a DLL is loaded into RAM just once,
but that commonly addressable page can be shared and referenced by many different
process virtual address spaces.
Prototype PTEs are also used by the built-in Windows Server 2003 file cache. The file
cache is a reserved area of the system’s virtual address space where application files
are mapped.
Page Replacement
Following a policy of allocating physical memory page slots on demand as they are
referenced inevitably fills up all available physical memory. A common problem virtual memory operating systems face is what to do when a page fault occurs, a valid
page must be retrieved from the paging file, and there is little or no room left in physical memory for the referenced page. When physical memory is fully allocated and a
new page is referenced, something has to give. The page replacement policy decides
what to do when a new page is referenced and physical memory is full.
Tip You can watch physical memory filling up in Windows Server 2003 by monitoring Memory\Available Bytes. Available Bytes reports the amount of free, unallocated
memory in RAM. It is a buffer of free pages that the Virtual Memory Manager maintains so that page faults can be resolved as rapidly as possible. As long as there are free
pages in RAM, the Virtual Memory Manager does not have to remove an old page
from a process working set first when a page fault occurs.
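A program can sample this value directly, too. The following minimal sketch uses the Win32 GlobalMemoryStatusEx call, whose ullAvailPhys field corresponds roughly to the Memory\Available Bytes counter:

#include <windows.h>
#include <stdio.h>

int main(void)
{
    MEMORYSTATUSEX ms;
    ms.dwLength = sizeof(ms);   /* must be set before the call */
    if (GlobalMemoryStatusEx(&ms)) {
        /* ullAvailPhys approximates Memory\Available Bytes */
        printf("Available physical memory: %I64u bytes\n", ms.ullAvailPhys);
        printf("Memory load: %lu%%\n", ms.dwMemoryLoad);
    }
    return 0;
}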
LRU A Least Recently Used (LRU) page replacement policy is triggered whenever the
pool of Available Bytes is running low. LRU is a popular solution to the page replacement problem. The name, Least Recently Used, captures the overall flavor of the strategy. LRU tries to identify “older” pages and replace them with new ones, reflecting the
current virtual memory access patterns of executing programs. Older pages, by inference, are less likely to be referenced again soon by executing programs, so they are the
best candidates for page replacement.
When Available Bytes is low, the Virtual Memory Manager scans all the currently resident pages of each process’s working set and identifies those that have not been referenced recently. It then trims the oldest pages from a process’s working set and
places them in the Available Bytes pool to replenish it.
The way the Windows Server 2003 page replacement policy works is illustrated in Figure 1-21.
Figure 1-21 Process working set trimming in Windows Server 2003. The diagram shows pages flowing from process working sets (Process\Working Set), which are aged using LRU, to the Standby, Free, and Zero Lists that together make up Available Bytes. Trimmed dirty pages pass through the Modified List and the Modified Page Writer (Pages Output/sec). The instrumented flows include Pages Input/sec, Transition Faults/sec, and Demand Zero Faults/sec, along with freed pages, repurposed transition pages, and the Zero Page Thread.
The Available Bytes pool consists of three lists of available pages:

■ Zero List Pages that are available immediately to satisfy new virtual memory allocation requests.

■ Free List Pages previously allocated that were explicitly freed by the application. These are available to satisfy new virtual memory allocation requests only after they have been zeroed out for the sake of integrity.

■ Standby List Contains recently trimmed process working set pages that are also eligible to satisfy new virtual memory allocation requests. Standby List pages still contain current data from the process working set that they were recently removed from. The PTE associated with a page on the Standby List has its transition bit set. A page fault that references a page on the Standby List can be resolved immediately without having to access the disk. This is called a transition fault, or soft fault. For this reason, the Standby List is also known as the VMM cache. If both the Zero List and Free List are depleted, a page on the Standby List can be migrated, or repurposed, to the Zero List.
Note Before dirty pages containing changed data can be removed from
memory, the operating system must first copy their contents to the paging file.
The Virtual Memory Manager maintains information about the age of each page of
each process working set. A page with its access bit in the PTE set is considered
recently referenced. It should remain in the process working set. During its periodic
scans of memory, VMM checks each page PTE in turn, checking whether the access
bit is set, and then clearing the bit. (Processor hardware will turn the access bit on the
next time the page is referenced during virtual address translation.) Pages without
their access bits set are aged into three categories of old, older, and oldest pages. (The
durations represented by these categories vary with memory pressure so that pages
are moved to older categories more quickly when memory is tight.) VMM trims pages
from working sets when there is a shortage of Available Bytes or a sudden drop in
Available Bytes. When trimming, VMM makes multiple scans, taking oldest pages
from each process’s working set first, then taking newer pages, stopping when the
Available Bytes pool is replenished. Trimmed pages are placed on the Standby List if
they are unmodified (the dirty bit in their PTE is clear) and on the Modified List if
they are dirty.
Trimmed process working set pages placed on the Standby List or the Modified List
receive a second chance to be referenced by a process before they are replaced by new
pages. PTEs for pages on the Standby List are marked invalid, but have their transition
bits set, as illustrated in Figure 1-20. If any trimmed pages on the Standby List marked
in transition are referenced again by a process’s threads before they are repurposed
and their contents overwritten, they are allowed to transition fault back into their process working set without the need to perform an I/O to the underlying file or to the
paging file. Transition faults are distinguished from hard page faults, which must be
satisfied by reading from the disk. Transition faults are also known as soft faults
because VMM does not need to issue an I/O to recover their contents.
Trimmed pages that have the dirty bit in their PTE set are placed on the Modified
List. Once this list grows to a modest size, the VMM schedules a modified page writer
thread to write the current page contents to the paging file. This I/O can be performed
very efficiently. After the paging file is current for a page, VMM clears the dirty bit and
moves the page to the Standby List. As with other pages on the list, the page maintains
its contents and still has its transition bit switched on, so it can be returned to a working set using the transition fault mechanism without additional I/O.
The sizes of the Standby List, the Free List, and the Zero List are added together and
reported as Available Bytes. Pages from the Zero List are allocated whenever a process
references a brand new page in its address space. Pages from either the Zero List or
the Free List are used when a page is needed as a destination for I/O. The System Zero
Page Thread zeroes out the contents of pages on the Free List and moves them to the
Zero List, so the Free List tends to be quickly emptied. When there is a shortage of
zeroed or free pages, pages at the front of the Standby List are repurposed. These
pages at the front have been sitting on the list for the longest time, or their contents
were deemed expendable when they were placed on the list (for example, pages used
in a sequential scan of a large file).
Note Memory\Available Bytes counts the number of pages on the Standby List, the
Free List, and the Zero List. There is no easy way to watch the size of these lists individually using Performance Monitor. The size of the Modified List is unreported, but it is
presumed small because modified pages are written to the paging file (or file system)
as soon as possible after they are trimmed.
The page trimming procedure is entirely threshold-driven. Page trimming is invoked
as necessary whenever the pool of Available Bytes is depleted. Notice that there is no
set amount of time that older pages will remain memory resident. If there is plenty of
available memory, older pages in each process working set will be allowed to age
gracefully. If RAM is scarce, the LRU page replacement policy will claim more recently
referenced pages.
Measurement support Figure 1-21 also indicates the points where this process is
instrumented. The Transition Faults/sec counter in the Memory object reports the
rate at which so-called soft page faults occur. Similarly, Demand Zero Faults/sec
reports the rate new pages are being created. Pages Output/sec shows the rate at
which changed pages have been copied to disk. Pages Input/sec counts the number of
pages from disk that the Memory Manager had to make room for in physical memory
during the last measurement interval. By implication, because physical memory is a
closed system,
pages trimmed/sec + pages freed/sec = Transition Faults/sec + Demand Zero Faults/sec + Pages Input/sec + (the change in the size of the Available Bytes buffer from one interval to the next)
Neither the rate of page trimming nor the rate at which applications free virtual memory pages is instrumented. These and other complications aside, such as the fact that
a shared page that is trimmed could be subject to multiple page faults, those three
Memory performance counters remain the best overall indicators of virtual memory
management overhead.
The Page Faults/sec counter reports all types of page faults:
Memory\Page Faults/sec = Memory\Transition Faults/sec + Memory\Demand Zero Faults/sec + Memory\Page Reads/sec

The Page Reads/sec counter corresponds to the hard page fault rate in this formula.
Hard page faults require the operating system to retrieve a page from disk. Pages
Input/sec counts hard page faults, plus the extra number of pages brought into memory at the time a page fault is resolved in anticipation of future requests.
Process Working Set Management
Windows Server 2003 provides application programming interfaces to allow individual processes to specify their physical memory requirements to the operating system
and take the guesswork out of page replacement. Applications like Microsoft SQL
Server, Exchange, and IIS utilize these memory management API calls. SQL Server, for
example, implements the SetProcessWorkingSetSize Win32 API call to inform the operating system of its physical memory requirements. It also exposes a tuning parameter
that allows the database administrator to plug in appropriate minimum and maximum working set values for the Sqlserver.exe process address space.
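For an application you control, the call itself is straightforward. The following minimal sketch sets working set limits on the current process; the 8-MB and 32-MB values are arbitrary illustrations, not recommendations:

#include <windows.h>
#include <stdio.h>

int main(void)
{
    SIZE_T minWS = 8 * 1024 * 1024;    /* illustrative minimum: 8 MB  */
    SIZE_T maxWS = 32 * 1024 * 1024;   /* illustrative maximum: 32 MB */

    /* Ask the Virtual Memory Manager to keep this process's
       working set within the requested range. */
    if (!SetProcessWorkingSetSize(GetCurrentProcess(), minWS, maxWS))
        printf("SetProcessWorkingSetSize failed: %lu\n", GetLastError());

    /* ... application work ... */
    return 0;
}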
As illustrated in Figure 1-22, you can tell SQL Server to call SetProcessWorkingSetSize to
set the process working set minimum and maximum values. The Windows Server
2003 Virtual Memory Manager will attempt to honor the minimum and maximum
values you set, unless the system determines that there are not enough Available Bytes
to honor that request safely.
Figure 1-22 The SQL Server 2000 process working set management settings
The problem with controls like the SetProcessWorkingSetSize Win32 API call is that
they are static. Meanwhile, virtual memory management is very much a dynamic process that adjusts to how various running programs are currently exercising memory.
One potentially undesirable side effect of dynamic memory management is that the
amount of memory one process acquires can affect what else is happening on the system. Setting fixed lower and upper limits on the number of physical pages an application can utilize requires eternal vigilance to ensure that the value you specified is the
correct one.
Caution If you set the wrong value for the process working set in a control like the
one in Figure 1-22, you can make the system run worse than it would have if you had
given Windows Server 2003 the freedom to manage process working sets dynamically.
Because virtual memory usage is dynamic, a control (for example, like the one illustrated in Figure 1-22 for SQL Server) to set a suitable minimum and maximum range
for an application is often more helpful. If you click the Dynamically configure SQL
Server memory option, you can set a minimum and maximum working set that is
appropriate for your workload. This setting instructs dynamic memory management
routines of Windows Server 2003 to use a working set range for SQL Server that is
more appropriate than the system defaults.
Being inside a process like Sqlserver.exe is not always the best vantage point to understand what is going on with a global LRU page replacement policy. Windows Server
2003 provides a feedback mechanism for those processes that are interested in controlling the size of their working sets. This feedback mechanism is particularly helpful
to processes like SQL Server, IIS, and Exchange, which utilize large portions of the
User virtual address space to store files and database buffers. These applications perform their own aging of these internal cache buffer areas.
Processes can receive notifications from the Memory Manager on the state of free
RAM. These processes receive a LowMemoryResourceNotification when Available Bytes
is running low and a HighMemoryResourceNotification when Available Bytes appears
ample. These are global events posted by the Memory Manager to let processes that
register for these notifications know when the supply of free memory is rich and they
can help themselves to more. Processes are also notified when available memory is
depleted and they should return some of their older working set pages to the system.
Processes that register to receive these notifications and react to them can grab as
much RAM as they need and still remain good citizens that will not deliberately
degrade system performance for the rest of the application processes running on the
machine.
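A minimal sketch of this mechanism, using the Win32 CreateMemoryResourceNotification and QueryMemoryResourceNotification calls (the timeout and messages are illustrative):

#include <windows.h>
#include <stdio.h>

int main(void)
{
    HANDLE hLow;
    BOOL isLow;

    /* Register for the global low-memory notification; the handle
       is signaled while Available Bytes is running low. */
    hLow = CreateMemoryResourceNotification(LowMemoryResourceNotification);
    if (hLow == NULL)
        return 1;

    /* Either wait on the handle ... */
    if (WaitForSingleObject(hLow, 5000) == WAIT_OBJECT_0)
        printf("Memory is low; trim internal buffers.\n");

    /* ... or poll its current state. */
    if (QueryMemoryResourceNotification(hLow, &isLow) && !isLow)
        printf("Memory is no longer low; caches can grow again.\n");

    CloseHandle(hLow);
    return 0;
}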
Accounting for process memory usage The operating system maintains a Page
Frame Number (PFN) list structure that accounts for every page in physical memory
and how it is currently being used. Physical memory usage statistics are gathered by traversing the PFN list. The number of active pages associated with a given process is reported as the Process\Working Set counter, in bytes. The process working set measurements
are mainly used to determine which processes are responsible for a physical memory
shortage.
On behalf of each process virtual address space, the operating system builds a set of
Virtual Address Descriptors (VADs) that account for all the virtual memory a process
has reserved or committed. Tabulating the virtual memory committed by each process
by reading the VADs leads to the Process\Virtual Bytes measurements. A process’s
current allocations in the common system Paged and Nonpaged pools are also
counted separately. These two pools in the system range are discussed in more detail
later in this chapter. The process virtual memory allocation counters are very useful if
you are tracking down the source of a memory leak, as illustrated in Chapter 5, “Performance Troubleshooting.”
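Several of these per-process measurements can also be retrieved programmatically. The following minimal sketch uses the PSAPI GetProcessMemoryInfo call; note that its PagefileUsage field reflects committed private bytes, which is narrower than the Process\Virtual Bytes counter because reserved address space is excluded:

#include <windows.h>
#include <psapi.h>   /* link with psapi.lib */
#include <stdio.h>

int main(void)
{
    PROCESS_MEMORY_COUNTERS pmc;
    pmc.cb = sizeof(pmc);
    if (GetProcessMemoryInfo(GetCurrentProcess(), &pmc, sizeof(pmc))) {
        printf("Working set:    %lu bytes\n", (unsigned long)pmc.WorkingSetSize);
        printf("Pagefile usage: %lu bytes\n", (unsigned long)pmc.PagefileUsage);
        printf("Page faults:    %lu\n", pmc.PageFaultCount);
    }
    return 0;
}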
Shared DLLs Modular programming techniques encourage building libraries containing common routines that can be shared easily among running programs. In the
Microsoft Windows programming environment, these shared libraries are known as
dynamic-link libraries (DLLs), and they are used extensively by Microsoft developers
and other developers. The widespread use of shared DLLs complicates the bookkeeping that Windows Server 2003 performs to figure out how many resident pages are
associated with each process working set.
Windows Server 2003 counts all the resident pages associated with shared DLLs as
part of every process working set that has the DLL loaded. All resident pages of the
DLL, whether or not the process has recently accessed them, are counted in the process working set. In Windows Server 2003, the Process Working Set Bytes counter
includes all resident pages of all shared DLLs that the process currently has loaded.
This has the effect of charging processes for resident DLL pages they might never have
touched, but at least this double counting is performed consistently across all processes that have the DLL loaded.
This working set accounting procedure, which also produces the measurement data, is designed to enable VMM to do a good job when it needs to trim pages. Unfortunately,
it does make it difficult to account precisely for how physical memory is being used. It
leads to a measurement anomaly that is illustrated in Figure 1-23. For example,
because the resident pages associated with shared DLLs are included in the process
working set, it is not unusual for a process to acquire a working set larger than the
number of committed virtual memory bytes it has requested. Notice the number of
processes in Figure 1-23 with more working set bytes (the Mem Usage column) than
committed virtual bytes (the VM Size column). Because DLLs are files that are read
into memory directly from the file system, few working set pages associated with
shared DLLs ever need to be committed to virtual memory. They are not included in
the Process Virtual Bytes counter even though all the resident bytes associated with
them are included in the Process Working Set counter.
Figure 1-23 Working set bytes compared to virtual bytes
System working set Windows Server 2003 operating system functions also consume RAM. Consequently, the system has a working set that needs to be controlled
and managed like any other process. In this section, the components of the system
working set are discussed.
Both system code and device driver code occupy memory. In addition, the operating
system allocates data structures in two areas of memory: a pool for nonpageable storage and a pageable pool. Data structures that are accessed by operating system and
driver functions when interrupts are disabled must be resident in RAM at the time
they are referenced. These data structures are usually allocated from the nonpageable
pool so that they reside permanently in RAM. The Pool Nonpaged Bytes counter in
the Memory object shows the amount of RAM currently allocated in this pool that is
permanently resident in RAM.
Generally, though, most system data structures are pageable—they are created in a
pageable pool of storage and are subject to page replacement like the virtual memory
pages of any other process. Windows Server 2003 maintains a working set of active
pages in RAM for the operating system that are subject to the same LRU page replacement policy as ordinary process address spaces. The Pool Paged Bytes counter reports
the amount of paged pool virtual memory that is allocated. The Pool Paged Resident
Bytes counter reports the number of paged pool pages that are currently resident in
RAM.
Tip The Memory\Cache Bytes counter reports the total number of resident pages in
the current system working set. Cache Bytes is the sum of the System Cache Resident
Bytes, System Driver Resident Bytes, System Code Resident Bytes, and Pool Paged Resident Bytes counters. The operating system’s working set became known as the Cache
because it also includes resident pages of the built-in file cache, the operating system
function that historically consumed more RAM than any other.
The system virtual address range is limited to 2 GB, and by using the /3GB boot option, it can be limited even further to as little as 1 GB. On a large 32-bit system, it is
not uncommon to run out of virtual memory in the system address range. The culprit
could be a program that is leaking virtual memory from the Paged pool. Alternatively,
it could be caused by active usage of the system address range by a multitude of
important system functions—kernel threads, TCP session data, the file cache, or many
other normal functions. When the number of free System PTEs reaches zero, no function is able to map additional virtual memory within the system range. When you run
out of system virtual memory addresses, the results are usually catastrophic.
The 2-GB limit on the size of the system virtual address range is a serious constraint
that can sometimes be relieved only by moving to a 64-bit machine. There are examples of how to determine whether your system is running out of system virtual memory in Chapter 5, “Performance Troubleshooting.” Chapter 6, “Advanced Performance
Topics,” discusses the virtual memory boot options, the Physical Address Extension,
the Memory Manager Registry settings that influence how the system address space is
allocated, and 64-bit virtual addressing.
Tip Tracking the Memory\Free System Page Table Entries counter can help you tell
when the system virtual address range is going to be exhausted. Unfortunately, you can
sometimes run out of virtual addressing space in the Paged or Nonpaged pools before
all the System PTEs are used up.
The I/O Subsystem
One of the key components of the Windows Server 2003 Executive is the I/O Manager. The I/O Manager provides a set of interfaces for applications that need to retrieve
data from or store data to external devices. Because the contents of RAM are volatile,
any computing results that need to be stored permanently must be stored on a disk
drive or other external device. These Input/Output interfaces are consistent across all
physical and logical devices. The I/O Manager also provides services that device drivers use to process the I/O requests. Figure 1-24 illustrates the relationship between
User mode applications, the I/O Manager, and the device drivers under it, all the way
down to the physical devices.
Figure 1-24 The I/O Manager. The diagram shows a User mode process calling the Win32 Subsystem, which crosses into Kernel mode and passes an IRP to the I/O Manager. The IRP travels down through filter drivers (for example, Quota), the file system drivers (FAT, HSM, CDFS, NTFS, UDF), further filter drivers (Compression, Encryption), the class drivers (CD, Disk, Tape, DVD), and the SCSI driver and SCSI miniport, and finally across the I/O bus and host bus adapter to the disk hardware.
When it is called by a User mode process to perform an I/O operation against a file,
the Win32 Subsystem creates an I/O Request Packet (IRP) that encapsulates the
request. The IRP is then passed down through various layers of the I/O Manager for
processing. The IRP proceeds down through the file system layer, where it is mapped
to a physical disk location, to the physical disk device driver that generates the appropriate hardware command, and, finally, to the device itself. The I/O Manager handles
all I/O devices, so a complete picture would include the driver components for the
other types of devices such as network drivers, display drivers, and multimedia drivers. For the purpose of this section, it will be sufficient to concentrate on disk I/O processing.
More Info More complete documentation about the I/O Manager can be found in
the Windows Device Model reference manual available at http://msdn.microsoft.com/
library/default.asp?url=/library/en-us/kmarch/hh/kmarch/wdmintro_d5b4fea2-e96b-4880-b610-92e6d96f32be.xml.asp.
Filter drivers are modules that are inserted into the I/O Manager stack at various
points to perform optional functions. Some, like the disk volume Quota Manager, can process IRPs before they reach the file system driver. Others, like the Compression and Encryption services, process only requests that are directed specifically to the NTFS driver.
Eventually, an IRP is passed down to the physical device driver layer, which generates an appropriate command to the hardware device. Because disk hardware is
usually the most critical element of disk performance, beginning there is appropriate. Disk performance is a complex subject because of the range of hardware solutions with different cost/performance characteristics that you might want to
consider. The discussion here is limited to the performance of simple disk configurations. Cached disks and disk array alternatives are discussed in Chapter 5, “Performance Troubleshooting.”
Disk Performance Expectations
The performance of disk drives and related components such as I/O buses is directly
related to their hardware feeds and speeds. Using performance monitoring, you can
determine how the disks attached to your Windows Server 2003 machine are performing. If the disks are performing at or near the performance levels that can be
expected from the type of hardware devices you have installed, there is very little that
can be accomplished by trying to “tune” the environment to run better. If, on the other
hand, actual disk performance is considerably worse than reasonable disk performance expectations, mounting a tuning effort could prove very worthwhile. This section will provide a basic framework for determining what performance levels you can
reasonably expect from different kinds of disk hardware. This topic is discussed in
greater detail in Chapter 5, “Performance Troubleshooting.”
Because so many disk hardware and configuration alternatives are available, this complex subject can only be introduced here. Determining precisely what performance
level your disk configuration is capable of delivering is something you are going to
have to investigate on your own. The suggestions and guidelines introduced here
should help in that quest.
After you determine what performance level your disks are capable of delivering, you
need to understand the actual performance level of your disks. This requires knowing
how to calculate the service time and queue time of your disks. Once you calculate the
actual disk service time, you can compare it to the expected level of performance to
see whether you have a configuration or tuning problem. In Chapter 5, “Performance
Troubleshooting,” the steps for relieving a disk bottleneck are discussed in detail.
Here the scope of the discussion is limited to the essential background you need to
understand and execute those procedures.
Disk Architecture
Figure 1-25 illustrates the architecture of a hard disk. Disks store and retrieve digitally
encoded data on platters. A disk spindle normally consists of several circular platters
that rotate continuously. Data is encoded and stored on both sides of each of the disk
platters. Bits are stored on the platter magnetically on data tracks, arranged in concentric circles on the platter surface. The smallest unit of data transfer to and from the
disk is called a sector. The capacity of a sector is usually 512 bytes. The recording density
refers to the number of bits per square inch that can be stored and retrieved. Because
the recording density is constant, outer tracks can store much more data than inner
tracks. Data on the disk is addressed using a relative sector number. Data is stored and
retrieved from the platters using the recording and playback heads that are attached at
the end of each of the actuator arms.
Figure 1-25 The components of a disk drive: the spindle, platters (Platter 0 and so on), and the concentric data tracks on each platter surface.
An I/O command initiates a request to read or write a designated sector on the disk.
The first step is to position the heads over the specific track location where the sector
is located. This is known as a seek, which can be relatively time-consuming. Seek time
is primarily a function of distance. If the heads are already positioned in the right
place, there is a zero motion seek. Zero seeks occur, for example, when you read consecutive sectors off the disk. This is something that happens when a file is accessed
sequentially (unless the file happens to be heavily fragmented). A minimum seek is the
time it takes to reposition the heads from one track to an adjacent track. A maximum
seek traverses the length of the platter from the first track to the last. An average seek is
calculated as the time it takes to move across 1/3 of the available tracks.
Following the seek operation, another mechanical delay occurs to wait for the designated sector to rotate under the playback and recording head. This is known as rotational delay, or sometimes just latency. Device latency is one of those dumb luck
operations. If your luck is good, the desired sector is very close to the current position
of the recording heads. If your luck is bad, the desired sector is almost a full revolution
of the disk away. On average, device latency is the time it takes for half a revolution, which is a function of the device's rotational speed, usually specified in revolutions per minute (rpm). A disk that is spinning at 7200 rpm revolves 120 times per second, or once every 8.33 milliseconds. The average rotational delay for this disk is half a revolution, or approximately 4.2 ms.
The third major component of disk I/O service time is data transfer time. This is a function of the recording density of the track and the rotational speed of the device. These
mechanical characteristics determine the data rate at which bits pass under the read/
write heads. Outer tracks with roughly double the number of bits as inner tracks
transfer data at twice the data rate. On average, any specific sector could be located
anywhere on the disk, so it is customary to calculate an average data rate halfway
between the highest and lowest speed tracks. The other variable factor in data transfer
is the size of the request, which is normally a function of the block size selected for the
file system. Some applications (like the paging subsystem) make bulk requests where
they attempt to read or write several logical blocks in a single, physical disk operation.
The service time of a disk request can normally be broken into these three individual
components:
disk service time = seek + latency + data transfer
There are device service time components other than these three, including protocol
delay time. But because these other components represent only minor delays, they are
ignored here for the sake of simplicity.
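This calculation is easy to automate. The following sketch applies the same assumptions used for Table 1-3 below (zero seeks half the time, average seeks the other half, half-rotation latency, and the transfer time for one block); the drive specifications passed in are illustrative:

#include <stdio.h>

/* Estimate expected disk service time, in milliseconds, from a
   drive's specifications. */
static double service_time_ms(double avgSeekMs, double rpm,
                              double mbPerSec, double blockKB)
{
    double seek     = 0.5 * avgSeekMs;            /* 50% zero seeks, 50% average */
    double latency  = 0.5 * (60000.0 / rpm);      /* half a revolution, in ms    */
    double transfer = (blockKB / 1024.0) / mbPerSec * 1000.0;  /* one block, ms  */
    return seek + latency + transfer;
}

int main(void)
{
    /* A 7200-rpm disk with a 7.5-ms average seek and a 50-MB/sec
       transfer rate, reading 8-KB blocks: roughly the Server disk
       row of Table 1-3. */
    printf("Expected service time: %.1f ms\n",
           service_time_ms(7.5, 7200.0, 50.0, 8.0));
    return 0;
}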
The specifications for several current, popular disk models are shown in Table 1-3.
The service time expectation calculated here is based on the assumption that disks
perform zero seeks 50 percent of the time and average seeks the other 50 percent. It
also assumes a block size in the range of 4 KB–16 KB, which is typical for NTFS.
Table 1-3 Disk Service Time Expectations for a Representative Sample of Current Disks

Model                   Average Seek (ms)   Rpm      Latency (ms)   Transfer Rate (MB/sec)   Service Time (ms)
Desktop disk            9                   5400     5.5            45                       11
Server disk             7.5                 7200     4.2            50                       9
Performance disk        6                   10,000   3.0            60                       7
High performance disk   5                   15,000   2.0            75                       5
Notice that the worst performing disks in this example are still capable of servicing
requests in roughly 10 milliseconds. At 100 percent utilization, these disks are capable
of about 100 I/O operations per second (IOPS). More expensive performance disks
are capable of performance at about twice that rate.
A simple chart like the one in Table 1-3 should give you some insight into the bandwidth (or I/O capacity) of your disk configuration. Optimizations such as cached
disks aside for the moment, the critical factor constraining the performance of your
disk configuration is the number of independent physical disk spindles, each with its
own finite capacity to execute I/O operations. If your server is configured with a single
physical disk, it is probably capable of performing only 100–200 I/Os per second. If
five disks are configured, the same computer can normally perform 5 times the number of disk I/Os.
This chart does not say anything about disk response time—in other words, service
time plus queue time. Average disk response time is one of the important metrics that
the I/O Manager supplies. The counter is called Avg. Disk sec/Transfer and is provided for both Logical and Physical Disks. At high utilization levels, queuing delays
waiting for the disk are liable to be substantial. However, the number of concurrent
disk requestors, which serves as an upper limit on the disk queue depth, is often
small. The number of concurrent disk requestors usually establishes an upper bound
on disk queue time that is much lower than a simple queuing model like M/M/1
would predict.
To set reasonable service time expectations for the disks you are using, access the public Web site maintained by the manufacturer that supplies your disks. There you can
obtain the specifications for the drives you are running—the average seek time, the
rotational speed, and the minimum and maximum data transfer rates the device supports. Failing that, you can also determine for yourself how fast your disks are capable
of running by using a stress-testing program.
To utilize the disk service time expectations in Table 1-3 for planning purposes, you
need to be able to compare the expected values to actual disk service times in your
environment. To understand how to calculate disk service time from these measurements, it will help to learn a little more about the way disk performance statistics are
gathered.
Disk Performance Measurements
The Physical Disk and Logical Disk performance statistics in Windows Server 2003
are gathered by the functional layers in the I/O Manager stack. Figure 1-26 shows the
volume manager layer underneath the file system layer that is used to gather Logical
Disk statistics. The physical disk partition manager, Partmgr.sys, gathers Physical Disk
statistics.
Figure 1-26 Disk performance statistics are gathered by the I/O Manager stack. Below the I/O Manager and NTFS.sys, the volume manager (basic/dynamic disks) gathers the Logical Disk measurements, and the Physical Disk Partition Manager gathers the Physical Disk measurements, above the class drivers, SCSI driver, and SCSI miniport that drive the disk hardware.
Important Both Logical and Physical Disk statistics are enabled by default in Windows Server 2003. This is a change from Windows 2000 and Windows XP where only
the Physical Disk statistics were installed by default.
Unlike the % Processor Time measurements that Windows Server 2003 derives using
sampling, the disk performance measurements gathered by the I/O Manager reflect
precise timings of individual disk I/O requests using the High Precision clock. As each
IRP passes through the measurement layer, the software gathers information about the
request. The DISK_PERFORMANCE structure definition, documented in the platform SDK, shows the metrics that the I/O Manager stack keeps track of for each individual request. They include the metrics shown in Table 1-4.
Table 1-4 Metrics Tracked by the I/O Manager

Metric         Description
BytesRead      Number of bytes read
BytesWritten   Number of bytes written
ReadTime       Time it took to complete the read
WriteTime      Time it took to complete the write
IdleTime       Specifies the idle time
ReadCount      Number of read operations
WriteCount     Number of write operations
QueueDepth     Depth of the queue
SplitCount     Number of split I/Os; usually an indicator of file fragmentation
Table 1-4 is the complete list of metrics that the I/O Manager measurement layer compiles for each disk I/O request. These are the metrics that are then summarized and
reported at the Logical or Physical Disk level. All the disk performance statistics that
are available using Performance Monitor are derived from these basic fields. For
instance, the Avg. Disk Queue Length counter that is available in Performance Monitor is derived using Little’s Law as follows:
Average Disk Queue Length = (Disk Reads/sec × Avg. Disk sec/Read) + (Disk Writes/sec × Avg. Disk sec/Write)
Caution The Avg. Disk Queue Length counter is derived using Little’s Law and not
measured directly. If the Little’s Law equilibrium assumption is not valid for the measurement interval, the interpretation of this value is subject to question. Any interval
where there is a big difference in the value of the Current Disk Queue Length counter
compared to the previous interval is problematic.
Stick with the metrics that the I/O Manager measurement layer measures directly and
you cannot go wrong!
The I/O Manager measurement layer derives the ReadTime and WriteTime timing values
by saving a clock value when the IRP initially arrives on its way down to the physical
device, and then collecting a second timestamp after the disk I/O completes on its
return trip back up the I/O Manager stack. The first clock value is subtracted from the
second value to calculate ReadTime and WriteTime. These timings are associated with
the Avg. Disk sec/Read and Avg. Disk sec/Write counters that are visible in Performance
Monitor. Avg. Disk sec/Transfer is the weighted average of the Read and Write times. They are
measurements of the round trip time (RTT) of the IRP to the disk device and back.
Avg. Disk sec/Transfer includes any queuing delays at lower levels of the I/O Manager
stack and, of course, at the device. To break down disk response time into service time
and queue time, it helps to understand how disk Idle time is measured. The QueueDepth is simply an instantaneous count of the number of active IRPs that the filter driver
is keeping track of. These are IRPs that the filter driver has passed down on their way to
the device, but have not returned yet. It includes any I/O requests that are currently executing at the device, as well as any requests that are queued waiting for service.
When an IRP is being passed upward following I/O completion, the I/O Manager
measurement layer checks the current QueueDepth. If QueueDepth is zero, the I/O
Manager measurement layer stores a clock value indicating that an Idle period is
beginning. The disk idle period ends when the very next I/O Request Packet arrives at
the I/O Manager measurement layer. When there is work to be done, the device is
busy, not idle. The I/O Manager measurement layer accesses the current time, subtracts the previous timestamp marking the start of the Idle period, and accumulates
the total Idle time measurement over the interval.
Even though Performance Monitor only displays a % Idle Time counter, it is more
intuitive to calculate:
disk utilization = 100% − % Idle Time
Applying the Utilization Law, you can then calculate disk service time:
disk service time = disk utilization ÷ Disk Transfers/sec
With disk service time, you can then calculate average disk queue time:
disk queue time = Avg. Disk sec/Transfer − disk service time
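Putting the three formulas together, a small sketch with illustrative counter values:

#include <stdio.h>

int main(void)
{
    /* Sample counter values; substitute your own measurements. */
    double pctIdleTime     = 70.0;    /* % Idle Time            */
    double transfersPerSec = 90.0;    /* Disk Transfers/sec     */
    double avgSecPerXfer   = 0.005;   /* Avg. Disk sec/Transfer */

    double utilization = (100.0 - pctIdleTime) / 100.0;  /* 0.30        */
    double serviceTime = utilization / transfersPerSec;  /* ~0.0033 sec */
    double queueTime   = avgSecPerXfer - serviceTime;    /* ~0.0017 sec */

    printf("disk utilization  = %.0f%%\n", utilization * 100.0);
    printf("disk service time = %.4f sec\n", serviceTime);
    printf("disk queue time   = %.4f sec\n", queueTime);
    return 0;
}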
Once you calculate the disk service time in this fashion for the devices attached to
your machine, you can compare those measurements to your expectations regarding
the levels of performance these devices can deliver. If actual disk performance is much
worse than expected, you have a disk performance problem worth doing something
about. Various configuration and tuning options for improving disk response time
and throughput are discussed in Chapter 5, “Performance Troubleshooting.”
Network Interfaces
Networking refers to data communication between two or more computers linked by
some transmission medium. Data communication technology is readily broken into
local area network (LAN) technology, which is designed to link computers over limited
distances, and wide area network (WAN) technology for communicating over longer
distances.
LANs utilize inexpensive wire and wireless protocols suitable for peer-to-peer communication, making it possible to link many computers together cost-effectively. LAN
technologies include the popular Fast Ethernet 100baseT standard and Gigabit Ethernet, FDDI, and Token Ring. An unavoidable consideration in wiring your computers
together is that LAN technologies have a built-in distance constraint that must be honored—that is why they are referred to as local area networks. The Fast Ethernet protocol, for example, cannot be used to connect computers over distances greater than a
hundred meters. Wireless LANs also have very stringent distance limitations. The set
of localized connections associated with a single Ethernet hub or switch is known as
a network segment. As you add more connections to a LAN or try to interconnect
machines over greater distances, inevitably you will create more network segments
that then must be bridged or routed to form a cohesive network.
WAN connections link networks and network segments over longer distances.
Eventually, WANs connect to the backbone of the World Wide Web, which literally interconnects millions of individual computers scattered around the globe. Wide
area networking utilizes relatively expensive long distance lines normally provided
by telephone companies and other common carriers to connect distant locations.
Popular WAN technologies include Frame Relay, ISDN, DSL, T1, T3, and SONET,
among others.
The networking services Windows Server 2003 uses are based on prevailing industry
standards. The spread of the Internet has led to universal adoption of the Internet
communication protocols associated with TCP/IP. Windows Server 2003 is designed
to operate with and is fully compliant with the bundle of networking standards associated with the Internet. These Internet protocols include UDP, TCP, IP, ICMP, DNS,
DHCP, HTTP, and RPC. Instead of this alphabet soup, the suite of Internet standard
protocols is often simply called TCP/IP, the two components that play a central role.
The full set of TCP/IP protocols is the native networking language of the Windows
Server 2003 operating system.
Packets
Data is transmitted over a communications line in a serial fashion, one bit at a time.
Instead of simply sending individual bits between stations across the network,
however, data communication is performed using groups of bits organized into distinct datagrams, or packets. It is the function of the data communications hardware
and software that you run to shape bit streams into standard, recognizable packets.
The overall shape of the packets that are being sent and received in Microsoft Windows-based networks is discussed here from a performance and capacity planning
perspective.
More Info For more in-depth information about TCP/IP protocols, refer to
Microsoft Windows Server 2003 TCP/IP Protocols and Services Technical Reference
(Microsoft Press, 2003).
The Network Monitor is the diagnostic tool that allows you to capture packet traces in
Windows Server 2003. A Network Monitor example is shown here to illustrate some
common types of packets that you can expect to find circulating on your network.
Using the Network Monitor to capture and examine network traffic is discussed in
more detail in Chapter 2, “Performance Monitoring Tools.”
At the heart of any packet is the payload, the information that is actually intended to
be transmitted between two computers. Networking hardware and software inserts
packet headers at the front of the data transmission payload to describe the data being
transmitted. For instance, the packet header contains a tag that shows the type and
format of the packet. The header also contains the source address of the station transmitting the data and the destination address of the station intended to receive it. In
addition, the packet header contains a length code that tells you how much data it
contains—remember, coming across the wire, the data appears as a continuous
sequence of bits.
Mining these packet header fields, you can calculate who is sending how much data to
whom, information that can then be compared to the capacity of the links connecting
those stations to determine whether network link capacity is adequate. The information contained in the packet headers forms the basis of the networking performance
statistics that you can gather using Performance Monitor.
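As a concrete illustration of what a packet header carries, here is a simplified C sketch of the IPv4 header layout described by RFC 791. It ignores byte ordering and header options and is not a definition taken from any Windows header:

#include <stdint.h>

typedef struct ipv4_header {
    uint8_t  version_ihl;     /* version (4 bits) + header length (4 bits)   */
    uint8_t  tos;             /* type of service                             */
    uint16_t total_length;    /* length code: header plus payload, in bytes  */
    uint16_t identification;  /* used to reassemble fragmented transmissions */
    uint16_t flags_fragment;  /* fragmentation flags and fragment offset     */
    uint8_t  ttl;             /* hop limit                                   */
    uint8_t  protocol;        /* next layer up: 6 = TCP, 17 = UDP            */
    uint16_t header_checksum; /* error check on the header                   */
    uint32_t source_addr;     /* source station address                      */
    uint32_t dest_addr;       /* destination station address                 */
} ipv4_header;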
Understanding the packet-oriented nature of data communication transmissions is
very important. The various network protocols determine the format of data packets—
how many bits in the header, the sequence of header fields, and how error correction
code data is created and stored in the packet. Protocols simply represent standard bit
formats that packets must conform to. Packets also must conform to some maximum
size or maximum transmission unit (MTU). Transmitting blocks of data that are
larger than the MTU is also problematic. Large blocks must be broken into packets
that will fit within the MTU. Consequently, packet disassembly and reassembly are
necessary functions that must be performed by networking hardware and software,
too. Packets representing pieces of larger blocks must contain instructions for their
reassembly at the receiver. In routing, two packets from the same logical transmission
might get sent along different routes and even arrive at their destination out of
sequence. Receiving packets out of order naturally complicates the task of reassembling the transmission at the receiver.
Protocol Stack
It is customary to speak of networking hardware and software technology as a series
of distinct, well-defined layers. The notion of building networking technology by
using layers of hardware and software began with a standardization process that originated in the early 1980s. When the ARPANET, the predecessor of today’s Internet,
was created, it implemented four standard networking layers. These Internet protocol
layers are almost uniformly accepted as standards today, and they form the basis of
the networking support for Microsoft Windows Server 2003. This layered architecture
of the Internet is depicted in Figure 1-27. These are the standard layers of the TCP/IP
protocol stack.
Figure 1-27 Networking protocol stack. Packets pass between four layers: Application (HTTP, RPC, and so on), Host-to-Host (TCP, UDP), Internet Protocol (IP), and Media Access (Ethernet, FDDI).
The Internet architecture defines the following functional layers:
■ Media Access The lowest level layer is concerned with the physical transmission media and how the signal it carries is used to represent data bits. The MAC layer is also sometimes decomposed further into Physical and Data Link layers. The various forms of Ethernet are the most common implementation of the MAC layer.

■ Internet Protocol (IP) The IP layer is concerned with packet delivery. IP solves the problem of delivering packets across autonomous network segments. At each network hop, IP decides the next system to forward the packet to. The packet gets forwarded from neighbor to neighbor in this manner until the payload reaches its intended final destination. The IP layer also includes the Address Resolution Protocol (ARP), the Internet Control Message Protocol (ICMP), and the Border Gateway Protocol (BGP), which are involved in discovering and maintaining packet delivery routes. The most common version of the protocol deployed today is IP version 4; however, the next version, IP version 6, is beginning to be deployed. Most concepts described in this chapter apply to both versions, although most of the discussion is in the context of IP version 4.

■ Host-to-Host The Host-to-Host layer is concerned with how machines that want to transmit data back and forth can communicate. The Internet protocols define two Host-to-Host implementations—the User Datagram Protocol (UDP) for transmission of simple messages and the Transmission Control Protocol (TCP) for handling more complex transactions that require communication sessions. TCP is the layer that is responsible for ensuring reliable, in-order delivery of all packets. It is also responsible for the flow control functions that have major performance implications in WAN communications.

■ Application The Internet protocols include standard applications for transmitting mail (Simple Mail Transfer Protocol, or SMTP), files (File Transfer Protocol, or FTP), Web browser hypertext files (Hypertext Transfer Protocol, or HTTP), remote login (Telnet), and others. In addition, Windows Server 2003 supports many additional networking applications that plug into TCP, including Common Internet File System (CIFS), Remote Procedure Call (RPC), Distributed COM (DCOM), and Lightweight Directory Access Protocol (LDAP), among others.
Processing of packets The layered architecture of the Internet protocols is much
more than a conceptual abstraction. Each layer of networking services operates in succession on individual packets. Consider a datagram originally created by an application layer like the Microsoft Internet Information Services (IIS), which supports HTTP
(the Internet’s Hypertext Transfer Protocol) for communicating with a Web browser
program. As created by IIS, this packet of information contains HTML format text, for
example, in response to a specific HTTP GET Request.
IIS then passes the HTTP Response Message to the next appropriate lower layer,
which is TCP, in this case, for transmission to the requesting client program, which is
a Web browser program running on a remote computer. The TCP layer of software is
responsible for certain control functions such as establishing and maintaining a data
communications session between the IIS Web server machine and an Internet
Explorer Web browser client. TCP ensures that the data sent is reliably received by the
remote end. TCP is also responsible for network flow control, which ensures that the
sending computer system (IIS Web Server computer) does not flood the Internet routers or your client-side network with data. In addition, TCP is responsible for breaking
the data to be sent into maximum sized segments that can be sent across the network
to the destination. The TCP layer, in turn, passes the TCP segment to the IP layer,
which decides which router or gateway to forward the packet to, such that the IP
packet moves closer to its eventual destination. Finally, IP passes each IP packet to the
MAC layer, which is actually responsible for placing bits on the wire. The layers in the
networking protocol stack each operate in sequence on the packet. As illustrated in
Figure 1-28, each layer also contributes some of the packet header control information
that is ultimately placed on the wire.
Figure 1-28 Network Monitor
The protocol stack functions in reverse order at the receiving station. Each layer processes the packet based on the control information encapsulated in the packet header
deposited by the corresponding layer at the sending computer system. If the layer
determines that the received packet contains a valid payload, the packet header data
originally inserted by that corresponding layer at the sender station is stripped off.
The remaining payload data is then passed up to the next higher layer in the stack for
processing. In this fashion, the packet is processed at the receiving station by the MAC
layer, the IP layer, and the TCP layer in sequence, until it is finally passed to the Web
browser application that originally requested the transmission and knows how to format HTML text so that it looks good on your display monitor.
Example packet trace: processing an HTTP GET request Figure 1-28 is a Network Monitor packet capture that illustrates how these processing layers work. In this
example, a TCP-mandated SYN request (SYN is short for synchronize) in Frame 6 is
transmitted from an Internet Explorer Web browser to establish a session with the
Web server running at http://www.msn.com. The initial session request packet has a
number of performance-oriented parameters including Selective Acknowledgement
(SACK) and the TCP Advertised Window. Notice in the Network Monitor’s middle
frame how the TCP header information is encapsulated inside an IP segment, which is
then enclosed in an Ethernet packet for transmission over the wire.
Frame 8 in Figure 1-28 shows a SYN, ACK response frame from the Web server that
continues the session negotiation process. Notice that it was received about 80 milliseconds following the initial transmission. This round trip time is a key performance
metric in TCP because it governs the Retransmission Time Out (RTO) that the protocol uses to determine when congested routers have dropped packets and data needs
to be retransmitted. This aspect of TCP congestion control will be discussed in more
detail later in this chapter. In this example, the ACK packet in Frame 9 completes the
sequence of packets that establishes a session between two TCP host machines.
Frame 10 follows immediately, an HTTP protocol GET Request to the MSN Web
site to access the site’s home page. This GET Request also passes a cookie containing information to the IIS Web server about the initiator of the request. The TCP
running on the Web server acknowledges this packet in Frame 11, some 260 milliseconds later.
An IIS Web server then builds the HTTP Response message, which spans Frames 12,
13, 15, 16, 18, and 19. Here the HTTP Response message is larger than an Ethernet
MTU (maximum transmission unit), so it is fragmented into multiple segments. The
TCP layer is responsible for breaking up this Response message into MTU-size packets on the Sender side and then reassembling the message at the Receiver. After these
frames have been received by the Web browser, Internet Explorer has all the data it
needs to render an attractive-looking Web page.
The Network Monitor captures and displays packet headers in a way that lets you easily dig down into the protocol layers to see what is going on. To use the Network Monitor effectively to diagnose performance problems, it helps to understand a little more
about these networking protocols and what they do.
Bandwidth
Bandwidth refers to the data rate of the data communications transmission, usually
measured in bits per second. It is the capacity of the link to send and receive data.
Some authorities suggest visualizing bandwidth as the width of the data pipe connecting two stations. A better analogy is to visualize the rate at which bits arrive at the
other end of the pipe. Bandwidth describes the rate at which bits are sent across the
link. It tells you nothing about how long it takes to transmit those bits. Physically, each
bit transmitted across the wire is part of a continuous wave form. The waveform cycles
at 100 MHz for 100baseT and at 1 GHz for Gigabit Ethernet. In the time it takes to
send 1 bit using 100baseT, you can send 10 bits using the Gigabit Ethernet standard.
Bandwidth is usually the prime performance concern in LANs only when the network segment is used to move large blocks of data from point to point, as in disk-to-tape backup or video streaming applications. For long distance data communications,
especially for organizations attempting to do business on the Web, for example, the
long latency, not bandwidth, is the more pressing (and less tractable) performance
problem.
Table 1-5 shows the bandwidth rating of a variety of popular link technologies. It also
compares the relative bandwidth of the link to a 56 Kbps telephone line. It is more
precise, however, to speak of the effective bandwidth of a data communications link.
Effective bandwidth attempts to factor in the many types of overhead that add bytes to
the data payload you are attempting to move over the network. Consider what happens when you transfer a 10-MB (megabyte) file using either FTP or Microsoft’s CIFS
network file sharing protocol from one machine to another across a 100-Mbps (Megabits per second) switched Ethernet link.
Table 1-5 Connection Speed for a Variety of Networking Links

Circuit                 Connection Speed (bps)    Relative Speed
Modem                               28,800                  0.5
Frame Relay                         56,000                    1
ISDN                               128,000                    2
DSL                                640,000                   12
T1/DS1                           1,536,000                   28
10 Mb Ethernet                  10,000,000                  180
11 Mb Wireless                  11,000,000                  196
T3/DS3                          44,736,000                  800
OC1                             51,844,000                  925
100 Mb Fast Ethernet           100,000,000                 1800
FDDI                           100,000,000                 1800
OC3                            155,532,000                 2800
ATM                            155,532,000                 2800
OC12                           622,128,000               11,120
Gigabit Ethernet             1,000,000,000               18,000
The first overhead of data communication that should be factored into any calculation
of effective bandwidth is the packet-header overhead. The 10-MB file that the FTP or
SMB protocol transfers must be broken into data packets no larger than Ethernet’s
1500-byte MTU. As illustrated in the HTTP packet trace in Figure 1-28, each Ethernet
packet also contains IP, TCP, and application headers for this application. The space in
the packet that these protocol stack headers occupy reduces effective bandwidth by
about 2–3 percent. Plus, there are other protocol-related overheads such as the ACK
packets seen in the Figure 1-28 trace that further reduce effective bandwidth. Altogether, the overhead of typical TCP/IP traffic reduces the effective bandwidth of a
switched Ethernet link to approximately 95 percent of its rated capacity. If you experiment with transferring this hypothetical 10-MB file across a typical 100-Mbps Ethernet link, you will probably be able to measure only about 95 Mbps of throughput,
which for planning purposes, is the effective bandwidth of the link.
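The arithmetic behind these estimates is straightforward. The following Python sketch uses the frame and header sizes quoted in this chapter; it ignores ACK traffic and interframe gaps, so it lands slightly above the 95 percent figure cited in the text:

FRAME = 1514     # maximum Ethernet frame: 14-byte header plus 1500-byte MTU
HEADERS = 54     # combined Ethernet, IP, and TCP headers
PAYLOAD = FRAME - HEADERS        # about 1460 data bytes per full-sized frame

link_bps = 100_000_000           # rated Fast Ethernet bandwidth
effective_bps = link_bps * PAYLOAD / FRAME
print(f"payload fraction:    {PAYLOAD / FRAME:.1%}")     # about 96%
print(f"effective bandwidth: {effective_bps / 1e6:.1f} Mbps")

# Time to move a 10-MB file at that rate, ignoring protocol turnaround:
file_bits = 10 * 2**20 * 8
print(f"10-MB file transfer: {file_bits / effective_bps:.2f} seconds")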
The most important measure of bandwidth usage is line utilization. The measurement
technique is straightforward. Using MAC layer length fields, a measurement layer
inserted into the network protocol stack accumulates the total number of bytes
received from packets transferred across the link. Utilization is then calculated as:
network interface utilization = (Bytes Total/sec × 8) ÷ Current Bandwidth

where Bytes Total/sec is measured in bytes per second and Current Bandwidth is reported in bits per second. Dividing the Network Interface\Bytes Total/sec counter, converted to bits, by the Network Interface\Current Bandwidth counter
yields the utilization of the link.
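Expressed in code, the calculation looks like the following sketch. The counter names are the actual Network Interface counters; the sampled values are invented for illustration:

# Hypothetical values sampled from the Network Interface counters.
bytes_total_per_sec = 5_750_000    # Network Interface\Bytes Total/sec
current_bandwidth = 100_000_000    # Network Interface\Current Bandwidth (bits/sec)

utilization = bytes_total_per_sec * 8 / current_bandwidth
print(f"network interface utilization: {utilization:.1%}")    # 46.0%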
Latency
Latency refers to the delay in sending bits from one location to another. It is the length of time it takes to send a message from one station to another across the link. Electronic signals travel at close to the speed of light, approximately 300,000 kilometers per second. The physical characteristics of transmission media slow signal propagation, with a corresponding increase in latency. The effective speed of an electronic data transmission wire is only about 1/2 the speed of light, or 150,000 km/second. Optical fiber connections reach fully 2/3 the speed of light, or a propagation speed of 200,000 km/second. A message sent from a location in the eastern United States to a west coast location across a single, continuous optical cable would traverse about 5,000 km. At a top speed of 200,000 km/second, the latency for this data transmission is a not insignificant 25 milliseconds. For a rule-of-thumb calculation, allow for at least 5 milliseconds of delay for every 1000 kilometers separating two stations.
Of course, most long distance transmissions do not cross simple, continuous point-topoint links. Over long distances, both electrical and optical signals attenuate and
require amplification using repeaters to reconstitute the signal and send it further
along on its way. These repeaters add latency to the transmission time. Because a long-distance transmission will traverse multiple network segments, additional processing
is necessary at every hop to route packets to the next hop in the journey between
sender and receiver. Processing time at links, including routers and repeaters and
amplifiers of various forms, adds significant delays at every network hop. High performance IP packet routers that are designed to move massive amounts of traffic along
the Internet backbone, for example, might add 10 µsecs of delay. Slower routers like
the ones installed on customer premises could add as much as 50 µsecs of additional
latency to the transmission time. This yields a better estimate of long distance data
communication latency:
(distance ÷ signal propagation speed) + (hop count × average router latency)
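A quick estimator built on this formula might look like the following Python sketch. The propagation speed and per-hop router delay defaults are the rule-of-thumb figures from the text; the 15-hop count in the example is purely illustrative:

def estimated_latency_ms(distance_km, hop_count,
                         km_per_sec=200_000, router_delay_usecs=50):
    """Rule-of-thumb one-way latency: propagation time over optical
    fiber plus per-hop router processing delay."""
    propagation_ms = distance_km / km_per_sec * 1000
    routing_ms = hop_count * router_delay_usecs / 1000
    return propagation_ms + routing_ms

# Coast-to-coast example from the text: 5,000 km, assuming 15 hops.
print(f"{estimated_latency_ms(5000, 15):.1f} ms one way")    # about 25.8 ms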
Because determining network latency across a complex internetworking scheme is so
important, the Internet protocols include facilities to measure network packet routing
response time. The Internet Control Message Protocol (ICMP), a required component
of the TCP/IP standard, supports an Echo Request command whose Echo Reply response returns the round trip time for the request. A simple command-line utility called ping that is included with Windows Server 2003 issues several ICMP Echo Request commands and displays the
response time as reported by the destination node. A slightly more sophisticated utility called tracert decomposes the response time to a remote IP destination by calculating the time spent traversing every hop in the route. These utilities are discussed in
more detail later in this chapter.
A related measure of latency is the round trip time (RTT). RTT is defined as the time it
takes for a message to get to its destination and back (usually the sum of the latency in
the forward direction and the latency in the backward direction). In typical client/
server transactions, network RTT corresponds closely to the response time that the
user of the application perceives. As discussed earlier, this perceived application
response time is the most important performance metric because of its correlation
with user satisfaction. RTT is also used by TCP to track whether a particular data
packet has been lost; the performance of TCP on lossy links depends on RTT.
Ethernet
In Ethernet, the network connections are peer-to-peer; there is no master controller. Because the link is a shared transmission medium, it is possible for collisions to occur when two stations attempt to use the shared link at the same time. The performance impact of
Ethernet collisions is discussed later.
Fault tolerance, price, and performance are the main considerations that determine
the choice of a local area network configuration. As the price of Ethernet switches has
dropped, switched network segments have become more common. Unfortunately,
many people are then disappointed when a bandwidth-hogging application like tape
backup does not run any faster when a switch is configured instead of a hub. Because
the underlying protocol is unchanged, point-to-point data transmissions that proceed
in a serial fashion cannot run any faster.
Another caution is not to get confused by the terminology that is used to identify hubs
and switches. Physically, hubs are wiring concentrators that function logically as a single shared segment where
collisions occur whenever two stations attempt to send data concurrently. Switches
create network segments that function logically like spoke and hub configurations
where multiple transmissions can be in progress simultaneously on what are, in effect,
dedicated links operating at the interface’s full rated bandwidth. Collisions occur only
in switched networks when two (or more) stations A and B attempt to send data to the
same station C concurrently.
The Ethernet protocol is peer-to-peer, requiring no master controller of any kind.
Among other things, this makes an Ethernet network very easy to configure—you can
just continue to extend the wire and add links, up to the physical limitations of the
protocol in terms of the number of stations and the length of the wiring loop. Unlike
the SCSI protocol used to talk to a computer’s disk, tape, and other peripherals, for
example, the Ethernet standard has no provision for time-consuming and complex
bus arbitration. An Ethernet station that has data to send to another station does not face any sort of arbitration. A station simply waits until the transmission
medium appears to be free and then starts transmitting data.
This simple approach to peer-to-peer communication works best on relatively lightly
used network segments where stations looking to transmit data seldom encounter a
busy link. The rationale behind keeping the Ethernet protocol simple suggests that it
is not worth bothering about something that rarely happens anyway. The unhappy
result of having no bus arbitration is that in busier network segments, multiple stations can and do try to access the same communications link at the same time. This
leads to collisions, which are disrupted data transmissions that then must be retried.
A station with data to transmit waits until the wire appears free before attempting to
transmit. Each transmission begins with a characteristic preamble of alternating 0 and
1 bits of prescribed length. (The network interface card discards this preamble so that it is not visible in a Network Monitor packet trace.) The preamble is followed by a 1-byte start delimiter that contains the bit sequence 10101011, designed to distinguish
the preamble from the beginning of the real data to be transmitted.
The station then continues with the transmission, always sending an entire packet, or
frame, of information. Each Ethernet frame begins with the 48-bit destination address,
followed by the 48-bit source address. These 48-bit MAC addresses uniquely identify
the Ethernet source and destination addresses—this is accomplished by giving every
hardware manufacturer a distinct range of addresses that only it can use. These
unique MAC addresses are also called unicast addresses. Ethernet also supports broadcast addresses where the address field is set to binary 1s to indicate that it should be
processed by all LAN cards on the segment. Broadcast messages are used, for example, to pass configuration and control information around the network.
Maximum transmission unit The length of the frame, including the header, is
encoded in the frame header immediately following the addressing fields. For historical reasons, Ethernet frames are limited to no more than 1514 bytes (1518 bytes, if you include the 4-byte frame check sequence at the end of the frame) to keep any one station from monopolizing a
shared data link for too long. Assuming that successive Ethernet, IP, and TCP headers
occupy a minimum of 54 bytes, the data payload in an Ethernet packet is limited to
about 1460 bytes. As the speed of Ethernet links has increased, the small frame size
that the protocol supports has emerged as a serious performance limitation. For
example, CIFS access to remote files must conform to the Ethernet MTU, causing
blocks from large files to be fragmented into multiple packets. This slows down network throughput considerably because each station must wait a predetermined interval before transmitting its next packet. Consequently, some Gigabit Ethernet
implementations across 1 Gb/sec high-speed fiber optic links optionally create so-called jumbo frames. Windows Server 2003 support for the Gigabit Ethernet standard
is discussed in Chapter 6, “Advanced Performance Topics.”
Following the actual data payload, each Ethernet frame is delimited at the end by a
Frame Check Sequence, a 32-bit number that is calculated from the entire frame contents (excluding the preamble) as a cyclic redundancy check (CRC). A receiving station calculates its own version of the CRC as it takes data off the wire and compares it
to the CRC embedded in the frame. If they do not match, it is an error condition and
the frame is rejected.
Collision detection When two (or more) stations have data to transmit and they
both attempt to put data on the wire at the same time, this creates an error condition
called a collision. What happens is that each station independently senses that the
wire is free and begins transmitting its preamble, destination address, source address,
and other header fields, data payload, and CRC. If more than one station attempts to
transmit data on the wire, the sequence of bits from two different frames becomes
hopelessly intermixed. The sequence of bits received at the destination is disrupted,
and, consequently, the frame is rejected.
A sending station detects that a collision has occurred because it also receives a copy
of the disrupted frame, which no longer matches the original. The frame must be long
enough so that the original station can detect the fact that the collision has occurred
before it attempts to transmit its next packet. This key requirement in the protocol
specification determines a minimum sized packet that must be issued. Transmissions
smaller than the minimum size are automatically padded with zeros to reach the
required length.
The latency (or transmission delay) for a maximum extent Ethernet segment determines the minimum packet size that can be sent across an Ethernet network. The
propagation delay for a maximum extent network segment in 10BaseT, for example, is
28.8 µsecs, according to the standards specification. The sending station must send
data to the distant node and back to detect that a collision has occurred. The round
trip time for the maximum extent network is 2 × 28.8 µsecs, or 57.6 µsecs. The sending station must get a complete frame header, data payload, and CRC back to detect
the collision. At a data rate of 10 Mb/sec, a station could expect to send 576 bits, or 72
bytes in 57.6 µsecs. Requiring each station to send at least 72 bytes means that collisions can be detected across a maximum extent network. Ethernet pads messages
smaller than 72 bytes with zeros to achieve the minimum length frames required.
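The minimum frame size falls directly out of this arithmetic, as the following Python sketch shows using the 10BaseT figures from the text:

DATA_RATE_BPS = 10_000_000     # 10 Mb/sec
ONE_WAY_DELAY_USECS = 28.8     # maximum extent 10BaseT segment, per the spec

round_trip_usecs = 2 * ONE_WAY_DELAY_USECS           # 57.6 usecs
min_bits = DATA_RATE_BPS * round_trip_usecs / 1e6    # bits on the wire in that time
print(f"minimum frame: {min_bits:.0f} bits, or {min_bits / 8:.0f} bytes")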
With wiring hubs, most Ethernet segments that are in use today do not approach anywhere near the maximum extent limits. With hubs, maximum distances in the range
of 200–500 meters are typical. At the 512-meter limit of a single segment, Ethernet
latency is closer to 5 µsecs, and with shorter segments the latency is proportionally
less. So you can see that with wiring hubs, Ethernet transmission latency is seldom a
grave performance concern.
Switches help to minimize collisions on a busy network because stations receive only
packets encoded with their source or destination address. Collisions can and still do
occur on switched segments, however, and might even be prevalent when network
traffic all tends to be directed at one or more Windows Server 2003 machines configured on the segment. When there are two network clients that both want to send data
to the same server at the same time, a collision on the link to the server can and will
occur on most types of switches.
Back-off and retry Collisions disrupt the flow of traffic across an Ethernet network
segment. The data Sender A intended to transmit is not received properly at the receiving station C. This causes an error condition that must be corrected. Sender station B, which was also trying to send data to C, likewise detects that a collision has occurred. Both Senders
must resend the frames that were not transmitted correctly.
The thought might occur to you that if Sender A and Sender B both detect a collision
and both try to resend data again at the same time, it is highly likely that the datagrams
will collide again. In fact this is exactly what happens on an Ethernet segment. Following a collision, Ethernet executes its exponential back-off and retry algorithm to try to
avoid potential future collisions. Each station waits a random period of time before
resending data to recover from the collision. If a collision reoccurs the second time
(the probability of another collision on the first retry remains high), each station
doubles the potential delay interval and tries again. If a collision happens again,
each station doubles the potential delay interval again, and so on, until the transmission finally succeeds. As the potential interval between retries lengthens, one of
the stations will gain enough of a staggered start that eventually its transmission
will succeed.
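The back-off algorithm itself is easy to sketch. The following Python fragment simulates the doubling delay window; the slot-time units and the cap on the exponent are illustrative assumptions, not values taken from this text:

import random

def backoff_delay_slots(collisions, max_exponent=10):
    """Truncated binary exponential back-off: after the Nth consecutive
    collision, wait a random number of slot times in [0, 2**N - 1]."""
    exponent = min(collisions, max_exponent)
    return random.randint(0, 2**exponent - 1)

# Two colliding stations choose delays independently; doubling the range
# after each collision makes staggered restarts increasingly likely.
for collisions in range(1, 6):
    print(collisions, backoff_delay_slots(collisions), backoff_delay_slots(collisions))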
In summary, the Ethernet protocol avoids the overhead of a shared bus arbitration
scheme to resolve conflicts when more than one station needs access to the bus. The
rationale is, “Let’s keep things simple.” This approach has much to commend it. As
long as the network is not heavily utilized, there is little reason to worry about bus
contention.
When conflicts do arise, which they inevitably do on busier networks, Ethernet stations detect that the collisions have occurred and attempt to recover by retrying the
transmissions until they succeed. Notice that each station executes the exponential
back-off algorithm independently until the transmissions finally succeed. No master
controller is ever required to intervene to bring order to the environment. Moreover,
no priority scheme is involved in sharing the common transmission medium.
Performance monitoring So long as network utilization remains relatively low, the
performance characteristics of Ethernet are excellent. For a nonswitched LAN, as network utilization begins to increase above 20–30 percent busy with multiple stations
attempting to transmit data on the segment, collisions begin to occur. Because of
retries, utilization of the segment increases sharply, doubling from 30–35 percent
busy to 60–70 percent, as depicted in Figure 1-29.
Figure 1-29 The characteristic bulge in the utilization of a shared Ethernet transmission medium because of collisions (utilization plotted against arrival rate)
Figure 1-29 illustrates the bulge in utilization that can be expected on an Ethernet segment once collisions begin to occur. This characteristic bulge leads many authorities
to recommend that you try to keep the utilization of Ethernet segments below 30–40
percent busy for nonswitched segments. Switched segments can typically sustain
much higher throughput levels without collisions, but remember that most switches
do not eliminate all collisions. Because collision detection and retries are a Network
Interface hardware function, performance monitoring from inside a machine running
Windows Server 2003 cannot detect that collisions and retries are occurring. Only
packets sent and delivered successfully are visible to the network interface measurement software.
Lacking direct measurement data, you must resort to assembling a case for collisions
occurring indirectly. From the standpoint of performance monitoring, a utilization level well below the 100 percent theoretical maximum may represent the effective saturation point of the link, given the way Ethernet behaves. Remember,
however, that the condition that causes collision is contention for the transmission
link. If the only activity on a segment consists of station A sending data to station B
(and station B acknowledging receipt of that data with transmissions back to A) during an application like network backup, there is no contention for the link. Under
those circumstances, you can drive utilization of an Ethernet link to 100 percent without collisions.
Warning
Switched networks provide significant relief from performance problems
related to Ethernet collisions, but they do not eliminate them completely. A switch
provides a dedicated virtual circuit to and from every station on the segment. With a
switch, station A can send data to B while station C sends data to D concurrently without a collision. However, collisions can still occur on a switched network if two stations
both try to send data to a third station concurrently, which is frequently what happens
in typical client/server networking configurations.
You might have access to other network performance statistics from your switches
that are available through RMON or SNMP interfaces that report the rate of collisions
directly.
IP Routing
The Internet Protocol layer, also known as layer 3 (with the physical and data link layers associated with the MAC layer being layers 1 and 2, respectively), is primarily concerned with delivering packets from one location to another. On the sender, the IP layer decides which gateway or router the outgoing packet must be sent to. On the receiver, it is IP that makes sure that the incoming packet has been sent by the expected router or gateway. On intermediate links, IP picks up the incoming packet and then decides which network segment the packet should go out on, and which router
it should be sent to, so that the packet moves closer to its eventual destination. This
technology is called routing. Routing is associated with a bundle of standards that
include IP itself, ICMP, ARP, BGP, and others. This section introduces the key aspects
of IP routing technology that most impact network performance and capacity planning. Note that this section discusses IP-layer concepts in the context of IP version 4,
the dominant version of IP on the Internet today. However, most of the discussion
applies to IP version 6 also.
The basic technology used in IP routing is deceptively simple. What makes IP routing such a difficult topic from a performance perspective is the complicated, interconnected network infrastructures and superstructures that organizations have
erected to manage masses of unscheduled IP traffic. That the Internet works as well
as it does is phenomenal, given the complexity of the underlying network of networks that it supports.
Routing Routing is the process where packets are forwarded from one network segment to the next until they reach their final destination. These network segments can
span organizations, regions, and countries, with the result that IP is used to interconnect a vast worldwide network of computers. IP is the set of routing standards that ties
computers on both private intranets and the public Internet together so that they can
communicate with each other to send mail, messages, files, and other types of digital
information back and forth.
Devices called routers serve as gateways, interconnecting different network segments.
They implement layer 3 packet forwarding. Routers are connected to one or more local
LAN segments and then connected via WAN links to other routers located on external networks. Windows Server 2003 machines can be configured to serve as routers
by enabling the IP Forwarding function. However, the more common practice is to use
dedicated devices that are designed specifically to perform layer 3 switching. Routers
are basically responsible for forwarding packets on to the next hop in their journey
toward their ultimate destination. Routers serve as gateways connecting separate and
distinct network segments. They recognize packets that have arrived at the network
junction that are intended for internal locations. They place these packets on the
LAN, where they circulate until they reach the desired MAC address. Routers also initiate messages (encapsulated as packets, naturally) intended for other routers that are
used to exchange information about routes.
Although the IP layer is responsible for moving packets toward their eventual destination, IP does not guarantee delivery of those packets. Moreover, once a packet is
entrusted to IP for delivery, there is no mechanism within IP to confirm the delivery of
that packet as instructed. IP was designed around a “best effort” service model that is
both unreliable and connectionless. It is the Host-to-Host connection layer above IP that
is responsible for maintaining reliable, in-order delivery of packets. That component,
of course, is TCP.
Chapter 1: Performance Monitoring Overview
113
Being a “best effort” service model, IP works hard to deliver the packets entrusted to
it. Using IP, if there is a serviceable route between two addresses on the Internet, no
matter how convoluted, IP will find it and use it to deliver the datagram payload. As
you can probably imagine, route availability across a large system of interconnected
public and private networks is subject to constant change. For example, it is a good
practice for critical locations on your private network to be accessible by two or more
connections or paths. If one of these links goes down, IP will still be able to deliver
packets through an alternate route.
Determining which path to take among the many possible choices is one of the responsibilities of Internet Protocol layer 3 routers. Undoubtedly, some routes are better
than others because they can deliver traffic faster to a destination, more reliably, or
with less cost. Some routers implement the simple Routing Information Protocol
(RIP), which selects routes primarily based on the number of hops involved. More
powerful routers usually implement the more robust Open Shortest Path First (OSPF)
protocol, which attempts to assess both route availability and performance in making
decisions. The popularity of the public access Internet has recently generated interest
in having routers use policy-oriented Quality of Service (QoS) metrics to select among
packets that arrive from different sources and different applications. An in-depth discussion comparing and contrasting these routing methods is beyond the scope of this
chapter, but it is a good discussion to have with your networking provider.
Routing tables The dynamic aspects of routing create a big problem: specifically,
how to store all that information about route availability and performance and keep it
up-to-date. The Internet consists of thousands and thousands of separate autonomous network segments. They are interconnected in myriad ways. The route from
your workstation to some location like http://www.microsoft.com is not predetermined. There is no way to know beforehand that such a route even exists.
IP solves the problem of how to store information about route availability in an interesting way. The IP internetworking environment does not store a complete set of routing information in any one centralized location that might be either vulnerable to failure or subject to becoming a performance bottleneck.
Instead, information about route availability is distributed across the network, maintained in routing tables stored in individual routers. These routing tables list the specific network addresses to which the individual router can deliver packets. In
addition, there is a default location—usually another router—where the router forwards any packets with an indeterminate destination for further address resolution.
The route command-line utility can be used to display the contents of a machine’s
routing table:
C:\>route print
===========================================================================
Interface List
0x1 ........................... MS TCP Loopback interface
0x2000002 ...00 00 86 38 39 5a ...... 3Com Megahertz LAN + 56K
===========================================================================
===========================================================================
Active Routes:
Network Destination          Netmask          Gateway        Interface  Metric
          0.0.0.0            0.0.0.0      24.10.211.1     24.10.211.47       1
      24.10.211.0      255.255.255.0     24.10.211.47     24.10.211.47       1
     24.10.211.47    255.255.255.255        127.0.0.1        127.0.0.1       1
   24.255.255.255    255.255.255.255     24.10.211.47     24.10.211.47       1
        127.0.0.0          255.0.0.0        127.0.0.1        127.0.0.1       1
    192.168.247.0      255.255.255.0    192.168.247.1    192.168.247.1       1
    192.168.247.1    255.255.255.255        127.0.0.1        127.0.0.1       1
    200.200.200.0      255.255.255.0    200.200.200.1    200.200.200.1       1
    200.200.200.1    255.255.255.255        127.0.0.1        127.0.0.1       1
        224.0.0.0          224.0.0.0    200.200.200.1    200.200.200.1       1
        224.0.0.0          224.0.0.0     24.10.211.47     24.10.211.47       1
        224.0.0.0          224.0.0.0    192.168.247.1    192.168.247.1       1
  255.255.255.255    255.255.255.255    192.168.247.1          0.0.0.0       1
===========================================================================
Persistent Routes:
  None
This sample route table is for a Windows Server 2003 machine at address
24.10.211.47 serving as a router. This table marks addresses within the 24.10.211.0
Class C network range (with a subnet mask of 255.255.255.0) for local delivery. It
also shows two external router connections at locations 200.200.200.1 and
192.168.247.1. Packets intended for IP addresses that this machine has no direct
knowledge of are routed to the 24.10.211.1 gateway by default.
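The forwarding decision this table drives amounts to a masked comparison of the destination address against each entry, preferring the most specific match. A simplified Python sketch of the lookup, using three rows from the sample table above:

import ipaddress

# (network destination, netmask, gateway) rows from the sample table
ROUTES = [
    ("0.0.0.0", "0.0.0.0", "24.10.211.1"),              # default route
    ("24.10.211.0", "255.255.255.0", "24.10.211.47"),
    ("192.168.247.0", "255.255.255.0", "192.168.247.1"),
]

def next_hop(destination):
    """Return the gateway of the longest-prefix (most specific) match."""
    dest = ipaddress.IPv4Address(destination)
    matches = [(ipaddress.IPv4Network(net + "/" + mask), gateway)
               for net, mask, gateway in ROUTES
               if dest in ipaddress.IPv4Network(net + "/" + mask)]
    network, gateway = max(matches, key=lambda match: match[0].prefixlen)
    return gateway

print(next_hop("24.10.211.99"))    # delivered locally via 24.10.211.47
print(next_hop("207.46.155.17"))   # falls through to the default gateway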
The set of all IP addresses that an organization’s routers manage directly defines the
boundaries of what is known in routing as an autonomous system (AS). For the Internet
to work, routers in one autonomous system need to interchange routing information
with the routers they are connected to in other autonomous systems. This is accomplished using the Border Gateway Protocol (BGP). Using BGP, routers exchange information with other routers about the IP addresses that they are capable of delivering
packets to.
As the status of links within some autonomous network configuration changes, the
routers at the borders of the network communicate these changes to the routers they
are attached to at other external networks. Routers use Border Gateway Protocol
(BGP) messages to exchange information about routes with routers they are connected to across autonomous systems. BGP is mainly something that the Internet Service Providers worry about. Naturally, ISP routers maintain extensive routing tables
about the subnetworks the Providers manage.
One of the key ingredients that makes IP routing across a vast network like the Internet work is that each IP packet that routers operate on is self-contained. Each packet
contains all the information that the layer 3 switching devices need to decide where to
deliver the packet and what kind of service it requires. The IP header contains the
addresses of both the sender and the intended receiver. Another feature of IP is that
routers operate on each self-contained IP packet individually. It is quite possible for
one packet sent by A to B to be delivered to its destination following one route, while the next packet is delivered by an entirely different route.
When a router receives a packet across an external link that is destined for delivery
locally, the router is responsible for delivering that packet to the correct station on the
LAN. That means the router sends this packet to the MAC layer interface, plugging in
the correct destination address. (The router leaves the Source address unchanged so
that you can always tell where the packet originated.) The Address Resolution Protocol (ARP) is used to maintain a current list of local IP addresses and their associated
MAC addresses.
Router performance The fact that IP does not guarantee the delivery of packets
does have some interesting performance consequences. When IP networks get congested, routers can either queue packets for later processing or drop the excess load.
By design, most high-performance routers do the latter. They are capable of keeping
only a very small queue of packets. If additional requests to forward packets are
received and the queue is full, most routers simply drop incoming packets. Dropping
packets is acceptable behavior in IP, and even something that is to be expected. After
all, IP never guaranteed that it would deliver those packets in the first place. The protocol is designed to make only its “best effort” to deliver them.
The justification for this strategy is directly related to performance. The original
designers of the Internet understood that persistent bottlenecks in the Internet infrastructure would exist whenever some customer’s equipment plugged into the network was hopelessly undersized. In fact, given the range of organizations that could
connect to the Internet, it is inevitable that some routes are not adequately sized.
Therefore, the Internet packet delivery mechanism needs to be resilient in the face of
persistent, mismatched speeds of some of the components of a route.
In the face of inevitable speed mismatches, what degree of queuing should a router
serving as the gateway between two networks attached to the Internet support? Consider that queuing at an undersize component during peak loads could lead to
unbounded queuing delays whenever requests started to arrive faster than the router
could service them. It is also inevitable that whatever queue depth a bottlenecked
router was designed to support could readily be exceeded at some point. When a
router finally exhausts the buffer space it has available to queue incoming packets, it
would be necessary to discard packets anyway. With this basic understanding of the
fundamental problem in mind, it makes sense to start to discard packets before the
queue of deferred packets grows large and begins, for example, to require extensive
resources to manage.
Routers have a specific processing capacity, rated in terms of packets/second. When
packets arrive at a router faster than the router can deliver them, excess packets are
dropped. Because most routers are designed to support only minimal levels of queuing, the response times for the packets they deliver is very consistent, never subject to
degradation when the network is busy. The price that is paid for this consistent
response time is that some packets might not be delivered at all.
Obviously, you need to know when the rate of network traffic exceeds capacity
because that is when routers begin dropping packets. You can then upgrade these
routers or replace them with faster units. Alternatively, you might need to add network capacity in the form of both additional routes and routers. Performance statistics
are available from most routers using either SNMP or RMON interfaces. In addition,
some routers return ICMP Source Quench messages when they reach saturation and
need to begin dropping packets.
Tip
The IP statistics that are available in System Monitor count packets received and
processed and are mainly useful for network capacity planning. These include IP\Datagrams Received/sec, IP\Datagrams Sent/sec, and the total IP\Datagrams/sec.
Counters for both the IP version 4 (IPv4) and IP version 6 (IPv6) versions of the protocol are available.
Obviously, dropping packets at a busy router has dire consequences for someone
somewhere, namely the originator of the request that failed. Indeed it does. Although
IP is not concerned about what happens to a few packets here and there that might
have gotten dropped, this is a concern at the next higher level in the protocol stack,
namely in the TCP Host-to-Host connection layer 4. TCP will eventually notice that a
packet is missing and attempt some form of error recovery, which involves resending
the original packet. If TCP cannot recover from the error, it will eventually notify the
application that issued the request. This leads to the familiar “Request Timed Out”
error message in your Web browser that prods you into retrying the entire request.
The structural problem of having an overloaded component on the internetwork
somewhere is something that also must be systematically addressed. In networking
design, this is known as the problem of flow control, for example, what to do about a
sender computing system that is overloading some underpowered router installed on
a customer’s premises. Again, flow control is not a concern at the IP level, but at the
next higher level, TCP does provide a suitable flow control and a congestion control
mechanism, which is discussed later.
Packet headers IP packet headers contain the familiar source and destination IP
addresses, fragmentation instructions, and a hop count called TTL, or Time To Live.
Three IP header fields provide the instructions used in packet fragmentation and reassembly. These are the Identification, Flags, and Offset fields that make up the second
32-bit word in the IP packet header. The IP layer at the packet’s destination is responsible for reassembling the message from a series of fragmented packets and returning
data to the application in its original format.
The Time To Live (TTL) field is used to ensure that packets cannot circulate from
router to router around the network forever. Each router that operates on a packet
decrements the TTL field before sending it on its way. If a router finds a packet with a
TTL value of zero, that packet is discarded on the assumption that it is circulating in
an infinite loop.
Note As originally specified, the 1-byte TTL field in the IP header was supposed to
represent the number of seconds a packet could survive on the Internet before it was
discarded. Each IP router in the destination path was supposed to subtract the number
of seconds that the packet resided at the router. But because packets tended to spend
most of their time in transit between routers connected by long distance WAN links,
this scheme proved unworkable. The TTL field was then reinterpreted to mean the
number of path components that a packet travels on the way to its destination. Today,
when the TTL hop count gets to zero, the packet is discarded.
Subtracting the final TTL value observed at the destination from the initial value
yields the hop count, which is the number of links traversed before the packet reached
its final destination. TTL is a 1-byte field with a maximum possible link count of 255.
Windows Server 2003 sets TTL to 128 by default. Because IP packets typically can
span the globe in less than 20 hops, a default TTL value of 128 is generous.
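Because most TCP/IP stacks start the TTL at one of a few conventional values (32, 64, 128, or 255), a receiver can infer the hop count from the TTL it observes. A hedged Python sketch of that inference:

COMMON_INITIAL_TTLS = (32, 64, 128, 255)

def inferred_hop_count(observed_ttl):
    """Guess the hops traveled by assuming the sender used the smallest
    conventional initial TTL at or above the observed value."""
    initial = min(ttl for ttl in COMMON_INITIAL_TTLS if ttl >= observed_ttl)
    return initial - observed_ttl

print(inferred_hop_count(116))   # a Windows sender (TTL 128), 12 hops away
print(inferred_hop_count(243))   # a TTL-255 sender, also 12 hops away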
ICMP The Internet Control Message Protocol (ICMP) is used to generate informational and error messages on behalf of IP. Even though ICMP does not serve to make IP
reliable, it certainly makes IP easier to manage. In addition to its role in generating
error messages, ICMP messages are used as the basis for several interesting utilities,
including ping and tracert.
Ping is a standard command-line utility that utilizes ICMP messaging. The most common use of the ping command is simply to verify that one IP address can be reached
from another. The ping command sends an ICMP type 0 Echo Reply message and
expects a type 8 Echo Request message in reply. The ping utility calculates the round
trip time for the request and the TTL value for a one-way trip. (Note that ping sets TTL
to a value of 255 initially.) By default, ping sends Echo Reply messages four times so
that you can see representative RTT and hop count values. Because different packets
can arrive at the destination IP address through different routes, it is not unusual to
observe variability in the four measurements.
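Scripting ping is a convenient way to collect RTT samples over time. The sketch below shells out to the Windows ping utility and scrapes the reported round trip times; it assumes English-language output of the form time=54ms or time<1ms, so the pattern would need adjusting for other locales:

import re
import subprocess

def ping_rtts(host, count=4):
    """Run ping and return the RTTs (in milliseconds) that it reports."""
    output = subprocess.run(["ping", "-n", str(count), host],
                            capture_output=True, text=True).stdout
    return [int(ms) for ms in re.findall(r"time[=<](\d+)ms", output)]

print("RTT samples (ms):", ping_rtts("www.msn.com"))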
A more elaborate diagnostic tool is the tracert utility that determines the complete
path to the destination, router by router. Typical output from the tracert command follows:
C:\>tracert 207.46.155.17

Tracing route to 207.46.155.17 over a maximum of 30 hops

  1     1 ms     1 ms     1 ms  192.168.0.101
  2    14 ms    15 ms    13 ms  12-208-96-1.client.attbi.com [12.208.96.1]
  3    13 ms    13 ms    11 ms  12.244.104.97
  4    13 ms    15 ms    16 ms  12.244.72.230
  5    17 ms    15 ms    14 ms  gbr6-p90.cgcil.ip.att.net [12.123.6.6]
  6    19 ms    15 ms    13 ms  tbr2-p013601.cgcil.ip.att.net [12.122.11.61]
  7    56 ms    53 ms    53 ms  gbr4-p20.st6wa.ip.att.net [12.122.10.62]
  8    58 ms    59 ms    56 ms  gar1-p370.stwwa.ip.att.net [12.123.203.177]
  9    56 ms    59 ms    58 ms  12.127.70.6
 10    59 ms    57 ms    58 ms  207.46.33.225
 11    59 ms    60 ms    57 ms  207.46.36.66
 12    54 ms    57 ms    56 ms  207.46.155.17

Trace complete.
The way the tracert command works is that it begins by sending ICMP Echo Request (type 8) messages with a TTL of 1, and then increments TTL until the message is successfully received at the destination. In this fashion, it traces at least one likely route of a
packet and calculates the cumulative amount of time it took to reach each intermediate link. (This is why tracert sometimes reports that it takes less time to travel further
along the route—the response times displayed represent different ICMP packets that
were issued.) The tracert command also issues a DNS reverse query to determine the
DNS name of each node in the path.
TCP
The Transmission Control Protocol (TCP) is the Layer 4 protocol that provides a reliable, peer-to-peer delivery service. TCP sets up point-to-point, connection-oriented
sessions to guarantee reliable in-order delivery of application transmission requests.
TCP sessions are full duplex, capable of sending and receiving data between two locations concurrently. This section reviews the way the TCP protocol works. The most
important TCP tuning parameters are discussed in Chapter 5, “Performance Troubleshooting.”
TCP sessions, or connections, are application-oriented. TCP port numbers uniquely
identify applications that plug into TCP. Familiar Internet applications like HTTP, FTP,
SMTP, and Telnet all plug into TCP sockets. Microsoft networking applications like
DCOM, RPC, and the SMB server and redirector functions also utilize TCP. They use
the NBT interface that allows NetBIOS services to run over TCP/IP.
TCP connection-oriented behavior plays an important role in network performance
because the TCP layer is responsible for both flow control and congestion control.
The flow control problem was cited earlier: how to keep a powerful sender from
overwhelming an undersized link. Congestion control deals with a performance
problem that arises in IP routing where busy routers drop packets instead of queuing them. The TCP layer that is responsible for in-order, reliable receipt of all data,
detects that packets are being dropped. Because the likely cause of packets being
dropped is router congestion, TCP senders recognize this and back off to lower
transmission rates.
TCP advertises a sliding window that limits the amount of data that one host application can send to another without receiving an Acknowledgement. Once the Advertised Window is full, the sender must wait for an ACK before it can send any
additional packets. This is the flow control mechanism that TCP uses to ensure that a
powerful sender does not overwhelm the limited capacity of a receiver.
Being unable to measure network link capacity directly, TCP instead detects that network congestion is occurring and backs off sharply from sending data. TCP recognizes two congestion signals: a window-full condition indicating that the receiver is
backed up; and an unacknowledged packet that is presumed lost because of an overloaded router. In both cases, the TCP sender reacts by reducing its network transmission rates.
These two important TCP performance-oriented functions are tied to the round trip
time (RTT) of connections, which TCP calculates. Together, the RTT and the size of
the TCP sliding data window determine the throughput capability of a TCP connection. RTT also figures into TCP congestion control. If a sender fails to receive a timely
Acknowledgement message that a packet was delivered successfully, TCP ultimately
retransmits the datagram. The amount of time TCP waits for an Acknowledgement to
be delivered before it retransmits the data is based on the connection RTT. Important
TCP tuning options include setting the size of the Advertised Window and the
method used to calculate RTT.
Session connections Before any data can be transferred between two TCP peers,
these peers first must establish a connection. In the setup phase of a connection, two
TCP peers go through a handshaking process where they exchange information about
each other. Because TCP cares about delivering data in the proper sequence, the hosts
initiating a session need to establish common sequence numbers to use when they
want to begin transferring data. The peers also negotiate to set various options associated with the session, including establishing the size of the data transfer window, the
use of selective acknowledgement (SACK), the maximum segment size the parties can
use to send data, and other high-speed options like timestamps and window scaling.
To initiate a connection, TCP peer 1 sends a Synchronize Sequence Number (SYN)
message that contains a starting sequence number, a designated port number to send
the reply to, and the various proposed option settings. This initial message is posted
to a standard application port destination at the receiver’s IP address—for example,
Port 80 for an HTTP session between a Web browser and a Web server. Then TCP peer
2 acknowledges the original SYN message with a SYN-ACK, which returns the
receiver’s starting sequence number to identify the packets that it will initiate. Peer 2
also replies with its AdvertisedWindow size recommendation. Peer 1 naturally must
acknowledge receipt of Peer 2’s follow-up SYN-ACK. When TCP Peer 2 receives Peer
1’s acknowledgement (ACK) message referencing its SYN-ACK message number and
the agreed-upon AdvertisedWindow size, the session is established. Frames 6, 8, and 9 in the packet trace illustrated in Figure 1-28 are a typical sequence of messages exchanged by two hosts to establish a TCP connection.
Byte sequence numbers The TCP packet header references two 32-bit sequence
numbers, a Sequence Number and the Acknowledgement field. These are the relative
byte number offsets of the current transmission streams that are being sent back and
forth between the two host applications that are in session. At the start of each TCP
session as part of establishing the connection, the peers exchange initial byte
sequence numbers in their respective SYN messages. Because TCP supports full
duplex connections, both host applications send SYN messages initially.
The Sequence Number field in the header is the relative byte offset of the first data
byte in the current transmission. This Sequence Number field allows the receiver to
slot an IP packet received out of order into the correct sequence. Because the
sequence numbers are 32 bits wide, it is safe for TCP to assume any packets received
with identical sequence numbers are duplicates because of retransmission. In case
this assumption is not true for certain high-latency high-speed connections, the endpoints should turn on the timestamp option in TCP. Duplicates can safely be discarded by the receiver.
The Acknowledgment field acknowledges receipt of all bytes up to (but not including)
the current byte offset. It is interpreted as the Next Byte a TCP peer expects to receive
in this session. The receiver matches the Acknowledgement ID against the SequenceNumber field in the next message received. If the SequenceNumber is higher, the current message block is interpreted as being out of sequence, and the Acknowledgement
field of the ACK message returned is unchanged. The Acknowledgement field is
cumulative, specifically acknowledging receipt of all bytes from the Initial Sequence
Number (ISN) +1 to the current Acknowledgement byte number −1. A receiver can
acknowledge an out-of-sequence packet only when the SACK (Selective Acknowledgement) option is enabled.
Sliding window TCP provides a flow control mechanism called the sliding window,
which determines the maximum amount of data a peer can transmit before receiving
a specific acknowledgement from the receiver. This mechanism can be viewed as a window that slides forward across the byte transmission stream. The AdvertisedWindow
is the maximum size of the sliding window. The current Send Window is the AdvertisedWindow minus any transmitted bytes that the receiver has not yet Acknowledged. Once the window is filled and no Acknowledgement is forthcoming, the
sender is forced to wait before sending any more data on that connection.
The AdvertisedWindow field in the TCP header is 16 bits wide, making 64 KB the largest possible window size. However, an optional Window Scale factor can also be specified, which is used to scale up the AdvertisedWindow field to support larger
windows. The combination of the two fields allows TCP to support a sliding data window up to 1 GB wide.
TCP’s sliding window mechanism has network capacity planning implications.
Together, the Advertised Window size and the RTT establish an upper limit to the
effective throughput of a TCP session. The default Advertised Window that Windows Server 2003 uses for Ethernet connections is about 17,520 bytes. A TCP session can send only one 17.5-KB window’s worth of data before stopping to wait
for an ACK from the receiver. If the RTT of the session is 100 milliseconds (ms) for a
long distance connection, the TCP session will be able to send a maximum of only
1/RTT windows per second, in this case, just 10 windows per second. The maximum
throughput of that connection is effectively limited to
Max throughput = AdvertisedWindow / RTT
which in this case is 175 KB/sec, independent of the link bandwidth.
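The following Python sketch reproduces this arithmetic; given the corresponding window sizes and link speeds, it generates the values shown in Tables 1-6 and 1-7 below:

def max_tcp_throughput(window_bytes, rtt_ms, link_bytes_per_sec):
    """A connection moves at most one window per round trip, and never
    more than the link itself can carry."""
    window_limited = window_bytes / (rtt_ms / 1000)
    return min(window_limited, link_bytes_per_sec)

# Default ~17,500-byte window on a 100 Mb Ethernet link (~12 MB/sec):
for rtt_ms in (0.1, 1, 10, 100):
    rate = max_tcp_throughput(17_500, rtt_ms, 12_000_000)
    print(f"RTT {rtt_ms:6.1f} ms -> {rate:12,.0f} bytes/sec")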
Consider a Fast Ethernet link at 100 Mb/sec, where the effective throughput capacity of
the link is roughly 12 MB/sec (12,000,000 bytes/sec). RTT and the default Windows
Server 2003 TCP Advertised Window begin to limit the effective capacity of a Fast Ethernet link once RTT increases above 1 millisecond, as illustrated in Table 1-6.
Table 1-6 How Various RTT Values Reduce the Effective Capacity of a 100BaseT Link

RTT (ms)    Max Windows/sec    Max Throughput/sec
0.001             1,000,000            12,000,000
0.010               100,000            12,000,000
0.100                10,000            12,000,000
1                     1,000            12,000,000
10                      100             1,750,000
100                      10               175,000
1000                      1                17,500
Because the latency of a LAN connection is normally less than a millisecond (which is
what ping on a LAN will tell you), RTT will not limit the effective capacity of a LAN
session. However, it will have a serious impact on the capacity of a long distance connection to a remote Web server, where you can expect the RTT to be 10–100 milliseconds, depending on the distances involved. Note that this is the effective link capacity
of a single TCP connection. On IIS Web servers with many connections active concurrently, the communications link can still saturate.
Windows Server 2003 defaults to using a larger window of about 65,535 bytes for a
Gigabit Ethernet link. Assuming a window of exactly 65,000 bytes leads to the behavior illustrated in Table 1-7.
Table 1-7 How Various RTT Values Reduce the Effective Capacity of a 1000BaseT Link

RTT (ms)    Max Windows/sec    Max Throughput/sec
0.001             1,000,000           125,000,000
0.010               100,000           125,000,000
0.100                10,000           125,000,000
1                     1,000            65,000,000
10                      100             6,500,000
100                      10               650,000
1000                      1                65,000
An Advertised Window value larger than 65535 bytes can be set for Gigabit Ethernet
links using the Window scaling option. This is one of the networking performance
options discussed in more detail in Chapter 6, “Advanced Performance Topics.”
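Window scaling exists because the window required to keep a link busy is its bandwidth-delay product: link capacity multiplied by the round trip time. A small Python sketch of that calculation, with illustrative numbers:

def window_needed(link_bytes_per_sec, rtt_ms):
    """Bandwidth-delay product: the bytes that must be in flight to keep
    the pipe full for one full round trip."""
    return link_bytes_per_sec * rtt_ms / 1000

# Gigabit Ethernet (~125 MB/sec) at a 10-ms round trip time:
print(f"{window_needed(125_000_000, 10):,.0f} bytes")   # 1,250,000 bytes
# Far larger than the 65,535-byte maximum of the unscaled 16-bit window field.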
Congestion window As part of congestion control, a TCP sender paces its rate of
data transmission, slowly increasing it until a congestion signal is recognized. When a
congestion signal is received, the sender backs off sharply. TCP uses Additive increase/
Multiplicative decrease to open and close the Send Window. It starts a session by sending two packets at a time and waiting for an ACK. TCP slowly increases its connection
Send Window one packet at a time until it receives a congestion signal. When it recognizes a congestion signal, TCP cuts the current Send Window in half and then
resumes additive increase. The operation of these two congestion mechanisms produces a Send Window that tends to oscillate, as illustrated in Figure 1-30, reducing
the effective capacity of a TCP connection accordingly.
Figure 1-30 The TCP congestion window (slow start, additive increase, multiplicative decrease) reducing effective network capacity
The impact of the TCP congestion window on effective network capacity is explored in
more detail in Chapter 5, “Performance Troubleshooting.”
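The sawtooth in Figure 1-30 is easy to mimic. This toy Python simulation applies additive increase on every round trip and a multiplicative halving at a few hypothetical congestion events; the constants are illustrative and are not a model of the Windows TCP stack:

MSS = 1460                       # bytes added per round trip (additive increase)
CONGESTION_RTTS = {20, 35, 60}   # hypothetical round trips at which loss is signaled

window = 2 * MSS                 # start by sending two packets at a time
history = []
for rtt in range(80):
    history.append(window)
    if rtt in CONGESTION_RTTS:
        window = max(window // 2, MSS)    # multiplicative decrease
    else:
        window += MSS                     # additive increase

print(f"peak window {max(history):,} bytes, trough {min(history):,} bytes")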
Summary
This chapter introduced the performance monitoring concepts that are used throughout the remaining chapters of this book to describe the scalability and provisioning of
Windows Server 2003 machines for optimal performance. It provided definitions for
key computer performance concepts, including utilization, response time, service
time, and queue time. It then discussed several mathematical relations from the queuing models that are often applied to problems related to computer capacity planning,
including the Utilization Law and Little’s Law. The insights of mathematical Queuing
Theory imply that it is difficult to balance efficient utilization of hardware resources
with optimal response time. This tension between these two goals arises because, as
utilization at a resource increases, so does the potential for service requests to
encounter queue time delays at a busy device.
This chapter also discussed an approach to computer performance and tuning that is
commonly known as bottleneck analysis. Bottleneck analysis decomposes a response
time–oriented transaction into a series of service requests to computer subcomponents such as processors, disks, and network adaptors, that are interconnected in a
network of servers and their queues. Bottleneck analysis then seeks to determine
which congested resource adds the largest amount of queue time delay to the transaction as it traverses this set of interconnected devices. Once they have been identified,
bottlenecks can be relieved by adding processing capacity, by spreading the load over
multiple servers, or by optimizing the application’s use of the saturated resource in a
variety of straightforward ways.
The bulk of this chapter discussed the key aspects of computer systems architecture
that are most important to performance and scalability. This included a discussion of
the basic performance capabilities of the processor, memory, disk, and networking
hardware that is most widely in use today. The key role that the Windows Server 2003
operating system plays in supporting these devices was also highlighted, including a
discussion of the basic physical and virtual memory management algorithms that the
operating system implements. The techniques used by the operating system to gather
performance data on the utilization of these resources were also discussed to provide
insight into the manner in which the key measurement data that supports performance troubleshooting and capacity planning is derived.
Chapter 2
Performance Monitoring Tools
In this chapter:
Summary of Monitoring Tools. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
Performance Monitoring Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
System Monitor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
Task Manager . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
Automated Performance Monitoring. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
Managing Performance Logs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
Windows Performance Monitoring Architecture . . . . . . . . . . . . . . . . . . . 207
Event Tracing for Windows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212
Alerts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
Windows System Resource Manager . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
Network Monitor. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
Microsoft® Windows Server™ 2003 provides a comprehensive set of tools to help
with the collection of useful performance data. Four general classes of tools are
available: performance statistics gathering and reporting tools, event tracing tools,
load-generating tools, and administration tools. The statistical tools that you will use regularly are the Performance Monitor and Task Manager, but you will have occasion to use other, more specialized statistical tools to investigate specific performance problems.
Event tracing tools gather data about key, predefined system and application events.
The event traces that you can gather document the sequence of events that take place
on your Windows Server 2003 machine in great detail. Event tracing reports can be
extremely useful in diagnosing performance problems, and they can also be used to
augment performance management reporting in some critical areas.
This chapter discusses the use of these tools to gather performance statistics and diagnostic performance events. Specific performance statistics that should be gathered are discussed in detail in Chapter 3, “Measuring Server Performance.” Performance monitoring procedures designed to support both problem diagnosis and longer-term capacity planning are described in Chapter 4, “Performance Monitoring Procedures.”
Chapter 5, “Performance Troubleshooting,” and Chapter 6, “Advanced Performance
Topics,” discuss the role that other key diagnostic tools can play in troubleshooting
some of the difficult performance problems that you can encounter.
In addition to these essential performance monitoring tools, load testing should be
used in conjunction with capacity planning to determine the capacity limits of your
applications and hardware. How to use application load testing tools is beyond the
scope of this chapter.
Summary of Monitoring Tools
There are three primary sources for the monitoring and diagnostic tools that are available for Windows Server 2003:
■ Windows Server 2003 tools installed as part of the operating system
■ The Windows Server 2003 Support Tools from the operating system installation CD
■ The Windows Server 2003 Resource Kit tools
This chapter is mainly concerned with the monitoring and diagnostic tools that are automatically installed alongside the operating system.
Performance Statistics
The first group of tools to be discussed gathers and displays statistical data on an
interval basis. These tools use a variety of sampling techniques to generate interval
performance monitoring data that is extremely useful in diagnosing performance
problems. The statistical tools in Windows Server 2003 are designed to be efficient
enough that you can run them continuously with minimal impact. Using the tools
that are supplied with Windows Server 2003, you should be able to establish automated data collection procedures. The performance statistics you gather using the
Performance Monitor can also be summarized over longer periods of time to assist
you in capacity planning and forecasting. Specific procedures designed to help you
accomplish this are described in Chapter 5 of this book.
You will find that the same set of performance counters described here are available in
many other tools. Other applications that access the same performance statistics
include the Microsoft Operations Manager (MOM) as well as applications that have
been developed by third parties. All these applications that gather Windows Server
2003 performance measurements share a common measurement interface—a performance monitoring application programming interface (API). The performance monitoring API is the common source of all the performance statistics these tools gather.
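A quick way to see exactly which counters the performance monitoring API exposes on a particular machine is to enumerate them from the command line with Typeperf, a tool described later in this chapter. A minimal sketch (the PhysicalDisk argument is simply an example object):

rem List the counter paths for all installed performance objects
typeperf -q

rem List counter paths together with their instances for one object
typeperf -qx PhysicalDisk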
Event Traces
A comprehensive trace facility called Microsoft Event Tracing for Windows (ETW)
gathers detailed event traces from operating system providers. Using ETW, you can
determine precisely when context switches, file operations, network commands, page
faults, or other system events occur. Event trace providers are also available for many
server applications, including File Server, Internet Information Services (IIS), and
Active Directory. ETW traces capture a complete sequence of events that allow you to
reconstruct precisely what is occurring on your Windows Server 2003 machine and
when. Event traces can also illuminate key aspects of application performance for
those server applications like IIS and Active Directory that have also been instrumented. Event traces are one of the most important tools available for diagnosing performance problems.
Unlike statistical tools, event tracing is not designed to be executed continuously or to run unattended for long periods of time. Depending on what events you are tracing, ETW traces can easily generate significant overhead, to the extent that they might interfere with normal systems operation. Consequently, event traces are a diagnostic tool normally reserved for gathering data about performance problems that require more detailed information than statistical tools can provide. Also, there is no easy way to summarize the contents of an event trace log, although standard reports are provided, and it is possible to generate very large files in a relatively short time period. So long as you are careful about the way you use them, however, event traces are a diagnostic tool that you will find invaluable in a variety of circumstances.
Another important trace tool is Network Monitor. With Network Monitor, you can determine the precise sequence of events that occurs when your computer communicates with the outside world. Network Monitor traces are often used to diagnose network connectivity and performance problems, but they are also useful for documenting the baseline level at which your systems are operating so that you can recognize how your systems are growing and changing. Using Network Monitor to diagnose network performance problems is discussed in Chapter 5, “Performance Troubleshooting,” in this book.
Finally, one more event-oriented administrative tool should be mentioned in this context. The event logging facility of the operating system can often play an important
role in diagnosing problems. Using Event Viewer, you can access event logs that
record a variety of events that can be meaningful when you are analyzing a performance problem. For example, one of the easiest ways to determine when processes
begin and end in Windows Server 2003 is to enable audit tracking of processes and
then examine process start and end events in the Security log.
Load Generating and Testing
Load-generating tools of all kinds are useful for stress testing specific hardware and software configurations. Windows Server 2003 includes several load-testing tools, including some for specific applications like Web-based transaction processing and Microsoft Exchange. Load-testing tools simulate the behavior of representative application users. As you increase the number of simulated users, you can also gather performance statistics to determine the impact on system resource utilization. These tools help you determine the capacity limits of the hardware you have selected because they allow you to measure application response time as the number of users increases. As discussed in Chapter 1, “Performance Monitoring Overview,” application response time can be expected to increase as a function of load, and these tools allow you to characterize that nonlinear function very accurately for specific application workloads.
Load testing your workload precisely can become very complicated. A key consideration with load-testing tools is developing an artificial workload that parallels the real-life behavior of the application you are interested in stress testing. The fact that a particular hardware or software bottleneck arises under load might be interesting, but unless the simulated workload bears some relationship to the real workload, the test results might not be very applicable. Further discussion of the use of load-testing tools is beyond the scope of this chapter.
Administrative Controls
Finally, there are tools that provide system administrators with controls for managing complex Windows Server 2003 environments. Administrative tools are especially useful when you have two or more major applications running on the same server that are vying for the same, potentially overloaded resources. The Windows System Resource Manager (WSRM) is the most comprehensive administrative tool available for the Windows Server 2003 environment, but other administrative tools can also prove helpful. WSRM is discussed briefly in Chapter 6, “Advanced Performance Topics,” in this book.
More Info Complete documentation about the use of WSRM is available at http://www.microsoft.com/windowsserver2003/technologies/management/wsrm/default.mspx.
Required Security for Tool Usage
Many monitoring tools work only if you have the appropriate security access. In previous versions of the operating system, access was controlled by setting security on the appropriate registry keys. Windows Server 2003 comes with two predefined security groups—Performance Monitor Users and Performance Log Users—that already have the prerequisite security rights for accessing the registry keys required by Performance Monitor.
You can add a designated support person to the appropriate performance security group by using the Active Directory Users and Computers tool when working in a domain. These two security groups are found in the Builtin container for the domain and are the following:
■ Performance Monitor Users To allow designated support personnel to remotely view system performance through the System Monitor tool, make each person a member of the Performance Monitor Users group.
■ Performance Log Users To allow designated support personnel to remotely configure and use the Performance Logs and Alerts tool, make each person a member of the Performance Log Users group.
More Info For information about how to add or remove users from groups, see “Changing group memberships” in the Windows Server 2003 Help.
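When you prefer to script these group changes, or when the server is not a domain member, membership in the built-in local groups can also be adjusted from the command line. A minimal sketch, using a hypothetical account named CONTOSO\jsmith:

rem Grant remote System Monitor access
net localgroup "Performance Monitor Users" CONTOSO\jsmith /add

rem Grant access to configure Performance Logs and Alerts
net localgroup "Performance Log Users" CONTOSO\jsmith /add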
Table 2-1 contains a list of Windows Server 2003 operating system tools for monitoring performance.

More Info For more information about operating system tools, in Help and Support Center for Microsoft® Windows Server™ 2003, click Tools, and then click Command-Line Reference A–Z.
Table 2-1 Performance-Related Operating System Tools

Freedisk.exe: Checks for a specified amount of free disk space, returning a 0 if there is enough space for an operation and a 1 if there isn’t. If no values are specified, the tool tells you how much hard disk space you have left. Useful when performance monitoring disk usage.

Lodctr.exe: Loads performance counters. Especially useful for saving and restoring a set of registry-enabled performance counters.

Logman.exe: Powerful command-line tool for collecting event trace and performance information in log files. Useful for scripting performance monitoring. Using Logman is discussed later in this chapter. Comprehensive performance monitoring procedures that incorporate Logman are described in Chapter 4, “Performance Monitoring Procedures,” later in this book.

Msinfo32.exe: Provides information about the resources used by your system. Useful when you require a snapshot of information about processes and the resources they are consuming.

Network Monitor: Comprehensive graphical tool for monitoring network traffic to and from the local server. Must be installed manually after the base operating system is loaded. Use of Network Monitor is discussed in Chapter 5, “Performance Troubleshooting,” later in this book.

Performance Logs and Alerts: Creates a manually defined set of logs that can be used as baselines for your servers. The Alerts component allows you to take a specified action when specific conditions are encountered. Using Performance Logs and Alerts is discussed later in this chapter.

Relog.exe: Command-line tool that creates new performance logs from existing log files by varying the sample rate. Using Relog is discussed later in this chapter. Comprehensive performance monitoring procedures that incorporate Relog are described in Chapter 4, “Performance Monitoring Procedures,” later in this book.

Systeminfo.exe: Provides detailed support information about computers.

System Monitor: Main graphical tool used for monitoring system performance. Using System Monitor is discussed extensively throughout this book.

Taskkill.exe: Kills a specified process on the specified computer.

Tasklist.exe: Displays the process ID and memory information about all running processes.

Task Manager: Graphical tool that offers an immediate overview of the system and network performance of the local server only. Using Task Manager is discussed later in this chapter.

Tracerpt.exe: Command-line tool useful for converting binary event trace log files into a report or comma-separated value (CSV) files for importing into spreadsheet products such as Microsoft Excel. Using Tracerpt is discussed later in this chapter. Using Tracerpt in conjunction with troubleshooting performance problems is discussed in Chapter 5, “Performance Troubleshooting.”

Typeperf.exe: Command-line tool that writes performance data to a log file. Useful for automating the performance monitoring process. Using Typeperf is discussed later in this chapter. Comprehensive performance monitoring procedures that incorporate Typeperf are described in Chapter 4, “Performance Monitoring Procedures.”

Unlodctr.exe: Unloads performance counters. Especially useful for restoring a set of registry-enabled performance counters.
Table 2-2 contains a list of Windows Server 2003 Support Tools for monitoring performance. For more information about Windows Support Tools, in Help and Support Center for Microsoft® Windows Server™ 2003, click Tools, and then click Windows Support Tools.
Table 2-2 Performance-Related Support Tools

Depends.exe: The Dependency Walker tool scans any 32-bit or 64-bit Windows module (including .exe, .dll, .ocx, and .sys, among others) and builds a hierarchical tree diagram of all dependent modules. Useful when knowledge of all files used by a monitored process is required.

Devcon.exe: The Device Configuration Utility displays all device configuration information and the current status of each device. Useful when monitoring hardware performance.

Diruse.exe: Directory Usage scans a directory tree and reports the amount of space used by each user. Useful when tracking disk space issues.

Exctrlst.exe: Extensible Counter List is a graphical tool that displays all counter .dll files that are running and provides the capability to disable them. Using Exctrlst is discussed later in this chapter.

Health_chk.cmd: This script uses Ntfrsutl.exe to gather data from the File Replication service (FRS) on the target computer for later analysis. Useful when scripting remote monitoring.

Memsnap.exe: This memory-profiling tool takes a snapshot of the memory resources being consumed by all running processes and writes this information to a log file. Useful when performance monitoring a system’s memory.

Netcap.exe: Command-line sniffer tool used to capture network packets. Useful when monitoring network performance.

Poolmon.exe: The Memory Pool Monitor tool monitors memory tags, including total paged and nonpaged pool bytes. Poolmon is often used to help detect memory leaks. Using Poolmon is discussed in Chapter 5, “Performance Troubleshooting.”

Pviewer.exe: Process Viewer is a graphical tool that displays information about a running process and allows you to stop (kill) processes and change process priority.

Replmon.exe: The Replication Monitor tool enables administrators to monitor Active Directory replication, synchronization, and topology.
Table 2-3 contains a list of Windows Server 2003 Resource Kit tools for monitoring performance.

More Info For more information about Resource Kit tools, in Help and Support Center for Microsoft Windows Server 2003, click Tools, and then click Windows Resource Kit Tools. You can also download these tools directly from the Microsoft Download Center by going to http://www.microsoft.com/downloads and searching for “Windows Server 2003 Resource Kit Tools”.
Table 2-3 Performance-Related Windows Resource Kit Tools

Adlb.exe: Active Directory Load Balancing tool that balances the load imposed by Active Directory connection objects across multiple servers.

Checkrepl.vbs: Script to monitor replication and enumerate the replication topology for a given domain controller. Useful when monitoring specific network performance.

Clearmem.exe: Forces pages out of memory. Useful when testing and tracking memory performance issues.

Consume.exe: Consumes resources for stress testing performance, such as low-memory situations. The resources that can be appropriated include physical memory, page file memory, disk space, CPU time, and kernel pool memory. Examples of uses of Consume.exe are found in Chapter 5, “Performance Troubleshooting.”

Custreasonedit.exe: Command-line and GUI tool that allows users to add, modify, and delete custom reasons used by the Shutdown Event Tracker on the Windows Server 2003 operating system.

DH.exe: Display Heap shows information about heap usage in a User mode process, or about pool usage in Kernel mode memory. A heap is a region of one or more pages that can be subdivided and allocated in smaller chunks. This is normally done by the heap manager, whose job is to allocate and deallocate variable amounts of memory.

Empty.exe: Frees the working set memory of a specified task or process.

Intfiltr.exe: The Interrupt-Affinity Filter (IntFiltr) allows a user to change the CPU affinity for hardware components that generate interrupts in a computer. See Chapter 6, “Advanced Performance Topics,” for a discussion of the benefits this tool can provide on a large-scale multiprocessor machine.

Kernrate.exe: Kernrate is a sample-profiling tool meant primarily to help identify where CPU time is being spent. Both Kernel and User mode processes can be profiled separately or simultaneously. Useful when monitoring the performance of a process or device driver that is consuming excessive CPU time. See Chapter 5, “Performance Troubleshooting,” for an example of how to use it to resolve performance problems involving excessive CPU usage.

Memtriage.exe: Detects a possible resource leak on a running system. Useful for monitoring memory leaks, memory fragmentation, heap fragmentation, pool leaks, and handle leaks.

Pfmon.exe: Page Fault Monitor lists the source and number of page faults generated by an application’s function calls. Useful when monitoring memory and disk performance. See Chapter 5, “Performance Troubleshooting,” for an example of how to use it to resolve performance problems involving excessive application paging.

Pmon.exe: Process Resource Monitor shows each process and its processor and memory usage.

Showperf.exe: Performance Data Block Dump Utility is a graphical tool that creates a dump of the contents of the Performance Data block so that you can view and debug the raw data structure of all loaded performance objects.

Splinfo.exe: Displays print spooler performance information on the screen. Useful when taking a snapshot of print spooler performance.

Srvinfo.exe: Displays a summary of information about a remote computer and the current services running on the computer.

TSSCalling.exe: Series of applets that can be used to create automated scripts that simulate interactive remote users for performance stress testing of Terminal Services environments.

Vadump: Details the current amount of virtual memory allocated by a process.

Volperf.dll: Enables administrators to use Performance Monitor to monitor shadow copies.
Performance Monitoring Statistics
The easiest way to get started with performance monitoring in Windows Server 2003
is to click Performance on the Administrative Tools menu and begin a real-time monitoring session using the Performance Monitor console, as illustrated in Figure 2-1.
Figure 2-1 The Performance Monitor console
When you first launch Performance Monitor, a System Monitor Chart View is activated with a default set of counters loaded for monitoring your local computer, as
illustrated in Figure 2-1. The default display shows three of the potentially thousands
of performance counter values that System Monitor can report on.
Performance Objects
Related performance statistics are organized into objects. For example, measurements
related to overall processor usage, like Interrupts/sec and % User Time, are available
in the Processor object.
Multiple Instances of Objects
There may be one or more instances of a performance object, where each instance is
named so that it is uniquely identified. For example, on a machine with more than
one processor, there is more than one instance of each set of processor measurements.
Each processor performance counter is associated with a specific named instance of the Processor object. On a 2-way multiprocessor, for example, the Processor object has instances 0 and 1, which uniquely identify each processor. The instance name is a unique identifier for the set of counters related to that instance, as illustrated in Figure 2-2.
Figure 2-2 Objects, counters, and instances
Figure 2-2 gives an example of a computer containing two Processor objects. In the
example, statistics from three counters (Interrupts/sec, %Privileged time, and %User
Time) are being monitored for each instance of the Processor object. Also, the total
number of Interrupts/sec on all processors in the system is being monitored.
Similarly, for each running process, a unique set of related performance counters is associated with that process instance. The instance name for a process has an additional index component whenever multiple instances of a process have the same process name. For example, you will see instances named svchost#1, svchost#2, and svchost#3 to distinguish separate instances of the svchost process.
The best way to visualize the relationships among object instances is to access the Add
Counters dialog box by clicking the Plus Sign (+) button on the toolbar. Select the
Thread object, and you will see something like the form illustrated in Figure 2-3.
There are many instances of the Thread object, each of which corresponds to a program execution thread of a currently active process.
Figure 2-3 Thread object instances
Two objects can also have a parent-child relationship, which is illustrated in Figure 2-3.
For example, all the threads of a single process are related. So thread instances, which
are numbered to identify each uniquely, are all children of some parent process. The
thread parent instance name is the process name that the thread is associated with. In
Figure 2-3, thread 11 of the svchost#3 process identifies a specific thread from a specific instance of the svchost process.
Many objects that contain multiple instances also provide a _Total instance that conveniently summarizes the performance statistics associated with multiple instances.
Types of Performance Objects
Table 2-4 shows a list of the performance objects corresponding to hardware, operating system services, and other resources that are installed with Windows Server 2003.
These objects and their counters can be viewed using the Add Counters dialog box of
the Performance Monitor console. This is neither a default list nor a definitive guide.
You might not see some of the performance objects mentioned here on your machine
because these objects are associated with hardware, applications, or services that are
not installed. You are also likely to see many additional performance objects that are
not mentioned here but that you do have installed, such as those for measuring other
applications such as Microsoft SQL Server.
Tip For a more comprehensive list of Windows Server 2003 performance objects and counters, along with a list of which objects are associated with optional services and features, see the Windows Server 2003 Performance Counters Reference in the Windows Server 2003 Deployment Kit. You can view this Reference online at http://www.microsoft.com/resources/documentation/WindowsServ/2003/all/deployguide/en-us/counters_overview.asp. For information about performance objects and counters for other Windows Server System products like Exchange Server and SQL Server, see the documentation for these products.
Table 2-4 Windows Server 2003 Performance Objects

ACS Policy: Provides policy-based Quality of Service (QoS) admission control data.

ACS/RSVP Interfaces: Reports the Resource Reservation Protocol (RSVP) or Admission Control Service (ACS) Interfaces performance counters.

ACS/RSVP Service: Reports the activity of the Quality of Service Admission Control Service (QoS ACS), which manages the priority use of network resources (bandwidth) at the subnet level.

Active Server Pages: Monitors errors, requests, sessions, and other activity data from Active Server Pages (ASP).

ASP.NET: Monitors errors, requests, sessions, and other activity data from ASP.NET requests.

AppleTalk: Monitors traffic on the AppleTalk network.

Browser: Reports the activity of the Browser service, which lists computers sharing resources in a domain and other domain and workgroup names across the network. The Browser service provides backward compatibility with clients that are running Microsoft Windows 95, Microsoft Windows 98, Microsoft Windows 3.x, and Microsoft Windows NT.

Cache: Reports activity for the file system cache, an area of physical memory that holds recently used data from open files.

Client Service for Netware: Reports packet transmission rates, logon attempts, and connections to Netware servers.

Database: Reports statistics regarding the Active Directory database cache, files, and tables.

Database Instances: Reports statistics regarding access to the Active Directory database and associated files.

DHCP Server: Provides counters for monitoring Dynamic Host Configuration Protocol (DHCP) service activity.

Distributed Transaction Coordinator: Reports statistics about the activity of the Microsoft Distributed Transaction Coordinator, which is a part of Component Services (formerly known as Transaction Server) and which coordinates two-phase transactions by Message Queuing.

DNS: Provides counters for monitoring various areas of the Domain Name System (DNS) to find and access resources offered by other computers.

Fax Service: Displays fax activity.

FileReplicaConn: Monitors performance of replica connections to the Distributed File System service.

FileReplicaSet: Monitors the performance of the file replication service.

FTP Service: Includes counters specific to the File Transfer Protocol (FTP) Publishing Service.

HTTP Indexing Service: Reports statistics regarding queries that are run by the Indexing Service, a service that builds and maintains catalogs of the contents of local and remote disk drives to support powerful document searches.

IAS Accounting Clients: Reports the activity of Internet Authentication Service (IAS) as it centrally manages remote client accounting.

IAS Accounting Proxy: Reports the activity of the accounting proxy for Remote Authentication Dial-In User Service (RADIUS).

IAS Accounting Server: Reports the activity of Internet Authentication Service (IAS) as it centrally manages remote server accounting.

IAS Authentication Clients: Reports the activity of Internet Authentication Service (IAS) as it centrally manages remote client authentication.

IAS Authentication Proxy: Reports the activity of the RADIUS authentication proxy.

IAS Authentication Server: Reports the activity of Internet Authentication Service (IAS) as it centrally manages remote server authentication.

IAS Remote Accounting Server: Reports the activity of the RADIUS accounting server where the proxy shares a secret.

IAS Remote Authentication Server: Reports the activity of the RADIUS authentication server where the proxy shares a secret.

ICMP and ICMPv6: Reports the rate at which messages are sent and received by using Internet Control Message Protocol (ICMP), which provides error correction and other packet information.

Indexing Service: Reports statistics pertaining to the creation of indexes and the merging of indexes by Indexing Service. Indexing Service indexes documents and document properties on your disks and stores the information in a catalog. You can use Indexing Service to search for documents, either by using the Search command on the Start menu or by using a Web browser.

Indexing Service Filter: Reports the filtering activity of Indexing Service.

Internet Information Services Global: Includes counters that monitor Internet Information Services (IIS), which includes the Web service and the FTP service.

IPSec v4 Driver: Reports activity for encrypted IPSec network traffic.

IPSec v4 IKE: Reports IPSec security association information.

IPv4 and IPv6: Reports activity at the Internet Protocol (IP) layer of Transmission Control Protocol/Internet Protocol (TCP/IP).

Job Object: Reports the data for accounting and processor use that is collected by each active, named job object.

Job Object Details: Reports detailed performance information about the active processes that make up a job object.

Logical Disk: Reports activity rates and allocation statistics associated with a logical disk file system.

Macfile Server: Provides information about a system that is running File Server for Macintosh.

Memory: Reports on the overall use of both physical (RAM) and virtual memory, including paging statistics.

MSMQ Queue: Monitors message statistics for selected queues.

MSMQ Service: Monitors session and message statistics.

MSMQ Session: Monitors statistics about active sessions.

NBT Connection: Reports the rate at which bytes are sent and received over connections that use the NetBIOS over TCP/IP (NetBT) protocol, which provides network basic input/output system (NetBIOS) support for TCP/IP between the local computer and a remote computer.

NetBEUI: Measures NetBIOS Enhanced User Interface (NetBEUI) data transmission.

NetBEUI Resource: Tracks the use of buffers by the NetBEUI protocol.

Network Interface: Reports the rate at which bytes and packets are sent and received over a TCP/IP connection by means of network adapters.

NNTP Commands: Includes counters for all Network News Transfer Protocol (NNTP) commands processed by the NNTP service.

NNTP Server: Monitors posting, authentication, and connection activity on an NNTP server.

NTDS: Handles the Windows NT directory service on your system.

NWLink IPX: Measures datagram network traffic between computers that use the Internetwork Packet Exchange (IPX) protocol.

NWLink NetBIOS: Monitors IPX transport rates and connections.

NWLink SPX: Measures network traffic between computers that use the Sequenced Packet Exchange (SPX) protocol.

Objects: Reports data about system software objects such as events.

Paging File: Reports the current allocation of each paging file, which is used to back virtual memory allocations.

Pbserver Monitor: Reports activity on a phone book server.

Physical Disk: Reports activity on hard or fixed disk drives.

Print Queue: Reports statistics for print jobs in the queue of the print server. This object is new in Windows Server 2003.

Process: Reports the activity of the process, which is a software object that represents a running program.

Process Address Space: Monitors memory allocation and use for a selected process.

Processor: Reports the activity for each instance of the processor.

ProcessorPerformance: Reports on the activity of variable speed processors.

PSched Flow: Monitors flow statistics from the packet scheduler.

PSched Pipe: Monitors pipe statistics from the packet scheduler.

RAS Port: Monitors individual ports of the remote access device on your system.

RAS Total: Monitors all combined ports of the remote access device on your system.

Redirector: Reports activity for the redirector, which diverts file requests to network servers.

Server: Reports activity for the server file system, which responds to file requests from network clients.

Server Work Queues: Reports the length of queues and number of objects in the file server queues.

SMTP NTFS Store Driver: Monitors Simple Mail Transfer Protocol message activity that is associated with an MS Exchange client.

SMTP Server: Monitors message activity generated by the Simple Mail Transfer Protocol (SMTP) service.

System: Reports overall statistics for system counters that track system up time, file operations, the processor queue length, and so on.

TCPv4 and TCPv6: Reports the rate at which Transmission Control Protocol (TCP) segments are sent and received.

Telephony: Reports the activity for telephony devices and connections.

Terminal Services: Provides Terminal Services summary information.

Terminal Services Session: Provides resource monitoring for individual Terminal Services sessions.

Thread: Reports the activity for a thread, which is the part of a process that uses the processor.

UDPv4 and UDPv6: Reports the rate at which datagrams are sent and received by using the User Datagram Protocol (UDP).

Web Service: Includes counters specific to the Web publishing service that is part of Internet Information Services.

Web Service Cache: Provides statistics on the Kernel mode and User mode caches that are used in IIS 6.0.

Windows Media Station Service: Provides statistics about the Windows Media Station service, which provides multicasting, distribution, and storage functions for Windows Media streams.

Windows Media Unicast Service: Provides statistics about the Windows Media Unicast service, which provides unicasting functions for Advanced Streaming Format (ASF) streams.

WMI Objects: Reports the available classes of WMI objects.
Performance Counters
The individual performance statistics that are available for each measurement interval
are numeric counters. You can obtain an explanation about the meaning of a counter
by clicking the Explain button shown in Figure 2-3.
Performance Counter Path
Each performance counter you select is uniquely identified by its path.
If you right-click Chart View in the System Monitor control of the Performance Monitor console to access the Properties of your console session, you will see listed on the
Data tab the counters that are selected to be displayed, as shown in Figure 2-4. Each
counter selected is identified by its path.
Figure 2-4 Data tab for System Monitor Properties
The following syntax is used to describe the path to a specified counter:
\\Computer_name\Object(Parent/Instance#Index)\Counter
The same syntax is also used consistently to identify the counters you want to gather
using the Logman, Relog, and Typeperf command-line tools.
For a simple object like System or Memory that has only a single object instance associated with it, the following syntax will suffice:
\Object\Counter
For example, \Memory\Pages/sec identifies the Pages/sec counter in the Memory
object.
The Computer_name portion of the path is optional; by default, the local computer
name is assumed. However, you can specify the computer by name so that you can
access counters from a remote machine.
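For example, the following sketch uses Typeperf to sample a counter path that names a remote computer; SERVER01 is a placeholder for a machine on which you have the required access rights:

rem Collect 12 samples of Pages/sec from SERVER01 at 5-second intervals
typeperf "\\SERVER01\Memory\Pages/sec" -si 5 -sc 12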
The parent, instance, index, and counter components of the path can contain either a
valid name or a wildcard character. For example, to specify all the counters associated
with the Winlogon process, you can specify the counters individually or use a wildcard character (*):
\Process(winlogon)\*
Only some objects have parent instances, instance names, and index numbers that
need to be used to identify them uniquely. You need to specify these components of
the path only when they are necessary to identify the object instance you are interested in. Where a parent instance, instance name, or instance index is necessary to
identify the counter, you can specify either each individual path or use a wildcard
character (*) instead. This allows you to identify all the instances with a common path
identification, without having to enumerate each individual counter path.
For example, the LogicalDisk object has an instance name, so you must provide either
the name or a wildcard. Use the following format to identify all instances of the Logical Disk object:
\LogicalDisk(*)\*
To specify Logical Disk instances separately, use the following paths:
\LogicalDisk(C:)\*
\LogicalDisk(D:)\*
It is easy to think of the instance name, using this notation, as an index into an array
of object instances that uniquely identifies a specific set of counters.
The Process object has an additional path component because the process instance
name is not guaranteed to be unique. You would use the following format to collect
the ID Process counter for each running process:
\Process(*)\ID Process
When there are multiple processes with the same name running that you need to distinguish, use the #Index identifier. For example, multiple instances of the svchost process would be identified as follows:
\Process(svchost)\% Processor Time
\Process(svchost#1)\% Processor Time
\Process(svchost#2)\% Processor Time
\Process(svchost#3)\% Processor Time
Notice that the first unique instance of a process instance name does not require an
#Index identifier. The instance index 0 is hidden so that the numbering of additional
instances starts with 1. You cannot identify multiple instances of the same process for
monitoring unless you display instance indexes.
For the Thread object, which has a parent instance of the process to help identify it, the parent instance is also part of the path. For example, the following path identifies a counter associated with thread 11 of the third instance of the svchost process:
\Thread(svchost/11#2)\Context Switches/sec
If a wildcard character is specified in the parent name, all instances of the specified
object that match the specified instance and counter fields will be returned. If a wildcard character is specified in the instance name, all instances of the specified object
will be returned. If a wildcard character is specified in the counter name, all counters
of the specified object are returned.
Partial counter path string matches (for example, svc*) are not supported.
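The following sketch shows these wildcard rules in Typeperf command lines; the output file name is an arbitrary example:

rem One sample of every counter for every LogicalDisk instance
typeperf "\LogicalDisk(*)\*" -sc 1

rem The ID Process counter for all running processes, written to a CSV file
typeperf "\Process(*)\ID Process" -sc 1 -f CSV -o processes.csv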
Types of Counters
Each counter has a counter type. System Monitor (and similar applications) uses the
counter type to calculate and present the counter value correctly. Knowing the
counter type is also useful because it indicates how the performance statistic was
derived. The counter type also defines the summarization rule that will be used to
summarize the performance statistic over longer intervals using the Relog commandline tool.
The performance monitoring API defines more than 20 specific counter types, some of which are highly specialized. The many different counter types fall into a few general categories, depending on how they are derived and how they can be summarized. The five major categories of counters are:
■ Instantaneous counters that display a simple numeric value of the most recent measurement. An instantaneous counter is a single observation or sample of the value of a performance counter right now. Instantaneous counters are always reported as integer values. They tell you something about the state of the machine right now; they do not tell you anything about what happened during the interval between two measurements. An example of an instantaneous counter is Memory\Available Bytes, which reports the amount of physical memory (RAM) that is currently available for immediate use. Most queue length measurements are also instantaneous counters because they represent a single observation of the current value of the measurement. You can summarize such counters over longer intervals by calculating average values, minimums, maximums, and other summary statistics.
■ Interval counters that display an activity rate over time. Interval counters are derived from an underlying measurement mechanism that continuously counts the number of times some particular event occurs. System Monitor retrieves the current value of this counter every measurement interval. Interval counters can also be thought of as difference counters because the underlying counter reports the current value of a continuously measured event. System Monitor retains the previous interval value and calculates the difference between these two values. The difference is then usually expressed as a rate per second. Examples of this type of counter include Processor\Interrupts/sec and the Logical and Physical Disk\Disk Transfers/sec counters. Some interval counters count timer ticks instead of events. These interval counters are transformed into % busy processor time measurements by dividing the timer tick difference by the total number of timer ticks in the interval. Interval counters can be summarized readily, reflecting average rates over longer periods of time.
■ Elapsed time counters. There are a few important elapsed time counters, such as System Up Time and Process\Elapsed Time. These counters are gathered on an interval basis and cannot be summarized.
■ Averaging counters that provide average values derived for the interval. Examples of averaging counters include the hit rate % counters in the Cache object and the average disk I/O response time counters in the Logical and Physical Disk objects. These counters must be summarized carefully; make sure you avoid improperly calculating the average of a series of computed average values. You must calculate a weighted average over the summarization interval instead. For example, if one interval reports an average disk response time of 10 milliseconds over 100 transfers and the next reports 50 milliseconds over 10 transfers, the correct two-interval average is (10 x 100 + 50 x 10) / 110, or about 13.6 milliseconds, not the 30 milliseconds you would get by averaging the two averages.
■ Miscellaneous complex counters, including specialized counters that do not readily fall into any of the other categories. Similar to instantaneous counters, the complex counters are also single observations. Examples include Logical Disk\% Free Space, which is calculated by subtracting the number of allocated disk bytes from the total number of bytes available in the file system and expressing the result as a percentage of the whole. They also must be summarized carefully.
For more information about the way the Relog tool summarizes counter types, see
“Summarizing Log Files Using Relog.”
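As a brief illustration of such summarization, the following sketch uses Relog to thin an existing binary log and convert it to CSV format; the file names and intervals are arbitrary examples:

rem Keep every 12th sample of a log gathered at 15-second intervals,
rem yielding one record every 3 minutes, and write the output as CSV
relog C:\Perflogs\cpu_baseline.blg -t 12 -f CSV -o C:\Perflogs\cpu_summary.csv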
Note Instantaneous counters like System\Processor Queue Length are always reported as integer values. They are properly viewed as single instances of a sample. You can summarize them by calculating average values and other summary statistics over longer intervals.
System Monitor
System Monitor is the main graphical tool used for real-time monitoring and analysis
of logged data. The most common method of accessing System Monitor is by loading
the Performance Monitor console.
To see procedures and a brief overview of the Performance Monitor console, click
Help on the Performance Monitor toolbar. For information about remote monitoring,
see Chapter 4, “Performance Monitoring Procedures.”
Warning Monitoring large numbers of counters can generate high overhead, potentially making the system unresponsive to keyboard or mouse input and impacting the performance of important server application processes. To reduce performance monitoring overhead, delete some of the counters you are collecting, sample less frequently, or switch to a background data logging session using a binary logging file. For more information about background data logging, see “Performance Logs and Alerts” in this chapter.
Viewing a Chart in Real Time
To access the Performance Monitor console, select Performance from the Administrative Tools menu; or click Start, Run, and type Perfmon.exe.
When you start the Performance Monitor console, by default a System Monitor graph
displays a default set of basic counters that monitor processor, disk, and virtual memory activity. These counters give you immediate information about the health of a system you are monitoring. The Chart View is displayed by default, but you can also
create bar charts (histograms) and tabular reports of performance counter data using
the Performance Monitor console.
When you run the Performance Monitor console, the Performance Logs and Alerts
snap-in also appears beneath System Monitor in the console tree, as shown in Figure 2-5.
Figure 2-5 System Monitor in the Performance console
The System Monitor display shown in Figure 2-5 contains the elements listed in Table 2-5.
Table 2-5 System Monitor Elements

Optional toolbar: Provides capabilities for adding and deleting counters from a graph. The toolbar buttons provide a quick way to configure the monitoring console display. You can also right-click the display to access a shortcut menu to add counters and configure your monitoring session properties.

Graphical View: Displays the current values for selected counters. You can vary the line style, width, and color. You can customize the color of the window and the line chart itself, add a descriptive title, display chart gridlines, and change the display font, among other graphical options.

Value Bar: Displays the Last, Average, Minimum, and Maximum values for the counter that is currently selected in the Legend. The value bar also shows a Duration value that indicates the total elapsed time displayed in the graph (based on the current update interval).

Legend: Displays the counters selected for viewing, identified by the computer name, object, and parent and instance name. The line graph color and the scaling factor used to graph the counter value against the y-axis are also shown.

Time Bar: In real-time mode, the Time Bar moves across the graph from left to right to indicate the passing of each update interval.
Changing the Sampling Interval
The default interval for a line graph in Chart View is once per second. You might find
that using a 1-second sampling rate generates measurements too quickly to analyze
many longer-term conditions. Change the sampling interval by accessing the General
tab on the System Monitor Properties.
In a real-time monitoring session, the Chart View displays the last 100 observations
for each counter. If your monitoring session has not yet accumulated 100 samples, the
line charts you see will be truncated.
Caution Regardless of the update interval, a real-time Chart View can display no more than the last 100 observations for each counter value. When the Time Bar reaches the right margin of the display, the display wraps around to the beginning and starts overwriting data. If you change the sampling interval on the General Properties tab, the Duration field in the Chart View is updated immediately to reflect the change. However, until the Time Bar makes one complete revolution, the time range of the current display is a combination of the previous and current interval values.
A Histogram or Report View can display the last (or current) value of each counter being monitored, or one of the following statistics: the minimum, maximum, or average value of the last 100 observations. Use the Legend in the Histogram View to identify the counter to which each bar in the graph corresponds. Figure 2-6 illustrates the System Monitor Histogram View, which is an effective way to monitor a large number of counters at any one time.
Figure 2-6 Histogram View helps monitor many counters simultaneously
Creating a Custom Monitoring Configuration
After you select the counters you want to view and customize the System Monitor display the way you like it, you can save the System Monitor configuration for reuse at a
later time.
To create a simple real-time monitoring configuration
1. Click the New button.
2. Click the Plus Sign (+) button.
3. In the Add Counters dialog box, select the performance objects and counters
that you want to monitor.
4. Click Add for each one you want to add to the chart.
5. After selecting your counter set, click Close and watch the graph.
6. Click the File menu and click Save As to save the chart settings in a folder where
you can easily reload your graph without having to reconfigure your counters.
You can also use the optional toolbar that appears by default to help you reconfigure
your customized settings. Resting your mouse over any toolbar button will tell you the
operation or action the button performs. Table 2-6 outlines the purpose of the tasks
associated with each button more fully.
Table 2-6 System Monitor Toolbar Buttons

Create a new chart: Use the New Counter Set button to create a new chart.

Refresh the data: Click Clear Display to clear the displayed data and obtain a fresh data sample for existing counters.

Conduct manual or automatic updates: Click Freeze Display to suspend data collection. Click Freeze Display again to resume data collection. Alternatively, use the Update Data button for manual snapshots of data.

Select display options: Use the View Graph, View Histogram, or View Report option button.

Select a data source: Use the View Current Activity button for real-time data, or the View Log File Data button for data from either a completed or a currently running log.

Add, delete, reset, and get more information about counters: Use the Add or Delete buttons as needed. You can also use the New Counter Set button to reset the display and select new counters. Clicking the Add button displays the Add Counters dialog box, as shown in Figure 2-3. You can also press the Delete key to delete a counter that you select in the list box. When you are adding counters, click the Explain button to read a more detailed explanation of a counter.

Highlighting: To accentuate the chart line or histogram bar for a selected counter with white (default) or black (for light backgrounds), click Highlight on the toolbar.

Import or export counter settings: To save the displayed configuration to the Clipboard for insertion into a Web page, click Copy Properties. To import counter settings from the Clipboard to the current System Monitor display, click Paste Counter List.

Configure other System Monitor properties: To access colors, fonts, or other settings that have no corresponding button on the toolbar, click the Properties button.

When you want to review data in a time series, the Chart View line graphs display a range of measured values continuously. Histograms and reports are useful for viewing a large number of counter values. However, they display only a single value at a time, reflecting either current activity or a range of statistics.
Saving Real-Time Data
If something of interest appears in your real-time graph that you would like to spend more time investigating, you can save a copy of the current display by using the following steps:
1. Right-click your chart and click Save As.
2. Type the name of the file you want to create.
3. Select the format (choose HTML if you would like to view the full time range of the counter values that are being displayed).
4. Click Save.
Note Because this is a real-time graph, you must click Save before the data you are interested in gets overwritten as the Time Bar advances. If you are likely to need access to performance data from either the recent period or historical periods, use Performance Logs and Alerts to write counter values to a log file.
The other commands available in the shortcut menu are listed in Table 2-7.
Table 2-7 System Monitor Shortcut Menu Commands

Add Counters: Use this command in the same way you use the Add button on the System Monitor toolbar.

Save As: Use this command if you want to save a snapshot of the current performance monitor statistics, either in HTML format for viewing using a browser or as a .tsv file for incorporation into a report using a program such as Excel.

Save Data As: Use this command if you want to save log data in a different format or reduce the log size by using a greater sampling rate. This shortcut command is unavailable if you are collecting real-time data.

Properties: Click this command to access the property tabs that provide options for controlling all aspects of System Monitor data collection and display.
Customizing How Data Is Viewed
By default, real-time data is displayed in Chart View. Chart View is useful for watching trends. You can configure the elements shown, such as the scale used and the vertical and horizontal axes, and you can also change how the data is graphed.
The full range of configurable properties can be accessed by using the System Monitor
Properties sheets. To open the System Monitor Properties dialog box, right-click on the
right pane and select Properties, or click the Properties button on the toolbar.
Table 2-8 lists the property tabs in the dialog box, along with the attributes they control.
Table 2-8 System Monitor Properties

General tab:
■ View: Choose between a graph (chart), histogram, or report.
■ Display Elements: Show or hide the counter legend, the value bar (which displays the last, minimum, and maximum values for a selected counter), and the toolbar.
■ Report and Histogram Data: Choose between default, minimum, average, current, and maximum report values.
■ Appearance and Border: Change the appearance of the view. You can configure three-dimensional or flat effects, or include or omit a border for the window.
■ Sample Automatically Every: Update sample rates. Specify sample rates in seconds.
■ Allow Duplicate Counter Instances: Displays instance indexes (for monitoring multiple instances of a counter). The first instance (instance number 0) displays no index. System Monitor numbers subsequent instances starting with 1.

Source tab:
■ Data Source: Select the source of data to display: Current Activity (the current data for the graph), Log Files (archived data from one or more log files), or Database (current data from a Structured Query Language (SQL) database).
■ Time Range button: Display any subset of the time range for a log or database file.

Data tab:
■ Counters: Add or remove objects, counters, and instances.
■ Color: Change the color assigned to a counter.
■ Width: Change the thickness of the line representing a counter.
■ Scale: Change the scale of the value represented by a counter.
■ Style: Change the pattern of the line representing a counter.

Graph tab:
■ Title: Type a title for the graph.
■ Vertical Axis: Type a label for the vertical axis.
■ Show: Select any of the following to show: vertical grid lines, horizontal grid lines, and vertical scale numbers.
■ Vertical Scale: Specify a maximum and minimum (upper and lower limits, respectively) for the graph axes.

Appearance tab:
■ Color: Specify the color for the graph background, control background, text, grid, and Time Bar when you click Change.
■ Font: Specify the font, font style, and size when you click Change.
In addition to using the Chart properties to change the view of the data, you can use the toolbar buttons:

■ View Histogram button. Click this button to change from the line graph Chart View to a bar chart Histogram View (shown in Figure 2-7). Histogram View is useful when, for example, you are trying to identify the application that is using most of the CPU time on a system.

Figure 2-7  Histogram View

■ View Report button. Click this button when you want to view data values only.
Tips for Working with System Monitor
Windows Server 2003 Help for Performance Monitoring explains how to perform
common tasks, such as how to:
■ Work with Counters
■ Work with Data
■ Work with Settings
Some useful tips for using the System Monitor are provided in this section.
Simplifying Detailed Charts
When working with Chart View, you will find that simpler charts are usually easier to understand and interpret. Graphing several counters at once can produce a display that is jumbled and difficult to untangle. Some tips for simplifying complex charts to make them easier to understand include:
■ In Chart View, use the Highlight button while you scroll up and down in the Legend to locate interesting counter values.
■ Switch to the Report View or Histogram View to analyze data from many counters. When you switch back to Chart View, delete the counters from the chart that are cluttering up the display.
■ Double-click an individual line graph in Chart View or a bar in Histogram View to identify that counter in the Legend.
■ Widen the default width of line graphs that display the “interesting” counters that remain for better visibility.
■ Run multiple copies of the Performance Monitor console if you need to watch more counters than are readily displayed on a single chart or to group them into categories for easier viewing.
Scaling the Y-Axis
All the counter values in a Chart View or Histogram View are graphed against a single
y-axis. The y-axis displays values from 0 through 100 by default. The default y-axis
scale works best for counter values that ordinarily range from 0 through 100. You can
change the default minimum and maximum y-axis values using the Graph tab of the
System Monitor Properties dialog. At the same time, you can also turn on horizontal
and vertical grid lines to make the graph easier to read.
You might also need to adjust the scaling factor for some of the counters so that all the counters you selected are visible in the graph area. For example, select the Avg. Disk sec/Transfer counter, as illustrated in Figure 2-8.

Figure 2-8  Scaling counter data to fit on the y-axis
Notice that the Avg. Disk sec/Transfer counter, which measures the average response time for an I/O to a physical disk, uses a default scaling factor of 1000. System Monitor multiplies the counter values by the scaling factor before it displays them on a line chart or histogram. Multiplying counter values for Avg. Disk sec/Transfer by 1000 normally allows disk response times in the range of 0–100 milliseconds to display against a default y-axis scale of 0–100.

On the Data tab of the System Monitor Properties dialog box, you will see the default scaling factor that was defined for each counter. If you click this drop-down list, you will see a list of the available scaling factors. Select a scaling factor that allows the counters to be visible within the display. If the counter values are too small to be displayed, use a larger scaling factor; if the counter values are too large to be displayed, use a smaller scaling factor.
You might have to experiment using trial and error to arrive at the best scaling factor
to use for each counter selected.
Sorting the Legend
The Counter, Instance, Parent, Object, and Computer columns in the Legend can be sorted. Click a column heading in the Legend to sort the legend by the values in that column; click the same heading again to re-sort the display in the opposite direction.
Printing System Monitor Displays
You cannot print System Monitor charts and tabular reports directly from the Performance Monitor console. There are a number of other ways that you can print a System Monitor report, including:

■ Ensure the System Monitor window is the active window, and copy the active window to the Clipboard by pressing ALT+PRINT SCREEN. Then you can open a paint program, paste the image from the Clipboard, and print it.
■ Add the System Monitor control to a Microsoft Office application, such as Microsoft Word or Microsoft Excel. Configure it to display data, and then print from that program.
■ Save the System Monitor control as an HTML file by right-clicking the details pane of System Monitor, clicking Save As, and typing a file name. You can then open the HTML file and print it from Microsoft Internet Explorer or another program.
Table 2-9 provides a set of tips that you can use when working with System Monitor.
Table 2-9  Tips for Working with System Monitor

Learn about individual counters
  When adding counters, if you click Explain in the Add Counters dialog box in System Monitor, Counter Logs, or Alerts, you can view counter descriptions.

Vary the data displayed in a report
  By default, reports display only one value for each counter. This is the current data if the data source is real-time activity, or averaged data if the source is a log. However, on the General tab, you can configure the report display to show different values, such as the maximum, minimum, and average.

Select a group of counters or counter instances to monitor
  In the Add Counters dialog box in System Monitor, you can do the following:
  ■ To select all counters or instances, click All Counters or All Instances.
  ■ To select specific counters or instances, click Select Counters From List or Select Instances From List.
  ■ To monitor a group of consecutive counters or instances in a list box, hold down the SHIFT key and click the first counter, then scroll down through the items in the list box and click the last counter you require.
  ■ To select multiple, nonconsecutive counters or instances, hold down the CTRL key and click each counter. Keep the CTRL key pressed throughout this operation; if you do not, you can inadvertently lose previously selected counters.

Track totals for all instances of a counter
  Instead of monitoring individual instances for a selected counter, you can use the _Total instance, which sums all instance values and reports them in System Monitor.

Pinpoint a specific counter from lines in a graph
  To match a line in a graph with the counter for which it is charting values, double-click a position in the line. If chart lines are close together, try to find a point in the graph where they diverge.

Highlight the data for a specific counter
  To highlight the data for a specific counter, press CTRL+H or click Highlight on the toolbar. For the counter selected, a thick line replaces the colored chart line. For white or light-colored backgrounds (defined by the Graph Background property), this line is black; for other backgrounds, this line is white.
Task Manager
Sometimes a quick look at how the system is performing is more practical than setting
up a complicated monitoring session. Task Manager is ideal for taking a quick look at
how your system is performing. You can use it to verify whether there are any pressing
performance issues and to obtain an overview of how key system components are
functioning. You can also use Task Manager to take action. For example, after you
determine that a runaway process is causing the system to become unresponsive, you
can use Task Manager to end the process that is behaving badly.
Although Task Manager lacks the breadth of information available from the System
Monitor application, it is useful as a quick reference to system operation and performance. Task Manager does provide several administrative capabilities not available
with the System Monitor console, including the ability to:
■ Stop running processes
■ Change the base priority of a process
■ Set the processor affinity of a process so that it can be dispatched only on particular processors in a multiprocessor computer
■ Disconnect or log off a connected user session
Unlike the Performance Monitor application, Task Manager can only run interactively.
Also, you are permitted to run only one instance of Task Manager at a time. You can
use Task Manager only to view current performance statistics.
Most of the performance statistics that Task Manager displays correspond to counters
you can also gather and view using the Performance Monitor, even though Task Manager sometimes calls similar values by slightly different names. Later in this section,
Tables 2-10, 2-11, and 2-12 explain each Task Manager field and which Performance
Monitor counter it corresponds to.
Warning
If you are comparing statistics being gathered by both a Performance
Monitor console session and Task Manager, don’t expect these applications to report
exactly the same values. At the very least, they are gathering performance measurement data at slightly different times, which helps explain some of the discrepancy
when you compare two views of the same measurement.
Working with Task Manager
To start Task Manager, press CTRL+SHIFT+ESC or right-click the Taskbar and select
Task Manager.
Task Manager displays different types of system information using various tabs. The
display will open to the tab that was selected when Task Manager was last closed.
Task Manager has five tabs: Applications, Processes, Performance, Networking, and
Users. The status bar, at the bottom of the Task Manager window, shows a record of
the number of Processes open, the CPU Usage, and the amount of virtual memory
committed compared to the current Commit Limit.
The Task Manager display allows you to:
■ Select Always On Top from the Options menu to keep the window in view as you switch between applications. The same menu option is also available by using the context menu on the CPU usage gauge in the notification area of the Taskbar.
■ Press CTRL+TAB to toggle between tabs, or click the tab.
■ Resize all Task Manager columns shown in the Networking, Processes, and Users tabs.
■ Click a column in any view other than the Performance tab to sort its entries in ascending or descending order.
While Task Manager is running, a miniature CPU Usage gauge appears in the notification area of the Taskbar. When you point to this icon, the icon displays the percentage
of processor use in text format. The miniature gauge, illustrated in Figure 2-9, matches
the CPU Usage History chart on the Performance tab.
Figure 2-9  Task Manager CPU gauge shown in the notification area
If you would like to run Task Manager without its application window taking up space
on the Taskbar, click Hide When Minimized on the Options menu. To open an
instance of Task Manager when it is hidden, double-click the Task Manager CPU
gauge in the notification area, press CTRL+SHIFT+ESC, or right-click the Taskbar and
select Task Manager.
You can control the rate at which Task Manager updates its counts by setting the
Update Speed option on the View menu. These are the possible speeds:
■
High
■
Normal
■
Low
■
Paused
Updates every half-second.
Updates once per second. This is the default sampling rate.
Updates every 4 seconds.
Does not update automatically. Press F5 to update.
A lower update speed reduces Task Manager overhead and thus lowers the sampling
rate, which might cause some periodic spikes to go unnoticed. You can manually force
an update at any time by clicking Refresh Now on the View menu or by pressing F5.
Monitoring Applications
The Applications tab lists the applications currently running on your desktop. You
can also use this tab to start and end applications. Ending the application is just like
clicking the X in the title bar of an application. The application then has a chance to
prompt you to save information and clean up (flush buffers, close handles, cancel
database transactions, close database connections, and so on). If you don’t respond
immediately to the prompt generated by the application when you try to end it, Task
Manager prompts you to either cancel the operation or terminate the application
immediately with possible loss of data.
Click the Applications tab to view the status of each open desktop application. The
Applications tab includes the following additional features:
■ To end an application, highlight the application and then click End Task.
■ To switch to another application, highlight the application and then click Switch To.
■ To start a new task, click New Task. In the Open text box that appears, type the name of the program you want to run.
■ To determine what executable is associated with an application, right-click the task you want, and then click Go To Process.
Monitoring Processes
The Processes tab provides a tabular presentation of current performance statistics for all processes currently running on the system. You can select which statistics are displayed on this tab by clicking Select Columns on the View menu. You can also terminate processes from this tab. Terminating a process is much different from clicking the X in the application’s title bar: the process is simply marked for termination, and the application will not have a chance to save information or clean up before it ends. When you terminate an application process in this fashion, you can corrupt the files or other data that the application is currently manipulating and prevent that application from being restarted successfully. Verify that the process you are terminating is causing a serious performance problem, either by monopolizing the processor or by leaking virtual memory, before you terminate it using this display.
Warning
You should use the End Process function in Task Manager only to terminate a runaway process that is threatening the stability of the entire machine.
Note that you can terminate only those processes that you have the appropriate security access rights for. The attempt to terminate a service process running under a different security context from your desktop Logon ID will be denied for security
reasons.
In Task Manager, click the Processes tab to see a list of running processes and their
resource usage. Figure 2-10 is an example of how Task Manager displays process information. It shows some additional columns that have been turned on.
Figure 2-10  Processes tab in Task Manager
Note
System Monitor displays memory allocation values in bytes, but Task Manager displays its values in kilobytes, which are units of 1,024 bytes. When you compare System Monitor and Task Manager values, divide System Monitor values by 1,024. For example, a Working Set value of 10,485,760 bytes in System Monitor corresponds to 10,240 K in Task Manager.
To include 16-bit applications in the display, on the Options menu, click Show 16-Bit
Tasks.
Table 2-10 shows how the data displayed on the Task Manager Processes tab compares to performance counters displayed in System Monitor in the Performance console. Not all of these columns are displayed by default; use the Select Columns
command on the View menu to add columns to or remove columns from the Task
Manager display.
Table 2-10  Task Manager Processes Tab Compared to the Performance Console

PID (Process Identifier)
  System Monitor counter: Process(*)\ID Process
  The numerical identifier assigned to the process when it is created. This number is unique to a process at any point in time. However, after a process has been terminated, the process ID it was assigned might be reused and assigned to another process.

CPU Usage (CPU)
  System Monitor counter: Process(*)\% Processor Time
  The percentage of time the processor or processors spent executing the threads in this process.

CPU Time
  System Monitor counter: Not available
  The total time the threads of this process used the processor since the process was started.

Memory Usage
  System Monitor counter: Process(*)\Working Set
  The amount of physical memory currently allocated to this process. This counter includes memory that is allocated specifically to this process in addition to memory that is shared with other processes.

Memory Usage Delta
  System Monitor counter: Not available
  The change in memory usage since the last update.

Peak Memory Usage
  System Monitor counter: Process(*)\Working Set Peak
  The maximum amount of physical memory used by this process since it was started.

Page Faults
  System Monitor counter: Not available
  The number of page faults caused by this process since it was started.

Page Faults Delta
  System Monitor counter: Process(*)\Page Faults/sec
  The change in the value of the Page Faults counter since the display was last updated.

USER Objects
  System Monitor counter: Not available
  The number of objects supplied to this process by the User subsystem. Some examples of USER objects include windows, menus, cursors, and icons.

I/O Reads
  System Monitor counter: Not available
  The number of read operations performed by this process since it was started. Similar to the Process(*)\IO Read Operations/sec counter value, except Task Manager reports only a cumulative value.

I/O Read Bytes
  System Monitor counter: Not available
  The number of bytes read by read operations performed by this process since it was started. Similar to the Process(*)\IO Read Bytes/sec counter value, except Task Manager reports only a cumulative value.

Session ID
  System Monitor counter: Terminal Services Session instance name
  This counter is meaningful only when Terminal Services is installed. When installed, the counter displays the Terminal Services session ID that is running this process. This number is unique to the session at any point in time; however, after a session has been terminated, the session ID it was originally assigned might be reused and assigned to another session.

User Name
  System Monitor counter: Not available
  The account the process was started under.

Virtual Memory Size
  System Monitor counter: Process(*)\Virtual Bytes
  The amount of virtual memory allocated to this program.

Paged Pool
  System Monitor counter: Process(*)\Pool Paged Bytes
  The amount of pageable system memory allocated to this process.

Nonpaged Pool
  System Monitor counter: Process(*)\Pool Nonpaged Bytes
  The amount of nonpageable system memory allocated to this process.

Base Priority
  System Monitor counter: Process(*)\Priority Base
  The base priority assigned to this process. Performance Monitor displays priority as a number, whereas Task Manager uses a name. The names used by Task Manager are Low, Below Normal, Normal, Above Normal, High, and Realtime. These correspond to priority levels 4, 6, 8, 10, 13, and 24, respectively.

Handle Count
  System Monitor counter: Process(*)\Handle Count
  The number of open handles in this process.

Thread Count
  System Monitor counter: Process(*)\Thread Count
  The number of threads that make up this process.

GDI Objects
  System Monitor counter: Not available
  The number of Graphics Device Interface (GDI) objects in use by this process. A GDI object is an item provided by the GDI library for graphics devices.

I/O Writes
  System Monitor counter: Not available
  The number of write operations performed by this process since it was started. Similar to the Process(*)\IO Write Operations/sec counter value, except Task Manager reports only a cumulative value.

I/O Write Bytes
  System Monitor counter: Not available
  The number of bytes written by write operations performed by this process since it was started. Similar to the Process(*)\IO Write Bytes/sec counter value, except Task Manager reports only a cumulative value.

I/O Other
  System Monitor counter: Not available
  The number of input/output operations that are neither read nor write. An example of this would be a command input to a part of the running process. Similar to the Process(*)\IO Other Operations/sec counter value, except Task Manager reports only a cumulative value.

I/O Other Bytes
  System Monitor counter: Not available
  The number of bytes used in other I/O operations performed by this process since it was started. Similar to the Process(*)\IO Other Bytes/sec counter value, except Task Manager reports only a cumulative value.
Monitoring Performance
The Performance tab displays a real-time view of performance data collected from the
local computer, including a graph and numeric display of processor and memory usage.
Most of the data displayed on this tab can also be displayed using Performance Monitor.
In Task Manager, to see a dynamic overview of system performance, click the Performance tab, as shown in Figure 2-11.
Figure 2-11  Performance tab in Task Manager
The following additional display options are available on the Performance tab:
■ Double-clicking Task Manager will display a window dedicated to the CPU usage gauges. Using this mode on the Performance tab, you can resize and change the location of the CPU usage gauges. To move and resize the chart, click the edge of the gauge and adjust the size, and then drag the window to the new location. To return to Task Manager, double-click anywhere on the gauge.
■ To graph the percentage of processor time in privileged or Kernel mode, click Show Kernel Times on the View menu. This is a measure of the time that applications are using operating system services. The difference between the time spent in Kernel mode and the overall CPU usage represents time spent by threads executing in User mode.
Table 2-11 shows how the data displayed on the Task Manager Performance tab compares to performance counters displayed in System Monitor.
Table 2-11  Task Manager Performance Tab Compared to the Performance Console

CPU Usage (bar graph and chart)
  System Monitor counter: Processor(_Total)\% Processor Time
  The percentage of time the processor or processors were busy executing application or operating system instructions. This counter provides a general indication of how busy the computer is.

PF Usage (bar graph) and Page File Usage History
  System Monitor counter: Memory\Committed Bytes
  A graphical representation of the Commit Charge Total. This is the total amount of virtual memory in use at that instant.

Handles
  System Monitor counter: Process(_Total)\Handle Count
  The total number of open handles in all processes currently running on the system.

Threads
  System Monitor counter: System\Threads
  The total number of threads currently running on the system.

Processes
  System Monitor counter: System\Processes
  The number of running processes.

Physical Memory - Total
  System Monitor counter: Not available
  The total amount of physical memory installed and recognized by the operating system.

Physical Memory - Available
  System Monitor counter: Memory\Available KBytes
  The amount of physical memory that can be allocated to a process immediately.

Physical Memory - System Cache
  System Monitor counter: Memory\Cache Bytes
  The amount of physical memory allocated to the system working set; includes System Cache Resident Bytes, Pool Paged Resident Bytes, System Driver Resident Bytes, and System Code Resident Bytes.

Commit Charge - Total
  System Monitor counter: Memory\Committed Bytes
  The amount of virtual memory allocated and in use (that is, committed) by all processes on the system.

Commit Charge - Limit
  System Monitor counter: Memory\Commit Limit
  The maximum amount of virtual memory that can be committed without enlarging the paging file.

Commit Charge - Peak
  System Monitor counter: Not available
  The most virtual memory that has been committed since system startup.

Kernel Memory - Total
  System Monitor counter: Not available
  The sum of nonpaged and paged system memory from the nonpaged and paged memory pools that is currently in use.

Kernel Memory - Paged
  System Monitor counter: Memory\Pool Paged Resident Bytes
  The amount of system memory from the paged pool that is currently resident.

Kernel Memory - Nonpaged
  System Monitor counter: Memory\Pool Nonpaged Bytes
  The amount of system memory from the nonpaged pool that is currently in use.
Monitoring the Network
The Networking tab displays the network traffic to and from the computer by network
adapter. Network usage is displayed as a percentage of theoretical total available network capacity for that network adapter, RAS connection, established VPN, or other
network connection.
Click the Networking tab to see the status of the network. Table 2-12 shows how the
data displayed on the Task Manager Networking tab compares to performance
counters in System Monitor. Not all of these columns are displayed by default; use the
Select Columns command on the View menu to add columns to the Task Manager
display.
Table 2-12  Comparing the Task Manager Networking Tab with the Performance Console

Adapter Description
  System Monitor counter: Network Interface(instance)
  The name of the network connection being monitored. This is used as the instance name of the performance object for this network adapter.

Network Adapter Name
  System Monitor counter: Not available
  The name of the network connection.

Network Utilization
  System Monitor counter: Not available
  The Network Interface(*)\Bytes Total/sec counter divided by the Network Interface(*)\Current Bandwidth counter and displayed as a percentage.

Link Speed
  System Monitor counter: Network Interface(*)\Current Bandwidth
  The theoretical maximum bandwidth of this network adapter.

State
  System Monitor counter: Not available
  The current operational state of this network adapter.

Bytes Sent Throughput
  System Monitor counter: Not available
  The rate at which bytes were sent from this computer using this network adapter, divided by the Link Speed and displayed as a percentage. This value is based on the same data that is used by the Network Interface(*)\Bytes Sent/sec performance counter.

Bytes Received Throughput
  System Monitor counter: Not available
  The rate at which bytes were received by this computer using this network adapter, divided by the Link Speed and displayed as a percentage. This value is based on the same data that is used by the Network Interface(*)\Bytes Received/sec performance counter.

Bytes Throughput
  System Monitor counter: Not available
  The rate at which bytes were sent from or received by this computer using this network adapter, divided by the Link Speed and displayed as a percentage. This value is based on the same data that is used by the Network Interface(*)\Bytes Total/sec performance counter.

Bytes Sent
  System Monitor counter: Not available
  The total number of bytes sent from this computer since the computer was started. This value is based on the same data that is used by the Network Interface(*)\Bytes Sent/sec performance counter.

Bytes Received
  System Monitor counter: Not available
  The total number of bytes received by this computer since the computer was started. This value is based on the same data that is used by the Network Interface(*)\Bytes Received/sec performance counter.

Bytes
  System Monitor counter: Not available
  The total number of bytes sent from or received by this computer since the computer was started. This value is based on the same data that is used by the Network Interface(*)\Bytes Total/sec performance counter.

Bytes Sent/Interval
  System Monitor counter: Network Interface(*)\Bytes Sent/sec
  The number of bytes sent from this computer during the sample interval.

Bytes Received/Interval
  System Monitor counter: Network Interface(*)\Bytes Received/sec
  The number of bytes received by this computer during the sample interval.

Bytes/Interval
  System Monitor counter: Network Interface(*)\Bytes Total/sec
  The number of bytes sent from or received by this computer during the sample interval.

Unicasts Sent
  System Monitor counter: Not available
  The total number of unicast packets sent from this computer since the computer was started. This value is based on the same data that is used by the Network Interface(*)\Packets Sent Unicast/sec performance counter.

Unicasts Received
  System Monitor counter: Not available
  The total number of unicast packets received by this computer since the computer was started. This value is based on the same data that is used by the Network Interface(*)\Packets Received Unicast/sec performance counter.

Unicasts
  System Monitor counter: Not available
  The total number of unicast packets sent from or received by this computer since the computer was started.

Unicasts Sent/Interval
  System Monitor counter: Network Interface(*)\Packets Sent Unicast/sec
  The number of unicast packets sent from this computer during the sample interval.

Unicasts Received/Interval
  System Monitor counter: Network Interface(*)\Packets Received Unicast/sec
  The number of unicast packets received by this computer during the sample interval.

Unicasts/Interval
  System Monitor counter: Not available
  The number of unicast packets sent from or received by this computer during the sample interval.

Nonunicasts Sent
  System Monitor counter: Not available
  The total number of nonunicast packets sent from this computer since the computer was started. This value is based on the same data that is used by the Network Interface(*)\Packets Sent Non-Unicast/sec performance counter.

Nonunicasts Received
  System Monitor counter: Not available
  The total number of nonunicast packets received by this computer since the computer was started. This value is based on the same data that is used by the Network Interface(*)\Packets Received Non-Unicast/sec performance counter.

Nonunicasts
  System Monitor counter: Not available
  The total number of nonunicast packets sent from or received by this computer since the computer was started.

Nonunicasts Sent/Interval
  System Monitor counter: Network Interface(*)\Packets Sent Non-Unicast/sec
  The number of nonunicast packets sent from this computer during the sample interval.

Nonunicasts Received/Interval
  System Monitor counter: Network Interface(*)\Packets Received Non-Unicast/sec
  The number of nonunicast packets received by this computer during the sample interval.

Nonunicasts/Interval
  System Monitor counter: Not available
  The number of nonunicast packets sent from or received by this computer during the sample interval.
Monitoring Users
The Users tab displays those users currently logged on to the computer. You can use
this display to quickly identify the users logged on to that system.
Click the Users tab to see the status of all users currently logged on. To select the columns you want to view, on the View menu, click Select Columns. Table 2-13 gives a
description of each of these columns.
Table 2-13  Task Manager Columns for Users

User
  Name of the person logged on.
ID
  Number that identifies the user session.
Status
  Active or Disconnected.
Client Name
  Name of the computer using the session. If it is a local session, this field appears blank.
Session
  Console or Terminal Services.
Using Task Manager, you can perform a number of user-related tasks. These include the ability to:

■ End a user session (Console session or Terminal session). Select the session, and then click the Disconnect button. This allows you to disconnect a user while the user’s session continues to run the applications.
■ Close a session and all applications that are running on that session. Select the session and click Log Off.

Warning
If the user has any unsaved data, that data might be lost.

■ Send a message to a specific user. Select the user, and then click Send Message.
■ Take remote control of a Terminal Services session. Right-click a user, and then select Remote Control.
Automated Performance Monitoring
Performance problems do not occur only when you have the opportunity to observe
them. Background performance monitoring procedures provide you with a way to
diagnose performance problems that occurred in the recent past while you were not
looking. This section documents the two automated tools in Windows Server 2003
that allow you to establish background performance monitoring procedures.
The first method uses the Counter Logs facility in the Performance Logs and Alerts section of the Performance Monitor console, which has a graphical user interface. The second method uses the Logman, Typeperf, and Relog command-line tools. The Counter Logs facility is discussed first, followed by the performance monitoring command-line utilities and some of their additional capabilities.
These are the keys to establishing effective automated performance monitoring
procedures:
■ Knowing which current performance statistics you want to collect on a regular basis
■ Knowing in what form and how frequently you want to collect the data
■ Knowing how much historical performance data you need to keep to go back in time and resolve a problem that occurred in the recent past or to observe historical trends

This section examines the data logging facilities you will use to gather performance statistics automatically. Chapter 3, “Measuring Server Performance,” provides advice about what metrics to gather and how to interpret them. Chapter 4, “Performance Monitoring Procedures,” offers recommendations for setting up automated performance logging procedures.
Performance Logs and Alerts
When you open the Performance Monitor, you will notice the Performance Logs and
Alerts function in the left tree view. There are three components to Performance Logs
and Alerts: counter logs, trace logs, and alerts. This section discusses only the use of
counter logs. Trace logs and alerts are discussed later in this chapter in sections entitled “Event Tracing for Windows” and “Alerts,” respectively.
You create counter logs using the Performance Logs and Alerts tool whenever you
require detailed analysis of performance statistics for your servers. Retaining, summarizing, and analyzing counter log data collected over a period of several months is beneficial for capacity planning and deployment. Using the Performance Logs and Alerts
tool, designated support personnel can:
■ Manage multiple logging sessions from a single console window.
■ Start and stop logging manually, on demand, or automatically, at scheduled times for each log.
■ Stop each log based on the elapsed time or the current file size.
■ Specify automatic naming schemes and stipulate that a program be run when a log is stopped.
The Performance Logs and Alerts service process, Smlogsvc.exe, is responsible for executing the performance logging functions you have defined. Comparable performance data logging capabilities are also available using command-line tools that interface with the same Performance Logs and Alerts service process. These command-line tools are discussed in “Creating Performance Logs Using Logman” in this chapter.
Counter Logs
Counter logs record to a log file the same performance statistics on hardware resource
usage and system services that you can gather and view interactively in the System
Monitor. Both facilities gather performance objects and counters through a common
performance monitoring API. Counter logs are suited for gathering much more information than an interactive System Monitor console session. With the performance
statistics stored in a log file, you no longer have to worry about newer counter values
replacing older performance data values in an interactive data gathering session. All
the interval performance data that was gathered for the duration of the counter logging session is available for viewing and reporting.
After you create a counter log file, you can use the System Monitor to view and analyze
the performance data you collected. To access counter data from a log instead of viewing counters interactively, use the View Log Data button in System Monitor. You can
use the System Monitor both during and after log file data collection to view the performance data you have collected.
Counter Log File Formats
Data in counter logs can be saved in the file formats shown in Table 2-14.
Table 2-14  Counter Log File Formats

Binary (.blg)
  Binary log format. This format is used if another format is not specified. When a binary log file reaches its maximum size, data collection stops.

Binary circular (.blg)
  Circular binary format. This file format is the same as binary format. However, when a circular binary log file reaches its maximum size, data collection continues, wrapping around to the beginning of the file. Once the file is filled with data, the oldest data in the file is overwritten as new data is recorded.

Text, comma-separated (.csv)
  Comma-separated values. This format is suitable for creating performance data in a format compatible with spreadsheet programs like Excel.

Text, tab-separated (.tsv)
  Tab-separated values. This format is also suitable for creating performance data in a format compatible with spreadsheet programs like Excel.

SQL database (System DSN)
  SQL database format. This is valid only if data is being logged to a SQL database.
Binary log files are the recommended file format for most counter logging, especially
any logging sessions in which you expect a sizable amount of data. They are the most
efficient way to gather large amounts of performance data, and they store counter data
more concisely than any other format. Because binary log files are readily converted
into the other formats that are available, there is very little reason not to use binary log
files initially. Use the Relog command-line tool to convert binary format log files to
other formats as needed.
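For example, the following sketch converts a binary counter log to a comma-separated file that Excel can open directly; the file names here are hypothetical:

relog C:\Perflogs\basic_log.blg -f csv -o C:\Perflogs\basic_log.csv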
Note
Binary file format is designed to be used by System Monitor. To interchange
performance data with other applications like Microsoft Excel, use text file format. The
precise format of a binary log file is open and documented. You could write a program
to read a binary log file using the Performance Data Helper interface.
Binary circular files have the same format as binary linear files. They record data until
they reach a user-defined maximum size. Once they reach this size, they will overwrite
existing log data starting from the beginning of the file. Use this format if you want to
ensure that the most current performance data is always accessible and you do not
need to keep older sampled data once the log file maximum is reached.
Structured Query Language (SQL) database format allows all data to be quickly and
easily imported into a SQL database. This format is ideal for archiving summarized
data, especially if you are responsible for tracking performance on multiple computers. Chapter 4, “Performance Monitoring Procedures,” illustrates the use of a SQL
database for longer term archival and retrieval of counter log data to support capacity
planning.
Scheduling Collection Periods
You can start and stop counter logs manually or schedule them to run automatically. You can define an automatic start and end time for a counter logging session, or specify that the logging session gather performance data continuously.
At the end of a designated logging period, you can start a new logging session
and run a command to process the counter log file that has just been completed.
File Management
You can generate unique names for counter log files that you create. You can choose to
number log files sequentially or append a time or date stamp that identifies when the
counter log file was created. You can also set a limit on the size that any log file can
grow to. Using the circular binary file format, you can ensure that your counter log file
will never consume more than the designated amount of disk space.
Working with Counter Logs
You can find Performance Logs and Alerts in the Performance Monitor console and
the Computer Management console. The following procedure describes how to start
them.
To start Performance Logs and Alerts from the Performance console
1. Click Start, point to Run, and then type Perfmon.
2. Press the ENTER key.
3. Double-click Performance Logs and Alerts to display the available tools.
Figure 2-12 shows the Performance console tree.
Figure 2-12  Performance Logs and Alerts console tree
After you load the Performance Logs and Alerts console, you will need to configure
the Counter logs.
To configure counter logs
1. Click the Counter Logs entry to select it.
Previously defined logs and alerts appear in the appropriate node of the details
pane. A sample settings file for a counter log named System Overview in <system drive>:\perflogs\system_overview.blg is included with Windows
Server 2003. These counter log settings allow you to monitor basic counters
from the memory, disk, and processor objects.
2. Right-click the details pane to create a new log. You can also use settings from an
existing HTML file as a template.
Note
To run the Performance Logs and Alerts service, you must be a member
of the Performance Log Users or Administrators security groups. These groups
have special security access to a subkey in the registry to create or modify a log
configuration.
3. In the New Log Settings box, type the name of your counter log session and
click OK.
Figure 2-13 shows the General tab for the Properties of a new counter log after
you have entered the counter log name.
Figure 2-13  General tab for a counter log
4. To configure a counter log, click Add Objects or Add Counters to specify objects,
counters, and instances. You also set the data collection interval on the General
tab. Use the Add Objects button to log all counters from all instances of a performance object that you have selected. Use the Add Counters button to see the
same familiar dialog box that is used to add counters to an interactive System
Monitor data collection session.
5. Click the Log Files tab, shown in Figure 2-14, to set the file type, the file naming
convention, and other file management options.
Figure 2-14  Configuring counter log file properties
The Log Files tab is used to select the file type and automatic file naming options. You can generate unique log file names that are numbered consecutively, or you can add a date and time stamp to the file name automatically. Alternatively, you can choose to write all performance data to the same log file, specifying that current performance data overwrites any older data in the file. After you set the appropriate option, the Log Files tab displays an example of the automatic file names that will be generated for you.
6. Click the Configure button to set the file name, location, and a file size limit in
the Configure Log Files dialog box, as shown in Figure 2-15. Click OK to close
the dialog box.
Figure 2-15  Dialog box for configuring log file size limits
7. Click the Schedule tab (Figure 2-16) to choose manual or automatic startup
options. You can then set the time you want the logging session to end using an
explicit end time; a duration value in seconds, minutes, hours, or days; or when
the log file reaches its designated size limit.
Figure 2-16  Setting startup options on the Schedule tab
The Counter log properties that allow you to establish automated performance monitoring procedures are summarized in Table 2-15.
Table 2-15  Summary of Counter Log Properties

General tab
  Add objects, or add counters: You can collect performance data from the local computer and from remote computers.
  Sample interval: Defaults to once every 15 seconds.
  Account and password: You can use Run As to provide the logon account and password for data collection on remote computers.

Log Files tab
  File type: Counter logs can be defined as comma-delimited or tab-delimited text files, as binary or binary circular files, or as SQL database files. For database files, use Configure to enter a repository name and data set size. For all other files, use Configure to enter location, file name, and log file size.
  Automatic file naming: You can choose to add unique file sequence numbers to the file name or append a time and date stamp to identify it.

Schedule tab
  Manual or automated start and stop methods and schedule: You can specify that the log stop collecting data when the log file is full.
  Automated start and stop times: Start and stop by time of day, or specify the log start time and duration.
  Stop when the file is full: Automatically stop data collection when the file reaches its maximum size.
  Processing when the log file closes: For continuous data collection, start a new log file when the log file closes. You can also initiate automatic log file processing by running a designated command when the log file closes.
Note
Sequential counter log files can grow larger than the maximum file size specified. This occurs for counter logs because the log service waits until after the last data
sample was gathered and written to the file before checking the size of the log file. At
this point, the file size might already exceed the defined limit.
Analyzing Counter Logs
Once you have created a counter log, you can use the System Monitor to analyze the
performance data it contains. If you need to manipulate the measurement data further, run the Relog tool to create a text format file that you can read using a spreadsheet application like Excel, as discussed further in the section entitled “Managing
Performance Logs.”
To analyze counter logs using the System Monitor
1. Open System Monitor and click the View Log Data button.
2. On the Source tab (shown in Figure 2-17), click Log Files as the data source and
use the Add button to open the log file you want to analyze. You can specify one
or more log files or a SQL database.
3. Click the Time Range button to set the start and end time of the interval you
want to analyze using the slider control.
Remember that a System Monitor Chart View can display only 100 data points at
a time. When you are reporting on data from a log file, the Chart View automatically summarizes sequences longer than 100 measurement intervals to fit the display. If the counter contains 600 measurement samples, for example, the Chart
View line graphs will show the average of every six data points. The Duration
field in the Value bar will display the duration of the time interval you have
selected to view. You can also use Report View and Histogram View to display
numeric averages, minimum values, and maximum values for all the counter
values stored in the log file.
Note
When you are working with a log file, you can only add counters that
are available in the counter file to a chart or report.
4. Right-click the display to relog a binary data file to a text file format using the
Save Data As function. Relogging binary data to text format allows you to analyze the counter log data using a spreadsheet application like Excel. To convert
a binary data file into a SQL database format, you must run the Relog commandline tool instead.
Tip You can also summarize a counter log when you relog it by setting the summarization interval. For example, if the original counter log contains measurement data gathered every 5 minutes, reducing the log file size by writing data only once every 12 intervals creates a counter log that contains hourly data.
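A minimal command-line sketch of this kind of summarization, using the -t option of Relog to keep only every 12th sample; the file names here are hypothetical:

relog C:\Perflogs\detailed_log.blg -t 12 -o C:\Perflogs\hourly_log.blg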
Figure 2-17  Changing the time range to view a subset of the counter log data
Tips for Working with Performance Logs and Alerts
Windows Server 2003 Help for Performance Logs and Alerts describes performing
the most common tasks with logs and alerts, including how to create and configure a
counter log.
Table 2-16 gives tips that you can use when working with the Performance Logs and
Alerts snap-in.
Table 2-16  Tips for Working with Performance Logs and Alerts

Export log data to a spreadsheet for reporting purposes
  Exporting log data to a spreadsheet program such as Excel offers easy sorting and filtering of data. For best results, relog binary counter log files to text file format, either CSV or TSV.

Record intermittent data to a log
  Not all counter log file formats can accommodate data that is not present at the start of a logging session. For example, if you want to record data for a process that begins after you start the log, select one of the binary (.blg) formats on the Log Files tab.

Limit log file size to avoid disk-space problems
  If you choose automated counter logging with no scheduled stop time, the file can grow to the maximum size allowed based on available space. When you set this option, consider your available disk space and any disk quotas that are in place. Change the file path from the default (the Perflogs folder on the local computer) to a location with adequate space if appropriate. During a logging session, if the Counter Logs service cannot update the file because of lack of disk space, an event is written to the Application Event Log showing an error status of “Error disk full.”

Name files for easy identification
  Use File Name (in the Configure Log Files box) and End File Names With (on the Log Files tab) to make it easy to find specific log files. For example, you can set up periodic logging, such as a log for every day of the week, and then develop different naming schemes with the base name being the computer where the log was run, or the type of data being logged, followed by the date as the suffix. For example, a naming scheme that generates a file named ServerRed1_050212.blg indicates a log created on a computer named ServerRed1 on May 2nd at noon, assuming the End File Names With entry was set to mmddhh.
Creating Performance Logs Using Logman
Suppose you want to monitor the amount of network traffic generated by your backup program, an application that runs at 3:00 each morning. The Counter Log facility in Performance Logs and Alerts allows you to record performance data to a log file, thus providing for automated and unattended performance monitoring. But even though the Performance Logs and Alerts snap-in is an excellent tool for carrying out these tasks, it does have some limitations:
■ Each counter log setting must be created using the graphical user interface. Although this makes it possible for novice users to create performance monitors, experienced users might find the process slower and less efficient than creating performance monitors from the command line.
■ Performance log settings created with the snap-in cannot be accessed from batch files or scripts.
The Logman command-line tool helps overcome these limitations. In addition, Logman is designed to work with other command-line tools like Relog and Typeperf to
allow you to build reliable automated performance monitoring procedures. This section documents the use of Logman to create and manage counter log files. For more
information about Logman, in Help and Support Center for Microsoft Windows
Server 2003, click Tools, and then click Command-Line Reference A–Z. The Logman
facilities related to creating trace logs are discussed in “Event Tracing for Windows”
later in this chapter.
Log Manager Overview
Log Manager (Logman.exe) is a command-line tool that complements the Performance Logs and Alerts snap-in. Log Manager replicates the functionality of Performance Logs and Alerts in a simple-to-use command-line tool. Among other things,
Log Manager enables you to:
■ Quickly create performance log settings from the command line.
■ Create and use customized settings files that allow you to copy the monitoring configuration and reuse it on other computers.
■ Call performance logging within batch files or scripts. These batch files or scripts can then be copied and used on other computers in your organization.
■ Simultaneously collect data from multiple computers.
The data that Log Manager collects is recorded in a performance counter log file using
the format you specify. You can use the System Monitor to view and analyze any
counter log file that Log Manager creates.
Command Syntax
Logman operates in one of two modes. In Interactive mode, you can run Logman from
a command-line prompt and interact with the logging session. For example, you can
control the start and stop of a logging session interactively. In Background mode, Logman creates Counter log configurations that are scheduled and processed by the same
Performance Logs and Alerts service that is used with the Performance Monitor console. For more information about Performance Logs and Alerts, see “Performance
Logs and Alerts” in this chapter.
Table 2-17 summarizes the six basic Logman subcommands.
Table 2-17  Logman Subcommands

Create counter CollectionName
  Creates collection queries for counter data collection sessions.

Update CollectionName
  Updates an existing collection query to modify the collection parameters.

Delete CollectionName
  Deletes an existing collection query.

Query CollectionName
  Lists the collection queries that are defined and their status. Use query CollectionName to display the properties of a specific collection. To display the properties on remote computers, use the -s RemoteComputer option on the command line.

Start CollectionName
  Starts a logging session manually.

Stop CollectionName
  Stops a logging session manually.
Collection queries created using Log Manager contain properties settings identical to
the Counter Logs created using the Performance Logs and Alerts snap-in. If you open
the Performance Logs and Alerts snap-in, you will see any collection queries you created previously using Log Manager. Likewise, if you use the -query command-line
parameter in Log Manager to view a list of collection queries on a computer, you will
also see any counter logs created using the Performance Logs and Alerts snap-in.
Table 2-18 summarizes the Logman command-line parameters that are used to set the
properties of a logging session.
Table 2-18  Logman Counter Logging Session Parameters

Settings file: -config FileName
  Use the logging parameters defined in this settings file.

Computer: -s ComputerName
  Specify the computer you want to gather the performance counters from. If no computer name is provided, the local computer is assumed.

Counters: -c Path [Path ...]
  Specify the counters that you want to gather. Required. Use -cf FileName to use counter settings from an existing settings file.

Sample interval: -si [[HH:]MM:]SS
  Specify the interval between data collection samples. Defaults to 15 seconds.

Output file name: -o {Path | DSN!CounterLog}
  Specify the output file name. If the file does not exist, Logman will create it. Required. Use the -v option to generate unique file names.

File versioning: -v {NNNNNN | MMDDHHMM}
  Generate unique file names, either by numbering them consecutively or by adding a time and date stamp to the file name.

Log file format: -f {bin | bincirc | csv | tsv | SQL}
  Choose the format of the output counter log file. Defaults to binary format.

File size limit: -max Value
  Specify the maximum log file or database size in MB. Logging ends when the file size limit is reached.

Create new log file at session end: -cnf [[HH:]MM:]SS
  Create a new log file when the file size limit or logging duration is exceeded. Requires that -v versioning be specified to generate unique file names.

Run command at session end: -rc FileName
  Run this command at session end. Use in conjunction with the -cnf and -v options.

Append: -a
  Append the output from this logging session to an existing file.

Begin logging: -b M/D/YYYY H:MM:SS [{AM | PM}]
  Begin a logging session automatically at the designated date and time.

End logging: -e M/D/YYYY H:MM:SS [{AM | PM}]
  End a logging session automatically at the designated date and time. Or use -rf to specify the duration of a logging session.

Log duration: -rf [[HH:]MM:]SS
  End a logging session after this amount of elapsed time. Or use -e to specify a log end date and time.

Repeat: -r
  Repeat the collection every day at the same time. The time period is based on either the -b and -rf options, or the -b and -e options. Use in conjunction with the -cnf, -v, and -rc options.

Start and stop data collection: -m [start] [stop]
  Start and stop an interactive logging session manually.

User name and password: -u UserName Password
  Specify the user name and password for remote computer access. The user account must be a member of the Performance Log Users group. Specify * to be prompted for the password at the command line.
Creating Log Manager Collection Queries
Before you can use Log Manager to start logging performance data, you must create a
collection query, which is a set of instructions that specifies which computer to monitor, which performance counters to gather, how often to gather those counters, and
other performance data collection parameters.
Collection queries are created using the create counter parameter, followed by the
name to give to the collection query, and then a list of performance counters to be
monitored. For example, the following command creates a collection query called
web_server_log:
logman create counter web_server_log
Although the preceding command creates web_server_log, it does not specify any performance counters. To create a collection query that actually collects data, you need to
use the -c parameter to indicate the performance counters to be sampled. For example, the following command creates a collection query called web_server_log that can
be used to monitor and log available bytes of memory:
logman create counter web_server_log -c "\Memory\Available Bytes"
If you want to monitor multiple performance counters with a single collection query,
list each of these counters after the -c parameter. The following example creates a collection query that measures three performance counters:
logman create counter web_server_log -c "\Memory\Available Bytes" "\Memory\Pages/sec"
"\Memory\Cache Bytes"
Using a Settings File with Log Manager
When creating a collection query that monitors a large number of performance
counters, placing those counters in a counter settings file is easier than typing them as
part of the command-line string. For example, to monitor disk drive performance, you
could use a command string similar to this:
logman create counter web_server_log -c "\PhysicalDisk\Avg. Disk Bytes/Read"
"\PhysicalDisk\Avg. % Disk Read Time" "\PhysicalDisk\Avg. Split IOs/sec"
"\PhysicalDisk\Avg. Current Disk Queue Length" "\PhysicalDisk\Avg. Disk Bytes/Read"
Alternatively, you could list the performance counters in a text file (one counter per
line) and then reference this file when creating the collection query. A sample settings
file is shown in Listing 2-1.
Listing 2-1 Log Manager Settings File
"\PhysicalDisk\Avg. Disk Bytes/Read"
"\PhysicalDisk\Avg. % Disk Read Time"
"\PhysicalDisk\Avg. Split IOs/sec"
"\PhysicalDisk\Avg. Current Disk Queue Length"
"\Avg. PhysicalDisk\Avg. Disk Bytes/Read"
To create a collection query that reads a settings file, use the -cf parameter followed by
the path to the file. For example, if your settings file is stored in
C:\Scripts\Counters.txt, use the following command string:
logman create counter web_server_log -cf c:\scripts\counters.txt
Settings files enable you to create collection queries that can be used consistently on
many computers. For example, if you plan to monitor disk drive performance on 20
different computers, you can use a common settings file to create and distribute consistent collection queries.
Tip Use the query (-q) option of the Typeperf tool to list counter paths and save
them in a text file, which you can then edit and use as a Logman settings file to minimize typing errors.
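For example, the following pair of commands (a sketch; the counter object, file paths, and collection name are placeholders) saves the PhysicalDisk counter paths to a text file and then uses that file to create a collection query:
typeperf -q \PhysicalDisk -o c:\scripts\counters.txt
logman create counter disk_log -cf c:\scripts\counters.txt -o c:\perflogs\disk_log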
For more information about how to specify the counter path correctly, see “Performance Counter Path” earlier in this chapter.
Keep in mind that the Logman tool was designed for scripting. One way to create the
same collection query on multiple computers is to create a batch file with the command string. That batch file can then be copied to and run on multiple computers,
resulting in the exact same collection query being created on each computer, as sketched below.
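A minimal sketch of such a batch file follows; the computer names, file paths, and collection name are placeholders. Because Logman also accepts the -s option, the same batch file can instead be run once from a single administrative machine and loop over a list of remote computers:

rem create_disk_log.bat -- creates an identical collection query on each listed computer
for %%C in (WebServer FileServer DatabaseServer) do (
    logman create counter disk_log -cf c:\scripts\counters.txt -o c:\perflogs\disk_log -s %%C
)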
Monitoring Remote Computers Using Log Manager
Log Manager can be used to monitor remote computers, and, if you choose, to consolidate the performance statistics from each of these computers into the same log file.
To monitor a remote computer, add the -s parameter followed by the computer name.
For example, the following collection query monitors available bytes of memory on
the remote computer DatabaseServer:
logman create counter web_server_log –c "Memory\Available bytes" –s DatabaseServer
To carry out performance monitoring on remote computers, you must have the necessary permissions. If the account from which you are working does not have the
required permissions, you can specify a user name and password for an account that
does by using the -u (user name and password) parameters. For example, this command creates a collection query that runs under the user name jones, with the password mypassword:
logman create counter file_server_log -cf c:\Windows\logs\counters.txt -s FileServer
-u jones mypassword
To add monitoring for another remote computer to the same logging session, use the
update parameter:
logman update counter file_server_log -cf c:\Windows\logs\counters.txt -s WebServer
-u jones mypassword
Note If you specify an asterisk as the password (-u jones *), you are prompted for
the password when you start the collection query. This way, the password is not saved
in the collection query, nor is it echoed to the screen.
Configuring the Log Manager Output File
Log Manager does not display performance data on-screen. Instead, all performance
measurements are recorded in an output log file. Whenever you create a collection
query, you must include the -o parameter followed by the path for the log file. You do
not need to include a file name extension; Log Manager will append the appropriate
file name extension based on the format of the output file.
By default, Log Manager will save output files to the directory where you issued the
command. For example, this command saves the output file in C:\My Documents\Web_server_log.blg, assuming you issued the command from the folder
C:\My Documents:
logman create counter web_server_log -c "\Memory\Available Bytes" -o web_server_log
To save your files somewhere other than in the default folder, include the full path.
For example, to have performance data logged to the file
C:\Scripts\Web_server_log.blg, use this command:
logman create counter web_server_log -c "\Memory\Available Bytes" -o
c:\scripts\web_server_log
Or, use a UNC path to save the file to a remote computer:
logman create counter web_server_log -c "\Memory\Available Bytes" -o
\\RemoteComputer\scripts\web_server_log
If the file Web_server_log does not exist, Log Manager creates it. If the file does exist,
Log Manager returns an error saying that the file already exists. To overwrite the file,
use the -y option. If you want Log Manager to append information to an existing log file
(that is, to add new data without erasing existing data), you must use the -a parameter:
logman create counter web_server_log -c "\Memory\Available Bytes" -o web_server_log -a
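By way of illustration, this variant of the preceding command (a sketch) silently overwrites any existing Web_server_log output file instead of appending to it:
logman create counter web_server_log -c "\Memory\Available Bytes" -o web_server_log -y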
As an alternative to logging data to a text file, you can record data in a SQL database,
provided you have used the ODBC Data Source Administrator to create a system Data
Source Name (DSN) for that database. If a DSN exists, you can log performance data
to a SQL database using the format DSN!Table_name, where DSN represents the Data
Source Name, and Table_name represents the name of a table within that database. (If
the table does not exist, Log Manager will create it.) For example, the following command logs performance data into a table named DailyLog in the PerformanceMonitoring database:
logman create counter web_server_log -c "\Memory\Available Bytes" -o
PerformanceMonitoring!DailyLog -f SQL
Note The -f (format) parameter is required when saving data to a SQL database.
The DSN created must be a system DSN and not a user DSN. Logman will fail if a user
DSN is used. Additionally, the DSN must point to a database that already exists. Your
SQL database administrator can create a SQL database for you to use.
Adding Versioning Information to Log File Names with Log Manager
Versioning allows you to automatically generate unique counter log file names. Log
Manager supports two different versioning methods:
■ Numeric With numeric versioning, file names are appended with an incremental 6-digit numeric suffix. For example, the first log file created by a collection query might be called Web_server_log_000001.blg. The second file would then be called Web_server_log_000002.blg.
■ Date/Time With date/time versioning, file names are appended with the current month, day, and time (based on a 24-hour clock), using the mmddhhmm format (month, date, hour, minute). For example, a file saved at 8:30 A.M. on January 23 might be named Web_server_log_01230830.
To add versioning information to your file names, use the -v parameter, followed by
the appropriate argument. Logman uses nnnnnn as the argument for adding numeric
versioning, and mmddhhmm as the argument for date/time versioning. These are the
only two valid arguments.
Note
Performance Logs and Alerts supports some additional date formats, including yyyymmdd, which appends the year, month, and day to the end of the file name.
These additional formats cannot be set using Log Manager; however, you can create a
collection query using Log Manager, and then use the Performance Logs and Alerts
snap-in to modify the versioning format.
For example, this command configures versioning using the numeric format:
logman create counter web_server_log -c "\Memory\Available Bytes" -o web_server_log
-a -v nnnnnn
This command configures versioning using the date/time format:
logman create counter web_server_log -c "\Memory\Available Bytes" -o web_server_log
-a -v mmddhhmm
Formatting the Log Manager Output File
Log Manager allows you to specify the data format for your output file. To designate
the file format for a Log Manager output file, use the -f parameter followed by the format type. For example, this command sets the file format to circular binary:
logman create counter web_server_log -c "\Memory\Available Bytes" -o web_server_log
-f BINCIRC
Valid file formats are described in Table 2-19. Logman uses the same file formats as
the Counter log facility in Performance Logs and Alerts. (See Table 2-14 for more
details.) Binary file format is the most efficient and concise way to store counter log
data. Because the Relog tool can be used to convert binary files into any other format,
there is little reason to use anything but binary format files.
Table 2-19 Log Manager File Formats

BIN      Binary format.
BINCIRC  Circular binary format.
CSV      Comma-separated values.
TSV      Tab-separated values.
SQL      SQL database format.
Specifying a Maximum Size for the Log Manager Output File
Although Log Manager output files are relatively compact, they can still potentially
grow quite large. For example, a collection query that monitors two performance
counters every 15 seconds creates, after 24 hours, a file that is about one megabyte in
size. This might be a reasonable size for a single computer. However, if you have multiple computers logging performance data to the same file, that file might grow so
large that it would become very difficult to analyze the data.
To keep log files to a manageable size, you can set a maximum file size. To do this,
add the -max parameter followed by the maximum file size in megabytes. (If you are
logging data to a SQL database, the number following the -max parameter represents
the maximum number of records that can be added to the table.) For example, to
limit the file Web_server_log.blg to 5 megabytes, use this command:
logman create counter web_server_log -c "\Memory\Available Bytes" -o web_server_log
-a -f CSV -max 5
File size limit checking If you specify a file size limit using the -max value, the file
system where you plan to create the file is evaluated before the counter log session
starts to see whether there is adequate space for the session to run to completion. This
file size limit check is performed only once, when the log file you are creating is first
stored on the system drive. If the drive has insufficient space, the logging session will
fail, with an smlogsvc error message reported in the event log similar to the following:
An error occurred while trying to update the log file with the current data
for the <session name> log session. This log session will be stopped.
The Pdh error returned is: Log file space is insufficient to support this operation.
An error return code of 0xC0000188 in this error message indicates an out-of-space
condition that prevented the logging session from being started.
Note that additional configuration information is written to the front of every counter
log file, so slightly more disk space than you have specified in the -max parameter
is actually required to start a logging session.
Creating new log files automatically When a log file reaches its maximum size,
the default behavior for Log Manager is to stop collecting data for that collection
query. Alternatively, you can have Log Manager automatically start recording data to a
new output file should a log file reach its maximum size. You do this with the -cnf (create new file) parameter. When you specify the create new file parameter, you must use
the -v versioning option to generate unique file names. For example, the following
command instructs Log Manager to create a new log file each time the maximum size
is reached, and to use the numeric versioning method for naming files:
logman create counter web_server_log -c "\Memory\Available Bytes" -o web_server_log
-a -f CSV -v nnnnnn -cnf
You can also force Log Manager to start a new log file after a specified amount of time
with -cnf hh:mm:ss, where hh is hours, mm is minutes, and ss is seconds. For example,
the following command causes Log Manager to start a new log file every 4 hours:
logman create counter web_server_log -c "\Memory\Available Bytes" -o web_server_log -a -f
BINCIRC -v nnnnnn -cnf 04:00:00
Configuring the Log Manager Sampling Interval
The default Log Manager sampling interval is 15 seconds, similar to the Performance
Logs and Alerts facility. The Log Manager sampling interval can be changed using the
-si parameter followed by the interval duration. To configure the sampling interval, use
-si hh:mm:ss, where hh is hours, mm is minutes, and ss is seconds. Partial values
for the sampling interval can be specified. For example, this command sets the
sampling interval to 45 seconds:
logman create counter web_server_log -c "\Memory\Available Bytes" -o web_server_log
-a -f CSV -si 45
To sample every 1 minute 45 seconds, the following command can be used:
logman create counter web_server_log -c "\Memory\Available Bytes" -o web_server_log
-a -f CSV -si 1:45
To sample every 1 hour 30 minutes 45 seconds, the following command can be used:
logman create counter web_server_log -c "\Memory\Available Bytes" -o web_server_log
-a -f CSV -si 1:30:45
Scheduling Log Manager Data Collection
By default, Log Manager begins collecting performance data as soon as you start a collection query, and it continues to collect that performance data until you stop it. There
might be times, however, when you want to schedule data collection for a specific
period of time. For example, suppose you have a new backup program that automatically runs from 3:00 A.M. to 4:00 A.M. each morning. To know what sort of stress this
backup program is placing on the system, you could come in at 3:00 A.M., start Log
Manager, and then, an hour later, stop Log Manager. Alternatively, you can schedule
Log Manager to begin data collection automatically at 3:00 A.M. and to stop
monitoring automatically an hour later.
To schedule data collection with Log Manager, you must specify both a beginning
date and time by using the -b parameter, and an ending time by using the -e parameter.
Both of these parameters accept date and time data formatted as M/D/YYYY hh:mm:ssAMPM,
where hh is hours, mm is minutes, ss is seconds, and AMPM designates either morning or
afternoon/evening. For example, this command monitors available bytes of memory
between 3:00 A.M. and 4:00 A.M. on August 1, 2003:
logman create counter web_server_log -c "\Memory\Available Bytes" -o web_server_log
-a -f CSV -b 08/01/2003 03:00:00AM -e 08/01/2003 04:00:00AM
The preceding command causes data collection to begin at 3:00 A.M. and to end—permanently—at 4:00 A.M. If you want data collection to take place every day between
3:00 A.M. and 4:00 A.M., add the repeat (-r) parameter:
logman create counter web_server_log -c "\Memory\Available Bytes" -o web_server_log
-a -f CSV -b 08/01/2003 03:00:00AM -e 08/01/2003 04:00:00AM -r
You can also schedule data collection to take place during only a specified set of
dates. For example, to collect data only for September 1, 2003 through September
5, 2003, add the dates to the -b and -e parameters using the mm-dd-yyyy (monthday-year) format:
logman create counter web_server_log -c "\Memory\Available Bytes" -o web_server_log
-a -f CSV -b 09-01-2003 03:00:00AM -e 09-05-2003 04:00:00AM
Starting, Stopping, Updating, and Deleting Data Collections Using Log Manager
Simply creating a collection query without specifying a scheduled begin time or end
time does not cause Log Manager to start monitoring performance data. Instead, you
must explicitly start data collection by using the start parameter. For example, to
begin collecting data using the web_server_log collection query, type this command:
logman start web_server_log
Once data collection has begun, the collection query will run either until it reaches
the end time (if you have included the -e parameter) or until you stop it by using the
stop parameter. For example, this command stops the collection query web_server_log:
logman stop web_server_log
To help you keep track of all your collections, the query parameter shows you a list of
all the collection queries stored on a computer, as well as their current status (running
or stopped). To view the list of collection queries on the local computer, use this:
logman query
To view the list of collection queries on a remote computer, add the -s parameter and
the name of the computer (in this case, DatabaseServer):
logman query -s DatabaseServer
Note The query parameter will tell you what collection queries are running on a
computer; however, it will not tell you what collection queries are being run against a
computer. For example, you might have 10 collection queries running on your computer, but each could be monitoring performance on a different remote computer. You
will not know this, however, unless you know the specifics of each collection query.
To delete a collection query, use the delete parameter followed by the query name:
logman delete web_server_log
An existing collection query can be updated with new information using the update
parameter followed by the collection name and the parameters you want to update.
For example, to update a collection called web_server_log with a new sample interval of
60 seconds, the following command can be used:
logman update web_server_log -si 60
Any parameter can be updated using update. Note that for an update to take effect, the
collection must be stopped and restarted.
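For example, the following sequence (reusing the commands shown earlier in this section) stops the collection, applies the new sample interval, and restarts the logging session so that the change takes effect:
logman stop web_server_log
logman update web_server_log -si 60
logman start web_server_log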
Using Windows Script Host to Manage Log Manager Data Collection
Log Manager’s built-in scheduling options allow you to schedule data collection at
specific times during the day (for example, between 3:00 A.M. and 4:00 A.M.). By
specifying a different sampling interval, you can also have Log Manager collect data at
regular intervals. (For example, setting the sampling interval to 1:00:00 will cause Log
Manager to take a single sample every hour.)
What you cannot do with Log Manager is schedule data collection for more irregular
time periods. For example, suppose you want to collect performance data for 5 minutes at the beginning of every hour. There is no way to do this using Log Manager
alone.
Note
You could, however, do this using Typeperf. To do this, create a batch file that
configures Typeperf to take only 5 minutes’ worth of samples. Then schedule the
batch file to run once every hour using the Task Scheduler.
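A minimal sketch of that approach follows; the counter, file names, and task name are all placeholders. The batch file collects 300 one-second samples (5 minutes) and then exits, and the Schtasks command registers it to run hourly. Note that each run replaces the previous output file, so use versioned file names or Relog if you need to keep the history:

rem sample5min.bat -- collect 5 minutes of one-second samples, then exit
del c:\perflogs\mem_hourly.csv 2>nul
typeperf "\Memory\Available Bytes" -si 1 -sc 300 -o c:\perflogs\mem_hourly.csv

schtasks /create /tn "Hourly5MinSample" /tr c:\scripts\sample5min.bat /sc hourly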
However, because Logman is scriptable, you can combine Windows Script Host
(WSH) and Log Manager to schedule data collection using more irregular intervals.
The script shown in Listing 2-2 starts Log Manager (using the start parameter) and
then pauses for 5 minutes by using the WSH Sleep method. While the WSH script is
paused, Log Manager collects data. After 5 minutes, the script resumes and issues the
stop command to stop data collection. The script then pauses for 55 minutes
(3,300,000 milliseconds) before looping around and starting again.
Listing 2-2 Running Log Manager Within a WSH Script
Set WshShell = WScript.CreateObject("WScript.Shell")
Do
    WshShell.Run "%COMSPEC% /c logman start web_server_log"
    WScript.Sleep 300000
    WshShell.Run "%COMSPEC% /c logman stop web_server_log"
    WScript.Sleep 3300000
Loop
Listing 2-2 is designed to run indefinitely. To run it a finite number of times, use a For...Next loop. For example, the script shown in Listing 2-3 causes the script to run 24
times (once an hour for an entire day) and stop.
Listing 2-3 Using a WSH Script to Run Log Manager Once an Hour for 24 Hours
Set WshShell = WScript.CreateObject("WScript.Shell")
For i = 1 To 24
    WshShell.Run "%COMSPEC% /c logman start web_server_log"
    WScript.Sleep 300000
    WshShell.Run "%COMSPEC% /c logman stop web_server_log"
    WScript.Sleep 3300000
Next
WScript.Quit
Managing Performance Logs
Both the Performance Logs and Alerts facility in the Performance Monitor console
and the Log Manager command-line interface allow you considerable flexibility in
gathering performance statistics. The Relog tool (Relog.exe) is a command-line tool
that allows you to manage the counter logs that you create on a regular basis. Using
Relog, you can perform the following tasks:
■ Combine multiple counter logs into a single log file. You can list the file names of all the counter logs that you want Relog to process separately, or you can use wildcards (for example, *.blg) to identify them. The logs you combine can contain counters from a single computer or from multiple computers.
■ Create summarized output files from an input file or files.
■ Edit the contents of a counter log by allowing you to drop counters by name or drop all counters not collected during a designated time interval.
■ Convert counter data from one file format to another.
Note Log Manager can record performance data on multiple computers and
save that data to the same log file. However, this can result in a considerable
amount of unwanted network traffic. Relog allows you to monitor performance
locally, and then retrieve the data as needed. By putting the Relog commands in
a batch file, data retrieval can be scheduled to take place at times when network
traffic is relatively low.
More Info
For more information about Relog, in Help and Support Center
for Microsoft Windows Server 2003, click Tools, and then click Command-Line
Reference A–Z.
Using the Relog Tool
Relog requires two parameters: the path for the existing (input) log file, and the path
for the new (output) log file (indicated by the -o parameter). For example, this command will extract the performance records from the file C:\Perflogs\Oldlog.blg and
copy them into the file C:\Perflogs\Newlog.blg:
relog c:\Perflogs\oldlog.blg -o c:\Perflogs\newlog.blg
Relog gathers data from one or more performance logs and combines that data into a
single output file. You can specify a single input file or a string of input files, as in the
following example:
relog c:\Perflogs\oldlog1.blg c:\Perflogs\oldlog2.blg c:\Perflogs\oldlog3.blg
-o c:\Perflogs\newlog.blg
If the file Newlog.blg does not exist, Relog will create it. If the file Newlog.blg does
exist, Relog will ask whether you want to overwrite it with the new set of records; if
you do, any previously saved data is lost.
Command Syntax
The Relog tool supports a set of run-time parameters to define editing, summarization, and conversion options, with syntax and functions similar to those of Logman.
These parameters are summarized in Table 2-20.
Table 2-20 Relog Tool Parameters for Editing, Summarizing, and Converting Counter Log Files

Settings file (-config FileName). Use the logging parameters defined in this settings file. Use -i in the configuration file as a placeholder for a list of input files that can be placed on the command line.
Counters (-c Path [Path ...]). Specify the counters from the input file that you want to write to the output file. If no counters are specified, all counters from the input files are written. Use -cf FileName to use counter settings from an existing log file.
Summarization interval (-t n). Write output every n intervals of the input counter logs. Defaults to creating output every input interval.
Output file name (-o {Path | DSN!CounterLog}). Specify the output file name. If the file does not exist, Relog will create it. Required.
Log file format (-f {bin | csv | tsv | SQL}). Choose the format of the output counter log file. Defaults to binary format.
Append (-a). Append the output from this logging session to an existing file. For binary input and output files only.
Begin relogging (-b M/D/YYYY H:MM:SS [{AM | PM}]). Specify the start date and time of the output file. Defaults to the earliest start time of the input files.
End relogging (-e M/D/YYYY H:MM:SS [{AM | PM}]). Specify the end date and time of the output file. Defaults to the latest end time of the input files.
To append data to an existing file, add the -a parameter.
relog c:\scripts\oldlog.blg -o c:\scripts\newlog.blg -a
Adding -a to the preceding example causes Relog to add the records extracted from
Oldlog.blg to any existing records in Newlog.blg. Note that only binary files can be
appended.
Note To append to an existing text file, use Relog to convert the text file to binary,
use Relog again to append data from another file, and, finally, use Relog one more
time to create the resulting text file.
Merging Counter Logs Using Relog
Use the Relog tool to merge data from multiple performance logs into a single file. You
do this by specifying the path and file names for multiple performance logs as part of
the initial parameter. For example, this command gathers records from three separate
log files, and appends all the data to the file Newlog.blg:
relog c:\scripts\log1.blg c:\scripts\log2.blg c:\scripts\log3.blg
-o c:\Windows\logs\newlog.blg -a
Note Use the -a parameter to specify that Relog append its output to any existing
data in the output file when you merge data from multiple logs.
Counter logs stored on remote computers can also be merged; use the UNC path
instead of the local path. For example, this command gathers records from three different computers (DatabaseServer, PrintServer, and FileServer), and appends that data
to the file Historyfile.blg:
relog \\DatabaseServer\logs\log1.blg \\PrintServer\logs\log2.blg
\\FileServer\logs\log3.blg -o c:\Windows\logs\Historyfile.blg -a -f bin
The individual paths cannot exceed a total of 1,024 characters. If you are retrieving
performance data from a large number of computers, it is conceivable that you could
exceed the 1,024-character limitation. In that case, you will need to break a single
Relog command into multiple instances of Relog.
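For instance, a merge that would otherwise exceed the limit can be split into two steps, with the second Relog run appending to the binary output of the first (the paths here are placeholders):
relog \\ServerA\logs\log1.blg \\ServerB\logs\log2.blg -o c:\logs\merged.blg
relog \\ServerC\logs\log3.blg -o c:\logs\merged.blg -a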
Formatting the Relog Output File
Relog supports the same input and output file formats as Logman except that you
cannot use the binary circular file format or create output files with size limits. If a
new output file is being created, the new file will be created in binary format by
default. Relog can append data to an existing output file only when both the input
and output files use binary format.
When creating a new output file, you can use the -f parameter to specify one of the
data formats shown in Table 2-21.
Table 2-21 Relog.exe File Formats

bin  Binary format. This is the default file format.
csv  Comma-separated values.
tsv  Tab-separated values.
SQL  SQL database format.
Filtering Log Files Using Relog
Relog has a filtering capability that can be used to extract performance data from the
input counter logs based on the following criteria:
■ A list of counters, as specified by their paths
■ A specified date and time range
Only the counters that meet the filtering criteria you specified are written to the output file that Relog creates.
Filtering by counter Filtering by counter is based on a counter list, specified either
on the command line or in a settings file. Relog writes to the designated output file
only those values for the counters specified on the Relog command line or in the settings file. For more information about how to specify the counter path correctly, see
"Performance Counter Path" earlier in this chapter.
In the following example, Relog writes values for only the \Memory\Available Bytes
counter to the output file:
relog c:\scripts\oldlog.txt -o c:\scripts\newlog.txt -f csv -c "\Memory\Available Bytes"
To extract the data for more than one counter, include each counter path as part of the
-c parameter:
relog c:\scripts\oldlog.txt -f csv -o c:\scripts\newlog.txt -c "\Memory\Available Bytes"
"\Memory\Pages/sec" "\Memory\Cache Bytes"
Note
Relog does not do any performance monitoring itself; all it does is collect
data from existing performance logs. If you specify a performance counter that does
not appear in any of your input files, the counter will not appear in your output file
either.
If your input log contains data from multiple computers, include the computer name
as part of the counter path. For example, to extract available memory data for the
computer DatabaseServer, use this command:
relog c:\scripts\oldlog.txt -f csv -o c:\scripts\newlog.txt -c
"\\DatabaseServer\Memory\Available Bytes"
Instead of typing a large number of performance counters as part of your command
string, you can use a settings file to extract data from a log file. A settings file is a text
file containing the counter paths of interest. For instance, the settings file shown in
Listing 2-4 includes 10 different counter paths:
Listing 2-4 Relog.exe Settings File
"\Memory\Pages/sec"
"\Memory\Page Faults/sec"
"\Memory\Pages Input/sec"
"\Memory\Page Reads/sec"
"\Memory\Transition Faults/sec"
"\Memory\Pool Paged Bytes"
"\Memory\Pool Nonpaged Bytes"
"\Cache\Data Map Hits %"
"\Server\Pool Paged Bytes"
"\Server\Pool Nonpaged Bytes"
To filter the input files so that only these counter values are output, use the -cf parameter, followed by the path of the settings file:
relog c:\scripts\oldlog.txt -o c:\scripts\newlog.txt -cf
c:\Windows\logs\memory.txt
Filtering by date Relog provides the ability to extract a subset of performance
records based on date and time. To do this, specify the beginning time (-b parameter)
and ending time (-e parameter) as part of your command string. Both of these parameters express dates and times using the mm-dd-yyyy hh:mm:ss format, where mm-dd-yyyy represents month-day-year; hh:mm:ss represents hours:minutes:seconds; and
time is expressed in 24-hour format.
For example, to extract performance records from 9:00 P.M. on September 1, 2003
through 3:00 A.M. on September 2, 2003, use this command:
relog c:\scripts\oldlog.txt" –f csv –o c:\scripts\newlog.txt" –b 09-01-2003 21:00:00
–e 09-02-2003 03:00:00
If you specify only the time on the -b and -e parameters, the current date is assumed. For
example, this command extracts performance records logged between 9:00 A.M. and
5:00 P.M. on the current date:
relog c:\scripts\oldlog.txt -o c:\scripts\newlog.txt -b 09:00:00 -e 17:00:00
If you choose, you can filter the input files by both counter value and date and time in
a single Relog execution.
Summarizing Log Files Using Relog
Relog allows you to reduce the size of the output files you create by writing only one
out of every n records, where n is a parameter you can specify using the -t option. This
has the effect of summarizing interval and averaging counters. Interval counters are
ones like \Processor\Interrupts/sec and \Process\% Processor Time that report an
activity rate over the measurement interval. Average counters like \PhysicalDisk\Avg.
Disk sec/Transfer are ones that report an average value over the measurement interval.
The -t parameter allows you to extract every nth record from the input counter logs
and write only those records to the output log file. For instance, -t 40 selects every fortieth
record; -t 4 selects every fourth record. If the original input counter log was recorded
with a sampling interval of once every 15 seconds, using Relog with the -t 4 parameter
results in an output file with data recorded at minute intervals. Interval counters like
\Processor\Interrupts/sec and \Process\% Processor Time in the output file represent activity rates over 1-minute intervals. The same input file relogged using -t 240
results in an output file with data recorded at 1-hour intervals. Interval counters like
\Processor\Interrupts/sec and \Process\% Processor Time in the output file represent activity rates over 1-hour intervals.
Summarizing using Relog works by resampling the input counter log. The output file
that results is not only more concise, it contains counter values that are summarized
over longer intervals. You can expect some data smoothing to result from this sort of
summarization, but nothing that would normally distort the underlying distribution
and be difficult to interpret.
Not all the counters that you can collect can be summarized in this fashion, however.
For an instantaneous counter like \Memory\Available Bytes or \System\Processor
Queue Length, relogging the counter log merely drops sample observations. If you
relog instantaneous counters to a large enough extent, you could wind up with an output file with so few observations that you do not have a large enough sample to interpret the measurements meaningfully. At that point, it is probably better to use a
counter list with Relog to drop instantaneous counters from the output file entirely.
Chapter 4, “Performance Monitoring Procedures,” offers sample summarization
scripts that illustrate this recommendation.
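For example, a command along the following lines (the file names are placeholders) summarizes a daily log to 4-minute intervals while using a settings file that lists only interval and average counters, so that instantaneous counters are dropped from the output entirely:
relog c:\perflogs\daily.blg -cf c:\perflogs\interval_counters.txt -t 4 -o c:\perflogs\daily_summary.blg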
An example that uses Relog to summarize an input counter log from 1-minute to
4-minute intervals illustrates these key points.
Figure 2-18 shows a System Monitor Chart View that graphs three counters. The highlighted counter, the \System\Processor Queue Length, is an instantaneous counter,
which Relog cannot summarize. The remaining counters shown on the chart are interval counters, which Relog can summarize. This counter log was created using a sample interval of 1 minute.
Figure 2-18 Original counter log data before Relog summarization
System Monitor reports that the counter log being charted covers a period with a
duration of roughly 2 hours. Using the Log File Time Span feature, the Chart is
zoomed into this period of unusual activity. Because the Chart View can display only
100 points, the line graphs are drawn based on skipping over a few of the data values
that will not fit on the graph. (It is as if the Chart View has a built-in Relog function for
drawing line graphs.) The statistics shown in the Value bar are based on all the observations in the period of interest. For the measurement period, the Processor Queue
Length, sampled once every minute, shows an average value of 4, a minimum value of
1, and a maximum value of 81.
The Relog command to summarize this file to 4-minute intervals is shown in Listing 2-5,
along with the output the tool produces.
Listing 2-5 Relogging Using the -t Parameter to Create Summarized Counter Logs
C:\PerfLogs>relog BasicDailyLog_12161833_001.blg -o relogged_BasicDailyLog_12161833_001.blg -t 4

Input
----------------
File(s):
     BasicDailyLog_12161833_001.blg (Binary)

Begin:    12/16/2004 18:33:52
End:      12/17/2004 17:15:26
Samples:  1363

Output
----------------
File:     relogged_BasicDailyLog_12161833_001.blg

Begin:    12/16/2004 18:33:52
End:      12/17/2004 17:15:26
Samples:  341

The command completed successfully.
Relog reports that it found 1363 total sample collection intervals in the original input
file, somewhat less than an entire day’s worth of data. Relogging using the -t 4 parameter creates an output file with 341 intervals over the same duration. Figure 2-19 is an
identical System Monitor Chart View using the relogged file instead of the original
data zoomed into the same 2-hour time span of interest.
Figure 2-19 Relogged data showing summarized data
The \System\Processor Queue Length counter is again highlighted. With only about
30 data points to report, the line graphs fall short of spanning the entire x-axis. The
Chart View shows every observed data point, which was not the case in the original
view. The graphs of the interval counters reveal some smoothing, but not much compared to the original. The average value of the Processor Queue Length counter is 5 in
the relogged data, with a maximum value of 72 being reported.
The average values reported in the Report View (not illustrated) for four interval
counters are compared in Table 2-22.
Table 2-22 Average Values in Report View

Interval Counter          Original (125 Data Points)    Relogged (30 Data Points)
Pages/sec                 86.194                        88.886
% Processor Time          16.820                        17.516
% Privileged Time         9.270                         9.863
Avg. Disk secs/transfer   0.008                         0.009
As expected, the average values for the interval counters in the relogged file are consistent with the original sample.
The Processor Queue Length statistics that System Monitor calculated for this instantaneous value reflect the observations that were dropped from the output file. The
maximum observed value for the Processor Queue Length in the relogged counter log
file is 72, instead of 81 in the original. Due to chance, the observed maximum value in
the original set of observations was lost. The average value, reflecting the underlying
uniformity of the distribution of Processor Queue Length values that were observed,
remains roughly the same across both views. When the underlying distribution is
more erratic, you can expect to see much larger differences in the summary statistics
that can be calculated for any instantaneous counters.
Using Typeperf Queries
The Typeperf tool provides a command-line alternative to the Windows System Monitor. Typeperf can provide a running list of performance counter values, giving you
detailed performance monitoring in real time. Typeperf also imposes less overhead
than System Monitor. This can be important if the computer you are monitoring is
already sluggish or performing poorly.
More Info
For more information about Typeperf, in Help and Support Center for
Microsoft® Windows Server™ 2003, click Tools, and then click Command-Line Reference A–Z.
To assist in building automated performance monitoring procedures, Typeperf provides an easy way to retrieve a list of all the performance counters installed on a given
computer. (Although it is possible to view the set of installed performance counters
using System Monitor, there is no way to review and save the entire list.) With Typeperf, you can list the installed performance counters, and save that list to a text file
that can be edited later to create a Logman or Relog tool settings file. This capability of
Typeperf queries is designed to complement the Log Manager (Logman.exe) and
Relog (Relog.exe) command-line tools that have been discussed here.
Command Syntax
Typeperf supports a set of runtime parameters to define counter log queries, along
with options to gather counters in real time. Typeperf parameters use syntax and perform functions that are very similar to those of Logman and Relog. These parameters are summarized in Table 2-23.
Table 2-23 Typeperf Tool Parameters for Gathering Performance Data and Listing Available Queries

Query (-q [Path]). Returns a list of counters, one path per line. Use -o to direct output to a text file.
Extended query (-qx [Path]). Returns a list of counters with instances. Extended query output is verbose.
Settings file (-config FileName). Use the logging parameters defined in this settings file. Code counter Path statements one per line.
Counters (-c Path [Path ...]). Specify the counters that you want to gather. Use -cf FileName to use counter settings from an existing log file.
Sample interval (-si [[HH:]MM:]SS). Specify the interval between data collection samples. Defaults to 1 second.
Number of samples (-sc Samples). Specify the number of data samples to collect.
Output file name (-o FileName). Specify the output file name. If the file does not exist, Typeperf will create it. Redirects output to a file. Defaults to stdout.
Log file format (-f {bin | csv | tsv | SQL}). Choose the format of the output counter log file. Defaults to .csv format.
Computer (-s ComputerName). Specify the computer you want to gather the performance counters from. If no computer name is provided, the local computer is assumed.
Obtaining a List of Performance Counters Using Typeperf Queries
Before you can monitor performance on a computer, you need to know which performance counters are available on that computer. Although a default set of performance
counters is installed along with the operating system, the actual counters present on
a given computer will vary depending on such things as:
■ The operating system installed. Windows 2000, Windows XP, and Windows Server 2003, for example, all have different sets of default performance counters.
■ Additional services or applications installed. Many applications, including Microsoft Exchange and Microsoft SQL Server, provide their own set of performance counters as part of the installation process.
■ Whether performance counters have been disabled or become corrupted.
To retrieve a list of all the performance counters (without instances) available on a
computer, start Typeperf using the -q parameter:
Typeperf -q
In turn, Typeperf displays the paths of the performance counters installed on the
computer. The display looks something like the excerpted set of performance
counters shown in Listing 2-6.
Listing 2-6 Abbreviated Typeperf Performance Counter Listing
\Processor(*)\% Processor Time
\Processor(*)\% User Time
\Processor(*)\% Privileged Time
\Processor(*)\Interrupts/sec
\Processor(*)\% DPC Time
\Processor(*)\% Interrupt Time
\Processor(*)\DPCs Queued/sec
\Processor(*)\DPC Rate
\Processor(*)\% Idle Time
\Processor(*)\% C1 Time
\Processor(*)\% C2 Time
\Processor(*)\% C3 Time
\Processor(*)\C1 Transitions/sec
\Processor(*)\C2 Transitions/sec
\Processor(*)\C3 Transitions/sec
\Memory\Page Faults/sec
\Memory\Available Bytes
\Memory\Committed Bytes
\Memory\Commit Limit
To return a list of all the performance counters available, including instances, use the
-qx option. Be aware that the -qx parameter will return a far greater number of performance counters than the -q parameter.
The output from a counter query can be directed to a text file using the -o parameter.
You can then edit this text file to create a settings file that can be referenced in subsequent Logman and Relog queries.
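For example, this command (the output file name is a placeholder) writes the full list of Processor counters, including instances, to a text file for later editing:
Typeperf -qx \Processor -o c:\scripts\processor_counters.txt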
Retrieving Performance Counters from Remote Computers
You can append a UNC computer name to the command string to obtain a list of performance counters from a remote computer. For example, to list the performance
counters on the remote computer DatabaseServer, type the following:
Typeperf -q \\DatabaseServer
To retrieve only the counters for the Memory object, type this:
Typeperf -q \\DatabaseServer\Memory
Tip Although Typeperf does not provide a way to specify an alternate user name
and password directly, try establishing a connection to the remote system first by using a
net use \\RemoteSystem\ipc$ /user:Name Password command. Then issue the
Typeperf command against the remote computer.
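A sketch of that sequence, reusing the DatabaseServer example (the account name is a placeholder, and the asterisk causes net use to prompt for the password):
net use \\DatabaseServer\ipc$ /user:jones *
Typeperf -q \\DatabaseServer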
Monitoring Performance from the Command Line Using Typeperf
After you know which performance counters are installed on a computer, you can
begin monitoring performance. To do this, start Typeperf followed by a list of the
counters you want to monitor. For example, to monitor the available bytes of memory,
type the following, enclosing the counter name in quotation marks:
Typeperf "\Memory\Available Bytes"
Typeperf will begin monitoring the available bytes of memory, and will display the
performance data in real time in the command window. An example of this output is
shown in Listing 2-7.
Listing 2-7 Sample Typeperf Output
C:\>Typeperf "\Memory\Available bytes"
"(PDH-CSV 4.0)","\\COMPUTER1\Memory\Available bytes"
"10/24/2001 13:41:31.193","35700736.000000"
"10/24/2001 13:41:32.195","35717120.000000"
"10/24/2001 13:41:33.196","35700736.000000"
"10/24/2001 13:41:34.197","35680256.000000"
The Typeperf output consists of the date and time the sample was taken, along with
the value measured. A new sample is taken (and the command window updated)
once every second. This will continue until you press CTRL+C and end the Typeperf
session.
Monitoring Multiple Performance Counters Using Typeperf
To monitor two or more performance counters at the same time, include each counter
name as part of the -c parameter. For example, to simultaneously monitor three separate memory counters, type the following, enclosing each counter name in quotation
marks, and separating the individual counters by using commas:
Typeperf "\Memory\Available Bytes" "\Memory\Pages/sec" "\Memory\Cache Bytes"
Typeperf will display output similar to that shown in Listing 2-8, using commas to
separate the fields.
Listing 2-8 Displaying Multiple Counters with Typeperf
"(PDH-CSV 4.0)","\\DCPRNTEST\Memory\Available Bytes","\\DCPRNTEST\Memory\Pages/
sec","\\DCPRNTEST\Memory\Cache bytes"
"02/06/2001 09:05:57.464","24489984.000000","0.000000","47214592.000000"
"02/06/2001 09:05:58.516","24489984.000000","0.000000","47214592.000000"
"02/06/2001 09:05:59.567","24530944.000000","0.000000","47214592.000000"
"02/06/2001 09:06:00.619","24514560.000000","0.000000","47214592.000000"
As an alternative, you can store counter paths in a settings file, and then reference that
file as part of the command string the next time you start Typeperf by using the -cf
parameter, similar to the way you would with both Logman and Relog.
Monitoring the Performance of Remote Computers Using Typeperf
Typeperf can also monitor performance on remote computers; one way to do this is to
include the UNC computer name as part of the counter path. For example, to monitor
memory use on the remote computer WebServer, type the following:
Typeperf "\\Webserver\Memory\Available Bytes"
Alternatively, use the local counter name and specify the name of the remote computer
by using the -s option:
Typeperf "\Memory\Available Bytes" -s Webserver
To monitor counters from multiple remote computers, use a settings file and specify
the UNC computer name as part of each counter path.
Automating Typeperf Usage
By default, Typeperf measures (samples) performance data once a second until you
press CTRL+C to manually end the session. This can be a problem if you are recording
Typeperf output to a performance log, and no one is available to end the Typeperf session: at one sample per second, it does not take long for a log to grow to an enormous
size. For example, if you use the default settings to monitor a single counter every second, after one day, the size of your log file will be more than 4 megabytes.
Note Windows Server 2003 places no limitation on the size of the log file created.
However, even though large log files are supported, they can be difficult to analyze because of the huge amount of information they contain.
Limiting the number of samples Although you cannot place a specific time limit
on a Typeperf session—for example, you cannot specify that Typeperf run for two
hours and then stop—you can specify the number of samples that Typeperf collects
during a given session. Once that number is reached, the session will automatically
end.
To limit the number of samples collected in a Typeperf session, use the -sc parameter
followed by the number of samples to collect. For example, the following command
measures memory use 60 times and then stops:
Typeperf "\Memory\Available Bytes" –sc 60
Because a new sample is taken every second, 60 samples will take approximately one
minute. Thus, this session of Typeperf will run for 1 minute, and then shut down.
Modifying the Typeperf sampling rate In addition to specifying the number of
samples Typeperf will collect, you can specify how often Typeperf will collect these
samples. By default, Typeperf collects a new sample once every second (3,600 times
per hour, or 86,400 times in a 24-hour period). For routine monitoring activities, this
might be too much information to analyze and use effectively.
To change the sampling rate, use the -si parameter, followed by the new sampling time
in seconds. For example, this command measures memory use every 60 seconds:
Typeperf "\Memory\Available Bytes" –si 60
This command measures memory use every 10 minutes (60 seconds × 10):
Typeperf "\Memory\Available Bytes" –si 600
You must use the same sampling rate for all the performance counters being monitored in any one Typeperf instance. For example, suppose you use the -si parameter to
change the sampling rate for a set of memory counters:
Typeperf "\Memory\Available Bytes" "\Memory\Pages/sec" "\Memory\Cache Bytes" –si 60
All three counters must use the sampling rate of 60 seconds. (You cannot assign different sampling rates to individual counters.) If you need to measure performance at
different rates, you must run separate instances of Typeperf.
Writing Typeperf output to a counter log Typeperf was initially designed primarily for displaying real-time performance data on the screen. However, the application
can also be used to record data in a log file. This allows you to keep a record of your
Typeperf sessions, as well as carry out unattended performance monitoring from a
script or batch file. For example, you could create a script that starts Typeperf, collects
and records performance data for a specified amount of time, and then terminates the
Typeperf session.
Note
If you plan on collecting performance data on a regular basis (for example,
every morning at 4:00 A.M.), you should consider using the Log Manager tool (Logman.exe), which has a built-in scheduling component.
To create a performance log using Typeperf, use the -o parameter followed by the path
for the log. For example, to save performance data to the file C:\Windows\Logs
\Memory.blg, type this command:
Typeperf "\Memory\Available Bytes" –o c:\Windows\logs\memory.blg –f bin
Note that the default log file type that Typeperf produces is in .csv format. You must
specify -f bin if you want to create a binary format log file. If the file Memory.blg does
not exist, Typeperf will create it. If the file Memory.blg does exist, Typeperf will ask if
you want to overwrite the file. You cannot run Typeperf on multiple occasions and
have all the information saved to the same log file. Instead, you must create separate
log files for each Typeperf session, then use the Relog tool to merge those separate
logs into a single file.
When Typeperf output is redirected to a performance log, the output does not appear
onscreen. You can view performance data onscreen or you can redirect performance
data to a log, but you cannot do both at the same time.
Changing the data format of a Typeperf performance log Unless otherwise
specified, Typeperf saves its output in comma-separated values format (CSV). In addition to the CSV format, you can use the -f parameter and save the log file in either the
TSV (tab-separated values) or BLG (binary log) formats; or to a SQL database.
To change the format of a Typeperf performance log, use the -o parameter to specify
the file name, and the -f parameter to specify the data format. For example, this command creates an output file C:\Windows\Logs\Memory.tsv, saving the data in tab-separated-values format:
Typeperf "\Memory\Available Bytes" -o c:\Windows\logs\memory.tsv -f TSV
Note The -f parameter is valid only when output is being directed to a file; it has no
effect on output being displayed onscreen. Output is always displayed onscreen as
comma-separated values.
Windows Performance Monitoring Architecture
A common set of architectural features of the Windows Server 2003 operating system
supports the operation of System Monitor; Performance Logs and Alerts; and the Logman, Relog, and Typeperf command-line tools. These performance tools all obtain
data by means of the Performance Data Helper (PDH) dynamic-link library
(DLL), which serves as an intermediary.
More Info
For more information about PDH, see the Windows Server 2003 Software Development Kit (SDK) documentation.
Each of these performance tools gathers counters using the PDH interface. Each tool
is then responsible for making the calculations to convert raw data into interval
counters and for formatting the data for generating output and reporting.
Performance Library DLLs
Performance Library (Perflib) DLLs provide the raw data for all the measurements
you see in System Monitor counters. The operating system supplies a base set of performance library DLLs for monitoring the behavior of resources such as memory, processors, disks, and network adapters and protocols. In addition, many other
applications and services in the Windows Server 2003 family provide their own DLLs
that install counters that you can use to monitor their operations. All the Performance
Library DLLs that are installed on the machine are registered in \Performance subkeys under the HKLM\SYSTEM\CurrentControlSet\Services\<service-name>\ key.
Figure 2-20 shows the four Performance Library DLLs that are supplied with the operating system. They are responsible for gathering disk, network, system-level, and process-level performance counters.
Figure 2-20 Performance Library DLLs
The entry for the Perfos.dll illustrates the registry fields that are associated with a
Perflib. Some of these are documented in Table 2-24.
Table 2-24 Registry Fields

Library. Identifies the file name (and path) of the Perflib DLL module. If it is an unqualified file name, %systemroot%\system32 is the assumed location.
Open. The entry point for the Perflib's Open routine, called when initializing a performance counter collection session.
Open Timeout. How long to wait in milliseconds before timing out the call to the Open routine. If the Open routine fails to return in the specified amount of time, the Disable Performance Counters flag is set.
Collect. The entry point to call every sample interval to retrieve counter data.
Collect Timeout. How long to wait in milliseconds before timing out the call to the Collect routine. If the Collect routine fails to return in this amount of time, the Disable Performance Counters flag is set.
Close. The entry point for the Perflib's Close routine, called to clean up prior to termination.
These registry fields are used by the PDH routines to load Perflib DLLs, initialize them
for use in performance data collection, call them to gather the performance data, and
close them when the performance monitoring session is over.
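To inspect these fields for a particular Perflib, you can query its Performance subkey directly. For example, this command lists the Library, Open, Collect, and Close values registered by the PerfOS service, which supplies Perfos.dll:
reg query HKLM\SYSTEM\CurrentControlSet\Services\PerfOS\Performance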
Performance Counter Text String Files
To save space in the registry, the large REG_MULTI_SZ string variables that make up
the names and explanatory text of the performance counters are saved in performance counter text string files outside the registry. These files are mapped into the
registry so that they appear as normal registry keys to users and applications. The performance counter text string file names are:
■ %windir%\system32\perfc009.dat
■ %windir%\system32\perfh009.dat
Storing this text data in a separate file also assists in internationalization. Perfc009.dat is the English version of this text data; 009 is the language ID, in this case English.
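Because these files are mapped into the registry, the combined counter name strings can also be viewed under the Perflib key. A hedged example using Reg.exe; the output is lengthy, so redirecting it to a file is assumed here:

reg query "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Perflib\009" /v Counter > counternames.txt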
Performance Data Helper Processing
When called by a performance monitoring application, PDH initialization processing
involves the following steps:
PDH processing steps
1. PDH accesses the perfc009.dat and perfh009.dat files that contain text that
defines the performance objects and counter names that are installed on the
machine, along with their associated Explain text.
2. PDH inventories the HKLM\SYSTEM\CurrentControlSet\Services\<service-name>\ keys to determine which Perflib DLLs are available on this machine.
3. For each Performance key found, PDH loads the Perflib DLL.
4. After the Perflib DLL is loaded, PDH calls its Open routine. The Open routine
returns information that describes the objects and counters that the Perflib DLL
supports. The performance monitoring application can use this information to
build an object and counter selection menu like the Add Counters form in System Monitor.
5. At the first sample interval, PDH calls the Perflib Collect routine to gather raw
performance counters. The performance monitoring application can then make
additional PDH calls to format these raw counters for display.
6. At termination, PDH calls the Close routine of all active Perflibs so that they end
processing gracefully.
PDH processing hides most of the details of these processing steps from performance monitoring applications such as System Monitor and Log Manager. The Extensible
Counter List (exctrlst.exe) tool (included as part of the Windows Support Tools for
Windows Server 2003) illustrates step 2 of this processing. If you type exctrlst on a
command line, you will see a display like the one in Figure 2-21.
Figure 2-21 Extensible Counter List dialog
The Extensible Counter List tool displays a complete inventory of the Performance
Library DLLs that are registered on your machine.
Disable Performance Counters
If a call to a Perflib function fails or returns an error status, PDH routines add an optional
field to the Performance subkey called Disable Performance Counters. An Application
Event log message is also generated when PDH encounters a Perflib error. (More information documenting the Perflib error messages is available in Chapter 6, “Advanced
Performance Topics,” in this book.) If a Perflib’s Disable Performance Counters flag is
set, PDH routines will not attempt to load and collect counter data from the library
until the problem is resolved and you clear the Disable Performance Counters flag by
using Exctrlst.exe.
Tip
If you are expecting to collect a performance counter but cannot, use exctrlst to
check whether the Disable Performance Counters flag has been set for the Perflib
responsible for that counter. Once you have resolved the problem that caused the Disable Performance Counters flag to be set, use exctrlst to clear the flag and permit PDH
to call the Perflib once again.
Always use the Extensible Counter List tool to reset the Disable Performance
Counters flag rather than editing the Registry key directly.
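You can also check the flag from the command line without editing anything. A sketch using Reg.exe against the PerfOS Perflib as an example; substitute the service key for the counters you are missing:

reg query "HKLM\SYSTEM\CurrentControlSet\Services\PerfOS\Performance" /v "Disable Performance Counters"

If the query returns a nonzero value, the Perflib has been disabled; use Exctrlst.exe to clear the flag.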
Additional Perflib registry entries are at HKLM\SOFTWARE\Microsoft\Windows
NT\CurrentVersion\Perflib and are designed to help you when you are encountering
data collection problems or other failures inside Performance Library DLLs.
Remote Monitoring
Monitoring performance counters remotely requires that you have network access to
the remote computer and an agent on the remote computer that collects performance
data and returns it to the local computer that requested the data. The remote collection agent supplied with the Windows Server 2003 family is the Remote Registry service (Regsvc.dll). Regsvc.dll collects performance data about the computer it is
running on and provides the remote procedure call (RPC) interface that allows other
computers to connect to the remote computer and collect that data. This service must
be started and running on the remote computer before other computers can connect
to it and collect performance data. Figure 2-22 illustrates the different interfaces and functional elements used when monitoring performance data remotely.
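Before monitoring remotely, you can verify that the Remote Registry service is running on the target computer and then sample a counter from it. In the following sketch, the computer name Server22 is hypothetical:

sc \\Server22 query RemoteRegistry
typeperf "\\Server22\Memory\Available Bytes" -sc 5

The first command queries the state of the Remote Registry service on the remote machine; the second collects five samples of a counter from it.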
Figure 2-22 Remote Performance Monitoring Architecture

The figure shows the monitoring computer and the remote computer being monitored. On the monitoring computer, performance applications based on the performance registry (for example, Perfmon.exe), user-defined HTML pages or scripts, System Monitor, and the Performance Logs and Alerts snap-in and service all call the Performance Data Helper (PDH) library, Pdh.dll, which draws on the performance registry and performance data log files. On the remote computer, the Remote Registry service (Regsvc.dll) exposes the performance registry, which in turn calls the system performance DLL and performance extension DLLs and reads the performance counter text string files. The numbered interfaces in the figure are:

1. Windows system call API; each API is specific to the information requested
2. Standard performance library interface
3. Registry internal interface
4. RegQueryValueEx API to the performance registry key
5. PDH internal log file interface
6. Published PDH API
7. Registry internal RPC interface
8. System Monitor ActiveX control interface
9. Log service internal configuration interface
Note The Messenger service in the Windows Server 2003 family sends users alert
notifications. This service must be running for alert notifications to be received.
Event Tracing for Windows
Event Tracing for Windows (ETW) provides event-oriented instrumentation from operating system and application providers. Trace events report precisely when certain performance-oriented activities occur, including:
■ Context switches
■ Page faults
■ File I/O requests
■ Process creation and termination
■ Thread creation and termination
■ TCP Send, Receive, and connection requests
In addition, server applications like IIS 6.0 and Active Directory are extensively instrumented to provide diagnostic event traces. In IIS 6.0, for example, the HTTP driver,
the Inetinfo process address space, ISAPI filtering, CGI Requests, and even ASP
Requests provide specific request start and end events that allow you to trace the
progress of an individual HTTP Get Request through various stages of its processing
by these components.
Event traces not only record when these events occur, they also capture specific information that can be used to identify the event and the application that caused it. These
events can be logged to a file where you can view them or report on them. Event tracing is a technique that you can rely on to diagnose performance problems that are not
easy to solve using statistical tools like System Monitor.
The great benefit of event traces is that they are extremely precise. You know exactly
what happened and when. But there are potential pitfalls to using event tracing that
you also need to be aware of. A drawback of event tracing is the potential to generate
large quantities of data that can complicate the analysis of the gathered data. In addition, manual analysis of raw event traces is complex, although Windows Server 2003
includes a built-in trace data-reporting tool called Tracerpt.exe that simplifies matters.
If you just wanted to know a simple count of how many events occurred, you could
ordinarily gather that information using statistical tools like System Monitor. If you
need to understand in detail the sequence of events associated with a specific performance problem, Event Tracing for Windows can provide that information. It can tell
you about a wide variety of system and application-oriented events.
Event Tracing Overview
Trace data in Windows Server 2003 is gathered in logging sessions that record data to a trace log file. You can create and manage event tracing sessions using the Log
Manager command-line interface. In addition, the Trace Logs facility in Performance
Logs and Alerts in the Performance Monitor console provides an interactive facility for
defining event tracing sessions, starting them, and stopping them. However, the interactive Trace Logs facility provides access to only a subset of the Trace definition
options that are available using Logman. The Logman interface also has the advantage
that it can be used in conjunction with scripts to automate all aspects of event trace
logging. The Trace Reporting program, Tracerpt.exe, formats trace data and provides
a number of built-in reports.
In a tracing session, you communicate with selected trace data providers that are
responsible for reporting whenever designated events occur. When an instrumented
event occurs, such as an application sending or receiving a TCP/IP segment or a
thread context switch, the provider returns information about the event to the trace
session manager, ordinarily the Performance Logs and Alerts service. The Performance Logs and Alerts service then writes a trace event entry in the log file.
Trace logs are saved only in binary format. Trace log files are automatically saved with an .etl extension. You can use either a circular trace file or a sequentially organized trace file. As with counter logs, you can set trace log file-size limits. When a circular trace file reaches its designated size limit, event logging continues by wrapping around to the beginning of the log file and overwriting the oldest trace events with current events. When a sequential trace file reaches its designated size limit, the logging session terminates. Use a circular log file when you want to run an event tracing session long enough to capture information about some event that occurs unpredictably.
Viewing trace logs requires a parsing tool, such as Tracerpt.exe, to process the trace
log output file and convert from binary format to CSV format so that you can read it.
Event tracing reports are also available using the -report option of tracerpt. Typically,
you will be interested in the Trace reports that are available using the -report option,
rather than in viewing raw CSV files containing the event trace records.
Performance Logs and Alerts
When you open the Performance Monitor, you will notice the Performance Logs
and Alerts function in the left tree view. There are three components to Performance Logs and Alerts: counter logs, trace logs, and alerts. This section documents
the use of trace logs.
You can create trace logs using the Performance Logs and Alerts tool whenever you
require detailed trace data to resolve a performance problem. Reports created from
trace logs using the tracerpt tool can provide detailed insight into many problems that
are difficult to unravel using performance statistics alone. Performance Logs and
Alerts provides trace log capabilities that are similar to those available for counter logs.
For example, you can:
■ Manage multiple trace logging sessions from a single console window.
■ Start and stop trace logging sessions manually, on demand, or automatically, at scheduled times for each log.
■ Stop each log based on the elapsed time or the current file size.
■ Specify automatic naming schemes and stipulate that a program be run when a trace log is stopped.
There is an upper limit of 32 trace sessions that can run concurrently. You can define many more trace log settings than that, but you can activate only 32 trace sessions at a time.
The Performance Logs and Alerts service process, Smlogsvc.exe, is responsible for
executing the trace log functions you have defined. Comparable trace performance
data logging capabilities are also available using the Logman command-line tool,
which also interfaces with the Performance Logs and Alerts service process. This command-line interface to gather event traces is discussed in “Using Log Manager to Create Trace Logs.”
After you load the Performance Logs and Alerts console, you will need to configure
the trace logs.
To configure trace logs
1. Click the Trace Logs entry to select it.
Previously defined trace log sessions appear in the appropriate node of the
details pane.
2. Right-click the details pane to create a new log. You can also use settings from an
existing HTML file as a template.
Note To run the Performance Logs and Alerts service, you must be a member of the Performance Log Users or Administrators security groups. These groups have special security access to a subkey in the registry to create or modify a log configuration. (The subkey is HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\SysmonLog\Log_Queries.)
3. In the New Log Settings box, type the name of your trace log session and click
OK.
Figure 2-23 shows the General tab of the Properties for a new trace log after you enter the trace log name.
Figure 2-23 General tab for a trace log
4. To configure a trace log, choose either events from the system Provider or one of the available application Providers. Click the Provider Status button to see which trace Providers are installed on your machine.
5. Click the Log Files, Schedule, and Advanced options tabs to set the file type, the
file naming convention, and other file management options, and to configure
the collection period. These options are discussed later.
Trace event providers  Providers are responsible for sending information about an event to the Performance Logs and Alerts service when it occurs. By default, on the General tab, the Nonsystem Providers option is selected to keep trace logging overhead to a minimum. Click the Add button to include data from a particular Provider in the trace log. Application Providers include Active Directory, Microsoft Message Queue, IIS 6.0, and the print spooler. The Processor Trace Information Provider traces Dispatcher events, including thread context switches.
If you click Events Logged By System Provider, a built-in provider for Windows kernel
events is used to monitor processes, threads, and other activity. To define kernel
events for logging, select the check boxes as appropriate.
The built-in system Provider can trace the following kernel events:
■ Process creation and deletion
■ Thread creation and deletion by process
■ Disk input/output operations, specifying logical, physical, and network disk operations by process
■ Network TCP/IP Send and Receive commands by process
■ Page faults, both hard and soft, by process
■ File details for disk input/output operations
Note that the ability to use the system provider to trace registry and image events is
restricted to the Log Manager program (Logman.exe).
Before you turn on tracing for a class of events, it helps to understand what the performance impact of tracing might be. To understand the volume of event trace records
that can be produced, use System Monitor to track the following counters over several
minutes:
■ \System\File Data Operations/sec
■ \System\File Control Operations/sec
■ \Memory\Page Faults/sec
■ \TCPv4\Segments/sec
These counters count the events that the kernel trace Provider can trace, assuming you choose to select them. The size of the trace files that are produced and the overhead consumed by a trace are proportional to the number of these events that occur.
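You can also sample these counters from the command line with Typeperf; the 5-second interval and 60-sample duration below are illustrative:

typeperf "\System\File Data Operations/sec" "\System\File Control Operations/sec" "\Memory\Page Faults/sec" "\TCPv4\Segments/sec" -si 5 -sc 60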
Configuring Trace Log Properties  The Trace Log Properties sheets allow you to set up automated event trace procedures. The file and scheduling options for trace logs are very similar to those that are available for counter logs. The Log Files tab is used to select the file type and automatic file naming options. For example, you can generate unique log file names that are numbered consecutively, or you can add a date and timestamp to the file name automatically. Or you can choose to write all performance data to the same log file; in this case, you specify that current performance data is used to overwrite any older data in the file. Once you set the appropriate option, the Log Files tab displays an example of the automatic file names that will be generated for you.
On the Schedule tab, you can choose manual or automatic startup options. You can
then set the time you want the logging session to end using an explicit end time or a
duration value in seconds, minutes, hours, or days; or by specifying that it end when
the log file reaches its designated size limit.
Trace logging options are summarized in Table 2-25.
Table 2-25 Summary of Trace Log Properties

General tab
■ Select Providers. You can collect trace data from the local computer only. Configure system Provider events.
■ Account and Password. You can use Run As to provide the logon account and password for data collection on remote computers.

Log Files tab
■ File Type. Trace logs are binary files stored with an .etl extension. You can select either circular trace files that wrap around to the beginning when they fill up, or sequential trace files. Use Configure to enter location, file name, and log file size.
■ Automatic File Naming. You can choose to add unique file sequence numbers to the file name or append a time and date stamp to identify the file.

Schedule tab
■ Manual or Automated Start and Stop Methods and Schedule. You can specify that the log stop collecting data when the log file is full.
■ Automated Start and Stop Times. Start and stop by time of day, or specify the log start time and duration.
■ Automated Stop When the File Reaches Its Maximum Size.
■ Processing When the Log File Closes. For continuous data collection, start a new log file when the log file closes. You can also initiate automatic log file processing by running a designated command when the log file closes.

Advanced tab
■ Buffer Size, and Minimum and Maximum Buffers. Increase the number of buffers if too many trace events are being skipped.
■ Buffer Flush Timer. The longest amount of time, in seconds, that a trace entry can remain in memory without being flushed to the disk logging file.
Using Log Manager to Create Trace Logs
Log Manager (Logman.exe) can also be used from the command line to generate
event trace logs. This section documents the use of Logman to create and manage
trace log files.
More Info For more information about Logman, in Help and Support Center for
Microsoft® Windows Server™ 2003, click Tools, and then click Command-Line Reference A–Z.
Command Syntax
Logman operates in one of two modes. In Interactive mode, you can run Logman from
a command-line prompt and interact with the logging session. In Interactive mode, for
example, you can control the start and stop of a logging session. In Background mode,
Logman creates trace log configurations that are scheduled and processed by the
same Performance Logs and Alerts service that is used with the Performance Monitor
console. For more information about Performance Logs and Alerts, see “Performance
Logs and Alerts” in this book.
Table 2-26 summarizes the seven basic Logman subcommands.
Table 2-26 Logman Subcommands

create trace CollectionName: Creates collection queries for either counter data or trace collection sessions.

update CollectionName: Updates an existing collection query to modify the collection parameters.

delete CollectionName: Deletes an existing collection query.

query [CollectionName]: Lists the collection queries that are defined and their status. Use query CollectionName to display the properties of a specific collection. To display the properties on remote computers, use the -s RemoteComputer option in the command line.

query providers [ProviderName]: Use query providers ProviderName to display a list of parameters that can be set for the specified provider, including their values and descriptions of what they enable. Note that this information is provider-dependent.

start CollectionName: Starts a logging session manually.

stop CollectionName: Stops a logging session manually.
Collection queries created using Log Manager contain property settings identical to the trace logs created using the Performance Logs and Alerts snap-in. If you open the Performance Logs and Alerts snap-in, you will see any collection queries you created using Log Manager. Likewise, if you use the query subcommand in Log Manager to view a list of collection queries on a computer, you will also see any trace logs created using the Performance Logs and Alerts snap-in.
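For example, to list every collection defined on the local computer, or to display the properties of one collection by name (the name shown is illustrative):

logman query
logman query "IIS Trace"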
Interactive Sessions
The -ets command-line switch is used to establish an interactive event trace session.
Without it, Logman assumes a scheduled trace collection. When -ets is used, Logman
will not look at previously saved session configurations. The parameters will be
passed to the event trace session directly without being saved or scheduled. Using the
-ets switch, you can create multiple event trace sessions per console window.
In the following command sequence,

logman create trace "mytrace" -pf iistrace.txt -bs 64 -o mytrace.etl
logman start "mytrace" -ets
logman stop "mytrace" -ets
The session started by the second command is not the one created by the first command. The second Logman command will start a session mytrace with default settings
because the user specified -ets without any other arguments. However, the second
command does not erase the saved settings from the first command.
In contrast, without the -ets switch,

logman create trace "mytrace" -pf iistrace.txt -bs 64 -o mytrace.etl
logman start "mytrace"
logman stop "mytrace"

The second and third commands retrieve the settings from the saved session, and then start and stop it. Note that although this command sequence looks like an interactive session, without -ets the commands still go through the scheduling service, even for an immediate start or stop.
Trace Providers
The Logman query providers subcommand lets you determine which trace providers you can gather trace data from. For example:
C:\>logman query providers

Provider                                  GUID
-------------------------------------------------------------------------------
ACPI Driver Trace Provider                {dab01d4d-2d48-477d-b1c3-daad0ce6f06b}
Active Directory: Kerberos                {bba3add2-c229-4cdb-ae2b-57eb6966b0c4}
IIS: SSL Filter                           {1fbecc45-c060-4e7c-8a0e-0dbd6116181b}
IIS: WWW Server                           {3a2a4e84-4c21-4981-ae10-3fda0d9b0f83}
IIS: Active Server Pages (ASP)            {06b94d9a-b15e-456e-a4ef-37c984a2cb4b}
Local Security Authority (LSA)            {cc85922f-db41-11d2-9244-006008269001}
Processor Trace Information               {08213901-B301-4a4c-B1DD-177238110F9F}
Windows Kernel Trace                      {9e814aad-3204-11d2-9a82-006008a86939}
ASP.NET Events                            {AFF081FE-0247-4275-9C4E-021F3DC1DA35}
NTLM Security Protocol                    {C92CF544-91B3-4dc0-8E11-C580339A0BF8}
IIS: WWW Isapi Extension                  {a1c2040e-8840-4c31-ba11-9871031a19ea}
HTTP Service Trace                        {dd5ef90a-6398-47a4-ad34-4dcecdef795f}
Active Directory: NetLogon                {f33959b4-dbec-11d2-895b-00c04f79ab69}
Spooler Trace Control                     {94a984ef-f525-4bf1-be3c-ef374056a592}

The command completed successfully.
220
Microsoft Windows Server 2003 Performance Guide
Warning If the application or service associated with the provider is not active on the machine, the provider is not enabled and cannot gather the corresponding trace events.
Some providers support additional options that allow you to select among the events that they can trace. For example, by default, the Windows Kernel Trace Provider gathers only Process, Thread, and Disk trace events. To collect the other kernel trace events, you must explicitly set the provider flags that correspond to them.
You can see which flags can be set by using the query providers ProviderName command. You can specify either the flag names returned by that command or a numeric flag value. A flag value of 0xFFFFFFFF sets all the flags, allowing you to gather all the trace events the provider can supply.
C:\>logman query providers "Windows Kernel Trace"

Provider                                  GUID
-------------------------------------------------------------------------------
Windows Kernel Trace                      {9e814aad-3204-11d2-9a82-006008a86939}

Flags         Value         Description
-------------------------------------------------------------------------------
process       0x00000001    Process creations/deletions
thread        0x00000002    Thread creations/deletions
img           0x00000004    Image description
disk          0x00000100    Disk input/output
file          0x00000200    File details
pf            0x00001000    Page faults
hf            0x00002000    Hard page faults
net           0x00010000    Network TCP/IP
registry      0x00020000    Registry details
dbgprint      0x00040000    Debug print

The command completed successfully.
The output of this query lists the flags that the Windows Kernel Trace Provider supports. To trace process creations/deletions, thread creations/deletions, and hard page faults, issue the following Logman command:

logman create trace "NT Kernel Logger" -p "Windows Kernel Trace" (process,thread,hf) -u mydomain\username *

To trace all kernel events, set all the available flags, as follows:

logman create trace "NT Kernel Logger" -p "Windows Kernel Trace" 0xFFFFFFFF -u mydomain\username *
Note The Windows Kernel Trace provider can write only to a special trace session called the NT Kernel Logger. It cannot write events to any other trace session. Also, only one NT Kernel Logger session can be running at any one time. To gather a Kernel Logger trace, the Performance Logs and Alerts service must run under an account with Administrator credentials.
IIS 6.0 trace providers  Comprehensive IIS 6.0 event tracing uses several providers, namely HTTP.SYS, WWW Server, WWW Isapi Extension, ASP, ASP.NET, and StreamFilter.
Create a configuration file (say, named Iisprovs.txt) that contains the following lines:

"HTTP Service Trace"                 0  5
"IIS: WWW Server"                    0  5
"IIS: Active Server Pages (ASP)"     0  5
"IIS: WWW Isapi Extension"           0  5
"ASP.NET Events"                     0  5
Once you create this file, you can issue the following command to start the IIS 6.0 event trace:

logman start "IIS Trace" -pf iisprovs.txt -ct perf -o iistrace.etl -bs 64 -nb 200 400 -ets

On high-volume Web sites, using more trace buffers and larger buffer sizes might be appropriate, as illustrated. Gathering a kernel trace at the same time is highly recommended.
More Info For more information about and documented procedures for using
Logman to gather IIS trace data, see the “Capacity Planning Tracing” topic in the IIS 6.0
Help documentation, which is available with the IIS Administration Microsoft Management Console snap-in.
Active Directory trace providers  Comprehensive Active Directory tracing also uses several providers. Create a configuration file Adprov.txt that contains the following Active Directory provider names:
"Active Directory: Core"
"Active Directory: SAM"
"Active Directory: NetLogon"
"Active Directory: Kerberos"
"Local Security Authority (LSA)"
"NTLM Security Protocol"
Then, issue the following command to start an Active Directory trace:

logman start ADTrace -pf adprov.txt -o adtrace.etl -ets
Gathering a kernel trace at the same time is strongly recommended.
Additional Logman parameters for event tracing include those listed in Table 2-27.
Table 2-27 Logman Parameters for Event Tracing

Enable Trace Provider(s)
Syntax: -p {GUID | Provider} [(Flags[,Flags ...])] [Level]
Function: Specifies the trace data providers to use for this session.
Notes: Use -pf FileName to take provider names and flags from a settings file.

Buffer size
Syntax: -bs Value
Function: Specifies the buffer size in KB used for this trace data collection session.
Notes: If trace events occur faster than they can be logged to disk, some trace data can be lost. A larger buffer might be necessary.

Number of buffers
Syntax: -nb Min Max
Function: Specifies the minimum and maximum number of buffers for trace data collection.
Notes: The default minimum is the number of processors on the system plus two. The default maximum is 25.

Set trace mode options
Syntax: -mode [TraceMode [TraceMode ...]]
Function: TraceMode can be globalsequence, localsequence, or pagedmemory.
Notes: The pagedmemory option uses pageable memory buffers.

Clock resolution
Syntax: -ct {system | perf | cycle}
Function: perf and cycle use a 100 ns timer vs. a 1 ms system clock.
Notes: perf provides the most accurate timer resolution; cycle uses the least overhead.

Create and start the trace session
Syntax: -ets
Function: Starts a trace session by using the logging parameters defined on the command line for this session.

Computer
Syntax: -s ComputerName
Function: Specifies the computer you want to gather the performance counters from.
Notes: If no computer name is provided, the local computer is assumed.

Realtime session
Syntax: -rt
Function: Displays trace data in real time; does not log trace data to a file.

User mode tracing
Syntax: -ul
Function: Specifies that the event trace session is run in User mode.
Notes: In User mode, only one provider can be enabled for the event trace session.

Output file name
Syntax: -o {Path | ...}
Function: Specifies the output file name. If the file does not exist, Logman will create it.
Notes: Required. Event trace log files are in binary format, identified by an .etl extension.

File versioning
Syntax: -v {NNNNNN | MMDDHHMM}
Function: Generates unique file names, either by numbering them consecutively or by adding a time and date stamp to the file name.

Flush timer
Syntax: -ft [[HH:]MM:]SS
Function: Flushes events from buffers after the specified time.

File size limit
Syntax: -max Value
Function: Specifies the maximum log file or database size in MB.
Notes: Logging ends when the file size limit is reached.

Logger name
Syntax: -ln LoggerName
Function: Specifies a user-defined name for the event trace logging session.
Notes: By default, the logger name is the collection name.

Append
Syntax: -a
Function: Appends the output from this logging session to an existing file.

Begin logging
Syntax: -b M/D/YYYY H:MM:SS [{AM | PM}]
Function: Begins a logging session automatically at the designated date and time.

End logging
Syntax: -e M/D/YYYY H:MM:SS [{AM | PM}]
Function: Ends a logging session automatically at the designated date and time.
Notes: Or use -rf to specify the duration of a logging session.

Log duration
Syntax: -rf [[HH:]MM:]SS
Function: Ends a logging session after this amount of elapsed time.
Notes: Or use -e to specify a log end date/time.

Repeat
Syntax: -r
Function: Repeats the collection every day at the same time. The time period is based on either the -b and -rf options, or the -b and -e options.
Notes: Use in conjunction with the -cnf, -v, and -rc options.

Start and stop data collection
Syntax: -m [start] [stop]
Function: Starts and stops an interactive logging session manually.

User name and password
Syntax: -u UserName Password
Function: Specifies the user name and password for remote computer access.
Notes: The user account must be a member of the Performance Log Users group. Specify * to be prompted for the password at the command line.
Event Timestamps
If event trace entries appear to be out of order, you might need to use a higher resolution clock. Sometimes, if you use the system time as the clock, the resolution (10 ms) might not be fine enough for a certain sequence of events. When a set of events all show the same timestamp value, the order in which the events appear in the log is not guaranteed to be the order in which they actually occurred. If you see this occurring, use the perf clock, which has a finer resolution (100 ns). When using Logman, -ct perf forces use of the perf clock.
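For example, an existing scheduled collection can be switched to the perf clock with the update subcommand; the collection name here is illustrative:

logman update "IIS Trace" -ct perf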
File Size Limit Checking
If you specify a file size limit using the -max value, the file system in which you plan to create the trace log file is evaluated before the trace session starts to verify that enough space exists to run the entire trace to completion. This file size check is performed only when the log file is stored on the system drive. If the system drive has insufficient space, the logging session fails, and a generic smlogsvc error message similar to the following is reported in the event log:

Unable to start the trace session for the <session name> trace log configuration. The Kernel trace provider and some application trace providers require Administrator privileges in order to collect data. Use the Run As option in the configuration application to log under an Administrator account for these providers. System error code returned is in the data.

The data field is "0000: 70 00 00 00   p...".

The error return code of 70 00 00 00 in this error message indicates an out-of-space condition that prevented the logging session from being started.

Note that additional configuration information is written to the front of every trace log file, so slightly more disk space than you specified in the -max parameter is actually required to start a trace session.
Event Trace Reports
You can use the Trace Report (Tracerpt.exe) tool to convert one or more .etl files to
.csv format so that you can view the contents of a trace log. To do this, issue the following command:
tracerpt iistrace.etl -o iistrace.csv
Opening the output .csv file in a program such as Microsoft Excel allows you to view the recorded trace events in sequence. The fields in Table 2-28 accompany every trace event.
Table 2-28 Fields Accompanying Trace Events

TID: Thread identifier

Clock time: The time the event occurred, using the clock timer resolution in effect for the logging session

Kernel (ms): Processor time in Kernel mode

User (ms): Processor time in User mode

User data: Variable, depending on the event type

IID: Instance ID

PID: Parent instance ID
More Info
For more information about tracerpt, in Help and Support Center for
Microsoft Windows Server 2003, click Tools, and then click Command-Line Reference A–Z.
When the -report option is used with a log file that contains trace data from the Windows Kernel Trace, IIS, Spooler, or Active Directory providers, tracerpt generates additional tables in the report that contain preformatted data related to each. For example,
the following command generates a report showing tables that incorporate information from the Windows Kernel Trace and IIS providers.
tracerpt iistrace.etl kerneltrace.etl -report iis.html -f html
Alerts
The capability to generate real-time alerts automatically based on measurements that
exceed designated thresholds is an important aspect of any program of proactive performance monitoring. Performance Logs and Alerts provides an alerting service that
can be set to take action when one or more specific counter values have tripped a predetermined limit. In this way, you can be notified of potential performance problems
without having to constantly monitor your systems.
This section documents how to set up alerting using the Performance Logs and Alerts
facility of the Performance Monitor console. Recommendations for Alert thresholds
are provided in Chapter 3, “Measuring Server Performance.” Several practical examples of proactive performance monitoring procedures that utilize alerts are described
in Chapter 4, “Performance Monitoring Procedures.”
Configuring Alerts
An alert is one or more threshold tests, defined for values of specific performance
counters, that trigger an action when those defined measurements exceed their designated threshold values. You can configure numerous actions to occur automatically
when the alert occurs, including sending a message, running a designated program to
take further action, or starting a counter or event trace logging session. You can also
configure multiple alerts to perform different actions.
For example, suppose you have a file server, and you want to log a serious event in the
Event log when the free disk space on the primary drive drops below 20 percent. In
addition, you might want to be notified via a network message when disk free space
drops below a critical 10 percent level, because in that case you might need to take
immediate action. To accomplish this, you would configure two different alerts. In
both cases, the alert definition will check the same performance counter (\LogicalDisk(D:)\% Free Space). Each separate alert definition will have a different threshold
test and perform different actions when the threshold is exceeded. Note that in this
example, when the monitored disk space goes below 10 percent, both alerts are triggered and will perform their designated action. Therefore, it would be redundant to
configure the 10 percent alert to log an event to the Event log, because the 20 percent
alert will already do that. When you use different alerts to monitor the same counter,
keep the logical relationship of their limits in mind so that unnecessary actions do not
occur as a result of overlapping conditions.
Alerts are identified by a name as well as a more descriptive comment. Each alert definition must have a unique name, and you can define as many alerts as you think
appropriate.
Scheduling Alerts
You must consider two aspects of alert scheduling, remembering that each alert you
define is scheduled to run independently of every other defined alert. The first scheduling aspect is the frequency with which you want to evaluate the counters that are
involved in the threshold test or tests. This is equivalent to a sampling interval. The
sampling interval you set determines the maximum number of alert messages or
event log entries that can be created per alert test. The sampling interval is set on the
General tab of the alert definition Properties pages, as illustrated in Figure 2-24. For
example, if you want to test a counter value every 5 seconds and generate an alert message if the designated threshold is exceeded, you would set a sample interval of 5 seconds on the General tab.
Figure 2-24 The General tab for a Severe Disk Free Space Shortage alert
The other alert actions that can occur, which are running a program automatically or
starting an event tracing or counter log session automatically, are governed by a separate scheduling parameter. These actions are performed no more than once per alert
session, no matter how many times the alert threshold is exceeded over the course of
any single session. Note that these actions, when defined, are performed at the first
sampling interval that the alert threshold is exceeded during the session. The duration of the Alert session is set on the Scheduling tab of the alert definition Properties
pages, as illustrated in Figure 2-25. For example, if you wanted to gather an IIS 6.0
event trace no more than once per hour if ASP Request execution time exceeds some
threshold service level, you would schedule the Alert scan to run for one hour and to
start a new scan when the current Alert scan finishes.
Figure 2-25 Scheduling properties for an alert
Configuring Alert Thresholds
Adding performance counters to an alert definition is similar to adding performance
counters to a performance log query; however, adding them to an alert is a two-step
process. The first step is to select the performance counters you want to monitor, and
specify the interval between the data samples. The next step, unique to an alert configuration, is to set the limit threshold for each counter. You define the counter values
that you want to test and their threshold values on the General property page of the
alert definition, as illustrated in Figure 2-24.
When planning your alerting strategy, keep the logical relationship of all related
counter threshold values in mind. For example, two entries in the same alert in which
the same performance counter is listed twice but has different and overlapping thresholds might be redundant. In the following example, the second condition is unnecessary, because the first one will always be exceeded before the second one is exceeded.
\Processor(_Total)\% Processor Time > 80%
\Processor(_Total)\% Processor Time > 90%
It can also be worthwhile to have the same counter listed twice in an alert, but without
overlapping values. For example, you might want to know when the server is not busy
so that you can perform routine maintenance; you might also want to be alerted when
the server reaches a threshold of excessive activity. You can configure one alert scan to
track server usage, as in the following example:
\Server\Files Open < 20
\Server\Files Open > 1000
In this example, the same counter is monitored; however, the conditions do not overlap.
If you need to monitor the same counter and be alerted to two different but overlapping thresholds, you might want to consider using separate alerts. For example, you
can configure a “warning” alert scan that tracks one or more values by using the first—
or warning—threshold values and that initiates a warning action (such as logging an
event to the event log). You might also want to configure a second alert scan that
tracks the “danger” thresholds that require immediate attention. This makes it possible for different actions to be taken at the different thresholds and prevents overlapping thresholds from being masked.
Configuring Alert Notification
Alerts can be configured to perform several different actions when one of the conditions is met. The action to be taken is configured on the Action tab in the alert property sheet.
Log an Event to an Application Log
This is the default action taken. It can be useful in several ways:
■ The event that was generated by an alert condition can be compared to other events in the event log to determine whether there is some correlation between the alert and other system or application events.
■ The events in the event log can be used as a record to track issues and alert conditions that occur over time. Event-log analysis tools can be used to further refine this information.
A sample event log entry that the Alert facility creates is illustrated in Figure 2-26.
These log entries are found in the Application Event log. The source of these event log
messages is identified as SysmonLog with an Event ID of 2031. The body of the event
log message identifies the counter threshold that was tripped and the current measured value of the counter that triggered the Alert message.
Figure 2-26 Event log entry
Send a Network Message
For alert conditions that require immediate attention, a network message can be sent
to a specific computer. You can specify either a computer name to send the message to
or an IP address.
Start a Performance Data Log
An alert can also be configured to start a performance data log in which additional performance data will be collected. For example, you can configure an alert that monitors
processor usage; when that counter exceeds a certain level, Performance Logs and
Alerts can start a performance data log that collects data on which processes were running at the time and how much processor time each was using. You can use this feature to collect performance data at critical times without having to collect data when
there is nothing noteworthy to observe, thus saving disk space and analysis time.
To log data in response to an alert threshold being reached, you need to create the log
query first. Define the log query by using Counter Logs in Performance Logs and Alerts.
When configuring the log file that will run in response to an alert trigger, be sure to
define a long enough logging session so that you will get enough data to analyze.
After the log query is defined, you can configure the alert by using Alerts in Performance Logs and Alerts to define the alert conditions. On the Action tab in the alert
property sheet, select the log query from the list under the Start Performance Data
Log check box, as illustrated in Figure 2-27.
Figure 2-27 Action tab for an alert
In this example, when the alert fires, the alert initiates an event trace logging session. This session gathers both IIS and kernel trace information that will allow you
to report on Web site activity. Note that the performance data log session is initiated
only once per alert session, corresponding to the first time in the alert session that
the alert fires.
Run a Program
The most powerful action an alert can take is to run a command when the alert
threshold condition is met. The specified command is passed a command line detailing the alert condition and time. The format of this command line is configured by
using the Command Line Arguments dialog box; a sample of the command line is
displayed on the Action tab. It is important to make sure the command line sent to
the command to be run is formatted correctly for that command. In some cases, it
might be necessary to create a command file that reformats the command line so that
the command runs properly.
Command-line argument format  The information passed to the program can be formatted in several different ways. The information can be passed as a single argument with the individual information fields delimited by commas, or as separate arguments, each enclosed within double quotation marks and separated by spaces. Choose the format that is most suitable to the program to which the arguments are being passed. Note that the program you schedule to run is run only once per alert session, corresponding to the first time in the alert session that the alert fires.
Command-line arguments passed by the alert service might not conform to the arguments expected by another program unless the program was specifically written to be
used with the alert service. In most cases, you will need to write a command file that
formats the arguments for use by your program, or develop a specific program to
accept the arguments passed by the alert service. Here are some examples:
Example 1

REM Command file to log alert messages to a text file
REM This file expects the alert arguments to be passed
REM as separate strings and the user text to be
REM the destination file name.
REM All alert info should be sent to the command file:
REM   %1 = the alert name
REM   %2 = the date/time of the alert
REM   %3 = the counter path
REM   %4 = the measured value
REM   %5 = the alert condition
REM   %6 = the user text (the file name to log this info to)
Echo %1 %2 %3 %4 %5 >>%6
Example 2

REM Command file to send alert data to (an imaginary) program
REM This file expects the alert string to be passed
REM as a single string. This file adds the command-line
REM switches necessary for this program:
REM   %1 = the command string formatted by the alert service
MyLogApp /data=%1 /logtype=alert
Command-line argument fields  The argument passed to the program can contain information fields that describe the alert and the condition that was met. The fields that are passed can be individually enabled or disabled when the alert is configured; however, the order in which they appear cannot. The fields in the following list are described in the order in which they appear in the command-line argument, from left to right (first argument to last), as they are processed by the command-line processor:
■ Alert Name: The name of the alert as it appears in Performance Logs and Alerts in the Performance console. It is the unique name of this alert scan.

■ Date/Time: The date and time the alert condition occurred. The format is YYYY/MM/DD-HH-MM-SS-mmm, where:
  ❑ YYYY is the four-digit year.
  ❑ MM is the two-digit month.
  ❑ DD is the two-digit day.
  ❑ HH is the two-digit hour from the 24-hour clock (00 = midnight).
  ❑ MM is the two-digit minutes past the hour.
  ❑ SS is the two-digit seconds past the minute.
  ❑ mmm is the number of milliseconds past the second.

■ Counter Name: The name of the performance object, the instance (if required), and the counter of the performance counter value that was sampled and tested to meet the specified alert condition.

■ Measured Value: The decimal value of the performance counter that met the alert condition.

■ Limit Value: The limit condition that was met.

■ Text Message: A user-specified text field.
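To make the format concrete, here is a hypothetical single-argument command line using the comma-delimited format; every value shown is illustrative only:

"Severe Disk Free Space Shortage,2005/03/21-10-26-00-000,\LogicalDisk(D:)\% Free Space,8.5,< 10,c:\logs\alerts.txt"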
Windows System Resource Manager
Windows System Resource Manager (WSRM) is a new MMC snap-in that comes on a separate CD and is shipped only with Microsoft Windows Server 2003, Enterprise Edition, and Microsoft Windows Server 2003, Datacenter Edition. The benefit of using WSRM is that you can manipulate individual processes or groups of processes to enhance system performance. The rules that aggregate processes into manageable groups are called process matching criteria. WSRM allows you to set limits on CPU usage and memory allocation per process, based on process matching criteria. For more information about WSRM, see Chapter 6, "Advanced Performance Topics."
Network Monitor
A poorly performing system is sometimes the result of a bottleneck in your network.
Network Monitor is a tool that allows you to monitor your network and detect traffic
problems. It also allows you to isolate different types of network traffic, such as all traffic created by accessing the DNS database or all network traffic caused by domain controller replication. By using Network Monitor you can quickly tell what percentage of
your network is being utilized and which applications are using too much bandwidth.
A version of Network Monitor with reduced functionality is included with Microsoft
Windows Server 2003, Standard Edition; Microsoft Windows Server 2003, Enterprise
Edition; and Microsoft Windows Server 2003, Datacenter Edition. It is limited to monitoring local network traffic only. If you want to monitor network traffic on other computers, you must install the version of Network Monitor that comes with Microsoft
Systems Management Server.
Chapter 3
Measuring Server Performance
In this chapter:
Using Performance Measurements Effectively . . . . . . . . . . . . . . . . . . . . . 237
Key Performance Indicators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
Microsoft® Windows Server™ 2003 provides extensive statistics on its operation and
performance. You can gather statistics on processor scheduling, virtual memory management, disk operation, and network communications. In addition, server applications such as Active Directory, Internet Information Services (IIS), network file and
print sharing services, and Terminal Services provide measurements that enable you
to understand what is going on inside these applications.
This chapter identifies the most important performance counters that are used to
diagnose and solve performance problems, support capacity planning, and improve
operational efficiency. It identifies those counters that are primary indicators of specific
performance problems related to critical resource shortages. These primary indicators
provide direct evidence that specific capacity constraints are potentially limiting current performance levels. This chapter also identifies important secondary indicators of
performance and capacity problems. Secondary indicators provide more indirect evidence of capacity constraints that are impacting performance levels. Alone, a single
secondary indicator is often inconclusive, but a combination of secondary indicators
can reliably build a case for specific capacity constraints that are causing problems. In
conjunction with primary indicators, these secondary indicators are useful for confirming and supporting your diagnosis and conclusions.
Gathering these performance statistics is useful when you can use the information to
diagnose and resolve performance problems. Performance problems arise whenever requests are delayed waiting for an overloaded resource. Overloaded resources become bottlenecks that slow down the processing of
requests that your Windows Server 2003 machines must service. For bottleneck
detection, it is important to capture measurements showing how busy various computer resources are and the status of the queues where requests are delayed. Fortunately, many Windows Server 2003 performance statistics can help you pinpoint
saturated computer resources and identify queues with backed-up requests.
You can also use the performance statistics you gather daily to proactively anticipate
performance problems that are brewing and to forecast workload growth. You can
then act to relieve a potential bottleneck before it begins to hamper application performance. For forecasting purposes, it is important to capture measures of load, such as
requests per second or the number of connected users. Fortunately, there are many
metrics available that are good indicators of load and workload growth.
Finally, many performance measurements are important for reasons other than performance tracking, such as for reporting operational problems. Measurements that
show system and application availability help pinpoint where operational problems
exist. Gathering and reporting performance statistics that show application up time
help to measure the stability of your information technology (IT) infrastructure.
Tracking and reporting error conditions involving connections lost or error messages
sent also focuses attention on these operational issues. This chapter identifies the performance statistics that you should collect regularly to:
■ Resolve the performance problems you encounter
■ Support the capacity planning process so that you can intervene in a timely fashion to avoid future performance problems
■ Provide feedback to IT support staff and customers on operational trends
This chapter also provides tips that will help you set up informative alerts based on
performance counter measurements that exceed threshold values. In addition, you’ll
find measurement notes for many important performance counters that should
answer many of your questions about how to interpret the values you observe. Finally,
extensive usage notes are provided that describe how to get the most value from the
performance monitoring statistics you gather.
Note
You will find that the same set of performance counters described in this
chapter is available in many other tools. Other applications that access the same performance statistics include Microsoft Operations Manager (MOM) and those developed by third parties. All applications that gather Windows Server 2003 performance
measurements share a common measurement interface—a performance monitoring
application programming interface (API) discussed in this book in Chapter 2, “Performance Monitoring Tools.” The performance monitoring API is the common source of
all the performance statistics these tools gather.
Using Performance Measurements Effectively
This chapter focuses on the key performance counters that you should become familiar with to better understand the performance of your machines running Windows
Server 2003. Guidance is provided here to explain how these key measurements are
derived and how they should be interpreted. It is assumed that you are familiar with
the performance monitoring concepts discussed in this book in Chapter 1, “Performance Monitoring Overview.” That chapter discusses the relationship between these
counters and the hardware and software systems they measure, and understanding
these relationships is a prerequisite for performing effective analysis of common computer performance problems. Chapter 4, “Performance Monitoring Procedures,” provides a set of recommended performance monitoring procedures that you can use to
gather these and related performance counters on a regular basis, which will support
problem diagnosis, management reporting, and capacity planning.
Interpreting the performance data you gather can be challenging. It requires considerable expertise in understanding the way computer hardware and operating software work. Interpreting the performance data you gather correctly also requires
good analytical and problem-solving skills. The analysis of many computer performance and capacity problems involves identification of resource bottlenecks and
the systematic elimination of them. In large-scale environments in which you are
responsible for the performance and capacity of many machines, it is also important
to take steps to deal with the large amount of performance data that you must
potentially gather. This critical topic is discussed thoroughly in Chapter 4, “Performance Monitoring Procedures.”
Identifying Bottlenecks
As discussed in Chapter 1, “Performance Monitoring Overview,” the recommended
way to locate a bottlenecked resource that is the major contributor to a computer performance problem requires the following actions:
■ Gathering measurement data on resource utilization at the component level
■ Gathering measurement data on queuing delays that are occurring at resources that might be overloaded
■ Determining the relationship between resource utilization and queuing
Theoretically, a nonlinear relationship exists between utilization and queuing, which
becomes evident when a resource approaches saturation. When you detect a nonlinear relationship between utilization and queuing at a resource, there is a good chance
that this overloaded resource is causing a performance constraint. You might be able
to add capacity at this point, or you might be able to tune the system so that demands for the resource are reduced. Performance tuning is the process of systematically finding and eliminating resource bottlenecks that constrain performance levels.
For a variety of reasons—some of which were discussed in Chapter 1, “Performance
Monitoring Overview”—this nonlinear relationship might not be readily apparent,
making bottleneck detection complicated to perform in practice. For example, consider disks and disk arrays that use some form of cache memory. Small computer system interface (SCSI) command-tag queuing algorithms sort the queue of pending requests to favor those that can be serviced fastest, so the efficiency of disk I/O request processing improves as the disks get busier. A device that behaves this way is known as a load-dependent server. Another common example is network adapters that support the
Ethernet protocol. The Ethernet collision detection and avoidance algorithm can lead
to saturation of the link long before the effective utilization of the interface reaches
100 percent busy.
Specific measurement statistics that you gather should never be analyzed in a vacuum. You will frequently need to augment the general approach to bottleneck detection discussed here with information about how specific hardware and software
components are engineered to work. In this chapter, the “Usage Notes” section for
each key counter identified discusses how this specific counter is related to other
similar counters. In addition, several case studies that show how to identify these
and other specific resource bottlenecks are provided in Chapter 5, “Performance
Troubleshooting.”
Management by Exception
System administrators who are responsible for many Windows Server 2003 machines
have an additional challenge: namely, how to keep track of so much measurement
data across so many machines. Often this is best accomplished by paying close attention to only that subset of your machines that is currently experiencing critical
resource shortages. This approach to dealing with a large volume of information is
sometimes known as management by exception. Using a management by exception
approach, you carefully identify those machines that are experiencing the most severe
performance problems and subject them to further scrutiny. Management by exception is fundamentally reactive, so it also needs to be augmented by a proactive
approach that attempts to anticipate and avoid future problems.
Several kinds of performance level exceptions need to be considered. Absolute exceptions are easy to translate into threshold-based rules. For example, a Web server
machine that is part of a load-balanced cluster and is currently not processing any client requests is likely to be experiencing an availability problem that needs further
investigation. Unfortunately, in practice, absolute exceptions that can be easily turned
into alerting thresholds are rare in computer systems. As a result, this chapter makes
very few specific recommendations for setting absolute alerting thresholds for key
performance counters.
Exception-reporting thresholds that are related to configuration-specific capacity constraints are much more common. Most of the alerting thresholds that are discussed
in this chapter are relative exceptions. A threshold rule to define a relative exception
requires that you know some additional information about the specific counter,
such as:
■ Whether there is excessive utilization of a resource relative to the effective capacity of the resource, which might be configuration-dependent
■ Whether there is a backlog of requests being delayed by excessive utilization of a resource
■ Which application is consuming the resource bandwidth and when this is occurring
■ Whether the current measurement observation deviates sharply from historical norms
A good example of an exception that is relative to specific capacity constraints is an
alert on the Memory\Pages/sec counter, which can indicate excessive paging to disk.
What is considered excessive paging to disk depends to a large degree on the capacity
of the disk or disks used to perform I/Os, and how much of that capacity can be
devoted to paging operations without negatively impacting the I/O performance of
other applications that rely on the same disk or disks. This is a function of both the
configuration and of the specific workloads involved. Consequently, it is impossible
to recommend a single threshold value for Memory\Pages/sec that should be used to
generate a performance alert that you can apply across all your machines running
Windows Server 2003.
Although there is no simple rule that you can use to establish unacceptable and
acceptable values of many measurements, those measurements can still be effective in
helping you identify many common performance problems. The Memory\Pages/sec
counter is a key performance indicator. When it reports high rates of paging to disk
relative to the capacity of the physical disk configuration, you have a telltale sign that
the system is being operated with a physical memory constraint.
Another example of an exception that is relative to specific capacity constraints is an
alert on the Processor(_Total)\% Processor Time counter, indicating excessive processor utilization. The specific alerting threshold you choose for a machine to let you know that excessive processor resources are being consumed should depend on the
number of processors in the machine, and also on whether those processors are configured symmetrically so that any thread can be serviced on any processor. (Configuring asymmetric processors to boost the performance of large-scale multiprocessors is
discussed in Chapter 6, “Advanced Performance Topics.”) You might also like to know
which processes are associated with the excessive processor utilization that was measured. During disk-to-tape backup, running one or more processors at nearly 100 percent utilization might be expected and even desirable. On the other hand, an
application component called from a .NET application that utilizes excessive processor resources over an extended period of time is often a symptom associated with a
programming bug that can be very disruptive of the performance of other applications running on the same machine.
Tip
To establish meaningful alert thresholds for many relative exceptions, it is
important to be able to view specific counter measurements in a broader, environment-specific context.
In many cases, your understanding of what constitutes an exception should be based on deviation from historical norms. Many performance counter measurements, such as System\Context Switches/sec or Processor(_Total)\% Interrupt Time, are meaningful error indicators when they deviate sharply from the measurements you have gathered in the past. A sharp change in the number of context switches that are occurring, relative to the amount of processor time consumed by a workload, might reflect a programming bug that is degrading performance. Similarly, a sharp increase in % Interrupt Time or % DPC Time might be indirect evidence of hardware errors. This is the foundation of statistical quality control methods, which have proved very effective in detecting defects in manufacturing and other mass production processes. Applying these methods, you might classify a measurement that is two, three, or four standard deviations from a historical baseline as an exception requiring more scrutiny.
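A minimal sketch of this baseline technique, in C, follows. The counter values, the three-sigma rule, and the is_exception helper are illustrative assumptions, not a prescribed implementation:

    #include <math.h>
    #include <stdio.h>

    /* Flag an observation that deviates sharply from a historical baseline. */
    static int is_exception(const double *history, int n,
                            double observed, double sigmas)
    {
        double sum = 0.0, sumsq = 0.0, mean, stddev;
        int i;

        for (i = 0; i < n; i++) {
            sum += history[i];
            sumsq += history[i] * history[i];
        }
        mean = sum / n;
        stddev = sqrt(sumsq / n - mean * mean);
        return fabs(observed - mean) > sigmas * stddev;
    }

    int main(void)
    {
        /* Illustrative historical Context Switches/sec measurements */
        double baseline[] = { 4200, 3900, 4100, 4500, 4300, 4000 };

        if (is_exception(baseline, 6, 9800.0, 3.0))
            printf("Alert: observation deviates more than 3 standard "
                   "deviations from the historical baseline\n");
        return 0;
    }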
Finally, the technique of management by exception can be used to help you focus on
machines needing the most attention in large-scale environments in which you must
monitor many machines. The management by exception approach to crisis intervention is termed triage—that is, classifying problems according to their severity so that
the limited time and attention available for devising solutions can be allocated appropriately. This approach suggests, for example, creating Top Ten lists that show the
servers that are overloaded the most in their use of critical resources, and, in general,
devoting your attention to dealing with the most severe problems first.
Key Performance Indicators
This section reviews the most important performance counters available on machines running Windows Server 2003. These counters are used to report on system and application availability and performance. For each key performance indicator, measurement notes describe how the indicator is derived, and usage notes provide additional advice on how to interpret these measurements in the context of problem solving and capacity planning. Some basic measures of system and application availability are discussed first, followed by the key counters that report on the utilization of the processor, memory, disk, and network resources. The last section of this chapter discusses some important server applications that are integrated with the base operating system, including the performance counters that are available to monitor file and print servers, Web servers, and thin-client Terminal servers.
System and Application Availability
Before you can worry about performance issues, servers and server applications have
to be up and running and available for use. This section describes the performance
counters that are available to monitor system and application up time and availability.
Table 3-1 describes the System\System Up Time counter.
Table 3-1  System\System Up Time Counter

Counter Type: Elapsed time.
Description: Shows the time, in seconds, that the computer has been operational since it was last rebooted.
Measurement Notes: The values of this counter are cumulative until the counter is reset the next time the system is rebooted.
Usage Notes: The primary indicator of system availability.
Performance: Not applicable.
Capacity Planning: Not applicable.
Operations: Reporting on system availability.
Alert Threshold: Not applicable.
Related Measures: Process(n)\Elapsed Time.
Availability can also be tracked at the application level by looking at the Process
object. Any measurement interval in which a Process object instance is not available
means that the process was not running at the end of the data collection interval.
When the process is active, the Process(n)\Elapsed Time counter contains a running
total that shows how long the process has been active. Note that some processes are
short-lived by design. The Process(n)\Elapsed Time counter can be used effectively
only for long-lived processes. Table 3-2 describes the Process(n)\Elapsed Time
counter.
Table 3-2  Process(n)\Elapsed Time Counter

Counter Type: Elapsed time.
Description: Shows the time, in seconds, that the process has been active since it was last restarted.
Measurement Notes: The process instance exists only during an interval in which the process was found running at the end of the interval. The values of this counter are cumulative across measurement intervals while the process is running.
Usage Notes: The primary indicator of application availability. For system services, compare the value of Process(n)\Elapsed Time with the System\System Up Time counter to determine whether the application has been available continuously since the machine was rebooted.
Performance: Not applicable.
Capacity Planning: Not applicable.
Operations: Reporting on application availability.
Alert Threshold: Not applicable.
Related Measures: System\System Up Time.
Other potential measures of system availability are the TCP\Connections Active
counter and the Server\Server Sessions counter, which indicate the status of network
connectivity. A system that is up and running but cannot communicate with other
machines is probably not available for use. For Web application hosting, separate FTP
Service\FTP Service Uptime and Web Service\Service Uptime counters are available
for those Web server applications.
Processor Utilization
Program execution threads consume processor (CPU) resources. These threads can
be part of User-mode processes or the operating system kernel. High-priority device
interrupt processing functions are performed by Interrupt Service Routines (ISRs)
and deferred procedure calls (DPCs). Performance counters are available that measure how much CPU processing time threads and other executable units of work consume. These processor utilization measurements allow you to determine which
applications are responsible for CPU consumption. The performance counters available for monitoring processor usage include the Processor object, which contains an
instance for each hardware engine and a _Total instance that summarizes usage levels
over all available processors. In addition, processor usage is tracked at the process and
thread level.
Process-level processor utilization measures are sometimes available for specific
server applications like Microsoft SQL Server, too. These applications are not able to
report any more detailed information than the process and thread instances provide,
but you might find it more convenient to gather them at the application level, along
with other related application-specific counters.
Measuring Processor Utilization
Processor utilization statistics are gathered by a Windows Server 2003 operating system measurement function that gains control during each periodic clock interval. This
measurement function runs inside the Interrupt Service Routine (ISR) that gains control during clock interrupt processing. The ISR code determines what work, if any,
was being performed at the time the interrupt occurred. Each periodic clock interval
is viewed as a random sample of the processor execution state. The ISR processor
measurement routine determines which process thread was executing, and whether
the processor was running in Interrupt mode, Kernel mode, or User mode. It also
records the number of threads in the processor Ready Queue.
The ISR processor measurement routine develops an accurate picture of how the processor is being utilized by determining what thread (and what kind of thread) was
running just before the interrupt occurred. If the routine that was interrupted when
the ISR processor measurement routine gained control is the Idle Thread, the processor is assumed to be Idle.
The operating system accumulates measurement samples 50–200 times per second,
depending on the speed and architecture of the machine. This sampling of processor
state will be quite accurate for any measurement interval for which at least several
thousand samples can be accumulated. At the process level, measurement intervals of
30–60 seconds should provide enough samples to identify accurately even those processes that consume trace amounts of CPU time. For very small measurement intervals in the range of 1–5 seconds, the number of samples that are gathered is too small
to avoid sampling errors that might cast doubt on the accuracy of the measurements.
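For example, assuming a sampling rate of 100 clock interrupts per second (within the 50–200 range cited above), a 60-second interval accumulates 6,000 samples, so a process consuming just 0.5 percent of the processor should register in roughly 30 of them. A 2-second interval accumulates only 200 samples, so the same process would be expected to appear in about one sample, and could easily show up in none or in several, yielding a measured utilization of 0 percent or several times the true value.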
Overall Processor Utilization
The primary indicator of processor utilization is contained in counters from the _Total
instance of the Processor object. The Processor(_Total)\% Processor Time counter
actually reports the average processor utilization over all available processors during
the measurement interval. Table 3-3 describes the Processor(_Total)\% Processor
Time counter.
Table 3-3  Processor(_Total)\% Processor Time Counter

Counter Type: Interval (% Busy).
Description: Overall average processor utilization over the interval. Every interval in which the processor is not running the Idle Thread, the processor is presumed to be busy on behalf of some real workload.
Measurement Notes: The processor state is sampled once every periodic interval by a system measurement function. The % Processor Time counter is computed from the ratio of samples in which the processor is detected running the Idle thread compared to the total number of samples, as follows:
% Processor Time = 100% − ((IdleThreadSamples ÷ TotalSamples) × 100)
Usage Notes: The primary indicator of overall processor usage.
■ Values fall within the range of 0–100 percent busy. The _Total instance of the Processor object calculates the average of the processor utilization instances, not the total.
■ Normalize based on clock speed for comparison across machines. Clock speed is available in the ~MHz field at HKLM\HARDWARE\DESCRIPTION\System\CentralProcessor\n.
■ Drill down to process-level statistics.
Performance: Primary indicator to determine whether the processor is a potential bottleneck.
Capacity Planning: Trending and forecasting processor usage by workload over time.
Operations: Sustained periods of 100 percent utilization might mean a runaway process. Investigate further by looking at the Process(n)\% Processor Time counter to see whether a runaway process thread is in an infinite loop.
Alert Threshold: For response-oriented workloads, beware of sustained periods of utilization above 80–90 percent. For throughput-oriented workloads, extended periods of high utilization are seldom a concern, except as a capacity constraint.
Related Measures: Processor(_Total)\% Privileged Time, Processor(_Total)\% User Time, Processor(n)\% Processor Time, Process(n)\% Processor Time, Thread(n/Index#)\% Processor Time.
Processors that are observed running for sustained periods at greater than 90 percent
busy are running at their CPU capacity limits. Processors observed running regularly
in the 75–90 percent range are near their capacity constraints and should be monitored more closely. Processors reported regularly only 10–20 percent busy might be
good candidates for consolidation.
Unique hardware factors in multiprocessor configurations and the use of Hyperthreaded logical processors raise difficult interpretation issues. These are discussed in
Chapter 6, “Advanced Performance Topics.”
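If you need to collect this counter programmatically rather than through System Monitor or the Log Service, the Performance Data Helper (PDH) programming interface, which is layered on the performance monitoring API mentioned earlier, provides a straightforward way to do so. The following is a minimal sketch, not a complete tool; the 15-second interval and four-sample loop are arbitrary choices, and error checking is abbreviated:

    #include <windows.h>
    #include <pdh.h>
    #include <stdio.h>

    #pragma comment(lib, "pdh.lib")

    int main(void)
    {
        HQUERY query;
        HCOUNTER counter;
        PDH_FMT_COUNTERVALUE value;
        int i;

        if (PdhOpenQuery(NULL, 0, &query) != ERROR_SUCCESS)
            return 1;
        PdhAddCounter(query, TEXT("\\Processor(_Total)\\% Processor Time"),
                      0, &counter);

        PdhCollectQueryData(query);       /* first sample sets the baseline */
        for (i = 0; i < 4; i++) {
            Sleep(15000);                 /* 15-second measurement interval */
            PdhCollectQueryData(query);
            PdhGetFormattedCounterValue(counter, PDH_FMT_DOUBLE, NULL, &value);
            printf("%% Processor Time: %.1f\n", value.doubleValue);
        }
        PdhCloseQuery(query);
        return 0;
    }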
Normalizing processor utilization measures The % Processor Time counters are
reported as percentage busy values over the measurement interval. For comparison
across machines of different speeds, you can use the value of the ~MHz field at
HKLM\HARDWARE\DESCRIPTION\System\CentralProcessor\n to normalize
these measurements to values that are independent of the speed of the specific hardware. Processor clock speed is a good indicator of processor capacity, but a less than
perfect one in many cases. Comparisons across machines of the same processor family or architecture are considerably more reliable than comparisons across machines
with quite different architectures. For example, it is difficult to compare hyperthreaded multiprocessors with conventional multiprocessors based on clock speed
alone, or 32-bit processor families with 64-bit versions. For more discussion about
processor architectures and their impact on processor performance, see Chapter 6,
“Advanced Performance Topics.”
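For example, here is a minimal sketch in C that reads the rated clock speed of the first processor from this registry location; the 60 percent utilization figure in the final calculation is just an illustrative value:

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        HKEY key;
        DWORD mhz = 0, size = sizeof(mhz);

        if (RegOpenKeyEx(HKEY_LOCAL_MACHINE,
                TEXT("HARDWARE\\DESCRIPTION\\System\\CentralProcessor\\0"),
                0, KEY_QUERY_VALUE, &key) == ERROR_SUCCESS) {
            RegQueryValueEx(key, TEXT("~MHz"), NULL, NULL,
                            (LPBYTE)&mhz, &size);
            RegCloseKey(key);
        }
        /* Express utilization in speed-independent MHz-consumed units */
        printf("Rated speed: %lu MHz; 60%% busy = %lu MHz consumed\n",
               mhz, mhz * 60 / 100);
        return 0;
    }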
Some processor hardware, especially processor hardware designed for use in battery-powered portable machines, can run at multiple clock speeds. These processors drop to a lower clock speed to save power when they are running on batteries. As a result, the value of the ~MHz field at HKLM\HARDWARE\DESCRIPTION\System\CentralProcessor\n might not reflect the current clock speed. The ProcessorPerformance\Processor Frequency and ProcessorPerformance\% of Maximum Frequency counters enable you to weight processor utilization by the clock speed over a measurement interval for a processor that supports multiple clock speeds.
In the case of hyperthreaded processors, it might make sense to normalize processor
utilization of the logical processors associated with a common physical processor core
to report the utilization of the physical processor unit. The operating system measures
and reports on the utilization of each logical processor. The weighted average of the
utilization of the logical processors is a good estimate of the average utilization of the
physical processor core over the same interval. Normalizing the measures of processor utilization in this fashion avoids the logical error of reporting greater than 100 percent utilization for a physical processor core. To determine whether the machine is a
hyperthreaded multiprocessor or a conventional multiprocessor, User mode applications can make a GetLogicalProcessorInformation API call. This API call returns an array
of SYSTEM_LOGICAL_PROCESSOR_INFORMATION structures that show the relationship of logical processors to physical processor cores. On a hyperthreaded
machine with two physical processors, processor instances 0 and 2 are associated
with the first physical processor, and processor instances 1 and 3 are associated with
the second physical processor. For more discussion about hyperthreaded processor
architectures, see Chapter 6, “Advanced Performance Topics.”
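A minimal sketch of such a call follows. The fixed-size buffer is a simplification—production code should call the function once to obtain the required buffer length—and the output format is illustrative:

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        SYSTEM_LOGICAL_PROCESSOR_INFORMATION info[64];  /* simplification */
        DWORD len = sizeof(info), count, i;

        if (!GetLogicalProcessorInformation(info, &len))
            return 1;
        count = len / sizeof(SYSTEM_LOGICAL_PROCESSOR_INFORMATION);
        for (i = 0; i < count; i++) {
            /* A core whose mask has more than one bit set hosts
               multiple (hyperthreaded) logical processors. */
            if (info[i].Relationship == RelationProcessorCore)
                printf("Physical core: logical processor mask 0x%lx\n",
                       (unsigned long)info[i].ProcessorMask);
        }
        return 0;
    }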
Diagnosing processor bottlenecks Observing that the processors on a machine
are heavily utilized does not always indicate a problem that you need to address. During disk-to-tape backup operations, for example, it is not unusual for the backup agent
to drive processor utilization to near capacity. Your server might be performing many
other tasks, including data compression and encryption, which you can expect will be CPU-intensive. Try to drill down to the process level and identify the
processes that are the heaviest consumers of % Processor Time. You might also find
that breaking down overall processor utilization by processor execution state is useful
for determining whether User mode or Kernel mode functions are responsible for
driving processor utilization up.
A heavily utilized processor is a concern when there is contention for this shared
resource. You need to determine whether a processor capacity constraint is causing
contention that would slow application response time, or whether a single application process
is responsible for most of the processor workload. The important indicator of processor contention is the System\Processor Queue Length counter, described in Table
3-4, which measures the number of threads delayed in the processor Ready Queue.
Table 3-4  System\Processor Queue Length Counter

Counter Type: Instantaneous (sampled once during each measurement period).
Description: The number of threads that are observed as delayed in the processor Ready Queue and waiting to be scheduled for execution. Threads waiting in the processor Ready Queue are ordered by priority, with the highest priority thread scheduled to run next when the processor is idle.
Measurement Notes: The processor queue length is sampled once every periodic interval. The value reported as the Processor Queue Length is the last observed value of this measurement, obtained from the processor measurement function that runs every periodic interval.
Usage Notes: Many program threads are asleep in voluntary wait states. The subset of active threads sets a practical upper limit on the length of the processor queue that can be observed.
Performance: Important secondary indicator to determine whether the processor is a potential bottleneck.
Capacity Planning: Normally, not a useful indicator for capacity planning.
Operations: An indication that a capacity constraint might be causing excessive application delays.
Alert Threshold: On a machine with a single very busy processor, repeated observations in which Processor Queue Length > 5 are a warning sign that there is frequently more work available than the processor can handle readily. Ready Queue lengths > 10 are a strong indicator of a processor constraint, again when processor utilization also approaches saturation. On multiprocessors, divide the Processor Queue Length by the number of physical processors. On a multiprocessor configured using hard processor affinity to run asymmetrically, large values for Processor Queue Length can be a sign of an unbalanced configuration.
Related Measures: Thread(parent-process\Index#)\Thread State.
It is a good practice to observe the Processor(_Total)\% Processor Time in tandem
with the System\Processor Queue Length. Queuing Theory predicts that the queue
length should rise exponentially as processor utilization increases. Keep in mind that
Processor(_Total)\% Processor Time is based on a continuous sampling technique,
whereas the System\Processor Queue Length is an instantaneous value. It is not a
simple matter to compare a continuously measured value with an instantaneous one.
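As a rough illustration, consider the simplest single-server queuing model, which real processor scheduling only approximates. That model predicts:

    Average Queue Length = Utilization ÷ (1 − Utilization)

which yields a queue length of 1 at 50 percent busy, 4 at 80 percent, and 19 at 95 percent—a vivid picture of how sharply queuing accelerates as the processor approaches saturation.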
Queuing Theory also predicts that the queue length approaches infinity as the processor utilization approaches 100 percent. However, many program threads, especially
those inside background service processes, spend most of their time asleep in a voluntary wait state. These threads are normally not vying for the processor. Only active
threads do that. Consequently, the number of active threads sets a practical limit on
the size of the processor queue length that you are likely to observe.
Another factor limiting the size of the processor Ready Queue is server applications
that utilize thread pooling techniques and regulate their thread scheduling internally.
Be sure you check whether requests are queued internally inside these applications by
checking, for example, counters like ASP\Requests Queued, ASP.NET\Requests
Queued, and Server Work Queues(n)\Queue length. For more information about
thread pooling applications, see Chapter 6, “Advanced Performance Topics.”
The Thread(*)\Thread State counter is closely related to the System\Processor Queue
Length. Active threads showing a Thread State of 1 are Ready to run. The Thread State of a running thread is 2. When processor contention is evident, being able to determine which process threads are being delayed can be quite helpful. Unfortunately, the
volume of thread instances that you need to sift through is normally too large to
attempt to correlate Thread(*)\Thread State with the Processor Queue Length over
any reasonable period of time.
The size of the processor Ready Queue sometimes can appear disproportionately
large, compared to overall processor utilization. This is a by-product of the clock interrupt mechanism that is used to gather the processor Ready Queue length statistics.
Because there is only one hardware clock per machine, it is not unusual for threads
waiting on a timer interval to get bunched together. If this “bunching” occurs shortly
before the last periodic clock interval when the processor Ready Queue Length is
measured, the Processor Queue Length can artificially appear quite large. This can
sometimes be observed in machines supporting a large number of Terminal Services
sessions. Keyboard and mouse movements at Terminal Services client machines are
sampled by the server on a periodic basis. The Processor Queue Length value measured might show a large number of Ready threads for the period immediately following session sampling.
Process-Level CPU Consumption
If a Windows Server 2003 machine is dedicated to performing a single role, knowing
that machine’s overall processor utilization is probably enough information to figure
out what to do. However, if the server is performing multiple roles, it is important to
drill down and determine which processes are primarily responsible for the CPU
usage profile that you are measuring. Statistics on processor utilization that are compiled at the process level allow you to determine which workloads are consuming processor resources. Table 3-5 describes the Process(instancename)\% Processor Time
counter.
Table 3-5  Process(instancename)\% Processor Time Counter

Counter Type: Interval (% Busy).
Description: Total processor utilization by threads belonging to the process over the measurement interval.
Measurement Notes: The processor state is sampled once every periodic interval. % Processor Time is computed as (ProcessBusySamples ÷ TotalSamples) × 100.
Usage Notes: The primary indicator of processor usage at the process level.
■ Values fall within the range of 0–100 percent busy by default. A multithreaded process can be measured consuming more than 100 percent processor busy on a multiprocessor. On multiprocessors, the default range can be overridden by disabling the CapPercentsAt100 setting in the HKLM\SOFTWARE\Microsoft\Perfmon key.
■ The Process object includes an Idle process instance. Unless capped at 100 percent busy on a multiprocessor, Process(_Total)\% Processor Time will report 100 percent busy × the number of processor instances.
■ Volume considerations may force you to gather process-level statistics only for specific processes. For example, collect Process(inetinfo)\% Processor Time for Web servers, Process(store)\% Processor Time for Exchange Servers, and so on.
■ Process name is not unique. Operating system services are distributed across multiple svchost processes, for example. COM+ server applications run inside instances of Dllhost.exe. IIS 6.0 application pools run in separate instances of the W3wp.exe process. Within a set of processes with the same name, the ID Process counter is unique.
■ Be cautious about interpreting this counter for measurement intervals of 5 seconds or less; they might be subject to sampling error.
Performance: Primary indicator to determine whether process performance is constrained by a CPU bottleneck.
Capacity Planning: Trending and forecasting processor usage by application over time.
Operations: Sustained periods of 100 percent utilization might mean a runaway process in an infinite loop. Adjust the base priority of the process downward or terminate it.
Alert Threshold: Sustained periods at or near 100 percent busy might mean a runaway process. You should also build alerts for important server application processes based on deviation from historical norms.
Related Measures: Process(n)\% Privileged Time, Process(n)\% User Time, Process(n)\Priority Base, Thread(n/Index#)\% Processor Time, Thread(n/Index#)\Thread State.
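One convenient way to perform this process-level drill-down is with the Typeperf command-line tool, which can sample every process instance of this counter; the interval and sample count shown here are arbitrary:

    typeperf "\Process(*)\% Processor Time" -si 15 -sc 4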
In rare cases, you might find it useful to drill further down into a process by looking at
its thread data. % Processor Time can also be measured at the thread level. Even more
interesting are the Thread(n/Index#)\Thread State and Thread(n/Index#)\Wait State
Reason counters. These are instantaneous counters containing coded values that indicate the execution state of each thread and the reason threads in the Wait state are
waiting. Detailed event tracing of the operating system thread Scheduler can also be
performed using Event Tracing for Windows.
Processor Utilization by Processor
On a multiprocessor, there are multiple instances of the Processor object, one for each installed processor, as well as the _Total instance that reports processor utilization values that are averaged across available processors.
Note
On a multiprocessor, counters in the _Total instance of the Processor object that are event counters, such as Interrupts/sec, do report totals over all processor instances. Only the % Processor Time measurements are averaged.
By default, multiprocessors are configured for symmetric multiprocessing. Symmetric
multiprocessing means that any thread is eligible to run on any available processor.
This includes Interrupt Service Routines (ISRs) and deferred procedure calls (DPCs),
which can also be dispatched on any physical processor. When machines are configured for symmetric multiprocessing, individual processors tend to be loaded evenly.
Over any measurement interval, differences in % Processor Time or Interrupts/sec at
individual processor level instances should be uniform, subject to some variability
because of Scheduler decisions based on soft processor affinity. Otherwise, differences in the performance of an individual processor within a multiprocessor configuration are mainly the result of chance. As a consequence, on symmetric
multiprocessing machines, individual processor level statistics are seldom interesting.
However, if the machine is configured for asymmetric processing using the Interrupt
Affinity tool in the Windows Server 2003 Resource Kit, WSRM, or application-level
processor affinity settings that are available in IIS 6.0 and SQL Server, monitoring
individual instances of the processor object can be very important. See Chapter 6,
“Advanced Performance Topics,” for more information about using the Interrupt
Affinity tool.
Context Switches/Sec
A context switch occurs whenever the operating system stops one thread from running
and starts executing another thread. This can happen because the thread that was
originally running voluntarily relinquishes the processor, often because it needs to wait until an I/O finishes before it can resume processing. A running thread can also
be preempted by a higher priority thread that is ready to run, again, often because an
I/O interrupt has just occurred. User-mode threads also switch to a corresponding
Kernel mode thread whenever the User-mode application needs to perform a Privileged mode operating system or subsystem service. All of these events are counted as
context switches in Windows.
The rate of thread context switches that occur is tallied at the thread level and at the
overall system level. This is an intrinsically interesting statistic, but a system administrator usually can do very little about the rate that context switches occur. Table 3-6
describes the System\Context Switches/sec counter.
Table 3-6  System\Context Switches/sec Counter

Counter Type: Interval difference counter (rate/second).
Description: A context switch occurs when one running thread is replaced by another. Because Windows Server 2003 supports multithreaded operations, context switches are normal behavior for the system. When a User-mode thread calls any privileged operating system function, a context switch occurs between the User-mode thread and a corresponding Kernel-mode thread that performs the called function in Privileged mode.
Measurement Notes: The operating system counts the number of context switches as they occur. The measurement reported is the difference between the current cumulative count and the count from the previous measurement interval, divided by the interval duration:
(ContextSwitches(t1) − ContextSwitches(t0)) ÷ Duration
Usage Notes: Context switching is a normal system function, and the rate of context switches that occur is a by-product of the workload. A high rate of context switches is not normally a problem indicator, nor does it mean the machine is out of CPU capacity. Moreover, a system administrator usually can do very little about the rate at which context switches occur.
A large increase in the rate of context switches relative to historical norms might reflect a problem, such as a malfunctioning device. Compare Context Switches/sec to the Processor(_Total)\Interrupts/sec counter, with which it is normally correlated.
Performance: High rates of context switches often indicate application design problems and might also foreshadow scalability difficulties.
Capacity Planning: Not applicable.
Operations: Not applicable.
Alert Threshold: Build alerts for important server machines based on extreme deviation from historical norms.
Related Measures: Thread\Context Switches/sec.
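The following sketch shows, in C, how a rate/second value is derived from two raw samples of a cumulative counter like this one; the sample values are invented for illustration:

    #include <stdio.h>

    /* Rate per second derived from two raw samples of a cumulative counter */
    static double interval_rate(unsigned long long previous,
                                unsigned long long current,
                                double interval_seconds)
    {
        return (double)(current - previous) / interval_seconds;
    }

    int main(void)
    {
        /* Illustrative raw context switch counts taken 60 seconds apart */
        printf("Context Switches/sec: %.0f\n",
               interval_rate(1053200ULL, 1317200ULL, 60.0));
        return 0;
    }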
The number of context switches that occur is sometimes related to the number of system calls, which is tracked by the System\System Calls/sec counter. However, no hard
and fast rule governs this relationship because some system calls make additional system calls.
Processor Utilization by Processor Execution State
The Windows Server 2003 operating system measurement function that gathers processor utilization statistics also determines whether the processor was running in Kernel (or Privileged) mode, or User mode.
Figure 3-1 illustrates that Processor(n)\% Processor Time = Processor(n)\
% Privileged Time + Processor(n)\% User Time.
Figure 3-1  Processor utilization by processor execution state
Privileged mode includes both the time the processor is serving interrupts inside
Interrupt Service Routines (% Interrupt Time) and executing Deferred Procedure
Calls (% DPC Time) on behalf of ISRs, as well as all other Kernel-mode functions of
the operating system and device drivers.
There is no set rule to tell you what percentage of User Time or Privileged Time to expect.
It is a workload-dependent variable. System administrators should mainly take note of
measurements that vary significantly from historical norms. In addition, the processor state utilization measurements can provide insight into the type of running thread
causing a CPU usage spike. This is another piece of information that can help you narrow down the source of a problem. The Kernrate processor sampling tool can then be
used to identify a specific User-mode application or kernel module that is behaving
badly. Using Kernrate is discussed in Chapter 5, “Performance Troubleshooting.”
Only Interrupt Service Routines (ISRs) run in Interrupt state, a subset of the amount
of time spent in Privileged mode. An excessive amount of % Interrupt Time might
indicate a hardware problem such as a malfunctioning device. Excluding ISRs, all
operating system and subsystem functions run in Privileged state. This includes initiating I/O operations to devices, managing TCP/IP connections, and generating print
or graphic display output. A good portion of device interrupt processing is also handled in Privileged mode by deferred procedure calls (DPCs) running with interrupts
enabled. User-mode application program threads run in the User state. Many User
applications make frequent calls to operating system services and wind up spending a
high percentage of time running in Privileged mode.
% Interrupt Time The % Interrupt Time counter measures the amount of processor time devoted to interrupt processing by ISR functions running with interrupts disabled. % Interrupt Time is included in overall % Privileged Time.
Table 3-7 describes the Processor(_Total)\% Interrupt Time counter.
Table 3-7  Processor(_Total)\% Interrupt Time Counter

Counter Type: Interval (% Busy).
Description: Overall average processor utilization that occurred in Interrupt mode over the interval. Only Interrupt Service Routines (ISRs), which are device driver functions, run in Interrupt mode.
Measurement Notes: The processor state is sampled once every periodic interval:
(InterruptModeSamples ÷ TotalSamples) × 100
Usage Notes:
■ The _Total instance of the Processor object calculates average values of the processor utilization instances, not the total.
■ Interrupt processing by ISRs is the highest priority processing that takes place. Interrupts also have priority, relative to the IRQ. When an ISR is dispatched, interrupts at an equal or lower priority level are disabled.
■ An ISR might hand off the bulk of its device interrupt processing functions to a DPC that runs with interrupts enabled in Privileged mode.
■ Interrupt processing is a system function with no associated process. The ISR and its associated DPC service the device that is interrupting. Not until later, in Privileged mode, does the I/O Manager determine which thread from which process was waiting for the I/O operation to complete.
■ Excessive amounts of % Interrupt Time can identify that a device is malfunctioning but cannot pinpoint which device. Use Kernrate or the kernel debugger to determine which ISRs are being dispatched most frequently.
Performance: Used to track the impact of using the Interrupt Filter tool to restrict interrupt processing for specific devices to specific processors using hard processor affinity. See Chapter 6, “Advanced Performance Topics,” for more information.
Capacity Planning: Not applicable.
Operations: Secondary indicator to determine whether a malfunctioning device is contributing to a potential processor bottleneck.
Alert Threshold: Build alerts for important server machines based on extreme deviation from historical norms.
Related Measures: Processor(_Total)\Interrupts/sec, Processor(_Total)\% DPC Time, Processor(_Total)\% Privileged Time.
Processor(_Total)\Interrupts/sec records the number of device interrupts that were
serviced per second. If you are using the Windows Server 2003 Resource Kit’s Interrupt Filter tool to restrict interrupt processing for certain devices to specific processors using hard processor affinity, drill down to the individual processor level and monitor both Processor(n)\Interrupts/sec and Processor(n)\% Interrupt Time to see how your interrupt partitioning scheme is working. See Chapter 6, “Advanced Performance Topics,” for more information about using the Windows Server 2003 Resource Kit’s Interrupt Filter tool.
% Privileged mode Operating system functions, including ISRs and DPCs, run in
Privileged or Kernel mode. Virtual memory that is allocated by Kernel mode threads
can be accessed only by threads running in Kernel mode. When a User-mode thread
needs to perform an operating system function of any kind, a context switch takes
place between the User-mode thread and a corresponding Kernel-mode thread, which
changes the state of the machine. Table 3-8 describes the Processor(_Total)\% Privileged Time counter.
Table 3-8  Processor(_Total)\% Privileged Time Counter

Counter Type: Interval (% Busy).
Description: Overall average processor utilization that occurred in Privileged or Kernel mode over the interval. All operating system functions, including Interrupt Service Routines (ISRs) and deferred procedure calls (DPCs), run in Privileged mode. Privileged mode includes device driver code involved in initiating device Input/Output operations and deferred procedure calls that are used to complete interrupt processing.
Measurement Notes: The processor state is sampled once every periodic interval:
(PrivilegedModeSamples ÷ TotalSamples) × 100
Usage Notes:
■ The _Total instance of the Processor object calculates average values of the processor utilization instances, not the total.
■ The ratio of % Privileged Time to overall % Processor Time is workload-dependent.
■ Drill down to Process(n)\% Privileged Time to determine which application is issuing the system calls.
Performance: Secondary indicator to determine whether operating system functions, including device driver functions, are responsible for a potential processor bottleneck.
Capacity Planning: Not applicable.
Operations: The state of the processor when a runaway process thread is in an infinite loop can pinpoint whether a system module is implicated in the problem.
Alert Threshold: Build alerts for important server machines based on extreme deviation from historical norms.
Related Measures: Processor(_Total)\% Interrupt Time, Processor(_Total)\% DPC Time, Process(n)\% Privileged Time.
Calculate the ratio of % Privileged Time to overall % Processor Time usage:

    Privileged mode ratio = Processor(_Total)\% Privileged Time ÷ Processor(_Total)\% Processor Time
No fixed ratio value is good or bad. The relative percentage of Privileged mode CPU
usage is workload-dependent. However, a sudden change in this ratio for the same
workload should arouse your curiosity and trigger your interest in finding out what
caused the change.
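For example, if Processor(_Total)\% Privileged Time averages 21 percent while Processor(_Total)\% Processor Time averages 60 percent, the Privileged mode ratio is 0.35. If the same workload later shows a ratio of 0.70, the way it uses operating system services has changed and deserves investigation. (These figures are illustrative, not benchmarks.)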
% Privileged Time is measured at the overall system level, by processor, and by process. If % Privileged Time at the system level appears excessive, you should be able to
drill down to the process level and determine which process is responsible for the system calls. You might also need to use Kernrate or the kernel debugger to track down
the source of excessive % Privileged Time in an Interrupt Service Routine (ISR) or
deferred procedure call (DPC) associated with a device driver.
When multiprocessors are configured to run symmetrically, drilling down to the
machine state on individual processors is seldom interesting or useful. However, if
you are making extensive use of hard processor affinity, processor-level measurements
can reveal how well your partitioning scheme is working.
Monitoring Memory and Paging Rates
Counters in the Memory object report on both physical and virtual memory usage.
They also report paging rates associated with virtual memory management. Because
of virtual memory management, a shortage of RAM is often evident indirectly as a disk
performance problem, when excessive paging to disk consumes too much of the available disk bandwidth. Consequently, paging rates to disk are an important memory
performance indicator.
On 32-bit systems, virtual memory is limited to 4 GB, normally partitioned into a 2-GB private area that is unique to each process and a 2-GB range of shared memory addresses that is common to all processes. On machines configured with large
amounts of RAM (for example, 1–2 GB of RAM or more), virtual memory might
become exhausted before a shortage of physical memory occurs. When virtual memory becomes exhausted because of a bug in which a program allocates virtual memory
but never releases it, the situation is known as a memory leak. If virtual memory is
exhausted because of the orderly expansion of either a process address space or the
system range, the problem is an architectural constraint. In either instance, the result
can be catastrophic, leading to widespread application failure and/or a system crash.
On 32-bit systems, it is important to monitor virtual memory usage within the system
memory pools and at the process address space level.
Virtual Memory and Paging
Physical memory in Windows Server 2003 is allocated to processes on demand. On
32-bit systems, each process address space is able to allocate up to 4 billion bytes of
virtual memory. The operating system builds and maintains a set of per-process page
tables that are used to map process address space virtual memory locations into physical memory (RAM) pages. At the process level, allocated memory is either reserved or
committed. When virtual memory is committed, the operating system reserves room
for the page, either in RAM or on a paging file on disk, to allow the process to reference that range of virtual addresses.
The current resident set of a process’s virtual memory pages is called its working set. So
long as process working sets can fit readily into RAM, virtual memory addressing has
little performance impact. However, when process working sets require more RAM
than is available, performance problems can arise. When processes acquire new
ranges of virtual memory addresses, the operating system fulfills these requests by
allocating pages from a pool of available pages. (Because the page size is hardware-dependent, Available Bytes is reported in bytes, kilobytes, and megabytes.) The Memory Manager attempts to maintain a minimum-sized pool of available pages so that it
is capable of granting new pages to processes promptly. The target minimum size of
the Available Pages pool is about 8 MB for every 1 GB of RAM. When the pool of available pages becomes depleted, something has to give.
When RAM is full, the operating system is forced to trim older pages from process
working sets and add them to the pool of Available pages. Trimmed working set pages
that are “dirty”—that contain changed data—must be written to the paging file before
they can be granted to another process and used. A page writer thread schedules page
writes as soon as possible so that these older pages can be made available for new allocations. When the virtual memory manager (VMM) trims older working set pages,
these pages are initially added to the pool of available pages provisionally and stored
in the Standby list. Pages on the Standby list are flagged “in transition,” and in that
state they can be restored to the process working set where they originated with very
little performance impact.
Over time, unreferenced pages on the Standby list age and are eventually moved to the
Zero list, where they are no longer eligible to transition fault back into their original
process working set. A low-priority system thread zeros out the contents of older
pages, at which point these pages are moved to the Zero list. The operating system
assigns new process working set pages to available pages from either the Zero list or
the Free list. The Free list contains pages that have been explicitly freed by the processes that originally allocated them.
The dynamics of virtual memory management are extensively instrumented. Available
Bytes represents the total number of pages currently on the Standby list, the Free list,
and the Zero list. There are also counters for three different types of page faults: so-called hard page faults (Page Reads/sec), transition faults, and demand zero faults.
Pages Output/sec counts the number of trimmed pages that are written to disk. The
number of resident system pages and process working set pages is also counted.
A few performance aspects of memory management are not directly visible, but these
can usually be inferred from the statistics that are provided. There is, for example, no
direct measure of the rate at which page trimming occurs. However, page trimming is
normally accompanied by an increase in the rate of transition faults. In addition, the
number of trimmed, dirty pages waiting in memory to be written to the paging file is
not reported. During sustained periods in which the number of Page Writes/sec
reaches 80 percent of total disk transfers to the paging file disk, you might assume
that a backlog of trimmed dirty pages is building up.
In a system in which physical memory is undersized, the virtual-memory manager is
hard pressed to keep up with the demand for new working set pages. This memory management activity will be reflected both in high rates of paging to disk (Memory\Pages/sec) and in high rates of soft faults (Memory\Transition Faults/sec).
Available Bytes is also an extremely important indicator of physical memory usage; it
is a reliable indicator that shows there is an ample supply of RAM. It can also help you
identify configurations in which you do not have enough physical memory. Once
Available Bytes falls to its minimum size, the effect of page trimming will tend to keep
Available Bytes at or near that value until the demand for RAM slackens. Several server
applications—notably IIS, SQL Server, and Microsoft Exchange—interact with the
Memory Manager to grow their working sets when free RAM is abundant. These
server applications will also jettison older pages from their process working sets when
the pool of available pages is depleted. This interaction also tends to keep Available
Bytes at a relatively constant level at or near its minimum values.
Over longer-term periods, virtual memory Committed Bytes can serve as an indicator
of physical memory demand. As the number of Committed Bytes grows larger than
the size of RAM, older pages are trimmed from process working sets and relegated to
the paging file on disk. The potential for paging operations to disk increases as the
number of Committed Bytes grows. This dynamic aspect of virtual memory management must be understood when you are planning for RAM capacity on new machines,
and when you are forecasting the point at which paging problems are apt to emerge on
existing machines with growing workloads.
Pages/sec Because of virtual memory, a shortage of RAM is transformed into a disk
I/O bottleneck. Not having enough RAM for your workload is often evident indirectly
as a disk performance problem. Excessive paging rates to disk might consume too
much of the available disk bandwidth and slow down applications attempting to
access their files on the same disk or disks. The Memory\Pages/sec counter, which
tracks total paging rates to disk and is described in Table 3-9, is the single most important physical memory performance indicator.
Table 3-9  Memory\Pages/sec Counter

Counter Type: Interval difference counter (rate/second).
Description: The number of paging operations to disk during the interval. Pages/sec is the sum of Page Reads/sec and Page Writes/sec.
Measurement Notes: Each paging operation is counted.
Usage Notes:
■ Page Reads/sec counts hard page faults. A running thread has referenced a page in virtual memory that is not in the process working set, nor is it a trimmed page marked in transition that is still resident in memory. The thread is delayed for the duration of the I/O operation to fetch the page from disk. The operating system copies the page from disk to an available page in RAM and then redispatches the thread.
■ Page writes (the Page Writes/sec counter) occur when dirty pages are trimmed from process working sets. Page trimming is triggered when the pool of available pages drops below its minimum allotment.
■ Excessive paging can usually be reduced by adding RAM.
■ When paging files coexist with application data on the same disk or disks, calculate the percentage of disk paging operations to total disk I/O operations: Memory\Pages/sec ÷ Physical Disk(_Total)\Disk Transfers/sec
■ Disk bandwidth is finite. Capacity used for paging operations is unavailable for other application-oriented file operations.
■ Be aware that the operation of the system file cache redirects normal application I/O through the paging subsystem. Note that the Memory\Cache Faults/sec counter reflects both hard and soft page faults that result from read requests for cached files.
Performance: Primary indicator to determine whether real memory is a potential bottleneck.
Capacity Planning: Watch for upward trends. Add memory when paging operations absorb more than 20–50 percent of your total disk I/O bandwidth.
Operations: Excessive paging can lead to slow and erratic response times.
Alert Threshold: Alert when Pages/sec exceeds 50 per paging disk.
Related Measures: Memory\Available Bytes, Memory\Committed Bytes, Process(n)\Working Set.
Disk throughput capacity creates an upper bound on Pages/sec. That is the basis for
the configuration rule discussed earlier. If paging rates to disk are high, they will delay
application-oriented file I/Os to the disks where the paging files are located.
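For example, if Memory\Pages/sec averages 120 while Physical Disk(_Total)\Disk Transfers/sec averages 200 on the disks holding the paging files, paging is consuming 60 percent of the available disk operations—well beyond the 20–50 percent capacity planning guideline in Table 3-9—and adding RAM or relocating the paging files is indicated. (The figures are illustrative.)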
Physical memory usage When physical memory is a scarce resource, you will
observe high Pages/sec rates. The Available Bytes counter is a reliable indicator of
when RAM is plentiful. But if RAM is scarce, it is important to understand which processes are using it. There are also system functions that consume RAM. These physical
memory usage measurements are discussed in this section.
Available bytes The Memory Manager maintains a pool of available pages in RAM
that it uses to satisfy requests for new pages. The current size of this pool is reported in
the Memory\Available Bytes counters, described in Table 3-10. This value, like all other
memory allocation counters, is reported in bytes, not pages. (The page size is hardware-dependent. On 32-bit Intel-compatible machines, the standard page size is 4096 bytes.)
You can calculate the size of the pool in pages by dividing Memory\Available Bytes by
the page size. For convenience, there are Available Kbytes and Available Mbytes
counters. These report available bytes divided by 1024 and 1,048,576, respectively.
Whenever Available Bytes drops below its minimum threshold, a round of working-set page trimming is initiated to replenish the system’s supply of available pages. The minimum threshold that triggers page trimming is approximately 8 MB per 1 GB of RAM, or when RAM is 99 percent allocated. Once RAM is allocated to that extent, working-set page trimming tends to keep the size of the Available page pool at or near this minimum value. This means that Available Bytes can reliably indicate that the memory supply is ample—it is safe to regard any machine with more than 10 percent of RAM available as having an adequate supply of memory for the workload. Once RAM fills up, however, the Available Bytes counter alone cannot distinguish between machines that have an adequate supply of RAM and those that have an extreme shortage. Direct measures of paging activity, such as Pages/sec and Transition Faults/sec, will help you distinguish between these situations.
The Memory Manager organizes Available pages into three list structures. These are
the Standby list, the Free list, and the Zero list. The sizes of these lists are not measured separately. Trimmed working-set pages are deposited in the Standby list first.
(Trimmed pages that are dirty—that contain modified or changed data—must be written to disk before they can be regarded as immediately available.) Pages on the
Standby list are marked “in transition.” If a process references a page on the Standby
list, it transition faults back into the process working set with very little performance
impact. Transition faults are also known as “soft faults,” as opposed to hard page faults
that require I/O to the paging disk to resolve.
Eventually, unreferenced pages on the Standby list migrate to the Zero list, where they
are no longer marked as being in transition. References to new pages can also be satisfied with pages from the Free list. The Free list contains pages that have been explicitly freed by the processes that originally allocated them.
Table 3-10  Memory\Available Bytes Counter

Counter Type:  Instantaneous (sampled once during each measurement period).

Description:  The number of free pages in RAM available for immediate allocation. Available Bytes counts available pages on the Standby, Free, and Zero lists.

Measurement Notes:  Can also be obtained by calling the GlobalMemoryStatusEx Microsoft Win32 API function.

Usage Notes:  Available Bytes is the best single indicator that physical memory is plentiful. When memory is scarce, Pages/sec is a better indicator.
■ Divide by the size of a page to calculate the number of free pages. Available KBytes is the same value divided by 1024; Available MBytes is the same value divided by 1,048,576.
■ Calculate % Available Bytes as a percentage of total RAM (available in Task Manager on the Performance tab):
  % Available Bytes = Memory\Available Bytes ÷ sizeof(RAM)
■ A machine with Available Bytes > 10 percent of RAM has ample memory.
■ The Memory Manager’s page replacement policy attempts to maintain a minimum number of free pages. When available memory falls below this threshold, a round of page trimming is triggered, which replenishes the pool of Available pages with older working set pages. So when memory is scarce, Available Bytes will consistently be measured at or near the Memory Manager minimum threshold.
■ The Available Bytes threshold that triggers working-set page trimming is approximately 8 MB per 1 GB of RAM, or less than 1 percent Available Bytes. When Available Bytes falls to this level, monitor Pages/sec and Transition Faults/sec to see how hard the Memory Manager has to work to maintain a minimum-sized pool of Available pages.
■ Some server applications, such as IIS, Exchange, and SQL Server, manage their own working sets. They interact with the Memory Manager to allocate more memory when there is an ample supply and jettison older pages when RAM grows scarce. When these server applications are running, Available Bytes measurements tend to stabilize at or near the minimum threshold.

Performance:  Primary indicator to determine whether the supply of real memory is ample.

Capacity Planning:  Watch for downward trends. Add memory when % Available Bytes consistently drops below 10 percent.

Operations:  Excessive paging can lead to slow and erratic response times.

Alert Threshold:  Alert when Available Bytes < 2 percent of the size of RAM.

Related Measures:
Memory\Pages/sec
Memory\Transition Faults/sec
Memory\Available KBytes
Memory\Available MBytes
Memory\Committed Bytes
Process(n)\Working Set
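Since Table 3-10 notes that this value can also be obtained by calling GlobalMemoryStatusEx, the % Available Bytes calculation can be scripted directly. A minimal sketch:

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        MEMORYSTATUSEX status;
        status.dwLength = sizeof(status);   // required before the call

        if (!GlobalMemoryStatusEx(&status))
            return 1;

        // % Available Bytes = Available Bytes / sizeof(RAM)
        double pctAvailable =
            100.0 * (double)status.ullAvailPhys / (double)status.ullTotalPhys;

        printf("Available: %I64u MB of %I64u MB (%.1f%%)\n",
               status.ullAvailPhys / (1024 * 1024),
               status.ullTotalPhys / (1024 * 1024),
               pctAvailable);

        // The 10 percent rule of thumb from Table 3-10.
        if (pctAvailable > 10.0)
            printf("Memory supply appears ample.\n");

        return 0;
    }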
Process working set bytes When there is a shortage of available RAM, it is often
important to determine how the allocated physical memory is being used. Resident
pages of a process are known as its working set. The Process(*)\Working Set counter,
described in Table 3-11, is an instantaneous counter that reports the number of resident pages of each process.
Table 3-11  Process(*)\Working Set Counter

Counter Type:  Instantaneous (sampled once during each measurement period).

Description:  The set of resident pages for a process; the number of allocated pages in RAM that this process can address without causing a page fault to occur.

Measurement Notes:  Includes both resident private pages and resident pages mapped to an image file, which are often shared among multiple processes.

Usage Notes:  Process(n)\Working Set tracks current RAM usage by active processes.
■ Resident pages from a mapped image file are counted in the working set of every process that has loaded that image file. Because of the generous use of shared DLLs, resident pages of active DLLs are counted many times in different process working sets. This is the reason Process(n)\Working Set is often greater than Process(n)\Private Bytes.
■ Divide by the size of a page to calculate the number of allocated pages.
■ Calculate % RAM Used as a percentage of total RAM (available in Task Manager on the Performance tab):
  % RAM Used = Process(n)\Working Set ÷ sizeof(RAM)
■ Some server applications, such as IIS, Exchange, and SQL Server, manage their own process working sets. These server applications build and maintain memory-resident caches in their private process address spaces. You need to monitor the effectiveness of these internal caches to determine whether these server processes have access to an adequate supply of RAM.
■ Monitor Process(_Total)\Working Set in the Process object to see how RAM is allocated overall across all process address spaces.

Performance:  If memory is scarce, Process(n)\Working Set tells you how much RAM each process is using.

Capacity Planning:  Watch for upward trends for important applications.

Operations:  Not applicable.

Alert Threshold:  Build alerts for important processes based on extreme deviation from historical norms.

Related Measures:
Memory\Available Bytes
Memory\Committed Bytes
Process(n)\Private Bytes
Process(n)\Virtual Bytes
Process(n)\Pool Paged Bytes
The Windows Server 2003 Memory Manager uses a global least-recently used (LRU) policy. This ensures that the active pages of any process remain resident in RAM. If the memory access pattern of one process leads to a shortage of RAM, all active processes can be affected. At the process level, there is a Page Faults/sec interval counter that can be helpful in determining which processes are being impacted by a real memory shortage. However, the Process(n)\Page Faults/sec counter includes all three types of page faults that occur, so it can be difficult to interpret.
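To see which processes hold RAM, the quantities behind Process(n)\Working Set and Process(n)\Page Faults/sec can also be read with the psapi functions EnumProcesses and GetProcessMemoryInfo. A sketch that lists the working set and cumulative page fault count of every process it can open:

    #include <windows.h>
    #include <psapi.h>
    #include <stdio.h>
    #pragma comment(lib, "psapi.lib")

    int main(void)
    {
        DWORD pids[1024], bytesReturned, count, i;
        PROCESS_MEMORY_COUNTERS pmc;
        HANDLE process;

        if (!EnumProcesses(pids, sizeof(pids), &bytesReturned))
            return 1;
        count = bytesReturned / sizeof(DWORD);

        // Print resident bytes (the working set) for every process we can open.
        // PageFaultCount is cumulative; sample it twice and take the
        // difference to approximate Process(n)\Page Faults/sec.
        for (i = 0; i < count; i++) {
            process = OpenProcess(PROCESS_QUERY_INFORMATION | PROCESS_VM_READ,
                                  FALSE, pids[i]);
            if (process == NULL)
                continue;
            pmc.cb = sizeof(pmc);
            if (GetProcessMemoryInfo(process, &pmc, sizeof(pmc)))
                printf("pid %5lu  working set %8lu KB  page faults %lu\n",
                       pids[i],
                       (unsigned long)(pmc.WorkingSetSize / 1024),
                       pmc.PageFaultCount);
            CloseHandle(process);
        }
        return 0;
    }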
Resident pages in the system range Memory resident pages in the system range
are counted by two counters in the Memory object: Cache Bytes and Pool Nonpaged
Bytes. Cache Bytes is the pageable system working set and is managed like any other
process working set. Cache Bytes can be further broken down into Pool Paged Resident Bytes, System Cache Resident Bytes, System Code Resident Bytes, and System
Driver Resident Bytes. Both System Code Resident Bytes and System Driver Resident
Bytes are usually quite small relative to the other categories of resident system pages.
On the other hand, both Pool Paged Resident Bytes and System Cache Resident Bytes
can be quite large and might also vary greatly from period to period, depending on
memory access patterns. Memory\Cache Bytes is described in Table 3-12.
Table 3-12  Memory\Cache Bytes Counter

Counter Type:  Instantaneous (sampled once during each measurement period).

Description:  The set of resident pages in the system working set; the number of allocated pages in RAM that kernel threads can address without causing a page fault to occur.

Measurement Notes:  Includes Pool Paged Resident Bytes, System Cache Resident Bytes, System Code Resident Bytes, and System Driver Resident Bytes.

Usage Notes:  The system working set is subject to page replacement like any other working set.
■ The Paged pool is an area of virtual memory in the system range from which system functions allocate pageable memory. Pool Paged Resident Bytes is the number of Pool Paged Bytes that are currently resident in memory.
■ Calculate the ratio:
  Pool Paged Bytes ÷ Pool Paged Resident Bytes
  This ratio can serve as a memory contention index that might be useful in capacity planning.
■ The System Cache is an area of virtual memory in the system range in which application files are mapped. System Cache Resident Bytes is the number of pages from the System Cache currently resident in RAM.
■ The kernel debugger !vm command can be used to determine the maximum size of the Paged pool.
■ Add Process(_Total)\Working Set, Memory\Cache Bytes, and Memory\Pool Nonpaged Bytes to see how RAM overall is allocated.
■ Divide by the size of a page to calculate the number of allocated pages.

Performance:  If memory is scarce, Cache Bytes tells you how much pageable RAM system functions are using.

Capacity Planning:  The ratio of Pool Paged Bytes to Pool Paged Resident Bytes is a memory contention index that might be useful in capacity planning.

Operations:  Not applicable.

Alert Threshold:  Build alerts for important machines based on extreme deviation from historical norms.

Related Measures:
Pool Nonpaged Bytes
Pool Paged Resident Bytes
System Cache Resident Bytes
System Code Resident Bytes
System Driver Resident Bytes
Process(_Total)\Working Set
There are also system functions that allocate memory from the Nonpaged pool. Pages in the Nonpaged pool are always resident in RAM; they cannot be paged out. An example of a system function that allocates memory in the Nonpaged pool is working storage that might be accessed inside an Interrupt Service Routine (ISR). An ISR runs in Interrupt mode with interrupts disabled. An ISR that encounters a page fault will crash the system, because a page fault generates an interrupt that cannot be serviced when the processor is already disabled for interrupt processing. The Memory\Pool Nonpaged Bytes counter is described in Table 3-13.
Table 3-13  Memory\Pool Nonpaged Bytes Counter

Counter Type:  Instantaneous (sampled once during each measurement period).

Description:  The number of bytes allocated from the system’s Nonpaged pool. Pages allocated from the Nonpaged pool are always resident in RAM.

Usage Notes:
■ Status information about every TCP connection is stored in the Nonpaged pool.
■ The kernel debugger !vm command can be used to determine the maximum size of the Nonpaged pool.
■ Divide by the size of a page to calculate the number of allocated pages.

Performance:  If memory is scarce, Pool Nonpaged Bytes tells you how much nonpageable RAM system functions are using.

Capacity Planning:  Can be helpful when you need to plan for additional TCP connections.

Operations:  Not applicable.

Alert Threshold:  Build alerts for important machines based on extreme deviation from historical norms.

Related Measures:
Pool Paged Bytes
Pool Paged Resident Bytes
System Cache Resident Bytes
System Code Resident Bytes
System Driver Resident Bytes
Process(_Total)\Working Set
Process(_Total)\Working Set, Memory\Cache Bytes, and Memory\Pool Nonpaged Bytes account for how RAM is allocated. If you also add Memory\Available Bytes, you should be able to account for all RAM. Usually, however, this occurs:

  sizeof(RAM) ≠ Process(_Total)\Working Set + Memory\Cache Bytes + Memory\Pool Nonpaged Bytes + Memory\Available Bytes

This situation arises because the Process(_Total)\Working Set counter contains resident pages from shared DLLs that are counted against the working set of every process that has the DLL loaded. When you are trying to account for how all RAM is being used, you can expect to see something like Figure 3-2. (The example is from a server with 1 GB of RAM installed.)
Figure 3-2  Accounting for RAM usage
Figure 3-2 shows what happens when you add Process(_Total)\Working Set to Cache Bytes, Pool Nonpaged Bytes, and Available Bytes. Although the sum of these four counters accounts for overall RAM usage in the system pools and in process working sets, the counters clearly do not add up to the size of physical RAM.
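The accounting exercise behind Figure 3-2 can be reproduced with a small PDH program. Below is a sketch that samples the four counters once (all four are instantaneous, so a single collection suffices) and compares their sum to installed RAM; English counter paths are assumed.

    #include <windows.h>
    #include <stdio.h>
    #include <pdh.h>
    #pragma comment(lib, "pdh.lib")

    int main(void)
    {
        static LPCTSTR paths[] = {
            TEXT("\\Process(_Total)\\Working Set"),
            TEXT("\\Memory\\Cache Bytes"),
            TEXT("\\Memory\\Pool Nonpaged Bytes"),
            TEXT("\\Memory\\Available Bytes"),
        };
        PDH_HQUERY query;
        PDH_HCOUNTER counters[4];
        PDH_FMT_COUNTERVALUE value;
        MEMORYSTATUSEX status;
        double sum = 0.0;
        int i;

        if (PdhOpenQuery(NULL, 0, &query) != ERROR_SUCCESS)
            return 1;
        for (i = 0; i < 4; i++)
            PdhAddCounter(query, paths[i], 0, &counters[i]);

        // Instantaneous counters need only one collection.
        PdhCollectQueryData(query);

        for (i = 0; i < 4; i++) {
            if (PdhGetFormattedCounterValue(counters[i], PDH_FMT_DOUBLE,
                                            NULL, &value) == ERROR_SUCCESS)
                sum += value.doubleValue;
        }

        status.dwLength = sizeof(status);
        GlobalMemoryStatusEx(&status);

        // The sum usually exceeds RAM because shared DLL pages are
        // counted in every working set that maps them.
        printf("Sum of the four counters: %.0f MB\n", sum / (1024.0 * 1024.0));
        printf("Installed RAM:            %.0f MB\n",
               (double)status.ullTotalPhys / (1024.0 * 1024.0));

        PdhCloseQuery(query);
        return 0;
    }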
In addition to this anomaly, some RAM allocations are not counted anywhere. Trimmed working set pages that are dirty are stored in RAM prior to being copied to disk. There is no Memory counter that tells you how much RAM these pages currently occupy, although the quantity is presumed to be small because page writes to disk are scheduled as soon as possible. There is also no direct accounting of RAM usage by the IIS Kernel-mode cache that stores recently requested HTTP Response messages.
Transition faults An increase in the rate of transition faults is a clear indicator that the Memory Manager is working harder to maintain an adequate pool of available pages. When the rate of transition faults appears excessive, adding RAM should reduce the number of transition faults that occur. Table 3-14 describes the Memory\Transition Faults/sec counter.
Table 3-14  Memory\Transition Faults/sec Counter

Counter Type:  Interval difference counter (rate/second).

Description:  The total number of soft, or transition, faults during the interval. Transition faults occur when a recently trimmed page on the Standby list is re-referenced. The page is removed from the Standby list in memory and returned to the process working set. No page-in operation from disk is required to resolve a transition fault.

Measurement Notes:  Each transition fault is counted.

Usage Notes:  High values for this counter can easily be misinterpreted. Some transition faults are unavoidable—they are a natural by-product of the LRU-based page-trimming algorithm the Windows operating system uses. High rates of transition faults should not be treated as performance concerns if other indicators of paging performance problems are not present.
■ When Available Bytes is at or near its minimum threshold value, the rate of transition faults is an indicator of how hard the operating system has to work to maintain a pool of available pages.

Performance:  Use Pages/sec and Page Reads/sec instead to detect excessive paging.

Capacity Planning:  An upward trend is a leading indicator of a developing memory shortage.

Operations:  Not applicable.

Alert Threshold:  Do not Alert on this counter.

Related Measures:
Memory\Pages/sec
Memory\Demand Zero Faults/sec
Memory\Page Reads/sec
Demand Zero Faults/sec reflects processes that are acquiring new pages, the contents
of which are always zeroed by the operating system before being reassigned to a new
process. This is normal behavior for modularly constructed applications that acquire
a new heap in each nested function call. As long as processes are releasing older memory pages at approximately the same rate as they acquire new pages on demand—in
other words, they are not leaking memory—the Memory Manager should have no
trouble keeping up with the demand for new pages.
Page Faults/sec The Memory\Page Faults/sec counter, described in Table 3-15, is
the sum of the three types of page faults that can occur: hard faults, which require an
I/O to disk; and transition faults and demand zero faults, which do not. The Page
Faults/sec counter is the sum of these three measurements: Page Reads/sec, Transition Faults/sec, and Demand Zero Faults/sec. It is recommended that you do not use
this field to generate Performance Alerts or alarms of any kind.
Table 3-15  Memory\Page Faults/sec Counter

Counter Type:  Interval difference counter (rate/second).

Description:  The total number of paging faults during the interval, including both hard and soft faults. Only hard faults—Page Reads/sec—have a significant performance impact. Page Faults/sec is the sum of Page Reads/sec, Transition Faults/sec, and Demand Zero Faults/sec.

Measurement Notes:  Each type of page fault is counted.

Usage Notes:  High values for this counter can easily be misinterpreted. It is safer to report the following counters separately, rather than this composite number.
■ Page Reads/sec counts hard page faults. A running thread has referenced a page in virtual memory that is not in the process working set, nor is it a trimmed page marked in transition, which would still be resident in memory. The thread is delayed for the duration of the I/O operation to fetch the page from disk. The operating system copies the page from disk to an available page in RAM and then redispatches the waiting thread.
■ Transition Faults/sec counts pages trimmed from process working sets that are referenced before they are flushed from memory. Page trimming is triggered when the pool of available pages drops below its minimum allotment. The oldest pages in a process working set are trimmed first. Transition faults are also known as “soft” faults because they do not require a paging operation to disk. Because the page referenced is still resident in memory, that page can be restored to the process working set with minimal delay.
  Many transition faults are unavoidable—they are a natural by-product of the LRU-based page-trimming algorithm that the Windows operating system uses. High rates of transition faults should not be treated as performance concerns. However, a constant upward trend might be a leading indicator of a developing memory shortage.
■ Demand Zero Faults/sec measures requests for new virtual memory pages. Older trimmed pages eventually are zeroed by the operating system in anticipation of Demand Zero requests. Windows zeros out the contents of new pages as a security feature. Processes can acquire new pages at very high rates with little performance impact, unless they never free the memory after it is allocated.
■ Older trimmed pages from the Standby list are repurposed when they are moved to the Zero list.

Performance:  Use Pages/sec and Page Reads/sec instead to detect excessive paging.

Capacity Planning:  Not applicable.

Operations:  Not applicable.

Alert Threshold:  Do not Alert on this counter.

Related Measures:
Memory\Transition Faults/sec
Memory\Demand Zero Faults/sec
Memory\Page Reads/sec
Virtual Memory Usage
When the active working sets of running processes fit comfortably into RAM, there is
little virtual memory management overhead. When the working sets of running processes overflow the size of RAM, the Memory Manager trims older pages from process
working sets to try to make room for newer virtual pages. Because of the operating system’s use of a global LRU page replacement policy, the memory access pattern of one
process can impact other running processes, which could see their working sets
trimmed as a result. If excessive paging then leads to a performance bottleneck, it is
because the demand for virtual memory pages exceeds the amount of physical memory installed.
Measurements of virtual memory allocations allow you to observe the demand side of a dynamic page replacement policy. Each process address space allocates committed memory that has to be backed by physical pages in RAM or slots on the paging file. Monitoring the demand for virtual memory also allows you to detect memory leaks—program bugs that cause a process to commit increasing amounts of virtual memory that it never frees.
Committed Bytes represents the overall demand for virtual memory by running processes. Committed Bytes should be compared to the system’s Commit Limit, an upper limit on the number of virtual pages the system will allocate. The system’s Commit Limit is the size of RAM, plus the size of the paging file, minus a small amount of overhead. At or near the Commit Limit, memory allocation requests will fail, which is usually catastrophic. Not many applications, system services, or system functions can recover when a request to allocate virtual memory fails. Prior to reaching the Commit Limit, the operating system will automatically attempt to grow the size of the paging file to try to forestall running out of virtual memory. You should add memory or increase the size of your paging file or files to avoid running up against the Commit Limit.
Virtual memory allocations can also be tracked at the process level. In addition, system functions allocate pageable virtual memory from the Paged pool, which is something that should also be monitored.
Committed bytes Virtual memory in a process address space is free (unallocated),
reserved, or committed. Committed memory is allocated memory that the system
must reserve space for either in physical RAM or out on the paging file so that this
memory can be addressed properly by threads running in the process context. Table
3-16 describes the Memory\Committed Bytes counter.
Table 3-16  Memory\Committed Bytes Counter

Counter Type:  Instantaneous (sampled once during each measurement period).

Description:  The number of committed virtual memory pages. A committed page must be backed by a physical page in RAM or by a slot on the paging file.

Measurement Notes:  Can also be obtained by calling the GlobalMemoryStatusEx Win32 API function.

Usage Notes:  Committed Bytes reports how much total virtual memory process address spaces have allocated. Each committed page will have a Page Table entry built for it when an instruction first references a virtual memory address that it contains.
■ Divide by the size of a page to calculate the number of committed pages.
■ A related measure, % Committed Bytes in Use, is calculated by dividing Committed Bytes by the Commit Limit:
  Memory\% Committed Bytes in Use = Memory\Committed Bytes ÷ Memory\Commit Limit
  A machine with % Committed Bytes in Use > 90 percent is running short of virtual memory.
■ The Commit Limit is the amount of virtual memory that can be committed without having to extend the paging file or files. The system’s Commit Limit is the size of RAM, plus the size of the paging file, minus a small amount of overhead. The paging file can be extended dynamically when it is full (if it is not already at its maximum size and there is sufficient space in the file system where it is located). When the Commit Limit is reached, the system is out of virtual memory; no more than the Commit Limit number of virtual pages can be allocated.
■ Program calls to allocate virtual memory will fail at or near the Commit Limit. The results are usually catastrophic.
■ When Paging File(n)\% Usage approaches 100 percent, the Memory Manager will extend the paging file—if the configuration permits—which will result in an increase to the Commit Limit.
■ Calculate a memory contention index:
  Committed Bytes ÷ sizeof(RAM)
  If the Committed Bytes:RAM ratio is > 1, virtual memory exceeds the size of RAM, and some memory management will be necessary. As the Committed Bytes:RAM ratio grows above 1.5, paging to disk will usually increase, up to a limit imposed by the bandwidth of the paging disks.

Performance:  The Committed Bytes:RAM ratio is a secondary indicator of a real memory shortage.

Capacity Planning:  Watch for upward trends in the Committed Bytes:RAM ratio. Add memory when the Committed Bytes:RAM ratio exceeds 1.5.

Operations:  Excessive paging can lead to slow and erratic response times.

Alert Threshold:  Alert when the Committed Bytes:RAM ratio exceeds 1.5.

Related Measures:
Memory\Pages/sec
Memory\Commit Limit
Memory\% Committed Bytes in Use
Memory\Pool Paged Bytes
Process(n)\Private Bytes
Process(n)\Virtual Bytes
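As Table 3-16 notes, these quantities can also be obtained from GlobalMemoryStatusEx: in the MEMORYSTATUSEX structure, ullTotalPageFile reports the current Commit Limit and ullTotalPageFile − ullAvailPageFile approximates the current commit charge. A minimal sketch of the % Committed Bytes in Use and Committed Bytes:RAM calculations:

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        MEMORYSTATUSEX status;
        DWORDLONG committed;

        status.dwLength = sizeof(status);
        if (!GlobalMemoryStatusEx(&status))
            return 1;

        // ullTotalPageFile is the Commit Limit (RAM plus paging files,
        // less overhead); ullAvailPageFile is the commit headroom left.
        committed = status.ullTotalPageFile - status.ullAvailPageFile;

        printf("%% Committed Bytes in Use:  %.1f%%\n",
               100.0 * (double)committed / (double)status.ullTotalPageFile);

        // The memory contention index from Table 3-16.
        printf("Committed Bytes:RAM ratio: %.2f\n",
               (double)committed / (double)status.ullTotalPhys);

        return 0;
    }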
When a memory leak occurs, or the system is otherwise running out of virtual memory, drilling down to the individual process is often useful. Three counters at the process level describe how each process is allocating virtual memory: Process(n)\Virtual
Bytes, Process(n)\Private Bytes, and Process(n)\Pool Paged Bytes.
Process(n)\Virtual Bytes shows the full extent of each process’s virtual address space,
including shared memory segments that are used to map files and shareable image file
DLLs. If you need more information about how the virtual memory is allocated inside
a process virtual address space, run the Virtual Address Dump (vadump.exe) command-line tool. The use of the vadump command-line tool is illustrated in Chapter 5,
“Performance Troubleshooting.”
If a process is leaking memory, you should be able to tell by monitoring Process(n)\Private Bytes or Process(n)\Pool Paged Bytes, depending on the type of memory leak. A memory leak that is allocating but not freeing virtual memory in the
process’s private range will be reflected in monotonically increasing values of the
Process(n)\Private Bytes counter, described in Table 3-17. A memory leak that is
allocating but not freeing virtual memory in the system range will be reflected in
monotonically increasing values of the Process(n)\Pool Paged Bytes counter.
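A monotonic rise in Private Bytes is the signature of this kind of leak, and a simple watcher can detect it. Below is a sketch that polls a process’s private commit (the PagefileUsage field returned by GetProcessMemoryInfo, which tracks the same quantity as Process(n)\Private Bytes) and reports each new high-water mark; the poll interval and iteration count are illustrative assumptions.

    #include <windows.h>
    #include <psapi.h>
    #include <stdio.h>
    #include <stdlib.h>
    #pragma comment(lib, "psapi.lib")

    int main(int argc, char *argv[])
    {
        DWORD pid = (argc > 1) ? (DWORD)atoi(argv[1]) : GetCurrentProcessId();
        HANDLE process;
        PROCESS_MEMORY_COUNTERS pmc;
        SIZE_T highWater = 0;
        int i;

        process = OpenProcess(PROCESS_QUERY_INFORMATION | PROCESS_VM_READ,
                              FALSE, pid);
        if (process == NULL)
            return 1;

        // Poll once a minute; steady growth that never retreats is the
        // signature of a leak.
        for (i = 0; i < 60; i++) {
            pmc.cb = sizeof(pmc);
            if (GetProcessMemoryInfo(process, &pmc, sizeof(pmc)) &&
                pmc.PagefileUsage > highWater) {
                highWater = pmc.PagefileUsage;
                printf("New private-bytes high-water mark: %lu KB\n",
                       (unsigned long)(highWater / 1024));
            }
            Sleep(60 * 1000);
        }

        CloseHandle(process);
        return 0;
    }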
Table 3-17  Process(n)\Private Bytes Counter

Counter Type:  Instantaneous (sampled once during each measurement period).

Description:  The number of a process’s committed virtual memory pages that are private. Private pages can be addressed only by a thread running in this process context.

Usage Notes:  Process(n)\Private Bytes reports how much private virtual memory the process address space has allocated. Process(n)\Virtual Bytes also includes shared segments associated with mapped files and shareable image files.
■ Divide by the size of a page to calculate the number of allocated private pages.
■ Identify the cause of a memory leak by finding a process with an increasing number of Private Bytes. A process leaking memory may also see growth in its working set bytes, but Private Bytes is the more direct symptom.
■ Some outlaw processes may leak memory in the system’s Paged pool. The Process(n)\Pool Paged Bytes counter helps you to identify those leaky applications.

Performance:  Not applicable.

Capacity Planning:  Not applicable.

Operations:  Primarily used to identify processes that are leaking memory.

Alert Threshold:  In general, do not Alert on this counter value. However, it is often useful to Alert on Process(n)\Private Bytes as soon as a process suspected of leaking memory exceeds a critical allocation threshold.

Related Measures:
Memory\Commit Limit
Memory\% Committed Bytes in Use
Process(n)\Pool Paged Bytes
Process(n)\Virtual Bytes
Virtual memory in the system range The upper half of the 32-bit 4-GB virtual
address range is earmarked for system virtual memory. The system virtual memory
range, 2-GB wide, is divided into three major pools: the Nonpaged pool, the Paged
pool, and the system file cache. When the Paged pool or the Nonpaged pool is
exhausted, system functions that need to allocate virtual memory will fail. These
pools can be exhausted before the system Commit Limit is reached. If the system runs
out of virtual memory for the file cache, file cache performance could suffer, but the
situation is not as dire.
The size of the three main system area virtual memory pools is determined initially based on the amount of RAM. These initial allocation decisions can also be influenced by a series of settings in the HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management key. These settings are discussed in more detail in Chapter 6, “Advanced Performance Topics.” The size of these pools is also adjusted dynamically, based on virtual memory allocation patterns, to try to avoid shortages in one area or another. Still, shortages can occur, and machines configured with large amounts of RAM are the most vulnerable. Using the boot options that shrink the system virtual address range in favor of a larger process private address range sharply increases the risk of running out of system PTEs. These boot options are also discussed in Chapter 6, “Advanced Performance Topics.”
System services and other functions allocate pageable virtual memory from the Paged pool. A system function called by a process could also allocate pageable virtual memory from the Paged pool. If the Paged pool runs out of space, system functions that attempt to allocate virtual memory from it will fail. It is possible for the Paged pool to be exhausted long before the Commit Limit is reached. Registry configuration and tuning parameters that can affect the size of the Paged pool are discussed in Chapter 6, “Advanced Performance Topics.” The Memory\Paged Pool Bytes counter is described in Table 3-18.
Table 3-18  Memory\Paged Pool Bytes Counter

Counter Type:  Instantaneous (sampled once during each measurement period).

Description:  The number of committed virtual memory pages in the system’s Paged pool. System functions allocate virtual memory pages that are eligible to be paged out from the Paged pool. System functions that are called by processes also allocate virtual memory pages from the Paged pool.

Usage Notes:  Memory\Paged Pool Bytes reports how much virtual memory is allocated in the system Paged pool. Memory\Pool Paged Resident Bytes is the current number of Paged pool pages that are resident in RAM; the remainder is paged out.
■ Divide by the size of a page to calculate the number of allocated virtual pages.
■ A memory leak can deplete the Paged pool, causing system functions that allocate virtual memory from the Paged pool to fail. You can identify the culprit causing a memory leak by finding a process with an increasing number of Process(n)\Pool Paged Bytes. A process that is leaking memory might also see growth in its working set bytes, but Pool Paged Bytes is the more direct symptom.
■ Some outlaw processes might leak memory in the system’s Paged pool. The Process(n)\Pool Paged Bytes counter helps you to identify those leaky applications.

Performance:  Not applicable.

Capacity Planning:  Not applicable.

Operations:  Primarily used to identify processes that are leaking memory.

Alert Threshold:  In general, do not Alert on this counter value. However, it is often useful to Alert on Process(n)\Pool Paged Bytes as soon as a process suspected of leaking memory exceeds a critical allocation threshold.

Related Measures:
Memory\Commit Limit
Memory\% Committed Bytes in Use
Process(n)\Pool Paged Bytes
Process(n)\Virtual Bytes
Depending on what it is doing, a process could also leak memory in the system’s Paged pool. The Process(n)\Pool Paged Bytes counter allows you to identify processes that are leaking memory in the system Paged pool. The Memory\Nonpaged Pool Bytes counter is described in Table 3-19.
Table 3-19  Memory\Nonpaged Pool Bytes Counter

Counter Type:  Instantaneous (sampled once during each measurement period).

Description:  The number of bytes allocated in the system’s Nonpaged pool. System functions allocate pages from the Nonpaged pool when they require memory that cannot be paged out. For example, device driver functions that execute during interrupt processing must allocate memory from the Nonpaged pool.

Usage Notes:  Memory\Nonpaged Pool Bytes reports how much memory is allocated in the system Nonpaged pool. Because pages in the Nonpaged pool cannot be paged out, this counter measures both virtual and real memory usage.
■ Divide by the size of a page to calculate the number of allocated virtual pages.
■ If the Nonpaged pool fills up, key system functions might fail.
■ Important functions that allocate memory from the Nonpaged pool include TCP/IP session connection data that is accessed during Network Interface interrupt processing.

Performance:  Not applicable.

Capacity Planning:  Sizing and planning for network connections.

Operations:  Used to identify device drivers that are leaking memory.

Alert Threshold:  In general, do not Alert on this counter value. However, it is often useful to Alert on Process(n)\Pool Nonpaged Bytes as soon as a process suspected of leaking memory exceeds a critical allocation threshold.

Related Measures:
Memory\Pool Paged Bytes
System PTEs are built and used by system functions to address system virtual memory areas. When the system virtual memory range is exhausted, the number of Free
System PTEs drops to zero, and no more system virtual memory of any type can be
allocated. On 32-bit systems with large amounts of RAM (1–2 GB or more), tracking
the number of Free System PTEs is important. Table 3-20 describes the Memory\Free
System Page Table Entries counter.
Table 3-20  Memory\Free System Page Table Entries Counter

Counter Type:  Instantaneous (sampled once during each measurement period).

Description:  The number of free System PTEs. A free System PTE is used to address virtual memory in the system range, which includes both the Paged pool and the Nonpaged pool. When no free System PTEs are available, calls to allocate new virtual memory areas in the system range will fail.

Usage Notes:  Memory\Free System Page Table Entries reports how much unused capacity remains for mapping virtual memory in the system range.
■ The system virtual memory range is exhausted when the number of free System PTEs drops to zero. At that point, no more system virtual memory of any type can be allocated.
■ On 32-bit systems with 2 GB or more of RAM, tracking the number of free System PTEs is important. Those systems are vulnerable to running out of free System PTEs.

Performance:  Not applicable.

Capacity Planning:  Not applicable.

Operations:  Primarily used to identify a shortage of virtual memory in the system range.

Alert Threshold:  Alert when the number of free System PTEs < 100.

Related Measures:
Memory\Commit Limit
Memory\% Committed Bytes in Use
Process(n)\Pool Paged Bytes
Process(n)\Virtual Bytes
System functions allocate pageable virtual memory from a single, shared Paged pool.
A process or device driver function that leaks memory from the Paged pool will
deplete and eventually exhaust the pool. When the pool is depleted, subsequent
requests to allocate memory from the Paged pool will fail. Any operating system function, device driver, or application process that requests virtual memory is subject to
these memory allocation failures. It is not always easy to pinpoint exactly which application is responsible for this pool becoming exhausted. Fortunately, at least one
server application—the file Server service—reports on Paged pool memory allocation
failures when they occur. This counter can prove helpful even when a server is not primarily intended to serve as a network file server. Table 3-21 describes the
Server\Paged Pool Failures counter.
Table 3-21  Server\Paged Pool Failures Counter

Counter Type:  Instantaneous (sampled once during each measurement period).

Description:  The cumulative number of Paged pool allocation failures that the Server service experienced since being initialized.

Usage Notes:  The file Server service has a number of functions that allocate virtual memory pages from the Paged pool.
■ If a memory leak exhausts the Paged pool, the file Server service might encounter difficulty in allocating virtual memory from the Paged pool.
■ If a call to allocate virtual memory fails, the file Server service recovers gracefully from these failures and reports on them.
■ Because many other applications and system functions do not recover gracefully from virtual memory allocation failures, this counter can be the only reliable indicator that a memory leak caused these allocation failures.

Performance:  Not applicable.

Capacity Planning:  Not applicable.

Operations:  Primarily used to identify a virtual memory shortage in the Paged pool.

Alert Threshold:  Alert on any nonzero value of this counter.

Related Measures:
Memory\Pool Paged Bytes
Memory\Commit Limit
Memory\% Committed Bytes in Use
Server\Pool Paged Bytes
Process(n)\Pool Paged Bytes
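Because the recommended alert threshold is any nonzero value, a watchdog for this counter is short. Below is a sketch using PDH; a single collection suffices because the counter is a cumulative total, not a rate, and the path assumes the English counter name as it appears in the Server object ("Pool Paged Failures").

    #include <windows.h>
    #include <stdio.h>
    #include <pdh.h>
    #pragma comment(lib, "pdh.lib")

    int main(void)
    {
        PDH_HQUERY query;
        PDH_HCOUNTER failures;
        PDH_FMT_COUNTERVALUE value;

        if (PdhOpenQuery(NULL, 0, &query) != ERROR_SUCCESS)
            return 1;
        PdhAddCounter(query, TEXT("\\Server\\Pool Paged Failures"),
                      0, &failures);

        // One collection is enough for a cumulative total.
        PdhCollectQueryData(query);
        if (PdhGetFormattedCounterValue(failures, PDH_FMT_LONG, NULL,
                                        &value) == ERROR_SUCCESS &&
            value.longValue > 0)
            printf("ALERT: %ld Paged pool allocation failures since startup\n",
                   value.longValue);

        PdhCloseQuery(query);
        return 0;
    }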
Memory-Resident Disk Caches
Memory-resident disk caches are one of the major consumers of RAM on many Windows Server 2003 machines. Many applications can run faster if they cache frequently
accessed data in memory rather than access it from disk repeatedly. The system file
cache is an area of system virtual memory that is reserved for the purpose of storing
frequently accessed file segments in memory for quicker access. The system file cache
has three interfaces: the Copy interface, which is used by default; the Mapping interface, which is used by system applications that need to control when cached writes are
written to the disk; and the MDL interface, which is designed for applications that
need access to physical memory buffers. Each of these interfaces is associated with a
separate set of the counters in the Cache object.
Besides the built-in system file cache, some applications build and maintain their own
memory-resident caches specifically designed to cache objects other than files. IIS 6.0
operates a Kernel-mode cache that caches frequently requested HTTP Response messages. Because the IIS Kernel-mode driver operates this cache, the HTTP Response
cache is built using physical memory.
SQL Server and Exchange both build caches to store frequently accessed information from disk databases. The SQL Server and Exchange caches are carved from the process private address space. On 32-bit machines, both can benefit from an extended private area address space. SQL Server might also benefit from being able to access more than 4 GB of RAM using the Physical Address Extension (PAE) and the Address Windowing Extensions (AWE). PAE and AWE are discussed in Chapter 6, “Advanced Performance Topics.”
Each of the application-oriented caches is instrumented and provides performance counters. These application cache counters are mentioned briefly here. When they are active, these caches tend to be large consumers of RAM. The most important performance measurements associated with caches are the amount of memory they use, the rate of read and write activity to the cache, and the percentage of cache hits. A cache hit is an access request that is satisfied from current data that is stored in the cache, not on disk. Usually, the more memory devoted to the cache, the higher the rate of cache hits. However, even small amounts of cache memory are likely to be very effective, while allocating too much physical memory to the cache is likely to be a waste of resources.
A cache miss is a request that misses the cache and requires a disk access to satisfy.
Most applications read through the cache, which means that following a cache miss,
the data requested is available in the cache for subsequent requests. When writes
occur, the disk copy of the data is no longer current and must be updated. Most
caches defer writes to disk as long as possible. This is also known as lazy write. Then,
after enough dirty blocks in cache have accumulated, writes are flushed to the disk in
efficient, bulk operations. For data integrity and recoverability reasons, applications
sometimes need to control when data on disk is updated. The System File Cache’s
Mapping Interface provides that capability, for example. Table 3-22 describes the
Memory\System Cache Resident Bytes counter.
Table 3-22  Memory\System Cache Resident Bytes Counter

Counter Type:  Instantaneous (sampled once during each measurement period).

Description:  The number of resident pages allocated to the System File Cache. The System File Cache occupies a reserved area of the system virtual address range. This counter tracks the number of virtual memory pages from the File Cache that are currently resident in RAM.

Usage Notes:  On file and print servers, System Cache Resident Bytes is often the largest consumer of RAM. Compare memory usage to each of the following file cache hit ratios:
■ Copy Read Hits %: The Copy interface to the cache is invoked by default when a file is opened. The System Cache maps the file into the system virtual memory range and copies file data from the system range to the process private address space, where it can be addressed by the application.
■ Data Map Hits %: The Mapping interface returns virtual addresses that point to the file data in the system virtual memory range. It requires that application threads run in Privileged mode. The Mapping interface supports calls to Pin and Unpin file buffers to control the timing of physical disk writes. It is used by the Redirector service for the client-side file cache and by Ntfs.sys for caching file system metadata.
■ MDL Read Hits %: MDL stands for Memory Descriptor List, which consists of physical address parameters passed to DMA controllers. The MDL interface requires that application threads run in Privileged mode and support physical addresses. It is used by the Server service for the server-side file cache, and by IIS for caching .htm, .gif, .jpg, .wav, and other static files.
■ On a System File Cache miss, a physical disk I/O to an application file is performed. The paging file is unaffected. PTEs backing the File Cache do not point directly to RAM, but instead to Virtual Address Descriptors (VADs).
■ Divide by the size of a page to calculate the number of allocated virtual pages.
■ The System File Cache reserves approximately 1 GB of virtual memory in the system range by default.
■ System Cache Resident Bytes is part of the system’s working set (Cache Bytes) and is subject to page trimming when Available Bytes becomes low.

Performance:  When the System File Cache is not effective, performance of server applications that rely on the cache is impacted. These include the Server and Redirector services, Ntfs.sys, and IIS.

Capacity Planning:  Not applicable.

Operations:  Not applicable.

Alert Threshold:  Do not Alert on this counter value.

Related Measures:
Memory\Cache Bytes
Memory\Transition Pages rePurposed/sec
Cache\MDL Read Hits %
Cache\Data Map Hits %
Cache\Copy Read Hits %
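The three hit-ratio counters listed above live in the Cache object and can be sampled together. Below is a sketch; as with the other PDH fragments in this chapter, English counter paths are assumed.

    #include <windows.h>
    #include <tchar.h>
    #include <stdio.h>
    #include <pdh.h>
    #pragma comment(lib, "pdh.lib")

    int main(void)
    {
        static LPCTSTR paths[] = {
            TEXT("\\Cache\\Copy Read Hits %"),
            TEXT("\\Cache\\Data Map Hits %"),
            TEXT("\\Cache\\MDL Read Hits %"),
        };
        PDH_HQUERY query;
        PDH_HCOUNTER counters[3];
        PDH_FMT_COUNTERVALUE value;
        int i;

        if (PdhOpenQuery(NULL, 0, &query) != ERROR_SUCCESS)
            return 1;
        for (i = 0; i < 3; i++)
            PdhAddCounter(query, paths[i], 0, &counters[i]);

        // Hit ratios are interval counters: two collections are required.
        PdhCollectQueryData(query);
        Sleep(5000);
        PdhCollectQueryData(query);

        for (i = 0; i < 3; i++) {
            if (PdhGetFormattedCounterValue(counters[i], PDH_FMT_DOUBLE,
                                            NULL, &value) == ERROR_SUCCESS)
                _tprintf(TEXT("%s = %.1f%%\n"), paths[i], value.doubleValue);
        }

        PdhCloseQuery(query);
        return 0;
    }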
IIS version 6.0 relies on a Kernel-mode cache for HTTP Response messages in addition to the User-mode cache, in which recently accessed static objects are cached. The kernel cache is used to store complete HTTP Response messages that can be returned to satisfy HTTP GET Requests without leaving Kernel mode. Web service cache performance statistics for both the Kernel-mode and User-mode caches are available in the Web Service Cache object. SQL Server 2000 cache statistics per database are available in the SQL Server:Cache Manager object.
Monitoring Disk Operations
Performance statistics on both logical and physical disks are provided by measurement layers in the I/O Manager stack, as described in Chapter 1, “Performance Monitoring Overview.” These measurement functions track and compute performance
information about disk I/O requests at both the Logical and Physical Disk level. Logical Disk performance measurements are provided by the Logical Disk I/O driver:
either the Ftdisk.sys driver for basic volumes or Dmio.sys for dynamic volumes. Physical disk measurements are maintained by the Partmgr.sys driver layer. Interval performance statistics include the activity rate to disk, the disk % Idle Time, the average
response time (including queue time) of disk requests, the bytes transferred, and
whether the operations were Reads or Writes. Additional disk performance statistics
are then calculated by the PerfDisk.dll Performance Library based on these measurements, including the Avg. Disk Queue Length.
To diagnose and resolve disk I/O performance problems, calculating some additional disk statistics beyond those that are provided automatically is extremely useful. A few simple calculations allow you to generate some important additional disk performance metrics, namely, disk utilization, average disk service time, and average disk queue time. Being able to decompose disk response time, as reported by the Avg. Disk secs/Transfer counters, into device service time and queue time allows you to distinguish between a device that is running poorly and a device that is overloaded.
A logical disk represents a single file system with a unique drive letter, for example. A
physical disk is the internal representation of a SCSI Logical Unit Number (LUN).
When you are using array controllers and RAID disks, the underlying physical disk
hardware characteristics are not directly visible to the operating system. These physical characteristics—the number of disks, the speed of the disks, the RAID-level organization of the disks—can have a major impact on performance. Among simple disk
configurations, device performance characteristics vary based on seek time, rotational
speed, and bit density. More expensive, performance-oriented disks also incorporate
on-board memory buffers that boost performance substantially during sequential
operations. In addition, disk support for SCSI tagged command queuing opens the
way for managing a device’s queued requests so that ones with the shortest expected
service time are scheduled first. This optimization can boost the performance of a disk
servicing random requests by 25–50 percent in the face of queued requests.
What appears to the operating system as a simple disk drive might in fact be an array
of disks behind a caching controller. Disk arrays spread the I/O load evenly across
multiple devices. Redundant array of independent disks (RAID) provides storage for
duplicate data, in the case of disk mirroring schemes; or parity data, in the case of
RAID 5, that can be used to reconstitute the contents of a failed device. Maintaining
redundant data normally requires additional effort, leading to I/O activity within the
disk array that was not directly initiated by the operating system. Any I/Os within the
array not directly initiated by the operating system are not counted by the I/O Manager instrumentation.
Similarly, I/Os that are resolved by controller caches are counted as physical I/O operations whether or not any physical I/O to the disk or disks configured behind the cache actually occurs. For battery-backed, nonvolatile caches, write operations to disk are frequently deferred and occur only later, asynchronously. When cached writes to disk are deferred, device response time measurements obtained by the I/O Manager instrumentation layers are often much lower than expected, because the cache returns a successful I/O completion status to the operating system almost immediately, as soon as the data transfer from host memory to the controller cache memory completes.
When you run any of these types of devices, the performance data available from System Monitor needs to be augmented by configuration and performance information available from the array itself. See Chapter 5, “Performance Troubleshooting,” for an in-depth discussion of disk performance measurement issues that arise when disk array controllers and disk controller caches are present.
It is important to be proactive about disk performance because it is prone to degrade
rapidly, particularly when memory-resident caches start to lose their effectiveness or
disk-paging activity erupts. The interaction between disk I/O rates and memory cache
effectiveness serves to complicate disk capacity planning. (In this context, paging to
disk can be viewed as a special case of memory-resident disk caching where the most
active virtual memory pages are cached in physical RAM.)
Cache effectiveness tends to degrade rapidly, leading to sharp, nonlinear spikes in
disk activity beginning at the point where the cache starts to lose its effectiveness.
Consequently, linear trending based on historical patterns of activity is often not a reliable way to predict future activity levels. Disk capacity planning usually focuses,
instead, on provisioning to support growing disk space requirements, not poor performance. However, planning for adequate disk performance remains an essential element of capacity planning. Because each physical disk has a finite capacity to service
disk requests, the number of physical disks installed usually establishes an upper
bound on disk I/O bandwidth. The Physical Disk(n)\Avg. Disk secs/transfer counter
is described in Table 3-23.
Note  Logical disk and physical disk statistics are defined and derived identically by I/O Manager instrumentation layers, except for the addition of two file system disk space usage measurements in the Logical Disk object.
Table 3-23  Physical Disk(n)\Avg. Disk secs/Transfer Counter

Counter Type:  Average.

Description:  Overall average response time of physical disk requests over the interval. Avg. Disk secs/Transfer includes both device service time and queue time.

Measurement Notes:  The start and end time of each I/O Request Packet (IRP) is recorded by the I/O Manager instrumentation layer. The result, averaged over the interval, is the round trip time (RTT) of a disk request.

Usage Notes:  The primary indicator of physical disk I/O performance. Physical disks are the equivalent of SCSI LUNs. Performance is dependent on the underlying disk configuration, which is transparent to the operating system.
■ Individual disks range in performance characteristics based on seek time, rotational speed, recording density, and interface speed. More expensive, performance-oriented disks can provide 50 percent better performance.
■ Disk arrays range in performance based on the number of disks in the array and how redundant data is organized and stored. RAID 5 disk arrays, for example, suffer a significant performance penalty when writes occur.
■ Disk cache improves performance on read hits up to the interface speed. Deferred writes to cache require reliable, on-board battery backup of cache memory.

Performance:  Primary indicator to determine whether the disk is a potential bottleneck.

Capacity Planning:  Not applicable.

Operations:  Poor disk response time slows application response time.

Alert Threshold:  Depends on the underlying disk hardware.

Related Measures:
Physical Disk(n)\Disk Transfers/sec
Physical Disk(n)\% Idle Time
Physical Disk(n)\Current Disk Queue Length
If disks are infrequently accessed, even very poor disk response time is not a major
problem. However, when Physical Disk\Disk Transfers/sec exceeds 15–25 disk I/Os
per second per disk, the reason for the poor disk response time should be investigated. When Avg. Disk secs/transfer indicates slow disk response time, you need to
determine the underlying cause. As a first step, separate the disk response time value
recorded in the Avg. Disk secs/transfer counter into average service time and average
queue time. Table 3-24 describes the Physical Disk(n)\% Idle Time counter.
Table 3-24  Physical Disk(n)\% Idle Time Counter

Counter Type:  Interval (% Busy).

Description:  The percentage of time that the disk was idle during the interval. Subtract % Idle Time from 100 percent to calculate disk utilization.

Measurement Notes:  Idle Time accumulates whenever there are no requests outstanding for the device.

Usage Notes:  % Idle Time is the additive reciprocal (1 − x) of disk utilization.
■ Derive disk utilization as follows:
  Physical Disk(n)\Disk utilization = 100% − Physical Disk(n)\% Idle Time
■ For disk arrays, divide disk utilization by the number of disks in the array to estimate individual disk busy. Note, however, that additional I/Os might be occurring on the disks that are invisible to the operating system and that cause disks in redundant arrays to be busier than this estimated value. RAID subsystems require additional I/Os to maintain redundant data. If cached disks use Lazy Write to defer writes to disk, these writes to disk still take place, but only at some later time.
■ Queue time can be expected to increase exponentially as disk utilization approaches 100 percent, assuming independent arrivals to the disk. Derive disk service time and disk queue time as follows:
  Physical Disk(n)\Disk service time = Physical Disk(n)\Disk utilization ÷ Physical Disk(n)\Disk Transfers/sec
  Physical Disk(n)\Disk queue time = Physical Disk(n)\Avg. Disk secs/Transfer − Physical Disk(n)\Disk service time
■ Apply an appropriate optimization strategy to improve disk performance, depending on whether the problem is excessive service time or queue time delays. See Chapter 5, “Performance Troubleshooting,” for more details.

Performance:  Primary indicator to determine whether a physical disk is overloaded and serving as a potential bottleneck.

Capacity Planning:  Not applicable.

Operations:  Increased queue time contributes to poor disk response time, which slows application response time.

Alert Threshold:  Alert when % Idle Time is < 20 percent.

Related Measures:
Physical Disk(n)\Avg. Disk secs/Transfer
Physical Disk(n)\Disk Transfers/sec
Physical Disk(n)\Current Disk Queue Length
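The derivations in Table 3-24 reduce to a few lines of arithmetic once the three measured values are in hand. Below is a sketch with the inputs hard-coded as an illustration; in practice they would come from a System Monitor log or a PDH query.

    #include <stdio.h>

    int main(void)
    {
        /* Sample interval measurements (illustrative values only). */
        double pctIdleTime = 55.0;         /* Physical Disk(n)\% Idle Time */
        double transfersPerSec = 90.0;     /* Physical Disk(n)\Disk Transfers/sec */
        double avgSecsPerTransfer = 0.012; /* Physical Disk(n)\Avg. Disk secs/Transfer */

        /* Utilization Law: utilization = service time x arrival rate,
           so service time = utilization / arrival rate. */
        double utilization = (100.0 - pctIdleTime) / 100.0;
        double serviceTime = utilization / transfersPerSec;
        double queueTime = avgSecsPerTransfer - serviceTime;

        printf("Disk utilization:  %.0f%%\n", utilization * 100.0);
        printf("Disk service time: %.1f ms\n", serviceTime * 1000.0);
        printf("Disk queue time:   %.1f ms\n", queueTime * 1000.0);
        return 0;
    }

With these example values the disk is 45 percent busy, service time is 5 ms, and the remaining 7 ms of the measured 12 ms response time is queue time, pointing to an overloaded device rather than a slow one.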
Calculate disk utilization, disk service time, and disk queue time to determine
whether you have a poor performing disk subsystem, an overloaded disk, or both. If
disk I/O rates are high, you should also reconsider how effectively your workload is
utilizing memory-resident cache to reduce the number of disk I/Os that reach the
physical disk. These and other disk optimization strategies are discussed in more
depth in Chapter 5, “Performance Troubleshooting.” Table 3-25 describes the Physical
Disk(n)\Disk Transfers/sec counter.
Table 3-25  Physical Disk(n)\Disk Transfers/sec Counter

Counter Type:  Interval difference counter (rate/second).

Description:  The rate at which physical disk requests were completed over the interval.

Measurement Notes:  The start and end time of each I/O Request Packet (IRP) is recorded by the I/O Manager instrumentation layer. This counter reflects the number of requests that completed during the interval.

Usage Notes:  The primary indicator of physical disk I/O activity, also known as the disk arrival rate.
■ Also broken down by Reads and Writes:
  Physical Disk(n)\Disk Transfers/sec = Physical Disk(n)\Disk Reads/sec + Physical Disk(n)\Disk Writes/sec
■ For disk arrays, divide Disk Transfers/sec by the number of disks in the array to estimate individual disk I/O rates. Note, however, that additional I/Os might be occurring on the disks that are invisible to the operating system and that cause disks in redundant arrays to be busier than this estimated value. In a RAID 1 or RAID 5 organization, additional I/Os are required to maintain redundant data segments.
■ Used to calculate disk service time from % Idle Time by applying the Utilization Law.

Performance:  Primary indicator to determine whether the disk is a potential bottleneck.

Capacity Planning:  Not applicable.

Operations:  Poor disk response time slows application response time.

Alert Threshold:  Depends on the underlying disk hardware.

Related Measures:
Physical Disk(n)\% Idle Time
Physical Disk(n)\Avg. Disk secs/Transfer
Physical Disk(n)\Current Disk Queue Length
Physical disk hardware can perform only one I/O operation at a time, so the number
of physical disks attached to your computer serves as an upper bound on the sustainable disk I/O rate. Table 3-26 describes the Physical Disk(n)\Current Disk Queue
Length counter.
Table 3-26 Physical Disk(n)\Current Disk Queue Length Counter
Counter Type
Instantaneous (sampled once during each measurement period).
Description
The current number of physical disk requests that are either in
service or are waiting for service at the disk.
Measurement Notes
The start and end time of each I/O Request Packet (IRP) is recorded by the I/O Manager instrumentation layer. This counter
reflects the number of requests that are outstanding at the end
of the measurement interval.
Usage Notes
A secondary indicator of physical disk I/O queuing.
■
Current Disk Queue Length is systematically under-sampled because Interrupt processing, which reduces the
length of the disk request queue, runs at a higher dispatching priority than the software that gathers the disk
performance measurements from the PerfDisk.dll Performance Library.
■
It is useful to correlate this measured value with derived values, such as the Avg. Disk Queue Time that you can calculate and the Avg. Disk Queue Length counter, to verify that disk queuing is a significant problem.
■
Values of the Current Disk Queue Length counter should
be interpreted based on an understanding of the nature
of the underlying physical disk entity. What appears to
the host operating system as a single physical disk might,
in fact, be a collection of physical disks that appear as a
single LUN. Array controllers are often used to create virtual LUNs that are backed by multiple physical disks. With array controllers, multiple disks in the array can be performing concurrent operations. Under these circumstances, the Physical Disk entity should no longer be viewed as a single server.
■
If multiple disks are in the underlying physical disk entity,
calculate the Current Disk Queue Length per physical
disk.
Performance
Secondary indicator to determine whether the disk is a potential
bottleneck.
Capacity Planning
Not applicable.
Operations
Poor disk response time slows application response time.
Alert Threshold
Alert when Current Disk Queue Length exceeds 5 requests per
disk.
Related Measures
Physical Disk(n)\% Idle Time
Physical Disk(n)\Avg. Disk secs/Transfer
Physical Disk(n)\Avg. Disk Queue Length
Because disk I/O interrupt processing has priority over System Monitor measurement threads, the Current Disk Queue Length counter probably underestimates the extent of disk queuing. It is useful, nevertheless, as confirmation that disk queuing is occurring. Many I/O workloads are bursty, so you should not become alarmed when you see nonzero values of the Current Disk Queue Length from time to time. However, when the Current Disk Queue Length is greater than zero for sustained intervals, the disk is overloaded.
Derived Disk Measurements
The Logical and Physical Disk counters include several counters that are derived from the direct disk performance measurements and that are prone to misinterpretation. These counters include % Disk Read Time, % Disk Write Time, % Disk Time, Avg. Disk Read Queue Length, Avg. Disk Write Queue Length, and Avg. Disk Queue Length. All of these derived counters need to be interpreted very carefully to avoid confusion.
Caution Unlike the Physical Disk\% Idle Time counter, the % Disk Read Time, % Disk Write Time, and % Disk Time counters do not attempt to report disk utilization. Whereas % Idle Time is measured directly by the I/O Manager instrumentation layer, % Disk Read Time, % Disk Write Time, and % Disk Time are derived from basic measurements using a formula based on Little’s Law. The application of Little’s Law might not be valid at that moment for your disk configuration.
The Avg. Disk Read Queue Length, Avg. Disk Write Queue Length, and Avg. Disk
Queue Length measurements are based on a similar formula. These derived counters
attempt to calculate the average number of outstanding requests to the Physical (or
Logical) Disk over the measurement interval using Little’s Law. However, Little’s Law
might not be valid over very small measurement intervals or intervals in which the disk
I/O is quite bursty.
These derived disk performance counters should be relied on only if you have a good understanding of the underlying problems of interpretation.
As an alternative, you can always rely on the disk counters that are based on direct
measurements. These include the % Idle Time, Disk Transfers/sec, Avg. Disk secs/Transfer, and Current Disk Queue Length counters.
Interpretation of the % Disk Time and Avg. Disk Queue Length counters is also difficult when the underlying physical disk entity contains multiple disks. What the host
operating system regards as a physical disk entity might, in fact, be a collection of
physical disks—or portions of physical disks—that are configured using an array controller to appear as a single LUN. If the underlying physical disk entity contains multiple disks capable of performing disk operations in parallel—a function common to
most array controllers—the physical disk entity should not be viewed as a single
server. Under these circumstances, the measured values for the Current Disk Queue
Length and derived values of the Avg. Disk Queue Length reflect a single queue serviced by multiple disks. If multiple disks are in the physical disk entity, you should
calculate the average queue length per disk. Table 3-27 describes the Physical
Disk(n)\Avg. Disk Queue Length counter.
Table 3-27 Physical Disk(n)\Avg. Disk Queue Length Counter
Counter Type
Compound counter.
Description
The estimated average number of physical disk requests that are
either in service or are waiting for service at the disk.
Measurement Notes
Avg. Disk Queue Length is derived using Little’s Law by multiplying Physical Disk(n)\Avg. Disk secs/Transfer by Physical
Disk(n)\Disk Transfers/sec. This counter estimates the average
number of requests that are in service or queued during the
measurement interval.
Usage Notes
A secondary indicator of physical disk I/O queuing that requires
careful interpretation.
■
For very short measurement intervals or for intervals in
which the I/O activity is quite bursty, very high values of the
Avg. Disk Queue Length should be interpreted cautiously.
The use of Little’s Law to derive the average disk queue
length might not be valid for those measurement intervals.
■
Little’s Law requires the equilibrium assumption that the number of I/O arrivals equals the number of completions during the interval. For short measurement intervals, compare the Current Disk Queue Length to the value observed at the end of the previous measurement interval. If the values are significantly different, the use of Little’s Law to estimate the queue length during the interval is suspect.
■
Correlate this derived value with measured values of the
Current Disk Queue Length for the same measurement
intervals.
■
Avg. Disk Read Queue Length and Avg. Disk Write Queue Length are derived similarly:
Physical Disk(n)\Avg. Disk Read Queue Length = Physical Disk(n)\Avg. Disk secs/Read × Physical Disk(n)\Disk Reads/sec
Physical Disk(n)\Avg. Disk Write Queue Length = Physical Disk(n)\Avg. Disk secs/Write × Physical Disk(n)\Disk Writes/sec
Interpretation of these values is subject to the same warnings listed in the Caution earlier in this section.
■
Values of the Avg. Disk Queue Length counter should be
interpreted based on an understanding of the nature of the
underlying physical disk entity. What appears to the host
operating system as a single physical disk might, in fact, be
a collection of physical disks that appear as a single LUN.
Array controllers are often used to create Virtual LUNs that
are backed by multiple physical disks. With array controllers, multiple disks in the array can be performing concurrent operations. Under these circumstances, the physical
disk entity should no longer be viewed as a single server.
■
If multiple disks are in the underlying physical disk entity,
calculate the Avg. Disk Queue Length per physical disk.
■
% Disk Read Time, % Disk Time, and % Disk Write Time
are derived using the same formulas, except that the values they report are capped at 100 percent.
Performance
Secondary indicator to determine whether the disk is a potential
bottleneck.
Capacity Planning
Not applicable.
Operations
Not applicable.
Alert Threshold
Do not Alert on this counter value.
Related Measures
Physical Disk(n)\% Idle Time
Physical Disk(n)\Avg. Disk secs/Transfer
Physical Disk(n)\Disk Transfers/sec
Physical Disk(n)\Current Disk Queue Length
Physical Disk(n)\% Disk Time
The Explain text for the % Disk Time counters is misleading. These counters do not
measure disk utilization or how busy the disk is, as the Explain text seems to imply.
Use the % Idle Time measurement instead to derive a valid measure of disk utilization, as described earlier.
The % Disk Read Time, % Disk Time, and % Disk Write Time counters are derived using the same Little’s Law formula, except that the values are reported as percentages and capped at 100 percent. For example, if the Avg. Disk Queue Length value is 0.8, the % Disk Time counter reports 80 percent. If the Avg. Disk Queue Length value is greater than 1, % Disk Time remains at 100 percent.
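The following Python sketch illustrates how both derived counters are computed from the same Little’s Law product, with % Disk Time capped at 100 percent (the sample counter values are hypothetical):

# A minimal sketch of the Little's Law derivation behind Avg. Disk Queue
# Length and the capping applied to % Disk Time. Sample values are
# hypothetical.
avg_sec_per_transfer = 0.012  # Physical Disk(n)\Avg. Disk secs/Transfer
transfers_per_sec = 150.0     # Physical Disk(n)\Disk Transfers/sec

avg_disk_queue_length = avg_sec_per_transfer * transfers_per_sec  # Little's Law
pct_disk_time = 100.0 * min(1.0, avg_disk_queue_length)           # capped

print(f"Avg. Disk Queue Length: {avg_disk_queue_length:.2f}")
print(f"% Disk Time: {pct_disk_time:.0f}% (capped at 100)")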
Because the % Disk Time counters are derived using Little’s Law, similar to the way
the Avg. Disk Queue Length counters are derived, they are subject to the same interpretation issues. Table 3-28 describes the Physical Disk(n)\% Disk Time counter.
Table 3-28 Physical Disk(n)\% Disk Time Counter
Counter Type
Compound counter.
Description
The average number of physical disk requests that are either in
service or are waiting for service at the disk, expressed as a percentage.
Measurement Notes
% Disk Time is derived using Little’s Law by multiplying Physical
Disk(n)\Avg. Disk secs/Transfer by Physical Disk(n)\Disk Transfers/sec. The calculation is then reported as a percentage and
capped at 100 percent.
Usage Notes
This derived value should be used cautiously, if at all.
■
This counter duplicates the Avg. Disk Queue Length calculation, which estimates the average number of requests
that are in service or queued during the measurement interval. % Disk Time reports the same value, as a percentage, as the Avg. Disk Queue Length for disks with an
average queue length <= 1. For disks with a calculated
average queue length > 1, % Disk Time always reports
100 percent.
■
% Disk Read Time, % Disk Time, and % Disk Write Time
are derived using the same formulas as the Avg. Disk
Queue Length counters, except they report values as percentages and the values are capped at 100 percent.
■
% Disk Read Time and % Disk Write Time are derived using similar formulas:
Physical Disk(n)\% Disk Read Time = 100 × min(1, (Physical Disk(n)\Avg. Disk secs/Read × Physical Disk(n)\Disk Reads/sec))
Physical Disk(n)\% Disk Write Time = 100 × min(1, (Physical Disk(n)\Avg. Disk secs/Write × Physical Disk(n)\Disk Writes/sec))
Interpretation of these values is subject to the same caution.
■
Values of the % Disk Time counters should also be interpreted based on an understanding of the nature of the underlying physical disk entity. What appears to the host
operating system as a single physical disk might, in fact, be
a collection of physical disks that appear as a single LUN.
Array controllers are often used to create virtual LUNs that
are backed by multiple physical disks. With array controllers, multiple disks in the array can be performing concurrent operations. Under these circumstances, the physical
disk entity should no longer be viewed as a single server.
■
If multiple disks are in the underlying physical disk entity,
calculate the % Disk Time per physical disk.
Performance
Not applicable. Use the % Idle Time and Avg. Disk Queue Length counters instead.
Capacity Planning
Not applicable.
Operations
Not applicable.
Alert Threshold
Do not Alert on this counter value. Use the % Idle Time and Avg.
Disk Queue Length counters instead.
Related Measures
Physical Disk(n)\% Idle Time
Physical Disk(n)\Avg. Disk secs/Transfer
Physical Disk(n)\Disk Transfers/sec
Physical Disk(n)\Current Disk Queue Length
Physical Disk(n)\Avg. Disk Queue Length
Split I/Os
Split I/Os are physical disk requests that are split into multiple requests, usually due to disk fragmentation. The Physical Disk object reports the rate at which physical disk I/Os are split into multiple physical disk requests so that you can easily determine when disk performance is suffering because of excessive file system fragmentation. Table 3-29 describes the Physical Disk(n)\Split IO/sec counter.
Table 3-29 Physical Disk(n)\Split IO/sec Counter
Counter Type
Interval difference counter (rate/second).
Description
The rate at which physical disk requests were split into multiple disk requests during the interval. Note that when a split I/O occurs, the I/O Manager measurement layers count both the original I/O request and the split I/O request as split I/Os, so the split I/O count accurately reflects the number of I/O operations initiated by the I/O Manager.
Usage Notes
A primary indicator of physical disk fragmentation.
■
Defragmenting disks on a regular basis helps improve
disk performance because sequential operations run several times faster than random disk requests on most disks.
On disks with built-in actuator-level buffers, sequential
operations can run 10 times faster than random disk requests.
■
A split I/O might also result when data is requested in a
size that is too large to fit into a single I/O.
■
Calculate split I/Os as a percentage of Disk Transfers/sec:
Physical Disk(n)\% Split IOs = Physical Disk(n)\Split IO/sec ÷ Physical Disk(n)\Disk Transfers/sec
When the number of split I/Os is 10–20 percent or more of the total Disk Transfers, check to see whether the disk is very fragmented.
■
Split I/Os usually take longer for the disk to service, so also
watch for a correlation with Physical Disk(n)\Avg. Disk
secs/Transfer. Higher values of Avg. Disk secs/Transfer
also contribute to greater disk utilization (1−% Idle Time).
Performance
Secondary indicator that helps you determine how often you
need to run disk defragmentation software.
Capacity Planning
Not applicable.
Operations
Poor disk response time slows application response time.
Alert Threshold
Alert when split I/Os > 20 percent of Disk Transfers/sec.
Related Measures
Physical Disk(n)\Disk Transfers/sec
Physical Disk(n)\Avg. Disk secs/Transfer
Physical Disk(n)\% Idle Time
Defragmenting disks on a regular basis or when the number of split I/Os is excessive
will normally improve disk performance, because disks are capable of processing
sequential operations much faster than they process random requests. Be sure to
check the Analysis Report produced by the Defragmentation utility. If the report indicates that the files showing the most fragmentation are the ones in constant use or the
ones being modified regularly, the performance boost gained from defragmenting the
disk might be short-lived. For more advice about using disk defragmentation utilities
effectively, see Chapter 5, “Performance Troubleshooting.”
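The following Python sketch illustrates the split I/O ratio calculation from Table 3-29 and the suggested alert threshold (the sample counter values are hypothetical):

# A minimal sketch of the split I/O ratio calculation. Sample values are
# hypothetical.
split_io_per_sec = 34.0    # Physical Disk(n)\Split IO/sec
transfers_per_sec = 180.0  # Physical Disk(n)\Disk Transfers/sec

pct_split_ios = split_io_per_sec / transfers_per_sec
print(f"Split I/Os: {pct_split_ios:.0%} of Disk Transfers/sec")
if pct_split_ios > 0.20:   # the alert threshold suggested in Table 3-29
    print("Check the disk for excessive fragmentation")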
Disk Space Usage Measurements
Two important disk space usage measurements are available in the Logical Disk
object: % Free Space and Free Megabytes. Because running out of disk space is almost
always catastrophic, monitoring the Free Space available on your logical disks is critical. Because disk space tends to be consumed gradually, you usually don’t have to
monitor disk free space as frequently as you do for many of the other performance
indicators you will gather. Monitoring disk free space hourly or even daily is usually
sufficient. Note that the Performance Library responsible for computing the Free
Megabytes and % Free Space counters, Perfdisk.dll, refreshes these measurements at a
slower pace because the counter values themselves tend to change slowly. By default,
these disk space measurements are refreshed once every 5 minutes, independent of
the Performance Monitor data collection interval. If you use Performance Monitor to gather these counters at a rate faster than approximately one sample every 5 minutes, you will retrieve duplicate counter values, reflecting the slow rate of data gathering by the Perfdisk.dll Performance Library. This slower rate of data gathering reflects both the high overhead associated with calculating Free Megabytes on very large file systems and the normally slow rate at which these counter values change. Table 3-30 describes the Logical Disk(n)\Free Megabytes counter.
Note Because of the high overhead associated with calculating % Free Space and Free Megabytes on very large file systems, and the normally slow rate at which these counter values change, these counters are normally measured only once every 5 minutes. If you need more frequent measurements to track file system growth, you can add a binary Registry field named VolumeRefreshInterval at HKLM\System\CurrentControlSet\Services\Perfdisk\Performance and set it to a more suitable value. Code the number of seconds you would like to wait between recalculations of the Logical Disk % Free Space and Free Megabytes metrics. The default VolumeRefreshInterval is 300 seconds.
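The following Python sketch (Windows only) shows one way to script this Registry change. It is a minimal illustration that assumes the value is stored as a 4-byte little-endian binary field; verify the expected format on your system before relying on it.

# A minimal sketch: set VolumeRefreshInterval so that % Free Space and
# Free Megabytes are recalculated every 60 seconds instead of the default
# 300. The text above describes the field as binary, so this writes a
# 4-byte little-endian REG_BINARY value (an assumption; verify the format).
import struct
import winreg

KEY_PATH = r"SYSTEM\CurrentControlSet\Services\Perfdisk\Performance"
INTERVAL_SECONDS = 60  # seconds between free-space recalculations

with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH, 0,
                    winreg.KEY_SET_VALUE) as key:
    winreg.SetValueEx(key, "VolumeRefreshInterval", 0, winreg.REG_BINARY,
                      struct.pack("<I", INTERVAL_SECONDS))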
Table 3-30 Logical Disk(n)\Free Megabytes Counter
Counter Type
Instantaneous (sampled once during each measurement period).
Description
The amount of unallocated space on the logical disk, reported
in megabytes.
Measurement Notes
This is the same value reported by Windows Explorer on the
Logical Disk Property sheets.
Usage Notes
A primary indicator of logical disk space capacity used.
■
This counter value tends to change slowly.
■
Because calculating free megabytes for very large file systems is time-consuming, the I/O Manager measurement layers recalculate the value of the counter approximately once every 5 minutes. When you gather this measurement data more frequently than the rate at which it is recalculated, you will obtain static values of the counter that reflect this slow rate of updating.
■
To estimate the total size of the logical disk, divide Free Megabytes by % Free Space (that is, multiply by the reciprocal of free space, 1 ÷ % Free Space).
Performance
Not applicable.
Capacity Planning
Trending and forecasting disk space usage over time.
Operations
Running out of space on the file system is usually catastrophic.
Alert Threshold
Alert on this counter value or when Logical Disk(n)\% Free Space
< 10 percent.
Related Measures
Logical Disk(n)\% Free Space.
You can use statistical forecasting techniques to extrapolate from the historical trend
of disk space usage to anticipate when you are likely to run out of disk space. See
Chapter 4, “Performance Monitoring Procedures,” for an example that uses linear
regression to forecast workload growth—a simple statistical technique that can readily
be adapted for long-term disk capacity planning.
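As a minimal illustration of the idea, the following Python sketch fits a least-squares trend line to a series of daily Free Megabytes samples and projects when the disk will fill. The sample data is hypothetical.

# A minimal sketch: fit a least-squares trend line to daily Logical Disk
# Free Megabytes samples and estimate when free space reaches zero.
free_mb = [52000, 51200, 50100, 49400, 48300, 47600, 46400]  # one sample/day
days = list(range(len(free_mb)))

n = len(days)
mean_x = sum(days) / n
mean_y = sum(free_mb) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(days, free_mb))
         / sum((x - mean_x) ** 2 for x in days))
intercept = mean_y - slope * mean_x

if slope < 0:
    day_disk_full = -intercept / slope  # x-intercept of the trend line
    print(f"Trend: {slope:.0f} MB/day; disk full in about "
          f"{day_disk_full - days[-1]:.0f} days")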
Managing Network Traffic
Network traffic is instrumented at the lowest level hardware interface and at each higher level in the TCP/IP stack. At the lowest level, both packet and byte counts are accumulated. At the IP level, datagrams sent and received are counted. Statistics for both IP version 4 and IP version 6 are provided. At the TCP level, counters exist for segments sent and received, and for the number of initiated and closed connections. At the network application level, similar measures of load and traffic are available for, for example, HTTP requests, FTP requests, file Server requests, and network client Redirector requests, as well as other application-level statistics. In some cases, application response time measures might also be available in some form.
Network Interface Measurements
At the lowest level of the TCP/IP stack, the network interface driver software layer provides instrumentation on networking hardware performance. Network interface statistics are gathered by software embedded in the network interface driver layer. This
software counts the number of packets that are sent and received, and also tracks the
size of their data payloads. Multiple instances of the Network Interface object are generated, one for every network interface chip or card that is installed, plus the Loopback interface, if that is defined. Note that network packets that were retransmitted as
a result of collisions on an Ethernet segment are not directly visible to the host software measurement layer. Ethernet packet collision detection and recovery is performed on board the network adapter card, transparently to all host networking
software. Table 3-31 describes the Network Interface(n)\Bytes Total/sec counter.
Table 3-31 Network Interface(n)\Bytes Total/sec Counter
Counter Type
Interval difference counter (rate/second).
Description
Total bytes per second transmitted and received over this interface during the interval. This is the throughput (in bytes)
across this interface.
Measurement Notes
This counter tracks packets sent and received and accumulates
byte counts from packet headers as they are transmitted or received. Packets are retransmitted on an Ethernet segment because collisions are not included in this count.
Usage Notes
The primary indicator of network interface traffic.
■
Calculate network interface utilization: Network Interface(n)\% Busy = Network Interface(n)\Bytes Total/sec ÷
Network Interface(n)\Current Bandwidth
■
Network packets that were retransmitted as a result of
collisions on an Ethernet segment are not counted. Collision detection and recovery is entirely performed on
board the NIC, transparently to the host networking
software.
■
The Current Bandwidth counter reflects the actual performance level of the network adapter, not its rated capacity. If a gigabit network adapter card on a segment is forced to revert to a lower speed, the Current Bandwidth counter will reflect the shift from 1 Gbps to 100 Mbps, for example.
■
The maximum achievable bandwidth on a switched link
should be close to 90–95 percent of the Current Bandwidth counter.
Performance
Primary indicator to determine whether the network is a potential bottleneck.
Capacity Planning
Trending and forecasting network usage over time.
Alert Threshold
Alert when Bytes Total/sec exceeds 90 percent of line capacity.
Related Measures
Network Interface(n)\Bytes Received/sec
Network Interface(n)\Bytes Sent/sec
Network Interface(n)\Packets Received/sec
Network Interface(n)\Packets Sent/sec
Network Interface(n)\Current Bandwidth
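The following Python sketch illustrates the utilization calculation from Table 3-31, including the conversion from bytes to bits (the sample counter values are hypothetical):

# A minimal sketch of the network interface utilization calculation.
# Current Bandwidth is reported in bits per second, so Bytes Total/sec
# must be multiplied by 8 before dividing. Sample values are hypothetical.
bytes_total_per_sec = 9_500_000      # Network Interface(n)\Bytes Total/sec
current_bandwidth_bps = 100_000_000  # Network Interface(n)\Current Bandwidth

utilization = (bytes_total_per_sec * 8) / current_bandwidth_bps
print(f"Network interface utilization: {utilization:.1%}")
if utilization > 0.90:  # ~90-95% is the practical ceiling on a switched link
    print("Interface is near saturation")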
TCP/IP Measurements
Both the IP and TCP layers of the TCP/IP stack are instrumented. Windows Server
2003 supports both TCP/IP version 4 and version 6, and there are separate performance objects for each, depending on which versions of the software are active. At the
IP level, Datagrams/sec is the most important indicator of network activity, which can
also be broken out into Datagrams Received/sec and Datagrams Sent/sec. Additional
IP statistics are available on packet fragmentation and reassembly. Table 3-32
describes the IPvn\Datagrams/sec counter.
Table 3-32 IPvn\Datagrams/sec Counter
Counter Type
Interval difference counter (rate/second).
Description
Total IP datagrams per second transmitted and received during
the interval.
Measurement Notes
The IP layer of the TCP/IP stack counts datagrams sent and received.
Usage Notes
The primary indicator of IP traffic.
Identical sets of counters are available for IPv4 and IPv6.
Performance
Secondary indicator to determine whether the network is a potential bottleneck.
Capacity Planning
Trending and forecasting network usage over time.
Operations
Sudden spikes in the amount of IP traffic might indicate the
presence of an intruder.
Alert Threshold
Build alerts for important machines linked to the network backbone based on extreme deviation from historical norms.
Related Measures
IPvn\Datagrams Received/sec
IPvn\Datagrams Sent/sec
Network Interface(n)\Packets/sec
Chapter 3:
Measuring Server Performance
295
For the TCP protocol, which is session- and connection-oriented, connection statistics are also available. It is useful to monitor and track TCP connections, both for security reasons (to detect Denial of Service attacks) and for capacity planning. Table 3-33 describes the TCPvn\Connections Established counter.
Table 3-33 TCPvn\Connections Established Counter
Counter Type
Instantaneous (sampled once during each measurement period).
Description
The total number of TCP connections in the ESTABLISHED state
at the end of the measurement interval.
Measurement Notes
The TCP layer counts the number of times a new TCP connection is established.
Usage Notes
The primary indicator of TCP session connection behavior.
Identical counters are maintained for TCPv4 and TCPv6.
The number of TCP connections that can be established is constrained by the size of the Nonpaged pool. When the Nonpaged
pool is depleted, no new connections can be established.
Performance
Secondary indicator to determine whether the network is a potential bottleneck.
Capacity Planning
Trending and forecasting growth in the number of network users over time. The administrator should tune TCP registry entries like MaxHashTableSize and NumTcbTablePartitions based on the average number of network users.
Operations
Sudden spikes in the number of TCP connections might indicate
a Denial of Service attack.
Alert Threshold
Build alerts for important machines linked to the network backbone based on extreme deviation from historical norms.
Related Measures
TCPvn\Segments Received/sec
TCPvn\Segments Sent/sec
Network Interface(n)\Packets/sec
Memory\Nonpaged Pool Bytes
From a capacity planning perspective, TCP Connections Established measures the number of network clients connected to the server. Additionally, characterizing each session by its workload demand is useful. TCP activity is recorded in segments, which the IP layer then breaks into packets compatible with the underlying hardware. Segments Received/sec corresponds to the overall request rate from networking clients to your server. Table 3-34 describes the counter.
Table 3-34 TCPvn\Segments Received/sec Counter
Counter Type
Interval difference counter (rate/second).
Description
The number of TCP segments received across established connections, averaged over the measurement interval.
Measurement Notes
The TCP layer counts the number of times TCP segments are received.
Usage Notes
The primary indicator of TCP network load.
■
Identical counters are maintained for TCPv4 and TCPv6.
■
Calculate the average number of segments received per connection:
TCPvn\Segments Received/sec ÷ TCPvn\Connections Established
This can be used to forecast future load as the number of users grows.
■
When server request packets are received from network clients, depending on the networking application, the request is usually small enough to fit into a single Ethernet message and IP datagram. For HTTP and server message block (SMB) requests, for example, TCPvn\Segments Received/sec ≅ IPvn\Datagrams Received/sec because HTTP and SMB requests are usually small enough to fit into a single packet.
Performance
Secondary indicator to determine whether the network is a potential bottleneck.
Capacity Planning
Trending and forecasting network usage over time.
Operations
Sudden spikes in the amount of TCP requests received might
indicate the presence of an intruder.
Alert Threshold
Build alerts for important machines linked to the network backbone based on extreme deviation from historical norms.
Related Measures
TCPvn\Connections Established
TCPvn\Segments Sent/sec
IPvn\Datagrams Received/sec
Network Interface(n)\Packets/sec
If Windows Server 2003 machines are serving as networking hubs or gateways, IPvn\Datagrams Received/sec and TCPvn\Segments Received/sec track the number of requests received from networking clients. For capacity planning, either of these indicators of load can be used to characterize your machine’s workload in terms of network traffic per user. When you are running machines dedicated to a single server application—a dedicated IIS Web server, for example—you can also characterize processor usage per user connection, along with processor usage per request. Then as the number of users increases in your forecast, you can also project the network and processor resources that are required to service that projected load. Because of the extensive use of memory-resident caching in most server applications, characterizing the disk I/O rate per user or request isn’t easy; the disk I/O rate could remain relatively flat as the request load increases because of effective caching.
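The following Python sketch illustrates this per-user workload characterization and a simple projection for a larger user population; all sample values are hypothetical:

# A minimal sketch: derive request load and processor usage per connected
# user, then project demand for a forecast user population.
segments_received_per_sec = 1800.0  # TCPvn\Segments Received/sec
connections_established = 450       # TCPvn\Connections Established
processor_utilization = 0.35        # Processor(_Total)\% Processor Time / 100

requests_per_user = segments_received_per_sec / connections_established
cpu_per_user = processor_utilization / connections_established

forecast_users = 700
print(f"Requests/sec per user: {requests_per_user:.2f}")
print(f"Projected load at {forecast_users} users: "
      f"{requests_per_user * forecast_users:.0f} segments/sec, "
      f"{cpu_per_user * forecast_users:.0%} processor busy")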
Networking Error Conditions
In all areas of network monitoring, pay close attention to any reported error incidents.
These include Network Interface(n)\Packets Received Errors, Network Interface(n)\Packets Outbound Errors, IP\Datagrams Outbound No Route, IP\Datagrams Received Address Errors, IP\Datagrams Received Discarded, TCP\Segments Retransmitted/sec, and TCP\Connection Failures. Configure alerts to fire when any of these networking error conditions are observed.
Maintaining Server Applications
When the TCP layer is finished processing a segment received from a networking client, the request is passed upwards to the networking application that is plugged into
the associated TCP port. Some of the networking applications you will be managing
might include:
■
File Server
■
Print server
■
Web server
■
SQL Server
■
Terminal Server
■
Exchange Mail and Messaging server
■
COM+ Server applications
These and other networking applications provide additional performance measurements that can be used to characterize their behavior. This section highlights some of
the most important performance counters available from these networking applications that aid in their configuration and tuning.
Thread Pooling
To aid in scalability, most server applications use some form of thread pooling. In general, thread pooling means these server applications perform the following actions:
■
Define a pool that contains Worker threads that can handle incoming requests
for service.
■
Queue work requests as they arrive, and then release and dispatch Worker
threads to take a work request off the queue and complete it.
The maximum size of the thread pool defined by the server application is usually a function of the size of RAM and the number of processors. There will be many instances in which the heuristic used to set the maximum size of the thread pool is inadequate for your specific workload. When that happens, these server applications frequently also have configuration and tuning options that allow you to do either of the following actions, or both:
■
Increase the number of worker threads in the thread pool
■
Boost the dispatching priority of worker threads in the thread pool
Looking at these thread pooling server applications externally, from the point of view
of their process address space, Process(n)\Thread Count identifies the total number
of threads that are created, including the worker threads created in the thread pool.
You will frequently find that the Process(n)\Thread Count is a large, relatively static
number. However, inside the application, the situation is much more dynamic.
Worker threads are activated as work requests are received, up to the maximum size
of the thread pool, and they might be decommissioned later after they are idle.
Performance counters that are internal to the server application let you know how
many of the worker threads that are defined are in use. If you find that the following
two conditions are true for these thread pooling applications, you might find that
increasing the maximum size of the thread pool will boost throughput and improve
the responsiveness of the server application:
■
Running worker threads for sustained periods of time at or near the limit of the
maximum size of the pool
■
Neither the processor (or processors) nor memory appears to be saturated
In addition to monitoring the number of active worker threads, you might also find it
useful to track the client request rate and any counters that indicate when internal
requests are delayed in the request queue. For more information about thread pooling
in server applications, see Chapter 6, “Advanced Performance Topics.”
You might also find that increasing the maximum size of the thread pool makes no
improvement, merely resulting in higher context switch rates and increased CPU utilization (with no increase in throughput). In this case, you should reduce the maximum size of the thread pool to its original size (or to an even lower value).
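The following Python sketch illustrates the general thread pooling pattern described in this section: a fixed pool of worker threads draining a shared queue of work requests. It is a generic illustration only, not the implementation used by the file Server, IIS, or any other Windows service.

# A minimal, generic sketch of the thread pooling pattern: a fixed pool
# of worker threads servicing a shared queue of incoming work requests.
import queue
import threading

MAX_POOL_SIZE = 4  # analogous to a MaxThreads-style tuning parameter
work_queue: "queue.Queue[str]" = queue.Queue()

def worker() -> None:
    while True:
        request = work_queue.get()  # block until a work item is queued
        try:
            print(f"{threading.current_thread().name} handling {request}")
        finally:
            work_queue.task_done()

for i in range(MAX_POOL_SIZE):
    threading.Thread(target=worker, name=f"Worker-{i}", daemon=True).start()

for n in range(10):  # queue incoming work requests as they arrive
    work_queue.put(f"request-{n}")
work_queue.join()    # wait until all queued requests are serviced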
File Server
The file Server service that is available on all Windows Server 2003 machines is a good
example of a thread pooling application. It defines separate thread pools for each processor, and then uses processor affinity to limit the amount of interprocessor communication that occurs. The file Server also provides extensive instrumentation that is
available in two performance objects: Server and Server Work Queues. You can monitor the number of active worker threads in each Server Work Queue, the rate of
requests that are processed, the request queue length, and a number of error indicators, along with other measurements.
The Server object contains overall file server statistics, including the number of file
server sessions, the rate of requests, and several error indicators. The most important
error indicator is the Server\Work Item Shortages counter, which tracks the number of times a client request was rejected because of a resource shortage. Table 3-35 describes this counter.
Table 3-35 Server\Work Item Shortages Counter
Counter Type
Interval difference counter (rate/second).
Description
The number of times a shortage of work items caused the file Server to reject a client request. This error usually results in session termination.
Measurement Notes
The file Server counts the number of times a work item shortage
occurs.
Usage Notes
A primary indicator that the file Server service is short of resources.
■
As server message blocks (SMBs) are received from clients by the Server service, each request is stored in a work item and assigned to an available worker thread from the thread pool for processing. If worker threads are not available or cannot process requests fast enough, the queue of Available Work
Items can become depleted. When no more work items are
available, the Server service cannot process the SMB request
and terminates the session with an error response.
■
If memory is available, increase the number of work items that are allocated by adding the InitWorkItems or MaxWorkItems values to the Registry key at HKLM\SYSTEM\CurrentControlSet\Services\lanmanserver\parameters.
Performance
Primary indicator to determine whether the number of Server Work Items defined is a potential bottleneck.
Capacity Planning
Not applicable.
Operations
File Server clients whose sessions are terminated because of a work
item shortage must restore their session manually.
Alert Threshold
Alert on any nonzero value of this counter.
Related Measures
Server Work Queues(n)\Active Threads
Server Work Queues(n)\Available Threads
Server Work Queues(n)\Available Work Items
Server Work Queues(n)\Borrowed Work Items
Server Work Queues(n)\Queue Length
Server Work Queues(n)\Work Item Shortages.
At the Server Work Queue level, you can drill down into more detail. One Server Work Queue is defined for blocking requests, which are operations that require an I/O to disk. The Server implements an I/O completion port for worker threads that handle these operations. Besides the Blocking Queue, one Server Work Queue is defined per processor, each with a dedicated thread pool. The per-processor Server Work Queues are identified using the processor instance name. SMBs received from network clients are assigned to the Server Work Queue associated with the processor where the network interface interrupt was serviced. Threads from the thread pool are activated as necessary to process incoming requests for service.
The file Server is well-instrumented. There are counters at the Server Work Queue
level for the number of Active Threads currently engaged in processing SMB requests
and Available Threads that could be scheduled if new work requests arrive. You
should verify that Available Threads is not equal to zero for any sustained period that
might cause the Work Item queue to back up. Table 3-36 describes the Server Work
Queues(n)\Available Threads counter.
Table 3-36 Server Work Queues(n)\Available Threads Counter
Counter Type
Instantaneous (sampled once during each measurement period).
Description
A sampled value that reports the number of available threads
from the per-processor Server Work Queue that are available to
process incoming SMB requests.
Measurement Notes
The file Server counts the number of free worker threads.
Usage Notes
A primary indicator that the file Server service is short of worker
threads.
■
When Available Threads is zero, incoming SMB requests
must be queued. If requests arrive faster than they can be
processed because there are no Available Threads, the
queue where pending work items are stored might back up.
■
If there are no Available Work Items, the server attempts to
borrow them from another processor Work Item queue.
Borrowing work items forces the Server to lock the per-processor Work Item queue to facilitate interprocessor communication, which tends to slow down work item
processing.
■
If Available Threads is at or near zero for any sustained period, the Queue Length of waiting requests is > 5, and processor resources are available—% Processor Time for the associated processor instance < 80 percent—you should add the MaxThreadsPerQueue value to the Registry key at HKLM\SYSTEM\CurrentControlSet\Services\lanmanserver\parameters to increase the number of threads that are created in the per-processor thread pools.
■
Per-processor thread pools are defined so that there are multiple instances of the Server Work Queues performance object on a multiprocessor.
■
The values of the Active Threads and Available Threads
counters for the Blocking queue can be safely ignored because the Blocking queue is managed differently from the
per-processor thread pools. Resources to process Blocking
queue requests are allocated on demand. The Server service
also utilizes the I/O completion port facility to process I/O
requests more efficiently.
Performance
Primary indicator to determine whether the number of worker
threads defined for the per-processor Server Work queues is a potential bottleneck.
Capacity Planning
Not applicable.
Operations
File Server clients whose sessions are terminated because of a
work item shortage must restore their session manually.
Alert Threshold
Alert on any zero value of this counter.
Related Measures
Server\Work Item Shortages
Server Work Queues(n)\Active Threads
Server Work Queues(n)\Available Work Items
Server Work Queues(n)\Borrowed Work Items
Server Work Queues(n)\Queue Length
Server Work Queues(n)\Work Item Shortages
Each per-processor Server Work queue also reports on the number of queued
requests that are waiting for a worker thread to become available. Table 3-37 describes
the Server Work Queues(n)\Queue Length counter.
Table 3-37 Server Work Queues(n)\Queue Length Counter
Counter Type
Instantaneous (sampled once during each measurement period).
Description
A sampled value that reports the number of incoming SMB requests that are queued for processing, waiting for a worker thread
to become available. There are separate per-processor Server
Work queues to minimize interprocessor communication delays.
Measurement Notes
The file Server reports the number of SMB requests stored in work
items that are waiting to be assigned to an available worker
thread for servicing.
Usage Notes
A secondary indicator that the file Server service is short of worker
threads.
■
Pay close attention to the per-processor work item queues,
and watch for indications that the queues are backing up.
■
Work items for the Blocking Queue are created on demand, so the Blocking queue is managed differently from the per-processor work queues. The Blocking queue is seldom a bottleneck.
Performance
Primary indicator to determine whether client SMB requests are
delayed for processing at the file Server. A secondary indicator
that the per-processor Work Item queue is backed up because of
a shortage of threads or processing resources.
Capacity Planning
Not applicable.
Operations
File Server clients whose sessions are terminated because of a
work item shortage must restore their session manually.
Alert Threshold
Alert when the Queue Length > 5 for any processor Work Item
queue.
Related Measures
Server\Work Item Shortages
Server Work Queues(n)\Active Threads
Server Work Queues(n)\Available Threads
Server Work Queues(n)\Available Work Items
Server Work Queues(n)\Borrowed Work Items
Server Work Queues(n)\Work Item Shortages
Print Servers
Printing is performed by worker threads associated with the Print Spooler service,
which executes as the Spoolsv.exe process. A tuning parameter allows you to boost
the priority of print spooler threads if you need to improve the performance of background printing services. The print spooler also supplies performance statistics in the Print Queue object. Key performance counters include both print jobs and printed pages. As you would for other server applications, set up alerts for error conditions. Any nonzero occurrences of the Print Queue\Not Ready Errors, Out of Paper Errors, and Job Errors counters should generate alerts so that operations staff can intervene promptly to resolve the error conditions.
Web-Based Applications
The Web server and FTP server functions in IIS are also structured as thread pooling
applications to assist in scalability. The IIS Web server contains many performance-oriented parameters and settings, a complete discussion of which is beyond the scope of this book.
More Info
For more information about performance-oriented parameters and
settings for IIS Web server, see the “Internet Information Services (IIS) 6.0” topic under
“Internet and E-mail services” in the “Help and Support” documentation, and the
“Internet Information Services (IIS) 6.0 Help” in the Internet Information Services Manager. Also see Chapter 13, “Optimizing IIS 6.0 Performance,” in the Microsoft Internet
Information Services (IIS) 6.0 Resource Kit from Microsoft Press.
IIS provides a number of performance objects, depending on which specific services
are defined, each with corresponding measures that report on the transaction load.
These objects and some of their most important measures of transaction load are
identified in Table 3-38.
Table 3-38 IIS Critical Measurements

SMTP Server
Counters: Bytes Received/sec, Bytes Sent/sec, Messages Received/sec, Messages Sent/sec
Notes: The Exchange Server Internet Mail Connector (IMC) uses the IIS SMTP Server facility to communicate with other e-mail servers using the SMTP protocol.

FTP Service
Counters: Bytes Received/sec, Bytes Sent/sec, Total Files Received, Total Files Sent
Notes: There is one instance of the FTP Service object per FTP site, plus an _Total instance.

Web Service
Counters: Bytes Received/sec, Bytes Sent/sec, Get Requests/sec, Post Requests/sec, ISAPI Extension Requests/sec
Notes: There is one instance of the Web Service object per Web site, plus an _Total instance.

NNTP Service
Counters: Articles Received/sec, Articles Sent/sec, Bytes Received/sec, Bytes Sent/sec

Active Server Pages
Counters: Requests/sec

ASP.NET
Counters: Requests/sec

ASP.NET Applications
Counters: Requests/sec
Notes: There is one instance of the ASP.NET Applications object per ASP.NET application, plus an _Total instance.
IIS defines a pool of worker threads, which are then assigned dynamically to perform
the various tasks requested from the different installed Web applications. ASP and
ASP.NET applications also rely on additional thread pools. For example, ASP application thread pools are governed by the AspProcessorThreadMax property, which is
stored in the metabase. In the case of ASP.NET applications, the maxWorkerThreads
and maxIOThreads properties from the processModel section of the Machine.config file
determine the size of the thread pool. Thread pool configuration and tuning for Web
server and other server applications is discussed in Chapter 6, “Advanced Performance Topics.”
Both Active Server Pages and ASP.NET provide many additional metrics, including
some important response time indicators. Table 3-39 describes the Active Server
Pages\Request Execution Time and ASP.NET\Request Execution Time counters.
Table 3-39 Active Server Pages\Request Execution Time and ASP.NET\Request Execution Time Counters
Counter Type
Instantaneous (sampled once during each measurement period).
Description
The execution time in milliseconds of the ASP or ASP.NET transaction that completed last.
Measurement Notes
This is the same value that is available in the IIS log as the Time
Taken field. It is the Time Taken for the last ASP Request that completed execution. If you gather this counter at a rate faster than
the ASP Requests/sec transaction arrival rate, you will gather duplicate counter values.
Usage Notes
A primary indicator of ASP and ASP.NET application service time.
Properly viewed as a sample measurement of ASP or ASP.NET
transaction service time. Because it is a sampled value and it is impossible to tell what specific transaction was completed last, you
should not use this counter for operational alerts.
Calculate ASP application response time by adding Active Server Pages\Request Queue Time:
Active Server Pages\Request Response Time = Active Server Pages\Request Execution Time + Active Server Pages\Request Queue Time
Calculate ASP.NET application response time by adding ASP.NET\Request Queue Time:
ASP.NET\Request Response Time = ASP.NET\Request Execution Time + ASP.NET\Request Queue Time
Performance
A primary indicator of ASP or ASP.NET application service time.
Capacity Planning
Not applicable.
Operations
Not applicable.
Alert Threshold
Do not alert on this counter.
Related Measures
Active Server Pages\Requests/sec
Active Server Pages\Request Queue Time
Active Server Pages\Requests Executing
Active Server Pages\Requests Queued
ASP.NET\Requests/sec
ASP.NET\Request Queue Time
ASP.NET\Requests Executing
ASP.NET\Requests Queued
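The following Python sketch illustrates the response time derivation from Table 3-39, averaging a series of sampled execution and queue times (the sample values, in milliseconds, are hypothetical):

# A minimal sketch: sum sampled execution and queue times, then average
# many samples collected during peak load to estimate response time.
execution_ms = [120, 95, 210, 150, 88, 300, 140]  # Request Execution Time samples
queue_ms = [10, 0, 45, 20, 0, 110, 15]            # Request Queue Time samples

response_ms = [e + q for e, q in zip(execution_ms, queue_ms)]
avg_response = sum(response_ms) / len(response_ms)
print(f"Average sampled request response time: {avg_response:.0f} ms")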
Active Server Pages\Request Execution Time and ASP.NET\Request Execution Time
are indicators of ASP and ASP.NET application service time, respectively. But both
counters need to be treated as sampled measurements, or a single observation. You
need, for example, to accumulate several hundred sample measurements during periods of peak load to be able to estimate the average Request Execution Time accurately
over that interval. Table 3-40 describes the Active Server Pages\Request Queue Time
and ASP.NET\Request Queue Time counters.
Table 3-40 Active Server Pages\Request Queue Time and ASP.NET\Request Queue Time Counters
Counter Type
Instantaneous (sampled once during each measurement period).
Description
The queue time in milliseconds of the ASP or ASP.NET transaction that completed last.
Measurement Notes
This is the Queue Time delay value for the last ASP or ASP.NET
request that completed execution. If you gather this counter at
a rate faster than the ASP Requests/sec transaction arrival rate,
you will gather duplicate counter values.
Usage Notes
A primary indicator of ASP or ASP.NET application queue time.
Properly viewed as a sample measurement of ASP or ASP.NET
transaction queue time. Because it is a sampled value and it is
impossible to tell what specific transaction completed last, you
should not use this counter for operational alerts.
Calculate ASP.NET application response time by adding ASP.NET\Request Execution Time:
ASP.NET\Request Response Time = ASP.NET\Request Execution Time + ASP.NET\Request Queue Time
Performance
A primary indicator of ASP or ASP.NET application queue time.
Capacity Planning
Not applicable.
Operations
Not applicable.
Alert Threshold
Do not alert on this counter.
Related Measures
Active Server Pages\Requests/sec
Active Server Pages\Request Execution Time
Active Server Pages\Requests Executing
Active Server Pages\Requests Queued
ASP.NET\Requests/sec
ASP.NET\Request Execution Time
ASP.NET\Requests Executing
ASP.NET\Requests Queued
Adding ASP.NET Request Execution Time and Request Queue Time yields the
response time of the last ASP.NET transaction. This derived value also needs to be
treated as a sample measurement, or a single observation. In a spreadsheet, for example, accumulate several hundred sample measurements of Request Execution Time
and Request Queue Time during periods of peak load, and then summarize them to
estimate the average Request Response Time during the period. Table 3-41 describes
the Active Server Pages\Requests Executing and ASP.NET\Requests Executing
counters.
Table 3-41 Active Server Pages\Requests Executing and ASP.NET\Requests Executing Counters
Counter Type
Instantaneous (sampled once during each measurement period).
Description
The number of ASP or ASP.NET requests that are currently being
executed.
Usage Notes
A primary indicator of ASP or ASP.NET application concurrency. Each active ASP or ASP.NET request is serviced by a worker thread.
If all ASP or ASP.NET threads are currently busy when a new request arrives, the request is queued.
This is an instantaneous counter that reports the number of ASP
or ASP.NET threads currently occupied with active requests.
Performance
A primary indicator of ASP or ASP.NET application concurrency.
Capacity Planning
Not applicable.
Operations
Not applicable.
Alert Threshold
Alert when this ASP\Requests Executing ≥ (AspProcessorThreadMax − 2) × #_of _processors.
Alert when this ASP.NET\Requests Executing ≥ (ProcessorThreadMax − 2) × #_of _processors.
Related Measures
Active Server Pages\Requests/sec
Active Server Pages\Request Queue Time
Active Server Pages\Request Execution Time
Active Server Pages\Requests Queued
ASP.NET\Requests/sec
ASP.NET\Request Queue Time
ASP.NET\Request Execution Time
ASP.NET\Requests Queued
The ProcessorThreadMax property is an upper limit on the number of ASP.NET Requests
that can be in service at one time. If all available ASP.NET threads are currently busy
when a new ASP.NET request arrives, the request is queued and must wait until one of
the requests in service completes and an ASP.NET worker thread becomes available.
Tip The ProcessorThreadMax parameter can have a significant effect on the scalability of your ASP applications. When ASP.NET\Requests Executing is observed at or near ProcessorThreadMax multiplied by the number of processors, consider increasing the value of ProcessorThreadMax. For more details, see Chapter 6, “Advanced Performance Topics.”
Consider, for example, an ASP.NET application that services 10 requests per second. If
the average response of ASP.NET applications is 4 seconds, Little’s Law predicts that
the number of ASP.NET requests in the system equals the arrival rate multiplied by
the response time, or 10 × 4, or 40, which is how many ASP.NET\Requests Executing
you would expect to see. If ProcessorThreadMax is set to its default value, which is 20,
and IIS is running on a 2-way multiprocessor, the maximum value you could expect to
see for ASP.NET\Requests Executing is 40. Absent other obvious processor, memory,
or disk resource constraints on ASP.NET application throughput, observing no more
than 40 Requests Executing during periods of peak load might mean that the ProcessorThreadMax is an artificial constraint on ASP.NET throughput. For a more detailed
discussion of server thread pooling applications and their scalability, see Chapter 6,
“Advanced Performance Topics.” Table 3-42 describes the Active Server
Pages\Requests Queued and ASP.NET\Requests Queued counters.
Table 3-42 Active Server Pages\Requests Queued and ASP.NET\Requests Queued Counters
Counter Type
Instantaneous (sampled once during each measurement period).
Description
The number of ASP or ASP.NET requests that are currently
queued for execution, pending the availability of an ASP or
ASP.NET worker thread.
Usage Notes
A primary indicator of ASP or ASP.NET concurrency constraint.
Each active request is serviced by a worker thread. The number
of worker threads available to process ASP requests is governed
by the AspProcessorThreadMax property in the IIS metabase.
AspProcessorThreadMax defaults to 25 threads per processor in
IIS 6.0. The number of worker threads available to process
ASP.NET requests is governed by the ProcessorThreadMax
property in the Machine.config settings file. ProcessorThreadMax defaults to 20 threads per processor in .NET version 1.
If all worker threads are currently busy when a new request arrives, the request is queued.
This is an instantaneous counter that reports the number of ASP or ASP.NET requests currently waiting for an available worker thread.
Estimate average ASP request response time using Little’s Law:
Active Server Pages\Request response time = (Active Server Pages\Requests Executing + Active Server Pages\Requests Queued) ÷ Active Server Pages\Requests/sec
Estimate average ASP.NET request response time using Little’s Law:
ASP.NET\Request response time = (ASP.NET\Requests Executing + ASP.NET\Requests Queued) ÷ ASP.NET\Requests/sec
Performance
A primary indicator of ASP and ASP.NET application
concurrency.
Capacity Planning
Not applicable.
Operations
Not applicable.
Alert Threshold
Alert when this counter ≥ 5 × #_of _processors.
Related Measures
Active Server Pages\Requests/sec,
Active Server Pages\Request Queue Time
Active Server Pages\Requests Execution Time
Active Server Pages\Requests Queued
ASP.NET\Requests/sec
ASP.NET\Request Queue Time
ASP.NET\Requests Execution Time
ASP.NET\Requests Queued
The ASP.NET counters (and their corresponding Active Server Pages counters) can also be used to estimate response time on an interval basis using Little’s Law. Together, ASP.NET\Requests Executing and ASP.NET\Requests Queued measure the number of ASP.NET requests in the system at the end of the interval. Assuming the equilibrium assumption is not violated—that ASP.NET transaction arrivals roughly equal completions over the interval—divide the number of requests in the system by the arrival rate, ASP.NET\Requests/sec, to calculate average ASP.NET application response time. This value should correlate reasonably well with the sampled ASP.NET response time that you can calculate by adding ASP.NET\Request Execution Time and ASP.NET\Request Queue Time over an interval in which you have accumulated sufficient samples.
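The following Python sketch illustrates the Little’s Law estimate described above (the sample counter values are hypothetical):

# A minimal sketch: estimate ASP.NET response time as requests in the
# system (executing + queued) divided by the arrival rate.
requests_executing = 28  # ASP.NET\Requests Executing
requests_queued = 12     # ASP.NET\Requests Queued
requests_per_sec = 10.0  # ASP.NET\Requests/sec

# Little's Law: N = X * R, so R = N / X, assuming arrivals roughly equal
# completions over the measurement interval (the equilibrium assumption).
response_time_sec = (requests_executing + requests_queued) / requests_per_sec
print(f"Estimated ASP.NET response time: {response_time_sec:.1f} seconds")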
Similar ASP.NET measurements are available at the .NET application level.
Tip Any time your Web site reports ASP.NET\Requests Executing at or near the maximum of ProcessorThreadMax × the number of processors, consider boosting the number of ASP.NET threads.
You might find that increasing ProcessorThreadMax makes no improvement, merely
resulting in higher context switch rates and increased CPU utilization (with no increase
in throughput). In this case, reduce the number to its original size (or to an even lower
value).
This is a simplified view of Web application thread pooling. With Web application gardens and ASP and ASP.NET applications running in isolated processes, the configuration and tuning of the Web server application thread pool grows more complicated.
See Chapter 6, “Advanced Performance Topics,” for more details.
Terminal Services
Terminal Services provides many unique capabilities. With Terminal Services, for
example, a single point of installation of a desktop application can be made available
centrally to multiple users. Users sitting at Terminal Server client machines can run
programs remotely, save files, and use network resources just as though the applications were installed locally. Terminal Server can also deliver Windows desktop applications to computers that might not otherwise be able to run the Windows operating
system.
As with other server applications, many Terminal Server clients might depend on good performance from your Terminal Server machine or machines, so effective performance monitoring procedures are essential to ensure that good service is provided to Terminal Server clients.
When a Terminal Server client logs on to a machine running Windows Server 2003 that is configured to run Terminal Services, a Terminal Server session is created. Table 3-43 describes the Terminal Services\Total Sessions counter.
Table 3-43 Terminal Services\Total Sessions Counter

Counter Type: Instantaneous (sampled once during each measurement period).

Description: The total number of Terminal Server sessions, including both active and inactive sessions.

Usage Notes: The primary indicator of total Terminal Server clients. It includes both inactive and active sessions. Terminal Server creates and maintains a desktop environment for each active session. This requires private copies of the following processes: Explorer, Winlogon, Smss.exe, Lsass.exe, and Csrss.exe. Additional private copies of any processes that the Terminal Server launches from the desktop are also created. Private copies of the desktop processes are retained for the duration of the session, regardless of whether the session is inactive or active.

Performance: Not applicable.

Capacity Planning: Trending and forecasting Terminal Server usage over time.

Operations: Terminal Server capacity constraints can affect multiple Terminal Server clients.

Alert Threshold: Do not alert on this counter.

Related Measures: Terminal Services\Active Sessions, Terminal Services\Inactive Sessions
Every Terminal Server session is provided with a Windows desktop environment, which is supported by private instances of several critical processes: the Explorer desktop shell, a Winlogon process for authentication, a private copy of the Windows Client/Server Subsystem (Csrss.exe), and the Lsass.exe and Smss.exe security subsystem processes. In addition, Terminal Server creates process address spaces associated with the desktop applications that the Terminal Server client is running remotely. A Terminal Server supporting a large number of Terminal Server clients must be able to sustain a large number of process address spaces and execution threads.
In practice, this means that large Terminal Server deployments on 32-bit machines
can encounter severe 32-bit virtual memory addressing constraints. With enough Terminal Server clients on a 32-bit server machine, virtual memory shortages in the system area can occur, specifically in the system Paged pool or the pool of available
system PTEs.
More Info For more information about virtual memory shortages, see the "Windows Server 2003 Terminal Server Capacity and Scaling" white paper at http://www.microsoft.com/windowsserver2003/techinfo/overview/tsscaling.mspx, and the discussion in Chapter 5, "Performance Troubleshooting," which addresses 32-bit virtual memory constraints.
The Memory\Pool Paged Bytes and Memory\Free System Page Table Entries counters
should be tracked on Terminal Server machines to determine whether 32-bit virtual
memory shortages are a concern. Note that systems with 64-bit virtual addressing, such
as x64 and IA64 systems, provide more Terminal Server capacity than 32-bit systems.
A related concern, discussed at length in the “Windows Server 2003 Terminal Server
Capacity and Scaling” white paper, is a potential shortage of physical memory for the
system file cache because of contention for RAM with the system’s Paged pool and
pool of System PTEs. The file cache shares the range of system virtual memory available to the operating system with the Paged pool and the System PTE pool. Direct evidence that the size of the system file cache might be constrained is obtained by
monitoring the Cache\Copy Read Hits % counter. For the sake of delivering good performance to Terminal Server clients, the Cache\Copy Read Hits % counter should be
consistently above 95 percent. In fact, the authors of the "Windows Server 2003 Terminal Server Capacity and Scaling" white paper recommend that the Cache\Copy Read Hits % counter remain at a 99 percent level for optimal performance.
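Drawing on the counters just discussed, a minimal counter settings file fragment for tracking these virtual memory and file cache indicators on a Terminal Server might look like the following sketch (the exact set of counters you log should reflect your own monitoring requirements):

\Memory\Pool Paged Bytes
\Memory\Free System Page Table Entries
\Cache\Copy Read Hits %
\Terminal Services\Total Sessions
\Terminal Services\Active Sessions
\Terminal Services\Inactive Sessions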
Chapter 4
Performance Monitoring
Procedures
In this chapter:
Understanding Which Counters to Log . . . . . . . . . . . . . . . . . . . . . . . . . . . 314
Daily Server Monitoring Procedures. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 316
Using a SQL Server Repository. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 365
Capacity Planning and Trending . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 379
Counter Log Scenarios . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 388
Troubleshooting Counter Collection Problems . . . . . . . . . . . . . . . . . . . . 395
The regular performance monitoring procedures you implement need to be able to
serve multiple purposes, including the following:
■ Detecting and diagnosing performance problems
■ Verifying that agreed-upon service levels are being met
■ Supporting proactive capacity planning initiatives to anticipate and relieve impending resource shortages before they impact service levels
In this chapter, sample performance-monitoring procedures are described that will
help you meet these important goals. These sample procedures, which center on a
daily performance data-gathering process that you can easily implement, are generalized and thus are appropriate for both small- and large-scale enterprises. However, you
will likely need to customize them to some degree to suit your specific environment.
These sample procedures will also help you start diagnosing common server performance problems. Additional recommendations will address performance alerts, management reporting, and capacity planning.
The final section of this chapter documents procedures that you should follow when
you experience a problem using the Performance Monitor to gather specific performance statistics. For example, applications that install or uninstall performance
counters can sometimes do so incorrectly, in which case you can use the procedures
described here to restore the performance counter infrastructure integrity. Doing so
will ensure that Performance Monitor correctly reports performance statistics.
The sample procedures in this chapter are designed to help you anticipate a wide variety of common performance problems. They rely on the Log Manager and Relog automated command-line tools that were discussed in Chapter 2, “Performance
Monitoring Tools.” Log Manager and Relog will help you log counters from the local
machine to a local disk and gather the most important performance counters, which
were highlighted in Chapter 3, “Measuring Server Performance.”
These sample automated procedures allow you to detect and diagnose many performance problems. They rely on background data-gathering sessions whose output you can analyze at length after a problem is reported. Still, the amount of data you are advised to collect continuously for daily performance monitoring will not always be adequate to solve every performance-related problem. Sometimes you will need to augment these background data collection procedures with focused real-time monitoring. Use real-time monitoring when you need to gather more detailed information about specific situations.
More Info
Trying to diagnose any complex performance problem is often challenging. Trying to diagnose a complex performance problem in real time using the
System Monitor is even more difficult. Unless you know precisely what you are looking
for and the problem itself is persistent, it can be difficult to use the System Monitor in
real time to identify the cause of a performance problem. In a real-time monitoring
session, you have so many counters and instances of counters to look at and analyze
that the problem might disappear before you are able to identify it. Problem diagnosis
is the focus of Chapter 5, “Performance Troubleshooting.”
Understanding Which Counters to Log
A daily performance monitoring procedure that is effective in helping you detect, diagnose, and resolve common performance problems must gather large quantities of useful performance data, which you can then edit, summarize, and use in management
reporting and capacity planning. The first decision you will have to make is about
which performance counters, among all those that are available, you should gather on
a regular basis.
Background Performance Monitoring
A daily performance monitoring regimen that sets up background counter log sessions to gather performance data on a continuous basis allows you to detect and
resolve common performance problems when they arise. Because it is impossible to
know in advance which key resources are saturated on a machine experiencing performance problems, in an ideal world you would collect performance data on all the key
resources, such as the processor, memory, disk, and network. However, overhead considerations dictate that you can never collect all the available performance information about all resources and all workloads all the time on any sizable machine running
Microsoft Windows Server 2003. Thus, you must be selective about which data you
are going to gather and how often you will collect it. Striking a balance between the
amount of data you will gather and analyze and the costs associated with that process
is important.
To detect and diagnose common performance problems involving overloaded
resources, you need to gather a wide range of detailed performance data on processor,
memory, disk, and network utilization and the workloads that are consuming those
resources. This data should include any performance counters that indicate error conditions, especially errors resulting from a shortage of internal resources. This chapter
discusses some of the key error indicators you should monitor.
Management Reporting
Normally, much less detailed information is required for service-level and other forms
of management reporting. Thus, the level of detail provided by daily performance
monitor procedures appropriate for detecting and resolving common performance
problems is more than adequate for management reporting. In fact, to build management reporting procedures that scale efficiently across a large organization, you typically would find it helpful to summarize your daily performance counter logs first,
prior to reporting. Summarizing the daily counter logs will reduce the size of the files
that need to be transferred around the network and improve the efficiency of the
reporting procedures.
Service-level reporting focuses on resource consumption and measures of load such
as logons, sessions, transaction rates, and messages received. Service-level reporting is
often of interest to a wider audience of system management professionals, many of
whom might not be intimately familiar with the way Windows Server 2003 and its
major server applications function. Consequently, it is important to avoid reporting
too much technical detail in management reports aimed at this wider audience.
Rather, focus service-level reporting on a few key measures of resource utilization and load that are widely understood.
Capacity Planning
Finally, to implement proactive capacity planning, in which you identify workload
growth trends, reliably predict workload growth, and forecast future requirements,
you track and accumulate historical data on key resource usage and consumption levels over time. You will probably need to do this for only a small number of key computer components, such as the processor, the networks, and the disks, and a few key
applications. The counter log data that feeds your management reporting processes
will be edited and summarized again to support capacity planning, at which point the
emphasis shifts to building an historical record of computer resource usage data.
You can reliably predict the future with reasonable accuracy only when you have
amassed a considerable amount of historical data on the patterns of workload growth.
For example, for every unit of time that you want to predict future workload growth,
you need, at a minimum, an historical record equal to twice that amount of time. Thus,
capacity planning requires that you maintain a record of resource usage over long periods of time. Typically, only when you have accumulated at least 2–3 years of data can
you make reasonably accurate forecasts of capacity requirements that will feed decision making for an annual budget cycle.
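Applying this rule of thumb, for example, forecasting workload growth 12 months into the future requires at least 24 months of accumulated measurement history.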
In planning for future capacity requirements, seasonal patterns of activity often have a
major impact. Seasonal variations in many production workloads commonly occur in
monthly and annual cycles. For example, higher rates of financial transaction processing are often associated with month-end and year-end closings. You will need to provide computer capacity that is sufficient to absorb these month-end and year-end peak
loads. In a retail sales organization, you are likely to find that patterns of consumer
purchases are tied to holiday gift giving, when you can expect much higher transaction volume. It goes without saying that these peak transaction rates must be accommodated somehow. You will be able to factor in seasonal variations in workload
demand only after you accumulate historical data reflecting multiple cycles of that seasonal activity.
Daily Server Monitoring Procedures
This section details a model daily performance monitoring procedure, which is part of
a comprehensive program of proactive performance management. This procedure
includes the following daily activities:
■ Automatically gathering an in-depth view of system performance using counter logs
■ Monitoring key system and server application error indicators
■ Setting up alerts that automatically trigger the collection of detailed, diagnostic counter logs
■ Developing management reports on key performance metrics that can be viewed by interested parties in your organization
■ Retaining summarized performance statistics to support capacity planning
■ Managing the counter logs that are created automatically during these processes
Remember that the model performance monitoring practices and procedures discussed here will require tailoring for use in your environment, based on your site-specific requirements. For instance, each IT organization tends to have unique management reporting requirements that affect the type and quantity of reports generated and the data included in those reports. The practices and procedures described here represent a good place to start building an effective performance management function within your IT organization.
Daily Counter Logs
The first step in monitoring machines running Windows Server 2003 is to establish
automated data logging using the Log Manager (logman) command-line utility.
The command shown in Listing 4-1 establishes a daily performance logging procedure using a settings file that defines the performance counters you want to gather. (A sample settings file is described later.) The command also references a command file to be executed when the daily counter log files are closed. (A sample script that performs typical post-processing is also illustrated.)

Listing 4-1 Establishing Daily Performance Logging

logman create counter automatic_DailyLog -cf "c:\perflogs\basic-counters-setting-file.txt" -o C:\Perflogs\Today\BasicDailyLog -b 1/1/2004 00:00:00 -cnf 24:00:00 -si 1:00 -f BIN -v mmddhhmm -rc c:\perflogs\post-process.vbs
After you execute this command, the counter log you defined is visible and should
look similar to the display shown in Figure 4-1. If you use the counter log’s graphical
user interface, you can confirm the properties used to establish the logging session, as
illustrated.
Figure 4-1 Properties used to establish the logging session
As documented in Chapter 2, "Performance Monitoring Tools," the Log Manager utility allows you to configure background counter log data-gathering sessions. Table 4-1 parses the Log Manager command parameters used in Listing 4-1 and explains what they accomplish.
Table 4-1 Parsing the Log Manager Command Parameters

-cf "c:\perflogs\basic-counters-setting-file.txt"
The counters to log are specified in a counters settings file. An example of a basic counters settings file is provided in the section "A Sample Counter Settings File" in this chapter.

-b 1/1/2004 00:00:00 -cnf 24:00:00
Logging is initiated automatically by the System Monitor logging service as soon as your machine reboots, and runs continuously for a 24-hour period.

-si 1:00
Data samples are collected once per minute.

-f BIN
Data logging is performed continuously to a binary log file. The binary format is used for the sake of efficiency and to save on the amount of disk space consumed.

-v mmddhhmm
Automatic versioning is used to create unique daily counter log file names.

-rc c:\perflogs\post-process.vbs
At the end of a data logging session, when the log file is closed, a script is launched. This script performs file management and other post-processing, including deleting older copies of the counter log files that were created previously and summarizing the current log file for daily reporting. A sample post-processing script is provided in "Automated Counter Log Processing" in this chapter.
The Log Manager -v parameter allows you to generate unique file names for the counter logs created daily. The Performance Logs and Alerts snap-in supports an additional file versioning option that specifies the date in yyyymmdd format. If you prefer to have counter log file names automatically versioned using a yyyymmdd format, you can use the Performance snap-in afterward to manually modify the counter log properties to append the year, month, and day to the counter log file name.
When you are satisfied that the counter logging session you created is correctly specified, issue the command shown in Listing 4-2 to start logging data.
Listing 4-2 Starting Logging Data

logman start automatic_DailyLog
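To verify that the session was created and is running, you can query it by name; for example (using the session name from Listing 4-1):

logman query automatic_DailyLog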
Logging Local Counters to a Local Disk
The daily performance monitoring procedure recommended here generates counter logs that contain local counters and that are written in binary format to a local disk. The example writes counter log files to a local disk folder named C:\Perflogs\Today, although any suitable local disk folder will do. Binary log file format is recommended because it is more efficient and imposes less overhead.
Logging local counters to a local disk permits you to implement a uniform performance monitoring procedure across all the machines in your network, enabling you to
scale these procedures to the largest server configurations, no matter what the network topology is.
The daily performance monitoring procedures documented here assume that local
counter log data is written to a local disk folder, although other configurations are
possible. Counter logs, for example, can be used to gather data from remote computers. Gathering data remotely is appropriate when you cannot get physical access to the
remote machine. In those circumstances, it is simpler to perform performance data
gathering on one centrally located machine that is designed to pull counter data from
one or more remote machines. However, such a procedure is inherently less robust
than having counter logs running locally, because a data collection problem on any
remote machine can impact all machines that the centralized process is designed to
monitor. The network impact of monitoring remote machines must also be understood. This topic is discussed in greater detail in the “Counter Log Scenarios” section.
When you log counter log data to a local disk, you are required to manage the counter
log files that are created on a regular basis so that they do not absorb an inordinate
amount of local disk space. Without some form of file aging and cleanup, the counter
logs you generate daily can be expected to absorb as much as 30–100 MB of local disk
space on your servers each day they are active. In the section “Automated Counter Log
Processing,” a sample post-processing script is provided that uses Microsoft Visual
Basic Scripting Edition (VBScript) and Windows Script Host (WSH) and can perform
this daily file management and cleanup.
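As a flavor of what such a script does, a minimal VBScript sketch of the file-aging logic might look like the following (the folder path and the three-day retention period are illustrative assumptions, not values from the sample script):

' Delete counter log (.blg) files older than three days
Const PerflogFolder = "C:\Perflogs\Today"   ' illustrative folder
Const RetentionDays = 3                     ' illustrative retention period
Set fso = CreateObject("Scripting.FileSystemObject")
For Each file In fso.GetFolder(PerflogFolder).Files
    If LCase(fso.GetExtensionName(file.Name)) = "blg" Then
        If DateDiff("d", file.DateLastModified, Now) > RetentionDays Then
            file.Delete
        End If
    End If
Next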
Using the Relog command-line utility, you can transform binary format files later into
any other form. For example, you can create summarized files that can then be transferred to a consolidation server on the network. You can also create comma-delimited
text files for use with programs like Microsoft Excel, which can generate useful and
attractive charts and graphs. Using Relog, you can also build and maintain a SQL
Server database of consolidated counter log data from a number of servers that will
serve the needs of capacity planners.
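For example, hypothetical Relog commands along the following lines (all file, DSN, and counter log names are illustrative) correspond to the three uses just described:

Rem Summarize a binary daily log, keeping every 15th sample
relog BasicDailyLog_01150000.blg -f BIN -t 15 -o BasicDailySummary.blg

Rem Convert a summarized log to comma-delimited text for Excel
relog BasicDailySummary.blg -f CSV -o BasicDailySummary.csv

Rem Append a summarized log to a SQL Server repository through an ODBC DSN named PDB
relog BasicDailySummary.blg -f SQL -o SQL:PDB!DailyCounters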
Logging to a Network Share
Instead of generating counter logs in binary format to a local disk folder, many people
prefer to write counter logs to a network-shared folder. Because it can simplify file
management, this approach is often preferred. If you opt for this method, however,
note the following considerations, which might affect the scalability of your performance monitoring procedures at sites where a large number of servers exist:
■ Be careful not to overload the network. If your daily counter log consumes 50 MB of disk space per day, spread over a 24-hour period, that consumption amounts to only about 600 bytes per server per second of additional load that your network must accommodate. To determine the approximate load on the network from remote data logging, multiply by the number of servers that will be logging to the same network shared folder; for example, 100 servers would add roughly 60 KB per second of network traffic.
■ Make sure that your counter log session runs under a User ID that has permission to access the shared network folder. This User ID must also be a member of the built-in Performance Log Users group. To add User credentials to a Log Manager session, use the -u parameter to specify UserName and Password. Using Performance Logs and Alerts in the Performance Monitor console, you must set the Run As parameter on the General properties page for your counter log.
■ Ensure that the Windows Time Service is used to synchronize the clocks on all the servers that are writing counter log files to a shared network folder.
More Info For a description of how to use the Windows Time Service to synchronize the clocks on the machines on your Windows Server 2003 network, see the documentation entitled "Windows Time Service Technical Reference" posted on TechNet at http://www.microsoft.com/resources/documentation/WindowsServ/2003/all/techref/en-us/W2K3TR_times_intro.asp.
■ Embed the computer name in the file names of the counter log files so that you can easily identify the machine they come from.
These considerations can be easily accomplished using a WSH script. For example, the VBScript shown in Listing 4-3 returns the local computer name in a variable named LogonServer.

Listing 4-3 Identifying the Source Machine

Set WshShell = CreateObject("Wscript.Shell")
Set objEnv = WshShell.Environment("Process")
LogonServer = objEnv("COMPUTERNAME")
Next, you can construct a file name with the local computer name embedded in it using the script code shown in Listing 4-4.

Listing 4-4 Constructing a File Name

Const PerflogFolderYesterday = "Z:\SharedPerflogs\Yesterday"
Const LogType = "blg"
Const LogParms = " -b 1/1/2004 00:00:00 -cnf 24:00:00 -si 1:00 -f BIN -v mmddhhmm -rc c:\perflogs\post-process.vbs"
DailyFileName = PerflogFolderYesterday & "\" & LogonServer & "." & "basicDailyLog" & "." & LogType
LogSessionName = LogonServer & "-daily-log"
Then you can execute the Logman utility from inside the script using the WSH Shell object's Exec method, as illustrated in Listing 4-5.

Listing 4-5 Executing the Logman Utility

Const logSettingsFileName = "Z:\SharedPerflogs\log-counters-setting-file.txt"
command = "logman create counter " & LogSessionName & " -o " & DailyFileName & _
    " -cf " & logSettingsFileName & " " & LogParms
Set WshShell = Wscript.CreateObject("WScript.Shell")
Set execCommand = WshShell.Exec(command)
Wscript.Echo execCommand.StdOut.ReadAll
The final line of the script obtains the ReadAll property of the StdOut stream, which
contains any messages generated by the Logman utility from the Shell Object’s Exec
method, allowing you to determine whether the utility executed successfully.
Additional considerations for performing remote logging and writing counter logs to
a remote disk are discussed in “Counter Log Scenarios.”
Baseline Measurements
Baseline performance data is a detailed performance profile of your machine that you
store for future use in case you need to determine what has changed in your environment. You can compare the previous baseline set of measurements to current data to
detect any major changes in the workload. Sometimes, small incremental changes that
occur slowly add up to major changes over time. By using performance reports that
are narrowly focused in time, you can easily miss seeing the extent and scope of these
incremental changes. One good way to assess the extent of these changes over time is
to compare a detailed view of the current environment with one of your more recent
sets of baseline measurements.
The daily performance monitoring procedure recommended here is suitable for establishing a set of baseline measurements for your machine. Every six months or so, copy
this counter log file and set it aside. Copy the counter log to a secure location where
you can store it long-term. It is also a good idea to save a baseline counter log before
and after any major system hardware or software configuration change.
To be useful as a baseline set of measurements, all counter data you gather should be
kept in its original binary format. You might want to save these baseline measurement
sets to offline archived storage so that you can keep them around for several years
without incurring much cost.
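For example, a simple command along these lines (the file name, archive server, and share name are illustrative assumptions) copies a day's binary counter log to an archive location:

copy C:\Perflogs\Today\BasicDailyLog_01150000.blg \\ArchiveServer\Baselines\Server01\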
Using a Counter Settings File
Using a counter settings file to specify the counters that you want to gather makes it
easy to create and maintain uniform performance monitoring procedures across all
your machines with similar requirements. For example, you might create a counter
settings file for use on all your machines that are running Microsoft SQL Server. Similarly, your Web server machines running Internet Information Services (IIS) and
.NET Framework applications would require a different counter settings file for best
results.
Any counter settings file that you create is likely to include a base set of counters that reports utilization of the machine's principal resources: processors, memory, disks, and network interfaces. A simple counter settings file that would form the heart of almost any application-specific counter settings file you create might look like the example in Listing 4-6.
Listing 4-6 Simple Counter Settings File
\LogicalDisk(*)\% Free Space
\LogicalDisk(*)\Free Megabytes
\PhysicalDisk(*)\*
\Cache\*
\Processor(*)\*
\Memory\*
\System\*
\Network Interface(*)\*
\IPv4\*
\TCPv4\*
Listing 4-6 gathers Logical Disk free space statistics and all counters from the Cache, Memory, Network Interface, Physical Disk, Processor, System, IPv4, and TCPv4 objects. The Logical Disk free space measurements enable you to determine when your file system is running out of capacity.
Adding application-specific counters The counter settings file shown in Listing 4-6 lacks any application-specific performance data. This simple counter settings file can be enhanced significantly by adding counters associated with applications that the server runs. (See Table 4-2.)
Table 4-2 Adding Counters (the role that your Windows Server 2003 machine serves, the objects to gather counters from, and the processes to gather counters from)

Domain Controller
Objects: NTDS
Processes: lsass, smss

Remote Access Server
Objects: RAS Total, RAS Port
Processes: svchost

Database Server
Objects: SQL Server:General Statistics, SQL Server:Databases, SQL Server:Buffer Manager, SQL Server:Cache Manager, SQL Server:SQL Statistics, SQL Server:Locks
Processes: sqlserver, sqlagent

Web Server
Objects: Internet Information Services Global, FTP Service, Web Service, Web Service Cache
Processes: inetinfo, svchost

File and Print Server
Objects: Server, Server Work Queues, Print Queue, NBT Connection
Processes: svchost, spoolsv

Terminal Server
Objects: Terminal Services, Terminal Services Session
Processes: svchost, tssdis

Exchange Server
Objects: MSExchangeAL, MSExchangeDSAccess Caches, MSExchangeDSAccess Contexts, MSExchangeDSAccess Processes, Epoxy, MSExchangeIS Mailbox, Database ==> Instances, MSExchange Transport Store Driver, MSExchangeIS Transport Driver, MSExchangeSRS, MSExchange Web Mail, MSExchangeIMAP4, MSExchangePOP3, MSExchangeMTA, MSExchangeMTA Connections, SMTP Server, SMTP NTFS Store Driver
Processes: store, dsamain

Application Server
Objects: MSMQ Session, MSMQ Service, MSMQ Queue
Processes: dllhost
To limit the size of the counter log files that are generated daily, do not create a single
settings file that contains all the application-specific counters in Table 4-2 and their
associated process-level data. Instead, create a series of application-specific settings
files.
Tip When you do not know which server applications are installed and active on a machine, you can use the typeperf command-line utility, documented in Chapter 2, "Performance Monitoring Tools." It generates a settings file that provides a full inventory of the extended application counters available for collection on the machine.
A sample counter settings file Following the guidelines given in the preceding
section, a counter settings file for a File and Print Server might look like the sample in
Listing 4-7.
Listing 4-7 Example Counter Settings File for a File and Print Server
\LogicalDisk(*)\% Free Space
\LogicalDisk(*)\Free Megabytes
\LogicalDisk(*)\Current Disk Queue Length
\PhysicalDisk(*)\Current Disk Queue Length
\PhysicalDisk(*)\Avg. Disk Queue Length
\PhysicalDisk(*)\Avg. Disk sec/Transfer
\PhysicalDisk(*)\Avg. Disk sec/Read
\PhysicalDisk(*)\Avg. Disk sec/Write
\PhysicalDisk(*)\Disk Transfers/sec
\PhysicalDisk(*)\Disk Reads/sec
\PhysicalDisk(*)\Disk Writes/sec
\PhysicalDisk(*)\Disk Bytes/sec
\PhysicalDisk(*)\Disk Read Bytes/sec
\PhysicalDisk(*)\Disk Write Bytes/sec
\PhysicalDisk(*)\Avg. Disk Bytes/Transfer
\PhysicalDisk(*)\Avg. Disk Bytes/Read
\PhysicalDisk(*)\Avg. Disk Bytes/Write
\PhysicalDisk(*)\% Idle Time
\PhysicalDisk(*)\Split IO/Sec
\Server\Bytes Total/sec
\Server\Bytes Received/sec
\Server\Bytes Transmitted/sec
\Server\Sessions Timed Out
\Server\Sessions Errored Out
\Server\Sessions Logged Off
\Server\Sessions Forced Off
\Server\Errors Logon
\Server\Errors Access Permissions
\Server\Errors Granted Access
\Server\Errors System
\Server\Blocking Requests Rejected
\Server\Work Item Shortages
\Server\Pool Nonpaged Bytes
\Server\Pool Nonpaged Failures
\Server\Pool Nonpaged Peak
\Server\Pool Paged Bytes
\Server\Pool Paged Failures
\Server Work Queues(*)\Queue Length
\Server Work Queues(*)\Active Threads
\Server Work Queues(*)\Available Threads
\Server Work Queues(*)\Available Work Items
\Server Work Queues(*)\Borrowed Work Items
\Server Work Queues(*)\Work Item Shortages
\Server Work Queues(*)\Current Clients
\Server Work Queues(*)\Bytes Transferred/sec
\Server Work Queues(*)\Total Operations/sec
\Server Work Queues(*)\Context Blocks Queued/sec
\Cache\Data Maps/sec
\Cache\Data Map Hits %
\Cache\Data Map Pins/sec
\Cache\Pin Reads/sec
\Cache\Pin Read Hits %
\Cache\Copy Reads/sec
\Cache\Copy Read Hits %
\Cache\MDL Reads/sec
\Cache\MDL Read Hits %
\Cache\Read Aheads/sec
\Cache\Lazy Write Flushes/sec
\Cache\Lazy Write Pages/sec
\Cache\Data Flushes/sec
\Cache\Data Flush Pages/sec
\Processor(*)\% Processor Time
\Processor(*)\% User Time
\Processor(*)\% Privileged Time
\Processor(*)\Interrupts/sec
\Processor(*)\% DPC Time
\Processor(*)\% Interrupt Time
\Memory\Page Faults/sec
\Memory\Available Bytes
\Memory\Committed Bytes
\Memory\Commit Limit
\Memory\Write Copies/sec
\Memory\Transition Faults/sec
\Memory\Cache Faults/sec
\Memory\Demand Zero Faults/sec
\Memory\Pages/sec
\Memory\Pages Input/sec
\Memory\Page Reads/sec
\Memory\Pages Output/sec
\Memory\Pool Paged Bytes
\Memory\Pool Nonpaged Bytes
\Memory\Page Writes/sec
\Memory\Pool Paged Allocs
\Memory\Pool Nonpaged Allocs
\Memory\Free System Page Table Entries
\Memory\Cache Bytes
\Memory\Cache Bytes Peak
\Memory\Pool Paged Resident Bytes
\Memory\System Code Total Bytes
\Memory\System Code Resident Bytes
\Memory\System Driver Total Bytes
\Memory\System Driver Resident Bytes
\Memory\System Cache Resident Bytes
\Memory\% Committed Bytes In Use
\Memory\Available KBytes
\Memory\Available MBytes
\Memory\Transition Pages RePurposed/sec
\Paging File(*)\% Usage
\Paging File(*)\% Usage Peak
\System\Context Switches/sec
\System\System Up Time
\System\Processor Queue Length
\System\Processes
\System\Threads
\Process(svchost,*)\% Processor Time
\Process(svchost,*)\% User Time
\Process(svchost,*)\% Privileged Time
\Process(svchost,*)\Virtual Bytes Peak
\Process(svchost,*)\Virtual Bytes
\Process(svchost,*)\Page Faults/sec
\Process(svchost,*)\Working Set Peak
\Process(svchost,*)\Working Set
\Process(svchost,*)\Page File Bytes Peak
\Process(svchost,*)\Page File Bytes
\Process(svchost,*)\Private Bytes
\Process(svchost,*)\Thread Count
\Process(svchost,*)\Priority Base
\Process(svchost,*)\Elapsed Time
\Process(svchost,*)\ID Process
\Process(svchost,*)\Pool Paged Bytes
\Process(svchost,*)\Pool Nonpaged Bytes
\Print Queue(*)\Total Jobs Printed
\Print Queue(*)\Bytes Printed/sec
\Print Queue(*)\Total Pages Printed
\Print Queue(*)\Jobs
\Print Queue(*)\References
\Print Queue(*)\Max References
\Print Queue(*)\Jobs Spooling
\Print Queue(*)\Max Jobs Spooling
\Print Queue(*)\Out of Paper Errors
\Print Queue(*)\Not Ready Errors
\Print Queue(*)\Job Errors
\Print Queue(*)\Enumerate Network Printer Calls
\Print Queue(*)\Add Network Printer Calls
\Network Interface(*)\Bytes Total/sec
\Network Interface(*)\Packets/sec
\Network Interface(*)\Packets Received/sec
\Network Interface(*)\Packets Sent/sec
\Network Interface(*)\Current Bandwidth
\Network Interface(*)\Bytes Received/sec
\Network Interface(*)\Packets Received Unicast/sec
\Network Interface(*)\Packets Received Non-Unicast/sec
\Network Interface(*)\Packets Received Discarded
\Network Interface(*)\Packets Received Errors
\Network Interface(*)\Packets Received Unknown
\Network Interface(*)\Bytes Sent/sec
\Network Interface(*)\Packets Sent Unicast/sec
\Network Interface(*)\Packets Sent Non-Unicast/sec
\Network Interface(*)\Packets Outbound Discarded
\Network Interface(*)\Packets Outbound Errors
\Network Interface(*)\Output Queue Length
\IPv4\Datagrams/sec
\IPV4\Datagrams Received/sec
\IPV4\Datagrams Received Header Errors
\IPV4\Datagrams Received Address Errors
\IPV4\Datagrams Forwarded/sec
\IPV4\Datagrams Received Unknown Protocol
\IPV4\Datagrams Received Discarded
\IPV4\Datagrams Received Delivered/sec
\IPV4\Datagrams Sent/sec
\IPV4\Datagrams Outbound Discarded
\IPV4\Datagrams Outbound No Route
\IPV4\Fragments Received/sec
\IPV4\Fragments Re-assembled/sec
\IPV4\Fragment Re-assembly Failures
\IPV4\Fragmented Datagrams/sec
\IPV4\Fragmentation Failures
\IPV4\Fragments Created/sec
\TCPV4\Segments/sec
\TCPV4\Connections Established
\TCPV4\Connections Active
\TCPV4\Connections Passive
\TCPV4\Connection Failures
\TCPV4\Connections Reset
\TCPV4\Segments Received/sec
\TCPV4\Segments Sent/sec
\TCPV4\Segments Retransmitted/sec
Depending on such factors as the number of physical processors installed, the number of Physical Disks attached, the number of Logical Disks defined, and the number of Network Interface adapters, the volume of counter data generated by the counter settings file in Listing 4-7 could range from 30 MB per day for a small machine to 100 MB or more per day on a very large machine.
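As a rough, illustrative sanity check (the instance count and the per-sample cost are assumptions, not measured values): if the settings file expands to roughly 700 counter instances on a particular machine, and the binary format costs on the order of 30 bytes per counter value per sample, then 1-minute sampling yields approximately 700 × 30 × 1,440 ≈ 30 MB per day, consistent with the low end of this range.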
Gathering error indicators Many of the counters included in the sample counter
settings file in Listing 4-7 are indicators of specific error conditions. These error
conditions are not limited to resource shortages—some reflect improperly configured system and networking services, for example. Others might indicate activity
associated with unauthorized users attempting to breach the machine’s security.
The value of including these counters in your daily performance monitoring procedures is that they allow you to pinpoint the times when system services encounter
these error conditions.
Listing 4-8 shows counters that were included in the settings file in Listing 4-7; these
counters are included primarily as error indicators on a File and Print Server. This set
of counters includes error indicators for File Server sessions, printer error conditions,
and generic networking errors.
Listing 4-8 Counters That Are Error Indicators for a File and Print Server
\Server\Sessions Timed Out
\Server\Sessions Errored Out
\Server\Sessions Logged Off
\Server\Sessions Forced Off
\Server\Errors Logon
\Server\Errors Access Permissions
\Server\Errors Granted Access
\Server\Errors System
\Server\Blocking Requests Rejected
\Server\Work Item Shortages
\Server\Pool Nonpaged Bytes
\Server\Pool Nonpaged Failures
\Server\Pool Paged Failures
\Print Queue(*)\Out of Paper Errors
\Print Queue(*)\Not Ready Errors
\Network Interface(*)\Packets Received Discarded
\Network Interface(*)\Packets Received Errors
\Network Interface(*)\Packets Received Unknown
\Network Interface(*)\Packets Outbound Discarded
\Network Interface(*)\Packets Outbound Errors
\IPV4\Datagrams Received Header Errors
\IPV4\Datagrams Received Address Errors
\IPV4\Datagrams Received Unknown Protocol
\IPV4\Datagrams Received Discarded
\IPV4\Datagrams Outbound Discarded
\IPV4\Datagrams Outbound No Route
\IPV4\Fragment Re-assembly Failures
\IPV4\Fragmentation Failures
\TCPV4\Connection Failures
\TCPV4\Connections Reset
Counters such as the ones featured in Listing 4-8, which record the number of error conditions that have occurred, are usually instantaneous, or raw, counters. However, the counter values they contain are cumulative. Because the number of error conditions that occur should be small, using an interval difference counter to track errors would result in per-second error rates so small that they would report zero values. So, to avoid reporting these valuable metrics as zero values, these counters report cumulative error counts.
The one drawback of this approach is that you cannot effectively use the Alerts facility of the Performance Monitor to notify you when error conditions occur. The chart in Figure 4-2 plots values for one Network Interface error condition counter over the course of a 2-hour monitoring session. Notice that the curve marking the values of the Network Interface\Packets Received Unknown counter increases steadily during the monitoring interval. Whatever error condition is occurring is occurring regularly. Because the counter maintains the cumulative number of error conditions that have occurred since the system was last booted, once an error condition occurs, the counter value remains nonzero. You can see that once the alert on one of these counters is triggered, the alert will continue to occur at every measurement interval that follows. Such alerts will overrun the Application event log.

Figure 4-2 Values for one Network Interface error condition counter over the course of a 2-hour monitoring session
An easy and quick way to check whether these error conditions are occurring is to develop a report using the Histogram view, which allows you to identify all the nonzero error conditions. An example of this approach is shown in Figure 4-3, which shows that only the Packets Received Unknown error indicator occurred over a two-day monitoring interval.

Figure 4-3 Histogram view showing error condition counters that have nonzero values
If you remove all the error condition counters from the Histogram that report zero values, you can then switch to the Chart view to investigate the remaining nonzero
counters. Using the Chart view, you can determine when these error conditions
occurred and the rate at which they occurred.
Using Alerts Effectively
Because of the resulting overhead and the size of the counter logs, it is rarely possible to gather all the performance statistics you would need all the time. Gathering process-level data tends to contribute most to the size of the counter log files you generate; even smaller machines running Windows Server 2003 typically have numerous processes running and thus produce large counter log files. In the sample counter settings file in Listing 4-7, this potential problem was addressed by collecting process-level performance counters for only a few processes at a time. In Listing 4-7 and in the examples listed in Table 4-2, process-level performance data was selected only for processes that are central to the server application the machine runs.
However, this selective gathering of data is not a wholly satisfactory solution, because process-level performance data is required to detect runaway processes that are monopolizing the processor or are leaking virtual memory. The solution to this quandary is to use the Alerts facility of the Performance Monitor console to trigger a Log Manager counter log data collection session automatically when an alert has tripped its threshold. Implementing counter logging procedures in this fashion allows you to gather very detailed information about problems automatically, without resorting to gathering all the performance data all the time.
Follow these simple steps to set up a Log Manager counter log data collection session
that starts automatically when an alert fires:
1. Define the Log Manager counter log you plan to use to gather detailed performance data about a specific condition.
2. Define the alert condition that will be used to trigger the counter log session.
3. In the definition of the alert action, start the counter log session you defined in
step 1.
For example, to help detect that a process is leaking virtual memory, define a counter
log session that will gather data at the process level on virtual memory allocations.
Then define an alert condition that will be triggered whenever virtual memory allocations reach a critical level. Finally, define an alert action that will start the counter log
session you defined in step 1 (of the preceding procedure) when the alert condition
occurs.
The next section walks you through a sample implementation of an alert that will fire
when there is a virtual memory shortage and then initiate a counter log data gathering
session that will enable you to determine which process is responsible for the virtual
memory shortage. In Chapter 5, “Performance Troubleshooting,” additional techniques for detecting and diagnosing virtual memory leaks are discussed.
Caution You need to be careful that your performance alerts do not result in flooding the Application event log with an excessive number of messages. You should periodically review the Application event log to ensure that your alert settings are not generating too many event log entries. Note that programs like Microsoft Operations Manager can actively manage your event logs and consolidate and suppress duplicate event log entries so that duplicate alert messages are not generated for the same condition over and over.
332
Microsoft Windows Server 2003 Performance Guide
Triggering Counter Logs Automatically
This section walks you step by step through a procedure that will generate diagnostic
counter logs whenever there is a virtual memory shortage that might be caused by a
runaway process leaking virtual memory.
Step 1: Define the counter log settings These are the counter log settings you want to use when the alert trips its threshold condition. For example, use the following command:

logman create counter MemoryTroubleshooting -v mmddhhmm -c "\Memory\Available Bytes" "\Memory\% Committed Bytes in Use" "\Memory\Cache Bytes" "\Memory\Pool Nonpaged Bytes" "\Memory\Pool Paged Bytes" "\Process(*)\Pool Nonpaged Bytes" "\Process(*)\Pool Paged Bytes" "\Process(*)\Virtual Bytes" "\Process(*)\Private Bytes" "\Process(*)\Page Faults/sec" "\Process(*)\Working Set" -si 00:30 -o "c:\Perflogs\Alert Logs\MemoryTroubleshooting" -max 5
This command defines a MemoryTroubleshooting counter log session, which gathers
virtual memory allocation counters at a process level, along with a few system-wide
virtual memory allocation counters. In this example, these counters are sampled at
30-second intervals. Specifying the -max parameter shuts down data collection when
the counter log reaches 5 MB. Using the Performance Monitor console, you could also
limit the duration of the data collection session to a value of, for example, 10–15 minutes. Notice that the counter logs will be created in a C:\Perflogs\Alert Logs\ folder,
distinct from the folder in which you create your regular daily performance logs. This
separate folder enables you to more easily distinguish alert-triggered counter logging
sessions from other counter logs that you create, and also makes it easier for you to
manage them based on different criteria. For example, this MemoryTroubleshooting
log is a detailed view that is limited to data on virtual memory allocations. You would
never need to summarize this type of counter log, nor would you normally need to
keep a detailed historical record on these types of problems.
Step 2: Define the alert General Properties You define the counter value threshold test (or tests) that triggers the specific alert condition using the tabs of the Virtual Memory Alert Properties page. In our ongoing example, to create an alert that is triggered by a virtual memory shortage, the threshold test is the value of the Memory\% Committed Bytes In Use counter exceeding 85 percent, as illustrated in Figure 4-4.
Figure 4-4 The Memory\% Committed Bytes In Use counter over 85 percent
The rate at which the alert scan samples the performance counter (or counters) you
select determines how frequently alert messages can be generated. In this example, the
alert scan is scheduled to run once per minute. On a virtual memory-constrained
machine in which the value of the Memory\% Committed Bytes In Use counter consistently exceeds the 85 percent alert threshold, this alert condition is triggered once every
minute. Depending on the alert action you choose, this frequency might prove excessive.
Tip Where possible, define alert conditions that are triggered no more frequently
than several times per hour under normal circumstances. Adjust those alert conditions
that are triggered more frequently than 5–10 times per hour, even under severe conditions, so that they fire less frequently.
Alerts that occur too frequently annoy recipients. Psychologically, alert conditions that
are triggered too frequently lose their effectiveness as notifications of significant
events. Ultimately, they become treated as commonplace events that are safe to
ignore. These human responses to the condition are understandable, but highly undesirable if the alert threshold truly represents an exceptional condition worthy of attention and additional investigation.
You can easily control the frequency of alert triggering in either of the following ways:

1. Adjust the threshold condition so that the alert fires less frequently.
2. Slow down the rate at which the alert scan runs.

In addition, you might choose to limit alert actions to those that are calculated not to annoy anyone.
Step 3: Define the alert schedule The alert schedule parameters determine the
duration of an alert scan. For best results, limit the duration of alert scans to 1–2
hours. For continuous monitoring, be sure to start a new scan when the current scan
finishes, as illustrated in Figure 4-5.
Figure 4-5 Use the Start A New Scan check box
Step 4: Define the alert action The alert action parameters determine what action
the alert facility will take when the alert condition is true. In this example, you want
the alert facility to initiate the counter log you defined in step 1 to gather more
detailed information about application performance at the process level at the time
the alert was triggered. Note that the counter log you specify will be started just once
per alert scan. In contrast, event log entries are generated at each sampling interval
where the alert condition is true. Alert messages are also generated at each sampling
interval where the alert condition is true. If you choose to run a program (Figure 4-6),
the Alerts facility will schedule it to run only once per alert scan.
Figure 4-6 You can choose to run a program
The counter log or specified program is started immediately following the first sample
interval of the alert scan in which the Alert condition is true. The duration of the
counter logging session is determined by the schedule parameters you specified at the
time you defined the counter log session. As noted earlier, using the command-line
interface to the Log Manager, you can limit the size of the counter log file that will be
created. Using the Performance Monitor console interface, you can limit the duration
of the counter log session. A detailed counter log that allows you to explore the condition of the machine in depth over the next 10–30 minutes is usually effective. Obviously, if the alert conditions are triggered frequently enough and the detailed counter
log sessions are relatively long, there is a risk that you will gather much more data
about a potential problem than you can effectively analyze later. You might also use
too much disk space on the machine that is experiencing the problem.
General Alerting Procedures
You will want to define similar alert procedures to launch counter logs that will allow
you to investigate periods of excessive processor utilization, physical memory shortages, and potential disk performance problems. Table 4-3 summarizes basic Alert procedures that are valuable for your production servers. Refer to the discussion in
Chapter 3 on establishing configuration-dependent alert thresholds for the excessive
paging alert condition.
Table 4-3 Settings for General Alert Conditions

Condition: Excessive CPU utilization; potential process in an infinite loop
Threshold Tests: Processor(*)\% Processor Time > 98%
Scan Frequency: 10–30 seconds
Additional Counters to Log: Memory\*; Processor(*)\*; Process(*)\% Processor Time; Process(*)\% Privileged Time; Process(*)\% User Time
Log Sample Interval: 10–20 seconds, for 10–20 minutes

Condition: Process leaking virtual memory
Threshold Tests: Memory\% Committed Bytes In Use > 85%
Scan Frequency: 10–30 seconds
Additional Counters to Log: Process(*)\Private Bytes; Process(*)\Virtual Bytes; Process(*)\Pool Nonpaged Bytes; Process(*)\Pool Paged Bytes; Process(*)\Page File Bytes
Log Sample Interval: 15–30 seconds, for 10–30 minutes

Condition: Excessive paging to disk
Threshold Tests: Memory\Available Kbytes < <threshold>; Memory\Pages/sec > <threshold>
Scan Frequency: 10–30 seconds
Additional Counters to Log: Memory\*; Physical Disk(n)\% Idle Time; Physical Disk(n)\Avg. Disk Secs/Transfer; Physical Disk(n)\Transfers/sec; Process(*)\Page Faults/sec
Log Sample Interval: 10–20 seconds, for 10–20 minutes

Condition: Poor disk performance
Threshold Tests: Physical Disk(n)\Avg. Disk Secs/Transfer > 0.020 (20 milliseconds); Physical Disk(n)\Transfers/sec > 200; Physical Disk(n)\Current Disk Queue Length > 5
Scan Frequency: 15 seconds
Additional Counters to Log: Physical Disk(n)\% Idle Time; Physical Disk(n)\Avg. Disk Secs/Transfer; Physical Disk(n)\Transfers/sec
Log Sample Interval: 10 seconds, for 10 minutes
The alert threshold tests illustrated in Table 4-3 show representative values that are good for many machines. For more specific recommendations on alert thresholds for these counters, see the extended discussion in Chapter 3, "Measuring Server Performance."
Application Alerts
You might also want to define additional alert scans that are based on application-specific alerting criteria. In each situation, focus on alert conditions associated with excessive resource consumption, indicators of resource shortages, measurements showing
large numbers of requests waiting in the queue, and other anomalies related to application performance. For some applications, it is worthwhile to generate alerts both
when transaction rates are heavy and when transaction rates are unusually light, perhaps indicating the application stopped responding and transactions are blocked
from executing.
Table 4-4 lists suggested settings for alerts for several popular server applications. The
values you use in alert condition threshold tests for application servers usually varies
with each site, as illustrated.
More Info For additional assistance in setting application-specific alert thresholds,
consult the Microsoft Operations Framework documentation online, at the TechNet
Products and Technologies section, at http://www.microsoft.com/technet/.
Table 4-4 Sample Settings for Application-Specific Alert Conditions

Application: Domain Controllers
Condition: Excessive Active Directory requests
Threshold Tests: NTDS\LDAP Searches/sec > <threshold>
Additional Objects to Log: NTDS\*

Application: Active Server Pages (or .NET Framework ASPX applications)
Condition: Excessive ASP request queuing
Threshold Tests: ASP\Requests Queued > <threshold>
Additional Objects to Log: ASP; Internet Information Services Global; Web Service; Process(w3wp)\*

Application: File Server
Condition: Resource shortages
Threshold Tests: Server Work Queues(*)\Queue Length > <threshold>
Additional Objects to Log: Server; Server Work Queues; Cache; Memory; Processor; Process(svchost)\% Processor Time

Application: SQL Server
Condition: Excessive database transaction rates
Threshold Tests: SQL Server:Databases(n)\Transactions/sec > <threshold>
Additional Objects to Log: SQL Server:Buffer Manager; SQL Server:Cache Manager; SQL Server:Memory Manager; SQL Server:Locks; SQL Server:SQL Statistics; SQL Server:Databases
Daily Management Reporting
Management reporting is a way to ensure that all parties at your installation who are
interested in server performance have access to key information and reports that
describe the behavior of your machines running Windows Server 2003. Management
reporting should not be confused with the detailed and exploratory data analysis performed by experienced performance analysts when there are performance problems
to diagnose. Management reporting focuses on a few key metrics that can be readily
understood by a wide audience. Keep the charts and graphs you produce relatively
simple and uncluttered.
The metrics that technical managers and other interested parties request the most include:

■ Utilization measures of key resources like processors, memory, disks, and the network
■ Availability and transaction rates of key applications such as Web servers, database servers, and Exchange mail and messaging servers
■ Transaction service and response times of key applications, when they are available
Summarizing Daily Counter Logs
Because management reports should provide a high-level view of performance, you
will want to generate daily files containing summarized performance data using the
Relog utility. The detailed daily counter logs you generate are not the most efficient
way to produce management reports. A daily counter log that gathers 1-minute samples continuously over the course of a 24-hour period accumulates 1440 observations
daily of each performance counter you are logging. (The actual number of samples
that would be generated daily is 1441 because one extra sample is needed at the outset
to gather the starting values of all counters.) That is much more data than the System
Monitor can squeeze onto a Chart view, which is limited to plotting 100 data points
across its time-based x-axis. To create a Chart view for a 24-hour period, the System Monitor is forced to distill roughly 14 separate measurements into each data point it plots and chart summary statistics instead, which can easily create a distorted view of the performance statistics.
Using the Relog utility to create a summarized version of the daily counter logs you
are gathering will simplify the process of creating useful daily management reports.
For example, issuing the following command creates a compact version of one of your
daily counter logs, summarized to convenient 15-minute intervals:
relog basicDailyLog_20031228.blg -o <computer-name>.basicDailyLog.blg
-f BIN -t 15
These summarized versions of your daily counter logs are very handy for building
management reports quickly and efficiently.
Using summarized data in your management reports eliminates any distortions that
can be introduced when the Chart view is forced to drop so many intermediate observations. Summarizing a 24-hour period to fifteen-minute intervals yields slightly fewer
than 100 observations per counter, a number that fits neatly in a System Monitor
Chart view. Note that summarizing the measurement data to this degree will smooth
out many of the highs and lows evident in the original, detailed counter log. When
you are investigating a performance problem, you will want to access the original
counter log data, along with any detailed counter logs for the interval that were generated automatically by your alert procedures.
Tip
The detailed counter logs that you create daily are suitable for generating management reports that focus on a peak 1- or 2-hour period. When peak hour transaction rates are two or more times heavier than average transaction rates, management
reports that concentrate on these peak periods are extremely useful. When performance bottlenecks are evident only during these peak loads, reports that concentrate
on these narrow periods of time are very helpful.
Consolidating performance data from multiple servers If you are responsible
for the performance of a large number of servers, you will probably want to gather the
counter logs from many of these servers so that you can report on them from a single
location. To save on disk space at a central location, you might want to perform this
consolidation using summarized counter logs, rather than the bulky, detailed daily
counter logs that you collect initially. This consolidation can be performed daily using
a series of automated procedures, as follows (a brief command-file sketch follows the list):
1. Use the Relog utility on the local server machine to create a summarized version
of the daily counter log files that you produce.
2. Embed the computer name of the machine where the counter log originated
into the summarized file name that Relog produces as output.
3. Copy the summarized counter log file to a central location.
4. At the consolidation server, use Relog to combine all the counter logs into a single output file that can be used for daily reporting.
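As a rough illustration of steps 1 through 3, the command-file sketch below shows the idea. The file names, folder names, and the consolidation share \\central\perflogs are placeholders for values from your own environment:

Rem Step 1: summarize yesterday's daily counter log to 15-minute intervals,
Rem Step 2: embedding the local computer name in the output file name
relog C:\Perflogs\Yesterday\basicDailyLog_20031228.blg -o C:\Perflogs\Yesterday\%COMPUTERNAME%.basicDailyLog.blg -f BIN -t 15

Rem Step 3: copy the summarized counter log to the central consolidation share
copy C:\Perflogs\Yesterday\%COMPUTERNAME%.basicDailyLog.blg \\central\perflogs\

Step 4 then consists of a single Relog command at the consolidation server that names all the collected files as input and a combined counter log as output.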
Later in this section, examples of scripts that you can use to automate these daily processing functions are provided. As an alternative to consolidating counter logs gathered across many servers at a central location during a post-processing stage, you
might consider creating all your counter logs at a centralized location at the outset. As
discussed in “Logging to a Network Share” in this chapter, this option has scalability
implications for very large server farms, so it should be implemented with caution. For
a more detailed discussion of these issues, see the section entitled “Logging Local
Counters to a Local Disk” earlier in this chapter.
When you execute the Relog utility to create summarized daily counter logs suitable
for management reporting and archiving, you might choose to edit the performance
metrics being retained even further, eliminating counters that you do not intend to
report on in your management reports. When you execute the Relog utility to create
summarized daily counter logs, you can reference a counter log settings file that will
perform this editing.
For example, you can invoke Relog as follows, where relog-counters-setting-file.txt is a
narrower subset of the original basic-counters-setting-file.txt that you used to generate
the full daily counter logs.
relog basicDailyLog_20031228.blg -o <computer-name>.basicDailyLog.blg -cf relog-counters-setting-file.txt -f BIN -t 15
Listing 4-9 shows the contents of a recommended Relog counter settings file suited to
creating summarized files for daily management reporting.
Listing 4-9 relog-counters-setting-file.txt
\LogicalDisk(*)\% Free Space
\LogicalDisk(*)\Free Megabytes
\LogicalDisk(*)\Current Disk Queue Length
\PhysicalDisk(*)\Current Disk Queue Length
\PhysicalDisk(*)\Avg. Disk Queue Length
\PhysicalDisk(*)\Avg. Disk sec/Transfer
\PhysicalDisk(*)\Avg. Disk sec/Read
\PhysicalDisk(*)\Avg. Disk sec/Write
\PhysicalDisk(*)\Disk Transfers/sec
\PhysicalDisk(*)\Disk Reads/sec
\PhysicalDisk(*)\Disk Writes/sec
\PhysicalDisk(*)\Disk Bytes/sec
\PhysicalDisk(*)\Disk Read Bytes/sec
\PhysicalDisk(*)\Disk Write Bytes/sec
\PhysicalDisk(*)\Avg. Disk Bytes/Transfer
\PhysicalDisk(*)\Avg. Disk Bytes/Read
\PhysicalDisk(*)\Avg. Disk Bytes/Write
\PhysicalDisk(*)\% Idle Time
\Processor(*)\% Processor Time
\Processor(*)\% User Time
\Processor(*)\% Privileged Time
\Processor(*)\Interrupts/sec
\Processor(*)\% DPC Time
\Processor(*)\% Interrupt Time
\Memory\Page Faults/sec
\Memory\Available Bytes
\Memory\Committed Bytes
\Memory\Commit Limit
\Memory\Transition Faults/sec
\Memory\Cache Faults/sec
\Memory\Demand Zero Faults/sec
\Memory\Pages/sec
\Memory\Pages Input/sec
\Memory\Page Reads/sec
\Memory\Pages Output/sec
\Memory\Pool Paged Bytes
\Memory\Pool Nonpaged Bytes
\Memory\Page Writes/sec
\Memory\Cache Bytes
\Memory\Pool Paged Resident Bytes
\Memory\System Code Resident Bytes
\Memory\System Driver Resident Bytes
\Memory\System Cache Resident Bytes
\Memory\% Committed Bytes In Use
\Memory\Available KBytes
\Memory\Available MBytes
\Memory\Transition Pages RePurposed/sec
\System\Context Switches/sec
\System\Processor Queue Length
\Process(sqlserver)\% Processor Time
\Process(sqlserver)\% User Time
\Process(sqlserver)\% Privileged Time
\Process(sqlserver)\Virtual Bytes
\Process(sqlserver)\Page Faults/sec
\Process(sqlserver)\Working Set
\Process(sqlserver)\Private Bytes
\Process(sqlserver)\Elapsed Time
\Process(sqlserver)\Pool Paged Bytes
\Process(sqlserver)\Pool Nonpaged Bytes
\Process(inetinfo)\% Processor Time
\Process(inetinfo)\% User Time
\Process(inetinfo)\% Privileged Time
\Process(inetinfo)\Virtual Bytes
\Process(inetinfo)\Page Faults/sec
\Process(inetinfo)\Working Set
\Process(inetinfo)\Private Bytes
\Process(inetinfo)\Elapsed Time
\Process(inetinfo)\Pool Paged Bytes
\Process(inetinfo)\Pool Nonpaged Bytes
\RAS Total\Bytes Transmitted/Sec
\RAS Total\Bytes Received/Sec
\RAS Total\Total Errors/Sec
\Print Queue(*)\Total Jobs Printed
\Print Queue(*)\Bytes Printed/sec
\Print Queue(*)\Total Pages Printed
\Print Queue(*)\Jobs
\Network Interface(*)\Bytes Total/sec
\Network Interface(*)\Packets/sec
\Network Interface(*)\Packets Received/sec
\Network Interface(*)\Packets Sent/sec
\Network Interface(*)\Current Bandwidth
\Network Interface(*)\Bytes Received/sec
\Network Interface(*)\Bytes Sent/sec
\Network Interface(*)\Output Queue Length
\IP\Datagrams/sec
\IP\Datagrams Received/sec
\IP\Datagrams Sent/sec
\TCP\Segments/sec
\TCP\Connections Established
\TCP\Connections Active
\TCP\Connection Failures
\TCP\Connections Reset
\TCP\Segments Received/sec
\TCP\Segments Sent/sec
\TCP\Segments Retransmitted/sec
Sample Management Reports
The following section illustrates the management reports that you are likely to find
most useful for presenting summarized performance statistics. These examples primarily use the System Monitor console’s Chart view to present the performance data.
You can, of course, always build more elaborate reporting charts and graphs than are
available from the System Monitor console by using tools like Microsoft Excel or other
charting programs.
Note that the sample management reports presented here showcase the key performance counters you should display. They also present uncluttered charts and graphs
that are easy to read and decipher. The counter logs used to generate these charts were
gathered from a relatively quiet machine performing little or no work during much of
the reporting interval. The intent here is to focus your attention on the presentation of
the data, not on the data itself. If you are interested in seeing examples of interesting
charts and reports illustrating machines experiencing performance problems, you can
find many relevant examples in Chapter 5, “Performance Troubleshooting.”
For information about how to use the System Monitor Automation Interface to generate management reports like these automatically, see the section entitled “System
Monitor Automation Interface” in Chapter 6, “Advanced Performance Topics.”
Processor utilization Figure 4-7 illustrates a basic Chart view template that is suitable for many management reports. The report shows overall processor utilization
and also breaks out processor utilization into its component parts. A large, easy-to-read font was selected, x- and y-axis gridlines were added, and a descriptive title was
used. After you create report templates, selecting the counters you want and adding
the proper presentation elements, you can save your settings as a Microsoft Management Console .msc settings file that you can reuse.
Figure 4-7 Daily Processor Utilization Report
A similar chart that zooms in on a two-hour peak processing period is illustrated in
Figure 4-8. Reusing the chart templates for similar management reports simplifies
your task of explaining what these various charts and graphs mean.
Figure 4-8 Peak Hour Processor Utilization Report
The daily processor utilization management report illustrated in Figure 4-7 used a
summarized daily counter log file created using the Relog utility as input. The peak
hour report in Figure 4-8 uses the full, unsummarized daily counter log with the Time
Window adjusted to show approximately two hours of peak load data.
Available disk space The Histogram view illustrated in Figure 4-9 works well for
reporting data from counters that tend to change very slowly, such as LogicalDisk(*)\Free Megabytes. Using the Histogram view, you can show the amount of free
space available on a large number of server disks very succinctly.
Figure 4-9 Disk Free Space Report
Disk performance Figure 4-10 illustrates the readability problems that occur when
you are forced to graph multiple counters against a single y-axis. The judicious use of
scaling values makes this possible, of course. However, reporting metrics that require
different scaling values to permit them all to be displayed against a single y-axis frequently confuses viewers. Figure 4-10 illustrates this problem, using disk performance
measurements of idle time, device activity rates, and disk response time. Together,
these three metrics accurately characterize physical disk performance. However, each
individual counter usually requires using a separate scaling factor, and this tends to
create confusion.
Figure 4-10 Measuring disk performance using three metrics
The % Idle Time counter falls neatly within the default y-axis that ranges from
zero through 100, but the other metrics do not. Physical Disk Transfers/sec frequently
exceeds 100 per second, so this counter cannot always be properly graphed against
the same y-axis scale as % Idle Time. In addition, disk response time, measured in milliseconds, has to be multiplied by a scaling factor of 1000 to be displayed against the
same y-axis as % Idle Time. Placing several counters that all use different scaling factors on the same chart invites confusion.
Nevertheless, this practice is sometimes unavoidable. Specifying an appropriate y-axis
label can sometimes help minimize confusion. The alternative of providing three separate graphs, one for each measurement, has limited appeal because the content in an
individual counter value chart is substantially diluted.
Network traffic Network traffic can be reported from the standpoint of each network interface, or summarized across all existing network interfaces. Because the Network Interface counters do not report network utilization directly, a report template
like the one illustrated in Figure 4-11, which shows both the total bytes transferred
across the network interface and the current bandwidth rating of the card, makes it
relatively easy to visualize what the utilization of the interface is. Change the default y-axis maximum so that the Current Bandwidth counter represents a 100 percent utilization level at the top of the scale. Then the Bytes Total/sec counter for the network
interface instance can be visualized as a fraction of the interface bandwidth. Be sure to
label the y-axis appropriately, as in Figure 4-11.
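Keep the units straight when you adjust this scale: Current Bandwidth is reported in bits per second, whereas Bytes Total/sec counts bytes. For example, on a 100-megabit interface, Current Bandwidth reports 100,000,000. A sustained Bytes Total/sec value of 2,500,000 therefore corresponds to 2,500,000 × 8 = 20,000,000 bits per second, or 20 percent utilization of the interface.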
Figure 4-11 Network Interface Traffic Report
Historical Data for Capacity Planning
Capacity planning refers to the practices and procedures you institute to avoid performance problems as your workloads grow and change. When you maintain an historical record of computer resource usage by important production workloads, you can
forecast future resource requirements by extrapolating from historical trends.
Capacity planning processes also benefit from growth forecasts that people associated
with your organization’s key production workloads can provide for you. Often, these
growth forecasts are projections based on adding more customers who are users of
these production systems. You must then transform these growth projections into a
set of computer resource demands based on the resource usage profile of existing customers. Usually, a capacity plan brings together both kinds of forecasting information
to project future workload requirements and the kinds of computer resources
required to run those workloads.
This section focuses on procedures you can use to harness the daily performance
counter logs you generate to create and maintain a database containing an historical
record of computer resource usage data. You can then use this data to support capacity planning and other system management functions. This historical record of computer performance data is known as a performance database, or PDB. This section also
describes procedures you can use to build a SQL Server–based performance database
using the command-line tools discussed in Chapter 2, “Performance Monitoring
Tools.” A later section of this chapter, entitled “Using a SQL Server Repository,” discusses using tools like Microsoft Excel to analyze the data in this SQL Server PDB and
provides an example of using Excel to forecast future capacity requirements.
Why SQL Server?
Microsoft SQL Server makes an ideal performance database. If you are an inexperienced user of SQL Server, you might be reluctant to implement a SQL Server performance database. Rest assured that when you use command-line tools such as Logman
and Relog, you can quite easily use SQL Server for this purpose. Microsoft SQL Server
is an advanced Relational Database Management System (RDBMS) that is well suited
to this task, particularly to handling the large amounts of capacity planning data you
will eventually accumulate. Using SQL Server, you do not have to be responsible for
managing lots of individual counter log files. Instead, you can consolidate all your historical information in one place in one or more sets of SQL Server Tables.
Another advantage of using SQL Server as the repository for your capacity planning
data is that you are not limited to using the System Monitor for reporting. Once your
counter log data is loaded into SQL Server, you can use a variety of data access, reporting, and analysis tools. One of the most popular and powerful of these reporting and
analysis tools is Microsoft Excel. The section of this chapter entitled “Using a SQL
Server Repository” discusses using Microsoft Excel to analyze, report, and forecast
performance data that has been stored in a SQL Server PDB.
Creating Historical Summary Logs
Capacity planning requires summarized data that is accumulated and stored over
long periods of time so that historical trends become evident. This section shows how
the Relog utility can be used to summarize data and accumulate this counter log data
to support capacity planning.
The summarized daily counter log files that are used for management reporting contain too much detail to be saved for weeks, months, and years. Consider running
Relog again on the summarized daily counter log files that your daily performance
monitoring procedure produces to edit these counter logs and summarize them further. Listing 4-10 provides an example.
Listing 4-10 Editing and Summarizing Daily Counter Logs
relog <computer-name>.basicDailyLog.blg -o <computer-name>.dailyHistoryLog.blg -cf
summary-counters-setting-file.txt -f BIN -t 4
In this code, because the counter log files defined as input were already summarized
to 15-minute intervals, the -t 4 subparameter performs additional summarization to
the 1-hour level.
Only a small number of counters are valuable for long-term capacity planning. The
counter log settings file referenced by this Relog command drops many counter values that have little or no value long term. An example counter log settings file for a File
and Print Server that is suitable for capacity planning is illustrated in Listing 4-11.
Listing 4-11 Counter Log Settings File for a Capacity Planning Database
\LogicalDisk(*)\% Free Space
\LogicalDisk(*)\Free Megabytes
\PhysicalDisk(*)\Avg. Disk sec/Transfer
\PhysicalDisk(*)\Avg. Disk sec/Read
\PhysicalDisk(*)\Avg. Disk sec/Write
\PhysicalDisk(*)\Disk Transfers/sec
\PhysicalDisk(*)\Disk Reads/sec
\PhysicalDisk(*)\Disk Writes/sec
\PhysicalDisk(*)\Disk Bytes/sec
\PhysicalDisk(*)\Disk Read Bytes/sec
\PhysicalDisk(*)\Disk Write Bytes/sec
\PhysicalDisk(*)\Avg. Disk Bytes/Transfer
\PhysicalDisk(*)\Avg. Disk Bytes/Read
\PhysicalDisk(*)\Avg. Disk Bytes/Write
\PhysicalDisk(*)\% Idle Time
\Processor(*)\% Processor Time
\Processor(*)\% User Time
\Processor(*)\% Privileged Time
\Processor(*)\Interrupts/sec
\Processor(*)\% DPC Time
\Processor(*)\% Interrupt Time
\Memory\Page Faults/sec
\Memory\Available Bytes
\Memory\Committed Bytes
\Memory\Commit Limit
\Memory\Transition Faults/sec
\Memory\Cache Faults/sec
\Memory\Demand Zero Faults/sec
\Memory\Pages/sec
\Memory\Pages Input/sec
\Memory\Page Reads/sec
\Memory\Pages Output/sec
\Memory\Pool Paged Bytes
\Memory\Pool Nonpaged Bytes
\Memory\Page Writes/sec
\Memory\Cache Bytes
\Memory\Pool Paged Resident Bytes
\Memory\System Cache Resident Bytes
\Memory\% Committed Bytes In Use
\Memory\Available KBytes
\Memory\Available MBytes
\Memory\Transition Pages RePurposed/sec
\Process(svchost,*)\% Processor Time
\Process(svchost,*)\% User Time
\Process(svchost,*)\% Privileged Time
\Process(svchost,*)\Virtual Bytes
\Process(svchost,*)\Page Faults/sec
\Process(svchost,*)\Working Set
\Print Queue(*)\Bytes Printed/sec
\Print Queue(*)\Total Pages Printed
\Print Queue(*)\Jobs
\Network Interface(*)\Bytes Total/sec
\Network Interface(*)\Bytes Received/sec
\Network Interface(*)\Bytes Sent/sec
\IPv4\Datagrams/sec
\IPv4\Datagrams Received/sec
\IPv4\Datagrams Sent/sec
\TCPv4\Segments/sec
\TCPv4\Segments Received/sec
\TCPv4\Segments Sent/sec
As just mentioned, because only a small number of counters are valuable for capacity
planning, the counter log settings file can be even more concise than the example in
Listing 4-11. In this listing, only high-level statistics on processor, memory, disk, and
network utilization are kept, along with process-level statistics for the server's svchost processes and application-specific statistics on printer workload throughput.
Accumulating historical data The Relog command procedure (Listing 4-10) constructs a binary counter log from your daily file and summarizes it to 1-hour intervals.
One-hour intervals are suitable for longer-term capacity planning, forecasting, and
trending. For capacity planning, you will want to build and maintain a summarized,
historical counter log file that will contain information spanning weeks, months, and
even years. You can use the Append option of the Relog utility to accomplish this. For
instance, the following command uses Relog to append the daily summarized counter
log to a counter log file that contains accumulated historical data using the -a append
parameter:
relog <computer-name>.dailyHistoryLog.blg -o
<computer-name>.historyPerformanceLog.blg -f BIN -a
Note that the -a option assumes that the output counter log file already exists.
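If the history file does not yet exist, for example the first time the procedure runs, the append will fail, so a wrapper can fall back to a simple copy. A minimal command-file sketch, with illustrative file names:

if exist historyPerformanceLog.blg (
    relog dailyHistoryLog.blg -o historyPerformanceLog.blg -f BIN -a
) else (
    copy dailyHistoryLog.blg historyPerformanceLog.blg
)

The sample post-processing script later in this chapter performs the same existence check using the FileExists method of the FileSystemObject.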
Creating a SQL Server PDB
Before you are ready to use command-line tools like Logman and Relog to populate a
SQL Server performance database, you must install an instance of SQL Server and
define and configure the SQL Server database that you intend to use as a PDB. You can
install a separate instance of SQL Server for your use exclusively as a PDB, or share an
instance of SQL Server with other applications. Once you select the instance of SQL
Server that you intend to use, follow these steps to define a performance database:
1. Define the database where you intend to store the performance data. Using
the SQL Enterprise Manager console, define a new database and allocate disk
space for it and its associated database recovery log. For the purpose of this
example, the database that will be used has been called PDB. Once the database
is created, you can access the system tables that are built automatically.
2. Define the database security rules. Security is a major aspect of SQL Server
database administration. Until you specifically define their access rights, no
external users can log on to gain access to the information stored in the PDB
database. You must define at least one new database logon and grant that user
access to the PDB database, as illustrated in Figure 4-12. In this example, the
new database logon is called PDB-Access, which is defined by default to have
access to the PDB database.
Figure 4-12 Defining a new database logon
Note that the SQL Server instance you use must be set up to use Microsoft Windows NT Authentication only. In addition, you must define a named user with
the proper credentials, in this example a user named PDB-Access. At the level of
the PDB database, you must also define the permissions for the logon you just
defined. Figure 4-13 illustrates the PDB database user properties for the PDB-Access logon. Here PDB-Access has been granted permission to use all database
security roles that are available. At a minimum, give this logon db_owner and
db_datawriter authority. The db_owner role allows it to define, add, and change
database tables, and db_datawriter gives the user permission to update and add
data to the defined tables.
Figure 4-13 Database user properties for the PDB-Access logon
3. Define the ODBC connection that will be used to access the PDB database.
To allow programs like the Relog utility, the Performance Logs and Alerts service, the System Monitor console for reporting, and Microsoft Excel for data
mining and analysis to access the PDB database, you must define a System DSN
connection. Using the ODBC Administrator, add a new System DSN connection
that uses the SQL Server driver to allow access to the PDB database that you just
defined, as illustrated in Figure 4-14.
Figure 4-14 Using the SQL Server driver to allow access to the PDB database
Doing this launches the ODBC connection wizard, which will enable you to configure this connection, as illustrated in Figure 4-15.
Figure 4-15 Using the ODBC connection wizard
You must supply a connection name, as illustrated, and point the connection to
the SQL Server instance and database. In this example, the connection name is
PDB. Then you must supply information about the security of the connection, as
illustrated in Figure 4-16.
Figure 4-16 Verifying the authenticity of the connection
The Connect To SQL Server To Obtain Default Settings For The Additional Configuration Options check box, which is shown as selected in Figure 4-16, allows
you to fully test the permissions on the connection when the connection definition is complete. Continue to fill out the forms provided by the ODBC connection wizard and click Finish to test the connection.
Populating the Repository
After the ODBC connection is defined, you can start loading counter log data into the
PDB database you created. Because you plan to accumulate a large amount of historical performance information in the PDB, even the summarized daily counter log,
which you created to build daily management reports, contains much more data than
you will need for capacity planning purposes. Similar to the processing you performed when maintaining an historical binary format counter log, as discussed in
“Creating Historical Summary Logs,” you probably want to summarize this data even
further and drop any counters that will not prove worthwhile to retain over long periods of time.
For example, the following Relog command takes one or more daily counter log files
that are summarized to create a history file and inserts the output into the PDB database:
relog <computer-name>.historyPerformanceLog.blg -o "SQL:PDB!ByHour" -f SQL -cf
c:\perflogs\summary-counters-setting-file.txt
The output file specification, -o "SQL:PDB!ByHour", identifies the ODBC connection
named PDB and defines a subsection of the PDB database that is called ByHour. You
can define multiple capacity planning databases within the PDB database by identifying them using separate names.
After this command executes, it creates the PDB database tables that are used to store
the counter log data. Using the SQL Enterprise Manager console, you can verify that
the counter log tables have been created properly. Figure 4-17 illustrates the state of
the PDB database following execution of the Relog command that references the SQL
Server PDB ODBC connection.
Figure 4-17 The state of the PDB database following execution of the Relog command
The counter log data is stored in three SQL Server tables—CounterData, CounterDetails, and DisplayToID. The format of these SQL Server tables reflects the data model
that the performance monitoring facilities and services use to maintain a SQL Server
repository of counter log data. This data model is discussed in detail below in the section entitled "Using a SQL Server Repository."
Automated Counter Log Processing
In the preceding sections of this chapter, procedures were outlined to perform the following functions:
■ Gather daily performance counter logs
■ Gather diagnostic counter logs in response to alerts
■ Summarize the daily performance counter logs to build management reports
■ Populate a repository inside SQL Server that can be used in capacity planning
These are functions that need to be performed automatically on all the Windows
Server 2003 machines requiring continuous performance monitoring. This section
provides a sample WSH script to automate these daily procedures. This script is written in VBScript, and you can easily tailor it to your environment. It will reference the
counter log settings files that were shown in the preceding sections, which create summarized counter log files and reference the PDB database in SQL Server defined in the
previous section.
The complete sample post-processing script is shown in Listing 4-12. It is heavily commented and sprinkled with Wscript.Echo messages; this information will allow you to
modify the script easily to suit your specific requirements. To enable these
Wscript.Echo diagnostic messages, remove the Microsoft Visual Basic comment indicator, which is a single quotation mark character (').
Listing 4-12 Sample Post-Processing Script
'VBscript for post-processing daily performance Counter Logs
'
'Initialization
CreatedTodayDate = Now

Const OverwriteExisting = True
Const LogType = "blg"
Const OldLogType = "Windows Backup File"
Const PMFileType = "Performance Monitor File"

Const relogSettingsFileName = "c:\perflogs\relog-counters-setting-file.txt"
Const relogParms = "-f BIN -t 15"
Const relogSummaryParms = "-f BIN -t 4"
Const relogHistoryParms = "-f BIN -a"
Const HistorySettingsFileName = "c:\perflogs\summary-counters-setting-file.txt"

Const PerflogFolderToday = "C:\Perflogs\Today"
Const PerflogFolderYesterday = "C:\Perflogs\Yesterday"
Const PerflogFolderAlerts = "C:\Perflogs\Alert Logs"
Const PerflogFolderHistory = "C:\Perflogs\History"

Set WshShell = CreateObject("Wscript.Shell")
Set objEnv = WshShell.Environment("Process")
LogonServer = objEnv("COMPUTERNAME")

DailyFileName = PerflogFolderYesterday & "\" & LogonServer & "." & _
    "basic_daily_log" & "." & LogType
SummaryFileName = PerflogFolderYesterday & "\" & LogonServer & "." & _
    "basic_history_log" & "." & LogType
HistoryFileName = PerflogFolderHistory & "\" & LogonServer & "." & _
    "history_performance_log" & "." & LogType

'WScript.Echo DailyFileName
'WScript.Echo SummaryFileName
'WScript.Echo HistoryFileName

Const AlertDaysOld = 3   'Number of days to keep Alert-generated Counter Log files

Set objFSO1 = CreateObject("Scripting.FileSystemObject")

If objFSO1.FolderExists(PerflogFolderYesterday) Then
    Set objYesterdayFolder = objFSO1.GetFolder(PerflogFolderYesterday)
Else
    Wscript.Echo "Yesterday folder does not exist. Will create " & _
        PerflogFolderYesterday
    Set objYesterdayFolder = objFSO1.CreateFolder(PerflogFolderYesterday)
End If

Set objYesterdayFolder = objFSO1.GetFolder(PerflogFolderYesterday)
Set objTodayFolder = objFSO1.GetFolder(PerflogFolderToday)
Set objAlertsFolder = objFSO1.GetFolder(PerflogFolderAlerts)

'Wscript.Echo "Begin Script Body"
objYesterdayFolder.attributes = 0
Set fc1 = objYesterdayFolder.Files

'Wscript.Echo "Look for Yesterday's older backup files..."
For Each f1 in fc1
    'Wscript.Echo "Found " & f1.name & " in " & PerflogFolderYesterday
    'Wscript.Echo "File type is " & f1.type
    If f1.type = OldLogType Then
        'Wscript.Echo "Old files of type " & f1.type & " will be deleted."
        filename = PerflogFolderYesterday & "\" & f1.name
        'Wscript.Echo "Delete " & filename
        objFSO1.DeleteFile(filename)
    End If
Next

'Wscript.Echo "Look for Yesterday's .blg files..."
For Each f1 in fc1
    'Wscript.Echo "Found " & f1.name & " in " & PerflogFolderYesterday
    If f1.type = PMFileType Then
        NewName = PerflogFolderYesterday & "\" & f1.name & "." & "bkf"
        'Wscript.Echo f1.name & " will be renamed to " & NewName
        filename = PerflogFolderYesterday & "\" & f1.name
        'Wscript.Echo "Rename " & filename
        objFSO1.MoveFile filename, NewName
    End If
Next

objYesterdayFolder.attributes = 0

'Wscript.Echo "Look at Today's files..."
'Wscript.Echo "Today is " & CStr(CreatedTodayDate)
Set fc2 = objTodayFolder.Files
For Each f2 in fc2
    filename = PerflogFolderToday & "\" & f2.name
    FileCreatedDate = CDate(f2.DateCreated)
    'Wscript.Echo filename & " was created on " & CStr(FileCreatedDate)
    If DateDiff("d", CreatedTodayDate, FileCreatedDate) = 0 Then
        'Wscript.Echo "Skipping the file currently in use: " & filename
    Else
        'Wscript.Echo filename & " is " & CStr(DateDiff("d", FileCreatedDate, CreatedTodayDate)) & " day(s) old."
        'Wscript.Echo "Copying " & filename & " to " & PerflogFolderYesterday
        objFSO1.CopyFile filename, PerflogFolderYesterday & "\"
        relogfiles = relogfiles & PerflogFolderYesterday & "\" & f2.name & " "
    End If
Next

'Wscript.Echo "Today's files to send to relog: " & relogfiles
command = "relog " & relogfiles & "-o " & DailyFileName & " -cf " _
    & relogSettingsFileName & " " & relogParms
'Wscript.Echo "Relog command string: " & command
Set WshShell = Wscript.CreateObject("WScript.Shell")
Set execCommand = WshShell.Exec(command)
'Wscript.Echo execCommand.StdOut.ReadAll

command = "relog " & DailyFileName & _
    " -o " & SummaryFileName & " -cf " & HistorySettingsFileName _
    & " " & relogSummaryParms
'Wscript.Echo "Relog command string: " & command
Set WshShell = Wscript.CreateObject("WScript.Shell")
Set execCommand = WshShell.Exec(command)
'Wscript.Echo execCommand.StdOut.ReadAll

If (objFSO1.FileExists(HistoryFileName)) Then
    command = "relog " & HistoryFileName & " " & SummaryFileName & _
        " -o " & HistoryFileName & " " & relogHistoryParms
    'Wscript.Echo "Relog command string: " & command
    Set WshShell = Wscript.CreateObject("WScript.Shell")
    Set execCommand = WshShell.Exec(command)
    'Wscript.Echo execCommand.StdOut.ReadAll
Else
    objFSO1.CopyFile SummaryFileName, HistoryFileName
End If

'Copy the summarized daily file to a Counter Log data consolidation server
'objFSO1.CopyFile DailyFileName, <somewhere>

'Wscript.Echo "Deleting files after processing"
For Each f2 in fc2
    filename = PerflogFolderToday & "\" & f2.name
    FileCreatedDate = CDate(f2.DateCreated)
    If DateDiff("d", CreatedTodayDate, FileCreatedDate) = 0 Then
        'Wscript.Echo "Skipping the file currently in use: " & filename
    Else
        'Wscript.Echo "Deleting " & filename & " from " & PerflogFolderToday
        objFSO1.DeleteFile(filename)
    End If
Next

Set fc3 = objAlertsFolder.Files
'Wscript.Echo "Look for older Alert-generated log files ..."
For Each f3 in fc3
    'Wscript.Echo "Found " & f3.name & " in " & PerflogFolderAlerts
    filename = PerflogFolderAlerts & "\" & f3.name
    FileCreatedDate = CDate(f3.DateCreated)
    'Wscript.Echo filename & " is " & CStr(DateDiff("d", FileCreatedDate, CreatedTodayDate)) & " day(s) old."
    If DateDiff("d", FileCreatedDate, CreatedTodayDate) < AlertDaysOld Then
        'Wscript.Echo "Skipping recently created Alert Counter Log file: " & filename
    Else
        'Wscript.Echo "Deleting " & filename & " from " & PerflogFolderAlerts
        objFSO1.DeleteFile(filename)
    End If
Next
This sample script is designed to be launched automatically by the Performance Logs
and Alerts service when smlogsvc closes one daily counter log and opens the next.
You can also use the Task Scheduler to launch this script automatically whenever you
decide to perform these post-processing steps.
The sections that follow discuss in detail the logic used by the post-processing script,
in case you need to customize it.
Script Initialization
The Initialization section of this sample script sets initial values for a number of constants. By changing these initial values, you
can easily change the behavior of the script to conform to your environment without
having to alter much of the script’s logic. For example, the following lines set initial
values for four string variables that reference the folders containing the counter logs
that the script will manage:
Const PerflogFolderToday = "C:\Perflogs\Today"
Const PerflogFolderYesterday = "C:\Perflogs\Yesterday"
Const PerflogFolderAlerts = "C:\Perflogs\Alert Logs"
Const PerflogFolderHistory = "C:\Perflogs\History"
It assumes that PerflogFolderToday points to the folder where the daily counter logs are
being written by the Performance Logs and Alerts service. It also assumes that PerflogFolderAlerts points to the folder where counter logs that are triggered by hourly alert
scans are written. The script will move yesterday’s counter logs from C:\Perflogs\Today to C:\Perflogs\Yesterday, creating C:\Perflogs\Yesterday if it does not
already exist. The script will also delete counter logs in the C:\Perflogs\Alert Logs
folder if they are older than three days. Another constant named AlertDaysOld contains the aging criteria for the Alert Logs folder. If you prefer to keep for 10 days the diagnostic counter logs that are generated automatically when alert thresholds are tripped, simply change the initialization value of the AlertDaysOld constant as follows:
Const AlertDaysOld = 10
The script uses the WshShell object to access the built-in COMPUTERNAME environment variable. This variable is used to create a file name for the summarized counter
log file that relog will create from any detail counter logs that are found. Having the
computer name where the counter log originated embedded in the file name makes
for easier identification.
Set WshShell = CreateObject("Wscript.Shell")
Set objEnv = WshShell.Environment("Process")
LogonServer = objENV("COMPUTERNAME")
DailyFileName = PerflogFolderYesterday & "\" & LogonServer & "." & _
    "basic_daily_log" & "." & LogType
'WScript.Echo DailyFileName
Following initialization of these and other constants, the script initializes a FileSystemObject, which it uses to perform various file management operations on the designated folders and the counter log files they contain:
Set objFSO1 = CreateObject("Scripting.FileSystemObject")
The FileSystemObject is then used to identify the file folders the script operates on. For
example, the following code initializes a variable named fc1, which is a collection that
contains the files in the C:\Perflog\Yesterday folder:
Set objYesterdayFolder = objFSO1.GetFolder(PerflogFolderYesterday)
Set fc1 = objYesterdayFolder.Files
Cleaning Up Yesterday’s Backup Files
The body of the script begins by enumerating the files in the C:\Perflogs\Yesterday
folder where older counter logs are stored. At this point, any files in the C:\Perflogs\Yesterday folder that match the OldLogType are deleted. The next section of the
script renames any .blg files that it finds in the C:\Perflogs\Yesterday folder with the
OldLogType suffix, which is appended to the file name using this section of code:
For Each f1 in fc1
    If f1.type = PMFileType Then
        filename = PerflogFolderYesterday & "\" & f1.name
        NewName = PerflogFolderYesterday & "\" & f1.name & "." & "bkf"
        objFSO1.MoveFile filename, NewName
    End If
Next
The sample script initializes the value of OldLogType to associate it with Windows
Backup File, which uses a file name suffix of .bkf. You can change this to any value
appropriate for your environment. (Alternately, you could parse the file name string to
look for this file extension.)
The effect of this processing on the files in the C:\Perflogs\Yesterday folder is that the
folder will contain counter logs from yesterday and backup versions of the counter
logs used the day before yesterday. This provides a margin of safety that allows you to
go back in time at least one full day when something goes awry with any aspect of the
daily counter log post-processing procedure.
Managing Yesterday’s Daily Counter Logs
Next, the script enumerates the counter log files that exist in the C:\Perflogs\Today
folder. This should include the counter log binary file where current measurement
data is being written, along with any earlier counter logs that are now available for
post-processing. The script determines whether the counter log files in the C:\Perflogs\Today folder are ready for processing by comparing their creation date to the current date using the DateDiff function. Files in the C:\Perflogs\Today folder from a previous day are then copied to the C:\Perflogs\Yesterday folder. Meanwhile, the
names of the files being copied are accumulated in a string variable called relogfiles for
use later in the script.
Creating Summarized Counter Logs
After all the older counter logs in the C:\Perflogs\Today folder are processed, the
script executes the Relog utility using the Exec method of the WshShell object:
command = "relog " & relogfiles & "-o " & DailyFileName & " -cf " _
& relogSettingsFileName & " " & relogParms
Set WshShell = Wscript.CreateObject("WScript.Shell")
Set execCommand = WshShell.Exec(command)
This creates a binary counter log denoted by the DailyFileName string variable, which
is summarized to 15-minute intervals.
The line that follows the invocation of the Relog utility gives you access to any output
messages that Relog produces, if you remove the comment character and enable the
call to Wscript.Echo:
'Wscript.Echo execCommand.StdOut.ReadAll
Once this summarized daily counter log is created using the Relog utility, you will
probably want to add a processing step in which you copy the daily log to a counter
log consolidation server somewhere on your network. Replace the following comment
lines, which reserve a space for this processing, with code that performs the file transfer to the designated consolidation server in your environment:
'Copy the summarized daily file to a Counter Log data consolidation server
'objFSO1.CopyFile DailyFileName, <somewhere>
The script issues a second Relog command to summarize the daily log to 1-hour intervals. Next, it uses Relog again to append this counter log to an historical summary log
file in the C:\Perflogs\History folder that is also summarized hourly. At the conclusion of this step, a file named <computer-name>.history_performance_log.blg, which
contains all the accumulated historical data you have gathered, summarized to 1-hour
intervals, is in the C:\Perflogs\History folder. Because this historical summary file will
continue to expand at the rate of about 300–500 KB per day, you might want to
archive it periodically for long-term storage in another location. For instance, after one
year, the historical summarized counter log will grow to approximately 100–200 MB
per machine.
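One convenient way to archive is to cut the history file by date range, because the Relog utility accepts -b and -e parameters that bound the begin and end times of the records it copies. A sketch, with an illustrative file name and dates:

relog <computer-name>.history_performance_log.blg -b "1/1/2004 00:00:00" -e "6/30/2004 23:59:59" -o <computer-name>.history_1H2004.blg -f BIN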
Alternatively, at this point in the procedure, you could choose to store the historical,
summarized counter log data in a SQL Server performance database. If you decide to
use a SQL Server repository for historical counter data, first insert the following initialization code:
Const relogSQLParms = "-f SQL -t 4"
Const relogSQLDB = """SQL:PDB!ByHour"""
Const SQLSettingsFileName = "c:\perflogs\summary-counters-setting-file.txt"
Next, replace the last two Relog commands with a single Relog command to insert the
hourly counter log data that is generated directly into SQL Server, as follows:
command = "relog " & DailyFileName & " -o " & relogSQLDB & " -cf " & SQLSettingsFileNane
& " " & relogSQLParms
Tip
For maximum flexibility, you might want to build and maintain both summarized data and historical data in binary format logs that are easy to transfer from computer to computer in your network and to a consolidated historical PDB using SQL
Server.
The script then contains a comment that instructs you to copy the summarized daily
counter log file that was just created to a consolidation server somewhere on your network, where you will use it in your management reporting process:
'objFSO1.CopyFile DailyFileName, <somewhere>
Next, the script returns to the C:\Perflogs\Today folder and deletes the older files that
were previously copied to C:\Perflogs\Yesterday and processed by Relog.
Managing Counter Logs Automatically Generated by Alerts
Finally, the script enumerates all the files in the C:\Perflogs\Alert Logs folder and
deletes any that are older than the value set in the AlertDaysOld constant, which by
default is set to 3 days. The rationale for deleting older counter logs in the Alert Logs
folder is that the crisis has probably passed. As with the C:\Perflogs\Today folder where
you are writing daily counter logs, if you neglect to perform some sort of automatic file
management on the C:\Perflogs\Alert Logs folder, eventually you are going to run out
of disk space on the machine.
Scheduled Monthly Reports and Archiving
Additional relogging can be done on a scheduled basis using the Scheduled Tasks utility. Tasks scheduled to run by a service other than the Performance Logs and Alerts service should use active log files very cautiously. Having an active log file open by
another process might prevent the log service from closing it properly, or prevent the
command file that is executed after the current log is closed from functioning correctly.
At the end of the month, the log files that were collected each day throughout the
month can be consolidated into a single archival summary log file. In most cases, this
will provide sufficient detail while not consuming unnecessary disk space.
The end-of-the-month processing consists of two main functions:
1. Reducing the number of samples
2. Consolidating the daily files into a single log file representing the month
For best processing performance, first compress the individual files and then concatenate them. Because Relog tries to join all the input files together before processing
them, working with smaller input files makes the resulting join much quicker. Listing
4-13 is an example of typical monthly processing using a command file.
Listing 4-13 Monthly Processing Example
Rem ***********************************************************************
Rem * arg 1 is the name of the month that the performance data logs
Rem * are being compiled for.
Rem *
Rem * arg 2 is the directory path in which to put the output file
Rem *
Rem * NOTE: This procedure should not run when a daily log is being
Rem * processed. It should run after the last daily log of the month has
Rem * been processed and moved to the SAVE_DIR
Rem *
Rem ***********************************************************************
set LOG_DIR=<directory path containing these files>
set SAVE_DIR=<directory path where daily log files are saved>
set TEMP_DIR=<directory path where compressed daily log files are saved>

echo Archiving logs in %SAVE_DIR% at %date% %time%>> %LOG_DIR%\SAMPLE_RELOG.LOG

Rem compress each daily log and store the output files in the TEMP_DIR
for %%a in (%SAVE_DIR%\*.blg) do call %LOG_DIR%\Monthly_Compress.bat "%%a" "%TEMP_DIR%"
if errorlevel 1 goto COMPRESS_ERROR

Rem concatenate the compressed log files into monthly summary file
relog %TEMP_DIR%\*.blg -o %2\%1.blg
if errorlevel 1 goto RELOG_ERROR

Rem clear out the temp directory to remove the temp files
Del /q %TEMP_DIR%\*.*

Rem clear out the original files if you want to free up the space,
Rem or you may wish to do this manually after ensuring the files were
Rem compressed correctly
Del /q %SAVE_DIR%\*.*
exit

:COMPRESS_ERROR
echo Compress error at %date% %time%>> %LOG_DIR%\SAMPLE_RELOG.LOG
exit

:RELOG_ERROR
echo Relog error at %date% %time%>> %LOG_DIR%\SAMPLE_RELOG.LOG
exit
In the command file in Listing 4-13, the log files stored in the SAVE_DIR directory are
compressed with another command file and then merged together.
Listing 4-14 Command File Invoked by the Monthly Processing Job
Rem ***********************************************************************
Rem * arg 1 is the filename of the daily performance data log file to
Rem * compress for subsequent concatenation.
Rem *
Rem * arg 2 is the directory path in which to put the output file
Rem *
Rem ***********************************************************************
set LOG_DIR=<directory path containing these files>
set SAVE_DIR=<directory path where daily log files are saved>
set TEMP_DIR=<directory path where compressed daily log files are saved>

echo Compressing file: %1 at %date% %time%>> %LOG_DIR%\SAMPLE_RELOG.LOG

Rem compress the daily log and store the output file in the TEMP_DIR
Relog %1 -config %LOG_DIR%\COMPRESS_CFG.TXT -o %2\%~nx1 >> %LOG_DIR%\SAMPLE_RELOG.LOG
The command file in Listing 4-14 is separate from the main command file so that the
file name parsing features of the command interpreter can be used.
After the data collected during the month is consolidated into a condensed summary
file, monthly reports can be produced using the summary file. In fact, the same
counter log configuration files could be used for a monthly summary of the same data
reported in the daily reports. The only difference is in using a different input file—the
monthly summary log file instead of the daily performance data log file.
Defining Log Configurations for Multiple Servers
Configuration files can also be used to establish uniform counter log collection procedures across multiple computers. The following configuration file illustrates the basic
log settings that might be used in a server configuration consisting of many servers.
Note that additional counters can be added if necessary.
[name]
Basic Performance Data Log
[sample]
15
[format]
bin
[--max]
[--append]
[version]
nnnnnn
[runcmd] <insert the full path name of command file to run when this log file is closed>
[counters]
\Processor(_Total)\% Processor Time
\LogicalDisk(_Total)\% Disk Time
\Memory\Pages/sec
\Network Interface (*)\Bytes Total/sec
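Assuming these settings are saved in a file, for example, basic_log_config.txt (the file name is illustrative), the same collection can be created on each server from the command line by referencing the settings file with Logman's -config option (see Chapter 2, "Performance Monitoring Tools," for the full Logman syntax). A sketch:

logman create counter "Basic Performance Data Log" -config c:\perflogs\basic_log_config.txt
logman start "Basic Performance Data Log"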
Using a SQL Server Repository
The SQL Server support for counter log data is flexible and provides a good way to
build and maintain a long-term repository of counter log data for both analysis and
capacity planning. This section discusses the use of a SQL Server repository for reporting counter log performance data. It discusses the database schema that is employed
to store the counter log data. It then describes how to use data mining tools like
Microsoft Excel to access counter log data in a SQL Server repository and report on it.
Finally, an example is provided that retrieves counter log data from SQL Server using
Excel and produces a workload growth–based forecast, performing a statistical analysis of historical usage trends.
When counter log data is stored in a SQL Server repository, it is organized into three
database tables. In general, data in database tables is arranged into rows and columns.
Rows are equivalent to records in a file, whereas columns represent fields. Each
counter log table is indexed using key fields. Keys uniquely identify rows in the table
and allow you to select specific rows directly without having to read through the entire
table. Each counter log table contains foreign keys, which are the key fields that allow
you to link to another, related table. If two tables share a common key field, you can
join them, which is a logical operation that combines data from two tables into one.
The language used to select and join database tables is known as SQL, which is based
on relational algebra. Fortunately, you do not have to know how to use the SQL language to access and manipulate the counter log tables inside SQL Server. You can rely
on tools like Microsoft Excel, Microsoft Query, and Microsoft Access that allow you to
access SQL Server data without ever having to code SQL statements. Nevertheless,
when you are using one of these tools to access counter log data stored in SQL Server,
it is helpful to have an understanding of how the counter log is organized into database tables.
Although you can store any type of counter log in a SQL Server database, the database
is ideal for storing summarized counter log data that is consolidated from multiple
machines for longer-term trending, forecasting, and capacity planning. The procedures described in this chapter focus on this type of usage.
If you are creating a SQL Server repository of summarized counter log data for capacity planning, you will use procedures like the ones described earlier in “Populating the
Repository,” in which you relog the data to create concise summaries and send the
output from the Relog utility to SQL Server. For example, the following Relog command takes one or more daily counter log files summarized to 15-minute intervals,
summarizes them even further to 1-hour intervals, and inserts the output into a SQL
Server database named PDB:
relog <summaryfile-list> -o "SQL:PDB!ByHour" -f SQL -t 4 -cf c:\perflogs\summarycounters-setting-file.txt
The output file specification, -o "SQL:PDB!ByHour", identifies the ODBC connection
named PDB and also identifies a log set of the PDB database, called ByHour.
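Because the name after the exclamation point is simply a log set label, a single PDB database can hold several log sets at different granularities. For example, the following pair of commands (the names are illustrative) would maintain hourly and daily log sets side by side, with -t 24 keeping one sample per day from the hourly input:

relog <computer-name>.historyPerformanceLog.blg -o "SQL:PDB!ByHour" -f SQL
relog <computer-name>.historyPerformanceLog.blg -o "SQL:PDB!ByDay" -f SQL -t 24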
Using the System Monitor Console with SQL Server
You can continue to use the System Monitor console to report on counter log data that
is stored in a SQL Server performance database. In the Chart view, click the View Log
Data toolbar button (which looks like a disk drive), and then select Database as the
data source, as illustrated in Figure 4-18.
Figure 4-18 Database selected on the Source tab
Select the System DSN and the Log Set, and then click Apply. At this point, you will be
able to create charts and histograms for any of the historical data that is stored in the
SQL Server database. For example, Figure 4-19 shows a Chart view of Committed
Bytes compared to the machine’s Commit Limit over a 3-day span using counter log
data retrieved from a SQL Server database.
Figure 4-19 A Chart view illustrating memory capacity planning data from SQL Server
How to Configure System Monitor to Log to SQL Server
Although these procedures focus on using SQL Server as a repository of summarized
counter log data for capacity planning, it is also possible to direct counter log data to
SQL Server when the data is initially created.
1. To configure Performance Monitor to Log to SQL Server from Performance Logs
and Alerts, right-click Counter Logs, and then click New Log Settings. Type a
name for this log, and then click OK.
2. On the General tab, click Add Objects to add the objects you want to log, and
then click Add. Enter the counters that you want to monitor, and then click
Close. In the Run As box, be sure to supply a user name that has the proper credentials to create and store data in the SQL Server database selected. The following permissions are required:
❑
The correct rights to run System Monitor
❑
The correct rights to the SQL Server database (both create and read)
3. Click the Log Files tab, click SQL Database in the Log File Type list, and then
click Configure, as illustrated in Figure 4-20. The Configure SQL Logs dialog
box is displayed.
Figure 4-20 Configuring the SQL database
Note that using automatic versioning when you are logging counter log data
directly to SQL Server is usually not advisable, as illustrated. Versioning will create new counter log sets continuously and increase the size of the CounterDetails table accordingly. This table growth will make using the counter logs stored
in the SQL Server database more complicated when you employ query and analysis tools such as Microsoft Excel.
4. In the System DSN box, click the DSN that you want to connect to and provide
a name for the counter log set you want to log to, as illustrated in Figure 4-21. If
this log set already exists, the counter log data will be added to the existing log
set. If the log set is new, it will be created when the counter log session is started.
Figure 4-21 Setting the DSN and naming the log set
After you start the counter log session, you can verify that the counter log was created
properly inside SQL Server using SQL Server Enterprise Manager. Navigate to the
database to which you are logging counter data and access its tables. You can then
right-click the DisplayToID table, select Open Table, Open All Rows. You should see
a display similar to Figure 4-22.
Figure 4-22 Using SQL Server Enterprise Manager to verify that a counter log was created
properly inside SQL Server
Counter Log Database Schema
When you first use the SQL Server support in either Logman or Relog to create a
counter log data repository, three new tables are defined to store the counter log data,
as illustrated in Figure 4-17. Counter log data is stored in SQL Server in three interlinked tables:
■ The CounterData table contains individual measurement observations, with one measurement sample per row.
■ The CounterDetails table tells you what counter fields are stored in the database, with one set of counter identification information stored per row.
■ The DisplayToID table contains information used to identify one or more sets of counter log data stored in the database.
CounterData Table
The CounterData table contains a row for each observation of a distinct counter value
measured at a specific time. Each individual counter measurement is stored in a separate row of the table, so you can expect this table to grow quite large.
The columns (or fields) of the CounterData table that you are likely to use most frequently are described in Table 4-5.
Table 4-5 Frequently Used CounterData Columns (Fields)

Column Name      Explanation
CounterID        Part of the primary key to the table; foreign key to the CounterDetails table
CounterDateTime  The start time when the value of this counter was collected, based on Coordinated Universal Time (UTC)
CounterValue     The formatted value of the counter, ready to be displayed
GUID             Part of the primary key to the table; foreign key to the DisplayToID table
RecordIndex      Part of the primary key to the table; uniquely identifies the data sample
The CounterData table also contains the raw performance data values that were necessary to calculate the formatted data value. Deriving the formatted counters might
require up to four raw performance data values that are used in the formatted counter
value calculation. If available, these raw performance data values are stored in FirstValueA, FirstValueB, SecondValueA, and SecondValueB. When you use the Relog utility to
summarize data from a SQL Server database, the raw performance counters are put to
use. For most other purposes, you can ignore them.
The primary key for this table is a combination of the GUID, CounterID, and RecordIndex fields. That means that a SQL SELECT statement identifies a unique row in the CounterData table when you specify a value for the CounterID, GUID, and RecordIndex fields.
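For example, the following sketch retrieves exactly one observation; the ByHour log set name and CounterID 50 are hypothetical values carried over from the examples in this chapter:

SELECT CounterValue, CounterDateTime
FROM CounterData
WHERE GUID = (SELECT GUID FROM DisplayToID WHERE DisplayString = 'ByHour')
AND CounterID = 50
AND RecordIndex = 1

Because all three key fields are specified, at most one row can satisfy this query.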
Even though it contains the measurement data for all the counters you are interested in, by itself the CounterData table is not usable. However, once you join the CounterData table with the CounterDetails table that identifies each counter by its CounterID, the CounterData table yields the measurement data you are interested in.
CounterDetails Table
The CounterDetails table associates the CounterID that is the key to the CounterData
table with fields that identify the counter by name. It stores the counter type, which is
required for summarization. CounterDetails provides the fully qualified counter name, which it breaks into a series of fields so that the counter name value can be stored and retrieved readily. For example, to build a fully qualified counter name such as \\Computer\Processor(1)\% Processor Time, the CounterDetails table supplies values for the MachineName, ObjectName, CounterName, and InstanceName columns.
The CounterID field is the key to the CounterDetails table. The fields listed in Table
4-6 are available in the CounterDetails table.
Table 4-6 Fields Available in the CounterDetails Table

Column Name     Explanation
CounterID       A unique identifier; foreign key to the CounterData table
MachineName     The machine name portion of the fully qualified counter name
ObjectName      The object name portion of the fully qualified counter name
CounterName     The counter name portion of the fully qualified counter name
InstanceName    The counter instance name portion of the fully qualified counter name, if applicable
InstanceIndex   The counter instance index portion of the fully qualified counter name, if applicable
ParentName      The name of the parent instance, if applicable
ParentObjectID  The parent object ID, if applicable
CounterType     The counter type for use in summarization
DefaultScale    The default scaling factor to be applied when the counter value is charted using the System Monitor console
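Because these columns partition the fully qualified counter name, you can reassemble the familiar counter path with a simple expression. The following sketch assumes SQL Server's + string concatenation operator and uses ISNULL to cope with counters that have no instance name (concatenating a NULL InstanceName would otherwise make the whole expression NULL):

SELECT CounterID,
MachineName + '\' + ObjectName +
ISNULL('(' + InstanceName + ')', '') +
'\' + CounterName AS CounterPath
FROM CounterDetails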
When you join the CounterDetails table with the CounterData table in a SQL query, you can associate the measurement data values stored in CounterData with the proper counter name identification. Storing the counter name details once in a separate table is an example of database normalization, and saves a great deal of disk space in the database tables. It does require an extra step during reporting, namely, that you must join the two tables first. But this is very easy to do using the rich data mining tools that are provided in programs like Microsoft Excel and Microsoft Query, as illustrated in the “Querying the SQL Performance Database” section.
Note If you intend to code SQL statements to retrieve data from counter logs stored in SQL Server, you must first join the CounterData and CounterDetails tables, as in the following generic example:
Select * from CounterData, CounterDetails where CounterData.CounterID
= CounterDetails.CounterID and CounterData.CounterID = 50
DisplayToID Table
The DisplayToID table serves several purposes. Primarily, it associates the character
string you specified when you originally created the SQL Server counter log database
with a GUID that can serve as a unique identifier for the counter log database. This
character string is for reference only. For example, in the following Relog command,
the character string “ByHour” identifies the log set of the PDB database that is referenced:
relog <summaryfile-list> -o "SQL:PDB!ByHour" -f SQL -t 4 -cf
c:\perflogs\summary-counters-setting-file.txt
This character string is stored in the DisplayToID table in a field called DisplayString,
which is automatically associated with a GUID to create a unique identifier that can be
used as a key to both this table and the CounterData table. The fields listed in Table
4-7 are in the DisplayToID table.
Table 4-7 Fields in the DisplayToID Table

Column Name      Explanation
GUID             A unique identifier; also a foreign key to the CounterData table. Note that this GUID is unique; it is not the same GUID stored in the registry at HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\SysmonLog\Log Queries\.
DisplayString    Equivalent to a log file name; the character string specified with the -o subparameter following the exclamation point character (!) in the SQL Server database output file reference.
LogStartTime     Log start and end times in yyyy-mm-dd hh:mm:ss:nnn format. Note that
LogStopTime      the datetime values in the LogStartTime and LogStopTime fields allow you to establish the time range for the log without scanning the entire CounterData table.
NumberOfRecords  The number of CounterData table rows associated with this log file.
MinutesToUTC     Add this value to CounterData.CounterDateTime to convert the CounterData datetime field to local time.
TimeZoneName     The name of the time zone where the CounterData was gathered.
RunID            A field reserved for internal use only.
The time conversion data present in the DisplayToID table allows you to convert the datetime field in the CounterData table to local time.
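For example, the following sketch joins the two tables to report counter values with local timestamps; it assumes CounterDateTime is stored as a character string in the format shown in Table 4-7 and truncates the milliseconds to simplify the conversion:

SELECT DATEADD(minute, DisplayToID.MinutesToUTC,
CONVERT(datetime, LEFT(CounterData.CounterDateTime, 19))) AS LocalTime,
CounterData.CounterValue
FROM CounterData, DisplayToID
WHERE CounterData.GUID = DisplayToID.GUID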
If you now issue a different Relog command, such as the one that follows, separate
rows will be created in the DisplayToID table—one set of rows that store the ByHour
summary log and another set that corresponds to the ByShift summary log:
relog <summaryfile-list> -o "SQL:PDB!ByShift" -f SQL -t 32 -cf
c:\perflogs\summary-counters-setting-file.txt
Counter measurement data from both logs will be stored together in the same CounterData table. A unique GUID is generated for each log. This GUID identifies the log
set uniquely and is used as the key to the DisplayToID table. The GUID also serves as
foreign key to the CounterData table so that the rows associated with each log can be
distinguished.
Note If, as in the example just discussed, multiple logs are stored in a single database, you must join all three tables before attempting to utilize the counter log data in a report. For a simple example, the following SELECT statement will join all three tables and retrieve information from just the ByHour counter log:
Select * from CounterData, CounterDetails, DisplayToID where
DisplayToID.DisplayString = 'ByHour' and CounterData.GUID = DisplayToID.GUID
and CounterData.CounterID = CounterDetails.CounterID and CounterData.CounterID = 50
Fortunately, the SQL Server Query Optimizer will ensure that this multiple-table SELECT statement executes as efficiently as possible against the database.
To keep your SQL statements from growing complex as a result of maintaining multiple logs in a single SQL Server database, you can define additional counter log databases in the same instance of SQL Server or in separate instances of SQL Server.
Querying the SQL Performance Database
Once Performance Monitor data is stored in one or more SQL Server databases, you
can access it for reporting and analysis in a variety of ways. You can, for example, use
the Relog command-line utility to access performance data stored in a SQL
Server log set and export it to a .csv file. You are also free to develop your own tools to
access and analyze the counter log data stored in SQL Server databases. Finally, you
might want to take advantage of the data mining capabilities built into existing tools
like Microsoft Excel, which can be used to query SQL Server databases and analyze
the data stored there. This section will walk you step by step through a typical SQL
Server counter log database access scenario. This scenario analyzes the ByHour
counter log file example using Microsoft Excel to retrieve the counter log data stored
there.
More Info For more information about developing your own tools to access and analyze the counter log data stored in SQL Server databases, see the topic entitled “Using the PDH Interface” in the SDK documentation available at http://msdn.microsoft.com/library/en-us/perfmon/base/using_the_pdh_interface.asp.
To use the Excel database access component to access counter log data stored in SQL
Server, you follow two simple steps. The first step is to access the CounterDetails table
to generate a list of CounterIDs and counter names. If only one counter log file is
stored in the specified SQL Server database, a simple SQL SELECT operation returns
all rows:
USE PDB
SELECT * FROM CounterDetails
However, if multiple counter logs are stored in a single database, you will need to select the appropriate DisplayString in the DisplayToID table and join to CounterDetails by way of the CounterData table, which carries both the corresponding GUID and the CounterIDs:
USE PDB
SELECT DISTINCT CounterDetails.* from CounterDetails, CounterData, DisplayToID
where DisplayToID.DisplayString = 'ByHour' and CounterData.GUID = DisplayToID.GUID
and CounterData.CounterID = CounterDetails.CounterID
Creating a worksheet first that contains all the information from CounterDetails allows
you to determine which counter IDs to use in subsequent selections from the CounterData table. Because the CounterData table can grow quite large, you want to avoid any
operation that involves scanning the entire table. To select specific rows in the CounterData table based on CounterIDs, you must first determine the correlation between the CounterID field, which is a foreign key to the CounterData table, and the Object:Counter:Instance identification information, which is stored in CounterDetails.
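For example, a preliminary query such as the following sketch returns the CounterIDs for every instance of the Processor object, so that later selections against the much larger CounterData table can filter on those IDs alone:

SELECT CounterID, MachineName, ObjectName, CounterName, InstanceName
FROM CounterDetails
WHERE ObjectName = 'Processor'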
When you use Excel, you do not need to code any SQL statements to execute queries
of this kind against the counter log database. Excel supports Microsoft Query, which
generates the appropriate SELECT statement as you step through a Query generation
wizard with a few mouse clicks, as illustrated in the next procedure.
Here is the procedure for using Excel to access a counter log file stored in a SQL Server
database. It uses the PDB!ByHour counter log file example that has been referenced
throughout this chapter.
To access a Counter Log file stored in SQL Server
1. Start with an empty Excel worksheet and workbook. From the Data menu,
select Get External Data, New Database Query. This invokes the Microsoft
Query Wizard.
2. In the Choose Data Source dialog box, click the Databases tab, shown in Figure 4-23. Select your defined ODBC connection to the counter log SQL Server
database.
Figure 4-23 The Databases tab in Choose Data Source
3. Make sure that the Use The Query Wizard To Create/Edit Queries check box
is selected and click OK. The Choose Columns dialog box is displayed.
4. In the Available Tables And Columns section, select the CounterDetails table.
All the columns from CounterDetails appear in the Columns In Your Query list,
as illustrated in Figure 4-24. Click Next.
Figure 4-24 Displaying the columns from the CounterDetails query
5. Skip the next step of the Query Wizard in which you would normally enter your
filtering criteria to build the SQL SELECT statement’s WHERE clause. Because
the information you are retrieving from the CounterDetails table is intended to
act as a data dictionary for all subsequent requests, unless the result set of rows
is too large for Excel to handle conveniently, there is no reason to filter this
request.
Optionally, you can select a sort sequence for the result set by specifying the fields in any order that will help you find what you want fast. For example, if you want to gather data from multiple machines, sort the result set by ObjectName, CounterName, InstanceName, and MachineName, as illustrated in Figure 4-25. If you are focused instead on the measurements from a single machine, sort by MachineName, followed by ObjectName, CounterName, and InstanceName.
Figure 4-25 Gathering data from multiple machines
6. Select Return Data To Microsoft Excel and click Finish. When the query runs,
you will have an Excel worksheet that looks similar to Figure 4-26.
Figure 4-26 Data returned to an Excel worksheet
Scan this Excel worksheet and look for the CounterIDs that are of interest in the next
procedure, which will retrieve counter log measurement data from the CounterData
table.
To retrieve Counter Log measurement data from the CounterData table
1. Repeat the procedure titled “To access a counter log file stored in SQL Server.”
Start with a blank worksheet. Select Get External Data, Edit Query from
Excel’s Data menu. This invokes the Microsoft Query Wizard with the settings
from the last SQL Query intact.
2. This time you want to execute a query that will join the CounterData and CounterDetails tables. To the previous query, add the columns from the CounterData table you want to see (for example, CounterValue and CounterDateTime), as shown in Figure 4-27.
Figure 4-27 Choosing columns to include in a query
3. Filter on specific CounterIDs; in this example, select only CounterData rows in which the CounterID equals 27 or 132, which in this case (Figure 4-28) represents instances of the Processor(_Total)\% Processor Time counter from two different machines.
Figure 4-28 Including data from two different machines
Or, for example, for a generic Processor\% Processor Time query that will
retrieve counter data about all instances of the Processor object that are stored
in the counter log database, filter on values of Processor in the ObjectName field,
as illustrated in Figure 4-29.
Figure 4-29 Filtering on values of Processor in the ObjectName field
4. Microsoft Query will generate a SQL SELECT statement according to these specifications and execute it, returning all the rows that meet the selection criteria
and placing the result set of columns and rows inside your spreadsheet. The
result will appear similar to Figure 4-30.
Figure 4-30 The Excel Workbook For SQL PDB Reporting & Analysis.xls
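Although Microsoft Query writes the statement for you, the generated SELECT is equivalent to a sketch like the following; the CounterID values 27 and 132 are the example values from step 3, and the exact text Excel generates will differ in quoting and column order:

SELECT CounterDetails.MachineName, CounterDetails.ObjectName,
CounterDetails.CounterName, CounterData.CounterDateTime,
CounterData.CounterValue
FROM CounterData, CounterDetails
WHERE CounterData.CounterID = CounterDetails.CounterID
AND CounterData.CounterID IN (27, 132)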
5. Finally, execute another query on the DisplayToID table to gain access to the
MinutesToUTC field for this counter log so that you can adjust the CounterData
CounterDateTime to local time, if necessary.
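In SQL terms, the query behind this step can be as simple as the following sketch, again using the hypothetical ByHour log set:

SELECT DisplayString, MinutesToUTC, TimeZoneName
FROM DisplayToID
WHERE DisplayString = 'ByHour'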
Capacity Planning and Trending
Capacity planning refers to processes and procedures designed to anticipate future
resource shortages and help you take preventative action. Because preventative
actions might include acquiring more hardware to handle the workload, computer
capacity planning is closely tied to the budget process and has a strong financial component associated with it. Many successful IT organizations make it a practice to coordinate major hardware acquisition with resource planners.
Capacity planners rely on historical information about computer resource usage taken
from counter logs. Their planning horizon often encompasses decisions like whether
to build or extend a networking infrastructure or build a new computing site, which
would involve major capital expenditures that need approval at the highest levels of
your organization. As you can see, capacity planners are typically not concerned with
day-to-day computer performance problems (although some planners wear many
hats). Rather, they need information that can support strategic planning decisions,
and they need the performance databases that can supply it.
Capacity planners also receive input on growth, expansion plans, acquisitions, and
other initiatives that might affect the computer resource usage that supports various
business functions. Capacity planners must understand the relationship between these business factors and the computer resource usage they drive. Historical information that
shows how these business drivers operated in the past often plays a key role in predicting the future.
To cope with anticipated shortages of critical resources, capacity planners prepare a
forecast of future computer resource requirements and their associated costs. This
forecast normally takes into account historical data on resource usage, which is kept
in the performance database. The computer capacity planners then factor in growth
estimates that are supplied by business planners. Like any prognostication about the future, such a forecast involves an element of guesswork. However, by relying on extrapolations
from historical trends that are derived from empirical measurement data, capacity
planners strive to make their forecasts as accurate as possible.
This section discusses using counter log data on resource usage to assist in long-range computer capacity planning, along with basic statistical techniques for performing workload forecasting using the resource data you have gathered. It also provides guidelines for establishing a useful capacity planning function in your organization.
Organizing Data for Capacity Planning
The data of interest to capacity planners includes:
■ The utilization of major resource categories, including processors, disks, network, and memory
■ The throughput and transaction rates of key server applications
The measurement data best suited for capacity planning largely overlaps with the recommendations made earlier for management reporting in the section entitled “Sample Management Reports.” This means that the counter log data you gather to produce those reports will also feed a longer-term capacity planning process.
Because capacity planning focuses on longer-term usage trends, it deals exclusively
with summarized data and is usually limited to resources and workloads that have a
major financial impact on information technology (IT) costs and services. Planning
horizons span months and years, so you must accumulate a good deal of historical
data before you can reliably use it to make accurate predictions about the future.
Tip As a general rule, for every unit of time that you have gathered accurate historical data, you can forecast a future trend line that is one-half that length of time. For
example, to forecast one year of future workload activity with reasonable accuracy,
you should have amassed two years of historical data.
Because you can accumulate data points faster, building a capacity planning database initially that consists of measurement data organized for weekly reporting is a good place to start. (Data organized monthly is usually more attuned to financial planning cycles, which makes it more useful in the long run.) In the example discussed here, measurement data has been organized by week for an entire year, allowing you to produce a workload forecast for the next six-month period. As you accumulate more and more measurement data, over time you may prefer to shift to monthly or even quarterly reporting.
Accumulating measurement data from counter logs into a performance database
using the procedures outlined in this chapter is only the first step in providing a decision support capability for computer resource capacity planning. Both data mining
and data analysis need to be performed outside the scope of these automated procedures. Data analysis in particular seeks to uncover any patterns that would seriously
undermine the accuracy of a capacity forecast based purely on statistical techniques.
Some of the key considerations include:
■ Identifying periods of peak load Summarization tends to smooth data over time, eliminating the most severe peaks and valleys, as illustrated in Chapter 1, “Performance Monitoring Overview.” Because performance problems are most acute during periods of peak load, an important function of capacity planning is identifying these periods of peak load. Machines and networks should be sized so that performance is acceptable during peak periods of activity that occur with predictable regularity. In the process of summarizing data for capacity planning purposes, attempt to identify periods of peak usage so that requirements for peak load processing can also be understood. In the next example examined in this chapter, workload activity was summarized over a prime shift interval of several hours involving a five-day work week. In addition to calculating average utilization over that extended period, the peak level of activity for any one-hour processing window within the summarization interval was retained. As illustrated, analysis of the peak workload trend, in comparison to the trend in the smoothed average data, is usually very instructive.
■ Cleansing the performance database of anomalies and other statistical outliers In its original unedited form, the performance database is likely to contain many measurement data points made during periods in which the operational environment was irregular. Consider, for example, the impact on measurement data over an extended time period in which one of the application programs you tracked was permitted to execute undetected in an infinite processor loop. This creates a situation where the measured processor utilization statistics greatly overestimate the true workload. So that future projections are not erroneously based on this anomalous measurement data, these statistical outliers need to be purged from the database. Instead of deleting these observations and diluting the historical record, one option is to replace outlying observations with a more representative calculation based on a moving average.
■ Ensuring that the historical record can account for seasonal adjustments and other predictable cycles of activity that contribute to variability in load Many load factors are sensitive to both time and date. To understand the full impact of any seasonal factors that contribute to variability in the workload, it is important to accumulate enough data about these cycles. For example, if your workload tends to rise and fall predictably during an annual cycle, it is important to accumulate several instances of this cyclic behavior.
After the data points in the capacity planning database are analyzed and statistical
anomalies are removed, you can safely subject the database to statistical forecasting
techniques. Figure 4-31 illustrates this approach using measurement data on overall
processor utilization accumulated for 52 weeks on a 4-way system that is used as a primary mail server. The time series plot shows average weekly utilization of the processors during the prime shift, alongside the peak hour utilization for the summarization
interval.
Figure 4-31 Processor utilization data accumulated for 52 weeks on a 4-processor system. The chart, titled Processor utilization (4 CPUs), plots Average utilization and Peak hour utilization in percent, by week.
Overall utilization of the four processors was less than 50 percent on average across
the week at the outset of the tracking period, but rose to over 100 percent utilization
(one out of the four processors was busy on average during the weekly interval) by the
end of the one-year period. Looking at peak hour utilization over the interval reveals
peak hour utilization growing at an even faster rate. In the next section, statistical forecasting techniques are used to predict the growth of this workload for the upcoming
half-year period.
Notice in Figure 4-31 that by the end of the one-year period, peak hour utilization is
approaching 300 percent (three out of four processors are continuously busy during
the interval), which suggests a serious resource shortage is looming. This resource
shortage will be felt initially only during peak hour workloads. But as the upward
growth trend continues, this resource shortage will impact more and more hours of
the weekly prime processing shift. Statistical forecasting techniques can now be
applied to predict when in the future this out-of-capacity condition will occur.
Tip To justify an upgrade, you might need to supplement simple measurements of
resource utilization with empirical data that suggests performance degradation is
occurring when the resource saturates. In the case of a mail server, for example, evidence that Microsoft Exchange Server message delivery queue lengths are growing,
or that average message delivery times are increasing, bolsters an argument for adding more processing resources before the resource shortage turns critical.
Forecasting Techniques
Capacity planners use forecasting techniques that are based on statistical models. The
simplest techniques are often the most effective, so unless you are competent to use
advanced statistical techniques, you should stick with basic methods such as linear
regression. Linear regression derives a straight line in the form of an equation, y=mx+b,
based on a series of points that are represented as (x,y) coordinates on a Cartesian
plane. The regression formula calculates the line that is the best fit to the empirical
data. (There is one and only one such line that can be calculated for any series of three
or more points.) Because the x values of each of the (x,y) coordinates used in this
regression line calculation represent time values, the series of points (x,y) is also
known as a time series.
Note Just because a regression procedure derives a line that is the best fit to a series
of (x,y) coordinates does not mean that the line is a good fit to the underlying data that
is suitable for forecasting. Linear regression models also produce goodness-of-fit statistics that help you determine whether the regression line is a good fit to the underlying data. The most important goodness-of-fit statistic in linear regression is the
correlation coefficient, also known as r².
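For reference, the least-squares regression line and the r² statistic follow these standard definitions, where x̄ and ȳ are the means of the observed values and ŷᵢ = mxᵢ + b is the value the fitted line predicts:

m = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left( \sum x_i \right)^2}, \qquad
b = \bar{y} - m \bar{x}, \qquad
r^2 = 1 - \frac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - \bar{y})^2}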
Forecasts Based on Linear Regression
The Excel function LINEST uses linear regression to derive a line from a set of (x,y)
coordinates, produce a trendline forecast, and also return goodness-of-fit statistics,
including r². In the following example, Excel’s LINEST function is used to generate a
linear regression line that could be used for forecasting. Excel also provides a linear
regression–based function called TREND that simply returns the forecast without providing any of the goodness-of-fit statistics. If you do not feel competent to interpret the
linear regression goodness-of-fit statistics, the TREND function will usually suffice.
More Info For more information about the use of the LINEST function, see Help in Microsoft Excel.
Figure 4-32 illustrates the use of the Excel TREND function to forecast workload
growth based on fitting a linear regression line to the historical data. In this chart of
the actual measured values of the workload and the 26-week forecast, the historical
data is shown using a heavy line, whereas the linear trend is shown as a dotted line. In
an Excel chart, this is accomplished by creating an xy-scatterplot chart type based on
the historical data, and then adding the forecasting time series to the plot as a new
series. Making the forecast trend a separate data series from the actual history data
makes it possible to use formatting that clearly differentiates each on the chart.
Figure 4-32 A chart showing both actual historical data and forecasted future data points. The chart, titled Processor utilization (4 CPUs), plots Average utilization and Peak hour utilization along with the Average (Forecast) and Peak (Forecast) series in percent, by week, extending through week 78.
If you use the Excel LINEST function that returns goodness-of-fit statistics, the r² correlation coefficient for the linear regression line that was used as the basis of the forecast is available. In this instance, an r² value of 0.80 was calculated, indicating that the
regression line can “explain” fully 80 percent of the variability associated with the
underlying data points.
Note The r² correlation coefficient ranges from 0, which means there is no significant correlation among the observations, to 1, which means all the observed data points fall precisely on the regression line and the fit is perfect. For a more elaborate discussion of the linear regression goodness-of-fit statistics, see the Microsoft Excel Help for the LINEST function or any good college-level introductory text on statistics.
This would be regarded as a relatively weak correlation for measurement data acquired under tightly controlled circumstances—for example, in drug testing effectiveness trials, in which people’s lives are at risk. However, for uncontrolled operational environments like most real-world computer workloads, it is a strong enough correlation to lend authority to any forecasts based on it. In the uncontrolled operational environments of computers, you are fortunate when you can observe correlation coefficients of 0.75 or higher, and even then only when the underlying measurement data has been thoroughly scrubbed.
Nonlinear Regression Models
Workload growth trends are often nonlinear, causing forecasts based on linear models to underestimate actual growth. That is because many growth processes are cumulative, operating on both the base workload and the growth portion of the workload,
just like compounded interest. Figure 4-33 is a variation of Figure 4-31 in which the
Add Trendline feature of the Excel (x,y) scatterplot is employed to plot the linear
regression line that the TREND function calculates against the underlying data. For
the Peak Hour Utilization data, the trend line tends to be above the actual values in
the first half of the time series and below the actual values toward the end of the chart.
This is often a sign of a nonlinear upward growth trend.
Figure 4-33 A chart variation showing a nonlinear trendline that can be used in forecasting. The chart, titled Processor utilization (4 CPUs), plots Average utilization with a Linear (Average utilization) trendline and Peak hour utilization with an Expon. (Peak hour utilization) trendline, in percent, by week.
Because Excel also supports nonlinear regression models, it is easy to compare linear and nonlinear models to see which yields better goodness-of-fit statistics for the same set of data points. LOGEST is the nonlinear regression function in Excel that corresponds to LINEST. Excel also provides a nonlinear regression–based function called GROWTH that returns only the forecast, without providing any of the goodness-of-fit statistics. If you do not feel competent to interpret the nonlinear regression goodness-of-fit statistics, the GROWTH function might suffice.
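For reference, LOGEST fits the standard exponential model shown below (GROWTH returns forecasts from the same model); taking logarithms shows that it can be fit with the same linear regression machinery applied to \ln y:

y = b \cdot m^x \quad \Longleftrightarrow \quad \ln y = \ln b + x \ln m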
Figure 4-34 adds an exponential growth trendline calculated based on the peak hour
utilization measurements, forecasting peak hour utilization that diverges significantly
from the linear estimate. Comparing goodness-of-fit statistics for the two models, LINEST reports an r² of 0.84, whereas LOGEST reports an even higher r² value of 0.90.
The exponential growth model is evidently a better fit to the underlying data. The
exponential growth trend predicts that the machine will be running at its absolute
CPU capacity limit by week 80, whereas the linear estimate suggests that the saturation
point might not be reached for another six months.
Figure 4-34 An exponential growth trend added to the chart. The chart, titled Processor utilization (4 CPUs), plots Average utilization, Peak hour utilization, Peak (Forecast), Average (Forecast), and Expon. (Peak hour utilization), in percent, by week, extending through week 78.
In this example, the nonlinear growth forecast for peak hour utilization is the safer
bet. The goodness-of-fit statistics recommend the nonlinear model as the better fit to
the underlying data. The nonlinear trendline also forecasts a serious out-of-capacity
condition six months sooner than the linear estimate. Budget impact, technical feasibility, and other considerations might also bear on a decision to relieve this capacity
constraint during the next six-month period, sometime before the saturation point is
reached.
The Problem of Latent Demand
Once resources saturate, computer capacity limits will constrain the historical
workload growth trends that are evident in this example. The measurements
taken during periods in which resource constraints are evident will show the
historical growth trend leveling off. In many instances, you will find that the historical growth trend will resume once the capacity constraint that is dampening
growth is removed. In other instances, demand for new processing resources
might simply have leveled off on its own. This is known in capacity planning circles as the problem of latent demand. When a resource constraint is evident in
the measurement data, there is normally no way to determine definitively from
the historical record alone whether latent demand exists or the workload
growth has tailed off of its own accord.
In forecasting, what to do about the possibility of latent demand is complicated by
evidence that users alter their behavior to adapt to slow systems. End users who
rely on transaction processing systems to get their work done will adapt to a condition of shortage that slows their productivity. Assuming suitable facilities are
available, they tend to work smarter so that their productivity does not suffer as a
consequence of the resource shortage. One example of this sort of adaptation is
the behavioral change in end users who use search engines. When search engine
response time is very quick, a user is more likely to scroll through pages and pages
of results, one Web page at a time. However, if response time is slow, end users are
more likely to use advanced search capabilities to narrow the search engine result
set, thus boosting their productivity while using the application.
Counter Log Scenarios
This chapter has focused on uniform counter log data collection, summarization, and
reporting procedures that support both performance management and capacity planning, and that can scale effectively to even the largest environments. Still, these model
procedures might not be ideal for every situation. This section discusses alternative
ways to approach counter log data collection.
In particular, the model counter log data collection procedures described here rely on
gathering local counter data and writing counter log files to a local disk. Other
counter log scenarios exist as well and are explained in this section. You can use performance data logging and analysis tools to gather both local and remote performance data. You can store the data you collect in a file on the local computer you are
monitoring, or write to a file stored on a remote computer. Where to monitor the data
from and log the data to depends on your environment and how you want to store
and process the logged data.
When you encounter a performance problem and need to initiate a counter log session promptly to identify the problem in real time, you cannot always rely on logging
local counters to a local disk. You might have difficulty gaining physical access to the
machine experiencing the problem, and you might not be able to use the Remote
Desktop facility to gain access either. In situations like this, you must rely instead on
facilities to gather counter logs remotely. Remote counter log sessions and their special performance considerations are discussed in this section.
Logging Local Counters
When you specify the specific objects and counters you want to gather, you have the
option to gather these counters from only the local computer or to use the Remote
Registry service to gather them from a remote machine somewhere on the network.
For efficiency, gathering counter log data from the local computer only is usually the
recommended approach. However, you would be forced to gather measurement data
from a remote machine in real time to diagnose a problem that is currently happening.
Considerations for gathering data from remote machines in real time for problem
diagnosis are discussed later in “Using the Performance Console for Remote Performance Analysis.”
Logging data from only the local machine is more efficient because of the architecture
of the Performance Monitoring facility in Windows Server 2003. (This architecture
was discussed extensively in Chapter 2, “Performance Monitoring Tools.”) Figure 4-35 shows a simple view of this architecture and focuses on a single aspect of it—
namely, the facilities used to identify and gather only a specific set of counters out of
all the counters that are available. It shows a performance monitoring application,
such as the System Monitor console or the background Counter Logs and Alerts service, specifying a list of counters to be gathered and passing that list to the Performance Data Helper library (Pdh.dll) interface. It shows the PDH library sitting
between the calling application and the set of Performance Library (Perflib) DLLs that
are installed on the machine.
Figure 4-35 The facilities used to identify and gather a specific set of counters. The diagram shows a monitoring application (the System Monitor console, the Counter Logs and Alerts service, and so on) passing a counter list selection across the counter list selection interface to Pdh.dll, which in turn calls the Open, Collect, and Close routines of the installed Perflib DLLs.
Each Perflib DLL is defined using a performance subkey in the registry at
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services. This performance
subkey defines the library name so that Pdh.dll can locate and load it as well as identifying three external functions that Pdh.dll can call. These are the Open, Collect, and
Close routines. The Open routine is called to inventory the objects and counters that
the Perflib DLL supplies. The Collect routine is called to gather the measurement data.
Ultimately, no matter which application programming interface is used for the
counter log data, a single Performance Library DLL is responsible for gathering the
data for individual counter values. These Performance Library DLLs reply to Collect
function calls to gather the counters they maintain. Only one Collect function call is
defined for each Performance Library DLL, and it is designed to return all the current
counter values that the Perflib DLL maintains. Even if the calling program using the
PDH interface to the counter log data requests a single counter value, the Performance
Library DLL returns current data from all the counters that it is responsible for. An
editing function provided by the PDH interface is responsible for filtering out all the
counter values that were not requested by the original caller and returning only the
values requested.
When you gather counter log data from a remote machine, the Remote Registry service that provides network access to a Performance Library DLL that is executing
remotely must transfer all the data gathered from the call to the Perflib DLL Collect
routine across the network, back to the PDH function. It is the responsibility of the
PDH counter selection editing function to then discard the counter data not explicitly
requested by the original caller. As you might expect, this architecture has major implications on performance when you need to gather counter log data from a remote
machine. The biggest impact occurs when counters are selected that are part of a very
voluminous set of counter data that one of the Perflibs is responsible for gathering.
The most common example where extreme care should be exercised involves any
counter from either the Process or Thread objects.
Process and Thread counter data is supplied by the Perfproc.dll Perflib, which is associated with the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\PerfProc\Performance Registry Key. The Collect routine of the Perfproc.dll
Perflib returns all counters for all instances of the Process and Thread objects. When
you request a single Process or Thread counter from a remote machine, it is necessary
for all the data—returned by the Perfproc.dll Perflib Collect routine—to be transferred
across the network back to the local machine where the performance monitoring
application is running. This architecture for remote machine monitoring is discussed
in more detail in the section entitled “Monitoring Remote Servers in Real Time.”
Local Logging—Local Log File
In local logging with a local log file, the performance data from the local computer is
logged, the log service of the local computer is configured and started locally, and the
log file is stored on the local computer. This combination is the most efficient way to
manage counter log data collection and has no impact on the network. Use this scenario to collect performance data for baseline use, or to track down a suspected performance problem on the local computer.
The model daily performance monitoring procedures described in this chapter all rely
on logging local data to a local disk. They assume you will want to consolidate Counter Logs gathered across multiple machines on your network onto a centrally located machine for the purpose of reporting and of building and maintaining a capacity planning repository of Counter Log data. This will necessitate implementing daily performance
procedures, such as the ones described here, to transfer Counter Log files from each
local machine to a consolidation server on a regular basis. To minimize the network
impact of these file transfers, the model procedures recommended here perform this
data transfer on summarized files, which are much smaller than the original counter
logs. It is also possible that you can arrange to have these file transfers occur during
periods of slack network activity.
Local Logging—Remote Log File
In local logging with a remote log file, you configure the local log service to collect performance data from the local computer, but the log file is written to another computer.
This combination is useful on a computer with limited hard disk space. It is also useful when you need to ensure that the performance of the local hard disk is not
impacted by your performance monitoring procedure. The network impact of using
local logging to a remote log file is minimal because the only counter data sent across
the network are those specific counter values that were selected to be logged.
Although this network workload is typically small in proportion to the rest of the network traffic that occurs, it does start to add up when the data from many computers
that are being monitored is written to the remote disk.
Remote Logging—Local Log File
If you are responsible for monitoring several computers in a small enterprise, it might
be simplest to implement daily performance monitoring procedures in which the log
service runs on one machine, gathering data from the local machine and from several
remote computers (the other servers in your enterprise, for example). To limit the network impact of performing remote logging, you can log counter data to a consolidated
log file on the local computer. In a small environment, this saves the trouble of having
to consolidate multiple log files for analysis and reporting later. It also simplifies file
management because it is not necessary to have an automated procedure, like the one
documented in “Automated Counter Log Processing,” to perform file management on
each monitored system.
As long as you are careful about which counters you gather from the remote machines
and the rate at which you gather them, this scheme is easy to implement and relatively
efficient. When there is a performance problem and you need to initiate a Counter Log
session promptly to identify the problem in real time, and you encounter difficulty
gaining physical access to the machine experiencing the problem, you are advised to
try this logging scenario.
The most frequent performance concern that arises in this scenario occurs when you
need to gather data at the process or thread level from the remote machine. Even if
you select only a single Process or Thread counter to monitor, the call to the Collect
routine of the Perflib DLL loaded on the remote machine will cause all the current
counter data associated with every counter and instance to be transferred across the
network back to the local machine each sample interval.
Remote logging might also have additional security considerations. These are discussed later in the section entitled “Monitoring Remote Servers in Real Time.”
Remote Logging—Remote Log File
Gathering counter data from one or more remote machines and writing the Counter
Log to a remote disk causes the biggest network impact of any logging scenario. It
might be necessary to engage in remote logging using a remote log file if the conditions are favorable for remote logging and there is not ample disk space on the local
computer. If you are very careful about which counters you gather from the remote
machines, the rate at which you gather them, and the amount of data that must be
sent across the network to be written to the remote disk, you can implement this
scheme without paying a prohibitively large performance penalty. If the computer
being monitored is connected to the central monitoring server via a high-speed network, the best configuration might be to copy the detailed file to the central monitoring server and have it processed there. The optimum configuration cannot be
prescribed in this text because there are too many site-specific factors to weigh.
Monitoring Remote Servers in Real Time
In any enterprise, large or small, you can’t always have physical access to the computer
you want to monitor. There are several ways to remotely obtain performance data
from a system or group of systems. Reviewing performance parameters of several
computers at the same time can also be a very effective way to study how multiple systems interact.
Monitoring performance counters remotely requires that you have network access to
the remote computer and an agent on the remote computer that collects performance
data and returns it to the local computer that requested it. The remote collection agent
supplied with the Windows Server 2003 family is the Remote Registry service
(Regsvc.dll). Regsvc.dll collects performance data about the computer it is running on
and provides a remote procedure call (RPC) interface, which allows other computers
to connect to the remote computer and collect that data. This service must be started
and running on the remote computer for other computers to connect to it and collect
performance data. Figure 4-36 illustrates the different interfaces and functional elements used when monitoring performance data remotely.
Figure 4-36 Remote performance monitoring architecture. The diagram shows, on the monitoring computer, performance applications based on the performance registry (for example, Perfmon.exe), user-defined HTML pages or scripts, the Performance Logs and Alerts snap-in, System Monitor, and the Performance Logs and Alerts service, all reaching the performance registry and performance data log files through the Performance Data Helper (PDH) library, Pdh.dll. On the remote computer being monitored, the Remote Registry service (Regsvc.dll) exposes the performance registry, the system performance DLL and performance extension DLLs, and the performance counter text string files. Legend: (1) Windows system call API (each API is specific to the information requested); (2) standard performance library interface; (3) registry internal interface; (4) RegQueryValueEx API to the performance registry key; (5) PDH internal log file interface; (6) published PDH API; (7) registry internal RPC interface; (8) System Monitor ActiveX control interface; (9) log service internal configuration interface.
Access Rights and Permissions
A discretionary access control list (DACL) controls access to the performance data on
the remote computer. The user or account (in the case of a service) must have permission to log on to the remote computer and read data from the registry.
Performance counter text string files To save space in the registry, the large
REG_MULTI_SZ string variables that make up the names and explanatory text of the
performance counters are saved in performance counter text string files outside the
registry. These files are mapped into the registry so that they appear as normal registry keys to users and applications. Although this all takes place transparently to the
calling user or application, the user or application must still have access to these files
to access the performance data on the system. The performance counter text string
file names are:
■ %windir%\system32\perfc009.dat
■ %windir%\system32\perfh009.dat
If these files are installed on an NTFS file system partition, the DACLs on both files
must grant at least read access to the intended users of the performance data. By
default, only the Administrators group and interactive users are given sufficient access
to these files; therefore, the values of entries in the
\HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Perflib\009 subkey are invisible to all other users. Note that both files must have
the correct ACL; if they do not, neither will be visible.
Using the Performance Console for Remote Performance Analysis
The Performance console that is used to monitor a local computer can also be used to
monitor remote computers at the same time. The ability to monitor local and remote
computers by using the same tool makes it possible for you to easily observe the interaction of the different components in a client/server application.
Selecting systems to monitor To interactively monitor the performance of remote
systems, select or enter the name of the system you want to monitor in the Select
Counters From Computer box in the System Monitor Add Counters dialog box. There
is normally a delay between the time you specify the remote computer name and the
time that its performance counters and objects appear in the dialog box. During the
delay, the local computer is establishing a connection with the remote computer’s performance registry and retrieving the list of performance counters and objects that are
available on the remote computer.
Saving console settings for remote performance analysis When you save the
settings from a Performance console that is monitoring remote computers, be sure the
counter paths are saved the way you want to apply them later. Determine exactly
which counters you will want the user of that settings file to monitor, then use the correct setting for those counters. This determination is made in the Add Counters dialog
box of the Performance console. In the Add Counters dialog box, you can choose from
two options that determine from where the performance counters are read:
■ Use Local Computer Counters If this option is selected, only counters from the local computer are displayed in the Add Counters dialog box. When the Performance console settings are saved and sent to another computer, the same list of counters from the computer loading the settings file will be displayed in System Monitor.
■ Select Counters From Computer If this option is selected, the name of the computer you specify is saved along with the counters from that computer. No matter where that Performance console settings file is sent, the counter data from the original computer specified in each counter path will always be used.
Sending console settings to others for remote performance analysis How you
specify the computer you want to monitor in the Add Counters dialog box determines
how the settings will be applied on another computer. For example, if you click the
Select Counters From Computer option, and the computer specified in the combo
box is the current computer (\\MyMachine, for example), when Performance console
settings from \\MyMachine are sent to another computer (\\YourMachine, for example) and opened in the Performance console on \\YourMachine, that Performance
console will try to connect to \\MyMachine to collect performance data. This might
not be what you want, so make this selection carefully.
Troubleshooting Counter Collection Problems
As you come to rely on counter logs for detecting and diagnosing common performance problems, you will develop less and less tolerance for any problems that
interfere with your ability to gather counter logs reliably. Most of the counter collection problems you are likely to encounter are associated with unreliable
extended counters supplied by third-party Performance Library DLLs. In this section, a variety of troubleshooting procedures to cope with common counter collection problems are discussed.
Missing Performance Counters
When Performance Monitoring API calls that are routed via Pdh.dll to Performance
Library DLLs fail, counter values that you expected to gather are missing. (See the discussion earlier in this chapter about the internal architecture of the performance monitoring API in the section entitled “Counter Log Scenarios.”) When these failures
occur, the Performance Data Helper library routines attempt to document the error
condition and isolate the failing component so that the component does not cause
system-wide failure of the performance monitoring data gathering functions. This section reviews some of the common error messages that can occur and what should be
done about them. It also discusses the Disable Performance Counter function, which
is performed automatically to isolate the failing component and prevent it from causing a system-wide failure. If you are unable to gather some performance counters that
should be available on a machine, it is usually because the Performance Library associated with those counters has been disabled automatically because of past errors.
Common Perflib Error Messages
When Performance Monitoring API calls that are routed via Pdh.dll to Performance
Library DLLs fail, the PDH routines attempt to recover from those failures and issue a
diagnostic error message. These error messages are written to the Application event
log with a Source identified as “Perflib.” These error messages are a valuable source of
information about counter collection problems.
The performance monitoring API defines three external function calls that each Performance Library DLL must support. These are Open, Collect, and Close. Errors are
most common in the Open routine, where the Performance Library DLL responds
with a set of counter definitions that it supports, and in the Collect routine, where the
Performance Library DLL supplies current counter values. (See Table 4-8.)
Table 4-8 Error Messages in the Open and Collect Routines

Event ID  Message/Explanation

1008  The Open Procedure for service (service name) in DLL (DLL name) failed. Performance data for this service will not be available. Status code returned is DWORD 0.
      Self-explanatory. The DisablePerformanceCounters flag in the associated registry key at HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\<service-name>\Performance is set to 1 to prevent the problem from recurring.

1009  The Open Procedure for service (service name) in DLL (DLL name) generated an exception. Performance data for this service will not be available. Exception code returned is DWORD 0.
      Self-explanatory. The DisablePerformanceCounters flag in the associated registry key at HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\<service-name>\Performance is set to 1 to prevent the problem from recurring.

1010  The Collect Procedure for the (service name) service in DLL (DLL name) generated an exception or returned an invalid status. Performance data returned by counter DLL will not be returned in Perf Data Block. Exception or status code returned is DWORD 0.
      Self-explanatory. The DisablePerformanceCounters flag in the associated registry key at HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\<service-name>\Performance is set to 1 to prevent the problem from recurring.

1011  The library file (DLL name) specified for the (service name) service could not be opened. Performance data for this service will not be available. Status code is data DWORD 0.
      Perflib failed to load the performance extensions library. The status code from GetLastError is posted in the data field of the event. For example, 7e means the DLL could not be found or the library name in the registry is not correct.

1015  The timeout waiting for the performance data collection function (function name) to finish has expired. There may be a problem with that extensible counter or the service from which it is collecting data.
      The Open or Collect routine failed to return in the time specified by the Open Timeout, Collect Timeout, or OpenProcedureWaitTime registry fields. If no registry values are set for the Perflib, the default timeout value is 10 seconds.
When serious Perflib errors occur that generate Event ID 1008, 1009, and 1010 messages, steps are taken to isolate the failing component and safeguard the integrity of
the remainder of the performance monitoring facility. Performance data collection for
the associated Perflib is disabled until you are able to fix the problem. Fixing the problem might require a new or an upgraded version of the Perflib. After the problem is
resolved, you can re-enable the Performance Library and begin gathering performance
statistics from it again.
Additional error conditions in which the Perflib DLL returns invalid-length performance data buffers also cause the performance monitoring API to disable the Perflib generating the error.
More Info For more information about these other Perflib error messages, see KB
article 226494, available at http://support.microsoft.com/default.aspx?scid=kb;zhtw;226494.
A Collect Timeout value can be specified (in milliseconds) in the HKLM\System\CurrentControlSet\Services\<Service-name>\Performance key for the Performance
Library DLL. If this value is present, the performance monitoring API sets up a timer
value prior to calling the Perflib’s Collect routine. If the Collect function of the Perflib
does not return within the time specified in this registry value, the call to Collect fails
and an Event ID 1015 error message is posted to the event log.
Similarly, there is an Open Timeout value in the registry under the Performance subkey that functions in the same fashion for calls to the Perflib's Open routine.

A registry value called OpenProcedureWaitTime at HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Perflib establishes a global default timeout for all Performance Library DLLs when Open Timeout or Collect Timeout is not set explicitly. The OpenProcedureWaitTime registry value defaults to a timeout value of 10,000 milliseconds, or 10 seconds.
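As a sketch, these timeout values might be raised with the Reg.exe command-line tool; the 20,000-millisecond value and the <service-name> placeholder are illustrative assumptions, not recommended settings:

rem Raise the Collect timeout for one Perflib (value in milliseconds)
reg add "HKLM\SYSTEM\CurrentControlSet\Services\<service-name>\Performance" /v "Collect Timeout" /t REG_DWORD /d 20000 /f
rem Raise the global default timeout for all Perflibs
reg add "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Perflib" /v OpenProcedureWaitTime /t REG_DWORD /d 20000 /f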
Disable Performance Counters
To maintain the integrity of the performance data and to improve reliability, the performance monitoring API disables any performance DLL that returns data in the incorrect format, causes an unhandled program fault, or takes too long to return the performance data. As a result of an error condition of this magnitude, a value named Disable Performance Counters is added to the registry in the HKLM\System\CurrentControlSet\Services\<Service-name>\Performance key. When Disable Performance Counters is set to 1, no performance monitoring application is able to gather the counters that the disabled Perflib is responsible for gathering. Perflib Event ID 1017 and 1018 messages are written to the Application event log at the time the Disable Performance Counters flag is set.
Note When a Performance Library DLL is disabled, the performance counters gathered by that DLL are not available through the System Monitor console, the background Counter Logs and Alerts service, or any other performance application that
calls Performance Data Helper API functions. Disabled Perflib DLLs remain disabled
until the Disable Performance Counters flag in the registry is reset manually. Disabled
DLLs are not reloaded when the system is restarted.
Sometimes, the problem that leads to a Performance Library being disabled is transient and will clear up on its own. You can try re-enabling the extension DLL by using the ExCtrLst utility (part of the Windows Server 2003 Support Tools) and restarting the counter log session. Alternatively, you can use the Registry Editor to change the value of the Disable Performance Counters flag to zero.
Sometimes, by the time you notice that some performance counters are disabled, the
event log messages that informed you of the original Perflib failure are no longer available. The only way to reconstruct what might have happened is to reset the Disable
Performance Counters flag and try to perform the counter logging again.
If the problem persists and the Disable Performance Counters flag is set again, contact
the vendor that developed the Performance Library for a solution. If the object is a
Windows Server 2003 system object (such as the Process object), please contact
Microsoft Product Support Services (PSS). See the procedures discussed in “Troubleshooting Counter Collection Problems” for more information.
Troubleshooting Counter Collection Problems
If you are having problems gathering performance counters provided by a Microsoft or a third party-supplied Performance Library DLL, gather the following information to assist the vendor in determining the problem. (A command sketch that automates several of these steps follows the list.)
1. Obtain a copy of the HKLM\SYSTEM\CurrentControlSet\Services registry
hive.
2. Obtain a copy of the Perfc009.dat and Perfh009.dat files in %SystemRoot%\system32.
3. Copy the HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Perflib\009 key. To do this, double-click both the Counter and Help fields and copy
their contents to a .txt file because they cannot be exported to a .reg file.
4. Obtain a copy of this registry hive: HKEY_CURRENT_USER\Software\Microsoft\PerfMon.
5. Start a counter log session that attempts to gather every counter registered on
your system for a brief interval. The counter log can be compared to the Performance subkeys in the registry at HKLM\SYSTEM\CurrentControlSet\Services
to see what objects, counters, and instances are missing.
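The following is a minimal sketch of these steps using built-in command-line tools; the file names, the 15-second sample interval, and the collection name are illustrative assumptions, and step 3 still requires the manual copy described above:

rem Steps 1, 2, and 4: export the registry hives and copy the counter string files
reg export HKLM\SYSTEM\CurrentControlSet\Services services.reg
copy %SystemRoot%\system32\perfc009.dat .
copy %SystemRoot%\system32\perfh009.dat .
reg export HKCU\Software\Microsoft\PerfMon perfmon-user.reg
rem Step 5: list every registered counter, then log them all briefly
typeperf -qx > allcounters.txt
logman create counter VendorDiag -cf allcounters.txt -si 15 -o vendordiag.blg
logman start VendorDiag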
Restoring Corrupt Performance Counters
Problems associated with a bad Performance Library DLL might become so serious that you find it necessary to reconstruct the base set of performance counter libraries that come with Windows Server 2003, along with any extensible counters associated with other Perflib DLLs. To rebuild the performance counters, issue the following command:
lodctr /r
This command restores the performance counters to their original baseline level and then applies any extended counter definitions that were added later through subsequent software installations.
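As a precaution, Lodctr can also save the current performance counter registry settings to a file before the rebuild so that they can be restored later; the file name here is an illustrative assumption:

rem Save the current counter registry settings, then rebuild
lodctr /s:perfbackup.ini
lodctr /r
rem To restore the saved settings instead of the rebuilt baseline:
rem lodctr /r:perfbackup.ini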
Chapter 5
Performance Troubleshooting
In this chapter:
Bottleneck Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 402
Analysis Procedures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 404
Processor Troubleshooting. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 406
Memory Troubleshooting. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 430
Disk Troubleshooting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 464
Network Troubleshooting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 509
A framework for proactive performance monitoring of your Microsoft Windows Server 2003 infrastructure was provided in Chapter 4, "Performance Monitoring Procedures." That chapter discussed sample procedures that illustrate ways to implement systematic performance monitoring, along with the measurements and the procedures for gathering them. These measurements are used to establish a baseline that documents the historical levels of service your Windows Server 2003 machines provide to the client customers of your server applications. By extrapolating from historical trends in resource usage (processor usage, disk space consumption, network bandwidth usage, and so on), you can use the data collection, summarization, and archival procedures described in Chapter 4 to anticipate and detect periods during which resource shortages might cause severe availability and performance problems, and take action to keep these issues from disrupting the orderly delivery of services to server application clients.

However, despite your best efforts at proactive performance monitoring, some problems will inevitably occur. The range and complexity of at least some of your Windows Server 2003 machines may mean that the relatively simple performance data gathering procedures discussed in Chapter 4 will not catch everything occurring on your machines. There is simply too much performance data on too many operating system and application functions to gather it all, all the time.
In this chapter, procedures for troubleshooting performance problems in specific
areas are discussed and illustrated. The focus is primarily on troubleshooting performance problems and identifying the precise cause of a shortage in any of the four
classes of machine resources common to all application environments: processor,
physical memory and paging, disk, and network. This analysis cannot be performed
in a vacuum. The load on these resources is generated by specific application requests.
The troubleshooting procedures you use might have to be tailored for each application-specific environment that you are responsible for managing.
The troubleshooting procedures documented here all begin with the baseline of performance data you have been accumulating (using procedures similar to the ones
described in Chapter 4, “Performance Monitoring Procedures”) to identify resource
bottlenecks. Resources that are bottlenecks are not sized large enough to process all the
requests for service they receive, thus causing serious delays. Bottleneck analysis is the
technique used to identify computer resource bottlenecks and systematically eliminate them. Identifying the root cause of a resource shortage often involves the use of
very fine-grained troubleshooting tools. This chapter will illustrate how to use performance baseline measurements to identify resource shortages and how to use troubleshooting tools to home in on specific problems.
Bottleneck Analysis
Analyzing a performance problem is easier if you already have established data collection and analysis procedures, and have collected performance data at a time when the
system or enterprise is performing optimally.
Baseline Data
The procedures for capturing, reporting, and archiving baseline performance data are
described in Chapter 4, “Performance Monitoring Procedures.” Collecting and analyzing baseline data before you have to conduct any problem analysis or troubleshooting
makes the troubleshooting process much easier. Not only will you have a set of reference values with which you can compare the current values, but you will also gain
some familiarity with the tools.
The baseline data to use during a troubleshooting or analysis session can exist in several forms:
■ Tables of data collected from baseline analysis can be used in hardcopy form as a reference for comparison to current values. Graphs can also be used for baseline data that varies over time.
■ Performance data logs saved during baseline analysis can be loaded into one System Monitor window and compared with current performance data displayed in another System Monitor window.
Tip You might find it useful to save baseline workload data from a System Monitor Report view in .tsv format. Then you can open several .tsv files that reflect different points in time in Microsoft Excel and compare them side by side in a spreadsheet.
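As a sketch, the Relog utility can also convert an existing binary baseline log to .tsv format for the same side-by-side comparison; the file names are illustrative assumptions:

relog baseline.blg -f tsv -o baseline.tsv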
Although having this baseline data on hand is the ideal case, it is not always possible.
There might be times during the installation or setup of a system or an application
when performance is not as good as it needs to be. In those cases, you will need to
troubleshoot the system without the benefit of a baseline to help you isolate or identify the problem. Both approaches to troubleshooting—with baseline data and without—are covered in this chapter.
Current Performance Levels
The first step in a performance analysis is to ensure performance data is being logged.
Even though System Monitor can monitor and display performance data interactively,
many problems show up more clearly in a performance data log.
When collecting performance data for problem analysis, you typically collect data at a
higher rate (that is, using a shorter sample interval) and over a shorter period of time
than you would for routine data logging of the type recommended in Chapter 4, “Performance Monitoring Procedures.” The goal is to generate a performance data log file
with sufficient resolution to perform a detailed investigation of the problem. Otherwise, important information from instantaneous counters can be lost or obscured in
the normal performance logs that use a longer sample interval. Creating a separate
performance log query for each task is one way to accomplish this. Of the possible
performance data log formats, the binary format gives the best results. The binary log
retains the most information and allows for the most accurate relogging to other formats. Whenever raw data is processed into formatted data (which occurs with all
other data formats), some information is lost; therefore, subsequent computations on
formatted data are less accurate than computations performed on the raw, original
logged data that is stored in the binary log file format. If you have many servers to
monitor, creating a SQL database for your performance data might be useful as well,
making consolidation and summarization of the data easier.
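For example, here is a sketch of relogging a high-resolution binary log once the analysis is complete; the file names, the PerfDB data source name, and the Server1 log set name are illustrative assumptions:

rem Convert the binary log to text for ad hoc review
relog problem.blg -f csv -o problem.csv
rem Keep every third sample and forward the result to a SQL performance database
relog problem.blg -t 3 -f SQL -o SQL:PerfDB!Server1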
Resource Utilization and Queue Length
To identify saturated resources that might be causing performance bottlenecks, it is
important to focus on the following activities:
■ Gathering measurement data on resource utilization at the component level
■ Gathering measurement data on queuing delays that are occurring at the resource that might be overloaded
■ Determining the relationship between resource utilization and request queuing that exists at a particular resource
Theoretically, a nonlinear relationship exists between utilization and queuing, which
becomes evident when a resource approaches saturation. When you detect one of
these characteristically nonlinear relationships between utilization and queuing at a
resource, there is a good chance that this overloaded resource is causing a performance constraint.
Decomposition
Once you identify a bottlenecked resource, you often break down utilization at the bottlenecked resource according to its application source, allowing you to home in on the problem in more detail. Several examples that illustrate bottleneck detection and decomposition are provided later in this chapter.
Analysis Procedures
Listing all possible problems and their subsequent resolutions in this chapter is
impossible. It is possible, however, to describe common problems, the steps to take
when analyzing those problems, and the data you should gather so that you can perform your own analysis of similar problems.
Understanding the Problem
The obvious but often overlooked first step in problem analysis is to understand the
problem being observed or reported. The initial indication of a potential problem can
come from a user, a customer, or even a superior. Each person might characterize the
same problem differently or use the same characterization for different problems. As
the person doing the analysis, you need to clearly understand what the person reporting the problem is describing.
Analyzing the Logged Performance Data
The performance counters you log to provide data for troubleshooting are listed later
in this chapter. These lists are different from the lists of counters that you evaluate and
analyze daily, because logging the data to a binary log file is best done by performance
object, whereas the actual analysis is performed on the individual performance
counters. Also, having more available data (to a point) is better than having less during analysis. Then, if you find something interesting when examining one of the key
counters, you will have additional data at your disposal for further investigation. On
the other hand, if your system is resource-constrained and you are trying to generate
the smallest possible log file, you can collect only the key counters to analyze, with the
understanding that you might not have all the data you need to do a comprehensive
analysis. Of course, some analysis is usually better than no analysis.
Analyzing Performance Data Interactively
For the performance problems that result from a lengthy and subtle change in system
behavior, the best way to start your analysis is by using a log file; however, some problems are immediate and must be analyzed interactively. In such cases, the counters
you need to monitor interactively are the same ones listed later in this chapter. During
an interactive analysis session, the counters listed are the first ones that should be
loaded into the interactive tool.
Fine-Grained Analysis Tools
In many problem-solving scenarios, gathering counter log data is often only the first step in your analysis. You will often need to rely on more fine-grained performance analysis tools, some of which are described in the "Help and Support" section of the Windows Server 2003 documentation. This chapter discusses how to use the following fine-grained analysis tools:
■ Kernrate for the analysis of processor bottlenecks
■ Trace logs, Kernel Debugger commands, and the Poolmon utility for the analysis of physical and virtual memory shortages
■ Trace logs for the analysis of disk bottlenecks
■ Server Performance Advisor reports for troubleshooting networking problems
■ Network Monitor packet traces and trace logs for the analysis of network bottlenecks
What to Check Next in the Enterprise
In most cases, the next step in evaluating a problem at the enterprise level is to go to
the system exhibiting the problem and investigate further. In the case of a network
problem, look at the system components and the role they play in the enterprise
before examining the system itself. For example, heavier-than-normal network traffic
on one client might be the result of an errant application on the client, or it might be
a problem on a server that is not responding correctly to requests from that client. For
more information, see “Network Troubleshooting” later in this chapter.
Tip A quick way to localize the source of an overloaded server is to temporarily remove it from the network (if you can do that safely).
Monitor the Processor(_Total)\% Processor Time counter by using the Performance
console on the server in question. If it is not possible to observe the System Monitor
directly, make sure this performance counter is logged to a performance data log file
while performing this test.
Briefly, remove the server from the network for a period at least as long as several
sample intervals. If you are monitoring interactively, observe the value of the Processor(_Total)\% Processor Time performance counter. If the value drops when the server
is disconnected from the network, the load on the server is the result of client-induced
requests or operations. If the value remains more or less the same, the load is the result
of a process or application that resides on that server.
Be sure to reconnect the server to the network when this test is complete.
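Here is a sketch of logging that counter with the Logman utility during the test; the collection name, the 15-second interval, and the output file name are illustrative assumptions:

logman create counter NetIsolation -c "\Processor(_Total)\% Processor Time" -si 15 -o netisolation.blg
logman start NetIsolation
rem ...disconnect the server, wait several sample intervals, reconnect...
logman stop NetIsolation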
Processor Troubleshooting
Processor bottlenecks occur when the processor is so busy that it cannot respond to
requests for a noticeable period of time. Extended periods of near 100 percent processor utilization, accompanied by an increase in the number of ready threads delayed in
the processor dispatcher queue, are the primary indicators of a processor bottleneck.
These indicators reflect direct measurement of resource utilization and the queue of
requests delayed at a busy resource.
Resource Utilization and Queue Length
The major indicators of a processor bottleneck are the system-level measurements of
processor utilization and processor queuing, as shown in Table 5-1.
Table 5-1 Primary Indicators of a Processor Bottleneck

Counter: Processor(_Total)\% Processor Time
Primary Indicator: Processor utilization
Threshold Values: Sustained values > 90% busy on a uniprocessor or sustained values > 80% on a multiprocessor should be investigated.

Counter: System\Processor Queue Length
Primary Indicator: Current depth of the thread Scheduler Ready Queue
Threshold Values: Numerous observations > 2 Ready threads per processor should be investigated. Observations > 5 Ready threads per processor are cause for alarm.*

* This is an instantaneous counter and, therefore, often can report data that is not representative of overall system behavior.
In addition to the recommended threshold levels in Table 5-1, take note of any measurements that differ sharply from the historical baseline.
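As a sketch, both primary indicators can be watched interactively with the Typeperf utility; the 5-second sample interval here is an arbitrary assumption:

typeperf "\Processor(_Total)\% Processor Time" "\System\Processor Queue Length" -si 5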
The combination of high processor utilization and a lengthy processor queue signals
an overloaded processor. Often the situation is unambiguous, for example, a program
thread stuck in an infinite loop of instructions that drives processor utilization to 100
percent. This condition is normally a programming bug that can be alleviated only by
identifying the offending program thread and ending its parent process. A sudden
spike in demand or even a gradual, sustained increase in the customer workload
might also create an evident out-of-capacity situation. The proactive performance
monitoring procedures discussed in Chapter 4, “Performance Monitoring Procedures,” of course, are designed to allow you to detect and anticipate an out-of-capacity
situation and intervene before it impacts daily operations.
Note This section ignores complex measurement and interpretation issues introduced by hyperthreaded processors, multiprocessors, and Non-Uniform Memory
Access (NUMA) architectures. These are all subjects discussed in Chapter 6, “Advanced
Performance Topics.”
On the other hand, in many situations, you need to exercise considerable judgment. The interpretation of the two performance counters described in Table 5-1,
which are the primary indicators of a resource shortage, is subject to the following
considerations:
■ Processor state is sampled several hundred times per second, and the results are accumulated over the measurement interval. Over small intervals, the sampling data can be skewed.
■ Beware of program threads that are soakers, that is, capable of absorbing any excess processor cycles that are available. Screen savers, Web page animation controls, and other similar applications need to be excluded from your analysis.
■ Processor Queue Length is an instantaneous counter reflecting the depth of the Ready Queue at the last processor state sample.
■ Polling threads of various kinds that are activated by a timer interrupt that can pop in tandem with the measurement timer can create a false impression about the size of the Ready Queue.
■ Individual Processor Queue Length counter samples can occur at intervals that are at odds with the processor-busy measurements accumulated continuously during the interval.
■ Most threads are in a voluntary Wait state much of the time. The remaining threads that are actively attempting to run (ordinarily a small subset of the total number of threads that exist) form the practical upper limit on the number of threads that you can observe in the processor queue.
Any normal workload that drives processor utilization to near 100 percent for an
extended period of time is subject to a processor capacity constraint that warrants
some form of relief. This is self-evident even if the processor queue of Ready threads
remains low. Such a CPU-bound workload will undoubtedly benefit from a faster processor. If the workload is multithreaded and can proceed in parallel, a multiprocessor
upgrade should also relieve the situation.
If an afflicted machine is truly short of processor capacity, a range of possible solutions to increase processor capacity are available, including these:
■ Moving the workload to a faster machine
■ Adding processors to the current machine
■ Directing the workload to a cluster of machines
Figure 5-1 illustrates a processor out-of-capacity condition in which processor utilization remains at or near 100 percent for an extended period of time. The observed maximum Processor Queue Length during this measurement interval is 18 threads delayed in the Ready Queue. The Chart view also reveals several observations in which the Processor Queue Length exceeds 10 on this machine, which happens to be a uniprocessor.
Figure 5-1 Out-of-capacity condition
Note that the scale for the System\Processor Queue Length counter in the Chart view has been changed from its default value, which multiplies the observed values by 10, to a scaling value of 1.
Decomposition
After you identify a processor out-of-capacity condition, you need to investigate further. Initially, you have three lines of inquiry:
■ Determine processor utilization by processor state The processor state measurements allow you to determine whether the component responsible for the processor load is a User mode or Privileged mode application, which includes device interrupt processing routines.
■ Determine processor utilization by processor This is usually necessary only when the machine is deliberately configured for asymmetric multiprocessing. Asymmetric multiprocessing is an important subject that is addressed at length in Chapter 6, "Advanced Performance Topics."
■ Determine processor utilization by process and thread When the processor overload originates in a User-mode application, you should be able to identify the process or processes responsible.
In each of these cases, it might also be necessary to determine processor utilization by application module and function.
Processor Use by State
When the processor executes instructions, it is in one of two states: Privileged mode or
User mode.
Privileged mode Authorized operating system threads and interrupts, including
all device driver functions, execute in Privileged mode. Kernel-mode threads also execute in Privileged mode.
Processor(n)\% Privileged Time records the percentage of time the system was found busy while executing in Privileged mode. Within % Privileged Time, you can also distinguish two additional modes: Interrupt mode and deferred procedure call (DPC) mode, which typically account for only a small portion of the time spent in Privileged mode.
Interrupt mode is a high-priority mode reserved for interrupt service routines (ISRs),
which are device driver functions that perform hardware-specific tasks to service an
interrupting device. Interrupt processing is performed at an elevated dispatching level,
with interrupts at the same or lower priority disabled.
Note Although interrupt priority is a hardware function, the Windows Server 2003 Hardware Abstraction Layer (HAL) maintains an interrupt priority scheme known as interrupt request levels (IRQLs), which represent the current dispatching mode of a processor. Processing at a higher IRQL will preempt a thread or interrupt running at a lower IRQL. An IRQL of 0 means the processor is running a normal User- or Kernel-mode thread. There is also an IRQL of 1 for asynchronous procedure calls, or APCs. An IRQL value of 2 indicates a deferred procedure call (DPC) or dispatch-level work, discussed in more detail later. An IRQL greater than 2 indicates a device interrupt or other high-priority work is being serviced. When an interrupt occurs, the IRQL is raised to the level associated with that specific device, and the device's interrupt service routine is called. After the ISR completes, the IRQL is restored to its state when the interrupt occurred. Once the IRQL on the processor is raised to an interrupt-level IRQL, interrupts of equal or lower IRQL are masked on that processor. Interrupts at a higher IRQL can still occur, however, and will preempt the current lower-priority activity that was running.
Processor(n)\% Interrupt Time records the percentage of time the system was found
busy while it was executing in Interrupt mode, or other high IRQL activity. This
includes the execution time of ISRs running at a higher priority than any other Kernel
or User-mode thread. The amount of % Interrupt Time measured is included in the %
Privileged Time measurement. % Interrupt Time is broken out separately to make
identifying a malfunctioning interrupt service routine easier.
DPC mode is time spent in routines known as deferred procedure calls, which are device driver–scheduled routines called from ISRs to complete device interrupt processing once interrupts are re-enabled. DPCs are often referred to as soft interrupts.
DPCs run at dispatch level IRQL, just below the priority level of interrupt service
routines. DPCs at dispatch level IRQL will be interrupted by any interrupt requests
that occur because hardware interrupts have IRQL greater than the dispatch level. In
addition to DPCs, there are other sources of dispatch level activity on the system.
Processor(n)\% DPC Time records the percentage of time the system was found busy
while at dispatch level. This represents the execution time of DPCs running at a
higher priority than any other Kernel- or User-mode thread, excluding ISRs and other
dispatch level activity. The amount of % DPC Time that is measured is also included
in the % Privileged Time measurement. % DPC Time is broken out separately to make
identifying a malfunctioning interrupt service routine’s DPC easier.
User mode Program threads from services and desktop applications execute in User mode. User-mode threads cannot access system memory locations or perform operating system functions directly. Processor(n)\% User Time records the percentage of time the system was found busy while it was executing in User mode. Figure 5-2 shows the % Processor Time on the same machine broken down by processor state.
The relative proportion of % User Time and % Privileged Time is workload-dependent. Do not expect a constant relationship between these two counters on machines
running different workloads. In this example, time spent in User mode is only slightly
more common than Privileged-mode execution time. On the same machine running a
similar workload, the relative proportion of User-mode and Privileged-mode execution
time should be relatively constant. Compare measurements from the current system
to the historical baseline measurements you have put aside. Unless the workload
changes dramatically, the proportion spent in User mode and Privileged mode should
remain relatively constant for the same machine and workload over time. Are you able
to observe a major change in the ratio of User- to Privileged-mode processing? This
could be an important clue to understanding what has changed in the interim.
Figure 5-2 Percentage of processor time broken down by processor state
The relative proportion of time spent in Interrupt mode—generally, a function of network and disk interrupt rates—is normally quite small. In the example shown in Figure 5-2, the amount of time was insignificant, consistently less than 2 or 3% busy. If
the current amount of % Interrupt Time is much higher than historical levels, you
might have a device driver problem or a malfunctioning piece of hardware. Compare
the current Interrupts/sec rate to historical levels. If the current Interrupts/sec rate is
proportional to the level measured in the baseline, device driver code is responsible
for the increase in % Interrupt Time. If the Interrupt rate is sharply higher, you have a
hardware problem. In either case, you will want to use the Kernrate utility to home in on the problem device. (Kernrate is discussed later.) If you observe a spike in the
amount of % DPC Time being consumed, identical considerations apply.
If the bulk of the time is being spent in User-mode processing, proceed directly to analyzing processor usage at the process level. For Privileged-mode processing, you might
or might not find a clear relationship to a specific process or processes. Nevertheless,
because both User-mode and Privileged-mode processing are broken out at the process level, checking out the process level measurements, even when excessive Privileged-mode execution time is the concern, is worthwhile.
Processor Use by Process
The sample measurements that determine the state of the machine when it is busy
executing instructions also track the process and thread context currently executing.
Once you determine which process instance or instances you need to observe closely,
select their Process(n)\% User Time and Process(n)\% Privileged Time counters. You
can even drill down to the thread level, as illustrated in Figure 5-3.
Figure 5-3 A single thread impacting the percentage of processor time consumed
In this example, a single thread from an Mmc.exe parent process is responsible for
about 40 percent of the total % Processor Time recorded in Figure 5-2. The complication introduced when you need to drill down to the process or thread level is that
there are many individual processes to examine. It is often easier to use Task Manager
first to identify a CPU-bound process, as discussed later.
Identifying a Runaway Process by Using Task Manager
If you are diagnosing a runaway process in real time, use Task Manager to help you
zero in on a problem process quickly.
1. Press CTRL+SHIFT+ESC to launch Windows Task Manager, or click the Task Manager icon in the system tray if Task Manager is already active.
2. Click the Processes tab. Select View, choose Select Columns, and make sure that
the CPU Usage field is displayed. This column is labeled CPU in the Processes
view.
3. Click the CPU column label to sort the displayed items in sequence by processor usage. You should see a display similar to the one in Figure 5-4. To sort the
display in reverse order, click the column heading again.
Figure 5-4 Processes in sequence by processor usage (CPU column)
When the display is sorted, a runaway process will appear at the top of the display and remain there for an extended period of time.
4. Select the runaway process, right-click, and then click Set Priority to reset the
dispatching priority to Low or Below Normal, as illustrated in Figure 5-5.
Figure 5-5 Resetting the dispatching priority
Once this action is successful, you will find that desktop applications are much more
responsive to keyboard input and mouse actions. You can allow the runaway process
to continue to run at a lower dispatching priority while you attempt to figure out what
is wrong with it. You can now work at a more leisurely pace, because the runaway process can consume only as much excess processor time as is available after all other
processes have been serviced. So it can continue to run at Low or Below Normal priority without causing any harm to overall system performance.
As you proceed to determine the root cause of the problem, your next step is to perform one of the following actions:
■ If you are familiar with the application, use the Task Manager context menu to attach the debugger that was used to develop the application.
■ If you are not familiar with the application, you can run the Kernrate utility to determine which parts of the program are consuming the most processor time.
When you are finished, you might want to end the runaway process. One approach is
to use the Task Manager context menu and click End Process to end the process.
Identifying a Runaway Process by Using a Counter Log
You can also work with counter logs to diagnose a problem with a runaway process by using the performance data recorded there, assuming you are gathering process-level statistics.
1. Using either the Histogram view or the Report view, select all process
instances of the % Processor Time counter. The Histogram view is illustrated
in Figure 5-6.
Figure 5-6 Histogram view of process instances of the % Processor Time counter
2. Click the Freeze Frame button to freeze the display once you can see that there
is a runaway process.
3. Click the Highlight button and then, using the keyboard, scroll through the legend until you can identify the instance of the process that is consuming an
excessive amount of processor time.
4. Delete all the extraneous process instances, unfreeze the display, and revert to
Chart view to continue to observe the behavior of the runaway process over
time.
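If no counter log is available, a rough command-line alternative is to sample processor usage for every process over a short window and examine the output in a spreadsheet; the 5-second interval, 12-sample count, and file name are illustrative assumptions:

typeperf "\Process(*)\% Processor Time" -si 5 -sc 12 -f CSV -o processes.csv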
Processor Use by Module and Function
When you have a runaway process disrupting service levels on a production machine,
the first step is to identify the process and remove it before it causes any further damage. You might be asked to investigate the problem further. If the application that is
causing the problem was developed in-house, the programmers involved might need
help in further pinpointing the problem. If the excessive processor utilization is associated with a Kernel-mode thread or device driver function, you need more information to determine which kernel module is involved.
The Kernrate utility included in the Windows Server 2003 Resource Kit is an efficient
code-profile sampling tool that you can use to resolve processor usage at the application and kernel module level, and at the function level. Using Kernrate, you can drill
deep into your application and into the operating system functions that are executing.
Caution Kernrate is a potentially high-overhead diagnostic tool that can be
expected to impact normal operations. The sample rate used in Kernrate can be
adjusted to trade off sampling accuracy against the tool’s impact on the running system. Use Kernrate carefully in a production environment and be sure to limit its use to
very small measurement intervals.
The Overhead of the System Monitor
In skilled hands, the System Monitor is a very powerful diagnostic and reporting tool.
It can also be misused. The following examples illustrate some of the dos and don’ts
of using the System Monitor effectively.
Figure 5-7 shows a very busy system running only two applications. The first is the
System Monitor console (Sysmon.ocx embedded in the Microsoft Management Console, Mmc.exe). The System Monitor console is being used very inappropriately. In a
real-time monitoring session, the System Monitor is gathering every instance of every
Process and Thread object and counter. This is occurring on a 1.2-GHz machine with
approximately 50 active processes and 500 active threads running Windows Server
2003. Obviously, this is not something anyone would do in real life—it is an example
illustrating how the System Monitor application works.
Figure 5-7 A very busy system running only two applications
The overall system averages 83 percent busy during the approximately 10-minute
interval illustrated. The Mmc.exe process that runs the System Monitor accounts for
over 90 percent busy over the reporting interval. At the bottom of the chart, the % Processor Time for the Smlogsvc process is shown, broken out, like Sysmon, into % User
Time and % Privileged Time. Over the same 10-minute interval, Smlogsvc accounts for
less than 1 percent processor busy.
Smlogsvc is the service process that gathers performance logs and alerts to create
both counter logs and trace logs. Smlogsvc also provides performance data collection
services for the Logman utility. Smlogsvc runs as a background service. In this example, it is writing performance data on processor utilization to a binary log file.
Smlogsvc gathered the performance data on processor usage per process, which is
being reported in Figure 5-7, using the System Monitor console. Smlogsvc gathered
performance data on itself and on the other main application running on the system,
which happened to be the interactive System Monitor session. Smlogsvc is writing
performance data once a second to a binary log file. Obviously, there is a big difference in the amount of processor overhead associated with the two performance data
collectors that are running concurrently and performing similar tasks.
This example helps to answer the question, "What is the processor overhead of the System Monitor?" The answer, of course, is, "It depends." It depends on what functions you are performing with the System Monitor. As you can see, the background data collection functions are very efficient when the binary format file is used. In foreground mode, the System Monitor is expensive, by comparison. However, no one would use the System Monitor in the foreground in the manner illustrated, unless he or she were interested in generating an excessive processor load.

The point of this exercise is to illustrate that you do not have to guess at the overhead of a system performance monitoring session. You can use the performance monitoring tools that Windows Server 2003 supplies to find out for yourself how much processor time a monitoring session uses.
Figure 5-8 provides a fairer comparison of a background performance data collection
session by using Smlogsvc to gather the same process and thread objects and
counters at 1-second intervals as the interactive System Monitor session gathered, as
illustrated in Figure 5-7. The processor utilization is so small—approximately 1 percent
busy—that it was necessary to reduce the y-axis scale of the System Monitor Chart
view to render the display meaningful.
Figure 5-8 A background performance data collection session using Smlogsvc
Counter Selection Logic
Smlogsvc runs even more efficiently in this example in Figure 5-8 than it did in the
previous one. The only difference in the two counter log sessions was that the first collected process level data for a few selected processes and threads. In the second example, illustrated in Figure 5-8, the Smlogsvc gathered performance data for every
process and thread. In Figure 5-8, Smlogsvc is gathering much more performance
data, but evidently executing more efficiently. As the measurement indicates, the
counter log service performs more efficiently when it can gather data from a Performance Library and write that data directly to a binary log file without subjecting it to
any intervening object instance or counter selection logic. This is the factor that
accounts for the greater efficiency of the counter log session illustrated in Figure 5-8.
Counters from all process and thread instances are being logged, compared to Figure
5-7, which shows only select instances and counters being logged.
Even though this approach is the most efficient way to gather performance data, there can be problems. When you gather all the instances and all the counters from high-volume objects like Process and Thread, the data collection files grow extremely fast. In approximately the same amount of time, the binary log file holding selected Process and Thread data grew to about 15 MB in the logging session documented in Figure 5-7. By comparison, the binary log file holding every Process and Thread counter from the logging session illustrated in Figure 5-8 grew to about 50 MB in just over 10 minutes. Although gathering all instances and counters might be the most efficient way to gather performance data from the standpoint of processor utilization, the size of the binary format files is prohibitively large, making this approach generally undesirable.
Log file formats Figure 5-9 illustrates another significant aspect of System Monitor
overhead—the difference between using a binary file format and a text file format file
to log performance data. Figure 5-9 shows another Smlogsvc data gathering session,
identical to the one in Figure 5-8, except that a text format output log file is being created instead of a binary one. Processor utilization increases dramatically, as illustrated. Note also that the bulk of the processing to create a text format file takes place
in User mode. Compare the proportion of Privileged-mode processing here to that
shown in Figure 5-7, in which System Monitor processing in real-time mode is gathering the same set of objects and counters. In Figure 5-7, the proportion of time spent in
Privileged mode was substantial. This is because of the number of Graphics Device
Interface (GDI) calls required to update the Chart view every second. In Figure 5-9,
the amount of Privileged-mode processing is negligible.
Figure 5-9 Increase in processor utilization because of using a text file format file
Again, the explanation is that the counter log service operates most efficiently when it
can gather data from a Performance Library and write it directly to a binary log file
without much intervening processing logic. In fact, the data stored in a binary format
counter log file is in a raw form, identical to the buffers returned by calls to Performance Library DLL Collect routines. There is no intervening processing of these raw
data buffers—they are simply written directly and efficiently to the output log file.
Comparing Figure 5-7 to Figure 5-8 illustrates one form of intervening processing
logic that needs to occur when only specific object and counter instances are selected
for output. A call to the Perflib DLL Collect routine responsible for process and thread
performance data returns data for all object instances and counters in a series of raw
buffers. Counter selection requires parsing the raw data buffers to identify individual
object instances and counters, and discarding data about objects and counters that
were not selected. This selection logic is one of the functions that the Performance
Data Helper (PDH) Library of routines provides. The System Monitor console application and the Smlogsvc Counter Logs and Alerts service both rely on these PDH
functions to parse the raw data buffers returned by Perflib DLL data Collect routines.
Moreover, by relying on the same set of common PDH routines to parse and interpret
Perflib DLL raw data, the System Monitor console can process and display data representing current activity—either directly from the Performance Library DLLs or from
binary log files in which the raw collection data is stored in its original format.
When creating a text format counter log, significant processing needs to be performed. This processing is similar to the logic that the System Monitor console applies
to raw binary format data when transforming that data into displayable counters. This
essential processing logic is also provided by PDH routines. These helper routines are
associated with the supported performance counter types. Consider, for example, a PERF_COUNTER_RAWCOUNT, which is an instantaneous observation of a single measurement. An instantaneous counter requires very little processing—only the transformation of the original numeric value in binary format into a text representation. But consider a different counter of the popular PERF_COUNTER_COUNTER
type. Here the Perflib DLL raw data buffers contain only the current value of this continuously accumulated metric. PDH routines must locate the raw value of this counter
in the data buffers retained from the previous data collection interval to calculate an
interval activity rate that is generated as the current counter value. This is not a trivial
effort. First, a large number of instances of process and thread data are contained in
both sets of Perflib DLL raw data collection buffers—which correspond to the current
and previous measurement intervals—that must be parsed. Second, the dynamic
nature of process and thread creation and destruction ensures that the sequence of
data in the raw buffer can change from one interval to the next.
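For reference, the rate calculation that PDH performs for a PERF_COUNTER_COUNTER can be outlined as follows, where N is the raw counter value, T is the performance timestamp, F is the timer frequency, and the 0 and 1 subscripts denote the previous and current samples:

rate per second = (N1 - N0) / ((T1 - T0) / F)

This is why two consecutive sets of raw buffers must be parsed and matched by instance before a single displayable value can be produced.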
PDH routines exist that parse these raw data Perflib buffers and derive the current
counter values, which are then generated to create text format counter logs. To be
sure, these are the same PDH functions that the System Monitor console relies on to
format data to generate its Chart and Report views. In Chart view, showing many
counter values in the same display is impractical, so using selection logic first to trim
the amount of raw data needing to be processed speeds up the processing.
The two factors affecting the overhead incurred by creating a text format counter log are:
■ The number of instances and counters contained in the raw data buffers
■ The need to apply selection logic to weed out instances and counters that are not written to the data file
If the text format counter log file is restricted to a few object instances and counters, much less parsing of raw data buffers is required. Counter log files limited to a few object instances and counters can be created relatively efficiently. But, as Figure 5-9 indicates, creating text format file counter logs that track large numbers of object instances and counters can be prohibitively expensive. This is especially true when voluminous raw data buffers associated with process and thread objects and counters require parsing. This relative inefficiency of text format file counter log creation underlies the recommendations in Chapter 4, "Performance Monitoring Procedures," to use binary format log files for bulk data collection.
One mystery that still needs to be addressed is why the background logging session
illustrated in Figure 5-9 uses as many processor resources as the interactive foreground session, illustrated in Figure 5-7. In both cases, because system-wide % Processor Time is pinned at 100 percent busy, processing is probably constrained by
processor capacity. The implication of the System Monitor data collection functions
being processor-constrained is that there is little or no processor idle time between
data collection intervals. In both the foreground System Monitor console session and
the background text file format counter log session, as soon as the program is finished gathering and processing one interval worth of data, it is time to gather the next.
In the case of Figure 5-7, it seems reasonable to assume that GDI calls to update the Chart view are responsible for at least some of the heavy processing load. In Figure 5-9, performance logging by Smlogsvc has no associated GUI overhead but uses every bit as much processing time. Using Kernrate, it will be possible to get to the bottom of this mystery.
Using Kernrate to Identify Processor Use by Module and Function
Complete documentation for the Kernrate tool is included in the Windows Server
2003 Resource Kit Help. Using Kernrate, you can see exactly where processor time is
spent inside processes, their modules, and module functions. Kernrate is an efficient
code-profiling tool that samples processor utilization by process virtual address. This
includes instructions executing inside the operating system kernel, the HAL, device
drivers, and other operating system functions. It is similar to some of the CPU execution profilers available from third-party vendors.
This section illustrates using Kernrate to resolve processor usage at the process, module, and function level. This example answers the questions raised earlier in the chapter about the performance impact of the Performance Monitor application. A Kernrate
CPU execution profile is useful whenever you need to understand processor usage at
a finer level of detail than the Performance Monitor objects and counters can provide.
Listing 5-1 shows the output from a Kernrate monitoring session, initiated by the following command:
kernrate -n smlogsvc -s 120 -k 100 -v 2 -a -e -z pdh
Table 5-2 briefly explains the Kernrate command-line parameters. Note that if the
_NT_SYMBOL_PATH environment variable is not set, you can specify it using the -j
switch.
Table 5-2 Key Kernrate Run-Time Parameters

Switch: -n
Parameter: process-name
Function: Multiple processes can be monitored in a single run.

Switch: -s
Parameter: duration in seconds
Function: Gathers instruction execution observations for the specified number of seconds. Because of the overhead of Kernrate sampling, it is wise to limit the duration of a Kernrate profile.

Switch: -k
Parameter: hit count
Function: Restricts output to only those modules and buckets that equal or exceed the hit count threshold. Defaults to 10.

Switch: -v
Parameter: verbosity level
Function: Controls the output that Kernrate supplies. Verbosity level 2 displays instruction information per bucket, including symbols and source-code line information for every bucket.

Switch: -i
Function: Sets the sampling interval. More frequent sampling provides more accurate data but is more intrusive. If the CPU impact of Kernrate sampling is a concern, use a less frequent sampling interval.

Switch: -a
Function: Performs a consolidated Kernel- and User-mode instruction execution profile.

Switch: -e
Function: Prevents gathering of system-wide and process-specific performance metrics (context switches, memory usage, and so on) to reduce processing overhead.

Switch: -z
Parameter: module-name
Function: Multiple modules can be monitored in a single run.
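For instance, here is a sketch of the same profiling run with an explicit symbol path supplied through the -j switch; the symbol directory shown is an illustrative assumption:

kernrate -j %SystemRoot%\symbols -n smlogsvc -s 120 -k 100 -v 2 -a -e -z pdh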
In this example, Kernrate is monitoring the Smlogsvc process in the midst of a text file
format counter log session identical to the one illustrated in Figure 5-9. This is the session that used a surprising amount of User-mode processor time. Kernrate allows you
to drill down into the Smlogsvc process to see what modules and instructions are
being executed. In this example, Kernrate is also used to drill down into the Pdh.dll
Performance Data Helper library that Smlogsvc relies on to parse raw performance data buffers and calculate counter-type values.
Note The Kernrate output has been edited for the sake of conciseness.

Listing 5-1 A Kernrate Report on Processor Usage
Starting to collect profile data
Will collect profile data for 120 seconds
===> Finished Collecting Data, Starting to Process Results

------------Overall Summary:--------------
P0  K 0:00:03.094 ( 2.6%)  U 0:01:19.324 (66.1%)  I 0:00:37.584 (31.3%)  DPC 0:00:00.200 ( 0.2%)  Interrupt 0:00:00.991 ( 0.8%)
    Interrupts= 190397, Interrupt Rate= 1587/sec.
The Kernrate Overall Summary statistics are similar to the Processor(_Total)\% Processor Time counters. They break down the execution profile samples that Kernrate
gathers by processor state, including the Idle state. Note that DPC and interrupt time
are included in Kernrate’s calculation of the kernel time. Kernrate identifies 31.3 percent of the sample duration as time spent in the Idle thread. Kernrate examples will
further illuminate the Idle thread implementation details in a moment—a discussion
that is also relevant to some of the multiprocessing discussion in Chapter 6,
“Advanced Performance Topics.”
Continuing to review the Kernrate output in Listing 5-1, you can see that Kernrate
found the processor busy executing a User-mode instruction 66.1 percent of the time
and a Privileged (Kernel) mode instruction 2.6 percent of the time. The sum of Idle-,
User-, and Kernel-mode processing is, of course, equal to 100 percent, after allowing
for possible rounding errors.
Following the Overall Summary statistics, Kernrate lists all the processes that were
active during the monitoring session (not shown), which, in this case, includes the
Smlogsvc process and the Kernrate process. Kernrate then provides a hit count of the
Kernel-mode modules that executed instructions during the interval, as shown in
Listing 5-2.
Listing 5-2 A Kernrate Report on Processor Usage by Kernel Routines
Results for Kernel Mode:
-----------------------------

OutputResults: KernelModuleCount = 619
Percentage in the following table is based on the Total Hits for the Kernel
Time 20585 hits, 19531 events per hit --------
Kernel CPU Usage (including idle process) based on the profile interrupt
total possible hits is 33.50%

Module     Hits   Shared  msec    %Total  %Certain  Events/Sec
ntoskrnl   15446  0       120002  75 %    75 %      2513923
processr    4381  0       120002  21 %    21 %       713032
hal          524  0       120002   2 %     2 %        85283
win32k       114  0       120002   0 %     0 %        18554
Ntfs          42  0       120002   0 %     0 %         6835
nv4_disp      22  0       120002   0 %     0 %         3580
nv4_mini      12  0       120002   0 %     0 %         1953
USBPORT       11  0       120002   0 %     0 %         1790
The Kernrate output explains that its resolution of instructions being executed for
modules and routines in the Kernel state includes samples taken during the Idle state.
Including Idle state samples, Kernel-mode instructions have been detected 33.5 percent of the time for a total of 20,585 module hits. Seventy-five percent of the module
hits are in Ntoskrnl, 21 percent are in Processr.sys, and 2 percent are in Hal.dll. Other
than being in the Idle state (31.3 percent), very little Kernel-mode processing is taking
place.
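The columns in these reports can be cross-checked by hand: %Total appears to be a module's hits divided by the total hits, and Events/Sec appears to be hits multiplied by events per hit, divided by the elapsed time. Using the Ntoskrnl row as a worked example:

%Total     = 15446 / 20585                      = 75.0%       (reported: 75 %)
Events/Sec = 15446 × 19531 / 120.002 seconds    ≈ 2,513,923   (reported: 2513923)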
Listing 5-3 continues the example with a Kernrate report on processor usage at the
module level. The process-level statistics for the Smlogsvc process are reported next.
Listing 5-3 A Kernrate Report on Processor Usage by Module Level
Results for User Mode Process SMLOGSVC.EXE (PID = 2120)
OutputResults: ProcessModuleCount (Including Managed-Code JITs) = 41
Percentage in the following table is based on the Total Hits for this Process
Time 40440 hits, 19531 events per hit --------
UserMode CPU Usage for this Process based on the profile interrupt
total possible hits is 65.82%

Module    Hits   Shared  msec    %Total  %Certain  Events/Sec
pdh       31531  0       120002  77 %    77 %      5131847
kernel32   6836  0       120002  16 %    16 %      1112597
msvcrt     1700  0       120002   4 %     4 %       276684
ntdll       362  0       120002   0 %     0 %        58917
Smlogsvc has 40,440 hits, which means the machine is executing Counter Logs and
Alerts service User-mode instructions 65.8 percent of the time. Inside Smlogsvc, only
four modules have more than the threshold number of hits. Calls to the Pdh.dll Performance Data Helper function account for 77 percent of the User-mode processing
within the Smlogsvc. The Kernel32.dll and Msvcrt.dll run-time libraries account for
16 percent and 4 percent of the Smlogsvc hits, respectively. The fourth runtime
library, Ntdll.dll, yields less than 1 percent of the instruction execution hits.
Listing 5-4 is the Kernrate report on processor usage by module function. The module-level statistics for Pdh.dll provide additional insight.
Listing 5-4 A Kernrate Report on Processor Usage by Module Function
===> Processing Zoomed Module pdh.dll...

----- Zoomed module pdh.dll (Bucket size = 16 bytes, Rounding Down) --------
Percentage in the following table is based on the Total Hits for this Zoom Module
Time 31671 hits, 19531 events per hit -------- (33371 total hits from summing
up the module components)
(51.55% of Total Possible Hits based on the profile interrupt)

Module                     Hits   Shared  msec    %Total  %Certain  Events/Sec
StringLengthWorkerA        23334  1       119992  69 %    69 %      3798056
IsMatchingInstance          1513  0       119992   4 %     4 %       246269
GetInstanceByName           1374  0       119992   4 %     4 %       223644
IsMatchingInstance          1332  0       119992   3 %     3 %       216808
NextInstance                1305  12      119992   3 %     3 %       212413
GetInstanceName              829  0       119992   2 %     2 %       134935
PdhiHeapFree                 495  428     119992   1 %     0 %        80570
GetInstance                  469  11      119992   1 %     1 %        76338
PdhiHeapReAlloc              428  0       119992   1 %     1 %        69665
GetCounterDataPtr            338  338     119992   1 %     0 %        55015
NextCounter                  246  1       119992   0 %     0 %        40041
GetInstanceByName            233  229     119992   0 %     0 %        37925
FirstInstance                121  121     119992   0 %     0 %        19695
PdhiHeapFree                 121  0       119992   0 %     0 %        19695
PdhiMakePerfPrimaryLangId    108  108     119992   0 %     0 %        17579
PdhiWriteTextLogRecord       101  0       119992   0 %     0 %        16439
GetObjectDefByTitleIndex      93  3       119992   0 %     0 %        15137
FirstObject                   84  30      119992   0 %     0 %        13672
GetPerfCounterDataPtr         84  0       119992   0 %     0 %        13672
GetInstanceByUniqueId         77  77      119992   0 %     0 %        12533
GetQueryPerfData              75  0       119992   0 %     0 %        12207
_SEH_prolog                   53  0       119992   0 %     0 %         8626
NextObject                    40  39      119992   0 %     0 %         6510
FirstInstance                 36  13      119992   0 %     0 %         5859
UpdateRealTimeCounterValue    31  0       119992   0 %     0 %         5045
_SEH_epilog                   31  5       119992   0 %     0 %         5045
PdhiPlaInitMutex              30  30      119992   0 %     0 %         4883
GetStringResource             30  0       119992   0 %     0 %         4883
Kernrate reports that the PDH module has 31,671 hits, accounting for 51.6 percent of
the total instruction samples collected. Fully 69 percent of these hits are associated
with an internal helper function called StringLengthWorkerA, which is called repeatedly to parse current and previous raw data buffers. Additional raw data buffer parsing helper functions such as IsMatchingInstance, GetInstanceByName,
IsMatchingInstance, NextInstance, GetInstanceName, GetInstance, GetCounterDataPtr,
NextCounter, and GetInstanceByName account for another 20 percent of the instructions inside Pdh.dll. The column that indicates the number of shared hits refers to
address range buckets that span function boundaries and make identification less
than certain. The default address bucket range is 16 bytes, an alignment that is consistent with the optimized output from most compilers. If too many uncertain hits occur,
the bucket size can be adjusted downward using the -b command-line switch.
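For example, a follow-up profile of Pdh.dll using 8-byte buckets might look like the following sketch; the switch values other than -b are simply carried over from the earlier run:

kernrate -n smlogsvc -s 120 -k 100 -v 2 -a -e -z pdh -b 8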
The details Kernrate provides at the Module level are mainly of interest to the programmers responsible for building the software module and maintaining it, so exploring the inner workings of PDH any further isn’t necessary here.
Idle thread processing Kernrate can also be used to drill down into Kernel-mode
modules, which allows you to identify device drivers or other Kernel-mode functions
that are consuming excessive CPU time. This function of Kernrate is illustrated in the
following code—it drills down into Ntoskrnl and Processr.sys to illuminate the mechanism used to implement the Idle thread that plays a distinctive role in the processor
utilization measurements.
Under circumstances identical to the previous example that illustrated the use of
Kernrate and produced the output in Listings 5-1 through 5-4, the following Kernrate command was issued to drill down into the Ntoskrnl.exe and Processr.sys modules that, along with the HAL, lie at the heart of the Windows Server 2003 operating
system:
kernrate -s 120 -k 10 -v 2 -x -a -e -z processr -z ntoskrnl
This Kernrate session gathered Module level hits against these two modules. Listing
5-5 shows a Kernrate report on processor usage for the Processr.sys Kernel module by
function.
Listing 5-5 A Kernrate Report on Processor Usage by Function
===> Processing Zoomed Module processr.sys...

----- Zoomed module processr.sys (Bucket size = 16 bytes, Rounding Down) --------
Percentage in the following table is based on the Total Hits for this Zoom Module
Time 4474 hits, 19531 events per hit -------- (4474 total hits from summing
up the module components)
(7.28% of Total Possible Hits based on the profile interrupt)

Module      Hits  Shared  msec    %Total  %Certain  Events/Sec
AcpiC1Idle  4474  0       120002  100 %   100 %     728168
Kernrate found instructions being executed inside the Processr.sys module 7.28 percent of the time. The Processr.sys device driver module is a new feature of the Windows Server 2003 operating system. It provides hardware-specific processor support, including the implementation of the processor power management routines. Depending on the processor model and its feature set, the operating system provides an optimal Idle thread implementation. In a multiprocessor configuration, for example, the operating system issues no-operation (NOP) instructions inside the Idle loop so that the Idle threads generate no memory bus traffic that could clog the shared memory bus. Here, a call is made to the Advanced Configuration and Power Interface (ACPI) C1 Idle routine because the target machine supports this power management hardware interface. Listing 5-6 shows the corresponding Kernrate report for the Ntoskrnl.exe module.
Listing 5-6 A Kernrate Report on Processor Usage for Ntoskrnl.exe by Function
===> Processing Zoomed Module ntoskrnl.exe...

----- Zoomed module ntoskrnl.exe (Bucket size = 16 bytes, Rounding Down) --------
Percentage in the following table is based on the Total Hits for this Zoom Module
Time 15337 hits, 19531 events per hit -------- (15639 total hits from summing
up the module components)
(24.96% of Total Possible Hits based on the profile interrupt)

Module                          Hits   Shared  msec    %Total  %Certain  Events/Sec
KiIdleLoop                      14622  0       120002  93 %    93 %      2379812
KiDispatchInterrupt               104  104     120002   0 %     0 %        16926
KiSwapContext                     104  0       120002   0 %     0 %        16926
READ_REGISTER_BUFFER_UCHAR         46  46      120002   0 %     0 %         7486
READ_REGISTER_ULONG                46  0       120002   0 %     0 %         7486
FsRtlIsNameInExpressionPrivate     44  0       120002   0 %     0 %         7161
FsRtlIsNameInExpressionPrivate     39  0       120002   0 %     0 %         6347
READ_REGISTER_USHORT               30  30      120002   0 %     0 %         4882
READ_REGISTER_UCHAR                30  0       120002   0 %     0 %         4882
KiTrap0E                           24  0       120002   0 %     0 %         3906
KiSystemService                    24  0       120002   0 %     0 %         3906
WRITE_REGISTER_USHORT              15  15      120002   0 %     0 %         2441
WRITE_REGISTER_UCHAR               15  0       120002   0 %     0 %         2441
FsRtlIsNameInExpressionPrivate     14  0       120002   0 %     0 %         2278
ExpCopyThreadInfo                  13  0       120002   0 %     0 %         2115
KiXMMIZeroPagesNoSave              13  13      120002   0 %     0 %         2115
KiTimerExpiration                  13  0       120002   0 %     0 %         2115
ObReferenceObjectByHandle          10  0       120002   0 %     0 %         1627
SwapContext                        10  0       120002   0 %     0 %         1627
Ninety-three percent of the module hits inside Ntoskrnl are for the KiIdleLoop routine. This is the Idle thread routine that eventually calls the Processr.sys AcpiC1Idle
function.
In summary, Kernrate is a processor instruction sampling tool that allows you to profile processor usage at a considerably more detailed level than the Performance Monitor counters allow. Kernrate can be used to understand how the processor is being
used at the module and instruction level. It is capable of quantifying processor usage
inside system functions, device drivers, and even operating system kernel routines
and the HAL.
Memory Troubleshooting
Memory problems—either a shortage of memory or poorly configured memory—are a
common cause of performance problems. This section looks at two types of memory
bottlenecks. The first is a shortage of RAM, or physical memory. When there is a
shortage of RAM, the virtual memory manager (VMM) component, which attempts to
keep the most recently accessed virtual memory pages of processes in RAM, must
work harder and harder. Performance might suffer as paging operations to disk
increase, and these paging operations can interfere with applications that need to
access the same disk on which the paging file (or files) is located. Even though excessive paging to disk is a secondary effect of a RAM shortage, it is the symptom that is
easiest to detect. Examples are discussed that illustrate how to identify a system with
a shortage of RAM that is encountering this type of memory bottleneck.
A second type of memory bottleneck occurs when a process exhausts the amount of
virtual memory available for allocation. Virtual memory can become depleted by a
process with a memory leak, the results of which, if undetected, can be catastrophic.
The program with the leak might fail, or it might cause other processes to fail because
of a shortage of resources. Memory leaks are usually program defects.
Normal server workload growth can also lead to a similar shortage of virtual memory.
Instead of a virtual memory leak, think of this situation as virtual memory creep. Virtual memory creep is very easy to detect and avoid. There is an example later in this
chapter that illustrates what to look for to diagnose a memory leak or virtual memory
creep. A useful technique to use in memory capacity planning is also discussed in
Chapter 6, “Advanced Performance Topics.”
The memory on a computer is not utilized in quite the same fashion as other hardware resources. You cannot associate memory utilization with specific requests for
service, for example, or compute a service time and response time for memory
requests. Program instructions and data areas occupy physical memory in order to execute, and they often occupy physical memory locations long after they were last actively addressed. A
program’s idle virtual memory code and data areas are removed from RAM only when
new requests for physical memory addresses cannot be satisfied from current supplies of unallocated (or available) RAM. Another factor that complicates the memory
utilization measures is that RAM tends to look fully utilized all the time because of the
way a process’s virtual memory address space is mapped to physical memory on
demand.
The statistics that are available to measure memory utilization reflect this dynamic
policy of allocating virtual memory on demand. The performance measurements
include instantaneous virtual memory allocation counters, instantaneous physical
memory allocation counters, and continuous interval counters that measure paging
activity. These measurements are available at both the system and process levels.
Counters to Evaluate When Troubleshooting Memory Performance
The first step in analyzing memory problems is determining whether the problem is a
result of insufficient available physical memory leading to excessive paging. Even
though insufficient available memory can cause excessive paging, excessive paging
can occur even when there is plenty of available memory, when, for example, an application is functioning improperly and leaking memory.
Ideally, compare the values of the counters listed in Table 5-3 to the value of these
same counters that you archived in your baseline analysis. If you do not have a baseline analysis to go by, or the system has changed considerably since you last made
baseline measurements, the suggested thresholds listed in Table 5-3 can be used as
very rough usage guidelines.
Table 5-3 Memory Performance Counters to Evaluate

Memory\% Committed Bytes in Use
    Description: This value should be relatively stable during a long-term view.
    Suggested threshold: Investigate if greater than 80%.

Memory\Available Bytes
    Description: If this value is low, check the Memory\Pages/sec counter
    value. Low available memory and high paging indicate a memory shortage
    resulting from an excessive application load or a defective process.
    Suggested threshold: Investigate if less than 5% of the size of RAM.
    Alarm if less than 0.5% of the size of RAM.

Memory\Commit Limit
    Description: This value should stay constant, indicating an adequately
    sized paging file. If this value increases, the system enlarged the paging
    file for you, indicating a prolonged virtual memory shortage.
    Suggested threshold: Investigate if the trend of this value is increasing
    over time.

Memory\Committed Bytes
    Description: This represents the total virtual memory allocated by the
    processes on the system. If it increases over an extended period of time,
    a process might be leaking memory.
    Suggested threshold: Investigate if the trend of this value is increasing
    over time.

Memory\Pages/sec
    Description: Tracks page fault pages generated by read (input) and write
    (output) operations. If this value is high, check the Pages Input/sec
    counter to see whether applications are waiting for pages that could slow
    response time.
    Suggested threshold: Depends on page file disk speed. Additional
    investigation might be required when there are more than 40 per second on
    slow disks or more than 300 per second on faster disks.

Memory\Pages Input/sec
    Description: Tracks page faults requiring data to be read from the disk.
    Unlike output pages, the application must wait for this data to be read,
    so application response time can be slowed if this number is high. Check
    the disk % Idle Time to see whether the page file drive is so busy that
    paging performance might be adversely affected.
    Suggested threshold: Varies with disk hardware and system performance.
    More than 20 might be a problem on slow disk drives, whereas faster drives
    can handle much more.

Memory\Pool Nonpaged Bytes
    Description: Tracks memory that is always resident in physical memory.
    Primarily device drivers use this memory. The value of this counter should
    be relatively stable. An increasing value over time might indicate a pool
    memory leak.
    Suggested threshold: Investigate if Pool Nonpaged Bytes is running at
    > 80% of its maximum configured pool size.

Memory\Pool Paged Bytes
    Description: Tracks memory that can be paged out of physical memory. Any
    service or application can use this memory. The value of this counter
    should be relatively stable. An increasing value over time might indicate
    a pool memory leak.
    Suggested threshold: Investigate if Pool Paged Bytes is running at > 70%
    of its maximum configured pool size.

Process(_Total)\Private Bytes
    Description: Monitors the sum of all private virtual memory allocated by
    all the processes running on that system. If this value increases over a
    long period of time, an application might be leaking memory.
    Suggested threshold: Investigate if the trend of this value is increasing
    over time.

LogicalDisk(pagefile drive)\% Idle Time
    Description: Monitors the idle time of the drive (or drives) on which the
    paging file resides. If this disk is too busy (that is, has a very low
    idle time), virtual memory operations to that disk will slow down.
    Suggested threshold: Investigate paging drives with less than 50% Idle
    Time.

LogicalDisk(pagefile drive)\Split I/O/sec
    Description: Monitors the rate at which Split I/Os are occurring on the
    drive (or drives) with the paging file(s). A higher than normal rate of
    Split I/Os on a drive with a paging file can cause virtual memory
    operations to that disk to take longer.
    Suggested threshold: The threshold value for this counter depends on the
    disk drive type and configuration.
If the paging file shows a high degree of usage, the paging file might be too small for
the applications you are running on the system. Likewise, if the disk that holds the paging file or files is too busy, overall performance can suffer.
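The built-in Typeperf tool offers a quick way to sample several of these counters at once for comparison against your baseline. The following is a sketch only; the 15-second interval, the hour of samples, and the output path are illustrative:

typeperf "\Memory\Available MBytes" "\Memory\Pages/sec" "\Memory\Committed Bytes" "\Process(_Total)\Private Bytes" -si 15 -sc 240 -f CSV -o C:\Perflogs\memcheck.csv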
Memory leaks in applications are indicated in several places. First, you might get an
error message indicating the system is low on virtual memory. If you have logged
performance data on the computer over a period of time, a memory leak in an application process will show up as a gradual increase in the value of the Memory\Committed Bytes counter, as well as an increase in the value of the Process(_Total)\Private
Bytes counter. A memory leak in one process might also cause excessive paging by
squeezing other process working sets out of RAM. An example of this condition is discussed later in “Virtual Memory Leaks.”
What to Check Next When Troubleshooting Memory Performance
If you determine that the system needs more physical memory, you can either install
more physical memory or move applications to another computer to relieve the excessive load. If you decide that you do not have a physical memory shortage or problem,
the next step is to evaluate another component—for example, the processor or disk—
depending on the value of other performance counters.
If memory seems to be a problem, the next step is to determine the specific cause.
Sometimes in a large Terminal Services environment, there is simply too much
demand for memory from multiple application processes. Other times, you can isolate
a memory problem to a specific process, and most likely that problem will be the
result of one of two situations: either an application needs more memory than is
installed on the system, or an application has a problem that needs to be fixed. If the
memory usage of a specific process rises to a certain level and stabilizes, you can
increase available virtual memory by expanding the paging file. Eventually, if the
application is not defective, its memory consumption should stabilize. If the amount
of physical memory installed is inadequate, the application might perform poorly,
and it might cause a significant amount of paging to disk. However, its consumption
of virtual memory normally will not increase forever.
If the application is defective and continually consumes more and more memory
resources, its memory usage, as indicated by the Process(ProcessName)\Private Bytes
performance counter, will constantly increase over time. This situation is known as a
memory leak—where memory resources are reserved and used but not released when
they are no longer required. Depending on the severity of the leak, this condition of
using memory but not releasing it can consume all available memory resources in a matter of days, hours, or even minutes. Generally, the serious leaks are caught before an
application is released to customers, so only the slow leaks are left to be noticed by the
end users of the production application. Consequently, a counter log file that tracks
memory usage over a long period of time is the best way to detect slow memory leaks
before they cause problems in the rest of the system. The example performance monitoring procedures recommended in Chapter 4, “Performance Monitoring Procedures,”
incorporate the measurements you need to identify a system with a memory leak.
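Logman can create such a long-running counter log from the command line. A minimal sketch, assuming an illustrative log name, a five-minute sample interval, and a local output directory:

logman create counter LeakWatch -c "\Memory\Committed Bytes" "\Process(_Total)\Private Bytes" -si 00:05:00 -o C:\Perflogs\LeakWatch
logman start LeakWatch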
If the paging file is fragmented or is located on a disk that is heavily used by other
applications, memory performance can be degraded even though there is no shortage
of either physical or virtual memory. A consistently low value of the LogicalDisk(PageFileDrive)\% Idle Time performance counter indicates that the disk is very busy,
which might contribute to degraded memory performance. Moving the paging file to
another disk drive might improve performance in this situation. If the value of the
LogicalDisk(PageFileDrive)\Split I/O/sec counter is high on the drive that contains
the paging file, the disk or the paging file might be fragmented. If either is the case,
accessing that disk is going to take longer than it would without the fragmentation.
Defragmenting the drive or moving the paging file to a less crowded disk should
improve memory performance.
Tip The built-in Disk Defragmenter tool does not defragment paging files. To defragment the paging file, either use a third-party defragmenter that supports this feature or follow the procedure outlined in article 229850, “Analyze Operation Suggests Defragmenting Disk Multiple Times,” in the Microsoft Knowledge Base at http://support.microsoft.com.
Excessive Paging
You want to install enough RAM to prevent excessive paging from impacting performance, but you should not attempt to install enough RAM to eliminate paging activity
completely. In a virtual memory computer system, some page fault behavior—for
instance, when a program first begins to execute—is inevitable. Modified virtual pages
in memory have to be updated on disk eventually, so some amount of Page Writes/sec
is also inevitable.
Two types of serious performance problems can occur if too little RAM is available:

■   Too many page faults. Too many page faults lead to excessive program
    execution delays.

■   Disk contention. Virtual memory machines that sustain high page-fault
    rates might also encounter disk performance problems.
Too many page faults is the more straightforward performance problem associated
with virtual memory and paging. Unfortunately, it is also the one that requires the
most intense data gathering to diagnose. A commonly encountered problem occurs
when disk performance suffers because of excessive paging operations to disk. Even
though it is a secondary effect, it is often the easier condition to recognize. The extra
disk I/O activity resulting from paging can easily interfere with applications attempting to access their data files stored on the same disk as the paging file.
Table 5-4 Primary Indicators of a Memory Bottleneck

Memory\Pages/sec
    Primary indicator: Paging operations to disk (Pages input + Pages output).
    Threshold values: Pages/sec × 4K page size > 70% of the total number of
    Logical Disk Bytes/sec to the disk(s) where the paging file is located.

Memory\Page Reads/sec
    Primary indicator: Page faults that were resolved by reading the disk.
    Threshold values: Sustained values > 50% of the total number of Logical
    Disk operations to the disk(s) where the paging file is located.

Memory\Available Bytes
    Primary indicator: Free (unallocated) RAM.
    Threshold values: Available Bytes < 5% of the size of RAM is likely to
    mean there is a shortage of physical memory.
Table 5-4 shows three primary indicators of a shortage of RAM, and these indicators
are all interrelated. The overall paging rate to disk includes both Page Reads/sec and
Page Writes/sec. Because the operating system must ultimately write changed pages
to disk, it is not possible to avoid most page write operations. Page Reads/sec—the
hard page fault rate—is the measurement most sensitive to a shortage of RAM. As Available Bytes—the pool of unallocated RAM—becomes depleted, the number of hard page
faults that occur normally increases. The total number of Pages/sec that the system
can sustain is a function of disk bandwidth. When the disk or disks where the paging
files are located become saturated, the system reaches an upper limit for sustainable
paging activity to disk. However, because paging operations consist of a mixture of
sequential and random disk I/Os, you will discover that this limit on paging activity is
quite elastic. The performance of disks on sequential and random access workloads is
discussed further in the section entitled “Establishing a Disk Drive Performance Baseline” later in this chapter.
Because the limit on the number of Pages/sec that the system can read and write is
elastic, no simple rule-of-thumb approach is adequate for detecting thrashing, the classic symptom of a machine that is memory constrained. A better approach is to compare the amount of disk traffic resulting from paging to overall disk operations. If
paging accounts for only 20 percent or less of total disk operations, the impact of virtual memory management is tolerable. If paging accounts for 70 percent or more of all
disk operations, the situation is probably not tolerable.
Figure 5-10 illustrates these points, showing a system that is paging heavily during a 10-minute interval. The number of available bytes plummets about 30 seconds into this
monitoring session, when the Resource Kit resource consumer tool, Consume.exe, is
launched to create a shortage of RAM on this machine:
C:\>consume -physical-memory -time 600
Consume: Message: Time out after 600 seconds.
Consume: Message: Successfully assigned process to a job object ...
Consume: Message: Total physical memory: 1FF6F000
Consume: Message: Available physical memory: 7036000
Consume: Message: Will attempt to create 1 baby consumers ...
Consume: Message: Sleeping ...
Figure 5-10 A system that is paging heavily during a 10-minute interval
When the consume process acquired 600 MB of virtual memory to create a memory
shortage, available bytes dropped to near zero. As server applications continued to
execute, they encountered serious paging delays. The operating system was forced to
perform 37 Page Reads/sec on average during this period of shortage. A spike of over
300 Page Reads/sec occurred during one interval, along with several peak periods in
which the number of Page Reads/sec exceeded 200.
At this rate, paging activity will consume almost all the available disk bandwidth. You
can see this consumption better in Figure 5-11, which compares Page Reads/sec and
Page Writes/sec to Disk Transfers/sec. It is apparent that almost all disk activity at this
point results from page fault resolution. This is a classic illustration of a virtual memory system that is paging heavily, possibly to the detriment of its designated I/O workload execution.
Figure 5-11 Comparing Page Reads/sec and Page Writes/sec to Disk Transfers/sec
Because the operating system often attempts bulk paging operations, especially on
Page Writes, comparing bytes moved for paging operations to total Disk Bytes/sec is a
better way to determine the amount of disk capacity devoted to paging. Multiply the
Memory\Pages/sec counter by 4096, the size of an IA-32 page, to calculate bytes
moved by disk paging. Compare that value to Logical Disk\Disk Bytes/sec. If the percentage of the available disk bandwidth devoted to paging operations exceeds 50 percent, paging potentially will impact application performance. If the percentage of the
available disk bandwidth devoted to memory management–initiated paging operations exceeds 70 percent, the system is probably experiencing excessively high paging
rates, that is, performing too much work in virtual memory management and not
enough of its application-oriented physical disk work.
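A worked example with hypothetical round numbers shows how the comparison is made:

Memory\Pages/sec                 = 150
Bytes paged per second           = 150 × 4096 = 614,400
Logical Disk\Disk Bytes/sec      = 800,000
Paging share of disk bandwidth   = 614,400 / 800,000 ≈ 77%

At roughly 77 percent, this hypothetical system would be over the 70 percent threshold and probably paging excessively.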
The standard remedy for a system that is paging too much is to add RAM and increase
the pool of Available Bytes. This will fix most performance problems resulting from
excessive paging. If you cannot add RAM to the machine, other remedies include configuring faster paging disks, more paging disks, or a combination of both. Any of the
disk tuning strategies discussed later that improve disk service time can also help,
including defragmentation of the paging file disk.
Caution Some paging activity cannot be eliminated entirely by adding more RAM.
Page Reads that occur at process initialization cannot be avoided by adding more
RAM. Because the operating system must ensure that modified pages are current on
the paging file, many Page Writes cannot be avoided either. Demand zero paging is
the allocation of virtual memory for new data areas. Demand zero paging occurs at
application startup and, in some applications, as new data structures are allocated
during run time. Adding memory will often reduce paging and improve memory efficiency, but generally does not reduce the demand zero fault rate.
Available Memory
The other primary indicator of a memory bottleneck is that the pool of Available Bytes
has become depleted. Understanding how the size of the Available Bytes pool affects
paging is very important. This relationship is so important that three Available Bytes
counters are available: one that counts bytes, one that counts kilobytes, and a third
that counts megabytes. Page trimming by the virtual memory manager is triggered by
a shortage of available bytes. Page trimming attempts to replenish the pool of Available Bytes by identifying virtual memory pages that have not been referenced for a relatively long time. When page trimming is effective, older pages that are trimmed from
process working sets are not needed again soon. These older pages are replaced by
more active, recent pages. Trimmed pages are marked in transition and remain in
RAM for an extra period of time to reduce the amount of paging to disk that occurs.
Dirty pages must be written to disk if more pages are on the Modified List than the
list’s threshold value allows, so there is usually some cost associated with page trimming even though the process is very effective.
If there is a chronic shortage of Available Bytes, page trimming loses effectiveness and
leads to more paging operations to disk. There is little room in RAM for pages marked
in transition; therefore, when recently trimmed pages are referenced again, they must
be accessed from disk instead of RAM. When dirty pages are trimmed frequently,
more frequent updates of the paging file are scheduled. This paging to disk interferes
with application-directed I/O operations to the same disk.
Following a round of page trimming, if the memory shortage persists, the system is
probably in store for more page trimming. Figure 5-12 zooms in on the value of Available MBytes during the same period shown in Figure 5-10, when Consume.exe was
active. Notice that following the initial round of page trimming to replenish the Available Bytes pool, the value of Available MBytes increases, but in an oscillating manner
depending on how effective the previous round of page trimming was. It is apparent
that additional rounds of page trimming are initiated because the physical memory
shortage persists. When a persistent memory shortage occurs, page trimming can
combat the problem but does little to relieve the shortage for good. The only effective
way to relieve the shortage is to add RAM to the machine.
Figure 5-12 The value of Available MBytes oscillates between rounds of page trimming
As a general rule, you can avoid a memory shortage by ensuring that Available Bytes
does not drop below 5 percent of RAM for an extended period of time. However, this
rule of thumb can be fallible if you are running server applications that manage their
own working sets. Applications that can manage their own working sets include
Microsoft Internet Information Services (IIS) 6.0, Microsoft Exchange Server, and
Microsoft SQL Server. As described in Chapter 1, “Performance Monitoring Overview,” these applications interact with the virtual memory manager to expand their
working sets when free RAM is ample and contract them when RAM is depleted.
These applications rely on RAM-resident cache buffers to reduce the amount of I/O
they must direct to disk. When you are running server applications that manage their
own working sets, RAM will always look full. Available Bytes will remain within a narrow range, and the only reliable indicator of a RAM shortage will be a combination of
more paging to disk and less effective application caching.
Memory Allocation
After you determine that physical memory is saturated and paging is excessive,
exploring the way physical memory is allocated is often useful. Figure 5-13 shows the
four major counters that tell you how RAM is allocated.
Figure 5-13 The four major counters indicate how RAM is allocated
The following memory allocation counters are shown against a scale that reflects the size of RAM on the machine in Figure 5-13, namely 1 GB:

■   Memory\Available Bytes. RAM that is currently available for immediate
    allocation. Available Bytes is the sum of the Zero, Free, and Standby
    lists.

■   Memory\Cache Bytes. The pageable working set associated with allocated
    system memory. Cache Bytes is the sum of four counters: System Cache
    Resident Bytes, System Driver Resident Bytes, System Code Resident Bytes,
    and Pool Paged Resident Bytes.

■   Memory\Pool Nonpaged Bytes. The current nonpageable pool allocation.

■   Process(_Total)\Working Set. The sum of each process’s current working
    set. Resident pages from shared DLLs are counted as part of every process
    address space that loaded the DLL.
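All four allocation counters can be sampled together with Typeperf to watch how the division of RAM shifts over time; the interval and sample count in this sketch are illustrative:

typeperf "\Memory\Available Bytes" "\Memory\Cache Bytes" "\Memory\Pool Nonpaged Bytes" "\Process(_Total)\Working Set" -si 15 -sc 40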
You do not have to use the default scaling factor associated with these counters; they
have all been adjusted to use a scale factor of .000001. Presentation in this s