The EXPERT’s VOICE® in Oracle

Secrets of the Oracle Database
Advanced administration, tuning, and troubleshooting using undocumented features

Norbert Debes
Foreword by Guy Harrison
Secrets of the Oracle Database
Copyright © 2009 by Norbert Debes
All rights reserved. No part of this work may be reproduced or transmitted in any form or by any means,
electronic or mechanical, including photocopying, recording, or by any information storage or retrieval
system, without the prior written permission of the copyright owner and the publisher.
ISBN-13 (pbk): 978-1-4302-1952-1
ISBN-13 (electronic): 978-1-4302-1953-8
Printed and bound in the United States of America 9 8 7 6 5 4 3 2 1
Trademarked names may appear in this book. Rather than use a trademark symbol with every occurrence
of a trademarked name, we use the names only in an editorial fashion and to the benefit of the trademark
owner, with no intention of infringement of the trademark.
Lead Editor: Jonathan Gennick
Editorial Board: Clay Andres, Steve Anglin, Mark Beckner, Ewan Buckingham, Tony Campbell, Gary
Cornell, Jonathan Gennick, Michelle Lowman, Matthew Moodie, Jeffrey Pepper, Frank Pohlmann,
Ben Renow-Clarke, Dominic Shakeshaft, Matt Wade, Tom Welsh
Project Manager: Beth Christmas
Copy Editor: Lisa Hamilton
Associate Production Director: Kari Brooks-Copony
Production Editor: Kelly Winquist
Compositor: Susan Glinert
Proofreader: Greg Teague
Indexer: Norbert Debes
Artist: April Milne
Cover Designer: Kurt Krames
Manufacturing Director: Tom Debolski
Distributed to the book trade worldwide by Springer-Verlag New York, Inc., 233 Spring Street, 6th Floor,
New York, NY 10013. Phone 1-800-SPRINGER, fax 201-348-4505, e-mail orders-ny@springer-sbm.com, or
visit http://www.springeronline.com.
For information on translations, please contact Apress directly at 2855 Telegraph Avenue, Suite 600,
Berkeley, CA 94705. Phone 510-549-5930, fax 510-549-5939, e-mail info@apress.com, or visit
http://www.apress.com.
Apress and friends of ED books may be purchased in bulk for academic, corporate, or promotional use.
eBook versions and licenses are also available for most titles. For more information, reference our Special
Bulk Sales–eBook Licensing web page at http://www.apress.com/info/bulksales.
The information in this book is distributed on an “as is” basis, without warranty. Although every precaution
has been taken in the preparation of this work, neither the author(s) nor Apress shall have any liability to
any person or entity with respect to any loss or damage caused or alleged to be caused directly or indirectly
by the information contained in this work.
The source code for this book is available to readers at http://www.apress.com. You will need to answer
questions pertaining to this book in order to successfully download the code.
Contents at a Glance
Foreword . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xix
About the Author . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxi
About the Foreword Writer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxiii
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxv
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxvii
PART 1 ■■■ Initialization Parameters
■CHAPTER 1 Partially Documented Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
■CHAPTER 2 Hidden Initialization Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
PART 2 ■■■ Data Dictionary Base Tables
■CHAPTER 3 Introduction to Data Dictionary Base Tables . . . . . . . . . . . . . . . . . . 41
■CHAPTER 4 IND$, V$OBJECT_USAGE, and Index Monitoring . . . . . . . . . . . . . . . 45
PART 3 ■■■ Events
■CHAPTER 5 Event 10027 and Deadlock Diagnosis . . . . . . . . . . . . . . . . . . . . . . . 57
■CHAPTER 6 Event 10046 and Extended SQL Trace . . . . . . . . . . . . . . . . . . . . . . . 61
■CHAPTER 7 Event 10053 and the Cost Based Optimizer . . . . . . . . . . . . . . . . . . . 63
■CHAPTER 8 Event 10079 and Oracle Net Packet Contents . . . . . . . . . . . . . . . . . 87
PART 4 ■■■ X$ Fixed Tables
■CHAPTER 9 Introduction to X$ Fixed Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
■CHAPTER 10 X$BH and Latch Contention . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
■CHAPTER 11 X$KSLED and Enhanced Session Wait Data . . . . . . . . . . . . . . . . . 113
■CHAPTER 12 X$KFFXP and ASM Metadata . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
PART 5 ■■■ SQL Statements
■CHAPTER 13 ALTER SESSION/SYSTEM SET EVENTS . . . . . . . . . . . . . . . . . . . . . 129
■CHAPTER 14 ALTER SESSION SET CURRENT_SCHEMA . . . . . . . . . . . . . . . . . . . 135
■CHAPTER 15 ALTER USER IDENTIFIED BY VALUES . . . . . . . . . . . . . . . . . . . . . . 143
■CHAPTER 16 SELECT FOR UPDATE SKIP LOCKED . . . . . . . . . . . . . . . . . . . . . . . 149
PART 6 ■■■ Supplied PL/SQL Packages
■CHAPTER 17 DBMS_BACKUP_RESTORE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
■CHAPTER 18 DBMS_IJOB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
■CHAPTER 19 DBMS_SCHEDULER . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
■CHAPTER 20 DBMS_SYSTEM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
■CHAPTER 21 DBMS_UTILITY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
PART 7 ■■■ Application Development
■CHAPTER 22 Perl DBI and DBD::Oracle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
■CHAPTER 23 Application Instrumentation and End-to-End Tracing . . . . . . . . . . 251
PART 8 ■■■ Performance
■CHAPTER 24 Extended SQL Trace File Format Reference . . . . . . . . . . . . . . . . . 271
■CHAPTER 25 Statspack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311
■CHAPTER 26 Integrating Extended SQL Trace and AWR . . . . . . . . . . . . . . . . . . 345
■CHAPTER 27 ESQLTRCPROF Extended SQL Trace Profiler . . . . . . . . . . . . . . . . 351
■CHAPTER 28 The MERITS Performance Optimization Method . . . . . . . . . . . . . . 371
PART 9 ■■■ Oracle Net
■CHAPTER 29 TNS Listener IP Address Binding and IP=FIRST . . . . . . . . . . . . . . 401
■CHAPTER 30 TNS Listener TCP/IP Valid Node Checking . . . . . . . . . . . . . . . . . . 413
■CHAPTER 31 Local Naming Parameter ENABLE=BROKEN . . . . . . . . . . . . . . . . 419
■CHAPTER 32 Default Host Name in Oracle Net Configurations . . . . . . . . . . . . . 423
PART 10 ■■■ Real Application Clusters
■CHAPTER 33 Session Disconnection, Load Rebalancing, and TAF . . . . . . . . . . 429
■CHAPTER 34 Removing the RAC Option Without Reinstalling . . . . . . . . . . . . . . 445
PART 11 ■■■ Utilities
■CHAPTER 35 OERR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 459
■CHAPTER 36 Recovery Manager Pipe Interface . . . . . . . . . . . . . . . . . . . . . . . . . 465
■CHAPTER 37 ORADEBUG SQL*Plus Command . . . . . . . . . . . . . . . . . . . . . . . . . . 479
PART 12 ■■■ Appendixes
■APPENDIX A Enabling and Disabling DBMS Options . . . . . . . . . . . . . . . . . . . . . 495
■APPENDIX B Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 497
■APPENDIX C Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 499
■INDEX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 507
Contents
Foreword . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xix
About the Author . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxi
About the Foreword Writer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxiii
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxv
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxvii
PART 1 ■■■ Initialization Parameters
■CHAPTER 1 Partially Documented Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
AUDIT_SYSLOG_LEVEL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Syslog Facility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
Introduction to Auditing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
Using AUDIT_SYSLOG_LEVEL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Auditing Non-Privileged Users . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Lessons Learned . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
PGA_AGGREGATE_TARGET . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
Introduction to Automatic PGA Memory Management . . . . . . . . . . . . 8
Misconceptions About PGA_AGGREGATE_TARGET . . . . . . . . . . . . . 10
Researching PGA_AGGREGATE_TARGET . . . . . . . . . . . . . . . . . . . . . . 11
Creating a Large Table with a Pipelined Table Function . . . . . . . . . 11
V$SQL_WORKAREA_ACTIVE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
_PGA_MAX_SIZE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
_SMM_MAX_SIZE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
_SMM_PX_MAX_SIZE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
Shared Server . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
Parallel Execution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
Lessons Learned . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
EVENT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
Leveraging Events at the Instance-Level . . . . . . . . . . . . . . . . . . . . . . 22
Case Study. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
OS_AUTHENT_PREFIX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
OPS$ Database Users and Password Authentication . . . . . . . . . . . . 23
Case Study. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
Lessons Learned . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
Source Code Depot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
■CHAPTER 2 Hidden Initialization Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
Trace File Permissions and _TRACE_FILES_PUBLIC . . . . . . . . . . . . . . . . 30
ASM Test Environment and _ASM_ALLOW_ONLY_RAW_DISKS . . . . . . 31
ASM Hidden Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
Setting Up Oracle Clusterware for ASM . . . . . . . . . . . . . . . . . . . . . . . 33
ASM Instance Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
Disk Failure Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
Source Code Depot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
PART 2 ■■■ Data Dictionary Base Tables
■CHAPTER 3 Introduction to Data Dictionary Base Tables . . . . . . . . . . . . . . . . . . 41
Large Objects and PCTVERSION vs. RETENTION . . . . . . . . . . . . . . . . . . . . 42
■CHAPTER 4 IND$, V$OBJECT_USAGE, and Index Monitoring . . . . . . . . . . . . . . . 45
Schema Restriction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
Index Usage Monitoring Case Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
Function MONITOR_SCHEMA_INDEXES . . . . . . . . . . . . . . . . . . . . . . . 47
Enabling Index Monitoring on Schema HR . . . . . . . . . . . . . . . . . . . . . 48
Lessons Learned . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
Source Code Depot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
PART 3 ■■■ Events
■CHAPTER 5 Event 10027 and Deadlock Diagnosis . . . . . . . . . . . . . . . . . . . . . . . 57
Deadlocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
Event 10027 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
■CHAPTER 6 Event 10046 and Extended SQL Trace . . . . . . . . . . . . . . . . . . . . . . . 61
■CHAPTER 7 Event 10053 and the Cost Based Optimizer . . . . . . . . . . . . . . . . . . . 63
Trace File Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
Case Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
Query Blocks and Object Identifiers . . . . . . . . . . . . . . . . . . . . . . . . . . 68
Query Transformations Considered . . . . . . . . . . . . . . . . . . . . . . . . . . 68
Legend . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
Results of Bind Variable Peeking . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
Optimizer Parameters. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
System Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
Object Statistics for Tables and Indexes . . . . . . . . . . . . . . . . . . . . . . 77
Single Table Access Path and Cost . . . . . . . . . . . . . . . . . . . . . . . . . . 79
Join Orders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
Execution Plan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
Predicate Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
Hints and Query Block Names . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
Source Code Depot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
■CHAPTER 8 Event 10079 and Oracle Net Packet Contents . . . . . . . . . . . . . . . . . 87
Case Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
PART 4 ■■■ X$ Fixed Tables
■CHAPTER 9 Introduction to X$ Fixed Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
X$ Fixed Tables and C Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
Layered Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
Granting Access to X$ Tables and V$ Views . . . . . . . . . . . . . . . . . . . . . . . 96
Drilling Down from V$ Views to X$ Fixed Tables . . . . . . . . . . . . . . . . . . . 97
Drilling Down from V$PARAMETER to the Underlying X$ Tables . . . . . . . . 97
Relationships Between X$ Tables and V$ Views . . . . . . . . . . . . . . . . . . . 102
Source Code Depot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
■CHAPTER 10 X$BH and Latch Contention . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
Source Code Depot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
■CHAPTER 11 X$KSLED and Enhanced Session Wait Data . . . . . . . . . . . . . 113
Drilling Down from V$SESSION_WAIT . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
An Improved View . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
Source Code Depot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
■CHAPTER 12 X$KFFXP and ASM Metadata . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
X$KFFXP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
Salvaging an SPFILE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
Mapping Segments to ASM Storage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
PART 5 ■■■ SQL Statements
■CHAPTER 13 ALTER SESSION/SYSTEM SET EVENTS . . . . . . . . . . . . . . . . . . 129
Tracing Your Own Session . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
ALTER SESSION SET EVENTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
ALTER SYSTEM SET EVENTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
ALTER SESSION/SYSTEM SET EVENTS and Diagnostic Dumps . . . . . . 132
Immediate Dumps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
■CHAPTER 14 ALTER SESSION SET CURRENT_SCHEMA . . . . . . . . . . . . . . . . 135
Privilege User vs. Schema User . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
Creating Database Objects in a Foreign Schema . . . . . . . . . . . . . . 137
Restrictions of ALTER SESSION SET CURRENT_SCHEMA . . . . . . . . . . . 138
Advanced Queuing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
RENAME . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
Private Database Links. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
Stored Outlines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
■CHAPTER 15 ALTER USER IDENTIFIED BY VALUES . . . . . . . . . . . . . . . . . . . . 143
The Password Game . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
Locking Accounts with ALTER USER IDENTIFIED BY VALUES . . . . . . . . 145
ALTER USER and Unencrypted Passwords . . . . . . . . . . . . . . . . . . . . . . . 146
■CHAPTER 16 SELECT FOR UPDATE SKIP LOCKED . . . . . . . . . . . . . . . . . . . . . . 149
Advanced Queuing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
Contention and SELECT FOR UPDATE SKIP LOCKED . . . . . . . . . . . . . . . 151
DBMS_LOCK—A Digression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
Source Code Depot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
PART 6 ■■■ Supplied PL/SQL Packages
■CHAPTER 17 DBMS_BACKUP_RESTORE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
Recovery Manager . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
Disaster Recovery Case Study with Tivoli Data Protection for Oracle . . . . . . . . . 170
Source Code Depot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
■CHAPTER 18 DBMS_IJOB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
Introduction to DBMS_JOB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
BROKEN Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
FULL_EXPORT Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
REMOVE Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
RUN Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
Source Code Depot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
■CHAPTER 19 DBMS_SCHEDULER . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
Running External Jobs with the Database Scheduler . . . . . . . . . . . . . . . 181
Exit Code Handling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
Standard Error Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
External Jobs on UNIX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
Removal of Environment Variables . . . . . . . . . . . . . . . . . . . . . . . . . . 186
Command Line Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
External Jobs and Non-Privileged Users . . . . . . . . . . . . . . . . . . . . . 190
External Jobs on Windows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
Command Line Argument Handling . . . . . . . . . . . . . . . . . . . . . . . . . 192
Windows Environment Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
External Jobs and Non-Privileged Users . . . . . . . . . . . . . . . . . . . . . 193
Services Created by the ORADIM Utility . . . . . . . . . . . . . . . . . . . . . . 194
OracleJobScheduler Service . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
Source Code Depot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
■CHAPTER 20 DBMS_SYSTEM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
GET_ENV Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
KCFRMS Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
KSDDDT Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
KSDFLS Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
KSDIND Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
KSDWRT Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
READ_EV Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
SET_INT_PARAM_IN_SESSION Procedure . . . . . . . . . . . . . . . . . . . . . . . 205
SET_BOOL_PARAM_IN_SESSION Procedure . . . . . . . . . . . . . . . . . . . . . 207
SET_EV Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
SET_SQL_TRACE_IN_SESSION Procedure . . . . . . . . . . . . . . . . . . . . . . . 210
WAIT_FOR_EVENT Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
■CHAPTER 21 DBMS_UTILITY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
NAME_RESOLVE Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
Name Resolution and Extraction of Object Statistics . . . . . . . . . . . . . . . 218
Source Code Depot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
PART 7 ■■■ Application Development
■CHAPTER 22 Perl DBI and DBD::Oracle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
Circumnavigating Perl DBI Pitfalls . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
A Brief History of Perl and the DBI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224
Setting Up the Environment for Perl and the DBI . . . . . . . . . . . . . . . . . . 224
UNIX Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
Windows Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
Transparently Running Perl Programs on UNIX Systems . . . . . . . . . . . . 232
Transparently Running Perl Programs on Windows . . . . . . . . . . . . . . . . 233
Connecting to an ORACLE DBMS Instance . . . . . . . . . . . . . . . . . . . . . . . 235
DBI connect Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
Connecting Through the Bequeath Adapter . . . . . . . . . . . . . . . . . . . 237
Connecting Through the IPC Adapter . . . . . . . . . . . . . . . . . . . . . . . . 237
Connecting Through the TCP/IP Adapter . . . . . . . . . . . . . . . . . . . . . 239
Easy Connect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240
Connecting with SYSDBA or SYSOPER Privileges . . . . . . . . . . . . . . 240
Connecting with Operating System Authentication . . . . . . . . . . . . . 241
Connect Attributes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
Comprehensive Perl DBI Example Program . . . . . . . . . . . . . . . . . . . . . . . 244
Exception Handling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 248
Source Code Depot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
■CHAPTER 23 Application Instrumentation and End-to-End Tracing . . . . . . . . . . 251
Introduction to Instrumentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251
Case Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
JDBC End-to-End Metrics Sample Code . . . . . . . . . . . . . . . . . . . . . 254
Compiling the Program . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256
Instrumentation at Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256
Setting Up Tracing, Statistics Collection, and the Resource Manager . . . . . . 256
Using TRCSESS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261
TRCSESS and Shared Server . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263
Instrumentation and the Program Call Stack . . . . . . . . . . . . . . . . . . . . . . 266
Source Code Depot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267
PART 8 ■■■ Performance
■CHAPTER 24 Extended SQL Trace File Format Reference . . . . . . . . . . . . . 271
Introduction to Extended SQL Trace Files . . . . . . . . . . . . . . . . . . . . . . . . 271
SQL and PL/SQL Statements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272
Recursive Call Depth . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272
Database Calls . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273
Parsing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274
PARSING IN CURSOR Entry Format . . . . . . . . . . . . . . . . . . . . . . . . . . 275
PARSE Entry Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 276
PARSE ERROR Entry Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278
EXEC Entry Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278
FETCH Entry Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278
Execution Plan Hash Value . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
Plan Hash Value Case Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
CLOSE Entry Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
COMMIT and ROLLBACK . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 284
UNMAP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
Execution Plans, Statistics, and the STAT Entry Format . . . . . . . . . . . . . 285
STAT Entry Format in Oracle9i . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 286
STAT Entry Format in Oracle10g and Oracle11g . . . . . . . . . . . . . . 286
Wait Events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288
WAIT Entry Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288
WAIT in Oracle9i . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289
WAIT in Oracle10g and Oracle11g . . . . . . . . . . . . . . . . . . . . . . . . . . 290
Bind Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 290
BINDS Entry Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291
Statement Tuning, Execution Plans, and Bind Variables . . . . . . . . 295
Miscellaneous Trace File Entries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301
Session Identification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 302
Service Name Identification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 302
Application Instrumentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303
ERROR Entry Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 306
Application Instrumentation and Parallel Execution Processes . . . 308
■CHAPTER 25 Statspack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311
Introduction to Statspack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311
Retrieving the Text of Captured SQL Statements . . . . . . . . . . . . . . 313
Accessing STATS$SQLTEXT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317
Capturing SQL Statements with Formatting Preserved . . . . . . . . . 323
Undocumented Statspack Report Parameters . . . . . . . . . . . . . . . . . . . . . 324
Statspack Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325
Finding Expensive Statements in a Statspack Repository . . . . . . . . . . . 330
Identifying Used Indexes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331
Execution Plans for Statements Captured with SQL Trace . . . . . . . . . . 331
Finding Snapshots with High Resource Utilization . . . . . . . . . . . . . . . . . 334
High CPU Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 335
High DB Time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 337
Importing Statspack Data from Another Database . . . . . . . . . . . . . . . . . 340
Source Code Depot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343
■CHAPTER 26 Integrating Extended SQL Trace and AWR . . . . . . . . . . . . . . 345
Retrieving Execution Plans . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345
Lessons Learned . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 348
Source Code Depot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 349
■CHAPTER 27 ESQLTRCPROF Extended SQL Trace Profiler . . . . . . . . . . . . 351
Categorizing Wait Events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 352
Calculating Response Time and Statistics . . . . . . . . . . . . . . . . . . . . . . . . 353
Case Study. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 354
Running the Perl Program . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 354
Calculating Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 357
Calculating Response Time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 357
ESQLTRCPROF Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 358
Command Line Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 359
ESQLTRCPROF Report Sections . . . . . . . . . . . . . . . . . . . . . . . . . . . . 360
Lessons Learned . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 369
Source Code Depot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 370
■CHAPTER 28 The MERITS Performance Optimization Method . . . . . . . . 371
Introduction to the MERITS Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 371
Measurement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 372
Measurement Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 372
Assessment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 377
Resource Profiles and Performance Assessment Tools . . . . . . . . . 378
Reproduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 379
Improvement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 380
Extrapolation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 380
Installation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 381
MERITS Method Case Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 381
Phase 1—Measurement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 382
Phase 2—Assessment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 382
Phase 3—Reproduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 389
Phase 4—Improvement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 392
Phase 5—Extrapolation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 397
Phase 6—Installation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 397
Lessons Learned . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 398
Source Code Depot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 398
PART 9 ■■■ Oracle Net
■CHAPTER 29 TNS Listener IP Address Binding and IP=FIRST . . . . . . . . 401
Introduction to IP Address Binding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 401
Multihomed Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 403
IP=FIRST Disabled . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 405
Host Name . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 405
Loopback Adapter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 407
Boot IP Address . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 407
Service IP Address . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 408
IP=FIRST Enabled . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 409
Lessons Learned . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 410
■CHAPTER 30 TNS Listener TCP/IP Valid Node Checking . . . . . . . . . . . . . . 413
Introduction to Valid Node Checking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 413
Enabling and Modifying Valid Node Checking at Runtime . . . . . . . . . . . 415
■CHAPTER 31 Local Naming Parameter ENABLE=BROKEN . . . . . . . . . . . . 419
Node Failure and the TCP/IP Protocol . . . . . . . . . . . . . . . . . . . . . . . . . . . . 419
■CHAPTER 32 Default Host Name in Oracle Net Configurations . . . . . . . 423
Default Host Name . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 423
Disabling the Default Listener . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 425
PART 10 ■■■ Real Application Clusters
■CHAPTER 33 Session Disconnection, Load Rebalancing, and TAF . . . 429
Introduction to Transparent Application Failover . . . . . . . . . . . . . . . . . . . 429
ALTER SYSTEM DISCONNECT SESSION . . . . . . . . . . . . . . . . . . . . . . . . . . 430
SELECT Failover . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 431
Failover at the End of a Transaction . . . . . . . . . . . . . . . . . . . . . . . . . 435
Session Disconnection and DBMS_SERVICE . . . . . . . . . . . . . . . . . . . . . . 437
Setting Up Services with DBMS_SERVICE . . . . . . . . . . . . . . . . . . . . 437
Session Disconnection with DBMS_SERVICE and TAF . . . . . . . . . . 439
Lessons Learned . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 442
Source Code Depot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 443
■CHAPTER 34 Removing the RAC Option Without Reinstalling . . . . . . . . 445
Linking ORACLE Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 445
Case Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 447
Simulating Voting Disk Failure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 447
Removing the RAC Option with the Make Utility . . . . . . . . . . . . . . . 449
Conversion of a CRS Installation to Local-Only . . . . . . . . . . . . . . . . 451
Re-enabling CRS for RAC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 454
Lessons Learned . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 455
PART 11 ■■■ Utilities
■CHAPTER 35 OERR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 459
Introduction to the OERR Script . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 459
Retrieving Undocumented Events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 461
Source Code Depot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 463
■CHAPTER 36 Recovery Manager Pipe Interface . . . . . . . . . . . . . . . . . . . . . . . 465
Introduction to Recovery Manager . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 465
Introduction to DBMS_PIPE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 466
RMAN_PIPE_IF Package . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 467
RMAN_PIPE_IF Package Specification . . . . . . . . . . . . . . . . . . . . . . . . . . . 468
Using the Package RMAN_PIPE_IF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 468
Validating Backup Pieces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 475
Internode Parallel Backup and Restore . . . . . . . . . . . . . . . . . . . . . . . . . . 476
Source Code Depot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 477
■CHAPTER 37 ORADEBUG SQL*Plus Command . . . . . . . . . . . . . . . . . . . . . . . . . 479
Introduction to ORADEBUG . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 479
ORADEBUG Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 480
ORADEBUG Command Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 480
Attaching to a Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 481
ORADEBUG IPC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 483
ORADEBUG SHORT_STACK . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 485
Diagnostic Dumps. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 486
Lessons Learned . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 491
PART 12 ■■■ Appendixes
■APPENDIX A Enabling and Disabling DBMS Options . . . . . . . . . . . . . . . . . . 495
■APPENDIX B Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 497
■APPENDIX C Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 499
■INDEX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 507
Foreword
When you have more than 10 years’ experience in a technology, it sounds impressive.
However, when you’ve got more than 20 years’ experience, it starts to sound a bit sad. Sadly,
I’ve now been working closely with Oracle database software for about 21 years. The first Oracle
database release I worked with was 5.1.
In my early days, the Oracle documentation set consisted of a few slim manuals. The World
Wide Web was still five years in the future, and most of us didn’t even have access to the
USENET Internet groups such as comp.databases.oracle. There were no books published on
Oracle. Aside from the odd call to Oracle technical support, you were on your own.
From the beginning, Oracle seemed more of a mystery that needed solving than did other
software of the day such as VMS, UNIX, or (chuckle) MS-DOS. How does Oracle work? Why is
this SQL so slow? How can I make Oracle go faster? The urge to understand the secrets of Oracle
has driven many of us who’ve made Oracle software the basis for our careers.
Throughout the 90s, pioneers such as Anjo Kolk, Cary Millsap, and others sought to explain
the inner workings of Oracle and how they could be applied to improve performance and functionality. The Web emerged and with it a vibrant community for Oracle technologists to
exchange information. The standard Oracle documentation set grew with each release and
now—in Oracle Database 11g—comprises 140 books in the database set alone. The trickle of
technical books that emerged around the Oracle 5–6 timeframe has become a flood. Not
enough information became information overload.
You might think we would know it all now. However, Oracle reminds me of the TV series
Lost: with each new episode questions are answered but more new questions are posed. Just
when you think you understand how the database engine uses memory for sorting, Oracle
introduces Automatic PGA Memory Management; just when you think you understand latches,
Oracle introduces mutexes; and so on. The quest to understand Oracle—so as to make it work
faster and better—is never-ending.
That’s why I am so enthusiastic about Norbert’s book. Some books summarize and clarify
the documented behaviors of Oracle—Tom Kyte’s excellent Expert Oracle Database Architecture is
a perfect example: it contains a lot of material that can be found in the Oracle documentation
set and elsewhere, but organizes and summarizes the information. Other books, such as
Jonathan Lewis’s outstanding Cost-Based Oracle Fundamentals, become the definitive reference on a particular topic by merging documented information and original research. Norbert
has attempted something quite different: he has set out to illuminate important hidden aspects
of the Oracle software stack.
In this book, Norbert reveals heretofore undocumented internal algorithms, PL/SQL packages, parameters, debugging interfaces, and more. However, this isn’t a book of trivia: each
secret revealed has a definite useful application. I got a lot out of reading this book and I recommend it to all who, like me, seek to understand the secrets of Oracle.
Guy Harrison
About the Author
■NORBERT DEBES has more than 13 years of experience as an ORACLE
database administrator. He holds a master’s degree in computer science
from the University of Erlangen, Germany and is an Oracle8, Oracle8i,
and Oracle9i Certified Professional ORACLE Database Administrator.
For well over six years, he held different positions in technical roles at
Oracle Germany, among them team leader in Oracle Support Services
and technical account manager in Strategic Alliances. In his last role at
Oracle, he was responsible for promoting Real Application Clusters
on a technical level. During his tenure, he contributed to the Oracle9i SQL Reference and Real
Application Clusters manuals as well as Real Application Clusters training materials.
As early as 2000, he published an article on performance diagnosis with extended SQL
trace event 10046 by using a logon trigger and writing session statistics from V$SESSTAT and
V$SESSION_EVENT to a trace file with the package DBMS_SYSTEM. This article appeared in the
“DOAG News” magazine of the German Oracle User Group. Additional publications include
articles in trade journals as well as two books on Oracle9i, which he coauthored. He has given
numerous presentations on the ORACLE DBMS at trade fairs, such as Cebit, and the annual
German Oracle Users Conference.
Since 2002, he has been working as an independent consultant for large corporations in
the industrial, financial, automotive, and services sectors. His assignments include topics such
as Real Application Clusters, Data Guard, Streams, performance tuning, database security,
migration, Advanced Queuing, PL/SQL development, and Perl DBI scripting as well as RMAN
backup and recovery. On most of his assignments, he has the role of an administrator, performance engineer, or architect. However, he occasionally does software development and serves
as a trainer too. He was featured in the “Peer to Peer” section of the January/February 2005
edition of Oracle Magazine.
Right from the beginning of his quest into the ORACLE DBMS, he always wanted to know
exactly how things work. He would not be satisfied with superficial explanations, but demand
evidence. The passion to dig deeper served him well in acquiring extensive knowledge of the
ORACLE DBMS and occasionally makes him a restless researcher who may be working on a
topic from dusk until dawn when captured by the flow.
In his spare time, he likes to hike, snowboard, play basketball, and read non-fiction on topics
such as the emotional brain. Furthermore he is a passionate analog and digital photographer
and recently—having been intrigued by the vibrancy of stereoscopic capture for twenty years—
added a stereo camera to his lineup. (A stereo camera has two lenses and two shutters,
allowing it to capture three-dimensional images, thus simulating human binocular vision.)
About the Foreword Writer
■GUY HARRISON is a director of Development at Quest Software. He’s the author of several books
on database technology including Oracle SQL High Performance Tuning (Prentice Hall, 2001)
and Oracle Performance Survival Guide (Prentice Hall, September 2009). You can reach Guy at
http://www.guyharrison.net.
Acknowledgments
I am indebted to the following persons for their comments, suggestions, and encouragement:
Helga Debes, Lise Andreasen, Pete Finnigan, and William Kehoe. Special thanks go to Iggy
Fernandez for introducing me to Jonathan Gennick, and to Guy Harrison for contributing the
foreword. I would also like to thank all of my clients for the privilege of fulfilling assignments for
them and the trust placed in me.
Introduction
Secrets of the ORACLE Database brings together a wealth of information on undocumented
as well as incompletely documented features of the ORACLE database management system
(DBMS). It has been my goal to combine many of the hidden features of the ORACLE database
server into a single source. You will be hard-pressed to find the same density of material on
advanced, undocumented topics in another book. Certain topics addressed may also be found
in articles on the Internet, but I have striven to provide more background information and in-depth
examples than are usually available on the Internet. The book also contains a significant
amount of original material, such as the inclusion of think time in resource profiles for performance diagnosis, an emergency procedure for the conversion of a RAC installation to a single
instance installation, as well as the integration of Statspack, Active Workload Repository, and
Active Session History with SQL trace.
The book is intended to complement the vast documentation from Oracle Corporation as
well as articles found on Oracle’s Metalink support platform. Arguably, the omission of some
features from Oracle’s documentation might be considered a documentation bug. Many features,
especially among those for troubleshooting (e.g., events) and tracing, remain undocumented
on purpose and for good reason, since Oracle Corporation rightfully suspects that they might
backfire when used in the wrong situation or without fully understanding the implications of
their use. Such features are not the subject of this book either. Instead, this text is centered
on those undocumented features that provide significant benefit without compromising the
integrity or availability of databases.
In this book, a certain feature is said to be undocumented if the full text search of the documentation provided on the Oracle Technology Network (http://otn.oracle.com) web site does not yield any hint of the
feature’s existence. A feature is said to be partially documented if the full text search does reveal
that the feature exists, but significant aspects of the feature are undocumented, thus limiting
the usefulness of the feature. Incomplete documentation often causes the need to investigate
a feature, which constitutes a significant investment in time and thus money, to reveal the
undocumented aspects through trial and error, searching the Internet, or Oracle’s Metalink
support platform. A significant number of undocumented aspects unveiled in this text are not
addressed by Metalink articles.
This is a highly technical book. I have spared no effort in making the material as easily
accessible as possible by not assuming too much previous knowledge by the reader, adopting a
clear writing style, and presenting many examples. An occasional humorous remark serves to
intermittently stimulate the right brain and perhaps even trigger a grin, allowing the left analytical
brain to rest for a moment before tackling more technicalities.
Although this book is not expressly an ORACLE DBMS performance optimization book, it
has been my intention to offer a solid performance diagnostic method based on the analysis of
extended SQL trace data. To the best of my knowledge, this is the first book that covers the
Oracle10g and Oracle11g extended SQL trace file formats, which differ in several important
aspects from the format used by Oracle9i. I sincerely hope that the free extended SQL trace
profiler ESQLTRCPROF provided with the book will help to quickly diagnose and solve difficult
performance problems you might face. As far as I know, ESQLTRCPROF is the only profiler that
classifies the wait event SQL*Net message from client into unavoidable latency due to
client/server communication and think time due to non-database–related processing by the client.
This configurable feature of the profiler alone is immensely valuable in situations where proof
is needed that the ORACLE DBMS is not the cause of a performance problem. Since think time
cannot be optimized, except by recoding the application or other applications waited for, proper
identification of think time will also aid in estimating the maximum speedup attainable by
tuning interactions between client and DBMS instance.
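To make the idea of separating latency from think time concrete, the following minimal Perl sketch (illustrative only, not the actual ESQLTRCPROF implementation) splits each SQL*Net message from client wait at an assumed threshold of 5,000 microseconds: wait time up to the threshold is counted as client/server latency, and any excess as client think time. The WAIT entry layout follows the Oracle10g/Oracle11g trace format discussed in Chapter 24; the threshold value and the splitting rule are assumptions chosen for illustration.

    #!/usr/bin/env perl
    # Illustrative sketch only (not ESQLTRCPROF): split "SQL*Net message from
    # client" wait time in an extended SQL trace file into latency and think time.
    use strict;
    use warnings;

    my $threshold = 5000;    # microseconds; assumed value, tune per environment
    my ($latency, $think) = (0, 0);

    while (my $line = <>) {
        # Oracle10g/Oracle11g style WAIT entry, e.g.:
        # WAIT #2: nam='SQL*Net message from client' ela= 12890 ... tim=...
        next unless $line =~ /^WAIT #\d+: nam='SQL\*Net message from client' ela=\s*(\d+)/;
        my $ela = $1;    # elapsed wait time in microseconds
        if ($ela <= $threshold) {
            $latency += $ela;                # short wait: network round-trip only
        } else {
            $latency += $threshold;          # latency portion capped at threshold
            $think   += $ela - $threshold;   # excess attributed to think time
        }
    }
    printf "client/server latency: %.6f s, think time: %.6f s\n",
        $latency / 1_000_000, $think / 1_000_000;

Run against a trace file (for example, perl split_waits.pl mydb_ora_1234.trc, where the script name is hypothetical), it prints both components and thereby gives a quick upper bound on the speedup attainable by tuning client/DBMS interactions.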
In situations where it’s appropriate for an ORACLE database administrator to see past the
end of his or her nose, I include background information on operating systems, networking,
and programming. I have also devoted some sections to operating system tools that are useful
for troubleshooting or investigation. I hope you will agree that this leads to a broader understanding of the features discussed than could be attained by exclusively focusing on ORACLE
DBMS software and leaving interactions with the operating system on which it runs aside.
Given the vast amount of hidden parameters, undocumented events, and X$ fixed tables,
it is impossible to cover all of these. It would keep me busy for the rest of my lifetime and I could
never share the insights with my readers. It has been my goal to explain how these undocumented
features integrate into the ORACLE DBMS and most of all to present a structured approach for
dealing with them. Thus, after assimilating the knowledge conferred, you will be able to make
your own discoveries of valuable undocumented features.
ORACLE Database Server Releases
When I started working on the first edition of this book in 2007, Oracle9i Release 2 was still fully
supported, while Oracle10g had been adopted by a significant portion of users. In the fall of 2007,
Oracle11g was released. Error correction support for Oracle9i Release 2 ended in July 2007.
However, Oracle9i Release 2 was still in widespread use. I decided that the best way to deal with
these three software releases was to incorporate some new material on Oracle11g and to repeat
most of the tests with Oracle11g. Generally, most of the material is rather release independent.
Events, ORADEBUG, ASM, extended SQL trace, Statspack, and AWR have not changed tremendously in Oracle11g. Of course, the latest release has additional wait and diagnostic events. The
extended SQL trace format has changed slightly in Oracle11g Release 1 (11.1.0.6) and again
in patch set 1 (11.1.0.7), but remains undocumented. There are also lots of new documented
features such as the result cache, Real Application Testing, additional partitioning strategies
(interval, reference, system, list-list, list-hash, list-range, range-range), PIVOT/UNPIVOT, and
Secure Files, which I have not included in this book on undocumented features.
The MERITS performance optimization method, which is presented in Chapter 28, applies
equally to all three releases. I have incorporated support for the Oracle11g SQL trace file format
into the ESQLTRCPROF extended SQL trace profiler (up to 11.1.0.7; see Chapter 27). Since the
TKPROF release shipped with Oracle11g still does not calculate a resource profile, I recommend
using ESQLTRCPROF instead of TKPROF, no matter which ORACLE release your company is
running. For the reader who is interested in quickly locating material pertaining to Oracle11g, I
have included a separate entry in the index that refers to all the pages with material on Oracle11g.
Intended Audience of This Book
This book was written for senior database administrators who have gained a solid understanding of
the ORACLE DBMS over the course of several years. It is not intended for the ORACLE DBMS
novice. Having made this clear, it is obvious that I will not dwell much on introductory material.
Where necessary and appropriate, discussions of undocumented features start by presenting
an overview of the respective feature to establish a starting point for the ensuing discussion.
Yet, these overviews are no substitute for reading the documentation on the feature and previous
experience with the documented aspects of the feature discussed is recommended. By no means
is it my intention to deter the novice DBA from reading this book. As long as you are willing to
draw on other sources such as the extensive ORACLE documentation to acquire prerequisite
knowledge, then please be my guest.
Organization of This Book
This book is organized into twelve major parts. The denotations of the parts (e.g., Initialization
Parameters, Utilities) are inspired by documentation on the ORACLE DBMS, such that you will
immediately be familiar with the overarching structure of the book. Each part is mostly self-contained. Accordingly, there is no need to read the book from cover to cover. Instead, it can be
used like a reference manual by picking chapters that might assist with your current workload—
be it performance optimization or troubleshooting. Whenever material in different chapters is
interrelated, this is indicated by cross references.
Material in the individual parts is organized into chapters. Each chapter starts with an
introduction that addresses the benefits of the respective feature and states to what extent the
feature is documented, if at all. This includes references to ORACLE database manuals (if any)
containing information on the topic of the chapter. If you have not yet worked with the feature
discussed, it is a good idea to read the documentation in addition to the respective chapter. The
introduction also points out why the chapter is a worthwhile read and under what circumstances
the knowledge conferred is valuable. Chapters in the parts on SQL statements, supplied PL/SQL
packages, application development, Oracle Net, Real Application Clusters, and Utilities may be
read in any order. Chapters in the remaining parts on initialization parameters, data dictionary
base tables, events, performance, and X$ fixed tables should be read in the order in which they
appear in the text, since later chapters build on the foundation laid in earlier chapters within
the same part.
• Part 1, Initialization Parameters, deals with partially documented and undocumented
initialization parameters. Among the parameters covered, PGA AGGREGATE TARGET is the
most widely used. Chapter 1 explains the inner workings of work area sizing and the
hidden parameters it is based on. The remaining documented parameters addressed
are AUDIT SYSLOG LEVEL, EVENT and OS AUTHENT PREFIX. Chapter 2 presents the hidden
parameters TRACE FILES PUBLIC and ASM ALLOW ONLY RAW DISKS.
• Part 2, Data Dictionary Base Tables, is a look under the hood of data dictionary views.
After introducing the reader to data dictionary base tables in Chapter 3, Chapter 4 on
index usage monitoring details how to build a better view for finding used indexes than
the built-in view V$OBJECT USAGE.
xxix
xxx
■I N T R O D U C T I O N
• Part 3, Events, presents events for performance diagnosis (Chapter 6), tracing the cost-based
optimizer (Chapter 7), dumping Oracle Net packet contents (Chapter 8), and so on. It also
demonstrates how to find undocumented events supported by a certain DBMS release.
• Part 4, X$ Fixed Tables, addresses X$ tables, which are the foundation of GV_$ and V_$
views. The latter views are documented as dynamic performance views. X$ tables contain
information that goes beyond dynamic performance views. This part unveils how to find
hidden parameters along with descriptions in X$ tables (Chapter 9), how to get additional information on the buffer cache and latches (Chapter 10), and how to retrieve wait
event timings at microsecond resolution instead of the centisecond resolution offered by
V$SESSION WAIT (Chapter 11). Chapter 12 explains Automatic Storage Management (ASM)
metadata and the mapping between database file extents and ASM allocation units. Again, a
structured approach for dealing with undocumented features is emphasized. An example
of this is a method that generates a document with dependencies between V$ views and
X$ tables and vice versa.
• Part 5, SQL Statements, talks almost exclusively about undocumented SQL statements.
The statements may be used to set events at session or instance level (Chapter 13), change
the parsing schema identifier (Chapter 14), temporarily change a user’s password
(Chapter 15), and enhance scalability in concurrent processing (Chapter 16). Examples
for the usefulness of each statement are provided.
• Part 6, Supplied PL/SQL Packages, focuses on three undocumented as well as two partially
documented packages. Chapter 17 on DBMS BACKUP RESTORE explains how to restore a
database that was backed up with Recovery Manager (RMAN), in a disaster scenario, that
is, after the loss of the RMAN catalog and all database files including the most recent
copy of the control file, which contains the directory of the most recent backups. The
remaining packages covered address topics such as performance diagnosis and tracing
(DBMS SYSTEM), jobs (DBMS IJOB), undocumented aspects of the Oracle10g and Oracle11g
database scheduler (DBMS SCHEDULER), and database object name resolution (DBMS UTILITY).
• Part 7, Application Development, consists of two chapters. Chapter 22 is an introduction
to Perl DBI and DBD::Oracle—a Perl interface for accessing ORACLE databases built with
Oracle Call Interface. It is undocumented that each Oracle10g and Oracle11g ORACLE HOME
includes a Perl installation with the DBI. Scripting in Perl and the DBI is much more
powerful than using a combination of a (UNIX) shell and SQL*Plus. Development time is
also reduced. Oracle Corporation’s documentation is lacking a document that explains
the benefits and effects of JDBC end-to-end metrics. This is what Chapter 23 does. It brings
together all the information on performance diagnosis and monitoring that relates to
application instrumentation, end-to-end tracing (DBMS MONITOR), extended SQL trace,
and the TRCSESS utility.
• The goal of Part 8, Performance, is to acquaint the reader with a solid performance optimization method, based for the most part on the assessment of extended SQL trace files.
This part covers the undocumented extended SQL trace file format (Chapter 24) as well
as how to get the most out of Statspack (Chapter 25) and AWR (Chapter 26). Chapter 27
presents the free extended SQL trace profiler ESQLTRCPROF provided with this book.
Chapter 28 on the MERITS performance optimization method is the culmination of this
part. The MERITS method is a tested, proven framework for diagnosing and solving
performance problems.
■I N T R O D U C T I O N
• Part 9, Oracle Net, addresses undocumented and partially documented Oracle Net parameters, which may be used to configure the TNS Listener and certain Oracle Net features.
Chapter 29 explains the setting IP=FIRST, which was introduced, but not documented in
Oracle10g. Chapter 30 explains how to use the fully dynamic valid node checking feature
of the listener to set up a simple form of a firewall to protect ORACLE instances from
intruders. Chapter 31 discusses the parameter ENABLE=BROKEN. Chapter 32 talks about the
default host name feature.
• Part 10, Real Application Clusters, discusses undocumented aspects of Transparent
Application Failover (TAF) and database services (DBMS SERVICE), which may be used for
load re-balancing after a cluster node has failed and subsequently re-joined the cluster
(Chapter 33). Chapter 34 presents a quick procedure for converting an ORACLE RAC
installation with or without ASM to an ORACLE single instance installation. The procedure is intended for disaster scenarios where hardware failure or software defects make
it impossible to run the DBMS in multi-instance mode (i.e., with RAC enabled).
• Part 11, Utilities, includes chapters on the OERR utility (Chapter 35), the RMAN pipe
interface (Chapter 36), and ORADEBUG (Chapter 37). Among these, ORADEBUG is
presumably the most useful. It may be used to control processes of an instance, enable
and disable SQL trace, retrieve the SQL trace file name, generate diagnostic dumps for
hang analysis, and much more.
• Part 12, Appendixes, contains a glossary and a bibliography. You may wish to read the
glossary first, to acquaint yourself with the terms used in this book. Throughout the
book, four-letter strings and the year of publication in angle brackets (e.g., [ShDe 2004])
refer to sources in the bibliography. This part also contains an appendix that lists make
targets for enabling and disabling DBMS options, such as RAC or Partitioning.
Source Code Depot
Source code shown in listings is downloadable as a zip archive from the Apress web page. Each
listing that exceeds approximately a dozen lines is identified by a unique file name. At the end
of most chapters, there is a section titled Source Code Depot. This section lists all the source files
of the respective chapter and their functionality. To download the source code depot of the entire
book, browse to http://www.apress.com/book/view/1430219521 and click on the link Source Code.
Conventions and Terms
In my understanding, the designation Oracle refers to the company Oracle Corporation and its
international subsidiaries. I adhere to the convention concerning the designations Oracle
and ORACLE that Oracle Corporation proposes.3 Oracle Corporation refers to the company,
whereas ORACLE refers to the database server product. ORACLE HOME (or the environment variable
$ORACLE HOME on UNIX; %ORACLE HOME% on Windows) designates the directory where the database server software is installed. Contrary to most authors, I refrain from using the designation
Oracle to refer to any software manufactured by Oracle Corporation Instead, I use the term
3. See line 234 of $ORACLE_HOME/rdbms/mesg/oraus.msg.
xxxi
xxxii
■I N T R O D U C T I O N
ORACLE DBMS (database management system) to refer to the database server software Oracle
Corporation offers.
Database vs. Instance
On page 1-8, the manual Oracle Database Concepts 10g Release 2 explains “the physical database structures of an Oracle database, including data files, redo log files, and control files”.
Thus, an ORACLE database is made up of data files (grouped into tablespaces), redo log files,
and control files—a collection of files residing on disk. On page 1-13, the same manual goes on
to say this:
Every time a database is started, a system global area (SGA) is allocated and Oracle
background processes are started. The combination of the background processes and
memory buffers is called an ORACLE instance.
I’m a strong advocate of stating things clearly and using terms consistently. Unfortunately,
the term database is often used incorrectly. If we take the definition of a database as consisting
of data files, redo log files, and control files literally, then it is obvious that a database cannot be
started. It is the ORACLE DBMS instance, which consists of processes and memory structures
such as the SGA, that breathes life into an ORACLE database. I invite you to adopt the wording
concerning database and instance in Table 1. In this book, I have made every effort to use the
terms database and instance in the sense defined here, to avoid confusion.
Table 1. Instance vs. Database
Wording
Action
SQL*Plus command
To start an ORACLE DBMS The instance reads the parameter file, starts
instance
processes (SMON, PMON, etc.), and creates
the SGA (system global area) in memory.
STARTUP NOMOUNT
To mount a database
Certain processes of the DBMS instance
open the control file(s) of the database. No
other files are accessed at this time.
STARTUP MOUNT
To open a database
The DBMS instance opens the online redo
logs and one or more tablespaces (at least the
tablespace called SYSTEM, which contains
the data dictionary).
STARTUP or STARTUP
OPEN
To shut down
an instance
The DBMS instance first closes the data files
and online redo logs (message “Database
closed” in SQL*Plus), then closes the control
file(s) (message “Database dismounted”),
and finally removes the SGA and terminates
all processes (message “ORACLE instance
shut down”). The SQL statements ALTER
DATABASE CLOSE and ALTER DATABASE DISMOUNT
may also be used to accomplish the first
two steps.
SHUTDOWN
■I N T R O D U C T I O N
Instance Service Name vs. Net Service Name
To disambiguate the term service name, I use the denomination Net (from Oracle Net) service
name for services defined in tnsnames.ora or a directory service such as LDAP. I call the service
names an instance registers with a listener instance service names. Instance service names
are defined either through the parameter SERVICE NAMES or with the package DBMS SERVICE in
Oracle10g and Oracle11g. The command tnsping accepts Net service names, whereas the list of
services returned by the command lsnrctl services contains instance service names. Connect
strings, such as ndebes/secret@ten.oradbpro.com, contain Net service names. The body of a Net
service name definition includes either an instance service name or an ORACLE SID (SID=oracle_sid).
A Net service name definition in tnsnames.ora has the following format:
net service name =
(DESCRIPTION =
(ADDRESS LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = host name)(PORT = 1521))
)
(CONNECT DATA =
(SERVICE NAME = instance service name)
)
)
Mind the keyword SERVICE NAME in the body of the DESCRIPTION section. The setting of
SERVICE NAME is an instance service name and in Oracle10g it is reflected in the column V$SESSION.
SERVICE NAME and in SQL trace files. In Oracle10g, all configured instance service names are in
DBA SERVICES.NETWORK NAME. Why NETWORK NAME? These are the instance service names registered
with an Oracle Net listener (parameters LOCAL LISTENER and REMOTE LISTENER).4
Client sessions that connect by using a Net service name definition that contains SID=
oracle_sid instead of SERVICE NAME=instance_service_name have the service name SYS$USERS in
V$SESSION.SERVICE NAME. This is also true for local sessions established without specifying a Net
service name. These latter sessions use the so-called bequeath protocol adapter, which takes
the setting of SID from the environment variable ORACLE SID.
Typographical Conventions
The typographical conventions used in this book are summarized in Table 2.
Table 2. Typographical conventions
Convention
Meaning
italic
Italic type indicates book titles, quotes, emphasis, or placeholder
variables for which particular values have to be supplied.
monospace
Monospace type indicates operating system, SQL, or SQL*Plus
commands, as well as file or code excerpts.
4. To immediately register instance service names with a listener, for example after a listener restart,
the command ALTER SYSTEM REGISTER is provided.
xxxiii
xxxiv
■I N T R O D U C T I O N
Table 2. Typographical conventions (Continued)
Convention
Meaning
GUI Item
Bold font designates items in graphical user interfaces, e.g., Control Panel.
…
An ellipsis represents one or more lines that have been omitted. It is
used in log file or code excerpts.
<placeholder>
The expression <placeholder> is used in syntax descriptions and represents
a placeholder that needs to be substituted by an actual value. Angle
brackets surround the placeholder and include the string that must
be replaced by an actual value. Here’s an example syntax: CONNECT
<username>/<password>. With actual values filled in, it might become:
CONNECT ndebes/secret.
$
Marks commands entered at a UNIX shell prompt (Bourne or Korn Shell).
C:>
Marks commands entered at the prompt of a Windows command
interpreter (cmd.exe).
SQL>
Marks commands entered in a SQL*Plus database session.
{value1|…|valueN}
Range of acceptable values, for example INSTANCE TYPE={ASM|RDBMS}.
Vertical bars separate alternatives.
Table 3 contains a list of abbreviations used throughout the book.
Table 3. Abbreviations
Abbreviation
Meaning
ASCII
American Standard Code for Information Interchange
ASH
Active Session History
AWR
Active Workload Repository
ADDM
Automatic Database Diagnostic Monitor
DBA
Database Administrator
DBMS
Database Management System
GCS
Global Cache Service
GES
Global Enqueue Service
I/O
input/output from/to a device
IP
Internet Protocol
LOB
Large Object (BLOB, CLOB, NCLOB)
LUN
Logical unit number
OCI
Oracle Call Interface
PGA
Program Global Area
RAC
Real Application Clusters
■I N T R O D U C T I O N
Table 3. Abbreviations
Abbreviation
Meaning
SCN
System Change Number
SGA
System Global Area
TCP
Transmission Control Protocol
UDP
User Datagram Protocol
a.k.a.
also known as
e.g.
for example (from Latin: exempli gratia)
et al.
and others (from Latin: et alteri)
i.e.
that is (from Latin: id est)
n/a
not applicable
Send Us Your Comments
The author and reviewers have verified and tested the information in this book to the best of
their capability. Please inform the publisher of any issues you may find in spite of our efforts to
make this book as reliable a source as possible. You may submit errata pertaining to this publication on the Apress web site at http://www.apress.com/book/view/1430219521.
xxxv
PA R T
1
Initialization
Parameters
CHAPTER 1
■■■
Partially Documented
Parameters
F
iguratively speaking, the Oracle database management system has a tremendous number of
knobs to turn and switches to flip. Oracle9i Release 2 has 257 documented parameters, Oracle10g
Release 2 has 258, and Oracle11g Release 1 has 294. Presumably there is no single DBA who has
memorized the meanings and permissible values for all those parameters. The Oracle Database
Reference manual of each respective release is the definitive source for documented initialization
parameters. This chapter scrutinizes the partially documented parameters AUDIT SYSLOG LEVEL,
PGA AGGREGATE TARGET, EVENT, and OS AUTHENT PREFIX and provides information that is absent
from the Oracle Database Reference manual. Both AUDIT SYSLOG LEVEL and OS AUTHENT PREFIX
are related to database security. EVENT is a curious parameter in the sense that the parameter
itself is documented, but permissible values are not. Among other things it may be used to
collect more evidence when errors occur or gather diagnostic information under the supervision of
Oracle Support Services. From a performance perspective, learning how PGA AGGREGATE TARGET is
handled internally allows a DBA to significantly reduce the response time of large sort operations.
AUDIT_SYSLOG_LEVEL
The initialization parameter AUDIT SYSLOG LEVEL is partially documented. Several inaccuracies
in the documentation suggest that the parameter is less useful than it actually is. Database
actions by SYS and/or database administrators or operators may be audited to the UNIX operating system’s syslog daemon log files owned by the UNIX user root. This prevents privileged
database users from removing audit records that contain a log of their activities. The default
setting is to audit CONNECT, STARTUP, and SHUTDOWN with SYSDBA or SYSOPER privileges to files
owned by the ORACLE software owner, while not auditing SQL, PL/SQL statements, and other
actions with these privileges or other privileges, such as the role DBA, at all. In other words,
except for the aforementioned operations, standard auditing (see parameter AUDIT TRAIL) as
well as fine grained auditing (see package DBMS FGA) are switched off by default. As a consequence, there will be no trace of many activities performed by privileged users. Auditing to
operating system files owned by the ORACLE software owner (AUDIT TRAIL=OS) or to the database table SYS.AUD$ (AUDIT TRAIL=DB) may be circumvented, since DBAs normally have access
to the ORACLE software owner’s UNIX account as well as to SYS.AUD$, allowing them to easily
remove audit records generated for their actions. Auditing via the UNIX syslog facility is also
useful for detecting intrusions by hackers or manipulations by malevolent insiders.
3
4
CHAPTER 1 ■ PARTIALLY DOCUMENTED PARAMETERS
Syslog Facility
A new feature of Oracle10g is the ability to write audit trails using the syslog facility on UNIX
systems. This facility consists of a daemon process named syslogd (see man syslogd) that accepts
log messages from applications via the syslog C library function (see man syslog). The configuration file for syslogd is usually /etc/syslog.conf and log messages go to files in /var/log or
/var/adm depending on the UNIX variant. The log file name is determined by a string that
consists of a facility name and a priority or level. Most of these may be used when setting
AUDIT SYSLOG LEVEL. Each entry in /etc/syslog.conf assigns a log file name to a certain combination of facility and priority. By placing the entry user.notice /var/log/oracle dbms into the
file syslog.conf and telling syslogd to reread the configuration file by sending it a hang-up
signal with the command kill,1 any subsequent log entries from an ORACLE instance with the
setting AUDIT SYSLOG LEVEL=user.notice will be recorded in the file /var/log/oracle dbms.
Introduction to Auditing
On UNIX systems, CONNECT, STARTUP, and SHUTDOWN of an ORACLE instance with SYSDBA or SYSOPER
privileges are unconditionally audited to files with extension .aud in $ORACLE HOME/rdbms/audit
or a directory specified with the parameter AUDIT FILE DEST.2 Oracle9i was the first release that
had the capability of auditing actions other than CONNECT, STARTUP, and SHUTDOWN performed with
SYSDBA or SYSOPER privileges by setting AUDIT SYS OPERATIONS=TRUE.
Figure 1-1. Event Details in Windows Event Viewer
1. Use kill HUP `cat /var/run/syslogd.pid` on Red Hat Linux.
2. AUDIT FILE DEST is used as soon as an instance has started. When connecting as SYSDBA or SYSOPER
while an instance is down, the default audit file destination $ORACLE HOME/rdbms/audit is used.
CHAPTER 1 ■ PARTIALLY DOCUMENTED PARAMETERS
When AUDIT SYSLOG LEVEL and AUDIT SYS OPERATIONS are combined, any SQL and PL/SQL
run as user SYS may be audited using the syslog facility. Since the files used by syslog are owned
by root, and a DBA usually does not have access to the root account, DBAs will not be able to
remove traces of their activity. Of course, this also applies to intruders who have managed to
break into a machine and have gained access to the account of the ORACLE software owner but
not to the root account. The same applies to hackers who have cracked the password of a privileged database user and are able to connect via Oracle Net.
On Windows, the parameters AUDIT SYSLOG LEVEL and AUDIT FILE DEST are not implemented,
since the Windows event log serves as the operating system audit trail (see Figure 1-1). Just
like on UNIX, CONNECT, STARTUP, and SHUTDOWN are unconditionally logged. When AUDIT SYS
OPERATIONS=TRUE is set, operations with SYSDBA or SYSOPER privileges are also written to the
Windows event log, which may be viewed by navigating to Start ➤ Control Panel ➤ Administrative Tools ➤ Event Viewer. The logging category used is Application and the source is
named Oracle.ORACLE_SID. Events for a certain DBMS instance may be filtered by choosing
View ➤ Filter.
The Oracle Database Reference 10g Release 2 manual explains AUDIT SYSLOG LEVEL as
follows (page 1-22):
AUDIT_SYSLOG_LEVEL enables OS audit logs to be written to the system via the syslog
utility, if the AUDIT_TRAIL parameter is set to os. The value of facility can be any of the
following: USER, LOCAL0- LOCAL7, SYSLOG, DAEMON, KERN, MAIL, AUTH, LPR,
NEWS, UUCP or CRON. The value of level can be any of the following: NOTICE, INFO,
DEBUG, WARNING, ERR, CRIT, ALERT, EMERG.
Tests of the new feature on a Solaris 10 and a Red Hat Linux system showed that the documentation is inaccurate on three counts:
1. AUDIT SYSLOG LEVEL is independent of AUDIT TRAIL. When AUDIT SYSLOG LEVEL is set
and AUDIT TRAIL has the default value NONE, CONNECT, STARTUP, and SHUTDOWN are logged
via syslog.
2. Setting the parameters AUDIT SYSLOG LEVEL and AUDIT SYS OPERATIONS=TRUE causes
any actions such as SQL and PL/SQL statements executed with SYSDBA or SYSOPER
privileges to be logged via syslog, even if AUDIT TRAIL=NONE.
3. Only certain combinations of facility and level are acceptable. Unacceptable combinations cause the error “ORA- 32028: Syslog facility or level not recognized” and prevent
DBMS instances from starting.
If the documentation were accurate, it would not be possible to audit actions performed
with SYSDBA or SYSOPER privileges to the system log, while auditing actions by other users to the
data dictionary base table SYS.AUD$. However, such a limitation does not exist.
5
6
CHAPTER 1 ■ PARTIALLY DOCUMENTED PARAMETERS
Using AUDIT_SYSLOG_LEVEL
As stated earlier, the string assigned to AUDIT SYSLOG LEVEL must consist of a facility name and
a priority or level. Surprisingly, when doing a SHOW PARAMETER or a SELECT from V$PARAMETER,
merely the facility is visible—the dot as well as the level are suppressed.3 For example, with
the entry *.audit syslog level='USER.NOTICE' in the SPFILE used to start the instance, SHOW
PARAMETER yields:
SQL> SHOW PARAMETER audit syslog level
NAME
TYPE
VALUE
------------------------------------ ----------- ----audit syslog level
string
USER
SQL> SELECT value FROM v$parameter WHERE name='audit syslog level';
VALUE
----USER
Yet, when executing CONNECT / AS SYSDBA, the facility and level logged in /var/adm/messages
on Solaris is “user.notice”:
Feb 21 11:45:52 dbserver Oracle Audit[27742]: [ID 441842 user.notice]
ACTION
Feb 21
Feb 21
Feb 21
Feb 21
Feb 21
: 'CONNECT'
11:45:52 dbserver
11:45:52 dbserver
11:45:52 dbserver
11:45:52 dbserver
11:45:52 dbserver
DATABASE USER: '/'
PRIVILEGE : SYSDBA
CLIENT USER: oracle
CLIENT TERMINAL: pts/3
STATUS: 0
If an SPFILE is used, the full setting is available by querying V$SPPARAMETER:
SQL> SELECT value FROM v$spparameter WHERE name='audit syslog level';
VALUE
----------user.notice
Auditing Non-Privileged Users
Of course, you may also direct audit records pertaining to non-privileged users to the system
log by setting AUDIT TRAIL=OS in addition to AUDIT SYSLOG LEVEL. Non-privileged users cannot
delete audit trails logging their actions. The search for perpetrators with queries against auditing
views, such as DBA AUDIT STATEMENT or DBA AUDIT OBJECT, is easier than searching the system
log. For these reasons, keeping the audit trails of non-privileged users inside the database with
AUDIT TRAIL=DB is preferred. With the latter setting, audit trails are written to the table SYS.AUD$
and may be queried through the aforementioned data dictionary views. Setting AUDIT TRAIL=NONE
switches off auditing of actions by non-privileged users.
3. Test performed with ORACLE DBMS version 10.2.0.3.
CHAPTER 1 ■ PARTIALLY DOCUMENTED PARAMETERS
After enabling auditing for database connections established by non-privileged users, e.g.,
as in:
SQL> AUDIT CONNECT BY appuser /* audit trail=os set */;
entries similar to the following are written to the syslog facility (example from Solaris):
Feb 21 11:41:14 dbserver Oracle Audit[27684]: [ID 930208 user.notice]
SESSIONID: "15" ENTRYID: "1" STATEMENT: "1" USERID: "APPUSER"
USERHOST: "dbserver" TERMINAL: "pts/3" ACTION: "100" RETURNCODE: "0"
COMMENT$TEXT: "Authenticated by: DATABASE" OS$USERID: "oracle"
PRIV$USED: 5
Another entry is added to /var/adm/messages when a database session ends:
Feb 21 11:44:41 dbserver Oracle Audit[27684]: [ID 162490 user.notice]
SESSIONID: "15" ENTRYID: "1" ACTION: "101" RETURNCODE: "0"
LOGOFF$PREAD: "1" LOGOFF$LREAD: "17" LOGOFF$LWRITE: "0" LOGOFF$DEAD:
"0" SESSIONCPU: "2"
Note that additional data provided on the actions LOGON (100) and LOGOFF (101) conforms
to the columns of the view DBA AUDIT SESSION. Translation from action numbers to action
names is done via the view AUDIT ACTIONS as in this example:
SQL> SELECT action, name FROM audit actions WHERE action IN (100,101)
ACTION NAME
------ -----100 LOGON
101 LOGOFF
When AUDIT SYSLOG LEVEL=AUTH.INFO, AUDIT SYS OPERATIONS=FALSE and AUDIT TRAIL=NONE,
CONNECT, STARTUP, and SHUTDOWN are logged via syslog. With these settings, an instance shutdown
on Solaris writes entries similar to the following to /var/adm/messages:
Feb
Feb
Feb
Feb
Feb
Feb
21
21
21
21
21
21
14:40:01
14:40:01
14:40:01
14:40:01
14:40:01
14:40:01
dbserver
dbserver
dbserver
dbserver
dbserver
dbserver
Oracle Audit[29036]:[ID 63719 auth.info] ACTION:'SHUTDOWN'
DATABASE USER: '/'
PRIVILEGE : SYSDBA
CLIENT USER: oracle
CLIENT TERMINAL: pts/3
STATUS: 0
When AUDIT SYSLOG LEVEL=AUTH.INFO, AUDIT SYS OPERATIONS=TRUE, and AUDIT TRAIL=NONE,
SQL and PL/SQL statements executed with SYSDBA or SYSOPER privileges are also logged via
syslog. Dropping a user after connecting with / AS SYSDBA results in a syslog entry similar to the
one shown here:
7
8
CHAPTER 1 ■ PARTIALLY DOCUMENTED PARAMETERS
Feb 21
ACTION
Feb 21
Feb 21
Feb 21
Feb 21
Feb 21
14:46:53 dbserver Oracle Audit[29170]: [ID 853627 auth.info]
: 'drop user appuser'
14:46:53 dbserver DATABASE USER: '/'
14:46:53 dbserver PRIVILEGE : SYSDBA
14:46:53 dbserver CLIENT USER: oracle
14:46:53 dbserver CLIENT TERMINAL: pts/3
14:46:53 dbserver STATUS: 0
Lessons Learned
CONNECT, STARTUP, and SHUTDOWN with SYSDBA or SYSOPER privileges are logged to *.aud files by
default in spite of an AUDIT TRAIL=NONE setting. If AUDIT SYSLOG LEVEL is set, the SQL*Plus
STARTUP command is logged to a *.aud file in $ORACLE HOME/rdbms/audit, whereas ALTER DATABASE
MOUNT and subsequent commands as well as SHUTDOWN are logged via syslog, since a running
instance is required for using the syslog facility and the instance is not yet running when
STARTUP is issued.
Setting AUDIT SYSLOG LEVEL and AUDIT SYS OPERATIONS=TRUE produces additional auditing
trail records covering all actions performed with SYSDBA or SYSOPER privileges in the configured
syslog log file irrespective of the setting of AUDIT TRAIL. Intruders who have not managed to
break into the account of the UNIX user root, will not be able to remove these audit trail records.
Of course, an intruder who is aware of these features might remove the AUDIT SYSLOG
LEVEL setting, but at least the parameter change would be logged if an SPFILE is used, and the
change would not be in effect immediately since it is a static parameter. You may wish to set
AUDIT SYS OPERATIONS=FALSE during maintenance operations such as an upgrade (which have
to be run as user SYS) to avoid generating large syslog log files.
PGA_AGGREGATE_TARGET
The initialization parameter PGA AGGREGATE TARGET is documented in Oracle9i Database
Performance Tuning Guide and Reference Release 2 and in Oracle Database Performance
Tuning Guide 10g Release 2. The aforementioned Oracle9i manual states that the parameters
SORT AREA SIZE and HASH AREA SIZE for manual PGA memory management should not be
used, except in Shared Server environments, since Oracle9i Shared Server cannot leverage
automatic PGA memory management (pages 1–57 and 14–50). The algorithm that governs
individual work area sizing for serial and parallel execution is undocumented.
Knowing the undocumented restrictions imposed on work area sizing allows DBAs to set
the most appropriate value for PGA AGGREGATE TARGET, thus avoiding expensive spilling of work
areas to disk and allowing operations to run entirely in memory, realizing significant performance gains. Under rare circumstances it may be desirable to override automatic settings of
hidden parameters affected by PGA AGGREGATE TARGET.
Introduction to Automatic PGA Memory Management
The program global area (PGA) is a private memory region where server processes allocate
memory for operations such as sorts, hash joins, and bitmap merges. Consequently, the PGA
memory region is separate from the SGA (system global area). There is even a third memory
CHAPTER 1 ■ PARTIALLY DOCUMENTED PARAMETERS
region, the UGA (user global area), that holds session and cursor state information. Dedicated
server processes allocate UGA memory inside the PGA, whereas shared server processes place
the UGA inside the SGA, since it must be accessible to all shared server processes. If the SGA
contains a large pool (parameter LARGE POOL SIZE), shared server processes place the UGA
inside the large pool. In Oracle10g, the shared pool, large pool, java pool, streams pool, and the
default buffer pool with standard block size4 can be sized automatically and dynamically with
Automatic Shared Memory Management (parameter SGA TARGET).
In releases prior to Oracle9i, several * AREA SIZE parameters had to be used to adjust the
sizes for various PGA memory regions. Examples of these parameters are SORT AREA SIZE and
HASH AREA SIZE. On UNIX, where the ORACLE DBMS is implemented as a multiprocess architecture, PGA memory could not always be returned to the operating system after a memoryintensive operation. It lingered within the virtual address space of the server process and may
have caused paging. Memory thus allocated was also not available to other server processes.
There was also no instance-wide limit on PGA memory regions. Since each server process was
allowed to allocate memory for an operation up to the limits imposed by * AREA SIZE parameters, the instance-wide memory consumption could become extensive in environments with
several hundred server processes. Note also that the * AREA SIZE parameters enforce a per
operation limit, not a per session limit. Since a query may open several cursors simultaneously
and each might execute an expensive SELECT that includes an ORDER BY or a hash join, there
is no limit on overall memory consumption with the old approach now called manual PGA
memory management.
To address these shortcomings, automatic PGA memory management was introduced
with Oracle9i. On UNIX, it is based on the modern technology of memory mapping, which
enables a process to allocate virtual memory and to map it into its virtual address space. Once
the memory is no longer needed, it can be returned to the operating system by removing the
mapping into the virtual address space. On Solaris, the UNIX system calls used are mmap and
munmap. Calls to memory mapping routines by the ORACLE kernel may be traced using truss
(Solaris) or strace (Linux).5 Another interesting utility is pmap (Solaris, Linux). It displays information about the address space of a process, which includes anonymous memory mapped
with mmap. Back in the old days of 32-bit computing, this tool provided precisely the information
needed to relocate the SGA base address to allow mapping of a larger shared memory segment
into the limited virtual address space of a 32-bit program (see Metalink note 1028623.6). Using
pmap while a process sorts, reveals how many regions of anonymous memory it has mapped
and what their cumulative size is.
Here’s an example (29606 is the UNIX process ID of the server process found in
V$PROCESS.SPID). The relevant column is “Anon” (anonymous mapped memory).
$ pmap -x 29606 | grep Kb
Address
Kbytes
total Kb
934080
RSS
888976
Anon
63008
Locked Mode
806912
Mapped File
4. Oracle10g supports up to seven buffer pools: five buffer pools varying in block size from 2 KB to 32 KB
and two additional buffer pools with standard block size, i.e., the block size set with the parameter
DB BLOCK SIZE. The latter two buffer pools are the keep and recycle pools. Segments may be placed
into the keep or recycle pool with an ALTER TABLE or ALTER INDEX statement as appropriate.
5. The web page titled Rosetta Stone for UNIX at http://bhami.com/rosetta.html lists system call tracing
utilities for common UNIX systems.
9
10
CHAPTER 1 ■ PARTIALLY DOCUMENTED PARAMETERS
With automatic PGA memory management, so-called work areas are used for operations
such as sorts or hash joins. The target cumulative size of all active work areas is specified with
the parameter PGA AGGREGATE TARGET (PAT). A single process may access several work areas
concurrently. Information on automatic PGA memory management and work areas is available by querying dynamic performance views such as V$PGASTAT, V$PROCESS, V$SQL WORKAREA,
and V$SQL WORKAREA ACTIVE.
Misconceptions About PGA_AGGREGATE_TARGET
The parameter name PGA AGGREGATE TARGET is a name well chosen. What I’m trying to say is
that it is what it sounds like—a target, not an absolute limit, merely a target. This means that
the actual amount of memory consumed under high load may constantly, or at least intermittently, be higher than the target value. But the implementation of automatic PGA memory
management is so well crafted that processes will then release memory, such that the total
memory consumption will soon drop below the target, if at all possible. Especially when PL/SQL,
which allocates a lot of memory, e.g., for collections such as index-by tables, is executed, the
target may be permanently exceeded. Whereas sort memory requirements can be reduced by
using temporary segments, PL/SQL memory requirements cannot.
Figure 1-2. PGA Sizing in Database Configuration Assistant
CHAPTER 1 ■ PARTIALLY DOCUMENTED PARAMETERS
When creating a database with the database configuration assistant (DBCA), there is a
memory configuration page for customizing most of the aforementioned pools as well as the
PGA. On this page, DBCA adds the sizes of all pools as well as the PGA and reports the resulting
figure as Total Memory for Oracle (see Figure 1-2). This leads some people to believe that this
amount of memory (4956 MB in the screenshot) will be allocated when the ORACLE instance is
started. Knowing that the SGA is allocated on instance startup, they assume the same must be
true for the PGA. However, this is not the case. PGA memory is allocated on demand. Even the
* AREA SIZE parameters do not cause a memory allocation of the designated size. These too are
allocated on an as-needed basis.
Since the documentation does not address the details of work area sizing, many database
administrators assume that the entire memory set aside with PGA AGGREAGTE TARGET is available to
a single session as long as it does not have to compete for the memory with other sessions. In
case you’re curious what the real deal is, please read on.
Researching PGA_AGGREGATE_TARGET
The research presented in this section was done with Oracle10g Release 2. Results show that
the algorithms used by Oracle9i and Oracle10g are different. Due to space constraints, no
example or evidence concerning Oracle9i is included.6
Creating a Large Table with a Pipelined Table Function
For starters, we need a table that is large enough to cause disk spilling during sort operations.
The next few paragraphs show how to code a pipelined table function that returns an arbitrary
number of rows (see file row factory.sql in the source code depot). This function may then be
used in conjunction with the package DBMS RANDOM to create arbitrarily sized tables with random
data. Since pipelined table functions return a collection type, we start by creating an object
type for holding a row number.
SQL> CREATE OR REPLACE TYPE row nr type AS OBJECT (row nr number);
/
The pipelined table function will return a collection type made up of individual row nr types.
SQL> CREATE OR REPLACE TYPE row nr type tab AS TABLE OF row nr type;
/
The function row factory returns any number of rows—within the limits of the ORACLE
NUMBER data type, of course. It has the two parameters first nr and last nr, which control how
many rows will be returned.
CREATE OR REPLACE FUNCTION row factory(first nr number, last nr number)
RETURN row nr type tab PIPELINED
AS
row nr row nr type:=NEW row nr type(0);
6. For research on Oracle9i, see Jonathan Lewis’ article at http://www.jlcomp.demon.co.uk/untested.html.
11
12
CHAPTER 1 ■ PARTIALLY DOCUMENTED PARAMETERS
BEGIN
FOR i IN first nr .. last nr LOOP
row nr.row nr:=i;
PIPE ROW(row nr);
END LOOP;
return;
END;
/
When last nr is larger than first nr, row factory returns last nr - first nr plus one
row. The result is very much like SELECT ROWNUM FROM table, except that the argument values and
not the number of rows in a table control how many rows are returned. Here’s an example:
SQL> SELECT * FROM TABLE(row factory(1,2));
ROW NR
---------1
2
The classic approach for generating a large table consists of selecting from a real table,
possibly using a Cartesian join to arrive at a very large number of rows. Beyond requiring less
coding for the CREATE TABLE statement, this novel approach using a pipelined table function has
the additional benefit of not causing any consistent or physical reads on a segment. By calling
row factory with a first nr and last nr setting of 1 and 1000000, we can now create a table
with one million rows.
SQL> CREATE TABLE random strings AS
SELECT dbms random.string('a', 128) AS random string
FROM TABLE(row factory(1,1000000))
NOLOGGING;
The first argument (opt) tells DBMS RANDOM to generate random mixed-case strings consisting
solely of letters. The second argument (len) controls the length of the random string. Note that
in releases prior to Oracle11g, arguments to PL/SQL routines cannot be passed by name from
SQL.7
In my test database with db block size=8192, the previous CTAS (create table as select)
resulted in a segment size of about 150 MB. DBMS RANDOM is also capable of generating random
alphanumeric strings in lower, upper, or mixed case, as well as random numbers.8
V$SQL_WORKAREA_ACTIVE
A good way to monitor PGA memory management at the session level is to query the dynamic
performance view V$SQL WORKAREA ACTIVE, which has the following columns:
7. In Oracle11g, SELECT * FROM TABLE(row factory(first nr => 1, last nr => 3)) is syntactically
correct. In prior releases this statement causes ORA-00907.
8. The document A Security Checklist for Oracle9i lists DBMS RANDOM among a list of packages that might
be misused and recommends to revoke execute permission on DBMS RANDOM from PUBLIC (see http://
www.oracle.com/technology/deploy/security/oracle9i/pdf/9i checklist.pdf).
CHAPTER 1 ■ PARTIALLY DOCUMENTED PARAMETERS
SQL> DESC v$sql workarea active
Name
Null?
----------------------------------------- -------WORKAREA ADDRESS
OPERATION TYPE
OPERATION ID
POLICY
SID
QCINST ID
QCSID
ACTIVE TIME
WORK AREA SIZE
EXPECTED SIZE
ACTUAL MEM USED
MAX MEM USED
NUMBER PASSES
TEMPSEG SIZE
TABLESPACE
SEGRFNO#
SEGBLK#
Type
-----------RAW(4)
VARCHAR2(20)
NUMBER
VARCHAR2(6)
NUMBER
NUMBER
NUMBER
NUMBER
NUMBER
NUMBER
NUMBER
NUMBER
NUMBER
NUMBER
VARCHAR2(31)
NUMBER
NUMBER
I wrote a small Perl DBI program for closely monitoring the use of PGA work areas. The
Perl program executes a SELECT on V$SQL WORKAREA ACTIVE once per second and prints the results
to the screen. In addition to the session identifier (which corresponds to V$SESSION.SID), the
current and maximum work area sizes, and the size of temporary segments, the query also
retrieves a timestamp. All sizes are reported in MB. The SELECT statement used by the Perl
program is as follows:
SELECT sid, to char(sysdate,'mi:ss') time,
round(work area size/1048576, 1) work area size mb,
round(max mem used/1048576, 1) max mem used mb, number passes, nvl(tempseg size/
1048576, 0) tempseg size mb
FROM v$sql workarea active
ORDER BY sid;
Now we have a large table and a monitoring tool. So we’re all set to run some actual tests.
Since I’m the only tester using the instance, I might assume that the entire memory set aside
with PGA AGGREGATE TARGET will be available to me. As stated before, the segment size of the
table is about 150 MB, such that a PGA AGGREGATE TARGET setting of 256 MB should be more
than sufficient for an in-memory sort. So this is the value we will use:
SQL> ALTER SYSTEM SET pga aggregate target=256m;
System altered.
To start monitoring, set the ORACLE SID and DBI environment variables (discussed further
in Chapter 22), then run sql workarea active.pl. The following example is from Windows. On
UNIX, use export to set environment variables.
13
14
CHAPTER 1 ■ PARTIALLY DOCUMENTED PARAMETERS
C:> set ORACLE SID=ORCL
C:> set DBI USER=ndebes
C:> set DBI PASS=secret
C:> set DBI DSN=DBI:Oracle:
C:> sql workarea active.pl
SID TIME WORK AREA SIZE MAX MEM USED PASSES TEMPSEG SIZE
The Perl program does not display any data until one or more work areas are allocated. We
will use the script sort random strings.sql to run SELECT . . . ORDER BY in SQL*Plus. Following
are the script’s contents:
set timing on
set autotrace traceonly statistics
SELECT * FROM random strings ORDER BY 1;
exit
The SQL*Plus command SET AUTOTRACE with the options TRACEONLY and STATISTICS is very
useful in this context, since it executes the statement without printing the result set to the
screen. Furthermore it collects and displays execution statistics from V$SESSTAT. In a separate
window from the one running sql workarea active.pl, execute the script sort random
strings.sql with SQL*Plus, as shown here:
C:> sqlplus ndebes/secret @sort random strings.sql
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
1000000 rows selected.
Elapsed: 00:00:18.73
Statistics
--------------------------------------------------133 recursive calls
7 db block gets
18879 consistent gets
16952 physical reads
0 redo size
138667083 bytes sent via SQL*Net to client
733707 bytes received via SQL*Net from client
66668 SQL*Net roundtrips to/from client
0 sorts (memory)
1 sorts (disk)
1000000 rows processed
Surprisingly, the available memory was insufficient and the sort spilled to disk. Following
is the output from sql workarea active.pl, which shows that the session performed a onepass sort, since it only got a work area size of 51.2 MB:
CHAPTER 1 ■ PARTIALLY DOCUMENTED PARAMETERS
SID
148
148
148
148
148
…
148
TIME WORK AREA SIZE MAX MEM USED PASSES TEMPSEG SIZE
31:38
51.2
51.2
0
16
31:39
51.2
51.2
0
48
31:40
51.2
51.2
1
73
31:41
21.4
51.2
1
100
31:42
25.8
51.2
1
130
31:56
2.9
51.2
1
133
The timestamps confirm that the statement completed after 18 seconds. The temporary
segment grew to 133 MB, somewhat less than the table’s segment size. Obviously the entire
memory set aside with PAT is not available to a single session. Accordingly, additional undocumented restrictions must be in place. Searching the Internet for “pga_aggregate_target tuning
undocumented”, one quickly realizes that several hidden parameters impact automatic PGA
memory management. The names of the hidden parameters are PGA MAX SIZE, SMM MAX SIZE,
and SMM PX MAX SIZE. Of these, PGA MAX SIZE is in bytes and the other two in kilobytes (KB).
Descriptions and current values of these parameters are available by querying the X$ fixed
tables X$KSPPI and X$KSPPCV (see Part IV). The script auto pga parameters.sql, which queries
these X$ fixed tables and normalizes all four relevant parameters to kilobytes, is reproduced here:
SELECT x.ksppinm name,
CASE WHEN x.ksppinm like '%pga%' THEN to number(y.ksppstvl)/1024
ELSE to number(y.ksppstvl)
END AS value,
x.ksppdesc description
FROM x$ksppi x, x$ksppcv y
WHERE x.inst id = userenv('Instance')
AND y.inst id = userenv('Instance')
AND x.indx = y.indx
AND x.ksppinm IN ('pga aggregate target', ' pga max size',
' smm max size', ' smm px max size');
With the current settings, the script gives the following result:
C:> sqlplus -s / as sysdba @auto pga parameters
NAME
Value (KB) DESCRIPTION
-------------------- ---------- -------------------------------------------pga aggregate target
262144 Target size for the aggregate PGA memory
consumed by the instance
pga max size
204800 Maximum size of the PGA memory for one
process
smm max size
52428 maximum work area size in auto mode (serial)
smm px max size
131072 maximum work area size in auto mode (global)
15
16
CHAPTER 1 ■ PARTIALLY DOCUMENTED PARAMETERS
I prepared the script pga aggregate target iterator.sql, which varies PGA AGGREGATE
TARGET between its minimum value of 10 MB up to a maximum value of 32 GB and calls auto
pga parameters.sql for each setting, to investigate how changing the value of PAT affects these
hidden parameters. Shutting down and restarting the DBMS is not necessary since all three
parameters are recalculated dynamically within certain limits. The results are presented in
Table 1-1. Results beyond 8 GB are omitted, since no change in policy is seen beyond this value.
Table 1-1. PGA_AGGREGATE_TARGET (PAT) and Dependent Hidden Parameters
PAT
_pga_max_size
(% of PAT)
_smm_max_size
(% of _pga_max_size)
_smm_max_size
as % of PAT
_smm_px_max_size
(% of PAT)
10 MB
200 MB (2000%)
2 MB (1%)
20%
5 MB (50%)
32 MB
200 MB (625%)
6.4 MB (3.2%)
20%
16 MB (50%)
64 MB
200 MB (320%)
12.8 MB (6.4%)
20%
32 MB (50%)
128 MB
200 MB (156%)
25 MB (12%)
20%
64 MB (50%)
256 MB
200 MB (78%)
51 MB (25%)
20%
128 MB (50%)
512 MB
200 MB (39%)
100 MB (50%)
19.5%
256 MB (50%)
1 GB
204 MB (20%)
102 MB (50%)
10%
512 MB (50%)
2 GB
410 MB(20%)
205 MB (50%)
10%
1 GB (50%)
3 GB
416 MB (13.5%)
208 MB (50%)
6.8%
1536 MB (50%)
4 GB
480 MB (11.7%)
240 MB (50%)
5.8%
2 GB (50%)
8 GB
480 MB (5.8%)
240 MB (50%)
2.9%
4 GB (50%)
Looking at Table 1-1, a few patterns emerge. These are addressed in the sections that
follow. Note that when overriding the settings of SMM MAX SIZE or SMM PX MAX SIZE by putting
them into a parameter file, these are no longer dynamically adjusted as PGA AGGREGATE TARGET
is modified. Since both parameters are static, the ability to change them at runtime, albeit indirectly, is lost.
_PGA_MAX_SIZE
The parameter PGA MAX SIZE limits the maximum size of all work areas for a single process.
• For values of PAT below 1 GB, PGA MAX SIZE is 200 MB.
• For values of PAT between 1 GB and 2 GB, PGA MAX SIZE is 20% of PAT.
• At values beyond 2 GB, PGA MAX SIZE keeps on growing as PAT is increased, but at a
lower rate, such that PGA MAX SIZE is less than 20% of PAT.
• A limit of 480 MB on PGA MAX SIZE takes effect at a PAT value of 4 GB.
CHAPTER 1 ■ PARTIALLY DOCUMENTED PARAMETERS
• Increasing PAT beyond 4 GB does not result in higher values of PGA MAX SIZE than 480 MB.
• In Oracle9i, PGA MAX SIZE had a limit of 200 MB.
Just like PGA AGGREGATE TARGET, PGA MAX SIZE is a dynamic parameter that can be modified with ALTER SYSTEM. Changing PGA MAX SIZE increases SMM MAX SIZE in a similar way
that modifying PGA AGGREGATE TARGET does. However, the rule that SMM MAX SIZE is 50% of
PGA MAX SIZE does not hold for manual changes of PGA MAX SIZE. Following is an example
that increases PGA MAX SIZE beyond the limit of 480 MB that can be reached by modifying
PGA AGGREGATE TARGET:
SQL> @auto pga parameters
NAME
Value (KB) DESCRIPTION
-------------------- ---------- -------------------------------------------pga aggregate target
1048576 Target size for the aggregate PGA memory
consumed by the instance
pga max size
209700 Maximum size of the PGA memory for one
process
smm max size
104850 maximum work area size in auto mode (serial)
smm px max size
524288 maximum work area size in auto mode (global)
SQL> ALTER SYSTEM SET " pga max size"=500m;
System altered.
SQL> @auto pga parameters
NAME
Value (KB) DESCRIPTION
-------------------- ---------- -------------------------------------------pga aggregate target
1048576 Target size for the aggregate PGA memory
consumed by the instance
pga max size
512000 Maximum size of the PGA memory for one
process
smm max size
209715 maximum work area size in auto mode (serial)
smm px max size
524288 maximum work area size in auto mode (global)
By increasing PGA MAX SIZE, the work area size(s) available can be increased, without
extending the memory allowance for the entire instance. When memory is scarce this might
avoid some paging activity. As long as very few sessions concurrently request large work areas,
i.e., competition for PGA memory is low, this may lead to better response time for operations
involving large sorts. By altering PGA MAX SIZE, SMM MAX SIZE can be dynamically set to values
larger than the normal limit of 240 MB.
_SMM_MAX_SIZE
The parameter SMM MAX SIZE limits the maximum size of an individual work area for a single
process.
• For values of PAT below 512 MB, SMM MAX SIZE is 20% of PGA AGGREGATE TARGET.
• For PAT values of 512 MB and beyond, SMM MAX SIZE is always 50% of PGA MAX SIZE.
• In Oracle9i, SMM MAX SIZE had a limit of 100 MB. Following is an example of a session
that had two simultaneously active work areas when the given parameters were in effect:
17
18
CHAPTER 1 ■ PARTIALLY DOCUMENTED PARAMETERS
NAME
Value (KB)
-------------------- ---------pga aggregate target
1536000
pga max size
307200
smm max size
153600
smm px max size
768000
C:>
SID
159
159
. .
159
159
sql workarea active hash.pl
TIME HASH VALUE
TYPE WORK AREA SIZE MAX MEM USED PASSES TMP SIZE
57:46 1705656915 SORT (v2)
133.7
133.7
0
0
57:46 3957124346 HASH-JOIN
16.2
15.7
0
105
.
57:52 1705656915 SORT (v2)
133.7
133.7
0
0
57:52 3957124346 HASH-JOIN
108.3
96.1
1
138
Output from the Perl script sql workarea active hash.pl includes the columns
HASH VALUE and TYPE from V$SQL WORKAREA ACTIVE. Both work areas combined exceed
SMM MAX SIZE, but not PGA MAX SIZE.
_SMM_PX_MAX_SIZE
The setting of SMM PX MAX SIZE is always 50% of PGA AGGREGATE TARGET. There is no limit on
SMM PX MAX SIZE (at least not within the tested range of PGA AGGREGATE TARGET of 10 MB to
32 GB). In Oracle9i, SMM PX MAX SIZE was 30% of PGA AGGREGATE TARGET.
Shared Server
In Oracle10g, Shared Server was recoded to use automatic PGA memory management. Oracle9i
Shared Server uses the * AREA SIZE Parameters, i.e., it behaves as if ALTER SESSION SET WORKAREA
SIZE POLICY=MANUAL had been executed. Hence it is valid to leave SORT AREA SIZE inside an
Oracle9i PFILE or SPFILE and to set it to a more useful value, such as 1048576, than the default
65536. Of course it is still valid to set meaningful values for SORT AREA SIZE, HASH AREA SIZE,
and so on in Oracle10g, for sessions that might run with manual work area sizing (WORKAREA
SIZE POLICY=MANUAL).
Parallel Execution
The hidden parameter SMM PX MAX SIZE applies to parallel execution, but exactly how needs
to be revealed by further tests. Regarding parallel execution (PX), it is important to bear in
mind that a parallel full scan of a table at degree n divides the work among n parallel execution
processes, such that the volume of data handled by each process equates to approximately one
nth of the entire data volume. The figure n is commonly called the degree of parallelism or DOP.
Each parallel execution process allocates its own work area(s). Since each process handles
merely a fraction of the data, the work areas required by individual processes in parallel mode
are smaller than a single work area in serial mode.
It turns out that SMM PX MAX SIZE places an additional restriction on the maximum work
area size, which is exercised on parallel execution processes. Each PX process may not use
more than SMM PX MAX SIZE/DOP memory. The per process restriction of SMM MAX SIZE
CHAPTER 1 ■ PARTIALLY DOCUMENTED PARAMETERS
remains in effect for PX, such that the available memory is the lesser of SMM MAX SIZE and
SMM PX MAX SIZE/DOP. To sort entirely in memory, these two conditions must be met:
• The data volume per PX process must be less than SMM MAX SIZE.
• The data volume per PX process must be less than SMM PX MAX SIZE/DOP.
Let’s run some examples. The previous tests revealed that the SELECT from the test table
has a data volume of about 133 MB. Thus, at a DOP of four, each PX process requires a work
area size of around 133 MB divided by 4, or approximately 34 MB for an optimal sort. Rounding
up slightly to 40 MB to allow for fluctuations of the data volume among PX processes, we will
set SMM MAX SIZE=40960, since the unit of SMM MAX SIZE is KB. To avoid PGA AGGREGATE TARGET
or SMM PX MAX SIZE becoming the limiting factor, we also set both parameters to DOP times
SMM MAX SIZE or 160 MB. To set these parameters, place the following three lines into a parameter
file and restart the instance with STARTUP PFILE:
pga aggregate target=160m
smm px max size=163840 # in KB
smm max size=40960 # in KB
Verifying the settings with the script auto pga parameters.sql gives this result:
NAME
Value (KB) DESCRIPTION
-------------------- ---------- -------------------------------------------pga aggregate target
163840 Target size for the aggregate PGA memory
consumed by the instance
pga max size
204800 Maximum size of the PGA memory for one
process
smm max size
40960 maximum work area size in auto mode (serial)
smm px max size
163840 maximum work area size in auto mode (global)
Next, a FULL and a PARALLEL hint must be added to the SELECT statement to enable parallel
execution.
SQL> SELECT /*+ FULL(r) PARALLEL(r, 4) */ * FROM random strings r ORDER BY 1;
Running the parallel query at a DOP of four and monitoring it with sql workarea
active.pl gives this:
SID
143
144
145
146
…
145
…
146
…
144
…
143
TIME WORK AREA SIZE MAX MEM USED PASSES TEMPSEG SIZE
06:36
1.7
1.6
0
0
06:36
1.7
1.6
0
0
06:36
2.6
2.1
0
0
06:36
1.7
1.6
0
0
06:43
32.3
39.6
0
0
06:46
31.6
31.6
0
0
06:48
31.4
31.4
0
0
06:50
31.2
31.2
0
0
19
20
CHAPTER 1 ■ PARTIALLY DOCUMENTED PARAMETERS
As expected, an optimal sort was performed. The response time is 14 s. Halving DOP
results in only two processes sharing the workload and the following measurements:
SID
140
147
…
147
…
140
TIME WORK AREA SIZE MAX MEM USED PASSES TEMPSEG SIZE
23:48
3.1
2.7
0
0
23:48
3.8
3.1
0
0
24:03
1.2
40
1
71
24:08
1.2
40
1
63
Here, SMM MAX SIZE leads to a degradation of response time to around 20 s, since at DOP
two each process requires a work area size of around 75 MB, but only 40 MB was available,
resulting in one-pass sorts and spilling to disk. Now back to the original DOP of four—a reduction of SMM PX MAX SIZE below the data volume divided by DOP also results in spilling to disk.
Following are the results at DOP four with these settings:
pga aggregate target=160m
smm px max size=122880 # in KB
smm max size=40960 # in KB
This time, SMM PX MAX SIZE is the limiting factor.
SID
143
…
145
…
146
…
144
…
143
TIME WORK AREA SIZE MAX MEM USED PASSES TEMPSEG SIZE
33:27
1.7
1.7
0
0
33:41
1.2
30
1
40
33:44
1.2
30
1
32
33:46
1.2
30
1
32
33:49
1.2
30
1
31
All slaves spilled their work areas to disk, since work areas were limited to 120 MB/DOP=40 MB
and the query completed in 22 s.
Lessons Learned
When using automatic PGA memory management, three hidden parameters— PGA MAX SIZE,
SMM MAX SIZE, and SMM PX MAX SIZE—work behind the scenes to enforce restrictions on
memory consumption. The parameter PGA MAX SIZE limits the size of all work areas in use by
a single process. The size of an individual work area is limited by SMM MAX SIZE for both serial
and parallel execution. When parallel execution is used, an additional restriction on the total
size of all work areas in use by the processes involved is in place. This limit is controlled with
the parameter SMM PX MAX SIZE. Within certain limits, all three parameters are recalculated at
runtime as a result of modifying PAT. All three parameters may be set manually to override the
result of this calculation.
CHAPTER 1 ■ PARTIALLY DOCUMENTED PARAMETERS
EVENT
The initialization parameter EVENT is partially documented in the Oracle Database Reference
manual. The parameter syntax as well as which events may be set are undocumented. The manual
states that the parameter must not be used except under the supervision of Oracle Support
Services.
The parameter EVENT may be used to set one or more events at instance level. Events set
in this way are enabled for the entire lifetime of an instance. All other approaches for setting
events, such as DBMS SYSTEM, do not cover the entire lifetime of an instance. The parameter is
appropriate for situations where other means for setting events are not feasible or events must
be set right when a process starts. Processing of a technical assistance request by Oracle Support
Services may involve setting certain events. A DBA who is familiar with the parameter EVENT is
less dependent on Oracle Support and may find a workaround or gather diagnostic data without
needing to ask for assistance.
Syntax
The events that may be set with the parameter EVENT are the same events that can be set by
other means, such as ALTER SESSION, ALTER SYSTEM, and ORADEBUG. The commonalities go even
further, since the event specification syntax for the aforementioned methods and the parameter
EVENT is identical. Multiple events may be set by entering several event specifications separated
by colons. The syntax is:
event='event specification1[:event specificationN]*'
Brackets indicate that an element is optional. The asterisk indicates that the preceding
element may be repeated. The syntax for an individual event specification is as follows:
event number trace name context forever, level event level
The placeholders event_number and event_level are both integers. Most event numbers
are in the range 10000 to 10999.9 On UNIX systems, these events are listed in the file $ORACLE
HOME/rdbms/mesg/oraus.msg along with a description. The supported event level is unspecified
for most of the events in the file, such that it may be necessary to involve Oracle Support to
determine the correct level. The OERR utility may be used to retrieve the description for a
certain event. Following is an example for an event that switches off a cost-based optimizer
(CBO) access path:
$ oerr ora 10196
10196, 00000, "CBO disable index skip scan"
// *Cause:
// *Action:
It is also possible to request a diagnostic dump when an ORA-nnnnn error occurs. The
syntax for this is identical to the syntax that must be used with ALTER SESSION/SYSTEM SET EVENTS
(covered in Chapter 13). Further information on events is provided in Part III.
9. Events 14532 (enable bug fix for excess use of shared pool memory during DDL on partitioned objects
in 10.2.0.3) and 38068 (CBO enable override of guess impact on index choice) are example exceptions
to this rule.
21
22
CHAPTER 1 ■ PARTIALLY DOCUMENTED PARAMETERS
Leveraging Events at the Instance-Level
Several scenarios mandate setting events at the instance level. These are:
• Enabling or disabling bug fixes
• Enabling or disabling features, such as cost-based optimizer access paths
• Tracing certain code paths or features in all processes of an instance
• Enabling or disabling certain checks
• Writing a diagnostic dump whenever an ORACLE error (ORA-nnnnn) is raised in any
database session
Consider the parameter EVENT whenever events must be set right when a process starts
or for all processes of an instance. While it is absolutely feasible to obtain SQL trace files for
multiple processes, such as those originating from a connection pool by setting event 10046
with parameter EVENT, I am a strong opponent of such a procedure because more sophisticated
approaches, such as using a logon trigger or DBMS MONITOR exist. Setting event 10046 at level 8
or 12 with the parameter EVENT is better than setting SQL TRACE=TRUE at the instance level, since
wait events and binds may be included; however, both incur the unnecessary overhead of
tracing each and every process of the instance. I certainly wouldn’t be willing to sift through
dozens or even hundreds of trace files to find a few relevant ones when other features allow
tracing just the processes of interest.
Case Study
Recovery Manager (RMAN) supports writing backups to file systems and to third party media
managers. Writing backups to a file system works out of the box and does not incur additional
expenses. Since writing to a local file system does not protect against the failure of the database
server hardware as a whole, writing to remote file systems or network-attached storage (NAS)
arrays via NFS is supported too. RMAN has several undocumented requirements concerning
NFS mount options. If the mount options hard,rsize=32768,wsize=32768 are not used, RMAN
will refuse to write to an NFS file system. However, certain releases of RMAN still throw an error
when these requirements are met. Under these circumstances, Oracle Support has suggested
setting event 10298 at level 32 as a temporary workaround until the underlying issue is resolved.
This is a case for setting an event at the instance level with parameter EVENT. With other
methods for setting events, such as ORADEBUG or DBMS SYSTEM, it is impossible to set the event in
time for the multiple processes that RMAN spawns. Furthermore it would be too cumbersome
to set the event after each instance startup with ALTER SYSTEM SET EVENTS.
OS_AUTHENT_PREFIX
The initialization parameter OS AUTHENT PREFIX is documented in the Oracle Database Reference
manual. It is undocumented that a database user name that is prefixed by the string OPS$
allows for local authentication through the operating system and password authentication
when connecting over a network.
Since REMOTE OS AUTHENT=FALSE should be set for security reasons, it’s impossible to use
externally-identified users to connect to an instance over a network, e.g., using Oracle Net and
CHAPTER 1 ■ PARTIALLY DOCUMENTED PARAMETERS
the TCP/IP protocol adapter. Creating OPS$ users with password authentication allows the
convenience of omitting the user name and password when connecting locally using the Oracle
Net bequeath adapter, while being able to connect over a network using password authentication.
OPS$ Database Users and Password Authentication
Operating system authentication is intended for local connections. The Oracle Database SQL
Reference 10g Release 2 manual states the following on externally-identified users:
EXTERNALLY Clause
Specify EXTERNALLY to create an external user. Such a user must be authenticated by
an external service, such as an operating system or a third-party service. In this case,
Oracle Database relies on authentication by the operating system or third-party service
to ensure that a specific external user has access to a specific database user.
In the same way that a user who belongs to the DBA group (usually the UNIX group dba)
can connect with SYSDBA privileges without entering a password using CONNECT / AS SYSDBA, an
externally-identified user can connect using CONNECT /. When verifying credentials for an externally-identified user, the value of the ORACLE initialization parameter OS AUTHENT PREFIX is
prepended to the operating system user name. If the resulting user name exists in the data
dictionary and DBA USERS.PASSWORD=EXTERNAL for this user, then the user may connect without
entering a password. The syntax for creating an externally-identified user is as follows:
CREATE USER <os authent prefix><os user name> IDENTIFIED EXTERNALLY;
It is undocumented that operating system authentication also works for users created with
password authentication as long as OS AUTHENT PREFIX is left at its default setting of ops$. That
is, users created with the syntax CREATE USER ops$os_user_name IDENTIFIED BY password may
connect locally without entering a password as long as OS AUTHENT PREFIX=ops$. In a way, this
approach combines the best of both worlds. The need to enter passwords for interactive database sessions as well as storing passwords for batch jobs running locally is dispelled and the
same user name may be used to connect over the network.
Case Study
The environment for this case study is a UNIX system, where the DBA group name is “dba”, the
OPER group name is “oper”, and the ORACLE software owner group is “oinstall”. Furthermore,
a password file is used. In a moment you will see how a user who is not a member of any of the
aforementioned three special groups may be granted the SYSOPER privilege, allowing him to
start and stop an instance, while not being able to change parameters or to modify the ORACLE
software installation. This is an additional option that may be implemented with the undocumented approach discussed in the previous section.
First of all, we verify that the parameter OS AUTHENT PREFIX has the default value ops$.
SQL> SHOW PARAMETER os authent prefix
NAME
TYPE
VALUE
------------------------------------ ----------- ----os authent prefix
string
ops$
23
24
CHAPTER 1 ■ PARTIALLY DOCUMENTED PARAMETERS
Next, we create a database user whose name is formed by prepending the string ops$ to the
operating system user name, in this case “ndebes”, and grant the privileges CONNECT and SYSOPER
to the new user.
SQL> CREATE USER ops$ndebes IDENTIFIED BY secret;
User created.
SQL> GRANT CONNECT, SYSOPER TO ops$ndebes;
Grant succeeded.
SQL> SELECT * FROM v$pwfile users;
USERNAME
SYSDBA SYSOPER
------------------------------ ------ ----SYS
TRUE
TRUE
OPS$NDEBES
FALSE TRUE
As evidenced by Figure 1-3, the database user OPS$NDEBES can connect via the Oracle
Net TCP/IP adapter from a Windows system. Password authentication is required, since
REMOTE OS AUTHENT=FALSE is set.
Figure 1-3. SQL*Plus Session via the Oracle Net TCP/IP Adapter
Back on UNIX, the operating system user “ndebes” can connect without entering a password.
$ id
uid=500(ndebes) gid=100(users) groups=100(users)
$ sqlplus /
SQL*Plus: Release 10.2.0.3.0 - Production on Wed Sep 5 08:02:33 2007
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - Production
SQL> SHOW USER
USER is "OPS$NDEBES"
Thanks to password authentication and a password file, connecting AS SYSOPER works too.
SQL> CONNECT ops$ndebes/secret AS SYSOPER
Connected.
SQL> SHOW USER
USER is "PUBLIC"
SQL> SELECT * FROM session privs;
PRIVILEGE
CHAPTER 1 ■ PARTIALLY DOCUMENTED PARAMETERS
---------------------------------------CREATE SESSION
RESTRICTED SESSION
SYSOPER
Due to the SYSOPER privilege, the database user “OPS$NDEBES” can stop and restart
the instance.
SQL> SHUTDOWN IMMEDIATE
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> STARTUP
ORACLE instance started.
Database mounted.
Database opened.
Contrary to SYSDBA, the SYSOPER privilege does not include access to data dictionary views
or tables, but allows the use of ARCHIVE LOG LIST for monitoring. Merely database objects accessible to PUBLIC may be accessed with the SYSOPER privilege.
SQL> SELECT startup time FROM v$instance;
SELECT startup time FROM v$instance
*
ERROR at line 1:
ORA-00942: table or view does not exist
SQL> ARCHIVE LOG LIST
Database log mode
No Archive Mode
Automatic archival
Disabled
Archive destination
/opt/oracle/product/db10.2/dbs/arch
Oldest online log sequence
18
Current log sequence
19
The combined benefits of operating system and password authentication become
unavailable with a nondefault setting of OS AUTHENT PREFIX. The SYSDBA privilege can merely be
granted to database users created with password authentication, but obviously such users
must enter the correct password when connecting. The problem is that the undocumented
check for operating system authentication in spite of an assigned password is not done when
OS AUTHENT PREFIX has a nondefault value.
SQL> ALTER SYSTEM SET os authent prefix='' SCOPE=SPFILE;
System altered.
Since OS AUTHENT PREFIX is now a zero-length string, operating system user name and
database user name are identical.
SQL> CREATE USER ndebes IDENTIFIED BY secret;
User created.
SQL> GRANT CONNECT, SYSOPER TO ndebes;
Grant succeeded.
25
26
CHAPTER 1 ■ PARTIALLY DOCUMENTED PARAMETERS
To allow the changed value of OS AUTHENT PREFIX to take effect, the instance must be
restarted. Clearly, the operating system user “ndebes” will not be able to connect as database
user “ndebes” without entering the password “secret”.
$ sqlplus -s /
ERROR:
ORA-01017: invalid username/password; logon denied
When setting the authentication method for the user to operating system authentication,
the string “EXTERNAL” instead of a password hash is stored in DBA USERS.PASSWORD.
SQL> ALTER USER ndebes IDENTIFIED externally;
User altered.
SQL> SELECT password FROM dba users WHERE username='NDEBES';
PASSWORD
-----------------------------EXTERNAL
Now the operating system user “ndebes” is able to connect without entering the password.
$ id
uid=500(ndebes) gid=100(users) groups=100(users)
$ sqlplus /
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - Production
SQL> CONNECT ndebes/secret as SYSOPER
ERROR:
ORA-01031: insufficient privileges
However, the ability to connect as SYSOPER using the password stored in the password file
is lost for the now externally-identified database user. The same applies to the privilege SYSDBA.
Lessons Learned
There is an undocumented code path that enables operating system authentication for database users whose user names start with OPS$, even when these users are created with password
authentication. This combines the best aspects of the otherwise mutually-exclusive approaches
of operating system and password authentication. To leverage the undocumented feature, the
initialization parameter OS AUTHENT PREFIX must have the default value ops$. The feature may
also be used to set up a single database user with SYSDBA or SYSOPER privileges who does not
belong to the DBA or OPER operating system groups and who can connect locally without
entering a password. Such a user must only enter the password when connecting over the network
or when needing a session with SYSDBA or SYSOPER privileges. Separate database users are
required without the undocumented feature or if a nondefault setting of OS AUTHENT PREFIX is
in effect. If you are dealing with a security-sensitive environment and need to make sure that
an intruder cannot exploit this feature, you should disable it by assigning a nondefault value to
the parameter OS AUTHENT PREFIX.
CHAPTER 1 ■ PARTIALLY DOCUMENTED PARAMETERS
Source Code Depot
Table 1-2 lists this chapter’s source files and their functionality.
Table 1-2. Partially Documented Parameters Source Code Depot
File Name
Functionality
auto pga parameters.sql
Retrieves all the documented and undocumented
parameters that affect SQL work area sizing.
pga aggregate target iterator.sql
This script varies PGA AGGREGATE TARGET between
10 MB and 32 GB and calls auto pga parameters.sql
at each iteration.
row factory.sql
Creates a pipelined table function that returns an
arbitrary number of rows.
sort random strings.sql
This script enables SQL*Plus AUTOTRACE and selects
rows from the test table RANDOM STRINGS.
sql workarea active.pl
This Perl script monitors work areas by querying the
dynamic performance view V$SQL WORKAREA ACTIVE.
sql workarea active hash.pl
This Perl script monitors work areas by querying the
dynamic performance view V$SQL WORKAREA ACTIVE.
The output includes SQL statement hash values.
27
CHAPTER 2
■■■
Hidden Initialization
Parameters
T
he view V$PARAMETER provides access to documented initialization parameters. This view is
built on top of the X$ fixed tables X$KSPPI and X$KSPPCV (more on X$ fixed tables in Part IV). All
hidden parameters start with one or two underscores. V$PARAMETER is built in such a way that
it lists documented parameters only. There are 540 undocumented parameters in Oracle9i
Release 2, 1,124 in Oracle10g Release 2, and 1,627 in Oracle11g. Their names and a short
description may be retrieved with the following query (file hidden parameters.sql):
SQL> SELECT ksppinm name,
ksppstvl value,
ksppdesc description
FROM x$ksppi x, x$ksppcv y
WHERE (x.indx = y.indx)
AND x.inst id=userenv('instance')
AND x.inst id=y.inst id
AND ksppinm LIKE '\ %' ESCAPE '\'
ORDER BY name;
The source code depot contains a complete list of undocumented parameters for the releases
Oracle9i, Oracle10g, and Oracle11g. Due to the large amount of undocumented parameters, it
is impossible to investigate what they all do and to document the results. I have chosen TRACE
FILES PUBLIC and ASM ALLOW ONLY RAW DISKS as examples of two useful hidden parameters
and discuss both in detail in this chapter. Of course there are many more, which may be useful
under certain circumstances. Some may be set to work around bugs (e.g., DISABLE RECOVERABLE
RECOVERY), while others may have an effect on performance (e.g., ROW CACHE CURSORS, ASM
AUSIZE). Still others may be used to salvage data from a database when the documented
recovery procedures fail due to invalid backup procedures or the loss of the current online
redo log ( ALLOW RESETLOGS CORRUPTION, OFFLINE ROLLBACK SEGMENTS). Normally hidden
parameters should only be set under the supervision of Oracle Support Services.
29
30
CHAPTER 2 ■ HIDDEN INITIALIZATION PARAMETERS
Trace File Permissions and
_TRACE_FILES_PUBLIC
Trace files are created either on request, e.g., with ALTER SYSTEM SET SQL TRACE=TRUE or when
internal errors occur. Trace files from foreground processes are located in the directory set
with the parameter USER DUMP DEST, whereas trace files from background processes take the
directory setting from the initialization parameter BACKGROUND DUMP DEST. In any case, the file
name extension is .trc. By default, trace files are readable only for the owner of the ORACLE
installation (normally “oracle”) or members of the installation group (normally “oinstall”). If a
database administrator does not belong to the installation group, even he or she cannot read
trace files.
Since trace files may contain sensitive information, either as bind variable values or literals,
it is appropriate that the default permissions are restrictive. On a test system, however, where
developers enable SQL trace and need to analyze the output with TKPROF, it’s much more
convenient to allow anyone with access to the system to read trace files. A hidden parameter
called TRACE FILES PUBLIC may be used to make newly created trace files readable by everyone.
As shown by running the script hidden parameter value.sql, the default setting of the static
parameter is FALSE:
$ cat hidden parameter value.sql
col name format a33
col value format a36
set verify off
SELECT x.ksppinm name, y.ksppstvl value
FROM x$ksppi x, x$ksppcv y
WHERE x.inst id = userenv('Instance')
AND y.inst id = userenv('Instance')
AND x.indx = y.indx
AND x.ksppinm='&hidden parameter name';
$ sqlplus -s / as sysdba @hidden parameter value.sql
Enter value for hidden parameter name: trace files public
NAME
VALUE
--------------------------------- -----------------------------------trace files public
FALSE
Let’s have a look at the permissions of files in the user dump destination.
SQL> SHOW PARAMETER user dump dest
NAME
TYPE
VALUE
---------------------------- ----------- --------------------------------user dump dest
string
/opt/oracle/obase/admin/TEN/udump
SQL> !cd /opt/oracle/obase/admin/TEN/udump; ls -l
total 68
-rw-r----- 1 oracle oinstall 1024 Jul 21 21:26 ten1 ora 11685.trc
-rw-r----- 1 oracle oinstall 874 Jul 24 02:56 ten1 ora 13035.trc
-rw-r----- 1 oracle oinstall 737 Jul 24 02:56 ten1 ora 13318.trc
CHAPTER 2 ■ HIDDEN INITIALIZATION PARAMETERS
As expected, read permission is granted solely to the owner of the file and the group
“oinstall”.1 If a server parameter file (SPFILE) is used, TRACE FILES PUBLIC must be changed
with an ALTER SYSTEM command. Double quotes around the parameter name are mandatory,
since it starts with an underscore ( ).
SQL> ALTER SYSTEM SET " trace files public" = TRUE SCOPE=SPFILE;
Double quotes around the parameter are not required when a text parameter file (PFILE) is
used. Since the parameter is static, the instance must be shut down and restarted for the new
setting to take effect. Using ORADEBUG (see Chapter 37) we can quickly verify that read permission for others is now granted on newly created trace files.
SQL> ORADEBUG SETMYPID
Statement processed.
SQL> ALTER SESSION SET SQL TRACE=TRUE;
Session altered.
SQL> SELECT sysdate FROM dual;
SYSDATE
--------25-JUL-07
SQL> ORADEBUG TRACEFILE NAME
/opt/oracle/obase/admin/TEN/udump/ten1 ora 18067.trc
SQL> !cd /opt/oracle/obase/admin/TEN/udump;ls -l ten1 ora 18067.trc
-rw-r--r-- 1 oracle oinstall 1241 Jul 25 20:53 ten1 ora 18067.trc
As you would expect by looking at the parameter name, TRACE FILES PUBLIC has no effect
on permissions of the alert log.
ASM Test Environment and
_ASM_ALLOW_ONLY_RAW_DISKS
Automatic Storage Management (ASM) is essentially a volume manager and a file system for
exclusive use by ORACLE instances. The volume management capabilities include mirroring
and striping. ASM implements the S.A.M.E. (stripe and mirror everything)2 approach. ASM uses a
number of raw devices,3 concatenates them into a large pool of storage, and offers the storage
space as a kind of file system to ORACLE instances. Raw disks (e.g., LUNs in a SAN) are grouped
into disk groups. ASM can rely on RAID storage arrays for mirroring (external redundancy) or it
can do its own mirroring (normal/high redundancy). If necessary, disks in a disk group may
1. The format in which the UNIX command ls displays permissions, is {r|-}{w|-}{x|-}. This sequence of
characters is repeated three times. The left part applies to the owner of the file, group permissions are
in the middle, and permissions for anyone (a.k.a. world) are on the right. A minus sign means that the
permission represented by the position in the string is not granted. For example, rwxr xr x means that
the owner may read, write, and execute, the group may read and execute and anyone may read and
execute.
2. See http://www.oracle.com/technology/deploy/availability/pdf/OOW2000 same ppt.pdf
3. Oracle10g Release 2 on Linux supports block devices too. These are opened with O DIRECT to eliminate
caching by the Linux operating system kernel as with raw devices. Performance is the same as that of
raw devices, which have been deprecated on the Linux platform.
31
32
CHAPTER 2 ■ HIDDEN INITIALIZATION PARAMETERS
be assigned to failure groups, which indicate the storage system topology to ASM, such that
mirrored copies can be placed on different storage arrays or may be accessed using different
host bus adapters.
For readers who would like to familiarize themselves with ASM, but do not have access to
a SAN or cannot create raw devices on a local disk due to space constraints or lack of privileges,
this chapter demonstrates how to set up a test environment for automatic storage management
on Windows with cooked files and ASM ALLOW ONLY RAW DISKS. Old school UNIX jargon distinguished raw files from cooked files. Cooked files are simply the opposite of raw devices—files in a
file system. After all, something that’s not raw has to be cooked, right?
ASM Hidden Parameters
Undocumented parameters pertaining to ASM may be retrieved by running the following
query as user SYS:
SQL> SELECT x.ksppinm name, y.ksppstvl value, x.ksppdesc description
FROM x$ksppi x, x$ksppcv y
WHERE x.inst id = userenv('Instance')
AND y.inst id = userenv('Instance')
AND x.indx = y.indx
AND x.ksppinm LIKE '\ asm%' ESCAPE '\'
ORDER BY name;
NAME
VALUE
DESCRIPTION
------------------------------ ---------- ----------------------------------asm acd chunks
1
initial ACD chunks created
asm allow only raw disks
TRUE
Discovery only raw devices
asm allow resilver corruption FALSE
Enable disk resilvering for
external redundancy
asm ausize
1048576
allocation unit size
asm blksize
4096
metadata block size
asm disk repair time
14400
seconds to wait before dropping a
failing disk
asm droptimeout
60
timeout before offlined disks get
dropped (in 3s ticks)
asm emulmax
10000
max number of concurrent disks to
emulate I/O errors
asm emultimeout
0
timeout before emulation begins (in
3s ticks)
asm kfdpevent
0
KFDP event
asm libraries
ufs
library search order for discovery
asm maxio
1048576
Maximum size of individual I/O
request
asm stripesize
131072
ASM file stripe size
asm stripewidth
8
ASM file stripe width
asm wait time
18
Max/imum time to wait before asmb
exits
asmlib test
0
Osmlib test event
asmsid
asm
ASM instance id
CHAPTER 2 ■ HIDDEN INITIALIZATION PARAMETERS
These parameters unveil some of ASM’s inner workings. The default settings indicate that
ASM divides an allocation unit ( ASM AUSIZE) into 128 KB chunks ( ASM STRIPESIZE) and places
each of those chunks on up to eight different disks ( ASM STRIPEWIDTH). Very large databases
may benefit from increasing ASM AUSIZE, but this is beyond the scope of this book.4 This chapter
merely addresses ASM ALLOW ONLY RAW DISKS.
Setting Up Oracle Clusterware for ASM
To start an ASM instance, a stripped down version of Oracle Clusterware must be running on
the same system. This is accomplished with the command %ORACLE HOME%\bin\localconfig
add, which creates an ORACLE cluster registry (OCR) in %ORACLE HOME%\cdata\localhost\
local.ocr. It also creates a new Windows service for the OCSSD Clusterware daemon. OCSSD
logging goes to the file %ORACLE HOME%\log\<host name>\cssd\ocssd.log.
C:> localconfig add
Step 1: creating new OCR repository
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'ndebes', privgrp ''..
Operation successful.
Step 2: creating new CSS service
successfully created local CSS service
successfully added CSS to home
The service that implements OCSSD is called OracleCSService. You can verify that the
service is functional by running the command net start in a Windows command interpreter.
This command lists all running services.
C:> net start
…
OracleCSService
OracleDB 10 2TNSListener
…
The Clusterware command for checking OCSSD’s status is crsctl check css.
C:> crsctl check css
CSS appears healthy
By running crsctl check crs, it becomes apparent that only a subset of Clusterware daemons
are active in a local-only configuration. The CRS and EVM daemons required for RAC are
not needed.
C:> crsctl check crs
CSS appears healthy
Cannot communicate with CRS
Cannot communicate with EVM
4. See http://h20331.www2.hp.com/ERC/downloads/4AA0 9728ENW.pdf and Metalink note 368055.1.
33
34
CHAPTER 2 ■ HIDDEN INITIALIZATION PARAMETERS
ASM Instance Setup
Next, we will create some cooked files for ASM storage. ASM will use these files instead of raw
disks. On Windows, the undocumented switch -create of the asmtool command is used to
create cooked files for use by ASM. The syntax is as follows:
asmtool -create <file name> <file size mb>
We will use the asmtool command to create four files which will serve as “disks”. Each file
has a size of 512 MB.
C:\> mkdir C:\oradata
C:\> asmtool -create C:\oradata\ARRAY1 DISK1 512
Repeat the asmtool command with the file names ARRAY1 DISK2, ARRAY2 DISK1, and
ARRAY2 DISK2 to create three more files. On UNIX, the dd command is available to accomplish
the same task as asmtool -create. Following is an example for creating a file with a size of 512 MB:
$ dd if=/dev/zero bs=1048576 count=512 of=ARRAY1 DISK1
The above dd command reads 512 blocks with a size of 1 MB each from the device special
file /dev/zero. Since reading from /dev/zero returns nothing but binary zeros, the resulting file
is zeroed out as required by ASM. The dd command is available for Windows systems by installing
Cygwin (http://www.cygwin.com). The dd option conv=notrunc is interesting in that it may be
used to zero out a section within a file to simulate a failure or induce a block corruption.
The four files will be used to simulate two disk arrays (ARRAY1 and ARRAY2) with two
logical units (LUNs) each. We will then set up ASM to mirror across the two arrays. Striping
occurs within the array boundaries.
C:\oradata>ls
total 2463840
-rw-rw-rw- 1
-rw-rw-rw- 1
-rw-rw-rw- 1
-rw-rw-rw- 1
-l
ndebes
ndebes
ndebes
ndebes
mkpasswd
mkpasswd
mkpasswd
mkpasswd
536870912
536870912
536870912
536870912
Nov
Nov
Nov
Nov
2
2
2
2
13:34
13:38
13:40
13:38
ARRAY1
ARRAY1
ARRAY2
ARRAY2
DISK1
DISK2
DISK1
DISK2
To start an ASM instance, a parameter file that contains INSTANCE TYPE=ASM is required.
The parameter ASM DISKSTRING is used to indicate where ASM should search for disks. Create a
file called pfile+ASM.ora with the following contents in %ORACLE HOME%\database:
instance type = ASM
asm diskstring = 'c:\oradata\*'
Next, create a Windows service for the ORACLE ASM instance with oradim:
C:> oradim -new -asmsid +ASM -syspwd secret -startmode manual -srvcstart demand
Instance created.
The command oradim creates and starts a Windows service called OracleASMService+ASM.
You may verify that the service is running with net start.
C:> net start | grep -i asm
OracleASMService+ASM
Now we are ready to start the ASM instance.
CHAPTER 2 ■ HIDDEN INITIALIZATION PARAMETERS
C:> set ORACLE SID=+ASM
C:> sqlplus / as sysdba
SQL*Plus: Release 10.2.0.3.0 - Production on Tue Aug 14 16:17:51 2007
Copyright (c) 1982, 2005, Oracle. All rights reserved.
Connected to an idle instance.
SQL> STARTUP NOMOUNT PFILE=?\database\pfile+ASM.ora
ASM instance started
Total System Global Area
79691776 bytes
Fixed Size
1247396 bytes
Variable Size
53278556 bytes
ASM Cache
25165824 bytes
Next, we create a server parameter file (SPFILE), such that ASM will be able to store disk
group names it should mount at instance startup in the SPFILE.
SQL> CREATE SPFILE FROM PFILE='?\database\pfile+ASM.ora';
File created.
This creates an SPFILE called spfile+ASM.ora. Let’s see whether ASM recognizes the
cooked files as disks.
SQL> SELECT path, header status FROM v$asm disk;
no rows selected
ASM does not see any disks that it might use. This is not surprising, since the default
setting of the parameter ASM ALLOW ONLY RAW DISKS is TRUE. We need to shut down the
instance and restart it before we can change the parameter in the SPFILE that we created.
SQL> SHUTDOWN IMMEDIATE
ORA-15100: invalid or missing diskgroup name
ASM instance shutdown
SQL> STARTUP NOMOUNT
ASM instance started
…
SQL> SHOW PARAMETER SPFILE
NAME
TYPE
VALUE
-------- ----------- -----------------------------------------------spfile
string
C:\ORACLE\PRODUCT\DB10.2\DATABASE\SPFILE+ASM.ORA
Since ASM ALLOW ONLY RAW DISKS is a static parameter, another instance restart is
required after changing it.
SQL> ALTER SYSTEM SET " asm allow only raw disks"=FALSE SCOPE=SPFILE SID='*';
System altered.
SQL> SHUTDOWN IMMEDIATE
ORA-15100: invalid or missing diskgroup name
ASM instance shutdown
SQL> STARTUP NOMOUNT
ASM instance started
…
35
36
CHAPTER 2 ■ HIDDEN INITIALIZATION PARAMETERS
SQL> SELECT path, header status, library, total mb, free mb FROM v$asm disk;
PATH
HEADER STATUS LIBRARY
TOTAL MB
FREE MB
----------------------- ------------- ------- ---------- ---------C:\ORADATA\ARRAY1 DISK1 CANDIDATE
System
512
0
C:\ORADATA\ARRAY2 DISK2 CANDIDATE
System
512
0
C:\ORADATA\ARRAY2 DISK1 CANDIDATE
System
512
0
C:\ORADATA\ARRAY1 DISK2 CANDIDATE
System
512
0
This time ASM did recognize the cooked files as disks for use in a disk group, so we may go
ahead and create a disk group with external redundancy. By assigning the failure group array1
to the disks in the first disk array (files ARRAY1 DISK1 and ARRAY1 DISK2) and the failure group
array2 to the second disk array (files ARRAY2 DISK1 and ARRAY2 DISK2), ASM is instructed to
mirror across the two disk arrays. It will automatically stripe the data within each disk array.
SQL> CREATE DISKGROUP cooked dg NORMAL REDUNDANCY
FAILGROUP array1
DISK
'C:\ORADATA\ARRAY1 DISK1' NAME array1 disk1,
'C:\ORADATA\ARRAY1 DISK2' NAME array1 disk2
FAILGROUP array2
DISK
'C:\ORADATA\ARRAY2 DISK1' NAME array2 disk1,
'C:\ORADATA\ARRAY2 DISK2' NAME array2 disk2;
Diskgroup created.
The disks that were formerly candidates are now members of a disk group.
SQL> SELECT path, header status, library, total mb, free mb FROM v$asm disk;
PATH
HEADER STATUS LIBRARY
TOTAL MB
FREE MB
----------------------- ------------- ------- ---------- ---------C:\ORADATA\ARRAY1 DISK1 MEMBER
System
512
482
C:\ORADATA\ARRAY1 DISK2 MEMBER
System
512
489
C:\ORADATA\ARRAY2 DISK1 MEMBER
System
512
484
C:\ORADATA\ARRAY2 DISK2 MEMBER
System
512
487
As you can see by comparing the columns TOTAL MB and FREE MB, ASM uses quite a bit of
space for internal purposes. The view V$ASM DISKGROUP gives access to information on disk groups.
If you have read the overview of hidden ASM parameters at the beginning of this chapter attentively, you will recognize the settings of two hidden parameters in the following output:
SQL> SELECT name, block size, allocation unit size,
type, total mb, usable file mb
FROM v$asm diskgroup;
NAME
BLOCK SIZE ALLOCATION UNIT SIZE STATE
---------- ---------- -------------------- -------COOKED DG
4096
1048576 MOUNTED
state,
TYPE
TOTAL MB USABLE FILE MB
------ -------- -------------NORMAL
2048
715
The value in column BLOCK SIZE is derived from the parameter ASM BLKSIZE, while
ALLOCATION UNIT SIZE is derived from ASM AUSIZE. You may now use DBCA to create a
database in the disk group. Make sure you choose ASM storage for all data files.
CHAPTER 2 ■ HIDDEN INITIALIZATION PARAMETERS
Disk Failure Simulation
Chances are high that you have never had to deal with disk failure in an ASM environment. To
prepare yourself for such a case, you may wish to use the environment set up in this chapter
to simulate disk failure and gain experience with repairing an ASM setup. Disk failure may be
simulated by placing cooked files on an external FireWire disk drive, USB disk drive, or USB
stick, and pulling the cable to the disk or the stick. In a SAN environment, disk failure might by
simulated by pulling cables, changing the zoning configuration, or removing logical unit
number (LUN) access rights in a storage array. The term zoning is used to describe the configuration whereby a storage area network administrator separates a SAN into units and allocates
storage to those units. Each disk or LUN in a SAN has a unique identification called a worldwide
number (WWN). If a WWN is made invisible to a system by changing the zoning configuration,
neither ASM nor RDBMS instances will be able to use the LUN.
Source Code Depot
Table 2-1 lists this chapter’s source files and their functionality.
Table 2-1. Hidden Parameters Source Code Depot
File Name
Functionality
10g hidden parameters.html
Complete list of undocumented parameters in Oracle10g
with default values and descriptions
11g hidden parameters.html
Complete list of undocumented parameters in Oracle11g
with default values and descriptions
9i hidden parameters.html
Complete list of undocumented parameters in Oracle9i with
default values and descriptions
hidden parameters.sql
SELECT statement for retrieving all hidden parameters with
their values and descriptions
hidden parameter value.sql
SELECT statement for retrieving the value of a single hidden
parameter
37
PA R T
2
Data Dictionary Base
Tables
CHAPTER 3
■■■
Introduction to Data Dictionary
Base Tables
E
ach ORACLE database contains a data dictionary that holds metadata, i.e., data about the
database itself. Data dictionary objects are mostly clusters, tables, indexes, and large objects.
The data dictionary is like the engine of a car. If it doesn’t ignite (or rather bootstrap using
SYS.BOOTSTRAP$), then all the other fancy features are quite useless. Traditionally, all data
dictionary objects were stored in the tablespace SYSTEM. With the release of Oracle10g, the
additional tablespace SYSAUX was introduced. This new tablespace contains the Workload
Repository base tables (WRI$* and WRH$* tables) and other objects.
Knowing how to leverage data dictionary base tables allows a DBA to accomplish tasks
that cannot be completed by accessing data dictionary views built on top of dictionary base
tables. This includes scenarios where dictionary views lack required functionality as well as
workarounds for defects in data dictionary views.
The data dictionary is created behind the scenes when the SQL statement CREATE DATABASE
is executed. It is created by running the script $ORACLE HOME/rdbms/admin/sql.bsq. Except for
some placeholders, sql.bsq is a regular SQL*Plus script. Oracle9i contains 341 data dictionary
base tables, Oracle10g 712, and Oracle11g 839.
Database administrators and users seldom access the data dictionary base tables directly.
Since the base tables are normalized and often rather cryptic, the data dictionary views with
prefixes DBA *, ALL * and USER * are provided for convenient access to database metadata.
Some data dictionary views do not have one of these three prefixes (e.g., AUDIT ACTIONS). The
well-known script catalog.sql creates data dictionary views. By looking at view definitions in
catalog.sql, it becomes apparent which base table column corresponds to which dictionary
view column.
For optimum performance, data dictionary metadata are buffered in the dictionary cache.
To further corroborate the saying that well-designed ORACLE DBMS features have more than
a single name, the dictionary cache is also known as the row cache. The term row cache stems
from the fact that this cache contains individual rows instead of entire blocks like the buffer
cache does. Both caches are in the SGA. The dictionary cache is part of the shared pool, to be
precise.
The role DBA includes read-only access to data dictionary base tables through the system
privilege SELECT ANY DICTIONARY. This privilege should not be granted frivolously to non-DBA
users. This is especially true for Oracle9i where the dictionary base table SYS.LINK$ contains
unencrypted passwords of database links, whereas the dictionary view DBA DB LINKS, which is
accessible through the role SELECT CATALOG ROLE, hides the passwords. Passwords for database
41
42
CHAPTER 3 ■ INTRODUCTION TO DATA DICTIONARY BASE TABLES
links are encrypted during the upgrade process to Oracle10g. Table 3-1 lists some dictionary
tables that are related to prominent database objects.
Table 3-1. Data Dictionary Base Tables
Object
Data Dictionary
Base Table
Associated DBA_* View(s)
Clusters
CLU$
DBA CLUSTERS, DBA SEGMENTS
Database links
LINK$
DBA DB LINKS
Data files
FILE$
DBA DATA FILES, DBA FREE SPACE
Free extents
FET$
DBA FREE SPACE
Indexes
IND$
DBA INDEXES
Large objects
LOB$
DBA LOBS
Database objects
OBJ$
DBA OBJECTS, DBA LOBS, DBA TYPES
Segments
SEG$
DBA SEGMENTS
Tables
TAB$
DBA TABLES, DBA LOBS
Tablespaces
TS$
DBA TABLESPACES, DBA DATA FILES, DBA LOBS
Types
TYPE$
DBA TYPES
Used extents
UET$
DBA SEGMENTS, DBA FREE SPACE
Users
USER$
DBA USERS, DBA DB LINKS, DBA LOBS
Of course, dictionary base tables should never be changed directly, as this may easily
cause database corruption. Querying dictionary base tables should be considered when data
dictionary views do not expose enough information to solve a task. Sometimes dictionary
views have bugs, which can be worked around by accessing the base tables directly. The script
sql.bsq is well commented, such that reading this script may aid in understanding the structure of the dictionary base tables.
Large Objects and PCTVERSION vs. RETENTION
An example of leveraging direct access to dictionary base tables is an issue with the data dictionary
view DBA LOBS in Oracle9i and Oracle10g Release 1. The view fails to correctly report the versioning
setting for LOB segments. Since Oracle9i, multiversion read consistency for LOBs is done either
by setting aside a certain percentage of storage in the LOB segment (SQL keyword PCTVERSION;
old approach) or with undo segments (SQL keyword RETENTION; new approach). The default for
an Oracle9i database with automatic undo management is PCTVERSION. For an Oracle10g database in automatic undo management mode, the default is RETENTION. The setting of RETENTION
cannot be specified with SQL syntax and is copied from the parameter UNDO RETENTION. Here’s
an example that uses both approaches within a single table:
CHAPTER 3 ■ INTRODUCTION TO DATA DICTIONARY BASE TABLES
SQL> CREATE TABLE blog (
username VARCHAR2(30),
date time DATE,
text CLOB,
img BLOB)
LOB (text) STORE AS blog text clob (RETENTION),
LOB (img) STORE AS blog img blob (PCTVERSION 10);
Table created.
SQL> SELECT pctversion, retention FROM user lobs WHERE table name='BLOG';
PCTVERSION RETENTION
---------- ---------10
10800
10
10800
SQL> SHOW PARAMETER undo retention
NAME
TYPE
VALUE
------------------------------------ ----------- -----------------------undo retention
integer
10800
The result of querying the data dictionary view USER LOBS is obviously incorrect. Looking
at sql.bsq, there’s unfortunately no comment that says which column is used to discern
PCTVERSION and RETENTION, though it appears likely that the column FLAGS holds the required
information. Here’s the relevant excerpt of sql.bsq:
create table lob$
( obj#
number
…
lobj#
number
…
pctversion$ number
flags
number
…
retention
not null,
/* LOB information table */
/* object number of the base table */
not null,
/* object number for the LOB */
not null,
not null,
number not null,
/* version pool
/* 0x0000 = CACHE
/* 0x0001 = NOCACHE LOGGING
/* 0x0002 = NOCACHE NOLOGGING
*/
*/
*/
*/
/* retention value = UNDO RETENTION */
The PCTVERSION setting is stored in the column PCTVERSION$ and the undo retention setting
is stored in the column RETENTION. Since LOB$.LOBJ# corresponds to DBA OBJECTS.OBJECT ID
(see definition of DBA LOBS in the file catalog.sql), we can query LOB$.FLAGS for our table by
joining DBA OBJECTS and LOB$:
SQL> SELECT object name, flags
FROM sys.lob$ l, dba objects o
WHERE l.lobj#=o.object id
AND o.object name IN ('BLOG TEXT CLOB', 'BLOG IMG BLOB');
OBJECT NAME
FLAGS
-------------------- ---------BLOG IMG BLOB
65
BLOG TEXT CLOB
97
43
44
CHAPTER 3 ■ INTRODUCTION TO DATA DICTIONARY BASE TABLES
There’s the missing piece of information: if retention is specified, then LOB$.FLAGS, which
is obviously a bit vector, is incremented by 32. So the bit that represents 25 is set if RETENTION is
used. Leveraging our finding, we can write the following query, which uses the function BITAND
to detect whether RETENTION is enabled:
SQL> SELECT owner, object name,
CASE WHEN bitand(l.flags, 32)=0 THEN l.pctversion$
ELSE NULL
END AS pctversion,
CASE WHEN bitand(l.flags, 32)=32 THEN l.retention
ELSE NULL
END AS retention
FROM sys.lob$ l, dba objects o
WHERE l.lobj#=o.object id
AND o.object type='LOB'
AND OWNER='NDEBES';
OWNER
OBJECT NAME
PCTVERSION RETENTION
------------------------------ -------------------- ---------- ---------NDEBES
BLOG IMG BLOB
10
NDEBES
BLOG TEXT CLOB
10800
The result of this query is in line with the CREATE TABLE statement executed earlier. Direct
access to the dictionary base table SYS.LOB$ resolved the issue.
CHAPTER 4
■■■
IND$, V$OBJECT_USAGE,
and Index Monitoring
T
he view V$OBJECT USAGE is partially documented in the Oracle Database Reference manual
and in the Oracle Database Administrator’s Guide. The purpose of this view is to classify indexes as
used or unused. Since SELECT statements don’t benefit from unused indexes, and modifications through INSERT, UPDATE, and DELETE statements must maintain indexes, it may be
worthwhile to drop unused indexes.
It is undocumented that the view V$OBJECT USAGE can only be queried for information
on indexes within a single schema at a time while logged in as the user corresponding to the
schema. Furthermore, it is undocumented that ALTER INDEX REBUILD switches off index monitoring and marks the rebuilt index as used. Last but not least, it is undocumented that there is
a performance penalty for index monitoring, since it causes the execution of recursive SQL
statements each time a monitored index is used. The Oracle Database SQL Reference manual
incorrectly states that the MONITORING USAGE clause of ALTER INDEX may only be used on indexes
owned by the user who executes ALTER INDEX. Additional undocumented aspects are that index
usage monitoring cannot be used for primary key indexes of index-organized tables (the error
“ORA-25176: storage specification not permitted for primary key” would be raised) and domain
indexes (“ORA-29871: invalid alter option for a domain index” would result).
This chapter presents an improved view for index usage information that is built directly
on data dictionary base tables. The enhanced view removes the restriction of merely retrieving
information on indexes owned by the current user. It takes the effects of ALTER INDEX REBUILD
into account and designates only those indexes as used, which were accessed by DML, and not
an index rebuild operation. The enhanced view allows a DBA to detect superfluous indexes in
all schemas of a database.
Schema Restriction
V$OBJECT USAGE is a misnomer for a view that is based only on data dictionary tables in schema
SYS and not on X$ fixed tables. The prefix V$ suggests that it is a dynamic performance view,
but it is not. The lack of a column called “OWNER” might send you wondering how the DBA is
supposed to find out which indexes in an application schema have been used. After all, views
such as DBA INDEXES and DBA SEGMENTS have a column “OWNER”, and dynamic performance
views such as V$ACCESS and V$SEGMENT STATISTICS also have a column called “OWNER”, such
that the DBA can view information for any schema he or she chooses. If you thought you as the
45
46
CHAPTER 4 ■ IND$, V$OBJECT_USAGE, AND INDEX MONITORING
almighty DBA could do the same with index usage information and retrieve information for
other schemas than your own DBA schema—think again. Seriously, the only way to get index
usage information for a foreign schema is to connect to that schema, which requires knowledge
of the password or a temporary change of the password as discussed in Chapter 15. The temporary password change is risky, since any connect attempt by an application will fail while the
changed password is in effect. When done properly, the window where this can happen is small,
but nonetheless it may cause problems. ALTER SESSION SET CURRENT SCHEMA (see Chapter 5) won’t
help, since it affects only the current schema name but not the logon user name.
Following are the column definitions of V$OBJECT USAGE:
SQL> DESCRIBE v$object usage
Name
----------------------------------------INDEX NAME
TABLE NAME
MONITORING
USED
START MONITORING
END MONITORING
Null?
-------NOT NULL
NOT NULL
Type
-----------VARCHAR2(30)
VARCHAR2(30)
VARCHAR2(3)
VARCHAR2(3)
VARCHAR2(19)
VARCHAR2(19)
Take a closer look at the columns START MONITORING and END MONITORING. Their data type is
VARCHAR2. I think I vaguely remember that Oracle Corporation recommends using DATE columns
and not VARCHAR2 to store date and time information. Well, maybe the design concept for this
view was approved on April Fools’ Day. Let’s have a look at the view’s definition.
SQL> SET LONG 0815
SQL> SELECT text FROM dba views WHERE owner='SYS' and
view name='V$OBJECT USAGE';
TEXT
-------------------------------------------------------------select io.name, t.name,
decode(bitand(i.flags, 65536), 0, 'NO', 'YES'),
decode(bitand(ou.flags, 1), 0, 'NO', 'YES'),
ou.start monitoring,
ou.end monitoring
from sys.obj$ io, sys.obj$ t, sys.ind$ i, sys.object usage ou
where io.owner# = userenv('SCHEMAID')
and i.obj# = ou.obj#
and io.obj# = ou.obj#
and t.obj# = i.bo#
There’s the culprit. The view uses the undocumented parameter SCHEMAID in a call of the
function USERENV. Called in this way, it returns the numeric user identification in the same way
as the query SELECT user id FROM all users WHERE username=user would. The numeric identifier
is used to filter SYS.OBJ$, the data dictionary base table underlying views such as DBA OBJECTS.
As a consequence, a DBA cannot retrieve information on indexes in foreign schemas.
CHAPTER 4 ■ IND$, V$OBJECT_USAGE, AND INDEX MONITORING
Index Usage Monitoring Case Study
We will use the sample schema HR (see Oracle Database Sample Schemas manual) as our playground for the case study. First, I will enable index usage monitoring on all indexes in schema
HR. For this purpose, the source code depot contains the function MONITOR SCHEMA INDEXES
(file monitor schema indexes.sql), which may be used to switch index usage monitoring on all
indexes in a schema on or off. Before proceeding, I will acquaint you with this function.
Function MONITOR_SCHEMA_INDEXES
The syntax of function MONITOR SCHEMA INDEXES is as follows:
FUNCTION site sys.monitor schema indexes (
ownname VARCHAR2 DEFAULT NULL,
failed counter OUT NUMBER,
monitoring BOOLEAN DEFAULT TRUE
) RETURN INTEGER AUTHID CURRENT USER;
Parameters
Parameter
Description
ownname
Schema name on which to operate. If NULL, the current schema is used.
failed counter
Returns the number of times an ALTER INDEX statement failed due to
“ORA-00054 resource busy and acquire with NOWAIT specified.” This
happens when another session holds an incompatible lock on the base
table of an index, such as when a transaction on the table is open.
monitoring
Used to switch monitoring on (TRUE) or off (FALSE).
Usage Notes
The function returns the number of indexes that were successfully altered. If the value of
FAILED COUNTER is greater than zero, it is best to wait until open transactions have completed
and to rerun the procedure until FAILED COUNTER=0 is returned, i.e., no objects to be altered
remain.
Examples
Switch on index monitoring on all indexes in schema SH.
SQL> VARIABLE success counter NUMBER
SQL> VARIABLE failed counter NUMBER
SQL> EXEC :success counter:=site sys.monitor schema indexes(ownname=>'SH', > failed counter=>:failed counter);
47
48
CHAPTER 4 ■ IND$, V$OBJECT_USAGE, AND INDEX MONITORING
Switch off index monitoring on all indexes in the current schema.
SQL> EXEC :success counter:=site sys.monitor schema indexes( > failed counter=>:failed counter, monitoring=>false);
Enabling Index Monitoring on Schema HR
To enable index usage monitoring on all indexes in schema HR, connect as user HR and run
SITE SYS.MONITOR SCHEMA INDEXES. Before doing so, you may wish to query V$OBJECT USAGE to
confirm that none of the indexes in schema HR have ever been monitored.
SQL> CONNECT hr/secret
SQL> SELECT * FROM v$object usage;
no rows selected
SQL> VARIABLE success counter NUMBER
SQL> VARIABLE failed counter NUMBER
SQL> SET AUTOPRINT ON
SQL> EXEC :success counter:=site sys.monitor schema indexes( > failed counter=>:failed counter);
PL/SQL procedure successfully completed.
FAILED COUNTER
-------------0
SUCCESS COUNTER
--------------18
SQL> SELECT table name, index name, monitoring, used,
start monitoring, end monitoring
FROM v$object usage ORDER BY 1, 2;
TABLE NAME INDEX NAME
MONITORING USED START MONITORING
END MONITORING
----------- ----------------- ---------- ---- ------------------- -------------DEPARTMENTS DEPT ID PK
YES
NO 10/04/2007 17:21:54
DEPARTMENTS DEPT LOCATION IX YES
NO 10/04/2007 17:21:55
EMPLOYEES
EMP DEPARTMENT IX YES
NO 10/04/2007 17:21:55
EMPLOYEES
EMP EMAIL UK
YES
NO 10/04/2007 17:21:55
EMPLOYEES
EMP EMP ID PK
YES
NO 10/04/2007 17:21:55
EMPLOYEES
EMP JOB IX
YES
NO 10/04/2007 17:21:55
EMPLOYEES
EMP MANAGER IX
YES
NO 10/04/2007 17:21:55
EMPLOYEES
EMP NAME IX
YES
NO 10/04/2007 17:21:55
…
The SQL*Plus setting SET AUTOTRACE TRACEONLY EXPLAIN tells SQL*Plus to merely run EXPLAIN
PLAN on the statements entered, without actually executing them or fetching any rows in case
of a SELECT statement. May I ask you to cast a vote? Will EXPLAIN PLAN mark indexes indicated by
a plan as used, or is it necessary to actually access an index by fetching rows? Please cast your
vote before you read on.
CHAPTER 4 ■ IND$, V$OBJECT_USAGE, AND INDEX MONITORING
SQL> SET AUTOTRACE TRACEONLY EXPLAIN
SQL> SELECT emp.last name, emp.first name, d.department name
FROM hr.employees emp, hr.departments d
WHERE emp.department id=d.department id
AND d.department name='Sales';
Execution Plan
---------------------------------------------------------Plan hash value: 2912831499
---------------------------------------------------------| Id | Operation
| Name
| Rows | Bytes |Cost (%CPU)|
----------------------------------------------------------------------------------| 0 | SELECT STATEMENT
|
|
10 |
340 |
4
(0)|
| 1 | TABLE ACCESS BY INDEX ROWID| EMPLOYEES
|
10 |
180 |
1
(0)|
| 2 |
NESTED LOOPS
|
|
10 |
340 |
4
(0)|
|* 3 |
TABLE ACCESS FULL
| DEPARTMENTS
|
1 |
16 |
3
(0)|
|* 4 |
INDEX RANGE SCAN
| EMP DEPARTMENT IX |
10 |
|
0
(0)|
----------------------------------------------------------------------------------Predicate Information (identified by operation id):
--------------------------------------------------3 - filter("D"."DEPARTMENT NAME"='Sales')
4 - access("EMP"."DEPARTMENT ID"="D"."DEPARTMENT ID")
The execution plan1 indicates that the index EMP DEPARTMENT IX would be used if the query
were executed. Let’s take a look at V$OBJECT USAGE.
SQL> SELECT table name, index name, monitoring, used,
start monitoring, end monitoring
FROM v$object usage
WHERE table name IN ('EMPLOYEES', 'DEPARTMENTS');
TABLE NAME INDEX NAME
MONITORING USED START MONITORING
END MONITORING
----------- ----------------- ---------- ---- ------------------- -------------DEPARTMENTS DEPT ID PK
YES
NO 10/04/2007 17:21:54
DEPARTMENTS DEPT LOCATION IX YES
NO 10/04/2007 17:21:55
EMPLOYEES
EMP DEPARTMENT IX YES
YES 10/04/2007 17:21:55
EMPLOYEES
EMP EMAIL UK
YES
NO 10/04/2007 17:21:55
…
The index EMP DEPARTMENT IX is indeed marked as used (column USED=YES), even though
merely EXPLAIN PLAN was executed.
Index Rebuild
It is undocumented that an index rebuild affects V$OBJECT USAGE. It sets V$OBJECT USAGE.
USED=YES and V$OBJECT USAGE.MONITORING=NO, i.e., it terminates index monitoring.
1. To improve legibility, the column TIME was omitted from the execution plan.
49
50
CHAPTER 4 ■ IND$, V$OBJECT_USAGE, AND INDEX MONITORING
SQL> ALTER INDEX dept id pk REBUILD;
Index altered.
SQL> SELECT * FROM v$object usage WHERE
INDEX NAME
TABLE NAME MONITORING
---------------- ----------- ---------DEPT ID PK
DEPARTMENTS NO
DEPT LOCATION IX DEPARTMENTS YES
table name='DEPARTMENTS';
USED START MONITORING
END MONITORING
---- ------------------- -------------YES 10/04/2007 17:21:54
NO 10/04/2007 17:21:55
This behavior is a bit surprising, since an index rebuild is not the kind of index usage a DBA
would be interested in. Other DDL statements, such as ANALYZE INDEX index_name VALIDATE
STRUCTURE or ANALYZE TABLE table_name VALIDATE STRUCTURE CASCADE have no influence on the
index status in V$OBJECT USAGE, although they do access index segments.
Indexes Used by DML
Finding out which indexes were used by DML statements is what really counts. Several factors
make this more intricate than you might suspect. We haven’t yet considered the case where
index monitoring on an index that was marked as used is switched off. By calling MONITOR
SCHEMA INDEXES with the parameter MONITORING set to FALSE, we switch off index monitoring for
all indexes in schema HR.
SQL> EXEC :success counter:=site sys.monitor schema indexes(monitoring=>false, > failed counter=>:failed counter);
FAILED COUNTER
-------------0
SUCCESS COUNTER
--------------17
Since the index rebuild already switched off monitoring on one of the indexes and the
function only considers indexes that do not yet have the desired status, the value of the variable
SUCCESS COUNTER is 17. Let’s take a look at the contents of V$OBJECT USAGE.
SQL> SELECT table name, index name, monitoring AS monitored,
used, start monitoring, end monitoring
FROM v$object usage
WHERE table name IN ('EMPLOYEES', 'DEPARTMENTS');
TABLE NAME INDEX NAME
MONITORED USED START MONITORING
----------- ----------------- --------- ---- ------------------DEPARTMENTS DEPT ID PK
NO
YES 10/04/2007 17:21:54
DEPARTMENTS DEPT LOCATION IX NO
NO 10/04/2007 17:21:55
EMPLOYEES
EMP DEPARTMENT IX NO
YES 10/04/2007 17:21:55
EMPLOYEES
EMP EMAIL UK
NO
NO 10/04/2007 17:21:55
…
END MONITORING
------------------10/04/2007 18:17:58
10/04/2007 18:17:58
10/04/2007 18:17:58
CHAPTER 4 ■ IND$, V$OBJECT_USAGE, AND INDEX MONITORING
What we’re seeing now is that as expected MONITORING=NO for all indexes. Note the subtle
difference between the index DEPT ID PK, which had index monitoring switched off due to an
ALTER INDEX REBUILD, and the index EMP DEPARTMENT IX, which had index monitoring switched
off with ALTER INDEX index_name NOMONITORING by the function MONITOR SCHEMA INDEXES. The
former has END MONITORING set to NULL, whereas the latter has the point in time when index
monitoring was switched off. This is a clue for distinguishing between an index rebuild and a
genuine index usage due to DML.
Taking all of the findings into account, the following cases have to be considered:
• Rebuilt indexes are marked as used and monitoring on them is switched off, while leaving
the value END MONITORING set to NULL. Since we are only interested in index usage due to
DML, we need to exclude this case.
• Indexes that were used by DML retain the settings of MONITORING (YES) and
END MONITORING (NULL).
• Indexes on which monitoring was switched off after they were used by DML retain the
setting MONITORING=YES, but have an actual timestamp instead of NULL in END MONITORING.
The following query retrieves only indexes that were marked as used by DML, but not by
an index rebuild:
SQL> SELECT * FROM v$object usage
WHERE (monitoring='YES' AND used='YES') OR
(used='YES' AND end monitoring IS NOT NULL)
ORDER BY index name;
INDEX NAME
TABLE NAME MONITORING USED START MONITORING
END MONITORING
----------------- ---------- ---------- ---- ------------------- ------------------EMP DEPARTMENT IX EMPLOYEES NO
YES 10/04/2007 17:21:55 10/04/2007 18:17:58
This essentially solves the issue of index monitoring, apart from the annoyance that a DBA
cannot retrieve information on foreign schemas, except by connecting with the user name that
is identical to the schema name. This is the time for data dictionary base tables to make their
appearance on stage. After dwelling on the definition of V$OBJECT USAGE in the file catalog.sql
for a moment, it is not hard to write an enhanced version of the view, which deserves the name
DBA INDEX USAGE, i.e., a view that allows access to index usage information for all indexes in a
database, not just within the current schema. Since I don’t intend to cause confusion by imitating
Oracle Corporation’s naming convention, I will simply call the view INDEX USAGE. The script
view index usage.sql to create it is reproduced below. It adds the column OWNER, which it retrieves
from SYS.USER$. USER$, has to be joined with OBJ$ using OBJ$.OWNER#=USER$.USER# to retrieve
the names of index owners. I’m creating database objects in schema SITE SYS to prevent interference with the data dictionary in schema SYS.
51
52
CHAPTER 4 ■ IND$, V$OBJECT_USAGE, AND INDEX MONITORING
SQL>
SQL>
SQL>
SQL>
SQL>
SQL>
CONNECT / AS SYSDBA
GRANT SELECT ON obj$ TO site sys WITH GRANT OPTION;
GRANT SELECT ON ind$ TO site sys WITH GRANT OPTION;
GRANT SELECT ON object usage TO site sys WITH GRANT OPTION;
GRANT SELECT ON user$ TO site sys WITH GRANT OPTION;
CREATE OR REPLACE VIEW site sys.index usage
(owner,
INDEX NAME,
TABLE NAME,
MONITORING,
USED,
START MONITORING,
END MONITORING)
AS
SELECT u.name, io.name index name, t.name table name,
decode(bitand(i.flags, 65536), 0, 'NO', 'YES'),
decode(bitand(ou.flags, 1), 0, 'NO', 'YES'),
ou.start monitoring,
ou.end monitoring
FROM sys.obj$ io, sys.obj$ t, sys.ind$ i, sys.user$ u, sys.object usage ou
WHERE io.owner# = t.owner#
AND io.owner# = u.user#
AND i.obj# = ou.obj#
AND io.obj# = ou.obj#
AND t.obj# = i.bo#;
-- have to grant to public, to allow non DBAs access to the view
-- used by function MONITOR SCHEMA INDEXES, which runs with AUTHID CURRENT USER
GRANT SELECT ON site sys.index usage TO PUBLIC;
Still connected as SYS or any other user with the required privileges, we may now retrieve
index usage information on any foreign schema such as HR.
SQL> SELECT owner, table name, index name, monitoring, used
FROM site sys.index usage
WHERE owner='HR'
AND ((monitoring='YES' AND used='YES')
OR (used='YES' AND end monitoring IS NOT NULL));
OWNER TABLE NAME INDEX NAME
MONITORING USED
----- ---------- ----------------- ---------- ---HR
EMPLOYEES EMP DEPARTMENT IX NO
YES
Lessons Learned
This wraps up the case study on index monitoring. With the function MONITOR SCHEMA INDEXES
and the view INDEX USAGE, a DBA has the required tools to monitor index usage in any schema.
Indexes that were not used over a period of time that covers the complete code paths of
applications using a database may be dropped to reduce index maintenance costs. The issue
here lies with complete code path. You may never be certain that the entire code path of an
CHAPTER 4 ■ IND$, V$OBJECT_USAGE, AND INDEX MONITORING
application has been exercised. As a consequence, it is best to use the functionality in the early
stages of development and test instead of taking the risk to drop an index in a production database that later turns out to be required for end-of-business-year processing.
A less intrusive way of finding used indexes, which does not cause a performance penalty
due to recursive SQL, is using Statspack at level 6 or higher (see Chapter 25). However, this
approach is significantly more risky, since Statspack samples SQL statements and their execution plans, such that you may not have any data on the usage of certain indexes, simply since
the Statspack snapshots were taken at a time when no execution plans indicating these indexes
were cached in the shared pool.
Source Code Depot
Table 4-1 lists this chapter’s source files and their functionality.
Table 4-1. Index Monitoring Source Code Depot
File Name
Functionality
monitor schema indexes.sql
Contains function MONITOR SCHEMA INDEXES for switching index
usage monitoring on all indexes in a schema on or off. Calls
the script view index usage.sql to create the view INDEX USAGE,
since it is used by the function.
view index usage.sql
Contains the view INDEX USAGE for accessing index usage information for an entire database instead of merely for the current
schema as with V$OBJECT USAGE.
53
P A R T
3
Events
CHAPTER 5
■■■
Event 10027 and
Deadlock Diagnosis
P
art 1 introduced the partially documented initialization parameter EVENT and some of the
benefits that may be realized by using it for troubleshooting. Part 3 expands on that material in
the sense that the events it discusses may be set using the parameter EVENT or the SQL statements
ALTER SESSION/SYSTEM. The difference between these approaches is that solely the parameter
EVENT ensures that the configuration is both persistent and pertains to the entire lifetime of an
ORACLE DBMS instance. The subsequent chapters address events for deadlock diagnosis,
collection of performance data, and Oracle Net packet dumps.
Deadlocks
A deadlock occurs when two or more sessions hold locks and another lock request, which
would result in a circular chain of locks, is issued. If this new lock request were granted, the
sessions would deadlock and none of them would ever finish. Hence, the ORACLE DBMS
detects circular chains pertaining to interdependent locks, signals the error “ORA-00060: deadlock
detected while waiting for resource”, and rolls back one of the sessions involved in the would-be
deadlock. A trace file is written whenever an ORACLE instance detects a deadlock. The undocumented event 10027 gives the DBA control over the amount and type of diagnostic information
generated.
Figure 5-1 depicts a deadlock situation among two database sessions. Session 1 locks the
row with EMPLOYEE ID=182 at time t1. At time t2, session 2 locks the row with EMPLOYEE ID=193.
At t3, session 1 requests a lock on the row with EMPLOYEE ID=193, which is already locked by
session 2. Hence, session 1 has to wait on the event enq: TX - row lock contention. At t4, session 2
requests a lock on the row that session 1 locked at t1. Since granting this lock would lead to a
circular chain, the DBMS signals “ORA-00060: deadlock detected while waiting for resource” at t5.
The UPDATE statement executed by session 1 at t3 is rolled back. At this point, session 2 is still
waiting for the lock on the row with EMPLOYEE ID=182, which session 1 continues to hold. Session 1
should ROLLBACK in response to ORA-00060, releasing all its locks and allowing session 2 to
complete the update of employee 182.
57
CHAPTER 5 ■ EVENT 10027 AND DEADLOCK DIAGNOSIS
Database
Session
58
UPDATE hr.employees
SET phone_number='650.507.2876'
WHERE employee_id=193;
2
UPDATE hr.employees
SET salary=5000
WHERE employee_id=182;
1
t1
t2
UPDATE hr.employees
SET phone_number='650.507.9878'
WHERE employee_id=182;
UPDATE hr.employees
SET salary=5500
WHERE employee_id=193;
t3
t4
ORA-00060: deadlock
detected while waiting for
resource
t5
t
Figure 5-1. Deadlock detection
To avoid deadlocks, rows need to be locked in the same order by all database sessions.
If this is not feasible, deadlocks and the overhead associated with writing trace files may be
avoided by executing SELECT FOR UPDATE NOWAIT prior to an UPDATE. If this returns ORA-00054,
then the session needs to roll back and reattempt the entire transaction. The ROLLBACK will
allow other transactions to complete. The downside of this approach is the additional processing.
Following is an example that is tailored to the previous scenario:
SQL> SELECT rowid FROM hr.employees WHERE employee id=182 FOR UPDATE NOWAIT;
SELECT rowid FROM hr.employees WHERE employee id=182 FOR UPDATE NOWAIT
*
ERROR at line 1:
ORA-00054: resource busy and acquire with NOWAIT specified
SQL> ROLLBACK;
Rollback complete.
SQL> SELECT rowid FROM hr.employees WHERE employee id=182 FOR UPDATE NOWAIT;
ROWID
-----------------AAADNLAAEAAAEi1ABb
SQL> UPDATE hr.employees SET phone number='650.507.9878'
WHERE rowid='AAADNLAAEAAAEi1ABb';
1 row updated.
Event 10027
Event 10027 gives the DBA control over the amount and type of diagnostic information generated
in response to ORA-00060. At default settings, an ORA-00060 trace file contains cached cursors,
a deadlock graph, process state, current SQL statements of the sessions involved, and session
wait history (in Oracle10g and subsequent releases). Except for the current SQL statements and
the deadlock graph, all the information pertains merely to the session that received ORA00060. Event 10027 may be used to achieve the following oppositional goals:
CHAPTER 5 ■ EVENT 10027 AND DEADLOCK DIAGNOSIS
• Reduce the volume of trace information generated in response to ORA-00060, e.g., when
there is no way to fix the issue.
• Augment the trace information with a system state dump or call stack, in an attempt to
find the root cause of the deadlocks.
The smallest amount of trace information is written at level 1. At this level, the trace file
merely contains a deadlock graph and the current SQL statements of the sessions involved.
Following is an example ORA-00060 trace file with event 10027 at level 1:
*** ACTION NAME:() 2007-09-08 05:34:52.373
*** MODULE NAME:(SQL*Plus) 2007-09-08 05:34:52.373
*** SERVICE NAME:(SYS$USERS) 2007-09-08 05:34:52.373
*** SESSION ID:(159.5273) 2007-09-08 05:34:52.372
DEADLOCK DETECTED ( ORA-00060 )
[Transaction Deadlock]
The following deadlock is not an ORACLE error. It is a
deadlock due to user error in the design of an application
or from issuing incorrect ad-hoc SQL. The following
information may aid in determining the deadlock:
Deadlock graph:
---------Blocker(s)-------- ---------Waiter(s)--------Resource Name
process session holds waits process session holds waits
TX-0007000f-000002a4
20
159
X
16
145
X
TX-000a0013-000002a3
16
145
X
20
159
X
session 159: DID 0001-0014-0000004B
session 145: DID 0001-0010-0000004E
session 145: DID 0001-0010-0000004E
session 159: DID 0001-0014-0000004B
Rows waited on:
Session 145: obj - rowid = 0000334B - AAADNLAAEAAAEi1ABb
(dictionary objn - 13131, file - 4, block - 18613, slot - 91)
Session 159: obj - rowid = 0000334B - AAADNLAAEAAAEi2AAE
(dictionary objn - 13131, file - 4, block - 18614, slot - 4)
Information on the OTHER waiting sessions:
Session 145:
pid=16 serial=1880 audsid=210018 user: 34/NDEBES
O/S info: user: oracle, term: pts/5, ospid: 24607, machine: dbserver1.oradbpro.com
program: sqlplus@dbserver1.oradbpro.com (TNS V1-V3)
application name: SQL*Plus, hash value=3669949024
Current SQL Statement:
UPDATE hr.employees SET phone number='650.507.9878' WHERE employee id=182
End of information on OTHER waiting sessions.
Current SQL statement for this session:
UPDATE hr.employees SET salary=5500 WHERE employee id=193
The system state dump included in the trace file at event level 2 may aid in diagnosing the
cause of a deadlock. A system state dump includes cached SQL and wait history of all sessions,
not just the current SQL statements of the sessions involved in the deadlock. Thus, it may be
possible to reconstruct the scenario that lead to a deadlock.
59
60
CHAPTER 5 ■ EVENT 10027 AND DEADLOCK DIAGNOSIS
The call stack trace included in the trace file at event level 4 is less useful. It shows which C
function an ORACLE server process was in at the time a deadlock was detected. If you encounter
deadlocks in an application for the first time, it is a good idea to temporarily set event 10027 at
level 2 with ALTER SYSTEM as follows:
SQL> ALTER SYSTEM SET EVENTS '10027 trace name context forever, level 2';
This will increase your chances of finding the root cause of deadlocks. If the setting shall
persist across instance startups, you need to use the initialization parameter EVENT.
EVENT="10027 trace name context forever, level 2"
As soon as you have obtained enough system state dumps for further analysis, you may
reduce the event level to 1. Since locks are released after an ORA-00060 trace file is written,
event 10027 at level 1 makes sure that sessions can respond more quickly. In my testing, I
observed that trace files at this level were one hundred times smaller than trace files with
default settings. The supported event levels and the trace information included at each level
are summarized in Table 5-1.1
Table 5-1. Event 10027 and Trace File Contents
Contents/Level
Default
Level 1
Level 2
Level 4
Cached cursors
yes
no
yes
yes
Call stack trace
no
no
no
yes
Deadlock graph
yes
yes
yes
yes
Process state
yes
no
yes
yes
yes
SQL statements
yes
yes
yes, for all sessions1
Session wait history
yes
no
yes, for all sessions
yes
System state
no
no
yes
no
1. SQL statements for all sessions are in the system state dump section.
CHAPTER 6
■■■
Event 10046 and Extended
SQL Trace
T
he Oracle Database Performance Tuning Guide 10g Release 2 states that event 10046 at level
8 may be used to enable logging of wait events to a SQL trace file. Additional event levels are
undocumented. Event 10046 is ideal for enabling extended SQL trace. When combined with
ALTER SESSION, event 10046 is the only way to enable extended SQL trace that does not require
DBA or SYSDBA privileges. This event is useful for building a self-tracing capability into an
application. Self-tracing means the ability of an application to create evidence of a performance problem upon request by the end user. Whenever an end user is dissatisfied with the
performance of an application he or she may switch on extended SQL trace without assistance
from a DBA. Such a feature is a big time-saver since the DBA does not need to identify which
database session serves the end user and it also reduces the risk that a transient performance
problem does not reproduce by the time a DBA can attend to it.
Event 10046 is used in the context of performance diagnosis in several places in this book.
Part 8 contains numerous examples of leveraging event 10046. This chapter is intended as a
brief reference for the event and its levels. The event is most useful at session and process level.
Examples of using the event at session level are in Chapter 13. Please refer to Chapter 37 for
instances of using the event at process level.
The supported event levels are detailed in Table 6-1. The term database call refers to the
parse, execute, and fetch stages of executing SQL statements.
Table 6-1. SQL Trace Levels
SQL Trace Level
Database Calls
Bind Variable Values
Wait Events
1
yes
no
no
4
yes
yes
no
8
yes
no
yes
12
yes
yes
yes
The following example illustrates how to use event 10046 to trace SQL statements, bind
variables, and wait events. The trace file is from Oracle11g Release 1. Note the new Oracle11g
parameter sqlid, which corresponds to V$SQL.SQL ID, in the PARSING IN CURSOR entry.
61
62
CHAPTER 6 ■ EVENT 10046 AND EXTENDED SQL TRACE
SQL> ALTER SESSION SET EVENTS '10046 trace name context forever, level 12';
Session altered.
SQL> VARIABLE id NUMBER
SQL> INSERT INTO customer(id, name, phone)
VALUES (customer id seq.nextval, '&name', '&phone')
RETURNING id INTO :id;
Enter value for name: Deevers
Enter value for phone: +1 310 45678923
1 row created.
Excerpts of the resulting SQL trace file are reproduced here:
*** ACTION NAME:() 2007-11-28 22:02:15.625
*** MODULE NAME:(SQL*Plus) 2007-11-28 22:02:15.625
*** SERVICE NAME:(SYS$USERS) 2007-11-28 22:02:15.625
*** SESSION ID:(32.171) 2007-11-28 22:02:15.625
…
WAIT #6: nam='SQL*Net message to client' ela= 6 driver id=1111838976 #bytes=1 p3=0
obj#=15919 tim=230939271782
*** 2007-11-30 09:45:33.828
WAIT #6: nam='SQL*Net message from client' ela= 235333094 driver id=1111838976
#bytes=1 p3=0 obj#=15919 tim=231174604922
=====================
PARSING IN CURSOR #4 len=122 dep=0 uid=32 oct=2 lid=32 tim=231174605324 hv=798092392
ad='6b59f600' sqlid='96032xwrt3v38'
INSERT INTO customer(id, name, phone)
VALUES (customer id seq.nextval, 'Deevers', '+1 310 45678923')
RETURNING id INTO :id
END OF STMT
PARSE #4:c=0,e=111,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=231174605317
BINDS #4:
Bind#0
oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
oacflg=03 fl2=1000000 frm=00 csi=00 siz=24 off=0
kxsbbbfp=07a7e7a0 bln=22 avl=04 flg=05
value=370011
WAIT #4: nam='SQL*Net message to client' ela= 7 driver id=1111838976
#bytes=1 p3=0 obj#=15919 tim=231174606084
EXEC #4:c=15625,e=673,p=0,cr=0,cu=3,mis=0,r=1,dep=0,og=1,tim=231174606139
STAT #4 id=1 cnt=0 pid=0 pos=1 obj=0
op='LOAD TABLE CONVENTIONAL (cr=0 pr=0 pw=0 time=0 us)'
STAT #4 id=2 cnt=1 pid=1 pos=1 obj=15920
op='SEQUENCE CUSTOMER ID SEQ (cr=0 pr=0 pw=0 time=0 us)'
*** 2007-11-30 09:45:39.015
WAIT #4: nam='SQL*Net message from client' ela= 5179787 driver id=1111838976
#bytes=1 p3=0 obj#=15919 tim=231179786085
For details on how to interpret extended SQL trace files and how to automatically generate
a resource profile for performance diagnosis, please refer to Part 8.
CHAPTER 7
■■■
Event 10053 and the
Cost Based Optimizer
T
here is no better way to comprehend decisions and cost calculations of the cost based optimizer
(CBO) than to read an optimizer trace file generated with the undocumented event 10053.
Oracle Support will usually request such a trace file if you intend to file a technical assistance
request against the optimizer in situations where it fails to find an execution plan that yields an
acceptable response time.
Essentially, the CBO is a mathematical model for calculating estimates of SQL statement
response times. It receives initialization parameters, object statistics pertaining to tables and
indexes, as well as system statistics that represent the capabilities of the hardware, as input.
According to Oracle9i Database Performance Tuning Guide and Reference Release 2, the cost
calculation formula for serial execution used by the optimizer is as follows:
CPU Cycles
SRds sreadtim + MRds mreadtim + ----------------------------cpuspeed
--------------------------------------------------------------------------------------------------------------------------------sreadtim
Thus, the unit of cost is the time it takes to complete a single block read. Table 7-1 explains
the placeholders used in the formula. In case you are familiar with system statistics gathering
using DBMS STATS.GATHER SYSTEM STATS, the three placeholders—sreadtim, mreadtim, and
cpuspeed—will be familiar. All three are part of so-called workload statistics, which are derived
from measurements of the system on which the DBMS runs. The documented interfaces for
setting these parameters (and a few more) are the packaged procedures DBMS STATS.SET
SYSTEM STATS and DBMS STATS.IMPORT SYSTEM STATS. The current settings may be retrieved
with a call to the packaged procedure DBMS STATS.GET SYSTEM STATS. System statistics were
optional in Oracle9i. Oracle10g uses so-called noworkload statistics if actual measurements
have not been imported into the data dictionary table SYS.AUX STATS$ by one of the interfaces
cited above.
63
64
CHAPTER 7 ■ EVENT 10053 AND THE COST BASED OPTIMIZER
Table 7-1. CBO Cost Calculation Placeholders
Placeholder
Meaning
SRds
Single block reads.
sreadtim
Single block read time.
MRds
Multi block reads.
mreadtim
Multi block read time, i.e., the time it takes for a multi block read request to
complete. The number of blocks requested in a multi block read request is
limited by the parameter DB FILE MULTIBLOCK READ COUNT.
CPU Cycles
Number of CPU cycles required to execute a statement.
cpuspeed
Number of instructions per second.1
The following query result is from a system where all workload statistics parameters except
SLAVETHR (the parallel execution slave throughput) were set:
SQL> SELECT *
SNAME
------------SYSSTATS INFO
SYSSTATS INFO
SYSSTATS INFO
SYSSTATS INFO
SYSSTATS MAIN
SYSSTATS MAIN
SYSSTATS MAIN
SYSSTATS MAIN
SYSSTATS MAIN
SYSSTATS MAIN
SYSSTATS MAIN
SYSSTATS MAIN
SYSSTATS MAIN
FROM sys.aux stats$;
PNAME
PVAL1 PVAL2
---------- ------- ---------------STATUS
COMPLETED
DSTART
11-29-2007 13:49
DSTOP
11-29-2007 13:49
FLAGS
1
CPUSPEEDNW 841.336
IOSEEKTIM
10
IOTFRSPEED
4096
SREADTIM
4
MREADTIM
10
CPUSPEED
839
MBRC
14
MAXTHR
8388608
SLAVETHR
Workload parameters are reproduced in bold. MBRC is the actual multi block read count
derived from statistics collected in X$ fixed tables. MBRC is limited by the parameter DB FILE
MULTIBLOCK READ COUNT and is usually somewhat lower than this parameter. DB FILE MULTIBLOCK
READ COUNT had the value 16 on the system used for the example and the actual MBRC was 14.
MAXTHR is the maximum I/O throughput. SLAVETHR is the average parallel execution slave I/O
throughput. SREADTIM and MREADTIM have already been covered by Table 7-1.
1. Event 10053 trace files from Oracle10g Release 2 contain lines such as this:
CPUSPEED: 839 millions instructions/sec
This seems to indicate that the DBMS measures CPU speed by calculating how many instructions
complete per second. The kind of instruction used for this purpose is undocumented.
CHAPTER 7 ■ EVENT 10053 AND THE COST BASED OPTIMIZER
There are three noworkload statistics parameters.
• CPUSPEEDNW (noworkload CPU speed)
• IOSEEKTIM (I/O seek time in ms)
• IOTFRSPEED (I/O transfer speed in KB/s)
The unit of CPU speed is not MHz, but rather some undocumented measure proprietary
to Oracle Corporation Clearly, the output of the CBO is an execution plan. But this is not the
only output from the optimizer. Additionally, it provides the following items:
• Estimated number of rows in the statement’s results set
• Estimated number of bytes processed
• Query block names
• Filter predicates
• Access predicates
• Column projection information
• Values of peeked bind variables
• An outline, i.e., a set of hints that forces the chosen execution plan (requires Oracle10g
or newer)
Outline data reference query block names and are an ideal basis for testing alternative
plans with hints. The information in the preceding list may also be retrieved with DBMS XPLAN.
DISPLAY CURSOR starting with Oracle10g. However, undocumented format options are required
for retrieving the outline and peeked binds. Use the format option OUTLINE to include the
outline and PEEKED_BINDS to retrieve peeked bind variable values. There’s also an undocumented format option ADVANCED that also includes an outline, but not peeked bind variable
values. Here is an example:
SQL> SELECT * FROM TABLE(dbms xplan.display cursor(null, null,
'OUTLINE PEEKED BINDS -PROJECTION -PREDICATE -BYTES'));
PLAN TABLE OUTPUT
------------------------------------SQL ID 9w4xfcb47qfdn, child number 0
------------------------------------SELECT e.last name, e.first name, d.department name FROM hr.employees e,
hr.departments d WHERE e.department id=d.department id AND
d.department id=:dept id AND e.employee id=:emp id AND first name=:fn
Plan hash value: 4225575861
65
66
CHAPTER 7 ■ EVENT 10053 AND THE COST BASED OPTIMIZER
----------------------------------------------------------------------------------|Id| Operation
| Name
| Rows | Cost (%CPU)| Time
|
----------------------------------------------------------------------------------| 0| SELECT STATEMENT
|
|
|
2 (100)|
|
| 1| NESTED LOOPS
|
|
1 |
2
(0)| 00:00:01 |
| 2|
TABLE ACCESS BY INDEX ROWID| DEPARTMENTS
|
1 |
1
(0)| 00:00:01 |
| 3|
INDEX UNIQUE SCAN
| DEPT ID PK
|
1 |
0
(0)|
|
| 4|
TABLE ACCESS BY INDEX ROWID| EMPLOYEES
|
1 |
1
(0)| 00:00:01 |
| 5|
INDEX UNIQUE SCAN
| EMP EMP ID PK |
1 |
0
(0)|
|
----------------------------------------------------------------------------------Outline Data
------------/*+
BEGIN OUTLINE DATA
IGNORE OPTIM EMBEDDED HINTS
OPTIMIZER FEATURES ENABLE('10.2.0.3')
ALL ROWS
OUTLINE LEAF(@"SEL$1")
INDEX RS ASC(@"SEL$1" "D"@"SEL$1" ("DEPARTMENTS"."DEPARTMENT ID"))
INDEX RS ASC(@"SEL$1" "E"@"SEL$1" ("EMPLOYEES"."EMPLOYEE ID"))
LEADING(@"SEL$1" "D"@"SEL$1" "E"@"SEL$1")
USE NL(@"SEL$1" "E"@"SEL$1")
END OUTLINE DATA
*/
Peeked Binds (identified by position):
-------------------------------------1 - :DEPT ID (NUMBER): 50
2 - :EMP ID (NUMBER): 120
3 - :FN (VARCHAR2(30), CSID=178): 'Matthew'
When the first two arguments to DBMS XPLAN.DISPLAY CURSOR are NULL, the SQL identifier
(SQL ID) and child cursor number (CHILD NUMBER) of the previous SQL statement are used as
defaults. As is evident from the previous example, a large set of hints is required to fully specify
an execution plan. Whenever the optimizer does not honor an individual hint, the reason may
be that it had to take some decisions of its own, which precluded honoring the individual hint.
Trace File Contents
A 10053 trace file is a protocol of the optimizer’s inputs, calculations, and outputs. The correct
event level to use is 1. Higher levels do not produce additional output. Be warned that the trace
file contents as well as the costing formulas used by the CBO are subject to change without
notice. The main sections of an Oracle10g Release 2 trace file are as follows:
CHAPTER 7 ■ EVENT 10053 AND THE COST BASED OPTIMIZER
• Query blocks and object identifiers (DBA OBJECTS.OBJECT ID) of the tables involved in an
optimized statement
• Query transformations considered (predicate move-around, subquery unnesting, etc.)
• Legend (abbreviations used)
• Results of bind variable peeking
• Optimizer parameters (documented and hidden)
• System statistics (workload or noworkload)
• Object statistics for tables and indexes
• Single table access path and cost for each table
• List of join orders and cost of each
• Execution plan
• Predicate information
• A full set of hints including query block names, which would be used to define a stored
outline
If, after enabling event 10053, you do not find the aforementioned sections in a trace file,
the CBO may have used a cached execution plan instead of optimizing a statement from scratch.
You can force a cursor miss by inserting a comment into the statement.
Case Study
In the subsequent section, we will generate a 10053 trace file for a 5-way join on tables of the
sample schema HR.
SQL> VARIABLE loc VARCHAR2(30)
SQL> EXEC :loc:='South San Francisco'
SQL> ALTER SESSION SET EVENTS '10053 trace name context forever, level 1';
SQL> SELECT emp.last name, emp.first name, j.job title, d.department name, l.city,
l.state province, l.postal code, l.street address, emp.email,
emp.phone number, emp.hire date, emp.salary, mgr.last name
FROM hr.employees emp, hr.employees mgr, hr.departments d, hr.locations l, hr.jobs j
WHERE l.city=:loc
AND emp.manager id=mgr.employee id
AND emp.department id=d.department id
AND d.location id=l.location id
AND emp.job id=j.job id;
SQL> ALTER SESSION SET EVENTS '10053 trace name context off';
The query used in the case study is available in the file hr 5way join.sql in the source
code depot. The following trace file excerpts are from Oracle10g Release 2.
67
68
CHAPTER 7 ■ EVENT 10053 AND THE COST BASED OPTIMIZER
Query Blocks and Object Identifiers
This section lists all the query blocks along with the tables in each. Since the query executed did
not contain any subselects, there is but a single query block:
Registered qb: SEL$1 0x4aea6b4 (PARSER)
signature (): qb name=SEL$1 nbfros=5 flg=0
fro(0): flg=4 objn=51905 hint alias="D"@"SEL$1"
fro(1): flg=4 objn=51910 hint alias="EMP"@"SEL$1"
fro(2): flg=4 objn=51908 hint alias="J"@"SEL$1"
fro(3): flg=4 objn=51900 hint alias="L"@"SEL$1"
fro(4): flg=4 objn=51910 hint alias="MGR"@"SEL$1"
The object identifiers (objn) may be used to determine the owner(s) of the tables.
SQL> SELECT owner, object name, object type
FROM dba objects WHERE object id IN (51905, 51910);
OWNER OBJECT NAME OBJECT TYPE
----- ----------- ----------HR
DEPARTMENTS TABLE
HR
EMPLOYEES
TABLE
Query Transformations Considered
The optimizer considers several query transformations. Note the SQL identifier “2ck90xfmsza4u”
in the unparsed query subsection, which may be used to retrieve past execution plans for the
statement with the packaged procedure DBMS XPLAN.DISPLAY AWR or Statspack (see Chapter 25).
**************************
Predicate Move-Around (PM)
**************************
PM: Considering predicate move-around in SEL$1 (#0).
PM: Checking validity of predicate move-around in SEL$1 (#0).
CBQT: Validity checks failed for 2ck90xfmsza4u.
CVM: Considering view merge in query block SEL$1 (#0)
Query block (04AEA6B4) before join elimination:
SQL:******* UNPARSED QUERY IS *******
SELECT "EMP"."LAST NAME" "LAST NAME","EMP"."FIRST NAME" "FIRST NAME",
"J"."JOB TITLE" "JOB TITLE","D"."DEPARTMENT NAME" "DEPARTMENT NAME",
"L"."CITY" "CITY","L"."STATE PROVINCE" "STATE PROVINCE"
,"L"."POSTAL CODE" "POSTAL CODE","L"."STREET ADDRESS" "STREET ADDRESS",
"EMP"."EMAIL" "EMAIL","EMP"."PHONE NUMBER" "PHONE NUMBER",
"EMP"."HIRE DATE" "HIRE DATE","EMP"."SALARY" "SALARY",
"MGR"."LAST NAME" "LAST NAME"
FROM "HR"."EMPLOYEES" "EMP","HR"."EMPLOYEES" "MGR",
"HR"."DEPARTMENTS" "D", "HR"."LOCATIONS" "L","HR"."JOBS" "J"
WHERE "L"."CITY"=:B1 AND "EMP"."MANAGER ID"="MGR"."EMPLOYEE ID"
CHAPTER 7 ■ EVENT 10053 AND THE COST BASED OPTIMIZER
AND "EMP"."DEPARTMENT ID"="D"."DEPARTMENT ID"
AND "D"."LOCATION ID"="L"."LOCATION ID" AND "EMP"."JOB ID"="J"."JOB ID"
Query block (04AEA6B4) unchanged
CBQT: Validity checks failed for 2ck90xfmsza4u.
***************
Subquery Unnest
***************
SU: Considering subquery unnesting in query block SEL$1 (#0)
*************************
Set-Join Conversion (SJC)
*************************
SJC: Considering set-join conversion in SEL$1 (#0).
**************************
Predicate Move-Around (PM)
**************************
PM: Considering predicate move-around in SEL$1 (#0).
PM: Checking validity of predicate move-around in SEL$1 (#0).
PM:
PM bypassed: Outer query contains no views.
FPD: Considering simple filter push in SEL$1 (#0)
FPD:
Current where clause predicates in SEL$1 (#0) :
"L"."CITY"=:B1 AND "EMP"."MANAGER ID"="MGR"."EMPLOYEE ID"
AND "EMP"."DEPARTMENT ID"="D"."DEPARTMENT ID"
AND "D"."LOCATION ID"="L"."LOCATION ID"
AND "EMP"."JOB ID"="J"."JOB ID"
kkogcp: try to generate transitive predicate from check constraints for SEL$1 (#0)
constraint: "MGR"."SALARY">0
constraint: "EMP"."SALARY">0
predicates with check contraints: "L"."CITY"=:B1
AND "EMP"."MANAGER ID"="MGR"."EMPLOYEE ID"
AND "EMP"."DEPARTMENT ID"="D"."DEPARTMENT ID"
AND "D"."LOCATION ID"="L"."LOCATION ID"
AND "EMP"."JOB ID"="J"."JOB ID" AND "MGR"."SALARY">0 AND "EMP"."SALARY">0
after transitive predicate generation:
"L"."CITY"=:B1 AND "EMP"."MANAGER ID"="MGR"."EMPLOYEE ID"
AND "EMP"."DEPARTMENT ID"="D"."DEPARTMENT ID"
AND "D"."LOCATION ID"="L"."LOCATION ID" AND "EMP"."JOB ID"="J"."JOB ID"
AND "MGR"."SALARY">0 AND "EMP"."SALARY">0
finally: "L"."CITY"=:B1 AND "EMP"."MANAGER ID"="MGR"."EMPLOYEE ID"
AND "EMP"."DEPARTMENT ID"="D"."DEPARTMENT ID"
AND "D"."LOCATION ID"="L"."LOCATION ID" AND "EMP"."JOB ID"="J"."JOB ID"
apadrv-start: call(in-use=744, alloc=0), compile(in-use=47988, alloc=0)
kkoqbc-start
: call(in-use=756, alloc=0), compile(in-use=49312, alloc=0)
The optimizer retrieves check constraints from the data dictionary to generate additional
predicates (AND "MGR"."SALARY">0 AND "EMP"."SALARY">0).
69
70
CHAPTER 7 ■ EVENT 10053 AND THE COST BASED OPTIMIZER
Legend
The legend section lists abbreviations used in the trace file.
Legend
The following abbreviations are used by optimizer trace.
CBQT - cost-based query transformation
JPPD - join predicate push-down
FPD - filter push-down
PM - predicate move-around
CVM - complex view merging
SPJ - select-project-join
SJC - set join conversion
SU - subquery unnesting
OBYE - order by elimination
ST - star transformation
qb - query block
LB - leaf blocks
DK - distinct keys
LB/K - average number of leaf blocks per key
DB/K - average number of data blocks per key
CLUF - clustering factor
NDV - number of distinct values
Resp - response cost
Card - cardinality
Resc - resource cost
NL - nested loops (join)
SM - sort merge (join)
HA - hash (join)
CPUCSPEED - CPU Speed
IOTFRSPEED - I/O transfer speed
IOSEEKTIM - I/O seek time
SREADTIM - average single block read time
MREADTIM - average multiblock read time
MBRC - average multiblock read count
MAXTHR - maximum I/O system throughput
SLAVETHR - average slave I/O throughput
dmeth - distribution method
1: no partitioning required
2: value partitioned
4: right is random (round-robin)
512: left is random (round-robin)
8: broadcast right and partition left
16: broadcast left and partition right
32: partition left using partitioning of right
64: partition right using partitioning of left
CHAPTER 7 ■ EVENT 10053 AND THE COST BASED OPTIMIZER
128: use hash partitioning dimension
256: use range partitioning dimension
2048: use list partitioning dimension
1024: run the join in serial
0: invalid distribution method
sel - selectivity
ptn - partition
Results of Bind Variable Peeking
This section contains all the bind variables used in the query, their data types (oacdty), and
values. The output is identical to the format of an extended SQL trace with event 10046 at levels
4 or 12.
*******************************************
Peeked values of the binds in SQL statement
*******************************************
kkscoacd
Bind#0
oacdty=01 mxl=32(30) mxlc=00 mal=00 scl=00 pre=00
oacflg=03 fl2=1000000 frm=01 csi=178 siz=32 off=0
kxsbbbfp=04c3ae00 bln=32 avl=19 flg=05
value="South San Francisco"
Optimizer Parameters
This section holds a listing of 184 documented and hidden parameters, which affect the
optimizer’s calculations and decisions, and is organized into three subsections:
1. Parameters with altered values.
2. Parameters with default values.
3. Parameters supplied with the hint OPT PARAM (e.g., OPT PARAM('optimizer index cost
adj' 30)). This undocumented hint may be part of stored outlines.
Subsections 1 and 3 from the example trace file are empty, since all parameters had
default values.
***************************************
PARAMETERS USED BY THE OPTIMIZER
********************************
*************************************
PARAMETERS WITH ALTERED VALUES
******************************
*************************************
71
72
CHAPTER 7 ■ EVENT 10053 AND THE COST BASED OPTIMIZER
PARAMETERS WITH DEFAULT VALUES
******************************
optimizer mode hinted
optimizer features hinted
parallel execution enabled
parallel query forced dop
parallel dml forced dop
parallel ddl forced degree
parallel ddl forced instances
query rewrite fudge
optimizer features enable
optimizer search limit
cpu count
active instance count
parallel threads per cpu
hash area size
bitmap merge area size
sort area size
sort area retained size
sort elimination cost ratio
optimizer block size
sort multiblock read count
hash multiblock io count
db file optimizer read count
optimizer max permutations
pga aggregate target
pga max size
query rewrite maxdisjunct
smm auto min io size
smm auto max io size
smm min size
smm max size
smm px max size
cpu to io
optimizer undo cost change
parallel query mode
parallel dml mode
parallel ddl mode
optimizer mode
sqlstat enabled
optimizer percent parallel
always anti join
always semi join
optimizer mode force
partition view enabled
always star transformation
query rewrite or error
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
false
0.0.0
true
0
0
0
0
90
10.2.0.1
5
2
1
2
131072
1048576
65536
0
0
8192
2
0
16
2000
119808 KB
204800 KB
257
56 KB
248 KB
128 KB
23961 KB
59904 KB
0
10.2.0.1
enabled
disabled
enabled
all rows
false
101
choose
choose
true
true
false
false
CHAPTER 7 ■ EVENT 10053 AND THE COST BASED OPTIMIZER
hash join enabled
cursor sharing
b tree bitmap plans
star transformation enabled
optimizer cost model
new sort cost estimate
complex view merging
unnest subquery
eliminate common subexpr
pred move around
convert set to join
push join predicate
push join union view
fast full scan enabled
optim enhance nnull detection
parallel broadcast enabled
px broadcast fudge factor
ordered nested loop
no or expansion
optimizer index cost adj
optimizer index caching
system index caching
disable datalayer sampling
query rewrite enabled
query rewrite integrity
query cost rewrite
query rewrite 2
query rewrite 1
query rewrite expression
query rewrite jgmigrate
query rewrite fpc
query rewrite drj
full pwise join enabled
partial pwise join enabled
left nested loops random
improved row length enabled
index join enabled
enable type dep selectivity
improved outerjoin card
optimizer adjust for nulls
optimizer degree
use column stats for function
subquery pruning enabled
subquery pruning mv enabled
or expand nvl predicate
like with bind as equality
table scan cost plus one
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
true
exact
true
false
choose
true
true
true
true
true
false
true
true
true
true
true
100
true
false
100
0
0
false
true
enforced
true
true
true
true
true
true
true
true
true
true
true
true
true
true
true
0
true
true
false
true
false
true
73
74
CHAPTER 7 ■ EVENT 10053 AND THE COST BASED OPTIMIZER
cost equality semi join
= true
default non equality sel check
= true
new initial join orders
= true
oneside colstat for equijoins
= true
optim peek user binds
= true
minimal stats aggregation
= true
force temptables for gsets
= false
workarea size policy
= auto
smm auto cost enabled
= true
gs anti semi join allowed
= true
optim new default join sel
= true
optimizer dynamic sampling
= 2
pre rewrite push pred
= true
optimizer new join card computation = true
union rewrite for gs
= yes gset mvs
generalized pruning enabled
= true
optim adjust for part skews
= true
force datefold trunc
= false
statistics level
= typical
optimizer system stats usage
= true
skip unusable indexes
= true
remove aggr subquery
= true
optimizer push down distinct
= 0
dml monitoring enabled
= true
optimizer undo changes
= false
predicate elimination enabled
= true
nested loop fudge
= 100
project view columns
= true
local communication costing enabled = true
local communication ratio
= 50
query rewrite vop cleanup
= true
slave mapping enabled
= true
optimizer cost based transformation = linear
optimizer mjc enabled
= true
right outer hash enable
= true
spr push pred refspr
= true
optimizer cache stats
= false
optimizer cbqt factor
= 50
optimizer squ bottomup
= true
fic area size
= 131072
optimizer skip scan enabled
= true
optimizer cost filter pred
= false
optimizer sortmerge join enabled
= true
optimizer join sel sanity check
= true
mmv query rewrite enabled
= true
bt mmv query rewrite enabled
= true
add stale mv to dependency list
= true
distinct view unnesting
= false
CHAPTER 7 ■ EVENT 10053 AND THE COST BASED OPTIMIZER
optimizer dim subq join sel
= true
optimizer disable strans sanity checks = 0
optimizer compute index stats
= true
push join union view2
= true
optimizer ignore hints
= false
optimizer random plan
= 0
query rewrite setopgrw enable
= true
optimizer correct sq selectivity
= true
disable function based index
= false
optimizer join order control
= 3
optimizer cartesian enabled
= true
optimizer starplan enabled
= true
extended pruning enabled
= true
optimizer push pred cost based
= true
sql model unfold forloops
= run time
enable dml lock escalation
= false
bloom filter enabled
= true
update bji ipdml enabled
= 0
optimizer extended cursor sharing = udo
dm max shared pool pct
= 1
optimizer cost hjsmj multimatch
= true
optimizer transitivity retain
= true
px pwg enabled
= true
optimizer secure view merging
= true
optimizer join elimination enabled = true
flashback table rpi
= non fbt
optimizer cbqt no size restriction = true
optimizer enhanced filter push
= true
optimizer filter pred pullup
= true
rowsrc trace level
= 0
simple view merging
= true
optimizer rownum pred based fkr
= true
optimizer better inlist costing
= all
optimizer self induced cache cost = false
optimizer min cache blocks
= 10
optimizer or expansion
= depth
optimizer order by elimination enabled = true
optimizer outer to anti enabled
= true
selfjoin mv duplicates
= true
dimension skip null
= true
force rewrite enable
= false
optimizer star tran in with clause = true
optimizer complex pred selectivity = true
gby hash aggregation enabled
= true
***************************************
PARAMETERS IN OPT PARAM HINT
****************************
75
76
CHAPTER 7 ■ EVENT 10053 AND THE COST BASED OPTIMIZER
When changing DB FILE MULTIBLOCK READ COUNT at session level, this is reflected as the
undocumented parameter DB FILE OPTIMIZER READ COUNT in the subsection on altered values.
*************************************
PARAMETERS WITH ALTERED VALUES
******************************
db file optimizer read count
= 64
The excerpt below illustrates the effect of adjusting OPTIMIZER INDEX COST ADJ at statement level with the hint OPT PARAM on the third subsection.
***************************************
PARAMETERS IN OPT PARAM HINT
****************************
optimizer index cost adj
= 30
A small subset of these parameters, pertaining to the SELECT statement used as an example,
may be retrieved with the following query:
SQL> SELECT name FROM V$SQL OPTIMIZER ENV WHERE sql id='2ck90xfmsza4u';
System Statistics
On the system where the case study was performed, workload statistics had been set in the data
dictionary with the following anonymous PL/SQL block (file set system stats.sql):
SQL> BEGIN
dbms stats.set
dbms stats.set
dbms stats.set
dbms stats.set
dbms stats.set
END;
/
system
system
system
system
system
stats('sreadtim', 4);
stats('mreadtim', 10 );
stats('cpuspeed', 839);
stats('mbrc', 14);
stats('maxthr', 8 * 1048576);
This is reflected in the 10053 trace as reproduced here:
*****************************
SYSTEM STATISTICS INFORMATION
*****************************
Using WORKLOAD Stats
CPUSPEED: 839 millions instructions/sec
SREADTIM: 4 milliseconds
MREADTIM: 10 milliseconds
MBRC: 14.000000 blocks
MAXTHR: 8388608 bytes/sec
SLAVETHR: -1 bytes/sec
These system statistics were derived by averaging the results of several system statistics
gatherings with DBMS STATS.GATHER SYSTEM STATS. The value of CPUSPEED depends on the
hardware used and fluctuates with the apportionment of CPU time to the DBMS. I have included
CHAPTER 7 ■ EVENT 10053 AND THE COST BASED OPTIMIZER
the script gather get system stats.sql in the source code depot of this chapter. The script
collects system statistics during several subsequent intervals and stores them in a statistics
table outside of the data dictionary. By default, it uses four intervals of 15 minutes each. It also
includes PL/SQL code for retrieving system statistics. The script does not affect any optimizer
parameters or decisions. Workload statistics must be imported into the data dictionary to affect
execution plans generated by the CBO.
When noworkload statistics are used, the 10053 system statistics trace file section looks
as follows:
*****************************
SYSTEM STATISTICS INFORMATION
*****************************
Using NOWORKLOAD Stats
CPUSPEED: 485 millions instruction/sec
IOTFRSPEED: 4096 bytes per millisecond (default is 4096)
IOSEEKTIM: 10 milliseconds (default is 10)
The value of CPUSPEED depends on the hardware used, whereas the parameters
IOTFRSPEED and IOSEEKTIM have identical values on any UNIX or Windows port of
Oracle10g.
Object Statistics for Tables and Indexes
This section comprises statistics from DBA TABLES, DBA TAB COL STATISTICS and DBA INDEXES. If
the statement accesses partitioned objects, statistics from DBA TAB PARTITIONS and DBA IND
PARTITIONS would be present. In case histograms have been created for some columns, statistics
from DBA TAB HISTOGRAMS would be displayed. In the excerpt that follows, the column LOCATION
ID has a histogram with seven buckets, whereas column DEPARTMENT ID does not have a histogram.
Merely columns that are candidates for filter or access predicates are listed. Abbreviations used
in this section and their meanings are depicted in Table 7-2.
***************************************
BASE STATISTICAL INFORMATION
***********************
Table Stats::
Table: JOBS Alias: J
#Rows: 19 #Blks: 5 AvgRowLen: 33.00
Column (#1): JOB ID(VARCHAR2)
AvgLen: 8.00 NDV: 19 Nulls: 0 Density: 0.052632
Index Stats::
Index: JOB ID PK Col#: 1
LVLS: 0 #LB: 1 #DK: 19 LB/K: 1.00 DB/K: 1.00 CLUF: 1.00
Table Stats::
Table: DEPARTMENTS Alias: D
#Rows: 27 #Blks: 5 AvgRowLen: 20.00
Column (#1): DEPARTMENT ID(NUMBER)
AvgLen: 4.00 NDV: 27 Nulls: 0 Density: 0.037037 Min: 10 Max: 270
77
78
CHAPTER 7 ■ EVENT 10053 AND THE COST BASED OPTIMIZER
Column (#4): LOCATION ID(NUMBER)
AvgLen: 3.00 NDV: 7 Nulls: 0 Density: 0.018519 Min: 1400 Max: 2700
Histogram: Freq #Bkts: 7 UncompBkts: 27 EndPtVals: 7
Index Stats::
Index: DEPT ID PK Col#: 1
LVLS: 0 #LB: 1 #DK: 27 LB/K: 1.00 DB/K: 1.00 CLUF: 1.00
Index: DEPT LOCATION IX Col#: 4
LVLS: 0 #LB: 1 #DK: 7 LB/K: 1.00 DB/K: 1.00 CLUF: 1.00
…
Table 7-2. Abbreviations Used in the Section “Base Statistical Information”
Abbreviation
Meaning
Dictionary View (and Column)
#Bkts
Number of histogram buckets
DBA TAB HISTOGRAMS
#Blks
Number of table blocks below the
high water mark
DBA TABLES.BLOCKS
#DK
Distinct keys
DBA INDEXES.DISTINCT KEYS
#LB
Number of index leaf blocks
DBA INDEXES.LEAF BLOCKS
#Rows
Number of table rows
DBA TABLES.NUM ROWS
AvgLen
Average column value length
DBA TAB COLUMNS.AVG COL LEN
AvgRowLen
Average row length
DBA TABLES.AVG ROW LEN
EndPtVals
Histogram end point values
DBA TAB HISTOGRAMS
CLUF
Clustering factor
DBA INDEXES.CLUSTERING FACTOR
DB/K
Average number of index data blocks
per key
DBA INDEXES.AVG DATA BLOCKS PER K
EY
LB/K
Average number of index leaf blocks
per key
DBA INDEXES.AVG LEAF BLOCKS PER K
EY
LVLS
B-tree level
DBA INDEXES.BLEVEL
NDV
Number of distinct values
DBA TAB COLUMNS.NUM DISTINCT
UncompBkts
Uncompressed buckets
n/a
The clustering factor is an indication of how clustered (beneficial) or randomly distributed
(detrimental) data is in a table’s segment. Indexes are always sorted. When individual index
blocks point to many table blocks, data pertaining to a range of index keys is randomly distributed
within a table’s segment, and a high clustering factor, which renders index access unattractive,
results.
When histograms are absent, the following holds for density:
1
Densit y = ------------NDV
CHAPTER 7 ■ EVENT 10053 AND THE COST BASED OPTIMIZER
Missing Statistics
As soon as one of the objects referenced in a SQL statement has object statistics, the CBO is
used to calculate the best execution plan. When statistics for one or more tables, indexes, or
partitions thereof are missing, the calculation may go awry. The 10053 trace file points out
which objects lack statistics, and contains the default values CBO uses instead of actual values
computed with DBMS STATS.
Table Stats::
Table: LOCATIONS Alias: L (NOT ANALYZED)
#Rows: 409 #Blks: 5 AvgRowLen: 100.00
Column (#1): LOCATION ID(NUMBER) NO STATISTICS (using defaults)
AvgLen: 22.00 NDV: 13 Nulls: 0 Density: 0.07824
Index Stats::
Index: LOC CITY IX Col#: 4
(NOT ANALYZED)
LVLS: 1 #LB: 25 #DK: 100 LB/K: 1.00 DB/K: 1.00 CLUF: 800.00
Single Table Access Path and Cost
The optimizer ignores join conditions in the single table access path section. Solely access
predicates, which were supplied a value with a literal or bind variable, are considered. In the
absence of such predicates, merely a full table scan or index fast full scan are considered for
a table. The optimizer estimates how many rows will be retrieved and calculates the cheapest
alternative for getting them. In the following excerpt, the optimizer determines that it is cheaper
to use the index LOC CITY IX than to perform a full table scan:
SINGLE TABLE ACCESS PATH
Column (#4): CITY(VARCHAR2)
AvgLen: 9.00 NDV: 23 Nulls: 0 Density: 0.043478
Table: LOCATIONS Alias: L
Card: Original: 23 Rounded: 1 Computed: 1.00 Non Adjusted: 1.00
Access Path: TableScan
Cost: 3.01 Resp: 3.01 Degree: 0
Cost io: 3.00 Cost cpu: 41607
Resp io: 3.00 Resp cpu: 41607
Access Path: index (AllEqRange)
Index: LOC CITY IX
resc io: 2.00 resc cpu: 14673
ix sel: 0.043478 ix sel with filters: 0.043478
Cost: 2.00 Resp: 2.00 Degree: 1
Best:: AccessPath: IndexRange Index: LOC CITY IX
Cost: 2.00 Degree: 1 Resp: 2.00 Card: 1.00 Bytes: 0
***************************************
SINGLE TABLE ACCESS PATH
Table: JOBS Alias: J
Card: Original: 19 Rounded: 19 Computed: 19.00 Non Adjusted: 19.00
79
80
CHAPTER 7 ■ EVENT 10053 AND THE COST BASED OPTIMIZER
Access Path: TableScan
Cost: 3.01 Resp: 3.01 Degree: 0
Cost io: 3.00 Cost cpu: 38837
Resp io: 3.00 Resp cpu: 38837
Best:: AccessPath: TableScan
Cost: 3.01 Degree: 1 Resp: 3.01 Card: 19.00
***************************************
Bytes: 0
According to Jonathan Lewis ([Lewi 2005]), the CBO calculates the cost of a B-tree index
access with this formula:
LVLS + ceiling  #LB  ix_sel  + ceiling  CLUF  ix_sel_with_filters 
The values for ix_sel (index selectivity) and ix_sel_with_filters (effective table selectivity)
are found in the trace file. The values for LVLS (B-tree level), #LB (number of leaf blocks), and
CLUF (clustering factor) are in the section entitled “BASE STATISTICAL INFORMATION”
presented earlier. When the formula is applied to the index LOC CITY IX, it does yield the same
result as in the optimizer trace.
0 + ceiling  1  0.043478  + ceiling  1  0.043478  = 2
The formula also gives correct results when applied to more complicated cases.
Dynamic Sampling
Dynamic sampling is a feature of the CBO introduced with Oracle9i. Dynamic sampling is the
capability of the CBO to calculate statistics based on a small sample of rows as it optimizes a
query. The feature is controlled by the parameter OPTIMIZER DYNAMIC SAMPLING and is enabled
by default in Oracle9i Release 2 and subsequent versions. The following excerpt depicts the
SELECT statement the optimizer used to dynamically sample the table LOCATIONS, after object
statistics for the table had been deleted:
** Generated dynamic sampling query:
query text :
SELECT /* OPT DYN SAMP */ /*+ ALL ROWS opt param('parallel execution enabled',
'false')
NO PARALLEL(SAMPLESUB) NO PARALLEL INDEX(SAMPLESUB)
NO SQL TUNE */ NVL(SUM(C1),0), NVL(SUM(C2),0), NVL(SUM(C3),0)
FROM (SELECT /*+ NO PARALLEL("L") INDEX("L" LOC CITY IX)
NO PARALLEL INDEX("L") */ 1 AS C1, 1 AS C2, 1 AS C3
FROM "LOCATIONS" "L" WHERE "L"."CITY"=:B1
AND ROWNUM <= 2500) SAMPLESUB
*** 2007-11-30 16:44:25.703
** Executed dynamic sampling query:
level : 2
sample pct. : 100.000000
actual sample size : 23
filtered sample card. : 1
CHAPTER 7 ■ EVENT 10053 AND THE COST BASED OPTIMIZER
filtered sample card. (index LOC CITY IX): 1
orig. card. : 23
block cnt. table stat. : 5
block cnt. for sampling: 5
max. sample block cnt. : 4294967295
sample block cnt. : 5
min. sel. est. : 0.01000000
index LOC CITY IX selectivity est.: 0.04347826
Join Orders
This section depicts all the join orders the CBO has scrutinized. There are n! (n factorial) join
orders for an n-way join. Thus, a 5-way join has 120 join orders and a 6-way join has 720. For
higher order joins, it would be a waste of CPU resources to try all possible join orders. Hence,
the CBO does not consider them all and stops examining join orders that exceed the cost of the
best plan found so far.
***************************************
OPTIMIZER STATISTICS AND COMPUTATIONS
***************************************
GENERAL PLANS
***************************************
Considering cardinality-based initial join order.
***********************
Join order[1]: LOCATIONS[L]#0 JOBS[J]#1 DEPARTMENTS[D]#2
EMPLOYEES[EMP]#3 EMPLOYEES[MGR]#4
***************
Now joining: JOBS[J]#1
***************
NL Join
Outer table: Card: 1.00 Cost: 2.00 Resp: 2.00 Degree: 1 Bytes: 48
Inner table: JOBS Alias: J
Access Path: TableScan
NL Join: Cost: 5.02 Resp: 5.02 Degree: 0
Cost io: 5.00 Cost cpu: 53510
Resp io: 5.00 Resp cpu: 53510
Best NL cost: 5.02
resc: 5.02 resc io: 5.00 resc cpu: 53510
resp: 5.02 resp io: 5.00 resp cpu: 53510
Join Card: 19.00 = outer (1.00) * inner (19.00) * sel (1)
Join Card - Rounded: 19 Computed: 19.00
Best:: JoinMethod: NestedLoop
Cost: 5.02 Degree: 1 Resp: 5.02 Card: 19.00 Bytes: 75
…
***************
Now joining: DEPARTMENTS[D]#2
***************
…
81
82
CHAPTER 7 ■ EVENT 10053 AND THE COST BASED OPTIMIZER
***************
Now joining: EMPLOYEES[EMP]#3
***************
…
***************
Now joining: EMPLOYEES[MGR]#4
***************
…
***********************
Best so far: Table#: 0 cost: 2.0044 card: 1.0000 bytes: 48
Table#: 1 cost: 5.0159 card: 19.0000 bytes: 1425
Table#: 2 cost: 6.0390 card: 73.2857 bytes: 6862
Table#: 3 cost: 9.5672 card: 15.1429 bytes: 2400
Table#: 4 cost: 11.6050 card: 15.0013 bytes: 2580
***********************
Join order[2]: LOCATIONS[L]#0 JOBS[J]#1 DEPARTMENTS[D]#2
EMPLOYEES[MGR]#4 EMPLOYEES[EMP]#3
***************
Now joining: EMPLOYEES[MGR]#4
…
***********************
For each potential join order, the optimizer evaluates the cost of these three join methods:
• Nested loops (NL) join
• Sort-merge (SM) join
• Hash join (HA)
At the end of each section entitled “Now joining”, CBO reports the best join method
(e.g., “Best:: JoinMethod: NestedLoop”).
Join order[8]: LOCATIONS[L]#0 DEPARTMENTS[D]#2 EMPLOYEES[EMP]#3
JOBS[J]#1 EMPLOYEES[MGR]#4
***************
Now joining: EMPLOYEES[EMP]#3
***************
NL Join
Outer table: Card: 3.86 Cost: 3.01 Resp: 3.01 Degree: 1 Bytes: 67
Inner table: EMPLOYEES Alias: EMP
Access Path: TableScan
NL Join: Cost: 8.09 Resp: 8.09 Degree: 0
Cost io: 8.00 Cost cpu: 316513
Resp io: 8.00 Resp cpu: 316513
CHAPTER 7 ■ EVENT 10053 AND THE COST BASED OPTIMIZER
Access Path: index (AllEqJoinGuess)
Index: EMP DEPARTMENT IX
resc io: 1.00 resc cpu: 13471
ix sel: 0.091767 ix sel with filters: 0.091767
NL Join: Cost: 4.65 Resp: 4.65 Degree: 1
Cost io: 4.63 Cost cpu: 52993
Resp io: 4.63 Resp cpu: 52993
Best NL cost: 4.65
resc: 4.65 resc io: 4.63 resc cpu: 52993
resp: 4.65 resp io: 4.63 resp cpu: 52993
Join Card: 15.14 = outer (3.86) * inner (107.00) * sel (0.036691)
Join Card - Rounded: 15 Computed: 15.14
SM Join
Outer table:
resc: 3.01 card 3.86 bytes: 67 deg: 1 resp: 3.01
Inner table: EMPLOYEES Alias: EMP
resc: 3.02 card: 107.00 bytes: 66 deg: 1 resp: 3.02
using dmeth: 2 #groups: 1
SORT resource
Sort statistics
Sort width:
138 Area size:
131072 Max Area size:
24536064
Degree:
1
Blocks to Sort:
1 Row size:
84 Total Rows:
4
Initial runs:
1 Merge passes:
0 IO Cost / pass:
0
Total IO sort cost: 0
Total CPU sort cost: 3356360
Total Temp space used: 0
SORT resource
Sort statistics
Sort width:
138 Area size:
131072 Max Area size:
24536064
Degree:
1
Blocks to Sort:
2 Row size:
83 Total Rows:
107
Initial runs:
1 Merge passes:
0 IO Cost / pass:
0
Total IO sort cost: 0
Total CPU sort cost: 3388500
Total Temp space used: 0
SM join: Resc: 8.04 Resp: 8.04 [multiMatchCost=0.00]
SM cost: 8.04
resc: 8.04 resc io: 6.00 resc cpu: 6842201
resp: 8.04 resp io: 6.00 resp cpu: 6842201
HA Join
Outer table:
resc: 3.01 card 3.86 bytes: 67 deg: 1 resp: 3.01
Inner table: EMPLOYEES Alias: EMP
resc: 3.02 card: 107.00 bytes: 66 deg: 1 resp: 3.02
using dmeth: 2 #groups: 1
Cost per ptn: 0.50 #ptns: 1
hash area: 0 (max=0)
Hash join: Resc: 6.53 Resp: 6.53 [multiMatchCost=0.00]
83
84
CHAPTER 7 ■ EVENT 10053 AND THE COST BASED OPTIMIZER
HA cost: 6.53
resc: 6.53 resc io: 6.00 resc cpu: 1786642
resp: 6.53 resp io: 6.00 resp cpu: 1786642
Best:: JoinMethod: NestedLoop
Cost: 4.65 Degree: 1 Resp: 4.65 Card: 15.14 Bytes: 133
***************
Now joining: JOBS[J]#1
***************
NL Join
…
***********************
Best so far: Table#: 0 cost: 2.0044 card: 1.0000 bytes: 48
Table#: 2 cost: 3.0072 card: 3.8571 bytes: 268
Table#: 3 cost: 4.6454 card: 15.1429 bytes: 1995
Table#: 1 cost: 5.6827 card: 15.1429 bytes: 2400
Table#: 4 cost: 7.7205 card: 15.0013 bytes: 2580
…
Join order[76]: EMPLOYEES[MGR]#4 EMPLOYEES[EMP]#3
DEPARTMENTS[D]#2 LOCATIONS[L]#0 JOBS[J]#1
***************
Now joining: DEPARTMENTS[D]#2
…
Join order aborted: cost > best plan cost
***********************
(newjo-stop-1) k:0, spcnt:0, perm:76, maxperm:2000
Number of join permutations tried: 76
*********************************
(newjo-save)
[1 2 4 0 3 ]
Final - All Rows Plan: Best join order: 8
Cost: 7.7205 Degree: 1 Card: 15.0000 Bytes: 2580
Resc: 7.7205 Resc io: 7.6296 Resc cpu: 305037
Resp: 7.7205 Resp io: 7.6296 Resc cpu: 305037
At the end of the report for a particular join sequence, the CBO prints the best join order
detected so far along with its cost, cardinality, and data volume (bytes). All three figures displayed
under the heading “Best so far” are cumulative in the same way that the cost of a row source in
an execution plan includes the cost of dependent row sources.
The CBO has decided that the join order 8, which joins the tables in the sequence LOCATIONS,
DEPARTMENTS, EMPLOYEES (alias EMP), JOBS, and EMPLOYEES (alias MGR), has the lowest cost. This join
order is reflected in the execution plan and the LEADING hint in the next sections.
Execution Plan
This section contains a nicely formatted execution plan. Besides DBMS XPLAN.DISPLAY CURSOR,
V$SQL PLAN, AWR, and Statspack, the 10053 trace file is another reliable source for execution plans.
Remember that EXPLAIN PLAN is notoriously unreliable and should never be used in optimization
CHAPTER 7 ■ EVENT 10053 AND THE COST BASED OPTIMIZER
projects. Oracle Corporation cautions DBAs that “with bind variables in general, the EXPLAIN
PLAN output might not represent the real execution plan” (Oracle Database Performance Tuning
Guide 10g Release 2, page 19-4). It is a good idea to check the cardinalities (column “Rows”) in
the plan table. Suboptimal plans may result whenever these are grossly incorrect.
============
Plan Table
============
----------------------------------------------------------------------------------+
|Id|Operation
|Name
|Rows |Bytes|Cost |Time
|
----------------------------------------------------------------------------------+
|0 |SELECT STATEMENT
|
|
|
|
8|
|
|1 | NESTED LOOPS
|
|
15| 2580|
8|00:00:01 |
|2 | NESTED LOOPS
|
|
15| 2400|
6|00:00:01 |
|3 |
NESTED LOOPS
|
|
15| 1995|
5|00:00:01 |
|4 |
NESTED LOOPS
|
|
4| 268|
3|00:00:01 |
|5 |
TABLE ACCESS BY INDEX ROWID|LOCATIONS
|
1| 48|
2|00:00:01 |
|6 |
INDEX RANGE SCAN
|LOC CITY IX
|
1|
|
1|00:00:01 |
|7 |
TABLE ACCESS BY INDEX ROWID|DEPARTMENTS
|
4| 76|
1|00:00:01 |
|8 |
INDEX RANGE SCAN
|DEPT LOCATION IX |
4|
|
0|
|
|9 |
TABLE ACCESS BY INDEX ROWID |EMPLOYEES
|
4| 264|
1|00:00:01 |
|10|
INDEX RANGE SCAN
|EMP DEPARTMENT IX|
10|
|
0|
|
|11|
TABLE ACCESS BY INDEX ROWID |JOBS
|
1| 27|
1|00:00:01 |
|12|
INDEX UNIQUE SCAN
|JOB ID PK
|
1|
|
0|
|
|13| TABLE ACCESS BY INDEX ROWID |EMPLOYEES
|
1| 12|
1|00:00:01 |
|14|
INDEX UNIQUE SCAN
|EMP EMP ID PK
|
1|
|
0|
|
----------------------------------------------------------------------------------+
Predicate Information
The output in the predicate and plan sections is nearly identical to the output you would get
from running SELECT * FROM table (DBMS XPLAN.DISPLAY CURSOR()) immediately after the statement you are investigating.
Predicate Information:
---------------------6 - access("L"."CITY"=:LOC)
8 - access("D"."LOCATION ID"="L"."LOCATION ID")
10 - access("EMP"."DEPARTMENT ID"="D"."DEPARTMENT ID")
12 - access("EMP"."JOB ID"="J"."JOB ID")
14 - access("EMP"."MANAGER ID"="MGR"."EMPLOYEE ID")
Hints and Query Block Names
This section comprises a full set of hints including query block names. The hints would be used
to define a stored outline, which fixes the plan chosen by the CBO. The data is displayed with
correct syntax for hints.
85
86
CHAPTER 7 ■ EVENT 10053 AND THE COST BASED OPTIMIZER
Outline Data:
/*+
BEGIN OUTLINE DATA
IGNORE OPTIM EMBEDDED HINTS
OPTIMIZER FEATURES ENABLE('10.2.0.1')
ALL ROWS
OUTLINE LEAF(@"SEL$1")
INDEX(@"SEL$1" "L"@"SEL$1" ("LOCATIONS"."CITY"))
INDEX(@"SEL$1" "D"@"SEL$1" ("DEPARTMENTS"."LOCATION ID"))
INDEX(@"SEL$1" "EMP"@"SEL$1" ("EMPLOYEES"."DEPARTMENT ID"))
INDEX(@"SEL$1" "J"@"SEL$1" ("JOBS"."JOB ID"))
INDEX(@"SEL$1" "MGR"@"SEL$1" ("EMPLOYEES"."EMPLOYEE ID"))
LEADING(@"SEL$1" "L"@"SEL$1" "D"@"SEL$1" "EMP"@"SEL$1" "J"@"SEL$1" "MGR"@"SEL$1")
USE NL(@"SEL$1" "D"@"SEL$1")
USE NL(@"SEL$1" "EMP"@"SEL$1")
USE NL(@"SEL$1" "J"@"SEL$1")
USE NL(@"SEL$1" "MGR"@"SEL$1")
END OUTLINE DATA
*/
Source Code Depot
Table 7-3 lists this chapter’s source files and their functionality.
Table 7-3. Event 10053 Source Code Depot
File Name
Functionality
hr 5way join.sql
5-way join of tables in the sample schema HR
set system stats.sql
This script sets workload system statistics in the
data dictionary
gather get system stats.sql
This script collects system statistics during several
subsequent intervals (default: 4 intervals, 15 minutes
each) and stores them in a statistics table outside of
the data dictionary
hr.dmp
Conventional export dump file that contains tables
from the sample schema HR (created with Oracle10g
Release 2)
CHAPTER 8
■■■
Event 10079 and Oracle
Net Packet Contents
T
he undocumented event 10079 may be used to dump Oracle Net traffic to a trace file. It is
useful for quickly determining which SQL statements, PL/SQL calls, or SQL*Plus commands
send sensitive data such as passwords unencrypted.
Event 10079 is similar to Oracle Net tracing in that it dumps the complete contents of
network packet contents between database client and server. It is more convenient than changing
sqlnet.ora to enable dumping of Oracle Net packet contents. Unlike trace level client in
sqlnet.ora, it may also be used to enable packet dumps for database sessions that are already
established. Table 8-1 lists the supported event levels.
Table 8-1. Supported Levels of Event 10079
Level
Purpose
1
Trace network operations to/from client
2
In addition to level 1, dump data
4
Trace network operations to/from database link
8
In addition to level 4, dump data
Case Study
The subsequent sections assume that the Advanced Security Option for the encryption of
Oracle Net traffic is not used. The SQL*Plus User’s Guide and Reference does not state whether
passwords are sent encrypted when modified with the SQL*Plus command PASSWORD. Event 10079
may be used to find out.
87
88
CHAPTER 8 ■ EVENT 10079 AND ORACLE NET PACKET CONTENTS
SQL> CONNECT / AS SYSDBA
Connected.
SQL> ALTER SESSION SET EVENTS '10079 trace name context forever, level 2';
Session altered.
SQL> PASSWORD ndebes
Changing password for ndebes
New password:
Retype new password:
Password changed
SQL> ORADEBUG SETMYPID
Statement processed.
SQL> ORADEBUG TRACEFILE NAME
/opt/oracle/obase/admin/TEN/udump/ten1 ora 20364.trc
The resulting trace file contains the following packet dump:
C850BD0
C850BE0
C850BF0
C850C00
C850C10
C850C20
C850C30
C850C40
FFD668BF
54554110
0000C044
41323232
46323546
34423130
30304542
5F485455
646E06BF
454E5F48
31384000
39363642
30324343
32353232
00003245
53534150
73656265
53415057
39314642
45453539
30313239
45423332
00270000
44524F57
00000030
524F5753
38373930
42303242
39453434
44393431
410D0000
00000000
[.h....ndebes0...]
[.AUTH NEWPASSWOR]
[D....@81BF190978]
[222AB66995EEB20B]
[F52FCC20921044E9]
[01B4225223BE149D]
[BE00E2....'....A]
[UTH PASSWORD....]
Obviously, the password was sent encrypted. Thus, the SQL*Plus PASSWORD command is a
safe way to change passwords, whereas ALTER USER user_name IDENTIFIED BY new_password is
not, since it sends the password unencrypted along with the SQL statement text. By the way,
the preceding encryption is different from the password hash in DBA USERS.PASSWORD, such that
eavesdropping a communications link cannot be used to glean password hashes stored in the
dictionary base table USER$. Oracle Call Interface provides the function OCIPasswordChange()
and it is safe to assume that SQL*Plus uses this function to implement the PASSWORD command.
Unfortunately the manuals do not state whether or not OCIPasswordChange() encrypts passwords.
Some applications use roles, which are protected by a password, to enable certain privileges only when a user connects with the application. This is intended to restrict the privileges
of users who connect with SQL*Plus or other applications. Event 10079 may be used to prove
that both the SQL statement SET ROLE role_name IDENTIFIED BY password as well as DBMS
SESSION.SET ROLE send the role’s password unencrypted to the DBMS server. This means that
any user who knows enough about Oracle Net, can get the unencrypted role password from a
packet dump. Since an end user cannot add an ALTER SESSION statement to an application, an alternative way to dump Oracle Net packets is needed. All that is necessary is to copy tnsnames.ora and
sqlnet.ora to the user’s home directory and to set TNS ADMIN to the same directory. Then, after
adding the following two lines to sqlnet.ora:
CHAPTER 8 ■ EVENT 10079 AND ORACLE NET PACKET CONTENTS
trace level client=support
trace directory client=<user's home directory>
and restarting the application, the clear text password may be retrieved from the trace file.1
[28-NOV-2007
[28-NOV-2007
[28-NOV-2007
[28-NOV-2007
[28-NOV-2007
[28-NOV-2007
[28-NOV-2007
[28-NOV-2007
[28-NOV-2007
[28-NOV-2007
[28-NOV-2007
[28-NOV-2007
[28-NOV-2007
[28-NOV-2007
[28-NOV-2007
[28-NOV-2007
[28-NOV-2007
[28-NOV-2007
[28-NOV-2007
[28-NOV-2007
[28-NOV-2007
23:10:54:156]
23:10:54:156]
23:10:54:156]
23:10:54:156]
23:10:54:156]
23:10:54:156]
23:10:54:156]
23:10:54:156]
23:10:54:156]
23:10:54:156]
23:10:54:156]
23:10:54:156]
23:10:54:156]
23:10:54:156]
23:10:54:156]
23:10:54:156]
23:10:54:156]
23:10:54:156]
23:10:54:156]
23:10:54:156]
23:10:54:156]
nspsend:
nspsend:
nspsend:
nspsend:
nspsend:
nspsend:
nspsend:
nspsend:
nspsend:
nspsend:
nspsend:
nspsend:
nspsend:
nspsend:
nspsend:
nspsend:
nspsend:
nspsend:
nspsend:
nspsend:
nspsend:
00
64
73
5F
6F
3B
01
00
00
00
00
00
00
00
00
00
00
70
64
64
73
2E
62
69
72
6C
20
00
00
00
00
00
00
00
32
00
00
00
70
65
20
65
42
6D
6F
6F
65
45
00
00
00
00
00
00
00
00
00
00
00
72
6E
62
63
45
73
6E
6C
5F
4E
00
00
00
00
00
00
00
00
00
00
00
6F
74
79
72
47
5F
2E
65
63
44
01
00
00
08
00
00
01
00
00
00
00
6C
69
20
65
49
73
73
28
6D
3B
00
00
00
00
00
00
01
00
00
B2
07
65
66
74
74
4E
65
65
3A
64
0A
00
00
00
00
00
00
03
00
00
00
1F
20
69
6F
20
73
74
72
29
00
00
00
00
00
00
00
00
00
00
01
61
69
65
70
|..BEGIN.|
|dbms ses|
|sion.set|
| role(:r|
|ole cmd)|
|;.END;..|
|........|
|........|
|........|
|........|
|........|
|........|
|........|
|.2......|
|........|
|........|
|.......a|
|pprole.i|
|dentifie|
|d.by.top|
|secret |
The safe way to implement privileges, which are only available when connecting with an
application, is to use proxy authentication in conjunction with Oracle Internet Directory and
secure application roles.
Of course, the same vulnerability also applies to CREATE USER user_name IDENTIFIED BY
password. This statement also sends the password in clear text.
[08-SEP-2007
[08-SEP-2007
[08-SEP-2007
[08-SEP-2007
[08-SEP-2007
09:28:23:864]
09:28:23:864]
09:28:23:864]
09:28:23:864]
09:28:23:864]
nspsend:
nspsend:
nspsend:
nspsend:
nspsend:
08
20
20
49
65
23
55
49
45
63
43
53
44
44
72
52
45
45
20
65
45
52
4E
42
74
41
20
54
59
01
54
68
49
20
00
45
72
46
73
00
|.#CREATE|
|.USER.hr|
|.IDENTIF|
|IED.BY.s|
|ecret...|
Hence you should create users as externally identified and then change the password with
the SQL*Plus command PASSWORD.
1. The naming convention for Oracle Net trace files is cli spid.trc, where spid is the client process identifier.
89
PA R T
4
X$ Fixed Tables
CHAPTER 9
■■■
Introduction to X$ Fixed Tables
A
few X$ tables are mentioned in the documentation, but the vast majority are undocumented.
For example X$BH is mentioned in the Oracle9i Performance Tuning Guide as well as the Oracle10g
Performance Tuning Guide. The Oracle10g Warehouse Builder Installation and Administration
Guide contains a procedure for resolving locking issues pertaining to the library cache by
accessing X$KGLLK.1
Many X$ tables hold much more information than the GV$ views built on top of them. In
instances where information provided by GV$ or V$ views is insufficient, the underlying X$
table may be scrutinized. Numerous X$ tables do not serve as the basis for GV$ views at all.
X$ Fixed Tables and C Programming
At least a significant part, if not all of the code for the ORACLE DBMS kernel, is written in the C
programming language. A lot of data is maintained in two-dimensional arrays. On UNIX systems,
C arrays located in the SGA may be read by commercial third-party performance diagnostic
utilities that attach one or more shared memory segments that hold the SGA.2 Personally,
although I have great respect for the developers who reverse engineer internal data structures
in the SGA, I am not an advocate of such SGA sampling tools. After all they are mere sampling
tools and may miss relevant data. In my view, extended SQL trace data combined with Statspack
(or AWR) snapshots provide sufficient database performance diagnostic data that is not collected
by sampling.
The basic idea behind V$ views is to expose information in C data structures to database
administrators. This is done by mapping V$ views to C data structures through some intermediate layers. X$ tables are one of the intermediate layers. They are the layer closest to C, to be
precise. Of course the word table in X$ table has a meaning that is almost entirely different
from the meaning in a SQL context. It goes without saying that none of the X$ tables have a
segment in DBA SEGMENTS associated with them. Additional evidence for the uniqueness of X$
tables comes from the fact that the row source for accessing them is FIXED TABLE FULL. To
speed up access, indexes on X$ tables are maintained. The row source associated with index
access to a fixed table is called FIXED TABLE FIXED INDEX. Along the same lines, there are no
1. The manual shows how to access X$KGLLK for resolving the error “ORA-04021 timeout occurred while
waiting to lock object.”
2. See Oracle Wait Interface: A Practical Guide to Performance Diagnostics and Tuning by Richmond Shee
et al. ([ShDe 2004]).
93
94
CHAPTER 9 ■ INTRODUCTION TO X$ FIXED TABLES
metadata on V$ views in DBA VIEWS. Conversely, metadata on V_$ views, which are used for
granting access to V$ views, is available through DBA VIEWS.
Some X$ tables are linked to disk storage. For example, the column DCNAM of the X$ fixed table
X$KCCDC holds path names of data file copies created with the RMAN command COPY DATAFILE.
The disk storage for these path names is within the control file. By looking at V$CONTROLFILE
RECORD SECTION or its foundation X$KCCRS, you will be able to identify a section called “DATAFILE
COPY”. This is the control file section represented by X$KCCDC (KCCDC is short for Kernel Cache
Control file Data file Copy).
There is an old saying about the ORACLE DBMS which maintains that any well designed
feature has more than a single denomination. The synonymous terms V$ dynamic performance
view and V$ fixed view are an example of this. Another example is automatic undo management, which is also known as system managed undo (SMU) or the System Change Number
(SCN), which is occasionally referred to as the System Commit Number.
Layered Architecture
X$ fixed tables are accessed in a layered manner as depicted in Figure 9-1. A public synonym by
the same name exists for each V$ or GV$ fixed view. These synonyms refer to V_$ and GV_$
views owned by SYS. These are true views with metadata in DBA VIEWS. DDL for these views is
in catalog.sql. It is of the following form:
CREATE OR REPLACE VIEW {g|v} $<view name> AS
SELECT * FROM {g|v$}<fixed view name>;
SELECT on these V_$ and GV_$ views is then granted to SELECT CATALOG ROLE. All V$ dynamic
performance views hold information pertaining to the current instance, i.e., the instance the
user has connected to. V$ fixed views are based on GV$ fixed views, which are built on top of
one or more X$ fixed tables. On systems running Real Application Clusters, GV$ fixed views
provide access to information about other instances that have mounted the same database.
Array offsets in C start at 0, whereas a minimum index of 1 is preferred in SQL, e.g., the value of
the pseudo-column ROWNUM starts at 1. This is why you see that 1 is added to some columns of
X$ tables in the definition of the GV$ view.
Note that V$OBJECT USAGE is the only view—a true view with metadata in DBA VIEWS—that
is not based on a GV$ view. Instead, it retrieves information from data dictionary tables in
tablespace SYSTEM owned by user SYS. Since it violates the aforementioned rules for V$ views,
it doesn’t deserve the prefix V$. Apparently no knight armored by a valid Oracle customer
service number dares to dispute the undeserved prefix by opening a service request.
Many X$ table names follow a strict naming convention, where the first few letters represent a layer or module in the ORACLE kernel. For example KC means Kernel Cache and KT
Kernel Transaction. Table 9-1 has some more abbreviations used by X$ tables and their presumed
meaning.
CHAPTER 9 ■ INTRODUCTION TO X$ FIXED TABLES
V$ Public Synonyms
SYS.V_$ Views
SYS.V$ Fixed Views
GV$ Public Synonyms
SYS.GV_$ Views
SYS.GV$ Fixed Views
SYS.X$ Fixed Tables
Figure 9-1. Layered architecture of V$ fixed views, GV$ fixed views, and X$ fixed tables
Table 9-1. Abbreviations Used in X$ Fixed Table Names
Abbreviation
Presumed Meaning
K
Kernel
KC
Kernel Cache
KCB
Kernel Cache Buffer
KCBW
Kernel Cache Buffer Wait
KCC
Kernel Cache Control file
KCCB
Kernel Cache Control file Backup
KCCCF
Kernel Cache Copy Flash recovery area
KCCDC
Kernel Cache Control file Data file Copy
KCP
Kernel Cache transPortable tablespace
KCR
Kernel Cache Redo
KCT
Kernel Cache insTance
KG
Kernel Generic
KGL
Kernel Generic Library cache
KGLJ
Kernel Generic Library cache Java
KS
Kernel Service
KSB
Kernel Service Background
KSM
Kernel Service Memory
KSU
Kernel Service User
KSUSE
Kernel Service User SEssion
KSUSECON
Kernel Service User SEssion COnnection
KSUSEH
Kernel Service User SEssion History
KT
Kernel Transaction
95
96
CHAPTER 9 ■ INTRODUCTION TO X$ FIXED TABLES
Table 9-1. Abbreviations Used in X$ Fixed Table Names (Continued)
Abbreviation
Presumed Meaning
KTU
Kernel Transaction Undo
KX
Kernel eXecution
KXS
Kernel eXecution Sql
Granting Access to X$ Tables and V$ Views
For users other than SYS, the role SELECT CATALOG ROLE is sufficient to access V$ views in SQL
statements and anonymous blocks. Since roles are disabled inside stored PL/SQL routines such as
packages, users who require access to V$ or GV$ views from PL/SQL must be granted SELECT on
the corresponding V_$ or GV_$ view directly, rather than indirectly through a role. Another
option is to grant the system privilege SELECT ANY DICTIONARY. The latter option should be used
with caution, since it grants access to unencrypted passwords in SYS.LINK$ in Oracle9i. If security is a concern, you might not even want to give access to password hashes in DBA USERS with
SELECT CATALOG ROLE, since an intruder might try to crack them.
Contrary to V$ fixed views and corresponding V_$ views, there are no X_$ views on X$ fixed
tables. Hence there is no quick way to grant access to X$ tables to users other than SYS. The
preferred way to implement this access is to mimic the approach taken with V$ views, by creating
true views on X$ tables owned by SYS and granting access to these. Following the naming
conventions, these views might be given the prefix X_$. Except for the prefix, this is precisely
the approach taken in the implementation of Statspack. The Statspack installation script creates
three such views. Their names and purpose are summarized in Table 9-2. Access to these views
is granted to user PERFSTAT only, but grants to other users are in order.
Table 9-2. Statspack Views on X$ Fixed Tables
Public Synonym
View
X$ Base Table
Associated
V$ Views
STATS$X$KSPPI
STATS$X $KSPPI
X$KSPPI (Parameter
names and descriptions)
V$PARAMETER
STATS$X $KSPPSV
X$KSPPSV (Parameter
values at system level)
V$SYSTEM PARAMETER
STATS$X $KCBFWAIT
X$KCBFWAIT (Wait time
and number of waits at
data file level)
n/a
V$PARAMETER2
V$SGA CURRENT RESIZE OPS
V$SGA RESIZE OPS
V$SYSTEM PARAMETER2
V$SYSTEM PARAMETER
STATS$X$KSPPSV
V$SYSTEM PARAMETER2
STATS$X$KCBFWAIT
CHAPTER 9 ■ INTRODUCTION TO X$ FIXED TABLES
Drilling Down from V$ Views to X$ Fixed Tables
The next few sections present an approach for drilling down from a V$ view—via its underlying
GV$ view—all the way to one or more X$ tables at the lowest level. This approach is suitable
whenever the information exposed by a V$ view is a limiting factor in a troubleshooting or
performance diagnosis effort. Although Oracle Corporation has exposed more and more information in X$ tables over the last few releases, it may occasionally be necessary to glean additional
information from X$ tables. Of course, due to their undocumented nature, X$ tables are subject
to change without notice.
Some articles endorsing X$ tables found on the Internet overlook that, in many cases,
equally useful information can be pulled from V$ views. As an example, instead of turning to
X$BH to detect contention and hot blocks, one might also access V$SEGMENT STATISTICS, which
was introduced in Oracle9i. Statspack reports at level 7 or higher include data captured from
V$SEGMENT STATISTICS. Personally, I have never needed information from X$ tables to resolve a
performance problem. Resolving hanging issues is a different matter, since the documented
view DBA BLOCKERS does not consider library cache pins. Under such circumstances, knowing
about X$KGLLK is truly advantageous.
Drilling Down from V$PARAMETER to the Underlying X$ Tables
The best known X$ tables are probably X$KSPPI and X$KSPPCV. This section shows how to uncover
both by drilling down from V$PARAMETER. Likely, any DBA has used or heard of one or the other
undocumented (or hidden) parameter. Undocumented parameters start with an underscore
character ( ). Such parameters are not found in V$PARAMETER. In Oracle10g, you may have
noticed double underscore parameters in the server parameter files of instances with enabled
automatic shared memory management (e.g., db cache size). The columns of the view
V$PARAMETER are as follows:
SQL> DESCRIBE v$parameter
Name
Null?
----------------------------------------- -------NUM
NAME
TYPE
VALUE
DISPLAY VALUE
ISDEFAULT
ISSES MODIFIABLE
ISSYS MODIFIABLE
ISINSTANCE MODIFIABLE
ISMODIFIED
ISADJUSTED
ISDEPRECATED
DESCRIPTION
UPDATE COMMENT
HASH
Type
------------NUMBER
VARCHAR2(80)
NUMBER
VARCHAR2(512)
VARCHAR2(512)
VARCHAR2(9)
VARCHAR2(5)
VARCHAR2(9)
VARCHAR2(5)
VARCHAR2(10)
VARCHAR2(5)
VARCHAR2(5)
VARCHAR2(255)
VARCHAR2(255)
NUMBER
97
98
CHAPTER 9 ■ INTRODUCTION TO X$ FIXED TABLES
In a few moments, you will learn how to generate a list of all undocumented parameters
along with the default value and a description for each. The documented V$ view V$FIXED
VIEW DEFINITION will serve as our starting point. This V$ view is the repository for all V$ and
GV$ views as well as X$ tables of an instance. Inquiring this view about V$PARAMETER gives
the following:
SQL> COLUMN view definition FORMAT a80 WORD WRAPPED
SQL> SELECT view definition FROM v$fixed view definition
WHERE view name='V$PARAMETER';
VIEW DEFINITION
---------------------------------------------------------------------------select NUM , NAME , TYPE , VALUE , DISPLAY VALUE, ISDEFAULT ,
ISSES MODIFIABLE, ISSYS MODIFIABLE , ISINSTANCE MODIFIABLE, ISMODIFIED,
ISADJUSTED, ISDEPRECATED, DESCRIPTION, UPDATE COMMENT, HASH
from GV$PARAMETER where inst id = USERENV('Instance')
We learn that V$PARAMETER is based on GV$PARAMETER and that the former removes the crossinstance information found in the latter by filtering rows from other instances. No surprises here.
Except for the additional column INST ID, GV$PARAMETER has the same structure as V$PARAMETER.
SQL> DESC gv$parameter
Name
Null?
----------------------------------------- -------INST ID
NUM
NAME
TYPE
VALUE
DISPLAY VALUE
ISDEFAULT
ISSES MODIFIABLE
ISSYS MODIFIABLE
ISINSTANCE MODIFIABLE
ISMODIFIED
ISADJUSTED
ISDEPRECATED
DESCRIPTION
UPDATE COMMENT
HASH
Type
------------NUMBER
NUMBER
VARCHAR2(80)
NUMBER
VARCHAR2(512)
VARCHAR2(512)
VARCHAR2(9)
VARCHAR2(5)
VARCHAR2(9)
VARCHAR2(5)
VARCHAR2(10)
VARCHAR2(5)
VARCHAR2(5)
VARCHAR2(255)
VARCHAR2(255)
NUMBER
Further tapping V$FIXED VIEW DEFINITION gives this:
SQL> SELECT view definition FROM v$fixed view definition
WHERE view name='GV$PARAMETER';
VIEW DEFINITION
CHAPTER 9 ■ INTRODUCTION TO X$ FIXED TABLES
---------------------------------------------------------------------------select x.inst id,x.indx+1,ksppinm,ksppity,ksppstvl, ksppstdvl, ksppstdf,
decode(bitand(ksppiflg/256,1),1,'TRUE','FALSE'),
decode(bitand(ksppiflg/65536,3),1,'IMMEDIATE',2,'DEFERRED',
3,'IMMEDIATE','FALSE'), decode(bitand(ksppiflg,4),4,'FALSE',
decode(bitand(ksppiflg/65536,3), 0, 'FALSE', 'TRUE')),
decode(bitand(ksppstvf,7),1,'MODIFIED',4,'SYSTEM MOD','FALSE'),
decode(bitand(ksppstvf,2),2,'TRUE','FALSE'),
decode(bitand(ksppilrmflg/64, 1), 1, 'TRUE', 'FALSE'), ksppdesc,
ksppstcmnt, ksppihash
from x$ksppi x, x$ksppcv y where (x.indx = y.indx) and
((translate(ksppinm,' ','#') not like '##%') and
((translate(ksppinm,' ','#') not like '#%') or (ksppstdf = 'FALSE')
or (bitand(ksppstvf,5) > 0)))
The well-known X$ tables X$KSPPI and X$PSPPCV have come to the fore. Looking at the
where-clause, it is very obvious that parameters that start with one or two underscores are
hidden from the prying eyes of DBAs thirsty for knowledge.
Next, we need to make sense out of the cryptic column names. Since the column sequence
of GV$PARAMETER is known, we can add the well-understandable column names of GV$PARAMETER
as column aliases to the view definition, by traversing the select-list of the view from top to
bottom. This yields the following:
select x.inst id AS inst id,
x.indx+1 AS num, /* C language arrays start at offset 0,
but SQL stuff usually starts at offset 1*/
ksppinm AS name,
ksppity AS type,
ksppstvl AS value,
ksppstdvl AS display value,
ksppstdf AS isdefault,
decode(bitand(ksppiflg/256,1),1,'TRUE','FALSE') AS isses modifiable,
decode(bitand(ksppiflg/65536,3),1,'IMMEDIATE',2,'DEFERRED',
3,'IMMEDIATE','FALSE') AS issys modifiable,
decode(bitand(ksppiflg,4),4,'FALSE',
decode(bitand(ksppiflg/65536,3), 0, 'FALSE', 'TRUE')) AS isinstance modifiable,
decode(bitand(ksppstvf,7),1,'MODIFIED',4,'SYSTEM MOD','FALSE') AS ismodified,
decode(bitand(ksppstvf,2),2,'TRUE','FALSE') AS isadjusted,
decode(bitand(ksppilrmflg/64, 1), 1, 'TRUE', 'FALSE') AS isdeprecated,
ksppdesc AS description,
ksppstcmnt AS update comment,
ksppihash AS hash
99
100
CHAPTER 9 ■ INTRODUCTION TO X$ FIXED TABLES
from x$ksppi x, x$ksppcv y
where (x.indx = y.indx)
and (
(translate(ksppinm,' ','#') not like '##%')
and
(
(translate(ksppinm,' ','#') not like '#%')
or (ksppstdf = 'FALSE') or (bitand(ksppstvf,5) > 0)
)
);
Thus, we obtain a mapping from the cryptic column names of X$KSPPI and X$KSPPCV to the
well-understandable column names of the view GV$PARAMETER. The mappings for X$KSPPI and
X$KSPPCV are in Table 9-3 and Table 9-4 respectively.
Table 9-3. Columns of X$KSPPI
X$ Table Column
GV$ View Column
X$KSPPI.ADDR
n/a
X$KSPPI.INDX
GV$PARAMETER.NUM
X$KSPPI.INST ID
GV$PARAMETER.INST ID
X$KSPPI.KSPPINM
GV$PARAMETER.NAME
X$KSPPI.KSPPITY
GV$PARAMETER.TYPE
X$KSPPI.KSPPDESC
GV$PARAMETER.DESCRIPTION
X$KSPPI.KSPPIFLG
GV$PARAMETER.ISSES MODIFIABLE,
GV$PARAMETER.ISSYS MODIFIABLE,
GV$PARAMETER.ISINSTANCE MODIFIABLE
X$KSPPI.KSPPILRMFLG
GV$PARAMETER.ISDEPRECATED
X$KSPPI.KSPPIHASH
GV$PARAMETER.HASH
The column X$KSPPI.KSPPIFLG is a flag that is expanded to three separate columns in
GV$PARAMETER using BITAND and DECODE.
Table 9-4. Columns of X$KSPPCV
X$ Table Column
GV$ View Column
X$KSPPCV.ADDR
n/a
X$KSPPCV.INDX
GV$PARAMETER.NUM
X$KSPPCV.INST ID
GV$PARAMETER.INST ID
X$KSPPCV.KSPPSTVL
GV$PARAMETER.VALUE
X$KSPPCV.KSPPSTDVL
GV$PARAMETER.DISPLAY VALUE
CHAPTER 9 ■ INTRODUCTION TO X$ FIXED TABLES
Table 9-4. Columns of X$KSPPCV
X$ Table Column
GV$ View Column
X$KSPPCV.KSPPSTDF
GV$PARAMETER.ISDEFAULT
X$KSPPCV.KSPPSTVF
GV$PARAMETER.ISMODIFIED
X$KSPPCV.KSPPSTCMNT
GV$PARAMETER.UPDATE COMMENT
To retrieve undocumented parameters, we need to modify the where-clause in such a way
that only underscore parameters satisfy the predicates. We substitute LIKE ESCAPE in place of
the awkward TRANSLATE used in the original GV$ view definition (file hidden parameters.sql).
SELECT ksppinm name,
ksppstvl value,
ksppdesc description
FROM x$ksppi x, x$ksppcv y
WHERE (x.indx = y.indx)
AND x.inst id=userenv('instance')
AND x.inst id=y.inst id
AND ksppinm LIKE '\ %' ESCAPE '\'
ORDER BY name;
Running this query on Oracle10g Release 2 retrieves an impressive amount of 1124 undocumented parameters (540 in Oracle9i Release 2 and 1627 in Oracle11g). A small excerpt of the
query result, which includes the double underscore parameters introduced in Oracle10g, follows:
NAME
VALUE
DESCRIPTION
------------------------- ------------ ------------------------------------4031 dump bitvec
67194879
bitvec to specify dumps prior to 4031
error
…
PX use large pool
FALSE
Use Large Pool as source of PX buffers
db cache size
473956352
Actual size of DEFAULT buffer pool for
standard block size buffers
dg broker service names orcl XPT
service names for broker use
java pool size
4194304
Actual size in bytes of java pool
large pool size
4194304
Actual size in bytes of large pool
shared pool size
117440512
Actual size in bytes of shared pool
streams pool size
4194304
Actual size in bytes of streams pool
abort recovery on join FALSE
if TRUE, abort recovery on join
reconfigurations
…
yield check interval
100000
interval to check whether actses
should yield
1124 rows selected.
101
102
CHAPTER 9 ■ INTRODUCTION TO X$ FIXED TABLES
Relationships Between X$ Tables and V$ Views
Wouldn’t it be convenient to consult a document that contains the underlying X$ fixed table
for any V$ view and vice versa? Such a document would facilitate the drill-down process presented
in the previous section. Such a document may be generated automatically. The following four
facts are paramount to coding the generation of such a document:
• The column PREV HASH VALUE of the dynamic performance view V$SESSION holds the
hash value of the previous SQL statement executed by a database session.
• The execution plan for a SQL statement, which is identified by its hash value, is available
in V$SQL PLAN.
• A cached execution plan contains the names of the objects accessed in V$SQL PLAN.
OBJECT NAME.
• All row sources pertaining to X$ tables contain the string “FIXED TABLE”.
Based on this information, an algorithm that executes SELECT statements on V$ views and
pulls the names of underlying X$ tables from V$SQL PLAN may be devised. The final task is to
store the associations between V$ views and X$ tables found in a table, such as this:
CREATE TABLE x v assoc (
x id number,
v id number);
Instead of storing the fixed table or view names, the table X V ASSOC saves their object
identifiers. The names may be retrieved by joining either column with V$FIXED TABLE.
OBJECT ID.
Following is the pseudo-code of the aforementioned algorithm:
LOOP over all V$ view names and their OBJECT ID in V$FIXED TABLE
Parse and execute SELECT * FROM view name using dynamic SQL (DBMS SQL)
Get the value of V$SESSION.PREV HASH VALUE for the current session
LOOP over all object names from the execution plan for the previous SQL
statement in V$SQL PLAN,
considering only row sources that contain the string "FIXED TABLE"
Translate the object name to the fixed table number
V$FIXED TABLE.OBJECT ID
Insert the object identifier of the X$ table and the object identifier
of the associated V$ view
into the table X V ASSOC
END LOOP
END LOOP
The full source code is too long to reproduce here, but it is available in the file x v assoc.sql in
the source code depot. Once the table X V ASSOC is populated, the following query retrieves X$
fixed tables and the V$ views that are based on them:
CHAPTER 9 ■ INTRODUCTION TO X$ FIXED TABLES
SQL> SELECT
f1.name x name,
f2.name v name
FROM x v assoc a, v$fixed table f1, v$fixed table f2
WHERE a.x id=f1.object id
AND a.v id=f2.object id
ORDER BY x name;
In Oracle10g Release 2, 727 rows are returned by this query. To get a list of X$ fixed tables
underlying V$ views, run the following query:
SQL> SELECT
f1.name v name,
f2.name x name
FROM x v assoc a, v$fixed table f1, v$fixed table f2
WHERE a.v id=f1.object id
AND a.x id=f2.object id;
Table 9-5 depicts a tiny subset of the associations returned by the latter query.
Table 9-5. Some V$ Views and Their Underlying X$ Fixed Tables
V$ View
Underlying X$ table
V$ARCHIVED LOG
X$KCCAL
V$BH
X$BH
V$CONTROLFILE
X$KCCCF
V$DATAFILE
X$KCCFN
V$DB PIPES
X$KGLOB
V$INSTANCE
X$KVIT, X$QUIESCE, X$KSUXSINST
V$LATCH
X$KSLLD
V$MYSTAT
X$KSUSGIF, X$KSUMYSTA
V$PROCESS
X$KSUPR
V$RECOVER FILE
X$KCVFHMRR
V$SEGMENT STATISTICS
X$KTSSO, X$KSOLSFTS
V$SESSION
X$KSLED, X$KSUSE
V$SQL
X$KGLCURSOR CHILD
V$SQL BIND DATA
X$KXSBD
V$SQL PLAN
X$KQLFXPL
By spooling the output of the script x v assoc.sql, a text document that contains all the
associations between V$ views and X$ tables may be generated.
103
104
CHAPTER 9 ■ INTRODUCTION TO X$ FIXED TABLES
Source Code Depot
Table 9-6 lists this chapter’s source files and their functionality.
Table 9-6. X$ Fixed Tables Source Code Depot
File Name
Functionality
hidden parameters.sql
Lists hidden parameters from X$KSPPI and X$KSPPCV
x v assoc.sql
Generates associations between V$ fixed views and X$ fixed tables
CHAPTER 10
■■■
X$BH and Latch Contention
T
he X$ fixed table X$BH is partially documented in Oracle Database Performance Tuning Guide
10g Release 2. This chapter provides additional information on X$BH. Contrary to what you might
expect, I will show that scrutinizing undocumented X$ tables such as X$BH is not necessarily the
royal road to optimum system performance.
A latch is a low-level locking mechanism used by the ORACLE DBMS to protect memory
structures. The wait event latch free is used to account for the wait time a process incurs when
it attempts to get a latch, and the latch is unavailable on the first attempt. In Oracle10g, there
are several dedicated latch-related wait events for latches, which are usually affected by contention. For those events, the name of the latch appears in the name of the wait event. The wait
events latch: library cache or latch: cache buffers chains may serve as examples. The additional
wait events in Oracle10g dispel the requirement to find out which latch a generic latch free wait
pertains to. In Oracle9i, V$SESSION WAIT.P2 contains the latch number waited for. The latch
number corresponds to V$LATCH.LATCH#. By joining V$SESSION WAIT and V$LATCH, the latch
name in V$LATCH.NAME may be retrieved.
Cache buffers chains latches are used to protect a buffer list in the buffer cache. These latches
are used when searching for, adding, and removing a buffer from the buffer list. Contention on
this latch usually indicates that there is contention for the blocks protected by certain latches.
Contention may be detected by looking at the column MISSES of the fixed view V$LATCH CHILDREN.
The following query identifies child latches with the highest miss count:
SQL> SELECT name, addr, latch#, child#, misses, sleeps
FROM v$latch children
WHERE misses > 10000
ORDER BY misses;
NAME
ADDR
LATCH# CHILD# MISSES SLEEPS
-------------------- -------- ------ ------ ------ -----cache buffers chains 697ACFD8
122
190 11909
125
session idle bit
699B2E6C
7
2 13442
75
library cache pin
68BC3C90
216
2 30764
79
library cache
68BC3B58
214
2 178658
288
The cache buffers chains child latch with address 697ACFD8 is among the latches with the
highest miss count. Counters in V$LATCH CHILDREN are since instance startup. I chose the predicate MISSES > 10000 simply because this restricted the result of the query to four rows. Of course,
you should look at figures from an interval where a performance problem was observed, not
figures since instance startup.
105
106
CHAPTER 10 ■ X$BH AND LATCH CONTENTION
Applying the procedure for making sense out of cryptic X$ column names from the previous
chapter yields the column mapping between GV$BH and X$BH shown in Table 10-1. Note that
only a subset of X$BH columns, which includes all columns of GV$BH, is shown. Approximately
half of the columns in X$BH are not externalized by GV$BH. Table 10-1 contains n/a under the
heading “GV$BH Column Name” for columns not externalized by GV$BH.
Table 10-1. Mapping of GV$BH Columns to X$BH
X$BH Column Name GV$BH Column Name
Meaning
ADDR
n/a
Buffer header address
INDX
n/a
Buffer header index (0-n)
INST ID
INST ID
Instance number, corresponds to
GV$INSTANCE.INSTANCE NUMBER
HLADDR
n/a
Child latch address
BLSIZ
n/a
Block size
FLAG
DIRTY, TEMP, PING, STALE, DIRECT
Flag for the type and status of
the block
TS#
TS#
Tablespace number; corresponds to
V$TABLESPACE.TS#
FILE#
FILE#
File number; corresponds to
V$DATAFILE.FILE#
DBARFIL
n/a
Relative file number
DBABLK
BLOCK#
Block number from data block
address
CLASS
CLASS#
Class number
STATE
STATUS
Block status (free, xcur, cr, etc.)
LE ADDR
LOCK ELEMENT ADDR
Lock element address
OBJ
OBJD
Dictionary object number of the
segment that contains the object
CR SCN BAS
n/a
Consistent read SCN base
CR SCN WRP
n/a
Consistent read SCN wrap
TCH
n/a
touch count
TIM
n/a
touch time
CHAPTER 10 ■ X$BH AND LATCH CONTENTION
By joining X$BH and DBA OBJECTS, we may find out to which database objects the blocks
protected by the child latch belong (script latch vs blocks.sql).
SQL> SELECT bh.file#, bh.dbablk, bh.class,
decode(bh.state,0,'free',1,'xcur',2,'scur',3,'cr',
4,'read',5,'mrec',6,'irec',7,'write',8,'pi',
9,'memory',10,'mwrite',11,'donated') AS status,
decode(bitand(bh.flag,1), 0, 'N', 'Y') AS dirty, bh.tch,
o.owner, o.object name, o.object type
FROM x$bh bh, dba objects o
WHERE bh.obj=o.data object id
AND bh.hladdr='697ACFD8'
ORDER BY tch DESC;
FILE# DBABLK CLASS STATUS DIRTY TCH OWNER OBJECT NAME
----- ------ ----- ------ ----- --- ------ ----------------4
476
1 xcur N
24 NDEBES SYS IOT TOP 53076
1 48050
1 xcur N
4 SYS
SYS C00651
1
7843
1 xcur N
2 SYS
ICOL$
1
7843
1 xcur N
2 SYS
IND$
1
7843
1 xcur N
2 SYS
COL$
1
7843
1 xcur N
2 SYS
CLU$
1
7843
1 xcur N
2 SYS
C OBJ#
OBJECT TYPE
----------INDEX
INDEX
TABLE
TABLE
TABLE
TABLE
CLUSTER
The index SYS_IOT_TOP_53076 has the highest touch count among all objects protected
by the child latch.1 This index is the segment underlying an index organized table.
SQL> SELECT table name, table type
FROM dba indexes
WHERE index name='SYS IOT TOP 53076';
TABLE NAME
TABLE TYPE
------------------------------ ----------CUSTOMER
TABLE
The big question is whether this whole investigation deserves the DBA’s time and energy
or whether he is about to become afflicted by CTD. Compulsive tuning disorder (CTD) is a term
coined by Gaja Krishna Vaidyanatha, co-author of Oracle Performance Tuning 101 ([VaDe 2001]).
Of course, CTD is a pun on the designation of the serious psychiatric disorder OCD (obsessive
compulsive disorder). Looking at a Statspack report that quantifies the workload that caused
the latch contention, it turns out that the wait event latch: cache buffers chains is insignificant.
I modified the Statspack report parameter top n events to include the top ten timed events in
the report (default is five). In spite of this, the cache buffers chains wait event did not make the cut.
1. It’s also possible to join X$BH with DBA EXTENTS by using the where-clause WHERE bh.file#=e.file id AND
bh.dbablk BETWEEN e.block id AND e.block id+e.blocks 1.However, the response time of such a
query is quite long.
107
108
CHAPTER 10 ■ X$BH AND LATCH CONTENTION
Snapshot
Snap Id
Snap Time
Sessions Curs/Sess Comment
~~~~~~~~
---------- ------------------ -------- --------- ------------------Begin Snap:
662 15-Oct-07 20:20:20
16
5.1 contention customer
End Snap:
663 15-Oct-07 20:24:30
17
6.1 contention customer
Elapsed:
4.17 (mins)
…
Top 10 Timed Events
Avg %Total
~~~~~~~~~~~~~~~~~~
wait
Call
Event
Waits
Time (s)
(ms)
Time
----------------------------------------- ------------ ----------- ------ -----db file sequential read
3,109
89
29
39.7
CPU time
59
26.3
control file sequential read
532
23
43
10.1
db file parallel write
191
17
87
7.4
db file scattered read
580
9
15
3.8
buffer busy waits
12,530
7
1
3.2
read by other session
420
4
8
1.6
os thread startup
5
3
597
1.3
log file sync
59
3
48
1.3
SQL*Net more data to client
88,397
3
0
1.2
…
Wait Events DB/Inst: TEN/ten Snaps: 662-663
…
Avg
%Time Total Wait
wait
Waits
Event
Waits -outs Time (s)
(ms)
/txn
--------------------------------- ------------ ------ ---------- ------ -------db file sequential read
3,109
0
89
29
97.2
…
latch: cache buffers chains
53
0
0
5
1.7
…
In the five-minute interval covered by the Statspack report, a meager 165 ms were spent
waiting for the cache buffers chains latch. What about individual sessions that are affected by
this wait event? These are identified by querying V$SESSION EVENT.
SQL> SELECT s.sid, s.serial#, p.spid, e.event, e.total waits, e.time waited
FROM v$session s, v$process p, v$session event e
WHERE s.paddr=p.addr
AND s.sid=e.sid
AND s.type='USER'
AND e.event LIKE 'latch%'
ORDER BY time waited;
SID SERIAL# SPID EVENT
TOTAL WAITS TIME WAITED
--- ------- ---- --------------------------- ----------- ----------27
769 4400 latch: cache buffers chains
49
0
27
769 4400 latch: library cache pin
3
0
32
1921 5232 latch: library cache pin
29
0
CHAPTER 10 ■ X$BH AND LATCH CONTENTION
32
1921 5232 latch: cache buffers chains
129
32
1921 5232 latch: library cache
86
27
769 4400 latch: library cache
35
SQL> ORADEBUG SETOSPID 5232
Oracle pid: 18, Windows thread id: 5232, image: ORACLE.EXE (SHAD)
SQL> ORADEBUG EVENT 10046 trace name context forever, level 8
Statement processed.
SQL> ORADEBUG TRACEFILE NAME
c:\programs\oracle\product\admin\ten\udump\ten ora 5232.trc
1
1
1
Since session (SID) 32 with operating system process identifier (SPID) 5332 had the highest
amount of waits for latch: cache buffers chains, I decided to trace this session at level 8 using
ORADEBUG (see Chapter 37). Then I generated a resource profile from the trace file using the free
extended SQL trace profiler ESQLTRCPROF, included with this book.
C:> esqltrcprof.pl ten ora 5232.trc
ORACLE version 10.2 trace file. Timings are in microseconds (1/1000000 sec)
Resource Profile
================
Response time: 43.387s; max(tim)-min(tim): 58.216s
Total wait time: 26.738s
---------------------------Note: 'SQL*Net message from client' waits for more than 0.005s
are considered think time
Wait events and CPU usage:
Duration
Pct
Count
Average Wait Event/CPU Usage/Think Time
-------- ------ ------------ ---------- ----------------------------------22.439s 51.72%
188693 0.000119s SQL*Net message from client
16.250s 37.45%
188741 0.000086s total CPU
1.124s
2.59%
2 0.561751s log file switch (checkpoint incomplete)
0.774s
1.78%
3 0.257905s think time
0.642s
1.48%
362 0.001774s enq: TX - index contention
0.625s
1.44%
4543 0.000138s buffer busy waits
0.569s
1.31%
188696 0.000003s SQL*Net message to client
0.399s
0.92%
unknown
0.329s
0.76%
4 0.082256s log file switch completion
0.170s
0.39%
5 0.034085s log buffer space
0.057s
0.13%
4 0.014250s db file sequential read
0.005s
0.01%
1 0.004899s log file sync
0.002s
0.00%
20 0.000093s latch: library cache
0.001s
0.00%
13 0.000056s latch: cache buffers chains
0.001s
0.00%
10 0.000054s latch free
0.000s
0.00%
28 0.000015s buffer deadlock
0.000s
0.00%
10 0.000022s latch: cache buffers lru chain
0.000s
0.00%
1 0.000101s latch: library cache pin
--------- ------- -----------------------------------------------------------
109
110
CHAPTER 10 ■ X$BH AND LATCH CONTENTION
43.387s 100.00% Total response time
Total number of roundtrips (SQL*Net message from/to client): 188696
CPU usage breakdown
-----------------------parse CPU:
0.00s (12 PARSE calls)
exec CPU:
16.25s (188707 EXEC calls)
fetch CPU:
0.00s (22 FETCH calls)
…
The most expensive SQL statement was an INSERT statement on the table CUSTOMER identified earlier by accessing X$BH.
Statements Sorted by Elapsed Time (including recursive resource utilization)
==============================================================================
Hash Value: 1256130531 - Total Elapsed Time (excluding think time): 42.608s
INSERT INTO customer(id, name, phone)
VALUES (customer id seq.nextval, :name, :phone)
RETURNING id INTO :id
DB Call
Count
Elapsed
CPU
Disk
Query Current
Rows
------- -------- ---------- ---------- -------- -------- -------- -------PARSE
0
0.0000s
0.0000s
0
0
0
0
EXEC
188695
19.6005s
16.2344s
4
4241
581188
188695
FETCH
0
0.0000s
0.0000s
0
0
0
0
------- -------- ---------- ---------- -------- -------- -------- -------Total
188695
19.6005s
16.2344s
4
4241
581188
188695
Wait Event/CPU Usage/Think Time
Duration
Count
---------------------------------------- ---------- -------SQL*Net message from client
22.439s 188692
total CPU
16.234s 188695
think time
0.774s
3
enq: TX - index contention
0.642s
362
buffer busy waits
0.625s
4543
SQL*Net message to client
0.569s 188694
log file switch completion
0.329s
4
log buffer space
0.170s
5
db file sequential read
0.023s
2
latch: library cache
0.002s
20
latch: cache buffers chains
0.001s
13
latch free
0.001s
10
buffer deadlock
0.000s
28
latch: cache buffers lru chain
0.000s
10
latch: library cache pin
0.000s
1
CHAPTER 10 ■ X$BH AND LATCH CONTENTION
Again, the contribution of the wait event latch: cache buffers chains is negligible. The problem
is elsewhere: 51% of the response time is due to SQL*Net message from client, the most significant contributor to overall response time. Look at the columns “Count” and “Rows” of the
statistics for the INSERT statement. The figures are identical. Furthermore the number of executions is almost identical to the number of network round-trips (SQL*Net message from client).
This means that the application does single row inserts, i.e., one network round-trip per INSERT
statement executed. This observation is confirmed by looking at the row count r of EXEC entries
in the trace file. These consistently have the value 1.
WAIT #3: nam='SQL*Net message to client' ela= 3 driver id=1413697536
#bytes=1 p3=0 obj#=-1 tim=347983945867
EXEC #3:c=0,e=79,p=0,cr=0,cu=2,mis=0,r=1,dep=0,og=1,tim=347983945892
WAIT #3: nam='SQL*Net message from client' ela= 109 driver id=1413697536
#bytes=1 p3=0 obj#=-1 tim=347983946105
A major reduction in response time could be achieved by recoding the application to use
array inserts. I caution you against following advice on the Internet to search for latch contention by accessing X$ tables. Such advice may lead you on the road to compulsive tuning disorder,
rather than paving the way to a solution for performance problems. Always use a response timebased approach. Never drill down to some perceived anomalies or ratios that you deem to be
too high unless you have convincing evidence that these contribute significantly to response
time. Several years ago, I had the opportunity to talk to the renowned neuroscientist Jaak Panksepp. To this date, I remember him asking, “What data do you have in support of this claim?”
More often than not, unsubstantiated assertions may be dismissed by asking this question.
Instead of prematurely attributing symptoms to causes and haphazardly implementing hypothetical solutions, we as DBAs would be better advised to adopt scientific approaches
resembling those used by the medical community.
Source Code Depot
Table 10-2 lists this chapter’s source files and their functionality.
Table 10-2. X$BH Source Code Depot
File Name
Functionality
latch vs blocks.sql
Retrieves database objects protected by a child latch.
111
CHAPTER 11
■■■
X$KSLED and Enhanced
Session Wait Data
T
he X$ fixed table X$KSLED is an undocumented X$ fixed table. The dynamic performance view
V$SESSION WAIT is based on X$KSLED and X$KSUSECST. X$KSLED contributes the wait event name
to V$SESSION WAIT whereas X$KSUSECST holds timing information. Neither Oracle9i nor Oracle10g
have a V$ view that provides more than centisecond resolution for a single wait event at session
level.1 There is also no view that integrates information on operating system processes found
in V$PROCESS with wait information. It’s a bit cumbersome to correctly interpret the columns
WAIT TIME and SECONDS IN WAIT of the view V$SESSION WAIT depending on the value of the
column STATE.
Direct access to X$ fixed tables makes it possible to get microsecond resolution for wait
events without the need to enable SQL trace. Furthermore, an enhanced version of V$SESSION
WAIT, which combines information from V$SESSION, V$SESSION WAIT and V$PROCESS and is easier to
interpret, may be built. Note that V$SESSION in Oracle9i does not include information on wait
events, whereas wait event information has been incorporated into V$SESSION in Oracle10g.
The enhanced session wait view presented in this chapter is compatible with Oracle9i, Oracle10g,
and Oracle11g.
Drilling Down from V$SESSION_WAIT
By drilling down from V$SESSION WAIT, as presented in Chapter 9, it becomes apparent that this
view is based on X$KSLED and X$KSUSECST. Adding column alias names, which correspond to the
column names of GV$SESSION, to the view definition retrieved from V$FIXED VIEW DEFINITION,
yields the following SELECT statement:
SELECT s.inst id AS inst id,
s.indx AS sid,
s.ksussseq AS seq#,
e.kslednam AS event,
e.ksledp1 AS p1text,
s.ksussp1 AS p1,
s.ksussp1r AS p1raw,
1. In Oracle11g, the dynamic performance view V$SESSION WAIT has the following new columns:
WAIT TIME MICRO, TIME REMAINING MICRO, and TIME SINCE LAST WAIT MICRO.
113
114
CHAPTER 11 ■ X$KSLED AND ENHANCED SESSION WAIT DATA
e.ksledp2 AS p2text,
s.ksussp2 AS p2,
s.ksussp2r AS p2raw,
e.ksledp3 AS p3text,
s.ksussp3 AS p3,
s.ksussp3r AS p3raw,
decode(s.ksusstim,0,0,-1,-1,-2,-2,
decode(round(s.ksusstim/10000),0,- 1,round(s.ksusstim/10000)))
AS wait time,
s.ksusewtm AS seconds in wait,
decode(s.ksusstim, 0, 'WAITING', -2, 'WAITED UNKNOWN TIME', -1,
'WAITED SHORT TIME', 'WAITED KNOWN TIME') AS state
FROM x$ksusecst s, x$ksled e
WHERE bitand(s.ksspaflg,1)!=0
and bitand(s.ksuseflg,1)!=0
and s.ksussseq!=0
and s.ksussopc=e.indx
Note how the microsecond resolution in X$KSUSECST is artificially reduced to centisecond
resolution through the division by 10000. At reduced resolution, it is impossible to learn how
long short wait events such as db file sequential read, db file scattered read, or global cache related
wait events in Real Application Clusters (RAC) were. Wait times shorter than 1 centisecond are
displayed as -1 by V$SESSION WAIT. At this resolution, it is impossible to see disk access times at
session level. Peaks in I/O service time also remain unnoticed, as long as the duration of the
wait events stays below 1 centisecond, which it normally will. Following is an example:
SQL> SELECT event, wait time, seconds in wait, state
FROM v$session wait
WHERE (state='WAITED KNOWN TIME' or state='WAITED SHORT TIME')
AND event !='null event';
EVENT
WAIT TIME SECONDS IN WAIT STATE
--------------------------- --------- --------------- ----------------db file sequential read
-1
0 WAITED KNOWN TIME
SQL*Net message from client
-1
0 WAITED KNOWN TIME
SQL*Net message to client
-1
0 WAITED KNOWN TIME
An Improved View
Now that the restrictions of V$SESSION WAIT have become apparent, we may set goals for an
improved view. The goals are to
• Provide wait event duration at microsecond resolution
• Integrate process, session, and session wait information
• Present the wait status and wait time in a readily accessible format without requiring
further decoding by users of the view
CHAPTER 11 ■ X$KSLED AND ENHANCED SESSION WAIT DATA
Information on processes and sessions is available from the X$ tables underlying V$PROCESS
and V$SESSION. These are X$KSUSE and X$KSUPR respectively. It requires some perseverance to
construct the largish view that meets the goals set above. Following is the DDL to create the
view, which I have called X $SESSION WAIT (script file name x session wait.sql):
CREATE OR REPLACE view x $session wait AS
SELECT s.inst id AS inst id,
s.indx AS sid,
se.ksuseser AS serial#,
-- spid from v$process
p.ksuprpid AS spid,
-- columns from v$session
se.ksuudlna AS username,
decode(bitand(se.ksuseidl,11),1,'ACTIVE',0,
decode(bitand(se.ksuseflg,4096),0,'INACTIVE','CACHED'),2,'SNIPED',3,
'SNIPED', 'KILLED') AS status,
decode(ksspatyp,1,'DEDICATED',2,'SHARED',3,'PSEUDO','NONE') AS server,
se.ksuseunm AS osuser,
se.ksusepid AS process,
se.ksusemnm AS machine,
se.ksusetid AS terminal,
se.ksusepnm AS program,
decode(bitand(se.ksuseflg,19),17,'BACKGROUND',1,'USER',2,'RECURSIVE','?') AS type,
se.ksusesqh AS sql hash value,
se.ksusepha AS prev hash value,
se.ksuseapp AS module,
se.ksuseact AS action,
se.ksuseclid AS client identifier,
se.ksuseobj AS row wait obj#,
se.ksusefil AS row wait file#,
se.ksuseblk AS row wait block#,
se.ksuseslt AS row wait row#,
se.ksuseltm AS logon time,
se.ksusegrp AS resource consumer group,
-- columns from v$session wait
s.ksussseq AS seq#,
e.kslednam AS event,
e.ksledp1 AS p1text,
s.ksussp1 AS p1,
s.ksussp1r AS p1raw,
e.ksledp2 AS p2text,
s.ksussp2 AS p2,
s.ksussp2r AS p2raw,
e.ksledp3 AS p3text,
s.ksussp3 AS p3,
s.ksussp3r AS p3raw,
115
116
CHAPTER 11 ■ X$KSLED AND ENHANCED SESSION WAIT DATA
-- improved timing information from x$ksusecst
decode(s.ksusstim,
-2, 'WAITED UNKNOWN TIME',
-1,'LAST WAIT < 1 microsecond', -- originally WAITED SHORT TIME
0,'CURRENTLY WAITING SINCE '|| s.ksusewtm || 's',
'LAST WAIT ' || s.ksusstim/1000 || ' milliseconds (' ||
s.ksusewtm || 's ago)') wait status,
to number(decode(s.ksusstim,0,NULL,-1,NULL,-2,NULL, s.ksusstim/1000))
AS wait time milli
from x$ksusecst s, x$ksled e , x$ksuse se, x$ksupr p
where bitand(s.ksspaflg,1)!=0
and bitand(s.ksuseflg,1)!=0
and s.ksussseq!=0
and s.ksussopc=e.indx
and s.indx=se.indx
and se.ksusepro=p.addr;
The unit of wait time in the column WAIT TIME MILLI is 1 millisecond. Fractional milliseconds are preserved, yielding microsecond resolution. The column WAIT STATUS indicates whether
the session is currently waiting or not. In case the session is waiting, it displays for how many
seconds it has already been waiting. For sessions that are not currently waiting, the duration of
the last wait event and the time that has elapsed since the last wait event began, are reported.
The following query confirms that wait events are actually timed with microsecond resolution:
SQL> SELECT sid, serial#, spid, username, event, wait status, wait time milli
FROM x $session wait
WHERE wait time milli > 0 and wait time milli <10;
SID SERIAL# SPID
USERNAME EVENT
WAIT STATUS
WAIT TIME MILLI
--- ------- ------- -------- --------------- ------------------ --------------24 58259 1188090 SYS
db file
LAST WAIT 6.541 ms
6.541
sequential read (0 s ago)
22 48683 966786 SYS
SQL*Net message LAST WAIT .003 ms
.003
to client
(0 s ago)
I have included the operating system process identifier SPID, represented by the column
KSUPRPID from the X$ fixed table X$KSUPR, in the view. This column corresponds to V$PROCESS.
SPID. It is useful for enabling SQL trace with ORADEBUG SETOSPID and ORADEBUG EVENT (see Chapter 37).
In a scenario where operating system tools, such as top, prstat, or nmon are used to identify
resource intensive processes (e.g., high I/O wait percentage), the column SPID provides instant
access to wait information based on the process identifier displayed by these tools. Figure 11-1
shows the result of a query on the view CV $SESSION WAIT that includes both the SPID and detailed
information on the last wait event.
CHAPTER 11 ■ X$KSLED AND ENHANCED SESSION WAIT DATA
Figure 11-1. Result of a query on the enhanced session wait view CV_$SESSION_WAIT
Users other than SYS may be given access to X_$SESSION_WAIT by granting the SELECT privilege on the view and creating a public synonym. Below are some DDL statements that mimic the hierarchical approach of the built-in V$ and GV$ views. Additional database objects are created in schema SITE_SYS, to avoid cluttering the data dictionary with site-specific objects. Privileges on the views in schema SITE_SYS are then granted to SELECT_CATALOG_ROLE. The letter C for custom in the view and synonym names is used to distinguish these views from the built-in dynamic performance views. Use the public synonyms CV$SESSION_WAIT and CGV$SESSION_WAIT to access enhanced versions of the views V_$SESSION_WAIT and GV_$SESSION_WAIT respectively.
SQL> GRANT SELECT ON x_$session_wait TO site_sys WITH GRANT OPTION;
Grant succeeded.
SQL> CREATE OR REPLACE VIEW site_sys.cgv_$session_wait AS
     SELECT * FROM sys.x_$session_wait;
View created.
SQL> CREATE OR REPLACE VIEW site_sys.cv_$session_wait AS
     SELECT * FROM sys.x_$session_wait WHERE inst_id=userenv('instance');
View created.
SQL> GRANT SELECT ON site_sys.cgv_$session_wait TO select_catalog_role;
Grant succeeded.
SQL> GRANT SELECT ON site_sys.cv_$session_wait TO select_catalog_role;
Grant succeeded.
SQL> CREATE OR REPLACE PUBLIC SYNONYM cgv$session_wait
     FOR site_sys.cgv_$session_wait;
Synonym created.
SQL> CREATE OR REPLACE PUBLIC SYNONYM cv$session_wait FOR site_sys.cv_$session_wait;
Synonym created.
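A user who has been granted SELECT_CATALOG_ROLE may then verify the setup through the public synonym; a quick sanity check (the user name appdba is hypothetical):

SQL> CONNECT appdba
Enter password:
Connected.
SQL> SELECT count(*) AS sessions FROM cv$session_wait;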
Source Code Depot
Table 11-1 lists this chapter’s source files and their functionality.
Table 11-1. X$KSLED Source Code Depot

File Name            Functionality
x_session_wait.sql   This script contains enhanced views for accessing session wait
                     events. In addition to columns from V$SESSION_WAIT, the views
                     include information from V$PROCESS and V$SESSION. Wait time has
                     microsecond resolution. The script includes grants and public
                     synonyms.
CHAPTER 12
■■■
X$KFFXP and ASM Metadata
The X$ fixed table X$KFFXP is undocumented in Oracle10g and Oracle11g. ASM metadata concerning mirroring and the assignment of ASM file extents to allocation units in ASM disks is available through X$KFFXP. An understanding of X$KFFXP enables a DBA to directly access database and server parameter files stored in ASM with operating system commands. Beyond educational purposes, an understanding of ASM file layout may prove useful for salvaging data or troubleshooting.
X$KFFXP
Each ASM file consists of one or more extents. Extents are striped across the disks of the disk group where the file resides. The size of an extent and the size of an ASM allocation unit (parameter _ASM_AUSIZE) are identical. Due to striping, a mapping table between contiguous ASM file extents and the noncontiguous storage of files in ASM disks must be maintained.
An ORACLE segment consists of one or more extents. Each extent consists of a contiguous set of blocks at a certain offset (block number) from the beginning of a data file. The locations and sizes of a segment's extents within a data file are available by querying the dictionary view DBA_EXTENTS. This applies to all the storage options for a data file, such as file system, raw device, or ASM file.
The X$ fixed table X$KFFXP holds the mapping between ASM file extents and allocation units within ASM disks. It keeps track of the position of striped and mirrored extents for each ASM file. You must be connected to an ASM instance to retrieve rows from this fixed table. Given a block number within a segment from DBA_EXTENTS, it is possible to find out which block of an ASM disk holds the data in that block.
Each file in an ASM disk group is identified by a file number and an incarnation. Both numbers are part of the file name in V$ASM_ALIAS. V$ASM_ALIAS is built on top of X$KFFIL, not X$KFFXP. In fact, there is no V$ fixed view based on X$KFFXP. ASMCMD is a command line utility that presents ASM disk groups as if they were file systems. ASMCMD displays the file number and incarnation when the command ls -l is used. The alias name spfileTEN.ora in the following example points to the ASM file with file number 265 and incarnation 632700769:
$ asmcmd
ASMCMD> cd DG/TEN
ASMCMD> ls -l spfileTEN.ora
Type Redund Striped Time Sys Name
                         N   spfileTEN.ora => +DG/TEN/PARAMETERFILE/spfile.265.632700769
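The file number and incarnation may also be retrieved with SQL while connected to the ASM instance; a minimal sketch against the documented view V$ASM_ALIAS:

SQL> SELECT a.name, a.file_number, a.file_incarnation
     FROM v$asm_alias a
     WHERE lower(a.name)=lower('spfileTEN.ora');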
X$KFFXP also contains the file number and incarnation. Since the view must keep track of
all the striped and mirrored extents, it includes a number of other columns that hold information on disk groups, disks, and ASM files. Table 12-1 lists the columns of X$KFFXP.
Table 12-1. X$KFFXP Columns

X$KFFXP Column Name   Meaning
ADDR                  Address
INDX                  Row index
INST_ID               Instance identifier (1 for single instance, 1..n for RAC)
GROUP_KFFXP           Disk group number; corresponds to V$ASM_DISKGROUP.GROUP_NUMBER
NUMBER_KFFXP          File number; corresponds to V$ASM_FILE.FILE_NUMBER
COMPOUND_KFFXP        Compound index; corresponds to V$ASM_FILE.COMPOUND_INDEX
INCARN_KFFXP          Incarnation; corresponds to V$ASM_FILE.INCARNATION
PXN_KFFXP             Unknown
XNUM_KFFXP            Extent number
LXN_KFFXP             Logical extent number (0=primary, 1=mirrored copy)
DISK_KFFXP            Disk number; corresponds to V$ASM_DISK.DISK_NUMBER
AU_KFFXP              Offset within the device in multiples of the allocation unit
                      size (V$ASM_DISKGROUP.ALLOCATION_UNIT_SIZE)
FLAGS_KFFXP           Unknown
CHK_KFFXP             Unknown
Salvaging an SPFILE
Presumably I am not the only DBA who has had to face the issue of a server parameter file (SPFILE) that was modified with ALTER SYSTEM in a way that rendered the file unsuitable for starting an instance. This may manifest itself in several different errors. Here is one example:

SQL> STARTUP
ORA-00821: Specified value of sga_target 356M is too small, needs to be at least 564M
ORA-01078: failure in processing system parameters
The SPFILE, which is a binary file, cannot be modified, except by performing an ALTER
SYSTEM command against a running instance. However, in this case the instance refuses to start
up with the SPFILE. If the SPFILE is stored in a file system, it can easily be converted to a text
parameter file with a text editor by removing non-ASCII characters. However, if the SPFILE is an
ASM file and you do not have a recent RMAN backup of it, you are stuck. Unless, of course, you
know how to retrieve the SPFILE directly from an ASM disk, which is what you will learn shortly.
Run against an ASM instance, the five-way join below gathers all the information required to extract an SPFILE from ASM storage with the UNIX command dd.1 The query takes the name of the SPFILE as input (not the absolute path within the disk group, which would be assigned to the parameter SPFILE). The following example is from a disk group with external redundancy. Thus, there is merely the primary allocation unit and no second allocation unit for mirroring.
SQL> SELECT a.name, a.group_number AS group#, a.file_number AS file#,
            f.bytes, allocation_unit_size AS au_size, au_kffxp AS au#,
            decode(x.lxn_kffxp, 0, 'PRIMARY', 1, 'MIRROR') AS type,
            d.failgroup AS failgrp, d.path
     FROM v$asm_alias a, v$asm_file f, x$kffxp x, v$asm_disk d, v$asm_diskgroup dg
     WHERE lower(a.name)=lower('spfileTEN.ora')
     AND a.group_number=f.group_number
     AND a.file_number=f.file_number
     AND f.group_number=x.group_kffxp
     AND f.file_number=x.number_kffxp
     AND x.disk_kffxp=d.disk_number
     AND f.group_number=dg.group_number;
NAME          GROUP# FILE# BYTES AU_SIZE AU# TYPE    FAILGRP PATH
------------- ------ ----- ----- ------- --- ------- ------- --------------------
spfileTEN.ora      1   265  2560 1048576 240 PRIMARY SDA9    /dev/oracleasm/disks
                                                             /SDA9
With normal redundancy, i.e., two-way mirroring by ASM, you would see a primary and a secondary allocation unit, which are assigned to different disks belonging to disjoint fail groups.

NAME          GROUP# FILE# BYTES AU_SIZE AU# TYPE    FAILGRP PATH
------------- ------ ----- ----- ------- --- ------- ------- ----------------------
spfileten.ora      1   257  2560 1048576  23 PRIMARY DC1     /dev/rlv_asm_dc1_dsk09
spfileten.ora      1   257  2560 1048576  26 MIRROR  DC2     /dev/rlv_asm_dc2_dsk09
The column AU# in the query output (X$KFFXP.AU_KFFXP) is the offset of an allocation unit from the first block of an ASM disk. The default size of an ASM allocation unit is 1048576 bytes (1 MB). Taking this into consideration, the SPFILE with a size of 2560 bytes, which is stored in allocation unit 240, may be retrieved with the following command pipeline:
$ dd if=/dev/oracleasm/disks/SDA9 bs=1048576 skip=240 count=1 \
| dd bs=2560 count=1 | strings > spfile.txt
1+0 records in
1+0 records out
1+0 records in
0+0 records out
The dd options bs, skip, and count indicate the block size, the number of blocks to skip from the beginning of a file, and the number of blocks to read, respectively. The strings command near the end of the command pipeline removes all non-ASCII characters. Hence, the resulting file spfile.txt is an ASCII text file, ready for editing. The commands head and tail, which display the beginning and end of the file, confirm that the entire contents of the SPFILE were retrieved.

1. On Windows, use the dd command that ships with Cygwin.
$ head -3 spfile.txt
*.audit_file_dest='/opt/oracle/obase/admin/TEN/adump'
*.background_dump_dest='/opt/oracle/obase/admin/TEN/bdump'
*.compatible='10.2.0.3.0'
$ tail -3 spfile.txt
TEN1.undo_tablespace='UNDOTBS1'
TEN2.undo_tablespace='UNDOTBS2'
*.user_dump_dest='/opt/oracle/obase/admin/TEN/udump'
Mapping Segments to ASM Storage
Now that we have successfully completed the warm-up exercise, we are ready to tackle the more difficult task of mapping a block in a database segment to the corresponding block in an ASM disk. Beyond the mapping to the correct allocation unit in an ASM disk, this requires finding the correct block within the allocation unit. A single ASM allocation unit may contain blocks from several database segments. Remember that the smallest extent size in a locally managed tablespace with the AUTOALLOCATE option is 64 KB. The case study that follows uses the segment LOCATIONS in tablespace USERS as an example. We repeat the query from the previous section, but this time pass the single data file of tablespace USERS as input.
SQL> SELECT x.xnum_kffxp AS extent, a.group_number AS grp#, a.file_number AS file#,
            f.bytes, allocation_unit_size AS au_size, au_kffxp AS au,
            decode(x.lxn_kffxp, 0, 'PRIMARY', 1, 'MIRROR') AS type,
            d.failgroup AS failgrp, d.path
     FROM v$asm_alias a, v$asm_file f, x$kffxp x, v$asm_disk d, v$asm_diskgroup dg
     WHERE lower(a.name)=lower('USERS.263.628550085')
     AND a.group_number=f.group_number
     AND a.file_number=f.file_number
     AND f.group_number=x.group_kffxp
     AND f.file_number=x.number_kffxp
     AND x.disk_kffxp=d.disk_number
     AND f.group_number=dg.group_number
     ORDER BY x.xnum_kffxp;
The size of the file USERS.263.628550085 is 175 MB. The result of the query, depicted in the next code listing, shows how ASM has striped the file across five disks. By default, ASM stripes across eight disks (parameter _ASM_STRIPEWIDTH), given that the disk group contains enough disks to accomplish this.
EXTENT GRP# FILE#     BYTES AU_SIZE  AU TYPE    FAILGRP PATH
------ ---- ----- --------- ------- --- ------- ------- --------------------------
     0    1   263 183508992 1048576 123 PRIMARY SDA11   /dev/oracleasm/disks/SDA11
     1    1   263 183508992 1048576 126 PRIMARY SDA6    /dev/oracleasm/disks/SDA6
     2    1   263 183508992 1048576 123 PRIMARY SDA9    /dev/oracleasm/disks/SDA9
     3    1   263 183508992 1048576 125 PRIMARY SDA10   /dev/oracleasm/disks/SDA10
     4    1   263 183508992 1048576 125 PRIMARY SDA5    /dev/oracleasm/disks/SDA5
     5    1   263 183508992 1048576 124 PRIMARY SDA11   /dev/oracleasm/disks/SDA11
     6    1   263 183508992 1048576 127 PRIMARY SDA6    /dev/oracleasm/disks/SDA6
     7    1   263 183508992 1048576 124 PRIMARY SDA9    /dev/oracleasm/disks/SDA9
     8    1   263 183508992 1048576 126 PRIMARY SDA10   /dev/oracleasm/disks/SDA10
     9    1   263 183508992 1048576 126 PRIMARY SDA5    /dev/oracleasm/disks/SDA5
    10    1   263 183508992 1048576 125 PRIMARY SDA11   /dev/oracleasm/disks/SDA11
…
The next step consists of locating the correct database block among all those ASM file extents. The package DBMS_ROWID may be used to retrieve the block within a segment where a certain row resides. The block number returned by DBMS_ROWID.ROWID_BLOCK_NUMBER must subsequently be mapped to the correct extent within the ASM file. We will use the row in table LOCATIONS that contains the name of the City of Angels as an example.
SQL> SELECT f.file_name, e.relative_fno AS rel_fno,
            e.extent_id, e.block_id AS "1st BLOCK", e.blocks,
            dbms_rowid.rowid_block_number(l.ROWID) AS block_id
     FROM locations l, dba_extents e, dba_data_files f
     WHERE l.city='Los Angeles'
     AND f.relative_fno=e.relative_fno
     AND e.relative_fno=dbms_rowid.rowid_relative_fno(l.ROWID)
     AND dbms_rowid.rowid_block_number(l.ROWID)
         BETWEEN e.block_id AND e.block_id + e.blocks - 1;
FILE_NAME                            REL_FNO EXTENT_ID 1st BLOCK BLOCKS BLOCK_ID
------------------------------------ ------- --------- --------- ------ --------
+DG/ten/datafile/users.263.628550085       4         0     21793      8    21798
The row where LOCATIONS.CITY='Los Angeles' is in file 4, block 21798 (column BLOCK_ID in the query result). Block 21798 is in extent 0 of the segment, which starts at block 21793. Extent 0 has a total of 8 blocks. The name of file 4 indicates that it is an ASM file. Next, we need to locate the ASM allocation unit that contains this database block. The BLOCK_ID is the offset from the beginning of the database file. The following query takes into consideration that the database block size is 8192 bytes and the ASM allocation unit size is 1048576 bytes. Offsets in a segment are measured in database blocks (DB_BLOCK_SIZE or the tablespace's block size), whereas offsets in ASM disks are measured in allocation units. The extent in the ASM file that contains the block is thus:
floor(21798 * 8192 / 1048576) = 170
The following query returns the ASM disk and the sought-after allocation unit within the disk:

SQL> SELECT x.xnum_kffxp AS extent, a.group_number AS grp#, a.file_number AS file#,
            f.bytes, allocation_unit_size AS au_size, au_kffxp AS au#,
            decode(x.lxn_kffxp, 0, 'PRIMARY', 1, 'MIRROR') AS type, d.failgroup, d.path
     FROM v$asm_alias a, v$asm_file f, x$kffxp x, v$asm_disk d, v$asm_diskgroup dg
     WHERE lower(a.name)=lower('USERS.263.628550085')
     AND a.group_number=f.group_number
     AND a.file_number=f.file_number
     AND f.group_number=x.group_kffxp
     AND f.file_number=x.number_kffxp
     AND x.disk_kffxp=d.disk_number
     AND f.group_number=dg.group_number
     AND x.xnum_kffxp=trunc(21798*8192/1048576);
EXTENT GRP# FILE#     BYTES AU_SIZE AU# TYPE    FAILGROUP PATH
------ ---- ----- --------- ------- --- ------- --------- -------------------------
   170    1   263 183508992 1048576 238 PRIMARY SDA9      /dev/oracleasm/disks/SDA9
Finally, we need to find the correct block within the 1 MB sized allocation unit. The following formula calculates the offset of a database block from the beginning of an ASM disk:

(AU# * AU_Size + Block# * DB_BLOCK_SIZE - ASM_Extent# * AU_Size) / DB_BLOCK_SIZE
In the preceding formula, AU# is the allocation unit (offset) in an ASM disk (X$KFFXP.AU_KFFXP), AU_Size is the ASM allocation unit size, Block# is the block number in the database segment, DB_BLOCK_SIZE is the database block size (or tablespace block size, if different), and ASM_Extent# is the ASM extent number (X$KFFXP.XNUM_KFFXP) in the ASM file. The offset thus obtained is in multiples of the database (or tablespace) block size. Entering the figures from the preceding query result into the formula yields the following:
(238 * 1048576 + 21798 * 8192 - 170 * 1048576) / 8192 = 30502
The database block, which contains the string “Los Angeles”, is at an offset of 30502 blocks
from the beginning of the disk, where each block has a size of 8192 bytes. To extract this block,
we need to use the following dd command:
$ dd if=/dev/oracleasm/disks/SDA9 bs=8192 skip=30502 count=1 | strings | \
grep "Los Angeles"
1+0 records in
1+0 records out
Los Angeles
Of course, I did not cheat by creating a table that contains the string “Los Angeles” in each
and every row. Adjacent blocks do not contain this string.
$ dd if=/dev/oracleasm/disks/SDA9 bs=8192 skip=30501 count=1 | strings | \
grep "Los Angeles"
1+0 records in
1+0 records out
$ dd if=/dev/oracleasm/disks/SDA9 bs=8192 skip=30503 count=1 | strings | \
grep "Los Angeles"
1+0 records in
1+0 records out
Figure 12-1 depicts the mapping of block 21798 in extent 0 of the database segment LOCATIONS to allocation unit 238 in the ASM disk. This block is 38 blocks past the first block in allocation unit 238. Since the segment's extent consists of only 8 blocks, the ASM allocation unit contains additional extents, possibly belonging to different segments.
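The 38-block offset follows directly from the preceding figures, since a 1 MB allocation unit holds 128 database blocks of 8192 bytes each:

30502 - 238 * 128 = 38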
[Figure 12-1 is a diagram; it shows blocks 21793 through 21798 in extent 0 of the RDBMS segment mapped to allocation unit 238 of the ASM disk, among allocation units 2, 3, 4, and so on.]

Figure 12-1. Mapping of a block in a database segment to a block within an allocation unit of an ASM disk
PART 5
■■■
SQL Statements
CHAPTER 13
■■■
ALTER SESSION/SYSTEM
SET EVENTS
ALTER SESSION SET EVENTS and ALTER SYSTEM SET EVENTS are undocumented in the Oracle9i and Oracle10g Oracle Database SQL Reference manuals as well as in the Oracle Database SQL Language Reference 11g Release 1. The manual Oracle Database Performance Tuning Guide 10g Release 2 contains an ALTER SESSION statement for the presumably best-known event 10046 at level 8. Yet how to switch event 10046 off is undocumented. The remaining levels of event 10046, which are also very useful, are undocumented too. Furthermore, there are hundreds of other undocumented events that may be set with ALTER SESSION and ALTER SYSTEM, some of which may be very useful as well.
Still today, for a database user who has the privilege ALTER SESSION but was not granted the
role DBA, ALTER SESSION SET EVENTS is the only way to enable extended SQL trace in his own
session in such a way that wait events and/or bind variables are included in the SQL trace file.
Both ALTER SESSION SET EVENTS and ALTER SYSTEM SET EVENTS may also be used to request diagnostic dumps when a certain ORA-nnnnn error occurs.
Tracing Your Own Session
Supposedly every DBA and developer is aware of the SQL statement ALTER SESSION SET SQL_TRACE=TRUE. This statement creates a trace file, which logs all the SQL statements of the session that executed the command. Actually it logs much more than that, e.g., session identification information, application instrumentation entries, and timestamps, to mention a few. For details, please refer to Chapter 24. What is lacking are wait events and information on bind variables.1 Both are needed for reliable performance diagnoses and the reproduction of performance problems.
Oracle10g was the first release to offer a documented interface for tracing bind variables. This is possible with the package DBMS_MONITOR, which also supports tracing of wait events. But execute permission on DBMS_MONITOR is solely granted to the role DBA (see $ORACLE_HOME/rdbms/admin/dbmsmntr.sql). The same applies to the undocumented package DBMS_SYSTEM, which is capable of setting events such as 10046. The undocumented package DBMS_SUPPORT, which also provides tracing of wait events and bind variables in addition to database

1. Wait events and bind variables are equally missing when SQL trace is enabled with the packaged procedure DBMS_SESSION.SET_SQL_TRACE(TRUE).
calls (parse, execute, fetch), is not even granted to the role DBA. All three are too powerful to allow normal database users to execute them. Last but not least, there is the SQL*Plus command ORADEBUG, which is available merely to SYS. So the average database user who is proficient in performance diagnosis is still at a loss when it comes to creating trace files without bothering a DBA, if it weren't for ALTER SESSION SET EVENTS.
ALTER SESSION SET EVENTS
ALTER SESSION SET EVENTS is ideal for building self-tracing capability into applications and for enabling SQL trace in a logon trigger. By self-tracing, I mean the ability of an application to enable SQL trace depending on an environment variable or a menu item in a graphical user interface. The syntax for switching an event on is as follows:

ALTER SESSION SET EVENTS 'event_number TRACE NAME CONTEXT [FOREVER,] LEVEL lvl'
The kind of event is determined by the integer event_number. The level, which often controls the verbosity, is set with the integer lvl. With the keyword FOREVER, the event remains on; without it, the event is only switched on momentarily. Normally, an event must remain switched on for a longer period of time, hence FOREVER is almost always used. If, for example, you were to execute ALTER SESSION SET EVENTS '10046 trace name context level 1', then the resulting SQL trace file would record just the ALTER SESSION statement itself, and tracing would be switched off as soon as it finished. Not very useful. Instead, you will want to use ALTER SESSION SET EVENTS '10046 trace name context forever, level 12' to trace wait events and bind variables of all subsequent statements.
The syntax for switching events off is as follows:

ALTER SESSION SET EVENTS 'event_number trace name context off'

The usual approach is to first enable event 10046, then exercise a code path that requires optimization, and finally to switch off event 10046. Performance diagnosis may then be done with TKPROF or with an extended SQL trace profiler such as ESQLTRCPROF, which is included in this book (see Chapter 27). In case you suspect or already know that the optimizer picks suboptimal execution plans for some of the traced statements, you should also enable event 10053 with the same syntax and for the same interval as event 10046. Event 10053 instructs the cost-based optimizer to write a log of its decision-making process to a trace file. Level 1 is the correct level for this event.
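Put together, a complete tracing session along these lines might look as follows (a sketch based on the syntax above):

SQL> ALTER SESSION SET EVENTS '10046 trace name context forever, level 12';
Session altered.
SQL> ALTER SESSION SET EVENTS '10053 trace name context forever, level 1';
Session altered.
SQL> -- exercise the code path that requires optimization
SQL> ALTER SESSION SET EVENTS '10053 trace name context off';
Session altered.
SQL> ALTER SESSION SET EVENTS '10046 trace name context off';
Session altered.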
Using ALTER SESSION SET EVENTS, it's also possible to obtain all of the name-based dumps, such as system state, library cache, heap, control file, and many more. The SQL*Plus command ORADEBUG DUMPLIST prints a list of all available dumps. A while ago, Pete Finnigan pointed out that a library cache dump could be used to glean passwords used in ALTER USER statements against Oracle8. This issue is fixed in Oracle10g, as the following test proves:
SQL> ALTER USER hr IDENTIFIED BY secret;
User altered.
SQL> ALTER SESSION SET EVENTS 'immediate trace name library cache level 10';
Session altered.
The resulting trace file contains asterisks instead of the password:
CHAPTER 13 ■ ALTER SESSION/SYSTEM SET EVENTS
BUCKET 76429:
LIBRARY OBJECT HANDLE: handle=6c15e8e8 mutex=6C15E99C(1)
name=ALTER USER hr IDENTIFIED BY ******
There might be other similar exploits, such that you may feel that granting ALTER SESSION is too risky. The free instrumentation library for ORACLE (ILO) from Hotsos2 contains the package HOTSOS_ILO_TIMER, which may be used to enable and disable tracing of database calls, wait events, and bind variables without providing access to events or dumps. Be warned, however, that you should revoke the execute privilege from PUBLIC on HOTSOS_SYSUTIL, which is installed with ILO. Otherwise any database user could write to the alert log with the procedure WRITE_TO_ALERT. Of course, you may also build your own wrapper around DBMS_SUPPORT or DBMS_MONITOR, but ILO also provides interesting functionality for application instrumentation with module, action, and client identifier (see Chapter 28).
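A minimal sketch of such a wrapper around DBMS_MONITOR follows; the schema and procedure names are hypothetical, the owner needs SELECT granted directly on SYS.V_$SESSION, and the wrapper only permits tracing the caller's own session:

CREATE OR REPLACE PROCEDURE site_sys.trace_my_session(p_on BOOLEAN)
IS
   v_sid     NUMBER;
   v_serial# NUMBER;
BEGIN
   -- identify the calling session; USERENV('SID') is available in Oracle10g
   SELECT sid, serial# INTO v_sid, v_serial#
   FROM v$session
   WHERE sid=sys_context('userenv','sid');
   IF p_on THEN
      dbms_monitor.session_trace_enable(v_sid, v_serial#,
                                        waits=>TRUE, binds=>TRUE);
   ELSE
      dbms_monitor.session_trace_disable(v_sid, v_serial#);
   END IF;
END;
/
GRANT EXECUTE ON site_sys.trace_my_session TO PUBLIC;

Since the procedure runs with owner's rights, the grantees do not need execute permission on DBMS_MONITOR itself.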
ALTER SYSTEM SET EVENTS
ALTER SYSTEM SET EVENTS is the instance level counterpart to ALTER SESSION SET EVENTS. It sets
events for all future database sessions. Events set in this way do not persist across instance
restarts. If an event must be set each time an instance starts, use the parameter EVENT (see page 32).
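For reference, making an event persistent via the parameter EVENT might look as follows in a text initialization parameter file (a sketch; with an SPFILE, the equivalent is ALTER SYSTEM SET EVENT='10231 trace name context forever, level 10' SCOPE=SPFILE):

event="10231 trace name context forever, level 10"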
Here's a scenario for using an event at the instance level, which you will hopefully never have to face. Let's assume there are several block corruptions in a table segment. Unfortunately, the database is in noarchivelog mode, so neither data file nor block-level recovery is an option. Importing the last export dump would result in data loss. The following undocumented event enables skipping of corrupt blocks during a full table scan:3
$ oerr ora 10231
10231, 00000, "skip corrupted blocks on table scans "
// *Cause:
// *Action: such blocks are skipped in table scans, and listed in trace files
You may be able to salvage most of the changes since the last export was taken by performing a table-level export with this event set. Yet there is no way to instruct the export utility exp to set an event after it has connected. This is where ALTER SYSTEM SET EVENTS comes in handy. By using DBMS_SYSTEM.READ_EV, we may confirm that events set this way really are picked up by new database sessions.
SQL> CONNECT / AS SYSDBA
Connected.
SQL> VARIABLE lev NUMBER
SQL> SET AUTOPRINT ON
SQL> ALTER SYSTEM SET EVENTS '10231 trace name context forever, level 10';
System altered.
SQL> EXECUTE sys.dbms_system.read_ev(10231, :lev)
2. ILO is available for download at http://sourceforge.net/projects/hotsos_ilo.
3. The documented procedure DBMS_REPAIR.SKIP_CORRUPT_BLOCKS provides the same functionality in Oracle9i and subsequent releases.
LEV
----------
         0
SQL> CONNECT / AS SYSDBA
Connected.
SQL> EXECUTE sys.dbms_system.read_ev(10231, :lev)
LEV
----------
        10
Event 10231 was not enabled in the database session that ran ALTER SYSTEM SET EVENTS.
After starting a new database session, the event is set. At this point the export utility might be
started to salvage the data. As soon as it finishes, the event may be switched off.
SQL> ALTER SYSTEM SET EVENTS '10231 trace name context off';
System altered.
SQL> CONNECT / AS SYSDBA
Connected.
SQL> EXECUTE sys.dbms_system.read_ev(10231, :lev)
LEV
----------
         0
ALTER SESSION/SYSTEM SET EVENTS and
Diagnostic Dumps
ALTER SESSION SET EVENTS and ALTER SYSTEM SET EVENTS may also be used to request certain diagnostic dumps in the event of an ORA-nnnnn error. The resulting trace files could be sent to Oracle Support for analysis. Usually, such trace files document the state of an instance at the time when an error occurred. Thus, they may prove useful in pinpointing the cause of a defect. The syntax for requesting a dump when an ORA-nnnnn error occurs is as follows:

ALTER {SESSION|SYSTEM} SET EVENTS 'error_code TRACE NAME dump_name LEVEL lvl'
To print a list of available dumps, use the command ORADEBUG DUMPLIST in SQL*Plus. Following is the syntax for disabling a dump in the event of an ORACLE error:

ALTER {SESSION|SYSTEM} SET EVENTS 'error_code TRACE NAME dump_name OFF'
Let’s assume that several sessions have encountered the error ORA-04031, for example
“ORA-04031: unable to allocate 4096 bytes of shared memory ("java pool","unknown object","joxs
heap","Intern")”. To create a heap dump each time any database session receives the ORACLE
error 4031, you would run the following ALTER SYSTEM statement:4
SQL> ALTER SYSTEM SET EVENTS '4031 trace name heapdump level 536870914';
System altered.
4. Oracle10g automatically writes a trace file when an ORA-4031 error is raised.
Valid levels for this event are 1, 2, 3, 32, and 536870912 plus one of the levels between 1 and 32. Among others, the resulting trace file contains a call stack trace and the SQL statement that was active at the time when the error occurred.
ioc_allocate (size: 4096, heap name: *** SGA ***, flags: 110009) caught 4031
*** 2007-11-12 16:20:54.359
ksedmp: internal or fatal error
ORA-04031: unable to allocate 4096 bytes of shared memory ("java pool",
"unknown object","joxs heap","Intern")
Current SQL statement for this session:
SELECT SYNNAM, DBMS_JAVA.LONGNAME(SYNNAM), DBMS_JAVA.LONGNAME(SYNTAB),
TABOWN, TABNODE, PUBLIC$, SYNOWN, SYNOWNID, TABOWNID, SYNOBJNO
FROM
SYS.EXU9SYN
WHERE SYNOWNID = :1
ORDER BY SYNTIME
----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
…
After collecting sufficient diagnostic data, the dump event may be disabled with the
following statement:
SQL> ALTER SYSTEM SET EVENTS '4031 trace name heapdump off';
System altered.
Immediate Dumps
Using the following syntax, it is also possible to request that a dump be taken immediately:
ALTER SESSION SET EVENTS 'IMMEDIATE TRACE NAME dump_name LEVEL lvl'
If you suspect that an instance is hanging, you might take a system state dump like this:
SQL> ALTER SESSION SET EVENTS 'immediate trace name systemstate level 10';
Session altered.
The header of a system state dump looks as follows:
SYSTEM STATE
------------
System global information:
processes: base 6ED426BC, size 50, cleanup 6ED47FF4
allocation: free sessions 6ED60D60, free calls 00000000
control alloc errors: 0 (process), 0 (session), 0 (call)
PMON latch cleanup depth: 0
seconds since PMON's last scan for dead processes: 21
system statistics:
166 logons cumulative
18 logons current
1082648 opened cursors cumulative
25 opened cursors current
CHAPTER 14
■■■
ALTER SESSION SET
CURRENT_SCHEMA
ALTER SESSION SET CURRENT_SCHEMA was undocumented in Oracle8 and prior releases. It is partially documented in the Oracle8i, Oracle9i, and Oracle10g SQL Reference manuals. There are undocumented restrictions concerning certain database objects. Restrictions apply to Advanced Queuing, the SQL statement RENAME, and database links. The CURRENT_SCHEMA session parameter offers a convenient way to perform operations on objects in a schema other than that of the current user without having to qualify the objects with the schema name. It does not affect the privileges of a session.
Privilege User vs. Schema User
The ORACLE DBMS distinguishes between a schema user identity and a privilege user identity. The
privilege user identity determines which permissions are available to create or access database
objects, while the schema user identity provides the context for statement parsing and execution. After logging into a DBMS instance, say as database user SYSTEM, the privileges granted
to user SYSTEM are available and any unqualified database objects referred to are expected in
schema SYSTEM. One might say that by default the current schema of the user SYSTEM is the
schema SYSTEM. The DBMS responds as if the database objects used had been qualified by
the schema name SYSTEM. But since the default current schema of user SYSTEM is SYSTEM,
the DBMS does not require qualification by a schema name.
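The pair of identities can be inspected at any time with the documented SYS_CONTEXT function; a quick check right after logging in as SYSTEM might look like this (a sketch):

SQL> SELECT user AS privilege_user,
            sys_context('userenv','current_schema') AS schema_user
     FROM dual;

PRIVILEGE_USER SCHEMA_USER
-------------- -----------
SYSTEM         SYSTEM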
The user name and the current schema need not always be equal. This is evident from the columns PARSING_USER_ID, PARSING_SCHEMA_ID, and PARSING_SCHEMA_NAME of the view V$SQL as well as from SQL trace files, which distinguish between user identity (uid) and logical identity (lid) in the PARSING IN CURSOR entry.
This functionality is similar to UNIX systems, which discern an effective user identity from
a real user identity. The omnipotent UNIX user root may switch to another user through the
command su (switch user). Processes spawned thereafter have an effective user identity of the
user switched to and a real user identity of root.
In circumstances where it is sufficient to change the parsing schema identifier, ALTER SESSION SET CURRENT_SCHEMA is a less intrusive alternative to a temporary change of the password and subsequent reset with ALTER USER IDENTIFIED BY VALUES (see Chapter 15).
The following query retrieves SQL statements run by user HR from the shared pool by selecting from the view V$SQL. Note that there are no SQL statements where the parsing user identity differs from the parsing schema identity.

SQL> SELECT s.parsing_user_id, u.username, s.parsing_schema_id,
            s.parsing_schema_name, substr(s.sql_text,1,15) sql_text
     FROM v$sql s, dba_users u
     WHERE s.parsing_user_id=u.user_id
     AND s.parsing_schema_name='HR';
PARSING_USER_ID USERNAME PARSING_SCHEMA_ID PARSING_SCHEMA_NAME SQL_TEXT
--------------- -------- ----------------- ------------------- ---------------
             38 HR                      38 HR                  SELECT USER FRO
             38 HR                      38 HR                  BEGIN DBMS_OUTP
After importing some tables into the schema HR by running the IMPORT utility (imp) as
SYSTEM, repeating the previous query on V$SQL shows that some statements were executed
with the parsing user identity SYSTEM, while the parsing schema identity was HR.
PARSING_USER_ID USERNAME PARSING_SCHEMA_ID PARSING_SCHEMA_NAME SQL_TEXT
--------------- -------- ----------------- ------------------- ---------------
             38 HR                      38 HR                  BEGIN   sys.dbm
             38 HR                      38 HR                  ALTER SESSION S
             38 HR                      38 HR                  ALTER SESSION S
              5 SYSTEM                  38 HR                  BEGIN SYS.DBMS
              5 SYSTEM                  38 HR                  BEGIN SYS.DBMS
If we trace the SQL statements issued by the IMPORT utility, we will find an ALTER SESSION SET CURRENT_SCHEMA statement in the SQL trace file. To set SQL_TRACE=TRUE in the import session, use the undocumented command line option TRACE=TRUE.

$ imp trace=true
Import: Release 10.2.0.1.0 - Production on Tue Jun 19 10:09:40 2007
Copyright (c) 1982, 2005, Oracle. All rights reserved.
Username:
By the way, Data Pump Export (expdp) and Import (impdp) have the same undocumented
TRACE switch. The resulting SQL trace file contains lines such as these:
PARSING IN CURSOR #5 len=38 dep=0 uid=5 oct=42 lid=5 tim=63968187622 hv=886929406 ad='6786940c'
ALTER SESSION SET CURRENT_SCHEMA= "HR"
END OF STMT
PARSE #5:c=0,e=523,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,tim=63968187614
EXEC #5:c=0,e=63,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=63968187802
XCTEND rlbk=0, rd_only=1
=====================
PARSING IN CURSOR #5 len=113 dep=0 uid=38 oct=13 lid=5 tim=63968189065 hv=0 ad='8eaf7d4'
CREATE SEQUENCE "LOCATIONS_SEQ" MINVALUE 1 MAXVALUE 9900 INCREMENT BY 100
START WITH 3300 NOCACHE NOORDER NOCYCLE
END OF STMT
PARSE #5:c=0,e=711,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,tim=63968189057
The trace file shows that the statement ALTER SESSION SET CURRENT_SCHEMA="HR" was parsed with parsing user identity 5 (lid=5; SYSTEM) and parsing schema identity 5 (uid=5). This ALTER SESSION statement sets the parsing schema name to HR, as evidenced by the subsequent CREATE SEQUENCE statement, which was parsed with a parsing schema identity of 38 (uid=38), corresponding to the schema HR.
Creating Database Objects in a Foreign Schema
Let's assume that a software developer working with the account HR needs a large quantity of new tables for testing purposes. The database user HR does not have the privilege CREATE TABLE, since the developer is only allowed to work with existing tables in the schema HR. Thus, the developer is unable to create the tables in schema HR himself. The developer sent a script for creating the tables, but the database object names in the script are not prefixed by a schema name. This scenario is an example of putting the ALTER SESSION SET CURRENT_SCHEMA statement to use. Without the statement, one of the following solutions must be chosen:

• The DBA has to temporarily grant CREATE TABLE to the database user HR, such that the developer can create the tables himself.

• The DBA has to ask the developer to prefix each and every object name in the script with the schema name HR, such that the DBA can run the script.

The DBA may run the script unchanged by leveraging CURRENT_SCHEMA. Below is a single CREATE TABLE statement executed after ALTER SESSION SET CURRENT_SCHEMA. The example illustrates that database objects are created under the parsing schema identifier, not the privilege schema identifier.
SQL> SHOW USER
USER is "SYSTEM"
SQL> ALTER SESSION SET current_schema=hr;
Session altered.
SQL> CREATE TABLE country (
  country_id char(2),
  country_name varchar2(40),
  region_id number,
  CONSTRAINT pk_country PRIMARY KEY (country_id)
);
Table created.
SQL> SELECT owner, object_type
     FROM dba_objects
     WHERE object_name IN ('COUNTRY', 'PK_COUNTRY');

OWNER OBJECT_TYPE
----- -----------
HR    TABLE
HR    INDEX
The database objects were created in schema HR. It is undocumented that restrictions pertaining to certain database objects apply when using ALTER SESSION SET CURRENT_SCHEMA.
Restrictions of ALTER SESSION SET
CURRENT_SCHEMA
Switching to a different parsing schema identifier cannot be used with Advanced Queuing. Furthermore, it is impossible to create a private database link in a foreign schema. It is, however, possible to create a stored outline in a foreign schema. The next sections provide the details on these restrictions.
Advanced Queuing
ALTER SESSION SET CURRENT_SCHEMA has no effect on name resolution by the Advanced Queuing (AQ) packages DBMS_AQADM and DBMS_AQ. It is documented that synonyms cannot be used with advanced queues. Hence, the only way to use queue tables and queues in a foreign schema in conjunction with the PL/SQL packages DBMS_AQ and DBMS_AQADM is to qualify the object names with the schema name, in the same way that a table in a foreign schema needs to be qualified as long as there is no synonym in the current schema. The following code sample assumes that the queue created in Chapter 16 resides in schema NDEBES.
$ sqlplus / as sysdba
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
SQL> SELECT owner FROM dba_queues WHERE name='CAUGHT_IN_SLOW_Q_AGAIN';

OWNER
---------
NDEBES

SQL> ALTER SESSION SET CURRENT_SCHEMA=ndebes;
Session altered.
SQL> EXEC dbms_aqadm.stop_queue('CAUGHT_IN_SLOW_Q_AGAIN')
BEGIN dbms_aqadm.stop_queue('CAUGHT_IN_SLOW_Q_AGAIN'); END;
*
ERROR at line 1:
ORA-24010: QUEUE SYS.CAUGHT_IN_SLOW_Q_AGAIN does not exist
ORA-06512: at "SYS.DBMS_AQADM_SYS", line 4913
ORA-06512: at "SYS.DBMS_AQADM", line 240
ORA-06512: at line 1
SQL> CONNECT ndebes/secret
Connected.
SQL> EXEC dbms_aqadm.stop_queue('CAUGHT_IN_SLOW_Q_AGAIN')
PL/SQL procedure successfully completed.
The reason for the failure of STOP_QUEUE executed by SYS is not a lack of privileges. When SYS qualifies the queue name with the correct schema name, STOP_QUEUE works flawlessly.
SQL> CONNECT / AS SYSDBA
Connected.
SQL> EXEC dbms_aqadm.stop_queue('NDEBES.CAUGHT_IN_SLOW_Q_AGAIN')
PL/SQL procedure successfully completed.
RENAME
The names of database tables may be changed with the SQL statement RENAME. It is one of the few SQL statements that do not support the qualification of database objects with schema names. This limitation cannot be worked around by using ALTER SESSION SET CURRENT_SCHEMA, since a subsequent RENAME statement causes the error ORA-03001.

SQL> SHOW USER
USER is "NDEBES"
SQL> ALTER SESSION SET CURRENT_SCHEMA=hr;
Session altered.
SQL> RENAME employees TO emp;
RENAME employees TO emp
*
ERROR at line 1:
ORA-03001: unimplemented feature
The solution is to create a stored procedure in the same schema that holds the table requiring a different name and to use EXECUTE IMMEDIATE along with RENAME in the procedure. The RENAME through the stored procedure completes successfully, since it executes in the context of the schema holding the table, as the sketch below illustrates.
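The procedure name rename_employees is hypothetical; it merely illustrates the approach for the table EMPLOYEES in schema HR:

CREATE OR REPLACE PROCEDURE hr.rename_employees
IS
BEGIN
   -- the procedure runs with owner's (HR's) rights,
   -- so the unqualified RENAME operates on HR's table
   EXECUTE IMMEDIATE 'RENAME employees TO emp';
END;
/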
Private Database Links
It is impossible to create a private database link in a foreign schema, since a database link cannot be prefixed with a schema name. This restriction remains even when ALTER SESSION SET CURRENT_SCHEMA is used.
SQL> CONNECT / AS SYSDBA
Connected.
SQL> ALTER SESSION SET CURRENT_SCHEMA=ndebes;
Session altered.
SQL> CREATE DATABASE LINK lnk CONNECT TO remote_user
     IDENTIFIED BY pwd USING 'dbserver1.oradbpro.com';
CREATE DATABASE LINK lnk CONNECT TO remote_user
IDENTIFIED BY pwd USING 'dbserver1.oradbpro.com'
*
ERROR at line 1:
ORA-01031: insufficient privileges
The error message makes sense, since there is no privilege such as CREATE ANY DATABASE
LINK. If you really do need to create a database link in a foreign schema, then you may use the
trick with the following stored procedure:
CREATE OR REPLACE PROCEDURE ndebes.create_db_link
IS
BEGIN
   EXECUTE IMMEDIATE 'CREATE DATABASE LINK lnk
   CONNECT TO remote_user IDENTIFIED BY pwd
   USING ''dbserver1.oradbpro.com''';
END;
/
Procedure created.
SQL> EXEC ndebes.create_db_link
PL/SQL procedure successfully completed.
SQL> SELECT owner, db_link, username, host
     FROM dba_db_links
     WHERE db_link LIKE 'LNK%';

OWNER  DB_LINK          USERNAME    HOST
------ ---------------- ----------- ----------------------
NDEBES LNK.ORADBPRO.COM REMOTE_USER dbserver1.oradbpro.com
Since stored procedures run with owner's rights by default, the CREATE DATABASE LINK statement is executed as privilege user NDEBES and parsing user NDEBES, such that the database link is created in the same schema as the procedure. Analogous to database links, a directory name in a CREATE DIRECTORY statement cannot be prefixed with a schema name. However, all directories are owned by SYS, so it's irrelevant which user creates a directory.
Stored Outlines
Stored outlines may be used to fix the execution plan for a SQL statement, such that the optimizer always uses the plan from the stored outline instead of optimizing the statement in the current environment. It is possible to create a stored outline in a foreign schema, even though there is no privilege CREATE ANY OUTLINE.
SQL> CONNECT system/secret
Connected.
SQL> ALTER SESSION SET CURRENT_SCHEMA=ndebes;
Session altered.
SQL> CREATE OUTLINE some_outline ON
     SELECT emp.last_name, emp.first_name, d.department_name
     FROM hr.employees emp, hr.departments d
     WHERE emp.department_id=d.department_id;
Outline created.
SQL> SELECT node, stage, join_pos, hint FROM dba_outline_hints
     WHERE owner='NDEBES'
     AND name='SOME_OUTLINE'
     ORDER BY node, join_pos;

NODE STAGE JOIN_POS HINT
---- ----- -------- -----------------------------------------------------------
   1     1        0 USE_NL(@"SEL$1" "D"@"SEL$1")
   1     1        0 LEADING(@"SEL$1" "EMP"@"SEL$1" "D"@"SEL$1")
   1     1        0 OUTLINE_LEAF(@"SEL$1")
   1     1        0 ALL_ROWS
   1     1        0 IGNORE_OPTIM_EMBEDDED_HINTS
   1     1        0 OPTIMIZER_FEATURES_ENABLE('10.2.0.1')
   1     1        1 FULL(@"SEL$1" "EMP"@"SEL$1")
   1     1        2 INDEX(@"SEL$1" "D"@"SEL$1" ("DEPARTMENTS"."DEPARTMENT_ID"))
User NDEBES is the owner of the outline. Outlines consist of hints, which fully describe
the execution plan for a statement.
CHAPTER 15
■■■
ALTER USER
IDENTIFIED BY VALUES
ALTER USER username IDENTIFIED BY VALUES 'password_hash' is an undocumented SQL statement. It is used internally by import utilities to store a password hash, which was previously saved with an export utility, in the data dictionary base table SYS.USER$. In situations where a DBA needs to connect as a certain user, but does not or must not know the password of that user, the capability to restore a saved password hash saves time and spares the DBA the annoyance of asking around for passwords.
The IDENTIFIED BY VALUES clause may also be used along with the SQL statement CREATE USER to create a new user account in a database. Given that the underlying password is long and complex enough to withstand a brute force password attack, CREATE USER username IDENTIFIED BY VALUES may be used in scripts that need to create certain database accounts without exposing the clear text password. In this context, brute force means trying all character combinations to guess a password, computing the password hash for each guessed password. The process stops when a password that matches the password hash is found. I am not aware of any utility that performs a brute force attack on passwords with 15 or more characters. Even a small character repertoire containing letters, digits, and an underscore (38 characters in all) allows for 4.9746E+23 passwords consisting of 15 characters. Even if hardware that could try one trillion passwords per second should become available one day, it would still take more than 15000 years to try all 15-character combinations. Hence such long passwords are immune to brute force attacks.
The Password Game
Regularly, DBAs are asked to create database objects in a foreign schema or to investigate malfunctions that manifest themselves only when working with a certain user. Often, a DBA does not know the password to connect as a particular user. Company policy may not even allow him to obtain the password from a colleague, or none of the colleagues on shift know the password. In situations like these, the DBA will often be able to complete a task much more quickly if he or she has access to the schema involved. As a solution, the DBA can record the password hash of the schema, change the password, connect using the changed password, and reset the password hash. This chapter explains the approach made possible by the undocumented SQL statement ALTER USER IDENTIFIED BY VALUES. Some tasks can only be accomplished by connecting to the schema in question, rendering the approach presented here the only path to a solution. Some examples are querying the view V$INDEX_USAGE or running third-party scripts that do not prefix database objects with schema names, without changing the code.
The example below uses the user SYSTEM, which has DBA privileges, and the application schema HR. The first step is to connect as a DBA and to save the hash of the current (unknown) password.

SQL> CONNECT system
Enter password:
Connected.
SQL> SPOOL pwd.log
SQL> SELECT password FROM dba_users WHERE username='HR';

PASSWORD
------------------------------
2AB46277EE8215C4

SQL> SPOOL OFF

The file pwd.log now contains the password hash of the schema HR. Next, edit pwd.log, such that it contains a SQL statement suitable for resetting the password to its original value. The syntax is ALTER USER username IDENTIFIED BY VALUES 'password_hash', where password_hash is the password hash obtained above. After editing, the file pwd.log should contain the following line:

ALTER USER hr IDENTIFIED BY VALUES '2AB46277EE8215C4';
Now the password may be changed temporarily.
SQL> ALTER USER hr IDENTIFIED BY secret;
User altered.
Note that the preceding statement will send the temporary password to the DBMS instance
unencrypted. Use the SQL*Plus command PASSWORD if this concerns you. The password hash
stored in the data dictionary has now changed.
SQL> SELECT password FROM dba_users WHERE username='HR';

PASSWORD
------------------------------
D370106EA83A3CD3
Next, start one or more applications suitable for diagnosing and/or solving the issue at
hand. To minimize the interval while the changed password is in effect, proceed to restore the
original password immediately after the application has connected to the DBMS instance.
$ sqlplus hr/secret
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
SQL>
Now, reset the password to its original value by running the ALTER USER command in file
pwd.log.
SQL> SET ECHO ON
SQL> @pwd.log
SQL> ALTER USER hr IDENTIFIED BY VALUES '2AB46277EE8215C4';
User altered.
The original password has now been restored and it is no longer possible to connect with
the temporary password “secret”.
SQL> SELECT password FROM dba_users WHERE username='HR';

PASSWORD
----------------
2AB46277EE8215C4
SQL> CONNECT hr/secret
ERROR:
ORA-01017: invalid username/password; logon denied
Warning: You are no longer connected to ORACLE.
The password hash depends on the user name, i.e., the same password used for different users yields different password hashes. Here's an example:

SQL> CREATE USER U1 IDENTIFIED BY "Rattle And Hum";
User created.
SQL> CREATE USER U2 IDENTIFIED BY "Rattle And Hum";
User created.
SQL> SELECT username, password FROM dba_users WHERE username IN ('U1', 'U2');

USERNAME                       PASSWORD
------------------------------ ------------------------------
U1                             07A31E4964AEAC50
U2                             31019CA688540357
Locking Accounts with ALTER USER IDENTIFIED
BY VALUES
By following Oracle Corporation's recommendation to lock the accounts of internal schemas such as CTXSYS, MDSYS, XDB, OLAPSYS, etc., you allow an attacker to find out which components are installed and to specifically exploit vulnerabilities in these components. The error "ORA-28000: the account is locked" tells the attacker that a certain schema does exist. You might prefer leaving the account open while setting an impossible password hash with ALTER USER IDENTIFIED BY VALUES. Attempts to connect will then result in "ORA-01017: invalid username/password; logon denied", such that the attacker will not gain information on which user names exist. Since it is impossible to specify a matching password for the incorrect password hash, such an account is effectively locked, even without an expired password.
SQL> ALTER USER ctxsys IDENTIFIED BY VALUES 'LOCKED' ACCOUNT UNLOCK;
User altered.
SQL> SELECT password FROM dba_users WHERE username='CTXSYS';

PASSWORD
--------
LOCKED

SQL> CONNECT ctxsys/impossible_to_crack_incorrectly_encoded_password
ERROR:
ORA-01017: invalid username/password; logon denied
No matter which approach you prefer, you may always audit failed connect attempts by setting AUDIT_TRAIL=DB and enabling auditing for connect failures with AUDIT CONNECT WHENEVER NOT SUCCESSFUL. The following query will then yield failed connect attempts:

SQL> SELECT username, os_username, userhost, terminal, timestamp, returncode
     FROM dba_audit_trail
     WHERE action_name='LOGON'
     AND returncode!=0;

USERNAME OS_USERNAME     USERHOST           TERMINAL TIMESTAMP      RETURNCODE
-------- --------------- ------------------ -------- -------------- ----------
CTXSYS   DBSERVER\ndebes WORKGROUP\DBSERVER DBSERVER 28.09.07 20:22       1017
MDSYS    DBSERVER\ndebes WORKGROUP\DBSERVER DBSERVER 28.09.07 20:37      28000
In the context of security, it is worth mentioning that Oracle10g databases that were created based on a seed database such as General Purpose or Transaction Processing with DBCA contain a new undocumented profile called MONITORING_PROFILE, which is assigned to the user DBSNMP. This profile allows an unlimited number of failed login attempts, whereas the standard profile DEFAULT, which also exists in Oracle9i and prior releases, allows ten failed login attempts in Oracle10g before locking an account.

SQL> SELECT profile, limit FROM dba_profiles
     WHERE resource_name='FAILED_LOGIN_ATTEMPTS';

PROFILE                        LIMIT
------------------------------ ----------------------------------------
DEFAULT                        10
MONITORING_PROFILE             UNLIMITED
This setting makes the account DBSNMP a likely target for password-cracking routines. This vulnerability does not apply to databases that were created manually or with the Custom Database option in DBCA.
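If MONITORING_PROFILE is present, one possible countermeasure is to cap its failed login attempts; a sketch (verify beforehand that this does not interfere with your Enterprise Manager monitoring setup):

SQL> ALTER PROFILE monitoring_profile LIMIT failed_login_attempts 10;
Profile altered.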
ALTER USER and Unencrypted Passwords
In case you are concerned about sending unencrypted passwords across a network, which is, after all, one of the reasons why telnet and ftp have been abandoned in favor of secure shell (SSH), you should be aware of the fact that ALTER USER IDENTIFIED BY does just that, unless your site has licensed and installed the Advanced Security Option, which encrypts all Oracle Net traffic. It's fairly easy to demonstrate this, since the undocumented Oracle Net trace file format contains an ASCII dump of the network packets transmitted by a database client when the highest trace level, support, is enabled. After setting the following parameters in sqlnet.ora on a Windows client system, trace files will be written to C:\temp:

trace_level_client=support
trace_directory_client=c:\temp

After running the SQL statement ALTER USER ndebes IDENTIFIED BY secret, the trace file contains the unencrypted password.
[28-SEP-2007 18:07:38:305] nspsend: 00 26 41 4C 54 45 52 20  |.&ALTER.|
[28-SEP-2007 18:07:38:305] nspsend: 55 53 45 52 20 6E 64 65  |USER.nde|
[28-SEP-2007 18:07:38:305] nspsend: 62 65 73 20 49 44 45 4E  |bes.IDEN|
[28-SEP-2007 18:07:38:305] nspsend: 54 49 46 49 45 44 20 42  |TIFIED.B|
[28-SEP-2007 18:07:38:305] nspsend: 59 20 73 65 63 72 65 74  |Y.secret|
This vulnerability does not apply to the SQL*Plus command PASSWORD, as is evident from
the Oracle Net trace file. After changing a password in the following manner:
SQL> PASSWORD ndebes
Changing password for ndebes
New password:
Retype new password:
Password changed
you will notice an encrypted password in the trace file.
[28-SEP-2007 18:12:17:602] nspsend: 06 6E 64 65 62 65 73 10  |.ndebes.|
[28-SEP-2007 18:12:17:602] nspsend: 00 00 00 10 41 55 54 48  |....AUTH|
[28-SEP-2007 18:12:17:602] nspsend: 5F 4E 45 57 50 41 53 53  |_NEWPASS|
[28-SEP-2007 18:12:17:602] nspsend: 57 4F 52 44 40 00 00 00  |WORD@...|
[28-SEP-2007 18:12:17:602] nspsend: 40 44 38 36 38 43 39 36  |@D868C96|
[28-SEP-2007 18:12:17:602] nspsend: 41 42 34 43 42 37 39 39  |AB4CB799|
[28-SEP-2007 18:12:17:602] nspsend: 36 41 44 34 31 36 36 31  |6AD41661|
[28-SEP-2007 18:12:17:602] nspsend: 32 44 43 41 36 46 42 37  |2DCA6FB7|
[28-SEP-2007 18:12:17:602] nspsend: 43 46 44 39 35 41 35 33  |CFD95A53|
[28-SEP-2007 18:12:17:602] nspsend: 34 35 33 41 45 35 34 39  |453AE549|
[28-SEP-2007 18:12:17:602] nspsend: 35 36 39 34 46 45 37 36  |5694FE76|
[28-SEP-2007 18:12:17:602] nspsend: 36 33 31 38 44 43 43 43  |6318DCCC|
[28-SEP-2007 18:12:17:602] nspsend: 31 00 00 00 00 0D 00 00  |1.......|
[28-SEP-2007 18:12:17:602] nspsend: 00 0D 41 55 54 48 5F 50  |..AUTH_P|
[28-SEP-2007 18:12:17:602] nspsend: 41 53 53 57 4F 52 44 00  |ASSWORD.|
CHAPTER 16
■■■
SELECT FOR UPDATE
SKIP LOCKED
SELECT FOR UPDATE SKIP LOCKED is undocumented in Oracle9i and Oracle10g. It is used behind the scenes by Advanced Queuing, a reliable messaging service built into the ORACLE DBMS. Oracle11g Release 1 is the first DBMS release that includes documentation on SELECT FOR UPDATE SKIP LOCKED. The SKIP LOCKED clause improves the scalability of applications that attempt to concurrently update the same set of rows in a table. It eliminates wait time for TX locks. Consistency and isolation are preserved. The DBMS server assigns a fair share of the rows to each database client that is interested in an overlapping result set.
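Outside of AQ, the clause may be appended to an ordinary SELECT FOR UPDATE. A sketch against a hypothetical JOBS table illustrates the idea; each concurrent worker session locks and processes only rows that no other session has already locked:

SQL> SELECT job_id FROM jobs WHERE status='READY' FOR UPDATE SKIP LOCKED;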
Advanced Queuing
Advanced Queuing (AQ) is Oracle's implementation of a reliable messaging service, which is integrated into the ORACLE DBMS. AQ has been available since Oracle8. With the advent of Oracle10g, it was renamed Streams AQ (see Streams Advanced Queuing User's Guide and Reference Oracle10g Release 2), since Streams, an alternative replication mechanism to Advanced Replication, is built on top of message queuing and message propagation with AQ. Streams is based on redo log mining (Log Miner), whereas Advanced Replication is trigger-based.
AQ messages are usually made up of a user-defined abstract data type (ADT; see Application Developer's Guide - Advanced Queuing) built with CREATE TYPE. The latter is called the payload of a message. In its simplest form, the payload is merely a BLOB instead of an ADT. Messages may be created by calling DBMS_AQ.ENQUEUE, propagated from one queue to another (even from one database to another or to another messaging system from a different vendor), and consumed with DBMS_AQ.DEQUEUE. The details are beyond the scope of this book. Please keep in mind, though, that AQ ships with all editions of the ORACLE DBMS at no additional cost; is highly reliable, since it benefits from the infrastructure of the ORACLE DBMS server, which provides crash and media recovery; and has Java, PL/SQL, and C/C++ interfaces. So if you do need message queuing functionality in an upcoming project, AQ might be the right choice for you.
If you have ever taken a closer look behind the scenes of AQ, you may have noticed the undocumented SELECT FOR UPDATE SKIP LOCKED statement. In case you weren't among the lucky ones who were able to obtain a backstage pass, the well-kept secret will be unveiled instantly. AQ uses SKIP LOCKED when removing messages from a queue with DBMS_AQ.DEQUEUE to ensure scalability by preventing waits for TX locks. In a situation where several processes dequeue from the same queue simultaneously, locking would severely limit the scalability of applications that want to use concurrent processes for dequeuing. To be precise, this applies only when applications ask for any message that is ready for dequeuing, which is the predominant approach. When concurrent processes ask for specific messages, the probability of contention is lower.
The following example shows how to obtain a SQL trace file that illustrates the use of the SKIP LOCKED clause by AQ. Before running the code below, make sure you have sufficient privileges to execute the package DBMS_AQADM, which is needed to create queue tables and queues.

SQL> EXEC dbms_aqadm.create_queue_table('post_office_queue_table', 'raw');
SQL> EXEC dbms_aqadm.create_queue('caught_in_slow_q_again', 'post_office_queue_table');
SQL> EXEC dbms_aqadm.start_queue('caught_in_slow_q_again');
SQL> ALTER SESSION SET SQL_TRACE=TRUE;
Session altered.
SQL> DECLARE
  dequeue_options dbms_aq.dequeue_options_t;
  message_properties dbms_aq.message_properties_t;
  payload blob;
  msgid raw(16);
BEGIN
  dequeue_options.wait:=dbms_aq.no_wait; -- default is to patiently wait forever
  DBMS_AQ.DEQUEUE (
    queue_name => 'caught_in_slow_q_again',
    dequeue_options => dequeue_options,
    message_properties => message_properties,
    payload => payload,
    msgid => msgid);
END;
/
DECLARE
*
ERROR at line 1:
ORA-25228: timeout or end-of-fetch during message dequeue from
NDEBES.CAUGHT_IN_SLOW_Q_AGAIN
ORA-06512: at "SYS.DBMS_AQ", line 358
ORA-06512: at "SYS.DBMS_AQ", line 556
ORA-06512: at line 8
In the SQL trace file, you will see an example of a SELECT with the FOR UPDATE SKIP LOCKED
clause.
PARSING IN CURSOR #27 len=367 dep=0 uid=30 oct=47 lid=30 tim=69919173952 hv=3830655786 ad='6771d48c'
DECLARE
  dequeue_options dbms_aq.dequeue_options_t;
  message_properties dbms_aq.message_properties_t;
  payload blob;
  msgid raw(16);
BEGIN
  dequeue_options.wait:=dbms_aq.no_wait;
  DBMS_AQ.DEQUEUE (
    queue_name => 'caught_in_slow_q_again',
    dequeue_options => dequeue_options,
    message_properties => message_properties,
    payload => payload,
    msgid => msgid);
END;
END OF STMT
PARSE #27:c=0,e=123,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=69919173943
=====================
PARSING IN CURSOR #26 len=565 dep=1 uid=0 oct=3 lid=0 tim=69919341626 hv=319671114 ad='674e3128'
select /*+ FIRST_ROWS(1) */
tab.rowid, tab.msgid, tab.corrid,
tab.priority, tab.delay,
tab.expiration, tab.retry_count,
tab.exception_qschema, tab.exception_queue, tab.chain_no,
tab.local_order_no, tab.enq_time, tab.time_manager_info, tab.state,
tab.enq_tid, tab.step_no,
tab.sender_name, tab.sender_address,
tab.sender_protocol, tab.dequeue_msgid, tab.user_prop, tab.user_data
from "NDEBES"."POST_OFFICE_QUEUE_TABLE" tab where q_name = :1 and (state = :2 )
order by q_name, state, enq_time, step_no, chain_no, local_order_no
for update skip locked
END OF STMT
EXEC #26:c=0,e=168,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=2,tim=69919341618
FETCH #26:c=0,e=139,p=0,cr=3,cu=0,mis=0,r=0,dep=1,og=2,tim=69919363741
EXEC #27:c=0,e=38468,p=0,cr=12,cu=0,mis=0,r=0,dep=0,og=1,tim=69919378626
ERROR #27:err=25228 tim=0
Contention and SELECT FOR UPDATE SKIP LOCKED
Let’s pretend the SKIP LOCKED extension does not exist. To investigate what happens when
several processes attempt to consume messages simultaneously (any available message),
we need to first enqueue some messages.
SQL> SET SERVEROUTPUT ON
SQL> DECLARE
  enqueue_options dbms_aq.enqueue_options_t;
  message_properties dbms_aq.message_properties_t;
  payload blob;
  msg raw(64);
  msgid raw(16);
BEGIN
  dbms_lob.createtemporary(payload, true);
  msg:=utl_raw.cast_to_raw('message in a bottle');
  dbms_lob.writeappend(payload, utl_raw.length(msg), msg);
  DBMS_AQ.ENQUEUE (
    queue_name => 'caught_in_slow_q_again',
    enqueue_options => enqueue_options,
    message_properties => message_properties,
    payload => payload,
    msgid => msgid);
  dbms_output.put_line(rawtohex(msgid));
END;
/
89A118C42BFA4A22AE31932E3426E493
PL/SQL procedure successfully completed.
SQL> COMMIT;
Commit complete.
Let’s see which messages are in the queue in addition to the one with
MSGID=89A118C42BFA4A22AE31932E3426E493 that we just enqueued.
SQL> SELECT tab.msgid, tab.state
FROM "NDEBES"."POST_OFFICE_QUEUE_TABLE" tab
WHERE q_name='CAUGHT_IN_SLOW_Q_AGAIN';
MSGID                            STATE
-------------------------------- -----
3DB7BE5A803B4ECB91EF7B021FB223F4     1
00E146F0560C4760886B7AEEEDCF7BF2     1
34F01D98AF0444FF91B10C6D00CB5826     0
54783A999BB544419CDB0D8D44702CD3     0
B78950028B3A4F42A5C9460DDDB9F9D7     0
89A118C42BFA4A22AE31932E3426E493     0
25CC997C9ADE48FFABCE33E62C18A7F3     0
353D44D753494D78B9C5E7B515263A6D     0
2544AA9A68C54A9FB9B6FE410574D85A     0
F7E0192C2AEF45AEAEE5661F183261CC     1
10 rows selected.
There are currently ten messages in the queue (each row represents one message). STATE=0
means the message is ready for dequeue, while STATE=1 means the message was enqueued with
a delay for deferred dequeue (message_properties.delay>0; the delay is in seconds).
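For illustration, here is a minimal sketch of how such a delayed message might be produced in the queue created earlier; the delay of 600 seconds and the message text are arbitrary values, not taken from this chapter’s source code depot:
SQL> DECLARE
  enqueue_options dbms_aq.enqueue_options_t;
  message_properties dbms_aq.message_properties_t;
  payload blob;
  msg raw(64);
  msgid raw(16);
BEGIN
  dbms_lob.createtemporary(payload, true);
  msg:=utl_raw.cast_to_raw('deferred message');
  dbms_lob.writeappend(payload, utl_raw.length(msg), msg);
  message_properties.delay:=600; -- defer dequeue by 600 seconds; the row gets STATE=1
  DBMS_AQ.ENQUEUE (
    queue_name => 'caught_in_slow_q_again',
    enqueue_options => enqueue_options,
    message_properties => message_properties,
    payload => payload,
    msgid => msgid);
  COMMIT;
END;
/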
What happens when two processes dequeue concurrently? Let’s take a simplified version
of the SELECT FOR UPDATE we found in the SQL trace file of the dequeue operation and run it in
session 1 without SKIP LOCKED, while retaining the bind variables.
SQL> VARIABLE q_name VARCHAR2(30)
SQL> VARIABLE state NUMBER
SQL> BEGIN
  :q_name:='CAUGHT_IN_SLOW_Q_AGAIN';
  :state:=0;
END;
/
PL/SQL procedure successfully completed.
SQL> ALTER SESSION SET SQL_TRACE=TRUE;
Session altered.
SQL> SELECT userenv('sid') FROM dual;
USERENV('SID')
--------------
           134
SQL> SET ARRAYSIZE 1
SQL> SET PAGESIZE 1
SQL> SET TIME ON
23:41:04 SQL> SET PAUSE "Hit return to continue"
23:41:04 SQL> SET PAUSE ON
23:41:04 SQL> SELECT tab.msgid
FROM "NDEBES"."POST_OFFICE_QUEUE_TABLE" tab
WHERE q_name=:q_name and (state=:state)
FOR UPDATE;
Hit return to continue
Session 1 has retrieved one row up to this point, even though SQL*Plus does not yet
display that row. The SQL trace output, which contains a FETCH call with r=1 (r is short for
rows), proves it.
$ tail -f ten_ora_1724.trc
=====================
PARSING IN CURSOR #2 len=112 dep=0 uid=30 oct=3 lid=30 tim=78932802402 hv=2531579934 ad='6792e910'
SELECT tab.msgid
FROM "NDEBES"."POST_OFFICE_QUEUE_TABLE" tab
WHERE q_name=:q_name and (state=:state)
FOR UPDATE
END OF STMT
PARSE #2:c=0,e=115,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=78932802391
EXEC #2:c=0,e=1290,p=0,cr=16,cu=9,mis=0,r=0,dep=0,og=1,tim=78932838870
FETCH #2:c=0,e=459,p=0,cr=12,cu=0,mis=0,r=1,dep=0,og=1,tim=78932843524
In theory, this leaves six rows worth of messages with STATE=0 for session 2. Let’s see what
happens in session 2.
SQL> ALTER SESSION SET EVENTS '10046 trace name context forever, level 8';
Session altered.
SQL> SELECT userenv('sid') FROM dual;
USERENV('SID')
--------------
           158
SQL> SET ARRAYSIZE 1
SQL> SET PAGESIZE 1
SQL> SET TIME ON
23:45:28 SQL> SET PAUSE "Hit return to continue"
23:45:28 SQL> SET PAUSE ON
23:45:28 SQL> SELECT tab.msgid
FROM "NDEBES"."POST_OFFICE_QUEUE_TABLE" tab
WHERE q_name=:q_name and (state=:state)
FOR UPDATE;
The second session is unable to retrieve any data. Here’s the level 8 SQL trace output from
that session:
*** 2007-07-10 23:45:40.319
WAIT #1: nam='enq: TX - row lock contention' ela= 2999807 name|mode=1415053318 usn<<16 | slot=327721 sequence=660 obj#=16567 tim=79706625648
WAIT #1: nam='enq: TX - row lock contention' ela= 3000541 name|mode=1415053318 usn<<16 | slot=327721 sequence=660 obj#=16567 tim=79709637248
WAIT #1: nam='enq: TX - row lock contention' ela= 2999946 name|mode=1415053318 usn<<16 | slot=327721 sequence=660 obj#=16567 tim=79712642844
WAIT #1: nam='enq: TX - row lock contention' ela= 3000759 name|mode=1415053318 usn<<16 | slot=327721 sequence=660 obj#=16567 tim=79715649132
*** 2007-07-10 23:45:52.347
WAIT #1: nam='enq: TX - row lock contention' ela= 2999721 name|mode=1415053318 usn<<16 | slot=327721 sequence=660 obj#=16567 tim=79718655012
We see repeated waits for a TX lock, due to a lock (enqueue) request constantly being reattempted. A look at V$LOCK confirms that session 1 with SID=134 blocks session 2 with SID=158.
SQL> SELECT sid, type, id1, id2, lmode, request, block
FROM v$lock
WHERE sid IN (134,158)
ORDER BY 1;
SID TYPE    ID1 ID2 LMODE REQUEST BLOCK
--- ---- ------ --- ----- ------- -----
134 TM    16567   0     3       0     0
134 TX   458764 657     6       0     1
158 TM    16567   0     3       0     0
158 TX   458764 657     0       6     0
The TYPE and LMODE in V$LOCK are represented as name|mode=1415053318 in the extended SQL
trace file. This is a decimal number with the upper two bytes representing the enqueue name
as ASCII encoded letters and the lowest byte representing the lock mode. Name and mode are
equivalent to V$SESSION_WAIT.P1 when the wait in V$SESSION_WAIT.EVENT is for a TX enqueue.
Oracle9i uses the generic wait event name enqueue, whereas Oracle10g uses enq: TX - row lock
contention. Oracle10g provides the same information in V$SESSION.EVENT and V$SESSION.P1
as well.
You have two options for converting name and mode into a more human-readable format.
The first one uses decimal to hexadecimal and decimal to ASCII conversion, whereas the second
relies entirely on SQL. The UNIX utility bc (an arbitrary precision calculator) may be used to
convert between decimal and hexadecimal numbers (most implementations of bc do not support
comments (#), but the one by the Free Software Foundation does). Here’s the output of a bc
session, which accomplishes the conversion:
$ bc
bc 1.06
Copyright 1991-1994, 1997, 1998, 2000 Free Software Foundation, Inc.
obase=16 # ask for hexadecimal output, i.e. output base 16
1415053318 # convert name and mode to hexadecimal
54580006
# lock mode is 6 (exclusive) and enqueue name is hex 5458, let's
# convert that to decimal
obase=10 # ask for output in decimal, i.e. base 10
ibase=16 # input will be hexadecimal
54 # hex
84 # is 84 decimal
58 # hex
88 # is 88 decimal
Now we can use the function CHR in SQL*Plus to convert from decimal to ASCII.
SQL> SELECT chr(84)||chr(88) AS name FROM dual;
NAME
----
TX
Alternatively, a SELECT statement that incorporates the function BITAND may be used to
extract the information from the decimal name and mode.
SQL> VARIABLE p1 NUMBER
SQL> EXEC :p1:=1415053318
SQL> SELECT chr(to_char(bitand(:p1,-16777216))/16777215)||
  chr(to_char(bitand(:p1, 16711680))/65535) "Name",
  to_char( bitand(:p1, 65535) ) "Mode"
FROM dual;
Name Mode
---- ----
TX   6
Let’s go back to session 1 and hit the return key, such that it can continue.
23:41:04 SQL> SELECT tab.msgid
FROM "NDEBES"."POST_OFFICE_QUEUE_TABLE" tab
WHERE q_name=:q_name and (state=:state)
FOR UPDATE;
Hit return to continue
34F01D98AF0444FF91B10C6D00CB5826
Hit return to continue
The row previously fetched by SQL*Plus session 1 is now displayed and the SQL trace file
shows that another FETCH call was done.
FETCH #1:c=0,e=158,p=0,cr=12,cu=0,mis=0,r=1,dep=0,og=1,tim=79430535911
*** 2007-07-10 23:48:01.833
FETCH #1:c=0,e=19466,p=0,cr=1,cu=0,mis=0,r=2,dep=0,og=1,tim=79848141763
SQL*Plus pauses again after displaying one row, since PAGESIZE=1 is set. Switching back to
session 2, there is still no progress. Even though I prevented SQL*Plus from fetching all the relevant rows with a single FETCH call by reducing ARRAYSIZE from its default of 15 to just 1, the whole
process is single-threaded, so there is no opportunity to benefit from a multiprocessor system.
Finally, after fetching all rows and committing in session 1, session 2 can retrieve the
matching rows. In the real world though, there would be no rows left for session 2 to process,
since session 1 would have changed the status to a value other than 0 after finishing its job.
Here’s the COMMIT in session 1:
Hit return to continue
2544AA9A68C54A9FB9B6FE410574D85A
7 rows selected.
23:56:03 SQL>
23:56:05 SQL> COMMIT;
Commit complete.
23:56:17 SQL>
Now session 2 wakes up.
Hit return to continue
34F01D98AF0444FF91B10C6D00CB5826
…
2544AA9A68C54A9FB9B6FE410574D85A
7 rows selected.
Even though the normal processing of SQL*Plus was altered, the result of this test is disappointing. However, taking locking strategies of relational databases into account, the behavior
observed is exactly what one should expect. Let’s investigate what happens when we add SKIP
LOCKED to the picture.
Session 1:
00:50:41 SQL> ALTER SESSION SET EVENTS '10046 trace name context forever, level 8';
Session altered.
00:50:51 SQL> SET ARRAYSIZE 1
00:50:51 SQL> SET PAGESIZE 1
00:50:51 SQL> SET TIME ON
00:50:51 SQL> SET PAUSE "Hit return to continue"
00:50:51 SQL> SET PAUSE ON
00:50:51 SQL> SELECT tab.msgid
00:50:51 FROM "NDEBES"."POST_OFFICE_QUEUE_TABLE" tab
00:50:51 WHERE q_name=:q_name and (state=:state)
00:50:51 FOR UPDATE SKIP LOCKED;
Hit return to continue
Session 2:
00:50:44 SQL> ALTER SESSION SET EVENTS '10046 trace name context forever, level 8';
Session altered.
00:51:00 SQL> SET FEEDBACK ON
00:51:00 SQL> SET ARRAYSIZE 1
00:51:00 SQL> SET PAGESIZE 1
00:51:00 SQL> SET TIME ON
00:51:00 SQL> SET PAUSE "Hit return to continue"
00:51:00 SQL> SET PAUSE ON
00:51:00 SQL> SELECT tab.msgid
FROM "NDEBES"."POST OFFICE QUEUE TABLE" tab
WHERE q name=:q name and (state=:state)
FOR UPDATE SKIP LOCKED;
Hit return to continue
Both sessions are waiting for input in order to display the first row. Note how session 2
now prints “Hit return to continue”. This was not the case in the first test, since the FETCH call
of session 2 could not complete due to the wait for a TX enqueue. Now hit return in session 2
…
Hit return to continue
54783A999BB544419CDB0D8D44702CD3
Hit return to continue
Session 2 succeeded in retrieving a row, even though session 1 issued SELECT FOR UPDATE
SKIP LOCKED first. Now hit return in session 1.
…
Hit return to continue
34F01D98AF0444FF91B10C6D00CB5826
Hit return to continue
Session 1 also retrieved a row. Hitting the return key while alternating between sessions
shows that both get a fair share of the rows. Looking at V$LOCK confirms that this time neither
session is blocked.
SQL> SELECT sid, type, id1, id2, lmode, request, block
FROM v$lock WHERE sid IN (134,158)
ORDER BY 1;
SID TYPE    ID1 ID2 LMODE REQUEST BLOCK
--- ---- ------ --- ----- ------- -----
134 TM    16567   0     3       0     0
134 TX   196644 644     6       0     0
158 TX    65539 666     6       0     0
158 TM    16567   0     3       0     0
Table 16-1 illustrates the entire sequence chronologically. Rows further down in the table
correspond to a later point in time. The timestamps (format HH24:MI:SS) printed by
SQL*Plus due to SET TIME ON serve to further document the sequence of events.
Table 16-1. Concurrent SELECT FOR UPDATE SKIP LOCKED

Session 1                                    Session 2
-------------------------------------------  -------------------------------------------
00:50:51 SQL> SELECT tab.msgid
FROM "NDEBES"."POST_OFFICE_QUEUE_TABLE"
tab WHERE q_name=:q_name AND (state=:state)
FOR UPDATE SKIP LOCKED;
Hit return to continue
                                             00:51:00 SQL> SELECT tab.msgid
                                             FROM "NDEBES"."POST_OFFICE_QUEUE_TABLE"
                                             tab WHERE q_name=:q_name AND (state=:state)
                                             FOR UPDATE SKIP LOCKED;
                                             Hit return to continue
34F01D98AF0444FF91B10C6D00CB5826
Hit return to continue
                                             54783A999BB544419CDB0D8D44702CD3
                                             Hit return to continue
25CC997C9ADE48FFABCE33E62C18A7F3
Hit return to continue
                                             B78950028B3A4F42A5C9460DDDB9F9D7
                                             Hit return to continue
353D44D753494D78B9C5E7B515263A6D
3 rows selected.
00:57:43 SQL>
                                             89A118C42BFA4A22AE31932E3426E493
                                             Hit return to continue
                                             2544AA9A68C54A9FB9B6FE410574D85A
                                             4 rows selected.
                                             00:57:45 SQL>
This time the results are excellent. Both sessions fetched a fair share of the rows, no row
was fetched by more than one session, and no rows were skipped. Both sessions were properly
isolated from each other due to locking, yet no session blocked the other. The level 8 extended
SQL trace files for both sessions confirm this in that they do not contain a single wait for a TX
lock during the entire sequence of FETCH calls. The approach works just as well with three or
more sessions.
At the beginning of this chapter, I recommended the use of AQ for good reason. As we have
seen, it is inherently scalable, since it uses the undocumented SKIP LOCKED
clause when dequeuing messages. Another compelling reason for using AQ is that it has no
requirement for polling to find out whether messages are available. I have seen several
systems that were under high load due to applications that asked the DBMS instance several
dozen times a second whether there was work to do. In AQ parlance, one would say that these
applications were looking for messages to consume. The poor implementation resulted in one
CPU being almost 100% busy all the time.
A closer look at how DBMS_AQ.DEQUEUE is implemented reveals that it is possible to wait one
or more seconds for a message to arrive. Database sessions that request a message do not keep
a CPU busy while they are waiting. Instead, the session is put to sleep on the wait event Streams
AQ: waiting for messages in the queue (in Oracle10g). The first parameter of this wait event
(V$SESSION_WAIT.P1) is the object identifier of the queue in the data dictionary (DBA_OBJECTS.
OBJECT_ID). V$SESSION_WAIT.P3 holds the time (in seconds) that was used in the call to
DBMS_AQ.DEQUEUE.
SQL> SELECT p1, p1text, p2, p2text, p3, p3text
FROM v$session_wait
WHERE event='Streams AQ: waiting for messages in the queue';
   P1 P1TEXT           P2 P2TEXT    P3 P3TEXT
----- -------- ---------- -------- --- ---------
16580 queue id 1780813772 process# 120 wait time
SQL> SELECT object_name, object_type FROM dba_objects WHERE object_id=16580;
OBJECT_NAME            OBJECT_TYPE
---------------------- -----------
CAUGHT_IN_SLOW_Q_AGAIN QUEUE
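To observe this wait event firsthand, a dequeue call with a non-zero timeout, such as in the following minimal sketch, may be issued against the queue created earlier in this chapter; the timeout of 60 seconds is an arbitrary value:
SQL> DECLARE
  dequeue_options dbms_aq.dequeue_options_t;
  message_properties dbms_aq.message_properties_t;
  payload blob;
  msgid raw(16);
BEGIN
  dequeue_options.wait:=60; -- sleep up to 60 seconds on the wait event instead of polling
  DBMS_AQ.DEQUEUE (
    queue_name => 'caught_in_slow_q_again',
    dequeue_options => dequeue_options,
    message_properties => message_properties,
    payload => payload,
    msgid => msgid);
  COMMIT;
END;
/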
By the way, if you are merely trying to protect a shared resource from being used simultaneously by different processes, consider using DBMS_LOCK instead of implementing a sort of
mutual exclusion or semaphore mechanism with a table. I’m addressing this point because I
have seen postings on the Internet that suggest using SELECT FOR UPDATE SKIP LOCKED to implement a resource control table, i.e., a dedicated table that has one row per resource and a status
column. The value of the status column would indicate whether the resource was available or
not. Obviously, frequent concurrent accesses to such a table will incur waits for TX locks,
unless the undocumented SKIP LOCKED clause is used.
DBMS_LOCK—A Digression
DBMS_LOCK allows you to request and release locks in shared as well as exclusive mode. What’s
more, locks can even be converted between modes. With regard to redo generation, calling
DBMS_LOCK is also preferable to implementing a locking mechanism based on a table, which
needs to be updated in order to reflect the status of locks. The use of DBMS_LOCK.REQUEST/RELEASE to
obtain and release locks does not generate any redo. DBMS_LOCK is fully documented in the PL/
SQL Packages and Types Reference, but a small example for its use is in order.
Lock names or numbers must be agreed upon by all components of an application. To
make sure different applications do not interfere with each other by accidentally using the
same lock number, DBMS_LOCK provides a way to convert a name for a lock to a lock handle,
which then replaces the rather meaningless lock number. All sessions wishing to use the same
lock (e.g., MYAPP_MUTEX1 in the example below) must call DBMS_LOCK.ALLOCATE_UNIQUE once, to
convert the lock name to a lock handle.
SQL> VARIABLE lockhandle VARCHAR2(128)
SQL> BEGIN
  -- get a lock handle for the lockname that was agreed upon
  -- make sure you choose a unique name, such that other vendors' applications
  -- won't accidentally interfere with your locks
  DBMS_LOCK.ALLOCATE_UNIQUE(
    lockname => 'MYAPP_MUTEX1',
    lockhandle => :lockhandle
  );
END;
/
print lockhandle
LOCKHANDLE
------------------------------
10737418641073741864187
There’s now a new row in table SYS.DBMS_LOCK_ALLOCATED.
SQL> SELECT * FROM sys.dbms_lock_allocated /* no public synonym */;
NAME                                         LOCKID EXPIRATION
---------------------------------------- ---------- --------------
DROP_EM_USER:SYSMAN                      1073741824 13.03.07 17:45
ORA$KUPV$MT-SYSTEM.SYS_EXPORT_SCHEMA_01  1073741844 09.03.07 15:51
ORA$KUPV$JOB_SERIALIZE                   1073741845 09.03.07 15:51
ORA$KUPM$SYSTEM$SYS_EXPORT_SCHEMA_01     1073741846 09.03.07 15:48
MYAPP_MUTEX1                             1073741864 20.07.07 19:52
As is obvious from the other entries above, the ORACLE DBMS uses DBMS_LOCK for internal
purposes.
SQL> VARIABLE result NUMBER
SQL> BEGIN
  -- request the lock with the handle obtained above in exclusive mode
  -- the first session which runs this code succeeds
  :result:=DBMS_LOCK.REQUEST(
    lockhandle => :lockhandle,
    lockmode => DBMS_LOCK.X_MODE,
    timeout => 0,
    release_on_commit => TRUE /* default is false */
  );
END;
/
SELECT decode(:result,0,'Success',
  1,'Timeout',
  2,'Deadlock',
  3,'Parameter error',
  4,'Already own lock specified by id or lockhandle',
  5,'Illegal lock handle') Result
FROM dual;
RESULT
-------
Success
The second session (the initial call to DBMS_LOCK.ALLOCATE_UNIQUE is omitted) may then run
the same code.
SQL> VARIABLE result NUMBER
SQL> BEGIN
  :result:=DBMS_LOCK.REQUEST(
    lockhandle => :lockhandle,
    lockmode => DBMS_LOCK.X_MODE,
    timeout => 0,
    release_on_commit => TRUE /* default is false */
  );
END;
/
SQL> SELECT decode(:result,1,'Timeout') Result FROM dual;
RESULT
-------
Timeout
The second session was unable to obtain the lock held by the first session in exclusive mode.
The time spent waiting for the lock is accounted for by the wait event enqueue in Oracle9i and
enq: UL - contention in Oracle10g. The abbreviation UL means user lock. After session 1 has
committed, session 2 is able to obtain the lock, since session 1 had specified RELEASE_ON_COMMIT=
TRUE in its lock request.
Session 1:
SQL> COMMIT;
Commit complete.
Session 2:
BEGIN
  :result:=DBMS_LOCK.REQUEST(
    lockhandle => :lockhandle,
    lockmode => DBMS_LOCK.X_MODE,
    timeout => 0,
    release_on_commit => TRUE /* default is false */
  );
END;
/
SQL> SELECT decode(:result,0,'Success') Result FROM dual;
RESULT
-------
Success
As you can see, very little code is required to lock resources for exclusive use with DBMS_LOCK.
Just like with AQ, there is no need for any resource-intensive polling, since a non-zero timeout
may be used when waiting for a lock. Waiting sessions sleep on the event enq: UL - contention,
in case another session holds the requested lock in an incompatible mode. The second parameter
of this wait event (V$SESSION_WAIT.P2) is the user lock identifier (LOCKID) in the dictionary view
SYS.DBMS_LOCK_ALLOCATED.
SQL> SELECT p1, p1text, p2, p2text, wait_class, seconds_in_wait, state
FROM v$session_wait
WHERE event='enq: UL - contention';
        P1 P1TEXT            P2 P2TEXT WAIT_CLASS  SECONDS_IN_WAIT STATE
---------- --------- ---------- ------ ----------- --------------- -------
1431044102 name|mode 1073741864 id     Application             121 WAITING
SQL> SELECT name
FROM sys.dbms_lock_allocated la, v$session_wait sw
WHERE sw.event='enq: UL - contention'
AND la.lockid=sw.p2;
NAME
------------
MYAPP_MUTEX1
At the end of the day, DBMS_LOCK is no more, but also no less, than the ORACLE database
server’s powerful enqueue mechanism externalized through a PL/SQL interface.
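For locks requested with RELEASE_ON_COMMIT=>FALSE (the default), the documented function DBMS_LOCK.RELEASE frees the lock explicitly. The following minimal sketch reuses the bind variables from the preceding examples:
SQL> BEGIN
  -- release the lock identified by the handle from ALLOCATE_UNIQUE
  :result:=DBMS_LOCK.RELEASE(lockhandle => :lockhandle);
END;
/
SQL> SELECT decode(:result,0,'Success',
  3,'Parameter error',
  4,'Do not own lock specified by id or lockhandle',
  5,'Illegal lock handle') Result
FROM dual;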
Source Code Depot
Table 16-2 lists this chapter’s source files and their functionality.
Table 16-2. SELECT FOR UPDATE SKIP LOCKED Source Code Depot

File Name             Functionality
--------------------  -----------------------------------------------------------
create_queue_deq.sql  Creates an AQ queue table and attempts to dequeue a message
aq_enq.sql            Enqueues a message into a queue
dbms_lock.sql         Illustrates how to use the package DBMS_LOCK
PART 6
■■■
Supplied PL/SQL Packages
CHAPTER 17
■■■
DBMS_BACKUP_RESTORE
The package DBMS_BACKUP_RESTORE is undocumented in the Oracle9i Supplied PL/SQL Packages and Types Reference as well as in Oracle Database PL/SQL Packages and Types Reference of
Oracle10g and subsequent releases. Restoring database files with Recovery Manager (RMAN)
requires either that a control file with bookkeeping information on the files to restore is mounted
or that a database session to an RMAN catalog with like information is available. DBMS_BACKUP_RESTORE
makes possible a restore without RMAN in a disaster scenario. Such a scenario is characterized by the loss of all current control files, and the lack of or unavailability of a recovery
catalog or control file backup that contains records of the most recent data file and archived
redo log backups. Note that bookkeeping information on backups is aged out of the control file
depending on the setting of the initialization parameter CONTROL_FILE_RECORD_KEEP_TIME. The
default setting of this parameter is merely seven days.
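As a precaution, you may want to check and raise the current setting. The following minimal sketch uses a retention of 30 days, which is an arbitrary choice; the parameter is dynamic, so ALTER SYSTEM suffices:
SQL> SHOW PARAMETER control_file_record_keep_time
SQL> ALTER SYSTEM SET control_file_record_keep_time=30;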
Recovery Manager
Recovery Manager (RMAN) is a backup and recovery utility that was introduced with Oracle8.
RMAN is a database client just like SQL*Plus, Data Pump Export/Import, SQL*Loader, or any
other program that can interact with an ORACLE instance through Oracle Call Interface (OCI).
RMAN itself does not back up or restore any part of an ORACLE database. If you recall that
RMAN can run on any machine within a network, connect to an ORACLE instance on a remote
database server, and perform backup and recovery, it is quite clear that RMAN neither needs to
nor has the capability to read or write any database files. Instead, RMAN uses the undocumented
package DBMS BACKUP RESTORE to instruct an ORACLE instance to perform backup and restore
operations.
The ORACLE DBMS supports mounted file systems and so-called media managers for
storage of backups. A media manager is software that integrates with an ORACLE instance
through the SBT (System Backup to Tape, see Oracle Database Backup and Recovery Basics 10g
Release 2) interface and is capable of controlling tape devices for storing backups. The SBT
interface is a specification by Oracle Corporation. The specification is disseminated to software companies wishing to support RMAN-based backup and recovery. The SBT interface is
not documented in ORACLE DBMS documentation and is usually implemented as a shared
library (a.k.a. dynamic link library or DLL), which is used by the program $ORACLE_HOME/bin/
oracle (or oracle.exe on Windows) that implements an ORACLE instance. Oracle Corporation
provides an implementation of the SBT interface that is linked with the executable oracle[.exe] by
default. Usually, this executable is relinked with a shared library shipped with media management software, to enable an ORACLE instance to talk to a media manager.
Oracle Corporation recently entered the circle of companies that provide media management software by offering Oracle Secure Backup (see Oracle Secure Backup Administrator’s
Guide). Other well-known players are Hewlett-Packard with OmniBack, Symantec with NetBackup,
and Tivoli Software with Tivoli Storage Manager.
DBMS_BACKUP_RESTORE makes extensive use of the control file to keep track of what was
backed up, when, and how. Here, when means at what time the backup started and when it
ended. What means what type of file, such as control file, data file, or archived redo log file.
How refers to the medium that holds the backup, such as a file system mounted on the database server or a media manager, as well as the incremental level at which the backup was taken.
Note that Oracle10g only supports incremental levels 0 and 1, whereas Oracle9i and previous
releases supported levels 0 to 4. Considering that there are cumulative backups on top of the
default differential backups, there is no truly compelling argument for more than two levels.
In what follows, the data inside the control file representing the when, what, and how will
be called backup metadata. Please refer to Table 17-1 for an overview of some of the V$ views
that provide access to the metadata.
Table 17-1. Backup-Related V$ Views

V$ View            Purpose                                        Related RMAN Command
-----------------  ---------------------------------------------  --------------------------
V$ARCHIVED_LOG     Archived redo log files, their thread          BACKUP ARCHIVELOG
                   number, sequence number, and status
                   (available, deleted, expired, or unavailable)
V$BACKUP_DATAFILE  Backups of data files (FILE# > 0) and          BACKUP DATABASE
                   control files (CONTROLFILE_TYPE IS NOT         BACKUP TABLESPACE
                   NULL); both types of files may be present      BACKUP DATAFILE
                   in the same backup piece since a backup        BACKUP CURRENT CONTROLFILE
                   of file 1, which belongs to tablespace
                   SYSTEM, also backs up the control file,
                   unless automatic control file backups are
                   enabled (AUTOBACKUP)
V$BACKUP_PIECE     Backup pieces with creation time,              BACKUP
                   device type, status, and path name
                   (column HANDLE)
V$BACKUP_SET       Backup sets consist of one or more             BACKUP
                   backup pieces, in case the files
                   contained in a backup set exceed
                   the maximum piece size
V$BACKUP_SPFILE    Backups of the server parameter file           BACKUP SPFILE
V$BACKUP_REDOLOG   Backups of archived redo logs                  BACKUP ARCHIVELOG
                                                                  DELETE ARCHIVELOG
                                                                  CATALOG ARCHIVELOG
ORACLE DBMS releases up to Oracle8i release 3 (8.1.7) had a considerable vulnerability
due to the fact that the database—the control file to be precise—is used to keep track of backups.
Of course, RMAN supported a recovery catalog for duplicating recovery-related information in
the control file right from the first release with Oracle8. The recovery catalog enables restore
operations when no control file is available. Contrary to the control file, backup metadata in a
recovery catalog is exempt from overwrites (parameter CONTROL_FILE_RECORD_KEEP_TIME). This
left the architecture with the following two vulnerabilities:
• Metadata on backups that were taken without connecting to a catalog (NOCATALOG command
line switch) and were never propagated by a subsequent run of RMAN are not registered
in the recovery catalog. Let’s say a backup script is smart enough to check the availability
of the catalog and to run RMAN with the NOCATALOG option, should the availability test
fail. Should you lose the most recent control file before the catalog and control file are
resynchronized, you would not be able to restore the most recent archived redo log files,
even if the recovery catalog outage was already over.
• After the loss of the most recent control file, recovery will not be possible while there is
an outage of the recovery catalog.
No special action is required to resynchronize backup metadata from the control file with
the recovery catalog. If needed, RMAN automatically propagates metadata records from the
control file to the recovery catalog whenever an RMAN command is run while connected to
a catalog. This feature may be leveraged to synchronize the control file metadata with two or
more recovery catalogs on different database servers, thus achieving higher availability of the
catalog without resorting to clustering or replication technologies. Merely run RMAN one more
time after the backup has finished, connect to an additional recovery catalog (e.g., with CONNECT
CATALOG), and execute the RMAN command RESYNC CATALOG.
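A minimal sketch of such an extra resynchronization run follows; the connect string rman/secret@catdb2 for the additional catalog is merely a placeholder:
$ rman target / catalog rman/secret@catdb2
RMAN> RESYNC CATALOG;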
The aforementioned vulnerabilities could be worked around by creating a control file copy
(ALTER DATABASE BACKUP CONTROLFILE TO 'path') and backing up that copy with a file system
backup utility. The downside of this approach is that the administrator performing the restore
needs to be familiar with, and privileged to use, both RMAN and file system restore utilities. Often, privileges for file system restores are available only to privileged operating system
users such as root on UNIX systems or the group Administrators on Windows systems. Thus,
a DBA would need assistance from a system administrator to perform restore operations.
Oracle9i RMAN shipped with the new automatic control file backup functionality,1 which
addresses the issues discussed above. Automatic backup of the control file after each backup
makes sure that the most recent copy of the control file, i.e., the only copy that contains all the
bookkeeping information required to restore the last backup, is either backed up to disk or to a
media manager. However, this feature is disabled by default, such that databases remain vulnerable, unless the DBA enables this new feature.
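Enabling the feature takes a single RMAN command; in the following sketch, SHOW merely verifies the resulting configuration:
RMAN> CONFIGURE CONTROLFILE AUTOBACKUP ON;
RMAN> SHOW CONTROLFILE AUTOBACKUP;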
For the first time, thanks to automatic control file backup, the most recent control file can
be restored without accessing a recovery catalog. An example of how to restore a control file
from an automatic control file backup is shown in the next code segment. The database identifier
(V$DATABASE.DBID), which is stored inside the control file, is required by restore operations.
1. See documentation on the commands CONFIGURE CONTROLFILE AUTOBACKUP ON and CONFIGURE
CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE in [OL92 2002].
167
168
CHAPTER 17 ■ DBMS_BACKUP_RESTORE
To restore an automatic control file backup after the loss of all current control files, the database identifier needs to be set with the command SET DBID.2
C:> rman target /
Recovery Manager: Release 10.2.0.1.0 - Production on Mon Jul 2 18:32:08 2007
connected to target database: TEN (not mounted)
RMAN> SET DBID 2848896501;
executing command: SET DBID
RMAN> RUN {
RESTORE CONTROLFILE FROM AUTOBACKUP;
}
Starting restore at 02.07.07 18:32
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=156 devtype=DISK
channel ORA_DISK_1: looking for autobackup on day: 20070702
channel ORA_DISK_1: autobackup found: c-2848896501-20070702-02
channel ORA_DISK_1: control file restore from autobackup complete
output filename=C:\ORADATA\TEN\CONTROL01.CTL
Finished restore at 02.07.07 18:32
The default format for automatic control file backups is %F. This translates to C-database_
id-YYYYMMDD-QQ, where QQ represents a hexadecimal sequence number between 00 and FF
(0-255 in decimal). The remaining format string adheres to the well-known SQL date and time
format models. In case a custom format has been configured—which I strongly discourage—a
RESTORE command for an automatic control file backup must be preceded by the command SET
CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE device_type TO 'autobackup_format', to let RMAN
know the non-default format. By the way, it is undocumented how RMAN uses the hexadecimal
sequence number QQ. It serves to generate unique file names for automatic control file backups
within a single day. Remember that due to the date format “YYYYMMDD” without a time
component, all automatic control file backups within a single day would have the same filename or handle (V$BACKUP_PIECE.HANDLE). As the clock strikes 12 and a new day is about to
dawn, RMAN begins using the new value of “YYYYMMDD” and resets QQ to zero. But what if
more than 256 automatic control file backups were written within a single day? Then RMAN
leaves QQ at FF and overwrites the backup piece with sequence FF (QQ=FF). Here is the proof:
SQL> SELECT completion_time, handle
FROM v$backup_piece
WHERE handle LIKE '%FF';
COMPLETION_TIME HANDLE
--------------- ------------------------
02.07.07 22:04  C-2848896501-20070702-FF
After another automatic control file backup on the same day, the backup piece name has
been reused and the query yields:
2. In Oracle10g, it is possible to embed the database identifier in the backup piece name by using the
placeholder %I in the FORMAT specification. Thus, even if no log files from past backups remain, the
database identifier may be derived from the backup piece names in the media manager repository.
SQL> SELECT completion_time, handle
FROM v$backup_piece
WHERE handle LIKE '%FF';
COMPLETION_TIME HANDLE
--------------- ------------------------
02.07.07 22:30  C-2848896501-20070702-FF
The first control file autobackup one day later again uses the hexadecimal sequence
number 00.
SQL> SELECT completion_time, handle
FROM v$backup_piece
WHERE handle like '%00';
COMPLETION_TIME HANDLE
--------------- ------------------------
02.07.07 17:17  C-2848896501-20070702-00
03.07.07 22:51  C-2848896501-20070703-00
This is the only situation that I am aware of where RMAN overwrites existing backup
pieces with new data. Under all other circumstances, RMAN refuses to overwrite existing files
and aborts the command BACKUP with “ORA-19506: failed to create sequential file, name=
"string", parms="string" for device type SBT_TAPE” or “ORA-19504: failed to create file "string"
for device type disk” and “ORA-27038: created file already exists”.
This behavior sheds some light on the algorithm RMAN uses to generate backup piece
handles in its quest for a control file from an automatic control file backup. Among others, the
settings of the optional parameters MAXDAYS and MAXSEQ in the command RESTORE CONTROLFILE
FROM AUTOBACKUP determine the handle names RMAN generates. If MAXSEQ is not set, RMAN uses
the default of 255, translates it to hexadecimal FF, and builds a handle name according to the
%F format presented earlier. If no such backup piece exists, the sequence number (QQ) is decremented by 1 until it reaches 0. If no control file backup is found during this process, RMAN moves
on to the previous day and recommences the process with a hexadecimal sequence number of FF.
The default for MAXDAYS is 7 with a permissible range between 1 and 366. If, after MAXDAYS days
before the current day have been tried, RMAN still fails to locate a suitable backup piece, the
RESTORE command terminates with the error “RMAN-06172: no autobackup found or specified
handle is not a valid copy or piece”. The search may be accelerated somewhat by using a lower
MAXSEQ such as 10 in cases where it is known that no more than 10 BACKUP commands get executed
per day due to data file and archived log backups.
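A restore command that narrows the search in this way might look like the following sketch, which reuses the database identifier from the earlier example; the values 10 and 3 are arbitrary:
RMAN> SET DBID 2848896501;
RMAN> RUN {
  RESTORE CONTROLFILE FROM AUTOBACKUP MAXSEQ 10 MAXDAYS 3;
}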
The new automatic control file backup functionality—which also backs up the server
parameter file (SPFILE), if present—is disabled by default (page 2-69 of the Oracle Database
Backup and Recovery Reference 10g Release 2). There certainly are production systems that are
still vulnerable, since no backup—automatic or not—of the control file with the most recent
recovery-related metadata exists.
Undoubtedly, the likelihood of the two vulnerability scenarios described earlier is low. Nonetheless, it is
reassuring to learn that even on systems where the quality of backup scripts is insufficient, the
road to full recovery is still open when following the advice on the undocumented package
DBMS_BACKUP_RESTORE that I will present shortly.
Disaster Recovery Case Study with Tivoli Data
Protection for Oracle
Tivoli Data Protection for ORACLE (TDPO) is software that conforms to Oracle Corporation’s
SBT specification and enables integration of RMAN-based backup and recovery with Tivoli
Storage Manager (TSM) through a shared library (see http://www.tivoli.com). It supports the
client platforms AIX, HP-UX, Solaris, Linux, and Windows.
From the perspective of the TSM server, an ORACLE instance is a backup client. Using the
TSM utility dsmadmc, it is possible to retrieve the names of backup pieces. If a control file were
available, these would match the backup piece names in V$BACKUP_PIECE.HANDLE. The TSM
server accepts SQL statements for retrieving files backed up by a client. The column NODE_NAME
stores the name of the TSM client used by the database server host. If you have given different
file name prefixes to control file, data file, and archived log backups, it will pay off now.
dsmadmc> SELECT * FROM backups WHERE node_name='DBSERVER'
ANR2963W This SQL query may produce a very large result table, or may
require a significant amount of time to compute.
Do you wish to proceed? (Yes (Y)/No (N)) y
NODE_NAME: DBSERVER
FILESPACE_NAME: /adsmorc
FILESPACE_ID: 1
STATE: ACTIVE_VERSION
TYPE: FILE
HL_NAME: //
LL_NAME: CF-DB-NINE-20071010-vsiu61og
OBJECT_ID: 201461790
BACKUP_DATE: 2007-10-10 14:43:27.000000
DEACTIVATE_DATE:
OWNER: oraoper
CLASS_NAME: DEFAULT
In the output from dsmadmc, you would look for the most recent control file backup or the
most recent backup piece that contains a control file. Control files are backed up whenever file 1,
the first file of tablespace SYSTEM, is backed up. The script dbms_backup_restore_cf_tdpo.sql,
which is reproduced in the following code segment, may be used to restore a control file without
mounting a database and accessing a catalog or automatic control file backup. The LL_NAME
found in the TSM repository is used as the piece name in the call to DBMS_BACKUP_RESTORE.
variable type varchar2(10)
variable ident varchar2(10)
variable piece1 varchar2(513)
begin
  :type:='SBT_TAPE';
  :ident:='channel1';
  :piece1:='CF-DB-NINE-20071010-vsiu61og';
end;
/
set serveroutput on
DECLARE
  v_devtype VARCHAR2(100);
  v_done BOOLEAN;
  v_maxPieces NUMBER;
  TYPE t_pieceName IS TABLE OF varchar2(513) INDEX BY binary_integer;
  v_piece_name_tab t_pieceName;
BEGIN
  -- Define the backup pieces (names from the RMAN log file or TSM repository)
  v_piece_name_tab(1) := :piece1;
  --v_piece_name_tab(2) := '<backup piece name 2>';
  v_maxPieces := 1;
  -- Allocate a channel (Use type=>null for DISK, type=>'sbt_tape' for TAPE)
  v_devtype := DBMS_BACKUP_RESTORE.deviceAllocate(
    type => :type,
    ident => :ident,
    params => 'ENV=(TDPO_OPTFILE=/usr/tivoli/tsm/client/oracle/bin64/tdpo.opt)'
  );
  dbms_output.put_line('device type '||v_devtype);
  -- begin restore conversation
  DBMS_BACKUP_RESTORE.restoreSetDataFile(check_logical=>false);
  -- set restore location with CFNAME parameter
  DBMS_BACKUP_RESTORE.restoreControlFileTo(cfname=>'/tmp/control.ctl');
  FOR i IN 1..v_maxPieces LOOP
    dbms_output.put_line('Restoring from piece '||v_piece_name_tab(i));
    DBMS_BACKUP_RESTORE.restoreBackupPiece(handle=>v_piece_name_tab(i),
      done=>v_done, params=>null);
    exit when v_done;
  END LOOP;
  -- Deallocate the channel
  DBMS_BACKUP_RESTORE.deviceDeAllocate(:ident);
EXCEPTION WHEN OTHERS THEN
  DBMS_BACKUP_RESTORE.deviceDeAllocate(:ident);
  RAISE;
END;
/
Of course, an instance must be running to be able to execute PL/SQL, but STARTUP NOMOUNT
is sufficient to run the script in SQL*Plus. Normal PL/SQL packages can only be executed when
the database is open, since the so-called PL/SQL mpcode in the data dictionary is read. Some
kind of undocumented “magic” makes it possible to execute DBMS_BACKUP_RESTORE without
even mounting a database. Following is a sample run of the control file restore script
dbms_backup_restore_cf_tdpo.sql:
$ sqlplus "/ as sysdba"
SQL*Plus: Release 9.2.0.8.0 - Production on Wed Oct 10 21:33:44 2007
Connected to:
Oracle9i Enterprise Edition Release 9.2.0.8.0 - 64bit Production
SQL> @dbms_backup_restore_cf_tdpo.sql
PL/SQL procedure successfully completed.
device type SBT_TAPE
Restoring from piece CF-DB-NINE-20071010-vsiu61og
PL/SQL procedure successfully completed.
SQL> !ls -l /tmp/control.ctl
-rw-r----- 1 oracle dba 1413120 Oct 10 21:33 /tmp/control.ctl
The control file was restored as file /tmp/control.ctl. If this control file contains the backup
piece names of the most recent data file and archived log backups, the database should be
mounted and the restore continued with RMAN in the usual manner. However, if no control
file backup with the most recent backup piece names exists, simply because the control file was
not backed up after the last backup (e.g., because automatic control file backup is disabled), data
files and archived logs must also be restored by calling DBMS_BACKUP_RESTORE. For these purposes,
the source code depot contains the scripts dbms_backup_restore.sql and dbms_backup_restore_
arch_tdpo.sql.
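To convey the idea, the following condensed sketch restores a single data file through a disk channel. It follows the same conversation pattern as the control file script above; since DBMS_BACKUP_RESTORE is undocumented, treat this as a sketch, and note that the file number, target path, and backup piece handle are hypothetical values that must be taken from a backup log or the media manager repository:
DECLARE
  v_devtype VARCHAR2(100);
  v_done BOOLEAN;
BEGIN
  -- calling deviceAllocate without arguments allocates a disk channel
  v_devtype := DBMS_BACKUP_RESTORE.deviceAllocate;
  DBMS_BACKUP_RESTORE.restoreSetDataFile(check_logical=>false);
  -- file 1 is the first file of tablespace SYSTEM; the target path is hypothetical
  DBMS_BACKUP_RESTORE.restoreDataFileTo(dfnumber => 1,
    toname => '/oradata/TEN/system01.dbf');
  DBMS_BACKUP_RESTORE.restoreBackupPiece(handle => '/backup/TEN/piece_1',
    done => v_done, params => null);
  DBMS_BACKUP_RESTORE.deviceDeAllocate;
END;
/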
Source Code Depot
Table 17-2 lists this chapter’s source files and their functionality.
Table 17-2. DBMS_BACKUP_RESTORE Source Code Depot

File Name                          Functionality
---------------------------------  ------------------------------------------------------
dbms_backup_restore.sql            Disaster recovery script for restoring a control file
                                   and multiple data files using a disk channel
dbms_backup_restore_arch_tdpo.sql  Disaster restore script for restoring archived redo
                                   logs from a backup set, which may consist of several
                                   backup pieces, using TDPO (device type SBT_TAPE)
dbms_backup_restore_cf_tdpo.sql    Disaster restore script for restoring a control file
                                   using TDPO (device type SBT_TAPE)
CHAPTER 18
■■■
DBMS_IJOB
The package DBMS_IJOB is an undocumented PL/SQL package that is called internally by DBMS_JOB.
By using DBMS_IJOB directly, the limitations inherent in DBMS_JOB may be overcome. Using
DBMS_IJOB, it is possible to create and drop jobs in other schemas, to export jobs as PL/SQL
scripts, and to change the NLS environment of jobs as well as the database user who runs a job.
Execute permission on DBMS_IJOB is included in the role DBA. Unless the January 2009 Critical
Patch Update is installed in an ORACLE_HOME, database users with access to DBMS_IJOB can
circumvent auditing.
Introduction to DBMS_JOB
The package DBMS_JOB submits PL/SQL procedures, which shall be run at regular intervals, to
the job queue. The job queue is enabled by setting the initialization parameter JOB_QUEUE_
PROCESSES to a value greater than zero. The job queue is handled by the job queue coordinator
process CJQ0 and job queue slave processes (Jnnn). The documented interface to the job queue is
the package DBMS_JOB. This package does not allow a database administrator to create, modify,
or drop jobs in foreign schemas.
DBMS_JOB calls the undocumented package DBMS_IJOB to accomplish its tasks. Using DBMS_IJOB
directly allows a database administrator to overcome the aforementioned limitations.
Changes to the job queue with DBMS_IJOB take effect when COMMIT is issued (same as with DBMS_
JOB). The data dictionary table underlying DBA_JOBS is SYS.JOB$.
DBMS_JOB and DBMS_IJOB have the procedures BROKEN, REMOVE, and RUN in common. These
procedures have identical arguments in both packages. For each of them I provide an example
that illustrates how DBMS_IJOB gives the DBA full control of all jobs in the database.
BROKEN Procedure
This procedure may be used to change the status of a job in any schema, thus overcoming the
limitation of DBMS_JOB.BROKEN. Jobs with status broken (DBA_JOBS.BROKEN='Y') are not run automatically, but may be run manually with DBMS_JOB.RUN.
Syntax
DBMS_IJOB.BROKEN (
  job IN BINARY_INTEGER,
  broken IN BOOLEAN,
  next_date IN DATE DEFAULT SYSDATE);
Parameters

Parameter  Description
---------  ------------------------------------------------------
job        Job number; corresponds to DBA_JOBS.JOB
broken     TRUE: mark job as broken; FALSE: mark job as not broken
next_date  Next date and time to run the job
Usage Notes
Because there is no public synonym for DBMS_IJOB, you need to qualify the package name with
the schema name SYS. For jobs with status BROKEN='Y' the column NEXT_DATE always has the value
January 1st, 4000.
Examples
Here’s an example of marking a job in a foreign schema as broken. It fails with DBMS_JOB, but
succeeds with DBMS_IJOB.
SQL> SHOW USER
USER is "NDEBES"
SQL> SELECT role FROM session_roles WHERE role='DBA';
ROLE
------------------------------
DBA
SQL> SELECT job, what, broken FROM dba_jobs WHERE priv_user='PERFSTAT';
JOB WHAT            BROKEN
--- --------------- ------
  1 statspack.snap; N
SQL> EXEC dbms_job.broken(1,true)
BEGIN dbms_job.broken(1,true); END;
*
ERROR at line 1:
ORA-23421: job number 1 is not a job in the job queue
ORA-06512: at "SYS.DBMS_IJOB", line 529
ORA-06512: at "SYS.DBMS_JOB", line 245
SQL> EXEC sys.dbms_ijob.broken(1, true)
PL/SQL procedure successfully completed.
SQL> COMMIT;
Commit complete.
SQL> SELECT job, what, broken FROM dba_jobs WHERE priv_user='PERFSTAT';
JOB WHAT            BROKEN
--- --------------- ------
  1 statspack.snap; Y
FULL_EXPORT Procedure
This procedure returns strings that hold PL/SQL calls to recreate a job. It may be used to export
job definitions.
Syntax
DBMS_IJOB.FULL_EXPORT(
  job IN BINARY_INTEGER,
  mycall IN OUT VARCHAR2,
  myinst IN OUT VARCHAR2);
Parameters

Parameter  Description
---------  ------------------------------------------------------------------
job        Job number; corresponds to DBA_JOBS.JOB
mycall     After a successful call of the procedure, the return value holds a
           string that represents a call to DBMS_IJOB.SUBMIT
myinst     After a successful call of the procedure, the return value holds a
           string that represents a call to DBMS_JOB.INSTANCE
Examples
The script dbms_ijob_full_export.sql exports the definition of a single job in a format that is
suitable for recreating the job. The main part of the script is reproduced here:
variable job number
variable submit_call varchar2(4000)
variable instance_call varchar2(4000)
exec :job:=&job_number;
begin
  dbms_ijob.full_export(:job, :submit_call, :instance_call);
end;
/
print submit_call
print instance_call
Following is an example of running the script to export a job that takes Statspack snapshots:
$ sqlplus -s / as sysdba @dbms_ijob_full_export.sql
JOB WHAT
---------- ---------------
 21 statspack.snap;
Enter value for job number: 21
sys.dbms_ijob.submit(job=>21,luser=>'PERFSTAT',puser=>'PERFSTAT',cuser=>'PERFSTAT',
next_date=>to_date('2007-12-07:21:00:00','YYYY-MM-DD:HH24:MI:SS'),
interval=>'trunc(SYSDATE+1/24,''HH'')',
broken=>FALSE,what=>'statspack.snap;',nlsenv=>'NLS_LANGUAGE=''AMERICAN''
NLS_TERRITORY=''AMERICA'' NLS_CURRENCY=''$'' NLS_ISO_CURRENCY=''AMERICA''
NLS_NUMERIC_CHARACTERS=''.,'' NLS_DATE_FORMAT=''dd.mm.yyyy hh24:mi:ss''
NLS_DATE_LANGUAGE=''AMERICAN'' NLS_SORT=''BINARY''',env=>'0102000200000000');
dbms_job.instance(job=>21, instance=>1, force=>TRUE);
Job 21 may be recreated by executing the PL/SQL calls generated by
DBMS_IJOB.FULL_EXPORT.
REMOVE Procedure
Use this procedure to remove jobs in any schema.
Syntax
DBMS_IJOB.REMOVE(
  job IN BINARY_INTEGER);
Parameters

Parameter  Description
---------  -----------
job        Job number
Examples
Following is an example of deleting a job in a foreign schema:
SQL> SHOW USER
USER is "NDEBES"
SQL> SELECT priv_user FROM dba_jobs WHERE job=29;
PRIV_USER
------------------------------
PERFSTAT
SQL> EXEC sys.dbms_ijob.remove(29)
PL/SQL procedure successfully completed.
SQL> COMMIT;
Commit complete.
RUN Procedure
This procedure may be used to run jobs in any schema, irrespective of their status (BROKEN=
Y/N). The job is run by the foreground process of the user executing DBMS_IJOB instead of by a
job queue process. This facilitates troubleshooting or performance diagnosis of the job, since
the session to trace or monitor is known. DBA_JOBS.NEXT_DATE is recalculated.
Syntax
DBMS_IJOB.RUN(
  job IN BINARY_INTEGER,
  force IN BOOLEAN DEFAULT FALSE);
Parameters

Parameter  Description
---------  -------------------------------------------------------------------
job        Job number
force      If TRUE, DBMS_IJOB.RUN ignores the instance affinity setting of the
           job specified with the parameters INSTANCE and FORCE=TRUE to
           DBMS_JOB.SUBMIT; the job may then be run in any instance, thereby
           ignoring instance affinity.
Usage Notes
In a scenario where a database user approaches a DBA and asks the DBA to diagnose a failing
job, the DBA would normally need to start a database session as the owner of the job to reproduce the problem. Using DBMS_IJOB, the DBA can run and diagnose the job without knowing
the password of the job owner or asking the job owner to log him in.
Contrary to the Oracle10g scheduler (DBMS_SCHEDULER), which records the reason for a job’s
failure in the data dictionary view DBA_SCHEDULER_JOB_RUN_DETAILS, the DBMS_JOB job queue
does not record errors from job runs in the database. Errors from failed DBMS_JOB jobs are recorded
in job queue slave process trace files in the background dump destination (parameter BACKGROUND_
DUMP_DEST). Job developers are expected to implement their own error logging. Three undocumented PL/SQL metadata variables are available to DBMS_JOB jobs. Their variable names, data
types, and descriptions are provided in Table 18-1.
Table 18-1. DBMS_JOB Metadata

Variable Name  Data Type       Description
-------------  --------------  -------------------------------------------------
job            BINARY_INTEGER  Job number of the job currently being executed.
next_date      DATE            Date and time of the next scheduled job run; this
                               variable may be used to override the repeat
                               interval of a job.
broken         BOOLEAN         The value of this variable is initially FALSE,
                               irrespective of the current job status. It may be
                               used to mark a job as broken. If this variable is
                               assigned the value TRUE and the job completes
                               without raising an exception, the job is marked
                               as broken (ALL_JOBS.BROKEN='Y').
The variable broken is provided for jobs that catch exceptions. The job queue cannot
detect failing jobs if a job catches exceptions internally. Setting broken=TRUE in an anonymous
PL/SQL block that implements a job allows a job developer to mark the job as broken if an
exception is caught. Thus, he may set the job’s status according to his own strategy. Jobs that
do raise exceptions are marked as broken after 16 failures. Job queue processes use an exponential
backoff strategy for calculating the next scheduled run date of failed jobs. Jobs are reattempted
one minute after the initial failure. The wait interval is doubled after each additional failure.
The next scheduled run of a job may be overridden by assigning a tailored value to the variable
next_date. This may be used to implement a custom strategy for reattempting failed jobs. The
source code depot contains the file dbms_job_metadata.sql with a sample implementation of a
job, which uses the three PL/SQL metadata variables in Table 18-1.
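As a sketch of how these variables might be put to use, the following job definition calls a hypothetical application procedure my_batch_proc and overrides the exponential backoff with a custom retry strategy; the PL/SQL block passed as WHAT may reference the metadata variables next_date and broken directly:
SQL> VARIABLE jobno NUMBER
SQL> BEGIN
  dbms_job.submit(
    job => :jobno,
    what => 'BEGIN
               my_batch_proc; -- hypothetical application procedure
             EXCEPTION WHEN OTHERS THEN
               next_date := sysdate + 1/24; -- custom retry: reattempt in one hour
               -- broken := TRUE; -- alternatively, mark the job as broken right away
             END;',
    next_date => sysdate,
    interval => 'sysdate + 1');
  COMMIT;
END;
/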
Examples
Following is a scenario for debugging a failing job in a foreign schema by running it in a foreground process with DBMS_IJOB.RUN:
SQL> SHOW USER
USER is "SYS"
SQL> SELECT job, priv_user, failures FROM dba_jobs WHERE job=1;
JOB PRIV_USER  FAILURES
--- ---------- --------
  1 PERFSTAT          1
SQL> EXEC dbms_job.run(1)
BEGIN dbms_job.run(1); END;
*
ERROR at line 1:
ORA-23421: job number 1 is not a job in the job queue
SQL> EXEC sys.dbms_ijob.run(1)
BEGIN sys.dbms_ijob.run(1); END;
*
ERROR at line 1:
ORA-12011: execution of 1 jobs failed
ORA-06512: at "SYS.DBMS_IJOB", line 406
ORA-06512: at line 1
SQL> ORADEBUG SETMYPID
Statement processed.
SQL> ORADEBUG TRACEFILE_NAME
/opt/oracle/obase/admin/TEN/udump/ten1_ora_19365.trc
SQL> !tail -4 /opt/oracle/obase/admin/TEN/udump/ten1_ora_19365.trc
ORA-12012: error on auto execute of job 1
ORA-00376: file 3 cannot be read at this time
ORA-01110: data file 3: '+DG/ten/datafile/sysaux.261.628550067'
ORA-06512: at "PERFSTAT.STATSPACK", line 5376
Source Code Depot
Table 18-2 lists this chapter’s source files and their functionality.

Table 18-2. DBMS_IJOB Source Code Depot

File Name                  Functionality
-------------------------  ------------------------------------------------------
dbms_ijob_full_export.sql  Generates PL/SQL code to recreate a job by retrieving
                           its metadata from DBA_JOBS (or rather SYS.JOB$).
dbms_job_metadata.sql      Creates a table for implementing custom logging of
                           DBMS_JOB jobs in a database table. The job definition
                           uses PL/SQL job metadata variables to record the job
                           number and the next scheduled run date. If an exception
                           occurs, the job catches it, records the error in the
                           logging table, and marks itself as broken.
CHAPTER 19
■■■
DBMS_SCHEDULER
The database scheduler is an advanced job scheduling capability built into Oracle10g and
subsequent releases. The package DBMS_SCHEDULER is the interface to the job scheduler. It is
extensively documented in the Oracle Database Administrator’s Guide and the PL/SQL Packages and Types Reference. The database scheduler supports jobs that run inside the DBMS as
well as jobs that run outside of the DBMS at operating system level. The latter type of job is
termed an external job. Important aspects concerning the execution of external jobs, such as
exit code handling, removal of environment variables, details of program argument passing,
requirements to run external programs owned by users other than SYS, and default privileges
of external jobs defined by the configuration file externaljob.ora, are undocumented.
This chapter provides all the details on the subject of external jobs: how they are run,
which environment variables and privileges are available to them, how to signal success and
failure, and how to integrate custom logging with the scheduler’s own logging.
Running External Jobs with
the Database Scheduler
The database scheduler is an advanced job scheduling capability that ships with Oracle10g and
subsequent releases. The PL/SQL package DBMS_SCHEDULER is the interface to a rich set of job
scheduling features. Oracle10g Release 1 was the first ORACLE DBMS release with the capability
to run jobs outside of the database. The scheduler supports three types of jobs:
• Stored procedures
• PL/SQL blocks
• Executables, i.e., external programs that run outside of the database engine (a minimal example follows this list)
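As a sketch of the third job type, the following call creates and enables an external job. The job name OS_DATE_JOB and the program /bin/date are arbitrary examples, not taken from this chapter:
SQL> BEGIN
  dbms_scheduler.create_job(
    job_name   => 'OS_DATE_JOB',
    job_type   => 'EXECUTABLE',
    job_action => '/bin/date', -- full path of the external program to run
    enabled    => TRUE);
END;
/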
Job chains are another new feature introduced with Oracle10g. Chains consist of several
jobs. Rules are used to decide which job within a chain to execute next. Since the scheduler
supports jobs that run within as well as outside of the database engine, it makes sense to use it
for controlling complex processing that involves job steps at operating system level as well as
within the database. Another option for running jobs at operating system level is the Enterprise
Manager job system. In my own experience, the database scheduler has worked flawlessly, whereas
I witnessed several failures of the Enterprise Manager job system. Hence I recommend using
the database scheduler in preference to the Enterprise Manager job system. Enterprise Manager
Grid Control includes a web-based interface for both. Enterprise Manager Database Control
has a web-based interface for the database scheduler.
In a high availability environment, such as a failover cluster or Real Application Clusters,
the scheduler solves the issue that jobs need to run in spite of a node or instance failure. Since
high availability solutions ensure that at least one instance in a cluster is up (exactly one instance
per database in a failover cluster), this instance is available to execute jobs. Thus, for the sake
of scheduling, it is unnecessary to protect an Enterprise Manager agent (and its job system) or
a third-party scheduling solution by clustering software.
Certain undocumented aspects of the database scheduler apply to both Windows and UNIX,
whereas others are platform specific. Generic aspects are covered in the next two sections.
Platform-specific scheduler features are addressed in separate sections that pertain to UNIX
and Windows. An interesting generic feature of the scheduler, which is documented in the Oracle
Database Administrator’s Guide, is that it captures and saves up to 200 bytes of standard error
output in the column ADDITIONAL_INFO of the dictionary view DBA_SCHEDULER_JOB_RUN_DETAILS.
Exit Code Handling
On UNIX systems as well as Windows, programs signal success or failure by returning an exit
code to the parent process. The exit code zero signals successful execution, whereas an exit
code between 1 and 255 indicates a failure or at least a warning. In the Bourne, Korn, and Bash
shells, the 8-bit exit code is available in the variable $?. If a program is terminated by a signal,
the exit code is 128+n, where n is the number of the signal that terminated the program. On
UNIX, signal numbers are defined in the C programming language include file /usr/include/
sys/signal.h. Here’s an example: start the UNIX program sleep and, while it is still running,
interrupt the process with Ctrl+C.
$ sleep 10
$ echo $?
130
The echo statement retrieves the exit code from sleep. Since 130 - 128 is 2, we need to look
for signal 2 in signal.h.
#define SIGINT  2  /* interrupt */
Signal 2 is the interrupt signal, which may also be sent to a process with kill -INT pid,
where pid is the process identifier.
In Perl, $? provides information on which signal, if any, ended the child process, and on
whether a core dump was generated, in addition to the exit code. Thus $? is a
16-bit value in Perl. How does the database scheduler fare in this respect? It is undocumented
how the scheduler decides whether an external job succeeded or failed.
On Windows, %ERRORLEVEL% has the same purpose as $? on UNIX systems. The following example
calls the programs true.exe and false.exe shipped with Cygwin, a highly recommended, free,
UNIX-like environment for Windows systems. Cygwin provides the common UNIX utilities, such as
bash, find, awk, grep, and vim (Vi iMproved), and even an X11 server for redirecting the output
of X11 clients on UNIX systems (such as the Oracle Universal Installer) to a bitmapped display
on a Windows machine.
C:> false.exe
C:> echo %ERRORLEVEL%
1
C:> true.exe
C:> echo %ERRORLEVEL%
0
Back to the database scheduler. Successful execution of a job is characterized by the value
“SUCCEEDED” in the column STATUS of the view DBA_SCHEDULER_JOB_LOG. Failure is signaled by
the value “FAILED” (not “FAILURE” as stated in the Oracle Database Administrator’s Guide 10g
Release 2 on page 2-27). But how does this relate to the exit code of the executable? Besides
signaling success or failure, the exit code might also indicate why the external job failed.
Could it be available in DBA_SCHEDULER_JOB_LOG.ADDITIONAL_INFO or in any of the columns STATUS,
ERROR#, or ADDITIONAL_INFO of the view DBA_SCHEDULER_JOB_RUN_DETAILS? Some tests reveal the
following:
• Executables that return exit code zero are assigned the job status “SUCCEEDED”. All
other exit codes result in the status “FAILED”.
• The exit code itself is not available.
• The column DBA_SCHEDULER_JOB_RUN_DETAILS.STATUS, which has an undocumented
range of values in the Oracle10g Release 2 Database Reference, has the same range of
values as the column STATUS of the view DBA_SCHEDULER_JOB_LOG (“SUCCEEDED” or
“FAILED”).
Standard Error Output
UNIX shell scripts and Windows command-line scripts support the same output redirection
syntax. Both environments provide three default file handles, summarized in Table 19-1.
Table 19-1. Input and Output Handles

Handle Designation  Numeric Equivalent  Description
STDIN               0                   Keyboard input
STDOUT              1                   Output to the terminal window
STDERR              2                   Error output to the terminal window
By default, the UNIX command echo prints its arguments to standard output (STDOUT).
To redirect an error message to standard error (STDERR), use a command such as this:
$ echo Command failed. 1>&2
Command failed.
This works in exactly the same way on Windows.
C:> echo Command failed. 1>&2
Command failed.
Evidence for the findings reported so far comes from running a small command-line
script, which terminates with exit code 1. The subsequent example is coded for Windows, but
the results apply equally to UNIX. The source code of the Windows command-line script (file
failure.bat) is reproduced here:
echo This is script %0.
echo About to exit with exit code 1. 1>&2
exit 1
The following PL/SQL block creates an external job, which runs the above script. The
supplied calendaring expression “FREQ=MINUTELY;INTERVAL=3” schedules the job to run
every three minutes. Note that the job is created and run as user SYS.
C:> sqlplus / as sysdba
Connected to:
Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - Production
SQL> BEGIN
  DBMS_SCHEDULER.CREATE_JOB(
    job_name        => 'failure_test',
    job_type        => 'EXECUTABLE',
    job_action      => 'C:\home\ndebes\bin\failure.bat',
    start_date      => systimestamp,
    repeat_interval => 'FREQ=MINUTELY;INTERVAL=3',
    enabled         => true /* default false! */
  );
END;
/
PL/SQL procedure successfully completed.
The job may be run manually by calling the procedure DBMS_SCHEDULER.RUN_JOB.
SQL> EXEC dbms_scheduler.run_job('failure_test')
BEGIN dbms_scheduler.run_job('failure_test'); END;
*
ERROR at line 1:
ORA-27369: job of type EXECUTABLE failed with exit code: Incorrect function.
ORA-06512: at "SYS.DBMS_ISCHED", line 154
ORA-06512: at "SYS.DBMS_SCHEDULER", line 450
ORA-06512: at line 1
Due to the non-zero exit code, the job is classified as “FAILED”. After the preceding
manual job execution and a single scheduled execution, querying DBA_SCHEDULER_JOB_LOG
and DBA_SCHEDULER_JOB_RUN_DETAILS yields the following results:
SQL> SELECT jl.log_id, jl.status, jl.additional_info AS log_addtl_info,
  jd.status, jd.additional_info AS details_addtl_info
FROM dba_scheduler_job_log jl, dba_scheduler_job_run_details jd
WHERE jl.job_name='FAILURE_TEST'
AND jl.log_id=jd.log_id
ORDER BY jl.log_id;
LOG_ID STATUS LOG_ADDTL_INFO        STATUS DETAILS_ADDTL_INFO
------ ------ --------------------- ------ ----------------------------------
   245 FAILED                       FAILED ORA-27369: job of type EXECUTABLE
                                           failed with exit code: Incorrect
                                           function.
                                           STANDARD_ERROR="About to exit with
                                           exit code 1. "
   246 FAILED REASON="manually run" FAILED ORA-27369: job of type EXECUTABLE
                                           failed with exit code: Incorrect
                                           function.
                                           STANDARD_ERROR="About to exit with
                                           exit code 1. "
The results of the test presented previously, as well as additional tests, are summarized below:
• The column ADDITIONAL_INFO of the view DBA_SCHEDULER_JOB_RUN_DETAILS captures standard
error output for external jobs irrespective of their exit codes. This column is a CLOB,
such that it could theoretically capture up to 8 TB or 128 TB depending on the setting of
the parameter DB_BLOCK_SIZE (see Database Reference 10g Release 2, page A1).
• In practice, there is a size limitation of 200 bytes on DBA_SCHEDULER_JOB_RUN_DETAILS.ADDITIONAL_INFO.
• Standard output is never captured in DBA_SCHEDULER_JOB_RUN_DETAILS.ADDITIONAL_INFO.
• The column ADDITIONAL_INFO of the view DBA_SCHEDULER_JOB_LOG is NULL for jobs that are
run based on a schedule and has the value REASON="manually run" for jobs that were
run manually by a call to the procedure DBMS_SCHEDULER.RUN_JOB.
To drop the job named “FAILURE_TEST”, execute the call to DBMS_SCHEDULER.DROP_JOB
reproduced here:
SQL> EXEC dbms_scheduler.drop_job('failure_test')
External Jobs on UNIX
Wouldn’t it be nice to teach a database how to back itself up by running RMAN as an external
job? The unsuspecting DBA might think that all that needs to be done is to run RMAN through
the scheduler with exactly the same command line options as from a shell. Actual tests, however,
reveal that matters are more intricate. Environment variables and argument passing are two of
the issues that pose undocumented obstacles and must be overcome.
In the source code depot, I provide a prototype of a job that controls RMAN through the
pipe interface discussed in Chapter 36 (file rman_backup.sql). It creates an external job, which
runs RMAN through the scheduler, and a job of type PLSQL_BLOCK. This latter job starts RMAN by
manually executing the aforementioned external job and passes commands to RMAN through
the pipe interface. Thus, the database has been taught to back itself up. There is no longer any
need for scheduling backups outside of the DBMS instance. In a high availability environment,
guarding the DBMS instance against failure automatically protects the backup job too.
The scheduler provides several metadata attributes to jobs of type PLSQL_BLOCK. Oracle10g
and Oracle11g support the same attributes (see page 114-61 in Oracle Database PL/SQL Packages
and Types Reference 11g Release 1). Unfortunately, the value of the sequence
SYS.SCHEDULER$_INSTANCE_S, which is used for numbering the column LOG_ID in the views
DBA_SCHEDULER_JOB_LOG and DBA_SCHEDULER_JOB_RUN_DETAILS, is not among them. It would be very
useful for integrating custom logging in a database table with the scheduler’s own logging.
Yet, there is a workaround. After a scheduled job run, the value of
DBA_SCHEDULER_JOBS.NEXT_RUN_DATE is copied to DBA_SCHEDULER_JOB_RUN_DETAILS.REQ_START_DATE
(requested start date). A job of type PLSQL_BLOCK, which selects NEXT_RUN_DATE and incorporates
it into a custom logging table, solves the integration issue. After the job has completed, the
custom logging table may be joined with DBA_SCHEDULER_JOB_RUN_DETAILS using the column
REQ_START_DATE. A sample implementation is included in the file rman_backup.sql in the source
code depot.
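To make the workaround concrete, here is a minimal sketch under stated assumptions: the
logging table JOB_RUN_LOG and the job name CUSTOM_LOG_JOB are hypothetical, and the job is
created as SYS, as in the preceding examples, so that it may select from DBA_SCHEDULER_JOBS.
SQL> CREATE TABLE job_run_log (
  req_start_date TIMESTAMP(6) WITH TIME ZONE, -- matches REQ_START_DATE
  msg            VARCHAR2(4000)
);
SQL> BEGIN
  DBMS_SCHEDULER.CREATE_JOB(
    job_name        => 'custom_log_job',
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'DECLARE
        run_date TIMESTAMP(6) WITH TIME ZONE;
      BEGIN
        -- while the job is running, NEXT_RUN_DATE still holds the
        -- requested start date of the current run
        SELECT next_run_date INTO run_date
        FROM dba_scheduler_jobs WHERE job_name=''CUSTOM_LOG_JOB'';
        INSERT INTO job_run_log VALUES (run_date, ''custom log entry'');
        COMMIT;
      END;',
    start_date      => systimestamp,
    repeat_interval => 'FREQ=MINUTELY;INTERVAL=5',
    enabled         => true);
END;
/
After a run completes, the custom entries line up with the scheduler’s own log:
SQL> SELECT jd.log_id, jd.status, l.msg
FROM dba_scheduler_job_run_details jd, job_run_log l
WHERE jd.job_name='CUSTOM_LOG_JOB'
AND jd.req_start_date=l.req_start_date;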
Removal of Environment Variables
It is undocumented that the scheduler removes all environment variables before it starts
the UNIX process that implements an external job. Evidence for this trait is easily collected by
having the scheduler run a shell script that calls the program env. On UNIX systems, if called
without arguments, env prints out all the environment variables that it inherited. Following is
the code of a shell script that prints a sorted list of all environment variables on standard error
output and saves a copy of the output to the file /tmp/env.out by using the command tee. Execute
permission on the file env.sh is required and is set using chmod.
$ more env.sh
#!/bin/sh
env | sort | tee /tmp/env.out 1>&2
exit 0
$ chmod +x env.sh
To have the database scheduler run this shell script, we create the following scheduler
program called “ENV”:
SQL> BEGIN
  DBMS_SCHEDULER.CREATE_PROGRAM(
    program_name        => 'env',
    program_action      => '/home/oracle/env.sh',
    program_type        => 'EXECUTABLE',
    number_of_arguments => 0,
    comments            => 'environment variables',
    enabled             => true);
END;
/
Then we create a job that uses the preceding program:
SQL> BEGIN
  sys.dbms_scheduler.create_job(
    job_name     => 'env_job',
    program_name => 'env',
    auto_drop    => FALSE,
    enabled      => false);
END;
/
The job was defined as disabled, which removes the requirement to supply a schedule. We
are now ready to run the job manually.
SQL> EXEC dbms_scheduler.run_job(job_name => 'env_job')
PL/SQL procedure successfully completed.
SQL> SELECT status, error#, additional_info
FROM dba_scheduler_job_run_details
WHERE job_name='ENV_JOB'
AND owner='SYS';
STATUS        ERROR# ADDITIONAL_INFO
--------- ---------- ----------------------
SUCCEEDED          0 STANDARD_ERROR="PWD=/
                     SHLVL=1
                     _=/bin/env"
Thanks to terminating the shell script with exit code 0, the job is considered successful
(“SUCCEEDED”). The job scheduler saved the standard error output of the job in the column
ADDITIONAL_INFO of the view DBA_SCHEDULER_JOB_RUN_DETAILS. There are merely three environment
variables, and these are certainly not the ones you would expect. PATH, ORACLE_HOME, and
ORACLE_SID are missing. PWD, the process’s working directory, is a familiar one. SHLVL and “_”
are present due to the fact that the test was performed on a Linux system, where /bin/sh is a
symbolic link to /bin/bash. SHLVL (shell level) is incremented by one each time bash starts
another instance of bash as a child process. Bash places the full path of each command it runs
in the environment variable “_” (underscore), which is then inherited by child processes.
If we insist on implementing a database that can back itself up, we must run RMAN through
the scheduler in such a way that the environment variables ORACLE_HOME and ORACLE_SID are
visible. NLS_DATE_FORMAT should also be set, since it controls the date and time format used by
RMAN. Several approaches come to mind.
• Run /usr/bin/env through the scheduler, set environment variables with the command
line arguments to env, and then have env run RMAN as in the following example:
$ /usr/bin/env ORACLE_HOME=/opt/oracle/product/db10.2 ORACLE_SID=TEN \
NLS_DATE_FORMAT=dd.Mon.yy-hh24:mi:ss /opt/oracle/product/db10.2/bin/rman target / \
cmdfile /opt/oracle/admin/scripts/backup.rcv msglog /opt/oracle/admin/log/rman.log
• Write a shell script that sets the required environment variables and then calls RMAN.
Have the scheduler run the shell script.
• Write a general-purpose shell script wrapper that receives the name of a file with
environment variables to set, as well as the program to run, as input.
The third approach should be the most promising, since some reuse is possible and the
wrapper script could print the exit code of the program on standard error for later retrieval
through the view DBA_SCHEDULER_JOB_RUN_DETAILS. A sample implementation of such a wrapper
script is in the file extjob.sh in the source code depot.
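For illustration, here is a minimal sketch of such a wrapper, under the assumption that the
first argument names a file containing VAR=value lines and the remaining arguments specify the
program to run; the author’s extjob.sh in the source code depot may be implemented differently.
#!/bin/sh
# Hypothetical wrapper: $1 = file with VAR=value lines, $2.. = program to run.
ENV_FILE=$1
shift
set -a               # export every variable assigned while sourcing
. "$ENV_FILE"
set +a
"$@"                 # run the requested program with its arguments
RC=$?
echo "exit code: $RC" 1>&2   # retrievable via ADDITIONAL_INFO
exit $RC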
Command Line Processing
Another undocumented aspect concerns how the scheduler runs the PROGRAM_ACTION specified
with one of the procedures DBMS_SCHEDULER.CREATE_PROGRAM or DBMS_SCHEDULER.CREATE_JOB.
The Oracle Database PL/SQL Packages and Types Reference 10g Release 2 informs us that “the
PROGRAM_ACTION for a program of type EXECUTABLE is the name of the external executable, including
the full path name and any command-line arguments” (page 93-35, [OL10 2005]). The first part
of this statement is correct, since the attempt to run an external program without specifying
the full path name fails with “ORA-27369: job of type EXECUTABLE failed with exit code: No
such file or directory”. After a job failure, scheduler error messages may be retrieved from the
column ADDITIONAL_INFO of the data dictionary view DBA_SCHEDULER_JOB_RUN_DETAILS.
The second part, however, solely applies to Windows. On UNIX, arguments to external
programs must be defined with the procedure DBMS_SCHEDULER.DEFINE_PROGRAM_ARGUMENT
instead of adding them to the PROGRAM_ACTION. Otherwise jobs based on the program fail with
“ORA-27369: job of type EXECUTABLE failed with exit code: No such file or directory”. External
jobs that are not based on programs cannot receive arguments.
On UNIX, the program $ORACLE_HOME/bin/extjobo is responsible for running external jobs
owned by SYS. Using a system call trace utility such as truss or strace reveals that extjobo uses
the UNIX system call access to verify that the program action is executable. A program action
that includes arguments fails this test with the UNIX error ENOENT (No such file or directory).
System call tracing also reveals that the executable is run directly with the system call execve.
This implies that characters such as ? or * within program arguments, which have special
meaning to shells, must not be escaped.
Another implication is that the pseudo comment #!/bin/sh (or #!/bin/ksh, #!/bin/bash,
etc.), which specifies an interpreter for running text files, must be present in the first line of
shell scripts. If the specification of a command interpreter is missing, the error “ORA-27369:
job of type EXECUTABLE failed with exit code: 255” is reported and
DBA_SCHEDULER_JOB_RUN_DETAILS.ADDITIONAL_INFO contains STANDARD_ERROR="execve: Exec format error".
Likewise, perl must be specified as the command interpreter in order to run Perl scripts as
external jobs. Thus, the first line of a Perl script might specify the absolute path to the perl
interpreter as shown here:
#!/opt/oracle/product/db10.2/perl/bin/perl
The more portable approach is to use /usr/bin/env, which has the same absolute path on
all UNIX systems, to run perl as follows:
#!/usr/bin/env perl
The disadvantage of this latter approach in the context of the database scheduler is that
the environment variable PATH is removed by extjobo. Thus, some mechanism that sets PATH
before the Perl script is run must be in place. Once more /usr/bin/env may be used for this
purpose, by defining a program that calls env and sets PATH in a program argument. Following
is an example. The Perl script test.pl, which shall be executed by the scheduler, contains the
code shown here:
#!/usr/bin/env perl
printf STDERR "This is perl script $0 executed by UNIX process $$.\n";
exit 0;
The following PL/SQL block creates a program with two arguments for running the Perl
script test.pl. To enable a program that includes arguments, all arguments must be defined
with separate calls to the packaged procedure DBMS_SCHEDULER.DEFINE_PROGRAM_ARGUMENT.
begin
  dbms_scheduler.create_program(
    program_name        => 'perl_program',
    program_type        => 'EXECUTABLE',
    program_action      => '/usr/bin/env',
    number_of_arguments => 2,
    enabled             => false
  );
  dbms_scheduler.define_program_argument(
    program_name      => 'perl_program',
    argument_position => 1,
    argument_name     => 'env',
    argument_type     => 'VARCHAR2',
    default_value     => 'PATH=/opt/oracle/product/db10.2/perl/bin:/home/oracle'
  );
  dbms_scheduler.define_program_argument(
    program_name      => 'perl_program',
    argument_position => 2,
    argument_name     => 'script',
    argument_type     => 'VARCHAR2',
    default_value     => 'test.pl'
  );
  dbms_scheduler.enable('perl_program');
  dbms_scheduler.create_job(
    job_name     => 'perl_job',
    program_name => 'perl_program',
    enabled      => false,
    auto_drop    => false
  );
end;
/
The job succeeds, since the environment variable PATH, which is removed by the scheduler,
is supplied explicitly as a program argument. The file test.pl must be located in one of
the directories assigned to PATH.
SQL> EXEC dbms_scheduler.run_job('perl_job')
PL/SQL procedure successfully completed.
SQL> SELECT status, additional_info
FROM dba_scheduler_job_run_details
WHERE log_id=(SELECT max(log_id) FROM dba_scheduler_job_run_details);
STATUS    ADDITIONAL_INFO
--------- ------------------------------------------------------------
SUCCEEDED STANDARD_ERROR="This is perl script /home/oracle/test.pl
executed by UNIX process 5387."
External Jobs and Non-Privileged Users
On UNIX, external jobs owned by the privileged user SYS are run with the privileges of the
ORACLE software owner, usually the UNIX user oracle. The execution of external jobs owned
by database users other than SYS is enabled by default. It is undocumented which UNIX user is
used to run these external jobs. Users who have neither the privilege SYSDBA nor the role DBA
require the system privileges CREATE JOB and CREATE EXTERNAL JOB to successfully create and run
external jobs. Thus, to create a new user EXTJOB with just enough privileges to run external
jobs, the following SQL statements may be used:
SQL> CREATE USER extjob IDENTIFIED BY secret;
SQL> GRANT CONNECT TO extjob;
SQL> GRANT CREATE JOB TO extjob;
SQL> GRANT CREATE EXTERNAL JOB TO extjob;
When called without arguments, the UNIX program id displays the user name and group
set of the current user. This program may be used to find out which UNIX user runs external
programs on behalf of non-privileged database users. Calling a shell script that redirects the
output of id to standard error for capture by the scheduler yields the following:
SQL> SELECT additional_info
FROM all_scheduler_job_run_details
WHERE log_id=(SELECT max(log_id) FROM all_scheduler_job_run_details);
ADDITIONAL_INFO
--------------------------------------------------------------------------------
STANDARD_ERROR="uid=99(nobody) gid=99(nobody) groups=800(oinstall),801(dba)"
The UNIX user and group nobody are used. For this reason, the existence of the UNIX user
nobody is an installation prerequisite mentioned in installation guides for the ORACLE DBMS.
Oracle10g Release 2 UNIX installation guides incorrectly state that the program
$ORACLE_HOME/bin/extjob must be owned by nobody. When this is the case, external jobs fail
with the following error message in ALL_SCHEDULER_JOB_RUN_DETAILS.ADDITIONAL_INFO:
ADDITIONAL_INFO
--------------------------------------------------------------------------------
ORA-27369: job of type EXECUTABLE failed with exit code: 274662
STANDARD_ERROR="Oracle Scheduler error: Config file is not owned by root or is
writable by group or other or extjob is not setuid and owned by root"
The “config file” mentioned in the error message refers to the file externaljob.ora, which
is located in the directory $ORACLE_HOME/rdbms/admin. This file is undocumented in Oracle10g
and is partially documented in the Oracle Database Administrator’s Guide 11g Release 1. It must
be owned by root and must be writable only by the owner:
$ ls -l $ORACLE_HOME/rdbms/admin/externaljob.ora
-rw-r----- 1 root oinstall 1534 Dec 22 2005 /opt/oracle/product/db10.2/rdbms/admin/externaljob.ora
Contents of the file externaljob.ora are reproduced here:1
# This configuration file is used by dbms_scheduler when executing external
# (operating system) jobs. It contains the user and group to run external
# jobs as. It must only be writable by the owner and must be owned by root.
# If extjob is not setuid then the only allowable run_user
# is the user Oracle runs as and the only allowable run_group is the group
# Oracle runs as.
run_user = nobody
run_group = nobody
1. The error “ORA-27369: job of type EXECUTABLE failed with exit code: 274668 STANDARD_ERROR=
"Oracle Scheduler error: Invalid or missing run_group in configuration file."” may be raised in spite of
a correct configuration due to a line in externaljob.ora that exceeds 100 characters.
The correct permissions for extjob are setuid root.
$ ls -l $ORACLE_HOME/bin/extjob
-rwsr-x--- 1 root oinstall 64920 Jul 21 17:04 /opt/oracle/product/db10.2/bin/extjob
Setuid permissions are required to allow the program extjob to change its effective user ID
to that of the user nobody by calling the C library function seteuid. The effective group ID is set
by a call to setegid. Since both the effective user and group ID are changed to nobody before
using execve to run the external program, merely the permissions of user and group nobody
are available to external jobs not owned by SYS. This mechanism must be in place to prevent
external jobs from connecting as SYS, which would pose a serious security threat.
Metalink note 391820.1 suggests setting run_user=oracle and run_group=oinstall as part
of resolving the errors “ORA-27369: job of type EXECUTABLE failed with exit code: Operation
not permitted” and “ORA-27369: job of type EXECUTABLE failed with exit code: 274662”. From
a security perspective, this is very problematic. Normally, the UNIX user oracle is a member of
the OSDBA group (usually group dba) and may connect as SYS without supplying a password.
By allowing users other than SYS to execute external jobs as a member of the OSDBA group,
those users may connect as SYS in their external jobs! Thus, any user who has the privileges
CREATE JOB and CREATE EXTERNAL JOB can connect as SYS! The correct solution would have been
to create and run the job as SYS. Jobs owned and run by SYS are always executed as the ORACLE
software owner. The program $ORACLE_HOME/bin/extjobo, which runs these jobs, does not
use the configuration file externaljob.ora. Setuid permission for extjobo is not required either,
since this program does not alter effective user or group identifiers.
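For illustration, a minimal sketch of that solution follows; the job name and the script path
are hypothetical. Since the job is owned by SYS, it runs as the ORACLE software owner without
any change to externaljob.ora.
SQL> CONNECT / AS SYSDBA
SQL> BEGIN
  DBMS_SCHEDULER.CREATE_JOB(
    job_name        => 'nightly_job_as_sys',
    job_type        => 'EXECUTABLE',
    job_action      => '/home/oracle/nightly.sh', -- hypothetical script
    start_date      => systimestamp,
    repeat_interval => 'FREQ=DAILY;BYHOUR=2',
    enabled         => true);
END;
/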
External Jobs on Windows
The implementation of the database scheduler on Windows differs from the UNIX
implementation in these three respects:
• Command line argument handling
• Environment variables
• Execution of external jobs by non-privileged users
The next three sections address these topics in detail.
Command Line Argument Handling
Contrary to UNIX, scheduler programs or job actions on Windows may contain command line
arguments. The source code depot contains a complete prototype for taking RMAN backups
with the scheduler (zip file exec_rman.zip). The prototype includes the backup script
exec_rman.bat, which requires the following two arguments:
• A batch script with configuration variables, such as a directory for log files (LOG_DIR) and
a connect string for the RMAN catalog (CATALOG_CONNECT)
• The name of an RMAN script to run, e.g., for database or archived redo log backup
The script exec_rman.bat calls the configuration batch file (e.g., TEN_config.bat) to read
the configuration and then runs the requested RMAN script. It checks RMAN’s exit code and
prints it to standard error output, such that it may be retrieved from the view
DBA_SCHEDULER_JOB_RUN_DETAILS. In case RMAN terminates with a non-zero exit code,
exec_rman.bat terminates with a non-zero exit code too, thus signaling success or failure of the
backup to the scheduler. Following is an example of a job that uses exec_rman.bat and supplies
the required command line arguments:
SQL> BEGIN
  DBMS_SCHEDULER.CREATE_JOB(
    job_name        => 'rman_online_backup',
    job_type        => 'EXECUTABLE',
    job_action      => 'c:\home\ndebes\bin\exec_rman.bat
C:\home\ndebes\rman\TEN_config.bat backup_online.rcv',
    start_date      => systimestamp,
    repeat_interval => 'FREQ=DAILY;BYHOUR=22',
    enabled         => true /* default false! */
  );
END;
/
On Windows, Process Explorer2 is the correct tool to visualize relationships among
processes. Figure 19-1 shows that the scheduler uses %ORACLE_HOME%\bin\extjobo.exe and a
Windows command interpreter (cmd.exe) to run the batch script defined in the job’s action.
The option /C instructs cmd.exe to execute the string passed as a command and to exit as soon
as the command has finished.
2. Process Explorer is available at no charge at http://www.sysinternals.com. It may be used to find out
which process has opened a DLL or file. I recommend two other tools from Sysinternals: Regmon for
monitoring registry access and Filemon for monitoring file access.
Figure 19-1. An external job in Process Explorer
Windows Environment Variables
In the previous section, we saw that cmd /c is used to run external jobs on Windows. Since
cmd.exe retrieves and sets system-wide as well as user-specific environment variables, we
should expect that environment variables are available to external jobs. A quick test confirms
that this assumption is correct. By default, the service that implements an ORACLE instance
runs as user SYSTEM. Thus, only system-wide environment variables are available. If, however,
the service is run as a specific user, environment variables set by that user are also available.
The following batch script (file environment.bat) may be run as an external job to check environment variables:
echo PATH=%PATH% > c:\temp\environment.log
echo ORACLE_HOME=%ORACLE_HOME% >> c:\temp\environment.log
echo ORACLE_SID=%ORACLE_SID% >> c:\temp\environment.log
echo NLS_DATE_FORMAT=%NLS_DATE_FORMAT% >> c:\temp\environment.log
echo TNS_ADMIN=%TNS_ADMIN% >> c:\temp\environment.log
echo PERL5LIB=%PERL5LIB% >> c:\temp\environment.log
An interesting result is that ORACLE_SID, which is normally not defined as an environment
variable, is set to the same value as V$INSTANCE.INSTANCE_NAME. This behavior of the DBMS on
Windows is just the opposite of what we saw on UNIX. Whereas extjob and extjobo on UNIX
remove all environment variables, the same programs on Windows appear to set ORACLE_SID
(or at least they don’t remove it after inheritance from another process).
External Jobs and Non-Privileged Users
External jobs owned by the database user SYS run as Windows user SYSTEM. SYSTEM is a
member of the group Administrators. From a security perspective, this is as if external jobs
were run as root on UNIX. Care must be taken that scripts or executables run by external jobs
cannot be modified by potential intruders.
After a default installation, jobs owned by users other than SYS fail with the following errors:
SQL> EXEC dbms_scheduler.run_job('id')
BEGIN dbms_scheduler.run_job('id'); END;
*
ERROR at line 1:
ORA-27370: job slave failed to launch a job of type EXECUTABLE
ORA-27300: OS system dependent operation:accessing execution agent failed with status: 2
ORA-27301: OS failure message: The system cannot find the file specified.
ORA-27302: failure occurred at: sjsec 6a
ORA-27303: additional information: The system cannot find the file specified.
ORA-06512: at "SYS.DBMS_ISCHED", line 150
ORA-06512: at "SYS.DBMS_SCHEDULER", line 441
ORA-06512: at line 1
Services Created by the ORADIM Utility
ORADIM is a command line utility for creating Windows services, which implement RDBMS
and ASM instances. In case you have ever taken a close look at the services ORADIM creates,
you may have noticed that it creates two services in Oracle10g and subsequent releases. If, for
example, you create an RDBMS instance called TEST as follows:
C:> oradim -new -sid TEST -syspwd secret -startmode manual -srvcstart demand
you will see these two new services:
• OracleServiceTEST
• OracleJobSchedulerTEST
The service OracleServiceTEST implements the RDBMS instance. Opening the
registry with regedit.exe reveals that the service HKEY_LOCAL_MACHINE\SYSTEM\
CurrentControlSet\Services\OracleServiceTEST is based on the executable oracle.exe. The
service OracleJobSchedulerTEST has status disabled and is implemented by extjob.exe. This
service must be started for external jobs owned by database users other than SYS to complete
successfully.
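For example, after configuring the service’s log-on account as described in the next section,
the service might be started from the command line (assuming the instance TEST created above):
C:> net start OracleJobSchedulerTEST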
OracleJobScheduler Service
The new service OracleJobSchedulerORACLE_SID, where ORACLE_SID is the name of an
instance created with oradim.exe, is undocumented in Oracle10g. It is documented in Oracle
Database Platform Guide 11g Release 1 for Microsoft Windows. The Oracle11g documentation
points out that the service must be configured to run as a user with low privileges before enabling
it. By default, OracleJobSchedulerORACLE_SID is configured to run as user SYSTEM. For security
reasons, this setting must be modified. Otherwise, any database user with enough privileges to
execute external jobs might use operating system authentication to connect as SYS and take
control of databases on the local system. The account used to run OracleJobSchedulerORACLE_
SID should not be a member of the groups Administrators or ORA_DBA.
Source Code Depot
Table 19-2 lists this chapter’s source files and their functionality.

Table 19-2. DBMS_SCHEDULER Source Code Depot

File Name        Functionality
environment.bat  Saves Windows environment variable settings available to external
                 jobs in a file.
exec_rman.zip    Prototype for taking RMAN backups on Windows with the
                 database scheduler.
extjob.sh        Shell script wrapper for running external jobs through the database
                 scheduler. Sets environment variables before running the proper job.
failure.bat      Windows command line script for simulating job failure.
perl_job.sql     Scheduler program and job for running the Perl script test.pl on UNIX.
rman_backup.sql  Creates an external job, which runs RMAN with the pipe interface
                 enabled. A second job of type PLSQL_BLOCK passes commands to
                 RMAN through the pipe interface. The pipe messages sent to and
                 received by RMAN are logged in a custom database table. This table
                 is integrated with the scheduler’s own logging. The package
                 RMAN_PIPE_IF, presented in Chapter 36, is required.
test.pl          Perl script that writes the script name and the process identifier of the
                 process that executes the script to standard error output.
CHAPTER 20
■■■
DBMS_SYSTEM
The package DBMS_SYSTEM is installed by default in Oracle9i and subsequent releases. Execute
permission on this package is only available to the user SYS. There are a few references to the
package DBMS_SYSTEM in manuals, but it is undocumented in the Oracle9i Supplied PL/SQL
Packages and Types Reference as well as in the Oracle Database PL/SQL Packages and Types
Reference of Oracle10g Release 2 and Oracle11g Release 1.
The package DBMS_SYSTEM provides a wide array of useful functionality, such as enabling
SQL trace at all supported levels in any database session, setting events, generating dumps
without using the SQL*Plus command ORADEBUG, writing custom entries to the alert log or trace
files, getting environment variable values, and changing parameters in running sessions. Since
the package DBMS_SUPPORT is not installed by default, DBMS_SYSTEM is an alternative to using
DBMS_SUPPORT for tracing sessions.
GET_ENV Procedure
This procedure gives read-only access to environment variables of the process servicing a
database client. In case you have ever been desperately seeking a way to ask a DBMS instance
which ORACLE_HOME it is running in, look no further. While the machine a DBMS instance is
running on (V$INSTANCE.HOST_NAME) as well as the ORACLE_SID of the instance
(V$INSTANCE.INSTANCE_NAME) are available through the view V$INSTANCE, none of the V$ views
tell what the setting of ORACLE_HOME is. The procedure GET_ENV is not available in Oracle9i.
Syntax
DBMS_SYSTEM.GET_ENV(
  var IN VARCHAR2,
  val OUT VARCHAR2);
Parameters

Parameter  Description
var        Name of an environment variable
val        Value of the environment variable
Usage Notes
When there is no environment variable that matches the name passed in parameter var, an
empty string is returned in val. If the maximum length of val is insufficient to hold the value of
the environment variable, the error “ORA-06502: PL/SQL: numeric or value error: character
string buffer too small” is thrown. To avoid this, I recommend that you always declare val as
large as possible. In SQL*Plus the maximum size is VARCHAR2(4000), as opposed to
VARCHAR2(32767) in PL/SQL. Environment variable names on UNIX systems are case-sensitive,
whereas they are not case-sensitive on Windows systems.
Examples
The following PL/SQL code may be used to retrieve the value of the environment variable
ORACLE_HOME in SQL*Plus:
SQL> SET AUTOPRINT ON
SQL> VARIABLE val VARCHAR2(4000)
SQL> BEGIN
  dbms_system.get_env('ORACLE_HOME', :val);
END;
/
PL/SQL procedure successfully completed.
VAL
-------------------------------------------------
/opt/oracle/product/10.2
KCFRMS Procedure
This procedure resets the maximum wait time for each event (V$SESSION_EVENT.MAX_WAIT), the
maximum read time for a data file (V$FILESTAT.MAXIORTM), and the maximum write time for a
data file (V$FILESTAT.MAXIOWTM) to zero. This procedure might be useful in an environment
where peaks in file access (V$FILESTAT) or wait events (V$SESSION_EVENT) are observed. By saving
the values from these V$ views just before calling DBMS_SYSTEM.KCFRMS on an hourly basis, it
would be possible to determine at which times during the day and to what extent peaks occur
(a sketch of such a snapshot job appears at the end of this section).
Syntax
DBMS_SYSTEM.KCFRMS();
Usage Notes
The values are set to zero for all sessions in V$SESSION_EVENT, not solely the session that calls
the procedure DBMS_SYSTEM.KCFRMS.
Examples
Following are some sample rows of V$SESSION_EVENT before calling DBMS_SYSTEM.KCFRMS (all
timings are in seconds; the value of the column MAX_WAIT is natively in centiseconds):
SQL> SELECT sid, wait_class, event,
  round(time_waited_micro/1000000,3) AS time_waited_sec,
  max_wait/100 AS max_wait_sec
FROM v$session_event WHERE sid IN (140, 141)
ORDER BY sid, wait_class, event;
SID WAIT_CLASS  EVENT                         TIME_WAITED_SEC MAX_WAIT_SEC
--- ----------- ----------------------------- --------------- ------------
140 Application enq: TX - row lock contention           4.929            3
140 Commit      log file sync                            .009          .01
140 Idle        SQL*Net message from client            53.167        29.72
140 Network     SQL*Net message to client                   0            0
140 System I/O  control file sequential read             .006          .01
140 User I/O    db file scattered read                   .208          .13
140 User I/O    db file sequential read                  .603          .04
140 User I/O    direct path write temp                      0            0
141 Application SQL*Net break/reset to client            .022            0
141 Commit      log file sync                             .04          .01
141 Idle        SQL*Net message from client          2432.647       171.39
141 Network     SQL*Net message to client                .001            0
141 Other       events in waitclass Other               1.042         1.04
141 System I/O  control file sequential read            2.166            0
141 User I/O    db file sequential read                  .267          .02
After calling DBMS_SYSTEM.KCFRMS, the same query on V$SESSION_EVENT gave the following
results:
SID WAIT_CLASS  EVENT                         TIME_WAITED_SEC MAX_WAIT_SEC
--- ----------- ----------------------------- --------------- ------------
140 Application enq: TX - row lock contention           4.929            0
140 Commit      log file sync                            .009            0
140 Idle        SQL*Net message from client           379.432            0
140 Network     SQL*Net message to client                   0            0
140 System I/O  control file sequential read             .006            0
140 User I/O    db file scattered read                   .208            0
140 User I/O    db file sequential read                  .603            0
140 User I/O    direct path write temp                      0            0
141 Application SQL*Net break/reset to client            .022            0
141 Commit      log file sync                             .04            0
141 Idle        SQL*Net message from client          2460.816        28.17
141 Network     SQL*Net message to client                .001            0
141 Other       events in waitclass Other               1.042            0
141 System I/O  control file sequential read            2.166            0
141 User I/O    db file sequential read                  .267            0
As you can see from these results, the values in the column MAX_WAIT (column MAX_WAIT_SEC)
have been set to zero for all events. For session 141 the value of MAX_WAIT for the event
“SQL*Net message from client” has become non-zero again, since the session had become active
after DBMS_SYSTEM.KCFRMS had finished.
The contents of V$FILESTAT before calling DBMS_SYSTEM.KCFRMS were as follows:
SELECT file#, phyrds, phywrts, readtim, writetim, maxiortm, maxiowtm
FROM v$filestat;
FILE# PHYRDS PHYWRTS READTIM WRITETIM MAXIORTM MAXIOWTM
----- ------ ------- ------- -------- -------- --------
    1   8194     845   36635      810        6        8
    2     26    1655      88     1698        0        1
    3    720    1324    3060     1534        4        0
    4    213      10     997        0        8        0
    5      5       1      23        1        0        0
After calling DBMS_SYSTEM.KCFRMS, querying V$FILESTAT returned the following data:
SELECT file#, phyrds, phywrts, readtim, writetim, maxiortm, maxiowtm
FROM v$filestat;
FILE# PHYRDS PHYWRTS READTIM WRITETIM MAXIORTM MAXIOWTM
----- ------ ------- ------- -------- -------- --------
    1   8194     849   36635      812        0        0
    2     26    1655      88     1698        0        0
    3    720    1324    3060     1534        0        0
    4    213      10     997        0        0        0
    5      5       1      23        1        0        0
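The following is a minimal sketch of the hourly snapshot job mentioned at the beginning of
this section, assuming it is created as SYS; the table name FILESTAT_PEAKS and the job name
are hypothetical.
SQL> CREATE TABLE filestat_peaks AS
  SELECT sysdate AS snap_time, file#, maxiortm, maxiowtm
  FROM v$filestat WHERE 1=0;
SQL> BEGIN
  DBMS_SCHEDULER.CREATE_JOB(
    job_name        => 'filestat_peaks_job',
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'BEGIN
        -- save the current peaks, then reset them
        INSERT INTO filestat_peaks
        SELECT sysdate, file#, maxiortm, maxiowtm FROM v$filestat;
        COMMIT;
        sys.dbms_system.kcfrms;
      END;',
    start_date      => systimestamp,
    repeat_interval => 'FREQ=HOURLY',
    enabled         => true);
END;
/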
KSDDDT Procedure
This procedure writes a timestamp into a SQL trace file. The format of the timestamp, as
represented by ORACLE DBMS date and time format models (see Chapter 2 of Oracle Database
SQL Reference 10g Release 2), is YYYY-MM-DD HH24:MI:SS.FF3.
Syntax
DBMS_SYSTEM.KSDDDT();
Usage Notes
If the process servicing the current database session does not yet have a trace file, e.g., from an
execution of the statement ALTER SESSION SET sql_trace=TRUE, then a trace file is created. The
timestamp’s format does not depend on the session’s national language support (NLS) settings,
i.e., NLS_DATE_FORMAT and related parameters do not influence the format of the timestamp.
Examples
C:> sqlplus / as sysdba
SQL*Plus: Release 10.2.0.1.0 - Production on Mon Jun 25 13:17:07 2007
SQL> ORADEBUG SETMYPID
Statement processed.
SQL> ORADEBUG TRACEFILE_NAME
Statement processed.
SQL> EXEC dbms_system.ksdddt();
PL/SQL procedure successfully completed.
SQL> ORADEBUG TRACEFILE_NAME
c:\programs\admin\ten\udump\ten_ora_4588.trc
SQL> $type c:\programs\admin\ten\udump\ten_ora_4588.trc
Dump file c:\programs\admin\ten\udump\ten_ora_4588.trc
Mon Jun 25 13:17:25 2007
…
Windows thread id: 4588, image: ORACLE.EXE (SHAD)
…
*** SERVICE NAME:(SYS$USERS) 2007-06-25 13:17:25.923
*** SESSION ID:(147.755) 2007-06-25 13:17:25.923
*** 2007-06-25 13:17:25.923
The last line in the preceding example is the timestamp written by DBMS_SYSTEM.KSDDDT. As
you can see, when ORADEBUG TRACEFILE_NAME was first called, it did not return the path of a trace
file, since no trace file existed at that point in time. By running DBMS_SYSTEM.KSDDDT a trace file
was created, and its path was returned by the second call to ORADEBUG TRACEFILE_NAME.
KSDFLS Procedure
This procedure flushes any pending output to the target file (alert log and/or trace file).
Syntax
DBMS_SYSTEM.KSDFLS();
Usage Notes
Personally, I have never encountered a situation where it was necessary to call DBMS_SYSTEM.KSDFLS.
Examples
SQL> EXEC dbms_system.ksdfls
KSDIND Procedure
This procedure indents the next string written to a SQL trace file with DBMS_SYSTEM.KSDWRT by
placing one or more colons (:), as specified by the parameter lvl, at the beginning of the line.
Syntax
DBMS_SYSTEM.KSDIND(
  lvl IN BINARY_INTEGER);
Parameters

Parameter  Description
lvl        Indentation level
Usage Notes
This procedure has no effect when writing to the alert log with DBMS_SYSTEM.KSDWRT.
Examples
SQL> BEGIN
  sys.dbms_system.ksdind(3);
  sys.dbms_system.ksdwrt(1, 'indented string');
END;
/
The above anonymous block writes the following line to the trace file:
:::indented string
KSDWRT Procedure
This procedure writes a string to the SQL trace file of the server process servicing the database
client, the alert log of the instance, or both. If SQL trace has not been enabled (through ALTER
SESSION, DBMS_SYSTEM, DBMS_SUPPORT,1 etc.) in the server process and thus no trace file exists yet,
a trace file is created. Thus it is possible to write transcripts of sessions to trace files without
enabling SQL trace.
Syntax
DBMS_SYSTEM.KSDWRT(
  dest IN BINARY_INTEGER,
  tst IN VARCHAR2);
Parameters

Parameter  Description
dest       Destination file; 1=SQL trace file, 2=alert log, 3=both
tst        String to write to destination file
Usage Notes
When writing to a SQL trace file, a timestamp such as the one written by DBMS_SYSTEM.KSDDDT is
automatically placed on the line preceding the string written, given that the session has been
inactive for more than ten seconds. To make sure the string written is always preceded by a
timestamp, explicitly call DBMS_SYSTEM.KSDDDT before calling DBMS_SYSTEM.KSDWRT.
1. The package DBMS_SUPPORT may be installed with the script $ORACLE_HOME/rdbms/admin/dbmssupp.sql.
Tracing of a foreign session is initiated with the procedure START_TRACE_IN_SESSION and terminated
with the procedure STOP_TRACE_IN_SESSION.
When writing to the alert log, the string written is unconditionally preceded by a timestamp
in the format “Dy Mon DD HH24:MI:SS YYYY” (e.g., “Mon Jun 25 15:17:37 2007”).
Since the package DBMS_SYSTEM has functionality that should only be accessible to privileged
sessions, it is advisable to provide access to DBMS_SYSTEM through wrapper procedures.
Such wrapper procedures are part of the open source ORACLE instrumentation library ILO by
Hotsos.2 Their names are HOTSOS_SYSUTIL.WRITE_DATESTAMP and HOTSOS_SYSUTIL.WRITE_TO_TRACE.
Execute permission on these packages is granted to PUBLIC.
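As an illustration of this advice, the following is a minimal sketch of a wrapper created by
SYS; the procedure name TRACE_WRITE is hypothetical and unrelated to the ILO implementation.
SQL> CREATE OR REPLACE PROCEDURE trace_write (p_text IN VARCHAR2)
AS
BEGIN
  -- destination 1 = SQL trace file only; callers gain no access to the
  -- alert log or to other DBMS_SYSTEM routines
  sys.dbms_system.ksdwrt(1, p_text);
END;
/
SQL> GRANT EXECUTE ON trace_write TO hr;  -- grant to the users who need it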
Examples
The following anonymous block shows how timing information for expensive tasks can be
written to a trace file. The string “===” at the beginning of the line serves to make sure that the
line is ignored by the TKPROF utility. DBMS_UTILITY.GET_CPU_TIME is not available in Oracle9i.
SQL> DECLARE
  elapsed_time_t1 NUMBER;
  elapsed_time_t2 NUMBER;
  cpu_time_t1 NUMBER;
  cpu_time_t2 NUMBER;
BEGIN
  elapsed_time_t1:=dbms_utility.get_time;
  cpu_time_t1:=dbms_utility.get_cpu_time;
  dbms_stats.gather_schema_stats(user); -- do something expensive
  elapsed_time_t2:=dbms_utility.get_time;
  cpu_time_t2:=dbms_utility.get_cpu_time;
  sys.dbms_system.ksdddt;
  sys.dbms_system.ksdwrt(1, '=== Elapsed time: ' ||
    to_char((elapsed_time_t2 - elapsed_time_t1)/100)||
    ' sec CPU: ' || to_char((cpu_time_t2 - cpu_time_t1)/100) || ' sec');
END;
/
This anonymous block writes entries such as these into a trace file:
*** 2007-06-25 16:23:12.316
=== Elapsed time: 1.15 sec CPU: .68 sec
Another example for leveraging DBMS_SYSTEM.KSDWRT might be to record errors that are not
normally logged to the alert log by creating a SERVERERROR trigger that calls KSDWRT. For example,
Oracle9i does not write entries to the alert log when “ORA-01555: snapshot too old” occurs
(Oracle10g does). By implementing a SERVERERROR trigger that checks for error code 1555, this
error and the statement that failed could be written to the alert log of an Oracle9i instance (see
the sketch below). Many monitoring tools such as Enterprise Manager or Tivoli TEC are able to
parse the alert log of an instance and to relay errors found to a management console. By writing
custom errors such as ORA-20000 to the alert log, a simple yet efficient integration may be built.
2. See http://www.hotsos.com.
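For illustration, a minimal sketch of such a trigger follows; the trigger name is hypothetical,
and only the first 64-byte chunk of the failing statement is logged.
SQL> CREATE OR REPLACE TRIGGER log_ora1555
AFTER SERVERERROR ON DATABASE
DECLARE
  sql_text ora_name_list_t;   -- failing statement in 64-byte chunks
  chunks   BINARY_INTEGER;
BEGIN
  IF ora_server_error(1) = 1555 THEN
    chunks := ora_sql_txt(sql_text);
    sys.dbms_system.ksdwrt(2, 'ORA-01555 caught; statement begins: '
      || CASE WHEN chunks >= 1 THEN sql_text(1) END);  -- 2 = alert log
  END IF;
END;
/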
READ_EV Procedure
This procedure reads the level at which an event has been enabled. It returns zero for
disabled events.
Syntax
DBMS_SYSTEM.READ_EV(
  iev IN BINARY_INTEGER,
  oev OUT BINARY_INTEGER);
Parameters

Parameter  Description
iev        Event number; usually between 10000 and 10999
oev        Level at which the event is set; level=0 means the event is disabled
Usage Notes
DBMS_SYSTEM.READ_EV works as expected when SQL trace has been enabled by setting event
10046, e.g., with ALTER SESSION, DBMS_MONITOR.SESSION_TRACE_ENABLE, or DBMS_SYSTEM.SET_EV.
DBMS_SYSTEM.READ_EV does not work in conjunction with ALTER SESSION SET SQL_TRACE=TRUE
in Oracle10g and Oracle11g, since the oev returned by DBMS_SYSTEM.READ_EV remains 0 when
this statement is executed (in Oracle9i oev=1 is returned, i.e., it works as expected in Oracle9i).
In the latter case, SELECT value FROM v$parameter WHERE name='sql_trace' can be used to find
out whether SQL trace is switched on; this works in both releases.
Examples
The following example shows that using DBMS_MONITOR in Oracle10g to enable tracing of SQL
statements and wait events sets event 10046 at level 8:
SQL> VARIABLE lev NUMBER
SQL> SET AUTOPRINT ON
SQL> EXECUTE sys.dbms_system.read_ev(10046, :lev)
PL/SQL procedure successfully completed.
LEV
----------
         0
SQL> EXEC dbms_monitor.session_trace_enable
PL/SQL procedure successfully completed.
SQL> SELECT sql_trace, sql_trace_waits, sql_trace_binds FROM v$session
WHERE sid=userenv('sid');
SQL_TRACE SQL_TRACE_WAITS SQL_TRACE_BINDS
--------- --------------- ---------------
ENABLED   TRUE            FALSE
SQL> EXECUTE sys.dbms_system.read_ev(10046,:lev)
PL/SQL procedure successfully completed.
LEV
----------
         8
SET_INT_PARAM_IN_SESSION Procedure
This procedure sets an integer parameter in a foreign database session.
Syntax
DBMS_SYSTEM.SET_INT_PARAM_IN_SESSION(
  sid IN NUMBER,
  serial# IN NUMBER,
  parnam IN VARCHAR2,
  intval IN BINARY_INTEGER);
Parameters

Parameter  Description
sid        Session identifier; corresponds to V$SESSION.SID
serial#    Session serial number; corresponds to V$SESSION.SERIAL#
parnam     Parameter name; corresponds to V$PARAMETER.NAME
intval     Integer value to assign to the parameter
Usage Notes
No exception is raised when an incorrect SID or SERIAL# is passed.
Examples
The following example shows how to increase the setting of the parameter SORT_AREA_SIZE in a
session, e.g., when sorts are slow due to an insufficient default value of the parameter
SORT_AREA_SIZE.3 First, start a session and retrieve the current setting of SORT_AREA_SIZE by
calling the package DBMS_UTILITY.4
3. Under most circumstances SORT_AREA_SIZE and other *_AREA_SIZE parameters should no longer be
used in favor of PGA_AGGREGATE_TARGET. SORT_AREA_SIZE cannot be modified with ALTER SYSTEM.
4. Retrieval of parameter settings with DBMS_UTILITY works in any session, whereas access to V$PARAMETER
requires the role SELECT_CATALOG_ROLE or another suitable privilege.
$ sqlplus hr/hr
SQL> VARIABLE result NUMBER
SQL> VARIABLE sort_area_size NUMBER
SQL> VARIABLE dummy VARCHAR2(255)
SQL> BEGIN
  :result:=dbms_utility.get_parameter_value(parnam=>'sort_area_size',
    intval=>:sort_area_size, strval=>:dummy);
END;
/
PL/SQL procedure successfully completed.
SQL> PRINT sort_area_size
SORT_AREA_SIZE
--------------
         65536
Next, start another SQL*Plus session as user SYS, retrieve the SID and SERIAL# of the
session you want to modify, and increase SORT_AREA_SIZE to 1048576 bytes by calling the
package DBMS_SYSTEM.
$ sqlplus "/ as sysdba"
SQL> SELECT sid, serial# FROM v$session WHERE username='HR';
       SID    SERIAL#
---------- ----------
         9         19
SQL> EXEC sys.dbms_system.set_int_param_in_session(9, 19, 'sort_area_size', 1048576);
PL/SQL procedure successfully completed.
Now, back in the first session by HR, verify that the parameter SORT_AREA_SIZE has actually
changed.
SQL> BEGIN
  :result:=dbms_utility.get_parameter_value(parnam=>'sort_area_size',
    intval=>:sort_area_size, strval=>:dummy);
END;
/
PL/SQL procedure successfully completed.
SQL> PRINT sort_area_size
SORT_AREA_SIZE
--------------
       1048576
SET_BOOL_PARAM_IN_SESSION Procedure
This procedure sets a boolean parameter in a foreign session.
Syntax
DBMS_SYSTEM.SET_BOOL_PARAM_IN_SESSION(
  sid IN NUMBER,
  serial# IN NUMBER,
  parnam IN VARCHAR2,
  bval IN BOOLEAN);
Parameters

Parameter  Description
sid        Session identifier; corresponds to V$SESSION.SID
serial#    Session serial number; corresponds to V$SESSION.SERIAL#
parnam     Parameter name; corresponds to V$PARAMETER.NAME
bval       Boolean value, i.e., TRUE or FALSE

Usage Notes
No exception is raised when an incorrect SID or SERIAL# is passed.
Examples
Setting the boolean parameter TIMED_STATISTICS to TRUE in a session with SID=12 and
SERIAL#=16 is accomplished as follows:
SQL> EXEC dbms_system.set_bool_param_in_session(12, 16, 'timed_statistics', TRUE);
SET_EV Procedure
This procedure sets a numeric event in a session or generates a dump of information contained
in the SGA or PGA by specifying the name of the dump (e.g., a SYSTEMSTATE dump).5 It is commonly
used to enable tracing of SQL statements, wait events, and binds in a session that exhibits
performance problems and is foreign to the caller’s session. Note that tracing wait events
(event 10046 at levels 8 and 12) and/or binds is highly recommended. Use of
DBMS_SYSTEM.SET_SQL_TRACE_IN_SESSION is discouraged, since it enables SQL trace at level 1,
i.e., without bind variables and wait events.
5. A SYSTEMSTATE dump reveals the current state of an instance and may be used by Oracle Support to
diagnose hanging issues.
Syntax
DBMS_SYSTEM.SET_EV(
  si IN BINARY_INTEGER,
  se IN BINARY_INTEGER,
  ev IN BINARY_INTEGER,
  le IN BINARY_INTEGER,
  nm IN VARCHAR2);
Parameters

Parameter  Description
si         Session identifier; corresponds to V$SESSION.SID
se         Session serial number; corresponds to V$SESSION.SERIAL#
ev         Event number between 10000 and 10999 for numeric events such as
           10053 (optimizer trace). If nm is not NULL and ev is an ORACLE error
           number outside of the range 10000 to 10999, the dump named by nm is
           taken when the session throws the error specified. If ev=65535 and nm
           is not NULL, an immediate dump of the type specified by nm is taken.
           This is equivalent to ALTER SESSION SET EVENTS 'IMMEDIATE TRACE NAME
           event_name LEVEL level', where event_name is the name of the dump
           to take, for example SYSTEMSTATE, PROCESSSTATE, or ERRORSTACK, and
           level is the event level. Named dumps are only possible within the
           session of the caller, that is, they cannot be taken in a foreign session.
le         Level of the event; 0=disable event. Each event supports certain levels.
           The maximum level is usually 10.
nm         Name, e.g., for taking diagnostic dumps. A list of permissible dump
           names is available by calling ORADEBUG DUMPLIST as user SYS in
           SQL*Plus. If nm is an empty string ('', i.e., two single quotes without
           any characters in between), the call to DBMS_SYSTEM.SET_EV is equivalent
           to ALTER SESSION SET EVENTS 'event TRACE NAME CONTEXT FOREVER, LEVEL
           level', where event is a numeric event such as 10046 and level is the
           event level.
Usage Notes
When using a numeric event such as 10046, set nm to an empty string (passing NULL does not
work, although the DBMS normally treats empty strings as NULL). If SET_EV is used incorrectly
or with an si or se that does not exist, the procedure call does not have any effect and no
exception is thrown.
Examples
The following examples illustrate both uses of DBMS_SYSTEM.SET_EV:
• Setting an event at session level (recommended for Shared Server database sessions)
• Taking a named dump
Enabling SQL Trace with SET_EV
Let’s assume that the session of user HR is a resource hog, so we need to trace SQL statements
and wait events to figure out what’s going on. The following example shows how to retrieve the
SID and SERIAL# from V$SESSION to enable tracing with DBMS_SYSTEM.SET_EV:
SQL> CONNECT / AS SYSDBA
Connected.
SQL> SELECT sid, serial# FROM v$session WHERE username='HR';
       SID    SERIAL#
---------- ----------
       140        862
SQL> EXECUTE dbms_system.set_ev(140, 862, 10046, 8, '')
PL/SQL procedure successfully completed.
Taking Named Dumps
Named dumps may only be taken in the same session that calls DBMS_SYSTEM.SET_EV. To take an
immediate ERRORSTACK dump in a foreign session, use ORADEBUG (see Chapter 37). The following
anonymous block takes an ERRORSTACK dump at level 3:
SQL> VARIABLE sid NUMBER
SQL> VARIABLE serial NUMBER
SQL> BEGIN
  SELECT sid, serial# INTO :sid, :serial
  FROM v$session
  WHERE sid=(SELECT sid FROM v$mystat WHERE rownum=1);
  sys.dbms_system.set_ev(:sid, :serial, 65535, 3, 'errorstack');
END;
/
The same result is attained by a much simpler ALTER SESSION statement.
SQL> ALTER SESSION SET EVENTS 'IMMEDIATE TRACE NAME ERRORSTACK LEVEL 3';
An error stack dump shows which subroutine a program was executing at the time the
dump was taken. It also lists the entire call stack, i.e., in what sequence subroutines called each
other. The currently executed subroutine is at the top of the call stack. ORACLE call stacks are
similar to call stacks obtained by reading core files with debuggers, such as adb, sdb, or gdb.
Taking an ERRORSTACK dump does not terminate an ORACLE process. However, you should
expect the process to become unresponsive while it writes the trace file.
A level 3 error stack dump includes open cursors and is thus useful to find the SQL statement text corresponding to a certain cursor number, in case the PARSING IN CURSOR entry for
a specific cursor is missing in a SQL trace file. In Oracle10g, cursors are dumped as in the
example below:
Cursor#1(07470C24) state=BOUND curiob=07476564
curflg=4c fl2=0 par=00000000 ses=6998B274
sqltxt(66F31AA0)=ALTER SESSION SET EVENTS 'IMMEDIATE TRACE NAME ERRORSTACK LEVEL 3'
SET_SQL_TRACE_IN_SESSION Procedure
This procedure enables SQL trace in a database session.
Syntax
DBMS_SYSTEM.SET_SQL_TRACE_IN_SESSION(
  sid IN NUMBER,
  serial# IN NUMBER,
  sql_trace IN BOOLEAN);
Parameters

Parameter  Description
sid        Session identifier; corresponds to V$SESSION.SID
serial#    Session serial number; corresponds to V$SESSION.SERIAL#
sql_trace  TRUE turns tracing on, FALSE turns tracing off

Usage Notes
No exception is raised when one of the parameters SID or SERIAL# is incorrect. Use the packaged
procedure DBMS_SYSTEM.SET_SQL_TRACE_IN_SESSION only if you are sure that you don’t need to
trace wait events and bind variables.
Examples
SQL> SELECT sid, serial# FROM v$session WHERE username='NDEBES';
       SID    SERIAL#
---------- ----------
        35        283
SQL> EXEC dbms_system.set_sql_trace_in_session(35, 283, true);
WAIT_FOR_EVENT Procedure
This procedure causes the calling session to wait artificially for a specified number of seconds
for the event named. The event must be a wait event from V$EVENT_NAME.NAME. If SQL trace at
level 8 or 12 is enabled, artificially generated wait events are emitted to a trace file.
WAIT_FOR_EVENT is useful for developers of extended SQL trace profilers who need to make sure
that their profiler software understands all the wait events that might be emitted to a trace file.
It would be hard to write software that is able to cause all of the 872 wait events in Oracle10g
for the purpose of testing a profiler. Even more so, since Oracle10g wait events are logged with
various meaningful parameter names instead of p1, p2, p3 as in Oracle9i and earlier releases.
Syntax
DBMS_SYSTEM.WAIT_FOR_EVENT(
  event IN VARCHAR2,
  extended_id IN BINARY_INTEGER,
  timeout IN BINARY_INTEGER);
Parameters

Parameter    Description
event        Name of a wait event; corresponds to V$EVENT_NAME.NAME
extended_id  Additional information on the event
timeout      Time to wait for the event (in seconds)
Usage Notes
The session waits for the time-out period specified and populates the column V$SESSION_
WAIT.EVENT with the value of the parameter EVENT and V$SESSION_WAIT.P1 with the value of the
parameter EXTENDED_ID. For wait events that have more than one parameter, the remaining
parameters are set to default values, which limits the usefulness of the procedure for testing
extended SQL trace profilers. In Oracle9i, EXTENDED_ID is emitted to a trace file in the format
p1=extended_id, whereas in Oracle10g the format is field_name=extended_id, where field_name
may be retrieved from V$EVENT_NAME.PARAMETER1 as shown in the next section. If event is not a
valid event from V$EVENT_NAME, the exception “ORA-29352: event ’event_name’ is not an internal
event” is raised.
Examples
SQL> CONNECT / AS SYSDBA
Connected.
SQL> ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT FOREVER, LEVEL 8';
Session altered.
SQL> EXECUTE dbms_system.wait_for_event('db file scattered read', 1, 1);
PL/SQL procedure successfully completed.
SQL> EXECUTE dbms_system.wait_for_event('index block split', 204857603, 1);
PL/SQL procedure successfully completed.
SQL> ORADEBUG SETMYPID
Statement processed.
SQL> ORADEBUG TRACEFILE_NAME
/opt/oracle/admin/ten/udump/ten_ora_2928.trc
SQL> !egrep 'scattered|split' /opt/oracle/admin/ten/udump/ten_ora_2928.trc
BEGIN dbms_system.wait_for_event('db file scattered read', 1, 1); END;
WAIT #1: nam='db file scattered read' ela= 993052 file#=1 block#=0 blocks=0 obj#=-1
tim=456558169208
BEGIN dbms_system.wait_for_event('index block split', 204857603, 1); END;
WAIT #1: nam='index block split' ela= 994057 rootdba=204857603 level=0 childdba=0
obj#=-1 tim=456559821542
SQL> SELECT parameter1 FROM v$event_name WHERE name='db file scattered read';
PARAMETER1
----------
file#
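To exercise an extended SQL trace profiler against the full set of wait events, a loop over
V$EVENT_NAME suggests itself. The following is a minimal sketch; with a one-second time-out per
event, a complete run over all 872 Oracle10g events takes roughly 15 minutes:
SQL> BEGIN
  FOR ev IN (SELECT name FROM v$event_name ORDER BY name) LOOP
    BEGIN
      sys.dbms_system.wait_for_event(ev.name, 1, 1);
    EXCEPTION
      WHEN OTHERS THEN NULL; -- skip events that cannot be waited for
    END;
  END LOOP;
END;
/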
CHAPTER 21
■■■
DBMS_UTILITY
Most of the functions and procedures of the PL/SQL package DBMS_UTILITY are sufficiently
documented. The procedure NAME_RESOLVE accepts the name of a database object and decomposes
the name into its constituent parts, such as a schema, an object name, and potentially
a database link. According to Oracle Database PL/SQL Packages and Types Reference 10g Release 2,
the procedure supports four object types: synonym, procedure, function, and package.
Actual testing reveals that it supports a total of eight object types, thus extending its usefulness.
DBMS_UTILITY.NAME_RESOLVE is useful for applications that deal with database object
names. It removes from applications the burden of translating a name into a fully qualified
database object designation. SQL identifiers are not case sensitive unless surrounded by double
quotes. Considering that identifiers may be case sensitive or contain spaces and punctuation
characters, it clearly makes sense to reuse existing functionality instead of parsing such names by hand.
NAME_RESOLVE Procedure
This chapter discusses undocumented aspects of the packaged procedure DBMS_UTILITY.
NAME_RESOLVE. This procedure resolves a name that may include quotes, spaces, and mixed
case to an unambiguous designation of a database object. The return parameter values have
the same spelling as the schema and object names in the data dictionary. The object identifier,
which corresponds to DBA_OBJECTS.OBJECT_ID, is returned along with the type of object resolved.
Syntax
DBMS_UTILITY.NAME_RESOLVE (
   name          IN  VARCHAR2,
   context       IN  NUMBER,
   schema        OUT VARCHAR2,
   part1         OUT VARCHAR2,
   part2         OUT VARCHAR2,
   dblink        OUT VARCHAR2,
   part1_type    OUT NUMBER,
   object_number OUT NUMBER);
Parameters

Parameter      Description
name           Name of a database object in the format
               [schema.]identifier1[.identifier2][@database_link], where all
               placeholders are valid SQL identifiers. All components except
               identifier1 are optional.
context        The context in which to resolve a name. An integer between 1 and 7
               (not 0 and 8 as stated in the documentation).1 Names of different
               object types must be resolved in the correct context. See Table 21-1
               for the mapping from object type to context.
schema         Schema containing the resolved object.
part1          Database object name.
part2          Lower level database object name, such as a column of a table or a
               procedure in a package. No check for correctness is performed at
               this level.
dblink         Database link name, if present in parameter name.
part1_type     Numeric code representing the object type of part1. See Table 21-2
               for mapping type codes to object types.
object_number  The unique numeric object identification from DBA_OBJECTS.OBJECT_ID.
               If name contains a database link, then object_number is zero.
If the type of an object is unknown, all contexts must be tried in an attempt to resolve a
name. Table 21-1 lists the supported object types and the corresponding context numbers.
Clusters, database links (by themselves), directories, indexes, LOB column names, queues, rule
names, and rule sets cannot be resolved. Use the packaged procedure DBMS_UTILITY.NAME_
TOKENIZE to decompose names of such objects for easier lookup in the data dictionary.
1. Values outside of the range 1–7 cause “ORA-20005: ORU-10034: context argument must be 1 or 2 or 3 or
4 or 5 or 6 or 7”.
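As a sketch of the latter, the following anonymous block tokenizes an index name, which
NAME_RESOLVE cannot handle; the index name is assumed and the tokens are returned in
dictionary spelling:
SQL> SET SERVEROUTPUT ON
SQL> DECLARE
  a       VARCHAR2(30);
  b       VARCHAR2(30);
  c       VARCHAR2(30);
  dblink  VARCHAR2(128);
  nextpos BINARY_INTEGER;
BEGIN
  dbms_utility.name_tokenize('hr.emp_email_uk', a, b, c, dblink, nextpos);
  dbms_output.put_line('a='||a||', b='||b); -- prints a=HR, b=EMP_EMAIL_UK
END;
/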
Table 21-1. Context Parameter vs. Object Type

Object Type    Context Parameter
Package        1
Sequence       2
Synonym        2
View           2
Table          2
Trigger        3
Type           7
The range of values pertaining to the OUT parameter PART1_TYPE is in Table 21-2.
Table 21-2. Mapping from PART1_TYPE to Object Type

PART1_TYPE    Object Type
0             Object name followed by database link
2             Table
4             View
6             Sequence
7             Procedure
8             Function
9             Package
12            Trigger
13            Type
Usage Notes
The name to resolve may include three identifiers, as in schema.table_name.column_name or
schema.package_name.subroutine_name. These types of expressions are resolved to schema.
table_name and schema.package_name respectively. Neither column_name nor subroutine_
name is checked for correctness. When a database link is detected, PART1_TYPE=0 is returned
and no checking of object names occurs. To resolve such names, connect to the database
where the remote object resides and repeat name resolution there, or perform a remote
procedure call by using the database link.
Exceptions
If a name cannot be resolved in the specified context, “ORA-06564: object object_name does
not exist” is raised, where object_name is the value of parameter NAME passed to the procedure
NAME_RESOLVE. If an existing object is resolved in the wrong context, the exception “ORA-04047:
object specified is incompatible with the flag specified” is thrown.
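Catching these two exceptions makes it possible to try all contexts in turn when the object
type is unknown, in the spirit of the script name_resolve_procedure.sql from the source code
depot. A minimal sketch:
SQL> SET SERVEROUTPUT ON
SQL> DECLARE
  l_schema        VARCHAR2(30);
  l_part1         VARCHAR2(30);
  l_part2         VARCHAR2(30);
  l_dblink        VARCHAR2(128);
  l_part1_type    NUMBER;
  l_object_number NUMBER;
BEGIN
  FOR ctx IN 1 .. 7 LOOP
    BEGIN
      dbms_utility.name_resolve('product_user_profile', ctx, l_schema, l_part1,
                                l_part2, l_dblink, l_part1_type, l_object_number);
      dbms_output.put_line('Context '||ctx||': '||l_schema||'.'||l_part1||
                           ' (PART1_TYPE='||l_part1_type||')');
      EXIT; -- resolved; no need to try the remaining contexts
    EXCEPTION
      WHEN OTHERS THEN NULL; -- ORA-06564 or ORA-04047: try the next context
    END;
  END LOOP;
END;
/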
Examples
In Oracle10g, the public synonym PRODUCT_USER_PROFILE points to a table with SQL*Plus
configuration data in schema SYSTEM. The following example resolves this public synonym
and displays the table it refers to (see file name_resolve_table.sql in the source code depot):
SQL> VARIABLE name VARCHAR2(100)
SQL> VARIABLE context NUMBER
SQL> VARIABLE schema VARCHAR2(30)
SQL> VARIABLE part1 VARCHAR2(30)
SQL> VARIABLE part2 VARCHAR2(30)
SQL> VARIABLE dblink VARCHAR2(30)
SQL> VARIABLE part1_type NUMBER
SQL> VARIABLE object_number NUMBER
SQL> BEGIN
  :context:=2; -- 1: package, 2: table
  :name:=' "SYSTEM" . Product User Profile '; -- name to resolve
  DBMS_UTILITY.NAME_RESOLVE (
    name          => :name,
    context       => :context,
    schema        => :schema,
    part1         => :part1,
    part2         => :part2,
    dblink        => :dblink,
    part1_type    => :part1_type,
    object_number => :object_number
  );
END;
/
PL/SQL procedure successfully completed.
After a successful call to DBMS_UTILITY.NAME_RESOLVE, the bind variables contain the
constituent parts of the referenced database object.
SQL> SELECT -- resolved name
  '"' || :schema || '"' || nvl2(:part1, '."' || :part1 || '"', null) ||
  nvl2(:part2, '."' || :part2 || '"', NULL) ||
  nvl2(:dblink, '@"' || :dblink || '"', NULL) || ' is ' ||
  -- translate part1_type to object type
  decode(:part1_type, 0, 'an object at a remote database',
                      2, 'a table',
                      4, 'a view',
                      6, 'a sequence',
                      7, 'a procedure',
                      8, 'a function',
                      9, 'a package',
                     12, 'a trigger',
                     13, 'a type') ||
  ' (PART1_TYPE=' || :part1_type || ', OBJECT_NUMBER=' ||
  :object_number || ')' AS detailed_info
FROM dual;
DETAILED_INFO
---------------------------------------------------------------------------------
"SYSTEM"."SQLPLUS_PRODUCT_PROFILE" is a table (PART1_TYPE=2, OBJECT_NUMBER=10209)
The OUT parameter OBJECT_NUMBER may be used to retrieve additional information on a
database object from the dictionary views, such as ALL_OBJECTS.
SQL> SELECT owner, object_name, object_type, status, created
     FROM all_objects
     WHERE object_id=:object_number;
OWNER  OBJECT_NAME             OBJECT_TYPE STATUS  CREATED
------ ----------------------- ----------- ------- ---------
SYSTEM SQLPLUS_PRODUCT_PROFILE TABLE       VALID   30.Aug.05
The result of the query on ALL_OBJECTS at the very end of the script confirms that the name
was correctly resolved. Thus, the public synonym PRODUCT_USER_PROFILE resolves to the table
SQLPLUS_PRODUCT_PROFILE. With a name containing a database link
("SYSTEM".Product_User_Profile@db_link) as a final example, the result is as follows:
DETAILED_INFO
-----------------------------------------------------------------------------
"SYSTEM"."PRODUCT_USER_PROFILE"@"DB_LINK" is an object at a remote database
(PART1_TYPE=0, OBJECT_NUMBER=0)

no rows selected

This time, the query on ALL_OBJECTS does not return a result, since the value of the OUT
parameter OBJECT_NUMBER is zero.
Name Resolution and Extraction of Object Statistics
An ORACLE performance optimization assignment may involve the extraction of object
statistics used by the cost-based optimizer. I frequently use a script called statistics.sql, which
reports details of the table structure, table partitions and subpartitions (if present), table and
index cardinality, distinct values in columns, distinct index keys, timestamps of the last statistics
gathering, indexes, tablespaces, block size, and LOB columns. The report reveals problems
such as stale or missing statistics, indexes on non-selective columns, or LOBs with the NOCACHE
option, which causes direct path read and write waits. The report also makes it easy to identify
columns with high selectivity, which are good candidates for indexing, given that these columns
appear as predicates in WHERE clauses. I highly recommend using the script when investigating
performance problems.
The use of DBMS_UTILITY.NAME_RESOLVE makes calling the script much more convenient,
since it is sufficient to provide the name of a table (or synonym) to report on in lowercase
instead of the owner (or schema) and table with exactly the same spelling as in the data dictionary
(usually all uppercase letters). The script takes the name of a database object as input, resolves
the name, and then pulls the relevant information from ALL_* dictionary views (this is serious
business, not an All-Star Game!), such as ALL_TABLES, ALL_INDEXES, ALL_LOBS, and so on. The
script works without DBA privileges, since ALL_* views and not DBA_* views are used. The syntax
for running the script is as follows:
sqlplus -s user/password @statistics[.sql] { [schema.]table_name | synonym }
Data dictionary object names that contain lowercase letters must be quoted. The
following illustration shows sample output from the script statistics.sql.
$ sqlplus -s hr/hr @statistics.sql employees

Table Owner                    Number          Empty Average Chain Average Global
and Name                       of Rows Blocks Blocks   Space Count Row Len Stats
------------------------------ ------- ------ ------ ------- ----- ------- ------
HR . EMPLOYEES                     107      5      0       0     0      68 YES

           Buffer         IOT      User  Sample Last
Monitoring Pool    Degree Name     Stats Size   Analyze
---------- ------- ------ -------- ----- ------ ---------------
YES        DEFAULT 1               NO     2,000 01.Oct 07 18:09

Index Owner                    Uni B Tree Leaf Distinct Number  Leaf Blks Data Blks Cluster Global
and Name                       que Level  Blks Keys     of Rows Per Key   Per Key   Factor  Stats
------------------------------ --- ------ ---- -------- ------- --------- --------- ------- ------
HR . EMP_EMAIL_UK              YES      0    1      107     107         1         1      19 YES
HR . EMP_EMP_ID_PK             YES      0    1      107     107         1         1       2 YES
HR . EMP_DEPARTMENT_IX         NO       0    1       11     106         1         1       7 YES
HR . EMP_JOB_IX                NO       0    1       19     107         1         1       8 YES
HR . EMP_MANAGER_IX            NO       0    1       18     106         1         1       7 YES
HR . EMP_NAME_IX               NO       0    1      107     107         1         2      15 YES

Index Owner                    Column                         Col Column
and Name                       Name                           Pos Details
------------------------------ ------------------------------ --- ------------------------
HR . EMP_DEPARTMENT_IX         DEPARTMENT_ID                    1 NUMBER(4)
HR . EMP_EMAIL_UK              EMAIL                            1 VARCHAR2(25) NOT NULL
HR . EMP_EMP_ID_PK             EMPLOYEE_ID                      1 NUMBER(6) NOT NULL
HR . EMP_JOB_IX                JOB_ID                           1 VARCHAR2(10) NOT NULL
HR . EMP_MANAGER_IX            MANAGER_ID                       1 NUMBER(6)
HR . EMP_NAME_IX               LAST_NAME                        1 VARCHAR2(25) NOT NULL
HR . EMP_NAME_IX               FIRST_NAME                       2 VARCHAR2(20)

Column          Column                 Distinct         Number Number Global User  Last
Name            Details                  Values Density Buckets Nulls Stats  Stats Analyze
--------------- ---------------------- -------- ------- ------- ----- ------ ----- ---------------
EMPLOYEE_ID     NUMBER(6) NOT NULL          107   .0093       1     0 YES    NO    01.Oct 07 18:09
FIRST_NAME      VARCHAR2(20)                 91   .0110       1     0 YES    NO    01.Oct 07 18:09
LAST_NAME       VARCHAR2(25) NOT NULL       102   .0098       1     0 YES    NO    01.Oct 07 18:09
EMAIL           VARCHAR2(25) NOT NULL       107   .0093       1     0 YES    NO    01.Oct 07 18:09
PHONE_NUMBER    VARCHAR2(20)                107   .0093       1     0 YES    NO    01.Oct 07 18:09
HIRE_DATE       DATE NOT NULL                98   .0102       1     0 YES    NO    01.Oct 07 18:09
JOB_ID          VARCHAR2(10) NOT NULL        19   .0047      19     0 YES    NO    01.Oct 07 18:09
SALARY          NUMBER(8,2)                  57   .0175       1     0 YES    NO    01.Oct 07 18:09
COMMISSION_PCT  NUMBER(2,2)                   7   .1429       1    72 YES    NO    01.Oct 07 18:09
MANAGER_ID      NUMBER(6)                    18   .0047      18     1 YES    NO    01.Oct 07 18:09
DEPARTMENT_ID   NUMBER(4)                    11   .0047      11     1 YES    NO    01.Oct 07 18:09

                               Block- Tablespace
Tablespace                     size   Size (MB)
------------------------------ ------ ----------
EXAMPLE                        8 KB   0
Source Code Depot
Table 21-3 lists this chapter’s source files and their functionality.
Table 21-3. DBMS_UTILITY Source Code Depot

File Name                   Functionality
name_resolve.sql            This script contains an anonymous block that resolves
                            the name of any object type supported by the procedure
                            DBMS_UTILITY.NAME_RESOLVE.
name_resolve_procedure.sql  This script creates a stored procedure called NAME_RESOLVE.
                            The procedure resolves all the object types supported by
                            DBMS_UTILITY.NAME_RESOLVE. It accepts the same parameters
                            as the packaged procedure DBMS_UTILITY.NAME_RESOLVE,
                            tries all contexts, and if successful, returns the same
                            information as DBMS_UTILITY.NAME_RESOLVE. Additionally,
                            it returns the qualified resolved name. Individual
                            components of the resolved name are returned in double
                            quotes, to preserve case sensitivity.
name_resolve_table.sql      Example of name resolution of a table with DBMS_UTILITY.
statistics.sql              This script reports optimizer statistics for table and
                            index columns. Used for checking column selectivity and
                            indexing when tuning SQL statements.
PART 7
■■■
Application Development
CHAPTER 22
■■■
Perl DBI and DBD::Oracle
Perl is a general-purpose, interpreted programming language that supports access to many
commonly used database systems. The documentation does not mention the fact that each
Oracle10g and Oracle11g ORACLE_HOME contains a Perl installation, which includes the Perl
modules DBI and DBD::Oracle for access to an ORACLE DBMS instance. Hence there is no
need to install Perl, DBI, and DBD::Oracle,1 which requires a C compiler and a scarce resource
called time. Perl and the DBI may be used to write your own monitoring and benchmarking
tools, to extract data (including LOBs) to flat files for long-term archival, and to insert or update
LOBs from operating system files. Furthermore, Perl DBI is an excellent prototyping tool. The
Perl subdirectory plus several directories within ORACLE_HOME can be copied to create a small-
footprint Perl DBI Oracle client for machines that do not require a full RDBMS server installation.
Circumnavigating Perl DBI Pitfalls
Apart from pointing out that Perl DBI is sitting in each Oracle10g ORACLE_HOME ready for use
and showing how to use it, what benefit could I possibly provide in this chapter? In my experience,
users of the Perl DBI struggle with the many ways of connecting to an ORACLE instance and how to
implement them in Perl DBI scripts, since the Perl DBI and DBD::Oracle documentation at the
Comprehensive Perl Archive Network (CPAN; http://www.cpan.org) does not provide all the
details.
The goal of this chapter is to provide you with a comprehensive source on Perl programming in
an ORACLE environment. The following ORACLE-specific material is addressed (a brief taste of
the first item follows the list):
• Named bind variables
• Connecting via the TCP/IP, bequeath, and IPC protocols
• Connecting with SYSDBA and SYSOPER privileges
• Using connect strings with and without Net service names
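As a brief taste of the first item, here is a minimal sketch of a named bind variable with
DBD::Oracle ($dbh is an existing database handle; the HR sample schema is assumed):
my $sth = $dbh->prepare("SELECT last_name FROM hr.employees WHERE employee_id=:id");
$sth->bind_param(":id", 100); # bind by name instead of by position
$sth->execute();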
Basically, any book should be able to speak for itself. However, including in-depth coverage of
Perl, the DBI, and DBD::Oracle would be off-topic for this book, not to mention that it would
not fit the page count. By addressing the ORACLE-specific issues of connecting in great detail
and providing an example Perl script that does SELECT and INSERT, calls PL/SQL, and includes
transaction as well as exception handling, I hope to instill enough confidence in the novice Perl
DBI user for him or her to begin coding Perl DBI scripts.
1. The Tk package, which is needed to build graphical user interfaces with Perl, is not included.
Personally, I strongly endorse the use of Perl with DBI over scripting with SQL*Plus and a
command shell such as Bourne Shell or Korn Shell. For example, error handling in Perl is much
better, several database sessions may be opened simultaneously, file operations are much
more sophisticated, and so on. Once you have some practice programming the Perl DBI, development time will be reduced significantly. I encourage any DBA, even DBAs without prior
knowledge of Perl, to accept the challenge of learning the language. Once proficient, you will
discover that there is plenty of low-hanging fruit for its application.
If you encounter any bugs while using a Perl distribution that ships with Oracle software,
I suggest that you try a more recent Perl distribution from ActiveState (http://www.
activestate.com). ActiveState, a company that specializes in scripting languages such as Perl,
provides a free Perl distribution that includes DBD::Oracle starting with Perl release 5.10. The
DBI and DBD::Oracle releases provided by ActiveState are more recent than those shipped
with Oracle11g.
A Brief History of Perl and the DBI
The inventor of Perl is Larry Wall. Perl was first released in 1987. Today it is distributed under
the GNU General Public License and the Perl Artistic License. Can you keep a secret? Larry Wall
said that Perl actually stands for Pathologically Eclectic Rubbish Lister, but don’t tell anyone
else he said that!
The truth is that the acronym Perl means Practical Extraction and Reporting Language.
Programming Perl, coauthored by Larry Wall2 (a.k.a. “The Camel Book” and “The Perl Bible”)
states that “Perl is designed to make the easy jobs easy, without making the hard jobs impossible” [WaCh 2000]. Although at first glance it may appear that Perl is an interpreted language,
Perl code is actually compiled and run using virtual machine technology comparable to Java.
The Perl DBI (database independent interface) is written and maintained by Tim Bunce,
author of the book Programming the Perl DBI [Bunc 2000]. His motto is “making simple things
easy and difficult things possible.” Besides designing and implementing the DBI, Tim Bunce
has been a Perl5 porter since 1994, contributing to the development of the Perl language and
many of its core modules. He provides additional information on the Perl DBI and DBD::Oracle
at http://search.cpan.org/~timb.
Setting Up the Environment for Perl and the DBI
The Perl installation distributed as part of each Oracle10g and Oracle11g ORACLE_HOME resides in
the directory $ORACLE_HOME/perl. It is used by Oracle Corporation’s Database Control and Grid
Control administration tools. Perl DBI, being a database client, needs the ORACLE DBMS client
shared library, which contains the Oracle Call Interface (OCI) routines, for communication
with an ORACLE instance. The only obstacles on the way to leveraging the Perl DBI in $ORACLE_
HOME are locating a suitable client shared library and setting platform-specific environment
variables.
2. For more information on Larry Wall, see http://en.wikipedia.org/wiki/Larry_Wall.
UNIX Environment
Many UNIX systems already have a Perl installation. Usually, the DBI module for database
access is missing from that installation, and installing it would require a C compiler. It is much
easier to set some environment variables for using Perl in an ORACLE_HOME by following the steps
described next.
PATH
First of all, the executable search path variable PATH needs to be modified, such that $ORACLE_HOME/
perl/bin is searched before any directory that contains the Perl interpreter shipped with the
operating system.
$ export PATH=$ORACLE_HOME/perl/bin:$PATH
$ which perl
/opt/oracle/product/db10.2/perl/bin/perl
PERL5LIB
The second step is setting the Perl module search path. Perl programs normally have extension
.pl, while Perl modules have extension .pm. The DBI is implemented by a file named DBI.pm. It
is this file that the Perl interpreter needs to locate. Fortunately, this variable does not depend
on the word size of the platform or the version of Perl. The following setting may be used
generically:
export PERL5LIB=$ORACLE_HOME/perl/lib:$ORACLE_HOME/perl/lib/site_perl
On a 32-bit Linux system, the required file is $ORACLE_HOME/perl/lib/site_perl/5.8.3/
i686-linux-thread-multi/DBI.pm. Perl locates the file by also searching subdirectories of the
directories specified with PERL5LIB. It starts looking in the most specific directory for the build
and version of Perl, then tries directories further up in the directory tree, and finally stops after
searching the directory specified.
In the above example, the version of Perl (perl -version) is 5.8.3 and the build is
i686-linux-thread-multi (Intel x86 32-bit machine architecture with Perl multithreading linked in).
In case PERL5LIB is set incorrectly or the DBI is not present in the Perl installation (such as the
one that ships with the operating system), you will get an error similar to the following:
$ echo $PERL5LIB
/opt/oracle/product/db10.2/perl/lib
$ echo "use DBI;" | perl
Can't locate DBI.pm in @INC (@INC contains: /opt/oracle/product/db10.2/perl/lib/
5.8.3/i686-linux-thread-multi /opt/oracle/product/db10.2/perl/lib/5.8.3
/opt/oracle/product/db10.2/perl/lib) at dbi.pl line 3.
BEGIN failed--compilation aborted at dbi.pl line 3.
In the preceding example, $ORACLE_HOME/perl/lib/site_perl is missing from PERL5LIB.
The fix is to add this directory to PERL5LIB. Use a colon as the separator character when
specifying multiple directories to search. If you need to use additional Perl modules not located in
$ORACLE_HOME/perl/lib, or have even written your own modules (it’s not that hard, trust me), you
must also add their locations to PERL5LIB. The source code depot contains a Perl module, which
is built on top of the DBI.
Shared Library Search Path
All current implementations of UNIX use shared libraries and runtime linking instead of static
linking. With static linking, libraries such as the Standard C library are copied into programs
at link time. Dynamic linking adds the library to the text segment (where the machine
code instructions are) of the program at runtime. This approach results in smaller executables
and has the advantage that newer releases of shared libraries are picked up automatically by
the executables the next time they are run. On many UNIX platforms, the command ldd may
be used to find out which shared libraries an executable requires.
$ ldd `which perl`
libnsl.so.1 => /lib/libnsl.so.1 (0x0551c000)
libdl.so.2 => /lib/libdl.so.2 (0x00862000)
libm.so.6 => /lib/tls/libm.so.6 (0x00868000)
libcrypt.so.1 => /lib/libcrypt.so.1 (0x053f9000)
libutil.so.1 => /lib/libutil.so.1 (0x00cd8000)
libpthread.so.0 => /lib/tls/libpthread.so.0 (0x0097e000)
libc.so.6 => /lib/tls/libc.so.6 (0x00734000)
/lib/ld-linux.so.2 (0x0071a000)
In the preceding output, libc.so.6 is the Standard C library, libm.so is the math library,
and libdl.so is part of the dynamic linker itself.
The initial step on the way to a correct shared library search path is to determine whether
$ORACLE_HOME/perl/bin/perl is a 32-bit or a 64-bit executable. This can be done using the
UNIX command file. On an AIX system, this might yield the following:
$ file $ORACLE_HOME/perl/bin/perl
/opt/oracle/product/10.2.0.2.1/perl/bin/perl: executable (RISC System/6000) or
object module not stripped
Since there is no mention of 64-bit in the preceding output, perl is a 32-bit executable. On
a 32-bit Linux system, it might look as follows:
$ file `which perl`
/opt/oracle10/app/oracle/product/10.2.0.2/db_rac/perl/bin/perl: ELF 32-bit LSB
executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.2.5,
dynamically linked (uses shared libs), not stripped
On this Solaris 9 system, perl is a 64-bit executable.
$ file /opt/oracle/product/10.2.0.2.0/perl/bin/perl
/opt/oracle/product/10.2.0.2.0/perl/bin/perl: ELF 64-bit MSB executable SPARCV9
Version 1, dynamically linked, not stripped
The next step consists of locating the ORACLE client shared library libclntsh.so with the
same word length (32-bit or 64-bit) as perl. ORACLE installations can be 32-bit or 64-bit. ORACLE
DBMS software is only available in a 64-bit version for platforms such as AIX and HP-UX, whereas
32-bit versions exist for Sparc Solaris, Solaris x86, and Linux. Matters are simple on an operating
system that only supports 32-bit executables, such as Linux x86. On these platforms,
$ORACLE_HOME/lib contains a suitable 32-bit client shared library.
In a 64-bit ORACLE_HOME, 64-bit shared libraries are located in $ORACLE_HOME/lib and 32-bit
libraries in $ORACLE_HOME/lib32. Here’s an example from an AIX system:
$ file $ORACLE_HOME/lib/libclntsh.so $ORACLE_HOME/lib32/libclntsh.so
/opt/oracle/product/10.2.0.2.1/lib/libclntsh.so: 64-bit XCOFF executable or object
module not stripped
/opt/oracle/product/10.2.0.2.1/lib32/libclntsh.so: executable (RISC System/6000) or
object module not stripped
On a Solaris system, it might look as follows:
$ file $ORACLE_HOME/lib/libclntsh.so $ORACLE_HOME/lib32/libclntsh.so
/opt/oracle/product/10.2.0.2.1/lib/libclntsh.so: ELF 64-bit MSB dynamic lib
SPARCV9 Version 1, dynamically linked, not stripped
/opt/oracle/product/10.2.0.2.1/lib32/libclntsh.so: ELF 32-bit MSB dynamic lib
SPARC Version 1, dynamically linked, not stripped
Finally, here’s some output from a 64-bit Linux system:
$ file $ORACLE_HOME/lib/libclntsh.so* $ORACLE_HOME/lib32/libclntsh.so*
/opt/oracle10/app/oracle/product/10.2.0.2/db_rac/lib/libclntsh.so: symbolic
link to `libclntsh.so.10.1'
/opt/oracle10/app/oracle/product/10.2.0.2/db_rac/lib/libclntsh.so.10.1: ELF 64-bit
LSB shared object, AMD x86-64, version 1 (SYSV), not stripped
/opt/oracle10/app/oracle/product/10.2.0.2/db_rac/lib32/libclntsh.so: symbolic
link to `libclntsh.so.10.1'
/opt/oracle10/app/oracle/product/10.2.0.2/db_rac/lib32/libclntsh.so.10.1: ELF 32-bit
LSB shared object, Intel 80386, version 1 (SYSV), not stripped
Now we are ready to set the platform-specific environment variable, which controls the
shared library search path. Table 22-1 lists the most common platforms and the name of the
variable on each platform.

Table 22-1. Shared Library Search Path Environment Variables per Platform

Operating System    Shared Library Search Path Environment Variable
AIX                 LIBPATH
HP-UX 32-bit        SHLIB_PATH
HP-UX 64-bit        LD_LIBRARY_PATH and SHLIB_PATH
Linux               LD_LIBRARY_PATH
Mac OS X3           DYLD_LIBRARY_PATH
Solaris             LD_LIBRARY_PATH
Tru64 UNIX          LD_LIBRARY_PATH

3. Oracle10g Release 1 is available for Mac OS X. As of this writing, Oracle10g Release 2 is not planned for
Mac OS X.
Perl will not be able to load the ORACLE driver module DBD::Oracle unless the correct
variable is used and the correct search path is set. The most concise test consists of running the
following command:
$ echo "use DBI; use DBD::Oracle" | perl
Can't load '/opt/oracle/product/db10.2/perl/lib/site_perl/5.8.3/i686-linux-thread-
multi/auto/DBD/Oracle/Oracle.so' for module DBD::Oracle: libclntsh.so.10.1: cannot
open shared object file: No such file or directory at /opt/oracle/product/db10.2/
perl/lib/5.8.3/i686-linux-thread-multi/DynaLoader.pm line 229.
at - line 1
Compilation failed in require at - line 1.
BEGIN failed--compilation aborted at - line 1.
If you get an error such as this, you need to fix the shared library search path. Error messages
differ slightly per platform, but always mention that libclntsh.so could not be found.
The following Perl DBI program called perl-dbi-test.pl is ideal for testing connectivity
to an ORACLE instance. It prompts for user name, password, and Net service name, and then
attempts to connect to the DBMS. If it succeeds, it selects the database login user name and
prints it to the screen.
#!/usr/bin/env perl
# RCS: $Header: /home/ndebes/it/perl/RCS/perl-dbi-test.pl,v 1.1 2007/01/26 16:07:13 ndebes Exp ndebes $
# Perl DBI/DBD::Oracle Example
use strict;
use DBI;
print "Username: \n";
my $user = <STDIN>;
chomp $user;
print "Password: \n";
my $passwd = <STDIN>;
chomp $passwd;
print "Net Service Name (optional, if ORACLE instance runs locally and ORACLE_SID is set): \n";
# Oracle Net service name from tnsnames.ora or other name resolution method
my $net_service_name = <STDIN>;
chomp $net_service_name;
if ($net_service_name) {
    print "Trying to connect to $user/$passwd\@$net_service_name\n";
} else {
    print "Trying to connect to $user/$passwd\n";
}
# Connect to the database and return a database handle
my $dbh = DBI->connect("dbi:Oracle:${net_service_name}", $user, $passwd)
    or die "Connect failed: $DBI::errstr";
my $sth = $dbh->prepare("SELECT user FROM dual"); # PARSE
$sth->execute();                                  # EXECUTE
my @row = $sth->fetchrow_array();                 # FETCH
printf("Connected as user %s\n", $row[0]);
$sth->finish;
$dbh->disconnect; # disconnect from ORACLE instance
Following are example settings for 32-bit Perl in a 64-bit ORACLE_HOME:
export PATH=$ORACLE_HOME/perl/bin:$ORACLE_HOME/bin:/usr/bin:/usr/ccs/bin
export PERL5LIB=$ORACLE_HOME/perl/lib:$ORACLE_HOME/perl/lib/site_perl
export LD_LIBRARY_PATH=$ORACLE_HOME/lib32
Let’s confirm that these settings are correct.
$ perl perl-dbi-test.pl
Username:
ndebes
Password:
secret
Net Service Name (optional, if ORACLE instance runs locally and ORACLE_SID is set):
ten_tcp.world
Trying to connect to ndebes/secret@ten_tcp.world
Connected as user NDEBES
The script ran successfully. The connection to the DBMS as well as the execution of SELECT
user FROM dual completed without error.
Windows Environment
On Windows, the Oracle10g Release 2 Universal Installer (OUI) sets the system environment
variables PATH and PERL5LIB. You may look these up by navigating to Control Panel ➤ System ➤
Advanced ➤ Environment Variables and browsing the list box System Variables. However, the
OUI merely adds %ORACLE_HOME%\bin to PATH, and the PERL5LIB setting contains some non-existent
directories (e.g., %ORACLE_HOME%\perl\5.8.3\lib\MSWin32-x86) and several unnecessary ones
in addition to the correct directories.
On Windows, environment variables may be set system-wide, user-specifically, and in an
instance of the command interpreter cmd.exe.4 User-specific settings override system-wide
settings. If an environment variable is changed in a command interpreter, the modified value
is used instead of the user or system variable set using the Control Panel. On startup, command
interpreters inherit user-specific and system-wide environment variables. If you do not have
permission to change system-wide settings, you can still override them with user-specific settings.
4. Click Start ➤ Run or hold down the Windows key (the one showing a flag) and type R, enter cmd, and
click OK to start a command interpreter.
Figure 22-1. PERL5LIB system environment variable
Of course, it’s much more convenient to set variables with Control Panel ➤ System once
and for all than to source a file with environment variables each time a command interpreter is
started. A smart approach is to set environment variables in a command interpreter for testing
and to store them in the registry once the results are correct. This is the approach taken in the
sections that follow.
First of all, we will set ORACLE_HOME, such that it may be reused when setting the remaining
environment variables.
C:> set ORACLE_HOME=C:\ORACLE\product\db10.2
Next we need to add the directory where perl.exe resides to PATH. On Windows, it resides
in the build-specific directory MSWin32-x86-multi-thread. The separator for specifying multiple
directories is a semicolon (;).
C:> set PATH=%ORACLE_HOME%\perl\5.8.3\bin\MSWin32-x86-multi-thread;%ORACLE_HOME%\bin;C:\WINDOWS\System32
Additional Windows-specific directories were omitted from the preceding setting. Now it
should be possible to run perl.exe:
C:> perl -version
This is perl, v5.8.3 built for MSWin32-x86-multi-thread
Copyright 1987-2003, Larry Wall
Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.
Complete documentation for Perl, including FAQ lists, should be found on
this system using `man perl' or `perldoc perl'. If you have access to the
Internet, point your browser at http://www.perl.com/, the Perl Home Page.
If you intend to use other components of the Perl installation apart from the Perl interpreter
itself, you need to add %ORACLE_HOME%\perl\5.8.3\bin to PATH. You may then use perldoc
to read the documentation library within the Perl installation or pod2html to generate HTML
documentation from POD (Plain Old Documentation) statements embedded in Perl source
(see perldoc perlpod).
C:> set PATH=%ORACLE_HOME%\perl\5.8.3\bin\MSWin32-x86-multi-thread;%ORACLE_HOME%\perl\5.8.3\bin;%ORACLE_HOME%\bin;C:\WINDOWS\System32
C:> perldoc perltoc
NAME
perltoc - perl documentation table of contents
DESCRIPTION
This page provides a brief table of contents for the rest of the Perl
documentation set. It is meant to be scanned quickly or grepped through
to locate the proper section you're looking for.
BASIC DOCUMENTATION
perl - Practical Extraction and Report Language
…
As it turns out, PERL5LIB does not have to be set for Perl in a Windows ORACLE_HOME to work,
since perl.exe automatically searches the directory tree from which it is invoked. On Windows,
there is no separate environment variable for searching dynamic link libraries (DLLs), the
Windows variant of shared libraries. PATH serves as the command search path as well as the DLL
search path. The ORACLE DLL required to connect to a DBMS instance is OCI.DLL. Since
perl58.dll, which is required to run perl.exe, is collocated with perl.exe in %ORACLE_HOME%
\perl\5.8.3\bin\MSWin32-x86-multi-thread, and OCI.DLL is in %ORACLE_HOME%\bin, you are all
set once PATH contains these two directories.
The OUI also adds the directory %ORACLE_HOME%\sysman\admin\scripts to the setting of
PERL5LIB in the Control Panel. This should be retained to avoid impairing Database Control or
Grid Control. Note that a Grid Control Management Agent installation also contains an installation
of Perl including DBI and DBD::Oracle. After all, these components are built with Perl DBI. So
you can actually get away with simplifying the environment variable PERL5LIB as follows:
set PERL5LIB=<ORACLE_HOME>\sysman\admin\scripts
Remember, though, to use the actual path of your ORACLE_HOME when adjusting the setting
in the user or system environment variables via the Control Panel. %ORACLE_HOME% is not expanded
when the variable is read from there.
Let’s repeat the test we previously performed on UNIX, to verify that the settings are correct.
C:> set ORACLE_HOME
ORACLE_HOME=C:\oracle\product\db10.2
C:> set PATH
Path=C:\oracle\product\db10.2\perl\5.8.3\bin\MSWin32-x86-multi-thread;C:\oracle\product\db10.2\bin;C:\WINDOWS\System32
PATHEXT=.COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH
C:> set PERL5LIB
PERL5LIB=C:\oracle\product\db10.2\sysman\admin\scripts
C:> perl perl-dbi-test.pl
Username:
ndebes
Password:
secret
Net Service Name (optional, if ORACLE instance runs locally and ORACLE_SID is set):
TEN.oradbpro.com
Trying to connect to ndebes/secret@TEN.oradbpro.com
Connected as user NDEBES
Transparently Running Perl Programs on UNIX Systems
Perl scripts can always be run by entering the name of the Perl interpreter (simply perl) and
passing the name of the Perl script to run as an argument as follows:
$ cat args.pl
print "Script name: $0\n";
$ perl args.pl
Script name: args.pl
When a UNIX text file is made executable with chmod +x filename and executed with
./filename, a default shell for the UNIX system used is spawned and filename is passed as an
argument to it. The default shell for Linux is bash (Bourne Again Shell), while sh (Bourne Shell)
is the default command shell for most other systems. The pseudo comment #!executable on
the first line of a text file may be used to specify a program other than the default shell for
processing a text file. At first glance, it may seem appropriate to put the absolute path name of the
Perl interpreter after the pseudo comment (#!).
CHAPTER 22 ■ PERL DBI AND DBD::ORACLE
$ cat args.pl
#!/usr/bin/perl
print "Script name: $0\n";
$ chmod +x args.pl
$ ./args.pl
Script name: ./args.pl
This works without problems but is not portable. On another system, the Perl interpreter
might reside in the directory /usr/local/bin/perl instead of /usr/bin/perl. The Perl script
would not run on such a system.
$ ./args.pl
: bad interpreter: No such file or directory
A better approach is to employ a level of indirection by first using the UNIX command env,
which is located in the directory /usr/bin on all UNIX systems, as the interpreter executable. It
has the capability to set environment variables based on command line arguments and to run
other programs. The environment variable PATH is considered when env attempts to locate the
program to run. Thus, the code of the script need not be modified to run on other systems.
Only the PATH variable must contain the directory where the Perl interpreter resides.
$ cat args.pl
#!/usr/bin/env perl
print "Script name: $0\n";
$ ./args.pl
Script name: ./args.pl
Transparently Running Perl Programs on Windows
Matters are slightly more complicated on Windows. But then again, Windows offers the option
to omit the extension .pl commonly used for Perl programs. You win some, you lose some, as
they say. Here is the deal. First of all, you define a new file type to Windows and tell it which
executable is responsible for the file type. Obviously the executable is perl.exe. Note that
Windows requires the absolute path to the executable. The environment variable PATH is not
considered for searching the executable, though environment variables may be used within
the definition. The command FTYPE is used for defining the file type. In the following code
example I use the file type PerlProgram:
C:> FTYPE PerlProgram="C:\oracle\product\db10.2\perl\5.8.3\bin\MSWin32-x86-multi-thread\perl.exe" %1 %*
PerlProgram="C:\oracle\product\db10.2\perl\5.8.3\bin\MSWin32-x86-multi-thread\perl.exe" %1 %*
Here, the installation of Oracle10g is in C:\oracle\product\db10.2. Assuming you have
defined ORACLE_HOME as an environment variable, you may also call FTYPE like this:
C:> set ORACLE_HOME=C:\oracle\product\db10.2
C:> FTYPE PerlProgram="%ORACLE_HOME%\perl\5.8.3\bin\MSWin32-x86-multi-thread\perl.exe" %1 %*
PerlProgram="C:\oracle\product\db10.2\perl\5.8.3\bin\MSWin32-x86-multi-thread\perl.exe" %1 %*
The character strings %1 and %* represent the first and the remaining arguments passed
to perl.exe respectively. The former (%1) will contain the full path to the script file, while the
latter (%*) will pass the remaining arguments to perl.exe.
Second, Windows requires an association between the extension of a file name and a file
type. Associations are maintained through the command ASSOC. Since the file name extension
for Perl programs is .pl, we need to associate .pl with the file type PerlProgram that we defined
previously.
C:> ASSOC .pl=PerlProgram
.pl=PerlProgram
Now, instead of running Perl programs with perl filename.pl, it is sufficient to type just
filename.pl. To demonstrate that both methods are indeed equivalent, we will use the following
Perl program:
print "Script name: $0\n";
for ($i=0; $i < 10; $i++) {
if (defined $ARGV[$i]) {
printf "Argument %d: %s\n", $i, $ARGV[$i];
}
}
Here’s the output of running args.pl the old fashioned way, which requires slightly
more typing:
C:> perl args.pl first second third fourth
Script name: args.pl
Argument 0: first
Argument 1: second
Argument 2: third
Argument 3: fourth
The directory where args.pl resides should be in the command search path variable PATH.
Capitalizing on the association defined previously, you may now run this:
C:> args.pl first second third fourth
Script name: C:\home\ndebes\it\perl\args.pl
Argument 0: first
Argument 1: second
Argument 2: third
Argument 3: fourth
The results of both methods are identical, except for the script name, which appears with
its full path when the association is used. You may fully indulge your laziness after defining .pl
as an additional extension of executable files. This is achieved by using the environment variable
PATHEXT.
C:> set PATHEXT
PATHEXT=.COM;.EXE;.BAT;.CMD
C:> set PATHEXT=%PATHEXT%;.PL
C:> args first second third fourth
Script name: C:\home\ndebes\it\perl\args.pl
Argument 0: first
Argument 1: second
Argument 2: third
Argument 3: fourth
An association is removed by using a single blank as the file type, as shown here:
C:> ASSOC .pl=␣
.pl=
The glyph ␣ represents a blank character.
Connecting to an ORACLE DBMS Instance
An area where novice Perl DBI programmers frequently have trouble is deciding which of the
many ways to connect to an ORACLE DBMS instance is best suited for them. The Perl DBI
documentation covers this subject only partially, so I have decided to include a complete
overview of all extant variations.
Table 22-2 lists the most common Oracle Net TNS (Transparent Network Substrate) protocol
adapters for connecting to an ORACLE instance. The option to use an Oracle Net service name
description with IPC or TCP, without retrieving it from the configuration file tnsnames.ora, is
ignored for the moment.
Table 22-2. Oracle Net Protocol Adapters

                  Listener  tnsnames.ora
Method            Required  Required      Notes
Bequeath adapter  no        no            The environment variable ORACLE_SID must
                                          be set. On UNIX systems ORACLE_HOME must
                                          be set too.
IPC adapter       yes       yes           For systems that do not have a network
                                          adapter or where the database client uses
                                          a different local ORACLE_HOME than the
                                          server.
TCP/IP adapter    yes       yes           Most common method; TCP/IP network
                                          infrastructure must be in place.
DBI connect Syntax
The DBI call for establishing a database session is connect. It has the following syntax:
$dbh = DBI->connect($data_source, $username, $auth, \%attr);
In Perl, names of scalar variables are prefixed by $, and hashes by % (please run perldoc
perlintro for an explanation of the terms scalar and hash). Table 22-3 explains the parameters
of the connect call.
Table 22-3. connect Parameters

Parameter     Meaning
$dbh          Database handle for the database session.
$data_source  Data source, i.e., specification for connecting to a local ORACLE
              instance or contacting a listener. The following formats are supported:
              "DBI:Oracle:", "DBI:Oracle:<Net service name>",
              "DBI:Oracle:host=<host name>;port=<port number>;sid=<ORACLE_SID>",
              "DBI:Oracle:<host name>:<port number>/<instance service name>",5
              or undef.
$username     Database username, “/” for OS authentication, or undef.
$auth         Password or undef.
\%attr        Optional reference to a hash with connect options.
The Perl keyword undef represents an undefined value analogous to NULL in SQL. If one of
$data_source, $username, or $auth is undef, then environment variables are used, if available.
Table 22-4 lists the parameters and the corresponding environment variables.

Table 22-4. Perl DBI Environment Variables

Parameter     Environment Variable
$data_source  DBI_DSN
$username     DBI_USER
$auth         DBI_PASS
The value of $data_source must always start with the string “DBI:Oracle:”. Note that the
uppercase “O” in the string “Oracle” as well as the colon (:) after the string “Oracle” are
mandatory. Using a lowercase “o” causes the following error:
DBD::oracle initialisation failed: Can't locate object method "driver" via package
"DBD::oracle"
5. The Oracle Net easy connect format is available in Oracle10g and subsequent releases.
DBI uses the string “Oracle” to build the case-sensitive name of the driver module. The
driver module for connecting to ORACLE instances is DBD::Oracle not DBD::oracle—thus the
error. Other driver modules exist for most database products on the market. DBD::ODBC is a
driver that works with any database product that supports ODBC (Open Database Connectivity).
Connecting Through the Bequeath Adapter
When connecting to a DBMS instance using the bequeath adapter, make sure the environment
variables ORACLE_HOME (mandatory on UNIX systems, optional on Windows) and ORACLE_SID are
set. The following two lines of Perl code suffice to connect:
use DBI;
$dbh = DBI->connect("DBI:Oracle:", "ndebes", "secret") or die "Connect failed:
$DBI::errstr";
On Windows systems, the following exception is thrown when ORACLE_SID is not set correctly:
Connect failed: ORA-12560: TNS:protocol adapter error (DBD ERROR: OCIServerAttach).
The error indicates that the Windows service, which implements the ORACLE DBMS
instance, is not running. The Windows service may be started with the command net start
OracleServiceORACLE_SID. On UNIX, the error is different, since there are significant disparities
between the architecture of the ORACLE DBMS on UNIX and Windows (multi-process
architecture with shared memory on UNIX vs. single-process, threaded architecture on Windows).
$ export ORACLE_SID=ISDOWN
$ perl dbi.pl
DBI connect('','ndebes',...) failed: ORA-01034: ORACLE not available
ORA-27101: shared memory realm does not exist
Linux Error: 2: No such file or directory (DBD ERROR: OCISessionBegin) at dbi.pl
line 5
$ unset ORACLE_SID
$ perl dbi.pl
DBI connect('','ndebes',...) failed: ORA-12162: TNS:net service name is incorrectly
specified (DBD ERROR: OCIServerAttach) at dbi.pl line 5
By the way, V$SESSION.SERVICE_NAME always has the default value of SYS$USERS when
connecting through the bequeath adapter.
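This is easy to verify from within such a session; a minimal sketch:
SQL> SELECT service_name FROM v$session
     WHERE sid=(SELECT sid FROM v$mystat WHERE rownum=1);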
Connecting Through the IPC Adapter
Use of the IPC protocol implies that the DBMS instance, the listener, and the client are running
on the same system. An entry in the configuration file listener.ora for a listener supporting
only the IPC protocol might look as follows:
LISTENER =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = IPC)(KEY = TEN))
    )
  )
A Net service name definition in tnsnames.ora for connecting through the listener defined
in the preceding code section might be as follows:
TEN_IPC.WORLD =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = IPC)(KEY = TEN))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = TEN)
    )
  )
The only difference between connecting with bequeath vs. IPC is that the Net service
name from tnsnames.ora (e.g., TEN_IPC.WORLD) is inserted in $data_source after DBI:Oracle:
and there is no need to set the environment variable ORACLE_SID, since the value of the
parameter SERVICE_NAME is taken from the Oracle Net service name definition instead.
use DBI;
my $dbh = DBI->connect("DBI:Oracle:TEN_IPC.WORLD", "ndebes", "secret") or die
"Connect failed: $DBI::errstr";
The Net service name definition might also contain (SID=TEN) instead of (SERVICE_NAME=TEN)
in the section CONNECT_DATA. Use of SERVICE_NAME is recommended, since the service name is
reflected in the view V$SESSION and may be used for tracing the subset of sessions using this
service name with DBMS_MONITOR.
SQL> SELECT sid, service_name, module, action
     FROM v$session
     WHERE program='perl.exe';
 SID SERVICE_NAME         MODULE       ACTION
---- -------------------- ------------ ------
 137 TEN                  perl.exe
SQL> EXEC dbms_monitor.serv_mod_act_trace_enable('TEN','perl.exe','', true, true);
PL/SQL procedure successfully completed.
Unfortunately, with this approach, the view V$SESSION does not reflect the fact that tracing
has been switched on.
SQL> SELECT sql_trace, sql_trace_waits, sql_trace_binds
     FROM v$session
     WHERE sid=137;
SQL_TRACE SQL_TRACE_WAITS SQL_TRACE_BINDS
--------- --------------- ---------------
DISABLED  FALSE           FALSE
When the parameter SID is used in tnsnames.ora, V$SESSION.SERVICE_NAME has the rather
meaningless default setting of SYS$USERS.
Connecting Through the TCP/IP Adapter
Use of the TCP/IP protocol enables clients to connect to a DBMS instance from any system
within a network. Following is an example of an Oracle Net service name definition for a
TCP/IP connection:
TEN_TCP.WORLD =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST=dbserver.oradbpro.com)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = TEN)
    )
  )
The listener needs to support the TCP/IP protocol too.
LISTENER =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = IPC)(KEY = TEN))
      (ADDRESS = (PROTOCOL = TCP)(HOST=dbserver.oradbpro.com)(PORT = 1521))
    )
  )
This is the Perl code to connect using the service name TEN_TCP.WORLD:
use DBI;
my $dbh = DBI->connect("DBI:Oracle:TEN_TCP.WORLD", "ndebes", "secret") or die
"Connect failed: $DBI::errstr";
As you can see, the Perl code is analogous to the IPC connection. Merely the Oracle Net
service name passed as part of $data_source has changed. All three arguments to connect can
be undef, if the corresponding environment variables DBI_DSN, DBI_USER, and DBI_PASS are set
using SET on Windows and export on UNIX.
my $dbh = DBI->connect(undef, undef, undef) or die "Connect failed: $DBI::errstr";
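For instance, on UNIX (values assumed for illustration):
$ export DBI_DSN=DBI:Oracle:TEN_TCP.WORLD
$ export DBI_USER=ndebes
$ export DBI_PASS=secret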
Two other approaches exist for connecting through the TCP/IP adapter without requiring Net
service name resolution by a naming method, such as Local Naming (tnsnames.ora) or LDAP
(Lightweight Directory Access Protocol). The first has syntax similar to a Java JDBC URL
and specifies host, port, and ORACLE_SID as part of the data source $dsn. Here’s an example:
my $dbh = DBI->connect("DBI:Oracle:host=dbserver.oradbpro.com;port=1521;sid=TEN",
    "ndebes", "secret")
    or die "Connect failed: $DBI::errstr";
The port number may be omitted. If it is, both 1521 and 1526 are tried. The sequence of the
fields host, port, and sid is irrelevant. Another method is to supply the full description that
would otherwise be retrieved from the configuration file tnsnames.ora in the argument $dsn.
my $dbh = DBI->connect( "DBI:Oracle:(DESCRIPTION=(ADDRESS LIST=(ADDRESS=(PROTOCOL=TC
P)(HOST=dbserver.oradbpro.com)(PORT=1521)))(CONNECT DATA=(SERVICE NAME=TEN)))", "nde
bes", "secret") or die "Connect failed: $DBI::errstr";
Again, use of SERVICE NAME in the CONNECT DATA section is preferred over SID.
Easy Connect
In Oracle10g and subsequent releases, Oracle Net easy connect may be used in connect strings.
An easy connect specification has the format host_name:port/instance_service_name. All the
information required to contact a listener is embedded in the connect string, such that Net
service name resolution is not required. Following is an example DBI connect call that uses
easy connect:
my $dbh = DBI->connect("DBI:Oracle:dbserver:1521/ELEVEN", "ndebes", "secret") or die
"Connect failed: $DBI::errstr";
Easy connect is preferred over the old format host=host_name;port=port_number;sid=
ORACLE_SID, since it uses an instance service name.6
Connecting with SYSDBA or SYSOPER Privileges
Since DBMS release Oracle9i, connecting as user SYS is only possible with SYSDBA privileges.
SYSDBA privileges are assigned either by means of operating system group membership or by
granting SYSDBA while a password file is in use (REMOTE_LOGIN_PASSWORDFILE=exclusive or shared).
Care must be taken to distinguish between SYSDBA (or SYSOPER) privileges and operating system
authentication, discussed in the next section. The connect string "/ AS SYSDBA" uses both. The
slash (/) indicates that operating system authentication instead of a password is to be used,
while AS SYSDBA signals that SYSDBA privileges are requested for the session. I will first show how
to connect as SYSDBA with database password authentication. The following requirements exist:
• The DBMS instance must have been started with REMOTE_LOGIN_PASSWORDFILE=EXCLUSIVE
or SHARED.
• A password file, which was previously created with orapwd, must exist (otherwise the
instance would not start with the above parameter setting). On UNIX systems, the
password file path name is $ORACLE_HOME/dbs/orapw$ORACLE_SID; on Windows it is
%ORACLE_HOME%\database\pwd%ORACLE_SID%.ora.
• The SYSDBA privilege must be granted to the user who wants to connect AS SYSDBA (in sqlnet.ora
on Windows, SQLNET.AUTHENTICATION_SERVICES is not set or does not include NTS; otherwise
operating system authentication might be used).
When these requirements are met, it is possible to connect AS SYSDBA with SQL*Plus using
any of the protocol adapters bequeath, IPC, or TCP/IP.
6. Please refer to the Introduction at the beginning of the book for a definition of the term instance
service name.
$ sqlplus "ndebes/secret@ten tcp.world AS SYSDBA"
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
SQL> EXIT
Disconnected from Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Produc
tion
The connect command fails after the SYSDBA privilege has been revoked from user NDEBES.
SQL> SHOW USER
USER is "SYS"
SQL> REVOKE SYSDBA FROM ndebes;
Revoke succeeded.
SQL> CONNECT ndebes/secret@ten_tcp.world AS SYSDBA
ERROR:
ORA-01031: insufficient privileges
Now we may test connecting in the same way with Perl DBI. So far, I have not addressed
the parameter \%attr of the connect method. The backslash (\) indicates that the method
expects a reference, while the percent sign (%) indicates that a Perl hash is expected. Special
constants defined in the database driver module DBD::Oracle have to be passed as part of
\%attr to make connecting as SYSDBA or SYSOPER possible. Thus, we can no longer rely on the
DBI to automatically load the driver module DBD::Oracle for us. Instead, we must explicitly
load the module with the command use DBD::Oracle and request that the constants ORA_SYSDBA
and ORA_SYSOPER be loaded into the Perl symbol table. Here’s the Perl code that connects in the
same way as was done previously using SQL*Plus:
use DBI;
use DBD::Oracle qw(:ora_session_modes); # imports ORA_SYSDBA and ORA_SYSOPER
my $dbh = DBI->connect("DBI:Oracle:", "ndebes", "secret",
    {ora_session_mode => ORA_SYSDBA})
    or die "Connect failed: $DBI::errstr";
Make sure the SYSDBA privilege is granted and the other requirements are met before
testing the Perl code.
Connecting with Operating System Authentication
On UNIX systems, the DBA group name commonly used to assign SYSDBA privilege is “dba”,
while the suggested group name for the OPER group and the SYSOPER privilege is “oper”. These
UNIX group names are merely suggestions. When installing with the Oracle Universal Installer
(OUI), other group names may be chosen.
On Windows, the DBA group name for the SYSDBA privilege is always “ORA_DBA” and the
OPER group name is “ORA_OPER”. NTS (NT Security) must be enabled as an authentication
service in sqlnet.ora for operating system authentication to work. This is done with the following
line in sqlnet.ora:
SQLNET.AUTHENTICATION_SERVICES = (NTS) # required for connect / as sysdba
On UNIX systems, membership in the DBA group is sufficient to connect without a password.
With default settings (REMOTE_OS_AUTHENT=FALSE), connecting as SYSDBA without providing
a password is only possible with the bequeath adapter. This default behavior should not be
changed, since setting REMOTE_OS_AUTHENT=TRUE is a security hazard.
As before, it is good practice to verify that CONNECT / AS SYSDBA works in SQL*Plus before
attempting the same with Perl DBI code such as this:
use DBI;
use DBD::Oracle qw(:ora_session_modes); # imports ORA_SYSDBA and ORA_SYSOPER
my $dbh = DBI->connect("DBI:Oracle:", "/", undef, {ora_session_mode => ORA_SYSDBA})
    or die "Connect failed: $DBI::errstr";
Note that the preceding example uses both SYSDBA privileges and operating system
authentication. The latter is characterized by passing "/" as $user and undef (i.e., no password)
as $auth.
Let’s take a look at how a non-privileged user might connect with operating system
authentication. This is useful for running batch jobs without embedding a password in scripts
or passing a password on the command line, where it might be gleaned by looking at the
process list with the UNIX command ps. Let’s say we want to permit the UNIX user oracle to
connect without a password. The database user name required for that purpose depends on
the setting of the initialization parameter OS_AUTHENT_PREFIX. In the following example, the
default value ops$ is set:
SQL> SHOW PARAMETER os_authent_prefix
NAME              TYPE   VALUE
----------------- ------ ------
os_authent_prefix string ops$
Next, we create a database user by using ops$ as a prefix for the UNIX user name oracle.
SQL> CREATE USER ops$oracle IDENTIFIED EXTERNALLY;
SQL> GRANT CONNECT TO ops$oracle;
To test connecting as the new database user, we must be logged in as the UNIX user
oracle.
$ id
uid=503(oracle) gid=504(oinstall) groups=504(oinstall),505(dba)
$ sqlplus /
Connected.
The following Perl program os.pl uses operating system authentication and retrieves the
database user name by executing SELECT user FROM dual:
#!/usr/bin/env perl
use DBI;
my $dbh=DBI->connect("dbi:Oracle:", "/", undef) or die "Failed to connect.\n";
my $sth=$dbh->prepare("SELECT user FROM dual");
$sth->execute;
CHAPTER 22 ■ PERL DBI AND DBD::ORACLE
my @row=$sth->fetchrow_array;
printf "Connected as user %s.\n", $row[0];
$ chmod +x os.pl
$ ./os.pl
Connected as user OPS$ORACLE.
Connect Attributes
There are four additional attributes that may be passed to the connect method: AutoCommit,
ora_module_name, PrintError, and RaiseError. Each is explained in detail in the next sections,
and recommendations for optimal values in conjunction with the ORACLE DBMS are given.
You may pass one or more of these attributes to the connect method by separating them with commas.
AutoCommit
The setting of the AutoCommit attribute decides whether each execution of an INSERT,
UPDATE, DELETE, or MERGE statement is committed immediately and implicitly, or explicitly at a
later time by executing $dbh->commit. The default setting of AutoCommit is 1 (enabled). Since
committing each change individually severely degrades the response time of any application,
it is imperative that AutoCommit be explicitly set to 0 (disabled).
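Here is a minimal sketch of the recommended pattern, assuming the CUSTOMER table used later in this chapter already exists:

use DBI;

# AutoCommit => 0: changes are committed explicitly, not per statement
my $dbh = DBI->connect("dbi:Oracle:", "ndebes", "secret", {AutoCommit => 0})
    or die "Connect failed: $DBI::errstr";
# both changes below form a single transaction
$dbh->do("UPDATE customer SET phone='089/4712' WHERE id=1");
$dbh->do("DELETE FROM customer WHERE id=2");
$dbh->commit; # a single explicit COMMIT for the entire transaction
$dbh->disconnect;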
Module Name
The attribute ora_module_name may be used to set V$SESSION.MODULE, which is useful for letting
the database administrator identify an application. The result of setting ora_module_name is the
same as if the procedure DBMS_APPLICATION_INFO.SET_MODULE were called in an anonymous
block, but is achieved with almost no extra coding. The default value is perl on UNIX and
perl.exe on Windows. Note that module names set in this way are ignored by the tracing and
client statistics functionality controlled with DBMS_MONITOR (see also the related data dictionary
views DBA_ENABLED_TRACES and DBA_ENABLED_AGGREGATIONS).
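A minimal sketch follows; deriving the module name from the script name mirrors what the comprehensive example later in this chapter does:

use DBI;
use File::Basename;

# sets V$SESSION.MODULE without any extra network round trip
my $dbh = DBI->connect("dbi:Oracle:", "ndebes", "secret",
    {ora_module_name => basename($0)})
    or die "Connect failed: $DBI::errstr";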
PrintError
PrintError controls the behavior of the Perl DBI in case of an error. The default value is 1 (enabled),
which means errors are printed on the standard error output. When PrintError=0 is set, the
application is responsible for printing error messages when deemed appropriate. Since not all
errors may be fatal or even indicative of a problem, the recommended setting is 0 (disabled).
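The following sketch shows the division of labor with PrintError disabled; the DELETE statement is merely a placeholder:

# PrintError => 0: the DBI stays silent; the application reports errors
my $dbh = DBI->connect("dbi:Oracle:", "ndebes", "secret",
    {PrintError => 0, RaiseError => 0})
    or die "Connect failed: $DBI::errstr";
my $rows = $dbh->do("DELETE FROM customer WHERE id=-1");
# do returns undef on error; $DBI::errstr holds the last error message
warn "DELETE failed: $DBI::errstr\n" unless defined $rows;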
RaiseError
RaiseError controls whether or not the DBI raises a Perl exception whenever an error is
encountered. It applies to all DBI methods except connect. As an alternative, the return code of each
individual DBI call may be checked. Since it is much more convenient to embed many DBI
calls in an eval block that catches exceptions than to code an if/else sequence for each DBI call
based on the return code, using eval is the much preferred approach. The default value of
RaiseError is 0 (disabled). When RaiseError=1 (enabled) is set, exceptions are raised whenever
a DBI call fails.
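In a nutshell, the eval pattern looks like this (the statements are placeholders):

# with RaiseError=>1, any failing DBI call inside the block raises an exception
eval {
    $dbh->do("UPDATE customer SET phone='089/4713' WHERE id=1");
    $dbh->commit;
};
if ($@) { # $@ contains the error message, if any
    warn "transaction failed: $@";
    $dbh->rollback;
}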
Comprehensive Perl DBI Example Program
Now it is time to put it all together. The following Perl DBI program illustrates many aspects of
Perl DBI programming. It uses all four connect attributes and inserts rows into a table called
CUSTOMER in a loop. The customer identification (CUSTOMER.ID) is generated by a sequence. The
CREATE statements for the database objects needed to run the example are embedded as a POD
(Plain Old Documentation) section in the Perl source. POD sections are ignored by the Perl
compiler.
The number of rows to insert is passed as an argument. To ensure good performance, the
program commits only once, just before exiting, parses the INSERT statement with bind variables only once before entering the loop, and merely makes execute and bind calls inside the
loop. Another performance boost is achieved by using an INSERT TRIGGER with INSERT RETURNING
instead of first fetching the next sequence number with a SELECT statement and then passing
the sequence number back to the DBMS instance in the INSERT statement. The latter approach
impairs performance, since it incurs unnecessary network round trips between client and server.
The higher the network latency, the more severe the impact on response time will be.
As an aside, the fastest way to insert rows using a sequence to generate primary keys is to
reference the sequence with sequence_name.NEXTVAL in the INSERT statement. This requires
fewer CPU cycles than a PL/SQL trigger. Following is an example:
INSERT INTO customer(id, name, phone)
VALUES (customer_id_seq.nextval, :name, :phone)
RETURNING id INTO :id
The downside is that access to the sequence must be coded in every application, whereas
it would be coded centrally if a trigger were used.
The program features the ability to enable SQL TRACE based on an environment variable
setting. This is a feature any application should have, since it reduces the effort needed to
compile performance diagnostic data. The DBI method do is used to execute ALTER SESSION
statements. This method is also appropriate for executing non-reusable statements such as
CREATE TABLE.
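For instance (a sketch; the statements are placeholders):

# do combines prepare and execute; appropriate for statements executed only once
$dbh->do("ALTER SESSION SET tracefile_identifier='perf_test'");
$dbh->do("CREATE TABLE customer_backup AS SELECT * FROM customer");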
The program also shows how to call PL/SQL packages by preparing and executing an
anonymous block. An alternative way to call PL/SQL routines (functions, packages, and
procedures) that works with the DBI, but is not used in the following example, is the SQL
statement CALL (see Oracle Database SQL Reference 10g Release 2, page 13-53). I have added
plenty of comments to point out what is happening in the Perl program insert_perf4.pl here:
1   #!/usr/bin/env perl
2
3   =pod
4   create table customer(
5   id number(*,0) not null,
6   name varchar2(10),
7   phone varchar2(30)
8   );
9   create sequence customer_id_seq;
10
11  create or replace trigger ins_customer before insert on customer for each row
12  begin
13    SELECT customer_id_seq.nextval INTO :new.id FROM dual;
14  end;
15  /
16  variable id number
17  INSERT INTO customer(name, phone) VALUES ('&name', '&phone') RETURNING id INTO :id;
18  print id
19
20  CLEANUP
21  =======
22  drop table customer;
23  drop sequence customer_id_seq;
24
25  =cut
26
27  if ( ! defined $ARGV[0] ) {
28    printf "Usage: $0 iterations\n";
29    exit;
30  }
31
32  # declare and initialize variables
33  my ($id, $name, $phone, $sth)=(undef, "Ray", "089/4711", undef);
34
35  use File::Basename;
36  use DBI; # import DBI module
37  print "DBI Version: $DBI::VERSION\n"; # DBI version is available after use DBI
38  use strict; # variables must be declared with my before use
39  my $dbh = DBI->connect("DBI:Oracle:TEN_IPC.WORLD", "ndebes", "secret",
40    # set recommended values for attributes
41    {ora_module_name => basename($0), RaiseError=>1, PrintError=>0, AutoCommit => 0})
42    or die "Connect failed: $DBI::errstr";
43  # DBD::Oracle version is available after connect
44  print "DBD::Oracle Version: $DBD::Oracle::VERSION\n";
45
46  # start eval block for catching exceptions thrown by statements inside the block
47  eval {
48    # tracing facility: if environment variable SQL_TRACE_LEVEL is set,
49    # enable SQL trace at that level
50    my $trc_ident=basename($0); # remove path component from $0, if present
51    if ( defined($ENV{SQL_TRACE_LEVEL})) {
52      $dbh->do("alter session set tracefile_identifier='$trc_ident'");
53      $dbh->do("alter session set events
54        '10046 trace name context forever, level $ENV{SQL_TRACE_LEVEL}'");
55    }
56    # parse an anonymous PL/SQL block for retrieving the ORACLE DBMS version
57    # and compatibility as well as the database and instance names
58    # V$ views are not used, since they may be accessed by privileged users only
59    # quoting with q{<SQL or PL/SQL statements>} is used, since
60    # it avoids trouble with quotes (", ')
61    $sth = $dbh->prepare(q{
62    begin
63      dbms_utility.db_version(:version, :compatibility);
64      :result:=dbms_utility.get_parameter_value('db_name',:intval, :db_name);
65      :result:=dbms_utility.get_parameter_value('instance_name', :intval,
66        :instance_name);
67    end;
68    });
69
70    my ($version, $compatibility, $db_name, $instance_name, $result, $intval);
71    $sth->bind_param_inout(":version", \$version, 64);
72    $sth->bind_param_inout(":compatibility", \$compatibility, 64);
73    $sth->bind_param_inout(":db_name", \$db_name, 9);
74    $sth->bind_param_inout(":instance_name", \$instance_name, 16);
75    $sth->bind_param_inout(":intval", \$intval, 2);
76    $sth->bind_param_inout(":result", \$result, 1);
77    $sth->execute;
78
79    $sth = $dbh->prepare(q{SELECT userenv('sid'),
80      to_char(sysdate, 'Day, dd. Month yyyy hh24:mi:ss "(week" IW")"') FROM dual});
81    $sth->execute;
82    my ($sid, $date_time);
83    # pass reference to variables which correspond to columns in
84    # SELECT from left to right
85    $sth->bind_columns(\$sid, \$date_time);
86    my @row = $sth->fetchrow_array;
87    printf "Connected to ORACLE instance %s, release %s (compatible=%s)",
88      $instance_name, $version, $compatibility;
89    printf "; Database %s\n", $db_name;
90    # due to bind_columns, may use meaningful variable names instead of $row[0], etc.
91
92    printf "Session %d on %s\n", $sid, $date_time;
93
94    $sth = $dbh->prepare("INSERT INTO customer(name, phone) VALUES (:name, :phone)
95      RETURNING id INTO :id", { ora_check_sql => 0 });
96    # loop, number of iterations is in command line argument
97    for (my $i=0; $i < $ARGV[0]; $i++) {
98      # bind_param_inout is for receiving values from the DBMS
99      $sth->bind_param_inout(":id", \$id, 38);
100     # bind_param is for sending bind variable values to the DBMS
101     # assign value to bind variable (placeholder :name)
102     $sth->bind_param(":name", $name);
103     # assign value to bind variable "phone"
104     $sth->bind_param(":phone", $phone);
105     # execute the INSERT statement
106     $sth->execute();
107     printf "New customer with id %d inserted.\n", $id;
108   }
109 };
110 # check for exceptions
111 if ($@) {
112   printf STDERR "ROLLBACK due to Oracle error %d: %s\n", $dbh->err, $@;
113   # ROLLBACK any previous INSERTs
114   $dbh->rollback;
115   exit;
116 } else {
117   # commit once at end
118   $dbh->commit;
119 }
120 $sth->finish; # close statement handle
121 $dbh->disconnect; # disconnect from ORACLE instance
Line 35 imports the package File::Basename, which contains the function basename. The
Perl function basename works in the same way as the UNIX command of the same name. It
returns the file component of a path name by stripping one or more directories from the
string. basename is used to make sure that neither the module name nor the
TRACEFILE_IDENTIFIER contains illegal or unwanted characters such as the directory separator
(/) that may be contained in the Perl variable $0.
The connect statement, which sets the recommended values for RaiseError (1),
PrintError (0), and AutoCommit (0), is in line 39. An eval block used to catch exceptions from
DBI method invocations encompasses lines 47 to 109. The check for the environment variable
SQL_TRACE_LEVEL is in line 51. The range of values is the same as for event 10046 (1, 4, 8, 12; see
also Chapter 24). If the environment variable is set, SQL trace is enabled in line 53 with ALTER
SESSION SET EVENTS. The name of the Perl program is used as the TRACEFILE_IDENTIFIER in line
52 to facilitate locating trace files in the directory set as USER_DUMP_DEST.
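For example, a level 12 trace (waits and binds) of a short run might be requested as follows; the transcript is a sketch assuming a Bourne-type shell:

$ SQL_TRACE_LEVEL=12 ./insert_perf4.pl 3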
Lines 62 to 77 show how to call PL/SQL packages by embedding them in an anonymous
block. The publicly accessible package DBMS_UTILITY is called three times in a single anonymous
block to retrieve information on the instance and database.
Lines 79 to 86 and 92 exemplify how conveniently fetching data can be coded with the DBI.
With bind_columns, variables with meaningful names are used for accessing column values
retrieved with the Perl DBI method fetchrow_array. Without bind_columns, array syntax such
as $row[column_index] must be used, where column_index starts at 0 and designates columns
in the SELECT column list from left to right.
The INSERT statement, which contains bind variables and is thus reusable, is parsed once
before entering the loop by calling prepare in line 94. It would be a costly mistake to place the
prepare call inside the loop. Extra network round trips due to parse calls and excess CPU
consumption due to superfluous soft parses would be incurred.
Inside the for loop, which stretches from line 97 to 108, the bind_param_inout method is
used to tell the DBI which variable to use for receiving the sequence number returned to the
client due to the SQL statement INSERT RETURNING id INTO :id. The bind variables name and
phone are for sending values to the DBMS. This is accomplished with the DBI method
bind_param.
The eval block is followed by an if statement, which checks the special Perl variable $@ for
an exception. If an exception has occurred, $@ contains the error message. Otherwise $@ is an
empty string, which Perl treats as a boolean expression that evaluates to FALSE, such that the
if branch in line 112 is not entered. In case of an exception, any rows already inserted are
discarded by issuing rollback in line 114.
If all is well, commit is called once in line 118, the statement handle $sth is released in line
120, and the client disconnects from the DBMS in line 121. Following is a transcript of running
the Perl program insert_perf4.pl:
$ ./insert_perf4.pl 3
DBI Version: 1.41
DBD::Oracle Version: 1.15
Connected to ORACLE instance ten, release 10.2.0.1.0 (compatible=10.2.0.1.0);
Database TEN
Session 145 on Thursday , 19. July      2007 21:57:54 (week 29)
New customer with id 1 inserted.
New customer with id 2 inserted.
New customer with id 3 inserted.
Exception Handling
Before the curtain drops, I’d like to show the exception handling with eval in action. Setting the
tablespace where the table CUSTOMER resides to status read-only causes the execute method call
in line 106 to fail. Execution will continue at the if statement in line 111. Following is a
step-by-step test:
SQL> CONNECT system
Enter password:
Connected.
SQL> ALTER TABLESPACE users READ ONLY;
Tablespace altered.
SQL> EXIT
Disconnected from Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
$ ./insert_perf4.pl 1
DBI Version: 1.48
DBD::Oracle Version: 1.16
Connected to ORACLE instance ten, release 10.2.0.1.0 (compatible=10.2.0.1.0); Database TEN
Session 145 on Thursday , 19. July      2007 23:52:32 (week 29)
ROLLBACK due to Oracle error 372: DBD::Oracle::st execute failed: ORA-00372: file 4
cannot be modified at this time
ORA-01110: data file 4: 'F:\ORADATA\TEN\USERS01.DBF' (DBD ERROR: error possibly near
<*> indicator at char 12 in 'INSERT INTO <*>customer(name, phone) VALUES (:name,
:phone)
RETURNING id INTO :id') [for Statement "INSERT INTO customer(name, phone)
VALUES (:name, :phone)
RETURNING id INTO :id" with ParamValues: :name='Ray', :phone='089/4711',
:id=undef] at ./insert_perf4.pl line 106.
As is evident from the preceding output, the DBI provides a lot of information beyond the
ORACLE DBMS error message “ORA-00372: file 4 cannot be modified at this time”. It retrieves
the second error message on the error stack (ORA-01110: data file 4: ’F:\ORADATA\TEN\USERS01.DBF’),
tells us which SQL statement caused the error, the line number in the Perl
source file for locating the statement, which bind variables were in use, and what their values were.
Source Code Depot
Table 22-5 lists this chapter’s source files and their functionality.
Table 22-5. Perl DBI Source Code Depot

File Name         Functionality
ORADBB.pm         The Perl module ORADBB.pm is required to run dbb.pl. It contains several
                  Perl subroutines, which may be reused by any Perl DBI application (e.g., for
                  connecting with SYSDBA or SYSOPER privileges).
args.pl           Perl program that prints command line arguments on standard output.
dbb.pl            Command line utility capable of executing arbitrary SQL and PL/SQL
                  statements. Query result column widths are adjusted automatically.
insert_perf4.pl   Perl program that demonstrates exception handling, PL/SQL execution,
                  bind variables, ROLLBACK, and COMMIT.
os.pl             Perl program that uses operating system authentication to connect.
perl_dbi_test.pl  Perl program for testing connectivity to an ORACLE DBMS instance.
CHAPTER 23
■■■
Application Instrumentation
and End-to-End Tracing
The Oracle Database Performance Tuning Guide 10g Release 2 and the Oracle Call Interface
Programmer’s Guide 10g Release 2 describe a feature called end-to-end application tracing.
The Oracle Database JDBC Developer’s Guide and Reference 10g Release 2 documents end-to-end
metrics support in JDBC. The Oracle Database Concepts 10g Release 2 manual presents a
feature called end-to-end monitoring. These three terms refer to the same set of tracing,
monitoring, and application instrumentation features. The feature set is implemented with three
application programming interfaces (OCI, PL/SQL, Java/JDBC), dictionary views, V$ dynamic
performance views, and SQL trace. Instrumentation is leveraged by the Automatic Workload
Repository, Active Session History, Statspack, Enterprise Manager, the database resource
manager, and the TRCSESS utility. The undocumented aspects surrounding this set of features
are so numerous that I refrain from listing them all.1 For example, it is undocumented that a
shared server process does not re-emit instrumentation entries to SQL trace files as it services
different sessions. This may lead to incorrect results when a shared server process’ SQL trace
file, which contains several sessions, is processed with TRCSESS.
A complete instrumentation case study that points out the benefits seen in the aforementioned components and addresses undocumented aspects, such as the SQL trace file entries
written by JDBC end-to-end metrics, is presented. A work-around that avoids incorrect results
with TRCSESS in a Shared Server environment is suggested. An example of the integration
between application instrumentation and the resource manager, which provides the undocumented syntax for assigning a resource group based on service name, module, and action, is
also included. Finally, backward compatibility of TRCSESS with Oracle9i is discussed.
Introduction to Instrumentation
According to Wikipedia, “instrumentation refers to an ability to monitor or measure the level
of a product’s performance, to diagnose errors and write trace information. […] In programming,
instrumentation means the ability of an application to incorporate:
1. The Java source file ApplicationInstrumentation.java in the source code depot lists ten undocumented aspects.
• Code tracing—receiving informative messages about the execution of an application at
run time. […]
• Performance counters—components that allow you to track the performance of the
application.
• Event logs—components that allow you to receive and track major events in the execution of the application.”2
In a nutshell, instrumentation enables software to measure its own performance. The
ORACLE DBMS is well instrumented. It maintains hundreds of counters and timers that represent the workload executed as well as the performance of SQL statements, memory, file, and
network access. Measurements are available at instance level, session level, and SQL or PL/SQL
statement level. In 1999 Anjo Kolk, Shari Yamaguchi, and Jim Viscusi of Oracle Corporation, in
their acclaimed paper Yet Another Performance Profiling Method (or YAPP-Method), proposed
the following formula:
Response Time = Service Time + Wait Time
Even though the paper was lacking a snapshot-based approach to measuring performance3
and stated that the wait event SQL*Net message from client should be ignored at session level—
a mistake that limits the explanatory power of performance diagnoses still seen in recent books4—
it was a milestone towards a new tuning paradigm. Put simply, service time is the CPU time
consumed and wait time is the time spent waiting for one of several hundred wait events related to
disk access, synchronization, or network latency (see appendix Oracle Wait Events in Oracle
Database Reference and the dynamic performance view V$EVENT_NAME). Instrumentation of the
database server provides these measurements. Instrumentation itself has a certain impact on
performance termed measurement intrusion. Basically, an extended SQL trace file is a microsecond-by-microsecond account of a database session’s elapsed time. In practice, some code
paths of the ORACLE DBMS are less instrumented than others and thus a SQL trace file does
not account for the entire elapsed time of a database session.
The response time perceived by an end user is affected by additional factors such as
network latency or processing in intermediate tiers such as application servers. Clearly, the
database server has its own perspective on response time, which is different from that of an
application server, which is still different from that of the end user. For example, instrumentation in Oracle10g introduced time stamps for wait events (WAIT entries) in extended SQL trace
files. Formerly, just the database calls parse, execute, and fetch were tagged with timestamps.
From the database server’s perspective, the response time of a SQL statement comprises the
interval between the arrival of the statement at the database server and the response sent to the
client. The former point in time is marked by the wait event SQL*Net message from client and
the latter by the wait event SQL*Net message to client. Due to network latency, the client will
2. http://en.wikipedia.org/wiki/Instrumentation_%28computer_programming%29
3. Both Statspack and AWR implement a snapshot-based approach to capturing performance data.
Figures since instance or session startup are not snapshot-based.
4. Please see Chapter 27 for information on the relevance of SQL*Net message from client and how to
derive think time from this wait event.
not receive the response at the moment the wait event SQL*Net message to client completes. Thus
the response time measured by the client is longer than that measured by the database server.
Not all code paths in the database server are thoroughly instrumented, such that further
inaccuracies aside from measurement intrusion are introduced. The deviation between the
elapsed time covered by trace file entries as delineated by timestamps in the file (parameter
tim, see Chapter 24) and the time accounted for by CPU consumption and waiting is usually
less than ten percent. In other words, the quality of instrumentation is high and measurement
intrusion is low, such that it is possible to use the data for reliable performance diagnoses.
I use the term application instrumentation to differentiate instrumentation of the ORACLE
DBMS from instrumentation within an application or database client. Basically, instrumentation of the DBMS is the core functionality, which may be leveraged to a greater degree when the
application is also instrumented. This is the main theme I’m trying to bring across in this chapter.
The DBMS has application interfaces that allow setting module, action, and client identifier. In
this context, a module identifies a longer code path, which may correspond to a batch job, a
business task, or a functional component within a larger application. For example, human
resources and manufacturing might be modules of an enterprise resource planning application.
Modules consist of one or more actions, i.e., actions are at a more granular level than modules.
A client identifier serves to uniquely identify an end user. In a connection pooling environment, dedicated server processes are spawned once and reused again and again for running
DML on behalf of different end users. A DBA has no way of knowing which dedicated server
process services which end user, unless the application is instrumented and sets the client
identifier. Since all the sessions in a connection pool are opened by the same application server
and connect to a single database user, the columns USERNAME, OSUSER, and MACHINE in V$SESSION
are useless. With application instrumentation in place, a DBA can identify the database session
which services an end user. The DBA will also see which module and action are executed. Of
course, you may wish to add a lot more instrumentation code than simply setting module,
action, and client identifier: code that has nothing to do with tracing or statistics collection in
the DBMS, but instead focuses on your application’s functionality or measures response time
independently of the DBMS. Especially if your application includes modules that do not interact
with a DBMS instance, you will need to include some kind of accounting for the time spent in
those modules. After all, the DBMS can only account for the time spent in database calls and
the time spent waiting to receive another database call, i.e., waiting for the event SQL*Net
message from client.
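As an aside, any database client that can execute an anonymous block may instrument itself through the documented PL/SQL interfaces. Here is a minimal Perl DBI sketch; the module, action, and client identifier values are made up, and an open database handle $dbh is assumed:

# set module, action, and client identifier via the documented PL/SQL packages
$dbh->do(q{
begin
    dbms_application_info.set_module('human_resources', 'monthly_payroll');
    dbms_session.set_identifier('Ray.Deevers');
end;
});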
In addition to module, action, and client identifier, the instance service name (see “Instance
Service Name vs. Net Service Name” in this book’s Introduction) used by an application may
also be used to facilitate monitoring and tracing. Separate applications (or even connection
pools) running against an instance should use different service names. This opens up the
possibility of tracing and monitoring by service name.
Case Study
I have chosen Java, JDBC, and the new JDBC end-to-end metrics introduced with Oracle10g for
the code sample in this case study. In Oracle11g, the JDBC end-to-end metrics interface has
been removed from documentation in favor of Dynamic Monitoring Service. However, the
interface is still present in the JDBC driver release 11.1.0.6.0 and works exactly as in Oracle10g.
Instrumentation with PL/SQL has been available for a number of years and works as expected.
JDBC end-to-end metrics is a different issue in that the code sample provided by Oracle in the
Oracle Database JDBC Developer’s Guide and Reference 10g Release 2 is neither a complete
runnable program nor syntactically correct. The case study looks at the effects of instrumentation on V$ views as well as SQL trace files and shows how to leverage it with DBMS MONITOR,
TRCSESS, and the database resource manager.
JDBC End-to-End Metrics Sample Code
JDBC end-to-end metrics is a lightweight approach to instrumentation. Instead of immediately
setting module, action, and client identifier as is done with the PL/SQL packages
DBMS_APPLICATION_INFO and DBMS_SESSION, changed values are merely stored in a Java array of strings
and are sent to the DBMS with the next database call. Thus, contrary to PL/SQL, no extra network
round-trips are incurred. The functionality is available in the Thin JDBC driver as well as in the
JDBC OCI driver. The API is only found in Oracle JDBC drivers starting with release 10.1. Third-party JDBC drivers and Oracle drivers with release numbers 9.2 and earlier do not offer this
functionality. Coding is quite simple. You declare an array of strings and use several constants
to size the string array and to set the array elements, which represent module, action, and client identifier. The following code excerpt is from the Java program ApplicationInstrumentation.java,
which is included in the source code depot:
1  DriverManager.registerDriver(new oracle.jdbc.OracleDriver());
2  java.util.Properties prop = new java.util.Properties();
3  // properties are evaluated once by JDBC Thin when the session is created. Not suitable for setting the program or other information after getConnection has been called
4  prop.put("user", username);
5  prop.put("password", pwd);
6  prop.put("v$session.program", getClass().getName()); // undocumented property v$session.program works with JDBC Thin only; if specified, then set as program and module, as expected end-to-end metrics overwrites the module; program is not overwritten
7  Connection conn = DriverManager.getConnection(url, prop);
8  conn.setAutoCommit(false);
9  // Create Oracle DatabaseMetaData object
10 DatabaseMetaData metadata = conn.getMetaData();
11 // gets driver info:
12 System.out.println("JDBC driver version: " + metadata.getDriverVersion() + "\n");
13 System.out.println("\nPlease query V$SESSION and hit return to continue when done.\n");
14 System.in.read(buffer, 0, 80); // Pause until user hits enter
15 // end-to-end metrics interface
16 String app_instrumentation[] = new String[OracleConnection.END_TO_END_STATE_INDEX_MAX];
17 app_instrumentation[OracleConnection.END_TO_END_CLIENTID_INDEX]="Ray.Deevers";
18 app_instrumentation[OracleConnection.END_TO_END_MODULE_INDEX]="mod";
19 app_instrumentation[OracleConnection.END_TO_END_ACTION_INDEX]="act";
20 ((OracleConnection)conn).setEndToEndMetrics(app_instrumentation,(short)0);
21 Statement stmt = conn.createStatement();
22 ResultSet rset = stmt.executeQuery("SELECT userenv('sid'), to_char(sysdate, 'Month dd. yyyy hh24:mi') FROM dual");
23 while (rset.next())
24   System.out.println("This is session " + rset.getString(1) + " on " + rset.getString(2));
25 rset.close();
26 System.out.println("\nPlease query V$SESSION and hit return to continue when done.\n");
27 System.in.read(buffer, 0, 80); // Pause until user hits enter
28 // with connection pooling, execute this code before returning session to connection pool
29 app_instrumentation[OracleConnection.END_TO_END_CLIENTID_INDEX]="";
30 app_instrumentation[OracleConnection.END_TO_END_MODULE_INDEX]="";
31 app_instrumentation[OracleConnection.END_TO_END_ACTION_INDEX]="";
32 ((OracleConnection)conn).setEndToEndMetrics(app_instrumentation,(short)0);
The connection is established by creating an instance of the class java.util.Properties,
setting user name and password, and then passing the class instance to the driver manager
in DriverManager.getConnection. This makes it possible to use the undocumented property
v$session.program (line 6) to set the program and module names in V$SESSION for JDBC Thin
connections. The default program and module name for such connections is JDBC Thin Client,
which is not very telling. JDBC OCI ignores this property.
If you cannot afford to thoroughly instrument a JDBC Thin application, I recommend that
you at least set this property prior to establishing a connection to the DBMS. Thus, the DBA will
be able to identify which database session your Java program uses, but he will not be able to
use the setting with DBMS_MONITOR, since it only honors the documented programming interfaces
PL/SQL, OCI, and Java. The module name will not only appear in V$SESSION.MODULE, but
also in Statspack and in AWR reports. Of course, the latter benefit applies to any program that
sets V$SESSION.MODULE, irrespective of whether it is set with DBMS_APPLICATION_INFO.SET_MODULE,
OCI, or JDBC. The module should also be set with JDBC OCI, but it requires a few more
lines of code. The default program and module for JDBC OCI are java@client_host_name on
UNIX and java.exe on Windows, where client_host_name is the system from which the JDBC
OCI program connected.
On line 16, the variable app_instrumentation is declared as an array of strings. The constant
OracleConnection.END_TO_END_STATE_INDEX_MAX is used to specify the array size. Client identifier,
module, and action are set in lines 17 to 19. The call to
((OracleConnection)conn).setEndToEndMetrics in line 20 makes the new settings available to
the end-to-end metrics API. The cast is mandatory to use the extension from Oracle Corporation.
In lines 20 to 25 a SELECT statement is executed. This statement causes parse, execute, and
fetch database calls. The new settings will be sent along with the first database call, which results
from the SELECT statement. Thus, while the program waits for the user to press Return in line 27, the
new settings for module, action, and client identifier will be seen in V$SESSION. In lines 29 to 32
module, action, and client identifier are set to empty strings. This is done to show the integration
with DBMS_MONITOR and TRCSESS.
The source code depot also includes a Java program which is instrumented with PL/SQL
and is compatible with Oracle9i as well as Oracle10g. Use of an Oracle10g JDBC driver to connect
to an Oracle9i instance is supported. However, end-to-end metrics code has no effect when
executed against an Oracle9i instance. Thus, at least to set module and action, PL/SQL must be
used by Java applications running against Oracle9i. The Java class OracleConnection includes
the method public void setClientIdentifier(java.lang.String clientId) to set the client
255
256
CHAPTER 23 ■ APPLICATION INSTRUMENTATION AND END-TO-END TRACING
identifier in Java. The Java program JdbcInstrumentationOracle9i.java in the source code
depot is instrumented with PL/SQL only and works with Oracle9i and subsequent releases.
Compiling the Program
Since there is a Java Developer’s Kit (JDK) in an Oracle10g ORACLE_HOME, there is no need to
install additional software to compile the Java program. It’s sufficient to set the environment
variable CLASSPATH and to include the Java compiler javac in the command search path variable
PATH. On UNIX, a colon (:) is used to separate individual entries in CLASSPATH, whereas on Windows
a semicolon (;) is used. Following is an example from Windows:
C:> set ORACLE_HOME=C:\Oracle\product\db10.2
C:> set CLASSPATH=%ORACLE_HOME%\jdbc\lib\ojdbc14.jar;.
C:> set PATH=%ORACLE_HOME%\bin;%ORACLE_HOME%\jdk\bin;%SystemRoot%\system32
C:> javac ApplicationInstrumentation.java
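On UNIX, the equivalent might look as follows (a sketch assuming a Bourne-type shell and an analogous ORACLE_HOME layout):

$ export ORACLE_HOME=/opt/oracle/product/db10.2
$ export CLASSPATH=$ORACLE_HOME/jdbc/lib/ojdbc14.jar:.
$ export PATH=$ORACLE_HOME/bin:$ORACLE_HOME/jdk/bin:$PATH
$ javac ApplicationInstrumentation.java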
At this point, the compiled Java program is in the file ApplicationInstrumentation.class,
which is also included in the source code depot, such that you need not compile the source file.
Instrumentation at Work
Now it’s time to run the program. Note that, since “.” was used in the CLASSPATH setting in the
previous section, ApplicationInstrumentation.class must be in the current directory to successfully run it. The program requires the following three arguments:
• A database user name
• The user’s password
• A JDBC URL for either JDBC Thin or JDBC OCI
JDBC OCI URLs take on the form jdbc:oracle:oci:@net_service_name, where net_service_name
is a Net service name defined in tnsnames.ora or by a directory service. JDBC Thin URLs
for connecting to a specific instance service have the format
jdbc:oracle:thin:@//host_name:port/instance_service_name, where host_name is the system
where the DBMS instance is running, port is the port number used by the listener, and
instance_service_name is a service name listed with lsnrctl services. The old JDBC Thin URL
syntax jdbc:oracle:thin:@host_name:port:ORACLE_SID should no longer be used, since it results
in the default service name SYS$USERS in V$SESSION.SERVICE_NAME. This prevents the use of
individual instance service names for different applications and also defeats the purpose of
cluster services in RAC environments.
Setting Up Tracing, Statistics Collection, and the Resource Manager
Some preparations are needed to show the interaction of instrumentation with end-to-end
tracing, client statistics collection, and the resource manager. The package DBMS_MONITOR may
be used to enable SQL trace or client statistics for a certain combination of instance service
name, module, and action. In the call to the packaged procedure
DBMS_MONITOR.SERV_MOD_ACT_TRACE_ENABLE, merely the instance service name is mandatory. If
the module is unspecified, then the setting affects all modules. Likewise, when service name
and module are specified, the setting affects all actions. Tracing and statistics collection for a
certain client identifier are also possible. Here are some examples which first enable and then
disable trace and statistics collection at various levels:
SQL> EXEC dbms_monitor.serv_mod_act_trace_enable('TEN.oradbpro.com', 'mod', 'act')
SQL> EXEC dbms_monitor.client_id_trace_enable('Ray.Deevers')
SQL> EXEC dbms_monitor.serv_mod_act_stat_enable('TEN.oradbpro.com', 'mod', 'act')
SQL> EXEC dbms_monitor.client_id_stat_enable('Ray.Deevers')
SQL> EXEC dbms_monitor.client_id_stat_disable('Ray.Deevers')
SQL> EXEC dbms_monitor.serv_mod_act_stat_disable('TEN.oradbpro.com', 'mod', 'act')
SQL> EXEC dbms_monitor.client_id_trace_disable('Ray.Deevers')
SQL> EXEC dbms_monitor.serv_mod_act_trace_disable('TEN.oradbpro.com', 'mod', 'act')
Arguments to DBMS_MONITOR are case sensitive. The spelling of the service name must match
the value of the column DBA_SERVICES.NETWORK_NAME. The settings are persistent in the sense
that they remain in effect across instance restarts. The dictionary views DBA_ENABLED_TRACES
and DBA_ENABLED_AGGREGATIONS reflect the current settings. If we wanted to trace all the sessions
that use the instance service “TEN.oradbpro.com” and enable statistics collection for that
same service and the module “mod”, we could achieve this with the following code:
SQL> SELECT network_name FROM dba_services WHERE network_name LIKE 'TEN.%';
NETWORK_NAME
----------------
TEN.oradbpro.com
SQL> EXEC dbms_monitor.serv_mod_act_trace_enable(service_name=>'TEN.oradbpro.com')
SQL> SELECT trace_type, primary_id, waits, binds FROM dba_enabled_traces;
TRACE_TYPE PRIMARY_ID       WAITS BINDS
---------- ---------------- ----- -----
SERVICE    TEN.oradbpro.com TRUE  FALSE
SQL> EXEC dbms_monitor.serv_mod_act_stat_enable('TEN.oradbpro.com', 'mod')
SQL> SELECT aggregation_type, primary_id, qualifier_id1, qualifier_id2
     FROM dba_enabled_aggregations;
AGGREGATION_TYPE PRIMARY_ID       QUALIFIER_ID1 QUALIFIER_ID2
---------------- ---------------- ------------- -------------
SERVICE_MODULE   TEN.oradbpro.com mod
Statistics for service name, module, and action may be retrieved from the dynamic
performance view V$SERV_MOD_ACT_STATS, whereas statistics for a client identifier are in
V$CLIENT_STATS.
The assignment of a database session to a consumer group based on instrumentation
settings is a new feature of the Oracle10g database resource manager. Among others, assignments can be made for combinations of service name, module, and action or combinations of
module and action, where merely the first component is mandatory. The settings in effect are
available by querying the view DBA_RSRC_GROUP_MAPPINGS. This feature may be used to prioritize
modules and services. The resource manager is shipped with a default plan called SYSTEM_PLAN.
This plan includes the consumer groups SYS_GROUP, OTHER_GROUPS, and LOW_GROUP
(see the dynamic performance view V$RSRC_CONSUMER_GROUP). As the name suggests, LOW_GROUP
is the consumer group with the lowest priority. In a situation where CPU resources are scarce,
the SYS_GROUP and OTHER_GROUPS will be given slices of CPU time, whereas sessions in the
LOW_GROUP have to yield the CPU. Since the resource manager is disabled by default, it must
be enabled to use these features. This can be done at runtime using ALTER SYSTEM.
SQL> ALTER SYSTEM SET resource_manager_plan=system_plan;
System altered.
The syntax for specifying a combination of instance service name, module, and action as
a single string in a call to DBMS_RESOURCE_MANAGER.SET_CONSUMER_GROUP_MAPPING is undocumented.
There is also no statement on case sensitivity. Actual testing reveals that a dot (.) has to be used
to separate the service name from the module and the module from the action. Contrary to
DBMS_MONITOR, capitalization is irrelevant. The code below instructs the resource manager to
place a session that uses the instance service “TEN.oradbpro.com” and has set module and
action to “mod” and “act” respectively into the consumer group LOW_GROUP (file rsrc_mgr.sql).
begin
  dbms_resource_manager.create_pending_area();
  dbms_resource_manager.set_consumer_group_mapping(
    attribute=>dbms_resource_manager.service_module_action,
    value=>'TEN.oradbpro.com.mod.act',
    consumer_group=>'LOW_GROUP'
  );
  dbms_resource_manager.submit_pending_area();
  dbms_resource_manager.clear_pending_area();
end;
/
SQL> SELECT * FROM dba_rsrc_group_mappings;
ATTRIBUTE             VALUE                    CONSUMER_GROUP STATUS
--------------------- ------------------------ -------------- ------
SERVICE_MODULE_ACTION TEN.ORADBPRO.COM.MOD.ACT LOW_GROUP
ORACLE_USER           SYS                      SYS_GROUP
ORACLE_USER           SYSTEM                   SYS_GROUP
The setting takes effect immediately, i.e., it is applied to sessions that are already connected.
This completes the preparations. If we now run the Java program, we expect to see these three
things happening:
1. A SQL trace file is created.
2. Statistics are collected for service name, module, and action.
3. The session is placed in the consumer group LOW_GROUP as soon as it sets module
and action to the values “mod” and “act” respectively.
Let’s verify that the software responds as expected. The Java program is run by starting a
Java virtual machine and passing the class and parameters for the program as arguments.
C:> java ApplicationInstrumentation ndebes secret jdbc:oracle:thin:@//localhost:1521/TEN.oradbpro.com
JDBC driver version: 10.2.0.3.0
Please query V$SESSION and hit return to continue when done.
At this point, the program has connected, but not yet made any calls to the end-to-end
metrics interface. Now open another window and run the following query in SQL*Plus:
SQL> SELECT program, module, action, resource_consumer_group AS consumer_group
     FROM v$session
     WHERE service_name='TEN.oradbpro.com';
PROGRAM                    MODULE                     ACTION CONSUMER_GROUP
-------------------------- -------------------------- ------ --------------
ApplicationInstrumentation ApplicationInstrumentation        OTHER_GROUPS
This shows that the undocumented property for setting program and module has worked.
The session is currently in consumer group OTHER_GROUPS. Next, press Return in the window
where the Java program is running. This will cause the program to advance to the section where
client identifier, module, and action are set. It will also execute a query that retrieves the session
identifier and the current date.
This is session 40 on October    30. 2007 22:37
Please query V$SESSION and hit return to continue when done.
The program pauses again, allowing you to see the effects of the calls to the end-to-end
metrics API in V$SESSION and V$SERV_MOD_ACT_STATS (timings are in microseconds).
SQL> SELECT program, module, action, client_identifier,
            resource_consumer_group AS consumer_group
     FROM v$session
     WHERE service_name='TEN.oradbpro.com';
PROGRAM                    MODULE ACTION CLIENT_IDENTIFIER CONSUMER_GROUP
-------------------------- ------ ------ ----------------- --------------
ApplicationInstrumentation mod    act    Ray.Deevers       LOW_GROUP
SQL> SELECT stat_name, value
     FROM v$serv_mod_act_stats
     WHERE service_name='TEN.oradbpro.com' AND module='mod' AND value > 0;
STAT_NAME                 VALUE
------------------------- -----
user calls                    2
DB time                    5555
DB CPU                     5555
parse count (total)           1
parse time elapsed           63
execute count                 2
sql execute elapsed time    303
opened cursors cumulative     1
Module, action, and client identifier are set, while the program name has been retained. As
expected, the session has been placed into the consumer group LOW_GROUP. Statistics for
service and module are collected and available by querying V$SERV_MOD_ACT_STATS. Now press
Return in the window where the Java program is running one more time.
application instrumentation settings removed
Please query V$SESSION and hit return to continue when done.
At this point, the program has set module, action, and client identifier to empty strings. It
pauses again to allow us to observe the effects. The previous consumer group of the session has
been restored.
SQL> SELECT program, module, action, client_identifier,
            resource_consumer_group AS consumer_group
     FROM v$session
     WHERE service_name='TEN.oradbpro.com';
PROGRAM                    MODULE ACTION CLIENT_IDENTIFIER CONSUMER_GROUP
-------------------------- ------ ------ ----------------- --------------
ApplicationInstrumentation                                 OTHER_GROUPS
The program disconnects and exits as soon as you press Return one last time. There is now
an extended SQL trace file, which includes wait events but not binds, in the directory set with
parameter USER_DUMP_DEST. To include binds, set the boolean parameter BINDS in the call to
DBMS_MONITOR.SERV_MOD_ACT_TRACE_ENABLE to TRUE. The relevant sections of the trace file are
reproduced here:
*** SERVICE NAME:(TEN.oradbpro.com) 2007-10-30 22:52:52.703
*** SESSION ID:(47.2195) 2007-10-30 22:52:52.703
WAIT #0: nam='SQL*Net message to client' ela= 3 driver id=1413697536 #bytes=1 p3=0
obj#=-1 tim=392092318798
…
*** 2007-10-30 22:53:03.984
WAIT #0: nam='SQL*Net message from client' ela= 11260272 driver id=1413697536
#bytes=1 p3=0 obj#=-1 tim=392103603777
*** ACTION NAME:(act) 2007-10-30 22:53:03.984
*** MODULE NAME:(mod) 2007-10-30 22:53:03.984
*** CLIENT ID:(Ray.Deevers) 2007-10-30 22:53:03.984
=====================
PARSING IN CURSOR #2 len=75 dep=0 uid=61 oct=3 lid=61 tim=392103604193 hv=536335290
ad='66afd944'
SELECT userenv('sid'), to_char(sysdate, 'Month dd. yyyy hh24:mi') FROM dual
END OF STMT
PARSE #2:c=0,e=90,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=392103604185
EXEC #2:c=0,e=53,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=392103608774
WAIT #2: nam='SQL*Net message to client' ela= 4 driver id=1413697536 #bytes=1 p3=0
obj#=-1 tim=392103608842
WAIT #2: nam='SQL*Net message from client' ela= 52777 driver id=1413697536 #bytes=1
p3=0 obj#=-1 tim=392103661760
WAIT #2: nam='SQL*Net message to client' ela= 4 driver id=1413697536 #bytes=1 p3=0
obj#=-1 tim=392103661904
FETCH #2:c=0,e=93,p=0,cr=0,cu=0,mis=0,r=1,dep=0,og=1,tim=392103661941
WAIT #2: nam='SQL*Net message from client' ela= 233116 driver id=1413697536 #bytes=1
p3=0 obj#=-1 tim=392103895174
*** ACTION NAME:() 2007-10-30 22:53:04.281
*** MODULE NAME:() 2007-10-30 22:53:04.281
*** CLIENT ID:() 2007-10-30 22:53:04.281
STAT #2 id=1 cnt=1 pid=0 pos=1 obj=0 op='FAST DUAL (cr=0 pr=0 pw=0 time=14 us)'
=====================
PARSING IN CURSOR #2 len=63 dep=0 uid=61 oct=3 lid=61 tim=392103896865 hv=2359234954
ad='672f1af4'
SELECT 'application instrumentation settings removed' FROM dual
END OF STMT
The fact that the module and action names appear before the SELECT statement, which
retrieves the session identifier, proves that the instrumentation settings were piggybacked with
the network packet(s) sent to parse the SELECT statement. JDBC end-to-end metrics do not
cause additional network round-trips. At the IP level, the extra data may necessitate an additional packet, but there will be no extra round-trips reported as the wait event SQL*Net message
from client.
Using TRCSESS
TRCSESS is a new utility that ships with Oracle10g and subsequent releases. It may be used to
extract relevant sections from one or more trace files into a single file based on the following
criteria:
• Service name (V$SESSION.SERVICE_NAME)
• Session identification (V$SESSION.SID and V$SESSION.SERIAL#)
• Module name (V$SESSION.MODULE)
• Action name (V$SESSION.ACTION)
• Client identifier (V$SESSION.CLIENT_IDENTIFIER)
One of the session identification, client identifier, service name, action, and module options
must be specified. If more than a single option is used, the trace file sections that satisfy all the
criteria are combined into an output file. All the option values are case sensitive. After TRCSESS
has merged the trace information into a single output file, this file may be processed by TKPROF
or the extended SQL trace profiler ESQLTRCPROF included with this book.
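For instance, to combine everything that ran under a given instance service and to build a resource profile from the result, an invocation might look like this (the file names are made up):

$ trcsess output=ten_svc.trc service=TEN.oradbpro.com *.trc
$ tkprof ten_svc.trc ten_svc.prf sort=prsela,exeela,fchela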
The TRCSESS utility prints information on its usage when it is called without any arguments.
$ trcsess
trcsess [output=<output file name >] [session=<session ID>] [clientid=<clientid>]
[service=<service name>] [action=<action name>] [module=<module name>]
<trace file names>
output=<output file name> output destination default being standard output.
session=<session Id> session to be traced.
Session id is a combination of session Index & session serial number e.g. 8.13.
clientid=<clientid> clientid to be traced.
service=<service name> service to be traced.
action=<action name> action to be traced.
module=<module name> module to be traced.
<trace file names> Space separated list of trace files with wild card '*' supported.
In the dedicated server model, all trace file entries for a particular session are in a single
file. Shared Server and connection pooling environments are a different matter. In the Shared
Server model, trace file entries for a single session may be spread across as many files as there
are shared server processes configured. This is due to the fact that a user session is serviced by different
shared server processes from time to time. This makes it difficult to create a complete resource
profile for a session. Connection pooling environments are the worst of all, since a DBA cannot
form an association between an end user, who may have contacted him for support, and a
database session, unless the application is instrumented and sets a client identifier, which
allows the DBA to find out which database session is currently used on behalf of the end user.
To continue the example from the previous section, I will show how to extract the SQL
statements issued while the session was executing module “mod” and action “act” with TRCSESS.
Glancing back at the code excerpt in the section “JDBC End-to-End Metrics Sample Code” near
the beginning of this chapter, you would expect that merely the SELECT statement that retrieves
the session identifier is contained in the resulting trace file.
$ trcsess module=mod action=act *.trc
*** 2007-10-30 22:53:03.984
*** 2007-10-30 22:53:03.984
=====================
PARSING IN CURSOR #2 len=75 dep=0 uid=61 oct=3 lid=61 tim=392103604193 hv=536335290
ad='66afd944'
SELECT userenv('sid'), to_char(sysdate, 'Month dd. yyyy hh24:mi') FROM dual
END OF STMT
PARSE #2:c=0,e=90,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=392103604185
EXEC #2:c=0,e=53,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=392103608774
WAIT #2: nam='SQL*Net message to client' ela= 4 driver id=1413697536 #bytes=1 p3=0
obj#=-1 tim=392103608842
WAIT #2: nam='SQL*Net message from client' ela= 52777 driver id=1413697536 #bytes=1
p3=0 obj#=-1 tim=392103661760
WAIT #2: nam='SQL*Net message to client' ela= 4 driver id=1413697536 #bytes=1 p3=0
obj#=-1 tim=392103661904
FETCH #2:c=0,e=93,p=0,cr=0,cu=0,mis=0,r=1,dep=0,og=1,tim=392103661941
WAIT #2: nam='SQL*Net message from client' ela= 233116 driver id=1413697536 #bytes=1
p3=0 obj#=-1 tim=392103895174
Note how all the application instrumentation entries including the client identifier, which
came after the action and module names, have been removed from the output of TRCSESS.
Unless a meaningful file name is used, it is not possible to tell how the file was processed. This
also implies that it is impossible to drill down from module and action to a specific client identifier. Calling TRCSESS with the option session leaves any other instrumentation entries intact.
Specifying any other option removes all the instrumentation entries and merely leaves the
timestamps that accompanied them in the resulting trace file. Since the session identification
entry format is the only one that is identical in Oracle9i and Oracle10g, TRCSESS has limited
backward compatibility to Oracle9i trace files. So far, everything has worked as you might expect.
The next section addresses pitfalls of using TRCSESS in a Shared Server environment.
TRCSESS and Shared Server
According to the documentation, TRCSESS caters for Shared Server environments (Oracle
Database Performance Tuning Guide 10g Release 2, page 20-6). However, there is a chance of
incorrect results when using TRCSESS in a Shared Server environment with any option apart
from session. When a shared server process begins servicing a session that has SQL trace enabled,
it only emits the session identifier (V$SESSION.SID) and session serial number (V$SESSION.SERIAL#).
Any other instrumentation entries are not written to the trace file repeatedly. The current
implementations of TRCSESS (releases 10.2.0.3 and 11.1.0.6) apparently do not keep track of
which instrumentation settings were made by a session. Instead, TRCSESS includes trace file
sections from other sessions in the output, merely because it does not encounter a new value
for service name, module, action, or client identifier. In order to work properly, TRCSESS
would need to see entries for service name, module, action, and client identifier each time a
shared server process services a new session. But since these entries are written only once, the
results obtained with TRCSESS may be incorrect. The test case below illustrates this. If Shared
Server is not enabled in your test environment, then you may enable it like this:
SQL> ALTER SYSTEM SET shared_servers=1;
System altered.
SQL> ALTER SYSTEM SET dispatchers='(PROTOCOL=TCP)(DISPATCHERS=1)';
System altered.
By using a single shared server process, I can make sure that several sessions will be present
in the same trace file. Next, logged in as a DBA, I enable tracing by service name, module, and
action using DBMS_MONITOR.
SQL> EXEC dbms_monitor.serv_mod_act_trace_enable('TEN.oradbpro.com', 'mod', 'act');
SQL> SELECT trace_type, primary_id, qualifier_id1, qualifier_id2, instance_name
     FROM dba_enabled_traces;
TRACE_TYPE            PRIMARY_ID       QUALIFIER_ID1 QUALIFIER_ID2 INSTANCE_NAME
--------------------- ---------------- ------------- ------------- -------------
SERVICE_MODULE_ACTION TEN.oradbpro.com mod           act
For the remainder of the test I need two sessions, with SQL trace enabled in both. The first
session enables SQL trace manually.
$ sqlplus ndebes/secret@ten.oradbpro.com
SQL> ALTER SESSION SET sql_trace=true;
Session altered.
The second session uses the instance service specified with the preceding call of
DBMS_MONITOR and also sets module and action to the values used in the call to
DBMS_MONITOR.SERV_MOD_ACT_TRACE_ENABLE. This enables SQL trace. In the next code example,
I have changed the SQL*Plus setting for SQLPROMPT such that the reader can easily recognize
which session executes what.
$ sqlplus ndebes/secret@ten.oradbpro.com
SQL> SET SQLPROMPT "MOD_ACT> "
MOD_ACT> exec dbms_application_info.set_module('mod','act');
MOD_ACT> SELECT server, service_name, module, action
         FROM v$session WHERE sid=userenv('SID');

SERVER    SERVICE_NAME     MODULE     ACTION
--------- ---------------- ---------- -------
SHARED    TEN.oradbpro.com mod        act

MOD_ACT> SELECT 'mod_act' AS string FROM dual;

STRING
-------
mod_act
Back to the first session. I select a string literal from DUAL. This serves the purpose of including
a unique string in the trace file, which may be searched for later.
SQL> SELECT 'not instrumented' AS string FROM dual;
STRING
----------------
not instrumented
SQL> ALTER SESSION SET sql_trace=false;
Session altered.
SQL trace in the first session is switched off now. Back to session 2. In order to delimit the
trace file section for extraction with TRCSESS and to switch off tracing, this session sets module
and action to empty strings.
MOD_ACT> exec dbms_application_info.set_module('','');
You may now go to the directory specified with the parameter BACKGROUND_DUMP_DEST5 and run TRCSESS.

$ trcsess output=mod_act.trc module=mod action=act ten1_s000_8558.trc
$ grep "not instrumented" mod_act.trc
SELECT 'not instrumented' AS string FROM dual

5. The session level parameter TRACE_FILE_IDENTIFIER, which makes it easier to identify trace files, merely applies to dedicated server processes. These include the parameter's value in the trace file name. Shared server processes ignore this parameter.
The output file created by TRCSESS includes the SELECT of the string literal by session 1,
although session 1 did not set module and action to the option values passed to TRCSESS. This
defect applies to TRCSESS releases 10.2 and 11.1. As stated earlier, the problem is that instrumentation entries are not written again when a shared server process services another session.
Following is an excerpt of a shared server process’ trace file that confirms this (all instrumentation entries are included):
1 *** ACTION NAME:(act) 2007-09-06 20:23:52.514
2 *** MODULE NAME:(mod) 2007-09-06 20:23:52.514
3 *** SERVICE NAME:(TEN.oradbpro.com) 2007-09-06 20:23:52.514
4 *** SESSION ID:(133.39) 2007-09-06 20:23:52.514
5 =====================
6 PARSING IN CURSOR #7 len=59 dep=0 uid=34 oct=47 lid=34 tim=1161233430189798
hv=919966564 ad='22c5ec10'
7 BEGIN dbms_application_info.set_module('mod','act'); END;
8 END OF STMT
9 …
10 *** SESSION ID:(146.361) 2007-09-06 20:26:18.442
11 …
12 *** SESSION ID:(133.39) 2007-09-06 20:26:48.229
13 …
14 *** SESSION ID:(146.361) 2007-09-06 20:27:02.759
15 =====================
16 PARSING IN CURSOR #7 len=45 dep=0 uid=34 oct=3 lid=34 tim=1161233615975704
hv=2134668354 ad='2527ef28'
17 SELECT 'not instrumented' AS string FROM dual
18 …
19 *** SESSION ID:(133.39) 2007-09-06 20:27:27.502
20 …
21 PARSING IN CURSOR #2 len=53 dep=0 uid=34 oct=47 lid=34 tim=1161233640139316
hv=3853446089 ad='22c484d4'
22 BEGIN dbms_application_info.set_module('',''); END;
23 END OF STMT
24 PARSE #2:c=0,e=2,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=1161233640139316
25 *** ACTION NAME:() 2007-09-06 20:27:27.503
26 *** MODULE NAME:() 2007-09-06 20:27:27.503
Lines 1 to 2 contain the module and action emitted due to the second session's call to DBMS_APPLICATION_INFO. Lines 10 to 14 show that the shared server process intermittently serviced another session with SID=146 and SERIAL#=361. Line 17 contains the string literal used to mark the SELECT executed by the first session, which was not instrumented. Lines 25 to 26 mark the end of the trace file section that pertains to the module "mod" and the action "act". At this point, the module and action names no longer matched the settings in DBA_ENABLED_TRACES made with DBMS_MONITOR, hence tracing in session 2 was switched off. In other words, end-to-end tracing noted the new values for module and action and then switched off SQL trace.6 Since there are no entries for module and action between lines 10 and 19, TRCSESS included all the trace file entries between these lines, although some of them belong to a different session.

6. A dedicated server process servicing an application instrumented with JDBC end-to-end metrics switches off SQL trace before emitting the new values for module and action that cause end-to-end tracing to disable SQL trace. Since a dedicated server process re-emits module, action, and client identifier each time SQL trace is enabled, this will not lead to incorrect results.
A correct implementation of TRCSESS would memorize the instrumentation settings for
each combination of SID and SERIAL# and use them to omit or include the trace file section that
follows a line with a new value for SID and SERIAL#. This approach may be used as a workaround.
Look for the SESSION ID entry next to the module and action that we are interested in. Line 4
contains the SID and SERIAL# (133.39) that we need to pass to TRCSESS to get a correctly processed output file. If tracing had been enabled before module and action were set, you would need to look for the SESSION ID entry preceding the ACTION NAME or MODULE NAME entry. Instead of passing module and action to TRCSESS, we now pass the SID and SERIAL# that we identified in the trace file.
$ trcsess output=session_133_39.trc session=133.39 ten1_s000_8558.trc
$ grep "not instrumented" session_133_39.trc
This time the string literal used to mark the first session is not found in the output file. This
proves that TRCSESS is able to process the shared server process’ input trace file correctly, as
long as the option session is used.
Instrumentation and the Program Call Stack
The examples we have looked at so far are highly simplified. A real application has many routines,
which call each other. The called routine does not normally know which routine it was called
by and it may not always be called by the same routine. To keep track of the program call stack,
each routine needs to push the current instrumentation settings on to a stack before overwriting
them with its own settings. On exit from the routine, the previous settings may be fetched from
the stack (pop) and restored. Care must be taken that the pop operation is also performed
upon abnormal exit from a routine due to an exception or error. As far as I know, such a stack
was first implemented in PL/SQL by the Hotsos ORACLE instrumentation library (ILO).7
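The following PL/SQL sketch (my own illustration, not ILO code; the procedure name and the module and action values are made up) shows one way to implement the push and pop operations with DBMS_APPLICATION_INFO, using each routine's local variables as the stack:

CREATE OR REPLACE PROCEDURE routine_b AS
  prev_module VARCHAR2(48);
  prev_action VARCHAR2(32);
BEGIN
  -- push: save the caller's instrumentation settings before overwriting them
  dbms_application_info.read_module(prev_module, prev_action);
  dbms_application_info.set_module('routine_b', 'processing');
  -- the routine's actual work goes here
  -- pop: restore the caller's settings on normal exit
  dbms_application_info.set_module(prev_module, prev_action);
EXCEPTION
  WHEN OTHERS THEN
    -- pop: restore the caller's settings on abnormal exit, too
    dbms_application_info.set_module(prev_module, prev_action);
    RAISE;
END;
/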
Figure 23-1 illustrates how routines in an application call each other. Execution begins in routine A at t1. Instrumentation settings of routine A are in effect at this point. At time t2, routine A calls routine B. To preserve the instrumentation settings of routine A, routine B pushes them onto a stack and then puts its own settings into effect. At t3, routine B calls routine C, which in turn saves instrumentation settings from routine B on the stack before overwriting them. At t4, the application exits routine C, which restores the instrumentation settings of routine B by popping them off the stack, just prior to exiting. The process goes on until the program exits from routine A at some point after t7. The axis on the right indicates which routine is executing at a certain point in time. If this approach is adhered to throughout the code path of an application, it is possible to find out how much time is spent in each module by using DBMS_MONITOR and V$SERV_MOD_ACT_STATS. Statistics provided by the view V$SERV_MOD_ACT_STATS include "DB time", "DB CPU", "physical reads", "physical writes", "cluster wait time", "concurrency wait time", "application wait time", and "user I/O wait time".
7. The Hotsos ILO library is free software and may be downloaded from http://sourceforge.net/projects/hotsos_ilo.
[Figure 23-1 shows a timeline running from t1 to t7: routine A calls routine B (push(A)), B calls C (push(B)), C and then B return (pop), A calls C directly (push(A)), and C returns (pop); an axis labeled "Nesting level" on the right indicates which of the routines A, B, or C is executing at each point in time.]

Figure 23-1. Program call stack and instrumentation
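Provided that statistics aggregation has also been enabled with DBMS_MONITOR.SERV_MOD_ACT_STAT_ENABLE, the time spent in an instrumented module may then be retrieved along these lines (a sketch reusing the service and module names from this chapter's examples):

SQL> EXEC dbms_monitor.serv_mod_act_stat_enable('TEN.oradbpro.com', 'mod');
SQL> SELECT aggregation_type, module, action, stat_name, value
     FROM v$serv_mod_act_stats
     WHERE service_name='TEN.oradbpro.com'
     AND stat_name IN ('DB time', 'DB CPU');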
Source Code Depot
Table 23-1 lists this chapter’s source files and their functionality.
Table 23-1. Application Instrumentation Source Code Depot

File Name                          Functionality
ApplicationInstrumentation.class   Java program compiled from source file ApplicationInstrumentation.java
ApplicationInstrumentation.java    Java source file that illustrates Oracle10g JDBC end-to-end metrics
JdbcInstrumentationOracle9i.class  Java program compiled from source file JdbcInstrumentationOracle9i.java
JdbcInstrumentationOracle9i.java   Java source file that demonstrates instrumentation with PL/SQL
rsrc_mgr.sql                       Anonymous PL/SQL block for setting up resource manager consumer group mapping by service, module, and action
PART 8

Performance

CHAPTER 24
■■■
Extended SQL Trace File Format Reference
A SQL trace file contains a precise log of the SQL statements parsed and executed as well as the fetch calls made by a database client. This includes the CPU time and the elapsed (wall clock) time consumed by parse, execute, and fetch operations. Optionally, a SQL trace file includes information on wait events, which are indispensable for an accurate performance diagnosis. Last but not least, actual values for bind variables may also be included, e.g., when the PL/SQL package DBMS_MONITOR is used to enable SQL trace.
The SQL trace file format is undocumented, even though the utilities TKPROF and TRCSESS are based on the analysis of SQL trace files. Understanding the format of extended SQL trace files is an essential skill for any DBA who is confronted with performance problems or troubleshooting tasks. Since formatting trace files with TKPROF obscures important information, such as statement hash values, timestamps, recursive call depths, instance service name, module, action, and SQL identifiers (V$SQL.SQL_ID),1 it is often mandatory to read and understand the trace files themselves.
Introduction to Extended SQL Trace Files
Extended SQL trace files are by and large a statement-by-statement account of SQL and PL/SQL
executed by a database client.2 Entries found in such files fall into these four major categories:
• Database calls (parse, execute, and fetch)
• Wait events
• Bind variable values
• Miscellaneous (timestamps, instance service name, session, module, action, and client
identifier)
Database calls, session identification, and other details from the miscellaneous category are logged when tracing is enabled at the lowest level 1, e.g., with ALTER SESSION SET SQL_TRACE=TRUE, whereas recording of wait events and bind variable values may be enabled independently.
1. TKPROF release 11.1.0.7 is the first release that includes SQL identifiers in the report.
2. Background processes may be traced, too, but they are normally not responsible for performance
problems.
How to obtain trace files at various levels of detail is the topic of Chapter 28. The trace levels
and the type of trace file entries they enable are summarized in Table 24-1.
Table 24-1. SQL Trace Levels

SQL Trace Level   Database Calls   Bind Variable Values   Wait Events
1                 yes              no                     no
4                 yes              yes                    no
8                 yes              no                     yes
12                yes              yes                    yes
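As a quick preview of Chapter 28, the highest level 12 may, for instance, be enabled for one's own session with event 10046:

SQL> ALTER SESSION SET EVENTS '10046 trace name context forever, level 12';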
Sometimes extended SQL trace files are referred to as raw SQL trace files. Both terms are
indeed synonymous. Since there is nothing particularly raw about the files—they are perfectly
human-readable—I have decided not to use the adjective raw and will stick with the term extended
SQL trace file or simply trace file for the sake of conciseness.
SQL and PL/SQL Statements
The term cursor is often used in conjunction with SELECT statements and the iterative fetching
of rows returned by queries. However, the ORACLE DBMS uses cursors to execute any SQL or
PL/SQL statement, not just SELECT statements. SQL and PL/SQL statements in a trace file are
identified by their cursor number. Cursor numbers for SQL statements sent by clients start at 1. The
cursor number is the figure behind the pound sign (#) in entries such as PARSING IN CURSOR #1,
PARSE #1, EXEC #1, FETCH #1, WAIT #1, and STAT #1. These examples all refer to the same cursor
number 1. Each additional SQL statement run by the client receives another cursor number,
unless reuse of a cursor number is taking place after the cursor has been closed. Entries relating
to the same statement are interrelated through the cursor number.
Not all operations executed are assigned a proper cursor number. One notable exception
is the use of large objects (LOBs) through Oracle Call Interface (OCI). When working with LOBs,
you may see cursor number 0 or cursor numbers for which a parse call is missing, although
tracing was switched on right after connecting. This does not apply to the PL/SQL LOB interface DBMS_LOB.
Cursor numbers may be reused within a single database session. When the client closes a
cursor, the DBMS writes STAT entries, which represent the execution plan, into the trace file. At
this stage, the cursor number can be reused for a different SQL statement. The SQL statement
text for a certain cursor is printed after the first PARSING IN CURSOR #n entry above any EXEC #n,
FETCH #n, WAIT #n, or STAT #n entries with the same cursor number n.
Recursive Call Depth
Anyone who has worked with the TKPROF utility is presumably familiar with the concept of
recursive and internal SQL statements. SQL statements sent by a database client are executed at recursive call depth 0. Should a SQL statement fire other statements, such as an INSERT statement, which fires the execution of an insert trigger, then these other statements would be
executed at recursive call depth 1. A trigger body may then execute additional statements,
which may cause recursive SQL at the next higher recursive call depth. Following is an example
of an INSERT statement executed at recursive call depth 0. The INSERT statement fires a trigger.
Access to a sequence in the trigger body is at recursive call depth 1. Note how the execution
of the top level INSERT statement (EXEC #3) is written to the trace file after the execution of the
dependent SELECT from the sequence has completed (EXEC #2).
PARSING IN CURSOR #3 len=78 dep=0 uid=61 oct=2 lid=61 tim=771237502562 hv=3259110965
ad='6c5f86dc'
INSERT INTO customer(name, phone) VALUES (:name, :phone) RETURNING id INTO :id
END OF STMT
PARSE #3:c=0,e=1314,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,tim=771237502553
=====================
PARSING IN CURSOR #2 len=40 dep=1 uid=61 oct=3 lid=61 tim=771237506650 hv=1168215557
ad='6c686178'
SELECT CUSTOMER_ID_SEQ.NEXTVAL FROM DUAL
END OF STMT
PARSE #2:c=0,e=1610,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=1,tim=771237506643
EXEC #2:c=0,e=57,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,tim=771237507496
FETCH #2:c=0,e=54,p=0,cr=0,cu=0,mis=0,r=1,dep=1,og=1,tim=771237507740
EXEC #3:c=0,e=4584,p=0,cr=1,cu=3,mis=1,r=1,dep=0,og=1,tim=771237508046
When a client runs an anonymous PL/SQL block, the block itself is executed at recursive
call depth 0, but the statements inside the block will have recursive call depth 1. Another
example is auditing entries inserted into table SYS.AUD$. These are executed at one recursive
call depth higher than the statement that triggered the auditing.
Recursive parse, execute, and fetch operations are listed before the execution of the statement that triggered the recursive operations. Statistics for SQL statements executed at recursive
call depth 0 contain the cost of dependent statements in terms of CPU time, elapsed time,
consistent reads, and so forth. This must be taken into consideration to avoid double counting
when evaluating SQL trace files.
Database Calls
The database call category consists of the three subcategories parsing, execution, and fetching.
Note that these entries correspond with the three stages of running dynamic SQL with the
package DBMS_SQL by calling the package subroutines DBMS_SQL.PARSE, DBMS_SQL.EXECUTE, and DBMS_SQL.FETCH_ROWS.
Among other metrics, database call entries represent the CPU and wall clock time (elapsed
time) a server process spends inside the ORACLE kernel on behalf of a database client. The
aggregated CPU and wall clock times from database calls in a trace file are closely related to the
session level statistics DB CPU and DB time in the dynamic performance view V$SESS_TIME_MODEL,
which is available in Oracle10g and subsequent releases.
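For a comparison with the aggregated timings from a trace file, both statistics may be retrieved for one's own session with a query such as this:

SQL> SELECT stat_name, value
     FROM v$sess_time_model
     WHERE sid=userenv('SID')
     AND stat_name IN ('DB time', 'DB CPU');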
Parsing
Parsing involves syntactic and semantic analysis of SQL statements as well as determining a
well-suited execution plan. Since this may be a costly operation, the ORACLE DBMS has the
capacity to cache the results of parse calls in the so-called library cache within the System
Global Area (SGA) for reuse by other statements that use the same SQL statement text.
The use of bind variables in SQL statements is crucial for the reuse of cached statements.
Failure to use bind variables causes increased parse CPU consumption, contention for the
library cache, excessive communication round-trips between client and server due to repeated
parse calls of non-reusable statements with literals, and difficulties in diagnosing performance
problems due to the inability of the TKPROF utility to aggregate statements that are identical
apart from literals. I recommend reading the section “Top Ten Mistakes Found in Oracle Systems”
on page 3-4 of Oracle Database Performance Tuning Guide 10g Release 2 before beginning
design and coding of an application. While bind variables are mandatory to achieve scalability
in high volume transaction processing (OLTP), literals are usually preferred in data warehousing
applications to provide the CBO with as much information as possible and to avoid the unpredictability inherent in bind variable peeking. The CBO looks at bind variable values when it first
encounters a statement, but not on subsequent executions of the same statement, such that
the plan chosen may be optimal for the initial execution but inappropriate for subsequent
executions. This functionality is called bind variable peeking. It is enabled by default with the
hidden parameter setting _OPTIM_PEEK_USER_BINDS=TRUE.
Parsing is usually represented by two adjacent entries in the trace file. The first is PARSING
IN CURSOR, and the second is PARSE. The minimum SQL trace level for enabling parse related
entries is 1. Here’s an example of a PARSING IN CURSOR followed by a PARSE from an Oracle10g
trace file:
PARSING IN CURSOR #3 len=92 dep=0 uid=30 oct=2 lid=30 tim=81592095533 hv=1369934057
ad='66efcb10'
INSERT INTO poem (author, text) VALUES(:author, empty_clob())
RETURNING text INTO :lob_loc
END OF STMT
PARSE #3:c=0,e=412,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,tim=81592095522
Parameters associated with the PARSING IN CURSOR entry are explained in Table 24-2.

Table 24-2. PARSING IN CURSOR Parameters

Parameter   Meaning
len         Length of the SQL statement text in bytes
dep         Recursive call depth
uid         Parsing user identity; corresponds to ALL_USERS.USER_ID and V$SQL.PARSING_USER_ID
oct         ORACLE command type; corresponds to V$SQL.COMMAND_TYPE and V$SESSION.COMMAND
lid         Parsing schema identity; corresponds to ALL_USERS.USER_ID and V$SQL.PARSING_SCHEMA_ID; may differ from uid (see Chapter 14 on ALTER SESSION SET CURRENT_SCHEMA)
tim         Timestamp in microseconds; often slightly earlier than the value of tim in the associated PARSE entry
hv          Hash value; corresponds to V$SQL.HASH_VALUE
ad          Address; corresponds to V$SQL.ADDRESS
sqlid       SQL identifier; corresponds to V$SQL.SQL_ID (emitted by Oracle11g)
PARSING IN CURSOR Entry Format
Caching of SQL statements in the shared pool is based on a hash value that is derived from the SQL or PL/SQL statement text. Changes in optimizer settings have no effect on the hash value, whereas slight changes to the statement text, such as insertion of a blank or tab character, do. Rarely, two statements with different statement texts may have the same hash value.
The hash value may be retrieved from many V$ views such as V$SQL, V$SQLTEXT, V$SQLAREA, V$OPEN_CURSOR, and V$SESSION. It remains constant across instance startups, but might change after an upgrade to a new release. In fact, the algorithm for computing hash values has changed in Oracle10g. The hash value compatible with previous releases is available in the column OLD_HASH_VALUE of the views V$SQL and V$SQLAREA. Only the hash value is emitted to trace files. Since Statspack stuck with the "old school" hash value, whereas merely the new hash value is emitted to trace files, this adds the complexity of translating from the new hash value to the old hash value when searching a Statspack repository for information pertinent to statements in a trace file (more on this in Chapter 25).
Oracle10g introduced the new column SQL_ID to some of the aforementioned V$ views. The value of this new column is not written to SQL trace files in releases prior to Oracle11g, but is used in Active Workload Repository reports, such that translation from the new hash value (column HASH_VALUE) to the SQL_ID may be required when looking up information on a statement in AWR. For cached SQL statements, translation among SQL_ID, HASH_VALUE, and OLD_HASH_VALUE may be performed using V$SQL. For statements that are no longer cached, but were captured by a Statspack snapshot, STATS$SQL_SUMMARY serves as a translation table (the Rosetta Stone of SQL statement identifiers). AWR has no facility for translating the hash value found in trace files to the corresponding SQL_ID. In releases prior to Oracle11g, matching the statement text between both types of capture is the only approach for extracting historical information on a statement, such as past execution time and plan, from AWR, and it is time consuming (see Chapter 26). Considering that Statspack requires no extra license and includes session level capture and reporting (watch out for bug 5145816; see Table 25-4 in Chapter 25), this shortcoming of AWR might be another reason for favoring Statspack.
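For a statement that is still cached, the translation might be performed as follows, using a hash value captured from a trace file (the value below is from the PARSING IN CURSOR example shown later in this chapter):

SQL> SELECT sql_id, hash_value, old_hash_value
     FROM v$sql
     WHERE hash_value=1369934057;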
Oracle11g is the first DBMS release that emits the SQL identifier (V$SQL.SQL_ID) in addition
to the hash value to trace files. Hence the statement matching issue between extended SQL
trace and AWR is a thing of the past for users of Oracle11g. Following is an example of an
Oracle11g PARSING IN CURSOR entry:
PARSING IN CURSOR #3 len=116 dep=0 uid=32 oct=2 lid=32 tim=15545747608 hv=1256130531
ad='6ab5ff8c' sqlid='b85s0yd5dy1z3'
INSERT INTO customer(id, name, phone)
VALUES (customer_id_seq.nextval, :name, :phone) RETURNING id INTO :id
END OF STMT
The SQL statement parsed is printed on a new line after the line starting with the string
PARSING IN CURSOR. A line starting with END OF STMT marks the end of the SQL statement. The
mapping from the numeric command type (parameter oct) to the command name is available
by running SELECT action, name FROM audit_actions. Table 24-3 contains the most common
command types plus some additional command types that may be used by applications. Please
note that these numeric command types do not correspond with Oracle Call Interface SQL
command codes (Oracle Call Interface Programmer’s Guide 10g Release 2, Appendix A).
Table 24-3. SQL and PL/SQL Command Types

Numeric Command Type   SQL or PL/SQL Command
2                      INSERT
3                      SELECT
6                      UPDATE
7                      DELETE
26                     LOCK TABLE
44                     COMMIT
45                     ROLLBACK
46                     SAVEPOINT
47                     PL/SQL block
48                     SET TRANSACTION
55                     SET ROLE
90                     SET CONSTRAINTS
170                    CALL
189                    MERGE
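For instance, the mapping of a few of these command types may be verified as follows:

SQL> SELECT action, name FROM audit_actions
     WHERE action IN (2, 3, 47)
     ORDER BY action;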
PARSE Entry Format
Among other metrics, PARSE entries represent CPU and wall clock time consumed by parse
operations. By looking at the parameter mis (library cache miss) it is possible to derive the
library cache hit ratio from trace files. A low hit ratio usually indicates that the application does
not use bind variables. Details on the parameters of the PARSE entry are in Table 24-4.
Table 24-4. PARSE Parameters

Parameter   Meaning
c           CPU consumption
e           Elapsed time
p           Physical reads
cr          Consistent reads
cu          Current blocks processed
mis         Cursor misses; 0=soft parse, i.e., statement found in library cache, 1=hard parse, i.e., statement not found
r           Rows processed
dep         Recursive call depth
og          Optimizer goal; 1=ALL_ROWS, 2=FIRST_ROWS, 3=RULE, 4=CHOOSE; Oracle9i default is CHOOSE; Oracle10g and Oracle11g default is ALL_ROWS
plh         Execution plan hash value; corresponds to V$SQL_PLAN.PLAN_HASH_VALUE, V$SQL_PLAN_STATISTICS_ALL.PLAN_HASH_VALUE, and V$SQLSTATS.PLAN_HASH_VALUE among others; this parameter was introduced in release 11.1.0.7 (see the section "Execution Plan Hash Value" in this chapter for details)
tim         Timestamp in microseconds
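As a cross-check of a hit ratio derived from the mis parameter of PARSE entries, the instance-wide library cache statistics may be queried; a sketch:

SQL> SELECT namespace, gets, gethits, gethitratio
     FROM v$librarycache
     WHERE namespace='SQL AREA';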
The following SELECT statement confirms the links between trace file entries and data
dictionary views:
SQL> SELECT s.sql_text, u1.username user_name,
     u2.username schema_name, optimizer_mode
     FROM v$sql s, all_users u1, all_users u2
     WHERE hash_value='1369934057'
     AND address=upper('66efcb10')
     AND u1.user_id=s.parsing_user_id
     AND u2.user_id=s.parsing_schema_id;

SQL_TEXT                         USER_NAME SCHEMA_NAME OPTIMIZER_MODE
-------------------------------- --------- ----------- --------------
INSERT INTO poem (author, text)  NDEBES    NDEBES      ALL_ROWS
VALUES(:author, empty_clob())
RETURNING text INTO :lob_loc
PARSE #n may be missing, such that PARSING IN CURSOR #n is followed directly by EXEC #n,
such as in this trace file excerpt from a call to the $dbh->do function of the Perl DBI (more on
Perl DBI in Chapter 22):
PARSING IN CURSOR #4 len=69 dep=0 uid=30 oct=42 lid=30 tim=81591952901 hv=3164292706
ad='67339f7c'
alter session set events '10046 trace name context forever, level 12'
END OF STMT
EXEC #4:c=0,e=68910,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,tim=81591952889
PARSE ERROR Entry Format
Failing parse calls due to incorrect syntax or insufficient privileges result in errors such as
“ORA-00942: table or view does not exist” or “ORA-00904: invalid identifier”. Such errors are
marked by PARSE ERROR entries. Here’s an example of a parse error due to an incorrectly spelled
column name:
PARSING IN CURSOR #6 len=93 dep=0 uid=30 oct=2 lid=30 tim=170048888062 hv=986445513
ad='676bb350'
INSERT INTO poem (author, txt) VALUES(:author, empty_clob())
RETURNING ROWID INTO :row_id
END OF STMT
PARSE #6:c=0,e=457,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,tim=170048888050
=====================
PARSING IN CURSOR #7 len=198 dep=1 uid=0 oct=3 lid=0 tim=170048915712 hv=4125641360
ad='67b0c8d4'
…
PARSE ERROR #6:len=93 dep=0 uid=30 oct=2 lid=30 tim=170048972426 err=904
INSERT INTO poem (author, txt) VALUES(:author, empty_clob())
RETURNING ROWID INTO :row_id
As in the preceding example, the DBMS may need to run recursive SQL statements (dep=1)
to process the parse call in order to load the dictionary cache. Abbreviations used are analogous to PARSE entries, except for the last field err, which indicates the ORACLE error number.
On UNIX systems, the error message text can be retrieved by calling oerr ora error_number
from a shell command line. Where appropriate, the output of oerr also includes a probable
cause of the error and suggests an action. The same information is available by looking up the
error number in the Oracle Database Error Messages manual.
$ oerr ora 942
00942, 00000, "table or view does not exist"
// *Cause:
// *Action:
EXEC Entry Format
EXEC entries have the same format as PARSE entries. EXEC is short for execution. The minimum
SQL trace level for enabling EXEC entries is 1. INSERT, UPDATE, DELETE, MERGE, PL/SQL, and DDL
operations all have an execution stage, but no fetch stage. Of course, recursive fetches at a
higher recursive call depth may occur on behalf of these operations. When formatting a trace
file containing a large number of these operations with TKPROF, you might want to sort the
statements with the option exeela (execute elapsed time). Execute elapsed time includes the
CPU time consumed by EXEC entries, though occasionally higher values of CPU than elapsed
time are reported.
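A TKPROF invocation with this sort option might look as follows (the trace and output file names are made up):

$ tkprof ten1_ora_8214.trc ten1_ora_8214.prf sort=exeela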
FETCH Entry Format
FETCH entries adhere to the same format as PARSE and EXEC entries. The minimum SQL trace
level for enabling FETCH entries is 1. When formatting a trace file where fetching contributes
most to elapsed time with TKPROF, you might want to sort the statements with the option
fchela (fetch elapsed time). Fetch elapsed time includes fetch CPU time.
Execution Plan Hash Value
Starting with Oracle9i, execution plans were made available through the V$ views V$SQL_PLAN and V$SQL_PLAN_STATISTICS_ALL. Formerly, execution plans had been available in SQL trace
files only. Multiple execution plans may be used for a single SQL statement both over time and
simultaneously. Varying optimizer statistics, differences in parameters related to the cost
based optimizer (a.k.a. optimizer environment), and adaptive cursor sharing introduced in
Oracle11g are potential causes for multiple plans. Release 11.1.0.7 (Oracle11g patch set 1) is the
first release that includes the execution plan hash value in SQL trace files. It is represented by
the parameter plh of PARSE, EXEC, and FETCH entries (see Table 24-4). This new feature resolves
an issue with SQL trace files where multiple plans for a statement are present in the SGA and
no plan is emitted to the SQL trace file. Formerly, it could not be determined which of the plans
applied to the traced session. Since the inclusion of plh in release 11.1.0.7 it is trivial to retrieve missing plans from V$SQL_PLAN. The new parameter also facilitates the retrieval of additional information on an execution plan with the function DBMS_XPLAN.DISPLAY_CURSOR. This function returns much more information on execution plans than is emitted to SQL trace files. In fact it provides an unsurpassed level of detail that matches the extent of information produced by event 10053. The signature of DBMS_XPLAN.DISPLAY_CURSOR is reproduced here:
FUNCTION display_cursor(sql_id VARCHAR2 DEFAULT NULL,
                        cursor_child_no INTEGER DEFAULT 0,
                        format VARCHAR2 DEFAULT 'TYPICAL')
RETURN dbms_xplan_type_table
PIPELINED;
The amount of detail returned by the function depends on the parameter FORMAT. To achieve
the same level of detail as the output from event 10053, the two undocumented format options
ADVANCED and PEEKED_BINDS need to be supplied. ADVANCED includes a set of hints that
fully describe an execution plan in a section titled Outline Data. As the name suggests, the
format option PEEKED_BINDS reports the values of peeked bind variables, unless bind variable peeking is disabled with _OPTIM_PEEK_USER_BINDS=FALSE. The next example illustrates the use of DBMS_XPLAN.DISPLAY_CURSOR based on the correlation between trace file entries and V$ views.
Plan Hash Value Case Study
A join between the sample schema tables EMPLOYEES and DEPARTMENTS is used in the
following code example to highlight the extent of information available through both SQL trace
and V$ views:
SQL> ALTER SESSION SET SQL_TRACE=TRUE;
Session altered.
SQL> VARIABLE dept_id NUMBER
SQL> VARIABLE emp_id NUMBER
SQL> VARIABLE fn VARCHAR2(30)
SQL> EXEC :dept_id:=50; :emp_id:=120; :fn:='Matthew'
PL/SQL procedure successfully completed.
SQL> SELECT emp.last_name, emp.first_name, d.department_name
     FROM hr.employees emp, hr.departments d
     WHERE emp.department_id=d.department_id
     AND d.department_id=:dept_id
     AND emp.employee_id=:emp_id
     AND first_name=:fn;
In release 11.1.0.7, the preceding equi-join results in the following trace file entries (excerpt):
PARSING IN CURSOR #2 len=201 dep=0 uid=32 oct=3 lid=32 tim=1000634574600 hv=3363518900 ad='1d7f9688' sqlid='9w4xfcb47qfdn'
SELECT e.last_name, e.first_name, d.department_name
FROM hr.employees e, hr.departments d
WHERE e.department_id=d.department_id
AND d.department_id=:dept_id
AND e.employee_id=:emp_id
AND first_name=:fn
END OF STMT
PARSE #2:c=0,e=0,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=0,tim=1000634574600
EXEC #2:c=0,e=0,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=4225575861,tim=1000634574600
FETCH #2:c=0,e=0,p=0,cr=4,cu=0,mis=0,r=1,dep=0,og=1,plh=4225575861,tim=1000634574600
STAT #2 id=1 cnt=1 pid=0 pos=1 obj=0 op='NESTED LOOPS (cr=4 pr=0 pw=0 time=0 us cost=2 size=38 card=1)'
STAT #2 id=2 cnt=1 pid=1 pos=1 obj=19136 op='TABLE ACCESS BY INDEX ROWID DEPARTMENTS (cr=2 pr=0 pw=0 time=0 us cost=1 size=16 card=1)'
STAT #2 id=3 cnt=1 pid=2 pos=1 obj=19137 op='INDEX UNIQUE SCAN DEPT_ID_PK (cr=1 pr=0 pw=0 time=0 us cost=0 size=0 card=1)'
STAT #2 id=4 cnt=1 pid=1 pos=2 obj=19139 op='TABLE ACCESS BY INDEX ROWID EMPLOYEES (cr=2 pr=0 pw=0 time=0 us cost=1 size=22 card=1)'
STAT #2 id=5 cnt=1 pid=4 pos=1 obj=19143 op='INDEX UNIQUE SCAN EMP_EMP_ID_PK (cr=1 pr=0 pw=0 time=0 us cost=0 size=0 card=1)'
Since the SQL statement had not been cached in the library cache, the parse call incurs a cursor miss (mis=1). The plan hash value is not yet known at this point due to the cursor miss. The plh would have been emitted if the statement had already been cached (mis=0). Both the EXEC and FETCH calls include the execution plan hash value (plh). To obtain additional information on the execution plan represented by the preceding STAT entries (discussed later in this chapter) using DBMS_XPLAN.DISPLAY_CURSOR, the child cursor number must be determined.

SQL> SELECT DISTINCT child_number FROM v$sql_plan WHERE sql_id='9w4xfcb47qfdn'
     AND plan_hash_value=4225575861;

CHILD_NUMBER
------------
           0
In cases where the preceding query returns more than a single row, the V$ view V$SQL_SHARED_CURSOR shows what type of mismatch is responsible for the creation of multiple child cursors. Nevertheless, statements with identical SQL identifiers (sqlid) and plan hash values are processed using the same execution plan. Following is an example that selects only three out of 61 columns describing potential reasons for a mismatch:

SQL> SELECT child_number, user_bind_peek_mismatch, optimizer_mode_mismatch,
     bind_mismatch
     FROM v$sql_shared_cursor
     WHERE sql_id='9w4xfcb47qfdn';

CHILD_NUMBER USER_BIND_PEEK_MISMATCH OPTIMIZER_MODE_MISMATCH BIND_MISMATCH
------------ ----------------------- ----------------------- -------------
           0 N                       N                       N
           1 N                       Y                       N
At this stage both the SQL identifier and the child cursor number are known, such that DBMS_XPLAN may be called. The format option ALLSTATS is a shortcut for IOSTATS (column Buffers in the plan) combined with MEMSTATS (automatic PGA memory management statistics). The option LAST requests execution statistics for merely the last as opposed to all executions of a statement. To improve readability, the following plan table has been split in two parts and the column Id is repeated for each:

SQL> SELECT *
     FROM TABLE (dbms_xplan.display_cursor('9w4xfcb47qfdn', 1,
     'ADVANCED ALLSTATS LAST +PEEKED_BINDS'));
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------
SQL_ID 9w4xfcb47qfdn, child number 1
-------------------------------------
SELECT e.last_name, e.first_name, d.department_name FROM hr.employees
e, hr.departments d WHERE e.department_id=d.department_id AND
d.department_id=:dept_id AND e.employee_id=:emp_id AND first_name=:fn

Plan hash value: 4225575861

-------------------------------------------------------------------------------
| Id  | Operation                    | Name          | Starts | E-Rows |E-Bytes|
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |               |      1 |        |       |
|   1 |  NESTED LOOPS                |               |      1 |      1 |    38 |
|   2 |   TABLE ACCESS BY INDEX ROWID| DEPARTMENTS   |      1 |      1 |    16 |
|*  3 |    INDEX UNIQUE SCAN         | DEPT_ID_PK    |      1 |      1 |       |
|*  4 |   TABLE ACCESS BY INDEX ROWID| EMPLOYEES     |      1 |      1 |    22 |
|*  5 |    INDEX UNIQUE SCAN         | EMP_EMP_ID_PK |      1 |      1 |       |
-------------------------------------------------------------------------------

--------------------------------------------------------------
| Id  | Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------
|   0 |     2 (100)|          |      1 |00:00:00.01 |       4 |
|   1 |     2   (0)| 00:00:01 |      1 |00:00:00.01 |       4 |
|   2 |     1   (0)| 00:00:01 |      1 |00:00:00.01 |       2 |
|*  3 |     0   (0)|          |      1 |00:00:00.01 |       1 |
|*  4 |     1   (0)| 00:00:01 |      1 |00:00:00.01 |       2 |
|*  5 |     0   (0)|          |      1 |00:00:00.01 |       1 |
--------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1
   2 - SEL$1 / D@SEL$1
   3 - SEL$1 / D@SEL$1
   4 - SEL$1 / E@SEL$1
   5 - SEL$1 / E@SEL$1

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('11.1.0.7')
      DB_VERSION('11.1.0.7')
      FIRST_ROWS
      OUTLINE_LEAF(@"SEL$1")
      INDEX_RS_ASC(@"SEL$1" "D"@"SEL$1" ("DEPARTMENTS"."DEPARTMENT_ID"))
      INDEX_RS_ASC(@"SEL$1" "E"@"SEL$1" ("EMPLOYEES"."EMPLOYEE_ID"))
      LEADING(@"SEL$1" "D"@"SEL$1" "E"@"SEL$1")
      USE_NL(@"SEL$1" "E"@"SEL$1")
      END_OUTLINE_DATA
  */

Peeked Binds (identified by position):
--------------------------------------
   1 - (NUMBER): 50
   2 - (NUMBER): 120
   3 - (VARCHAR2(30), CSID=178): 'Matthew'

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("D"."DEPARTMENT_ID"=:DEPT_ID)
   4 - filter(("FIRST_NAME"=:FN AND "E"."DEPARTMENT_ID"=:DEPT_ID))
   5 - access("E"."EMPLOYEE_ID"=:EMP_ID)

Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - "D"."DEPARTMENT_NAME"[VARCHAR2,30], "FIRST_NAME"[VARCHAR2,20],
       "E"."LAST_NAME"[VARCHAR2,25]
   2 - "D"."DEPARTMENT_NAME"[VARCHAR2,30]
   3 - "D".ROWID[ROWID,10]
   4 - "FIRST_NAME"[VARCHAR2,20], "E"."LAST_NAME"[VARCHAR2,25]
   5 - "E".ROWID[ROWID,10]

68 rows selected.
The output of DBMS_XPLAN provides information on estimated (E-Rows) versus actual rows (A-Rows), estimated (E-Time) versus actual time (A-Time), peeked binds, predicates, and column projection. Except for actual rows and actual time, this information is absent from SQL trace files. If an inaccurate estimate leads to a slow execution plan, it may be corrected using a SQL profile, given that a license for both the diagnostics and the tuning pack has been obtained. Correcting a false assumption by the CBO may lead to a faster execution plan.
CLOSE Entry Format
As the name suggests, the CLOSE entry indicates that a cursor is closed. This type of entry was
introduced with release 11.1.0.7 (Oracle 11g Release 1 patch set 1). The minimum trace level for
enabling CLOSE entries is 1. Following is an example that shows how cursor 2 is opened by a
parse call, an update is executed, the execution plan is emitted, and the cursor is closed:
PARSING IN CURSOR #2 len=42 dep=0 uid=32 oct=6 lid=32 tim=1177102683619 hv=2139321929 ad='1f4b3f6c' sqlid='12kkd2pzs6xk9'
UPDATE hr.employees SET salary=salary*1.10
END OF STMT
PARSE #2:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=964452392,tim=1177102683619
EXEC #2:c=31250,e=167015,p=0,cr=7,cu=236,mis=0,r=108,dep=0,og=1,plh=964452392,tim=1177102850634
STAT #2 id=1 cnt=0 pid=0 pos=1 obj=0 op='UPDATE EMPLOYEES (cr=7 pr=0 pw=0 time=0 us)'
STAT #2 id=2 cnt=108 pid=1 pos=1 obj=19139 op='TABLE ACCESS FULL EMPLOYEES (cr=7 pr=0 pw=0 time=0 us cost=3 size=428 card=107)'
CLOSE #2:c=0,e=0,dep=0,type=0,tim=1177105944410
Table 24-5 explains the parameters associated with the CLOSE entry. A cursor may be hard
closed (type=0) for several reasons, including the following:
• The cursor is associated with a DDL statement. DDL statements are not eligible for
caching.
• The server-side cursor cache is disabled, i.e., the initialization parameter SESSION_CACHED_CURSORS has the value zero.
• The SQL statement is eligible for caching and the server-side cursor cache is enabled,
but the statement was executed less than three times.
Table 24-5. CLOSE Parameters

Parameter   Meaning
c           CPU time
e           Elapsed time
dep         Recursive call depth
type        Type of close operation:
            0: hard close; the cursor is not put into the server-side cursor cache
            1: the cursor is cached in a previously empty slot of the server-side cursor cache, since it executed at least 3 times
            2: the cursor is placed in a slot of the server-side cursor cache at the expense of aging out another cursor, since it executed at least 3 times
            3: the cursor remains in the server-side cursor cache
tim         Timestamp
The CLOSE entry is suitable for calculating the effectiveness of the server-side cursor cache. Oracle11g is the first release that enables the server-side cursor cache with a default size of 50 slots. In all prior releases the server-side cursor cache was disabled by default. The paper Designing Applications for Performance and Scalability ([Engs 2005]), released by Oracle Corporation in 2005, states that the scalability of applications that repeatedly soft-parse the same SQL statements does improve with a sufficiently large value of SESSION_CACHED_CURSORS. Applications that avoid repeated soft-parses, e.g., by enabling a client-side cursor cache, are even more scalable. The method for enabling a client-side cursor cache depends on the Oracle API used. The Oracle JDBC drivers include support for client-side cursor caching through the methods setImplicitCachingEnabled and setStatementCacheSize of the class OracleConnection.
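The server-side cursor cache may also be gauged by relating the session statistic session cursor cache hits to the total number of parse calls, e.g., for one's own session:

SQL> SELECT n.name, s.value
     FROM v$statname n, v$mystat s
     WHERE n.statistic#=s.statistic#
     AND n.name IN ('session cursor cache hits', 'parse count (total)');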
COMMIT and ROLLBACK
The ORACLE DBMS does not require clients to explicitly start a transaction. The DBMS automatically opens a transaction as soon as the first data item is modified or a distributed operation,
such as a SELECT from a table via a database link, is performed. The latter operation takes a TX
and a DX enqueue, which are released when COMMIT is issued. Transaction boundaries in a trace
file are marked by XCTEND entries. The minimum SQL trace level for enabling XCTEND entries is 1.
Their format is as follows:
XCTEND rlbk=[0-1], rd_only=[0-1]
Parameters used in XCTEND entries are explained in Table 24-6.
Table 24-6. XCTEND Parameters

Parameter   Meaning
rlbk        Short for rollback; rlbk=0: COMMIT, rlbk=1: ROLLBACK
rd_only     Read only transaction; rd_only=0: read/write operations have occurred, rd_only=1: read only; no data changed
The two parameters of the XCTEND entry allow four combinations. These are summarized in
Table 24-7. Note that rd_only=1 does not imply that the session previously issued the statement
SET TRANSACTION READ ONLY.
Table 24-7. XCTEND Parameter Combinations

XCTEND Parameters    Client Operation             Example
rlbk=0, rd_only=1    COMMIT, no data modified     COMMIT after SELECT from remote table to release locks
rlbk=0, rd_only=0    COMMIT, data modified        COMMIT after INSERT/UPDATE/DELETE of one or more rows
rlbk=1, rd_only=1    ROLLBACK, no data modified   ROLLBACK after SELECT from local tables, e.g., to increment the snapshot SCN after SET TRANSACTION READ ONLY
rlbk=1, rd_only=0    ROLLBACK, data modified      ROLLBACK after changing data, e.g., through a MERGE statement
UNMAP
When interrupting ongoing disk sorts with Ctrl+C in Oracle10g, UNMAP entries are written to the trace file irrespective of the setting of the parameter WORKAREA_SIZE_POLICY. These are apparently related to cleanup of the sort segment observed in V$SORT_USAGE. In all my testing with Oracle9i and Oracle10g, I have never seen an UNMAP entry when a SELECT statement that triggered a disk sort ran to completion. My conclusion is that UNMAP entries are not part of normal operations and do not merit further investigation. Here's an example:
FETCH #3:c=1972837,e=30633987,p=4725,cr=4405,cu=25,mis=0,r=1,dep=0,og=1,tim=185499193369
FETCH #3:c=0,e=3113,p=0,cr=0,cu=0,mis=0,r=15,dep=0,og=1,tim=185499197879
…
UNMAP #3:c=0,e=136661,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=0,tim=185552120691
UNMAP parameters are the same as those of PARSE entries.
Execution Plans, Statistics, and the STAT Entry Format
Execution plans and statistics are reported by STAT entries. Each line of a group of STAT entries
represents a row source that contributed to the result of a statement. The term row source
designates either data retrieved from a table or index or an intermediate result within a larger
execution plan. Since no more than two tables can be joined simultaneously, a three-way join
includes a row source that is the intermediate result of joining the other two tables. The minimum
SQL trace level for enabling STAT entries is 1. STAT entry parameters are summarized in Table 24-8.
Table 24-8. STAT Parameters

Parameter   Meaning
id          Identifier that denotes the order of row sources in the execution plan; normally id=1 on the first STAT line of an execution plan.
cnt         Number of rows processed.
pid         Parent identifier; normally pid=0 on the first STAT line of an execution plan. TKPROF and ESQLTRCPROF use id and pid to generate properly indented execution plans with dependent steps of a plan indented by one more level than their parent steps.
pos         Position of a step within the parent step.
obj         Object identifier; corresponds to ALL_OBJECTS.OBJECT_ID and V$SQL_PLAN.OBJECT#.
op          Row source operation performed, such as table access, index scan, sort, union, and so on; corresponds to V$SQL_PLAN.OPERATION. In Oracle10g, op contains actual statement execution metrics in parentheses after the row source information.
STAT Entry Format in Oracle9i
The only difference between Oracle9i and Oracle10g STAT entries is the amount of information
conveyed with the parameter op. The inclusion of actual execution metrics in op is not implemented in Oracle9i. Following is an example of a hash join, which is the parent row source of
two full table scans:
STAT #3 id=1 cnt=106 pid=0 pos=1 obj=0 op='SORT ORDER BY '
STAT #3 id=2 cnt=106 pid=1 pos=1 obj=0 op='HASH JOIN '
STAT #3 id=3 cnt=27 pid=2 pos=1 obj=6764 op='TABLE ACCESS FULL DEPARTMENTS '
STAT #3 id=4 cnt=107 pid=2 pos=2 obj=6769 op='TABLE ACCESS FULL EMPLOYEES '
The preceding STAT entries would be formatted by TKPROF as follows:
Rows     Row Source Operation
-------  -------------------------------
    106  SORT ORDER BY
    106   HASH JOIN
     27    TABLE ACCESS FULL DEPARTMENTS
    107    TABLE ACCESS FULL EMPLOYEES
STAT Entry Format in Oracle10g and Oracle11g
In Oracle10g and Oracle11g, as opposed to Oracle9i, STAT entries are only written when TIMED_STATISTICS=TRUE in addition to a SQL trace level of at least 1. Note that setting STATISTICS_LEVEL=BASIC (default is TYPICAL) in Oracle10g and Oracle11g implicitly sets TIMED_STATISTICS=FALSE. This behavior may be overridden by explicitly setting TIMED_STATISTICS=TRUE. Except for some additional statistics in parentheses at the end of the operation (bold in the following code example), STAT entries in Oracle9i and subsequent releases have the same format:
STAT #4 id=3 cnt=107 pid=2 pos=1 obj=16496 op='TABLE ACCESS FULL EMPLOYEES (cr=7 pr=0 pw=0 time=725 us)'
The additional information, which is useful for identifying expensive row sources in the
execution plan, depends on the DBMS release and is summarized in Table 24-9.
Table 24-9. STAT Execution Statistics

Parameter   Meaning
cr          Consistent reads
pr          Physical reads
pw          Physical writes
time        Estimated elapsed time in microseconds
cost        Cost of the execution plan calculated by the CBO (requires Oracle11g)
size        Estimated data volume in bytes (requires Oracle11g); the estimate is based on object statistics (DBA_TABLES, etc.); information from the segment header is used if object statistics are not available
card        Estimated cardinality, i.e., number of rows processed (requires Oracle11g); the estimate is based on object statistics
Oracle11g STAT entries have the most verbose format.
The following example (trace file excerpt) shows how the execution plan of a two-way join
is represented by STAT entries in an Oracle10g trace file:
PARSING IN CURSOR #4 len=140 dep=0 uid=5 oct=3 lid=5 tim=105385553438 hv=782962817 ad='670e3cf4'
SELECT e.last_name, e.first_name, d.department_name
FROM hr.employees e, hr.departments d
WHERE e.department_id=d.department_id
ORDER BY 1, 2
END OF STMT
…
STAT #4 id=1 cnt=106 pid=0 pos=1 obj=0 op='SORT ORDER BY (cr=115 pr=0 pw=0 time=5720 us)'
STAT #4 id=2 cnt=106 pid=1 pos=1 obj=0 op='NESTED LOOPS (cr=115 pr=0 pw=0 time=7302 us)'
STAT #4 id=3 cnt=107 pid=2 pos=1 obj=16496 op='TABLE ACCESS FULL EMPLOYEES (cr=7 pr=0 pw=0 time=725 us)'
STAT #4 id=4 cnt=106 pid=2 pos=2 obj=16491 op='TABLE ACCESS BY INDEX ROWID DEPARTMENTS (cr=108 pr=0 pw=0 time=4336 us)'
STAT #4 id=5 cnt=106 pid=4 pos=1 obj=16492 op='INDEX UNIQUE SCAN DEPT_ID_PK (cr=2 pr=0 pw=0 time=1687 us)'
When formatting execution plans, TKPROF preserves the additional information provided
by the parameter op.
Rows     Row Source Operation
-------  ---------------------------------------------------
    106  NESTED LOOPS (cr=223 pr=0 pw=0 time=166 us)
    107   TABLE ACCESS FULL EMPLOYEES (cr=11 pr=0 pw=0 time=1568 us)
    106   TABLE ACCESS BY INDEX ROWID DEPARTMENTS (cr=212 pr=0 pw=0 time=4747 us)
    106    INDEX UNIQUE SCAN DEPT_ID_PK (cr=106 pr=0 pw=0 time=2151 us)(object id 51906)
This trace file excerpt shows the more verbose format of Oracle11g:
STAT #4 id=1 cnt=106 pid=0 pos=1 obj=0 op='MERGE JOIN (cr=19 pr=0 pw=0 time=0 us cost=6 size=3604 card=106)'
STAT #4 id=2 cnt=27 pid=1 pos=1 obj=16967 op='TABLE ACCESS BY INDEX ROWID DEPARTMENTS (cr=12 pr=0 pw=0 time=26 us cost=2 size=432 card=27)'
STAT #4 id=3 cnt=27 pid=2 pos=1 obj=16968 op='INDEX FULL SCAN DEPT_ID_PK (cr=6 pr=0 pw=0 time=11 us cost=1 size=0 card=27)'
STAT #4 id=4 cnt=106 pid=1 pos=2 obj=0 op='SORT JOIN (cr=7 pr=0 pw=0 time=24 us cost=4 size=1926 card=107)'
STAT #4 id=5 cnt=108 pid=4 pos=1 obj=16970 op='TABLE ACCESS FULL EMPLOYEES (cr=7 pr=0 pw=0 time=4 us cost=3 size=1926 card=107)'
Wait Events
Capturing wait events is essential for solving performance problems due to waiting for resources
such as disk access, latches, locks, or inter-instance communication in RAC. Oracle10g is the
first release that has a documented interface for tracing wait events—the PL/SQL package
DBMS_MONITOR.
WAIT Entry Format
The minimum SQL trace level for enabling WAIT entries is 8. Never use a SQL trace level below
8 when investigating performance problems. Otherwise the contribution of wait events to
response time will be unknown, thus preventing diagnosis of performance problems where
waiting and not CPU consumption is the most important contribution to response time.
Each wait event is associated with up to three parameters that provide more detail on the
event. Many wait events are documented in the Oracle Database Reference (e.g., Appendix C of
Oracle Database Reference 10g Release 2). The full listing of wait events and their parameters is
available by querying the view V$EVENT NAME like this:
SQL> SELECT name, parameter1, parameter2, parameter3 FROM v$event_name ORDER BY 1;
Some example rows from the result of the query are shown here:
NAME                            PARAMETER1      PARAMETER2         PARAMETER3
------------------------------- --------------- ------------------ ---------------
SQL*Net message from client     driver id       #bytes
SQL*Net message from dblink     driver id       #bytes
SQL*Net message to client       driver id       #bytes
SQL*Net message to dblink       driver id       #bytes
db file parallel write          requests        interrupt          timeout
db file scattered read          file#           block#             blocks
db file sequential read         file#           block#             blocks
direct path read                file number     first dba          block cnt
direct path write               file number     first dba          block cnt
enq: MR - contention            name|mode       0 or file #        type
enq: ST - contention            name|mode       0                  0
enq: TM - contention            name|mode       object #           table/partition
enq: TX - allocate ITL entry    name|mode       usn<<16 | slot     sequence
enq: TX - contention            name|mode       usn<<16 | slot     sequence
enq: TX - index contention      name|mode       usn<<16 | slot     sequence
enq: TX - row lock contention   name|mode       usn<<16 | slot     sequence
enq: UL - contention            name|mode       id                 0
enq: US - contention            name|mode       undo segment #     0
latch free                      address         number             tries
latch: gcs resource hash        address         number             tries
latch: ges resource hash list   address         number             tries
latch: library cache            address         number             tries
latch: library cache lock       address         number             tries
latch: library cache pin        address         number             tries
latch: redo allocation          address         number             tries
latch: redo copy                address         number             tries
latch: shared pool              address         number             tries
Parameters of WAIT entries relate to data dictionary or V$ views. For example, the parameters "file#" and "file number" both correspond to V$DATAFILE.FILE#, the parameter "object#" refers to DBA_OBJECTS.OBJECT_ID, and the parameter "address" of latch waits corresponds to V$LATCH.ADDRESS. Since latch and enqueue wait event names are more specific in Oracle10g, the need to drill down to the referenced enqueue or latch through the parameters of a wait event has been greatly reduced compared to Oracle9i.
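For instance, the segment referenced by a db file sequential or scattered read may be identified by matching the file# and block# parameters against DBA_EXTENTS; the values below are taken from the Oracle10g example in the next section (note that this query may be expensive on large databases):

SQL> SELECT owner, segment_name, segment_type
     FROM dba_extents
     WHERE file_id=4
     AND 253 BETWEEN block_id AND block_id + blocks - 1;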
WAIT in Oracle9i
WAIT entry parameters in Oracle9i are called p1, p2, and p3 generically. The meanings of the parameters are not evident from their names as in Oracle10g. Instead, the meanings of the parameters must be derived from the wait event name using the documentation or V$EVENT_NAME. The parameters of Oracle9i WAIT entries are provided in Table 24-10.

Table 24-10. Oracle9i WAIT Parameters

Parameter   Meaning
ela         Elapsed time in microseconds (centiseconds in de-supported releases such as Oracle8i and earlier).
p1          First parameter of the wait event. The meaning of the parameter is to be found in V$EVENT_NAME.PARAMETER1. The value corresponds to V$SESSION_WAIT.P1.
p2          Second parameter of the wait event, analogous to p1, i.e., look at V$EVENT_NAME.PARAMETER2 for the meaning of the parameter.
p3          Third parameter of the wait event, analogous to p1.
Following is an example WAIT entry due to a single block disk read associated with cursor 4:
WAIT #4: nam='db file sequential read' ela= 19045 p1=1 p2=19477 p3=1
In the preceding wait event, p1 is the data file number (V$DATAFILE.FILE#), p2 is the block
within the data file, and p3 specifies the number of blocks read. As you will see shortly, wait
entries in Oracle9i trace files lack the timestamp (tim) that Oracle10g wait entries have.
WAIT in Oracle10g and Oracle11g
Oracle10g has 872 wait events (Oracle11g has 959). This is partly due to the fact that the single
wait event enqueue in Oracle9i has been replaced by 208 individual wait events for each kind of
enqueue. Similarly, Oracle10g has 27 wait events for latches whereas Oracle9i has only two.
Oracle10g and Oracle11g wait entries have meaningful parameter names instead of the
generic p1, p2, and p3 parameters in Oracle9i. A timestamp has also been added. Here’s
an example:
WAIT #3: nam='db file scattered read' ela= 22652 file#=4 block#=253 blocks=4
obj#=14996 tim=81592211996
A total of 276 different wait event parameters exist in Oracle10g (293 in Oracle11g). Of
these, a significant portion has different spelling, but identical meaning, such as “retry count”
and “retry_count”. Another example is the parameters “obj#”, “object #”, and “object_id”. The
parameters may be retrieved by running the following query:
SQL> SELECT PARAMETER1
     FROM v$event_name
     UNION
     SELECT PARAMETER2
     FROM v$event_name
     UNION
     SELECT PARAMETER3
     FROM v$event_name
     ORDER BY 1;
Bind Variables
To obtain the maximum amount of diagnostic data, tracing of bind variables should be enabled.
Details on bind variables include the data type and value of a bind variable. Without this
information, it is impossible to find out whether an index was not used due to a mismatch
between the data type of an indexed column and the bind variable data type, for example an
indexed DATE column and a bind variable with type TIMESTAMP. A bind data type mismatch may
also cause increased CPU usage due to conversion from one data type to another. Bind variable
values may be used to tune queries under the exact same conditions that they were captured.
Without bind variable values, example values for query tuning must be retrieved from the
tables. This is a very tedious and time-consuming process. Better to accept the additional file
size and higher measurement intrusion incurred with bind variable tracing than waste time on
finding sample values.
CHAPTER 24 ■ EXTENDED SQL TRACE FILE FORMAT REFERENCE
BINDS Entry Format
The minimum SQL trace level for enabling BINDS entries is 4. The structure of a BINDS entry
consists of the word BINDS followed by the cursor number and a separate subsection (bind n
or BIND #n) for each bind variable.
BINDS #m:
<subsection
<details of
…
<subsection
<details of
0>
bind variable 0>
n>
bind variable n>
Bind variables are numbered from left to right within the statement text starting at zero.
When associating bind variables with subsections, do not pay attention to numerals, which
may be included in the name of a bind variable (e.g., :B1). As you will see in the examples that
follow, the bind variable name :B<n+1> may appear before :Bn, where n is an integer. This is
indeed confusing. The correct association is formed by reading the statement text from left to
right and top to bottom. Subsection 0 provides details on the first bind variable thus encountered,
subsection 1 on the second, and so on.
Bind in Oracle9i
The following anonymous PL/SQL block will serve as an example for examining BINDS entries:
DECLARE
sal number(8,2):=4999.99;
dname varchar2(64):='Shipping';
hired date:=to date('31.12.1995','dd.mm.yyyy');
BEGIN
FOR emp rec IN ( SELECT e.last name, e.first name, e.salary, d.department name
FROM hr.employees e, hr.departments d
WHERE e.department id=d.department id
AND e.salary > sal
AND e.hire date > hired
AND d.department name=dname) LOOP
null;
END LOOP;
END;
/
Tracing this block at SQL trace level 4 or 12 yields the trace file entries including a BINDS
section as shown here:
PARSING IN CURSOR #2 len=209 dep=1 uid=0 oct=3 lid=0 tim=118435024029 hv=3341549851
ad='19cf2638'
SELECT e.last name, e.first name, e.salary, d.department name
FROM hr.employees e, hr.departments d
WHERE e.department id=d.department id
AND e.salary > :b3
AND e.hire date > :b2
291
292
CHAPTER 24 ■ EXTENDED SQL TRACE FILE FORMAT REFERENCE
AND d.department name=:b1
END OF STMT
PARSE #2:c=15625,e=75118,p=2,cr=150,cu=0,mis=1,r=0,dep=1,og=0,tim=118435024022
…
BINDS #2:
bind 0: dty=2 mxl=22(21) mal=00 scl=00 pre=00 oacflg=03 oacfl2=1 size=64 offset=0
bfp=05babee8 bln=22 avl=04 flg=05
value=4999.99
bind 1: dty=12 mxl=07(07) mal=00 scl=00 pre=00 oacflg=03 oacfl2=1 size=0 offset=24
bfp=05babf00 bln=07 avl=07 flg=01
value="12/31/1995 0:0:0"
bind 2: dty=1 mxl=32(08) mal=00 scl=00 pre=00 oacflg=03 oacfl2=1 size=0 offset=32
bfp=05babf08 bln=32 avl=08 flg=01
value="Shipping"
Bind variable values are printed as strings instead of an internal representation. Thus, it is
easy to reproduce a traced statement by using the captured bind variable values. Subsection 0
(bind 0) is associated with bind variable :b3. Subsection 1 (bind 1) is associated with bind variable :b2. Oracle9i uses 14 parameters to convey detailed information on bind variables. A detailed
explanation of the parameters is in Table 24-11.
Table 24-11. Oracle9i BIND Parameters
Parameter
Meaning
dty
Data type code
mxl
Maximum length of the bind variable value (private maximum length in parentheses)
mal
Array length
scl
Scale
pre
Precision
oacflg
Special flag indicating bind options
oacflg2
Second part of oacflg
size
Amount of memory to be allocated for this chunk
offset
Offset into this chunk for this bind buffer
bfp
Bind address
bln
Bind buffer length
avl
Actual value length
flg
Bind status flag
value
Value of the bind variable
The Oracle Database SQL Reference manual contains a table for translating the numeric
data type codes to the corresponding data type name. The table is in the “Oracle Built-In
CHAPTER 24 ■ EXTENDED SQL TRACE FILE FORMAT REFERENCE
Datatypes” section in Chapter 2 of the manual and lists 18 different type codes for the 21 data
types implemented in Oracle9i (the national character set types NCHAR, NVARCHAR2 and NCLOB use
the same type code as CHAR, VARCHAR2, and CLOB respectively). The most common data type
codes and names are summarized in Table 24-12.
Table 24-12. Data Type Codes vs. Names
Data Type Code
Data Type Name
1
VARCHAR2, NVARCHAR2
2
NUMBER
12
DATE
96
CHAR, NCHAR
112
CLOB, NCLOB
113
BLOB
180
TIMESTAMP
The built-in SQL function DUMP(expr [, format [, position [, length]]]) may be used to
display the data type code of a column along with the internal representation of the column’s
value. The first argument to DUMP is the format for displaying the internal representation. The
formats supported are octal (format=8), decimal (default or format=10), hexadecimal (format=16),
and individual single-byte characters (format=17). Information on the character set is included,
if 1000 is added to one of the three aforementioned format settings (format=format+1000). When
position and length are omitted, the entire internal representation is returned. Otherwise, merely
the portion starting at offset position having length bytes is considered. Here’s an example:
SQL> SELECT dump(employee id, 16, 1, 1) empid dmp,
dump(last name, 1017,1,2) name dmp, dump(hire date, 10, 1,
FROM hr.employees
WHERE rownum=1;
EMPID DMP
NAME DMP
--------------- -----------------------------------------Typ=2 Len=3: c2 Typ=1 Len=8 CharacterSet=WE8MSWIN1252: O,C
1) date dmp
DATE DMP
----------------Typ=12 Len=7: 119
As one might expect, the values of Typ in the example correspond with the data type codes
in Table 24-12.
Bind in Oracle10g and Oracle11g
Oracle10g and Oracle11g Bind#n sections are different from an Oracle9i Bind#n section. Oracle10g
and Oracle11g use 17 parameters to convey detailed information on bind variables. There is
currently no information on three of the new parameters. Some of the parameters that were
also present in Oracle9i have been renamed. Taking this into account, Table 24-13 presents the
available information.
293
294
CHAPTER 24 ■ EXTENDED SQL TRACE FILE FORMAT REFERENCE
Table 24-13. Oracle10g and Oracle11g BINDS Parameters
Parameter
Meaning
oacdty
Data type code
mxl
Maximum length of the bind variable value (private maximum length in parentheses)
mxlc
Unknown
mal
Array length
scl
Scale
pre
Precision
oacflg
Special flag indicating bind options
fl2
Second part of oacflg
frm
Unknown
csi
Character set identifier of the database character set or national character set
(see Table 24-14)
siz
Amount of memory to be allocated for this chunk
off
Offset into the chunk of the bind buffer
kxsbbbfp
Bind address
bln
Bind buffer length
avl
Actual value length
flg
Bind status flag
value
Value of the bind variable
Table 24-14 shows some common values of the character set identifier (csi) parameter.
Note that the national character set is always a Unicode character set since Oracle9i.
Table 24-14. Common Character Sets and Their Identifiers
CSI
Character Set Name
Description
1
US7ASCII
ASCII 7-bit American
31
WE8ISO8859P1
ISO 8859-1 8-bit West European
46
WE8ISO8859P15
ISO 8859-15 8-bit West European
170
EE8MSWIN1250
MS Windows Code Page 1250 8-bit East European
178
WE8MSWIN1252
MS Windows Code Page 1252 8-bit West European
871
UTF8
Unicode 3.0 UTF-8 Universal character set
873
AL32UTF8
Unicode 4.0 UTF-8 Universal character set
2000
AL16UTF16
Unicode 4.0 UTF-16 Universal character set
CHAPTER 24 ■ EXTENDED SQL TRACE FILE FORMAT REFERENCE
Following is an example of an Oracle10g BINDS section. It is the result of tracing the same
SQL statement that was used in the previous example for Oracle9i.
PARSING IN CURSOR #1 len=205 dep=1 uid=67 oct=3 lid=67 tim=8546016035 hv=3746754718
ad='2548145c'
SELECT E.LAST NAME, E.FIRST NAME, E.SALARY, D.DEPARTMENT NAME
FROM HR.EMPLOYEES E, HR.DEPARTMENTS D
WHERE E.DEPARTMENT ID=D.DEPARTMENT ID
AND E.SALARY > :B3
AND E.HIRE DATE > :B2
AND D.DEPARTMENT NAME=:B1
END OF STMT
PARSE #1:c=0,e=96,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,tim=8546016029
BINDS #1:
kkscoacd
Bind#0
oacdty=02 mxl=22(21) mxlc=00 mal=00 scl=00 pre=00
oacflg=03 fl2=1206001 frm=00 csi=00 siz=160 off=0
kxsbbbfp=07d9e508 bln=22 avl=04 flg=05
value=4999.99
Bind#1
oacdty=12 mxl=07(07) mxlc=00 mal=00 scl=00 pre=00
oacflg=03 fl2=1206001 frm=00 csi=00 siz=0 off=24
kxsbbbfp=07d9e520 bln=07 avl=07 flg=01
value="12/31/1995 0:0:0"
Bind#2
oacdty=01 mxl=128(64) mxlc=00 mal=00 scl=00 pre=00
oacflg=03 fl2=1206001 frm=01 csi=178 siz=0 off=32
kxsbbbfp=07d9e528 bln=128 avl=08 flg=01
value="Shipping"
Oracle10g introduced two additional data types. These are BINARY FLOAT and BINARY DOUBLE.
Their data type codes (oacdty) are 21 and 22 respectively.
Statement Tuning, Execution Plans, and Bind Variables
When tuning a statement that was captured with SQL trace and includes bind variables, do not
replace bind variables with literals. In doing so, the optimizer might make different decisions.
Instead, when tuning the statement in SQL*Plus, declare SQL*Plus bind variables matching the
data types of the bind variables in the trace file (parameter dty or oacdty). Since SQL*Plus variables do not support all data types, for example DATE and TIMESTAMP are not provided, you may
have to resort to an anonymous PL/SQL block to replicate the data types exactly. The use of
conversion functions, such as TO DATE or TO TIMESTAMP with a SQL*Plus variable of type VARCHAR2
when the original data type is not available, is another option, but this too might affect the plan
chosen by the optimizer. Even if you do reproduce the bind data types exactly, you may still get
a plan that is different from a previous execution, since the previous execution may have reused a
plan that was built based on different peeked bind variable values. If that plan has meanwhile
been aged out of the shared pool, you may get a different plan based on peeking the current
295
296
CHAPTER 24 ■ EXTENDED SQL TRACE FILE FORMAT REFERENCE
bind variable values. Thus, it’s a good idea to capture plans with AWR snapshots or level 6
Statspack snapshots.
The SQL statement EXPLAIN PLAN may be used to generate an execution plan for SQL statements that include bind variables. However, EXPLAIN PLAN knows nothing about the bind variable
data types and values. You need not even declare SQL*Plus variables to successfully use EXPLAIN
PLAN on a statement that includes bind variables.
EXPLAIN PLAN is notoriously unreliable. It should never be used for statement tuning, since
it regularly reports execution plans that are different from the actual plan used when the statement is executed. The Oracle Database Performance Tuning Guide 10g Release 2 has this to say
about EXPLAIN PLAN (page 19-4):
Oracle does not support EXPLAIN PLAN for statements performing implicit type
conversion of date bind variables. With bind variables in general, the EXPLAIN PLAN
output might not represent the real execution plan.
SQL trace files, V$SQL PLAN, V$SQL PLAN STATISTICS ALL, AWR, and the Statspack Repository
table STATS$SQL PLAN, which contains snapshots of V$SQL PLAN, are reliable sources for execution plans. Be warned that V$SQL PLAN may hold several plans for the same HASH VALUE or
SQL ID and there is no easy way of figuring out which plan was used. To avoid this pitfall, it is
possible to run tail -f on the trace file to display the tail of the file as it grows. The unformatted
execution plan in STAT entries is rather awkward to read. The Oracle10g pipelined table function
DBMS XPLAN.DISPLAY CURSOR has the most sophisticated solution to date. Its syntax is as follows:
dbms xplan.display cursor(
sql id IN VARCHAR2 DEFAULT NULL,
child number IN NUMBER DEFAULT NULL,
format IN VARCHAR2 DEFAULT 'TYPICAL')
RETURN dbms xplan type table PIPELINED;
When DBMS XPLAN.DISPLAY CURSOR is called without passing a sql id and child number,
the plan of the last cursor executed by the session is displayed, making the function an ideal
replacement for SQL*Plus AUTOTRACE. The format argument allows precise control of the output.
The most verbose output pertaining merely to the last execution of a statement is obtained by
using the format 'ALL ALLSTATS LAST'. To collect memory and I/O statistics, STATISTICS LEVEL=ALL
must be set at session level. The columns “A-Rows” (actual rows), “A-Time” (actual time),
“Buffers”, and “Reads”, which are present in the example that follows, are missing at the default
STATISTICS LEVEL=TYPICAL. The subsequent example illustrates several points:
• Flushing the buffer cache to provoke disk reads, which affects the values reported in
column “Buffers” of the execution plan generated with DBMS XPLAN.DISPLAY CURSOR.
• The use of SQL*Plus variables as bind variables in SQL statements.
• The impact of the optimizer environment, specifically the parameter OPTIMIZER INDEX
COST ADJ on execution plans.
• How to retrieve filter and access predicates along with an execution plan. Columns in a
WHERE-clause are called predicates.
• How to retrieve query block names, which may be useful for tuning with optimizer hints.
CHAPTER 24 ■ EXTENDED SQL TRACE FILE FORMAT REFERENCE
The example consists of running a two-way join on the tables EMPLOYEES and DEPARTMENTS
with varying optimizer parameter settings. The first iteration runs in the default optimizer
environment and results in a nested loops join, a full scan on DEPARTMENTS, and an index access
to EMPLOYEES. The smaller table DEPARTMENTS is used as the driving table in the join.
For the second iteration, OPTIMIZER INDEX COST ADJ was set to its maximum value of 10000,
provoking full table scans by tagging index accesses with a higher cost. This latter setting results
in a hash join and full table scans on both tables. Let’s walk through the example step by step.
First, the buffer cache is flushed3 and STATISTICS LEVEL is changed.
SQL> ALTER SYSTEM FLUSH BUFFER CACHE;
System altered.
SQL> ALTER SESSION SET statistics level=all;
Session altered.
Next, we declare three SQL*Plus bind variables to mimic the bind variables :B1, :B2, and
:B3 that we saw in the SQL trace file reproduced in the previous section and assign the bind
variable values reported in the trace file. Since SQL*Plus does not support variables with data
type DATE, VARCHAR2 is used for the variable HIRED and the function TO DATE is applied to convert
from VARCHAR2 to DATE.
SQL> VARIABLE sal NUMBER
SQL> VARIABLE hired VARCHAR2(10)
SQL> VARIABLE dname VARCHAR2(64)
SQL> EXEC :sal :=4999.99; :hired:='31.12.1995'; :dname:='Shipping'
PL/SQL procedure successfully completed.
Except for more readable bind variable names than :Bn and the addition of TO DATE, the
statement text found in the trace file is used.
SQL> SELECT e.last name, e.first name, e.salary, d.department name
FROM hr.employees e, hr.departments d
WHERE e.department id=d.department id
AND e.salary > :sal
AND e.hire date > to date(:hired, 'dd.mm.yy')
AND d.department name=:dname;
LAST NAME
FIRST NAME
SALARY DEPARTMENT NAME
------------------------- -------------------- ---------- --------------Weiss
Matthew
8000 Shipping
Fripp
Adam
8200 Shipping
Vollman
Shanta
6500 Shipping
Mourgos
Kevin
5800 Shipping
Next, DBMS XPLAN is called to retrieve the execution plan of the last statement executed. I
have split the execution plan output in two parts to make it more readable. The uppercase “E”
in the columns “E-Time”, “E-Bytes”, and “E- Rows” is short for estimated. Note that this first
iteration reports on child cursor number 0.
3. Flushing the buffer cache requires at least release Oracle10g.
297
298
CHAPTER 24 ■ EXTENDED SQL TRACE FILE FORMAT REFERENCE
SQL> SELECT *
FROM table (DBMS XPLAN.DISPLAY CURSOR(null, null, 'ALL ALLSTATS LAST'));
PLAN TABLE OUTPUT
------------------------------------SQL ID b70r97ta66g1j, child number 0
------------------------------------SELECT e.last name, e.first name, e.salary, d.department name
FROM hr.employees e, hr.departments d
WHERE e.department id=d.department id
AND e.salary > :sal
AND e.hire date > to date(:hired, 'dd.mm.yy')
AND d.department name=:dname
Plan hash value: 2912831499
------------------------------------------------------------------------| Id | Operation
| Name
| Starts | E-Rows |
------------------------------------------------------------------------|* 1 | TABLE ACCESS BY INDEX ROWID| EMPLOYEES
|
1 |
3 |
| 2 | NESTED LOOPS
|
|
1 |
3 |
|* 3 |
TABLE ACCESS FULL
| DEPARTMENTS
|
1 |
1 |
|* 4 |
INDEX RANGE SCAN
| EMP DEPARTMENT IX |
1 |
10 |
------------------------------------------------------------------------------------------------------------------------------------------------|E-Bytes| Cost (%CPU)| E-Time
| A-Rows | A-Time | Buffers | Reads |
------------------------------------------------------------------------|
90 |
1
(0)| 00:00:01 |
4 |00:00:00.03 |
13 |
15 |
| 138 |
4
(0)| 00:00:01 |
47 |00:00:00.02 |
10 |
7 |
|
16 |
3
(0)| 00:00:01 |
1 |00:00:00.01 |
8 |
6 |
|
|
0
(0)|
|
45 |00:00:00.01 |
2 |
1 |
------------------------------------------------------------------------Query Block Name / Object Alias (identified by operation id):
------------------------------------------------------------1 - SEL$1 / E@SEL$1
3 - SEL$1 / D@SEL$1
4 - SEL$1 / E@SEL$1
Predicate Information (identified by operation id):
--------------------------------------------------1 - filter(("E"."SALARY">:SAL AND "E"."HIRE DATE">TO DATE(:HIRED,'dd.mm.yy')))
3 - filter("D"."DEPARTMENT NAME"=:DNAME)
4 - access("E"."DEPARTMENT ID"="D"."DEPARTMENT ID")
Column Projection Information (identified by operation id):
----------------------------------------------------------1 - "E"."FIRST NAME"[VARCHAR2,20], "E"."LAST NAME"[VARCHAR2,25], "E"."SALARY"[NUMBER
,22]
2 - "D"."DEPARTMENT NAME"[VARCHAR2,30], "E".ROWID[ROWID,10]
3 - "D"."DEPARTMENT ID"[NUMBER,22], "D"."DEPARTMENT NAME"[VARCHAR2,30]
4 - "E".ROWID[ROWID,10]
CHAPTER 24 ■ EXTENDED SQL TRACE FILE FORMAT REFERENCE
Projection information lists the subset of columns accessed by the query for each step in
the execution plan. The word projection is an academic term from relational algebra. For the
second iteration, let’s change the optimizer environment and see how this affects the execution plan.
SQL> ALTER SESSION SET optimizer index cost adj=10000;
Session altered.
SQL> SELECT e.last name, e.first name, e.salary, d.department name
FROM hr.employees e, hr.departments d
WHERE e.department id=d.department id
AND e.salary > :sal
AND e.hire date > to date(:hired, 'dd.mm.yy')
AND d.department name=:dname;
LAST NAME
FIRST NAME
SALARY DEPARTMENT NAME
------------------------- -------------------- ---------- --------------Weiss
Matthew
8000 Shipping
Fripp
Adam
8200 Shipping
Vollman
Shanta
6500 Shipping
Mourgos
Kevin
5800 Shipping
Note that the child cursor number reported in the next code example is 1. This is due to the
different plan, which results from a changed optimizer environment. The plan hash value has
also changed, while the SQL ID has remained the same. It is undocumented which parameter
changes force the optimizer to consider a new plan. When tuning a statement, it is wise to add
a unique comment to the statement text before each execution. This forces a hard parse and
ensures that the optimizer considers all aspects of the current environment, which may include
updated object and system statistics as well as modified initialization parameters.
Due to caching, a single disk read occurred when the statement was run the second time
(column “Reads”). This time, I have split the execution plan into three parts for better readability,
since memory usage statistics from the hash join made it even wider.
SQL> SELECT *
FROM table (DBMS XPLAN.DISPLAY CURSOR(null, null, 'ALL ALLSTATS LAST'));
PLAN TABLE OUTPUT
------------------------------------SQL ID b70r97ta66g1j, child number 1
------------------------------------SELECT e.last name, e.first name, e.salary, d.department name
FROM hr.employees e, hr.departments d
WHERE e.department id=d.department id
AND e.salary > :sal
AND e.hire date > to date(:hired, 'dd.mm.yy')
AND d.department name=:dname
Plan hash value: 2052257371
299
300
CHAPTER 24 ■ EXTENDED SQL TRACE FILE FORMAT REFERENCE
-----------------------------------------------------------------| Id | Operation
| Name
| Starts | E-Rows |E-Bytes|
-----------------------------------------------------------------|* 1 | HASH JOIN
|
|
1 |
3 |
138 |
|* 2 | TABLE ACCESS FULL| DEPARTMENTS |
1 |
1 |
16 |
|* 3 | TABLE ACCESS FULL| EMPLOYEES
|
1 |
31 |
930 |
---------------------------------------------------------------------------------------------------------------| Cost (%CPU)| E-Time | A-Rows | A-Time
|
----------------------------------------------|
7
(15)| 00:00:01 |
4 |00:00:00.01 |
|
3
(0)| 00:00:01 |
1 |00:00:00.01 |
|
3
(0)| 00:00:01 |
45 |00:00:00.01 |
-----------------------------------------------------------------------------------------| Buffers | Reads | OMem | 1Mem | Used-Mem |
-------------------------------------------|
12 |
1 | 887K| 887K| 267K (0) |
|
7 |
0 |
|
|
|
|
5 |
1 |
|
|
|
-------------------------------------------Query Block Name / Object Alias (identified by operation id):
------------------------------------------------------------1 - SEL$1
2 - SEL$1 / D@SEL$1
3 - SEL$1 / E@SEL$1
Predicate Information (identified by operation id):
--------------------------------------------------1 - access("E"."DEPARTMENT ID"="D"."DEPARTMENT ID")
2 - filter("D"."DEPARTMENT NAME"=:DNAME)
3 - filter(("E"."SALARY">:SAL AND "E"."HIRE DATE">TO DATE(:HIRED,'dd.mm.yy')))
Column Projection Information (identified by operation id):
----------------------------------------------------------1 - (#keys=1) "D"."DEPARTMENT NAME"[VARCHAR2,30], "E"."FIRST NAME"[VARCHAR2,20], "E"
."LAST NAME"[VARCHAR2,25], "E"."SALARY"[NUMBER,22]
2 - "D"."DEPARTMENT ID"[NUMBER,22], "D"."DEPARTMENT NAME"[VARCHAR2,30]
3 - "E"."FIRST NAME"[VARCHAR2,20], "E"."LAST NAME"[VARCHAR2,25], "E"."SALARY"[NUMBER
,22], "E"."DEPARTMENT ID"[NUMBER,22]
Tracing the SQL*Plus session at level 4 or 12 proves that the data types of the bind variables
(parameter oacdty) match the data types in the original trace file with the exception of the bind
variable with data type DATE (oacdty=12), which cannot be reproduced with a SQL*Plus variable.
Following is the relevant section of an Oracle10g trace file:
BINDS #3:
kkscoacd
Bind#0
oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
CHAPTER 24 ■ EXTENDED SQL TRACE FILE FORMAT REFERENCE
oacflg=03 fl2=1000000 frm=00 csi=00 siz=184 off=0
kxsbbbfp=04d99cb0 bln=22 avl=04 flg=05
value=4999.99
Bind#1
oacdty=01 mxl=32(10) mxlc=00 mal=00 scl=00 pre=00
oacflg=03 fl2=1000000 frm=01 csi=178 siz=0 off=24
kxsbbbfp=04d99cc8 bln=32 avl=10 flg=01
value="31.12.1995"
Bind#2
oacdty=01 mxl=128(64) mxlc=00 mal=00 scl=00 pre=00
oacflg=03 fl2=1000000 frm=01 csi=178 siz=0 off=56
kxsbbbfp=04d99ce8 bln=128 avl=08 flg=01
value="Shipping"
The example has pointed out how to leverage information from BINDS entries in trace files
by reproducing the statement as closely as possible in respect to the data types of bind variables
and their values. Such precise reproduction of traced statements is the optimal starting point
for tuning.
To use the DISPLAY CURSOR functionality, the calling user must have SELECT privilege on
V$SQL, V$SQL PLAN, and V$SQL PLAN STATISTICS ALL. The role SELECT CATALOG ROLE may be
granted to ensure these privileges are available. The previous example showed that calling
DBMS XPLAN.DISPLAY CURSOR without arguments retrieves the plan of the previous statement
from V$SQL PLAN, even when several execution plans for a single statement (i.e., several child
cursors) exist. This functionality cannot be replicated in Oracle9i, since the column V$SESSION.
PREV CHILD NUMBER is not available.4
Even in Oracle10g, SQL*Plus AUTOTRACE uses EXPLAIN PLAN, such that it suffers from the
same deficiencies of EXPLAIN PLAN mentioned earlier. This also applies to the TKPROF option
EXPLAIN=user/password, which also runs EXPLAIN PLAN, this time even in a different database
session from the one that generated the trace file supplied to TKPROF, such that the chances of
getting incorrect results are even greater. Of course, you should never use this TKPROF switch,
but instead allow TKPROF to format the STAT lines in the trace file. In case a cursor was not
closed while tracing was active, there won’t be any STAT lines for that particular cursor in the trace
file. Under such circumstances, you need to query V$SQL PLAN (using DBMS XPLAN in Oracle10g),
which succeeds only if the statement in question is still cached. If the statement in question is
no longer cached, access the Statspack repository using the script $ORACLE HOME/rdbms/admin/
sprepsql.sql or the AWR using DBMS XPLAN.DISPLAY AWR.
Miscellaneous Trace File Entries
The miscellaneous category consists among others of entries that document which session,
module, or action generated trace file entries. Some of these entries are written automatically,
while others require application coding.
4. In Oracle9i, the underlying X$ fixed table X$KSUSE does not hold the child cursor number of the
previous statement either.
301
302
CHAPTER 24 ■ EXTENDED SQL TRACE FILE FORMAT REFERENCE
Session Identification
Session identification is always emitted to SQL trace files. In contrast to module or action identification, it does not require any application coding. The format of the session identification is
identical in Oracle9i and Oracle10g, such that the Oracle10g TRCSESS utility can be used to
extract trace information for a single session from multiple Oracle9i or Oracle10g shared server
trace files.
Irrespective of the server model used (dedicated or shared) each session is uniquely identified by a combination of two figures during the lifetime of the instance. That is, the same
combination of figures will not be reused, unless the DBMS instance is shut down and restarted.
Contrast this with V$SESSION.AUDSID, which is derived from the sequence SYS.AUDSES$, used for
auditing purposes, and available by calling USERENV('SESSIONID'). Speaking of USERENV, Oracle10g
and subsequent releases finally provide access to V$SESSION.SID to non-privileged users through
the undocumented option SID of the function USERENV as in USERENV('SID').5 Non-privileged in
this context means that access to V$ views has not been granted, for example through SELECT
CATALOG ROLE.
As an aside, the most appropriate general way to figure out the SID in Oracle9i is to run SELECT
sid FROM v$mystat WHERE ROWNUM=1. Here, general means that this works for non-privileged sessions
as well as sessions with SYSDBA and SYSOPER privileges. The query SELECT sid FROM v$session
WHERE audsid = userenv ('sessionid') is inappropriate for getting the SID of privileged sessions,
since these are not assigned a unique auditing session identifier (V$SESSION.AUDSID). Privileged
sessions have AUDSID=0 in Oracle9i and AUDSID=4294967295 in Oracle10g. This fact is undocumented in the Database Reference (Oracle9i and Oracle10g).
The first figure of the session identification is the session identifier, while the second is the
session serial number. The former is found in V$SESSION.SID, whereas the latter is accessible
through V$SESSION.SERIAL# and is incremented each time the SID is reused. The format of the
entry is shown here:
*** SESSION ID:(sid.serial#) YYYY-MM-DD HH24:MI:SS.FF3
The timestamp at the end of the line is depicted using ORACLE date format models (see
Oracle Database SQL Reference 10g Release 2, page 2-58). FF3 represents three fractional
seconds. An actual entry written on February 6th, 2007 on behalf of a session with V$SESSION.
SID=147 and V$SESSION.SERIAL#=40 is shown here:
*** SESSION ID:(147.40) 2007-02-06 15:53:20.844
Service Name Identification
Service name identification is always emitted to Oracle10g and Oracle11g SQL trace files. Oracle9i
does not have this feature. Service names in trace files refer to instance service names. Do not
confound these with Oracle Net service names defined in tnsnames.ora or a directory service
(see “Instance Service Name vs. Net Service Name” in this book’s Introduction for disambiguation). The service name of a session established using the bequeath adapter—such sessions
are always running on the same machine as the DBMS instance, do not go through the listener,
and require setting ORACLE SID to the same value the instance was started with—is “SYS$USERS”.
5. The documented alternative is SELECT sys context('USERENV', 'SID') FROM dual.
CHAPTER 24 ■ EXTENDED SQL TRACE FILE FORMAT REFERENCE
Such sessions do not provide an Oracle Net service name in the connect string. The connect
string merely contains username and password such as in this example:
$ sqlplus ndebes/secret
Sessions established through a listener by specifying an Oracle Net service name in the
connect string can have any service name known to the listener, provided that the Net service
name definition contains the specification SERVICE NAME=instance_service_name instead of
SID=oracle_sid. The format of the entry is as follows:
*** SERVICE NAME:(instance service name) YYYY-MM-DD HH24:MI:SS.FF3
Instance_service_name can be any of “SYS$USERS”, “SYS$BACKGROUND” (trace is from a
background process), or any instance service name known to a listener serving that particular
instance. Here are some examples:
*** SERVICE NAME:(SYS$USERS) 2007-06-12 08:43:24.241
*** SERVICE NAME:(TEN.world) 2007-06-13 17:38:55.289
For mandatory background processes of the instance, such as CKPT (checkpointer), SMON
(system monitor), and LGWR (log writer) the service name is an empty string such as in this
example:
$ grep "SERVICE NAME" ten lgwr 3072.trc
*** SERVICE NAME:() 2007-06-13 17:37:04.830
When creating services with the packaged procedure DBMS SERVICE.CREATE SERVICE, the
value of the parameter network name (not service name) is registered as an instance service
name with the listener (requires at least Oracle10g). Thus, the network name needs to be used in
Net service name definitions in tnsnames.ora and will appear in the SERVICE NAME entry. The
same applies to RAC Cluster Database Services created with the DBCA or srvctl, since these
are based on the functionality of the package DBMS SERVICE.
Application Instrumentation
The term application instrumentation refers to a programming technique, whereby a program
is capable of producing an account of its own execution time. The ORACLE DBMS is heavily
instrumented (wait events, counters). However, this instrumentation may be leveraged to a
greater degree when a database client informs the DBMS of the tasks (module and action) it is
performing. This section discusses trace file entries that are related to application instrumentation. The format of these entries has changed considerably from Oracle9i to Oracle10g, so the
material is presented by release. The minimum SQL trace level for enabling entries discussed
in this section is 1.
Application Instrumentation Entries in Oracle10g and Oracle11g
Table 24-15 lists the instrumentation entries of Oracle10g and Oracle11g in alphabetical order
along with the PL/SQL and OCI interfaces to generate them. Note that Oracle JDBC drivers
have Java instrumentation interfaces which are more efficient than calling PL/SQL from Java
(see Chapter 23). At the lowest level, application instrumentation is achieved with the Oracle
Call Interface (OCI) function OCIAttrSet (see Oracle Call Interface Programmer’s Guide).
303
304
CHAPTER 24 ■ EXTENDED SQL TRACE FILE FORMAT REFERENCE
Table 24-15. PL/SQL and OCI Interfaces for Instrumentation Entries
Trace File Entry
PL/SQL Interface
OCIAttrSet Attribute
ACTION NAME
DBMS_APPLICATION_INFO.SET_MODULE
DBMS_APPLICATION_INFO.SET_ACTION
OCI_ATTR_ACTION
CLIENT ID
DBMS_SESSION.SET_IDENTIFIER
OCI_ATTR_CLIENT_IDENTIFIER6
MODULE NAME
DBMS_APPLICATION_INFO.SET_MODULE
OCI_ATTR_MODULE
When running code such as the following in SQL*Plus, all three types of instrumentation
entries are written to a trace file.
C:> sqlplus ndebes/secret@ten g.oradbpro.com
Connected.
SQL> BEGIN
dbms application info.set module('mod', 'act');
dbms session.set identifier(sys context('userenv','os user') ||
'@' || sys context('userenv','host') || ' (' ||
sys context('userenv','ip address') || ')' );
END;
/
PL/SQL procedure successfully completed.
SQL> ALTER SESSION SET sql trace=TRUE;
Session altered.
The resulting trace file contains lines such as these:
*** ACTION NAME:(act) 2007-08-31 18:02:26.578
*** MODULE NAME:(mod) 2007-08-31 18:02:26.578
*** SERVICE NAME:(orcl.oradbpro.com) 2007-08-31 18:02:26.578
*** CLIENT ID:(DBSERVER\ndebes@WORKGROUP\DBSERVER (192.168.10.1)) 2007-08-31 18:02:2
6.578
*** SESSION ID:(149.21) 2007-08-31 18:02:26.578
The value orcl.oradbpro.com of SERVICE NAME stems from the use of this string as the
SERVICE NAME in the definition of the Net service name ten g.oradbpro.com.
These are the kinds of trace file entries that the Oracle10g TRCSESS utility searches for
when used to extract relevant sections from one or more trace files. The sections that follow
provide additional detail on the individual entries. For detailed information on TRCSESS, see
Chapter 23.
6. DBMS APPLICATION INFO.SET CLIENT INFO and the OCI attribute OCI ATTR CLIENT INFO set V$SESSION.
CLIENT INFO. This setting is not emitted to trace files and cannot be used in conjunction with the
package DBMS MONITOR.
CHAPTER 24 ■ EXTENDED SQL TRACE FILE FORMAT REFERENCE
Module Name
The module name is intended to convey the name of an application or larger module to the
DBMS. The default setting is NULL. SQL*Plus and Perl DBI automatically set a module name.
This example is from a SQL*Plus session:
*** MODULE NAME:(SQL*Plus) 2007-02-06 15:53:20.844
Action Name
An action name represents a smaller unit of code or a subroutine. A module might call several
subroutines, where each subroutine sets a different action name. The default setting NULL results in
a zero length action name.
*** ACTION NAME:() 2007-02-06 15:53:20.844
Client Identifier
Performance problems or hanging issues in three-tier environments, where the application
uses a database connection pool maintained by an intermediate application server layer, can
be extremely cumbersome to track down. Due to the connection pool in the middle tier,
performance analysts looking at V$ views or extended SQL trace files cannot form an association between an application user reporting a slow database and the database session or
server process within the ORACLE instance serving a particular user. There is no way to find
out which SQL statements are run on behalf of the complaining end user—unless the application is properly instrumented, which is something that has eluded me in my career as a DBA.
The client identifier is the answer to this dilemma. The package DBMS SESSION provides a
means for an application to communicate an identifier that uniquely designates an application
user to the DBMS. This identifier becomes the value of the column V$SESSION.CLIENT
IDENTIFIER. If SQL trace is enabled, this same identifier is also embedded in the SQL trace file.
The format is as follows:
*** CLIENT ID:(client identifier) YYYY-MM-DD HH24:MI:SS.FF3
Client_identifier is the client identifier set by calling the procedure DBMS SESSION.SET
IDENTIFIER from the application code. To extract trace information for a certain client identifier
from one or more SQL trace files, trcsess clientid=client_identifier can be used. The following
line shows an actual entry from a trace file. The client identifier used was ND. The entry was
written on February 6th, 2007.
*** CLIENT ID:(ND) 2007-02-06 15:53:20.844
The maximum length of a client identifier is 64 bytes. Strings exceeding this length are
silently truncated. When instrumenting applications with DBMS SESSION, consider that the
procedure DBMS SESSION.CLEAR IDENTIFIER does not write a CLIENT ID entry into the trace file,
leaving the client identifier in effect until it is changed with DBMS SESSION.SET IDENTIFIER.
When connection pooling is used, this may result in trace files where sections pertaining to
different client identifiers are not delineated. The solution consists of setting an empty client
identifier by passing NULL to the packaged procedure DBMS SESSION.SET IDENTIFIER instead of
calling the procedure DBMS SESSION.CLEAR IDENTIFIER.
305
306
CHAPTER 24 ■ EXTENDED SQL TRACE FILE FORMAT REFERENCE
Application Instrumentation Entries in Oracle9i
Running the same instrumentation code as in the section “Application Instrumentation Entries
in Oracle10g and Oracle11g” against Oracle9i results in far fewer entries being written to the
trace file. The instance service name is neither available in V$SESSION nor in the trace file. An
Oracle9i trace file contains lines such as these:
*** SESSION ID:(10.697) 2007-08-31 18:19:10.000
APPNAME mod='mod' mh=781691722 act='act' ah=3947624709
Obviously, the format for module and action is different in Oracle9i. Module and action
are always logged on a single line after the keyword APPNAME, even when just the action is set
with DBMS APPLICATION INFO.SET ACTION. Abbreviations used in the APPNAME entry are explained in
Table 24-16.
Table 24-16. Oracle9i APPNAME Parameters
Parameter
Meaning
mod
Module; corresponds to V$SESSION.MODULE
mh
Module hash value; corresponds to V$SESSION.MODULE HASH
act
Action; corresponds to V$SESSION.ACTION
ah
Action hash value; corresponds to V$SESSION.ACTION HASH
In Oracle9i, the client identifier is not written to the trace file. It is merely set in
V$SESSION.CLIENT IDENTIFIER.7
SQL> SELECT client identifier FROM v$session WHERE sid=10;
CLIENT IDENTIFIER
---------------------------------------------------------ndebes@WORKGROUP\DBSERVER
ERROR Entry Format
Errors during execution of SQL statements are marked by ERROR entries. These may occur for
statements that were parsed successfully, but failed to execute successfully. The minimum SQL
trace level for ERROR entries is 1. Here is an example:
PARSING IN CURSOR #6 len=94 dep=0 uid=30 oct=2 lid=30 tim=171868250869 hv=3526281696
ad='6778c420'
INSERT INTO poem (author, text) VALUES(:author, empty clob())
RETURNING ROWID INTO :row id
END OF STMT
PARSE #6:c=0,e=150,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=171868250857
EXEC #6:c=10014,e=52827,p=0,cr=2,cu=5,mis=0,r=0,dep=0,og=1,tim=171868303859
ERROR #6:err=372 tim=17186458
7. Strangely, the IP address was silently truncated from the client identifier by the Oracle 9.2.0.1.0
instance used.
CHAPTER 24 ■ EXTENDED SQL TRACE FILE FORMAT REFERENCE
The figure after the pound sign (#) is the cursor number of the failed statement. The reason
for the failing INSERT statement in the preceding example was “ORA-00372: file 4 cannot be
modified at this time”. The tablespace containing the table POEM had status read only. The
parameters of the ERROR entry are in Table 24-17.
Table 24-17. ERROR Parameters
Parameter
Meaning
err
Error number
tim
Timestamp in centiseconds (appears to be 0 at all times in Oracle9i, but has a
meaningful value in Oracle10g)
Sometimes applications do not report ORACLE error numbers or error messages. Under
such circumstances, it is very worthwhile to do a level 1 SQL trace, to discover which error the
DBMS throws. It is even possible to find out when an error occurred by looking at timestamps
and tim values in the trace file. Near the header of each trace file is a timestamp such as this:
*** SESSION ID:(150.524) 2007-06-22 15:42:41.018
After periods of inactivity, the DBMS automatically writes additional timestamps with the
following format into the trace file:
*** 2007-06-22 15:42:51.924
Intermittent timestamps can be forced by calling DBMS SYSTEM.KSDDT. Running the following
script will generate two errors which are 10 seconds apart, due to a call to DBMS LOCK.SLEEP:
SQL>
SQL>
SQL>
SQL>
SQL>
ALTER SESSION SET sql trace=TRUE;
ALTER TABLESPACE users READ WRITE /* is read write, will fail */;
EXEC dbms lock.sleep(10) /* sleep for 10 seconds */
EXEC dbms system.ksddt /* write timestamp to trace file */
ALTER TABLESPACE users READ WRITE /* is read write, will fail */;
The resulting Oracle10g trace file is shown here (excerpted):
1 *** SESSION ID:(150.524) 2007-06-22 15:42:41.018
2 PARSING IN CURSOR #1 len=55 dep=1 uid=0 oct=3 lid=0 tim=176338392169 hv=1950821498
ad='6784bfac'
…
3 EXEC #1:c=20028,e=208172,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=176339220462
4 ERROR #1:err=1646 tim=17633550
…
5 PARSING IN CURSOR #1 len=33 dep=0 uid=0 oct=47 lid=0 tim=176339264800
hv=2252395675
ad='67505814'
6 BEGIN dbms lock.sleep(10); END;
7 END OF STMT
8 PARSE #1:c=0,e=140,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=176339264790
307
308
CHAPTER 24 ■ EXTENDED SQL TRACE FILE FORMAT REFERENCE
9 *** 2007-06-22 15:42:51.924
10 EXEC #1:c=0,e=9997915,p=0,cr=0,cu=0,mis=0,r=1,dep=0,og=1,tim=176349294618
…
11 EXEC #1:c=70101,e=74082,p=0,cr=2,cu=0,mis=0,r=0,dep=0,og=1,tim=176349582606
12 ERROR #1:err=1646 tim=17634587
The first tim value after the timestamp in the header will serve as a reference. In the example,
“2007-06-22 15:42:41.018” in line 1 is approximately the same as 176338392169 microseconds
in line 2. The nearest tim value above the ERROR entry in line 4 is 176339220462 microseconds in
line 3. So between lines 2 and 3, (176339220462-176338392169) / 1000000 or 0.82 seconds have
passed. The nearest tim value above the second ERROR entry is 176349582606 in line 11. Between
line 3 and line 11, (176349582606-176339220462) / 1000000 or 10.36 seconds have passed. This
is reasonable, since the session was put to sleep for 10 seconds. The intermittent timestamp in
line 9 forced with DBMS SYSTEM.KSDDT confirms this. Timestamp (tim) values of ERROR entries only
have centisecond resolution. This is apparent when subtracting 17633550 (ERROR entry in line 4)
from 17634587 (ERROR entry in line 12), which yields 10.37 seconds ((17634587-17633550) / 100).
Using the same approach, we can compute that the second error in line 12 occurred
(176349582606-176338392169) / 1000000 or 11.190 seconds after 15:42:41.018. So the second
error occurred at 15:42:52.208, which is about ten seconds after the trace file header was
written and about 0.3 seconds after the timestamp written by DBMS SYSTEM.KSDDT. Of course,
this approach works for any trace file entry that has a tim field and not just ERROR entries. Why
there is an offset—about 3.7 seconds in my testing—between centisecond resolution tim
values in ERROR entries and microsecond resolution tim values in other entries is a question for
Oracle development.
I remember an assignment due to an escalated service request. It was believed that the
DBMS was erroneously throwing “ORA-06502: PL/SQL: numeric or value error”. By taking a
level 1 trace and looking for ERROR entries, I was able to prove that there were no errors in the
communication between the DBMS instance and the client. It turned out that it was an application coding error with a buffer that was too small. The ORA-06502 was thrown by the PL/SQL
engine of the Oracle Forms runtime environment, not the PL/SQL engine within the ORACLE
DBMS. Thus, there was no ERROR entry in the SQL trace file.
Application Instrumentation and Parallel Execution Processes
Among the three kinds of application instrumentation entries, solely the client identifier is
emitted to trace files from parallel execution processes. These trace files are created in the
directory assigned with the parameter BACKGROUND DUMP DEST and follow the naming scheme
ORACLE_SID pnnn spid.trc in Oracle10g, where n is a digit between 0 and 9 and spid corresponds to V$PROCESS.SPID.
The term query coordinator refers to a process that controls parallel execution slaves. The
slaves perform the actual work. The query coordinator is the same process that serves the database client. Since resource consumption, such as CPU time and I/O requests by parallel execution
slaves, is not rolled up into the statistics reported for the query coordinator, the client identifier
may be used in conjunction with the TRCSESS utility to get a full account of the resources
consumed by the query coordinator and the parallel execution processes it recruited.
The subsequent example shows how to use the TRCSESS utility to combine the trace file
from the query coordinator and four trace files from parallel execution processes into a single
trace file for formatting with TKPROF. A user with SELECT ANY DICTIONARY privilege might use
CHAPTER 24 ■ EXTENDED SQL TRACE FILE FORMAT REFERENCE
the table SYS.SOURCE$ for testing parallel query. Both a FULL and a PARALLEL hint are necessary
to scan the table in parallel. Event 10046 is used to enable tracing, since the procedure DBMS
MONITOR.SESSION TRACE ENABLE has no effect on parallel execution processes. Embedding the
auditing session identifier in the client identifier guarantees a unique client identifier.
SQL> ALTER SESSION SET EVENTS '10046 trace name context forever, level 8';
Session altered.
SQL> VARIABLE client identifier VARCHAR2(64)
SQL> EXEC :client identifier:='pqtest ' || userenv('sessionid')
PL/SQL procedure successfully completed.
SQL> PRINT client identifier
CLIENT IDENTIFIER
----------pqtest 1894
SQL> EXEC dbms session.set identifier(:client identifier)
PL/SQL procedure successfully completed.
SQL> SELECT /*+ FULL(s) PARALLEL (s ,4) */ count(*) FROM sys.source$ s;
COUNT(*)
---------298767
SQL> SELECT statistic, last query
FROM v$pq sesstat
WHERE statistic='Queries Parallelized';
STATISTIC
LAST QUERY
------------------------------ ---------Queries Parallelized
1
SQL> EXEC dbms session.set identifier(NULL)
PL/SQL procedure successfully completed.
The query on V$PQ SESSTAT confirms that the SELECT statement ran in parallel. At this
point, the client identifier has been emitted to four trace files from parallel execution processes.
C:\oracle\admin\orcl\bdump> grep pqtest 1894 *.trc
orcl p000 5412.trc:*** CLIENT ID:(pqtest 1894) 2007-08-31
orcl p001 2932.trc:*** CLIENT ID:(pqtest 1894) 2007-08-31
orcl p002 4972.trc:*** CLIENT ID:(pqtest 1894) 2007-08-31
orcl p003 1368.trc:*** CLIENT ID:(pqtest 1894) 2007-08-31
23:14:38.421
23:14:38.421
23:14:38.421
23:14:38.421
TRCSESS is used to combine the trace files of the four parallel execution processes and the
trace file of the query coordinator into a single trace file called pqtest 1894.trc. The client
identifier is passed to TRCSESS with the option clientid. Since TRCSESS supports wildcards
and scans all files that match, it is sufficient to pass *.trc for the local directory bdump and
..\udump\*.trc for the directory where the query coordinator trace file resides as the input file
specification.
C:\oracle\admin\orcl\bdump> trcsess output=pqtest 1894.trc clientid=pqtest 1894 *.tr
c ..\udump\*.trc
309
310
CHAPTER 24 ■ EXTENDED SQL TRACE FILE FORMAT REFERENCE
The output file is a trace file that contains the database calls and wait events of all processes
involved in the parallel execution. The combined trace file may be processed with TKPROF.
C:\oracle\admin\orcl\bdump> tkprof pqtest 1894.trc pqtest 1894.tkp
Whereas the trace file of the query coordinator reports zero disk reads as well as zero
consistent reads,8 the combined formatted trace file gives an accurate account of the resource
consumption.
OVERALL TOTALS FOR ALL NON-RECURSIVE STATEMENTS
call
count
cpu
elapsed
disk
query current rows
------- ------ -------- ---------- -------- -------- -------- -----Parse
7
0.00
0.00
0
0
0
0
Execute
7
0.18
7.05
5209
5368
0
1
Fetch
4
0.01
1.87
0
0
0
2
------- ------ -------- ---------- -------- -------- -------- -----total
18
0.20
8.93
5209
5368
0
3
Disk reads are due to direct path reads performed by the parallel execution processes,
which do not use the buffer cache in the SGA. Since parallel execution circumvents the buffer
cache, it benefits from caching at the file system level by the operating system.
8. FETCH #2:c=15625,e=1875092,p=0,cr=0,cu=0,mis=0,r=1,dep=0,og=1,tim=108052435005
CHAPTER 25
■■■
Statspack
S
tatspack is a PL/SQL package that persistently stores snapshot-based performance data
pertaining to a single ORACLE DBMS instance or a set of RAC instances. It is documented in
the Oracle9i Database Performance Tuning Guide and Reference manual as well as in the file
$ORACLE HOME/rdbms/admin/spdoc.txt on both UNIX and Windows.1 The latter document is
significantly more detailed than the Performance Tuning Guide and Reference. Due to the
introduction of the Active Workload Repository in Oracle10g, Statspack documentation has
been removed from the Oracle Database Performance Tuning Guide 10g Release 2 manual.
This chapter covers advanced aspects of Statspack usage, such as undocumented report
parameters and how to relate SQL statements identified by hash value found in SQL trace files
to information in Statspack reports and the Statspack repository. Furthermore it presents the
mostly undocumented repository structure and explains how to find used indexes as well as
current and past execution plans for statements in SQL trace files. It also looks at how to identify periods of high resource consumption among a large amount of Statspack snapshots, by
using the analytic function LAG.
Introduction to Statspack
Introductory documentation on Statspack is in the Oracle9i Database Performance Tuning
Guide and Reference Release 2. The chapter on Statspack has been removed from Oracle10g
documentation. Oracle Database Performance Tuning Guide 10g Release 2 merely states that
Statspack has been replaced by the Automatic Workload Repository, which is not really true,
since Statspack is still available in Oracle10g as well as Oracle11g.
I will not provide a thorough introduction to Statspack here, but for the novice Statspack
user some minimal instructions on how to get started with the package are in order. These are
reproduced in Table 25-1. The default file name extension .sql for SQL*Plus scripts is omitted
in the table.
The installation of Statspack into the schema PERFSTAT must be performed as SYS. All other
actions except truncation of Statspack tables may be run by any user with DBA privileges.
Statspack is implemented by a number of scripts named sp*.sql in $ORACLE HOME/rdbms/admin.
1. %ORACLE HOME%\rdbms\admin\spdoc.txt in Windows syntax.
311
312
CHAPTER 25 ■ STATSPACK
$ ls sp*.sql
spauto.sql
spcpkg.sql
spcreate.sql
spctab.sql
spcusr.sql
spdrop.sql
spdtab.sql
spdusr.sql
sppurge.sql
sprepcon.sql
sprepins.sql
spreport.sql
sprepsql.sql
sprsqins.sql
sptrunc.sql
spup10.sql
spup816.sql
spup817.sql
spup90.sql
spup92.sql
Note that setting ORACLE HOME as part of the SQL script directory search path with environment variable SQLPATH removes the necessity to supply the full path name of Statspack SQL
scripts and any other scripts in the directories thereby specified. On UNIX use a colon (:) as a
separator between multiple directories.
$ export SQLPATH=$ORACLE HOME/rdbms/admin:$HOME/it/sql
On Windows, use a semicolon (;).
C:> set SQLPATH=%ORACLE HOME%\rdbms\admin;C:\home\ndebes\it\sql
Table 25-1. Statspack Quick Reference
Action
Command to Enter in SQL*Plus
Run as User
Installation
SQL> @spcreate
SYS
Manual snapshot of performance
data with optional session-level
snapshot and comment
SQL> EXEC statspack.snap
(i snap level=>snapshot_level[,
i session id=>sid_from_v$session ][,
i ucomment=>'comment' ])2
DBA
Automatic snapshots every hour
on the hour taken by job queue
processes (DBMS JOB)
SQL> @spauto
DBA
Reporting
SQL> @spreport
DBA
Purge obsolete Statspack data
by snapshot ID range to prevent
the default tablespace of user
PERFSTAT from overflowing3
SQL> @sppurge
DBA
Truncate tables containing snapshot data
SQL> @sptrunc
PERFSTAT
Deinstallation
SQL> @spdrop
DBA
2. Snapshot_level is an integer in the range 1..10; sid_from_v$session is V$SESSION.SID of a session for
which CPU consumption, wait events, and session statistics are captured; comment is a comment,
which will be reproduced along with the snapshot ID, time, and level when spreport.sql is run. Statspack
releases including Oracle10g Release 2 have a software defect concerning the generation of the sessionspecific report. Wait events that occurred solely in the end snapshot, but not in the begin snapshot, are
omitted from the report due to a missing outer join. I reported this issue, which is tracked by bug 5145816.
The bug is fixed in Oracle11g. There is no backport of the fix to earlier releases, but you can use my fixes
in the source code depot.
3. In Oracle10g, the purge functionality is also available as part of the Statspack package (STATSPACK.PURGE). The
source code depot contains a backport of the procedure to Oracle9i (see Table 25-4).
CHAPTER 25 ■ STATSPACK
I urge you to capture at least one snapshot per hour. When no performance snapshots are
captured on a regular basis, you will be at a loss when database users call and state they had a
performance problem at some point in the recent past. You won’t be able to answer the request
to figure out why, except by shrugging your shoulders. With historical performance data, you
ask at what time it happened, generate the Statspack reports for snapshots taken before and
after that time, possibly drill down by looking at the execution plan of an expensive statement
with script sprepsql.sql (requires snapshot level 6 or higher),4 identify the cause of the problem,
and solve it.
Retrieving the Text of Captured SQL Statements
Statspack is a tremendous improvement over its predecessor bstat/estat. However, it is annoying
that all SQL statements are reproduced with forced line breaks as in V$SQLTEXT.SQL TEXT. Additionally, statements with more than five lines of text are truncated. Following is an excerpt of a
Statspack report that contains a truncated statement:
SQL ordered by CPU DB/Inst: ORCL/orcl Snaps: 2-3
-> Resources reported for PL/SQL code includes the resources used by all SQL
statements called by the code.
-> Total DB CPU (s):
11
-> Captured SQL accounts for
48.4% of Total DB CPU
-> SQL reported below exceeded 1.0% of Total DB CPU
CPU
CPU per
Elapsd
Old
Time (s) Executions Exec (s) %Total Time (s) Buffer Gets Hash Value
-------- ---------- -------- ------ -------- ----------- ---------10.34
11
0.94
12.3
31.65
162,064 1455318379
SELECT emp.last name, emp.first name, j.job title, d.department
name, l.city, l.state province, l.postal code, l.street address,
emp.email, emp.phone number, emp.hire date, emp.salary, mgr.las
t name FROM employees emp, employees mgr, departments d, locatio
ns l, jobs j WHERE emp.manager id=mgr.employee id AND emp.depart
Let’s assume that the statement is consuming too many resources and should be tuned.
But how do you tune a SQL statement when you don’t have the complete text? At the time of
creating the Statspack report, the application may long have terminated. If it has, it is too late
to capture the statement with SQL trace.
4. The job created with the script spauto.sql does not call the STATSPACK package with a specific snapshot level. It uses the default snapshot level in STATS$STATSPACK PARAMETER, which can be modified by
calling the procedure STATSPACK.MODIFY STATSPACK PARAMETER(i snap level=>6). Snapshot level 6
may cause STATSPACK.SNAP to fail, due to an internal error when selecting from V$SQL PLAN. Try flushing
the shared pool with ALTER SYSTEM FLUSH SHARED POOL in case this problem manifests itself. If this does
not prevent an internal error from recurring, reduce the snapshot level to 5 and restart the instance
during the next maintenance window.
313
314
CHAPTER 25 ■ STATSPACK
To show you what the original formatting of the example statement was, I have captured it
with SQL trace.
*** SERVICE NAME:(SYS$USERS) 2007-07-28 13:10:44.703
*** SESSION ID:(158.3407) 2007-07-28 13:10:44.703
…
*** ACTION NAME:(EMPLIST) 2007-07-28 13:10:57.406
*** MODULE NAME:(HR) 2007-07-28 13:10:57.406
…
PARSING IN CURSOR #8 len=416 dep=0 uid=0 oct=3 lid=0 tim=79821013130 hv=3786124882
ad='2d8f5f1c'
SELECT emp.last name, emp.first name, j.job title, d.department name, l.city,
l.state province, l.postal code, l.street address, emp.email,
emp.phone number, emp.hire date, emp.salary, mgr.last name
FROM hr.employees emp, hr.employees mgr, hr.departments d, hr.locations l, hr.jobs j
WHERE emp.manager id=mgr.employee id
AND emp.department id=d.department id
AND d.location id=l.location id
AND emp.job id=j.job id
END OF STMT
PARSE #8:c=46875,e=42575,p=0,cr=4,cu=0,mis=1,r=0,dep=0,og=1,tim=79821013122
The developer nicely formatted the statement, placing each clause of the SELECT statement
on a separate line and indenting the long select list with tab stops. As we just saw, all of this is lost
in the Statspack report. Active Workload Repository (AWR) HTML formatted reports do contain
the complete text of SQL statements (see Figure 25-1). By default, AWR text reports contain up to
four lines of SQL statement text. AWR is bundled with the Database Diagnostics Pack, which
has a price tag of $3,000 per processor or $60 per named user (see http://oraclestore.
oracle.com). Some DBAs mistakenly assume that AWR may be used at no additional charge,
since it is installed and enabled (STATISTICS LEVEL=TYPICAL) by default.
The Database Diagnostics Pack includes AWR, the DBMS_WORKLOAD_REPOSITORY package for
taking AWR snapshots and AWR administration, AWR reports (awrrpt.sql), the DBA_HIST_* and
DBA_ADVISOR_* views, Active Session History (V$ACTIVE_SESSION_HISTORY), and ADDM (Automatic
Database Diagnostic Monitor). Of course, without proper licensing, these features may not be
accessed through Enterprise Manager Grid Control or Database Control either. Speaking of
AWR and Statspack, AWR snapshots cannot include session-specific data in the same way that
Statspack snapshots can (see i_session_id in Figure 25-1). Active Session History samples metrics of
all sessions, whereas a session-level Statspack snapshot saves accurate, non-sampled metrics
of a single session.
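If in doubt whether AWR or ADDM has ever been used on an instance, the dictionary view DBA_FEATURE_USAGE_STATISTICS may help to assess the licensing exposure. The query below is merely a sketch; feature names vary somewhat between releases, so inspect the full list of names before drawing conclusions.

SQL> SELECT name, detected_usages, last_usage_date
     FROM dba_feature_usage_statistics
     WHERE name LIKE '%Workload Repository%' OR name LIKE 'ADDM%';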
Figure 25-1. AWR Report section showing unabridged text of SQL statements
If you have read the Oracle10g Statspack documentation in spdoc.txt, then you might know
that in Oracle10g the file sprepcon.sql (Statspack report configuration) contains a number of
parameters that control the appearance of Statspack reports as well as which statements make
it into the report.5 Here’s the relevant excerpt from spdoc.txt:
SQL section report settings - num_rows_per_hash
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This is the upper limit of the number of rows of SQL Text to print for
each SQL statement appearing in the SQL sections of the report. This
variable applies to each SQL statement (i.e. hash value). The default value
is 4, which means at most 4 lines of the SQL text will be printed for
each SQL statement. To change this value, change the value of the variable
num_rows_per_hash.
e.g.
define num_rows_per_hash = 10;
And this is the relevant excerpt of sprepcon.sql:
-- SQL related report settings
-- Number of Rows of SQL to display in each SQL section of the report
define top_n_sql = 65;
-- Number of rows of SQL Text to print in the SQL sections of the report
-- for each hash value
define num_rows_per_hash = 4;
…
define top_pct_sql = 1.0;
5. In Oracle9i, sprepcon.sql does not exist and num_rows_per_hash is undocumented.
Thus, the solution for getting the full text of the poorly performing SQL statement is to
increase the value of num_rows_per_hash. Since the documentation does not reveal how Statspack
stores captured SQL statements, we simply set num_rows_per_hash to an arbitrarily large value
such as 1000 and run the Statspack report again (script $ORACLE_HOME/rdbms/admin/spreport.sql).
This time, the complete SQL statement text is in the report.
    CPU                  CPU per             Elapsd                     Old
  Time (s)   Executions  Exec (s)  %Total   Time (s)  Buffer Gets  Hash Value
---------- ------------ --------- ------- ---------- ------------ -----------
     10.34           11      0.94    12.3      31.65      162,064  1455318379
SELECT emp.last_name, emp.first_name, j.job_title, d.department
_name, l.city, l.state_province, l.postal_code, l.street_address,
emp.email, emp.phone_number, emp.hire_date, emp.salary, mgr.las
t_name FROM employees emp, employees mgr, departments d, locatio
ns l, jobs j WHERE emp.manager_id=mgr.employee_id AND emp.depart
ment_id=d.department_id AND d.location_id=l.location_id AND emp.
job_id=j.job_id
Unfortunately, the text still does not adhere to correct SQL syntax. The remaining
task consists of copying the statement from the report and editing it, such that line breaks in
the middle of identifiers, SQL reserved words, and literals are removed. This procedure can be
quite annoying, especially for statements that exceed 50 lines or so.
The more elegant approach, which will pay off sooner or later, is to go directly to the Statspack
repository in the PERFSTAT schema and to retrieve the statement from there. Note, however,
that Statspack copies SQL statement texts from V$SQLTEXT instead of from V$SQLTEXT_WITH_NEWLINES.
Only the latter view contains the statement with line breaks and tab stops preserved in the original
positions where the developer placed them. If you’re dealing with a more complex
statement, proper formatting may significantly ease the process of finding out what the statement
does. SQL statement texts retrieved from V$SQL and V$SQLTEXT have been subjected to an
undocumented normalization procedure that removes line breaks as well as tab stops.
In Oracle10g, the column SQL_FULLTEXT was added to V$SQL to provide SQL statements
with intact formatting as a CLOB.
SQL> DESCRIBE v$sql
 Name                            Null?    Type
 ------------------------------- -------- --------------
 SQL_TEXT                                 VARCHAR2(1000)
 SQL_FULLTEXT                             CLOB
 SQL_ID                                   VARCHAR2(13)
 …
When working with Oracle10g, given that the statement text is still cached in the shared pool
within the SGA, the following SQL*Plus script retrieves the statement text with intact formatting:
$ cat sql_fulltext.sql
-- pass old hash value as the single argument to the script
define old_hash_value='&1'
set verify off
set long 100000
set trimout on
set trimspool on
set feedback off
set heading off
set linesize 32767
col sql_fulltext format a32767
spool sp_sqltext_&old_hash_value..lst
SELECT sql_fulltext FROM v$sql WHERE old_hash_value=&old_hash_value;
spool off
exit
Let’s test the script sql_fulltext.sql with the SELECT statement that needs tuning.
$ sqlplus -s system/secret @sql_fulltext.sql 1455318379
SELECT emp.last_name, emp.first_name, j.job_title, d.department_name, l.city,
        l.state_province, l.postal_code, l.street_address, emp.email,
        emp.phone_number, emp.hire_date, emp.salary, mgr.last_name
FROM hr.employees emp, hr.employees mgr, hr.departments d, hr.locations l, hr.jobs j
WHERE emp.manager_id=mgr.employee_id
AND emp.department_id=d.department_id
AND d.location_id=l.location_id
AND emp.job_id=j.job_id
This time, the tab stops in lines 2 and 3 as well as the line breaks in the original positions
have been preserved. Do not despair if the statement has been aged out of the shared
pool or the DBMS instance has been restarted. The next section shows how pipelined table functions,
introduced in PL/SQL with Oracle9i, may be used to solve the problem.
Accessing STATS$SQLTEXT
A closer look at the main Statspack report file sprepins.sql soon reveals that the full statement
text is stored in the table STATS$SQLTEXT, while the measurement data of the statement are in
the table STATS$SQL_SUMMARY. The source code of the package STATSPACK in spcpkg.sql reveals that
STATS$SQLTEXT copies statement texts from V$SQLTEXT. Both V$ views split SQL statements into
several VARCHAR2 pieces with a maximum length of 64 characters.
SQL> DESCRIBE v$sqltext
 Name                            Null?    Type
 ------------------------------- -------- ----------------------------
 ADDRESS                                  RAW(4)
 HASH_VALUE                               NUMBER
 SQL_ID                                   VARCHAR2(13)
 COMMAND_TYPE                             NUMBER
 PIECE                                    NUMBER
 SQL_TEXT                                 VARCHAR2(64)

SQL> DESCRIBE perfstat.stats$sqltext
 Name                            Null?    Type
 ------------------------------- -------- ----------------------------
 OLD_HASH_VALUE                  NOT NULL NUMBER
 TEXT_SUBSET                     NOT NULL VARCHAR2(31)
 PIECE                           NOT NULL NUMBER
 SQL_ID                                   VARCHAR2(13)
 SQL_TEXT                                 VARCHAR2(64)
 ADDRESS                                  RAW(8)
 COMMAND_TYPE                             NUMBER
 LAST_SNAP_ID                             NUMBER
This explains why it is impossible to obtain statement texts with original formatting in
Statspack reports. The table STATS$SQLTEXT may be queried as follows:
SQL> SELECT sql_text
FROM perfstat.stats$sqltext
WHERE old_hash_value=1455318379
ORDER BY piece;
SQL_TEXT
----------------------------------------------------------------
SELECT emp.last_name, emp.first_name, j.job_title, d.department
_name, l.city, l.state_province, l.postal_code, l.street_address,
emp.email, emp.phone_number, emp.hire_date, emp.salary, mgr.las
t_name FROM employees emp, employees mgr, departments d, locatio
ns l, jobs j WHERE emp.manager_id=mgr.employee_id AND emp.depart
ment_id=d.department_id AND d.location_id=l.location_id AND emp.
job_id=j.job_id

7 rows selected.
Due to the maximum piece length, there is no way to remove the forced line break after
64 characters. There is no SQL*Plus formatting option that glues consecutive lines together.
However, with some background in PL/SQL programming, creating the glue to solve the issue
at hand is straightforward.
The algorithm is as follows:
1. Create an abstract data type that holds the hash value of the statement and the statement itself as a CLOB. Remember, a single CLOB can hold at least 2 GB, whereas VARCHAR2 columns are limited to 4000 bytes.
2. Create a pipelined table function that selects rows from STATS$SQLTEXT piece by piece.
3. Append each piece to a temporary CLOB using DBMS_LOB.WRITEAPPEND, that is, glue the pieces together eliminating the forced line breaks.
4. When all pieces for a single SQL or PL/SQL statement have been exhausted, use row pipelining (PIPE ROW (object_type_instance)) to pass an instance of the abstract data type to the caller of the function.
5. Call the pipelined table function from SQL*Plus or any other database client with the TABLE clause of the SELECT statement (SELECT * FROM TABLE(function_name(optional_arguments))).
Pipelined table functions require the keyword PIPELINED after the RETURN clause. This keyword
indicates that the function returns rows iteratively. The return type of the pipelined table function
must be a collection type. This collection type can be declared at the schema level with CREATE TYPE
or inside a package. The function iteratively returns individual elements of the collection type. The
elements of the collection type must be supported SQL data types, such as NUMBER and VARCHAR2.
PL/SQL data types, such as PLS_INTEGER and BOOLEAN, are not supported as collection elements in a
pipelined table function. We will use the following object type for pipelining:
CREATE OR REPLACE TYPE site_sys.sqltext_type AS OBJECT (
  hash_value NUMBER,
  sql_text CLOB
);
/
CREATE OR REPLACE TYPE site_sys.sqltext_type_tab AS TABLE OF sqltext_type;
/
The code of the pipelined table function SP_SQLTEXT is reproduced in the next code example.
Objects are created in schema SITE_SYS, since SYS is reserved for the data dictionary and objects in
schema SYS are not covered by a full export. Some extra work is required to obtain compatibility
of the function with both Oracle9i and Oracle10g. In Oracle10g and subsequent releases, the
column HASH_VALUE in the table STATS$SQLTEXT was renamed to OLD_HASH_VALUE due to a
corresponding rename in the views V$SQL and V$SQLAREA. For Oracle9i a synonym is used, whereas a view
is created to compensate for the renamed column in Oracle10g and subsequent releases. Thus the
code of the function SITE_SYS.SP_SQLTEXT can remain identical for both releases, and dynamic SQL
with DBMS_SQL does not need to be used.
$ cat sp_sqltext.sql
-- run as a DBA user
CREATE USER site_sys IDENTIFIED BY secret PASSWORD EXPIRE ACCOUNT LOCK;
/* note that show errors does not work when creating objects in a foreign schema.
If you get errors either run this script as SITE_SYS after unlocking
the account or access DBA_ERRORS as below:
col text format a66
SELECT line,text from dba_errors where name='SP_SQLTEXT' ORDER BY line;
*/
-- cleanup, e.g. for database upgraded to 10g
begin
  execute immediate 'DROP SYNONYM site_sys.stats$sqltext';
  execute immediate 'DROP VIEW site_sys.stats$sqltext';
exception when others then null;
end;
/
GRANT SELECT ON perfstat.stats$sqltext TO site_sys;
/* for 9i:
CREATE OR REPLACE SYNONYM site_sys.stats$sqltext FOR perfstat.stats$sqltext;
for 10g, create this view:
CREATE OR REPLACE VIEW site_sys.stats$sqltext(hash_value, piece, sql_text) AS
SELECT old_hash_value, piece, sql_text FROM perfstat.stats$sqltext;
*/
declare
  version varchar2(30);
  compatibility varchar2(30);
begin
  dbms_utility.db_version(version, compatibility);
  if to_number(substr(version,1,2)) >= 10 then
    execute immediate 'CREATE OR REPLACE VIEW site_sys.stats$sqltext
      (hash_value, piece, sql_text) AS
      SELECT old_hash_value, piece, sql_text
      FROM perfstat.stats$sqltext';
  else
    execute immediate 'CREATE OR REPLACE SYNONYM site_sys.stats$sqltext
      FOR perfstat.stats$sqltext';
  end if;
end;
/
/*
p_hash_value is either the hash value of a specific statement in
STATS$SQLTEXT to retrieve or NULL.
When NULL, all statements in the Statspack repository are retrieved.
The column is called old_hash_value in Oracle10g
*/
CREATE OR REPLACE function site_sys.sp_sqltext(p_hash_value number default null)
RETURN sqltext_type_tab PIPELINED
AS
  result_row sqltext_type:=sqltext_type(null, empty_clob());
  cursor single_stmt(p_hash_value number) is
    select hash_value, piece, sql_text from stats$sqltext
    where p_hash_value=hash_value
    order by piece;
  cursor multi_stmt is
    select hash_value, piece, sql_text from stats$sqltext
    order by hash_value, piece;
  v_sql_text stats$sqltext.sql_text%TYPE;
  v_piece binary_integer;
  v_prev_hash_value number:=NULL;
  v_cur_hash_value number:=0;
BEGIN
  dbms_lob.CREATETEMPORARY(result_row.sql_text, true);
  IF p_hash_value IS NULL THEN
    open multi_stmt; -- caller asked for all statements
  ELSE
    open single_stmt(p_hash_value); -- retrieve only one statement
  END IF;
  LOOP
    IF p_hash_value IS NULL THEN
      FETCH multi_stmt INTO v_cur_hash_value, v_piece, v_sql_text;
      EXIT WHEN multi_stmt%NOTFOUND;
    ELSE
      FETCH single_stmt INTO v_cur_hash_value, v_piece, v_sql_text;
      EXIT WHEN single_stmt%NOTFOUND;
    END IF;
    IF v_piece=0 THEN -- new stmt starts
      IF v_prev_hash_value IS NOT NULL THEN
        -- there was a previous statement which is now finished
        result_row.hash_value:=v_prev_hash_value;
        pipe row(result_row);
        -- trim the lob to length 0 for the next statement
        dbms_lob.trim(result_row.sql_text, 0);
        -- the current row holds piece 0 of the new statement - add it to CLOB
        dbms_lob.writeappend(result_row.sql_text, length(v_sql_text), v_sql_text);
      ELSE
        -- this is the first row ever
        result_row.hash_value:=v_cur_hash_value;
        dbms_lob.writeappend(result_row.sql_text, length(v_sql_text), v_sql_text);
      END IF;
    ELSE
      -- append the current piece to the CLOB
      result_row.hash_value:=v_cur_hash_value;
      dbms_lob.writeappend(result_row.sql_text, lengthb(v_sql_text), v_sql_text);
    END IF;
    v_prev_hash_value:=v_cur_hash_value;
  END LOOP;
  -- output last statement
  pipe row(result_row);
  dbms_lob.freetemporary(result_row.sql_text);
  IF p_hash_value IS NULL THEN
    CLOSE multi_stmt;
  ELSE
    CLOSE single_stmt;
  END IF;
  return;
END;
/
GRANT EXECUTE ON site_sys.sp_sqltext TO dba;
The following SQL script retrieves the statement text without forced line breaks and saves
it in a spool file named sp_sqltext_hash_value.lst, where hash_value is the argument passed
to the script:
$ cat sp_sqltext_get.sql
define hash_value=&1
set verify off
set long 100000
set trimout on
set trimspool on
set feedback off
set heading off
set linesize 32767
col sql_text format a32767
spool sp_sqltext_&hash_value..lst
select sql_text from table(site_sys.sp_sqltext(&hash_value));
spool off
exit
Let’s test the script with the hash value 1455318379 of the statement in question.
$ sqlplus -s system/secret @sp_sqltext_get.sql 1455318379
SELECT emp.last_name, emp.first_name, j.job_title, d.department_name, l.city,
l.state_province, l.postal_code, l.street_address, emp.email,
emp.phone_number, emp.hire_date, emp.salary, mgr.last_name
FROM hr.employees emp, hr.employees mgr,
hr.departments d, hr.locations l, hr.jobs j
WHERE emp.manager_id=mgr.employee_id
AND emp.department_id=d.department_id
AND d.location_id=l.location_id
AND emp.job_id=j.job_id
The entire statement is now on a single line of text6 (SQL*Plus inserts a blank line at the
beginning of the file, such that the total line count amounts to 2).
$ wc -l sp_sqltext_1455318379.lst
2 sp_sqltext_1455318379.lst
We finally achieved our goal of retrieving the statement text with correct SQL syntax. When
called without an argument or with a NULL argument, the function SP_SQLTEXT retrieves all statements
from STATS$SQLTEXT.
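For instance, to dump every captured statement in one pass (expect voluminous output on a busy repository):

SQL> SET LONG 1000000
SQL> SELECT hash_value, sql_text FROM TABLE(site_sys.sp_sqltext());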
6. The UNIX command wc counts lines and characters in files. If working on Windows, install the free
UNIX-like environment Cygwin from http://www.cygwin.com to get access to wc and many
other UNIX utilities such as awk, grep, and find.
Capturing SQL Statements with Formatting Preserved
We have already come a long way, yet we could go one step further by enabling Statspack to
save SQL statements with line breaks and tab stops preserved. If you don’t have any apprehensions
about changing a single line in the package body of the STATSPACK package, read on.
Remember, V$SQLTEXT_WITH_NEWLINES preserves line breaks and tab stops, whereas V$SQLTEXT,
which is queried by Statspack, does not. First of all, we need to authorize the user PERFSTAT to
access the dynamic performance view V$SQLTEXT_WITH_NEWLINES.
SQL> CONNECT / AS SYSDBA
SQL> GRANT SELECT ON v_$sqltext_with_newlines TO PERFSTAT;
I used the Revision Control System7 (RCS) to save the original version of spcpkg.sql, which
contains the STATSPACK package body, as version 1.1. Since V$SQLTEXT is referenced only once in
this file, it’s sufficient to change a single line as shown here (the line number is from an Oracle10g
version of spcpkg.sql):
$ rcsdiff spcpkg.sql
===================================================================
RCS file: RCS/spcpkg.sql,v
retrieving revision 1.1
diff -r1.1 spcpkg.sql
4282c4282
<            , v$sqltext vst
---
>            , v$sqltext_with_newlines vst
Please note that the change only affects new SQL statements captured
by Statspack. Any statement with a hash value already present in STATS$SQLTEXT.(OLD_)
HASH_VALUE will not be captured again. To recapture existing statements, export the schema
PERFSTAT to save past snapshots and run sptrunc.sql to purge all snapshots. This removes all data
from STATS$SQLTEXT. Don’t worry, Statspack configuration data in the tables STATS$STATSPACK_PARAMETER
and STATS$IDLE_EVENT is preserved in spite of the warning that appears when running
sptrunc.sql and claims that it “removes ALL data from Statspack tables”.
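The following command sequence is a sketch of that export-then-purge procedure; the dump and log file names are merely examples:

$ exp system/secret owner=PERFSTAT file=perfstat_backup.dmp log=perfstat_backup.log
$ sqlplus perfstat/secret @?/rdbms/admin/sptrunc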
After editing spcpkg.sql, recreating the package STATSPACK, re-running the application,
and capturing its SQL statements with Statspack, we can finally view the statement with all
formatting fully preserved.
$ sqlplus -s system/secret @sp_sqltext_get.sql 1455318379
SELECT emp.last_name, emp.first_name, j.job_title, d.department_name, l.city,
        l.state_province, l.postal_code, l.street_address, emp.email,
        emp.phone_number, emp.hire_date, emp.salary, mgr.last_name
FROM hr.employees emp, hr.employees mgr, hr.departments d, hr.locations l, hr.jobs j
WHERE emp.manager_id=mgr.employee_id
AND emp.department_id=d.department_id
AND d.location_id=l.location_id
AND emp.job_id=j.job_id
7. RCS is open source software. Precompiled executables for Windows ship with Cygwin and most Linux
distributions. The command rcsdiff displays differences between releases of a file.
This change does not harm the appearance of the Statspack report. The section with our
problem statement now becomes this:

    CPU                  CPU per             Elapsd                     Old
  Time (s)   Executions  Exec (s)  %Total   Time (s)  Buffer Gets  Hash Value
---------- ------------ --------- ------- ---------- ------------ -----------
     10.34           11      0.94    12.3      31.65      162,064  1455318379
SELECT emp.last_name, emp.first_name, j.job_title, d.department
_name, l.city,
l.state_province, l.postal_code, l.street_address
, emp.email,
emp.phone_number, emp.hire_date, emp.salary, mgr.l
ast_name
FROM hr.employees emp, hr.employees mgr, hr.departments
 d, hr.locations l, hr.jobs j
WHERE emp.manager_id=mgr.employee
_id
AND emp.department_id=d.department_id
AND d.location_id=l.loc
ation_id
AND emp.job_id=j.job_id
Undocumented Statspack Report Parameters
As we saw in the previous sections, an important undocumented Statspack report parameter
in Oracle9i is num_rows_per_hash. This parameter is documented in Oracle10g; however, there
are still a number of old and new undocumented report parameters in Oracle10g. I consider
none of them as important or useful as num_rows_per_hash. Perhaps top_n_events, which is
undocumented in both releases, is the most interesting. It controls how many lines are shown
in the “Top Timed Events” section near the beginning of the report. Often the lowest contribution
to total elapsed time (Oracle9i: “Total Ela Time”; Oracle10g: “Total Call Time”) in this section
is less than 2–3 percent and thus marginally relevant. It’s only worth increasing top_n_events if
the lowest contributor has consumed 5 percent or more of the total elapsed time. Of course,
any such contributor would also appear in the “Wait Events” section of the report,8 but the
percentage of total elapsed time is only reported in the “Top Timed Events” section. Following
is an example “Top Timed Events” section from an Oracle10g Statspack report:
8. Except if the contribution of CPU time had been so low that it was not reported in the “Top Timed
Events” section, which is unlikely.
Top 5 Timed Events                                                 Avg %Total
~~~~~~~~~~~~~~~~~~                                                wait   Call
Event                                         Waits    Time (s)   (ms)   Time
----------------------------------------- ------------ --------- ------ -----
CPU time                                                     671          90.2
db file sequential read                      1,525,262        36      0    4.9
db file scattered read                         138,657        16      0    2.1
latch: library cache                                 7        15   2086    2.0
log file parallel write                          5,187         3      1     .4
The lowest contribution to elapsed time comes from log file parallel write and is less than
1 percent, so in this case there is no need to change top_n_events. Table 25-2 lists the undocumented
Statspack report parameters of Oracle10g Release 2.
Table 25-2. Undocumented Oracle10g Statspack Report Parameters

Statspack Report Parameter  Purpose                                             Default Value
--------------------------  --------------------------------------------------  -------------
avwt_fmt                    Displayed precision of average wait time            1 ms
cache_xfer_per_instance     Report per instance cache transfer statistics       Y
display_file_io             Report file-level I/O statistics                    Y
display_undostat            Report undo segment statistics                      Y
linesize_fmt                Controls the line size of SQL*Plus output and       80
                            should be increased if any columns are made wider
streams_top_n               Number of lines in Oracle Streams related           25
                            statistics (e.g., capture, propagation, and
                            apply statistics)
top_n_events                Number of lines in the “Top Timed Events”           5
                            report section9
top_n_undostat              Number of lines in undo statistics                  35
Statspack Tables
In Oracle10g, the Statspack schema PERFSTAT contains 67 tables. Among these, only
STATS$ENQUEUE_STAT and STATS$STATSPACK_PARAMETER are documented in the file spdoc.txt.
A Statspack schema contains a wealth of information on a database and the instance that
opens the database (or multiple instances in case of RAC). When troubleshooting performance
problems or malfunctions, it may be useful to query these tables directly to detect snapshots
with high resource utilization or to figure out when a problem occurred for the first time.
Table 25-3 contains a list of all tables, the V$ views used to populate each table, if any, and a
short explanation of the purpose of each table.
9. The variable top_n_events is defined in sprepins.sql, not in sprepcon.sql.
Table 25-3. Oracle10g Statspack Repository Tables

Statspack Table                 Underlying V$ View(s)           Purpose
------------------------------  ------------------------------  ------------------------------------------
STATS$BG_EVENT_SUMMARY          V$SESSION_EVENT                 Wait events of background sessions
STATS$BUFFERED_QUEUES           V$BUFFERED_QUEUES               Statistics on Streams buffered queues
                                                                (messages processed, messages spilled to
                                                                disk, etc.)
STATS$BUFFERED_SUBSCRIBERS      V$BUFFERED_SUBSCRIBERS          Statistics on subscribers of Streams
                                                                buffered queues
STATS$BUFFER_POOL_STATISTICS    V$BUFFER_POOL_STATISTICS        Buffer pool statistics per buffer pool
                                                                name (DEFAULT, KEEP, RECYCLE) and block
                                                                size (2 KB up to 32 KB) in the database
                                                                buffer cache
STATS$CR_BLOCK_SERVER           V$CR_BLOCK_SERVER               Statistics concerning RAC consistent read
                                                                block server processes (Global Cache
                                                                Service)
STATS$CURRENT_BLOCK_SERVER      V$CURRENT_BLOCK_SERVER          Global Cache Service current block server
                                                                statistics
STATS$DATABASE_INSTANCE         V$INSTANCE                      DBMS instances for which Statspack has
                                                                captured snapshots
STATS$DB_CACHE_ADVICE           V$DB_CACHE_ADVICE               Sizing advice for the database buffer
                                                                cache per buffer pool name and block size
STATS$DLM_MISC                  V$DLM_MISC                      Real Application Clusters Global Enqueue
                                                                Service and Global Cache Service
                                                                statistics
STATS$DYNAMIC_REMASTER_STATS    X$KJDRMAFNSTATS                 RAC Global Cache Service resource
                                                                remastering
STATS$ENQUEUE_STATISTICS        V$ENQUEUE_STATISTICS            Enqueue statistics
STATS$EVENT_HISTOGRAM           V$EVENT_HISTOGRAM               Histogram statistics on wait events
STATS$FILESTATXS                X$KCBFWAIT, V$FILESTAT,         I/O statistics per data file
                                V$TABLESPACE, V$DATAFILE
STATS$FILE_HISTOGRAM            V$FILE_HISTOGRAM                Histogram statistics on single block
                                                                physical read time; statistics are
                                                                reported per file and per read time range
                                                                (0–2 ms, 2–4 ms, 4–8 ms, etc.)
STATS$IDLE_EVENT                V$EVENT_NAME                    Events considered idle waits by
                                                                Statspack10
STATS$INSTANCE_CACHE_TRANSFER   V$INSTANCE_CACHE_TRANSFER       RAC cache transfers
STATS$INSTANCE_RECOVERY         V$INSTANCE_RECOVERY             Statistics on estimated mean time to
                                                                recover in case crash recovery is needed
                                                                due to instance, operating system, or
                                                                hardware failure (parameter
                                                                FAST_START_MTTR_TARGET)
STATS$JAVA_POOL_ADVICE          V$JAVA_POOL_ADVICE              Advice for sizing the Java pool
                                                                (parameter JAVA_POOL_SIZE)
STATS$LATCH                     V$LATCH                         Latch statistics
STATS$LATCH_CHILDREN            V$LATCH_CHILDREN                Child latch statistics
STATS$LATCH_MISSES_SUMMARY      V$LATCH_MISSES                  Latch misses
STATS$LATCH_PARENT              V$LATCH_PARENT                  Parent latch statistics
STATS$LEVEL_DESCRIPTION         n/a                             Descriptions for snapshot levels 0, 5, 6,
                                                                7, and 10
STATS$LIBRARYCACHE              V$LIBRARYCACHE                  Shared pool statistics, including RAC
                                                                specific statistics
STATS$MUTEX_SLEEP               V$MUTEX_SLEEP                   Mutex statistics
STATS$OSSTAT                    V$OSSTAT                        Operating system statistics, such as idle
                                                                time, busy time, I/O wait, and CPUs in
                                                                the system
STATS$OSSTATNAME                V$OSSTAT                        Lookup table for STATS$OSSTAT.OSSTAT_ID;
                                                                avoids redundant storage of
                                                                V$OSSTAT.STAT_NAME
STATS$PARAMETER                 V$PARAMETER                     Captured values of initialization
                                                                parameters
STATS$PGASTAT                   V$PGASTAT                       Program global area (PGA) statistics for
                                                                Automatic PGA Memory Management
STATS$PGA_TARGET_ADVICE         V$PGA_TARGET_ADVICE             Advice for setting PGA_AGGREGATE_TARGET
                                                                when using Automatic PGA Memory
                                                                Management
STATS$PROCESS_MEMORY_ROLLUP     V$PROCESS_MEMORY                Allocated PGA memory per category
                                                                (Freeable, Other, PL/SQL, and SQL)
STATS$PROCESS_ROLLUP            V$PROCESS                       Allocated PGA memory per process
STATS$PROPAGATION_RECEIVER      V$PROPAGATION_RECEIVER          Streams buffered queue propagation
                                                                statistics on the receiving (destination)
                                                                side
STATS$PROPAGATION_SENDER        V$PROPAGATION_SENDER            Streams buffered queue propagation
                                                                statistics on the sending (source) side
STATS$RESOURCE_LIMIT            V$RESOURCE_LIMIT                Resource limit statistics on processes,
                                                                sessions, enqueues, parallel execution,
                                                                and undo (or rollback) segments
STATS$ROLLSTAT                  V$ROLLSTAT                      Rollback segment statistics
STATS$ROWCACHE_SUMMARY          V$ROWCACHE                      Data dictionary cache (a.k.a. row cache)
                                                                statistics per category (e.g., segments,
                                                                sequences, and users)
STATS$RULE_SET                  V$RULE_SET                      Statistics on rule set evaluations
STATS$SEG_STAT                  V$SEGMENT_STATISTICS            Segments with high physical reads or
                                                                contention including RAC-specific
                                                                statistics
STATS$SEG_STAT_OBJ              V$SEGMENT_STATISTICS            Lookup table for the columns DATAOBJ#,
                                                                OBJ# and TS# in STATS$SEG_STAT
STATS$SESSION_EVENT             V$SESSION_EVENT                 Statistics on session-specific wait
                                                                events
STATS$SESSTAT                   V$SESSTAT                       Session-specific statistics captured if
                                                                the parameter I_SESSION_ID was used with
                                                                STATSPACK.SNAP
STATS$SESS_TIME_MODEL           V$SESS_TIME_MODEL               Session-specific time model statistics
STATS$SGA                       V$SGA                           SGA sizing information pertaining to
                                                                these SGA components: database buffers,
                                                                redo buffers, variable size, and fixed
                                                                size
STATS$SGASTAT                   V$SGASTAT                       Statistics on individual pools and free
                                                                memory in the shared pool
STATS$SGA_TARGET_ADVICE         V$SGA_TARGET_ADVICE             SGA sizing advice if Automatic SGA Memory
                                                                Management is enabled (parameter
                                                                SGA_TARGET)
STATS$SHARED_POOL_ADVICE        V$SHARED_POOL_ADVICE            Shared pool sizing advice
STATS$SNAPSHOT                  V$INSTANCE, V$SESSION           Stores detailed data on each snapshot,
                                                                such as snapshot ID, instance number,
                                                                startup time, snapshot level, session ID
                                                                passed to STATSPACK.SNAP with parameter
                                                                I_SESSION_ID, comment, and thresholds
STATS$SQLTEXT                   V$SQLTEXT                       Normalized SQL statement texts split into
                                                                pieces of at most 64 characters
STATS$SQL_PLAN                  V$SQL_PLAN                      Execution plans of captured SQL
                                                                statements
STATS$SQL_PLAN_USAGE            V$SQL_PLAN                      Snapshot ID, date, and time when an
                                                                execution plan was used
STATS$SQL_STATISTICS            V$SQL                           Memory consumption by all SQL statements
                                                                and non-sharable SQL statements
STATS$SQL_SUMMARY               V$SQLSTATS                      Performance metrics, such as elapsed
                                                                time, CPU usage, disk reads, direct path
                                                                writes, and buffer gets for captured SQL
                                                                statements
STATS$SQL_WORKAREA_HISTOGRAM    V$SQL_WORKAREA_HISTOGRAM        Histogram statistics for work area sizes
                                                                of 0–1 KB, 1–2 KB, 2–4 KB, 4–8 KB, etc.
STATS$STATSPACK_PARAMETER       n/a                             Statspack parameters such as default
                                                                snapshot level and thresholds
STATS$STREAMS_APPLY_SUM         V$STREAMS_APPLY_SERVER,         Statistics for Streams apply processes
                                V$STREAMS_APPLY_READER
STATS$STREAMS_CAPTURE           V$STREAMS_CAPTURE               Statistics for Streams capture
STATS$STREAMS_POOL_ADVICE       V$STREAMS_POOL_ADVICE           Streams pool sizing advice (parameter
                                                                STREAMS_POOL_SIZE)
STATS$SYSSTAT                   V$SYSSTAT                       Instance-wide statistics, such as total
                                                                CPU consumption, recursive CPU usage,
                                                                parse calls (hard/total), number of
                                                                transactions, etc.
STATS$SYSTEM_EVENT              V$SYSTEM_EVENT                  Instance-wide wait event statistics
STATS$SYS_TIME_MODEL            V$SYS_TIME_MODEL                Instance-wide time model statistics
STATS$TEMPSTATXS                V$TABLESPACE, V$TEMPFILE,       Statistics on sorting in temporary
                                V$TEMPSTAT, X$KCBFWAIT          segments
STATS$TEMP_HISTOGRAM            V$TEMP_HISTOGRAM                Histogram statistics on the number of
                                                                single block read operations from
                                                                temporary segments in buckets of 0–1,
                                                                1–2, 2–4, 4–8, 8–16, etc. milliseconds
                                                                read duration
STATS$THREAD                    V$THREAD                        Information on online redo log threads
STATS$TIME_MODEL_STATNAME       V$SYS_TIME_MODEL                Lookup table for the column STAT_ID in
                                                                the tables STATS$SYS_TIME_MODEL and
                                                                STATS$SESS_TIME_MODEL
STATS$UNDOSTAT                  V$UNDOSTAT                      Undo segment statistics
STATS$WAITSTAT                  V$WAITSTAT                      Block contention statistics for data
                                                                blocks, extent maps, file header blocks,
                                                                free lists, undo blocks, undo headers,
                                                                etc.

10. In Oracle10g, STATS$IDLE_EVENT contains 41 events from wait class “Idle”, two from wait class “Network”, three
from wait class “Other”, and 24 events which no longer exist in Oracle10g. STATS$IDLE_EVENT has occasionally
been of concern (see bug database on Metalink), since several releases of Statspack did not insert all the wait
events considered as idle in the table. For example, the RAC idle wait event ges remote message was missing in
Oracle9i. Thus, an idle event could make it into the report section entitled “Top 5 Timed Events” and render the
calculations unusable. The solution consists of inserting missing idle wait events into the table and regenerating
the Statspack report. In Oracle10g, 21 events from wait class Idle are not registered in STATS$IDLE_EVENT. In my
view, this is rightfully so for the event SQL*Net message from dblink, but PL/SQL lock timer, which occurs when
a session is put to sleep by a call to DBMS_LOCK.SLEEP, should be present in STATS$IDLE_EVENT.
Finding Expensive Statements in a Statspack Repository
To get a quick overview of expensive statements in an entire Statspack repository, the result of
the function SP_SQLTEXT from the previous section may be joined to STATS$SQL_SUMMARY, which
contains the measurement data of all the SQL statements captured. If the figures are normalized
by how many times a statement was executed (STATS$SQL_SUMMARY.EXECUTIONS), an initial
overview of slow statements results.
Following is the script sp_sqltext_join.sql, which accomplishes this. It reports elapsed
time in seconds (STATS$SQL_SUMMARY.ELAPSED_TIME is in microseconds). Disk reads and buffer
gets are normalized by the execution count of the statement. The script restricts the result set
to statements whose average execution time exceeds one second. Of course, before starting a tuning
session, you should confirm that statements found in this way impair business processes.
$ cat sp_sqltext_join.sql
set long 1000000
col module format a6
col snap_id format 9999999
col sql_text format a80 word_wrapped
SELECT s.snap_id, s.old_hash_value,
round(s.elapsed_time/s.executions/1000000, 2) ela_sec_per_exec,
floor(s.disk_reads/s.executions) read_per_exec,
floor(s.buffer_gets/s.executions) gets_per_exec,
s.module, t.sql_text
FROM stats$sql_summary s,
(SELECT hash_value, sql_text from table(site_sys.sp_sqltext())) t
WHERE s.old_hash_value=t.hash_value
AND s.elapsed_time/s.executions/1000000 > 1
ORDER BY s.elapsed_time, s.disk_reads, s.buffer_gets;
Running the preceding query yields this well-known statement that was used as an
example throughout this chapter:

 SNAP_ID OLD_HASH_VALUE ELA_SEC_PER_EXEC READ_PER_EXEC GETS_PER_EXEC MODULE
-------- -------------- ---------------- ------------- ------------- ------
SQL_TEXT
--------------------------------------------------------------------------------
      33     1455318379             2.87          2380         14733 HR
SELECT emp.last_name, emp.first_name, j.job_title, d.department_name, l.city,
l.state_province, l.postal_code, l.street_address, emp.email, emp.phone_number,
emp.hire_date, emp.salary, mgr.last_name FROM hr.employees emp, hr.employees
mgr, hr.departments d, hr.locations l, hr.jobs j WHERE
emp.manager_id=mgr.employee_id AND emp.department_id=d.department_id AND
d.location_id=l.location_id AND emp.job_id=j.job_id
The preceding SELECT statement evaluates measurements captured since instance startup.
It does not take the snapshot interval into account, so it may miss statements that were
slow intermittently but performed acceptably on average.
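A sketch of an interval-aware variant follows. It reuses the view SP_VALID_INTERVALS, which is introduced in the section “Finding Snapshots with High Resource Utilization” later in this chapter, and subtracts the cumulative figures of the interval’s begin snapshot from those of its end snapshot. Treat it as a starting point rather than a finished script; for RAC or multi-database repositories the join would also need DBID and instance number.

SELECT i.start_snap_id, i.end_snap_id, s2.old_hash_value,
       round((s2.elapsed_time - s1.elapsed_time) /
             (s2.executions - s1.executions) / 1000000, 2) AS ela_sec_per_exec
FROM site_sys.sp_valid_intervals i, stats$sql_summary s1, stats$sql_summary s2
WHERE s1.snap_id=i.start_snap_id
AND s2.snap_id=i.end_snap_id
AND s1.old_hash_value=s2.old_hash_value
AND s2.executions > s1.executions
ORDER BY ela_sec_per_exec DESC;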
Identifying Used Indexes
Statspack snapshots at level 6 or higher capture execution plans in addition to the usual
measurement data. Since index usage monitoring with ALTER INDEX … MONITORING USAGE is somewhat
intrusive (see Chapter 4), you might consider examining index usage by accessing the
Statspack repository (script sp_used_indexes.sql).
SQL> SELECT DISTINCT o.owner, o.object_name index_name
FROM dba_objects o, stats$sql_plan p
WHERE o.object_id=p.object#
AND o.object_type='INDEX'
AND o.owner='HR';

OWNER        INDEX_NAME
------------ -------------
HR           JOB_ID_PK
HR           LOC_ID_PK
HR           EMP_EMP_ID_PK
HR           DEPT_ID_PK
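The reverse query, sketched below, lists indexes of the schema that never appeared in any captured plan and may serve as a starting point when hunting for unused indexes. Keep in mind that it over-reports: only statements exceeding Statspack’s thresholds have their plans captured, so an index may be in use even though it never shows up here.

SQL> SELECT owner, object_name index_name
     FROM dba_objects
     WHERE object_type='INDEX' AND owner='HR'
     MINUS
     SELECT o.owner, o.object_name
     FROM dba_objects o, stats$sql_plan p
     WHERE o.object_id=p.object#;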
Execution Plans for Statements Captured with SQL Trace
When tracing applications with SQL trace, it may happen that execution plans for certain statements
are absent from the trace file and, as a consequence, also from the TKPROF formatted report.
Under these circumstances, do not use TKPROF with the option EXPLAIN or execute EXPLAIN
PLAN manually, since this runs the risk of producing an execution plan that differs from the plan
used by the application. Instead, use Statspack or AWR (see Chapter 26) to retrieve the execution
plan that was actually used from the repository of the respective tool. The process is
somewhat easier with Statspack since the hash value emitted to the trace file can be used to get
the desired information. In Oracle9i, there is a single hash value in V$SQL and it is this hash
value that is also emitted to trace files. In Oracle10g, matters are more complicated, since there
are now two hash values: V$SQL.OLD_HASH_VALUE as well as V$SQL.HASH_VALUE. Only the latter
is written to SQL trace files. Oracle10g Statspack uses the old hash value, which is calculated
with the algorithm used by Oracle9i, for its SQL report. The translation from the HASH_VALUE
found in the trace file to the OLD_HASH_VALUE needed to run the Statspack SQL report may be
accomplished with the following query (script sp_translate_hv.sql):
SQL> SELECT p.snap_id, s.snap_time, p.sql_id, p.hash_value, p.old_hash_value,
p.plan_hash_value, p.cost
FROM stats$sql_plan_usage p, stats$snapshot s
WHERE p.snap_id=s.snap_id
AND p.hash_value=3786124882
ORDER BY p.snap_id;

SNAP_ID SNAP_TIME  SQL_ID        HASH_VALUE OLD_HASH_VALUE PLAN_HASH_VALUE COST
------- ---------- ------------- ---------- -------------- --------------- ----
    493 13. Oct 07 1yw85nghurbkk 3786124882     1455318379      4095786543    9
    502 13. Oct 07 1yw85nghurbkk 3786124882     1455318379      4095786543    9
    582 15. Oct 07 1yw85nghurbkk 3786124882     1455318379      3985860841   17
    602 15. Oct 07 1yw85nghurbkk 3786124882     1455318379      4095786543    9
The query result contains several values in the column PLAN_HASH_VALUE. Hence, different
execution plans were used over time. To generate the Statspack SQL report, run the script
sprepsql.sql (or sprsqins.sql if Statspack data from another database was imported) and
enter any adjacent snapshot identifiers from the preceding query result.
SQL> @sprepsql
…
Specify the Begin and End Snapshot Ids
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Enter value for begin_snap: 582
Begin Snapshot Id specified: 582
Enter value for end_snap: 602
End Snapshot Id specified: 602
At the point where the script asks for the statement’s hash value, make sure you enter the
OLD_HASH_VALUE if you are using Oracle10g.
Specify the old (i.e. pre-10g) Hash Value
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Enter value for hash_value: 1455318379
Hash Value specified is: 1455318379
…
Known Optimizer Plan(s) for this Old Hash Value
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Shows all known Optimizer Plans for this database instance, and the Snap Id's
they were first found in the shared pool. A Plan Hash Value will appear
multiple times if the cost has changed
-> ordered by Snap Id

    First First           Last              Plan
  Snap Id Snap Time       Active Time       Hash Value       Cost
--------- --------------- --------------- ------------ ----------
      493 13-Oct-07 21:41 15-Oct-07 15:38   4095786543          9
      502 13-Oct-07 21:47 15-Oct-07 15:14   3985860841         17
Plans in shared pool between Begin and End Snap Ids
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Shows the Execution Plans found in the shared pool between the begin and end
snapshots specified. The values for Rows, Bytes and Cost shown below are those
which existed at the time the first-ever snapshot captured this plan - these
values often change over time, and so may not be indicative of current values
-> Rows indicates Cardinality, PHV is Plan Hash Value
-> ordered by Plan Hash Value

--------------------------------------------------------------------------------
| Operation                       | PHV/Object Name      | Rows | Bytes|  Cost |
--------------------------------------------------------------------------------
|SELECT STATEMENT                 |----- 3985860841 ---- |      |      |    17 |
|HASH JOIN                        |                      |  105 |  17K |    17 |
| TABLE ACCESS FULL               |JOBS                  |   19 |  513 |     3 |
| HASH JOIN                       |                      |  105 |  14K |    14 |
|  TABLE ACCESS FULL              |EMPLOYEES             |  107 |   1K |     3 |
|  HASH JOIN                      |                      |  106 |  13K |    10 |
|   HASH JOIN                     |                      |   27 |   1K |     7 |
|    TABLE ACCESS FULL            |LOCATIONS             |   23 |   1K |     3 |
|    TABLE ACCESS FULL            |DEPARTMENTS           |   27 |  513 |     3 |
|   TABLE ACCESS FULL             |EMPLOYEES             |  107 |   6K |     3 |
|SELECT STATEMENT                 |----- 4095786543 ---- |      |      |     9 |
|NESTED LOOPS                     |                      |  105 |  17K |     9 |
| NESTED LOOPS                    |                      |  105 |  12K |     7 |
|  NESTED LOOPS                   |                      |  105 |   9K |     6 |
|   NESTED LOOPS                  |                      |  106 |   8K |     4 |
|    TABLE ACCESS FULL            |EMPLOYEES             |  107 |   6K |     3 |
|    TABLE ACCESS BY INDEX ROWID  |DEPARTMENTS           |    1 |   19 |     1 |
|     INDEX UNIQUE SCAN           |DEPT_ID_PK            |    1 |      |     0 |
|   TABLE ACCESS BY INDEX ROWID   |EMPLOYEES             |    1 |   12 |     1 |
|    INDEX UNIQUE SCAN            |EMP_EMP_ID_PK         |    1 |      |     0 |
|  TABLE ACCESS BY INDEX ROWID    |JOBS                  |    1 |   27 |     1 |
|   INDEX UNIQUE SCAN             |JOB_ID_PK             |    1 |      |     0 |
| TABLE ACCESS BY INDEX ROWID     |LOCATIONS             |    1 |   48 |     1 |
|  INDEX UNIQUE SCAN              |LOC_ID_PK             |    1 |      |     0 |
--------------------------------------------------------------------------------
The script retrieves all the execution plans for the statement with the old hash value specified.
Since execution plans may change when upgrading the DBMS software to Oracle10g, I
recommend capturing Statspack snapshots at level 6 or higher prior to an upgrade.
Oracle Corporation provides the script spup10.sql for upgrading a Statspack repository to
Oracle10g, but states that the upgrade is not guaranteed to work. My limited experience with
this script and Oracle10g Release 2 is that repository upgrades do not succeed. To preserve
Statspack snapshots captured with Oracle9i, export the schema PERFSTAT using the export
utility (exp) before upgrading to Oracle10g. If necessary, an Oracle9i export dump may be
imported into an Oracle9i test database to run reports (see “Importing Statspack Data from
Another Database” later in this chapter).
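A sketch of such a pre-upgrade export; the dump and log file names are examples:

$ exp system/secret owner=PERFSTAT file=perfstat_9i.dmp log=perfstat_9i.log consistent=y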
Finding Snapshots with High Resource Utilization
As part of my consulting work, I was occasionally asked to scrutinize an entire Statspack
repository. A client would create a schema-level export of the PERFSTAT schema of his production
database and I would import it into a test database (more on importing Statspack data in a
moment). This is a situation where initially you don’t have any idea which snapshots are worth
investigating.
Since a Statspack repository contains snapshots of performance data from V$ views, the
Statspack report must subtract the measurements taken at the beginning of the snapshot interval
from the measurements taken at the end of the snapshot interval to arrive at the resource
consumption during the interval. The report script spreport.sql is passed the beginning and
end snapshot numbers as input. Hence, a simple join (or outer join where necessary) is sufficient.
Matters are more complicated when all the snapshots must be investigated. This task is a
case for the analytic function LAG, which maps column values of a previous row visited by a
SELECT statement into the current row without a self join, such that a window containing two
rows is available simultaneously. At this point, the attentive reader will rightfully object that
there is no previous row in a relational database. This is why an ordering must be defined using
the SQL keywords OVER and ORDER BY. Following is an example of a query on STATS$SNAPSHOT
using LAG:
SQL> SET NULL <NULL>
SQL> SELECT LAG(snap_id) OVER (ORDER BY snap_id) AS start_snap_id,
            snap_id AS end_snap_id
     FROM stats$snapshot;

START_SNAP_ID END_SNAP_ID
------------- -----------
       <NULL>          33
           33          34
           34          35

Note that the start snapshot identifier in the first row is NULL, since it is outside of the window.
For details on LAG, please consult the Oracle Database SQL Reference manual.
A view that yields successive snapshot identifiers and verifies that the interval is valid may
be created as the basic building block for analysis. The check for interval validity is performed
by comparing the values in the column STATS$SNAPSHOT.STARTUP_TIME. If the startup time does
not match, then the instance was restarted in between snapshots and the measurements are
not usable. In a RAC environment, matters get even more intricate. Since a single repository
might contain measurements from several RAC instances, the instance numbers of the start
and end snapshots must also match. This is accomplished by using the columns INSTANCE_NUMBER
and SNAP_ID in the ORDER BY supplied with the function LAG. The view, which retrieves
identifiers of valid consecutive beginning and ending snapshots, is called SP_VALID_INTERVALS
(script sp_valid_intervals.sql).
CREATE OR REPLACE VIEW site_sys.sp_valid_intervals AS
SELECT *
FROM (
SELECT lag(dbid) over (order by dbid, instance_number, snap_id) AS start_dbid,
dbid AS end_dbid,
lag(snap_id) over (order by dbid, instance_number, snap_id) AS start_snap_id,
snap_id AS end_snap_id,
lag(instance_number) over (order by dbid, instance_number, snap_id)
AS start_inst_nr, instance_number AS end_inst_nr,
lag(snap_time) over (order by dbid, instance_number, snap_id)
AS start_snap_time, snap_time AS end_snap_time,
lag(startup_time) over (order by dbid, instance_number, snap_id)
AS start_startup_time, startup_time AS end_startup_time
FROM perfstat.stats$snapshot
) iv
WHERE iv.start_snap_id IS NOT NULL
AND iv.start_dbid=iv.end_dbid
AND iv.start_inst_nr=iv.end_inst_nr
AND iv.start_startup_time=iv.end_startup_time;
Following is a query that accesses the view SP_VALID_INTERVALS:
SELECT start_snap_id, end_snap_id, start_inst_nr, start_snap_time,
trunc((end_snap_time-start_snap_time)*86400) AS interval
FROM site_sys.sp_valid_intervals;

START_SNAP_ID END_SNAP_ID START_INST_NR START_SNAP_TIME       INTERVAL
------------- ----------- ------------- ------------------- ----------
           87          88             1 15.08.2007 06:06:04       1898
           88          89             1 15.08.2007 06:37:42       2986
           90          91             1 15.08.2007 09:35:21       1323
High CPU Usage
High CPU usage might be an indication of a performance problem, such that snapshot intervals
exhibiting high CPU usage may warrant further investigation. CPU usage is usually expressed as a
percentage of available CPU resources. Applied to Statspack, this means that we need to look
at CPU utilization during consecutive snapshot intervals. The algorithm for computing CPU
usage from Statspack repository tables is as follows:
1. Get the CPU consumption (in centiseconds) during the snapshot interval by subtracting the value captured by the start snapshot from the value captured by the end snapshot. This value is part of each Statspack report and is calculated by the STATSPACK package. CPU consumption since instance startup is represented by the statistic “CPU used by this session” in STATS$SYSSTAT.
2. Divide the value by 100 for conversion from centiseconds to seconds.
3. Divide the CPU consumption by the snapshot interval in seconds. The length of the interval may be derived from STATS$SNAPSHOT using the analytic function LAG.
4. Divide the result by the number of CPUs (parameter CPU_COUNT) captured at the beginning of the snapshot interval (STATS$PARAMETER) to get average CPU utilization as a percentage of CPU capacity. The CPU capacity of a system is one second of CPU time per CPU per second.
Let’s look at an example. We will use the excerpts from the following Statspack report to
provide sample figures for the calculation. The relevant figures are reproduced in bold font.

            Snap Id     Snap Time      Sessions Curs/Sess Comment
            ------- ------------------ -------- --------- -------
Begin Snap:      90 15-Aug-07 09:35:21      215      11.9
  End Snap:      91 15-Aug-07 09:57:24      177      10.6
   Elapsed:               22.05 (mins)
…
Statistic                                      Total     per Second    per Trans
--------------------------------- ------------------ -------------- ------------
CPU used by this session                      82,337           62.2         23.0
The value of the parameter CPU_COUNT is captured, but not printed at the end of the
Statspack report, since it has a default value, so we need to retrieve the value from
STATS$PARAMETER.
SQL> SELECT value FROM stats$parameter WHERE name='cpu_count' and snap_id=90;

VALUE
-----
    4
If we translate the preceding algorithm into a formula, it becomes this:

CPU usage (%) = CPU consumption during snapshot interval (s) × 100
                / (snapshot interval (s) × CPU_COUNT)

Using the sample figures yields this:

(823.37 × 100) / (1323 × 4) = 15.56 %
The following query automates this calculation (script snap_by_cpu_util.sql):
SELECT i.start_snap_id, i.end_snap_id,
i.start_snap_time, i.end_snap_time,
(i.end_snap_time - i.start_snap_time) * 86400 AS interval,
round(((s2.value - s1.value)/ 100 / ((i.end_snap_time - i.start_snap_time) * 86400)
/ p.value) * 100,2) AS cpu_utilization
FROM site_sys.sp_valid_intervals i, stats$sysstat s1,
stats$sysstat s2, stats$parameter p
WHERE i.start_snap_id=s1.snap_id
AND i.end_snap_id=s2.snap_id
AND s1.name='CPU used by this session'
AND s1.name=s2.name
AND p.snap_id=i.start_snap_id
AND p.name='cpu_count'
ORDER BY cpu_utilization DESC;
Running the query confirms the result of the manual calculation for the interval between
snapshots 90 and 91.

 Start    End                                                           CPU
SnapID SnapID Start Time          End Time            Interval (s) Utilization (%)
------ ------ ------------------- ------------------- ------------ ---------------
    90     91 15.08.2007 09:35:21 15.08.2007 09:57:24         1323           15.56
    88     89 15.08.2007 06:37:42 15.08.2007 07:27:28         2986            7.14
    87     88 15.08.2007 06:06:04 15.08.2007 06:37:42         1898            5.28
This query quickly identifies periods of high load that may merit drilling down by generating Statspack reports for the beginning and ending snapshots with the highest CPU usage.
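For example, to drill down into the top interval from the preceding output, run spreport.sql and supply snapshots 90 and 91 at the prompts (the exact prompt labels may differ slightly between releases):

$ sqlplus perfstat/secret @?/rdbms/admin/spreport
Enter value for begin_snap: 90
Enter value for end_snap: 91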
High DB Time
What if a performance problem is due to waiting and does not manifest itself through high CPU
usage? Then the approach shown in the previous section fails. Waiting might be due to contention
for latches, locks, or other resources such as slow disks. Oracle10g offers so-called time
model statistics that consider CPU time consumption and waiting. These are available at instance
level (V$SYS_TIME_MODEL) and session level (V$SESS_TIME_MODEL). In essence, time spent within the
database instance is accounted for. According to the Oracle10g Database Reference:
DB Time is the amount of elapsed time (in microseconds) spent performing database
user-level calls. This does not include the time spent on instance background processes
such as PMON.
The manual further states that the metric “DB time” includes the following:
• DB CPU
• connection management call elapsed time
• sequence load elapsed time
• sql execute elapsed time
• parse time elapsed
• PL/SQL execution elapsed time
• inbound PL/SQL rpc elapsed time
• PL/SQL compilation elapsed time
• Java execution elapsed time
Not a single word about wait time here. However, considering that wait time is rolled up
into the elapsed time of database calls in SQL trace files, it would be surprising if time model
statistics followed a different approach. A quick proof may be built by leveraging the undocumented
PL/SQL package DBMS_SYSTEM and generating some artificial wait time. At the beginning of
a new database session, all values in V$SESS_TIME_MODEL are nearly zero.
SQL> SELECT stat_name, value/1000000 time_secs FROM v$sess_time_model
WHERE (stat_name IN ('sql execute elapsed time','PL/SQL execution elapsed time')
OR stat_name like 'DB%')
AND sid=userenv('sid');

STAT_NAME                                                         TIME_SECS
---------------------------------------------------------------- ----------
DB time                                                              .018276
DB CPU                                                               .038276
sql execute elapsed time                                             .030184
PL/SQL execution elapsed time                                        .007097
Next, we generate some wait time by artificially waiting one second for the event db file
scattered read, which otherwise occurs when a SELECT statement causes a full table scan. Although
in reality the wait happens in a PL/SQL procedure, it is accounted for as if a full table scan due
to a SQL statement had occurred.
SQL> EXECUTE dbms_system.wait_for_event('db file scattered read', 1, 1);
PL/SQL procedure successfully completed.
Note how the metrics “sql execute elapsed time” and “PL/SQL execution elapsed time”
have both increased by almost one second. Obviously, due to the artificial nature of this test,
the elapsed time is accounted for twice. The metric “DB CPU” has risen only slightly, and “DB
time” has also increased by one second, since it aggregates SQL and PL/SQL elapsed time.11
SQL> SELECT stat_name, value/1000000 time_secs FROM v$sess_time_model
WHERE (stat_name IN ('sql execute elapsed time','PL/SQL execution elapsed time')
OR stat_name like 'DB%')
AND sid=userenv('sid');

STAT_NAME                                                         TIME_SECS
---------------------------------------------------------------- ----------
DB time                                                             1.030818
DB CPU                                                               .045174
sql execute elapsed time                                            1.017276
PL/SQL execution elapsed time                                        .987208
The test shows that waiting on events that occur as part of SQL execution is rolled up
into the metric “sql execute elapsed time”. Wait time, except for wait events in wait class Idle,12
is also rolled up into “DB time”.
Just like with CPU usage, we need to somehow normalize “DB time”. Computer systems
have a more or less unlimited capacity for wait time. The more processes run on a system and
compete for the same resources, the more wait time is accumulated at a system level. When ten
processes each wait one second for the same TX enqueue, ten times the total wait time of a
single process results. The metric “DB time” may be normalized by the snapshot interval. I
shall call this metric relative DB time. We will again start with a manual calculation. The relevant
11. Interestingly, the one second waited is not accounted for twice in the metric “DB time”.
12. To retrieve all events in wait class Idle, run the following query on an Oracle10g instance:
SELECT name FROM v$event_name WHERE wait_class='Idle';
excerpts of an Oracle10g Statspack report follow. Figures required for the calculation of relative
DB time are in bold font.
Snapshot    Snap Id     Snap Time      Sessions Curs/Sess Comment
~~~~~~~~ ---------- ------------------ -------- --------- -------------------
Begin Snap:      83 06-Sep-07 17:04:06       24       3.3
  End Snap:      84 06-Sep-07 17:09:54       24       3.3
   Elapsed:                5.80 (mins)

Time Model System Stats  DB/Inst: TEN/TEN1  Snaps: 83-84
-> Ordered by % of DB time desc, Statistic name

Statistic                                       Time (s) % of DB time
----------------------------------- -------------------- ------------
sql execute elapsed time                           319.3        100.0
PL/SQL execution elapsed time                      316.7         99.2
DB CPU                                             301.4         94.4
…
DB time                                            319.3
Expressed as a formula, relative DB time is as follows:

relative DB time = DB time (s) / snapshot interval (s)

Using the sample figures yields this:

319.3 / (5.8 × 60) = 0.92
The query that automates the calculation is once again based on the view
SP_VALID_INTERVALS (file snap_by_db_time.sql).
SQL> SELECT i.start_snap_id, i.end_snap_id,
i.start_snap_time, i.end_snap_time,
(i.end_snap_time - i.start_snap_time) * 86400 AS interval,
round((s2.value - s1.value) / 1000000 /* convert from microsec to sec */
/ ((i.end_snap_time - i.start_snap_time) * 86400 ), 2)
/* normalize by snapshot interval */
AS db_time_per_sec
FROM site_sys.sp_valid_intervals i, stats$sys_time_model s1,
stats$sys_time_model s2, stats$time_model_statname n
WHERE i.start_snap_id=s1.snap_id
AND i.end_snap_id=s2.snap_id
AND n.stat_name='DB time'
AND s1.stat_id=n.stat_id
AND s2.stat_id=n.stat_id
ORDER BY db_time_per_sec DESC;
 Start    End
SnapID SnapID Start Time     End Time       Interval (s) DB time/s
------ ------ -------------- -------------- ------------ ---------
    83     84 06.09.07 17:04 06.09.07 17:09          348       .92
    49     50 05.09.07 07:45 05.09.07 08:00          850       .02
    25     26 25.07.07 19:53 25.07.07 20:00          401       .01
The highest relative DB time occurred in the interval between snapshots 83 and 84.
Importing Statspack Data from Another Database
As stated earlier, it may be desirable to import Statspack data from a database into a different
database for analysis. Let’s say that the schema PERFSTAT of a production database is exported
once per month, backed up to tape, and then the Statspack tables are truncated to conserve
space. The deleted snapshots may be imported into another database, should the need arise to
investigate past snapshots, for example to retrieve last month’s execution plan of a particular
statement. Obviously, the production database cannot be the target of the import, since this
might interfere with the ongoing snapshot capture process.
The procedure shown next takes into account that the Statspack table STATS$IDLE_EVENT
might contain additional wait events that were missing in a particular Statspack release. The
brute force approach of dropping all tables owned by PERFSTAT and letting import create
them would remove this customizing. This is why the approach shown next does not drop any
of the Statspack tables. Instead, it disables referential integrity constraints, truncates the tables
with sptrunc.sql,13 and uses the import setting IGNORE=Y to import data into existing tables.
As a starting point, a test database where Statspack has been installed with spcreate.sql
is required. The version of Statspack installed must match the version contained in the export
dump. Any automatic snapshot captures should be disabled. First of all, existing snapshots
need to be removed from the Statspack repository by running the script sptrunc.sql.
$ sqlplus perfstat/secret @sptrunc
Connected to:
Oracle9i Enterprise Edition Release 9.2.0.1.0 - Production

Warning
~~~~~~~
Running sptrunc.sql removes ALL data from Statspack tables. You may
wish to export the data before continuing.

About to Truncate Statspack Tables
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If you would like to continue, press <return>
Enter value for return:
Entered - starting truncate operation
Table truncated.
…
Truncate operation complete
SQL> EXIT

13. The script sptrunc.sql does not truncate the table STATS$IDLE_EVENT.
Next, referential integrity constraints are disabled, since these would be violated during
the import process. The imp utility re-enables them at the end of the import run. Disabling
constraints is achieved by generating a SQL script that contains ALTER TABLE DISABLE CONSTRAINT
statements with SQL*Plus. The SQL script is named gen_disable_sp_constr.sql and its contents
are shown here:
set linesize 200
set trimout on
set trimspool on
set heading off
set pagesize 0
set feedback off
spool disable.sql
select 'ALTER TABLE perfstat.' || table_name || ' DISABLE CONSTRAINT ' ||
constraint_name || ';'
from dba_constraints
where owner='PERFSTAT' and constraint_type='R';
prompt exit
exit
Next, run the script as user PERFSTAT.
$ sqlplus -s perfstat/secret @gen_disable_sp_constr.sql
ALTER TABLE perfstat.STATS$BG_EVENT_SUMMARY
DISABLE CONSTRAINT STATS$BG_EVENT_SUMMARY_FK;
…
ALTER TABLE perfstat.STATS$WAITSTAT DISABLE CONSTRAINT STATS$WAITSTAT_FK;
SQL*Plus writes the generated ALTER TABLE statements to the file disable.sql. Running
disable.sql disables all referential integrity constraints in schema PERFSTAT.
$ sqlplus -s perfstat/secret @disable
Table altered.
…
Table altered.
At this point, the schema is ready for importing past snapshot data. Note that the import
option IGNORE=Y is used to import into existing tables. Import will signal several ORA-00001 and
ORA-02264 errors. These are irrelevant and should be ignored.
$ imp system/secret file=perfstat.dmp full=y ignore=y log=imp.log
Import: Release 9.2.0.1.0 - Production on Wed Sep 19 18:09:00 2007
. importing PERFSTAT's objects into PERFSTAT
…
ORA-00001: unique constraint (PERFSTAT.STATS$IDLE_EVENT_PK) violated
Column 1 smon timer
…
. . importing table        "STATS$STATSPACK_PARAMETER"          1 rows imported
IMP-00017: following statement failed with ORACLE error 2264:
 "ALTER TABLE "STATS$STATSPACK_PARAMETER" ADD CONSTRAINT "STATS$STATSPACK_P_P"
 "IN_CK" CHECK (pin_statspack in ('TRUE', 'FALSE')) ENABLE NOVALIDATE"
IMP-00003: ORACLE error 2264 encountered
ORA-02264: name already used by an existing constraint
…
. . importing table                  "STATS$WAITSTAT"        126 rows imported
…
About to enable constraints...
Import terminated successfully with warnings.
Now we are ready to run Statspack reports. Note that the script spreport.sql cannot be
used, since it only considers snapshots taken by the current instance. Instead, its companion
script sprepins.sql must be used. The latter script is called by spreport.sql once it has determined
the database identifier and instance number. With sprepins.sql, setting both of these
figures is a manual process. When sprepins.sql is run, it lists the database instances in
STATS$DATABASE_INSTANCE. Select the desired instance, preferably by copying and pasting.
SQL> @sprepins

Instances in this Statspack schema
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

   DB Id    Inst Num DB Name      Instance     Host
----------- -------- ------------ ------------ ------------
 4273840935        1 PRODDB       IPROD        dbserver3

Enter value for dbid: 4273840935
Using 4273840935 for database Id
Enter value for inst_num: 1
Using 1 for instance number

Completed Snapshots

                                    Snap
Instance     DB Name         Snap Id   Snap Started    Level Comment
------------ ------------ ---------- ----------------- ----- --------
PR02         PROD                104 15 Aug 2006 16:30    10
…
                                 202 22 Aug 2006 09:15    10
                                 203 22 Aug 2006 09:30    10
At this point, the script asks for the beginning and ending snapshot identifiers and the
report file name in the usual manner. Once database identifier and instance number have been
entered, there is no difference between reporting on snapshots captured by the current instance
and reporting on imported snapshot data.
Source Code Depot
Table 25-4 lists this chapter's source files and their functionality.

Table 25-4. Statspack Source Code Depot

File Name                  Functionality

gen_disable_sp_constr.sql  Generates a SQL script that disables all referential integrity
                           constraints in schema PERFSTAT

snap_by_cpu_util.sql       Lists snapshot intervals with high CPU usage

snap_by_db_time.sql        Lists snapshot intervals with high DB time

sp_sqltext.sql             PL/SQL pipelined table function for retrieval of syntactically
                           correct (i.e., no forced line breaks) SQL statements from the
                           Statspack repository

sp_sqltext_get.sql         SQL script that calls the pipelined table function
                           SITE_SYS.SP_SQLTEXT

sp_sqltext_join.sql        Retrieves statements which had an elapsed time of more than
                           one second per execution from a Statspack repository

sp_translate_hv.sql        Translates an Oracle10g hash value from a SQL trace file to the
                           old hash value for use with Statspack and to the SQL identifier
                           (SQL_ID) for use with AWR by querying STATS$SQL_PLAN_USAGE

sp_used_indexes.sql        Identifies used indexes in a schema; requires snapshot level 6
                           or higher

sp_valid_intervals.sql     View for retrieving valid snapshot intervals using the analytic
                           function LAG

sprepins_fix_10g.sql       Fix for incorrect session level reports in Oracle10g (bug 5145816);
                           replaces the original file sprepins.sql

sprepins_fix_9i.sql        Fix for incorrect session level reports in Oracle9i (bug 5145816);
                           replaces the original file sprepins.sql

sql_fulltext.sql           SQL*Plus script for retrieving a SQL statement identified by hash
                           value from V$SQL (requires at least Oracle10g)

statspack_purge.sql        Snapshot purge PL/SQL procedure for Oracle9i
CHAPTER 26
■■■
Integrating Extended SQL Trace and AWR

The Automatic Workload Repository (AWR) and SQL trace files both capture performance-relevant
data on SQL and PL/SQL execution. It is undocumented that both data sources may be linked
to answer questions that frequently arise during investigations of performance problems. By
integrating SQL trace data with AWR, it is possible to find out whether different execution plans
were used for a particular statement and at what time. Furthermore, contrary to EXPLAIN PLAN,
the AWR is a reliable source for execution plans when plans for certain statements are absent
from SQL trace files.
Retrieving Execution Plans
Unlike Statspack release 10.2, which captures the SQL ID as well as the old (V$SQL.OLD_HASH_VALUE)
and new hash value (V$SQL.HASH_VALUE) for SQL statement texts, AWR captures merely the
SQL ID from V$SQL. In Oracle10g, this poses a slight problem for the retrieval of past execution
plans and statistics for statements captured by SQL trace files. The reason is that Oracle10g
trace files do not include the SQL ID from V$SQL. Oracle11g SQL trace files include the new
parameter sqlid (see “PARSING IN CURSOR Entry Format” in Chapter 24), which corresponds
to the SQL ID from V$SQL. Thus, the issue of mapping a SQL statement text to the SQL ID has
been resolved in Oracle11g.
Note that execution plans are only emitted to SQL trace files when cursors are closed, such
that it is possible to encounter trace files that do not contain execution plans for certain
statements. If such statements have been aged out of the shared pool by the time the absence of an
execution plan becomes evident, then AWR or Statspack (see “Execution Plans for Statements
Captured with SQL Trace” in Chapter 25) are the only options for retrieving the plan. Occasionally
the optimizer chooses different plans for the same statement over time. One execution
plan might result in a very poor response time while another may cause an appropriate response
time. The procedure presented next shows how to retrieve all plans for a SQL statement captured
by SQL trace and AWR. The five-way join of tables in the sample schema HR depicted in
“Retrieving the Text of Captured SQL Statements” in Chapter 25 is used as an example. In releases
prior to Oracle11g, we need to start by determining the SQL ID for a statement, since those
releases do not emit it to the SQL trace file. Since AWR captures the SQL statement text as a
character large object (CLOB) from V$SQL.SQL_FULLTEXT, this is accomplished by searching for
the statement text or portions of it with DBMS_LOB. Statements captured by AWR are stored in
the data dictionary base table WRH$_SQLTEXT and may be accessed through the view
DBA_HIST_SQLTEXT (script awr_sqltext.sql).
SQL> SET LONG 1048576
SQL> COLUMN sql_text FORMAT a64 WORD_WRAPPED
SQL> SELECT sql_id, sql_text
     FROM dba_hist_sqltext
     WHERE dbms_lob.instr(sql_text, '&pattern', 1, 1) > 0;
Enter value for pattern: FROM hr.employees emp, hr.employees mgr

SQL_ID        SQL_TEXT
------------- ------------------------------------------------------------
1yw85nghurbkk SELECT emp.last_name, emp.first_name, j.job_title,
              d.department_name, l.city,
              l.state_province, l.postal_code, l.street_address, emp.email,
              emp.phone_number, emp.hire_date, emp.salary, mgr.last_name
              FROM hr.employees emp, hr.employees mgr, hr.departments d,
              hr.locations l, hr.jobs j
              WHERE emp.manager_id=mgr.employee_id
              AND emp.department_id=d.department_id
              AND d.location_id=l.location_id
              AND emp.job_id=j.job_id
Having retrieved the SQL ID, we may now search for AWR snapshots that captured the
statement. The view DBA_HIST_SQLSTAT not only contains the snapshot identifiers, but also
gives access to execution statistics, hash values of execution plans, and the optimizer
environment used (script awr_sqlstat.sql).
SQL> SELECT st.snap_id,
            to_char(sn.begin_interval_time,'dd. Mon yy hh24:mi') begin_time,
            st.plan_hash_value, st.optimizer_env_hash_value opt_env_hash,
            round(st.elapsed_time_delta/1000000,2) elapsed,
            round(st.cpu_time_delta/1000000,2) cpu,
            round(st.iowait_delta/1000000,2) iowait
     FROM dba_hist_sqlstat st, dba_hist_snapshot sn
     WHERE st.snap_id=sn.snap_id
     AND st.sql_id='1yw85nghurbkk'
     ORDER BY st.snap_id;

SNAP_ID BEGIN_TIME       PLAN_HASH_VALUE OPT_ENV_HASH ELAPSED    CPU IOWAIT
------- ---------------- --------------- ------------ ------- ------ ------
     72 13. Oct 07 21:39      4095786543    611815770    1.28    .05   1.21
     73 13. Oct 07 21:42      4095786543    611815770     .32    .06    .27
     73 13. Oct 07 21:42      3985860841   3352456078    1.82    .38   1.60
     81 15. Oct 07 11:24      4095786543    611815770     .16    .06    .10
The fact that the columns PLAN_HASH_VALUE and OPT_ENV_HASH in the query result are not
unique for the single SQL ID “1yw85nghurbkk” proves that multiple plans for the same statement
have been used and that the statement was run with different optimizer parameter settings.
Actually, a single parameter used by the optimizer, namely OPTIMIZER_INDEX_COST_ADJ, which
was varied between 100 (default) and 10000, is responsible for the effect shown. The increase
of OPTIMIZER_INDEX_COST_ADJ caused the optimizer to consider index access as 100 times more
expensive. As a consequence, the optimizer chose a plan with full table scans and hash joins
instead of index accesses and nested loops.
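As an illustration, the two optimizer environments could have been produced by toggling the parameter at session level; the following commands are merely a sketch, not a transcript of the actual test:

SQL> ALTER SESSION SET optimizer_index_cost_adj=100;   -- default; favors index access
SQL> -- run the five-way join, then make index access 100 times more expensive
SQL> ALTER SESSION SET optimizer_index_cost_adj=10000;
SQL> -- running the same statement again may now yield full table scans and hash joins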
There are two approaches for retrieving plans from an AWR repository:

• The pipelined table function DBMS_XPLAN.DISPLAY_AWR

• The AWR SQL statement report script $ORACLE_HOME/rdbms/admin/awrsqrpt.sql

Unless you wish to retrieve query block names for use in hints—these are not displayed by
the AWR report—the second approach is the preferred one, since it not only contains all plans
for a specific SQL ID, but also includes execution statistics. The call to DBMS_XPLAN.DISPLAY_AWR
requires the SQL ID, plan hash value, and the database identifier (V$DATABASE.DBID) as input
parameters. Values for the first two parameters have already been retrieved from
DBA_HIST_SQLSTAT, so solely the database identifier must be queried before DBMS_XPLAN can be called.
SQL> SELECT dbid FROM v$database;

      DBID
----------
2870266532

SQL> SELECT * FROM
TABLE (dbms_xplan.display_awr('1yw85nghurbkk', 4095786543, 2870266532, 'ALL'));

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------
SQL_ID 1yw85nghurbkk
--------------------
SELECT emp.last_name, emp.first_name, j.job_title, d.department_name, l.city,
…
Plan hash value: 4095786543

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------
| Id  | Operation        | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |      |       |       |     9 (100)|          |
|   1 |  NESTED LOOPS    |      |   105 | 18060 |      9 (12)| 00:00:01 |
…

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1

PLAN_TABLE_OUTPUT
-------------------------------
   5 - SEL$1 / EMP@SEL$1
…
The AWR report script awrsqrpt.sql asks for the SQL ID and the beginning and end snapshot
identifiers. These values were previously retrieved from DBA_HIST_SQLSTAT. Figure 26-1
depicts an excerpt of an AWR SQL report in HTML format with two execution plans for a single
SQL ID.
Figure 26-1. AWR SQL report with multiple plans for a single statement
Lessons Learned
Retrieving execution plans for SQL statements captured with SQL trace from AWR is useful for
retrieving plans absent from SQL trace files or for comparisons of current and past execution
plans. Execution plans may change over time for various reasons such as these:
• Changes in optimizer parameters
• Updated optimizer statistics (DBMS_STATS or ANALYZE)
• Software upgrades or downgrades
• Partitioning of tables or indexes that were previously not partitioned
• Bind variable peeking
The hash value calculated on the optimizer environment and stored in the AWR repository
may be used as evidence that a plan may have changed due to different parameter settings at
instance or session level. Since updated optimizer statistics may cause plans to change, I
recommend saving statistics in a statistics table using the packaged procedure
DBMS_STATS.EXPORT_SCHEMA_STATS before overwriting them. The package DBMS_STATS includes the
procedure CREATE_STAT_TABLE for creating statistics tables. In Oracle10g, automatic statistics
recalculation for a schema may be prevented with DBMS_STATS.LOCK_SCHEMA_STATS.
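As a minimal sketch, the following calls save the optimizer statistics of the sample schema HR into a user-defined statistics table and subsequently lock the schema statistics; the statistics table name SAVED_STATS is merely illustrative:

SQL> EXECUTE dbms_stats.create_stat_table(ownname=>'HR', stattab=>'SAVED_STATS')
SQL> EXECUTE dbms_stats.export_schema_stats(ownname=>'HR', stattab=>'SAVED_STATS')
SQL> EXECUTE dbms_stats.lock_schema_stats(ownname=>'HR')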
Source Code Depot
Table 26-1 lists this chapter's source files and their functionality.

Table 26-1. Extended SQL Trace and AWR Source Code Depot

File Name        Functionality

awr_sqltext.sql  Retrieves the SQL ID and full statement text from DBA_HIST_SQLTEXT
                 based on a text subset of a SQL statement. The SQL ID may be passed
                 to the AWR script awrsqrpt.sql as input.

awr_sqlstat.sql  Retrieves AWR snapshots that captured a SQL statement with a certain
                 SQL ID. Execution statistics, such as elapsed time and CPU usage, are
                 displayed along with hash values for the execution plan and the
                 optimizer environment.
CHAPTER 27
■■■
ESQLTRCPROF Extended SQL Trace Profiler

One might say that developing profiler tools for analyzing extended SQL trace files has been
en vogue for the last few years. TKPROF, Oracle's own profiler, has seen only marginal
improvements since it first supported wait events in Oracle9i. Other tools offer a much more
profound analysis of trace files than TKPROF. First of all there is Oracle's free Trace Analyzer (see Metalink
note 224270.1). This PL/SQL-based tool creates a very detailed HTML report. The downside is
that it needs to be installed into a database, which may preclude an ad-hoc analysis of a production
system. Ideally it is run in the same instance that was traced, since it will then translate object
identifiers into the corresponding segment names.
Another freely available and useful tool is TVD$XTAT written by fellow Apress author
Christian Antognini. It is a Java program that also creates an HTML report. The ability to create
wait event histograms is one of its more advanced features. Unlike Trace Analyzer, it runs independently of an Oracle DBMS instance and is much faster. It cannot translate database object
identifiers within wait events into segment names of tables and indexes.
ESQLTRCPROF is my own Perl-based profiler for Oracle9i, Oracle10g, and Oracle11g
extended SQL trace files. ESQLTRCPROF is capable of parsing the undocumented extended
SQL trace file format. It calculates a resource profile for an entire SQL trace file and for each
cursor in the trace file. It appears to be the only profiler that supports think time as a separate
contributor to response time. It also breaks down elapsed time, CPU time, and wait time by
recursive call depth, thus pointing out at which call depth most of the response time is consumed.
Like the other profilers mentioned, ESQLTRCPROF is a replacement for TKPROF, since it
addresses several shortcomings of Oracle Corporation’s own official SQL trace analysis tool.
Categorizing Wait Events
The ultimate goal is to automatically create a resource profile from an extended SQL trace file.
Even though the file format has been discussed in Chapter 24, some more preparations are
necessary to achieve this goal. An issue that has not yet been discussed is the categorization
of wait events into intra database call wait events and inter database call wait events. This is
necessary for correct response time accounting. Intra database call wait events occur within
the context of a database call. The code path executed to complete a database call consists not
only of CPU consumption, but may also engender waiting for resources such as disks, latches,
or enqueues. Time spent waiting within a database call is accounted for by intra database call
wait events. Examples of such wait events are latch free, enqueue, db file sequential read, db file
scattered read, and buffer busy waits. In fact, most wait events are intra database call wait events.
Inter database call wait events occur when the DBMS server is waiting to receive the next database call. In other words, the DBMS server is idle, since the client does not send a request.
According to Millsap and Holt ([MiHo 2003], page 88), the following wait events are inter
(or between) database call wait events:
• SQL*Net message from client
• SQL*Net message to client
• pmon timer
• smon timer
• rdbms ipc message
Of these wait events, pmon timer, smon timer, and rdbms ipc message occur solely in background
processes. Thus, the only inter database call wait events relevant to tuning
an application are SQL*Net message from client and SQL*Net message to client.1
Figure 27-1 is a graphical representation of database calls, wait events, and transaction
entries (XCTEND) from an extended SQL trace file. The X axis represents time (t), while the Y axis
represents the recursive call depth (dep). Inter database call wait events are depicted in white
font against a dark background. Note that all of these wait events are either associated with a
cursor number n (n > 0) at recursive call depth 0 or the default cursor 0.
1. The list in [MiHo 2003] also contains pipe get and single-task message. I have omitted pipe get, since
my testing showed that this wait event occurs when an execute database call on the PL/SQL package
DBMS_PIPE is made. When DBMS_PIPE.RECEIVE_MESSAGE is called with a non-zero timeout, time spent
waiting for a message is accounted for with the wait event pipe get and is rolled up into the parameter
e of the associated EXEC entry. Thus, pipe get is an intra database call wait event.
All implementations of the ORACLE DBMS since Oracle9i are two-task implementations only, i.e., server
process and client process are separate tasks, each running in its own address space. Hence the wait
event single-task message is no longer used (see also Metalink note 62227.1).
[Figure 27-1 plots the PARSE, EXEC, and FETCH database calls, WAIT entries, and XCTEND transaction entries of an extended SQL trace file against time (t) on the horizontal axis and recursive call depth (dep) on the vertical axis; inter database call wait events such as SQL*Net msg to/from client are highlighted at dep=0.]
Figure 27-1. Recursive call depth and inter database call wait events2
Calculating Response Time and Statistics
According to Millsap and Holt ([MiHo 2003]), the response time R represented by a SQL trace
file is defined as the sum of the elapsed time spent in database calls (e values) at recursive call
depth 0 (dep=0) plus the sum of all ela values from inter database call wait events. The wait time
(ela values) accumulated while processing a database call is rolled up into the parameter e of
the database call that engendered the wait. The categorization of wait events discussed in the
previous section is applied in the calculation of R. Time spent waiting for intra database call
wait events must not be added to R, since this would result in double counting. The e values of
a database call already contain the wait time of all intra database call wait events. Database
calls are emitted to trace files upon completion. This is why WAIT entries for intra database call
wait events appear before the PARSE, EXEC, and FETCH entries that engendered them.
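For instance, in the case study trace file reproduced in the next section, the db file sequential read wait in line 36 (ela= 33297) precedes the EXEC entry in line 38 that engendered it, and its wait time is already contained in that call's e value (e=1525863).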
Runtime statistics, such as consistent reads, physical writes, and db block gets at recursive
call depths other than zero are rolled up into PARSE, EXEC, and FETCH calls at recursive call depth
0. Just like ela values of intra database call wait events, these must not be double counted. To
promote a thorough understanding of how an extended SQL trace profiler calculates a resource
profile from a trace file, I would like to walk you through a manual calculation of some figures
based on a small trace file.

2. The abbreviation msg is used instead of the word message in wait event names.
Case Study
This case study is based on a small Perl DBI program called insert_customer.pl. The program
inserts a single row into the table CUSTOMER. Each customer is identified by a unique number,
which is generated by the sequence CUSTOMER_ID_SEQ. An INSERT trigger is used to assign the
next value from the sequence to the column ID of the table. I have deliberately created the
sequence with the NOCACHE option, since this causes recursive SQL (i.e., dep values greater than
zero).3 The purpose of the case study is twofold:
• To demonstrate how the response time R is calculated.
• To provide evidence that runtime statistics at recursive call depths greater than zero are
rolled up into statistics at higher recursive call depths and ultimately at recursive call
depth 0.
Remember that statements executed directly by a database client have recursive call depth
zero. The program pauses twice to allow the user to query the dynamic performance view
V$SESSTAT, once before enabling SQL trace and once just prior to disconnection from the
ORACLE instance. Since V$SESSTAT is not affected by double counting as discussed previously,
it holds an accurate representation of session level statistics.
Running the Perl Program
To repeat the case study, open two terminal windows: one for running the Perl program and
another one for querying V$SESSTAT. Run insert_customer.pl in the first terminal window.4

$ insert_customer.pl
Hit return to continue
While the program waits for input, query V$SESSTAT in the second window.
SQL> SELECT n.name, s.value
     FROM v$sesstat s, v$statname n, v$session se
     WHERE s.statistic#=n.statistic#
     AND n.name IN ('db block gets', 'consistent gets')
     AND s.sid=se.sid
     AND se.program LIKE 'perl%';

NAME                VALUE
------------------ ------
db block gets           0
consistent gets       195
3. Do not use NOCACHE in any real-world applications, since it degrades performance.
4. Use the Perl DBI environment variables DBI_USER, DBI_PASS, and DBI_DSN to specify user name,
password, and data source or connect string (see Chapter 22).
Now hit return in the first window. The program enables SQL trace and inserts a single
row. After a moment the program asks for input again.
Hit return to continue
At this point, the INSERT statement and a subsequent COMMIT have been completed. Another
query on V$SESSTAT reveals that the figures for db block gets and consistent gets have risen to 9
and 197 respectively.
SQL> SELECT n.name, s.value
     FROM v$sesstat s, v$statname n, v$session se
     WHERE s.statistic#=n.statistic#
     AND n.name IN ('db block gets', 'consistent gets')
     AND s.sid=se.sid
     AND se.program='perl.exe';

NAME                VALUE
------------------ ------
db block gets           9
consistent gets       197
You may now hit return for the second time. The program disconnects and terminates.
Subtracting the initial figures from the final figures yields nine db block gets and two consistent
gets. In theory, the extended SQL trace file should contain the same figures; however, small
discrepancies are to be expected.
The trace file that results from this test is small enough to evaluate manually. Note that the
trace file which results the first time you run the test may be larger than the file reproduced
below, since the dictionary cache and the library cache need to be loaded. Except for some
header information, the complete trace file (with line numbers added) is depicted here:
 1 Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - Production
 2 With the Partitioning, Oracle Label Security, OLAP and Data Mining options
 3
 4 *** ACTION NAME:() 2007-11-20 15:39:38.546
 5 *** MODULE NAME:(insert_customer.pl) 2007-11-20 15:39:38.546
 6 *** SERVICE NAME:(TEN.oradbpro.com) 2007-11-20 15:39:38.546
 7 *** SESSION ID:(44.524) 2007-11-20 15:39:38.546
 8 =====================
 9 PARSING IN CURSOR #2 len=68 dep=0 uid=61 oct=42 lid=61 tim=789991633616
   hv=740818757 ad='6be3972c'
10 alter session set events '10046 trace name context forever, level 8'
11 END OF STMT
12 EXEC #2:c=0,e=98,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=789991633607
13 WAIT #2: nam='SQL*Net message to client' ela= 5 driver id=1413697536 #bytes=1
   p3=0 obj#=-1 tim=789991638001
14 WAIT #2: nam='SQL*Net message from client' ela= 569 driver id=1413697536 #bytes=1
   p3=0 obj#=-1 tim=789991638751
15 =====================
16 PARSING IN CURSOR #1 len=87 dep=0 uid=61 oct=2 lid=61 tim=789991639097
   hv=2228079888 ad='6cad992c'
17 INSERT INTO customer(name, phone) VALUES (:name, :phone)
18    RETURNING id INTO :id
19 END OF STMT
20 PARSE #1:c=0,e=84,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=789991639091
21 =====================
22 PARSING IN CURSOR #2 len=40 dep=1 uid=61 oct=3 lid=61 tim=789991640250
   hv=1168215557 ad='6cbaf25c'
23 SELECT CUSTOMER_ID_SEQ.NEXTVAL FROM DUAL
24 END OF STMT
25 PARSE #2:c=0,e=72,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,tim=789991640243
26 EXEC #2:c=0,e=62,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,tim=789991641167
27 =====================
28 PARSING IN CURSOR #3 len=129 dep=2 uid=0 oct=6 lid=0 tim=789991641501
   hv=2635489469 ad='6bdb9be8'
29 update seq$ set increment$=:2,minvalue=:3,maxvalue=:4,cycle#=:5,order$=:6,
   cache=:7,highwater=:8,audit$=:9,flags=:10 where obj#=:1
30 END OF STMT
31 PARSE #3:c=0,e=68,p=0,cr=0,cu=0,mis=0,r=0,dep=2,og=4,tim=789991641494
32 EXEC #3:c=0,e=241,p=0,cr=1,cu=2,mis=0,r=1,dep=2,og=4,tim=789991642567
33 STAT #3 id=1 cnt=1 pid=0 pos=1 obj=0 op='UPDATE SEQ$ (cr=1 pr=0 pw=0
   time=195 us)'
34 STAT #3 id=2 cnt=1 pid=1 pos=1 obj=102 op='INDEX UNIQUE SCAN I_SEQ1
   (cr=1 pr=0 pw=0 time=25 us)'
35 FETCH #2:c=0,e=1872,p=0,cr=1,cu=3,mis=0,r=1,dep=1,og=1,tim=789991643213
36 WAIT #1: nam='db file sequential read' ela= 33297 file#=4 block#=127140 blocks=1
   obj#=54441 tim=789993165434
37 WAIT #1: nam='SQL*Net message to client' ela= 5 driver id=1413697536 #bytes=1
   p3=0 obj#=54441 tim=789993165747
38 EXEC #1:c=1500000,e=1525863,p=1,cr=2,cu=8,mis=0,r=1,dep=0,og=1,tim=789993165858
39 WAIT #1: nam='SQL*Net message from client' ela= 232 driver id=1413697536 #bytes=1
   p3=0 obj#=54441 tim=789993166272
40 XCTEND rlbk=0, rd_only=0
41 WAIT #0: nam='log file sync' ela= 168 buffer#=5320 p2=0 p3=0
   obj#=54441 tim=789993166718
42 WAIT #0: nam='SQL*Net message to client' ela= 2 driver id=1413697536 #bytes=1
   p3=0 obj#=54441 tim=789993166829
43 *** 2007-11-20 15:39:49.937
44 WAIT #0: nam='SQL*Net message from client' ela= 9864075 driver id=1413697536
   #bytes=1 p3=0 obj#=54441 tim=790003031019
45 XCTEND rlbk=0, rd_only=1
46 STAT #2 id=1 cnt=1 pid=0 pos=1 obj=53073 op='SEQUENCE CUSTOMER_ID_SEQ
   (cr=1 pr=0 pw=0 time=1878 us)'
47 STAT #2 id=2 cnt=1 pid=1 pos=1 obj=0 op='FAST DUAL (cr=0 pr=0 pw=0 time=15 us)'
Calculating Statistics
Database call statistics at recursive call depths other than zero are rolled up into the statistics
at recursive call depth 0. To calculate the total number of db block gets in the trace file, we must
consider only cu parameter values of PARSE, EXEC, and FETCH entries with dep=0. The database
call parameter cu (for current read) corresponds to the statistic db block gets. The only cu value
at recursive call depth 0 that is greater than zero is in line 38 (cu=8). This is off by one from the
db block gets value retrieved from V$SESSTAT. Note that three db block gets have occurred at
recursive call depth 1 and below (line 35). Two db block gets were recorded at recursive call
depth 2 (line 32). The fact that the total number of db block gets as determined by querying
V$SESSTAT was nine confirms that database call statistics at lower levels are rolled up into statistics
at recursive call depth 0. The sum of cu values at any recursive call depth is 13. If the cu values
at recursive call depth n did not include the cu values at recursive call depth n+1, we would see
at least 13 db block gets in V$SESSTAT.
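Concretely: cu=2 in line 32 (dep=2) plus cu=3 in line 35 (dep=1) plus cu=8 in line 38 (dep=0) yields 13.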
Calculating Response Time
The response time R is defined as the sum of all e values at dep=0 plus the sum of all ela values
from inter database call wait events ([MiHo 2003], page 94).

   R = Σ e (dep=0) + Σ ela (inter db call)

The sum of all e values at dep=0 is derived from lines 12, 20, and 38.

   Σ e (dep=0) = 98 + 84 + 1525863 = 1526045

SQL*Net message from client and SQL*Net message to client are the inter database call wait
events present in the trace file. The sum of all ela values of inter database call wait events is
derived from lines 13, 14, 37, 39, 42, and 44.

   Σ ela (inter db call) = 5 + 569 + 5 + 232 + 2 + 9864075 = 9864888

Since e and ela values are in microseconds, R expressed in seconds is:

   R = (1526045 + 9864888) / 1000000 = 11.391 s

To check that the code path measured is sufficiently instrumented, it is always a good idea
to calculate the elapsed time covered by the trace file from the first and last tim values. The
lowest timestamp at the beginning of the trace file is 789991633616. The highest timestamp in
line 44 is 790003031019. Thus, the interval in seconds covered by the trace file is:

   (790003031019 - 789991633616) / 1000000 = 11.397 s

If there is a large discrepancy between R and the elapsed time covered by the trace file,
then the ORACLE kernel does not have instrumentation for part of the code path in place.
The elapsed time spent in database calls consists of CPU consumption and waiting. Ideally,
e for each database call at recursive call depth 0 would equal the CPU usage (c) plus the intra
database call wait time (ela).

   Σ e (dep=0) = Σ c (dep=0) + Σ ela (intra db call)

In practice, there is usually a difference between these two values. It is unknown where
time not accounted for by c and ela was spent. The unknown contribution to response time
(U) within an entire trace file may be calculated as shown here:

   U = Σ e (dep=0) - Σ c (dep=0) - Σ ela (intra db call)

In the example, the difference is very small, slightly more than 7 ms.

   ((98 + 84 + 1525863) - 1500000 - 33297) / 1000000 = -0.007252 s
Of course it is much more convenient to automate the calculation of these figures with an
extended SQL trace profiler such as ESQLTRCPROF. This is the subject of the next section.
ESQLTRCPROF Reference
ESQLTRCPROF is an extended SQL trace profiler written in Perl by the author. It has the
following features:
• Calculation of a resource profile for an entire SQL trace file
• Calculation of a resource profile for each SQL or PL/SQL statement in a trace file
• Categorization of the inter database call wait event SQL*Net message from client into
unavoidable latency due to network round-trips and think time
• Sorting of statements by total elapsed time (the sum of parse, execute, and fetch elapsed
time plus inter database call wait time except think time)
• Extraction of execution plans from STAT entries
• Calculation of various statistics, such as physical reads, consistent gets, db block gets,
transactions per second, and buffer cache hit ratio
• Apportionment of enqueue waits by individual enqueue
• Breakdown of latch waits by individual latch
• Inclusion of SQL statement hash values in the report for quickly locating statements in
trace files and integration with Statspack
• Inclusion of SQL statement identifiers (sqlid) in the report for quickly locating statements
in trace files and integration with AWR (requires Oracle11g trace files as input)
• Inclusion of module, action, and recursive call depth in the report
From the perspective of the database server, think time is elapsed time that a database
client accumulates without making any demands on the DBMS. In other words, the client is
not sending any requests to the DBMS server. By default, ESQLTRCPROF classifies any waits of
more than 5 ms for SQL*Net message from client as the pseudo wait event think time. For such
waits, think time will be displayed as a contributor to response time in resource profiles.
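In the case study trace file, for example, the SQL*Net message from client wait of 9864075 µs in line 44 far exceeds the default 5 ms threshold and is therefore classified as think time.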
Command Line Options
ESQLTRCPROF accepts three options that modify its processing. When called without any
arguments, it prints information on its usage, supported options, and their meanings.
$ esqltrcprof.pl
Usage: esqltrcprof.pl -v -r <ORACLE major release>.<version> -t <think time in
milliseconds> <extended sql trace file>
-v verbose output; includes instances of think time
-r value must be in range 8.0 to 10.2
The threshold beyond which SQL*Net message from client is classified as think time is
configured in milliseconds with the option -t (for threshold). The default threshold (5 ms) is
usually appropriate for local area networks (LANs), but needs to be increased for trace files
from database clients that connect over a wide area network (WAN). To suppress the categorization of SQL*Net message from client, perhaps for the sake of comparing figures with a TKPROF
report or other tools, set the think time threshold to an arbitrarily large value such as ten hours
(36000000 ms). This will make the pseudo wait event think time disappear from resource profiles.
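For example, to raise the threshold to 200 ms for a trace file from a WAN client, the profiler might be invoked as follows (the trace file name is merely illustrative):

$ esqltrcprof.pl -t 200 wan_client_ora_1234.trc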
The option -v (for verbose) is for printing instances of think time. When -v is used, each
time ESQLTRCPROF encounters an instance of think time in a trace file, it prints a message
with the length of the think time and the position in the trace file. Module and action are also
included in the message to point out where in the code path think time was detected.
Found 9.864 s think time (Line 64, Module 'insert_customer.pl' Action 'undefined')
Since ESQLTRCPROF also supports Oracle8, which merely had centisecond resolution of
e, c, and ela values, it determines the unit of timing data from the trace file header. The mapping
of latch numbers to latch names is also release dependent. Thus, ESQLTRCPROF refuses to
process trace files that lack release information in the header. The TRCSESS utility shipped
with Oracle10g creates trace files that lack a header. When ESQLTRCPROF is passed such a file
as input, it exits with the following error message:
$ esqltrcprof.pl insert_cust.trcsess
No trace file header found reading trace file up to line 9. ORACLE release unknown.
Please use switch -r to specify release
Usage: esqltrc_profiler.pl -v -r <ORACLE major release>.<version> -t <think time in
milliseconds> <extended sql trace file>
-v verbose output; includes instances of think time
-r value must be in range 8.0 to 10.2
The option -r (for release) is provided to let ESQLTRCPROF know the release of the ORACLE
DBMS that created the trace file.
$ esqltrcprof -r 10.2 insert_cust.trcsess
Assuming ORACLE release 10.2 trace file.
Resource Profile
================
Response time: 11.391s; max(tim)-min(tim): 11.397s
…
ESQLTRCPROF Report Sections
ESQLTRCPROF reports consist of a resource profile at session level, statistics at session level,
statement level resource profiles, and statement level statistics for each statement encountered in
the trace file. The report starts with the session level section entitled “Resource Profile”.
Session Level Resource Profile
The resource profile for the trace file used in the previous case study is reproduced in the following
code example. The R and db block gets figures that were calculated manually are identical to
the figures reported by ESQLTRCPROF. The 7 ms of unknown time also matches the manual
calculation. The report includes the difference between the maximum and minimum timestamp
(tim) value found as “max(tim)-min(tim)” next to the response time.
$ esqltrcprof.pl ten_ora_6720_insert_customer.pl.trc
ORACLE version 10.2 trace file. Timings are in microseconds (1/1000000 sec)
Warning: WAIT event 'log file sync' for cursor 0 at line 61 without prior PARSING IN
CURSOR #0 - all waits for cursor 0 attributed to default unknown statement
with hash value -1

Resource Profile
================
Response time: 11.391s; max(tim)-min(tim): 11.397s
Total wait time: 9.898s
----------------------------
Note: 'SQL*Net message from client' waits for more than 0.005s are considered think
time

Wait events and CPU usage:
Duration    Pct        Count    Average Wait Event/CPU Usage/Think Time
-------- ------ ------------ ---------- -----------------------------------
  9.864s 86.59%            1  9.864075s think time
  1.500s 13.17%            8  0.187500s total CPU
  0.033s  0.29%            1  0.033297s db file sequential read
  0.001s  0.01%            2  0.000400s SQL*Net message from client
  0.000s  0.00%            1  0.000168s log file sync
  0.000s  0.00%            3  0.000004s SQL*Net message to client
 -0.007s -0.06%                         unknown
-------- ------- ----------------------------------------------------------
 11.391s 100.00% Total response time
Total number of roundtrips (SQL*Net message from/to client): 3

CPU usage breakdown
-------------------
parse CPU: 0.00s (3 PARSE calls)
exec CPU:  1.50s (4 EXEC calls)
fetch CPU: 0.00s (1 FETCH calls)
The session level resource profile ends with detailed information on the apportionment
of CPU usage. Sessions that modify many rows have the highest CPU consumption in the “exec
CPU” category, whereas sessions that are mostly reading will have most of the CPU usage
accounted for as “fetch CPU”.
Session Level Statistics
The next report section contains statistical information. Transactions per second are calculated
based on R and XCTEND entries. Entries of the form XCTEND rlbk=0, rd_only=0 are used as
transaction end markers. The division of the number of transaction end markers encountered
by R yields transactions per second. What is a transaction? Of course, this is entirely dependent
on the application. Different applications perform different types of transactions. This figure
may only be used to compare the performance of the same application before and after tuning.
Note that the response time of a code path may be much improved after optimization, while
transactions per second may have dropped. This would happen after the elimination of
unnecessary commits.
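In the case study, a single read-write transaction end marker divided by R = 11.391 s yields 1/11.391 ≈ 0.088 transactions per second, matching the figure in the report excerpt below.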
Statistics:
-----------
COMMITs (read write): 1 -> transactions/sec 0.088
COMMITs (read only): 1
ROLLBACKs (read write): 0
ROLLBACKs (read only): 0
rows processed: 1
cursor hits (soft parses): 3
cursor misses (hard parses): 0
consistent gets: 2
db block gets: 8
physical reads: 1
buffer cache hit ratio: 90.00%

Physical read breakdown:
------------------------
single block: 1
multi-block: 0

Latch wait breakdown
--------------------

Enqueue wait breakdown (enqueue name, lock mode)
------------------------------------------------
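As a cross-check, the buffer cache hit ratio reported above is consistent with the standard formula 1 - physical reads / (consistent gets + db block gets) = 1 - 1 / (2 + 8) = 90.00%.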
All the statistics calculated and their sources are summarized in Table 27-1.

Table 27-1. Statistics

Statistic                  Source

Buffer cache hit ratio     cr, cu, and p in PARSE, EXEC, and FETCH entries

Consistent gets            cr in PARSE, EXEC, and FETCH entries

Cursor hits and misses     parameter mis in PARSE entries

Db block gets              cu in PARSE, EXEC, and FETCH entries

Enqueue waits              WAIT entries with nam='enqueue' or 'enq: enqueue_details', where
                           enqueue_details contains the name of an enqueue and a short
                           description (e.g., nam='enq: TX - contention')

Latch waits                WAIT entries with nam='latch free' or 'latch: latch_details', where
                           latch_details contains the name of a latch and a short description
                           (e.g., nam='latch: cache buffers chains')

Physical reads             p in PARSE, EXEC, and FETCH entries

Rows processed             r in PARSE, EXEC, and FETCH entries

Single block and           p3 (Oracle9i and prior releases) or blocks (Oracle10g) of the wait
multi-block reads          events db file sequential read and db file scattered read

Transactions per second    R and XCTEND rlbk=0, rd_only=0
The small trace file used in the previous case study did not contain any waits for enqueues
or latches. Hence the latch wait and enqueue wait sections in the preceding code example are
empty. Following is an excerpt of an ESQLTRCPROF report, which resulted from tracing database sessions that were concurrently enqueuing messages into an Advanced Queuing queue
table. The excerpt illustrates that the sessions were contending for both latches and enqueues.
Latch wait breakdown
--------------------
row cache objects          waits: 1  sleeps: 0
library cache pin          waits: 2  sleeps: 0
commit callback allocation waits: 1  sleeps: 0
cache buffers chains       waits: 2  sleeps: 1
dml lock allocation        waits: 1  sleeps: 0
library cache              waits: 9  sleeps: 4
enqueue hash chains        waits: 1  sleeps: 0

Enqueue wait breakdown (enqueue name, lock mode)
------------------------------------------------
HW,X waits: 4
TX,S waits: 7
The enqueue HW was requested in exclusive mode (X), whereas the enqueue TX was
requested in shared mode (S). The enqueue HW (high water mark) is requested when a segment
needs to grow, while the enqueue TX is used to serialize write access to rows and the interested
transaction list (ITL) at the beginning of a database block.
In Oracle10g, the fixed view V$ENQUEUE_STATISTICS may be used to generate a list of
208 enqueue types and names along with descriptions and one or more reasons why each enqueue
may be requested. Table 27-2 contains such a list for some of the more common enqueue types.
Table 27-2. Enqueue Types

Type  Name                 Description                                        Reason for Getting Enqueue

CF    Controlfile          Synchronizes accesses to the control file          contention
      Transaction

CI    Cross-Instance       Coordinates cross-instance function invocations    contention
      Call Invocation

DX    Distributed          Serializes tightly coupled distributed             contention
      Transaction          transaction branches

HV    Direct Loader        Lock used to broker the high water mark            contention
      High Water Mark      during parallel inserts

HW    Segment High         Lock used to broker the high water mark            contention
      Water Mark           during parallel inserts

JQ    Job Queue            Lock to prevent multiple instances from            contention
                           running a single job

JS    Job Scheduler        Lock to prevent job from running elsewhere         job run lock-synchronize

JS    Job Scheduler        Lock to recover jobs running on crashed RAC inst   job recov lock

JS    Job Scheduler        Lock on internal scheduler queue                   queue lock

JS    Job Scheduler        Scheduler non-global enqueues                      sch locl enqs

JS    Job Scheduler        Lock obtained when cleaning up q memory            q mem clnup lck

JS    Job Scheduler        Lock got when adding subscriber to event q         evtsub add

JS    Job Scheduler        Lock got when dropping subscriber to event q       evtsub drop

JS    Job Scheduler        Lock got when doing window open/close              wdw op

JS    Job Scheduler        Lock got during event notification                 evt notify

JS    Job Scheduler        Synchronizes accesses to the job cache             contention

SQ    Sequence Cache       Lock to ensure that only one process can           contention
                           replenish the sequence cache

SS    Sort Segment         Ensures that sort segments created during          contention
                           parallel DML operations aren't prematurely
                           cleaned up

ST    Space Transaction    Synchronizes space management activities           contention
                           in dictionary-managed tablespaces

TM    DML                  Synchronizes accesses to an object                 contention

TS    Temporary Segment    Serializes accesses to temp segments               contention

TX    Transaction          Lock held on an index during a split to            index contention
                           prevent other operations on it

TX    Transaction          Lock held by a transaction to allow other          contention
                           transactions to wait for it

TX    Transaction          Lock held on a particular row by a transaction     row lock contention
                           to prevent other transactions from modifying it

TX    Transaction          Allocating an ITL entry in order to begin          allocate ITL entry
                           a transaction

UL    User-defined         Lock used by user applications5                    contention

US    Undo Segment         Lock held to perform DDL on the undo segment       contention
The table’s rows were generated by the following query:
SQL> SELECT eq type, eq name, req description, req reason
FROM v$enqueue statistics
WHERE eq type IN ('CF', 'CI', 'DX', 'HV', 'HW', 'JQ', 'JS', 'SQ',
'SS', 'ST', 'TM', 'TS', 'TX', 'UL', 'US')
ORDER BY eq type;
If you encounter an enqueue that is neither documented nor listed in Table 27-2, then
querying V$ENQUEUE_STATISTICS will give you a first impression of what the enqueue is used for
and why the application you are optimizing may have requested the enqueue. Many enqueue
types are identical in Oracle9i and Oracle10g, so there is a good chance that you may find
information that also applies to Oracle9i.
Statement Level Resource Profiles
The last section of an ESQLTRCPROF report contains the SQL statements encountered,
statement level resource profiles, statement level statistics, and execution plans. Each statement is
depicted in a separate subsection, which begins with the statement's hash value and the
statement's total elapsed time. The hash value is normally unique and is useful for quickly locating
a statement in a trace file. Furthermore, the hash value may be used to retrieve current and
past execution plans of a particular statement from a Statspack repository (see Chapter 25).

5. Waits for the enqueue UL occur when an application implements synchronization with the package
DBMS_LOCK.
Statement Sort Order
Unlike TKPROF, ESQLTRCPROF does not have any command line options for sorting. At first
glance, this may seem like a disadvantage; however this is one of the strengths of ESQLTRCPROF.
ESQLTRCPROF always sorts statements by the total elapsed time attributed to each statement.
At the extended SQL trace file level, this means that ESQLTRCPROF calculates the total elapsed
time by summing up the e values of PARSE, EXEC, and FETCH entries of a certain cursor and adds
the ela values of inter database call wait events except for SQL*Net message from client waits,
which are classified as think time. Think time is ignored, since it is not indicative of a problem
with the DBMS instance. This approach is superior to the way TKPROF sorts statements on the
following three counts:
• TKPROF sorts either by execute elapsed time (exeela) or fetch elapsed time (fchela), but
not by the entire elapsed time of PARSE, EXEC, and FETCH entries. For the purpose of sorting, it
is irrelevant where the time was spent, since response time is what matters.
• TKPROF ignores inter database call wait events when sorting statements.
• TKPROF subtracts the recursive resource utilization (dep values greater than zero) when
reporting on statements at recursive call depth 0. This makes it impossible to sort by the
most expensive statements executed by the client.
In my view, it is more appropriate not to subtract recursive resource utilization. Then a
statement sent by a client (dep=0) appears higher in the sort order than recursive statements
engendered by that same statement. Thus, it is evident which statements merit a closer look
and are candidates for drilling down to higher recursive call depths. Ideally, ESQLTRCPROF
would create a separate report section that depicts the recursive relationship among statements, but this is beyond the capabilities of the current version.
ESQLTRCPROF has a peculiar feature for dealing with trace file entries that relate to cursor
number 0 as well as for cursors that lack a PARSING IN CURSOR entry. The latter phenomenon may
be seen when tracing is begun while an application is in mid-flight. There may be EXEC, FETCH,
and WAIT entries for a cursor without a corresponding PARSING IN CURSOR entry. Thus, the SQL
statement text for such a cursor cannot be determined.6 There is never a PARSING IN CURSOR
entry for cursor 0. LOB access with OCI may also be attributed to cursor 0. Instead of ignoring
entries pertaining to such cursors, which is what TKPROF does in the section on individual
statements, ESQLTRCPROF defines a default cursor with the impossible “hash value” -1 to
account for any such trace file entries. The ESQLTRCPROF report based on the trace file from
the previous section's case study is continued in the next code example. Note that even this
short trace file contained almost ten seconds of think time and a log file sync, which were
attributed to cursor 0.

6. The measurement scripts sp_capture.sql and awr_capture.sql presented in Chapter 28 create a level 2
error stack dump, since this dump may contain the missing SQL statement texts.
Statements Sorted by Elapsed Time (including recursive resource utilization)
==============================================================================
Hash Value: 2228079888 - Total Elapsed Time (excluding think time): 1.526s
INSERT INTO customer(name, phone) VALUES (:name, :phone)
  RETURNING id INTO :id
DB Call    Count    Elapsed        CPU     Disk    Query  Current     Rows
------- -------- ---------- ---------- -------- -------- -------- --------
PARSE          1    0.0001s    0.0000s        0        0        0        0
EXEC           1    1.5259s    1.5000s        1        2        8        1
FETCH          0    0.0000s    0.0000s        0        0        0        0
------- -------- ---------- ---------- -------- -------- -------- --------
Total          2    1.5259s    1.5000s        1        2        8        1

Wait Event/CPU Usage/Think Time            Duration    Count
---------------------------------------- ---------- --------
total CPU                                     1.500s        2
db file sequential read                       0.033s        1
SQL*Net message from client                   0.000s        1
SQL*Net message to client                     0.000s        1
Hash Value: 1168215557 - Total Elapsed Time (excluding think time): 0.002s
SELECT CUSTOMER_ID_SEQ.NEXTVAL FROM DUAL

DB Call    Count    Elapsed        CPU     Disk    Query  Current     Rows
------- -------- ---------- ---------- -------- -------- -------- --------
PARSE          1    0.0001s    0.0000s        0        0        0        0
EXEC           1    0.0001s    0.0000s        0        0        0        0
FETCH          1    0.0019s    0.0000s        0        1        3        1
------- -------- ---------- ---------- -------- -------- -------- --------
Total          3    0.0020s    0.0000s        0        1        3        1

Wait Event/CPU Usage/Think Time            Duration    Count
---------------------------------------- ---------- --------
total CPU                                     0.000s        3

Execution Plan:
Step Parent     Rows Row Source
---- ------ -------- ------------------------------------------------------------
   1      0        1 SEQUENCE CUSTOMER_ID_SEQ (cr=1 pr=0 pw=0 time=1878 us)
                     (object id=53073)
   2      1        1 FAST DUAL (cr=0 pr=0 pw=0 time=15 us)
Hash Value: 740818757 - Total Elapsed Time (excluding think time): 0.001s
alter session set events '10046 trace name context forever, level 8'

DB Call    Count    Elapsed        CPU     Disk    Query  Current     Rows
------- -------- ---------- ---------- -------- -------- -------- --------
PARSE          0    0.0000s    0.0000s        0        0        0        0
EXEC           1    0.0001s    0.0000s        0        0        0        0
FETCH          0    0.0000s    0.0000s        0        0        0        0
------- -------- ---------- ---------- -------- -------- -------- --------
Total          1    0.0001s    0.0000s        0        0        0        0

Wait Event/CPU Usage/Think Time            Duration    Count
---------------------------------------- ---------- --------
SQL*Net message from client                   0.001s        1
SQL*Net message to client                     0.000s        1
total CPU                                     0.000s        1
Hash Value: 2635489469 - Total Elapsed Time (excluding think time): 0.000s
update seq$ set increment$=:2,minvalue=:3,maxvalue=:4,cycle#=:5,order$=:6,cache=:7,
highwater=:8,audit$=:9,flags=:10 where obj#=:1

DB Call    Count    Elapsed        CPU     Disk    Query  Current     Rows
------- -------- ---------- ---------- -------- -------- -------- --------
PARSE          1    0.0001s    0.0000s        0        0        0        0
EXEC           1    0.0002s    0.0000s        0        1        2        1
FETCH          0    0.0000s    0.0000s        0        0        0        0
------- -------- ---------- ---------- -------- -------- -------- --------
Total          2    0.0003s    0.0000s        0        1        2        1

Wait Event/CPU Usage/Think Time            Duration    Count
---------------------------------------- ---------- --------
total CPU                                     0.000s        2

Execution Plan:
Step Parent     Rows Row Source
---- ------ -------- ------------------------------------------------------------
   1      0        1 UPDATE SEQ$ (cr=1 pr=0 pw=0 time=195 us)
   2      1        1 INDEX UNIQUE SCAN I_SEQ1 (cr=1 pr=0 pw=0 time=25 us)
                     (object id=102)
Hash Value: -1 - Total Elapsed Time (excluding think time): 0.000s
Cursor 0 - unknown statement (default container for any trace file entries relating
to cursor 0)

DB Call    Count    Elapsed        CPU     Disk    Query  Current     Rows
------- -------- ---------- ---------- -------- -------- -------- --------
PARSE          0    0.0000s    0.0000s        0        0        0        0
EXEC           0    0.0000s    0.0000s        0        0        0        0
FETCH          0    0.0000s    0.0000s        0        0        0        0
------- -------- ---------- ---------- -------- -------- -------- --------
Total          0    0.0000s    0.0000s        0        0        0        0

Wait Event/CPU Usage/Think Time            Duration    Count
---------------------------------------- ---------- --------
think time                                    9.864s        1
log file sync                                 0.000s        1
SQL*Net message to client                     0.000s        1
total CPU                                     0.000s        0
SQL Identifiers and Oracle11g Trace Files
The latest release of ESQLTRCPROF supports Oracle11g extended SQL trace files. It includes
the SQL identifier, module, action, and recursive call depth in the header of each statement-level
resource profile. SQL identifiers were introduced with Oracle10g, but are not emitted to
SQL trace files prior to Oracle11g. Thus, only ESQLTRCPROF reports based on Oracle11g
trace files contain SQL identifiers. Following is an example:

Statements Sorted by Elapsed Time (including recursive resource utilization)
==============================================================================
Hash Value: 1256130531 - Total Elapsed Time (excluding think time): 0.102s
SQL Id: b85s0yd5dy1z3 Module 'insert_perf5.pl' Action 'undefined'
Dependency Level: 0
INSERT INTO customer(id, name, phone)
VALUES (customer_id_seq.nextval, :name, :phone)
RETURNING id INTO :id
DB Call    Count    Elapsed        CPU     Disk    Query  Current     Rows
------- -------- ---------- ---------- -------- -------- -------- --------
PARSE          1    0.0002s    0.0000s        0        0        0        0
EXEC          10    0.0956s    0.0313s        5       27       36       10
FETCH          0    0.0000s    0.0000s        0        0        0        0
------- -------- ---------- ---------- -------- -------- -------- --------
Total         11    0.0958s    0.0313s        5       27       36       10

Wait Event/CPU Usage/Think Time            Duration    Count
---------------------------------------- ---------- --------
total CPU                                     0.031s       11
db file sequential read                       0.021s        3
SQL*Net message from client                   0.006s       10
SQL*Net message to client                     0.000s       10
Execution Plan:
Step Parent     Rows Row Source
---- ------ -------- ------------------------------------------------------------
   1      0        0 LOAD TABLE CONVENTIONAL (cr=3 pr=4 pw=4 time=0 us)
   2      1        1 SEQUENCE CUSTOMER_ID_SEQ (cr=3 pr=1 pw=1 time=0 us)
                     (object id=15920)
The SQL identifier in reports allows for easy integration with AWR. It may be used as input
to the AWR SQL report script awrsqrpt.sql or DBMS_XPLAN.

SQL> SELECT * FROM TABLE (dbms_xplan.display_awr('b85s0yd5dy1z3'));

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------
SQL_ID b85s0yd5dy1z3
--------------------
INSERT INTO customer(id, name, phone) VALUES (customer_id_seq.nextval,
:name, :phone)
RETURNING id INTO :id

Plan hash value: 2690979981

--------------------------------------------------------------
| Id  | Operation                | Name            | Cost |
--------------------------------------------------------------
|   0 | INSERT STATEMENT         |                 |    1 |
|   1 |  LOAD TABLE CONVENTIONAL |                 |      |
|   2 |   SEQUENCE               | CUSTOMER_ID_SEQ |      |
--------------------------------------------------------------

Note
-----
   - cpu costing is off (consider enabling it)
Module and action are extracted from application instrumentation entries. If no such
entries are present in a trace file, both module and action are reported as “undefined”. The
Oracle9i APPNAME entry format is used to extract module and action from Oracle9i trace files.
The recursive call depth is taken from the parameter dep of PARSING IN CURSOR entries.
Lessons Learned
The ESQLTRCPROF profiler accepts an extended SQL trace file as input and calculates session-level
and statement-level resource profiles. To the best of my knowledge, it is the only profiler
that categorizes SQL*Net message from client into think time and unavoidable network round-trips
between client and database server. The pseudo wait event think time is defined as a WAIT
entry with nam='SQL*Net message from client' where the ela value exceeds a configurable
threshold. Remember that a DBA has no chance of reducing think time accumulated
by a database client. Occasionally the DBMS is not to blame for problems that
end users or developers perceive as database performance problems. You are probably looking
at such a case if think time is very prominent in the resource profile (say 50% or more), the
average duration of think time is several hundred milliseconds or more, and there are no
expensive SQL statements in the statement-level ESQLTRCPROF report. If you do encounter such a
case, average think time will usually be in the range of several seconds. You should check the
average duration of think time to safeguard against erroneous classification of SQL*Net message
from client as think time due to a threshold that is too low. Of course it is also important to pay
attention to the measurement interval; it would be inappropriate to choose a period of inactivity
by a database client as the measurement interval.
ESQLTRCPROF tries to address some shortcomings of TKPROF. Given that Millsap and
Holt’s book [MiHo 2003] has been available since 2003, it is somewhat astonishing that the
TKPROF release shipped with Oracle11g still does not contain a resource profile, nor does it
sort wait events by contribution to response time. TKPROF also fails to report average durations of database calls and wait events.
ESQLTRCPROF takes into account that inter database call as well as intra database call
wait events affect a statement’s contribution to response time. A statement that is executed
many times at recursive call depth 0 incurs a network round-trip for each execution. Thus, a
statement that executes in a fraction of a second, but is executed many times, may contribute
more to response time than a slower statement that is executed only a few times. For the
purpose of sorting SQL or PL/SQL statements in a trace file by their actual contribution to
response time, inter database call wait events must be considered too. This is why ESQLTRCPROF
defines total elapsed time for a statement as the sum of all e values plus the sum of all ela values for
inter database call wait events that are associated with a statement. Total elapsed time is used
to sort statements in the ESQLTRCPROF statement level report section. Due to this novel
approach, ESQLTRCPROF does not need sort options in the way that TKPROF does.
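The accounting just described can be sketched in a few lines of Perl. This is a simplification of
what ESQLTRCPROF does: $trace is assumed to be an open file handle on the trace file, cursor
reuse and recursive call depth are ignored, and the 0.005s think time threshold is assumed.

# Sketch: total elapsed time per statement = sum of e values of database calls
# plus ela values of inter database call waits (think time excluded).
my (%elapsed, %cursor_stmt);
my $think_threshold = 0.005;    # seconds

# default container -1 for entries without a prior PARSING IN CURSOR
sub stmt_of { my ($c) = @_; exists $cursor_stmt{$c} ? $cursor_stmt{$c} : -1 }

while (my $line = <$trace>) {
    if ($line =~ /^PARSING IN CURSOR #(\d+) .*\bhv=(\d+)/) {
        $cursor_stmt{$1} = $2;                         # cursor belongs to this statement
    } elsif ($line =~ /^(?:PARSE|EXEC|FETCH) #(\d+):.*\be=(\d+)/) {
        $elapsed{ stmt_of($1) } += $2 / 1_000_000;     # elapsed time of database calls
    } elsif ($line =~ /^WAIT #(\d+): nam='SQL\*Net message (?:from|to) client' ela= (\d+)/) {
        my $ela = $2 / 1_000_000;
        # inter database call waits count toward the statement, think time does not
        $elapsed{ stmt_of($1) } += $ela unless $ela > $think_threshold;
    }
}
for my $hv (sort { $elapsed{$b} <=> $elapsed{$a} } keys %elapsed) {
    printf "Hash Value: %11s  total elapsed: %9.3fs\n", $hv, $elapsed{$hv};
}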
Source Code Depot
Table 27-3 lists this chapter’s source files and their functionality.
Table 27-3. ESQLTRCPROF Source Code Depot

File Name            Functionality
-------------------  -------------------------------------------------------------
esqltrcprof.pl       This Perl program creates a session-level resource profile as
                     well as statement-level resource profiles from an extended
                     SQL trace file. It classifies the wait event SQL*Net message
                     from client into think time and genuine network latency. It
                     also computes transactions per second and other metrics and
                     breaks the wait events latch free and enqueue down to
                     individual latches and enqueues.
insert_customer.pl   This Perl program inserts a row into a table with INSERT
                     RETURNING. DDL for creating database objects referenced by
                     the program is included in the Perl source code.
insert_customer.trc  Extended SQL trace file from a run of insert_customer.pl.
CHAPTER 28
■■■
The MERITS Performance
Optimization Method
The MERITS performance optimization method is built around a sophisticated assessment of
extended SQL trace files. The extended SQL trace profiler ESQLTRCPROF, which is capable of
parsing the undocumented trace file format, is used in the assessment phase of the method.
The MERITS method uses undocumented features predominantly in the assessment, reproduction,
and extrapolation phases.
Essentially, the MERITS method is a framework for solving performance problems. The
goal of the method is to identify the root cause of slow response time and to subsequently
modify parameters, database objects, or application code until the performance goal is met.
Introduction to the MERITS Method
The MERITS performance optimization method consists of six phases and relies on
undocumented features in several of them. MERITS is an acronym derived from the following
six phases of the method:
1. Measurement
2. Assessment
3. Reproduction
4. Improvement
5. Extrapolation
6. Installation
The first step in any performance optimization project should consist of measuring the
application or code path that is too slow (phase 1). Measurement data are assessed in the second
phase. In some cases this assessment may already reveal the cause of the performance problem.
More intricate cases need to be reproduced by a test case, potentially on a test system (phase 3). If
a SQL statement takes excessive time to execute, then the test case consists of reproducing the
response time of the SQL statement. The fourth phase is concerned with improving the response
time of the application, code path, or test case. This may involve creating a new index, changing
optimizer parameters, changing the SQL statement, changing database objects with DDL, introducing previously unused features (e.g., Partitioning option, stored outlines, SQL profiles), etc.
Effects of the improvement are measured in the same way as the original code path. A comparison
of the measurement data of the original code path with the measurement data of the improved
version achieved in phase 4 may be used to extrapolate the magnitude of the performance improvement
(phase 5). In other words, it is possible to forecast the effect of an improvement in a test case on
the code path that was measured in phase 1. If the improvement is deemed sufficient, the
necessary changes need to be approved and installed on the target (production) system at some
point. Discussing each phase of the MERITS method in full detail is a subject for a separate
book. However, I provide enough information on each phase to allow you to use the method as
a framework for performance optimization tasks.
Measurement
Since extended SQL trace is the most complete account of where a database session spent its
time and a resource profile may be compiled from extended SQL trace data, this data source is
at the core of the measurements taken. However, an extended SQL trace file does not provide
a complete picture of an application, system, or DBMS instance. Some aspects that are not covered
by an extended SQL trace file are as follows:
• Load at the operating system level (I/O bottlenecks, paging, network congestion, waiting
for CPU)
• ORACLE DBMS parameters
• Session statistics (V$SESSTAT)
• Contending database sessions
To capture a complete picture of the system, I recommend using tools such as sar, iostat,
vmstat, and top to record activity at the operating system level. Concerning the DBMS, I advocate taking a Statspack or AWR snapshot that spans the same interval as the extended SQL trace
file. The Statspack snapshot should include the traced session (STATSPACK.SNAP parameter
i_session_id). If AWR is preferred, an Active Session History report may be used to get additional information on the session. It may be necessary to take several measurements and to
compute an average to compensate for fluctuations in response time. Both AWR and Statspack
reports contain a list of all initialization parameters with non-default values. An Active Session
History (ASH) report contains a section on contending sessions entitled “Top Blocking Sessions”.
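To bracket the traced interval with AWR snapshots programmatically, a minimal Perl DBI
sketch may be used (the connect string and credentials are placeholders; AWR requires the
extra-cost Diagnostics Pack and EXECUTE on DBMS_WORKLOAD_REPOSITORY):

#!/usr/bin/perl
# Sketch: take an AWR begin snapshot, run the workload, take an end snapshot.
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:Oracle:TEN', 'system', 'secret', { RaiseError => 1 });

my $sth = $dbh->prepare(
    q{BEGIN :snap := dbms_workload_repository.create_snapshot; END;});

my ($begin_snap, $end_snap);
$sth->bind_param_inout(':snap', \$begin_snap, 38);
$sth->execute;                       # begin snapshot

# ... run or trace the workload to be measured here ...

$sth->bind_param_inout(':snap', \$end_snap, 38);
$sth->execute;                       # end snapshot

print "AWR snapshot interval: $begin_snap .. $end_snap\n";
$dbh->disconnect;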
Measurement Tools
This section presents two SQL scripts that may serve as measurement tools at session and
instance level. The script awr_capture.sql is based on AWR and ASH, while sp_capture.sql is
based on Statspack. Both scripts require SYSDBA privileges. The scripts do not invoke any
operating system tools to collect operating system statistics. Yet, an Oracle10g Statspack report
includes CPU and memory statistics at operating system level in the “Host CPU” and “Memory
Statistics” sections, and an AWR report includes a section titled “Operating System Statistics”.
Both AWR and ASH are included in the extra-cost Diagnostics Pack. The downside of ASH
is that it does not provide a resource profile, since it is built with sampling. A session-level
Statspack report does include a rudimentary resource profile, although the report does not use
the term resource profile. Session-level data are based on V$SESSION_EVENT and V$SESSTAT, which
afford the calculation of a resource profile. Interesting sections from an ASH report, which a
session-level Statspack as well as a TKPROF report lack, are “Top Service/Module”, “Top SQL
using literals”, “Top Blocking Sessions”, and “Top Objects/Files/Latches”.
Extended SQL Trace, AWR, and ASH
The script awr_capture.sql temporarily sets the hidden parameter _ASH_SAMPLE_ALL=TRUE to
cause ASH to sample idle wait events for improved diagnostic expressiveness. Then the script
takes an AWR snapshot and enables level 12 SQL trace for the session that exhibits a performance
problem. Next, the script asks the user for how long it should trace the session. There
are two ways of using the script:

• Tracing the session for a predetermined interval, such as 300 or 600 seconds. This is
achieved by entering the desired length of the interval. The script calls DBMS_LOCK.SLEEP
to pause for the specified number of seconds, takes another AWR snapshot, and disables
SQL trace.

• Using an event-based approach to control the measurement interval. With this approach,
type 0 in response to the question concerning the length of the interval, but do not yet
hit return. Wait until an event occurs, such as an end user calling to report that the session
you are tracing has returned a result set, then hit return. Due to the zero-length wait
interval, the script immediately takes another AWR snapshot and disables SQL trace.

At the end of the measurement interval, the script automatically creates both an AWR and
an ASH report in HTML format by calling the package DBMS_WORKLOAD_REPOSITORY. The files are
created in the current directory. Note that the documented ASH script does not have the
capability to report solely on a specific session. Following is an example of the script in operation:
SQL> @awr_capture
Please enter SID (V$SESSION.SID): 143
Please enter a comment (optional): slow-app

SPID  SID SERIAL# USERNAME MACHINE    SERVICE NAME
----- --- ------- -------- ---------- ----------------
18632 143     333 NDEBES   WORKGROUP\ TEN.oradbpro.com
                           DBSERVER

PROGRAM  MODULE   ACTION CLIENT IDENTIFIER
-------- -------- ------ -----------------
perl.exe perl.exe NULL   NULL

Oracle pid: 20, Unix process pid: 18632, image: oracleTEN1@dbserver1.oradbpro.com
Statement processed.
Extended SQL trace file:
/opt/oracle/obase/admin/TEN/udump/ten1_ora_18632.trc
Begin snapshot: 95
Please enter snapshot interval in seconds (0 to take end snapshot immediately): 300
End snapshot: 96
Begin time: 07.Sep.2007 03:51:33; End time: 07.Sep.2007 03:56:39; Duration (minutes): 5.1
ASH Report file: slow-app-SID-143-SERIAL-333-ash.html
AWR Report file: slow-app-SID-143-SERIAL-333-awr.html
Extended SQL Trace and Session Level Statspack Snapshot
The measurement script for Statspack is very similar to the AWR variant. It also asks for a session to
trace and enables level 12 SQL trace for that session. Instead of taking AWR snapshots, this
script takes Statspack snapshots, which include session level data. The script uses ORADEBUG to
retrieve the extended SQL trace file name. The following example illustrates the event-based
approach to determining the capture interval. This means that the capture does not last for a
predetermined interval.
SQL> @sp_capture
Please enter SID (V$SESSION.SID): 139

SPID         SID   SERIAL# USERNAME   MACHINE    SERVICE NAME
------------ ----- ------- ---------- ---------- ----------------
19376        139     41757 NDEBES     WORKGROUP\ TEN.oradbpro.com
                                      DBSERVER

PROGRAM     MODULE          ACTION     CLIENT IDENTIFIER
----------- --------------- ---------- -----------------
perl.exe    insert_perf5.pl NULL       NULL

Oracle pid: 16, Unix process pid: 19376, image: oracleTEN1@dbserver1.oradbpro.com
Extended SQL trace file:
/opt/oracle/obase/admin/TEN/udump/ten1_ora_19376.trc
Begin snapshot: 291
At this point, the script waits for input. If you wish to end the capture interval as soon as an
event occurs, wait for the event and enter zero.
Please enter snapshot interval in seconds (0 to take end snapshot immediately): 0
End snapshot: 301
Begin time: 07.Sep.2007 06:20:07; End time: 07.Sep.2007 06:24:44; Duration (minutes): 4.6
As soon as the capture is over, a Statspack report for the beginning and end snapshot
numbers may be generated by running the script $ORACLE_HOME/rdbms/admin/spreport.sql.
Following is an excerpt of a report that includes session-level data captured with the script
sp_capture.sql. The session-specific sections are “Session Wait Events”, “Session Time Model
Stats” (Oracle10g and later releases only), and “Session Statistics”:1
1. Per transaction statistics have been omitted in the “Session Statistics” section.
Snapshot       Snap Id Snap Time          Sessions Curs/Sess Comment
~~~~~~~~~~~ ---------- ------------------ -------- --------- -------------------
Begin Snap:        291 07-Sep-07 06:20:07       21       6.7 SID-139-perl.exe-in
  End Snap:        301 07-Sep-07 06:24:44       21       6.7 SID-139-perl.exe-in
   Elapsed:       4.62 (mins)
Session Wait Events  DB/Inst: TEN/TEN1  Snaps: 291-301
Session Id: 139  Serial#: 41757
-> ordered by wait time desc, waits desc (idle events last)

                                                 Total Wait   Avg Wait    Waits
Event                            Waits  Timeouts   Time (s)       (ms)     /txn
---------------------------- --------- --------- ---------- ---------- --------
log file switch completion           3         1          1 345.605667      3.0
db file sequential read             13         0          0 4.61123077     13.0
control file sequential read        26         0          0 .954307692     26.0
control file parallel write          6         0          0 2.56516667      6.0
db file scattered read               4         0          0      3.045      4.0
log file sync                        2         0          0     5.1105      2.0
latch: library cache                 1         0          0          0      1.0
SQL*Net message from client    283,496         0         72 .254345627 ########
SQL*Net message to client      283,496         0          1 .004825056 ########
Session Time Model Stats  DB/Inst: TEN/TEN1  Snaps: 291-301
Session Id: 139  Serial#: 41757
-> Total Time in Database calls 182.9s (or 182854062us)
-> Ordered by % of DB time desc, Statistic name

Statistic                                       Time (s) % of DB time
----------------------------------- -------------------- ------------
DB CPU                                             179.2         98.0
sql execute elapsed time                           176.1         96.3
sequence load elapsed time                          32.1         17.6
PL/SQL execution elapsed time                        6.7          3.7
parse time elapsed                                   1.4           .8
hard parse elapsed time                              0.4           .2
hard parse (sharing criteria) elaps                  0.0           .0
repeated bind elapsed time                           0.0           .0
DB time                                            180.6
Session Statistics  DB/Inst: TEN/TEN1  Snaps: 291-301
Session Id: 139  Serial#: 41757

Statistic                                      Total         per Second
--------------------------------- ------------------ ------------------
active txn count during cleanout                 948                  3
consistent gets                               31,335                113
cpu used by this session                      18,117                 65
parse time cpu                                    30                  0
physical read total bytes                    655,360              2,366
redo size                                 92,965,180            335,614
session pga memory max                     4,718,592             17,035
sql*net roundtrips to/from client            283,503              1,023
workarea executions - optimal                     56                  0
A resource profile may be derived from the session-specific sections by using the DB CPU
from the “Session Time Model Stats” section and the wait events from the “Session Wait
Events” section. In Oracle9i, CPU consumption is represented by the statistic “CPU used by
this session”. The response time equals the interval covered by the begin and end snapshots
(value “Elapsed” at the beginning of the report). The difference between the measurement
interval and total wait time plus DB CPU is calculated and reported as “unknown”; here,
277.2 s - (179.2 s + 72.0 s + 1.0 s + 1.0 s) = 24.0 s. A large
portion of “unknown” time may mean that the session did not get the CPU or that the code
path captured is incompletely instrumented. There may also have been a long ongoing wait
such as SQL*Net message from client, which has not yet been incorporated into V$SESSION_EVENT.
Table 28-1 displays the resource profile calculated with this approach.
Table 28-1. Resource Profile Derived from a Session Level Statspack Report

Response Time Contributor            Time      Percentage
---------------------------------  ---------  ----------
DB CPU                              179.2 s      64.64%
SQL*Net message from client          72.0 s      25.97%
unknown                              24.0 s       8.66%
log file switch completion            1.0 s       0.36%
SQL*Net message to client             1.0 s       0.36%
Response time                       277.2 s     100.00%
You may compare the resource profile in Table 28-1 to the resource profile for the same
session, which was calculated from the extended SQL trace file that spanned (almost)2 the
same interval using ESQLTRCPROF. The latter resource profile is reproduced in the next code
example. The CPU usage (179.2 vs. 218.31) differs significantly between the two reports, whereas
figures for the wait events are nearly identical.

2. Since it takes a few seconds (usually less than three) to take a Statspack snapshot, the interval covered
by the beginning and end snapshots will never have exactly the same duration as that captured by the
SQL trace file.
ORACLE version 10.2 trace file. Timings are in microseconds (1/1000000 sec)

Resource Profile
================
Response time: 239.377s; max(tim)-min(tim): 271.173s
Total wait time: 74.080s
----------------------------
Note: 'SQL*Net message from client' waits for more than 0.005s are considered think time

Wait events and CPU usage:
 Duration      Pct        Count    Average Wait Event/CPU Usage/Think Time
--------- -------- ------------ ---------- -----------------------------------
 218.316s   91.20%       904144  0.000241s total CPU
  70.953s   29.64%       282089  0.000252s SQL*Net message from client
   1.349s    0.56%       282092  0.000005s SQL*Net message to client
   1.037s    0.43%            3  0.345606s log file switch completion
   0.419s    0.18%           16  0.026201s KSV master wait
   0.092s    0.04%            8  0.011520s Data file init write
   0.062s    0.03%            2  0.030860s log file switch (checkpoint incomplete)
   0.060s    0.03%           13  0.004611s db file sequential read
   0.025s    0.01%           26  0.000954s control file sequential read
   0.023s    0.01%            3  0.007810s think time
   0.016s    0.01%            2  0.007905s rdbms ipc reply
   0.015s    0.01%            6  0.002565s control file parallel write
   0.012s    0.01%            4  0.003045s db file scattered read
   0.010s    0.00%            2  0.005110s log file sync
   0.005s    0.00%            2  0.002479s db file single write
   0.001s    0.00%            2  0.000364s latch: library cache
   0.000s    0.00%            2  0.000243s latch: shared pool
 -53.019s  -22.15%                         unknown
--------- -------- -----------------------------------------------------------
 239.377s  100.00% Total response time
Assessment
Asking for the goal of the performance optimization should be the first step in the assessment
phase. If the goal is unrealistic, then expectations need to be corrected. As you assess the data
collected in the measurement phase, you will be able to decide whether or not the goal is
attainable. Often managers or other contacts will simply say “make it as fast as possible”. All
right, at least you’ve asked.
The second question to ask is whether the issue faced is truly a database performance
problem. This is done by looking at the think time figure ESQLTRCPROF derives from extended
SQL trace files. If, say, 90% of the response time is think time and all the SQL statements
executed in between completed reasonably quickly, then there is no database problem.
Think time indicates that the database client did not ask the server to process any requests. The
application is either idle or busy processing instructions that do not make demands on the
DBMS instance. Since it’s impossible for a DBA to reduce think time in an application, the
application developer must find out what is taking too long.
The next classification to make is whether the slow response time is due to excessive CPU
usage or high wait time. The statistic CPU used by this session (or DB CPU in Oracle10g) indicates
CPU usage. If waiting is the main issue, the resolution depends on the kinds of wait events.
Waiting might be due to a slow I/O system (e.g., db file sequential read), contention (e.g., enqueue,
latch free), or lack of CPU resources. The latter cause may be reflected in wait events if the
database resource manager is enabled. How to reduce wait time depending on which wait events
are most prominent (Oracle10g has 878 different wait events) is a subject for a separate
performance tuning book. Shee et al. ([ShDe 2004]) do a good job of addressing this topic in their book.
Resource Profiles and Performance Assessment Tools
The main goal of the assessment phase is to generate a resource profile from the data captured
in the preceding measurement phase. The concept of a resource profile has been made popular by
the work of Cary Millsap and Jeff Holt as published in the book Optimizing Oracle Performance
([MiHo 2003]). My extended SQL trace profiler ESQLTRCPROF is strongly influenced by this
publication. Yet, the addition of think time to resource profiles is my own invention. The research
done by Millsap and Holt has led to the development of a commercial profiler for extended
SQL trace files, which is offered by Hotsos Enterprises, Ltd. Other tools for obtaining a resource
profile are TKPROF and ESQLTRCPROF, which is described in detail in Chapter 27.
TKPROF vs. ESQLTRCPROF
TKPROF does not really report a resource profile. Nonetheless, a TKPROF report does contain
enough information to calculate a resource profile from it. The same approach as with session
level Statspack data may be used. The response time R consists of the sum of the elapsed times
for non-recursive and recursive statements plus the wait time of the inter database call wait
events SQL*Net message from client and SQL*Net message to client. Overall totals for non-recursive
and recursive statements need to be added to get the totals for both CPU usage and wait events.
The measurement interval is reported as “elapsed seconds in trace file” in the last line of the
report by TKPROF release 10.2. Once these figures are calculated for CPU usage and all wait
events, they need to be sorted and arranged as a resource profile. It is much easier to use
ESQLTRCPROF, which does all of this automatically. Another disadvantage of TKPROF is the
omission of hash values, which identify SQL statements, from the report. Hash values might be
used to correlate the SQL statements with instance-level (spreport.sql) and SQL statement-level
(sprepsql.sql) Statspack reports or V$ views (V$SQL.HASH_VALUE).
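As a minimal Perl sketch of this calculation (the figures are examples taken from a TKPROF
report; substitute the "total" elapsed lines of the OVERALL TOTALS sections and the Total
Waited values of the SQL*Net wait lines from your own report):

# Sketch: response time R derived from a TKPROF report.
my $elapsed_non_recursive = 0.16;   # "total" elapsed, non-recursive statements
my $elapsed_recursive     = 0.27;   # "total" elapsed, recursive statements
my $sqlnet_from_client    = 4.91;   # Total Waited, SQL*Net message from client
my $sqlnet_to_client      = 0.02;   # Total Waited, SQL*Net message to client

my $r = $elapsed_non_recursive + $elapsed_recursive
      + $sqlnet_from_client + $sqlnet_to_client;
printf "R = %.2f s\n", $r;          # 5.36 s with these example figures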
Both TKPROF and ESQLTRCPROF report CPU usage. Often high CPU usage is associated
with SQL statements that access thousands or millions of blocks in the buffer cache. If this is
the case, a poor execution plan, which includes unnecessary full scans, has an inappropriate
join order, or uses the wrong index, might be the culprit. SQL statements with high CPU usage
must be identified and their execution plans checked. TKPROF provides the sort options fchela
and exeela, which may be combined. If you suspect that a SELECT statement is the most expensive statement in a trace file, use sort=fchela,exeela. Otherwise use sort=exeela,fchela.
TKPROF cannot sort by total elapsed time, which comprises elapsed time from the three stages
parse, execute, and fetch. Furthermore it does not consider inter database call wait events
attributed to a cursor when sorting. ESQLTRCPROF always sorts by total elapsed time and
considers in between database call wait events such as SQL*Net message from/to client while
ignoring think time. This makes sure that a SQL statement that causes many round-trips and
thus incurs a lot of network latency is ranked as more expensive than a statement that is responsible for the same amount of elapsed time within database calls, but does not accumulate as
much network latency.
Reproduction
Before the actual performance optimization may begin, a way to reproduce the problem at
hand must be found. It is crucial to reproduce the problematic code path as closely as possible.
Depending on the kind of performance problem, to reproduce an issue, the following factors
may need to be identical between the original environment and the test environment:
• Hardware capability
• Operating system and release
• ORACLE DBMS release
• Initialization parameters (documented and hidden), especially optimizer parameters
• Database object statistics (a.k.a. optimizer statistics)
• Database block size
• Bind variables and bind data types in SQL statements which are optimized
• Stored outlines
• SQL profiles (Oracle10g and later releases only)
When creating database objects, make sure you create them with the same DDL in
tablespaces with the same block size as the original application. The package DBMS METADATA
may be used to extract DDL for database objects. When reproducing SQL statements with bind
variables, it’s important to use bind variables with the same data type as the original statement.
Level 12 SQL trace files contain bind variable values as well as bind variable data types in the
BINDS section (see Chapter 24). SQL*Plus or PL/SQL variables may be used to reproduce the
bind data types. Last but not least, make sure no other users are running stress tests or other
resource-intensive programs on your test system, since this may distort the results of
your tests.
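In Perl DBI, for instance, the bind data type may be pinned explicitly, so that a test case
reproduces the data types found in the BINDS section of the trace file. This is a minimal
sketch; the statement, the bind value, and the mapping of oacdty=02 (NUMBER) to an integer
bind are illustrative assumptions, and $dbh denotes an open database connection:

use DBI qw(:sql_types);   # imports SQL_VARCHAR, SQL_INTEGER, ...

# Sketch: reproduce a bind data type observed in a level 12 trace file.
my $sth = $dbh->prepare(q{SELECT * FROM images WHERE id = :id});
# oacdty=02 in the BINDS section denotes NUMBER; bind accordingly
$sth->bind_param(':id', 42, SQL_INTEGER);
$sth->execute;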
Improvement
Improving performance is a vast subject. If we ignore the use of more powerful hardware,
which might not even solve the problem, the following procedures may result in better
performance:
• Parameter changes (e.g., DB_CACHE_SIZE, PARALLEL_EXECUTION_MESSAGE_SIZE)
• Adding an index to avoid full table scans
• Dropping unnecessary indexes
• Partitioning of tables and/or indexes
• Materialized views
• Use of bind variables instead of literals in SQL statements
• Correction of bind data type mismatches
• Calculation of more accurate optimizer statistics with DBMS_STATS
• Optimizer Dynamic Sampling
• Use of system statistics, given that the cost-based SQL optimizer chooses better execution
plans with them than without (DBMS_STATS)
• Use of cached sequences instead of counters implemented with tables
• Adding hints to SQL statements (as a last resort to improve execution plans)
• Stored outlines
• Supplying hidden hints with stored outlines (see Metalink note 92202.1)
• SQL profiles (Oracle10g and subsequent releases)
• SQL plan management (requires Oracle11g)
• Use of array inserts or bulk load programming interfaces
• Reduction of network round-trips (e.g., with INSERT RETURNING)
• PL/SQL native compilation
• Changes in application coding to reduce contention, parse overhead, polling, and so forth
Performance improvements should be documented with additional measurements. You
should not rely on a single measurement, but instead take several measurements and calculate
the average.
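A minimal Perl sketch of this practice follows; run() stands for the measured code path and
is a placeholder:

use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

sub run { }    # placeholder: replace with the code path to measure

# Sketch: measure a code path several times and report the average.
my $runs  = 10;
my $total = 0;
for my $i (1 .. $runs) {
    my $t0 = [gettimeofday];
    run();
    $total += tv_interval($t0);    # elapsed seconds for this run
}
printf "average response time over %d runs: %.2f s\n", $runs, $total / $runs;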
Extrapolation
Once a substantial performance improvement has been achieved with the test case, it is
important to extrapolate the effect of the improvement on the original application. This
extrapolation serves to decide whether the tuning effort can be stopped. In case tests in the
previous phase were run with instrumentation or SQL trace enabled, these should be switched
off now. Measurement intrusion is undesirable when trying to get reliable figures to base a
forecast on. After all,
the original application usually runs without measurement intrusion through instrumentation
or tracing. On the other hand, if the original application runs with instrumentation enabled, so
should your test case.
Installation
Usually, changes are made in test and quality assurance systems first before they are allowed
to go into production. If the root cause was inefficient SQL coding or application coding in
general, it may take quite a while before the software manufacturer incorporates the necessary
changes and releases a new version of the software. In addition to approval, changes that require
reorganization of database objects need to wait for a sufficiently large maintenance window. If
there is no budget for an extra-cost feature that was used in the improvement phase (e.g., partitioned tables), then it may be hard to get approval for the suggested changes. Furthermore it
may require additional downtime to install the feature.
MERITS Method Case Study
This section presents the application of the MERITS method to a real-world performance
problem. A digital imaging company accepts JPEG image files for printing and archival from
customers. A routine within the company’s web application, which is implemented in Perl,
reads the EXIF image metadata in the digital still camera files and loads the image metadata
along with the images themselves into an ORACLE database. The EXIF data are used by search
functionality provided on the web site. The image data is stored as a BLOB, since this provides
full recoverability in case of disk failure. The average size of the JPEG files loaded is 1 MB. The
contact says that the application is capable of loading 68 files per minute. The goal is to at least
triple the number of files loaded per minute.
I have chosen this example since it highlights some limitations of extended SQL trace. The
striking discrepancy between the response time calculated from the extended SQL trace file
and the actual elapsed time, which you will see shortly, should not lead you to believe that
analysis of extended SQL trace files is generally inaccurate. Such a high discrepancy is the
exception rather than the rule. In this case it is due to incomplete instrumentation of LOB
access with OCI. The lack of accuracy observed in the response time profile for this case
provides the opportunity to endow the reader with additional tools and knowledge to overcome
such situations. In this particular case, I show how instrumentation combined with statistics
collection at the action level offered by Oracle10g results in an accurate representation of
response time per action (V$SERV_MOD_ACT_STATS). If you are still running Oracle9i, you can
derive the elapsed time per action from the SQL trace file by looking at the timestamps that
are written with the module and action entry in Oracle9i (see page 249). This is not as good as
the data provided by the view V$SERV_MOD_ACT_STATS in Oracle10g, which includes DB time, DB
CPU, and other statistics, but it’s sufficient to find out where an application spends most of
the time.
Phase 1—Measurement
I sent the file awr_capture.sql to the client and asked the DBA to capture 60 seconds of activity
from the image loading routine. I also asked the DBA to run the script statistics.sql to create
a report on the structure of the tables involved.
Phase 2—Assessment
I received a SQL trace file as well as the AWR and ASH reports created by awr_capture.sql. The
statistics.sql report showed that LOBs are stored in a tablespace with the default block size.
Storage of LOB data in row was enabled.
LOB Column                              Block        Pct                      In
Name       Segment Name       Tablespace size  Chunk version Retention Cache Row
---------- ------------------ ---------- ----- ------ ------- --------- ----- ---
IMAGE_DATA IMAGES_IMAGE_DATA  USERS      8 KB    8192      10           NO    YES
I processed the SQL trace file with TKPROF. Since it was likely that EXEC calls rather than
FETCH calls would contribute most to response time, I used the sort options exeela,fchela.

$ tkprof ten_ora_3172_img_load.trc ten_ora_3172_img_load.tkp sort=exeela,fchela
Following is an excerpt of the TKPROF report for the trace file:
OVERALL TOTALS FOR ALL NON-RECURSIVE STATEMENTS

call     count       cpu    elapsed       disk      query    current       rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse      213      0.01       0.00          0          0          0           0
Execute    213      0.07       0.14          1         71        495          71
Fetch      142      0.00       0.01          0        142          0          142
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total      568      0.09       0.16          1        213        495         213

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  direct path read                             8010        0.27         13.23
  SQL*Net more data from client               32182        0.00          0.43
  direct path write                           45619        0.01          0.69
  SQL*Net message to client                    8578        0.00          0.02
  SQL*Net message from client                  8578        0.03          4.91
  db file sequential read                      7130        0.25         26.11
  log file sync                                  86        0.05          0.51
  log file switch completion                      5        0.99          3.23
  latch: shared pool                             10        0.00          0.00
  latch: library cache                            1        0.00          0.00
  log file switch (checkpoint incomplete)         8        0.99          1.76

OVERALL TOTALS FOR ALL RECURSIVE STATEMENTS

call     count       cpu    elapsed       disk      query    current       rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse      100      0.00       0.03          0          0          0           0
Execute    166      0.14       0.18          0        230         46          46
Fetch      199      0.01       0.04          5        449          0         489
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total      465      0.15       0.27          5        679         46         535

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  db file sequential read                         5        0.01          0.03

213 user SQL statements in session.
166 internal SQL statements in session.
379 SQL statements in session.
********************************************************************************
Trace file: ten_ora_3172_img_load.trc
Trace file compatibility: 10.01.00
Sort options: exeela fchela
1 session in tracefile.
213 user SQL statements in trace file.
166 internal SQL statements in trace file.
379 SQL statements in trace file.
12 unique SQL statements in trace file.
128315 lines in trace file.
72 elapsed seconds in trace file.
Note the large discrepancy between total (recursive and non-recursive) elapsed time (0.43 s)
and the value for elapsed seconds in trace file (72 s). According to TKPROF, the following recursive
statement had the highest elapsed time:
update seg$ set type#=:4,blocks=:5,extents=:6,minexts=:7,maxexts=:8,extsize=
:9,extpct=:10,user#=:11,iniexts=:12,lists=decode(:13, 65535, NULL, :13),
groups=decode(:14, 65535, NULL, :14), cachehint=:15, hwmincr=:16, spare1=
DECODE(:17,0,NULL,:17),scanhint=:18
where
ts#=:1 and file#=:2 and block#=:3
call     count       cpu    elapsed       disk      query    current       rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse       46      0.00       0.03          0          0          0           0
Execute     46      0.09       0.10          0        230         46          46
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total       92      0.09       0.13          0        230         46          46
I also processed the SQL trace file with ESQLTRCPROF. It gave hundreds of warnings like
the following:
Warning: WAIT event 'direct path read' for cursor 4 at line 5492 without prior
PARSING IN CURSOR #4 - ignored for per statement response time accounting
Warning: WAIT event 'direct path write' for cursor 4 at line 5497 without prior
PARSING IN CURSOR #4 - ignored for per statement response time accounting
The warning indicates that there is a problem with response time accounting for cursor 4.
The resource profile from ESQLTRCPROF is shown here:
ORACLE version 10.2 trace file. Timings are in microseconds (1/1000000 sec)

Resource Profile
================
Response time: 5.627s; max(tim)-min(tim): 74.957s
Total wait time: 50.974s
----------------------------
Note: 'SQL*Net message from client' waits for more than 0.005s are considered think time

Wait events and CPU usage:
 Duration      Pct        Count    Average Wait Event/CPU Usage/Think Time
--------- -------- ------------ ---------- -----------------------------------
  26.153s  464.76%         7135  0.003665s db file sequential read
  13.234s  235.17%         8010  0.001652s direct path read
   3.232s   57.43%            5  0.646382s log file switch completion
   2.604s   46.28%         8507  0.000306s SQL*Net message from client
   2.312s   41.08%           71  0.032562s think time
   1.768s   31.41%            8  0.220956s log file switch (checkpoint incomplete)
   0.690s   12.27%        45619  0.000015s direct path write
   0.516s    9.18%           86  0.006004s log file sync
   0.434s    7.72%        32182  0.000013s SQL*Net more data from client
   0.313s    5.55%         1033  0.000303s total CPU
   0.030s    0.53%         8578  0.000003s SQL*Net message to client
   0.001s    0.02%           10  0.000109s latch: shared pool
   0.000s    0.00%            1  0.000207s latch: library cache
 -45.659s -811.39%                         unknown
--------- -------- -----------------------------------------------------------
   5.627s  100.00% Total response time

Total number of roundtrips (SQL*Net message from/to client): 8578

CPU usage breakdown
-------------------
parse CPU: 0.05s (313 PARSE calls)
exec  CPU: 0.25s (379 EXEC calls)
fetch CPU: 0.02s (341 FETCH calls)
The difference between the response time of 5.627 s and the elapsed time covered by the
trace file (max(tim)-min(tim): 74.957s) is just as prominent as in the TKPROF report. Contrary
to TKPROF, the ESQLTRCPROF report points out that there are 45 seconds that are not
accounted for (unknown). The total wait time of 50.974 s proves that the application was interacting with the DBMS most of the time, but the wait time should be rolled up into parse, execute,
and fetch calls.
According to ESQLTRCPROF, the highest contributor to response time was associated
with a cursor that does not have a SQL statement text associated with it.
Statements Sorted by Elapsed Time (including recursive resource utilization)
==============================================================================
Hash Value: -1 - Total Elapsed Time (excluding think time): 2.976s
Cursor 0 - unknown statement (default container for any trace file entries relating to cursor 0)

DB Call    Count    Elapsed        CPU     Disk    Query  Current     Rows
------- -------- ---------- ---------- -------- -------- -------- --------
PARSE          0    0.0000s    0.0000s        0        0        0        0
EXEC           0    0.0000s    0.0000s        0        0        0        0
FETCH          0    0.0000s    0.0000s        0        0        0        0
------- -------- ---------- ---------- -------- -------- -------- --------
Total          0    0.0000s    0.0000s        0        0        0        0

Wait Event/CPU Usage/Think Time            Duration    Count
---------------------------------------- ---------- --------
SQL*Net message from client                  2.431s     8081
think time                                   2.312s       71
log file sync                                0.516s       86
SQL*Net message to client                    0.028s     8152
latch: shared pool                           0.001s        6
latch: library cache                         0.000s        1
total CPU                                    0.000s        0
TKPROF does not have a sort option that incorporates wait time between database calls.
Hence it did not report this unknown statement as the highest contributor to response time.
What’s wrong? Cursors without associated SQL statements? Wait time from intra database
call waits such as direct path read and direct path write, which is not rolled up into database
calls? Do both tools report incorrect results? A look at the extended trace file reveals that there
were 92956 wait events pertaining to cursor 4. However, the trace file did not contain a single
PARSING IN CURSOR, PARSE, EXEC, or FETCH entry for cursor 4. The level 2 ERRORSTACK dump taken
by the script awr_capture.sql contained the rather strange sqltxt value “table_e_a_d21e_a_0_0”
for cursor 4. Clearly, this was not an issue of a missing PARSING IN CURSOR entry due to the
fact that tracing was switched on in the midst of a running application. There simply did not
exist a proper SQL statement for this cursor.
Cursor#4(09050CE4) state=NULL curiob=090C2054
curflg=1044 fl2=0 par=00000000 ses=6CD86754
sqltxt(6AAAA544)=table_e_a_d21e_a_0_0
As pointed out in Chapter 24, the PARSE, EXEC, and FETCH entries are the only ones that
report CPU usage. Furthermore, the elapsed time reported by these entries includes the wait
events caused by parse, execute, and fetch operations. Since there were no such entries for
cursor 4, the wait time could not be rolled up. The response time R is defined as the sum of the
elapsed time of parse, execute, and fetch calls plus the sum of wait time between database
calls. The reason why R differs tremendously from the interval recorded by the trace file is now
clear. Theoretically, the elapsed time of all database calls should equal CPU usage plus wait
time caused by the database calls. The difference is reported as “unknown”. The contribution
of “unknown” in this resource profile is so large because the wait time from cursor 4 is not rolled up
into any database calls. So what is it that cursor 4 is responsible for? Here are some lines from
the trace file:
WAIT #4: nam='db file sequential read' ela= 2926 file#=4 block#=90206 blocks=1
obj#=53791 tim=19641952691
WAIT #4: nam='db file sequential read' ela= 1666 file#=4 block#=90221 blocks=1
obj#=53791 tim=19641954572
WAIT #4: nam='direct path read' ela= 275 file number=4 first dba=90206 block cnt=1
obj#=53791 tim=19641964448
WAIT #4: nam='direct path write' ela= 3 file number=4 first dba=90174 block cnt=1
obj#=53791 tim=19641955477
Translation of file#=4 and block#=90206 by querying DBA_EXTENTS yields this:

SQL> SELECT segment_name, segment_type, extent_id
     FROM dba_extents
     WHERE file_id=4 AND 90206 BETWEEN block_id AND block_id + blocks - 1;
SEGMENT_NAME      SEGMENT_TYPE EXTENT_ID
----------------- ------------ ---------
IMAGES_IMAGE_DATA LOBSEGMENT          40
Obviously cursor 4 is related to LOB loading. This finding is corroborated by the fact that
the Active Session History (ASH) report also lists a cursor that has caused direct path read and
write operations, but lacks a SQL statement text. This information is found in the “Top SQL
Statements” section of the ASH report (see Figure 28-1) created by the script awr_capture.sql.
The “Top DB Objects” section indicates a LOB as the top object. At this point we may conclude
that loading LOB data with Oracle Call Interface (OCI), which is used internally by Perl DBI, is
poorly instrumented. The extended SQL trace file does not report the CPU usage of loading
LOBs at all.
Figure 28-1. Top SQL Statements section of an Active Session History report
According to the ESQLTRCPROF report, 71 rows were inserted into the table IMAGES.
Hash Value: 3858514115 - Total Elapsed Time (excluding think time): 0.146s
INSERT INTO images (id, date_loaded, exif_make, exif_model, exif_create_date,
exif_iso, exif_f_number, exif_exposure_time, exif_35mm_focal_length, image_data)
VALUES(:id, sysdate, :exif_make, :exif_model,
to_date(:exif_create_date, 'yyyy:mm:dd hh24:mi:ss'),
:exif_iso, :exif_f_number, :exif_exposure_time, :exif_35mm_focal_length,
empty_blob())
DB Call    Count    Elapsed        CPU     Disk    Query  Current     Rows
------- -------- ---------- ---------- -------- -------- -------- --------
PARSE         71    0.0026s    0.0156s        0        0        0        0
EXEC          71    0.1203s    0.0781s        1       71      495       71
FETCH          0    0.0000s    0.0000s        0        0        0        0
------- -------- ---------- ---------- -------- -------- -------- --------
Total        142    0.1229s    0.0938s        1       71      495       71
Since the interval covered by the trace file was 74.9 seconds, 56 LOBs were inserted per
minute. What struck me was that there were also 71 parse calls for this INSERT statement. This
was an indication that the parse call for the INSERT statement was done in a loop instead of just
once before entering the loop.
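The remedy is to move the prepare (parse) call in front of the loop. A minimal Perl DBI sketch
of the corrected structure follows; the column list is abbreviated, and @ids stands for the
values to insert:

# Sketch: parse once before the loop, execute many times inside it.
my $sth = $dbh->prepare(
    q{INSERT INTO images (id, date_loaded, image_data)
      VALUES (?, sysdate, empty_blob())});   # single PARSE call
for my $id (@ids) {
    $sth->execute($id);                      # one EXEC per row, no further parsing
}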
It was also worth noting that 71 updates of the table SYS.SEQ$, the data dictionary base
table which holds sequences, had occurred:
Hash Value: 2635489469 - Total Elapsed Time (excluding think time): 0.082s
update seq$ set increment$=:2,minvalue=:3,maxvalue=:4,cycle#=:5,order$=:6,cache=:7,
highwater=:8,audit$ =:9,flags=:10 where obj#=:1
DB Call    Count    Elapsed        CPU     Disk    Query  Current     Rows
------- -------- ---------- ---------- -------- -------- -------- --------
PARSE         71    0.0026s    0.0000s        0        0        0        0
EXEC          71    0.0790s    0.0781s        0       71      142       71
FETCH          0    0.0000s    0.0000s        0        0        0        0
------- -------- ---------- ---------- -------- -------- -------- --------
Total        142    0.0815s    0.0781s        0       71      142       71
This probably meant that a sequence, which was not cached, was incremented 71 times.
To verify this assumption, I searched the trace file for the hash value 2635489469 from the
ESQLTRCPROF report3 and retrieved the bind variable value for column obj#. Since the script
awr_capture.sql enables SQL trace at level 12, the trace file does contain bind variables. Counting
from left to right, the tenth bind variable was applicable to obj#. Since bind variable values are
numbered from 0, I needed to look for Bind#9.
Bind#9
oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
kxsbbbfp=0915bf8c bln=22 avl=04 flg=05
value=53740
This yielded the object identifier of the sequence, which I then used to retrieve information
on the object from DBA_OBJECTS. It turned out that object 53740 was indeed a sequence and
that it was not cached.

SQL> SELECT object_name, object_type FROM dba_objects WHERE object_id=53740;
OBJECT_NAME   OBJECT_TYPE
------------- -------------------
IMAGE_ID_SEQ  SEQUENCE

SQL> SELECT cache_size
     FROM dba_objects o, dba_sequences s
     WHERE o.object_id=53740
     AND o.owner=s.sequence_owner
     AND o.object_name=s.sequence_name;
CACHE_SIZE
----------
         0
3. TKPROF omits the hash values for SQL statement texts.
All the other statements with the same hash value also had 53740 as the bind variable value
for Bind#9. The update of SYS.SEQ$ did not contribute significantly to the total response time.
However it was unnecessary overhead and very easy to fix with an ALTER SEQUENCE statement.
Another issue I noticed in the ESQLTRCPROF report was 71 commits.
Statistics:
-----------
COMMITs (read write): 71 -> transactions/sec 12.617
COMMITs (read only): 0
ROLLBACKs (read write): 0
ROLLBACKs (read only): 0
Apparently, each row inserted and LOB loaded was committed separately, adding overhead. Each commit may cause waiting for the wait event log file sync.
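In Perl DBI, committing per batch instead of per row only requires disabling AutoCommit and
calling commit explicitly. This is a minimal sketch; load_one_image() and @jpeg_files are
placeholders, and the batch size of 10,000 rows mirrors the suggestion made later in this chapter:

# Sketch: avoid one log file sync wait per row by committing in batches.
$dbh->{AutoCommit} = 0;        # turn off DBI's default commit-per-statement behavior
my $rows = 0;
for my $file (@jpeg_files) {
    load_one_image($dbh, $file);              # placeholder for the insert/LOB code
    $dbh->commit if ++$rows % 10_000 == 0;    # intermittent commit for large loads
}
$dbh->commit;                  # final commit after the load completes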
Phase 3—Reproduction
The client agreed to provide the Perl subroutine used for loading LOBs as a stand-alone Perl
program for further investigation on a test system. The DDL was also provided. It showed that
both the LOB and the sequence did not have caching enabled.
CREATE TABLE images(
  id number,
  date_loaded date,
  exif_make varchar2(30),
  exif_model varchar2(30),
  exif_create_date date,
  exif_iso varchar2(30),
  exif_f_number varchar2(30),
  exif_exposure_time varchar2(30),
  exif_35mm_focal_length varchar2(30),
  image_data BLOB,
  CONSTRAINT images_pk PRIMARY KEY(id)
)
LOB (image_data) STORE AS images_image_data;
CREATE SEQUENCE image_id_seq NOCACHE;
The first thing I did was to instrument the Perl program with the Hotsos instrumentation
library for ORACLE4 (ILO). Instrumentation with ILO is straightforward. The procedure
HOTSOS_ILO_TASK.BEGIN_TASK is used to start a new task with a certain module and action name.
BEGIN_TASK pushes the previous module and action on a stack as discussed in Chapter 23. The
procedure HOTSOS_ILO_TASK.END_TASK terminates a task and restores the previous module and
action. Both module and action are reflected in V$ views such as V$SESSION and V$SQL. To enable
SQL trace as soon as the first task is begun, the package HOTSOS_ILO_TIMER is called by the
application itself as follows:

4. ILO is free software and may be downloaded from http://sourceforge.net/projects/hotsos_ilo.
The download package includes documentation in HTML format.
begin
  hotsos_ilo_timer.set_mark_all_tasks_interesting(mark_all_tasks_interesting=>true,
                                                  ignore_schedule=>true);
end;
I used the module name “img_load” for the entire program and defined two actions:
• The action “exif_insert” encompassed the generation of a new primary key for the next
row and the retrieval of the LOB locator, which was then used to load the LOB data with
the Perl DBI function ora_lob_append.
• The action “lob_load” comprised reading the JPEG file from disk and loading it into the
BLOB column. Due to the average image file size of 1 MB, the LOB was loaded piece-wise.
I assumed that the file system access to read the JPEG file did not cause a lot of overhead.
If this assumption had turned out to be wrong, I would have instrumented the file system access in
Perl, in addition to instrumenting database access with Hotsos ILO.
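From Perl, the ILO calls may be issued through DBI. The following sketch assumes named
parameters module and action for BEGIN_TASK; please verify the exact signature against the
ILO documentation included in the download package:

# Sketch: wrap the LOB loading code path in an ILO task.
$dbh->do(q{BEGIN hotsos_ilo_task.begin_task(module => 'img_load',
                                            action => 'lob_load'); END;});
# ... read the JPEG file and append it to the BLOB column here ...
$dbh->do(q{BEGIN hotsos_ilo_task.end_task; END;});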
The test program allowed me to indicate how many LOBs it should load. For the sake of
simplicity, a single JPEG file was loaded again and again, although this would reduce the impact of
reading the image file due to file system caching, whilst the original application needed to read
separate image files each time.
I set up the database to collect statistics on the service, module, and actions of interest as
follows:
SQL> EXEC dbms_monitor.serv_mod_act_stat_enable('TEN.oradbpro.com', -
> 'img_load', 'exif_insert')
SQL> EXEC dbms_monitor.serv_mod_act_stat_enable('TEN.oradbpro.com', -
> 'img_load', 'lob_load')
I measured the response time of a load run comprising ten LOBs with the UNIX utility time.5
$ time perl img_load.pl 10 sample.jpg
real    0m9.936s
user    0m0.015s
sys     0m0.015s

5. A time utility for Windows ships with Cygwin.
Then I retrieved the relevant statistics on service, module, and action from
V$SERV_MOD_ACT_STATS.

SQL> SELECT action, stat_name, round(value/1000000, 2) AS value
     FROM v$serv_mod_act_stats
     WHERE service_name='TEN.oradbpro.com'
     AND module='img_load'
     AND action IN ('exif_insert','lob_load')
     AND stat_name in ('DB time', 'DB CPU', 'sql execute elapsed time',
                       'user I/O wait time')
     ORDER BY action, stat_name;
ACTION       STAT_NAME                 VALUE
------------ ------------------------ ------
exif_insert  DB CPU                      .02
exif_insert  DB time                     .02
exif_insert  sql execute elapsed time    .01
exif_insert  user I/O wait time            0
lob_load     DB CPU                      2.7
lob_load     DB time                    8.75
lob_load     sql execute elapsed time      0
lob_load     user I/O wait time         5.85
The total “DB time”, which does not include network latency and think time (SQL*Net message
from client), for both actions was 8.77 seconds. This was reasonably close to the response time
reported by the UNIX utility time (9.9 s). Since the latter measurement includes compilation of
the Perl program and connecting to the DBMS instance, some discrepancy had to be expected.
Thanks to instrumentation, it is very obvious that the bulk of the response time is due to the
action “lob_load”, which deals with loading the BLOB column.
Whereas the extended SQL trace file failed to account for CPU usage due to LOB loading,
V$SERV_MOD_ACT_STATS reported 2.7 seconds of CPU usage for loading 10 LOBs. The bulk of the
response time was attributed to user I/O wait time. This correlated with the large number of db
file sequential read, direct path read, and direct path write operations associated with
cursor 4 of the SQL trace file. Since at least 8.77 seconds (total “DB time” for both actions) out
of 9.93 seconds (response time measured by the UNIX utility time), or 88%, are spent in the DBMS
instance, the assumption that reading the JPEG file does not contribute significantly to response
time had proven to be correct.
Figures in V$SERV_MOD_ACT_STATS are cumulative since instance startup. To disable statistics
collection, these calls to DBMS_MONITOR need to be used:

SQL> EXEC dbms_monitor.serv_mod_act_stat_disable('TEN.oradbpro.com', -
> 'img_load', 'exif_insert')
SQL> EXEC dbms_monitor.serv_mod_act_stat_disable('TEN.oradbpro.com', -
> 'img_load', 'lob_load')
Disabling statistics collection in this way does not clear the statistics. If statistics collection
on the same module and action is re-enabled at a later time, the measurements taken before
are made available again, unless the instance has been restarted.
For the sake of comparing measurements taken before and after improving the response
time, I recorded another run with level 8 SQL trace enabled through event 10046, since I did not
want the additional executions and round-trips caused by ILO to impact the results. The resulting
ESQLTRCPROF report is reproduced here:
ORACLE version 10.2 trace file. Timings are in microseconds (1/1000000 sec)

Resource Profile
================
Response time: 0.779s; max(tim)-min(tim): 8.902s
Total wait time: 6.012s
----------------------------
Note: 'SQL*Net message from client' waits for more than 0.005s are considered think time

Wait events and CPU usage:
 Duration      Pct        Count    Average Wait Event/CPU Usage/Think Time
--------- -------- ------------ ---------- -----------------------------------
   3.834s  491.92%         1160  0.003305s db file sequential read
   1.259s  161.54%         1120  0.001124s direct path read
   0.350s   44.96%         1191  0.000294s SQL*Net message from client
   0.346s   44.40%           10  0.034603s think time
   0.090s   11.60%         6227  0.000015s direct path write
   0.068s    8.72%           23  0.002954s log file sync
   0.060s    7.66%         4500  0.000013s SQL*Net more data from client
   0.004s    0.52%         1201  0.000003s SQL*Net message to client
   0.000s    0.00%          119  0.000000s total CPU
  -5.232s -671.32%                         unknown
--------- -------- -----------------------------------------------------------
   0.779s  100.00% Total response time

Total number of roundtrips (SQL*Net message from/to client): 1201

CPU usage breakdown
-------------------
parse CPU: 0.00s (46 PARSE calls)
exec  CPU: 0.00s (47 EXEC calls)
fetch CPU: 0.00s (26 FETCH calls)

Statistics:
-----------
COMMITs (read write): 10 -> transactions/sec 12.830
Note that there are ten instances of think time. This makes sense, since ten iterations were
performed. This means that the profiler has correctly classified the time it takes to access the
JPEG file as think time.
Phase 4—Improvement
It was very obvious that the loading of LOB data needed improvement. From other projects
involving LOBs, I knew that access to LOBs that are not cached is quite slow. I also knew that
LOBs that are significantly larger than a few database blocks benefit from a larger block and
chunk size. The database block size of both the client’s database and my test database was 8 KB.
I configured a separate buffer pool with 16 KB block size, restarted the test instance, and created
a tablespace for LOB storage with a 16 KB block size. These are the SQL statements involved in
the task:
SQL> ALTER SYSTEM SET db_16k_cache_size=50m SCOPE=SPFILE;
SQL> CREATE TABLESPACE lob_ts DATAFILE '&data_file_path' SIZE 1G BLOCKSIZE 16384;
For a production system one would use a much larger value of DB_16K_CACHE_SIZE than
just 50 MB. Next, I moved the LOB segment into the new tablespace, increased the LOB chunk
size to the maximum value 32768, and disabled the storage of LOB data in row with the other
columns of the table.
SQL> ALTER TABLE images MOVE LOB (image_data)
     STORE AS (TABLESPACE lob_ts DISABLE STORAGE IN ROW
     CACHE RETENTION CHUNK 32768);
Since moving a table makes indexes unusable (an INSERT would cause ORA-01502), the
primary key index had to be rebuilt.
SQL> ALTER INDEX images_pk REBUILD;
The package DBMS_REDEFINITION, which supports online reorganization of tables and associated indexes, may be used to reduce the downtime incurred by this operation.
After applying these changes, I ran another load of ten rows with instrumentation enabled
and SQL trace disabled.
$ time perl img_load.pl 10 sample.jpg
real    0m2.550s
user    0m0.045s
sys     0m0.061s
The response time had come down to only 2.5 seconds. This was more than I had expected, so
I repeated the run another nine times. The average response time of ten runs was 3.02 seconds.
The preceding changes alone more than halved the previous response time.
Next I took a closer look at the Perl code. Apart from parsing inside a loop, which was
already evident from the SQL trace file and the resource profiles, I saw that the LOB was read
from the file system and sent to the DBMS instance in 8 KB pieces. This seemed small, since I
had increased the block size to 16 KB and the LOB chunk size to 32 KB. Here’s an excerpt of the
code that shows how the LOB column was loaded:
do {
    $bytes_read = sysread(LOBFILE, $data, 8192);   # read the next 8 KB piece of the JPEG file
    $total_bytes += $bytes_read;
    if ($bytes_read > 0) {
        my $rc = $dbh->ora_lob_append($lob_loc, $data);   # append the piece to the BLOB
    }
} until $bytes_read <= 0;
393
394
CHAPTER 28 ■ THE MERITS PERFORMANCE OPTIMIZATION METHOD
I decided to try the extreme and used a piece size of 1048576 (1 MB) instead of 8192 (8 KB).
The average of another ten runs was 1.71 seconds. Another 43% reduction in response time.
The two changes tested so far already more than met the goal set by the client, so I could have
stopped the optimization at this point. However, I wished to show the benefits of parsing once
and executing many times. Furthermore, I wanted to point out how INSERT RETURNING may be
used to reduce the number of round-trips between client and database server.
The original algorithm of the application was as follows:
1. Increment and retrieve the sequence used for numbering the primary key with the SQL
statement SELECT image id seq.NEXTVAL FROM dual.
2. Insert a row into the table IMAGES using the sequence value as the key for column ID. The
INSERT statement also initialized the BLOB with empty blob().
3. Retrieve the LOB locator using the index on column ID with SELECT image data FROM
images WHERE id=:id.
This required parsing and execution of three separate statements. However, the three
steps may be combined into a single step by using INSERT RETURNING as shown here:
INSERT INTO images (id, date loaded, exif make, exif model, exif create date,
exif iso, exif f number, exif exposure time, exif 35mm focal length, image data)
VALUES(image id seq.NEXTVAL, sysdate, :exif make, :exif model,
to date(:exif create date, 'yyyy:mm:dd hh24:mi:ss'), :exif iso,
:exif f number, :exif exposure time, :exif 35mm focal length, empty blob())
RETURNING id, rowid, image data INTO :id, :row id, :lob loc
Unfortunately this crashes the release of Perl that ships with Oracle10g Release 2 (DBI
version: 1.41 DBD::Oracle version: 1.15),6 but works with more recent releases. As a workaround,
the LOB locator may be fetched separately. With this workaround in place, the average response
time of ten runs was reduced further to 1.11 seconds.
Three issues remained to be fixed:
• The sequence, which was not cached
• The superfluous parse calls inside the loop which loaded the images
• Frequent commits inside the loop instead of once after finishing the load process (or at
least intermittently, say after 10000 rows when loading a large number of images)
I assigned the sequence a cache of 1000 numbers with the following DDL statement:
SQL> ALTER SEQUENCE image id seq CACHE 1000;
6. Oracle11g ships with the same DBI and DBD::Oracle releases, such that it’s not an option to use an
Oracle11g Perl client.
CHAPTER 28 ■ THE MERITS PERFORMANCE OPTIMIZATION METHOD
Caching more sequence numbers does not increase the memory usage by the shared pool.
Unused sequence numbers are noted in the data dictionary when an instance is shut down
with SHUTDOWN NORMAL or IMMEDIATE. Sequence numbers are lost only when an instance crashes
or is shut down with the ABORT option. Note that rolling back a transaction which has selected
NEXTVAL from a sequence also discards sequence numbers. Consequently, there is no reason
not to use a large sequence cache.
Finally, I modified the Perl program in such a way that both the INSERT and the SELECT
statements were parsed only once. To confirm that the statements were indeed parsed only
once, I enabled SQL trace with event 10046, ran the trace file through ESQLTRCPROF, and
looked at the parse calls. The figures for the INSERT statement are as follows:
DB Call
Count
Elapsed
CPU
Disk
Query Current
Rows
------- -------- ---------- ---------- -------- -------- -------- -------PARSE
1
0.0002s
0.0000s
0
0
0
0
EXEC
10
0.0459s
0.0156s
4
15
75
10
FETCH
0
0.0000s
0.0000s
0
0
0
0
------- -------- ---------- ---------- -------- -------- -------- -------Total
11
0.0460s
0.0156s
4
15
75
10
The numbers show that the statement was parsed once and executed ten times. Of course,
the assignment of values to bind variables remained in the loop. Merely the parse call (prepare
in Perl DBI) was moved outside the loop. This kind of benefit is not available when using literals in
statements, since the statement text changes for each loop iteration such that the statement
must be parsed each time. Reducing parse overhead is one of the reasons why bind variables
should be used. Another is the reduction of contention for the library cache. The entire resource
profile is shown here:
ORACLE version 10.2 trace file. Timings are in microseconds (1/1000000 sec)
Resource Profile
================
Response time: 0.269s; max(tim)-min(tim): 0.582s
Total wait time: 0.320s
---------------------------Note: 'SQL*Net message from client' waits for more than 0.005s are considered think
time
Wait events and CPU usage:
Duration
Pct
Count
Average Wait Event/CPU Usage/Think Time
-------- ------ ------------ ---------- ----------------------------------0.215s 80.10%
10 0.021544s think time
0.056s 20.93%
4620 0.000012s SQL*Net more data from client
0.035s 13.11%
1 0.035251s log file sync
0.013s
4.94%
33 0.000403s SQL*Net message from client
0.000s
0.06%
43 0.000004s SQL*Net message to client
0.000s
0.00%
41 0.000000s total CPU
-0.051s -19.13%
unknown
--------- ------- ----------------------------------------------------------0.269s 100.00% Total response time
395
396
CHAPTER 28 ■ THE MERITS PERFORMANCE OPTIMIZATION METHOD
Total number of roundtrips (SQL*Net message from/to client): 43
CPU usage breakdown
-----------------------parse CPU:
0.00s (5 PARSE calls)
exec CPU:
0.00s (24 EXEC calls)
fetch CPU:
0.00s (12 FETCH calls)
Statistics:
----------COMMITs (read write): 1 -> transactions/sec 3.718
In this resource profile, the response time R accounted for 46% of the 0.582 seconds captured
by the trace file. This time, the unknown portion of response time was only 19%.
I once again measured the actions with V$SERV MOD ACT STATS and DBMS MONITOR. Since
there is no way to clear statistics in V$SERV MOD ACT STATS, I used the action names “exif_
insert_imp” and “lob_load_imp” this time. Certainly, it would have been possible to take two
snapshots of V$SERV MOD ACT STATS and to calculate the differences, but changing the module
and action names made more sense, since the program had changed significantly too. The
necessary calls to DBMS MONITOR are as follows:
SQL> EXEC dbms monitor.serv mod act stat enable('TEN.oradbpro.com','img load', > 'exif insert imp')
SQL> EXEC dbms monitor.serv mod act stat enable('TEN.oradbpro.com','img load', > 'lob load imp')
Now the system was ready to measure another ten iterations.
$ time perl img load improved.pl 10 sample.jpg
real
0m0.688s
user
0m0.031s
sys
0m0.000s
This run was so fast that I had to get the figures with millisecond resolution to prevent
some values from becoming zero due to rounding.
SQL> SELECT action, stat name, round(value/1000000, 3) AS value
FROM v$serv mod act stats
WHERE service name='TEN.oradbpro.com'
AND module='img load'
AND action IN ('exif insert imp','lob load imp')
AND stat name in ('DB time', 'DB CPU', 'sql execute elapsed time',
'user I/O wait time')
ORDER BY action, stat name;
ACTION
STAT NAME
VALUE
--------------- ------------------------ ----exif insert imp DB CPU
.003
exif insert imp DB time
.003
exif insert imp sql execute elapsed time .003
exif insert imp user I/O wait time
0
CHAPTER 28 ■ THE MERITS PERFORMANCE OPTIMIZATION METHOD
lob
lob
lob
lob
load
load
load
load
imp
imp
imp
imp
DB CPU
DB time
sql execute elapsed time
user I/O wait time
.116
.152
.001
0
The DB time for both actions, which was formerly 8.77 seconds, was reduced to a mere
0.155 seconds. This accounts for only 22% of the response time as measured by the utility time,
such that it would not be a good idea to base a forecast on these figures. Note that DB time does
not include wait events that occur between database calls, such as SQL*Net message from client
and SQL*Net message to client.
Phase 5—Extrapolation
Since the original measurement of 68 LOBs per minute was done without instrumentation in
place and without SQL trace enabled, I also needed measurements that were not influenced by
such factors as the basis for my extrapolation. I did another series of ten measurements with
instrumentation and SQL trace disabled. The average elapsed time to insert 10 LOBs was 0.931
seconds. Based on this figure, the application should be able to load almost 645 LOBs per
minute. Use of the elapsed time (0.582 s) covered by the level 8 SQL trace file would result in
about 1030 LOBs per minute. The actual figure will probably lie somewhere in between these
two values.
Table 28-2 summarizes some of the differences between the original test case and the optimized test case. Figures are from a load run of 10 LOBs. Elapsed time was reduced to less than
one-tenth of the original value.
Table 28-2. Measurements Before and After Optimization
Metric
Original Test Case
Optimized Test Case
Elapsed time covered by trace file
8.90 s
0.58 s
Total wait time
6.01 s
0.32 s
SQL*Net round-trips
1201
43
Parse calls
46
5
Executions
47
24
Fetch calls
26
12
Think time
0.34 s
0.21 s
Phase 6—Installation
The one significant obstacle that had to be overcome was the reorganization of the LOB segment,
since it required downtime. Other than that, the changes were approved quickly. New measurements were taken after the table had been reorganized during a maintenance window. The
measurements were taken with instrumentation and SQL trace disabled. Throughput varied
between 640 and 682 LOBs per minute. This was reasonably close to the extrapolated value of
645 LOBs per minute. Compared to the original throughput, the speedup was more than tenfold.
397
398
CHAPTER 28 ■ THE MERITS PERFORMANCE OPTIMIZATION METHOD
Lessons Learned
The case study has emphasized that it’s not always possible to rely on instrumentation of the
DBMS, since some code paths may not be sufficiently instrumented. Even under such aggravating circumstances, it is fairly easy to determine where an application spends most of the
time when instrumentation is used. For optimum performance, both database structure and
application coding must leverage the rich features offered by the ORACLE DBMS, such as caching
of LOBs and sequences, reduction of parse overhead with bind variables, and diminution of
network round-trips with INSERT RETURNING. The default settings of LOBs are inapt to achieve
good performance.
Source Code Depot
Table 28-3 lists this chapter’s source files and their functionality.
Table 28-3. MERITS Method Source Code Depot
File Name
Functionality
awr capture.sql
Script for capturing performance data with extended SQL trace and
AWR. Temporarily sets ASH SAMPLE ALL=TRUE to cause ASH to sample
idle wait events for improved diagnostic expressiveness. Automatically
generates an AWR report and an ASH report for the traced session. Both
reports are generated in HTML format in the current directory.
ilo test.sql
This SQL script enables extended SQL trace with Hotsos ILO, runs
SELECT statements, begins tasks, and terminates tasks. It may be used
to learn what kinds of trace file entries are written by application
instrumentation.
img load.pl
Original suboptimal LOB loading test case. Contains DDL statements
for the table and sequence used. To run the program, include the
path to the installation directory of the Perl package Image::ExifTool
in the environment variable PERL5LIB.
img load improved.pl
Optimized LOB loading test case.
sp capture.sql
Script for capturing performance data with extended SQL trace and
Statspack. Includes taking a session level Statspack snapshot. Make
sure you install the fix for the bug that causes incorrect session level
reports (see source code depot of Chapter 25).
PA R T
9
Oracle Net
CHAPTER 29
■■■
TNS Listener IP Address
Binding and IP=FIRST
O
n systems that have more than one network adapter, the default behavior of the TNS
Listener is to accept connections from any network. In Oracle10g and subsequent releases
IP=FIRST may be used to restrict incoming connections to the network in which the host
configured in listener.ora resides.
The Oracle Database, Oracle Clusterware and Oracle Real Application Clusters Installation
Guide 10g Release 2 for AIX contains an example TNS Listener configuration that makes use of
IP=FIRST in the configuration file listener.ora, but does not explain its meaning. This option
is not documented in Oracle Database Net Services Reference 10g Release 2. IP=FIRST is documented in Oracle Database Net Services Reference 11g Release 1, yet the documentation is not
entirely correct.1 The manual does not address all the implications of the parameter. Furthermore it is undocumented that the listener’s behavior concerning the loopback adapter is also
influenced by IP=FIRST.
Introduction to IP Address Binding
Internet protocol network addresses may be classified into three functional categories:
• Boot IP addresses
• Common (non-boot) IP addresses
• Service IP addresses
Each system in a network is identified by a unique host name. The IP address assigned to
such a unique host name is called the boot IP address. This IP address is bound to a network
adapter during the boot process. The unique host name of a system is returned by the command
hostname on both UNIX and Windows. The host name thus returned maps to the boot IP address.
No additional software except the operating system itself is required for the availability of the
boot IP address. Common (non-boot) addresses are addresses used by adapters other than the
boot adapter (the adapter which was assigned the boot address). Of course, the latter adapters
1. Among other things, the syntax of the example is wrong and the C language preprocessor macro
INADDR ANY is reproduced incorrectly.
401
402
CHAPTER 29 ■ TNS LISTENER IP ADDRESS BINDING AND IP=FIRST
are normally assigned addresses during the boot process as well. Any system has exactly one
boot IP address. To obtain the boot IP address of a UNIX system, you might ping the system itself.
$ ping -c 1 `hostname`
PING dbserver1.oradbpro.com (172.168.0.1) 56(84) bytes of data.
64 bytes from dbserver1.oradbpro.com (172.168.0.1): icmp seq=0 ttl=64 time=0.029 ms
--- dbserver1.oradbpro.com ping statistics --1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.029/0.029/0.029/0.000 ms, pipe 2
On Linux, the switch -c instructs ping to send a single packet. Use ping host_name packet_
size count with a count of 1 on Solaris.
The term service address is used to refer to IP addresses that applications use to provide
certain services. Often, service addresses are not assigned to a separate physical network adapter,
but are added as an alias address to a network adapter that already has a boot address or a nonboot address. This approach is called IP aliasing. A service address is also a non-boot address,
the difference being that the former is assigned to a virtual adapter whereas the latter is assigned to
a physical adapter.
Within a single operating system instance, each network adapter is identified by a unique
name. AIX uses entN, Solaris hmeN (hme=hundred megabit ethernet) or geN (ge=gigabit
ethernet), whereas Linux uses ethN, where N is an integer that distinguishes several adapters
of the same type. Additional adapter names are required for use with IP aliasing. The adapter
name for adding the alias IP is formed by adding a colon and a number to the physical adapter’s
name. For instance, the physical adapter might be called eth0 and the adapter for IP aliasing
eth0:1. The alias IP address cannot be chosen arbitrarily. It must reside within the same network as
the IP address of the associated physical adapter. The network mask must be taken into consideration too (Mask:255.255.255.0 in the next example). These topics are well beyond the scope of
this book.2
Clustering software relocates service addresses from a failed node of a cluster to a surviving
node in the same cluster. Service IP addresses assigned by Oracle10g Clusterware are called
virtual IP addresses (VIPs). This is just another term for the same IP aliasing concept. In a RAC
cluster, the VIP of a failing node is assigned to a surviving node. The implementation of virtual
IP addresses uses IP aliasing. IP aliasing can easily be done manually by the UNIX user root
using the command ifconfig. Here’s an example that adds a service IP address to adapter eth1
on a Linux system:
# ifconfig eth1
eth1
Link encap:Ethernet HWaddr 00:0C:29:07:84:EC
inet addr:172.168.0.1 Bcast:172.168.0.255 Mask:255.255.255.0
inet6 addr: fe80::20c:29ff:fe07:84ec/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
…
# ifconfig eth1:1 inet 172.168.0.11 # IP aliasing on adapter eth1
2. For a discussion of subnetting, see http://en.wikipedia.org/wiki/Subnetwork.
CHAPTER 29 ■ TNS LISTENER IP ADDRESS BINDING AND IP=FIRST
# ifconfig eth1:1
eth1:1
Link encap:Ethernet HWaddr 00:0C:29:07:84:EC
inet addr:172.168.0.11 Bcast:172.168.255.255 Mask:255.255.0.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Interrupt:169 Base address:0x2080
# ping -c 1 172.168.0.11
PING 172.168.0.11 (172.168.0.11) 56(84) bytes of data.
64 bytes from 172.168.0.11: icmp seq=0 ttl=64 time=0.072 ms
--- 172.168.0.11 ping statistics --1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.072/0.072/0.072/0.000 ms, pipe 2
Multihomed Systems
Systems where ORACLE database software is installed may have more than a single network
interface controller (NIC). A machine might belong to different LAN segments or it might have
a separate network adapter for backup and restore traffic. Clustered systems running Real
Application Clusters (RAC) have a separate private network called interconnect for communication between individual RAC DBMS instances. Machines with several network interface
controllers (a.k.a. network adapters) are sometimes referred to as multihomed, since they are
“at home” in more than just one network. If only we as humans were as fortunate as computer
systems and did not have to pay an extra rent for multiple homes.
The term multihomed is somewhat ambiguous. It may be used in a broad sense where it
describes any machine that has several network adapters connected to one or more networks,
or in a narrow sense where it only applies to Internet access via more than a single network for
higher reliability.3
It may be the intention of a network administrator to allow clients on a certain LAN segment
access to a DBMS instance on a multihomed system, but deny access to clients on another LAN
segment. To achieve this, the DBA needs to make sure that the TNS Listener can be reached
from one LAN segment but not the other. Technically speaking, the TNS Listener should bind
to the IP address of one adapter, but not the other.
Now it’s time to finally fill you in on the rather awkward acronym INADDR ANY, that was
mentioned in the status section at the beginning of this chapter. It is a C language preprocessor
macro used in the C language programming interface for sockets. On POSIX-compliant systems,
TCP/IP as well as UDP networking is implemented using sockets (see man socket). The socket
programming paradigm for a server process such as the TNS Listener is to
1. create a socket
2. bind the socket to an address, potentially using INADDR ANY
3. listen for incoming connections
4. accept incoming connections (or maybe not as in valid node checking discussed in
Chapter 30)
3. The latter definition is found on Wikipedia at http://en.wikipedia.org/wiki/Multihoming.
403
404
CHAPTER 29 ■ TNS LISTENER IP ADDRESS BINDING AND IP=FIRST
The C language routines for these four steps are socket, bind, listen, and accept. The use
of INADDR ANY tells the operating system that the creator of the socket is willing to communicate
with any outside system that may contact it. The Solaris documentation describes INADDR ANY
as follows ([Sol8 2000]):
By using the special value INADDR_ANY with IP, or the unspecified address (all zeros)
with IPv6, the local IP address can be left unspecified in the bind() call by either active
or passive TCP sockets. This feature is usually used if the local address is either
unknown or irrelevant. If left unspecified, the local IP or IPv6 address will be bound at
connection time to the address of the network interface used to service the connection.
The opposite would be to allow connections from hosts residing in a certain network only.
The value of INADDR ANY is an IP address that consists of all zeros (0.0.0.0). Three tools are suitable
for verifying that the TNS Listener behaves as expected:
• telnet, to establish a connection to the TNS Listener and verify that it is actually listening on
a port
• netstat, to list network connections (generic) as well as related processes (some operating
systems only)
• A UNIX system call tracing utility such as strace (Linux), truss (Solaris, AIX), or tusc
(HP-UX)
UNIX system calls are the interface to the UNIX operating system kernel. The POSIX
(Portable Operating System Interface) standard is a specification for UNIX system calls. It is
maintained by The Open Group.4 Of course tnsping could also be used, but according to the
documentation (Oracle Database Net Services Administrator’s Guide 10g Release 2) it requires a
Net service name as an argument. Fortunately, the undocumented feature of supplying an
address specification on the command line as in the following example is a big time saver:
$ tnsping '(ADDRESS=(PROTOCOL=TCP)(Host=172.168.0.1)(Port=1521))'
TNS Ping Utility for Linux: Version 10.2.0.3.0 - Production on 22-JUL-2007 20:13:38
Copyright (c) 1997, 2006, Oracle. All rights reserved.
Attempting to contact (ADDRESS=(PROTOCOL= TCP)(Host=172.168.0.1)(Port=1521))
OK (10 msec)
The address specification follows the same syntax as a full-fledged DESCRIPTION in tnsnames.
ora, but omits sections such as CONNECT DATA or FAILOVER MODE, which are irrelevant to tnsping.
On UNIX systems, quotes around it are necessary since parentheses have special meaning to
shells. From Oracle10g onward, tnsping also supports the easy connect format host_name:
port/instance_service_name, which also does not require Net service name resolution.
C:> tnsping dbserver:1521/ten.oradbpro.com
TNS Ping Utility for 32-bit Windows: Version 10.2.0.1.0 - Production on 14-DEC-2007
18:58:59
Copyright (c) 1997, 2005, Oracle. All rights reserved.
Used parameter files:
4. See http://www.pasc.org.
CHAPTER 29 ■ TNS LISTENER IP ADDRESS BINDING AND IP=FIRST
C:\oracle\admin\network\admin\sqlnet.ora
Used HOSTNAME adapter to resolve the alias
Attempting to contact (DESCRIPTION=(CONNECT DATA=(SERVICE NAME=ten.oradbpro.com))
(ADDRESS=(PROTOCOL=TCP)
(HOST=169.254.212.142)(PORT=1521)))
OK (10 msec)
When using the easy connect format with tnsping, the instance_service_name is optional.
If specified, it is not verified by tnsping.
IP=FIRST Disabled
Let’s investigate what happens when IP=FIRST is disabled. The following tests were performed
with Oracle10g, since Oracle9i does not support IP=FIRST. However, an Oracle9i TNS Listener
has the same behavior as an Oracle10g or Oracle11g TNS Listener without IP=FIRST. The test
system’s host name is dbserver1.oradbpro.com. This host name maps to the boot IP address
172.168.0.1.
$ ping -c 1 `hostname`
PING dbserver1.oradbpro.com (172.168.0.1) 56(84) bytes of data.
64 bytes from dbserver1.oradbpro.com (172.168.0.1): icmp seq=0 ttl=64 time=0.011 ms
Host Name
For this demonstration, the TNS Listener configuration in listener.ora contained the host
name of the test system.
LISTENER=
(DESCRIPTION =
(ADDRESS LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST=dbserver1.oradbpro.com)(PORT=1521))
)
)
)
Coming back to the digression on UNIX network programming with sockets, let’s take a
look at the bind calls the TNS Listener makes when IP=FIRST is disabled. On Linux, strace may
be used to trace the bind system calls the TNS Listener makes. The basic syntax for starting and
tracing a process with strace is as follows:
strace -f -o output file command arguments
The switch -f tells strace to trace across forks (creation of a child process) and -o is for
specifying the output file where strace writes the system calls it captured. So let’s start a TNS
Listener under the control of strace.
$ strace -f -o /tmp/strace no ip first.out lsnrctl start
405
406
CHAPTER 29 ■ TNS LISTENER IP ADDRESS BINDING AND IP=FIRST
Without IP=FIRST, the trace output contains a bind call, which uses an incoming address
of all zeros (INADDR ANY).
$ grep "bind.*1521" /tmp/strace no ip first.out
25443 bind(8, {sa family=AF INET, sin port=htons(1521), sin addr=inet addr("0.0.0.0"
)}, 16) = 0
Running telnet IP_address listener_port for all the IP addresses associated with the host,
confirms that the TNS Listener accepts connections on any network. This is also evident from
netstat output (multiple connections between telnet and tnslsnr must be open to get this
output).
$ netstat -np|egrep 'Proto|telnet|tnslsnr'
Proto Recv-Q Send-Q Local Address
Foreign Address
me
tcp
et
tcp
et
tcp
et
tcp
snr
tcp
snr
tcp
snr
tcp
snr
tcp
snr
State
PID/
Program na
0
0 192.168.10.132:50388 192.168.10.132:1521 ESTABLISHED 25610/teln
0
0 172.168.0.1:50393
172.168.0.1:1521
ESTABLISHED 25613/teln
0
0 127.0.0.1:50394
127.0.0.1:1521
ESTABLISHED 25614/teln
0
0 172.168.0.1:1521
172.168.0.1:50319
ESTABLISHED 25489/tnsl
0
0 172.168.0.1:1521
172.168.0.1:50320
ESTABLISHED 25489/tnsl
0
0 192.168.10.132:1521
192.168.10.132:50388 ESTABLISHED 25489/tnsl
0
0 172.168.0.1:1521
172.168.0.1:50393
ESTABLISHED 25489/tnsl
0
0 127.0.0.1:1521
127.0.0.1:50394
ESTABLISHED 25489/tnsl
The switch -n tells netstat to use numbers instead of names for host names and ports. The
switch -p (Linux specific) is for displaying process names. The INADDR ANY value of 0.0.0.0 can
be seen in the column “Local Address” of the netstat output, if the switch -l is used. This
switch restricts the report to sockets with status LISTEN, i.e., sockets that are not yet connected
to a client program.
$ netstat -tnlp | egrep 'Proto|tns'
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp
0
0 0.0.0.0:1521 0.0.0.0:*
LISTEN
PID/Program name
25489/tnslsnr
The netstat switch -t limits the output to TCP sockets only. On Windows, you would run
netstat with the switches -abno -p tcp, to get nearly the same report as on UNIX.
C:> netstat -abno -p tcp
CHAPTER 29 ■ TNS LISTENER IP ADDRESS BINDING AND IP=FIRST
Active Connections
Proto Local Address
TCP
0.0.0.0:1521
[TNSLSNR.exe]
TCP
127.0.0.1:1521
[TNSLSNR.exe]
TCP
127.0.0.1:2431
[telnet.exe]
TCP
192.168.10.1:1521
[TNSLSNR.exe]
TCP
192.168.10.1:2432
[telnet.exe]
Foreign Address
0.0.0.0:0
State
LISTENING
PID
4524
127.0.0.1:2431
ESTABLISHED
4524
127.0.0.1:1521
ESTABLISHED
4836
192.168.10.1:2432
ESTABLISHED
4524
192.168.10.1:1521
ESTABLISHED
4224
Loopback Adapter
You may be surprised to find that the TNS Listener may also be contacted at the IP address
127.0.0.1 of the loopback adapter. The loopback adapter provides IP networking within the
boundaries of a machine for situations where a system is not connected to a network, but
needs TCP/IP for applications that require it. The host name “localhost” is mapped to the IP
address 127.0.0.1 in the hosts configuration file. This file is /etc/hosts on UNIX and %SYSTEM
ROOT%\system32\drivers\etc\hosts on Windows. On a laptop computer, the loopback IP
address may be explicitly assigned to the TNS Listener when working offline. This will allow
use of the TNS Listener over TCP/IP even when not connected to any network. Another option
for local connections via the TNS Listener is the IPC protocol. To check whether a process is
listening on port 1521 of the loopback address, the following telnet command may be used:
$ telnet localhost 1521
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
Since telnet did not terminate with the error “Connection refused”, the TNS Listener
accepted the connection from telnet. The strace output contains a separate bind call for the
IP address 127.0.0.1.
25443 bind(12, {sa family=AF INET, sin port=htons(0),
sin addr=inet addr ("127.0.0.1")}, 16) = 0
Boot IP Address
It is undocumented in releases prior to Oracle11g that the TNS Listener does not bind to INADDR
ANY if the boot IP address is used in listener.ora, instead of the system’s host name. Following
is a modified TNS Listener configuration that uses the boot IP address:
407
408
CHAPTER 29 ■ TNS LISTENER IP ADDRESS BINDING AND IP=FIRST
LISTENER=
(DESCRIPTION =
(ADDRESS LIST =
(ADDRESS = (PROTOCOL=TCP)(HOST=172.168.0.1)(PORT=1521))
)
)
)
After restarting the TNS Listener, the local address corresponds to the configured IP
address.
$ netstat -tnlp|grep tns
Proto Recv-Q Send-Q Local Address
tcp
0
0 172.168.0.1:1521
Foreign Address
0.0.0.0:*
State
LISTEN
PID/Program name
25630/tnslsnr
It is no longer possible to connect to the TNS Listener on the loopback address or any
other IP address that is not explicitly configured.
$ telnet localhost 1521
Trying 127.0.0.1...
telnet: connect to address 127.0.0.1: Connection refused
$ tnsping 192.168.10.132:1521
TNS Ping Utility for Linux: Version 10.2.0.3.0 - Production on 08-SEP-2007 15:47:25
Copyright (c) 1997, 2006, Oracle. All rights reserved.
Used parameter files:
/opt/oracle/product/db10.2/network/admin/sqlnet.ora
Used HOSTNAME adapter to resolve the alias
Attempting to contact (DESCRIPTION=(CONNECT DATA=(SERVICE NAME=192.168.10.132))
(ADDRESS=(PROTOCOL=TCP)(HOST= 192.168.10.132)(PORT=1521)))
TNS-12541: TNS:no listener
With this configuration, the boot IP address is used in the bind call traced with strace:
$ strace -f -o /tmp/strace boot ip.out lsnrctl start
$ grep "bind.*1521" /tmp/strace boot ip.out
25689 bind(8, {sa family=AF INET, sin port=htons(1521),
sin addr=inet addr("172.168.0.1")}, 16) = 0
Service IP Address
To complement the probe into the TNS Listener’s undocumented features, let’s see what happens
when the TNS Listener is configured with a host name that maps to a service IP address. The
alias IP address 172.168.0.11 of adapter eth1:1 that was defined above will serve as an example.
This IP address is assigned to the host name vip-dbserver1 in the configuration file /etc/hosts.
$ grep 172.168.0.11 /etc/hosts
172.168.0.11
vip-dbserver1.oradbpro.com
vip-dbserver1
CHAPTER 29 ■ TNS LISTENER IP ADDRESS BINDING AND IP=FIRST
Thus, the ADDRESS entry in listener.ora becomes:
(ADDRESS=(PROTOCOL=TCP)(HOST=vip-dbserver1)(PORT=1521))
When assigned a host name that maps to a service IP address (i.e., not the boot IP address),
the TNS Listener binds specifically to that address and does not use INADDR ANY. The output
from netstat shown here confirms this:
$ netstat -tnlp|egrep 'Proto|tns'
Proto Recv-Q Send-Q Local Address
Foreign Address State
tcp
0
0 172.168.0.11:1521 0.0.0.0:*
LISTEN
PID/Program name
27050/tnslsnr
Further testing reveals that INADDR ANY is also not used for non-boot IP addresses.
IP=FIRST Enabled
To enable IP=FIRST, the ADDRESS line in listener.ora must be modified as follows:
(ADDRESS=(PROTOCOL=TCP)(HOST=172.168.0.1)(PORT=1521)(IP=FIRST))
After restarting the TNS Listener (lsnrctl reload listener_name is not sufficient) with this
configuration, the bind call in the strace output file has changed. Where previously 0.0.0.0
(INADDR ANY) was used, there is now the IP address, which the system’s host name resolves to.
bind(8, {sa family=AF INET, sin port=htons(1521),
sin addr=inet addr("172.168.0.1")}, 16) = 0
The TNS Listener is again functional at the assigned address 172.168.0.1.
$ telnet 172.168.0.1 1521
Trying 172.168.0.1...
Connected to 172.168.0.1.
Escape character is '^]'.
Connection closed by foreign host.
The TNS Listener closes the connection to telnet after a few minutes. Type Ctrl+C, Ctrl+D,
and hit return to abort the connection. The program telnet then responds with “Connection
closed by foreign host.” as in the preceding code example. Now that IP=FIRST is in effect,
attempting to contact the TNS Listener on any address other than the boot IP address fails.
$ telnet 77.47.1.187 1521
Trying 77.47.1.187...
telnet: connect to address 77.47.1.187: Connection refused
$ telnet localhost 1521
Trying 127.0.0.1...
telnet: connect to address 127.0.0.1: Connection refused
# netstat -tnlp | egrep 'Proto|tns'
Proto Recv-Q Send-Q Local Address
Foreign Address State
tcp
0
0 172.168.0.1:1521 0.0.0.0:*
LISTEN
PID/Program name
27916/tnslsnr
409
410
CHAPTER 29 ■ TNS LISTENER IP ADDRESS BINDING AND IP=FIRST
Table 29-1 summarizes the TNS Listener’s IP address binding behavior under all situations
which may arise. Note that the TNS Listener’s behavior for boot IP address, non-boot IP
address or host name, and service IP addresses or host name is identical, irrespective of the
option IP=FIRST. In other words, IP=FIRST solely has an impact when the system’s host name is
used in listener.ora.
Table 29-1. TNS Listener IP Address Binding
HOST Setting
in listener.ora
IP=FIRST
Boot IP
Address
Other Non-Boot IP Address
or Service IP Address (IP Aliasing)
Loopback
Address
System host name
(maps to boot IP address)
yes
yes
no
no
System host name
(maps to boot IP address)
no
yes
yes
yes
Boot IP address
no
yes
no
no
Non-Boot IP address/host
name
no
no
no
no
Service IP
address/host name
no
no
no
no
The first row of the table represents the TNS Listener’s behavior when a system’s host
name is used in the configuration file listener.ora and IP=FIRST is set. Columns 3–5 indicate
the IP address binding behavior under the settings in columns 1 and 2. The value “yes” means that
the TNS Listener does bind to the type of IP address indicated in the table’s column heading. Thus,
the TNS listener binds solely to the boot IP address under the settings depicted in the first row.
Lessons Learned
This chapter investigated the TNS Listener’s binding to IP addresses on multihomed systems
as well as its use of the loopback adapter. Several undocumented aspects were found:
• When assigned a system’s host name, the TNS Listener uses INADDR ANY and thus can be
reached from any network as well as via the loopback adapter, which always has the IP
address 127.0.0.1.
• When assigned the boot IP address or a non-boot IP address or host name, the TNS
Listener does not use INADDR ANY, but instead binds specifically to the address assigned.
It also refrains from using the loopback adapter.
• When the option IP=FIRST is enabled, the TNS Listener binds specifically to the IP address,
which the configured host name resolves to and cannot be reached from any other IP
address including the loopback address. This option is relevant only if the system’s host
name is assigned to the parameter HOST in listener.ora.
CHAPTER 29 ■ TNS LISTENER IP ADDRESS BINDING AND IP=FIRST
Thus, there are three solutions for TNS Listeners, which shall not be reached from
any network:
• Use the boot IP address instead of the system’s host name (which maps to the boot
IP address).
• Use a non-boot or service IP address or host name (neither the system’s host name nor
a host name that resolves to the boot IP address).
• Configure the option IP=FIRST when referencing the system’s host name in listener.ora
(requires Oracle10g or later release).
411
CHAPTER 30
■■■
TNS Listener TCP/IP
Valid Node Checking
L
istener valid node checking may be used to prevent malicious or errant Oracle Net connections to DBMS instances. It’s a “poor man’s firewall” under control of the DBA. Production
DBMS instances may by separated from test and development instances without additional
hardware or firewall software simply by specifying a list of nodes that may contact the listener.
Valid node checking is documented, but it is undocumented that the parameters are fully
dynamic in Oracle10g and Oracle11g, such that the configuration may be enabled, changed,
and removed without stopping and restarting the TNS Listener, rendering the feature much
less intrusive.
Introduction to Valid Node Checking
Valid node checking is an interesting security feature that protects DBMS instances from
malevolent or errant Oracle Net connections over TCP/IP, without the need for a firewall or IP
address filtering at the operating system level. The feature is available in Oracle9i and subsequent releases at no extra cost.
Here’s an anecdote that illustrates why valid node checking is a worthwhile feature. A
production database that had several database jobs was copied onto a test machine. The database jobs started running on the test machine. Some of these jobs were using database links.
Since the database link definition contained a full Net service name definition, instead of referencing a Net service name in tnsnames.ora (an undocumented feature), the test system was
able to access a critical production system and caused a deterioration in its performance. The
administrators got off lightly since the jobs were read-only. Imagine what could have happened
had the jobs modified production data. Correctly configured valid node checking would have
prevented the issue.
The feature is controlled by the three parameters tcp.validnode checking, tcp.invited
nodes, and tcp.excluded nodes, which are presented in Table 30-1.
413
414
CHAPTER 30 ■ TNS LISTENER TCP/IP VALID NODE CHECKING
Table 30-1. Valid Node Checking Parameters
Name
Purpose
Values
Default
tcp.validnode checking
Turns valid node
checking on or off
yes, no
no
tcp.invited nodes
List of nodes that may
connect to the TNS
Listener
Comma separated list of host
names and/or IP addresses
on a single line
empty list
tcp.excluded nodes
List of nodes that are
denied a connection to
the TNS Listener
Comma separated list of host
names and/or IP addresses
on a single line
empty list
It appears that the code path for valid node checking is always executed. As long as the
feature is not in use, the lists for invited and excluded hosts are empty, thus allowing any client
to connect. This assumption is based on the observation that the TNS Listener writes lines such
as in the following excerpt on each connect by a client to its trace file, irrespective of the setting
of tcp.validnode checking:
[12-JUL-2007 20:03:12:268] nttcnp: Validnode Table IN use; err 0x0
[12-JUL-2007 20:03:12:268] nttvlser: valid node check on incoming node 10.6.6.64
Only when the TNS Listener trace level is at least at level ADMIN and tcp.validnode
checking=no is set, does the trace file contain evidence that valid node checking is switched off:
[12-JUL-2007 19:52:28:329] ntvllt: tcp.validnode checking not turned on
Let’s take a look at some examples. The prompts dbserver$ and client$ indicate where
each command was run. The listener.ora and sqlnet.ora on the server are shown here:
dbserver$ head -200 listener.ora sqlnet.ora
==> listener.ora <==
LISTENER =
(DESCRIPTION LIST =
(DESCRIPTION =
(ADDRESS LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST =dbserver.oradbpro.com)(PORT = 1521))
)
)
)
trace level listener=admin
==> sqlnet.ora <==
NAMES.DIRECTORY PATH= (TNSNAMES)
Valid node checking is currently switched off. This is evident from the configuration file
sqlnet.ora reproduced in the preceding code example. Following is the tnsnames.ora on
the client:
CHAPTER 30 ■ TNS LISTENER TCP/IP VALID NODE CHECKING
client$ cat tnsnames.ora
TEN TCP.WORLD =
(DESCRIPTION =
(ADDRESS LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST=dbserver.oradbpro.com)(PORT = 1521))
)
(CONNECT DATA =
(SERVICE NAME = TEN)
)
)
Let’s start the TNS Listener and verify that the client can connect to it. For the sake of
conciseness, lsnrctl output, which does not indicate whether valid node checking is configured, is omitted.
dbserver$ lsnrctl start
Since the preceding sqlnet.ora file does not contain any of the three valid node checking
parameters, the feature is disabled and any client can connect successfully.
client$ sqlplus -l ndebes/secret@ten tcp.world
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
SQL> EXIT
Disconnected from Oracle Database 10g Enterprise Edition Rlease
10.2.0.1.0 - Production
Enabling and Modifying Valid Node Checking
at Runtime
I claimed that valid node checking could be enabled dynamically in Oracle10g and Oracle11g,
that is, without stopping and restarting the TNS Listener. Note that this works only if the
configuration file sqlnet.ora was present when the TNS Listener was started. Let’s verify my
claim by changing sqlnet.ora as follows on the server and running lsnrctl reload:1
dbserver$ cat sqlnet.ora
NAMES.DIRECTORY PATH=(TNSNAMES)
tcp.validnode checking=yes
tcp.excluded_nodes=(client.oradbpro.com)
dbserver$ lsnrctl reload
The output of lsnrctl reload does not indicate whether valid node checking is enabled or
not. Now, an attempt to connect from the client system client.oradbpro.com fails.
1. It appears that the LSNRCTL utility caches the TNS Listener configuration. When testing, you should
always run lsnrctl from the command line, instead of leaving the utility open and running multiple
commands at the LSNRCTL prompt. The latter approach may not pick up changes to listener.ora.
415
416
CHAPTER 30 ■ TNS LISTENER TCP/IP VALID NODE CHECKING
client$ sqlplus -l ndebes/secret@ten tcp.world
SQL*Plus: Release 10.2.0.1.0 - Production on Fri Jul 13 02:06:33 2007
ERROR:
ORA-12537: TNS:connection closed
SP2-0751: Unable to connect to Oracle. Exiting SQL*Plus
Of course, translation of the client host name to an IP address with DNS, NIS, or other
method must be configured. IP addresses may also be used in the list of invited or excluded
hosts. If the TNS Listener trace level is at least USER, an entry like the following, which identifies
the client that was denied, is written to the TNS Listener trace file:
13-JUL-2007 02:21:02:109] nttvlser: valid node check on incoming node 88.215.114.53
13-JUL-2007 02:21:02:109] nttvlser: Denied Entry: 88.215.114.53
Setting the list of invited nodes in such a way that client.oradbpro.com is included and
running another reload enables the client to connect again.
dbserver$ cat sqlnet.ora
tcp.validnode checking=yes
tcp.invited_nodes=(client.oradbpro.com)
dbserver$ lsnrctl reload
client$ sqlplus -l ndebes/secret@ten tcp.world
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
The successful connection by the client is logged as follows in the TNS Listener trace file:
[13-JUL-2007 02:24:44:789] nttvlser: valid node check on incoming node 88.215.114.53
[13-JUL-2007 02:24:44:789] nttvlser: Accepted Entry: 88.215.114.53
If tcp.invited nodes is set, any node not mentioned in the list is denied access.
dbserver$ cat sqlnet.ora
tcp.validnode checking=yes
tcp.invited nodes=(192.168.0.1)
dbserver$ lsnrctl reload
client$ sqlplus -l ndebes/secret@ten tcp.world
SQL*Plus: Release 10.2.0.1.0 - Production on Fri Jul 13 02:06:33 2007
ERROR:
ORA-12537: TNS:connection closed
SP2-0751: Unable to connect to Oracle. Exiting SQL*Plus
Of course, the denied hosts also include the system where the TNS Listener is running,
such that subsequent LSNRCTL commands over TCP/IP fail. You need to include the local
system in tcp.invited nodes to allow LSNRCTL commands over TCP/IP. Another method is to
use an IPC protocol entry as the first ADDRESS of the TNS Listener. This tells the LSNRCTL utility
to communicate with the TNS Listener using IPC, which is obviously exempt from TCP/IP valid
node checking. The next example shows a TNS Listener definition, which uses the IPC protocol
in the first ADDRESS entry:
CHAPTER 30 ■ TNS LISTENER TCP/IP VALID NODE CHECKING
LISTENER =
(DESCRIPTION LIST =
(DESCRIPTION =
(ADDRESS LIST =
(ADDRESS = (PROTOCOL = IPC)(KEY = TEN))
(ADDRESS = (PROTOCOL = TCP)(HOST =dbserver.oradbpro.com)(PORT = 1521))
)
)
)
By default, an Oracle9i TNS Listener is unprotected against STOP commands from remote
nodes. Instead of using a TNS Listener password to prevent someone from another host within
the network from shutting down the TNS Listener, you could also use valid node checking. The
downside is that the list of invited nodes has to include all the machines that may access the
TNS Listener. These could still be used to remotely stop the TNS Listener, but might be trusted
systems. This is interesting news for installations that run clustering software, which protects
the ORACLE TNS Listener against node failure, but does not support TNS Listener passwords
(e.g., VERITAS Cluster Server prior to release 4).
If you want to take a more relaxed approach, you may set only tcp.excluded nodes and list
systems that you are certain may not connect to the TNS Listener, and thus the instance(s)
served by the TNS Listener. All nodes not mentioned will be able to connect. Host names and
IP addresses may be used at the same time.
There’s no sense in setting both tcp.invited nodes and tcp.excluded nodes at the same
time, since even nodes not mentioned explicitly as excluded nodes will still be excluded when
tcp.invited nodes is set. If a node name is contained in both tcp.excluded nodes and tcp.
invited nodes, tcp.invited nodes takes precedence and the node is allowed access. In Oracle9i, if
there is a single node name that cannot be resolved to an IP address, this error is logged to the
trace file:
[12-JUL-2007 21:25:10:162] nttcnp: Validnode Table **NOT** used; err 0x1f7
Valid node checking is switched off when this error occurs. Unfortunately, the Oracle9i
LSNRCTL utility does not write an error message to the terminal. In the presence of invalid host
names, Oracle10g lsnrctl startup fails with “TNS-12560: TNS:protocol adapter error” and
“TNS-00584: Valid node checking configuration error”. Using oerr on TNS-00584 gives this:
$ oerr tns 584
00584, 00000, "Valid node checking configuration error"
// *Cause:Valid node checking specific Oracle Net configuration is invalid.
// *Action:Ensure the hosts specified in the "invited nodes" and "excluded nodes"
// are valid. For further details, turn on tracing and reexecute the operation.
If TNS Listener tracing is enabled, the trace file will contain a message similar to the following:
[12-JUL-2007 23:27:16:808] snlinGetAddrInfo: Name resolution failed for
wrong.host.name
[12-JUL-2007 23:27:16:808] nttcnp: Validnode Table **NOT** used; err 0x248
417
418
CHAPTER 30 ■ TNS LISTENER TCP/IP VALID NODE CHECKING
A reload of configuration files with lsnrctl reload completes successfully, in spite of name
resolution failures, which are logged in the following format:
[13-JUL-2007 00:11:53:427] snlinGetAddrInfo: Name resolution failed for
wrong.host.name
No reverse address translation is performed on IP addresses. Thus, IP addresses that cannot
be translated to host names do not prevent the operation of valid node checking. The operating
system utility nslookup may be used to translate between Domain Name Service (DNS) host
names and IP addresses and vice versa. Keep in mind that nslookup does not read the hosts file
(/etc/hosts on UNIX, %SYSTEM ROOT%\system32\drivers\etc\hosts on Windows, where SYSTEM
ROOT is usually C:\WINDOWS). So the TNS Listener may be able to resolve a name or IP address by
calling C programming language library routines (gethostbyaddr(), gethostbyname()), while
nslookup may not.
I ran some tests to find out what the undocumented maximum accepted length for the
invited and excluded node lists is. The maximum line length of the Vi editor I used was 2048 bytes.2
Both parameters were still working fine at this line length. Assuming an average length of 30
bytes for a host name, this length would provide enough room for around 65 entries. If IP addresses
were used, at least 128 IP addresses would fit. The list of valid nodes cannot exceed a single line,
otherwise the error “TNS-00583: Valid node checking: unable to parse configuration parameters”
is signaled and the TNS Listener does not start.
2. Other editors allow lines that are longer than 2048 bytes. Vim (Vi improved) is an enhanced implementation of Vi, which supports a line length of more than 2048 bytes. It is available for free at the URL
http://www.vim.org and runs on UNIX, Windows, and Mac OS.
CHAPTER 31
■■■
Local Naming Parameter
ENABLE=BROKEN
T
he local naming parameter setting ENABLE=BROKEN is undocumented. This parameter may be
used in a Net service name definition to switch on sending of TCP/IP keep-alive packets in
order to detect communication failures. Keep-alive packets are not normally sent over a TCP/IP
connection. Furthermore, certain timeouts of the TCP/IP protocol are in the range of hours,
such that it may take unacceptably long to detect a communications failure. The parameter
setting ENABLE=BROKEN allows for reasonably fast detection of communication failures without
changing operating system TCP/IP parameters that affect all TCP/IP connections. The default
values of TCP/IP keep-alive parameters may need to be changed in order to best use ENABLE=
BROKEN, but these changes do not affect other TCP/IP connections that do not enable keepalive packets.
Node Failure and the TCP/IP Protocol
Before the advent of Oracle10g Clusterware and its support for virtual IP addresses, failure of a
node in a RAC cluster would normally leave clients, which were connected to the failed node,
waiting for extended periods of time, possibly up to two hours—the default period for TCP/IP
connection time-out. ENABLE=BROKEN addresses RAC high availability environments. It can be
used to reduce the interval where a client hangs due to a broken network connection to a RAC
cluster node that died unexpectedly.
Syntactically, ENABLE=BROKEN belongs in the DESCRIPTION section, right where other high
availability related parameters such as FAILOVER have to be placed. Following is an example Net
service name definition for use with a two-node RAC cluster and Transparent Application
Failover (TAF):
DBSERVER TAF.WORLD =
(DESCRIPTION =
(ENABLE=BROKEN)
(FAILOVER=ON)
(LOAD BALANCE=OFF)
(ADDRESS LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = dbserver1.oradbpro.com)(PORT = 1521))
(ADDRESS = (PROTOCOL = TCP)(HOST = dbserver2.oradbpro.com)(PORT = 1521))
)
419
420
CHAPTER 31 ■ LOCAL NAMING PARAMETER ENABLE=BROKEN
(CONNECT DATA =
(SERVICE NAME = TEN)(SERVER = DEDICATED)
(FAILOVER MODE=
(TYPE=SELECT)
(METHOD=BASIC)
(DELAY=5)
(RETRIES=600)
)
)
)
The option ENABLE=BROKEN is available at least since Oracle8 and controls whether or not
the keep-alive option of a network connection is switched on. UNIX network programming is
based on the socket interface. A socket is a communication endpoint. Again, UNIX system call
tracing is the tool of choice to observe what is going on. Here’s an example from a Solaris system,
where ENABLE=BROKEN was not configured:
$ truss -o /tmp/truss-no-enable-broken.out -t so socket,setsockopt \
-v so socket,setsockopt sqlplus system@DBSERVER TAF.WORLD
$ cat /tmp/truss-no-enable-broken.out
so socket(PF INET, SOCK STREAM, IPPROTO IP, "", 1) = 8
setsockopt(8, tcp, TCP NODELAY, 0xFFFFFFFF7FFF6924, 4, 1) = 0
Repeating the same actions with ENABLE=BROKEN configured yields the following:
so socket(PF INET, SOCK STREAM, IPPROTO IP, "", 1) = 8
setsockopt(8, SOL SOCKET, SO KEEPALIVE, 0xFFFFFFFF7FFF665C,
setsockopt(8, tcp, TCP NODELAY, 0xFFFFFFFF7FFF6924, 4, 1) =
setsockopt(8, SOL SOCKET, SO KEEPALIVE, 0xFFFFFFFF7FFF529C,
setsockopt(8, SOL SOCKET, SO KEEPALIVE, 0xFFFFFFFF7FFFDC2C,
4, 1) = 0
0
4, 1) = 0
4, 1) = 0
Three additional setsockopt system calls have occurred after setting ENABLE=BROKEN. The C
language flag to switch on sending of keep-alive packets is SO KEEPALIVE, as is evident from the
preceding output.
On Solaris, slightly fewer than 80 parameters control TCP/IP networking. The Solaris Tunable
Parameters Reference Manual ([SoTu 2007]) has this to say about SO KEEPALIVE (ndd is a Solaris
utility for changing TCP/IP parameters):
tcp_keepalive_interval
This ndd parameter sets a probe interval that is first sent out after a TCP connection is
idle on a system-wide basis.
Solaris supports the TCP keep-alive mechanism as described in RFC 1122. This mechanism is enabled by setting the SO_KEEPALIVE socket option on a TCP socket.
If SO_KEEPALIVE is enabled for a socket, the first keep-alive probe is sent out after a
TCP connection is idle for two hours, the default value of the tcp_keepalive_interval
parameter. If the peer does not respond to the probe after eight minutes, the TCP
connection is aborted. For more information, refer to tcp_keepalive_abort_interval.
CHAPTER 31 ■ LOCAL NAMING PARAMETER ENABLE=BROKEN
You can also use the TCP_KEEPALIVE_THRESHOLD socket option on individual applications to override the default interval so that each application can have its own
interval on each socket. The option value is an unsigned integer in milliseconds. See
also tcp(7P).
The Solaris manual goes on to state that the commitment level for the parameter is
unstable and that it should not be changed. To the best of my knowledge, tcp keepalive
threshold is not implemented in ORACLE DBMS software. Instead of modifying keep-alive
settings, the Solaris documentation recommends changing re-transmit time-outs (tcp
rexmit interval max and tcp ip abort interval).
A while back, my own testing with Oracle8i confirmed that ENABLE=BROKEN is functional
and useful given that tcp keepalive interval and tcp keepalive abort interval are adjusted as
needed. Tracing with truss showed that it is still implemented in Oracle10g. Keep in mind that
an appropriate test for all of these TCP/IP settings consists of either pulling the network cable
(and keeping it pulled), switching off the server, or any other method of bringing down the
operating system (Stop+A on Solaris), such that it does not stand a chance of sending a message to
a remote system to indicate that sockets should be closed. Such a test should be performed on
any RAC cluster before it moves into production.
Before I move off topic any further, let me explain why ENABLE=BROKEN should be considered an
outdated feature. With Oracle10g Clusterware and virtual IP addresses the IP address that went
down on a failed host is brought back online on a surviving node. Re-transmits by the client
should then be redirected to a surviving node and fail, since it knows nothing about the sockets
that were open on the failed node. As part of virtual IP address (VIP) failover, Oracle Clusterware flushes the address resolution protocol (ARP) cache, which translates between IP addresses
and MAC (medium access control) addresses of ethernet adapters. This is undocumented, but
is essential in accomplishing successful reconnects by database clients, which must become
aware of the new mapping between IP address and MAC address. On Linux, the ARP cache is
flushed by executing the command /sbin/arping -q -U -c 3 -I adapter ip_address in the
script $ORA CRS HOME/bin/racgvip. The mapping between MAC and IP addresses may be displayed
with the command arp -a on UNIX as well as Windows.
Additional information on the subject of TCP/IP and failover is in Metalink note 249213.1.
According to the note, Sun Microsystems suggests setting tcp keepalive interval, tcp ip
abort cinterval (prevents connect attempts to the failed node from waiting up to three minutes),
and tcp ip abort interval (by default, a connection is closed after not receiving an acknowledgment for eight minutes). Unfortunately, the Metalink note does not state that tcp keepalive
interval is ignored, unless SO KEEPALIVE is set on the socket, which in turn requires ENABLE=BROKEN.
I recommend adjusting tcp ip abort cinterval to prevent connections initiated before
the virtual IP address has come back online on a surviving node from locking up for up to three
minutes. I also suggest reducing the values of the parameters tcp ip abort interval and tcp
rexmit interval max to 45 seconds (default: 8 minutes) and 30 seconds (default: 3 minutes)
respectively. These parameters must be changed on the database client machine—remember,
421
422
CHAPTER 31 ■ LOCAL NAMING PARAMETER ENABLE=BROKEN
the server is down when the reduced time-outs must be used by the client. Table 31-1 suggests
settings that should keep the time-outs below one minute under all circumstances.
Table 31-1. Recommended TCP/IP Parameters
Parameter
Default
Suggested Value
Unit
tcp rexmit interval max
60000 (60 seconds)
10000 (10 seconds)
ms
tcp ip abort interval
480000 (8 minutes)
45000 (45 seconds)
ms
tcp ip abort cinterval
180000 (3 minutes)
30000 (30 seconds)
ms
CHAPTER 32
■■■
Default Host Name in
Oracle Net Configurations
I
t is undocumented that the host name may be left unspecified in the configuration files
listener.ora and tnsnames.ora. Hence, the configuration file listener.ora does not need to
be modified in case the host name is changed. Custom scripts that might generate a TNS Listener
configuration do not have to consider varying host names as long as the TNS Listener may use
the default host name (i.e., the local system’s host name, which uniquely identifies it).
Default Host Name
In a Net service name description, the host name may be omitted. The syntax for omitting the
host name (or IP address) is simply (HOST=). If the host name or IP address is an empty string,
it defaults to the host name of the local system, which may be obtained with the command
hostname (UNIX and Windows) or the C library routine gethostname. Since the latter is not a
UNIX system call, it cannot be observed with system call trace utilities such as truss. Following
is an example that passes a Net service name description to tnsping on the command line:
$ tnsping "(ADDRESS=(PROTOCOL=TCP)(Host=)(Port=1521))"
TNS Ping Utility for Linux: Version 10.2.0.3.0 - Production on 22-JUL-2007 22:13:07
Attempting to contact (ADDRESS=(PROTOCOL= TCP)(Host=)(Port=1521))
OK (0 msec)
The same syntax may also be used in listener.ora, such as in the following example:
LISTENER_DBSERVER1 =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS=(PROTOCOL=TCP)(HOST=)(PORT=1521)(IP=FIRST))
)
)
The UNIX command hostname returns the host name of the system.
$ ping -c 1 `hostname`
PING dbserver1.oradbpro.com (172.168.0.1) 56(84) bytes of data.
64 bytes from dbserver1.oradbpro.com (172.168.0.1): icmp_seq=0 ttl=64 time=0.072 ms
--- dbserver1.oradbpro.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.072/0.072/0.072/0.000 ms, pipe 2
In the absence of a host name in listener.ora, the TNS Listener uses the host name returned
by the command hostname.
$ lsnrctl start listener_dbserver1
LSNRCTL for Linux: Version 10.2.0.3.0 - Production on 08-SEP-2007 16:32:09
Copyright (c) 1991, 2006, Oracle. All rights reserved.
Starting /opt/oracle/product/db10.2/bin/tnslsnr: please wait...
TNSLSNR for Linux: Version 10.2.0.3.0 - Production
System parameter file is /opt/oracle/product/db10.2/network/admin/listener.ora
Log messages written to /opt/oracle/product/db10.2/network/log/listener_dbserver1.log
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=172.168.0.1)(PORT=1521)))
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=)(PORT=1521)(IP=FIRST)))
STATUS of the LISTENER
------------------------
Alias                     listener_dbserver1
Version                   TNSLSNR for Linux: Version 10.2.0.3.0 - Production
Start Date                08-SEP-2007 16:32:09
Uptime                    0 days 0 hr. 0 min. 3 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /opt/oracle/product/db10.2/network/admin/listener.ora
Listener Log File         /opt/oracle/product/db10.2/network/log/listener_dbserver1.log
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=172.168.0.1)(PORT=1521)))
The listener supports no services
The command completed successfully
Of course, the host name could also be omitted in tnsnames.ora, but this is not useful except
on the database server itself, since most of the time clients on remote hosts need to connect to the
database server. On a client system, (HOST=) would be synonymous with (HOST=client_host_name),
which would not allow the client to connect to the TNS Listener on the server.
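Accordingly, an entry such as the following might be placed in tnsnames.ora on each database server. The Net service name local_svc.oradbpro.com is a hypothetical example; the instance service name TEN.oradbpro.com is the one used throughout this book. Since (HOST=) resolves to the local host name, the identical entry works unchanged on every server:

local_svc.oradbpro.com =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST=)(PORT = 1521))
(CONNECT_DATA = (SERVICE_NAME = TEN.oradbpro.com))
)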
CHAPTER 32 ■ DEFAULT HOST NAME IN ORACLE NET CONFIGURATIONS
Disabling the Default Listener
Another use for the default host name is to disable the default listener called LISTENER. Let us
assume that multiple listeners, each dedicated to a single DBMS instance, are running on a
system whereby each listener has a unique name that includes the name of the DBMS instance
it serves. Hence the default name LISTENER is not used. If a DBA forgets to supply the listener
name to the lsnrctl command, the default listener LISTENER starts, since it does not require
any configuration in listener.ora. This may pose a security problem if the correct listener
needs to be started with a special setting of TNS_ADMIN that enables valid node checking in
$TNS_ADMIN/sqlnet.ora. Other security-related listener parameters like ADMIN_RESTRICTIONS_
listener_name1 may be in effect for non-default listeners, but are disabled in a default listener
configuration. Hence it makes sense to disable the default listener with a generic section in
listener.ora.
On the UNIX platform, port numbers between 1 and 1023 inclusive may only be used by
programs running with root privileges. If a port number in that range is used by an Oracle
listener, it fails to start with “TNS-12546: TNS:permission denied”. Windows does not impose
the aforementioned restriction on port numbers. However, the invalid port number 0 may be
used to prevent the default listener from starting. A listener configuration that accomplishes
this for both UNIX and Windows independently of the host name is reproduced here:
# Disable default listener
LISTENER =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST=)(PORT = 0))
)
This configuration prevents the default listener from starting:
$ lsnrctl start
…
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=)(PORT=0)))
TNS-01103: Protocol specific component of the address is incorrectly specified
TNS-12533: TNS:illegal ADDRESS parameters
TNS-12560: TNS:protocol adapter error
TNS-00503: Illegal ADDRESS parameters
The error serves as a reminder to set TNS_ADMIN, if necessary, and to supply the correct
listener name to the lsnrctl command.
1. With ADMIN_RESTRICTIONS_listener_name=ON, the listener rejects SET commands that might have been sent
from an intruder on a remote system. It only allows changes through lsnrctl reload on the local system.
PART 10
Real Application Clusters
CHAPTER 33
■■■
Session Disconnection, Load Rebalancing, and TAF
None of the manuals, including the latest Oracle11g release 1 editions, indicates any link
between ALTER SYSTEM DISCONNECT SESSION and Transparent Application Failover (TAF). This is
also true for a transactional shutdown of an instance, that is, a shutdown operation that is
deferred until all clients have committed or rolled back their work, issued with the command
SHUTDOWN TRANSACTIONAL. A third undocumented link exists between TAF and the disconnection
of sessions with the PL/SQL package DBMS_SERVICE in Oracle10g as well as Oracle11g.
The SQL statement ALTER SYSTEM DISCONNECT SESSION, the SQL*Plus command SHUTDOWN
TRANSACTIONAL, and the PL/SQL package DBMS_SERVICE may be used to gracefully disconnect
database sessions that have Transparent Application Failover enabled and reconnect them to another
DBMS instance. This feature is useful in cases where maintenance needs to be performed. It is
also useful in a scenario where load needs to be rebalanced among DBMS instances in a RAC
cluster after one or more nodes have been temporarily unavailable.
Introduction to Transparent Application Failover
Transparent Application Failover is an automatic database session reestablishment feature
built into Oracle Call Interface (OCI). It is primarily intended for RAC environments to reestablish
database sessions in case of cluster node failure, but the functionality as such is fully independent
of RAC and may be used for single instance as well as Data Guard environments. TAF does not
work with the JDBC Thin driver, since that driver is not built on top of OCI. Following is an
example of a Net service name configured with Transparent Application Failover:
taftest.oradbpro.com =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = dbserver1)(PORT = 1521))
(ADDRESS = (PROTOCOL = TCP)(HOST = dbserver2)(PORT = 1521))
)
(CONNECT_DATA =
(SERVICE_NAME = TEN.oradbpro.com)
(FAILOVER_MODE =
(TYPE = select)
(METHOD = basic)
(RETRIES = 36)
(DELAY = 5)
)
)
)
The preceding Net service name definition instructs Oracle Net to do the following:
• Enable session failover (or rather reconnection to a surviving instance). As long as
FAILOVER_MODE is present, it is not necessary to explicitly request reconnection by
adding (FAILOVER=ON) to the DESCRIPTION section.
• Attempt to automatically and transparently re-run SELECT statements that were in
progress at the time the database connection was disrupted.
• Wait five seconds before each attempt to reconnect (DELAY).
• Retry connection reestablishment at most 36 times, such that a connection must be
reestablished within three minutes. After expiration of the reconnection interval (DELAY
times RETRIES), Oracle Net signals the error “ORA-03113: end-of-file on communication
channel” to the client, if the reason for the disconnection was a node or instance failure.
If the session was disconnected and the reconnection interval expires, the error
“ORA-00028: your session has been killed” is reported. If a session attempts to run SQL
statements after one of these errors, it incurs the error “ORA-03114: not connected to
ORACLE”. At this point, it might attempt to start a new database session without the
assistance of TAF.
In a RAC environment, one would usually add the directive (LOAD_BALANCE=ON) to the
DESCRIPTION section. Even without this, sessions are distributed across available RAC instances.
Please refer to the Oracle Database Net Services Reference manual for further details on local
naming parameters (tnsnames.ora) related to TAF as well as load balancing.
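By way of illustration, the taftest entry shown earlier might be extended as follows; this is a sketch in which only the LOAD_BALANCE directive is new, the rest of the entry is unchanged:

taftest.oradbpro.com =
(DESCRIPTION =
(LOAD_BALANCE = ON)
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = dbserver1)(PORT = 1521))
(ADDRESS = (PROTOCOL = TCP)(HOST = dbserver2)(PORT = 1521))
)
(CONNECT_DATA =
(SERVICE_NAME = TEN.oradbpro.com)
(FAILOVER_MODE =
(TYPE = select)
(METHOD = basic)
(RETRIES = 36)
(DELAY = 5)
)
)
)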
ALTER SYSTEM DISCONNECT SESSION
TAF takes effect when node failure or instance failure occurs. The latter may be simulated with
SHUTDOWN ABORT. It is undocumented that TAF may also take effect when database sessions are
disconnected explicitly. The syntax for this is as follows:
ALTER SYSTEM DISCONNECT SESSION 'sid, serial#' [POST_TRANSACTION] [IMMEDIATE];
The parameters sid and serial# correspond to the columns by the same name in V$SESSION.
The keyword POST_TRANSACTION requests disconnection after the next COMMIT or ROLLBACK by the
client. The keyword IMMEDIATE requests immediate termination of the database session irrespective of open transactions. At least one of these two keywords must be present. In the absence of a
transaction, POST_TRANSACTION has the same effect as IMMEDIATE in terms of the timing of the
disconnection. However, in terms of session reestablishment, the implications are quite different.
When IMMEDIATE is specified, alone or in conjunction with POST_TRANSACTION, TAF does not take
effect, whereas it does when merely POST_TRANSACTION is used. By the way, TAF will also not
intervene when ALTER SYSTEM KILL SESSION is issued or the client’s server process is terminated with the UNIX command kill –TERM.1
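For example, a DBA might look up the session to disconnect in V$SESSION and then issue the statement; this is a minimal sketch, and the user name and sid/serial# values are merely illustrative:

SQL> SELECT sid, serial# FROM v$session WHERE username='APP_USER';
SQL> ALTER SYSTEM DISCONNECT SESSION '116,23054' POST_TRANSACTION;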
SELECT Failover
As stated before, the requirements for TAF are a properly configured Net service name (e.g., in
tnsnames.ora) and a database client built on top of OCI. To avoid boring you by demonstrating
TAF with SQL*Plus, I have chosen to use Perl DBI (see Chapter 22), which is implemented with
OCI. The screen output depicted next is from the Perl DBI program dbb.pl,2 which is capable
of executing arbitrary SQL statements and PL/SQL blocks. I wrote it after many years of annoyance due to unreadable output from SQL*Plus and time spent making obfuscated query results
halfway readable with COLUMN name FORMAT commands.
The program dbb.pl reads from standard input until it finds a slash (/) by itself at the
beginning of a line. At this point, it prepares and executes the statement entered. If it detects a
SELECT statement by checking that the DBI handle attribute NUM_OF_FIELDS is larger than zero,
it fetches and displays the rows with automatically adjusted column widths! The column width
is automatically made just wide enough to accommodate the larger of either column heading
or column value, leading to easily readable query results. This is what makes the program a big
time saver compared to SQL*Plus, which does not have such a feature. At this stage, this is
pretty much all dbb.pl can do.
For the following test, I slightly modified a routine called by dbb.pl, which iteratively
fetches all rows in such a way that it pauses after each fetch. Thus, it is guaranteed that disconnection can be requested while the program is in the fetch loop. To avoid buffering effects due
to bulk fetch (a.k.a. array fetch, i.e., retrieval of more than a single row with each fetch call), it is
necessary that the number of rows in the table is larger than the fetch array size. If this issue
is disregarded, it may happen that all the rows have already been retrieved into a client-side
buffer. The client then merely reads from the client-side buffer, without interacting with the
DBMS instance, such that the SELECT statement does not need to be restarted. When testing
TAF SELECT failover with SQL*Plus, I recommend the following settings to avoid buffering:
SQL> SET ARRAYSIZE 1
SQL> SET PAGESIZE 1
SQL> SET PAUSE "Hit enter to continue ..."
SQL*Plus will then
• parse and execute the statement
• fetch a single row at a time (FETCH ... r=1 would be seen in a SQL trace file; see Chapter 24)
• delay display of the first row until the user hits enter
• pause after each row until the user hits enter
1. The commands kill –TERM and kill –15 are equivalent. TERM is the abbreviated name of the signal with
number 15. A list of all signals is in the C language include file /usr/include/sys/signal.h.
2. The Perl program dbb.pl (database browser) is included in the source code depot of Chapter 22.
This allows the tester to interrupt an in-flight fetch loop. Without these settings, you would
need to select from a large table to allow enough time to interrupt the SELECT statement.
Back to the demonstration with dbb.pl. I opted to point out some side-effects of session
reestablishment that are often overlooked and not explicitly documented. On page 13-15, the
Oracle Database Net Services Administrator’s Guide 10g Release 2 states this:
Server side program variables, such as PL/SQL package states, are lost during failures; TAF
cannot recover them. They can be initialized by making a call from the failover callback.
As the following demonstration illustrates, there is a whole lot more that TAF does not restore.
• Effects of SET ROLE statements (or DBMS_SESSION.SET_ROLE), such that a reestablished session
may have fewer privileges than the original session
• Effects of enabling secure application roles
• Effects of ALTER SESSION statements, such as enabling SQL trace or adjusting NLS settings
• The client identifier (V$SESSION.CLIENT_IDENTIFIER)
• The module and action for application instrumentation (see Chapter 24)
In fact, TAF cannot restore anything beyond the session itself and the previous cursor position
of SELECT statements—keep in mind that the latter is not guaranteed to work.3 Doctor ORACLE
prescribes callback functions as a remedy for this situation. A callback function is a subroutine
that the client registers by calling an OCI function (or JDBC method if programming in Java).
OCI then assumes the task of executing the client’s callback function when certain events occur.
TAF callbacks may be registered for the following events:
• Commencement of session failover
• Unsuccessful failover attempt; the client may indicate that OCI should continue retrying
• Completed successful session reestablishment
• Unsuccessful failover; no retry possible
The details are documented in Oracle Call Interface Programmer’s Guide 10g Release 2 and
Oracle Database JDBC Developer’s Guide and Reference 10g Release 2. For programming languages
that do not provide an interface to OCI callback functions (this includes the Perl DBI module),
it is possible to detect session failover by checking for certain errors such as “ORA-25408: can
not safely replay call”.
Following is a demonstration of a successful session reestablishment and a restarted SELECT
statement. To point out that statements were run using dbb.pl and not SQL*Plus, the prompt
DBB> is used. The Perl program dbb.pl is started in much the same way as SQL*Plus by passing
a connect string.
$ dbb.pl app_user/secret@taftest.oradbpro.com
3. SELECT failover may fail with the error “ORA-25401: can not continue fetches”.
Next, module, action, and client identifier are set.
DBB> begin
dbms_application_info.set_module('taf_mod', 'taf_act');
dbms_session.set_identifier('taf_ident');
end;
/
1 Row(s) Processed.
Since SELECT_CATALOG_ROLE is not a default role of APP_USER, it must be enabled with a SET
ROLE statement before V$SESSION may be accessed. Furthermore, the NLS date format is changed.
DBB> SET ROLE select_catalog_role
/
0 Row(s) Processed.
DBB> ALTER SESSION SET nls_date_format='dd. Mon yy hh24:mi'
/
0 Row(s) Processed.
DBB> SELECT sid, serial#, audsid, logon_time, client_identifier, module, action
FROM v$session
WHERE username='APP_USER'
/
Row 1 fetched. Hit enter to continue fetching ...
SID SERIAL# AUDSID LOGON_TIME       CLIENT_IDENTIFIER MODULE  ACTION
--- ------- ------ ---------------- ----------------- ------- -------
116   23054 110007 05. Aug 07 14:40 taf_ident         taf_mod taf_act
1 Row(s) processed.
Take note that the client was assigned session 116, session serial number 23054, and auditing
session identifier4 110007. The auditing identifier is formed by selecting NEXTVAL from the sequence
SYS.AUDSES$ at session establishment. The auditing identifier uniquely identifies a session for
the lifetime of a database and is saved in DBA_AUDIT_TRAIL.SESSIONID if any auditing on behalf
of the session occurs. The date format of the session includes the month name. Client identifier, module, and action were communicated to the DBMS. Querying failover-related columns
in V$SESSION confirms that TAF is switched on for the session.
DBB> SELECT failover_type, failover_method, failed_over
FROM v$session
WHERE username='APP_USER'
/
Row 1 fetched. Hit enter to continue fetching ...
FAILOVER_TYPE FAILOVER_METHOD FAILED_OVER
------------- --------------- -----------
SELECT        BASIC           NO
1 Row(s) processed.
4. The session auditing identifier V$SESSION.AUDSID may also be retrieved with the following statement:
SELECT userenv('sessionid') FROM dual.
Next, APP_USER enters a fetch loop that retrieves rows from the table EMPLOYEES. Note how
the program dbb.pl pauses after each fetch call, without displaying any data yet.
DBB> SELECT employee_id, first_name, last_name, email FROM hr.employees
/
Row 1 fetched. Hit enter to continue fetching ...
Row 2 fetched. Hit enter to continue fetching ...
In a different window from the one where dbb.pl is running, disconnect the session as a DBA.
SQL> ALTER SYSTEM DISCONNECT SESSION '116,23054' POST_TRANSACTION;
System altered.
Move back to the window where dbb.pl is running and keep hitting enter until all rows
have been fetched.
Row 3 fetched. Hit enter to continue fetching ...
…
Row 107 fetched. Hit enter to continue fetching ...
EMPLOYEE_ID FIRST_NAME LAST_NAME EMAIL
----------- ---------- --------- --------
        198 Donald     OConnell  DOCONNEL
…
        197 Kevin      Feeney    KFEENEY
107 Row(s) processed.
The SELECT statement completed without any noticeable interruption. Now it’s time to
take a look at the value in the column V$SESSION.FAILED_OVER.
DBB> SELECT failover_type, failover_method, failed_over
FROM v$session
WHERE username='APP_USER'
/
error code: 942, error message: ORA-00942: table or view does not exist (error possibly near <*> indicator at char 56 in 'SELECT failover_type, failover_method, failed_over FROM <*>v$session WHERE username='APP_USER'
')
The SELECT from V$SESSION, which worked previously, failed. This is a strong indication
that session reestablishment has occurred without restoring all the properties of the session.
Let’s again enable SELECT_CATALOG_ROLE.
DBB> SET ROLE select_catalog_role
/
0 Row(s) Processed.
DBB> SELECT failover_type, failover_method, failed_over FROM v$session WHERE
username='APP_USER'
/
Row 1 fetched. Hit enter to continue fetching ...
FAILOVER_TYPE FAILOVER_METHOD FAILED_OVER
------------- --------------- -----------
SELECT        BASIC           YES
1 Row(s) processed.
The value of V$SESSION.FAILED_OVER was previously NO and is now YES. This confirms that
TAF succeeded. How about the remaining properties of the previous database session? They
are all lost. Date format, client identifier, module, and action now have default values.
DBB> SELECT sid, serial#, audsid, logon_time, client_identifier,
module, action
FROM v$session
WHERE username='APP_USER'
/
Row 1 fetched. Hit enter to continue fetching ...
SID SERIAL# AUDSID LOGON_TIME          CLIENT_IDENTIFIER MODULE   ACTION
--- ------- ------ ------------------- ----------------- -------- ------
133   15197 110008 05.08.2007 14:49:23                   perl.exe
1 Row(s) processed.
The auditing identifier of the new session is 110008. Perl DBI automatically registers the
module name perl.exe with the DBMS.
Failover at the End of a Transaction
While we’re at it, we might also verify that DISCONNECT SESSION POST_TRANSACTION allows ongoing
transactions to complete and then initiates session reestablishment through TAF. A test case
for such a scenario follows:
1. Starts a transaction by deleting a row.
2. Runs another DELETE statement that blocks on a TX enqueue due to a row locked by
another session.
3. Gets marked for disconnection while waiting for the lock.
4. Succeeds in finishing the transaction and reconnects.
The first step of the scenario is to start a transaction with DELETE.
DBB> DELETE FROM hr.employees WHERE employee_id=190
/
1 Row(s) Processed.
As a DBA using SQL*Plus (or dbb.pl, in case you appreciate its automatic column sizing
feature), check that APP_USER has an open transaction and lock the row with EMPLOYEE_ID=180
in HR.EMPLOYEES.5
5. An export dump containing the sample schema HR is included in the source code depot.
SQL> SELECT s.sid, s.serial#, s.event, t.start_time, t.status
FROM v$transaction t, v$session s
WHERE s.taddr=t.addr
AND s.sid=139;
SID SERIAL# START_TIME        STATUS
--- ------- ----------------- ------
139      74 07/30/07 15:34:25 ACTIVE
SQL> SELECT employee_id FROM hr.employees WHERE employee_id=180 FOR UPDATE NOWAIT;
EMPLOYEE_ID
-----------
        180
As APP_USER, try to delete the row in EMPLOYEES with EMPLOYEE_ID=180. The session has to
wait for the DBA to release the lock.
DBB> DELETE FROM hr.employees WHERE employee_id=180
/
As a DBA, mark APP_USER’s session for disconnection.
SQL> ALTER SYSTEM DISCONNECT SESSION '139,74' POST_TRANSACTION;
System altered.
As a DBA, verify that the transaction is still active and that APP_USER is still waiting for the
row lock, then COMMIT, thus releasing the lock on EMPLOYEES.
SQL> SELECT s.sid, s.serial#, s.event, t.start_time, t.status
FROM v$transaction t, v$session s
WHERE s.taddr=t.addr
AND s.sid=139;
SID SERIAL# EVENT                         START_TIME        STATUS
--- ------- ----------------------------- ----------------- ------
139      74 enq: TX - row lock contention 07/30/07 15:34:25 ACTIVE
SQL> COMMIT;
Commit complete.
In the other window, dbb.pl displays the number of rows processed, which we respond to
with a COMMIT statement.
1 Row(s) Processed.
DBB> COMMIT
/
0 Row(s) Processed.
Right after the COMMIT, the DBMS disconnects APP_USER’s session. The reconnect due to
TAF starts a new session.
DBB> SET ROLE SELECT_CATALOG_ROLE
/
0 Row(s) Processed.
DBB> SELECT sid, serial#, logon_time, client_identifier, module, action, failed_over
FROM v$session
WHERE username='APP_USER'
/
Row 1 fetched. Hit enter to continue fetching ...
SID SERIAL# LOGON_TIME CLIENT_IDENTIFIER MODULE   ACTION FAILED_OVER
--- ------- ---------- ----------------- -------- ------ -----------
139      76 30-JUL-07                    perl.exe        YES
1 Row(s) processed.
The preceding output shows that session failover has occurred (FAILED_OVER=YES). The
question is whether or not the transaction completed successfully.
DBB> DELETE FROM hr.employees WHERE employee_id IN (180, 190)
/
0 Row(s) Processed.
As implied by the lack of any errors, the transaction did complete successfully and the rows
were truly deleted, since rerunning the identical DELETE statement did not find any matching rows.
The exact same functionality is available at instance level with SHUTDOWN
TRANSACTIONAL. Thus, at the beginning of a maintenance window, all database sessions connected
to an instance may be shifted to other instances providing the service requested by the client.
The optional keyword LOCAL of SHUTDOWN TRANSACTIONAL applies to distributed database environments, not RAC environments. When LOCAL is specified, the instance waits only for local
transactions to complete, but not for distributed transactions.
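At the start of a maintenance window, the sequence on the instance to be taken down might thus be as simple as this sketch; sessions with TAF enabled reconnect to a surviving instance once their transactions complete:

$ env ORACLE_SID=TEN2 sqlplus / as sysdba
SQL> SHUTDOWN TRANSACTIONAL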
Session Disconnection and DBMS_SERVICE
Oracle10g and Oracle11g include the package DBMS_SERVICE for managing instance services
and TAF. For the first time, this package allows the configuration of TAF on the server side
instead of in the client-side configuration file tnsnames.ora. DBMS_SERVICE is the most sophisticated approach to services so far. In a RAC cluster, it is called behind the scenes when cluster
services are created with the DBCA, but it may be used directly in both RAC and single instance
environments. In a RAC environment, cluster database services should be configured with DBCA,
since it sets up the integration between Oracle Clusterware and DBMS_SERVICE. Services created
with DBMS_SERVICE do not restart automatically on instance startup. Oracle Clusterware performs
this task, given that the appropriate cluster resources were created by DBCA.
Setting Up Services with DBMS_SERVICE
To create a service, at least a service name and a network name must be provided. The service
name is an identifier, which is stored in the data dictionary. The network name is the instance
service name (see Instance Service Name vs. Net Service Name in this book’s Introduction), which is
registered with the listener and needs to be referenced as the SERVICE_NAME in a client-side Net
service name description in tnsnames.ora.
DBMS_SERVICE.CREATE_SERVICE inserts services into the dictionary table SERVICE$, which
underlies the view DBA_SERVICES. The procedure DBMS_SERVICE.START_SERVICE adds the network
name to the initialization parameter SERVICE_NAMES, such that the service is registered with the
listener. When registering the service with the listener, the value of the initialization parameter
DB_DOMAIN is appended to the network name, unless it is already present. Instance service names
specified with the parameter NETWORK_NAME are not case sensitive. The service name as well as
the network name must be unique. Otherwise the error “ORA-44303: service name exists” or
“ORA-44314: network name already exists” is raised. The following sample code creates and
starts a service with the TAF settings used in the previous examples:
SQL> BEGIN
  dbms_service.create_service(
    service_name=>'TAF_INST_SVC',
    network_name=>'taf_inst_svc_net_name',
    failover_method=>dbms_service.failover_method_basic,
    failover_type=>dbms_service.failover_type_select,
    failover_retries=>36,
    failover_delay=>12
  );
  dbms_service.start_service('TAF_INST_SVC', DBMS_SERVICE.ALL_INSTANCES);
END;
/
PL/SQL procedure successfully completed.
The new service is now present in DBA_SERVICES.
SQL> SELECT name, network_name, failover_method method, failover_type type,
failover_retries retries, failover_delay delay
FROM sys.dba_services;
NAME              NETWORK_NAME           METHOD TYPE   RETRIES DELAY
----------------- ---------------------- ------ ------ ------- -----
SYS$BACKGROUND
SYS$USERS
TEN.oradbpro.com  TEN.oradbpro.com
TAF_INST_SVC      taf_inst_svc_net_name  BASIC  SELECT      36    12
Since the service was started, it’s also registered as an active service in
GV$ACTIVE_SERVICES. This view lists active services for all instances:
SQL> SELECT inst_id, name, network_name, blocked
FROM gv$active_services
WHERE name NOT LIKE 'SYS$%';
INST_ID NAME             NETWORK_NAME          BLOCKED
------- ---------------- --------------------- -------
      1 TAF_INST_SVC     taf_inst_svc_net_name NO
      1 TEN.oradbpro.com TEN.oradbpro.com      NO
      2 TAF_INST_SVC     taf_inst_svc_net_name NO
      2 TEN.oradbpro.com TEN.oradbpro.com      NO
The database domain name (DB_DOMAIN) “oradbpro.com” was appended to the network
name, since it lacked a domain suffix. The network name was also added to the list of instance
service names in the parameter SERVICE_NAMES.
SQL> SELECT name, value FROM v$parameter
WHERE name in ('db_domain', 'service_names');
NAME          VALUE
------------- ---------------------
db_domain     oradbpro.com
service_names taf_inst_svc_net_name
Each instance service name set with the parameter SERVICE_NAMES is registered with one or
more listeners. Note that in a RAC cluster, you need to set the parameter REMOTE_LISTENER to
a Net service name that references all remote nodes in the cluster, such that an instance can
register instance service names with the remote nodes. In case the local listener is not running
on the default port 1521, you must also set the parameter LOCAL_LISTENER. The output of the
command lsnrctl below indicates that the new instance service names were registered with
the listener by both instances, since the constant DBMS_SERVICE.ALL_INSTANCES was used as the
second argument to the procedure DBMS_SERVICE.START_SERVICE.
$ lsnrctl services listener_dbserver1
…
Services Summary...
…
Service "taf_inst_svc_net_name.oradbpro.com" has 2 instance(s).
  Instance "TEN1", status READY, has 1 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:0 refused:0 state:ready
         LOCAL SERVER
  Instance "TEN2", status READY, has 1 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:0 refused:0 state:ready
         REMOTE SERVER
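As an aside, the parameter REMOTE_LISTENER mentioned earlier might be set as follows; this is a sketch in which the alias LISTENERS_TEN is a hypothetical entry in the server-side tnsnames.ora that references the listeners on both nodes:

LISTENERS_TEN =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = dbserver1)(PORT = 1521))
(ADDRESS = (PROTOCOL = TCP)(HOST = dbserver2)(PORT = 1521))
)

SQL> ALTER SYSTEM SET remote_listener='LISTENERS_TEN' SID='*';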
With this configuration, there is no need to enable TAF on the client side in the configuration file tnsnames.ora. TAF has already been enabled for the instance service name taf_inst_
svc_net_name.oradbpro.com, such that the Net service name taf_net_svc.oradbpro.com shown
next does not contain a FAILOVER_MODE section. Server-side TAF settings override client-side
TAF settings. Note that the network name, which was passed to DBMS_SERVICE (suffixed by the
database domain name), is used as the SERVICE_NAME in tnsnames.ora.
taf_net_svc.oradbpro.com =
(DESCRIPTION =
(ADDRESS = (PROTOCOL= TCP)(Host=dbserver1)(Port= 1521))
(CONNECT_DATA = (SERVICE_NAME = taf_inst_svc_net_name.oradbpro.com))
)
Session Disconnection with DBMS_SERVICE and TAF
Now it’s time to test session disconnection with DBMS_SERVICE for the purpose of maintenance
or load rebalancing. Connect as APP_USER using the Net service name from the previous section
and run a SELECT statement.
$ sqlplus app_user/secret@taf_net_svc.oradbpro.com
SQL> SET PAUSE "Hit enter to continue ..."
SQL> SET PAUSE ON
SQL> SELECT * FROM audit_actions;
Hit enter to continue ...
ACTION NAME
------ ----------------------------
     0 UNKNOWN
     1 CREATE TABLE
…
As a DBA, verify that the TAF settings are in effect.
SQL> SELECT inst_id, sid, serial#, audsid, logon_time, service_name,
failover_type, failover_method, failed_over
FROM gv$session
WHERE username='APP_USER';
INST_ID SID SERIAL# AUDSID LOGON_TIME          SERVICE_NAME FAILOVER_TYPE
------- --- ------- ------ ------------------- ------------ -------------
      2 143    4139 120036 05.08.2007 19:27:58 TAF_INST_SVC SELECT
FAILOVER_METHOD FAILED_OVER
--------------- -----------
BASIC           NO
Now stop the service on instance 2, which hosts the client (INST_ID=2 in the result of the
preceding SELECT). The name of instance 2 is TEN2. A service may be stopped from any instance in
the cluster.
SQL> EXEC dbms_service.stop_service('TAF_INST_SVC', 'TEN2')
PL/SQL procedure successfully completed.
This removes the service from instance 2 in GV$ACTIVE_SERVICES:
SQL> SELECT name FROM gv$active_services WHERE inst_id=2;
NAME
------------------
TEN.oradbpro.com
SYS$BACKGROUND
SYS$USERS
It also removes the instance TEN2 from the listener’s services summary.
$ lsnrctl services listener_dbserver1
…
Service "taf_inst_svc_net_name.oradbpro.com" has 1 instance(s).
  Instance "TEN1", status READY, has 1 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:0 refused:0 state:ready
         LOCAL SERVER
The command completed successfully
DBMS_SERVICE.DISCONNECT_SESSION affects all sessions of the local instance using a certain
service, such that you need to connect to the instance hosting the client in order to disconnect
all sessions. This procedure has an undocumented parameter DISCONNECT_OPTION. The default
value for this parameter is the numeric constant DBMS_SERVICE.POST_TRANSACTION. It can also
take the value DBMS_SERVICE.IMMEDIATE. These constants have the same meaning as the keywords
POST_TRANSACTION and IMMEDIATE supported by the SQL statement ALTER SYSTEM DISCONNECT
SESSION. Let’s disconnect all the sessions that were established via the service TAF_INST_SVC.
SQL> EXEC dbms_service.disconnect_session('TAF_INST_SVC')
Beware that DBMS_SERVICE.DISCONNECT_SESSION completes successfully, even when a nonexistent service name is passed. Session disconnection takes effect immediately, except for
sessions that have an open transaction.
SQL> SELECT inst_id, sid, serial#, audsid, logon_time, service_name,
failover_type, failover_method, failed_over
FROM gv$session
WHERE username='APP_USER';
INST_ID SID SERIAL# AUDSID LOGON_TIME          SERVICE_NAME
------- --- ------- ------ ------------------- ------------
      2 143    4139 120036 05.08.2007 19:27:58 TAF_INST_SVC
FAILOVER_TYPE FAILOVER_METHOD FAILED_OVER
------------- --------------- -----------
SELECT        BASIC           NO
Once APP_USER attempts to retrieve the remaining rows from the SELECT statement, the
disconnection is detected and a new connection to an instance, which still offers the requested
service, is opened. Repeating the SELECT from GV$SESSION yields this:
INST_ID SID SERIAL# AUDSID LOGON_TIME          SERVICE_NAME
------- --- ------- ------ ------------------- ------------
      1 151    1219 130018 05.08.2007 19:31:35 TAF_INST_SVC
FAILOVER_TYPE FAILOVER_METHOD FAILED_OVER
------------- --------------- -----------
SELECT        BASIC           YES
APP_USER’s session has been reconnected to instance 1, as evidenced by INST_ID=1, a
different AUDSID, and a later LOGON_TIME. The AUDSID value does not necessarily increase by
exactly one after a reconnect, due to sequence caching in each instance. TAF also reconnects disconnected
sessions that had open transactions. For such sessions, disconnection takes place when they
commit or roll back. The procedure DBMS_SERVICE.DISCONNECT_SESSION behaves in the same
way as ALTER SYSTEM DISCONNECT SESSION POST_TRANSACTION, in that it allows open transactions
to complete before disconnecting the session. In fact, it is no more than a layer on top of ALTER
SYSTEM DISCONNECT SESSION. This undocumented fact is given away by the source file $ORACLE_HOME/
rdbms/admin/dbmssrv.sql.
Integration between the new features of DBMS_SERVICE and the vintage features associated
with the parameter SERVICE_NAMES is not anywhere near seamless. The statement ALTER SYSTEM
SET SERVICE_NAMES adds a service to DBA_SERVICES, whereas removal of the same service from
SERVICE_NAMES does not remove it from DBA_SERVICES. It merely stops the service and removes
it from V$ACTIVE_SERVICES.
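The asymmetry might be observed like this (a sketch; the service name adhoc_svc is hypothetical):

SQL> ALTER SYSTEM SET service_names='taf_inst_svc_net_name, adhoc_svc';
-- adhoc_svc now appears in DBA_SERVICES and V$ACTIVE_SERVICES
SQL> ALTER SYSTEM SET service_names='taf_inst_svc_net_name';
-- adhoc_svc is stopped and gone from V$ACTIVE_SERVICES, yet remains in DBA_SERVICES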
To delete a service, it must be stopped on all instances. The default service, which matches
the name of the database suffixed by the database domain, cannot be deleted. It appears as
though it could be stopped with DBMS_SERVICE.STOP_SERVICE, which completes without error, but
crosschecking with the services summary from lsnrctl services listener_name and V$ACTIVE_
SERVICES reveals that it was not stopped. It would have been more appropriate to introduce a
new error message such as “default service cannot be stopped”. For example, my DBMS instance
with the settings db_name=TEN and db_domain=oradbpro.com automatically has an instance service
name TEN.oradbpro.com in DBA_SERVICES and V$ACTIVE_SERVICES, which can neither be stopped
nor deleted.
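For a non-default service, removal might look like this sketch, which stops the service created earlier on all instances and then deletes it from the dictionary:

SQL> EXEC dbms_service.stop_service('TAF_INST_SVC', dbms_service.all_instances)
SQL> EXEC dbms_service.delete_service('TAF_INST_SVC')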
Lessons Learned
Disruptions in database service due to scheduled maintenance or manual load rebalancing
may be greatly mitigated by configuring TAF and disconnecting sessions gracefully. Session
disconnection can occur on three levels: instance level, service level, and session level. The
commands and levels are summarized in Table 33-1.
Table 33-1. Statements for Session Disconnection

Level     SQL Statement or PL/SQL Procedure Call       Client Program
--------  -------------------------------------------  ------------------------
Instance  SHUTDOWN TRANSACTIONAL                       SQL*Plus, Oracle11g JDBC
Service   DBMS_SERVICE.DISCONNECT_SESSION(             Any
            SERVICE_NAME=>'name',
            DISCONNECT_OPTION=>option);6
Session   ALTER SYSTEM DISCONNECT SESSION              Any
            'sid, serial#' POST_TRANSACTION;
While graceful disconnection at instance and session level was already implemented in
Oracle9i, disconnection at service level was first introduced in Oracle10g. The features may
even be used to hide an instance restart from database clients, given that the clients are configured
and known to work with TAF. In Oracle10g and subsequent releases, TAF should be configured on
the server side with DBMS_SERVICE.6
6. The parameter DISCONNECT_OPTION is not available in release 10.2.0.1. The default value of this parameter
is DBMS_SERVICE.POST_TRANSACTION. The disconnect option DBMS_SERVICE.IMMEDIATE disconnects sessions
immediately.
Source Code Depot
Table 33-2 lists this chapter’s source files and their functionality.
Table 33-2. Session Disconnection, Load Rebalancing, and TAF Source Code Depot

File Name         Functionality
----------------  ------------------------------------------------------------
hr.dmp            Oracle10g export dump file. Contains the database objects in
                  sample schema HR.
tnsnames.ora.smp  Sample Net service name definition, which configures TAF on
                  the client side.
CHAPTER 34
■■■
Removing the RAC Option Without Reinstalling
Oracle Real Application Clusters (RAC) can only operate if an instance of Oracle’s own cluster
software, which is mandatory starting with Oracle10g, or so-called vendor cluster software is
available on the same node as the Oracle DBMS instance. This chapter explores the impact of
cluster software failure, which in turn prevents RAC instances from opening a database.
Oracle’s cluster software uses a so-called Cluster Registry to save configuration information. It
also writes special messages to voting disks to orchestrate cluster actions after an interconnect
network failure. In cases of an emergency, such as the loss of all voting disks or the failure of all
devices holding copies of the Cluster Registry, Oracle Clusterware cannot be started. In such a
severe fault scenario, it might take hours or even days to attach new SAN (Storage Area Network)
storage to the system, create logical units (LUNs) in the SAN disk array, and configure zoning
to make the LUNs visible to the database server. Oracle Clusterware may cause unjustified
node reboots and thus unplanned downtime, although it has matured tremendously since the
early days. This might be another reason for disabling RAC. This chapter explains how to use
an undocumented make command to remove the RAC option from the oracle executable. Since
ASM cannot run at all without Oracle Clusterware infrastructure, a procedure that converts an
Oracle Clusterware installation for RAC to a local-only Oracle Clusterware installation for use
with single-instance ASM is shown. Thus, a system running RAC can quickly be converted to a
system running single-instance ORACLE without reinstalling any software with Oracle Universal
Installer (OUI) or patching an ORACLE_HOME with OPatch, greatly reducing the downtime
incurred by such a severe outage.
Linking ORACLE Software
When ORACLE Server software is installed on a UNIX system by OUI, many programs, including
$ORACLE_HOME/bin/oracle, which implements the database kernel, are linked with static and
shared libraries on the system. OUI calls the utility make and passes it the makefile $ORACLE_HOME/
rdbms/lib/ins_rdbms.mk as an argument. Other makefiles, such as $ORACLE_HOME/network/lib/
ins_net_server.mk, are used to link Oracle Net components, and still others to link SQL*Plus,
and so on.
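For example, manually relinking the database kernel might be done as follows; the ioracle target, which also appears in the case study later in this chapter, links a new oracle executable and moves it into $ORACLE_HOME/bin:

$ cd $ORACLE_HOME/rdbms/lib
$ make -f ins_rdbms.mk ioracle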
Similar steps occur when a patch set (OUI) or interim patch (OPatch) is applied. Most
patches modify a static library in $ORACLE_HOME/lib by replacing an object module with a newer
version that includes a bug fix. The executable oracle must be relinked to pick up a changed
object module in a static library. For someone who knows how to use the commands ar and
make, it is fairly easy to manually apply an interim patch. This is useful when OPatch fails for
whatever reason. For example, newer releases of OPatch (starting in versions 1.0.0.0.54 and
10.2.0.x) are able to verify that a new object module (extension .o, e.g., dbsdrv.o) was correctly
inserted into a static library (extension .a, e.g., libserver10.a) by extracting the object module
and comparing it to the one shipped with the patch. This very reasonable test failed on Solaris
64-bit ORACLE installations, since Solaris 10 pads object files with newline characters (use od
-c filename to check this). OPatch complained with a message that said “Archive failed: failed
to update” and backed out the interim patch. Setting the environment variable OPATCH_DEBUG=TRUE,
which is documented in Oracle Universal Installer and OPatch User’s Guide 10g Release 2, revealed
that it was not the ar command that was failing, but instead the verification. Meanwhile this
issue has been taken care of (see Metalink note 353150.1 for details).
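Using the file names mentioned above, the manual approach might be sketched as follows (the staging directory /tmp/patch is hypothetical):

$ cd $ORACLE_HOME/lib
$ ar r libserver10.a /tmp/patch/dbsdrv.o   # replace the object module in the static library
$ cd $ORACLE_HOME/rdbms/lib
$ make -f ins_rdbms.mk ioracle             # relink oracle to pick up the new module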
The same approach of exchanging object modules is used to add or remove ORACLE
server options. Options can only be purchased with the Enterprise Edition of the ORACLE DBMS.
Currently, ten options exist, among them are the following:
• Label Security
• Partitioning
• Real Application Clusters
• Spatial
• Data Mining
The Oracle10g OUI has a bug that causes Data Mining to be installed unconditionally—
even when it was deselected on the relevant OUI screen. The approach to add and remove
options, which will be presented shortly, may be used to work around this bug.
Which options are installed becomes evident when SQL*Plus is started.
$ sqlplus ndebes/secret
SQL*Plus: Release 10.2.0.3.0 - Production on Thu Jul 26 02:21:43 2007
Copyright (c) 1982, 2006, Oracle. All Rights Reserved.
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - Production
With the Partitioning, Real Application Clusters, Oracle Label Security and Data
Mining options
SQL*Plus takes this information from the view V$OPTION:
SQL> SELECT parameter, value FROM v$option
WHERE parameter IN ('Partitioning','Real Application Clusters',
'Oracle Label Security','Data Mining');
PARAMETER                 VALUE
------------------------- -----
Partitioning              TRUE
Real Application Clusters TRUE
Oracle Label Security     TRUE
Data Mining               TRUE
As an option is added or removed by linking, V$OPTION.VALUE becomes TRUE or FALSE.
Case Study
In this section we will simulate the failure of all voting disks in a RAC environment in order to
create a scenario for the removal of the RAC option. Did you know that an ORACLE instance,
which uses an oracle executable with RAC linked in, cannot be started irrespective of the value
of the CLUSTER_DATABASE parameter when cluster group services (a component of Oracle Clusterware) are not functional on the local node? This is true for ASM as well as RDBMS instances
(parameter instance_type={ASM|RDBMS}).
Simulating Voting Disk Failure
Voting disks are devices that Oracle Clusterware uses to ensure the integrity of the database in
case of an interconnect failure. Say you lose all the voting disks configured for use with Oracle
Clusterware. This causes Oracle Clusterware to abort. Following is a quick test that simulates
the failure of all voting disks. In this case, all voting disks means just one, since external
redundancy, that is, redundancy in the storage subsystem, is used instead of triple mirroring
by Oracle Clusterware. The test was performed on Red Hat Advanced Server 4.
Linux does not have proper raw devices. Instead the command raw must be used to bind
raw devices to block devices.1 Failure of a voting disk may be simulated by binding the voting
disk device to an unused block device. On the test system, /dev/md1 was such a device. It could
be opened without error, but 0 bytes were returned. The individual steps to simulate voting
disk failure are reproduced here:
 1  # wc -c /dev/md1
 2  0 /dev/md1
 3  # crsctl query css votedisk # ask Clusterware for configured voting disks
 4  0.     0    /opt/oracle/votedisk
 5  located 1 votedisk(s).
 6  # ls -l /opt/oracle/votedisk # character special device with major nr. 162
 7  crw-r--r-- 1 oracle oinstall 162, 1 Jul 20 22:40 /opt/oracle/votedisk
 8  # raw -q /opt/oracle/votedisk # which block device is it bound to?
 9  /dev/raw/raw1: bound to major 8, minor 8
10  # ls -l /dev/sda8 # major device number 8 and minor 8 is sda8 (SCSI disk)
11  brw-rw---- 1 root disk 8, 8 Jul 24 03:15 /dev/sda8
12  # raw /opt/oracle/votedisk /dev/md1 # rebind to the wrong device
13  /dev/raw/raw1: bound to major 9, minor 1
14  # crsctl start crs
15  Attempting to start CRS stack
16  The CRS stack will be started shortly
17  # crsctl check crs
18  Failure 1 contacting CSS daemon
19  Cannot communicate with CRS
20  Cannot communicate with EVM
1. ASM instances in Oracle10g Release 2 for Linux also support block devices, since the implementation
of raw devices is deprecated. Red Hat encourages software vendors to modify their applications to
open block devices with the O_DIRECT flag instead of requiring raw devices.
As the output of the command crsctl check crs in line 18 suggests, Oracle Clusterware
cannot be started without any voting disks available. Let’s take a detailed look at the commands by
line number.
• Line 1: The command wc (word count) reads from /dev/md1 and succeeds, but cannot
read anything from the device. This device file will be used to simulate a failed voting disk.
• Line 3: The list of configured voting disks is retrieved with crsctl query css votedisk.
• Line 4: The only voting disk configured is /opt/oracle/votedisk. This is not the device
itself, but merely a character special (a.k.a. raw) device file that points to the block device
representing the disk.
• Line 7: The major and minor numbers of /opt/oracle/votedisk are 162 and 1 respectively.
On Linux, all raw devices have major number 162. The major number identifies an entire
device class. The minor number identifies device instances within a device class. On
Linux, raw device instances have minor numbers between 1 and 255.
• Line 8: The current binding of /opt/oracle/votedisk is queried with raw -q raw_device_file.
• Line 9: The raw command does not take into account that raw devices might have more
meaningful names than rawN. The raw device /opt/oracle/votedisk is bound to a block
device with major number 8 (SCSI disks) and minor number 8.
• Line 11: Major number 8, minor number 8 is /dev/sda8, i.e., partition 8 of the first SCSI
disk in the system.
• Line 12: The raw device /opt/oracle/votedisk is rebound to /dev/md1. This succeeds,
since Oracle Clusterware was shut down and the character special file is not accessed.
• Line 14: An attempt is made to start Oracle Clusterware.
• Line 17: The status of Oracle Clusterware processes is queried.
• Line 18: Oracle Clusterware did not start, since no voting disk is available.
Surprisingly, no error message is written to any of the documented log files (test performed
with 10.2.0.3.0). On Linux, CRS (Cluster Ready Services) writes log messages to /var/log/messages
using /bin/logger.2 It’s a good idea to check this file if you cannot diagnose a problem from
$ORA_CRS_HOME/log/nodename/alertnodename.log or other log files in $ORA_CRS_HOME/log, where
nodename is the host name of the system. CRS is not available, since it is waiting for voting
disks to become available.
Jul 26 20:54:44 dbserver1 logger: Cluster Ready Services waiting on dependencies.
Diagnostics in /tmp/crsctl.30644.
The reason for the failure is in /tmp/crsctl.30644, as shown here:
# more /tmp/crsctl.30644
Failure reading from offset 2560 in device votedisk
Failure 1 checking the CSS voting disk 'votedisk'.
Not able to read adequate number of voting disks
2. The command /bin/logger is a shell command interface to the syslog system log module.
There is a loop in /etc/init.d/init.cssd that calls crsctl check boot until it returns exit
code 0. With all voting disks offline, this will never happen, so CRS loops forever.3
# crsctl check boot
Failure reading from offset 2560 in device votedisk
Failure 1 checking the CSS voting disk 'votedisk'.
Not able to read adequate number of voting disks
# echo $?
6
Another way to simulate the failure of voting disks as well as any other files such as the
Oracle Cluster Registry, ASM disks, or data files, is to overwrite the first few blocks with binary
zeros. This can be done with the following dd (device dump) command:
dd if=/dev/zero bs=8192 count=2 of=voting_disk_file
Obviously this is a destructive test, so you need to take a backup of your voting disks with
dd before you attempt this. However, it has the advantage that it can be performed while Oracle
Clusterware is running, whereas the raw command cannot remap a device that is in use.4 Errors
caused by this latter test are properly reported in the documented log files below
$ORA_CRS_HOME/log/nodename.
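The backup mentioned above might be taken and, if need be, restored like this (a sketch; the voting disk path is the one from this case study, the backup location is hypothetical):

# dd if=/opt/oracle/votedisk of=/backup/votedisk.dd bs=1048576
# dd if=/backup/votedisk.dd of=/opt/oracle/votedisk bs=1048576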
Back to my initial claim that an oracle executable with RAC linked in cannot be used to
start a DBMS instance, unless Oracle Clusterware is functional. In case ASM is used, an ASM
instance must be started before the DBMS instance. Here’s some evidence (run SQL*Plus as
the owner of the ORACLE installation):
$ env ORACLE_SID=+ASM1 sqlplus / as sysdba
SQL*Plus: Release 10.2.0.3.0 - Production on Thu Jul 26 02:35:22 2007
Copyright (c) 1982, 2006, Oracle. All Rights Reserved.
Connected to an idle instance.
SQL> STARTUP
ORA-29701: unable to connect to Cluster Manager
For completeness, let’s confirm that a similar error is reported by an RDBMS instance.
$ env ORACLE_SID=TEN1 sqlplus / as sysdba
SQL> STARTUP NOMOUNT
ORA-29702: error occurred in Cluster Group Service operation
Even STARTUP NOMOUNT does not work when Cluster Group Services are unavailable.
Removing the RAC Option with the Make Utility
As long as ASM is not in use, it is very easy to convert the oracle executable from RAC to single
instance and to get one instance in the cluster running. Calling make as the owner of the ORACLE
installation (usually oracle) and specifying the undocumented make target rac_off does the job.
3. The same happens when all copies of the OCR are unavailable. The error reported in my test was “OCR
initialization failed accessing OCR device: PROC-26: Error while accessing the physical storage Operating
System error [Success] [0]”.
4. The error thrown is “Error setting raw device (Device or resource busy)”.
$ cd $ORACLE_HOME/rdbms/lib
$ make -f ins_rdbms.mk rac_off ioracle
…
/usr/bin/ar d /opt/oracle/product/db10.2/rdbms/lib/libknlopt.a kcsm.o
/usr/bin/ar cr /opt/oracle/product/db10.2/rdbms/lib/libknlopt.a /opt/oracle/product/
db10.2/rdbms/lib/ksnkcs.o
- Linking Oracle
gcc -o /opt/oracle/product/db10.2/rdbms/lib/oracle ... -lknlopt ...
…
mv /opt/oracle/product/db10.2/rdbms/lib/oracle /opt/oracle/product/db10.2/bin/oracle
chmod 6751 /opt/oracle/product/db10.2/bin/oracle
The most interesting parts are the ar commands. First ar is used to remove the object file
kcsm.o from libknlopt.a (kernel options library), then the object file ksnkcs.o is added to
libknlopt.a. This removes the RAC option. Another startup attempt yields this:
$ env ORACLE SID=TEN1 sqlplus / as sysdba
SQL> STARTUP
ORA-00439: feature not enabled: Real Application Clusters
The reason for ORA-00439 is that DBMS instances in a RAC environment are run with the
parameter setting CLUSTER_DATABASE=TRUE. But this parameter can only have the value TRUE if
RAC is linked in. So we need to set CLUSTER_DATABASE=FALSE. In case you’re using an SPFILE, you
won’t be able to modify it without starting an instance, which is currently impossible. So you
need to transform the SPFILE by removing the binary contents, such that you get a plain text
file, and remove the parameter CLUSTER_DATABASE, which defaults to FALSE. The next attempt
then yields this:
$ cd $ORACLE_HOME/dbs
$ strings spfileTEN1.ora | grep -v cluster > pfileTEN1.ora
$ env ORACLE_SID=TEN1 sqlplus / as sysdba
SQL> STARTUP NOMOUNT PFILE=pfileTEN1.ora
ORACLE instance started.
Total System Global Area  314572800 bytes
Fixed Size                  1261564 bytes
Variable Size             201326596 bytes
Database Buffers          104857600 bytes
Redo Buffers                7127040 bytes
SQL> SELECT value FROM v$option WHERE parameter='Real Application Clusters';
VALUE
----------------------------------------------------------------
FALSE
Finally the RDBMS instance has started. As long as it does not use ASM storage you are all
set. But what if ASM is used? Let’s attempt to start an ASM instance with CRS down, the RAC
option removed, and the initialization parameter setting CLUSTER_DATABASE=FALSE.
$ strings spfile+ASM1.ora|grep -v cluster > pfile+ASM1.ora
$ env ORACLE_SID=+ASM1 sqlplus / as sysdba
SQL> STARTUP NOMOUNT PFILE=pfile+ASM1.ora
ORA-29701: unable to connect to Cluster Manager
No progress here. This is because ASM instances depend on the Oracle Cluster Synchronization Service Daemon (OCSSD) of Oracle Clusterware for communication with RDBMS
instances.
Conversion of a CRS Installation to Local-Only
ASM does not require the full CRS stack. A stripped down version of CRS, which runs no
daemons except OCSSD, is sufficient. OCSSD implements the Cluster Synchronization Service,
which monitors node health through third-party cluster software, or in its absence makes its
own determination. It also provides notifications to ORACLE instances about cluster membership and notifies CRSD and EVMD of the health of the cluster. The latter two daemons are not
mandatory for ASM instance operations.
A local-only installation of CRS does not require any voting disks. After all, there is just a
single node that could cast a vote. A file system file is used as the Oracle Cluster Registry (OCR),
so raw devices are not needed at all. The conversion consists of backing up the OCR as well as
other configuration files and running a few shell scripts located in $ORA_CRS_HOME/install and
$ORA_CRS_HOME/bin.
A CRS installation for RAC is marked by three entries in /etc/inittab, as shown here:
$ grep '^h' /etc/inittab
h1:35:respawn:/etc/init.d/init.evmd run >/dev/null 2>&1 </dev/null
h2:35:respawn:/etc/init.d/init.cssd fatal >/dev/null 2>&1 </dev/null
h3:35:respawn:/etc/init.d/init.crsd run >/dev/null 2>&1 </dev/null
The parameter local_only in /etc/oracle/ocr.loc5 has the value FALSE.
$ cat /etc/oracle/ocr.loc
ocrconfig_loc=/opt/oracle/ocr
local_only=FALSE
A local-only CRS installation has but one entry in /etc/inittab and the parameter
local_only is set to TRUE. Other differences exist in the directory structure below /etc/oracle/
scls_scr. We will first back up the current configuration of CRS. This involves taking a
backup of the cluster registry with dd, backing up configuration files and shell scripts, and
preserving the entries for CRS in /etc/inittab. These steps need to be performed as root:
# dd if=/opt/oracle/ocr bs=1048576 of=ocr.bin
258+1 records in
258+1 records out
# ocrdump ocr.txt
# tar cvfP crs_local_only_false.tar /etc/oracle /etc/init.d/init.crs \
/etc/init.d/init.crsd /etc/init.d/init.cssd /etc/init.d/init.evmd \
/etc/rc0.d/K96init.crs /etc/rc1.d/K96init.crs /etc/rc2.d/K96init.crs \
/etc/rc3.d/S96init.crs /etc/rc4.d/K96init.crs /etc/rc5.d/S96init.crs \
/etc/rc6.d/K96init.crs /etc/inittab
5. The file ocr.loc is located in /var/opt/oracle on some platforms.
# grep '^h[1-3]' /etc/inittab > inittab.crs
Now that the current configuration is saved, let’s dare to change it. The script rootdelete.sh
removes CRS entries from /etc/inittab, notifies init of the changes, removes files from /etc/
init.d as well as /etc/rc[0-6].d, and deletes /etc/oracle/scls_scr while retaining /etc/
oracle/ocr.loc. In case CRS is running in the endless loop mentioned before, kill the shell
scripts that were called with the argument startcheck by init.cssd. Otherwise stop CRS with
crsctl stop crs and then run rootdelete.sh as root.
# ps -e -o pid,ppid,comm,args | fgrep init. | grep -v grep
 4584     1 init.evmd       /bin/sh /etc/init.d/init.evmd run
 4877     1 init.cssd       /bin/sh /etc/init.d/init.cssd fatal
 4878     1 init.crsd       /bin/sh /etc/init.d/init.crsd run
 8032  4584 init.cssd       /bin/sh /etc/init.d/init.cssd startcheck
 8194  4878 init.cssd       /bin/sh /etc/init.d/init.cssd startcheck
 8247  4877 init.cssd       /bin/sh /etc/init.d/init.cssd startcheck
# kill 8032 8194 8247
# $ORA_CRS_HOME/install/rootdelete.sh
Shutting down Oracle Cluster Ready Services (CRS):
Stopping resources. This could take several minutes.
Error while stopping resources. Possible cause: CRSD is down.
Shutdown has begun. The daemons should exit soon.
Checking to see if Oracle CRS stack is down...
Oracle CRS stack is not running.
Oracle CRS stack is down now.
Removing script for Oracle Cluster Ready services
Updating ocr file for downgrade
Cleaning up SCR settings in '/etc/oracle/scls_scr'
Now that the CRS configuration for RAC has been removed, we can install a local-only CRS
configuration for use by ASM with the documented script localconfig.
# $ORA_CRS_HOME/bin/localconfig add
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Configuration for local CSS has been initialized
Adding to inittab
Startup will be queued to init within 30 seconds.
Checking the status of new Oracle init process...
Expecting the CRS daemons to be up within 600 seconds.
CSS is active on these nodes.
dbserver1
CSS is active on all nodes.
Oracle CSS service is installed and running under init(1M)
The stripped-down CRS stack is now up and running. You may query its status with crsctl
check css.
$ crsctl check css
CSS appears healthy
$ crsctl check crs
CSS appears healthy
Cannot communicate with CRS
Cannot communicate with EVM
$ ps -ef | grep ocssd | grep -v grep
oracle   14459     1  0 01:42 ?        00:00:00 /opt/oracle/product/crs10.2/bin/ocssd.bin
The command crsctl check crs fails to contact CRSD and EVMD, which are now disabled.
The value of the parameter local_only in ocr.loc is now TRUE, and only a single entry for
OCSSD remains in /etc/inittab.
# cat /etc/oracle/ocr.loc
ocrconfig_loc=/opt/oracle/product/crs10.2/cdata/localhost/local.ocr
local_only=TRUE
# grep 'h[1-3]' /etc/inittab
h1:35:respawn:/etc/init.d/init.cssd run >/dev/null 2>&1 </dev/null
The ASM instance can be started and any RDBMS instance using ASM storage can mount
and open the database.
$ env ORACLE_SID=+ASM1 sqlplus / as sysdba
SQL> STARTUP PFILE=$ORACLE_HOME/dbs/pfile+ASM1.ora
ASM instance started

Total System Global Area   83886080 bytes
Fixed Size                  1260216 bytes
Variable Size              57460040 bytes
ASM Cache                  25165824 bytes
ASM diskgroups mounted
SQL> EXIT
$ env ORACLE_SID=TEN1 sqlplus / as sysdba
SQL> ALTER DATABASE MOUNT;
Database altered.
SQL> ALTER DATABASE OPEN;
Database altered.
SQL> SELECT name FROM v$datafile WHERE file#=1;
NAME
----------------------------------------------------------------
+DG/ten/datafile/system.259.628550039
Startup and shutdown of OCSSD is performed with /etc/init.d/init.cssd {start|stop}.
# /etc/init.d/init.cssd stop
Stopping CSSD.
Shutting down CSS daemon.
Shutdown request successfully issued.
Shutdown has begun. The daemons should exit soon.
Re-enabling CRS for RAC
As soon as the voting disks are available again, the RAC option can be linked back in and the
CRS configuration for RAC can be restored. Make sure any RDBMS and ASM instances are shut
down before you run make.
$ cd $ORACLE_HOME/rdbms/lib
$ make -f ins_rdbms.mk rac_on ioracle
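Once the whole restoration procedure below is complete, you can confirm that the option is
linked back in by repeating the v$option query from the beginning of this chapter; it should
then return TRUE.
SQL> SELECT value FROM v$option WHERE parameter='Real Application Clusters';

VALUE
----------------------------------------------------------------
TRUE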
As root, run localconfig delete to remove the current CRS configuration.
# $ORA_CRS_HOME/bin/localconfig delete
Stopping CSSD.
Shutting down CSS daemon.
Shutdown request successfully issued.
Shutdown has begun. The daemons should exit soon.
As long as the script $ORA_CRS_HOME/install/rootdeinstall.sh, which overwrites the OCR
with binary zeros, has not been used, there is no need to restore the backup of the OCR.
Considering that a local-only configuration does not access the raw device holding the OCR,
the backup is just a precaution. The final steps consist of restoring the CRS scripts, adding
entries in /etc/inittab, and notifying the init process of the changes to inittab.
# tar xvfP crs_local_only_false.tar # -P extracts with absolute paths
/etc/oracle/
…
/etc/rc6.d/K96init.crs
# cat inittab.crs >> /etc/inittab # append CRS entries to inittab
# telinit q # notify init of changes to inittab
After a few moments, the CRS processes are running.
# ps -ef | grep d\\.bin
root       319 31977  1 02:21 ?        00:00:01 /opt/oracle/product/crs10.2/bin/crsd.bin reboot
oracle     738 32624  0 02:22 ?        00:00:00 /opt/oracle/product/crs10.2/bin/ocssd.bin
oracle   32577 31975  0 02:21 ?        00:00:01 /opt/oracle/product/crs10.2/bin/evmd.bin
# crsctl check crs
CSS appears healthy
CRS appears healthy
EVM appears healthy
CRS automatically starts the local ASM and RDBMS instances, given that cluster resources
are configured correctly. Multiple RAC instances are again able to open the same database.
$ env ORACLE_SID=TEN1 sqlplus / as sysdba
SQL> SELECT inst_number, trim(INST_NAME) inst_name FROM v$active_instances;

INST_NUMBER INST_NAME
----------- -----------------------------
          1 dbserver1.oradbpro.com:TEN1
          2 dbserver2.oradbpro.com:TEN2
Lessons Learned
This chapter discussed an emergency procedure for quickly bringing up a single node in a RAC
cluster in case of failure of all voting disks or OCR devices. The same procedure may be applied
to other severe error scenarios that prevent Oracle Clusterware from functioning. The
procedure modifies the configuration of Oracle Clusterware, ASM, and RDBMS instances. It is
not destructive; the changes made may be reversed quickly, as soon as the underlying problem
that caused the outage of Oracle Clusterware has been resolved.
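For quick reference in an actual emergency, the disabling steps condense into a short checklist.
What follows is a sketch assuming the paths and file names used throughout this chapter, not a
turnkey script. Relink without the RAC option as the oracle user:
$ cd $ORACLE_HOME/rdbms/lib
$ make -f ins_rdbms.mk rac_off ioracle
Then, as root, back up the OCR and the CRS configuration, remove the RAC configuration, and
set up local-only CSS:
# dd if=/opt/oracle/ocr bs=1048576 of=ocr.bin
# tar cvfP crs_local_only_false.tar /etc/oracle /etc/init.d/init.c* \
/etc/init.d/init.evmd /etc/rc?.d/?96init.crs /etc/inittab
# grep '^h[1-3]' /etc/inittab > inittab.crs
# $ORA_CRS_HOME/install/rootdelete.sh
# $ORA_CRS_HOME/bin/localconfig add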
PART 11

Utilities
CHAPTER 35
■■■
OERR
The OERR utility, which ships with Oracle software distributions for UNIX and Linux, provides
quick access to ORACLE DBMS error message texts based on an error message code. The code
consists of a facility name and a positive number. Furthermore, the cause and action sections
retrieved by OERR along with the error message may help in solving a problem at hand. The
OERR script is mentioned in the Oracle10g documentation in conjunction with TNS errors (see
Oracle Database Net Services Administrator's Guide). It is undocumented that OERR supports
additional categories of errors beyond TNS. The SQL*Plus User's Guide and Reference Release 9.2
states that "the UNIX oerr script now recognizes SP2- error prefixes to display the Cause and
Action text", but does not explain how to use OERR. It is also undocumented that most events
(e.g., 10046 for extended SQL trace) known to the DBMS instance are represented in the message
files read by OERR and that a short description of each event is available.
The OERR utility is not available for Windows. However, alternative implementations for
Windows developed by third parties are available freely on the Internet.
Introduction to the OERR Script
Occasionally, software running against the ORACLE DBMS server only reports an error code in
case of a failure and omits the associated error message. In such a situation, the OERR script
comes in handy, since it retrieves the error message associated with the error code. But it doesn’t
stop there. Where available, it also prints probable causes of and remedies for the error. Too
bad it can’t be persuaded to fix the problem, given that it “knows” what’s wrong and how to fix
it! Here’s an example of OERR in action:
$ oerr ora 2292
02292, 00000, "integrity constraint (%s.%s) violated - child record found"
// *Cause: attempted to delete a parent key value that had a foreign
//         key dependency.
// *Action: delete dependencies first then parent or disable constraint.
OERR ships with each ORACLE DBMS software distribution for UNIX systems. It is implemented
by the Bourne shell script $ORACLE_HOME/bin/oerr. Since the script resides in $ORACLE_HOME/bin,
which is normally already included in PATH on database servers, there is no need to add another
directory to PATH to use it. When called without arguments, OERR prints its usage.
$ oerr
Usage: oerr facility error
Facility is identified by the prefix string in the error message.
For example, if you get ORA-7300, "ora" is the facility and "7300"
is the error. So you should type "oerr ora 7300".
If you get LCD-111, type "oerr lcd 111", and so on.
OERR reads the plain text error message files shipped with UNIX distributions of the
ORACLE DBMS. Windows distributions merely contain binary error message files. Plain text
error message files have the extension .msg, whereas binary error message files have the
extension .msb. Error message files reside in the directory $ORACLE_HOME/component/mesg
and are named facilityus.msg (e.g., $ORACLE_HOME/rdbms/mesg/oraus.msg for the facility ORA).
With the help of the file $ORACLE_HOME/lib/facility.lis, OERR translates the facility name
into the corresponding component name. It uses the UNIX utility awk to extract the requested
portions from a message file.
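To illustrate the mechanism, here is a crude reimplementation sketch, not the actual oerr
source. It assumes the colon-separated facility:component:... layout of facility.lis and an
English (us) message file; the file name mini_oerr.sh is made up for this example:
#!/bin/sh
# mini_oerr.sh facility code -- crude oerr lookalike
facility=$1
code=$2
# translate the facility name into the component name via facility.lis
component=`awk -F: -v f="$facility" '$1 == f {print $2; exit}' \
$ORACLE_HOME/lib/facility.lis`
msgfile=$ORACLE_HOME/$component/mesg/${facility}us.msg
# print the message entry for the requested code plus its // comment lines
awk -v c="$code" '
$1 ~ "^0*" c "," { found = 1; print; next }
found && /^\/\// { print; next }
found { exit }
' $msgfile
Called as sh mini_oerr.sh ora 2292, it prints roughly the same entry as the oerr example
shown earlier.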
Many DBAs are already familiar with OERR. It is less widely known that OERR supports
many more facilities than just "ORA". If, for example, you were running Oracle Clusterware to
support RAC and received the error message "CLSS-02206: local node evicted by vendor node
monitor", then OERR would have the following advice for you:
$ oerr clss 2206
2206, 1, "local node evicted by vendor node monitor"
// *Cause: The Operating System vendor's node monitor evicted the local node.
// *Action: Examine the vendor node monitor's logs for relevant information.
Some common facility codes and component names are listed in Table 35-1. The compilation
in the table is by no means complete; Oracle10g boasts a total of 174 facilities. A sketch for
enumerating all facilities on your own system follows the table.
Table 35-1. Facility Codes

Facility   Description
---------  ------------------------------------------------------------
CLSR       Message file for Oracle Real Application Clusters HA (RACHA)
CLSS       Message file for Oracle Cluster Synchronization Services
CLST       Message file for modules common to Cluster Ready Services
DBV        Database Verification Utility (DBVERIFY)
DGM        Oracle Data Guard broker command line utility DGMGRL
EXP        Export Utility
IMP        Import Utility
LDAP       OID LDAP Server
LPX        XML parser
LRM        CORE error message file for the parameter manager
LSX        XML Schema processor
NID        New Database Id (NID Utility)
OCI        Oracle Call Interface
ORA        ORACLE DBMS Server
PCP        Pro*C/C++ C/SQL/PLS/DDL Parser
PLS        PL/SQL
PLW        PL/SQL Warnings
PROT       Oracle Cluster Registry (OCR) Tools
RMAN       Recovery Manager
SP2        SQL*Plus
TNS        Oracle Net Services (Transparent Network Substrate)
UDE        Data Pump Export
UDI        Data Pump Import
UL         SQL*Loader
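To enumerate the facilities available in your own release, facility.lis can be inspected
directly. A minimal sketch, again assuming the colon-separated layout of the file:
$ cut -d: -f1 $ORACLE_HOME/lib/facility.lis | sort
$ cut -d: -f1 $ORACLE_HOME/lib/facility.lis | wc -l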
Retrieving Undocumented Events
In addition to the documented error messages, ORACLE error message files also contain event
numbers of undocumented events. These are handled as if they were error codes; most are
in the range between 10000 and 10999. Just as with true error codes, there is a message
associated with each of them. Some of the events are described further in the cause and action
sections of the oerr output.
The following shell script (file oerr.sh) retrieves all the events of a particular ORACLE
DBMS release:
event=10000
counter=0
while [ $event -lt 11000 ]
do
  text=`oerr ora $event`
  if [ "$text" != "" ]; then
    counter=`expr $counter + 1`
    echo "$text"
  fi
  event=`expr $event + 1`
done
echo "$counter events found."
Running this shell script against Oracle9i yields 623 events, while Oracle10g has 713 events.
Oracle11g has 761 events. Following is a small excerpt of the output generated by the script
oerr.sh, which contains some of the better known events such as 10046 and 10053.
10046, 00000, "enable SQL statement timing"
// *Cause:
// *Action:
10047, 00000, "trace switching of sessions"
// *Cause:
// *Action:
10048, 00000, "Undo segment shrink"
// *Cause:
// *Action:
10049, 00000, "protect library cache memory heaps"
// *Cause:
// *Action: Use the OS memory protection (if available) to protect library
//          cache memory heaps that are pinned.
10050, 00000, "sniper trace"
// *Cause:
// *Action:
10051, 00000, "trace OPI calls"
// *Cause:
// *Action:
10052, 00000, "don't clean up obj$"
// *Cause:
// *Action:
10053, 00000, "CBO Enable optimizer trace"
// *Cause:
// *Action:
10056, 00000, "dump analyze stats (kdg)"
// *Cause:
// *Action:
10057, 00000, "suppress file names in error messages"
// *Cause:
// *Action:
Instead of oerr on UNIX, you can use the following anonymous PL/SQL block on any platform
where an ORACLE client is available (file oerr.sql):
SET SERVEROUTPUT ON SIZE 1000000
DECLARE
  err_msg VARCHAR2(4000);
  counter NUMBER := 0;
BEGIN
  FOR err_num IN 10000..10999
  LOOP
    err_msg := SQLERRM(-err_num);
    IF err_msg NOT LIKE '%Message '||err_num||' not found%' THEN
      DBMS_OUTPUT.PUT_LINE(err_msg);
      counter := counter + 1;
    END IF;
  END LOOP;
  DBMS_OUT